# Performance of whole-genome promoter nucleosome profiling of maternal plasma cell-free DNA for prenatal noninvasive prediction of fetal macrosomia: a retrospective nested case-control study in mainland China – BMC Pregnancy and Childbirth

Sep 10, 2022

### Study participants

The study was approved by the Internal Ethics Committee of Nanfang Hospital, Southern Medical University (NFEC-2017-049), and all women provided written informed consent for the use of their data in ongoing research before the blood draw. This was a nested case-control study. During the study period (Jan 1, 2016, to Aug 31, 2019), 3600 women with naturally conceived singleton pregnancies at 12+0 to 27+6 gestational weeks were recruited from three independent hospitals: Nanfang Hospital of Southern Medical University (SMU), the Third Affiliated Hospital of Sun Yat-sen University (SYSU), and Cangzhou People’s Hospital. All plasma samples underwent routine noninvasive prenatal testing (NIPT).

### Inclusion and exclusion criteria

Based on pregnancy outcomes and neonatal birth weight, pregnancies were classified into the macrosomia and control groups. Pregnancies meeting any of the following criteria were excluded: 1) gestational age at blood collection of less than 12 weeks or more than 28 weeks; 2) maternal overweight or obesity (BMI over 25 before pregnancy); 3) multiple pregnancy; 4) singleton pregnancy with positive results on NIPT and ultrasound scans; 5) premature delivery; 6) birth weight below 2500 g; and 7) loss to follow-up. We identified macrosomia cases and controls by retrospectively analyzing the participant follow-up results, including pregnancy outcomes and neonatal birth weight. Each macrosomia case was randomly matched to four controls according to gestational week at blood collection and fetal sex.
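The 1:4 matching step can be illustrated with a minimal sketch. It assumes each record carries a completed gestational week at blood collection and a fetal sex, that matching is exact on both, and that controls are drawn without replacement. The field names (`id`, `ga_week`, `fetal_sex`), the function `match_controls`, and the fixed random seed are hypothetical and are not taken from the study.

```python
import random
from collections import defaultdict

def match_controls(cases, controls, ratio=4, seed=0):
    """Randomly match each case to `ratio` unused controls that share its
    (gestational week, fetal sex) stratum. Field names are hypothetical."""
    rng = random.Random(seed)

    # Group controls by matching stratum and shuffle within each stratum.
    pool = defaultdict(list)
    for ctrl in controls:
        pool[(ctrl["ga_week"], ctrl["fetal_sex"])].append(ctrl)
    for group in pool.values():
        rng.shuffle(group)

    matched = {}
    for case in cases:
        key = (case["ga_week"], case["fetal_sex"])
        group = pool[key]
        if len(group) < ratio:
            raise ValueError(f"not enough controls left for stratum {key}")
        # Pop so that a control is never matched to two different cases.
        matched[case["id"]] = [group.pop() for _ in range(ratio)]
    return matched

# Toy records, invented for illustration only.
cases = [
    {"id": "M1", "ga_week": 16, "fetal_sex": "F"},
    {"id": "M2", "ga_week": 16, "fetal_sex": "F"},
]
controls = [{"id": f"C{i}", "ga_week": 16, "fetal_sex": "F"} for i in range(10)]
matched = match_controls(cases, controls)
```

Sampling without replacement keeps the control group free of duplicates, which is consistent with the 162:648 case-to-control count reported for the study, although the authors do not state how ties or sparse strata were handled.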

In total, 810 NIPT samples (162 macrosomia cases and 648 controls) were selected for further evaluation. Macrosomia was defined as a birth weight above 4000 g.

### Sample processing and next-generation sequencing

Because pre-analytical factors (e.g., storage temperature and time before processing) can significantly affect cfDNA analysis [21, 22], we established quality-control protocols to ensure that samples from different hospitals were handled consistently. Peripheral blood samples were collected from each participant in cfDNA BCT tubes and centrifuged for 10 minutes at 1600×g to collect the plasma. The plasma was then centrifuged at 16,000×g for 10 minutes to remove residual cellular fragments. All plasma samples were stored frozen at ≤ −80 °C, and cfDNA was extracted from the frozen samples using the QIAamp DNA Blood Mini kit (Qiagen) following the manufacturer’s instructions.

DNA libraries were constructed from 40.5 μL of extracted DNA using TruSeq DNA Sample Prep reagents (Illumina, Paris). The libraries were quantified using Qubit (Life Technologies), and DNA integrity was verified using an Agilent Bioanalyzer 2100 (Agilent Technologies). Purified libraries from twelve individual samples were pooled, and massively parallel sequencing was performed on the Ion Proton sequencing platform (Life Technologies) or the NextSeq500 sequencing platform (Illumina). Sequencing was performed at an average coverage depth of 0.3× [23].

### Sequence analysis

After removal of low-quality sequencing reads, the remaining reads were aligned to the hg19 human reference genome using BWA-MEM [24], and PCR duplicates were removed using SAMtools (ver. 1.2) [25]. Read counts in regions ranging from −1000 bp to +1000 bp around transcription start sites (TSS), defined as the primary transcription start site (pTSS), were calculated using BEDTools (ver. 2.17.0) and then normalized using the following formula:

$$\mathrm{Normalized\ pTSS}=\frac{\mathrm{Reads\ at\ pTSS}}{\mathrm{Total\ mapped\ reads}\times \mathrm{length\ of\ pTSS}\ (2\ \mathrm{kb})} \tag{1}$$
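As a worked instance of the normalization formula, the sketch below divides the read count in a pTSS window by the product of the total mapped reads and the window length. It assumes the window length is expressed in kilobases (2 kb), as the formula suggests. The function name `normalized_ptss`, the gene labels, and the read counts are hypothetical.

```python
def normalized_ptss(reads_at_ptss, total_mapped_reads, window_kb=2.0):
    """Normalized pTSS coverage: reads in the window divided by
    (total mapped reads x window length in kb)."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return reads_at_ptss / (total_mapped_reads * window_kb)

# Toy counts for one sample (invented values, not study data).
counts = {"GENE_A": 12, "GENE_B": 30}
total_mapped = 6_000_000

norm = {gene: normalized_ptss(n, total_mapped) for gene, n in counts.items()}
# GENE_A: 12 / (6,000,000 x 2) = 1.0e-06
# GENE_B: 30 / (6,000,000 x 2) = 2.5e-06
```

Dividing by the total mapped reads makes values comparable across samples sequenced to different depths, which matters at the low coverage (0.3×) used here.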

### Prediction model construction and validation

To obtain effective classifiers for predicting pregnancies with fetal macrosomia, a three-stage workflow was designed, including exploration of genes with differential promoter profiling (discovery stage), construction of classifiers (training stage), and evaluation of the performance of the classifiers (validation stage) (Fig. 1).

At the discovery stage, we first sequenced cfDNA from maternal plasma samples collected from 47 macrosomia cases and 47 gestational age-matched controls and compared the coverage at the pTSSs between the two groups. P-values were calculated using the Wilcoxon rank sum test and adjusted to false discovery rates (FDR) using R software. pTSSs with fold change ≥ 1.3 and FDR ≤ 0.05 were considered significantly changed.
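The discovery-stage filter can be sketched in plain Python as follows. The authors used R, so this is an illustration under stated assumptions and not their implementation: the rank sum p-value uses a tie-corrected normal approximation without continuity correction, the FDR adjustment is assumed to be Benjamini-Hochberg, the fold change is taken as the ratio of group means, and the threshold is applied in either direction. All function names and the toy coverage values are hypothetical.

```python
import math

def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank sum p-value (normal approximation, tie-corrected)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2, n = len(x), len(y), len(x) + len(y)
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0          # mean of ranks i+1 .. j
        t = j - i                             # size of this tie group
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 1.0
    z = (u - n1 * n2 / 2.0) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):              # walk from largest p to smallest
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def select_ptss(case_cov, ctrl_cov, fc_cut=1.3, fdr_cut=0.05):
    """Return genes whose pTSS coverage passes both thresholds."""
    genes = sorted(case_cov)
    fdrs = benjamini_hochberg([rank_sum_p(case_cov[g], ctrl_cov[g]) for g in genes])
    selected = []
    for gene, fdr in zip(genes, fdrs):
        mean_case = sum(case_cov[gene]) / len(case_cov[gene])
        mean_ctrl = sum(ctrl_cov[gene]) / len(ctrl_cov[gene])
        if mean_case <= 0 or mean_ctrl <= 0:
            continue                          # fold change undefined
        ratio = mean_case / mean_ctrl
        if max(ratio, 1 / ratio) >= fc_cut and fdr <= fdr_cut:
            selected.append(gene)
    return selected

# Toy coverage values in arbitrary units (invented, not study data).
case_cov = {"GENE_A": [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
            "GENE_B": [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]}
ctrl_cov = {"GENE_A": [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
            "GENE_B": [11, 13, 15, 17, 19, 21, 23, 25, 27, 29]}
selected = select_ptss(case_cov, ctrl_cov)
```

In this toy input only `GENE_A` is clearly separated between the groups, so it is the only gene that passes both thresholds.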

At the training stage, two machine learning models, support vector machine (SVM) and logistic regression (LR), were used to develop promoter profiling-based classifiers to distinguish macrosomia cases from controls. To develop the classifiers, a stepwise method was used to identify promoter combinations among genes showing differential coverage at the pTSSs. The robustness of the classifiers was assessed using leave-one-out cross-validation (LOOCV) [20, 23]. In brief, each subject in the training cohort was withheld in turn, and the remaining subjects were used to train the model. The trained model was then used to predict the class (macrosomia case or control) of the withheld subject. This procedure continued until all subjects in the training cohort were classified. The performance of each classifier was evaluated using receiver operating characteristic (ROC) analysis, including area under the curve (AUC), accuracy, sensitivity, and specificity. The classifier that achieved the maximum AUC in the training cohort was considered the optimal classifier.
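The LOOCV procedure described above can be sketched with a small logistic regression written in plain Python. This is only an illustration of the hold-one-out loop and the AUC calculation. It omits the SVM model and the stepwise promoter selection, fits by batch gradient descent with an arbitrary learning rate and epoch count, and uses a single invented feature that is assumed to be a standardized (z-scored) pTSS coverage value. None of these choices are taken from the study.

```python
import math

def predict_proba(w, b, x):
    """Logistic probability of the positive class for one feature vector."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    z = max(-30.0, min(30.0, z))              # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Unregularized logistic regression fitted by batch gradient descent."""
    n_feat = len(X[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(n_feat):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def loocv_scores(X, y):
    """Withhold each subject in turn, train on the rest, score the withheld one."""
    scores = []
    for i in range(len(X)):
        w, b = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        scores.append(predict_proba(w, b, X[i]))
    return scores

def auc(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy cohort: one standardized feature, 4 controls (0) and 4 cases (1).
X = [[-2.1], [-1.9], [-1.6], [-1.4], [1.4], [1.6], [1.9], [2.1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

scores = loocv_scores(X, y)
loocv_auc = auc(scores, y)
```

Because every score comes from a model that never saw the withheld subject, the resulting AUC is less optimistic than one computed on the data used for fitting. Candidate classifiers could then be compared on this value, as the text describes.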

At the validation stage, the performance of the optimal classifier was assessed separately in three validation cohorts. The internal validation cohort comprised samples collected from SMU, and the cohorts comprising samples from SYSU and Cangzhou People’s Hospital served as external validation.