Study participants

The study was approved by the Internal Ethics Committee of Nanfang Hospital, Southern Medical University (NFEC-2017-049), and all women provided written informed consent for the use of their data in ongoing research before the blood draw. This was a nested case-control study. Between January 1, 2016, and August 31, 2019, 3600 women with naturally conceived singleton pregnancies at 12+0 to 27+6 gestational weeks were recruited from three independent hospitals: Nanfang Hospital of Southern Medical University (SMU), the Third Affiliated Hospital of Sun Yat-sen University (SYSU), and Cangzhou People’s Hospital. All plasma samples underwent routine noninvasive prenatal testing (NIPT).

Inclusion and exclusion criteria

Based on pregnancy outcomes and neonatal birth weight, pregnancies were classified into macrosomia and control groups. The exclusion criteria were as follows: 1) gestational age at blood collection of less than 12 weeks or more than 28 weeks; 2) maternal overweight or obesity (pre-pregnancy BMI over 25 kg/m²); 3) multiple pregnancy; 4) singleton pregnancy with positive results on NIPT and ultrasound scans; 5) premature delivery; 6) birth weight below 2500 g; and 7) loss to follow-up. Pregnancies meeting any of these criteria were excluded. Macrosomia cases and controls were identified by retrospectively analyzing the participant follow-up results, including pregnancy outcomes and neonatal birth weight. Each macrosomia case was matched to four randomly selected controls with the same gestational week at blood collection and the same fetal sex.
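For readers reimplementing the cohort selection, the seven exclusion criteria reduce to a single per-record predicate. The Python sketch below is illustrative only: the `Pregnancy` record and its field names are hypothetical, gestational age is assumed to be recorded in decimal weeks, criterion 4 is read as a positive result on either NIPT or ultrasound (replace `or` with `and` if both were required), and premature delivery is assumed to mean delivery before 37 weeks, the conventional cut-off, which the text does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pregnancy:
    """Hypothetical follow-up record; field names are illustrative."""
    ga_weeks_at_draw: float                # gestational age at blood draw (decimal weeks)
    prepregnancy_bmi: float                # kg/m^2
    n_fetuses: int
    nipt_positive: bool
    ultrasound_positive: bool
    ga_weeks_at_delivery: Optional[float]  # None if lost to follow-up
    birth_weight_g: Optional[float]        # None if lost to follow-up


def is_excluded(p: Pregnancy) -> bool:
    """Return True if the record meets any of the seven exclusion criteria."""
    lost = p.ga_weeks_at_delivery is None or p.birth_weight_g is None
    return any([
        p.ga_weeks_at_draw < 12 or p.ga_weeks_at_draw > 28,   # 1) draw outside 12-28 weeks
        p.prepregnancy_bmi > 25,                               # 2) overweight or obese
        p.n_fetuses > 1,                                       # 3) multiple pregnancy
        p.nipt_positive or p.ultrasound_positive,              # 4) assumed: either test positive
        not lost and p.ga_weeks_at_delivery < 37,              # 5) assumed: premature = <37 weeks
        not lost and p.birth_weight_g < 2500,                  # 6) birth weight below 2500 g
        lost,                                                  # 7) lost to follow-up
    ])
```

Eligible records are then simply `[p for p in records if not is_excluded(p)]`; the `not lost and ...` guards keep criteria 5 and 6 from comparing against missing outcomes.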

In total, 810 NIPT samples (162 macrosomia cases and 648 controls) were selected for further evaluation. Macrosomia was defined as a birth weight above 4000 g.
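The case definition and the 1:4 matching can be sketched as follows. This is a minimal illustration, not the authors' code: records are plain dicts with hypothetical keys (`id`, `ga_week`, `sex`, `birth_weight_g`), gestational week is assumed to be an integer (completed weeks) so that it can serve as an exact matching key, and controls are drawn without replacement so that no control is shared between cases.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

MACROSOMIA_THRESHOLD_G = 4000  # macrosomia: birth weight above 4000 g


def split_by_outcome(records: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Split eligible records into macrosomia cases and candidate controls."""
    cases = [r for r in records if r["birth_weight_g"] > MACROSOMIA_THRESHOLD_G]
    controls = [r for r in records if r["birth_weight_g"] <= MACROSOMIA_THRESHOLD_G]
    return cases, controls


def match_controls(cases: List[dict], pool: List[dict], k: int = 4,
                   seed: int = 0) -> Dict[int, List[int]]:
    """Match each case to k random controls with the same (GA week, fetal sex)."""
    rng = random.Random(seed)
    strata: Dict[tuple, List[dict]] = defaultdict(list)
    for c in pool:
        strata[(c["ga_week"], c["sex"])].append(c)
    matches = {}
    for case in cases:
        key = (case["ga_week"], case["sex"])
        candidates = strata[key]
        if len(candidates) < k:
            raise ValueError(f"only {len(candidates)} controls left in stratum {key}")
        chosen = rng.sample(candidates, k)
        for c in chosen:  # draw without replacement across cases
            candidates.remove(c)
        matches[case["id"]] = [c["id"] for c in chosen]
    return matches
```

Because every case receives four distinct controls, the sizes stay in the 162:648 ratio reported above; a stratum with fewer than four eligible controls raises an error rather than silently returning a smaller set.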

Sample processing and next-generation sequencing

Because pre-analytical factors can significantly affect cfDNA analysis [21, 22], we formulated quality-control protocols to ensure that samples from different hospitals were not influenced by such factors (e.g., storage temperature and time before processing). Peripheral blood samples were collected from each participant in cfDNA BCT tubes and centrifuged at 1600×g for 10 minutes to collect the plasma. The plasma was then centrifuged at 16,000×g for 10 minutes to remove residual cellular debris. All plasma samples were stored frozen at ≤ −80 °C, and cfDNA was extracted from the frozen samples using the QIAamp DNA Blood Mini Kit (Qiagen) according to the manufacturer’s instructions.

DNA libraries were constructed from 40.5 μL of extracted DNA using TruSeq DNA Sample Prep reagents (Illumina, Paris). Library concentrations were measured with a Qubit fluorometer (Life Technologies), and DNA integrity was verified using an Agilent Bioanalyzer 2100 (Agilent Technologies). Purified libraries from twelve individual samples were pooled, and massively parallel sequencing was performed on the Ion Proton (Life Technologies) or NextSeq 500 (Illumina) sequencing platform. Sequencing was performed at an average depth of 0.3× [23].
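As a rough guide to what 0.3× average coverage means in read counts, the Lander-Waterman relation C = N × L / G can be solved for the number of reads N. The sketch assumes a haploid genome size of about 3.1 Gb and a 75-bp read length; neither value is given in the text, and read lengths differ between the Ion Proton and NextSeq 500 platforms.

```python
GENOME_SIZE_BP = 3.1e9  # assumed approximate haploid human genome size


def reads_for_coverage(coverage: float, read_length_bp: float,
                       genome_size_bp: float = GENOME_SIZE_BP) -> float:
    """Solve the Lander-Waterman relation C = N * L / G for the read count N."""
    return coverage * genome_size_bp / read_length_bp


per_sample = reads_for_coverage(0.3, read_length_bp=75)  # 75 bp is a hypothetical read length
per_pool = 12 * per_sample                                # twelve libraries per pool
print(f"{per_sample / 1e6:.1f} M reads per sample, {per_pool / 1e6:.1f} M per pool")
```

Under these assumptions, 0.3× corresponds to about 12.4 million mapped reads per sample, or roughly 149 million reads for a pool of twelve libraries; shorter reads would require proportionally more.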

Sequence analysis

After removal of low-quality reads, sequencing reads were aligned to the hg19 human reference genome using BWA-MEM [24], and PCR duplicates were removed using SAMtools (ver. 1.2) [25]. Read counts were calculated with BEDTools (ver. 2.17.0) for the region from −1000 bp to +1000 bp around each transcription start site (TSS), defined as the primary transcription start site (pTSS), and then normalized using the following formula:

$$\mathrm{Normalized\ pTSS}=\frac{\mathrm{Reads\ at\ pTSS}}{\mathrm{Total\ mapped\ reads}\times\mathrm{Length\ of\ pTSS}\ (2\ \mathrm{kb})}$$
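The window definition and the normalization can be sketched in a few lines of Python. In the actual pipeline the counting was done with BEDTools; here, as a simplification, reads are half-open `(start, end)` intervals on a single chromosome, and a read counts toward a pTSS if it overlaps the window by at least 1 bp. Function names are hypothetical, and the pTSS length is expressed in kb (2.0), following the formula.

```python
from typing import List, Tuple

FLANK_BP = 1000  # pTSS window: TSS - 1000 bp to TSS + 1000 bp


def ptss_window(tss: int, flank: int = FLANK_BP) -> Tuple[int, int]:
    """Half-open [start, end) window of 2 * flank bp centred on a 0-based TSS."""
    if tss < flank:
        raise ValueError("TSS too close to the chromosome start for a full window")
    return tss - flank, tss + flank


def count_overlapping(reads: List[Tuple[int, int]], window: Tuple[int, int]) -> int:
    """Count half-open reads (start, end) that overlap the window by >= 1 bp."""
    w_start, w_end = window
    return sum(1 for r_start, r_end in reads if r_start < w_end and r_end > w_start)


def normalized_ptss(reads_at_ptss: int, total_mapped_reads: int,
                    length_kb: float = 2.0) -> float:
    """Normalized pTSS = reads at pTSS / (total mapped reads x pTSS length in kb)."""
    return reads_at_ptss / (total_mapped_reads * length_kb)
```

Because every pTSS has the same 2-kb length, the length term rescales all values by the same constant; the division by total mapped reads is what makes samples sequenced to different depths comparable.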


Prediction model construction and validation

To obtain effective classifiers for predicting pregnancies with fetal macrosomia, we designed a three-stage workflow: identification of genes with differential promoter profiles (discovery stage), construction of classifiers (training stage), and evaluation of classifier performance (validation stage) (Fig. 1).

Fig. 1 Study design flowchart for obtaining the macrosomia classifiers. (a) Samples collected from SMU. (b) Samples collected from SYSU. (c) Samples collected from Cangzhou People’s Hospital.

At the discovery stage, we first sequenced cfDNA from maternal plasma samples collected from 47 macrosomia cases and 47 gestational age-matched controls and compared coverage at the pTSSs between the two groups. P-values were calculated using the Wilcoxon rank-sum test and adjusted to false discovery rates (FDRs) in R. pTSSs with a fold change ≥ 1.3 and FDR ≤ 0.05 were considered significantly changed.
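The discovery-stage screen (rank-sum test, FDR adjustment, fold-change filter) is sketched below in dependency-free Python. Several details are assumptions because the text does not specify them: the test uses the normal approximation with tie and continuity corrections (R's `wilcox.test` uses an exact test for groups of this size when there are no ties, so p-values can differ slightly), FDR is computed with the Benjamini-Hochberg procedure (as in R's `p.adjust(method = "BH")`), fold change is the ratio of group means, and a pTSS passes if it changes at least 1.3-fold in either direction.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence


def rank_sum_pvalue(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided Wilcoxon rank-sum p-value, normal approximation.

    Uses average ranks for ties, a tie-corrected variance and a 0.5
    continuity correction.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tied block
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return 1.0
    z = max(abs(u1 - n1 * n2 / 2) - 0.5, 0.0) / sigma
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (step-up, capped at 1)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # 1-based rank in ascending p-value order
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def screen_ptss(cases: Dict[str, List[float]], controls: Dict[str, List[float]],
                min_fc: float = 1.3, max_fdr: float = 0.05) -> List[str]:
    """Return pTSS ids with FDR <= max_fdr and a >= min_fc change in either direction."""
    ids = sorted(cases)
    fdrs = benjamini_hochberg([rank_sum_pvalue(cases[g], controls[g]) for g in ids])
    hits = []
    for g, q in zip(ids, fdrs):
        m_case, m_ctrl = mean(cases[g]), mean(controls[g])
        if m_case == 0 or m_ctrl == 0:
            continue  # fold change undefined
        fc = m_case / m_ctrl
        if q <= max_fdr and max(fc, 1 / fc) >= min_fc:
            hits.append(g)
    return hits
```

Applying the FDR threshold and the fold-change threshold together filters out pTSSs that differ consistently but only by a negligible amount, as well as large but noisy differences.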

At the training stage, two machine learning models, support vector machine (SVM) and logistic regression (LR), were used to develop promoter profiling-based classifiers to distinguish macrosomia cases from controls. To develop the classifiers, a stepwise method was used to identify promoter combinations among genes showing differential coverage at the pTSSs. The robustness of the classifiers was assessed using leave-one-out cross-validation (LOOCV) [20, 23]. In brief, each subject in the training cohort was withheld in turn, and the model was trained on all remaining subjects. The trained model was then used to predict the class (macrosomia or control) of the withheld subject. This procedure was repeated until every subject in the training cohort had been classified. The performance of each classifier was evaluated by receiver operating characteristic (ROC) analysis, including the area under the curve (AUC), accuracy, sensitivity, and specificity. The classifier with the highest AUC in the training cohort was considered the optimal classifier.
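A compact sketch of LOOCV combined with stepwise feature selection follows. It is an illustration under stated assumptions, not the authors' implementation: logistic regression is fitted by plain batch gradient descent (an SVM could be substituted inside `loocv_scores`), forward selection is assumed as the stepwise variant (the text does not say forward or backward), features are added only while the LOOCV AUC improves, and AUC is computed as the Mann-Whitney probability that a case scores above a control. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: List[float], x: Sequence[float]) -> float:
    """Probability of class 1 (macrosomia) under weights [bias, w1, ..., wp]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X: List[List[float]], y: List[int], lr: float = 0.1,
                 epochs: int = 500) -> List[float]:
    """Batch gradient-descent logistic regression; returns [bias, w1, ..., wp]."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def loocv_scores(X: List[List[float]], y: List[int]) -> List[float]:
    """Withhold each subject in turn, train on the rest, score the withheld one."""
    scores = []
    for i in range(len(X)):
        w = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        scores.append(predict_proba(w, X[i]))
    return scores


def roc_auc(y: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(case score > control score), counting ties as 1/2."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def forward_select(X: List[List[float]], y: List[int],
                   max_features: int = 3) -> Tuple[List[int], float]:
    """Greedily add the feature that most improves LOOCV AUC; stop when none does."""
    selected: List[int] = []
    best_auc = 0.0
    while len(selected) < max_features:
        best = None
        for j in range(len(X[0])):
            if j in selected:
                continue
            cols = selected + [j]
            auc = roc_auc(y, loocv_scores([[row[c] for c in cols] for row in X], y))
            if best is None or auc > best[0]:
                best = (auc, j)
        if best is None or best[0] <= best_auc:
            break
        best_auc = best[0]
        selected.append(best[1])
    return selected, best_auc
```

Scoring candidate combinations by LOOCV AUC rather than training-set AUC reduces (but does not eliminate) optimism from overfitting; the cost is n model fits per candidate, which grows quickly with the number of differential pTSSs.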

At the validation stage, the performance of the optimal classifier was assessed separately in three validation cohorts. The internal validation cohort comprised samples collected from SMU, and the cohorts of samples collected from SYSU and Cangzhou People’s Hospital served as external validation cohorts.
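At validation, the optimal classifier and its decision threshold are fixed and applied, without refitting, to each cohort separately. The sketch below computes the ROC metrics per cohort from predicted scores; the cohort labels mirror the text, while the 0.5 threshold and the function names are assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(case score > control score), counting ties as 1/2."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def evaluate(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    """ROC metrics at a fixed decision threshold (1 = macrosomia, 0 = control)."""
    tp = sum(1 for t, s in zip(labels, scores) if t == 1 and s >= threshold)
    fn = sum(1 for t, s in zip(labels, scores) if t == 1 and s < threshold)
    tn = sum(1 for t, s in zip(labels, scores) if t == 0 and s < threshold)
    fp = sum(1 for t, s in zip(labels, scores) if t == 0 and s >= threshold)
    return {
        "AUC": roc_auc(labels, scores),
        "accuracy": (tp + tn) / len(labels),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def evaluate_cohorts(cohorts: Dict[str, Tuple[List[int], List[float]]],
                     threshold: float = 0.5) -> Dict[str, Dict[str, float]]:
    """Score each validation cohort separately with the same frozen threshold."""
    return {name: evaluate(y, s, threshold) for name, (y, s) in cohorts.items()}
```

A call might look like `evaluate_cohorts({"SMU (internal)": (y_smu, s_smu), "SYSU (external)": (y_sysu, s_sysu), "Cangzhou (external)": (y_cz, s_cz)})`, where the label and score lists are placeholders. Keeping the threshold fixed from the training stage, rather than re-tuning it per cohort, is what makes the external cohorts an independent check of the classifier.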

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

