The study protocol was approved by the Medical Ethics Committee of Zhengzhou University, and the need for informed consent was waived.
Clinical, pathological and CT imaging data were retrospectively collected for AEG patients who underwent surgical resection after preoperative NAC at the First Affiliated Hospital of Zhengzhou University and Henan Provincial Cancer Hospital between November 2014 and September 2020. The inclusion criteria were as follows: (1) AEG confirmed by gastroscopic biopsy pathology prior to treatment; (2) pre-NAC clinical stage of cT2–4N0–3M0; (3) 2–6 cycles of NAC; (4) no other antitumor therapy administered before NAC; (5) contrast-enhanced CT scan obtained within 1 week prior to NAC with complete imaging data; (6) lesion visible on at least 3 axial CT slices with a maximum in-plane diameter of at least 2 cm; and (7) radical resection performed after NAC with complete postoperative pathology data. The exclusion criteria were as follows: (1) history of other malignancies; (2) poor CT image quality or lack of raw DICOM data; (3) adverse event during NAC or fewer than 2 cycles of NAC; (4) dysfunction of the heart, lungs or other vital organs that precluded CT examination; and (5) incomplete CT imaging, clinical or pathological data.
Patients with AEG from the First Affiliated Hospital of Zhengzhou University formed the training group (n = 60), and patients with AEG from Cancer Hospital of Zhengzhou University formed the external validation group (n = 32).
The chemotherapy regimens were as follows: (1) XELOX regimen: oxaliplatin 130 mg/m² given intravenously over 2 h on Day 1 plus oral capecitabine 1000 mg/m² twice daily on Days 1–14, repeated every 3 weeks; (2) SOX regimen: oxaliplatin 130 mg/m² intravenously plus oral capsules 80 mg/m² twice daily for 14 days. Patients and families signed informed consent forms. The treatment consisted of 2 to 6 cycles unless disease progression, intolerable toxicity, or death occurred. All patients underwent radical surgical resection within 1 week of the end of NAC treatment.
The clinical data collected in our study included age, sex, carcinoembryonic antigen level (CEA, normal range 0–5 ng/mL), carbohydrate antigen 199 level (CA199, normal range 0.01–37 U/mL), carbohydrate antigen 125 level (CA125, normal range 0.01–35 U/mL), and serum albumin level (normal range 35–55 g/L). TNM staging was performed on CT images according to the 8th edition of the American Joint Committee on Cancer (AJCC) staging system. The Borrmann type of each AEG was documented. The postoperative pathological TRG was recorded as the criterion for evaluating the efficacy of NAC. TRG grade 0 is defined as complete response, with no viable cancer cells remaining in the primary lesion and lymph nodes; TRG grade 1 is defined as moderate response, with single cancer cells or small clusters of cancer cells remaining in the lesion; TRG grade 2 is defined as mild response, with significant disappearance of cancer cells under the microscope, with some cancer cells remaining but less interstitial fibrosis; and TRG grade 3 is defined as poor response, with no significant disappearance of cancer cells under the microscope or only a few cancer cells remaining. Based on the postoperative pathological TRG evaluation, patients were divided into the pCR group (tumor regression grade [TRG] = grade 0) and the nonpCR group (TRG = grade 1–3). In the training group, there were 19 patients in the pCR group and 41 patients in the nonpCR group. In the external validation group, there were four patients in the pCR group and 28 patients in the nonpCR group. The clinical characteristics of the enrolled patients are summarized in Table 1.
CT image acquisition
All patients underwent contrast-enhanced CT scans, and informed consent forms were signed before the examination. The CT scans were acquired with a 64-row CT scanner (Discovery CT 750 HD, GE Healthcare, Waukesha, WI, United States) or a 256-row CT scanner (Revolution CT, GE Healthcare, Waukesha, WI, United States). Preparation for the examination was as follows: patients fasted for more than 8 h before the examination and were given an intramuscular injection of scopolamine 10–20 mg (Hangzhou Minsheng Pharmaceutical Group Co., Ltd.; specification: 10 mg/mL) 15–20 min before the examination to reduce gastrointestinal motility, and breath-holding exercises were performed. The patients also drank 800–1000 mL of warm water 10–15 min before the examination. The scanning parameters were as follows: tube voltage of 120 kV; tube current set by automatic milliampere-second technology; pitch of 1.375/1.1; field of view (FOV) of 500 mm; matrix of 512 × 512; and slice thickness of 0.625–5 mm with slice spacing of 0.625–5 mm. The scan area encompassed at least the lower esophagus to the lower border of both kidneys. For the enhanced scan, 90–100 mL of nonionic contrast agent (iopromide, 370 mg/mL, GE Medical Systems; 1.5 mL/kg at 3 mL/s) was injected through the antecubital vein using a high-pressure injector. Using the low-dose trigger technique, arterial phase images were collected 10 s after the descending aorta reached 100 HU following contrast injection, and venous phase images were collected after an interval of 30 s.
Image processing and ROI segmentation
The CT images in the arterial and venous phases were isotropically resampled to a voxel size of 1 × 1 × 1 mm using trilinear interpolation in Artificial Intelligence Kit software (A.K., version 3.3.0.R, GE Healthcare, USA) to minimize the effect of different scanning protocols or equipment on the radiomics features. Region of interest (ROI) segmentation was performed by delineating the tumor outline on the slice with the largest cross-sectional area in the CT axial plane (Fig. 1). Care was taken to avoid the gastric cavity and stomach contents, the fatty tissue around the stomach wall, and blood vessels when segmenting. Each ROI was outlined by a radiologist (L.C., 6 years of experience in abdominal imaging diagnosis) and supervised by a radiologist (Z.H., 8 years of experience in abdominal imaging diagnosis). To ensure the reliability and reproducibility of the radiomics features, 30 patients were randomly selected for repeated segmentation. For the analysis of interobserver agreement, a radiologist (L. CC) conducted the first whole-dataset segmentation, and another radiologist (H.W., 7 years of experience in abdominal imaging diagnosis), supervised by a radiologist (L.L., 9 years of experience in abdominal imaging diagnosis), delineated the images of the 30 selected patients during the same period. For the analysis of intraobserver agreement, the radiologist (L. CC) repeated the segmentation 1 month after the first delineation.
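The resampling step can be illustrated with a small pure-Python sketch of trilinear interpolation onto a 1 mm grid. This is not the A.K. implementation: the function names, the nested-list volume layout and the grid-alignment convention (output voxel 0 coincides with input voxel 0, edges clamped) are assumptions made for illustration only.

```python
import math

def lerp(a, b, t):
    """Linear interpolation between a and b with weight t in [0, 1]."""
    return a + (b - a) * t

def trilinear_sample(vol, z, y, x):
    """Interpolate vol (nested lists indexed [z][y][x]) at a fractional index."""
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])
    # Clamp to the valid index range so edge voxels are replicated.
    z = min(max(z, 0.0), nz - 1.0)
    y = min(max(y, 0.0), ny - 1.0)
    x = min(max(x, 0.0), nx - 1.0)
    z0, y0, x0 = int(math.floor(z)), int(math.floor(y)), int(math.floor(x))
    z1, y1, x1 = min(z0 + 1, nz - 1), min(y0 + 1, ny - 1), min(x0 + 1, nx - 1)
    fz, fy, fx = z - z0, y - y0, x - x0
    # Interpolate along x, then y, then z.
    c00 = lerp(vol[z0][y0][x0], vol[z0][y0][x1], fx)
    c01 = lerp(vol[z0][y1][x0], vol[z0][y1][x1], fx)
    c10 = lerp(vol[z1][y0][x0], vol[z1][y0][x1], fx)
    c11 = lerp(vol[z1][y1][x0], vol[z1][y1][x1], fx)
    return lerp(lerp(c00, c01, fy), lerp(c10, c11, fy), fz)

def resample_isotropic(vol, spacing, new_spacing=1.0):
    """Resample vol with voxel spacing (sz, sy, sx) in mm to an isotropic grid."""
    dims = (len(vol), len(vol[0]), len(vol[0][0]))
    # Cover the physical extent between the first and last voxel centres.
    out_dims = [int(math.floor((n - 1) * s / new_spacing)) + 1
                for n, s in zip(dims, spacing)]
    out = []
    for k in range(out_dims[0]):
        z = k * new_spacing / spacing[0]
        plane = []
        for j in range(out_dims[1]):
            y = j * new_spacing / spacing[1]
            plane.append([trilinear_sample(vol, z, y, i * new_spacing / spacing[2])
                          for i in range(out_dims[2])])
        out.append(plane)
    return out

# Toy volume: two 5 mm slices (values 0 and 10), 2 x 2 voxels of 1 mm in-plane.
toy = [[[0.0, 0.0], [0.0, 0.0]], [[10.0, 10.0], [10.0, 10.0]]]
resampled = resample_isotropic(toy, spacing=(5.0, 1.0, 1.0))
```

In this toy case the two 5 mm slices become six 1 mm slices whose values ramp linearly from 0 to 10, which is the behaviour that makes thick-slice and thin-slice acquisitions more comparable before feature extraction.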
Radiomics feature extraction
The radiomics features were automatically extracted using the Python package PyRadiomics. A total of 1409 radiomics features were extracted separately from the delineated ROI in the arterial and venous phases. There were 107 features extracted from the original images, including 32 first-order features (18 intensity statistics and 14 shape features) and 75 texture features. Among the 75 texture features, there were 24 gray-level cooccurrence matrix (GLCM), 16 gray-level run length matrix (GLRLM), 16 gray-level size zone matrix (GLSZM), 14 gray-level dependence matrix (GLDM) and 5 neighboring gray tone difference matrix (NGTDM) features. In addition, the same number of first-order intensity statistics and texture features were extracted from different transformed images. A total of 744 features were extracted from wavelet decomposition images with 8 filter channels, 279 features were extracted from Laplacian of Gaussian (LoG) transformed images (sigma values of 1.0 mm, 3.0 mm and 5.0 mm), and 279 features were extracted from local binary pattern (LBP)-filtered images (2nd-order spherical harmonic function, spherical neighborhood operator with radius 1.0 and fine fraction 1). The features were extracted after discretizing the CT values within the ROI using a fixed bin width of 25 HU. Then, the intra/interclass correlation coefficients (ICCs) were calculated from the features extracted for the 30 randomly selected patients. Features with both intra- and interobserver ICC values greater than 0.7 were retained, separately for the arterial and venous phases.
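The feature counts reported above can be checked with simple arithmetic. The sketch below uses only the numbers given in the text; the one assumption is that the LBP filter contributes three derived images, which is inferred from 279 / 93 = 3 rather than stated explicitly.

```python
# Feature counts per class, as stated in the text.
first_order = 18            # intensity statistics
shape = 14                  # shape features (original image only)
texture = {"GLCM": 24, "GLRLM": 16, "GLSZM": 16, "GLDM": 14, "NGTDM": 5}

texture_total = sum(texture.values())
# Shape features are not recomputed on transformed images.
per_image = first_order + texture_total
original = per_image + shape

filtered = {
    "wavelet": 8 * per_image,   # 8 filter channels
    "LoG": 3 * per_image,       # sigma = 1.0, 3.0 and 5.0 mm
    "LBP": 3 * per_image,       # assumed: 3 LBP-derived images (279 / 93)
}
total = original + sum(filtered.values())

print(texture_total, per_image, original, filtered, total)
```

The totals agree with the reported 75 texture features, 107 original-image features, 744 wavelet features, 279 LoG features, 279 LBP features and 1409 features per phase.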
Radiomics feature selection and model establishment
The training dataset was used for feature selection and modeling, and the same procedure and set of parameters were applied to the external dataset for model validation.
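One reading of "the same procedure and set of parameters" is that every data-dependent quantity, such as the median used to fill missing values and the mean and standard deviation used for z-score standardization, is estimated on the training group only and then reused unchanged on the external group. The sketch below illustrates that idea; the function names are hypothetical, and the use of the population standard deviation is an assumption.

```python
import statistics

def fit_preprocessor(train_rows):
    """Learn per-feature median, mean and SD from training rows (None = missing)."""
    params = []
    for col in zip(*train_rows):
        observed = [v for v in col if v is not None]
        median = statistics.median(observed)
        filled = [median if v is None else v for v in col]
        params.append((median, statistics.mean(filled), statistics.pstdev(filled)))
    return params

def apply_preprocessor(rows, params):
    """Impute and z-score rows using parameters learned on the training set."""
    out = []
    for row in rows:
        new_row = []
        for v, (median, mean, sd) in zip(row, params):
            v = median if v is None else v
            new_row.append((v - mean) / sd if sd > 0 else 0.0)
        out.append(new_row)
    return out

# Toy data: two features, one missing value in each set.
train = [[1.0, 10.0], [2.0, None], [3.0, 30.0]]
external = [[4.0, None]]

params = fit_preprocessor(train)               # estimated on training data only
z_train = apply_preprocessor(train, params)
z_external = apply_preprocessor(external, params)  # no re-estimation
```

Reusing the training parameters keeps the external group untouched by any fitting step, so its performance estimate is not influenced by its own distribution.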
The same method was used for feature preprocessing and feature screening in the arterial and venous phases to build independent arterial and venous radiomics models. Feature selection and modeling were performed as follows.
Outliers were processed by converting values greater than the third quartile + 2 × interquartile range to the 95th percentile and values less than the first quartile − 2 × interquartile range to the 10th percentile.
Features with a variance of less than 1.0 were excluded.
Missing data were filled with the median value, and z-score standardization was applied to normalize the data.
Redundant features were removed using correlation analysis with a cutoff value of 0.9.
Feature importance was ranked using gradient-boosted decision trees (GBDT), and features with an importance coefficient greater than one-third of the maximum importance coefficient were retained.
Radiomics models were established using the naïve Bayes classifier, and the predicted probability of the model output was used as the Radscore for each individual model.
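The filtering steps listed above can be sketched in a few lines of standard-library Python. This is a hedged illustration, not the authors' R code: the percentile interpolation method, the use of Pearson correlation, the rule of dropping the later feature of a correlated pair, and the importance values (which are made up rather than taken from a fitted GBDT) are all assumptions. Median imputation and z-score standardization are omitted here for brevity.

```python
import statistics

def cap_outliers(values):
    """Replace values beyond Q1 - 2*IQR / Q3 + 2*IQR with the 10th / 95th percentile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    pct = statistics.quantiles(values, n=100, method="inclusive")
    p10, p95 = pct[9], pct[94]
    iqr = q3 - q1
    low, high = q1 - 2 * iqr, q3 + 2 * iqr
    return [p95 if v > high else p10 if v < low else v for v in values]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def drop_low_variance(columns, threshold=1.0):
    """Keep features whose variance is at least the threshold."""
    return {k: v for k, v in columns.items() if statistics.pvariance(v) >= threshold}

def drop_correlated(columns, cutoff=0.9):
    """Keep a feature only if |r| <= cutoff with every feature already kept."""
    kept = {}
    for name, col in columns.items():
        if all(abs(pearson(col, other)) <= cutoff for other in kept.values()):
            kept[name] = col
    return kept

def select_by_importance(importance):
    """Keep features whose importance exceeds one-third of the maximum."""
    threshold = max(importance.values()) / 3
    return [name for name, score in importance.items() if score > threshold]

# Toy data (hypothetical feature names and values).
capped = cap_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
features = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10.5],            # almost collinear with "a"
    "c": [5, 1, 4, 2, 3],
    "d": [0.1, 0.2, 0.1, 0.2, 0.1],     # variance below 1.0
}
kept = drop_correlated(drop_low_variance(features))
selected = select_by_importance({"a": 0.6, "c": 0.15, "e": 0.25})
```

In the toy data, the extreme value 100 is pulled back to the 95th percentile, feature "d" is removed for low variance, feature "b" is removed as redundant with "a", and only features with importance above 0.2 (one-third of 0.6) survive the final step.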
Based on the above feature screening and modeling methods, our study developed two radiomics models based on pCR outcome: (1) an arterial radiomics model with Radscore_AP_pCR and (2) a venous radiomics model with Radscore_VP_pCR.
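To make the notion of a Radscore concrete, the following sketch fits a Gaussian naïve Bayes classifier and returns the posterior probability of pCR for a new case. The text does not state which naïve Bayes variant was used, so the Gaussian likelihood, the variance floor and the toy feature values are assumptions made purely for illustration.

```python
import math

def fit_gaussian_nb(X, y):
    """Estimate per-class prior, feature means and variances (y: 0 = nonpCR, 1 = pCR)."""
    model = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # A small variance floor avoids division by zero for constant features.
        variances = [max(sum((v - m) ** 2 for v in c) / len(c), 1e-9)
                     for c, m in zip(cols, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model

def radscore(model, x):
    """Posterior probability of class 1 (pCR), assuming independent features."""
    log_post = {}
    for label, (prior, means, variances) in model.items():
        total = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        log_post[label] = total
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_post.values())
    weights = {k: math.exp(v - top) for k, v in log_post.items()}
    return weights[1] / (weights[0] + weights[1])

# Toy standardized features (hypothetical values), three cases per class.
X = [[-1.0, -0.8], [-0.6, -1.2], [-0.9, -0.5],
     [1.0, 0.9], [0.7, 1.3], [1.2, 0.6]]
y = [0, 0, 0, 1, 1, 1]

model = fit_gaussian_nb(X, y)
score_high = radscore(model, [1.0, 1.0])
score_low = radscore(model, [-1.0, -1.0])
```

Because the output is a probability between 0 and 1, the same score can be fed into a second-stage classifier together with clinical factors or thresholded directly.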
Clinical feature selection and model establishment
The clinical features were screened using GBDT (selection of the top three features of importance) and univariate logistic regression (p < 0.1). Model building was performed using the naïve Bayes classifier, and the predicted probability of the model output was used as the model score of the clinical model. Based on the above feature screening and modeling methods, clinical models for predicting pCR (Score_clinic_pCR) were established.
Combined model establishment
Based on the established Radscores and clinical factors, radiomics–clinical combined models were developed using the naïve Bayes classifier. Four combined models were built, including the following: (1) arterial–venous combined model; (2) arterial–clinical combined model; (3) venous–clinical combined model; and (4) arterial–venous–clinical combined model.
Evaluation of model predictive performance
The performance of each model was evaluated using receiver operating characteristic (ROC) curve analysis to obtain the area under the ROC curve (AUC). The sensitivity, accuracy, negative predictive value and positive predictive value were calculated at the model score cutoff corresponding to the maximum Youden index to evaluate the discriminatory performance of the model. The cutoff value derived from the training group was applied to the validation group to obtain the corresponding discriminatory performance in the external validation group. The calibration of the model was tested using calibration curve analysis and the Hosmer–Lemeshow goodness-of-fit test (p > 0.05 indicates no significant difference between the predicted and actual values). DeLong's test was applied to compare the AUCs of different models (p < 0.05 indicates a significant difference). Decision curve analysis (DCA) was used to assess the net clinical benefit obtained by the model at different threshold probabilities.
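The cutoff selection described above can be sketched as follows: the AUC is computed from pairwise comparisons, the cutoff maximizing the Youden index (sensitivity + specificity − 1) is found on the training scores, and that fixed cutoff is then applied to the validation scores. The scores and labels are made-up toy values, the function names are hypothetical, and ties in the Youden index are resolved in favour of the lowest cutoff, which is an assumption.

```python
def auc(scores, labels):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def confusion(scores, labels, cutoff):
    """Return (tp, fp, fn, tn), calling a case positive when score >= cutoff."""
    tp = sum(1 for s, t in zip(scores, labels) if s >= cutoff and t == 1)
    fp = sum(1 for s, t in zip(scores, labels) if s >= cutoff and t == 0)
    fn = sum(1 for s, t in zip(scores, labels) if s < cutoff and t == 1)
    tn = sum(1 for s, t in zip(scores, labels) if s < cutoff and t == 0)
    return tp, fp, fn, tn

def youden_cutoff(scores, labels):
    """Return the cutoff with the maximum Youden index and that index."""
    best_cut, best_j = None, -1.0
    for cut in sorted(set(scores)):
        tp, fp, fn, tn = confusion(scores, labels, cut)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

def metrics(scores, labels, cutoff):
    """Discrimination metrics at a fixed cutoff."""
    tp, fp, fn, tn = confusion(scores, labels, cutoff)
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(labels),
        "ppv": tp / (tp + fp) if tp + fp else nan,
        "npv": tn / (tn + fn) if tn + fn else nan,
    }

# Toy model scores (hypothetical); 1 = pCR, 0 = nonpCR.
train_scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
train_labels = [0, 0, 0, 1, 0, 1, 1, 1]
val_scores = [0.3, 0.5, 0.45, 0.2]
val_labels = [0, 1, 0, 0]

train_auc = auc(train_scores, train_labels)
cutoff, j_index = youden_cutoff(train_scores, train_labels)
val_metrics = metrics(val_scores, val_labels, cutoff)  # training cutoff reused
```

Fixing the cutoff on the training group and reusing it on the validation group mirrors the procedure in the text and avoids tuning the threshold on the external data.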
Statistical analysis was performed using R software (version 3.6.3, http://www.r-project.org). Continuous variables were compared between two groups using independent samples t tests (for normally distributed data) or Mann–Whitney U tests (for non-normally distributed data). Categorical variables were compared using the chi-square test or Fisher’s exact test. A two-sided p value of < 0.05 was considered statistically significant. The following R packages were applied: “icc” for intra/interclass correlation coefficients; “glmnet” for logistic regression; “pROC” for ROC analysis; “rmda” for DCA; the calibration function in the “rms” package for calibration analysis; “gbm” for GBDT feature importance analysis; the “naiveBayes” function for the naïve Bayes classifier; the “adabag” package for the AdaBoost classifier; the “e1071” package for the SVM classifier; and the “rpart” package for the decision tree classifier.