Development data set
The development cohort was derived from the Chronic Kidney Disease Research of Outcomes in Treatment and Epidemiology (CKD-ROUTE) study, a prospective, observational cohort study in Japan. Written informed consent was obtained from all patients, as reported in the original publication. The CKD-ROUTE study was approved by the ethics committees of Tokyo Medical and Dental University, School of Medicine. All participants had stage G2–G5 CKD according to the Kidney Disease Improving Global Outcomes (KDIGO) classification and were not undergoing dialysis. Patients older than 20 years who newly visited nephrology centres from October 2010 to December 2011 were included. Patients were excluded if they had malignancy, transplantation or active gastrointestinal bleeding, or had not given written informed consent. Over 1000 participants were recruited at the Tokyo Medical and Dental University Hospital and its 15 affiliated hospitals. Participants visited the hospital every 6 months for assessment of their clinical status. The observation duration was 22.91 ± 14.60 months, with a range of 1 to 39 months. All CKD-ROUTE data were obtained from the Dryad data package of the original publication in the Dryad Digital Repository, a public resource that provides discoverable, freely reusable and citable data. For this analysis, no informed consent was required from CKD-ROUTE patients because all data were deidentified, and the Ethics Committee of the First Hospital of Foshan in China approved the study (approval number 64, 2020).
Validation data set
Validation comprised internal validation and external validation. The internal validation cohort was also derived from the CKD-ROUTE study. The external validation cohort came from retrospective data collected at the First Hospital of Foshan in China from January 2013 to December 2018. Patients with stage G2–G5 CKD hospitalized at the First Hospital of Foshan were included; those with malignancy, transplantation or active gastrointestinal bleeding, or without follow-up visits at our hospital, were excluded. In total, 297 patients were included. All patients at the First Hospital of Foshan also provided written informed consent.
Candidate predictor variables, measured at the time of enrolment, were selected on the basis of a literature search, previous studies and the data available in the CKD-ROUTE study. They included age, sex, body mass index (BMI), aetiology of CKD, blood pressure, albumin, haemoglobin, eGFR, dipstick proteinuria, case history, CKD stage and urinary occult blood. Biochemical variables were obtained by testing blood and urine samples. In all populations, eGFR was calculated using the modified three-variable Modification of Diet in Renal Disease equation developed by the Japanese Society of Nephrology: eGFR = 194 × (serum creatinine)^−1.094 × (age)^−0.287 (× 0.739 if female). CKD was classified according to the Kidney Disease Improving Global Outcomes (KDIGO) guideline, in which stages G2, G3a, G3b, G4 and G5 correspond to an eGFR (mL/min/1.73 m²) of 60–89, 45–59, 30–44, 15–29 and < 15, respectively. None of the patients were undergoing dialysis. Dipstick proteinuria was coded as −1 or 0 for negative or trace protein on the dipstick urine test at enrolment, and as 1 to 4 for the corresponding degree of proteinuria from 1 to 4.
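The eGFR calculation and staging described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names are hypothetical, serum creatinine is assumed to be in mg/dL and age in years (the text does not state units), the values −1.094 and −0.287 are read as exponents as in the Japanese Society of Nephrology equation, and the G1 and G3a/G3b labels follow the general KDIGO scheme rather than anything specific to this cohort.

```python
def egfr_japanese(serum_creatinine_mg_dl, age_years, female):
    """Estimate GFR (mL/min/1.73 m^2) with the three-variable Japanese equation.

    Assumes creatinine in mg/dL and age in years (units are an assumption).
    """
    egfr = 194.0 * serum_creatinine_mg_dl ** -1.094 * age_years ** -0.287
    # The equation applies a fixed correction factor for female patients.
    return egfr * 0.739 if female else egfr


def ckd_stage(egfr):
    """Map an eGFR value to a KDIGO GFR category."""
    if egfr >= 90:
        return "G1"  # outside the G2-G5 range studied here
    if egfr >= 60:
        return "G2"
    if egfr >= 45:
        return "G3a"
    if egfr >= 30:
        return "G3b"
    if egfr >= 15:
        return "G4"
    return "G5"


if __name__ == "__main__":
    # Hypothetical patient: 50-year-old man with serum creatinine 1.0 mg/dL.
    value = egfr_japanese(1.0, 50, female=False)
    print(round(value, 1), ckd_stage(value))
```

Keeping the staging separate from the equation makes it easy to swap in a different eGFR formula for another population without touching the category cut-offs.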
Consistent with the CKD-ROUTE study, the primary endpoint was CKD adverse outcomes, defined as > 50% eGFR loss, initiation of dialysis for end-stage renal disease (ESRD), cardiovascular events (CVEs) and all-cause death. CVEs included ischaemic heart disease, congestive heart failure, peripheral arterial disease and stroke.
The effective sample size is generally determined by the number of outcome events. According to established rules of thumb, at least 10 events should be available per candidate predictor variable before variable selection [17,18,19].
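The events-per-variable rule amounts to simple arithmetic, sketched below. The function names and the event count in the usage example are hypothetical and are not taken from the study; only the threshold of 10 events per candidate predictor comes from the text.

```python
def max_candidate_predictors(n_events, events_per_variable=10):
    """Largest number of candidate predictors supported by n_events."""
    return n_events // events_per_variable


def min_events_required(n_predictors, events_per_variable=10):
    """Smallest number of outcome events needed for n_predictors."""
    return n_predictors * events_per_variable


if __name__ == "__main__":
    # Hypothetical example: 125 outcome events in a development cohort.
    print(max_candidate_predictors(125))
    # Hypothetical example: 12 candidate predictors under consideration.
    print(min_events_required(12))
```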
The CKD-ROUTE dataset was randomly divided into two cohorts with R software (i386 3.5.3): a development dataset (70% of the total data) and an internal validation dataset (30% of the total data). The dataset from the First Hospital of Foshan was used for external validation. Baseline continuous characteristics of all datasets are presented as means ± standard deviations and were compared by the paired Student’s t-test if the data were normally distributed or by the paired rank-sum test if they were not. Categorical data are expressed as numbers and percentages and were compared between two groups by the paired chi-square test. All probabilities were two-tailed and the level of significance was 0.05.
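A random 70/30 split of this kind can be sketched as follows. The study performed the split in R; this Python version is only an illustration, and the function name, the seed value and the use of integer patient identifiers are assumptions.

```python
import random


def split_cohort(patient_ids, development_fraction=0.7, seed=2020):
    """Randomly split patient identifiers into development and validation sets.

    A fixed seed (an arbitrary, assumed value) makes the split reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * development_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Hypothetical cohort of 1000 patients identified by integers.
    development, validation = split_cohort(range(1000))
    print(len(development), len(validation))
```

Splitting on patient identifiers rather than on rows keeps all records of one patient in the same cohort if the data contain repeated visits.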
To develop the model, we first tested the associations between potential variables and CKD adverse outcomes using univariable and multivariable Cox proportional hazards models in SPSS version 22.0 (Chicago, IL, USA). P values < 0.05 were considered statistically significant. Next, a predictive nomogram was developed in R software from the variables selected in the Cox analysis.
The model was validated in three respects, each with a different method.
First, discrimination, the ability of the model to separate individuals who develop events from those who do not, was evaluated by the C-statistic in all three data sets. Discrimination was classified as perfect, good, moderate or poor if the C-statistic was 1, > 0.8, 0.6–0.8 or < 0.6, respectively. The C-statistics for eGFR as a single predictor were also calculated in the three data sets.
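To make the C-statistic concrete, the sketch below computes a concordance index for right-censored data and applies the interpretation thresholds given in the text. It assumes Harrell's definition, which is the usual choice for Cox models but is not named in the text; the function names are hypothetical, and tied event times are simply treated as not comparable.

```python
def harrell_c(times, events, risk_scores):
    """Concordance index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an event before the
    follow-up time of patient j. The pair is concordant when the patient
    with the earlier event also has the higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def interpret_c(c_statistic):
    """Apply the thresholds from the text: 1, > 0.8, 0.6-0.8, < 0.6."""
    if c_statistic >= 1.0:
        return "perfect"
    if c_statistic > 0.8:
        return "good"
    if c_statistic >= 0.6:
        return "moderate"
    return "poor"


if __name__ == "__main__":
    # Hypothetical follow-up times (months), event flags and risk scores.
    c = harrell_c([5, 12, 20, 30], [1, 1, 0, 1], [0.9, 0.4, 0.6, 0.1])
    print(round(c, 2), interpret_c(c))
```

The double loop is quadratic in the number of patients, which is adequate for cohorts of roughly a thousand patients but would need a faster algorithm for much larger data sets.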
Second, calibration (or goodness of fit) was assessed with calibration plots. Calibration was considered good if the calibration line relating the predicted probability to the observed outcome lay close to the ideal line (y = x).
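The points of a calibration plot can be sketched as below. This is a simplified illustration that treats the outcome as binary at a fixed time horizon and ignores censoring; calibration tools for survival models typically estimate the observed outcome with Kaplan-Meier methods instead. The function name and the grouping into equal-sized risk groups are assumptions.

```python
def calibration_points(predicted, observed, n_groups=5):
    """Return (mean predicted probability, observed event rate) per risk group.

    Patients are sorted by predicted probability and divided into n_groups
    groups of near-equal size. Points close to the line y = x indicate
    good calibration.
    """
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    points = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # can happen when there are fewer patients than groups
        mean_predicted = sum(p for p, _ in chunk) / len(chunk)
        observed_rate = sum(o for _, o in chunk) / len(chunk)
        points.append((mean_predicted, observed_rate))
    return points


if __name__ == "__main__":
    # Hypothetical predictions and binary outcomes for 20 patients.
    predicted = [0.1] * 10 + [0.9] * 10
    observed = [0] * 9 + [1] + [1] * 9 + [0]
    for mean_predicted, observed_rate in calibration_points(predicted, observed, 2):
        print(round(mean_predicted, 2), round(observed_rate, 2))
```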
Third, decision curve analysis (DCA) was used to assess the clinical value of our model and to visualize its potential net benefit. The decision curve of the model was compared with those of eGFR and of other variables used as predictors.
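The net benefit plotted in a decision curve follows a standard formula: the true-positive rate minus the false-positive rate weighted by the odds of the threshold probability. The sketch below evaluates it at a single threshold, assuming a binary outcome at a fixed time horizon without censoring; the function names and example values are hypothetical.

```python
def net_benefit(predicted, observed, threshold):
    """Net benefit of acting on patients whose predicted risk >= threshold.

    net benefit = TP/n - FP/n * threshold / (1 - threshold)
    """
    n = len(predicted)
    true_pos = sum(1 for p, o in zip(predicted, observed) if p >= threshold and o)
    false_pos = sum(1 for p, o in zip(predicted, observed) if p >= threshold and not o)
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


def net_benefit_treat_all(observed, threshold):
    """Reference strategy in which every patient is treated as high risk."""
    prevalence = sum(observed) / len(observed)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


if __name__ == "__main__":
    # Hypothetical predicted risks and binary outcomes for four patients.
    predicted = [0.9, 0.8, 0.3, 0.2]
    observed = [1, 0, 1, 0]
    for threshold in (0.25, 0.5):
        print(threshold,
              round(net_benefit(predicted, observed, threshold), 3),
              round(net_benefit_treat_all(observed, threshold), 3))
```

A decision curve repeats this calculation over a range of thresholds and plots the model against the treat-all strategy and the treat-none strategy, whose net benefit is always 0.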
All statistical analyses above were performed using R software.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.