Study population and design

The European Prospective Investigation into Cancer and Nutrition (EPIC) is a cohort study of 519,978 volunteers from 23 centres in 10 countries, who were recruited between 1992 and 2000. The design and methods of EPIC have been described in detail elsewhere [20]. The present analysis uses data from all EPIC centres in Denmark, Germany and Spain and from two centres in Italy (Florence and Varese). These cohorts mainly comprised volunteers from the general population aged 35 to 65 years. Exceptions are Spain, where participants were mostly blood donors, and Denmark, where the age range at enrolment was 50 to 65 years [20]. The analysis consisted of two steps. First, we used data from the full cohort to estimate individual BMI trajectories across age. These trajectories were then used to derive the predicted mean BMI between ages 20 and 50 years, which served as a measure of cumulative BMI during early to mid-adulthood for each participant. Second, we restricted the data to participants who developed colorectal or breast cancer during follow-up and performed a survival analysis to estimate the effects of cumulative BMI and cardiometabolic comorbidities on cancer survival (Fig. 1).

Fig. 1 Flow chart with number of excluded and eligible participants of the study population

Data collection

Each EPIC centre collected questionnaire data on lifestyle and health factors, as well as anthropometric measurements, at enrolment [20]. Up to three weight assessments were available for each participant: weight measured at enrolment, self-reported weight at follow-up (obtained on average 5 years after enrolment [21]), and self-reported weight at age 20, assessed retrospectively in the baseline questionnaire. All corresponding BMI values were calculated using the height measured at enrolment.
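A minimal sketch of how the up to three weight assessments could be turned into BMI values using the single height measured at enrolment. The function names and the `(age, weight)` input format are hypothetical; this illustrates the calculation, not the EPIC processing code.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def bmi_series(weights, height_at_enrolment_m):
    """Convert (age, weight_kg) pairs into (age, BMI) pairs.

    Every weight, whether measured at enrolment or self-reported for age 20
    or at follow-up, is divided by the square of the height measured at enrolment.
    """
    return [(age, bmi(w, height_at_enrolment_m)) for age, w in weights]


# Hypothetical participant: self-reported weight at age 20, measured weight
# at enrolment (age 48) and self-reported weight at follow-up (age 53).
series = bmi_series([(20, 62.0), (48, 75.0), (53, 78.0)], height_at_enrolment_m=1.70)
```

Because the same height is used for every assessment, differences between a participant's BMI values reflect weight changes only.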

In this analysis, we defined cardiometabolic disease (CMD) as the presence of one or more of the following: a self-reported history of type 2 diabetes (T2D) or cardiovascular disease (CVD) at recruitment into the EPIC cohort, or an incident T2D or non-fatal CVD event during follow-up between 1992 and 2007. Incident cases of T2D were ascertained and verified at each participating centre by a combination of self-report, linkage to primary-care and secondary-care registers, medication use (drug registers), hospital admissions and mortality data, and national diabetes and pharmaceutical registries [22]. Incident cardiovascular events included the following diagnoses according to the International Classification of Diseases, 10th revision (ICD-10): myocardial infarction (I21, I22), angina (I20) or other coronary heart disease (I23–I25), haemorrhagic stroke (I60–I61), ischaemic stroke (I63), unclassified stroke (I64) and other acute cerebrovascular events (I62, I65–I69, F01) [23]. First non-fatal coronary events were ascertained by different methods depending on the follow-up procedures of each centre, using active follow-up through questionnaires, linkage with morbidity and hospital registries, or both. Events were validated by retrieving and assessing medical records or hospital notes, contacting medical professionals, retrieving and assessing death certificates, or verbal autopsy [23].
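The CMD definition and the ICD-10 groupings above can be expressed as a small lookup. This is a hypothetical sketch, not the study's code. It assumes that three-character ICD-10 categories are sufficient, that incident events arrive as `(date, code)` pairs with the label `"T2D"` marking incident diabetes, and that only events before the cancer diagnosis count towards CMD. That cut-off is consistent with the requirement of CVD and T2D follow-up until diagnosis but is not stated explicitly.

```python
from datetime import date

# Three-character ICD-10 categories for the incident CVD groups listed in the text.
CVD_GROUPS = {
    "myocardial infarction": {"I21", "I22"},
    "angina": {"I20"},
    "other coronary heart disease": {"I23", "I24", "I25"},
    "haemorrhagic stroke": {"I60", "I61"},
    "ischaemic stroke": {"I63"},
    "unclassified stroke": {"I64"},
    "other acute cerebrovascular event": {"I62", "I65", "I66", "I67", "I68", "I69", "F01"},
}


def cvd_group(icd10_code: str):
    """Return the CVD group for an ICD-10 code such as 'I21.4', or None."""
    category = icd10_code.strip().upper().replace(".", "")[:3]
    for group, categories in CVD_GROUPS.items():
        if category in categories:
            return group
    return None


def has_cmd(baseline_t2d: bool, baseline_cvd: bool, incident_events, diagnosis_date: date) -> bool:
    """CMD: self-reported T2D or CVD at recruitment, or an incident T2D or CVD event
    dated before the cancer diagnosis (the 'before diagnosis' cut-off is an assumption).

    incident_events: iterable of (event_date, code), code being "T2D" or an ICD-10 code;
    events are assumed to be non-fatal and already validated.
    """
    if baseline_t2d or baseline_cvd:
        return True
    return any(
        event_date < diagnosis_date and (code == "T2D" or cvd_group(code) is not None)
        for event_date, code in incident_events
    )
```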

The EPIC cohort was followed up for cancer diagnoses through linkage with population-based cancer registries in Denmark, Italy and Spain, and through active follow-up in Germany. Cancer cases were identified using the International Classification of Diseases for Oncology, third edition (ICD-O-3), with the codes C50 for breast cancer and C18–C20 for colorectal cancer. Stage at diagnosis, as available from the different study centres, was harmonised into two categories: localised or advanced (regional and distant) tumours.
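The site definitions and the two-category stage harmonisation could be sketched as follows. The raw stage labels in `STAGE_MAP` are hypothetical placeholders: stage variables differed between centres and their actual coding is not given here, so a real mapping would be centre-specific.

```python
def cancer_site(topography: str):
    """Map an ICD-O-3 topography code (e.g. 'C50.4', 'C18.7') to the study sites."""
    category = topography.strip().upper().replace(".", "")[:3]
    if category == "C50":
        return "breast"
    if category in {"C18", "C19", "C20"}:
        return "colorectal"
    return None  # not a study site


# Hypothetical raw stage labels collapsed into the two harmonised categories.
STAGE_MAP = {
    "localised": "localised",
    "regional": "advanced",
    "distant": "advanced",
}


def harmonise_stage(raw_stage):
    """Return 'localised' or 'advanced'; None if the stage is missing or unrecognised."""
    if raw_stage is None:
        return None
    return STAGE_MAP.get(raw_stage.strip().lower())
```

Returning `None` for unrecognised labels keeps missing stage explicit, so these patients can be excluded or compared separately rather than silently assigned a category.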

All-cause mortality was ascertained by the study centres through record linkage with cancer registries, boards of health and death indices in Denmark, Italy and Spain, and through active follow-up (inquiries by mail/telephone, municipal registries/regional health departments, physicians/hospitals) in Germany. The data used in the present study include follow-up of study participants from baseline (1992–2000) until an end date ranging from December 2009 to December 2013 in countries with record linkage. In Germany, follow-up ended at the last known contact with study participants (December 2009).

Information on smoking status (never or ever), level of education (primary, secondary or tertiary) and average lifetime alcohol use (g/day) was retrieved from a standardised dataset of EPIC lifestyle questionnaires completed at enrolment [20]. For 13 cancer patients with missing data on lifetime alcohol use, we substituted alcohol intake at recruitment (g/day) from the EPIC dietary questionnaire. A variable for occupational and recreational physical activity was created by collapsing the summary physical activity index derived from the EPIC questions into two categories (inactive or active) [24].

Statistical analysis

Estimation of individual BMI-trajectories

In the first step of the analysis, we estimated individual BMI trajectories from the repeated BMI assessments of each participant in the full cohort using a growth curve model. We excluded participants who were younger than 20 years at recruitment, had fewer than two BMI assessments during follow-up, had extreme anthropometric values [25], or had no eligible BMI measurement left after excluding measurements taken in the year before a cancer diagnosis (Fig. 1).
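The exclusion rules for the trajectory model can be sketched as a filter over participants. This is a hypothetical illustration: `extreme_anthropometry` stands in for the criteria of reference [25], ages are in years, "the year before a diagnosis" is approximated as the age window [diagnosis age − 1, diagnosis age), and the criteria are applied in the order listed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Participant:
    age_at_recruitment: float
    measurements: List[Tuple[float, float]]      # (age in years, BMI in kg/m^2)
    extreme_anthropometry: bool = False           # hypothetical flag for the criteria in [25]
    age_at_cancer_diagnosis: Optional[float] = None


def eligible_measurements(p: Participant) -> List[Tuple[float, float]]:
    """Drop BMI measurements taken within the year before a cancer diagnosis."""
    if p.age_at_cancer_diagnosis is None:
        return list(p.measurements)
    d = p.age_at_cancer_diagnosis
    return [(age, b) for age, b in p.measurements if not (d - 1.0 <= age < d)]


def include_in_trajectory_model(p: Participant) -> bool:
    """Apply the exclusion criteria for the growth curve model."""
    if p.age_at_recruitment < 20:
        return False
    if len(p.measurements) < 2:
        return False
    if p.extreme_anthropometry:
        return False
    # At least one measurement must remain after the pre-diagnosis exclusion.
    return len(eligible_measurements(p)) > 0
```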

We used a nested linear mixed-effects model with a quadratic polynomial of age, specified as

$$\mathrm{BMI}_{ijk} = \beta_{0} + u_{0k} + v_{0jk} + \left(\beta_{1} + u_{1k} + v_{1jk}\right) \cdot \mathrm{Age}_{ijk} + \beta_{2} \cdot \mathrm{Age}_{ijk}^{2} + \epsilon_{ijk}$$

to model BMI measurement i of participant j from country k as a function of age, where $u_{0k}$ and $u_{1k}$ denote the country-level random intercept and slope, $v_{0jk}$ and $v_{1jk}$ the participant-level random intercept and slope nested within country, and $\epsilon_{ijk}$ the residual error. Separate models were fitted for men and women. The resulting individual quadratic functions of age were used to derive, for each participant, the BMI-related variables of cumulative exposure before cancer diagnosis. The predicted mean BMI was defined as the integral of the individual BMI trajectory between ages 20 and 50 years divided by the interval length of 30 years.
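Because each individual trajectory is quadratic in age, this integral has a closed form. Writing the individual coefficients as $b_{0j} = \beta_0 + u_{0k} + v_{0jk}$, $b_{1j} = \beta_1 + u_{1k} + v_{1jk}$ and $b_2 = \beta_2$, the mean over ages 20 to 50 equals $b_{0j} + 35\,b_{1j} + 1300\,b_2$. The sketch below evaluates it, assuming age enters the model in years without centring (the text does not say); the function names and example coefficients are hypothetical.

```python
def individual_coefficients(beta, u_country, v_participant):
    """Combine fixed effects (beta0, beta1, beta2) with country-level (u0, u1) and
    participant-level (v0, v1) random effects; the quadratic term is fixed only."""
    b0 = beta[0] + u_country[0] + v_participant[0]
    b1 = beta[1] + u_country[1] + v_participant[1]
    b2 = beta[2]
    return b0, b1, b2


def mean_bmi(b0, b1, b2, start=20.0, end=50.0):
    """Exact average of b0 + b1*age + b2*age^2 over [start, end]:
    the integral of the trajectory divided by the interval length."""
    integral = (b0 * (end - start)
                + b1 * (end ** 2 - start ** 2) / 2
                + b2 * (end ** 3 - start ** 3) / 3)
    return integral / (end - start)


coefs = individual_coefficients((20.0, 0.1, -0.001), (0.5, 0.01), (-1.0, 0.02))
```

An analytic integral avoids numerical quadrature entirely; for a quadratic, Simpson's rule on [20, 50] gives the same value, which is a convenient check.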

Survival analysis

In the second step, only participants with incident cancers of the colorectum or the female breast were included in the survival analysis. Further inclusion criteria were cancer diagnosis at age 50 years or older, non-missing information on vital status, non-zero follow-up time, CVD and T2D follow-up until cancer diagnosis, and available information on stage at diagnosis and the other adjustment variables (Fig. 1).
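The inclusion criteria for the survival analysis can be read as a single predicate. This is a hypothetical sketch: the record layout and field names are assumptions, and `covariates_complete` stands for non-missing smoking status, physical activity, alcohol consumption and education.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CancerPatient:
    site: str                         # "colorectal" or "breast"
    sex: str                          # "female" or "male"
    age_at_diagnosis: float           # years
    vital_status: Optional[str]       # "alive", "dead", or None if missing
    follow_up_years: Optional[float]  # time from diagnosis to death or censoring
    cmd_followed_to_diagnosis: bool   # hypothetical flag: CVD and T2D follow-up reaches diagnosis
    stage: Optional[str]              # "localised", "advanced", or None if missing
    covariates_complete: bool         # all adjustment variables present


def include_in_survival_analysis(p: CancerPatient) -> bool:
    """Colorectal cancer (either sex) or female breast cancer, plus the remaining criteria."""
    site_ok = p.site == "colorectal" or (p.site == "breast" and p.sex == "female")
    return (
        site_ok
        and p.age_at_diagnosis >= 50
        and p.vital_status is not None
        and p.follow_up_years is not None
        and p.follow_up_years > 0
        and p.cmd_followed_to_diagnosis
        and p.stage is not None
        and p.covariates_complete
    )
```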

Cox proportional hazards regression was used to estimate hazard ratios (HRs) and 95% confidence intervals (CIs) for all-cause mortality in breast and colorectal cancer patients, with years since diagnosis as the time scale.
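With years since diagnosis as the time scale, each patient contributes a time and an event indicator. The sketch below is a hypothetical illustration: the centre-specific end of follow-up (between December 2009 and December 2013 for record-linkage countries, the last known contact in Germany) is passed in as a date, deaths after that date are treated as censored, and a 365.25-day year is an assumed convention.

```python
from datetime import date
from typing import Optional, Tuple


def survival_time(diagnosis: date, death: Optional[date], end_of_follow_up: date) -> Tuple[float, int]:
    """Return (years since diagnosis, event), where event = 1 for death on or before
    the end of follow-up and 0 for censoring at the end of follow-up."""
    if death is not None and death <= end_of_follow_up:
        stop, event = death, 1
    else:
        stop, event = end_of_follow_up, 0
    return (stop - diagnosis).days / 365.25, event
```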

Cox analyses were performed for mean BMI and CMD separately and for both exposures combined, which allowed a qualitative evaluation of the dependence between the two variables. The models were stratified by age at cancer diagnosis (50–69 years and 70 years or older), country and, for colorectal cancer, sex. Models were adjusted for smoking status, physical activity, alcohol consumption and educational level at recruitment. Subgroup analyses by stage at diagnosis were performed to investigate whether effects differed between patients with localised and advanced disease. The proportional hazards assumption was assessed using the Grambsch-Therneau test. Likelihood ratio tests were used to assess whether model fit improved when CMD was added to a model containing BMI or, vice versa, when BMI was added to a model containing CMD. To analyse potential non-linear effects of mean BMI, we repeated the Cox analyses with the same adjustment factors (including CMD), modelling mean BMI with penalised B-splines with four degrees of freedom. These models were compared with models assuming a linear effect of mean BMI on the log hazard using likelihood ratio tests. To explore potential bias arising from the missing-data mechanism, we compared patients with missing information on stage at diagnosis with patients diagnosed with localised or advanced disease with respect to patient characteristics and survival based on Kaplan–Meier curves.
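As an illustration of the likelihood ratio tests, the sketch computes the statistic $2(\ell_{\text{full}} - \ell_{\text{reduced}})$ for two nested models differing by one parameter, for example adding CMD to a model with linear mean BMI, assuming CMD is coded as a single yes/no indicator. The log-likelihoods would come from the fitted Cox models (in R, the `loglik` component of a `coxph` fit); the values in the usage line are made up. For one degree of freedom, the chi-square tail probability equals $\operatorname{erfc}(\sqrt{x/2})$.

```python
import math


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float, df: int = 1):
    """Return the LR statistic and its chi-square p-value (df = 1 only in this sketch)."""
    if df != 1:
        raise NotImplementedError("this sketch only handles one degree of freedom")
    stat = 2.0 * (loglik_full - loglik_reduced)
    # P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


# Made-up partial log-likelihoods for a model without and with CMD.
stat, p = likelihood_ratio_test(-1000.0, -997.5)
```

The penalised-spline comparison involves non-integer effective degrees of freedom and is not covered by this one-degree-of-freedom shortcut.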

Statistical tests with P-values at or below 0.05 were considered statistically significant. All analyses were carried out using R version 3.6.1, in particular the package nlme (version 3.1-140) for linear mixed models and the package survival (version 3.1-12) for Cox proportional hazards regression, including the function pspline for penalised B-splines [26, 27].

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

