Study subjects

Cancer-specific case-control genome-wide association studies (GWASs)

The current MR analysis leveraged ten GWASs totaling 602,435 participants of European ancestry (297,699 cancer cases and 304,736 controls), covering bladder, breast, colorectal, esophagus, lung, oral and pharynx, ovarian, pancreatic, prostate, and kidney cancer. The characteristics of each cancer-specific GWAS, including sample sizes and data sources, are summarized in Additional file 1: Table S1.

Briefly, as outcomes of interest, we collected available GWAS data across ten cancers. For the summary-level GWAS data of four cancers (i.e., breast, ovarian, prostate, and lung cancer), quality control procedures and population details have been described elsewhere [9,10,11,12]. For the six cancers (bladder, colorectal, esophagus, oral and pharynx, pancreatic, and kidney cancer) for which we had access to individual-level genotyping data [13,14,15,16,17,18,19,20,21,22,23], we performed stringent quality control of the study population, removing unexpected duplicates and probable relatives based on pairwise identity by descent and restricting the analysis to individuals of European ancestry.

UK Biobank cohort data

The UK Biobank is a prospective population-based cohort study that recruited 502,528 adults aged 40–69 years from the general population between April 2006 and December 2010. The study protocol and information about data access are available online, and more details of the recruitment and study design have been published previously [24]. This study used the UK Biobank resource under Application #45611.

We applied the following quality control steps to the study population: (i) excluded individuals with prevalent cancer at baseline (except non-melanoma skin cancer, International Classification of Diseases, 10th revision [ICD-10] code C44); (ii) excluded individuals with sex discordance; (iii) excluded outliers for genotype missingness or excess heterozygosity; (iv) retained only unrelated participants; (v) restricted the sample to “white British” individuals of European ancestry; and (vi) removed individuals who had withdrawn from the study. After these steps, a total of 355,543 participants remained for analysis. We defined the ten cancers using ICD-10 codes (Additional file 1: Table S2). Follow-up time was calculated from the baseline assessment to the first diagnosis of cancer, loss to follow-up, death, or the last follow-up (December 14, 2016), whichever occurred first.
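The follow-up rule above can be illustrated with a minimal Python sketch. The function name, argument names, and example dates are hypothetical; only the censoring date (December 14, 2016) comes from the text, and the study itself reports using R.

```python
from datetime import date
from typing import Optional, Tuple

# Administrative end of follow-up stated in the text.
END_OF_FOLLOW_UP = date(2016, 12, 14)


def follow_up_days(
    baseline: date,
    diagnosis: Optional[date] = None,
    lost: Optional[date] = None,
    death: Optional[date] = None,
    end: date = END_OF_FOLLOW_UP,
) -> Tuple[int, bool]:
    """Return (days of follow-up, True if a cancer diagnosis ended follow-up).

    Follow-up stops at whichever of the supplied dates occurs first.
    """
    candidates = [d for d in (diagnosis, lost, death, end) if d is not None]
    stop = min(candidates)
    # The event is observed only if the diagnosis is the earliest stopping date.
    event = diagnosis is not None and diagnosis == stop
    return (stop - baseline).days, event


# Hypothetical participant: recruited in 2008, diagnosed in 2010.
days, event = follow_up_days(date(2008, 1, 1), diagnosis=date(2010, 1, 1))
```

Participants without a diagnosis before the stopping date are treated as censored, which is the input a time-to-event model such as Cox regression expects.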

Information on dietary vitamin E intake in UK Biobank participants was retrieved from data field #100025 (description: vitamin E; category: estimated nutrients yesterday—diet by a 24-h recall—online follow-up). Measurements were performed at baseline (2006–2010) and/or subsequent follow-up visits. In the present study, we included 49,579 individuals (23,107 males and 26,472 females) with baseline vitamin E measurements.

Two-sample MR analysis and sensitivity analysis of cancer-specific GWAS

Depending on data availability, we applied a summary statistics-based approach to all ten cancers and, additionally, a genetic risk score (GRS)-based approach to the six cancers with individual-level genotyping data (bladder, colorectal, esophagus, oral and pharynx, pancreatic, and kidney cancer), followed by sensitivity analyses.

Instrumental variable (IV) selection

Circulating vitamin E was the main exposure of interest. We collected 3 independent GWAS-identified circulating vitamin E-associated single-nucleotide polymorphisms (SNPs; rs964184, rs11057830, and rs2108622) from a large GWAS available to date [25], which met the following criteria as instruments for MR analysis: (i) reported P-value < 5.00 × 10⁻⁸, (ii) minor allele frequency (MAF) ≥ 0.05, (iii) call rate ≥ 95%, and (iv) Hardy-Weinberg equilibrium (HWE) P-value in controls ≥ 1 × 10⁻⁶ (Additional file 1: Table S3). The online web tool mRnd was used to estimate statistical power [26]. To calculate the minimum detectable effect size, we set 80.0% statistical power and 5.0% alpha level and used the proportion of circulating vitamin E variance (R², i.e., 1.7% estimated by Major et al.) explained by the 3 IVs as calculated in the previous GWAS [25, 27]. We further quantified the strength of IVs by F-statistics, where F-statistics > 10 provided good evidence for the IV being a strong instrument [28].
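As a rough illustration of the instrument-strength check, the sketch below uses one commonly used approximation, F = (R² / (1 − R²)) × ((N − k − 1) / k). The text does not state which formula the authors applied, so this choice is an assumption, and the sample size in the example is a hypothetical placeholder rather than the size of the exposure GWAS.

```python
def f_statistic(r2: float, n_samples: int, n_instruments: int) -> float:
    """Approximate F-statistic for a set of instruments.

    r2            : proportion of exposure variance explained by the instruments
    n_samples     : sample size of the exposure GWAS
    n_instruments : number of instruments (k)
    """
    if not 0.0 < r2 < 1.0:
        raise ValueError("r2 must lie strictly between 0 and 1")
    if n_samples <= n_instruments + 1:
        raise ValueError("n_samples must exceed n_instruments + 1")
    return (r2 / (1.0 - r2)) * ((n_samples - n_instruments - 1) / n_instruments)


# R² = 1.7% and k = 3 come from the text; N = 5000 is a hypothetical value.
f_value = f_statistic(r2=0.017, n_samples=5000, n_instruments=3)
is_strong = f_value > 10  # rule of thumb cited in the text
```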

Summary statistics-based method

The summary statistics-based methods, including an inverse variance weighting (IVW) method and a likelihood-based method, were primarily used to infer causal associations. The formulas of the IVW method were as follows: \(\beta_{IVW}=\frac{\sum_{i=1}^{k}\beta_{Xi}\beta_{Yi}\sigma_{Yi}^{-2}}{\sum_{i=1}^{k}\beta_{Xi}^{2}\sigma_{Yi}^{-2}}\); \(SE_{IVW}=\sqrt{\frac{1}{\sum_{i=1}^{k}\beta_{Xi}^{2}\sigma_{Yi}^{-2}}}\), where i is the ith SNP, βXi and σXi are the estimate and standard error of genetic association with the exposure that were derived from IVs, and βYi and σYi are the estimate and standard error of genetic association with the outcome that were derived from cancer-specific GWAS. In addition, we adopted the likelihood-based method, which can be used to obtain appropriately sized confidence intervals when there is considerable imprecision in the estimates.
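The IVW formulas above translate directly into a few lines of code. The following is a minimal Python sketch, not the authors' implementation (the study reports using R); the function name and the example effect sizes are hypothetical.

```python
import math
from typing import Sequence, Tuple


def ivw_estimate(
    beta_x: Sequence[float],
    beta_y: Sequence[float],
    se_y: Sequence[float],
) -> Tuple[float, float]:
    """Return (beta_IVW, SE_IVW) from per-SNP summary statistics.

    beta_x : SNP-exposure estimates (beta_Xi)
    beta_y : SNP-outcome estimates (beta_Yi)
    se_y   : standard errors of the SNP-outcome estimates (sigma_Yi)
    """
    if not (len(beta_x) == len(beta_y) == len(se_y)) or len(beta_x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    weights = [1.0 / s ** 2 for s in se_y]  # sigma_Yi^-2
    numerator = sum(bx * by * w for bx, by, w in zip(beta_x, beta_y, weights))
    denominator = sum(bx ** 2 * w for bx, w in zip(beta_x, weights))
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Hypothetical summary statistics for three instruments.
beta, se = ivw_estimate(
    beta_x=[0.1, 0.2, 0.3],
    beta_y=[0.05, 0.10, 0.15],
    se_y=[0.01, 0.02, 0.03],
)
```

In this toy input every SNP has the same ratio βYi/βXi, so the pooled estimate equals that common ratio.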

GRS-based method

We further constructed a weighted GRS to integrate the genetic effects of candidate SNPs on the exposure of interest for available individual-level genotyping data. We summed three circulating vitamin E-associated SNPs weighted by corresponding effect sizes using the formula: \(\mathrm{GRS}=\sum_{i=1}^{n}\beta_{i}\,\mathrm{SNP}_{i}\), where n is the number of SNPs, SNPi is the number of risk alleles (0, 1, 2) carried by the ith SNP, and βi is the previously published regression coefficient for the ith SNP. We then evaluated the association of circulating vitamin E-GRS with cancer risk through the logistic regression model with adjustment for sex, age, and the first ten principal components when appropriate.
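The weighted score above is a per-individual dot product of allele counts and published effect sizes. A minimal Python sketch follows; the weights and genotype are hypothetical placeholders, not the coefficients reported in the source GWAS.

```python
from typing import Sequence


def weighted_grs(risk_allele_counts: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted genetic risk score: sum over SNPs of beta_i * SNP_i."""
    if len(risk_allele_counts) != len(weights):
        raise ValueError("one weight is required per SNP")
    for count in risk_allele_counts:
        if count not in (0, 1, 2):
            raise ValueError("risk allele counts must be 0, 1, or 2")
    return sum(w * c for w, c in zip(weights, risk_allele_counts))


# Hypothetical effect sizes for three SNPs and one individual's genotype.
example_weights = [0.04, 0.03, 0.03]
example_genotype = [2, 1, 0]
score = weighted_grs(example_genotype, example_weights)
```

The resulting score would then enter the regression model as a continuous predictor alongside the covariates listed in the text.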

Multiple testing correction was performed by false discovery rate (FDR) method using the “p.adjust” function in R software.
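In R, `p.adjust` with `method = "fdr"` is an alias for the Benjamini-Hochberg procedure. Assuming that is the variant meant here, the adjustment can be sketched in Python as follows; the function name and example P-values are hypothetical.

```python
from typing import List, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    n = len(pvalues)
    # Walk from the largest P-value to the smallest, tracking the running minimum.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = n - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P-values for four tests.
adjusted = bh_adjust([0.01, 0.04, 0.03, 0.005])
```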

Sensitivity analysis

Estimates from MR can only be reliably interpreted when three model assumptions are valid: (i) the IVs are associated with the exposure, (ii) the IVs are not related to confounding factors, and (iii) the IVs influence the outcome only through their effects on the exposure. Therefore, we performed heterogeneity analysis and MR-Egger regression analysis to evaluate potential violations of the second and third assumptions. The heterogeneity test was used to assess whether a genetic variant’s effect on cancer risk was proportional to its effect on circulating vitamin E. MR-Egger regression (MR-Egger intercept test) was fitted to evaluate the presence of horizontal pleiotropy. We additionally conducted a leave-one-out analysis, in which we excluded one SNP at a time and performed IVW analysis on the remaining two SNPs, to evaluate the robustness of our results.
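Two of these checks can be sketched with the same summary statistics used for IVW. The Python sketch below computes Cochran's Q as the heterogeneity statistic (an assumption, since the text does not name the test) and the leave-one-out IVW estimates; the function names and example values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def ivw(beta_x: Sequence[float], beta_y: Sequence[float],
        se_y: Sequence[float]) -> Tuple[float, float]:
    """Inverse variance weighted estimate and its standard error."""
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_x, se_y))
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_x, beta_y, se_y))
    return num / den, math.sqrt(1.0 / den)


def cochran_q(beta_x: Sequence[float], beta_y: Sequence[float],
              se_y: Sequence[float]) -> float:
    """Cochran's Q: weighted squared deviations from the IVW fit.

    Under homogeneity, Q is compared against a chi-square distribution
    with (number of SNPs - 1) degrees of freedom.
    """
    beta, _ = ivw(beta_x, beta_y, se_y)
    return sum(((by - beta * bx) / s) ** 2
               for bx, by, s in zip(beta_x, beta_y, se_y))


def leave_one_out(beta_x: Sequence[float], beta_y: Sequence[float],
                  se_y: Sequence[float]) -> List[Tuple[float, float]]:
    """IVW estimate after dropping each SNP in turn."""
    results = []
    for drop in range(len(beta_x)):
        keep = [i for i in range(len(beta_x)) if i != drop]
        results.append(ivw([beta_x[i] for i in keep],
                           [beta_y[i] for i in keep],
                           [se_y[i] for i in keep]))
    return results


# Hypothetical summary statistics for three instruments.
bx, by, sy = [0.1, 0.2, 0.3], [0.05, 0.10, 0.15], [0.01, 0.02, 0.03]
q_stat = cochran_q(bx, by, sy)
loo = leave_one_out(bx, by, sy)
```

A large Q relative to its degrees of freedom, or a leave-one-out estimate that shifts markedly when one SNP is removed, would suggest that a single variant is driving the result.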

Furthermore, to control for the effects of potential confounding factors on the significant associations from univariable MR analyses, we also conducted multivariable IVW analysis using effect sizes retrieved from the Gene ATLAS database [29].

Validation in the UK Biobank cohort

Circulating vitamin E based GRS analysis

We used Cox proportional hazards models to calculate hazard ratios (HRs) and 95% confidence intervals (CIs) for the associations between circulating vitamin E-GRS and the risk of ten cancers, with the adjustment of sex, age, study centers, body mass index (BMI), smoking status, drinking status, and first ten principal components when appropriate.

One-sample MR analysis for dietary vitamin E intake

One-sample MR analysis was used to evaluate the association between dietary vitamin E intake at baseline and the cancer risk. The genetic IVs for one-sample MR were extracted from the UK Biobank imputation dataset, which followed the extensive quality control of SNPs, including (i) imputation confidence score (info score) ≥ 0.3, (ii) MAF ≥ 0.05, (iii) call rate ≥ 95%, and (iv) HWE P-value ≥ 1 × 10⁻⁶. Then, we performed linear regression analysis between each variant and log-transformed dietary vitamin E measurements, to provide independent (linkage disequilibrium r² < 0.1) dietary vitamin E-associated IVs under different significance thresholds (i.e., P-value ≤ 5.00 × 10⁻⁷, P-value ≤ 5.00 × 10⁻⁶, P-value ≤ 5.00 × 10⁻⁵). These IVs with different significance thresholds were further used to construct weighted GRS, as well as unweighted GRS to avoid potential over-fitting. In addition, we also annotated the dietary vitamin E-associated lead loci with functional activity (with HaploReg v4.1) [30] and expression quantitative trait loci (eQTL) analysis (with the eQTLGen consortium of 31,684 blood samples) [31].

Briefly, a two-stage method was implemented for one-sample MR analysis: (i) the first-stage model consisted of a linear regression of the log-transformed dietary vitamin E measurements on the weighted and unweighted GRSs and (ii) the second-stage model consisted of a Cox regression of the cancer risk on the fitted values from the first-stage optimal regression model (with the strongest correlation with observed dietary vitamin E level). The covariates included sex, age, study centers, BMI, smoking status, drinking status, and the first ten principal components when appropriate.
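The first stage and the choice of the "optimal" score can be sketched as below. This simplified Python illustration regresses log-transformed intake on each candidate GRS without covariates and keeps the score whose fitted values correlate most strongly with the observed values. The function names and data are hypothetical, and the second-stage Cox regression on the fitted values is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple


def ols_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Simple linear regression of y on x; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    if saa == 0.0 or sbb == 0.0:
        return 0.0
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    return sab / math.sqrt(saa * sbb)


def first_stage(grs_by_name: Dict[str, Sequence[float]],
                intake: Sequence[float]) -> Tuple[str, float, List[float]]:
    """Fit each candidate GRS and return (name, correlation, fitted values)
    for the score that best tracks log-transformed intake."""
    log_intake = [math.log(v) for v in intake]
    best = None
    for name, grs in grs_by_name.items():
        intercept, slope = ols_fit(grs, log_intake)
        fitted = [intercept + slope * g for g in grs]
        r = pearson(fitted, log_intake)
        if best is None or r > best[1]:
            best = (name, r, fitted)
    return best


# Hypothetical data: five participants and two candidate scores.
weighted = [0.0, 1.0, 2.0, 3.0, 1.0]
unweighted = [1.0, 0.0, 1.0, 0.0, 2.0]
intake = [math.exp(1.0 + 0.5 * g) for g in weighted]
best_name, best_r, fitted_values = first_stage(
    {"weighted": weighted, "unweighted": unweighted}, intake
)
```

The fitted values from the selected model represent the genetically predicted intake that the second-stage model would use as its exposure.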

Sensitivity analysis

Several sensitivity analyses were also performed in the UK Biobank cohort: (i) we re-analyzed the associations using a logistic regression model that included both incident and prevalent cancer cases in a case-control design and (ii) we additionally adjusted for socioeconomic status (i.e., education and employment status) and chronic disease status (i.e., coronary artery disease, stroke, hypertension, and type 2 diabetes).

All statistical analyses were performed using R version 3.6.1, and a two-sided P-value less than 0.05 was considered strong evidence for a causal association.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

