Data source and population

We performed a cohort study in a large population of patients in the UK Clinical Practice Research Datalink (CPRD) GOLD database [12, 13]. CPRD-GOLD contains primary care electronic health records from the UK that have been collected by general practitioners and are broadly representative of the UK population. It contains data on recorded health conditions, prescriptions, lifestyle, and laboratory and other measurements taken in primary care. Data within CPRD-GOLD can be linked to UK data on hospitalisation and death. To be included, patients had to be permanently registered with a general practice contributing up-to-standard data in CPRD-GOLD for at least 1 year and with linkage to Hospital Episode Statistics (HES) discharge and Office for National Statistics (ONS) mortality data; be aged ≥ 25 years and < 85 years with no prior history of CVD (on GP records or linked hospital records); and have no history of prior statin treatment. Cohort entry was the latest of the dates on which these criteria were met, on or after 1 January 2004. Cohort exit was the earliest of the following: first CVD event, non-CVD death, prescription of a statin, deregistration from the general practice, the last data collection from the practice, or the end of the study on 31 March 2016. The study was approved by the MHRA Independent Scientific Advisory Committee for database studies (ISAC 16/248).
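The entry and exit rules above amount to taking a maximum and a minimum over candidate dates. The following is a minimal Python sketch for illustration only; the function names are hypothetical, and the reading of the "at least 1 year" criterion (one year after the later of the patient's registration date and the practice's up-to-standard date) is an assumption, since the text does not spell it out.

```python
from datetime import date

STUDY_START = date(2004, 1, 1)
STUDY_END = date(2016, 3, 31)

def add_years(d, years):
    """Shift a date by whole years (29 February falls back to 28 February)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def cohort_entry(birth, registration, up_to_standard):
    """Latest of: study start, 25th birthday, and one year of eligible follow-up.

    Assumption: the one-year requirement is counted from the later of the
    registration date and the practice's up-to-standard date.
    """
    one_year_followup = add_years(max(registration, up_to_standard), 1)
    return max(STUDY_START, add_years(birth, 25), one_year_followup)

def cohort_exit(candidate_dates):
    """Earliest exit date; None marks something that never happened."""
    observed = [d for d in candidate_dates if d is not None]
    return min(observed + [STUDY_END])

# Hypothetical patient: no CVD event or death, statin started in May 2010.
entry = cohort_entry(date(1970, 6, 15), date(2001, 3, 1), date(2000, 1, 1))
exit_ = cohort_exit([None, None, date(2010, 5, 20),
                     date(2012, 1, 1), date(2015, 12, 31)])
```

For this hypothetical patient, entry is the study start date and exit is the date of the first statin prescription. The exclusion criteria (prior CVD, prior statin use, age ≥ 85 years) would be applied separately and are not shown.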


A first CVD event was defined as the earliest recording of any fatal or non-fatal coronary heart disease (CHD), ischaemic stroke, or transient ischaemic attack. Fatal CVD events were identified from ICD-10 codes recorded in ONS death registration. Non-fatal events were identified either in GP records (using Read codes, the standard coding system used in UK general practice) or HES discharge diagnoses (ICD-10 codes). Read and ICD-10 codes defining outcomes are those used in QRISK3 derivation and have previously been published [11].

Prediction model

The following variables were included from the QRISK3 model: age, ethnicity, deprivation, systolic blood pressure, body mass index, total cholesterol to high density lipoprotein cholesterol ratio, smoking, family history of coronary heart disease in a first degree relative aged less than 60 years, type 1 diabetes, type 2 diabetes, treated hypertension, rheumatoid arthritis, atrial fibrillation, chronic kidney disease (stage 3, 4, or 5), systolic blood pressure variability (standard deviation of repeated measures), migraine, atypical antipsychotics, corticosteroids, systemic lupus erythematosus (SLE), severe mental illness, HIV/AIDS, and erectile dysfunction diagnosis or treatment in men. Our cohort definition and data handling followed the published QRISK3-2017 prediction model with three exceptions: (1) we chose a later cohort entry date (1 January 2004 rather than 1 January 1998); (2) we handled cholesterol missingness differently (if no values were available at baseline, QRISK3 derivation allowed cholesterol values from after the index date to be used if they were before any event; we only included values recorded before the index date to avoid using future information in prediction); and (3) we evaluated the Townsend deprivation score as the median score of the vigintile (one of 20 equal groups) that an individual lived in, as individual values were not available. Read and ICD-10 codes defining predictors in QRISK3 are not publicly available. We therefore developed our own code sets; these, together with our methods of data handling, have previously been published [11].


For each patient at baseline, we additionally calculated a modified Charlson Comorbidity Index (CCI) based on primary care Read codes, using a published code set for this purpose [14]. The index was modified in that CVD could not contribute to the score, as all participants were CVD-free at baseline. CCI (grouped into 0, 1, 2, and 3+) was included in the competing risk model as a predictor of non-CVD death to examine whether this improved model performance.

Missing data

As with QRISK3 derivation, patients with a missing Townsend deprivation score were excluded from the cohort, those with missing ethnicity were assumed to be white, and multiple imputation was used for missing body mass index (BMI), total cholesterol to HDL cholesterol ratio (TC:HDL), systolic blood pressure (SBP), SBP variability, and smoking status, assuming data were missing at random [11]. Multiple imputation included all predictor variables and the outcome. Multiple Imputation by Chained Equations was used to generate five imputed datasets [15]. Analyses of these datasets were combined using Rubin’s rules to give summary point estimates with confidence limits that reflect the added uncertainty associated with imputing missing values [16].
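As an illustration of how Rubin's rules combine results across imputed datasets, the sketch below pools a single coefficient. It is a minimal Python example, not the code used in the study; the function name and the input values are hypothetical.

```python
import math

def pool_rubin(estimates, variances):
    """Pool one coefficient across m imputed datasets using Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: their squared standard errors, one per imputed dataset
    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    pooled = sum(estimates) / m
    within = sum(variances) / m                      # average sampling variance
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between           # adds imputation uncertainty
    if between == 0:
        df = math.inf
    else:
        df = (m - 1) * (1 + within / ((1 + 1 / m) * between)) ** 2
    return pooled, math.sqrt(total), df

# Hypothetical coefficient estimated in five imputed datasets.
est, se, df = pool_rubin([0.50, 0.55, 0.45, 0.52, 0.48], [0.01] * 5)
```

The between-imputation term is what widens the confidence limits relative to any single imputed dataset: in this made-up example the pooled variance is 0.01174, compared with 0.01 within each dataset.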

Statistical methods

The study size was determined by the data available in CPRD, which was considered sufficient, and no formal power calculation was done [17]. Patients were randomly allocated to fixed derivation and test datasets in a 2:1 ratio, with the split balanced in terms of age and final event status. The derivation dataset was used to derive CRISK, a new Fine-Gray model to predict the 10-year risk of experiencing a CVD event accounting for the competing risk of non-CVD death. Separate models were estimated for men and women. The Fine-Gray model estimates subdistribution hazard ratios. The subdistribution hazard is the instantaneous risk of a CVD event among subjects who have not yet experienced a CVD event, including those who have already died from non-CVD causes; retaining the latter in the risk set is how the model accounts for the occurrence of non-CVD death. Since we wished to explicitly compare prediction in a model accounting for competing risk versus QRISK3, we included all the same main effects and age interactions as in QRISK3, but we also accounted for non-CVD death as a second (competing) outcome. We also re-estimated fractional polynomial terms for continuous variables, selecting terms based on those performing best (as measured by the C-statistic) in balanced 10-fold cross-validation and showing consistency of model fit (AIC) across folds of the derivation dataset. We then derived a further model (CRISK-CCI), which additionally included the CCI score (categorised as 0, 1, 2, ≥ 3) as a validated predictor of total mortality [14]. Note that these models allow the cumulative incidence function (CIF), that is, the probability of a CVD event occurring over time, to be predicted directly. However, the subdistribution hazard ratios (sHRs) in the Fine-Gray models describe the direction but not the magnitude of the effect of predictors on the CIF. Also, the use of fractional polynomials and the inclusion of interactions with age further complicate their interpretation.
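For reference, the standard form of the Fine-Gray model and the CIF it implies can be written as follows. The notation is ours and is not taken from the source: T is the event time, D = 1 denotes a CVD event, x is the covariate vector (including fractional polynomial and interaction terms), and the subscript 0 marks the baseline.

```latex
% Subdistribution hazard for a CVD event (D = 1) given covariates x
\lambda_1(t \mid x) = \lambda_{1,0}(t)\, \exp\!\left(x^{\top}\beta\right)

% Cumulative incidence of CVD by time t implied by the model
F_1(t \mid x) = P(T \le t,\ D = 1 \mid x)
             = 1 - \exp\!\left\{-\Lambda_{1,0}(t)\, \exp\!\left(x^{\top}\beta\right)\right\},
\qquad
\Lambda_{1,0}(t) = \int_0^{t} \lambda_{1,0}(u)\, du
```

Under this form, the predicted 10-year risk is F_1(10 | x). Because the covariates act on the CIF through the complementary log-log link rather than linearly, exp(β) > 1 indicates a higher cumulative incidence but does not translate into a fixed multiplicative change in risk.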

The performance of CRISK and CRISK-CCI was compared to that of QRISK3 in the independent test (validation) dataset by examining the discrimination and calibration of all models. Discrimination is the ability of the risk score to differentiate between patients who experience the event of interest during the study and those who do not. We used Harrell’s C-statistic to describe discrimination. A C-statistic of 0.5 indicates discrimination that is no better than chance, whereas a C-statistic of 1 indicates perfect discrimination [18].
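To make the definition concrete, the sketch below computes Harrell's C-statistic for right-censored data in plain Python. It is illustrative only: the function name is hypothetical, ties in follow-up time are ignored, and any non-event (including a competing death) is simply treated as censoring, which is an assumption because the text does not state how competing events entered the calculation.

```python
def harrell_c(times, events, risks):
    """Harrell's C: share of comparable pairs ranked correctly by predicted risk.

    times:  follow-up time per patient
    events: 1 if the event of interest was observed, 0 if censored
    risks:  predicted risk per patient (higher means earlier event expected)
    A pair (i, j) is comparable when i had the event and j was followed longer.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[j] > times[i]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable

# Hypothetical data: predicted risk falls as follow-up time increases.
c_perfect = harrell_c([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.5, 0.1])
c_chance = harrell_c([2, 4, 6, 8], [1, 1, 0, 1], [0.3, 0.3, 0.3, 0.3])
```

In this toy example the first call returns 1.0 (every comparable pair is ordered correctly) and the second returns 0.5 (identical predictions carry no information). The double loop is quadratic in the number of patients, so a real analysis of a cohort this size would rely on an optimised library routine.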

Calibration refers to how closely the predicted and observed probabilities agree at group level. This was assessed by plotting the observed versus predicted risk for CRISK, CRISK-CCI, and QRISK3. Observed risk was estimated using the Aalen-Johansen estimator, which accounts for competing mortality risk [19]. Plots were generated separately by sex, for all patients and for pre-specified subgroups of age and CCI, based on summary statistics pooled across the imputed datasets.
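The Aalen-Johansen estimate of observed risk can be sketched in a few lines. The Python below is a minimal illustration under assumed coding (0 = censored, 1 = CVD event, 2 = non-CVD death); the function name and the data are hypothetical and this is not the study's implementation.

```python
def aalen_johansen_cif(times, causes, cause=1):
    """Aalen-Johansen cumulative incidence for one cause with competing risks.

    times:  follow-up time per patient
    causes: 0 = censored, 1 = CVD event, 2 = non-CVD death (assumed coding)
    Returns a list of (time, cumulative incidence) at each event time.
    """
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    surv = 1.0   # probability of being free of any event just before t
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_here = d_any = d_cause = 0
        while i < len(data) and data[i][0] == t:
            n_here += 1
            if data[i][1] != 0:
                d_any += 1
            if data[i][1] == cause:
                d_cause += 1
            i += 1
        if d_any:
            cif += surv * d_cause / n_at_risk   # cause-specific increment
            surv *= 1 - d_any / n_at_risk       # all-cause survival update
            curve.append((t, cif))
        n_at_risk -= n_here
    return curve

# Hypothetical data: five patients, one competing death at t = 2.
curve = aalen_johansen_cif([1, 2, 3, 4, 5], [1, 2, 0, 1, 0])
```

For these five hypothetical patients the cumulative incidence of CVD reaches 0.5 at t = 4. Treating the competing death as ordinary censoring and taking one minus the Kaplan-Meier estimate would give 0.6 on the same data, which illustrates why ignoring competing mortality overstates observed risk.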

Examining patient reclassification

CVD guideline recommendations for primary preventive treatment use thresholds of predicted risk to classify patients as having a high enough risk of CVD to be offered treatment. We examined differences between CRISK-CCI and QRISK3 in which patients were recommended for treatment, focusing on patients reclassified to either side of the 20% (UK recommended threshold until 2014), 10% (current NICE recommended threshold), and 7.5% (plausible future) thresholds of predicted CVD risk. We described the characteristics of reclassified patients, including the observed risk of CVD at 10 years and the number needed to treat to prevent one new CVD event, assuming that all people recommended for treatment take a statin with a relative risk reduction of 25% for new CVD events. All models were fitted in R, version 4.0.0, and Stata, version 11.2.
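The reclassification and number-needed-to-treat calculations described above reduce to simple arithmetic. The sketch below is an illustrative Python example rather than the study code (which the text says was written in R and Stata); the function names are hypothetical, and treating a predicted risk equal to the threshold as "recommended for treatment" is an assumption.

```python
def reclassification(risk_qrisk3, risk_crisk_cci, threshold):
    """Classify a patient by how the treatment recommendation changes.

    Assumption: predicted risk at or above the threshold means treatment
    is recommended.
    """
    old = risk_qrisk3 >= threshold
    new = risk_crisk_cci >= threshold
    if old and not new:
        return "down"       # recommended by QRISK3 only
    if new and not old:
        return "up"         # recommended by CRISK-CCI only
    return "unchanged"

def number_needed_to_treat(observed_risk, relative_risk_reduction=0.25):
    """NNT = 1 / absolute risk reduction = 1 / (observed risk x RRR)."""
    return 1.0 / (observed_risk * relative_risk_reduction)

# Hypothetical patient: 11% predicted risk with QRISK3, 9% with CRISK-CCI.
status = reclassification(0.11, 0.09, threshold=0.10)
nnt = number_needed_to_treat(0.10)
```

With an observed 10-year risk of 10% and a 25% relative risk reduction, the absolute risk reduction is 2.5 percentage points, giving a number needed to treat of 40. Groups with lower observed risk therefore have a proportionally higher number needed to treat.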

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

