45 and Up Study data

The Sax Institute’s 45 and Up Study cohort comprises 267,153 people from NSW, Australia, recruited between January 2006 and December 2009. Participants aged ≥ 45 years were randomly sampled from the Services Australia (formerly the Australian Government Department of Human Services) Medicare enrolment database that has near-complete coverage of the population. People living in remote and rural areas and those aged ≥ 80 years were oversampled. Overall, the response rate was ~ 18% and the cohort represents ~ 11% of the NSW population aged ≥ 45 years. Participants self-completed a postal questionnaire at recruitment, which included health, socio-demographic and past medical history information. Further details are described elsewhere [2].

Baseline data were linked to the NSW Cancer Registry (NSWCR; 01-January-1994 to 31-December-2013), which contains all notifications of cancer diagnosed in NSW, to ascertain primary incident cancers of the lung (ICD-10 classification code: C33-C34), colorectum (C18-C20), prostate (C61) and female breast (C50). Cases with a record prior to or at recruitment were excluded. Additionally, we linked to the NSW Registry of Births, Deaths and Marriages (RBDM; 01-February-2006 to 31-December-2013) to ascertain deaths that occurred before the end of follow-up (i.e., 31-December-2013), which were required for calculating person-years at risk. Data were sourced from the Cancer Institute NSW and NSW Ministry of Health and were probabilistically linked by the Centre for Health Record Linkage using a best practice approach to linkage while preserving privacy [14]. The probabilistic matching process is known to be highly accurate (false-positive and false-negative rates of ~ 0.5%) [15]. All data were accessed using the Secure Unified Research Environment (SURE).

The conduct of the 45 and Up Study was approved by the University of New South Wales Human Research Ethics Committee. The NSW Population and Health Services Research Ethics Committee approved the record linkage and analysis of the 45 and Up Study data (approval number 2014/08/551).

Population data used for developing weights

The Census of Population and Housing Survey data

The Census is a compulsory survey of all people in Australia, conducted by the Australian Bureau of Statistics (ABS) every five years, and provides demographic, socioeconomic and housing characteristics of the entire population. Data for people aged ≥ 45 years from the 2006 Census, the closest in time to recruitment of the 45 and Up Study sample, were obtained using ABS online Table Builder Basic [16]. We considered all characteristics in the Census that were highly comparable to those in the 45 and Up Study’s baseline questionnaire (Additional file 1, Table A). This identified seven characteristics (sex, 5-year age group, place of residence (coded using the Accessibility and Remoteness Index of Australia [ARIA]), education, region of birth, language other than English spoken at home and marital status), which were then considered further for inclusion in the weights.

Surveys used to compare health characteristics and behaviours

As many health and behaviour characteristics are not included in the Census, we compared the estimated prevalence of these from the 45 and Up Study to those from two independent population benchmarks.

National Drug Strategy Household Survey (NDSHS) data

The NDSHS is conducted by the Australian Institute of Health and Welfare (AIHW) every three years and provides information on alcohol, tobacco and illicit drug use for a representative sample of the Australian population (see Additional file 1, Table B for characteristics used in this study) [17, 18]. To ensure compatibility with the 45 and Up Study’s mode of data collection, we included data collected using self-completed questionnaires (85% and 100% of all survey participants in 2007 and 2010, respectively). Data for participants aged ≥ 45 years from the 2007 (n = 12,470) and 2010 (n = 14,388) surveys were used, with overall response rates of 54% and 51%, respectively. Data from each survey were weighted using weights supplied with the survey information so that the sample was approximately representative of the Australian population in terms of age, sex, place of residence and household size.

Australian National Health Survey (ANHS) data

The ANHS is a household survey conducted by the ABS every three years which provides health information for a sample of the Australian population [19]. Data from the 2007 survey were obtained using the Remote Access Data Laboratory [20]. There were 15,800 households randomly sampled (91% response rate), and 8,531 people aged ≥ 45 years were interviewed in person. We identified 17 characteristics from the ANHS questionnaire that were comparable to items in the 45 and Up Study’s baseline questionnaire (Additional file 1, Table B). Weighted frequencies for these characteristics in the ANHS were calculated using the person weights provided in the dataset, which adjusted for the probability of a person being selected and were calibrated so that the proportions in the sample aligned with those in the Australian population for sex, age group and place of usual residence.

Population-wide cancer incidence data

The total numbers of people by sex and 5-year age group for the NSW and Australian population were obtained from the ABS [21].

We obtained the NSW-wide numbers of incident primary lung, colorectal, prostate and female breast cancers by sex and 5-year age group from the NSWCR for 01-January-2009 to 31-December-2013, using the same ICD-10 codes as above. To match the inclusion criteria used for the 45 and Up Study, NSWCR cases were excluded if they were diagnosed with multiple primary cancers or secondary cancers, or were notified to the NSWCR through death certificate only. The NSW Population and Health Services Research Ethics Committee approved the analysis of cancer incidence data for all of NSW (Reference: HREC/09/CIPHS/16).

We did not have access to primary cancer incidence data for the whole of Australia with equivalent inclusion criteria to those for the 45 and Up Study cohort. However, age-standardised NSW cancer incidence rates for lung, colorectal, prostate and breast cancers are almost identical to the Australian rates when equivalent inclusion criteria are used as reported in Cancer Data in Australia by the AIHW for 1982–2016 (Additional file 2 with all rates standardised to the Australian population in 2001) [22]. Consequently, we used the NSWCR data as a proxy for the Australian national rates.

Statistical analyses

All analyses were conducted in SAS 9.4 and STATA (release 16.1; College Station, TX: Stata Corporation; 2019).

Weighting methods

We applied post-stratification and raking methods to data from the 45 and Up Study, to derive weights matching the distribution of demographic data in the 2006 Australian Census for the NSW and Australian populations. We used both a ‘full’ and ‘basic’ set of characteristics to construct separate raking weights, and the basic set to construct post-stratification weights.


Seven demographic characteristics (listed in Table 1) were selected to create two raked weights for the 45 and Up Study (‘full raking’), one each for the NSW and Australian populations. Another set of weights was created separately for the NSW and Australian populations using ‘basic raking’ with sex, 5-year age group and place of residence only. Participants were excluded from the sample (n = 11,788) if they had missing values for any of the characteristics used to construct the weights. For each estimated weight, values beyond the median plus six times the interquartile range (IQR) were trimmed to remove extreme outliers. We used the STATA ipfraking package [12] to calculate the raked weights. Additional file 3 (‘Development of raking weights’) includes a step-by-step description of the method.
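For readers unfamiliar with raking, the following Python sketch illustrates the general idea of iterative proportional fitting followed by trimming. It is not the ipfraking implementation used in this study: the function names, the toy records and the target totals are hypothetical, and trimming is assumed here to mean capping weights at the median plus six times the IQR after raking has finished.

```python
import statistics
from collections import defaultdict


def rake(records, margins, max_iter=200, tol=1e-8):
    """Iterative proportional fitting over categorical margins.

    records: list of dicts, one per participant (hypothetical layout)
    margins: {variable: {category: population total}} (hypothetical targets)
    Returns one weight per record.
    """
    weights = [1.0] * len(records)
    for _ in range(max_iter):
        max_change = 0.0
        for var, targets in margins.items():
            # Current weighted total in each category of this variable
            totals = defaultdict(float)
            for rec, w in zip(records, weights):
                totals[rec[var]] += w
            # Rescale so this variable's weighted margin matches its target
            for i, rec in enumerate(records):
                factor = targets[rec[var]] / totals[rec[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:  # all margins matched in this sweep
            break
    return weights


def trim(weights, k=6.0):
    """Cap weights at median + k * IQR (assumed interpretation of trimming)."""
    q1, median, q3 = statistics.quantiles(weights, n=4)
    cap = median + k * (q3 - q1)
    return [min(w, cap) for w in weights]


# Toy sample of six participants and invented population totals
records = [
    {"sex": "F", "region": "city"}, {"sex": "F", "region": "city"},
    {"sex": "F", "region": "rural"}, {"sex": "M", "region": "city"},
    {"sex": "M", "region": "rural"}, {"sex": "M", "region": "rural"},
]
margins = {"sex": {"F": 60.0, "M": 40.0},
           "region": {"city": 70.0, "rural": 30.0}}
weights = rake(records, margins)
female_total = sum(w for r, w in zip(records, weights) if r["sex"] == "F")
city_total = sum(w for r, w in zip(records, weights) if r["region"] == "city")
```

Unlike post-stratification, raking only needs the marginal population totals for each characteristic, which is what makes the ‘full’ set of seven characteristics feasible without the full cross-classified table.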

Table 1. 45 and Up Study participants’ characteristics (2006–2009) used in fully raked weighting and comparison with Census data for the NSW and Australian populations

Post-stratification weighting

We created two post-stratification weights to match the NSW and Australian populations separately, using the same characteristics as for ‘basic raking’ (with a total of 2 × 9 × 4 = 72 combinations).
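As a minimal illustration of post-stratification, the sketch below assigns each participant a weight equal to the population count in their sex × age group × place of residence cell divided by the number of sample participants in that cell. The function name, cell labels and counts are hypothetical and are not taken from the study data.

```python
from collections import Counter


def poststratify(sample_cells, population_counts):
    """Weight = population count in a cell / sample count in that cell.

    sample_cells: one (sex, age group, place of residence) tuple per participant
    population_counts: {cell: population total} (hypothetical Census counts)
    """
    sample_counts = Counter(sample_cells)
    return [population_counts[cell] / sample_counts[cell]
            for cell in sample_cells]


# Toy example using two of the 72 possible cells, with invented counts
sample_cells = [("F", "45-49", "major city"),
                ("F", "45-49", "major city"),
                ("M", "80+", "remote")]
population_counts = {("F", "45-49", "major city"): 1000,
                     ("M", "80+", "remote"): 50}
ps_weights = poststratify(sample_cells, population_counts)
```

Within each cell the weights sum to the population count, so the weighted sample reproduces the joint distribution of the three characteristics rather than only their margins.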

Comparison of the prevalence of health characteristics and behaviours

To establish whether raking and post-stratification weighting improved the representativeness of the 45 and Up Study cohort, we compared distributions of participants’ health and lifestyle characteristics, which were not included in the raking weights, to those in the NDSHS and ANHS (listed in Table 2). All NDSHS and ANHS questionnaire items were examined for similarity to those in the 45 and Up Study. Six characteristics in both surveys were identified as moderately or highly comparable to the 45 and Up Study.

Table 2. 45 and Up Study participants’ socioeconomic, health and lifestyle characteristics (2006–2009) before and after applying fully raked weights, compared to those in the NDSHS and ANHS

The unweighted and weighted prevalence of each characteristic was estimated with 95% confidence intervals (95%CIs) in the 45 and Up Study using the SAS surveyfreq procedure. Weighted percentages and 95%CIs for these characteristics in the NDSHS were generated using weights provided in the dataset and the STATA `svy` function. For characteristics that were available from both NDSHS datasets, we estimated the prevalence separately for 2007 and 2010, and as these were similar for all characteristics, we used the averaged weighted estimates. We additionally compared estimates for eight characteristics in the 45 and Up Study to those in the ANHS that were not available in the NDSHS, including private health insurance, Department of Veterans’ Affairs (DVA) white or gold healthcare benefits cards, ever diagnosed with asthma or diabetes, number of alcoholic drinks per week, fruit and vegetable consumption and the main type of milk consumed. However, the ANHS data available to us did not include confidence intervals.

To summarise the overall effectiveness of post-stratification, basic raking and full raking in reducing the absolute difference between 45 and Up Study weighted estimates and population benchmark estimates, we calculated four measures based on all characteristics together: 1) the number of categories with overlapping 95% confidence intervals for the NDSHS population estimates and the weighted and unweighted 45 and Up Study estimates; 2) the number of categories for which the population benchmark estimates were within the 95% confidence intervals of the weighted and unweighted 45 and Up Study estimates; 3) the number of categories for which the weighted 45 and Up Study point estimates moved closer to the population benchmark estimates relative to the corresponding unweighted estimates; and 4) the median and interquartile range (IQR) for the absolute difference between the population benchmark estimates and the weighted and unweighted 45 and Up Study estimates.
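The four summary measures can be sketched in Python as follows. This is an illustrative outline only: the `Estimate` class, the function names and the three toy categories are hypothetical, and the quartiles follow the default method of Python's `statistics.quantiles`, which may differ from the method used in SAS or STATA.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Estimate:
    point: float
    lower: float  # lower 95% confidence limit
    upper: float  # upper 95% confidence limit


def overlaps(a, b):
    """True if the two confidence intervals share at least one value."""
    return a.lower <= b.upper and b.lower <= a.upper


def summarise(benchmarks, study):
    """Measures 1, 2 and 4 for one set of study estimates."""
    pairs = list(zip(benchmarks, study))
    diffs = [abs(s.point - b.point) for b, s in pairs]
    q1, _, q3 = statistics.quantiles(diffs, n=4)
    return {
        "ci_overlap": sum(overlaps(b, s) for b, s in pairs),
        "benchmark_in_ci": sum(s.lower <= b.point <= s.upper for b, s in pairs),
        "median_abs_diff": statistics.median(diffs),
        "iqr_abs_diff": (q1, q3),
    }


def moved_closer(benchmarks, unweighted, weighted):
    """Measure 3: categories where weighting reduced the absolute difference."""
    return sum(abs(w.point - b.point) < abs(u.point - b.point)
               for b, u, w in zip(benchmarks, unweighted, weighted))


# Three toy categories with invented percentages
benchmarks = [Estimate(20.0, 18.0, 22.0), Estimate(50.0, 48.0, 52.0),
              Estimate(10.0, 9.0, 11.0)]
unweighted = [Estimate(15.0, 14.5, 15.5), Estimate(51.0, 50.5, 51.5),
              Estimate(12.0, 11.5, 12.5)]
weighted = [Estimate(19.0, 18.2, 19.8), Estimate(53.0, 52.5, 53.5),
            Estimate(10.5, 9.9, 11.1)]
summary_unweighted = summarise(benchmarks, unweighted)
summary_weighted = summarise(benchmarks, weighted)
n_closer = moved_closer(benchmarks, unweighted, weighted)
```

Measure 1 requires confidence intervals for the benchmark, so in this study it would apply only to the NDSHS comparisons, whereas measures 2 to 4 need only the benchmark point estimates.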

Comparison of cancer incidence rates

We compared the unweighted and weighted cancer incidence in the 45 and Up Study to that for the NSW and Australian populations, separately for males, females, and each cancer type. We used indirect standardisation to estimate the standardised incidence ratio (SIR) by dividing the unweighted or weighted observed number of cancer cases (O) by the expected number (E) in the 45 and Up Study [23]. A detailed description of the method can be found in Additional file 4 (‘Calculation of standardised incidence ratios’). The expected numbers of new cancer cases were determined by multiplying the sex-age-specific incidence rates for the reference population by the unweighted or weighted person-years at risk in the study cohort. As noted above, the calculations for Australia used the NSW incidence rates as a proxy for Australian rates, and the 45 and Up Study sample weighted to the Australian population.
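The indirect standardisation described above can be sketched as follows, assuming reference rates expressed per 100,000 person-years and one record per participant. The record layout, function names and toy numbers are hypothetical. Setting every weight to 1 gives the unweighted SIR. Confidence intervals are not shown.

```python
from collections import defaultdict


def aggregate(records):
    """Sum weighted cases and weighted person-years by sex-age stratum.

    records: iterable of (stratum, years_at_risk, weight, is_case) tuples
    """
    person_years = defaultdict(float)
    observed = 0.0
    for stratum, years, weight, is_case in records:
        person_years[stratum] += weight * years
        if is_case:
            observed += weight
    return observed, dict(person_years)


def expected_cases(person_years, rates_per_100k):
    """E = sum over strata of reference rate x person-years at risk."""
    return sum(rates_per_100k[s] * py / 100_000
               for s, py in person_years.items())


def sir(records, rates_per_100k):
    """Standardised incidence ratio O / E."""
    observed, person_years = aggregate(records)
    return observed / expected_cases(person_years, rates_per_100k)


# Toy data: two strata with invented weights and deliberately large rates
records = [
    (("M", "45-49"), 4.0, 2.0, False),
    (("M", "45-49"), 2.0, 1.0, True),
    (("M", "50-54"), 5.0, 2.0, True),
]
rates_per_100k = {("M", "45-49"): 10_000.0, ("M", "50-54"): 20_000.0}
observed, person_years = aggregate(records)
expected = expected_cases(person_years, rates_per_100k)
ratio = sir(records, rates_per_100k)
```

An SIR of 1 indicates that the cohort experienced as many cases as would be expected had the reference population's rates applied to its person-years at risk.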

We calculated the confidence intervals for the SIRs using the Fieller-based method (see Additional file 4 for details). As the 45 and Up Study deliberately over-sampled individuals ≥ 80 years old, we used a second approach to verify the robustness of results (see Additional file 4). The weighted observed and expected numbers of cases were estimated using the STATA `svy` function [24].

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

