Deep Q-networks with web-based survey data for simulating lung cancer intervention prediction and assessment in the elderly: a quantitative study – BMC Medical Informatics and Decision Making

Jan 4, 2022

Data collection and preparation

Health-related survey data from the Behavioral Risk Factor Surveillance System (BRFSS) [22] were used in this study. BRFSS collects data from United States residents on health risk behaviors and chronic health conditions [22], covering the prevalence of lung cancer and many of its risk factors, such as age, body mass index, smoking frequency, smoking start age, smoking intensity, time since quitting smoking, personal cancer history, family history of cancer, e-cigarette use, asthma history and chronic obstructive pulmonary disease (COPD) history. The data selection flowchart is shown in Fig. 1. The whole survey population (14,043,816 cases) was aged 18 years and older. Of those, 47.39% (6,655,364 cases) were men and 52.61% (7,388,452 cases) were women. During data preprocessing, cases with missing values (e.g. missing smoking-related factors, gender or lung cancer screening) were excluded. The elderly population was defined as those aged 65 years and older, following the international age threshold for the elderly in developed countries. In total, 1,367,598 elderly cases were obtained, of which 48.36% (661,370 cases) were men. To analyze the specific characteristics of lung cancer incidence in the elderly, men aged 18 years and older, women aged 18 years and older and the whole population were included for comparison with the elderly. In all, five stratified groups were obtained in this study: men aged 65 years and older (elderly men), women aged 65 years and older (elderly women), men aged 18 years and older (men), women aged 18 years and older (women) and the whole population (all).
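The five groups overlap: an elderly man, for example, also counts among men and in the whole population. A minimal Python sketch of that membership rule follows. The argument names `age` and `sex` and the string labels are hypothetical and are not BRFSS variable names.

```python
def strata(age, sex):
    """Return every stratified group a respondent belongs to.

    `age` (in years) and `sex` ("male" or "female") are hypothetical
    inputs used for illustration; they are not BRFSS variable names.
    """
    groups = []
    if age >= 18:
        groups.append("all")
        groups.append("men" if sex == "male" else "women")
    if age >= 65:
        groups.append("elderly men" if sex == "male" else "elderly women")
    return groups


print(strata(70, "male"))    # ['all', 'men', 'elderly men']
print(strata(40, "female"))  # ['all', 'women']
```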

We also obtained environmental data from the US Environmental Protection Agency (EPA) website [23], covering particulate matter (PM), carbon monoxide (CO), lead (Pb), ozone, sulfur dioxide (SO2), nitrogen dioxide (NO2), 24-h average temperature, relative humidity, wind speed, duration of sunshine, precipitation, atmospheric pressure and indoor radon. The environmental data were linked to the BRFSS records by collection date, which integrated the two datasets.
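Linkage by collection date amounts to a join on a date key. The sketch below illustrates the idea with made-up records. All field names and values are hypothetical, and dropping survey cases that have no matching environmental record is an assumption, since the handling of unmatched dates is not described here.

```python
from datetime import date

# Hypothetical records; field names and values are illustrative only.
survey = [
    {"id": 1, "collected": date(2018, 3, 1), "smoker": True},
    {"id": 2, "collected": date(2018, 3, 2), "smoker": False},
    {"id": 3, "collected": date(2018, 3, 9), "smoker": True},
]
environment = [
    {"collected": date(2018, 3, 1), "pm": 12.0, "no2": 18.5},
    {"collected": date(2018, 3, 2), "pm": 9.5, "no2": 15.0},
]


def link_by_date(survey_rows, env_rows):
    """Attach the environmental measurements recorded on each survey's collection date."""
    env_by_date = {row["collected"]: row for row in env_rows}
    linked = []
    for row in survey_rows:
        env = env_by_date.get(row["collected"])
        if env is None:
            continue  # assumption: cases without an environmental record are dropped
        linked.append({**row, **env})
    return linked


linked = link_by_date(survey, environment)
```

Here `linked` holds the first two survey cases, each extended with the `pm` and `no2` values for its date; the third case has no environmental record and is left out.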

Data analysis

We adopted a DQN model to predict lung cancer intervention strategies and to assess intervention effects for people at high risk of lung cancer. The workflow of this study is shown in Fig. 2. Firstly, we separately screened for people at high risk of lung cancer in the five stratified groups. Secondly, DQN models were developed to deduce the lung cancer intervention strategy in each stratification. Thirdly, lung cancer incidences were computed according to the corresponding intervention strategy, and intervention effects were deduced through the DQN models. Lastly, we assessed the lung cancer intervention effects to derive the optimal intervention strategy.

Lung cancer high risk screening

Timely high-risk screening and early intervention [24] might reduce the incidence of lung cancer. The risk factors for lung cancer in elderly men and women were identified in our previous study [21]. In elderly men, smoking frequency and time since quitting (i.e. how long it had been since the respondent last smoked a cigarette) were the top two risk factors for lung cancer [21], and elderly men at high risk of lung cancer were screened according to these factors. In elderly women, the high risk factors were time since quitting and having smoked at least 100 cigarettes (i.e. at least 100 cigarettes in the respondent’s entire life) [21], and elderly women at high risk were screened in the same way. We obtained 103,629 high-risk elderly people and developed an intervention simulation to predict the optimal lung cancer intervention strategy in elderly men and women.

Deep Q-networks modelling

DQN is a value-based reinforcement learning algorithm that uses a convolutional neural network (CNN) to approximate value functions. The inputs of the DQN models were the risk factors of high-risk people obtained from our previous study [21], e.g. smoking frequency, cancer history, asthma history, radiation, use of e-cigarettes, time since quitting and physical activity. The outputs of the models were the optimal intervention strategies, which were deduced from the target value functions. The value functions were trained with the CNN to approach the maximal intervention effect as closely as possible.

We adopted the Q-learning method to develop the networks and compute the loss function, which is shown in Eq. (1). Q was the output value function of the neural network, which represented the maximum cumulative intervention effect of intervention strategy a from risk state s; Q(s, a; θi) was the output of the current network; Qi was the output of the target network; θi denoted the network parameters at iteration i; Li(θi) was the mean squared error between the target and current outputs; and ρ(s, a) was the probability distribution of risk state s and intervention strategy a.

$$L_{i} (\theta_{i} ) = E_{s,a\sim \rho ( \cdot )} [(Q_{i} - Q(s,a;\theta_{i} ))^{2} ]$$

(1)

We iteratively updated the network weights to minimize the loss function using stochastic gradient descent (SGD), with the gradient shown in Eq. (2). Q(s′, a′; θi-1) was the target network output; Q(s, a; θi) was the current network output; r was the intervention effect (reward) obtained by applying strategy a in state s; ε was the intervention environment, from which the next risk state s′ was drawn; and γ was the discount factor, between 0 and 1.

$$\nabla_{{\theta_{i} }} L_{i} (\theta_{i} ) = E_{s,a\sim \rho ( \cdot );s^{\prime}\sim \varepsilon } [(r + \gamma \mathop {\max }\limits_{a^{\prime}} Q(s^{\prime},a^{\prime};\theta_{i - 1} ) - Q(s,a;\theta_{i} ))\nabla_{{\theta_{i} }} Q(s,a;\theta_{i} )]$$

(2)

Through SGD, the current value function was brought as close as possible to the target value function. The target value Qi, defined in Eq. (3) for risk state s and intervention strategy a, was then used to deduce the optimal intervention strategy.

$$Q_{i} = E_{s^{\prime}\sim \varepsilon } [r + \gamma \mathop {\max }\limits_{a^{\prime}} Q(s^{\prime},a^{\prime};\theta_{i - 1} )|s,a]$$

(3)
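Equations (1) and (3) can be traced for a single sampled transition. In the sketch below, small lookup tables stand in for the current and target networks. The Q-values, the reward and the discount factor are made-up numbers chosen only so that the arithmetic is easy to follow.

```python
GAMMA = 0.9  # discount factor; illustrative value only

# Hypothetical Q-values for three risk states and two intervention strategies.
# q_current plays the role of Q(.; theta_i), q_target of Q(.; theta_{i-1}).
q_current = {"high": [0.2, 0.5], "low": [0.8, 0.6], "cancer": [0.0, 0.0]}
q_target = {"high": [0.1, 0.4], "low": [0.7, 0.9], "cancer": [0.0, 0.0]}


def td_target(reward, next_state, q_prev, gamma=GAMMA):
    """Sampled form of Eq. (3): r + gamma * max over a' of Q(s', a'; theta_{i-1})."""
    return reward + gamma * max(q_prev[next_state])


def squared_td_error(state, action, reward, next_state, q_now, q_prev):
    """One-sample estimate of the loss in Eq. (1)."""
    target = td_target(reward, next_state, q_prev)
    return (target - q_now[state][action]) ** 2


# One transition: high risk, strategy 1, effect r = 1.0, next state low risk.
y = td_target(1.0, "low", q_target)  # 1.0 + 0.9 * 0.9, about 1.81
loss = squared_td_error("high", 1, 1.0, "low", q_current, q_target)  # (1.81 - 0.5)^2, about 1.7161
```

The bracketed difference in Eq. (2) is the same quantity `y - q_current["high"][1]`, which scales the gradient of the current network output during each SGD step.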

The rectified linear unit (ReLU) was the activation function in this study and was applied in the convolutional layers. The model consisted of one input layer, three convolutional layers, one fully connected layer and one output layer. The input layer had 32 × 32 neurons, the three convolutional layers used kernels of 5 × 5, 4 × 4 and 3 × 3 respectively, and the output layer had four neurons. Ten-fold cross-validation was used to evaluate the model: the dataset was randomly divided into ten parts, and each part in turn was used for model testing while the other nine were used for model training. The models were trained with Python scripts and the PyTorch framework in an Ubuntu environment on the Docker platform. We separately trained five DQN models for elderly men, elderly women, men, women and the whole population. The intervention strategies of these five groups were derived from their respective DQN models.
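As a rough check on the layer dimensions, the spatial size of each feature map can be computed from the input size and the kernel sizes. The sketch below assumes a stride of 1 and no padding, neither of which is stated above, so the resulting sizes are illustrative rather than the authors' actual configuration.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after one convolution: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32  # 32 x 32 input
sizes = [size]
for kernel in (5, 4, 3):  # kernels of the three convolutional layers
    size = conv_output_size(size, kernel)  # assumed stride 1, no padding
    sizes.append(size)

print(sizes)  # [32, 28, 25, 23]
```

Under these assumptions the last convolutional layer produces 23 × 23 feature maps, which the fully connected layer would flatten before the four-neuron output layer.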

Intervention strategy optimization

(i) Intervention effect prediction

High risk was one risk state of lung cancer occurrence in this study; the other risk states were low risk and lung cancer. Once an intervention strategy was applied, the risk state might change, which was termed a risk state transition. The possible transitions from high risk were from high risk to low risk, from high risk to lung cancer, and from high risk to high risk. We used the probabilities of these risk state transitions to assess the effect of an intervention strategy. Intervention effect predictions were developed in the same way for each stratification.

(ii) Lung cancer intervention assessment

The probabilities of risk state transitions were assessed in the different groups. Fig. 3 describes the risk state transitions from high risk over multiple intervention cycles, where St was the set of risk states at time t and At was the set of intervention strategies at time t. We computed the probabilities of risk state transitions from high risk in the different intervention cycles, and we comprehensively assessed the intervention effects in elderly men and women using lung cancer incidence.

(iii) Optimal feedback

Based on the intervention effect assessment, we used the reduction in lung cancer incidence to reflect the effectiveness of an intervention strategy. The intervention strategy that brought a larger reduction in lung cancer incidence than all other strategies was considered the optimal intervention strategy. Otherwise, the intervention strategy was adjusted through a feedback mechanism. The whole process shown in Fig. 2 was repeated, and the intervention effect was comprehensively evaluated, until the optimal intervention strategy was deduced.
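The three steps above can be sketched with a toy calculation: given one-cycle transition probabilities out of the high-risk state, accumulate lung cancer incidence over several intervention cycles, then pick the strategy with the largest reduction against a baseline. All probabilities and strategy names are made up. Treating the probabilities as constant across cycles, and treating low risk and lung cancer as absorbing states, are simplifying assumptions that are not taken from the study.

```python
# Hypothetical one-cycle transition probabilities out of the high-risk state.
BASELINE = {"high risk": 0.80, "low risk": 0.10, "lung cancer": 0.10}
STRATEGIES = {
    "strategy A": {"high risk": 0.70, "low risk": 0.25, "lung cancer": 0.05},
    "strategy B": {"high risk": 0.60, "low risk": 0.38, "lung cancer": 0.02},
}


def cumulative_incidence(from_high, cycles):
    """Probability of reaching lung cancer within `cycles` intervention cycles.

    Assumes constant transition probabilities and that only people still in
    the high-risk state can move to lung cancer in a given cycle.
    """
    still_high, incidence = 1.0, 0.0
    for _ in range(cycles):
        incidence += still_high * from_high["lung cancer"]
        still_high *= from_high["high risk"]
    return incidence


def best_strategy(strategies, baseline_incidence, cycles):
    """Return the strategy with the largest incidence reduction, plus all reductions."""
    reductions = {
        name: baseline_incidence - cumulative_incidence(probs, cycles)
        for name, probs in strategies.items()
    }
    return max(reductions, key=reductions.get), reductions


baseline = cumulative_incidence(BASELINE, cycles=3)
choice, reductions = best_strategy(STRATEGIES, baseline, cycles=3)
```

With these numbers the baseline incidence over three cycles is about 0.244, and `choice` is "strategy B", whose reduction (about 0.205) exceeds that of strategy A (about 0.135).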

Model performance evaluation

To evaluate the models, we adopted ten-fold cross-validation. The accuracy and the area under the receiver operating characteristic curve (AUROC) were computed separately for each of the five models. We then compared the DQN models with support vector machines (SVM), random forest and multiple logistic regression in the five groups.
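For reference, the evaluation quantities named above can be written out in a few lines of standard-library Python. This is a generic sketch of a ten-fold split, accuracy and a pairwise AUROC, not the authors' evaluation code. The labels, scores and seed are made up.

```python
import random


def ten_fold_indices(n, seed=0):
    """Shuffle indices 0..n-1 and deal them into ten nearly equal folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[k::10] for k in range(10)]


def accuracy(labels, predictions):
    """Fraction of predictions that match the labels."""
    return sum(y == p for y, p in zip(labels, predictions)) / len(labels)


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


folds = ten_fold_indices(25)  # each fold serves once as the test set
labels = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1]
print(accuracy(labels, [1, 0, 0, 0]))  # 0.75
print(auroc(labels, scores))           # 0.75
```

In each of the ten rounds, one entry of `folds` would be held out for testing and the remaining nine used for training; the reported metrics are then summarized across rounds.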