Adverse effect risk scoring
We had three aeromedical experts score the individual adverse effects. All three were board-certified flight surgeons who had completed, or were in the process of completing, a second graduate residency in aerospace medicine. In total, we collected 1,152 individual adverse effects for scoring. Of these, 297 (25.8%) were scored the same by all three experts, another 627 (54.4%) differed only in that a single expert’s score was one risk level away from the other two (e.g., one expert scored “mildly distracting” while the other two scored “distracting”), and the remaining 228 (19.8%) had a larger discrepancy between judges. Collectively, 924 (80.2%) were scored the same by at least two of the three physicians, with any disagreeing physician providing a score within one severity level.
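The three agreement categories described above can be sketched in a few lines of Python. This is an illustration only: the function names and the integer severity scale are assumptions, and the example scores are hypothetical rather than study data.

```python
from collections import Counter

def agreement_category(scores):
    """Classify the three scores given to one adverse effect.

    Returns "unanimous" (all three equal), "one_off" (two raters agree and
    the third differs by exactly one level), or "larger" (anything else).
    """
    a, b, c = sorted(scores)
    if a == c:
        return "unanimous"
    # After sorting, a single dissenting score sits at one end of the triple.
    if (a == b and c - b == 1) or (b == c and b - a == 1):
        return "one_off"
    return "larger"

def summarize(ratings):
    """Return {category: (count, percent)} over all rated adverse effects."""
    counts = Counter(agreement_category(r) for r in ratings)
    total = len(ratings)
    return {
        k: (counts[k], round(100 * counts[k] / total, 1))
        for k in ("unanimous", "one_off", "larger")
    }

# Hypothetical scores on an integer severity scale (not study data).
example = [(2, 2, 2), (1, 2, 2), (0, 2, 2), (1, 2, 3)]
print(summarize(example))
```

Note that a triple such as (1, 2, 3), where no two raters agree, falls into the "larger" category under this reading of the text.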
We calculated the inter-rater reliability in R using Fleiss’ kappa for 3 raters and Krippendorff’s alpha. Overall, we observed fair reliability over the 1,152 adverse effect ratings, with kappa and alpha both equal to 0.294. This observation reflects the greatest limitation of the systematic risk assessment model, as adverse effect severity is a subjective measure determined by experts. The average scores for the raters were similar, although rater 1 scored adverse effects as more severe than the other two raters (rater 1 mean = 2.14 ± 1.13, rater 2 mean = 1.82 ± 1.27, rater 3 mean = 1.86 ± 1.03). Pairwise analyses of Cohen’s unweighted kappa suggest that the scores are similar (rater 1 vs. 2 kappa = 0.326, rater 1 vs. 3 kappa = 0.280, rater 2 vs. 3 kappa = 0.283); however, the pairwise rater biases were 0.257, 0.288, and 0.512 for rater 1 vs. 2, 1 vs. 3, and 2 vs. 3, respectively, suggesting rater 1 was substantially different from raters 2 and 3. The pairwise percent agreement was consistent across rater pairs: around 45% for perfect agreement (1 vs. 2 = 47.1%, 1 vs. 3 = 45.2%, and 2 vs. 3 = 44.2%) and around 88% when allowing for a deviation in severity rating of 1 score (1 vs. 2 = 90.3%, 1 vs. 3 = 88.3%, and 2 vs. 3 = 86.9%).
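For readers who want to see what these statistics measure, the following is a minimal Python sketch of Fleiss’ kappa and of pairwise percent agreement with a tolerance. It is not the R code used in the study; the function names are hypothetical, and it assumes every item was scored by the same number of raters on an integer scale.

```python
from collections import Counter

def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each holding one score per rater.

    Assumes every item has the same number of raters (at least two) and
    that more than one category is used overall (otherwise 1 - p_exp is 0).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({s for row in ratings for s in row})
    # counts[i][j] = number of raters who assigned item i to category j
    counts = [[Counter(row)[c] for c in categories] for row in ratings]
    # Observed agreement per item, then averaged over items
    p_items = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall share of each category
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    p_exp = sum(p * p for p in p_cat)
    return (p_bar - p_exp) / (1 - p_exp)

def percent_agreement(x, y, tolerance=0):
    """Percent of items where two raters' scores differ by at most `tolerance`."""
    hits = sum(abs(a - b) <= tolerance for a, b in zip(x, y))
    return 100 * hits / len(x)

# Hypothetical scores (not study data): three raters, four items.
items = [(2, 2, 2), (1, 2, 2), (0, 1, 3), (3, 3, 2)]
print(fleiss_kappa(items))
rater1, rater2 = [r[0] for r in items], [r[1] for r in items]
print(percent_agreement(rater1, rater2), percent_agreement(rater1, rater2, tolerance=1))
```

Calling `percent_agreement` with `tolerance=0` corresponds to the perfect-agreement figures above, and `tolerance=1` to the figures that allow a deviation of one severity score.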
Overall medication risk assessment
In total, we identified 103 medications for risk assessment in the study (83 unique to the three conditions). Of the 20 medications taken from the approved list provided by Prudhomme and colleagues (termed “PA”), four were also included in the other conditions (three for hypertension and one for diabetes); these medications were considered as part of the PA group for establishing control limits. The results from the blinded adverse effect scoring revealed a non-normally distributed range of scores for the PA group (Fig. 1) (mean: 235,806; standard deviation: 183,126; range: 40,320–643,100; median = 212,810; excess kurtosis = 0.88; skewness = 1.28; D’Agostino-Pearson omnibus test statistic = 7.02 (P = 0.03)). The risk scores were, however, log-normally distributed (log10 values: mean: 5.24; standard deviation: 0.36; range: 4.61–5.81; median = 5.33; excess kurtosis = -0.67; skewness = -0.25; D’Agostino-Pearson omnibus test statistic = 0.700 (P = 0.70)). We therefore used the log values to establish the UAL and UCL at risk scores of 601,109.5 and 2,097,721, respectively. While the absolute scores in our study are higher than those found in the Prudhomme and Huntsberger reports (Table 1) [5, 6], our blinded scoring system proved reliable, as the Pearson correlation between our scoring and the scores abstracted from Prudhomme equaled 0.557.
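As an illustration of how limits can be set on the log scale and converted back to raw risk scores, the sketch below takes the mean and standard deviation of the log10 scores and places each limit a fixed number of standard deviations above the mean. The multipliers of 1.5 (UAL) and 3 (UCL) are assumptions: this section does not state them, and they were chosen here only because they roughly reproduce the reported limits from the rounded summary statistics. The function names are hypothetical.

```python
import math
import statistics

def limits_from_summary(log_mean, log_sd, k_action=1.5, k_control=3.0):
    """Back-transform log10-scale limits to raw risk scores.

    k_action and k_control are ASSUMED multipliers, not values stated in the text.
    """
    ual = 10 ** (log_mean + k_action * log_sd)
    ucl = 10 ** (log_mean + k_control * log_sd)
    return ual, ucl

def log_control_limits(scores, k_action=1.5, k_control=3.0):
    """Compute the same limits directly from raw reference-group scores."""
    logs = [math.log10(s) for s in scores]
    mu = statistics.mean(logs)
    sd = statistics.stdev(logs)  # sample standard deviation
    return limits_from_summary(mu, sd, k_action, k_control)

# Using the rounded PA-group summary statistics reported in the text.
ual, ucl = limits_from_summary(5.24, 0.36)
print(f"UAL ~ {ual:,.0f}; UCL ~ {ucl:,.0f}")
```

Because the inputs are rounded to two decimals, the printed values land near, but not exactly on, the reported 601,109.5 and 2,097,721.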
Using the established control limits, 75 of the 103 medications (73%) were below the UAL (Fig. 2). Another 25 medications (24%) were between the UAL and the UCL, and the final 3 medications (3%) were above the UCL. Of the 25 mid-range medications, only a single medication (clonidine) had a score close to the UCL (within 10%). Finally, adverse effect frequencies were not available for dyphylline/guaifenesin or prednisone, so their risk scores could not be determined.
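The banding described above amounts to comparing each medication’s risk score against the two limits. A minimal sketch follows, using the UAL and UCL values reported in the text; the function names, the handling of scores that fall exactly on a limit, and the example scores are assumptions for illustration.

```python
UAL = 601_109.5   # upper action limit reported in the text
UCL = 2_097_721   # upper control limit reported in the text

def classify(score, ual=UAL, ucl=UCL):
    """Assign a risk score to one of the three bands.

    Scores exactly equal to a limit are placed in the middle band (an assumption).
    """
    if score < ual:
        return "below UAL"
    if score <= ucl:
        return "between UAL and UCL"
    return "above UCL"

def near_ucl(score, ucl=UCL, margin=0.10):
    """True if a mid-range score lies within `margin` (10%) below the UCL."""
    return (1 - margin) * ucl <= score <= ucl

# Hypothetical risk scores (not study data).
for name, score in [("drug A", 150_000), ("drug B", 1_950_000), ("drug C", 3_000_000)]:
    print(name, classify(score), near_ucl(score))
```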
Pharmacogenetic evidence level analysis
Overall, we identified 34 medications with pharmacogenetic evidence across the three conditions (9 each for diabetes and hypertension, 16 for asthma). Considering the clinical annotation levels of evidence as defined by PharmGKB (Table 2), none of the medications met the level 1 criteria and only four met level 2, all at level 2A (two diabetes medications and one medication each for asthma and hypertension). Two more diabetes medications had only level 4 evidence, and the remaining 28 medications had level 3 evidence.
Beyond the clinical levels of evidence, the 34 medications were associated with 165 genetic variants across 86 unique genes. There were 50 variants with clinical annotations for diabetes, 70 for asthma, and 45 for hypertension. As would be expected, two drug-metabolizing cytochrome P450 enzymes were commonly associated with clinical evidence: CYP2C8 for asthma and diabetes, and CYP2C9 for all three conditions. Asthma and hypertension also shared the angiotensin-converting enzyme (ACE) and beta-2-adrenergic receptor (ADRB2) genes.