Invasive fungal diseases (IFDs) are life-threatening infections, and their morbidity and mortality have increased in recent decades (1, 2). Candida species are the most common cause of IFDs (3). The incidence of candidaemia ranges from 2.4 to 15 per 100,000 individuals and has increased by 50% over the past 10 years (4–6). Approximately 45% of Candida bloodstream infections occur in critical care units, where they have become a leading cause of death among ICU patients (7). Previous studies have shown that early, optimal antifungal treatment can decrease patient mortality (8–10). A definitive diagnosis of candidaemia relies mainly on blood culture (11–13), which is slow and can therefore delay treatment. Early recognition is very difficult, and the indiscriminate use of antifungal agents can promote drug resistance and increase the patient’s economic burden. Therefore, a method that identifies patients with candidaemia faster than blood culture is needed.
Some predictive models for candidaemia have been proposed (14, 15), such as the Candida colonization index (CI) (9) and Candida score (CS) (16). However, most of the models used limited sample sizes because of the extremely low incidence of candidaemia (5, 6). Three predictive models (15, 17, 18) were built with large sample sizes and had a good negative predictive value of 99%, but the sensitivity and positive predictive value (PPV) were poor. When the specificity reached more than 80%, the sensitivity was only 40.5–51.4%, and the PPV varied from 4 to 9%. Previous studies tended to use traditional modeling methods, but the effectiveness of the models was insufficient.
Clinically, patients with candidaemia lack specific symptoms and signs. Systemic inflammatory response syndrome (SIRS) often prompts clinicians to start anti-infection treatment. When a patient develops SIRS, clinicians usually start antibacterial drugs first, whereas antifungal drugs are rarely started promptly and appropriately, which likely delays the treatment of patients with candidaemia. Therefore, doctors need to estimate the probability of candidaemia when a patient presents with SIRS. To date, however, no predictive model has used SIRS as the starting point for estimating the probability that a patient has candidaemia.
Machine learning algorithms can be applied to help understand large quantities of existing data and to make predictions about new data. Previous studies have used machine learning methods to diagnose or distinguish different types of diseases (19, 20). Because of the extremely low incidence of candidaemia, the development of a prediction model requires a very large sample size and must overcome the imbalance between positive and negative results. Machine learning may provide advantages in the construction of prediction models for candidaemia among ICU patients.
Therefore, this study aimed to use machine learning algorithms to establish a new model for predicting the probability of candidaemia in patients with SIRS, with the goals of improving on the performance of existing predictive models and supporting more precise, individualized antifungal prescribing.
Materials and Methods
This multicenter, retrospective study was performed using data from three hospitals (Peking Union Medical College Hospital, The Affiliated Hospital of Qingdao University, The First Affiliated Hospital of Fujian Medical University) obtained between January 2013 and December 2017.
Blood culture results and various influencing factors were retrospectively collected from the corresponding hospital information systems from patients who had been hospitalized in the ICU.
First, the patients’ data from the three hospitals were combined. Second, all the data were randomly divided into a training set and a validation set using a conventional 80/20 split: 80% for model training and 20% for model evaluation. Machine learning methods were used to train the prediction models on the training set, and the trained models were then applied to the validation set to evaluate their performance.
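The 80/20 division can be illustrated with a small standard-library sketch. It assumes a stratified split (the model-training description in this article mentions stratification), and the function name `stratified_split` and the random seed are hypothetical; this is not the authors' code.

```python
import random

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices into train/test while keeping the class ratio in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random assignment within each class
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels using the class counts reported in this study:
# 137 Candida-positive and 7,865 other blood-culture results.
labels = [1] * 137 + [0] * 7865
train_idx, test_idx = stratified_split(labels)
positives_in_test = sum(labels[i] for i in test_idx)
```

Splitting within each class separately keeps the rare positive cases represented in the evaluation set, which a purely random split does not guarantee.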
Ethics approval was provided by the ethics committee of Peking Union Medical College Hospital. All of the data were anonymized before sharing with researchers.
Patients who were admitted to the above target hospitals and had new-onset SIRS from 2013 to 2017 were selected as the subjects of the study. New-onset SIRS needed to meet the following criteria: (1) SIRS occurred in the ICU; (2) blood culture was obtained during the course of SIRS; (3) no previous SIRS within 24 h.
SIRS was defined as the presence of at least two of the following criteria (21): (1) body temperature >38°C or <36°C; (2) heart rate >90 beats/min; (3) respiratory rate >20 breaths/min or hyperventilation (PaCO2 <32 mmHg); and (4) leukocyte count >12 × 10⁹/L or <4 × 10⁹/L, or immature (band) neutrophils >10%.
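The "at least two of four" rule can be written as a short check. This is a minimal sketch: the function and parameter names are hypothetical, and treating an unmeasured value as "criterion not met" is an assumption that the source does not specify.

```python
def meets_sirs(temp_c=None, heart_rate=None, resp_rate=None, paco2_mmhg=None,
               wbc_10e9_per_l=None, band_pct=None):
    """Return True if at least two of the four SIRS criteria are met.

    A value of None means "not measured" and is treated as not meeting
    the criterion (an assumption made for this sketch).
    """
    criteria = [
        # (1) body temperature >38 C or <36 C
        temp_c is not None and (temp_c > 38 or temp_c < 36),
        # (2) heart rate >90 beats/min
        heart_rate is not None and heart_rate > 90,
        # (3) respiratory rate >20/min or PaCO2 <32 mmHg
        (resp_rate is not None and resp_rate > 20)
        or (paco2_mmhg is not None and paco2_mmhg < 32),
        # (4) leukocytes >12 or <4 (x 10^9/L), or band forms >10%
        (wbc_10e9_per_l is not None
         and (wbc_10e9_per_l > 12 or wbc_10e9_per_l < 4))
        or (band_pct is not None and band_pct > 10),
    ]
    return sum(criteria) >= 2
```

For example, `meets_sirs(temp_c=38.5, heart_rate=95)` satisfies criteria (1) and (2), whereas a heart rate of 95 alone does not reach the threshold of two criteria.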
SIRS can occur many times during a single hospitalization. To avoid repeated measurement, when multiple SIRS events occurred we counted as new-onset only those events that occurred after ICU admission and at least 24 h after the previous SIRS event. SIRS-related candidaemia was defined as the identification of a Candida species in blood samples collected during the SIRS episode.
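One way to read the new-onset rule is sketched below. It assumes that the 24 h gap is measured from the immediately preceding SIRS event, including events before ICU admission; the source does not state this detail, and the function name is hypothetical.

```python
from datetime import datetime, timedelta

def new_onset_sirs(sirs_times, icu_admission, min_gap=timedelta(hours=24)):
    """Return the SIRS event times that count as new-onset.

    An event qualifies if it occurs at or after ICU admission and at least
    `min_gap` after the immediately preceding SIRS event (assumed reading).
    """
    new_onset = []
    previous = None
    for t in sorted(sirs_times):
        gap_ok = previous is None or (t - previous) >= min_gap
        if t >= icu_admission and gap_ok:
            new_onset.append(t)
        previous = t  # every event, qualifying or not, resets the 24 h clock
    return new_onset

# Hypothetical timeline for one patient.
admission = datetime(2015, 1, 1, 8, 0)
events = [
    datetime(2015, 1, 1, 6, 0),   # before ICU admission
    datetime(2015, 1, 1, 10, 0),  # only 4 h after the previous event
    datetime(2015, 1, 2, 12, 0),  # 26 h after the previous event
    datetime(2015, 1, 2, 20, 0),  # only 8 h after the previous event
    datetime(2015, 1, 4, 0, 0),   # 28 h after the previous event
]
selected = new_onset_sirs(events, admission)
```

Under this reading, only the third and fifth events in the hypothetical timeline are counted.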
Two automated blood culture systems were used during the study period: a Bactec™ system (Becton Dickinson, Sparks, Maryland, USA) and a Bact/Alert®3D system (bioMérieux, Marcy l’Etoile, France).
Data Collection and Risk Factor Definitions
By searching previous studies, we identified 28 risk factors with strong clinical significance for candidaemia (see Table 1). The risk factors were divided into four groups: basic patient factors, primary or combined diseases, laboratory tests, and treatment. The corresponding data were collected retrospectively from the electronic medical record systems of the three hospitals. Colonization was defined as the presence of Candida species in non-significant samples taken from one or more body sites, including the oropharynx, stomach, urine, or tracheal aspirates (16). Samples were collected after ICU admission and before the collection of blood samples. Colonization cultures were obtained at the discretion of clinicians according to clinical need. Because colonization data were collected retrospectively from the ICU database, not all patients had cultures actively taken from the oropharynx, stomach, urine, or tracheal aspirates. A previous history of fungal infection was defined as invasive fungal disease before the current hospitalization, either recorded in the past medical history or reported by the patient.
1,3-β-D-glucan (BDG) was defined as positive with a cut-off value of 80 pg/ml (22). The measurement occurred after ICU admission and before blood samples were collected. If there was more than one BDG result, the BDG closest to the SIRS was chosen.
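The BDG selection rule can be sketched as follows. The function name and data layout are hypothetical, and two details are assumptions not stated in the source: a value equal to the cut-off is treated as positive, and a patient with no eligible measurement returns `None`.

```python
from datetime import datetime

def bdg_positive(bdg_results, icu_admission, blood_sample_time, sirs_time,
                 cutoff_pg_ml=80.0):
    """Classify BDG status from a list of (measurement_time, value_pg_ml).

    Only results taken after ICU admission and before blood sampling are
    eligible; among those, the one closest in time to the SIRS event is used.
    Returns True/False, or None if no eligible result exists.
    """
    eligible = [(t, v) for t, v in bdg_results
                if icu_admission <= t < blood_sample_time]
    if not eligible:
        return None
    _, value = min(eligible, key=lambda r: abs(r[0] - sirs_time))
    return value >= cutoff_pg_ml  # ">=" at the cut-off is an assumption

# Hypothetical measurements for one patient.
admission = datetime(2016, 3, 1, 8, 0)
sirs = datetime(2016, 3, 5, 9, 0)
blood_sample = datetime(2016, 3, 5, 10, 0)
results = [
    (datetime(2016, 2, 28, 9, 0), 150.0),  # before ICU admission: ignored
    (datetime(2016, 3, 2, 8, 0), 95.0),
    (datetime(2016, 3, 4, 20, 0), 62.0),   # closest eligible result to SIRS
    (datetime(2016, 3, 6, 9, 0), 200.0),   # after blood sampling: ignored
]
status = bdg_positive(results, admission, blood_sample, sirs)
```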
The model training code was written in Python (version 3.7.0). We divided the data into a training set (80%, for model training) and a test set (20%, for model evaluation), using stratified division to preserve the proportion of positive and negative cases in both sets. To address the class imbalance, we used the SMOTE algorithm (the mechanism of SMOTE is described in the Appendix). The training set was used to construct five prediction models: logistic regression, support vector machine, random forest, ExtraTree and XGBoost. A detailed description of the five models is provided in the Appendix. Parameter tuning was performed for each model to improve its performance.
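For readers unfamiliar with SMOTE, its core idea is to create synthetic minority-class samples by interpolating between a real minority sample and one of its nearest minority-class neighbours. The sketch below illustrates that idea with the standard library only; it is not the implementation used in the study, and the function name, `k` and the seed are hypothetical choices.

```python
import random

def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by neighbour interpolation.

    minority: list of feature vectors (lists of floats), at least two rows.
    """
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the chosen sample
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Toy minority class with two continuous features.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(minority, n_synthetic=10, k=2)
```

Oversampling of this kind is normally applied to the training set only, so that the test set keeps the original class distribution. Note also that plain interpolation yields fractional values for binary risk factors; how the study handled such variables is not described in the text.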
The test set was used to evaluate the performance of the five models. Five evaluation indices were used for the comparison: sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and area under the curve (AUC). The model with the best performance was chosen as the final model.
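The five indices can be computed from the test-set labels, the binary predictions and the predicted scores. The following is a minimal standard-library sketch with hypothetical function names and toy data; it assumes both classes and both predicted outcomes are present, so that no denominator is zero.

```python
def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV and NPV from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: five cases, scores thresholded at 0.3.
y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
y_pred = [1 if s >= 0.3 else 0 for s in scores]
metrics = confusion_metrics(y_true, y_pred)
area = auc(y_true, scores)
```

Sensitivity, specificity, PPV and NPV depend on the chosen cut-off, whereas the AUC summarizes ranking quality over all cut-offs.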
In total, 31,070 new-onset SIRS events in 28,143 patients were identified. After excluding 876 events in 860 patients younger than 14 years and 9,303 events that developed outside the ICU, 20,891 new-onset SIRS events remained. Of these, 8,002 had corresponding blood culture results: 137 were positive for Candida, and 7,865 were negative or positive for a pathogen other than Candida. The enrolment flowchart is shown in Figure 1.
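The enrolment counts can be checked with simple arithmetic. The short sketch below uses only figures stated in the text; the variable names are hypothetical.

```python
# Enrolment counts as reported in the text.
total_events = 31_070
events_under_14_years = 876
events_outside_icu = 9_303

eligible_events = total_events - events_under_14_years - events_outside_icu

candida_positive = 137
other_results = 7_865          # negative, or positive for another pathogen
events_with_culture = candida_positive + other_results

# Proportion of Candida-positive events among cultured and among eligible events.
rate_among_cultured = candida_positive / events_with_culture
rate_among_eligible = candida_positive / eligible_events

# Eligible events without a blood culture, by subtraction. The limitations
# paragraph cites 12,894; the source does not explain the difference of five.
events_without_culture = eligible_events - events_with_culture
```

The subtraction reproduces the 20,891 eligible events and the 8,002 events with culture results, and the two proportions are roughly 1.7% and 0.66%, respectively.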
The patients were all from the ICU, the median age was 57.4 years [39.9–74.9], and 61.2% were male.
Risk Factor Screening
We selected 28 risk factors through literature search, conducted retrospective data collection and analyzed the distribution of risk factors in different groups (Table 2).
Prediction Model Construction Using XGBoost
The area under the curve (AUC) of the XGBoost model ranged from 0.57 to 0.91 depending on which risk factors, ranked by importance score, were used as input (Table 3). By comparing the performance of models incorporating different numbers of risk factors, we chose 15 important risk factors to train the prediction models. The importance scores of the 15 risk factors are shown in Figure 2.
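Selecting the inputs by importance score amounts to ranking the risk factors and keeping the top k. The sketch below shows only that ranking step; the scores are invented placeholders for illustration (the real values are in Figure 2), and the function name is hypothetical.

```python
def top_k_features(importance, k):
    """Return the k feature names with the highest importance scores."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]

# Placeholder scores, NOT the values reported in the study.
importance = {
    "fungal_colonization": 0.30,
    "diabetes": 0.20,
    "acute_kidney_injury": 0.15,
    "total_parenteral_nutrition": 0.12,
    "renal_replacement_therapy": 0.10,
    "chemotherapy": 0.01,
}
selected = top_k_features(importance, k=3)
```

In a comparison such as the one summarized in Table 3, a model would be retrained on each candidate subset and its AUC recorded before fixing the number of inputs.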
Performance of the Models
The performance of the five models is shown in Table 4, and their receiver operating characteristic (ROC) curves are shown in Figure 3. With the cut-off value set to 0.030, XGBoost achieved a sensitivity of 84%, a specificity of 89% and a negative predictive value of 99.6%, the best prediction performance among the machine learning models and the traditional regression model.
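The cut-off converts a predicted probability into a positive or negative result. A minimal sketch is given below; the function name and the example probabilities are hypothetical, and treating a probability exactly equal to the cut-off as positive is an assumption.

```python
def classify(probabilities, cutoff=0.030):
    """Label each predicted probability as positive (1) or negative (0)."""
    return [1 if p >= cutoff else 0 for p in probabilities]

# Hypothetical predicted probabilities for four SIRS events.
predicted = [0.004, 0.029, 0.030, 0.410]
labels = classify(predicted)
```

A cut-off far below 0.5 is common when the outcome is rare, because it trades additional false positives for higher sensitivity.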
Figure 3. ROC of models. LR, Logistic regression; RF, Random Forest; SVM, Support Vector Machines; ET, ExtraTree.
This study established a machine learning candidaemia prediction model that can be implemented in a computer program. When an ICU patient develops SIRS, real-time bedside assessment of the probability of candidaemia can guide the appropriate use of antifungal drugs. To the best of our knowledge, this is the first machine learning-based model developed to predict candidaemia. The final model performed better than previous prediction models. Because its negative predictive value exceeded 99%, a negative result can effectively rule out candidaemia and thereby avoid unnecessary antifungal therapy.
Comparison of Different Candidaemia Prediction Models
Although predictive models for candidaemia have improved in the last few decades, most were trained by traditional logistic regression, and some have not been validated in large validation cohorts (8, 9).
Five well-accepted candidaemia prediction models were developed from 1994 to 2016 (9, 10, 16–18). Three of them (15, 17, 18) had a large sample size and a good negative predictive value of 99.7–99.9%, but their sensitivity and positive predictive value were poor: when the specificity exceeded 80%, the sensitivity was only 40.5–51.4%, and the PPV varied from 4 to 9%. León et al. constructed the “Candida score”, which achieved a sensitivity of 89%, a specificity of 74% and an AUC of 0.847 (16). Another study also produced a model with good performance (9). However, these two models were developed using only data from patients with Candida colonization and can therefore be applied only to restricted populations. In the present study, the XGBoost model performed well, with an AUC of 0.92, a sensitivity of 84%, a specificity of 89%, and a negative predictive value of 99.6%. The PPV (13%) was not high but was better than that of other prediction models (15, 17, 18). Because the negative predictive value was 99.6%, a negative result can effectively rule out candidaemia, indicating that antifungal therapy is not needed. The PPV was limited by the low number of patients with candidaemia in this study. A positive result corresponds to a 13% probability of candidaemia, which still substantially increases the likelihood that antifungal drugs are used effectively. Patients who test positive with our model could undergo a second evaluation with another prediction method that has a high positive predictive value, further improving detection efficiency.
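The dependence of PPV and NPV on prevalence follows from Bayes' rule and can be illustrated numerically. The sketch below uses the rounded sensitivity and specificity reported here and, as an assumption, the whole-cohort proportion of Candida-positive cultures (137/8,002) as the prevalence; the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

# Rounded values from the text; prevalence taken from the whole cohort (assumption).
ppv, npv = predictive_values(0.84, 0.89, 137 / 8002)
```

With these inputs the PPV comes out near 12% and the NPV near 99.7%, close to the reported 13% and 99.6%; the small differences are plausibly due to rounding and to the composition of the validation set. The calculation also shows why a rare outcome yields a high NPV and a modest PPV even when sensitivity and specificity are both good.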
Machine Learning Models in China
Because of the low incidence of candidaemia, previous prospective studies lacked large sample sizes and suffered from an imbalance between positive and negative samples. The FIRE study in the UK was a multicenter prospective study on invasive fungal disease that included 60,778 admissions from 96 critical care units (18). Although the study yielded good results, it required considerable economic and labor costs. Using an existing database to establish machine learning models not only reduces the economic cost of research but can also improve the performance of the resulting predictive models. In the validation cohort, the XGBoost model achieved the best prediction performance among the machine learning models and the traditional regression model, with an AUC of 0.92.
SIRS as a Starting Point
In clinical practice, the presence of SIRS in ICU patients often leads to suspected infection. SIRS meets clinical needs and has high clinical operability as the starting point to guide antifungal therapy. Additionally, the incidence of SIRS in ICU patients is >80% (23); thus, the proposed prediction model should apply to a wide range of individuals. The innovative use of SIRS as a trigger point to create a candidaemia prediction model, combined with machine learning algorithms, will maximize the use of ICU big data and improve the immediacy and accuracy of prediction.
Useful Software for Clinical Practice
Because this study used a machine learning method to establish the candidaemia predictive model, the result cannot be obtained simply by summing weighted scores of the risk factors but must be calculated by a program. When an ICU patient develops SIRS, the clinician can enter the corresponding risk factor values into the program, which automatically outputs a positive or negative prediction, thereby achieving real-time prediction at the bedside.
Risk Factors Related to Candidaemia
The most important risk factors in this predictive model included fungal colonization, diabetes, acute kidney injury, total parenteral nutrition and renal replacement therapy, which are consistent with previous studies (8, 24–27). However, some risk factors mentioned in previous studies were not included in our prediction model, such as the APACHE II score (9, 28) and severe sepsis (16).
This study has several limitations. First, to ensure the accuracy of the study, we excluded SIRS patients without blood samples and only enrolled new-onset SIRS patients with blood cultures obtained during the course of SIRS. We acknowledge that excluding the 12,894 SIRS events without blood samples may have introduced bias and influenced the performance of the prediction model. However, the 8,002 SIRS events analysed still constitute a relatively large sample for developing a prediction model. Additionally, the incidence of candidaemia among all SIRS patients was approximately 0.65% (137/20,891), which was similar to that in previous studies (0.15–0.65%) (29, 30). Second, blood cultures were not obtained for 12,894 patients with SIRS. In clinical practice, the presence of SIRS in ICU patients often leads to suspected infection. However, SIRS is not the only indicator that triggers blood culture, and the standards and practice of blood culture differ between clinicians. Hence, it is plausible that more than 50% of the SIRS events in the present study had no blood culture. Third, the study population comprised only ICU patients. Therefore, the results may not be generalizable to non-ICU patients. Fourth, the number of positive samples included in this study was relatively small because of the extremely low incidence of candidaemia, possibly affecting the performance of the prediction model. Therefore, we used SMOTE to reduce the imbalance between positive and negative samples and improve the performance of the model. Fifth, including patients from three hospitals may have introduced between-hospital bias. Adopting strict and consistent risk factor evaluation standards can reduce this bias, and the multicenter nature of the research can improve sample representativeness. Sixth, some of the risk factors, such as chemotherapy drugs, did not demonstrate significant differences because of their low incidence. These risk factors are less common in the overall ICU population; therefore, their importance is difficult to judge. Additionally, the data concerning colonization were collected retrospectively, possibly influencing the accuracy of this risk factor and the performance of the model. In the present study, the negative predictive value of BDG was high, partly because of the low incidence of candidaemia. This high negative predictive value partly contributes to the good performance of the prediction model, which had an NPV of 99.6%. Finally, retrospective studies have inherent data biases. Although the ICU database can ensure some measure of accuracy, the performance of the prediction model must be further evaluated in the future.
The machine learning prediction model for candidaemia has good efficiency and can guide antifungal treatment in ICU patients when new-onset SIRS occurs.
Approximately 45% of Candida bloodstream infections occur in critical care units and have become a leading cause of death among ICU patients. Previous prediction models of candidaemia mostly used traditional logistic models and had some limitations. In this study, we developed a machine learning model that predicts candidaemia in patients with new-onset systemic inflammatory response syndrome (SIRS) with good performance.
Data Availability Statement
The datasets presented in this article are not readily available, in order to protect patients’ privacy. Requests to access the datasets should be directed to firstname.lastname@example.org.
SY performed the experiments and statistical analysis and wrote the manuscript. YS and XX participated in the design of the study and statistical analysis. HH participated in the design of the study and helped to draft the manuscript. YL conceived of the study, participated in its design, and helped to draft and revise the manuscript. All authors have read and approved the final manuscript.
This work was supported by the Capital’s Funds for Health Improvement and Research (No. 2020-2-40111), the Medical and Health Science and Technology Innovation Project of the Chinese Academy of Medical Sciences (No. 2019-12M-1-001), the Excellence Program of Key Clinical Specialty of Beijing in 2020, and the Beijing Municipal Science and Technology Commission (Grant No. Z201100005520051).
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
We would like to express our gratitude to Pfizer and Happy Life Technology for their help.
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fmed.2021.720926/full#supplementary-material
2. Pittet D, Li N, Woolson RF, Wenzel RP. Microbiological factors influencing the outcome of nosocomial bloodstream infections: a 6-year validated, population-based model. Clin Infect Dis. (1997) 24:1068–78. doi: 10.1086/513640
3. Wisplinghoff H, Bischoff T, Tallent SM, Seifert H, Wenzel RP, Edmond MB. Nosocomial bloodstream infections in US hospitals: analysis of 24,179 cases from a prospective nationwide surveillance study. Clin Infect Dis. (2004) 39:309–17. doi: 10.1086/421946
4. Lortholary O, Renaudat C, Sitbon K, Madec Y, Denoeud-Ndam L, Wolff M, et al. Worrisome trends in incidence and mortality of candidemia in intensive care units (Paris area, 2002-2010). Intensive Care Med. (2014) 40:1303–12. doi: 10.1007/s00134-014-3408-3
7. Kibbler CC, Seaton S, Barnes RA, Gransden WR, Holliman RE, Johnson EM, et al. Management and outcome of bloodstream infections due to Candida species in England and Wales. J Hosp Infect. (2003) 54:18–24. doi: 10.1016/S0195-6701(03)00085-9
8. Agvald-Ohman C, Klingspor L, Hjelmqvist H, Edlund C. Invasive candidiasis in long-term patients at a multidisciplinary intensive care unit: Candida colonization index, risk factors, treatment and outcome. Scand J Infect Dis. (2008) 40:145–53. doi: 10.1080/00365540701534509
9. Pittet D, Monod M, Suter PM, Frenk E, Auckenthaler R. Candida colonization and subsequent infections in critically ill surgical patients. Ann Surg. (1994) 220:751–8. doi: 10.1097/00000658-199412000-00008
10. Ostrosky-Zeichner L, Sable C, Sobel J, Alexander BD, Donowitz G, Kan V, et al. Multicenter retrospective development and validation of a clinical prediction rule for nosocomial invasive candidiasis in the intensive care setting. Eur J Clin Microbiol Infect Dis. (2007) 26:271–6. doi: 10.1007/s10096-007-0270-z
12. Gits-Muselli M, Villiers S, Hamane S, Berçot B, Donay JL, Denis B, et al. Time to and differential time to blood culture positivity for assessing catheter-related yeast fungaemia: A longitudinal, 7-year study in a single university hospital. Mycoses. (2020) 63:95–103. doi: 10.1111/myc.13024
13. Beyda ND, Amadio J, Rodriguez JR, Malinowski K, Garey KW, Wanger A, et al. In Vitro Evaluation of BacT/Alert FA Blood Culture Bottles and T2Candida Assay for Detection of Candida in the Presence of Antifungals. J Clin Microbiol. (2018) 56:e00471-18. doi: 10.1128/JCM.00471-18
14. Dupont H, Bourichon A, Paugam-Burtz C, Mantz J, Desmonts JM. Can yeast isolation in peritoneal fluid be predicted in intensive care unit patients with peritonitis? Crit Care Med. (2003) 31:752–7. doi: 10.1097/01.CCM.0000053525.49267.77
15. Ostrosky-Zeichner L, Pappas PG, Shoham S, Reboli A, Barron MA, et al. Improvement of a clinical prediction rule for clinical trials on prophylaxis for invasive candidiasis in the intensive care unit. Mycoses. (2011) 54:46–51. doi: 10.1111/j.1439-0507.2009.01756.x
16. León C, Ruiz-Santana S, Saavedra P, Almirante B, Nolla-Salas J, Alvarez-Lerma F, et al. A bedside scoring system (“Candida score”) for early antifungal treatment in nonneutropenic critically ill patients with Candida colonization. Crit Care Med. (2006) 34:730–7. doi: 10.1097/01.CCM.0000202208.37364.7D
17. Paphitou NI, Ostrosky-Zeichner L, Rex JH. Rules for identifying patients at increased risk for candidal infections in the surgical intensive care unit: approach to developing practical criteria for systematic use in antifungal prophylaxis trials. Med Mycol. (2005) 43:235–43. doi: 10.1080/13693780410001731619
18. Shahin J, Allen EJ, Patel K, Muskett H, Harvey SE, Edgeworth J, et al. Predicting invasive fungal disease due to Candida species in non-neutropenic, critically ill, adult patients in United Kingdom critical care units. BMC Infect Dis. (2016) 16:480. doi: 10.1186/s12879-016-1803-9
19. Zhang Z, Liu J, Xi J, Gong Y, Zeng L, Ma P. Derivation and validation of an ensemble model for the prediction of agitation in mechanically ventilated patients maintained under light sedation. Crit Care Med. (2021) 49:e279–90. doi: 10.1097/CCM.0000000000004821
20. Zhang Z, Ho KM, Hong Y. Machine learning for the prediction of volume responsiveness in patients with oliguric acute kidney injury in critical care. Crit Care. (2019) 23:112. doi: 10.1186/s13054-019-2411-z
21. Bone RC, Balk RA, Cerra FB, Dellinger RP, Fein AM, Knaus WA, et al. Definitions for sepsis and organ failure and guidelines for the use of innovative therapies in sepsis. The ACCP/SCCM Consensus Conference Committee American College of Chest Physicians/Society of Critical Care Medicine. Chest. (1992) 101:1644–55. doi: 10.1378/chest.101.6.1644
22. Angebault C, Lanternier F, Dalle F, Schrimpf C, Roupie AL, Dupuis A, et al. Prospective Evaluation of Serum β-Glucan Testing in Patients With Probable or Proven Fungal Diseases. Open Forum Infect Dis. (2016) 3:ofw128. doi: 10.1093/ofid/ofw128
23. Raith EP, Udy AA, Bailey M, McGloughlin S, MacIsaac C, Bellomo R, et al. Prognostic accuracy of the SOFA score, SIRS criteria, and qSOFA score for in-hospital mortality among adults with suspected infection admitted to the intensive care unit. JAMA. (2017) 317:290–300. doi: 10.1001/jama.2016.20328
24. Blumberg HM, Jarvis WR, Soucie JM, Edwards JE, Patterson JE, Pfaller MA, et al. Risk factors for candidal bloodstream infections in surgical intensive care unit patients: the NEMIS prospective multicenter study. The national epidemiology of mycosis survey. Clin Infect Dis. (2001) 33:177–86. doi: 10.1086/321811
25. Chow JK, Golan Y, Ruthazer R, Karchmer AW, Carmeli Y, Lichtenberg DA, et al. Risk factors for albicans and non-albicans candidemia in the intensive care unit. Crit Care Med. (2008) 36:1993–8. doi: 10.1097/CCM.0b013e31816fc4cd
27. Jordà-Marcos R, Alvarez-Lerma F, Jurado M, Palomar M, Nolla-Salas J, León MA, et al. Risk factors for candidaemia in critically ill patients: a prospective surveillance study. Mycoses. (2007) 50:302–10. doi: 10.1111/j.1439-0507.2007.01366.x
28. Ibàñez-Nolla J, Nolla-Salas M, León MA, García F, Marrugat J, Soria G, et al. Early diagnosis of candidiasis in non-neutropenic critically ill patients. J Infect. (2004) 48:181–92. doi: 10.1016/S0163-4453(03)00120-8
29. Chakrabarti A, Sood P, Rudramurthy SM, Chen S, Kaur H, Capoor M, et al. Incidence, characteristics and outcome of ICU-acquired candidemia in India. Intensive Care Med. (2015) 41:285–95. doi: 10.1007/s00134-014-3603-2