Predicting acute kidney injury following open partial nephrectomy treatment using SAT-pruned explainable machine learning model – BMC Medical Informatics and Decision Making

May 16, 2022

Data acquisition

Since 1995, we have continuously extended our open PN database to include surgical and oncological parameters. For this study, we included all adult (> 18 years) patients who underwent open PN for an enhancing solid renal mass and divided them into AKI and non-AKI groups. Patients with a solitary kidney or multiple tumors were excluded. The resulting PN database includes 723 patients. Renal function was assessed the day before surgery, on the day of surgery, and daily thereafter until discharge, which was most often on post-operative day 3.

Operative technique

An extraperitoneal, extrapleural supra-11th rib incision was made on the operated side. Intravenous mannitol was given before clamping the renal artery. In situ renal hypothermia was achieved by cooling the surface of the kidney with ice slush for 10–15 min immediately after clamping the renal artery. The tumor was enucleated with a minimal rim of normal parenchyma. Renorrhaphy was performed using either 2/0 VICRYL interrupted sutures or the tissue adhesive BioGlue (CryoLife, Atlanta, GA). A more detailed description of the surgical technique has been previously published by our group [18].

Renal function assessment

Baseline serum creatinine (sCr) was measured the day before surgery. We used both the RIFLE (risk, injury, failure, loss of kidney function, and end-stage renal failure) [19] and AKIN (Acute Kidney Injury Network) [20] criteria to define AKI, comparing each post-operative renal function assessment to the baseline level. AKI was defined as the occurrence of any one of the following conditions: (1) an increase in sCr of at least 0.5 times the baseline value (i.e., to ≥ 1.5 times baseline) in the first week following surgery, (2) an increase in sCr of ≥ 0.3 mg/dl (≥ 26.5 µmol/l) above baseline within the 48 h post-operative window, or (3) a reduction of more than 25 percent in the estimated glomerular filtration rate within the 7 days after surgery. In total, 231 patients developed AKI according to these criteria and constituted the AKI group, while 492 did not develop AKI and were classified as non-AKI. A cohort of 723 patients is considered large enough for the methods described in the following sections [21].
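The three-part definition above can be sketched as a single predicate. This is only an illustration: the function name, the argument layout (one value per post-operative day, starting at day 1), and the reading of the 48 h window as the first two daily measurements are assumptions, not details given in the study.

```python
def meets_aki_definition(baseline_scr, post_op_scr, baseline_egfr, post_op_egfr):
    """Return True if any of the three AKI conditions holds.

    post_op_scr / post_op_egfr: one value per post-operative day (day 1 first).
    sCr values are in mg/dl. All names here are hypothetical.
    """
    week_scr = post_op_scr[:7]       # first week after surgery
    first_48h_scr = post_op_scr[:2]  # assumption: 48 h window = days 1 and 2
    week_egfr = post_op_egfr[:7]

    # (1) sCr rises by at least 0.5 x baseline within the first week
    relative_rise = any(s - baseline_scr >= 0.5 * baseline_scr for s in week_scr)
    # (2) sCr rises by at least 0.3 mg/dl within 48 h
    absolute_rise = any(s - baseline_scr >= 0.3 for s in first_48h_scr)
    # (3) eGFR falls by more than 25% within 7 days
    egfr_drop = any((baseline_egfr - e) / baseline_egfr > 0.25 for e in week_egfr)

    return relative_rise or absolute_rise or egfr_drop


# Illustrative values only, not patient data.
print(meets_aki_definition(1.0, [1.1, 1.2, 1.6], 80.0, [78.0, 75.0, 70.0]))
```

Keeping each condition as a separate named flag makes it easy to report which criterion triggered the AKI label for a given patient.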

Data split

To develop the ML algorithm, the study population was compiled into a data set and split into a training cohort, from which the proposed algorithm was derived, and a validation cohort, on which the model was applied and tested. The training cohort was a random sample of 80% of the data set, and the validation cohort comprised the remaining 20%. The division process was repeated 1000 times in search of the split that ensured no statistically significant differences between the two cohorts in demographics or AKI outcome. Specifically, the split was chosen to minimize the differences in age, gender, smoking years, and AKI between the training and validation cohorts. The distributions of age, smoking, gender, and AKI in both cohorts are shown in Eq. (1).

$$\left[ \begin{array}{lll} \mathbf{Parameter} & \mathbf{Training\;Cohort} & \mathbf{Validation\;Cohort} \\ \text{Age} & 61.23 \pm 12.05 & 60.92 \pm 13.14 \\ \text{Smoking} & 17.61\% \pm 38.15\% & 18.34\% \pm 38.74\% \\ \text{Gender} & \text{Male: } 37.98\%,\ \text{Female: } 62.02\% & \text{Male: } 35.45\%,\ \text{Female: } 64.55\% \\ \text{AKI} & 46.84\%\ \text{with AKI} & 38.54\%\ \text{with AKI} \end{array} \right]$$

(1)
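The repeated 80/20 split search can be sketched as follows. The study selected the split by checking for statistically significant differences; this sketch substitutes a simpler stand-in, the sum of standardized mean differences over the balancing variables, which is an assumption. The function names, field names, and synthetic patients are all hypothetical.

```python
import random
import statistics


def standardized_gap(train, valid, key):
    """Absolute difference in cohort means, scaled by the SD of the combined sample."""
    a = [row[key] for row in train]
    b = [row[key] for row in valid]
    spread = statistics.pstdev(a + b) or 1.0  # avoid dividing by zero for constant columns
    return abs(statistics.fmean(a) - statistics.fmean(b)) / spread


def best_split(rows, keys, train_fraction=0.8, repeats=1000, seed=0):
    """Try `repeats` random splits and keep the one with the smallest total gap."""
    rng = random.Random(seed)
    n_train = round(train_fraction * len(rows))
    best = None
    for _ in range(repeats):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        train, valid = shuffled[:n_train], shuffled[n_train:]
        score = sum(standardized_gap(train, valid, k) for k in keys)
        if best is None or score < best[0]:
            best = (score, train, valid)
    return best


# Synthetic patients for demonstration only (not study data).
rng = random.Random(42)
patients = [
    {"age": rng.gauss(61, 12), "smoker": int(rng.random() < 0.18),
     "male": int(rng.random() < 0.37), "aki": int(rng.random() < 0.32)}
    for _ in range(200)
]
keys = ["age", "smoker", "male", "aki"]
score, train, valid = best_split(patients, keys, repeats=200)
print(len(train), len(valid), round(score, 3))
```

A significance test per variable could replace `standardized_gap` without changing the search loop.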

Algorithm

We used the random forest (RF) ML algorithm [22]. We selected RF because it can provide healthcare professionals with a simple explanation of the model’s prediction while achieving good accuracy on a relatively small data set [23]. We applied the proposed binary AKI prediction decision tree (DT) algorithm to the training cohort and then validated it on the validation cohort, using the sklearn library with Python 3.5. The model’s hyper-parameters were determined using the grid search method [24] (see Sect. 2.9) with fivefold cross-validation on the training cohort to find the values that led to the best performance.

Feature selection

We performed feature selection in the following order: first, we manually filtered the features available before the surgery (denoted \(F\)). Afterward, we evaluated the model’s accuracy using one feature from \(F\) at a time. The feature that resulted in the model’s highest accuracy was chosen (\(F_{1}\)). Then, an additional feature from the remaining feature set (\(F \backslash F_{1}\)) was added to the chosen feature set from the previous step such that the model’s accuracy was the highest among all combinations. The process was repeated until the gain in the model’s accuracy upon adding a new feature became less than 1%.
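The greedy procedure described above corresponds to forward feature selection, sketched here in a model-agnostic form. The `score` callback stands in for training and evaluating the model on a feature subset; the feature names, the toy scorer, and the reading of "1%" as an absolute accuracy gain of 0.01 are assumptions made for illustration.

```python
def forward_select(features, score, min_gain=0.01):
    """Greedily add the feature that maximizes score(subset) until the gain drops below min_gain."""
    chosen = []
    best_score = 0.0
    remaining = list(features)
    while remaining:
        # Evaluate the current subset extended by each remaining feature.
        trial_scores = {f: score(chosen + [f]) for f in remaining}
        candidate = max(trial_scores, key=trial_scores.get)
        gain = trial_scores[candidate] - best_score
        # The first feature is always taken; later ones must clear the threshold.
        if chosen and gain < min_gain:
            break
        chosen.append(candidate)
        remaining.remove(candidate)
        best_score = trial_scores[candidate]
    return chosen, best_score


# Toy scorer: each hypothetical feature adds a fixed amount to a 0.5 baseline accuracy.
toy_gain = {"age": 0.15, "baseline_scr": 0.10, "tumor_size": 0.05, "bmi": 0.005}


def toy_score(subset):
    return 0.5 + sum(toy_gain[f] for f in subset)


selected, accuracy = forward_select(toy_gain, toy_score)
print(selected, round(accuracy, 3))
```

In practice `score` would wrap a cross-validated accuracy estimate, so each call is far more expensive than in this toy setup.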

Model pruning

After training the model, we transformed each DT in the RF into a corresponding Boolean satisfiability problem (SAT). Each branch was converted into a Boolean condition \(r\): \(x_{1} \wedge x_{2} \wedge \cdots \wedge x_{n}\), where \(\{x_{i}\}_{i=1}^{n}\) are the conditions in each node in the branch and \(r\) is the result label node. Branches with the same result label \(r\) were stitched together using the ‘or’ logical operator (\(\vee\)). Afterward, each Boolean condition was reduced to the minimal Boolean condition satisfied by the same inputs. The resulting Boolean condition was converted back into a DT.
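A minimal sketch of the branch-to-Boolean step follows. It treats each node condition as a Boolean variable, collects the branches per label into a disjunction of conjunctions, and simplifies with two elementary rules (merging terms that differ in one negated variable, and absorption). This is a simple stand-in and is not guaranteed to reach the minimal condition; the tree encoding and the condition names are hypothetical, and the conversion back into a DT is not shown.

```python
from itertools import combinations


def branches(tree, path=()):
    """Yield (label, literals) for every root-to-leaf branch.

    A leaf is a label string; an internal node is (condition, if_false, if_true).
    """
    if isinstance(tree, str):
        yield tree, frozenset(path)
        return
    condition, if_false, if_true = tree
    yield from branches(if_false, path + ((condition, False),))
    yield from branches(if_true, path + ((condition, True),))


def to_dnf(tree):
    """Group branches by label: each label maps to a set of conjunctions (an 'or' of 'and's)."""
    dnf = {}
    for label, literals in branches(tree):
        dnf.setdefault(label, set()).add(literals)
    return dnf


def reduce_terms(terms):
    """Simplify a disjunction of conjunctions without changing which inputs satisfy it."""
    terms = set(terms)
    changed = True
    while changed:
        changed = False
        for a, b in combinations(terms, 2):
            diff = a ^ b
            if len(diff) == 2 and len({name for name, _ in diff}) == 1:
                # (A and x) or (A and not x) == A
                terms -= {a, b}
                terms.add(a & b)
                changed = True
                break
            if a <= b or b <= a:
                # Absorption: A or (A and B) == A
                terms.discard(b if a <= b else a)
                changed = True
                break
    return terms


# Toy tree: the second split is redundant because both of its leaves predict AKI.
toy_tree = ("age>65", "non-AKI", ("baseline_scr>1.2", "AKI", "AKI"))
reduced = {label: reduce_terms(terms) for label, terms in to_dnf(toy_tree).items()}
print(reduced["AKI"])
```

On the toy tree the AKI condition collapses to the single test `age>65`, which is the kind of pruning the transformation is meant to expose.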

Statistical analysis

We performed fivefold cross-validation to evaluate the model’s accuracy. The data were divided into five cohorts, four of which were used as the training cohort and one as the testing cohort. The process was repeated five times, allowing each patient to be included in both the training and test cohorts. The receiver operating characteristic (ROC) curve was used to measure the model’s classification ability. At each point, the recall and precision were presented in correspondence with a specific decision threshold. The area under the ROC curve (AUC) was used to quantify the model’s classification ability. Finally, the importance of each feature was determined by the reduction in classification accuracy caused by removing the feature (e.g., information gain) [25].
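The two evaluation ingredients, fold construction and AUC, can be sketched without any library. The interleaved fold assignment (no shuffling) is an assumption made for brevity, and the AUC is computed through its rank interpretation: the probability that a randomly chosen positive case is scored above a randomly chosen negative one, with ties counted as one half. Function names are hypothetical.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Each sample lands in the test fold exactly once.
for train_idx, test_idx in k_fold_indices(10, k=5):
    print(test_idx, len(train_idx))

# Illustrative labels and scores only.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise form is quadratic in the number of samples, which is acceptable for a cohort of a few hundred patients.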

Hyper-parameter fine-tuning

We performed hyper-parameter fine-tuning using the grid search method, based on the model’s accuracy [24]. The grid search was performed on

$${\mathbb{H}} := [\mathrm{depth},\;\mathrm{MSPL},\;\mathrm{LC},\;n],$$

where depth is the depth of an individual DT; MSPL is the minimal number of samples per leaf; LC is the leaf count; and n is the number of trees in the RF model.
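A grid search of this kind might look as follows with sklearn. The mapping of the four hyper-parameters onto `RandomForestClassifier` arguments (depth to `max_depth`, MSPL to `min_samples_leaf`, LC to `max_leaf_nodes`, n to `n_estimators`) is an assumption, as are the candidate values and the synthetic data, none of which come from the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data; the study's patient data is not reproduced here.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Hypothetical candidate values for each hyper-parameter in the grid.
param_grid = {
    "max_depth": [3, 5],         # depth
    "min_samples_leaf": [1, 5],  # MSPL
    "max_leaf_nodes": [8, 16],   # LC
    "n_estimators": [10, 30],    # n
}

# Exhaustive search scored by accuracy with fivefold cross-validation.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="accuracy",
    cv=5,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The grid grows multiplicatively with each added candidate value, so the ranges are usually kept small when the search is combined with cross-validation.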