Evaluation of the ML models
We construct separate predictive models for two tasks: classification and regression. In the former case, the compounds are assigned to one of the metabolic stability classes (stable, unstable, and of middle stability) according to their half-lifetime (the T1/2 thresholds used for the assignment to particular stability class are provided in the Methods section), and the prediction power of ML models is evaluated with the Area Under the Receiver Operating Characteristic Curve (AUC) . In the case of regression studies, we assess the prediction correctness with the use of the Root Mean Square Error (RMSE); however, during the hyperparameter optimization we optimize for the Mean Square Error (MSE). Analysis of the dataset division into the training and test set as the possible source of bias in the results is presented in the Appendix 1. The model evaluation is presented in Fig. 1, where the performance on the test set of a single model selected during the hyperparameter optimization is shown.
In general, the predictions of compound half-lifetimes are satisfactory with AUC values over 0.8 and RMSE below 0.4–0.45. These are slightly higher values than AUC reported by Schwaighofer et al. (0.690–0.835), although datasets used there were different and the model performances cannot be directly compared . All class assignments performed on human data are more effective for KRFP with the improvement over MACCSFP ranging from ~ 0.02 for SVM and trees up to 0.09 for Naïve Bayes. Classification efficiency performed on rat data is more consistent for different compound representations with AUC variation of around 1 percentage point. Interestingly, in this case MACCSFP provides slightly more effective predictions than KRFP. When particular algorithms are considered, trees are slightly preferred over SVM (~ 0.01 of AUC), whereas predictions provided by the Naïve Bayes classifiers are worse—for human data up to 0.15 of AUC for MACCSFP. Differences for particular ML algorithms and compound representations are much lower for the assignment to metabolic stability class using rat data—maximum AUC variation is equal to 0.02.
When regression experiments are considered, the KRFP provides better half-lifetime predictions than MACCSFP for 3 out of 4 experimental setups—only for studies on rat data with the use of trees, the RMSE is higher by 0.01 for KRFP than for MACCSFP. There is ~ 0.02–0.03 RMSE difference between trees and SVMs with the slight preference (lower RMSE) for SVM. SVM-based evaluations are of similar prediction power for human and rat data, whereas for trees, there is ~ 0.03 RMSE difference between the prediction errors obtained for human and rat data.
Regression vs. classification
Besides performing ‘standard’ classification and regression experiments, we also pose an additional research question related to the efficiency of the regression models in comparison to their classification counterparts. To this end, we prepare the following analysis: the outcome of a regression model is used to assign the stability class of a compound, applying the same thresholds as for the classification experiments. Accuracy of such classification is presented in Table 1.
Analysis of the classification experiments performed via regression-based predictions indicate that depending on the experimental setup, the predictive power of particular method varies to a relatively high extent. For the human dataset, the ‘standard classifiers’ always outperform class assignment based on the regression models, with accuracy difference ranging from ~ 0.045 (for trees/MACCSFP), up to ~ 0.09 (for SVM/KRFP). On the other hand, predicting exact half-lifetime value is more effective basis for class assignment when working on the rat dataset. The accuracy differences are much lower in this case (between ~ 0.01 and 0.02), with an exception of SVM/KRFP with difference of ~ 0.75. The accuracy values obtained in classification experiments for the human dataset are similar to accuracies reported by Lee et al. (75%)  and Hu et al. (75–78%) , though one must remember that the datasets used in these studies are different from ours and therefore a direct comparison is impossible.
Global analysis of all ChEMBL data
We analyzed the predictions obtained on the ChEMBL data with the use of SHAP values in order to find these substructural features, which have the highest contribution to particular class assignment (Fig. 2) or prediction of exact half-lifetime value (Fig. 3); class 0—unstable compounds, class 1—compounds of middle stability, class 2—stable compounds.
Analysis of Fig. 2 reveals that among the 20 features which are indicated by SHAP values as the most important overall, most features contribute rather to the assignment of a compound to the group of unstable molecules than to the stable ones—bars referring to class 0 (unstable compounds, blue) are significantly longer than green bars indicating influence on classifying compound as stable (for SVM and trees). However, we stress that these are averaged tendencies for the whole dataset and that they consider absolute values of SHAP. Observations for individual compounds might be significantly different and the set of highest contributing features can vary to high extent when shifting between particular compounds. Moreover, the high absolute values of SHAP in the case of the unstable class can be caused by two factors: (a) a particular feature makes the compound unstable and therefore it is assigned to this class, (b) a particular feature makes compound stable—in such case, the probability of compound assignment to the unstable class is significantly lower resulting in negative SHAP value of high magnitude.
For both Naïve Bayes classifier as well as trees it is visible that the primary amine group has the highest impact on the compound stability. As a matter of fact, the primary amine group is the only feature which is indicated by trees as contributing mostly to compound instability. However, according to the above-mentioned remark, it suggests that this feature is important for unstable class, but because of the nature of the analysis it is unclear whether it increases or decreases the possibility of particular class assignment.
Amines are also indicated as important for evaluation of metabolic stability for regression models, for both SVM and trees. Furthermore, regression models indicate a number of nitrogen- and oxygen-containing moieties as important for prediction of compound half-lifetime (Fig. 3). However, the contribution of particular substructures should be analyzed separately for each compound in order to verify the exact nature of their contribution.
In order to examine to what extent the choice of the ML model influences the features indicated as important in particular experiment, Venn diagrams visualizing overlap between sets of features indicated by SHAP values are prepared and shown in Fig. 4. In each case, 20 most important features are considered.
When different classifiers are analyzed, there is only one common feature which is indicated by SHAP for all three models: the primary amine group. The lowest overlap between pairs of models occurs for Naïve Bayes and SVM (only one feature), whereas the highest (8 features) for Naïve Bayes and trees. For SVM and trees, the SHAP values indicate 4 common features as the highest contributors to the assignment to particular stability class. Nevertheless, we should remember that for Naïve Bayes the prediction accuracy was significantly lower than for SVM or trees; and therefore, the features indicated by this approach are also less reliable.
Finally, 4 features are common for SVM and trees in the case of regression experiments: the already mentioned primary amine group, alkoxy-substituted phenyl, secondary amine, and ester. This is in line with the intuition on the possible transformations that can occur for compounds containing these chemical moieties.
In order to verify the applicability of the developed methodology on particular case, we analyze the output of an example compound (Fig. 5).
The highest contribution to the stability of CHEMBL2207577 is indicated to be the aromatic ring with the chlorine atom attached (feature 3545) and thiophen (feature 1915), the secondary amine (feature 677) lowers the probability of assignment to the stable class. All these features are present in the examined compounds and their metabolic stability indications are already known by chemists and they are in line with the results of the SHAP analysis.
The results of all experiments can be analyzed in detail with the use of the web service, which can be found at https://metstab-shap.matinf.uj.edu.pl/. In addition, the user can submit their own compound and its metabolic stability will be evaluated with the use of the constructed models and the contribution of particular structural features will be evaluated with the use of the SHAP values (Fig. 6). Moreover, in order to enable manual comparisons, the most similar compound from the ChEMBL set (in terms of the Tanimoto coefficient calculated on Morgan fingerprints) is provided for each submitted compound (if the similarity is above the 0.3 threshold).
Obtaining such information enables optimization of metabolic stability as the substructures influencing this parameter are detected. Moreover, the comparison of several ML models and compound representations allows to provide a comprehensive overview of the problem.
An example analysis of the output of the presented web service and its application in the compound optimization in terms of its metabolic stability is presented in Fig. 7.
The analysis of the submitted compound (evaluated in the classification studies as stable) indicates that the highest positive contribution to its metabolic stability has benzaldehyde moiety, and the feature which has a negative contribution to the assignment to the stable class is aliphatic sulphur. The most similar compound from the ChEMBL dataset is CHEMBL2315653, which differs from the submitted compound only by the presence of a fluorine atom. For this compound, the substructure indicated as the one with the highest positive contribution to compound stability is fluorophenyl. Therefore, the proposed structural modifications of the submitted compound involves the addition of the fluorine atom to the phenyl ring and the substitution of sulfone by ketone.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.