Machine learning (ML) has emerged as a promising approach for predicting the properties of small molecules in drug discovery. In this article, we offer a comprehensive overview of the ML techniques developed for this purpose in recent years. We cover a wide range of properties, including binding affinity, solubility, and ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity). We also discuss popular datasets, as well as molecular descriptors and embeddings, ranging from chemical fingerprints to representations learned by graph neural networks.
We then address the challenges of predicting and optimizing multiple properties simultaneously during the hit-to-lead and lead optimization stages of drug discovery, and briefly explore multi-objective optimization techniques that can balance competing properties while optimizing lead candidates. Furthermore, we evaluate techniques for interpreting model predictions, which is particularly important when those predictions inform critical decisions in drug discovery.
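To make "balancing diverse properties" concrete, the following is a minimal sketch of one common multi-objective idea, Pareto (non-dominated) filtering, and is not presented as the method used in this review. It assumes that each candidate already has predicted scores for several properties, normalized so that higher is better for every objective; the compound names, the three objectives, and the score values are all hypothetical.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if score vector `a` Pareto-dominates `b`.

    Assumes every objective is to be maximized: `a` must be at least as good
    as `b` on all objectives and strictly better on at least one.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: dict[str, tuple[float, ...]]) -> list[str]:
    """Return the names of candidates not dominated by any other candidate."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(other, s) for o, other in scores.items() if o != name)
    ]


# Hypothetical predicted scores: (potency, solubility, safety), each normalized
# so that higher is better. These are illustrative values, not real data.
candidates = {
    "cmpd_A": (0.9, 0.2, 0.6),
    "cmpd_B": (0.7, 0.7, 0.7),
    "cmpd_C": (0.6, 0.6, 0.5),  # worse than cmpd_B on every objective
    "cmpd_D": (0.4, 0.9, 0.8),
}

print(pareto_front(candidates))  # ['cmpd_A', 'cmpd_B', 'cmpd_D']
```

Pareto filtering keeps every trade-off that is not strictly worse than another option and so avoids fixing property weights in advance; a weighted sum of (desirability-transformed) scores is a common alternative when a single ranking is needed.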
Overall, this review offers insights into the landscape of ML models for predicting small-molecule properties in drug discovery. Although numerous approaches exist, they often perform comparably. Notably, neural networks, despite their flexibility, do not consistently outperform simpler models, which underscores the importance of high-quality data for training accurate models. Standardized benchmarks, additional performance metrics, and best practices are also needed to enable more comprehensive comparisons between techniques and models and to clarify how they differ.