Version 1
: Received: 15 February 2021 / Approved: 16 February 2021 / Online: 16 February 2021 (10:04:48 CET)
Version 2
: Received: 18 February 2021 / Approved: 18 February 2021 / Online: 18 February 2021 (16:05:17 CET)
Version 3
: Received: 23 February 2021 / Approved: 24 February 2021 / Online: 24 February 2021 (13:14:01 CET)
García-Sosa, A.T. Androgen Receptor Binding Category Prediction with Deep Neural Networks and Structure-, Ligand-, and Statistically Based Features. Molecules 2021, 26, 1285.
Abstract
Substances that can modify the androgen receptor (AR) pathway in humans and animals are entering the environment and food chain with the proven ability to disrupt hormonal systems, leading to toxicity and adverse effects on reproduction, brain development, and prostate cancer, among others. State-of-the-art databases with experimental data on the effects of chemicals in humans, chimpanzees, and rats have been used to build machine learning classifiers and regressors and to evaluate these on independent sets. Different featurizations, algorithms, and protein structures lead to different results, with deep neural networks (DNNs) on user-defined, physicochemically relevant features developed for this work outperforming graph convolutional, random forest, and large featurizations. The results show that these user-provided structure-, ligand-, and statistically based features and specific DNNs provided the best results as determined by AUC (0.87), MCC (0.47), and other metrics, and by the interpretability and chemical meaning of the descriptors/features. In addition, the same features performed better in the DNN method than in a multivariate logistic model: validation MCC = 0.468 and training MCC = 0.868 for the present work, compared to evaluation set MCC = 0.2036 and training set MCC = 0.5364 for the multivariate logistic regression on the full, unbalanced set. Techniques of this type may improve AR and toxicity description and prediction, improving assessment and design of compounds. Source code and data are available at https://github.com/AlfonsoTGarcia-Sosa/ML
Keywords
Machine Learning; Artificial Intelligence; Androgen Receptor; Random Forest; Deep Neural Network; Convolutional
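The abstract reports the Matthews correlation coefficient (MCC) on an unbalanced set. As a minimal illustration of why MCC is informative in that setting, the sketch below computes MCC and accuracy from confusion-matrix counts. The function names and the counts are hypothetical and chosen only for illustration; they are not taken from the paper or its repository.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from binary confusion-matrix counts; ranges from -1 to +1."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical unbalanced set: 5 binders, 95 non-binders.
# A classifier that always predicts "non-binder" gets these counts.
majority_only = dict(tp=0, tn=95, fp=0, fn=5)

print("accuracy:", accuracy(**majority_only))
print("MCC:", matthews_corrcoef(**majority_only))
```

In this made-up case the accuracy is high even though the classifier never identifies a binder, while the MCC is zero, which is one reason MCC is often preferred over accuracy for unbalanced data.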
Subject
Medicine and Pharmacology, Immunology and Allergy
Copyright:
This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Commenter: Alfonso T. Garcia-Sosa
Commenter's Conflict of Interests: Author