Pharmaceutical pollution is an increasing threat to aquatic environments because inactive compounds do not break down and drug products accumulate in living organisms. A drug's ability to dissolve in water, expressed as LogS, is an important parameter for assessing its environmental fate, bioavailability, and toxicity. LogS is typically measured in the laboratory, which is costly and time-consuming and does not scale to large analyses. This research develops and evaluates machine learning models that estimate LogS and may improve environmental risk assessments of toxic pharmaceutical pollutants. We used a dataset of 8832 molecular compounds from the ChEMBL database. The data were preprocessed and cleaned (for example, by removing missing values), and the chemical properties were then normalized and reduced with feature selection techniques. We predicted LogS with several machine learning and deep learning models, including linear regression, random forests (RF), support vector machines (SVM), gradient boosting (GBM), and artificial neural networks (ANNs). We assessed model performance using root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²). The Least Angle Regression (LAR) model performed best, with an R² value close to 1.0000, indicating high predictive accuracy. The Orthogonal Matching Pursuit (OMP) model achieved good accuracy (R² = 0.8727) while remaining computationally cheap, whereas other models (e.g., neural networks, random forests) performed well but were too computationally expensive. Finally, to assess the robustness of the results, an error analysis indicated that residuals were evenly distributed around zero, supporting the results from the LAR model. This research illustrates the potential of AI for anticipating drug solubility, providing support for green pharmaceutical design and environmental risk assessment. Future work should extend the predictions to degradation and toxicity to enhance predictive power and applicability.
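The three evaluation metrics named in the abstract can be made concrete with a short sketch. The function name `regression_metrics` and the LogS values below are hypothetical and invented for illustration only; they are not taken from the study's data or code.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE and R^2 for paired observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical LogS values (log10 of molar solubility), for illustration only.
observed = [-2.0, -3.0, -4.0, -5.0]
predicted = [-2.5, -3.0, -3.5, -5.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

RMSE and MAE are in LogS units, with RMSE penalizing large errors more heavily, while R² is unitless and reaches 1 only when every prediction matches the observed value exactly.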