Abstract
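To predict the absorption of molecules in the human small intestine, 102 molecular descriptors characterizing electronic, topological, geometric, and molecular-shape features were calculated, and a genetic-algorithm variable selection method reduced them to 47. The data set contained 230 compounds: 69 that are not absorbed (HIA-) and 161 that are absorbed (HIA+). The resulting SVM model was validated by five-fold cross-validation and by an independent test set, with prediction accuracies of 79.1% and 77.1%, respectively, showing good agreement. During model validation, cluster analysis was used to assemble the training and test sets, which ensured the stability of the model.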
A set of molecular descriptors, including electronic descriptors, topological descriptors, geometric descriptors, and molecular shape indices, was calculated to characterize the structural and physicochemical properties of 230 chemical compounds. The Support Vector Machine (SVM) classification method was employed to predict the human intestinal absorption of molecules. Five-fold cross-validation was used to optimize the SVM model, and a genetic algorithm was used for variable selection, which reduced the number of molecular descriptors from 102 to 47. Five-fold cross-validation and an independent evaluation set were used to test the SVM model, with the training sets chosen effectively and evenly in descriptor space by clustering based on chemical similarity; both test methods gave consistent results. Our work suggests that a proper choice of training set for five-fold cross-validation or the independent test method can improve the efficiency of SVM model building.
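The cluster-based choice of training and test sets described above can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm or distance measure was used, so plain k-means on Euclidean distance is an assumption here, as are the function names `kmeans` and `cluster_folds`, the number of clusters, and the random synthetic descriptor values (three per compound rather than 47, for brevity). The idea is only that compounds from each cluster are dealt round-robin into the five folds, so every fold covers descriptor space evenly.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm); returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct points as initial centers
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def cluster_folds(points, k_clusters=4, n_folds=5, seed=0):
    """Assign each point to a fold so that every cluster is spread evenly over folds."""
    labels = kmeans(points, k_clusters, seed=seed)
    folds = [None] * len(points)
    counter = 0  # running counter keeps overall fold sizes balanced too
    for c in range(k_clusters):
        for i, lab in enumerate(labels):
            if lab == c:
                folds[i] = counter % n_folds
                counter += 1
    return labels, folds

# Synthetic stand-in for the descriptor matrix: 230 compounds, 3 descriptors each.
# These values are random placeholders, not data from the paper.
rng = random.Random(1)
points = [tuple(rng.random() for _ in range(3)) for _ in range(230)]
labels, folds = cluster_folds(points)

for f in range(5):
    test_idx = [i for i, fold in enumerate(folds) if fold == f]
    train_idx = [i for i, fold in enumerate(folds) if fold != f]
    print(f"fold {f}: {len(train_idx)} training, {len(test_idx)} test compounds")
```

Each fold in turn serves as the test set while the other four form the training set; a classifier such as an SVM would be fitted on `train_idx` and scored on `test_idx`. An independent evaluation set can be obtained the same way by holding out one fold permanently.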
Source
《化学研究与应用》
CAS
CSCD
PKU Core (Peking University Core Journals)
2005, Issue 2, pp. 176-179 (4 pages)
Chemical Research and Application
Funding
National Natural Science Foundation of China (Grant No. 20473054)