A Novel QSAR Model Based on Geostatistics and Support Vector Regression
Cited by: 13
Abstract Based on principal component analysis (PCA), geostatistics (GS), and support vector regression (SVR), a novel individualized prediction method for quantitative structure-activity relationship (QSAR) modeling, Weight-PCA-GS-SVR, is proposed. The procedure is as follows. First, PCA reduces the dimensionality and removes redundant information among the independent descriptors. Second, nonlinear screening with SVR removes the principal components that are unrelated to the activity (the dependent variable). Third, weighted distances between samples are computed from the retained principal components. Fourth, a common range is determined using high-dimensional geostatistics. Finally, each test sample, taken as the center, selects from the training set its own k nearest neighbors, namely the training samples whose weighted distance to it is shorter than the common range, and an SVR model trained on these neighbors makes the individualized prediction. Weight-PCA-GS-SVR optimizes the model along both the row (sample) and column (descriptor) directions, provides a new weighting method for the independent variables, offers a new approach to the problem of choosing the optimal k nearest neighbors, and retains the advantages of SVR. On three compound-activity data sets, the new method achieved the highest prediction accuracy among all reference models and clearly outperformed results reported in the literature. Weight-PCA-GS-SVR therefore has broad application prospects in QSAR and other regression prediction fields.
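The neighbor-selection and local-modeling steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the component weights or the common range are computed, so both are passed in as plain parameters (`weights`, `common_range`), and the SVR-based component screening and the geostatistical range estimation are omitted. The function name `local_svr_predict`, the `min_neighbors` fallback, the standardization step, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def local_svr_predict(X_train, y_train, X_test, weights, common_range,
                      n_components=3, min_neighbors=5):
    """Predict each test sample with an SVR trained only on its own neighbors."""
    # Standardize, then project onto principal components (column-wise reduction).
    scaler = StandardScaler().fit(X_train)
    pca = PCA(n_components=n_components).fit(scaler.transform(X_train))
    Z_train = pca.transform(scaler.transform(X_train))
    Z_test = pca.transform(scaler.transform(X_test))

    w = np.asarray(weights, dtype=float)  # hypothetical per-component weights
    preds = np.empty(len(Z_test))
    for i, z in enumerate(Z_test):
        # Weighted Euclidean distance in principal-component space.
        d = np.sqrt((((Z_train - z) ** 2) * w).sum(axis=1))
        # Neighbors are the training samples closer than the common range.
        idx = np.flatnonzero(d < common_range)
        if len(idx) < min_neighbors:
            # Fallback (an assumption, not from the abstract): take the closest samples.
            idx = np.argsort(d)[:min_neighbors]
        model = SVR(kernel="rbf").fit(Z_train[idx], y_train[idx])
        preds[i] = model.predict(z.reshape(1, -1))[0]
    return preds


# Synthetic data, for demonstration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 6))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
preds = local_svr_predict(X[:50], y[:50], X[50:],
                          weights=[1.0, 1.0, 1.0], common_range=2.0)
```

Because the neighbor set is rebuilt for every test sample, each prediction comes from its own small model, which is the sense in which the abstract calls the prediction individualized.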
Source Acta Physico-Chimica Sinica (《物理化学学报》), 2009, No. 8, pp. 1587-1592 (6 pages). Indexed in SCIE, CAS, CSCD, and the Peking University Core Journals list.
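Funding Supported by the National Natural Science Foundation of China (30570351), the Program for New Century Excellent Talents in University of the Ministry of Education (NCET-06-0710), and the Doctoral Program Foundation of Higher Education (200805370002).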
Keywords Quantitative structure-activity relationship; Geostatistics; Support vector regression; Principal component analysis; Individual prediction