Abstract
A novel molecular electronegativity-distance vector (MEDV) is proposed to characterize the structures of peptide mimetics containing heteroatoms such as N, O, and S, together with saturated, unsaturated, and conjugated chemical bonds such as CO, CC, and CN bonds, and to relate these structures to the biological activities of two panels of dipeptides. Using multiple linear regression (MLR), two six-parameter quantitative structure-activity relationship (QSAR) models were developed: one (labeled M1) for a set of 58 dipeptides and another (M2) for a set of 48 dipeptides. Between the estimated and observed activities, the correlation coefficient (R) is 0.8423 for M1 and 0.8199 for M2, and the root mean square error (RMS) is 0.535 for M1 and 0.357 for M2. To test the predictive ability of the QSAR models, cross-validation (CV) was performed on both data sets by the leave-one-out (LOO) method: each time, n - 1 of the n samples are used to build a QSAR model, which is then used to predict the biological activity of the one remaining sample. Between the predicted and observed activities, R is 0.7906 for M1 (58 predictions) and 0.7422 for M2, with RMS of 0.608 for M1 and 0.417 for M2. The MEDV uses only the electronegativity of the elements and the relative lengths of the chemical bonds, both taken from the two-dimensional molecular topological graph; it requires no 3D structure, no molecular alignment step, and no information on physicochemical properties. Moreover, the QSAR models are built with classical MLR alone, without principal component analysis (PCA) or partial least squares (PLS). The method is therefore simple and fast, and the resulting models are stable and have predictive ability.
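The fitting and leave-one-out steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the descriptor matrix `X` and activities `y` below are synthetic stand-ins (58 samples, six descriptors), the helper names are hypothetical, and RMS is assumed to be the square root of the mean squared residual, which may differ from the definition used in the paper.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares with an intercept; returns the coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Apply a fitted MLR model (intercept first) to descriptor rows X."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def r_and_rms(y_obs, y_est):
    """Correlation coefficient and RMS error between observed and estimated values.

    Assumption: RMS = sqrt(mean(residual^2)); the paper may use another divisor.
    """
    r = np.corrcoef(y_obs, y_est)[0, 1]
    rms = np.sqrt(np.mean((y_obs - y_est) ** 2))
    return r, rms

def loo_predictions(X, y):
    """Leave-one-out: fit on n - 1 samples, predict the one left out, n times."""
    n = len(y)
    y_cv = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef = fit_mlr(X[keep], y[keep])
        y_cv[i] = predict_mlr(coef, X[i:i + 1])[0]
    return y_cv

# Synthetic stand-in data (NOT the dipeptide data from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(58, 6))
y = X @ rng.normal(size=6) + rng.normal(scale=0.5, size=58)

coef = fit_mlr(X, y)
r_fit, rms_fit = r_and_rms(y, predict_mlr(coef, X))
y_cv = loo_predictions(X, y)
r_cv, rms_cv = r_and_rms(y, y_cv)

print(f"fit: R = {r_fit:.4f}, RMS = {rms_fit:.3f}")
print(f"LOO: R = {r_cv:.4f}, RMS = {rms_cv:.3f}")
```

With ordinary least squares, the LOO RMS is never smaller than the fitted RMS, which matches the pattern in the reported values (0.608 against 0.535 for M1, 0.417 against 0.357 for M2).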
Keywords
Vector descriptor
Molecular electronegativity-distance vector (MEDV)
Dipeptide
Quantitative structure-activity relationship (QSAR)
Peptide mimetics
Molecular structure
Biological activity