Abstract
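To predict the toxicity of polycyclic aromatic hydrocarbons (PAHs) effectively, quantitative structure-activity relationship (QSAR) techniques were applied to predict the air-octanol partition coefficient (KOA) and the carcinogenicity of PAHs. Structure-activity relationships were established from molecular descriptors and experimental values, and the support vector machine (SVM) and artificial neural network (ANN) algorithms were used to build a KOA regression model and a carcinogenicity classification model for PAHs. The SVM parameters were tuned with grid search (GS), the genetic algorithm (GA), and particle swarm optimization (PSO). The models were validated and evaluated with the mean squared error (MSE), the coefficient of determination R², and the classification accuracy (Accuracy). The best regression model, GS-SVR, achieved an MSE of 0.0597 and an R² of 0.9130; the best classification model, GA-SVC, achieved an accuracy of 95%. Both models built with SVM were more stable and more predictive than those built with ANN, and parameter optimization further improved the stability and predictive ability of the models.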
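The three evaluation criteria named above can be written down in a few lines. This is a minimal sketch assuming the standard definitions: MSE as the mean of squared residuals, R² as 1 − SS_res/SS_tot, and accuracy as the fraction of correctly labelled samples. The paper may compute R² differently (for example as the squared correlation coefficient that LIBSVM reports), and the function names here are illustrative only.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def accuracy(labels_true, labels_pred):
    """Fraction of samples whose predicted class equals the true class."""
    correct = sum(1 for t, p in zip(labels_true, labels_pred) if t == p)
    return correct / len(labels_true)


if __name__ == "__main__":
    # Toy numbers for illustration, not data from the paper.
    y = [1.0, 2.0, 3.0, 4.0]
    y_hat = [1.1, 1.9, 3.2, 3.8]
    print(mse(y, y_hat), r2(y, y_hat))
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

A lower MSE and an R² closer to 1 indicate a better regression fit; accuracy applies only to the carcinogenic/non-carcinogenic classification model.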
This paper proposes a quantitative structure-activity relationship (QSAR) approach for predicting the toxicity of polycyclic aromatic hydrocarbons (PAHs), specifically their air-octanol partition coefficient (KOA) and their carcinogenicity. The QSAR models relate molecular descriptors to experimental data. Using MATLAB, we built a KOA regression model and a carcinogenicity classification model for PAHs with both the support vector machine (SVM) algorithm and the artificial neural network (ANN) algorithm, and we tuned the SVM parameters with grid search (GS), the genetic algorithm (GA), and particle swarm optimization (PSO). The models were verified and evaluated through internal and external validation: the KOA regression models were assessed by R² and MSE, and the carcinogenicity classification models by classification accuracy. In regression, SVM gave a lower MSE and a higher R² than ANN; in classification, SVM achieved a higher accuracy than ANN.
After parameter optimization of the regression model, GS and PSO increased R² and decreased MSE relative to the original model, whereas GA led to overfitting. The best regression model, GS-SVR, achieved an MSE of 0.0597 with an R² of 0.9130. In classification, the three optimized models achieved on average a higher accuracy than the original model, and the best classification model, GA-SVC, reached an accuracy of 95%. These results show that the two models built with SVM are more stable and more predictive than those built with ANN, and that parameter optimization further improves the stability and predictive ability of the original SVM models.
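Grid search (GS), as it is commonly applied to an RBF-kernel SVM, evaluates every (C, γ) pair on an exponential grid and keeps the pair with the lowest cross-validation error. The sketch below assumes that setup; the exponent ranges (C = 2^-5 … 2^15, γ = 2^-15 … 2^3, step 2^2) follow the coarse grid suggested in the LIBSVM practical guide and are not taken from the paper. `toy_cv_mse` is a hypothetical stand-in for "train an SVR with these parameters and return its cross-validation MSE".

```python
import itertools
import math
from typing import Callable, Iterable, Tuple


def grid_search(
    score: Callable[[float, float], float],
    log2_c: Iterable[int],
    log2_g: Iterable[int],
) -> Tuple[float, float, float]:
    """Exhaustively evaluate score(C, gamma) on an exponential grid.

    `score` is assumed to return a cross-validation error (e.g. MSE),
    so lower is better. Returns (best_C, best_gamma, best_score).
    """
    best = (math.nan, math.nan, math.inf)
    for ec, eg in itertools.product(log2_c, log2_g):
        c, g = 2.0 ** ec, 2.0 ** eg
        s = score(c, g)
        if s < best[2]:  # keep the first pair that achieves the lowest error
            best = (c, g, s)
    return best


def toy_cv_mse(c: float, gamma: float) -> float:
    """Hypothetical stand-in for fitting an SVR and returning its CV MSE.

    Minimised at C = 2**3, gamma = 2**-1; replace with a real SVR fit.
    """
    return (math.log2(c) - 3) ** 2 + (math.log2(gamma) + 1) ** 2 + 0.05


if __name__ == "__main__":
    c, g, err = grid_search(toy_cv_mse, range(-5, 16, 2), range(-15, 4, 2))
    print(f"C={c}, gamma={g}, cv_mse={err:.4f}")
```

Grid search is simple and deterministic, but its cost grows with the product of the grid sizes, which is one reason heuristic searches such as GA and PSO are also tried.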
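PSO-based parameter tuning can be sketched as a global-best particle swarm searching the (log2 C, log2 γ) plane for the lowest cross-validation error. This is a generic textbook PSO under stated assumptions, not the authors' MATLAB implementation: the inertia and acceleration coefficients (w = 0.729, c1 = c2 = 1.494), the swarm size, the iteration count, and the search box are illustrative choices, and `toy_cv_mse` is again a hypothetical stand-in for a real SVR cross-validation run.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    f: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 20,
    n_iter: int = 100,
    w: float = 0.729,
    c1: float = 1.494,
    c2: float = 1.494,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Global-best particle swarm optimisation of f inside box bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]  # best position seen by each particle
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position of the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clamp the particle back inside the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_cv_mse(x: Sequence[float]) -> float:
    """Hypothetical stand-in: CV MSE as a function of (log2 C, log2 gamma)."""
    log2_c, log2_g = x
    return (log2_c - 3) ** 2 + (log2_g + 1) ** 2 + 0.05


if __name__ == "__main__":
    best, err = pso_minimize(toy_cv_mse, bounds=[(-5, 15), (-15, 3)])
    print(f"C={2 ** best[0]:.3f}, gamma={2 ** best[1]:.3f}, cv_mse={err:.4f}")
```

Searching in log2 space keeps C and γ on the same exponential scale as a grid search; with a real SVR, minimising training error instead of cross-validation error is one way such a search can end up overfitting.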
Source
《安全与环境学报》
CAS
CSCD
Peking University Core Journals (北大核心)
2017, Issue 4, pp. 1600-1604 (5 pages)
Journal of Safety and Environment
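Funding
National Natural Science Foundation of China (51206038)
Key Technology Project for Major Accident Prevention and Control, State Administration of Work Safety (heilongjiang-0001-2014AQ)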