Funding: Supported by the Youth Foundation of the Education Bureau, Sichuan Province (09ZB036), and the Technology Bureau, Sichuan Province (2006j13-141).
Abstract: Atoms in most organic molecules are typically carbon, oxygen, nitrogen, sulfur, or halogens. Based on the three-dimensional structure of a molecule, a molecular structural characterization (MSC) method called the improved molecular electronegativity-distance vector (I-MEDV) was developed. It was used to describe the structures of 37 compounds from Styrax japonicus Sieb. flowers. A quantitative structure-retention relationship (QSRR) model was built by multiple linear regression (MLR); its correlation coefficient (R1) was 0.980. Four vectors were then selected by stepwise multiple regression (SMR) to build a second model, whose correlation coefficient (R2) was 0.975. Both models were evaluated by leave-one-out (LOO) cross-validation, giving correlation coefficients (Rcv) of 0.948 and 0.968, respectively. The results show that I-MEDV can successfully describe the structures of organic compounds, and that both models are stable and predictive.
Funding: We are especially grateful to the China Postdoctoral Science Foundation and the National High Technology Project of China (No. 2001AA640601) for their financial support.
Abstract: A molecular electronegativity distance vector based on 13 atomic types (MEDV-13) is used to describe the structures of 62 polychlorinated naphthalene (PCN) congeners and is related to their gas chromatographic relative retention indices (RIs). Using multiple linear regression, a 4-variable quantitative structure-retention relationship (QSRR) model is developed, with a correlation coefficient of estimation (r) of 0.9912 and a root mean square error of estimation (RMSEE) of 31.4. In the leave-one-out procedure, the correlation coefficient of prediction (q) and the root mean square error of prediction (RMSEP) are 0.9898 and 33.76, respectively.
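The estimation and leave-one-out statistics quoted in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the descriptor matrix and responses are hypothetical toy values, the helper names (`fit_mlr`, `loo_predictions`, and so on) are invented for illustration, and the RMSE here divides by n, which may differ from the convention used in the paper.

```python
import math

def fit_mlr(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(r) for r in X]          # prepend intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(coef, x):
    """Apply fitted coefficients (intercept first) to one descriptor row."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a)
                           * sum((v - mb) ** 2 for v in b))

def rmse(a, b):
    """Root mean square error, dividing by n (an assumed convention)."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))

def loo_predictions(X, y):
    """Predict each sample from a model fitted on all other samples."""
    preds = []
    for i in range(len(y)):
        coef = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(coef, X[i]))
    return preds

# Hypothetical toy data: two descriptors, eight samples.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0],
     [5.0, 6.0], [6.0, 5.0], [7.0, 8.0], [8.0, 7.0]]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05]
y = [2.0 + 3.0 * a - b + e for (a, b), e in zip(X, noise)]

fitted = [predict(fit_mlr(X, y), x) for x in X]
loo = loo_predictions(X, y)
print("r =", pearson(y, fitted), "RMSEE =", rmse(y, fitted))
print("q =", pearson(y, loo), "RMSEP =", rmse(y, loo))
```

For ordinary least squares, each leave-one-out residual is at least as large in magnitude as the corresponding fitted residual, so RMSEP is never smaller than RMSEE. That matches the pattern reported in the abstract (33.76 versus 31.4).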
Funding: Supported by the Foundation of Returned Scholars (Main Program) of Shanxi Province (200902).
Abstract: Polychlorinated dibenzothiophenes (PCDTs) are classified as persistent organic pollutants in the environment, so analyzing PCDTs through their gas chromatographic behavior is of great significance. Quantitative structure-retention relationship (QSRR) analysis is a useful technique for relating chromatographic retention time to molecular structure. In this paper, a QSRR study of 37 PCDTs was carried out using molecular electronegativity distance vector (MEDV) descriptors with multiple linear regression (MLR) and partial least-squares (PLS) regression. The correlation coefficient R of the fitted model, the leave-one-out (LOO) cross-validation (CV) coefficient, and Q2ext were 0.9951, 0.9942, and 0.9839 for MLR and 0.9925, 0.9915, and 0.9833 for PLS, respectively. The models thus showed excellent estimation capability for the internal sample set and good predictive capability for the external sample set. With MEDV descriptors, the QSRR model provides a simple and rapid way to predict the gas chromatographic retention indices of PCDTs when standard samples are lacking or experimental conditions are poor.
Abstract: The Boston housing dataset is one of the important datasets used to examine the factors that influence housing prices. Housing price prediction is crucial for government regulation, business decision-making, and individual homebuyers, yet existing studies fall short in balancing model interpretability, computational efficiency, and generalization ability. This study therefore uses the Boston housing dataset to predict the Median Value of Owner-Occupied Homes (MEDV). It constructs three models (linear regression, decision tree regression, and Bayesian regression) and evaluates their performance using Mean Squared Error (MSE), Mean Absolute Error (MAE), and the coefficient of determination (R^(2)). It examines the effects of 13 features on MEDV, including Per Capita Crime Rate by Town (CRIM) and Average Number of Rooms Per Dwelling (RM), and it trains each model and visualizes the results. The results show that decision tree regression achieves the highest R^(2), effectively capturing nonlinear relationships but being prone to overfitting. Linear regression and Bayesian regression perform better in terms of MSE and MAE; the former is simple in structure and fast to train, while the latter can output probability distributions to assess uncertainty. Each model has its strengths, and the choice should depend on the application scenario. Limitations related to the age of the dataset and the lack of extensive hyperparameter tuning are acknowledged. The study provides useful insights for housing price prediction research and practice.
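The three evaluation metrics named in this abstract can be written out in a few lines. The sketch below uses the standard definitions; the function names and the four price values are hypothetical and are not taken from the Boston housing dataset or from the study.

```python
def mse(y_true, y_pred):
    """Mean Squared Error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean Absolute Error: average of absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted median home values.
y_true = [20.0, 22.0, 30.0, 28.0]
y_pred = [21.0, 21.0, 29.0, 29.0]

print(mse(y_true, y_pred))  # 1.0
print(mae(y_true, y_pred))  # 1.0
print(r2(y_true, y_pred))   # 1 - 4/68
```

On a single fixed evaluation set, R^(2) equals 1 minus MSE divided by the variance of the observed values, so these two metrics rank models identically there. MAE weights large errors less heavily and can rank them differently.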
Funding: Supported by the Natural Science Foundation of the Education Commission of Jiangsu Province (Grant No. 07KJB610061), the National Natural Science Foundation of China (Grant No. 20577023), the "973" Program (Grant No. 2003CB415002), and the "863" Program (Grant No. 2001AA640601-4).
Abstract: The molecular electronegativity-distance vector (MEDV) is employed to describe the chemical structure of organic pollutants. Quantitative linear relationships between the molecular descriptors and bioconcentration factor (BCF) values are developed by best subset regression and partial least squares regression analysis. The main structural factors influencing the bioactivities are -CH2, -X, -Cnot 【, - C not 【, and -O-. The high values of r2 and q2_(LOO) indicate good estimation ability and stability of the models. The predictive power for external samples is validated with the model developed from the training set.
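As a rough illustration of how an electronegativity-distance descriptor of this kind can be assembled, the sketch below assumes a commonly described topological form: every pair of non-hydrogen atoms contributes q_i * q_j / d_ij^2 to the vector element indexed by the pair's atom types, where d_ij is the shortest-path bond count. The atom typing, the electronegativity values, and the function names are placeholders rather than the published parameterization, and the 3D-based I-MEDV variant is not covered.

```python
from collections import deque
from itertools import combinations

def shortest_path_lengths(adjacency, start):
    """Breadth-first search: bond-count distance from `start` to each atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def medv_like_vector(atom_types, q, adjacency, n_types):
    """Sum q_i*q_j/d_ij^2 over atom pairs, grouped by (type_k, type_l), k <= l.

    Assumes a connected molecular graph of non-hydrogen atoms.
    """
    vec = {(k, l): 0.0 for k in range(n_types) for l in range(k, n_types)}
    dist = [shortest_path_lengths(adjacency, i) for i in range(len(atom_types))]
    for i, j in combinations(range(len(atom_types)), 2):
        key = tuple(sorted((atom_types[i], atom_types[j])))
        vec[key] += q[i] * q[j] / dist[i][j] ** 2
    return vec

# Hypothetical three-atom chain 0-1-2 with two placeholder atom types.
adjacency = {0: [1], 1: [0, 2], 2: [1]}
atom_types = [0, 1, 0]
q = [1.0, 2.0, 1.0]   # placeholder relative electronegativities

vec = medv_like_vector(atom_types, q, adjacency, n_types=2)
print(vec)  # {(0, 0): 0.25, (0, 1): 4.0, (1, 1): 0.0}
```

With n atom types the vector has n(n+1)/2 elements, so a 13-type scheme such as MEDV-13 would give 91 descriptors under this assumed form. Regression methods such as MLR or PLS then select or weight the informative ones.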