Advancing the integration of artificial intelligence and polymer science requires high-quality, open-source, and large-scale datasets. However, existing polymer databases often suffer from data sparsity, a lack of polymer-property labels, and limited accessibility, hindering systematic modeling across property prediction tasks. Here, we present OpenPoly, a curated experimental polymer database derived from extensive literature mining and manual validation, comprising 3985 unique polymer-property data points spanning 26 key properties. We further develop a multi-task benchmarking framework that evaluates property prediction using four encoding methods and eight representative models. Our results highlight that the optimized degree-of-polymerization encoding coupled with Morgan fingerprints achieves an optimal trade-off between computational cost and accuracy. Under data-scarce conditions, XGBoost outperforms deep learning models on key properties such as dielectric constant, glass transition temperature, melting point, and mechanical strength, achieving R² scores of 0.65-0.87. To further showcase the practical utility of the database, we propose potential polymers for two energy-relevant applications: high-temperature polymer dielectrics and fuel cell membranes. By offering a consistent and accessible benchmark and database, OpenPoly paves the way for more accurate polymer-property modeling and fosters data-driven advances in polymer genome engineering.
The molecular structures of hydrocarbons in straight-run gasoline were numerically coded. A nonlinear quantitative structure-retention relationship (QSRR) between the gas chromatography (GC) retention indices of the hydrocarbons and their molecular structures was established using an error back-propagation (BP) algorithm. The GC retention indices of 150 hydrocarbons were then predicted by removing 15 compounds (as a test set) and using the remaining 135 molecules as a calibration set. Through this procedure, all the compounds in the data set were predicted in groups of 15. The results obtained by BP, with a correlation coefficient of 0.9934 and a standard deviation of 16.54, are satisfactory.
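The grouped prediction procedure described in this abstract (hold out 15 compounds, calibrate on the other 135, repeat until all 150 have been predicted) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `leave_group_out` is hypothetical, and the use of consecutive index blocks is an assumption, since the abstract does not say how the groups of 15 were chosen. The BP network itself is omitted; only the data partitioning is shown.

```python
from typing import Iterator, List, Tuple


def leave_group_out(
    n_samples: int, group_size: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (calibration_indices, test_indices) pairs.

    Each group of `group_size` samples is held out once as the test set,
    and the remaining samples form the calibration set.
    Assumption: groups are consecutive index blocks.
    """
    if group_size <= 0 or n_samples % group_size != 0:
        raise ValueError("n_samples must be a positive multiple of group_size")
    indices = list(range(n_samples))
    for start in range(0, n_samples, group_size):
        test = indices[start:start + group_size]
        calibration = indices[:start] + indices[start + group_size:]
        yield calibration, test


# 150 hydrocarbons, predicted in groups of 15 (figures from the abstract).
splits = list(leave_group_out(150, 15))
for calibration, test in splits:
    # A model would be fitted on `calibration` and evaluated on `test` here.
    pass
```

With these figures the loop runs ten times, each compound appears in exactly one test set, and every calibration set holds 135 compounds.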
Funding: financially supported by the National Natural Science Foundation of China (Nos. 92372126, 52373203) and the Excellent Young Scientists Fund Program