Abstract
We introduce HAPPY (Hierarchically Abstracted rePeat unit of PolYmers), a string representation for polymers designed to efficiently encapsulate the essential structural features of polymers for property prediction. HAPPY assigns single constituent elements to groups of substructures and employs grammatically complete and independent connectors between chemical linkages. Using a limited number of data points, we trained neural networks on both HAPPY and conventional SMILES encodings of repeat-unit structures and compared their performance in predicting five polymer properties: dielectric constant, glass transition temperature, thermal conductivity, solubility, and density. The results showed that the HAPPY-based network could achieve higher prediction R-squared scores and two-fold faster training. We further tested the robustness and versatility of the HAPPY-based network with an augmented training dataset. Additionally, we present topo-HAPPY (Topological HAPPY), an extension that incorporates topological details of the constituent connectivity, leading to improved R-squared scores for solubility and glass transition temperature prediction.
Funding
Supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2018R1A5A1025224) and by the Technology Innovation Program (20016176) funded by the Ministry of Trade, Industry & Energy (MI, Korea).