Funding: M.A.W. and A.B.D. acknowledge funding from the Princeton Catalysis Initiative for this research. M.A.W. and S.J. also acknowledge support from the donors of the ACS Petroleum Research Fund under Doctoral New Investigator Grant 66706-DNI7.
Abstract: The complexity and diversity of polymer topologies, or chain architectures, present substantial challenges in predicting and engineering polymer properties. Although machine learning is increasingly used in polymer science, applications to architecturally complex polymers are nascent. Here, we use a generative machine learning model based on variational autoencoders and data generated from molecular dynamics simulations to design polymer topologies that exhibit target properties.
Funding: Support from the Schmidt DataX Fund at Princeton University, made possible through a major gift from the Schmidt Futures Foundation. Adji Bousso Dieng acknowledges support from the National Science Foundation, Office of Advanced Cyberinfrastructure (OAC) #2118201, and from the Schmidt Futures AI2050 Early Career Fellowship.
Abstract: The prediction of crystal properties plays a crucial role in materials science and applications. Current methods for predicting crystal properties focus on modeling crystal structures using graph neural networks (GNNs). However, accurately modeling the complex interactions between atoms and molecules within a crystal remains a challenge. Surprisingly, predicting crystal properties from crystal text descriptions is understudied, despite the rich information and expressiveness that text data offer. In this paper, we develop and make public a benchmark dataset (TextEdge) that contains crystal text descriptions with their properties. We then propose LLM-Prop, a method that leverages the general-purpose learning capabilities of large language models (LLMs) to predict properties of crystals from their text descriptions. LLM-Prop outperforms the current state-of-the-art GNN-based methods by approximately 8% on predicting band gap, 3% on classifying whether the band gap is direct or indirect, and 65% on predicting unit cell volume, and yields comparable performance on predicting formation energy per atom, energy per atom, and energy above hull. LLM-Prop also outperforms the fine-tuned MatBERT, a domain-specific pre-trained BERT model, despite having 3 times fewer parameters. We further fine-tune the LLM-Prop model directly on CIF files and on condensed structure information generated by Robocrystallographer, and find that LLM-Prop fine-tuned on text descriptions performs better on average. Our empirical results highlight the importance of natural language input to LLMs for accurately predicting crystal properties, and the current inability of GNNs to capture information pertaining to space group symmetry and Wyckoff sites for accurate crystal property prediction.