To address the challenge of limited experimental materials data, extensive physical property databases are being developed based on high-throughput computational experiments, such as molecular dynamics simulations. Previous studies have shown that fine-tuning a predictor pretrained on a computational database to a real system can yield models with substantially better generalization than models learned from scratch. This study demonstrates a scaling law for simulation-to-real (Sim2Real) transfer learning in several machine learning tasks in materials science. Case studies of three prediction tasks for polymers and inorganic materials reveal that the prediction error on real systems decreases according to a power law as the size of the computational dataset increases. Observing this scaling behavior offers various insights for database development, such as determining the sample size necessary to achieve a desired performance, identifying equivalent sample sizes for physical and computational experiments, and guiding the design of data production protocols for downstream real-world tasks.
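The power-law behavior can be put to work in the way the abstract describes: fit it to real-system errors measured at several computational-data sizes, then invert it to estimate the size needed for a target error. The Python sketch below assumes a pure power law, E(n) = D * n^(-alpha), fitted by ordinary least squares in log-log space. The functions `fit_power_law` and `required_size` and the synthetic numbers are hypothetical illustrations, not the authors' code, model form, or results.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(n: Sequence[float], err: Sequence[float]) -> Tuple[float, float]:
    """Fit err ~= D * n**(-alpha) by least squares on log(err) versus log(n).

    Assumption: a pure power law with no additive error floor.
    Returns (D, alpha).
    """
    if len(n) != len(err) or len(n) < 2:
        raise ValueError("need at least two (size, error) pairs of equal length")
    xs = [math.log(v) for v in n]
    ys = [math.log(e) for e in err]
    m = len(xs)
    x_bar = sum(xs) / m
    y_bar = sum(ys) / m
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx                   # slope of the log-log line is -alpha
    intercept = y_bar - slope * x_bar   # intercept is log(D)
    return math.exp(intercept), -slope


def required_size(target_err: float, D: float, alpha: float) -> float:
    """Invert err = D * n**(-alpha) to find the n at which err reaches target_err."""
    return (D / target_err) ** (1.0 / alpha)


if __name__ == "__main__":
    # Synthetic, noise-free errors generated from D = 2.0, alpha = 0.3 (hypothetical).
    sizes = [100, 1_000, 10_000, 100_000]
    errors = [2.0 * s ** -0.3 for s in sizes]
    D, alpha = fit_power_law(sizes, errors)
    print(f"D = {D:.3f}, alpha = {alpha:.3f}")
    print(f"computational samples for error 0.1: {required_size(0.1, D, alpha):,.0f}")
```

Because the fit is a straight line in log-log coordinates, it needs only a handful of (size, error) pairs. If the real-system error appears to level off at large n, a common extension is to add a constant floor, E(n) = D * n^(-alpha) + C, which requires a nonlinear fit and is not shown here.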
Funding: This work was supported by MEXT as the "Program for Promoting Researches on the Supercomputer Fugaku" (project ID: hp210264); JST CREST (Grant Numbers JPMJCR19I3, JPMJCR22O3, JPMJCR2332); and MEXT/JSPS KAKENHI Grant-in-Aid for Scientific Research on Innovative Areas (19H05820), Grant-in-Aid for Scientific Research (A) (19H01132), Grant-in-Aid for Research Activity Start-up (23K19980), and Grant-in-Aid for Scientific Research (C) (22K11949). Computational resources were provided by Fugaku at the RIKEN Center for Computational Science, Kobe, Japan (hp210264), and by the supercomputer at the Research Center for Computational Science, Okazaki, Japan (projects: 23-IMS-C113, 24-IMS-C107).