The development of machine learning models has led to an abundance of datasets containing quantum mechanical (QM) calculations for molecular and material systems. However, traditional training methods for machine learning models are unable to leverage the plethora of data available, as they require that each dataset be generated using the same QM method. Taking machine learning interatomic potentials (MLIPs) as an example, we show that meta-learning techniques, a recent advancement from the machine learning community, can be used to fit multiple levels of QM theory in the same training process. Meta-learning changes the training procedure to learn a representation that can be easily re-trained to new tasks with small amounts of data. We then demonstrate that meta-learning enables simultaneous training on multiple large organic molecule datasets. As a proof of concept, we examine the performance of an MLIP refit to a small drug-like molecule and show that pre-training potentials to multiple levels of theory with meta-learning improves performance. This difference in performance can be seen both in the reduced error and in the improved smoothness of the potential energy surface produced. We therefore show that meta-learning can utilize existing datasets with inconsistent QM levels of theory to produce models that are better at specializing to new datasets. This opens new routes for creating pre-trained foundation models for interatomic potentials.
Funding: This work was supported by the United States Department of Energy (US DOE), Office of Science, Basic Energy Sciences, Chemical Sciences, Geosciences, and Biosciences Division under Triad National Security, LLC ('Triad') contract grant no. 89233218CNA000001 (FWP: LANLE3F2). A.E.A. Allen and S. Matin also acknowledge the Center for Nonlinear Studies. Computer time was provided by the CCS-7 Darwin cluster at LANL. LAUR-23-27568.