Funding: This work was funded by the U.S. Department of Energy, Office of Science, Office of Basic Energy Sciences, Materials Sciences and Engineering Division under Contract No. DE-AC02-05CH11231 (Materials Project program KC23MP). The work was also supported by the computational resources provided by the Extreme Science and Engineering Discovery Environment (XSEDE), supported by National Science Foundation grant number ACI-1053575; the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility located at Lawrence Berkeley National Laboratory; and the Swift Cluster resource provided by the National Renewable Energy Laboratory (NREL). The authors thank Luca Binci and Lauren N. Walters for valuable discussions.
Abstract: The rapid development of foundation potentials (FPs) in machine learning interatomic potentials demonstrates the possibility of generalizable learning of the universal potential energy surface. The accuracy of FPs can be further improved by bridging the model from lower-fidelity datasets to high-fidelity ones. In this work, we analyze the challenge of this transfer learning (TL) problem within the CHGNet framework. We show that significant energy scale shifts and poor correlations between GGA and r²SCAN hinder cross-functional transferability. By benchmarking different TL approaches on the MP-r²SCAN dataset, we demonstrate the importance of elemental energy referencing in the TL of FPs. By comparing the scaling law with and without pre-training on a low-fidelity dataset, we show that significant data efficiency can still be achieved through TL, even with a target dataset of sub-million structures. We highlight the importance of proper TL and multi-fidelity learning in creating next-generation FPs on high-fidelity data.
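To make the role of elemental energy referencing concrete, the following is a minimal sketch (not the paper's actual implementation) of one common approach: fitting per-element reference energies by least squares over composition vectors, so that GGA and r²SCAN total energies are compared only after their composition-dependent offsets are removed. All names, the toy data, and the per-element shift values are illustrative assumptions.

```python
import numpy as np

def fit_elemental_references(composition: np.ndarray, energies: np.ndarray) -> np.ndarray:
    """Least-squares per-element reference energies.

    composition[i, j] counts atoms of element j in structure i;
    energies[i] is that structure's total energy.
    """
    refs, *_ = np.linalg.lstsq(composition, energies, rcond=None)
    return refs

rng = np.random.default_rng(0)
n_structures, n_elements = 500, 3
comp = rng.integers(1, 9, size=(n_structures, n_elements)).astype(float)

# Toy "GGA" and "r2SCAN" total energies: a shared interaction term plus
# functional-specific per-element offsets -- a caricature of the energy
# scale shift between functionals described in the abstract.
interaction = rng.normal(0.0, 0.05, n_structures) * comp.sum(axis=1)
e_gga = comp @ np.array([-3.0, -5.0, -7.0]) + interaction
e_r2scan = comp @ np.array([-9.5, -1.2, -4.8]) + interaction

# Raw totals correlate poorly across functionals; residuals after
# subtracting the fitted elemental references correlate almost perfectly.
resid_gga = e_gga - comp @ fit_elemental_references(comp, e_gga)
resid_r2scan = e_r2scan - comp @ fit_elemental_references(comp, e_r2scan)
print(np.corrcoef(e_gga, e_r2scan)[0, 1])            # dominated by offsets
print(np.corrcoef(resid_gga, resid_r2scan)[0, 1])    # ~1.0 once shifts are removed
```

The design point is that the offset is linear in composition, so a linear fit absorbs it; the remaining residual is the part of the energy a TL model actually needs to transfer.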
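The scaling-law comparison mentioned in the abstract can likewise be illustrated with a standard power-law fit, err(N) ≈ a·N^(−b), applied to validation errors of models trained from scratch versus fine-tuned from a low-fidelity checkpoint. This is a generic sketch under assumed, made-up error values, not the paper's reported results.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b):
    """Error as a power law of training-set size N."""
    return a * n ** (-b)

dataset_sizes = np.array([1e3, 1e4, 1e5, 1e6])
err_scratch = np.array([120.0, 60.0, 30.0, 15.0])   # meV/atom, made-up numbers
err_finetune = np.array([45.0, 28.0, 17.0, 11.0])   # meV/atom, made-up numbers

for label, err in [("scratch", err_scratch), ("fine-tuned", err_finetune)]:
    (a, b), _ = curve_fit(power_law, dataset_sizes, err, p0=(100.0, 0.3))
    # Dataset size needed to reach a 20 meV/atom target: N = (a / target)**(1/b)
    n_needed = (a / 20.0) ** (1.0 / b)
    print(f"{label}: err ~ {a:.1f} * N^(-{b:.2f}), N(20 meV/atom) ~ {n_needed:.2e}")
```

Comparing the fitted curves (or the N needed to hit a fixed error target) is one way to quantify the data efficiency gained from pre-training on the low-fidelity dataset.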