Funding: Supported by financial assistance award 70NANB19H117 from the U.S. Department of Commerce, National Institute of Standards and Technology; by the U.S. Department of Energy, Office of Science, Basic Energy Sciences, Materials Sciences and Engineering Division, as part of the Computational Materials Sciences Program and Center for Predictive Simulation of Functional Materials; and by the Center for Nanophase Materials Sciences, a U.S. Department of Energy, Office of Science User Facility at Oak Ridge National Laboratory. A.H.R. thanks the Supercomputer Center and San Diego Supercomputer Center through allocation DMR140031 from the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program, which is supported by National Science Foundation grants #2138259, #2138286, #2138307, #2137603, and #2138296. Also supported by NIST award 70NANB19H005 and NSF award CMMI-2053929. S.H.W. especially thanks the NSF Non-Academic Research Internships for Graduate Students (INTERN) program (CBET-1845531) for supporting part of the work at NIST under the guidance of K.C. A.M.K. acknowledges support from the School of Materials Engineering at Purdue University under startup account F.10023800.05.002. Supported by the Federal Ministry of Education and Research (BMBF) under Grant No. 01DM21001B (German-Canadian Materials Acceleration Center).
Abstract: A lack of rigorous reproducibility and validation is a significant hurdle for scientific development across many fields. Materials science, in particular, encompasses a variety of experimental and theoretical approaches that require careful benchmarking. Leaderboard efforts have been developed previously to mitigate these issues. However, a comprehensive comparison and benchmarking on an integrated platform with multiple data modalities and both perfect and defective materials data is still lacking. This work introduces JARVIS-Leaderboard, an open-source and community-driven platform that facilitates benchmarking and enhances reproducibility. The platform allows users to set up benchmarks with custom tasks and enables contributions in the form of dataset, code, and metadata submissions. We cover the following materials design categories: Artificial Intelligence (AI), Electronic Structure (ES).
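To make the contribution workflow concrete, the following is a minimal sketch of how a leaderboard submission might be scored against shared reference data. The CSV schema, column names (`id`, `value`), and the `score_submission` helper are hypothetical illustrations, not the actual JARVIS-Leaderboard format; the metric shown is a plain mean absolute error.

```python
# Hypothetical scoring sketch: compare a contributor's predicted property
# values against reference values for a shared benchmark task.
# The file format and column names here are illustrative only.
import csv
import io


def score_submission(reference_csv: str, submission_csv: str) -> float:
    """Mean absolute error between reference and submitted values,
    matched by material identifier."""
    def load(text: str) -> dict:
        return {row["id"]: float(row["value"])
                for row in csv.DictReader(io.StringIO(text))}

    ref, sub = load(reference_csv), load(submission_csv)
    common = ref.keys() & sub.keys()
    if not common:
        raise ValueError("no overlapping material ids")
    return sum(abs(ref[i] - sub[i]) for i in common) / len(common)


reference = "id,value\nJVASP-1,1.0\nJVASP-2,2.0\nJVASP-3,0.5\n"
submission = "id,value\nJVASP-1,1.2\nJVASP-2,1.8\nJVASP-3,0.5\n"
mae = score_submission(reference, submission)
print(round(mae, 4))  # mean of |0.2|, |0.2|, |0.0|
```

Matching on identifiers rather than row order keeps the score robust to submissions that list materials in a different order or cover only a subset of the benchmark.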
Funding: K.C. thanks XSEDE for computational support under allocation number TGDMR 190095. Contributions from K.C. were supported by financial assistance award 70NANB19H117 from the U.S. Department of Commerce, National Institute of Standards and Technology. Contributions by S.M., K.H., K.R., and D.V. were supported by NSF DMREF Grant No. DMR-1629059 and No. DMR-1629346. X.Q. was supported by NSF Grant No. OAC-1835690. A.A. acknowledges partial support by CHiMaD (NIST award #70NANB19H005). G.P. was supported by the Los Alamos National Laboratory's Laboratory Directed Research and Development (LDRD) program's Directed Research (DR) project #20200104DR.
Abstract: The Joint Automated Repository for Various Integrated Simulations (JARVIS) is an integrated infrastructure to accelerate materials discovery and design using density functional theory (DFT), classical force fields (FF), and machine learning (ML) techniques. JARVIS is motivated by the Materials Genome Initiative (MGI) principles of developing open-access databases and tools to reduce the cost and development time of materials discovery, optimization, and deployment.
Abstract: Recent advances in machine learning (ML) have led to substantial performance improvements on materials database benchmarks, but an excellent benchmark score may not imply good generalization performance. Here we show that ML models trained on Materials Project 2018 can have severely degraded performance on new compounds in Materials Project 2021 due to distribution shift. We discuss how to foresee the issue with a few simple tools. First, uniform manifold approximation and projection (UMAP) can be used to investigate the relation between the training and test data within the feature space. Second, disagreement between multiple ML models on the test data can illuminate out-of-distribution samples. We demonstrate that UMAP-guided and query-by-committee acquisition strategies can greatly improve prediction accuracy by adding only 1% of the test data. We believe this work provides valuable insights for building databases and models with better robustness and generalizability.
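The query-by-committee idea in the abstract can be illustrated with a small self-contained sketch (this is not the paper's code): train a committee of models on bootstrap resamples of the training data, then use the spread of their predictions on a test point as an out-of-distribution signal. The synthetic 1-D data, polynomial ridge model, and committee size below are all illustrative choices.

```python
# Illustrative query-by-committee sketch: committee disagreement as an
# out-of-distribution indicator. Synthetic data, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)

# Train on x in [0, 1]; one test point in-distribution, one far outside.
x_train = rng.uniform(0.0, 1.0, size=(200, 1))
y_train = np.sin(3 * x_train[:, 0]) + 0.05 * rng.normal(size=200)
x_test = np.array([[0.5], [2.0]])


def fit_poly_ridge(x, y, degree=4, lam=1e-6):
    """Ridge-regularized polynomial least squares; returns coefficients."""
    X = np.vander(x[:, 0], degree + 1)
    return np.linalg.solve(X.T @ X + lam * np.eye(degree + 1), X.T @ y)


def predict(coef, x, degree=4):
    return np.vander(x[:, 0], degree + 1) @ coef


# Each committee member sees a different bootstrap resample.
committee = []
for _ in range(10):
    idx = rng.integers(0, len(x_train), size=len(x_train))
    committee.append(fit_poly_ridge(x_train[idx], y_train[idx]))

preds = np.stack([predict(c, x_test) for c in committee])  # shape (10, 2)
disagreement = preds.std(axis=0)

# The extrapolated point (x = 2.0) shows much larger committee
# disagreement than the in-distribution point (x = 0.5).
print(disagreement[1] > disagreement[0])
```

In an acquisition loop, the test points with the largest disagreement would be the ones labeled and added to the training set, which is the mechanism behind the reported accuracy gains from adding only 1% of the test data.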