Funding: This work received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No. 676580, The NOMAD Laboratory, a European Center of Excellence, and the BBDC (contract 01IS14013E).
Abstract: With big-data-driven materials research, the new paradigm of materials science, sharing and wide accessibility of data are becoming crucial aspects. A prerequisite for data exchange and big-data analytics is standardization, which means using consistent and unique conventions for, e.g., units, zero base lines, and file formats. There are two main strategies to achieve this goal. One accepts the heterogeneous nature of the community, which comprises scientists from physics, chemistry, bio-physics, and materials science, by complying with the diverse ecosystem of computer codes and thus develops “converters” for the input and output files of all important codes. These converters then translate the data of each code into a standardized, code-independent format. The other strategy is to provide standardized open libraries that code developers can adopt for shaping their inputs, outputs, and restart files directly into the same code-independent format. In this perspective paper, we present both strategies and argue that they can and should be regarded as complementary, if not even synergetic. The format and conventions presented here were agreed upon by two teams, the Electronic Structure Library (ESL) of the European Center for Atomic and Molecular Computations (CECAM) and the NOvel MAterials Discovery (NOMAD) Laboratory, a European Centre of Excellence (CoE). A key element of this work is the definition of hierarchical metadata describing state-of-the-art electronic-structure calculations.
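The "converter" strategy described in this abstract can be illustrated with a minimal sketch. Everything below is hypothetical: the two output formats, the parser names, and the nested record layout are invented for illustration and are not the NOMAD or ESL conventions. The sketch only shows the general idea of mapping code-specific output, reported in different units, onto one hierarchical, code-independent record.

```python
# Hartree-to-eV conversion factor (CODATA 2018); one Rydberg is half a Hartree.
HARTREE_TO_EV = 27.211386245988
UNIT_TO_EV = {"Ha": HARTREE_TO_EV, "Ry": HARTREE_TO_EV / 2, "eV": 1.0}


def parse_code_a(text):
    """Hypothetical code A prints a line 'Total energy: <value> Ha'."""
    for line in text.splitlines():
        if line.startswith("Total energy:"):
            value, unit = line.split(":", 1)[1].split()
            return float(value), unit
    raise ValueError("no total energy found")


def parse_code_b(text):
    """Hypothetical code B prints a line 'etot = <value> Ry'."""
    for line in text.splitlines():
        if line.strip().startswith("etot"):
            value, unit = line.split("=", 1)[1].split()
            return float(value), unit
    raise ValueError("no total energy found")


def to_standard(program_name, value, unit):
    """Build a nested, code-independent record with the energy in eV.

    The key names are assumptions made for this sketch only.
    """
    return {
        "run": {
            "program": {"name": program_name},
            "calculation": {
                "energy_total": {"value": value * UNIT_TO_EV[unit], "unit": "eV"},
            },
        }
    }


record_a = to_standard("code_a", *parse_code_a("Total energy: -1.0 Ha"))
record_b = to_standard("code_b", *parse_code_b("etot = -2.0 Ry"))
```

Once both records use the same keys and the same unit, the two energies can be compared directly, which is the property that data exchange and big-data analytics rely on.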
Funding: This work was funded by the NOMAD Center of Excellence (European Union’s Horizon 2020 research and innovation program, grant agreement No. 951786), the ERC Advanced Grant TEC1p (European Research Council, grant agreement No. 740233), and the project FAIRmat (FAIR Data Infrastructure for Condensed-Matter Physics and the Chemical Physics of Solids, German Research Foundation, project No. 460197019). T.A.R.P. would like to thank the Alexander von Humboldt (AvH) Foundation for their support through the AvH Postdoctoral Fellowship Program. This research used resources of the Max Planck Computing and Data Facility and the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.
Abstract: Reliable artificial-intelligence models have the potential to accelerate the discovery of materials with optimal properties for various applications, including superconductivity, catalysis, and thermoelectricity. Advancements in this field are often hindered by the scarcity and quality of available data and the significant effort required to acquire new data. For such applications, reliable surrogate models that help guide materials-space exploration using easily accessible materials properties are urgently needed. Here, we present a general, data-driven framework that provides quantitative predictions as well as qualitative rules for steering data creation for all datasets via a combination of symbolic regression and sensitivity analysis. We demonstrate the power of the framework by generating an accurate analytic model for the lattice thermal conductivity using only 75 experimentally measured values. By extracting the most influential material properties from this model, we are then able to hierarchically screen 732 materials and find 80 ultra-insulating materials.
Funding: The project received funding from the European Union’s Horizon 2020 research and innovation program (grant agreement no. 676580) and the Molecular Simulations from First Principles (MS1P). C.S. gratefully acknowledges funding by the Alexander von Humboldt Foundation.
Abstract: A public data-analytics competition was organized by the Novel Materials Discovery (NOMAD) Centre of Excellence and hosted by the online platform Kaggle, using a dataset of 3,000 (Al_(x)Ga_(y)In_(1-x-y))_(2)O_(3) compounds. Its aim was to identify the best machine-learning (ML) model for the prediction of two key physical properties that are relevant for optoelectronic applications: the electronic bandgap energy and the crystalline formation energy. Here, we present a summary of the top-three ranked ML approaches. The first-place solution was based on a crystal-graph representation that is novel for the ML of properties of materials. The second-place model combined many candidate descriptors from a set of compositional, atomic-environment-based, and average structural properties with the light gradient-boosting machine regression model. The third-place model employed the smooth overlap of atomic positions representation with a neural network. The Pearson correlation among the prediction errors of nine ML models (obtained by combining the top-three ranked representations with all three employed regression models) was examined to gain insight into whether the representation or the regression model determines the overall model performance. Ensembling relatively decorrelated models (based on the Pearson correlation) leads to an even higher prediction accuracy.
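The error-correlation analysis and the ensembling step mentioned in this abstract can be sketched on synthetic data. The three "models" below are hypothetical and are not the competition entries: two of them share an error component, the third has independent errors. The sketch computes the Pearson correlation between prediction errors with NumPy and compares simple two-model averages.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
y_true = rng.normal(size=n)

# Synthetic predictions: models A and B share part of their error, C does not.
shared = rng.normal(scale=0.1, size=n)
pred_a = y_true + shared + rng.normal(scale=0.05, size=n)
pred_b = y_true + shared + rng.normal(scale=0.05, size=n)
pred_c = y_true + rng.normal(scale=0.11, size=n)


def rmse(pred):
    """Root-mean-square error of a prediction vector against y_true."""
    return float(np.sqrt(np.mean((pred - y_true) ** 2)))


def error_correlation(pred_1, pred_2):
    """Pearson correlation between the prediction errors of two models."""
    return float(np.corrcoef(pred_1 - y_true, pred_2 - y_true)[0, 1])


corr_ab = error_correlation(pred_a, pred_b)  # strongly correlated errors
corr_ac = error_correlation(pred_a, pred_c)  # nearly uncorrelated errors

# Ensembling by plain averaging of two models' predictions.
rmse_ab = rmse((pred_a + pred_b) / 2)
rmse_ac = rmse((pred_a + pred_c) / 2)

print(f"corr(A,B)={corr_ab:.2f}  corr(A,C)={corr_ac:.2f}")
print(f"RMSE A={rmse(pred_a):.3f}  C={rmse(pred_c):.3f}  "
      f"A+B={rmse_ab:.3f}  A+C={rmse_ac:.3f}")
```

In this toy setup, averaging the two models with decorrelated errors reduces the error more than averaging the two correlated ones, because independent error components partially cancel while shared ones do not.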
Funding: This project was supported by TEC1p (the European Research Council (ERC) Horizon 2020 research and innovation programme, grant agreement No. 740233), BigMax (the Max Planck Society’s Research Network on Big-Data-Driven Materials-Science), and the NOMAD pillar of the FAIR-DI e.V. association. S.C. and D.H. acknowledge U.S. DOD-ONR (Grant No. N00014-17-1-2090). D.H. acknowledges support from the U.S. DOD through the National Defense Science and Engineering Graduate (NDSEG) Fellowship Program.
Abstract: Reducing parameter spaces by exploiting symmetries has greatly accelerated and increased the quality of electronic-structure calculations. Unfortunately, many of the traditional methods fail when the global crystal symmetry is broken, even when the distortion is only a slight perturbation (e.g., Jahn-Teller-like distortions). Here we introduce a flexible and generalizable parametric relaxation scheme and implement it in the all-electron code FHI-aims. This approach utilizes parametric constraints to maintain symmetry at any level. After demonstrating the method’s ability to relax metastable structures, we highlight its adaptability and performance over a test set of 359 materials, across 13 lattice prototypes. Finally, we show how these constraints can reduce the number of steps needed to relax local lattice distortions by an order of magnitude. The flexibility of these constraints enables a significant acceleration of high-throughput searches for novel materials for numerous applications.
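The idea of relaxing a structure in a reduced parameter space can be illustrated with a toy sketch. This is not the FHI-aims implementation: the two-atom geometry, the harmonic energy, and all names are assumptions chosen for illustration. The Cartesian positions are written as a function of a single free parameter, and the Cartesian gradient is projected onto that parameter with the chain rule, so every relaxation step preserves the imposed relation between the atoms.

```python
import numpy as np

# Fixed fractional coordinates of a two-atom toy structure (hypothetical).
FRAC = np.array([[0.0, 0.0, 0.0],
                 [0.5, 0.5, 0.5]])
K, R0 = 1.0, 1.5  # spring constant and equilibrium distance of the toy energy


def positions(a):
    """Cartesian positions as a function of the single free parameter a."""
    return a * FRAC


def energy_and_gradient(pos):
    """Harmonic pair energy and its gradient with respect to all coordinates."""
    d = pos[1] - pos[0]
    r = np.linalg.norm(d)
    energy = 0.5 * K * (r - R0) ** 2
    g1 = K * (r - R0) * d / r  # gradient with respect to atom 1
    return energy, np.array([-g1, g1])


def relax(a, step=0.5, tol=1e-10, max_iter=1000):
    """Gradient descent in the reduced (one-parameter) space."""
    for _ in range(max_iter):
        _, grad = energy_and_gradient(positions(a))
        # Chain rule: dE/da = sum_i grad_i . dR_i/da, with dR_i/da = FRAC_i.
        dE_da = float(np.sum(grad * FRAC))
        if abs(dE_da) < tol:
            break
        a -= step * dE_da
    return a


a_relaxed = relax(a=1.0)
print(f"relaxed parameter a = {a_relaxed:.6f}")
```

Because only `a` is updated, the optimizer searches one degree of freedom instead of six Cartesian coordinates, which is the kind of reduction the abstract attributes to parametric constraints.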
Funding: This work received funding from the European Union’s Horizon 2020 research and innovation program under the grant agreement No. 951786 (NOMAD CoE), the ERC Advanced Grant TEC1P (No. 740233), and the German Research Foundation (DFG) through the NFDI consortium “FAIRmat”, project 460197019. Open Access funding enabled and organized by Projekt DEAL.
Abstract: We present the Novel-Materials-Discovery (NOMAD) Artificial-Intelligence (AI) Toolkit, a web-browser-based infrastructure for the interactive AI-based analysis of materials-science findable, accessible, interoperable, and reusable (FAIR) data. The AI Toolkit readily operates on the FAIR data stored in the central server of the NOMAD Archive, the largest database of materials-science data worldwide, as well as on locally stored, users’ own data. The NOMAD Oasis, a local, stand-alone server, can also be used to run the AI Toolkit. By using Jupyter notebooks that run in a web browser, the NOMAD data can be queried and accessed; data mining, machine learning, and other AI techniques can then be applied to analyze them. This infrastructure brings the concept of reproducibility in materials science to the next level by allowing researchers to share not only the data contributing to their scientific publications, but also all the developed methods and analytics tools. Besides reproducing published results, users of the NOMAD AI Toolkit can modify the Jupyter notebooks toward their own research work.
Funding: This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreements No. 676580 and No. 740233 (TEC1p). O.T.H. and E.W. gratefully acknowledge funding by the Austrian Science Fund, FWF, under the project P27868-N36.
Abstract: Electronic-structure theory is a strong pillar of materials science. Many different computer codes that employ different approaches are used by the community to solve various scientific problems. Still, the precision of different packages was scrutinized thoroughly only recently, focusing on a specific task, namely selecting a popular density functional and using unusually high, extremely precise numerical settings for investigating 71 monoatomic crystals^(1). Little is known, however, about method- and code-specific uncertainties that arise under numerical settings that are commonly used in practice. We shed light on this issue by investigating the deviations in total and relative energies as a function of computational parameters. Using typical settings for basis sets and k-grids, we compare results for 71 elemental^(1) and 63 binary solids obtained by three different electronic-structure codes that employ fundamentally different strategies. On the basis of the observed trends, we propose a simple, analytical model for the estimation of the errors associated with the basis-set incompleteness. We cross-validate this model using ternary systems obtained from the Novel Materials Discovery (NOMAD) Repository and discuss how our approach enables the comparison of the heterogeneous data present in computational materials databases.