Funding: This work received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No. 951786 (NOMAD CoE), the ERC Advanced Grant TEC1P (No. 740233), and the German Research Foundation (DFG) through the NFDI consortium "FAIRmat", project 460197019. Open Access funding enabled and organized by Projekt DEAL.
Abstract: We present the Novel-Materials-Discovery (NOMAD) Artificial-Intelligence (AI) Toolkit, a web-browser-based infrastructure for the interactive AI-based analysis of materials-science findable, accessible, interoperable, and reusable (FAIR) data. The AI Toolkit readily operates on the FAIR data stored in the central server of the NOMAD Archive, the largest database of materials-science data worldwide, as well as on locally stored, user-owned data. The NOMAD Oasis, a local, stand-alone server, can also be used to run the AI Toolkit. By using Jupyter notebooks that run in a web browser, the NOMAD data can be queried and accessed; data mining, machine learning, and other AI techniques can then be applied to analyze them. This infrastructure brings the concept of reproducibility in materials science to the next level by allowing researchers to share not only the data contributing to their scientific publications, but also all the developed methods and analytics tools. Besides reproducing published results, users of the NOMAD AI Toolkit can modify the Jupyter notebooks toward their own research work.
Funding: This work received funding from the European Union's Horizon 2020 Research and Innovation Programme, Grant Agreements No. 676580, the NOMAD Laboratory CoE, and No. 740233, ERC: TEC1P. It was funded in part by the German Ministry for Education and Research as BIFOLD, the Berlin Institute for the Foundations of Learning and Data (ref. 01IS18025A and ref. 01IS18037A). Part of the research was performed while the authors visited the Institute for Pure and Applied Mathematics (IPAM), which is supported by the National Science Foundation (Grant No. DMS-1440415).
Abstract: Computational study of molecules and materials from first principles is a cornerstone of physics, chemistry, and materials science, but is limited by the cost of accurate and precise simulations. In settings involving many simulations, machine learning can reduce these costs, often by orders of magnitude, by interpolating between reference simulations. This requires representations that describe any molecule or material and support interpolation. We comprehensively review and discuss current representations and relations between them. For selected state-of-the-art representations, we compare energy predictions for organic molecules, binary alloys, and Al–Ga–In sesquioxides in numerical experiments controlled for data distribution, regression method, and hyper-parameter optimization.
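To make concrete what "a representation that supports interpolation" can mean, the following minimal sketch builds a toy descriptor that does not change when atoms are reordered or when the molecule is translated. It is an illustration only and is not one of the representations compared in the paper: the function name `pair_representation`, the choice of sorted Coulomb-like pair terms Z_i·Z_j/r_ij, and the water-like geometry are all assumptions made for this example.

```python
import math


def pair_representation(charges, coords):
    """Return the sorted list of Z_i * Z_j / r_ij over all atom pairs.

    Sorting removes the dependence on atom ordering; using only
    interatomic distances removes the dependence on translation and rotation.
    """
    values = []
    for i in range(len(charges)):
        for j in range(i + 1, len(charges)):
            r = math.dist(coords[i], coords[j])
            values.append(charges[i] * charges[j] / r)
    return sorted(values)


# Water-like toy geometry (illustrative coordinates, not reference data).
charges = [8, 1, 1]
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]

# The same structure, translated and with its atoms listed in another order.
shifted = [(x + 1.0, y - 2.0, z + 0.5) for x, y, z in coords]
permuted_charges = [charges[2], charges[0], charges[1]]
permuted_coords = [shifted[2], shifted[0], shifted[1]]

original = pair_representation(charges, coords)
transformed = pair_representation(permuted_charges, permuted_coords)

# Both descriptions of the molecule map to the same vector (up to rounding).
same = all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(original, transformed))
print(same)
```

A regression model trained on such vectors sees physically identical structures as identical inputs, which is one of the requirements a representation must meet before interpolation between reference simulations makes sense.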
Funding: This work was funded by the NOMAD Center of Excellence (European Union's Horizon 2020 research and innovation program, grant agreement No 951786), the ERC Advanced Grant TEC1p (European Research Council, grant agreement No 740233), and the project FAIRmat (FAIR Data Infrastructure for Condensed-Matter Physics and the Chemical Physics of Solids, German Research Foundation, project No 460197019). T.A.R.P. would like to thank the Alexander von Humboldt (AvH) Foundation for their support through the AvH Postdoctoral Fellowship Program. This research used resources of the Max Planck Computing and Data Facility and the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.
Abstract: Reliable artificial-intelligence models have the potential to accelerate the discovery of materials with optimal properties for various applications, including superconductivity, catalysis, and thermoelectricity. Advancements in this field are often hindered by the scarcity and quality of available data and the significant effort required to acquire new data. For such applications, reliable surrogate models that help guide materials space exploration using easily accessible materials properties are urgently needed. Here, we present a general, data-driven framework that provides quantitative predictions as well as qualitative rules for steering data creation for all datasets via a combination of symbolic regression and sensitivity analysis. We demonstrate the power of the framework by generating an accurate analytic model for the lattice thermal conductivity using only 75 experimentally measured values. By extracting the most influential material properties from this model, we are then able to hierarchically screen 732 materials and find 80 ultra-insulating materials.
Funding: Work at CMU was supported by the National Science Foundation (NSF) Division of Materials Research through grant DMR-2021803. This research used resources of the Argonne Leadership Computing Facility (ALCF), which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357, and of the National Energy Research Scientific Computing Center (NERSC), a DOE Office of Science User Facility supported by the Office of Science of the US Department of Energy, under Contract DE-AC02-05CH11231.
Abstract: Singlet fission (SF), the conversion of one singlet exciton into two triplet excitons, could significantly enhance solar cell efficiency. Molecular crystals that undergo SF are scarce. Computational exploration may accelerate the discovery of SF materials. However, many-body perturbation theory (MBPT) calculations of the excitonic properties of molecular crystals are impractical for large-scale materials screening. We use the sure-independence-screening-and-sparsifying-operator (SISSO) machine-learning algorithm to generate computationally efficient models that can predict the MBPT thermodynamic driving force for SF for a dataset of 101 polycyclic aromatic hydrocarbons (PAH101). SISSO generates models by iteratively combining physical primary features. The best models are selected by linear regression with cross-validation. The SISSO models successfully predict the SF driving force with errors below 0.2 eV. Based on the cost, accuracy, and classification performance of SISSO models, we propose a hierarchical materials screening workflow. Three potential SF candidates are found in the PAH101 set.
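The two steps named in the abstract, combining primary features and selecting models by linear regression with cross-validation, can be sketched in a heavily simplified form. This is not the SISSO code used by the authors: it performs a single round of feature combination, keeps only one-dimensional descriptors, and replaces the sparsifying operator with a plain pick of the lowest leave-one-out error. The operator set, the function names, the `n_screen` parameter, and the toy data are assumptions made for this example.

```python
import itertools
import math


def build_candidates(primary):
    """One round of feature construction: combine primary features with +, -, *, /."""
    candidates = dict(primary)
    for (na, a), (nb, b) in itertools.combinations(primary.items(), 2):
        candidates[f"({na}+{nb})"] = [x + y for x, y in zip(a, b)]
        candidates[f"({na}-{nb})"] = [x - y for x, y in zip(a, b)]
        candidates[f"({na}*{nb})"] = [x * y for x, y in zip(a, b)]
        if all(y != 0 for y in b):
            candidates[f"({na}/{nb})"] = [x / y for x, y in zip(a, b)]
    return candidates


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return 0.0, my
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def abs_correlation(x, y):
    """Absolute Pearson correlation, used here as the screening score."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return abs(sxy) / math.sqrt(sxx * syy)


def loo_rmse(x, y):
    """Leave-one-out cross-validation error of the one-feature linear model."""
    squared_errors = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        squared_errors.append((slope * x[i] + intercept - y[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def select_descriptor(primary, target, n_screen=3):
    """Screen candidates by correlation, then keep the one with the lowest CV error."""
    candidates = build_candidates(primary)
    screened = sorted(
        candidates,
        key=lambda name: abs_correlation(candidates[name], target),
        reverse=True,
    )[:n_screen]
    best = min(screened, key=lambda name: loo_rmse(candidates[name], target))
    return best, loo_rmse(candidates[best], target)


# Toy data (hypothetical): the target depends on the product of two primary features.
primary = {"a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], "b": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]}
target = [0.5 * x * y + 1.0 for x, y in zip(primary["a"], primary["b"])]

best_name, best_error = select_descriptor(primary, target)
print(best_name, best_error)
```

On this toy set the product feature reproduces the target exactly, so it survives screening and wins the cross-validation comparison; on real data the screening width and the descriptor dimension would themselves need to be chosen by validation.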
Funding: L.M.G. acknowledges funding from the European Union's Horizon 2020 research and innovation program, under grant agreements No. 951786 (NOMAD CoE) and No. 740233 (TEC1p). Furthermore, the authors acknowledge the Max Planck Computing and Data Facility (MPCDF) for computational resources and support, which enabled neural-network training on 1 GPU (Tesla Volta V100 32GB) on the Talos machine learning cluster. B.C.Y. acknowledges funding from the National Research Foundation (NRF) of Korea under Project Number 2021M3A7C2090586.
Abstract: Characterizing crystal structures and interfaces down to the atomic level is an important step for designing advanced materials. Modern electron microscopy routinely achieves atomic resolution and is capable of resolving complex arrangements of atoms with picometer precision. Here, we present AI-STEM, an automatic, artificial-intelligence-based method for accurately identifying key characteristics from atomic-resolution scanning transmission electron microscopy (STEM) images of polycrystalline materials. The method is based on a Bayesian convolutional neural network (BNN) that is trained only on simulated images. AI-STEM automatically and accurately identifies crystal structure, lattice orientation, and location of interface regions in synthetic and experimental images. The model is trained on cubic and hexagonal crystal structures, yielding classifications and uncertainty estimates, while no explicit information on structural patterns at the interfaces is included during training. This work combines principles from probabilistic modeling, deep learning, and information theory, enabling automatic analysis of experimental, atomic-resolution images.
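The abstract does not say how the uncertainty estimates are computed. One common information-theoretic choice for Bayesian classifiers, assumed here purely for illustration, is the mutual information between the predicted class and the model parameters, estimated from several stochastic forward passes. The sketch below takes the class-probability vectors from those passes as given; the function names and the sample values are hypothetical.

```python
import math


def entropy(probabilities):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def mutual_information(samples):
    """Entropy of the mean prediction minus the mean entropy of the samples.

    `samples` holds one class-probability vector per stochastic forward pass.
    The result is near zero when all passes agree and grows when they disagree.
    """
    n = len(samples)
    n_classes = len(samples[0])
    mean = [sum(s[c] for s in samples) / n for c in range(n_classes)]
    expected_entropy = sum(entropy(s) for s in samples) / n
    return entropy(mean) - expected_entropy


# Hypothetical outputs for two image patches and two classes.
bulk_like = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]   # passes agree
interface_like = [[1.0, 0.0], [0.0, 1.0]]          # passes disagree

mi_bulk = mutual_information(bulk_like)
mi_interface = mutual_information(interface_like)
print(mi_bulk, mi_interface)
```

Under this assumption, regions the model was never trained on, such as interfaces, would stand out through disagreement between passes rather than through a dedicated interface class.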