A data augmentation technique is applied in the current work to a training dataset of 610 bulk metallic glasses (BMGs), randomly selected from 762 collected data points. An ensemble machine learning (ML) model is developed on the augmented training dataset and tested on the remaining 152 data points. The results show that the ML model predicts the maximal diameter Dmax of BMGs more accurately than all previously reported ML models. In addition, the novel ML model yields the following glass forming ability (GFA) rules: an average atomic radius ranging from 140 pm to 165 pm, a value of TT/(T-T)(T-T) higher than 2.5, an entropy of mixing higher than 10 J/K/mol, and an enthalpy of mixing ranging from -32 kJ/mol to -26 kJ/mol. The ML model is interpretable, thereby deepening the understanding of GFA.
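The random 610/152 partition of the 762 collected entries described above can be sketched as follows. This is a minimal illustration only: the function name `split_train_test`, the fixed seed, and the placeholder records are assumptions, and the augmentation step itself is not shown because the abstract does not specify it.

```python
import random


def split_train_test(samples, n_train, seed=0):
    """Randomly pick n_train samples for training; the rest form the test set."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    train_idx = set(indices[:n_train])
    train = [s for i, s in enumerate(samples) if i in train_idx]
    test = [s for i, s in enumerate(samples) if i not in train_idx]
    return train, test


# Hypothetical placeholder records standing in for the 762 collected BMG entries.
records = [f"bmg_{i}" for i in range(762)]
train, test = split_train_test(records, n_train=610)
print(len(train), len(test))  # 610 152
```

Any augmentation would then be applied to `train` only, so that the 152 held-out entries remain untouched for testing.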
As an implementation tool for data-intensive scientific research methods, machine learning (ML) can effectively shorten the research and development (R&D) cycle of new materials by half or even more. ML shows great potential in combination with other scientific research technologies, especially in the processing and classification of large amounts of materials data from theoretical calculation and experimental characterization. It is very important to systematically understand the research ideas of materials informatics in order to accelerate the exploration of new materials. Here, we provide a comprehensive introduction to the most commonly used ML modeling methods in materials informatics, with classic cases. Then, we review the latest progress in prediction models, which focus on new processing–structure–properties–performance (PSPP) relationships in some popular material systems, such as perovskites, catalysts, alloys, two-dimensional materials, and polymers. In addition, we summarize recent pioneering research on innovations in materials research technology, such as inverse design, ML interatomic potentials, and microtopography characterization assistance, as new research directions in materials informatics. Finally, we comprehensively present the most significant challenges and outlooks related to future innovation and development in the field of materials informatics. This review provides a critical and concise appraisal of the applications of materials informatics, and systematic and coherent guidance for materials scientists in choosing modeling methods based on the required materials and technologies.
Knowledge of the mechanical properties of structural materials is essential for their practical applications. In the present work, 360 data samples on four mechanical properties of steels (fatigue strength, tensile strength, fracture strength, and hardness) were selected from the database of the National Institute for Materials Science (NIMS), Japan, comprising data on carbon steels and low-alloy steels. Five machine learning algorithms were used to predict the mechanical properties of the materials represented by the 360 data samples, and random forest regression showed the best predictive performance. Feature selection conducted by random forest and symbolic regression revealed the four features that most influence the mechanical properties of steels: the tempering temperature of the steel, and the alloying elements carbon, chromium, and molybdenum. Mathematical expressions were generated via symbolic regression, and the expressions explicitly predicted how each of the four mechanical properties varies quantitatively with the four most important features. This study demonstrates the great potential of symbolic regression in the discovery of novel advanced materials.
Funding: Project supported by the "Materials Research by Information Integration" Initiative (MI2I) project of the Support Program for Starting Up Innovation Hub from the Japan Science and Technology Agency (JST).
Abstract: The history and current status of materials data activities, from handbooks to databases, are reviewed, with an introduction to some important products. Through an example of predicting interfacial thermal resistance with data and data science methods, we show the advantages and potential of materials informatics for studying materials problems that are too complicated or time-consuming for conventional theoretical and experimental methods. Materials big data is the foundation of materials informatics. The challenges of and strategies for constructing materials big data are discussed, and some solutions are proposed based on our experience in constructing the National Institute for Materials Science (NIMS) materials databases.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62125402 and 92061113).
Abstract: With the rapid development of artificial intelligence and machine learning (ML) methods, materials science is rapidly entering the era of data-driven materials informatics. ML models serve as the most crucial component, closely bridging material structure and material properties. The prediction performance of different ML methods differs considerably across material systems. Herein, we evaluated three categories of models (linear, kernel, and nonlinear methods), covering twelve ML algorithms commonly used in the materials field. Halide perovskites were chosen as an example to evaluate the fitting performance of the different models. We constructed a dataset of 540 halide perovskites and 72 features, with formation energy and bandgap as target properties. We found that the different categories of ML models show similar trends for the different target properties. The difference between the models is large for the formation energy, with the coefficient of determination (R^(2)) ranging from 0.69 to 0.953. The fitting performance of the models is closer for the bandgap, with R^(2) ranging from 0.941 to 0.997. The nonlinear-ensemble model shows the best fitting performance for both the formation energy and the bandgap. This indicates that the nonlinear-ensemble model, constructed by combining multiple weak learners, effectively describes the nonlinear relationship between material features and the target property. In addition, the extreme gradient boosting decision tree model shows the best results among all the models and identifies two new descriptors that are crucial for formation energy and bandgap. Our work provides useful guidance for the selection of effective machine learning methods in data-mining studies of specific material systems.
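The model ranking in the abstract above rests on the coefficient of determination. As a minimal sketch of how such a comparison works, the snippet below computes R^(2) = 1 - SS_res/SS_tot for two sets of predictions and picks the better one. The model names and all numbers are made-up placeholders, not data from the study.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical target values and predictions from two placeholder models.
y_true = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "model_a": [1.1, 1.9, 3.2, 3.8],
    "model_b": [2.0, 2.0, 3.0, 3.0],
}

scores = {name: r2_score(y_true, pred) for name, pred in predictions.items()}
best = max(scores, key=scores.get)  # model with the highest R^2
print(scores, best)
```

In practice the score would be computed on held-out data rather than on the data used for fitting, so that the ranking reflects predictive rather than purely fitting performance.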
Funding: Iran National Science Foundation, Grant/Award Number: 4025794; Japan Society for the Promotion of Science, Grant/Award Number: 24K08211.
Abstract: Ternary MAX phases, characterized by the chemical formula M₂AX, represent a group of layered materials with hexagonal lattices. These MAX phases have been the subject of extensive experimental and theoretical studies. Formation energy and thermodynamic calculations indicate that MAX phases containing late transition metals, such as Rh, Ru, Pt, Pd, Co, and Ni, are unlikely to form. Here, we introduce an alternative family of orthorhombic and monoclinic materials, the LAX phases, which resemble MAX phases in their layered structure and in their A and X elements. However, LAX materials incorporate late transition metals in place of the early transition metals. Advanced techniques for predicting the crystal structure of materials, coupled with data-driven materials research and machine learning algorithms, were employed to investigate stable structures containing transition metals from the last groups of the d-block elements. The analyses revealed 207 ternary LAX systems that demonstrate robust stability against decomposition, with 100 of these systems showing dynamic stability. An in-depth examination of the top 10 structures revealed five LAX systems that are phase stable and exhibit superior mechanical properties, outperforming their MAX phase counterparts in Young's modulus, stiffness, and hardness. These findings indicate that many LAX phase structures are viable candidates for future synthesis, highlighting the potential of heuristic-based structure searches in materials discovery.
Funding: Project supported by the National Key Research and Development Program of China (Grant Nos. 2017YFB0701702 and 2016YFB0700501), the National Natural Science Foundation of China (Grant Nos. 61472394 and 11534012), and the Science and Technology Department of Sichuan Province, China (Grant No. 2017JZ0001).
Abstract: MatCloud provides a high-throughput computational materials infrastructure for the integrated management of materials simulation, data, and computing resources. In comparison to AFLOW, Materials Project, and NOMAD, MatCloud delivers two-fold functionality: a computational materials platform where users can set up, submit, and monitor jobs online using only a Web browser, and a materials properties simulation database. It is developed under the Chinese Materials Genome Initiative and is China's own proprietary high-throughput computational materials infrastructure. MatCloud has been online for about one year, attracting a considerable number of registered users as well as feedback and encouragement. Many users have provided valuable input and requirements. In this paper, we describe the present MatCloud, future visions, and major challenges. Based on what we have achieved, we will endeavour to further develop MatCloud in an open and collaborative manner and to make it a world-known, China-developed software platform in the pressing area of high-throughput materials calculations and materials properties simulation databases within the Materials Genome Initiative.
Funding: Support from the National Science Foundation (NSF) Cyberinfrastructure for Sustained Scientific Innovation program (OAC-1835782), the NSF Designing Materials to Revolutionize and Engineer Our Future program (CMMI-1729743), the Center for Hierarchical Materials Design (NIST 70NANB19H005) at Northwestern University, and the Advanced Research Projects Agency-Energy (ARPA-E, DE-AR0001209).
Abstract: Building processing, structure, and property (PSP) relations for computational materials design is at the heart of the Materials Genome Initiative in the era of high-throughput computational materials science. Recent technological advancements in data acquisition and storage, microstructure characterization and reconstruction (MCR), machine learning (ML), materials modeling and simulation, data processing, manufacturing, and experimentation have significantly advanced researchers' abilities in building PSP relations and in inverse material design. In this article, we examine these advancements from the perspective of design research. In particular, we introduce a data-centric approach whose fundamental aspects fall into three categories: design representation, design evaluation, and design synthesis. Developments in each of these aspects are guided by and benefit from domain knowledge. Hence, for each aspect, we present a wide range of computational methods whose integration realizes data-centric materials discovery and design.
Funding: Supported by the J. Gustaf Richert Stiftelse (2023-00884), Energimyndigheten (P2021-00248), Svenska Forskningsrådet Formas (2022-01475), Kungl. Skogs- och Lantbruksakademien (GFS2023-0131, BYG2023-0007, GFS2024-0155), the Royal Swedish Academy of Forestry and Agriculture (KSLA: GFS2023-0131, BYG2023-0007, GFS2024-0155), and Anna and Nils Håkansson's Foundation (nhbidr24-6).
Abstract: The advancement of artificial intelligence (AI) in material design and engineering has led to significant improvements in predictive modeling of material properties. However, the lack of interpretability in machine learning (ML)-based materials informatics presents a major barrier to its practical adoption. This study proposes a novel quantitative computational framework that integrates ML models with explainable artificial intelligence (XAI) techniques to enhance both predictive accuracy and interpretability in material property prediction. The framework incorporates a structured pipeline comprising data processing, feature selection, model training, performance evaluation, explainability analysis, and real-world deployment. It is validated through a representative case study on the prediction of high-performance concrete (HPC) compressive strength, using a comparative analysis of ML models such as Random Forest, XGBoost, Support Vector Regression (SVR), and Deep Neural Networks (DNNs). The results demonstrate that XGBoost achieves the highest predictive performance (R^(2)=0.918), while SHAP (Shapley Additive Explanations) and LIME (Local Interpretable Model-Agnostic Explanations) provide detailed insights into feature importance and material interactions. Additionally, deploying the trained model as a cloud-based Flask-Gunicorn API enables real-time inference, supporting scalability and accessibility for industrial and research applications. The proposed framework addresses key limitations of existing ML approaches by integrating advanced explainability techniques, systematically handling nonlinear feature interactions, and providing a scalable deployment strategy. This study contributes to the development of interpretable and deployable AI-driven materials informatics, bridging the gap between data-driven predictions and fundamental materials science principles.
Funding: Supported by the Key Project of Natural Science Funds of Tianjin City (22JCZDJC00540).
Abstract: Traditional materials informatics leverages big data and machine learning (ML) to forecast material performance based on structural features but often overlooks valuable textual information. In this work, we propose a novel methodology for predicting material performance through context-based modeling using large language models (LLMs). This method integrates both numerical and textual information, enhancing predictive accuracy and scalability. In the case study, the approach is applied to predict the performance of solid amine CO_(2) adsorbents under direct air capture (DAC) conditions. The ChatGPT 4o model was used with in-context learning to predict CO_(2) adsorption uptake from input features, including material properties and experimental conditions. The results show that context-based modeling can reduce prediction error compared with traditional ML models on this task. We adopted SHapley Additive exPlanations (SHAP) to further elucidate the importance of the various input features. This work highlights the potential of LLMs in materials science, offering a cost-effective, efficient solution for complex predictive tasks.
Funding: Supported by the European Commission, the European Social Fund, and the Calabria Region (C39B18000080002), and by the UK Engineering and Physical Sciences Research Council (EPSRC) (EP/M026981/1, EP/T021063/1, EP/T024917/1).
Abstract: The manufacturing of nanomaterials by the electrospinning process requires accurate and meticulous inspection of scanning electron microscope (SEM) images of the electrospun nanofibers to ensure that no structural defects are produced. The presence of anomalies prevents practical application of the electrospun nanofibrous material in nanotechnology. Hence, the automatic monitoring and quality control of nanomaterials is a relevant challenge in the context of Industry 4.0. In this paper, a novel automatic classification system for homogeneous (anomaly-free) and non-homogeneous (with defects) nanofibers is proposed. The inspection procedure aims to avoid direct processing of the redundant full SEM image. Specifically, the image to be analyzed is first partitioned into sub-images (nanopatches) that are then used as input to a hybrid unsupervised and supervised machine learning system. In the first step, an autoencoder (AE) is trained with unsupervised learning to generate a code representing the input image as a vector of relevant features. Next, a multilayer perceptron (MLP), trained with supervised learning, uses the extracted features to classify non-homogeneous nanofiber (NH-NF) and homogeneous nanofiber (H-NF) patches. The resulting AE-MLP system is shown to outperform other standard machine learning models and other recent state-of-the-art techniques, reporting an accuracy of up to 92.5%. In addition, the proposed approach reduces model complexity with respect to other deep learning strategies such as convolutional neural networks (CNNs). The encouraging performance achieved in this benchmark study can stimulate the application of the proposed scheme to other challenging industrial manufacturing tasks.
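The partitioning step described above, in which a full image is cut into sub-images before any learning takes place, can be sketched as below. This is an illustration under stated assumptions: the abstract does not give the patch size or say whether patches overlap, so the sketch uses square, non-overlapping patches and drops incomplete border regions; `partition_into_patches` is a hypothetical helper name, and the image is a toy grid rather than SEM data.

```python
def partition_into_patches(image, patch_size):
    """Split a 2D image (list of rows) into non-overlapping square patches.

    Border regions that do not fill a whole patch are dropped (an assumption).
    """
    rows, cols = len(image), len(image[0])
    patches = []
    for r in range(0, rows - patch_size + 1, patch_size):
        for c in range(0, cols - patch_size + 1, patch_size):
            patch = [row[c:c + patch_size] for row in image[r:r + patch_size]]
            patches.append(patch)
    return patches


# Toy 4x4 "image" whose pixel values encode their own position.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = partition_into_patches(image, 2)
print(len(patches))  # 4
print(patches[0])    # [[0, 1], [4, 5]]
```

Each patch would then be flattened and passed to the feature extractor (the autoencoder in the abstract), with the classifier operating on the resulting feature vectors.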
Funding: Supported by the National Key R&D Program of China (No. 2018YFB0704404), the Hong Kong Polytechnic University (internal grant nos. 1-ZE8R and G-YBDH), and the 111 Project of the State Administration of Foreign Experts Affairs and the Ministry of Education, China (grant no. D16002).
Abstract: The mechanical properties of complex concentrated alloys (CCAs) depend on the phases they form and the corresponding microstructures. Data-driven prediction of phase formation and the associated mechanical properties is essential to discovering novel CCAs. The present work collects 557 samples of various chemical compositions, comprising 61 amorphous, 167 single-phase crystalline, and 329 multiphase crystalline CCAs. Three classification models are developed with high accuracy to categorize and understand the phases formed in CCAs. In addition, two regression models are constructed to predict the hardness and ultimate tensile strength of CCAs, and the correlation coefficient of the random forest regression model is greater than 0.9 for both target properties. Furthermore, Shapley additive explanation (SHAP) values are calculated, and the four most important features are identified accordingly. A significant finding from the SHAP values is that each of the top four features has a critical value, which provides an easy and fast assessment when designing CCAs with improved mechanical properties. The present work demonstrates the great potential of machine learning in the design of advanced CCAs.
Funding: Supported by the National Natural Science Foundation of China (61722403, 92061113, and 12004131) and the Interdisciplinary Research Grant for PhDs of Jilin University (101832020DJX043).
Abstract: Materials informatics has emerged as a promising new paradigm for accelerating materials discovery and design. It applies machine learning methods to massive materials data from experiments or simulations to seek new materials, functionality, and principles. Developing specialized facilities to generate, collect, manage, learn from, and mine large-scale materials data is crucial to materials informatics. We have developed an artificial-intelligence-aided, data-driven infrastructure named Jilin Artificial-intelligence aided Materials-design Integrated Package (JAMIP), an open-source Python framework that meets the research requirements of computational materials informatics. It integrates a materials production factory, a high-throughput first-principles calculation engine, automatic task submission and progress monitoring, a data extraction, management, and storage system, and machine-learning-based data mining functions. We have integrated specific features such as an inorganic crystal structure prototype database to facilitate high-throughput calculations, along with essential modules for machine learning studies of functional materials. We demonstrate how the code is useful in exploring the materials informatics of optoelectronic semiconductors, taking halide perovskites as a typical case. By following the principles of automation, extensibility, reliability, and intelligence, the JAMIP code is a promising and powerful tool contributing to the fast-growing field of computational materials informatics.
Funding: Supported by the National Key R&D Program of China (No. 2018YFB0704404), the Guangdong Basic and Applied Basic Research Foundation (No. 2020A1515110798), the National Natural Science Foundation of China (Grant No. 91860115), and the Stable Supporting Fund of Shenzhen (GXWD20201230155427003-20200728114835006).
Abstract: A data augmentation technique is employed in the current work on a training dataset of 610 bulk metallic glasses (BMGs), randomly selected from 762 collected samples. An ensemble machine learning (ML) model is developed on the augmented training dataset and tested on the remaining 152 samples. The results show that the ML model predicts the maximal diameter D_max of BMGs more accurately than all previously reported ML models. In addition, the ML model yields the following glass forming ability (GFA) rules: an average atomic radius ranging from 140 pm to 165 pm, a value of TT/(T-T)(T-T) higher than 2.5, an entropy of mixing higher than 10 J/K/mol, and an enthalpy of mixing ranging from -32 kJ/mol to -26 kJ/mol. The ML model is interpretable, thereby deepening the understanding of GFA.
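The split-then-augment workflow in this abstract can be sketched as below. The 610/152 split sizes come from the abstract; the augmentation method is not specified there, so small multiplicative Gaussian jitter on the features is used purely as a stand-in. The function names, noise level, and number of copies are assumptions.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], float]  # (feature vector, target such as D_max)


def split_train_test(samples: Sequence[Sample], n_train: int,
                     seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Randomly select n_train samples for training; the rest form the test set."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    train = [samples[i] for i in indices[:n_train]]
    test = [samples[i] for i in indices[n_train:]]
    return train, test


def augment_with_jitter(train: Sequence[Sample], copies: int = 2,
                        rel_noise: float = 0.01, seed: int = 0) -> List[Sample]:
    """Return the original samples plus `copies` jittered variants of each.

    Stand-in augmentation (assumption): each feature is scaled by
    1 + N(0, rel_noise); the target is left unchanged.
    """
    rng = random.Random(seed)
    augmented: List[Sample] = list(train)
    for features, target in train:
        for _ in range(copies):
            noisy = [x * (1.0 + rng.gauss(0.0, rel_noise)) for x in features]
            augmented.append((noisy, target))
    return augmented


if __name__ == "__main__":
    # Synthetic placeholder data with the same sample count as the abstract.
    data = [([float(i), 1.0], float(i)) for i in range(762)]
    train_set, test_set = split_train_test(data, n_train=610)
    print(len(train_set), len(test_set), len(augment_with_jitter(train_set)))
```

Only the training portion is augmented, so the held-out samples remain untouched measurements for evaluating the model.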
Funding: Supported by the National Natural Science Foundation of China (12074015) and the Beijing Outstanding Young Scientists Projects (BJJWZYJH01201910005018).
Abstract: As a tool for implementing data-intensive scientific research methods, machine learning (ML) can shorten the research and development (R&D) cycle of new materials by half or more. ML shows great potential in combination with other research technologies, especially in processing and classifying large amounts of material data from theoretical calculation and experimental characterization. A systematic understanding of the research ideas of material informatics is very important for accelerating the exploration of new materials. Here, we provide a comprehensive introduction to the most commonly used ML modeling methods in material informatics, with classic cases. Then, we review the latest progress in prediction models, which focus on new processing–structure–properties–performance (PSPP) relationships in some popular material systems, such as perovskites, catalysts, alloys, two-dimensional materials, and polymers. In addition, we summarize recent pioneering research on innovations in material research technology, such as inverse design, ML interatomic potentials, and microtopography characterization assistance, as new research directions of material informatics. Finally, we discuss the most significant challenges and the outlook for future innovation and development in the field of material informatics. This review provides a critical and concise appraisal of the applications of material informatics, and systematic and coherent guidance for material scientists choosing modeling methods based on the required materials and technologies.
Funding: Supported by the National Key Research and Development Program of China (Grant No. 2018YFB0704404), the Hong Kong Polytechnic University (Internal Grant Nos. 1-ZE8R and G-YBDH), and the 111 Project of the State Administration of Foreign Experts Affairs and the Ministry of Education, China (Grant No. D16002).
Abstract: Knowledge of the mechanical properties of structural materials is essential for their practical applications. In the present work, 360 data samples on four mechanical properties of steels (fatigue strength, tensile strength, fracture strength, and hardness) were selected from the database of the National Institute for Materials Science (NIMS), Japan, comprising data on carbon steels and low-alloy steels. Five machine learning algorithms were used to predict the mechanical properties of the materials represented by the 360 data samples, and random forest regression showed the best predictive performance. Feature selection conducted by random forest and symbolic regression revealed the four features that most influence the mechanical properties of steels: the tempering temperature and the contents of the alloying elements carbon, chromium, and molybdenum. Mathematical expressions were generated via symbolic regression, and the expressions explicitly predicted how each of the four mechanical properties varies quantitatively with the four most important features. This study demonstrates the great potential of symbolic regression in the discovery of novel advanced materials.