Funding: This work is financially supported by the National Key Research and Development Program of China (2020YFB0704503, 2016YFB0700500), the Guangdong Province Key Area R&D Program (2019B010940001), the 111 Project (B170003), and USTB MatCom of Beijing Advanced Innovation Center for Materials Genome Engineering.
Abstract: Data provides a foundation for machine learning, which has accelerated data-driven materials design. The scientific literature contains a large amount of high-quality, reliable data, yet automatically extracting data from the literature continues to be a challenge. We propose a natural language processing pipeline that captures both chemical composition and property data, enabling analysis and prediction of superalloys. Within 3 h, 2531 records with both composition and property are extracted from 14,425 articles, covering γ′ solvus temperature, density, solidus, and liquidus temperatures. A data-driven model for γ′ solvus temperature is built to predict unexplored Co-based superalloys with high γ′ solvus temperatures within a relative error of 0.81%. We test the predictions via synthesis and characterization of three alloys. A web-based toolkit is provided as an online open-source platform and is expected to serve as the basis for a general method to search for targeted materials using data extracted from the literature.
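To make the idea of pairing a composition with a property value concrete, here is a minimal rule-based sketch. It is not the authors' pipeline: the regular expressions, the function names `extract_record` and `parse_composition`, the assumption that compositions are written like "Co-9Al-9W" in at.% with the first element as the balance, and the example sentence are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical patterns for illustration; not taken from the paper.
ELEMENT = r"[A-Z][a-z]?"
# A base element followed by one or more "-<amount><element>" terms,
# e.g. "Co-9Al-9W".
COMPOSITION = re.compile(rf"\b{ELEMENT}(?:-\d+(?:\.\d+)?{ELEMENT})+\b")
# A number followed by a temperature unit (°C or K).
TEMPERATURE = re.compile(r"(\d+(?:\.\d+)?)\s*(°C|K)\b")


def extract_record(sentence):
    """Return the first composition and temperature found together, else None."""
    comp = COMPOSITION.search(sentence)
    temp = TEMPERATURE.search(sentence)
    if not (comp and temp):
        return None
    return {
        "composition": comp.group(0),
        "value": float(temp.group(1)),
        "unit": temp.group(2),
    }


def parse_composition(text):
    """Split "Co-9Al-9W" into element amounts.

    Assumes (hypothetically) that the first element is the balance, so its
    amount is 100 minus the sum of the listed amounts.
    """
    base, *terms = text.split("-")
    parsed = {}
    for term in terms:
        match = re.fullmatch(rf"(\d+(?:\.\d+)?)({ELEMENT})", term)
        parsed[match.group(2)] = float(match.group(1))
    parsed[base] = 100.0 - sum(parsed.values())
    return parsed


# Invented example sentence; the value is not a reported measurement.
sentence = "The γ′ solvus temperature of Co-9Al-9W is 1000 °C."
record = extract_record(sentence)
amounts = parse_composition(record["composition"])
```

A real pipeline would also need to confirm that the temperature actually refers to the named property and alloy, which simple pattern matching on one sentence cannot guarantee.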
Funding: This work is financially supported by the National Key Research and Development Program of China (2021YFB3702403, 2022YFB3707502), the National Natural Science Foundation of China (52201061, U22A20106), the Fundamental Research Funds for the Central Universities (FRF-TP-22-008A1), USTB MatCom of Beijing Advanced Innovation Center for Materials Genome Engineering, and the CNNC Science Fund for Talented Young Scholars (FY222506000902).
Abstract: Alloy synthesis and processing determine the design of alloys with desired microstructure and properties. However, using data science to identify optimal synthesis-design routes from a specified set of starting materials has been limited by the difficulty of acquiring data at scale. Text mining has made it possible to convert scientific text into structured data collections. Still, the complexity, diversity, and flexibility of synthesis and processing expressions, together with the lack of annotated corpora with a gold standard, severely hinder accurate and efficient extraction. Here we introduce a semi-supervised text mining method to extract the parameters corresponding to the sequence of actions of synthesis and processing. We automatically extract a total of 9853 superalloy synthesis and processing actions with chemical compositions from a corpus of 16,604 superalloy articles published up to 2022. These are then used to capture an explicitly expressed synthesis factor for predicting γ′ phase coarsening. The synthesis factor derived from text mining significantly improves the performance of the data-driven γ′ size prediction model. The method thus complements the use of data-driven approaches in the search for relationships between synthesis and structures.
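As a rough illustration of what "a sequence of actions with parameters" can look like as structured data, the sketch below uses plain keyword rules. This is a deliberately simpler stand-in for the semi-supervised method the abstract describes, not a reproduction of it. The action vocabulary in `ACTIONS`, the clause-splitting rule, the function name `extract_actions`, and the example sentence are all assumptions made for illustration.

```python
import re

# Hypothetical action vocabulary; the abstract does not list its action types.
ACTIONS = {
    "solution": re.compile(r"solution[- ]treated|solutionized", re.I),
    "aging": re.compile(r"\b(?:aged|aging|ageing)\b", re.I),
}
TEMP = re.compile(r"(\d+(?:\.\d+)?)\s*°C")
TIME = re.compile(r"(\d+(?:\.\d+)?)\s*h\b")
# Naive clause boundaries: sentence ends, semicolons, commas, and "and".
CLAUSE_SPLIT = re.compile(r"\.\s|;|,\s|\band\b")


def extract_actions(text):
    """Return actions in text order, each with its temperature and duration."""
    actions = []
    for clause in CLAUSE_SPLIT.split(text):
        for name, pattern in ACTIONS.items():
            if not pattern.search(clause):
                continue
            temp = TEMP.search(clause)
            time = TIME.search(clause)
            actions.append({
                "action": name,
                "temperature_C": float(temp.group(1)) if temp else None,
                "time_h": float(time.group(1)) if time else None,
            })
    return actions


# Invented example sentence; the values are not reported data.
text = "The alloy was solution treated at 1250 °C for 24 h and aged at 900 °C for 50 h."
steps = extract_actions(text)
```

Keyword rules like these break down quickly on the varied phrasing the abstract mentions, which is the motivation it gives for a learned extraction method.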