The unique long-range disordered atomic arrangement inherent in amorphous materials endows them with a range of superior properties, making them highly promising for applications in catalysis, medicine, and battery technology, among other fields. Since not all materials can be synthesized in an amorphous structure, the composition design of amorphous materials is of significant importance. Machine learning offers a valuable alternative to traditional “trial-and-error” methods by predicting properties from experimental data, thus providing efficient guidance for material design. In this study, we develop a machine learning workflow to predict the critical casting diameter, glass transition temperature, and Young's modulus for 45 reported ternary amorphous alloy systems. The predicted results have been organized into a database, enabling direct retrieval of predicted values from compositional information. Furthermore, we demonstrate three applications: screening high glass-forming-ability regions in a specified system, screening systems against multiple property targets, and iteratively searching for high glass-forming-ability regions. By utilizing machine learning predictions, researchers can effectively narrow the experimental scope and expedite the exploration of compositions.
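The retrieval-and-screening idea above can be sketched as follows, assuming a 5 at.% composition grid for a generic ternary system "A-B-C" and a made-up toy function in place of the trained model (the paper's actual models, grid resolution, and database schema are not given here). Predictions are computed once per grid point and stored, so later queries are plain lookups and screening is a filter over stored values.

```python
from typing import Dict, Iterator, List, Tuple

Composition = Tuple[int, int, int]  # atomic percent of elements A, B, C

def ternary_grid(step: int = 5) -> Iterator[Composition]:
    """Enumerate ternary compositions (at.%) on a regular grid that sum to 100."""
    for a in range(0, 101, step):
        for b in range(0, 101 - a, step):
            yield (a, b, 100 - a - b)

def toy_predict_dc(comp: Composition) -> float:
    """Hypothetical stand-in for a trained model: critical diameter (mm) peaking at (60, 25, 15)."""
    a, b, c = comp
    return max(0.0, 10.0 - 0.01 * ((a - 60) ** 2 + (b - 25) ** 2 + (c - 15) ** 2))

def build_prediction_db(system: str, step: int = 5) -> Dict[Tuple[str, Composition], float]:
    """Precompute predictions so later queries are dictionary lookups, not model calls."""
    return {(system, comp): toy_predict_dc(comp) for comp in ternary_grid(step)}

def screen(db: Dict[Tuple[str, Composition], float], system: str, min_dc: float) -> List[Composition]:
    """Return compositions of one system whose predicted Dc meets the target."""
    return sorted(comp for (sys_name, comp), dc in db.items() if sys_name == system and dc >= min_dc)

db = build_prediction_db("A-B-C")
print(db[("A-B-C", (60, 25, 15))])           # direct retrieval by composition
print(screen(db, "A-B-C", min_dc=9.0)[:3])   # first few compositions in the screened region
```

A multi-property screen would store a tuple of predicted values per key and filter on all targets at once.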
The authors regret that the original publication of this paper did not include Jawad Fayaz as a co-author. After further discussion and a thorough review of the research contributions, it was agreed that his significant contributions to the foundational aspects of the research warranted recognition, and he has now been added as a co-author.
Because databases are information-rich collections, some people will always attempt to exploit them for ulterior purposes, while others are committed to finding ways to counter database security threats. The purpose of database security research is to prevent databases from being illegally used or destroyed. This paper introduces the main literature in the field of database security research in recent years. First, we classify these papers, using the factors that influence database security as the classification criteria. In comparing traditional and machine learning (ML) methods, explanations of key concepts are interspersed to make these methods easier to understand. Second, we find that related research has achieved encouraging results but also has shortcomings, such as weak generalization and deviation from real-world conditions. Then, possible future work in this area is proposed. Finally, we summarize the main contributions.
Chemical structure searching based on databases and machine learning has recently attracted great attention for fast screening of materials with target functionalities. To this end, we established a high-performance chemical structure database based on the MySQL engine, named MYDB. More than 160,000 metal-organic frameworks (MOFs) have been collected and stored using new retrieval algorithms for efficient searching and recommendation. The evaluation results show that MYDB can perform fast and efficient keyword searches against millions of records and provide real-time recommendations of similar structures. Combining machine learning methods with the materials database, we developed an adsorption model to determine the adsorption capacity of metal-organic frameworks toward argon and hydrogen under certain conditions. We expect that MYDB, together with the developed machine learning techniques, can support large-scale, low-cost, and highly convenient structural research toward accelerating the discovery of materials with target functionalities in the field of computational materials research.
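Keyword search plus similar-structure recommendation can be sketched as below. This is not MYDB's retrieval algorithm: it uses Python's built-in `sqlite3` as a stand-in for MySQL, and ranks similarity by the Jaccard index over a small set of descriptor tags (metal, linker, topology) that are chosen here purely for illustration.

```python
import sqlite3
from typing import Iterable, List, Set, Tuple

def build_db(records: Iterable[Tuple[str, Set[str]]]) -> sqlite3.Connection:
    """Store each MOF name with its descriptor tags as a comma-separated string."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mof (id INTEGER PRIMARY KEY, name TEXT UNIQUE, tags TEXT)")
    conn.executemany(
        "INSERT INTO mof (name, tags) VALUES (?, ?)",
        [(name, ",".join(sorted(tags))) for name, tags in records],
    )
    return conn

def keyword_search(conn: sqlite3.Connection, keyword: str) -> List[str]:
    """Substring search on names (SQLite's LIKE is case-insensitive for ASCII)."""
    rows = conn.execute("SELECT name FROM mof WHERE name LIKE ? ORDER BY name", (f"%{keyword}%",))
    return [name for (name,) in rows]

def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if (a | b) else 0.0

def recommend(conn: sqlite3.Connection, name: str, top_n: int = 2) -> List[str]:
    """Rank other entries by Jaccard similarity of their tag sets to the query entry."""
    row = conn.execute("SELECT tags FROM mof WHERE name = ?", (name,)).fetchone()
    if row is None:
        return []
    target = set(row[0].split(","))
    scored = [
        (jaccard(target, set(tags.split(","))), other)
        for other, tags in conn.execute("SELECT name, tags FROM mof WHERE name != ?", (name,))
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score first, ties by name
    return [other for _, other in scored[:top_n]]

conn = build_db([
    ("HKUST-1", {"Cu", "BTC", "tbo"}),
    ("MOF-5", {"Zn", "BDC", "pcu"}),
    ("IRMOF-3", {"Zn", "NH2-BDC", "pcu"}),
    ("ZIF-8", {"Zn", "mIm", "sod"}),
    ("UiO-66", {"Zr", "BDC", "fcu"}),
])
print(keyword_search(conn, "mof"))  # ['IRMOF-3', 'MOF-5']
print(recommend(conn, "MOF-5"))     # ['IRMOF-3', 'UiO-66']
```

A substring `LIKE '%...%'` query cannot use an ordinary B-tree index, so at the scale of millions of records a full-text index or a dedicated search structure would be the natural next step.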
Traditionally, nonlinear time history analysis (NLTHA) is used to assess the performance of structures under future hazards, which is necessary for developing effective disaster risk management strategies. However, this method is computationally intensive and unsuitable for analyzing large numbers of structures on a city-wide scale. Surrogate models offer an efficient and reliable alternative and facilitate evaluating the performance of multiple structures under different hazard scenarios. However, creating a comprehensive database for surrogate modelling at the city level presents challenges. To overcome this, the present study proposes meta-databases and a general framework for surrogate modelling of steel structures. The dataset includes 30,000 steel moment-resisting frame buildings, representing low-rise, mid-rise, and high-rise buildings, with criteria for connections, beams, and columns. Pushover analysis is performed and structural parameters are extracted; finally, using two machine learning techniques, random forest and Shapley additive explanations, sensitivity and explainability analyses of the structural parameters are performed to identify the most significant factors in designing steel moment-resisting frames. The framework and databases can be used as a validated source for surrogate modelling of steel frame structures for disaster risk management.
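The quantity that Shapley additive explanations approximate can be computed exactly for a model with only a few inputs. The sketch below enumerates every coalition and fills absent features with a single baseline value; the three-input toy surrogate is made up for illustration and is not the paper's random forest or its structural parameters.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence

def shapley_values(model: Callable[[List[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values; features outside a coalition take their baseline value.
    Needs on the order of 2**n model calls per feature, so only suits a few features."""
    n = len(x)

    def value(coalition: set) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Weight = probability that exactly this coalition precedes feature i in a random order.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi

# Hypothetical surrogate with three made-up inputs and one interaction term.
def surrogate(z: List[float]) -> float:
    return 2.0 * z[0] + 3.0 * z[1] - 1.0 * z[2] + 0.5 * z[0] * z[1]

phi = shapley_values(surrogate, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
print([round(p, 6) for p in phi])  # [2.5, 6.5, -3.0]
```

The interaction term's contribution (1.0 at this point) is split equally between the two features involved, and the values sum to `surrogate(x) - surrogate(baseline)`, which is the property that makes such attributions usable for ranking parameter importance.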
At present, the database cache models of power information systems suffer from problems such as slow running speed and a low database hit rate. To address this, this paper proposes a database cache model for power information systems based on deep machine learning. The caching model includes program caching, Structured Query Language (SQL) preprocessing, and core caching modules. Among them, statement efficiency is improved by adjusting operations such as multi-table joins and keyword replacement in the SQL optimizer. In the core caching module, predictive models are built using boosted regression trees: a machine learning algorithm generates a series of regression tree models, and the resource occupancy rate of the power information system is analyzed to dynamically adjust the voting selection of the regression trees. At the same time, the voting threshold of the prediction model is dynamically adjusted, and the cache model is re-initialized in the same manner. The experimental results show that the model achieves a good cache hit rate and cache efficiency and can improve the data caching performance of the power information system. It has a high hit rate and short delay time, and maintains a good hit rate across different amounts of computer memory; at the same time, it occupies little space and CPU during actual operation, which helps the power information system operate efficiently and quickly.
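A minimal sketch of how a predicted reuse score and a load-dependent threshold could interact in a cache is shown below. The LRU eviction policy, the linear threshold rule, and passing the score in as an argument are all assumptions made for illustration; in the described model the score would come from the boosted regression tree ensemble.

```python
from collections import OrderedDict
from typing import Callable, Hashable

class AdaptiveLRUCache:
    """LRU cache whose admission threshold rises with system resource occupancy."""

    def __init__(self, capacity: int, base_threshold: float = 0.5):
        self.capacity = capacity
        self.base_threshold = base_threshold
        self.store: OrderedDict = OrderedDict()
        self.hits = 0
        self.requests = 0

    def threshold(self, occupancy: float) -> float:
        # Hypothetical rule: the busier the system (occupancy in [0, 1]), the stricter admission.
        return min(1.0, self.base_threshold + 0.5 * occupancy)

    def get(self, key: Hashable, load: Callable, reuse_score: float, occupancy: float):
        """reuse_score stands in for a predicted likelihood that this query result is reused."""
        self.requests += 1
        if key in self.store:
            self.hits += 1
            self.store.move_to_end(key)  # mark as most recently used
            return self.store[key]
        value = load(key)  # cache miss: fetch from the database
        if reuse_score >= self.threshold(occupancy):
            self.store[key] = value
            if len(self.store) > self.capacity:
                self.store.popitem(last=False)  # evict the least recently used entry
        return value

    @property
    def hit_rate(self) -> float:
        return self.hits / self.requests if self.requests else 0.0

cache = AdaptiveLRUCache(capacity=2)
load = lambda sql: f"rows for {sql}"
cache.get("q1", load, reuse_score=0.9, occupancy=0.2)   # miss, admitted (threshold 0.6)
cache.get("q1", load, reuse_score=0.9, occupancy=0.2)   # hit
cache.get("q2", load, reuse_score=0.55, occupancy=0.2)  # miss, not admitted
cache.get("q2", load, reuse_score=0.55, occupancy=0.2)  # miss again
print(cache.hit_rate)  # 0.25
```

Raising the threshold under load keeps low-value results out of memory when resources are scarce, at the cost of more misses for marginal queries.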
This study reviews existing deep learning applications for diagnosing diabetic retinopathy and retinopathy of prematurity, as well as the public retinal databases available for these diseases. Two independent reviewers performed an in-depth literature search of the Scopus, Web of Science, IEEE, and ACM databases, targeting articles from inception up to 31 January 2023, and assessed the quality of the included studies using the International Journal of Medical Informatics (IJMEDI) checklist. In total, 26 of 1476 articles, comprising 36 models, were included in the review. Data size and model validation were found to be challenges for most studies. Deep learning models are gaining attention in the development of medical diagnostic tools and applications. However, there appears to be a critical issue with most published studies, some of which do not report their data sources and data sizes, information that is important for verifying model performance.
Planetary surfaces, shaped by billions of years of geologic evolution, display numerous impact craters whose size distribution, density, and spatial arrangement reveal the history of the celestial body. Identifying these craters is essential for planetary science and is currently achieved mainly with deep learning-driven detection algorithms. However, because impact crater characteristics are substantially affected by the geologic environment, surface materials, and atmospheric conditions, the performance of deep learning models can be inconsistent across celestial bodies. In this paper, we first examine how the surface characteristics of the Moon, Mars, and Earth, along with the differences in their impact crater features, affect model performance. Then, we compare crater detection across celestial bodies by analyzing enhanced convolutional neural networks and models based on the U-shaped convolutional neural network (U-Net) to highlight how geology, data, and model design affect accuracy and generalization. Finally, we address current deep learning challenges, suggest directions for model improvement, such as multimodal data fusion and cross-planet learning, and list available impact crater databases. This review can provide technical support for deep space exploration and planetary science, as well as new ideas and directions for future research on the automatic detection of impact craters on celestial body surfaces and on planetary geology.
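Comparing detection accuracy across bodies requires matching predicted craters to a reference catalog. The sketch below does greedy one-to-one matching of `(x, y, radius)` circles and reports precision, recall, and F1; the tolerances (center offset within half the catalog radius, radius ratio at most 1.5) are hypothetical choices, not values taken from the review.

```python
import math
from typing import List, Tuple

Crater = Tuple[float, float, float]  # (x, y, radius), radius > 0

def match_craters(detected: List[Crater], catalog: List[Crater],
                  max_center_frac: float = 0.5, max_radius_ratio: float = 1.5) -> Tuple[float, float, float]:
    """Greedy one-to-one matching; returns (precision, recall, F1)."""
    unmatched = list(catalog)
    true_positives = 0
    for x, y, r in detected:
        best = None  # (center distance, index into unmatched)
        for idx, (cx, cy, cr) in enumerate(unmatched):
            dist = math.hypot(x - cx, y - cy)
            ratio = max(r, cr) / min(r, cr)
            if dist <= max_center_frac * cr and ratio <= max_radius_ratio:
                if best is None or dist < best[0]:
                    best = (dist, idx)
        if best is not None:
            unmatched.pop(best[1])  # each catalog crater can be matched only once
            true_positives += 1
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(catalog) if catalog else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

catalog = [(10.0, 10.0, 5.0), (50.0, 50.0, 8.0)]
detected = [(11.0, 10.0, 5.0), (80.0, 80.0, 4.0)]
print(match_craters(detected, catalog))  # (0.5, 0.5, 0.5)
```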
Two-dimensional transition metal porphyrinoid materials (2DTMPoidMats), owing to their unique electronic structure and tunable metal active sites, have the potential to enhance interactions with nitrogen molecules and promote the protonation process, making them promising electrocatalysts for the electrochemical nitrogen reduction reaction (eNRR). Experimentally screening a large number of catalysts for eNRR performance would consume considerable time and economic resources, whereas first-principles calculations and machine learning (ML) algorithms can greatly improve the efficiency of catalyst screening. Using this approach, we selected 86 candidates capable of catalyzing eNRR from 1290 types of 2DTMPoidMats and verified the results with density functional theory (DFT) computations. Analysis of the full reaction pathway shows that MoPp-meso-F-β-Py, MoPp-β-Cl-meso-Diyne, MoPp-meso-Ethinyl, and WPp-β-Pz exhibit the best catalytic performance, with onset potentials of -0.22, -0.19, -0.23, and -0.35 V, respectively. This work provides valuable insights into the efficient design and screening of eNRR catalysts and promotes the application of ML models in the field of catalysis.
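Onset (limiting) potentials like those quoted above are commonly derived from the free-energy profile of the proton-electron transfer steps. Assuming the computational hydrogen electrode model, each step's free energy shifts by +eU at applied potential U, so every step becomes downhill once U ≤ -max(ΔG_i). The two six-step profiles below are made up for illustration and are not the paper's DFT values.

```python
from typing import Dict, List, Sequence

def limiting_potential(step_dg_ev: Sequence[float]) -> float:
    """Limiting potential (V) from step free energies (eV, at U = 0) of proton-electron transfers."""
    return -max(step_dg_ev)

def screen_catalysts(pathways: Dict[str, Sequence[float]], cutoff_v: float) -> List[str]:
    """Keep candidates whose limiting potential is no more negative than the cutoff."""
    return [name for name, steps in pathways.items() if limiting_potential(steps) >= cutoff_v]

# Hypothetical six-step free-energy profiles (eV) for two made-up candidates.
pathways = {
    "candidate_A": [-0.45, 0.22, -0.10, 0.05, -0.30, 0.10],
    "candidate_B": [0.10, 0.62, -0.20, 0.15, -0.40, 0.05],
}
print(limiting_potential(pathways["candidate_A"]))  # -0.22
print(screen_catalysts(pathways, cutoff_v=-0.5))     # ['candidate_A']
```

The step with the largest ΔG is the potential-determining step, so a screening pipeline only needs that maximum per candidate.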
Discovering new materials with excellent performance is a central issue in the Materials Genome Initiative. Traditional experiments and calculations often consume large amounts of time and money and are limited by various conditions. Therefore, it is imperative to develop new methods to accelerate the discovery and design of new materials. In recent years, machine learning-based methods for material discovery and design have attracted much attention from materials scientists and have made some progress. This review first outlines available materials databases and materials data analytics tools and then elaborates on the machine learning algorithms used in materials science. Next, applications of machine learning in materials science are summarized, focusing on structure determination, performance prediction, fingerprint prediction, and new material discovery. Finally, the review points out the problems of data and machine learning in materials science and indicates directions for future research. Using machine learning algorithms, the authors hope to achieve significant results in material discovery and design.
Insulation aging of cross-linked polyethylene (XLPE) cables is the main cause of reduced cable life, yet power-related sectors currently lack rapid and effective methods for detecting cable insulation defects. To this end, this paper presents a method for identifying insulation defects in XLPE cables based on deep learning algorithms. First, the principle of the harmonic method for detecting cable insulation defects is introduced. Second, ANSYS software is used to simulate cable insulation layers containing bubble, protrusion, and water-tree defects, and the effects of each defect type on the magnetic field strength and eddy current loss of the insulation layer are analyzed. Then, a total of 10 characteristic quantities, namely the total harmonic content and the 2nd to 10th harmonic currents, are constructed to establish a database of cable insulation defects. Finally, a long short-term memory (LSTM) deep learning algorithm is used to identify the types of insulation defects in cables. The results indicate that the LSTM algorithm can effectively diagnose and identify cable insulation defects with an accuracy of 95.83%.
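The 10 characteristic quantities described above can be extracted from a sampled current as follows. This sketch assumes the window holds a whole number of fundamental cycles (so each harmonic falls exactly on one DFT bin), takes "total harmonic content" to mean total harmonic distortion, and uses absolute harmonic amplitudes; the paper may normalize its features differently. The LSTM classifier itself is not shown.

```python
import cmath
import math
from typing import Dict, List, Sequence

def harmonic_amplitudes(samples: Sequence[float], cycles: int, max_order: int = 10) -> Dict[int, float]:
    """Peak amplitude of harmonics 1..max_order via single-bin DFTs.
    Assumes exactly `cycles` fundamental periods in the window and max_order * cycles < len(samples) / 2."""
    n = len(samples)
    amps = {}
    for h in range(1, max_order + 1):
        k = h * cycles  # DFT bin where the h-th harmonic lands
        coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples))
        amps[h] = 2.0 * abs(coeff) / n
    return amps

def harmonic_features(samples: Sequence[float], cycles: int, max_order: int = 10) -> List[float]:
    """[THD, I_2, ..., I_10], with THD = sqrt(sum of I_h**2 for h >= 2) / I_1."""
    amps = harmonic_amplitudes(samples, cycles, max_order)
    thd = math.sqrt(sum(amps[h] ** 2 for h in range(2, max_order + 1))) / amps[1]
    return [thd] + [amps[h] for h in range(2, max_order + 1)]

# Synthetic current: unit fundamental plus a 10 % third harmonic, 2 cycles, 200 samples.
n, cycles = 200, 2
signal = [math.sin(2 * math.pi * cycles * i / n) + 0.1 * math.sin(2 * math.pi * 3 * cycles * i / n)
          for i in range(n)]
features = harmonic_features(signal, cycles)
print(len(features), round(features[0], 6))  # 10 0.1
```

A sequence of such feature vectors over successive windows is the kind of input an LSTM would consume.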
A large database is desirable for machine learning (ML) to accurately predict the physicochemical properties of materials from their molecular structure. When a large database is not available, developing a proper featurization method based on the physicochemical nature of the target properties can improve the predictive power of ML models trained on a smaller database. In this work, we show that two new featurization methods, the volume occupation spatial matrix and the heat contribution spatial matrix, improve the accuracy of predicting the crystal density (ρ_crystal) and solid-phase enthalpy of formation (H_f,solid) of energetic materials using a database of 451 energetic molecules. Their mean absolute errors are reduced from 0.048 g/cm³ and 24.67 kcal/mol to 0.035 g/cm³ and 9.66 kcal/mol, respectively. Leave-one-out cross-validation shows that the newly developed ML models can be used to determine the performance of most kinds of energetic materials, except cubanes. Our ML models are applied to predict ρ_crystal and H_f,solid for CHON-based molecules in the 150-million-compound PubChem database, and 56 candidates with competitive detonation performance and reasonable chemical structures were screened out. With further improvement, spatial matrices have the potential to become multifunctional ML simulation tools that could provide even better predictions in wider fields of materials science.
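Leave-one-out cross-validation refits the model once per sample with that sample held out, which is how a family that the model cannot extrapolate to (such as cubanes above) shows up as large per-sample errors. The sketch below uses a one-variable least-squares line as a stand-in for the paper's ML models and spatial-matrix features, on made-up data in which the last sample is deliberately out of family.

```python
from typing import List, Sequence, Tuple

def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def loocv_mae(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, List[float]]:
    """Leave-one-out CV: refit without sample i, predict sample i, record |error|."""
    errors = []
    for i in range(len(xs)):
        train_x = list(xs[:i]) + list(xs[i + 1:])
        train_y = list(ys[:i]) + list(ys[i + 1:])
        slope, intercept = fit_line(train_x, train_y)
        errors.append(abs(slope * xs[i] + intercept - ys[i]))
    return sum(errors) / len(errors), errors

# Toy data: the first four samples follow one trend; the last is out of family.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 2.0, 3.0, 10.0]
mae, errors = loocv_mae(xs, ys)
print(errors.index(max(errors)))  # 4
```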
The history and current status of materials data activities, from handbooks to databases, are reviewed, with an introduction to some important products. Through an example of predicting interfacial thermal resistance with data and data science methods, we show the advantages and potential of materials informatics for studying materials problems that are too complicated or time-consuming for conventional theoretical and experimental methods. Materials big data is the foundation of materials informatics. The challenges of and strategies for constructing materials big data are discussed, and some solutions are proposed based on our experience in constructing the National Institute for Materials Science (NIMS) materials databases.
Variation in crustal thickness is a critical index for revealing how the continental crust has evolved over four billion years. Generally, ratios of whole-rock trace elements, such as Sr/Y, (La/Yb)n, and Ce/Y, are used to characterize crustal thickness. However, these proxies sometimes yield confusing results because there are not enough filtered data. Here, a state-of-the-art approach based on a machine learning algorithm is proposed to predict crustal thickness using global major- and trace-element geochemical data of intermediate arc rocks and intraplate basalts, together with their corresponding crustal thicknesses. After validation, the root-mean-square error (RMSE) and the coefficient of determination (R²) were used to evaluate the performance of the machine learning algorithm on a test dataset that was never used during the training phase. The results demonstrate that the machine learning algorithm predicts crustal thickness more reliably than conventional methods. The trained model predicts that the crustal thickness of the eastern North China Craton (ENCC) was ~45 km from the Late Triassic to the Early Cretaceous, but ~35 km since the Early Cretaceous, corresponding to a paleo-elevation of 3.0±1.5 km in the Early Mesozoic that subsequently decreased to the present-day elevation of the ENCC. These estimates are generally consistent with previous studies of lower-crustal xenoliths and of the paleoenvironment of the coastal mountains of the ENCC, indicating that the lower crust of the ENCC was abruptly delaminated in the Early Cretaceous.
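The two evaluation metrics named above are computed on held-out samples as shown below. The thickness values are made up for illustration and are not the study's test data.

```python
import math
from typing import Sequence

def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error, in the units of the target (here km)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical held-out crustal thicknesses (km) and model predictions.
observed = [30.0, 35.0, 40.0, 45.0]
predicted = [32.0, 34.0, 41.0, 44.0]
print(round(rmse(observed, predicted), 3), round(r2_score(observed, predicted), 3))  # 1.323 0.944
```

RMSE reports the typical error in kilometres, while R² reports the share of thickness variance the model explains; reporting both guards against a model that is precise but merely predicts the mean.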
Magnesium alloys have typically been designed using a so-called hill-climbing approach, with rather incremental advances over the past century. Iterative and incremental alloy design is slow and expensive, but more importantly, it does not harness all the data that exist in the field. In this work, a new approach is proposed that utilises data science and provides a detailed understanding of the data that exist in the field of Mg-alloy design to date. In this approach, a consolidated alloy database incorporating 916 datapoints was first developed from the literature and experimental work. To analyse the characteristics of the database, the effects of alloying and thermomechanical processing on mechanical properties were explored via composition-process-property matrices. Clustering, an unsupervised machine learning (ML) method, was also applied to the unlabelled data with the aim of revealing potentially useful information for a low-dimensional alloy representation space. In addition, the alloy database was correlated with thermodynamically stable secondary phases to further understand the relationships between microstructure and mechanical properties. This work not only introduces an invaluable open-source database but also provides, for the first time, data-driven insights that enable accelerated digital Mg-alloy design in the future.
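The clustering step can be illustrated with k-means. The abstract does not name the algorithm, so k-means with a deterministic farthest-first initialisation is an assumption here, and the 2-D points stand in for hypothetical normalised alloy descriptors.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> List[int]:
    """Plain k-means with deterministic farthest-first initialisation; returns cluster labels."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest existing centroid.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels

# Hypothetical normalised descriptors for four alloys forming two obvious groups.
alloys = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
print(kmeans(alloys, k=2))  # [0, 0, 1, 1]
```

Because the method needs no property labels, it can be run over the whole database to expose groupings before any supervised model is trained.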
Many high-quality studies have emerged from public databases such as Surveillance, Epidemiology, and End Results (SEER), the National Health and Nutrition Examination Survey (NHANES), The Cancer Genome Atlas (TCGA), and the Medical Information Mart for Intensive Care (MIMIC). However, these data are often characterized by high dimensional heterogeneity, timeliness, scarcity, and irregularity, so their value is not fully utilized. Data mining has been a frontier field in medical research, as it performs excellently in evaluating patient risks and assisting clinical decision-making when building disease-prediction models. Therefore, data mining has unique advantages in clinical big-data research, especially with large-scale public medical databases. This article introduces the main public medical databases and describes the steps, tasks, and models of data mining in simple language. Additionally, we describe data-mining methods along with their practical applications. The goal of this work is to help clinical researchers gain a clear and intuitive understanding of applying data-mining technology to clinical big data, in order to promote research results that benefit doctors and patients.
All-solid-state batteries (ASSBs) are a class of safer, higher-energy-density devices compared with conventional batteries, and solid-state electrolytes (SSEs) are their essential components. To date, the search for highly ion-conducting solid-state electrolytes has attracted broad attention. However, obtaining SSEs with high ionic conductivity is challenging because of complex structural information and the largely unexplored structure-performance relationship. To address these challenges, developing a database of typical SSEs from available experimental reports offers a new avenue for understanding structure-performance relationships and identifying new design guidelines for SSEs. Herein, a dynamic experimental database containing more than 600 materials was developed over a wide temperature range (132.40–1261.60 K), including monovalent and divalent cations (e.g., Li⁺, Na⁺, K⁺, Ag⁺, Ca²⁺, Mg²⁺, and Zn²⁺) and various types of anions (e.g., halide, hydride, sulfide, and oxide). Data mining was conducted to explore the relationships among different variables (e.g., transport ion, composition, activation energy, and conductivity). Overall, we expect that this database can provide essential guidelines for the design and development of high-performance SSEs for ASSB applications. The database is dynamically updated and can be accessed via our open-source online system.
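Activation energy and conductivity, two of the variables mined above, are usually linked through the Arrhenius relation. Assuming the common form σT = A·exp(-Ea / (k_B·T)), a linear fit of ln(σT) against 1/T recovers Ea from the slope. The data below are synthetic, generated with Ea = 0.30 eV and A = 1e5 S·K/cm purely for illustration.

```python
import math
from typing import Sequence, Tuple

K_B_EV = 8.617333262e-5  # Boltzmann constant, eV/K

def fit_arrhenius(temps_k: Sequence[float], sigmas: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of ln(sigma*T) = ln(A) - Ea / (k_B * T); returns (Ea in eV, A)."""
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(s * t) for s, t in zip(sigmas, temps_k)]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return -slope * K_B_EV, math.exp(intercept)

# Synthetic conductivities (S/cm) generated from the assumed Arrhenius form.
temps = [250.0, 300.0, 350.0, 400.0]
sigmas = [1e5 / t * math.exp(-0.30 / (K_B_EV * t)) for t in temps]
ea, prefactor = fit_arrhenius(temps, sigmas)
print(round(ea, 4))  # 0.3
```

Some reports fit ln σ instead of ln(σT); the two conventions give slightly different Ea values, which matters when pooling data from many sources.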
This article outlines progress made in designing an intelligent information system for automatic management and knowledge discovery in large numeric and scientific databases, with a validating application to the CAST-NEONS environmental databases used for ocean modeling and prediction. We describe a discovery-learning process (the Automatic Data Analysis System) that combines the features of two machine learning techniques to generate sets of production rules that efficiently describe the raw observational data contained in the database. Data clustering allows the system to classify the raw data into meaningful conceptual clusters, which the system then learns by induction to build decision trees, from which the production rules are automatically deduced.
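The cluster-then-induce pipeline can be sketched as follows: given records already labelled with cluster names, a small decision tree is grown with entropy-based threshold splits (in the style of ID3/C4.5), and each root-to-leaf path is read off as an IF-THEN production rule. The split criterion, depth limit, and the ocean-style toy records are assumptions; the original system's induction algorithm is not specified here.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Rule = Tuple[List[str], str]  # (conditions, cluster label)

def entropy(labels: Sequence[str]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(rows, labels) -> Optional[Tuple[float, int, float]]:
    """Return (information gain, feature index, threshold) of the best binary split."""
    best = None
    base = entropy(labels)
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for r, lab in zip(rows, labels) if r[j] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[j] > t]
            gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            if best is None or gain > best[0]:
                best = (gain, j, t)
    return best

def induce_rules(rows, labels, names, conditions=(), depth=3) -> List[Rule]:
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == 0:
        return [(list(conditions), majority)]  # leaf: emit one rule
    split = best_split(rows, labels)
    if split is None or split[0] <= 0:
        return [(list(conditions), majority)]
    _, j, t = split
    rules: List[Rule] = []
    for keep, cond in ((lambda v: v <= t, f"{names[j]} <= {t:g}"), (lambda v: v > t, f"{names[j]} > {t:g}")):
        part = [(r, lab) for r, lab in zip(rows, labels) if keep(r[j])]
        rules += induce_rules([r for r, _ in part], [lab for _, lab in part],
                              names, conditions + (cond,), depth - 1)
    return rules

# Hypothetical observations already assigned to clusters by a prior clustering step.
rows = [(2.0, 34.0), (3.0, 34.5), (15.0, 35.0), (18.0, 35.5)]
clusters = ["cold", "cold", "warm", "warm"]
for conds, label in induce_rules(rows, clusters, ["temperature", "salinity"]):
    print("IF " + (" AND ".join(conds) or "TRUE") + f" THEN cluster = {label}")
```

When two features separate the clusters equally well, this sketch keeps the first one found, so rule wording depends on feature order.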
In the soft sensor field, just-in-time learning (JITL) is an effective approach to modeling nonlinear and time-varying processes. However, most similarity criteria in JITL are computed only in the input space, ignoring important output information, which may lead to inaccurate construction of the relevant sample set. To solve this problem, we propose a novel supervised feature extraction method for regression problems, called supervised local and non-local structure preserving projections (SLNSPP), in which both input and output information can be easily and effectively incorporated through a newly defined similarity index. SLNSPP not only retains the virtues of locality preserving projections but also prevents distant points from becoming close after projection, which endows SLNSPP with powerful discriminating ability. These two properties are desirable for JITL, as they are expected to improve the accuracy of similar-sample selection. Accordingly, we present an SLNSPP-JITL framework for developing adaptive soft sensors, including a sparse learning strategy to limit the size and update frequency of the database. Finally, two case studies with benchmark datasets are conducted to evaluate the performance of the proposed schemes. The results demonstrate the effectiveness of LNSPP and SLNSPP.
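The basic JITL loop that SLNSPP is meant to improve can be sketched as below: for each query, select the most similar stored samples and fit a local model on the spot. This baseline measures similarity with Euclidean distance in the raw input space (the very limitation the paper addresses), uses an inverse-distance-weighted average as the simplest possible local model, and bounds the database with a FIFO size cap; all three are illustrative assumptions, not the SLNSPP method or its sparse learning strategy.

```python
import math
from collections import deque
from typing import Sequence

class JITLSoftSensor:
    """Just-in-time learning: build a local model around each query from a bounded database."""

    def __init__(self, k: int = 3, max_size: int = 500):
        self.k = k
        # Hypothetical size cap: once full, the oldest sample drops out on each append.
        self.database = deque(maxlen=max_size)

    def add_sample(self, x: Sequence[float], y: float) -> None:
        self.database.append((tuple(x), y))

    def predict(self, query: Sequence[float]) -> float:
        # Similarity in the raw input space; SLNSPP would instead compare samples
        # in a projection learned from both inputs and outputs.
        nearest = sorted((math.dist(x, query), y) for x, y in self.database)[: self.k]
        for d, y in nearest:
            if d == 0.0:
                return y  # exact match: reuse its output directly
        # Local model: inverse-distance-weighted average of the k most similar samples.
        weights = [1.0 / d for d, _ in nearest]
        return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

sensor = JITLSoftSensor(k=2, max_size=5)
for x, y in [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (10.0, 100.0), (11.0, 110.0)]:
    sensor.add_sample([x], y)
print(sensor.predict([1.5]))  # 1.5
```

Replacing the distance function is the only change needed to plug a learned similarity index into this loop.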
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 52172258, 52473227, and 52171150) and the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDB0500200).
Abstract: The unique long-range disordered atomic arrangement inherent in amorphous materials endows them with a range of superior properties, making them highly promising for applications in catalysis, medicine, and battery technology, among other fields. Since not all materials can be synthesized in an amorphous structure, the composition design of amorphous materials is of significant importance. Machine learning offers a valuable alternative to traditional “trial-and-error” methods by predicting properties from experimental data, thus providing efficient guidance for material design. In this study, we develop a machine learning workflow to predict the critical casting diameter, glass transition temperature, and Young's modulus for 45 reported ternary amorphous alloy systems. The predicted results have been organized into a database, enabling direct retrieval of predicted values from compositional information. Furthermore, we demonstrate three applications: screening high glass-forming-ability regions in a specified system, screening systems against multiple property targets, and iteratively searching for high glass-forming-ability regions. By utilizing machine learning predictions, researchers can effectively narrow the experimental scope and expedite the exploration of compositions.
Abstract: The authors regret that the original publication of this paper did not include Jawad Fayaz as a co-author. After further discussion and a thorough review of the research contributions, it was agreed that his significant contributions to the foundational aspects of the research warranted recognition, and he has now been added as a co-author.
Abstract: Because databases are information-rich collections, some people will always attempt to exploit them for ulterior purposes, while others are committed to finding ways to counter database security threats. The purpose of database security research is to prevent databases from being illegally used or destroyed. This paper introduces the main literature in the field of database security research in recent years. First, we classify these papers, using the factors that influence database security as the classification criteria. In comparing traditional and machine learning (ML) methods, explanations of key concepts are interspersed to make these methods easier to understand. Second, we find that related research has achieved encouraging results but also has shortcomings, such as weak generalization and deviation from real-world conditions. Then, possible future work in this area is proposed. Finally, we summarize the main contributions.
Funding: This work was supported by the National Natural Science Foundation of China (No. 21573204 and No. 21421063), the Fundamental Research Funds for the Central Universities, the National Program for Support of Top-notch Young Professionals, the CAS Interdisciplinary Innovation Team, and the supercomputer centers USTCSCC and SCCAS.
Abstract: Chemical structure searching based on databases and machine learning has recently attracted great attention for fast screening of materials with target functionalities. To this end, we established a high-performance chemical structure database based on the MySQL engine, named MYDB. More than 160,000 metal-organic frameworks (MOFs) have been collected and stored using new retrieval algorithms for efficient searching and recommendation. The evaluation results show that MYDB can perform fast and efficient keyword searches against millions of records and provide real-time recommendations of similar structures. Combining machine learning methods with the materials database, we developed an adsorption model to determine the adsorption capacity of metal-organic frameworks toward argon and hydrogen under certain conditions. We expect that MYDB, together with the developed machine learning techniques, can support large-scale, low-cost, and highly convenient structural research toward accelerating the discovery of materials with target functionalities in the field of computational materials research.
Funding: Financial support from Teesside University for the Ph.D. programme of the first author.
Abstract: Traditionally, nonlinear time history analysis (NLTHA) is used to assess the performance of structures under future hazards, which is necessary for developing effective disaster risk management strategies. However, this method is computationally intensive and unsuitable for analyzing large numbers of structures on a city-wide scale. Surrogate models offer an efficient and reliable alternative and facilitate evaluating the performance of multiple structures under different hazard scenarios. However, creating a comprehensive database for surrogate modelling at the city level presents challenges. To overcome this, the present study proposes meta-databases and a general framework for surrogate modelling of steel structures. The dataset includes 30,000 steel moment-resisting frame buildings, representing low-rise, mid-rise, and high-rise buildings, with criteria for connections, beams, and columns. Pushover analysis is performed and structural parameters are extracted; finally, using two machine learning techniques, random forest and Shapley additive explanations, sensitivity and explainability analyses of the structural parameters are performed to identify the most significant factors in designing steel moment-resisting frames. The framework and databases can be used as a validated source for surrogate modelling of steel frame structures for disaster risk management.
Abstract: At present, database cache models for power information systems suffer from problems such as slow running speed and a low database hit rate. To address this, this paper proposes a database cache model for power information systems based on deep machine learning. The caching model includes program caching, Structured Query Language (SQL) preprocessing, and core caching modules. Statement efficiency is improved by adjusting operations such as multi-table joins and keyword replacement in the SQL optimizer. In the core caching module, predictive models are built with boosted regression trees: a machine learning algorithm generates a series of regression tree models, and the resource occupancy rate of the power information system is analyzed to dynamically adjust the voting selection among the regression trees. At the same time, the voting threshold of the prediction model is dynamically adjusted, and the cache model is re-initialized accordingly. The experimental results show that the model achieves a good cache hit rate and cache efficiency and can improve the data cache performance of the power information system. It has a high hit rate and short delay time, and it maintains a good hit rate under different amounts of computer memory. At the same time, it occupies little space and CPU during actual operation, which helps the power information system operate efficiently and quickly.
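The model is evaluated mainly by cache hit rate, that is, the fraction of requests served from the cache. The sketch below only shows how that metric is counted. It assumes a plain least-recently-used (LRU) eviction policy as a baseline and does not implement the paper's boosted-regression-tree model. The class name and the `load` callback are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Baseline LRU cache that counts hits to report a hit rate (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # key -> value, least recently used first
        self.hits = 0
        self.requests = 0

    def get(self, key, load):
        """Return the cached value, or call the hypothetical `load(key)` on a miss."""
        self.requests += 1
        if key in self.store:
            self.hits += 1
            self.store.move_to_end(key)  # mark as most recently used
            return self.store[key]
        value = load(key)
        self.store[key] = value
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used entry
        return value

    @property
    def hit_rate(self):
        return self.hits / self.requests if self.requests else 0.0
```

A learned caching policy, such as the regression-tree model described above, would replace the eviction rule while keeping the same hit-rate bookkeeping, which makes comparisons against the baseline straightforward.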
Funding: Supported by DAAD, Google Research, and the Organization for Women in Science for the Developing World (OWSD).
Abstract: This study reviews existing deep learning applications for diagnosing diabetic retinopathy and retinopathy of prematurity, together with the public retinal databases available for these diseases. Two independent reviewers conducted an in-depth literature search of the Scopus, Web of Science, IEEE, and ACM databases for articles published from inception up to 31 January 2023, and the quality of the included studies was assessed with the International Journal of Medical Informatics (IJMEDI) checklist. Of 1476 articles, 26 were included, covering a total of 36 models. Data size and model validation were found to be challenges for most studies. Deep learning models are gaining focus in the development of medical diagnosis tools and applications. However, a critical issue persists in many published studies: some do not report their data sources and data sizes, information that is essential for verifying model performance.
Funding: Funded by the National Natural Science Foundation of China (12363009 and 12103020); the Natural Science Foundation of Jiangxi Province (20224BAB211011); the Youth Talent Project of Science and Technology Plan of Ganzhou (2022CXRC9191 and 2023CYZ26970); and the Jiangxi Province Graduate Innovation Special Funds Project (YC2024-S529 and YC2023-S672).
Abstract: Planetary surfaces, shaped by billions of years of geologic evolution, display numerous impact craters whose size distribution, density, and spatial arrangement reveal the celestial body's history. Identifying these craters is essential for planetary science and is currently achieved mainly with deep learning-driven detection algorithms. However, because impact crater characteristics are substantially affected by the geologic environment, surface materials, and atmospheric conditions, the performance of deep learning models can be inconsistent across celestial bodies. In this paper, we first examine how the surface characteristics of the Moon, Mars, and Earth, along with the differences in their impact crater features, affect model performance. Then, we compare crater detection across celestial bodies by analyzing enhanced convolutional neural networks and U-shaped convolutional neural network-based models to highlight how geology, data, and model design affect accuracy and generalization. Finally, we address current deep learning challenges, suggest directions for model improvement, such as multimodal data fusion and cross-planet learning, and list available impact crater databases. This review can provide necessary technical support for deep space exploration and planetary science, as well as new ideas and directions for future research on the automatic detection of impact craters on celestial body surfaces and on planetary geology.
Funding: Supported by the National Natural Science Foundation of China (22073033, 21873032, 21673087, 21903032); startup funds (2006013118 and 3004013105) from Huazhong University of Science and Technology; the Fundamental Research Funds for the Central Universities (2019kfyRCPY116); and the Innovation and Talent Recruitment Base of New Energy Chemistry and Device (B21003). Also supported by the public computing service platform provided by the Network and Computing Center of HUST.
Abstract: Two-dimensional transition metal porphyrinoid materials (2DTMPoidMats), owing to their unique electronic structure and tunable metal active sites, have the potential to enhance interactions with nitrogen molecules and promote the protonation process, making them promising electrocatalysts for the electrochemical nitrogen reduction reaction (eNRR). Experimentally screening a large number of catalysts for eNRR performance would consume considerable time and economic resources. First-principles calculations and machine learning (ML) algorithms can greatly improve the efficiency of catalyst screening. Using this approach, we selected 86 candidates capable of catalyzing eNRR from 1290 types of 2DTMPoidMats and verified the results with density functional theory (DFT) computations. Analysis of the full reaction pathway shows that MoPp-meso-F-β-Py, MoPp-β-Cl-meso-Diyne, MoPp-meso-Ethinyl, and WPp-β-Pz exhibit the best catalytic performance, with onset potentials of -0.22, -0.19, -0.23, and -0.35 V, respectively. This work provides valuable insights into the efficient design and screening of eNRR catalysts and promotes the application of ML models in catalysis.
Funding: Financially supported by the National Natural Science Foundation of China (Nos. 61971208, 61671225 and 51864027); the Yunnan Applied Basic Research Projects (No. 2018FA034); the Yunnan Reserve Talents of Young and Middle-aged Academic and Technical Leaders (Shen Tao, 2018); the Yunnan Young Top Talents of Ten Thousands Plan (Shen Tao, Zhu Yan, Yunren Social Development No. 2018 73); and the Scientific Research Foundation of Kunming University of Science and Technology (No. KKSY201703016).
Abstract: Discovering new materials with excellent performance is a hot topic in the Materials Genome Initiative. Traditional experiments and calculations often consume large amounts of time and money and are also limited by various conditions. It is therefore imperative to develop new methods to accelerate the discovery and design of materials. In recent years, material discovery and design methods based on machine learning have attracted much attention from materials experts and have made some progress. This review first outlines available materials databases and material data analytics tools and then elaborates on the machine learning algorithms used in materials science. Next, the applications of machine learning in materials science are summarized, focusing on structure determination, performance prediction, fingerprint prediction, and new material discovery. Finally, the review points out the problems of data and machine learning in materials science and outlines directions for future research. The authors expect machine learning algorithms to enable substantial advances in material discovery and design.
Funding: Supported by the technology project of the State Grid Shanxi Electric Power Company, “Research and Application of Cable electrification diagnosis Technology based on Harmonic method” (5205C02000GL).
Abstract: Insulation aging of cross-linked polyethylene (XLPE) cables is the main cause of reduced cable life. Power-related sectors currently lack rapid and effective methods for detecting cable insulation defects. To this end, this paper presents a method for identifying insulation defects in XLPE cables based on deep learning algorithms. First, the principle of the harmonic method for detecting cable insulation defects is introduced. Second, the ANSYS software is used to simulate cable insulation layers containing bubble, protrusion, and water-tree defects, and the effects of each defect type on the magnetic field strength and eddy current loss of the cable insulation layer are analyzed. Then, ten characteristic quantities, namely the total harmonic content and the 2nd- to 10th-order harmonic currents, are constructed to establish a database of cable insulation defects. Finally, a deep learning algorithm, long short-term memory (LSTM), is used to identify the type of insulation defect in the cable. The results indicate that the LSTM algorithm can effectively diagnose and identify cable insulation defects with an accuracy of 95.83%.
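The ten input features described above can be sketched from a sampled current waveform. This is a minimal illustration that assumes "total harmonic content" means total harmonic distortion (THD) over orders 2 to 10 relative to the fundamental, and that the record spans an integer number of fundamental cycles so that each harmonic falls exactly on one DFT bin. The paper's exact definitions and normalization may differ, and the function names are hypothetical.

```python
import cmath
import math


def harmonic_amplitudes(samples, cycles, max_order=10):
    """Peak amplitudes of harmonics 1..max_order via single-bin DFTs.

    Assumes `samples` covers exactly `cycles` fundamental periods, so the
    harmonic of order h sits on DFT bin h * cycles (well below Nyquist).
    """
    n = len(samples)
    amps = []
    for order in range(1, max_order + 1):
        k = order * cycles
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        amps.append(2 * abs(acc) / n)  # convert DFT magnitude to sinusoid peak amplitude
    return amps


def harmonic_features(samples, cycles):
    """Return [THD, I_2, ..., I_10]: ten features per waveform (assumed definitions)."""
    amps = harmonic_amplitudes(samples, cycles)
    fundamental, higher = amps[0], amps[1:]
    thd = math.sqrt(sum(a * a for a in higher)) / fundamental
    return [thd] + higher
```

Each waveform then becomes a 10-dimensional feature vector; a sequence of such vectors could serve as LSTM input, although the abstract does not state how the features are arranged over time.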
Funding: Supported by the Ministry of Education (MOE) Singapore Tier 1 (RG8/20).
Abstract: A large database is desirable for machine learning (ML) to accurately predict the physicochemical properties of materials from their molecular structure. When a large database is not available, developing a featurization method grounded in the physicochemical nature of the target properties can improve the predictive power of ML models trained on a smaller database. In this work, we show that two new featurization methods, the volume occupation spatial matrix and the heat contribution spatial matrix, improve the accuracy of predicting the crystal density ($\rho_{\mathrm{crystal}}$) and solid-phase enthalpy of formation ($H_{f,\mathrm{solid}}$) of energetic materials using a database of 451 energetic molecules. The mean absolute errors are reduced from 0.048 g/cm$^3$ and 24.67 kcal/mol to 0.035 g/cm$^3$ and 9.66 kcal/mol, respectively. Leave-one-out cross-validation shows that the newly developed ML models can be used to determine the performance of most kinds of energetic materials, with the exception of cubanes. Our ML models were applied to predict $\rho_{\mathrm{crystal}}$ and $H_{f,\mathrm{solid}}$ for the CHON-based molecules in the 150-million-entry PubChem database, and 56 candidates with competitive detonation performance and reasonable chemical structures were screened out. With further improvement, spatial matrices have the potential to become multifunctional ML simulation tools that could provide even better predictions in wider fields of materials science.
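Leave-one-out cross-validation, reported above together with mean absolute error (MAE), holds out each molecule in turn, trains on the remaining data, and averages the absolute errors of the held-out predictions. The sketch below assumes a k-nearest-neighbour regressor as a stand-in model and plain numeric feature vectors in place of the spatial matrices. Neither choice comes from the paper, and the helper names are hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def knn_predict(train_X, train_y, x, k=3):
    """Hypothetical stand-in model: mean target of the k nearest feature vectors."""
    ranked = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def loocv_mae(X, y, k=3):
    """Leave-one-out CV: predict each sample from a model built on all the others."""
    preds = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        preds.append(knn_predict(train_X, train_y, X[i], k))
    return mae(y, preds)
```

With only a few hundred molecules, leave-one-out uses every sample for testing exactly once without shrinking the training set much, which is one reason it suits small databases like the 451-molecule set described here.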
Funding: Project supported by the “Materials Research by Information Integration” Initiative (MI2I) project of the Support Program for Starting Up Innovation Hub from the Japan Science and Technology Agency (JST).
Abstract: The history and current status of materials data activities, from handbooks to databases, are reviewed, and some important products are introduced. Through an example of predicting interfacial thermal resistance with data and data science methods, we show the advantages and potential of materials informatics for studying problems that are too complicated or time-consuming for conventional theoretical and experimental methods. Materials big data is the foundation of materials informatics. The challenges of and strategies for constructing materials big data are discussed, and some solutions are proposed based on our experience in constructing the National Institute for Materials Science (NIMS) materials databases.
Funding: Co-funded by the National Natural Science Foundation of China (Grant Nos. 42002089, 41930428); the National Key R&D Program of China (Grant Nos. 2016YFC0600401 and 2017YFC0602302); and the Open Research Fund Program of the Key Laboratory of Metallogenic Prediction of Nonferrous Metals and Geological Environment Monitoring (Central South University), Ministry of Education (Grant Nos. 2020YSJS02, 2020YSJS01).
Abstract: The variation of crustal thickness is a critical index for revealing how the continental crust has evolved over its four billion years. Generally, ratios of whole-rock trace elements, such as Sr/Y, $(\mathrm{La/Yb})_N$, and Ce/Y, are used to characterize crustal thickness. However, these proxies sometimes give confusing results because insufficient filtered data are available. Here, a state-of-the-art approach based on a machine learning algorithm is proposed to predict crustal thickness from global major- and trace-element geochemical data of intermediate arc rocks and intraplate basalts, together with their corresponding crustal thicknesses. After validation, the root-mean-square error (RMSE) and the coefficient of determination ($R^2$) were used to evaluate the performance of the machine learning algorithm on a dataset that was never used during the training phase. The results demonstrate that the machine learning algorithm predicts crustal thickness more reliably than conventional methods. The trained model predicts that the crustal thickness of the eastern North China Craton (ENCC) was ~45 km from the Late Triassic to the Early Cretaceous but ~35 km from the Early Cretaceous onward, corresponding to a paleo-elevation of 3.0 ± 1.5 km in the Early Mesozoic that subsequently decreased to the present-day elevation of the ENCC. These estimates are generally consistent with previous studies of lower-crustal xenoliths and of the paleoenvironment of the coastal mountains of the ENCC, indicating that the lower crust of the ENCC was abruptly delaminated in the Early Cretaceous.
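The two evaluation metrics named above have standard definitions: $\mathrm{RMSE} = \sqrt{\tfrac{1}{n}\sum_i (y_i - \hat y_i)^2}$ and $R^2 = 1 - \sum_i (y_i - \hat y_i)^2 / \sum_i (y_i - \bar y)^2$. A minimal sketch for computing them on held-out crustal-thickness predictions follows. The function names are illustrative, and the paper may have used a library implementation instead.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error of predictions."""
    n = len(y_true)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1 - ss_res / ss_tot
```

RMSE is expressed in the target's units (here, km of crustal thickness), whereas $R^2$ is unitless. An $R^2$ of 0 means the model does no better than always predicting the mean thickness.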
Funding: Supported by the Monash-IITB Academy Scholarship; funded in part by the Australian Research Council (DP190103592).
Abstract: Typically, magnesium alloys have been designed using a so-called hill-climbing approach, with largely incremental advances over the past century. Iterative and incremental alloy design is slow and expensive, and, more importantly, it does not harness all the data that exist in the field. In this work, a new approach is proposed that utilises data science and provides a detailed understanding of the data available in the field of Mg-alloy design to date. In this approach, a consolidated alloy database incorporating 916 data points was first developed from the literature and experimental work. To analyse the characteristics of the database, the effects of alloying and thermomechanical processing on mechanical properties were explored via composition-process-property matrices. An unsupervised machine learning (ML) method, clustering, was also applied to unlabelled data with the aim of revealing potentially useful information for a low-dimensional alloy representation space. In addition, the alloy database was correlated with thermodynamically stable secondary phases to further understand the relationships between microstructure and mechanical properties. This work not only introduces an invaluable open-source database but also provides, for the first time, data-driven insights that enable future accelerated digital Mg-alloy design.
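The abstract does not name the clustering algorithm. As an illustration of grouping unlabelled alloy records, here is a minimal k-means sketch. It assumes numeric feature vectors (for example, composition fractions and processing parameters) and a deliberately naive initialisation that uses the first k points, which is chosen only for reproducibility. Practical work would scale the features and use a more robust initialisation such as k-means++. The function name is hypothetical.

```python
import math


def kmeans(points, k, iters=100):
    """Plain k-means on numeric tuples; returns (labels, centroids).

    Naive initialisation from the first k points (an assumption for reproducibility).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids
```

The resulting cluster labels can then be cross-tabulated against processing routes or secondary phases to look for structure in the database, which is the kind of exploratory use the abstract describes.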
Funding: Supported by the National Social Science Foundation of China (No. 16BGL183).
Abstract: Many high-quality studies have emerged from public databases, such as Surveillance, Epidemiology, and End Results (SEER), the National Health and Nutrition Examination Survey (NHANES), The Cancer Genome Atlas (TCGA), and the Medical Information Mart for Intensive Care (MIMIC). However, these data are often characterized by a high degree of dimensional heterogeneity, timeliness, scarcity, irregularity, and other characteristics, so their value is not fully utilized. Data-mining technology has been a frontier field in medical research, as it performs excellently in evaluating patient risks and in assisting clinical decision-making by building disease-prediction models. Therefore, data mining has unique advantages in clinical big-data research, especially with large-scale public medical databases. This article introduces the main public medical databases and describes the steps, tasks, and models of data mining in simple language. Additionally, we describe data-mining methods along with their practical applications. The goal of this work is to help clinical researchers gain a clear and intuitive understanding of how data-mining technology is applied to clinical big data, in order to promote research results that benefit doctors and patients.
Funding: Supported by the Ensemble Grant for Early Career Researchers 2022 and the 2023 Ensemble Continuation Grant of Tohoku University, the Hirose Foundation, the Iwatani Naoji Foundation, and the AIMR Fusion Research Grant; JSPS KAKENHI Nos. JP23K13599, JP23K13703, JP22H01803, and JP18H05513; the Center for Computational Materials Science, Institute for Materials Research, Tohoku University, for the use of MASAMUNE-IMR (Nos. 202212-SCKXX0204 and 202208-SCKXX-0212); the Institute for Solid State Physics (ISSP) at the University of Tokyo for the use of its supercomputers; and the China Scholarship Council (CSC) fund to pursue studies in Japan.
Abstract: All-solid-state batteries (ASSBs) are a class of safer, higher-energy-density devices compared with conventional batteries, and solid-state electrolytes (SSEs) are their essential components. To date, the search for highly ion-conducting solid-state electrolytes has attracted broad interest. However, obtaining SSEs with high ionic conductivity is challenging because of the complex structural information and the little-explored structure-performance relationship. To address these challenges, developing a database of typical SSEs from available experimental reports offers a new avenue for understanding structure-performance relationships and deriving new guidelines for the rational design of SSEs. Herein, a dynamic experimental database containing more than 600 materials was developed over a wide range of temperatures (132.40–1261.60 K), including monovalent and divalent cations (e.g., $\mathrm{Li}^{+}$, $\mathrm{Na}^{+}$, $\mathrm{K}^{+}$, $\mathrm{Ag}^{+}$, $\mathrm{Ca}^{2+}$, $\mathrm{Mg}^{2+}$, and $\mathrm{Zn}^{2+}$) and various types of anions (e.g., halide, hydride, sulfide, and oxide). Data mining was conducted to explore the relationships among different variables (e.g., transport ion, composition, activation energy, and conductivity). Overall, we expect that this database can provide essential guidelines for the design and development of high-performance SSEs for ASSB applications. The database is dynamically updated and can be accessed via our open-source online system.
文摘The present article outlines progress made in designing an intelligent information system for automatic management and knowledge discovery in large numeric and scientific databases, with a validating application to the CAST-NEONS environmental databases used for ocean modeling and prediction. We describe a discovery-learning process (Automatic Data Analysis System) which combines the features of two machine learning techniques to generate sets of production rules that efficiently describe the observational raw data contained in the database. Data clustering allows the system to classify the raw data into meaningful conceptual clusters, which the system learns by induction to build decision trees, from which are automatically deduced the production rules.
Funding: Supported by the National Natural Science Foundation of China (61273160) and the Fundamental Research Funds for the Central Universities (14CX06067A, 13CX05021A).
Abstract: In the soft sensor field, just-in-time learning (JITL) is an effective approach to modeling nonlinear and time-varying processes. However, most similarity criteria in JITL are computed in the input space only, ignoring important output information, which may lead to inaccurate construction of the relevant sample set. To solve this problem, we propose a novel supervised feature extraction method for regression problems, called supervised local and non-local structure preserving projections (SLNSPP), in which both input and output information are easily and effectively incorporated through a newly defined similarity index. SLNSPP not only retains the virtue of locality preserving projections but also prevents faraway points from moving closer after projection, which endows SLNSPP with powerful discriminating ability. These two properties are desirable for JITL because they are expected to improve the accuracy of similar-sample selection. Consequently, we present an SLNSPP-JITL framework for developing adaptive soft sensors, including a sparse learning strategy to limit the size of the database and the frequency of its updates. Finally, two case studies are conducted with benchmark datasets to evaluate the performance of the proposed schemes. The results demonstrate the effectiveness of LNSPP and SLNSPP.
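For context, the conventional JITL loop that SLNSPP is designed to improve works as follows. For each query, it selects the most similar historical samples by input-space distance, fits a local model to them, predicts, and then discards the model. The sketch below assumes a one-dimensional input, Euclidean (absolute) distance, and a local least-squares line. It is the baseline criticised in the abstract, not SLNSPP, and the function name is hypothetical.

```python
def jitl_predict(db_x, db_y, query, k=5):
    """Baseline JITL: local linear fit on the k nearest samples in input space."""
    # 1. Relevant-sample selection using input-space similarity only
    ranked = sorted(zip(db_x, db_y), key=lambda pair: abs(pair[0] - query))[:k]
    xs = [x for x, _ in ranked]
    ys = [y for _, y in ranked]
    # 2. Local ordinary least-squares line through the selected samples
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my  # all neighbours share one input value; fall back to their mean
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    # 3. Predict at the query point; the local model is then discarded
    return my + slope * (query - mx)
```

Step 1 is the part SLNSPP changes: instead of measuring distance on raw inputs alone, similarity is computed in a projected space that also reflects output information. The local modelling and prediction steps can stay the same.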