OBJECTIVE: To help researchers select appropriate data mining models to provide better evidence for the clinical practice of Traditional Chinese Medicine (TCM) diagnosis and therapy. METHODS: Clinical issues based on data mining models were comprehensively summarized from four significant elements of clinical studies: symptoms, symptom patterns, herbs, and efficacy. Existing problems were further generalized to determine the factors relevant to the performance of data mining models, e.g., data type, samples, parameters, and variable labels. Combining these relevant factors, the TCM clinical data features were compared with regard to statistical characteristics and informatics properties. Data models were compared simultaneously in terms of applicable conditions and suitable scopes. RESULTS: The main application problems were data types inconsistent with the data mining models used and small samples, which led to inappropriate or even erroneous results. The features of the models, i.e., advantages, disadvantages, suitable data types, data mining tasks, and the TCM issues, were summarized and compared. CONCLUSION: By considering the special features of different data mining models, clinical doctors can select suitable data mining models to resolve TCM problems.
The cooling process of iron ore pellets in a circular cooler has a great impact on pellet quality and systematic energy exploitation. However, the many variables and the lack of visualization in this gray system are unfavorable to efficient production. Thus, the cooling process of iron ore pellets was optimized using a mathematical model and data mining techniques. A mathematical model was established and validated with steady-state production data, and the results show that the calculated values coincide very well with the measured values. Based on the proposed model, the effects of important process parameters on gas-pellet temperature profiles within the circular cooler were analyzed to better understand the entire cooling process. Two data mining techniques, Association Rules Induction and Clustering, were also applied to the steady-state production data to obtain expert operating rules and optimized targets. Finally, an optimized control strategy for the circular cooler was proposed and an operation guidance system was developed. The system can visualize the thermal process at steady state and provide operation guidance to optimize the circular cooler.
An experience is presented using the finite element method (FEM) and data mining (DM) techniques to develop models that can be used to optimize the skin-pass rolling process based on its operating conditions. An FE model based on a real skin-pass process is built and validated. Based on this model, a group of FE models is simulated with different adjustment parameters and with different materials for the sheet; both variables are chosen from pre-set ranges. From all FE model simulations, a database is generated; this database is made up of the above-mentioned adjustment parameters, sheet properties, and the process variables arising from the simulation of the model. Various types of data mining algorithms are used to develop predictive models for each of the process variables. The best predictive models can be used to predict variables that are hard to measure experimentally (internal stresses, internal strains, etc.), which are useful in the optimal design of the process or for application in real-time control systems of an in-plant skin-pass process.
In order to find an effective way to improve the quality of school management, it is necessary to find valuable information in students' original data and provide feedback for student management. First, some new and successful educational data mining models were analyzed and compared. These models perform better than traditional models (such as the Knowledge Tracing Model) in efficiency, comprehensiveness, ease of use, stability, and so on. Then, a neural network algorithm was applied to explore the feasibility of applying educational data mining to student management, and the results show that it has sufficient predictive accuracy and reliability to be put into practice. Finally, the possibility and prospects of applying educational data mining in a teaching management system for university students were assessed.
The high-temperature dielectric properties of quartz fiber-reinforced silicon dioxide ceramic (SiO2/SiO2) composites were studied both theoretically and experimentally. A multi-scale theoretical model was developed based on the theory of dielectrics. Dielectric properties at higher temperatures (> 1200 ℃) were predicted by mining experimental data for the correlative coefficients in the model. The results show that the dielectric properties of SiO2/SiO2 calculated with the theoretical model were in agreement with the experimentally measured values.
Data Mining has become an important technique for the exploration and extraction of data in numerous and varied research projects in different fields (technology, information technology, business, the environment, economics, etc.). For the analysis and visualisation of large amounts of temporally ordered data (time series) extracted using Data Mining, free software such as R has emerged internationally as an inexpensive and efficient tool for exploiting and visualising time series. This has allowed the development of models that help to extract the most relevant information from large volumes of data. In this regard, a script has been developed with the goal of implementing ARIMA models, showing these to be useful and quick mechanisms for the extraction, analysis and visualisation of large data volumes, with the further advantage of being applicable in multiple branches of knowledge, including economics, demography, physics, mathematics and fisheries, among others. ARIMA models therefore appear as a Data Mining technique that offers reliable, robust and high-quality results to help validate and sustain the research carried out.
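As a rough illustration of what such a script does, the sketch below fits the simplest ARIMA variant, ARIMA(1,1,0) without a constant, by conditional least squares in plain Python. The script described in the abstract is written in R and is not reproduced here; the function names and the toy series are assumptions made for this example.

```python
def difference(series):
    """First-order differencing: the 'I' (d = 1) step of ARIMA."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(diffs):
    """Least-squares estimate of phi in d[t] = phi * d[t-1] + noise (no constant)."""
    num = sum(cur * prev for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    if den == 0:
        raise ValueError("lagged differences are all zero; phi is undefined")
    return num / den


def forecast_next(series):
    """One-step-ahead forecast from an ARIMA(1,1,0) model without a constant."""
    diffs = difference(series)
    phi = fit_ar1(diffs)
    # Next value = last value + predicted next increment.
    return series[-1] + phi * diffs[-1]


# Toy series (hypothetical data): its increments halve at every step.
toy_series = [0.0, 8.0, 12.0, 14.0, 15.0]
next_value = forecast_next(toy_series)
```

In practice a full ARIMA fit (with moving-average terms, order selection, and diagnostics) would come from a statistics package rather than hand-written code; the point here is only the differencing-then-autoregression structure.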
For a multi-mode radar working in the modern electronic battlefield, different working states of a single radar are prone to being classified as multiple emitters when traditional classification methods are used to process intercepted signals, which has a negative effect on signal classification. A classification method based on spatial data mining is presented to address this challenge. Inspired by the idea of spatial data mining, the classification method applies a nuclear field to depict the distribution information of pulse samples in feature space, and digs out the hidden cluster information by analyzing distribution characteristics. In addition, a membership-degree criterion is established to quantify the correlation among all classes, which ensures the classification accuracy of signal samples. Numerical experiments show that the presented method can effectively prevent different working states of a multi-mode emitter from being classified as several emitters, and achieves higher classification accuracy.
Data mining in the educational field can be used to optimize teaching and learning performance among students. Recently developed machine learning (ML) and deep learning (DL) approaches can be utilized to mine the data effectively. This study proposes an Improved Sailfish Optimizer-based Feature Selection with Optimal Stacked Sparse Autoencoder (ISOFS-OSSAE) for data mining and pattern recognition in the educational sector. The proposed ISOFS-OSSAE model aims to mine the educational data and derive decisions based on the feature selection and classification process. Moreover, the ISOFS-OSSAE model involves the design of the ISOFS technique to choose an optimal subset of features. In addition, swallow swarm optimization (SSO) with the SSAE model is derived to perform the classification process. To showcase the enhanced outcomes of the ISOFS-OSSAE model, a wide range of experiments was conducted on a benchmark dataset from the University of California Irvine (UCI) Machine Learning Repository. The simulation results indicated improved classification performance of the ISOFS-OSSAE model over recent state-of-the-art approaches in terms of different performance measures.
In conjunction with association rules for data mining, the connections between testing indices and strong and weak association rules were determined, and new derivative rules were obtained by further reasoning. Association rules were used to analyze correlation and check consistency between indices. This study shows that the judgment obtained by weak association rules or non-association rules is more accurate and more credible than that obtained by strong association rules. When the testing grades of two indices in the weak association rules are inconsistent, the testing grades of indices are more likely to be erroneous, and the mistakes are often caused by human factors. Clustering data mining technology was used to analyze the reliability of a diagnosis, or to perform health diagnosis directly. Analysis showed that the clustering results are related to the indices selected, and that if the indices selected are more significant, the characteristics of clustering results are also more significant, and the analysis or diagnosis is more credible. The indices and diagnosis analysis function produced by this study provide a necessary theoretical foundation and new ideas for the development of hydraulic metal structure health diagnosis technology.
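The abstract does not state its thresholds for strong and weak rules, so the sketch below only shows how the two standard measures behind that distinction, support and confidence, are computed for a rule A → B. The records, index names, and grades are hypothetical.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(1 for r in records if items <= set(r)) / len(records)


def confidence(records, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A and B) / support(A)."""
    sup_a = support(records, antecedent)
    if sup_a == 0:
        raise ValueError("antecedent never occurs in the records")
    return support(records, set(antecedent) | set(consequent)) / sup_a


# Hypothetical inspection records: each holds the (index, grade) pairs observed.
records = [
    {("corrosion", "B"), ("deformation", "B")},
    {("corrosion", "B"), ("deformation", "B")},
    {("corrosion", "B"), ("deformation", "C")},
    {("corrosion", "A"), ("deformation", "A")},
]

rule_support = support(records, [("corrosion", "B"), ("deformation", "B")])
rule_confidence = confidence(records, [("corrosion", "B")], [("deformation", "B")])
```

A rule is conventionally called strong when both measures exceed user-chosen minimums; what counts as weak in the study is its own definition, which the abstract does not give.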
To improve the surface accuracy of the work-piece and obtain potentially valuable information, a dynamic milling force prediction model based on data mining was proposed. Dynamic milling force is currently obtained through finite element simulation or analytical calculation; however, a finite element model inevitably differs from the actual working conditions, and analytical calculation is cumbersome and complex. The model was established using a combination of regression analysis and a Radial Basis Function (RBF) neural network. Using data mining as a means, the internal relationship between milling force, cutting parameters, temperature, vibration, and surface quality is analyzed in depth, and the influence of dynamic milling force changes in different situations is extracted and summarized by cluster analysis and correlation analysis. The results show that the proposed dynamic milling force model has a good prediction effect, ensures production quality, reduces the occurrence of flutter, improves the surface accuracy of the work-piece, and provides a more accurate basis for the selection of process parameters.
Statistics is more crucial than ever due to the accessibility of huge amounts of data from several domains such as finance, medicine, science, engineering, and so on. Statistical data mining (SDM) is an interdisciplinary domain that examines huge existing databases to discover patterns and connections in the data. It differs from classical statistics in the size of the datasets and in the fact that the data may not have been gathered according to an experimental design but rather for other purposes. Thus, this paper introduces an effective Statistical Data Mining for Intelligent Rainfall Prediction using Slime Mould Optimization with Deep Learning (SDMIRP-SMODL) model. In the presented SDMIRP-SMODL model, the feature subset selection process is performed by the SMO algorithm, which in turn minimizes the computational complexity. For rainfall prediction, a convolutional neural network with long short-term memory (CNN-LSTM) technique is exploited. Finally, this study uses the pelican optimization algorithm (POA) as a hyperparameter optimizer. The SDMIRP-SMODL approach is evaluated experimentally on a rainfall dataset comprising 23682 samples in the negative class and 1865 samples in the positive class. The comparative outcomes showed the superiority of the SDMIRP-SMODL model over existing techniques.
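The class counts quoted above imply a heavily imbalanced dataset, which matters when reading any accuracy figure. The short calculation below uses only the two counts from the abstract; the majority-class baseline is a standard reference point added here for context, not a result reported by the study.

```python
negative = 23682  # samples in the negative class (from the abstract)
positive = 1865   # samples in the positive class (from the abstract)

total = negative + positive
positive_share = positive / total
# Accuracy of a trivial classifier that always predicts the majority class.
majority_baseline_accuracy = max(negative, positive) / total
imbalance_ratio = negative / positive

print(f"total samples: {total}")
print(f"positive share: {positive_share:.4f}")
print(f"majority-class baseline accuracy: {majority_baseline_accuracy:.4f}")
print(f"negative:positive ratio: {imbalance_ratio:.2f}")
```

Because a constant "negative" prediction already scores above 90% accuracy on these counts, class-sensitive measures are more informative than raw accuracy for a dataset of this shape.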
In this paper, we conduct research on structured data mining algorithms and their applications in the machine learning field. With the advancement of informatization and digitization in various fields, large amounts of multi-source, heterogeneous data are held in distributed storage; to share them, a series of mechanisms, methods, and implementation technologies, from storage management to interoperability, must be worked out. Unstructured data has no strict structure and is therefore more difficult to standardize and manage than structured information. Given these characteristics, large-capacity unstructured data is typically stored separately in files, with a pointer-like index kept in the database. Against this background, we propose a new idea for a structured data mining algorithm.
In the electron beam selective melting (EBSM) process, the quality of each deposited melt track has an effect on the properties of the manufactured component. However, the formation of the melt track is governed by various physical phenomena and influenced by various process parameters, and the correlation of these parameters is complicated and difficult to establish experimentally. The mesoscopic modeling technique was recently introduced as a means of simulating the electron beam (EB) melting process and revealing the formation mechanisms of specific melt track morphologies. However, the correlation between the process parameters and the melt track features has not yet been quantitatively understood. This paper investigates the morphological features of the melt track from the results of mesoscopic simulation, while introducing key descriptive indexes such as melt track width and height in order to numerically assess the deposition quality. The effects of various processing parameters are also quantitatively investigated, and the correlation between the processing conditions and the melt track features is thereby derived. Finally, a simulation-driven optimization framework consisting of mesoscopic modeling and data mining is proposed, and its potential and limitations are discussed.
Introduction: The present work compared the predictive power of different data mining techniques used to develop an HIV testing prediction model. Four popular data mining algorithms (decision tree, Naive Bayes, neural network, logistic regression) were used to build a model that predicts whether an individual had been tested for HIV among adults in Ethiopia, using EDHS 2011. The final experimental results indicated that the decision tree (random tree algorithm) performed best, with an accuracy of 96%; the decision tree induction method (J48) was second best, with a classification accuracy of 79%, followed by the neural network (78%). Logistic regression achieved the lowest classification accuracy, 74%. Objectives: The objective of this study is to compare the predictive power of the different data mining techniques used to develop the HIV testing prediction model. Methods: The Cross-Industry Standard Process for Data Mining (CRISP-DM) was followed to develop the prediction model for HIV testing and to explore association rules between HIV testing and the selected attributes. Data preprocessing was performed, and missing values for categorical variables were replaced by the modal value of the variable. Different data mining techniques were used to build the predictive model. Results: The target dataset contained 30,625 study participants, of whom 16,515 (54%) were women and the remaining 14,110 (46%) were men. The age of the participants ranged from 15 to 59 years, with a modal age group of 15-19 years. Among the study participants, 17,719 (58%) had never been tested for HIV, while the remaining 12,906 (42%) had been tested. Residence, educational level, wealth index, HIV-related stigma, knowledge related to HIV, region, age group, risky sexual behaviour attributes, knowledge about where to test for HIV, and knowledge of family planning through mass media were found to be predictors of HIV testing. Conclusion and Recommendation: The results of this research reveal that data mining is crucial in extracting relevant information for the effective utilization of HIV testing services, which has clinical, community, and public health importance at all levels. It is vital to apply different data mining techniques to the same settings and compare the model performances (based on accuracy, sensitivity, and specificity) with each other. Furthermore, this study invites interested researchers to explore further the application of data mining techniques in the healthcare industry or in related and similar settings in the future.
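The comparison criteria named in the conclusion (accuracy, sensitivity, and specificity) all follow from a binary confusion matrix. The sketch below shows the formulas; the function name is an assumption and the counts are made-up numbers for illustration, not figures from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (recall on positives), specificity (recall on negatives)."""
    total = tp + fp + tn + fn
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one actual positive and one actual negative")
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical confusion-matrix counts for a 'tested for HIV' classifier.
metrics = binary_metrics(tp=80, fp=10, tn=90, fn=20)
```

Reporting all three values side by side makes it visible when a model reaches high accuracy mainly by favouring one class.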
The environmental, social, and governance (ESG) report is globally recognized as a keystone of sustainable enterprise development. However, the current literature has not summarized the development of topics and trends in ESG contexts in the twenty-first century. Therefore, we selected 1114 ESG reports from global firms in the technology industry to analyze the evolutionary trends of ESG topics by text mining. We discovered a homogenization effect toward low environmental, medium governance, and high social features in the evolution. We also designed a strategic framework to look more closely at the dynamic changes in firms' within-industry representativeness and cross-sector distinctiveness, which demonstrates corporate social responsibility and sustainability. We found that companies are gradually converging toward the third quadrant, which indicates that firms are contributing less to industry prominence and professional distinctiveness in ESG reporting. Firms choose to imitate each other's ESG reports to mitigate uncertainty and enhance behavioral legitimacy.
Agricultural Extension (AE) research faces significant challenges in producing relevant and practical knowledge due to rapid advancements in artificial intelligence (AI). AE struggles to keep pace with these advancements, complicating the development of actionable information. One major challenge is the absence of intelligent platforms that enable efficient information retrieval and quick decision-making. Investigations have shown a shortage of AI-assisted solutions that effectively use AE materials across various media formats while preserving scientific accuracy and contextual relevance. Although mainstream AI systems can potentially reduce decision-making risks, their usage remains limited. This limitation arises primarily from the lack of standardized datasets and concerns regarding user data privacy. For AE datasets to be standardized, they must satisfy four key criteria: inclusion of critical domain-specific knowledge, expert curation, consistent structure, and acceptance by peers. Addressing data privacy issues involves adhering to open-access principles and enforcing strict data encryption and anonymization standards. To address these gaps, a conceptual framework is introduced. This framework extends beyond typical user-oriented platforms and comprises five core modules. It features a neurosymbolic pipeline integrating large language models with physically based agricultural modeling software, further enhanced by Reinforcement Learning from Human Feedback. Notable aspects of the framework include a dedicated human-in-the-loop process and a governance structure consisting of three primary bodies focused on data standardization, ethics and security, and accountability and transparency. Overall, this work represents a significant advancement in agricultural knowledge systems, potentially transforming how AE services deliver critical information to farmers and other stakeholders.
Funding (TCM data mining model selection): Supported by Research on Pattern Differentiation of AIDS Based on Graph Theory of the National Natural Science Foundation of China (No. 81202858); Research on Intervention Evaluation of TCM Health Differentiation of the National Key Technology Support Program (No. 2012BAI25B02); Research and Development in Digital Information System of Traditional Chinese Medicine of the National 863 Program of China (No. 2012AA02A609); Acupuncture Efficacy of Gastrointestinal Dysfunction (No. ZZ05003) and Acupuncture-point Specialty Analysis Based on Image Processing Technology (No. ZZ03090), self-selected subjects of the China Academy of Chinese Medical Sciences; Semantic Recognition of Tongue and Pulse Based on Image Content of the Beijing Key Laboratory of Advanced Information Science and Network Technology (No. XDXX1306).
Funding (iron ore pellet cooling in a circular cooler): Item sponsored by the National Natural Science Foundation of China (51174253).
Funding (skin-pass rolling with FEM and data mining): Item sponsored by the Spanish Ministry of Education and Science (DPI2007-61090) and the European Commission Research Programme of the Research Fund for Coal and Steel (RFS-PR-06035).
Funding (educational data mining in student management): Sponsored by the Ability Enhancement Project of Teaching Staff in Harbin Institute of Technology (Grant No. 06).
Funding (high-temperature dielectrics of SiO2/SiO2 composites): the National Defense 973 (Grant No. 513180303), National Defense Basic Scientific Research (Grant No. A2220061080), and the National Defense Foundation (Grant No. 5142040205BQ0154).
Funding (multi-mode radar emitter classification): Supported by the National Natural Science Foundation of China (61371172), the International S&T Cooperation Program of China (2015DFR10220), the Ocean Engineering Project of National Key Laboratory Foundation (1213), and the Fundamental Research Funds for the Central Universities (HEUCF1608).
文摘For the multi-mode radar working in the modern electronicbattlefield, different working states of one single radar areprone to being classified as multiple emitters when adoptingtraditional classification methods to process intercepted signals,which has a negative effect on signal classification. A classificationmethod based on spatial data mining is presented to address theabove challenge. Inspired by the idea of spatial data mining, theclassification method applies nuclear field to depicting the distributioninformation of pulse samples in feature space, and digs out thehidden cluster information by analyzing distribution characteristics.In addition, a membership-degree criterion to quantify the correlationamong all classes is established, which ensures classificationaccuracy of signal samples. Numerical experiments show that thepresented method can effectively prevent different working statesof multi-mode emitter from being classified as several emitters,and achieve higher classification accuracy.
Abstract: Data mining in the educational field can be used to optimize teaching and learning performance among students. Recently developed machine learning (ML) and deep learning (DL) approaches can be used to mine the data effectively. This study proposes an Improved Sailfish Optimizer-based Feature Selection with Optimal Stacked Sparse Autoencoder (ISOFS-OSSAE) for data mining and pattern recognition in the educational sector. The proposed ISOFS-OSSAE model aims to mine educational data and derive decisions through feature selection and classification. The model uses the ISOFS technique to choose an optimal subset of features, and swallow swarm optimization (SSO) combined with the SSAE model to perform classification. To showcase the enhanced outcomes of the ISOFS-OSSAE model, a wide range of experiments was carried out on a benchmark dataset from the University of California Irvine (UCI) Machine Learning Repository. The simulation results indicated improved classification performance of the ISOFS-OSSAE model over recent state-of-the-art approaches in terms of different performance measures.
Funding: Supported by the Key Program of the National Natural Science Foundation of China (Grant No. 50539010) and the Special Fund for Public Welfare Industry of the Ministry of Water Resources of China (Grant No. 200801019).
Abstract: In conjunction with association rules for data mining, the connections between testing indices and strong and weak association rules were determined, and new derivative rules were obtained by further reasoning. Association rules were used to analyze correlation and check consistency between indices. This study shows that judgments obtained from weak association rules or non-association rules are more accurate and more credible than those obtained from strong association rules. When the testing grades of two indices in a weak association rule are inconsistent, the testing grades are more likely to be erroneous, and the mistakes are often caused by human factors. Clustering was used to analyze the reliability of a diagnosis or to perform health diagnosis directly. Analysis showed that the clustering results depend on the indices selected: the more significant the selected indices, the more distinct the clustering results and the more credible the analysis or diagnosis. The indices and diagnosis analysis function produced by this study provide a necessary theoretical foundation and new ideas for the development of hydraulic metal structure health diagnosis technology.
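Strong and weak association rules are conventionally separated by their support and confidence. The sketch below shows how those two measures could be computed for a rule linking the testing grades of two indices. The index names, the grades, and the thresholds are hypothetical assumptions chosen for illustration; the abstract does not state which indices or thresholds the study used.

```python
def rule_metrics(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    Each record maps an index name to its testing grade; antecedent and
    consequent are (index_name, grade) pairs.
    """
    a_name, a_grade = antecedent
    c_name, c_grade = consequent
    n_antecedent = sum(1 for r in records if r.get(a_name) == a_grade)
    n_both = sum(
        1 for r in records
        if r.get(a_name) == a_grade and r.get(c_name) == c_grade
    )
    support = n_both / len(records) if records else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


def is_strong(support, confidence, min_support=0.3, min_confidence=0.6):
    """A rule is 'strong' when it meets both thresholds (assumed values)."""
    return support >= min_support and confidence >= min_confidence


# Hypothetical inspection records for two testing indices.
records = [
    {"corrosion": "A", "deformation": "A"},
    {"corrosion": "A", "deformation": "A"},
    {"corrosion": "A", "deformation": "B"},
    {"corrosion": "B", "deformation": "B"},
    {"corrosion": "B", "deformation": "A"},
]

support_a, confidence_a = rule_metrics(records, ("corrosion", "A"), ("deformation", "A"))
support_b, confidence_b = rule_metrics(records, ("corrosion", "B"), ("deformation", "B"))
print(is_strong(support_a, confidence_a), is_strong(support_b, confidence_b))
```

In this toy data the first rule clears both assumed thresholds and the second does not, so the second would be treated as a weak rule.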
Funding: Supported by the Gansu Science and Technology Program (21YF5GA080).
Abstract: To improve the surface accuracy of the workpiece and obtain potentially valuable information, a dynamic milling force prediction model based on data mining is proposed. Dynamic milling force is currently obtained through finite element simulation or analytical calculation; however, a finite element model inevitably differs from actual working conditions, and analytical calculation is cumbersome and complex. The model was established using a combination of regression analysis and a Radial Basis Function (RBF) neural network. With data mining as the means, the internal relationships among milling force, cutting parameters, temperature, vibration and surface quality are analyzed in depth, and the influence of dynamic milling force changes in different situations is extracted and summarized through cluster analysis and correlation analysis. The results show that the proposed dynamic milling force model predicts well, ensures production quality, reduces the occurrence of flutter, improves the surface accuracy of the workpiece, and provides a more accurate basis for selecting process parameters.
Funding: This research was partly supported by the Technology Development Program of MSS [No. S3033853] and by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2021R1A4A1031509).
Abstract: Statistics is more crucial than ever owing to the availability of huge amounts of data from domains such as finance, medicine, science and engineering. Statistical data mining (SDM) is an interdisciplinary domain that examines large existing databases to discover patterns and connections in the data. It differs from classical statistics in the size of the datasets and in that the data were typically not gathered according to an experimental design but for other purposes. This paper introduces an effective Statistical Data Mining for Intelligent Rainfall Prediction using Slime Mould Optimization with Deep Learning (SDMIRP-SMODL) model. In the SDMIRP-SMODL model, feature subset selection is performed by the SMO algorithm, which in turn reduces computational complexity. For rainfall prediction, a convolutional neural network with long short-term memory (CNN-LSTM) is used. Finally, the pelican optimization algorithm (POA) serves as the hyperparameter optimizer. The SDMIRP-SMODL approach was evaluated on a rainfall dataset comprising 23682 samples in the negative class and 1865 samples in the positive class. The comparative results showed that the SDMIRP-SMODL model outperformed existing techniques.
Abstract: In this paper, we study structured data mining algorithms and their applications in the machine learning field. With the advance of informatization and digitization, many fields now hold large amounts of multi-source, heterogeneous data in distributed storage; sharing these data requires solving a series of mechanisms, methods and implementation technologies, from storage management to interoperability. Unstructured data has no strict structure and is therefore harder to standardize and manage than structured information. Because of these characteristics, large volumes of unstructured data are often stored separately as files, with pointer-like indexes kept in the database. Against this background, we propose a new idea for a structured data mining algorithm.
Abstract: In the electron beam selective melting (EBSM) process, the quality of each deposited melt track affects the properties of the manufactured component. However, the formation of the melt track is governed by various physical phenomena and influenced by various process parameters, and the correlation of these parameters is complicated and difficult to establish experimentally. The mesoscopic modeling technique was recently introduced as a means of simulating the electron beam (EB) melting process and revealing the formation mechanisms of specific melt track morphologies. However, the correlation between the process parameters and the melt track features has not yet been quantitatively understood. This paper investigates the morphological features of the melt track from the results of mesoscopic simulation, introducing key descriptive indexes such as melt track width and height in order to numerically assess the deposition quality. The effects of various processing parameters are also quantitatively investigated, and the correlation between the processing conditions and the melt track features is thereby derived. Finally, a simulation-driven optimization framework consisting of mesoscopic modeling and data mining is proposed, and its potential and limitations are discussed.
Abstract: Introduction: The present work compared the predictive power of different data mining techniques used to develop an HIV testing prediction model. Four popular data mining algorithms (decision tree, Naive Bayes, neural network, logistic regression) were used to build a model that predicts whether an individual had been tested for HIV among adults in Ethiopia, using EDHS 2011. The final experimental results indicated that the decision tree (random tree algorithm) performed best with an accuracy of 96%; the decision tree induction method (J48) was second best with a classification accuracy of 79%, followed by the neural network (78%). Logistic regression achieved the lowest classification accuracy, 74%. Objectives: The objective of this study is to compare the predictive power of different data mining techniques used to develop the HIV testing prediction model. Methods: The Cross-Industry Standard Process for Data Mining (CRISP-DM) was used to build the HIV testing prediction model and to explore association rules between HIV testing and the selected attributes. Data preprocessing was performed, and missing values of categorical variables were replaced by the modal value of the variable. Different data mining techniques were used to build the predictive model. Results: The target dataset contained 30,625 study participants, of whom 16,515 (54%) were women and 14,110 (46%) were men. Participants' ages ranged from 15 to 59 years, with a modal age group of 15-19 years. Among the study participants, 17,719 (58%) had never been tested for HIV while the remaining 12,906 (42%) had been tested. Residence, educational level, wealth index, HIV-related stigma, knowledge related to HIV, region, age group, risky sexual behaviour attributes, knowledge about where to test for HIV and knowledge of family planning through mass media were found to be predictors of HIV testing.
Conclusion and Recommendation: The results of this research show that data mining is crucial for extracting relevant information for the effective utilization of HIV testing services, which has clinical, community and public health importance at all levels. It is vital to apply different data mining techniques to the same setting and compare the models' performance (based on accuracy, sensitivity, and specificity) with each other. Furthermore, this study invites interested researchers to explore further the application of data mining techniques in the healthcare industry or in related settings.
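The three comparison measures named in the conclusion (accuracy, sensitivity, specificity) all follow from a binary confusion matrix. The sketch below shows one way they could be computed; the labels are a hypothetical toy example, not the EDHS 2011 data, and the function names are illustrative only.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity (recall on positives) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical labels: 1 = tested for HIV, 0 = never tested.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

Reporting sensitivity and specificity alongside accuracy makes it visible when a model favours one class, which accuracy alone can hide.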
Funding: Supported by the Major Program of the National Natural Science Foundation of China [grant number 72394375].
Abstract: The environmental, social, and governance (ESG) report is globally recognized as a keystone of sustainable enterprise development. However, the current literature has not yet summarized how ESG topics and trends have developed in the twenty-first century. We therefore selected 1114 ESG reports from global firms in the technology industry and used text mining to analyze the evolutionary trends of ESG topics. We discovered a homogenization effect toward low environmental, medium governance, and high social features in the evolution. We also designed a strategic framework to look more closely at the dynamic changes in firms' within-industry representativeness and cross-sector distinctiveness, which demonstrate corporate social responsibility and sustainability. We found that companies are gradually converging toward the third quadrant, which indicates that firms show less industry prominence and professional distinctiveness in their ESG reporting. Firms choose to imitate one another's ESG reports to mitigate uncertainty and enhance behavioral legitimacy.
Funding: Supported by the USDA National Institute of Food and Agriculture, Hatch Project 1019654.
Abstract: Agricultural Extension (AE) research faces significant challenges in producing relevant and practical knowledge due to rapid advancements in artificial intelligence (AI). AE struggles to keep pace with these advancements, complicating the development of actionable information. One major challenge is the absence of intelligent platforms that enable efficient information retrieval and quick decision-making. Investigations have shown a shortage of AI-assisted solutions that effectively use AE materials across various media formats while preserving scientific accuracy and contextual relevance. Although mainstream AI systems can potentially reduce decision-making risks, their usage remains limited. This limitation arises primarily from the lack of standardized datasets and concerns regarding user data privacy. For AE datasets to be standardized, they must satisfy four key criteria: inclusion of critical domain-specific knowledge, expert curation, consistent structure, and acceptance by peers. Addressing data privacy issues involves adhering to open-access principles and enforcing strict data encryption and anonymization standards. To address these gaps, a conceptual framework is introduced. This framework extends beyond typical user-oriented platforms and comprises five core modules. It features a neurosymbolic pipeline integrating large language models with physically based agricultural modeling software, further enhanced by Reinforcement Learning from Human Feedback. Notable aspects of the framework include a dedicated human-in-the-loop process and a governance structure consisting of three primary bodies focused on data standardization, ethics and security, and accountability and transparency. Overall, this work represents a significant advancement in agricultural knowledge systems, potentially transforming how AE services deliver critical information to farmers and other stakeholders.