Cardiovascular diseases are the most common cause of death worldwide over the last few decades in the developed as well as underdeveloped and developing countries. Early detection of cardiac diseases and continuous su...Cardiovascular diseases are the most common cause of death worldwide over the last few decades in the developed as well as underdeveloped and developing countries. Early detection of cardiac diseases and continuous supervision of clinicians can reduce the mortality rate. However, accurate detection of heart diseases in all cases and consultation of a patient for 24 hours by a doctor is not available since it requires more sapience, time and expertise. In this?study, a tentative design of a cloud-based heart disease prediction system had been proposed to detect impending heart disease using Machine learning techniques. For the accurate detection of the heart disease, an efficient machine learning technique should be used which had been derived from a distinctive analysis among several machine learning algorithms in a Java Based Open Access Data Mining Platform, WEKA. The proposed algorithm was validated using two widely used open-access database, where 10-fold cross-validation is applied in order to analyze the performance of heart disease detection. An accuracy level of 97.53% accuracy was found from the SVM algorithm along with sensitivity and specificity of 97.50% and 94.94%respectively. Moreover, to monitor the heart disease patient round-the-clock by his/her caretaker/doctor, a real-time patient monitoring system was developed and presented using Arduino, capable of sensing some real-time parameters such as body temperature, blood pressure, humidity, heartbeat. The developed system can transmit the recorded data to a central server which are updated every 10 seconds. As a result, the doctors can visualize the patient’s real-time sensor data by using the application and start live video streaming if instant medication is required. Another important feature of the proposed system was that as soon as any real-time parameter of the patient exceeds the threshold, the prescribed doctor is notified at once through GSM technology.展开更多
Student dropout in primary education is a critical global challenge with significant long-term societal and individual consequences.Early identification of at-risk students is a crucial first step towards implementing...Student dropout in primary education is a critical global challenge with significant long-term societal and individual consequences.Early identification of at-risk students is a crucial first step towards implementing effective intervention strategies.This paper presents a machine learning framework for predicting student dropout risk by leveraging historical academic,attendance,and demographic data extracted from a primary school system.We formulate the problem as a binary classification task and evaluate multiple algorithms,including Logistic Regression,Random Forest,and Gradient Boosting,to identify the most effective predictor.To address the inherent class imbalance,we employ Synthetic Minority Over-sampling Technique(SMOTE).Our results,validated via stratified 5-fold cross-validation,indicate that the Random Forest model achieved the highest performance,with a recall of 0.91±0.03,ensuring that 91%of truly at-risk students were correctly identified.Furthermore,we use SHAP(SHapley Additive exPlanations)values to provide interpretable insights into the model’s predictions,revealing that attendance rate,academic performance trends,and socio-economic proxies are the most salient features.This work demonstrates the potential of machine learning as a powerful decision-support tool for educators,enabling timely and data-driven interventions to improve student retention and completion rates.展开更多
In this paper,models to predict hot spot temperature and to estimate cooling air’s working parameters of racks in data centers were established using machine learning algorithms based on simulation data.First,simulat...In this paper,models to predict hot spot temperature and to estimate cooling air’s working parameters of racks in data centers were established using machine learning algorithms based on simulation data.First,simulation models of typical racks were established in computational fluid dynamics(CFD).The model was validated with field test results and results in literature,error of which was less than 3%.Then,the CFD model was used to simulate thermal environments of a typical rack considering different factors,such as servers’power,which is from 3.3 kW to 20.1 kW,cooling air’s inlet velocity,which is from 1.0 m/s to 3.0 m/s,and cooling air’s inlet temperature,which is from 16℃ to 26℃ The highest temperature in the rack,also called hot spot temperature,was selected for each case.Next,a prediction model of hot spot temperature was built using machine learning algorithms,with servers’power,cooling air’s inlet velocity and cooling air’s inlet temperature as inputs,and the hot spot temperatures as outputs.Finally,based on the prediction model,an operating parameters estimation model was established to recommend cooling air’s inlet temperatures and velocities,which can not only keep the hot spot temperature at the safety value,but are also energy saving.展开更多
Datamining plays a crucial role in extractingmeaningful knowledge fromlarge-scale data repositories,such as data warehouses and databases.Association rule mining,a fundamental process in data mining,involves discoveri...Datamining plays a crucial role in extractingmeaningful knowledge fromlarge-scale data repositories,such as data warehouses and databases.Association rule mining,a fundamental process in data mining,involves discovering correlations,patterns,and causal structures within datasets.In the healthcare domain,association rules offer valuable opportunities for building knowledge bases,enabling intelligent diagnoses,and extracting invaluable information rapidly.This paper presents a novel approach called the Machine Learning based Association Rule Mining and Classification for Healthcare Data Management System(MLARMC-HDMS).The MLARMC-HDMS technique integrates classification and association rule mining(ARM)processes.Initially,the chimp optimization algorithm-based feature selection(COAFS)technique is employed within MLARMC-HDMS to select relevant attributes.Inspired by the foraging behavior of chimpanzees,the COA algorithm mimics their search strategy for food.Subsequently,the classification process utilizes stochastic gradient descent with a multilayer perceptron(SGD-MLP)model,while the Apriori algorithm determines attribute relationships.We propose a COA-based feature selection approach for medical data classification using machine learning techniques.This approach involves selecting pertinent features from medical datasets through COA and training machine learning models using the reduced feature set.We evaluate the performance of our approach on various medical datasets employing diverse machine learning classifiers.Experimental results demonstrate that our proposed approach surpasses alternative feature selection methods,achieving higher accuracy and precision rates in medical data classification tasks.The study showcases the effectiveness and efficiency of the COA-based feature selection approach in identifying relevant features,thereby enhancing the diagnosis and treatment of various diseases.To provide further validation,we conduct detailed experiments on a benchmark medical dataset,revealing the superiority of the MLARMCHDMS model over other methods,with a maximum accuracy of 99.75%.Therefore,this research contributes to the advancement of feature selection techniques in medical data classification and highlights the potential for improving healthcare outcomes through accurate and efficient data analysis.The presented MLARMC-HDMS framework and COA-based feature selection approach offer valuable insights for researchers and practitioners working in the field of healthcare data mining and machine learning.展开更多
Education is the base of the survival and growth of any state,but due to resource scarcity,students,particularly at the university level,are forced into a difficult situation.Scholarships are the most significant fina...Education is the base of the survival and growth of any state,but due to resource scarcity,students,particularly at the university level,are forced into a difficult situation.Scholarships are the most significant financial aid mechanisms developed to overcome such obstacles and assist the students in continuing with their higher studies.In this study,the convoluted situation of scholarship eligibility criteria,including parental income,responsibilities,and academic achievements,is addressed.In an attempt to maximize the scholarship selection process,numerous machine learning algorithms,including Support Vector Machines,Neural Networks,K-Nearest Neighbors,and the C4.5 algorithm,were applied.The C4.5 algorithm,owing to its efficiency in the prediction of scholarship beneficiaries based on extraneous factors,was capable of predicting a phenomenal 95.62%of predictions using extensive data of a well-esteemed government sector university from Pakistan.This percentage is 4%and 15%better than the remainder of the methods tested,and it depicts the extent of the potential for the technique to enhance the scholarship selection process.The Decision Support Systems(DSS)would not only save the administrative cost but would also create a fair and transparent process in place.In a world where accessibility to education is the key,this research provides data-oriented consolidation to ensure that deserving students are helped and allowed to get the financial assistance that they need to reach higher studies and bridge the gap between the demands of the day and the institutions of intellect.展开更多
Chronic diseases such as heart disease,cancer,and diabetes are leading drivers of mortality worldwide,underscoring the need for improved efforts around early detection and prediction.The pathophysiology and management...Chronic diseases such as heart disease,cancer,and diabetes are leading drivers of mortality worldwide,underscoring the need for improved efforts around early detection and prediction.The pathophysiology and management of chronic diseases have benefitted from emerging fields in molecular biology like genomics,transcriptomics,proteomics,glycomics,and lipidomics.The complex biomarker and mechanistic data from these"omics"studies present analytical and interpretive challenges,especially for traditional statistical methods.Machine learning(ML)techniques offer considerable promise in unlocking new pathways for data-driven chronic disease risk assessment and prognosis.This review provides a comprehensive overview of state-of-the-art applications of ML algorithms for chronic disease detection and prediction across datasets,including medical imaging,genomics,wearables,and electronic health records.Specifically,we review and synthesize key studies leveraging major ML approaches ranging from traditional techniques such as logistic regression and random forests to modern deep learning neural network architectures.We consolidate existing literature to date around ML for chronic disease prediction to synthesize major trends and trajectories that may inform both future research and clinical translation efforts in this growing field.While highlighting the critical innovations and successes emerging in this space,we identify the key challenges and limitations that remain to be addressed.Finally,we discuss pathways forward toward scalable,equitable,and clinically implementable ML solutions for transforming chronic disease screening and prevention.展开更多
Objective: To examine the association of body shape with cold and heat patterns, to determine which anthropometric measure is the best indicator for discriminating between the two patterns, and to investigate whether...Objective: To examine the association of body shape with cold and heat patterns, to determine which anthropometric measure is the best indicator for discriminating between the two patterns, and to investigate whether using a combination of measures can improve the predictive power to diagnose these patterns. Methods: Based on a total of 4,859 subjects (3,000 women and 1,859 men), statistical analyses using binary logistic regression were performed to assess the significance of the difference and the predictive power of each anthropometric measure, and binary logistic regression and Naive Bayes with the variable selection technique were used to assess the improvement in the predictive power of the patterns using the combined measures. Results: In women, the strongest indicators for determining the cold and heat patterns among anthropometric measures were body mass index (BMI) and rib circumference; in men, the best indicator was BMI. In experiments using a combination of measures, the values of the area under the receiver operating characteristic curve in women were 0.776 by Naive Bayes and 0.772 by logistic regression, and the values in men were 0.788 by Naive Bayes and 0.779 by logistic regression. Conclusions: Individuals with a higher BMI have a tendency toward a heat pattern in both women and men. The use of a combination of anthropometric measures can slightly improve the diagnostic accuracy. Our findings can provide fundamental information for the diagnosis of cold and heat patterns based on body shape for personalized medicine.展开更多
As internet technology use is on the rise globally,phishing constitutes a considerable share of the threats that may attack individuals and organizations,leading to significant losses from personal and confidential in...As internet technology use is on the rise globally,phishing constitutes a considerable share of the threats that may attack individuals and organizations,leading to significant losses from personal and confidential information to substantial financial losses.Thus,much research has been dedicated in recent years to developing effective and robust mechanisms to enhance the ability to trace illegitimate web pages and to distinguish them from non-phishing sites as accurately as possible.Aiming to conclude whether a universally accepted model can detect phishing attempts with 100%accuracy,we conduct a systematic review of research carried out in 2018-2021 published in well-known journals published by Elsevier,IEEE,Springer,and Emerald.Those researchers studied different Data Mining(DM)algorithms,some of which created a whole new model,while others compared the performance of several algorithms.Some studies combined two or more algorithms to enhance the detection performance.Results reveal that while most algorithms achieve accuracies higher than 90%,only some specific models can achieve 100%accurate results.展开更多
In this study, the author will investigate and utilize advanced machine learning models related to two different methodologies to determine the best and most effective way to predict individuals with heart failure and...In this study, the author will investigate and utilize advanced machine learning models related to two different methodologies to determine the best and most effective way to predict individuals with heart failure and cardiovascular diseases. The first methodology involves a list of classification machine learning algorithms, and the second methodology involves the use of a deep learning algorithm known as MLP or Multilayer Perceptrons. Globally, hospitals are dealing with cases related to cardiovascular diseases and heart failure as they are major causes of death, not only for overweight individuals but also for those who do not adopt a healthy diet and lifestyle. Often, heart failures and cardiovascular diseases can be caused by many factors, including cardiomyopathy, high blood pressure, coronary heart disease, and heart inflammation [1]. Other factors, such as irregular shocks or stress, can also contribute to heart failure or a heart attack. While these events cannot be predicted, continuous data from patients’ health can help doctors predict heart failure. Therefore, this data-driven research utilizes advanced machine learning and deep learning techniques to better analyze and manipulate the data, providing doctors with informative decision-making tools regarding a person’s likelihood of experiencing heart failure. In this paper, the author employed advanced data preprocessing and cleaning techniques. Additionally, the dataset underwent testing using two different methodologies to determine the most effective machine-learning technique for producing optimal predictions. The first methodology involved employing a list of supervised classification machine learning algorithms, including Naïve Bayes (NB), KNN, logistic regression, and the SVM algorithm. The second methodology utilized a deep learning (DL) algorithm known as Multilayer Perceptrons (MLPs). This algorithm provided the author with the flexibility to experiment with different layer sizes and activation functions, such as ReLU, logistic (sigmoid), and Tanh. Both methodologies produced optimal models with high-level accuracy rates. The first methodology involves a list of supervised machine learning algorithms, including KNN, SVM, Adaboost, Logistic Regression, Naive Bayes, and Decision Tree algorithms. They achieved accuracy rates of 86%, 89%, 89%, 81%, 79%, and 99%, respectively. The author clearly explained that Decision Tree algorithm is not suitable for the dataset at hand due to overfitting issues. Therefore, it was discarded as an optimal model to be used. However, the latter methodology (Neural Network) demonstrated the most stable and optimal accuracy, achieving over 87% accuracy while adapting well to real-life situations and requiring low computing power overall. A performance assessment and evaluation were carried out based on a confusion matrix report to demonstrate feasibility and performance. The author concluded that the performance of the model in real-life situations can advance not only the medical field of science but also mathematical concepts. Additionally, the advanced preprocessing approach behind the model can provide value to the Data Science community. The model can be further developed by employing various optimization techniques to handle even larger datasets related to heart failures. Furthermore, different neural network algorithms can be tested to explore alternative approaches and yield different results.展开更多
The composition of base oils affects the performance of lubricants made from them.This paper proposes a hybrid model based on gradient-boosted decision tree(GBDT)to analyze the effect of different ratios of KN4010,PAO...The composition of base oils affects the performance of lubricants made from them.This paper proposes a hybrid model based on gradient-boosted decision tree(GBDT)to analyze the effect of different ratios of KN4010,PAO40,and PriEco3000 component in a composite base oil system on the performance of lubricants.The study was conducted under small laboratory sample conditions,and a data expansion method using the Gaussian Copula function was proposed to improve the prediction ability of the hybrid model.The study also compared four optimization algorithms,sticky mushroom algorithm(SMA),genetic algorithm(GA),whale optimization algorithm(WOA),and seagull optimization algorithm(SOA),to predict the kinematic viscosity at 40℃,kinematic viscosity at 100℃,viscosity index,and oxidation induction time performance of the lubricant.The results showed that the Gaussian Copula function data expansion method improved the prediction ability of the hybrid model in the case of small samples.The SOA-GBDT hybrid model had the fastest convergence speed for the samples and the best prediction effect,with determination coefficients(R^(2))for the four indicators of lubricants reaching 0.98,0.99,0.96 and 0.96,respectively.Thus,this model can significantly reduce the model’s prediction error and has good prediction ability.展开更多
Influenza is a kind of infectious disease, which spreads quickly and widely. The outbreak of influenza has brought huge losses to society. In this paper, four major categories of flu keywords, “prevention phase”, “...Influenza is a kind of infectious disease, which spreads quickly and widely. The outbreak of influenza has brought huge losses to society. In this paper, four major categories of flu keywords, “prevention phase”, “symptom phase”, “treatment phase”, and “commonly-used phrase” were set. Python web crawler was used to obtain relevant influenza data from the National Influenza Center’s influenza surveillance weekly report and Baidu Index. The establishment of support vector regression (SVR), least absolute shrinkage and selection operator (LASSO), convolutional neural networks (CNN) prediction models through machine learning, took into account the seasonal characteristics of the influenza, also established the time series model (ARMA). The results show that, it is feasible to predict influenza based on web search data. Machine learning shows a certain forecast effect in the prediction of influenza based on web search data. In the future, it will have certain reference value in influenza prediction. The ARMA(3,0) model predicts better results and has greater generalization. Finally, the lack of research in this paper and future research directions are given.展开更多
Data mining and analytics involve inspecting and modeling large pre-existing datasets to discover decision-making information.Precision agriculture uses datamining to advance agricultural developments.Many farmers are...Data mining and analytics involve inspecting and modeling large pre-existing datasets to discover decision-making information.Precision agriculture uses datamining to advance agricultural developments.Many farmers aren’t getting the most out of their land because they don’t use precision agriculture.They harvest crops without a well-planned recommendation system.Future crop production is calculated by combining environmental conditions and management behavior,yielding numerical and categorical data.Most existing research still needs to address data preprocessing and crop categorization/classification.Furthermore,statistical analysis receives less attention,despite producing more accurate and valid results.The study was conducted on a dataset about Karnataka state,India,with crops of eight parameters taken into account,namely the minimum amount of fertilizers required,such as nitrogen,phosphorus,potassium,and pH values.The research considers rainfall,season,soil type,and temperature parameters to provide precise cultivation recommendations for high productivity.The presented algorithm converts discrete numerals to factors first,then reduces levels.Second,the algorithm generates six datasets,two fromCase-1(dataset withmany numeric variables),two from Case-2(dataset with many categorical variables),and one from Case-3(dataset with reduced factor variables).Finally,the algorithm outputs a class membership allocation based on an extended version of the K-means partitioning method with lambda estimation.The presented work produces mixed-type datasets with precisely categorized crops by organizing data based on environmental conditions,soil nutrients,and geo-location.Finally,the prepared dataset solves the classification problem,leading to a model evaluation that selects the best dataset for precise crop prediction.展开更多
This paper presents an efficient prediction model for a good learning environment using Random Forest(RF)classifier.It consists of a series of modules;data preprocessing,data normalization,data split andfinally classi...This paper presents an efficient prediction model for a good learning environment using Random Forest(RF)classifier.It consists of a series of modules;data preprocessing,data normalization,data split andfinally classification or prediction by the RF classifier.The preprocessed data is normalized using minmax normalization often used before modelfitting.As the input data or variables are measured at different scales,it is necessary to normalize them to contribute equally to the modelfitting.Then,the RF classifier is employed for course selection which is an ensemble learning method and k-fold cross-validation(k=10)is used to validate the model.The proposed Prediction Model for Course Selection(PMCS)system is considered a multi-class problem that predicts the course for a particular learner with three complexity levels,namely low,medium and high.It is operated under two modes;locally and globally.The former considers the gender of the learner and the later does not consider the gender of the learner.The database comprises the learner opinions from 75 males and 75 females per category(low,medium and high).Thus the system uses a total of 450 samples to evaluate the performance of the PMCS system.Results show that the system’s performance,while using locally i.e.,gender-wise has slightly higher performance than the global system.The RF classifier with 75 decision trees in the global system provides an average accuracy of 97.6%,whereas in the local system it is 97%(male)and 97.6%(female).The overall performance of the RF classifier with 75 trees is better than 25,50 and 100 decision trees in both local and global systems.展开更多
In this paper, it described the architecture of a tool called DiagData. This tool aims to use a large amount of data and information in the field of plant disease diagnostic to generate a disease predictive system. In...In this paper, it described the architecture of a tool called DiagData. This tool aims to use a large amount of data and information in the field of plant disease diagnostic to generate a disease predictive system. In this approach, techniques of data mining are used to extract knowledge from existing data. The data is extracted in the form of rules that are used in the development of a predictive intelligent system. Currently, the specification of these rules is built by an expert or data mining. When data mining on a large database is used, the number of generated rules is very complex too. The main goal of this work is minimize the rule generation time. The proposed tool, called DiagData, extracts knowledge automatically or semi-automatically from a database and uses it to build an intelligent system for disease prediction. In this work, the decision tree learning algorithm was used to generate the rules. A toolbox called Fuzzygen was used to generate a prediction system from rules generated by decision tree algorithm. The language used to implement this software was Java. The DiagData has been used in diseases prediction and diagnosis systems and in the validation of economic and environmental indicators in agricultural production systems. The validation process involved measurements and comparisons of the time spent to enter the rules by an expert with the time used to insert the same rules with the proposed tool. Thus, the tool was successfully validated, providing a reduction of time.展开更多
Particle Swarm Optimization(PSO)has been utilized as a useful tool for solving intricate optimization problems for various applications in different fields.This paper attempts to carry out an update on PSO and gives a...Particle Swarm Optimization(PSO)has been utilized as a useful tool for solving intricate optimization problems for various applications in different fields.This paper attempts to carry out an update on PSO and gives a review of its recent developments and applications,but also provides arguments for its efficacy in resolving optimization problems in comparison with other algorithms.Covering six strategic areas,which include Data Mining,Machine Learning,Engineering Design,Energy Systems,Healthcare,and Robotics,the study demonstrates the versatility and effectiveness of the PSO.Experimental results are,however,used to show the strong and weak parts of PSO,and performance results are included in tables for ease of comparison.The results stress PSO’s efficiency in providing optimal solutions but also show that there are aspects that need to be improved through combination with algorithms or tuning to the parameters of the method.The review of the advantages and limitations of PSO is intended to provide academics and practitioners with a well-rounded view of the methods of employing such a tool most effectively and to encourage optimized designs of PSO in solving theoretical and practical problems in the future.展开更多
The futures trading market is an important part of the financial markets and soybeans are one of the most strategically important crops in the world.How to predict soybean future price is a challenging topic being stu...The futures trading market is an important part of the financial markets and soybeans are one of the most strategically important crops in the world.How to predict soybean future price is a challenging topic being studied by many researchers.This paper proposes a novel hybrid soybean future price prediction model which includes two stages of data preprocessing and deep learning prediction.In the data preprocessing stage,futures price series are decomposed into subsequences using the ICEEMDAN(improved complete ensemble empirical mode decomposition with adaptive noise)method.The Lempel-Ziv complexity determination method was then used to identify and reconstruct high-frequency subsequences.Finally,the high frequency component is decomposed secondarily using variational mode decomposition optimized by beluga whale optimization algorithm.In the deep learning prediction stage,a deep extreme learning machine optimized by the sparrow search algorithm was used to obtain the prediction results of all subseries and reconstructs them to obtain the final soybean future price prediction results.Based on the experimental results of soybean future price markets in China,Italy,and the United States,it was found that the hybrid method proposed provides superior performance in terms of prediction accuracy and robustness.展开更多
文摘Cardiovascular diseases are the most common cause of death worldwide over the last few decades in the developed as well as underdeveloped and developing countries. Early detection of cardiac diseases and continuous supervision of clinicians can reduce the mortality rate. However, accurate detection of heart diseases in all cases and consultation of a patient for 24 hours by a doctor is not available since it requires more sapience, time and expertise. In this?study, a tentative design of a cloud-based heart disease prediction system had been proposed to detect impending heart disease using Machine learning techniques. For the accurate detection of the heart disease, an efficient machine learning technique should be used which had been derived from a distinctive analysis among several machine learning algorithms in a Java Based Open Access Data Mining Platform, WEKA. The proposed algorithm was validated using two widely used open-access database, where 10-fold cross-validation is applied in order to analyze the performance of heart disease detection. An accuracy level of 97.53% accuracy was found from the SVM algorithm along with sensitivity and specificity of 97.50% and 94.94%respectively. Moreover, to monitor the heart disease patient round-the-clock by his/her caretaker/doctor, a real-time patient monitoring system was developed and presented using Arduino, capable of sensing some real-time parameters such as body temperature, blood pressure, humidity, heartbeat. The developed system can transmit the recorded data to a central server which are updated every 10 seconds. As a result, the doctors can visualize the patient’s real-time sensor data by using the application and start live video streaming if instant medication is required. Another important feature of the proposed system was that as soon as any real-time parameter of the patient exceeds the threshold, the prescribed doctor is notified at once through GSM technology.
文摘Student dropout in primary education is a critical global challenge with significant long-term societal and individual consequences.Early identification of at-risk students is a crucial first step towards implementing effective intervention strategies.This paper presents a machine learning framework for predicting student dropout risk by leveraging historical academic,attendance,and demographic data extracted from a primary school system.We formulate the problem as a binary classification task and evaluate multiple algorithms,including Logistic Regression,Random Forest,and Gradient Boosting,to identify the most effective predictor.To address the inherent class imbalance,we employ Synthetic Minority Over-sampling Technique(SMOTE).Our results,validated via stratified 5-fold cross-validation,indicate that the Random Forest model achieved the highest performance,with a recall of 0.91±0.03,ensuring that 91%of truly at-risk students were correctly identified.Furthermore,we use SHAP(SHapley Additive exPlanations)values to provide interpretable insights into the model’s predictions,revealing that attendance rate,academic performance trends,and socio-economic proxies are the most salient features.This work demonstrates the potential of machine learning as a powerful decision-support tool for educators,enabling timely and data-driven interventions to improve student retention and completion rates.
基金The authors appreciate support of the project from China Electronics Engineering Design Institute CO.,LTD.(No.SDIC2021-08)from the Beijing Natural Science Foundation(No.4212040).
文摘In this paper,models to predict hot spot temperature and to estimate cooling air’s working parameters of racks in data centers were established using machine learning algorithms based on simulation data.First,simulation models of typical racks were established in computational fluid dynamics(CFD).The model was validated with field test results and results in literature,error of which was less than 3%.Then,the CFD model was used to simulate thermal environments of a typical rack considering different factors,such as servers’power,which is from 3.3 kW to 20.1 kW,cooling air’s inlet velocity,which is from 1.0 m/s to 3.0 m/s,and cooling air’s inlet temperature,which is from 16℃ to 26℃ The highest temperature in the rack,also called hot spot temperature,was selected for each case.Next,a prediction model of hot spot temperature was built using machine learning algorithms,with servers’power,cooling air’s inlet velocity and cooling air’s inlet temperature as inputs,and the hot spot temperatures as outputs.Finally,based on the prediction model,an operating parameters estimation model was established to recommend cooling air’s inlet temperatures and velocities,which can not only keep the hot spot temperature at the safety value,but are also energy saving.
基金Deputyship for Research&Innovation,Ministry of Education in Saudi Arabia for funding this research work through the Project Number RI-44-0444.
文摘Datamining plays a crucial role in extractingmeaningful knowledge fromlarge-scale data repositories,such as data warehouses and databases.Association rule mining,a fundamental process in data mining,involves discovering correlations,patterns,and causal structures within datasets.In the healthcare domain,association rules offer valuable opportunities for building knowledge bases,enabling intelligent diagnoses,and extracting invaluable information rapidly.This paper presents a novel approach called the Machine Learning based Association Rule Mining and Classification for Healthcare Data Management System(MLARMC-HDMS).The MLARMC-HDMS technique integrates classification and association rule mining(ARM)processes.Initially,the chimp optimization algorithm-based feature selection(COAFS)technique is employed within MLARMC-HDMS to select relevant attributes.Inspired by the foraging behavior of chimpanzees,the COA algorithm mimics their search strategy for food.Subsequently,the classification process utilizes stochastic gradient descent with a multilayer perceptron(SGD-MLP)model,while the Apriori algorithm determines attribute relationships.We propose a COA-based feature selection approach for medical data classification using machine learning techniques.This approach involves selecting pertinent features from medical datasets through COA and training machine learning models using the reduced feature set.We evaluate the performance of our approach on various medical datasets employing diverse machine learning classifiers.Experimental results demonstrate that our proposed approach surpasses alternative feature selection methods,achieving higher accuracy and precision rates in medical data classification tasks.The study showcases the effectiveness and efficiency of the COA-based feature selection approach in identifying relevant features,thereby enhancing the diagnosis and treatment of various diseases.To provide further validation,we conduct detailed experiments on a benchmark medical dataset,revealing the superiority of the MLARMCHDMS model over other methods,with a maximum accuracy of 99.75%.Therefore,this research contributes to the advancement of feature selection techniques in medical data classification and highlights the potential for improving healthcare outcomes through accurate and efficient data analysis.The presented MLARMC-HDMS framework and COA-based feature selection approach offer valuable insights for researchers and practitioners working in the field of healthcare data mining and machine learning.
文摘Education is the base of the survival and growth of any state,but due to resource scarcity,students,particularly at the university level,are forced into a difficult situation.Scholarships are the most significant financial aid mechanisms developed to overcome such obstacles and assist the students in continuing with their higher studies.In this study,the convoluted situation of scholarship eligibility criteria,including parental income,responsibilities,and academic achievements,is addressed.In an attempt to maximize the scholarship selection process,numerous machine learning algorithms,including Support Vector Machines,Neural Networks,K-Nearest Neighbors,and the C4.5 algorithm,were applied.The C4.5 algorithm,owing to its efficiency in the prediction of scholarship beneficiaries based on extraneous factors,was capable of predicting a phenomenal 95.62%of predictions using extensive data of a well-esteemed government sector university from Pakistan.This percentage is 4%and 15%better than the remainder of the methods tested,and it depicts the extent of the potential for the technique to enhance the scholarship selection process.The Decision Support Systems(DSS)would not only save the administrative cost but would also create a fair and transparent process in place.In a world where accessibility to education is the key,this research provides data-oriented consolidation to ensure that deserving students are helped and allowed to get the financial assistance that they need to reach higher studies and bridge the gap between the demands of the day and the institutions of intellect.
文摘Chronic diseases such as heart disease,cancer,and diabetes are leading drivers of mortality worldwide,underscoring the need for improved efforts around early detection and prediction.The pathophysiology and management of chronic diseases have benefitted from emerging fields in molecular biology like genomics,transcriptomics,proteomics,glycomics,and lipidomics.The complex biomarker and mechanistic data from these"omics"studies present analytical and interpretive challenges,especially for traditional statistical methods.Machine learning(ML)techniques offer considerable promise in unlocking new pathways for data-driven chronic disease risk assessment and prognosis.This review provides a comprehensive overview of state-of-the-art applications of ML algorithms for chronic disease detection and prediction across datasets,including medical imaging,genomics,wearables,and electronic health records.Specifically,we review and synthesize key studies leveraging major ML approaches ranging from traditional techniques such as logistic regression and random forests to modern deep learning neural network architectures.We consolidate existing literature to date around ML for chronic disease prediction to synthesize major trends and trajectories that may inform both future research and clinical translation efforts in this growing field.While highlighting the critical innovations and successes emerging in this space,we identify the key challenges and limitations that remain to be addressed.Finally,we discuss pathways forward toward scalable,equitable,and clinically implementable ML solutions for transforming chronic disease screening and prevention.
基金Supported by the National Research Foundation of Korea(NRF)funded by the Ministry of Science,ICT&Future Planning(No.2006-2005173,NRF-2012-0009830,and NRF-2009-0090900)by the Bio&Medical Technology Development Program of the NRF funded by the Korean government,MSIP(No.NRF2015M3A9B6027139)
文摘Objective: To examine the association of body shape with cold and heat patterns, to determine which anthropometric measure is the best indicator for discriminating between the two patterns, and to investigate whether using a combination of measures can improve the predictive power to diagnose these patterns. Methods: Based on a total of 4,859 subjects (3,000 women and 1,859 men), statistical analyses using binary logistic regression were performed to assess the significance of the difference and the predictive power of each anthropometric measure, and binary logistic regression and Naive Bayes with the variable selection technique were used to assess the improvement in the predictive power of the patterns using the combined measures. Results: In women, the strongest indicators for determining the cold and heat patterns among anthropometric measures were body mass index (BMI) and rib circumference; in men, the best indicator was BMI. In experiments using a combination of measures, the values of the area under the receiver operating characteristic curve in women were 0.776 by Naive Bayes and 0.772 by logistic regression, and the values in men were 0.788 by Naive Bayes and 0.779 by logistic regression. Conclusions: Individuals with a higher BMI have a tendency toward a heat pattern in both women and men. The use of a combination of anthropometric measures can slightly improve the diagnostic accuracy. Our findings can provide fundamental information for the diagnosis of cold and heat patterns based on body shape for personalized medicine.
文摘As internet technology use is on the rise globally,phishing constitutes a considerable share of the threats that may attack individuals and organizations,leading to significant losses from personal and confidential information to substantial financial losses.Thus,much research has been dedicated in recent years to developing effective and robust mechanisms to enhance the ability to trace illegitimate web pages and to distinguish them from non-phishing sites as accurately as possible.Aiming to conclude whether a universally accepted model can detect phishing attempts with 100%accuracy,we conduct a systematic review of research carried out in 2018-2021 published in well-known journals published by Elsevier,IEEE,Springer,and Emerald.Those researchers studied different Data Mining(DM)algorithms,some of which created a whole new model,while others compared the performance of several algorithms.Some studies combined two or more algorithms to enhance the detection performance.Results reveal that while most algorithms achieve accuracies higher than 90%,only some specific models can achieve 100%accurate results.
文摘In this study, the author will investigate and utilize advanced machine learning models related to two different methodologies to determine the best and most effective way to predict individuals with heart failure and cardiovascular diseases. The first methodology involves a list of classification machine learning algorithms, and the second methodology involves the use of a deep learning algorithm known as MLP or Multilayer Perceptrons. Globally, hospitals are dealing with cases related to cardiovascular diseases and heart failure as they are major causes of death, not only for overweight individuals but also for those who do not adopt a healthy diet and lifestyle. Often, heart failures and cardiovascular diseases can be caused by many factors, including cardiomyopathy, high blood pressure, coronary heart disease, and heart inflammation [1]. Other factors, such as irregular shocks or stress, can also contribute to heart failure or a heart attack. While these events cannot be predicted, continuous data from patients’ health can help doctors predict heart failure. Therefore, this data-driven research utilizes advanced machine learning and deep learning techniques to better analyze and manipulate the data, providing doctors with informative decision-making tools regarding a person’s likelihood of experiencing heart failure. In this paper, the author employed advanced data preprocessing and cleaning techniques. Additionally, the dataset underwent testing using two different methodologies to determine the most effective machine-learning technique for producing optimal predictions. The first methodology involved employing a list of supervised classification machine learning algorithms, including Naïve Bayes (NB), KNN, logistic regression, and the SVM algorithm. The second methodology utilized a deep learning (DL) algorithm known as Multilayer Perceptrons (MLPs). This algorithm provided the author with the flexibility to experiment with different layer sizes and activation functions, such as ReLU, logistic (sigmoid), and Tanh. Both methodologies produced optimal models with high-level accuracy rates. The first methodology involves a list of supervised machine learning algorithms, including KNN, SVM, Adaboost, Logistic Regression, Naive Bayes, and Decision Tree algorithms. They achieved accuracy rates of 86%, 89%, 89%, 81%, 79%, and 99%, respectively. The author clearly explained that Decision Tree algorithm is not suitable for the dataset at hand due to overfitting issues. Therefore, it was discarded as an optimal model to be used. However, the latter methodology (Neural Network) demonstrated the most stable and optimal accuracy, achieving over 87% accuracy while adapting well to real-life situations and requiring low computing power overall. A performance assessment and evaluation were carried out based on a confusion matrix report to demonstrate feasibility and performance. The author concluded that the performance of the model in real-life situations can advance not only the medical field of science but also mathematical concepts. Additionally, the advanced preprocessing approach behind the model can provide value to the Data Science community. The model can be further developed by employing various optimization techniques to handle even larger datasets related to heart failures. Furthermore, different neural network algorithms can be tested to explore alternative approaches and yield different results.
基金financial support extended for this academic work by the Beijing Natural Science Foundation(Grant 2232066)the Open Project Foundation of State Key Laboratory of Solid Lubrication(Grant LSL-2212).
文摘The composition of base oils affects the performance of lubricants made from them.This paper proposes a hybrid model based on gradient-boosted decision tree(GBDT)to analyze the effect of different ratios of KN4010,PAO40,and PriEco3000 component in a composite base oil system on the performance of lubricants.The study was conducted under small laboratory sample conditions,and a data expansion method using the Gaussian Copula function was proposed to improve the prediction ability of the hybrid model.The study also compared four optimization algorithms,sticky mushroom algorithm(SMA),genetic algorithm(GA),whale optimization algorithm(WOA),and seagull optimization algorithm(SOA),to predict the kinematic viscosity at 40℃,kinematic viscosity at 100℃,viscosity index,and oxidation induction time performance of the lubricant.The results showed that the Gaussian Copula function data expansion method improved the prediction ability of the hybrid model in the case of small samples.The SOA-GBDT hybrid model had the fastest convergence speed for the samples and the best prediction effect,with determination coefficients(R^(2))for the four indicators of lubricants reaching 0.98,0.99,0.96 and 0.96,respectively.Thus,this model can significantly reduce the model’s prediction error and has good prediction ability.
文摘Influenza is a kind of infectious disease, which spreads quickly and widely. The outbreak of influenza has brought huge losses to society. In this paper, four major categories of flu keywords, “prevention phase”, “symptom phase”, “treatment phase”, and “commonly-used phrase” were set. Python web crawler was used to obtain relevant influenza data from the National Influenza Center’s influenza surveillance weekly report and Baidu Index. The establishment of support vector regression (SVR), least absolute shrinkage and selection operator (LASSO), convolutional neural networks (CNN) prediction models through machine learning, took into account the seasonal characteristics of the influenza, also established the time series model (ARMA). The results show that, it is feasible to predict influenza based on web search data. Machine learning shows a certain forecast effect in the prediction of influenza based on web search data. In the future, it will have certain reference value in influenza prediction. The ARMA(3,0) model predicts better results and has greater generalization. Finally, the lack of research in this paper and future research directions are given.
基金This research work was funded by the Institutional Fund Projects under Grant No.(IFPIP:959-611-1443)The authors gratefully acknowledge the technical and financial support provided by the Ministry of Education and King Abdulaziz University,DSR,Jeddah,Saudi Arabia.
文摘Data mining and analytics involve inspecting and modeling large pre-existing datasets to discover decision-making information.Precision agriculture uses datamining to advance agricultural developments.Many farmers aren’t getting the most out of their land because they don’t use precision agriculture.They harvest crops without a well-planned recommendation system.Future crop production is calculated by combining environmental conditions and management behavior,yielding numerical and categorical data.Most existing research still needs to address data preprocessing and crop categorization/classification.Furthermore,statistical analysis receives less attention,despite producing more accurate and valid results.The study was conducted on a dataset about Karnataka state,India,with crops of eight parameters taken into account,namely the minimum amount of fertilizers required,such as nitrogen,phosphorus,potassium,and pH values.The research considers rainfall,season,soil type,and temperature parameters to provide precise cultivation recommendations for high productivity.The presented algorithm converts discrete numerals to factors first,then reduces levels.Second,the algorithm generates six datasets,two fromCase-1(dataset withmany numeric variables),two from Case-2(dataset with many categorical variables),and one from Case-3(dataset with reduced factor variables).Finally,the algorithm outputs a class membership allocation based on an extended version of the K-means partitioning method with lambda estimation.The presented work produces mixed-type datasets with precisely categorized crops by organizing data based on environmental conditions,soil nutrients,and geo-location.Finally,the prepared dataset solves the classification problem,leading to a model evaluation that selects the best dataset for precise crop prediction.
文摘This paper presents an efficient prediction model for a good learning environment using Random Forest(RF)classifier.It consists of a series of modules;data preprocessing,data normalization,data split andfinally classification or prediction by the RF classifier.The preprocessed data is normalized using minmax normalization often used before modelfitting.As the input data or variables are measured at different scales,it is necessary to normalize them to contribute equally to the modelfitting.Then,the RF classifier is employed for course selection which is an ensemble learning method and k-fold cross-validation(k=10)is used to validate the model.The proposed Prediction Model for Course Selection(PMCS)system is considered a multi-class problem that predicts the course for a particular learner with three complexity levels,namely low,medium and high.It is operated under two modes;locally and globally.The former considers the gender of the learner and the later does not consider the gender of the learner.The database comprises the learner opinions from 75 males and 75 females per category(low,medium and high).Thus the system uses a total of 450 samples to evaluate the performance of the PMCS system.Results show that the system’s performance,while using locally i.e.,gender-wise has slightly higher performance than the global system.The RF classifier with 75 decision trees in the global system provides an average accuracy of 97.6%,whereas in the local system it is 97%(male)and 97.6%(female).The overall performance of the RF classifier with 75 trees is better than 25,50 and 100 decision trees in both local and global systems.
文摘In this paper, it described the architecture of a tool called DiagData. This tool aims to use a large amount of data and information in the field of plant disease diagnostic to generate a disease predictive system. In this approach, techniques of data mining are used to extract knowledge from existing data. The data is extracted in the form of rules that are used in the development of a predictive intelligent system. Currently, the specification of these rules is built by an expert or data mining. When data mining on a large database is used, the number of generated rules is very complex too. The main goal of this work is minimize the rule generation time. The proposed tool, called DiagData, extracts knowledge automatically or semi-automatically from a database and uses it to build an intelligent system for disease prediction. In this work, the decision tree learning algorithm was used to generate the rules. A toolbox called Fuzzygen was used to generate a prediction system from rules generated by decision tree algorithm. The language used to implement this software was Java. The DiagData has been used in diseases prediction and diagnosis systems and in the validation of economic and environmental indicators in agricultural production systems. The validation process involved measurements and comparisons of the time spent to enter the rules by an expert with the time used to insert the same rules with the proposed tool. Thus, the tool was successfully validated, providing a reduction of time.
文摘Particle Swarm Optimization(PSO)has been utilized as a useful tool for solving intricate optimization problems for various applications in different fields.This paper attempts to carry out an update on PSO and gives a review of its recent developments and applications,but also provides arguments for its efficacy in resolving optimization problems in comparison with other algorithms.Covering six strategic areas,which include Data Mining,Machine Learning,Engineering Design,Energy Systems,Healthcare,and Robotics,the study demonstrates the versatility and effectiveness of the PSO.Experimental results are,however,used to show the strong and weak parts of PSO,and performance results are included in tables for ease of comparison.The results stress PSO’s efficiency in providing optimal solutions but also show that there are aspects that need to be improved through combination with algorithms or tuning to the parameters of the method.The review of the advantages and limitations of PSO is intended to provide academics and practitioners with a well-rounded view of the methods of employing such a tool most effectively and to encourage optimized designs of PSO in solving theoretical and practical problems in the future.
基金fully supported by the National Natural Science Foundation of China(52072412)。
文摘The futures trading market is an important part of the financial markets and soybeans are one of the most strategically important crops in the world.How to predict soybean future price is a challenging topic being studied by many researchers.This paper proposes a novel hybrid soybean future price prediction model which includes two stages of data preprocessing and deep learning prediction.In the data preprocessing stage,futures price series are decomposed into subsequences using the ICEEMDAN(improved complete ensemble empirical mode decomposition with adaptive noise)method.The Lempel-Ziv complexity determination method was then used to identify and reconstruct high-frequency subsequences.Finally,the high frequency component is decomposed secondarily using variational mode decomposition optimized by beluga whale optimization algorithm.In the deep learning prediction stage,a deep extreme learning machine optimized by the sparrow search algorithm was used to obtain the prediction results of all subseries and reconstructs them to obtain the final soybean future price prediction results.Based on the experimental results of soybean future price markets in China,Italy,and the United States,it was found that the hybrid method proposed provides superior performance in terms of prediction accuracy and robustness.