To solve multi-class fault diagnosis tasks, a decision tree support vector machine (DTSVM), which combines SVM and decision tree using the concept of dichotomy, is proposed. Since the classification performance of DTSVM depends heavily on its structure, a genetic algorithm is introduced into the formation of the decision tree to cluster the classes so that the distance between the clustering centers of the two sub-classes is maximized; in this way the most separable classes are separated at each node of the decision tree. Numerical simulations on three datasets, compared with the "one-against-all" and "one-against-one" methods, demonstrate that the proposed method has better performance and higher generalization ability than the two conventional methods.
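The node-splitting idea in the DTSVM abstract can be illustrated with a small sketch. It is not the authors' implementation: an exhaustive search stands in for the genetic algorithm, each group center is assumed to be the unweighted mean of the class centroids, and the names `centroid` and `best_split` are hypothetical.

```python
from itertools import combinations
from math import dist


def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def best_split(class_centers):
    """Split class labels into two groups whose centers are farthest apart.

    class_centers maps a class label to the centroid of its samples.
    Assumption: exhaustive search replaces the genetic algorithm, which
    only matters once the number of classes makes enumeration too slow.
    """
    labels = sorted(class_centers)
    first, rest = labels[0], labels[1:]
    best = None
    # Keep the first label in group A so every split is visited once;
    # k < len(rest) guarantees group B is never empty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            group_a = (first,) + extra
            group_b = tuple(l for l in rest if l not in extra)
            d = dist(
                centroid([class_centers[l] for l in group_a]),
                centroid([class_centers[l] for l in group_b]),
            )
            if best is None or d > best[2]:
                best = (group_a, group_b, d)
    return best
```

Applying `best_split` recursively to each resulting group would give one possible tree layout, with a binary SVM trained at every internal node.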
This paper presents fault diagnosis of a face milling tool based on a machine learning approach. While machining, spindle vibration signals in the feed direction are acquired under healthy and faulty conditions of the milling tool. A set of discrete wavelet features is extracted from the vibration signals using the discrete wavelet transform (DWT). The decision tree technique is used to select the significant features from all extracted wavelet features. C-support vector classification (C-SVC) and ν-support vector classification (ν-SVC) models with different kernel functions of the support vector machine (SVM) are used to study and classify the tool condition based on the selected features. From the results obtained, C-SVC is a better model than ν-SVC, giving 94.5% classification accuracy for face milling of the special steel alloy 42CrMo4.
Credit card fraud data is highly imbalanced, with an overwhelmingly large portion of nonfraudulent transactions and a small portion of fraudulent transactions. The measures used to judge the veracity of detection algorithms therefore become critical to deploying a model that accurately scores fraudulent transactions, taking into account both case imbalance and the cost of identifying a case as genuine when it is in fact fraudulent. In this paper, a new criterion for judging classification algorithms, which considers the cost of misclassification, is proposed, and several undersampling techniques are compared using this criterion. In addition, a weighted support vector machine (SVM) algorithm that considers the financial cost of misclassification is introduced and proves more practical for credit card fraud detection than traditional methodologies. This weighted SVM uses transaction balances as weights for fraudulent transactions and a uniform weight for nonfraudulent transactions. The results show that this strategy greatly improves the performance of credit card fraud detection.
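A cost-aware criterion and the weighting scheme described in the fraud-detection abstract might look like the sketch below. The abstract does not give the exact formula, so the cost model here is an assumption: a missed fraud costs its transaction amount and a false alarm costs a fixed fee. The names `misclassification_cost`, `sample_weights`, and `false_alarm_cost` are hypothetical.

```python
def misclassification_cost(y_true, y_pred, amounts, false_alarm_cost=1.0):
    """Total monetary cost of the errors in y_pred (1 = fraud, 0 = genuine).

    Assumed cost model: a fraud predicted as genuine loses the transaction
    amount; a genuine transaction flagged as fraud costs a fixed fee.
    """
    total = 0.0
    for truth, pred, amount in zip(y_true, y_pred, amounts):
        if truth == 1 and pred == 0:
            total += amount
        elif truth == 0 and pred == 1:
            total += false_alarm_cost
    return total


def sample_weights(y_true, amounts, legit_weight=1.0):
    """Per-sample training weights: the transaction balance for fraudulent
    cases and one uniform weight for nonfraudulent cases."""
    return [
        amount if truth == 1 else legit_weight
        for truth, amount in zip(y_true, amounts)
    ]
```

Weights of this form could be passed to any SVM trainer that accepts per-sample weights, so that errors on large fraudulent transactions are penalized more heavily.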
Scientists have introduced new methods for capturing energy from ocean waves. Specifically, they have focused on a type of wave energy converter (WEC) that is nonbuoyant (i.e., a body that cannot float). Typically, the WEC is most effective when it is in resonance, which occurs when the natural frequency of the WEC aligns with that of the ocean waves. Therefore, accurately predicting the movement of the WEC is crucial for adjusting its system to resonate with the incoming waves for optimal performance. In this study, artificial intelligence techniques, namely random forest, extra trees (ET), and support vector machines, are developed to forecast the vertical movement of a nonbuoyant WEC. The developed models require two input variables: the water wave height and its time period. A total of approximately 4500 data points, which include nonlinear water wave height and duration obtained from a laboratory experiment, are used as the input for these models, with the resulting vertical movement as the output. When the three models are compared on processing speed and accuracy, the ET model stands out as the most efficient. Finally, the ET model is tested using data from a real ocean setting.
Support Vector Clustering (SVC) is a kernel-based unsupervised clustering method. The main drawback of SVC is the high computational complexity of building the adjacency matrix that describes the connectivity of each pair of points. Based on the proximity graph model [3], the Euclidean distance in Hilbert space is calculated using a Gaussian kernel and serves as the criterion for generating a minimum spanning tree (MST) with Kruskal's algorithm. The cost of connectivity estimation is then lowered by checking only the linkages between the edges that form the main stem of the MST, for which a non-compatibility degree is newly defined to support edge selection during linkage estimation. This new approach is analyzed experimentally. The results show that the revised algorithm performs better than the proximity graph model, with faster speed, improved clustering quality, and strong noise suppression, which makes SVC scalable to large data sets.
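The first two steps of the clustering abstract, a kernel-induced distance and an MST built with Kruskal's algorithm, can be sketched as follows. For a Gaussian kernel K(x, y) = exp(-q·||x − y||²), the squared feature-space distance is K(x, x) − 2K(x, y) + K(y, y) = 2 − 2K(x, y). The kernel width `q`, the complete-graph edge set, and the function names are assumptions; the main-stem selection and non-compatibility degree from the paper are not reproduced.

```python
from math import exp


def kernel_distance_sq(x, y, q=1.0):
    """Squared feature-space distance under K(x, y) = exp(-q * ||x - y||^2)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return 2.0 - 2.0 * exp(-q * sq)


def kruskal_mst(points, q=1.0):
    """Return the MST edges (i, j, weight) of the complete graph on points."""
    n = len(points)
    edges = sorted(
        (kernel_distance_sq(points[i], points[j], q), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    mst = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so keep it
            parent[ri] = rj
            mst.append((i, j, w))
    return mst
```

Because the Gaussian kernel distance saturates near 2 for far-apart points, long MST edges are natural candidates for cluster boundaries.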
Every second, a large volume of useful data about various kinds of online purchases is created on social media and in other forms of reviews. In particular, review data for purchased products grows enormously in different database repositories every day. Most of this review data is useful to new customers for their further purchases, as well as to companies that want to view customer feedback about their products. Data mining and machine learning techniques are commonly used to analyse and visualise such data and to understand the potential use of items purchased online. Customers judge the quality of products through their sentiments about items purchased from different online companies. This research work analyses the sentiments in headphone review data collected from online repositories. For the analysis, machine learning techniques including Support Vector Machines, Naive Bayes, Decision Trees, Random Forest, and a hybrid method are applied to assess quality via the customers' sentiments. The accuracy and performance of these algorithms are also analysed for three types of sentiment: positive, negative, and neutral.
BACKGROUND: Delayed wound healing is a common clinical complication following radical surgery for gastric cancer and adversely affects patient prognosis. With advances in artificial intelligence, machine learning offers a promising approach for developing predictive models that can identify high-risk patients and support early clinical intervention. AIM: To construct machine learning-based risk prediction models for delayed wound healing after gastric cancer surgery to support clinical decision-making. METHODS: We reviewed a total of 514 patients who underwent radical surgery for gastric cancer under general anesthesia from January 1, 2014 to December 30, 2023. Seventy percent of the dataset was selected as the training set and 30% as the validation set. Decision trees, support vector machines, and logistic regression were used to construct risk prediction models. Model performance was evaluated using accuracy, recall, precision, F1 index, the area under the receiver operating characteristic curve, and decision curve analysis. RESULTS: This study included five variables: sex, elderly status, duration of abdominal drainage, preoperative white blood cell (WBC) count, and absolute neutrophil count. These variables were selected based on their clinical relevance and statistical significance in predicting delayed wound healing. The decision tree model outperformed the logistic regression and support vector machine models in both the training and validation sets, achieving higher accuracy, F1 index, recall, and area under the curve (AUC) values. The support vector machine model also performed better than logistic regression, with higher accuracy, recall, and F1 index but a slightly lower AUC. The five key variables were found to be strong predictors of delayed wound healing. Patients with a longer duration of abdominal drainage had a significantly higher risk of delayed wound healing, with a risk ratio of 1.579 compared with those with a shorter duration. Similarly, preoperative WBC count, sex, elderly status, and absolute neutrophil count were associated with a higher risk of delayed wound healing, highlighting the importance of these variables in the model. CONCLUSION: A model that identifies high-risk patients based on sex, elderly status, duration of abdominal drainage, preoperative WBC count, and absolute neutrophil count can provide valuable insights for clinical decision-making.
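The threshold-based measures named in the wound-healing abstract (accuracy, recall, precision, F1 index) all follow from the confusion-matrix counts. A minimal sketch, assuming binary labels with 1 for delayed healing; the function name `binary_metrics` is hypothetical, and AUC and decision curves are not covered.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Recall matters most when the aim is to catch every high-risk patient, while precision limits unnecessary interventions; F1 balances the two.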
The significance of precise energy usage forecasts has been highlighted by the increasing need for sustainability and energy efficiency across a range of industries. To improve the precision and transparency of energy consumption projections, this study investigates the combination of machine learning (ML) methods with Shapley additive explanations (SHAP) values. The study evaluates three distinct models: a Linear Regressor, a Support Vector Regressor (SVR), and a Decision Tree Regressor, which was extended to a Random Forest Regressor. These models were deployed together with Shareable, Plot-interpretable Explainable Artificial Intelligence techniques to improve trust in the AI. The findings suggest that the developed models are superior to the conventional models discussed in prior studies, with Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) values close to perfection. In detail, the Random Forest Regressor shows an MAE of 0.001 for predicting the house prices, whereas the SVR gives an MAE of 0.21 and an RMSE of 0.24. These outcomes reflect the possibility of combining advanced AI models with Explainable AI to predict energy consumption more accurately while also explaining the models' decision-making procedures. In addition to increasing prediction accuracy, this strategy gives stakeholders comprehensible insights, which facilitates improved decision-making and fosters confidence in AI-powered energy solutions. The outcomes show how well ML and SHAP work together to enhance prediction performance and guarantee transparency in energy usage projections.
This study aims to evaluate the effectiveness of machine learning techniques for predicting groundwater fluctuations in arid and semi-arid regions using data from the Gravity Recovery and Climate Experiment (GRACE) satellite mission. The primary objective is to develop accurate predictive models for groundwater level changes by leveraging the unique capabilities of GRACE satellite data in conjunction with advanced machine learning algorithms. Three widely used machine learning models, namely decision tree (DT), support vector machine (SVM), and random forest (RF), were employed to analyze and model the relationship between GRACE satellite data and groundwater fluctuations in South Khorasan Province, Iran. The study utilized 151 months of GRACE data spanning from 2002 to 2017, which were correlated with piezometer well data available in the study area. The JPL model was selected based on its strong correlation (R² = 0.9368) with the observed data. The machine learning models were trained and validated using a 70/30 split of the data, and their performance was evaluated using various statistical metrics, including RMSE, R², and NSE. The results demonstrated the suitability of machine learning approaches for modeling groundwater fluctuations using GRACE satellite data. The DT model exhibited the best performance during the calibration stage, with an R² value of 0.95, RMSE of 20.655, and NSE of 0.96. The SVM and RF models achieved R² values of 0.79 and 0.65 and NSE values of 0.86 and 0.71, respectively. For the prediction stage, the DT model maintained its high efficiency, with an RMSE of 1.48, R² of 0.87, and NSE of 0.90, indicating its robustness in predicting future groundwater fluctuations using GRACE data. The study highlights the potential of machine learning techniques, particularly decision trees, in conjunction with GRACE satellite data for accurate prediction and monitoring of groundwater fluctuations in arid and semi-arid regions. The findings demonstrate the effectiveness of the DT model in capturing the complex relationships between GRACE data and groundwater dynamics, providing reliable predictions and insights for sustainable groundwater management strategies.
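The three evaluation metrics named in the groundwater abstract can be computed as in the sketch below. The abstract does not define them, so the definitions are assumptions: R² is taken as the squared Pearson correlation, and NSE (Nash-Sutcliffe efficiency) as one minus the ratio of squared error to the variance of the observations. The function names are hypothetical.

```python
from math import sqrt


def rmse(obs, pred):
    """Root mean squared error between observed and predicted values."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def r_squared(obs, pred):
    """Squared Pearson correlation between observations and predictions."""
    mo = sum(obs) / len(obs)
    mp = sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)
```

Under these definitions a constant bias leaves R² at 1 but lowers NSE, which is one reason the two are usually reported together.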
Machine learning techniques and a dataset of five wells from the Rawat oilfield in Sudan, containing 93,925 samples per feature (seven well logs and one facies log), were used to classify four facies. Data preprocessing and preparation involved two processes: data cleaning and feature scaling. Several machine learning algorithms for classification, including Linear Regression (LR), Decision Tree (DT), Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting (GB), were tested using different iterations and various combinations of features and parameters. The support vector radial kernel training model achieved an accuracy of 72.49% without grid search and 64.02% with grid search, while the blind-well test scores were 71.01% and 69.67%, respectively. The Decision Tree (DT) Hyperparameter Optimization model showed an accuracy of 64.15% for training and 67.45% for testing. In comparison, the Decision Tree coupled with grid search yielded better results, with a training score of 69.91% and a testing score of 67.89%. The model was validated using the blind-well validation approach, which achieved an accuracy of 69.81%. Three algorithms were used to generate the gradient-boosting model. The Gradient Boosting classifier achieved an accuracy score of 71.57% during training and 69.89% during testing. The Grid Search model achieved a higher accuracy score of 72.14% during testing. The Extreme Gradient Boosting model had the lowest accuracy score, with only 66.13% for training and 66.12% for testing. For validation, the Gradient Boosting (GB) classifier model achieved an accuracy score of 75.41% on the blind-well test, while Gradient Boosting with Grid Search achieved an accuracy score of 71.36%.
The Enhanced Random Forest and Random Forest with Bagging algorithms were the most effective, with validation accuracies of 78.30% and 79.18%, respectively. However, the Random Forest and Random Forest with Grid Search models displayed significant variance between their training and testing scores, indicating the potential for overfitting. Random Forest (RF) and Gradient Boosting (GB) are highly effective for facies classification because they handle complex relationships and provide high predictive accuracy. The choice between the two depends on specific project requirements, including interpretability, computational resources, and data nature.
Posterior probability support vector machines (PPSVMs) prove robust against noise and outliers and require fewer support vectors (SVs) to be stored. Gonen et al. (2008) extended PPSVMs to the multiclass case using both single-machine and multimachine approaches. However, these extensions suffer from low classification efficiency, a high computational burden, and, more importantly, unclassifiable regions. To achieve higher classification efficiency and accuracy with fewer SVs, a binary tree of PPSVMs for the multiclass classification problem is proposed in this letter. Moreover, a Fisher ratio separability measure is adopted to determine the tree structure. Several experiments on handwriting recognition datasets are included to illustrate the proposed approach. Specifically, the binary tree of PPSVMs accelerated by Fisher ratio separability obtains an overall test accuracy that is at least comparable to, if not higher than, that of other multiclass algorithms, while using significantly fewer SVs and much less test time.
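A Fisher ratio separability measure, as mentioned in the PPSVM abstract, can be illustrated for a single feature. The letter's exact multivariate definition is not given in the abstract, so this one-dimensional form (squared mean difference over summed variances) is an assumption, as are the names `fisher_ratio` and `most_separable_pair`.

```python
from statistics import mean, pvariance


def fisher_ratio(a, b):
    """Squared mean difference over summed variances for two 1-D samples."""
    return (mean(a) - mean(b)) ** 2 / (pvariance(a) + pvariance(b))


def most_separable_pair(classes):
    """Return (label_p, label_q, ratio) for the pair with the largest ratio.

    classes maps a class label to a list of scalar feature values.
    """
    labels = sorted(classes)
    pairs = [
        (fisher_ratio(classes[p], classes[q]), p, q)
        for i, p in enumerate(labels)
        for q in labels[i + 1:]
    ]
    score, p, q = max(pairs)
    return p, q, score
```

A tree builder could place the most separable classes near the root, so the easiest decisions are made first and errors propagate less.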
In recent decades, the proliferation of email communication has markedly escalated, resulting in a concomitant surge in spam emails that congest networks and present security risks. This study introduces an innovative spam detection method utilizing the Horse Herd Optimization Algorithm (HHOA), designed for binary classification within a multi-objective framework. The method proficiently identifies essential features, minimizing redundancy and improving classification precision. The suggested HHOA attained an accuracy of 97.21% on the Kaggle email dataset, with a precision of 94.30%, recall of 90.50%, and F1-score of 92.80%. Compared with conventional techniques such as Support Vector Machine (93.89% accuracy), Random Forest (96.14% accuracy), and K-Nearest Neighbours (92.08% accuracy), HHOA exhibited enhanced performance with reduced computing complexity. The suggested method also demonstrated enhanced feature selection efficiency, decreasing the number of selected features while maintaining high classification accuracy. The results underscore the efficacy of HHOA in spam identification and indicate its potential for further application in practical email filtering systems.
Funding (DTSVM fault diagnosis study): supported by the National Natural Science Foundation of China (60604021, 60874054).
Funding (Support Vector Clustering study): the National High Technology Research and Development Program of China (No. 863-511-930-009).
Funding (delayed wound healing study): supported by the Shandong Province Traditional Chinese Medicine Technology Project, No. Q-2023147; the Weifang Health Commission Research Project, No. WFWSJK-2023-033; the Weifang City Science and Technology Development Plan (Medical Category), No. 2023YX057; the Weifang Medical University 2022 Campus Level Education and Teaching Reform and Research Project, No. 2022YB051; the Norman Bethune Public Welfare Foundation, No. ezmr2023-037; and the Special Research Project on Optimized Management of Acute Pain, Wu Jieping Medical Foundation.
文摘BACKGROUND Delayed wound healing is a common clinical complication following gastric cancer radical surgery,adversely affecting patient prognosis.With advances in artificial intelligence,machine learning offers a promising approach for developing predictive models that can identify high-risk patients and support early clinical intervention.AIM To construct machine learning-based risk prediction models for delayed wound healing after gastric cancer surgery to support clinical decision-making.METHODS We reviewed a total of 514 patients who underwent gastric cancer radical surgery under general anesthesia from January 1,2014 to December 30,2023.Seventy percent of the dataset was selected as the training set and 30%as the validation set.Decision trees,support vector machines,and logistic regression were used to construct a risk prediction model.The performance of the model was evaluated using accuracy,recall,precision,F1 index,and area under the receiver operating characteristic curve and decision curve.RESULTS This study included five variables:Sex,elderly,duration of abdominal drainage,preoperative white blood cell(WBC)count,and absolute value of neutrophils.These variables were selected based on their clinical relevance and statistical significance in predicting delayed wound healing.The results showed that the decision tree model outperformed the logistic regression and support vector machine models in both the training and validation sets.Specifically,the decision tree model achieved higher accuracy,F1 index,recall,and area under the curve(AUC)values.The support vector machine model also demonstrated better performance than logistic regression,with higher accuracy,recall,and F1 index,but a slightly lower AUC.The key variables of sex,elderly,duration of abdominal drainage,preoperative WBC count,and absolute value of neutrophils were found to be strong predictors of delayed wound healing.Patients with longer duration of abdominal drainage had a significantly 
higher risk of delayed wound healing,with a risk ratio of 1.579 compared to those with shorter duration of abdominal drainage.Similarly,preoperative WBC count,sex,elderly,and absolute value of neutrophils were associated with a higher risk of delayed wound healing,highlighting the importance of these variables in the model.CONCLUSION The model is able to identify high-risk patients based on sex,elderly,duration of abdominal drainage,preoperative WBC count,and absolute value of neutrophils can provide valuable insights for clinical decision-making.
文摘The significance of precise energy usage forecasts has been highlighted by the increasing need for sustainability and energy efficiency across a range of industries.In order to improve the precision and openness of energy consumption projections,this study investigates the combination of machine learning(ML)methods with Shapley additive explanations(SHAP)values.The study evaluates three distinct models:the first is a Linear Regressor,the second is a Support Vector Regressor,and the third is a Decision Tree Regressor,which was scaled up to a Random Forest Regressor/Additions made were the third one which was Regressor which was extended to a Random Forest Regressor.These models were deployed with the use of Shareable,Plot-interpretable Explainable Artificial Intelligence techniques,to improve trust in the AI.The findings suggest that our developedmodels are superior to the conventional models discussed in prior studies;with high Mean Absolute Error(MAE)and Root Mean Squared Error(RMSE)values being close to perfection.In detail,the Random Forest Regressor shows the MAE of 0.001 for predicting the house prices whereas the SVR gives 0.21 of MAE and 0.24 RMSE.Such outcomes reflect the possibility of optimizing the use of the promoted advanced AI models with the use of Explainable AI for more accurate prediction of energy consumption and at the same time for the models’decision-making procedures’explanation.In addition to increasing prediction accuracy,this strategy gives stakeholders comprehensible insights,which facilitates improved decision-making and fosters confidence in AI-powered energy solutions.The outcomes show how well ML and SHAP work together to enhance prediction performance and guarantee transparency in energy usage projections.
Abstract: This study aims to evaluate the effectiveness of machine learning techniques for predicting groundwater fluctuations in arid and semi-arid regions using data from the Gravity Recovery and Climate Experiment (GRACE) satellite mission. The primary objective is to develop accurate predictive models for groundwater level changes by leveraging the unique capabilities of GRACE satellite data in conjunction with advanced machine learning algorithms. Three widely used machine learning models, namely Decision Tree (DT), Support Vector Machine (SVM), and Random Forest (RF), were employed to analyze and model the relationship between GRACE satellite data and groundwater fluctuations in South Khorasan Province, Iran. The study utilized 151 months of GRACE data spanning from 2002 to 2017, which were correlated with piezometer well data available in the study area. The JPL model was selected based on its strong correlation (R² = 0.9368) with the observed data. The machine learning models were trained and validated using a 70/30 split of the data, and their performance was evaluated using various statistical metrics, including RMSE, R², and NSE. The results demonstrated the suitability of machine learning approaches for modeling groundwater fluctuations using GRACE satellite data. The DT model exhibited the best performance during the calibration stage, with an R² value of 0.95, RMSE of 20.655, and NSE of 0.96. The SVM and RF models achieved R values of 0.79 and 0.65, and NSE values of 0.86 and 0.71, respectively. For the prediction stage, the DT model maintained its high efficiency, with an RMSE of 1.48, R² of 0.87, and NSE of 0.90, indicating its robustness in predicting future groundwater fluctuations using GRACE data. The study highlights the potential of machine learning techniques, particularly Decision Trees, in conjunction with GRACE satellite data, for accurate prediction and monitoring of groundwater fluctuations in arid and semi-arid regions. The findings demonstrate the effectiveness of the DT model in capturing the complex relationships between GRACE data and groundwater dynamics, providing reliable predictions and insights for sustainable groundwater management strategies.
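The three evaluation metrics named in the abstract above (RMSE, the coefficient of determination, and NSE, the Nash-Sutcliffe efficiency) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names and sample values are hypothetical, and R² is assumed here to be the squared Pearson correlation, which is a common convention in hydrology but is not confirmed by the abstract.

```python
import math


def rmse(observed, simulated):
    """Root mean square error between paired observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the residual sum of
    squares to the spread of the observations around their mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / spread


def r_squared(observed, simulated):
    """Squared Pearson correlation (assumed definition of R^2)."""
    mean_obs = sum(observed) / len(observed)
    mean_sim = sum(simulated) / len(simulated)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(observed, simulated))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_sim = sum((s - mean_sim) ** 2 for s in simulated)
    return cov ** 2 / (var_obs * var_sim)


if __name__ == "__main__":
    # Hypothetical groundwater-level anomalies, for illustration only.
    observed = [1.0, 2.0, 3.0, 4.0]
    simulated = [1.1, 1.9, 3.2, 3.8]
    print(rmse(observed, simulated))
    print(nse(observed, simulated))
    print(r_squared(observed, simulated))
```

An NSE of 1 indicates a perfect match, while an NSE of 0 means the model is no better than predicting the mean of the observations.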
Abstract: Machine learning techniques and a dataset of five wells from the Rawat oilfield in Sudan containing 93,925 samples per feature (seven well logs and one facies log) were used to classify four facies. Data preprocessing and preparation involved two processes: data cleaning and feature scaling. Several machine learning algorithms, including Linear Regression (LR), Decision Tree (DT), Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting (GB) for classification, were tested using different iterations and various combinations of features and parameters. The SVM model with a radial kernel achieved a training accuracy of 72.49% without grid search and 64.02% with grid search, while the blind-well test scores were 71.01% and 69.67%, respectively. The Decision Tree (DT) Hyperparameter Optimization model showed an accuracy of 64.15% for training and 67.45% for testing. In comparison, the Decision Tree coupled with grid search yielded better results, with a training score of 69.91% and a testing score of 67.89%. The model was validated using the blind-well validation approach, which achieved an accuracy of 69.81%. Three algorithms were used to generate the gradient-boosting model. The Gradient Boosting classifier achieved an accuracy of 71.57% during training and 69.89% during testing. The Grid Search model achieved a higher accuracy of 72.14% during testing. The Extreme Gradient Boosting model had the lowest accuracy, with only 66.13% for training and 66.12% for testing. For validation, the Gradient Boosting (GB) classifier achieved an accuracy of 75.41% on the blind-well test, while Gradient Boosting with Grid Search achieved 71.36%. The Enhanced Random Forest and Random Forest with Bagging algorithms were the most effective, with validation accuracies of 78.30% and 79.18%, respectively. However, the Random Forest and Random Forest with Grid Search models displayed significant variance between their training and testing scores, indicating potential overfitting. Random Forest (RF) and Gradient Boosting (GB) are highly effective for facies classification because they handle complex relationships and provide high predictive accuracy. The choice between the two depends on specific project requirements, including interpretability, computational resources, and the nature of the data.
Funding: Project (Nos. 60874104 and 70971020) supported by the National Natural Science Foundation of China
Abstract: Posterior probability support vector machines (PPSVMs) are robust against noise and outliers and need to store fewer support vectors (SVs). Gonen et al. (2008) extended PPSVMs to the multiclass case using both single-machine and multimachine approaches. However, these extensions suffer from low classification efficiency, high computational burden, and, more importantly, unclassifiable regions. To achieve higher classification efficiency and accuracy with fewer SVs, a binary tree of PPSVMs for the multiclass classification problem is proposed in this letter. Moreover, a Fisher ratio separability measure is adopted to determine the tree structure. Several experiments on handwritten recognition datasets are included to illustrate the proposed approach. Specifically, the Fisher ratio separability accelerated binary tree of PPSVMs obtains overall test accuracy that is at least comparable to, if not higher than, that of other multiclass algorithms, while using significantly fewer SVs and much less test time.
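To illustrate how a Fisher ratio separability measure might guide the structure of a binary tree of classifiers, the sketch below scores every pair of classes and picks the most separable one. This is an assumption-laden illustration rather than the letter's method: the abstract does not give the exact formula, so the ratio is taken here as the squared distance between class means divided by the summed per-feature variances, and the names `fisher_ratio` and `most_separable_pair` are hypothetical.

```python
from itertools import combinations
from statistics import fmean, pvariance


def fisher_ratio(class_a, class_b):
    """Assumed Fisher ratio between two classes of feature vectors:
    squared distance between class means over the summed within-class
    (population) variances across all features."""
    features_a = list(zip(*class_a))  # transpose rows into feature columns
    features_b = list(zip(*class_b))
    between = sum((fmean(fa) - fmean(fb)) ** 2
                  for fa, fb in zip(features_a, features_b))
    within = (sum(pvariance(fa) for fa in features_a)
              + sum(pvariance(fb) for fb in features_b))
    if within == 0.0:
        # Zero within-class scatter: perfectly separable unless means coincide.
        return float("inf") if between > 0.0 else 0.0
    return between / within


def most_separable_pair(classes):
    """Return the pair of class labels with the largest Fisher ratio."""
    return max(combinations(sorted(classes), 2),
               key=lambda pair: fisher_ratio(classes[pair[0]], classes[pair[1]]))


if __name__ == "__main__":
    # Hypothetical one-dimensional samples for three classes.
    classes = {
        "a": [[0.0], [2.0]],
        "b": [[1.0], [3.0]],
        "c": [[10.0], [12.0]],
    }
    print(most_separable_pair(classes))
```

In a tree-building procedure, a score of this kind could be used at each node to decide which groups of classes to split apart first, so that the easiest separations occur near the root.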
Abstract: In recent decades, email communication has grown markedly, resulting in a concomitant surge in spam emails that congest networks and present security risks. This study introduces an innovative spam detection method utilizing the Horse Herd Optimization Algorithm (HHOA), designed for binary classification within a multi-objective framework. The method identifies essential features, minimizing redundancy and improving classification precision. The proposed HHOA attained an accuracy of 97.21% on the Kaggle email dataset, with a precision of 94.30%, recall of 90.50%, and F1-score of 92.80%. Compared to conventional techniques, such as Support Vector Machine (93.89% accuracy), Random Forest (96.14% accuracy), and K-Nearest Neighbours (92.08% accuracy), HHOA exhibited enhanced performance with reduced computing complexity. The proposed method also demonstrated more efficient feature selection, decreasing the number of selected features while maintaining high classification accuracy. The results underscore the efficacy of HHOA in spam identification and indicate its potential for further applications in practical email filtering systems.