Forecasting of ocean currents is critical for both marine meteorological research and ocean engineering and construction. Timely and accurate forecasting of coastal current velocities offers a scientific foundation and decision support for practices such as search and rescue, disaster avoidance and remediation, and offshore construction. This research established a framework to generate short-term surface current forecasts based on ensemble machine learning trained on high-frequency radar observations. Results indicate that an ensemble algorithm that uses random forests to filter forecasting features by weighting them, and then forecasts with the AdaBoost method, can significantly reduce model training time while preserving forecasting effectiveness, with substantial economic benefits. Model accuracy is a function of surface current variability and the forecasting horizon. To improve the forecasting capability and accuracy of the model, the structure of the ensemble algorithm was optimized, and the random forest algorithm was used to dynamically select model features. The results show that the optimized surface current forecasting model has a more regular error variation, and the importance of features varies with the forecasting time step. At a ten-step-ahead forecasting horizon, the model achieved a root mean square error of 2.84 cm/s, a mean absolute error of 2.02 cm/s, and a correlation coefficient of 0.96. The model error is affected by factors such as topography, boundaries, and the geometric accuracy of the observation system. This paper demonstrates the potential of ensemble-based machine learning algorithms to improve forecasting of ocean currents.
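The two-stage design this abstract describes, weighting features with a random forest and forecasting with AdaBoost, can be sketched as follows. This is a minimal illustration, not the authors' code: the synthetic features stand in for radar-derived predictors, and the top-5 cut-off is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))                      # stand-in for radar-derived features
y = X[:, 0] * 2 + X[:, 3] - X[:, 7] + rng.normal(scale=0.1, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Stage 1: weight features by random-forest importance and keep the top ones.
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
keep = np.argsort(rf.feature_importances_)[-5:]     # illustrative cut-off

# Stage 2: fit AdaBoost on the reduced feature set only.
ada = AdaBoostRegressor(n_estimators=100, random_state=0).fit(X_tr[:, keep], y_tr)
pred = ada.predict(X_te[:, keep])
rmse = float(np.sqrt(np.mean((pred - y_te) ** 2)))
```

Training AdaBoost on the reduced feature set is what yields the training-time saving the abstract reports.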
The software engineering field has long focused on creating high-quality software despite limited resources. Detecting defects before the testing stage of software development can enable quality assurance engineers to concentrate on problematic modules rather than on all modules. This approach can enhance the quality of the final product while lowering development costs. Identifying defective modules early allows for early corrections and ensures the timely delivery of a high-quality product that satisfies customers and instills greater confidence in the development team. This process, known as software defect prediction, can improve end-product quality while reducing the cost of testing and maintenance. This study proposes a software defect prediction system that utilizes data fusion, feature selection, and ensemble machine learning fusion techniques. A novel filter-based metric selection technique is proposed in the framework to select the optimum features. A three-step nested approach is presented for predicting defective modules with high accuracy. In the first step, three supervised machine learning techniques, Decision Tree, Support Vector Machines, and Naïve Bayes, are used to detect faulty modules. The second step integrates the predictions of these classifiers through three ensemble machine learning methods: Bagging, Voting, and Stacking. Finally, in the third step, a fuzzy logic technique is employed to integrate the predictions of the ensemble machine learning techniques. The experiments are performed on a fused software defect dataset to ensure that the developed fused ensemble model can perform effectively on diverse datasets. Five NASA datasets are integrated to create the fused dataset: MW1, PC1, PC3, PC4, and CM1. According to the results, the proposed system exhibited superior performance to other advanced techniques for predicting software defects, achieving a remarkable accuracy rate of 92.08%.
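The first two steps of the nested approach, base classifiers combined by voting and stacking, can be sketched as below. The dataset and settings are illustrative, and the paper's third fuzzy-logic fusion step is only indicated by a comment.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Step 1: the three supervised base classifiers named in the abstract.
base = [("dt", DecisionTreeClassifier(random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
        ("nb", GaussianNB())]

# Step 2: combine them via voting and stacking ensembles.
voting = VotingClassifier(base, voting="soft").fit(X_tr, y_tr)
stacking = StackingClassifier(base, final_estimator=LogisticRegression()).fit(X_tr, y_tr)

# Step 3 (not shown): the paper fuses the ensembles' outputs with fuzzy logic,
# e.g. over voting.predict_proba(X_te) and stacking.predict_proba(X_te).
acc_vote = voting.score(X_te, y_te)
acc_stack = stacking.score(X_te, y_te)
```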
This study provides an approach to classifying soil using machine learning. Multiclass variants of stand-alone machine learning algorithms (i.e., logistic regression (LR) and artificial neural network (ANN)), decision tree ensembles (i.e., decision forest (DF) and decision jungle (DJ)), and meta-ensemble models (i.e., stacking ensemble (SE) and voting ensemble (VE)) were used to classify soils based on their intrinsic physico-chemical properties. The multiclass prediction was also carried out across multiple cross-validation (CV) methods, i.e., train-validation split (TVS), k-fold cross-validation (KFCV), and Monte Carlo cross-validation (MCCV). Results indicated that the soils' clay fraction (CF) had the most influence on the multiclass prediction of natural soils' plasticity, while specific surface and carbonate content (CC) had the least within the dataset used in this study. Stand-alone machine learning models (LR and ANN) produced relatively less accurate predictions (accuracy of 0.45, average precision of 0.5, and average recall of 0.44) compared to tree-based models (accuracy of 0.68, average precision of 0.71, and average recall of 0.68), while the meta-ensembles (SE and VE) outperformed (accuracy of 0.75, average precision of 0.74, and average recall of 0.72) all the models utilised for multiclass classification. Sensitivity analysis of the meta-ensembles demonstrated their capacity to discriminate between soil classes across the CV methods considered. Machine learning training and validation using the MCCV and KFCV methods enabled better prediction while also ensuring that the models did not overfit the dataset. Further confirmation was provided by the continuous rise of the cumulative lift curve (LC) of the best-performing models when using the MCCV technique. Overall, this study demonstrated that a soil's physico-chemical properties have a direct influence on plastic behaviour and can therefore be relied upon to classify soils.
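The study's comparison of validation schemes can be sketched as follows: a voting meta-ensemble evaluated under both k-fold CV and Monte Carlo CV (repeated random splits, `ShuffleSplit` in scikit-learn). The data and ensemble members are synthetic stand-ins, not the study's soil dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, ShuffleSplit, cross_val_score

# Synthetic multiclass stand-in for the soil-plasticity classes.
X, y = make_classification(n_samples=400, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

ensemble = VotingClassifier(
    [("lr", LogisticRegression(max_iter=1000)),
     ("rf", RandomForestClassifier(random_state=0))],
    voting="soft")

# KFCV: each sample appears in exactly one validation fold.
kfcv = cross_val_score(ensemble, X, y, cv=KFold(5, shuffle=True, random_state=0))
# MCCV: repeated random 80/20 splits, samples may recur across splits.
mccv = cross_val_score(ensemble, X, y,
                       cv=ShuffleSplit(n_splits=10, test_size=0.2, random_state=0))
acc_kfcv, acc_mccv = kfcv.mean(), mccv.mean()
```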
Drilling optimization requires accurate drill bit rate-of-penetration (ROP) predictions. A higher ROP decreases drilling time and costs and increases rig productivity. This study employs random forest (RF), gradient boosting modeling (GBM), extreme gradient boosting (XGBoost), and adaptive boosting (AdaBoost) models to generate ROP predictions. The models use well data from a 3200-m segment across the stratigraphic column (Dibdibba to Zubair formations) of the large West Qurna oil field in Southern Iraq, penetrating 19 formations and four oil reservoirs. The reservoir sections are between 40 and 440 m thick and consist of both carbonate and clastic lithologies. The ROP predictive models were developed using 14 operational parameters: TVD, weight on bit (WOB), torque, effective circulating density (ECD), drilling rotation per minute (RPM), flow rate, standpipe pressure (SPP), bit size, total RPM, D exponent, gamma ray (GR), density, neutron, caliper, and discrete lithology distribution. Training and validation of the ROP models involve data compiled from three development wells. Applying random subsampling, the compiled dataset was split into 85% for training and 15% for validation and testing. The mismatch between measured and predicted ROP in the test subgroup was assessed using root mean square error (RMSE) and the coefficient of determination (R²). The RF, GBM, and XGBoost models provide ROP predictions versus depth with low errors. Models with cross-validation that integrate data from three wells deliver more accurate ROP predictions than those trained on data from a single well. The input variables' influences on ROP optimization identify optimal value ranges for the 14 operating parameters that help increase drilling speed and reduce cost.
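The evaluation protocol described here, a random 85/15 split with RMSE and R² on the held-out subset, looks roughly like the sketch below for the GBM member of the ensemble. The synthetic features stand in for the drilling parameters (WOB, torque, ECD, etc.); nothing here is the authors' data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(800, 14))                      # stand-ins for WOB, torque, ECD, ...
y = 5 + X[:, 1] * 3 - X[:, 4] + rng.normal(scale=0.5, size=800)   # stand-in ROP

# Random subsampling: 85% training, 15% validation/testing, as in the study.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.15, random_state=1)
gbm = GradientBoostingRegressor(random_state=1).fit(X_tr, y_tr)
pred = gbm.predict(X_te)

rmse = float(np.sqrt(mean_squared_error(y_te, pred)))
r2 = float(r2_score(y_te, pred))
```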
This paper proposes a new cost-efficient, adaptive, and self-healing algorithm that detects faults in real time with high accuracy and within a short period, even in situations where detection is difficult. Rather than using traditional machine learning (ML) algorithms or hybrid signal processing techniques, a new framework based on an optimization-enabled weighted ensemble method is developed that combines essential ML algorithms. In the proposed method, the system selects and combines appropriate ML algorithms based on Particle Swarm Optimization (PSO) weights. For this purpose, power system failures are simulated using PSCAD-Python co-simulation. A salient feature of this study is that the proposed solution works on real-time raw data without using any pre-computational techniques or pre-stored information. Therefore, the proposed technique can work on different systems, topologies, or data collections. The proposed fault detection technique is validated using PSCAD-Python co-simulation on modified and standard IEEE 14-bus and standard IEEE 39-bus systems, considering network faults that are difficult to detect.
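The core mechanism, a particle swarm searching for the weights that blend several base detectors, can be sketched in a few lines of NumPy. The PSO constants, the synthetic "detector scores", and the MSE objective below are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200).astype(float)     # fault / no-fault labels
# Three imperfect base detectors whose scores correlate with the label.
scores = np.stack([y_true + rng.normal(scale=s, size=200)
                   for s in (0.4, 0.6, 0.9)])

def loss(w):
    """Mean squared error of the weighted-sum detector."""
    w = np.abs(w) / (np.abs(w).sum() + 1e-12)           # normalise to a convex blend
    return float(np.mean((w @ scores - y_true) ** 2))

n_particles, dims = 20, 3
pos = rng.uniform(0, 1, (n_particles, dims))
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), np.array([loss(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()

for _ in range(50):                                     # standard PSO update loop
    r1, r2 = rng.uniform(size=(2, n_particles, dims))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = pos + vel
    vals = np.array([loss(p) for p in pos])
    better = vals < pbest_val
    pbest[better], pbest_val[better] = pos[better], vals[better]
    gbest = pbest[pbest_val.argmin()].copy()

best_loss = loss(gbest)
```

In the paper the base detectors would be trained ML models fed by the PSCAD-Python co-simulation rather than synthetic score arrays.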
Artificial intelligence (AI) serves as a key technology in global industrial transformation and technological restructuring and as the core driver of the fourth industrial revolution. Currently, deep learning techniques, such as convolutional neural networks, enable intelligent information collection in fields such as tongue and pulse diagnosis owing to their robust feature-processing capabilities. Natural language processing models, including long short-term memory networks and transformers, have been applied in traditional Chinese medicine (TCM) for diagnosis, syndrome differentiation, and prescription generation. Traditional machine learning algorithms, such as neural networks, support vector machines, and random forests, are also widely used in TCM diagnosis and treatment because of their strong regression and classification performance on small structured datasets. Future research on AI in TCM diagnosis and treatment may emphasize building large-scale, high-quality TCM datasets with unified criteria based on syndrome elements; identifying algorithms suited to the distributions of TCM theoretical data; and leveraging AI multimodal fusion and ensemble learning techniques for diverse raw features, such as images, text, and manually processed structured data, to increase the clinical efficacy of TCM diagnosis and treatment.
Precipitation is a significant index for measuring the degree of drought and flood in a region, directly reflecting local natural changes and the ecological environment. Accurately grasping the characteristics and laws of precipitation change is very important for effectively reducing disaster losses and maintaining the stable development of the social economy. To predict precipitation accurately, a new precipitation prediction model based on an extreme learning machine ensemble (ELME) is proposed. The integrated model is based on extreme learning machines (ELMs) with different kernel functions and supporting parameters, and the sub-model with the minimum root mean square error (RMSE) is selected to fit the test data. Owing to the complex mechanisms and factors affecting precipitation change, the data exhibit strong uncertainty and significant nonlinear variation. The mean generating function (MGF) is used to generate the continuation factor matrix, and principal component analysis is employed to reduce the dimension of the continuation matrix and extract effective data features. Finally, the ELME prediction model is established using the June, July, and August precipitation data of Liuzhou city from 1951 to 2021, and a comparative experiment is carried out with ELM, a long short-term memory neural network (LSTM), and a back-propagation neural network based on a genetic algorithm (GA-BP). The experimental results show that the prediction accuracy of the proposed method is significantly higher than that of the other models, with high stability and reliability, providing a reliable method for precipitation prediction.
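The ELME selection rule, train several ELM sub-models with different hidden layers and keep the one with the lowest validation RMSE, can be sketched with a minimal NumPy ELM (random hidden layer, least-squares output weights). The sine series, candidate list, and split are illustrative assumptions; the MGF/PCA preprocessing is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 300)[:, None]
y = np.sin(x[:, 0]) + 0.05 * rng.normal(size=300)   # stand-in for a rainfall series

idx = rng.permutation(300)                          # random train/validation split
tr, va = idx[:200], idx[200:]

def elm_rmse(act, hidden):
    """One ELM sub-model: random hidden weights, closed-form output layer."""
    W = rng.normal(size=(1, hidden))
    b = rng.normal(size=hidden)
    H_tr, H_va = act(x[tr] @ W + b), act(x[va] @ W + b)
    beta, *_ = np.linalg.lstsq(H_tr, y[tr], rcond=None)
    return float(np.sqrt(np.mean((H_va @ beta - y[va]) ** 2)))

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
candidates = [(np.tanh, 20), (np.tanh, 50), (sigmoid, 50)]   # illustrative pool
rmses = [elm_rmse(a, h) for a, h in candidates]
best_rmse = min(rmses)      # the ELME keeps the minimum-RMSE sub-model
```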
As a result of the increased number of COVID-19 cases, Ensemble Machine Learning (EML) can be an effective tool for combatting this pandemic outbreak. An ensemble of classifiers can improve the performance of single machine learning (ML) classifiers, especially stacking-based ensemble learning. Stacking utilizes heterogeneous base learners trained in parallel and combines their predictions using a meta-model to determine the final prediction results. However, building an ensemble often causes model performance to decrease when the growing number of learners is not properly selected. Therefore, the goal of this paper is to develop and evaluate a generic, data-independent predictive method using stacking-based ensemble learning (GA-Stacking) optimized by a Genetic Algorithm (GA) for outbreak prediction and health decision support processes. GA-Stacking utilizes five well-known models at its first level: Decision Tree (DT), Random Forest (RF), Ridge regression, Least Absolute Shrinkage and Selection Operator (LASSO), and eXtreme Gradient Boosting (XGBoost). It also introduces a GA to determine the number, combination, and trust of these base models, using the Mean Squared Error (MSE) as a fitness function. At the second level of the stacked ensemble model, a Linear Regression (LR) model is used to produce the final prediction. The performance of the model was evaluated using a publicly available dataset from the Center for Systems Science and Engineering, Johns Hopkins University, which consisted of 10,722 data samples. The experimental results indicated that the GA-Stacking model achieved outstanding performance, with an overall accuracy of 99.99% for the three selected countries. Furthermore, the proposed model performed well compared with existing bagging-based approaches. The proposed model can predict the pandemic outbreak correctly and may be applied as a generic, data-independent model to predict the epidemic trend for other countries when comparing preventive and control measures.
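The two-level stacking described above can be sketched with scikit-learn's `StackingRegressor`. The GA step that selects and weights the base learners is omitted here, XGBoost is replaced by scikit-learn's gradient boosting to keep the sketch dependency-light, and the data are synthetic, so this is a structural illustration only.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Level 1: the five base learners (gradient boosting stands in for XGBoost).
base = [("dt", DecisionTreeRegressor(random_state=0)),
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("ridge", Ridge()),
        ("lasso", Lasso()),
        ("gb", GradientBoostingRegressor(random_state=0))]

# Level 2: a linear-regression meta-model over out-of-fold base predictions.
stack = StackingRegressor(base, final_estimator=LinearRegression()).fit(X_tr, y_tr)
r2 = stack.score(X_te, y_te)
```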
Difficulty in communicating and interacting with other people is mainly due to the neurological disorder called autism spectrum disorder (ASD). The disorder can affect the nerves at any stage of life: childhood, adolescence, and adulthood. ASD is known as a behavioral disorder because symptoms appear over the first two years and continue into adulthood. Most studies show that the early detection of ASD helps improve the behavioral characteristics of patients with ASD. The detection of ASD is a very challenging task for researchers. Machine learning (ML) algorithms can learn from complex data and predict quality results. In this paper, ensemble ML techniques for the early detection of ASD are proposed. In this detection, the dataset is first processed using three ML algorithms: sequential minimal optimization with support vector machine, Kohonen self-organizing neural network, and random forest. The prediction results of these ML algorithms (the ensemble) are then combined using the bagging concept called max voting to predict the final result. The accuracy, sensitivity, and specificity of the proposed system are calculated using a confusion matrix. The proposed ensemble technique performs better than state-of-the-art ML algorithms.
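The max-voting step is simple enough to show directly: each base model's class predictions are tallied per sample and the majority label wins. The three prediction arrays below are stand-ins for the outputs of the SVM, self-organizing network, and random forest models named in the abstract.

```python
import numpy as np

# Rows = one prediction vector per base model; columns = samples.
preds = np.array([[1, 0, 1, 1, 0],
                  [1, 1, 1, 0, 0],
                  [0, 0, 1, 1, 0]])

def max_vote(pred_matrix):
    """Column-wise majority label across base models."""
    return np.array([np.bincount(col).argmax() for col in pred_matrix.T])

final = max_vote(preds)     # → [1, 0, 1, 1, 0]
```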
Background: In the field of genetic diagnostics, DNA sequencing is an important tool because the depth and complexity of this field have major implications for the genetic architectures of diseases and the identification of risk factors associated with genetic disorders. Methods: Our study introduces a novel two-tiered analytical framework to raise the precision and reliability of genetic data interpretation. It begins by extracting and analyzing salient features from DNA sequences through CNN-based feature analysis, taking advantage of the power of convolutional neural networks (CNNs) to capture complex patterns and minute mutations in genetic data. The study then employs a collection of machine learning classifiers combined through a voting mechanism, which synergistically joins the predictions of multiple classifiers to generate comprehensive and well-balanced interpretations of the genetic data. Results: This method was tested through an empirical analysis of a variants dataset of DNA sequences taken from patients affected by breast cancer, juxtaposed with a control group of healthy people. The integration of CNNs with a voting-based ensemble of classifiers returned outstanding outcomes, with the performance metrics accuracy, precision, recall, and F1-score all reaching 0.88, outperforming previous models. Conclusions: This dual accomplishment underlines the transformative potential of integrating deep learning techniques with ensemble machine learning for genetic diagnostics and prognostics. The results set a new benchmark in the accuracy of disease diagnosis through DNA sequencing and promise future studies on improved personalized medicine and healthcare approaches with precise genetic information.
The Internet of Things (IoT) integrates billions of intelligent devices around the globe with the capability of communicating with other connected devices with little to no human intervention. IoT enables data aggregation and analysis on a large scale to improve quality of life in many domains. In particular, data collected by IoT contain a tremendous amount of information for anomaly detection. The heterogeneous nature of IoT is both a challenge and an opportunity for cybersecurity. Traditional approaches in cybersecurity monitoring often require different kinds of data pre-processing and handling for various data types, which can be problematic for datasets with heterogeneous features. However, heterogeneous types of network devices can often capture a more diverse set of signals than a single type of device, which is particularly useful for anomaly detection. In this paper, we present a comprehensive study on using ensemble machine learning methods to enhance IoT cybersecurity via anomaly detection. Rather than relying on a single machine learning model, ensemble learning combines the predictive power of multiple models, enhancing predictive accuracy on heterogeneous datasets. We propose a unified framework with ensemble learning that utilises Bayesian hyperparameter optimisation to adapt to a network environment containing multiple IoT sensor readings. Experimentally, we illustrate its high predictive power compared with traditional methods.
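Hyperparameter-tuned ensemble anomaly detection along these lines can be sketched as below. Note the substitution: `RandomizedSearchCV` stands in for the paper's Bayesian optimisation (which would need a library such as scikit-optimize or Optuna), and the imbalanced synthetic data stand in for heterogeneous IoT sensor readings.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in: mostly benign traffic with ~10% anomalies.
X, y = make_classification(n_samples=600, n_features=15, weights=[0.9, 0.1],
                           random_state=0)

# Search the ensemble's hyperparameter space, scoring by F1 on the rare class.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    {"n_estimators": randint(50, 200), "max_depth": randint(3, 12)},
    n_iter=8, cv=3, scoring="f1", random_state=0).fit(X, y)

best_f1 = search.best_score_
```

A Bayesian optimiser would replace the random draws with a surrogate model that proposes promising configurations, but the surrounding workflow is the same.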
The Extreme Learning Machine (ELM) is an effective learning algorithm for a Single-Layer Feedforward Network (SLFN). It performs well on many problems due to its fast learning speed. However, in practical applications, its performance can be affected by noise in the training data. To tackle the noise issue, we propose a novel heterogeneous ensemble of ELMs in this article. Specifically, correntropy is used to achieve performance that is insensitive to outliers, while Negative Correlation Learning (NCL) is implemented to enhance diversity within the ensemble. The proposed Heterogeneous Ensemble of ELMs (HE2LM) for classification includes different ELM algorithms: the Regularized ELM (RELM), the Kernel ELM (KELM), and the L2-norm-optimized ELM (ELML2). The ensemble is constructed by training a randomly selected ELM classifier on a subset of the training data selected through random resampling. The class label of unseen data is then predicted using a maximum weighted sum approach. After splitting the training data into subsets, the proposed HE2LM is tested through classification and regression tasks on real-world benchmark datasets and synthetic datasets. The simulation results show that, compared with other algorithms, the proposed method can achieve higher prediction accuracy, better generalization, and less sensitivity to outliers.
This paper proposes a multiple-input multiple-output (MIMO) detection scheme that uses deep neural network (DNN)-based ensemble machine learning to improve error performance in wireless communication systems. For MIMO detection based on ensemble machine learning, all DNN learning models are generated offline, and detection is performed online using the already learned models. In offline learning, the received signals and channel coefficients are the input data, and the labels corresponding to the transmitted symbols are the output data. In online detection, the fully learned models, with fixed biases and weights, are used for signal detection. For performance improvement, the proposed scheme uses majority voting and maximum probability as the model-combination methods to obtain diversity gains at the MIMO receiver. The simulation results show that the proposed scheme improves symbol error rate (SER) performance without additional receive antennas.
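The two combination rules named here, majority vote and maximum probability, can be shown directly on stand-in per-model symbol probabilities (one row per learned DNN, one column per candidate symbol); the numbers are illustrative.

```python
import numpy as np

# Three learned detectors' probability vectors for one received symbol.
probs = np.array([[0.10, 0.70, 0.20],
                  [0.25, 0.40, 0.35],
                  [0.60, 0.25, 0.15]])

# Majority vote: each model votes for its argmax symbol; most votes wins.
votes = probs.argmax(axis=1)                # [1, 1, 0]
majority = int(np.bincount(votes).argmax()) # symbol 1

# Maximum probability: the symbol holding the single highest confidence
# across all models (here 0.70 from the first model).
max_prob = int(np.unravel_index(probs.argmax(), probs.shape)[1])
```

The two rules can disagree: if one model were extremely confident in a minority symbol, maximum probability would follow it while majority vote would not.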
Falling is among the most harmful events older adults may encounter. With the continuous growth of the aging population in many societies, developing effective fall detection mechanisms that are empowered by machine learning technologies and easily integrable with existing healthcare systems becomes essential. This paper presents a new Internet of Health Things (IoHT) architecture built around an ensemble machine learning-based fall detection system (FDS) for older people. Compared to deep neural networks, the ensemble multi-stage random forest model allows the extraction of an optimal subset of fall detection features with minimal hyperparameters. The number of cascaded random forest stages is automatically optimized. This study uses a public dataset of fall detection samples called SmartFall to validate the developed fall detection system. The SmartFall dataset is collected from the three-axis accelerometer measurements of a smartwatch. Each scenario in this dataset is classified and labeled as a fall or a non-fall. In comparison to three machine learning models, K-nearest neighbors (KNN), decision tree (DT), and standard random forest (SRF), the proposed ensemble classifier outperformed the other models and achieved 98.4% accuracy. In future work, the developed IoHT framework can be realized for detecting fall accidents of older people while taking security and privacy concerns into account.
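One plausible reading of a cascaded random-forest detector is sketched below: a first forest screens all windows, and a second forest re-classifies only the windows flagged as possible falls. The two-stage structure, the fixed depth, and the synthetic accelerometer features are all assumptions for illustration; the paper optimizes the number of stages automatically.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in: per-window statistics of 3-axis accelerometer data,
# with falls (class 1) as the rare event.
X, y = make_classification(n_samples=800, n_features=9,
                           weights=[0.85, 0.15], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Stage 1: a small forest screens every window.
stage1 = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
flagged = stage1.predict(X_te) == 1          # candidate falls after screening

# Stage 2: a larger forest re-examines only the flagged windows.
final = np.zeros(len(X_te), dtype=int)
if flagged.any():
    stage2 = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_tr, y_tr)
    final[flagged] = stage2.predict(X_te[flagged])

accuracy = float((final == y_te).mean())
```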
The explosion of online information with the recent advent of digital technology in information processing, information storage, information sharing, natural language processing, and text mining techniques has enabled stock investors to uncover market movement and volatility from heterogeneous content. For example, a typical stock market investor reads the news, explores market sentiment, and analyzes technical details in order to make a sound decision prior to purchasing or selling a particular company's stock. However, capturing a dynamic stock market trend is challenging owing to high fluctuation and the non-stationary nature of the stock market. Although existing studies have attempted to enhance stock prediction, few have provided a complete decision-support system for investors to retrieve real-time data from multiple sources and extract insightful information for sound decision-making. To address this challenge, we propose a unified solution for data collection, analysis, and visualization in real-time stock market prediction that retrieves and processes relevant financial data from news articles, social media, and company technical information. We aim to provide not only useful information for stock investors but also meaningful visualization that enables investors to effectively interpret storyline events affecting stock prices. Specifically, we utilize an ensemble stacking of diversified machine-learning-based estimators and innovative contextual feature engineering to predict the next day's stock prices. Experiment results show that our proposed stock forecasting method outperforms a traditional baseline with an average mean absolute percentage error of 0.93. Our findings confirm that leveraging an ensemble scheme of machine learning methods with contextual information improves stock prediction performance. Finally, our study could be extended to a wide variety of innovative financial applications that seek to incorporate external insight from contextual information such as large-scale online news articles and social media data.
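The reported error metric, mean absolute percentage error (MAPE), is easy to compute directly; the price arrays below are made-up examples, not the study's data.

```python
import numpy as np

def mape(actual, predicted):
    """Mean of |error| / |actual|, expressed as a percentage."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)

# e.g. next-day closing prices vs. model output:
value = mape([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])   # → 5.0
```

Because MAPE divides by the actual value, it is scale-free across stocks but undefined when an actual value is zero, which is rarely an issue for prices.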
Funding: the Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai) under contract No. SML2020SP009; the National Basic Research and Development Program of China under contract Nos. 2022YFF0802000 and 2022YFF0802004; the "Renowned Overseas Professors" Project of the Guangdong Provincial Department of Science and Technology under contract No. 76170-52910004; the Belt and Road Special Foundation of the National Key Laboratory of Water Disaster Prevention under contract No. 2022491711; the National Natural Science Foundation of China under contract No. 51909290; and the Key Research and Development Program of Guangdong Province under contract No. 2020B1111020003.
Funding: supported by the Center for Cyber-Physical Systems, Khalifa University, under Grant 8474000137-RC1-C2PS-T5.
Abstract: This study has provided an approach to classify soil using machine learning. Multiclass elements of stand-alone machine learning algorithms (i.e. logistic regression (LR) and artificial neural network (ANN)), decision tree ensembles (i.e. decision forest (DF) and decision jungle (DJ)), and meta-ensemble models (i.e. stacking ensemble (SE) and voting ensemble (VE)) were used to classify soils based on their intrinsic physico-chemical properties. Also, the multiclass prediction was carried out across multiple cross-validation (CV) methods, i.e. train-validation split (TVS), k-fold cross-validation (KFCV), and Monte Carlo cross-validation (MCCV). Results indicated that the soils' clay fraction (CF) had the most influence on the multiclass prediction of natural soils' plasticity, while specific surface and carbonate content (CC) possessed the least within the nature of the dataset used in this study. Stand-alone machine learning models (LR and ANN) produced relatively less accurate predictive performance (accuracy of 0.45, average precision of 0.5, and average recall of 0.44) compared to tree-based models (accuracy of 0.68, average precision of 0.71, and recall rate of 0.68), while the meta-ensembles (SE and VE) outperformed (accuracy of 0.75, average precision of 0.74, and average recall rate of 0.72) all the models utilised for multiclass classification. Sensitivity analysis of the meta-ensembles proved their capacity to discriminate between soil classes across the methods of CV considered. Machine learning training and validation using the MCCV and KFCV methods enabled better prediction while also ensuring that the dataset was not overfitted by the machine learning models. Further confirmation of this phenomenon was depicted by the continuous rise of the cumulative lift curve (LC) of the best-performing models when using the MCCV technique. Overall, this study demonstrated that soil's physico-chemical properties do have a direct influence on plastic behaviour and, therefore, can be relied upon to classify soils.
Abstract: Drilling optimization requires accurate drill bit rate-of-penetration (ROP) predictions. A higher ROP decreases drilling time and costs and increases rig productivity. This study employs random forest (RF), gradient boosting modeling (GBM), extreme gradient boosting (XGBoost), and adaptive boosting (AdaBoost) models to generate ROP predictions. The models use well data from a 3200-m segment across the stratigraphic column (Dibdibba to Zubair formations) of the large West Qurna oil field in Southern Iraq, penetrating 19 formations and four oil reservoirs. The reservoir sections are between 40 and 440 m thick and consist of both carbonate and clastic lithologies. The ROP predictive models were developed using 14 operational parameters: TVD, weight on bit (WOB), torque, effective circulating density (ECD), drilling rotation per minute (RPM), flow rate, standpipe pressure (SPP), bit size, total RPM, D exponent, gamma ray (GR), density, neutron, caliper, and discrete lithology distribution. Training and validation of the ROP models involves data compiled from three development wells. Applying random subsampling, the compiled dataset was split into 85% for training and 15% for validation and testing. The mismatch between the test subgroup's measured and predicted ROP was assessed using root mean square error (RMSE) and coefficient of correlation (R²). The RF, GBM, and XGBoost models provide ROP predictions versus depth with low errors. Models with cross-validation that integrate data from three wells deliver more accurate ROP predictions than datasets from a single well. The input variables' influences on ROP optimization identify the optimal value ranges for the operating parameters that help to increase drilling speed and reduce cost.
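The two evaluation metrics named in this abstract, RMSE and R², can be computed directly from paired measured/predicted values. A minimal sketch with hypothetical ROP values (not the study's data):

```python
import math

def rmse(actual, predicted):
    """Root mean square error between measured and predicted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def r_squared(actual, predicted):
    """Coefficient of determination R² = 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical ROP values (m/h) for a small test subgroup
measured = [12.0, 15.0, 9.0, 20.0]
predicted = [11.5, 15.5, 9.5, 19.0]

print(round(rmse(measured, predicted), 3))
print(round(r_squared(measured, predicted), 3))
```

Both metrics are also available as `mean_squared_error` and `r2_score` in scikit-learn; the hand-rolled versions above only make the definitions explicit.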
Abstract: This paper proposes a new cost-efficient, adaptive, and self-healing algorithm that detects faults in real time within a short period and with high accuracy, even in situations where faults are difficult to detect. Rather than using traditional machine learning (ML) algorithms or hybrid signal processing techniques, a new framework based on an optimization-enabled weighted ensemble method is developed that combines essential ML algorithms. In the proposed method, the system selects and compounds appropriate ML algorithms based on Particle Swarm Optimization (PSO) weights. For this purpose, power system failures are simulated by using PSCAD-Python co-simulation. One of the salient features of this study is that the proposed solution works on real-time raw data without using any pre-computational techniques or pre-stored information. Therefore, the proposed technique is able to work on different systems, topologies, or data collections. The proposed fault detection technique is validated by using PSCAD-Python co-simulation on modified and standard IEEE-14 and standard IEEE-39 buses, considering network faults which are difficult to detect.
Fund: Supported by grants from the National Natural Science Foundation of China (Key Program) (No. 82230124), the Traditional Chinese Medicine Inheritance and Innovation "Ten Million" Talent Project, Qihuang Project Chief Scientist Project (No. 0201000401), the State Administration of Traditional Chinese Medicine 2nd National Traditional Chinese Medicine Inheritance Studio Construction Project (Official Letter of the State Office of Traditional Chinese Medicine [2022] No. 245), and the National Natural Science Foundation of China (General Program) (No. 81974556).
Abstract: Artificial intelligence (AI) serves as a key technology in global industrial transformation and technological restructuring and as the core driver of the fourth industrial revolution. Currently, deep learning techniques, such as convolutional neural networks, enable intelligent information collection in fields such as tongue and pulse diagnosis owing to their robust feature-processing capabilities. Natural language processing models, including long short-term memory and transformers, have been applied to traditional Chinese medicine (TCM) for diagnosis, syndrome differentiation, and prescription generation. Traditional machine learning algorithms, such as neural networks, support vector machines, and random forests, are also widely used in TCM diagnosis and treatment because of their strong regression and classification performance on small structured datasets. Future research on AI in TCM diagnosis and treatment may emphasize building large-scale, high-quality TCM datasets with unified criteria based on syndrome elements; identifying algorithms suited to TCM theoretical data distributions; and leveraging AI multimodal fusion and ensemble learning techniques for diverse raw features, such as images, text, and manually processed structured data, to increase the clinical efficacy of TCM diagnosis and treatment.
Fund: Funded by the Scientific Research Project of Guangxi Normal University of Science and Technology, grant number GXKS2022QN024.
Abstract: Precipitation is a significant index to measure the degree of drought and flood in a region, which directly reflects the local natural changes and ecological environment. It is very important to grasp the change characteristics and laws of precipitation accurately for effectively reducing disaster losses and maintaining the stable development of a social economy. In order to accurately predict precipitation, a new precipitation prediction model based on an extreme learning machine ensemble (ELME) is proposed. The integrated model is based on extreme learning machines (ELM) with different kernel functions and supporting parameters, and the sub-model with the minimum root mean square error (RMSE) is found to fit the test data. Due to the complex mechanism and factors affecting precipitation change, the data have strong uncertainty and significant nonlinear variation characteristics. The mean generating function (MGF) is used to generate the continuation factor matrix, and the principal component analysis technique is employed to reduce the dimension of the continuation matrix and extract the effective data features. Finally, the ELME prediction model is established using the precipitation data of Liuzhou city for June, July, and August from 1951 to 2021, and a comparative experiment is carried out using ELM, a long short-term memory neural network (LSTM), and a back propagation neural network based on a genetic algorithm (GA-BP). The experimental results show that the prediction accuracy of the proposed method is significantly higher than that of the other models, and it has high stability and reliability, which provides a reliable method for precipitation prediction.
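The ELME selection rule described here, picking the sub-model with the minimum RMSE on the test data, can be sketched as follows. The kernel names and prediction values are hypothetical placeholders, not from the study:

```python
import math

def rmse(actual, predicted):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def select_best_submodel(candidates, actual):
    """Return the name of the sub-model whose predictions minimise RMSE.

    candidates: {name: prediction list}, e.g. ELMs with different kernels.
    """
    return min(candidates, key=lambda name: rmse(actual, candidates[name]))

# Hypothetical held-out predictions from three kernel variants
observed = [210.0, 180.0, 260.0]  # summer precipitation, mm
candidates = {
    "rbf":    [205.0, 186.0, 255.0],
    "linear": [190.0, 170.0, 240.0],
    "poly":   [220.0, 175.0, 270.0],
}
print(select_best_submodel(candidates, observed))  # → 'rbf'
```

In the full method, this selection would follow the MGF continuation and PCA dimensionality-reduction steps that build the feature matrix each ELM is trained on.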
Abstract: As a result of the increased number of COVID-19 cases, Ensemble Machine Learning (EML) would be an effective tool for combatting this pandemic outbreak. An ensemble of classifiers can improve the performance of single machine learning (ML) classifiers, especially stacking-based ensemble learning. Stacking utilizes heterogeneous base learners trained in parallel and combines their predictions using a meta-model to determine the final prediction results. However, building an ensemble often causes the model performance to decrease due to the increasing number of learners that are not properly selected. Therefore, the goal of this paper is to develop and evaluate a generic, data-independent predictive method using stacking-based ensemble learning (GA-Stacking) optimized by a Genetic Algorithm (GA) for outbreak prediction and health decision-aided processes. GA-Stacking utilizes five well-known classifiers, including Decision Tree (DT), Random Forest (RF), Ridge regression, Least Absolute Shrinkage and Selection Operator (LASSO), and eXtreme Gradient Boosting (XGBoost), at its first level. It also introduces a GA to determine the number, combination, and trust of these base classifiers based on the Mean Squared Error (MSE) as a fitness function. At the second level of the stacked ensemble model, a Linear Regression (LR) classifier is used to produce the final prediction. The performance of the model was evaluated using a publicly available dataset from the Center for Systems Science and Engineering, Johns Hopkins University, which consisted of 10,722 data samples. The experimental results indicated that the GA-Stacking model achieved outstanding performance with an overall accuracy of 99.99% for the three selected countries. Furthermore, the proposed model achieved good performance when compared with existing bagging-based approaches. The proposed model can be used to predict the pandemic outbreak correctly and may be applied as a generic data-independent model to predict the epidemic trend for other countries when comparing preventive and control measures.
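The second-level linear meta-model of a stacked ensemble treats the base learners' predictions as features. A toy stand-in, fitting a two-weight linear combiner (no intercept) by solving the 2x2 normal equations in pure Python, is shown below; the GA-driven selection of base learners and all numeric values are outside this sketch:

```python
def fit_meta_weights(base_preds, y):
    """Fit a two-weight linear meta-model (no intercept) on base-learner
    predictions by solving the 2x2 normal equations by hand."""
    p1, p2 = base_preds
    a = sum(x * x for x in p1)
    b = sum(x * z for x, z in zip(p1, p2))
    d = sum(z * z for z in p2)
    c1 = sum(x * t for x, t in zip(p1, y))
    c2 = sum(z * t for z, t in zip(p2, y))
    det = a * d - b * b
    return ((d * c1 - b * c2) / det, (a * c2 - b * c1) / det)

def stack_predict(w, base_preds):
    """Apply the fitted meta-weights to new base-learner predictions."""
    return [w[0] * x + w[1] * z for x, z in zip(*base_preds)]

# Hypothetical out-of-fold predictions from two base learners and true targets
p1 = [9.0, 12.0, 15.0]
p2 = [11.0, 12.0, 13.0]
y = [10.0, 12.0, 14.0]

w = fit_meta_weights((p1, p2), y)
print(w)                        # → (0.5, 0.5)
print(stack_predict(w, (p1, p2)))  # → [10.0, 12.0, 14.0]
```

In practice, stacking fits the meta-model on out-of-fold base predictions (as scikit-learn's `StackingRegressor` does) to avoid leaking training labels into the second level.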
Abstract: Difficulties in communicating and interacting with other people are mainly due to the neurological disorder called autism spectrum disorder (ASD). This disorder can affect the nerves at any stage of a person's life: childhood, adolescence, and adulthood. ASD is known as a behavioral disease due to the appearance of symptoms over the first two years that continue until adulthood. Most studies prove that the early detection of ASD helps improve the behavioral characteristics of patients with ASD. The detection of ASD is a very challenging task among various researchers. Machine learning (ML) algorithms still act very intelligently by learning from complex data and predicting quality results. In this paper, ensemble ML techniques for the early detection of ASD are proposed. In this detection, the dataset is first processed using three ML algorithms: sequential minimal optimization with support vector machine, Kohonen self-organizing neural network, and random forest. The prediction results of these ML algorithms (the ensemble) further use the bagging concept called max voting to predict the final result. The accuracy, sensitivity, and specificity of the proposed system are calculated using a confusion matrix. The proposed ensemble technique performs better than state-of-the-art ML algorithms.
Abstract: Background: In the field of genetic diagnostics, DNA sequencing is an important tool because the depth and complexity of this field have major implications in light of the genetic architectures of diseases and the identification of risk factors associated with genetic disorders. Methods: Our study introduces a novel two-tiered analytical framework to raise the precision and reliability of genetic data interpretation. It is initiated by extracting and analyzing salient features from DNA sequences through a CNN-based feature analysis, taking advantage of the power inherent in convolutional neural networks (CNNs) to capture complex patterns and minute mutations in genetic data. This study embraces an elite collection of machine learning classifiers interwoven through a stern voting mechanism, which synergistically joins the predictions made by multiple classifiers to generate comprehensive and well-balanced interpretations of the genetic data. Results: This state-of-the-art method was further tested by carrying out an empirical analysis on a variants dataset of DNA sequences taken from patients affected by breast cancer, juxtaposed with a control group composed of healthy people. The integration of CNNs with a voting-based ensemble of classifiers returned outstanding outcomes, with the performance metrics accuracy, precision, recall, and F1-score reaching the outstanding rate of 0.88, outperforming previous models. Conclusions: This dual accomplishment underlines the transformative potential that integrating deep learning techniques with ensemble machine learning might provide as real added value for further genetic diagnostics and prognostics. The results from this study set a new benchmark in the accuracy of disease diagnosis through DNA sequencing and promise future studies on improved personalized medicine and healthcare approaches with precise genetic information.
Abstract: The Internet of Things (IoT) integrates billions of intelligent devices across the globe with the capability of communicating with other connected devices with little to no human intervention. IoT enables data aggregation and analysis on a large scale to improve life quality in many domains. In particular, data collected by IoT contain a tremendous amount of information for anomaly detection. The heterogeneous nature of IoT is both a challenge and an opportunity for cybersecurity. Traditional approaches in cybersecurity monitoring often require different kinds of data pre-processing and handling for various data types, which might be problematic for datasets that contain heterogeneous features. However, heterogeneous types of network devices can often capture a more diverse set of signals than a single type of device readings, which is particularly useful for anomaly detection. In this paper, we present a comprehensive study on using ensemble machine learning methods for enhancing IoT cybersecurity via anomaly detection. Rather than using one single machine learning model, ensemble learning combines the predictive power of multiple models, enhancing their predictive accuracy on heterogeneous datasets. We propose a unified framework with ensemble learning that utilises Bayesian hyperparameter optimisation to adapt to a network environment that contains multiple IoT sensor readings. Experimentally, we illustrate their high predictive power when compared to traditional methods.
Fund: Supported by the National Natural Science Foundation of China (Nos. 61174103 and 61603032), the National Key Technologies R&D Program of China (No. 2015BAK38B01), the National Key Research and Development Program of China (No. 2017YFB0702300), the China Postdoctoral Science Foundation (No. 2016M590048), and the University of Science and Technology Beijing–Taipei University of Technology Joint Research Program (TW201705).
Abstract: The Extreme Learning Machine (ELM) is an effective learning algorithm for a Single-Layer Feedforward Network (SLFN). It performs well on some problems due to its fast learning speed. However, in practical applications, its performance might be affected by noise in the training data. To tackle the noise issue, we propose a novel heterogeneous ensemble of ELMs in this article. Specifically, the correntropy is used to achieve performance that is insensitive to outliers, while Negative Correlation Learning (NCL) is implemented to enhance diversity among the ensemble. The proposed Heterogeneous Ensemble of ELMs (HE2LM) for classification includes different ELM algorithms: the Regularized ELM (RELM), the Kernel ELM (KELM), and the L2-norm-optimized ELM (ELML2). The ensemble is constructed by training a randomly selected ELM classifier on a subset of the training data selected through random resampling. Then, the class label of unseen data is predicted using a maximum weighted sum approach. After splitting the training data into subsets, the proposed HE2LM is tested through classification and regression tasks on real-world benchmark datasets and synthetic datasets. The simulation results show that, compared with other algorithms, our proposed method can achieve higher prediction accuracy, better generalization, and less sensitivity to outliers.
Fund: This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2021R1A2C2005777) and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2020R1A6A1A03038540).
Abstract: This paper proposes a multiple-input multiple-output (MIMO) detection scheme using deep neural network (DNN)-based ensemble machine learning for higher error performance in wireless communication systems. For MIMO detection based on ensemble machine learning, all learning models for the DNN are generated offline, and detection is performed online using the already learned models. In offline learning, the received signals and channel coefficients are set as input data, and the labels, which correspond to transmit symbols, are set as output data. In online detection, the fully learned models, with fixed biases and weights, are used for signal detection. For performance improvement, the proposed scheme uses the majority vote and the maximum probability as methods of model combination for obtaining diversity gains at the MIMO receiver. The simulation results show that the proposed scheme has improved symbol error rate (SER) performance without additional receive antennas.
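The two model-combination rules named here can be sketched as follows. The majority vote counts each model's hard symbol decision; the maximum-probability combiner is read here as picking the symbol with the largest summed soft probability across models (one plausible interpretation; the symbol labels and probabilities are hypothetical):

```python
from collections import Counter

def combine_majority(votes):
    """Majority vote over the hard symbol decisions of several models."""
    return Counter(votes).most_common(1)[0][0]

def combine_max_probability(prob_maps):
    """Pick the symbol whose summed probability across models is largest."""
    totals = {}
    for probs in prob_maps:
        for symbol, p in probs.items():
            totals[symbol] = totals.get(symbol, 0.0) + p
    return max(totals, key=totals.get)

# Hypothetical per-model outputs for one received symbol
votes = ["00", "01", "00"]
prob_maps = [
    {"00": 0.6, "01": 0.4},
    {"00": 0.3, "01": 0.7},
    {"00": 0.8, "01": 0.2},
]
print(combine_majority(votes))             # → '00'
print(combine_max_probability(prob_maps))  # → '00'
```

The two rules can disagree: a single very confident model can tip the probability sum even when it is outvoted, which is part of where the diversity gain comes from.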
Fund: The Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia, through project number IFP2021-043.
Abstract: Falling is among the most harmful events older adults may encounter. With the continuous growth of the aging population in many societies, it becomes essential to develop effective fall detection mechanisms that are empowered by machine learning technologies and easily integrable with existing healthcare systems. This paper presents a new healthcare Internet of Health Things (IoHT) architecture built around an ensemble machine learning-based fall detection system (FDS) for older people. Compared to deep neural networks, the ensemble multi-stage random forest model allows the extraction of an optimal subset of fall detection features with minimal hyperparameters. The number of cascaded random forest stages is automatically optimized. This study uses a public dataset of fall detection samples called SmartFall to validate the developed fall detection system. The SmartFall dataset is collected from the measurements of the three-axis accelerometer in a smartwatch. Each scenario in this dataset is classified and labeled as a fall or a non-fall. In comparison to three machine learning models, namely K-nearest neighbors (KNN), decision tree (DT), and standard random forest (SRF), the proposed ensemble classifier outperformed the other models and achieved 98.4% accuracy. In future work, the developed healthcare IoHT framework can be realized for detecting fall accidents of older people while taking security and privacy concerns into account.
Fund: Supported by Mahidol University (Grant No. MU-MiniRC02/2564). We also appreciate the partial computing resources from Grant No. RSA6280105, funded by Thailand Science Research and Innovation (TSRI) (formerly known as the Thailand Research Fund (TRF)) and the National Research Council of Thailand (NRCT).
Abstract: The explosion of online information with the recent advent of digital technology in information processing, information storing, information sharing, natural language processing, and text mining techniques has enabled stock investors to uncover market movement and volatility from heterogeneous content. For example, a typical stock market investor reads the news, explores market sentiment, and analyzes technical details in order to make a sound decision prior to purchasing or selling a particular company's stock. However, capturing a dynamic stock market trend is challenging owing to high fluctuation and the non-stationary nature of the stock market. Although existing studies have attempted to enhance stock prediction, few have provided a complete decision-support system for investors to retrieve real-time data from multiple sources and extract insightful information for sound decision-making. To address the above challenge, we propose a unified solution for data collection, analysis, and visualization in real-time stock market prediction to retrieve and process relevant financial data from news articles, social media, and company technical information. We aim to provide not only useful information for stock investors but also meaningful visualization that enables investors to effectively interpret storyline events affecting stock prices. Specifically, we utilize an ensemble stacking of diversified machine-learning-based estimators and innovative contextual feature engineering to predict the next day's stock prices. Experiment results show that our proposed stock forecasting method outperforms a traditional baseline with an average mean absolute percentage error of 0.93. Our findings confirm that leveraging an ensemble scheme of machine learning methods with contextual information improves stock prediction performance. Finally, our study could be further extended to a wide variety of innovative financial applications that seek to incorporate external insight from contextual information such as large-scale online news articles and social media data.
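The reported evaluation metric, mean absolute percentage error (MAPE), has a short definition worth stating explicitly. A minimal sketch with hypothetical closing prices (not the study's data):

```python
def mape(actual, predicted):
    """Mean absolute percentage error: mean of |(a - p) / a|, scaled to percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical closing prices and next-day forecasts
closes = [101.0, 98.0, 105.0]
forecasts = [100.0, 99.0, 104.0]
print(round(mape(closes, forecasts), 2))  # → 0.99
```

Note that MAPE is undefined when an actual value is zero and weights relative errors, which suits price series where scale varies across stocks.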