The burgeoning market for lithium-ion batteries has stimulated a growing need for more reliable battery performance monitoring. Accurate state-of-health (SOH) estimation is critical for ensuring battery operational performance. Despite numerous data-driven methods reported in existing research for battery SOH estimation, these methods often exhibit inconsistent performance across different application scenarios. To address this issue and overcome the performance limitations of individual data-driven models, integrating multiple models for SOH estimation has received considerable attention. Ensemble learning (EL) typically leverages the strengths of multiple base models to achieve more robust and accurate outputs. However, the lack of a clear review of current research hinders the further development of ensemble methods in SOH estimation. Therefore, this paper comprehensively reviews multi-model ensemble learning methods for battery SOH estimation. First, existing ensemble methods are systematically categorized into six classes based on their combination strategies. Different realizations and underlying connections are meticulously analyzed for each category of EL methods, highlighting distinctions, innovations, and typical applications. Subsequently, these ensemble methods are comprehensively compared in terms of base models, combination strategies, and publication trends. Evaluations across six dimensions underscore the outstanding performance of stacking-based ensemble methods. Following this, these ensemble methods are further inspected from the perspectives of weighted ensemble and diversity, aiming to inspire potential approaches for enhancing ensemble performance. Moreover, addressing challenges such as base model selection, measuring model robustness and uncertainty, and interpretability of ensemble models in practical applications is emphasized. Finally, future research prospects are outlined, specifically noting that deep learning ensemble is poised to advance ensemble methods for battery SOH estimation. The convergence of advanced machine learning with ensemble learning is anticipated to yield valuable avenues for research. Accelerated research in ensemble learning holds promising prospects for achieving more accurate and reliable battery SOH estimation under real-world conditions.
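The weighted-ensemble perspective raised in this review can be illustrated with a minimal sketch. The inverse-error weighting rule, the function names, and the sample numbers are illustrative assumptions; they are not taken from any of the reviewed papers.

```python
def inverse_error_weights(val_errors):
    """Return weights proportional to 1/error, normalized to sum to 1."""
    inverse = [1.0 / e for e in val_errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def weighted_ensemble(predictions, weights):
    """Combine per-model SOH predictions into one weighted average per sample."""
    return [
        sum(w * model_pred[i] for w, model_pred in zip(weights, predictions))
        for i in range(len(predictions[0]))
    ]


# Hypothetical validation errors (e.g. RMSE) of two base models.
weights = inverse_error_weights([0.02, 0.04])
# Hypothetical SOH predictions of the two base models for two cells.
soh = weighted_ensemble([[0.90, 0.80], [0.96, 0.86]], weights)
print(weights, soh)
```

The base model with the lower validation error receives the larger weight, so the combined estimate leans toward the more reliable model without discarding the other.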
Ensemble learning, a pivotal branch of machine learning, amalgamates multiple base models to enhance the overarching performance of predictive models, capitalising on the diversity and collective wisdom of the ensemble to surpass individual models and mitigate overfitting. In this review, a four-layer research framework is established for the research of ensemble learning, which can offer a comprehensive and structured review of ensemble learning from bottom to top. Firstly, this survey commences by introducing fundamental ensemble learning techniques, including bagging, boosting, and stacking, while also exploring the ensemble's diversity. Then, deep ensemble learning and semi-supervised ensemble learning are studied in detail. Furthermore, the utilisation of ensemble learning techniques to navigate challenging datasets, such as imbalanced and high-dimensional data, is discussed. The application of ensemble learning techniques across various research domains, including healthcare, transportation, finance, manufacturing, and the Internet, is also examined. The survey concludes by discussing challenges intrinsic to ensemble learning.
This paper proposes a novel hybrid fraud detection framework that integrates multi-stage feature selection, unsupervised clustering, and ensemble learning to improve classification performance in financial transaction monitoring systems. The framework is structured into three core layers: (1) feature selection using Recursive Feature Elimination (RFE), Principal Component Analysis (PCA), and Mutual Information (MI) to reduce dimensionality and enhance input relevance; (2) anomaly detection through unsupervised clustering using K-Means, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Hierarchical Clustering to flag suspicious patterns in unlabeled data; and (3) final classification using a voting-based hybrid ensemble of Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting Classifier (GBC). The experimental evaluation is conducted on a synthetically generated dataset comprising one million financial transactions, with 5% labelled as fraudulent, simulating realistic fraud rates and behavioural features, including transaction time, origin, amount, and geo-location. The proposed model demonstrated a significant improvement over baseline classifiers, achieving an accuracy of 99%, a precision of 99%, a recall of 97%, and an F1-score of 99%. Compared to individual models, it yielded a 9% gain in overall detection accuracy. It reduced the false positive rate to below 3.5%, thereby minimising the operational costs associated with manually reviewing false alerts. The model's interpretability is enhanced by the integration of Shapley Additive Explanations (SHAP) values for feature importance, supporting transparency and regulatory auditability. These results affirm the practical relevance of the proposed system for deployment in real-time fraud detection scenarios such as credit card transactions, mobile banking, and cross-border payments. The study also highlights future directions, including the deployment of lightweight models and the integration of multimodal data for scalable fraud analytics.
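The voting step in the third layer can be sketched as follows. Hard (majority) voting is assumed here because the abstract does not say whether hard or soft voting is used; the function name and the label lists are hypothetical.

```python
from collections import Counter


def majority_vote(votes_per_model):
    """Return the most common label per sample across all classifiers.

    votes_per_model holds one list of predicted labels per classifier,
    all of the same length (one label per transaction).
    """
    combined = []
    for sample_votes in zip(*votes_per_model):
        label, _count = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical predictions (1 = fraud, 0 = legitimate) from three classifiers.
svm_votes = [1, 0, 1, 0]
rf_votes = [1, 1, 0, 0]
gbc_votes = [0, 1, 1, 0]
final_labels = majority_vote([svm_votes, rf_votes, gbc_votes])
print(final_labels)
```

With three binary classifiers a tie cannot occur, which is one practical reason for using an odd number of voters.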
Identifying druggable proteins, which are capable of binding therapeutic compounds, remains a critical and resource-intensive challenge in drug discovery. To address this, we propose CEL-IDP (Comparison of Ensemble Learning Methods for Identification of Druggable Proteins), a computational framework that combines three feature extraction methods, Dipeptide Deviation from Expected Mean (DDE), Enhanced Amino Acid Composition (EAAC), and Enhanced Grouped Amino Acid Composition (EGAAC), with ensemble learning strategies (Bagging, Boosting, Stacking) to classify druggable proteins from sequence data. DDE captures dipeptide frequency deviations, EAAC encodes positional amino acid information, and EGAAC groups residues by physicochemical properties to generate discriminative feature vectors. These features were analyzed using ensemble models to overcome the limitations of single classifiers. EGAAC outperformed DDE and EAAC, with Random Forest (Bagging) and XGBoost (Boosting) achieving the highest accuracy of 71.66%, demonstrating superior performance in capturing critical biochemical patterns. Stacking showed intermediate results (68.33%), while EAAC and DDE-based models yielded lower accuracies (56.66%–66.87%). CEL-IDP streamlines large-scale druggability prediction, reduces reliance on costly experimental screening, and aligns with global initiatives like Target 2035 to expand actionable drug targets. This work advances machine learning-driven drug discovery by systematizing feature engineering and ensemble model optimization, providing a scalable workflow to accelerate target identification and validation.
The biomass and coal co-pyrolysis (BCP) technology combines the advantages of both resources, achieving efficient resource complementarity, reducing reliance on coal, and minimizing pollutant emissions. However, this process still encounters numerous challenges in attaining optimal economic and environmental performance. Therefore, an ensemble learning (EL) framework is proposed for the BCP process in this study to optimize the synergistic benefits while minimizing negative environmental impacts. Six different ensemble learning models are developed to investigate the impact of input features, such as biomass characteristics, coal characteristics, and pyrolysis conditions, on the product profit and CO₂ emissions of the BCP process. The Optuna method is further employed to automatically optimize the hyperparameters of the BCP process models to enhance their predictive accuracy and robustness. The results indicate that the categorical boosting (CAB) model of the BCP process demonstrates exceptional performance in accurately predicting product profit and CO₂ emissions (R² > 0.92) under five-fold cross-validation. To enhance the interpretability of this preferred model, Shapley additive explanations and partial dependence plot analyses are conducted to evaluate the impact and importance of biomass characteristics, coal characteristics, and pyrolysis conditions on the product profitability and CO₂ emissions of the BCP process. Finally, the preferred model is coupled with a reference vector guided evolutionary algorithm to identify the optimal conditions for maximizing the product profit of the BCP process while minimizing CO₂ emissions. The results indicate that the optimal BCP process can achieve high product profits (5290.85 CNY·t⁻¹) and low CO₂ emissions (7.45 kg·t⁻¹).
With the rapid development of the economy, air pollution caused by industrial expansion has caused serious harm to human health and social development. Therefore, establishing an effective air pollution concentration prediction system that delivers accurate and reliable predictions is of great scientific and practical significance. This paper proposes a combined point-interval prediction system for pollutant concentration prediction by leveraging neural networks, a meta-heuristic optimization algorithm, and fuzzy theory. Fuzzy information granulation technology is used in data preprocessing to transform numerical sequences into fuzzy particles for comprehensive feature extraction. The golden jackal optimization algorithm is employed in the optimization stage to fine-tune model hyperparameters. In the prediction stage, an ensemble learning method combines training results from multiple models to obtain final point predictions, while quantile regression and kernel density estimation methods are used for interval predictions on the test set. Experimental results demonstrate that the combined model achieves a coefficient of determination (R²) of 99.3%, and that its mean absolute percentage error (MAPE) differs from that of the benchmark models by up to 12.6%. This suggests that the integrated learning system proposed in this paper can provide more accurate deterministic predictions as well as reliable uncertainty analysis compared to traditional models, offering a practical reference for air quality early warning.
This study investigates the inundation depths of urban floods induced by real storm events, focusing on the development and assessment of a super-resolution model based on ensemble learning methods. Unlike traditional deep neural networks, which require extensive training and high parameterization, this study utilizes an ensemble learning model to reconstruct high-resolution flood predictions from low-resolution hydrodynamic simulations. Hydrodynamic modeling results of a real pluvial flood event at various spatial resolutions are used for constructing datasets and for training and testing the point-based super-resolution model. Influencing factors related to urban terrain, subsurface, rainfall inputs, and the hydrodynamic modeling results at coarser resolutions are used as features in the super-resolution model, which is built on Random Forest with hyperparameters tuned by Bayesian optimization. The trained super-resolution models effectively reconstruct high-resolution inundation conditions from 30 m to 5 m coarse resolution inputs, highlighting an increase in correlation coefficients and a decrease in root mean squared error (RMSE) as resolution improves. Dominant influencing factors in the super-resolution models are identified together with variances in their contributions to the model performance. Two optimization approaches are applied to enhance accuracy and mitigate overestimation at coarse resolutions for the super-resolution models. The first integrates outputs from various coarse resolution models as features, notably reducing overestimation, especially with finer 5 m resolutions. The second employs ensemble modeling with super-resolution models from different datasets, which improves the performance across all tested resolutions, demonstrating the robustness of combining multiple predictive models for better flood forecasting in urban environments.
Accurately evaluating the safety status of lithium-ion battery systems in electric vehicles is imperative due to the challenges in effectively predicting potential battery failure risks under stochastic profiles. Complex battery fault mechanisms and limited, poor-quality data collection impede fault detection for battery systems under real-world conditions. This paper proposes a novel graph-guided fault detection method designed to recognize concealed anomalies in realistic data. Graphs guided by physical relationships are constructed for learning the dynamic evolution of physical quantities under normal conditions and their potential change characteristics in fault scenarios. An ensemble Graph Sample and Aggregate Network model is developed to tackle sample distribution imbalances and non-uniform battery system specifications across vehicles. Failure risk probabilities for diverse battery charging and discharging segments are derived. An ablation study verifies the necessity of ensemble learning in addressing imbalanced datasets. Analysis of 102,095 segments across 86 vehicles with different battery material systems, battery capacities, and numbers of cells and temperature sensors confirms the robustness and generalization of the proposed method, yielding a recall of 98.37%. By introducing the graph, spatio-temporal global fault characteristics of battery systems are automatically extracted. The coupling relationship and evolution of physical quantities under both normal and faulty states are established, effectively uncovering fault information hidden in collected battery data without observable anomalies. The safety state of battery systems is reflected in terms of failure risk probability, providing reliable data support for battery system maintenance.
Cloud computing (CC) provides infrastructure, storage services, and applications to users, all of which should be secured by suitable procedures or policies. Security in the cloud environment is essential to safeguard infrastructure and user information from unauthorized access, which calls for timely intrusion detection systems (IDS). Ensemble learning harnesses the collective power of multiple machine learning (ML) methods and, aided by a feature selection (FS) process, improves the robustness and overall precision of intrusion detection. Therefore, this article presents a meta-heuristic feature selection by ensemble learning-based anomaly detection (MFS-ELAD) algorithm for CC platforms. To realize this objective, the proposed approach first applies a min-max standardization technique. Then, the high-dimensional feature set is reduced using the Prairie Dogs Optimizer (PDO) algorithm. For the recognition procedure, the MFS-ELAD method employs an ensemble of three deep learning (DL) techniques: sparse auto-encoder (SAE), stacked long short-term memory (SLSTM), and Elman neural network (ENN). Finally, the parameters of the DL algorithms are fine-tuned using the sand cat swarm optimizer (SCSO) approach, which helps improve the recognition outcomes. The simulation examination of the MFS-ELAD system on the CSE-CIC-IDS2018 dataset shows promising performance compared with other methods, with a maximum precision of 99.71%.
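The min-max preprocessing step named in this abstract can be sketched as below. This shows only the generic technique applied to a single feature column; the function name, the handling of constant columns, and the sample values are assumptions, not details from the paper.

```python
def min_max_scale(column):
    """Rescale a list of numbers linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant feature carries no information, so map it to 0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical values of one traffic feature (e.g. packets per flow).
scaled = min_max_scale([2, 4, 6, 10])
print(scaled)
```

Scaling each feature to a common range keeps features with large raw magnitudes from dominating the downstream neural models.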
To address uncertainties in satellite orbit error prediction, this study proposes a novel ensemble learning-based orbit prediction method specifically designed for the BeiDou navigation satellite system (BDS). Building on ephemeris data and perturbation corrections, two new models are proposed: attention-enhanced BPNN (AEBP) and Transformer-ResNet-BiLSTM (TR-BiLSTM). These models effectively capture both local and global dependencies in satellite orbit data. To further enhance prediction accuracy and stability, the outputs of these two models were integrated using the gradient boosting decision tree (GBDT) ensemble learning method, which was optimized through a grid search. The main contribution of this approach is the synergistic combination of deep learning models and GBDT, which significantly improves both the accuracy and robustness of satellite orbit predictions. This model was validated using broadcast ephemeris data from the BDS-3 MEO and inclined geosynchronous orbit (IGSO) satellites. The results show that the proposed method achieves an error correction rate of 65.4%. This ensemble learning-based approach offers a highly effective solution for high-precision and stable satellite orbit predictions.
The assessment of beach quality is an important prerequisite for beach development and serves as the foundation for coastal zone management and sustainable development. This topic has attracted widespread attention, and various evaluation systems have been established. Given that beach quality assessment (BQA) involves multidimensional and nonlinear indicators, machine learning methods are well-suited to handling complex data relationships. However, current research utilizing machine learning for BQA often faces challenges such as limited evaluation indicators and difficulties in obtaining relevant data. In this study, a machine learning-based model for beach quality evaluation is proposed to address the limitations of existing evaluation frameworks, particularly under conditions of data scarcity. Simulated data were generated, and the analytic hierarchy process was integrated to extract features from 21 beach evaluation factors. A comparative analysis was conducted using the following four machine learning models: decision tree, random forest, XGBoost, and MLP. Results indicate that XGBoost (mean squared error (MSE) = 0.1825, weighted F1 = 0.7513) and MLP (Pearson coefficient = 0.6053) outperform traditional models. Furthermore, an ensemble learning model combining XGBoost and MLP was developed, substantially improving predictive performance (reducing MSE to 0.0753, increasing the Pearson coefficient to 0.8002, and achieving an F1 score of 0.783). Validation using real data from Yangkou Beach demonstrated that the model maintained an accuracy of 58% even when 5–10 evaluation factors had randomly missing values.
Detecting minority-class attacks has always been a difficult task for Network Intrusion Detection Systems (NIDS) operating in complex network environments. To improve the detection capability for minority-class attacks, this study proposes an intrusion detection method based on a two-layer structure. The first layer employs a CNN-BiLSTM model incorporating an attention mechanism to classify network traffic into normal traffic, majority-class attacks, and merged minority-class attacks. The second layer further segments the minority-class attacks through Stacking ensemble learning. The datasets are the generic network datasets CIC-IDS2017 and NSL-KDD and the industrial network dataset Mississippi Gas Pipeline, chosen to enhance the generalization and practical applicability of the model. Experimental results show that the proposed model achieves an overall detection accuracy of 99%, 99%, and 95% on the CIC-IDS2017, NSL-KDD, and industrial network datasets, respectively. It also significantly outperforms traditional methods in terms of detection accuracy and recall rate for minority-class attacks. Compared with the single-layer deep learning model, the two-layer structure effectively reduces the false alarm rate while improving the minority-class attack detection performance. The research in this paper not only improves the adaptability of NIDS to complex network environments but also provides a new solution for minority-class attack detection in industrial network security.
A mortality prediction model is established for a small cohort of acute myocardial infarction (AMI) patients with a low death rate. In total, 1639 AMI patients who received treatment in seven tertiary and secondary hospitals in Shanghai between January 1, 2016 and January 1, 2018 are selected as research subjects. Among them, 72 patients died during the two-year follow-up. Models are established with an ensemble learning framework and machine learning algorithms based on 51 physiological indicators of the patients. The Shapley additive explanations algorithm and a univariate test with point-biserial and phi correlation coefficients are employed to determine significant features and rank feature importance. Based on a 5-fold cross-validation experiment and external validation, the prediction model with the self-paced ensemble framework and random forest algorithm achieves the best performance, with an area under the receiver operating characteristic curve (AUROC) score of 0.911 and a recall of 0.864. Both feature ranking methods showed that ejection fraction, serum creatinine (admission), hemoglobin, and Killip class are the most important features. With these top-ranked features, the simplified prediction model is capable of achieving a comparable result, with an AUROC score of 0.872 and a recall of 0.818. This work proposes a new method to establish mortality prediction models for AMI patients based on the self-paced ensemble framework, which allows models to achieve high performance on small patient cohorts with a low death rate. It will assist in medical decision-making and prognosis as a new reference.
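The two metrics reported in this abstract, recall and AUROC, can be computed as in the sketch below. The pairwise (Mann-Whitney) form of AUROC is a standard equivalent definition chosen for brevity; the function names and the toy labels and scores are illustrative, not patient data.

```python
def recall(y_true, y_pred):
    """Fraction of actual positives (label 1) that were predicted positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return true_pos / (true_pos + false_neg)


def auroc(y_true, scores):
    """Probability that a random positive scores above a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = died during follow-up, 0 = survived.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
print(recall([1, 1, 0, 1], [1, 0, 0, 1]))
```

With only 72 deaths among 1639 patients, accuracy alone would be misleading, which is why recall and AUROC are the more informative measures for this kind of imbalanced outcome.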
Self-powered neutron detectors (SPNDs) play a critical role in monitoring the safety margins and overall health of reactors, directly affecting safe operation within the reactor. In this work, a novel fault identification method based on graph convolutional networks (GCN) and Stacking ensemble learning is proposed for SPNDs. The GCN is employed to extract the spatial neighborhood information of SPNDs at different positions, and residuals are obtained by nonlinear fitting of SPND signals. In order to completely extract the time-varying features from residual sequences, the Stacking fusion model, integrated with various algorithms, is developed and enables the identification of five conditions for SPNDs: normal, drift, bias, precision degradation, and complete failure. The results demonstrate that the integration of diverse base-learners in the GCN-Stacking model exhibits advantages over a single model as well as enhances the stability and reliability in fault identification. Additionally, the GCN-Stacking model maintains higher accuracy in identifying faults at different reactor power levels.
To address the high feature dimensionality, excessive data redundancy, and low recognition accuracy of single classifiers in ground-glass lung nodule recognition, a recognition method based on CatBoost feature selection and Stacking ensemble learning is proposed. First, the method uses a feature selection algorithm to filter important features and remove features with less impact, thereby reducing data dimensionality. Second, random forest, decision tree, K-nearest neighbor, and light gradient boosting machine classifiers are used as base classifiers, and a support vector machine is used as the meta classifier to fuse them into the ensemble learning model. This increases the accuracy of the classification model while maintaining the diversity of the base classifiers. The experimental results show that the recognition accuracy of the proposed method reaches 94.375%. Compared to the random forest algorithm, which performs best among the single classifiers, the accuracy of the proposed method is increased by 1.875%. Compared to recent deep learning methods (ResNet+GBM+Attention and MVCSNet) for ground-glass pulmonary nodule recognition, the proposed method's performance is also better or comparable. Experiments show that the proposed model can effectively select features and recognize ground-glass pulmonary nodules.
Real-time prediction of the rock mass class in front of the tunnel face is essential for the adaptive adjustment of tunnel boring machines (TBMs). During the TBM tunnelling process, a large amount of operation data is generated, reflecting the interaction between the TBM system and surrounding rock, and these data can be used to evaluate the rock mass quality. This study proposed a stacking ensemble classifier for the real-time prediction of the rock mass classification using TBM operation data. Based on the Songhua River water conveyance project, a total of 7538 TBM tunnelling cycles and the corresponding rock mass classes are obtained after data preprocessing. Then, through the tree-based feature selection method, 10 key TBM operation parameters are selected, and the mean values of the 10 selected features in the stable phase after removing outliers are calculated as the inputs of classifiers. The preprocessed data are randomly divided into the training set (90%) and test set (10%) using simple random sampling. Besides the stacking ensemble classifier, seven individual classifiers are established for comparison. These classifiers include support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF), gradient boosting decision tree (GBDT), decision tree (DT), logistic regression (LR), and multilayer perceptron (MLP), where the hyper-parameters of each classifier are optimised using the grid search method. The prediction results show that the stacking ensemble classifier has a better performance than individual classifiers, and it shows a more powerful learning and generalisation ability for small and imbalanced samples. Additionally, a relatively balanced training set is obtained by the synthetic minority oversampling technique (SMOTE), and the influence of sample imbalance on the prediction performance is discussed.
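The core idea of SMOTE, which this abstract uses to balance the training set, is to create synthetic minority samples by interpolating between a minority sample and one of its minority-class neighbours. The sketch below is a deliberately simplified version that always uses the single nearest neighbour (standard SMOTE picks randomly among k nearest neighbours); all names, the seed, and the toy data are assumptions for illustration.

```python
import math
import random


def nearest_neighbor(sample, others):
    """Return the point in `others` closest to `sample` (Euclidean distance)."""
    return min(others, key=lambda other: math.dist(sample, other))


def smote_point(sample, neighbor, gap):
    """Interpolate between sample and neighbor; gap=0 gives sample, gap=1 gives neighbor."""
    return [s + gap * (n - s) for s, n in zip(sample, neighbor)]


def oversample(minority, n_new, seed=0):
    """Generate n_new synthetic points from a list of minority-class samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        sample = rng.choice(minority)
        others = [m for m in minority if m is not sample]
        neighbor = nearest_neighbor(sample, others)
        synthetic.append(smote_point(sample, neighbor, rng.random()))
    return synthetic


# Toy minority-class samples with two features each.
minority_samples = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
new_points = oversample(minority_samples, n_new=4)
print(new_points)
```

Because each synthetic point lies on the segment between two real minority samples, the technique enlarges the minority class without simply duplicating existing rows.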
This paper adopts the NGI-ADP soil model to carry out finite element analysis, based on which the effects of soft clay anisotropy on the diaphragm wall deflections in the braced excavation were evaluated. More than one thousand finite element cases were numerically analyzed, followed by extensive parametric studies. Surrogate models were developed via ensemble learning methods (ELMs), including eXtreme Gradient Boosting (XGBoost) and Random Forest Regression (RFR), to predict the maximum lateral wall deformation (δhmax). Then the results of ELMs were compared with conventional soft computing methods such as Decision Tree Regression (DTR), Multilayer Perceptron Regression (MLPR), and Multivariate Adaptive Regression Splines (MARS). This study presents a cutting-edge application of ensemble learning in geotechnical engineering and a reasonable methodology that allows engineers to determine the wall deflection in a fast, alternative way.
With the rapid development of blockchain technology, research on blockchain security theory and its practical application has become crucial. A new type of DDoS attack has recently emerged that targets blockchain networks, threatening blockchain technology and many of its application scenarios. However, existing DDoS attack detection and defense methods mainly rely on centralized tactics and solutions. To address this problem, the paper proposes a virtual reality parallel anti-DDoS chain design philosophy and a distributed anti-D Chain detection framework based on hybrid ensemble learning. AdaBoost and Random Forest are used as the ensemble learning strategies, and different lightweight classifiers, such as CART and ID3, are integrated into the same ensemble learning algorithm. In the blockchain setting, the detection framework offers stronger generalization performance, universality, and complementarity in accurately identifying the attack features of DDoS attacks in P2P networks. Extensive experimental results confirm that the distributed heterogeneous anti-D Chain detection method performs better on six important indicators (such as Precision, Recall, F-Score, True Positive Rate, False Positive Rate, and ROC curve).
The accurate prediction of soybean yield is of great significance for agricultural production, monitoring and early warning. Although previous studies have used machine learning algorithms to predict soybean yield based on meteorological data, it is not clear how different models can be used to effectively separate soybean meteorological yield from soybean yield in various regions. In addition, comprehensively integrating the advantages of various machine learning algorithms to improve the prediction accuracy through ensemble learning algorithms has not been studied in depth. This study used and analyzed various daily meteorological data and soybean yield data from 173 county-level administrative regions and meteorological stations in two principal soybean planting areas in China (Northeast China and the Huang–Huai region), covering 34 years. Three effective machine learning algorithms (K-nearest neighbor, random forest, and support vector regression) were adopted as the base models to establish a high-precision and highly reliable soybean meteorological yield prediction model based on the stacking ensemble learning framework. The model's generalizability was further improved through 5-fold cross-validation, and the model was optimized by principal component analysis and hyperparameter optimization. The accuracy of the model was evaluated by using the five-year sliding prediction and four regression indicators of the 173 counties, which showed that the stacking model has higher accuracy and stronger robustness. The 5-year sliding estimations of soybean yield based on the stacking model in 173 counties showed that the prediction effect can reflect the spatiotemporal distribution of soybean yield in detail, and the mean absolute percentage error (MAPE) was less than 5%. The stacking prediction model of soybean meteorological yield provides a new approach for accurately predicting soybean yield.
BACKGROUND: Endoscopy artifacts are widespread in real capsule endoscopy (CE) images but not in high-quality standard datasets. AIM: To improve the segmentation performance of polyps from CE images with artifacts based on ensemble learning. METHODS: We collected 277 polyp images with CE artifacts from 5760 h of videos from 480 patients at Guangzhou First People's Hospital from January 2016 to December 2019. Two public high-quality standard external datasets were retrieved and used for the comparison experiments. For each dataset, we randomly split the data into training, validation, and testing sets for model training, selection, and testing. We compared the performance of the base models and the ensemble model in segmenting polyps from images with artifacts. RESULTS: The performance of the semantic segmentation model was affected by artifacts in the sample images, which also affected the results of polyp detection by CE using a single model. The evaluation based on real datasets with artifacts and standard datasets showed that the ensemble model of all state-of-the-art models performed better than the best corresponding base learner on the real dataset with artifacts. Compared with the corresponding optimal base learners, the intersection over union (IoU) and Dice of the ensemble learning model increased to different degrees, ranging from 0.08% to 7.01% and from 0.61% to 4.93%, respectively. Moreover, in the standard datasets without artifacts, most of the ensemble models were slightly better than the base learner, as demonstrated by IoU and Dice changes ranging from -0.28% to 1.20% and from -0.61% to 0.76%, respectively. CONCLUSION: Ensemble learning can improve the segmentation accuracy of polyps from CE images with artifacts. Our results demonstrated an improvement in the detection rate of polyps with interference from artifacts.
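For readers unfamiliar with the two overlap metrics reported in this abstract, the following is a minimal Python sketch of how IoU and Dice are commonly computed for binary segmentation masks. It is not the authors' evaluation code; the function name `iou_and_dice` and the tiny example masks are assumptions introduced here for illustration.

```python
def iou_and_dice(pred, truth):
    """Return (IoU, Dice) for two equally sized binary masks (nested lists of 0/1)."""
    flat_pred = [p for row in pred for p in row]
    flat_truth = [t for row in truth for t in row]
    if len(flat_pred) != len(flat_truth):
        raise ValueError("masks must contain the same number of pixels")

    # Pixels marked as polyp in both masks
    intersection = sum(1 for p, t in zip(flat_pred, flat_truth) if p and t)
    pred_area = sum(1 for p in flat_pred if p)
    truth_area = sum(1 for t in flat_truth if t)
    union = pred_area + truth_area - intersection

    if union == 0:
        # Convention assumed here: two empty masks agree perfectly
        return 1.0, 1.0

    iou = intersection / union
    dice = 2 * intersection / (pred_area + truth_area)
    return iou, dice


# Hypothetical 2x3 masks, invented for this example
pred = [[0, 1, 1],
        [0, 1, 0]]
truth = [[0, 1, 0],
         [0, 1, 1]]
iou, dice = iou_and_dice(pred, truth)
print(f"IoU = {iou:.3f}, Dice = {dice:.3f}")
```

Dice weights the intersection twice, so for the same pair of masks it is never smaller than IoU; this is one reason the two metrics can improve by different amounts for the same model.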
Funding: National Natural Science Foundation of China (52075420); Fundamental Research Funds for the Central Universities (xzy022023049); National Key Research and Development Program of China (2023YFB3408600).
Abstract: The burgeoning market for lithium-ion batteries has stimulated a growing need for more reliable battery performance monitoring. Accurate state-of-health (SOH) estimation is critical for ensuring battery operational performance. Despite numerous data-driven methods reported in existing research for battery SOH estimation, these methods often exhibit inconsistent performance across different application scenarios. To address this issue and overcome the performance limitations of individual data-driven models, integrating multiple models for SOH estimation has received considerable attention. Ensemble learning (EL) typically leverages the strengths of multiple base models to achieve more robust and accurate outputs. However, the lack of a clear review of current research hinders the further development of ensemble methods in SOH estimation. Therefore, this paper comprehensively reviews multi-model ensemble learning methods for battery SOH estimation. First, existing ensemble methods are systematically categorized into 6 classes based on their combination strategies. Different realizations and underlying connections are meticulously analyzed for each category of EL methods, highlighting distinctions, innovations, and typical applications. Subsequently, these ensemble methods are comprehensively compared in terms of base models, combination strategies, and publication trends. Evaluations across 6 dimensions underscore the outstanding performance of stacking-based ensemble methods. Following this, these ensemble methods are further inspected from the perspectives of weighted ensemble and diversity, aiming to inspire potential approaches for enhancing ensemble performance. Moreover, the need to address challenges such as base model selection, the measurement of model robustness and uncertainty, and the interpretability of ensemble models in practical applications is emphasized. Finally, future research prospects are outlined, specifically noting that deep learning ensemble is poised to advance ensemble methods for battery SOH estimation. The convergence of advanced machine learning with ensemble learning is anticipated to yield valuable avenues for research. Accelerated research in ensemble learning holds promising prospects for achieving more accurate and reliable battery SOH estimation under real-world conditions.
Funding: Supported in part by the National Natural Science Foundation of China (Nos. 92467109, U21A20478), the National Key R&D Program of China (2023YFA1011601), and the Major Key Project of PCL (Grant PCL2024A05).
Abstract: Ensemble learning, a pivotal branch of machine learning, amalgamates multiple base models to enhance the overall performance of predictive models, capitalising on the diversity and collective wisdom of the ensemble to surpass individual models and mitigate overfitting. In this review, a four-layer research framework is established for the study of ensemble learning, offering a comprehensive and structured review of the field from bottom to top. First, the survey introduces fundamental ensemble learning techniques, including bagging, boosting, and stacking, while also exploring ensemble diversity. Then, deep ensemble learning and semi-supervised ensemble learning are studied in detail. Furthermore, the utilisation of ensemble learning techniques to handle challenging datasets, such as imbalanced and high-dimensional data, is discussed. The application of ensemble learning techniques across various research domains, including healthcare, transportation, finance, manufacturing, and the Internet, is also examined. The survey concludes by discussing challenges intrinsic to ensemble learning.
Funding: Funded by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia [Grant No. KFU241683].
Abstract: This paper proposes a novel hybrid fraud detection framework that integrates multi-stage feature selection, unsupervised clustering, and ensemble learning to improve classification performance in financial transaction monitoring systems. The framework is structured into three core layers: (1) feature selection using Recursive Feature Elimination (RFE), Principal Component Analysis (PCA), and Mutual Information (MI) to reduce dimensionality and enhance input relevance; (2) anomaly detection through unsupervised clustering using K-Means, Density-Based Spatial Clustering (DBSCAN), and Hierarchical Clustering to flag suspicious patterns in unlabeled data; and (3) final classification using a voting-based hybrid ensemble of Support Vector Machine (SVM), Random Forest (RF), and Gradient Boosting Classifier (GBC). The experimental evaluation is conducted on a synthetically generated dataset comprising one million financial transactions, with 5% labelled as fraudulent, simulating realistic fraud rates and behavioural features, including transaction time, origin, amount, and geo-location. The proposed model demonstrated a significant improvement over baseline classifiers, achieving an accuracy of 99%, a precision of 99%, a recall of 97%, and an F1-score of 99%. Compared to individual models, it yielded a 9% gain in overall detection accuracy. It reduced the false positive rate to below 3.5%, thereby minimising the operational costs associated with manually reviewing false alerts. The model's interpretability is enhanced by the integration of Shapley Additive Explanations (SHAP) values for feature importance, supporting transparency and regulatory auditability. These results affirm the practical relevance of the proposed system for deployment in real-time fraud detection scenarios such as credit card transactions, mobile banking, and cross-border payments. The study also highlights future directions, including the deployment of lightweight models and the integration of multimodal data for scalable fraud analytics.
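The third layer described above combines three classifiers by voting. The abstract does not say whether the vote is hard or soft, so the following Python sketch assumes hard (majority) voting over class labels. The function name `majority_vote` and the per-model prediction lists are hypothetical and stand in for the outputs of already trained SVM, RF, and GBC models.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample by hard voting."""
    n_samples = len(predictions_per_model[0])
    if any(len(p) != n_samples for p in predictions_per_model):
        raise ValueError("every model must predict the same number of samples")

    combined = []
    # zip(*...) regroups the predictions so each tuple holds one sample's votes
    for sample_votes in zip(*predictions_per_model):
        label, _count = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical outputs (1 = fraudulent, 0 = legitimate), invented for this example
svm_pred = [0, 1, 1, 0]
rf_pred = [0, 1, 0, 0]
gbc_pred = [1, 1, 1, 0]

ensemble_pred = majority_vote([svm_pred, rf_pred, gbc_pred])
print(ensemble_pred)
```

With an odd number of binary classifiers, as here, a tie cannot occur. Soft voting would instead average the predicted class probabilities, which requires each model to expose calibrated probabilities.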
Funding: Supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Centre) support program (IITP-2024-RS-2024-00437191) supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).
Abstract: Identifying druggable proteins, which are capable of binding therapeutic compounds, remains a critical and resource-intensive challenge in drug discovery. To address this, we propose CEL-IDP (Comparison of Ensemble Learning Methods for Identification of Druggable Proteins), a computational framework that combines three feature extraction methods, namely Dipeptide Deviation from Expected Mean (DDE), Enhanced Amino Acid Composition (EAAC), and Enhanced Grouped Amino Acid Composition (EGAAC), with ensemble learning strategies (Bagging, Boosting, Stacking) to classify druggable proteins from sequence data. DDE captures dipeptide frequency deviations, EAAC encodes positional amino acid information, and EGAAC groups residues by physicochemical properties to generate discriminative feature vectors. These features were analyzed using ensemble models to overcome the limitations of single classifiers. EGAAC outperformed DDE and EAAC, with Random Forest (Bagging) and XGBoost (Boosting) achieving the highest accuracy of 71.66%, demonstrating superior performance in capturing critical biochemical patterns. Stacking showed intermediate results (68.33%), while EAAC and DDE-based models yielded lower accuracies (56.66%–66.87%). CEL-IDP streamlines large-scale druggability prediction, reduces reliance on costly experimental screening, and aligns with global initiatives like Target 2035 to expand actionable drug targets. This work advances machine learning-driven drug discovery by systematizing feature engineering and ensemble model optimization, providing a scalable workflow to accelerate target identification and validation.
Funding: Support from the National Natural Science Foundation of China (22108052).
Abstract: The biomass and coal co-pyrolysis (BCP) technology combines the advantages of both resources, achieving efficient resource complementarity, reducing reliance on coal, and minimizing pollutant emissions. However, this process still encounters numerous challenges in attaining optimal economic and environmental performance. Therefore, an ensemble learning (EL) framework is proposed for the BCP process in this study to optimize the synergistic benefits while minimizing negative environmental impacts. Six different ensemble learning models are developed to investigate the impact of input features, such as biomass characteristics, coal characteristics, and pyrolysis conditions, on the product profit and CO₂ emissions of the BCP process. The Optuna method is further employed to automatically optimize the hyperparameters of the BCP process models to enhance their predictive accuracy and robustness. The results indicate that the categorical boosting (CAB) model of the BCP process predicts product profit and CO₂ emissions accurately (R² > 0.92) under five-fold cross-validation. To enhance the interpretability of this preferred model, Shapley additive explanations and partial dependence plot analyses are conducted to evaluate the impact and importance of biomass characteristics, coal characteristics, and pyrolysis conditions on the product profit and CO₂ emissions of the BCP process. Finally, the preferred model is coupled with a reference vector guided evolutionary algorithm to identify the optimal conditions for maximizing the product profit of the BCP process while minimizing CO₂ emissions. The results indicate that the optimal BCP process can achieve high product profits (5290.85 CNY·t⁻¹) and low CO₂ emissions (7.45 kg·t⁻¹).
Funding: Supported by the General Scientific Research Funding of the Science and Technology Development Fund (FDCT) in Macao (No. 0150/2022/A) and the Faculty Research Grants of Macao University of Science and Technology (No. FRG-22-074-FIE).
Abstract: With the rapid development of the economy, air pollution caused by industrial expansion has caused serious harm to human health and social development. Therefore, establishing an effective air pollution concentration prediction system that delivers accurate and reliable predictions is of great scientific and practical significance. This paper proposes a combined point-interval prediction system for pollutant concentration prediction that leverages neural networks, a meta-heuristic optimization algorithm, and fuzzy theory. Fuzzy information granulation technology is used in data preprocessing to transform numerical sequences into fuzzy particles for comprehensive feature extraction. The golden jackal optimization algorithm is employed in the optimization stage to fine-tune model hyperparameters. In the prediction stage, an ensemble learning method combines training results from multiple models to obtain final point predictions, while quantile regression and kernel density estimation are used for interval predictions on the test set. Experimental results demonstrate that the combined model achieves a coefficient of determination (R²) of 99.3%, and that its mean absolute percentage error (MAPE) differs from that of the benchmark models by up to 12.6%. This suggests that the ensemble learning system proposed in this paper can provide more accurate deterministic predictions as well as more reliable uncertainty analysis than traditional models, offering a practical reference for air quality early warning.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 42201026, 52201325 and 42407619) and the Startup Foundation for Introducing Talent of NUIST (Grant No. 2023r009).
Abstract: This study investigates the inundation depths of urban floods induced by real storm events, focusing on the development and assessment of a super-resolution model based on ensemble learning methods. Unlike traditional deep neural networks, which require extensive training and high parameterization, this study uses an ensemble learning model to reconstruct high-resolution flood predictions from low-resolution hydrodynamic simulations. Hydrodynamic modeling results of a real pluvial flood event at various spatial resolutions are used for constructing datasets and for training and testing the point-based super-resolution model. Influencing factors related to urban terrain, subsurface, and rainfall inputs, together with the hydrodynamic modeling results at coarser resolutions, are used as features in the super-resolution model, which is based on Random Forest with hyperparameters tuned by Bayesian optimization. The trained super-resolution models effectively reconstruct high-resolution inundation conditions from 30 m to 5 m coarse resolution inputs, showing an increase in correlation coefficients and a decrease in root mean squared error (RMSE) as resolution improves. Dominant influencing factors in the super-resolution models are identified, together with variances in their contributions to model performance. Two optimization approaches are applied to enhance accuracy and mitigate overestimation at coarse resolutions. The first integrates outputs from various coarse resolution models as features, notably reducing overestimation, especially with finer 5 m resolutions. The second employs ensemble modeling with super-resolution models from different datasets, which improves performance across all tested resolutions, demonstrating the robustness of combining multiple predictive models for better flood forecasting in urban environments.
Funding: Funded by the National Natural Science Foundation of China (Grant No. 52222708).
Abstract: Accurately evaluating the safety status of lithium-ion battery systems in electric vehicles is imperative due to the challenges in effectively predicting potential battery failure risks under stochastic profiles. Complex battery fault mechanisms and limited, poor-quality data collection impede fault detection for battery systems under real-world conditions. This paper proposes a novel graph-guided fault detection method designed to recognize concealed anomalies in realistic data. Graphs guided by physical relationships are constructed for learning the dynamic evolution of physical quantities under normal conditions and their potential change characteristics in fault scenarios. An ensemble Graph Sample and Aggregate Network model is developed to tackle sample distribution imbalances and non-uniform battery system specifications across vehicles. Failure risk probabilities for diverse battery charging and discharging segments are derived. An ablation study verifies the necessity of ensemble learning in addressing imbalanced datasets. Analysis of 102,095 segments across 86 vehicles with different battery material systems, battery capacities, and numbers of cells and temperature sensors confirms the robustness and generalization of the proposed method, yielding a recall of 98.37%. By introducing the graph, spatio-temporal global fault characteristics of battery systems are automatically extracted. The coupling relationship and evolution of physical quantities under both normal and faulty states are established, effectively uncovering fault information hidden in collected battery data without observable anomalies. The safety state of battery systems is reflected in terms of failure risk probability, providing reliable data support for battery system maintenance.
Abstract: Cloud computing (CC) provides infrastructure, storage services, and applications to users, all of which should be secured by suitable procedures or policies. Security in the cloud environment is essential to safeguard infrastructure and user information from unauthorized access, which calls for timely intrusion detection systems (IDS). Ensemble learning harnesses the collective power of multiple machine learning (ML) methods, and a feature selection (FS) process helps improve the robustness and overall accuracy of intrusion detection. Therefore, this article presents a meta-heuristic feature selection by ensemble learning-based anomaly detection (MFS-ELAD) algorithm for CC platforms. The proposed approach first applies min-max normalization. Then, the dimensionality of the feature set is reduced by the Prairie Dogs Optimizer (PDO) algorithm. For recognition, the MFS-ELAD method employs an ensemble of three deep learning (DL) techniques: sparse auto-encoder (SAE), stacked long short-term memory (SLSTM), and Elman neural network (ENN). Finally, the parameters of the DL algorithms are fine-tuned with the sand cat swarm optimizer (SCSO), which helps improve the recognition outcomes. Simulation of the MFS-ELAD system on the CSE-CIC-IDS2018 dataset shows promising performance compared with other methods, with a maximum precision of 99.71%.
Funding: Funded by the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDA28040300), the Project for Guangxi Science and Technology Base and Talents (Grant No. GK AD22035957), the Informatization Plan of the Chinese Academy of Sciences (Grant No. CAS-WX2021SF-0304), and the West Light Foundation of the Chinese Academy of Sciences (Grant No. XAB2021YN19).
Abstract: To address uncertainties in satellite orbit error prediction, this study proposes a novel ensemble learning-based orbit prediction method specifically designed for the BeiDou navigation satellite system (BDS). Building on ephemeris data and perturbation corrections, two new models are proposed: attention-enhanced BPNN (AEBP) and Transformer-ResNet-BiLSTM (TR-BiLSTM). These models effectively capture both local and global dependencies in satellite orbit data. To further enhance prediction accuracy and stability, the outputs of these two models were integrated using the gradient boosting decision tree (GBDT) ensemble learning method, which was optimized through a grid search. The main contribution of this approach is the synergistic combination of deep learning models and GBDT, which significantly improves both the accuracy and robustness of satellite orbit predictions. The model was validated using broadcast ephemeris data from the BDS-3 MEO and inclined geosynchronous orbit (IGSO) satellites. The results show that the proposed method achieves an error correction rate of 65.4%. This ensemble learning-based approach offers a highly effective solution for high-precision and stable satellite orbit predictions.
Funding: Supported by the National Natural Science Foundation of China (Nos. 82202299, 62203060, 62403492).
Abstract: The assessment of beach quality is an important prerequisite for beach development and serves as the foundation for coastal zone management and sustainable development. This topic has attracted widespread attention, and various evaluation systems have been established. Given that beach quality assessment (BQA) involves multidimensional and nonlinear indicators, machine learning methods are well suited to handling the complex data relationships. However, current research using machine learning for BQA often faces challenges such as limited evaluation indicators and difficulties in obtaining relevant data. In this study, a machine learning-based model for beach quality evaluation is proposed to address the limitations of existing evaluation frameworks, particularly under conditions of data scarcity. Simulated data were generated, and the analytic hierarchy process was integrated to extract features from 21 beach evaluation factors. A comparative analysis was conducted using four machine learning models: decision tree, random forest, XGBoost, and MLP. Results indicate that XGBoost (mean squared error (MSE) = 0.1825, weighted F1 = 0.7513) and MLP (Pearson coefficient = 0.6053) outperform traditional models. Furthermore, an ensemble learning model combining XGBoost and MLP was developed, substantially improving predictive performance (reducing MSE to 0.0753, increasing the Pearson coefficient to 0.8002, and achieving an F1 score of 0.783). Validation using real data from Yangkou Beach demonstrated that the model maintained an accuracy of 58% even when 5–10 evaluation factors had randomly missing values.
Funding: Supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) Innovative Human Resource Development for Local Intellectualization program grant funded by the Korea government (MSIT) (IITP-2025-RS-2022-00156334), and in part by the Liaoning Province Nature Fund Project (2024-BSLH-214).
Abstract: Detecting minority-class attacks with a network intrusion detection system (NIDS) remains a difficult task in complex network environments. To improve the detection of minority-class attacks, this study proposes an intrusion detection method based on a two-layer structure. The first layer employs a CNN-BiLSTM model incorporating an attention mechanism to classify network traffic into normal traffic, majority-class attacks, and merged minority-class attacks. The second layer further subdivides the minority-class attacks through Stacking ensemble learning. The datasets used are the generic network datasets CIC-IDS2017 and NSL-KDD and the industrial network dataset Mississippi Gas Pipeline, chosen to enhance the generalization and practical applicability of the model. Experimental results show that the proposed model achieves overall detection accuracies of 99%, 99%, and 95% on the CIC-IDS2017, NSL-KDD, and industrial network datasets, respectively. It also significantly outperforms traditional methods in detection accuracy and recall for minority-class attacks. Compared with the single-layer deep learning model, the two-layer structure effectively reduces the false alarm rate while improving minority-class attack detection performance. This research not only improves the adaptability of NIDS to complex network environments but also provides a new solution for minority-class attack detection in industrial network security.
Funding: The National Natural Science Foundation of China (No. 81900308).
Abstract: A mortality prediction model is established for a small cohort of acute myocardial infarction (AMI) patients with a low death rate. In total, 1639 AMI patients who received treatment in seven tertiary and secondary hospitals in Shanghai between January 1, 2016 and January 1, 2018 are selected as research subjects. Among them, 72 patients died during the two-year follow-up. Models are established with an ensemble learning framework and machine learning algorithms based on 51 physiological indicators of the patients. The Shapley additive explanations algorithm and a univariate test with point-biserial and phi correlation coefficients are employed to determine significant features and rank feature importance. Based on a 5-fold cross-validation experiment and external validation, the prediction model with the self-paced ensemble framework and random forest algorithm achieves the best performance, with an area under the receiver operating characteristic curve (AUROC) score of 0.911 and a recall of 0.864. Both feature ranking methods showed that ejection fraction, serum creatinine (admission), hemoglobin, and Killip class are the most important features. With these top-ranked features, the simplified prediction model achieves a comparable result, with an AUROC score of 0.872 and a recall of 0.818. This work proposes a new method to establish mortality prediction models for AMI patients based on the self-paced ensemble framework, which allows models to achieve high performance with small patient cohorts that have a low death rate. It will assist in medical decision-making and prognosis as a new reference.
Funding: The Industry-University Cooperation Project in Fujian Province University (No. 2022H6020).
Abstract: Self-powered neutron detectors (SPNDs) play a critical role in monitoring the safety margins and overall health of reactors, directly affecting safe reactor operation. In this work, a novel fault identification method based on graph convolutional networks (GCN) and Stacking ensemble learning is proposed for SPNDs. The GCN is employed to extract the spatial neighborhood information of SPNDs at different positions, and residuals are obtained by nonlinear fitting of SPND signals. To fully extract the time-varying features from the residual sequences, a Stacking fusion model integrating various algorithms is developed, enabling the identification of five conditions for SPNDs: normal, drift, bias, precision degradation, and complete failure. The results demonstrate that integrating diverse base-learners in the GCN-Stacking model offers advantages over a single model and enhances the stability and reliability of fault identification. Additionally, the GCN-Stacking model maintains higher accuracy in identifying faults at different reactor power levels.
Funding: The National Natural Science Foundation of China (No. 62271466), the Natural Science Foundation of Beijing (No. 4202025), the Tianjin IoT Technology Enterprise Key Laboratory Research Project (No. VTJ-OT20230209-2), and the Guizhou Provincial Sci-Tech Project (No. ZK[2022]-012).
Abstract: To address the high feature dimensionality, excessive data redundancy, and low recognition accuracy of single classifiers in ground-glass lung nodule recognition, a recognition method based on CatBoost feature selection and Stacking ensemble learning is proposed. First, the method uses a feature selection algorithm to retain important features and remove features with less impact, thereby reducing data dimensionality. Second, random forest, decision tree, K-nearest neighbor, and light gradient boosting machine classifiers are used as base classifiers, and a support vector machine is used as the meta classifier to fuse them into the ensemble learning model. This increases the accuracy of the classification model while maintaining the diversity of the base classifiers. The experimental results show that the recognition accuracy of the proposed method reaches 94.375%. Compared to the random forest algorithm, which performs best among the single classifiers, the accuracy of the proposed method is higher by 1.875%. Compared to recent deep learning methods (ResNet+GBM+Attention and MVCSNet) for ground-glass pulmonary nodule recognition, the proposed method's performance is also better or comparable. Experiments show that the proposed model can effectively select features and recognize ground-glass pulmonary nodules.
Funding: Funded by the National Natural Science Foundation of China (Grant No. 41941019) and the State Key Laboratory of Hydroscience and Engineering (Grant No. 2019-KY-03).
Abstract: Real-time prediction of the rock mass class in front of the tunnel face is essential for the adaptive adjustment of tunnel boring machines (TBMs). During the TBM tunnelling process, a large amount of operation data is generated, reflecting the interaction between the TBM system and the surrounding rock, and these data can be used to evaluate the rock mass quality. This study proposes a stacking ensemble classifier for the real-time prediction of the rock mass classification using TBM operation data. Based on the Songhua River water conveyance project, a total of 7538 TBM tunnelling cycles and the corresponding rock mass classes are obtained after data preprocessing. Then, through a tree-based feature selection method, 10 key TBM operation parameters are selected, and the mean values of the 10 selected features in the stable phase, after removing outliers, are calculated as the inputs of the classifiers. The preprocessed data are randomly divided into a training set (90%) and a test set (10%) using simple random sampling. Besides the stacking ensemble classifier, seven individual classifiers are established for comparison. These classifiers include support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF), gradient boosting decision tree (GBDT), decision tree (DT), logistic regression (LR), and multilayer perceptron (MLP), where the hyper-parameters of each classifier are optimised using the grid search method. The prediction results show that the stacking ensemble classifier performs better than the individual classifiers and shows a more powerful learning and generalisation ability for small and imbalanced samples. Additionally, a relatively balanced training set is obtained by the synthetic minority oversampling technique (SMOTE), and the influence of sample imbalance on the prediction performance is discussed.
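The abstract above balances the training set with SMOTE. As a rough illustration of the core idea only (new minority samples are interpolated between a minority sample and one of its nearest minority neighbours), here is a simplified Python sketch. It is not the authors' pipeline and not a library implementation; the function name `smote_like`, its parameters, and the sample points are assumptions made for this example.

```python
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (squared Euclidean distance)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical two-feature minority-class samples, invented for this example
minority = [(1.0, 2.0), (2.0, 3.0), (3.0, 1.0)]
synthetic = smote_like(minority, n_new=5)
print(synthetic)
```

Because every synthetic point lies on a segment between two real minority samples, oversampling of this kind should be applied to the training split only, so that the test set still contains only observed data.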
Funding: Supported by the High-end Foreign Expert Introduction program (No. G20190022002), the Chongqing Construction Science and Technology Plan Project (2019-0045), and the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant No. KJZD-K201900102). The financial support is gratefully acknowledged.
Abstract: This paper adopts the NGI-ADP soil model to carry out finite element analysis, based on which the effects of soft clay anisotropy on diaphragm wall deflections in braced excavations were evaluated. More than one thousand finite element cases were numerically analyzed, followed by extensive parametric studies. Surrogate models were developed via ensemble learning methods (ELMs), including eXtreme Gradient Boosting (XGBoost) and Random Forest Regression (RFR), to predict the maximum lateral wall deformation (δhmax). Then the results of the ELMs were compared with conventional soft computing methods such as Decision Tree Regression (DTR), Multilayer Perceptron Regression (MLPR), and Multivariate Adaptive Regression Splines (MARS). This study presents a cutting-edge application of ensemble learning in geotechnical engineering and a reasonable methodology that allows engineers to determine the wall deflection in a fast, alternative way.
Funding: Performed in the project "Cloud Interaction Technology and Service Platform for Mine Internet of Things", supported by the National Key Research and Development Program of China (2017YFC0804406), and partly supported by the project "Massive DDoS Attack Traffic Detection Technology Research Based on Big Data and Cloud Environment", supported by the Scientific Research Foundation of Shandong University of Science and Technology for Recruited Talents (0104060511314).
Abstract: With the rapid development of blockchain technology, research on blockchain security theory and its practical application has become crucial. A new kind of DDoS attack has recently arisen: the DDoS attack on blockchain networks. This attack is harmful to blockchain technology and to many of its application scenarios. However, existing DDoS attack detection and defense methods mainly rely on centralized tactics and solutions. To address this problem, the paper proposes the virtual reality parallel anti-DDoS chain design philosophy and a distributed anti-D Chain detection framework based on hybrid ensemble learning. AdaBoost and Random Forest are used as the ensemble learning strategies, and several different lightweight classifiers, such as CART and ID3, are integrated into the same ensemble learning algorithm. The detection framework in the blockchain scenario has much stronger generalization performance, universality, and complementarity for accurately identifying the attack features of DDoS attacks in P2P networks. Extensive experimental results confirm that the distributed heterogeneous anti-D Chain detection method performs better on six important indicators (Precision, Recall, F-Score, True Positive Rate, False Positive Rate, and ROC curve).
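As a reading aid for the indicators listed in this abstract, the sketch below computes precision, recall (which equals the true positive rate), F-score, and false positive rate from the four cells of a binary confusion matrix. It uses the standard textbook definitions; the function name `detection_metrics` and the example counts are hypothetical and are not taken from the paper. The ROC curve is omitted because it requires scores at multiple thresholds rather than a single set of counts.

```python
def detection_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification indicators from confusion counts.

    tp/fp/fn/tn: true positives, false positives, false negatives, true negatives.
    Ratios with a zero denominator are reported as 0.0 (an assumed convention).
    """
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # identical to the true positive rate
    f_score = ratio(2 * precision * recall, precision + recall)
    fpr = ratio(fp, fp + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "tpr": recall,
        "fpr": fpr,
    }


if __name__ == "__main__":
    # Hypothetical counts: 8 attacks caught, 2 false alarms, 2 attacks missed.
    for name, value in detection_metrics(tp=8, fp=2, fn=2, tn=88).items():
        print(f"{name}: {value:.4f}")
```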
Funding: Supported by the Science and Technology Innovation Project of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2016-AII).
Abstract: The accurate prediction of soybean yield is of great significance for agricultural production, monitoring, and early warning. Although previous studies have used machine learning algorithms to predict soybean yield from meteorological data, it is not clear how different models can be used to effectively separate soybean meteorological yield from soybean yield in various regions. In addition, comprehensively integrating the advantages of various machine learning algorithms to improve prediction accuracy through ensemble learning has not been studied in depth. This study analyzed daily meteorological data and soybean yield data from 173 county-level administrative regions and meteorological stations in two principal soybean planting areas in China (Northeast China and the Huang–Huai region), covering 34 years. Three machine learning algorithms (K-nearest neighbor, random forest, and support vector regression) were adopted as base models to establish a high-precision and highly reliable soybean meteorological yield prediction model based on the stacking ensemble learning framework. The model's generalizability was further improved through 5-fold cross-validation, and the model was optimized by principal component analysis and hyperparameter optimization. The accuracy of the model was evaluated using five-year sliding prediction and four regression indicators for the 173 counties, which showed that the stacking model has higher accuracy and stronger robustness. The 5-year sliding estimations of soybean yield based on the stacking model in the 173 counties showed that the predictions reflect the spatiotemporal distribution of soybean yield in detail, and the mean absolute percentage error (MAPE) was less than 5%. The stacking prediction model of soybean meteorological yield provides a new approach for accurately predicting soybean yield.
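The MAPE figure quoted in this abstract follows the usual definition, the mean of |actual - predicted| / |actual| expressed as a percentage. The sketch below shows that calculation only; the function name `mape` and the yield values are made up for illustration and do not come from the study.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Raises ValueError for empty or mismatched inputs, or when an actual value
    is zero (the percentage error is undefined in that case).
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical county yields (arbitrary units) and model predictions.
    observed = [100.0, 200.0]
    estimated = [110.0, 190.0]
    print(f"MAPE = {mape(observed, estimated):.2f}%")
```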
Abstract: BACKGROUND Endoscopy artifacts are widespread in real capsule endoscopy (CE) images but not in high-quality standard datasets. AIM To improve the segmentation performance of polyps from CE images with artifacts based on ensemble learning. METHODS We collected 277 polyp images with CE artifacts from 5760 h of videos from 480 patients at Guangzhou First People's Hospital from January 2016 to December 2019. Two public high-quality standard external datasets were retrieved and used for the comparison experiments. For each dataset, we randomly split the data into training, validation, and testing sets for model training, selection, and testing. We compared the performance of the base models and the ensemble model in segmenting polyps from images with artifacts. RESULTS The performance of the semantic segmentation model was affected by artifacts in the sample images, which also affected the results of polyp detection by CE using a single model. The evaluation based on real datasets with artifacts and standard datasets showed that the ensemble model of all state-of-the-art models performed better than the best corresponding base learner on the real dataset with artifacts. Compared with the corresponding optimal base learners, the intersection over union (IoU) and Dice of the ensemble learning model increased to different degrees, ranging from 0.08% to 7.01% and 0.61% to 4.93%, respectively. Moreover, in the standard datasets without artifacts, most of the ensemble models were slightly better than the base learner, as demonstrated by changes in IoU and Dice ranging from -0.28% to 1.20% and -0.61% to 0.76%, respectively. CONCLUSION Ensemble learning can improve the segmentation accuracy of polyps from CE images with artifacts. Our results demonstrated an improvement in the detection rate of polyps with interference from artifacts.
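The two segmentation scores reported in this abstract can be computed from a pair of binary masks as sketched below. The definitions are the standard ones (IoU = |A∩B| / |A∪B|, Dice = 2|A∩B| / (|A| + |B|)); the function name `iou_and_dice`, the nested-list mask format, and the convention of returning 1.0 when both masks are empty are assumptions for this illustration, not details from the paper.

```python
def iou_and_dice(pred_mask, true_mask):
    """Return (IoU, Dice) for two binary masks given as nested lists of 0/1."""
    pred = [v for row in pred_mask for v in row]
    true = [v for row in true_mask for v in row]
    if len(pred) != len(true):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, true) if p and t)
    pred_area = sum(1 for p in pred if p)
    true_area = sum(1 for t in true if t)
    union = pred_area + true_area - intersection
    if union == 0:
        # Both masks are empty: treated as perfect agreement (assumed convention).
        return 1.0, 1.0
    iou = intersection / union
    dice = 2 * intersection / (pred_area + true_area)
    return iou, dice


if __name__ == "__main__":
    # Hypothetical 2x2 masks: the prediction covers one extra pixel.
    predicted = [[1, 1], [0, 0]]
    reference = [[1, 0], [0, 0]]
    iou, dice = iou_and_dice(predicted, reference)
    print(f"IoU = {iou:.4f}, Dice = {dice:.4f}")
```

For a single pair of masks the two scores are linked by Dice = 2·IoU / (1 + IoU), so they always rank that pair the same way. Averaged over a dataset the gains can differ, which is consistent with the separate ranges reported for each.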