To address the insufficient prediction accuracy of strip flatness at the outlet of cold tandem rolling, the prediction performance of different ensemble methods was studied and a high-precision ensemble model for outlet strip flatness was established. First, bagging, boosting, and stacking ensemble experiments were carried out on five base models: linear regression (LR), K nearest neighbors (KNN), support vector regression, regression trees (RT), and a backpropagation neural network (BPN). Second, three existing ensemble models, i.e., random forest, extreme random tree (ET), and extreme gradient boosting, were used for comparison experiments. The results show that the bagging, boosting, and stacking ensemble methods improve the prediction accuracy of the regression trees model the most, by 5.28%, 6.51%, and 5.32%, respectively. The stacking ensemble method improves both the simple and the complex models, and its improvement on the simple base model is the greatest, 4.69% higher than that of the base model KNN. Among all of the ensemble models, the stacking ensemble of level-1 (ET, AdaBoost-RT, LR, BPN) paired with level-2 (LR) was found to be the best model (EALB-LR) and can be studied further for industrial applications.
An approach for extracting useful information for monthly dynamical prediction from ensemble forecasts is studied. The extended-range ensemble forecasts (8 members, with the initial perturbations of the lagged average forecast (LAF) taken at 0000, 0600, 1200 and 1800 GMT on two consecutive days) of the 500 hPa height field with the global spectral model (T63L16) from January to May 1997 are provided by the National Climate Center of China. The relationship between the ensemble spread, measured by the root-mean-square deviation of the ensemble members from the ensemble mean, and the forecast skill (the anomaly correlation or the root-mean-square distance between the ensemble mean forecast and the observation) is significant. The ensemble spread can be used to evaluate the number of useful forecast days N for the best estimate of the 30-day mean. Thus, a weighted mean approach based on ensemble spread is put forward for monthly dynamical prediction. The anomaly correlation of the monthly mean weighted by the ensemble spread is higher than that of both the arithmetic mean and the linear weighted mean. Better results for the monthly mean circulation and anomaly are obtained from the ensemble spread weighted mean. Key words: Monthly prediction, Ensemble method, Spread of ensemble. Supported by the Excellent National State Key Laboratory Project (49823002), the National Key Project 'Study on Chinese Short-Term Climate Forecast System' (96-908-02) and IAP Innovation Foundation (8-1308). The data were provided through the National Climate Center of China. The authors wish to thank Ms. Chen Lijuan for her assistance.
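The spread-weighted mean described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the weighting formula, so weighting each day's ensemble mean by the inverse of that day's spread is an assumption, and the function names (`ensemble_mean`, `ensemble_spread`, `spread_weighted_monthly_mean`) are hypothetical. The spread itself follows the abstract's definition, the root-mean-square deviation of the members from the ensemble mean.

```python
import math

def ensemble_mean(members):
    """Point-wise mean over ensemble members (each member is a list of grid values)."""
    n = len(members)
    return [sum(vals) / n for vals in zip(*members)]

def ensemble_spread(members):
    """RMS deviation of the members from the ensemble mean, over all members and points."""
    mean = ensemble_mean(members)
    sq = [(v - m) ** 2 for member in members for v, m in zip(member, mean)]
    return math.sqrt(sum(sq) / len(sq))

def spread_weighted_monthly_mean(daily_ensembles):
    """Average daily ensemble means with weights proportional to 1/spread.

    The 1/spread weighting is an assumption made for illustration only;
    days with a small spread (members agree) contribute more.
    """
    means = [ensemble_mean(day) for day in daily_ensembles]
    spreads = [ensemble_spread(day) for day in daily_ensembles]
    raw = [1.0 / max(s, 1e-12) for s in spreads]  # guard against zero spread
    total = sum(raw)
    weights = [r / total for r in raw]
    n_points = len(means[0])
    return [sum(w * m[i] for w, m in zip(weights, means)) for i in range(n_points)]

# Two forecast days, two members each, two grid points (toy values).
day1 = [[1.0, 1.0], [3.0, 3.0]]  # mean 2, spread 1
day2 = [[2.0, 2.0], [6.0, 6.0]]  # mean 4, spread 2
monthly = spread_weighted_monthly_mean([day1, day2])
```

With these toy values the day with the smaller spread receives twice the weight of the other, so the result lies closer to the first day's ensemble mean than a plain arithmetic mean would.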
Selecting which explanatory variables to include in a given score is a common difficulty, as a balance must be found between statistical fit and practical application. This article presents a methodology for constructing parsimonious event risk scores that combines a stepwise selection of variables with ensemble scores obtained by aggregating several scores, using several classifiers, bootstrap samples and various modalities of random selection of variables. Selection methods based on a probabilistic model can be used to achieve a stepwise selection for a given classifier such as logistic regression, but not directly for an ensemble classifier constructed by aggregating several classifiers. Three selection methods are proposed in this framework: two involve a backward selection of the variables based on their coefficients in an ensemble score, and the third involves a forward selection of the variables maximizing the AUC. The stepwise selection yields a succession of scores, from which the practitioner can choose the score that best fits their needs. These three methods are compared in an application to construct parsimonious short-term event risk scores in chronic heart failure (HF) patients, using as the event the composite endpoint of death or hospitalization for worsening HF within 180 days of a visit. Focusing on the fastest method, four scores are constructed, yielding out-of-bag AUCs ranging from 0.81 (26 variables) to 0.76 (2 variables).
In recent years, deep learning (DL) models have achieved significant progress in many domains, such as autonomous driving, facial recognition, and speech recognition. However, the vulnerability of deep learning models to adversarial attacks has raised serious concerns in the community because of their insufficient robustness and generalization. Also, transferable attacks have become a prominent method for black-box attacks. In this work, we explore the potential factors that impact adversarial examples (AEs) transferability in DL-based speech recognition. We also discuss the vulnerability of different DL systems and the irregular nature of decision boundaries. Our results show a remarkable difference in the transferability of AEs between speech and images, with the data relevance being low in images but opposite in speech recognition. Motivated by dropout-based ensemble approaches, we propose random gradient ensembles and dynamic gradient-weighted ensembles, and we evaluate the impact of ensembles on the transferability of AEs. The results show that the AEs created by both approaches are valid for transfer to the black box API.
Nitrogen dioxide (NO₂) poses a critical potential risk to environmental quality and public health. A reliable machine learning (ML) forecasting framework can provide valuable information to support government decision-making. Based on data from 1609 air quality monitors across China from 2014 to 2020, this study designed an ensemble ML model that integrates multiple types of spatial-temporal variables and three sub-models for time-sensitive prediction over a wide range. The ensemble ML model adds a residual connection to the gated recurrent unit (GRU) network and combines the advantages of Transformer, extreme gradient boosting (XGBoost), and the GRU with residual connection, resulting in a 4.1% ± 1.0% lower root mean square error than XGBoost on the test results. The ensemble model shows strong prediction performance, with coefficients of determination of 0.91, 0.86, and 0.77 for 1-hr, 3-hr, and 24-hr averages on the test results, respectively. In particular, the model achieves excellent performance with low spatial uncertainty in Central, East, and North China, the major site-dense zones. Through an interpretability analysis based on the Shapley value for different temporal resolutions, we found that atmospheric chemical processes contribute more to hourly predictions than to daily-scale predictions, while meteorological conditions have a more prominent impact on the latter. Compared with existing models for different spatiotemporal scales, the present model can be implemented at any air quality monitoring station across China to provide rapid and dependable forecasts of NO₂, which will help in developing effective control policies.
The estimation of model parameters is an important subject in engineering. In this area of work, the prevailing approach is to estimate or calculate these as deterministic parameters. In this study, we consider the model parameters from the perspective of random variables and describe the general form of the parameter distribution inference problem. Under this framework, we propose an ensemble Bayesian method by introducing Bayesian inference and the Markov chain Monte Carlo (MCMC) method. Experiments on a finite cylindrical reactor and a 2D IAEA benchmark problem show that the proposed method converges quickly and can estimate parameters effectively, even for several correlated parameters simultaneously. Our experiments include cases of engineering software calls, demonstrating that the method can be applied to engineering, such as nuclear reactor engineering.
The use of machine learning algorithms to identify characteristics of Distributed Denial of Service (DDoS) attacks has emerged as a powerful approach in cybersecurity. DDoS attacks, which aim to overwhelm a network or service with a flood of malicious traffic, pose significant threats to online systems. Traditional methods of detection and mitigation often struggle to keep pace with the evolving nature of these attacks. Machine learning, with its ability to analyze vast amounts of data and recognize patterns, offers a robust solution to this challenge. The aim of the paper is to demonstrate the application of ensemble ML algorithms, namely K-Means and KNN, as a dual clustering mechanism that, when used with PySpark, reaches 99% accuracy. Used together, the algorithms identify distinctive features of DDoS attacks that closely reflect reality, which makes them a good combination for this aim. After preprocessing the data, both algorithms on the PySpark foundation achieved 99% accuracy when tuned on the features of a large DDoS dataset. The semi-supervised dataset tabulates traffic anomalies in terms of packet size distribution in correlation with Flow Duration. By training the K-Means clustering and then applying KNN to the dataset, the algorithms learn to evaluate the character of activity to a greater degree by displaying density with ease. The study evaluates the effectiveness of K-Means clustering with KNN as ensemble algorithms that adapt very well to detecting complex patterns. Ultimately, cross-reaching environmental results indicate that ML-based approaches significantly improve detection rates compared to traditional methods.
Furthermore, ensemble learning methods, which combine two or more models to improve prediction accuracy, show promise in handling the complexity and variability of big datasets, especially when implemented with PySpark. The findings suggest that the gain in accuracy derives from newer software that is designed to reflect reality. However, challenges remain in the deployment of these systems, including the need for large, high-quality datasets and the potential for adversarial attacks that attempt to deceive the ML models. Future research should continue to improve the robustness and efficiency of combined algorithms, as well as integrate them with existing security frameworks to provide comprehensive protection against DDoS attacks and in other areas. The dataset was originally created by the University of New Brunswick to analyze DDoS data. It is based on logs of the university's servers, which recorded various DoS attacks over the publicly available period, yielding 80 attributes in total and a size of 6.40 GB. In this dataset, the binary label column, which is the last column, is central to the final classification: it distinguishes normal traffic from attack traffic. Further analysis is then ripe for investigation. Finally, malicious traffic alert software, as an example, should be trained on the dependence of packet influx on Flow Duration, which creates a mathematical scope for averages to enact. In achieving such high accuracy, the project acts as an illustration (referenced in the form of excerpts from my Google Colab account) of many attempts to tune. Cybersecurity advocates for more work on the character of brute-force attack traffic and normal traffic features overall, since most of our investments as humans are digitally based in work, recreational, and social environments.
When dealing with imbalanced datasets, the traditional support vector machine (SVM) tends to produce a classification hyperplane that is biased towards the majority class, which exhibits poor robustness. This paper proposes a high-performance classification algorithm specifically designed for imbalanced datasets. The proposed method first uses a biased second-order cone programming support vector machine (B-SOCP-SVM) to identify the support vectors (SVs) and non-support vectors (NSVs) in the imbalanced data. Then, it applies the synthetic minority over-sampling technique (SV-SMOTE) to oversample the support vectors of the minority class and uses the random under-sampling technique (NSV-RUS) multiple times to undersample the non-support vectors of the majority class. Combining the resulting minority class dataset with the multiple majority class datasets yields multiple new balanced datasets. Finally, SOCP-SVM is used to classify each dataset, and the final result is obtained through the integrated algorithm. Experimental results demonstrate that the proposed method performs excellently on imbalanced datasets.
With a three-dimensional semiclassical ensemble method, we theoretically investigated the nonsequential double ionization of Ar driven by spatially inhomogeneous few-cycle negatively chirped laser pulses. Our results show that the recollision time window can be precisely controlled within an isolated time interval of several hundred attoseconds, which is useful for understanding the subcycle correlated electron dynamics. More interestingly, the correlated electron momentum distribution (CEMD) exhibits a strong dependence on laser intensity. That is, at lower laser intensity, the CEMD is located in the first quadrant. As the laser intensity increases, the CEMD shifts almost completely to the second and fourth quadrants, and then gradually to the third quadrant. The underlying physics governing the CEMD's dependence on laser intensity is explained.
Reservoir identification and production prediction are two of the most important tasks in petroleum exploration and development. Machine learning (ML) methods are used for petroleum-related studies, but have not been applied to reservoir identification and production prediction based on reservoir identification. Production forecasting studies are typically based on overall reservoir thickness and lack accuracy when reservoirs contain a water or dry layer without oil production. In this paper, a systematic ML method was developed using classification models for reservoir identification and regression models for production prediction. The production models are based on the reservoir identification results. For reservoir identification, seven optimized ML methods were used: four typical single ML methods and three ensemble ML methods. These methods classify the reservoir into five types of layers: water, dry, and three levels of oil (I oil layer, II oil layer, III oil layer). The validation and test results of these seven optimized ML methods suggest that the three ensemble methods perform better than the four single ML methods in reservoir identification. XGBoost produced the model with the highest accuracy, up to 99%. The effective thickness of the I and II oil layers determined during reservoir identification was fed into the models for predicting production. Effective thickness accounts for the distribution of water and oil, resulting in a more reasonable production prediction than predictions based on the overall reservoir thickness. To validate the superiority of the ML methods, reference models using overall reservoir thickness were built for comparison. The models based on effective thickness outperformed the reference models in every evaluation metric. The prediction accuracy of the ML models using effective thickness was 10% higher than that of the reference model. Without the personal error or data distortion present in traditional methods, this novel system realizes rapid analysis of data while reducing the time required to resolve reservoir classification and production prediction challenges. The ML models using the effective thickness obtained from reservoir identification were more accurate in predicting oil production than previous studies that use overall reservoir thickness.
Background error covariance (BEC) plays an essential role in variational data assimilation. Most variational data assimilation systems still use a static BEC. In fact, the characteristics of the BEC vary with the season, day, and even hour of the background. National Meteorological Center-based diurnally varying BECs have been proposed, but their diurnal variation characteristics were derived from climatic samples. Ensemble methods can obtain background error characteristics that suit the samples at the current moment. Therefore, to obtain more reasonable diurnally varying BECs, in this study ensemble-based diurnally varying BECs are generated and their diurnal variation characteristics are discussed. Their impacts are then evaluated by cycling data assimilation and forecasting experiments for a week based on the operational China Meteorological Administration-Beijing system. Clear diurnal variation in the standard deviation of ensemble forecasts and ensemble-based BECs can be identified, consistent with the diurnal variation characteristics of the atmosphere. The results of the one-week cycling data assimilation and forecasting show that applying diurnally varying BECs reduces the RMSEs in the analysis and the 6-h forecast. Detailed analysis of a convective rainfall case shows that the distribution of the accumulated precipitation forecast using the diurnally varying BECs is closer to the observation than that using the static BEC. In addition, the cycle-averaged precipitation scores at all magnitudes are improved, especially for heavy precipitation, indicating the potential of using diurnally varying BECs in operational applications.
This paper uses the classical ensemble method to study the double ionization of a 2-dimensional (2D) model helium atom interacting with an elliptically polarized laser pulse. The classical ensemble calculation demonstrates that the ratio of double to single ionization decreases with increasing ellipticity of the driving field. The classical scenario shows that there are hardly any e-e recollisions with the circularly polarized laser pulse. The double ionization probability is studied for linearly and circularly polarized laser pulses. The classical numerical results are consistent with the semiclassical rescattering mechanism and in qualitative agreement with the experimental results and the quantum calculations.
Turner syndrome (TS) is a chromosomal disorder that affects only the growth of female patients. Prompt diagnosis is of high significance for the patients. However, clinical screening methods are time-consuming and costly. Some researchers have used machine learning-based methods to detect TS, but their performance needs to be improved. Therefore, we propose an ensemble method of two-path capsule networks (CapsNets) for detecting TS based on global-local facial images. Specifically, the TS facial images are preprocessed and segmented into eight local parts under the direction of physicians; then, nine two-path CapsNets are trained, one on the complete TS facial images and one on each of the eight local images, with few-shot learning used to address the problem of limited data; finally, a probability-based ensemble method is used to combine the nine classifiers for the classification of TS. By studying the base classifiers, we find that two facial areas are more closely related to TS patients, i.e., the eyes and the nose. The results demonstrate that the proposed model is effective for the TS classification task, achieving the highest accuracy of 0.9241.
The present aim is to update, upon arrival of new learning data, the parameters of a score constructed with an ensemble method involving linear discriminant analysis and logistic regression in an online setting, without the need to store all of the previously obtained data. Poisson bootstrap and stochastic approximation processes, the convergence of which has been established theoretically, were used with online standardized data to avoid numerical explosions. The empirical convergence of online ensemble scores to a reference "batch" score was studied on five different datasets from which data streams were simulated, comparing six different processes to construct the online scores. For each score, 50 replications using a total of 10N observations (N being the size of the dataset) were performed to assess the convergence and the stability of the method, computing the mean and standard deviation of a convergence criterion. A complementary study using 100N observations was also performed. All tested processes on all datasets converged after N iterations, except for one process on one dataset. The best processes were averaged processes using online standardized data and a piecewise constant step-size.
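Two ingredients named above, online standardization and the Poisson bootstrap, can be sketched in a few lines. This is a simplified illustration, not the authors' procedure: a running mean stands in for the score parameters (the abstract's scores use linear discriminant analysis and logistic regression with stochastic approximation), and the names `OnlineStandardizer`, `poisson1` and `online_bagged_means` are hypothetical. The point it shows is that each new observation updates every bootstrap replicate with a Poisson(1) weight, so no past data needs to be stored.

```python
import math
import random

class OnlineStandardizer:
    """Running mean and variance (Welford's algorithm), updated one value at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        return math.sqrt(self.m2 / self.n) if self.n > 0 else 0.0

    def transform(self, x):
        s = self.std()
        return (x - self.mean) / s if s > 0 else 0.0

def poisson1(rng):
    """Draw from Poisson(1) with Knuth's multiplication method."""
    limit = math.exp(-1.0)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def online_bagged_means(stream, n_replicates=10, seed=0):
    """Poisson bootstrap: each replicate sees every observation k ~ Poisson(1) times.

    A weighted running mean stands in for the model update (an assumption
    made to keep the sketch short).
    """
    rng = random.Random(seed)
    sums = [0.0] * n_replicates
    counts = [0] * n_replicates
    for x in stream:
        for b in range(n_replicates):
            k = poisson1(rng)
            sums[b] += k * x
            counts[b] += k
    return [s / c for s, c in zip(sums, counts) if c > 0]

scaler = OnlineStandardizer()
for value in [1.0, 2.0, 3.0, 4.0]:
    scaler.update(value)
replicate_means = online_bagged_means([5.0] * 20)
```

Averaging the replicate estimates then plays the role of the ensemble aggregation, again only as a stand-in for the aggregated scores studied in the abstract.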
An ensemble prediction model of solar proton events (SPEs), combining the information of solar flares and coronal mass ejections (CMEs), is built. In this model, solar flares are parameterized by the peak flux, the duration and the longitude. In addition, CMEs are parameterized by the width, the speed and the measurement position angle. The importance of each parameter for the occurrence of SPEs is estimated by the information gain ratio. We find that the CME width and speed are more informative than the flare’s peak flux and duration. As the physical mechanism of SPEs is not very clear, a hidden naive Bayes approach, which is a probability-based calculation method from the field of machine learning, is used to build the prediction model from the observational data. As is known, SPEs originate from solar flares and/or shock waves associated with CMEs. Hence, we first build two base prediction models using the properties of solar flares and CMEs, respectively. Then the outputs of these models are combined to generate the ensemble prediction model of SPEs. The ensemble prediction model incorporating the complementary information of solar flares and CMEs achieves better performance than each base prediction model taken separately.
BACKGROUND: Despite the frequent progression from Parkinson's disease (PD) to Parkinson's disease dementia (PDD), the basis for diagnosing early-onset Parkinson dementia (EOPD) at an early stage is still insufficient. AIM: To explore the prediction accuracy of sociodemographic factors, Parkinson's motor symptoms, Parkinson's non-motor symptoms, and rapid eye movement sleep disorder for diagnosing EOPD using PD multicenter registry data. METHODS: This study analyzed 342 Parkinson patients (66 EOPD patients and 276 PD patients with normal cognition) younger than 65 years. An EOPD prediction model was developed using a random forest algorithm, and the accuracy of the developed model was compared with that of the naive Bayesian model and discriminant analysis. RESULTS: The overall accuracy of the random forest was 89.5%, higher than that of discriminant analysis (78.3%) and that of the naive Bayesian model (85.8%). In the random forest model, the Korean Mini Mental State Examination (K-MMSE) score, the Korean Montreal Cognitive Assessment (K-MoCA), the sum of boxes in the Clinical Dementia Rating (CDR), the global score of the CDR, the motor score of the Unified Parkinson's Disease Rating Scale (UPDRS), and the Korean Instrumental Activities of Daily Living (K-IADL) score were confirmed as the major variables with high weight for EOPD prediction. Among them, the K-MMSE score was the most important factor in the final model. CONCLUSION: Parkinson-related motor symptoms (e.g., motor score of the UPDRS) and instrumental daily performance (e.g., K-IADL score), in addition to cognitive screening indicators (e.g., K-MMSE score and K-MoCA score), were found to be predictors with high accuracy in EOPD prediction.
The response of the train-bridge system exhibits obvious random behavior. A high traffic density and a long maintenance period of a track result in a substantial increase in the number of trains running on a bridge, and there is a small likelihood that the maximum responses of the train and bridge occur within the total maintenance period of the track. First, the coupling model of train-bridge systems is reviewed. Then, an ensemble method is presented that can estimate the small probabilities of a dynamic system with stochastic excitations. The main idea of the ensemble method is to use the NARX (nonlinear autoregressive with exogenous input) model to replace the physical model and to apply subset simulation with splitting to obtain the extreme distribution. Finally, the efficiency of the suggested method is compared with that of the direct Monte Carlo simulation method, and the exceedance probability of train responses under vertical track irregularity is discussed. The results show that, when the small probability of train responses under vertical track irregularity is estimated, the ensemble method can reduce both the calculation time of a single sample and the required number of samples.
The extreme learning machine (ELM) has been shown to be an effective mechanism for pattern classification and regression learning. However, its good performance relies on a large number of hidden layer nodes, and as the number of nodes in the hidden layers increases, the computation cost increases greatly. In this paper, we propose a novel algorithm named the constrained voting extreme learning machine (CV-ELM). Compared with the traditional ELM, the CV-ELM determines the input weights and biases based on the differences between between-class samples. At the same time, voting selection is introduced to improve the accuracy of the proposed method. The proposed method is evaluated on public benchmark datasets. The experimental results show that the proposed algorithm is superior to the original ELM algorithm. Further, we apply the CV-ELM to the classification of the superheat degree (SD) state in the aluminum electrolysis industry, where the recognition accuracy reaches 87.4%, and the experimental results demonstrate that the proposed method is more robust than the existing state-of-the-art identification methods.
Anomaly detection in smart homes provides support to enhance the health and safety of people who live alone. In previous studies on this topic, little attention has been given to hybrid methods. This paper presents a two-step hybrid probabilistic anomaly detection model for the smart home. First, it employs various algorithms with different characteristics to detect anomalies in sensory data. Then, it aggregates their results using a Bayesian network. In this Bayesian network, abnormal events are detected by calculating the probability of abnormality given the anomaly detection results of the base methods. Experimental evaluation on a real dataset indicates the effectiveness of the proposed method, which reduces false positives and increases true positives.
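The aggregation step, computing the probability of abnormality given the outputs of the base detectors, can be illustrated with the simplest possible Bayesian network: one "abnormal" node whose children are the detector outputs, assumed conditionally independent. The abstract does not specify the network structure, so this naive structure, the function name `posterior_abnormal`, and all rates below are assumptions chosen for illustration.

```python
def posterior_abnormal(detections, prior, tpr, fpr):
    """P(abnormal | detector outputs) under a naive Bayes structure.

    detections: list of bools, one per base detector (True = flagged).
    prior:      assumed prior probability that an event is abnormal.
    tpr, fpr:   assumed per-detector true and false positive rates.
    """
    like_abnormal = prior
    like_normal = 1.0 - prior
    for flagged, tp, fp in zip(detections, tpr, fpr):
        like_abnormal *= tp if flagged else (1.0 - tp)
        like_normal *= fp if flagged else (1.0 - fp)
    return like_abnormal / (like_abnormal + like_normal)

# Hypothetical rates for two base detectors.
rates_tpr = [0.8, 0.8]
rates_fpr = [0.2, 0.2]
one_flag = posterior_abnormal([True], 0.5, [0.8], [0.2])
both_flag = posterior_abnormal([True, True], 0.5, rates_tpr, rates_fpr)
none_flag = posterior_abnormal([False, False], 0.5, rates_tpr, rates_fpr)
```

Agreement between detectors pushes the posterior well above what a single detector gives, which is one way such an aggregation can lower false positives when an event is reported only above a chosen posterior threshold.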
Background: With improvements in next-generation DNA sequencing technology, lower cost is needed to collect genetic data. More machine learning techniques can be used to help with cancer analysis and diagnosis. Methods: We developed an ensemble machine learning system named the performance-weighted-voting model for cancer type classification in 6,249 samples across 14 cancer types. Our ensemble system consists of five weak classifiers (logistic regression, SVM, random forest, XGBoost and neural networks). We first used cross-validation to get the predicted results for the five classifiers. The weights of the five weak classifiers are obtained from their predictive performance by solving linear regression functions. The final predicted probability of the performance-weighted-voting model for a cancer type is the sum, over the classifiers, of each classifier's weight multiplied by its predicted probability. Results: Using the somatic mutation count of each gene as the input feature, the overall accuracy of the performance-weighted-voting model reached 71.46%, which was significantly higher than that of the five weak classifiers and two other ensemble models: the hard-voting model and the soft-voting model. In addition, by analyzing the predictive pattern of the performance-weighted-voting model, we found that in most cancer types a higher tumor mutational burden can improve overall accuracy. Conclusion: This study has important clinical significance for identifying the origin of cancer, especially for cancers where the primary cannot be determined. In addition, our model presents a good strategy for using ensemble systems for cancer type classification.
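The combination rule stated in the Methods, the final probability for a cancer type as the sum of each classifier's weight times its predicted probability, can be written out as a short sketch. The weights and probabilities below are made-up toy values, and the step that derives the weights from cross-validated performance by linear regression is not reproduced here; `weighted_vote` and `predict_class` are hypothetical names.

```python
def weighted_vote(probabilities, weights):
    """Combine per-classifier class probabilities with per-classifier weights.

    probabilities: one list of class probabilities per classifier.
    weights:       one weight per classifier (assumed given, e.g. from a
                   prior fitting step that is not shown here).
    """
    n_classes = len(probabilities[0])
    combined = [0.0] * n_classes
    for probs, w in zip(probabilities, weights):
        for k in range(n_classes):
            combined[k] += w * probs[k]
    return combined

def predict_class(probabilities, weights):
    """Index of the class with the largest combined probability."""
    combined = weighted_vote(probabilities, weights)
    return max(range(len(combined)), key=combined.__getitem__)

# Three classifiers, two classes (toy values).
probs = [[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]
balanced = weighted_vote(probs, [0.5, 0.25, 0.25])
label_balanced = predict_class(probs, [0.5, 0.25, 0.25])
label_dominated = predict_class(probs, [0.9, 0.05, 0.05])
```

With equal weights this reduces to ordinary soft voting; the toy example shows how shifting weight towards one classifier can change the predicted class.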
Funding: This study was supported by the National Key Research and Development Program of China (No. 2017YFB0304100) and the Key Projects of the National Natural Science Foundation of China (No. 51634002).
Abstract: To address the insufficient prediction accuracy of strip flatness at the outlet of cold tandem rolling, the prediction performance of different ensemble methods was studied and a high-precision ensemble model for outlet strip flatness was established. First, bagging, boosting, and stacking ensemble experiments were carried out with linear regression (LR), K nearest neighbors (KNN), support vector regression, regression trees (RT), and backpropagation neural network (BPN) as base models. Second, three existing ensemble models, i.e., random forest, extremely randomized trees (ET), and extreme gradient boosting, were used to conduct experiments and compare the results. The results show that the bagging, boosting, and stacking ensemble methods improve the prediction accuracy of the regression trees model the most, by 5.28%, 6.51%, and 5.32%, respectively. The stacking ensemble method improves both simple and complex models, and its improvement is greatest for the simple base model, being 4.69% higher than that of the base model KNN. Among all of the ensemble models, the stacking model with level-1 learners (ET, AdaBoost-RT, LR, BPN) paired with a level-2 learner (LR) was found to be the best model (EALB-LR) and can be further studied for industrial applications.
Funding: Supported by the Excellent National State Key Laboratory Project (49823002), the National Key Project 'Study on Chinese Short
Abstract: An approach for extracting useful information for monthly dynamical prediction from ensemble forecasts is studied. The extended-range ensemble forecasts (8 members, with initial perturbations from the lagged average forecast (LAF) at 0000, 0600, 1200 and 1800 GMT on two consecutive days) of the 500 hPa height field with the global spectral model (T63L16) from January to May 1997 are provided by the National Climate Center of China. The relationship between the ensemble spread, measured by the root-mean-square deviation of the ensemble members from the ensemble mean, and forecast skill (the anomaly correlation or the root-mean-square distance between the ensemble mean forecast and the observation) is significant. The ensemble spread can be used to evaluate the number of useful forecast days N for the best estimate of the 30-day mean. Thus, a weighted mean approach based on ensemble spread is put forward for monthly dynamical prediction. The anomaly correlation of the monthly mean weighted by the ensemble spread is higher than that of both the arithmetic mean and the linear weighted mean. Better results for the monthly mean circulation and anomaly are obtained from the ensemble spread weighted mean. Key words: Monthly prediction - Ensemble method - Spread of ensemble. Supported by the Excellent National State Key Laboratory Project (49823002), the National Key Project 'Study on Chinese Short-Term Climate Forecast System' (96-908-02) and IAP Innovation Foundation (8-1308). The data were provided through the National Climate Center of China. The authors wish to thank Ms. Chen Lijuan for her assistance.
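The spread-weighted mean described in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the weighting formula, so the inverse-spread weights below are an assumption made purely for illustration, and each forecast day is reduced to a single scalar per member instead of a full 500 hPa field. The function names and the toy numbers are hypothetical.

```python
import math


def ensemble_mean(members):
    """Arithmetic mean of the ensemble members for one forecast day."""
    return sum(members) / len(members)


def ensemble_spread(members):
    """Root-mean-square deviation of the members from the ensemble mean."""
    mean = ensemble_mean(members)
    return math.sqrt(sum((x - mean) ** 2 for x in members) / len(members))


def spread_weighted_mean(daily_members):
    """Time mean of daily ensemble means, weighted by inverse spread.

    ASSUMPTION: weights proportional to 1 / spread. The source only says
    the weights are based on the ensemble spread.
    """
    means = [ensemble_mean(day) for day in daily_members]
    weights = [1.0 / (ensemble_spread(day) + 1e-9) for day in daily_members]
    total = sum(weights)
    return sum(w * m for w, m in zip(weights, means)) / total


# Toy data: day 1 has a tight ensemble, day 2 a widely spread one.
days = [
    [5.0, 5.2, 4.8, 5.0],    # mean 5.0, small spread
    [2.0, 10.0, 6.0, 14.0],  # mean 8.0, large spread
]
arithmetic = sum(ensemble_mean(day) for day in days) / len(days)
weighted = spread_weighted_mean(days)
```

In this toy case the weighted mean stays close to the low-spread day, whereas the arithmetic mean treats both days equally.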
Abstract: Selecting which explanatory variables to include in a given score is a common difficulty, as a balance must be found between statistical fit and practical application. This article presents a methodology for constructing parsimonious event risk scores combining a stepwise selection of variables with ensemble scores obtained by aggregation of several scores, using several classifiers, bootstrap samples and various modalities of random selection of variables. Selection methods based on a probabilistic model can be used to achieve a stepwise selection for a given classifier such as logistic regression, but not directly for an ensemble classifier constructed by aggregation of several classifiers. Three selection methods are proposed in this framework, two involving a backward selection of the variables based on their coefficients in an ensemble score and the third involving a forward selection of the variables maximizing the AUC. The stepwise selection allows constructing a succession of scores, with the practitioner able to choose which score best fits their needs. These three methods are compared in an application to construct parsimonious short-term event risk scores in chronic heart failure (HF) patients, using as event the composite endpoint of death or hospitalization for worsening HF within 180 days of a visit. Focusing on the fastest method, four scores are constructed, yielding out-of-bag AUCs ranging from 0.81 (26 variables) to 0.76 (2 variables).
Funding: Supported in part by NSFC (No. 62202275) and Shandong-SF (No. ZR2022QF012) projects.
Abstract: In recent years, deep learning (DL) models have achieved significant progress in many domains, such as autonomous driving, facial recognition, and speech recognition. However, the vulnerability of deep learning models to adversarial attacks has raised serious concerns in the community because of their insufficient robustness and generalization. Also, transferable attacks have become a prominent method for black-box attacks. In this work, we explore the potential factors that impact the transferability of adversarial examples (AEs) in DL-based speech recognition. We also discuss the vulnerability of different DL systems and the irregular nature of decision boundaries. Our results show a remarkable difference in the transferability of AEs between speech and images, with the data relevance being low in images but the opposite in speech recognition. Motivated by dropout-based ensemble approaches, we propose random gradient ensembles and dynamic gradient-weighted ensembles, and we evaluate the impact of ensembles on the transferability of AEs. The results show that the AEs created by both approaches are valid for transfer to the black-box API.
Funding: Supported by the Taishan Scholars (No. ts201712003).
Abstract: Nitrogen dioxide (NO₂) poses a critical potential risk to environmental quality and public health. A reliable machine learning (ML) forecasting framework will be useful to provide valuable information to support government decision-making. Based on the data from 1609 air quality monitors across China from 2014 to 2020, this study designed an ensemble ML model by integrating multiple types of spatial-temporal variables and three sub-models for time-sensitive prediction over a wide range. The ensemble ML model incorporates a residual connection into the gated recurrent unit (GRU) network and combines the advantages of Transformer, extreme gradient boosting (XGBoost) and the GRU with residual connection network, resulting in a 4.1%±1.0% lower root mean square error than XGBoost for the test results. The ensemble model shows strong prediction performance, with coefficients of determination of 0.91, 0.86, and 0.77 for 1-hr, 3-hr, and 24-hr averages for the test results, respectively. In particular, this model has achieved excellent performance with low spatial uncertainty in Central, East, and North China, the major site-dense zones. Through the interpretability analysis based on the Shapley value for different temporal resolutions, we found that the contribution of atmospheric chemical processes is more important for hourly predictions than for daily-scale predictions, while the impact of meteorological conditions is more prominent for the latter. Compared with existing models for different spatiotemporal scales, the present model can be implemented at any air quality monitoring station across China to facilitate rapid and dependable forecasts of NO₂, which will help in developing effective control policies.
Funding: Partially sponsored by the Natural Science Foundation of Shanghai (No. 23ZR1429300) and the Innovation Fund of CNNC (Lingchuang Fund).
Abstract: The estimation of model parameters is an important subject in engineering. In this area of work, the prevailing approach is to estimate or calculate these as deterministic parameters. In this study, we consider the model parameters from the perspective of random variables and describe the general form of the parameter distribution inference problem. Under this framework, we propose an ensemble Bayesian method by introducing Bayesian inference and the Markov chain Monte Carlo (MCMC) method. Experiments on a finite cylindrical reactor and a 2D IAEA benchmark problem show that the proposed method converges quickly and can estimate parameters effectively, even for several correlated parameters simultaneously. Our experiments include cases of engineering software calls, demonstrating that the method can be applied to engineering, such as nuclear reactor engineering.
Abstract: The use of machine learning algorithms to identify characteristics of Distributed Denial of Service (DDoS) attacks has emerged as a powerful approach in cybersecurity. DDoS attacks, which aim to overwhelm a network or service with a flood of malicious traffic, pose significant threats to online systems. Traditional methods of detection and mitigation often struggle to keep pace with the evolving nature of these attacks. Machine learning, with its ability to analyze vast amounts of data and recognize patterns, offers a robust solution to this challenge. The aim of the paper is to demonstrate the application of ensemble ML algorithms, namely K-Means and KNN, as a dual clustering mechanism used with PySpark to reach 99% accuracy. The algorithms, when used together, identify distinctive features of DDoS attacks that closely reflect reality, so they are a good combination for this aim. After preprocessing of the data, both algorithms on the PySpark foundation achieved 99% accuracy when tuned on the features of a large DDoS dataset. The semi-supervised dataset tabulates traffic anomalies in terms of packet size distribution in correlation with Flow Duration. By training K-Means clustering and then applying KNN to the dataset, the algorithms learn to evaluate the character of activity to a greater degree by displaying density with ease. The study evaluates the effectiveness of K-Means clustering with KNN as ensemble algorithms that adapt very well to detecting complex patterns. Ultimately, results across environments indicate that ML-based approaches significantly improve detection rates compared to traditional methods. Furthermore, ensemble learning methods, which combine two or more models to improve prediction accuracy, perform well in handling the complexity and variability of big datasets, especially when implemented with PySpark.
The findings suggest that the enhancement of accuracy derives from newer software that is designed to reflect reality. However, challenges remain in the deployment of these systems, including the need for large, high-quality datasets and the potential for adversarial attacks that attempt to deceive the ML models. Future research should continue to improve the robustness and efficiency of combining algorithms, as well as integrate them with existing security frameworks to provide comprehensive protection against DDoS attacks and in other areas. The dataset was originally created by the University of New Brunswick to analyze DDoS data. The dataset itself was based on logs of the university's servers, which recorded various DoS attacks over the publicly available period, yielding 80 attributes in total and a size of 6.40 GB. In this dataset, the binary label column is a very important part of the final classification: this last column differentiates normal traffic from attack traffic. This leaves room for further analysis. Finally, malicious traffic alert software, as an example, should be trained on the dependence of packet influx on Flow Duration, which gives a mathematical basis for working with averages. In achieving such high accuracy, the project acts as an illustration (referenced in the form of excerpts from my Google Colab account) of many attempts to tune. Cybersecurity calls for more work on the character of brute-force attack traffic and normal traffic features overall, since most of our investments as humans are digitally based in work, recreational, and social environments.
Funding: Supported by the Natural Science Basic Research Program of Shaanxi (Program No. 2024JC-YBMS-026).
Abstract: When dealing with imbalanced datasets, the traditional support vector machine (SVM) tends to produce a classification hyperplane that is biased towards the majority class and exhibits poor robustness. This paper proposes a high-performance classification algorithm specifically designed for imbalanced datasets. The proposed method first uses a biased second-order cone programming support vector machine (B-SOCP-SVM) to identify the support vectors (SVs) and non-support vectors (NSVs) in the imbalanced data. Then, it applies the synthetic minority over-sampling technique (SV-SMOTE) to oversample the support vectors of the minority class and uses the random under-sampling technique (NSV-RUS) multiple times to undersample the non-support vectors of the majority class. Combining the resulting minority class dataset with the multiple majority class datasets yields multiple new balanced datasets. Finally, SOCP-SVM is used to classify each dataset, and the final result is obtained through the integrated algorithm. Experimental results demonstrate that the proposed method performs excellently on imbalanced datasets.
Funding: Supported by the National Natural Science Foundation of China (Grant No. 12074329) and the Nanhu Scholars Program for Young Scholars of Xinyang Normal University.
Abstract: With a three-dimensional semiclassical ensemble method, we theoretically investigated the nonsequential double ionization of Ar driven by spatially inhomogeneous few-cycle negatively chirped laser pulses. Our results show that the recollision time window can be precisely controlled within an isolated time interval of several hundred attoseconds, which is useful for understanding the subcycle correlated electron dynamics. More interestingly, the correlated electron momentum distribution (CEMD) exhibits a strong dependence on laser intensity. That is, at lower laser intensity, the CEMD is located in the first quadrant. As the laser intensity increases, the CEMD shifts almost completely to the second and fourth quadrants, and then gradually to the third quadrant. The underlying physics governing the CEMD's dependence on laser intensity is explained.
Abstract: Reservoir identification and production prediction are two of the most important tasks in petroleum exploration and development. Machine learning (ML) methods are used for petroleum-related studies, but have not been applied to reservoir identification and production prediction based on reservoir identification. Production forecasting studies are typically based on overall reservoir thickness and lack accuracy when reservoirs contain a water or dry layer without oil production. In this paper, a systematic ML method was developed using classification models for reservoir identification and regression models for production prediction. The production models are based on the reservoir identification results. For reservoir identification, seven optimized ML methods were used: four typical single ML methods and three ensemble ML methods. These methods classify the reservoir into five types of layers: water, dry and three levels of oil (I oil layer, II oil layer, III oil layer). The validation and test results of these seven optimized ML methods suggest that the three ensemble methods perform better than the four single ML methods in reservoir identification. XGBoost produced the model with the highest accuracy, up to 99%. The effective thickness of the I and II oil layers determined during the reservoir identification was fed into the models for predicting production. Effective thickness considers the distribution of the water and the oil, resulting in a more reasonable production prediction compared to predictions based on the overall reservoir thickness. To validate the superiority of the ML methods, reference models using overall reservoir thickness were built for comparison. The models based on effective thickness outperformed the reference models in every evaluation metric. The prediction accuracy of the ML models using effective thickness was 10% higher than that of the reference model. Without the personal error or data distortion existing in traditional methods, this novel system realizes rapid analysis of data while reducing the time required to resolve reservoir classification and production prediction challenges. The ML models using the effective thickness obtained from reservoir identification were more accurate when predicting oil production compared to previous studies which use overall reservoir thickness.
Funding: This work was jointly sponsored by the National Natural Science Foundation of China [grant number 42075148], the Outreach Projects of the State Key Laboratory of Severe Weather [grant number 2021LASWA08], and the Outreach Projects of the Key Laboratory of Meteorological Disaster [grant number KLME202209], and supported by the High-Performance Computing Center of Nanjing University of Information Science and Technology (NUIST).
Abstract: Background error covariance (BEC) plays an essential role in variational data assimilation. Most variational data assimilation systems still use static BEC. In fact, the characteristics of BEC vary with the season, day, and even hour of the background. National Meteorological Center-based diurnally varying BECs have been proposed, but the diurnal variation characteristics were derived from climatic samples. Ensemble methods can obtain the background error characteristics that suit the samples at the current moment. Therefore, to obtain more reasonable diurnally varying BECs, in this study, ensemble-based diurnally varying BECs are generated and the diurnal variation characteristics are discussed. Their impacts are then evaluated by cycling data assimilation and forecasting experiments for a week based on the operational China Meteorological Administration-Beijing system. Clear diurnal variation in the standard deviation of ensemble forecasts and ensemble-based BECs can be identified, consistent with the diurnal variation characteristics of the atmosphere. The results of one-week cycling data assimilation and forecasting show that the application of diurnally varying BECs reduces the RMSEs in the analysis and 6-h forecast. Detailed analysis of a convective rainfall case shows that the distribution of the accumulated precipitation forecast using the diurnally varying BECs is closer to the observation than that using the static BEC. In addition, the cycle-averaged precipitation scores at all magnitudes are improved, especially for heavy precipitation, indicating the potential of using diurnally varying BEC in operational applications.
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 10974068 and 10574057).
Abstract: This paper uses the classical ensemble method to study the double ionization of a 2-dimensional (2D) model helium atom interacting with an elliptically polarized laser pulse. The classical ensemble calculation demonstrates that the ratio of double to single ionization decreases with increasing ellipticity of the driving field. The classical scenario shows that there are hardly any e-e recollisions with the circularly polarized laser pulse. The double ionization probability is studied for linearly and circularly polarized laser pulses. The classical numerical results are consistent with the semiclassical rescattering mechanism and in qualitative agreement with the experimental results and the quantum calculations.
Funding: Supported by the National Key R&D Program of China (No. 2020YFB2104402).
Abstract: Turner syndrome (TS) is a chromosomal disorder that only affects the growth of female patients. Prompt diagnosis is of high significance for the patients. However, clinical screening methods are time-consuming and costly. Some researchers have used machine learning-based methods to detect TS, but their performance needs to be improved. Therefore, we propose an ensemble method of two-path capsule networks (CapsNets) for detecting TS based on global-local facial images. Specifically, the TS facial images are preprocessed and segmented into eight local parts under the direction of physicians; then, nine two-path CapsNets are respectively trained using the complete TS facial images and the eight local images, in which few-shot learning is utilized to solve the problem of limited data; finally, a probability-based ensemble method is exploited to combine the nine classifiers for the classification of TS. By studying the base classifiers, we find that two meaningful facial areas are more related to TS patients, i.e., the eyes and the nose. The results demonstrate that the proposed model is effective for the TS classification task, achieving the highest accuracy of 0.9241.
Abstract: The present aim is to update, upon arrival of new learning data, the parameters of a score constructed with an ensemble method involving linear discriminant analysis and logistic regression in an online setting, without the need to store all of the previously obtained data. Poisson bootstrap and stochastic approximation processes, whose convergence has been established theoretically, were used with online standardized data to avoid numerical explosions. The empirical convergence of online ensemble scores to a reference "batch" score was studied on five different datasets from which data streams were simulated, comparing six different processes to construct the online scores. For each score, 50 replications using a total of 10N observations (N being the size of the dataset) were performed to assess the convergence and the stability of the method, computing the mean and standard deviation of a convergence criterion. A complementary study using 100N observations was also performed. All tested processes on all datasets converged after N iterations, except for one process on one dataset. The best processes were averaged processes using online standardized data and a piecewise constant step-size.
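The Poisson bootstrap mentioned in this abstract can be sketched in a few lines of Python. The sketch below is not the authors' procedure: it replaces the discriminant analysis and logistic regression scores with a plain running mean, and only shows how each bootstrap replicate can be updated online by drawing a Poisson(1) weight for every incoming observation, without storing past data. The class and function names are hypothetical, and the Poisson sampler uses Knuth's method because the standard `random` module has no Poisson generator.

```python
import math
import random


def poisson_one(rng):
    """Draw from a Poisson(1) distribution using Knuth's multiplication method."""
    limit = math.exp(-1.0)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


class OnlineBaggedMean:
    """Online bagging of a running mean (ASSUMPTION: stands in for a real score)."""

    def __init__(self, n_replicates, seed=0):
        self.rng = random.Random(seed)
        self.counts = [0] * n_replicates
        self.means = [0.0] * n_replicates

    def update(self, x):
        # Each replicate sees the new observation k ~ Poisson(1) times.
        for i in range(len(self.means)):
            k = poisson_one(self.rng)
            if k == 0:
                continue
            self.counts[i] += k
            self.means[i] += k * (x - self.means[i]) / self.counts[i]

    def estimate(self):
        # Aggregate the replicates that have seen at least one observation.
        active = [m for m, c in zip(self.means, self.counts) if c > 0]
        return sum(active) / len(active) if active else 0.0


# Simulated data stream; the batch mean plays the role of the reference score.
stream = [i % 10 for i in range(2000)]
model = OnlineBaggedMean(n_replicates=10, seed=42)
for value in stream:
    model.update(value)
online_estimate = model.estimate()
batch_mean = sum(stream) / len(stream)
```

Comparing `online_estimate` with `batch_mean` mirrors, in a very reduced form, the comparison of online ensemble scores with a reference batch score.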
Funding: Supported by the Young Researcher Grant of National Astronomical Observatories, Chinese Academy of Sciences, the National Basic Research Program of China (973 Program, Grant No. 2011CB811406), and the National Natural Science Foundation of China (Grant Nos. 10733020, 10921303, 11003026 and 11078010).
Abstract: An ensemble prediction model of solar proton events (SPEs), combining the information of solar flares and coronal mass ejections (CMEs), is built. In this model, solar flares are parameterized by the peak flux, the duration and the longitude. In addition, CMEs are parameterized by the width, the speed and the measurement position angle. The importance of each parameter for the occurrence of SPEs is estimated by the information gain ratio. We find that the CME width and speed are more informative than the flare's peak flux and duration. As the physical mechanism of SPEs is not very clear, a hidden naive Bayes approach, which is a probability-based calculation method from the field of machine learning, is used to build the prediction model from the observational data. As is known, SPEs originate from solar flares and/or shock waves associated with CMEs. Hence, we first build two base prediction models using the properties of solar flares and CMEs, respectively. Then the outputs of these models are combined to generate the ensemble prediction model of SPEs. The ensemble prediction model incorporating the complementary information of solar flares and CMEs achieves better performance than each base prediction model taken separately.
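The information gain ratio used in this abstract to rank parameters can be illustrated with a short Python sketch. It follows the standard definition (information gain divided by the split information of the feature) and assumes the parameters have already been discretized. The feature names and the toy event list are invented for illustration and are not taken from the paper's data.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by split information."""
    n = len(labels)
    groups = {}
    for f, y in zip(feature, labels):
        groups.setdefault(f, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy sample: 1 = SPE occurred, 0 = no SPE.
spe = [1, 1, 1, 0, 0, 0, 0, 0]
cme_speed = ["fast", "fast", "fast", "slow", "slow", "slow", "slow", "slow"]
flare_duration = ["long", "short", "long", "short", "long", "short", "long", "short"]

ratio_cme = gain_ratio(cme_speed, spe)
ratio_flare = gain_ratio(flare_duration, spe)
```

In this constructed sample the CME feature separates the events perfectly and therefore receives the higher gain ratio; real rankings depend entirely on the observational data.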
Funding: Supported by the Basic Science Research Program through the National Research Foundation of Korea funded by the Ministry of Education, Nos. NRF-2018R1D1A1B07041091 and NRF-2019S1A5A8034211.
Abstract: BACKGROUND: Despite the frequent progression from Parkinson's disease (PD) to Parkinson's disease dementia (PDD), the basis for diagnosing early-onset Parkinson dementia (EOPD) at an early stage is still insufficient. AIM: To explore the prediction accuracy of sociodemographic factors, Parkinson's motor symptoms, Parkinson's non-motor symptoms, and rapid eye movement sleep disorder for diagnosing EOPD using PD multicenter registry data. METHODS: This study analyzed 342 Parkinson patients (66 EOPD patients and 276 PD patients with normal cognition) younger than 65 years. An EOPD prediction model was developed using a random forest algorithm, and the accuracy of the developed model was compared with the naive Bayesian model and discriminant analysis. RESULTS: The overall accuracy of the random forest was 89.5%, higher than that of discriminant analysis (78.3%) and that of the naive Bayesian model (85.8%). In the random forest model, the Korean Mini Mental State Examination (K-MMSE) score, Korean Montreal Cognitive Assessment (K-MoCA), sum of boxes in Clinical Dementia Rating (CDR), global score of CDR, motor score of the Unified Parkinson's Disease Rating Scale (UPDRS), and Korean Instrumental Activities of Daily Living (K-IADL) score were confirmed as the major variables with high weight for EOPD prediction. Among them, the K-MMSE score was the most important factor in the final model. CONCLUSION: Parkinson-related motor symptoms (e.g., motor score of UPDRS) and instrumental daily performance (e.g., K-IADL score), in addition to cognitive screening indicators (e.g., K-MMSE score and K-MoCA score), were found to be predictors with high accuracy in EOPD prediction.
Funding: This work was financially supported by the National Natural Science Foundation of China (Nos. 51978589, 51778544, and 51525804).
Abstract: The response of the train–bridge system has an obvious random behavior. A high traffic density and a long maintenance period of a track will result in a substantial increase in the number of trains running on a bridge, and there is a small likelihood that the maximum responses of the train and bridge happen in the total maintenance period of the track. Firstly, the coupling model of train–bridge systems is reviewed. Then, an ensemble method is presented, which can estimate the small probabilities of a dynamic system with stochastic excitations. The main idea of the ensemble method is to use the NARX (nonlinear autoregressive with exogenous input) model to replace the physical model and apply subset simulation with splitting to obtain the extreme distribution. Finally, the efficiency of the suggested method is compared with the direct Monte Carlo simulation method, and the exceedance probability of train responses under the vertical track irregularity is discussed. The results show that when the small probability of train responses under vertical track irregularity is estimated, the ensemble method can reduce both the calculation time of a single sample and the required number of samples.
Funding: Supported by the National Natural Science Foundation of China (61773405, 61751312) and the Major Scientific and Technological Innovation Projects of Shandong Province (2019JZZY020123).
Abstract: Extreme learning machine (ELM) has been proved to be an effective pattern classification and regression learning mechanism by researchers. However, its good performance relies on a large number of hidden layer nodes. As the number of nodes in the hidden layers increases, the computation cost increases greatly. In this paper, we propose a novel algorithm, named constrained voting extreme learning machine (CV-ELM). Compared with the traditional ELM, the CV-ELM determines the input weight and bias based on the differences of between-class samples. At the same time, to improve the accuracy of the proposed method, voting selection is introduced. The proposed method is evaluated on public benchmark datasets. The experimental results show that the proposed algorithm is superior to the original ELM algorithm. Further, we apply the CV-ELM to the classification of superheat degree (SD) state in the aluminum electrolysis industry, where the recognition accuracy rate reaches 87.4%, and the experimental results demonstrate that the proposed method is more robust than the existing state-of-the-art identification methods.
Abstract: Anomaly detection in smart homes provides support to enhance the health and safety of people who live alone. In previous studies on this topic, less attention has been given to hybrid methods. This paper presents a two-step hybrid probabilistic anomaly detection model for the smart home. First, it employs various algorithms with different characteristics to detect anomalies from sensory data. Then, it aggregates their results using a Bayesian network. In this Bayesian network, abnormal events are detected by calculating the probability of abnormality given the anomaly detection results of the base methods. Experimental evaluation on a real dataset indicates the effectiveness of the proposed method, which reduces false positives and increases true positives.
Abstract: Background: With improvements in next-generation DNA sequencing technology, lower cost is needed to collect genetic data. More machine learning techniques can be used to help with cancer analysis and diagnosis. Methods: We developed an ensemble machine learning system named the performance-weighted-voting model for cancer type classification in 6,249 samples across 14 cancer types. Our ensemble system consists of five weak classifiers (logistic regression, SVM, random forest, XGBoost and neural networks). We first used cross-validation to get the predicted results for the five classifiers. The weights of the five weak classifiers can be obtained based on their predictive performance by solving linear regression functions. The final predicted probability of the performance-weighted-voting model for a cancer type can be determined by the summation of each classifier's weight multiplied by its predicted probability. Results: Using the somatic mutation count of each gene as the input feature, the overall accuracy of the performance-weighted-voting model reached 71.46%, which was significantly higher than that of the five weak classifiers and two other ensemble models: the hard-voting model and the soft-voting model. In addition, by analyzing the predictive pattern of the performance-weighted-voting model, we found that in most cancer types, higher tumor mutational burden can improve overall accuracy. Conclusion: This study has important clinical significance for identifying the origin of cancer, especially for those where the primary cannot be determined. In addition, our model presents a good strategy for using ensemble systems for cancer type classification.
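The final combination step described in this abstract, where each classifier's predicted probability is multiplied by its weight and the products are summed, can be sketched in Python as follows. The weights and probabilities below are hypothetical placeholders: in the study the weights come from solving linear regression functions on cross-validated predictions, a step this sketch does not reproduce, and only three classifiers and three classes are shown for brevity.

```python
def weighted_vote(probabilities, weights):
    """Sum each classifier's class probabilities multiplied by its weight.

    probabilities: one list of per-class probabilities per classifier.
    weights: one weight per classifier (ASSUMPTION: given, not fitted here).
    """
    n_classes = len(probabilities[0])
    combined = [0.0] * n_classes
    for probs, weight in zip(probabilities, weights):
        for class_index, p in enumerate(probs):
            combined[class_index] += weight * p
    return combined


# Hypothetical outputs of three classifiers for one sample and three cancer types.
probs = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.6, 0.3, 0.1],
]
weights = [0.5, 0.2, 0.3]  # hypothetical performance-based weights

scores = weighted_vote(probs, weights)
predicted = max(range(len(scores)), key=scores.__getitem__)
```

With equal weights this reduces to ordinary soft voting, which is one of the baselines the abstract compares against.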