In this study,we conducted an experiment to construct multi-model ensemble(MME)predictions for the El Niño-Southern Oscillation(ENSO)using a neural network,based on hindcast data released from five coupled oceana...In this study,we conducted an experiment to construct multi-model ensemble(MME)predictions for the El Niño-Southern Oscillation(ENSO)using a neural network,based on hindcast data released from five coupled oceanatmosphere models,which exhibit varying levels of complexity.This nonlinear approach demonstrated extraordinary superiority and effectiveness in constructing ENSO MME.Subsequently,we employed the leave-one-out crossvalidation and the moving base methods to further validate the robustness of the neural network model in the formulation of ENSO MME.In conclusion,the neural network algorithm outperforms the conventional approach of assigning a uniform weight to all models.This is evidenced by an enhancement in correlation coefficients and reduction in prediction errors,which have the potential to provide a more accurate ENSO forecast.展开更多
Orthogonal conditional nonlinear optimal perturbations(O-CNOPs)have been used to generate ensemble forecasting members for achieving high forecasting skill of high-impact weather and climate events.However,highly effi...Orthogonal conditional nonlinear optimal perturbations(O-CNOPs)have been used to generate ensemble forecasting members for achieving high forecasting skill of high-impact weather and climate events.However,highly efficient calculations for O-CNOPs are still challenging in the field of ensemble forecasting.In this study,we combine a gradient-based iterative idea with the Gram‒Schmidt orthogonalization,and propose an iterative optimization method to compute O-CNOPs.This method is different from the original sequential optimization method,and allows parallel computations of O-CNOPs,thus saving a large amount of computational time.We evaluate this method by using the Lorenz-96 model on the basis of the ensemble forecasting ability achieved and on the time consumed for computing O-CNOPs.The results demonstrate that the parallel iterative method causes O-CNOPs to yield reliable ensemble members and to achieve ensemble forecasting skills similar to or even slightly higher than those produced by the sequential method.Moreover,the parallel method significantly reduces the computational time for O-CNOPs.Therefore,the parallel iterative method provides a highly effective and efficient approach for calculating O-CNOPs for ensemble forecasts.Expectedly,it can play an important role in the application of the O-CNOPs to realistic ensemble forecasts for high-impact weather and climate events.展开更多
Non-technical losses(NTL)of electric power are a serious problem for electric distribution companies.The solution determines the cost,stability,reliability,and quality of the supplied electricity.The widespread use of...Non-technical losses(NTL)of electric power are a serious problem for electric distribution companies.The solution determines the cost,stability,reliability,and quality of the supplied electricity.The widespread use of advanced metering infrastructure(AMI)and Smart Grid allows all participants in the distribution grid to store and track electricity consumption.During the research,a machine learning model is developed that allows analyzing and predicting the probability of NTL for each consumer of the distribution grid based on daily electricity consumption readings.This model is an ensemble meta-algorithm(stacking)that generalizes the algorithms of random forest,LightGBM,and a homogeneous ensemble of artificial neural networks.The best accuracy of the proposed meta-algorithm in comparison to basic classifiers is experimentally confirmed on the test sample.Such a model,due to good accuracy indicators(ROC-AUC-0.88),can be used as a methodological basis for a decision support system,the purpose of which is to form a sample of suspected NTL sources.The use of such a sample will allow the top management of electric distribution companies to increase the efficiency of raids by performers,making them targeted and accurate,which should contribute to the fight against NTL and the sustainable development of the electric power industry.展开更多
Aiming at the problem of insufficient prediction accuracy of strip flatness at the outlet of cold tandem rolling,the prediction performance of strip flatness based on different ensemble methods was studied and a high-...Aiming at the problem of insufficient prediction accuracy of strip flatness at the outlet of cold tandem rolling,the prediction performance of strip flatness based on different ensemble methods was studied and a high-precision prediction ensemble model of strip flatness at the outlet was established.Firstly,based on linear regression(LR),K nearest neighbors(KNN),support vector regression,regression trees(RT),and backpropagation neural network(BPN),bagging,boosting,and stacking ensemble methods were used for ensemble experiments.Secondly,three existing ensemble models,i.e.,random forest,extreme random tree(ET)and extreme gradient boosting,were used to conduct experiments and compare the results.The research shows that bagging,boosting,and stacking three ensemble methods have the most significant improvement in the prediction accuracy of the regression trees model,which is increased by 5.28%,6.51%,and 5.32%,respectively.At the same time,the stacking ensemble method improves both the simple model and the complex model,and the improvement effect on the simple base model is the greatest,which is 4.69%higher than that of the base model KNN.Comparing all of the ensemble models,the stacking ensemble model of level-1(ET,AdaBoost-RT,LR,BPN)paired with level-2(LR)was discovered to be the best model(EALB-LR)and can be further studied for industrial applications.展开更多
Advanced DriverAssistance Systems(ADAS)technologies can assist drivers or be part of automatic driving systems to support the driving process and improve the level of safety and comfort on the road.Traffic Sign Recogn...Advanced DriverAssistance Systems(ADAS)technologies can assist drivers or be part of automatic driving systems to support the driving process and improve the level of safety and comfort on the road.Traffic Sign Recognition System(TSRS)is one of themost important components ofADAS.Among the challengeswith TSRS is being able to recognize road signs with the highest accuracy and the shortest processing time.Accordingly,this paper introduces a new real time methodology recognizing Speed Limit Signs based on a trio of developed modules.Firstly,the Speed Limit Detection(SLD)module uses the Haar Cascade technique to generate a new SL detector in order to localize SL signs within captured frames.Secondly,the Speed Limit Classification(SLC)module,featuring machine learning classifiers alongside a newly developed model called DeepSL,harnesses the power of a CNN architecture to extract intricate features from speed limit sign images,ensuring efficient and precise recognition.In addition,a new Speed Limit Classifiers Fusion(SLCF)module has been developed by combining trained ML classifiers and the DeepSL model by using the Dempster-Shafer theory of belief functions and ensemble learning’s voting technique.Through rigorous software and hardware validation processes,the proposedmethodology has achieved highly significant F1 scores of 99.98%and 99.96%for DS theory and the votingmethod,respectively.Furthermore,a prototype encompassing all components demonstrates outstanding reliability and efficacy,with processing times of 150 ms for the Raspberry Pi board and 81.5 ms for the Nano Jetson board,marking a significant advancement in TSRS technology.展开更多
Nitrogen dioxide(NO_(2))poses a critical potential risk to environmental quality and public health.A reliable machine learning(ML)forecasting framework will be useful to provide valuable information to support governm...Nitrogen dioxide(NO_(2))poses a critical potential risk to environmental quality and public health.A reliable machine learning(ML)forecasting framework will be useful to provide valuable information to support government decision-making.Based on the data from1609 air quality monitors across China from 2014-2020,this study designed an ensemble ML model by integrating multiple types of spatial-temporal variables and three sub-models for time-sensitive prediction over a wide range.The ensemble ML model incorporates a residual connection to the gated recurrent unit(GRU)network and adopts the advantage of Transformer,extreme gradient boosting(XGBoost)and GRU with residual connection network,resulting in a 4.1%±1.0%lower root mean square error over XGBoost for the test results.The ensemble model shows great prediction performance,with coefficient of determination of 0.91,0.86,and 0.77 for 1-hr,3-hr,and 24-hr averages for the test results,respectively.In particular,this model has achieved excellent performance with low spatial uncertainty in Central,East,and North China,the major site-dense zones.Through the interpretability analysis based on the Shapley value for different temporal resolutions,we found that the contribution of atmospheric chemical processes is more important for hourly predictions compared with the daily scale predictions,while the impact of meteorological conditions would be ever-prominent for the latter.Compared with existing models for different spatiotemporal scales,the present model can be implemented at any air quality monitoring station across China to facilitate achieving rapid and dependable forecast of NO_(2),which will help developing effective control policies.展开更多
The 21-yr ensemble predictions of model precipitation and circulation in the East Asian and western North Pacific (Asia-Pacific) summer monsoon region (0°-50°N, 100° 150°E) were evaluated in ni...The 21-yr ensemble predictions of model precipitation and circulation in the East Asian and western North Pacific (Asia-Pacific) summer monsoon region (0°-50°N, 100° 150°E) were evaluated in nine different AGCM, used in the Asia-Pacific Economic Cooperation Climate Center (APCC) multi-model ensemble seasonal prediction system. The analysis indicates that the precipitation anomaly patterns of model ensemble predictions are substantially different from the observed counterparts in this region, but the summer monsoon circulations are reasonably predicted. For example, all models can well produce the interannual variability of the western North Pacific monsoon index (WNPMI) defined by 850 hPa winds, but they failed to predict the relationship between WNPMI and precipitation anomalies. The interannual variability of the 500 hPa geopotential height (GPH) can be well predicted by the models in contrast to precipitation anomalies. On the basis of such model performances and the relationship between the interannual variations of 500 hPa GPH and precipitation anomalies, we developed a statistical scheme used to downscale the summer monsoon precipitation anomaly on the basis of EOF and singular value decomposition (SVD). In this scheme, the three leading EOF modes of 500 hPa GPH anomaly fields predicted by the models are firstly corrected by the linear regression between the principal components in each model and observation, respectively. Then, the corrected model GPH is chosen as the predictor to downscale the precipitation anomaly field, which is assembled by the forecasted expansion coefficients of model 500 hPa GPH and the three leading SVD modes of observed precipitation anomaly corresponding to the prediction of model 500 hPa GPH during a 19-year training period. The cross-validated forecasts suggest that this downscaling scheme may have a potential to improve the forecast skill of the precipitation anomaly in the South China Sea, western North Pacific and the East Asia Pacific regions, where the anomaly correlation coefficient (ACC) has been improved by 0.14, corresponding to the reduced RMSE of 10.4% in the conventional multi-model ensemble (MME) forecast.展开更多
This study investigates multi-model ensemble forecasts of track and intensity of tropical cyclones over the western Pacific, based on forecast outputs from the China Meteorological Administration, European Centre for ...This study investigates multi-model ensemble forecasts of track and intensity of tropical cyclones over the western Pacific, based on forecast outputs from the China Meteorological Administration, European Centre for Medium-Range Weather Forecasts, Japan Meteorological Agency and National Centers for Environmental Prediction in the THORPEX Interactive Grand Global Ensemble(TIGGE) datasets. The multi-model ensemble schemes, namely the bias-removed ensemble mean(BREM) and superensemble(SUP), are compared with the ensemble mean(EMN) and single-model forecasts. Moreover, a new model bias estimation scheme is investigated and applied to the BREM and SUP schemes. The results showed that, compared with single-model forecasts and EMN, the multi-model ensembles of the BREM and SUP schemes can have smaller errors in most cases. However, there were also circumstances where BREM was less skillful than EMN, indicating that using a time-averaged error as model bias is not optimal. A new model bias estimation scheme of the biweight mean is introduced. Through minimizing the negative influence of singular errors, this scheme can obtain a more accurate model bias estimation and improve the BREM forecast skill. The application of the biweight mean in the bias calculation of SUP also resulted in improved skill. The results indicate that the modification of multi-model ensemble schemes through this bias estimation method is feasible.展开更多
In order to reduce the uncertainty of offline land surface model (LSM) simulations of land evapotranspiration (ET), we used ensemble simulations based on three meteorological forcing datasets [Princeton, ITPCAS (...In order to reduce the uncertainty of offline land surface model (LSM) simulations of land evapotranspiration (ET), we used ensemble simulations based on three meteorological forcing datasets [Princeton, ITPCAS (Institute of Tibetan Plateau Research, Chinese Academy of Sciences), Qian] and four LSMs (BATS, VIC, CLM3.0 and CLM3.5), to explore the trends and spatiotemporal characteristics of ET, as well as the spatiotemporal pattern of ET in response to climate factors over China's Mainland during 1982-2007. The results showed that various simulations of each member and their arithmetic mean (EnsAVlean) could capture the spatial distribution and seasonal pattern of ET sufficiently well, where they exhibited more significant spatial and seasonal variation in the ET compared with observation-based ET estimates (Obs_MTE). For the mean annual ET, we found that the BATS forced by Princeton forcing overestimated the annual mean ET compared with Obs_MTE for most of the basins in China, whereas the VIC forced by Princeton forcing showed underestimations. By contrast, the Ens_Mean was closer to Obs_MTE, although the results were underestimated over Southeast China. Furthermore, both the Obs_MTE and Ens_Mean exhibited a significant increasing trend during 1982-98; whereas after 1998, when the last big EI Nifio event occurred, the Ens_Mean tended to decrease significantly between 1999 and 2007, although the change was not significant for Obs_MTE. Changes in air temperature and shortwave radiation played key roles in the long-term variation in ET over the humid area of China, but precipitation mainly controlled the long-term variation in ET in arid and semi-arid areas of China.展开更多
Seasonal prediction of summer rainfall over the Yangtze River valley(YRV) is valuable for agricultural and industrial production and freshwater resource management in China, but remains a major challenge. Earlier mu...Seasonal prediction of summer rainfall over the Yangtze River valley(YRV) is valuable for agricultural and industrial production and freshwater resource management in China, but remains a major challenge. Earlier multi-model ensemble(MME) prediction schemes for summer rainfall over China focus on single-value prediction, which cannot provide the necessary uncertainty information, while commonly-used ensemble schemes for probability density function(PDF) prediction are not adapted to YRV summer rainfall prediction. In the present study, an MME PDF prediction scheme is proposed based on the ENSEMBLES hindcasts. It is similar to the earlier Bayesian ensemble prediction scheme, but with optimization of ensemble members and a revision of the variance modeling of the likelihood function. The optimized ensemble members are regressed YRV summer rainfall with factors selected from model outputs of synchronous 500-h Pa geopotential height as predictors. The revised variance modeling of the likelihood function is a simple linear regression with ensemble spread as the predictor. The cross-validation skill of 1960–2002 YRV summer rainfall prediction shows that the new scheme produces a skillful PDF prediction, and is much better-calibrated, sharper, and more accurate than the earlier Bayesian ensemble and raw ensemble.展开更多
Dissolved oxygen(DO)is an important indicator of aquaculture,and its accurate forecasting can effectively improve the quality of aquatic products.In this paper,a new DO hybrid forecasting model is proposed that includ...Dissolved oxygen(DO)is an important indicator of aquaculture,and its accurate forecasting can effectively improve the quality of aquatic products.In this paper,a new DO hybrid forecasting model is proposed that includes three stages:multi-factor analysis,adaptive decomposition,and an optimizationbased ensemble.First,considering the complex factors affecting DO,the grey relational(GR)degree method is used to screen out the environmental factors most closely related to DO.The consideration of multiple factors makes model fusion more effective.Second,the series of DO,water temperature,salinity,and oxygen saturation are decomposed adaptively into sub-series by means of the empirical wavelet transform(EWT)method.Then,five benchmark models are utilized to forecast the sub-series of EWT decomposition.The ensemble weights of these five sub-forecasting models are calculated by particle swarm optimization and gravitational search algorithm(PSOGSA).Finally,a multi-factor ensemble model for DO is obtained by weighted allocation.The performance of the proposed model is verified by timeseries data collected by the pacific islands ocean observing system(PacIOOS)from the WQB04 station at Hilo.The evaluation indicators involved in the experiment include the Nash–Sutcliffe efficiency(NSE),Kling–Gupta efficiency(KGE),mean absolute percent error(MAPE),standard deviation of error(SDE),and coefficient of determination(R^(2)).Example analysis demonstrates that:①The proposed model can obtain excellent DO forecasting results;②the proposed model is superior to other comparison models;and③the forecasting model can be used to analyze the trend of DO and enable managers to make better management decisions.展开更多
Persistent Heavy Rainfall(PHR)is the most influential extreme weather event in Asia in summer,and thus it has attracted intensive interests of many scientists.In this study,operational global ensemble forecasts from C...Persistent Heavy Rainfall(PHR)is the most influential extreme weather event in Asia in summer,and thus it has attracted intensive interests of many scientists.In this study,operational global ensemble forecasts from China Meteorological Administration(CMA)are used,and a new verification method applied to evaluate the predictability of PHR is investigated.A metrics called Index of Composite Predictability(ICP)established on basic verification indicators,i.e.,Equitable Threat Score(ETS)of 24 h accumulated precipitation and Root Mean Square Error(RMSE)of Height at 500 h Pa,are selected in this study to distinguish"good"and"poor"prediction from all ensemble members.With the use of the metrics of ICP,the predictability of two typical PHR events in June 2010 and June 2011 is estimated.The results show that the"good member"and"poor member"can be identified by ICP and there is an obvious discrepancy in their ability to predict the key weather system that affects PHR."Good member"shows a higher predictability both in synoptic scale and mesoscale weather system in their location,duration and the movement.The growth errors for"poor"members is mainly due to errors of initial conditions in northern polar region.The growth of perturbation errors and the reason for better or worse performance of ensemble member also have great value for future model improvement and further research.展开更多
Social media is a platform to express one′s views and opinions freely and has made communication easier than it was before.This also opens up an opportunity for people to spread fake news intentionally.The ease of ac...Social media is a platform to express one′s views and opinions freely and has made communication easier than it was before.This also opens up an opportunity for people to spread fake news intentionally.The ease of access to a variety of news sources on the web also brings the problem of people being exposed to fake news and possibly believing such news.This makes it important for us to detect and flag such content on social media.With the current rate of news generated on social media,it is difficult to differentiate between genuine news and hoaxes without knowing the source of the news.This paper discusses approaches to detection of fake news using only the features of the text of the news,without using any other related metadata.We observe that a combination of stylometric features and text-based word vector representations through ensemble methods can predict fake news with an accuracy of up to 95.49%.展开更多
A redundant-subspace-weighting(RSW)-based approach is proposed to enhance the frequency stability on a time scale of a clock ensemble.In this method,multiple overlapping subspaces are constructed in the clock ensemble...A redundant-subspace-weighting(RSW)-based approach is proposed to enhance the frequency stability on a time scale of a clock ensemble.In this method,multiple overlapping subspaces are constructed in the clock ensemble,and the weight of each clock in this ensemble is defined by using the spatial covariance matrix.The superimposition average of covariances in different subspaces reduces the correlations between clocks in the same laboratory to some extent.After optimizing the parameters of this weighting procedure,the frequency stabilities of virtual clock ensembles are significantly improved in most cases.展开更多
The approach of getting useful information of monthly dynamical prediction from ensemble forecasts is studied. The extended range ensemble forecasts (8 members, the initial perturbations of the lagged average forecast...The approach of getting useful information of monthly dynamical prediction from ensemble forecasts is studied. The extended range ensemble forecasts (8 members, the initial perturbations of the lagged average forecast (LAF)(0000, 0600, 1200 and 1800 GMT in two consecutive days) of the 500 hPa height field with the global spectral model (T63L16) from January to May 1997 are provided by the National Climate Center of China. The relationship between the spread of ensemble measured by root–mean–square deviation of ensemble member from ensemble mean and forecast skill (the anomaly correlation or the root–mean–square distance between the ensemble mean forecast and the observation) is significant. The spread of ensemble can evaluate the useful forecast days N for the best estimate of 30 days mean. Thus, a weighted mean approach based on ensemble spread is put forward for monthly dynamical prediction. The anomaly correlation of the weighted monthly mean by the ensemble spread is higher than that of both the arithmetic mean and the linear weighted mean. Better results of the monthly mean circulation and anomaly are obtained from the ensemble spread weighted mean. Key words Monthly prediction - Ensemble method - Spread of ensemble Supported by the Excellent National State Key Laboratory Project (49823002), the National Key Project ‘Study on Chinese Short-Term Climate Forecast System’ (96-908-02) and IAP Innovation Foundation (8-1308).The data were provided through the National Climate Center of China. The authors wish to thank Ms. Chen Lijuan for her assistance.展开更多
The estimation of model parameters is an important subject in engineering.In this area of work,the prevailing approach is to estimate or calculate these as deterministic parameters.In this study,we consider the model ...The estimation of model parameters is an important subject in engineering.In this area of work,the prevailing approach is to estimate or calculate these as deterministic parameters.In this study,we consider the model parameters from the perspective of random variables and describe the general form of the parameter distribution inference problem.Under this framework,we propose an ensemble Bayesian method by introducing Bayesian inference and the Markov chain Monte Carlo(MCMC)method.Experiments on a finite cylindrical reactor and a 2D IAEA benchmark problem show that the proposed method converges quickly and can estimate parameters effectively,even for several correlated parameters simultaneously.Our experiments include cases of engineering software calls,demonstrating that the method can be applied to engineering,such as nuclear reactor engineering.展开更多
Selecting which explanatory variables to include in a given score is a common difficulty, as a balance must be found between statistical fit and practical application. This article presents a methodology for construct...Selecting which explanatory variables to include in a given score is a common difficulty, as a balance must be found between statistical fit and practical application. This article presents a methodology for constructing parsimonious event risk scores combining a stepwise selection of variables with ensemble scores obtained by aggregation of several scores, using several classifiers, bootstrap samples and various modalities of random selection of variables. Selection methods based on a probabilistic model can be used to achieve a stepwise selection for a given classifier such as logistic regression, but not directly for an ensemble classifier constructed by aggregation of several classifiers. Three selection methods are proposed in this framework, two involving a backward selection of the variables based on their coefficients in an ensemble score and the third involving a forward selection of the variables maximizing the AUC. The stepwise selection allows constructing a succession of scores, with the practitioner able to choose which score best fits his needs. These three methods are compared in an application to construct parsimonious short-term event risk scores in chronic HF patients, using as event the composite endpoint of death or hospitalization for worsening HF within 180 days of a visit. Focusing on the fastest method, four scores are constructed, yielding out-of-bag AUCs ranging from 0.81 (26 variables) to 0.76 (2 variables).展开更多
A Bayesian probabilistic prediction scheme of the Yangtze River Valley (YRV) summer rainfall is proposed to combine forecast information from multi-model ensemble dataset provided by ENSEMBLES project.Due to the low f...A Bayesian probabilistic prediction scheme of the Yangtze River Valley (YRV) summer rainfall is proposed to combine forecast information from multi-model ensemble dataset provided by ENSEMBLES project.Due to the low forecast skill of rainfall in dynamic models,the time series of regressed YRV summer rainfall are selected as ensemble members in the new scheme,instead of commonly-used YRV summer rainfall simulated by models.Each time series of regressed YRV summer rainfall is derived from a simple linear regression.The predictor in each simple linear regression is the skillfully simulated circulation or surface temperature factor which is highly linear with the observed YRV summer rainfall in the training set.The high correlation between the ensemble mean of these regressed YRV summer rainfall and observation benefit extracting more sample information from the ensemble system.The results show that the cross-validated skill of the new scheme over the period of 1960 to 2002 is much higher than equally-weighted ensemble,multiple linear regression,and Bayesian ensemble with simulated YRV summer rainfall as ensemble members.In addition,the new scheme is also more skillful than reference forecasts (random forecast at a 0.01 significance level for ensemble mean and climatology forecast for probability density function).展开更多
The use of machine learning algorithms to identify characteristics in Distributed Denial of Service (DDoS) attacks has emerged as a powerful approach in cybersecurity. DDoS attacks, which aim to overwhelm a network or...The use of machine learning algorithms to identify characteristics in Distributed Denial of Service (DDoS) attacks has emerged as a powerful approach in cybersecurity. DDoS attacks, which aim to overwhelm a network or service with a flood of malicious traffic, pose significant threats to online systems. Traditional methods of detection and mitigation often struggle to keep pace with the evolving nature of these attacks. Machine learning, with its ability to analyze vast amounts of data and recognize patterns, offers a robust solution to this challenge. The aim of the paper is to demonstrate the application of ensemble ML algorithms, namely the K-Means and the KNN, for a dual clustering mechanism when used with PySpark to collect 99% accurate data. The algorithms, when used together, identify distinctive features of DDoS attacks that prove a very accurate reflection of reality, so they are a good combination for this aim. Impressively, having preprocessed the data, both algorithms with the PySpark foundation enabled the achievement of 99% accuracy when tuned on the features of a DDoS big dataset. The semi-supervised dataset tabulates traffic anomalies in terms of packet size distribution in correlation to Flow Duration. By training the K-Means Clustering and then applying the KNN to the dataset, the algorithms learn to evaluate the character of activity to a greater degree by displaying density with ease. The study evaluates the effectiveness of the K-Means Clustering with the KNN as ensemble algorithms that adapt very well in detecting complex patterns. Ultimately, cross-reaching environmental results indicate that ML-based approaches significantly improve detection rates compared to traditional methods. Furthermore, ensemble learning methods, which combine two plus multiple models to improve prediction accuracy, show greatness in handling the complexity and variability of big data sets especially when implemented by PySpark. The findings suggest that the enhancement of accuracy derives from newer software that’s designed to reflect reality. However, challenges remain in the deployment of these systems, including the need for large, high-quality datasets and the potential for adversarial attacks that attempt to deceive the ML models. Future research should continue to improve the robustness and efficiency of combining algorithms, as well as integrate them with existing security frameworks to provide comprehensive protection against DDoS attacks and other areas. The dataset was originally created by the University of New Brunswick to analyze DDoS data. The dataset itself was based on logs of the university’s servers, which found various DoS attacks throughout the publicly available period to totally generate 80 attributes with a 6.40GB size. In this dataset, the label and binary column become a very important portion of the final classification. In the last column, this means the normal traffic would be differentiated by the attack traffic. Further analysis is then ripe for investigation. Finally, malicious traffic alert software, as an example, should be trained on packet influx to Flow Duration dependence, which creates a mathematical scope for averages to enact. In achieving such high accuracy, the project acts as an illustration (referenced in the form of excerpts from my Google Colab account) of many attempts to tune. Cybersecurity advocates for more work on the character of brute-force attack traffic and normal traffic features overall since most of our investments as humans are digitally based in work, recreational, and social environments.展开更多
An ensemble-based assimilation method is proposed for correcting the subsurface temperature field when nudging the sea surface temperature(SST) observations into the Max Planck Institute(MPI) climate model,ECHAM5/MPI-...An ensemble-based assimilation method is proposed for correcting the subsurface temperature field when nudging the sea surface temperature(SST) observations into the Max Planck Institute(MPI) climate model,ECHAM5/MPI-OM. This method can project SST directly to subsurface according to model ensemble-based correlations between SST and subsurface temperature. Results from a 50 year(1960–2009) assimilation experiment show the method can improve the subsurface temperature field up to 300 m compared to the qualitycontrolled subsurface ocean temperature objective analyses(EN4), through reducing the biases of the thermal states, improving the thermocline structure, and reducing the root mean square(RMS) errors. Moreover, as most of the improvements concentrate over the upper 100 m, the ocean heat content in the upper 100 m(OHT100 m)is further adopted as a property to validate the performance of the ensemble-based correction method. The results show that RMS errors of the global OHT100 m convergent to one value after several times iteration,indicating this method can represent the relationship between SST and subsurface temperature fields well, and then improve the accuracy of the simulation in the subsurface temperature of the climate model.展开更多
基金The fund from Southern Marine Science and Engineering Guangdong Laboratory(Zhuhai)under contract No.SML2021SP310the National Natural Science Foundation of China under contract Nos 42227901 and 42475061the Key R&D Program of Zhejiang Province under contract No.2024C03257.
文摘In this study,we conducted an experiment to construct multi-model ensemble(MME)predictions for the El Niño-Southern Oscillation(ENSO)using a neural network,based on hindcast data released from five coupled oceanatmosphere models,which exhibit varying levels of complexity.This nonlinear approach demonstrated extraordinary superiority and effectiveness in constructing ENSO MME.Subsequently,we employed the leave-one-out crossvalidation and the moving base methods to further validate the robustness of the neural network model in the formulation of ENSO MME.In conclusion,the neural network algorithm outperforms the conventional approach of assigning a uniform weight to all models.This is evidenced by an enhancement in correlation coefficients and reduction in prediction errors,which have the potential to provide a more accurate ENSO forecast.
基金sponsored by the National Natural Science Foundation of China(Grant Nos.41930971,42330111,and 42405061)the National Key Scientific and Technological Infrastructure project“Earth System Numerical Simulation Facility”(Earth Lab).
文摘Orthogonal conditional nonlinear optimal perturbations(O-CNOPs)have been used to generate ensemble forecasting members for achieving high forecasting skill of high-impact weather and climate events.However,highly efficient calculations for O-CNOPs are still challenging in the field of ensemble forecasting.In this study,we combine a gradient-based iterative idea with the Gram‒Schmidt orthogonalization,and propose an iterative optimization method to compute O-CNOPs.This method is different from the original sequential optimization method,and allows parallel computations of O-CNOPs,thus saving a large amount of computational time.We evaluate this method by using the Lorenz-96 model on the basis of the ensemble forecasting ability achieved and on the time consumed for computing O-CNOPs.The results demonstrate that the parallel iterative method causes O-CNOPs to yield reliable ensemble members and to achieve ensemble forecasting skills similar to or even slightly higher than those produced by the sequential method.Moreover,the parallel method significantly reduces the computational time for O-CNOPs.Therefore,the parallel iterative method provides a highly effective and efficient approach for calculating O-CNOPs for ensemble forecasts.Expectedly,it can play an important role in the application of the O-CNOPs to realistic ensemble forecasts for high-impact weather and climate events.
文摘Non-technical losses(NTL)of electric power are a serious problem for electric distribution companies.The solution determines the cost,stability,reliability,and quality of the supplied electricity.The widespread use of advanced metering infrastructure(AMI)and Smart Grid allows all participants in the distribution grid to store and track electricity consumption.During the research,a machine learning model is developed that allows analyzing and predicting the probability of NTL for each consumer of the distribution grid based on daily electricity consumption readings.This model is an ensemble meta-algorithm(stacking)that generalizes the algorithms of random forest,LightGBM,and a homogeneous ensemble of artificial neural networks.The best accuracy of the proposed meta-algorithm in comparison to basic classifiers is experimentally confirmed on the test sample.Such a model,due to good accuracy indicators(ROC-AUC-0.88),can be used as a methodological basis for a decision support system,the purpose of which is to form a sample of suspected NTL sources.The use of such a sample will allow the top management of electric distribution companies to increase the efficiency of raids by performers,making them targeted and accurate,which should contribute to the fight against NTL and the sustainable development of the electric power industry.
基金This study was supported by the National Key Research and Development Program of China(No.2017YFB0304100)Key Projects of the National Natural Science Foundation of China(No.51634002).
文摘Aiming at the problem of insufficient prediction accuracy of strip flatness at the outlet of cold tandem rolling,the prediction performance of strip flatness based on different ensemble methods was studied and a high-precision prediction ensemble model of strip flatness at the outlet was established.Firstly,based on linear regression(LR),K nearest neighbors(KNN),support vector regression,regression trees(RT),and backpropagation neural network(BPN),bagging,boosting,and stacking ensemble methods were used for ensemble experiments.Secondly,three existing ensemble models,i.e.,random forest,extreme random tree(ET)and extreme gradient boosting,were used to conduct experiments and compare the results.The research shows that bagging,boosting,and stacking three ensemble methods have the most significant improvement in the prediction accuracy of the regression trees model,which is increased by 5.28%,6.51%,and 5.32%,respectively.At the same time,the stacking ensemble method improves both the simple model and the complex model,and the improvement effect on the simple base model is the greatest,which is 4.69%higher than that of the base model KNN.Comparing all of the ensemble models,the stacking ensemble model of level-1(ET,AdaBoost-RT,LR,BPN)paired with level-2(LR)was discovered to be the best model(EALB-LR)and can be further studied for industrial applications.
文摘Advanced DriverAssistance Systems(ADAS)technologies can assist drivers or be part of automatic driving systems to support the driving process and improve the level of safety and comfort on the road.Traffic Sign Recognition System(TSRS)is one of themost important components ofADAS.Among the challengeswith TSRS is being able to recognize road signs with the highest accuracy and the shortest processing time.Accordingly,this paper introduces a new real time methodology recognizing Speed Limit Signs based on a trio of developed modules.Firstly,the Speed Limit Detection(SLD)module uses the Haar Cascade technique to generate a new SL detector in order to localize SL signs within captured frames.Secondly,the Speed Limit Classification(SLC)module,featuring machine learning classifiers alongside a newly developed model called DeepSL,harnesses the power of a CNN architecture to extract intricate features from speed limit sign images,ensuring efficient and precise recognition.In addition,a new Speed Limit Classifiers Fusion(SLCF)module has been developed by combining trained ML classifiers and the DeepSL model by using the Dempster-Shafer theory of belief functions and ensemble learning’s voting technique.Through rigorous software and hardware validation processes,the proposedmethodology has achieved highly significant F1 scores of 99.98%and 99.96%for DS theory and the votingmethod,respectively.Furthermore,a prototype encompassing all components demonstrates outstanding reliability and efficacy,with processing times of 150 ms for the Raspberry Pi board and 81.5 ms for the Nano Jetson board,marking a significant advancement in TSRS technology.
基金supported by the Taishan Scholars (No.ts201712003)。
文摘Nitrogen dioxide(NO_(2))poses a critical potential risk to environmental quality and public health.A reliable machine learning(ML)forecasting framework will be useful to provide valuable information to support government decision-making.Based on the data from1609 air quality monitors across China from 2014-2020,this study designed an ensemble ML model by integrating multiple types of spatial-temporal variables and three sub-models for time-sensitive prediction over a wide range.The ensemble ML model incorporates a residual connection to the gated recurrent unit(GRU)network and adopts the advantage of Transformer,extreme gradient boosting(XGBoost)and GRU with residual connection network,resulting in a 4.1%±1.0%lower root mean square error over XGBoost for the test results.The ensemble model shows great prediction performance,with coefficient of determination of 0.91,0.86,and 0.77 for 1-hr,3-hr,and 24-hr averages for the test results,respectively.In particular,this model has achieved excellent performance with low spatial uncertainty in Central,East,and North China,the major site-dense zones.Through the interpretability analysis based on the Shapley value for different temporal resolutions,we found that the contribution of atmospheric chemical processes is more important for hourly predictions compared with the daily scale predictions,while the impact of meteorological conditions would be ever-prominent for the latter.Compared with existing models for different spatiotemporal scales,the present model can be implemented at any air quality monitoring station across China to facilitate achieving rapid and dependable forecast of NO_(2),which will help developing effective control policies.
基金The National Nat-ural Science Foundation of China (NSFC), Grant Nos.90711003, 40375014the program of GYHY200706005, and the APCC Visiting Scientist Program jointly supportedthis work.
文摘The 21-yr ensemble predictions of model precipitation and circulation in the East Asian and western North Pacific (Asia-Pacific) summer monsoon region (0°-50°N, 100° 150°E) were evaluated in nine different AGCM, used in the Asia-Pacific Economic Cooperation Climate Center (APCC) multi-model ensemble seasonal prediction system. The analysis indicates that the precipitation anomaly patterns of model ensemble predictions are substantially different from the observed counterparts in this region, but the summer monsoon circulations are reasonably predicted. For example, all models can well produce the interannual variability of the western North Pacific monsoon index (WNPMI) defined by 850 hPa winds, but they failed to predict the relationship between WNPMI and precipitation anomalies. The interannual variability of the 500 hPa geopotential height (GPH) can be well predicted by the models in contrast to precipitation anomalies. On the basis of such model performances and the relationship between the interannual variations of 500 hPa GPH and precipitation anomalies, we developed a statistical scheme used to downscale the summer monsoon precipitation anomaly on the basis of EOF and singular value decomposition (SVD). In this scheme, the three leading EOF modes of 500 hPa GPH anomaly fields predicted by the models are firstly corrected by the linear regression between the principal components in each model and observation, respectively. Then, the corrected model GPH is chosen as the predictor to downscale the precipitation anomaly field, which is assembled by the forecasted expansion coefficients of model 500 hPa GPH and the three leading SVD modes of observed precipitation anomaly corresponding to the prediction of model 500 hPa GPH during a 19-year training period. The cross-validated forecasts suggest that this downscaling scheme may have a potential to improve the forecast skill of the precipitation anomaly in the South China Sea, western North Pacific and the East Asia Pacific regions, where the anomaly correlation coefficient (ACC) has been improved by 0.14, corresponding to the reduced RMSE of 10.4% in the conventional multi-model ensemble (MME) forecast.
基金Special Research Program for Public Welfare(Meteorology)of China(GYHY200906009,GYHY201006015,GYHY200906007)National Natural Science Foundation of China(4107503541475044)
文摘This study investigates multi-model ensemble forecasts of track and intensity of tropical cyclones over the western Pacific, based on forecast outputs from the China Meteorological Administration, European Centre for Medium-Range Weather Forecasts, Japan Meteorological Agency and National Centers for Environmental Prediction in the THORPEX Interactive Grand Global Ensemble(TIGGE) datasets. The multi-model ensemble schemes, namely the bias-removed ensemble mean(BREM) and superensemble(SUP), are compared with the ensemble mean(EMN) and single-model forecasts. Moreover, a new model bias estimation scheme is investigated and applied to the BREM and SUP schemes. The results showed that, compared with single-model forecasts and EMN, the multi-model ensembles of the BREM and SUP schemes can have smaller errors in most cases. However, there were also circumstances where BREM was less skillful than EMN, indicating that using a time-averaged error as model bias is not optimal. A new model bias estimation scheme of the biweight mean is introduced. Through minimizing the negative influence of singular errors, this scheme can obtain a more accurate model bias estimation and improve the BREM forecast skill. The application of the biweight mean in the bias calculation of SUP also resulted in improved skill. The results indicate that the modification of multi-model ensemble schemes through this bias estimation method is feasible.
基金supported by the National Natural Science Foundation of China(Grant Nos.4140508391437220 and 41305066)+1 种基金the Natural Science Foundation of Hunan Province(Grant No.2015JJ3098)the Fund Project for The Education Department of Hunan Province(Grant No.14C0897)
文摘In order to reduce the uncertainty of offline land surface model (LSM) simulations of land evapotranspiration (ET), we used ensemble simulations based on three meteorological forcing datasets [Princeton, ITPCAS (Institute of Tibetan Plateau Research, Chinese Academy of Sciences), Qian] and four LSMs (BATS, VIC, CLM3.0 and CLM3.5), to explore the trends and spatiotemporal characteristics of ET, as well as the spatiotemporal pattern of ET in response to climate factors over China's Mainland during 1982-2007. The results showed that various simulations of each member and their arithmetic mean (EnsAVlean) could capture the spatial distribution and seasonal pattern of ET sufficiently well, where they exhibited more significant spatial and seasonal variation in the ET compared with observation-based ET estimates (Obs_MTE). For the mean annual ET, we found that the BATS forced by Princeton forcing overestimated the annual mean ET compared with Obs_MTE for most of the basins in China, whereas the VIC forced by Princeton forcing showed underestimations. By contrast, the Ens_Mean was closer to Obs_MTE, although the results were underestimated over Southeast China. Furthermore, both the Obs_MTE and Ens_Mean exhibited a significant increasing trend during 1982-98; whereas after 1998, when the last big EI Nifio event occurred, the Ens_Mean tended to decrease significantly between 1999 and 2007, although the change was not significant for Obs_MTE. Changes in air temperature and shortwave radiation played key roles in the long-term variation in ET over the humid area of China, but precipitation mainly controlled the long-term variation in ET in arid and semi-arid areas of China.
基金co-supported by the National Natural Science Foundation (Grant Nos. 41005052 and 41375086)the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDA05110201)the National Basic Research Program of China (Grant No. 2010CB950403)
文摘Seasonal prediction of summer rainfall over the Yangtze River valley(YRV) is valuable for agricultural and industrial production and freshwater resource management in China, but remains a major challenge. Earlier multi-model ensemble(MME) prediction schemes for summer rainfall over China focus on single-value prediction, which cannot provide the necessary uncertainty information, while commonly-used ensemble schemes for probability density function(PDF) prediction are not adapted to YRV summer rainfall prediction. In the present study, an MME PDF prediction scheme is proposed based on the ENSEMBLES hindcasts. It is similar to the earlier Bayesian ensemble prediction scheme, but with optimization of ensemble members and a revision of the variance modeling of the likelihood function. The optimized ensemble members are regressed YRV summer rainfall with factors selected from model outputs of synchronous 500-h Pa geopotential height as predictors. The revised variance modeling of the likelihood function is a simple linear regression with ensemble spread as the predictor. The cross-validation skill of 1960–2002 YRV summer rainfall prediction shows that the new scheme produces a skillful PDF prediction, and is much better-calibrated, sharper, and more accurate than the earlier Bayesian ensemble and raw ensemble.
基金the National Natural Science Foundation of China(61873283)the Changsha Science&Technology Project(KQ1707017)the innovation-driven project of the Central South University(2019CX005).
文摘Dissolved oxygen(DO)is an important indicator of aquaculture,and its accurate forecasting can effectively improve the quality of aquatic products.In this paper,a new DO hybrid forecasting model is proposed that includes three stages:multi-factor analysis,adaptive decomposition,and an optimizationbased ensemble.First,considering the complex factors affecting DO,the grey relational(GR)degree method is used to screen out the environmental factors most closely related to DO.The consideration of multiple factors makes model fusion more effective.Second,the series of DO,water temperature,salinity,and oxygen saturation are decomposed adaptively into sub-series by means of the empirical wavelet transform(EWT)method.Then,five benchmark models are utilized to forecast the sub-series of EWT decomposition.The ensemble weights of these five sub-forecasting models are calculated by particle swarm optimization and gravitational search algorithm(PSOGSA).Finally,a multi-factor ensemble model for DO is obtained by weighted allocation.The performance of the proposed model is verified by timeseries data collected by the pacific islands ocean observing system(PacIOOS)from the WQB04 station at Hilo.The evaluation indicators involved in the experiment include the Nash–Sutcliffe efficiency(NSE),Kling–Gupta efficiency(KGE),mean absolute percent error(MAPE),standard deviation of error(SDE),and coefficient of determination(R^(2)).Example analysis demonstrates that:①The proposed model can obtain excellent DO forecasting results;②the proposed model is superior to other comparison models;and③the forecasting model can be used to analyze the trend of DO and enable managers to make better management decisions.
基金National 973 Program of China(2012CB417204)National Natural Science Foundation of China(41075035,41475044)Special Fund for Meteorological Scientific Research in the Public Interest(GYHY201006015)
文摘Persistent Heavy Rainfall(PHR)is the most influential extreme weather event in Asia in summer,and thus it has attracted intensive interests of many scientists.In this study,operational global ensemble forecasts from China Meteorological Administration(CMA)are used,and a new verification method applied to evaluate the predictability of PHR is investigated.A metrics called Index of Composite Predictability(ICP)established on basic verification indicators,i.e.,Equitable Threat Score(ETS)of 24 h accumulated precipitation and Root Mean Square Error(RMSE)of Height at 500 h Pa,are selected in this study to distinguish"good"and"poor"prediction from all ensemble members.With the use of the metrics of ICP,the predictability of two typical PHR events in June 2010 and June 2011 is estimated.The results show that the"good member"and"poor member"can be identified by ICP and there is an obvious discrepancy in their ability to predict the key weather system that affects PHR."Good member"shows a higher predictability both in synoptic scale and mesoscale weather system in their location,duration and the movement.The growth errors for"poor"members is mainly due to errors of initial conditions in northern polar region.The growth of perturbation errors and the reason for better or worse performance of ensemble member also have great value for future model improvement and further research.
文摘Social media is a platform to express one′s views and opinions freely and has made communication easier than it was before.This also opens up an opportunity for people to spread fake news intentionally.The ease of access to a variety of news sources on the web also brings the problem of people being exposed to fake news and possibly believing such news.This makes it important for us to detect and flag such content on social media.With the current rate of news generated on social media,it is difficult to differentiate between genuine news and hoaxes without knowing the source of the news.This paper discusses approaches to detection of fake news using only the features of the text of the news,without using any other related metadata.We observe that a combination of stylometric features and text-based word vector representations through ensemble methods can predict fake news with an accuracy of up to 95.49%.
基金Project supported by the National Key Research and Development Program of China (Grant No.2021YFB3900701)the Science and Technology Plan Project of the State Administration for Market Regulation of China (Grant No.2023MK178)the National Natural Science Foundation of China (Grant No.42227802)。
文摘A redundant-subspace-weighting(RSW)-based approach is proposed to enhance the frequency stability on a time scale of a clock ensemble.In this method,multiple overlapping subspaces are constructed in the clock ensemble,and the weight of each clock in this ensemble is defined by using the spatial covariance matrix.The superimposition average of covariances in different subspaces reduces the correlations between clocks in the same laboratory to some extent.After optimizing the parameters of this weighting procedure,the frequency stabilities of virtual clock ensembles are significantly improved in most cases.
基金Supported by the Excellent National State Key Laboratory Project! (49823002)the National Key Project 'Study on Chinese Short
文摘The approach of getting useful information of monthly dynamical prediction from ensemble forecasts is studied. The extended range ensemble forecasts (8 members, the initial perturbations of the lagged average forecast (LAF)(0000, 0600, 1200 and 1800 GMT in two consecutive days) of the 500 hPa height field with the global spectral model (T63L16) from January to May 1997 are provided by the National Climate Center of China. The relationship between the spread of ensemble measured by root–mean–square deviation of ensemble member from ensemble mean and forecast skill (the anomaly correlation or the root–mean–square distance between the ensemble mean forecast and the observation) is significant. The spread of ensemble can evaluate the useful forecast days N for the best estimate of 30 days mean. Thus, a weighted mean approach based on ensemble spread is put forward for monthly dynamical prediction. The anomaly correlation of the weighted monthly mean by the ensemble spread is higher than that of both the arithmetic mean and the linear weighted mean. Better results of the monthly mean circulation and anomaly are obtained from the ensemble spread weighted mean. Key words Monthly prediction - Ensemble method - Spread of ensemble Supported by the Excellent National State Key Laboratory Project (49823002), the National Key Project ‘Study on Chinese Short-Term Climate Forecast System’ (96-908-02) and IAP Innovation Foundation (8-1308).The data were provided through the National Climate Center of China. The authors wish to thank Ms. Chen Lijuan for her assistance.
基金partially sponsored by the Natural Science Foundation of Shanghai(No.23ZR1429300)the Innovation Fund of CNNC(Lingchuang Fund)。
文摘The estimation of model parameters is an important subject in engineering.In this area of work,the prevailing approach is to estimate or calculate these as deterministic parameters.In this study,we consider the model parameters from the perspective of random variables and describe the general form of the parameter distribution inference problem.Under this framework,we propose an ensemble Bayesian method by introducing Bayesian inference and the Markov chain Monte Carlo(MCMC)method.Experiments on a finite cylindrical reactor and a 2D IAEA benchmark problem show that the proposed method converges quickly and can estimate parameters effectively,even for several correlated parameters simultaneously.Our experiments include cases of engineering software calls,demonstrating that the method can be applied to engineering,such as nuclear reactor engineering.
文摘Selecting which explanatory variables to include in a given score is a common difficulty, as a balance must be found between statistical fit and practical application. This article presents a methodology for constructing parsimonious event risk scores combining a stepwise selection of variables with ensemble scores obtained by aggregation of several scores, using several classifiers, bootstrap samples and various modalities of random selection of variables. Selection methods based on a probabilistic model can be used to achieve a stepwise selection for a given classifier such as logistic regression, but not directly for an ensemble classifier constructed by aggregation of several classifiers. Three selection methods are proposed in this framework, two involving a backward selection of the variables based on their coefficients in an ensemble score and the third involving a forward selection of the variables maximizing the AUC. The stepwise selection allows constructing a succession of scores, with the practitioner able to choose which score best fits his needs. These three methods are compared in an application to construct parsimonious short-term event risk scores in chronic HF patients, using as event the composite endpoint of death or hospitalization for worsening HF within 180 days of a visit. Focusing on the fastest method, four scores are constructed, yielding out-of-bag AUCs ranging from 0.81 (26 variables) to 0.76 (2 variables).
基金supported by the Knowledge Innovation Key Project of Chinese Academy of Sciences (CAS) under Grant No.KZCX2-YW-217Doctor Research Startup Project at the Institute of Atmospheric Physics,the CAS under Grant No.7-098300
文摘A Bayesian probabilistic prediction scheme of the Yangtze River Valley (YRV) summer rainfall is proposed to combine forecast information from multi-model ensemble dataset provided by ENSEMBLES project.Due to the low forecast skill of rainfall in dynamic models,the time series of regressed YRV summer rainfall are selected as ensemble members in the new scheme,instead of commonly-used YRV summer rainfall simulated by models.Each time series of regressed YRV summer rainfall is derived from a simple linear regression.The predictor in each simple linear regression is the skillfully simulated circulation or surface temperature factor which is highly linear with the observed YRV summer rainfall in the training set.The high correlation between the ensemble mean of these regressed YRV summer rainfall and observation benefit extracting more sample information from the ensemble system.The results show that the cross-validated skill of the new scheme over the period of 1960 to 2002 is much higher than equally-weighted ensemble,multiple linear regression,and Bayesian ensemble with simulated YRV summer rainfall as ensemble members.In addition,the new scheme is also more skillful than reference forecasts (random forecast at a 0.01 significance level for ensemble mean and climatology forecast for probability density function).
文摘The use of machine learning algorithms to identify characteristics in Distributed Denial of Service (DDoS) attacks has emerged as a powerful approach in cybersecurity. DDoS attacks, which aim to overwhelm a network or service with a flood of malicious traffic, pose significant threats to online systems. Traditional methods of detection and mitigation often struggle to keep pace with the evolving nature of these attacks. Machine learning, with its ability to analyze vast amounts of data and recognize patterns, offers a robust solution to this challenge. The aim of the paper is to demonstrate the application of ensemble ML algorithms, namely the K-Means and the KNN, for a dual clustering mechanism when used with PySpark to collect 99% accurate data. The algorithms, when used together, identify distinctive features of DDoS attacks that prove a very accurate reflection of reality, so they are a good combination for this aim. Impressively, having preprocessed the data, both algorithms with the PySpark foundation enabled the achievement of 99% accuracy when tuned on the features of a DDoS big dataset. The semi-supervised dataset tabulates traffic anomalies in terms of packet size distribution in correlation to Flow Duration. By training the K-Means Clustering and then applying the KNN to the dataset, the algorithms learn to evaluate the character of activity to a greater degree by displaying density with ease. The study evaluates the effectiveness of the K-Means Clustering with the KNN as ensemble algorithms that adapt very well in detecting complex patterns. Ultimately, cross-reaching environmental results indicate that ML-based approaches significantly improve detection rates compared to traditional methods. Furthermore, ensemble learning methods, which combine two plus multiple models to improve prediction accuracy, show greatness in handling the complexity and variability of big data sets especially when implemented by PySpark. The findings suggest that the enhancement of accuracy derives from newer software that’s designed to reflect reality. However, challenges remain in the deployment of these systems, including the need for large, high-quality datasets and the potential for adversarial attacks that attempt to deceive the ML models. Future research should continue to improve the robustness and efficiency of combining algorithms, as well as integrate them with existing security frameworks to provide comprehensive protection against DDoS attacks and other areas. The dataset was originally created by the University of New Brunswick to analyze DDoS data. The dataset itself was based on logs of the university’s servers, which found various DoS attacks throughout the publicly available period to totally generate 80 attributes with a 6.40GB size. In this dataset, the label and binary column become a very important portion of the final classification. In the last column, this means the normal traffic would be differentiated by the attack traffic. Further analysis is then ripe for investigation. Finally, malicious traffic alert software, as an example, should be trained on packet influx to Flow Duration dependence, which creates a mathematical scope for averages to enact. In achieving such high accuracy, the project acts as an illustration (referenced in the form of excerpts from my Google Colab account) of many attempts to tune. Cybersecurity advocates for more work on the character of brute-force attack traffic and normal traffic features overall since most of our investments as humans are digitally based in work, recreational, and social environments.
基金The National Key R&D Program of China under contract No. 2017YFA0604201the National Natural Science Foundation of China under contract Nos 41876012 and 41861144015.
文摘An ensemble-based assimilation method is proposed for correcting the subsurface temperature field when nudging the sea surface temperature(SST) observations into the Max Planck Institute(MPI) climate model,ECHAM5/MPI-OM. This method can project SST directly to subsurface according to model ensemble-based correlations between SST and subsurface temperature. Results from a 50 year(1960–2009) assimilation experiment show the method can improve the subsurface temperature field up to 300 m compared to the qualitycontrolled subsurface ocean temperature objective analyses(EN4), through reducing the biases of the thermal states, improving the thermocline structure, and reducing the root mean square(RMS) errors. Moreover, as most of the improvements concentrate over the upper 100 m, the ocean heat content in the upper 100 m(OHT100 m)is further adopted as a property to validate the performance of the ensemble-based correction method. The results show that RMS errors of the global OHT100 m convergent to one value after several times iteration,indicating this method can represent the relationship between SST and subsurface temperature fields well, and then improve the accuracy of the simulation in the subsurface temperature of the climate model.