The proliferation of robot accounts on social media platforms has had a significant negative impact, necessitating robust measures to counter network anomalies and safeguard content integrity. Social robot detection has emerged as a pivotal yet intricate task aimed at mitigating the dissemination of misleading information. While graph-based approaches have attained remarkable performance in this area, they share a fundamental limitation: the homogeneity assumption in graph convolution allows social robots to evade detection by mingling with genuine human profiles. To counter this camouflage, this work proposes a social robot detection framework based on enhanced HOmogeneity and Random Forest (HORFBot). At the core of HORFBot lies a homogeneous graph enhancement strategy that uses edge-removal techniques to split the graph into multiple subgraphs. Contrastive learning is then used to train multiple graph convolutional networks, one for each subgraph. In the final stage, these base classifiers are fused and their outputs aggregated to produce the detection result. Extensive experiments on three social robot detection datasets show that the method improves detection accuracy and outperforms the comparative methods.
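The idea of raising graph homogeneity by removing edges can be illustrated on a toy graph. The sketch below is not HORFBot's procedure: the abstract does not state its edge-removal criteria, so the function names, the toy graph, and the use of known labels to decide which edges to drop are all assumptions made for illustration. A real detector would have to estimate which edges are cross-class, for example from feature similarity.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def edge_homophily(edges: List[Edge], labels: Dict[int, str]) -> float:
    """Fraction of edges whose two endpoints carry the same label."""
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def drop_cross_label_edges(edges: List[Edge], labels: Dict[int, str]) -> List[Edge]:
    """Keep only same-label edges (hypothetical stand-in for edge removal)."""
    return [(u, v) for u, v in edges if labels[u] == labels[v]]


# Toy graph (assumed data): nodes 0-2 are humans, nodes 3-4 are bots.
labels = {0: "human", 1: "human", 2: "human", 3: "bot", 4: "bot"}
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]

before = edge_homophily(edges, labels)           # 3 of 5 edges are same-label
pruned = drop_cross_label_edges(edges, labels)   # bot-human edges removed
after = edge_homophily(pruned, labels)

print(f"homophily before: {before:.2f}, after: {after:.2f}")
```

On this toy graph the two bot-human edges are the ones that would let a bot's representation blend with those of its human neighbours during graph convolution; removing them leaves only same-class edges.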
Evaluation of water richness in sandstone is an important research topic in the prevention and control of mine water disasters, and the water richness of sandstone is closely related to its porosity. Reflection seismic exploration data provide high-density spatial sampling, which offers an important data basis for predicting sandstone porosity in coal seam roofs. First, the basic principles of the variational mode decomposition (VMD) method and the random forest method are introduced. Then, a geological model of coal seam roof sandstone is constructed, seismic forward modeling is conducted, and random noise is added. The decomposition effects of the empirical mode decomposition (EMD) method and the VMD method on noisy signals are compared and analyzed. The test results show that the first-order intrinsic mode function (IMF1) and IMF2 decomposed by the VMD method contain the main effective components of the seismic signals. A prediction process for sandstone porosity in coal seam roofs based on the combination of VMD and the random forest method is proposed. The feasibility and effectiveness of the method are verified by trial calculation in the porosity prediction of model data. Taking actual coalfield reflection seismic data as an example, the sandstone porosity of the No. 8 coal seam roof is predicted. The application results show the potential application value of the new porosity prediction method proposed in this study. The method offers theoretical guidance for evaluating water richness in coal seam roof sandstone and for the prevention and control of mine water disasters.
To improve the efficiency of air quality analysis and the accuracy of predictions, this paper proposes a composite method based on Vector Autoregressive (VAR) and Random Forest (RF) models. The theoretical section introduces the models and their estimation algorithms. The empirical section applies the proposed method to global air quality data from 2022 to 2024. Specifically, principal component analysis (PCA) is conducted first, and the VAR and Random Forest methods are then used for prediction on the reduced-dimensional data. The results show that the RMSE of the hybrid model is 45.27, lower than the 49.11 of the VAR model alone, which supports its superiority. The stability and predictive performance of the model are both enhanced.
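For reference, the size of the reported improvement can be worked out from the two RMSE values. The snippet below is a minimal sketch: the helper names and the three-point toy series are assumptions, and only the figures 45.27 and 49.11 come from the abstract.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of an error metric relative to a baseline."""
    return (baseline - improved) / baseline


# Toy check of the RMSE definition (assumed data): one error of 2 over 3 points.
toy_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])

# Reported values: VAR alone 49.11, hybrid VAR + RF 45.27.
reduction = relative_reduction(49.11, 45.27)
print(f"relative RMSE reduction: {reduction:.1%}")
```

The two reported values correspond to a relative RMSE reduction of roughly 7.8%.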
Detecting cyber attacks in networks connected to the Internet of Things (IoT) is of utmost importance because of the growing vulnerabilities in smart environments. Conventional models, such as Naive Bayes and support vector machine (SVM), as well as ensemble methods, such as Gradient Boosting and eXtreme gradient boosting (XGBoost), often incur high computational costs, which makes real-time detection challenging. We therefore propose an attack detection approach that integrates Visual Geometry Group 16 (VGG16), Artificial Rabbits Optimizer (ARO), and a Random Forest model to increase detection accuracy and operational efficiency in IoT networks. In the proposed model, VGG16 extracts features from malware images, and the random forest model makes predictions from the extracted features. ARO is used to tune the hyper-parameters of the random forest model. With an accuracy of 96.36%, the proposed model outperforms the standard models in terms of accuracy, F1-score, precision, and recall. The comparison shows that the approach improves performance while maintaining a lower computational cost, which makes it well suited to real-time applications.
This paper explores the synergistic effect of a model combining Elastic Net and Random Forest in online fraud detection. The study selects a public network dataset containing 1781 records, uses 70% of the dataset for training and 30% for validation, and analyses the correlation between features using a correlation matrix. The experimental results show that Elastic Net feature selection generally outperforms PCA across all models, especially when combined with the Random Forest and XGBoost models. The ElasticNet + Random Forest model achieves the highest accuracy of 0.968 and an AUC of 0.983, while its Kappa and MCC reach 0.839 and 0.844 respectively, showing high consistency and correlation. This indicates that combining Elastic Net feature selection with a Random Forest model offers significant performance advantages in online fraud detection.
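The agreement metrics quoted above (accuracy, Cohen's Kappa, and the Matthews correlation coefficient, MCC) can all be computed from a binary confusion matrix. The following is a minimal sketch using the standard definitions; the function name and the confusion-matrix counts are made up for illustration and are not the study's data.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, Cohen's kappa and MCC from binary confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n

    # Cohen's kappa: observed agreement corrected for chance agreement p_e.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((fn + tn) / n) * ((fp + tn) / n)
    p_e = p_yes + p_no
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0

    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"accuracy": accuracy, "kappa": kappa, "mcc": mcc}


# Hypothetical counts for 100 validation records (not taken from the paper).
m = binary_metrics(tp=40, fp=5, fn=10, tn=45)
print({name: round(value, 3) for name, value in m.items()})
```

Kappa and MCC both discount agreement that would arise by chance, which is why they sit well below the raw accuracy when classes are imbalanced, as in the reported 0.968 accuracy against a Kappa of 0.839.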
The Darjeeling Himalayan region, characterized by its complex topography and vulnerability to multiple environmental hazards, faces significant challenges including landslides, earthquakes, flash floods, and soil loss that critically threaten ecosystem stability. Among these challenges, soil erosion is a silent disaster: a gradual yet relentless process whose impacts accumulate over time, progressively degrading landscape integrity and disrupting ecological sustainability. Unlike catastrophic events with immediate visibility, soil erosion's most devastating consequences often manifest decades later through diminished agricultural productivity, habitat fragmentation, and irreversible biodiversity loss. This study developed a scalable predictive framework employing Random Forest (RF) and Gradient Boosting Tree (GBT) machine learning models to assess and map soil erosion susceptibility across the region. A comprehensive geo-database was developed incorporating 11 erosion-triggering factors: slope, elevation, rainfall, drainage density, topographic wetness index, normalized difference vegetation index (NDVI), curvature, soil texture, land use, geology, and aspect. A total of 2,483 historical soil erosion locations were identified and randomly divided into two sets: 70% for model building and 30% for validation. The models revealed distinct spatial patterns of erosion risk, with GBT classifying 60.50% of the area as very low susceptibility, while RF identified 28.92% in this category. Notable differences emerged in high-risk zone identification, with GBT highlighting 7.42% and RF indicating 2.21% of the area as very high erosion susceptibility. Both models demonstrated robust predictive capabilities, with GBT achieving 80.77% accuracy and 0.975 AUC, slightly outperforming RF's 79.67% accuracy and 0.972 AUC. Analysis of the predictor variables identified elevation, slope, rainfall, and NDVI as the primary factors influencing erosion susceptibility, highlighting the complex interrelationship between geo-environmental factors and erosion processes. This research offers a strategic framework for targeted conservation and sustainable land management in the fragile Himalayan region, providing insights to help policymakers implement effective soil erosion mitigation strategies and support long-term environmental sustainability.
Zenith wet delay (ZWD) is a key parameter for precise positioning with global navigation satellite systems (GNSS) and occupies a central role in meteorological research. Currently, most models consider only the periodic variability of the ZWD and neglect the effect of nonlinear factors on ZWD estimation. This oversight limits their ability to reflect rapid fluctuations of the ZWD. To capture and predict complicated variations in ZWD more accurately, this paper develops the CRZWD model by combining the GPT3 model with the random forest (RF) algorithm, using 5-year atmospheric profiles from 70 radiosonde (RS) stations across China. Taking data from 25 external test stations as the reference, the root mean square (RMS) error of the CRZWD model is 29.95 mm. Compared with the GPT3 model and another model using a backpropagation neural network (BPNN), the accuracy is improved by 24.7% and 15.9%, respectively. Notably, over 56% of the test stations show an improvement of more than 20% relative to GPT3-ZWD. Further temporal and spatial analyses also demonstrate the accuracy and stability advantages of the CRZWD model, indicating its potential for GNSS-based applications.
With the spread of microgrid construction and the connection of renewable energy sources to the power system, the source and load uncertainty faced by the coordinated operation of multi-microgrids is becoming increasingly prominent, and the accuracy of typical scenario prediction is low. To improve the accuracy of scenario prediction under source and load uncertainty, this paper proposes a typical scenario identification model based on random forests and order parameters. First, a method for identifying and quantifying order parameters is provided for the coordinated operating mode of multi-microgrids, taking source-load uncertainty into account. Second, the dynamic characteristics of the order parameters of the daily load curve, the wind and solar curve, and the load curve of typical scenarios are statistically analyzed to identify the key order parameters that have the most significant impact on load uncertainty. Then, the order parameters and seasonal distribution are used as features to train a random forest classification model for efficient scenario prediction. Finally, a simulation on actual data from a provincial distribution network shows that the proposed method classifies typical scenarios with an accuracy of 92.7%. In addition, a sensitivity analysis assesses how changes in uncertainty levels affect the importance of each order parameter, allowing for adaptive uncertainty mitigation strategies.
One of the core tasks in analyzing Electrochemical Impedance Spectroscopy (EIS) data is selecting an appropriate equivalent circuit model to quantify the parameters of the electrochemical reaction process. However, this selection often relies on human experience and judgment, which introduces subjectivity and error. In this paper, an intelligent approach based on the Random Forest algorithm is proposed for matching EIS data to their equivalent circuits. It automatically selects the most suitable equivalent circuit model based on the characteristics and patterns of the EIS data. Addressing the typical scenario of metal corrosion, an atmospheric corrosion EIS dataset of low-carbon steel covering five different corrosion scenarios is constructed, and this dataset is used to validate and evaluate the proposed method. The contributions of this paper can be summarized in three aspects: (1) a method for selecting equivalent circuit models for EIS data based on the Random Forest algorithm is proposed; (2) using authentic EIS data collected from metal atmospheric corrosion, a dataset encompassing five categories of metal corrosion scenarios is established; (3) the superiority of the proposed method is validated on this dataset. The experimental results demonstrate that, in terms of equivalent circuit matching, the method surpasses other machine learning algorithms in both precision and robustness. Furthermore, it shows strong applicability in the analysis of EIS data.
A switch from avian-type α-2,3 to human-type α-2,6 receptors is an essential element in the initiation of a pandemic from an avian influenza virus. Some H9N2 viruses exhibit a preference for binding to human-type α-2,6 receptors, which indicates their potential threat to public health. However, our understanding of the molecular basis for this switch in receptor preference is still limited. In this study, we employed the random forest algorithm to identify potentially key amino acid sites within hemagglutinin (HA) that are associated with the receptor binding ability of H9N2 avian influenza virus (AIV). These sites were then verified by receptor binding assays. A total of 12 substitutions in the HA protein (N158D, N158S, A160N, A160D, A160T, T163I, T163V, V190T, V190A, D193N, D193G, and N231D) were predicted to prefer binding to α-2,6 receptors. Except for V190T, all of these substitutions were shown by receptor binding assays to bind preferentially to α-2,6 receptors. In particular, the A160T substitution caused a significant upregulation of immune-response genes and an increased mortality rate in mice. Our findings provide novel insights into the genetic basis of receptor preference of the H9N2 AIV.
The agricultural Internet of Things (IoT) system is a critical component of modern smart agriculture, and methods for assessing its security risk have attracted increasing attention from the industry. Current agricultural IoT security risk assessment methods rely primarily on expert judgment, which introduces subjective factors that reduce the credibility of the assessment results. To address this issue, this study constructed a dataset for agricultural IoT security risk assessment based on real-world security reports. A PCARF algorithm, built on random forest principles, was proposed, incorporating ensemble learning strategies to enhance prediction accuracy. Compared with the second-best model, the proposed model achieved a 2.7% increase in accuracy, a 3.4% improvement in recall, a 3.1% rise in Area Under the Curve (AUC), and a 7.9% gain in Matthews Correlation Coefficient (MCC). Extensive comparative experiments showed that the proposed model outperforms the others in prediction accuracy and robustness.
Accurate Electric Load Forecasting (ELF) is crucial for optimizing production capacity, improving operational efficiency, and managing energy resources effectively. Moreover, precise ELF contributes to a smaller environmental footprint by reducing the risks of disruption, downtime, and waste. However, with increasingly complex energy consumption patterns driven by renewable energy integration and changing consumer behaviors, no single approach has emerged as universally effective. In response, this research presents a hybrid modeling framework that combines the strengths of Random Forest (RF) and Autoregressive Integrated Moving Average (ARIMA) models, enhanced with an advanced feature selection method, Minimum Redundancy Maximum Relevancy and Maximum Synergy (MRMRMS), to produce a sparse model. Additionally, the residual patterns are analyzed to enhance forecast accuracy. High-resolution weather data from Weather Underground and historical energy consumption data from PJM for Duke Energy Ohio and Kentucky (DEO&K) are used in this application. The methodology, termed SP-RF-ARIMA, is evaluated against existing approaches; it achieves a reduction of more than 40% in mean absolute error and root mean square error compared with the second-best method.
The prediction of slope stability is a complex nonlinear problem. This paper proposes a new method based on the random forest (RF) algorithm to study the stability of rock slopes. Taking Bukit Merah, Perak and Twin Peak (Kuala Lumpur) as the study area, the geometrical parameters characterizing the slopes are obtained through a multidisciplinary approach consisting of geological, geotechnical, and remote sensing analyses. Eighteen factors, including rock strength, rock quality designation (RQD), joint spacing, continuity, openness, roughness, filling, weathering, water seepage, temperature, vegetation index, water index, and orientation, are selected as model input variables, while the factor of safety (FOS) serves as the output. The area under the receiver operating characteristic (ROC) curve (AUC), together with precision and accuracy, is used to assess the predictive ability of the model. With a large training set and predicted parameters, an AUC of 0.95 is achieved. A precision score of 0.88 is obtained, indicating that the model has a low false positive rate and correctly identifies a substantial number of true positives. The findings emphasise the importance of using a variety of terrain characteristics and different approaches to characterise the rock slope.
The Random Forest (RF) method was used to classify whether rockburst will occur, and with what intensity, in underground rock projects. Several main control factors of rockburst, such as the in-situ stresses, the uniaxial compressive strength and tensile strength of the rock, and the elastic energy index of the rock, were selected for the analysis. The traditional indicators were summarized and divided into indexes I and II. A Random Forest model and criterion were obtained by training on 36 sets of rockburst samples from underground rock projects at home and abroad. Another 10 samples were tested and evaluated with the model, and the results agree well with the practical records. When the support vector machine (SVM) method, the artificial neural network (ANN) method, and the random forest method are compared, the corresponding misjudgment ratios are 10%, 20%, and 0, respectively. The misjudgment ratio using index I is smaller than that using index II. It is suggested that index I combined with the RF model can accurately classify rockburst grade.
To reduce the effects of noise and overfitting and to improve on the limited classification performance of a decision tree, a traffic incident detection method based on the random forest algorithm is presented. From the perspective of classification strength and correlation, three experiments are performed to investigate the potential application of random forest to traffic incident detection: a comparison across different numbers of decision trees, a comparison with different decision trees, and a comparison with a neural network. Real traffic data from the I-880 database are used in the experiments. Detection performance is evaluated by common criteria including the detection rate, the false alarm rate, the mean time to detection, the classification rate, and the area under the receiver operating characteristic (ROC) curve. The experimental results indicate that the model based on random forest can improve the decision rate, reduce the testing time, and obtain a higher classification rate. It is also competitive with multi-layer feed forward neural networks (MLF).
Accurate assessment of the undrained shear strength (USS) of soft sensitive clays is a major concern in geotechnical engineering practice. This study applies data-driven extreme gradient boosting (XGBoost) and random forest (RF) ensemble learning methods to capture the relationships between the USS and various basic soil parameters. Based on soil data sets from the TC304 database, a general approach is developed to predict the USS of soft clays using these two machine learning methods, with five feature variables: the preconsolidation stress (PS), vertical effective stress (VES), liquid limit (LL), plastic limit (PL), and natural water content (W). To reduce the dependence on rules of thumb and inefficient brute-force search, Bayesian optimization is applied to determine appropriate hyper-parameters for both XGBoost and RF. The developed models are compared with three other machine learning methods and two transformation models with respect to predictive accuracy and robustness under 5-fold cross-validation (CV). The XGBoost-based and RF-based methods outperform these approaches. In addition, the XGBoost-based model provides feature importance ranks, which enhances the interpretability of the model and makes it a promising tool for predicting geotechnical parameters.
The present study aims to develop two hybrid models that optimize the input factors and enhance the predictive ability of landslide susceptibility models. A landslide inventory map was created with 406 historical landslides and 2030 non-landslide points, which were randomly divided into two datasets for model training (70%) and model testing (30%). Twenty-two factors were initially selected to establish a landslide factor database. We applied the GeoDetector and recursive feature elimination (RFE) methods to optimize the factors and reduce information redundancy and collinearity in the data. The frequency ratio method, a multicollinearity test, and the interactive detector were then used to analyze and evaluate the optimized factors. Subsequently, the random forest (RF) model was used to create landslide susceptibility maps with the original and the optimized factors. The resulting hybrid models, GeoDetector-RF and RFE-RF, were evaluated and compared by the area under the receiver operating characteristic curve (AUC) and accuracy. The accuracies of the two hybrid models (0.868 for GeoDetector-RF and 0.869 for RFE-RF) were higher than that of the RF model (0.860), indicating that the hybrid models with factor optimization have high reliability and predictability. Both RFE-RF and GeoDetector-RF also had higher AUC values (0.863 and 0.860, respectively) than RF (0.853). These results confirm the ability of factor optimization methods to improve the performance of landslide susceptibility models.
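The 70/30 division of the inventory can be sketched as follows. The abstract says only that the points were divided randomly; splitting each class separately (a stratified split), the function name, the seed, and the integer point identifiers are assumptions made here so that both subsets keep the 1:5 ratio of landslide to non-landslide points.

```python
import random
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[int, int]  # (point id, class label)


def stratified_split(
    items: Sequence[int],
    labels: Sequence[int],
    train_fraction: float = 0.7,
    seed: int = 0,
) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle each class separately and cut it at train_fraction."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for item, label in zip(items, labels):
        by_class.setdefault(label, []).append(item)

    train: List[Sample] = []
    test: List[Sample] = []
    for label, members in by_class.items():
        shuffled = list(members)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train += [(m, label) for m in shuffled[:cut]]
        test += [(m, label) for m in shuffled[cut:]]
    return train, test


# 406 landslide points (label 1) and 2030 non-landslide points (label 0).
labels = [1] * 406 + [0] * 2030
points = list(range(len(labels)))  # placeholder ids, not real coordinates

train, test = stratified_split(points, labels)
print(len(train), len(test))
```

With these counts the split yields 1705 training points and 731 testing points, of which 284 and 122 respectively are landslides.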
Funding (HORFBot social robot detection study): Funds for the Central Universities (grant number CUC24SG018).
Funding (sandstone porosity prediction study): National Natural Science Foundation of China (Grant No. 42274180); National Key Research and Development Program of China (2021YFC2902003).
Funding (IoT attack detection study): Institutional Fund Projects, grant no. IFPDP-261-22.
Funding (online fraud detection study): Guangdong Innovation and Entrepreneurship Training Programme for Undergraduates, "Automatic Classification and Identification of Fraudulent Websites Based on Machine Learning" (Project No. DC2023125).
Abstract: This paper explores the synergistic effect of a model combining Elastic Net and Random Forest in online fraud detection. The study selects a public network dataset containing 1781 data records, splits it into 70% for training and 30% for validation, and analyses the correlation between features using a correlation matrix. The experimental results show that Elastic Net feature selection generally outperforms PCA across all models, especially when combined with the Random Forest and XGBoost models. The Elastic Net + Random Forest model achieves the highest accuracy (0.968) and AUC (0.983), while its Kappa and MCC reach 0.839 and 0.844 respectively, showing strong agreement and correlation between predictions and true labels. This indicates that combining Elastic Net feature selection with a Random Forest model has significant performance advantages in online fraud detection.
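Kappa and MCC are less familiar than accuracy, so a small illustration may help. The sketch below computes Cohen's kappa and the Matthews correlation coefficient from the four cells of a binary confusion matrix using their standard definitions. The function names and the example counts are hypothetical and are not taken from the study.

```python
import math


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa: agreement between predictions and labels beyond chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of predictions and labels.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case: only one class present on both sides
    return (observed - expected) / (1.0 - expected)


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts, chosen only to illustrate the formulas.
tp, fp, fn, tn = 40, 5, 10, 45
print(f"kappa = {cohen_kappa(tp, fp, fn, tn):.3f}")
print(f"MCC   = {mcc(tp, fp, fn, tn):.3f}")
```

Both metrics correct for class imbalance in a way plain accuracy does not, which is presumably why the abstract reports them alongside accuracy and AUC.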
Abstract: The Darjeeling Himalayan region, characterized by its complex topography and vulnerability to multiple environmental hazards, faces significant challenges including landslides, earthquakes, flash floods, and soil loss that critically threaten ecosystem stability. Among these challenges, soil erosion emerges as a silent disaster: a gradual yet relentless process whose impacts accumulate over time, progressively degrading landscape integrity and disrupting ecological sustainability. Unlike catastrophic events with immediate visibility, soil erosion's most devastating consequences often manifest decades later through diminished agricultural productivity, habitat fragmentation, and irreversible biodiversity loss. This study developed a scalable predictive framework employing Random Forest (RF) and Gradient Boosting Tree (GBT) machine learning models to assess and map soil erosion susceptibility across the region. A comprehensive geo-database was developed incorporating 11 erosion-triggering factors: slope, elevation, rainfall, drainage density, topographic wetness index, normalized difference vegetation index (NDVI), curvature, soil texture, land use, geology, and aspect. A total of 2,483 historical soil erosion locations were identified and randomly divided into two sets: 70% for model building and 30% for validation. The models revealed distinct spatial patterns of erosion risk, with GBT classifying 60.50% of the area as very low susceptibility, while RF identified 28.92% in this category. Notable differences emerged in high-risk zone identification, with GBT highlighting 7.42% and RF indicating 2.21% as very high erosion susceptibility areas. Both models demonstrated robust predictive capabilities, with GBT achieving 80.77% accuracy and 0.975 AUC, slightly outperforming RF's 79.67% accuracy and 0.972 AUC. Analysis of predictor variables identified elevation, slope, rainfall, and NDVI as the primary factors influencing erosion susceptibility, highlighting the complex interrelationship between geo-environmental factors and erosion processes. This research offers a strategic framework for targeted conservation and sustainable land management in the fragile Himalayan region, providing valuable insights to help policymakers implement effective soil erosion mitigation strategies and support long-term environmental sustainability.
Funding: Supported by the National Natural Science Foundation of China [42030109, 42074012]; the Scientific Study Project for Institutes of Higher Learning, Ministry of Education, Liaoning Province [LJKMZ20220673]; the State Key Laboratory of Geodesy and Earth's Dynamics, Innovation Academy for Precision Measurement Science and Technology [SKLGED2023-3-2]; the Liaoning Revitalization Talent Program [XLYC2203162]; and the Natural Science Foundation of Hebei Province in China [D2023402024].
Abstract: Zenith wet delay (ZWD) is a key parameter for precise positioning with global navigation satellite systems (GNSS) and plays a central role in meteorological research. Currently, most models consider only the periodic variability of the ZWD and neglect the effect of nonlinear factors on ZWD estimation. This oversight limits their ability to reflect rapid fluctuations of the ZWD. To capture and predict complicated variations in ZWD more accurately, this paper develops the CRZWD model by combining the GPT3 model with the random forest (RF) algorithm, using 5-year atmospheric profiles from 70 radiosonde (RS) stations across China. Taking data from 25 external test stations as reference, the root mean square (RMS) error of the CRZWD model is 29.95 mm. Compared with the GPT3 model and another model using a backpropagation neural network (BPNN), the accuracy improves by 24.7% and 15.9%, respectively. Notably, over 56% of the test stations exhibit an improvement of more than 20% relative to GPT3-ZWD. Further temporal and spatial analyses also demonstrate the accuracy and stability advantages of the CRZWD model, indicating its potential for GNSS-based applications.
Funding: Supported by the Science and Technology Project managed by the State Grid Jiangsu Electric Power Co., Ltd. (No. J2024163).
Abstract: With the spread of microgrid construction and the connection of renewable energy sources to the power system, the source and load uncertainty faced by the coordinated operation of multi-microgrids is becoming increasingly prominent, and the accuracy of typical scenario prediction is low. To improve the accuracy of scenario prediction under source and load uncertainty, this paper proposes a typical scenario identification model based on random forests and order parameters. First, a method for order parameter identification and quantification is provided for the coordinated operating mode of multi-microgrids, taking source-load uncertainty into account. Second, the dynamic change characteristics of the order parameters of the daily load curve, wind and solar curve, and load curve of typical scenarios are statistically analyzed to identify the key order parameters that have the most significant impact on load uncertainty. Then, the order parameters and seasonal distribution are used as features to train a random forest classification model for efficient scenario prediction. Finally, a simulation on actual data from a provincial distribution network shows that the proposed method classifies typical scenarios with an accuracy of 92.7%. Additionally, sensitivity analysis is conducted to assess how changes in uncertainty levels affect the importance of each order parameter, allowing for adaptive uncertainty mitigation strategies.
Funding: Supported by the National Key R&D Program of China project “Research and Application of Sensing System for Cross-regional Complex Oil & Gas Pipeline Network Safe and Efficiency Operational Status Monitoring” (Grant No. 2022YFB3207603).
Abstract: A core task in analyzing Electrochemical Impedance Spectroscopy (EIS) data is selecting an appropriate equivalent circuit model to quantify the parameters of the electrochemical reaction process. However, this process often relies on human experience and judgment, which introduces subjectivity and error. In this paper, an intelligent approach based on the Random Forest algorithm is proposed for matching EIS data to their equivalent circuits. It automatically selects the most suitable equivalent circuit model based on the characteristics and patterns of the EIS data. Addressing the typical scenario of metal corrosion, an atmospheric corrosion EIS dataset of low-carbon steel covering five different corrosion scenarios is constructed and used to validate and evaluate the proposed method. The contributions of this paper can be summarized in three aspects: (1) it proposes a method for selecting equivalent circuit models for EIS data based on the Random Forest algorithm; (2) using authentic EIS data collected from metal atmospheric corrosion, it establishes a dataset encompassing five categories of metal corrosion scenarios; (3) it validates the superiority of the proposed method on the established authentic EIS dataset. The experimental results demonstrate that, in terms of equivalent circuit matching, this method surpasses other machine learning algorithms in both precision and robustness. Furthermore, it shows strong applicability in the analysis of EIS data.
Funding: Supported by the National Natural Science Foundation of China (32273037 and 32102636); the Guangdong Major Project of Basic and Applied Basic Research (2020B0301030007); the Laboratory of Lingnan Modern Agriculture Project (NT2021007); the Guangdong Science and Technology Innovation Leading Talent Program (2019TX05N098); the 111 Center (D20008); the double first-class discipline promotion project (2023B10564003); and the Department of Education of Guangdong Province (2019KZDXM004 and 2019KCXTD001).
Abstract: A switch from avian-type α-2,3 to human-type α-2,6 receptors is an essential element for the initiation of a pandemic from an avian influenza virus. Some H9N2 viruses exhibit a preference for binding to human-type α-2,6 receptors, which identifies their potential threat to public health. However, our understanding of the molecular basis for the switch of receptor preference is still limited. In this study, we employed the random forest algorithm to identify potentially key amino acid sites within hemagglutinin (HA) that are associated with the receptor binding ability of H9N2 avian influenza virus (AIV). These sites were then verified by receptor binding assays. A total of 12 substitutions in the HA protein (N158D, N158S, A160N, A160D, A160T, T163I, T163V, V190T, V190A, D193N, D193G, and N231D) were predicted to prefer binding to α-2,6 receptors. Except for V190T, all of these substitutions were shown by receptor binding assays to bind preferentially to α-2,6 receptors. In particular, the A160T substitution caused a significant upregulation of immune-response genes and an increased mortality rate in mice. Our findings provide novel insights into the genetic basis of receptor preference of the H9N2 AIV.
Abstract: The agricultural Internet of Things (IoT) system is a critical component of modern smart agriculture, and its security risk assessment methods have garnered increasing attention from the industry. Current agricultural IoT security risk assessment methods rely primarily on expert judgment, introducing subjective factors that reduce the credibility of the assessment results. To address this issue, this study constructed a dataset for agricultural IoT security risk assessment based on real-world security reports. A PCARF algorithm, built on random forest principles, was proposed, incorporating ensemble learning strategies to enhance prediction accuracy. Compared to the second-best model, the proposed model demonstrated a 2.7% increase in accuracy, a 3.4% improvement in recall, a 3.1% rise in Area Under the Curve (AUC), and a 7.9% boost in Matthews Correlation Coefficient (MCC). Extensive comparative experiments showed that the proposed model outperforms others in prediction accuracy and robustness.
Funding: Supported by the Startup Grant (PG18929) awarded to F. Shokoohi.
Abstract: Accurate Electric Load Forecasting (ELF) is crucial for optimizing production capacity, improving operational efficiency, and managing energy resources effectively. Moreover, precise ELF contributes to a smaller environmental footprint by reducing the risks of disruption, downtime, and waste. However, with increasingly complex energy consumption patterns driven by renewable energy integration and changing consumer behaviors, no single approach has emerged as universally effective. In response, this research presents a hybrid modeling framework that combines the strengths of Random Forest (RF) and Autoregressive Integrated Moving Average (ARIMA) models, enhanced with an advanced feature selection method, Minimum Redundancy Maximum Relevancy and Maximum Synergy (MRMRMS), to produce a sparse model. Additionally, the residual patterns are analyzed to enhance forecast accuracy. High-resolution weather data from Weather Underground and historical energy consumption data from PJM for Duke Energy Ohio and Kentucky (DEO&K) are used in this application. This methodology, termed SP-RF-ARIMA, is evaluated against existing approaches; it demonstrates a reduction of more than 40% in mean absolute error and root mean square error compared to the second-best method.
Funding: support in providing the data and the Universiti Teknologi Malaysia supported this work under UTM Flagship CoE/RG-Coe/RG 5.2: Evaluating Surface PGA with Global Ground Motion Site Response Analyses for the highest seismic activity location in Peninsular Malaysia (Q.J130000.5022.10G47); Universiti Teknologi Malaysia-Earthquake Hazard Assessment in Peninsular Malaysia Using Probabilistic Seismic Hazard Analysis (PSHA) Method (Q.J130000.21A2.06E9).
Abstract: The prediction of slope stability is a complex nonlinear problem. This paper proposes a new method based on the random forest (RF) algorithm to study the stability of rock slopes. Taking Bukit Merah, Perak and Twin Peak (Kuala Lumpur) as the study area, the slope characteristics of geometrical parameters are obtained through a multidisciplinary approach (consisting of geological, geotechnical, and remote sensing analyses). Eighteen factors, including rock strength, rock quality designation (RQD), joint spacing, continuity, openness, roughness, filling, weathering, water seepage, temperature, vegetation index, water index, and orientation, are selected as model input variables, while the factor of safety (FOS) serves as the output. The area under the curve (AUC) of the receiver operating characteristic (ROC) curve is obtained, along with precision and accuracy, and used to analyse the predictive ability of the model. With a large training set and predicted parameters, an AUC of 0.95 is achieved. A precision score of 0.88 is obtained, indicating that most of the model's positive predictions are true positives and few are false positives. The findings emphasise the importance of using a variety of terrain characteristics and different approaches to characterise the rock slope.
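AUC and precision measure different things: AUC is the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, while precision is the share of positive predictions that are correct. The sketch below computes both from their definitions. The labels and scores are hypothetical toy values, and encoding one class as the positive class (label 1) is an assumption of this illustration rather than a detail from the abstract.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def precision(labels, predictions):
    """Share of positive predictions that are true positives: TP / (TP + FP)."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical toy data: 1 marks the positive class.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]
print(roc_auc(labels, scores), precision(labels, predictions))
```

The pairwise form is quadratic in the number of samples, which is fine for illustration; library implementations typically sort the scores instead.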
Funding: Projects (50934006, 10872218) supported by the National Natural Science Foundation of China; Project (2010CB732004) supported by the National Basic Research Program of China; Project (kjdb2010-6) supported by the Doctoral Candidate Innovation Research Support Program of Science & Technology Review, China; Project (201105) supported by the Scholarship Award for Excellent Doctoral Student, Ministry of Education, China.
Abstract: The Random Forest (RF) method was used to classify whether rockburst will happen and its intensity in underground rock projects. Some main control factors of rockburst, such as the in-situ stresses, the uniaxial compressive strength and tensile strength of rock, and the elastic energy index of rock, were selected in the analysis. The traditional indicators were summarized and divided into indexes I and II. The Random Forest model and criterion were obtained by training on 36 sets of rockburst samples from underground rock projects at home and abroad. Another 10 samples were tested and evaluated with the model. The evaluated results agree well with the practical records. Comparing the support vector machine (SVM) method, the artificial neural network (ANN) method, and the random forest method, the corresponding misjudgment ratios are 10%, 20%, and 0, respectively. The misjudgment ratio using index I is smaller than that using index II. It is suggested that using index I and the RF model can accurately classify rockburst grade.
Funding: The National High Technology Research and Development Program of China (863 Program) (No. 2012AA112304); the Scientific Innovation Research of College Graduates in Jiangsu Province (No. CXZZ13-0119).
Abstract: To avoid noise and overfitting and to further improve on the limited classification performance of a single decision tree, a traffic incident detection method based on the random forest algorithm is presented. From the perspective of classification strength and correlation, three experiments are performed to investigate the potential application of random forest to traffic incident detection: a comparison across different numbers of decision trees, a comparison across different decision trees, and a comparison with a neural network. Real traffic data from the I-880 database are used in the experiments. Detection performance is evaluated by common criteria including the detection rate, the false alarm rate, the mean time to detection, the classification rate, and the area under the receiver operating characteristic (ROC) curve. The experimental results indicate that the model based on random forest can improve the decision rate, reduce the testing time, and obtain a higher classification rate. Meanwhile, it is competitive with multi-layer feed-forward neural networks (MLF).
Funding: Financial support from the High-end Foreign Expert Introduction program (No. G20190022002), the Chongqing Construction Science and Technology Plan Project (2019-0045), and the Chongqing Engineering Research Center of Disaster Prevention & Control for Banks and Structures in Three Gorges Reservoir Area (Nos. SXAPGC18ZD01 and SXAPGC18YB03).
Abstract: Accurate assessment of undrained shear strength (USS) for soft sensitive clays is a great concern in geotechnical engineering practice. This study applies data-driven extreme gradient boosting (XGBoost) and random forest (RF) ensemble learning methods to capture the relationships between the USS and various basic soil parameters. Based on soil data sets from the TC304 database, a general approach is developed to predict the USS of soft clays using these two machine learning methods, with five feature variables: the preconsolidation stress (PS), vertical effective stress (VES), liquid limit (LL), plastic limit (PL), and natural water content (W). To reduce the dependence on rules of thumb and inefficient brute-force search, Bayesian optimization is applied to determine appropriate hyper-parameters for both XGBoost and RF. The developed models are comprehensively compared with three other machine learning methods and two transformation models with respect to predictive accuracy and robustness under 5-fold cross-validation (CV). The XGBoost-based and RF-based methods are shown to outperform these approaches. In addition, the XGBoost-based model provides feature importance ranks, which enhances the interpretability of the model and makes it a promising tool for predicting geotechnical parameters.
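The 5-fold cross-validation mentioned in this abstract can be sketched as follows. This is an illustrative index splitter only, not the study's pipeline: the function name `k_fold_indices`, the seed, and the sample count are assumptions, and the model fitting and Bayesian hyper-parameter search are left out.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for shuffled k-fold cross-validation.

    Every sample appears in exactly one test fold; fold sizes differ by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical dataset size, used only to show the fold sizes.
for fold_no, (train, test) in enumerate(k_fold_indices(23, k=5), start=1):
    print(f"fold {fold_no}: {len(train)} training samples, {len(test)} test samples")
```

Averaging a metric over the five test folds gives a steadier estimate than a single train/test split, which matters when competing models differ only slightly.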
Abstract: The present study aims to develop two hybrid models to optimize the factors and enhance the predictive ability of landslide susceptibility models. For this, a landslide inventory map was created with 406 historical landslides and 2030 non-landslide points, which were randomly divided into two datasets for model training (70%) and model testing (30%). Twenty-two factors were initially selected to establish a landslide factor database. We applied the GeoDetector and recursive feature elimination (RFE) methods for factor optimization, to reduce information redundancy and collinearity in the data. Thereafter, the frequency ratio method, multicollinearity test, and interactive detector were used to analyze and evaluate the optimized factors. Subsequently, the random forest (RF) model was used to create landslide susceptibility maps with the original and optimized factors. The resultant hybrid models, GeoDetector-RF and RFE-RF, were evaluated and compared by the area under the receiver operating characteristic curve (AUC) and accuracy. The accuracy of the two hybrid models (0.868 for GeoDetector-RF and 0.869 for RFE-RF) was higher than that of the RF model (0.860), indicating that the hybrid models with factor optimization have high reliability and predictability. Both RFE-RF and GeoDetector-RF had higher AUC values (0.863 and 0.860, respectively) than RF (0.853). These results confirm the ability of factor optimization methods to improve the performance of landslide susceptibility models.