698 articles found
Synergy Between Resilient Networks and Random Forests in Online Fraud Detection
1
Authors: Junxi Wang, Ningtao Sun, Yuhan Lv, Jiayi Zhou, Yue Xiao. Journal of Electronic Research and Application, 2025, No. 2, pp. 43-50 (8 pages)
This paper explores the synergistic effect of a model combining Elastic Net and Random Forest in online fraud detection. The study selects a public network dataset containing 1781 data records, divides the dataset into 70% for training and 30% for validation, and analyses the correlation between features using a correlation matrix. The experimental results show that the Elastic Net feature selection method generally outperforms PCA in all models, especially when combined with the Random Forest and XGBoost models. The ElasticNet+Random Forest model achieves the highest accuracy of 0.968 and AUC value of 0.983, while the Kappa and MCC also reach 0.839 and 0.844 respectively, showing extremely high consistency and correlation. This indicates that combining Elastic Net feature selection with the Random Forest model has significant performance advantages in online fraud detection.
Keywords: Fraudulent websites; Machine learning; Elastic Net; random forests
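The abstract above reports accuracy, Cohen's Kappa, and the Matthews correlation coefficient (MCC) side by side. As a reading aid, the following minimal sketch shows how these three figures relate to a binary confusion matrix. The function name `binary_metrics` and the example counts are illustrative assumptions; they do not come from the paper, whose own confusion matrix is not given in the abstract.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, Cohen's kappa, MCC) for a 2x2 confusion matrix.

    Hypothetical helper for illustration only; counts are raw integers.
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n

    # Agreement expected by chance, from the row and column marginals.
    p_pos = ((tp + fp) / n) * ((tp + fn) / n)
    p_neg = ((fn + tn) / n) * ((fp + tn) / n)
    p_e = p_pos + p_neg
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0

    # MCC is the correlation between predicted and true binary labels.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, kappa, mcc
```

Both Kappa and MCC correct for chance agreement, which is why they sit below raw accuracy when classes are imbalanced, as in the reported 0.968 accuracy versus 0.839 Kappa.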
Estimating grassland LAI using the Random Forests approach and Landsat imagery in the meadow steppe of Hulunber, China (Cited by: 14)
2
Authors: LI Zhen-wang, XIN Xiao-ping, TANG Huan, YANG Fan, CHEN Bao-rui, ZHANG Bao-hui. Journal of Integrative Agriculture (SCIE, CAS, CSCD), 2017, No. 2, pp. 286-297 (12 pages)
Leaf area index (LAI) is a key parameter for describing vegetation structures and is closely associated with vegetative photosynthesis and energy balance. The accurate retrieval of LAI is important when modeling biophysical processes of vegetation and the productivity of earth systems. The Random Forests (RF) method aggregates an ensemble of decision trees to improve the prediction accuracy and demonstrates a more robust capacity than other regression methods. This study evaluated the RF method for predicting grassland LAI using ground measurements and remote sensing data. Parameter optimization and variable reduction were conducted before model prediction. Two variable reduction methods were examined: the Variable Importance Value method and the principal component analysis (PCA) method. Finally, the sensitivity of RF to highly correlated variables was tested. The results showed that the RF parameters have a small effect on the performance of RF, and a satisfactory prediction was acquired with a root mean square error (RMSE) of 0.1956. The two variable reduction methods for RF prediction produced different results; variable reduction based on the Variable Importance Value method achieved nearly the same prediction accuracy with no reduced prediction, whereas variable reduction using the PCA method had an obviously degraded result that may have been caused by the loss of subtle variations and the fusion of noise information. After removing highly correlated variables, the relative variable importance remained steady, and the use of variables selected based on the best-performing vegetation indices performed better than the variables with all vegetation indices or those selected based on the most important one. The results in this study demonstrate the practical and powerful ability of the RF method in predicting grassland LAI, which can also be applied to the estimation of other vegetation traits as an alternative to conventional empirical regression models and the selection of relevant variables used in ecological models.
Keywords: leaf area index; random forests; grassland; remote sensing; Hulunber
MOOC Learner’s Final Grade Prediction Based on an Improved Random Forests Method (Cited by: 1)
3
Authors: Yuqing Yang, Peng Fu, Xiaojiang Yang, Hong Hong, Dequn Zhou. Computers, Materials & Continua (SCIE, EI), 2020, No. 12, pp. 2413-2423 (11 pages)
Massive Open Online Courses (MOOCs) have become a popular way of online learning used across the world by millions of people. Meanwhile, a vast amount of information has been collected from MOOC learners and institutions. Based on this educational data, much research has been conducted on predicting the MOOC learner’s final grade. However, there are still two problems in this research field. The first problem is how to select the most proper features to improve the prediction accuracy, and the second problem is how to use or modify data mining algorithms for a better analysis of the MOOC data. In order to solve these two problems, an improved random forests method is proposed in this paper. First, a hybrid indicator is defined to measure the importance of the features, and a rule is further established for the feature selection; then, a Clustering-Synthetic Minority Over-sampling Technique (SMOTE) is embedded into the traditional random forests algorithm to solve the class imbalance problem. In the experimental part, we verify the performance of the proposed method using the Canvas Network Person-Course (CNPC) dataset. Furthermore, four well-known prediction methods have been applied for comparison, against which the superiority of our method is demonstrated.
Keywords: random forests; grade prediction; feature selection; class imbalance
Random Forests Algorithm Based Duplicate Detection in On-Site Programming Big Data Environment (Cited by: 1)
4
Authors: Qianqian Li, Meng Li, Lei Guo, Zhen Zhang. Journal of Information Hiding and Privacy Protection, 2020, No. 4, pp. 199-205 (7 pages)
On-site programming big data refers to the massive data generated in the process of software development, characterized by real-time generation, complexity, and high processing difficulty. Therefore, data cleaning is essential for on-site programming big data. Duplicate data detection is an important step in data cleaning, which can save storage resources and enhance data consistency. Due to the insufficiency of the traditional Sorted Neighborhood Method (SNM) and the difficulty of high-dimensional data detection, an optimized algorithm based on random forests with a dynamic and adaptive window size is proposed. The efficiency of the algorithm can be elevated by improving the method of key selection, reducing the dimension of the data set, and using an adaptive variable-size sliding window. Experimental results show that the improved SNM algorithm exhibits better performance and achieves higher accuracy.
Keywords: On-site programming big data; duplicate record detection; random forests; adaptive sliding window
Overfitting in Machine Learning: A Comparative Analysis of Decision Trees and Random Forests
5
Authors: Erblin Halabaku, Eliot Bytyçi. Intelligent Automation & Soft Computing, 2024, No. 6, pp. 987-1006 (20 pages)
Machine learning has emerged as a pivotal tool in deciphering and managing the excess of information in an era of abundant data. This paper presents a comprehensive analysis of machine learning algorithms, focusing on the structure and efficacy of random forests in mitigating overfitting, a prevalent issue in decision tree models. It also introduces a novel approach to enhancing decision tree performance through an optimized pruning method called Adaptive Cross-Validated Alpha CCP (ACV-CCP). This method refines traditional cost complexity pruning by streamlining the selection of the alpha parameter, leveraging cross-validation within the pruning process to achieve a reliable, computationally efficient alpha selection that generalizes well to unseen data. By enhancing computational efficiency and balancing model complexity, ACV-CCP allows decision trees to maintain predictive accuracy while minimizing overfitting, effectively narrowing the performance gap between decision trees and random forests. Our findings illustrate how ACV-CCP contributes to the robustness and applicability of decision trees, providing a valuable perspective on achieving computationally efficient and generalized machine learning models.
Keywords: Artificial intelligence; decision tree; random forest; prune; overfitting
Boosting SISSO performance on small sample datasets by using Random Forests prescreening for complex feature selection
6
Authors: Xiaolin Jiang, Guanqi Liu, Jiaying Xie, Zhenpeng Hu. Frontiers of Physics, 2025, No. 1, pp. 117-123 (7 pages)
In materials science, data-driven methods accelerate material discovery and optimization while reducing costs and improving success rates. Symbolic regression is key to extracting material descriptors from large datasets, in particular the Sure Independence Screening and Sparsifying Operator (SISSO) method. Because SISSO needs to store the entire expression space, it imposes heavy memory demands, which limits its performance in complex problems. To address this issue, we propose an RF-SISSO algorithm by combining Random Forests (RF) with SISSO. In this algorithm, the Random Forests algorithm is used for prescreening, capturing non-linear relationships and improving feature selection, which may enhance the quality of the input data and boost the accuracy and efficiency on regression and classification tasks. In a test on SISSO’s verification problem for 299 materials, RF-SISSO demonstrates robust performance and high accuracy. RF-SISSO maintains a testing accuracy above 0.9 across all four training sample sizes while significantly enhancing regression efficiency, especially in training subsets with smaller sample sizes. For the training subset with 45 samples, the efficiency of RF-SISSO was 265 times higher than that of the original SISSO. As collecting large datasets is both costly and time-consuming in practical experiments, RF-SISSO may benefit scientific research by efficiently offering high prediction accuracy with limited data.
Keywords: random forests algorithm; SISSO; symbolic regression algorithm; machine learning; small datasets; prescreening; complex feature selection
A Systematic Comparison of Horizontal Federated Learning Algorithm Based on Random Forests in a Medical Setting
7
Authors: Andrew Cheng, Jingqing Zhang, Atri Sharma, Vibhor Gupta, Yike Guo. Machine Intelligence Research, 2025, No. 2, pp. 254-266 (13 pages)
The medical industry generates vast amounts of data suitable for machine learning during patient-clinician interaction in hospitals. However, as a result of data protection regulations like the General Data Protection Regulation (GDPR), patient data cannot be shared freely across institutions. In these cases, federated learning (FL) is a viable option where a global model learns from multiple data sites without moving the data. In this paper, we focused on random forests (RFs) for their effectiveness in classification tasks and widespread use throughout the medical industry, and compared two popular federated random forest aggregation algorithms on horizontally partitioned data. We first provided necessary background information on federated learning, the advantages of random forests in a medical context, and the two aggregation algorithms. A series of extensive experiments using four public binary medical datasets (an excerpt of MIMIC III, the Pima Indian diabetes dataset from Kaggle, and the diabetic retinopathy and heart failure datasets from the UCI machine learning repository) were then performed to systematically compare the two on equal-sized, unequal-sized, and class-imbalanced clients. A follow-up investigation on the effects of more clients was also conducted. We finally empirically analyzed the advantages of federated learning and concluded that the weighted merge algorithm produces models with, on average, a 1.903% higher F1 score and a 1.406% higher AUCROC value.
Keywords: Federated learning; horizontal federated learning; random forests; machine learning; medical diagnosis
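Research on the Impact of New-Type Urbanization on Common Prosperity: An Analysis Based on the Random Forests Model and the Loess Model (Cited by: 5)
8
Authors: Ouyang Jinqiong (欧阳金琼), Zhang Junlei (张俊蕾), Wang Yumeng (王雨濛). Urban Problems (城市问题) (CSSCI, PKU Core), 2024, No. 3, pp. 91-103 (13 pages)
Based on data for 31 provinces from 2010 to 2021, this study uses a composite scoring method to measure China's level of new-type urbanization and degree of common prosperity, and combines the Random Forests model and the Loess model to analyze the effect of new-type urbanization on common prosperity and its causes. The study finds that new-type urbanization promotes common prosperity overall, but with heterogeneity across dimensions and across regions. Ranked from highest to lowest influence on common prosperity, the dimensions are economic urbanization, population urbanization, social urbanization, and ecological urbanization; ranked from highest to lowest correlation, the regions are the Northeast, East, Central, and West. Different dimensions of urbanization affect common prosperity to different degrees in different regions: economic urbanization has the greatest influence in the East, Central, and Northeast regions, while population urbanization has the greatest influence in the West. On this basis, the paper proposes policy recommendations for advancing common prosperity that take the all-round development of people as the goal, economic urbanization as the focus, differentiated development strategies as the premise, and the equalization of public services as the breakthrough point.
Keywords: new-type urbanization; common prosperity; Random Forests model; Loess model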
Multiple Random Forests Based Intelligent Location of Single-phase Grounding Fault in Power Lines of DFIG-based Wind Farm (Cited by: 4)
9
Authors: Yongli Zhu, Hua Peng. Journal of Modern Power Systems and Clean Energy (SCIE, EI, CSCD), 2022, No. 5, pp. 1152-1163 (12 pages)
To address the problems of wind power abandonment and the stoppage of electricity transmission caused by a short circuit in a power line of a doubly-fed induction generator (DFIG) based wind farm, this paper proposes an intelligent location method for a single-phase grounding fault based on a multiple random forests (multi-RF) algorithm. First, the simulation model is built, and the fundamental amplitudes of the zero-sequence currents are extracted by a fast Fourier transform (FFT) to construct the feature set. Then, the random forest classification algorithm is applied to establish the fault section locator. The model is resampled on the basis of the bootstrap method to generate multiple sample subsets, which are used to establish multiple classification and regression tree (CART) classifiers. The CART classifiers use the mean decrease in the node impurity as the feature importance, which is used to mine the relationship between features and fault sections. Subsequently, a fault section is identified by voting on the test results for each classifier. Finally, a multi-RF regression fault locator is built to output the predicted fault distance. Experimental results with PSCAD/EMTDC software show that the proposed method can overcome the shortcomings of a single RF and has the advantage of locating a short hybrid overhead/cable line with multiple branches. Compared with support vector machines (SVMs) and previously reported methods, the proposed method can better meet the location accuracy and efficiency requirements of a DFIG-based wind farm.
Keywords: Doubly-fed induction generator (DFIG) based wind farm; power line; multiple random forests (multi-RF); single-phase grounding fault; fault location
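The abstract above describes the generic random forest recipe: bootstrap resampling produces several training subsets, one classifier is fitted per subset, and the final class is chosen by voting. The sketch below illustrates only that recipe. It is not the paper's multi-RF locator: a one-feature threshold stump (`fit_stump`, a hypothetical stand-in) replaces the CART trees, and all names and data shapes are assumptions.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap subset)."""
    return [rng.choice(rows) for _ in rows]


def fit_stump(rows):
    """Fit a one-feature threshold classifier on (x, label) pairs.

    Hypothetical stand-in for a CART tree, kept tiny for illustration.
    """
    best = None
    for threshold in sorted({x for x, _ in rows}):
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left)
        errors += sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def fit_forest(rows, n_trees, seed=0):
    """Fit one stump per bootstrap subset."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(rows, rng)) for _ in range(n_trees)]


def predict(forest, x):
    """Majority vote over the individual classifiers."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]
```

Individual stumps can disagree near the class boundary because each sees a different bootstrap subset; voting is what smooths those disagreements out.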
Variable importance-weighted Random Forests (Cited by: 4)
10
Authors: Yiyi Liu, Hongyu Zhao. Frontiers of Electrical and Electronic Engineering in China (CSCD), 2017, No. 4, pp. 338-351 (14 pages)
Background: Random Forests is a popular classification and regression method that has proven powerful for various prediction problems in biological studies. However, its performance often deteriorates when the number of features increases. To address this limitation, feature elimination Random Forests was proposed, which only uses features with the largest variable importance scores. Yet the performance of this method is not satisfying, possibly due to its rigid feature selection and increased correlations between trees of the forest. Methods: We propose variable importance-weighted Random Forests, which, instead of sampling features with equal probability at each node to build up trees, samples features according to their variable importance scores, and then selects the best split from the randomly selected features. Results: We evaluate the performance of our method through comprehensive simulation and real data analyses, for both regression and classification. Compared to the standard Random Forests and the feature elimination Random Forests methods, our proposed method has improved performance in most cases. Conclusions: By incorporating the variable importance scores into the random feature selection step, our method can better utilize more informative features without completely ignoring less informative ones, and hence has improved prediction accuracy in the presence of weak signals and large noises. We have implemented an R package "viRandomForests" based on the original R package "randomForest", and it can be freely downloaded from http://zhaocenter.org/software.
Keywords: random forests; variable importance score; classification; regression
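The Methods sentence above replaces uniform feature sampling at each node with sampling weighted by variable importance. A minimal sketch of that one step follows. It assumes sampling without replacement and at least `k` features with strictly positive importance; the abstract does not state either detail, and `sample_features` is a hypothetical name, not a function of the viRandomForests package.

```python
import random


def sample_features(importances, k, rng):
    """Draw k distinct feature indices, each with probability
    proportional to its importance score among the remaining features.

    Assumes at least k features have a strictly positive score.
    """
    candidates = list(range(len(importances)))
    weights = list(importances)
    chosen = []
    for _ in range(k):
        # Weighted draw of a position among the features still available.
        pick = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        chosen.append(candidates.pop(pick))
        weights.pop(pick)
    return chosen
```

With uniform weights this reduces to the standard Random Forests node sampling; a feature with zero weight is never drawn, which corresponds to the hard cut-off of the feature elimination variant that the abstract contrasts against.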
Learning random forests for ranking (Cited by: 2)
11
Authors: Liangxiao Jiang (ljiang@cug.edu.cn). Frontiers of Computer Science (SCIE, EI, CSCD), 2011, No. 1, pp. 79-86 (8 pages)
The random forests (RF) algorithm, which combines the predictions from an ensemble of random trees, has achieved significant improvements in terms of classification accuracy. In many real-world applications, however, ranking is often required in order to make optimal decisions. Thus, we focus our attention on the ranking performance of RF in this paper. Our experimental results based on the entire 36 UC Irvine Machine Learning Repository (UCI) data sets published on the main website of the Weka platform show that RF doesn't perform well in ranking, and is even about the same as a single C4.4 tree. This fact raises the question of whether several improvements to RF can scale up its ranking performance. To answer this question, we single out an improved random forests (IRF) algorithm. Instead of the information gain measure and the maximum-likelihood estimate, the average gain measure and the similarity-weighted estimate are used in IRF. Our experiments show that IRF significantly outperforms all the other algorithms used for comparison in terms of ranking while maintaining the high classification accuracy characterizing RF.
Keywords: random forests (RF); decision tree; random selection; class probability estimation; ranking; the area under the receiver operating characteristics curve (AUC)
A genome-wide association study of Alzheimer's disease using random forests and enrichment analysis (Cited by: 2)
12
Authors: ZOU Liang, HUANG Qiong, LI Ao, WANG MingHui. Science China (Life Sciences) (SCIE, CAS), 2012, No. 7, pp. 618-625 (8 pages)
Alzheimer's disease (AD) is a serious neurodegenerative disorder and its cause remains largely elusive. In past years, genome-wide association (GWA) studies have provided an effective means for AD research. However, the univariate method that is commonly used in GWA studies cannot effectively detect the biological mechanisms associated with this disease. In this study, we propose a new strategy for the GWA analysis of AD that combines random forests with enrichment analysis. First, backward feature selection using random forests was performed on a GWA dataset of AD patients carrying the apolipoprotein gene (APOEε4), and 1058 susceptible single nucleotide polymorphisms (SNPs) were detected, including several known AD-associated SNPs. Next, the susceptible SNPs were investigated by enrichment analysis, and significantly-associated gene functional annotations, such as 'alternative splicing', 'glycoprotein', and 'neuron development', were successfully discovered, indicating that these biological mechanisms play important roles in the development of AD in APOEε4 carriers. These findings may provide insights into the pathogenesis of AD and helpful guidance for further studies. Furthermore, this strategy can easily be modified and applied to GWA studies of other complex diseases.
Keywords: genome-wide association study; random forests; enrichment analysis; feature selection; Alzheimer's disease
Increasing prediction performance of colorectal cancer disease status using random forests classification based on metagenomic shotgun sequencing data (Cited by: 1)
13
Authors: Yilin Gao, Zifan Zhu, Fengzhu Sun. Synthetic and Systems Biotechnology (SCIE), 2022, No. 1, pp. 574-585 (12 pages)
Dysfunction of microbial communities in various human body sites has been shown to be associated with a variety of diseases, raising the possibility of predicting diseases based on metagenomic samples. Although many studies have investigated this problem, there is no consensus on the optimal approaches for predicting disease status based on metagenomic samples. Using six human gut metagenomic datasets consisting of large numbers of colorectal cancer patients and healthy controls from different countries, we investigated different software packages for extracting relative abundances of known microbial genomes and for integrating mapping and assembly approaches to obtain the relative abundance profiles of both known and novel genomes. The random forests (RF) classification algorithm was then used to predict colorectal cancer status based on the microbial relative abundance profiles. Based on within-data cross-validation and cross-dataset prediction, we show that the RF prediction performance using the microbial relative abundance profiles estimated by Centrifuge is generally higher than that using the microbial relative abundance profiles estimated by MetaPhlAn2 and Bracken. We also develop a novel method to integrate the relative abundance profiles of both known and novel microbial organisms to further increase the prediction performance for colorectal cancer from metagenomes.
Keywords: Microbiome; Colorectal cancer; Metagenomic shotgun sequencing; random forests
Random forests to predict survival of octogenarians with brain metastases from nonsmall-cell lung cancer
14
Authors: Lijun Song, Yu Wang, Xue Li, Yi Liu, Bingyi Yin, Daorui Li, Hongsheng Lin, Yuqi Zhang. Brain Science Advances, 2024, No. 1, pp. 39-56 (18 pages)
Background: To create and validate nomograms for the personalized prediction of survival in octogenarians with newly diagnosed nonsmall-cell lung cancer (NSCLC) with sole brain metastases (BMs). Methods: Random forests (RF) were applied to identify independent prognostic factors for building nomogram models. The predictive accuracy of the model was evaluated based on the receiver operating characteristic (ROC) curve, C-index, and calibration plots. Results: The area under the curve (AUC) values for overall survival at 6, 12, and 18 months in the validation cohort were 0.837, 0.867, and 0.849, respectively; the AUC values for cancer-specific survival prediction were 0.819, 0.835, and 0.818, respectively. The calibration curves visualized the accuracy of the model. Conclusion: The new nomograms have good predictive power for survival among octogenarians with sole BMs related to NSCLC.
Keywords: Octogenarian; NSCLC; brain metastases; random forests; nomogram
A comparative study of fuzzy weights of evidence and random forests for mapping mineral prospectivity for skarn-type Fe deposits in the southwestern Fujian metallogenic belt, China (Cited by: 11)
15
Authors: ZHANG Zhen Jie, ZUO Ren Guang, XIONG Yi Hui. Science China Earth Sciences (SCIE, EI, CAS, CSCD), 2016, No. 3, pp. 556-572 (17 pages)
Recent studies have pointed out that the widespread iron deposits in the southwestern Fujian metallogenic belt (SFMB) (China) are skarn-type deposits associated with the Yanshanian granites. There is still excellent potential for mineral exploration because large areas in this belt are covered by forest. This study developed a new predictive model for mapping skarn-type Fe deposit prospectivity in this belt, using five criteria as evidence: (1) the contact zones of Yanshanian granites (GRANITE); (2) the contact zones within the late Paleozoic marine sedimentary rocks and the carbonate formations (FORMATION); (3) the NE-NNE-trending faults (FAULT); (4) the zones of skarn alterations (SKARN); and (5) the aeromagnetic anomaly (AEROMAGNETIC). The fuzzy weights of evidence (FWofE) method, developed from the classical weights of evidence (WofE) and based on fuzzy sets and fuzzy probabilities, could provide smaller variances and more accurate posterior probabilities, could effectively minimize the uncertainty caused by omitted or wrongly assigned data, and is more flexible than the WofE. It is an efficient and widely used method for mineral potential mapping. Random forests (RF) is a new and useful method for data-driven predictive mapping of mineral prospectivity, and needs further scrutiny. The prospectivity results obtained with the FWofE and RF methods both reveal that the prediction model for the skarn-type Fe deposits in the SFMB is successful and efficient. Both methods suggested that GRANITE and FORMATION are the most valuable evidence maps, followed by SKARN, AEROMAGNETIC, and FAULT. This is consistent with the skarn-type Fe deposit mineral model in the SFMB. The unstable performance experienced when FORMATION was omitted might indicate that the highest uncertainty and risk in follow-up exploration is related to the sequences. In addition, the performance of the RF method for skarn-type Fe deposit prospectivity in the SFMB is better than that of the FWofE; therefore, it could be used to guide further exploration of skarn-type Fe prospects in the SFMB.
Keywords: Mineral prospectivity mapping; Fuzzy weights of evidence; random forest; Skarn-type Fe; Makeng deposit
A new correlation-based approach for ensemble selection in random forests (Cited by: 2)
16
Authors: Mostafa El Habib Daho, Nesma Settouti, Mohammed El Amine Bechar, Amina Boublenza, Mohammed Amine Chikh. International Journal of Intelligent Computing and Cybernetics (EI), 2021, No. 2, pp. 251-268 (18 pages)
Purpose: Ensemble methods have been widely used in the field of pattern recognition due to the difficulty of finding a single classifier that performs well on a wide variety of problems. Despite the effectiveness of these techniques, studies have shown that ensemble methods generate a large number of hypotheses that contain redundant classifiers in most cases. Several works proposed in the state of the art attempt to reduce the set of hypotheses without affecting performance. Design/methodology/approach: In this work, the authors propose a pruning method that takes into consideration the correlation between classifiers/classes and of each classifier with the rest of the set. The authors have used the random forest algorithm as the tree-based ensemble classifier, and the pruning was made by a technique inspired by the CFS (correlation feature selection) algorithm. Findings: The proposed method CES (correlation-based Ensemble Selection) was evaluated on ten datasets from the UCI machine learning repository, and the performances were compared to six ensemble pruning techniques. The results showed that the proposed pruning method selects a small ensemble in a smaller amount of time while improving classification rates compared to the state-of-the-art methods. Originality/value: CES is a new ordering-based method that uses the CFS algorithm. CES selects, in a short time, a small sub-ensemble that outperforms results obtained from the whole forest and the other state-of-the-art techniques used in this study.
Keywords: Ensemble pruning; random forest; Tree selection; Correlation; CFS; CES
Interpreting uninterpretable predictors: kernel methods, Shtarkov solutions, and random forests
17
Authors: T. M. Le, Bertrand Clarke. Statistical Theory and Related Fields, 2022, No. 1, pp. 10-28 (19 pages)
Many of the best predictors for complex problems are typically regarded as hard to interpret physically. These include kernel methods, Shtarkov solutions, and random forests. We show that, despite the inability to interpret these three predictors to infinite precision, they can be asymptotically approximated and admit conceptual interpretations in terms of their mathematical/statistical properties. The resulting expressions can be in terms of polynomials, basis elements, or other functions that an analyst may regard as interpretable.
Keywords: Bayes; boosting; kernel methods; random forest; Shtarkov predictor; stacking
Modelling Key Population Attrition in the HIV and AIDS Programme in Kenya Using Random Survival Forests with Synthetic Minority Oversampling Technique-Nominal Continuous
18
作者 Evan Kahacho Charity Wamwea +1 位作者 Bonface Malenje Gordon Aomo 《Journal of Data Analysis and Information Processing》 2023年第1期11-36,共26页
HIV and AIDS has continued to be a major public health concern, and hence one of the epidemics that the world resolved to end by 2030 as highlighted in sustainable development goals (SDGs). A colossal amount of effort... HIV and AIDS has continued to be a major public health concern, and hence one of the epidemics that the world resolved to end by 2030 as highlighted in sustainable development goals (SDGs). A colossal amount of effort has been taken to reduce new HIV infections, but there are still a significant number of new infections reported. HIV prevalence is more skewed towards the key population who include female sex workers (FSW), men who have sex with men (MSM), and people who inject drugs (PWID). The study design was retrospective and focused on key population enrolled in a comprehensive HIV and AIDS programme by the Kenya Red Cross Society from July 2019 to June 2021. Individuals who were either lost to follow up, defaulted (dropped out, transferred out, or relocated) or died were classified as attrition;while those who were active and alive by the end of the study were classified as retention. The study used density analysis to determine the spatial differences of key population attrition in the 19 targeted counties, and used Kilifi county as an example to map attrition cases in smaller administrative areas (sub-county level). The study used synthetic minority oversampling technique-nominal continuous (SMOTE-NC) to balance the datasets since the cases of attrition were much less than retention. The random survival forests model was then fitted to the balanced dataset. The model correctly identified attrition cases using the predicted ensemble mortality and their survival time using the estimated Kaplan-Meier survival function. The predictive performance of the model was strong and way better than random chance with concordance indices greater than 0.75. 展开更多
Keywords: random survival forests; synthetic minority oversampling technique-nominal continuous (SMOTE-NC); key population; female sex workers (FSW); men who have sex with men (MSM); people who inject drugs (PWID)
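The concordance index reported in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function name `concordance_index`, its arguments, and the toy data are assumptions made for illustration, and the sketch simply skips pairs with tied observed times. It counts, among comparable pairs (the subject with the shorter observed time experienced the event), how often that subject was also given the higher predicted risk.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Simplified Harrell's C-index (hypothetical helper, for illustration).

    times       -- observed follow-up times
    events      -- 1 if the event (e.g. attrition) was observed, 0 if censored
    risk_scores -- predicted risk; higher means the event is expected sooner
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied observed times are skipped in this sketch
        # `first` is the subject with the shorter observed time
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0  # higher risk for the earlier event: concordant
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up): three subjects, the second one censored.
c = concordance_index([2, 4, 6], [1, 0, 1], [0.9, 0.2, 0.5])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values above 0.75 should be read.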
Unmasking Social Robots’ Camouflage: A GNN-Random Forest Framework for Enhanced Detection
19
Authors: Weijian Fan, Chunhua Wang, Xiao Han, Chichen Lin | Computers, Materials & Continua (SCIE, EI), 2025, Issue 1, pp. 467-483 (17 pages)
The proliferation of robot accounts on social media platforms has had a significant negative impact, necessitating robust measures to counter network anomalies and safeguard content integrity. Social robot detection has emerged as a pivotal yet intricate task aimed at mitigating the dissemination of misleading information. While graph-based approaches have attained remarkable performance in this area, they face a fundamental limitation: the homogeneity assumption in graph convolution allows social robots to evade detection by mingling with genuine human profiles. To address this challenge and thwart such camouflage tactics, this work proposes a social robot detection framework based on enhanced HOmogeneity and Random Forest (HORFBot). At the core of HORFBot lies a homogeneous graph enhancement strategy, combined with edge-removal techniques, that dissects the graph into multiple subgraphs. Using contrastive learning, the method then trains multiple graph convolutional networks, each tuned to one of these subgraphs. In the final stage, these base classifiers are fused and their outputs aggregated to produce a comprehensive detection outcome. Extensive experiments on three social robot detection datasets show that this method improves the accuracy of social robot detection and outperforms comparative methods.
Keywords: social robot detection; graph neural networks; random forest; homophily; heterophily
Prediction of sandstone porosity in coal seam roof based on variable mode decomposition and random forest method
20
Authors: Huang Ya-ping, Qi Xue-mei, Cheng Yan, Zhou Ling-ling, Yan Jia-hao, Huang Fan-rui | Applied Geophysics, 2025, Issue 1, pp. 197-208, 235, 236 (14 pages)
Evaluation of water richness in sandstone is an important research topic in the prevention and control of mine water disasters, and the water richness of sandstone is closely related to its porosity. Reflection seismic exploration data carry high-density spatial sampling information, which provides an important data basis for predicting sandstone porosity in coal seam roofs. First, the basic principles of the variational mode decomposition (VMD) method and the random forest method are introduced. Then, a geological model of coal seam roof sandstone is constructed, seismic forward modeling is conducted, and random noise is added. The decomposition effects of the empirical mode decomposition (EMD) method and the VMD method on noisy signals are compared and analyzed. The test results show that the first-order intrinsic mode function (IMF1) and IMF2 decomposed by the VMD method contain the main effective components of the seismic signals. A workflow for predicting sandstone porosity in coal seam roofs based on the combination of VMD and the random forest method is proposed. The feasibility and effectiveness of the method are verified by trial calculation in the porosity prediction of model data. Taking actual coalfield reflection seismic data as an example, the sandstone porosity of the 8 coal seam roof is predicted. The application results show the potential application value of the porosity prediction method proposed in this study. This method has theoretical guiding significance for evaluating water richness in coal seam roof sandstone and for the prevention and control of mine water disasters.
Keywords: VMD; random forest method; coal seams; sandstone; porosity