Funding: Supported by the National Key Research and Development Program of China (2018YFB1003700), the Scientific and Technological Support Project (Society) of Jiangsu Province (BE2016776), the "333" Project of Jiangsu Province (BRA2017228, BRA2017401), and the Talent Project in Six Fields of Jiangsu Province (2015-JNHB-012).
Abstract: For imbalanced datasets, the focus of classification is to identify samples of the minority class. Current data mining algorithms do not perform well enough on imbalanced datasets. The synthetic minority over-sampling technique (SMOTE) is specifically designed for learning from imbalanced datasets; it generates synthetic minority class examples by interpolating between nearby minority class examples. However, SMOTE suffers from the overgeneralization problem. Density-based spatial clustering of applications with noise (DBSCAN) is not rigorous when dealing with samples near the borderline. We optimize the DBSCAN algorithm to address this problem and make clustering more reasonable. This paper integrates the optimized DBSCAN with SMOTE and proposes a density-based synthetic minority over-sampling technique (DSMOTE). First, the optimized DBSCAN divides the samples of the minority class into three groups: core samples, borderline samples, and noise samples. The noise samples of the minority class are then removed so that more effective samples can be synthesized. To make full use of the information in core samples and borderline samples, different strategies are used to over-sample each group. Experiments show that DSMOTE achieves better results than SMOTE and Borderline-SMOTE in terms of precision, recall, and F-value.
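The pipeline described in this abstract (partition the minority class, drop noise, interpolate) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's method: it uses the standard DBSCAN core/border/noise rules rather than the optimized DBSCAN, and it applies plain SMOTE interpolation to core and borderline samples alike, because the abstract does not specify the two over-sampling strategies. The names `partition_minority`, `smote_sample`, and `oversample`, and the parameters `eps` and `min_pts`, are hypothetical.

```python
import math
import random


def partition_minority(points, eps, min_pts):
    """Label each minority sample 'core', 'border', or 'noise' (standard DBSCAN rules)."""
    n = len(points)
    # Indices of all samples within radius eps of sample i (including i itself).
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbours]
    labels = []
    for i in range(n):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in neighbours[i]):
            labels.append("border")  # not dense itself, but next to a core sample
        else:
            labels.append("noise")
    return labels


def smote_sample(x, neighbour, rng):
    """Plain SMOTE step: a random point on the segment from x to neighbour."""
    gap = rng.random()
    return tuple(a + gap * (b - a) for a, b in zip(x, neighbour))


def oversample(points, eps, min_pts, n_new, seed=0):
    """Drop noise samples, then synthesize n_new samples from the remaining ones."""
    rng = random.Random(seed)
    labels = partition_minority(points, eps, min_pts)
    kept = [p for p, lab in zip(points, labels) if lab != "noise"]
    synthetic = []
    if len(kept) < 2:
        return labels, synthetic  # nothing to interpolate between
    for _ in range(n_new):
        x = rng.choice(kept)
        # Assumption: interpolate toward the single nearest kept sample.
        nearest = min((q for q in kept if q is not x), key=lambda q: math.dist(x, q))
        synthetic.append(smote_sample(x, nearest, rng))
    return labels, synthetic


# Hypothetical toy minority class: a small cluster plus one isolated outlier.
minority = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.0, 1.0), (10.0, 10.0)]
labels, synthetic = oversample(minority, eps=1.1, min_pts=3, n_new=5)
```

Because the isolated outlier is labelled noise and excluded, no synthetic sample is interpolated toward it, which is the intuition behind removing noise before over-sampling.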
Funding: Supported by the Natural Science Foundation of Henan Province (252300421852), the State Key Laboratory of Development and Comprehensive Utilization of Coking Coal Resources (41040220201308), the National Natural Science Foundation of China (41972254), the China Postdoctoral Science Foundation (2019M662494), the Key Scientific Research Projects of Higher Education Institutions of Henan Province (19A170005), the Fundamental Research Funds for the Universities of Henan Province (NSFRF200337, NSFRF200103), and the Key Research and Development Project of Henan Province (251111322300).
Abstract: Accurate identification of water sources is crucial for effective water management and safety in mining operations. However, imbalanced water sample datasets often lead to suboptimal classification accuracy. To address this challenge, this study proposes a novel water source identification method that integrates the Synthetic Minority Over-Sampling Technique (SMOTE), the Zebra Optimization Algorithm (ZOA), and the Light Gradient Boosting Machine (LightGBM). First, SMOTE synthesizes samples for the minority class within the imbalanced dataset, producing a balanced water sample dataset and mitigating class distribution disparities. An efficient water source identification model is then constructed by combining ZOA with LightGBM, leveraging the strengths of both algorithms. The model's performance is validated on a test set and compared with other common classification models. Results demonstrate that SMOTE significantly alleviates class imbalance and improves the classification accuracy of LightGBM on minority class water samples. ZOA parameter tuning accelerates model convergence and further improves classification accuracy, optimizing the model's overall performance. In experimental validation, the proposed SMOTE-ZOA-LightGBM model achieved an accuracy of 88.41% and an F1 score of 88.24%, outperforming six other classification models. The proposed method accurately identifies water source types and effectively addresses the low classification accuracy caused by imbalanced water sample data. It provides reliable technical support and a scientific basis for identifying and preventing water inrush sources in mines.
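Reporting an F1 score alongside accuracy matters on imbalanced data, because accuracy alone can look strong while a minority class is never identified. The sketch below illustrates this in plain Python. It is only an illustration: the abstract does not say whether its F1 score is macro-averaged or weighted, so macro averaging is an assumption here, and the water source labels "A" and "B" are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that occurs."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = per_class_scores(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Hypothetical imbalanced test set: nine samples of source "A", one of source "B".
y_true = ["A"] * 9 + ["B"]
# A classifier that always predicts the majority class.
y_pred = ["A"] * 10

acc = accuracy(y_true, y_pred)
f1_macro = macro_f1(y_true, y_pred)
```

In this toy case the always-majority classifier is right on nine of ten samples, yet its F1 for class "B" is zero, which pulls the macro F1 well below the accuracy.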
Funding: Supported by Research Supporting Project Number (RSPD2025R585), King Saud University, Riyadh, Saudi Arabia.
Abstract: The liver is a crucial gland and the second-largest organ in the human body, and it is essential to digestion, metabolism, detoxification, and immunity. Liver diseases result from factors such as viral infections, obesity, alcohol consumption, injuries, or genetic predispositions. They pose significant health risks and demand timely diagnosis and treatment to improve survival rates. Traditionally, diagnosing liver diseases relied heavily on clinical expertise, often leading to subjective, challenging, and time-intensive processes. However, early detection is essential for effective intervention, and advances in machine learning (ML) have demonstrated remarkable success in predicting various conditions, including Chronic Obstructive Pulmonary Disease (COPD), hypertension, and diabetes. This study proposes a novel XGBoost-Liver predictor that integrates distinct feature methodologies, including ranking-based and statistical projection-based strategies, to detect early signs of liver disease. The Fisher score method is applied to perform global interpretation analysis, helping to select optimal features by assessing their contributions to the overall model. The performance of the proposed model has been extensively evaluated through k-fold cross-validation tests. First, the model is evaluated using individual and hybrid features. Second, the XGBoost-Liver model is compared with commonly used classifier algorithms. Third, its performance is compared with existing state-of-the-art computational models. The experimental results show that the proposed model outperforms the existing predictors, reaching an average accuracy of 92.07%. This paper demonstrates the potential of machine learning to improve liver disease prediction, enhance diagnostic accuracy, and enable timely medical interventions for better patient outcomes.
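To make the Fisher score step concrete, here is a minimal sketch of the commonly used definition: for each feature, the class-size-weighted squared distance of the class means from the overall mean, divided by the class-size-weighted within-class variance. The abstract does not state which variant the study uses, so this formula is an assumption, and the function name `fisher_scores` and the toy data are hypothetical.

```python
from statistics import fmean, pvariance


def fisher_scores(X, y):
    """Return one Fisher score per feature column; higher means more discriminative."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = fmean(column)
        between = 0.0  # spread of the class means around the overall mean
        within = 0.0   # spread of the samples inside each class
        for c in classes:
            values = [row[j] for row, label in zip(X, y) if label == c]
            between += len(values) * (fmean(values) - overall_mean) ** 2
            within += len(values) * pvariance(values)
        # A feature that is constant inside every class separates them perfectly.
        scores.append(between / within if within > 0 else float("inf"))
    return scores


# Hypothetical toy data: feature 0 separates the two classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 3.0], [0.8, 4.0], [3.0, 4.0], [3.2, 5.0], [2.8, 3.0]]
y = [0, 0, 0, 1, 1, 1]

scores = fisher_scores(X, y)
# Feature indices ordered from most to least discriminative.
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

Features would then be kept in `ranking` order up to some cut-off; how the study chooses that cut-off is not stated in the abstract.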
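Abstract: In dynamically changing channel environments, an adaptive orthogonal frequency division multiplexing (OFDM) system can adjust its subcarrier spacing and cyclic prefix length to maximize system throughput. To find the optimal subcarrier spacing and cyclic prefix length for an OFDM system in different channel environments quickly and accurately, this paper proposes a random-forest-based adaptive algorithm for OFDM systems. The random forest algorithm is built on the idea of ensemble learning; it handles high-dimensional data effectively, offers high efficiency, high accuracy, and strong generalization, and can classify effectively in complex data scenarios. Channel features extracted during communication, namely the signal-to-noise ratio, user speed, maximum Doppler frequency, and root-mean-square delay spread, are combined with the OFDM system's subcarrier spacing and cyclic prefix length to form training samples, and the random forest algorithm is used to build a multi-class classification model of the OFDM system parameters. Given the input channel features, the proposed model adaptively allocates the subcarrier spacing and cyclic prefix length of the OFDM system. In addition, because the training samples are concentrated in a few system-parameter classes, the synthetic minority over-sampling technique is used to expand the classes with fewer samples, which meets the random forest algorithm's need for class-balanced training samples and further improves classification accuracy. Compared with traditional adaptive algorithms, the proposed algorithm achieves higher classification accuracy and better model generalization. Analysis and simulation results show that, compared with an OFDM system with fixed subcarrier spacing and cyclic prefix length, the proposed adaptive algorithm accurately selects the optimal system parameters and effectively mitigates inter-symbol interference and inter-carrier interference in the channel, thereby providing the highest average spectral efficiency over the entire signal-to-noise ratio range. The random-forest-based adaptive algorithm dynamically allocates subcarrier spacing and cyclic prefix length, improves the communication quality and interference resistance of the OFDM system, and achieves reliable transmission in different channel environments.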
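As a rough illustration of how one training sample might be assembled from the channel features named in this abstract, the sketch below computes the maximum Doppler frequency from the user speed (f_d = v * f_c / c) and the root-mean-square delay spread from a power delay profile, then pairs them with a class label. These are the standard textbook definitions, not necessarily the paper's exact procedure. The carrier frequency, the power delay profile, the label values, and all function and field names are assumptions made for the example.

```python
import math
from dataclasses import dataclass

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def max_doppler_hz(speed_mps, carrier_hz):
    """Maximum Doppler frequency f_d = v * f_c / c."""
    return speed_mps * carrier_hz / SPEED_OF_LIGHT


def rms_delay_spread(delays_s, powers):
    """Power-weighted standard deviation of the path delays (linear power units)."""
    total = sum(powers)
    mean_delay = sum(p * t for p, t in zip(powers, delays_s)) / total
    second_moment = sum(p * (t - mean_delay) ** 2 for p, t in zip(powers, delays_s)) / total
    return math.sqrt(second_moment)


@dataclass
class TrainingSample:
    snr_db: float
    speed_mps: float
    doppler_hz: float
    delay_spread_s: float
    label: tuple  # (subcarrier spacing in Hz, cyclic prefix length in s), the class to predict


def build_sample(snr_db, speed_mps, carrier_hz, delays_s, powers, label):
    """Combine the four channel features with the best-performing parameter pair."""
    return TrainingSample(
        snr_db=snr_db,
        speed_mps=speed_mps,
        doppler_hz=max_doppler_hz(speed_mps, carrier_hz),
        delay_spread_s=rms_delay_spread(delays_s, powers),
        label=label,
    )


# Hypothetical channel: 15 dB SNR, 30 m/s user, 3.5 GHz carrier, two equal-power paths.
sample = build_sample(
    snr_db=15.0,
    speed_mps=30.0,
    carrier_hz=3.5e9,
    delays_s=[0.0, 1e-6],
    powers=[1.0, 1.0],
    label=(30e3, 2.3e-6),  # illustrative label, not a value from the paper
)
```

A multi-class classifier such as a random forest would then be trained on many such samples, treating each distinct `label` pair as one class.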