75 articles found
Multi-Label Feature Selection Based on Improved Ant Colony Optimization Algorithm with Dynamic Redundancy and Label Dependence
1
Authors: Ting Cai, Chun Ye, Zhiwei Ye, Ziyuan Chen, Mengqing Mei, Haichao Zhang, Wanfang Bai, Peng Zhang. 《Computers, Materials & Continua》 (SCIE, EI), 2024, No. 10, pp. 1157-1175 (19 pages)
The world produces vast quantities of high-dimensional multi-semantic data. However, extracting valuable information from such a large amount of high-dimensional, multi-label data is undoubtedly arduous and challenging. Feature selection aims to mitigate the adverse impacts of high dimensionality in multi-label data by eliminating redundant and irrelevant features. The ant colony optimization algorithm has demonstrated encouraging outcomes in multi-label feature selection because of its simplicity, efficiency, and similarity to reinforcement learning. Nevertheless, existing methods do not consider crucial correlation information, such as dynamic redundancy and label correlation. To tackle these concerns, the paper proposes a multi-label feature selection technique based on the ant colony optimization algorithm (MFACO), focusing on dynamic redundancy and label correlation. Initially, the dynamic redundancy is assessed between the selected feature subset and potential features. Meanwhile, the ant colony optimization algorithm extracts label correlation from the label set, which is then combined into the heuristic factor as label weights. Experimental results demonstrate that the proposed strategies can effectively enhance the optimal search ability of the ant colony, outperforming the other algorithms involved in the paper.
Keywords: multi-label feature selection; ant colony optimization algorithm; dynamic redundancy; high-dimensional data; label correlation
Adaptive feature selection method for high-dimensional imbalanced data classification
2
Authors: WU Jianzhen, XUE Zhen, ZHANG Liangliang, YANG Xu. 《Journal of Measurement Science and Instrumentation》, 2025, No. 4, pp. 612-624 (13 pages)
Data collected in fields such as cybersecurity and biomedicine often exhibit high dimensionality and class imbalance. To address the low classification accuracy for minority-class samples caused by numerous irrelevant and redundant features in high-dimensional imbalanced data, we propose a novel feature selection method named AMF-SGSK, based on adaptive multi-filter and subspace-based gaining sharing knowledge. First, a balanced dataset was obtained by random under-sampling. Second, combining the feature importance score with the AUC score for each filter method, we proposed a concept called feature hardness to judge the importance of a feature, which could adaptively select the essential features. Finally, the optimal feature subset was obtained by gaining sharing knowledge in multiple subspaces. This approach effectively achieved dimensionality reduction for high-dimensional imbalanced data. The experimental results on 30 benchmark imbalanced datasets showed that AMF-SGSK performed better than eight other commonly used algorithms, including BGWO and IG-SSO, in terms of F1-score, AUC, and G-mean. The mean values of F1-score, AUC, and G-mean for AMF-SGSK are 0.950, 0.967, and 0.965, respectively, the highest among all algorithms. The mean G-mean is higher than those of IG-PSO, ReliefF-GWO, and BGOA by 3.72%, 11.12%, and 20.06%, respectively. Furthermore, the selected feature ratio is below 0.01 across the selected ten datasets, further demonstrating the proposed method's overall superiority over competing approaches. AMF-SGSK could adaptively remove irrelevant and redundant features and effectively improve the classification accuracy of high-dimensional imbalanced data, providing scientific and technological references for practical applications.
Keywords: high-dimensional imbalanced data; adaptive feature selection; adaptive multi-filter; feature hardness; gaining sharing knowledge based algorithm; metaheuristic algorithm
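The F1-score and G-mean named in this abstract are standard metrics for imbalanced binary classification. As a minimal sketch (not the authors' evaluation code; the function name `imbalance_metrics` and the confusion-matrix counts are illustrative assumptions), they can be computed from the four confusion-matrix counts:

```python
import math

def imbalance_metrics(tp, fp, tn, fn):
    """Return (F1-score, G-mean) for the positive (minority) class.

    tp, fp, tn, fn are confusion-matrix counts; the values used below
    are made up for illustration only.
    """
    recall = tp / (tp + fn)          # sensitivity on the minority class
    specificity = tn / (tn + fp)     # accuracy on the majority class
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    g_mean = math.sqrt(recall * specificity)
    return f1, g_mean

# Hypothetical counts: 10 minority samples, 90 majority samples.
f1, g_mean = imbalance_metrics(tp=8, fp=9, tn=81, fn=2)
```

G-mean drops sharply when either class is classified poorly, which is why it is often preferred to plain accuracy on imbalanced data.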
Apple leaf disease identification using genetic algorithm and correlation based feature selection method (cited by: 23)
3
Authors: Zhang Chuanlei, Zhang Shanwen, Yang Jucheng, Shi Yancui, Chen Jia. 《International Journal of Agricultural and Biological Engineering》 (SCIE, EI, CAS), 2017, No. 2, pp. 74-83 (10 pages)
Apple leaf disease is one of the main factors constraining apple production and quality. Detecting the diseases with the traditional diagnostic approach takes a long time, so farmers often miss the best time to prevent and treat them. Apple leaf disease recognition based on leaf images is an essential research topic in the field of computer vision, where the key task is to find an effective way to represent the diseased leaf images. In this research, an apple leaf disease recognition method based on image processing techniques and pattern recognition methods was proposed. A color transformation structure for the input RGB (Red, Green and Blue) image was designed first, and the RGB model was then converted to HSI (Hue, Saturation and Intensity), YUV and gray models. The background was removed based on a specific threshold value, and the disease spot image was then segmented with a region growing algorithm (RGA). Thirty-eight classifying features of color, texture and shape were extracted from each spot image. To reduce the dimensionality of the feature space and improve the accuracy of apple leaf disease identification, the most valuable features were selected by combining a genetic algorithm (GA) and correlation based feature selection (CFS). Finally, the diseases were recognized by an SVM classifier. In the proposed method, the selected feature subset was globally optimum. The experimental results, a correct identification rate of more than 90% on an apple diseased leaf image database containing 90 disease images for three kinds of apple leaf diseases (powdery mildew, mosaic and rust), demonstrate that the proposed method is feasible and effective.
Keywords: apple leaf disease; diseased leaf recognition; region growing algorithm (RGA); genetic algorithm and correlation based feature selection (GA-CFS)
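Correlation based feature selection (CFS) scores a candidate subset by a merit that rewards feature-class correlation and penalizes feature-feature correlation. The sketch below shows that standard merit formula; using it as the fitness function of a genetic algorithm is an assumption made here for illustration, since the abstract does not give the paper's exact fitness definition, and the function name `cfs_merit` is hypothetical.

```python
import math

def cfs_merit(feature_class_corr, feature_feature_corr):
    """CFS merit of a subset: k * r_cf / sqrt(k + k * (k - 1) * r_ff).

    feature_class_corr: absolute correlation of each selected feature with the class.
    feature_feature_corr: absolute correlation of each unordered feature pair.
    """
    k = len(feature_class_corr)
    if k == 0:
        return 0.0
    r_cf = sum(feature_class_corr) / k
    r_ff = (sum(feature_feature_corr) / len(feature_feature_corr)
            if feature_feature_corr else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

# Two equally relevant features: uncorrelated ones score higher than duplicates.
merit_independent = cfs_merit([0.5, 0.5], [0.0])
merit_redundant = cfs_merit([0.5, 0.5], [1.0])
```

With fully redundant features the merit falls back to that of a single feature, so a search guided by this score has no incentive to keep duplicates.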
Investigation of feature contribution to shield tunneling-induced settlement using Shapley additive explanations method (cited by: 19)
4
Authors: K.K. Pabodha M. Kannangara, Wanhuan Zhou, Zhi Ding, Zhehao Hong. 《Journal of Rock Mechanics and Geotechnical Engineering》 (SCIE, CSCD), 2022, No. 4, pp. 1052-1063 (12 pages)
Accurate prediction of shield tunneling-induced settlement is a complex problem that requires consideration of many influential parameters. Recent studies reveal that machine learning (ML) algorithms can predict the settlement caused by tunneling. However, well-performing ML models are usually less interpretable. Irrelevant input features decrease the performance and interpretability of an ML model. Nonetheless, feature selection, a critical step in the ML pipeline, is usually ignored in most studies that focus on predicting tunneling-induced settlement. This study applies four techniques, i.e., the Pearson correlation method, sequential forward selection (SFS), sequential backward selection (SBS) and the Boruta algorithm, to investigate the effect of feature selection on the model's performance when predicting the tunneling-induced maximum surface settlement (S_max). The data set used in this study was compiled from two metro tunnel projects excavated in Hangzhou, China using earth pressure balance (EPB) shields and consists of 14 input features and a single output (i.e., S_max). The ML model trained on features selected by the Boruta algorithm demonstrates the best performance in both the training and testing phases. The relevant features chosen by the Boruta algorithm further indicate that tunneling-induced settlement is affected by parameters related to tunnel geometry, geological conditions and shield operation. The recently proposed Shapley additive explanations (SHAP) method explores how the input features contribute to the output of a complex ML model. It is observed that larger settlements are induced during shield tunneling in silty clay. Moreover, the SHAP analysis reveals that low magnitudes of face pressure at the top of the shield increase the model's output.
Keywords: feature selection; shield operational parameters; Pearson correlation method; Boruta algorithm; Shapley additive explanations (SHAP) analysis
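Of the four techniques listed in this abstract, sequential forward selection (SFS) is the simplest to sketch: start from an empty subset and greedily add the feature that most improves a score. The version below is a generic illustration, not the study's implementation; the `score` callback and the toy feature weights are hypothetical stand-ins for a cross-validated model score.

```python
def sequential_forward_selection(features, score, max_features):
    """Greedy SFS: repeatedly add the feature that most improves score(subset).

    Stops when max_features is reached or no candidate improves the score.
    """
    selected = []
    best_score = float("-inf")
    remaining = list(features)
    while remaining and len(selected) < max_features:
        # Evaluate every remaining feature added to the current subset.
        candidate, candidate_score = max(
            ((f, score(selected + [f])) for f in remaining),
            key=lambda pair: pair[1],
        )
        if candidate_score <= best_score:
            break  # no improvement, stop early
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score

# Toy score: sum of made-up per-feature usefulness values.
toy_weights = {"a": 3, "b": 2, "c": -1}
chosen, chosen_score = sequential_forward_selection(
    toy_weights, lambda subset: sum(toy_weights[f] for f in subset), max_features=3
)
```

Sequential backward selection follows the same pattern in reverse, starting from the full set and removing one feature per step.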
Correlation of selected molecular markers in chemosensitivity prediction
5
Authors: David King, Thomas Keane, Wei Hu. 《Journal of Biomedical Science and Engineering》, 2009, No. 7, pp. 506-515 (10 pages)
Finding effective cancer treatment is a challenge, because the sensitivity of the cancer stems from both intrinsic cellular properties and acquired resistances from prior treatment. Previous research has revealed individual protein markers that are significant to chemosensitivity prediction. Our goal is to find correlated protein markers which are collectively significant to chemosensitivity prediction to complement the individual markers already reported. In order to do this, we used the D’ correlation measurement to study the feature selection correlations for chemosensitivity prediction of 118 anticancer agents with putatively known mechanisms of action. Three datasets on the NCI-60 were utilized in this study: two protein datasets, one previously studied for chemosensitivity prediction and another novel to this topic, and one DNA copy number dataset. To validate our approach, we identified the protein markers that were strongly correlated by our analysis with the individual protein markers found in previous studies. Our feature analysis discovered highly correlated protein marker pairs, based on which we found individual protein markers with medical significance. While some of the markers uncovered were consistent with those previously reported, others were original to this work. Using these marker pairs we were able to further correlate the cellular functions associated with them. As an exploratory analysis, we discovered feature selection correlation patterns between and within different drug mechanisms of action for each of our datasets.
In conclusion, the highly correlated protein marker pairs as well as their functions found by our feature analysis are validated by previous studies, and are shown to be medically significant, demonstrating D’ as an effective measurement of correlation in the context of feature selection for the first time.
Keywords: cancer; chemosensitivity; correlation; D’; feature selection; genetic algorithm; Markov blanket; memetic algorithm; NCI-60
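A method for predicting the phase transformation temperature of NiTi-based shape memory alloys based on an IHA-TPE-LightGBM fusion model
6
Authors: 李珺, 徐亮, 陈小然. 《中国材料进展》 (PKU Core), 2026, No. 3, pp. 245-250 (6 pages)
A method based on an IHA-TPE-LightGBM fusion model is proposed for predicting the phase transformation temperature (T_p) of NiTi-based shape memory alloys. A genetic algorithm and a simulated annealing algorithm are combined into an improved hybrid algorithm (IHA), which screens the features that influence T_p, reducing feature redundancy and improving model performance. A non-standard Bayesian optimization algorithm, the tree-structured Parzen estimator (TPE), is then used to tune the hyperparameters of the best model and improve its accuracy. The results show that the proposed temperature prediction model, IHA-TPE-LightGBM, attains an R^2 of 0.92, which verifies the effectiveness of the method. The approach can assist the development of new NiTi-based shape memory alloys and may accelerate the discovery of high-performance elastocaloric materials.
Keywords: NiTi-based alloys; genetic algorithm; simulated annealing algorithm; feature screening; non-standard Bayesian optimization algorithm; LightGBM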
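A fault diagnosis method for rotating machinery based on FCBF-IPSO
7
Authors: 尹海涛, 赵荣珍, 马驰, 邓林峰. 《振动.测试与诊断》 (PKU Core), 2026, No. 1, pp. 99-106, 218, 219 (10 pages)
High-dimensional fault datasets of rotating machinery contain redundant features that make classification difficult and keep the fault recognition rate low. To address this, a fault-sensitive feature selection method is proposed that combines the fast correlation-based filter (FCBF) algorithm with an improved particle swarm optimization (IPSO) algorithm. First, the FCBF algorithm, together with the value-range characteristics of the dataset, is used for a preliminary screening that removes redundant features and features irrelevant to the class information. Second, the IPSO algorithm performs a second screening of the resulting feature subset, removing further redundant features to obtain a low-dimensional sensitive feature subset that favors classification. Finally, the method is verified experimentally on a simulated rotor fault dataset. The results show that the method effectively removes irrelevant and redundant features from the fault dataset, and that optimizing the support vector machine (SVM) parameters C and σ with IPSO significantly improves the classifier's recognition accuracy and running efficiency. The method provides a sensitive-feature screening strategy for reducing the scale of rotating machinery fault data and enriches the basic theory of feature selection.
Keywords: fault diagnosis; feature selection; fast correlation-based filter algorithm; particle swarm optimization algorithm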
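FCBF, as it is usually described, ranks features by symmetric uncertainty (SU), a normalized form of mutual information for discrete variables. The following is a minimal sketch of that measure only, under the assumption that features have already been discretized; the value-range screening and the IPSO stage mentioned in the abstract are not shown, and the function names are illustrative.

```python
from collections import Counter
import math

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two aligned discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

def symmetric_uncertainty(x, y):
    """SU(X,Y) = 2 * I(X;Y) / (H(X) + H(Y)), in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    return 2.0 * mutual_information(x, y) / (hx + hy)

labels = [0, 0, 1, 1]
su_informative = symmetric_uncertainty([0, 0, 1, 1], labels)  # feature mirrors the class
su_irrelevant = symmetric_uncertainty([0, 1, 0, 1], labels)   # feature independent of the class
```

In a filter of this kind, features whose SU with the class falls below a threshold are dropped as irrelevant, and a feature is treated as redundant when its SU with an already selected feature is at least as high as its SU with the class.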
An Intrusion Detection System for SDN Using Machine Learning
8
Authors: G. Logeswari, S. Bose, T. Anitha. 《Intelligent Automation & Soft Computing》 (SCIE), 2023, No. 1, pp. 867-880 (14 pages)
Software Defined Networking (SDN) has emerged as a promising and exciting option for the future growth of the internet. SDN has increased the flexibility and transparency of the managed, centralized, and controlled network. On the other hand, these advantages create a more vulnerable environment with substantial risks, culminating in network difficulties, system paralysis, online banking frauds, and robberies. These issues have a significant detrimental impact on organizations, enterprises, and even economies. Accuracy, high performance, and real-time systems are necessary to achieve this goal. Using SDN to extend intelligent machine learning methodologies in an Intrusion Detection System (IDS) has stimulated the interest of numerous research investigators over the last decade. In this paper, a novel HFS-LGBM IDS is proposed for SDN. First, the Hybrid Feature Selection algorithm, consisting of two phases, is applied to reduce the data dimension and to obtain an optimal feature subset. In the first phase, the Correlation based Feature Selection (CFS) algorithm is used to obtain the feature subset. The optimal feature set is obtained by applying Random Forest Recursive Feature Elimination (RF-RFE) in the second phase. A LightGBM algorithm is then used to detect and classify different types of attacks. The experimental results on the NSL-KDD dataset show that the proposed system produces outstanding results compared to the existing methods in terms of accuracy, precision, recall and F-measure.
Keywords: intrusion detection system; light gradient boosting machine; correlation based feature selection; random forest recursive feature elimination; software defined networks
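Optimal selection of characteristic bands and construction of monitoring models for wheat stripe rust based on the CA/SPA-CARS algorithm (cited by: 1)
9
Authors: 谷玲霄, 方涛, 杜林丹, 吴喜芳, 李长春, 连增增, 岳哲. 《农业机械学报》 (PKU Core), 2025, No. 6, pp. 487-498 (12 pages)
Crop diseases severely limit crop yield and quality, and traditional disease monitoring methods are inefficient and prone to subjective bias. Hyperspectral remote sensing, with its high spectral resolution and objectivity, shows considerable potential for crop disease monitoring. Using ground-based hyperspectral data of winter wheat at multiple growth stages together with the field disease index (DI), this study reduces the dimensionality of the spectral features by correlation analysis (CA) and by the successive projections algorithm (SPA) separately, selects the bands sensitive to wheat stripe rust with a competitive adaptive reweighted sampling (CARS) algorithm configured with optimal parameters, and finally builds disease index models on the characteristic spectra using partial least squares regression (PLSR), a back propagation neural network (BPNN), and an extreme learning machine (ELM). The modeling performance of the different methods is compared in order to monitor wheat stripe rust. The results show that, at every growth stage, the characteristic bands sensitive to wheat stripe rust are concentrated in the near-infrared and shortwave-infrared regions: 842, 850, and 858 nm at the flag leaf stage, and 947, 953, 1275, 1277, 1590, 1663, and 1665 nm at the grain filling stage. Among the modeling algorithms, PLSR performs best; it meets the requirements of early-stage monitoring of wheat diseases and pests and shows more pronounced features in the middle stage of the disease. The best PLSR predictions are obtained with SPA-CARS-MCX data at the flag leaf stage and CA-CARS-MSC data at the grain filling stage, with validation-set R^2 of 0.782 and 0.861, RMSE of 0.022 and 0.094, and RPD of 2.140 and 2.687, respectively. The proposed approach can serve as a reference for monitoring wheat stripe rust at different growth stages.
Keywords: wheat stripe rust; spectral transformation; characteristic band selection; correlation analysis; successive projections algorithm; competitive adaptive reweighted sampling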
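The three validation statistics quoted in this abstract (R^2, RMSE, RPD) are common in spectral calibration work. A minimal sketch of how they are typically computed follows; RPD is taken here as the standard deviation of the reference values divided by the RMSE, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since conventions differ and the abstract does not state which one the paper uses. The data values are made up.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (R^2, RMSE, RPD) for paired reference and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot
    sd = math.sqrt(ss_tot / (n - 1))  # sample standard deviation (assumed convention)
    rpd = sd / rmse
    return r2, rmse, rpd

# Illustrative values only, not data from the paper.
r2, rmse, rpd = regression_metrics([1, 2, 3, 4], [1.5, 1.5, 3.5, 3.5])
```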
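Network intrusion detection based on SAE-MSCNN
10
Authors: 王泽辉, 郝秦霞. 《计算机工程与设计》 (PKU Core), 2025, No. 10, pp. 2858-2865 (8 pages)
Existing network intrusion detection methods overlook how much the correlations among traffic features matter for feature selection, and they fail to account for the scattered distribution of low-frequency attack samples when balancing the data; both shortcomings degrade detection performance. To address them, a mutual information value fusion (MIVF) method is proposed that selects features that are highly correlated with attack behavior and weakly correlated with one another. An improved SMOTE method based on DBSCAN is proposed to oversample low-frequency attack samples according to their density-based cluster distribution, and an SAE-MSCNN classification model is built to evaluate performance. On the NSL-KDD and UNSW-NB15 datasets the accuracy reaches 92.89% and 94.85%, respectively. The results show that the proposed method selects features and balances data effectively, and in particular improves detection accuracy for low-frequency attacks.
Keywords: network intrusion detection; mutual information; feature correlation; feature selection; density clustering; oversampling; data balancing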
Data-driven unsupervised anomaly detection and recovery of unmanned aerial vehicle flight data based on spatiotemporal correlation (cited by: 11)
11
Authors: YANG Lei, LI ShaoBo, LI ChuanJiang, ZHU CaiChao, ZHANG AnSi, LIANG GuoQiang. 《Science China(Technological Sciences)》 (SCIE, EI, CAS, CSCD), 2023, No. 5, pp. 1304-1316 (13 pages)
Anomaly detection is crucial to the flight safety and maintenance of unmanned aerial vehicles (UAVs) and has attracted extensive attention from scholars. Knowledge-based approaches rely on prior knowledge, while model-based approaches are challenging for constructing accurate and complex physical models of unmanned aerial systems (UASs). Although data-driven methods do not require extensive prior knowledge and accurate physical UAS models, they often lack parameter selection and are limited by the cost of labeling anomalous data. Furthermore, flight data with random noise pose a significant challenge for anomaly detection. This work proposes a spatiotemporal correlation based on long short-term memory and autoencoder (STC-LSTM-AE) neural network data-driven method for unsupervised anomaly detection and recovery of UAV flight data. First, UAV flight data are preprocessed with the Savitzky-Golay filter to mitigate the effect of noise in the original historical flight data on the model. Correlation-based feature subset selection is subsequently performed to reduce the reliance on expert knowledge. Then, the extracted features are used as the input of the designed LSTM-AE model to achieve anomaly detection and recovery of UAV flight data in an unsupervised manner. Finally, the method's effectiveness is validated on real UAV flight data.
Keywords: unmanned aerial vehicle (UAV); anomaly detection; spatiotemporal correlation based on long short-term memory and autoencoder (STC-LSTM-AE); Savitzky-Golay; feature selection
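Multi-label Relief feature selection fusing K-means clustering and label correlation (cited by: 1)
12
Authors: 丰昌武, 孙林. 《聊城大学学报(自然科学版)》, 2025, No. 1, pp. 122-134 (13 pages)
Existing Relief algorithms make insufficient use of label correlation and ignore the valuable information carried by local label correlation. To address this, a multi-label Relief feature selection method that fuses K-means clustering with label correlation is proposed. First, to take label correlation among samples fully into account, the K-means algorithm clusters the samples into different clusters, thereby constructing a local label space for the samples. Second, the Euclidean distance between all samples over the features is defined and used to measure the global label correlation of the samples. In addition, the traditional cosine similarity is improved by using the square root of the L1 norm, and the improved cosine similarity is applied in the local label space to capture the local label correlation of the samples. Finally, on the basis of the Relief algorithm, the global and local label correlations are fused into a sample similarity measure that identifies the nearest-neighbor samples of the same class and of different classes, from which the feature weights are obtained. The algorithm was compared with other multi-label feature selection algorithms on 10 multi-label datasets, and the experimental results show that it has a significant advantage.
Keywords: multi-label learning; feature selection; K-means clustering; label correlation; Relief algorithm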
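Class-imbalanced pathological voice detection with data augmentation and complex feature optimization
13
Authors: 武雅琴, 张佳庆, 张涛. 《应用声学》 (PKU Core), 2025, No. 1, pp. 234-244 (11 pages)
With the goal of improving the accuracy of multi-class pathological voice classification, this paper builds a class-imbalanced pathological voice detection system based on data augmentation and complex feature optimization. First, 32 acoustic features are analyzed and grouped into time-domain and frequency-domain features. Second, an improved synthetic minority oversampling technique is used to augment and balance the dataset. Then, an efficient correlation-based feature selection algorithm is combined with box plots to fuse and optimize the multi-dimensional acoustic features and to assess the discriminative power of each feature. Finally, the classification performance of different feature combinations is analyzed and verified in detail with a random forest classifier. The results show that the proposed fused and optimized feature set (To, Fatr, Jita, sAPQ, vAm, NHR), used with a random forest classifier, classifies four pathological voice types (vocal fold nodules, polyps, edema, and paralysis) well, reaching a classification accuracy of 88.6%, a recall of 88.4%, an F1 score of 88.4%, and an AUC of 99.7%.
Keywords: pathological voice; data augmentation; complex features; efficient correlation-based feature selection; box plot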
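Rapid near-infrared detection of base liquor based on the Shapley additive explanations algorithm
14
Authors: 张贵宇, 张磊, 庹先国, 王怡博, 向星睿, 严俊. 《光谱学与光谱分析》 (PKU Core), 2025, No. 8, pp. 2393-2400 (8 pages)
In the liquor-collection step of baijiu production, base liquor is currently graded mainly by sensory evaluation, which is inefficient and prone to subjective bias. Near-infrared spectroscopy is therefore applied to base liquor grading, and the feasibility of using the Shapley additive explanations (SHAP) algorithm, an explainable artificial intelligence method, to select characteristic spectral points is examined. With 36 features, the light gradient boosting machine (LightGBM) prediction model reaches an accuracy of 97.08%. To improve model performance further, a hybrid strategy combining interval partial least squares (iPLS) with SHAP is proposed; with 9 features, the LightGBM model reaches an accuracy of 99.27%. An analysis of the iPLS interval partition against the spatial distribution of SHAP contribution values shows that the ranking of SHAP contributions does not strictly match the ranking of predictive performance, and that a well-designed feature selection strategy can improve model performance.
Keywords: base liquor; near-infrared spectroscopy; feature selection; explainable artificial intelligence algorithm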
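Improved joint optimization algorithm for support vector machine feature selection and parameters based on particle swarm optimization (cited by: 38)
15
Authors: 张进, 丁胜, 李波. 《计算机应用》 (CSCD, PKU Core), 2016, No. 5, pp. 1330-1335 (6 pages)
Because feature selection and parameter optimization strongly affect the classification accuracy of the support vector machine (SVM), an improved algorithm based on particle swarm optimization (PSO) for jointly optimizing SVM feature selection and parameters (GPSO-SVM) is proposed, which aims to raise classification accuracy while selecting as few features as possible. To counter the tendency of traditional PSO to fall into local optima and converge prematurely, the algorithm introduces the crossover and mutation operators of the genetic algorithm (GA) into PSO, so that particles undergo crossover and mutation after each iteration update. Crossover pairing between particles is decided by a non-correlation index between them, and each particle's mutation probability is decided by its fitness value; the new particles produced in this way join the swarm. Particles can thus escape the local optimum found so far, the diversity of the swarm increases, and better values are sought over the global range. In experiments on different datasets, compared with PSO-based and GA-based algorithms for joint feature selection and SVM parameter optimization, GPSO-SVM improves classification accuracy by 2% to 3% on average and reduces the number of selected features by 3% to 15%. The experimental results show that the proposed algorithm performs better in both feature selection and parameter optimization.
Keywords: support vector machine; feature selection; parameter optimization; particle swarm optimization algorithm; genetic algorithm; non-correlation index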
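A comparative study of three measures in feature selection methods (cited by: 9)
16
Authors: 宋智超, 康健, 孙广路, 何勇军. 《哈尔滨理工大学学报》 (CAS, PKU Core), 2018, No. 1, pp. 111-116 (6 pages)
In different types of data, features and classes, and features and other features, exhibit linear and non-linear correlations. Feature selection methods built on different measures select clearly different features on different types of datasets. To examine this, three commonly used linear or non-linear measures (the linear correlation coefficient, symmetric uncertainty, and mutual information) are applied within the fast correlation-based feature selection method, and their feature selection performance on gene microarray and image data is verified and compared experimentally. The results show that, with the fast correlation-based feature selection method, the feature sets selected using the linear correlation coefficient tend to give better classification accuracy on gene datasets, those selected using mutual information classify better on image datasets, and those selected using symmetric uncertainty give relatively stable classification performance on both types of data.
Keywords: feature selection; linear correlation coefficient; symmetric uncertainty; mutual information; fast correlation-based feature selection method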
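One reason such measures can select different features is that the linear correlation coefficient is blind to some non-linear dependence. The small illustration below uses made-up data (not from the paper) in which one variable is fully determined by the other, yet the Pearson coefficient is zero; the helper name `pearson` is illustrative.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]        # y depends on x, but not linearly
r_linear = pearson(x, x)      # perfectly linear relationship
r_quadratic = pearson(x, y)   # symmetric quadratic relationship
```

An information-theoretic measure computed on the same data would be non-zero, because knowing x removes all uncertainty about y.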
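FCBF feature selection algorithm based on normalized mutual information (cited by: 22)
17
Authors: 段宏湘, 张秋余, 张墨逸. 《华中科技大学学报(自然科学版)》 (EI, CAS, CSCD, PKU Core), 2017, No. 1, pp. 52-56 (5 pages)
Irrelevant and redundant features in high-dimensional data make classification tasks computationally expensive and lower classification accuracy. To address this, a fast correlation-based filter feature selection algorithm based on normalized mutual information (FCBF-NMI) is proposed. The algorithm replaces symmetric uncertainty with normalized mutual information as the correlation criterion of the FCBF algorithm, analyzes feature-class and feature-feature correlations, and removes irrelevant and redundant features to obtain the optimal feature subset. Experimental results show that the optimal feature subset produced by FCBF-NMI is more reasonable, with an average classification accuracy of 89.68% and an average running time as low as 2.64 s.
Keywords: high-dimensional data; feature selection; normalized mutual information; fast correlation-based filter (FCBF) feature selection; classification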
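A new statistics-based automatic text classification method (cited by: 48)
18
Authors: 刘斌, 黄铁军, 程军, 高文. 《中文信息学报》 (CSCD, PKU Core), 2002, No. 6, pp. 18-24 (7 pages)
Automatic text classification means having a computer determine, under a given classification scheme, the categories associated with a text according to its content. To improve classification performance, this paper proposes a multi-level feature extraction method for Chinese text and a kernel-based distance-weighted KNN algorithm. The multi-level feature extraction method extracts statistical features of a document at three levels (Chinese characters, a common-word list, and a domain-specific word list) and so better reflects the document's statistical distribution. The kernel-based distance-weighted KNN algorithm addresses the multimodal distribution of samples, overlapping class boundaries, and the classifier's need for precise classification decisions. In practice, the Internet and text repositories supply large amounts of coarsely classified training text, but sample quality is generally poor; this paper addresses the problem with a sample importance analysis technique. An experimental system demonstrates the effectiveness of the new method.
Keywords: statistics; automatic text classification; multi-level feature extraction; distance-weighted KNN algorithm; sample importance analysis; Chinese character recognition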
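The general idea of a kernel-based distance-weighted KNN can be sketched as follows: each of the k nearest neighbors votes with a weight that decays with distance. The abstract does not specify the kernel, so the Gaussian kernel, the `bandwidth` parameter, the function name, and the toy data below are all assumptions for illustration, not the paper's algorithm.

```python
import math
from collections import defaultdict

def kernel_weighted_knn(train, query, k=3, bandwidth=1.0):
    """Classify `query` by a Gaussian-kernel-weighted vote of its k nearest neighbors.

    train: list of (vector, label) pairs; vectors are equal-length numeric tuples.
    """
    neighbours = sorted(
        ((math.dist(vec, query), label) for vec, label in train),
        key=lambda pair: pair[0],
    )[:k]
    votes = defaultdict(float)
    for distance, label in neighbours:
        # Closer neighbors contribute more to their class's total.
        votes[label] += math.exp(-(distance ** 2) / (2 * bandwidth ** 2))
    return max(votes, key=votes.get)

# One close "A" neighbor outweighs two more distant "B" neighbors.
toy_train = [((0.0, 0.0), "A"), ((3.0, 0.0), "B"), ((3.2, 0.0), "B")]
predicted = kernel_weighted_knn(toy_train, (1.0, 0.0), k=3)
```

An unweighted majority vote over the same three neighbors would return "B"; weighting by distance is what lets the nearer sample decide near overlapping class boundaries.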
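MICCFS: a correlation-based feature selection algorithm based on the maximal information coefficient (cited by: 10)
19
Authors: 罗幼喜, 谢昆明, 胡超竹, 李翰芳. 《华中师范大学学报(自然科学版)》 (CAS, CSCD, PKU Core), 2023, No. 6, pp. 777-785 (9 pages)
The correlation-based feature selection (CFS) algorithm can identify only linear relationships between variables in regression tasks and relies on the symmetric uncertainty measure in classification tasks. To address these shortcomings, a CFS feature selection algorithm based on the maximal information coefficient (MIC), called MICCFS, is proposed. The linear correlation coefficient used to measure relationships between variables in regression tasks and the symmetric uncertainty measure used in classification tasks are both replaced with the MIC. The best-first search algorithm is used to search for feature subsets. Using 11 regression datasets and 10 classification datasets from the UCI Machine Learning Repository and four classifiers (support vector machine, k-nearest neighbors, naive Bayes, and decision tree), MICCFS is compared with CFS and with the common feature selection methods SVM-RFE, Lasso, MIM, ReliefF, and Chi-Square; the results show that MICCFS has certain advantages.
Keywords: correlation-based feature selection; maximal information coefficient; feature selection; classification; dimensionality reduction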
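Text feature selection algorithm based on term attribute clustering (cited by: 4)
20
Authors: 张群, 王红军, 王伦文. 《计算机应用研究》 (CSCD, PKU Core), 2017, No. 2, pp. 369-372, 377 (5 pages)
Text mining must be preceded by effective feature selection on the text collection. Traditional feature selection algorithms are of limited effectiveness in dimensionality reduction and text representation, and because they require the class information of the texts they are unsuitable for unsupervised text clustering tasks. In response, a feature selection algorithm suited to text clustering is designed and the concept of term attributes is introduced. First, a term feature model is built from term frequency, document frequency, term position, and inter-term association; the study focuses on how to compute the weights of the term position attribute and the inter-term association attribute, and adapts the Apriori algorithm to compute the weights of the inter-term association attribute. Then, an improved K-means clustering algorithm clusters the term feature model several times to complete text feature selection. Experimental results show that, compared with traditional feature selection algorithms, the algorithm achieves a good dimensionality reduction rate while improving the text representation ability of the selected feature terms, and is well suited to text clustering tasks.
Keywords: text feature selection; term attributes; term position; inter-term association; association rule algorithm; K-means algorithm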