A Kernel Fisher Linear Discriminant Analysis Approach Aiming at Imbalanced Data Set (cited by: 6)
Abstract: Many practical classification problems involve imbalanced data sets, and class imbalance degrades the performance of many classifiers. This paper introduces the classification mechanism of kernel Fisher linear discriminant analysis (KFDA) and analyzes why imbalanced data cause KFDA to fail. On that basis, it proposes a weighted kernel Fisher linear discriminant analysis (WKFDA) method. The method adjusts the contributions of the kernel covariance matrices of the two classes to the kernel within-class scatter matrix, which counteracts the influence of imbalanced data on classification performance. Experiments on 7 UCI datasets show that the method effectively improves classification performance.
Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), indexed in EI, CSCD, and the Peking University Core Journals list; 2010, No. 3, pp. 414-420 (7 pages)
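Funding: Supported by the National Natural Science Foundation of China (No. 60873176) and the Natural Science Foundation of Jiangsu Province (No. BK2008430)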
Keywords: Imbalanced Data Set; Kernel Fisher Linear Discriminant Analysis (KFDA); Over-Sampling; Under-Sampling
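The reweighting idea in the abstract can be illustrated with a minimal two-class KFDA sketch. The abstract does not give the paper's exact weights, so this sketch makes an assumption: each class's kernel scatter is divided by its class size, so that both classes contribute a covariance matrix of comparable scale to the within-class matrix. The RBF kernel, the ridge term `reg`, the midpoint threshold, and all function names are likewise illustrative choices, not taken from the paper.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_kfda(X, y, gamma=1.0, reg=1e-3, weighted=True):
    """Two-class kernel Fisher discriminant; y must contain labels 0 and 1.

    weighted=False: plain KFDA, where the within-class matrix is the sum of
    the class scatters, so the larger class dominates it.
    weighted=True: ASSUMED weighting (1 / class size), so each class
    contributes its kernel covariance matrix on an equal footing.
    """
    K = rbf_kernel(X, X, gamma)
    n = len(y)
    means = []
    N = np.zeros((n, n))
    for c in (0, 1):
        idx = np.flatnonzero(y == c)
        n_c = len(idx)
        K_c = K[:, idx]                          # kernel columns of class c
        means.append(K_c.mean(axis=1))           # kernel class mean M_c
        center = np.eye(n_c) - np.full((n_c, n_c), 1.0 / n_c)
        scatter_c = K_c @ center @ K_c.T         # class scatter, grows with n_c
        w_c = 1.0 / n_c if weighted else 1.0
        N += w_c * scatter_c
    N += reg * np.eye(n)                         # ridge term keeps N invertible
    alpha = np.linalg.solve(N, means[1] - means[0])
    # Threshold at the midpoint of the two projected class means (a simple choice).
    b = -0.5 * (alpha @ means[0] + alpha @ means[1])
    return alpha, b

def predict(X_train, alpha, b, X_new, gamma=1.0):
    """Return 1 where the projected score is positive, else 0."""
    scores = rbf_kernel(X_new, X_train, gamma) @ alpha + b
    return (scores > 0).astype(int)
```

Calling `fit_kfda(X, y, weighted=False)` and `fit_kfda(X, y, weighted=True)` on the same imbalanced training set gives two discriminants whose minority-class recall can then be compared on held-out data. Whether the assumed weights match those used by WKFDA in the paper cannot be determined from the abstract.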