Abstract
Building on logistic discrimination (LD), we propose LDRC (LD based Rare-class Classification), a method that improves the generalization performance of LD on rare-class problems. To take the characteristics of the rare class fully into account, we construct a new objective function, RPM (Recall and Precision based Metric), which considers the recall of both the positive and the negative class together with the precision of the positive class. The recall of the two classes is used to secure good generalization on recall and g-mean (the geometric mean of the positive-class and negative-class recall), while the recall and precision of the positive class are used to secure good generalization on accuracy and f-measure (a measure based on positive-class recall and precision). LDRC uses RPM as the objective function that supervises parameter learning, so as to achieve good overall generalization. Experiments on UCI data sets show that, compared with conventional LD and with LD based on over-sampling and on under-sampling, LDRC has a clear advantage on recall, g-mean, and f-measure.
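The evaluation measures named in the abstract can be computed from a binary confusion matrix. The following is a minimal sketch, assuming the rare class is the positive class (label 1) and assuming the balanced F1 form of f-measure, since the abstract does not state the weighting. The function name `rare_class_metrics` and the toy labels are hypothetical. The sketch does not reproduce RPM itself, whose exact form is not given in this record.

```python
import math


def rare_class_metrics(y_true, y_pred, positive=1):
    """Return recall, precision, g-mean and f-measure for a binary task.

    Assumption: `positive` marks the rare class; every other label is
    treated as the negative class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    def ratio(num, den):
        # Return 0.0 when the denominator is empty instead of raising.
        return num / den if den else 0.0

    recall_pos = ratio(tp, tp + fn)      # recall of the positive (rare) class
    recall_neg = ratio(tn, tn + fp)      # recall of the negative class
    precision_pos = ratio(tp, tp + fp)   # precision of the positive class

    # g-mean: geometric mean of positive-class and negative-class recall.
    g_mean = math.sqrt(recall_pos * recall_neg)
    # f-measure (assumed F1): harmonic mean of positive precision and recall.
    f_measure = ratio(2 * precision_pos * recall_pos,
                      precision_pos + recall_pos)

    return {
        "recall_pos": recall_pos,
        "recall_neg": recall_neg,
        "precision_pos": precision_pos,
        "g_mean": g_mean,
        "f_measure": f_measure,
    }


# Toy data: 4 positive (rare) samples and 16 negative samples.
y_true = [1] * 4 + [0] * 16
y_pred = [1, 1, 1, 0] + [1, 1] + [0] * 14
metrics = rare_class_metrics(y_true, y_pred)
```

On this toy data the classifier gets 18 of 20 samples right, yet positive-class precision is only 0.6, which illustrates why rare-class work reports recall, g-mean, and f-measure rather than relying on overall accuracy alone.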
Source
Journal of Chinese Computer Systems (《小型微型计算机系统》)
CSCD
Peking University Core Journal
2016, Issue 1, pp. 140-145 (6 pages)
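Funding
Supported by the National Natural Science Foundation of China (61202194, 61402393, 61572417)
Supported by the Science and Technology Research Project of the Education Department of Henan Province (14A520016, 14B520045, 12A520035)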
Keywords
rare-class
logistic discrimination
recall
precision
classification