Abstract
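Conventional classification algorithms implicitly assume that every misclassification carries the same cost and that the classification result of every sample is accepted. These assumptions do not fit the practical needs of fields such as medical diagnosis, fault diagnosis, and fraud detection. Building on a definition of reject cost, this paper proposes a binary classification algorithm that embeds asymmetric misclassification costs and asymmetric reject costs (CSVM-CMC2RC). It consists of four steps: training a cost-sensitive support vector machine, estimating the posterior probability of each sample, computing the classification reliability of each sample, and determining the optimal reject threshold for each class. Experiments on standard datasets show that CSVM-CMC2RC effectively reduces the misclassification rate and the average cost and improves the reliability of the classification results.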
To minimize the 0-1 loss, most conventional classification algorithms implicitly assume that all classification results are accepted. However, this assumption is inapplicable to knowledge extraction in fields such as medical diagnosis, fault diagnosis, and fraud detection. This paper proposes the algorithm Cost-sensitive SVM with Class-dependent Misclassification Cost and Class-dependent Reject Cost (CSVM-CMC2RC). In the CSVM-CMC2RC algorithm, a cost-sensitive SVM is first trained to obtain preliminary classification results. Second, the posterior probability of each sample is computed. Third, the classification reliability of each sample is estimated. Finally, the optimal reject threshold and the final reject decision are determined by minimizing the average cost. Experimental results demonstrate that the proposed CSVM-CMC2RC algorithm reduces the misclassification rate and the average cost and improves classification reliability.
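The last two steps named in the abstract (reliability estimation and the class-dependent reject threshold) can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the posteriors P(y=+1|x) are already available from the earlier steps, it uses max(p, 1-p) as a stand-in reliability measure, and it picks thresholds by a plain grid search. The function names, the cost dictionaries `mis_cost` and `rej_cost`, and the toy data are all hypothetical.

```python
from itertools import product

def decide(p_pos, t_pos, t_neg):
    """Return +1, -1, or 0 (reject) given the posterior P(y=+1|x)."""
    label = 1 if p_pos >= 0.5 else -1
    # Assumed reliability measure: confidence in the predicted class.
    reliability = max(p_pos, 1.0 - p_pos)
    # Each predicted class has its own reject threshold.
    threshold = t_pos if label == 1 else t_neg
    return label if reliability >= threshold else 0

def average_cost(posteriors, labels, t_pos, t_neg, mis_cost, rej_cost):
    """Mean cost per sample; costs are indexed by the true class."""
    total = 0.0
    for p, y in zip(posteriors, labels):
        d = decide(p, t_pos, t_neg)
        if d == 0:
            total += rej_cost[y]   # rejected sample
        elif d != y:
            total += mis_cost[y]   # misclassified sample
    return total / len(labels)

def best_thresholds(posteriors, labels, mis_cost, rej_cost, steps=10):
    """Grid search over [0.5, 1.0] for the pair (t_pos, t_neg)."""
    grid = [0.5 + 0.5 * i / steps for i in range(steps + 1)]
    return min(
        product(grid, grid),
        key=lambda t: average_cost(
            posteriors, labels, t[0], t[1], mis_cost, rej_cost
        ),
    )

# Toy validation data (hypothetical): posteriors and true labels.
posteriors = [0.95, 0.9, 0.6, 0.55, 0.4, 0.1, 0.05, 0.45]
labels = [1, 1, -1, 1, 1, -1, -1, -1]
mis_cost = {1: 10.0, -1: 2.0}   # missing a positive is costlier
rej_cost = {1: 1.0, -1: 0.5}    # rejecting is cheaper than erring

t_pos, t_neg = best_thresholds(posteriors, labels, mis_cost, rej_cost)
no_reject = average_cost(posteriors, labels, 0.5, 0.5, mis_cost, rej_cost)
with_reject = average_cost(posteriors, labels, t_pos, t_neg, mis_cost, rej_cost)
print(t_pos, t_neg, no_reject, with_reject)
```

Because the grid includes the pair (0.5, 0.5), which never rejects, the selected thresholds cannot give a higher average cost on this data than accepting every result.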
Source
《广西师范大学学报(自然科学版)》
CAS
Peking University Core Journal
2010, No. 3, pp. 104-108 (5 pages)
Journal of Guangxi Normal University: Natural Science Edition
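Funding
National Natural Science Foundation of China (60905034)
Zhejiang Provincial Natural Science Foundation (Y1080950)
National Public Welfare Industry Special Fund (2007GYJ016)
Scientific Research Fund of the Yunnan Provincial Department of Education (08C0019)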