Abstract
This paper discusses the difficulties of evaluating text classifiers with the traditional performance metrics of accuracy and error rate. It analyzes the strengths and shortcomings of the metrics most commonly used for text classifiers today: the recall-precision curve, the area under the ROC curve (AUROC), the F1 value, and the break-even point (BEP). It then proposes two new performance metrics for text classifiers.
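The weakness of accuracy that the abstract refers to is easiest to see on a skewed category. The sketch below is illustrative only and is not taken from the paper: the label vectors, scores, and function names are hypothetical. It compares accuracy with F1 for a classifier that rejects every document, then estimates the BEP by cutting a ranked list at the number of truly positive documents, a common convention under which precision and recall coincide.

```python
def confusion(y_true, y_pred):
    """Count (tp, fp, fn, tn) for binary labels, where 1 means 'in category'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)


def f1_value(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when either is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def break_even_point(scores, labels):
    """Precision (= recall) when the top-k documents are accepted, k = #positives."""
    k = sum(labels)
    if k == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = sum(label for _, label in ranked[:k])
    return tp / k


# Hypothetical skewed category: 5 of 100 documents belong to it,
# and the classifier rejects every document.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
tp, fp, fn, tn = confusion(y_true, y_pred)
acc = accuracy(tp, fp, fn, tn)   # high, although no positive is found
f1 = f1_value(tp, fp, fn)        # zero, which exposes the failure

# Hypothetical ranked output for a second, smaller example.
bep = break_even_point([0.9, 0.8, 0.7, 0.6, 0.5, 0.4], [1, 0, 1, 1, 0, 0])
```

Here the do-nothing classifier reaches 95% accuracy while its F1 value is zero, which is the kind of mismatch that motivates recall- and precision-based metrics for text categorization.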
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
Peking University Core Journal (北大核心)
2004, No. 13, pp. 107-109, 127 (4 pages)
Keywords
Text categorization
Performance evaluation
Break-even point (BEP)
Receiver operating characteristic (ROC) curve