
Automatic annotation methods for Chinese micro-blog corpus with sentiment class (中文微博语料情感类别自动标注方法)

Cited by: 5
Abstract: Manually annotating large-scale micro-blog corpora is difficult. To address this, the paper proposes methods for automatically annotating Chinese micro-blog corpora with sentiment classes: three individual methods (keyword-based, probability-summation-based, and probability-product-based) and one integrated method. The corpus is first annotated by each of the three methods separately, yielding three sets of results; an ensemble strategy then determines the final label by voting over the three results. Experiments run on a purpose-built automatic annotation system verify the feasibility and effectiveness of the proposed methods: each individual method reaches an accuracy above 70%, and the voting method reaches an accuracy above 90%.
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2014, No. 8, pp. 2188-2191 (4 pages)
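Funding: National Social Science Foundation of China (12BYY045); Program for New Century Excellent Talents in University, Ministry of Education (NCET-12-0939)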
Keywords: Chinese micro-blog; micro-blog sentiment; sentiment classification; automatic annotation; accuracy
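The abstract names three annotators and a voting step but gives no formulas, so the following is only a minimal sketch of how such a pipeline might look, not the authors' implementation. Everything in it is an assumption made for illustration: the toy table `WORD_CLASS_PROB`, the two class names, the function names, the reading of "keyword-based" as counting each known word's most likely class, and the rule that a post stays unlabeled when no two annotators agree. Input is assumed to be already word-segmented.

```python
import math
from collections import Counter

# Hypothetical toy table: P(class | word) for a few sentiment words.
# A real system would derive this from a sentiment lexicon or labeled data.
WORD_CLASS_PROB = {
    "开心": {"happy": 0.90, "sad": 0.10},
    "喜欢": {"happy": 0.80, "sad": 0.20},
    "还行": {"happy": 0.55, "sad": 0.45},
    "难过": {"happy": 0.10, "sad": 0.90},
    "失望": {"happy": 0.20, "sad": 0.80},
}
CLASSES = ("happy", "sad")


def _known(tokens):
    """Return the probability rows of the tokens found in the table."""
    return [WORD_CLASS_PROB[t] for t in tokens if t in WORD_CLASS_PROB]


def annotate_by_keywords(tokens):
    """Each known word votes for its most likely class; most hits wins."""
    hits = Counter(max(p, key=p.get) for p in _known(tokens))
    return hits.most_common(1)[0][0] if hits else None


def annotate_by_prob_sum(tokens):
    """Pick the class with the largest sum of per-word probabilities."""
    known = _known(tokens)
    if not known:
        return None
    scores = {c: sum(p[c] for p in known) for c in CLASSES}
    return max(scores, key=scores.get)


def annotate_by_prob_product(tokens):
    """Pick the class with the largest product of per-word probabilities.

    Logs are summed instead of multiplying to avoid underflow on long posts.
    """
    known = _known(tokens)
    if not known:
        return None
    scores = {c: sum(math.log(p[c]) for p in known) for c in CLASSES}
    return max(scores, key=scores.get)


def annotate_by_vote(tokens):
    """Majority vote over the three annotators; None if no two agree."""
    labels = [
        annotate_by_keywords(tokens),
        annotate_by_prob_sum(tokens),
        annotate_by_prob_product(tokens),
    ]
    label, count = Counter(labels).most_common(1)[0]
    return label if label is not None and count >= 2 else None


if __name__ == "__main__":
    post = ["还行", "还行", "难过"]  # pre-segmented toy post
    print(annotate_by_keywords(post), annotate_by_prob_sum(post),
          annotate_by_prob_product(post), annotate_by_vote(post))
```

On the toy post in the `__main__` block, the keyword annotator disagrees with the two probability-based annotators, and the vote follows the majority. This is the kind of case where an ensemble can correct a single method, although the accuracy figures reported in the abstract come from the paper's own experiments and not from this sketch.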
