Abstract
Sentiment classification of documents aims to determine the opinion (e.g., negative or positive) expressed in a given document. Existing studies have shown that supervised classification approaches perform well on this task. In most cases, however, the existing labeled data and the unlabeled data to be classified do not belong to the same domain, and the performance of a supervised classifier decreases sharply when it is transferred from one domain to another. This gives rise to the cross-domain sentiment classification problem, which is attracting increasing attention. To address it, we propose a unified framework that performs cross-domain sentiment classification in several stages. First, we use the accurate labels of the source-domain (training) documents to obtain initial labels for the target-domain (test) documents. Then, we build the target domain as a weighted network and select the target-domain documents whose opinions have been determined more accurately as "source components" and "sink components". Finally, with the help of these components, we iteratively apply a heat conduction process to the weighted network to refine the classification of the target-domain data. Experiments use data from three different domains, with transfer performed between two domains at a time. The results indicate that the proposed framework substantially improves the accuracy of cross-domain sentiment classification.
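The staged procedure described in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: cosine similarity as the edge weight, a k-nearest-neighbour graph, a simple weighted-average update as the "heat conduction" step, and the names `heat_conduction_labels`, `k`, `n_fixed`, and `n_iter`. It should not be read as the authors' exact model.

```python
import numpy as np

def heat_conduction_labels(X, init_scores, k=2, n_fixed=1, n_iter=50):
    """Refine initial sentiment scores on a weighted document network.

    X           : (n_docs, n_features) non-negative feature matrix (assumed)
    init_scores : initial scores in [-1, 1] from a source-domain classifier
    """
    X = np.asarray(X, dtype=float)
    scores = np.asarray(init_scores, dtype=float)
    n = len(scores)

    # Edge weights: cosine similarity between documents (an assumption).
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    Xn = X / norms
    S = np.clip(Xn @ Xn.T, 0.0, None)
    np.fill_diagonal(S, 0.0)

    # Keep the k strongest edges per document, then symmetrize.
    W = np.zeros_like(S)
    for i in range(n):
        nbrs = np.argsort(S[i])[-k:]
        W[i, nbrs] = S[i, nbrs]
    W = np.maximum(W, W.T)

    # Most confident positives act as sources, most confident negatives as sinks.
    order = np.argsort(scores)
    sinks, sources = order[:n_fixed], order[-n_fixed:]

    deg = W.sum(axis=1)
    has_nbrs = deg > 0
    deg[~has_nbrs] = 1.0

    T = scores.copy()
    for _ in range(n_iter):
        # Sources and sinks are held at fixed "temperatures".
        T[sources], T[sinks] = 1.0, -1.0
        # Every other node moves to the weighted mean of its neighbours.
        T = np.where(has_nbrs, (W @ T) / deg, T)
    T[sources], T[sinks] = 1.0, -1.0

    return np.where(T >= 0, 1, -1), T

# Toy usage: two clusters of documents with two weakly wrong initial scores.
X = [[2, 0], [3, 1], [2, 1], [0, 2], [1, 3], [1, 2]]
init = [0.9, 0.8, -0.1, -0.9, -0.7, 0.2]
labels, temps = heat_conduction_labels(X, init, k=2, n_fixed=1)
print(labels.tolist())
```

In this toy setting, documents whose initial scores were weak take on the temperature of the confident source or sink they are connected to, which is the intuition behind using reliable target-domain documents to correct the rest.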
Source
Journal of Computer Research and Development (《计算机研究与发展》)
EI
CSCD
PKU Core (北大核心)
2013, No. 8, pp. 1683-1689 (7 pages)
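Funding
National Natural Science Foundation of China (61100083, 60903139, 61173064)
Key Program of the National Natural Science Foundation of China (60933005)
National "242" Information Security Program of China (2011F65, 2011A001)
National Basic Research Program of China ("973" Program) (2012CB316303)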
Keywords
cross-domain
sentiment classification
heat conduction model
opinion analysis
transfer learning
weighted network