Abstract
This paper proposes a tag classification approach that combines a thesaurus with syntactic rules to improve the effectiveness of automatic tag classification. First, the authors divide tags at the conceptual level into three types: factual, subjective, and personal. Second, they construct a thesaurus from related metadata, which can be fetched from online wikis and social tagging systems, and design identification rules based on the syntactic structure of tags in user-generated content; the two are then combined to classify tags. An experiment on the tag data of 675,351 users of Douban, the largest movie tagging system in China, yields a recall of 95.01%, a precision of 96.19%, and an F1-measure of 95.32%. The results indicate that the method handles automatic tag classification well.
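The combination of thesaurus lookup and rule matching described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the thesaurus entries, the regular-expression rules, the function name `classify_tag`, and the choice to consult the thesaurus before the rules are all assumptions made for the example. In particular, the simple patterns over the tag text only stand in for the paper's syntactic rules, which the abstract does not spell out.

```python
import re
from typing import Optional

# Hypothetical thesaurus of factual terms. The paper builds its thesaurus
# from metadata; these entries are invented for illustration only.
FACTUAL_TERMS = {"comedy", "christopher nolan", "usa", "2010"}

# Hypothetical identification rules: (pattern, label) pairs. These simple
# patterns are placeholders for the paper's syntactic rules.
RULES = [
    (re.compile(r"^(my|to[- ]watch|watched)\b"), "personal"),
    (re.compile(r"\b(great|boring|touching|overrated)\b"), "subjective"),
]


def classify_tag(tag: str) -> Optional[str]:
    """Return 'factual', 'subjective', 'personal', or None if nothing matches."""
    normalized = tag.strip().lower()
    # Step 1 (assumed order): thesaurus lookup for factual tags.
    if normalized in FACTUAL_TERMS:
        return "factual"
    # Step 2: fall back to the rule list; the first matching rule wins.
    for pattern, label in RULES:
        if pattern.search(normalized):
            return label
    return None
```

Calling `classify_tag` on each distinct tag gives one label per tag; tags that match neither the thesaurus nor any rule are returned as `None` so they can be reviewed separately.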
Source
《情报资料工作》
CSSCI
PKU Core Journal
2017, Issue 5, pp. 63-69 (7 pages)
Information and Documentation Services
Keywords
social tag
self-built thesaurus
syntactic rules