Abstract
To address the low accuracy, heavy manual involvement, and poor handling of colloquial expressions in product feature extraction from online reviews, a product feature extraction method based on Latent Dirichlet Allocation (LDA) is proposed. The method first applies a Chinese word segmentation tool to segment and part-of-speech tag the online reviews, yielding an initial set of product feature nouns. An LDA text training model then selects a set of candidate product feature words, and the final product feature set is obtained through expansion with a synonym thesaurus (Tongyici Cilin) and filtering rules. Camera and mobile phone review data from JD.com are used as examples, and comparative experiments verify the effectiveness of the proposed method.
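The three stages described in the abstract (noun extraction, LDA-based candidate selection, synonym expansion with filtering) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the reviews are hypothetical, pre-tagged English toy tokens standing in for the output of a Chinese segmentation tool; LDA is fitted with a small collapsed Gibbs sampler, which is one common inference choice and not necessarily the one used in the paper; taking the top words per topic as candidates is an assumption; and `SYNONYMS` and `NON_FEATURE_NOUNS` are toy stand-ins for the synonym thesaurus and the filtering rules.

```python
import random
from collections import Counter

# Hypothetical pre-segmented, POS-tagged reviews: (word, tag) pairs, "n" = noun.
# In the described method these would come from a Chinese segmentation tool.
REVIEWS = [
    [("screen", "n"), ("clear", "a"), ("battery", "n"), ("weak", "a")],
    [("battery", "n"), ("lasts", "v"), ("screen", "n"), ("bright", "a")],
    [("lens", "n"), ("sharp", "a"), ("photo", "n"), ("nice", "a")],
    [("photo", "n"), ("lens", "n"), ("delivery", "n"), ("fast", "a")],
]
# Toy stand-ins (assumptions) for a synonym thesaurus and a filtering rule.
SYNONYMS = {"screen": ["display"], "photo": ["picture"]}
NON_FEATURE_NOUNS = {"delivery"}


def extract_nouns(reviews):
    """Stage 1: keep only the nouns of each tagged review."""
    return [[w for w, tag in review if tag == "n"] for review in reviews]


def lda_gibbs(docs, num_topics=2, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Stage 2a: collapsed Gibbs sampling for LDA; returns per-topic word counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment, then resample it.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word


def candidate_features(topic_word, top_n=2):
    """Stage 2b: take the top-n words of every topic as candidates (assumption)."""
    candidates = set()
    for counts in topic_word:
        candidates.update(w for w, c in counts.most_common(top_n) if c > 0)
    return candidates


def expand_and_filter(candidates):
    """Stage 3: add synonyms, then drop words rejected by the filter rule."""
    expanded = set(candidates)
    for w in candidates:
        expanded.update(SYNONYMS.get(w, []))
    return sorted(expanded - NON_FEATURE_NOUNS)


nouns = extract_nouns(REVIEWS)
features = expand_and_filter(candidate_features(lda_gibbs(nouns)))
print(features)
```

Which words end up as candidates depends on the sampler's random seed and on `top_n`, so the printed list should be read as illustrative only. On real data the topic count and the hyperparameters `alpha` and `beta` would need tuning.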
Source
Computer Integrated Manufacturing Systems (《计算机集成制造系统》)
EI
CSCD
PKU Core Journal
2014, No. 1, pp. 96-103 (8 pages)
Funding
National Natural Science Foundation of China (71128003, 70972006, 71102111)
Program for New Century Excellent Talents in University (NCET-11-0792)
Keywords
online reviews
product feature extraction
latent Dirichlet allocation
data mining