Enriching short text representation in microblog for clustering (cited by: 14)
Authors: Jiliang TANG, Xufei WANG, Huiji GAO, Xia HU, Huan LIU. Frontiers of Computer Science (SCIE, EI, CSCD), 2012, Issue 1, pp. 88-101, 14 pages.
Social media websites allow users to exchange short texts such as tweets via microblogs and user status in friendship networks. Their limited length, pervasive abbreviations, and coined acronyms and words exacerbate the problems of synonymy and polysemy, and bring about new challenges to data mining applications such as text clustering and classification. To address these issues, we dissect some potential causes and devise an efficient approach that enriches data representation by employing machine translation to increase the number of features from different languages. Then we propose a novel framework which performs multi-language knowledge integration and feature reduction simultaneously through matrix factorization techniques. The proposed approach is evaluated extensively in terms of effectiveness on two social media datasets from Facebook and Twitter. With its significant performance improvement, we further investigate potential factors that contribute to the improved performance.
Keywords: short texts; text representation; multi-language knowledge; matrix factorization; social media
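
The enrichment idea in the abstract, adding features from a translated copy of each short text, can be illustrated with a minimal sketch. It is not the authors' implementation: the lookup table `TOY_TRANSLATIONS` is a hypothetical stand-in for a real machine translation system, the function names are invented for this example, and the matrix factorization step that the abstract describes is not covered. The sketch only shows how two short texts that use different words for the same thing, and so share no features in the original language, can come to overlap once translated features are added.

```python
import math
from collections import Counter

# Hypothetical stand-in for a machine translation system: a tiny
# English-to-French word table invented for this illustration.
TOY_TRANSLATIONS = {"car": "voiture", "auto": "voiture"}


def bag(text):
    """Bag-of-words counts over the original (English) tokens only."""
    return Counter("en:" + tok for tok in text.lower().split())


def enriched_bag(text):
    """Original-language counts plus counts of translated tokens.

    Language prefixes keep the two feature spaces disjoint, so the
    translated view adds new features rather than overwriting old ones.
    """
    tokens = text.lower().split()
    counts = bag(text)
    counts.update(
        "fr:" + TOY_TRANSLATIONS[tok] for tok in tokens if tok in TOY_TRANSLATIONS
    )
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a)  # Counter returns 0 for missing keys
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


text_a = "new car"
text_b = "used auto"
sim_original = cosine(bag(text_a), bag(text_b))
sim_enriched = cosine(enriched_bag(text_a), enriched_bag(text_b))
```

In this toy case the two texts have no English token in common, so `sim_original` is zero, while the shared translated feature gives `sim_enriched` a positive value. Any clustering method that works on similarity could then use the enriched vectors; which method to use is left open here.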