Abstract: Social media websites allow users to exchange short texts, such as tweets on microblogs and status updates in friendship networks. Their limited length, pervasive abbreviations, and coined acronyms and words exacerbate the problems of synonymy and polysemy and bring new challenges to data mining applications such as text clustering and classification. To address these issues, we dissect some potential causes and devise an efficient approach that enriches data representation by employing machine translation to increase the number of features from different languages. We then propose a novel framework that performs multi-language knowledge integration and feature reduction simultaneously through matrix factorization techniques. The proposed approach is evaluated extensively in terms of effectiveness on two social media datasets from Facebook and Twitter. Given its significant performance improvement, we further investigate potential factors that contribute to the improved performance.
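The general idea of enriching short texts with translated features and then reducing them by matrix factorization can be illustrated with a minimal sketch. This is not the paper's framework: the abstract does not specify the factorization, so plain non-negative matrix factorization (NMF) with multiplicative updates is swapped in as a generic stand-in. The sample texts, the hand-written French "translations" (standing in for machine translation output), and all function names are assumptions made for illustration only.

```python
import random

def build_vocab(docs):
    """Map each distinct whitespace token to a column index."""
    tokens = sorted({tok for doc in docs for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}

def term_matrix(docs, vocab):
    """Document-by-term count matrix as a list of rows."""
    rows = []
    for doc in docs:
        row = [0.0] * len(vocab)
        for tok in doc.lower().split():
            row[vocab[tok]] += 1.0
        rows.append(row)
    return rows

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]

def nmf(x, k, iters=200, seed=0, eps=1e-9):
    """Generic NMF (X ~ W H) via multiplicative updates; all factors stay non-negative."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (X H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return w, h

def frobenius(a):
    return sum(v * v for row in a for v in row) ** 0.5

# Hypothetical short texts. The first two share no English token
# (abbreviations hide that they are near-synonymous).
english = ["gr8 game 2nite", "great match tonight",
           "new phone battery dies fast", "my phone battery is bad"]
# Hand-written stand-ins for machine translation output (assumption, not real MT).
french = ["super match ce soir", "super match ce soir",
          "nouveau portable batterie faible", "mon portable batterie faible"]

vocab_en, vocab_fr = build_vocab(english), build_vocab(french)
x_en, x_fr = term_matrix(english, vocab_en), term_matrix(french, vocab_fr)

# Enrich the representation: concatenate the feature columns of both languages.
x = [row_en + row_fr for row_en, row_fr in zip(x_en, x_fr)]

# Integrate and reduce: each document becomes a k-dimensional latent vector (a row of W).
w, h = nmf(x, k=2)
wh = matmul(w, h)
error = frobenius([[x[i][j] - wh[i][j] for j in range(len(x[0]))]
                   for i in range(len(x))])
print(len(x_en[0]), "English features ->", len(x[0]), "combined features")
print("reconstruction error:", round(error, 3))
```

In this toy setup the translated columns give the two abbreviated and unabbreviated "game" posts shared features that the English columns alone lack, which is the kind of enrichment the abstract describes. The rows of `w` could then be passed to any clustering or classification routine.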