3 articles found
Research of Web Documents Clustering Based on Dynamic Concept
1
Authors: WANG Yun-hua, CHEN Shi-hong. Wuhan University Journal of Natural Sciences (EI, CAS), 2004, Issue 5, pp. 547-552 (6 pages)
Conceptual clustering is mainly used to compensate for deficient and incomplete domain knowledge. Based on conceptual clustering technology and aimed at the structural framework and characteristics of Web theme information, this paper proposes and implements a dynamic conceptual clustering algorithm and a merging algorithm for Web documents, and analyses the superior performance of the clustering algorithm in efficiency and clustering accuracy.
Keywords: conceptual clustering; clustering center; dynamic conceptual clustering; theme; web documents clustering
Hierarchical Subtopic Segmentation of Web Document
2
Authors: ZHANG Yun-tao, GONG Ling, WANG Yong-cheng. Wuhan University Journal of Natural Sciences (EI, CAS), 2006, Issue 1, pp. 47-50 (4 pages)
The paper proposes a novel method for subtopic segmentation of Web documents. Effective retrieval results may be obtained by using subtopic segmentation. The proposed method can segment subtopics hierarchically and identify the boundary of each subtopic. Based on the term frequency matrix, the method measures the similarity between adjacent blocks, such as paragraphs and passages. In a real-world sample experiment, the macro-averaged precision and recall reach 73.4% and 82.5%, and the micro-averaged precision and recall reach 72.9% and 83.1%. Moreover, the method is equally effective for other Asian languages, such as Japanese and Korean, as well as for Western languages.
Keywords: subtopic segmentation; web document; passage retrieval; discourse
Improving Web Document Clustering through Employing User-Related Tag Expansion Techniques (cited by: 5)
3
Authors: 李鹏, 王斌, 晋薇. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2012, Issue 3, pp. 554-566 (13 pages)
As high-quality descriptors of web page semantics, social annotations or tags have been used for web document clustering and have achieved promising results. However, most web pages have few tags (fewer than 10). This sparsity seriously limits the usage of tags for clustering. In this work, we propose a user-related tag expansion method to overcome this problem, which incorporates additional useful tags into the original tag document by utilizing user tagging data as background knowledge. Unfortunately, simply adding tags may cause topic drift, i.e., the dominant topic(s) of the original document may be changed. To tackle this problem, we have designed a novel generative model called Folk-LDA, which jointly models original and expanded tags as independent observations. Experimental results show that 1) our user-related tag expansion method can be effectively applied to over 90% of tagged web documents; 2) Folk-LDA can alleviate topic drift in expansion, especially for topic-specific documents; 3) the proposed tag-based clustering methods significantly outperform the word-based methods, which indicates that tags could be a better resource for the clustering task.
Keywords: web document clustering; social bookmarking; topic model; tag expansion