In this paper, an improved algorithm, web-based keyword weight algorithm (WKWA), is presented to weight keywords in web documents. WKWA takes into account representation features of web documents and advantages of t...In this paper, an improved algorithm, web-based keyword weight algorithm (WKWA), is presented to weight keywords in web documents. WKWA takes into account representation features of web documents and advantages of the TF*IDF, TFC and ITC algorithms in order to make it more appropriate for web documents. Meanwhile, the presented algorithm is applied to improved vector space model (IVSM). A real system has been implemented for calculating semantic similarities of web documents. Four experiments have been carried out. They are keyword weight calculation, feature item selection, semantic similarity calculation, and WKWA time performance. The results demonstrate accuracy of keyword weight, and semantic similarity is improved.展开更多
本文提出了一种基于关键词的中文文档图像检索方法,能在不经OCR(Optical Character Recognition)识别的情况下,直接利用中文字符的图像特征进行关键词检索。首先将文档图像分割成单个中文字符图像,接着对字符图像进行汉字笔画的特征数...本文提出了一种基于关键词的中文文档图像检索方法,能在不经OCR(Optical Character Recognition)识别的情况下,直接利用中文字符的图像特征进行关键词检索。首先将文档图像分割成单个中文字符图像,接着对字符图像进行汉字笔画的特征数据提取,然后在特征数据间进行基于WMHD(Weighted Modified Hausdorff Dis-tance)的相似性测量。该方法不受字号的影响,也有一定的抗字体能力,实验证明其具有较高的检索效果。展开更多
基金Project supported by the Science Foundation of Shanghai Municipal Commission of Science and Technology (Grant No.055115001)
文摘In this paper, an improved algorithm, web-based keyword weight algorithm (WKWA), is presented to weight keywords in web documents. WKWA takes into account representation features of web documents and advantages of the TF*IDF, TFC and ITC algorithms in order to make it more appropriate for web documents. Meanwhile, the presented algorithm is applied to improved vector space model (IVSM). A real system has been implemented for calculating semantic similarities of web documents. Four experiments have been carried out. They are keyword weight calculation, feature item selection, semantic similarity calculation, and WKWA time performance. The results demonstrate accuracy of keyword weight, and semantic similarity is improved.