期刊文献+

面向Internet的中文新词语检测 被引量:60

Internet-oriented Chinese New Words Detection
在线阅读 下载PDF
导出
摘要 随着社会的飞速发展 ,新词语不断地在日常生活中涌现出来。搜集和整理这些新词语 ,是中文信息处理中的一个重要研究课题。本文提出了一种自动检测新词语的方法 ,通过大规模地分析从Internet上采集而来的网页 ,建立巨大的词和字串的集合 ,从中自动检测新词语 ,而后再根据构词规则对自动检测的结果进行进一步的过滤 ,最终抽取出采集语料中存在的新词语。根据该方法实现的系统 ,可以寻找不限长度和不限领域的新词语 ,目前正应用于《现代汉语新词语信息 (电子 )词典》的编纂 ,在实用中大大的减轻了人工查找新词语的负担。 With the fast development of the society,more and more new words come out in our life. It is one of the important topics in Chinese natural language processing to collect those new words. A method is presented for detecting these new words automaitcally in this paper. Through analysing webpages grabbed from the Internet, a large word and string set is built, which new words are detected from and filtered by rules. At last new words which exist in the webpages grabbed are extracted. The system built in this way can find new words in any length and in any field.Now it is applying to the compilation of Modern Chinese New Word Information Dictionary. It reduced human labor a lot in practise.
出处 《中文信息学报》 CSCD 北大核心 2004年第6期1-9,共9页 Journal of Chinese Information Processing
关键词 计算机应用 中文信息处理 新词语 自动检测 computer application Chinese language processing new word automatic detection
  • 相关文献

参考文献2

  • 1Hua- Ping ZHANG, Qun LIU. et al, Chinese Name Entity Recognition Using Role Model[ J]. Special issue ''Word Formation and Chinese Language processing'' of the International Journal of Computational Linguistics and Chinese Language Processing, 2003, 8(2):2
  • 2Craig G. Nevill - Manning, Ian H. Witten. Identifying Hierarchical Structure in Sequences: A linear - time algorithm [J]. Journal of Artificial Intelligence Research, 1997, 7:67- 82

同被引文献520

引证文献60

二级引证文献329

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部