
Design of a Chinese Screen Real-Time Paraphrase Engine Based on Multiple Code Pages   Citations: 2
Abstract: Computers currently represent Chinese characters with several code pages, including GB2312, GBK, GB18030, BIG-5, HKSCS and ISO 10646/Unicode, and these code pages will continue to coexist for a long time. Supporting all of them at the same time requires automatic recognition of the code page behind Chinese character internal codes. A screen real-time paraphrase engine is the core technology behind online character and word dictionaries and teaching software. Existing engines, however, cannot work across code pages, and they capture words from the screen incompletely or incorrectly. To address these problems, this paper describes the system architecture of a Chinese screen real-time paraphrase engine. The engine combines automatic code page recognition of Chinese internal codes with an optimized technique for automatically capturing words from the screen. The paper also describes the design of the data dictionary and the key techniques used in the engine. The authors tested an online dictionary built on this engine with samples totaling about five million Chinese characters. It recognized the code page of meaningful short strings (excluding single characters) with an accuracy above 99%.
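The abstract does not say how code pages are recognized. The Python sketch below shows one common idea under stated assumptions. It tries a strict decode of a short byte string with each candidate code page, then scores each successful result against a character-frequency table. The candidate list, the `COMMON_HANZI` set and the scoring rule are hypothetical illustrations, not the authors' algorithm. Python's standard codecs stand in for GB2312, GBK, GB18030, BIG-5 and HKSCS, and UTF-8 is the only Unicode form tried. UTF-16 is left out because almost any even-length byte string decodes as UTF-16.

```python
from typing import Optional

# Hypothetical stand-in for a real character-frequency table; a production
# engine would derive statistics from a large corpus instead.
COMMON_HANZI = set("的一是在不了有人中大国我以上文字信息资料處理資訊")

# Candidate code pages (assumed list), narrowest first within the GB family
# so that ties resolve to the most specific one (GB2312 < GBK < GB18030).
CANDIDATES = ["utf-8", "gb2312", "gbk", "gb18030", "big5", "big5hkscs"]


def score(text: str) -> float:
    """Fraction of non-ASCII characters that appear in COMMON_HANZI."""
    wide = [ch for ch in text if ord(ch) > 0x7F]
    if not wide:
        return 0.0
    return sum(ch in COMMON_HANZI for ch in wide) / len(wide)


def guess_code_page(data: bytes) -> Optional[str]:
    """Return the candidate whose strict decoding looks most like Chinese text,
    "ascii" for pure 7-bit input, or None if no candidate decodes the bytes."""
    if all(b < 0x80 for b in data):
        return "ascii"
    best, best_score = None, -1.0
    for codec in CANDIDATES:
        try:
            text = data.decode(codec)  # strict mode raises on invalid sequences
        except UnicodeDecodeError:
            continue
        s = score(text)
        if s > best_score:  # strict '>' keeps the earlier (narrower) codec on ties
            best, best_score = codec, s
    return best
```

For example, `guess_code_page("中文".encode("big5"))` returns `"big5"`. Decoding the same bytes as GB2312 or GBK does not produce the expected Chinese characters, so those candidates score lower. The paper's reported accuracy excludes single characters, and a heuristic like this one has the same limitation: a one-character string gives a frequency table almost nothing to distinguish between code pages.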
Source: Journal of Chinese Information Processing (indexed in CSCD and the Peking University Core Journals list), 2005, no. 5, pp. 90-96 (7 pages)
Funding: Natural Science Foundation for Colleges and Universities of Jiangsu Province (grants 01kjb520001 and 04KKB320134)
Keywords: computer application; Chinese information processing; automatic recognition of Chinese character code pages; capturing words from screen; ISO 10646
