Abstract
Chinese automatic word segmentation has long been one of the basic technologies for document-processing research such as Chinese information processing and Web document mining. Traditional research on Chinese word segmentation has concentrated mainly on improving algorithms and has paid little attention to experimental platforms or to the implementation of the related software. After discussing why Chinese automatic word segmentation is important and indispensable, this paper presents the techniques used to develop automatic word segmentation software based on VC++/MFC. Finally, the four dictionary-based mechanical matching algorithms implemented in the software are analyzed experimentally. The experiments show that the system can efficiently provide a platform for Chinese information processing.
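The abstract does not name the four dictionary-based mechanical matching algorithms. As an illustration only, the following is a minimal sketch of forward maximum matching, a commonly used method of this family. The function name `forward_max_match`, the `max_len` parameter, and the choice of UTF-32 strings are assumptions made for this example and are not taken from the paper's software.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper, not taken from the paper: forward maximum matching.
// Text and dictionary entries are UTF-32 so that one element is one character.
std::vector<std::u32string> forward_max_match(
    const std::u32string& text,
    const std::unordered_set<std::u32string>& dict,
    std::size_t max_len) {
    assert(max_len > 0);  // longest dictionary entry, in characters
    std::vector<std::u32string> words;
    std::size_t pos = 0;
    while (pos < text.size()) {
        // Start with the longest candidate that fits, then shrink it
        // one character at a time until it is found in the dictionary.
        std::size_t len = std::min(max_len, text.size() - pos);
        while (len > 1 && dict.count(text.substr(pos, len)) == 0) {
            --len;
        }
        // If no multi-character entry matched, len is 1 and the single
        // character is emitted as its own word.
        words.push_back(text.substr(pos, len));
        pos += len;
    }
    return words;
}
```

For example, with a dictionary containing 中文, 信息, 处理 and 信息处理 and `max_len` set to 4, the input 中文信息处理 is split into 中文 and 信息处理, because the longest entry that matches at each position is preferred over shorter ones.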
Source
Journal of Guangxi Teachers Education University (Natural Science Edition) (《广西师范学院学报(自然科学版)》)
2008, No. 3, pp. 104-108 (5 pages)
Funding
National Innovation Fund for Technology-Based Small and Medium-Sized Enterprises (06C26224501689)
Guangxi Natural Science Foundation (桂科自0679018)
Keywords
automatic word segmentation
Chinese information processing
mining
dictionary-based mechanical matching