Abstract
Chinese word segmentation is the foundation of automatic text processing by computer. After comparing the strengths and weaknesses of commonly used mechanical word segmentation algorithms, this paper proposes a layered character-by-character binary algorithm that combines the characteristics of the TRIE tree and of character-by-character binary search segmentation, aiming to achieve faster matching at lower overhead. Experimental results show that the algorithm significantly improves overall performance.
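The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it names: a dictionary indexed by first character (a one-level, trie-style layer) whose remaining characters are located by binary search one character at a time, driven by forward maximum matching. It is not the paper's algorithm. The names `build_index`, `longest_match`, `segment` and the sample word list are hypothetical and chosen for illustration.

```python
def build_index(words):
    """Group words by first character; keep each group's remainders sorted."""
    index = {}
    for w in set(words):          # set() removes duplicate dictionary entries
        if w:
            index.setdefault(w[0], []).append(w[1:])
    for suffixes in index.values():
        suffixes.sort()
    return index


def _narrow(suffixes, lo, hi, depth, ch):
    """Binary-search [lo, hi) for entries whose character at `depth` is ch.

    Assumes every entry in [lo, hi) is longer than `depth` and that all of
    them share the same first `depth` characters.
    """
    a, b = lo, hi
    while a < b:                  # first entry with suffixes[m][depth] >= ch
        m = (a + b) // 2
        if suffixes[m][depth] < ch:
            a = m + 1
        else:
            b = m
    left, b = a, hi
    while a < b:                  # first entry with suffixes[m][depth] > ch
        m = (a + b) // 2
        if suffixes[m][depth] <= ch:
            a = m + 1
        else:
            b = m
    return left, a


def longest_match(index, text, start):
    """Length of the longest dictionary word at text[start] (1 if none)."""
    best = 1
    suffixes = index.get(text[start])
    if suffixes is None:
        return best
    lo, hi, depth = 0, len(suffixes), 0
    while lo < hi:
        if len(suffixes[lo]) == depth:   # a word ends exactly here
            best = depth + 1
            lo += 1
        pos = start + 1 + depth
        if pos >= len(text) or lo >= hi:
            break
        lo, hi = _narrow(suffixes, lo, hi, depth, text[pos])
        depth += 1
    return best


def segment(index, text):
    """Forward maximum matching: repeatedly take the longest word."""
    out, i = [], 0
    while i < len(text):
        n = longest_match(index, text, i)
        out.append(text[i:i + n])
        i += n
    return out


# Hypothetical sample dictionary, for illustration only.
index = build_index(["中文", "分词", "中文分词", "计算", "计算机", "处理", "文本", "基础"])
print(segment(index, "中文分词是计算机处理文本的基础"))
```

Characters with no dictionary entry fall through as single-character tokens. The binary search is hand-written here so that each step compares only one character, which is the point being illustrated.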
Source
《计算机与数字工程》
2009, Issue 3, pp. 68-71, 87 (5 pages)
Computer & Digital Engineering
Keywords
Chinese word segmentation
computer application
Chinese information processing