Abstract
After reviewing commonly used Chinese word segmentation methods and their characteristics, this paper proposes a segmentation algorithm for place-name and address strings, based on a mechanism for identifying address elements. The algorithm uses a whole-word binary-search segmentation dictionary with the forward maximum matching (FMM) algorithm, and adds an address-element identification mechanism so that address strings are split effectively. The mechanism handles unregistered (out-of-dictionary) address names by checking whether each address element is complete. Tests show that the new algorithm segments address strings effectively and largely resolves the problem of identifying unregistered address names.
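The approach summarized in the abstract (FMM over a dictionary, followed by a completeness check on address elements) can be illustrated with a minimal sketch. The abstract does not give the paper's actual rules, so everything below is an assumption made for illustration: the toy dictionary, the suffix list, and the rule that an element counts as "complete" once it ends in an address suffix such as 市, 区, 街, or 号. A plain Python set also stands in for the paper's whole-word binary-search dictionary.

```python
# Hypothetical toy dictionary and address-element suffixes.
# Both are illustrative assumptions, not data from the paper.
DICTIONARY = {"北京市", "海淀区", "中关村"}
ELEMENT_SUFFIXES = ("省", "市", "区", "县", "镇", "路", "街", "号")
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)


def fmm_segment(text):
    """Forward maximum matching: at each position, take the longest
    dictionary word; fall back to a single character if none matches."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in DICTIONARY:
                tokens.append(piece)
                i += size
                break
    return tokens


def merge_incomplete_elements(tokens):
    """Assumed completeness rule: keep concatenating tokens until the
    accumulated string ends with an address-element suffix."""
    elements = []
    pending = ""
    for token in tokens:
        pending += token
        if pending.endswith(ELEMENT_SUFFIXES):
            elements.append(pending)
            pending = ""
    if pending:  # trailing text that never reached a suffix
        elements.append(pending)
    return elements


def segment_address(text):
    """FMM pass followed by the address-element merge pass."""
    return merge_incomplete_elements(fmm_segment(text))


print(segment_address("北京市海淀区中关村南大街5号"))
# ['北京市', '海淀区', '中关村南大街', '5号']
```

In this sketch, 南大街 is not in the dictionary, so FMM alone breaks it into single characters; the merge pass then rebuilds 中关村南大街 as one element because that is where the first address suffix is reached. A real implementation would likely need a richer suffix set and rules for ambiguous suffix characters.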
Source
《测绘科学》
CSCD
PKU Core Journal (北大核心)
2013, Issue 5, pp. 74-76 (3 pages)
Science of Surveying and Mapping
Funding
National Key Technology R&D Program of China (2012BAH24B00)
Keywords
Chinese word segmentation (中文分词)
address segmentation (地名地址分词)
unregistered word identification (未登录词识别)
segmentation dictionary (分词词典)