Abstract
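As geographic information system (GIS) applications continue to develop, this paper proposes a method for segmenting query sentences written in restricted natural language within a GIS: the first-character word expanding segmentation method. The method adopts the core idea of forward maximum matching segmentation, namely the longest-word-first principle. To narrow the matching range and improve matching efficiency, the method first filters the lexicon records by the first character of the natural language query, yielding the subset of lexicon entries that begin with that character; it then matches the original query against this subset under the longest-word-first principle and segments the query. By comparison, the method has a lower time complexity than forward maximum matching. The method was implemented in the DELPHI programming language.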
In this paper, we present the Word Expanding Method for analyzing and processing Chinese natural language queries entered through the user interface of a GIS. Implementing this method helps non-professionals use the GIS more conveniently and efficiently. We adopt the central idea of the Maximum Matching Method, known as longer word preferred. The Word Expanding Method consists of two steps: filtering and matching. The filtering step filters the whole Limited Chinese Words Library by the first character of the query sentence, which reduces the scope that the matching step must search. The Word Expanding Method is shown to be a rather efficient method for Chinese word segmentation. Finally, we implemented the method in a GIS.
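The two steps described in the abstract (filtering by first character, then longest-word-first matching) can be illustrated with a minimal Python sketch. This is not the authors' DELPHI implementation. The function name `segment`, the sample lexicon, and the fallback of emitting an unmatched character as a single-character token are all assumptions made for illustration.

```python
def segment(query, lexicon):
    """Segment `query` by first-character filtering plus longest-word-first matching.

    Hypothetical sketch: `lexicon` is any iterable of words; the paper's
    actual lexicon structure is not described in the abstract.
    """
    tokens = []
    pos = 0
    while pos < len(query):
        first = query[pos]
        # Filtering step: keep only lexicon entries that start with the
        # first character of the remaining query.
        subset = [w for w in lexicon if w.startswith(first)]
        # Matching step: try candidates from longest to shortest
        # (longer word preferred).
        subset.sort(key=len, reverse=True)
        match = next((w for w in subset if query.startswith(w, pos)), None)
        if match is None:
            # Assumption: a character with no lexicon match becomes its own token.
            match = first
        tokens.append(match)
        pos += len(match)
    return tokens


# Hypothetical restricted lexicon and query, for illustration only.
lexicon = ["北京", "北京市", "的", "河流", "河", "查询"]
print(segment("查询北京市的河流", lexicon))
```

With this sample lexicon, "北京市" is chosen over "北京" because the longer candidate is tried first. A real system would likely index the lexicon by first character once instead of filtering it on every step; the sketch filters each time only to mirror the description above.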
Source
《地球信息科学》
CSCD
2005, Issue 3, pp. 67-71 (5 pages)
Geo-information Science
Funding
Supported by the National 863 Program of China (2002AA134020)
Keywords
geographic information system
natural language word segmentation
first-character word expanding
longest word first
GIS
Chinese word segmentation
word expanding method
longer word preferred