Abstract
Using the spaces between words as natural delimiters, it is very easy to extract words from Uyghur text, but difficult to obtain structurally complete semantic words, so many kinds of text processing perform poorly. This paper puts forward the new concept of Uyghur word grouping and applies the frequent pattern mining method from data mining to it. By exploiting the features of the Uyghur language and script, the problem of mining patterns without prior knowledge is converted into a problem of matching specific patterns, and a fast and efficient frequent pattern mining algorithm is proposed to obtain semantically complete Uyghur words. The experimental results show that the words obtained by this algorithm are stable in structure and semantically complete and independent.
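The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of frequent pattern mining over space-delimited words, not the authors' method. It counts runs of adjacent words and keeps those that reach a support threshold as candidate word groups. The function name `frequent_word_groups`, the parameters `min_support` and `max_len`, and the placeholder tokens in `corpus` are all assumptions made for illustration.

```python
from collections import Counter


def frequent_word_groups(lines, min_support=2, max_len=3):
    """Return adjacent word sequences of length 2..max_len that occur
    at least min_support times across all lines (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        words = line.split()  # spaces act as the natural delimiter
        for n in range(2, max_len + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return {group: c for group, c in counts.items() if c >= min_support}


# Placeholder tokens stand in for Uyghur words; this is not real data.
corpus = [
    "a b c",
    "a b d",
    "x a b",
]
groups = frequent_word_groups(corpus, min_support=2, max_len=3)
print(groups)
```

In this toy corpus only the pair ("a", "b") recurs, so it is the only candidate group kept. A real system would additionally need the language-specific pattern constraints that the abstract mentions but does not specify.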
Source
《计算机应用》
CSCD
Peking University Core Journals (北大核心)
2012, No. 10, pp. 2920-2922, 2926 (4 pages)
Journal of Computer Applications
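Funding
National Natural Science Foundation of China (61063022, 61262062, 61163033, 61142004)
High-Tech Research and Development Program of Xinjiang Uygur Autonomous Region (201212124)
Program for New Century Excellent Talents in University, Ministry of Education (NCET-10-0969)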
Keywords
Uyghur text
word segmentation
word grouping
semantic word
frequent pattern