Abstract
This paper reviews the theory of Chinese word segmentation, covering its main difficulties, corpora, and segmentation algorithms, and then designs a Chinese word segmentation system on the .NET platform. A corpus was built in SQL Server 2005 following the long-word-first principle. The system was implemented in Visual Studio 2005 with ASP.NET and C#, using an improved forward maximum matching algorithm that shortens the candidate string one character at a time. The results indicate that the system achieves good segmentation quality.
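The abstract does not describe the authors' improvements or the layout of their SQL Server corpus, so the following is only a minimal sketch of the basic forward maximum matching idea with character reduction. It is written in Python for brevity rather than the paper's C#. The function name `forward_max_match` is hypothetical, and an in-memory set is assumed to stand in for the database-backed word list.

```python
def forward_max_match(text, lexicon, max_len=None):
    """Greedy left-to-right segmentation that prefers the longest known word.

    `lexicon` is assumed to be a set of known words; it stands in for the
    corpus table, whose actual schema the abstract does not give.
    """
    if max_len is None:
        # The longest lexicon entry bounds the candidate window.
        max_len = max((len(word) for word in lexicon), default=1)

    tokens = []
    i = 0
    while i < len(text):
        # Start from the longest window that still fits in the remaining text.
        size = max(1, min(max_len, len(text) - i))
        # Drop one trailing character at a time until the candidate is a
        # known word; a single character is always accepted as a fallback.
        while size > 1 and text[i:i + size] not in lexicon:
            size -= 1
        tokens.append(text[i:i + size])
        i += size
    return tokens


if __name__ == "__main__":
    words = {"中文", "分词", "中文分词", "系统"}
    print(forward_max_match("中文分词系统", words))
```

Because the longest match is tried first, "中文分词" is kept as one token instead of being split into "中文" and "分词", which is the long-word-first behaviour the abstract mentions. The same greediness is also a known weakness: with the lexicon {"研究", "研究生", "生命"}, the string "研究生命" is segmented as "研究生" followed by "命".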
Source
《微计算机信息》
2010, No. 12, pp. 215-216, 214 (3 pages)
Control & Automation