Abstract
This paper presents a method for identifying harmful short messages sent over the Internet, based on a strategy of approximating effective feature information. Following a classical statistical natural language processing (SNLP) approach, the method extracts feature information from a training corpus, treats each feature item as a head word, learns the part-of-speech (POS) collocations along its POS chain, and builds a POS transfer table for the feature information as the system's knowledge. When a real short message is processed, its feature information is matched against this knowledge to obtain a harmfulness score. An experimental system built on this method for identifying pornographic messages achieves recall above 90% and precision above 80%.
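The abstract gives no formulas, so the following is only a minimal sketch of how a POS transfer table around feature words might be learned and then used to score a message. The function names, the boundary tag, the use of the immediately adjacent POS tags as context, and the relative-frequency scoring are all assumptions made for illustration, not the paper's actual definitions.

```python
from collections import Counter, defaultdict

# A tagged message is a list of (word, pos) pairs; the tags are illustrative.
BOUNDARY = "<B>"  # hypothetical tag marking the start or end of a message


def pos_context(tagged, i):
    """Return the POS tags immediately left and right of position i."""
    left = tagged[i - 1][1] if i > 0 else BOUNDARY
    right = tagged[i + 1][1] if i + 1 < len(tagged) else BOUNDARY
    return left, right


def learn_transfer_table(harmful_corpus, feature_words):
    """For each feature word, estimate the relative frequency of each
    (left POS, right POS) context seen in the training corpus.
    Assumption: relative frequency stands in for the paper's transfer value."""
    counts = defaultdict(Counter)
    for tagged in harmful_corpus:
        for i, (word, _) in enumerate(tagged):
            if word in feature_words:
                counts[word][pos_context(tagged, i)] += 1
    table = {}
    for word, ctx_counts in counts.items():
        total = sum(ctx_counts.values())
        table[word] = {ctx: n / total for ctx, n in ctx_counts.items()}
    return table


def harm_score(tagged, table):
    """Sum the learned context values of every feature word in the message.
    Unseen contexts and non-feature words contribute nothing."""
    score = 0.0
    for i, (word, _) in enumerate(tagged):
        if word in table:
            score += table[word].get(pos_context(tagged, i), 0.0)
    return score


# Toy data: "spamword" is a placeholder feature word, not taken from the paper.
toy_corpus = [
    [("call", "v"), ("spamword", "n"), ("now", "d")],
    [("call", "v"), ("spamword", "n"), ("today", "t")],
    [("get", "v"), ("spamword", "n"), ("now", "d")],
]
toy_table = learn_transfer_table(toy_corpus, {"spamword"})
toy_score = harm_score([("buy", "v"), ("spamword", "n"), ("here", "d")], toy_table)
```

In a sketch like this, a message would be flagged when its score exceeds a threshold tuned on held-out data; the abstract does not say how the actual system turns the score into a decision.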
Source
Computer Engineering and Applications (《计算机工程与应用》)
Indexed in: CSCD; Peking University Core Journals
2003, Issue 36, pp. 161-162, 165 (3 pages)
Funding
Supported by a project of the Ministry of Information Industry of China (No. 01XK230009)