
Recognition of “V+N” and “V+V” Phrases in Search Engine Query Logs (cited by: 1)
Abstract: Starting from the Sogou query log corpus and an analysis of its characteristics, this paper proposes an automatic phrase recognition method based on the SVM_HMM model, using the word itself, part-of-speech information, position information, query string frequency, and syllable count as features. Multiple comparative experiments on "V+N" and "V+V" phrases show that adding more context information improves phrase recognition, and confirm that the syllable-count and position features have little influence on the results. The work provides technical support for building phrase dictionaries for search engines and directional guidance for further research on phrase category recognition.
Source: Journal of Beijing Information Science and Technology University (Natural Science Edition), 2012, No. 2, pp. 53-58 (6 pages)
Funding: National Social Science Fund of China (09CYY021)
Keywords: Sogou log; SVM_HMM model; automatic phrase recognition; "V+N" phrase; "V+V" phrase
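
The feature set named in the abstract (word, part of speech, position, query string frequency, syllable count, plus surrounding context) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `token_features`, the `window` parameter, the example query, its frequency of 120, and the "v"/"n" tags are all hypothetical, and counting one character as one syllable is an assumption that holds only for Chinese text. The resulting feature dictionaries would still have to be encoded numerically before being passed to a sequence labeler such as SVM_HMM.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)


def token_features(query: List[Token], i: int, query_freq: int,
                   window: int = 1) -> Dict[str, object]:
    """Build a feature dict for the i-th token of a segmented, POS-tagged query."""
    word, pos = query[i]
    feats: Dict[str, object] = {
        "word": word,
        "pos": pos,
        "position": i,             # index of the token within the query
        "syllables": len(word),    # assumption: one Chinese character = one syllable
        "query_freq": query_freq,  # occurrences of the whole query string in the log
    }
    # Context features: words and POS tags up to `window` tokens on each side.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        if 0 <= j < len(query):
            feats[f"word[{offset:+d}]"] = query[j][0]
            feats[f"pos[{offset:+d}]"] = query[j][1]
    return feats


# Hypothetical "V+N" query: "下载" (download, verb) + "电影" (movie, noun).
query = [("下载", "v"), ("电影", "n")]
features = [token_features(query, i, query_freq=120) for i in range(len(query))]
```

Widening `window` is one way to vary the amount of context information, which is the factor the abstract reports as improving recognition.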

