Abstract
The "N1+N2" structure, a common phrase pattern in modern Chinese, also appears widely in search engine queries. Based on a phrase dictionary built from query logs, and on the linguistic characteristics of logged query strings, we rewrite query strings that contain "N1+N2" phrases using three methods: space segmentation, quotation marking, and focus emphasis. We also classify the query strings coarsely. Experimental results show that quotation marking raises MPA from 0.362 to 0.441, raises MRR for navigational queries from 0.64 to 0.719, and raises MRR for informational and transactional queries from 0.25 to 0.344. These results indicate that phrase features can guide the optimisation of query results and thereby improve search engine performance.
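The three rewriting methods named in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the dictionary entries, the function names, the longest-match lookup, and in particular the reading of "focus emphasis" as appending N2 (taken here as the focus of the phrase) to the query. The abstract does not specify how the authors implemented any of these steps.

```python
from typing import Dict, Optional, Tuple

# Hypothetical phrase dictionary mapping an "N1+N2" phrase to its two nouns.
# The paper derives its dictionary from query logs; these entries are invented.
PHRASE_DICT: Dict[str, Tuple[str, str]] = {
    "手机铃声": ("手机", "铃声"),
    "电影票": ("电影", "票"),
}


def find_phrase(query: str) -> Optional[str]:
    """Return the longest dictionary phrase contained in the query, if any."""
    hits = [phrase for phrase in PHRASE_DICT if phrase in query]
    return max(hits, key=len) if hits else None


def space_split(query: str) -> str:
    """Space segmentation: insert a space between N1 and N2."""
    phrase = find_phrase(query)
    if phrase is None:
        return query
    n1, n2 = PHRASE_DICT[phrase]
    return query.replace(phrase, f"{n1} {n2}", 1)


def quote_mark(query: str) -> str:
    """Quotation marking: wrap the phrase in double quotes (exact match)."""
    phrase = find_phrase(query)
    if phrase is None:
        return query
    return query.replace(phrase, f'"{phrase}"', 1)


def focus_emphasis(query: str) -> str:
    """Focus emphasis (assumed form): append N2 so it is weighted twice."""
    phrase = find_phrase(query)
    if phrase is None:
        return query
    _, n2 = PHRASE_DICT[phrase]
    return f"{query} {n2}"
```

Queries that contain no dictionary phrase pass through unchanged, so each rewriter can be applied to a whole log without a separate filtering step.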
Source
《计算机应用与软件》
CSCD
Peking University Core Journal (北大核心)
2012, Issue 9, pp. 117-121 (5 pages)
Computer Applications and Software
Funding
National Social Science Fund of China project (09CYY021)