Abstract
Traditional FTP search engines do little to optimize their query results, so their retrieval quality is low. To address this, a high-performance intelligent FTP search engine is designed on the basis of a statistical analysis of FTP user query logs. The engine combines three techniques. A double-byte inverted index builds inverted index tables over every pair of bytes in a file name. Automatic classification of query results organizes the results into a topic-based hierarchy. Automatic query error correction builds a spelling-error dictionary from the high-frequency query terms in the user query logs. Experimental results show that the proposed scheme effectively improves both the efficiency and the quality of file retrieval in an FTP search engine.
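Two of the techniques in the abstract, the double-byte inverted index and the log-driven spelling dictionary, can be illustrated with a short Python sketch. It is not the authors' implementation. The following are all assumptions: GBK encoding, overlapping byte pairs, a substring check that filters false hits, and the frequency and edit-distance thresholds. The names `DoubleByteIndex`, `byte_pairs`, `edit_distance` and `build_correction_dict` are hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict

ENCODING = "gbk"  # assumption: file names are GBK-encoded, so a Chinese character is 2 bytes


def byte_pairs(text: str) -> set[bytes]:
    """Every overlapping two-byte substring of the lower-cased, encoded text."""
    data = text.lower().encode(ENCODING, errors="replace")
    return {data[i:i + 2] for i in range(len(data) - 1)}


class DoubleByteIndex:
    """Hypothetical inverted index: byte pair -> ids of the file names containing it."""

    def __init__(self) -> None:
        self.names: list[str] = []
        self.postings: dict[bytes, set[int]] = defaultdict(set)

    def add(self, name: str) -> None:
        doc_id = len(self.names)
        self.names.append(name)
        for pair in byte_pairs(name):
            self.postings[pair].add(doc_id)

    def search(self, query: str) -> list[str]:
        pairs = byte_pairs(query)
        if pairs:
            # Intersect the posting lists, smallest first
            lists = sorted((self.postings.get(p, set()) for p in pairs), key=len)
            candidates = set(lists[0]).intersection(*lists[1:])
        else:
            # A one-byte query has no pair to look up, so fall back to a full scan
            candidates = set(range(len(self.names)))
        # Pairs may occur non-adjacently, so keep only true substring matches
        q = query.lower()
        return [self.names[i] for i in sorted(candidates) if q in self.names[i].lower()]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with one rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_correction_dict(query_log: list[str], min_freq: int = 3,
                          max_dist: int = 1) -> dict[str, str]:
    """Map each rare query to the most frequent nearby high-frequency query."""
    counts = Counter(q.strip().lower() for q in query_log)
    frequent = [q for q, c in counts.items() if c >= min_freq]
    corrections: dict[str, str] = {}
    for q, c in counts.items():
        if c >= min_freq:
            continue  # high-frequency queries are treated as correctly spelled
        close = [f for f in frequent if edit_distance(q, f) <= max_dist]
        if close:
            corrections[q] = max(close, key=lambda f: counts[f])
    return corrections


if __name__ == "__main__":
    index = DoubleByteIndex()
    for name in ["Ubuntu-8.10-desktop.iso", "计算机网络.pdf", "网络编程实例.rar"]:
        index.add(name)
    print(index.search("网络"))  # ['计算机网络.pdf', '网络编程实例.rar']
    log = ["ubuntu"] * 5 + ["ubunto", "ubuntu iso"]
    print(build_correction_dict(log))  # {'ubunto': 'ubuntu'}
```

Indexing raw byte pairs means the index needs no Chinese word segmentation. Any query of two or more bytes can then be answered by intersecting posting lists. The cost is that a pair can match out of place, for example non-adjacently or across a GBK character boundary. For that reason, every candidate is re-checked against the full file name before it is returned.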
Source
Journal of South China University of Technology (Natural Science Edition)
EI
CAS
CSCD
Peking University Core Journal
2009, No. 1, pp. 135-139 (5 pages)
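Funding
National High-Tech Research and Development Program of China ("863" Program) (2006AA10Z239)
National Key Technology R&D Program of China (2006BAH02A16)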
Keywords
File Transfer Protocol
search engine
inverted index
automatic classification
automatic error correction