
Design and Implementation of a High-Performance FTP Search Engine

Cited by: 7
Abstract: Traditional FTP search engines do little to optimize their results, which leads to poor retrieval quality. To address this, a high-performance intelligent FTP search engine is designed on the basis of a statistical analysis of FTP user query logs, using a double-byte inverted index, automatic classification of search results, and automatic query correction. The double-byte inverted index builds an inverted index table over every two bytes of a file name; automatic classification organizes the search results into a topic-based hierarchy; and automatic query correction builds a spelling-error dictionary from the high-frequency query terms in the user query logs. Experimental results show that the scheme effectively improves the file retrieval efficiency and quality of an FTP search engine.
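The double-byte inverted index described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the GBK encoding, the class name `DoubleByteIndex`, the helper `byte_pairs`, and the final substring check that filters false positives are all assumptions made for illustration.

```python
from collections import defaultdict

ENCODING = "gbk"  # assumption: file names are stored in a double-byte encoding


def byte_pairs(text):
    """Return the set of all overlapping two-byte windows of the encoded text."""
    data = text.lower().encode(ENCODING, errors="ignore")
    return {data[i:i + 2] for i in range(len(data) - 1)}


class DoubleByteIndex:
    """Hypothetical inverted index keyed by every two bytes of a file name."""

    def __init__(self):
        self.postings = defaultdict(set)  # two-byte key -> set of file ids
        self.names = []                   # file id -> original file name

    def add(self, filename):
        file_id = len(self.names)
        self.names.append(filename)
        for pair in byte_pairs(filename):
            self.postings[pair].add(file_id)
        return file_id

    def search(self, query):
        needle = query.lower()
        pairs = byte_pairs(query)
        if not pairs:
            # Query is shorter than two bytes: fall back to a linear scan.
            return [n for n in self.names if needle in n.lower()]
        # Intersect the posting lists of every two-byte window of the query.
        # .get() avoids creating empty entries for unseen keys.
        candidates = set.intersection(
            *(self.postings.get(p, set()) for p in pairs)
        )
        # Sharing all windows does not guarantee they are contiguous,
        # so confirm each candidate with a real substring test.
        return [self.names[i] for i in sorted(candidates)
                if needle in self.names[i].lower()]


index = DoubleByteIndex()
for name in ["linux_kernel.tar.gz", "Linux教程.pdf", "music.mp3"]:
    index.add(name)
print(index.search("linux"))
print(index.search("教程"))
```

Indexing fixed two-byte windows avoids word segmentation for Chinese file names, at the cost of a verification pass over the candidate set.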
Source: Journal of South China University of Technology (Natural Science Edition), 2009, No. 1, pp. 135-139 (5 pages). Indexed in: EI, CAS, CSCD, Peking University Core Journals.
Funding: National "863" Program project (2006AA10Z239); National Science and Technology Support Program project (2006BAH02A16)
Keywords: File Transfer Protocol; search engine; inverted index; automatic classification; automatic error correction


