Survey on structured ranking functions and term weighting techniques for Web information retrieval
Abstract This paper analyzes the current state of Web information retrieval (IR) and argues that the root cause of its low retrieval efficiency lies in the ranking functions and term weighting techniques that search engines adopt. Classic IR ranking functions and term weighting techniques are then introduced. The characteristics of Web documents are examined: most of them are HTML documents, a kind of structured document whose structure is defined explicitly by tags, and different structural elements contribute differently to retrieval performance. The work of researchers in China and abroad is compared, in particular approaches that exploit these characteristics of Web documents to extend classic ranking functions and term weighting into structured ones. Finally, the development directions of ranking functions and term weighting techniques for Web information retrieval are discussed.
Source Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2007, No. 11, pp. 181-184 (4 pages)
Funding Key Science and Technology Research Project of the Ministry of Education of China (No. 03144); Natural Science Foundation of Hainan Province of China (Grant No. 60533).
Keywords ranking function; term weighting; document structure; search engine
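The abstract describes extending classic term weighting into a structured form, where a term counts for more when it appears inside certain HTML tags. The following is a minimal sketch of that general idea, not the method of the paper or of any work it surveys. The tag weights in `TAG_WEIGHTS`, the smoothed IDF formula, and the function names `weighted_tf` and `score` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical per-tag weights; the source gives no concrete values.
TAG_WEIGHTS = {"title": 4.0, "h1": 3.0, "h2": 2.0, "b": 1.5, "strong": 1.5}
DEFAULT_WEIGHT = 1.0  # weight for text outside any weighted tag


class WeightedTermCounter(HTMLParser):
    """Accumulates a structure-weighted term frequency for one HTML document."""

    def __init__(self):
        super().__init__()
        self.open_tags = []          # weighted tags currently open
        self.weighted_tf = Counter()

    def handle_starttag(self, tag, attrs):
        if tag in TAG_WEIGHTS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Close the innermost open occurrence of this tag.
            idx = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
            del self.open_tags[idx]

    def handle_data(self, data):
        # A term takes the largest weight among its enclosing weighted tags.
        weight = max((TAG_WEIGHTS[t] for t in self.open_tags),
                     default=DEFAULT_WEIGHT)
        for term in re.findall(r"\w+", data.lower()):
            self.weighted_tf[term] += weight


def weighted_tf(html):
    """Return the structure-weighted term frequencies of an HTML string."""
    parser = WeightedTermCounter()
    parser.feed(html)
    parser.close()
    return parser.weighted_tf


def score(query, docs):
    """Score each document as the sum of weighted TF times a smoothed IDF."""
    tfs = [weighted_tf(d) for d in docs]
    n = len(docs)
    scores = []
    for tf in tfs:
        total = 0.0
        for term in re.findall(r"\w+", query.lower()):
            df = sum(1 for other in tfs if term in other)
            if df == 0:
                continue
            idf = math.log(1.0 + n / df)  # assumed smoothing, always positive
            total += tf[term] * idf
        scores.append(total)
    return scores


docs = [
    "<html><head><title>ranking</title></head><body>text</body></html>",
    "<html><body>ranking text</body></html>",
]
scores = score("ranking", docs)
```

In this sketch a document that carries the query term in its title scores higher than one that carries it only in body text, which is the kind of structural contribution the abstract refers to. In practice the weights would have to be tuned or learned rather than fixed by hand.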