Designing and Implementing the Filter of a Vertical Search Engine

Cited by: 2
Abstract: Search engines and information extraction are the conventional techniques for retrieving information quickly and efficiently from massive Web resources, and a filter plays an important role in preprocessing the Web pages they work on. This paper proposes a filtering method based on the DOM tree structure, discusses the approach, design, and implementation of page preprocessing in a vertical search engine, and gives a concrete implementation algorithm. Finally, drawing on an application in a vertical search engine for the petroleum domain, the paper summarizes how the filter adapts to the structure and design of current Web pages and verifies the correctness and applicability of the method. The authors report that the filter greatly improves the efficiency and accuracy of the vertical search engine.
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, 2009, No. 12, pp. 148-151 (4 pages)
Keywords: Web page preprocessing; filter; DOM; search engine
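
The record above does not reproduce the paper's algorithm, so the following is only a generic sketch of what a DOM-tree-based page filter can look like, not the authors' method. It builds a small element tree with Python's standard `html.parser`, then prunes subtrees that are likely noise. The noise-tag set, the container-tag set, the link-density threshold of 0.5, and all function and class names are illustrative assumptions.

```python
from html.parser import HTMLParser

# Assumed heuristics, not taken from the paper.
NOISE_TAGS = {"script", "style", "nav", "footer", "iframe", "form"}
CONTAINER_TAGS = {"div", "ul", "ol", "table", "td"}
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """A minimal DOM-like node: children are Node objects or text strings."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simplified element tree from an HTML string."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # void elements never have children
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray closes.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data.strip())


def text_of(node):
    """Concatenate all text below a node, in document order."""
    parts = [c if isinstance(c, str) else text_of(c) for c in node.children]
    return " ".join(p for p in parts if p)


def link_text_len(node):
    """Number of text characters that sit inside <a> elements."""
    if node.tag == "a":
        return len(text_of(node))
    return sum(link_text_len(c) for c in node.children if isinstance(c, Node))


def prune(node, max_link_ratio=0.5):
    """Drop noise-tag subtrees and containers that are mostly link text."""
    kept = []
    for child in node.children:
        if isinstance(child, str):
            kept.append(child)
            continue
        if child.tag in NOISE_TAGS:
            continue
        prune(child, max_link_ratio)
        total = len(text_of(child))
        if (child.tag in CONTAINER_TAGS and total
                and link_text_len(child) / total > max_link_ratio):
            continue  # likely a navigation bar or link list
        kept.append(child)
    node.children = kept


def filter_page(html):
    """Return the text that survives the DOM-based filter."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    prune(builder.root)
    return text_of(builder.root)
```

Calling `filter_page(html)` on a page string returns the remaining text, which an indexer could then consume. A domain-specific filter, such as the petroleum application the abstract mentions, would presumably tune the tag sets and threshold to the sites it crawls.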