Abstract
Search engines and information extraction are the conventional techniques for retrieving information quickly and efficiently from massive Web resources, and filters play an important role in preprocessing Web pages. This paper proposes a filter method based on the DOM tree structure, discusses the approach, design, and implementation of page preprocessing in a vertical search engine, and gives a concrete implementation algorithm. Finally, drawing on a concrete application in a vertical search engine for the petroleum domain, it summarizes how the filter adapts to the structure and design characteristics of current Web pages and verifies the correctness and applicability of the method. According to the authors, the method substantially improves the efficiency and accuracy of the vertical search engine.
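The abstract does not spell out the filtering algorithm. As a rough illustration only, the sketch below shows one common way a DOM-tree filter can be organized: build a tree from the HTML, drop whole subtrees whose tags usually carry no content (scripts, styles, navigation, footers, forms), and skip block elements whose visible text is mostly anchor text. The tag sets, the link-density heuristic, the 0.5 threshold, and every name in the code (`Node`, `TreeBuilder`, `filter_page`, and so on) are hypothetical assumptions, not the authors' method. It uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# Hypothetical rule sets: the abstract does not specify which tags the paper filters.
NOISE_TAGS = {"script", "style", "noscript", "iframe", "nav", "header", "footer", "form"}
BLOCK_TAGS = {"div", "td", "table", "ul", "ol", "section", "aside"}
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "wbr"}
MAX_LINK_DENSITY = 0.5  # hypothetical threshold for "mostly links" blocks


class Node:
    """A minimal DOM element: a tag name plus child elements and text strings."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simple DOM tree from (possibly sloppy) HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # void elements never get children
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def text_stats(node, in_link=False):
    """Return (visible_chars, anchor_chars) for a subtree, skipping noise tags."""
    total = link = 0
    for child in node.children:
        if isinstance(child, str):
            n = len(child.strip())
            total += n
            link += n if in_link else 0
        elif child.tag not in NOISE_TAGS:
            t, l = text_stats(child, in_link or child.tag == "a")
            total += t
            link += l
    return total, link


def is_link_heavy(node):
    total, link = text_stats(node)
    return total > 0 and link / total > MAX_LINK_DENSITY


def extract_text(node, out):
    """Depth-first walk that prunes noise subtrees and link-heavy blocks."""
    for child in node.children:
        if isinstance(child, str):
            if child.strip():
                out.append(child.strip())
        elif child.tag in NOISE_TAGS:
            continue  # drop the whole subtree in one decision
        elif child.tag in BLOCK_TAGS and is_link_heavy(child):
            continue  # likely a menu or link list
        else:
            extract_text(child, out)
    return out


def filter_page(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return " ".join(extract_text(builder.root, []))


page = """<html><head><title>Oil news</title><style>p{color:red}</style></head>
<body><div id="menu"><ul><li><a href="/">Home</a></li><li><a href="/a">Drilling</a></li></ul></div>
<div id="main"><p>Crude output rose 3% in the quarter.</p><script>track();</script></div>
<footer>Copyright</footer></body></html>"""

print(filter_page(page))  # prints: Oil news Crude output rose 3% in the quarter.
```

Pruning at a node discards everything beneath it, so a navigation menu built from nested lists and anchors disappears with a single test. A real filter for a vertical search engine would also need charset handling and rules tuned to the target sites, which this sketch omits.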
Source
Computer Applications and Software (《计算机应用与软件》), indexed in CSCD
2009, No. 12, pp. 148-151 (4 pages)