Abstract
This paper first describes the goal of relevance algorithms for focused crawlers and what relevance calculation entails. Then, taking an evolutionary view of information processing and following the handling of information feature terms as its thread, it systematically analyzes current relevance calculation methods for focused crawlers at three levels (the character layer, the language layer, and the semantic layer) and compares the advantages and disadvantages of the algorithms across these levels. Finally, it summarizes existing research results and indicates directions for further research.
Source
Computer and Modernization (《计算机与现代化》)
2013, No. 4, pp. 27-30, 39 (5 pages)
Funding
Supported by the Special Fund for Basic Scientific Research Operating Expenses of Public Welfare Research Institutes (2012-J-06)
Keywords
relevance
algorithm
focused crawler
concept