2 articles found
Clustering-based topical Web crawling using CFu-tree guided by link-context (Cited by: 2)
1
Author: Lu LIU. Frontiers of Computer Science (SCIE, EI, CSCD), 2014, Issue 4, pp. 581-595 (15 pages)
Topical Web crawling is an established technique for domain-specific information retrieval. However, almost all conventional topical Web crawlers are built around classifiers, which need a large amount of labeled training data that is very difficult to label manually. This paper presents a novel approach called clustering-based topical Web crawling, which retrieves information on a specific domain based on link-context and does not require any labeled training data. To collect domain-specific content units, a novel bottom-up hierarchical clustering method is used, in which a new data structure, a linked list combined with a CFu-tree, stores the cluster label, feature vector, and content unit. Four metrics are presented for clustering. First, comparison variation (CV) is defined to judge whether the closest pair of clusters can be merged. Second, cluster impurity (CIP) evaluates the cluster error. Third and fourth, the precision and recall of clustering evaluate the accuracy and completeness of the whole clustering process. A link-context extraction technique is used to expand the feature vector of the anchor text, which greatly improves clustering accuracy. Experimental results show that the proposed method outperforms conventional focused Web crawlers in both harvest rate and target recall.
Keywords: topical web crawling; comparison variation (CV); cluster impurity (CIP); CFu-tree; link-context; clustering
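The abstract above reports results in terms of harvest rate and target recall. As a rough illustration of how these two crawler metrics are commonly computed (not the paper's own evaluation code), the sketch below assumes a hypothetical relevance predicate `is_relevant` and a hypothetical known set of target pages `targets`; both names and the sample data are illustrative only.

```python
def harvest_rate(crawled, is_relevant):
    """Fraction of crawled pages that are judged relevant to the topic."""
    if not crawled:
        return 0.0
    relevant_count = sum(1 for url in crawled if is_relevant(url))
    return relevant_count / len(crawled)


def target_recall(crawled, targets):
    """Fraction of a known set of target pages that the crawl reached."""
    target_set = set(targets)
    if not target_set:
        return 0.0
    return len(set(crawled) & target_set) / len(target_set)


# Hypothetical data: four crawled pages, two of them on topic,
# and a target set of which only one page was reached.
crawled_pages = ["a", "b", "c", "d"]
relevant_pages = {"a", "b", "x"}
target_pages = {"a", "x"}

print(harvest_rate(crawled_pages, lambda url: url in relevant_pages))
print(target_recall(crawled_pages, target_pages))
```

Harvest rate rewards a crawler for staying on topic, while target recall rewards coverage of a reference set, so the two are usually reported together.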
Assessing the effectiveness of crawlers and large language models in detecting adversarial hidden link threats in meta computing
2
Authors: Junjie Xiong, Mingkui Wei, +1 more author, Zhuo Lu, Yao Liu. High-Confidence Computing, 2025, Issue 3, pp. 31-45 (15 pages)
In the emerging field of Meta Computing, where data collection and integration are essential components, the threat of adversary hidden link attacks poses a significant challenge to web crawlers. In this paper, we investigate how these attacks, which famously elude conventional detection techniques using large language models (LLMs), influence data collection by web crawlers. Empirically, we find vulnerabilities in current crawler mechanisms and in LLM-based detection, especially in code inspection, and propose enhancements to help mitigate these weaknesses. Our assessment of real-world web pages reveals the prevalence and impact of adversary hidden link attacks, emphasizing the necessity for robust countermeasures. Furthermore, we introduce a mitigation framework that integrates element visual inspection techniques. Our evaluation demonstrates the framework's efficacy in detecting and addressing these advanced cyber threats within the evolving landscape of Meta Computing.
Keywords: Meta computing; Data integration; Adversary hidden link; Web crawling; Content deception detection; Large language model
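To make the idea of a hidden link concrete, here is a minimal sketch of a code-inspection heuristic that flags anchors hidden through an inline style or the `hidden` attribute. It is an assumption-laden illustration, not the mitigation framework described in the abstract: the class name `HiddenLinkFinder`, the function `find_hidden_links`, and the choice of signals are all hypothetical, and the check ignores external stylesheets, hidden ancestor elements, and script-driven hiding.

```python
import re
from html.parser import HTMLParser

# Inline-style patterns treated as "hidden" in this sketch (an assumption).
HIDDEN_STYLE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden", re.IGNORECASE
)


class HiddenLinkFinder(HTMLParser):
    """Collects hrefs of <a> tags that are hidden by their own attributes."""

    def __init__(self):
        super().__init__()
        self.hidden_links = []
        self.visible_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        style = attr.get("style") or ""
        # A bare `hidden` attribute arrives as ("hidden", None).
        if "hidden" in attr or HIDDEN_STYLE.search(style):
            self.hidden_links.append(href)
        else:
            self.visible_links.append(href)


def find_hidden_links(html):
    """Return hrefs of anchors flagged as hidden by the heuristic above."""
    finder = HiddenLinkFinder()
    finder.feed(html)
    return finder.hidden_links


sample = (
    '<a href="/ok">ok</a>'
    '<a href="/trap" style="display: none">x</a>'
    '<a hidden href="/h">y</a>'
)
print(find_hidden_links(sample))
```

A purely static check like this is easy to evade, which is consistent with the abstract's point that code inspection alone has weaknesses; inspecting how elements actually render would require a browser engine and is outside this sketch.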