Abstract
In recent years, the Web has been rapidly "deepened" by online databases. On this deep Web, numerous sources are structured and provide schema-rich data; their schemas define the object domain and its query capabilities. The structured deep Web thus presents challenges for large-scale information integration. Clustering is an important approach in data mining. This paper studies how to organize structured Web sources by clustering their query schemas (query interfaces) with the hierarchical agglomerative clustering algorithm, and improves the algorithm slightly with pre-clustering and post-classification techniques. Our experiments show that, by clustering the query schemas, hierarchical agglomerative clustering can accurately organize sources into object domains.
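As a rough illustration of the idea in the abstract, the sketch below groups a few toy query schemas with hierarchical agglomerative clustering. The abstract does not state the similarity measure, linkage criterion, or stopping rule used in the paper, so Jaccard similarity over attribute names, average linkage, and the `threshold` value are all assumptions made here for illustration. The schema names and attributes are hypothetical, and the pre-clustering and post-classification steps are not shown.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of attribute names (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_link(c1, c2, schemas):
    """Average pairwise similarity between the members of two clusters."""
    total = sum(jaccard(schemas[i], schemas[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def hac(schemas, threshold=0.2):
    """Repeatedly merge the most similar pair of clusters.

    Stops when the best similarity falls below `threshold`
    (a hypothetical stopping rule, not taken from the paper).
    """
    clusters = [[name] for name in sorted(schemas)]
    while len(clusters) > 1:
        best_sim, (i, j) = max(
            (average_link(clusters[i], clusters[j], schemas), (i, j))
            for i, j in combinations(range(len(clusters)), 2)
        )
        if best_sim < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical query schemas: source name -> attributes on its query interface.
toy_schemas = {
    "books_a": {"title", "author", "isbn"},
    "books_b": {"title", "author", "publisher"},
    "flights_a": {"from", "to", "departure date"},
    "flights_b": {"from", "to", "return date", "passengers"},
}

result = hac(toy_schemas)
print(result)
```

With these toy inputs the two book sources and the two flight sources share attributes within each pair and none across pairs, so the sources fall into two domain-like groups. Real query interfaces would need attribute-name normalization first, which this sketch omits.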
Source
《现代计算机》 (Modern Computer)
2006, No. 9, pp. 19-21, 62 (4 pages in total)
Keywords
Data Integration
Deep Web
Hierarchical Agglomerative Clustering