
Research on KNN-Based Classification of Deep Web Data Sources (基于KNN的Deep Web数据源分类研究) Cited by: 1

Automatic classification of Deep Web sources based on KNN
Abstract To meet the query needs of the Deep Web, this paper proposes an improved method for classifying Deep Web data sources, based on the KNN classification algorithm. Because choosing K too large or too small affects the classification result, the paper proposes an improved KNN algorithm that optimizes the K value. The training set is clustered with the k-means algorithm, which returns the cluster centers. For each category, the k nearest data points are obtained and their distances to the cluster center are computed; the reciprocal of this distance is taken as that data point's contribution to the classification result. Weights are computed from the cluster centers, and the sum S of the contributions of the k nearest neighbors in each category is then calculated. The test sample is assigned to the category with the largest S, which achieves a comparatively good classification result.
Source Information & Communications (《信息通信》), 2015, No. 1, pp. 19-21 (3 pages)
Keywords Deep Web; KNN; k-means; clustering; classification
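
The abstract describes the scoring rule only in outline, so the following is a minimal sketch of one possible reading, not the authors' implementation. It assumes one cluster per category (in which case the k-means center is simply the category mean), takes the k nearest training points to the test sample, and lets each neighbor contribute the reciprocal of its distance to its own category center. The names `class_centers`, `classify`, and the small constant `eps` are hypothetical and do not come from the paper.

```python
import math
from collections import defaultdict


def class_centers(train):
    """Return one center per label: the mean of that label's training vectors.

    Assumption: one cluster per category, so the k-means center is the mean.
    """
    groups = defaultdict(list)
    for vec, label in train:
        groups[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }


def classify(train, query, k=3, eps=1e-9):
    """Return (best_label, scores), where scores[label] is the sum S."""
    centers = class_centers(train)
    # The k training points closest to the query (plain Euclidean distance).
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    scores = defaultdict(float)
    for vec, label in neighbours:
        # Contribution = 1 / distance from the neighbour to its category center.
        # eps (an assumption) avoids division by zero for a point on the center.
        scores[label] += 1.0 / (math.dist(vec, centers[label]) + eps)
    best = max(scores, key=scores.get)
    return best, dict(scores)


# Toy data, invented for illustration only.
train = [
    ((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"), ((1, 1), "A"),
    ((5, 5), "B"), ((5, 6), "B"), ((6, 5), "B"), ((6, 6), "B"),
]
label, scores = classify(train, (0.4, 0.6), k=3)
print(label, scores)
```

Under this reading, neighbors that sit close to their category center count more than outliers, so a few stray points pulled in by a large k have less influence on the vote. Other readings of the abstract are possible, for example selecting k neighbors separately within each category.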

References (8)

  • 1. Raghavan S, Garcia-Molina H. Crawling the Hidden Web [C]. Proceedings of the 27th International Conference on Very Large Data Bases, Roma: [s.n.], 2001: 129-138.
  • 2. He B, Patel M, Zhang Z, et al. Accessing the Deep Web: A Survey [J]. Communications of the ACM (CACM), 2007, 50(5): 94-101.
  • 3. Panagiotis G Ipeirotis, Luis Gravano, Mehran Sahami. Probe, count, and classify: categorizing hidden web databases [C] // Proceedings of the 2001 ACM SIGMOD International Conference on Management of Data, 2001: 67-78.
  • 4. Yih-Ling Hedley, Muhammad Younas, Anne E James. The categorization of hidden web databases through concept specificity and coverage [C] // Proceedings of the 2005 International Workshop on Web and Mobile Information Systems, 2005: 371-376.
  • 5. He B, Tao T, Chang K C C. Organizing structured web sources by query schemas: a clustering approach [C] // Proceedings of the 13th Conference on Information and Knowledge Management, 2004: 22-31.
  • 6. Jin Lingzhi, Wang Xiaoling, Zhu Shouzhong. Automatic classification of Deep Web data sources [J]. Microcomputer Information, 2009, 25(12): 227-228. Cited by: 3
  • 7. Wang W, Yang J, Muntz R. STING: A Statistical Information Grid Approach to Spatial Data Mining [C] // Proc. of 1997 Intl. Conf. on Very Large Data Bases, Athens, Greece, 1997-8: 186-195.
  • 8. Jim Z. C. Lai, Yi-Ching Liaw. Improvement of the k-means clustering filtering algorithm [J]. Pattern Recognition, 2008, 41: 3677-3681.

Secondary references (5)

  • 1. Bergman M K. The Deep Web: Surfacing Hidden Value [J/OL]. The Journal of Electronic Publishing, 2001, 7(1). http://www.press.umich.edu/jep/07-01/bergman.HTML.
  • 2. Chang K C, He B, Li C, Patel M, Zhang Z. Structured databases on the Web: Observations and Implications. SIGMOD Record, 2004, 33(3): 61-70.
  • 3. Peng Q, Meng W, He H, Yu C T. WISE-cluster: Clustering e-commerce search engines automatically // Proceedings of the 6th ACM International Workshop on Web Information and Data Management. Washington, 2004: 104-111.
  • 4. Ipeirotis P G, Gravano L, Sahami M. Probe, count, and classify: Categorizing hidden Web databases // Proceedings of the 19th ACM SIGMOD International Conference on Management of Data. Santa Barbara, 2001: 67-78.
  • 5. Li Tao, Chen Peng, Li Zhe. Research and implementation of a Deep Web resource detection system [J]. Microcomputer Information, 2007, 23(33): 185-187. Cited by: 7

Co-citing documents (2)

Co-cited documents (1)

Citing documents (1)

Secondary citing documents (2)
