Research on KNN-Based Classification of Deep Web Data Sources (cited by: 1)

Automatic classification of Deep Web sources based on KNN


Abstract: To meet the need for querying the Deep Web, this paper proposes an improved method for classifying Deep Web data sources, using the KNN classification algorithm to perform the classification. Because choosing a value of K that is either too large or too small affects the classification result, the paper proposes an improved KNN algorithm that optimizes the choice of K. The k-means clustering algorithm is used to cluster the data: for each category, the k nearest data points are obtained and the distance from each of these k points to the cluster center is computed, and the reciprocal of that distance is taken as the point's contribution to the classification result. After the training set is clustered, the cluster centers are returned and the weights are computed from them. The sum S of the contributions of the k nearest neighbors in each category is then computed, and the category with the largest S is assigned to the test data, which achieves a comparatively good classification result.
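The weighting scheme described in the abstract can be sketched roughly as follows. This is only an illustration under several assumptions that the abstract does not confirm: each category is given a single cluster center (its mean, which is what k-means converges to with one cluster), the k neighbors are taken across the whole training set as in standard KNN, Euclidean distance is used, and a small `eps` avoids division by zero. The names `class_centers`, `classify`, `X_train` and `y_train` are hypothetical, and the toy 2-D points stand in for whatever feature vectors the paper extracts from Deep Web sources.

```python
import numpy as np

def class_centers(X, y):
    """One center per category: the category mean (k-means with a single cluster)."""
    return {label: X[y == label].mean(axis=0) for label in np.unique(y).tolist()}

def classify(x, X, y, k=3, eps=1e-9):
    """Assign x to the category whose neighbors have the largest contribution sum S."""
    centers = class_centers(X, y)
    # Indices of the k training points closest to the test point x.
    nearest = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
    scores = {label: 0.0 for label in centers}
    for i in nearest:
        label = y[i].item()
        # Contribution of a neighbor: reciprocal of its distance to its own category center.
        dist_to_center = np.linalg.norm(X[i] - centers[label])
        scores[label] += 1.0 / (dist_to_center + eps)
    best = max(scores, key=scores.get)
    return best, scores

# Toy training set: two well-separated categories (illustrative data only).
X_train = np.array([
    [0.0, 1.0], [1.0, 0.0], [0.0, -1.0], [-1.0, 0.0], [0.5, 0.5],
    [10.0, 11.0], [11.0, 10.0], [10.0, 9.0], [9.0, 10.0], [10.5, 10.5],
])
y_train = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

label_a, scores_a = classify(np.array([0.2, 0.2]), X_train, y_train, k=3)
label_b, scores_b = classify(np.array([9.9, 9.9]), X_train, y_train, k=3)
print(label_a, label_b)
```

Compared with plain majority voting, a neighbor that lies close to its category center counts for more than one on the fringe of the category, so the result is less sensitive to the exact value of k. A neighbor that coincides with its center would dominate the sum, which is one reason a real implementation might cap or smooth the weights.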

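Authors: 牟晓伟, 刘寒梅

Affiliation: School of Computer Science and Engineering, Changchun University of Technology

Source: Information & Communications (《信息通信》), 2015, No. 1, pp. 19-21 (3 pages)

Keywords: Deep Web; KNN; k-means; clustering; classification

Classification code: TP311 [Automation and Computer Technology: Computer Software and Theory]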

