Abstract
To address the query needs of the Deep Web, this paper proposes an improved method for classifying Deep Web data sources, based on the KNN classification algorithm. Because a K value that is too large or too small degrades the classification result, the paper proposes an improved KNN algorithm that optimizes the handling of K. The k-means clustering algorithm is used to cluster the training set and return the cluster centers. For each category, the k nearest data points are obtained and the distance from each of these k points to the cluster center is computed; the reciprocal of this distance is taken as that point's contribution to the classification result. Weights are computed from the cluster centers, the sum S of the contributions of the k nearest neighbors is then computed for each category, and the test data is assigned to the category with the largest S, which can achieve fairly good classification results.
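The abstract does not give formulas, so the following is only a minimal sketch of one possible reading of the center-weighted vote, not the authors' implementation. It assumes each category is summarized by a single center taken as the category mean (what k-means with one cluster per category would return), that the k nearest neighbors are chosen over the whole training set by Euclidean distance, and that each neighbor contributes the reciprocal of its distance to its own category's center. The names `class_centers`, `classify`, and `eps` are hypothetical.

```python
import numpy as np

def class_centers(X, y):
    """Assumed stand-in for the k-means step: one center per category (its mean)."""
    return {label: X[y == label].mean(axis=0) for label in np.unique(y)}

def classify(x, X, y, k=3, eps=1e-9):
    """Assign x to the category with the largest contribution sum S.

    Each of the k training points nearest to x adds
    1 / (distance from that point to its category center) to its category's S.
    eps is an assumed guard against division by zero.
    """
    centers = class_centers(X, y)
    # Indices of the k training points closest to the test point x.
    nearest = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
    scores = {}
    for i in nearest:
        label = y[i]
        dist_to_center = np.linalg.norm(X[i] - centers[label])
        scores[label] = scores.get(label, 0.0) + 1.0 / (dist_to_center + eps)
    # The category with the largest S wins.
    return max(scores, key=scores.get)

# Toy data: two well-separated categories of four points each.
X_train = np.array([[0, 0], [0, 2], [2, 0], [2, 2],
                    [10, 10], [10, 12], [12, 10], [12, 12]], dtype=float)
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])

predicted = classify(np.array([1.0, 0.5]), X_train, y_train, k=3)
```

Under this reading, neighbors that lie close to their category center count for more than outlying ones, which is intended to make the vote less sensitive to the exact choice of k than a plain majority vote.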
Source
Information & Communications (《信息通信》)
2015, Issue 1, pp. 19-21 (3 pages)