Journal Articles
2 articles found
1. Unsupervised Quick Reduct Algorithm Using Rough Set Theory (cited by: 2)
Authors: C. Velayutham, K. Thangavel. Journal of Electronic Science and Technology (indexed: CAS), 2011, No. 3, pp. 193-201 (9 pages).
Feature selection (FS) is the process of selecting the most informative features, and it is an important step in knowledge discovery. Not all features are important: some may be redundant, and others may be irrelevant or noisy. Conventional supervised FS methods evaluate feature subsets with an evaluation function or metric and keep only those features that are related to the decision classes of the data under consideration. In many data mining applications, however, decision class labels are unknown or incomplete, which makes unsupervised feature selection, where no class labels are provided, important. In this paper, we propose a new unsupervised quick reduct (QR) algorithm based on rough set theory. The quality of the reduced data is measured by classification performance, evaluated with the WEKA classifier tool. The method is compared with existing supervised methods, and the results demonstrate the efficiency of the proposed algorithm.
Keywords: data mining; rough set; supervised and unsupervised feature selection; unsupervised quick reduct algorithm.
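The abstract does not spell out the selection procedure. As a rough illustration only, the sketch below shows a greedy quick-reduct loop in the spirit of the classic QuickReduct, made unsupervised by an assumed criterion: the mean rough-set dependency degree gamma_R(a) = |POS_R(a)| / |U| of every attribute a on the candidate subset R. The criterion, the function names (`partition`, `dependency`, `mean_dependency`, `unsupervised_quick_reduct`), and the integer-coded table are assumptions of this sketch, not details confirmed by the paper.

```python
from fractions import Fraction
from typing import List, Sequence, Set, Tuple

Row = Tuple[int, ...]  # one object of the information table, attributes coded as ints


def partition(rows: Sequence[Row], attrs: Set[int]) -> List[Set[int]]:
    """Equivalence classes of the indiscernibility relation IND(attrs)."""
    classes: dict = {}
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in sorted(attrs))
        classes.setdefault(key, set()).add(i)
    return list(classes.values())


def dependency(rows: Sequence[Row], cond: Set[int], dec: Set[int]) -> Fraction:
    """Dependency degree gamma_cond(dec) = |POS_cond(dec)| / |U|, kept exact."""
    dec_classes = partition(rows, dec)
    pos = sum(len(block) for block in partition(rows, cond)
              if any(block <= d for d in dec_classes))
    return Fraction(pos, len(rows))


def mean_dependency(rows: Sequence[Row], subset: Set[int]) -> Fraction:
    """Assumed unsupervised score: average dependency of every attribute on subset."""
    n_attrs = len(rows[0])
    return sum((dependency(rows, subset, {a}) for a in range(n_attrs)),
               Fraction(0)) / n_attrs


def unsupervised_quick_reduct(rows: Sequence[Row]) -> List[int]:
    """Greedy forward selection: add the attribute that most raises the score."""
    all_attrs = set(range(len(rows[0])))
    target = mean_dependency(rows, all_attrs)  # equals 1 for any non-empty table
    reduct: Set[int] = set()
    while mean_dependency(rows, reduct) < target:
        best, best_score = reduct, mean_dependency(rows, reduct)
        for x in sorted(all_attrs - reduct):
            score = mean_dependency(rows, reduct | {x})
            if score > best_score:  # strict '>' keeps the lowest-index attribute on ties
                best, best_score = reduct | {x}, score
        if best == reduct:  # safety stop: no candidate improves the score
            break
        reduct = best
    return sorted(reduct)


if __name__ == "__main__":
    # Toy table: column 2 is the XOR of columns 0 and 1, so any two columns suffice.
    table = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
    print(unsupervised_quick_reduct(table))  # [0, 1]
```

No class labels are used anywhere: each attribute takes a turn as the "decision", which is what lets the loop run on unlabelled data under this assumed criterion. Exact `Fraction` arithmetic avoids floating-point ties in the comparisons.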
2. Load Balance Strategy of Data Routing Algorithm Using Semantics for Deduplication Clusters
Authors: Ze-Jun Jiang, Zhi-Ke Zhang, Li-Fang Wang, Chin-Chen Chang, Li Liu. Journal of Electronic Science and Technology (indexed: CAS, CSCD), 2017, No. 3, pp. 277-282 (6 pages).
The backup requirements of data centres are tremendous, because the volume of data created by humans is massive and growing exponentially. Single-node deduplication cannot keep up with these growing requirements. A feasible alternative is a deduplication cluster, which scales by adding storage nodes. The data routing strategy is the key component of a deduplication cluster. DRSS (data routing strategy using semantics) substantially improves storage utilization over the MCS (minimum chunk signature) data routing strategy. However, for large deduplication clusters, the load balance of DRSS is worse than that of MCS. To improve the load balance of DRSS, we propose a load balance strategy for DRSS, namely DRSSLB. When a node is overloaded, DRSSLB iteratively migrates the node's current smallest container to the smallest node in the deduplication cluster until the overloaded node is no longer overloaded. A container is the minimum unit of data migration; similar files sharing the same features or file names are stored in the same container. This ensures that groups of similar data remain on the same node after rebalancing. We evaluate DRSSLB on a real-world dataset. Experimental results show that, for various numbers of nodes in the deduplication cluster, the data skew of DRSSLB stays below a predefined value, while its storage utilization hardly increases compared with DRSS, at a low cost (the data migration rate is only 6.5% when the cluster has 64 nodes).
Keywords: data routing strategy; deduplication cluster; semantics; load balance.
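The migration rule described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the cluster is modelled as a dictionary of nodes, each mapping globally unique container ids to sizes in bytes; "smallest" means fewest stored bytes; and a node counts as overloaded when it exceeds (1 + tolerance) times the mean node size. The names `relieve` and `overload_limit`, the `tolerance` parameter, and the single-container stop condition are hypothetical, not taken from the paper.

```python
from typing import Dict

# Hypothetical cluster model: node id -> {container id: container size in bytes}.
Cluster = Dict[str, Dict[str, int]]


def node_size(containers: Dict[str, int]) -> int:
    """Total bytes stored on one node."""
    return sum(containers.values())


def overload_limit(nodes: Cluster, tolerance: float = 0.1) -> float:
    """Assumed overload threshold: (1 + tolerance) times the mean node size."""
    total = sum(node_size(c) for c in nodes.values())
    return (1 + tolerance) * total / len(nodes)


def relieve(nodes: Cluster, overloaded: str, limit: float) -> int:
    """Move the smallest container of `overloaded` to the currently smallest
    other node until `overloaded` is at or below `limit`; return bytes moved."""
    src = nodes[overloaded]
    moved = 0
    # Containers are the minimum migration unit, so similar files inside one
    # container stay together. Stop when a single container remains
    # (assumption): moving it would only relocate the overload.
    while node_size(src) > limit and len(src) > 1:
        cid = min(src, key=lambda c: src[c])  # smallest container on the node
        dst = min((n for n in nodes if n != overloaded),
                  key=lambda n: node_size(nodes[n]))  # smallest node in the cluster
        nodes[dst][cid] = src.pop(cid)
        moved += nodes[dst][cid]
    return moved


if __name__ == "__main__":
    cluster = {"n1": {"a": 10, "b": 10, "c": 10, "d": 20},
               "n2": {"e": 20}, "n3": {"f": 20}}
    total = sum(node_size(c) for c in cluster.values())
    moved = relieve(cluster, "n1", overload_limit(cluster))
    print(moved, round(moved / total, 3))  # 20 0.222
```

The sketch computes the migration rate as bytes moved divided by total bytes stored, which is an assumed definition. Choosing the smallest container first keeps each move cheap, at the price of possibly needing several moves to relieve one node.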