Funding: the National High Technology Research and Development Program (863) of China (No. 2013AA013201); the National Natural Science Foundation of China (Nos. 61025009, 61232003, 61170288 and 61332003)
Abstract: As data grow rapidly in data centers, inline cluster deduplication has been widely used to improve storage efficiency and data reliability. However, cluster deduplication systems face several challenges: the deduplication rate decreases as the number of deduplication server nodes increases, data routing incurs high communication overhead, and load must be balanced to improve system throughput. In this paper, we propose a well-performing cluster deduplication system called AR-Dedupe. Experimental results on two real datasets demonstrate that, through a new data routing algorithm, AR-Dedupe achieves a high deduplication rate with low communication overhead while keeping the system load well balanced. In addition, we use an application-aware mechanism to speed up the indexing of handprints in the routing server, which yields a 30% performance improvement.
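Abstract: Deduplication clusters are an effective way to meet the ever-growing demand for backing up massive amounts of data. Their key problem is the data routing strategy, that is, how to distribute data sensibly across the nodes of the cluster. Current data routing strategies compute the routing target node from the minimum chunk signature of a file or data segment, and are called MCS (minimum chunk signature) data routing strategies. When the deduplication cluster is small, the storage usage of this approach is close to that of single-node deduplication. When the cluster is large, however, its storage usage is far worse than that of single-node deduplication. To reduce the storage usage of deduplication clusters, a path-based data routing strategy for deduplication clusters is proposed, called DRSD (data routing strategy based on directories). Experimental results show that, across the various node counts, the deduplication rate of DRSD is markedly higher than that of MCS and close to that of single-node deduplication. With 64 nodes, the deduplication rate of DRSD is 35% higher than that of MCS.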
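The MCS baseline described in this abstract can be illustrated with a minimal sketch. The abstract only states that the target node is computed from the minimum chunk signature of a file or data segment; everything else here is an assumption made for illustration: fixed-size 4 KiB chunking, SHA-1 as the signature, a modulo over the node count as the mapping, and the hypothetical names `chunk_signatures` and `mcs_route`. It is not the implementation evaluated in the paper, and it does not cover DRSD.

```python
import hashlib


def chunk_signatures(data: bytes, chunk_size: int = 4096) -> list[bytes]:
    """Split data into fixed-size chunks and return the SHA-1 digest of each.

    Fixed-size chunking and SHA-1 are assumptions of this sketch.
    """
    return [
        hashlib.sha1(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]


def mcs_route(data: bytes, num_nodes: int, chunk_size: int = 4096) -> int:
    """Choose a target node from the minimum chunk signature of a segment.

    The modulo mapping from signature to node is an assumption of this sketch.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    signatures = chunk_signatures(data, chunk_size)
    if not signatures:
        # Empty input has no chunks; send it to node 0 by convention.
        return 0
    min_signature = min(signatures)
    return int.from_bytes(min_signature, "big") % num_nodes


if __name__ == "__main__":
    # Two segments holding the same chunks in a different order share the
    # same minimum signature, so they are routed to the same node.
    segment_a = b"A" * 4096 + b"B" * 4096
    segment_b = b"B" * 4096 + b"A" * 4096
    print(mcs_route(segment_a, 64) == mcs_route(segment_b, 64))
```

Because the route depends on a single chunk signature per file or segment, segments that share most of their chunks tend to land on the same node. A segment whose minimum chunk differs is sent elsewhere, and its duplicate chunks on other nodes go undetected, which is one plausible reading of why the abstract reports storage usage degrading as the cluster grows.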