Abstract
The design of a fault-tolerant parallel algorithm (FTPA) is a key factor in its fault-tolerance performance. An FTPA is designed by partitioning a program into program sections and turning each program section into a fault-tolerant program section by inserting a data saving section, a failure detection section, and a recovery section. First, based on this design methodology, a classification of FTPAs is given and the characteristics of each class are analyzed. Second, two typical parallel algorithms chosen according to this classification, parallel matrix triangular decomposition and the fast Fourier transform, are used to design the corresponding FTPAs for these two classes of parallel applications. Finally, the performance of the designed FTPAs is evaluated on a 256-node cluster. The experimental results show that the FTPAs achieve very low fault-tolerance overhead.
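The abstract describes each fault-tolerant program section as an ordinary program section wrapped with data saving, failure detection, and recovery. A minimal single-process Python sketch of that pattern follows. It assumes in-memory snapshots stand in for data saving, a caller-supplied predicate `detect` stands in for failure detection, and roll-back-and-retry stands in for recovery. The names `run_fault_tolerant`, `detect`, and `max_retries` are hypothetical; this is an illustration of the general idea, not the paper's implementation for parallel programs on a cluster.

```python
import copy
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


def run_fault_tolerant(sections: List[Callable[[State], State]],
                       detect: Callable[[State], bool],
                       state: State,
                       max_retries: int = 3) -> State:
    """Run each program section as a fault-tolerant section (hypothetical helper)."""
    for index, section in enumerate(sections):
        # Data saving section: snapshot the state before the section runs.
        saved = copy.deepcopy(state)
        for _attempt in range(max_retries + 1):
            try:
                # Execute on a fresh copy so a failed attempt cannot corrupt `saved`.
                candidate = section(copy.deepcopy(saved))
            except Exception:
                # A crash (e.g. a simulated node failure) counts as a failure.
                candidate = None
            # Failure detection section: accept the result only if it passes the check.
            if candidate is not None and detect(candidate):
                state = candidate
                break
            # Recovery section: discard the partial result; the next attempt
            # restarts from the saved snapshot.
        else:
            raise RuntimeError(
                f"section {index} failed after {max_retries + 1} attempts")
    return state


# Demo: the first section fails once, then succeeds on retry.
calls = {"n": 0}


def flaky_double(s: State) -> State:
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("simulated node failure")
    s["x"] *= 2
    return s


def add_one(s: State) -> State:
    s["x"] += 1
    return s


result = run_fault_tolerant([flaky_double, add_one], lambda s: "x" in s, {"x": 5})
print(result)  # {'x': 11}
```

Re-executing from a snapshot taken at each section boundary keeps the recovery cost bounded by one section, which is one plausible reading of why the partitioning into program sections matters for overhead.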
Source
Journal of Huazhong University of Science and Technology (Natural Science Edition)
EI
CAS
CSCD
Peking University Core Journal
2011, No. 4, pp. 49-52 (4 pages)
Funding
National Natural Science Foundation of China (61003087, 60903059)
National Science and Technology Major Project of China (2009ZX01036-001-003-001)
Keywords
parallel programming
fault tolerance
classification
fault-tolerant parallel algorithm
matrix triangular decomposition
fast Fourier transformation