
Research on the Fault-Tolerant Algorithm of Parallel Digital Terrain Analysis (并行数字地形分析的容错算法研究)

Cited by: 3
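Abstract (translated from the Chinese) In high-performance geocomputation systems, a failed computing task can have serious consequences, so high-performance geocomputation must come with reliability guarantees. Software fault-tolerance models are an effective way to improve the fault tolerance of parallel computing. To address the resource waste of traditional checkpoint/rollback fault-tolerance strategies, this paper takes parallel terrain analysis as its research object and, building on the software fault-tolerance model, proposes a fault-tolerance strategy for neighborhood-type algorithms: N-ABFT (Neighboring-Algorithm Based Fault-Tolerant). For neighborhood-type terrain factors, the strategy adds redundant check rows or check columns to each data block produced by the parallel program's data partitioning. Finally, a fault-tolerant scheduling algorithm built on N-ABFT is proposed, which effectively improves the system's fault tolerance and reduces the overhead of error detection.

Abstract (English, as published) In recent years, growing computational demand for massive spatial data analysis has made parallel computing on high-performance computers an inevitable trend for Digital Terrain Analysis (DTA). At the same time, the reliability of the parallel system has become a primary concern, because the stability of clusters with tens of thousands of processors is constantly threatened by a large number of hardware and software failures. This paper takes parallel DTA technologies as its research object and proposes a Neighboring-Algorithm Based Fault-Tolerant (N-ABFT) strategy to improve the accuracy of failure detection in fault-tolerant software. Using check rows and columns, the N-ABFT algorithm can detect transient and fail-stop failures after all computing nodes have finished their calculations. Finally, two algorithms based on different analysis windows are tested and the preliminary results are discussed.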
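The abstract does not give the details of N-ABFT, so the following is only a minimal sketch of the general check-row idea it mentions, not the authors' algorithm. It assumes a linear, row-wise neighborhood operator (a central east-west elevation difference), for which the operator applied to a row of column sums equals the column sums of the operator's outputs. All function names, the sample DEM values, and the tolerance are hypothetical.

```python
# Illustrative check-row sketch (assumption: NOT the paper's N-ABFT algorithm).
# Works only because the stencil below is linear and acts on each row independently.

def add_check_row(block):
    """Return a copy of the block with one extra row holding the column sums."""
    check = [sum(col) for col in zip(*block)]
    return [row[:] for row in block] + [check]

def ew_gradient(rows, cell=1.0):
    """Central east-west difference per row, interior columns only."""
    return [
        [(r[j + 1] - r[j - 1]) / (2.0 * cell) for j in range(1, len(r) - 1)]
        for r in rows
    ]

def check_row_ok(result, tol=1e-9):
    """True if the last row still equals the column sums of the data rows."""
    *data, check = result
    sums = [sum(col) for col in zip(*data)]
    return all(abs(s - c) <= tol for s, c in zip(sums, check))

# Hypothetical 3x4 elevation block assigned to one worker.
dem = [
    [10.0, 12.0, 15.0, 19.0],
    [11.0, 13.0, 17.0, 22.0],
    [12.0, 15.0, 20.0, 26.0],
]

augmented = add_check_row(dem)      # data rows + redundant check row
result = ew_gradient(augmented)     # operator runs on the augmented block
clean_ok = check_row_ok(result)     # check passes when nothing went wrong

# Simulate a transient fault in one output cell and re-run the check.
corrupted = [row[:] for row in result]
corrupted[1][0] += 0.5
fault_detected = not check_row_ok(corrupted)
```

Because the check is evaluated on the output alone, it needs no checkpoint of intermediate state. Nonlinear terrain factors such as slope do not preserve column sums in this way, and how N-ABFT handles them is not described in the abstract.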
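Source: Geography and Geo-Information Science (《地理与地理信息科学》), indexed in CSCD and the PKU Core Journals list, 2013, No. 2, pp. 1-5 (5 pages)
Funding: National 863 Program of China (2011AA120303); National Natural Science Foundation of China (41171298, 41071244); Jiangsu Province Graduate Research and Innovation Program for Regular Universities (CXZZ12_0393)
Keywords: parallel computing; DEM; fault-tolerant software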

