Improving the Performance of Checkpointing Scheme with Task Duplication

Cited by: 4
Abstract: Checkpointing is a common technique for reducing the execution time of programs in the presence of faults. Combining checkpointing with task duplication provides not only effective fault recovery but also complete fault detection. The overhead of such systems comes from two sources: the cost of comparing and saving state at each checkpoint, and the rollbacks caused by faults. This paper improves the method proposed by Ziv and Bruck by using incremental checkpointing. The improved method reduces the overhead of comparing and saving checkpoints, and it also avoids rollbacks caused by latent faults. Analysis shows that the improved method performs better than the method of Ziv and Bruck.
Source: Acta Electronica Sinica (《电子学报》), 2000, No. 5, pp. 33-35, 28 (4 pages in total). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (No. 69873013)
Keywords: fault tolerance; checkpoint; rollback recovery; task duplication; program
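
The general idea named in the abstract (two copies of a task whose states are compared at each checkpoint, with a rollback on mismatch and only the changed state saved) can be illustrated with a minimal Python sketch. This is not the algorithm of the paper or of Ziv and Bruck, whose details this record does not give. The names `delta`, `run_duplicated`, and `accumulate`, the dictionary state model, and the fault-injection mechanism are all assumptions made for illustration, and latent faults are not modeled.

```python
from typing import Callable, Dict, FrozenSet, Tuple

State = Dict[str, int]  # hypothetical state model: named integer variables


def delta(prev: State, cur: State) -> State:
    """Return only the entries of `cur` that differ from `prev`."""
    return {k: v for k, v in cur.items() if prev.get(k) != v}


def run_duplicated(
    step: Callable[[State, int], None],
    init: State,
    n_steps: int,
    interval: int,
    faulty_steps: FrozenSet[int] = frozenset(),
) -> Tuple[State, int, int]:
    """Run `step` on two replicas and compare them every `interval` steps.

    Returns (final state, number of rollbacks, number of entries saved).
    """
    a, b = dict(init), dict(init)       # the two copies of the task
    checkpoint = dict(init)             # last state both copies agreed on
    checkpoint_step = 0
    pending = set(faulty_steps)         # transient faults still to inject
    rollbacks = 0
    saved_entries = 0
    i = 0
    while i < n_steps:
        step(a, i)
        step(b, i)
        if i in pending:                # simulated transient fault in copy b
            pending.discard(i)
            first_key = next(iter(b))
            b[first_key] += 1
        i += 1
        if i % interval == 0 or i == n_steps:
            if a == b:
                # Incremental save: store only what changed since last time.
                changed = delta(checkpoint, a)
                checkpoint.update(changed)
                saved_entries += len(changed)
                checkpoint_step = i
            else:
                # Mismatch detected: roll both copies back and re-execute.
                a, b = dict(checkpoint), dict(checkpoint)
                i = checkpoint_step
                rollbacks += 1
    return a, rollbacks, saved_entries


def accumulate(state: State, i: int) -> None:
    """Toy workload: add the step index to an accumulator."""
    state["acc"] += i


if __name__ == "__main__":
    result = run_duplicated(accumulate, {"acc": 0, "const": 7}, 10, 5,
                            frozenset({3}))
    print(result)
```

In this toy run the `const` entry never changes, so it is never re-saved after the initial state; that is the saving an incremental scheme aims for when most of the state is unchanged between checkpoints.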
References (9)

  • [1] A. Ziv and J. Bruck. Performance optimization of checkpointing schemes with task duplication. IEEE Trans. Computers, Dec. 1997, 46(12): 1381-1386.
  • [2] A. Ziv and J. Bruck. Analysis of checkpointing schemes with task duplication. IEEE Trans. Computers, Feb. 1998, 47(2): 222-227.
  • [3] D. P. Siewiorek and R. S. Swarz. The Theory and Practice of Reliable System Design. Digital Press, 1982.
  • [4] P. Agrawal. Fault tolerance in multiprocessor systems without dedicated redundancy. IEEE Trans. Computers, Mar. 1988, 37(3): 358-362.
  • [5] A. Duda. The effects of checkpointing on program execution time. Information Processing Letters, June 1983, 16: 221-229.
  • [6] D. K. Pradhan and N. H. Vaidya. Roll-forward and rollback recovery: Performance-reliability trade-off. Proc. 24th IEEE Int'l Symp. Fault-Tolerant Computing, June 1994: 186-195.
  • [7] J. Long, W. K. Fuchs, and J. A. Abraham. Forward recovery using checkpointing in parallel systems. Proc. 19th Int'l Conf. Parallel Processing, Aug. 1990: 272-275.
  • [8] E. N. Elnozahy, D. B. Johnson, and W. Zwaenepoel. The performance of consistent checkpointing. The 11th Symposium on Reliable Distributed Systems, 1992: 39-47.
  • [9] J. S. Plank, M. Beck, and G. Kingsley. Libckpt: Transparent checkpointing under Unix. 1995 USENIX Technical Conference, 1995: 213-223.

Co-cited literature (60)

  • 1 Yang Jinmin, Zhang Dafang. A process checkpoint implementation method based on virtual objects [J]. Journal of System Simulation, 2004, 16(6): 1354-1357. Cited by: 1
  • 2 Zheng Jianxian, Li Zhongwen, Chen Liang. Application of safety kernel technology in railway computer interlocking systems [J]. Railway Computer Application, 2005, 14(6): 13-15. Cited by: 1
  • 3 Huang Hailin, Tang Zhimin, Xu Tong. Fault injection method and soft error sensitivity analysis for the Godson-1 (龙芯1号) processor [J]. Journal of Computer Research and Development, 2006, 43(10): 1820-1827. Cited by: 31
  • 4 D. P. Siewiorek, R. S. Swarz. Reliable Computer Systems: Design and Evaluation [M]. 2nd Edition. Bedford, Mass.: Digital Press, 1992.
  • 5 Y. C. Jenq. Perfect reconstruction of digital spectrum from nonuniformly sampled signals [J]. IEEE Trans. Instrum. Meas., June 1997, 46: 649-652.
  • 6 Pradhan DK, Krishna P, Vaidya NH. Recovery in mobile environments: Design and tradeoff analysis. In: Tohma Y, ed. Proc. of the 26th Int'l Symp. Fault-Tolerant Computing. Sendai: IEEE Press, 1996, 16-25.
  • 7 Koo R, Toueg S. Checkpointing and rollback-recovery for distributed systems. IEEE Trans on Software Engineering, 1987, 13(1): 23-31.
  • 8 Kim JL, Park T. An efficient algorithm for checkpointing recovery in distributed systems. IEEE Trans on Parallel and Distributed Systems, 1993, 4(8): 955-960.
  • 9 Chandy KM, Lamport L. Distributed snapshots: Determining global states of distributed systems. ACM Trans on Computer Systems, 1985, 3(1): 63-75.
  • 10 Ramanathan P, Shin KG. Use of common time base for checkpointing and rollback recovery in a distributed system. IEEE Trans on Software Engineering, 1993, 19(6): 571-583.
