Improving the Performance of Checkpointing Scheme with Task Duplication

Cited by: 4

Abstract: Checkpointing is a common technique for reducing the execution time of programs in the presence of faults. Combining checkpointing with task duplication provides not only effective fault recovery but also thorough fault detection. The overhead of such systems comes from two sources: the cost of comparing and saving state at each checkpoint, and the rollbacks caused by faults. This paper improves the method proposed by Ziv and Bruck by employing incremental checkpointing. The improved method reduces the overhead of comparing and saving checkpoints, and it also avoids the rollbacks caused by latent faults. Analysis shows that the improved method performs better than that of Ziv and Bruck.
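The general idea in the abstract can be illustrated with a minimal sketch. It is not the algorithm from the paper or from Ziv and Bruck: the function names (`run_duplicated`, `accumulate`), the dictionary-based state, and the fault-injection parameter are all assumptions made for illustration. Two replicas run the same task, their states are compared at each checkpoint, and a mismatch triggers a rollback to the last agreed checkpoint. As a stand-in for incremental checkpointing, only the keys written since the previous checkpoint are compared and saved.

```python
def run_duplicated(init_state, step_fn, steps, interval, faults=()):
    """Run step_fn on two replicas with checkpoint comparison and rollback.

    Hypothetical sketch: state is a dict of immutable values, step_fn mutates
    the state and returns the set of keys it wrote, and faults is a collection
    of (replica, step) pairs at which a transient fault is injected once.
    """
    pending = set(faults)
    replicas = [dict(init_state), dict(init_state)]
    saved, saved_step = dict(init_state), 0  # last checkpoint both replicas agreed on
    dirty = set()                            # keys written since that checkpoint
    rollbacks = 0
    step = 0
    while step < steps:
        for r, state in enumerate(replicas):
            written = step_fn(state, step)
            dirty |= written
            if (r, step) in pending:
                pending.discard((r, step))   # transient fault: occurs only once
                state[min(written)] += 1     # corrupt one value this step wrote
        step += 1
        if step % interval == 0 or step == steps:
            first, second = replicas
            if all(first[k] == second[k] for k in dirty):
                for k in dirty:              # save only the changed keys
                    saved[k] = first[k]
                saved_step = step
            else:
                for state in replicas:       # restore only the changed keys
                    for k in dirty:
                        state[k] = saved[k]
                step = saved_step
                rollbacks += 1
            dirty.clear()
    return replicas[0], rollbacks


def accumulate(state, step):
    """Example task: add the step index to a running sum."""
    state["sum"] += step
    return {"sum"}


final, rollbacks = run_duplicated({"sum": 0, "const": 7}, accumulate,
                                  steps=10, interval=3, faults=[(1, 4)])
print(final, rollbacks)
```

In this run the fault injected into the second replica at step 4 is caught by the comparison at step 6, so steps 3 to 5 are replayed once. The `const` entry is never written, so it is never compared or saved again, which is the kind of saving that incremental checkpointing aims for.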

Authors: LI Kaiyuan (李凯原), YANG Xiaozong (杨孝宗)

Affiliation: Department of Computer Science and Engineering, Harbin Institute of Technology

Source: Acta Electronica Sinica (《电子学报》), 2000, No. 5, pp. 33-35, 28 (4 pages in total). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.

Funding: National Natural Science Foundation of China (No. 69873013)

Keywords: fault tolerance; checkpoint; rollback recovery; task duplication

Classification number: TP302.8 [Automation and Computer Technology: Computer System Architecture]

References (9)

[1] A. Ziv and J. Bruck. Performance optimization of checkpointing schemes with task duplication. IEEE Trans. Computers, Dec. 1997, 46(12): 1381-1386.
[2] A. Ziv and J. Bruck. Analysis of checkpointing schemes with task duplication. IEEE Trans. Computers, Feb. 1998, 47(2): 222-227.
[3] D. P. Siewiorek and R. S. Swarz. The Theory and Practice of Reliable System Design. Digital Press, 1982.
[4] P. Agrawal. Fault tolerance in multiprocessor systems without dedicated redundancy. IEEE Trans. Computers, Mar. 1988, 37(3): 358-362.
[5] A. Duda. The effects of checkpointing on program execution time. Information Processing Letters, June 1983, 16: 221-229.
[6] D. K. Pradhan and N. H. Vaidya. Roll-forward and rollback recovery: Performance-reliability trade-off. Proc. 24th IEEE Int'l Symp. Fault-Tolerant Computing, June 1994: 186-195.
[7] J. Long, W. K. Fuchs, and J. A. Abraham. Forward recovery using checkpointing in parallel systems. Proc. 19th Int'l Conf. Parallel Processing, Aug. 1990: 272-275.
[8] E. N. Elnozahy, D. B. Johnson, and W. Zwaenepoel. The performance of consistent checkpointing. The 11th Symposium on Reliable Distributed Systems, 1992: 39-47.
[9] J. S. Plank, M. Beck, and G. Kingsley. Libckpt: Transparent checkpointing under Unix. 1995 USENIX Technical Conference, 1995: 213-223.

Co-cited literature (60)

1. Yang Jinmin, Zhang Dafang. A process checkpoint implementation method based on virtual objects [J]. Journal of System Simulation, 2004, 16(6): 1354-1357. Cited by: 1
2. Zheng Jianxian, Li Zhongwen, Chen Liang. Application of safety kernel technology in railway computer interlocking systems [J]. Railway Computer Application, 2005, 14(6): 13-15. Cited by: 1
3. Huang Hailin, Tang Zhimin, Xu Tong. Fault injection method and soft error sensitivity analysis for the Godson-1 processor [J]. Journal of Computer Research and Development, 2006, 43(10): 1820-1827. Cited by: 31
4. D. P. Siewiorek, R. S. Swarz. Reliable Computer Systems: Design and Evaluation [M]. 2nd edition. Bedford, Mass.: Digital Press, 1992.
5. Y. C. Jenq. Perfect reconstruction of digital spectrum from nonuniformly sampled signals [J]. IEEE Trans. Instrum. Meas., June 1997, 46: 649-652.
6. Pradhan DK, Krishna P, Vaidya NH. Recovery in mobile environments: Design and tradeoff analysis. In: Tohma Y, ed. Proc. of the 26th Int'l Symp. Fault-Tolerant Computing. Sendai: IEEE Press, 1996, 16-25.
7. Koo R, Toueg S. Checkpointing and rollback-recovery for distributed systems. IEEE Trans. on Software Engineering, 1987, 13(1): 23-31.
8. Kim JL, Park T. An efficient algorithm for checkpointing recovery in distributed systems. IEEE Trans. on Parallel and Distributed Systems, 1993, 4(8): 955-960.
9. Chandy KM, Lamport L. Distributed snapshots: Determining global states of distributed systems. ACM Trans. on Computer Systems, 1985, 3(1): 63-75.
10. Ramanathan P, Shin KG. Use of common time base for checkpointing and rollback recovery in a distributed system. IEEE Trans. on Software Engineering, 1993, 19(6): 571-583.

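Citing literature (4)

1. Li Qinghua, Jiang Tingyao, Zhang Hongjun. A low-cost transparent checkpoint recovery protocol for mobile computing (in English) [J]. Journal of Software, 2005, 16(1): 135-144. Cited by: 4
2. Li Zhongwen. Research on integrated fault-tolerance techniques for embedded real-time systems [J]. Computer Science, 2006, 33(5): 277-281. Cited by: 1
3. Li Zhongwen, Zheng Jianxian, Luo Renze. An optimized memory management scheme for fault-tolerant real-time systems and its implementation [J]. Aeronautical Computing Technique, 2007, 37(3): 63-65.
4. Zeng Xianlian, Ma Jiezhong, He Shiqiang. Processor design based on fault-tolerance techniques [J]. Computer Measurement & Control, 2010, 18(4): 892-895. Cited by: 2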

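Second-level citing literature (7)

1. Zhang Zhan, Zuo Decheng, Ci Yiwei, Yang Xiaozong. A causal logging rollback recovery algorithm for mobile computing environments [J]. Journal of Computer Research and Development, 2008, 45(2): 348-357. Cited by: 7
2. Zhang Weigong, Zhu Xiaoyan, Guan Yong, Zhou Jiqin, Shang Yuanyuan. A seamless reconfiguration algorithm for triple modular redundancy fault-tolerant computers based on a micro-packet protocol [J]. Computer Science, 2009, 36(6): 286-289. Cited by: 4
3. Men Chaoguang, Xu Zhenpeng, Li Xiang. Performance evaluation of checkpoint migration strategies in mobile computing systems [J]. Journal of Harbin Institute of Technology, 2010, 42(5): 806-810. Cited by: 3
4. Xu Zhenpeng. An adaptive handoff management for fault tolerant mobile computing [J]. High Technology Letters, 2010, 16(4): 407-412.
5. Xu Zhenpeng, Men Chaoguang, Li Xiang. A model for solving the checkpoint interval of logging-based checkpoint rollback recovery strategies [J]. Chinese High Technology Letters, 2011, 21(6): 575-580. Cited by: 2
6. Tao Peng, Ma Jiezhong, Zhi Xinhui. Research on a VHDL-based fault injection tool [J]. Measurement & Control Technology, 2011, 30(9): 108-111. Cited by: 1
7. Gao Ming. An embedded ARM fault-tolerant control system based on a dual-machine hot-standby architecture [J]. Computer Measurement & Control, 2013, 21(7): 1828-1830. Cited by: 3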

