Abstract
To improve the fault recovery efficiency of triple modular redundancy (TMR) fault-tolerant computers and shorten fault recovery time, this paper proposes, designs, and implements a hardware-monitoring-based recovery algorithm that manages critical data in a linked list, overcoming the fault recovery shortcomings of traditional fault-tolerant computers. Using the CPU's idle time during system operation and the serial data exchange channel, the algorithm recovers the TMR fault-tolerant computer seamlessly without interrupting system operation, which keeps the system working continuously during fault recovery. The paper also gives the system's detailed recovery process and test results. The experimental results verify the feasibility and reliability of the method.
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal
2012, No. 1, pp. 188-192 (5 pages)
Journal of Chinese Computer Systems
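Funding
Supported by the National Natural Science Foundation of China (60873006)
Supported by the Beijing Natural Science Foundation (4062009, 4082009)
Supported by the Key Project of the Beijing Municipal Education Commission (KZ200710028014)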
Keywords
triple modular redundancy (TMR)
fault recovery
critical data
hardware monitoring
monitoring package