Abstract
Fault detection is a crucial part of a fault-tolerant system in a distributed parallel environment, and the failure classification model is one of the most important factors affecting fault detection performance. Every distributed system has its own distinguishing features. To reduce the burden that fault detection places on the system, in particular the number of messages and the time consumed, fault detection protocols should be tailored to application requirements and system properties, and the most suitable failure classification model should be chosen. This paper presents a new common failure classification model for distributed parallel environments that limits fault detection to a reasonable failure set. Combining this classification model with specific application requirements and system properties yields an efficient failure detector.
Source
《计算机应用与软件》 (Computer Applications and Software)
CSCD
PKU Core Journal
2006, Issue 7, pp. 35-36 (2 pages)
Funding
Supported by the Shanghai-Applied Materials Research and Development Fund (project 0215)
Keywords
Failure classification
Distributed systems