K. Kim, C. Subbaraman, E. Shokri, High Coverage Fault Tolerance in Real-Time
Systems Based on Point-to-Point Communication, High Assurance Systems Engineering
Symposium, Washington, DC, August 1997.
The distributed recovery block (DRB) scheme is a widely applicable approach for
realizing both hardware and software fault tolerance in real-time distributed and
parallel computer systems. One of the most important extensions of the DRB scheme
that were outlined in recent years but not fully developed is the integration of
the DRB scheme with a network surveillance (NS) scheme. We recently developed an NS
scheme that is effective in a variety of point-to-point networks, called the supervisor-based
NS (SNS) scheme. In this paper, we present an integration of the DRB scheme with
the SNS scheme, called the DRB/SNS scheme. This scheme is a significant improvement
over previous versions of the DRB scheme with respect to the fault coverage
and recovery time bound achieved in systems based on point-to-point
networks. The execution support for the integrated scheme has been implemented as
a part of the DREAM kernel prototype, a timeliness-guaranteed operating system kernel
developed at the University of California, Irvine. The recovery time bound of the
DRB/SNS scheme is analyzed on the basis of the prototype implementation.