|
MEADEP was developed to provide a means of quantitatively assessing the reliability
and probability of failure on demand for computer-based safety systems in nuclear
power plants (referred to as "digital systems"). Existing instrumentation and control
(I&C) systems are obsolete or obsolescent. However, upgrading these safety grade
(Class 1E) systems to digital technology currently poses a technological and regulatory
risk. This risk is not so much due to the technology, which is mature and proven,
as to limitations of methods used for verifying compliance with system safety and
reliability requirements. While methods for predicting analog hardware reliability
are widely accepted by the nuclear power community, this does not hold for similar
methods for Class 1E digital systems.
There are two conventional approaches to reliability and availability prediction:
1) modeling of a system in the design phase, or 2) assessment of the system in a
later phase, typically by test. The first approach relies on probabilistic models
that use component level failure rates published in handbooks or supplied by the
manufacturers. This approach provides an early indication of system reliability,
but the model as well as the underlying data later need to be validated by actual
measurements. The second approach typically uses test data and reliability growth
models. It involves fewer assumptions than the first, but it can be very costly.
The higher the reliability specified for a system, the longer the required test.
A further difficulty arises in the translation of reliability data obtained by test
into those applicable to the operational environment.
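The link between specified reliability and test length can be made concrete with a standard zero-failure binomial argument. The sketch below is illustrative and is not taken from MEADEP. It assumes independent demands, a constant probability of failure on demand (pfd), and a test in which no failures are observed; the function name `demands_required` and the numerical targets are hypothetical.

```python
import math


def demands_required(pfd_target: float, confidence: float) -> int:
    """Failure-free demands needed to claim pfd <= pfd_target at a given confidence.

    Assumes independent demands with a constant failure probability, so the
    number of demands n must satisfy (1 - pfd_target) ** n <= 1 - confidence.
    """
    if not 0.0 < pfd_target < 1.0:
        raise ValueError("pfd_target must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    # log1p keeps precision when pfd_target is very small.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-pfd_target))


if __name__ == "__main__":
    for target in (1e-3, 1e-4, 1e-5):
        n = demands_required(target, confidence=0.95)
        print(f"pfd <= {target:g} at 95% confidence: {n} failure-free demands")
```

Under these assumptions, each tenfold tightening of the target multiplies the required number of failure-free demands by roughly ten, which is why assessment by test alone becomes costly for highly reliable systems.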
MEADEP overcomes the problems enumerated above. It is well suited to the operational
environment, and it can also be applied during test. The primary advance
over the conventional methods lies in the use of models for interpretation of the
measurements. This methodology extracts much more information from available data
than conventional approaches. In turn, it permits credible assessment of the probability
of rare events from measurements of more frequent predecessor events, without
the need to observe the rare event itself. Based on measurements of operational systems,
MEADEP can provide objective reliability assessments with stated confidence levels.
This is like proof of the pudding by eating, rather than by analyzing the ingredients.
A further benefit for regulatory agencies is that the use of standardized failure
rate data for digital systems provides an overview of the performance of these systems.
Measurements can be performed on commercial grade components without requiring the
vendor to reveal proprietary information or to modify an established development
process.
Critical digital systems are found in a broad spectrum of applications, and the
goals of reliability assessment vary widely among these. It is instructive to consider
the following two extremes in order to understand where the methodology and the
tool discussed in the report are most applicable: networked systems, such as air
traffic control, telephone switching, and funds transfer; and stand-alone protection
systems for nuclear or chemical plants.
Networked systems are characterized by an essentially constant workload, which makes
a "usage profile" and Mean Time Between Failures (MTBF) relevant; an ability to tolerate
occasional outages because alternate routines or similar work-arounds can be used;
and users that are highly motivated to keep downtime statistics and assess causes
of failures. An earlier study found downtimes of between
30 and 35 hours per year per installation for these systems, with between 30 and
50 percent of the downtime being due to identified software problems. This environment
permits systematic studies of failure mechanisms and occurrence rates that can be
used to build and validate software reliability models. Models that assume an initial
fault density and an increase in the MTBF proportional to the faults removed are
appropriate for this application area.
In contrast, in stand-alone protection systems the performance under routine workload
is largely irrelevant to the safety actions expected of the system in response to
malfunctions; the most important dependability criterion is probability of failure
on demand rather than mean time between failures; alternate means for accomplishing
the intended actions are not always available (human intervention or defense-in-depth
of automatic systems); and, most significant in the context of this investigation,
the user (utility, chemical plant) has no clear motivation and very little ability
to keep statistics of non-catastrophic system failures and to investigate and correct
the causes of these failures.
MEADEP has capabilities relevant to this last point by providing users of digital
protection systems with an easy means of keeping track of non-catastrophic failures
and of propagating the resulting statistics to system failure probabilities. Further,
making it easy to collect relevant statistics, and identifying the potential contribution
to catastrophic events of non-catastrophic conditions, will motivate users to investigate
and correct the causes of failures or exception conditions as they are encountered.
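As a rough illustration of propagating non-catastrophic failure statistics to a system failure probability, the sketch below combines logged channel outages for a two-channel protection system. It is not MEADEP's model: the `ChannelLog` record, the two-channel architecture, the independence of the channels, and all numbers are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelLog:
    """Hypothetical outage record for one redundant channel."""
    hours_observed: float
    hours_unavailable: float

    @property
    def unavailability(self) -> float:
        # Fraction of observed time in which the channel could not respond.
        return self.hours_unavailable / self.hours_observed


def pfd_both_channels_down(a: ChannelLog, b: ChannelLog) -> float:
    """Probability that a demand finds both channels unavailable.

    Assumes the channels fail independently and demands arrive at random
    times; common-cause failures are ignored, so the result is optimistic.
    """
    return a.unavailability * b.unavailability


if __name__ == "__main__":
    channel_a = ChannelLog(hours_observed=8000.0, hours_unavailable=4.0)
    channel_b = ChannelLog(hours_observed=8000.0, hours_unavailable=2.0)
    print(f"channel A unavailability: {channel_a.unavailability:.2e}")
    print(f"channel B unavailability: {channel_b.unavailability:.2e}")
    print(f"estimated system pfd:     {pfd_both_channels_down(channel_a, channel_b):.2e}")
```

Neither channel outage is catastrophic on its own, yet the logged outages yield an estimate for the rare coincidence without that coincidence ever having been observed.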
Of course, many applications fall between the two extremes discussed above, and
the tool will be valuable to these as well as for the specialized needs of stand-alone
protection systems. One example is Air Traffic Control (ATC) systems, which have many
characteristics of a networked system. This example shows how the tool can furnish
MTBF predictions, and also how it can estimate the fault tolerance coverage, a capability
not provided in any of the published reliability models and the tools that support
them.
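Fault tolerance coverage is commonly read as the probability that the system recovers, given that a fault has occurred. The sketch below estimates it from logged recovery outcomes and attaches a Wilson score interval. The function name, the counts, and the choice of interval are assumptions for illustration and do not describe the estimator used by MEADEP.

```python
import math


def coverage_estimate(recovered: int, faults: int, z: float = 1.96):
    """Return (point estimate, lower bound, upper bound) for coverage.

    Uses the Wilson score interval; z = 1.96 corresponds to roughly 95%
    two-sided confidence.
    """
    if faults <= 0 or not 0 <= recovered <= faults:
        raise ValueError("need faults > 0 and 0 <= recovered <= faults")
    p = recovered / faults
    denom = 1.0 + z * z / faults
    centre = (p + z * z / (2 * faults)) / denom
    half_width = z * math.sqrt(p * (1 - p) / faults + z * z / (4 * faults**2)) / denom
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Hypothetical log: 200 fault events, 197 handled by automatic recovery.
    p, low, high = coverage_estimate(recovered=197, faults=200)
    print(f"coverage = {p:.3f} (approx. 95% interval {low:.3f} to {high:.3f})")
```

The interval matters as much as the point estimate: with only a few hundred logged fault events, a coverage that looks close to one can still be compatible with a noticeably lower true value.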
The importance of the methodology and the associated tool rests partly on the assumption
that catastrophic failures in well-tested systems usually result from the coincidence
of a number of conditions that are individually tolerable or, at least, non-catastrophic.
A number of examples of such occurrences are documented in NUREG/CR-6293. That
catastrophic failures result from a combination of failures and mistakes that individually
have much lesser consequences is not restricted to digital systems. This process
is very much at work in many major industrial, transportation, and energy generation
accidents.
Potentially the greatest benefit of MEADEP arises when it is used during concept development
and early design because deficiencies identified at that time can be corrected at
relatively low cost and with few side-effects. The difficulty is that no direct
measurements from the system will be available at that time. Even in these circumstances
the tool can be used for sensitivity studies, for "what if?" investigations, and
for comparison of alternatives under a given set of parameters. As operational data
acquired by use of the tool become available for reliability assessment during
earlier phases, these benefits will increase considerably.
|