Background

MEADEP was developed to provide a means of quantitatively assessing the reliability and probability of failure on demand for computer-based safety systems in nuclear power plants (referred to as "digital systems"). Existing instrumentation and control (I&C) systems are obsolete or obsolescent. However, upgrading these safety grade (Class 1E) systems to digital technology currently poses a technological and regulatory risk. This risk is not so much due to the technology, which is mature and proven, as to limitations of methods used for verifying compliance with system safety and reliability requirements. While methods for predicting analog hardware reliability are widely accepted by the nuclear power community, this does not hold for similar methods for Class 1E digital systems.

There are two conventional approaches to reliability and availability prediction: 1) modeling of a system in the design phase, or 2) assessment of the system in a later phase, typically by test. The first approach relies on probabilistic models that use component level failure rates published in handbooks or supplied by the manufacturers. This approach provides an early indication of system reliability, but the model as well as the underlying data later need to be validated by actual measurements. The second approach typically uses test data and reliability growth models. It involves fewer assumptions than the first, but it can be very costly. The higher the reliability specified for a system, the longer the required test. A further difficulty arises in the translation of reliability data obtained by test into those applicable to the operational environment.
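The link between the specified reliability and the length of the required test can be made concrete with a small calculation. The sketch below assumes a constant failure rate (exponentially distributed time to failure) and a test that ends with zero failures. Neither assumption comes from the text above, and the function name and sample values are illustrative only. Under those assumptions, demonstrating a failure rate no worse than a target value with confidence C requires a failure-free test of at least -ln(1 - C) divided by the target rate.

```python
import math


def zero_failure_test_hours(target_failure_rate, confidence):
    """Failure-free test hours needed to demonstrate a failure-rate bound.

    Assumes a constant failure rate and zero failures during the test
    (illustrative assumptions, not taken from the source text).
    """
    if target_failure_rate <= 0.0:
        raise ValueError("target_failure_rate must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # With rate r, P(no failure in T hours) = exp(-r * T).
    # Requiring exp(-r * T) <= 1 - confidence gives T >= -ln(1 - confidence) / r.
    return -math.log(1.0 - confidence) / target_failure_rate


# Hypothetical targets, in failures per hour, at 95 percent confidence.
for rate in (1e-3, 1e-4, 1e-5):
    hours = zero_failure_test_hours(rate, 0.95)
    print(f"target {rate:.0e} per hour -> {hours:,.0f} failure-free test hours")
```

Each tenfold tightening of the target multiplies the failure-free test time by ten, which is the cost growth described above.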

MEADEP overcomes the problems enumerated above: it is well suited to the operational environment and can also be applied during test. The primary advance over the conventional methods lies in the use of models to interpret the measurements. This methodology extracts much more information from available data than conventional approaches. In turn, it permits creditable assessment of the probability of rare events from measurements of (more frequent) predecessor events, without the need to observe the actual event. Based on measurements of operational systems, MEADEP can provide objective reliability assessments with stated confidence levels. This is like proof of the pudding by eating, rather than by analyzing the ingredients. A further benefit for regulatory agencies is that the use of standardized failure rate data for digital systems provides an overview of the performance of these systems. Measurements can be performed on commercial grade components without requiring the vendor to reveal proprietary information or to modify an established development process.
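One way to picture how a rare event can be assessed from more frequent predecessor events is a redundant two-channel system: single-channel failures and repairs are observed often, while a system failure (the second channel failing before the first is repaired) may never be observed. The sketch below is an illustration under stated assumptions, not a description of MEADEP's internal models. It assumes two identical channels, a constant failure rate per channel, and an exponentially distributed repair time; the function name and the field data are hypothetical. For the three-state Markov model (both channels up, one channel up, system down), the mean time to system failure has the closed form (3λ + μ) / (2λ²), where λ is the channel failure rate and μ the repair rate.

```python
def duplex_mttf(failure_rate, repair_rate):
    """Mean time to system failure for a repairable two-channel system.

    Closed-form solution of a three-state Markov model:
    both channels up -> one channel up -> system down (absorbing).
    """
    if failure_rate <= 0.0:
        raise ValueError("failure_rate must be positive")
    if repair_rate < 0.0:
        raise ValueError("repair_rate must not be negative")
    return (3.0 * failure_rate + repair_rate) / (2.0 * failure_rate ** 2)


# Hypothetical measurements of the frequent predecessor events.
channel_failures = 20          # single-channel failures observed
channel_hours = 20_000.0       # accumulated channel operating hours
mean_repair_hours = 10.0       # average time to restore a failed channel

failure_rate = channel_failures / channel_hours   # per channel-hour
repair_rate = 1.0 / mean_repair_hours             # repairs per hour

print(f"estimated system MTTF: {duplex_mttf(failure_rate, repair_rate):,.0f} hours")
```

With these made-up inputs the estimated mean time to system failure is on the order of 50,000 hours, far longer than any single observation period would need to be to measure the channel-level rates.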

Critical digital systems are found in a broad spectrum of applications, and the goals of reliability assessment vary widely among these. It is instructive to consider the following two extremes in order to understand where the methodology and the tool discussed in the report are most applicable:

1) Networked systems, such as air traffic control, telephone switching, and funds transfer
2) Stand-alone protection systems for nuclear or chemical plants

Networked systems are characterized by:

1) Essentially constant workload, making a "usage profile" and Mean Time Between Failures (MTBF) relevant.
2) Ability to tolerate occasional outages because alternate routines or similar work-arounds can be used.
3) Users that are highly motivated to keep downtime statistics and assess causes of failures.

An earlier study found downtimes of between 30 and 35 hours per year per installation for these systems, with between 30 and 50 percent of the downtime being due to identified software problems. This environment permits systematic studies of failure mechanisms and occurrence rates that can be used to build and validate software reliability models. Models that assume an initial fault density and an increase in the MTBF proportional to the faults removed are appropriate for this application area.
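The downtime figures quoted above translate directly into availability. The short calculation below is a sketch that assumes continuous operation over a 8,760-hour year (365 days, ignoring leap years); the function and variable names are illustrative.

```python
HOURS_PER_YEAR = 8760.0  # 365 days * 24 hours; assumes continuous operation


def availability_from_downtime(downtime_hours_per_year):
    """Fraction of the year the installation is up, given annual downtime."""
    if not 0.0 <= downtime_hours_per_year <= HOURS_PER_YEAR:
        raise ValueError("downtime must lie between 0 and 8760 hours")
    return 1.0 - downtime_hours_per_year / HOURS_PER_YEAR


# Downtime range quoted in the text: 30 to 35 hours per year.
for downtime in (30.0, 35.0):
    availability = availability_from_downtime(downtime)
    print(f"{downtime:.0f} h/yr of downtime -> availability {availability:.4%}")

# Share attributed to identified software problems: 30 to 50 percent.
software_low = 0.30 * 30.0    # lowest share of the lowest downtime
software_high = 0.50 * 35.0   # highest share of the highest downtime
print(f"software-related downtime: {software_low:.1f} to {software_high:.1f} h/yr")
```

Under that assumption the quoted range corresponds to an availability of roughly 99.60 to 99.66 percent, with between 9 and 17.5 hours per year attributable to identified software problems.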

In contrast, stand-alone protection systems differ on each of these points. Performance under routine workload is largely irrelevant to the safety actions expected of the system in response to malfunctions. The most important dependability criterion is probability of failure on demand rather than mean time between failures. Alternate means for accomplishing the intended actions are not always available (human intervention or defense-in-depth of automatic systems). Most significant in the context of this investigation, the user (utility, chemical plant) has no clear motivation and very little ability to keep statistics of non-catastrophic system failures and to investigate and correct the causes of these failures.

MEADEP has capabilities relevant to this last point: it gives users of digital protection systems an easy means of keeping track of non-catastrophic failures and of propagating the resulting statistics to system failure probabilities. Further, making it easy to collect relevant statistics, and identifying the potential contribution of non-catastrophic conditions to catastrophic events, will motivate users to investigate and correct the causes of failures or exception conditions as they are encountered. Of course, many applications fall between the two extremes discussed above, and the tool will be valuable to these as well as for the specialized needs of stand-alone protection systems. One example is an Air Traffic Control (ATC) system, which has many characteristics of a networked system. That example shows how the tool can furnish MTBF predictions, and also how it can estimate the fault tolerance coverage, a capability not provided in any of the published reliability models and the tools that support them.
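The idea of propagating non-catastrophic failure statistics to a system-level figure, and of estimating fault tolerance coverage, can be sketched with simple point estimates. The example below is not MEADEP's method. It assumes that coverage is estimated as the fraction of logged fault events from which the system recovered, and that only uncovered faults lead to system failure. The log counts and function names are hypothetical.

```python
def coverage_estimate(recovered_faults, total_faults):
    """Point estimate of fault tolerance coverage from logged fault events."""
    if total_faults <= 0:
        raise ValueError("total_faults must be positive")
    if not 0 <= recovered_faults <= total_faults:
        raise ValueError("recovered_faults must lie between 0 and total_faults")
    return recovered_faults / total_faults


def uncovered_failure_rate(total_faults, exposure_hours, coverage):
    """System failure rate if only uncovered faults cause system failure."""
    if exposure_hours <= 0.0:
        raise ValueError("exposure_hours must be positive")
    fault_rate = total_faults / exposure_hours   # all faults, per hour
    return fault_rate * (1.0 - coverage)         # faults the system did not tolerate


# Hypothetical event log: 200 fault events in 50,000 hours, 196 recovered.
total_faults = 200
recovered_faults = 196
exposure_hours = 50_000.0

coverage = coverage_estimate(recovered_faults, total_faults)
rate = uncovered_failure_rate(total_faults, exposure_hours, coverage)
print(f"coverage estimate: {coverage:.2%}")
print(f"uncovered failure rate: {rate:.2e} per hour (MTBF about {1.0 / rate:,.0f} h)")
```

A record of tolerated faults, which would otherwise go unreported, is what makes both numbers available: the fault rate comes from all logged events and the coverage from the recovered share.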

The importance of the methodology and the associated tool rests partly on the assumption that catastrophic failures in well-tested systems usually result from the coincidence of a number of conditions that are individually tolerable or, at least, non-catastrophic. A number of examples of such occurrences are documented in NUREG/CR-6293. This pattern, in which catastrophic failures result from a combination of failures and mistakes that individually have much lesser consequences, is not restricted to digital systems. It is very much at work in many major industrial, transportation, and energy generation accidents.

Potentially the greatest benefit of MEADEP comes when it is used during concept development and early design, because deficiencies identified at that time can be corrected at relatively low cost and with few side-effects. The difficulty is that no direct measurements from the system will be available at that time. Even in these circumstances the tool can be used for sensitivity studies, for "what if?" investigations, and for comparison of alternatives under a given set of parameters. As operational data acquired by use of the tool are made available for reliability assessment during earlier phases, these benefits will be considerably increased.
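A sensitivity study or "what if?" comparison of this kind can be sketched without any measured data. The example below is only an illustration: it uses the standard steady-state availability formula MTBF / (MTBF + MTTR) as a stand-in model, with made-up parameter values, and estimates how strongly the result responds to each parameter by a central finite difference. The function names and the two design alternatives are hypothetical.

```python
def steady_state_availability(mtbf_hours, mttr_hours):
    """Steady-state availability of a single repairable unit."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def elasticities(model, params, rel_step=0.01):
    """Approximate percent change in output per percent change in each input."""
    base = model(**params)
    result = {}
    for name, value in params.items():
        up = dict(params, **{name: value * (1.0 + rel_step)})
        down = dict(params, **{name: value * (1.0 - rel_step)})
        # Central difference with respect to a relative change in the input.
        slope = (model(**up) - model(**down)) / (2.0 * rel_step)
        result[name] = slope / base
    return result


# Hypothetical design alternatives compared under one set of parameters.
baseline = {"mtbf_hours": 1000.0, "mttr_hours": 10.0}
alternative = {"mtbf_hours": 1000.0, "mttr_hours": 5.0}

for name, value in elasticities(steady_state_availability, baseline).items():
    print(f"elasticity of availability with respect to {name}: {value:+.4f}")

gain = steady_state_availability(**alternative) - steady_state_availability(**baseline)
print(f"availability gain from halving repair time: {gain:.5f}")
```

The same pattern applies to any model expressed as a function of named parameters, so placeholder values can later be replaced by measured ones as operational data become available.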

 

 

 

 



©2001 SoHaR Corporation. All rights reserved.

 
