
Software Safety

As systems and products become increasingly dependent on software components, it is no longer realistic to develop a system safety program that does not include the software elements.

Does software fail? We tend to believe that well-written, well-tested, safety-critical software never fails. Experience proves otherwise: software makes headlines when it does fail, sometimes critically. Software does not fail the same way hardware does, and the various failure behaviors we are accustomed to from the world of hardware are often not applicable to software. However, software does fail, and when it does, the result can be just as catastrophic as a hardware failure.

Safety-critical software

Safety-critical software is a creature very different from both non-critical software and safety-critical hardware. The difference lies in the massive testing program that such software undergoes. 

What are "software failure modes"? Software, especially in critical systems, tends to fail where least expected. We are usually extremely good at setting up test plans for the mainline code of the program, and these sections usually do run flawlessly. Software does not "break", but it must be able to deal with "broken" input and conditions, which are often the causes of "software failures". The task of dealing with abnormal or anomalous conditions and inputs is handled by the exception code dispersed throughout the program. Setting up a test plan and exhaustive test cases for the exception code is by definition difficult and somewhat subjective. Anomalous inputs can be due to:

  • failed hardware

  • timing problems

  • harsh/unexpected environmental conditions

  • multiple changes in conditions and inputs that are beyond what the hardware is able to deal with

  • unanticipated conditions during software mode changes

  • bad user input

Often the conditions most difficult to predict are multiple, coinciding, irregular inputs and conditions.
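As an illustration of exception code that has to cope with several coinciding anomalies, here is a minimal Python sketch. The `Reading` type, the limits, and the fallback policy are all hypothetical and chosen only for the example; a real system would take them from its requirements.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical plausibility limits for an altitude sensor (illustrative only).
MIN_ALT_M = -500.0
MAX_ALT_M = 20000.0
MAX_AGE_S = 0.2  # readings older than this are treated as a timing problem


@dataclass
class Reading:
    value: float     # reported altitude in metres
    age_s: float     # time since the sample was taken, in seconds
    sensor_ok: bool  # hardware self-test flag


def validate(reading: Optional[Reading]) -> List[str]:
    """Return every anomaly found; an empty list means the input is usable."""
    if reading is None:
        return ["missing input"]
    problems = []
    if not reading.sensor_ok:
        problems.append("failed hardware")
    if reading.age_s > MAX_AGE_S:
        problems.append("stale data")
    if math.isnan(reading.value) or not (MIN_ALT_M <= reading.value <= MAX_ALT_M):
        problems.append("out of range")
    return problems


def select_altitude(reading: Optional[Reading], last_good: float) -> Tuple[float, List[str]]:
    """Use the reading if it is clean, otherwise fall back to the last good value."""
    problems = validate(reading)
    if problems:
        return last_good, problems
    return reading.value, []
```

Collecting all anomalies, instead of stopping at the first one, makes it possible to write test cases for the coinciding conditions that are hardest to predict.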

Safety-critical software is usually tested to the point that no new critical failures are observed. This of course does not mean that the software is fault-free at this point, only that failures are no longer observed in test. Why are the faults leading to these types of failures overlooked in test? They go untested for any of the following reasons:

  • Faults in code that is not often used and therefore not well represented in the operational profiles used for testing

  • Faults that are due to multiple anomalous conditions that are difficult to test

  • Faults related to interfaces and controls of failed hardware

  • Faults due to missing requirements 

It is clear why these types of faults may remain outside of a normal, reliability-focused test plan.
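A rough way to see why rarely used code escapes a test plan drawn from an operational profile: if a code path is reached with probability p in any one randomly drawn test, the chance that n independent tests never reach it is (1 - p)^n. The sketch below assumes independent test draws and uses made-up numbers; the function name is illustrative.

```python
def prob_never_exercised(p_use: float, n_tests: int) -> float:
    """Chance that n independent tests never hit a path used with probability p_use."""
    if not 0.0 <= p_use <= 1.0 or n_tests < 0:
        raise ValueError("p_use must be in [0, 1] and n_tests non-negative")
    return (1.0 - p_use) ** n_tests


# Hypothetical numbers: an exception path reached in 1 of 10,000 operational runs,
# and a test campaign of 1,000 runs sampled from the same profile.
missed = prob_never_exercised(1e-4, 1000)
print(f"probability the path is never tested: {missed:.2f}")
```

With these assumed numbers the path is more likely than not to go completely untested, which is why exception code needs test cases written for it deliberately.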

How does one protect against such failures once software is released? Current guides and standards are not nearly as specific and clear as their hardware equivalents. Most notably, RTCA/DO-178B (the Radio Technical Commission for Aeronautics document Software Considerations in Airborne Systems and Equipment Certification), whose latest version is from 1992 and which is considered the main guideline for safety-critical software development, does not deal specifically with types of failures and how to avoid them. The document deals mainly with process control that will hopefully ensure good software. It does not dictate how one can verify that the process worked "well enough" for the requirements of any particular system. The much-awaited update to this document, DO-178C, is expected in 2011 and should offer more prescriptive guidelines for the certification of safety-critical software. These will be based not only on more current design environments such as object-oriented design, but also on model-based software development and more current formal methods for verification.

As safety-critical software is most often part of a larger system that includes hardware, the safety assessment process should follow the process applied to hardware:

Software Preliminary/Functional Hazard Analysis:

If the top-level architecture of the system details software components, they must be included in this qualitative analysis. The PHA is also needed to formulate a software specification that (a) prevents software from interfering with established hazard handling provisions, and (b) directs software design to reinforce hazard handling where established provisions are lacking. The analysis is necessary for completing the later stages of the safety review. SoHaR will work with your requirements and specification documents as well as any early design documents and artifacts available. Model-based artifacts such as use-case scenarios are very helpful at this level of analysis.

Software System Hazard Analysis and/or Fault Tree Analysis:

The quantitative hazard and Fault Tree Analyses should include any software that interfaces with safety-critical hardware.

At this stage SoHaR software safety engineers will work with your design products to provide a complete analysis:

  1. System architecture
  2. System requirements document
  3. Preliminary/functional hazard analysis
  4. Hardware failure information
  5. Human error information

A common obstacle to including software in a quantitative analysis is the lack of failure rate estimates for these components. Software can be constructed so that a specified number of failures can be tolerated. If a system is safety-critical, it is usually assumed that it will be fielded only after stringent testing that shows no remaining defects in the software code (this does not mean 100% reliability though!). Any remaining sources of failure associated with the software can be assumed to be the result of incomplete requirement definition, in particular requirements dealing with rare and anomalous conditions such as hardware failures, rare environmental and usage conditions, and unforeseen operator actions. Often combinations of multiple rare events will lead to conditions that the software was not prepared for. An approximate rate for such events can be derived from the size and quality of the requirements but cannot be fully verified.

If the software is not part of a safety critical system/function it may be fielded with a known failure rate (based on the software testing program). In this case this failure rate may be used as an estimate for the fault tree analysis.
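To show how a software estimate can enter a quantitative fault tree, the following Python sketch evaluates a tiny tree with one AND gate and one OR gate. It assumes independent basic events and constant failure rates, and every number and event name in it is hypothetical.

```python
from math import exp, prod


def prob_from_rate(failures_per_hour: float, hours: float) -> float:
    """Probability of at least one failure in the interval (constant rate assumed)."""
    return 1.0 - exp(-failures_per_hour * hours)


def and_gate(probs):
    """All inputs must occur: product of probabilities (independence assumed)."""
    return prod(probs)


def or_gate(probs):
    """At least one input occurs: one minus the product of the complements."""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical basic events for a 10-hour mission.
p_sensor_fails = prob_from_rate(1e-5, 10.0)      # from hardware failure information
p_sw_mishandles = 1e-2                           # approximate, requirements-based estimate
p_display_sw_fails = prob_from_rate(1e-6, 10.0)  # non-critical software, rate from testing

# The hazard occurs if a failed sensor is mishandled by software, or the display fails.
p_bad_reading_used = and_gate([p_sensor_fails, p_sw_mishandles])
p_top = or_gate([p_bad_reading_used, p_display_sw_fails])
print(f"top event probability per mission: {p_top:.3e}")
```

The requirements-based figure is the weakest input here, so a sensitivity run over a range of values for it is usually more informative than a single point result.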

Software Failure Modes and Effects Analysis

Failure Modes and Effects Analysis is done at system level based on the Fault Tree Analysis (or hazard analysis) results. The fault tree identifies end effects that are to be mitigated. The FMEA will identify "initiating events" which can lead to these end effects. A software FMEA will identify which initiating events (such as incorrect or missing inputs, particular modes of operation, etc.) will result in the software causing a system failure. As an example, consider incorrect input during a relatively rare operating mode, or a rare input while the system is recovering from a hardware failure. The software FMEA is performed only on software components (or subcomponents) that can lead to hazardous conditions and results.

SoHaR performs software FMEAs (and system FMEAs that include software) based on an Object Oriented design. In order to perform this analysis, SoHaR requires the following design products:

  1. System architecture
  2. System requirements document
  3. System hazard analysis and/or Fault Tree Analysis (can be generated by SoHaR)
  4. Software program in an object-oriented design environment: this may be in the form of a UML design, a Matlab Simulink design, or the code in any object-oriented language (e.g. C++, .NET, etc.)
  5. Hardware failure information
  6. Human/operator error information

The FMEA process in an object-oriented environment ensures exhaustive identification of exception condition initiators, and verification that protections against faults in exception handling are in place and effective.

Although slightly different from a hardware FMEA, a properly executed software FMEA is compatible with hardware FMEAs and permits a full system FMEA. Hence it provides an assurance that other certification processes cannot: that we have identified all possible failure modes and have included provisions to detect and protect against them.

Software FMEA - How?

One of the main reasons the FMEA hasn't been a consistent part of critical software certification is the difficulty in applying it to a large piece of code. SoHaR has developed a methodology that overcomes this problem by using the object view of the program. Whether developed as a UML or Matlab Simulink model, or coded in an object-oriented language such as C++, .NET or Java, we apply our FMEA methodology at the object level.

Along with requirements and design documents we are able to construct a software FMEA that is surprisingly similar to a hardware FMEA, as software "object methods" are equivalent to hardware "parts". Moreover, when required, we will develop and generate a system FMEA which will include hardware and software and any interface failure modes.  

Our method overcomes another inherent software FMEA problem that most professionals cannot escape: the subjectivity of the process. Most software safety professionals will apply the FMEA at a "functional" level. This application is not only problematic in that it can leave entire sections of the exception code unevaluated, but it also introduces a subjectivity into the process that allows more failure modes to be ignored. Our object-centered method removes this subjectivity as it uses the classes defined in the design.     
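To make the idea of treating object methods like hardware parts concrete, here is a minimal Python sketch that builds an empty FMEA worksheet from a class definition. It is not SoHaR's methodology or tooling; the `PumpController` class, the generic failure-mode list, and the column names are all assumptions made for illustration.

```python
import inspect

# Generic failure modes applied to every method; an illustrative list only.
GENERIC_FAILURE_MODES = [
    "does not execute",
    "executes with incorrect result",
    "executes at the wrong time",
    "raises unhandled exception",
]


class PumpController:  # hypothetical class standing in for a design element
    def start(self): ...
    def stop(self): ...
    def _log(self): ...  # private helper, skipped below


def fmea_skeleton(cls):
    """Build one empty FMEA row per (public method, generic failure mode)."""
    rows = []
    for name, _ in inspect.getmembers(cls, predicate=inspect.isfunction):
        if name.startswith("_"):
            continue
        for mode in GENERIC_FAILURE_MODES:
            rows.append({
                "item": f"{cls.__name__}.{name}",
                "failure_mode": mode,
                "local_effect": "",   # to be filled in by the analyst
                "system_effect": "",
                "detection": "",
                "compensation": "",
            })
    return rows


rows = fmea_skeleton(PumpController)
print(len(rows), "rows generated, first item:", rows[0]["item"])
```

Because the rows are derived from the classes in the design and not from an analyst's choice of "functions", no public method can be left out of the worksheet by oversight.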

Automated Software FMEA

FMEAs, applied to software or hardware, are a large task. Hardware FMEAs are automated through an exhaustive system breakdown tree, or bill of materials. SoHaR has developed automated tools and methods for generating the software FMEA based on object-oriented software models. Our tools are currently able to automatically generate the FMEA structure for models developed in UML (Unified Modeling Language) or within the Matlab Simulink environment. Benefits of using our automated tools include:

  • A significant reduction in work load (by several orders of magnitude)

  • Assurance of completeness of the task (no failure modes left behind)

  • Libraries for future use that reduce work load even more (software and interface components, failure modes, higher order effects, detection methods, compensation provisions)

What Can You Expect From SoHaR's Software FMEA Services and Tools?

SoHaR provides both consulting services and tools for the Software FMEA. Our services cover the entire spectrum of organizational needs:

  • SoHaR can perform the entire task of developing the FMEA for your system and generating the complete FMEA reports.

  • SoHaR can provide consulting to an in-house effort which may include any combination of: training, system set-up, tools and/or continuous program support.

Either way, SoHaR will walk you through the process so that your organization is able to successfully complete the FMEA and fully trust the results.

What will our FMEA and reports include? 

  • List of critical failure modes and whether they have been accounted for in the design;

  • List of provisions (detection methods & compensation provisions) required to make the current system safe.

At the end of every effort, the reports and electronic libraries developed in the process will lead to an easier task in future FMEA efforts. As in the case of hardware, a software FMEA is an incredibly valuable addition to the organizational knowledge base, allowing for safer and less costly programs in the future.  

Requirements V&V

V&V of software requirements is at least as crucial as V&V for hardware, if not more so. Most serious failures in safety and mission critical software are due to incomplete or incorrect requirement definition. As software does not fail randomly and hardly ever due to actual coding defects, most failures are the result of the code not being designed to deal with certain (mostly rare) events: conditions and inputs. Moreover, it is in the requirements that mitigations for failures are listed. For serious failures, multiple (redundant) mitigation strategies are required. A safety-informed requirements V&V focuses on these types of omissions. 
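One check that a safety-informed requirements review might automate is whether every serious hazard is covered by more than one mitigation requirement. The Python sketch below assumes a simple traceability list; the hazard IDs, requirement IDs, severity labels, and the threshold of two mitigations are hypothetical.

```python
from collections import defaultdict

# Hypothetical hazard list: hazard ID -> severity.
hazards = {"H1": "catastrophic", "H2": "minor", "H3": "hazardous"}

# Hypothetical traceability: (requirement ID, hazard it mitigates).
requirements = [("R10", "H1"), ("R11", "H1"), ("R20", "H2"), ("R30", "H3")]

SERIOUS = {"catastrophic", "hazardous"}


def mitigation_gaps(hazards, requirements, min_serious=2):
    """Return {hazard: (mitigations found, mitigations needed)} for under-covered hazards."""
    by_hazard = defaultdict(list)
    for req_id, hazard_id in requirements:
        by_hazard[hazard_id].append(req_id)
    gaps = {}
    for hazard_id, severity in hazards.items():
        needed = min_serious if severity in SERIOUS else 1
        found = len(by_hazard[hazard_id])
        if found < needed:
            gaps[hazard_id] = (found, needed)
    return gaps


print(mitigation_gaps(hazards, requirements))
```

In this made-up data set, hazard H3 is flagged because it is serious but traced to only one mitigation requirement.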

In order to perform a requirements review that focuses on the safety aspects of the code, SoHaR requires the following design products:

  1. System architecture
  2. Complete system requirements documents
  3. System hazard analysis and/or Fault Tree Analysis

For more information about SoHaR's Software Reliability and Safety program, please contact us at becky@sohar.com

 

 