As systems and products become more
and more dependent on software
components, it is no longer realistic
to develop a system safety program
that does not include the software
elements.
Does software fail?
We tend to believe that well-written,
well-tested, safety-critical software
never fails. Experience proves
otherwise: software makes headlines
when it does fail, sometimes
critically. Software does
not fail the same way hardware does,
and the various failure behaviors we
are accustomed to from the world of
hardware are often not applicable to
software. However, software does fail,
and when it does, the result can be
just as catastrophic as a hardware
failure.
Safety-critical software
Safety-critical software is a creature
very different from both non-critical
software and safety-critical hardware.
The difference lies in the massive
testing program that such software
undergoes.
What are "software failure modes"?
Software, especially in critical
systems, tends to fail where least
expected. We are usually extremely
good at setting up test plans for the
main line code of the program, and
these sections usually do run
flawlessly. Software does not "break"
but it must be able to deal with
"broken" input and conditions, which
are often causes for "software
failures". The task of dealing with
abnormal/anomalous conditions and
inputs is handled by the exception
code dispersed throughout the program.
Setting up a test plan and exhaustive
test cases for the exception code is
by definition difficult and somewhat
subjective. Anomalous inputs can be
due to:
- Failed hardware
- Timing problems
- Harsh/unexpected environmental
  conditions
- Multiple changes in conditions and
  inputs that are beyond what the
  hardware is able to deal with
- Unanticipated conditions during
  software mode changes
- Bad user input
Often the conditions most difficult
to predict are multiple, coinciding,
irregular inputs and conditions.
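As an illustration only, coinciding
anomalies can be enumerated mechanically
so that no combination is skipped when
exception test cases are planned. The
Python sketch below is not taken from
any standard or tool: the category
names are shortened from the list above,
and the function name and the
`max_order` parameter are hypothetical.

```python
from itertools import combinations

# Hypothetical anomaly categories, shortened from the list above.
ANOMALIES = [
    "failed hardware",
    "timing problem",
    "harsh environment",
    "mode change",
    "bad user input",
]


def coinciding_anomaly_cases(anomalies, max_order=2):
    """Return every combination of 1..max_order coinciding anomalies."""
    cases = []
    for order in range(1, max_order + 1):
        cases.extend(combinations(anomalies, order))
    return cases


cases = coinciding_anomaly_cases(ANOMALIES, max_order=2)
print(len(cases))  # 5 single anomalies + 10 pairs = 15
```

Even this short list yields 15 cases
when only pairs are considered, and the
count grows quickly as `max_order`
increases, which is one reason such
combinations are hard to cover in test.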
Safety-critical software is usually
tested to the point that no new
critical failures are observed. This
of course does not mean that the
software is fault-free at this point,
only that failures are no longer
observed in test. Why are the faults
leading to these types of failures
overlooked in test? These are faults
that go untested for any of the
following reasons:
- Faults in code that is not often used
  and therefore not well represented in
  the operational profiles used for
  testing
- Faults that are due to multiple
  anomalous conditions that are
  difficult to test
- Faults related to interfaces and
  controls of failed hardware
- Faults due to missing requirements
It is clear why these types of faults
may remain outside of a normal,
reliability-focused test plan.
How does one protect against such
failures once software is released?
Current guides and standards are not
nearly as specific and clear as their
hardware equivalents. Most notably,
RTCA/DO-178B (Software Considerations
in Airborne Systems and Equipment
Certification, from the Radio
Technical Commission for Aeronautics),
whose latest version is from 1992 and
which is considered the main guideline
for safety-critical software
development, does not deal
specifically with types of failures
and how to avoid them. The document
deals mainly with process control that
will hopefully ensure good software.
It does not dictate how one can verify
that the process worked "well enough"
for the requirements of any particular
system. The much awaited update to
this document, DO-178C, is expected in
2011 and will most certainly offer
more prescriptive guidelines for the
certification of safety-critical
software. These will address not only
more current design environments such
as Object Oriented Design but also
model-based software in general and
more current formal methods for
verification.
As safety-critical software is most
often part of a larger system that
includes hardware, the safety
assessment process should follow the
process applied to hardware:
Software Preliminary/Functional Hazard
Analysis:
If the top level architecture of the
system details software components it
is necessary that they be included in
this qualitative analysis. The PHA is
also needed to formulate a software
specification that (a) prevents
software from interfering with
established hazard handling
provisions, and (b) directs software
design to reinforce hazard handling
where established provisions are
lacking. The analysis is necessary for
completing the later stages of the
safety review. SoHaR will work with
your requirements and specification
documents as well as any early design
documents and artifacts available.
Model-based artifacts such as use-case
scenarios are very helpful at this
level of analysis.
Software System Hazard Analysis and/or
Fault Tree Analysis:
The quantitative hazard and Fault Tree
Analyses should include any software
that interfaces with safety-critical
hardware.
At this stage, SoHaR software safety
engineers will work with the following
design products to provide a complete
analysis:
- System architecture
- System requirements document
- Preliminary/functional hazard
  analysis
- Hardware failure information
- Human error information
A common obstacle to including
software in a quantitative analysis is
the lack of failure rate estimates
for these components. Software can be
constructed so that a specified number
of failures can be tolerated. If a
system is safety critical it is
usually assumed that it will be
fielded only after stringent testing
which will show no remaining
defects in the software code (this
does not mean 100% reliability
though!). Any remaining sources of
failure (associated with the software)
can be assumed to be the result of
incomplete requirement definition, in
particular requirements dealing with
rare and anomalous conditions such as
hardware failures, rare environmental
and usage conditions and unforeseen
operator actions. Often combinations
of multiple rare events will lead to
conditions that the software was not
prepared for. An approximate rate for
such events can be derived from the
size and quality of requirements but
cannot be fully verified.
If the software is not part of a
safety critical system/function it may
be fielded with a known failure rate
(based on the software testing
program). In this case this failure
rate may be used as an estimate for
the fault tree analysis.
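To make the role of a software estimate
in a quantitative fault tree concrete,
the Python sketch below combines
hardware and software figures through
AND and OR gates. It assumes that all
basic events are independent, and every
number and event name in it is
hypothetical and chosen only for
illustration.

```python
from math import prod


def and_gate(probabilities):
    """All inputs must occur: product of independent probabilities."""
    return prod(probabilities)


def or_gate(probabilities):
    """Any input suffices: 1 - product of (1 - p) for independent events."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical per-mission probabilities (illustrative values only).
p_sensor_fails = 1e-3
p_software_mishandles_failed_sensor = 1e-2  # assumed software estimate
p_actuator_fails = 1e-5

# The sensor fault becomes hazardous only if the software also mishandles it.
p_undetected_sensor_fault = and_gate(
    [p_sensor_fails, p_software_mishandles_failed_sensor]
)

# Top event: either the unhandled sensor fault or an actuator failure.
p_top = or_gate([p_undetected_sensor_fault, p_actuator_fails])
print(f"{p_top:.6e}")
```

In this made-up tree the software term
contributes as much to the top event as
the actuator does, which shows why
leaving software out of the analysis
can distort the result.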
Software Failure Modes and Effects
Analysis
Failure Modes and Effects Analysis is
done at system level based on the
Fault Tree analysis (or hazard
analysis) results. The fault tree
identifies end effects that are to be
mitigated. The FMEA will identify
"initiating events" which can lead to
these end effects. A software FMEA
will identify which initiating events
(such as incorrect or missing inputs,
particular modes of operation, etc.)
will result in the software causing a
system failure. As an example, consider
an incorrect input during a relatively
rare operating mode, or a rare input
while the system is recovering from a
hardware failure. The software FMEA is
performed only on software components
(or subcomponents) that can lead to
hazardous conditions and results.
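The link between initiating events and
fault tree end effects can be pictured
as a simple worksheet. The Python
sketch below is one possible layout and
not SoHaR's tool or format: the
`FmeaRow` fields, the component names
and the sample rows are all
hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FmeaRow:
    """One worksheet line; field names are illustrative assumptions."""
    component: str
    initiating_event: str
    operating_mode: str
    end_effect: str
    mitigation: str = ""  # empty string means no mitigation is recorded


# End effects to be mitigated, as they would come from the fault tree.
HAZARDOUS_END_EFFECTS = {"loss of braking"}

worksheet = [
    FmeaRow("BrakeController", "missing wheel-speed input",
            "recovery after sensor reset", "loss of braking"),
    FmeaRow("BrakeController", "out-of-range pedal input",
            "normal", "loss of braking",
            "clamp input and raise fault flag"),
    FmeaRow("Logger", "disk full", "normal", "lost log entry"),
]


def unmitigated_hazards(rows, hazardous_effects):
    """Rows that lead to a hazardous end effect with no mitigation listed."""
    return [
        row for row in rows
        if row.end_effect in hazardous_effects and not row.mitigation
    ]


open_items = unmitigated_hazards(worksheet, HAZARDOUS_END_EFFECTS)
for row in open_items:
    print(row.component, "|", row.initiating_event, "|", row.operating_mode)
```

Here only the rare-mode row is reported:
the logger row does not reach a
hazardous end effect, and the pedal
input row already lists a mitigation.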
SoHaR performs software FMEAs (and
system FMEAs that include software)
based on an Object Oriented design. In
order to perform this analysis, SoHaR
requires the following design
products:
- System architecture
- System requirements document
- System hazard analysis and/or Fault
  Tree Analysis (can be generated by
  SoHaR)
- Software program in an object
  oriented design environment: this may
  be in the form of a UML design, a
  Matlab Simulink design, or the code
  in any object oriented language (e.g.
  C++, .NET, etc.)
- Hardware failure information
- Human/operator error information
The FMEA process in an object oriented
environment ensures exhaustive
identification of exception condition
initiators, and verification that
protections against faults in
exception handling are in place and
effective.
Although slightly different from a
hardware FMEA, the software FMEA, when
properly executed, is compatible with
hardware FMEAs and permits a full
system FMEA. Hence it provides an
assurance that other certification
processes cannot: that we have
identified all possible failure modes
and have included provisions to detect
and protect against them.
Software FMEA - How?
One of the main reasons the FMEA hasn't
been a consistent part of critical
software certification is the
difficulty in applying it to a large
piece of code. SoHaR has developed a
methodology that overcomes this
problem by using the object view of
the program. Whether the program is
developed as a UML or Matlab Simulink
model, or coded in an object-oriented
language such as C++, .NET or Java, we
apply our FMEA methodology at the
object level.
Along with
requirements and design documents we
are able to construct a software FMEA
that is surprisingly similar to a
hardware FMEA, as software "object
methods" are equivalent to hardware
"parts". Moreover, when required, we
will develop and generate a system
FMEA which will include hardware and
software and any interface failure
modes.
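The idea of treating object methods
like hardware parts can be sketched in
a few lines of Python. This is only an
illustration of the principle, not
SoHaR's methodology or tooling: the
`ValveController` class, the list of
generic failure modes and the
`fmea_skeleton` function are all
hypothetical.

```python
import inspect


class ValveController:
    """Hypothetical class standing in for an object from the design."""

    def open_valve(self, percent):
        ...

    def close_valve(self):
        ...

    def read_position(self):
        ...


# Assumed generic failure modes applied to every method ("part").
GENERIC_FAILURE_MODES = [
    "not executed",
    "executed with wrong input",
    "executed too late",
]


def fmea_skeleton(cls, failure_modes=GENERIC_FAILURE_MODES):
    """List (class, method, failure mode) for every method defined on cls."""
    methods = [
        name for name, _ in inspect.getmembers(cls, inspect.isfunction)
    ]
    return [
        (cls.__name__, method, mode)
        for method in methods
        for mode in failure_modes
    ]


rows = fmea_skeleton(ValveController)
for row in rows:
    print(row)
```

Because the rows are derived from the
class definition itself rather than
from an analyst's choice of functions,
no method is left out of the worksheet.
Effects, detection methods and
compensation provisions would still
have to be filled in by analysis.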
Our method
overcomes another inherent software
FMEA problem that most professionals
cannot escape: the subjectivity of the
process. Most software safety
professionals will apply the FMEA at a
"functional" level. This application
is not only problematic in that it can
leave entire sections of the exception
code unevaluated, but it also
introduces a subjectivity into the
process that allows more failure modes
to be ignored. Our object-centered
method removes this subjectivity as it
uses the classes defined in the
design.
Automated Software FMEA
FMEAs, applied to
software or hardware, are a large
task. Hardware FMEAs are automated
through an exhaustive system breakdown
tree, or Bill of Materials. SoHaR has
developed automated tools and methods
for generating the software FMEA based
on object-oriented software models.
Our tools are currently able to
automatically generate the FMEA
structure for models developed in UML
(Unified Modeling Language) or within
the Matlab Simulink environment.
Benefits of using our automated tools
include:
- A significant reduction in work load
  (by several orders of magnitude)
- Assurance of completeness of the task
  (no failure modes left behind)
- Libraries for future use that reduce
  work load even more (software and
  interface components, failure modes,
  higher order effects, detection
  methods, compensation provisions)
What Can You Expect From SoHaR's
Software FMEA Services and Tools?
SoHaR provides both
consulting services and tools for the
Software FMEA. Our services cover the
entire spectrum of organizational
needs:
- SoHaR can perform the entire task of
  developing the FMEA for your system
  and generating the complete FMEA
  reports.
- SoHaR can provide consulting to an
  in-house effort which may include any
  combination of: training, system
  set-up, tools and/or continuous
  program support.
Either way, SoHaR
will walk you through the process so
that your organization is able to
successfully complete the FMEA and
fully trust the results.
What will our FMEA and reports include?
At the end of every
effort, the reports and electronic
libraries developed in the process
will lead to an easier task in future
FMEA efforts. As in the case of
hardware, a software FMEA is an
incredibly valuable addition to the
organizational knowledge base,
allowing for safer and less costly
programs in the future.
Requirements V&V
V&V of software
requirements is at least as crucial as
V&V for hardware, if not more so. Most
serious failures in safety and mission
critical software are due to
incomplete or incorrect requirement
definition. As software does not fail
randomly and hardly ever due to actual
coding defects, most failures are the
result of the code not being designed
to deal with certain (mostly rare)
events: conditions and inputs.
Moreover, it is in the requirements
that mitigations for failures are
listed. For serious failures, multiple
(redundant) mitigation strategies are
required. A safety-informed
requirements V&V focuses on these
types of omissions.
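One check of this kind, confirming that
every serious hazard traces to more
than one mitigation requirement, is
easy to picture in code. The Python
sketch below is an assumption about how
such a check might look: the hazard
records, requirement IDs, severity
labels and the threshold of two
mitigations are all hypothetical.

```python
# Hypothetical traceability data: hazard ID -> severity and the IDs of
# the requirements that mitigate it.
hazards = {
    "HZ-1": {"severity": "catastrophic", "mitigations": ["REQ-12", "REQ-40"]},
    "HZ-2": {"severity": "catastrophic", "mitigations": ["REQ-17"]},
    "HZ-3": {"severity": "minor", "mitigations": []},
}

# Assumed severity labels that count as "serious".
SERIOUS = {"catastrophic", "hazardous"}


def insufficient_mitigation(hazards, serious=SERIOUS, required=2):
    """Return IDs of serious hazards with fewer than `required` mitigations."""
    return sorted(
        hazard_id
        for hazard_id, record in hazards.items()
        if record["severity"] in serious
        and len(set(record["mitigations"])) < required
    )


gaps = insufficient_mitigation(hazards)
print(gaps)  # ['HZ-2']
```

A count like this only shows that
mitigations are listed. Whether they
are truly independent of each other
still has to be judged in the review
itself.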
In order to perform a requirements
review that can focus on the safety
aspects of the code, SoHaR requires
the following design products:
- System architecture
- Complete system requirements
  documents
- System hazard analysis and/or Fault
  Tree Analysis
For more information about SoHaR's
Software Reliability and Safety
program, please contact us at
[email protected]