System Safety and System Reliability
System safety is an engineering
discipline separate from system
reliability and maintainability.
While R&M focuses on failure
mitigation, safety focuses on hazard
mitigation. These two outlooks do not
necessarily coincide.
MIL-STD-882C defines safety as
"freedom from those
conditions that can cause death,
injury, occupational illness, or
damage to or loss of equipment or
property, or damage to the
environment". Similar definitions (e.g.
IEEE STD-1228) emphasize elimination
of hazards and accidents. Although in
all these definitions the injury and
damage are caused accidentally, they do
not directly refer to system failures
or system reliability in any way.
Failures can cause injury and damage,
but system safety is more general in
that it also includes conditions that
are not necessarily the result of
failure. Moreover, there are clear
cases when increasing safety (through
the elimination or mitigation of a
hazard) decreases the reliability and
maintainability of a system. Examples
include such mundane systems as
elevators with automated trip
mechanisms that are triggered under
conditions for a potential hazard
(e.g. door malfunction) that do not
necessarily affect the operational
state. Under such circumstances, the
elevator is safely disabled but
completely unreliable, as it is not
functioning at all.
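The trade-off can be made concrete
with a toy simulation. The sketch
below is illustrative only: the
function name, the hourly fault log
and the numbers are assumptions, not
data from any real elevator.

```python
def simulate(door_fault_by_hour, trip_enabled):
    """Return (availability, hazardous_hours) for a toy elevator.

    door_fault_by_hour: one boolean per hour, True if the door is faulty.
    trip_enabled: whether a door fault takes the car out of service.
    """
    running_hours = 0
    hazardous_hours = 0
    for door_fault in door_fault_by_hour:
        if door_fault and trip_enabled:
            # Trip fired: the car is safely disabled and delivers no service.
            continue
        running_hours += 1
        if door_fault:
            # The car keeps running with a faulty door: hazard exposure.
            hazardous_hours += 1
    return running_hours / len(door_fault_by_hour), hazardous_hours


# Hypothetical log: 8 healthy hours followed by 2 hours with a door fault.
faults = [False] * 8 + [True] * 2
with_trip = simulate(faults, trip_enabled=True)
without_trip = simulate(faults, trip_enabled=False)
```

With the trip enabled, the hazardous
hours drop to zero while availability
falls below that of the untripped
elevator: safer, but less reliable.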
There are of course many systems and
conditions where reliability and
safety do align, when
proper straightforward functioning of
a system (without failures) is enough
for both reliable and safe operation.
However, this is not the general case
and having a good reliability program
does not necessarily mean that system
safety is being effectively managed
and satisfied.
Hazard Analysis

Definitions of the "hazard" concept
vary, chiefly in whether a hazard is
considered an intrinsic property of an
item or a set of conditions that
involves both the item and the use
environment. A definition
that represents a wide spectrum of
views defines a hazard as a "State or
set of conditions of an item that,
together with other conditions in the
environment of the item will lead to
an accident". Here "item" may refer to
a system, subsystem, or object; and
variations can include qualifications
such as "inevitably". However, clearly
a hazard must be defined with respect
to the environment in which the item
exists and/or is operating. This
definition also illuminates the fact
that a hazard can exist without a
system failure: it can be entirely due
to a combination of operational system
state and environmental conditions
(e.g. landing in bad weather).
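As a minimal sketch of this definition,
a hazard can be modeled as a set of
item conditions plus a set of
environment conditions that must hold
together. The class, field names and
condition labels below are
hypothetical and chosen only for
illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hazard:
    """A hazard as a combination of item state and environment conditions."""
    name: str
    item_conditions: frozenset
    environment_conditions: frozenset

    def is_present(self, item_state, environment):
        # Present only when every required item condition AND every required
        # environment condition holds at the same time.
        return (self.item_conditions <= set(item_state)
                and self.environment_conditions <= set(environment))


# Hypothetical example: no failure is involved, only state plus environment.
bad_weather_landing = Hazard(
    name="landing in bad weather",
    item_conditions=frozenset({"landing"}),
    environment_conditions=frozenset({"low visibility"}),
)

present = bad_weather_landing.is_present({"landing"}, {"low visibility", "rain"})
absent = bad_weather_landing.is_present({"cruise"}, {"low visibility"})
```

Note that nothing in the model refers
to a failed component; the hazard
arises from a normal operating state
meeting an adverse environment.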
Hazard analysis should take place,
iteratively, over the entire lifecycle
of the system and typically will yield
different types of results at the
different stages.
Preliminary Hazard Analysis (PHA)

PHA can start as early as concept
exploration or at the very early
design stage. The PHA identifies the
critical system functions and broad
system hazards. The results are used
to include safety considerations in
concept trade-off analyses and design
alternative comparisons. Naturally,
hazards related to implementation
details that are not within the
critical system functions will not be
identified at this stage. The results
are qualitative and risk assessment
is usually not complete at this stage.
The PHA is evaluated and updated
iteratively as the initial design
steps are taken. The PHA also provides
input to later stage analyses.
Despite limited detail, the PHA
provides critical input at a critical
time. A decision to skip the
preliminary hazard analysis and wait
for a time "when we know more about
the details of the system" can lead to
costly results, as safety is then not
included in the concept and design
trade-off analyses. It can lead to:
-
Significant late-stage
design/engineering modifications
-
Costly operational and maintenance
requirements in order to mitigate
hazards
-
Failure at market due to cost, safety
and liability issues
Although later-stage hazard analysis
yields more detail on the hazards, the
preliminary analysis offers
information that is more likely to
critically affect the success of the
program in the long run.
SoHaR will work with your design team
and requirement/design documents to
assess, list, and prioritize hazards
at the early stages of the program
lifecycle. Our engineers are
experienced at focusing on the
hazards and identifying, even at the
early stages of design, potential
hazards that may be overlooked by
design engineers intent on delivering
the best functionality. We will bring
to the table a safety-centric outlook
to complement your design efforts and
ensure that your safety requirements
are not left for last.
System Hazard Analysis (SHA)

System Hazard Analysis (SHA) is
commonly performed once design is
fleshed out, in parallel with the
preliminary design review. The
analysis is iteratively updated with
the design. Whereas Preliminary
Hazard Analysis focuses on critical
functionality and broad system
hazards, System Hazard Analysis
focuses on the details.
Specifically, we are interested in:
-
Overall system operation with
attention to users, modes and
varying environments
-
Interfaces between subsystems and
their interdependent compliance with
the overall safety requirements
-
Whether design changes have affected
safety
The results of the SHA are used to
recommend changes, identify required
controls, and evaluate how the design
responds to safety requirements.
SHA requires attention to details of
the design, knowledge of operational
environments and system mode changes
that can lead to unforeseen
combinations of conditions. It
requires extensive experience with
"typical" hazard scenarios combined
with detailed knowledge of the domain.
SoHaR safety engineers will
collaborate with your design and
system engineers to ensure no hazards
are overlooked and their causes and
consequences are adequately
identified.
SHA will often include a quantitative
assessment of each hazard: probability
of occurrence and severity of
consequence. These are required for
the follow-up risk assessment.
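One common way to combine these two
quantities is a coarse risk matrix.
The sketch below is a hedged
illustration: the category names
follow the usual MIL-STD-882 wording,
but the numeric ranks, the thresholds
and the function name are assumptions
made for this example and are not
taken from the standard.

```python
# Ordinal ranks. The names are the customary MIL-STD-882 categories; the
# numbers and the thresholds below are hypothetical.
SEVERITY = {"catastrophic": 4, "critical": 3, "marginal": 2, "negligible": 1}
PROBABILITY = {"frequent": 5, "probable": 4, "occasional": 3,
               "remote": 2, "improbable": 1}


def risk_level(severity, probability):
    """Map a (severity, probability) pair to a coarse risk level."""
    score = SEVERITY[severity] * PROBABILITY[probability]
    if score >= 12:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


level_a = risk_level("critical", "probable")
level_b = risk_level("catastrophic", "remote")
level_c = risk_level("negligible", "frequent")
```

A real program would replace the
scoring rule with the risk matrix
agreed for that program; the point
here is only that both inputs are
needed before a hazard can be ranked.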
Subsystem Hazard Analysis (SSHA)

Subsystem Hazard Analysis (SSHA) is
similar to SHA in its goals and
methods; however, its scope is limited
to subsystems as components. It is
often initiated at a later stage, when
details of the subsystems become
available. Failure modes that
contribute to hazards are examined at
the subsystem level, and the
detailed interfaces between components
are investigated for possible
conditions leading to hazards. Here we
investigate how each single component
affects the safety of the entire
system, while in the SHA we focus on
the collaborative effects of
components working together.
Hazards identified in the SHA and
linked to specific conditions of
subsystems are investigated, and their
probabilities of occurrence are
estimated based on such input as
component reliability and human error.
Quantifiable input is added as the
specifics of the design emerge.
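As a rough sketch of such an estimate,
assume a hazard that arises only when
a component fails and the operator
then fails to catch it. The constant
failure rate, the independence of the
two events, the function names and
all numbers are assumptions made for
illustration.

```python
import math


def failure_probability(failure_rate_per_hour, exposure_hours):
    """P(at least one failure) under a constant failure rate (exponential model)."""
    return 1.0 - math.exp(-failure_rate_per_hour * exposure_hours)


def hazard_probability(p_component_failure, p_human_error):
    """P(hazard) when it needs the component to fail AND the operator to miss it.

    Assumes the two events are independent, which must be justified case by case.
    """
    return p_component_failure * p_human_error


# Hypothetical inputs: 1e-4 failures/hour over 1000 hours, 1% operator miss rate.
p_fail = failure_probability(1e-4, 1000.0)
p_hazard = hazard_probability(p_fail, 0.01)
```

As design details emerge, the assumed
rates would be replaced with component
reliability data and human error
estimates specific to the subsystem.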
Subsystems may include a single
"media-type" (electronics, software,
mechanical) but are often integrated.
Embedded software-hardware systems or
electromechanical actuators are
examples of mixed-media subsystems
that require an integrated SSHA. Even
when a subsystem belongs purely to
one engineering field, it is still
recommended that the SSHA be performed
by safety engineers rather than design
or system engineers. The goal of the
analysis is to isolate the hazards and
safety issues from the design and
functional operation of the system. A
design engineer with a strong view of
the design of the subsystem will have
difficulty looking away from mainline
operation, as will a system engineer.
It is the role of the safety engineer
to provide the unique view that
focuses on potential mishaps and
hazardous conditions.
Event Tree Analysis

Event Tree Analysis (ETA) is a
bottom-up technique for analyzing the
various outcomes of initiating events.
It is often used in conjunction with
Fault Tree Analysis, which is a
top-down method for analyzing and
quantifying system failures. If a
system is small or simple enough to
allow for a complete Fault Tree
Analysis, we may not have to analyze
Event Trees. However, in most cases a
system cannot be fully analyzed
top-to-bottom and the Event Tree
allows us to section off parts of the
system for the Fault Tree Analysis.
The Event Tree begins with an
initiating event and searches forward
for all possible outcomes, branching
out at nodes signifying:
-
possible conditions (e.g. windy
conditions during a gas leak)
-
possible states (e.g. cargo door
unlatched during emergency landing)
-
possible malfunction of mitigating
factors (e.g. sprinklers in the case
of fire)

The Event Tree allows us to estimate
the probability of each outcome from
the probabilities of the branches
leading up to that outcome.
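A minimal sketch of this calculation
follows. It assumes a binary tree in
which each mitigating barrier either
works or fails, and that each branch
probability is independent of the
earlier branches. The function name,
the fire scenario and the numbers are
hypothetical.

```python
from itertools import product


def event_tree_outcomes(initiating_frequency, barriers):
    """Enumerate every path through a binary event tree.

    barriers: list of (name, p_success) pairs, in the order they are challenged.
    Returns {path: frequency}, where path is a tuple of booleans (True = worked).
    Assumes each branch probability is independent of the earlier branches.
    """
    outcomes = {}
    for path in product([True, False], repeat=len(barriers)):
        frequency = initiating_frequency
        for worked, (_name, p_success) in zip(path, barriers):
            frequency *= p_success if worked else 1.0 - p_success
        outcomes[path] = frequency
    return outcomes


# Hypothetical numbers: 0.01 fires per year, sprinkler works 95%, alarm 99%.
tree = event_tree_outcomes(0.01, [("sprinkler", 0.95), ("alarm", 0.99)])
worst_case = tree[(False, False)]  # both mitigations fail
```

The outcome frequencies always sum to
the frequency of the initiating event,
which is a useful sanity check on a
hand-built tree.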

Although the structure of an Event
Tree is simple, there are several
challenges in performing a useful
Event Tree Analysis:
-
Maintaining a global view of the
system as a whole and not only of its
various functions. To this end, it is
preferable to have a safety engineer
perform the analysis rather than a
design or system engineer, who is
preoccupied with correct function.
Safety issues are often not related to
a system malfunctioning but rather to
a negative synergistic effect of
system mode, system state,
environmental conditions and user
actions. A safety engineering view
considers all these inputs and is
aware of interfaces that may not be up
to the challenge when certain
conditions and states align.
-
One of the most difficult elements in
the ETA is listing the correct
initiating events and the correct
branching nodes: whether these take
place in a nuclear power plant or
onboard a space shuttle, the
progression of an event to a possible
outcome has to be related to
conditions we can interpret. As an
example we consider microscopic crack
formation. The Event Tree for this
scenario should not (and cannot) track
the physics of crack propagation to a
disastrous accident. Rather the Event
Tree should simulate how our
mitigation elements deal with the
crack. For example, a branching node
may correspond to the probability of
the crack being unnoticed in routine
maintenance when it is 1 mm or shorter;
a second node would correspond to the
probability of it being noticed at
3 mm; and so on. For these nodes we
should have reasonably good estimates
and interpretations.
-
Quantitative input: often the
probabilities available for the node
outcomes are approximate or based on
broad assumptions. This uncertainty
propagates from the nodes to the
possible outcomes and should be taken
into account in the resulting outcome
probabilities.
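One simple way to carry this
uncertainty through the tree is to
propagate lower and upper bounds
instead of point values. The sketch
below does this for a single path; the
function name and the bounds used for
the crack example are assumptions, not
measured data.

```python
def path_probability_bounds(branch_bounds):
    """Propagate (low, high) bounds on each branch to the path probability.

    Multiplying the lower bounds and the upper bounds is valid because every
    factor is non-negative, so the product grows with each factor.
    """
    low, high = 1.0, 1.0
    for branch_low, branch_high in branch_bounds:
        if not 0.0 <= branch_low <= branch_high <= 1.0:
            raise ValueError("each bound must satisfy 0 <= low <= high <= 1")
        low *= branch_low
        high *= branch_high
    return low, high


# Hypothetical bounds for the crack example: probability that the crack goes
# unnoticed at the first inspection, then again at the second inspection.
missed_twice = path_probability_bounds([(0.5, 0.8), (0.05, 0.2)])
```

The resulting interval widens with
every uncertain node on the path,
which shows why an outcome probability
should not be reported as a single
precise number.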
The Event Tree Analysis is very useful
in identifying the areas where we
should use Fault Trees or FMECAs to
investigate further.
Fault Trees are often used to evaluate
the probabilities at specific nodes
(e.g. probability of a sprinkler not
functioning).
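As a hedged illustration of how a
small Fault Tree yields such a node
probability, the sketch below combines
basic events through AND and OR gates.
It assumes independent basic events;
the tree structure, the gate helper
names and the probabilities are
hypothetical.

```python
from math import prod


def and_gate(probabilities):
    """All inputs must occur (independent basic events assumed)."""
    return prod(probabilities)


def or_gate(probabilities):
    """At least one input occurs (independent basic events assumed)."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical tree: the sprinkler does not function if the water supply is
# lost, OR the valve sticks, OR both automatic and manual activation fail.
p_no_activation = and_gate([0.05, 0.1])
p_sprinkler_fails = or_gate([0.01, 0.02, p_no_activation])
```

The top-event probability computed
here would then be used as the failure
branch of the sprinkler node in the
Event Tree.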
Event Trees bring to the surface the
protection system features that are
most crucial to eliminating risk,
allowing us to take steps to reduce
their failure probability. Although
they seem to have a simple form, they
are a very powerful tool for an
overall assessment of system and
subsystem safety.
Risk Assessment

Risk assessment in the safety context
directly relates to the hazard. Risk
combines the hazard level with the
likelihood of it leading to an
accident (danger) and the duration of
or exposure to the hazard.
The Event Tree Analysis can provide
direct input to Risk Assessment.
The evaluation of risk almost always
requires qualitative input and
judgment. Often the probabilities
involved in the three components are
interdependent, and one cannot simply
multiply them as if they were
independent. A meaningful risk assessment
requires experience and a very good
understanding of system usage. Risk
assessment is complementary to the
design effort - it requires looking at
negative outcomes. Here too, the
perspective of a safety engineer is
valuable, not only because of the
specialized experience but also
because of their ability to disconnect
from mainline function and focus on
negative outcomes, whether anticipated
or unexpected.
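The effect of interdependence can be
shown with a small worked example. The
sketch assumes two of the risk
components, presence of the hazard and
exposure to it, and compares a naive
independent product with the chain
rule. The function names and all
numbers are hypothetical.

```python
def joint_if_independent(p_hazard, p_exposure):
    """Naive estimate: treats hazard and exposure as independent."""
    return p_hazard * p_exposure


def joint_with_dependence(p_hazard, p_exposure_given_hazard):
    """Chain rule: P(hazard and exposure) = P(hazard) * P(exposure | hazard)."""
    return p_hazard * p_exposure_given_hazard


# Hypothetical numbers: operators are near the machine 10% of the time overall,
# but 60% of the time when it is in the hazardous state (e.g. clearing a jam).
naive = joint_if_independent(0.01, 0.10)
dependent = joint_with_dependence(0.01, 0.60)
underestimate_factor = dependent / naive
```

In this made-up case the independent
product understates the joint
probability several times over, simply
because people tend to be present when
the hazardous state occurs.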
Safety Management

In many industries, managing the
safety aspects of the design,
implementation and operation of a
system or installation requires
specialized tools to ensure that no
hazard, safety task or design concern
is overlooked. Safety Management
Systems (SMS), software applications
developed to manage this effort, can
assure management, customers, and
regulatory organizations that safety
requirements of the program are
successfully and continuously met.
In particular, in the aviation
industry, current guidelines for the
establishment of SMS at airports
include:
ICAO Annex 14:
A systematic approach to managing
safety, including the necessary
organizational structures,
accountabilities, policy and
procedures.
FAA AC 150/5200-37:
Formal, business-like approach to
managing safety risk. It includes
systematic procedures, practices and
policies for the management of safety.
The main goals of a safety management
system are to:
-
identify possible hazards with
significant risk of an accident,
injury or damage;
-
select appropriate corrective action
to eliminate this risk or to reduce
it to acceptable levels;
-
monitor the corrective action taken
and test its effectiveness.
All three can be controlled via a
single SMS application that is
flexible enough to meet these
requirements throughout the program
lifecycle.
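The three goals map naturally onto
the life of a single hazard record.
The sketch below is a generic
illustration of that cycle; it does
not describe any particular SMS
product, and the class, status names
and sample entries are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    IDENTIFIED = "identified"
    ACTION_SELECTED = "action selected"
    CLOSED = "closed"


@dataclass
class HazardRecord:
    description: str
    status: Status = Status.IDENTIFIED
    corrective_action: Optional[str] = None

    def select_action(self, action):
        self.corrective_action = action
        self.status = Status.ACTION_SELECTED

    def record_monitoring_result(self, effective):
        if self.status is not Status.ACTION_SELECTED:
            raise ValueError("no corrective action is awaiting verification")
        # An ineffective action sends the hazard back for a new decision.
        self.status = Status.CLOSED if effective else Status.IDENTIFIED


def open_records(log):
    """Hazards that still need an action or a successful verification."""
    return [r for r in log if r.status is not Status.CLOSED]


# Hypothetical log entries.
log = [HazardRecord("runway incursion at taxiway crossing"),
       HazardRecord("worn apron markings")]
log[0].select_action("add stop-bar lighting")
log[0].record_monitoring_result(effective=True)
still_open = open_records(log)
```

Keeping the status on the record makes
it straightforward to report, at any
time, which hazards still lack a
verified corrective action.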
Establishing a formal reporting
procedure within safety management
is an important element that allows
monitoring of the level of safety
performance achieved throughout the
organization. The organization thus
stays aware of every possible threat
or risk and can take appropriate
corrective action to minimize these
risks.
Our FavoWeb FRACAS software tool
provides the four essential
requirements of a successful SMS:
1.
Collection and management of
operational data
FavoWeb FRACAS functionality and
configuration allow for uniform and
easy data collection. Forms are
flexible and allow customization of
the fields and data to be collected
and tracked. FavoWeb FRACAS is
user-friendly and, wherever possible,
operates with "point-and-click"
options, so that data collection is
easy and therefore accurate. This also
facilitates analysis and reporting, as
"free-text" input is minimized.
2.
Analysis of the data
FavoWeb FRACAS offers numerous
reliability and statistical trending
reports combined with a flexible query
mechanism.
3.
Risk Assessment
Our RAM Commander Safety Assessment
Software Module, linked to the FavoWeb
application, implements all
qualitative and quantitative tasks for
safety assessment required during
system development:
-
Generation and verification of safety
requirements;
-
Identification of all relevant failure
conditions;
-
Consideration of all significant
combinations of failures causing
failure conditions;
-
Generation of output reports beginning
with Functional Hazard Analysis (FHA/PHA)
and ending with System Safety
Assessment (SSA), verifying that every
aspect of the design meets safety
requirements.
4.
Corrective Action Effectiveness
Assessment
FavoWeb FRACAS Corrective Actions
module provides full support of all
corrective action activities: