
ASCAD

/az-kad/

Adelard
Safety Case
Development
Manual

First Published 1998 by Adelard


College Building, Northampton Square, London EC1V 0HB

© Adelard, 1998

Foreword
Following research for the UK HSE/NII in the 1990s, Adelard published its Safety
Case Development Manual (ASCAD) in 1998. It has since been used successfully
in many organisations worldwide.
In support of the safety community, Adelard has decided to make the manual
publicly available. It can be downloaded, after registration, from our website:
http://www.adelard.com/resources/ascad
While now available free of charge to individuals, copyright is retained by
Adelard. Conditions of use are:

The manual may only be used by the individual who downloads the
document. It may not be passed on to anyone else without permission
from Adelard. Other interested parties should download the document
from our website. Anyone who has difficulty downloading the document
should contact Adelard to discuss other options.
The manual may be used freely by registered users, both for commercial
and non-commercial use.
While Adelard believes the content to be accurate, it accepts no
responsibility for any consequence of use, either direct or indirect. Use of
the manual implies acceptance of this and all other conditions.
The content of the manual may not be reproduced in any format (other
than for backup purposes) without agreement from Adelard in writing.
The document may be used in support of both academic teaching and
research, and in both cases some of the above restrictions may be
waived. Contact <office@adelard.com> for more information.
The document is available free of charge in softcopy only. Hard copy
versions are available at a nominal reproduction charge. Contact
<office@adelard.com> for more information.

Published 1998 by Adelard, 3 Coborn Rd, London E3 2DA


Published 2006 by Adelard, College Building, Northampton Square, London EC1V 0HB
All rights reserved. No part of this publication may be reproduced, stored in a retrieval
system, or transmitted in any form, or by any means electronic, mechanical,
photocopying, recording or otherwise without prior permission in writing from Adelard.
British Library Cataloguing in Publication Data
ASCAD: Adelard Safety Case Development Manual

ISBN 0 9533771 0 5

Adelard Safety Case Development Manual

Adelard
Adelard is an independent consultancy founded in 1987 by Robin Bloomfield and
Peter Froome. Adelard works on a wide spectrum of problems in the area of the
assurance and development of safety related computer-based systems, ranging
from formal machine assisted verification to the human and social vulnerabilities
of organisations. We also apply this specialist knowledge to the development and
verification of real industrial systems.
http://www.adelard.com

Adelard of Bath
Adelard takes its name from Adelard of
Bath, a medieval mathematician and
natural philosopher, a crucial figure in the
development of early European thought,
and a major influence in the
revolutionary adoption of the Arabic
notation for numbers instead of the
intractable Roman numerals.
Adelard's most influential works were on mathematics. He translated Euclid's
Elements (still the basis of much of today's mathematics) from Arabic into Latin,
the international language of European scholarship. He was also the author of a
Latin version of a treatise on Arabic arithmetic by al-Khwarizmi, the great Saracen
mathematician whose name, corrupted to "algorism", became the European word
for the new system of numbers.

Version: 1.0


Contents
Part 1 Introduction............................................................................................................. 7
1 Scope ............................................................................................................................. 7
2 What is a safety case?.................................................................................................. 8
3 The importance of a good safety case ...................................................................... 8
4 Basis of the ASCAD methodology ............................................................................... 8
5 How to use the manual................................................................................................. 9
6 Feedback..................................................................................................................... 11
7 Acknowledgements ................................................................................................... 11
Part 2 Description of the safety case methodology.................................................... 13
1 Introduction.................................................................................................................. 13
2 Overview of approach ............................................................................................... 14
2.1 Safety case principles ......................................................................................... 14
2.2 Safety case structure ........................................................................................... 14
2.3 Types of claim....................................................................................................... 17
2.4 Sources of evidence............................................................................................ 17
2.5 Style of argument................................................................................................. 18
3 Safety case development.......................................................................................... 21
3.1 Safety case elements .......................................................................................... 21
4 Developing Preliminary safety case elements ........................................................ 22
4.1 Definition of system and project ........................................................................ 23
4.1.1 Operating context ............................................................................................ 23
4.1.2 Identify any defined PES (Programmable Electronic System) or component
safety requirements..................................................................................................... 24
4.1.3 Existing safety and project information ........................................................ 24

Version: 1.1


4.2 Develop claims from attributes ..........................................................................24


4.2.1 Computer system architecture ...................................................................... 25
4.2.2 Software attributes............................................................................................ 26
4.3 Traceability between levels ................................................................................27
4.4 Establish project constraints................................................................................28
4.5 Long term issues....................................................................................................29
5 Developing Architectural safety case elements .....................................................31
5.1 Design for assessment..........................................................................................32
5.1.1 Keeping it simple (KISS)...................................................................................... 33
5.1.2 Partitioning according to criticality ................................................................ 34
5.1.3 Avoidance of novelty........................................................................................ 35
5.2 Sources of evidence ............................................................................................35
5.3 Design assumptions..............................................................................................36
5.4 Choosing a suitable system architecture and safety case ............................36
5.5 Risk assessment and review ................................................................................37
6 Developing Implementation safety case elements ................................................39
6.1 Attribute-claim-evidence tables........................................................................40
6.2 Risk Assessment and review ................................................................................41
7 Operation and installation safety case elements ..................................................43
8 Project safety case structure......................................................................................44
8.1 Relationship to project lifecycle and structure ................................................44
8.2 Influence of types of system ...............................................................................46
8.3 Subsystem safety case.........................................................................................47
9 Independent assessment and acceptance of the safety case ............................48
10 Long-term maintenance ..........................................................................................50
11 Contents of a safety case report: documentation issues ......................................52
11.1 Environment description.....................................................................................53
11.2 PES safety requirements......................................................................................54
11.3 PES system architecture......................................................................................54
11.4 Planned and actual implementation approach............................................54


11.5 PES system architecture safety argument ....................................................... 55


11.6 Subsystem design and safety arguments ........................................................ 56
11.7 Long term support requirements....................................................................... 56
11.8 Status information ............................................................................................... 56
11.9 Evidence of quality and safety management ............................................... 57
11.10 References......................................................................................................... 57
Appendix A System safety context............................................................................... 59
A.1 Safety-related standards in the public domain ............................................... 61
A.2 Other safety guidance ........................................................................................ 62
A.3 Example criteria.................................................................................................... 63
A.3.1 Probabilistic criteria ........................................................................................... 63
A.3.2 Deterministic criteria ......................................................................................... 64
A.3.3 Qualitative criteria............................................................................................. 65
Appendix B Design options to limit dangerous failures ............................................. 67
B.1 Computer system defences ................................................................................ 67
B.2 Software defences ............................................................................................... 69
B.3 Operations and maintenance error defences ................................................. 71
Appendix C Checklist of safety documents ............................................................... 73
C.1 Planning ............................................................................................................... 73
C.2 Safety cases......................................................................................................... 73
C.3 Safety related documentation ......................................................................... 74
C.4 Project implementation ..................................................................................... 74
C.5 Review and audits .............................................................................................. 74
Appendix D Attribute-claim-evidence tables ............................................................ 75
D.1 Attribute-claim-design tables ............................................................................ 75
D.2 Attribute-claim-argument tables ..................................................................... 79
Appendix E Review of changes that can affect the safety case ............................. 83
E.1 Changed PES system requirements .................................................................... 83
E.2 Impending obsolescence.................................................................................... 85
E.3 Changes to regulatory environment or safety criteria..................................... 88
Appendix F Safety case review checklist .................................................................... 91


F.1 Basis for the checklists ..........................................................................................91


F.2 Demonstrable.....................................................................................................91
F.3 Valid.....................................................................................................................91
F.4 Adequately safe ................................................................................................92
F.5 Over its entire lifetime........................................................................................92
F.6 Checklist for the technical adequacy of the arguments ................................93
F.6.1 Completeness of argument ............................................................................. 93
F.6.2 Credibility of argument ..................................................................................... 94
F.6.3 Integrity of the safety case documentation and system design .............. 94
F.6.4 Checklist for integrity of the operations and maintenance infra-structure .... 94
F.7 Long-term maintainability of the safety case....................................................95
F.7.1 Robustness to system change.......................................................................... 95
F.7.2 Long-term integrity of the safety case support infra-structure .................. 95
F.7.3 Impact of technological obsolescence ........................................................ 96
F.7.4 Impact of regulatory change .......................................................................... 96
Appendix G Use of field evidence to support a reliability claim...............................97
G.1 Empirical evidence.............................................................................................97
G.2 Theoretical analysis .............................................................................................99
G.3 Application of the theory to COTS..................................................................101
G.4 Application to a new system ...........................................................................103
G.5 Estimating residual faults ..................................................................................103
G.6 References .........................................................................................................105
Appendix H Long term issues......................................................................................107
H.1 Introduction ........................................................................................................107
H.2 Incorporating the guidance in existing safety management processes ...107
H.3 Long-term improvement of the safety methodology ...................................109
H.4 Safety case maintenance documentation ...................................................109
Appendix I Maintenance and human factors ..........................................................113
I.1 Individual weaknesses..........................................................................................113
I.2 Supporting materials ...........................................................................................115
I.3 Violations ...............................................................................................................115
I.4 Group weaknesses ...............................................................................................115


I.5 Organisational issues ........................................................................................... 116


I.6 Knowledge management .................................................................................. 117
I.7 References ........................................................................................................... 118
Appendix J Example checklist: long term issues ...................................................... 119
J.1 Basis for the checklists........................................................................................ 119
J.2 Remain acceptable....................................................................................... 120
J.2.1 Demonstrable.................................................................................................... 121
J.2.2 Consistent........................................................................................................... 123
J.2.3 Valid .................................................................................................................... 124
J.2.4 Adaptable ......................................................................................................... 124
J.3 Respond to changes in the equipment, environment, and technical knowledge ...... 125
J.3.1 Equipment Changes........................................................................................ 125
J.3.2 Changes in the environment ......................................................................... 126
J.3.3 Changes in technical knowledge ................................................................ 127
J.4 The checklists ...................................................................................................... 129
J.5 Demonstrable ..................................................................................................... 129
J.5.1 Human resources............................................................................................. 129
J.5.2 Documentation ............................................................................................... 131
J.5.3 Technical resources ........................................................................................ 132
J.6 Consistent............................................................................................................ 132
J.7 Valid ..................................................................................................................... 133
J.8 Adaptable........................................................................................................... 134
J.9 Respond to changes in the equipment ......................................................... 136
J.10 Respond to changes in the environment .................................................... 137
J.11 Respond to changes in the technical knowledge ...................................... 139
J.12 Long-term improvement of the safety methodology ................................. 140
Appendix K Example safety case.............................................................................. 141
K.1 The environment ................................................................................................. 141
K.1.1 The plant ............................................................................................................ 141
K.1.2 Sensors and actuators..................................................................................... 141
K.1.3 Failure modes.................................................................................................... 141
K.2 Trip system requirements.................................................................................... 141


K.3 Candidate system architecture ........................................................................142


K.3.1 Redundant channels and thermocouples ................................................. 143
K.3.2 Fail-safe design features ................................................................................. 144
K.3.3 Separate monitor computer.......................................................................... 145
K.3.4 Simplicity ............................................................................................................ 145
K.3.5 Formally proved software ............................................................................... 145
K.3.6 1oo2 high trip logic .......................................................................................... 145
K.3.7 2oo2 low trip logic............................................................................................ 146
K.3.8 Program and trip parameters in PROM ....................................................... 146
K.3.9 Modular hardware replacement.................................................................. 146
K.3.10 Use of mature hardware and software tools............................................ 146
K.3.11 Access constraints.......................................................................................... 146
K.3.12 Summary of design features contributing to safety ................................ 146
K.4 Evidence from the development process .......................................................148
K.5 Long term support activities ..............................................................................148
K.6 Arguments supporting the safety claims..........................................................149
K.7 Supporting analyses............................................................................................153
K.7.1 Probabilistic fault tree analysis....................................................................... 153
K.7.2 Anticipated change analysis......................................................................... 155
K.7.3 Analysis of maintenance and operations ................................................... 156
K.8 Safety long-term support requirements............................................................157
K.8.1 Support infrastructure ...................................................................................... 157
K.8.2 Maintenance support risks ............................................................................. 158
K.8.3 Regular analyses .............................................................................................. 159
K.9 Elaboration to subsystem requirements ...........................................................159
K.9.1 Software Functional requirements ................................................................ 160
K.9.2 Safety case design constraints imposed on the software ....................... 161
K.9.3 Safety case evidence requirements for the software development .... 162
K.9.4 Software documentation/QA requirements............................................... 162
K.10 References: ......................................................................................................162
Appendix L Index .........................................................................................................163


Part 1 Introduction
1 Scope
This manual defines the Adelard safety case development methodology (ASCAD),
which seeks to minimise safety risks and commercial risks by constructing a
demonstrable safety case. The ASCAD methodology places the main emphasis
on claims about the behaviour of the system (i.e. functional behaviour and system
attributes) and on methods for structuring safety arguments so that they are both
understandable and traceable.
The overall approach used in ASCAD is generic and applicable across a wide
range of technologies. The details of the approach are concerned with safety
cases for computer based command, control and protection systems such as
those found in railway signalling, nuclear reactor protection, air traffic control and
safety critical medical devices as well as many diverse military applications.
ASCAD can be applied both to new systems, using bespoke or COTS
components, and to the retrospective development of safety cases.
Many problems in producing an acceptable safety case arise from an
attitude that regards the safety case as a "bolt-on" accessory to the system,
often produced after the system has been built. At that stage it is often
discovered that retro-fitting the supporting safety case is both expensive and
time consuming, because the design does not minimise the scope of assessment
and the retrospective production of evidence is costly. The overall ASCAD
approach can be applied to existing systems, but the safety case options are
more constrained.
The manual assumes that the reader is familiar with the concepts of safety
management systems, quality management systems and safety analysis in
general. There is already a large body of guidance in these areas and the
uniqueness of this manual is its emphasis on addressing the construction of safety
cases. We also assume a familiarity with the system safety context as elaborated
in Appendix A.


2 What is a safety case?


We define a safety case as:
A documented body of evidence that provides a convincing and
valid argument that a system is adequately safe for a given
application in a given environment
The safety case is a living set of documents which evolve over the life of the
system. In practice the arguments of the safety case are contained in the safety
case report, a document defining and describing the overall safety case, with
references to a number of supporting documents.
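The structure implied by this definition can be pictured as a tree of claims, each supported by sub-claims or by items of evidence. The following Python sketch is purely illustrative; the class names and the example claims are our own invention, not part of the manual:

```python
# Illustrative sketch only: a safety case modelled as a tree of claims,
# each supported by direct evidence and/or by subsidiary claims.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Evidence:
    description: str

@dataclass
class Claim:
    statement: str
    subclaims: List["Claim"] = field(default_factory=list)
    evidence: List[Evidence] = field(default_factory=list)

    def is_supported(self) -> bool:
        """A claim counts as supported if it has direct evidence,
        or if it has sub-claims that are all themselves supported."""
        if self.evidence:
            return True
        return bool(self.subclaims) and all(c.is_supported() for c in self.subclaims)

# Hypothetical top-level claim with two supporting sub-claims.
top = Claim(
    "The system is adequately safe in its environment",
    subclaims=[
        Claim("The trip logic meets its reliability target",
              evidence=[Evidence("statistical testing results")]),
        Claim("The software is free of critical design faults",
              evidence=[Evidence("formal proof of the trip algorithm")]),
    ],
)
```

In this toy model the safety case report corresponds to a readable rendering of such a tree, with each leaf pointing to a supporting document.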

3 The importance of a good safety case


It is important that an adequate safety case is produced for a safety related
system in order to:
1. Ensure an adequate level of safety
2. Ensure that safety is maintained throughout the lifetime of the system
3. Minimise licensing risk (being able to demonstrate safety to the regulators
and assessors)
4. Minimise commercial risk (ensuring implementation and maintenance
costs are acceptable)
A safety case is a requirement in many safety standards and industries. Explicit
safety cases are required for military systems, the offshore oil industry, rail transport
and the nuclear industry. Furthermore, equivalent requirements can be found in
other industry standards, such as the emerging IEC 61508 (which requires a
functional safety assessment), the EN 292 Machinery Directive (which requires a
"technical file") and DO 178B for avionics (which requires an "accomplishment
summary").

4 Basis of the ASCAD methodology


Adelard has developed this methodology over several years. Initially the ideas
were the product of research studies, but this methodology has been applied in:

Specific safety cases for a number of command and control systems


The safety case for the DUST-EXPERT advisory software produced by
Adelard for the Health and Safety Executive

The development of safety standards such as MOD Def Stan 00-55

A generalised form in Def Stan 00-42 Part 2, as the software reliability case

The development of a Software Assessment Manual to IEC 61508 for
Factory Mutual Research Corp.

The approach has evolved during this period, but largely through extensions to
the methodology rather than changes to earlier ideas. While the methodology is
likely to evolve further, we believe that the current ASCAD provides a good basis
for safety case development.

5 How to use the manual


Part 1 of the manual provides some introductory material. The methodology itself
is described in Part 2. Part 2 is structured as follows:
Section 2 provides an overview of the technical approach
Section 3 defines the different elements of safety cases and their relationship
to the project and safety lifecycles. Guidance on the development of the
safety case elements is provided in detail in Sections 4 to 7.
Section 4 provides guidance on the Preliminary safety case element
Section 5 provides guidance on the Architectural safety case element. This
leads into the need for design for assessment, covered in Section 5.1
Section 6 provides guidance on the Implementation safety case element
Section 7 describes the Operation and Installation safety case element
Section 8 describes how to combine the safety case elements into a safety
case structure for a real project
Section 9 discusses the independent assessment of the safety cases
Section 10 deals with the important topic of long term maintenance of the
safety case
Section 11 discusses the contents of the safety case report and references
checklists of supporting documents.


The main guidance is supplemented by appendices containing supporting


information, checklists, and an example safety case for a simple application.
The manual can be read from a number of different viewpoints.
New to safety cases?
Section 2 provides an overview of the approach. Also consider Appendix A
which provides an overview of the system safety context. It might also be
worth browsing Section 11 to get an idea of the contents of a safety case
and the example in Appendix K.
Wishing to construct a safety case?
It may be worth browsing the introductory material but the main starting
point is Section 3. This will provide signposts to the guidance on the different
elements of the safety case. Before starting it would be worth looking at the
example in Appendix K.
An experienced safety case developer?
Again, browse the overview and start with Section 3. One of the differences
between this manual and other published material is the emphasis on design
for assessment (see Section 5.1 and the checklists in Appendix D). The
tabular approach to safety case development will also probably be new
(see Sections 5 and 6.1). Appendix G, which discusses at length
the use of field experience, may also be unfamiliar.
An independent assessor or regulator?
There is a specific section on assessment (Section 9) with a supporting
checklist in Appendix F. Long term issues are often not given sufficient weight
so consider Section 10 as well.
Concerned about long term issues?
Section 10 is specifically devoted to long term issues and refers to a number
of supporting appendices.
Developing a safety case retrospectively?
The methodology applies to existing systems as well as new ones, although the
evidence and arguments used may differ. Of particular interest will be
Appendix G, which discusses at length the use of field experience.
There are a variety of checklists to support the safety case construction. These are
indicated with a tick.


6 Feedback
We are keen to receive feedback on this manual. Please send comments to
ascad@adelard.co.uk, visit our web page at http://www.adelard.co.uk, or write
to Robin Bloomfield, Adelard, 3 Coborn Road, London E3 2DA.

7 Acknowledgements
The manual was produced by Peter Bishop, Robin Bloomfield, Luke Emmet, Claire
Jones and Peter Froome. Some of the underlying technical work was undertaken
in the CEC sponsored SHIP project (ref. EV5V 103). More recent material has come
from the Quarc project funded by the UK (Nuclear) Industrial Management
Committee (IMC) Nuclear Safety Research Programme under Scottish Nuclear
contracts 70B/0000/006384 and PP/74851/HN/MB.

Part 2 Description of the safety case methodology
1 Introduction
This manual describes our approach to developing safety cases for computer
based command, control and protection systems. It provides technical rationale,
an explanation of how to construct safety cases, and supporting checklists and
examples to help with the efficient and practical development of safety cases.
The manual is structured as follows:
Section 2 provides an overview of the technical approach
Section 3 defines the different elements of safety cases and their relationship
to the project and safety lifecycles. Guidance on the development of the
safety case elements is provided in detail in Sections 4 to 7 below.
Section 4 provides guidance on the Preliminary safety case element
Section 5 provides guidance on the Architectural safety case element. This
leads into the need for "design for assessment", covered in Section 5.1
Section 6 provides guidance on the Implementation safety case element
Section 7 describes the Operation and Installation safety case element
Section 8 describes how to combine the safety case elements into a safety
case structure for a real project
Section 9 discusses the independent assessment of the safety cases
Section 10 deals with the important topic of long term maintenance of the
safety case
Section 11 discusses the contents of the safety case report and references
checklists of supporting documents.

The main guidance is supplemented by appendices containing supporting
information, checklists, and an example safety case for a simple application.

2 Overview of approach
2.1 Safety case principles
We define a safety case as:
a documented body of evidence that provides a demonstrable and
valid argument that a system is adequately safe for a given application
and environment over its lifetime.
To implement a safety case we need to:
make an explicit set of claims about the system
produce the supporting evidence
provide a set of safety arguments that link the claims to the
evidence
make clear the assumptions and judgements underlying the
arguments
allow different viewpoints and levels of detail
The following sections describe how we think a safety case should be structured
to meet these goals.

2.2 Safety case structure


The safety case should:
make an explicit set of claims about the system
provide a systematic structure for marshalling the evidence
provide a set of safety arguments that link the claims to the evidence
make clear the assumptions and judgements underlying the arguments
provide for different viewpoints and levels of detail

A safety case consists of the following elements: a claim about a property of the
system or some subsystem; evidence which is used as the basis of the safety
argument; an argument linking the evidence to the claim, and an inference
mechanism that provides the transformational rules for the argument. This is
summarised in the figure below.
[Figure: pieces of evidence and a sub-claim are combined through inference rules into an argument structure supporting the claim.]
Figure 1: Safety case structure


Note that evidence can be a sub-claim produced by a subsidiary safety case.
This means that there can be a relatively simple top-level argument, supported by
a hierarchy of subsidiary safety cases. This structuring makes it easier to
understand the main arguments and to partition the safety case activities.
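This claim-evidence-argument structure can be captured in a simple recursive data model. The sketch below is our own illustration, not part of the manual (the class names and the example claim are invented); it shows how sub-claims nest inside a top-level argument and how assumptions can be tracked:

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Evidence:
    """A fact or assumption used as the basis of an argument."""
    description: str
    kind: str  # e.g. "fact" or "assumption"

@dataclass
class Claim:
    """A claim about a property of the system or a subsystem.

    Support items may themselves be Claims from subsidiary safety
    cases, giving the recursive, hierarchical structure."""
    statement: str
    inference_rule: str = ""  # transformational rule for the argument
    support: List[Union[Evidence, "Claim"]] = field(default_factory=list)

    def unsupported_assumptions(self) -> List[str]:
        """Walk the hierarchy and list every assumption still relied on."""
        found = []
        for item in self.support:
            if isinstance(item, Claim):
                found.extend(item.unsupported_assumptions())
            elif item.kind == "assumption":
                found.append(item.description)
        return found

top = Claim(
    "System is adequately safe for the given application",
    inference_rule="all sub-claims hold",
    support=[
        Claim("Dangerous failure rate meets the target",
              inference_rule="statistical testing",
              support=[Evidence("5,000 h failure-free reliability test", "fact"),
                       Evidence("Test profile matches operational profile",
                                "assumption")]),
    ],
)
print(top.unsupported_assumptions())
```

Tracking assumptions this way supports the status change from "unsupported" to "verified" described below: once field evidence is obtained, the Evidence item's kind is simply updated to "fact".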
Different types of argument can be used to support claims for the attributes:

Deterministic: application of predetermined rules to derive a true/false claim (given some initial assumptions), e.g. formal proof of compliance to a specification, or demonstration of a safety requirement (such as execution time analysis or exhaustive test of the logic)

Probabilistic: quantitative statistical reasoning, to establish a numerical level (e.g. MTTF, MTTR, reliability testing)

Qualitative: compliance with rules that have an indirect link to the desired attributes (e.g. compliance with QMS standards, staff skills and experience)
The choice of argument will depend on the available evidence and the type of
claim. For example claims for reliability would normally be supported by statistical
arguments, while other claims (e.g. for maintainability) might rely on more
qualitative arguments such as adherence to codes of practice.
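The probabilistic style of argument can be illustrated with a standard worked example (our own, not from the manual): after N independent failure-free test demands drawn from the operational profile, the probability of failure on demand (pfd) can be bounded at a given one-sided confidence level. The figures below are illustrative only:

```python
import math

def pfd_bound(failure_free_demands: int, confidence: float = 0.95) -> float:
    """Conservative upper bound on the probability of failure per demand,
    given `failure_free_demands` independent demands observed with no
    failure, at the stated one-sided confidence level.
    From (1 - pfd)**N >= alpha, approximated as pfd <= -ln(alpha)/N."""
    alpha = 1.0 - confidence
    return -math.log(alpha) / failure_free_demands

# roughly the "rule of three": about 3/N at 95% confidence
print(pfd_bound(3000))  # about 1e-3
```

This makes concrete why purely statistical claims become expensive for high reliability targets: supporting pfd < 1e-4 at 95% confidence needs around 30,000 failure-free demands.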
In addition the overall argument should be robust, i.e. it should be valid even if
there are uncertainties or errors. For example, two independent arguments could
be used to support the top level safety claim about a given system. Alternatively,
if there are two independent systems that can assure safety, it may only be
necessary to have a single argument for each one. Typically the strength of the
argument will depend on the integrity level associated with the specific system. At
the highest integrity level (Level 4) we might expect two independent arguments
for a single system regardless of the existence of other systems, as illustrated in Figure 2 below.

[Figure: two independent arguments, drawing on separate bodies of evidence, both support the same claim.]
Figure 2: Illustration of a robust claim


The development of the safety case should be alert to the possibility of evidence
that detracts from or possibly refutes the claims being made.
The safety case needs to be viewed at various levels of detail. A top-level safety case might be decomposed into a hierarchy of sub-claims which are treated as evidence in the top-level safety case, so the evidence used at one level of the argument can be:

facts, e.g. based on established scientific principles and prior research
assumptions, which are necessary to make the argument, but may not always apply in the real world
sub-claims, derived from a lower-level sub-argument

This is a recursive structure which can represent arguments at successively finer levels of detail. This structure could evolve over the lifetime of the project. Initially some of the sub-claims might actually be design targets, but as the system develops the sub-claims might be replaced by facts or more detailed sub-arguments based on the real system. Deviations in implementation can be analysed to see how they affect a sub-claim, and how changes in a sub-claim ripple through the safety argument.
If correctly designed, the top-level safety case should remain substantially the same as the design evolves, and many of the detailed sub-arguments and evidence can be referenced out to supporting documents or subsidiary safety cases. The evolution of the safety case at the top level should be confined mainly to the changing status of the supporting sub-claims and assumptions. For example there may be an assumption that some tool will operate correctly, and this may be later supported by explicit field evidence or analysis. The status of the assumption would then change from unsupported to verified, with a cross-reference to the supporting document.

2.3 Types of claim


The safety case is broken down into claims about different attributes for the
various sub-systems, e.g.:
reliability and availability

usability (by the operator)

security (from external attack)

fail-safety

functional correctness

accuracy

time response

robustness to overload

maintainability

modifiability

The relevant attributes should be identified and, where possible, quantified. Note that the attributes listed are only examples and further attributes may be safety-relevant. This is elaborated later in Section 4.2.1.

2.4 Sources of evidence


The arguments themselves can utilise evidence from the following main sources:

the design

the development processes

simulated experience (via reliability testing)

prior field experience

The choice of argument will depend in part on the availability of such evidence,
e.g. claims for reliability might be based on field experience for an established
design, and on development processes and reliability testing for a new design.

2.5 Style of argument


In safety related systems, we are primarily concerned with dangerous failures, and
the safety argument should be focused on ways of inhibiting a dangerous failure.
To do this, we first have to establish whether there is a known safe state for the
safety system or component. In the application described in Appendix K, there is
a known safe state so the design can be biased to fail in that direction.
Even in cases where there is no safe state for the hazardous plant (e.g. an aircraft,
or an unstable chemical process) it may still be possible to identify a safe state for
the subsystem (such as a transfer to a backup system or manual control). The
nature of the safety case argument will depend on the existence or otherwise of
these safe states.
This possible strategy for maximising safety is very similar to the one followed for
the top-level plant safety (i.e. hazard elimination, control, and accident
mitigation). For the subsystems, the hazards are more indirect: a hazardous
subsystem state could, in combination with other failures, lead to an accident. At
these lower levels, we need to consider how hazardous subsystem states could
arise (i.e. what random and systematic faults could lead to a hazardous state),
and then minimise the probability of occurrence by a combination of fault
elimination, fault tolerance, and failure mitigation (normally fail-safety).
We can characterise the various approaches to limiting the dangerous failure
rate in the following figure.


[Figure: a state model with states OK, Error, Safe and Danger. Fault activation moves the system from the OK state to the Error state; this transition depends on fault freeness (KISS, partitioning, novelty, implementation quality, past operating experience). Error correction returns it from the Error state to the OK state, depending on fault tolerance in the design and the nature of the application (grace time, self-healing). Safe failure moves it from the Error state to the Safe state, depending on fail-safe design, partitioning and the existence of safe states. Dangerous failure moves it from the Error state to the Danger state.]
Figure 3: Model of system failure behaviour
This fault-error-failure model can be applied at the level of a complete system or
for sub-components (e.g. the software level). A fault is a defect in the system and
is the primary source of the failure. However a system will probably operate as
intended until some triggering input condition is encountered. Once triggered,
some of the output values will deviate from the design intent (an error). However
the deviation may not be large enough (or persist long enough) to be dangerous,
so the system may recover naturally from the glitch in subsequent computations
(self healing). Alternatively explicit design features (e.g. diversity or safety
kernels) can be used to detect such deviations and either recover the correct
value (error recovery) or override the value with a safe alternative (fail-safety).
The chance of a dangerous failure transition would normally be expressed in
probabilistic terms, but it might also be expressed in deterministic terms (this can
never happen), or qualitative terms (e.g. two barriers must fail before this can
happen).


A particular safety argument can focus on claims about particular transition arcs.
The main approaches are listed below:
A fault elimination argument can increase the chance of being in the
perfect state and can hence reduce or eliminate the OK to erroneous
transition. This is the reasoning behind the requirement to use formal methods
(e.g. in MOD DS 00-55), which essentially supports a claim that the error
transition rate is zero because the software correctly implements the
specified logical behaviour.
A failure containment argument can strengthen the erroneous to OK or
erroneous to safe transition. An example would be a strongly fail-safe design
which quantifies the fail-safe bias. This, coupled with test evidence bounding
the error activation rate, would be sufficient to bound the dangerous failure
rate.
A failure rate estimation argument can estimate the OK to dangerous
transition. The whole system is treated as a black box, and probabilistic
arguments are made about the observed failure rate based on past
experience or extensive reliability testing.
It is also possible to apply the arguments selectively to particular components or
fault classes, e.g.:
A design incorporates a safety barrier, which can limit dangerous failures
occurring in the remainder of the system. The safety argument would then
focus on the reliability of the barrier rather than the whole system.
Different countermeasures might be utilised for different classes of fault.
Each fault class then represents a separate link in the argument chain, and
all fault classes would have to be covered to complete the argument chain.
For example, design faults might be demonstrated to be absent by formal
development, while random hardware failures are covered by hardware
redundancy.
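The failure containment argument above can be sketched numerically. This is our own illustration with invented figures: test evidence bounds the error activation rate, and a quantified fail-safe bias bounds the fraction of activated errors that end dangerously:

```python
def dangerous_failure_rate(error_activation_rate: float,
                           fail_safe_bias: float) -> float:
    """Bound the dangerous failure rate given:
    - error_activation_rate: bound on OK-to-error transitions per demand,
      e.g. obtained from statistical testing
    - fail_safe_bias: claimed ratio of safe failures to dangerous failures
      for the fail-safe design
    The fraction of activated errors that end dangerously is 1/(1 + bias)."""
    return error_activation_rate / (1.0 + fail_safe_bias)

# e.g. activation rate bounded at 1e-3 per demand, with a 99:1 fail-safe bias
print(dangerous_failure_rate(1e-3, 99.0))  # 1e-5 per demand
```

The attraction of this argument is that neither figure alone needs to reach the target: moderate test evidence combined with a demonstrable fail-safe bias can together bound the dangerous failure rate.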
While normally applied to incorrect logical behaviour, the same approach can be applied to many of the other safety attributes. For instance, to ensure timeliness, timing errors could be:

eliminated by a design that ensures a maximum response time
mitigated using independent timing checks to force the output to a safe state

A similar strategy could be applied for other safety-related attributes (e.g. accuracy, security and maintainability).
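The timing mitigation above (an independent timing check that forces the output to a safe state) might look like this in outline. This is an illustrative sketch only; the function names, deadline and safe value are invented:

```python
import time

SAFE_OUTPUT = 0.0    # assumed known safe state for this output
DEADLINE_S = 0.010   # assumed maximum response time requirement

def compute_output(sensor_value: float) -> float:
    """Stand-in for the application calculation."""
    return sensor_value * 2.0

def guarded_output(sensor_value: float) -> float:
    """Independent timing check: if the calculation overruns its
    deadline, override the result with the safe alternative."""
    start = time.monotonic()
    result = compute_output(sensor_value)
    if time.monotonic() - start > DEADLINE_S:
        return SAFE_OUTPUT  # fail-safe: force the safe state
    return result

print(guarded_output(21.0))  # 42.0 when the deadline is met
```

In a real system the check would typically run on an independent timer or watchdog rather than in-line, so that the mitigation does not share failure modes with the calculation it guards.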

20

Version: 1.1

Adelard Safety Case Development Manual

3 Safety case development


3.1 Safety case elements
The development of a safety case does not follow a simple step by step process
as the main activities interact with each other and iterate as the design proceeds
and as the level of component in the system changes. We have identified four
elements from which, in different combinations, one can construct the safety
cases required on a real project. The elements are:
Preliminary
Architectural
Implementation
Operation and Installation
The scale and nature of a project will determine the number and type of safety case elements required. There may be a recursive structure, with multiple Preliminary and Architectural elements within the system and subsystem safety cases. The structuring of these elements on real projects is discussed in Section 8.
Whether one is considering a new or an off-the-shelf system, the characteristics of the safety case elements are as follows.
Preliminary safety case element
1. This establishes the system context, whether the safety case is for a
complete system or a component within a system, and the phase of the
project lifecycle
2. It also establishes safety requirements and attributes for the level of the
design and interfaces to the system safety analysis
3. It defines operational requirements and constraints such as maintenance
levels, time to repair.
Architectural safety case element
1. This defines the system or sub-system architecture and makes trade-offs
between the design of the system and the options for the safety case.


2. It defines the assumptions that need to be validated and the evidence to be provided in the component safety cases.
3. It also defines how the design addresses the preliminary operating and
installation aspects for the safety case (e.g. via maintainability,
modifiability, and usability attributes).
Implementation safety case element
1. This safety case provides the justification that the design intent of the
architectural safety case has been implemented, and that the actual
design features and the development process followed provide the
evidence that the safety requirements are satisfied.
2. Additional assumptions for operation and maintenance are identified
and detail provided on how to meet the operational requirements.
Operation and installation safety case element
1. This safety case adds detail to the maintenance and support
requirements identified in the implementation safety case.
2. It defines any safety related operational procedures identified in the
preliminary safety case or Architectural safety case.
3. For a COTS system, the safety case would include the safety justification
of the specific configuration, and human factors-related issues such as
staffing requirements and competence levels, training of operators and
maintenance personnel, and facilities for long-term support.
4. The safety case would also record and resolve any non-compliances with
the original safety requirements.
Note: the reason for separating out the Architectural safety case is the
importance that good design has in ensuring safety. In our experience this is often
a neglected area of safety engineering and standards.

4 Developing Preliminary safety case elements


A Preliminary safety case element establishes the system and safety context:
whether the safety case is for a complete system or a component within a system,
the phase of the project lifecycle and defines the safety requirements and
attributes. To achieve this it is necessary to:
1. Define the system and equipment that a safety case is being developed
for and assess existing information about the project

22

Version: 1.1

Adelard Safety Case Development Manual

2. Select relevant attributes and define safety requirements as claims from them
3. Provide traceability to system and other sub-system safety cases
4. Establish project constraints on design options and availability of
evidence
5. Assess potential long term changes to the safety case context
The extent to which this involves just marshalling existing information and the
extent to which it requires new analysis is of course very project dependent. Some
legacy systems may have none of this information available.

4.1 Definition of system and project


4.1.1 Operating context
Establish the operating context for the safety system including:

external equipment (e.g. the plant or other equipment)

interfaces to environment (e.g. actuators, sensors, data links)

failure modes of external equipment and interfaces

the safe and hazardous plant states (or equipment states) and target
failure probabilities
hazardous / safe states of the interfaces
anticipated changes in external equipment, interfaces and operating
modes
any operational or maintenance requirements such as maintenance
levels, repair times, manning intervals

4.1.2 Identify any defined PES (Programmable Electronic System) or component safety requirements
Identify any top level safety requirements for the components that have been
defined. These might include:

safety functions


reliability requirements

other safety attributes

applicable design criteria and standards

anticipated changes over its lifetime

4.1.3 Existing safety and project information


Establish the extent and quality of the existing safety documentation and the
requirements of any safety management system. A checklist of safety
documentation is provided in Appendix C.

4.2 Develop claims from attributes


The safety case is broken down into claims about different attributes for the
various sub-systems. The relevant attributes should be identified and, where
possible, quantified. Below is a suggested list of safety attributes at two levels: the
computer system level and the software level.
Note that the attributes listed are only examples and further attributes may be
safety-relevant. Conversely, for some applications not all attributes need be
safety-related. For example

security might be addressed by physical barriers

fault tolerance may be implemented in hardware

time response would not be safety-relevant for off-line stress analysis programs, but it would be necessary to have accuracy and functional correctness

4.2.1 Computer system architecture


Some of these attributes may not be relevant or may be addressed by other parts of the system. The appropriate set of safety attributes should be identified for the specific application.


System attributes
Accuracy
Availability
Fail-safety
Logical correctness
Maintainability (e.g. MTTR)
Maximum input and output data rates
Maximum response time
Maximum storage capacity (e.g. permanent records)
Modifiability (with respect to identified functional changes)
Real-time performance
Reliability (e.g. MTTF, pfd)
Response to hardware failures
Response to internal failures
Response to overload (data rate, internal storage)
Security
Timeliness
Usability
Table 1: Computer system attributes

4.2.2 Software attributes


This is a suggested list of safety attributes for the software. Many of the requirements at the computer system level will be converted into functional requirements for the software and hardware, so for example logical correctness at the software level may be needed to implement the security attribute at the computer system level. Some of these attributes may not be relevant to the software or may be addressed by other parts of the system (e.g. fault tolerance may be implemented entirely in hardware). In addition, the software implementation must cope with the constraints imposed by the specific choice of hardware.
Software attributes
Accuracy
Compliance with hardware constraints (e.g. memory
capacity)
Fail-safety
Fault tolerance
Logical correctness (sometimes represented by the
software integrity level)
Maintainability
Modifiability (with respect to identified functional
changes)
Reliability
Response to hardware failures
Response to internal failures
Response to overload (data rate, internal storage)
Time response
Table 2: Software attributes

4.3 Traceability between levels


The top-level requirements are transformed into derived requirements as the design proceeds, producing a layered safety case. In the example below, a top-level overall safety target (a worst case accident rate) is progressively transformed into derived requirements for subsystems.


[Figure: a plant safety requirement (the target for the top event, an accident probability) is decomposed into safety functions, each with dangerous failure rate and availability targets plus integrity and performance attributes; the system architecture allocates these, as dangerous failure rate, availability and residual attributes, to computer system functions and on to hardware functions and software functions.]
Initially these might be attributes such as security or maintainability, but at a more detailed level of implementation these requirements will be converted into design requirements that are implemented in one or more subsystems. It is important that there is traceability between these levels so that there is a clear link between the design features and the safety attributes. The subsidiary safety cases for the subsystems should identify the design features and present arguments to support claims that they implement the safety attributes. The traceability between levels is illustrated in the figure below:


[Figure: example traceability links from attributes to design features: modifiability to additional software to allow parameterisation; availability to recovery routines, voting algorithms and redundant hardware channels; security to data encryption mechanisms, password authentication and network isolation.]
In this way layered safety cases are developed, i.e. a top-level safety case with
subsidiary traceable safety cases for subsystems.
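Such traceability can be held in a simple machine-checkable form, so that any safety attribute with no supporting design feature is flagged. This sketch is our own illustration, with feature names echoing the example above:

```python
# attribute -> design features claimed to implement it
traceability = {
    "modifiability": ["additional software to allow parameterisation"],
    "availability": ["recovery routines", "voting algorithms",
                     "redundant channels"],
    "security": ["data encryption mechanisms", "password authentication",
                 "network isolation"],
    "maintainability": [],  # not yet addressed by the design
}

def unsupported_attributes(trace: dict) -> list:
    """Return the safety attributes with no supporting design feature."""
    return [attr for attr, features in trace.items() if not features]

print(unsupported_attributes(traceability))  # ['maintainability']
```

A real project would typically maintain this mapping in a requirements management tool, but even a flat table like this makes gaps in the layered safety case immediately visible.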

4.4 Establish project constraints


The extent to which the design can be changed and the availability and costs of different types of evidence are two very important considerations in the development of the safety case strategy. Each project will be faced with different immutable realities, and these should be established as early as possible in the safety case development. Two extreme examples are:

The development of a new safety case in conjunction with a new system

The development of a safety case for an existing legacy system

The first is characterised by freedom to choose design options and an absence of operating experience. The latter has no design freedom but a potentially large body of operating experience.

28

Version: 1.1

Adelard Safety Case Development Manual

Checklist of project constraints


Design freedoms
To what extent can the design of the system be influenced?
If the design of the component is frozen are there other design options
available? e.g. add additional systems or components
Is there scope for design for assessment which takes into account the
costs and complexity of the safety case as well as the design?
Availability of evidence
Is there operating experience with the component? If so, how much,
how similar?
Is there evidence about the engineering process used to develop and
assure the component? If so, what type of data, of what quality? If not,
what are the likely costs of obtaining the evidence?
Is there analytical or empirical evidence about the behaviour of the
component? Does the data include safety relevant situations? Is it
relevant to this application?
Are there existing safety cases e.g. for generic system, or similar system?

4.5 Long term issues


The safety case context should include an assessment of the expected changes over the lifetime of the safety case. A checklist of issues to consider is provided in Table 3 below. The background to the lists is provided in Appendix E.


Checklist of potential changes


Support environment changes
Changes to equipment maintenance and operation procedures (e.g. to
increase the intervals between maintenance)
Obsolescence of test and analysis hardware and software
Obsolescence/ upgrades of support tools (compilers, linkers, archiving
tools, document browsers, etc.)
Hardware changes
Obsolescence of computer hardware
Obsolescence / replacement of interface equipment (sensors,
actuators)
Interfaces to new systems
Software changes
Data changes (trip levels, configuration options)
Functional changes (new safety logic, support for new interfaces)
Changes to other attributes (timing, accuracy, storage capacity)
Changes in safety criteria
More stringent requirements on diversity of subsystems
More stringent requirements for system and channel isolation
Increased integrity requirements for software
Additional / stronger safety arguments to support claim
Table 3: Checklist of potential changes
There is also the need to plan ahead in the design of the safety case and
consider long term support issues. These are discussed in Section 10 and in
particular it may be necessary to conduct a safety case infrastructure assessment
(see Appendix H.2).


5 Developing Architectural safety case elements


An Architectural safety case provides the first level of detail of the safety case. It involves:
1. Establishing the safety requirements either by importing the Preliminary
safety case and/or repeating it for the changes that have occurred (e.g.
revised safety analysis, more detail of design).
2. Evaluating design options or existing features to assess their relevance to
the safety case claims and attributes.
3. Adopting a design for assessment approach to develop a solution for
each safety attribute claim.
4. Elaborating the evidence to show that the claim is met or, more usually for this type of safety case, defining the evidence that is required to be collected.
5. Identifying the requirements that will be passed onto subsystems to
implement the architectural requirements.
6. Undertaking a risk assessment to identify any additional hazards arising
from: random failures, systematic faults or human errors in operations and
maintenance.
7. Assessing any additional risks introduced by the subsystem to ensure they
are acceptable in the context of the overall safety case.
The development of the Architectural safety case can be seen as the progressive completion of a table for each attribute:


Attribute: Functional Behaviour

Claim: this is from the Preliminary safety case. See Section 4.
Design features: these are selected, or those present are evaluated, using the fault avoidance, error tolerance and fail-safe bias approach. See Section 5.1.
Assumption/Evidence: the evidence either needed (assumption) or used to substantiate the claim is recorded here. See Section 5.2.
Subsystem requirements: used to document and trace assumptions. See Section 5.3.

Design options for each attribute are considered under the headings Fault avoidance, Error tolerance and Fail-safe bias. See Section 5.1 and the checklists in Appendix B.

Example tables are provided in Appendix D.

5.1 Design for assessment


In systems where there is design freedom to influence the design, a "design for assessment" approach is advocated, in which the safety system and the safety case arguments are designed in parallel. In other, more constrained situations the design features that can contribute to the safety argument need to be identified and evaluated.
By integrating the safety case into the design, the feasibility and cost of the safety
case construction and maintenance can be evaluated in the initial design phase.
This design for assessment approach should help exclude unsuitable designs
and enable more realistic design trade-offs to be made. The need to
demonstrate safety can be a very significant factor in the overall costs. For
example the Darlington computer-based reactor trip software was considered to be too complex to understand and difficult to maintain. As a result, around 50 man-years of effort was expended in software analysis, combined with extensive statistical testing of the software using simulated trips. This resulted in several
months of licensing delay and the loss of several million pounds in lost generation.
Further costs will be incurred, as the software must be rewritten to make it more
maintainable. In this case, a design that permitted a more convincing safety case
would have been very cost-effective, even if the implementation costs were
higher.
The design should incorporate defences against anticipated hardware failures,
design flaws and human errors that could affect the functional behaviour of the
system. Three main strategies exist for ensuring safety: fault avoidance, error
tolerance and fail-safe bias. The tables in Appendix B identify potential defences
under these three main headings.
It is difficult to be specific about the choice of appropriate design and safety
case options that are likely to be both cost effective and convincing, but some
general design strategies are given below.

5.1.1 Keeping it simple (KISS)


Simplicity has many benefits: it can reduce the costs of implementation, the
safety case is easier to understand and, as a consequence, the risks of licensing
delays are reduced. While this may sound obvious, actually achieving simplicity is
quite difficult and it is all too easy to introduce unnecessary design complexity
which then has to be justified with a more complex safety case and more
extensive evidence.
Avoidance of complexity should be considered at an early stage in the design
process. It is often possible to choose a safety system architecture that eliminates
or at least reduces dependence on complex computer-based systems and
hence reduces the problems of constructing safety software. Take for example a
proposal to replace existing pressure limit switches (illustrated below).
[Figure: a pressure limit switch on the high pressure pipe, connected to the
safety logic 1,000 metres away]

Figure 4: Original design


The operational goals are to improve availability and reduce time-consuming
manual checks (e.g. valving off a pipe to perform an over-pressure test). One
possibility is to replace each switch with a smart sensor; intelligent
cross-comparisons between sensors can identify failures and hence improve fault
diagnosis and availability. However the safety justification would be extremely
difficult without detailed analysis of the smart sensor software and hardware. In
fact it is possible to produce a simple design which meets the safety and
operational requirements without excessive reliance on computer-based
elements as shown below.

[Figure: the high pressure pipe is fitted with a limit switch and an analogue
pressure sensor (4-20 mA signal) feeding the safety logic 1,000 metres away;
isolated repeater signals provide spy-points for a monitor computer]

Figure 5: Simple replacement design
The main difference is that an analogue pressure measurement is made in the
pipe rather than using a binary switch. By adding some spy-points, the
performance of the external pressure switches can be continuously monitored
and cross-compared by a single computer. It is relatively easy to justify the main
safety logic as it uses well-established, simple components. Failure of the
computer will interrupt monitoring but this has no immediate safety impact, so the
monitoring functions could be readily implemented on the existing station data
processing system or a simple low-integrity PLC.
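As an illustration of how simple the monitor computer's cross-comparison logic can be kept, the sketch below flags channels that deviate from the median of the redundant readings. This is a hypothetical fragment, not part of the manual's design: the channel values, tolerance and median scheme are all assumptions.

```python
def cross_compare(readings_bar, tolerance_bar=0.5):
    """Flag channels that deviate from the median of all channels.

    readings_bar: list of pressure readings (bar), one per sensor channel.
    Returns the indices of channels suspected of having failed.
    """
    ordered = sorted(readings_bar)
    median = ordered[len(ordered) // 2]
    return [i for i, r in enumerate(readings_bar)
            if abs(r - median) > tolerance_bar]

# Three healthy channels agree; a drifted channel is flagged for maintenance.
print(cross_compare([10.1, 10.2, 10.15]))   # -> []
print(cross_compare([10.1, 10.2, 12.9]))    # -> [2]
```

Note that, as in the text, this code only raises maintenance flags: failure of the monitor computer interrupts diagnosis but cannot prevent the separate safety logic from tripping.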

5.1.2 Partitioning according to criticality


Even in cases where computers are assigned a safety critical function, it is possible
to minimise design complexity by partitioning the design according to criticality.
For example, the Digital Trip Meter developed for Ontario Hydro's Pickering B
nuclear power station is a simple device which trips on a single parameter. To
further minimise the complexity, the basic trip function is implemented in an
entirely separate computer. The more complex but less critical display and
monitoring functions are implemented in a second computer. This makes it easier
to justify the integrity of the main trip function. A similar approach is used in railway
computer-based safety interlocking equipment: the interlocking and monitoring
functions are implemented on separate computers.

The reactor trip system design in Appendix K provides a further example of
partitioning. Complex diagnostic functions are excluded by implementing them
on a separate machine. The choice of this architecture simplifies the design of the
main safety software, which is then easier to implement, analyse, test and justify in
a safety case.
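The partitioning principle can be shown in code: the critical trip decision is kept to a few lines on its own processor, while the richer display and diagnostic functions live elsewhere and merely observe. This is a hypothetical illustration, not the actual Pickering B or Appendix K design; the parameter, setpoint and function names are invented.

```python
TRIP_LIMIT = 110.0  # hypothetical setpoint, e.g. % of full power

def trip_required(measured_value: float) -> bool:
    """Single-parameter trip: the only logic on the critical processor."""
    return measured_value >= TRIP_LIMIT

# The more complex, less critical functions run on a separate machine;
# their failure cannot prevent a trip.
def monitoring_report(history):
    """Diagnostic summary for the display/monitoring computer."""
    return {"samples": len(history), "max": max(history), "min": min(history)}

assert trip_required(115.0) is True
assert trip_required(95.0) is False
```

Keeping the critical function this small is what makes it tractable to analyse, test exhaustively and justify in a safety case.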

5.1.3 Avoidance of novelty


Established systems and components have an established track record and past
field experience. The availability of existing evidence of fitness for purpose (e.g.
failure rates, failure modes and resistance to environmental attacks) reduces
uncertainty in the safety case arguments and the need to produce new
evidence to support the safety argument. The over-pressure switch replacement
cited earlier illustrates a system design that uses established analogue
components to simplify the associated safety case arguments.
Where computer-based systems are used, a similar approach can be employed.
Commercial off the shelf (COTS) systems and software components can benefit
from past experience. Mature versions of software are likely to be more reliable
than new ones. This also applies to complex hardware chips. For example, over
100 design faults have been reported in the five generations of the Intel processor
chip (from the 8086 to the Pentium). Of these faults, most were present in the early
versions of each chip design and were subsequently removed in the later versions
of the chip.
Even with established components, it is difficult to extrapolate from existing
experience if the operating conditions are novel. Risks can be reduced by
avoiding unusual modes of use and operating environments.

5.2 Sources of evidence


A preliminary safety case argument should be developed for the outline
architecture, which shows why the candidate design satisfies the safety related
requirements. This could use evidence from:

System Hazard Analysis, Fault Tree Analysis, etc.

Human Error Analysis (addressing the safety impact of maintenance and
operational actions, and the safeguards)

probabilistic design assessments (of reliability, availability, fail-safety and
performance)

qualitative design assessment studies (of complexity, analysability and
novelty)

resource estimates for the implementation and the associated safety
case (effort, cost, time)

prior evidence about specific design techniques

independent certification (e.g. for COTS products)

experience from existing systems in field operation

5.3 Design assumptions


Almost inevitably, the safety case for the top-level design will have to make
design assumptions that need to be verified at a later stage. It is also necessary to
identify how the integrity of the design and the associated safety case will be
maintained over the lifetime of the safety system. It is therefore necessary to
identify:

requirements for additional safety case evidence to be produced during


the project (e.g. specific tests and analyses)
requirements for the long term maintenance and operation of the
equipment
requirements for long-term safety case maintenance (e.g. to handle
possible changes in safety function or technology)

5.4 Choosing a suitable system architecture and safety case


It can be seen from the example tables in Appendix D that there are many
possible arguments and architectures that could be used to meet the safety
requirements, and the choices can affect the requirements placed on the
subsystem components. For example, a triple modular redundant hardware
design (TMR) could minimise the need for software-based hardware integrity
checks. Equally it may be feasible to reduce the criticality of the software by using
hardware safety devices, diverse safety functions or diverse software
implementation. Some safety standards such as Def Stan 00-56 define basic rules
for system architectures for a given integrity level. There may also be specific
design criteria, based on prior design consensus, that are deemed essential for a
safety system.
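The TMR option mentioned above can be quantified with the standard 2-out-of-3 majority-voting formula. The sketch below is illustrative arithmetic, not from the manual, and assumes independent channel failures and a perfect voter, assumptions that a real safety case would need to justify (e.g. through common cause failure analysis).

```python
def tmr_reliability(r_channel: float) -> float:
    """Reliability of a 2-out-of-3 voted system over identical channels:
    all three channels work, or exactly two do. Simplifies to 3r^2 - 2r^3."""
    return 3 * r_channel**2 - 2 * r_channel**3

# A 0.99-reliable channel gives a roughly 0.9997-reliable voted triple:
print(round(tmr_reliability(0.99), 6))   # 0.999702
# Voting only helps for good channels: below r = 0.5 it makes things worse.
print(round(tmr_reliability(0.40), 6))   # 0.352
```

This kind of back-of-envelope probabilistic assessment is one input to the design trade-off between hardware redundancy and software-based integrity checks.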
Typically, some candidate design options will be identified and a preliminary
safety case will be constructed. This will normally be an iterative process, which
involves the identification of hazardous subsystem states (e.g. through some form
of hazard analysis), and appropriate countermeasures (elimination, reduction and
failure mitigation, see Appendix B). The design and safety case are then assessed
to establish whether:
the design implements the safety functions and attributes
the design criteria are satisfied
the design is feasible
the associated safety arguments are credible
the approach is cost-effective
A more detailed review checklist is given in Appendix F. In this assessment process,
the costs of implementing the safety system and the associated safety case
should be considered during the architectural design phase. This analysis should
also include a consideration of the long-term safety risks and lifecycle support
costs involved in:
changing the safety functions
changing the hardware (e.g. due to obsolescence)
maintaining the equipment
maintaining the associated safety case
A checklist of potential changes is given in Table H0. An explicit list of anticipated
changes should be constructed, and the initial system should be designed to
cater for these changes.

5.5 Risk assessment and review


Having produced the outline design and safety case argument, a risk assessment
should be performed, to identify any additional hazards arising from:
random failures
systematic faults
human errors in operations and maintenance
This could utilise techniques such as FMEA, Hazops, and Human Reliability Analysis.

The additional risks introduced by the subsystem should be assessed to ensure
they are acceptable in the context of the overall safety case. The feasibility and
cost should also be reviewed. This will require the involvement of the prime
contractor and the subsystem developer. Any changes and trade-offs will require the
agreement of the affected parties (e.g. the regulator if the safety case is changed,
the operator if functionality is changed).
It is also necessary to assess the safety case feasibility and cost. This assessment
should consider:

Implementation risk (addressing cost, novel designs and techniques,
design complexity, and need for ALARP).
Supplier risk assessment (if known). This could be based on factors such as
past track record, technical skills, documentation standards, quality
management standards, etc.
Licensing risk (addressing credibility of arguments, assumptions and
evidence, analysability of design, risks in obtaining the required evidence,
and compliance with design criteria and standards).
Safety case support risks concerning the long-term ability to sustain the
safety case (e.g. impact of functional changes, specialist skills, tools,
hardware obsolescence, regulatory changes).

Having identified the risks, the options and possible trade-offs should be reviewed.
This review will include the viewpoints of the developer, operator, licenser,
purchaser and maintainer. Also, the candidate design, system requirements,
safety case evidence and arguments, and the long term support requirements,
should be agreed with these stakeholders.
Appendix F provides a checklist for safety case reviewing.

6 Developing Implementation safety case elements


The Implementation safety case element completes the safety case in the sense
that it provides arguments and evidence to support the safety claims being made
about the component being implemented. Developing the Implementation
safety case element involves:
1. Establishing the component safety requirements either by importing them
from a Preliminary safety case element and/or by activities specific to this
case.
2. Elaborating the evidence to show that the claims are met.

3. Documenting the results and providing traceability to the appropriate
Preliminary and Architecture safety case elements.
The preliminary subsystem (or component) and architecture safety case elements
should have identified the evidence needed for the implementation safety case
(e.g. test results, proofs, checks of assumptions, justification of tools, etc.). This
evidence is now gathered either as part of the normal development processes or,
for some retrospective safety cases, through additional technical investigations.
The key distinction between this and the other safety cases is that the evidence is
now provided to support the claims.
The evidence can be a combination of:

design features and supporting analyses (e.g. fail-safe bias and
demonstration of the strength of the feature)
process features and results of the process (e.g. worst case timing
analysis)
experience, either real or simulated (e.g. via statistical testing), and,
more usually for this type of safety case, a definition of the evidence
that is required to be collected

and could arise from:

normal verification and validation activities


tests and analyses to check specific safety attributes (e.g. time response,
response to overload, fail-safe bias, fault diagnosis performance, etc.)
tests of design assumptions (i.e. fail-safety, fault detection coverage, etc.)
more general safety analyses on the system design such as failure modes
and effects analysis (FMEA) and common cause failure analysis (CCF)

The completed implementation safety case for a subsystem will provide evidence
that:

the design features, V&V and safety analysis demonstrate that the
required attributes were implemented
all sub-contracted components have been implemented to specification,
and implement their required attributes


all deviations are documented, and their impact has been analysed and
justified

As the project evolves the results of this subsidiary safety case will be incorporated
in the higher level system safety case. The actual subsystem components would
then be integrated into the overall system according to an integration plan. As
part of this process, the safety case may require evidence from:

conventional V&V tests

tests for specific system attributes

tests of design assumptions

hazard analyses on the final design

6.1 Attribute-claim-evidence tables


The development of the safety case can be seen as the progressive completion
of an attribute-claim-evidence table for each attribute. The following table
illustrates the types of claim that might be made for the correctness of the safety
function or other safety-related attributes of the software which have been
identified in the system-level safety case, and which are apportioned to the
software for the system. The text marked with an asterisk (*) denotes additional
evidence which is derived by V&V activities when the system has been partially
or wholly implemented.

Attribute: Correctness

Claim: There is no logical fault in the software implementation

  Argument: Formal proof of specified safety properties
  Evidence/Assumptions: The design is simple enough to be amenable to proof

  Argument: Formal proof that code implements its specification
  Evidence/Assumptions: Proof tool is correct (or unlikely to make a
    compensating error); compiler generates correct code (sub-argument might
    use formal proof, past experience, or compiler certification); high
    quality V&V process; *unit test results

Claim: Software reliability exceeds system requirement

  Argument: Reliability can be assessed under simulated operational conditions
  Evidence/Assumptions: *Statistical test results

Table 4: Example safety arguments in safety case for functional correctness.
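The reliability claim in the last row of Table 4 rests on statistical testing, and the amount of testing needed can be estimated with standard statistical-testing arithmetic (this calculation is general theory, not taken from the manual): n independent, failure-free demands give confidence C that the probability of failure on demand (pfd) is at most p when (1 - p)^n <= 1 - C.

```python
import math

def demands_required(p: float, confidence: float) -> int:
    """Smallest number n of failure-free test demands such that
    (1 - p)**n <= 1 - confidence, i.e. pfd <= p at the given confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - p))

# Claiming pfd <= 1e-3 with 99% confidence needs several thousand
# failure-free demands under simulated operational conditions:
print(demands_required(1e-3, 0.99))   # 4603
```

The steep growth of n as p shrinks is one reason statistical testing alone rarely suffices for the highest integrity claims.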


More examples are provided in Appendix D.2. The tables could also be used to
record the evidence that might refute or undermine the claim being made.
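One possible machine-readable rendering of such a table (a hypothetical scheme, not prescribed by the manual; the field names are invented) makes it easy to list claims whose supporting evidence is still missing, or which have been undermined by counter-evidence:

```python
# Hypothetical attribute-claim-evidence records for the Correctness attribute.
correctness_table = [
    {"claim": "No logical fault in the software implementation",
     "argument": "Formal proof of specified safety properties",
     "evidence": ["Design simple enough to be amenable to proof"],
     "counter_evidence": []},
    {"claim": "Software reliability exceeds system requirement",
     "argument": "Reliability assessed under simulated operational conditions",
     "evidence": [],            # statistical test results still outstanding
     "counter_evidence": []},
]

def open_claims(table):
    """Claims not yet supported, or actively undermined, by evidence."""
    return [row["claim"] for row in table
            if not row["evidence"] or row["counter_evidence"]]

print(open_claims(correctness_table))
# -> ['Software reliability exceeds system requirement']
```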

6.2 Risk assessment and review


At the completion of the subsystem implementation, the safety case evidence
should be reviewed to establish:

the acceptability of the implemented subsystem (e.g. consistency with
requirements, and compliance with agreed standards and criteria)
the acceptability of the subsystem safety case (e.g. completeness,
consistency, credibility, non-compliances)
whether it is consistent with the system's safety case

If problems are identified, a resolution has to be agreed between the
stakeholders (such as re-implementation, more extensive testing, reworking of
the subsystem safety case, or adjustments to the top-level safety case).
When a subsystem safety case is completed, the impact of the subsystem safety
case results on the overall safety case should be assessed, e.g.:

the impact of sub-system non-compliances on the overall safety case

whether independence and diversity assumptions have been satisfied

the strength of subsystem evidence and supporting documentation

The completed system safety case (including subsystem evidence and system-level evidence) should be reviewed to assess whether:

all identified hazards have been tracked and resolved


the safety case is complete (implementation and traceability of all
requirements)
the information is consistent and accessible (e.g. indexing and cross
referencing)
the supporting evidence is available in a suitable archive

It will also be necessary to check that an appropriate system support
infrastructure is identified, including:

a supporting document set

system operations and maintenance requirements

technical resources (e.g. for document archiving, safety analyses, and
tests)
safety case maintenance infrastructure


Appendix F provides a checklist for safety case reviewing.

7 Operation and installation safety case elements


The operational, maintenance and installation aspects of the system will have
been addressed in part by all the previous safety case elements.

The Preliminary safety case element will have defined requirements in this
area and identified any operating constraints that might apply (see
Section 4)
The Architecture safety case element will have addressed the need to
design for usability, maintainability and modifiability (see Section 5)
The Implementation safety case element will have implemented these
features and assessed whether there are any new operating constraints
or procedures required as well as adding the detail now available to the
maintenance, use and support aspects (see Section 6)

This safety case element then adds to these by:

defining any safety related operational, installation and maintenance
procedures and requirements identified in the Preliminary safety case or
Architectural safety case
assessing whether the assumptions and operating constraints defined in
the Preliminary safety case are still valid and updating them as necessary
recording and resolving any non-compliances with the original safety
requirements
putting in place acceptable measures for dealing with outstanding
concerns (e.g. periodic tests, gathering field evidence, further analyses,
etc.)
for a COTS system, including the safety justification of the specific
configuration

The types of information that will be new to this safety case are those aspects of
operation, installation and maintenance that the developer may not be
competent to define. Examples include the specific grades of staff to undertake the
different types of maintenance, training requirements for operators, the exact
user-specific permit-to-work system that should be used, and the operating
procedures needed to mitigate failure modes, which require knowledge of the
wider system and environment to draft.
The development of the safety case may require particular trials or experiments to
be undertaken to confirm the adequacy of the operation and maintenance
aspects.
It will also be necessary to check that an appropriate system support
infrastructure is identified, including:

a supporting document set

system operation and maintenance resources

technical resources (e.g. for document archiving, safety analyses, and


tests)
appropriate user safety management systems

Over the lifetime of the system, there will almost inevitably be changes to the
safety case to accommodate changes in regulations, technology and
organisations so it will be necessary to establish a safety case maintenance
infrastructure (see Section 10).

8 Project safety case structure


The guidance above provides a systematic approach to the construction of
Preliminary, Architectural and Implementation safety case elements. To apply
these ideas to a real project will involve the identification of the number and
nature of the different safety case elements that are required. This will involve
consideration of:

Project lifecycle and structure

System structure

We also need to address any special considerations that apply when there are
subsidiary safety cases for components and sub-systems. The documentation
structure is addressed in Section 11.

8.1 Relationship to project lifecycle and structure


The project lifecycle and structure will influence the number and type of safety
case elements required. The different safety case components are influenced by
the system structure and by the changes of responsibilities. For example when a


contractual boundary is crossed the safety responsibilities are handed over via a
safety case for that stage. This is the practice for civil and military air traffic control
where there are four-part safety cases reflecting the purchaser/developer/
operator/user/maintainer boundaries.
The following table illustrates the different safety case components for the example
of a simple command system that consists of a database and an interface. Note
that not all project phases are shown.


Project Phase              Safety case element   Produced by

Invitation to Tender       Preliminary           Purchaser in conjunction with user
Preliminary design         Preliminary           Designer, to confirm initial
                                                 preliminary safety case
System Design              Architectural         Designer
Subsystem Requirements     Preliminary           Designer for HCI subcontractor
Subsystem Requirements     Preliminary           Designer for database developer
HCI Subsystem Design       Architectural         HCI subcontractor for designer
Database Subsystem Design  Architectural         Database subcontractor for designer
Subsystem Implementation   Implementation        HCI subcontractor for designer
Subsystem Implementation   Implementation        Database subcontractor for designer
Systems Integration        Implementation        Designer integrates and consolidates
                                                 subsystem implementation safety cases
                                                 into overall case for purchaser
Operation                  Operational           User integrates overall implementation
                                                 safety case into the operational
                                                 safety case

8.2 Influence of types of system


As noted above, the safety case activities will depend on the nature of the actual
system. For example, the system could be:


an entirely new bespoke system

a COTS product that is configured for a new application

a legacy system which is already operational

The main distinction between these cases is the degree of implementation
freedom that exists for the different types of system. This will affect the approach
taken for the Architectural and Implementation safety cases.

For a new system the design can take into account the need to
demonstrate safety and the safety case production can be incorporated
into the project following a design for assessment approach.
For a COTS product, the design freedom is more constrained. There is
design freedom in the choice of COTS, so that a system can be chosen
where there is sufficient generic evidence to demonstrate safety. There is
also design freedom in the way the product is configured and used in a
particular application.
For a pre-existing system, there is very little design freedom, but there
may be scope for additional testing and analysis to demonstrate safety
attributes.

8.3 Subsystem safety case


The system architecture will apportion top-level safety functions to subsystems and
will also impose derived requirements on the subsystem. These will include
additional functional requirements (e.g. to support fail-safety and fault tolerance
mechanisms). There may also be attributes such as timeliness which have been
apportioned by the systems analysis, which have to be implemented and
demonstrated at the subsystem level. In addition, the system architecture design
may impose additional design constraints (such as available memory), which
must be respected if the system is to function correctly. Thus the subsystem
safety case will consist of a number of claims which link back to requirements and
constraints imposed by the system-level safety case.
There could be several layers of subsystems and associated safety cases (e.g.
individual computer system and software). For each layer one can identify a
design activity that establishes the component safety context and an
implementation activity.


9 Independent assessment and acceptance of the safety case


Depending on the system and industry involved, there is usually some form of
independent assessment and acceptance of the safety case. The safety case
may be accepted by some safety body within the customer organisation, or by
some external regulator or independent assessor. Ideally, this assessment process
should be phased to run in parallel with safety case development. For example, if
the individual parts of the safety case are assessed and accepted as they are
developed, this can reduce the effort and time required to produce the overall
safety case.
The Preliminary safety case element assessment should focus on:

the realism of the environment description

the credibility of the safety analysis and hazard identification

the credibility and conservatism of the assumptions (e.g. sensitivity to
error)
the validity of the safety requirements to be implemented in the other
safety cases

The Architecture safety case element assessment should focus on:

whether the design features will achieve the attributes and whether a
design for assurance approach has been adopted

whether the design addresses long term issues

the credibility and depth of the design hazard analysis

the project risk arising from novelty, complexity and project stress

the extent of standards compliance

the use of prior evidence, pre-certification of components (e.g. by TÜV),
and field experience
traceability to the Preliminary safety case component

The Implementation safety case element assessment should focus on:

the consistency of the claims with the Architecture safety case
requirements

the strength of the arguments and evidence to support the claims

the sensitivity to argument flaws (e.g. number of argument legs)

the credibility and conservatism of assumptions

whether the hazards identified in the design have been tracked and
controlled (e.g. by hazard elimination, protective features, or operational
procedures)
the impact of changes made during development (and whether this
affects the arguments in the Preliminary and Architectural safety cases)
whether the operational and maintenance requirements to maintain the
system and the safety case are likely to be reasonable

The Operational safety case assessment should focus on:

whether the operational procedures necessary to maintain safety have
been implemented, and are reasonable
whether there are adequate staff with appropriate technical skills and
training to maintain and operate the system
whether there is an adequate support infrastructure to monitor and
update the safety case during operation (e.g. are there staff or contracts
in place to update systems and the associated safety cases and
evidence)
whether there is an acceptable approach for dealing with outstanding
concerns (e.g. periodic tests, gathering field evidence, further analyses,
etc.)

Over the lifetime of the system, there will almost inevitably be changes to the
safety case to accommodate changes in regulations, technology and
organisations.
Appendix F provides a generic safety case review checklist that can be used at
all project phases.


The independent assessments should also look broadly at the available evidence
to ensure that any evidence contradicting the claims is properly incorporated into
the safety arguments.

10 Long-term maintenance
An important part of many safety cases is their potential longevity. This part of the
manual looks at the issues raised by this longevity and the supporting
organisational and management processes that are needed. The maintenance
implications of the safety case have been incorporated into the overall safety
case methodology in Section 5, so that the long-term costs and risks of
maintaining the safety case can be considered at an early stage in the system
design. There is little published data on the costs of safety case maintenance. The
costs of maintaining the overall safety cases in the nuclear industry are significant,
roughly 2% of operating costs per year, so a methodology that considers support
implications could have a considerable impact on costs as well as safety.
Control and protection systems are long lived in comparison with the lifetimes of
the implementation technologies, which are typically electronic and computer
based. Developments in these technologies are rapid with typical products
obsolete within a few years. This has led to the special provision of spares and to
the planned refurbishment of systems, and considerable effort is expended to
address the long term operational requirements. There are however wider issues
than this to be addressed when looking at the long term maintenance of safety
cases. These include the need to maintain the safety case in the light of external
changes which may affect it, e.g.:

changes in operational requirements

changes to the implementation and assurance technologies

physical deterioration of the equipment

changes to safety criteria, standards and the regulatory environment

new technical knowledge and the feedback of experience

We also need to consider internal changes which affect the long-term integrity of
the safety case maintenance process, e.g.:


changes to the safety case process, people and technical resources

changes in organisational structures and responsibilities

In discussing the integrity of the maintenance process, we cannot just consider
the process as some kind of machine: we have to address some of the human
factors and organisational issues involved in the process. For example, the
maintenance integrity will be affected by the skills and the unwritten knowledge
(tacit knowledge) of a person or a team. The performance can also be
impaired by organisational factors such as cumbersome procedures, poor
communication, and management attitudes (i.e. pressures imposed through
resource availability, time schedules, response to safety problems, etc.). In
general the weaknesses arising from human factors can be broadly classified in
terms of their source, as follows:
1. generic individual weaknesses to error, e.g. skill-based, rule-based, and
knowledge-based errors
2. supporting materials: how does the design and actual use of documents
and other representations help or hinder the human activities they are
designed to support?
3. violations of established procedures: do existing accepted or
documented procedures encourage deviations or departures that are
(a) harmful or (b) necessary to get the job done?
4. generic group weaknesses to error, e.g. group co-ordination and process
failures arising from inappropriate resources; co-ordination failures or
motivational problems
5. organisational weaknesses: how does the communication, culture and
structure of the organisation impact on the process and its constituent
activities?
More background on human factors is provided in Appendix I.
In preparing this manual we investigated historical trends in safety system design
and their impact on safety case construction and maintenance. We also
examined the overall safety case maintenance activities, the main sources of
change and the requirements for maintaining the integrity of the maintenance
process so that it can respond to change. We also conducted interviews with
safety case specialists to help identify difficulties in maintaining the integrity of an
existing system and the problems associated with system and organisational
changes. This provided the raw material for the guidance.
The long-term maintenance requirements are dealt with in more detail in
Appendix H, but briefly the main activities are to:

• Monitor the integrity of the safety case and the support infrastructure. (Is the safety case still valid? Have the outstanding concerns been addressed? Can anticipated changes be implemented?)

• Assess the safety impact of any proposed changes (e.g. replacement of obsolete parts, changes to functional requirements, changes to related equipment) or new technical knowledge (e.g. about component failure modes).
• Update the safety case to reflect any change in the system.

11 Contents of a safety case report: documentation issues


As defined above a safety case is a documented body of evidence
accessible at different levels of detail. To provide an effective means for
developing, communicating and reviewing the safety case as it evolves, the main
claims, arguments and evidence should be contained in a safety case report with
a wealth of supporting documentation.
The guidance in this manual is more concerned with making valid and convincing
arguments than with prescriptive details of how the information should be
structured. Clearly there are a number of different ways of structuring the
documentation according to the project structure and the overall approach to
documentation.
The safety case structure is to some extent recursive, with a similar structure for the system and subsystem safety cases. Also, not all the information is available at the start of the project, so the safety case report will evolve from a statement of requirements, to a statement of intent, to a record of what has been achieved. For this reason it is important that the report contains a section describing the current status of the system implementation, subsidiary safety cases and the supporting evidence.
For generic systems and subsystems the safety case would be split into two parts:
1. A generic safety case, covering generic safety features (this is done only once)
2. An installation and operation safety case, covering the safety justification for the specific installation
The safety case documentation should be a coherent and consistent whole. Some approaches will involve subsuming earlier safety cases in the current one; others will update and reissue previous versions. Traceability needs to be maintained throughout.
Safety cases are good candidates for electronic support. At Adelard we have been developing a particular approach to this based on the Claviar tool set: details can be found on our web page (www.adelard.co.uk).
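The claim-argument-evidence structure that the report records lends itself to simple machine support. The sketch below is illustrative only (it is not the format used by the Claviar tool set); it shows one possible way of holding a claim with its arguments and evidence, so that outstanding evidence can be listed automatically for the status section:

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    description: str          # e.g. "statistical test report"
    available: bool = False   # has the evidence been produced yet?

@dataclass
class Claim:
    text: str                                       # the safety claim
    arguments: list = field(default_factory=list)   # supporting argument texts
    evidence: list = field(default_factory=list)    # Evidence items

    def outstanding(self):
        """Evidence still to be supplied -- useful for the status section."""
        return [e.description for e in self.evidence if not e.available]

# Hypothetical example content
claim = Claim("Trip function meets its reliability requirement",
              arguments=["reliability model plus CCF claim limit"],
              evidence=[Evidence("FMEA of trip channel", available=True),
                        Evidence("statistical test report")])
```

A report generator could then walk such a structure to produce both the argument summary and the list of outstanding concerns.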


In developing an approach to documentation the following should be addressed:
• A common style (template) across all parts and stages of the project
• A structure that reflects the contractual boundaries (e.g. a new safety case report for each change in ownership)
• A structure that takes account of the system structure (e.g. a safety case for each major component)
• A mechanism for ensuring consistency between safety cases (e.g. review, insertion of common information from one source)
• A practicable mechanism for ensuring traceability between safety cases and their different versions
The following sections provide a checklist of headings that could be incorporated
in a safety case report. A checklist of other safety documents is provided in
Appendix C.

11.1 Environment description


This summarises the operating context for the safety system. The description should include:
• external equipment (e.g. the plant or other equipment)
• interfaces to environment (e.g. actuators, sensors, data links)
• failure modes of external equipment and interfaces
• hazardous plant states
• hazardous / safe states of the interfaces
• anticipated changes in external equipment, interfaces and operating modes


11.2 PES safety requirements


This identifies the top level safety requirements for the Programmable Electronic System (PES), and should include:
• safety functions
• reliability requirements
• other safety attributes (see Table H0, page 25)
• applicable design criteria and standards
• anticipated changes over its lifetime
11.3 PES system architecture


The proposed or actual system architecture used to implement the safety requirements should be described. The architecture description should cover:
• subsystem components and interconnections
• the apportionment of safety functions and integrity levels to subsystems
• description of design features and methods for implementing the attributes
• subsystem derived functions (e.g. diagnostic, security, maintenance functions)
• subsystem attributes (e.g. time budgets, reliability, integrity level, fail-safe bias)
• subsystem design constraints (memory capacity, processor utilisation, expansion capacity, segregation and isolation, diversity requirements)
• subsystem safety case evidence requirements

11.4 Planned and actual implementation approach


The development methods used to assure the integrity of the implemented system should be identified, e.g.:
• fault avoidance (e.g. use of established components)
• design analyses
• verification and validation tests

11.5 PES system architecture safety argument


The safety case should present a set of arguments, based on the design, planned development processes and assumptions, which support a claim that the safety requirements can be met. A safety argument should:
• provide at least one safety argument for each requirement, which relates evidence from design features, subsystem requirements and development processes to a claim about the requirement
• identify all design assumptions used in the argument, e.g.:
  - claim limits (CCF, system failure rates, fault detection, diversity)
  - failure modes
  - failure rates
  - fail-safe bias
  - fault detection coverage
  - segregation and independence
  - performance of test and development methods
• identify supporting evidence and analyses:
  - system hazard analysis
  - operations and maintenance hazard analysis (human error analysis)
  - reliability and availability (e.g. failures per demand, spurious trip rate)
  - timeliness, accuracy
  - compliance with design criteria
  - expected change analysis
  - evidence from subsystem safety cases on subsystem attributes and functions
  - analysis of field evidence (standard components, supporting tools)

11.6 Subsystem design and safety arguments


This is essentially similar to the main safety case except that some of the attributes
may be converted to functional requirements, and the supporting evidence may
differ. See Table H0, page 26 for a list of attributes related to software.
This should include the results of additional hazard analyses assessing the impact
of the subsystem on the overall system and validating the assumptions made in
earlier safety analyses.

11.7 Long term support requirements


The safety case should also identify the associated support requirements for the PES. This will include:
• PES maintenance and operation procedures
• safety case support infrastructure, to accommodate anticipated system changes and maintain the integrity of the existing safety argument
11.8 Status information


The safety case report is a living document which evolves throughout development and subsequent operation, and should contain a description of the current status. This could include the status of:
• safety case evidence
• design assumptions
• subsystems
• outstanding concerns
• unresolved hazards
In order to track the evolution of the safety case, it is also desirable to record significant events during the construction of the safety case. This would include:


• changes in safety case arguments and evidence
• justification of changes
• identification of hazards arising during implementation, and their resolution

11.9 Evidence of quality and safety management


Since the safety case is only credible if the development is well-controlled, the accomplishment summary must also reference evidence that the safety management and quality management procedures have been followed. This evidence includes:
• results of QA audits
• results of safety audits
• evidence that identified problems are resolved

11.10 References
The document will include references to related documents. These could include:
• environment descriptions
• design documents
• safety case evidence (analysis documents, test documents)
• subsystem safety cases
• hazard log
• quality and safety audits
• scientific journals, technical documents


Appendix A System safety context
Safety is a property of the overall system
rather than any given sub-component.
The top-level requirement is to maintain
the safety of the plant, aircraft or other
system. In the subsequent safety analysis, there is a process of hazard
identification, risk analysis, and risk reduction. The risk reduction can be
implemented using a number of different strategies:

• Hazard elimination. The hazard is removed by modifying the plant design. For example, a fuel reprocessing cell full of radioactive material might go critical. The hazard can be removed by reducing the size of the cell so that there will always be a sub-critical mass.
• Hazard control. For remaining hazards, the possibility of an accident can be reduced by independent safety features. For example, the processing cell could be fitted with a number of independent safety devices to prevent too much material entering the cell (weighing devices or radioactivity measurements) or to prevent criticality if a dangerous amount does enter (e.g. by flooding with neutron absorber).
• Accident mitigation. If the hazard controls fail, and there could be an incident or accident, the severity could be mitigated by other safety features (e.g. by operating the system underground, using various forms of containment, etc.).

The overall plant or system safety case makes an overall claim for safety based on
all these risk reduction approaches. Targets would be set for the tolerable
accident frequency and severity, and the top-level safety case would argue that
the implemented safety features ensured the accident frequency was within
limits. There is also a requirement to show the risk is ALARP (as low as reasonably
practicable) so further risk reduction should be implemented provided the costs
do not outweigh the gains.
The systems we are discussing fall mainly into the category of hazard control (i.e.
reducing accident frequency). They would be used to implement the basic safety
functions (e.g. preventing excess mass entry or flooding the cell). Of course there
is no actual need to use a computer-based system to implement a safety
function; other mechanisms such as mechanical interlocks, discrete logic or a
human operator could be used instead. In addition the same safety function

might be implemented by several different systems (e.g. computers, discrete logic, and manual operation).
The top-level safety case is normally based on compliance with established safety
engineering principles coupled with probabilistic arguments about the failure
probability of the various safety systems that implement the safety functions. The
safety engineering principles are based on engineering consensus and past
practice within an industry, and system designs have to satisfy specific criteria such as "defence in depth" and the "single failure criterion". The industry can also
impose design safety assessment rules to introduce some conservatism into the
probabilistic analysis used in the safety case. These might be qualitative rules
about what components can be considered diverse (and hence fail
independently) or quantitative rules such as:

• claim limits on the failure rates of a subsystem or component
• claim limits on the level of common mode failure between components

In computer-based systems, these limits are often related to an assigned integrity level. The concept of an integrity level is used in a number of different safety standards, such as IEC 61508 and MOD Defence Standard 00-56, with integrity levels ranging from 1 to 4 (4 being the highest level). Generally speaking, sub-components inherit the system integrity level, although if the sub-components are diverse, their integrity level can be one level lower. The integrity level for a software component is often linked to "recommended" techniques (e.g. statistical tests for Level 4).
Safety also has to be maintained over the lifetime of the plant. The system design
and the associated safety case must consider potential attacks on the design
integrity over its lifetime (e.g. through normal operation, maintenance, upgrades
and replacement) which can introduce new flaws. When these considerations
are applied to individual subsystems (typically by applying hazard analysis
methods), a set of derived requirements may be produced for the subsystems
that are necessary to maintain the top-level safety goal. These derived
requirements can be represented by attributes of the equipment that can affect
plant safety. The most obvious attributes are reliability, availability and fail-safety,
but there are other more indirect attributes including:

• maintainability
• modifiability
• security
• usability
• replaceability

These tend to be treated as "softer" attributes, but they are necessary to maintain
the integrity of the original design against potential sources of attack (even if
these are unintentional). Essentially the attributes relate to threats from different
sources (such as maintenance staff, the operator, unauthorised personnel, or
ageing and obsolescent equipment). These might be addressed using more
qualitative arguments (e.g. number of defences or conformity to ergonomic rules
and design standards).

A.1 Safety-related standards in the public domain


There are many safety related standards. A selection of the publicly available generic standards is shown below.

IEC 61508: A general safety standard for industrial computer systems, which covers both system and safety aspects. Identifies "recommended" methods at different integrity levels. Also contains guidance on management and documentation requirements.

MOD DS 00-56: Ministry of Defence system safety standard. It includes a risk classification scheme and rules for assigning integrity levels to systems and software.

MOD DS 00-55: Ministry of Defence standard for software in safety related systems. Initially designed for Integrity Level 4, but now extended to cover lower integrity levels.

DIN V-19250: German system safety standard, includes a risk classification scheme.

DIN VDE-801: German standard for implementing safety related software. Focuses on defences against systematic and random faults in the system.

ISO 9001: Quality standard. Compliance to an accepted quality standard is required to ensure the safety case construction process is well managed.

ISO 9000-3: Quality standard for software development within the framework of ISO 9001.

Table A1: Example public domain standards


Some links to standards and reviews of standards are provided from www.adelard.co.uk.

A.2 Other safety guidance


In addition to specific standards there is guidance issued by the HSE, and established industry practices.

HSE PES Guidelines: Guidance for general industrial use. Concentrates mainly on the overall system safety architecture.

IAEA Software important to safety in nuclear power plant: Provides guidance on the development and verification of software for nuclear power plant.

Table A2: Example industry guidance

A.3 Example criteria


Design criteria identify design rules that are considered necessary to achieve an adequate level of safety. The criteria can be expressed in probabilistic, qualitative or deterministic terms. For example, "no single independent fault shall affect normal operation of the equipment" is an absolute rule. There is scope for interpretation of the words "independent" and "fault", but once agreed the design can be subjected to an analysis to establish conformity to the rule. A qualitative rule using terms such as "reasonable" or "practicable" requires human judgement and justification for each new case.

A.3.1 Probabilistic criteria


A probabilistic safety analysis can yield unrealistic results if excessive claims are
made about the performance of individual subsystems. Unidentified and
unquantified factors might exist which drastically affect the achieved level of
performance. To avoid over-optimistic analyses, limits are imposed on the
claimed performance level of complete systems and specific design features.
Note that evidence must still be supplied to show the level is actually reached or
exceeded. In the case of IEC 61508, reliability claims for computer systems are
associated with specific integrity levels, i.e.:


Integrity level    Failure probability per hour (P)
1                  10^-5 > P ≥ 10^-6
2                  10^-6 > P ≥ 10^-7
3                  10^-7 > P ≥ 10^-8
4                  10^-8 > P ≥ 10^-9

Table A3: IEC 61508 reliability claims
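The bands in Table A3 can be captured as a simple lookup, which is a convenient cross-check when reviewing a probabilistic claim. The sketch below uses the continuous-mode bands tabulated above; treating the lower bound as inclusive is a choice made here for illustration:

```python
def integrity_level(p_per_hour):
    """Map a dangerous failure probability per hour to an IEC 61508
    integrity level (continuous / high demand mode, as in Table A3).
    Returns None if the claim falls outside the tabulated bands."""
    bands = [(4, 1e-9, 1e-8), (3, 1e-8, 1e-7),
             (2, 1e-7, 1e-6), (1, 1e-6, 1e-5)]
    for level, low, high in bands:
        if low <= p_per_hour < high:
            return level
    return None
```

For example, a claimed failure probability of 5e-7 per hour falls in the integrity level 2 band, while a claim better than 10^-9 per hour lies beyond the table and would normally not be accepted.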

A similar limit scheme is used in MOD DS 00-56, but the probability ranges are not pre-determined; they have to be defined for a specific application. For diverse subsystems implementing the same function (or functions), MOD DS 00-56 allows the subsystem integrity level to be reduced by one level. Common faults
can limit the reliability improvement of diverse systems. The reduction in integrity
level reflects empirical experience that diversity can yield an order of magnitude
improvement. Other examples of claim limits for other design features are:

• fault detection coverage factor (e.g. maximum 95%)
• beta factor limit for redundant channels (e.g. 10%)
• fail-safe bias (e.g. maximum 95%)
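The effect of a beta factor claim limit can be seen with a small calculation using a simple 1-out-of-2 beta-factor model. The channel figure of 1e-3 and the 10% beta below are hypothetical; the point is that the common cause term, not the independent term, dominates the achievable system figure:

```python
def redundant_pfd(channel_pfd, beta):
    """Simple 1-out-of-2 beta-factor model: a fraction `beta` of channel
    failures is common to both channels; the remainder must occur
    independently in both channels to fail the pair."""
    independent = ((1 - beta) * channel_pfd) ** 2
    common_cause = beta * channel_pfd
    return independent + common_cause

# Hypothetical figures: channel PFD 1e-3, beta factor 10%
pfd = redundant_pfd(channel_pfd=1e-3, beta=0.10)
# common cause term (1e-4) dominates the independent term (~8.1e-7)
```

This is why the reduction of only one integrity level for diversity is consistent with the empirical "order of magnitude" improvement mentioned above.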

A.3.2 Deterministic criteria


These usually impose constraints on the system architecture that would be acceptable in a safety system. These constraints might include rules such as:
• No single independent fault shall affect normal operation.
• No two independent faults shall affect safety.
• Segregation requirements, where all components in one box are assumed to be able to affect each other. Any component inside the segregation boundary has to be implemented to the integrity level of the most critical component.
• At least two different safety functions to protect against the most critical accidents.

A.3.3 Qualitative criteria


Qualitative criteria require compliance to some rule, but the judgement of compliance tends to be more subjective and context specific, e.g.:
• Compliance with a quality management standard. This could be implemented in different ways, so the acceptance requirements are not so clear cut.
• As low as reasonably practicable (ALARP). This is based on some quantitative estimates of gain in safety versus implementation cost, but the relative costs are open to debate.
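As a wholly hypothetical illustration of the kind of estimate involved in an ALARP judgement, the sketch below compares the cost of a proposed risk reduction measure with its expected benefit over the remaining life. All figures are invented, and real assessments also involve discounting and a gross disproportion factor:

```python
def alarp_ratio(cost, freq_reduction_per_year, years, harm_cost):
    """Crude cost/benefit ratio for an ALARP judgement.
    All inputs are hypothetical illustrations; real assessments
    involve discounting and gross disproportion factors."""
    benefit = freq_reduction_per_year * years * harm_cost
    return cost / benefit   # much less than 1 suggests the measure is worthwhile

# Invented figures: a 50,000 modification reducing accident frequency
# by 1e-4/year over 20 years, with a notional harm cost of 10,000,000
ratio = alarp_ratio(cost=50_000, freq_reduction_per_year=1e-4,
                    years=20, harm_cost=10_000_000)
# ratio of 2.5: the cost exceeds the nominal benefit, so the debate
# turns on how much disproportion is considered "reasonable"
```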


Appendix B Design options to limit dangerous failures
A design should incorporate defences
against anticipated hardware failures,
design flaws and human errors that
could affect the functional behaviour of
the system. Three main strategies exist for ensuring safety: fault avoidance, error
tolerance and fail-safe bias. The following tables identify potential defences
under these three main headings.

B.1 Computer system defences


This table identifies some of the safety related attributes at the system level, and identifies possible design approaches that limit a dangerous failure of an attribute.
Design approaches

Attribute: Accuracy
  Fault avoidance: stable sensors; stable and accurate input-output system
  Error tolerance: feedback mechanisms to minimise long-term error

Attribute: Availability
  Fault avoidance: high reliability components; compliance with environmental standards (EMI, temperature, etc.)
  Error tolerance: multiple channels + voting; main + hot standby; main + cold standby

Attribute: Logical correctness
  Fault avoidance: design simplicity; formally proved hardware (e.g. VIPER); mature hardware (stable, extensive field experience)
  Error tolerance: design diversity
  Fail-safe bias: hardware watchdogs; fail-safe bias on inputs and outputs

Attribute: Maintainability
  Fault avoidance: interface labelling; keyed connectors to avoid errors; indicator lights for failed components
  Error tolerance: multiple channels + voting; main + hot standby; main + cold standby

Attribute: Modifiability
  Fault avoidance: simple, standard interfaces; modular design

Attribute: Response to overload
  Fault avoidance: ensuring processor capacity is sufficient for maximum input-output data rates
  Error tolerance: prioritising functions so that the least important functions can be discarded

Attribute: Security
  Fault avoidance: locked cabinets; encryption
  Error tolerance: access indicators (e.g. light on if door open)

Attribute: Timeliness
  Fault avoidance: time budgets assigned to functions
  Fail-safe bias: hardware watchdogs

Table B1: System level defences
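Hardware watchdogs recur in Table B1 as a defence for logical correctness and timeliness. The same idea can be sketched in software (an illustration only, not a substitute for a hardware watchdog): the monitored function must "kick" the watchdog within a deadline, otherwise the caller should drive the outputs to their fail-safe state. An injectable clock makes the behaviour testable:

```python
import time

class Watchdog:
    """Minimal software watchdog: the monitored task must call kick()
    at least every `timeout` seconds; once the deadline is missed,
    expired() returns True and the caller should fail safe."""
    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_kick = clock()

    def kick(self):
        self.last_kick = self.clock()

    def expired(self):
        return self.clock() - self.last_kick > self.timeout

# Demonstration with an injected clock instead of real time
now = [0.0]
wd = Watchdog(timeout=0.5, clock=lambda: now[0])
now[0] = 0.4
ok_within_deadline = not wd.expired()   # kicked recently enough
now[0] = 1.0
tripped = wd.expired()                  # deadline missed: go fail-safe
```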


B.2 Software defences


This table identifies some of the safety-related attributes at the software level, and
identifies possible design countermeasures that limit a dangerous failure of an
attribute.
Design options

Attribute: Accuracy
  Fault avoidance: use of floating point; integer calculations with worst case error and overflow analysis; algorithm stability analysis
  Error tolerance: diversity plus voting; comparison against computation with a small input perturbation

Attribute: Compliance to hardware constraints (e.g. memory)
  Fault avoidance: pre-allocation of resources; safety critical tasks have priority on resources
  Error tolerance: memory exhaustion checks; fault tolerance; alternative data sources; alternative output devices
  Fail-safe bias: fail-safe response to failure conditions

Attribute: Logical correctness
  Fault avoidance: design simplicity; formal development
  Error tolerance: design diversity; isolation from failures in non-critical functions; safety kernels
  Fail-safe bias: assertion checks in code

Attribute: Modifiability
  Fault avoidance: design simplicity; information hiding
  Error tolerance: code assertions to detect errors

Attribute: Response to overload
  Fault avoidance: mechanisms for limiting throughput
  Error tolerance: graceful degradation (e.g. discarding old data in a real time system)
  Fail-safe bias: overload detection

Attribute: Time response
  Fault avoidance: bounded execution time
  Error tolerance: preference given to safety critical tasks
  Fail-safe bias: software timers; watchdogs

Table B2: Software level defences
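The graceful degradation entry in Table B2 (discarding old data in a real time system) can be realised with a bounded buffer that drops the oldest samples under overload, so that processing always works on the freshest data. A minimal sketch:

```python
from collections import deque

class SampleBuffer:
    """Bounded buffer for a real-time input stream: when samples arrive
    faster than they are processed, the oldest are discarded so that
    processing always sees the freshest data. Drops are counted so
    that overload can also be detected and reported."""
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)
        self.discarded = 0

    def push(self, sample):
        if len(self.buf) == self.buf.maxlen:
            self.discarded += 1     # overload: oldest sample will be dropped
        self.buf.append(sample)

buf = SampleBuffer(capacity=3)
for s in range(5):                  # 5 samples into a 3-slot buffer
    buf.push(s)
# buffer now holds the newest three samples; two were discarded
```

Counting the drops ties the error-tolerance measure (degradation) to the fail-safe measure (overload detection) in the same table.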


B.3 Operations and maintenance error defences


Risk reduction options

Human error: Operator error
  Error avoidance: training; procedures
  Fault detection / recovery: status displays; capability for cancelling or returning to original state

Human error: Calibration error
  Error avoidance: training; independent checks; status recording
  Fault detection / recovery: pre-start tests; on-line monitoring of configuration integrity

Human error: Repair error
  Error avoidance: training; independent checks; status recording
  Fault detection / recovery: pre-start tests; on-line monitoring of configuration integrity

Human error: Update error (parameter data, redesign)
  Error avoidance: training; independent checks; status recording; capability for cancelling or returning to original state
  Fault detection / recovery: pre-start tests; on-line monitoring of configuration integrity

Human error: Malicious damage
  Error avoidance: restriction of access (locked cabinets, passwords, authorisation procedures)
  Fault detection / recovery: on-line monitoring of configuration integrity

Table B3: Maintenance and operations defences
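The "on-line monitoring of configuration integrity" defence recurring in Table B3 can be as simple as periodically re-hashing the installed configuration and comparing the result against a reference recorded at commissioning. The hash choice, data layout and parameter names below are illustrative only:

```python
import hashlib

def config_digest(config_items):
    """Digest over sorted (name, value) pairs, so the result does not
    depend on the iteration order of the configuration store."""
    h = hashlib.sha256()
    for name, value in sorted(config_items.items()):
        h.update(f"{name}={value};".encode())
    return h.hexdigest()

# Reference value recorded at commissioning (hypothetical parameters)
commissioned = config_digest({"trip_level": "4.2", "channels": "3"})

def integrity_ok(current_config):
    """On-line check: recompute and compare against the commissioned value."""
    return config_digest(current_config) == commissioned
```

A mismatch indicates an update, repair or unauthorised change that should be investigated before the system is relied upon.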


Appendix C Checklist of safety documents
To support the safety case construction
and maintenance activities, a wide
range of safety and project documents
will be needed. The set of documents
listed here is illustrative rather than mandatory. For a checklist on safety case
maintenance documentation see Appendix H.4.

C.1 Planning
Safety plan: document specifying the steps to produce a structured safety case over the system's lifetime, covering quality management, safety management and functional and technical safety.
Other plans: the safety aspects of other plans may also be relevant, e.g. Quality Plan, Configuration Management Plan, Integrated Logistic Support Plan, Operation and Maintenance Plan, V&V Plan, Overall Project Plan.

C.2 Safety cases


There may be a number of different safety cases, phased by the stage in the project or by the subsystem or component that they apply to.
Preliminary safety case: outline safety case; provides the basic arguments used to justify the system and subsystems, and the supporting evidence required.
Subsystem safety case: safety case for a subsystem; provides the basic arguments used to justify subsystems, and references to the supporting documentation.
Architectural safety case: safety case for system and subsystem design.
Final safety case: for generic systems and subsystems this would be split into two parts:
  Generic safety case: covering generic safety features (this is done only once).
  Installation and operation safety case: covering the safety justification for the specific installation.


C.3 Safety related documentation


Safety management system: definition and supporting procedures, work instructions, the organisation's safety policy (targets, approach to ALARP), safety record of the company.
Supporting safety documents: analyses, safety audit reports, hazard identification, hazard analysis, operations and maintenance hazard analysis, human error analysis, risk assessment, hazard log and hazard resolution records.
Safety aspects of "ility" analyses: reliability, availability, maintainability, security, performance.
Analysis of field evidence: of standard components, supporting tools.

C.4 Project implementation


Codes of design practice: intention and compliance at system, equipment and software levels.
Project documentation: requirements, specification, design, coding, V&V and test records, QA records, CM records.
System and environment description: external equipment (e.g. the plant or other equipment), interfaces to environment (e.g. actuators, sensors, data links), failure modes of external equipment and interfaces, hazardous plant states, hazardous/safe states of the interfaces, anticipated changes in external equipment, interfaces and operating modes.
Management and status information: current concerns list, assessments of competency, project stress indicators and safety culture indicators.

C.5 Review and audits


Safety Case Assessment Reports: evaluate the credibility of safety cases (performed by internal and external assessors).
Certificates: results of review and audit, e.g. by ISA, QA organisation, internal project audits; results of third party certification or previous regulatory approvals.
Audit reports: results of audit, e.g. by ISA, QA organisation, internal project audits.
Standards compliance reports: intentions, statements and assessments.


Appendix D Attribute-claim-evidence tables

D.1 Attribute-claim-design tables

The following tables identify attributes which may affect safety at the system architecture level, together with possible safety arguments based on these forms of evidence. These arguments often rely on assumptions, and on requirements for subsystems, which will have to be substantiated to complete the overall safety case.
Attribute: Functional behaviour
  Claim: the composite behaviour of the critical functions implements the overall safety function
  Design features: identification of safety-related functions; partitioning according to criticality; design simplicity
  Assumption/evidence: assumption that segregated functions cannot affect each other
  Subsystem requirements: subsystem integrity level; functional segregation requirements


Attribute: Fail-safety
  Claim: safety is maintained under stated failure conditions, assuming the subsystems are correctly implemented
  Design features: use of functional diversity; fail-safe architectures
  Assumption/evidence: system hazard analysis; fault tree analysis
  Subsystem requirements: fail-safety requirements on subsystems (response to failure conditions)

Attribute: Reliability/availability
  Claims: reliability claim based on reliability modelling and CMF assumptions, together with fault detection and repair assumptions; reliability claim based on experience with similar systems
  Design features: architecture, levels of redundancy, segregation; fault tolerant architectures; design simplicity
  Assumption/evidence: reliability of components and CMF assumptions; failure rate, diagnostic coverage, test intervals, repair time, chance of successful repair; prior field reliability in similar applications
  Subsystem requirements: hardware component reliability; software integrity level; component segregation requirements; fault detection and diagnostic requirements; maintenance requirements

Attribute: Response time
  Claim: the overall system design can meet the target time response
  Design features: design ensures overall response time is bounded
  Assumption/evidence: assumes subsystem time budgets can be met
  Subsystem requirements: time budgets for hardware interfaces and software
Attribute: Security
  Claims: a defence exists for all identified attacks; defence in depth for critical attacks
  Design features: system level access controls; external interfaces; physical barriers
  Assumption/evidence: knowledge of the likelihood of different forms of attack; assumption that all forms of attack are identified
  Subsystem requirements: subsystem integrity checks; interface credibility checks; subsystem segregation

Attribute: Modifiability
  Claim: anticipated changes do not pose a safety risk
  Design features: functional segregation; design structure; design simplicity
  Assumption/evidence: identification of features likely to change; impact assessment of incorrect modification
  Subsystem requirements: explicit identification of features likely to change in software and hardware specifications


Attribute: Maintainability
  Claim: maintenance actions can be performed reliably, or are at least fail-safe (based on analysis)
  Design features: time to repair; limits on maintenance actions (access, calibration, repair, reconfiguration), based on past systems with similar features
  Assumption/evidence: identification of possible maintenance errors; assessment of incorrect action; assessment of impact on dangerous failure
  Subsystem requirements: subsystem failure reporting and self-test functions

Attribute: Usability
  Claim: the operator cannot affect the safety of the system
  Design features: on-line help; ergonomic design; credibility checks; limits on operator action
  Assumption/evidence: human error rates, types of error; usability tests
  Subsystem requirements: operator interface requirements


D.2 Attribute-claim-argument tables


The following tables identify attributes that may affect safety at the software
subsystem architecture level, together with possible safety arguments based on
these forms of evidence.

Attribute: Correctness
  Claim: there is no logical fault in the software implementation
  Arguments: formal proof of specified safety properties; formal proof that the code implements its specification
  Evidence/assumptions: the design is simple enough to be amenable to proof; the proof tool is correct (or unlikely to make a compensating error); the compiler generates correct code (a sub-argument might use formal proof, past experience, or compiler certification); high quality V&V process; unit test results; statistical test results

Attribute: Reliability
  Claim: software reliability exceeds the system requirement
  Argument: reliability can be assessed under simulated operational conditions

77

Adelard Safety Case Development Manual

Attribute: Timeliness

Claim: The system will always respond within the specified time constraints.
Argument: Software design is such that the execution time is bounded and statically decidable. Maximum time is less than the limit.
Evidence/Assumptions: Maximum timing decided by static code analysis. Dynamic tests of worst case time response.

Attribute: Memory constraints

Claim: The system will always have sufficient memory to continue operation.
Argument: Software design is such that the memory usage is bounded and statically decidable. Maximum memory use is less than the limit.
Evidence/Assumptions: Analysis of memory usage. Stress testing of system.

Attribute: Tolerance to hardware failure

Claim: Identified hardware failures (computer interfaces and computer system) are either tolerated or result in a fail-safe response.
Argument: Interface faults are detectable by software (e.g. via redundancy or encoding). Internal failure is detectable and fail-safe.
Evidence/Assumptions: All failure modes have been identified. Fault injection tests to check response.

Attribute: Tolerance to overload

Claim: Demands in excess of the specified rates will result in a safe response.
Argument: Design can detect overload conditions and either maintain a degraded service or perform a fail-safe action.
Evidence/Assumptions: There is sufficient processing power to cope with credible levels of overload. Overload tests.

Attribute: Maintainability

Claim: Parameter adjustments can be made without affecting safety.
Argument: Software-imposed limits ensure parameters remain in the safe range.
Evidence/Assumptions: Systems-level analysis of allowable safe ranges. Validation tests.

Attribute: Operability

Claim: The system is robust to faulty operator actions. The system is designed to minimise user error.
Argument: Design conforms to human factors standards. Actions are checked for safety implications (e.g. software safety interlocks).
Evidence/Assumptions: Interface prototyping. Validation tests.


Appendix E Review of changes that can affect the safety case

The following discussion considers the various sources of change and the associated modifications to the safety case.

E.1 Changed PES system requirements


There are a number of different types of change to PES system requirements.
Ideally these should have been anticipated in the original development (e.g.
revised sensors or different setpoints). These should have been designed for in the
safety case, and the changes required will depend on the success in partitioning
the design for maintenance and in possibly showing generic properties so that
existing analyses can be reused.
Other types of changes could be to add functionality for new items (e.g. for
safety reasons a new trip is required) or to respond to changed interface
requirements for operational reasons (e.g. to improve provision of information to
the operator or to facilitate maintenance).
These changes to requirements can affect both non-functional and functional
requirements. Changes to functional requirements will require the redevelopment
and assurance of part of the system and the revision of part of the safety case
with the principles and the types of argument remaining as before. However, if
the technology is not available, or the safety criteria have changed, then there
may be a more radical change to the safety case. This would happen in the case
of obsolete equipment (see Appendix E.2) or in the case of changes in the
regulatory environment (Appendix E.3). If the changes were so major as to require
redesign of the complete system or a different hardware architecture the safety
case changes could also be large.
Changes to non-functional requirements such as reliability or availability can also
have an effect of varying degree. We could anticipate incremental
improvements which require the additional assurance of, say, performance. Or
we might have increased reliability requirements that could be achieved by
strengthening the statistical testing argument of the safety case. However, the
marginal cost of these changes will depend on how the system has been
designed and developed. If, for example, changes to non-functional
requirements resulted in the system moving from SIL2 to SIL3 so that formal
methods were then required, that would be a very significant and expensive


change. For example, in a system Adelard developed, we estimate that a change from SIL2 to SIL3 would increase the project cost by 25% and the project development time by 20%. If we had not been using formal methods at SIL2 these figures would have been more comparable to the original development effort.
The overall cost of the changes is important but often, for systems that are already
in operation, the duration of the assurance required can be as significant as the
direct cost.
To summarise, the impact on the safety case of changes is very non-linear. The
following tables summarise the discussion:
Changes to functional requirements

Statistical testing
  Impact of small change: There will be a need to redo tests of the impacted components; this could be significant in terms of project time.
  Larger changes: Test environment may change.

Deterministic arguments
  Impact of small change: The changes should be localised and only require partial re-analysis.
  Larger changes: May need to redo or restructure the whole proof.

Experience
  Impact of small change: The change would invalidate the use of prior experience unless some modularisation argument would apply.
  Larger changes: The change could invalidate the use of prior experience.

Process
  Impact of small change: Need to reimplement the process or develop changed/new parts. Obsolescence could be a problem.
  Larger changes: Need to reimplement the process.

Table E1: Changes to functional requirements


Changes to non-functional requirements

Statistical testing
  Impact of small change: Additional tests directly related to changes in reliability requirement. Note that the number of tests varies as 10 to the power of the SIL.
  Larger changes: Could make tests infeasible due to the time required for them.

Deterministic arguments
  Impact of small change: May require re-analysis of timing analysis with changes to data.
  Larger changes: May require complete re-analysis, e.g. building a new model of the system. Changes to performance and optimisation.

Experience
  Impact of small change: May require re-analysis and more data to use this argument.
  Larger changes: May greatly weaken the experience argument if the reliability requirement is significantly increased.

Process
  Impact of small change: Need to collect process data as for initial development.
  Larger changes: Process may not be appropriate for significant changes in SIL.

Table E2: Changes to non-functional requirements

E.2 Impending obsolescence


Impending obsolescence of the PES equipment can precipitate a need for
changes to the safety case. The extent of the change can vary from complete
replacement of the system, with the corresponding requirement for completely
new arguments and evidence to support the safety case, through to less drastic
modification. The following table provides some more detail of the possible
impact.


Impact on the software safety case of a change to hardware, by argument type:

Statistical testing: Need to repeat with the new hardware. Arguments for more limited testing might be possible if the software is reused entirely.
Deterministic arguments: The arguments from specification to source code should remain untouched if using the same source languages; otherwise they are only reusable down to detailed design. Need to redo the arguments from source code to object code and generate new evidence.
Experience: Experience of application software may carry over, but not of the operating system. Again depends on the extent of software change.
Process: There will be a need to control and measure the process used for any changes, compatible with the use made of process arguments.

Table E3: Impact on software safety case of change to hardware

The extent of the change to software will obviously have a profound effect on the
changes to the safety case. The following table indicates the potential impact:


Impact of a change to software, by argument type:

Statistical testing: Need to retest, probably completely.
Deterministic arguments: Need to repeat for the parts of the software affected and to argue for partitioning.
Experience: Evidence not easily reusable. Some software structures may allow arguments for reuse to be made.
Process: Not reusable. Need new process arguments for new software.

Table E4: Impact of change to software

There is also the potential problem of the obsolescence of the software engineering process, techniques and tools used to justify the software safety case. The issues of maintenance of expertise are dealt with below.
This problem has been recognised in the defence industry, and is to some extent addressed in defence standards and guidelines that require delivery of the tools and supporting environments used to develop the software. This provides the potential capability for maintenance but poses the problem of how to maintain these essential tools in working order. The tools fall into a number of categories:

documentation support: tools that do not perform technical analysis or transform the system, such as word processors and configuration management tools
checking and analysis tools, e.g. static analysis, timing analysis, and test coverage analysis tools
transformation tools, e.g. compilers, code generators, linkers and loaders

The issue of obsolescent tools needs to be addressed in the periodic review and an appropriate response formulated. This might involve:

Selection of an alternative tool, and migration of the relevant items (documentation, analysis or software) to the new tool.
Running the existing tool on a software emulation of an old system. For example, the Malpas analysis tool only runs on a DEC VAX machine, but the VAX environment can be emulated on the current DEC Alpha machines.
Adopting an entirely new approach (e.g. new language, different forms of analysis, etc.).

The costs and risks of the approaches need to be considered, and this would
have to include the issue of maintaining expertise in the tools.

E.3 Changes to regulatory environment or safety criteria


Another driver for maintenance of the safety case is changes to what might be
broadly called the regulatory environment. Changes in the regulatory
environment can sometimes lead to changes in the safety case without top level
changes to the PES requirements. This can be as a result of new standards or
interpretations of standards, new tools and technologies becoming feasible, or
shifts in attitude to risk.
Shifts in attitude to risk might occur from changing perceptions of new
technologies, from the investigation of incidents, or from wider social or political
pressures. For example, an incident involving, say, problems with a
communication protocol in another industry might focus concern on that issue
and require a reinvestigation of the safety case. This may be no more than a
reappraisal that the system is not vulnerable to the incident, or it could require
major new analytical work to bolster the safety case in this area.
The emergence of new standards, or the re-interpretations of existing standards,
could also lead to changes in the safety case. These could impact almost any
area but, given the long gestation period of standards, it should be possible to
track developments and to plan accordingly. In terms of international standards,
the development of IEC 61508 is a significant step as this is likely to have a major
impact on the safety related PES market, and might lead to systems becoming
classified and assessed to IEC 61508 and being used in application areas where
safety cases have traditionally been developed using industry-specific standards.
Either through the development of standards or through wider industrial
application, technologies may be shown to be feasible and hence, given the
present interpretation of ALARP as representing good industrial practice, become
requirements. This may either lead to retrofitting techniques to the safety case or,
as is more usual, to requirements on new systems or modifications of the system.
Potential candidates here are the increased use of statistical testing, more
rigorous justifications from field experience, and an increase in formality and the
degree of proof.
Good industrial practice can also change with respect to the software
engineering process. The increased take up of capability assessment and process

86

Version: 1.1

Adelard Safety Case Development Manual

maturity models may lead to increased requirements for software development.


For example, a large UK company has set target Software Engineering Institute
Capability Maturity Model (SEI CMM) levels of 4 and 5 for safety related software
development and intends that these should be achieved within the next couple
of years.
One must also countenance possible changes to the criteria underpinning the safety case. This work is broadly in line with the existing policy of requiring two arguments in the safety case. But the issues of the strength of the arguments (should they both be equally strong?) and of how the number of arguments might change with the criticality of the application are presently unclear and subject to further regulatory development.


Appendix F Safety case review checklist

F.1 Basis for the checklists

In this section we define the basis for the checklists which follow. We consider the objective of a safety case and elaborate on the implications of the different parts of the definition. The objective is to produce:

    a demonstrable and valid argument that a system is adequately safe over its entire lifetime

F.2 Demonstrable
Understandable. The safety case (or a component part) has to be presented to
and understood by different audiences, such as the developer, the operator and
regulator.
Evolutionary. The safety case has to be presented at different phases in the
system lifetime, i.e.: system concept, system development, acceptance,
operation and replacement.

F.3 Valid
Accurate. As a prerequisite for a valid argument, the evidence presented should
be accurate, i.e.:

Internally consistent.
Be available to all interested parties. We have termed these the stakeholders; they could include the regulator, developer, subcontractor, and customer departments (e.g. engineering, health and safety, operations and maintenance).
Be up-to-date and relate to the actual system design.

This is achieved by producing the safety case within an established safety and
quality management system which tracks the status of the various components of
the safety case and system design and controls the release of documents.

Version: 1.1

89

Adelard Safety Case Development Manual

Related to safety properties. The arguments should directly support claims about the required safety properties of the system (reliability, fail-safety, etc.). Arguments of good practice (e.g. "we tried hard") are not sufficient.
Designed for assurance. The construction of a valid safety case may not be feasible unless an appropriate design is used. A design for assurance approach is advocated, where the system design and safety case are developed in parallel to ensure that:

safety properties can be implemented
the design is feasible
the associated safety argument is credible
the approach is cost-effective

KISS (keep it simple). The risk of flaws in the system design and the associated
safety case will increase with complexity. Complexity should be minimised
wherever possible (see Section 5.1.1).
Traceable. Safety properties at one level will be translated into design features at a lower level. It should be possible to demonstrate a clear link between top level safety goals and the functional behaviour and attributes of implemented subsystems.
Robust. Arguments may contain flaws. The overall claims should not be sensitive to flaws in individual arguments.

F.4 Adequately safe


ALARP. The risk level should be as low as reasonably practicable. It should be
shown that further improvements are either unnecessary or too costly. This will
require analysis of the associated costs of developing the system design and its
associated safety case.
Satisfies design criteria. The nuclear industry frequently uses design safety criteria that are based on engineering consensus and prior experience. To be considered adequate, the safety case should show that the design criteria have been satisfied and that the safety analysis has respected imposed constraints (such as claim limits).

F.5 Over its entire lifetime


Adaptable. The original system and safety case should be designed to accommodate likely changes in:

operational requirements
the operational environment
hardware (e.g. replacement due to obsolescence)
safety requirements

These factors should be considered in the initial design process.


Sustainable. The lifecycle support requirements for the safety case should be
considered during the initial design phase. A support infrastructure is needed to:

check that the integrity of the current safety case is maintained

respond to required changes

The infrastructure requirements have to be feasible over the long term. This
requires an assessment in the design phase of the costs and risks of maintaining
the safety case over the long term.

F.6 Checklist for the technical adequacy of the arguments

This section covers the requirements for a safety case to be a valid argument that a system is adequately safe.
F.6.1 Completeness of argument

Is there adequate coverage of the safety related attributes?
Are the criticalities of attributes defined and justified?
Is there coverage of initiating faults in the system?
Are the mechanisms for eliminating faults and dealing with failure adequate?
Is there coverage of operations and maintenance risks and the adequacy of defences?
Does the argument conform to the design criteria?
  Assessment rules (e.g. claim limits)
  Design rules

F.6.2 Credibility of argument

Are there a number of independent arguments supporting a claim?
Is an appropriate safety case structure used?
Are the system design and the associated safety argument kept simple (see Section 5.1.1)?
Is the argument readily understandable?
Is the evidence of the required quality?
Are the assumptions credible and/or demonstrably conservative?

F.6.3 Integrity of the safety case documentation and system design

Does the documentation satisfy standards and is it under configuration control?
Is the documentation correct?
Is the documentation consistent?
Is the documentation complete?
Is the cross-referencing in the documentation correct and appropriate?
Is the documentation unduly complex?

F.6.4 Checklist for integrity of the operations and maintenance infrastructure

Is there a large number of departments and organisations involved and/or complex contractual interfaces?
Does the organisation have a safety culture?
Is operator training carried out?
Is maintenance training available?
Are operation and maintenance procedures formulated and maintained?
Are all documents available and subject to configuration management?
Is there project stress (e.g. shortage of sufficient staff and time)?


F.7 Long-term maintainability of the safety case

This section gives checklists to indicate whether a safety case is likely to remain valid over the lifetime of the system to which it relates. More discussion of long term issues can be found in Appendix H.
F.7.1 Robustness to system change

What is the likelihood of changes to system function, technology, or regulatory requirements?
Is there the capability to adapt to the identified changes, and provision made for the costs of implementing change?
Is there dependency on operator and maintenance intervention?
Is there the infrastructure for long-term support (organisations, responsibilities, procedures, skills, resources, budget)?
F.7.2 Long-term integrity of the safety case support infrastructure

What is the number of organisations involved? What are the contractual interfaces?
Is there scarcity of skills for performing the safety case analysis and updates?
Is there access to domain knowledge about the safety application?
Is there access to documentation and supporting evidence?
Is there project stress (e.g. shortage of sufficient staff and time)?


F.7.3 Impact of technological obsolescence

Does the system depend on a specific computer system or supplier?
Does the system depend on specialised niche technologies (e.g. nucleonics)?
Are there any fall-back options (and are they acceptable to regulators)?
F.7.4 Impact of regulatory change

For example, can more stringent requirements for the following be met?

diversity (design diversity or functional diversity)
segregation
independence (e.g. for IV&V)
evidence to support an attribute
number of independent arguments
design criteria and claim limits


Appendix G Use of field evidence to support a reliability claim

This appendix discusses the use of prior experience as a justification for reliability. This is normally applied to pre-developed systems. Typically the pre-developed systems are commercial off-the-shelf (COTS) software like operating systems, compilers, graphics or networking software which may be used as part of an overall design. There can also be complete hardware/software packages which are configured for specific applications.
The main incentives for using such packages are reduced development time and (potentially) better reliability, since extensive usage allows faults to be detected and corrected. The following sub-sections look at some of the empirical evidence on achieved reliability, and some of the underlying theory, which is applicable to COTS and, to a lesser extent, to new software developments.

G.1 Empirical evidence


The most direct method of estimating reliability is to record failures in the field
when the product is used. This data can then be analysed to obtain a reliability
estimate. This does however require a very extensive failure recording and
collection process to be implemented over many different sites and companies.
This may be difficult to achieve in general, although data may be available for
very specific application areas.
One study of available field reliability data [2] showed that quite high reliability
figures can be obtained (see figure below).


[Figure: field reliability data. MTTF (years, logarithmic scale, 0.1 to 10 000) plotted against operational use (years, logarithmic scale, 0.1 to 100 000), together with a measurement bound.]

The data relates mainly to commercial products used for real-time applications (control, protection, telephone switching, etc.). For one of the protection systems, the MTTF approaches 1000 years. However, MTTF may be an unsuitable measure for such systems, as the important attribute is the probability of failure on demand, and demands may be infrequent (e.g. less than one per year). Nevertheless, clear trends were found in the study:
1. The reliability seems to be higher with increased operational use.
2. Small software applications have higher reliability than large programs given the same level of operational use.
Using IEC 61508 terminology, the field reliability results indicate that a Safety Integrity Level target of SIL 2 (10 to 100 years mean time to failure) is achievable for some commercial real-time products, but many fall below this. It also provides some justification for treating SIL3 and SIL4 as very onerous requirements requiring special measures.
Another independent study of reliability in PLC applications [4] yielded the
following results.

96

Version: 1.1

Adelard Safety Case Development Manual

Industry sector    Years of operation    Total failures
Nuclear            924.0                 16
Chemical           74.5
Oil and gas        64.5
Electricity        54.4                  10
Totals             1117.4                30 (11 safety significant, 16 production significant, 3 minor)

Note that all the failures observed were due to faults in the application software
rather than the underlying PLC operating system. The average failure rate of the
application software is about once in 35 years, and about once in 100 years for
safety-related failures. Again this is consistent with a SIL2 target (10 to 100 years).
Like the previous study, this study found a correlation between application
complexity and unreliability.

G.2 Theoretical analysis


There are a large number of reliability growth models available, but they generally
require detailed data and are only accurate over the short term. However there
has been recent work on estimating a worst case bound for reliability after a
given period of operational use [3]. This prediction can be made with quite
limited data.
Provided that:

the pattern of use is broadly similar over time

faults are diagnosed and corrected immediately

the theory predicts that for a system with N faults, the achieved reliability after a usage time T will be bounded by:

    MTTF ≥ eT / N

where MTTF is the mean time to failure, and e is the exponential constant (2.7182…). Studies of empirical reliability seem to indicate that this result applies in
practice, and the empirical results shown in the section above are consistent with
this finding. For example [3] discusses the application to a number of data sets.
One of these is from three generations of teleswitch equipment. Most of the
detailed data are confidential, but information is available about: the number of
known faults; the software size; and the failure rate over time. Most of the reliability
growth data are based on operation in the field. One complicating factor is that
new systems were being progressively installed on different sites, each with a
different operational profile and possibly different software options, so that new
parts of the input space could be covered for each new installation. The results for
one generation of teleswitch are shown below. We have used a fault estimate
which is 50% greater than the known faults.
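The worst-case bound described above, MTTF ≥ eT/N, is simple enough to evaluate directly. The following Python fragment is an illustrative sketch (the function name is ours, not the manual's), using the N=175 fault estimate adopted for the teleswitch data:

```python
import math

def mttf_lower_bound(usage_years: float, residual_faults: float) -> float:
    """Worst-case bound on achieved MTTF after a given usage time,
    assuming a broadly stable usage profile and immediate fault
    correction: MTTF >= e * T / N."""
    return math.e * usage_years / residual_faults

# Teleswitch-style estimate: N = 175 residual faults
# (50% greater than the number of known faults).
print(mttf_lower_bound(100.0, 175))  # MTTF bound in years after 100 years of use
```

The bound grows linearly with accumulated usage T and shrinks with the estimated fault count N, which is why the plotted bound is a straight line on log-log axes.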

[Figure 1: Teleswitch reliability growth. MTTF (years) plotted against prior usage time (years), with the predicted bound for N=175. Note that the axes are logarithmic.]

Note how the model provides a long-term prediction. The following shows the growth in time to failure (TTF) using random input distribution test data.


[Figure 2: Growth of time to failure for PODS uniform random test data. TTF (cycles) plotted against usage time (cycles), with the predicted TTF bound for N=31. Note that the axes are logarithmic.]

The predicted lower bound is also plotted on the figure, assuming N=31. It can be seen that most TTFs lie above the bound. The bound actually relates to the average TTF, so statistically some TTFs could fall outside the limits. The one point that falls a long way below the line is known to be a correction-induced fault, but this has little impact on subsequent reliability growth.

G.3 Application of the theory to COTS


Any claims for a COTS product based on such evidence would have to demonstrate that the underlying assumptions were respected, namely that:

the developer has an appropriate infrastructure for collecting and analysing field fault reports
the developer has appropriate quality and configuration management controls for implementing the required corrections
the product is mature, so that successive releases are related to fault corrections rather than the addition of new functions (which could introduce a new set of faults)
the usage of the product in the intended safety application is typical and avoids infrequently used functions

To perform the calculation, we also require the overall usage time of the product
and an estimate of the number of residual faults. The usage time can be inferred
from the number of units sold, and a reasonably good estimate of residual faults
can be obtained by multiplying the software size by the expected fault density.
The fault density might be provided by the developer, or a generic figure could
be used. Relatively conservative generic values of fault density are:
1 fault per kilobyte of binary code (if there is no knowledge of the source code)
10 faults per kilo-line of source code (lines of source code excluding comments)

For example, a small PLC with 20 kilobytes of code might have 20 residual faults. If the PLC had 10 000 years of prior usage we might expect the MTTF for operating system faults (excluding hardware failures) to be better than:

    eT / N = (2.718 × 10 000) / 20 ≈ 1300 years

This is consistent with the empirical evidence (i.e. that no PLC operating system failures were observed in 1000 years of operation). More complex systems will have more faults, so the expected level of reliability growth will be lower. For example, a teleswitching system might contain ten to a hundred times as many faults, so the reliability after a similar level of usage might be one or two orders of magnitude lower (i.e. between 10 and 100 years MTBF, which is broadly consistent with empirical observation).
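The PLC calculation above (20 kilobytes of code, 10 000 years of prior usage, a generic density of 1 fault per kilobyte) can be sketched in a few lines of Python. This is illustrative only; the function names are ours:

```python
import math

def estimated_faults(code_size_kb: float, faults_per_kb: float = 1.0) -> float:
    """Residual fault estimate from code size times a generic fault density
    (here the conservative 1 fault per kilobyte of binary code)."""
    return code_size_kb * faults_per_kb

def mttf_lower_bound(usage_years: float, faults: float) -> float:
    """Worst-case MTTF bound: MTTF >= e * T / N."""
    return math.e * usage_years / faults

n = estimated_faults(20)            # small PLC, 20 kB of code -> ~20 faults
print(mttf_lower_bound(10_000, n))  # ~1359 years, i.e. "better than 1300 years"
```

The same two functions reproduce the upgrade example: with N=100 new faults per release and T=1000 site-years per year, the bound is about 27 years at the end of the upgrade year.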
The theory and empirical results support the KISS principle (Keep It Simple). Simpler
systems should contain fewer faults and hence become reliable more rapidly
than large systems.
It also follows that rapidly evolving designs will be more unreliable than stable
designs. Some systems may be subject to continuous change to incorporate new
functions. These changes can reduce the reliability to a much lower level since
the new faults will have been exposed to relatively little usage and hence can
have much higher failure rates. Under conditions of continuous change, the
failure rates of the new faults can be the dominant factor, i.e. the limit will always
be worse than eT/N where N is the number of new faults introduced in the
periodic upgrades which occur after a usage time T. So for a system that
introduces 100 new faults in each upgrade, and upgrades once per year over


1000 sites (i.e. N=100, T=1000 years), the best reliability that can be expected at the end of the year is at most:

    eT / N = (2.718 × 1000) / 100 ≈ 27 years

and in the early stages of the upgrade period the MTTF bound will be much smaller. It would therefore be sensible not to upgrade to a new version until extensive field experience has been gained.

G.4 Application to a new system

The theory can also be used to predict field reliability for a system which is under development. However the approach cannot support a claim for high reliability, since the amount of usage time T that can be accumulated under realistic conditions is typically quite low. For example, the theory predicts that, for a new system with N=100 residual faults and T=4 months of realistic field trials, the resultant MTTF would be around 3 days.
The only exception to this might be demand-based systems (such as shut-down
systems and interlock systems) where a large number of realistic demands can be
simulated in a relatively short test period. A similar equation can be used where
the probability of failure on demand (PFD) is calculated by replacing T with the
number of test demands. In this case the theory predicts that for 100 faults,
around 37 000 test demands would be needed to achieve a PFD of 10^-3. Such a
test programme might be feasible within a test period of a few months.
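The demand figure quoted above follows from inverting the bound, i.e. finding the number of demands D such that N/(e × D) meets the target PFD. A small illustrative sketch (the function name is our own):

```python
import math

def demands_needed(residual_faults: int, target_pfd: float) -> int:
    """Test demands D such that the conservative PFD bound N/(e*D) meets the target."""
    return math.ceil(residual_faults / (math.e * target_pfd))

# 100 residual faults, target PFD of 10^-3:
print(demands_needed(100, 1e-3))  # 36788, i.e. around 37 000 demands
```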
For real-time systems with low reliability requirements or demand-based systems, it
is feasible to measure the achieved reliability directly (e.g. record the failures and
execution rate or demands and compute the reliability). Clearly it takes less than
4 months to demonstrate whether an MTTF of 3 days is being achieved.
Thus the main relevance of the theory is that it gives an indication of how long it
will take to reach a given reliability level. This can be used to check the realism of
the project plan and the likelihood that target levels of reliability will be reached.

G.5 Estimating residual faults


In order to apply the theory, it is necessary to make an estimate of the number of
residual faults. The available evidence suggests that program complexity is the
primary determinant of the number of residual faults. The number will increase
approximately linearly with program size but specific development methods, tools
and verification methods can reduce the occurrence rate. In the PLC study the
program size was measured by the number of coils or input-outputs. The
average incidence of faults detected in operation was around 0.5 faults per 1000
input-outputs. In conventional computers, program size is typically measured in
kloc (kilo-lines of code) and studies show the post-delivery fault density might lie
between 1 and 5 faults per kloc for conventional development processes.
For in-house software development more accurate estimates of fault density may
be feasible. For a well-established development process applied to large systems,
more precise estimates might be obtained from process profiling. This involves
estimating the fault detection profile of previous projects; the early
development fault data can then be scaled to derive estimates of residual
faults. To illustrate, the following table shows the process profile of the PODS
software diversity experiment [1]. The results are probably not typical of larger
projects, but they do illustrate the overall approach.

Faults created, by development stage:

  Cust. Req   Suppl. Spec   Design   Code
     68           53          19      26

Faults detected, by detection method:

  Cust. Spec Review     52
  Suppl. Spec Review    38
  Design Review         14
  Code Review/Test      24
  Acceptance Test       14

(The remaining faults were found in simulated field operation.)

Table G1: Fault detection performance (PODS project)

The column headings are stages in development (i.e. production of documents
and code). The row headings are fault detection techniques that are applied.
From these results we obtain a profile of the fault creation and fault detection
rates. Assuming the process is typical, the next project can rescale these figures
(e.g. based on the relative sizes of the programs, or on the relative number of
faults found in the early phases of the project). For example, in the PODS project
around 10% of faults escaped acceptance testing and were found in simulated
field operation. If another project used the same process and 40 faults were
found at acceptance testing, we might expect around 4 faults to remain in the
released software. While it may not be typical, a figure of around 10% has also
been observed in some large real-time projects.
Alternatively, the data can be used to estimate fault density at different stages of
development (and especially in field operation). Given knowledge of the size of
the final product in terms of lines of code, and assuming a similar process will yield
similar fault densities in the released product, an estimate can be formed by
multiplying the program size by the predicted residual fault density.
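Both estimation routes described in this section reduce to simple scaling. The sketch below reuses the illustrative numbers from the text; the program size and fault density chosen for route (b) are hypothetical:

```python
# (a) Process profiling: scale acceptance-test finds by the escape ratio
# observed on a previous project (PODS: around 10% escaped into field use).
acceptance_faults = 40
residual_by_profile = acceptance_faults * 0.10
print(residual_by_profile)  # 4.0 faults expected in the released software

# (b) Fault density: program size times predicted residual fault density
# (1 to 5 faults per kloc is quoted for conventional development processes).
size_kloc = 20   # hypothetical program size
density = 2.0    # hypothetical density within the quoted range
print(size_kloc * density)  # 40.0 residual faults estimated
```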


Version: 1.1

Adelard Safety Case Development Manual

G.6 References

[1] P.G. Bishop et al, "PODS - a Project on Diverse Software", IEEE Trans.
Software Engineering, Vol. SE-12, No. 9, 1986, pp 929-940

[2] P.G. Bishop, R.E. Bloomfield, C.C.M. Jones, "Quantification of Software
Reliability in a Nuclear Safety Case: Main Report QUARC 1 Project",
GNSR/CI/2/3, SNL Contract 70B/0000/006384, Adelard report ref.
D/68/4301/4 v1.0, 4 May 1995

[3] P.G. Bishop and R.E. Bloomfield, "A Conservative Theory for Long-Term
Reliability Growth Prediction", IEEE Trans. Reliability, vol. 45, no. 4, Dec.
1996, pp 550-560

[4] R.I. Wright and A.F. Pilkington, "An Investigation into PLC Reliability", HSE
Software Reliability Study, GNSR/CI/21, Risk Management Consultants
(RMC), Report R94-1(N) Issue B, Nov. 1995


Appendix H Long term issues


H.1 Introduction
This section provides guidance for maintaining the integrity of the safety case.
The long term maintenance of the safety case should address the:

safety case maintenance process

technologies used to underpin the safety case

human resources used to implement the process

organisational infrastructure used to sustain the process

We have assumed in our recommendations that the safety management context
will be broadly similar over time, i.e. that there will be periodic safety reviews;
there will be certain stakeholders (procurers, suppliers, regulators, safety
departments, operators and safety committees); and there will be established
safety and quality management processes.
We define the overall long term objective as:
The safety case should remain acceptable over its planned lifetime and
respond to changes in the equipment, environment, and technical
knowledge.
In Appendix J we provide an elaboration of this objective and checklists to
support its achievement. Below we discuss the impact on the safety management
process.

H.2 Incorporating the guidance in existing safety management processes


Long term issues should be addressed during the design and the development of
the safety case (see Section 5) and of course during the operation and
maintenance phase of the lifecycle. We envisage that the assessment guidance
will be integrated into existing safety management activities, namely:

safety and quality audits


periodic reviews of the safety case

the approvals process for system modifications

These processes should be extended to include the integrity of the safety case
infrastructure. In the case of the system modification procedure, this may require
a redefinition of a modification to include changes in the safety case
maintenance environment (i.e. people, structures, resources and procedures),
and an extension to the scope of periodic reviews and audits.
We propose two forms of assessment:

A safety case infrastructure assessment, to assess the capability for
maintaining and updating the safety case. This will be based on an
assessment of the adequacy of the supporting resources needed to
maintain the integrity of the safety case, e.g. documentation, procedures,
staff and organisation, and technical resources.
A safety case technical assessment, to establish the continued adequacy
of the safety case itself and its capability to meet new requirements.

We also identify the set of documents needed to support these assessments, and
propose that a mechanism be put in place to update the assessment
guidance in the light of practical experience and new technical knowledge.
It will also be necessary to consider what activities should be implemented at the
system level and the corporate level. It would be logical to make the long-term
monitoring of new technical knowledge a corporate function, so there should be
some central corporate activity which:


collates system experience (e.g. failure data, common cause failure and
incident analyses)
monitors technical advances, standards and regulatory requirements
alerts system sites to immediate problems (e.g. to other sites with similar
systems)
analyses past experience and updates the design and safety case
assessment guidance and checklists


H.3 Long-term improvement of the safety methodology


It is unlikely that any proposed assessment methodology will be perfect: aspects
may be incomplete or incorrect, so it is necessary to have some feedback
mechanism which assesses the performance of the methodology and incorporates
practical operational experience. Such processes have been implemented in the
aerospace industry. The basic approach for learning from experience is to:

maintain records of past experience (e.g. incidents, failures, etc.)

actively capture new feedback data and application experience

analyse the data and derive the lessons learned (these could be more
general than the incident itself)
review and validate the new rules
incorporate the rules in design criteria, claim limits, checklists, assessment
procedures, etc., for use on subsequent projects
verify that the new rules are applied in each project

It should be noted that this long-term improvement process need not result in an
increased assessment and maintenance burden. Optimisation is part of the
process: the performance and costs of the existing rules and recommendations
should be assessed. If these prove to be ineffective or irrelevant, or there are more
cost-efficient alternatives, the rules should be changed to reflect this.
Greater knowledge of the maintenance effort could have a significant impact on
the approach to the design of new safety systems (e.g. by employing simple
designs or additional defences which reduce the safety case maintenance
requirements). This could be reflected in updated design guidance and design
criteria.

H.4 Safety case maintenance documentation


The documents required to deal with the dynamics of long-term safety case
support include:
Infrastructure requirements. This identifies the expected level of safety case
maintenance support (e.g. staff competencies, technical resources and
staffing levels). This would have been considered at the initial design stage,
but will be updated in the light of subsequent changes.


Anticipated change list. This is a list of possible changes that have been allowed
for in the current system design and safety case, and may need to be
updated over time.
Current concerns list. This would include lists of safety issues that require either
resolution, monitoring or further analysis (similar in principle to the Hazard Log
defined in Defence Standard 00-56). This list can change with time (e.g.
problems can be resolved, and new concerns can be identified).
Safety case infrastructure status report. Produced at periodic reviews to assess the
adequacy of the safety case infrastructure, i.e.:

staff competencies

documentation

technical resources

and any recommended remedial action. Identified problems would be
added to the list of current concerns.
Operational safety case status report. Produced at periodic reviews to assess the
integrity of the current safety case, this would include:

unresolved issues and reported problems

changes to safety case and system since the last review

audit of operation and maintenance procedures and their compliance

review of anticipated changes

summary of the safety case infrastructure report

identification of any potential problems and recommendations for
remedial action

Identified problems would be added to the list of current concerns.


Assessment checklists. Used in the periodic assessments of the integrity of the
safety case and its infrastructure.
Change log for safety case. Records changes to the safety case.
System configuration record. Records the current and past states of the safety
system.


Feedback records. Contains incident records, equipment failure records, software
and hardware problem reports.
Feedback analyses. Identify problems and update rules and recommendations
(design rules, infrastructure rules and checklists). Identified problems would
be added to the list of current concerns.
Change requests. To correct problems, to respond to operational or regulatory
changes, etc.
Modification impact analysis. Assesses the safety implications and costs of a
requested change. This can result in either a rejection of the request, or a
modification proposal with an appropriate safety classification, which is
handled by the existing system modification procedures.
Safety case construction and maintenance guidance. This guidance could be
updated in the light of experience.
Updated safety case and system design documents. These are the final products
of the safety case maintenance process.
Supporting documents. QA records, CM records, V&V and test records, safety
analyses, safety audit reports, etc., produced by the existing QMS and safety
management processes.


Appendix I Maintenance and human factors

Maintenance of a safety case can be viewed as a general human factors
problem, by virtue of its being an example of complex collaborative human
activity. As such, it is subject to the weaknesses that arise in its constituent
individual and collaborative work.
In this appendix, we review the generic vulnerabilities and discuss how these
might manifest themselves in terms of a safety case and its supporting processes.
The weaknesses arising from human factors can be broadly classified in terms of
their source, as follows:
1. generic individual weaknesses to error, e.g. skill-based, rule-based, and
knowledge-based errors
2. supporting materials: how does the design and actual use of documents
and other representations help or hinder the human activities they are
designed to support?
3. violations of established procedures: do existing accepted or
documented procedures encourage deviations or departures that are
(a) harmful or (b) necessary to get the job done?
4. generic group weaknesses to error, e.g. group co-ordination and
process failures arising from inappropriate resources, co-ordination
failures or motivational problems
5. organisational weaknesses: how does the communication, culture and
structure of the organisation impact on the process and its constituent
activities?

I.1 Individual weaknesses


In general, errors due to human factor weaknesses do not arise in a random,
haphazard manner. One well-used taxonomy of errors (see for example [7]) is the
distinction between skill-based, rule-based and knowledge-based errors. Skill-based
errors, sometimes called slips and lapses, are typically associated with
execution failures of routine planned action, and can thus arise when any of the
components of planned action fail.


Rule-based activity can be thought of as problem-solving where pre-packaged
solutions or rules are applied. Thus rule-based mistakes can arise if those rules are
inappropriate or misapplied.
Knowledge-based activity, where plans and solutions have to be devised without
recourse to pre-packaged rules or solutions, is also vulnerable to error.
Knowledge-based mistakes may arise when errors are made in the formulation of
plans or in judgmental processes and reasoning about novel situations based on
existing knowledge. For example, availability and frequency biases may be
exhibited when a solution to a novel problem is chosen merely because (a) the
solution comes easily to mind or (b) the current problem seems to resemble an
existing problem for which the solution has been used before.
The maintenance of experts' skills and domain knowledge is clearly an issue for
complex processes. In particular, we might be concerned with the maintenance
of expertise in the use of notations and languages. One concern might be over
the ability to read and understand a computer program written in an almost
dead language (e.g. Argus assembler), or at another extreme even over the
natural languages used to document the safety case. In the USA, for example, a
significant proportion of the workforce on PES systems in chemical plants may not
have English as their first language. We might also be concerned with some
artefacts of the maintenance process (e.g. an antique piece of computer
hardware, or a static analysis tool) and how expertise in their use is maintained. If
certain aspects of the system require external expertise, there is a likelihood that
some de-skilling may take place for internal experts.
The maintenance of expert knowledge requires close consideration of how
expertise and know-how is to be transferred and communicated within the
particular community maintaining the safety case. Different types of expert
knowledge require different kinds of support. Explicit knowledge (facts,
information, text-book knowledge, etc.) may be transferred by means of formal
documentation within the appropriate organisational groups. However, part of an
expert's skills and knowledge may be in the form of tacit knowledge or know-how
that is hard to articulate and transfer. Attempts to manage organisational
knowledge merely by focusing on those aspects that are easy to manage
(namely explicit knowledge) may exclude other aspects that require different
kinds of organisational and technical support. Tacit knowledge, typically the
informal how-to-do-it expertise, is often poorly supported by documentation,
and typically requires some form of sustained apprenticeship to be transferred.
We therefore need to consider the maintenance of domain knowledge for the
individual, especially the role of tacit knowledge, which tends to remain in the
head of the particular expert. If the appropriate level of expertise and
knowledge is not sustainable, maintainers of safety cases may not justifiably
assume that the rationale behind all aspects of the safety case will be
reconstructable or make sense outside the particular context in which it was first
constructed.

I.2 Supporting materials


Supporting diagrams, documentation and notations perform a central role in the
maintenance process. Careful consideration needs to be given to whether the
structure and design of such materials supports the work actually done.
Maintenance of individuals' knowledge of obscure notations and representations
may become an issue when those notations or conventions fall out of common
usage.

I.3 Violations
Past incidents and accidents may provoke restrictions and prescriptive
procedures on the actions of users of the system. Increased maturity adds further
restrictions as time goes by, perhaps resulting in procedural over-specification to
the point where user violations are the only way to actually get the job done. An
overly-prescriptive safety case procedure set against time demands may
therefore encourage violations and result in unsafe acts. Thus it is important to
consider, in an open-minded manner, any difference between the prescribed
procedures for the process and the actual procedures followed by the users.

I.4 Group weaknesses


The maintenance of safety cases involves the co-ordinated activity of teams of
experts. These teams may include a wide range of stakeholders in the safety case
maintenance process, including technical experts, plant managers, regulators,
consultants and so on. The ability of such diverse groups to maintain good
cohesion, co-ordination and functioning is central to the success of the safety
case maintenance activity.
Consideration of the social-scientific literature can be used to identify potential
sources of weakness for co-ordinated social group activity. In particular we can
look for weaknesses associated with:

resources: are the available human resources lacking or inappropriate
for the task? Teams and group leaders should be selected according to
the skills and experience they possess relevant to the task.
norms: how is the function of the group presented to the group
members? Are the group norms that govern the group's functioning
explicit and available to the group members or to observers outside the
group? How is consensus formed and managed within the group?


performance: how are the contributions of group members produced
and co-ordinated? For example, there may be: status-related problems
(e.g. experts are over-believed); socio-motivational problems (so-called
free-rider problems); or group co-ordination problems arising from
inappropriate leadership styles and group management.
evaluation: how are the individual contributions and overall products of
the group evaluated? If the supervision and evaluation of the group is
inappropriate, members may be apprehensive about contributing, or
contributions may be inappropriately weighted towards expert or
majority opinion.

I.5 Organisational issues


Organisational factors and corporate culture must also be considered when
identifying sources of weakness due to human factors. At an organisational level
the maintenance of a safety case clearly depends on the process within an
organisation, and also on the assumptions about the nature of the organisation
embedded in that safety case. Organisational change is a fact of life and the
extent to which an organisation copes with change can greatly affect safety. For
example, Rochlin [5] identifies organisational failures in high hazard technologies
arising from organisational change as:

failure to adapt to changing technologies and retention of traditional
forms of decision making
too rapid change and the neglect of an organisation's experience base
failures due to not recognising that changed organisational or mission
goals could undermine existing mechanisms for error control

Many analyses of incidents (e.g. the Challenger and Herald of Free Enterprise
disasters) that have naively been attributed to human error have shown that
organisational context and culture are central in assuring the safety of a process.
An inappropriate organisational context surrounding a process may provide
latent errors that lie dormant until a particular set of coinciding events comes
together to form a safety critical incident. Furthermore, organisational culture
and communication structures (e.g. [6]) determine the extent to which corporate
knowledge and good practice may be reused; otherwise old problems may have
to be revisited and solved afresh each time. Organisational weaknesses may be
associated with:


structure: what is the organisational structure? Different categories of
organisational weakness are associated with, for example, centralised vs.
decentralised structures; complex vs. linear interactions; and tight vs.
loose component coupling. For example, do any single points of failure
exist in terms of vital human resources or communication links?

communications: what are the existing communication channels and
structure? How is safety case knowledge disseminated? How is non-explicit
knowledge recorded, if at all? If people are the main sources of
communication links (X meets Y), are there other means by which the
non-explicit, tacit corporate knowledge and learning can be
disseminated to appropriate personnel?
safety culture: what is the actual safety culture of the organisation? Do
procedures encourage violations of regulations? How is responsibility for
safety managed and controlled?
learning: how are experts re-trained in the light of technological and
organisational change? How is experience made available to a wider
audience and disseminated throughout the organisation? How is tacit
knowledge managed? It may certainly be easier to ask Fred, but what
happens to Fred's knowledge when he leaves?

While the documentation, structuring and other systemisation of a safety case
may make it more explicit and make the process altogether more algorithmic,
there are limits to how much knowledge can be captured in this way.

I.6 Knowledge management


The problems of maintaining an adaptive response to the changes in structure
and demand for organisational knowledge resources raise general issues
surrounding the management of knowledge and learning within the whole
organisation.
In particular a long term knowledge-based project such as the maintenance of a
safety case requires a commitment to knowledge management at an
organisational level.
Different types of knowledge require different management strategies. For
example, the role of tacit knowledge needs to be recognised and integrated
within long term development projects. Recent work has looked at the role of
tacit knowledge in the design and uninvention of nuclear weapons [2],
portraying a picture whereby expert know-how may cease to exist unless
actively maintained and transferred. Also, in the medical sector there is the
example of a large hospital that threw away a scanner when the only technician
who knew how to operate it left. Attempts to codify tacit knowledge into an
explicit form are therefore of value if they provide some means of access to
otherwise inaccessible expertise.


However, research [1,2] cautions against an overly technical approach to some
of the problems of long term maintenance. For example, computer-based
expert systems into which the expertise of an engineer is poured before he
retires are at best only a very limited solution to the problem of loss of
experienced personnel. Such attempts seldom capture the valuable aspects of
expertise that are typically hard to codify and formalise. Instead, the formal
knowledge may be easily encoded, but without any notion of how that
knowledge should be applied, or when it is no longer applicable.
Hard-core tacit knowledge is much better passed on through a process of
apprenticeship, whereby this expert know-how can be transferred through
observation and shared participation.
On an organisational level, informal communities of practice may evolve to
support the sharing and dissemination of knowledge between interested parties.
Such groups tend to be focused around particular problems or interests and,
although they typically do not produce deliverables or documents, provide an
important source of information sharing within a long term project.
Organisational support, rather than strict management of such bodies, can
enable these informal social networks of learning and knowledge sharing to
provide additional sources of important non-formal knowledge and continuity for
their participants.

I.7 References

[1] Collins, Artificial Experts: Social Knowledge and Intelligent Machines, MIT
Press, Cambridge, Mass., 1990

[2] D. MacKenzie and G. Spinardi, "Tacit knowledge, weapons design, and the
uninvention of nuclear weapons", manuscript, 1994

[3] Safety Case Management Task Force, Final report to steering group:
proposals for the redesign of the safety-case management process

[4] F.J. Redmill, Dependability of Critical Computer Systems 2, Part 2:
Maintenance and Modification, ISBN 1-8516-203-0, Elsevier, 1989

[5] G. Rochlin, "Essential friction: error control in organisational behaviour", in
The Necessity of Friction, Springer Verlag, 1993

[6] Scott D. Sagan, The Limits of Safety: Organisations, Accidents and Nuclear
Weapons, Princeton, 1993

[7] J. Reason, Human Error, Cambridge University Press, Cambridge, UK, 1990


Appendix J Example checklist: long term issues
J.1 Basis for the checklists
In this section we define the basis for the
checklists which follow. This is achieved
by considering the overall objective of maintaining the safety case and then
elaborating the implications of the different parts of this definition. We assume
that, as an initial condition, the safety case is accepted, the maintenance
infrastructure has been assessed and the system has been licensed, possibly with
known, negotiated concerns (e.g. things to fix later, monitor or investigate).
The overall objective is as follows:
The safety case should remain acceptable over its planned lifetime and
respond to changes in the equipment, environment, and technical
knowledge.
We now consider the implications of this objective. First we consider the
requirements arising from a static safety case (where the basic system and the
environment are unchanged). Responding to the inevitable changes is dealt with
in Section J.3.


[Figure: Maintaining the safety case. The overall objective branches into
"respond to changes" (environment, equipment, technical knowledge) and
"remain acceptable given a static system and environment" (demonstrable,
consistent, valid, adaptable), the latter supported by human resources,
technical resources and documentation.]

J.2 Remain acceptable


To remain acceptable the safety case should be: demonstrable, consistent, valid
and adaptable.
It might be argued that such requirements are irrelevant if a system and its safety
case are unchanged. But as a matter of principle, a company should be able to
understand the systems it operates, and be able to demonstrate safety to a third
party at any time. The safety case maintenance infrastructure provides this
understanding. In practice of course, nothing is ever entirely static, and a safety
case infrastructure is needed to assess potential threats to the integrity of the
safety case, and to demonstrate its continued acceptability in the light of new
developments. It is also necessary to check that the infrastructure has not
degraded with time and is capable of responding to anticipated changes.


J.2.1 Demonstrable
The safety case should be demonstrable: for each stakeholder there should be
adequate human resources, documentation and technical resources to
understand and evaluate the safety case.
Adequate human resources
The safety case is not demonstrable unless there are people available who
understand the safety case and its relationship to the safety system. This
requirement applies to each stakeholder. Some of the issues involved in
maintaining safety case knowledge and skills are discussed below.
Maintaining Skills: There will be a need to identify the skills and knowledge
necessary for the stakeholders. The required skills and knowledge should be
documented, together with the staff who provide these capabilities (e.g. in a
competence matrix). This should include any key sub-contract staff who
provide maintenance support. This matrix should define the required depth of
understanding; in some cases it may only be necessary to have sufficient
knowledge to understand what others have done, while in other cases there
should be the in-depth knowledge needed to create acceptable documents or
designs.
The safety case infrastructure status assessment should:

assess the available competencies of the staff and sub-contractors


report on the adequacy of coverage; this would include the depth of
understanding in each area
identify any potential risks, such as excessive dependency on a single
person or sub-contract
recommend remedial actions, such as recruitment, training, supporting
contracts or in-house support
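As an illustration only, a competence matrix of the kind described above could be held as a simple structured record and screened automatically for single-person dependencies. All names, skills and field names here are hypothetical:

```python
# Hypothetical competence matrix: required skills mapped to the staff who
# provide them, with the required depth of understanding ("appreciate" the
# work of others vs "create" acceptable documents or designs).
competence_matrix = {
    "hazard analysis":       {"depth": "create",     "staff": ["A. Smith"]},
    "static analysis tools": {"depth": "appreciate", "staff": ["B. Jones", "C. Patel (sub-contract)"]},
}

# The infrastructure assessment should flag excessive dependency on a
# single person: skills covered by fewer than two people.
at_risk = [skill for skill, entry in competence_matrix.items()
           if len(entry["staff"]) < 2]
print(at_risk)  # ['hazard analysis']
```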

Tacit knowledge: The safety case may rely on unexpressed knowledge and
expertise within the safety team or supporting experts. Some of this may be in the
form of implicit assumptions and background rationale for design decisions that
have been made. However some of the deep expertise of domain experts may
be in the form of know-how that is difficult to express (see Appendix I). This form of
tacit knowledge is hard to formalise and codify and can become a vulnerability
once these key personnel retire or move on.

Version: 1.1
Adelard Safety Case Development Manual

To address this vulnerability, the review should assess the extent of tacit
knowledge, and recommend how it may be converted to explicit knowledge, or
maintained for the future.
Adequate documentation
The safety case documentation set should meet the needs of the various
stakeholders. It should be written with a clear understanding of who the target
audience is, their likely tasks, and how the safety case documentation set is going
to support these tasks. In particular it should be:

- complete: in terms of coverage of the (sub-)system, and references to supporting material
- well-structured (good indexing and cross-referencing): to support basic user navigation and document understanding
- understandable and usable by the various stakeholders: does the safety case make reasonable assumptions about the audience's background technical knowledge, and does it support user tasks such as review, evaluation and assessment?

A global assessment of the adequacy of the documentation could be performed
prior to acceptance of the original safety case, and be rechecked after any
significant change.
Adequate technical resources
Adequate technical resources should be available to maintain the safety case.
These should include the ability to reconstruct evidence and supporting analyses,
such as: hazard analyses, reliability, availability and maintainability analyses, and
so on. This will require the existence of appropriate tools and techniques to reuse
and reinterpret such data.
Due to the typical size and complexity of a safety case there is a need to provide
adequate support for the documentation set itself. This tool support should be
based on an understanding of:

- Safety case developers' tasks, such as authoring, document management, review and evolution
- Safety case users' tasks, such as assessment, navigation, search and so on

These will need to be supported by:

- document archive and retrieval systems and databases
- document configuration control, analysis, cross-reference and navigation tools
- word processing facilities

A periodic assessment should be made of the availability of the tools, hardware
and software environments, and the people needed to support these functions.
The assessment should:

- identify any potential problem areas (e.g. obsolescence of tools, lack of resources)
- make recommendations to rectify the situation (e.g. migration to a new database or word processor, training, use of external consultancy, etc.)

J.2.2 Consistent
For each stakeholder (e.g. operation, regulator or safety department) the safety
case documentation set(s) should be consistent, i.e.:

- all the stakeholders should have the same documentation set
- the documentation set should relate to the current system in operation
- the documentation should be internally consistent (in terms of cross-references and dependencies)
The available records can be reviewed and stakeholder sites can be audited to
see if the latest versions have been distributed. Even if the documents remain
unchanged, responsibilities and organisations may alter, and it may be necessary
to check that the current stakeholders have the relevant documentation.
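The distribution check described above can be sketched as a simple audit of each stakeholder's held document versions against a master configuration record. The document identifiers, versions and stakeholder names below are invented for illustration only.

```python
# Illustrative audit of safety case document distribution (document
# identifiers, versions and stakeholder names are invented).
master = {"SC-TOP": "3.2", "HAZ-LOG": "2.7", "VV-REPORT": "1.4"}

stakeholder_holdings = {
    "operations": {"SC-TOP": "3.2", "HAZ-LOG": "2.6", "VV-REPORT": "1.4"},
    "regulator":  {"SC-TOP": "3.2", "HAZ-LOG": "2.7"},
}

def audit(master, held):
    """Report missing documents and out-of-date versions for one site."""
    issues = []
    for doc, version in master.items():
        if doc not in held:
            issues.append(f"missing {doc}")
        elif held[doc] != version:
            issues.append(f"{doc} held at {held[doc]}, master is {version}")
    return issues
```

In this invented example the operations site holds a superseded hazard log and the regulator has never received the V&V report; both are the kinds of mismatch the audit should surface.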
An audit can also be performed to check whether the safety case has taken into
account any changes to the system and the operational environment, e.g.:

changes to related systems

changes in operation and maintenance procedures

Any mismatches should be identified, the causes analysed and, where necessary,
changes in procedures implemented. This may require an analysis of existing
processes, and should take into account human factors aspects (see also
Appendix I). For example, a "private" marked-up copy may exist which reflects
the true configuration of the system. One response is to make the procedures
more strict. However, a human-centred analysis might conclude that the existing
procedures are too restrictive, causing the official procedures to be bypassed. If
this was the root cause, a more streamlined and rapid document updating and
dissemination process might be needed.

J.2.3 Valid
Issues to be addressed include:

- Stability of the environment, so that the safety case assumptions and evidence remain valid.
- Procedures should exist for monitoring for changes in the equipment and environment that could invalidate the safety case.
- The safety case maintenance process should track identified concerns or caveats in the initial safety case.
- The integrity of the PES equipment should be assessed to ensure that physical deterioration does not affect the performance levels assumed in the initial safety case.
- The maintenance and operational procedures required by the safety case should be checked for adequacy and conformance.
J.2.4 Adaptable
The safety case and supporting process should be capable of responding to
anticipated changes. As part of the overall safety case methodology, a list of
anticipated changes should be identified, and the system design and safety case
should be able to accommodate those changes.
The capacity to adapt to change should be periodically assessed, e.g. by:

- Reviewing and updating the list of potential changes. The anticipated change list should be reviewed and updated in the light of operating experience, changing requirements (e.g. for changed modes of system operation such as a change to load following from base load operation) and developments in technology (e.g. test methods, understanding of diversity, sensors, or obsolescence).
- Assessing the capability to implement such changes (e.g. availability of staff skills or technical resources) and documenting the areas that are difficult to change.
- Recommending any necessary changes to the support infrastructure to facilitate change (e.g. recruitment, re-deployment, training, increasing technical resources, etc.).
- Recommending feasibility studies to cope with impending changes. For example, technical obsolescence might be addressed by using a software emulation of an earlier computer. A study might be needed to establish if this was feasible and to assess the impact on the safety case.

J.3 Respond to changes in the equipment, environment, and technical knowledge
J.3.1 Equipment Changes
The safety case should evolve and remain acceptable after equipment changes.
There are a number of different categories of change:

- changes to equipment maintenance and operational procedures (e.g. to increase the intervals between maintenance)
- replacement of obsolete hardware, but with no changes to functionality (e.g. a simple refurbishment)
- changes to functionality but not the basic equipment (e.g. changes to parameter settings or configuration options which have been anticipated in the original design)
- changes in functionality which may involve changes to both hardware and software
For any change, there should be processes in place to assess and manage the
impact of these changes on the safety case. This will include:

- analysing safety significance (both for the system and more globally)
- identifying what changes are required to the safety case and the system design
- assessing commercial aspects of the change: risks and costs (implementation cost, outage delays and lifecycle support costs)
- negotiating the proposed changes (e.g. to procedures, equipment, and safety case) with appropriate stakeholders (the relevant stakeholders will depend on the safety category and commercial significance of the change)
- approving the changes with the licensors
- implementing the change

At a more general level, we can also use long-term experience feedback to
improve the overall safety case design and maintenance methodology. This is
discussed in more detail in Section H.3.

J.3.2 Changes in the environment
The safety case should remain acceptable after changes to the environment
(e.g. to organisations, individuals or supporting tools).
Periodic assessments would normally check that the environment is substantially
the same, and could recommend remedial action if any "creep" in the
environment is found. This can include:

- loss of skilled staff and associated tacit knowledge and know-how (see Appendix I)
- changes of responsibility and organisational restructuring
- loss of ready access to key resources (e.g. documentation, technical equipment or expertise)
- obsolescence of technical resources (e.g. databases or test equipment)
- project stress (e.g. reduced time-scales or greater work-loads)
- changes in management attitude and culture (e.g. is the notification of a safety problem rewarded or penalised? do payment schemes encourage several quick "bodge-ups" rather than a single good repair?)

There are a number of possible recommendations for remedial action. Typically
these involve the development of a policy and strategy to support the relevant
organisation's safety case knowledge. This should be supported by appropriate
communication channels within the organisation, together with systems to support
knowledge sharing and transfer (see Appendix I.6).
Major changes to the environment should be assessed and approved in
advance, since they can have a very significant impact on the safety case. These
would usually be organisational changes. The process would typically require a
proposal which identifies the organisational changes and maps out the changes
in resources and how the new structure is to be aligned with the safety case
maintenance tasks.
For both remedial actions and major changes there should be a process involving
the system stakeholders which can accept the proposed change and approve
the resulting implementation. This would typically be part of the normal safety
management process (e.g. involving the plant safety committee, the corporate
safety departments and the licensors).

J.3.3 Changes in technical knowledge
Maintaining an adaptive response to technical changes requires a similar
approach to that of responding to general environmental changes. In particular,
there is a need to be aware of the contribution of domain experts to the integrity
of the safety case. There needs to be an approach to managing this intellectual
asset so that the safety case can (a) make the most of any changes and (b) mitigate
vulnerabilities arising from any loss of expertise.
The safety case should remain valid or be improved in the light of new technical
knowledge.
New technical knowledge can come from a variety of different sources,
including:

- Long-term monitoring of the validity of design assumptions (e.g. quantitative data, such as failure rates, and new phenomena, such as unforeseen failure modes). This can be obtained using experience gained from the operational system, similar systems and generic data.
- Analysis of incidents and experience (with safety or commercial consequences) on the current and other systems.
- Industry-wide technical interest groups that share experience.
- Technology watch to identify new methods or new threats (e.g. mobile phones and EMI, or viruses).
- Changes to interpretations of engineering principles (e.g. diversity or the single failure criterion). For example, in early protection systems similar designs implemented with different relays were considered diverse. A more strict interpretation might require different designs and different technology.
- Reinterpretation of ALARP (e.g. the policy on what is feasible, or the influence of other organisations' practices).
The impact of any new knowledge on the safety case can be either positive or
negative. The information should be assessed to establish whether:

- the safety case is still valid, or whether changes are required to the safety case or the system
- the safety case is too conservative (e.g. pessimistic design assumptions for fail-safe bias, failure rates, etc.); the new information may permit stronger claims to be made about the system
- the safety case is still ALARP (e.g. are any of the new methods reasonably practicable?)

At a more general level, we can also use long-term experience feedback to
improve the overall safety case design and maintenance methodology. This is
discussed in more detail in Section H.3.
J.4 The checklists
Here we provide a checklist of issues and questions to be addressed in meeting
the main objective defined above, namely:
The safety case should remain acceptable over its planned lifetime and
respond to changes in the equipment, environment, and technical
knowledge.
To meet this objective we proposed the following sub-objectives:

Remain acceptable:
- Demonstrable in terms of human resources, documentation and technical resources
- Consistent
- Valid
- Adaptable

Respond to changes:
- Equipment changes
- Changes in environment
- Changes in technical knowledge and learning from experience

J.5 Demonstrable
The requirement:
The safety case should be demonstrable: for each stakeholder there should be
adequate human and technical resources and documentation to understand
and evaluate the safety case.
The following sets of questions address this requirement.

J.5.1 Human resources
Who are the stakeholders that need to be able to understand the safety case?
Maintaining skills:
For each stakeholder identify:

- Who has application knowledge (e.g. of trip systems, reactor types)?
- Who has knowledge of the generic implementation technologies (e.g. computers, software, laddics)?
- Who has specific knowledge about the system design and implementation?
- Who understands the overall safety case arguments and rationale?
- Who understands the safety case arguments?
- Who is able to read and understand the details of the safety case?
- Who needs to be able to write (change) the safety case in the future?

Ask for evidence of these understandings, e.g. reviews.


The safety case is not demonstrable unless there are people available who
understand the safety case and its relationship to the safety system. This
requirement applies to each stakeholder.
The safety case infrastructure status assessment should:

- assess the available competencies of the staff and sub-contractors
- report on the adequacy of coverage; this would include the depth of understanding in each area
- identify any potential risks, such as excessive dependency on a single person or sub-contractor
- recommend remedial actions, such as recruitment, training, supporting contracts or in-house support

Tacit knowledge:
Assess the extent to which the safety case relies on tacit knowledge or "know-how"
of experts. Look for indications such as:

- the existence of "gatekeepers" who seem to know where to look for information and how to interpret that information
- the requirements for use or interpretation of complex test data, languages, formalisms
- the extent of skilled hands-on expertise (e.g. in maintenance or operation)

Develop a strategy to maintain and transfer this tacit knowledge in the future, for
example by:

- converting it to explicit knowledge by making explicit underlying assumptions and background context
- directing explicit attention to elaborating the rationale behind the work supporting the safety case
- enabling the transfer/development of know-how by supporting some system of safety case mentor/apprentice arrangement or training approach
- encouraging the sharing of expertise across interest groups or internal communities of practice

J.5.2 Documentation

- What constitutes the safety case documentation set? Is it complete?
- Is it structured to facilitate navigation, browsing and searching?
- What indexing and cross-referencing is provided?
- Which stakeholders will use the safety case documentation set?
- What assumptions are made about the different users' background knowledge?
- What will be typical user tasks in using the safety case? Consider activities such as familiarisation, evaluation, assessment and so on.
- To what extent does the structure and presentation of the documentation support the user tasks?
- Is the safety case documentation accessible? How?

J.5.3 Technical resources
For each stakeholder ask:
Is there adequate availability of tools, hardware and software environments, and
people to use them to reconstruct evidence and supporting analyses used in the
safety case (from safety analysis to testing)? Consider how the users will
reconstruct, interpret and reuse analyses such as:

- hazard analyses
- reliability, availability and maintainability analyses
- functional tests and analyses
- performance tests and analyses (i.e. timing, memory usage, throughput, etc.)

Is there adequate availability of tools to support the access, navigation and
maintenance of the safety case documentation? Consider the following:

- document archive and retrieval systems and databases
- document configuration control, analysis, cross-reference and navigation tools
- text retrieval and search engines
- word processing facilities
- user annotations and cross-references
- networks/intranets to support distributed use of such documentation

Is there evidence of the use of the tools by the stakeholders for the safety case?

J.6 Consistent
The requirement:
For each stakeholder (operation, regulator, safety department) the safety case
documentation sets should be consistent with the current configuration of
the system.
You should consider:

- How is consistency maintained between the different versions of the safety case documentation (e.g. those with different stakeholders)?
- What tools are used to support configuration management? Traceability across versions? Internal document set consistency?
- What quality management and change management procedures exist? Is there evidence of adequate use?
- Is a change log maintained for the safety case documentation set?
- How is consistency maintained between the actual physical equipment and plant and the safety case documentation?
- Is the right balance maintained between rigorous documentation control procedures and the need for usability (e.g. are change procedures so onerous that users maintain their own unofficial versions)?

J.7 Valid
The requirement:
The safety case should remain valid.
Consider the following questions:
Is the environment (in its broadest sense) stable, so that the safety case
assumptions and evidence remain valid?
What monitoring is there to detect any changes that invalidate the safety case?
Consider the following:

- operational modes
- interfaces
- connected equipment outside the system boundary
- new information, e.g. from the analysis of failure data, incidents, and periodic tests

Are concerns or caveats in the initial safety case tracked (e.g. things to fix later,
questionable assumptions, continuing investigations or supporting analyses)?
Consider:

- questionable assumptions (e.g. on common mode failure, failure rates)
- required investigations and supporting analyses
- implementation of required fixes
- integrity of the PES:
  - sensors and actuators
  - the computing system equipment
  - ancillary equipment (power supplies, cooling systems, etc.)

Review the maintenance procedures and processes. Consider:

- What is the impact of the actual operational and maintenance performance on the safety case?
- Are there adequate procedures to maintain the equipment? Are they followed adequately?
- Is there monitoring of maintenance records to check the maintenance process?
- What types of vulnerability does the equipment have to maintenance (i.e. was there design for maintainability)?
- Have there been any incidents? E.g. has there been loss of availability (spurious trip) due to maintaining redundant channels incorrectly?
- Have the causes of any common failures been identified (e.g. wrong versions in all channels defeating redundancy, passing of bad data, wrong connections, system not restored after maintenance, calibration problems, misdiagnosis and repair of the wrong component)?

J.8 Adaptable
The requirement:
The safety case and supporting process should be capable of responding to
anticipated changes.
The questions:

- Has the need for change been addressed in the design of the safety case?
- Is there an anticipated change list? Is this reviewed and updated in the light of operating experience, changing requirements (e.g. changed modes of system operation such as a change to load following from base load operation), and developments in technology (e.g. test methods, understanding of diversity, sensors or obsolescence)?
- Is it possible to adapt the safety case to change? Review the safety case with respect to the anticipated change list. Assess the cost of different types of change. Document the areas that are difficult to change.
- Does the safety case documentation structure and architecture support its own evolution and development?

J.9 Respond to changes in the equipment
There are a number of different categories of change that need to be tracked:

- changes to equipment maintenance and operational procedures (e.g. to increase the intervals between maintenance)
- replacement of obsolete hardware, but with no changes to functionality (e.g. a simple refurbishment)
- changes to functionality but not the basic equipment (e.g. changes to parameter settings or configuration options which have been anticipated in the original design)
- changes in functionality which may involve changes to both hardware and software

For any change, is there a process in place to:

- analyse safety significance (both for the system and more globally)?
- identify what changes are required to the safety case and the system design, i.e. what is changed and to what extent? Consider:
  - same arguments and type of evidence but regenerated (e.g. for a different compiler or different hardware)
  - same type of evidence but change in rigour (e.g. more testing)
  - change in evidence sources (e.g. replace test evidence with field experience)
  - change in argument (e.g. change in strategy as fail-safe design features are added)
  - changes in operational modes or maintenance procedures (e.g. system modes or contractual arrangements)
  - changes in deployment and long-term support for the system (i.e. skills, tools, process, training for new technologies, etc.)
- assess commercial aspects of the change: risks and costs (implementation cost, outage delays and lifecycle support costs)?
- negotiate the proposed changes (e.g. to procedures, equipment, and safety case) with appropriate stakeholders (the relevant stakeholder will depend on the safety category and commercial significance of the change)?
- approve the proposal (depends on actual process)?
- implement the change?
- agree the change with the licensors?
- approve the change?

J.10 Respond to changes in the environment
Assess changes to the environment (in terms of organisational, human resource
and tool changes) that could affect the validity of the safety case. Consider the extent of:

- past/proposed loss/migration of skilled staff
- organisational protections against the ensuing loss of expertise and tacit knowledge and know-how
- any changes of responsibility
- loss of ready access to key resources (e.g. documentation, technical equipment or expertise)
- obsolescence of technical resources (e.g. databases or test equipment)
- project stress (e.g. reduced time-scales or greater work-loads)
- changes in management attitude (e.g. is the notification of a safety problem rewarded or penalised? do payment schemes encourage several quick "bodge-ups" rather than a single good repair?)

Possible recommendations for remedial action include:

- recruitment and retraining
- documenting tacit knowledge
- setting up structures to spread tacit knowledge (e.g. a safety case apprentice, group reviews of safety case arguments or providing organisational support for communities of practice)
- facilitating formal and informal communications (e.g. via communities of practice) between key groups (e.g. access to the instrumentation group may be needed to discuss the failure behaviour of sensors)
- changes in responsibilities (e.g. maintenance of all safety instrumentation might be assigned to a single person or group)
- provision of new technical equipment and resources
- changes in resource levels, and management incentive structures

Assess organisational changes, which typically have a very significant impact on
the safety case. Develop a proposal to deal with these changes that:

- Identifies the current organisations, their procedures, tasks and interfaces, and the staff involved.
- Describes the proposed new structure, procedures, tasks and interfaces, and the staff involved.
- Includes an impact analysis of the proposed changes that assesses:
  - the coverage of current tasks (are all tasks covered? are some tasks duplicated?)
  - the organisational fit (e.g. are tasks spread across organisational boundaries? will the speed of response be acceptable?)
  - the loss of expertise and domain knowledge (will past knowledge be diluted, or split across separate organisations? do we have a "lobotomised" organisation?)
  - inter-group communication and the number of contractual barriers (e.g. the number of interfaces involved in implementing a given activity or change)
  - access to documentation and expertise
- Provides an argument that the new structure is capable of supporting design and safety case changes.
- Includes the input from the system stakeholders.

J.11 Respond to changes in the technical knowledge
Identify relevant sources of new technical knowledge that could impact the
safety case. This could include:

- Long-term monitoring of the validity of design assumptions (e.g. quantitative data, such as failure rates, and new phenomena, such as unforeseen failure modes). This can be obtained using experience gained from the operational system, similar systems and generic data.
- Analysis of incidents and experience (with safety or commercial consequences) on the current and other systems.
- Industry-wide technical interest groups that share experience.
- Technology watch to identify new methods or new threats (e.g. mobile phones and EMI, or viruses).
- Changes to interpretations of engineering principles (e.g. diversity or the single failure criterion). For example, in early protection systems similar designs implemented with different relays were considered diverse. A more strict interpretation might require different designs and different technology.
- Reinterpretation of ALARP (e.g. the policy on what is feasible, or the influence of other organisations' practices).

The impact of any new knowledge on the safety case can be either positive or
negative. The information should be assessed to establish whether:

- the safety case is still valid, or whether changes are required to the safety case or the system
- the safety case is too conservative (e.g. pessimistic design assumptions for fail-safe bias, failure rates, etc.); the new information may permit stronger claims to be made about the system
- the safety case is still ALARP (e.g. are any of the new methods reasonably practicable?)

J.12 Long-term improvement of the safety methodology
Assess the support for learning from experience within the organisation.

- Is there a process for learning from experience?
- Are incidents and feedback data collected?
- How are new requirements elaborated from existing incident and feedback data?
- What is the process for verifying the application of new requirements to new projects?

Appendix K Example safety case
This appendix illustrates our approach to the construction of a safety case. It is
intended to illustrate the basic principles, and focuses on the overall system
architecture and the identification of requirements and constraints for the
hardware and software. The following safety case has been developed for a
notional reactor trip system. It is hoped the requirements are similar to those for
real reactors, and that many of the arguments presented below would be
applicable to a real system.

K.1 The environment

K.1.1 The plant
The plant is a gas-cooled nuclear reactor containing 400 fuel pins. Each pin is in a
separate gas duct and is cooled by carbon dioxide gas; if the gas flow is
restricted in any duct the fuel pin could overheat and rupture. A reactor trip
system is required to trip the reactor if an excessive temperature is observed in
any duct.

K.1.2 Sensors and actuators
The temperature in each duct is measured by two thermocouples. The reactor trip
is implemented by dropping safety rods into the reactor.

K.1.3 Failure modes
The rod drop system is designed to be fail-safe.
Thermocouples could fail to an open-circuit state, to a short-circuit state, or
gradually degrade.
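As a purely illustrative sketch of how these failure modes might be screened, the code below classifies a reading and checks a duct's sensor pair. The temperature thresholds and the fail-safe policy are assumptions invented for the example, not part of the notional design.

```python
# Purely illustrative screening of the thermocouple failure modes above.
# The thresholds and the fail-safe policy are assumptions invented for
# the example, not taken from the notional design.

OPEN_CIRCUIT_C = 1200.0   # break detection assumed to drive the reading upscale
SHORT_CIRCUIT_C = 5.0     # implausibly low reading for an operating duct
MAX_PAIR_DIFF_C = 25.0    # largest credible difference between paired sensors

def classify(reading_c):
    if reading_c >= OPEN_CIRCUIT_C:
        return "open-circuit"
    if reading_c <= SHORT_CIRCUIT_C:
        return "short-circuit"
    return "valid"

def duct_status(t_a, t_b):
    """Status of one duct's sensor pair; gradual degradation shows up
    as a growing discrepancy between the two readings."""
    valid = [t for t in (t_a, t_b) if classify(t) == "valid"]
    if not valid:
        return "fail-safe"           # no usable sensor: treat as a trip demand
    if len(valid) == 1:
        return "alarm: one sensor failed"
    if abs(t_a - t_b) > MAX_PAIR_DIFF_C:
        return "alarm: sensors disagree"
    return "ok"
```

The point of the sketch is the policy, not the numbers: a duct with no usable sensor is treated fail-safe, while single failures and drift raise alarms for repair.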

K.2 Trip system requirements
We will base our example on the requirements of a notional reactor trip system
which has two thermocouple probes in each of the 400 individual reactor coolant
ducts to detect overheating.

The primary safety requirement for the trip is:

R.TRIP: Trip the reactor if the temperature is too high in any gas duct

There are a number of associated performance requirements for the safety
function:

R.PFD: Probability of failure on demand < 0.001 per annum
R.TIM: Maximum response time 5 seconds

The implemented system must also satisfy a number of operational and
maintenance requirements:

R.STR: Spurious Trip Rate < 0.1 per annum
R.FIX: MTTR (including identification) 10 hours
R.TST: Periodic on-line test interval of 3 months

The integrity should be maintained in the light of changes:

R.UPD: Can be modified to meet expected changes
R.SEC: Can withstand maintenance errors and malicious attacks

The system also has to satisfy specific design criteria, e.g.:

D.F1: No single independent fault affects availability
D.F2: No two independent faults affect safety

K.3 Candidate system architecture
A system architecture has to be evolved which can satisfy these overall
requirements. One solution is shown below. Note that PAC denotes the Protection
Algorithm Computer and DCL stands for Dynamic Check Logic.

Figure K1: Reactor protection example: system architecture. (The diagram shows
the duct thermocouples feeding, via isolation amplifiers, four redundant channels,
each comprising a PAC whose coded output signal is checked by a DCL producing
a square wave signal; the four channel outputs feed fail-safe 2oo4 guardline logic,
and serial lines connect the channels to a monitor computer.)
Each design feature addresses one or more of the safety requirements as
described below.

K.3.1 Redundant channels and thermocouples


Since there are four channels, a single channel failure will not cause a spurious
trip; similarly, testing can proceed on a single channel without causing a trip. If two
channels fail to the no-trip state, the safety function is still maintained (R.TST, D.F1 and D.F2).
The 2oo4 channel voting reduces the spurious trip rate (R.STR) in the presence of
random failures. With only two thermocouples, however, special arrangements are
needed to minimise the spurious trip rate due to thermocouple failures. If required,
one sensor of a pair can be disconnected and tested without the need for a veto
(discussed later).
The four channels and dual thermocouples also reduce the risk of a failure on
demand (R.PFD), and the risk of maintenance-induced faults (R.UPD).
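As an illustration, the 2oo4 guardline vote described above can be sketched in a few lines (the function name and boolean representation are ours, not part of the design; a real guardline is fail-safe hardware, so this only shows the voting arithmetic):

```python
def vote_2oo4(channel_trips):
    """2-out-of-4 vote: trip the reactor only when at least two of the
    four channel trip outputs demand a trip."""
    assert len(channel_trips) == 4
    return sum(channel_trips) >= 2

# One spuriously tripping channel does not cause a plant trip, and two
# channels failed to the no-trip state still leave two healthy channels
# able to satisfy the vote.
spurious_one = vote_2oo4([True, False, False, False])   # False: no trip
two_failed   = vote_2oo4([False, False, True, True])    # True: trip still possible
```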


K.3.2 Fail-safe design features


Each Protection Algorithm Computer (PAC) produces a dynamic output signal
which is checked by the Dynamic Check Logic (DCL) check hardware. This
design continuously checks the integrity of the input/output and should be fail-safe
if it encounters a systematic or random fault. This reduces the risk of a failure
on demand due to an unrevealed fault (R.PFD) and can aid fault detection
(R.FIX).
The DCL checks for an expected output trip pattern based on the injection of test
signals as shown in the figure below. A test signal is fed into each ADC input card
(which is assumed to service 8 analogue inputs). Half the test inputs are
connected to test source T1 and the other half to T2. The test sources T1 and T2
can produce values which should be just above and just below the trip level. The
test values are swapped over by a test mode selector output from the DCL (the
alternation occurs after a complete scan). The test signal inputs to the PAC are
carefully chosen to ensure that a unique pattern of trip output signals is produced
on alternate cycles. This checks the operation of the input hardware and the
setting of the trip level. It also detects stuck-at inputs because the DCL expects
different trip patterns on alternate scans and will freeze if the wrong pattern is
found.
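The expected-pattern check can be sketched as follows (an illustrative model only: the pattern constants and names are our assumptions, and a real DCL would be hard-wired fail-safe logic rather than software):

```python
# Eight test inputs per ADC card: in test mode 0, the inputs fed from T1
# (just above the trip level) must trip and those fed from T2 (just
# below) must not; in mode 1 the selector has swapped the sources, so
# the opposite pattern is expected on the next scan.
EXPECTED_PATTERN = {
    0: (True, True, True, True, False, False, False, False),
    1: (False, False, False, False, True, True, True, True),
}

def dcl_pattern_ok(test_mode, observed_trips):
    """True if the PAC produced the unique trip pattern expected for
    this scan. A stuck-at input or wrong trip setting gives a mismatch,
    which would freeze the channel into its fail-safe state."""
    return tuple(observed_trips) == EXPECTED_PATTERN[test_mode]
```

Because the expected pattern alternates on every scan, an input frozen at its previous value is revealed within one cycle.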

(Figure: the PAC receives 800 thermocouple readings together with test inputs from test sources T1 and T2; a test mode selector output from the DCL swaps the test values; the PAC's coded output signal is checked by the DCL, which also produces the square wave signal.)

Figure K2: Dynamic check logic for a reactor trip channel


The integrity of the underlying computer hardware and compiler is checked using
the reversible computing concept (see reference [1]). This is sensitive to both
systematic faults and random failures in the hardware, and to faults created by the
compiler, and should result in a freeze which is fail-safe (D.F2); it would also
reveal malicious program modifications (R.SEC). Time overruns caused by infinite
loops are also detectable by the reversible computer technique (R.TIM).


K.3.3 Separate monitor computer


This is an example of partitioning according to criticality. The more complex, but
less critical diagnostic functions are performed on a separate system. This
simplifies the design of the trip channel. Each channel provides:

• software configuration data (limits, version numbers etc.)
• measured values and trip results

The monitor computer can be used for pre-start checks on the consistency of the
software configurations in the four channels (R.SEC), and for on-line diagnosis of
channel failures and failures of thermocouples (R.TST, R.FIX). By comparing outputs
from the channels it is possible to decide whether the fault resides in a channel or
the thermocouple input system. It can also be used to monitor long term
degradation of thermocouples. If these are severe, availability can be
maintained by replacement or a veto.

K.3.4 Simplicity
The design has no intercommunication between channels and the A/D
conversion is performed within the PAC. There is no need for interrupt handling or
buffering so the software can be implemented as a simple cyclic program. This
should be easy to test and verify (R.TRIP) and alter (R.UPD).
Since the program is simple and cyclic, the worst case response time is bounded,
and the worst case time is readily determined via timing tests or code analysis. The
time delays in the interfaces can also be measured to determine the overall
response time (R.TIM).

K.3.5 Formally proved software


A simple cyclic program is amenable to formal proof (R.TRIP, R.PFD, R.STR).

K.3.6 1oo2 high trip logic


In order to minimise the risk of failing to trip on demand (R.PFD), either
thermocouple reading high will trip the reactor. To reduce the spurious trip rate,
this design imposes a fail-low direction on the thermocouples and buffer
amplifiers. A veto for a high-failing thermocouple forces the input low, but a
double veto is fail-safe as it will cause a trip (see below).


K.3.7 2oo2 low trip logic


To ensure that the system is fail-safe if both sensors fail, the system will trip if a
thermocouple pair have readings well below the average sensor reading. This
design can withstand a transient loss of a single sensor (e.g. for repair) or a
low-reading sensor without using vetoes (R.STR); this minimises the need for error-prone
manual vetoes. The sensor comparison can assist in detecting failed sensors
(R.FIX).

K.3.8 Program and trip parameters in PROM


The program and trip parameters are stored in separate PROMs so changes
cannot be made without PROM-burning equipment and physical access to the
machine (R.SEC). Configuration errors can also be revealed by the on-line test
inputs, the outputs to the monitor computer and the periodic tests. This helps to
ensure the intended trip function is performed (R.TRIP) and reduces the risk of a
failure on demand or a spurious trip (R.STR, R.PFD).
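The kind of configuration-integrity check implied here can be sketched as a sumcheck comparison (CRC-32 is our choice for the example; the manual does not specify the actual sumcheck algorithm, and the image bytes below are hypothetical):

```python
import zlib

def prom_intact(prom_image: bytes, stored_sumcheck: int) -> bool:
    """Recompute the sumcheck over the PROM contents and compare it with
    the value recorded when the PROM was burned; a mismatch reveals a
    corrupted or wrongly installed PROM."""
    return (zlib.crc32(prom_image) & 0xFFFFFFFF) == stored_sumcheck

# Example: a parameter PROM image and its recorded sumcheck.
image = bytes([0x01, 0x90, 0x02, 0x58])         # hypothetical trip limit data
recorded = zlib.crc32(image) & 0xFFFFFFFF
ok = prom_intact(image, recorded)                        # True: unmodified PROM
bad = prom_intact(image[:-1] + b"\x00", recorded)        # False: corruption detected
```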

K.3.9 Modular hardware replacement


Plug-in cards reduce the repair time (R.FIX). Simple input-output interfaces can be
easily upgraded to accommodate new types of sensor (R.UPD).

K.3.10 Use of mature hardware and software tools


This reduces the risk of systematic faults within the system (R.TRIP, R.PFD, R.STR). This
is an example of avoidance of novelty.

K.3.11 Access constraints


To limit the scope of maintenance error (R.SEC), all equipment is locked and
can only be accessed using the appropriate key (different for each channel). All
plugs and sockets are uniquely identified or physically different to prevent
misconnection. An indicator light shows when a cabinet is unlocked.

K.3.12 Summary of design features contributing to safety


The main features of the design, and their relationship to the safety requirements
are summarised in the following table.


Design Feature                                    Requirements addressed

Redundant channels and thermocouples              PFD, STR, TST, F1, F2, UPD
Fail-safe design features                         PFD, TIM, FIX, F2, SEC
Separate monitor computer                         TST, FIX, SEC
Design simplicity (no inter-channel
communication, cyclic software)                   TRIP, TIM, UPD
Formally proved software                          PFD, STR, TRIP
1oo2 high temp trip in software                   PFD
2oo2 low temp trip in software, fail-low
bias on inputs (to reduce vetoes)                 STR, FIX
Program and trip parameters in PROM               PFD, STR, TRIP, SEC
Modular hardware replacement                      FIX, UPD
Mature hardware and software tools                PFD, STR, TRIP
Access constraints                                SEC

Table K1: Safety case: design features vs. safety requirements

An alternative architecture might use design diversity as a method for reducing
the probability of failure on demand (R.PFD). For example, two channels could be
implemented using PLC type A, and two channels implemented using PLC type B.
The failure rate for the hardware and system software (and possibly fail-safe bias)
of the PLCs could be based on field experience (see Appendix G). Diversity might
be used to claim an order of magnitude reduction in the probability of common
failures between the diverse PLCs.

K.4 Evidence from the development process


The development and verification processes can produce evidence that can be
used in the safety argument. Documentary evidence is needed to show that the
planned activities are being carried out correctly (e.g. audits). This is necessary to
have confidence in the documented evidence and its relevance to the actual
system.
More specifically, there can be tests incorporated within the development process
to support claims about specific safety attributes, for example:

R.TRIP    Proof of conformance to specification
          High trip tests for pairs and single inputs
          Low trip tests for pairs and single inputs
          Tests of independence between inputs from different ducts

R.PFD     Statistical reliability tests (10⁴ representative trips)
          Tests of fail-safe response (e.g. simulated failures)

R.TIM     Static analysis to determine the worst case execution time
          Time response tests

R.FIX     Test of diagnosis and repair times using simulated faults

With an alternative diverse channel architecture using PLCs, we may not be able
to perform formal proofs but we might be able to claim an order of magnitude
reduction in failures per demand beyond that demonstrated in the statistical tests.

K.5 Long term support activities


Long term support requirements are discussed in Appendix H. This deals with the
long term infrastructure requirements necessary for maintaining and updating the
system. The details will not be discussed here, but there are some specific support
activities which can affect the system integrity, namely:
• Scheduled testing: proof testing to verify all inputs can produce a trip,
  recalibration, etc. Scheduled testing for channels would typically be
  staggered to reduce the risk of a common mode maintenance error.

• On-line fault detection: a fault might be diagnosed from a behavioural
  anomaly (e.g. a partial or total trip), or by apparent discrepancies between
  channels.

• Fault diagnosis: using available data from the computer monitor outputs,
  and direct tests on the hardware, the source of the problem is identified.

• Repair: an item is recalibrated, or an item is replaced. The channel or a
  channel interface is powered down while this is done. The unit is retested
  and the channel put on-line.

• Veto: it is sometimes necessary to disable the normal functionality of the
  system in order to maintain availability. The thermocouples are physically
  located in the reactor and cannot be repaired immediately, so a veto might
  also be applied to avoid a spurious trip if a thermocouple sensor was failing
  high. The trip for an individual fuel element may also be temporarily vetoed
  for on-load refuelling.

• Refuelling: the thermocouple connectors are disconnected on refuelling.
  This is not a problem if the reactor is refuelled off-load, but disconnected
  thermocouples could cause problems on start-up.

• Updates: the software functionality may be changed. Changes are most
  likely to be made to trip limits and scaling parameters, but in some cases the
  program may be modified. The changes have to be verified off-line, and
  correctly installed (via PROM replacement). The likely changes are
  anticipated to be:
  – trip limit changes
  – change in number of inputs
  – change of computer hardware or software tools
  – change in trip logic
  – change of sensors
  – regulatory changes (design criteria, or evidence)

K.6 Arguments supporting the safety claims


For each safety requirement there will be one or more independent arguments to
support the claim. A subset of the safety claims is summarised in the following
tables:


R.PFD

Claim: PFD < 10⁻³ pa

C.PFD.RAND
    Argument: the failure per demand due to random hardware failures is less
    than 10⁻³, from hardware reliability analysis (redundancy + monitor +
    self-tests) and no systematic faults (sub-claim C.NO-FLT); see the
    Probabilistic Fault Tree Analysis.
    Assumptions: common mode factor; component failure rates; fault detection
    coverage and fail-safe bias of inputs; repair times.

C.PFD.SYST.1
    Argument: even if there are systematic faults, the chance of failure per
    demand is less than 10⁻³. 10⁴ reliability tests using representative trips
    without failure give more than 99% confidence in a PFD of 10⁻³.
    Assumptions: trip scenarios are realistic; the requirements are correct.

C.PFD.SYST.2
    Argument: fail-safe design will ensure that at least 90% of failures due to
    systematic faults are fail-safe. (Note: design assessment criteria might
    impose a claim limit of 90%.)
    a) Double thermocouple disconnection or veto will cause a trip.
    b) Compiler, loader and processor flaws are protected by the reversible
       computing technique.
    c) ADC, application software and configuration flaws are covered by the
       dynamic on-line tests.
    Assumptions: thermocouples fail low in 90% of cases; tests indicate a
    99.995% fail-safe bias; the on-line tests detect 90% of systematic
    failures.
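The statistical-test claim can be checked with a one-line calculation: if n statistically representative demands execute without failure, the confidence that the PFD is below p is 1 − (1 − p)ⁿ (a standard zero-failure test result; the wrapper function is ours):

```python
def confidence_zero_failures(n_tests: int, pfd: float) -> float:
    """Confidence that the true probability of failure on demand is
    below `pfd`, given n_tests failure-free representative demands."""
    return 1.0 - (1.0 - pfd) ** n_tests

c = confidence_zero_failures(10_000, 1e-3)
# c is roughly 0.99995, comfortably above the 99% quoted in the table
```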


Sub-claim C.NO-FLT: there are no systematic flaws.

C.NO-FLT.HW
    Lower level claim: there are no systematic flaws in the hardware.
    Argument: established designs + system tests + reliability tests imply
    that there will be no systematic hardware flaws.
    Assumptions: extensive use will reveal and remove inherent flaws; tests
    will reveal all miswiring and mis-configuration.

C.NO-FLT.SW
    Lower level claim: there are no systematic faults in the software.
    Argument: the code has been formally proved, and it has undergone
    functional tests to reveal compiler-induced faults.
    Assumptions: the requirements are correct; the functional tests can
    reveal all compiler-induced faults.

R.TIM

Claim: Time < 5 secs

C.TIM.STATIC
    Argument: static analysis of the worst case path through the code,
    including the times needed for i/o. Worst case cycle time is 2.7 seconds.
    Assumptions/evidence: instruction execution times are correct; ADC
    conversion and output times are correct.

C.TIM.TEST
    Argument: timing measurements + argument that the execution time is
    bounded and relatively constant. Worst measured time is 2.4 seconds.
    Assumptions/evidence: test results.

C.TIM.REV
    Argument: an excessive or infinite loop will be detected by the
    reversible computer implementation.
    Assumptions/evidence: the reversible computer implementation is OK.

R.UPD

Claim: updating the system should not introduce faults

C.UPD.DATA, C.UPD.PROG
    Argument: there is sufficient protection to prevent updates of program
    or data introducing dangerous faults.
    Evidence: adequate support infrastructure (see the Anticipated Change
    analysis); procedures for testing updates; on-line test injection will
    reveal dangerous flaws in trip limits and trip logic.


K.7 Supporting analyses


The basic safety argument will refer to evidence from supporting analyses. This
evidence will change as the system is developed. Initially the analyses may be
based on initial assumptions (e.g. based on past experience) and design targets.
This can later be supplemented by test evidence and, in some cases, there may
be a requirement to gather supporting evidence during system operation in the
longer term (e.g. to confirm initial assumptions in the estimate of the probability of
failure per demand).
The following sections provide examples of analyses which support some of the
safety arguments.

K.7.1 Probabilistic fault tree analysis


The fault tree analysis is based on a system hazard identification study (not
discussed here) which uses conventional guide words to help identify potentially
dangerous failure modes of the various system components. A fault tree is then
constructed to identify combinations of events which can cause a dangerous
failure. The top event in the tree is when the system is unavailable but the failure is
unrevealed.
To be more concise, the fault tree is represented textually, with the top events on
the left and sub-events indented. Terms in square brackets represent intermediate
or top events, and are expanded on the subsequent indented lines. The fault tree
covers the main safety related event: a failure to trip on demand. A similar tree
could be constructed for spurious trips.
The probabilities of the base events in the fault tree are based on estimates of
hardware reliabilities and the likelihood of human-initiated events. The
assumptions on which the analysis is based are listed first, followed by the
quantitative estimates for the minimal cutsets contributing to the top event.
Note that some events may be deemed incredible (i.e. probability zero) based
either on deterministic arguments or because of the depth of defences. Even if
zero, all probabilities are shown for later inspection and independent assessment.

Assumptions
• 10% of sensor failures are unrevealed
• 10% of buffer failures are unrevealed
• Common failures are 10% of individual failures
• 10% of channel failures are unrevealed by a channel trip
• 10% of channel failures are unrevealed by the monitor
• Channel failure rate (CPU + ADC + DCL): 1 pa
• Sensor failure rate: 10⁻³ pa
• Buffer failure rate: 10⁻³ pa
• MTTR: 10 hours
• Proof test interval: 3 months

Probability estimation
The system is unsafe if a dangerous fault exists but is unrevealed. Internal checks,
monitor checks and proof tests are the main methods for revealing failures.
Systematic faults are mainly deemed to be incredible (see the sub-claim C.NO-FLT).
For random failures we have to include the risk of common cause failures, and the
chance they will remain undetected until the 3-monthly proof test. Taking the
case of the sensors, the basic failure rate is estimated to be 10⁻³ per annum. We
assume that the common mode failures are 10% of this (10⁻⁴ per annum), and 10%
of these will be undetected until the 3-monthly proof test (10⁻⁵ per annum). On
average the dangerous sensor measurement failure will be unrevealed for one
and a half months (0.125 of a year), so the probability of unrevealed unavailability is
0.125 × 10⁻⁵. The unavailability of temperature measurements due to two
unrevealed random failures in one duct is negligible (around 10⁻¹⁰). Since the
demand is only made on one duct, we only need to consider the unavailability of
a single duct measurement.
A similar argument can be applied to the isolation amplifiers and buffers. The
dominant factor is again common mode failure, which is assumed to affect all
buffers simultaneously, so the calculation is identical to the one used for the
thermocouples.
For the hardware channel failures we assume the common mode failure rate is
10% of the single channel failure rate (10⁻¹ per annum). Of these, 10% are
unrevealed by a channel trip (10⁻² per annum), and 10% of the remainder are not
detected by the monitor (10⁻³ per annum). An unrevealed failure persists for an
average of 0.125 years, so the overall unrevealed unavailability is 12.5 × 10⁻⁵.
The probability assignments for the fault tree events are summarised below,
including those which are assumed to be incredible (probability zero).
[duct-specific fault]
    Demand(i) and 2oo2 Sensor(I) failed unrevealed            0.125 × 10⁻⁵
    or 3oo4 [Buffer(A,I) and Buffer(B,I) fail unrevealed]     0.125 × 10⁻⁵
    or software reads input J instead of input I              0 (proof tests, analysis)
    or multiplexor reads input J instead of input I           0 (proof test, DCL)
or [multiple channel faults]
    3oo4 [hardware channels fail unrevealed]                  12.5 × 10⁻⁵
    or wrong trip settings                                    0 (proof test + monitor + DCL)
    or operating on stale copy of input data                  0 (no copies, DCL)
    or sends old copy of output data                          0 (no copies, DCL)
    or execution time too long                                0 (analysis + test + on-line test)
    or high trip logic flawed                                 0 (formal proof, test, DCL)
    or multiplexor hardware latches past values               0 (proof test, DCL)
    or DCL fail-danger flaw                                   0 (analysis, fault injection)

PFD                                                           12.7 × 10⁻⁵

With an unrevealed unavailability of 0.13 × 10⁻³, and an assumed demand rate of
1 per annum, the estimated PFD is 0.13 × 10⁻³ pa, which is well within the 10⁻³ pa
target.
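The cutset arithmetic above is simple enough to recompute directly (the rates and 10% factors are taken from the assumptions list; the variable names are ours):

```python
EXPOSURE = 0.125   # mean unrevealed time in years (half the 3-month proof-test interval)

# Sensors: 1e-3 pa failure rate, 10% common mode, 10% unrevealed
sensor_unavail = 1e-3 * 0.1 * 0.1 * EXPOSURE        # 0.125e-5
# Buffers: same rates as the sensors, so the same contribution
buffer_unavail = sensor_unavail                      # 0.125e-5
# Channels: 1 pa rate, 10% common mode, 10% unrevealed by a channel
# trip, 10% of the remainder missed by the monitor
channel_unavail = 1.0 * 0.1 * 0.1 * 0.1 * EXPOSURE   # 12.5e-5

unrevealed_unavailability = sensor_unavail + buffer_unavail + channel_unavail
# about 12.75e-5, i.e. roughly 0.13e-3: within the 1e-3 pa target
```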

K.7.2 Anticipated change analysis


System Updates. The system and its safety case will need to be updated to
respond to functional changes, changes in technology, and regulatory
requirements (R.UPD). Potential changes to the system and their impact are
discussed below.
Trip limit changes. The safety case has to justify that the trip limits are valid, that the
changes are correctly implemented, and that they do not affect the remaining software.
The impact of the change is minimised by holding the parameters in a separate
PROM. The installed parameter settings can be verified by proof testing, via the
on-line test signals (each side of the trip limit) and via the monitor output.
Change in number of inputs. No fundamental changes are required in the design
or the safety case. It may require changes in the input-output hardware,
software and DCL, but no change in the proof, and only small changes in the
program, which can be verified by proof testing and by testing in conjunction with
the modified DCL.
Change of computer hardware or software tools. The fail-safe integrity checks
provide protection against flaws in the new hardware and software tools. The
separate channel structure and simple input-output interfaces permit selective
upgrading on a per-channel basis (phased commissioning).
Change in functional requirements. This would require rework of the formally
developed software and repetition of the formal proof. Proof tools have to be
available (or be re-implementable on another system). Formal proof requires relatively
scarce expertise and could represent a risk in terms of greater implementation delays
and higher update costs. However, licensing risks and the associated costs are
likely to be reduced.
Change of sensors. Relatively simple technology. Changes can be
accommodated by re-scaling the buffer amplifiers or changing the scaling
constants in the software. Verifiable via proof testing, dynamic on-line tests and
the monitor output.
Regulatory changes. If the requirements for diversity become more stringent,
diversely implemented channels can be used to protect against systematic
hardware and software flaws. This is relatively simple as each channel is
independent. Diverse sensors and buffers are also feasible. Requirements for more
rigorous system testing should be feasible as each channel is a standalone unit,
and tests can be performed individually without the need to test for interaction
effects.

K.7.3 Analysis of maintenance and operations


The possible failures that could occur in these activities are enumerated by
considering a number of guide words (e.g. incomplete or wrong). The design
safeguards are identified for each case. These could well be supplemented by
procedures, training, manual records and checklists, but are not discussed below.
Proof testing. Incomplete: e.g. some elements not tested. Transposed: e.g. tests
on the wrong channel. Wrong: e.g. incorrect recalibration.
    Safeguards: clear identification of channel equipment, access keys
    (different for each channel), limits on the amount of adjustment,
    cross-checking subsequent behaviour via the monitor.

Fault diagnosis. Incomplete: failure to spot a discrepancy between channels.
Transposed: identify the correct component type but not which one (e.g.
channel or thermocouple). Wrong: identification completely wrong.
    Safeguards: proof tests, cross-checking subsequent behaviour via the
    monitor, system trip (fail-safe, but undesirable).

Repair. Incomplete: repair omitted or partially performed (e.g. not fully
reconnected). Transposed: swapped-over connections or components. Wrong:
e.g. wrong component, wrong settings. Repair on the wrong channel could
cause a spurious trip if one channel is tripped already.
    Safeguards: proof tests, cross-checking subsequent behaviour via the
    monitor, PROM and computer self-tests, system trip (fail-safe, but
    undesirable).

Veto. Incomplete: e.g. sensor not vetoed on all channels. Transposed: veto of
the wrong sensor of a pair. Wrong: e.g. wrong channel vetoed.
    Safeguards: proof tests, cross-checking behaviour via the monitor,
    channel trip when a sensor fails low or high, avoidance of vetoes for
    normal operation and designed failure modes.

Refuelling. Incomplete: thermocouple left disconnected. Transposed: sensor
connections transposed. Wrong: bad connection (reading low, short
circuit).
    Safeguards: reactor start-up checks, proof tests, cross-checking
    behaviour via the monitor, connection labelling.

Updates. Incomplete: incomplete PROMs. Transposed: PROMs in the wrong
order. Wrong: wrong PROM version used, update incorrect.
    Safeguards: proof tests, PROM integrity checks (e.g. CRC checks
    across program PROMs and parameter PROMs), version and
    parameter settings echoed to the monitor, cross-checking behaviour via
    the monitor, channel trip due to pattern mismatch at the DCL.

K.8 Safety long-term support requirements


K.8.1 Support infrastructure
Activities:
• safety reviews
• problem analysis
• system/safety case redesign

Special tools/skills:
• formal proof methods
• reversible computer design
• DCL design
• test environments
• test suites
• FTA and RAMS techniques

Domain knowledge:
• sensor characteristics
• CMF mechanisms

Anticipated changes:
• trip parameters
• trip logic
• fault detection
• number of inputs
• processor hardware
• interface hardware

K.8.2 Maintenance support risks


Most of the maintenance and upgrade safety issues have been addressed in the
design, but upgrades could be hampered by a lack of key skills and
technologies. Replacement of obsolescent hardware does not require any
unusual skills. Reprogramming the software is mainly restricted to a
re-implementation of the reversible computer instruction set and is a relatively
straightforward task. Functional changes will require a change to the formal
proof, and may be vulnerable to obsolescence of the support tools and formal
methods skills. There will be a significant delay if the formal proof has to be
re-implemented from scratch using a different formal notation and support tools.
Obsolescence of the dynamic check logic could be a problem, but the basic
structure should be re-implementable in a new technology, and the fail-safety
can be reviewed by independent specialists and tested directly by fault injection.
As a fall-back, the system could be re-implemented with diverse hardware and
software in the channels.

K.8.3 Regular analyses


The safety case is predicated on a set of design assumptions about the
equipment, the operational environment and the behaviour of connected
equipment. Records should be maintained of equipment failures and repairs, and
these should be analysed to determine whether these assumptions are borne out
in practice. The analyses would typically include:

• equipment failure rates
• component failure rates
• proportion of common mode failures
• proportion of fail-danger faults
• proportion of gradual and abrupt sensor failures
• MTTR
• maintenance error rates
• proportion of equipment faults found in on-line tests and proof tests
• spurious trip rate
• software faults and the proportion which are dangerous

The impact of these results on the safety case should be assessed. If the results
undermine the safety case, changes to the system design, operating procedures,
or monitoring systems may be necessary.

K.9 Elaboration to subsystem requirements


If the candidate system architecture, safety case and support requirements are
acceptable, the design can be further elaborated into a set of design
requirements for the subsystems. In the specific reactor trip example there might
be requirements for the following.
D.ARCH    Overall system architecture: apportionment of functions, overall
          design safety case, design assumptions, numerical design targets,
          design constraints, required safety case evidence, operation and
          maintenance infrastructure, design for change, long-term support
          requirements.

D.ENV     Requirements for environmental tests (shake and bake) for all
          hardware: maximum temperature, humidity, cooling requirements, EMI
          protection.

D.POW     Power supply specifications, reliability requirements.

D.DCL     Specification of the DCL + fail-safety requirements.

D.INP     Input specifications (number, range, isolation, etc.).

D.ADC     Requirements for the ADC (number of inputs, range, speed, reliability).

D.MON     Requirements for the monitor and monitor interfaces.

D.CPU     Requirements for the CPU (speed, PROM capacity, RAM capacity,
          input-output, etc.).

D.SW      Requirements for the software.

Note that the subsystem requirements will include any evidence required for the
safety case (e.g. environmental test evidence, timing, fault tolerance tests, fault
injection tests, etc.). This evidence could be part of the subsystem deliverable.
As an example of how the subsystem requirements are elaborated, the
requirements for the software (D.SW) are given below. The requirements placed
on the software are based on an apportionment of the top-level safety functions
together with additional requirements imposed by lower level design decisions.
The requirements include the basic functional requirements for the software,
specific design constraints on the implementation method, and requirements for
safety case evidence.

K.9.1 Software Functional requirements


SW.INFO    From R.SEC and R.UPD. Every complete scan cycle, send the software
           configuration data (number of inputs, input scale factors, trip limit
           values, software version number and sumchecks).

SW.TRIP    From R.TRIP. For all inputs:
           • Scan the two temperature readings (Ra, Rb) from the ADC.
           • Scale the values to Ta and Tb.
           • Perform the 1oo2 voted high temperature trip
             (HiTrip = max(Ta, Tb) > Tlimit).
           • Perform the 2oo2 voted low temperature trip
             (LoTrip = max(Ta, Tb) < MinOpTemp), where MinOpTemp is MaxDiff
             below the median operating temperature for all ducts.
           • Send (HiTrip or LoTrip) to the DCL.
           • Send the Ra, Rb, HiTrip, LoTrip values to the monitor output.

SW.IO      Satisfy the specified interface requirements for the ADC, DCL and
           monitor ports (from D.DCL, D.ADC, D.MON).

SW.CHK     From R.MTTR. Halt if an internal failure is detected (PROM sumcheck,
           RAM checks, processor, time overrun). Provide an indication of the
           type of fault detected.

SW.TIM     From R.TIM. The software scan cycle should be less than 5 seconds,
           including the time required for all input and output operations.
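The SW.TRIP voting rules can be sketched directly from the requirement (the constant values are hypothetical placeholders; only the comparisons come from SW.TRIP):

```python
from statistics import median

T_LIMIT = 650.0    # high trip limit -- hypothetical value
MAX_DIFF = 120.0   # margin below the median duct temperature -- hypothetical value

def trip_decision(t_a: float, t_b: float, all_duct_temps: list) -> bool:
    """Combined trip output sent to the DCL for one duct."""
    hi_trip = max(t_a, t_b) > T_LIMIT                # 1oo2: either reading high trips
    min_op_temp = median(all_duct_temps) - MAX_DIFF
    lo_trip = max(t_a, t_b) < min_op_temp            # 2oo2: both readings must be low
    return hi_trip or lo_trip
```

Because max(Ta, Tb) is compared on both sides, one high reading is enough to trip (1oo2), while both readings must be low before the low trip fires (2oo2), matching the fail-safe intent of K.3.6 and K.3.7.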

K.9.2 Safety case design constraints imposed on the software


1. For an architecture where there is common hardware and software in all four
channels:

SW.REV     Implement the software using the reversible computer technique.

SW.FM      Formal proof that the code implements the specification.

2. For an architecture using diverse PLCs and software in the channels:

SW.ETST    Exhaustive test for software components + arguments for
           independence/composability.

SW.DIV     Diverse implementations of the application software (e.g. IEC 1131-3
           and Pascal).

SW.CYC     Ensure that the software is implemented as a simple cyclic loop. Avoid
           the use of interrupts and buffering.


K.9.3 Safety case evidence requirements for the software development


SW.CHK.CASE     Check the fault detection performance for simulated faults.

SW.TRIP.CASE    Perform 10⁴ demands on the system using realistic trip profiles.

SW.TIM.CASE     Show the timing constraint is satisfied.

SW.V&V.CASE1    SW.FM.VER: provide the proof script and independent
                verification of the proof.
                SW.REV.CASE: demonstrate the reversible computer is
                implemented correctly and the formal software is correctly
                mapped to reversible code. Provide tests of fail-safe
                performance.

SW.V&V.CASE2    SW.ETST.CASE: show all software modules are exhaustively
                tested. Show all modules operate independently for all
                readings.
                SW.DIV.CASE: show diverse implementations are independent
                (languages, tools, staff, V&V).

SW.DES.CASE     Show compliance with the implementation constraints.

SW.TOOL.CASE    Provide impact analysis of faults in support tools, and analysis
                of tool quality (e.g. likely number of faults injected).

K.9.4 Software documentation/QA requirements


SW.PROCESS    Provide evidence for the integrity of the delivered system and the
              development process: safety plan, safety audit records, quality
              plan, QA records, plans, design documents, software, proof files,
              V&V records.

SW.PRODUCT    Provide all necessary items for use and long-term support: design
              documents, software, proof scripts, test environment, support
              tools.

K.10 References

[1]    P.G. Bishop, "Using Reversible Computing to Achieve Fail-safety", in
       Proceedings of ISSRE '97, Albuquerque, New Mexico, USA, November 1997.


Appendix L Index
accident mitigation .................... 18, 61
adequate ....................................... 8
ALARP .............................. 61, 67, 94
architectural safety case ............ 22, 32
assumptions ................................. 14
certification .................................. 37
checklist ......... 30, 31, 75, 93, 123, 133
claim limits ....................... 56, 62, 66
claims ............................... 7, 14, 20
commercial risk .............................. 8
common cause failure analysis ........ 41
conservatism ................................ 49
correctness .................................. 41
costs ........................................... 29
COTS ......................... 37, 48, 99, 103
dangerous failure ................... 18, 19
Def Stan 00-55 ............................. 64
Def Stan 00-56 .................. 62, 64, 66
defence in depth .......................... 62
design criteria .............................. 65
design for assessment ...... 30, 32, 33, 48
design options .............................. 69
deterministic argument ............ 15, 86
diversity ...................................... 19
documentation .................. 52, 53, 58
domain knowledge ...................... 116
external equipment ....................... 24
failsafe bias ................................. 40
failure mitigation .................... 18, 38
failure modes ............................... 24
fault elimination ........................... 18
fault tree analysis ......................... 36
feedback records ........................ 113
field experience ........................... 99
FMEA .......................................... 41
formal proof ...................... 15, 56, 81
hazard analysis ............................ 36
Hazops ........................................ 39
human error ............................... 115
human factors ....................... 52, 115
IAEA-367 ..................................... 65
IEC 61508 ....... 8, 9, 62, 64, 65, 90, 100
implementation safety case ..... 23, 40
independent assessment ............... 49
integrity checks ............................ 37
integrity level ........... 16, 55, 62, 65, 66
interlocking ................................. 36
ISO 9000-3 ................................... 64
ISO 9001 ..................................... 64
KISS .................................... 34, 104
legacy system ........................ 29, 48
long term issues .......................... 109
long-term costs ............................ 51
maintainability ....................... 21, 62
maintenance ............................... 51
management ............................... 52
modifiability ................................ 62
MTTF .......................................... 15
MTTR .......................................... 22
novelty ....................................... 36
obsolescence ......................... 51, 87
operation and installation safety
  case .................................. 23, 44
operator ..................................... 39
PES ............................................ 55
Preliminary .................................. 23
preliminary safety case ........... 22, 23
probabilistic arguments ................. 15
probabilistic criteria ..................... 65
project lifecycle .................... 22, 45
purchaser ................................... 39
qualitative argument .................... 16
quality management .......... 39, 58, 67
regulator ................... 10, 39, 49, 93
risk ............ 32, 34, 36, 38, 39, 49, 61
risk assessment ................ 32, 38, 42
robustness .................................. 16
safety case maintenance ............ 109
security ................................ 21, 62
single failure criterion .......... 62, 131
software .............. 27, 34, 37, 41, 69
software integrity level ................ 27
software reliability case ................ 9
software safety case ..................... 9
stakeholders ........................ 39, 117
support tools ............................... 31
tacit knowledge ........... 52, 116, 134
teams ....................................... 117
timing errors ............................... 20
tolerable .................................... 61
tools .................................... 89, 90
traceability ........................... 28, 54
training ................................ 23, 73
usability ..................................... 63
validation ............................. 40, 56
verification ........................... 40, 56
voting ........................................ 71
watchdogs ............................ 70, 72
