
Software Reliability Engineering:

Techniques and Tools


CS130
Winter, 2002

Source Material

Software Reliability and Risk Management:


Techniques and Tools, Allen Nikora and Michael
Lyu, tutorial presented at the 1999 International
Symposium on Software Reliability Engineering
Allen Nikora, John Munson, Determining Fault
Insertion Rates For Evolving Software Systems,
proceedings of the International Symposium on
Software Reliability Engineering, Paderborn,
Germany, November, 1998
2

Agenda
Part I:
Part II:
Part III:
Part IV:
Part V:
Part VI:
Part VII:

Introduction
Survey of Software Reliability Models
Quantitative Criteria for Model Selection
Input Data Requirements and Data Collection Mechanisms
Early Prediction of Software Reliability
Current Work in Estimating Fault Content
Software Reliability Tools

Part I: Introduction

Reliability Measurement Goal


Definitions
Reliability Theory

Reliability Measurement Goal

Reliability measurement is a set of mathematical


techniques that can be used to estimate and
predict the reliability behavior of software during
its development and operation.
The primary goal of software reliability modeling is
to answer the following question:
Given a system, what is the probability that it will
fail in a given time interval, or, what is the
expected duration between successive failures?
5

Basic Definitions

Software Reliability R(t): The probability of


failure-free operation of a computer program for
a specified time under a specified environment.
Failure: The departure of program operation
from user requirements.
Fault: A defect in a program that causes failure.

Basic Definitions (contd)

Failure Intensity (rate) f(t): The expected number
of failures experienced in a given time interval.
Mean-Time-To-Failure (MTTF): Expected value of
a failure interval.
Expected total failures μ(t): The number of
failures expected in a time period t.
7

Reliability Theory
Let "T" be a random variable representing the
failure time or lifetime of a physical system.
For this system, the probability that it will fail by
time "t" is:

F(t) = P[T ≤ t] = ∫₀ᵗ f(x) dx

The probability of the system surviving until time "t" is:

R(t) = P[T > t] = 1 − F(t) = ∫ₜ^∞ f(x) dx

Reliability Theory (contd)


Failure rate - the probability that a failure will
occur in the interval [t1, t2], given that a failure
has not occurred before time t1. This is written
as:

P[t1 ≤ T ≤ t2 | T > t1] / (t2 − t1) = P[t1 ≤ T ≤ t2] / ( (t2 − t1) P[T > t1] )
                                    = ( F(t2) − F(t1) ) / ( (t2 − t1) R(t1) )
9

Reliability Theory (contd)


Hazard rate - limit of the failure rate as the length of the
interval approaches zero. This is written as:

z(t) = lim_{Δt→0} ( F(t + Δt) − F(t) ) / ( Δt R(t) ) = f(t) / R(t)
This is the instantaneous failure rate at time t, given that the
system survived until time t. The terms hazard rate and
failure rate are often used interchangeably.

10

Reliability Theory (contd)


A reliability objective expressed in terms of one
reliability measure can be easily converted into
another measure as follows (assuming an
average failure rate, λ, is measured):

MTTF = 1 / λ        λ = 1 / MTTF        R(t) = e^(−λt)
11
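
A minimal C sketch (not from the original tutorial; the rate and mission time are assumed values) illustrating these conversions, assuming exponentially distributed failure times:

#include <stdio.h>
#include <math.h>

int main(void)
{
    double lambda = 0.02;             /* assumed average failure rate, failures per hour */
    double mttf   = 1.0 / lambda;     /* MTTF = 1 / lambda */
    double t      = 10.0;             /* assumed mission time, hours */
    double r      = exp(-lambda * t); /* R(t) = exp(-lambda * t) */

    printf("lambda = %g /hr, MTTF = %g hr, R(%g hr) = %g\n", lambda, mttf, t, r);
    return 0;
}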

Reliability Theory (cont'd)

[Figure: bathtub curve of the hazard rate z(t) versus time t.
Region 1: debugging / infant mortality.
Region 2: useful life period.
Region 3: component wear-out or fatigue.]

12

Part II: Survey of Software Reliability Models

Software Reliability Estimation Models:

Exponential NHPP Models


Jelinski-Moranda/Shooman Model
Musa-Okumoto Model
Geometric Model

Software Reliability Modeling and


Acceptance Testing

13

Jelinski-Moranda/Shooman Models

Jelinski-Moranda model was developed by Jelinski and


Moranda of McDonnell Douglas Astronautics Company
for use on Navy NTDS software and a number of
modules of the Apollo program. The Jelinski-Moranda
model was published in 1971.

Shooman's model, discovered independently of Jelinski


and Moranda's work, was also published in 1971.
Shooman's model is identical to the JM model.

14

Jelinski-Moranda/Shooman (cont'd)
Assumptions:
1. The number of errors in the code is fixed.
2. No new errors are introduced into the code through the
correction process.
3. The number of machine instructions is essentially constant.
4. Detections of errors are independent.
5. The software is operated in a similar manner as the
anticipated operational usage.
6. The error detection rate is proportional to the number of
errors remaining in the code.

15

Jelinski-Moranda/Shooman (cont'd)
Let τ represent the amount of debugging time spent on the system since
the start of the test phase.
From assumption 6, we have:

z(τ) = K ε_r(τ)

where K is a proportionality constant and ε_r(τ) is the error rate (the number of
remaining errors normalized with respect to the number of instructions). With

E_T = number of errors initially in the program
I_T = number of machine instructions in the program
ε_c(τ) = cumulative number of errors fixed in the interval [0, τ], normalized by
the number of instructions

the remaining error rate is:

ε_r(τ) = E_T / I_T − ε_c(τ)

16

Jelinski-Moranda/Shooman (cont'd)
ET and IT are constant (assumptions 1 and 3).
No new errors are introduced by the correction process
(assumption 2).
As τ → ∞, ε_c(τ) → E_T / I_T, so ε_r(τ) → 0.
The hazard rate becomes:

z(τ) = K [ E_T / I_T − ε_c(τ) ]

[Figure: z(τ) plotted against ε_c(τ); it falls linearly from K(E_T/I_T) at ε_c = 0
to zero at ε_c(τ) = E_T/I_T.]
17

Jelinski-Moranda/Shooman (cont'd)
The reliability function becomes:

R(t) = exp{ −K [ E_T / I_T − ε_c ] t }

The expression for MTTF is:

MTTF = 1 / ( K [ E_T / I_T − ε_c ] )

18
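
A minimal C sketch of the Jelinski-Moranda/Shooman quantities above (all parameter values are assumed for illustration):

#include <stdio.h>
#include <math.h>

int main(void)
{
    double K     = 3.0e-4;                     /* assumed proportionality constant */
    double E_T   = 120.0;                      /* assumed initial number of errors */
    double I_T   = 25000.0;                    /* assumed number of machine instructions */
    double eps_c = 80.0 / I_T;                 /* errors corrected so far, normalized */
    double z     = K * (E_T / I_T - eps_c);    /* hazard rate z(tau) */
    double mttf  = 1.0 / z;                    /* MTTF = 1 / (K (E_T/I_T - eps_c)) */
    double t     = 500.0;                      /* assumed operating time */

    printf("z = %g, MTTF = %g, R(t) = %g\n", z, mttf, exp(-z * t));
    return 0;
}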

Geometric Model

Proposed by Moranda in 1975 as a variation of the


Jelinski-Moranda model.
Unlike models previously discussed, it does not assume
that the number of errors in the program is finite, nor does
it assume that errors are equally likely to occur.
This model assumes that errors become increasingly
difficult to detect as debugging progresses, and that the
program is never completely error free.
19

Geometric Model (cont'd)


Assumptions:
1. There are an infinite number of total errors.
2. All errors do not have the same chance of detection.
3. The detections of errors are independent.
4. The software is operated in a similar manner as the
   anticipated operational usage.
5. The error detection rate forms a geometric progression
   and is constant between error occurrences.

20

Geometric Model (cont'd)


The above assumptions result in the
following hazard rate:

z(t) = D φ^(i−1)

for any time "t" between the (i − 1)st and
the i'th error, where 0 < φ < 1.
The initial value of z(t) = D
21

Geometric Model (cont'd)


Hazard Rate Graph

[Figure: the hazard rate starts at D and steps down to Dφ, Dφ², ... as successive
errors are discovered; the drop after the first error is D(1 − φ), and the total
drop after the second is D(1 − φ²).]

The ratio of successive changes in the hazard rate is constant:

[change in z(t) on discovery of the i'th error] / [change in z(t) on discovery of the (i−1)st error]
= [ D φ^(i−1) − D φ^i ] / [ D φ^(i−2) − D φ^(i−1) ] = φ

22
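
A small C sketch of the Geometric model hazard rate (D and phi are assumed values):

#include <stdio.h>
#include <math.h>

int main(void)
{
    double D   = 0.05;   /* assumed initial hazard rate */
    double phi = 0.9;    /* assumed ratio of the geometric progression, 0 < phi < 1 */
    int    i;

    for (i = 1; i <= 5; i++) {
        /* hazard rate between the (i-1)st and i'th error: z = D * phi^(i-1) */
        printf("after %d errors found: z = %g\n", i - 1, D * pow(phi, i - 1));
    }
    return 0;
}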

Musa-Okumoto Model

The Musa-Okumoto model assumes that the failure intensity function
decreases exponentially with the number of failures observed:

λ(t) = λ₀ e^(−θ μ(t))

Since λ(t) = dμ(t)/dt, we have the following differential equation:

dμ(t)/dt = λ₀ e^(−θ μ(t))

or

e^(θ μ(t)) dμ(t)/dt = λ₀

23

Musa-Okumoto Model (contd)


Note that

d[ e^(θ μ(t)) ]/dt = θ e^(θ μ(t)) dμ(t)/dt

We then obtain

d[ e^(θ μ(t)) ]/dt = θ λ₀

24

Musa-Okumoto Model (contd)


Integrating this last equation yields:

e^(θ μ(t)) = θ λ₀ t + C

Since μ(0) = 0, C = 1, and the mean value function μ(t) is:

μ(t) = ln(θ λ₀ t + 1) / θ

25
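
A small C sketch of the Musa-Okumoto mean value function, with the failure intensity obtained by differentiating μ(t) (lambda0 and theta are assumed values):

#include <stdio.h>
#include <math.h>

int main(void)
{
    double lambda0 = 10.0;   /* assumed initial failure intensity */
    double theta   = 0.02;   /* assumed intensity decay parameter */
    double t;

    for (t = 0.0; t <= 100.0; t += 25.0) {
        double mu     = log(lambda0 * theta * t + 1.0) / theta;   /* mu(t)      */
        double lambda = lambda0 / (lambda0 * theta * t + 1.0);    /* d mu / dt  */
        printf("t = %5.1f  mu(t) = %7.2f  lambda(t) = %6.3f\n", t, mu, lambda);
    }
    return 0;
}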

Software Reliability Modeling and


Acceptance Testing
Given a piece of software advertised as having a failure
rate λ, you can see if it meets that failure rate to a
specific level of confidence.
α is the risk (probability) of falsely saying that the
software does not meet the failure rate goal.
β is the risk of saying that the goal is met when it is
not.
The discrimination ratio, γ, is the factor you specify
that identifies acceptable departure from the goal.
For instance, if γ = 2, the acceptable failure rate lies
between λ/2 and 2λ.
26

Software Reliability Modeling and


Acceptance Testing (contd)

[Chart: failure number plotted against normalized failure time
(time to failure times the failure intensity objective), divided into
Reject, Continue, and Accept regions.]

27

Software Reliability Modeling and


Acceptance Testing (contd)
We can now draw a chart as shown in the previous slide. Define
intermediate quantities A and B as follows:

A = ln[ (1 − β) / α ]        B = ln[ β / (1 − α) ]

The boundary between the reject and continue regions is given by:

t = (A − n ln γ) / (1 − γ)

where n is the number of failures observed and t is the normalized failure time.
The boundary between the continue and accept regions of the chart is given by:

t = (B − n ln γ) / (1 − γ)

28
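
A C sketch of the acceptance decision, following the standard sequential-test form of the boundaries above (alpha, beta, gamma, and the failure data are assumed values; t is failure time normalized by the failure intensity objective):

#include <stdio.h>
#include <math.h>

int main(void)
{
    double alpha = 0.10, beta = 0.10, gamma = 2.0;  /* assumed risks and discrimination ratio */
    double A = log((1.0 - beta) / alpha);           /* A = ln[(1 - beta)/alpha] */
    double B = log(beta / (1.0 - alpha));           /* B = ln[beta/(1 - alpha)] */

    int    n = 5;     /* assumed number of failures observed so far */
    double t = 9.0;   /* assumed normalized time of the nth failure */

    double reject_boundary = (A - n * log(gamma)) / (1.0 - gamma);
    double accept_boundary = (B - n * log(gamma)) / (1.0 - gamma);

    if (t <= reject_boundary)
        printf("reject\n");
    else if (t >= accept_boundary)
        printf("accept\n");
    else
        printf("continue testing\n");
    return 0;
}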

Part III: Criteria for Model Selection

Background
Non-Quantitative criteria
Quantitative criteria

29

Criteria for Model Selection - Background

When software reliability models first appeared, it was felt


that a process of refinement would produce definitive
models that would apply to all development and test
situations.
Current situation:
Dozens of models have been published in the literature
Studies over the past 10 years indicate that the
accuracy of the models is variable
Analysis of the particular context in which reliability
measurement is to take place so as to decide a priori
which model to use does not seem possible.
30

Criteria for Model Selection (contd)


Non-Quantitative Criteria
Model Validity
Ease of measuring parameters
Quality of assumptions
Applicability
Simplicity
Insensitivity to noise
31

Criteria for Model Selection (contd)


Quantitative Criteria for Post-Model Application
Self-consistency
Goodness-of-Fit
Relative Accuracy (Prequential Likelihood
Ratio)
Bias (U-Plot)
Bias Trend (Y-Plot)
32

Criteria for Model Selection (contd)


Self-consistency - Analysis of a model's predictive quality can help the user decide which
model(s) to use.
The simplest question an SRM user can ask is "How reliable is the software at this
moment?"
The time to the next failure, T_i, is usually predicted using the observed times to
failure t₁, t₂, ..., t_{i−1}.
In general, predictions of T_i can be made using the observed times to failure
t₁, t₂, ..., t_{i−K}, K ≥ 0.
The results of predictions made for different values of K can then be compared. If
a model produces self-consistent results for differing values of K, this indicates
that its use is appropriate for the data on which the particular predictions were
made.
HOWEVER, THIS PROVIDES NO GUARANTEE THAT THE PREDICTIONS ARE
CLOSE TO THE TRUTH.

33

Criteria for Model Selection (contd)


Goodness-of-fit - Kolmogorov-Smirnov Test
Uses the absolute vertical distance between
two CDFs to measure goodness of fit.
Depends on the fact that the statistic

√n D_n = √n sup_x | F_n(x) − F₀(x) |

where F₀ is a known, continuous CDF and F_n is
the sample CDF, is distribution-free.

34

Criteria for Model Selection (contd)


Goodness-of-fit (contd) - Chi-Square Test

More suited to determining GOF of failure-counts data than of interfailure
times.

Value given by:

χ² = Σ_{j=1}^{k+1} (N_j − n p_j)² / (n p_j)

where:
n = number of independent repetitions of an experiment in which the
outcomes are decomposed into k+1 mutually exclusive sets A₁, A₂, ..., A_{k+1}
N_j = number of outcomes in the jth set
p_j = P[A_j]

35
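
A C sketch of the chi-square statistic above, applied to failure counts (the counts and cell probabilities are assumed values):

#include <stdio.h>

int main(void)
{
    double observed[5] = { 12, 9, 7, 4, 3 };                /* assumed failure counts N_j      */
    double p[5]        = { 0.33, 0.26, 0.19, 0.13, 0.09 };  /* assumed cell probabilities p_j  */
    double n = 0.0, chi2 = 0.0;
    int    j;

    for (j = 0; j < 5; j++) n += observed[j];               /* total number of observations    */
    for (j = 0; j < 5; j++) {
        double expected = n * p[j];                         /* n * p_j                         */
        double d = observed[j] - expected;
        chi2 += d * d / expected;                           /* sum of (N_j - n p_j)^2/(n p_j)  */
    }
    printf("chi-square = %g\n", chi2);
    return 0;
}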

Criteria for Model Selection (contd)


Prequential Likelihood Ratio

The pdf f_i(t) = dF_i(t)/dt for T_i is based on the observations t₁, t₂, ..., t_{i−1}.

For one-step-ahead predictions of T_{j+1}, T_{j+2}, ..., T_{j+n}, the prequential likelihood is:

PL_n = ∏_{i=j+1}^{j+n} f̂_i(t_i)

Two prediction systems, A and B, can be evaluated by computing the prequential likelihood
ratio:

PLR_n = [ ∏_{i=j+1}^{j+n} f̂_i^A(t_i) ] / [ ∏_{i=j+1}^{j+n} f̂_i^B(t_i) ]

If PLR_n approaches infinity as n approaches
infinity, B is discarded in favor of A.

36

Prequential Likelihood Example


[Figure: two sequences of predictive pdfs f̂_i, f̂_{i+1}, f̂_{i+2}, ... compared with the true pdf.
Left panel: high bias, low noise - the predictions are tightly grouped but consistently offset from the truth.
Right panel: low bias, high noise - the predictions are centered on the truth but widely scattered.]

37

Criteria for Model Selection (contd)


Prequential Likelihood Ratio (cont'd)
When predictions have been made for T_{j+1}, T_{j+2}, ..., T_{j+n}, the PLR
is given by:

PLR_n = p(t_{j+n}, ..., t_{j+1} | t_j, ..., t₁, A) / p(t_{j+n}, ..., t_{j+1} | t_j, ..., t₁, B)

Using Bayes' Rule, the PLR is rewritten as:

PLR_n = [ p(A | t_{j+n}, ..., t₁) p(t_{j+n}, ..., t_{j+1} | t_j, ..., t₁) / p(A | t_j, ..., t₁) ] /
        [ p(B | t_{j+n}, ..., t₁) p(t_{j+n}, ..., t_{j+1} | t_j, ..., t₁) / p(B | t_j, ..., t₁) ]
38

Criteria for Model Selection (contd)


Prequential Likelihood Ratio (contd)
This equals:

PLR_n = [ p(A | t_{j+n}, ..., t₁) / p(B | t_{j+n}, ..., t₁) ] × [ p(B | t₁, ..., t_j) / p(A | t₁, ..., t_j) ]

If the initial conditions were based only on prior belief, the second
factor of the final equation is the prior odds ratio. If the user is
indifferent between models A and B, this ratio has a value of 1.

39

Criteria for Model Selection


(contd)
Prequential Likelihood Ratio (contd):
The final equation is then written as:

PLR_n = w_A / (1 − w_A)

This is the posterior odds ratio, where w_A is the posterior belief that
A is true after making predictions with both A and B and comparing
them with actual behavior.

40

Criteria for Model Selection (contd)


The u-plot can be used to assess the predictive quality of a model.

Given a predictor F̂_i(t) that estimates the probability that the time to
the next failure is less than t, consider the sequence

u_i = F̂_i(t_i)

where each u_i is a probability integral transform of the observed t_i
using the previously calculated predictor F̂_i, based upon t₁, t₂, ..., t_{i−1}.
If each F̂_i were identical to the true, but hidden, F_i, then the u_i would
be realizations of independent random variables with a uniform
distribution in [0,1].
The problem then reduces to seeing how closely the sequence
resembles a random sample from [0,1].

41

U-Plots for JM and LV Models

[Figure: u-plots for the JM and LV models drawn against the line of unit slope;
both axes run from 0 to 1.]

42

Criteria for Model Selection (contd)


The y-plot:

Temporal ordering is not shown in a u-plot. The y-plot addresses this
deficiency.

To generate a y-plot, the following steps are taken (a small numeric sketch
follows this slide):

Compute the sequence of u_i.
For each u_i, compute x_i = −ln(1 − u_i).
Obtain y_i by computing:

y_i = ( Σ_{j=1}^{i} x_j ) / ( Σ_{j=1}^{m} x_j )   for i < m, m representing the number of observations made

If the u_i really do form a sequence of independent random variables
uniformly distributed in [0,1], the slope of the plotted y_i will be constant.

43
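
The sketch referenced above - a small C program that applies the y-plot transformation to a short sequence of assumed u_i values:

#include <stdio.h>
#include <math.h>

int main(void)
{
    double u[6] = { 0.21, 0.45, 0.38, 0.72, 0.55, 0.63 };   /* assumed u_i = F_i(t_i) */
    int    m = 6, i;
    double x[6], total = 0.0, running = 0.0;

    for (i = 0; i < m; i++) {
        x[i]   = -log(1.0 - u[i]);     /* x_i = -ln(1 - u_i) */
        total += x[i];
    }
    for (i = 0; i < m; i++) {
        running += x[i];
        printf("y_%d = %g\n", i + 1, running / total);   /* y_i = sum(x_1..x_i)/sum(x_1..x_m) */
    }
    return 0;
}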

Y-Plots for JM and LV Models

[Figure: y-plots for the JM and LV models drawn against the line of unit slope;
both axes run from 0 to 1.]

44

Criteria for Model Selection (contd)


Quantitative Criteria Prior to Model Application
Arithmetical Mean of Interfailure Times
Laplace Test

45

Arithmetical Mean of Interfailure Times

Calculate the arithmetical mean of the interfailure times as
follows:

τ(i) = (1/i) Σ_{j=1}^{i} θ_j

i = number of observed failures
θ_j = jth interfailure time

An increasing series of τ(i) suggests reliability growth.
A decreasing series of τ(i) suggests reliability decrease.

46

Laplace Test

The occurrence of failures is assumed to follow a non-homogeneous
Poisson process whose failure intensity is decreasing:

h(t) = e^(a + bt),   b < 0

The null hypothesis is that occurrences of failures follow a
homogeneous Poisson process (i.e., b = 0 above).
For interfailure times θ_j, the test statistic is computed by
(a small numeric sketch follows this slide):

u(i) = [ (1/(i−1)) Σ_{n=1}^{i−1} Σ_{j=1}^{n} θ_j − (1/2) Σ_{j=1}^{i} θ_j ] / [ Σ_{j=1}^{i} θ_j √( 1 / (12(i−1)) ) ]

47
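
The sketch referenced above - a C program computing the Laplace factor for a short sequence of assumed interfailure times:

#include <stdio.h>
#include <math.h>

int main(void)
{
    double theta[8] = { 10, 14, 9, 21, 25, 18, 30, 34 };   /* assumed interfailure times theta_j */
    int    i = 8, n;
    double T = 0.0, sum_partial = 0.0;

    for (n = 1; n <= i; n++) {
        T += theta[n - 1];             /* running total: T_n = theta_1 + ... + theta_n */
        if (n < i) sum_partial += T;   /* accumulate T_1 + T_2 + ... + T_(i-1)         */
    }
    double u = (sum_partial / (i - 1) - T / 2.0) /
               (T * sqrt(1.0 / (12.0 * (i - 1))));
    printf("Laplace factor u(%d) = %g\n", i, u);   /* negative value suggests reliability growth */
    return 0;
}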

Laplace Test (contd)

For interval data (n_i failures observed in interval i, i = 1, ..., k), the test
statistic is computed by:

u(k) = [ Σ_{i=1}^{k} (i−1) n_i − ((k−1)/2) Σ_{i=1}^{k} n_i ] / √( ((k²−1)/12) Σ_{i=1}^{k} n_i )

48

Laplace Test (contd)

Interpretation
Negative values of the Laplace factor indicate decreasing failure intensity.
Positive values suggest an increasing failure intensity.
Values varying between +2 and -2 indicate stable reliability.
Significance is that associated with the normal distribution; e.g.:
The null hypothesis H₀: HPP vs. H₁: decreasing failure intensity is
rejected at the 5% significance level for u(T) < -1.645
The null hypothesis H₀: HPP vs. H₁: increasing failure intensity is
rejected at the 5% significance level for u(T) > 1.645
The null hypothesis H₀: HPP vs. H₁: there is a trend is rejected at
the 5% significance level for |u(T)| > 1.96

49

Part IV: Input Data Requirements and


Data Collection Mechanisms

Model Inputs
Time Between Successive Failures
Failure Counts and Test Interval Lengths
Setting up a Data Collection Mechanism
Minimal Set of Required Data
Data Collection Mechanism Examples

50

Input Data Requirements and Data Collection


Mechanisms
Model Inputs - Time between Successive Failures

Most of the models discussed in Section II require the times


between successive failures as inputs.

Preferred units of time are expressed in CPU time (e.g., CPU


seconds between subsequent failures).
Allows computation of reliability independent of wall-clock
time.
Reliability computations in one environment can be easily
transformed into reliability estimates in another, provided
that the operational profiles in both environments are the
same and that the instruction execution rates of the original
environment and the new environment can be related.

51

Input Data Requirements and Data Collection


Mechanisms (contd)
Model Inputs - Time between Successive Failures
(contd)
Advantage - CPU time between successive
failures tends to more accurately characterize the
failure history of a software system than calendar
time. Accurate CPU time between failures can
give greater resolution than other types of data.
Disadvantage - CPU time between successive
failures can often be more difficult to collect than
other types of failure history data.
52

Input Data Requirements and Data Collection


Mechanisms (contd)
Model Inputs (contd) - Failure Counts and Test Interval Lengths

Failure history can be collected in terms of test interval lengths and


the number of failures observed in each interval. Several of the
models described in Section II use this type of input.

The failure reporting systems of many organizations will more easily


support collection of this type of data rather than times between
successive failures. In particular, the use of automated test systems
can easily establish the length of each test interval. Analysis of the
test run will then provide the number of failures for that interval.

Disadvantage - failure counts data does not provide the resolution


that accurately collected times between failures provide.

53

Input Data Requirements and Data


Collection Mechanisms (contd)
Setting up a Data Collection Mechanism
1. Establish clear, consistent objectives.
2. Develop a plan for the data collection process. Involve all individuals concerned
(e.g. software designers, testers, programmers, managers, SQA and SCM staff).
Address the following issues:
a. Frequency of data collection.
b. Data collection responsibilities
c. Data formats
d. Processing and storage of data
e. Assuring integrity of data/adherence to objectives
f. Use of existing mechanisms to collect data

54

Input Data Requirements and Data


Collection Mechanisms (contd)
Setting up a Data Collection Mechanism (contd)
3. Identify and evaluate tools to support data collection effort.
4. Train all parties in use of selected tools.
5. Perform a trial run of the plan prior to finalizing it.
6. Monitor the data collection process on a regular basis (e.g. weekly
intervals) to assure that objectives are being met, determine current
reliability of software, and identify problems in collecting/analyzing the data.
7. Evaluate the data on a regular basis. Assess software reliability as testing
proceeds, not only at scheduled release time.
8. Provide feedback to all parties during the data collection/analysis effort.

55

Input Data Requirements and Data


Collection Mechanisms (contd)
Minimal Set of Required Data - to measure software reliability during
test, the following minimal set of data should be collected by a
development effort:
Time between successive failures OR test interval lengths/number
of failures per test interval.
Functional area tested during each interval.
Date on which functionality was added to software under test;
identifier for functionality added.
Number of testers vs. time.
Dates on which testing environment changed, and nature of
changes.
Dates on which test method changed.

56

Part VI: Early Prediction of Software


Reliability

Background
RADC Study
Phase-Based Model

57

Part VI: Background

Modeling techniques discussed in preceding sections can be applied only


during test phases.
These techniques do not take into account structural properties of the
system being developed or characteristics of the development
environment.
Current techniques can measure software reliability, but model outputs
cannot be easily used to choose development methods or structural
characteristics that will increase reliability.
Measuring software reliability prior to test is an open area. Work in this
area includes:
RADC study of 59 projects
Phase-Based model
Analysis of complexity

58

Part VI: RADC Study

Study of 59 software development efforts, sponsored by RADC in mid


1980s
Purpose - develop a method for predicting software reliability in the life
cycle phases prior to test. Acceptable model forms were:
measures leading directly to reliability/failure rate predictions
predictions that could be translated to failure rates (e.g., error density)
Advantages of error density as a software reliability figure of merit,
according to participating investigators:
It appears to be a fairly invariant number.
It can be obtained from commonly available data.
It is not directly affected by variables in the environment
Conversion among error density metrics is fairly straightforward.

59

Part VI: RADC Study (contd)

Advantages of error density as a software reliability figure of merit (contd)


Possible to include faults by inspection with those found during testing
and operations, since the time-dependent elements of the latter do not
need to be accounted for.
Major disadvantages cited by the investigators are:
This metric cannot be combined with hardware reliability
metrics.
Does not relate to observations in the user environment. It is far
easier for users to observe the availability of their systems than their
fault density, and users tend to be far more concerned about how
frequently they can expect the system to go down.
No assurance that all of the faults have been found.

60

Part VI: RADC Study (contd)


Given these advantages and disadvantages, the
investigators decided to attempt prediction of error
density during the early phases of a development effort,
and develop a transformation function that could be
used to interpret the predicted error density as a failure
rate. The driving factor seemed to be that data available
early in life cycle could be much more easily used to
predict error densities rather than failure rates.

61

Part VI: RADC Study (contd)


Investigators postulated that the following measures representing
development environment and product characteristics could be used
as inputs to a model that would predict the error density, measured
in errors per line of code, at the start of the testing phase.

A -- Application Type (e.g. real-time control system, scientific


computation system, information management system)
D -- Development Environment (characterized by
development methodology and available tools). The types of
development environments considered are the organic,
semi-detached, and embedded modes, familiar from the
COCOMO cost model.

62

Part VI: RADC Study (contd)


Measures of development environment and product characteristics (contd):

Requirements and Design Representation Metrics


SA - Anomaly Management
ST - Traceability
SQ - Incorporation of Quality Review results into the software
Software Implementation Metrics
SL - Language Type (e.g. assembly, high-order language, fourth generation
language)
SS - Program Size
SM - Modularity
SU - Extent of Reuse
SX - Complexity
SR - Incorporation of Standards Review results into the software

63

Part VI: RADC Study (contd)

The initial error density at the start of test is given by:

δ₀ = A × D × SA × ST × SQ × SL × SS × SM × SU × SX × SR

The initial failure rate is:

λ₀ = F × K × W₀

F = linear execution frequency of the program
K = fault exposure ratio (1.4×10⁻⁷ < K < 10.6×10⁻⁷, with an average value
of 4.2×10⁻⁷)
W₀ = number of inherent faults = δ₀ × number of lines of source code

64

Part VI: RADC Study (contd)


Moreover, F = R/I, where
R is the average instruction rate
I is the number of object instructions in the program
I can be further rewritten as IS * QX, where

IS is the number of source instructions,

QX is the code expansion ratio (the ratio of machine instructions to source instructions, which has an average value of 4
according to this study).
Therefore, the initial failure rate can be expressed as:

K
R*
*

Q x

W
I

65
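
The sketch referenced above - a C program that turns an assumed predicted error density into an initial failure rate using the relations on this slide (all input values are assumed):

#include <stdio.h>

int main(void)
{
    double delta0 = 0.006;            /* assumed predicted error density, errors per source line */
    double lines  = 20000.0;          /* assumed source lines of code                            */
    double W0     = delta0 * lines;   /* predicted number of inherent faults                     */

    double R  = 2.0e7;                /* assumed average instruction rate, instructions/sec      */
    double K  = 4.2e-7;               /* fault exposure ratio (study average)                    */
    double IS = lines;                /* source instructions                                     */
    double QX = 4.0;                  /* code expansion ratio (study average)                    */

    double lambda0 = R * K * W0 / (IS * QX);   /* initial failure rate */
    printf("W0 = %g faults, lambda0 = %g failures/sec\n", W0, lambda0);
    return 0;
}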

Part VI: Phase-Based Model

Developed by John Gaffney, Jr. and Charles F. Davis of the Software


Productivity Consortium
Makes use of error statistics obtained during technical review of
requirements, design and the implementation to predict software reliability
during test and operations.
Can also use failure data during testing to estimate reliability.
Assumptions:
The development effort's current staffing level is directly related to the
number of errors discovered during a development phase.

The error discovery curve is monomodal.


Code size estimates are available during early phases of a development
effort.
Fagan inspections are used during all development phases.

66

Part VI: Phase-Based Model


The first two assumptions, plus Norden's observation that the Rayleigh
curve represents the "correct" way of applying staff to a development effort,
result in the following expression for the number of errors discovered
during a life cycle phase:

ΔV_t = E [ e^(−B(t−1)²) − e^(−Bt²) ]

E = Total Lifetime Error Rate, expressed in
Errors per Thousand Source Lines of Code (KSLOC)
t = Error Discovery Phase index

67

Part VI: Phase-Based Model


Note that t does not represent ordinary calendar time. Rather, t
represents a phase in the development process. The values of t and
the corresponding life cycle phases are:
t = 1 - Requirements Analysis
t = 2 - Software Design
t = 3 - Implementation
t = 4 - Unit Test
t = 5 - Software Integration Test
t = 6 - System Test
t = 7 - Acceptance Test

68

Part VI: Phase-Based Model


τ_p, the Defect Discovery Phase Constant, is the location of the peak in a
continuous fit to the failure data. This is the point at which 39% of the
errors have been discovered:

B = 1 / (2 τ_p²)

The cumulative form of the model is:

V_t = E ( 1 − e^(−Bt²) )

where V_t is the number of errors per KSLOC that have been discovered
through phase t. (A small numeric sketch follows this slide.)

69
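
The sketch referenced above - a C program evaluating the phase-based model's per-phase and cumulative error discovery (E and tau_p are assumed values):

#include <stdio.h>
#include <math.h>

int main(void)
{
    double E     = 20.0;   /* assumed total lifetime error rate, errors per KSLOC */
    double tau_p = 3.0;    /* assumed defect discovery phase constant             */
    double B     = 1.0 / (2.0 * tau_p * tau_p);
    int    t;

    for (t = 1; t <= 7; t++) {   /* phases 1 (requirements analysis) .. 7 (acceptance test) */
        double delta_v = E * (exp(-B * (t - 1.0) * (t - 1.0)) - exp(-B * t * t));
        double v       = E * (1.0 - exp(-B * t * t));
        printf("phase %d: discovered %5.2f, cumulative %5.2f errors/KSLOC\n", t, delta_v, v);
    }
    return 0;
}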

Part VI: Phase-Based Model


Typical Error Discovery Profile

[Figure: error density (errors per KLOC, axis 0-14) plotted against the
development phase index; the profile rises to a single peak and then declines.]

70

Part VI: Phase-Based Model


This model can also be used to estimate the number of latent errors in the
software. Recall that the number of errors per KSLOC removed through the n'th
phase is:

V_n = E ( 1 − e^(−Bn²) )

The number of errors remaining in the software at that point is:

E e^(−Bn²)   times the number of source statements (in KSLOC)

71

Part VII: Current Work in Estimating


Fault Content

Analysis of Complexity
Regression Tree Modeling

72

Analysis of Complexity

The need for measurement


The measurement process
Measuring software change
Faults and fault insertion
Fault insertion rates

73

Analysis of Complexity (contd)

Recent work has focused on relating measures of


software structure to fault content.
Problem - although different software metrics will
say different things about a software system, they
tend to be interrelated and can be highly
correlated with one another (e.g., McCabe
complexity and line count are highly correlated).

74

Analysis of Complexity (contd)

Relative complexity measure, developed by


Munson and Khoshgoftaar, attempts to handle the
problem of interdependence and multicollinearity
among software metrics.
Technique used is factor analysis, whose purpose is to decompose a set of correlated
measures into a set of eigenvalues and
eigenvectors.
75

Analysis of Complexity (contd)

The need for measurement


The measurement process
Measuring software change
Faults and fault insertion
Fault insertion rates

76

Analysis of Complexity - Measuring Software

Source Code (one module):

{
int i,j;
for (i=0; Array[i][0]!='\0'; i++)
{
j = strcmp(String, Array[i]);
if (j>0)
continue;
if (j<0)
return -1;
else
return i;
};
return -1;
}

CMA Metric Analysis of the module yields the Module Characteristics:

LOC = 14, Stmts = 12, N1 = 30, N2 = 23, eta1 = 15, eta2 = 12
77

Analysis of Complexity - Simplifying Measurements

[Figure: the modules of a program pass through metric analysis (CMA), producing a table
of raw metrics for each module; principal components analysis (PCA) then collapses the
raw metrics into a single relative complexity (RCM) value per module, e.g. 50, 40, 60, 45, 55.]
78

Analysis of Complexity - Relative Complexity

Relative complexity is a synthesized metric:

ρ_i^B = Σ_{j=1}^{m} λ_j^B d_{ij}^B

(the eigenvalue-weighted sum of module i's m orthogonal domain metric values,
relative to baseline build B)

Relative complexity is a fault surrogate:
Composed of metrics closely related to faults
Highly correlated with faults

79

Analysis of Complexity (contd)

The need for measurement


The measurement process
Measuring software change
Faults and fault insertion
Fault insertion rates

80

Analysis of Complexity (contd)


Software Evolution

We assume that we are developing (maintaining) a


program

We are really working with many programs over time

They are different programs in a very real sense

We must identify and measure each version of each


program module

81

Analysis of Complexity (contd)


Evolution of the STS Primary Avionics Software System (PASS)

[Figure: build-evolution tree for PASS, showing a long main sequence of builds
together with many branch builds; only the numeric build identifiers survive
from the original diagram.]

82

Analysis of Complexity (contd)

The Problem

[Figure: module composition of Build N (modules A, B, C, D, E, G, H, I, J, K) versus
Build N+1 (modules L, M, D, E, H, I, J, K): some modules carry over from build to build,
some are removed, and new ones are added.]

83

Analysis of Complexity (contd)


Managing fault counts during evolution

Some faults are inserted during branch builds


These fault counts must be removed when the branch is
pruned

Some faults are eliminated on branch builds


These faults must be removed from the main sequence build

Fault count should contain only those faults on the main


sequence to the current build

Faults attributed to modules not in the current build must be


removed from the current count

84

Analysis of Complexity (contd)


Baselining a software system
Software changes over software builds
Measurements, such as relative complexity, change
across builds

Initial build as a baseline

Relative complexity of each build

Measure change in fault surrogate from initial


baseline
85

Analysis of Complexity - Measurement Baseline

[Figure: measurement baseline, with two later measurement points marked Point A and Point B.]

86

Analysis of Complexity - Baseline Components

Vector of means and vector of standard deviations, used to standardize each
module's raw metrics against baseline build B:

z_{j,i}^B = ( w_{j,i}^B − x̄_j^B ) / s_j^B

Transformation matrix T^B, used to map the standardized metrics into the
orthogonal metric domains:

d^{B,i} = z^{B,i} T^B

87

Analysis of Complexity - Comparing Two Builds

[Figure: the source code of build i and build j is run through the measurement tools;
each build's metrics are standardized against the baseline, yielding baselined RCM values
for build i and build j; differencing the two sets of values produces the code deltas,
the code churn, and the RCM delta between the builds.]

88

Analysis of Complexity - Measuring Evolution

Different modules appear in different builds:
M_a^{i,j} = set of modules not in the latest build
M_b^{i,j} = set of modules not in the early build
M_c^{i,j} = set of common modules

Code delta (signed change in the fault surrogate over the common modules):

δ^{i,j} = Σ_{m ∈ M_c^{i,j}} ( ρ_m^{B,j} − ρ_m^{B,i} )

Code churn (total amount of change; the terms cannot cancel):

χ^{i,j} = Σ_{m ∈ M_c^{i,j}} | ρ_m^{B,j} − ρ_m^{B,i} |

Net code churn ∇^{i,j} additionally accounts for the relative complexity of the
modules that appear in only one of the two builds (the sets M_a^{i,j} and M_b^{i,j}).
(A small numeric sketch follows this slide.)

89
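
The sketch referenced above - a C program computing code delta and code churn over the common modules of two builds (the relative complexity values are assumed):

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* assumed baselined relative complexity of the common modules in builds i and j */
    double rho_i[5] = { 48.0, 52.0, 60.0, 45.0, 55.0 };
    double rho_j[5] = { 50.0, 49.0, 66.0, 45.0, 58.0 };
    double delta = 0.0, churn = 0.0;
    int    m;

    for (m = 0; m < 5; m++) {
        double d = rho_j[m] - rho_i[m];
        delta += d;          /* code delta: signed changes can cancel             */
        churn += fabs(d);    /* code churn: total amount of change, never cancels */
    }
    printf("code delta = %g, code churn = %g\n", delta, churn);
    return 0;
}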

Analysis of Complexity (contd)

The need for measurement


The measurement process
Measuring software change
Faults and fault insertion
Fault insertion rates

90

Analysis of Complexity - Fault Insertion

[Figure: from Build N to Build N+1, some existing faults persist, some faults are
removed, and new faults are added by the changes made for the new build.]

91

Analysis of Complexity - Identifying and Counting Faults

Unlike failures, faults are not directly observable.
Fault counts should be at the same level of granularity as the software
structure metrics.
Failure counts could be used as a surrogate for fault counts if:
Number of faults were related to number of failures
Distribution of number of faults per failure had low variance
The faults associated with a failure were confined to a
single procedure/function
Actual situation shown on next slide

92

Analysis of Complexity - Observed Distribution of Faults per Failure

[Figure: histogram of the number of defects per reported failure.]

Statistics (defects per reported failure):
N (valid) = 30
Mean = 10.57
Median = 7.50
Std. Deviation = 9.34
Percentiles: 25th = 3.75, 50th = 7.50, 75th = 13.25

93

Analysis of Complexity - Fault


Identification and Counting Rules

Taxonomy based on corrective actions taken in response to failure


reports
faults in variable usage
Definition and use of new variables
Redefinition of existing variables (e.g. changing type from float
to double)
Variable deletion
Assignment of a different value to a variable
faults involving constants
Definition and use of new constants
Constant definition deletion

94

Analysis of Complexity - Fault


Identification and Counting Rules (contd)

Control flow faults


Addition of new source code block
Deletion of erroneous conditionally-executed path(s) within a set of
conditionally executed statements
Addition of execution paths within a set of conditionally executed
statements
Redefinition of existing condition for execution (e.g. change if i < 9
to if i <= 9)
Removal of source code block
Incorrect order of execution
Addition of a procedure or function
Deletion of a procedure or function

95

Analysis of Complexity (contd)


Control flow fault examples - removing execution paths from
a code block

Counts as two faults, since two paths were removed

96

Analysis of Complexity (contd)


Control flow examples (contd) - addition of conditional execution
paths to code block

Counts as three faults, since three paths were added

97

Analysis of Complexity - Estimating Fault Content

The fault potential of a module i is directly proportional to its relative
complexity:

r_i^0 = ρ_i^0 / R^0

where R^0 is the total relative complexity of the baseline system.

From previous development projects, develop a proportionality constant, k,
for total faults:

F^0 = k R^0

Faults per module:   g_i^0 = r_i^0 F^0
98

Analysis of Complexity - Estimating Fault Insertion Rate

The proportionality constant, k, represents the rate of fault insertion.
For the jth build, the total faults inserted are:

F^j = k R^0 + k ∇^{0,j}

Estimate for the fault insertion between builds j and j+1
(a small numeric sketch follows this slide):

F^{j+1} − F^j = (k R^0 + k ∇^{0,j+1}) − (k R^0 + k ∇^{0,j}) = k ( ∇^{0,j+1} − ∇^{0,j} ) = k ∇^{j,j+1}
99
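
The sketch referenced above - estimating the faults inserted between two successive builds from the change in net code churn (k and the churn values are assumed):

#include <stdio.h>

int main(void)
{
    double k          = 0.4;     /* assumed proportionality constant from earlier projects      */
    double nabla_0_j  = 120.0;   /* assumed net code churn from the baseline build to build j   */
    double nabla_0_j1 = 145.0;   /* assumed net code churn from the baseline build to build j+1 */

    double inserted = k * (nabla_0_j1 - nabla_0_j);   /* faults inserted between builds j and j+1 */
    printf("estimated faults inserted: %g\n", inserted);
    return 0;
}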

Analysis of Complexity (contd)

The need for measurement


The measurement process
Measuring software change
Faults and fault insertion
Fault insertion rates

100

Analysis of Complexity - Relationships Between Change in Fault Count and Structural Change

Version ID      Number    Pearson's R   Pearson's R   Spearman   Spearman
                of Cases  (χ)           (δ)           (χ)        (δ)
Version 3       10         .376         -.323          .451       -.121
Version 4       12         .661          .700          .793        .562
Version 5       13         .891         -.276          .871       -.233
Versions 3-5    35         .568          .125          .631        .087

χ = code churn, δ = code delta
101

Analysis of Complexity - Regression Models

Model 1: faults = b0 + b1 χ_{j,j+1}
Model 2: faults = b0 + b1 χ_{j,j+1} + b2 δ_{j,j+1}
Model 3: faults = b1 χ_{j,j+1}
Model 4: faults = b1 χ_{j,j+1} + b2 δ_{j,j+1}

Model   b0      b1      t(b1)   b2      t(b2)   R       R²      Adj. R²   Residual SS   DF
1       1.507   0.373   3.965   -----   -----   0.568   0.323   0.302     140.812       33
2       1.312   0.460   4.972   0.172   2.652   0.667   0.445   0.412     115.441       32
3       -----   0.576   7.937   -----   -----   0.806   0.649   0.639     179.121       34
4       -----   0.647   9.172   0.201   2.849   0.846   0.719   0.702     143.753       33

faults = number of faults inserted between builds j and j+1
χ_{j,j+1} = measured code churn between builds j and j+1
δ_{j,j+1} = measured code delta between builds j and j+1
102

Analysis of Complexity - PRESS Scores - Linear vs. Nonlinear Models

Effect            Linear models, excluding       Nonlinear models, excluding
                  observations of churn = 0      observations of churn = 0
Churn only        186.090                         205.718
Churn and delta   159.875                         157.831

103

Analysis of Complexity - Selecting an Adequate Linear Model

The linear model gives the best R² and PRESS score.
Is the model based only on code churn an adequate predictor at the 5%
significance level?
The R²-adequate test shows that code churn alone is not an adequate
predictor at the 5% significance level.
104

Analysis of Complexity - Analysis of Predicted Residuals

[Figure: predicted residuals (about -4 to +8) plotted against the number of observed
defects (0 to 12) for versions 3-5, using the model Defects = b1*Churn + b2*Delta.]

Wilcoxon Test Results - Predictions for Excluded Observations

Sample pair: observed defects and the defects estimated for the excluded observations,
predicted by b1 χ_{j,j+1}

                  Number of      Mean     Sum of
                  observations   rank     ranks
Negative ranks    10             19.20    192.00
Positive ranks    25             17.52    438.00
Ties               0
Total             35

Test statistic Z = -2.015, asymptotic significance (2-tailed) = 0.044

105

Regression Tree Modeling


Objectives
Attractive way to encapsulate the knowledge of
experts and to aid decision making.
Uncovers structure in data
Can handle data with complicated and unexplained
irregularities
Can handle both numeric and categorical
variables in a single model.
106

Regression Tree Modeling (contd)


Algorithm
Determine set of predictor variables (software metrics)
and a response variable (number of faults).
Partition the predictor variable space such that each
partition or subset is homogeneous with respect to the
dependent variable.
Establish a decision rule based on the predictor variables
which will identify the programs with the same number of
faults.
Predict the value of the dependent variable which is the
average of all the observations in the partition.
107

Regression Tree Modeling (contd)


Algorithm (contd)
Minimize the deviance function given by:

D(y_i; μ_i) = Σ_i (y_i − μ_i)²

Establish stopping criteria based on:

Cardinality threshold - leaf node is smaller than a certain
absolute size.
Homogeneity threshold - deviance of the leaf node is less
than some small percentage of the deviance of the root
node; i.e., the leaf node is homogeneous enough.
108
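
A small C sketch of the deviance computation that the partitioning step tries to minimize (the fault counts of the modules in the node are assumed):

#include <stdio.h>

int main(void)
{
    double faults[6] = { 2, 0, 1, 7, 6, 8 };   /* assumed fault counts of the modules in a node */
    int    n = 6, i;
    double mean = 0.0, deviance = 0.0;

    for (i = 0; i < n; i++) mean += faults[i];
    mean /= n;                                 /* node prediction: average of the observations  */
    for (i = 0; i < n; i++)
        deviance += (faults[i] - mean) * (faults[i] - mean);   /* sum of squared deviations     */

    printf("node mean = %g, deviance = %g\n", mean, deviance);
    return 0;
}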

Regression Tree Modeling


(contd)
Application

Software for medical imaging system, consisting of 4500


modules amounting to 400,000 lines of code written in Pascal,
FORTRAN, assembly language, and PL/M.

Random sample of 390 modules from the ones written in Pascal


and FORTRAN, consisting of about 40,000 lines of code.

Software was developed over a period of five years, and had


been in use at several hundred sites.

Number of changes made to the executable code documented


by Change Reports (CRs) indicates software development effort.

109


Regression Tree Modeling (contd)

Application (contd)
Metrics
Total lines of code (TC)
Number of code lines (CL)
Number of characters (Cr)
Number of comments (Cm)
Comment characters (CC)
Code characters (Co)
Halstead's program length
Halstead's estimate of program length metric
Jensen's estimate of program length metric
Cyclomatic complexity metric
Bandwidth metric

111

Regression Tree Modeling


(contd)
Pruning

The tree grown using the stopping rules is too elaborate.

Pruning is equivalent to variable selection in linear
regression.

It determines a nested sequence of subtrees of the
given tree by recursively snipping off partitions with
minimal gains in deviance reduction.

The degree of pruning can be determined by using
cross-validation.

112


Regression Tree Modeling (contd)


Cross-Validation
Evaluate the predictive performance of the regression tree
and degree of pruning in the absence of a separate
validation set.
Data are divided into two mutually exclusive sets, viz.,
learning sample and test sample.
Learning sample is used to grow the tree, while the test
sample is used to evaluate the tree sequence.
Deviance - measure to assess the performance of the
prediction rule in predicting the number of errors for the
test sample of different tree sizes.
114

Regression Tree Modeling (contd)


Performance Analysis

Two types of errors:


Predict more faults than the actual number - Type I misclassification.
Predict fewer faults than the actual number - Type II error.

Type II error is more serious.

Type II error in the case of tree modeling is 8.7%, and in the case of fault density
it is 13.1%.

The tree modeling approach is significantly better than the fault density approach.

Can also be used to classify modules into fault-prone and non-fault-prone
categories.

Decision rule - classifies the module as fault-prone if the predicted
number of faults is greater than a certain cut-off value a.

The choice of a determines the misclassification rate.

115

Part VIII: Software Reliability Tools

SRMP
SMERFS
CASRE

116

Where Do They Come From?

Software Reliability Modeling Program (SRMP)


Bev Littlewood of City University, London
Statistical Modeling and Estimation of Reliability
Functions for Software (SMERFS)
William Farr of Naval Surface Warfare Center
Computer-Aided Software Reliability Estimation Tool
(CASRE)
Allen Nikora, JPL & Michael Lyu, Chinese
University of Hong Kong

117

SRMP Main Features

Multiple Models (9)


Model Application Scheme: Multiple Iterations
Data Format: Time-Between-Failures Data Only
Parameter Estimation: Maximum Likelihood
Multiple Evaluation Criteria - Prequential
Likelihood, Bias, Bias Trend, Model Noise
Simple U-Plots and Y-Plots
118

SMERFS Main Features

Multiple Models (12)


Model Application Scheme: Single Execution
Data Format: Failure-Counts and Time-Between-Failures
On-line Model Description Manual
Two parameter Estimation Methods
Least Square Method
Maximum Likelihood Method
Goodness-of-fit Criteria: Chi-Square Test, KS Test
Model Applicability - Prequential Likelihood, Bias, Bias Trend,
Model Noise
Simple Plots

119

The SMERFS Tool Main Menu

Data Input
Data Edit
Data Transformation
Data Statistics

Plots of the Raw Data


Model Applicability Analysis
Executions of the Models
Analyses of Model Fit
Stop Execution of SMERFS

120

CASRE Main Features

Multiple Models (12)


Model Application Scheme: Multiple Iterations
Goodness-of-Fit Criteria - Chi-Square Test, KS Test
Multiple Evaluation Criteria - Prequential Likelihood, Bias,
Bias Trend, Model Noise
Conversions between Failure-Counts Data and Time-Between-Failures Data
Menu-Driven, High-Resolution Graphical User Interface
Capability to Make Linear Combination Models
121

CASRE High-Level Architecture

122

Further Reading

A. A. Abdel-Ghaly, P. Y. Chan, and B. Littlewood; "Evaluation of Competing Software


Reliability Predictions," IEEE Transactions on Software Engineering; vol. SE-12, pp.
950-967; Sep. 1986.
T. Bowen, "Software Quality Measurement for Distributed Systems", RADC TR-83-175.
W. K. Erlich, A. Iannino, B. S. Prasanna, J. P. Stampfel, and J. R. Wu, "How Faults
Cause Software Failures: Implications for Software Reliability Engineering", published
in proceedings of the International Symposium on Software Reliability Engineering, pp
233-241, May 17-18, 1991, Austin, TX
M. E. Fagan, "Advances in Software Inspections", IEEE Transactions on Software
Engineering, vol SE-12, no 7, July, 1986, pp 744-751
M. E. Fagan, "Design and Code Inspections to Reduce Errors in Program
Development," IBM Systems Journal, Volume 15, Number 3, pp 182-211, 1976
W. H. Farr, O. D. Smith, and C. L. Schimmelpfenneg, "A PC Tool for Software
Reliability Measurement," published in the 1988 Proceedings of the Institute of
Environmental Sciences, King of Prussia, PA

123

Further Reading (contd)

W. H. Farr, O. D. Smith, "Statistical Modeling and Estimation of Reliability Functions for


Software (SMERFS) User's Guide," Naval Weapons Surface Center, December 1988
(approved for unlimited public distribution by NSWC)
J. E. Gaffney, Jr. and C. F. Davis, "An Approach to Estimating Software Errors and
Availability," SPC-TR-88-007, version 1.0, March, 1988, proceedings of Eleventh
Minnowbrook Workshop on Software Reliability, July 26-29, 1988, Blue Mountain Lake,
NY
J. E. Gaffney, Jr. and J. Pietrolewicz, "An Automated Model for Software Early Error
Prediction (SWEEP)," Proceedings of Thirteenth Minnow-brook Workshop on Software
Reliability, July 24-27, 1990, Blue Mountain Lake, NY
A. L. Goel, S. N. Sahoo, "Formal Specifications and Reliability: An Experimental Study",
published in proceedings of the International Symposium on Software Reliability
Engineering, pp 139-142, May 17-18, 1991, Austin, TX
A. Grnarov, J. Arlat, A. Avizienis, "On the Performance of Software Fault-Tolerance
Strategies", published in the proceedings of the Tenth International Symposium on Fault
Tolerant Computing (FTCS-10), Kyoto, Japan, October, 1980, pp 251-253

124

Further Reading (contd)

K. Kanoun, M. Bastos Martini, J. Moreira De Souza, A Method for Software Reliability


Analysis and Prediction - Application to the TROPICO-R Switching System, IEEE
Transactions on Software Engineering, April 1991, pp, 334-344
J. C. Kelly, J. S. Sherif, J. Hops, "An Analysis of Defect Densities Found During
Software Inspections", Journal of Systems Software, vol 17, pp 111-117, 1992
T. M. Khoshgoftaar and J. C. Munson, "A Measure of Software System Complexity
and its Relationship to Faults," proceedings of 1992 International Simulation
Technology Conference and 1992 Workshop on Neural Networks (SIMTEC'92, sponsored by the Society for Computer Simulation), pp. 267-272, November 4-6,
1992, Clear Lake, TX
M. Lu, S. Brocklehurst, and B. Littlewood, "Combination of Predictions Obtained from
Different Software Reliability Growth Models," proceedings of the IEEE 10th Annual
Software Reliability Symposium, pp 24-33, June 25-26, 1992, Denver, CO
M. Lyu, ed. Handbook of Software Reliability Engineering, McGraw-Hill and IEEE
Computer Society Press, 1996, ISBN 0-07-0349400-8

125

Further Reading (contd)

M. Lyu, "Measuring Reliability of Embedded Software: An Empirical Study with JPL Project Data,"
published in the Proceedings of the International Conference on Probabilistic Safety Assessment
and Management; February 4-6, 1991, Los Angeles, CA.
M. Lyu and A. Nikora, "A Heuristic Approach for Software Reliability Prediction: The Equally-Weighted Linear Combination Model," published in the proceedings of the IEEE International
Symposium on Software Reliability Engineering, May 17-18, 1991, Austin, TX M. Lyu and A. Nikora,
"Applying Reliability Models More Effectively", IEEE Software, vol. 9, no. 4, pp. 43-52, July, 1992
M. Lyu and A. Nikora, "Software Reliability Measurements Through Com-bination Models:
Approaches, Results, and a CASE Tool," proceedings of the 15th Annual International Computer
Software and Applications Conference (COMPSAC'91), September 11-13, 1991, Tokyo, Japan
J. McCall, W. Randall, S. Fenwick, C. Bowen, P. Yates, N. McKelvey, M. Hecht, H. Hecht, R. Senn,
J. Morris, R. Vienneau, "Methodology for Software Reliability Prediction and Assessment," Rome Air
Development Center (RADC) Technical Report RADC-TR-87-171. volumes 1 and 2, 1987
J. Munson and T. Khoshgoftaar, "The Use of Software Metrics in Reliability Models," presented at
the initial meeting of the IEEE Subcommittee on Software Reliability Engineering, April 12-13, 1990,
Washington, DC

126

Further Reading (contd)

J. C. Munson, "Software Measurement: Problems and Practice," Annals of Software Engineering, J.


C. Baltzer AG, Amsterdam 1995.
J. C. Munson, Software Faults, Software Failures, and Software Reliability Modeling, Information
and Software Technology, December, 1996.
J. C. Munson and T. M. Khoshgoftaar Regression Modeling of Software Quality: An Empirical
Investigation, Journal of Information and Software Technology, 32, 1990, pp. 105-114.
J. Munson, A. Nikora, Estimating Rates of Fault Insertion and Test Effectiveness in Software
Systems, invited paper, published in Proceedings of the Fourth ISSAT International Conference on
Quality and Reliability in Design, Seattle, WA, August 12-14, 1998
John D. Musa., Anthony Iannino, Kazuhiro Okumoto, Software Reliability: Measurement, Prediction,
Application; McGraw-Hill, 1987; ISBN 0-07-044093-X.
A. Nikora, J. Munson, Finding Fault with Faults: A Case Study, presented at the Annual Oregon
Workshop on Software Metrics, May 11-13, 1997, Coeur d'Alene, ID.
A. Nikora, N. Schneidewind, J. Munson, "IV&V Issues in Achieving High Reliability and Safety in
Critical Control System Software," proceedings of the Third ISSAT International Conference on
Reliability and Quality in Design, March 12-14, 1997, Anaheim, CA.

127

Further Reading (contd)

A. Nikora, J. Munson, Determining Fault Insertion Rates For Evolving Software


Systems, proceedings of the Ninth International Symposium on Software Reliability
Engineering, Paderborn, Germany, November 4-7, 1998
Norman F. Schneidewind, Ted W. Keller, "Applying Reliability Models to the Space
Shuttle", IEEE Software, pp 28-33, July, 1992
N. Schneidewind, Reliability Modeling for Safety-Critical Software, IEEE Transactions on
Reliability, March, 1997, pp. 88-98
N. Schneidewind, "Measuring and Evaluating Maintenance Process Using Reliability,
Risk, and Test Metrics", proceedings of the International Conference on Software
Maintenance, September 29-October 3, 1997, Bari, Italy.
N. Schneidewind, "Software Metrics Model for Integrating Quality Control and Prediction",
proceedings of the 8th International Symposium on Software Reliability Engineering,
November 2-5, 1997, Albuquerque, NM.
N. Schneidewind, "Software Metrics Model for Quality Control", Proceedings of the Fourth
International Software Metrics Symposium, November 5-7, 1997, Albuquerque, NM.

128

Additional Information

CASRE Screen Shots


Further modeling details
Additional Software Reliability Models
Quantitative Criteria for Model Selection - the
Subadditivity Property
Increasing the Predictive Accuracy of Models

129

CASRE - Initial Display

130

CASRE - Applying Filters

131

CASRE - Running Average Trend Test

132

CASRE - Laplace Test

133

CASRE - Selecting and Running Models

134

CASRE - Displaying Model Results

135

CASRE - Displaying Model Results


(contd)

136

CASRE - Prequential Likelihood Ratio

137

CASRE - Model Bias

138

CASRE - Model Bias Trend

139

CASRE - Ranking Models

140

CASRE - Model Ranking Details

141

CASRE - Model Ranking Details (contd)

142

CASRE - Model Results Table

143

CASRE - Model Results Table (contd)

144

CASRE - Model Results Table (contd)

145

Additional Software Reliability


Models

Software Reliability Estimation Models:


Exponential NHPP Models
Generalized Poisson Model
Non-homogeneous Poisson Process Model
Musa Basic Model
Musa Calendar Time Model
Schneidewind Model
Littlewood-Verrall Bayesian Model
Hyperexponential Model
146

Generalized Poisson Model

Proposed by Schafer, Alter, Angus, and Emoto


for Hughes Aircraft Company under contract
to RADC in 1979.
Model is analogous in form to the Jelinski-Moranda
model but taken within the error count framework. The
model can be shown to reduce to the Jelinski-Moranda
model under the appropriate circumstances.
147

Generalized Poisson Model (cont'd)


Assumptions:
1.
The expected number of errors occurring in any time interval
is proportional to the error content at the time of testing and
to some function of the amount of time spent in error testing.
2.
All errors are equally likely to occur and are independent of
each other.
3.
Each error is of the same order of severity as any other error.
4.
The software is operated in a similar manner as the
anticipated usage.
5.
The errors are corrected at the ends of the testing intervals
without introduction of new errors into the program.

148

Generalized Poisson Model (cont'd)


Construction of Model:
Given testing intervals of length X₁, X₂, ..., X_n
f_i errors are discovered during the i'th interval
At the end of the i'th interval, a total of M_i errors have been corrected
The first assumption of the model yields:

E(f_i) = φ (N − M_{i−1}) g_i(x₁, x₂, ..., x_i)

where
φ is a proportionality constant
N is the initial number of errors
g_i is a function of the amount of testing time spent, previously and
currently. g_i is usually non-decreasing. If g_i(x₁, x₂, ..., x_i) = x_i, then the
model reduces to the Jelinski-Moranda model.

149

Schneidewind Model

Proposed by Norman Schneidewind in 1975.


Model's basic premise is that as the testing progresses
over time, the error detection process changes.
Therefore, recent error counts are usually of more use
than earlier counts in predicting future error counts.
Schneidewind identifies three approaches to using the
error count data. These are identified in the following
slide.

150

Schneidewind Model

First approach is to use all of the error counts for all


testing intervals.
Second approach is to use only the error counts from test
intervals s through m and ignore completely the error
counts from the first s - 1 test intervals, assuming that
there have been m test intervals to date.
Third approach is a hybrid approach which uses the
cumulative error count for the first s - 1 intervals and the
individual error counts for the last m - s + 1 intervals.
151

Schneidewind Model (cont'd)


Assumptions:
1. The number of errors detected in one interval is independent of the error
   count in another.
2. The error correction rate is proportional to the number of errors to be
   corrected.
3. The software is operated in a similar manner as the anticipated
   operational usage.
4. The mean number of detected errors decreases from one interval to the
   next.
5. The intervals are all of the same length.
152

Schneidewind Model (cont'd)


Assumptions (cont'd):
6. The rate of error detection is proportional to the number of errors within
   the program at the time of test. The error detection process is assumed to
   be a nonhomogeneous Poisson process with an exponentially decreasing
   error detection rate. For the i'th interval this rate is:

   d_i = α e^(−β i)

   where α > 0 and β > 0 are constants of the model.

153


Schneidewind Model (cont'd)


From assumption 6, the cumulative mean number of
errors is:

i
1 e

for the i'th interval, the mean number of errors is:

i 1

m i = Di - Di - 1
157
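
A small C sketch of the Schneidewind cumulative and per-interval mean error counts (alpha and beta are assumed values):

#include <stdio.h>
#include <math.h>

int main(void)
{
    double alpha = 12.0;   /* assumed initial error detection rate */
    double beta  = 0.3;    /* assumed exponential decay constant   */
    double prev  = 0.0;
    int    i;

    for (i = 1; i <= 6; i++) {
        double D = (alpha / beta) * (1.0 - exp(-beta * i));   /* cumulative mean errors D_i */
        printf("interval %d: D_i = %6.2f, m_i = %5.2f\n", i, D, D - prev);
        prev = D;
    }
    return 0;
}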

Nonhomogeneous Poisson Process

Proposed by Amrit Goel of Syracuse University and Kazu


Okumoto in 1979.
Model assumes that the error counts over nonoverlapping time intervals follow a Poisson distribution.
It is also assumed that the expected number of errors in
an interval of time is proportional to the remaining number
of errors in the program at that time.
158

Nonhomogeneous Poisson Process (contd)


Assumptions:
1. The software is operated in a similar manner as the anticipated operational usage.
2. The numbers of errors (f1, f2, f3, ..., fm) detected in each of the respective time intervals [(0, t1), (t1, t2), ..., (t(m-1), tm)] are independent for any finite collection of times t1 < t2 < ... < tm.
3. Every error has the same chance of being detected and is of the same severity as any other error.
4. The cumulative number of errors detected at any time t, N(t), follows a Poisson distribution with mean m(t). m(t) is such that the expected number of error occurrences in any interval (t, t + Δt) is proportional to the expected number of undetected errors at time t.

159

Nonhomogeneous Poisson Process (contd)


Assumptions (contd):
5. The expected cumulative number of errors function, m(t), is assumed to be a bounded, nondecreasing function of t with:
   m(t) = 0 for t = 0
   m(t) = a for t = ∞
   where a is the expected total number of errors to be eventually detected in the testing process.

160

Nonhomogeneous Poisson Process (contd)


Assumptions 4 and 5 give the number of errors discovered in the interval (t, t + Δt) as:

   m(t + Δt) − m(t) = b[a − m(t)]Δt + O(Δt)

where
   b is a constant of proportionality
   a is the total number of errors in the program

Solving the above differential equation yields:

   m(t) = a(1 − e^(−bt))

which satisfies the initial conditions m(0) = 0, m(∞) = a.

161
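A short Python sketch of the resulting mean value function and its derivative (the failure intensity); the parameters a and b below are hypothetical and would normally be fit to observed failure data:

```python
import math

def go_mean_value(t, a, b):
    """Expected cumulative errors by time t: m(t) = a * (1 - e^(-b*t))."""
    return a * (1.0 - math.exp(-b * t))

def go_intensity(t, a, b):
    """Failure intensity, the derivative of m(t): a * b * e^(-b*t)."""
    return a * b * math.exp(-b * t)

# Hypothetical parameters: a = 100 total errors, b = 0.02 per test hour.
a, b = 100.0, 0.02
for t in (0.0, 50.0, 100.0, 200.0):
    print(f"t = {t:6.1f}   m(t) = {go_mean_value(t, a, b):6.2f}   "
          f"intensity = {go_intensity(t, a, b):5.3f}")
```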

Musa Basic Model


Assumptions:
1. The cumulative number of failures by time t, M(t), follows a Poisson process with the following mean value function:

   μ(t) = β₀[1 − exp(−β₁t)]

2. Execution times between failures are piecewise exponentially distributed, i.e., the hazard rate for a single failure is constant.

162

Musa Basic Model (contd)

Mean value function given on previous slide.
Failure intensity given by:

   λ(t) = dμ(t)/dt = β₀β₁ exp(−β₁t)

163
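A minimal Python sketch of the basic model's mean value function and failure intensity as reconstructed above; the values of β₀ and β₁ are hypothetical, chosen only for illustration:

```python
import math

def musa_mean_value(t, beta0, beta1):
    """Expected cumulative failures by execution time t: mu(t) = beta0 * (1 - e^(-beta1*t))."""
    return beta0 * (1.0 - math.exp(-beta1 * t))

def musa_intensity(t, beta0, beta1):
    """Failure intensity: lambda(t) = beta0 * beta1 * e^(-beta1*t)."""
    return beta0 * beta1 * math.exp(-beta1 * t)

# Hypothetical parameters: 150 total expected failures, per-failure hazard 0.03 per CPU hour.
beta0, beta1 = 150.0, 0.03
for t in (0.0, 10.0, 50.0, 100.0):
    print(f"t = {t:5.1f}   mu(t) = {musa_mean_value(t, beta0, beta1):7.2f}   "
          f"lambda(t) = {musa_intensity(t, beta0, beta1):5.3f}")
```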

Musa Calendar Time

Developed by John Musa of AT&T Bell Labs between 1975 and


1980.
This model is one of the most popular software reliability
models, having been employed both inside and outside of
AT&T.
Basic model (see previous slides) is based on the amount of
CPU time that occurs between successive failures rather than
on wall clock time.
Calendar time component of the model attempts to relate CPU
time to wall clock time by modeling the resources (failure
identification personnel, failure correction personnel, and
computer time) that may limit various time segments of testing.

164

Musa Calendar Time (cont'd)

Musa's basic model is essentially the same as the Jelinski-Moranda model.
The importance of the calendar component of the model is:
Development of resource allocation (failure identification staff, failure correction staff, and computer time)
Determining the relationship between CPU time and wall clock time
Let dtI/dτ = instantaneous ratio of calendar to execution time resulting from effects of failure identification staff
Let dtF/dτ = instantaneous ratio of calendar to execution time resulting from effects of failure correction staff
Let dtC/dτ = instantaneous ratio of calendar to execution time resulting from effects of available computer time

165

Musa Calendar Time (cont'd)


Increment in calendar time is proportional to the average amount by which the limiting resource constrains testing over a given execution time segment:

   Δt = ∫ from τ₁ to τ₂ of max(dtI/dτ, dtF/dτ, dtC/dτ) dτ

Resource requirements associated with a change in MTTF from τ₁ to τ₂ can be approximated by:

   ΔX = θ Δτ + μ Δm

   Δτ = execution time increment
   Δm = increment of failures experienced
   θ  = execution time coefficient of resource expenditure
   μ  = failure coefficient of resource expenditure

166
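The sketch below illustrates the two relations numerically. The three ratio functions and every parameter value are hypothetical placeholders; in a real project they are derived from staffing levels, debugging rates, and available computer time:

```python
import math

def calendar_time_increment(tau1, tau2, dtI, dtF, dtC, steps=1000):
    """Approximate the integral over [tau1, tau2] of max(dtI/dtau, dtF/dtau, dtC/dtau)
    using the midpoint rule, giving the calendar time consumed by that execution-time segment."""
    h = (tau2 - tau1) / steps
    return sum(max(dtI(tau1 + (k + 0.5) * h),
                   dtF(tau1 + (k + 0.5) * h),
                   dtC(tau1 + (k + 0.5) * h)) * h for k in range(steps))

def resource_requirement(theta, mu, d_tau, d_m):
    """Resource expenditure over a segment: delta_X = theta * delta_tau + mu * delta_m."""
    return theta * d_tau + mu * d_m

# Hypothetical ratio functions: failure correction staff limits testing early on,
# available computer time limits it later.
dtI = lambda tau: 1.5
dtF = lambda tau: 3.0 * math.exp(-0.05 * tau)
dtC = lambda tau: 1.0 + 0.01 * tau

print("calendar hours for 40 CPU hours:",
      round(calendar_time_increment(0.0, 40.0, dtI, dtF, dtC), 1))
print("failure-correction staff hours:",
      resource_requirement(theta=2.0, mu=4.0, d_tau=40.0, d_m=12.0))
```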

Littlewood-Verrall Bayesian Model

Littlewood's model, a reformulation of the Jelinski-Moranda


model, postulated that all errors do not contribute equally to
the reliability of a program.
Littlewood's model postulates that the error rate, assumed to
be a constant in the Jelinski-Moranda model, should be
treated as a random variable.
The Littlewood-Verrall model of 1978 attempts to account for
error generation in the correction process by allowing for the
probability that the program could be made less reliable by
correcting an error.

167

Littlewood-Verrall (cont'd)
Assumptions:
1. Successive execution times between failures are independent random variables X₁, X₂, ..., with probability density functions

   f(Xᵢ | λᵢ) = λᵢ e^(−λᵢXᵢ)

   where the λᵢ are the error rates.
2. The λᵢ's form a sequence of random variables, each with a gamma distribution of parameters α and ψ(i), such that:

   f(λᵢ | α, ψ(i)) = [ψ(i)]^α λᵢ^(α−1) e^(−ψ(i)λᵢ) / Γ(α)

   ψ(i) is an increasing function of i that describes the "quality" of the programmer and the "difficulty" of the task.

168

Littlewood-Verrall (cont'd)
Assumptions (cont'd):
Imposing the constraint P[λⱼ ≤ x] ≤ P[λⱼ₊₁ ≤ x] for any x reflects the intention to make the program better by correcting errors. It also reflects the fact that sometimes corrections will make the program worse.
3. The software is operated in a similar manner as the anticipated operational usage.

169

Littlewood-Verrall (cont'd)
The marginal distribution of Xᵢ is obtained by integrating out λᵢ:

   f(Xᵢ | α, ψ(i)) = ∫₀^∞ f(Xᵢ | λᵢ) g(λᵢ | α, ψ(i)) dλᵢ
                   = ∫₀^∞ λᵢ e^(−λᵢXᵢ) · [ψ(i)]^α λᵢ^(α−1) e^(−ψ(i)λᵢ) / Γ(α) dλᵢ
                   = α [ψ(i)]^α / [Xᵢ + ψ(i)]^(α+1)

This is a Pareto distribution.

170

Littlewood-Verrall (cont'd)
Joint density for the Xᵢ's is given by:

   f[X₁, X₂, ..., Xₙ | α, ψ(i)] = ∏ (i = 1 to n) α [ψ(i)]^α / [Xᵢ + ψ(i)]^(α+1)

For ψ(i), Littlewood and Verrall suggest:

   ψ(i) = β₀ + β₁i   or   ψ(i) = β₀ + β₁i²

171
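To see how ψ(i) produces reliability growth on average, here is a small Python sketch built on the Pareto marginal above; the values of α, β₀, and β₁ are hypothetical:

```python
def psi(i, beta0, beta1, quadratic=False):
    """Scale function psi(i) = beta0 + beta1*i (or beta0 + beta1*i^2)."""
    return beta0 + beta1 * (i * i if quadratic else i)

def lv_marginal_pdf(x, i, alpha, beta0, beta1):
    """Pareto marginal density of the i-th time between failures:
    f(x) = alpha * psi(i)^alpha / (x + psi(i))^(alpha + 1)."""
    p = psi(i, beta0, beta1)
    return alpha * p ** alpha / (x + p) ** (alpha + 1)

def lv_median_time(i, alpha, beta0, beta1):
    """Median of that Pareto distribution: psi(i) * (2^(1/alpha) - 1);
    it increases with i when beta1 > 0, i.e. reliability growth on average."""
    return psi(i, beta0, beta1) * (2.0 ** (1.0 / alpha) - 1.0)

# Hypothetical parameters, for illustration only.
alpha, beta0, beta1 = 2.0, 5.0, 1.5
for i in (1, 10, 25):
    print(f"i = {i:2d}   median next time to failure = {lv_median_time(i, alpha, beta0, beta1):6.2f}")
```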

Hyperexponential Model

Extension to classical exponential models. First considered by Ohba; addressed in variations by Yamada,
Osaki, and Laprie et al.
Basic idea is that different sections of software experience an exponential failure rate. Rates may vary over
these sections to reflect their different natures.

172

Hyperexponential Model (contd)


Assumptions:
Suppose that there are K sections of the software such that within
each class:
1.
The rate of fault detection is proportional to the current fault
content within that section of the software.
2.
The fault detection rate remains constant over the intervals
between fault occurrence.
3.
A fault is corrected instantaneously without introducing new
faults into the software.
4.
Every fault has the same chance of being encountered and is of
the same severity as any other fault.

173

Hyperexponential Model (contd)


Assumptions (contd):
1. The failures, when the faults are detected, are independent.
2. The cumulative number of failures by time t, μ(t), follows a Poisson process with the following mean value function:

   μ(t) = N Σ (i = 1 to K) pᵢ[1 − exp(−λᵢt)]

   where 0 < λᵢ < 1, Σ (i = 1 to K) pᵢ = 1, 0 < pᵢ < 1, and N is finite.
174
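A brief Python sketch of this mean value function; the section probabilities, detection rates, and N below are hypothetical:

```python
import math

def hyperexp_mean_value(t, N, p, lam):
    """Expected cumulative failures by time t: mu(t) = N * sum_i p_i * (1 - e^(-lambda_i * t))."""
    assert abs(sum(p) - 1.0) < 1e-9, "section probabilities must sum to 1"
    return N * sum(pi * (1.0 - math.exp(-li * t)) for pi, li in zip(p, lam))

# Hypothetical three-section system: half the faults sit in code with a fast detection
# rate, the rest in sections exercised less often; N = 80 expected total faults.
N, p, lam = 80, [0.5, 0.3, 0.2], [0.10, 0.03, 0.005]
for t in (10, 50, 200):
    print(f"t = {t:4d}   expected cumulative failures = {hyperexp_mean_value(t, N, p, lam):6.2f}")
```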

Criteria for Model Selection


Quantitative Criteria Prior to Model Application
Subadditive Property Analysis

175

Subadditive Property Analysis

Common definition of reliability growth is that successive interfailure times tend to grow (stochastically) larger:

   Tᵢ ≤st Tⱼ for i ≤ j

Under the assumption of the interfailure times being stochastically independent, we have:

   F_Ti(x) ≥ F_Tj(x) for i ≤ j and all x ≥ 0

176

Subadditive Property Analysis (contd)

As an alternative to assumption of stochastic independence,


we can consider that successive failures are governed by a
non-homogeneous Poisson process:
N(t) = cumulative number of failures observed in [0,t]
H(t) = E[N(t)], mean value of cumulative failures
h(t) = dH(t)/dt, failure intensity
Natural definition of reliability growth is that the increase in the
expected number of failures, H(t), tends to become lower.
However, there are situations where reliability growth may take
place on average even though the failure intensity fluctuates
locally.

177

Subadditive Property Analysis (contd)

To allow for local fluctuations, we can say that the expected


number of failures in any initial interval (i.e., of the form [0,t]) is
no less than the expected number of failures in any interval of
the same length occurring later (i.e., in [x,x+t]).
Independent increment property of an NHPP allows above
definition to be written as:
H(t1) + H(t2) ≥ H(t1 + t2)
When above inequality holds, H(t) is said to be subadditive.
This definition of reliability growth allows there to be local
intervals in which reliability decreases without affecting the
overall trend of reliability increase.

178

Subadditive Property Analysis (contd)

We can interpret the subadditive property graphically. Consider the curve H(t), representing the mean value function for the cumulative number of failures, and the line Lt joining the two end points of H(t) on [0, t], ending at (t, H(t)), shown below.

[Figure: cumulative failure curve H(x) with the chord Lt drawn between its end points on [0, t]]

Let AH(t) denote the difference between the area delimited by (1) H(t) and the coordinate axis and (2) Lt and the coordinate axis. If H(t) is subadditive, then AH(t) ≥ 0 for all t in [0,T]. We call AH(t) the subadditivity factor.

179
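A short Python sketch of the subadditivity factor on interval data: it approximates the area under the observed cumulative failure curve with the trapezoidal rule and subtracts the area under the chord from the origin. The weekly counts are hypothetical:

```python
def subadditivity_factor(times, counts):
    """For each observation time t_k, return A_H(t_k): the area under the cumulative
    failure curve on [0, t_k] minus the area under the chord from (0, 0) to (t_k, H(t_k)).
    Positive values suggest reliability growth on average up to t_k."""
    factors, area = [], 0.0
    prev_t, prev_h = 0.0, 0.0
    for t, h in zip(times, counts):
        area += 0.5 * (prev_h + h) * (t - prev_t)   # trapezoidal area under the curve
        factors.append(area - 0.5 * t * h)          # minus the triangle under the chord
        prev_t, prev_h = t, h
    return factors

# Hypothetical data: cumulative failures observed at the end of each test week.
weeks = [1, 2, 3, 4, 5, 6, 7, 8]
cum   = [8, 15, 21, 26, 30, 33, 35, 36]    # concave growth, so the factors stay positive
print([round(a, 1) for a in subadditivity_factor(weeks, cum)])
```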

Subadditive Property Analysis (contd)


Summary

AH(t) ≥ 0 over [0,T] implies reliability growth on average over [0,T].
AH(t) ≤ 0 over [0,T] implies reliability decrease on average over [0,T].
AH(t) constant over [0,T] implies stable reliability on average over [0,T].
Changes in the sign of AH(t) indicate reliability trend changes.
When the derivative with respect to time of AH(t), Ah(t), is ≥ 0 over a subinterval [t1,t2], this implies local reliability growth on average over [t1,t2].
Ah(t) ≤ 0 over [t1,t2] implies local reliability decrease on average over [t1,t2].
Changes in the sign of Ah(t) indicate local reliability trend changes.
Transient changes of the failure intensity are not detected by the subadditivity property.

180

Increasing the Predictive Accuracy of


Models

Introduction
Linear Combination Models
Statically-Weighted Models
Statically-Determined/Dynamically-Assigned
Weights
Dynamically-Determined/Dynamically-Assigned
Weights
Model Application Results
Model Recalibration
181

Problems in Reliability Modeling


Introduction
Significant differences exist among the performance
of software reliability models
When software reliability models were first introduced,
it was felt that a process of refinement would produce
a definitive model
The reality is: no single model could be determined a
priori as the best model during measurement

182

The Reality Is In Between


[Chart: cumulative defects vs. elapsed time, with the actual curve ("Reality") lying between an optimistic projection and a pessimistic projection]

183

Linear Combination Models


Forming Linear Combination Models
(1) Identify a basic set of models (called component
models)
(2) Select models that tend to cancel out in their biased
predictions
(3) Keep track of the software failure data with all the
component models
(4) Apply certain criteria to weigh the selected component
models and form one or several Linear Combination
Models for final predictions

184

A Set of Linear Combination Models


Selected component models: GO, MO, LV
(1) ELC  Equally-Weighted LC Model (Statically-Weighted model)
(2) MLC  Median-Oriented LC Model (Statically-Determined/Dynamically-Assigned)
(3) ULC  Unequally-Weighted LC Model (Statically-Determined/Dynamically-Assigned)
(4) DLC  Dynamically-Weighted LC Model (Dynamically-Determined and Assigned)
Weighting is determined by dynamically calculating the posterior prequential likelihood of each model as a meta-predictor

185

A Set of Linear Combination Models (contd)


Equally-Weighted Linear Combination Model - apply equal weights in
selecting component models and form the Equally-Weighted Linear
Combination model for final predictions.
One possible ELC is:

   ELC = (1/3)GO + (1/3)MO + (1/3)LV

These models were chosen as component models because:


Their predictive validity has been observed in previous investigations.
They represent different categories of models: exponential NHPP,
logarithmic NHPP, and inverse-polynomial Bayesian.
Their predictive biases tend to cancel for the five data sets analyzed.

186

A Set of Linear Combination Models (contd)


Median-Oriented Linear Combination Model - instead of choosing
the arithmetic mean as was done for the ELC model, use the
median value. This model's formulation is:

   MLC = 0*P + 1*M + 0*O

where:
   O = optimistic prediction
   M = median prediction
   P = pessimistic prediction

187

A Set of Linear Combination Models (contd)


Unequally-Weighted Linear Combination Model - instead of choosing the
arithmetic mean as was done for the ELC model, use the PERT weighting
scheme. This model's formulation is:

   ULC = (1/6)P + (4/6)M + (1/6)O

where:
   O = optimistic prediction
   M = median prediction
   P = pessimistic prediction

188

A Set of Linear Combination Models (contd)

Dynamically-Weighted LC Model - use changes in one or more of


the four previously defined criteria (e.g. prequential likelihood) to
determine weights for each component model.
Changes can be observed over windows of varying size and type.
Fixed or sliding windows can be used.
[Diagram: computation and reference windows for weights wᵢ, wᵢ₊₁, wᵢ₊₂ advancing along the time axis, for both fixed and sliding window schemes]

Weights for component models vary over time.

189
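One way to realize the DLC weighting is sketched below: each component model is weighted in proportion to its prequential likelihood over a recent window, i.e., the product of the predictive densities it assigned to the interfailure times actually observed there. The window contents are hypothetical; logs are used only for numerical stability:

```python
import math

def dlc_weights(window_densities):
    """window_densities maps a model name to the predictive densities it assigned to the
    observed interfailure times in the current window. Returns normalized weights
    proportional to each model's prequential likelihood over that window."""
    log_pl = {m: sum(math.log(v) for v in vals) for m, vals in window_densities.items()}
    shift = max(log_pl.values())                      # avoid underflow before exponentiating
    raw = {m: math.exp(lp - shift) for m, lp in log_pl.items()}
    total = sum(raw.values())
    return {m: r / total for m, r in raw.items()}

# Hypothetical densities over a four-failure window for the three component models.
window = {
    "GO": [0.020, 0.015, 0.018, 0.022],
    "MO": [0.025, 0.019, 0.021, 0.024],
    "LV": [0.012, 0.030, 0.016, 0.020],
}
weights = dlc_weights(window)
print({m: round(w, 3) for m, w in weights.items()})   # MO earns the largest weight here
```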

Data Set 1: Model Comparisons for Voyager Project

Recommended Models: 1-DLC, 2-ELC, 2-LV

190

Overall Model Comparisons Using All Four Criteria

Summary of Model Ranking for Each Data Set by All Four Criteria

Data Set      JM   GO   MO   DU   LM   LV  ELC  ULC  MLC  DLC
RADC 1        10    9    1    6    8    6    4    2    3    5
RADC 2         9   10    6    7    8    1    4    5    2    2
RADC 3         6    8    4    9    9    6    4    3    2    1
Voyager       10    7    6    7    9    2    2    4    5    1
Galileo        5    7   10    6    9    4    1    3    8    2
Galileo CDS    8    6    6    8   10    1    1    1    4    5
Magellan       5    5    8    1    9   10    1    5    4    3
Alaska SAR     1    5    1    9    3   10    8    7    3    6
Sum of Rank   54   57   42   53   65   40   25   30   31   25
Handicap     +22  +25  +10  +21  +33   +8   -7   -2   -1   -7
Total Rank     8    9    6    7   10    5    1    3    4    1

191

Overall Model Comparisons by the Prequential Likelihood Measure

Model Ranking Summary for Each Data Set's Prequential Likelihood

Data Set      JM   GO   MO   DU   LM   LV  ELC  ULC  MLC  DLC
RADC 1        10    9    2    8    6    7    5    4    3    1
RADC 2         7    9    4   10    7    1    4    4    3    2
RADC 3         4    7    4   10    8    9    2    2    4    1
Voyager       10    7    6    8    9    2    3    4    5    1
Galileo        5    7    9   10    5    4    2    3    8    1
Galileo CDS    6    5    8   10    6    2    3    4    8    1
Magellan       6    6    6    2    6    5    3    4    6    1
Alaska SAR     2    6    2   10    2    9    8    7    2    1
Sum of Rank   50   56   41   68   49   39   30   32   39    9
Handicap     +18  +24   +9  +36  +17   +7   -2    0   +7  -23
Total Rank     8    9    6   10    7    4    2    3    4    1

192

Summary of Long-Term Predictions

193

Possible Extensions of LC Models


1. Apply models other than GO, MO, and LV
2. Apply more than three component models
3. Apply other meta-predictors for weight assignments in DLC-type models
4. Apply user-determined weighting schemes, subject to project criteria and user judgment
5. Apply combination models as component models for a hybrid combination
194

Increasing the Predictive Accuracy of


Models
Model Recalibration - Developed by Brocklehurst, Chan, Littlewood, and Snell. Uses the model bias function (u-plot) to reduce bias in model predictions.
Let the random variable Tᵢ represent the prediction of the next time to failure, based on observed times to failure t₁, t₂, ..., tᵢ₋₁.
Let Fᵢ(t) represent the true, but hidden, distribution of the random variable Tᵢ, and let F̂ᵢ(t) represent the prediction of Tᵢ. The relationship between the estimated and true distributions can be written as:

   Fᵢ(t) = Gᵢ[F̂ᵢ(t)]

195

Increasing the Predictive Accuracy of


Models (contd)
Model Recalibration (contd):
If Gᵢ were known, it would be possible to determine the true distribution of Tᵢ from the inaccurate predictor.
The key notion in recalibration is that the sequence Gᵢ is approximately stationary. Experience seems to show that Gᵢ changes only slowly in many cases. This opens the possibility of approximating Gᵢ with an estimate Gᵢ* and using it to form a new prediction:

   Fᵢ*(t) = Gᵢ*[F̂ᵢ(t)]

196

Increasing the Predictive Accuracy of


Models (contd)
Model Recalibration (contd):
A suitable estimator for Gᵢ is suggested by the observation that Gᵢ is the distribution function of Uᵢ = F̂ᵢ(Tᵢ). The estimate Gᵢ* is based on the u-plot calculated from predictions which have been made prior to Tᵢ.
This new prediction recalibrates the model based on knowledge of the accuracy of past predictions. The simplest form of Gᵢ* is the u-plot with the steps joined to form a polygon. Smooth versions can be constructed using spline techniques.

197

Increasing the Predictive Accuracy of


Models (contd)
Model Recalibration (contd):
To recalibrate a model's predictions, follow these four steps:
1. Check that the error in previous predictions is approximately stationary. The y-plot can be used for this purpose.
2. Find the u-plot for predictions made before Tᵢ, and join up the steps on the plot to form a polygon Gᵢ* (alternatively, use a spline technique to construct a smooth version).
3. Use the basic model (e.g., the JM, LV, or MO models) to make a raw prediction F̂ᵢ(t).
4. Recalibrate the raw prediction using Fᵢ*(t) = Gᵢ*[F̂ᵢ(t)].

198
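A compact Python sketch of steps 2 through 4 (step 1, the y-plot stationarity check, is omitted). The raw predictor and the past u values are hypothetical; the polygon G* is the joined-up u-plot described above:

```python
import bisect
import math

def uplot_polygon(us):
    """Step 2: build G* as the u-plot with its steps joined to form a polygon,
    i.e. the empirical cdf of the past u values, linearly interpolated between jumps."""
    xs = sorted(us)
    n = len(xs)
    knots = [(0.0, 0.0)] + [(x, (k + 1) / n) for k, x in enumerate(xs)] + [(1.0, 1.0)]

    def G_star(u):
        positions = [p[0] for p in knots]
        i = bisect.bisect_right(positions, u) - 1
        i = min(max(i, 0), len(knots) - 2)
        (x0, y0), (x1, y1) = knots[i], knots[i + 1]
        return y0 if x1 == x0 else y0 + (y1 - y0) * (u - x0) / (x1 - x0)

    return G_star

def recalibrate(raw_cdf, past_us):
    """Steps 3-4: wrap a raw predictive cdf F-hat so that F*(t) = G*(F-hat(t))."""
    G_star = uplot_polygon(past_us)
    return lambda t: G_star(raw_cdf(t))

# Hypothetical raw predictor (an exponential cdf for the next time to failure) and
# hypothetical u values from earlier predictions, bunched toward 1 (a biased predictor).
raw = lambda t: 1.0 - math.exp(-t / 40.0)
past_us = [0.81, 0.74, 0.90, 0.67, 0.85, 0.78]
F_star = recalibrate(raw, past_us)
print("raw F(30) =", round(raw(30.0), 3), " recalibrated F*(30) =", round(F_star(30.0), 3))
```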
