
Software Reliability Engineering in Industry

John D. Musa

Software Reliability Engineering and Testing Courses

Abstract. Software reliability engineering has recently been playing a rapidly
increasing role in industry [1]. This has occurred because it carefully plans and
guides development and test so that you develop a more reliable product faster
and cheaper. In this paper we will first describe what software reliability
engineering is. Then we will discuss the current state of the practice; that is,
how industry is using it. The current “best” way of practicing software
reliability engineering will be discussed. Finally, we will outline some of the
important open research questions; solutions to these problems hold great
promise for further advances.

1. Introduction

Software reliability engineering (SRE) is a practice for quantitatively planning and
guiding software development and test, with emphasis on reliability and availability
[2,3,4,5,6]. We define reliability as the probability that a system or a capability of a
system functions without failure for a specified time or number of natural units in a
specified environment. Natural units are units other than time that are related to the
output of a software-based product, such as pages of output, transactions, telephone
calls, or jobs. Availability is the probability that a system or a capability of a system is
functional at a given time in a specified environment.
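To make these definitions concrete, the following Python sketch assumes the simplest case of a constant failure intensity λ, under which reliability over an exposure τ (hours or natural units) is R = exp(-λτ). Inverting that relation turns a reliability objective into a failure intensity objective. Availability is approximated as A = 1/(1 + λ·t_m), where λ is in failures per hour and t_m is the mean downtime per failure. The constant-intensity assumption, the function names, and the numbers are illustrative only, not taken from the paper.

```python
import math

def reliability(failure_intensity: float, exposure: float) -> float:
    """Probability of no failure over `exposure` units (hours or natural units),
    assuming a constant failure intensity expressed per the same unit."""
    return math.exp(-failure_intensity * exposure)

def failure_intensity_objective(reliability_objective: float, exposure: float) -> float:
    """Invert R = exp(-lambda * tau) to find the failure intensity that just
    meets a reliability objective over the given exposure."""
    return -math.log(reliability_objective) / exposure

def availability(failure_intensity_per_hour: float, mean_downtime_hours: float) -> float:
    """Steady-state availability A = 1 / (1 + lambda * t_m); lambda must be in
    failures per hour and t_m in hours of downtime per failure."""
    return 1.0 / (1.0 + failure_intensity_per_hour * mean_downtime_hours)

# Hypothetical figures: 1 failure per 10,000 transactions (a natural unit).
r = reliability(1e-4, 1000)                    # chance that 1,000 transactions run failure-free
lam = failure_intensity_objective(0.99, 1000)  # intensity needed for R = 0.99 over 1,000 transactions
a = availability(0.01, 2.0)                    # 1 failure per 100 h, 2 h mean downtime
print(f"R = {r:.4f}, lambda_obj = {lam:.3e}, A = {a:.4f}")
```

Expressing the objective as a failure intensity rather than a probability lets it be compared directly with failure data collected during test, independent of the exposure length chosen.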
SRE quantifies expected use by function and uses this information to make product
development and test more efficient. It matches major quality characteristics
(reliability, availability, schedule, cost) to user needs more precisely by setting
quantitative objectives for reliability and/or availability as well as schedule and cost.
Then it engineers project strategies to meet the objectives. Finally, it tracks reliability
during system test against each objective as one of the release criteria.
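Quantifying expected use by function is commonly done with an operational profile, meaning the set of operations with their probabilities of occurrence. The Python sketch below is a hypothetical illustration, not the paper's procedure. It derives a profile from assumed occurrence rates, allocates a fixed budget of test cases in proportion to it, and applies a deliberately simplified release check that compares an observed failure intensity with its objective. The operation names, rates, and the ratio-based check are assumptions. In practice, the failure intensity during system test is usually estimated with a reliability growth model rather than a raw ratio.

```python
from typing import Dict

def operational_profile(occurrence_rates: Dict[str, float]) -> Dict[str, float]:
    """Normalize occurrence rates (e.g., operations per hour) into probabilities."""
    total = sum(occurrence_rates.values())
    return {op: rate / total for op, rate in occurrence_rates.items()}

def allocate_tests(profile: Dict[str, float], budget: int) -> Dict[str, int]:
    """Allocate `budget` test cases in proportion to occurrence probability,
    using largest-remainder rounding so the allocations sum to the budget."""
    raw = {op: p * budget for op, p in profile.items()}
    alloc = {op: int(x) for op, x in raw.items()}
    leftover = budget - sum(alloc.values())
    # Hand the remaining test cases to the operations with the largest fractional parts.
    for op in sorted(raw, key=lambda o: raw[o] - alloc[o], reverse=True)[:leftover]:
        alloc[op] += 1
    return alloc

def ready_to_release(failures: int, exposure: float, objective: float) -> bool:
    """Simplified release criterion (an assumption): the observed failure
    intensity, failures / exposure, must not exceed the objective."""
    return failures / exposure <= objective

# Hypothetical operations and occurrence rates for a telephone switch.
rates = {"place_call": 600.0, "transfer_call": 150.0, "conference": 50.0}
profile = operational_profile(rates)
print(profile)
print(allocate_tests(profile, 200))
print(ready_to_release(failures=3, exposure=500.0, objective=0.01))
```

Allocating test cases in proportion to use concentrates testing on the operations customers exercise most. Failures found there have the largest effect on the reliability users actually experience.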

2. State of the Practice

SRE is a proven, standard, best current practice that is widely applicable, low in cost
and schedule impact, and widespread in use.
As an example of the proven value of SRE, consider the development of a release of
the AT&T International Definity PBX [7, pp. 167-8]. When SRE was applied to this
release, the project experienced a reduction in customer-reported problems by a factor
of 10, a reduction of system test interval by a factor of 2, a reduction in total
development time of 30%, and no serious service outages in 2 years of deployment.

M. Felici, K. Kanoun, A. Pasquini (Eds.): SAFECOMP’99, LNCS 1698, pp. 1-12, 1999
© Springer-Verlag Berlin Heidelberg 1999

As a result of experiences like this on a number of projects, SRE was proposed as a
candidate to become an AT&T best current practice. Qualification as an AT&T best
current practice requires use on several (typically eight to 10) projects with
documented large benefit/cost ratios, as well as a probing review by two boards of
high-level managers. Some 70 project managers also reviewed the practice of SRE
before it received approval in May 1991. Standards for approval as an AT&T best
current practice are high; only five of 30 proposed best current practices were
approved in 1991.
AT&T’s Operations Technology Center in its Network Computing Services
Division applied the SRE best current practice in a large fraction of its projects. It was
the primary software development organization for the AT&T business unit that won
the Malcolm Baldrige National Quality Award in 1994. In addition, four of the first
five software winners of the AT&T Bell Laboratories President’s Quality Award used
SRE.
The American Institute of Aeronautics and Astronautics approved SRE as a
standard in 1993, and IEEE standards are under development. McGraw-Hill and the
IEEE Computer Society Press recently recognized the rapid maturing and
standardization of the field, publishing a handbook on the topic [7]. The IEEE
Computer Society’s Technical Committee on Software Reliability Engineering is
growing very rapidly and currently has a membership of more than 1,000 [8,9].
SRE implements perhaps the most significant concept of the higher levels of the
Software Engineering Institute’s Capability Maturity Model: that you need to measure
the results of the software development process and use this information to optimize
it. Because SRE provides a means of measuring process results, it can help evaluate
the effectiveness of methodologies and tools that are being considered as possible
standards, and thus help rationalize the many quality standards efforts that
are currently under way.
SRE requires no changes in architecture, design, or code. Technically speaking, you
can apply SRE to any software-based product, beginning at the start of any release cycle.
Economically speaking, SRE is also applicable to virtually all software-based
products, although it may be impractical, except perhaps in abbreviated form, for small
components (involving perhaps less than 2 staff months of effort) unless they are used
in a large number of products. Thus particular promise may lie in applying SRE
to certify object libraries. Although object-oriented concepts have made better
modularization possible, the promise and benefits of reuse are not being fully realized
because developers (and probably rightly so) strongly resist using objects whose
reliability they cannot vouch for.
Investment cost is low, involving no more than 3 equivalent staff days per person in
an organization, including presenting an overview and a course to everyone and
allowing for planning. Note also that the investment cost should be written off over
multiple projects. Table 1 shows life-cycle operating cost as a function of
project size.
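The arithmetic behind the investment figure is simple, and a short sketch shows how writing it off over several projects shrinks the per-project share. The Python code below takes the paper's upper bound of 3 equivalent staff days per person and spreads the one-time total evenly across projects. The even split, the organization size, and the project count are hypothetical.

```python
def sre_investment(staff: int, days_per_person: float = 3.0) -> float:
    """Upper bound on the one-time SRE investment in staff days (overview,
    course, and planning), using the paper's figure of 3 days per person."""
    return staff * days_per_person

def investment_per_project(staff: int, projects: int, days_per_person: float = 3.0) -> float:
    """Write the one-time investment off evenly over several projects
    (an even split is an assumption made for illustration)."""
    if projects <= 0:
        raise ValueError("projects must be positive")
    return sre_investment(staff, days_per_person) / projects

# Hypothetical organization: 50 people, investment written off over 5 projects.
total = sre_investment(50)                   # 150.0 staff days
per_project = investment_per_project(50, 5)  # 30.0 staff days per project
print(total, per_project)
```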
