Russell Frith
4/12/2011
Abstract
Software engineering cost estimation is the process of predicting the effort required to
develop a software system. Cost estimation techniques involve distinctive steps, tools, algorithms,
and assumptions. Many estimation models have been developed since the 1980s owing to the
dynamic nature of software engineering practices. Despite the evolution of new cost estimation
techniques, fundamental economic principles underlie the overall structure of the software
engineering life cycle and its primary refinements of prototyping, incremental development, and
advancement. This paper provides a general overview of software cost estimation methods,
including recent advances in the field. Many of the models rely on a software project size
estimate as input, and this paper provides details for common size metrics. The primary economic
driver of the software life-cycle structure is the significantly increasing cost of making software
changes or fixing software problems as a function of the development phase in which the change
or fix is made. Software engineering cost models are classified into two major categories: algorithmic
and non-algorithmic. Each has its own strengths and weaknesses with regard to implementing
modifications to software projects. A key factor in selecting a cost estimation model is the
1. Introduction
In recent years, software has become the most expensive component of computer system
projects. The cost of software development derives mostly from human effort, and most
estimation methods accordingly give estimates in terms of person-months. If one
considers economics as the study of how people make decisions in resource-limited situations,
then microeconomics is the study of how people make decisions on a
more personal scale: it treats decisions that individuals and organizations make on such issues
as how much insurance to buy, which software development systems to procure, or how to allocate scarce
resources. There is never enough time or money to encompass all the essential features software
vendors would like to put into their products. Even with cheap hardware, storage, memory, and
networks, software projects must always operate within a world of limited computing and
network resources. Consequently, accurate software cost estimates are critical to both developers
and customers. Those estimates can be used for generating requests for proposals, contract
negotiations, scheduling, and project monitoring. Underestimating costs could result in management
approving proposed systems that later exceed their budgets, ship with underdeveloped functions, or
fail to complete on time. Conversely, overestimating costs may result in too many resources being
committed to a project, or, during contract bidding, in losing a contract and a loss of jobs.
Accurate cost estimation carries several benefits:
- It can help to classify and prioritize development projects with respect to an overall
business plan,
- It can be used to determine what resources to commit to a project and how well those
resources will be used,
- Projects can be easier to manage and control when resources are better matched to real
needs, and
- Customers expect actual costs to be in line with estimated costs.
Three fundamental values typically comprise a software cost estimate: effort in
person-months, project duration, and cost. Most cost estimation models attempt to generate an
effort estimate, which is then converted into a project duration time-line and cost. Effort is
measured in person-months of programmers, analysts, and project managers, and effort estimates
can be converted to a dollar cost figure by calculating an average salary per unit time of the staff
involved and multiplying that rate by the estimated effort, although the relation between effort
and cost may be non-linear. A further question is which software size measurement to use:
lines of code (LOC) or function points.
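As a minimal sketch, an effort estimate in person-months can be converted to a dollar cost by multiplying by an average loaded salary; the rate below is illustrative, not from the paper:

```python
# Sketch: converting a person-month effort estimate into a dollar cost.
# The monthly salary figure is an assumption for illustration only.
def effort_to_cost(effort_pm, avg_monthly_salary):
    """Convert effort in person-months to a dollar cost."""
    return effort_pm * avg_monthly_salary

# 60 person-months at an assumed $10,000/month loaded rate
cost = effort_to_cost(60, 10_000)
print(cost)  # 600000
```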
A widely used cost estimation method is expert judgment. Using this
technique, project managers rely on experience and prevailing industry norms as a basis to
develop cost estimates. Basing estimates on expert judgment can be error prone,
however:
- The approach is not repeatable, and the means of deriving an estimate is subjective.
- In general, the relationship between cost and system size is not linear; costs tend to
increase exponentially with size, which confines expert judgment estimates to projects
similar to past experience.
- Budget alterations by management aimed at avoiding cost overruns make experience and
data from previous projects questionable.
There exist alternatives to expert judgment, some theoretical and not very useful, others having
more pragmatic value; they are presented in the software engineering cost estimation section.
In the last four decades, many quantitative software cost estimation models have been developed,
ranging from empirical models such as Boehm's COCOMO models [4] to analytical
models such as those in [7, 22, 23]. An empirical model uses data from previous projects to
evaluate the current project and derives the basic formulae from analysis of the particular
database available. Alternative analytical models use formulae based on global assumptions,
such as the rate at which developers solve problems and the number of problems available.
A well-constructed software cost estimate should have the following properties [24]:
- It is conceived and supported by the project manager and the development team.
- It is defined in enough detail so that its key risk areas are understood and the probability
of success is objectively assessed.
Hindrances to developing a reliable software engineering cost estimate include the many
interrelated factors that influence software development effort and productivity, whose
relationships are not well understood.
Throughout the software life cycle, there are many decision situations involving limited
resources in which software engineering techniques provide useful assistance. See Figure II in
the appendix for elements of a computer programming project cycle. To provide a feel for the
nature of these economic decision issues, an example is given below for each of the major phases
in the software life cycle. In addition, refer to Figure III in the appendix for the loopback nature
of the life cycle.
Feasibility Phase: How much should one invest in information system analyses (user
questionnaires, simulations, scenarios, prototypes) in order to obtain convergence on an appropriate
definition of the proposed system?
Plans and Requirements Phase: How rigorously should requirements be specified? How
much should be invested in validating them?
Product Design Phase: Should developers organize software to make it possible to use a
complex piece of existing software which generally but not completely meets
requirements?
Programming Phase: Given a choice between three data storage and retrieval schemes,
which should be selected?
Integration and Test Phase: How much testing and formal verification should be
performed on a product before releasing it?
Cost estimation is typically performed as part of a planning process in
which the cost estimate is used to derive a project plan. Typical steps in a planning process
include:
1. The project manager develops a characterization of the overall functionality, size,
process, environment, and required quality of the project.
2. A macro-level estimate of the total effort and schedule is developed using a software cost
estimation model.
3. The project manager partitions the effort estimate into a top-level work breakdown
structure. In addition, the schedule is partitioned into major milestone dates and a staffing
profile is configured.
strengths;
7. Once the project has started, monitor its actual cost and progress, and feed results back to
project management.
Regardless of which estimation model is selected, consumers of the model must pay attention to
the coverage of the estimate: some models generate effort estimates for the full software
life-cycle, while others do not include effort for the requirements stage.
The microeconomics field provides a number of techniques for dealing with software
life-cycle decision issues such as the ones mentioned earlier in this section. Standard optimization
techniques can be used when one can find a single quantity, such as rupees or dollars, to serve as a
universal solvent into which all decision variables can be converted. Or, if nonmonetary
objectives can be expressed as constraints (system availability must be 98%, throughput must be
150 transactions per second), then standard constrained optimization techniques can be used. If
cash flows occur at different times, then present-value techniques can be used to normalize them
to a common point in time; these are standard engineering economics analysis techniques. One
such technique compares costs and benefits. An example involves the provisioning of a cell phone
service in which there are two options.
Option A: Accept an available operating system that requires $80K in software costs, but
will achieve a peak performance of 120 transactions per second using five $10K
minicomputers.
Option B: Build a new operating system that would be more efficient and would support a
higher peak transaction rate, at a greater software development cost.
In general, software engineering decision problems are even more complex than Options A and B
suggest, as the alternatives will have several important criteria on which they differ, such as
robustness, ease of tuning,
ease of change, functional capability, and so on. If these criteria are quantifiable, then some type
of figure of merit can be defined to support a comparative analysis of the preference of one
option over another. If some of the criteria are unquantifiable (user goodwill, programmer
morale, etc.), then some techniques for comparing unquantifiable criteria need to be used.
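Where the criteria are quantifiable, the figure of merit mentioned above can be sketched as a weighted sum; the criteria, weights, and scores below are illustrative, not taken from the paper:

```python
# Sketch: a weighted figure of merit for comparing design options when
# the decision criteria are quantifiable. All values are illustrative.
def figure_of_merit(scores, weights):
    """Weighted sum of normalized criterion scores in [0, 1]."""
    return sum(scores[c] * weights[c] for c in weights)

weights = {"robustness": 0.4, "ease_of_change": 0.35, "capability": 0.25}
option_a = {"robustness": 0.6, "ease_of_change": 0.9, "capability": 0.7}
option_b = {"robustness": 0.8, "ease_of_change": 0.5, "capability": 0.9}

print(round(figure_of_merit(option_a, weights), 2))  # 0.73
print(round(figure_of_merit(option_b, weights), 2))  # 0.72
```

A higher figure of merit ranks an option ahead of another, but as the text notes, unquantifiable criteria such as user goodwill fall outside this kind of comparison.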
In software engineering, decision issues are generally complex and involve analyzing
risk, uncertainty, and the value of information. The main economic analysis techniques available
include the following:
1. Techniques for decision making under complete uncertainty, such as the maximax rule,
the maximin rule, and the Laplace rule [19]. These techniques are generally inadequate for
practical software engineering decisions.
2. Expected-value techniques, in which one estimates the probability of each possible
outcome; i.e., successful development of a new operating system, and completes the
expected-value calculation. These are better than decision making under complete uncertainty,
but they still involve a great deal of risk if the unfavorable outcome occurs.
3. Techniques in which the value of information is considered: prototyping is a way of buying
information to reduce uncertainty about the likely success of a project. By prototyping the
high-risk elements, one can get a clearer picture of the likelihood of successfully completing the full project.
Information-buying often tends to be the most valuable aid for software engineering decisions.
The question of how much information-buying is enough can be answered via statistical
decision-theoretic techniques using Bayes' Law, which provides calculations for the expected payoff from
a software project as a function of the level of investment in a prototype. In practice, the use of
Bayes' Law involves the estimation of a number of conditional probabilities which are not easy
to estimate accurately. However, the Bayes' Law approach can be translated into a set of
qualitative conditions under which buying information is worthwhile.
Condition 1: There exist attractive alternatives whose payoff varies greatly, depending on
some critical states of nature. If not, engineers can commit themselves to one of the attractive
alternatives with no great risk of loss.
Condition 2: The critical states of nature have an appreciable probability of occurring. If
not, engineers can again commit without major risk. For situations with extremely high
variations in payoff, the appreciable probability level is lower than in situations with smaller
variations in payoff.
Condition 3: The investigations have a high probability of accurately identifying the
occurrence of the critical states of nature. If not, the investigations will not do much to reduce
the risk of a wrong decision.
Condition 4: The required cost and schedule of the investigations do not overly curtail
their net value. It does one little good to obtain results which cost more than those results can
save for us, or which arrive too late to help make a decision.
Condition 5: There exist significant side benefits derived from performing the
investigations. Again, one may be able to justify an investigation solely on the basis of the value
of these side benefits.
During the 1950s and the 1960s, relatively little progress was made in software cost
estimation, while the frequency and magnitude of software cost overruns were becoming critical
for many large systems employing computers. In 1964, the U.S. Air Force contracted with System
Development Corporation for a landmark project in software cost estimation. The project
collected 104 attributes of 169 software projects and subjected them to extensive statistical analysis.
One result was the 1965 SDC cost model, the best statistical 13-parameter linear model for the
sample data. When applied to its database of 169 projects, this model produced a mean estimate of 40 MM
and a standard deviation of 62 MM; not a very accurate predictor. The model is also
counterintuitive: a project with all zero values for its variables is estimated at -33 MM, and changing the
language from a higher order language to assembly adds 7 MM, independent of project size. One
can conclude that there were too many nonlinear aspects of software development for a linear
model to capture.
Today, software size is the most important factor that affects software cost. There exist
five fundamental software size metrics used in practice. Two of the most commonly used are
the Lines of Code and Function Point metrics. The Lines of Code metric is the
number of lines of delivered source code for the software, known as LOC [9], and is
programming language dependent. Most models relate this measurement to the software cost, but
the exact LOC can only be obtained after the project has completed, so project size must be
estimated beforehand.
One method for estimating code size is to use expert judgment together with a
technique called PERT (Program Evaluation and Review Technique) [2]. The model is based
upon three possible code-sizes: Sl, the lowest possible size; Sh, the highest possible size; and
Sm, the most likely size. An estimate of the code-size S may be computed as
S = (Sl + Sh + 4*Sm) / 6.
This formula is valid for modular code components, and the per-module estimates can be summed
to produce a total size estimate.
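The PERT size estimate can be sketched as follows; the per-module figures are illustrative:

```python
# Sketch of the PERT code-size estimate S = (Sl + Sh + 4*Sm) / 6,
# summed over independently estimated modules. Module data is made up.
def pert_size(lowest, highest, likely):
    """Expected code size from three-point (PERT) estimates."""
    return (lowest + highest + 4 * likely) / 6

# (lowest, highest, most likely) LOC estimates for two modules
modules = [(800, 2000, 1100), (300, 900, 500)]
total = sum(pert_size(lo, hi, m) for lo, hi, m in modules)
print(round(total, 1))  # 1733.3
```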
An alternative measure proposed by Halstead [11] uses code length and volume metrics.
Code length measures source code program length and is defined as N = N1 + N2, where
N1 is the total number of operator occurrences and N2 is the total number of operand occurrences.
Volume corresponds to the amount of storage space and is defined as V = N log2(n1 + n2), where
n1 is the number of distinct operators and n2 is the number of distinct operands that appear in the
program.
The second commonly used size metric is function points. This is a measurement based on the
functionality of the program and was introduced by
Albrecht [1]. The total number of function points depends on the counts of distinct logic types in
the program:
1. User-input types: data or control input types that enter the system
2. User-output types: output data types to the user that leave the system
3. Inquiry types: interactive inputs requiring a response from the system
4. Internal file types: files that are used and shared inside the system
5. External file types: files that are passed or shared between the system and other systems.
Each of these types is individually assigned one of three complexity levels {1 = simple, 2 =
medium, 3 = complex} and given a weighting value that varies from 3 for simple input to 15
for complex internal files. The unadjusted function-point count (UFC) is given as
UFC = sum over i = 1..5 and j = 1..3 of Nij * Wij,
where Nij and Wij are respectively the number and weight of types of class i
with complexity j. For instance, if the raw function-point counts of a project are two simple inputs
(Wij = 3), two complex outputs (Wij = 7), and one complex internal file (Wij = 15), then the UFC
value is computed as 2*3 + 2*7 + 15 = 35. This initial function-point count is either directly
used for cost estimation or is further modified by factors whose values depend on the overall
complexity of the project. These adjustment factors account for the degree of distributed processing, the
amount of reuse, performance requirements, and so on. The advantage of the function-point
measurement is that it can be obtained based on the system requirement specification in the early
stages of a project.
UFC may also be used for code size estimation using a linear formula:
LOC = a*UFC + b. The parameters a and b can be obtained using linear regression and
previously completed project data. The latest Function Point Counting Practices Manual is
maintained by the International Function Point Users Group (IFPUG).
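The UFC computation above, together with the LOC conversion, can be sketched as follows; the regression coefficients a and b are assumed values, not from the paper:

```python
# Sketch: unadjusted function-point count UFC = sum of N_ij * W_ij, using
# the example counts from the text, followed by a LOC = a*UFC + b conversion
# with assumed (not calibrated) regression coefficients.
def ufc(counts):
    """counts: list of (number_of_items, weight) pairs."""
    return sum(n * w for n, w in counts)

# 2 simple inputs (w=3), 2 complex outputs (w=7), 1 complex internal file (w=15)
u = ufc([(2, 3), (2, 7), (1, 15)])
print(u)  # 35

a, b = 60, 200  # illustrative regression parameters
print(a * u + b)  # 2300
```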
An extension of the function point software measurement technique is the feature point
measurement technique. Feature point extends function points to include algorithms as a new
class [16]. An algorithm is defined as the set of rules which must be completely expressed to
solve a significant computational problem. For example, a square root routine can be considered
as an algorithm. Each algorithm used is given a weight ranging from one (elementary) to ten
(advanced) and the feature point is the weighted sum of the algorithms plus the function points.
This measurement is especially useful for systems with few inputs/outputs and high algorithmic
complexity.
Real-time applications development cost estimation is based on full function point (FFP)
analysis. It takes into account the special hardware control aspects of such applications. FFP introduces two
new control data function types and four new control transactional function types, which are
described in [28].
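As a sketch, the feature-point idea described above adds weighted algorithm counts to a function-point count; the weights below are illustrative:

```python
# Sketch of a feature-point count: function points plus per-algorithm
# weights from 1 (elementary) to 10 (advanced). Weights are illustrative.
def feature_points(function_points, algorithm_weights):
    return function_points + sum(algorithm_weights)

# e.g. a UFC of 35 plus a square-root routine (weight 3) and a more
# advanced scheduling algorithm (weight 8)
print(feature_points(35, [3, 8]))  # 46
```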
A final consideration for software project size estimation is the use of object points.
While feature points and FFP extend function-point estimates, object points measure size from a
different dimension. These measurements are based on the number and complexity of the
following objects: screens, reports, and 3GL components. Each of these objects is counted and
given a weight ranging from one (simple screen) to ten (3GL component), and the object-point
count is the weighted sum of these counts.
There are two major types of cost estimation methods: algorithmic and non-algorithmic.
Algorithmic methods vary widely in mathematical sophistication: some are based on simple
arithmetic formulae using summary statistics [8], while others are based on regression models [30]
or differential equations [23]. To improve the accuracy of algorithmic models, there is a need to
adjust or calibrate the model to specific circumstances, but even this added work can still yield
mixed accuracy. Table I in the appendix lists strengths and weaknesses of software cost-
estimation methods. The first part of this comparative discussion will treat non-algorithmic
costing.
Analogy Costing: This method involves reasoning by analogy with one or more completed
projects to relate their actual costs to an estimate of the cost of a similar new project. This
protocol may be used at either the total project level or at the subsystem level. The total project
level has the advantage that all cost components of the system will be considered while the
subsystem level has the advantage of providing a more detailed assessment of the similarities and
differences between the new project and the completed project. Success with this method depends
on identifying genuinely similar past projects and on the accuracy of their recorded cost data.
Expert Judgment: This method involves consulting one or more experts, perhaps with the aid
of an expert-consensus mechanism. Experts provide estimates using their own methods and
experience. The PERT technique can be used to resolve inconsistencies in estimates. The Delphi
technique can also be used to reach a consensus; it proceeds in the following steps:
1. The coordinator presents each expert with a specification and a form to record estimates.
2. Each expert fills in the form individually and is allowed to ask the coordinator questions.
3. The coordinator prepares a summary of all the experts' estimates on a form,
requesting another iteration of the experts' estimates and the rationale for the estimates.
A modification of the Delphi technique proposed by Boehm and Farquhar [4] has proven more
effective. Before the estimation, a group meeting involving the coordinator and the experts is
arranged to discuss the estimation issues. In step three, the experts do not offer any rationale for
their estimates. Instead, after each round of estimation, the coordinator calls a meeting to have
the experts discuss the points on which their estimates varied widely.
Parkinson: Parkinson's principle, "work expands to fill the available volume," is used here to
equate the cost estimate to the available resources [21]. For instance, if the software has to
be delivered in 12 months and five people are available, then the effort is estimated to be 60
person-months. This method is hazardous in that it has the potential to produce unrealistic
estimates and does not promote good software engineering practice.
Price-to-win: Using this method, the cost estimate is equated to the price believed necessary to
win the job or to the schedule believed necessary to be first in the market with a new product.
The estimate is based on the customer's budget instead of the software functionality. For
example, if a reasonable estimate for a project is 100 person-months but the customer can
only afford 60, it is common for the estimator to be asked to modify the estimate
to fit 60 person-months of effort in order to win the project. This is a poor practice, but an
all-too-common one.
Bottom-up: Each component of the software job is separately estimated, and the results are
aggregated to produce an estimate for the overall job. This method requires an initial design
indicating how the system is decomposed into components.
Top-down: An overall cost estimate for the project is derived from global properties of the
software product. The total cost is then split among the various components.
The main conclusions one can draw from Table I are the following:
- None of the alternatives is better than the others in all respects.
- The Parkinson and price-to-win methods are unacceptable and do not produce satisfactory
cost estimates.
Algorithmic methods are based on mathematical models that produce cost estimates as a
function of a number of variables, which are considered to be the major cost factors. Any
algorithmic model has the form Effort = f(x1, x2, ..., xn), where {x1, x2, ..., xn} denotes the set
of cost factors. The existing algorithmic methods differ in two aspects: the selection of cost
factors and the form of the function f, and new methods are still
emerging. Despite the seven approaches to software cost estimation, there is no definitive one,
and one cannot expect a particular technique to compensate for a lack of definition or understanding of
the software job to be done. Until a software specification is fully defined, a software cost
estimation technique represents a range of software development costs. Figure I in the appendix
shows a limitation in cost estimation technology. In the figure, the accuracy of the cost estimates
is shown as a function of the software life-cycle phase. The horizontal line labeled x is a
convergent estimate for the cost of a human-machine interface for a hypothetical software
project. The level of cost uncertainty is shown on the y-axis, and its range is between one-fourth
of and four times the convergent cost. This range is somewhat subjective and is intended to represent 80% confidence
limits, that is, within a factor of four on either side, 80% of the time. At the feasibility phase of
the human-machine interface component, the software engineering estimator does not know what
classes of people (clerks, computer specialists, middle managers, etc.) or what classes of data
(raw or pre-edited, numerical or text, digital or analog) the system will have to support. Until
those uncertainties are clarified, a factor of four in either direction serves as a best-guess for a
range of estimates.
The uncertainty envelope contracts once the feasibility phase is completed and the
operational concept is settled. At this stage, the range of estimates constricts to a factor of two on
either side of the convergent estimate. Outstanding issues include the specific types of query to
be supported; such issues are resolved once the requirements specification has
been developed, at which point the estimate of software costs ranges within a factor of 1.5 in
either direction.
Once the product design specification is completed and validated, design issues such as
the internal data structure of the software product and the specific techniques for handling
network input/output between the client computer and the web server will have been resolved. At
this point the software estimate should be accurate to within a factor of 1.2 of the convergent
estimate. The remaining discrepancies are caused by sources of uncertainty in specific algorithms
to be used for database queries, internet error handling, network failure recovery, and so on.
Those issues will be resolved at the detailed design phase, but there will still be some residual
uncertainty of around ten percent, stemming from how well programmers understand the
specifications to which they are to code, or possibly from personnel turnover during the development
and test phases.
So far a substantial part of this discussion has treated software cost estimation in terms of
software size. There exist, however, many additional cost factors that are worth mentioning.
Table II in the appendix summarizes a set of cost factors proposed by Boehm et al. in the
COCOMO II model for software engineering cost estimation [5]. There are four types of cost
factors. The first set includes product factors: required reliability, product complexity, database
size, required reusability, and documentation matched to life-cycle needs. The second set includes
computer factors: execution time constraints, storage constraints, computer turnaround constraints,
and platform volatility. The third set includes personnel factors: analyst and programmer
capabilities, applications experience, platform experience, language and tool experience, and
personnel continuity. The final set includes project factors: multisite development, use of
software tools, and development schedule. Many of these factors are hard to quantify, and in
many models some are combined and others are omitted. Furthermore, some factors take on discrete
values, resulting in estimation functions with a piecewise form.
Linear models have the form Effort = a0 + sum_{i=1..n} ai*xi, where the coefficients ai are
chosen to best fit the completed project data, as in Nelson's work [19]. Needless to say, software
development is mostly comprised of nonlinear interactions, so this model is less than optimal.
Multiplicative models have the form Effort = a0 * prod_{i=1..n} ai^xi. This model was used by
Walston-Felix [30] with each xi taking one of three possible values: -1, 0, and +1. Such models
are quite restrictive, since each factor can only scale the effort by a fixed ratio.
Power function models have the general form Effort = a * S^b, where S is the code size
and a and b are functions of other cost factors. This class of models contains two popular
algorithmic models.
COCOMO (Constructive Cost Model)
This family of models was proposed by Boehm [3, 4] and has been widely
accepted in practice. In these models, code size S is given in thousands of LOC (KLOC) and effort is
in person-months. The primary motivation for the COCOMO model has been to help people
understand the cost consequences of the decisions they will make in commissioning, developing,
and supporting a software product. It comprises a hierarchy of increasingly detailed
models which range from a single macro-estimation scaling model as a function of product size
to a micro-estimation model with multipliers for each cost driver attribute. COCOMO applies to
three classes of software projects:
1. Organic projects: small teams with good experience working with less-than-rigid
requirements,
2. Semi-detached projects: medium teams with mixed experience working with a mix of rigid
and less-than-rigid requirements, and
3. Embedded projects: projects developed within a set of tight constraints such as hardware,
software, and operational constraints.
A) Basic COCOMO: This model uses three sets of {a, b}, one per project class, depending only on the
complexity of the software. The basic COCOMO model is simple, and it excludes many cost factors.
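A sketch of basic COCOMO follows. The {a, b} pairs are the coefficients commonly published for Boehm's original model; since the paper's own table is not reproduced here, treat them as illustrative:

```python
# Sketch of basic COCOMO, Effort = a * (KLOC)**b, with the commonly
# published coefficient pairs for the three project classes.
COEFFS = {
    "organic": (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}

def basic_cocomo_effort(kloc, mode):
    """Estimated effort in person-months for a project of `kloc` KLOC."""
    a, b = COEFFS[mode]
    return a * kloc ** b

# A 32 KLOC organic project
print(round(basic_cocomo_effort(32, "organic"), 1))
```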
B) Intermediate COCOMO: The nominal effort estimate is obtained using the power function with three
sets of {a, b}, with coefficient a slightly different from the basic model.
Next, fifteen cost factors with values ranging from 0.7 to 1.66, drawn from Table II, are
determined [4]. The overall impact factor M is obtained as the product of all the individual factors,
and the final estimate is obtained by multiplying the nominal estimate by M. While both basic and
intermediate COCOMO estimate software costs at the system level, the detailed COCOMO
works on each sub-system separately and has an obvious advantage for large systems made up of
non-homogeneous subsystems.
The detailed COCOMO estimate is obtained in the following way:
1) A nominal development effort is estimated as a function of the product's size in delivered
source instructions in thousands (KDSI) and the product's development mode, which is
one of organic, semi-detached, or embedded.
2) A set of effort multipliers is determined from the product's ratings on a set of 15 cost driver
attributes.
3) The estimated development effort is obtained by multiplying the nominal effort estimate by all
of the effort multipliers.
4) Additional factors can be used to determine dollar costs, development schedules, phase and
activity distributions, computer costs, annual maintenance costs, and other elements from the
development effort estimate.
C) COCOMO II: In this contemporary revision, the exponent b of the earlier COCOMO
models varies according to scale factors: precedentedness, development flexibility, architecture
and risk resolution, team cohesion, and process maturity. COCOMO II also revises the set of
cost drivers, adding new factors and dropping some of the originals.
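The variable exponent can be sketched as follows; the constants (2.94, 0.91, 0.01) are the published COCOMO II.2000 calibration values, used here only for illustration, and the scale-factor ratings are made up:

```python
# Sketch of the COCOMO II variable exponent: b = 0.91 + 0.01 * sum(SF),
# Effort = 2.94 * KSLOC**b * product(effort multipliers).
# Constants follow the COCOMO II.2000 calibration; ratings are illustrative.
def cocomo2_effort(ksloc, scale_factors, effort_multipliers=()):
    b = 0.91 + 0.01 * sum(scale_factors)
    effort = 2.94 * ksloc ** b
    for em in effort_multipliers:
        effort *= em
    return effort  # person-months

# Five nominal-ish scale-factor ratings for a 100 KSLOC project
print(round(cocomo2_effort(100, [3.72, 3.04, 4.24, 3.29, 4.68]), 1))
```

Note how larger scale-factor sums raise the exponent above 1, capturing the diseconomy of scale that fixed-exponent models miss.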
The Putnam model is based on the Norden/Rayleigh manpower distribution and on Putnam's
findings from analyzing many completed projects [23]. The software equation forms the main part
of the model: S = E * (Effort)^(1/3) * td^(4/3), where td is the software delivery time and E is the
environment factor that reflects the development capability, which can be derived from historical
data using the software equation. The size S is in LOC and the Effort is in person-years. Another
important parameter is the manpower buildup D0 = Effort / td^3, which ranges from 8 for entirely
new software with many interfaces to 27 for rebuilt
software. Combining the above equation with the software equation, we obtain the power
function forms: Effort = (D0^(4/7) * E^(-9/7)) * S^(9/7) and td = (D0^(-1/7) * E^(-3/7)) * S^(3/7).
SLIM is a software tool based on this model for cost estimation and manpower scheduling
(http://www.qsm.com/).
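The Putnam power-function forms can be sketched and checked against the software equation; the values of S, E, and D0 below are illustrative:

```python
# Sketch of the Putnam power-function forms derived from the software
# equation S = E * Effort**(1/3) * td**(4/3). E and D0 are illustrative.
def putnam_effort(size, E, D0):
    """Effort in person-years."""
    return (D0 ** (4 / 7)) * (E ** (-9 / 7)) * size ** (9 / 7)

def putnam_duration(size, E, D0):
    """Delivery time td in years."""
    return (D0 ** (-1 / 7)) * (E ** (-3 / 7)) * size ** (3 / 7)

S, E, D0 = 50_000, 4000, 15
effort = putnam_effort(S, E, D0)
t_d = putnam_duration(S, E, D0)

# Consistency check: plugging back into the software equation recovers S
print(round(E * effort ** (1 / 3) * t_d ** (4 / 3)))  # 50000
```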
4.3.5 Model Calibration Using Linear Regression
A direct application of the above models does not take local circumstances into
consideration. One can adjust the cost factors using local data and a linear regression method.
Let the cost estimation power formula be Effort = a * S^b. Taking the logarithm of both sides
transforms the result into a linear equation: Y = A + b*X, where Y = log(Effort), A = log(a),
and X = log(S). By applying the least squares method to a set of previous project data
{(Yi, Xi) : i = 1, ..., k}, one obtains the parameters b and A for the power function.
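This calibration can be sketched as a least-squares fit in log space; the project history below is synthetic, generated from a known power law so the fit can be checked:

```python
import math

# Sketch: calibrating Effort = a * S**b by least squares in log space,
# Y = A + b*X with Y = log(Effort), X = log(S). Project data is synthetic.
def calibrate_power_model(sizes, efforts):
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = math.exp(my - b * mx)  # A = log(a), so a = exp(A)
    return a, b

# Synthetic history generated from Effort = 3 * S**1.1, to verify the fit
sizes = [5, 10, 20, 40, 80]
efforts = [3 * s ** 1.1 for s in sizes]
a, b = calibrate_power_model(sizes, efforts)
print(round(a, 2), round(b, 2))  # 3.0 1.1
```

With real, noisy project data the recovered parameters would of course only approximate the underlying relationship.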
Discrete models have a tabular form, which usually relates effort, duration, difficulty, and
other cost factors. This class contains models from [2, 3, 31]. These models have gained some
acceptance because they are simple to use.
PRICE-S is a proprietary software cost estimation model developed and maintained by
RCA [20]. It is a macro cost-estimation model developed for embedded system applications,
whose formulation has evolved from subjective complexity factors to equations based on the
number of computers/servers, personnel, and project attributes that modulate complexity. The
program provides a wide range of useful outputs, such as activity distribution analyses and
cost-schedule forecasts.
In the 1980s, PRICE-S added a software life-cycle support cost estimation capability
called PRICE SL, which involved the definition of three categories of support activities.
Growth: The estimator specifies the amount of code to be added to the product. PRICE
SL then uses its standard techniques to estimate the resulting life-cycle-effort distribution.
Enhancement: PRICE SL estimates the fraction of the existing product which will be
modified.
Maintenance: The estimator provides a parameter indicating the quality level of the
developed code. PRICE SL uses this to estimate the effort required to eliminate
remaining errors.
This model is the result of extensive data analysis collected by the Air Force in the 1960s
and 1970s. A number of models of similar form were developed for different application areas.
The effort multipliers are shown in Table V. This model has numerical stability issues because it
exhibits a discontinuity at KDSI = 10 and produces widely varying estimates via the f factors.
For instance, answering "yes" to "first software developed on CPU" adds 92% to the estimated
cost.
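A sketch of this style of model, a nominal power law scaled by multiplicative f factors, follows; the nominal coefficients are assumed, and the 1.92 factor mirrors the 92% example in the text:

```python
# Sketch of a multiplicative f-factor adjustment: a nominal power-law
# estimate scaled by yes/no environment factors. The nominal coefficients
# (a, b) are illustrative; 1.92 mirrors "first software developed on CPU
# adds 92%" from the text.
def adjusted_effort(kdsi, a=5.2, b=0.91, factors=()):
    effort = a * kdsi ** b
    for f in factors:
        effort *= f
    return effort

base = adjusted_effort(20)
with_new_cpu = adjusted_effort(20, factors=(1.92,))
print(round(with_new_cpu / base, 2))  # 1.92
```

This makes the instability concern concrete: a single binary factor can nearly double the estimate regardless of project size.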
One common error measure for software engineering cost estimation is the Mean
Absolute Relative Error (MARE). The formula is
MARE = (1/n) * sum_{i=1..n} |estimate_i - actual_i| / actual_i,
where estimate_i is the estimated effort from the model, actual_i is the actual effort,
and n is the number of projects. To establish whether models are biased, the Mean Relative Error
(MRE) can be used. Its formulation is
MRE = (1/n) * sum_{i=1..n} (estimate_i - actual_i) / actual_i.
A large positive MRE suggests that the model overestimates the effort, while a large negative value
indicates underestimation.
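The two error measures can be sketched directly from their definitions; the sample data is illustrative:

```python
# Sketch of the two error measures: MARE averages absolute relative errors,
# while MRE averages signed relative errors (a positive mean suggests an
# overestimation bias). Sample data below is illustrative.
def mare(estimates, actuals):
    return sum(abs(e - a) / a for e, a in zip(estimates, actuals)) / len(actuals)

def mre(estimates, actuals):
    return sum((e - a) / a for e, a in zip(estimates, actuals)) / len(actuals)

estimates = [120, 80, 150]  # model predictions (person-months)
actuals = [100, 100, 100]   # observed efforts
print(round(mare(estimates, actuals), 3))  # 0.3
print(round(mre(estimates, actuals), 3))   # 0.167
```

Note how the two under/over errors of 20% cancel in the MRE but not in the MARE, which is why both measures are reported together.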
The following criteria can be used for evaluating cost estimation models [4]:
1. Definition: Has the model clearly defined the costs it is estimating and the costs it is
excluding?
2. Fidelity: Are the estimates close to the actual costs expended on the projects?
3. Objectivity: Does the model avoid allocating most of the software cost variance to poorly
calibrated subjective factors such as complexity? Is it hard to adjust the model to obtain any
result you want?
4. Constructiveness: Can a user tell why the model gives the estimates it does? Does it help the
user understand the software job to be done?
5. Detail: Does the model easily accommodate the estimation of a software system consisting of
a number of subsystems and units? Does it give accurate phase and activity breakdowns?
6. Stability: Do small differences in inputs produce small differences in output cost estimates?
7. Scope: Does the model cover the class of software projects whose costs the user needs to
estimate?
8. Ease of Use: Are the model inputs and options easy to understand and specify?
9. Prospectiveness: Does the model avoid the use of information that will not be well known
until the project is complete?
10. Parsimony: Does the model avoid the use of highly redundant factors, or factors which
make no appreciable contribution to the results?
Many studies have attempted to evaluate cost estimation models, and the results are
discouraging in that many cost estimation techniques were found to be inaccurate.
1. Kemerer performed an empirical validation of four algorithmic models (SLIM, COCOMO,
Estimacs, and FPA) [17], where no recalibration of models was performed on the project data,
which was different from that used for model development. Most models showed a strong over-
estimation bias and large estimation errors, with MAREs ranging from 57% to 800%.
2. Vicinanza, Mukhopadhyay, and Prietula used experts to estimate project effort on
Kemerer's data set without formal algorithmic techniques and found that the experts outperformed
the models in the original study [29]. The MARE, however, ranged from 32% to 1107%.
3. Ferens and Gurner evaluated three models (SPANS, Checkpoint, and COSTAR) using 22
projects from Albrecht's database and 14 projects from Kemerer's data set. The estimation errors
were found to be large, with MAREs ranging from 46% (for the Checkpoint model) to 105%.
4. Jeffery and Low investigated the need for model calibration at both the industry and
organization levels [15]. Without model calibration, their estimation errors were large,
with MAREs ranging from 43% to 105%. They later compared the SPQR/20 model to FPA using
data from 64 projects from a single organization [15]. The models were recalibrated to the local
environment to remove estimation biases, which produced some improvement in the estimates.
5. Shepperd and Schofield found that estimating by analogy outperformed estimation based on
regression models [26].
6. Heemstra surveyed 364 organizations and found that only 51 used models to estimate effort,
and that model users made no better estimates than non-model users [12.5].
7. A survey of software development within JPL found that only 7% of estimators use algorithmic
models as their primary method of producing estimates [13].
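The local recalibration investigated by Jeffery and Low (study 4) can be illustrated by fitting a single multiplicative correction to an organization's own project history. The sketch below is a generic illustration of that idea, not their actual procedure, and the project figures are invented.

```python
import math

def recalibrate_constant(model_estimates, local_actuals):
    """Fit one multiplicative correction c that minimizes log-scale error:
    c is the geometric mean of actual/estimate over the local history."""
    ratios = [a / e for e, a in zip(model_estimates, local_actuals)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Hypothetical local history: the uncalibrated model overestimates roughly 2x.
model_estimates = [200.0, 90.0, 400.0]   # person-months predicted by the model
local_actuals   = [100.0, 50.0, 190.0]   # person-months actually expended

c = recalibrate_constant(model_estimates, local_actuals)
calibrated = [c * e for e in model_estimates]  # bias-corrected estimates
```

Using the geometric mean rather than the arithmetic mean keeps the correction symmetric: over- and under-estimation by the same factor cancel out.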
6. New Approaches
In recent years, non-algorithmic techniques based on machine learning have begun to
attract considerable research attention. Recently, Finnie and Wittig applied artificial neural
networks (ANN) and case-based reasoning (CBR) to estimation of effort [10] on a data set from
the Australian Software Metrics Association. ANN was able to estimate development effort
within 25% of the actual effort in more than 75% of the projects, and with a MARE of less than
25%. The results from CBR were less encouraging. In 73% of the cases, the estimates were
within 50% of the actual effort, and for 53% of the cases, the estimates were within 25% of the
actual.
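At its core, CBR estimates by analogy: retrieve the completed projects most similar to the new one and adapt their actual efforts. A minimal nearest-neighbor sketch follows; the feature vectors (function points, team size) and effort figures are invented for illustration.

```python
def estimate_by_analogy(target, history, k=2):
    """Case-based effort estimate: find the k completed projects whose feature
    vectors are closest to the target (Euclidean distance) and average their
    actual efforts."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(history, key=lambda p: dist(p["features"], target))[:k]
    return sum(p["effort_pm"] for p in nearest) / k

# Hypothetical project history: features = (function points, team size).
history = [
    {"features": (120, 5),  "effort_pm": 30.0},
    {"features": (300, 9),  "effort_pm": 95.0},
    {"features": (150, 6),  "effort_pm": 38.0},
    {"features": (500, 12), "effort_pm": 180.0},
]

print(estimate_by_analogy((140, 5), history, k=2))  # 34.0
```

Real CBR systems additionally normalize features and adjust the retrieved efforts for known differences, but the retrieve-and-average step above is the essential mechanism.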
Srinivasan and Fisher used machine learning approaches based on regression trees and
neural networks to estimate costs [27]. The learning approaches were found to be competitive
with SLIM, COCOMO, and function points, compared to the previous study by Kemerer [17]. A
primary advantage of learning systems is that they are adaptable and nonparametric.
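A regression tree partitions the project data on attribute thresholds and predicts the mean effort within each partition. The one-level sketch below shows the core splitting step on invented size/effort data; systems like those Srinivasan and Fisher studied grow deeper trees by applying this step recursively.

```python
def best_split(xs, ys):
    """Pick the threshold on a single feature that minimizes the summed squared
    error of predicting each side's mean: a one-level regression tree, the
    building block of tree-based effort estimators."""
    def sse(v):
        m = sum(v) / len(v)
        return sum((y - m) ** 2 for y in v)
    best = None
    for t in sorted(set(xs))[1:]:
        left  = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        err = sse(left) + sse(right)
        if best is None or err < best[0]:
            best = (err, t, sum(left) / len(left), sum(right) / len(right))
    return best[1:]  # (threshold, left-side mean, right-side mean)

# Hypothetical project sizes (KLOC) and efforts (person-months).
sizes   = [10, 12, 15, 40, 45, 60]
efforts = [24, 30, 33, 110, 120, 170]

threshold, low_pred, high_pred = best_split(sizes, efforts)
# Splits at 40 KLOC: small projects predicted at 29 PM, large at ~133 PM.
```

Because the split point and leaf predictions are learned directly from data, the method is nonparametric: no functional form such as effort = a * size^b is assumed.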
7. Conclusion
As of today, almost no software engineering cost estimation model can predict the cost of
software development with a high degree of accuracy. This state of the practice is created
because:
(1) there are a large number of interrelated factors that influence the software development
process of a given development team, and a large number of project attributes, such as number of
web pages, volatility of system requirements, and the use of reusable software components;
(2) the development environment is evolving continuously; and
(3) there is a lack of measurement that truly reflects the complexity of software systems.
To produce better estimates, estimators must improve their understanding of those project
attributes and their causal relationships, model the impact of the evolving environment, and
develop effective ways of measuring software complexity.
Estimates produced at early stages of development are inevitably inaccurate, as the accuracy
depends highly on the amount of reliable information available to the estimator. As more project
details emerge during analysis and later design stages, uncertainties are reduced and more
accurate estimates can be made. Most models produce a single point estimate without regard to this
uncertainty. They need to be enhanced to produce a range of estimates and their probabilities.
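One simple way to attach a range and probabilities to a point estimate is to resample the relative errors observed on past projects. The sketch below is a generic Monte Carlo illustration with invented error figures, not a method taken from the literature surveyed here.

```python
import random

def estimate_range(point_estimate_pm, past_relative_errors, trials=10000, seed=1):
    """Monte Carlo sketch: perturb a point estimate by relative errors resampled
    from completed projects, then report the 10th and 90th percentiles of the
    resulting distribution instead of a single number."""
    rng = random.Random(seed)
    samples = sorted(point_estimate_pm * (1 + rng.choice(past_relative_errors))
                     for _ in range(trials))
    return samples[trials // 10], samples[(9 * trials) // 10]

# Hypothetical signed relative errors observed on completed projects.
errors = [-0.30, -0.10, 0.0, 0.15, 0.25, 0.60]

low, high = estimate_range(100.0, errors)  # e.g. "probably between low and high PM"
```

Reporting such an interval makes the early-stage uncertainty explicit, and the interval naturally narrows as later-phase estimates are made with better information.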
To improve algorithmic models, there is a great need for the industry to collect project
data on a wider scale. The recent effort of the ISBSG is a step in the right direction [14]. This
standards group has established a repository of over 790 projects, which can serve as a potential
basis for model calibration and comparison.
With new types of applications, new development paradigms, and new development
tools, cost estimators are facing great challenges in applying known estimation models in the 21st
century. Historical data may prove to be irrelevant for future projects. The search for reliable,
accurate, and low cost estimation methods must continue. Several areas are in need of immediate
attention and these include the need for models based on development using formal methods or
those based on iterative software processes. Also, more studies are needed to improve the
accuracy of cost estimates for maintenance projects. Although a good deal of progress has been
made in software cost estimation, a great deal remains to be done.
References
1. A.J. Albrecht and J.E. Gaffney, Software function, source lines of code, and development
effort prediction: a software science validation, IEEE Trans. Software Eng., SE-9, 1983, pp. 639-
648.
2. J.D. Aron, Estimating Resource for Large Programming Systems, NATO Science Committee,
3. R.K.D. Black, R.P. Curnow, R. Katz and M.D. Gray, BCS Software Production Data, Final
4. B.W. Boehm, Software engineering economics, Englewood Cliffs, NJ: Prentice-Hall, 1981
5. B.W. Boehm et al., The COCOMO 2.0 Software Cost Estimation Model, American Programmer.
6. L.C. Briand, K. El Emam, and F. Bomarius, COBRA: A hybrid method for software cost
estimation, benchmarking, and risk assessment, Proceedings of the 20th International Conference
on Software Engineering, 1998.
7. G. Cantone, A. Cimitile and U. De Carlini, A comparison of models for software cost
8. W.S. Donelson, Project planning and control, Datamation, June 1976, pp. 73-80.
9. N. E. Fenton and S. L. Pfleeger, Software Metrics: A Rigorous and Practical Approach, PWS
10. G.R. Finnie, G.E. Wittig, AI tools for software development estimation, Software
Engineering and Education and Practice Conference, IEEE Computer Society Press, pp. 346-
353, 1996.
12. P.G. Hamer and G.D. Frewin, M.H. Halstead's Software Science: a critical examination,
Proceedings of the 6th International Conference on Software Engineering,Sept. 13-16, 1982, pp.
197-206.
12.5. F.J. Heemstra, Software cost estimation, Information and Software Technology vol. 34,
13. J. Hihn and H. Habib-Agahi, Cost Estimation of software intensive projects: a survey of
15. D.R. Jeffery, G. C. Low, A comparison of function point counting techniques, IEEE Trans
16. C. Jones, Applied Software Measurement, Assuring Productivity and Quality, McGraw-Hill,
1997.
17. C.F. Kemerer, An empirical validation of software cost estimation models,
Communications of the ACM, vol. 30, no. 5, May 1987, pp. 416-429.
18. R.D. Luce and H. Raiffa, Games and Decisions. New York: Wiley, 1957.
19. R. Nelson, Management Handbook for the Estimation of Computer Programming Costs,AD-
20. R.E. Park, PRICE S: The calculation of within and why, Proceedings of ISPA Tenth Annual
21. C.N. Parkinson, Parkinson's Law and Other Studies in Administration, Houghton-Mifflin,
Boston, 1957.
22. N.A. Parr, An alternative to the Rayleigh Curve Model for Software development effort,
23. L.H. Putnam, A general empirical solution to the macro software sizing and estimating
24. W. Royce, Software project management: a unified framework, Addison Wesley, 1998
25. V.Y. Shen, S.D. Conte, and H.E. Dunsmore, Software Science revisited: a critical analysis of
the theory and its empirical support, IEEE Transactions on Software Engineering, SE-9, 2, 1983,
pp. 155-165.
26. M. Shepperd and C. Schofield, Estimating software project effort using analogy, IEEE
Trans. Software Eng., 1997.
27. K. Srinivasan and D. Fisher, Machine learning approaches to estimating software
development effort, IEEE Trans. Soft. Eng., vol. 21, no. 2, Feb. 1995, pp. 126-137.
28. D. St-Pierre, et al, Full Function Points: Counting Practice Manual, Technical Report 1997-
29. S. Vicinanza, T. Mukhopadhyay, and M.J. Prietula, Software-effort estimation: an
exploratory study of expert performance, Information Systems Research, vol. 2, no. 4, Dec. 1991.
30. C.E. Walston and C.P. Felix, A method of programming measurement and estimation,
31. R.W. Wolverton, The cost of developing large-scale software, IEEE Trans. Computer, June
Appendix
Figure II: Computer Programming Project Cycle [19]
Figure III: Computer Programming Processing Steps [19]
Figure IV: Early Sample Cost Justification Form [19]
Figure V: Early Sample Project Description Form [19]
Figure VI: Early Software Budget Form [19]
Table I: Strengths and Weaknesses of Software Cost-Estimation Methods [4]

Method             Strengths                                    Weaknesses
Algorithmic model  Objective, repeatable, analyzable formula;   Subjective inputs; assessment of
                   efficient, good for sensitivity analysis;    exceptional circumstances;
                   objectively calibrated to experience         calibrated to past, not future
Expert judgment    Assessment of representativeness,            No better than participants;
                   interactions, exceptional circumstances      biases, incomplete recall
Analogy            Based on representative experience           Representativeness of experience
Parkinson          Correlates with some experience              Reinforces poor practice
Price to win       Often gets the contract                      Generally produces large overruns
Top-down           System-level focus; efficient                Less detailed basis; less stable
Bottom-up          More detailed basis; more stable;            May overlook system-level costs;
                   fosters individual commitment                requires more effort
Table II: Cost factors and their weights in COCOMO II [4]. The cost drivers fall into four
categories: Product, Computer, Personnel, and Project attributes.
Table III: Distinguishing features of COCOMO development modes [4]

Feature                                                            Organic   Semidetached   Embedded
Need for software conformance with pre-established requirements    Basic     Considerable   Full
Need for software conformance with external interface specs        Basic     Considerable   Full
Concurrent development of associated new hardware and
operational procedures                                             Some      Moderate       Extensive
Need for innovative data processing architectures, algorithms      Minimal   Some           Considerable
Table IV [4]
Table V [4]