
UNIVERSITY OF ALASKA AT ANCHORAGE

Software Project Cost Estimating

Russell Frith
4/12/2011
Abstract

Software engineering cost estimation is the process of predicting the effort required to

develop a software system. Cost estimation techniques involve distinctive steps, tools, algorithms

and assumptions. Many estimation models have been developed since the 1980s due to the

dynamic nature of software engineering practices. Despite the evolution of new cost estimation

techniques, fundamental economic principles underlie the overall structure of the software

engineering life cycle, and its primary refinements of prototyping, incremental development, and

advancement. This paper provides a general overview of software cost estimation methods,

including recent advances in the field. Many of the models rely on a software project size

estimate as input and this paper provides details for common size metrics. The primary economic

driver of the software life-cycle structure is the significantly increasing cost of making software

changes or fixing software problems as a function of the development phase in which the change

or fix is made. Software engineering models are classified into two major categories: algorithmic

and non-algorithmic. Each has its own strengths and weaknesses with regard to implementing

modifications to software projects. A key factor in selecting a cost estimation model is the

accuracy of its estimates, which can be very problematic.

1. Introduction

In recent years, software has become the most expensive component of computer system

projects. The cost of software development is dominated by human effort, and most estimation methods focus on this aspect and give estimates in terms of person-months. If one

considers economics as the study of how people make decisions in resource-limited situations, then macroeconomics is the study of how people make such decisions on a national or global scale. Macroeconomic decisions are influenced by tax rates, interest rates, foreign policy, and trade policy. Conversely,

microeconomics is the study of how people make decisions in resource-limited situations on a

more personal scale and it treats decisions that individuals and organizations make on such issues

as how much insurance to buy, which software development systems to procure, or what

prices to charge for products and services.

Software engineering is an exercise in microeconomics in that it deals with limited

resources. There is never enough time or money to encompass all the essential features software

vendors would like to put into their products. Even with cheap hardware, storage, memory, and

networks, software projects must always operate within a world of limited computing and

network resources. Subsequently, accurate software cost estimates are critical to both developers

and customers. Those estimates can be used for generating requests for proposals, contract

negotiations, scheduling, monitoring, and control. Underestimating software engineering costs

could result in management approving proposed systems that potentially exceed budget

allocations, or underdeveloped functions with poor quality, or a failure to complete a project on

time. Conversely, overestimating costs may result in too many resources committed to a project,

or, during contract bidding, result in losing a contract and loss of jobs.

Accurate cost estimation is important because:

It can help to classify and prioritize development projects with respect to an overall

business plan,

It can be used to assess the impact of changes and support replanning,

It can be used to determine what resources to commit to the project and how well those

resources will be used,

Projects can be easier to manage and control when resources are better matched to real

needs, and

Customers expect actual development costs to be in line with estimated costs.

Three fundamental estimates typically comprise a software cost estimate and these are effort in

person-months, project duration, and cost. Most cost estimation models attempt to generate an

effort estimate which is then converted into a project duration time-line and cost. The relation between effort and cost may be non-linear; typically, effort is measured in person-months of programmers, analysts, and project managers, and an effort estimate is converted to a dollar cost figure by calculating an average salary per unit time of the staff involved and multiplying that figure by the estimated effort required.

In constructing a software cost engineering estimate, three basic questions arise:

Which software cost estimation model should be used?

Which software size measurement should be used: lines of code (LOC), function points

(FP), or feature points?

What is a good estimate?

A widely used cost estimation method is expert judgment. Using this

technique, project managers rely on experience and prevailing industry norms as a basis to

develop cost estimates. Basing estimates on expert judgment can be somewhat error prone, however:

The approach is not repeatable and the means of deriving an estimate is subjective.

The pool of experienced estimators of new software projects is very small.

In general, the relationship between cost and system size is not linear. Costs tend to

increase exponentially with size, which confines expert judgment estimates to new projects whose anticipated sizes are similar to those of past projects.

Budget alterations by management aimed at avoiding cost overruns make experience and

data from previous projects questionable.

There exist alternatives to expert judgment, some theoretical and not very useful, others of more pragmatic value; they are presented in the software engineering cost estimation section.

In the last four decades, many quantitative software cost estimation models have been developed

and they range from empirical models such as Boehm's COCOMO models [4] to analytical

models such as those in [7, 22, 23]. An empirical model uses data from previous projects to

evaluate the current project and derives the basic formulae from analysis of the particular

database available. Alternative analytical models use formulae based on global assumptions,

such as the rate at which developers solve problems and the number of problems available.

A well-constructed software cost estimate should have the following properties [24]:

It is conceived and supported by the project manager and the development team.

It is accepted by all stakeholders as realizable.


It is based on a well-defined software cost model with a credible basis.

It is based on a database of relevant project experience (similar processes, similar

technologies, similar environments, similar people and similar requirements).

It is defined in enough detail so that its key risk areas are understood and the probability

of success is objectively assessed.

Hindrances to developing a reliable software engineering cost estimate include the following:

Lack of an historical database of cost measurement,

Software development involving many interrelated factors, which affect development effort and productivity, and whose relationships are not well understood,

Lack of trained estimators with the necessary expertise, and

Little penalty is often associated with a poor estimate.

2. Process of Software Engineering Estimation

Throughout the software life cycle, there are many decision situations involving limited

resources in which software engineering techniques provide useful assistance. See Figure II in

the appendix for elements of a computer programming project cycle. To provide a feel for the

nature of these economic decision issues, an example is given below for each of the major phases

in the software life cycle. In addition, refer to Figure III in the appendix for the loopback nature

of computer programming process steps.

Feasibility Phase: How much should one invest in information system analyses (user

questionnaires and interviews, current-system analysis, workload characterizations,

simulations, scenarios, prototypes) in order to obtain convergence on an appropriate

definition and concept of operation for the system to be implemented?

Plans and Requirements Phase: How rigorously should requirements be specified? How

much should be invested in requirements validation activities (automated completeness,

consistency, traceability checks, analytic models, simulations, prototypes) before

proceeding to design and develop a software system?

Product Design Phase: Should developers organize software to make it possible to use a

complex piece of existing software which generally but not completely meets

requirements?

Programming Phase: Given a choice between three data storage and retrieval schemes

which are primarily execution time-efficient, storage-efficient, and easy-to-modify, respectively, which of these should be implemented?

Integration and Test Phase: How much testing and formal verification should be

performed on a product before releasing it to users?

Maintenance Phase: Given an extensive list of suggested product improvements, which

ones should be implemented first?

Phaseout: Given an aging, hard-to-modify software product, should it be replaced with a

new product, should it be restructured, or should it be left alone?

Software cost engineering estimation typically involves a top-down planning approach in

which the cost estimate is used to derive a project plan. Typical steps in a planning process

include:

1. The project manager develops a characterization of the overall functionality, size,

process, environment, people, and quality required for the project.

2. A macro-level estimate of the total effort and schedule is developed using a software cost

estimation model.

3. The project manager partitions the effort estimate into a top-level work breakdown

structure. In addition, the schedule is partitioned into major milestone dates and a staffing

profile is configured.

The actual cost estimation process involves seven steps [4]:

1. establish cost-estimating objectives;

2. generate a project plan for required data and resources;

3. pin down software requirements;

4. work out as much detail about the software system as feasible;

5. use several independent cost estimation techniques to capitalize on their combined

strengths;

6. compare different estimates and iterate the estimation process; and

7. once the project has started, monitor its actual cost and progress, and feed back results to

project management.

Regardless of which estimation model is selected, consumers of the model must pay attention to

the following to get the best results:

Since some models generate effort estimates for the full software life-cycle and others do not include effort for the requirements stage, it is essential to know which phases the estimate covers.

Model calibration and assumptions should be decided beforehand.

Sensitivity analysis of the estimates to different model parameters should be performed.

The microeconomics field provides a number of techniques for dealing with software

life-cycle decision issues such as the ones mentioned early in this section. Standard optimization

techniques can be used when one can find a single quantity such as rupees or dollars to serve as a

universal solvent into which all decision variables can be converted. Or, if nonmonetary

objectives can be expressed as constraints (system availability must be 98%, throughput must be

150 transactions per second), then standard constrained optimization techniques can be used. If

cash flows occur at different times, then present-value techniques can be used to normalize them

to a common point in time.

Inherent in the process of software engineering estimation is the utilization of software

engineering economics analysis techniques. One such technique compares costs and benefits. An example involves provisioning a cell phone service, for which there are two options.

Option A: Accept an available operating system that requires $80K in software costs, but

will achieve a peak performance of 120 transactions per second using five $10K

minicomputer processors, because of high multiprocessor overhead factors.

Option B: Build a new operating system that would be more efficient and would support a higher peak throughput, but would require $180K in software costs. (A rough cost-effectiveness comparison is sketched below.)
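
As a minimal cost-effectiveness sketch of the two options, in Python: the $80K/$180K software costs, the $10K processor price, and the 120 transactions-per-second figure come from the text above, while Option B's peak throughput and processor count are purely illustrative assumptions.

    # Rough figure-of-merit comparison of the two options.
    # Option B's 200 TPS peak and 4-processor configuration are assumptions
    # made for illustration only; they are not given in the text.
    options = {
        "A: buy existing OS": {"software": 80_000, "cpus": 5, "cpu_cost": 10_000, "peak_tps": 120},
        "B: build new OS": {"software": 180_000, "cpus": 4, "cpu_cost": 10_000, "peak_tps": 200},
    }

    for name, o in options.items():
        total = o["software"] + o["cpus"] * o["cpu_cost"]
        per_tps = total / o["peak_tps"]   # dollars per unit of peak throughput
        print(f"{name}: total ${total:,}, ${per_tps:,.0f} per transaction/second")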

In general, software engineering decision problems are even more complex than Options A and B suggest, since candidate options will differ on several important criteria such as robustness, ease of tuning,

ease of change, functional capability, and so on. If these criteria are quantifiable, then some type

of figure of merit can be defined to support a comparative analysis of the preference of one

option over another. If some of the criteria are unquantifiable (user goodwill, programmer

morale, etc.), then some techniques for comparing unquantifiable criteria need to be used.

In software engineering, decision issues are generally complex and involve analyzing

risk, uncertainty, and the value of information. The main economic analysis techniques available

to resolve complex decisions include the following:

1. Techniques for decision making under complete uncertainty, such as the maximax rule,

the maximin rule and the Laplace rule [19]. These techniques are generally inadequate for

practical software engineering decisions.

2. Expected-value techniques, in which one estimates the probability of occurrence of each outcome (e.g., successful or unsuccessful development of a new operating system) and computes the expected payoff of each option: EV = Prob(success)*Payoff(successful OS) + Prob(failure)*Payoff(unsuccessful OS). A numeric sketch is given after this list. These techniques are better than decision

making under complete uncertainty, but they still involve a great deal of risk if the

Prob(failure) is considerably higher than the estimate of it.

3. Techniques in which one reduces uncertainty by buying information. For example,

prototyping is a way of buying information to reduce uncertainty about the likely success

or failure of a multiprocessor operating system; by developing a rapid prototype of its

high-risk elements, one can get a clearer picture of the likelihood of successfully

developing the full operating system.
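
As the numeric sketch referenced in item 2, here is a minimal expected-value calculation in Python; all probabilities and payoffs below are hypothetical values chosen only to illustrate the rule, not figures from the text.

    # Expected value of building a new OS versus accepting an existing one.
    p_success = 0.6                 # assumed probability the new OS succeeds
    payoff_success = 250_000        # assumed payoff if it succeeds
    payoff_failure = -180_000       # assumed loss (sunk cost) if it fails

    ev_build = p_success * payoff_success + (1 - p_success) * payoff_failure
    ev_buy = 50_000                 # assumed certain payoff of the existing OS

    print(f"EV(build) = {ev_build:,.0f}, EV(buy) = {ev_buy:,.0f}")
    # Here EV(build) = 78,000 exceeds EV(buy) = 50,000, so the rule favors building,
    # but the conclusion is sensitive to the estimate of p_success.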

Information-buying often tends to be the most valuable aid for software engineering decisions.

The question of how much information-buying is enough can be answered via statistical decision

theoretic techniques using Bayes' Law, which provides calculations for the expected payoff from

a software project as a function of the level of investment in a prototype. In practice, the use of

Bayes' Law involves the estimation of a number of conditional probabilities which are not easy to estimate accurately. However, the Bayes' Law approach can be translated into a number of

value-of-information guidelines, or conditions under which it makes good sense to decide on

investing in more information before committing to a particular course of action.

Condition 1: There exist attractive alternatives whose payoff varies greatly, depending on

some critical states of nature. If not, engineers can commit themselves to one of the attractive

alternatives with no risk of significant loss.

Condition 2: The critical states of nature have an appreciable probability of occurring. If

not, engineers can again commit without major risk. For situations with extremely high

variations in payoff, the appreciable probability level is lower than in situations with smaller

variations in payoff.

Condition 3: The investigations have a high probability of accurately identifying the

occurrence of the critical states of nature. If not, the investigations will not do much to reduce

the risk of loss incurred by making the wrong decision.

Condition 4: The required cost and schedule of the investigations do not overly curtail their net value. It does one little good to obtain results which cost more than they can save, or which arrive too late to help make a decision.

Condition 5: There exist significant side benefits derived from performing the

investigations. Again, one may be able to justify an investigation solely on the basis of its value

in training, team-building, customer relations, or design validation.

3. Software Engineering Project Sizing

During the 1950s and the 1960s, relatively little progress was made in software cost

estimation, while the frequency and magnitude of software cost overruns was becoming critical

to many large systems employing computers. In 1964, the U.S. Air Force contracted with System

Development Corporation for a landmark project in software cost estimation. The project

collected 104 attributes of 169 software projects and subjected them to extensive statistical analysis.

One result was the 1965 SDC cost model which was the best possible statistical 13-parameter

linear estimation model for the sample data:

MM (man-months) = -33.63 + 9.15(Lack of Requirements) + 10.73(Stability of Design) + 0.51(Percent Math Instructions) + 0.46(Percent Storage/Retrieval Instructions) + 0.40(Number of Subprograms) + 7.28(Programming Language) - 21.45(Business Application) + 13.53(Stand-Alone Program) + 12.35(First Program on Computer) + 58.82(Concurrent Hardware Development) + 30.61(Random Access Device Used) + 29.55(Difference Host, Target Hardware) + 0.54(Number of Personnel Trips) - 25.20(Developed by Military Organization) [5].

When applied to its database of 169 projects, this model produced a mean estimate of 40 MM

and a standard deviation of 62 MM; not a very accurate predictor. The model is also counterintuitive: a project with all zero values for the variables is estimated at -33 MM; changing

language from a higher order language to assembly adds 7 MM, independent of project size. One

can conclude that there were too many nonlinear aspects of software development for a linear

cost-estimation model to work.

Today, software size is the most important factor that affects software cost. There exist

five fundamental software size metrics used in practice. Two of the most commonly used metrics

include the Lines of Code and Function Point metrics. The Lines of Code metric is the

number of lines of delivered source code for the software, known as LOC [9], and is

programming language dependent. Most models relate this measurement to the software cost, but

the exact LOC can only be obtained after the project has completed. Thus, estimating project

costs becomes substantially more difficult.

One method for estimating code size is to use expert judgment together with a technique called PERT (Program Evaluation and Review Technique) [2]. The model is based upon three possible code sizes: S_l, the lowest possible size; S_h, the highest possible size; and S_m, the most likely size. An estimate of the code size S may be computed as

S = (S_l + 4 S_m + S_h) / 6.

This formula is valid for individual modular code components, and the component estimates can be summed to obtain a size value for composite code.
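
A minimal sketch of this PERT size estimate in Python; the component names and size figures below are invented for illustration.

    def pert_size(lowest: float, most_likely: float, highest: float) -> float:
        # PERT (beta-distribution) size estimate: S = (S_l + 4*S_m + S_h) / 6.
        return (lowest + 4 * most_likely + highest) / 6

    # Hypothetical per-component estimates in KLOC; component estimates may be summed.
    components = {"parser": (2.0, 3.0, 6.0), "reporting": (1.0, 2.5, 4.0)}
    total = sum(pert_size(*sizes) for sizes in components.values())
    print(f"Estimated total size: {total:.1f} KLOC")   # (20/6) + (15/6) = 5.8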

An alternative approach proposed by Halstead [11] uses code length and volume metrics. Code length measures source code program length and is defined as N = N_1 + N_2, where N_1 is the total number of operator occurrences and N_2 is the total number of operand occurrences. Volume corresponds to the amount of storage space required and is defined as V = N log_2(n_1 + n_2), where n_1 is the number of distinct operators and n_2 is the number of distinct operands that appear in the program. Critical examinations of Halstead's metrics may be found in [12, 25].
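
A small sketch of these two Halstead measures in Python, assuming the operator and operand counts have already been extracted from a program; the counts below are invented.

    import math

    def halstead_length_and_volume(N1: int, N2: int, n1: int, n2: int):
        # Length N = N1 + N2; volume V = N * log2(n1 + n2).
        N = N1 + N2
        V = N * math.log2(n1 + n2)
        return N, V

    # Hypothetical counts for a small routine.
    N, V = halstead_length_and_volume(N1=120, N2=95, n1=18, n2=27)
    print(f"Length N = {N}, Volume V = {V:.1f} bits")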

As mentioned previously, software size measurement may also be based on function

points. This is a measurement based on the functionality of the program and was introduced by

Albrecht [1]. The total number of function points depends on the counts of distinct logic types in

the following five classes:

1. User-input types: data or control user-input types

2. User-output types: output data types to the user that leave the system

3. Inquiry types: interactive inputs requiring a response

4. Internal file types: files that are used and shared inside the system

5. External file types: files that are passed or shared between the system and other systems.

Each of these types is individually assigned one of three complexity levels of {1 = simple, 2 =

medium, 3 = complex} and given a weighting value that varies from 3 for a simple input to 15

for a complex internal file. The unadjusted function-point count (UFC) is given as

UFC = Σ_{i=1..5} Σ_{j=1..3} N_ij W_ij

where N_ij and W_ij are, respectively, the number and the weight of types of class i and complexity j. For instance, if the raw function-point counts of a project are two simple inputs (W_ij = 3), two complex outputs (W_ij = 7) and one complex internal file (W_ij = 15), then the UFC value is computed as 2*3 + 2*7 + 15 = 35. This initial function-point count is either directly used for cost estimation or is further modified by factors whose values depend on the overall complexity of the project. The adjustment accounts for the degree of distributed processing, the amount of reuse, performance requirements, and so on. The advantage of the function-point

measurement is that it can be obtained based on the system requirement specification in the early

stage of software development.
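
A minimal sketch of the unadjusted function-point count for the worked example above; the weight values follow the text, but a real count would use the full IFPUG weight matrix rather than this abbreviated list.

    # Each entry is (description, count, weight); weights as in the example:
    # simple input = 3, complex output = 7, complex internal file = 15.
    counted_types = [
        ("simple user input", 2, 3),
        ("complex user output", 2, 7),
        ("complex internal file", 1, 15),
    ]

    ufc = sum(count * weight for _, count, weight in counted_types)
    print(f"Unadjusted function points: {ufc}")   # 2*3 + 2*7 + 1*15 = 35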

UFC may also be used for code size estimation using a linear formula:

LOC = a*UFC + b. The parameters a and b can be obtained using linear regression and

previously completed project data. The latest Function Point Counting Practices Manual is

maintained by the IFPUG (International Function Point Users Group) at http://www.ifpug.org/.

An extension of the function point software measurement technique is the feature point

measurement technique. Feature point extends function points to include algorithms as a new

class [16]. An algorithm is defined as the set of rules which must be completely expressed to

solve a significant computational problem. For example, a square root routine can be considered

as an algorithm. Each algorithm used is given a weight ranging from one (elementary) to ten

(advanced), and the feature point count is the weighted sum of the algorithms plus the function points.

This measurement is especially useful for systems with few inputs/outputs and high algorithmic

complexity, such as mathematical software, discrete simulations, and military applications.

Cost estimation for real-time applications development can be based on full function point (FFP) analysis, which takes into account the special hardware-control character of such applications. FFP introduces two new control data function types and four new control transactional function types, which are

described in [28].

A final consideration for software project size estimation is the use of object points.

While feature points and FFP extend function point estimates, object points measure size from a different dimension. These measurements are based on the number and complexity of the

following objects: screens, reports, and 3GL components. Each of these objects is counted and

given a weight ranging from one (simple screen) to ten (3GL component) and object points are

the weighted sum of all these objects.

4. Software Engineering Cost Estimation

There are two major types of cost estimation methods: algorithmic and non-algorithmic.

The former vary widely in mathematical sophistication; some are based on simple arithmetic

formulae using summary statistics [8]. Others are based on regression models [30] and

differential equations [23]. To improve the accuracy of algorithmic models, there is a need to

adjust or calibrate the model to specific circumstances, but even this added work can still lead to

mixed accuracy. Table I in the appendix lists strengths and weaknesses of software cost-

estimation methods. The first part of this comparative discussion will treat non-algorithmic

costing.

4.1 Non-algorithmic Methods

Analogy Costing: This method involves reasoning by analogy with one or more completed

projects to relate their actual costs to an estimate of the cost of a similar new project. This

protocol may be used at either the total project level or at the subsystem level. The total project

level has the advantage that all cost components of the system will be considered while the

subsystem level has the advantage of providing a more detailed assessment of the similarities and

differences between the new project and the completed project. Success factors using this

technique are outlined in [26].

Expert Judgment: This method involves consulting one or more experts, perhaps with the aid

of an expert-consensus mechanism. Experts provide estimates using their own methods and

experience. The PERT technique can be used to resolve inconsistencies in estimates. The Delphi

technique works as follows:

1. The coordinator presents each expert with a specification and a form to record estimates.

2. Each expert fills in the form individually and is allowed to ask the coordinator questions.

3. The coordinator prepares a summary of all estimates from the experts on a form

requesting another iteration of the experts' estimates and the rationale for the estimates.

4. Steps 2-3 may be repeated several times.

A modification of the Delphi technique proposed by Boehm and Farquhar [4] has proven more

effective. Before the estimation, a group meeting involving the coordinator and the experts is

arranged to discuss the estimation issues. In step three the experts do not offer any rationale for

their estimates. Instead, after each round of estimation, the coordinator calls a meeting to have

experts reconcile their estimates.

Parkinson: In the Parkinson principle, work expands to fill the available volume. This principle

is used to equate the cost estimate to available resources [21]. For instance, if the software has to

be delivered in 12 months and five people are available, then the effort is estimated to be 60

person-months. This method is hazardous to use in that it has the potential to provide unrealistic

estimates and it does not promote good software engineering practices.

Price-to-win: Using this method, the cost estimate is equated to the price believed necessary to

win the job or to the schedule believed necessary to be first in the market with a new product.
The estimate is based on the customer's budget instead of the software functionality. For

example, if a reasonable estimation for a project costs 100 person-months but the customer can

only afford 60 person-months, it is common that the estimator is asked to modify the estimation

to fit a 60 person-month effort in order to win the project. A very poor practice indeed, but all too often used.

Bottom-up: Each component of the software job is separately estimated, and the results

aggregated to produce an estimate for the overall job. An initial design must be in place that

indicates how the system is decomposed into different components.

Top-down: An overall cost estimate for the project is derived from global properties of the

software product. The total cost is then split among the various components.

The main conclusions one can draw from Table I are the following:

None of the alternatives is better than the others in all respects.

The Parkinson and price-to-win methods are unacceptable and do not produce

satisfactory cost estimates.

The strengths and weaknesses of the other techniques are complementary.

In practice, a combination of the viable techniques should be employed, their results

compared, and iterations on them performed when they differ.

4.2 Algorithmic Methods

Algorithmic methods are based on mathematical models that produce cost estimates as a

function of a number of variables, which are considered to be the major cost factors. Any

algorithmic model has the form Effort = f(x_1, x_2, ..., x_n), where {x_1, x_2, ..., x_n} denotes the set of cost factors. The existing algorithmic methods differ in two aspects: the selection of cost factors and the form of the function f.

A clearer picture of the fundamental limitations of software cost estimation techniques is

emerging. Despite the seven approaches to software cost estimation, one cannot expect a particular technique to compensate for a lack of definition or understanding of the software job to be done. Until a software specification is fully defined, a software cost estimate can only represent a range of possible software development costs. Figure I in the appendix

shows a limitation in cost estimation technology. In the figure, the accuracy of the cost estimates

is shown as a function of the software life-cycle phase. The horizontal line labeled x is a

convergent estimate for the cost of a human-machine interface for a hypothetical software

project. The level of cost uncertainty is shown on the y-axis and its range is between one-fourth of and four times the

convergent cost. This range is somewhat subjective and is intended to represent 80% confidence

limits, that is, within a factor of four on either side, 80% of the time. At the feasibility phase of

the human-machine interface component, the software engineering estimator does not know what

classes of people (clerks, computer specialists, middle managers, etc.) or what classes of data

(raw or pre-edited, numerical or text, digital or analog) the system will have to support. Until

those uncertainties are clarified, a factor of four in either direction serves as a best-guess for a

range of estimates.

The uncertainty envelope contracts once the feasibility phase is completed and the

operational concept is settled. At this stage, the range of estimates constricts to a factor of two on

either side of the convergent estimate. Outstanding issues include the specific types of query to

be supported or the specific functions to be performed within a potential client-server enterprise


application. Those issues will be resolved by the time a software requirements specification has

been developed, at which point the estimate of software costs falls within a factor of 1.5

either direction of the convergent estimate.

Once the product design specification is completed and validated, design issues such as

the internal data structure of the software product and the specific techniques for handling

network input/output between the client computer and the web server will have been resolved. At

this point the software estimate should be accurate to within a factor of 1.2 of the convergent

estimate. The remaining discrepancies are caused by sources of uncertainty in specific algorithms to be used for database queries, internet error handling, network failure recovery, and so on. Those issues will be resolved at the detailed design phase, but there will still be some residual uncertainty of around ten percent, based on how well programmers understand the

specifications to which they are to code, or possibly, personnel turnover during development and

test phases.

4.3 Algorithmic Model Cost Estimation Formulation

4.3.1 Cost Factors

So far a substantial part of this discussion has treated software cost estimation in terms of

software size. There exist, however, many additional cost factors that are worth mentioning.

Table II in the appendix summarizes a set of cost factors proposed by Boehm et al in the

COCOMO II model for software engineering cost estimation [5]. There are four types of cost

factors. The first set of factors includes product factors and these are placeholders for required

reliability, product complexity, database size, required reusability, and documentation matched to

life-cycle needs. The second set includes computer factors which include execution time

constraints, storage constraints, computer turnaround constraints, and platform volatility.

Personnel factors consist of the capabilities of analysts, application experience, programming

capabilities, platform experience, language and tool experience, and personnel continuity. The

final set of factors includes project factors, the set of which is made up of multisite development,

software tools used, and development schedule. Many of these factors are hard to quantify and in

many models, some are combined, others are omitted. Furthermore, some factors take on discrete

values, which results in the estimation function taking a piecewise form.

4.3.2 Linear Models

Linear models have the form

Effort = a_0 + Σ_{i=1..n} a_i x_i

where the coefficients a_i are chosen to best fit the completed project data, as in Nelson's work [19]. Needless to say, software development involves many nonlinear interactions, so this model is less than optimal.

4.3.3 Multiplicative Models

Multiplicative models have the form

Effort = a_0 * Π_{i=1..n} a_i^{x_i}.

This form was used by Walston-Felix [30], with each x_i taking one of three possible values: -1, 0, and +1. These models have proven to be too restrictive to accommodate a wide range of cost factor values.

4.3.4 Power Function Models

Power function models have the general form Effort = a S^b, where S is the code size

and a and b are functions of other cost factors. This class of modeling contains two popular

algorithmic models.

COCOMO (Constructive Cost Model)

This family of models was proposed by Boehm [3, 4] and they have been widely

accepted in practice. In these models, code-size S is given in thousand LOC (KLOC) and effort is

in person-months. The primary motivation for the COCOMO model has been to help people

understand the cost consequences of the decisions they will make in commissioning, developing,

and supporting a software product. COCOMO is a hierarchy of three increasingly detailed

models which range from a single macro-estimation scaling model as a function of product size

to a micro-estimation model with a three-level breakdown structure and a set of phase-sensitive

multipliers for each cost driver attribute. COCOMO applies to three classes of software projects:

1. Organic projects: small teams with good experience working with less than rigid

requirements,

2. Semi-detached projects: medium teams with mixed experience working with a mix of rigid

and less than rigid requirements, and

3. Embedded projects: developed within a set of tight constraints such as hardware, software,

operational demands, and so on.

A) Basic COCOMO: This model uses three sets of {a, b} depending on the complexity of the

software. Typical parameter values include the following:

(1) for simple, well-understood applications, a = 2.4, b = 1.05;

(2) for more complex systems, a = 3.0, b = 1.15;

(3) for embedded systems, a = 3.6, b= 1.20.

The basic COCOMO model is simple to use, but it excludes many cost factors.
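
As a minimal sketch, the basic COCOMO effort equation Effort = a * (KLOC)^b can be computed directly with the parameter values quoted above; the 40 KLOC project size is an invented example.

    # Basic COCOMO (a, b) parameter pairs for the three project classes, as quoted in the text.
    BASIC_COCOMO = {
        "organic": (2.4, 1.05),
        "semi-detached": (3.0, 1.15),
        "embedded": (3.6, 1.20),
    }

    def basic_cocomo_effort(kloc: float, mode: str) -> float:
        # Effort in person-months for a project of `kloc` thousand lines of code.
        a, b = BASIC_COCOMO[mode]
        return a * kloc ** b

    for mode in BASIC_COCOMO:
        print(f"{mode:>13}: {basic_cocomo_effort(40, mode):6.1f} person-months")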

B) Intermediate COCOMO and Detailed COCOMO: In intermediate COCOMO, a nominal

effort estimation is obtained using the power function with three sets of {a, b}, with coefficient a

being slightly different from that of basic COCOMO:

(1) for simple, well-understood applications, a = 3.2, b = 1.05;

(2) for more complex systems, a = 3.0, b = 1.15;

(3) for embedded systems, a = 2.8, b = 1.2.

Next, fifteen cost factors with values ranging from 0.7 to 1.66 drawn from Table II are

determined [4]. The overall impact factor M is obtained as the product of all individual factors,

and the estimate is obtained by multiplying the nominal estimate by M. While both basic and

intermediate COCOMOs estimate software costs at the system level, the detailed COCOMO

works on each sub-system separately and has an obvious advantage for large systems that

contain non-homogeneous subsystems.

Intermediate COCOMO estimates the cost of a proposed software product in the

following way.

1) A nominal development effort is estimated as a function of the product's size in delivered source instructions in thousands (KDSI) and the product's development mode, which is

described in Table III.

2) A set of effort multipliers is determined from the product's ratings on a set of 15 cost driver

attributes.

3) The estimated development effort is obtained by multiplying the nominal effort estimate by all of the product's effort multipliers.

4) Additional factors can be used to determine dollar costs, development schedules, phase and

activity distributions, computer costs, annual maintenance costs, and other elements from the

development effort estimate.
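
A brief sketch of steps 1-3, assuming the 15 effort multipliers have already been rated; the multiplier values used below are invented placeholders rather than actual Table II ratings.

    from math import prod

    # Intermediate COCOMO nominal-effort coefficients quoted above.
    INTERMEDIATE = {"organic": (3.2, 1.05), "semi-detached": (3.0, 1.15), "embedded": (2.8, 1.20)}

    def intermediate_cocomo(kdsi: float, mode: str, multipliers: list) -> float:
        a, b = INTERMEDIATE[mode]
        nominal = a * kdsi ** b      # step 1: nominal effort from size and development mode
        m = prod(multipliers)        # step 2: overall impact factor M (product of the ratings)
        return nominal * m           # step 3: adjusted effort in person-months

    # Hypothetical ratings: 13 drivers nominal (1.0), one high, one slightly high.
    ratings = [1.0] * 13 + [1.15, 1.10]
    print(f"{intermediate_cocomo(40, 'semi-detached', ratings):.1f} person-months")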

C) COCOMO II: In this contemporary modification, the exponent b in the earlier COCOMO

models changes according to the following cost factors: precedent, development flexibility, risk,

team cohesion, and process maturity. COCOMO II also introduces additional cost driver attributes beyond those of the earlier models.

Putnam's Model and SLIM

The Putnam model is based on the Norden/Rayleigh manpower distribution and on Putnam's findings from analyzing many completed projects [23]. The software equation forms the main part of the model and is

S = E * (Effort)^{1/3} * t_d^{4/3}

where t_d is the software delivery time and E is an environment factor that reflects the development capability, which can be derived from historical data using the software equation. The size S is in LOC and the Effort is in person-years. Another important relation found by Putnam is

Effort = D_0 * t_d^3

where D_0 is a parameter called manpower build-up, which ranges from 8 (entirely new software with many interfaces) to 27 (rebuilt software). Combining this equation with the software equation, we obtain the power function forms

Effort = (D_0^{4/7} * E^{-9/7}) * S^{9/7}   and   t_d = (D_0^{-1/7} * E^{-3/7}) * S^{3/7}.

SLIM is a software

tool based on this model for cost estimation and manpower scheduling (http://www.qsm.com/).
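
A small sketch of the two derived Putnam relations above; the size, environment factor E, and manpower build-up D_0 used here are illustrative values only, since E must in practice be calibrated from historical project data.

    def putnam_effort_and_time(size_loc: float, E: float, D0: float):
        # Effort in person-years and delivery time t_d in years from Putnam's relations.
        effort = (D0 ** (4 / 7)) * (E ** (-9 / 7)) * size_loc ** (9 / 7)
        t_d = (D0 ** (-1 / 7)) * (E ** (-3 / 7)) * size_loc ** (3 / 7)
        return effort, t_d

    # Hypothetical project: 100,000 LOC, environment factor 10,000, moderate build-up of 15.
    effort, t_d = putnam_effort_and_time(100_000, E=10_000, D0=15)
    print(f"Effort = {effort:.0f} person-years, delivery time = {t_d:.1f} years")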

4.3.5 Model Calibration Using Linear Regression

A direct application of the above models does not take local circumstances into

consideration. One can adjust the cost factors using local data and a linear regression method.

Let the cost estimation power formula be Effort = a S^b. Taking the logarithm of both sides and transforming the result into a linear equation, one gets Y = A + bX, where Y = log(Effort), A = log(a) and X = log(S). By applying the least-squares method to a set of previous project data {(X_i, Y_i) : i = 1, ..., k}, one obtains the parameters A and b for the power function.
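
A minimal calibration sketch in Python; the (size, effort) pairs stand in for a local historical project database and are invented for illustration.

    import math

    # Hypothetical completed projects: (size in KLOC, effort in person-months).
    history = [(10, 24), (25, 70), (40, 118), (80, 260), (120, 410)]

    xs = [math.log(s) for s, _ in history]     # X = log(S)
    ys = [math.log(e) for _, e in history]     # Y = log(Effort)

    n = len(history)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    b = num / den                              # least-squares slope
    a = math.exp(mean_y - b * mean_x)          # intercept A = log(a)

    print(f"Calibrated model: Effort = {a:.2f} * S^{b:.2f}")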

4.3.6 Discrete Models

Discrete models have a tabular form which usually relates effort, duration, difficulty, and

other cost factors. This class of models contains models from [2, 3, 31]. The models have gained

recent popularity due to their simplicity.

4.3.7 PRICE-S Model

PRICE-S is a proprietary software cost estimation model developed and maintained by RCA [20]. It is a macro cost-estimation model developed for embedded system applications whose formulation has evolved from subjective complexity factors to equations based on the

number of computers/servers, personnel, and project attributes that modulate complexity. The

program provides a wide range of useful outputs such as activity distribution analysis and cost-

schedule-expected progress forecasts.

In the 1980s, PRICE-S added a software life-cycle support cost estimation capability

called PRICE SL and it involved the definition of three categories of support activities.

Growth: The estimator specifies the amount of code to be added to the product. PRICE

SL then uses its standard techniques to estimate the resulting life-cycle-effort distribution.

Enhancement: PRICE SL estimates the fraction of the existing product which will be

modified.

Maintenance: The estimator provides a parameter indicating the quality level of the

developed code. PRICE SL uses this to estimate the effort required to eliminate

remaining errors.

4.3.8 The Doty Model

This model is the result of extensive analysis of data collected by the Air Force in the 1960s

and 1970s. A number of models of similar form were developed for different application areas.

For a general application,

MM = 5.288 (KDSI)^{1.047},                                  for KDSI ≥ 10,

MM = 2.060 (KDSI)^{1.047} * Π_{j=1..14} f_j,                for KDSI < 10.

The effort multipliers are shown in Table V. This model has numerical stability issues because it

exhibits a discontinuity at KDSI = 10, and produces widely varying estimates via the f factors.

For instance, answering yes to "first software developed on CPU" adds 92% to the estimated

cost.
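
A short sketch of the Doty formula above; the f_j multiplier values are invented stand-ins for the Table V factors, which are not reproduced here.

    from math import prod

    def doty_effort(kdsi: float, f: list) -> float:
        # The 14 multipliers f apply only below the 10-KDSI breakpoint.
        if kdsi >= 10:
            return 5.288 * kdsi ** 1.047
        return 2.060 * kdsi ** 1.047 * prod(f)

    # Hypothetical answers to the 14 yes/no factors (1.0 for "no" on most of them).
    f = [1.0] * 13 + [1.92]    # e.g. "first software developed on CPU" answered yes
    print(f"{doty_effort(8, f):.1f} MM vs {doty_effort(12, f):.1f} MM")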

4.4 Measurement of Model Performance

One common error measure for software engineering cost estimation is the Mean Absolute Relative Error (MARE). The formula is

MARE = (1/n) * Σ_{i=1..n} |estimate_i - actual_i| / actual_i

where estimate_i is the estimated effort from the model for project i, actual_i is the actual effort, and n is the number of projects. To establish whether models are biased, the Mean Relative Error (MRE) can be used. Its formulation is

MRE = (1/n) * Σ_{i=1..n} (estimate_i - actual_i) / actual_i.

A large

positive MRE suggests that the model overestimates the effort, while a large negative value

indicates the reverse.
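
A small sketch computing both measures for a handful of hypothetical (estimate, actual) effort pairs; the numbers are invented.

    # Hypothetical (estimated, actual) effort pairs in person-months.
    pairs = [(120, 100), (60, 80), (200, 150), (45, 50)]

    n = len(pairs)
    mare = sum(abs(est - act) / act for est, act in pairs) / n
    mre = sum((est - act) / act for est, act in pairs) / n

    print(f"MARE = {mare:.0%}, MRE = {mre:+.0%}")   # a positive MRE indicates an overestimation bias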

The following criteria can be used for evaluating cost estimation models [4]:

1. Definition: Has the model clearly defined the costs it is estimating and the costs it is excluding?

2. Fidelity: Are the estimates close to the actual costs expended on the projects?

3. Objectivity: Does the model avoid allocating most of the software cost variance to poorly calibrated subjective factors such as complexity? Is it hard to adjust the model to obtain any result the user wants?

4. Constructiveness: Can a user tell why the model gives the estimates it does? Does it help the user understand the software job to be done?

5. Detail: Does the model easily accommodate the estimation of a software system consisting of a number of subsystems and units? Does it give accurate phase and activity breakdowns?

6. Stability: Do small differences in inputs produce small differences in output cost estimates?

7. Scope: Does the model cover the class of software projects whose costs the user needs to estimate?

8. Ease of Use: Are the model inputs and options easy to understand and specify?

9. Prospectiveness: Does the model avoid the use of information that will not be well known until the project is complete?

10. Parsimony: Does the model avoid the use of highly redundant factors, or factors which make no appreciable contribution to the results?

5. Performance of Estimation Models

Many studies have attempted to evaluate cost estimation models and the results are

discouraging in that many cost estimation techniques were found to be inaccurate. Some studies

found in a literature search include:

1. Kemerer performed an empirical validation of four algorithmic models (SLIM, COCOMO,

Estimacs, and FPA) [17], where no recalibration of the models was performed on the project data,

which was different from that used for model development. Most models showed a strong over-

estimation bias and large estimation errors, ranging from a MARE of 57% to 800%.

2. Vicinanza, Mukhopadhyay, and Prietula used experts to estimate project effort using

Kemerer's data set without formal algorithmic techniques and found that the results outperformed the

models in the original study [29]. The MARE, however, ranged from 32% to 1107%.

3. Ferens and Gurner evaluated three models (SPANS, Checkpoint, and COSTAR) using 22

projects from Albrecht's database and 14 projects from Kemerer's data set. The estimation errors

were found to be large, with MAREs ranging from 46% for the Checkpoint model to 105% for

the COSTAR model.

4. Jeffery and Low investigated the need for model calibration at both the industry and

organization levels [15]. Without model calibration, their estimation error findings were large,

with MAREs ranging from 43% to 105%. They later compared the SPQR/20 model to FPA using

data from 64 projects from a single organization [15]. Their models were recalibrated to the local

environment to remove estimation biases. The recalibrated estimates improved, with MARE observations of around 12%, reflecting the benefits of model calibration.

5. Shepperd and Schofield found that estimating by analogy outperformed estimation based on

statistically derived algorithms [26].

6. Heemstra surveyed 364 organizations and found that only 51 used models to estimate effort

and that model users made no better estimates than non-model users [12.5]. Also, use of

estimation models was no better than expert judgment [12.5].

7. A survey of software development within JPL found that only 7% of estimators use algorithmic models as their primary approach to estimation [13].

6. New Approaches

Software engineering cost estimation remains a complex problem and it continues to

attract considerable research and attention. Recently, Finnie and Wittig applied artificial neural

networks (ANN) and case-based reasoning (CBR) to estimation of effort [10] on a data set from

the Australian Software Metrics Association. ANN was able to estimate development effort

within 25% of the actual effort in more than 75% of the projects, and with a MARE of less than

25%. The results from CBR were less encouraging. In 73% of the cases, the estimates were

within 50% of the actual effort, and for 53% of the cases, the estimates were within 25% of the

actual.

Srinivasan and Fisher used machine learning approaches based on regression trees and

neural networks to estimate costs [27]. The learning approaches were found to be competitive

with SLIM, COCOMO, and function points, compared to the previous study by Kemerer [17]. A

primary advantage of learning systems is that they are adaptable and nonparametric.

7. Conclusion

As of today, almost no software engineering cost estimation model can predict the cost of

software development with a high degree of accuracy. This state of the practice is created

because of the following reasons:

(1) there are a large number of interrelated factors that influence the software development

process of a given development team and a large number of project attributes, such as number of

web pages, volatility of system requirements, and the use of reusable software components,

(2) software engineering development environments are evolving continuously, and

(3) there is a lack of measurement that truly reflects the complexity of software systems.

To produce better estimates, estimators must improve their understandings of those project

attributes and their causal relationships, model the impact of the evolving environment, and

develop effective ways of measuring software complexity.


At the initial stage of a project, there is high uncertainty about many project attributes.

Estimates produced at early stages of development are inevitably inaccurate, as the accuracy

depends highly on the amount of reliable information available to the estimator. As more project

details emerge during analysis and later design stages, uncertainties are reduced and more

accurate estimates can be made. Most models produce single point estimates without regard to this

uncertainty. They need to be enhanced to produce a range of estimates and their probabilities.

To improve algorithmic models, there is a great need for the industry to collect project

data on a wider scale. The recent effort of ISBSG is a step in the right direction [14]. This

standards group has established a repository of over 790 projects, which can serve as a potential

source for builders of cost estimation models.

With new types of applications, new development paradigms, and new development

tools, cost estimators are facing great challenges in applying known estimation models in the 21st

century. Historical data may prove to be irrelevant for future projects. The search for reliable,

accurate, and low cost estimation methods must continue. Several areas are in need of immediate

attention and these include the need for models based on development using formal methods or

those based on iterative software processes. Also, more studies are needed to improve the

accuracy of cost estimates for maintenance projects. Although a good deal of progress has been

made in software cost estimation, a great deal remains to be done. Outstanding issues needing

further research include:

1. Software size estimation;

2. Software size and complexity metrics;

3. Software cost driver attributes and their effects;


4. Software cost model analysis and their refinements;

5. Quantitative models of software project dynamics;

6. Quantitative models of software life-cycle evolution;

7. Software data collection.

References

1. A.J. Albrecht and J.E. Gaffney, Software function, source lines of code, and development

effort prediction: a software science validation, IEEE Trans Software Eng. SE-9, 1983, pp. 639-

648.

2. J.D. Aron, Estimating Resource for Large Programming Systems, NATO Science Committee,

Rome, Italy, October 1969

3. R.K.D. Black, R.P. Curnow, R. Katz and M.D. Gray, BCS Software Production Data, Final

Technical Report, RADC-TR-77-116, Boeing Computer Services, Inc., March 1977.

4. B.W. Boehm, Software engineering economics, Englewood Cliffs, NJ: Prentice-Hall, 1981

5. B.W. Boehm et al., The COCOMO 2.0 Software Cost Estimation Model, American

Programmer, July 1996, pp.2-17.

6. L.C. Briand, K. El Emam, F. Bomarius, COBRA: A hybrid method for software cost

estimation, benchmarking, and risk assessment, International conference on software

engineering, 1998, pp. 390-399.

7. G. Cantone, A. Cimitile and U. De Carlini, A comparison of models for software cost

estimation and management of software projects, in Computer Systems: Performance and

Simulation, Elsevier Science Publishers B.V., 1986

8. W.S. Donelson, Project planning and control, Datamation, June 1976, pp. 73-80.

9. N. E. Fenton and S. L. Pfleeger, Software Metrics: A Rigorous and Practical Approach, PWS

Publishing Company, 1997

10. G.R. Finnie, G.E. Wittig, AI tools for software development estimation, Software

Engineering and Education and Practice Conference, IEEE Computer Society Press, pp. 346-

353, 1996.

11. M. H. Halstead, Elements of software science, Elsevier, New York, 1977

12. P.G. Hamer, G.D. Frewin, M.H. Halstead's Software Science: a critical examination,

Proceedings of the 6th International Conference on Software Engineering,Sept. 13-16, 1982, pp.

197-206.

12.5. F.J. Heemstra, Software cost estimation, Information and Software Technology vol. 34,

no. 10, 1992, pp. 627-639.

13. J. Hihn and H. Habib-Agahi, Cost Estimation of software intensive projects: a survey of

current practices, International Conference on Software Engineering, 1991, pp. 276-287.

14. ISBSG, International software benchmarking standards group, http://www.isbsg.org/au.

15. D.R. Jeffery, G. C. Low, A comparison of function point counting techniques, IEEE Trans

on Soft. Eng., vol. 19, no. 5, 1993, pp. 529-532.

16. C. Jones, Applied Software Measurement, Assuring Productivity and Quality, McGraw-Hill,

1997.

17. C.F. Kemerer, An empirical validation of software cost estimation models,

Communications of the ACM, vol. 30, no. 5, May 1987, pp. 416-429

18. R.D. Luce and H. Raiffa, Games and Decisions. New York: Wiley, 1957.

19. R. Nelson, Management Handbook for the Estimation of Computer Programming Costs, AD-A648750, Systems Development Corp., 1966.

20. R.E. Park, PRICE S: The calculation of within and why, Proceedings of ISPA Tenth Annual

Conference, Brighton, England, July 1988.

21. G.N. Parkinson, Parkinson's Law and Other Studies in Administration, Houghton Mifflin,

Boston, 1957.

22. N. A. Parr, An alternative to the Rayleigh curve model for software development effort, IEEE Trans. Software Eng., May 1980.

23. L.H. Putnam, A general empirical solution to the macro software sizing and estimating

problem, IEEE Trans. Soft. Eng. May 1980

24. W. Royce, Software project management: a unified framework, Addison Wesley, 1998

25. V. Y. Shen, S. D. Conte, H. E. Dunsmore, Software Science revisited: a critical analysis of

the theory and its empirical support, IEEE Transactions on Software Engineering, 9,2,1983, pp.

155-165.

26. M. Shepperd and C. Schofield, Estimating software project effort using analogy, IEEE

Trans. Soft. Eng. SE-23:12, 1997, pp. 736-743.

27. K. Srinivasan and D. Fisher, Machine learning approaches to estimating software

development effort, IEEE Trans. Soft. Eng., vol. 21, no. 2, Feb. 1995, pp. 126-137.

28. D. St-Pierre, et al, Full Function Points: Counting Practice Manual, Technical Report 1997-

04, University of Quebec at Montreal, 1997

29. S. S. Vicinanza, T. Mukhopadhyay, and M. J. Prietula, Software-effort estimation: an

exploratory study of expert performance, Information Systems Research, vol. 2, no. 4, Dec.

1991, pp. 243-262.

30. C.E. Walston and C.P. Felix, A method of programming measurement and estimation,

IBM Systems Journal, vol. 16, no. 1, 1977, pp. 54-73.

31. R.W. Wolverton, The cost of developing large-scale software, IEEE Trans. Computer, June

1974, pp. 615-636.

Appendix

Figure I: Software cost estimation accuracy versus phase [4]

Figure II: Computer Programming Project Cycle [19]

Figure III: Computer Programming Processing Steps [19]

Figure IV: Early Sample Cost Justification Form [19]

Figure V: Early Sample Project Description Form [19]

Figure VI: Early Software Budget Form [19]
Table I: Strengths and Weaknesses of Software Cost-Estimation Methods [4]

Method: Algorithmic Model
Strengths: Objective, repeatable, analyzable formula; efficient, good for sensitivity analysis; objectively calibrated to experience.
Weaknesses: Subjective inputs; assessment of exceptional circumstances; calibrated to past, not future.

Method: Expert Judgment
Strengths: Assessment of representativeness, interactions, exceptional circumstances.
Weaknesses: No better than participants; biases, incomplete recall.

Method: Analogy
Strengths: Based on representative experience.
Weaknesses: Representativeness of experience.

Method: Parkinson
Strengths: Correlates with some experience.
Weaknesses: Reinforces poor practice; generally produces large overruns.

Method: Top-down
Strengths: System level focus; efficient.
Weaknesses: Less detailed basis; less stable.

Method: Bottom-up
Strengths: More detailed basis; more stable; fosters individual commitment.
Weaknesses: May overlook system level costs; requires more effort.
Table II: Cost factors and their weights in COCOMO II [4]

Cost Factor   Description                      Very Low   Low    Nominal   High   Very High

Product
RELY          Required software reliability      0.75      0.88    1.00     1.15    1.40
DATA          Database size                       -        0.94    1.00     1.08    1.16
CPLX          Product complexity                 0.70      0.85    1.00     1.15    1.30

Computer
TIME          Execution time constraint           -         -      1.00     1.11    1.30
STOR          Main storage constraint             -         -      1.00     1.06    1.21
VIRT          Virtual machine volatility          -        0.87    1.00     1.15    1.30
TURN          Computer turnaround time            -        0.87    1.00     1.07    1.15

Personnel
ACAP          Analyst capability                 1.46      1.19    1.00     0.86    0.71
AEXP          Application experience             1.29      1.13    1.00     0.91    0.82
PCAP          Programmer capability              1.42      1.17    1.00     0.86    0.70
VEXP          Virtual machine experience         1.21      1.10    1.00     0.90     -
LEXP          Language experience                1.14      1.07    1.00     0.95     -

Project
MODP          Modern programming practices       1.24      1.10    1.00     0.91    0.82
TOOL          Software tools                     1.24      1.10    1.00     0.91    0.82
SCED          Development schedule               1.23      1.08    1.00     1.04    1.10

Table III: COCOMO Software Development Modes [4]

Feature                                                                 Organic      Semi-detached   Embedded

Organization understanding of product objectives                       Thorough     Considerable    General
Experience in working with related software systems                    Extensive    Considerable    Moderate
Need for software conformance with pre-established requirements        Basic        Considerable    Full
Need for software conformance with external interface specifications   Basic        Considerable    Full
Concurrent development of associated new hardware and operational procedures   Some   Moderate      Extensive
Need for innovative data processing architectures, algorithms          Minimal      Some            Considerable
Premium on early completion                                             Low          Medium          High
Product size range                                                      <50 KDSI     <300 KDSI       All sizes
Examples

Table IV [4]

Table V [4]

