
Software Improvement through Benchmarking: Case Study Results

Dr. Hans Sassenburg, Dr. Lucian Voinea, Peter Wijnhoven

Abstract

Since the early 1990s, many organizations have invested substantially in software process improvement. Starting in the military industry, the concept of process improvement has by now been widely adopted in many other industry segments. It is one of the few initiatives that has been sustained over time, in contrast to many hypes. Available models and standards help to define improved processes not only on paper, but also to institutionalize them in the daily way of working. However, a justified and often raised question is what the payoff is. Does the quality of products increase? Has efficiency improved? Are products being brought to market faster? And, overall: compared to what? Benchmarking is a technique that uses external comparisons to better evaluate real capability and identify possible actions for the future. As such, it is an important instrument for driving improvement efforts. Using a best practice set of Key Performance Indicators to benchmark capability in several industrial case studies, no strong correlation could be found between real capability and maturity levels. Satisfying models or standards is no guarantee of real performance improvement. We recommend focusing on a multi-dimensional assessment of an organization's capability and deriving improvements from benchmarked results.


Keywords: Benchmarking, software process improvement, Key Performance Indicator, metrics, capability, performance.



1 Introduction

We manage things "by the numbers" in many aspects of our lives. These numbers give us insight and help steer our actions. Software metrics extend the concept of "managing by the numbers" into the realm of software development. Yet the software industry still does not do a very good job of managing by the numbers: intuition prevails where numbers should be used. Most projects are still managed by only three indicators: the scheduled deadline, the overall budget, and the removal of critical defects towards the end of the project. This is a narrow view of a multi-dimensional problem. Compare it with a contestant in a Formula 1 race looking only at the fuel gauge and speedometer: neglecting oil pressure, tyre pressure, fuel-stop planning, weather conditions, and many other variables will definitely cause you to lose the race.

Successful (software) organizations have found six measurement related objectives extremely valuable [Sassenburg 2006]:

Knowing the capability of one's organization through the analysis of historical project data. In addition, one’s own capability may be benchmarked against industry averages.

Making credible commitments in terms of what will be delivered when against what cost. This involves project estimation based on known capability and analyzed requirements.

Investigating ways to optimize project objectives (on dimensions like schedule or cost). This involves developing and comparing different project alternatives.


Managing development once it starts. This involves project management, but entails more than generating simple PERT and Gantt charts.

Deciding when a product can be released. This is a trade-off between an early release to capture the benefits of an earlier market introduction, and the deferral of product release to enhance functionality or improve quality.

Analyzing the impact of new initiatives by assessing how capability is affected in which areas. This prevents organizations from chasing hypes.

Being able to meet these objectives requires an implemented measurement process that converts measured process and product attributes into meaningful management information. Within a project or organization, it is often easy to get people enthused about metrics. But all too often, this enthusiasm does not translate into action, and even when it does, it is unlikely to be sustained: people may get lost in incomplete details. Getting too little or too much data is easy; identifying the relevant data and converting it into meaningful information for everyone is the challenge. Management needs the ability to step back from the details and see the bigger picture. Dashboards with the right information perform that function. They should support answering the questions listed in Table 1.

Table 1: Typical management questions to answer.


Category             Typical questions
Project performance  How predictable is the performance of projects?
Process efficiency   How efficient is the development process?
Product scope        How large and stable is the scope of the planned effort?
Product quality      What is the quality of the resulting outcome/product?

2 Best Practice KPIs

The critical success factor here is defining the appropriate Key Performance Indicators (KPIs) in each category. The goal of these KPIs is to foster greater visibility and faster reaction to opportunities and threats, thereby enabling informed decision-making. Based on research and industrial experience, a coherent set of KPIs has been selected that answers the questions of Table 1. These KPIs represent current best practice in industry [Sassenburg 2009]. Efforts undertaken to improve development capability should have demonstrable effects on each of these KPIs (Table 2).

Table 2: Overview best practice KPI set [Sassenburg 2009].


Category             Typical Key Performance Indicators
Project performance  Schedule
                     Effort
                     Staffing rate (manpower build-up profile [Putnam 1992, 1997])
                     Productivity (LOC/hour or other ratio)
Process efficiency   Core activities (% of total effort)
                     Support activities (% of total effort)
                     Prevention activities (% of total effort)
                     Appraisal/rework activities (% of total effort)
Product scope        Number of features
                     Percentage of deferred features
                     Size (in KLOC or other unit)
                     Re-use level (percentage of size)
Product quality      Complexity (architectural level, source code level)
                     Test coverage (unit, integration, system testing)
                     Defect density (released defects per KLOC or other unit)
                     Cumulative defect removal efficiency


3 Software Benchmarking

To manage processes efficiently, software development organizations must focus on understanding how they perform. Key measures of performance include productivity rate, project time-to-market, and the quality of project deliverables. Assessing the results of the software process is a starting point, but it provides no context by itself: it is not sufficient for a complete understanding of where one stands and of how and where to improve. Benchmarking is comparing one's own performance and operating methods with industry averages or best-in-class examples, with the goal of locating and improving one's own performance [Camp 1989]. As such, it is an important instrument for prioritizing and driving improvement efforts. For many years, the lack of readily available benchmark data prevented software managers from analyzing the real economics of software, and many (process) improvement initiatives resulted in satisfying standards and models instead of tangibly improving measured capability. Through the work of Capers Jones [2008] and others, data on thousands of projects is now available to the software industry. This enables solid business decisions about software development practices and their results in terms of productivity and quality. It allows using economics as the basis of quality analysis [Boehm 2000, Wagner 2007] and balancing cost against productivity and quality.

In a series of assignments conducted by the authors, the presented best practice KPI set was used to measure the performance capability of organizations. This implied the assessment of values for each indicator. Two important conclusions were drawn regarding the availability and quality of the data found [Sassenburg 2009]:

Availability. Many low-maturity organizations believe that they lack quantitative data. In most cases, this is not true: although the data is not centrally stored, many sources of data can normally be identified. The challenge is to identify these data sources and analyze them in order to obtain useful information.

Quality. Higher-maturity organizations often believe that they have access to rich sources of information. In most cases, the contrary is true. Despite many measurements, converting data into useful management information is a weakness for many organizations. In addition, clearly defined measurement constructs and the validation of measurement data before consolidation are exceptions. This leads to problems with respect to consistency, completeness and reliability [Maxwell 2001].

If a software organization does not have sufficient, reliable benchmark data available, it can make use of the published data of Capers Jones [2008] and of the ISBSG (International Software Benchmarking Standards Group).

4 Case Study Results

The best practice KPI set was used in several benchmarking studies. Presented here, as an example, are the case study results of two different organizations; the results are representative of many other studies. Both organizations develop embedded systems for the business-to-business market. Although the systems are not consumer products, volumes are fairly high, varying from hundreds to many thousands of units. One system is considered safety-critical; the other is complicated by regulations in the area of information security, as its information may not be manipulated in any way. Early discussions with both organizations revealed that process improvement had been institutionalized for many years and that CMMI Maturity Level 3 compliance was within reach. In both cases, some common issues had to be dealt with:

So far, no strong benchmarking data has been published for feature deferral ratios, re-use levels, test coverage during different test types, or complexity. Instead, we benchmarked against our own data from previous studies.

A common issue in embedded systems is how size is measured. Although function points are the preferred size measure, the only data available was lines of code. The backfiring technique was used to convert lines of code to function points [Jones 1995]. In both cases the programming language used was C, and the resulting number of function points was close to 1,000.
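As an illustration, the backfiring conversion can be sketched as follows. The ratios below are approximate, illustrative figures in the spirit of Jones' published backfiring tables, not exact values from this study:

```python
# Backfiring: approximate function points from lines of code using a
# language-specific ratio. The values below are illustrative; real
# backfiring tables list calibrated ratios per language and dialect.
BACKFIRE_LOC_PER_FP = {"C": 128, "C++": 53, "Java": 53}

def backfire(loc: int, language: str) -> float:
    """Convert lines of code to an approximate function point count."""
    return loc / BACKFIRE_LOC_PER_FP[language]

# Roughly 128 KLOC of C corresponds to about 1,000 function points,
# the order of magnitude reported for both case studies.
print(round(backfire(128_000, "C")))  # 1000
```

Note that backfired counts inherit all the weaknesses of lines of code as a measure, which is why the ratios should only be used when no counted function points are available.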

In the following paragraphs we highlight remarkable results from the studies that led to further analysis and improvement efforts. Without benchmarking against industry data, these improvement opportunities would most likely have gone unnoticed and unaddressed.

Figure 1: Productivity benchmarking.

Figure 1 shows how both case studies compare to benchmarking data regarding productivity in function points per staff month [Jones 2008]. It is obvious that both cases show a much lower productivity level than the industry average. In a competitive market this is important to notice, analyze and improve. In these cases, the lower productivity was believed to be a consequence of the safety requirements in case study A and the security requirements in case study B.

Further remarkable results were found with respect to process efficiency. Here, a Cost-of-Quality approach is used, based on the work of Juran [1988] and Crosby [1979]. A distinction is made between four categories [Sassenburg 2010]:

Core. Costs in this category are essential and bring direct value to a customer by changing the product in some way: requirements, architecture, coding.

Support. Costs in this category are essential but do not bring direct value to a customer: project management, configuration management, administrative support.

Prevention. These are costs incurred to prevent poor quality (keeping failure and appraisal costs to a minimum): quality planning, process improvement teams, reviews, inspections.

Appraisal, rework. These are costs incurred to determine the degree of conformance to quality requirements: mainly testing and defect detection/removal.

Given these definitions, improving efficiency will normally mean reducing overall costs by reducing appraisal and rework costs, which can be achieved by increasing prevention costs. This Cost-of-Quality approach is relatively easy to implement and use: from a project plan, all scheduled activities can be mapped to the four categories and the ratios calculated. Note that this also enables management to validate the feasibility of a project plan if ratios from the past are known: if any projected ratio deviates substantially from values realized in the past, there should be an assignable cause.
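A minimal sketch of this mapping, with hypothetical activity names and effort figures (the mapping itself is always project-specific):

```python
from collections import defaultdict

# Map scheduled project activities to the four Cost-of-Quality categories.
# Activity names and hours below are hypothetical examples.
ACTIVITY_CATEGORY = {
    "requirements": "core", "architecture": "core", "coding": "core",
    "project management": "support", "configuration management": "support",
    "quality planning": "prevention", "inspections": "prevention",
    "unit testing": "appraisal/rework", "system testing": "appraisal/rework",
    "defect fixing": "appraisal/rework",
}

def coq_ratios(planned_effort: dict) -> dict:
    """Return the percentage of total effort spent in each CoQ category."""
    totals = defaultdict(float)
    for activity, hours in planned_effort.items():
        totals[ACTIVITY_CATEGORY[activity]] += hours
    grand_total = sum(totals.values())
    return {cat: 100 * h / grand_total for cat, h in totals.items()}

plan = {"requirements": 300, "architecture": 200, "coding": 900,
        "project management": 250, "configuration management": 50,
        "quality planning": 60, "inspections": 140,
        "unit testing": 500, "system testing": 800, "defect fixing": 300}
ratios = coq_ratios(plan)
print(round(ratios["appraisal/rework"]))  # 46 (% of total effort)
```

In this hypothetical plan, appraisal/rework consumes about 46% of total effort, close to the roughly 50% found in the case studies; comparing such projected ratios against historical values is the feasibility check described above.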

In Figure 2, the case study results are compared to industry averages.¹ In both cases, a ratio for appraisal/rework of approximately 50% was found, which is very high, not only compared with industry averages but also as an absolute figure. Here too, analysis led to the conclusion that this is a consequence of the safety requirements in case study A and the security requirements in case study B. Even so, much higher ratios for prevention would be expected; this became one of the focus points for improvement activities.

¹ Benchmarking ratios were obtained by mapping published project data [Jones 2008] to the four categories.


Figure 2: Process efficiency benchmarking.


Figure 3: Defect density benchmarking.

In Figure 3, the case study results are compared to benchmarking data on defect density in defects per 1,000 lines of code [Jones 2008]. While one might expect software with safety and security requirements to have better figures than the industry average, the contrary is the case here. Management in both cases responded that after the software was released, many additional tests took place and delivery was to a limited number of users only; in other words, the defect density that finally reached the end user was lower. On the other hand, they acknowledged that post-release maintenance and support costs were extremely high and should be reduced.

Figure 4 shows that the high defect density originates in a defect removal efficiency that is low compared to the industry average [Jones 2008]: too many defects remain undiscovered during development and are detected post-release. In-depth analysis revealed that the primary causes of the low removal efficiency were highly complex architectures and code implementations; as a result, test coverage was very low.


Figure 4: Removal efficiency benchmarking.
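The two quality indicators discussed here, defect density and cumulative defect removal efficiency, reduce to simple ratios. A sketch with hypothetical figures:

```python
def defect_density(defects_released: int, kloc: float) -> float:
    """Released defects per 1,000 lines of code."""
    return defects_released / kloc

def removal_efficiency(found_pre_release: int, found_post_release: int) -> float:
    """Cumulative defect removal efficiency: the share of all known
    defects that were removed before release, as a percentage."""
    total = found_pre_release + found_post_release
    return 100 * found_pre_release / total

# Hypothetical project: 1,600 defects found and removed during development,
# 400 more reported after release of a 128 KLOC product.
print(removal_efficiency(1600, 400))  # 80.0 (percent)
print(defect_density(400, 128))       # 3.125 released defects per KLOC
```

A practical caveat: post-release defect counts keep growing for months after release, so removal efficiency is usually computed against defects reported within a fixed window (Jones uses the first 90 days plus later field reports), which should be stated alongside the figure.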

In both cases, it was very clear to all stakeholders that there were two main weak areas:


The effort distribution revealed a very inefficient process, with a high ratio for appraisal/rework. If post-release efforts for fixing defects were included, the ratio would be substantially higher still.

The architecture and code quality were low in both cases. At the architectural level, high fan-out values indicated low cohesion and tight coupling, resulting in a high level of change propagation. At the code level, high cyclomatic complexity values [McCabe 1976] were found. As a result, problems arise regarding understandability, modifiability and verifiability.

These two areas were considered the primary causes of low overall capability and were used as the basis to define improvements. The availability of quantitative and benchmarked data helped both organizations to derive a solid business case for improvements.



5 Conclusions

Do higher maturity levels automatically lead to increased performance? In the studies performed, no strong correlation could be found between capability, expressed using the sixteen indicators, and maturity levels. On the other hand, process improvement makes sense, as it standardizes development processes, creating transparency with respect to roles, responsibilities, activities and deliverables. However, standardization is no guarantee of real capability improvement. That is why aiming only at, for instance, higher CMMI levels is considered the wrong approach.

The recommendation is to focus on a multi-dimensional assessment of the capability of an organization and to derive improvements from benchmarked results. A first step is baselining the current capability using the best practice KPI set; where measurements are not in place, this identifies the first improvement actions. As a second step, realistic target values must be determined for a given period of time. The gap between target and actual values is the basis for deriving improvement steps. By focusing on the primary causes of low capability, the chances of sub-optimization are reduced or even eliminated. The interesting fact is that improvements can only be achieved by changing the way of working. And of course, a validated improved way of working should be shared: standardization across projects then becomes a logical process.
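The second step, deriving improvement priorities from the gap between target and actual KPI values, can be sketched as follows. All KPI names and numbers are hypothetical; gaps are normalized so that a positive value always means "needs improvement":

```python
# Gap analysis between baselined KPI values and agreed target values.
baseline = {"productivity_fp_pm": 4.0, "appraisal_rework_pct": 50.0,
            "defect_density_per_kloc": 6.0, "removal_efficiency_pct": 80.0}
target   = {"productivity_fp_pm": 8.0, "appraisal_rework_pct": 35.0,
            "defect_density_per_kloc": 3.0, "removal_efficiency_pct": 92.0}

# For these KPIs a higher value is better; for the rest, lower is better.
HIGHER_IS_BETTER = {"productivity_fp_pm", "removal_efficiency_pct"}

def gaps(baseline: dict, target: dict) -> dict:
    """Per-KPI improvement gap; the largest gaps get priority."""
    return {k: (target[k] - baseline[k]) if k in HIGHER_IS_BETTER
               else (baseline[k] - target[k])
            for k in baseline}

for kpi, gap in sorted(gaps(baseline, target).items(), key=lambda kv: -kv[1]):
    print(f"{kpi}: gap {gap:+.1f}")
```

Sorting the gaps makes the prioritization explicit: in this hypothetical example the appraisal/rework ratio and the removal efficiency would be attacked first, mirroring the improvement focus chosen in both case studies.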

This brings us to the conclusion of this paper. Process improvement and standardization should not be goals in themselves. Real capability improvement is achieved by taking a quantitative view of processes and products, and by setting realistic and quantified improvement targets. Using the presented KPI set in a benchmarking study reveals real capability by identifying strengths and weaknesses. This provides the basis for deriving improvements that make sense, while implementing and sustaining such improvements is structured by the use of process maturity models.




6 References

[Boehm 2000]

Boehm, B.W., Sullivan, K.J., "Software Economics: A Roadmap", in Proceedings of the Conference on The Future of Software Engineering (ICSE), 2000.

[Camp 1989]

Camp, R.C., "Benchmarking: The Search for Industry Best Practices that Lead to Superior Performance", Milwaukee, Wisconsin: Quality Press for the American Society for Quality Control, 1989.

[Crosby 1979]

Crosby, P.B., “Quality is Free”, New York: McGraw-Hill, 1979.

[Jones 1995]

Jones, C.J., "Backfiring: Converting Lines of Code to Function Points", IEEE Computer, November 1995.

[Jones 2008]

Jones, C.J., “Applied Software Measurement”, McGraw-Hill, 2008.

[Juran 1988]

Juran, J.M., Gryna, F.M., "Juran's Quality Control Handbook", 4th ed., New York: McGraw-Hill Book Company, 1988.

[Maxwell 2001]

Maxwell, K.D., "Collecting Data for Comparability: Benchmarking Software Development Productivity", IEEE Software, Sep/Oct 2001.

[McCabe 1976]

McCabe, T.J., “A Complexity Measure”, IEEE Transactions on Software Engineering, Vol. 2, pp. 308-320, 1976.

[Putnam 1992]

Putnam, L.H., Myers, W., “Measures for Excellence: Reliable Software On Time Within Budget”, Yourdon Press Computing Series, 1992.

[Putnam 1997]

Putnam, L.H., Myers, W., “Industrial Strength Software: Effective Management Using Measurement”, IEEE Computer Society, 1997.

[Sassenburg 2006]

Sassenburg, H., "Design of a Methodology to Support Software Release Decisions", Doctoral thesis, University of Groningen, 2006.

[Sassenburg 2009]

Sassenburg, H., Voinea, L., "Standardization does not necessarily imply Performance Improvement", Automatisering Gids (in Dutch), September 4, 2009.

[Sassenburg 2010]

Sassenburg, H., Wijnhoven, P., "From Testing to Designing", Automatisering Gids (in Dutch), March 20, 2010.

[Wagner 2007]

Wagner, S., “Using Economics as Basis for Modelling and Evaluating Software Quality”, ICSE, Proceedings of the First International Workshop on The Economics of Software and Computation, 2007.


7 Author CVs

Dr. Hans Sassenburg

Dr. Hans Sassenburg received a Master of Science degree in electrical engineering from the Eindhoven University of Technology (Netherlands) in 1986 and a PhD degree in economics from the University of Groningen (Netherlands) in 2006. He worked as an independent consultant until 1996, when he co-founded a consulting and training firm. This company specialized in software process improvement and software architecture and was sold in 2000. In 2001 he moved to Switzerland, where he founded a new consulting and training firm, SE-CURE AG. In addition, he has been a visiting scientist at the Software Engineering Institute since January 2005. In 2009, he co-founded the Software Benchmarking Organization, which operates through internationally accredited partners. Dr. Sassenburg is an internationally published author on software engineering and economics.






Dr. Lucian Voinea

Dr. Lucian Voinea received a Professional Doctorate in Engineering (PDEng) degree from the Eindhoven University of Technology (Netherlands) in 2003 and a PhD degree in computer science from the same university in 2007. Starting in 1999, he worked as a freelance software engineer for companies in Europe and North America. In 2007 he co-founded SolidSource, a company that provides solutions to support software development and maintenance. In 2009 he co-founded the Software Benchmarking Organization, a consortium whose aim is to create a framework for benchmarking capability in the software development industry. Dr. Voinea is an internationally published author on software engineering and visualization topics.

Peter Wijnhoven

Peter Wijnhoven holds a BSc degree in mechanical engineering and a BSc degree in computer science. He has over 25 years of experience in embedded systems development. During this period he fulfilled a wide variety of functions, ranging from programmer to architect, from team leader to project leader, and from consultant to manager of the consulting group of Sioux Embedded Systems. Mr. Wijnhoven has been active as an SPI and business consultant since 1996, and has managed a group of consultants in the field of development process improvement, architecture, and integration and verification since 2000. During these years he has been actively involved in process improvement projects in multi-disciplinary R&D organizations in Europe, North America, and Asia.