
LENGTH OF SOURCE CODE

BY PANKAJ KAMTHAN

1. INTRODUCTION
Not everything that counts can be counted, and not everything that can be counted counts.

Albert Einstein

This document discusses the issue of the length of source code from the perspective of theory and practice of software measurement and, in doing so, highlights some of the challenges. There are a number of products created during development. There are internal product attributes for each of these products, one of which is size. The knowledge of the size of the following products can be useful:

- Early Conceptual Models
- Specification
- Design
- Source Code

There are a number of aspects of size, one of which is its physical size, namely length. In this document, the focus is on the size, specifically the length, of the source code.

2. CONVENTION

In the rest of the document, [SLOC] represents an arbitrary metric for the source lines of code (SLOC). In this sense, [SLOC] is an abstraction. The terms KLOC and KSLOC are also commonly used in relation to SLOC.

The [SLOC] is arguably the oldest, most commonly used, and one of the most widely cited size metrics in the industry. For example, it has been suggested [NASA, 1995, Page 45] that "[...] use lines of code to represent size." However, with the passage of time, [SLOC] has also become one of the most controversial metrics [Galorath, Evans, 2006, Chapter 5], and there are a number of variations of it.

3. UNDERSTANDING SLOC

It is important to develop an understanding of [SLOC] prior to any action, such as measurement, on it. A conceptual model contributes towards creating such an understanding.

Definition [Model]. A model is a simplification, with respect to some goal, of a thing.

To be able to devise a metric for length, there needs to be (1) a conceptual model of a computer program (and therefore of its source code), and (2) a conceptual model of length.

Figure 1 presents a conceptual model for the source code of a program. It illustrates a number of elements and their interrelationships. The presence of a loop on an element means that the element is related to itself. It is possible to consider n-ary relationships, multiple relationships, and inverse relationships. However, for the sake of simplicity, the relationships are limited to binary, single, unidirectional relationships.

Figure 1. A conceptual model for the source code of a program. [Elements shown: Program, Source Code, Text, Line; Source Code, Size, Length, [SLOC].]

There are a number of ways to represent source code. For example, the L in [SLOC] is based on the presumption that the source code is being represented in text.

Remark. It is important to make a distinction between a program and its source code, and the source code and its representation type.

4. HISTORY OF SLOC

The term SLOC has its origins in line-oriented programming languages including, but not limited to, PL/I, FORTRAN, and assembly languages. Figure 2 illustrates a punch card with a single physical line of source code.

Figure 2. A punch card (based on an emulator¹) with a PL/I statement.

5. EVOLUTION OF PROGRAMMING AND THE LENGTH OF SOURCE CODE

The model of source code length can be influenced by the approach to programming and the nature of the programming language deployed. There have been changes over the years due to the evolution in programming approaches and programming language paradigms.

For example, initially and up until the early 1990s, the source code produced during programming was textual only. The advent of third and higher generation programming languages, especially visual programming languages, has changed this. For example, there are languages that enable a programmer to produce the structure and behavior of user interfaces with little or no source text. In such cases, the concept of a line may not even apply, and length may have a meaning different from its conventional sense.

The advent of object-oriented programming languages, especially those that are class-based, has also provided an alternative to the notion of a line, namely that of a class.

¹ URL: http://www.kloth.net/services/cardpunch.php .
² URL: http://oreilly.com/news/languageposter_0504.html .

6. VIEWPOINTS

There can be a number of different (but not necessarily orthogonal) viewpoints of source code length.

[Figure: elements Source Code Length, View and Approach, Development Viewpoint, Delivery Viewpoint, Usage Viewpoint; relationships labeled CorrespondsTo.]

Indeed, each viewpoint is intended for a specific purpose, and thus has (1) its own view of the source code length and (2) its own approach for counting the lines of source code.

6.1. DEVELOPMENT VIEWPOINT

The definition of source code length adopted can be influenced by the ways in which its corresponding program is developed and run. It can be acknowledged that source code includes relatively more header statements and data declarations, and relatively less code that actually executes.

For example, for certain purposes, such as testing, it is important to know the number of Executable Statements (ES). This metric considers separate statements on the same physical line as distinct, each of which contributes towards the count. It ignores comment lines, header statements, and data declarations.

6.2. DELIVERY VIEWPOINT

The definition of source code length adopted can be influenced by what really matters to the recipients at the end of the software development process, that is, what is eventually delivered to the client.

The amount of source code that is delivered can be significantly different from the amount of source code that is actually developed. For example, for development and testing, drivers, stubs, prototypes, and scaffolding code may be written or generated by the programming team. However, these may be discarded or ignored at the time the final version is tested and subsequently turned over to the client. Therefore, there is a need to distinguish the amount of delivered source code from the amount of developed source code.

The number of Delivered Source Instructions (DSI) encapsulates this aspect of length. This metric considers separate statements on the same physical line as distinct, each of which contributes towards the count. It ignores comment lines. DSI is different from ES in the sense that it includes header statements and data declarations.

6.3. USAGE VIEWPOINT

The definition of source code length adopted can be influenced by the ways in which the measure of length is expected to be used.

For example, an organization may use length for inter-project conclusions, such as (1) comparing projects to determine average project size, or (2) observing the trends in project size over time.

Similarly, an organization may use length for intra-project conclusions, such as (1) comparing units to determine average unit size, or (2) exploring the relationship between unit length and the number of faults (for example, whether the length influences the number of faults).

7. MODELS OF SLOC

There are two common models of SLOC, namely the physical SLOC and the logical SLOC.

7.1. PHYSICAL SLOC
The most brilliant decision in all of Unix was the choice of a single character for the newline sequence.

Mike O'Dell

The model of source code length can be influenced by what physically (spatially) exists upon development.

DEFINITION OF PHYSICAL SLOC

The following definition is one of the earliest definitions of the physical SLOC. It specifically includes all lines containing program headers, declarations, and executable and non-executable statements.

Definition 1 [Source Line of Code (SLOC)] [Conte, Dunsmore, Shen, 1986]. A source line of code is any line of program text that is not a comment or blank line, regardless of the number of statements or fragments of statements on the line.

The following definition resulted from putting measurement programs into practice at Hewlett-Packard.

Definition 2 [Source Line of Code (SLOC)] [Grady, Caswell, 1987]. A line of code is a non-commented source statement: any statement in the source code except for blank lines or comment lines.

The following definition resulted from a standardization effort.

Definition 3 [Source Line of Code (SLOC)] [Park, 1992]. [A single] physical SLOC [corresponds] to one line starting with the first character and ending by a carriage return or an end-of-file marker of the same line, and which excludes the blank and comment line.

EXAMPLES

Example 1.

    sum = a + b + c +
          d + e + f +
          g + h + i;

Example 2.

    /* The following has a semantic error. */
    if (x < 0) {
        printf("x is a positive number");
    }

In each of these cases, the physical SLOC = 3: each example occupies three lines that are neither blank lines nor comment lines, and the comment line in Example 2 is not counted.
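The counting rule shared by Definitions 1 to 3 can be illustrated with a minimal sketch in Python. It is not the algorithm of any tool named in this document; the function name physical_sloc is hypothetical, only C-style comments (/* ... */ and //) are recognized, and comment markers inside string literals are not handled. The sketch also assumes that each example above is laid out over three physical lines, as the stated count implies.

```python
def physical_sloc(source: str) -> int:
    """Count lines that are neither blank nor comment-only.

    Assumption: C-style comments only; comment markers that occur
    inside string literals are (incorrectly) treated as comments.
    """
    count = 0
    in_block = False  # True while inside an unterminated /* ... */ comment
    for line in source.splitlines():
        code_chars = []  # characters of this line that lie outside comments
        i = 0
        while i < len(line):
            if in_block:
                if line.startswith("*/", i):
                    in_block = False
                    i += 2
                else:
                    i += 1
            elif line.startswith("/*", i):
                in_block = True
                i += 2
            elif line.startswith("//", i):
                break  # the rest of the line is a comment
            else:
                code_chars.append(line[i])
                i += 1
        # The line counts only if something other than white space remains.
        if "".join(code_chars).strip():
            count += 1
    return count


example_1 = "sum = a + b + c +\n      d + e + f +\n      g + h + i;\n"
example_2 = (
    "/* The following has a semantic error. */\n"
    "if (x < 0) {\n"
    '    printf("x is a positive number");\n'
    "}\n"
)

print(physical_sloc(example_1))  # 3
print(physical_sloc(example_2))  # 3
```

Writing Example 1 on a single line would change its count to 1, which is the dependence on style conventions discussed next.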

ADVANTAGES AND LIMITATIONS OF PHYSICAL SLOC

A physical SLOC count based on these definitions does not take into account syntactic and other variations across different programming languages. Therefore, it can be viewed as language-independent. However, a physical SLOC count is dependent on the style conventions of the statements that are being counted.

TOOLS FOR CALCULATING PHYSICAL SLOC

There are a number of tools for calculating the physical SLOC, including SLOCCount³, LocMetrics⁴, and CLOC⁵. These tools vary in a number of ways (commercial or noncommercial, textual or graphical interface, and so on).

There are SLOCCount implementations for different operating systems and for different programming languages. For example, there are SLOCCount implementations on Linux (Debian and Ubuntu). SLOCCount has also inspired the development of LocMetrics and CLOC.

7.2. LOGICAL SLOC

The definition of source code length can be influenced by what in the source code actually does something when its corresponding program is executed. The logical SLOC is given by the number of statements in a program.

ADVANTAGES AND LIMITATIONS OF LOGICAL SLOC

The purpose of counting SLOC for a program is usually related to its logic and, in this sense, logical SLOC has an advantage over physical SLOC.

A logical SLOC count is independent of the style conventions of the statements that are being counted. Therefore, logical SLOC can provide an accurate count in cases such as multiple logical statements residing on a single line, or a single logical statement spanning multiple lines.
³ URL: http://www.dwheeler.com/sloccount/ .
⁴ URL: http://www.locmetrics.com/ .
⁵ URL: http://cloc.sourceforge.net/ .

The notion of a statement varies across programming languages. This variation makes logical SLOC language-dependent. For example, a logical SLOC measure for C (and other programming languages that have been inspired by it) is the number of statement-terminating semicolons (;).

EXAMPLES

Example 1.

    sum = a + b + c + d + e + f + g + h + i;

In this case, the logical SLOC = 1.

Example 2.

    /* The following has a semantic error. */
    if (x < 0) { printf("x is a positive number"); }

In this case, depending on the interpretation, the logical SLOC = 1 (based on the number of ;) or the logical SLOC = 2 (based on the presence of the if statement and the printf statement).

Remark. The counting of the logical SLOC is ambiguous.

7.3. LOGICAL SLOC: REPRISE

The counting of the logical SLOC should not depend on a specific syntactic construct. Indeed, determining the beginning and the end of each statement leads to a number of issues in counting logical SLOC. For example, a semicolon may not be used by a programming language; its use may be optional; or it may not play the role of a statement terminator. This realization has led to improvements in the understanding of logical SLOC.
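To make the dependence on a syntactic construct concrete, the following minimal Python sketch implements the naive rule of counting semicolons in C-like source code. The function name count_semicolons is hypothetical, and the sketch is an illustration of the rule rather than the method of any tool cited in this document. It skips comments and string or character literals, and otherwise counts every semicolon, including those that are not statement terminators.

```python
def count_semicolons(source: str) -> int:
    """Naive logical SLOC for C-like code: semicolons outside
    comments and outside string/character literals."""
    count = 0
    i = 0
    n = len(source)
    while i < n:
        ch = source[i]
        if source.startswith("/*", i):
            # Skip a block comment (to end of input if unterminated).
            end = source.find("*/", i + 2)
            i = n if end == -1 else end + 2
        elif source.startswith("//", i):
            # Skip a line comment.
            end = source.find("\n", i)
            i = n if end == -1 else end + 1
        elif ch in "\"'":
            # Skip a string or character literal, honoring backslash escapes.
            i += 1
            while i < n and source[i] != ch:
                i += 2 if source[i] == "\\" else 1
            i += 1
        else:
            if ch == ";":
                count += 1
            i += 1
    return count


example_1 = "sum = a + b + c + d + e + f + g + h + i;"
example_2 = (
    "/* The following has a semantic error. */\n"
    'if (x < 0) { printf("x is a positive number"); }'
)

print(count_semicolons(example_1))  # 1
print(count_semicolons(example_2))  # 1
print(count_semicolons("for (i = 0; i < n; i++) total += i;"))  # 3
```

The last call shows one of the issues noted above: the two semicolons in the header of the for statement separate its clauses rather than terminate statements, yet the naive rule counts them.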

Definition [Source Statement] [Nguyen, Deeds-Rubin, Tan, Boehm, 2007]. A source statement is considered as a block of code that performs some action at runtime or directs compilers at compile time.

The source statements can be classified into three types: executable, declaration, and compiler directive.

Definition [Executable Line of Code] [Nguyen, Deeds-Rubin, Tan, Boehm, 2007]. A line that contains software instruction executed during runtime and on which a breakpoint can be set in a debugging tool. An instruction can be stated in a simple or compound form.

Definition [Data Declaration Line] [Nguyen, Deeds-Rubin, Tan, Boehm, 2007]. A line that contains declaration of data and used by an assembler or compiler to interpret other elements of the program.

Definition [Compiler Directive] [Nguyen, Deeds-Rubin, Tan, Boehm, 2007]. A statement that tells the compiler how to compile a program, but not what to compile.

A source statement is an atomic and relatively independent unit at the source code level. In other words, the statement is considered as the smallest increment of work carried out by a programmer in a given unit of time. Thus, simple and compound statements yield the same number of logical SLOC. For example, the for statement, which consists of initialization, condition, and increment statements, is counted as one logical SLOC rather than three (one for each enclosed statement).

EXAMPLES

Example 1. Java.

    if (i > 10) break;

Example 2. Perl.

    if ($x != 0) { print "non-zero"; }


Example 3. XML.

    <?xml version="1.0"?>
    <!DOCTYPE greeting SYSTEM "hello.dtd">

In each of these cases, the logical SLOC = 2.

TOOLS FOR CALCULATING LOGICAL SLOC

There are a number of tools for calculating logical SLOC, including USC CodeCount™⁶. There are USC CodeCount™ implementations for different programming languages and markup languages. In USC CodeCount™, logical SLOC is the total number of source statements in the source code.

8. TOWARDS A STANDARDIZATION OF SLOC

The pursuit of overcoming the difficulties in counting both logical and physical SLOC has led to standardization efforts.

IEEE

The IEEE Standard 1045-1992 [IEEE, 1993] provides definitions and attributes of SLOC-related metrics.

SEI/CMU

The initiative [Park, 1992] of the Software Engineering Institute (SEI) at Carnegie Mellon University (CMU) provides counting methods that could be used to define a consistent and repeatable SLOC measurement. It includes counting definitions and checklists to be used as guidelines. It is referred to and used by SLOCCount.
⁶ URL: http://csse.usc.edu/research/CODECOUNT/ .


However, the focus of the framework is on what to count, not on how many to count. It includes both logical and physical SLOC, and allows for variations in each of these counts. This poses difficulties in the development of counting tools [Nguyen, Deeds-Rubin, Tan, Boehm, 2007]. For example, USC CodeCount™ counts each compiler directive as a logical SLOC, while LocMetrics does not.

9. [SLOC] AND METRICS TAXONOMY

From the perspective of one metrics taxonomy, the following is a faceted classification of [SLOC]:

Metric    Coordinates of Classification
[SLOC]    Internal, Product, Implementation, Direct (Atomic), Static, Objective

From the perspective of another metrics taxonomy⁷, [SLOC] is a kind of linguistic metric.

10. [SLOC] AND THEORY OF SOFTWARE MEASUREMENT

The length of a program's source code is measured on the ratio scale. The zero-length element is an empty piece of code.

It is possible to measure length in a variety of ways, including lines of code, the number of executable statements, the number of characters, and so on. Let M be the measure of the length of a program's source code in [SLOC] and M' be the length in the number of characters. Then, it is possible to convert from one length measure to another using a transformation of the form

M' = aM,
where a is a constant, namely the average number of characters per line of code.
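As a small worked instance of this transformation, the following Python sketch estimates a from a sample of source lines and then applies M' = aM. The function names and the three sample lines are hypothetical and serve only to illustrate the arithmetic; the sketch counts every sample line, and the value of a obtained this way holds only for the body of code it was estimated from.

```python
def average_chars_per_line(lines):
    """Estimate the constant a: average number of characters per line."""
    if not lines:
        raise ValueError("at least one line is required")
    return sum(len(line) for line in lines) / len(lines)


def sloc_to_characters(m, a):
    """Apply the transformation M' = a * M."""
    return a * m


# Hypothetical sample: three lines of 6, 6, and 9 characters.
sample = ["int x;", "x = 1;", "return x;"]

a = average_chars_per_line(sample)            # 21 / 3 = 7.0
m_prime = sloc_to_characters(len(sample), a)  # 7.0 * 3 = 21.0
print(a, m_prime)
```

Because the transformation is a multiplication by a positive constant, ratios of lengths are preserved, which is consistent with measurement on the ratio scale.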

⁷ URL: http://www.cs.technion.ac.il/Courses/OOP/slides/export/236804-Fall-1997/metrics/part1.html .


11. THE ADVANTAGES OF [SLOC]

A number of advantages and disadvantages, including risks, of using [SLOC] for estimation have been pointed out elsewhere [Pfleeger, Wu, Lewis, 2005].

11.1. VISIBILITY

The [SLOC] has been considered attractive, especially by project managers, for a number of reasons:

- The SLOC are proof of actual work.
- The SLOC are visible.
- The counting and understanding of the SLOC does not require any special skill.

11.2. AUTOMATION

A [SLOC] calculation can be readily automated, and a counting utility can be developed relatively easily, as the automation does not require a sophisticated tool. This reduces the time and effort required to produce an estimate. However, a counting utility may not be (easily) transferable across programming languages.

11.3. REUSE

The [SLOC] serves as a basis for a number of other metrics that are derived throughout the software development life cycle.

12. A PERSPECTIVE ON THE ENDORSED APPLICATIONS OF THE [SLOC]

There have been a number of claims of the uses of [SLOC], although not all can be substantiated.

12.1. COMPARISON OF PROGRAMS

The [SLOC] could be used to compare source code based on the same programming language.


However, in the absence of necessary adjustments, physical SLOC is not useful for comparing programs across different programming languages. For example, a physical SLOC comparison between source code in Perl, Java, and COBOL may even seem ridiculous.

12.2. PROGRAMMER PRODUCTIVITY

Programmer productivity can be defined in a number of ways [Fenton, Pfleeger, 1997, Page 408], including

Size / Effort = ( [SLOC] Measurement ) / ( Person-Months ).


However, using [SLOC] leads to a simplistic measure of productivity, as it does not take into account the effective use of resources and creativity. In other words, if the [SLOC] calculation is used as the only measure of programmer productivity, then it encourages quantity over quality.

In [Jones, Bonsignour, 2012, Chapter 2], a case study of a telecommunications company in Europe is presented. There, such use of [SLOC] is termed professional malpractice because it violates the basic principles of manufacturing economics and "show[s] the highest productivity rates for the lowest-level languages" (Figure 3).

Figure 3. A ranking of productivity using [SLOC]. (Source: [Jones, Bonsignour, 2012, Table 2.24].)
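The effect described above can be reproduced with a few lines of Python that apply the Size / Effort definition. The figures below are hypothetical and chosen only for illustration; they are not taken from [Jones, Bonsignour, 2012]. The function name productivity is likewise an assumption of this sketch.

```python
def productivity(sloc: int, person_months: float) -> float:
    """Size / Effort, with size measured in [SLOC]."""
    if person_months <= 0:
        raise ValueError("effort must be positive")
    return sloc / person_months


# Hypothetical figures: the same functionality implemented twice.
low_level = productivity(sloc=10_000, person_months=10)   # 1000.0 SLOC per person-month
high_level = productivity(sloc=2_000, person_months=4)    # 500.0 SLOC per person-month

print(low_level, high_level)
```

Under these assumed figures, the high-level implementation takes less than half the effort to deliver the same functionality, yet it scores half the productivity, because the measure rewards the number of lines rather than the functionality delivered.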


12.3. COST ESTIMATION

In the 1970s and part of the 1980s, the attention in software development was largely on programming, and the SLOC was the most perceivable indicator of software cost.

The use of [SLOC] is prevalent in a number of cost estimation approaches. It is an input parameter for a number of cost estimation models such as COCOMO, SLIM, and SEER-SEM. For example, KDSI is used as a size input for the COCOMO 81 Cost Estimation Model [Boehm, 1981], and the logical SLOC is recommended as a size input for the COCOMO II Cost Estimation Model [Boehm, Abts, Brown, Chulani, Clark, Horowitz, Madachy, Reifer, Steece, 2000].

However, there are limitations of using the [SLOC] for estimating effort. There is no direct relationship between SLOC and effort, and therefore the correlation between SLOC and effort is weak. This is further elaborated in the following:

- Automation and Succinctness. There are cases where a large amount of source code could be automatically generated with little effort (as in the case of user interface development using programming languages like Visual Basic) and, conversely, a lot of effort may have gone into making source code succinct.
- Comment Lines. There is effort involved in producing comment lines, although they may not require the same effort as the rest of the source code, especially for a nontrivial algorithm. However, the physical SLOC and the logical SLOC calculations are expected to exclude any comment lines. This can discourage programmers from including comments.

13. A PERSPECTIVE ON THE EXCLUSION OF BLANK LINES AND COMMENT LINES FROM [SLOC]

The following two definitions are necessary for stating a definition of the physical SLOC.

Definition [Blank Line] [Nguyen, Deeds-Rubin, Tan, Boehm, 2007]. A physical source line of code that contains any number of white space characters such as space, tab, form feed, carriage return, line feed, or their derivatives.

Definition [Comment Line] [Nguyen, Deeds-Rubin, Tan, Boehm, 2007]. A comment is a string of zero or more characters that follows a language-specific comment delimiter.
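The two definitions above can be read as simple predicates on a line of text. The following Python sketch is one possible reading, not the implementation of any cited tool: the names WHITESPACE, classify_line, and tally are hypothetical, the comment delimiter is passed in because it is language-specific, and only single-line delimiters at the start of a line are recognized (a line with code followed by a trailing comment is classified as code).

```python
from collections import Counter

# White space characters named in the Blank Line definition.
WHITESPACE = " \t\f\r\n"


def classify_line(line: str, comment_delimiter: str) -> str:
    """Return "blank", "comment", or "code" for one physical line."""
    stripped = line.strip(WHITESPACE)
    if not stripped:
        return "blank"
    if stripped.startswith(comment_delimiter):
        return "comment"
    return "code"


def tally(lines, comment_delimiter: str) -> Counter:
    """Count how many lines fall into each category."""
    return Counter(classify_line(line, comment_delimiter) for line in lines)


# Hypothetical Perl fragment; "#" is the Perl comment delimiter.
perl_lines = ["# greet the user", "", "print 'hi';  # trailing", "\t  "]
counts = tally(perl_lines, "#")
print(counts["code"], counts["comment"], counts["blank"])  # 1 1 2
```

Keeping the three tallies separate, rather than discarding blank and comment lines outright, retains the information whose loss is discussed in the remainder of this section.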


13.1. SOURCE CODE QUALITY

The [SLOC] calculation does not take into account the quality of source code.

BLANK LINES AND READABILITY

The use of styling conventions (such as formatting using blank lines) contributes to the readability of the source code of a program. However, the [SLOC] calculations are expected to exclude any blank lines.

COMMENT LINES AND UNDERSTANDABILITY

It is known that, in general and if done appropriately, internal documentation contributes to the understandability of a software artifact. In particular, comments are a kind of annotation (meta-information), and thereby contribute to the understandability of source code. However, the [SLOC] calculations are expected to exclude any comment lines.

13.2. SOURCE CODE ON PHYSICAL MEDIA

There is loss of important information in using [SLOC]. For example, in certain situations, the source code length is used for deciding the amount of computer storage required for the source code, or the number of pages required for a printout. In these cases, the source code length must reflect blank lines and comment lines.

13.3. A PARTIAL SOLUTION

The following is an adaptation from [Fenton, Pfleeger, 1997]. Let PSLOC be a metric that counts the number of physical SLOC according to any one of Definitions 1, 2, or 3, and let CSLOC be a metric that counts the comment lines of source code.


Then,

TSLOC = PSLOC + CSLOC,


where TSLOC is preferable over PSLOC as a single metric for source code length. Let x and y be two pieces of source code, and let the empirical relation Is-Longer-Than⁸ be represented by the numerical relation > between the TSLOC. Then,

x Is-Longer-Than y ⇔ TSLOC(x) > TSLOC(y).

Thus, the TSLOC satisfies the representation condition. Therefore, from the perspective of representational theory of measurement [Fenton, 1994], the TSLOC is a valid measure for the length attribute of a source code entity. The expression

CSLOC / TSLOC
measures the density of comments in source code, which can be an indicator of the extent of self-documentation. There are tools such as LocMetrics that have support for CSLOC.

⁸ It can be noted that this can indeed be checked by mere observation, for example, by visual examination of two printouts.

14. CONCLUSION

To be specific, the text of the program's source code is the entity, the length is the attribute, and [SLOC] is one of the metrics for the attribute.

In any reference to [SLOC], the details of the viewpoint, the view, and the counting approach must be made explicit. In particular, there needs to be a clarification of what is being counted and how it is being counted. If not, then the counts, depending on the viewpoint, view, or counting approach being used, can vary significantly from one another. For example, a comparison of source code from software projects, spanning multiple organizations, using different definitions of SLOC becomes prohibitive.

ACKNOWLEDGEMENT

The inclusion in this document of an image from an external source is only for noncommercial educational purposes, and its use is hereby acknowledged.


REFERENCES

[Boehm, 1981] Software Engineering Economics. By B. W. Boehm. Prentice-Hall. 1981.

[Boehm, Abts, Brown, Chulani, Clark, Horowitz, Madachy, Reifer, Steece, 2000] Software Cost Estimation with COCOMO II. By B. W. Boehm, C. Abts, A. W. Brown, S. Chulani, B. K. Clark, E. Horowitz, R. Madachy, D. Reifer, B. Steece. Prentice-Hall. 2000.

[Conte, Dunsmore, Shen, 1986] Software Engineering Metrics and Models. By S. Conte, H. Dunsmore, V. Shen. Benjamin-Cummings. 1986.

[Fenton, 1994] Software Measurement: A Necessary Scientific Basis. By N. Fenton. IEEE Transactions on Software Engineering. Volume 20. Issue 3. 1994. Pages 199-206.

[Fenton, Pfleeger, 1997] Software Metrics: A Rigorous and Practical Approach. By N. E. Fenton, S. L. Pfleeger. International Thomson Computer Press. 1997.

[Galorath, Evans, 2006] Software Sizing, Estimation, and Risk Management. By D. D. Galorath, M. W. Evans. Auerbach Publications. 2006.

[Grady, Caswell, 1987] Software Metrics: Establishing a Company-Wide Program. By R. B. Grady, D. L. Caswell. Prentice-Hall. 1987.

[IEEE, 1993] IEEE Standard 1045-1992. Standard for Software Productivity Metrics. IEEE Computer Society. 1993.

[Jones, Bonsignour, 2012] The Economics of Software Quality. By C. Jones, O. Bonsignour. Addison-Wesley. 2012.

[NASA, 1995] Software Measurement Guidebook. By NASA Software Engineering Program. Technical Report NASA-GB-001-94. National Aeronautics and Space Administration. 1995.

[Nguyen, Deeds-Rubin, Tan, Boehm, 2007] A SLOC Counting Standard. By V. Nguyen, S. Deeds-Rubin, T. Tan, B. Boehm. Technical Report. COCOMO Forum. 2007.

[Park, 1992] Software Size Measurement: A Framework for Counting Source Statements. By R. E. Park. Technical Report CMU/SEI-92-TR-020 ESC-TR-92-020. Software Engineering Institute. Carnegie Mellon University. Pittsburgh, U.S.A. 1992.

[Pfleeger, Wu, Lewis, 2005] Software Cost Estimation and Sizing Methods: Issues and Guidelines. By S. L. Pfleeger, F. Wu, R. Lewis. RAND Corporation. 2005.


This resource is under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 Unported license.

