White-box testing
From Wikipedia, the free encyclopedia

White-box testing (a.k.a. clear box testing, glass box testing, transparent box testing, or structural testing) is a
method of testing software that tests internal structures or workings of an application, as opposed to its
functionality (i.e. black-box testing). In white-box testing an internal perspective of the system, as well as
programming skills, are required and used to design test cases. The tester chooses inputs to exercise paths
through the code and determine the appropriate outputs. This is analogous to testing nodes in a circuit, e.g.
in-circuit testing (ICT).

While white-box testing can be applied at the unit, integration and system levels of the software
testing process, it is usually done at the unit level. It can test paths within a unit, paths between units during
integration, and between subsystems during a system level test. Though this method of test design can
uncover many errors or problems, it might not detect unimplemented parts of the specification or missing
requirements.

White-box test design techniques include:

• Control flow testing

• Data flow testing

• Branch testing

• Path testing

Hacking

In penetration testing, white-box testing refers to a methodology where an ethical hacker has full knowledge of
the system being attacked. The goal of a white-box penetration test is to simulate a malicious insider who has
some knowledge and possibly basic credentials to the target system.

Compare with black-box testing.

Control flow

Not to be confused with Flow control.

In computer science, control flow (or alternatively, flow of control) refers to the order in which the
individual statements, instructions, or function calls of an imperative or a declarative program are executed or
evaluated.

Within an imperative programming language, a control flow statement is a statement whose execution results
in a choice being made as to which of two or more paths should be followed. For non-strict functional
languages, functions and language constructs exist to achieve the same result, but they are not necessarily
called control flow statements.

The kinds of control flow statements supported by different languages vary, but can be categorized by their
effect:

• continuation at a different statement (unconditional branch or jump),

• executing a set of statements only if some condition is met (choice - i.e. conditional branch),

• executing a set of statements zero or more times, until some condition is met (i.e. loop - the same
as conditional branch),

• executing a set of distant statements, after which the flow of control usually returns
(subroutines, coroutines, and continuations),

• stopping the program, preventing any further execution (unconditional halt).

Interrupts and signals are low-level mechanisms that can alter the flow of control in a way similar to a
subroutine, but usually occur as a response to some external stimulus or event (that can
occur asynchronously), rather than execution of an 'in-line' control flow statement. Self-modifying code can also
be used to affect control flow through its side effects, but usually does not involve an explicit control flow
statement (an exception being the ALTER verb in COBOL[citation needed]).

At the level of machine or assembly language, control flow instructions usually work by altering
the program counter. For some CPUs the only control flow instructions available are conditional or
unconditional branches (sometimes called jumps).

Primitives
Labels
Main article: Label (programming language)

A label is an explicit name or number assigned to a fixed position within the source code, and which may be
referenced by control flow statements appearing elsewhere in the source code. Other than marking a position
within the source code a label has no effect.

Line numbers are an alternative to a named label (and used in some languages such as Fortran and BASIC),
that are whole numbers placed at the beginning of each line of text within the source code. Languages which
use these often impose the constraint that the line numbers must increase in value in each subsequent line, but
may not require that they be consecutive. For example, in BASIC:

10 LET X = 3
20 PRINT X

In other languages such as C and Ada a label is an identifier, usually appearing at the beginning of a line and
immediately followed by a colon. For example, in C:

Success: printf ("The operation was successful.\n");

The Algol 60 language allowed both whole numbers and identifiers as labels (both attached by colons to the
following statement), but few if any other variants of Algol allowed whole numbers.

Goto
Main article: GOTO

The goto statement (a combination of the English words go and to, and pronounced accordingly) is the most
basic form of unconditional transfer of control.

Although the keyword may either be in upper or lower case depending on the language, it is usually written as:

goto label

The effect of a goto statement is to cause the next statement to be executed to be the statement appearing at
(or immediately after) the indicated label.

Goto statements have been considered harmful by many computer scientists, notably Dijkstra.

Subroutines
Main article: Subroutine

The terminology for subroutines varies; they may alternatively be known as routines, procedures, functions
(especially if they return results) or methods (especially if they belong to classes or type classes).

In the 1950s, computer memories were very small by current standards so subroutines were used
primarily[citation needed] to reduce program size; a piece of code was written once and then used many times from
various other places in the program.

Nowadays, subroutines are more frequently used to help make a program more structured, e.g. by isolating
some particular algorithm or hiding some particular data access method. If many programmers are working on
a single program, subroutines are one kind of modularity that can help split up the work.

Minimal structured control flow

See also: Structured program theorem

In May 1966, Böhm and Jacopini published an article[1] in Communications of the ACM which showed that any
program with gotos could be transformed into a goto-free form involving only choice (IF THEN ELSE) and
loops (WHILE condition DO xxx), possibly with duplicated code and/or the addition of Boolean variables

(true/false flags). Later authors have shown that choice can be replaced by loops (and yet more Boolean
variables).
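
The idea can be illustrated with a small search routine. The following is a minimal sketch in Python (chosen only for illustration; the function name find_index is hypothetical and not taken from the article). It uses nothing but a WHILE loop, IF THEN ELSE, and one Boolean flag in the place where a goto or an early exit might otherwise appear.

```python
def find_index(items, target):
    """Return the position of target in items, or -1 if it is absent.

    Only WHILE and IF/ELSE are used; the Boolean flag `found`
    stands in for a jump out of the loop.
    """
    found = False
    i = 0
    while i < len(items) and not found:
        if items[i] == target:
            found = True      # record the event instead of jumping
        else:
            i += 1
    if found:
        result = i
    else:
        result = -1
    return result             # single exit point


print(find_index([4, 8, 15], 15))
print(find_index([4, 8, 15], 7))
```

The price of avoiding the jump is the extra variable and the extra test in the loop condition, which is the kind of overhead the paragraph above refers to.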

The fact that such minimalism is possible does not necessarily mean that it is desirable; after all, computers
theoretically only need one machine instruction (subtract one number from another and branch if the result is
negative), but practical computers have dozens or even hundreds of machine instructions.

What Böhm and Jacopini's article showed was that all programs could be goto-free. Other research showed
that control structures with one entry and one exit were much easier to understand than any other form,
primarily because they could be used anywhere as a statement without disrupting the control flow. In other
words, they were composable. (Later developments, such as non-strict programming languages - and more
recently, composable software transactions - have continued this line of thought, making components of
programs even more freely composable.)

Control structures in practice

Most programming languages with control structures have an initial keyword which indicates the type of control
structure involved. Languages then divide as to whether or not control structures have a final keyword.

• No final keyword: Algol 60, C, C++, Haskell, Java, Pascal, Perl, PHP, PL/I, Python, PowerShell. Such
languages need some way of grouping statements together:

    • Algol 60 and Pascal: begin ... end

    • C, C++, Java, Perl, PHP, and PowerShell: curly brackets { ... }

    • PL/I: DO ... END

    • Python: uses indentation level (see Off-side rule)

    • Haskell: either indentation level or curly brackets can be used, and they can be freely mixed

• Final keyword: Ada, Algol 68, Modula-2, Fortran 77, Mythryl, Visual Basic. The forms of the final keyword
vary:

    • Ada: final keyword is end + space + initial keyword, e.g. if ... end if, loop ... end loop

    • Algol 68, Mythryl: initial keyword spelled backwards, e.g. if ... fi, case ... esac

    • Fortran 77: final keyword is end + initial keyword, e.g. IF ... ENDIF, DO ... ENDDO

    • Modula-2: same final keyword END for everything

    • Visual Basic: every control structure has its own keyword: If ... End If; For ... Next; Do ... Loop

Choice

Main article: Conditional (programming)

Loops

A loop is a sequence of statements which is specified once but which may be carried out several times in
succession. The code "inside" the loop (the body of the loop, shown below as xxx) is obeyed a specified
number of times, or once for each of a collection of items, or until some condition is met.

In functional programming languages, such as Haskell and Scheme, loops can be expressed by
using recursion or fixed point iteration rather than explicit looping constructs. Tail recursion is a special case of
recursion which can be easily transformed to iteration.
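
As an illustration of that transformation, here is a minimal sketch in Python (the function names are hypothetical). The recursive version makes its recursive call as the very last action, so each call can be replaced by updating the parameters and looping. Python itself does not perform this optimization automatically, so the iterative form is written out by hand.

```python
def sum_to_recursive(n, acc=0):
    """Sum the integers 1..n with a tail-recursive helper style."""
    if n == 0:
        return acc
    # Tail call: nothing remains to be done after the recursive call returns.
    return sum_to_recursive(n - 1, acc + n)


def sum_to_iterative(n, acc=0):
    """Same computation: the tail call becomes a parameter update plus a loop."""
    while n != 0:
        n, acc = n - 1, acc + n   # right-hand side uses the old values of n and acc
    return acc


print(sum_to_recursive(10), sum_to_iterative(10))
```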

Count-controlled loops
Main article: For loop

See also: Loop counter

Most programming languages have constructions for repeating a loop a certain number of times. Note that if N
is less than 1 in these examples then the language may specify that the body is skipped completely, or that the
body is executed just once with N = 1. In most cases counting can go downwards instead of upwards and step
sizes other than 1 can be used.

FOR I = 1 TO N            for I := 1 to N do begin
    xxx                       xxx
NEXT I                    end;

DO I = 1,N                for ( I=1; I<=N; ++I ) {
    xxx                       xxx
END DO                    }

In many programming languages, only integers can be reliably used in a count-controlled loop. Floating-point
numbers are represented imprecisely due to hardware constraints, so a loop such as

for X := 0.1 step 0.1 to 1.0 do

might be repeated 9 or 10 times, depending on rounding errors and/or the hardware and/or the compiler
version. Furthermore, if the increment of X occurs by repeated addition, accumulated rounding errors may
mean that the value of X in each iteration can differ quite significantly from the expected sequence 0.1, 0.2, 0.3,
..., 1.0.
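
The drift can be observed directly. The sketch below is in Python and assumes IEEE 754 double-precision floats, which is what common Python implementations use; the helper names are hypothetical. One generator advances X by repeated addition, the other drives the loop with an integer counter and derives X from it.

```python
def float_counter(start, step, stop):
    """Yield start, start+step, ... using repeated floating-point addition."""
    x = start
    while x <= stop:
        yield x
        x += step               # rounding errors accumulate here


def integer_counter(first, last, scale):
    """Drive the loop with an integer and compute the float from it each time."""
    for i in range(first, last + 1):
        yield i / scale         # one rounding per value, no accumulation


drifting = list(float_counter(0.1, 0.1, 1.0))
exact = list(integer_counter(1, 10, 10))

print(drifting)
print(exact)
```

Printing both lists shows where the repeated-addition sequence departs from the values 0.1, 0.2, ..., 1.0; the integer-driven loop always runs exactly ten times.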

Condition-controlled loops
Main article: While loop

See also: Do-while loop

Most programming languages have constructions for repeating a loop until some condition changes. Note that
some variations place the test at the start of the loop, while others have the test at the end of the loop. In the
former case the body may be skipped completely, while in the latter case the body is always executed at least
once.

DO WHILE (test)           repeat
    xxx                       xxx
LOOP                      until test;

while (test) {            do
    xxx                       xxx
}                         while (test);

Collection-controlled loops
Main article: Foreach

Several programming languages (e.g. Ada, D, Smalltalk, Perl, Java, C#, Mythryl, Visual
Basic, Ruby, Python, JavaScript, Fortran 95 and later) have special constructs which allow implicitly looping
through all elements of an array, or all members of a set or collection.

someCollection do: [:eachElement |xxx].

foreach (item; myCollection) { xxx }

foreach someArray { xxx }

Collection<String> coll; for (String s : coll) {}

foreach (string s in myStringCollection) { xxx }

$someCollection | ForEach-Object { $_ }

forall ( index = first:last:step... )

General iteration
General iteration constructs such as C's for statement and Common Lisp's do form can be used to express
any of the above sorts of loops, as well as others, such as looping over a number of collections in parallel.
Where a more specific looping construct can be used, it is usually preferred over the general iteration construct,
since it often makes the purpose of the expression clearer.

Infinite loops
Infinite loops are used to ensure that a program segment loops forever or until an exceptional condition arises, such
as an error. For instance, an event-driven program (such as a server) should loop forever handling events as
they occur, only stopping when the process is terminated by an operator.

Often, an infinite loop is unintentionally created by a programming error in a condition-controlled loop, wherein
the loop condition uses variables that never change within the loop.

Continuation with next iteration

Sometimes within the body of a loop there is a desire to skip the remainder of the loop body and continue with
the next iteration of the loop. Some languages provide a statement such as continue, skip, or next which
will do this. The effect is to prematurely terminate the innermost loop body and then resume as normal with the
next iteration. If the iteration is the last one in the loop, the effect is to terminate the entire loop early.
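
A minimal sketch in Python, whose keyword for this is continue (the function name is hypothetical):

```python
def sum_of_odds(numbers):
    """Add up the odd values, skipping the rest of the body for even ones."""
    total = 0
    for n in numbers:
        if n % 2 == 0:
            continue          # skip the remainder of the body, go to the next item
        total += n
    return total


print(sum_of_odds([1, 2, 3, 4, 5]))
```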

Redo current iteration

Some languages, like Perl and Ruby, have a redo statement that restarts the current iteration from the
beginning.

Restart loop
Ruby has a retry statement that restarts the entire loop from the initial iteration.

Early exit from loops

When using a count-controlled loop to search through a table, it might be desirable to stop searching as soon
as the required item is found. Some programming languages provide a statement such as break or exit,
whose effect is to terminate the current loop immediately and transfer control to the statement immediately
following that loop. One can also return out of a subroutine executing the looped statements, breaking out of
both the nested loop and the subroutine. Things can get a bit messy if searching a multi-dimensional table
using nested loops (see Proposed control structures below).
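
Returning out of a subroutine is one way to keep such a nested search tidy. A minimal sketch in Python (the function name find_in_table is hypothetical):

```python
def find_in_table(table, target):
    """Return (row, column) of the first match, or None if target is absent."""
    for i, row in enumerate(table):
        for j, value in enumerate(row):
            if value == target:
                return (i, j)   # leaves both loops and the subroutine at once
    return None


print(find_in_table([[1, 2], [3, 4]], 3))
```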

The following example is done in Ada which supports both early exit from loops and loops with test in the
middle. Both features are very similar and comparing both code snippets will show the difference: early
exit needs to be combined with an if statement while a condition in the middle is a self contained construct.

with Ada.Text_IO;
with Ada.Integer_Text_IO;

procedure Print_Squares is
    X : Integer;
begin
    Read_Data : loop
        Ada.Integer_Text_IO.Get(X);
    exit Read_Data when X = 0;
        -- Integer output needs Ada.Integer_Text_IO; Ada.Text_IO.Put takes Character or String
        Ada.Integer_Text_IO.Put (X * X);
        Ada.Text_IO.New_Line;
    end loop Read_Data;
end Print_Squares;

Python supports conditional execution of code depending on whether a loop was exited early (with
a break statement) or not by using an else clause with the loop. For example,

for n in set_of_numbers:      # set_of_numbers and isprime are assumed to be defined elsewhere
    if isprime(n):
        print("Set contains a prime number")
        break
else:
    print("Set did not contain any prime numbers")

Note that the else clause in the above example is attached to the for statement, and not the
inner if statement. Both Python's for and while loops support such an else clause, which is executed only if
early exit of the loop did not occur.

Loop variants and invariants


Loop variants and loop invariants are used to express correctness of loops.[2]

In practical terms, a loop variant is an integer expression which has an initial non-negative value. The variant's
value must decrease during each loop iteration but must never become negative during the correct execution of
the loop. Loop variants are used to guarantee that loops will terminate.

A loop invariant is an assertion which must be true before the first loop iteration and remain true after each
iteration. This implies that when a loop terminates correctly, both the exit condition and the loop invariant are
satisfied. Loop invariants are used to monitor specific properties of a loop during successive iterations.
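
In a language without dedicated syntax, both notions can be approximated with run-time assertions. The following is a minimal sketch in Python (the function divide and its checks are illustrative assumptions, not a standard facility): the remainder serves as the loop variant, and the relation dividend = quotient * divisor + remainder is the loop invariant.

```python
def divide(dividend, divisor):
    """Integer division by repeated subtraction (dividend >= 0, divisor > 0)."""
    quotient, remainder = 0, dividend
    # Invariant must hold before the first iteration.
    assert dividend == quotient * divisor + remainder
    while remainder >= divisor:
        variant_before = remainder            # variant: a non-negative integer
        remainder -= divisor
        quotient += 1
        # Variant strictly decreased and never became negative.
        assert 0 <= remainder < variant_before
        # Invariant still holds after the iteration.
        assert dividend == quotient * divisor + remainder
    # On exit, the exit condition and the invariant both hold.
    assert remainder < divisor
    return quotient, remainder


print(divide(17, 5))
```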

Some programming languages, such as Eiffel, contain native support for loop variants and invariants. In other
cases, support is an add-on, such as the Java Modeling Language's specification for loop statements in Java.

Loop system cross reference table

Column groups: conditional (begin, middle, end); loop (count, collection, general, infinite[1]);
correctness facilities (variant, invariant).

Programming language | begin | middle | end | count    | collection    | general | infinite[1] | early exit                 | continuation               | redo | retry  | variant        | invariant
Ada                  | Yes   | Yes    | Yes | Yes      | arrays        | No      | Yes         | deep nested                | No
C                    | Yes   | No     | Yes | No[2]    | No            | Yes     | No          | deep nested[3]             | deep nested[3]             | No
C++                  | Yes   | No     | Yes | No[2]    | No[9]         | Yes     | No          | deep nested[3]             | deep nested[3]             | No
C#                   | Yes   | No     | Yes | No[2]    | Yes           | Yes     | No          | deep nested[3]             | deep nested[3]
Common Lisp          | Yes   | Yes    | Yes | Yes      | Yes           | Yes     | Yes         | deep nested                | No
Eiffel               | Yes   | No     | No  | Yes[10]  | Yes           | Yes     | No          | one level[10]              | No                         | No   | No[11] | Yes            | Yes
F#                   | Yes   | No     | No  | Yes      | Yes           | No      | No          | No[6]                      | No                         | No
FORTRAN 77           | Yes   | No     | No  | Yes      | No            | No      | No          | one level                  | Yes
Fortran 90           | Yes   | No     | No  | Yes      | No            | No      | Yes         | deep nested                | Yes
Fortran 95 and later | Yes   | No     | No  | Yes      | arrays        | No      | Yes         | deep nested                | Yes
Haskell              | No    | No     | No  | No       | Yes           | No      | Yes         | No[6]                      | No                         | No
Java                 | Yes   | No     | Yes | No[2]    | Yes           | Yes     | No          | deep nested                | deep nested                | No   |        | non-native[12] | non-native[12]
JavaScript           | Yes   | No     | Yes | No[2]    | Yes           | Yes     | No          | deep nested                | deep nested                | No
OCaml                | Yes   | No     | No  | Yes      | arrays, lists | No      | No          | No[6]                      | No                         | No
PHP                  | Yes   | No     | Yes | No[2][5] | Yes[4]        | Yes     | No          | deep nested                | deep nested                | No
Perl                 | Yes   | No     | Yes | No[2][5] | Yes           | Yes     | No          | deep nested                | deep nested                | Yes
Python               | Yes   | No     | No  | No[5]    | Yes           | No      | No          | deep nested[6]             | deep nested[6]             | No
REBOL                | No[7] | Yes    | Yes | Yes      | Yes           | No[8]   | Yes         | one level[6]               | No                         | No
Ruby                 | Yes   | No     | Yes | Yes      | Yes           | No      | Yes         | deep nested[6]             | deep nested[6]             | Yes  | Yes
Standard ML          | Yes   | No     | No  | No       | arrays, lists | No      | No          | No[6]                      | No                         | No
Visual Basic .NET    | Yes   | No     | Yes | Yes      | Yes           | No      | Yes         | one level per type of loop | one level per type of loop
Windows PowerShell   | Yes   | No     | Yes | No[2]    | Yes           | Yes     | No          | ?                          | Yes

1. while (true) does not count as an infinite loop for this purpose, because it is not a dedicated
language structure.

2. C's for (init; test; increment) loop is a general loop construct, not specifically a counting one,
although it is often used for that.

3. Deep breaks may be accomplished in C, C++ and C# through the use of labels and gotos.

4. Iteration over objects was added in PHP 5.

5. A counting loop can be simulated by iterating over an incrementing list or generator, for instance,
Python's range().

6. Deep breaks may be accomplished through the use of exception handling.

7. There is no special construct, since the while function can be used for this.

8. There is no special construct, but users can define general loop functions.

9. The C++11 standard (formerly C++0x) introduces the range-based for. In the STL there is
a std::for_each template function which can iterate on STL containers and call a unary
function for each element.[3] The functionality can also be constructed as a macro on these containers.[4]

10. Count-controlled looping is effected by iteration across an integer interval; early exit by including an
additional condition for exit.

11. Eiffel supports a reserved word retry; however, it is used in exception handling, not loop control.

12. Requires the Java Modeling Language (JML) behavioral interface specification language.

Structured non-local control flow

Many programming languages, particularly those which favor more dynamic styles of programming, offer
constructs for non-local control flow. These cause the flow of execution to jump out of a given context and
resume at some predeclared point. Conditions, exceptions, and continuations are three common sorts of
non-local control constructs.

Conditions
PL/I has some 22 standard conditions (e.g. ZERODIVIDE, SUBSCRIPTRANGE, ENDFILE) which can be
RAISEd and which can be intercepted by a statement of the form "ON condition action;". Programmers can also
define and use their own named conditions.

As with the unstructured if, only one statement can be specified, so in many cases a GOTO is needed to decide
where flow of control should resume.

Unfortunately, some implementations had a substantial overhead in both space and time (especially
SUBSCRIPTRANGE), so many programmers tried to avoid using conditions.

Common Syntax examples:

ON condition GOTO label

Exceptions

Main article: Exception handling

Modern languages have a structured construct for exception handling which does not rely on the use of GOTO:

try {
    xxx1                                  // Somewhere in here
    xxx2                                  // use: throw someValue;
    xxx3
} catch (someClass& someId) {             // catch value of someClass
    actionForSomeClass
} catch (someType& anotherId) {           // catch value of someType
    actionForSomeType
} catch (...) {                           // catch anything not already caught
    actionForAnythingElse
}

Any number and variety of catch clauses can be used above. In D, Java, C#, and Python
a finally clause can be added to the try construct. No matter how control leaves the try, the code
inside the finally clause is guaranteed to execute. This is useful when writing code that must relinquish
an expensive resource (such as an opened file or a database connection) when finished processing:

FileStream stm = null;                    // C# example
try {
    stm = new FileStream ("logfile.txt", FileMode.Create);
    return ProcessStuff(stm);             // may throw an exception
} finally {
    if (stm != null)
        stm.Close();
}

Since this pattern is fairly common, C# has a special syntax:

using (FileStream stm = new FileStream ("logfile.txt", FileMode.Create)) {
    return ProcessStuff(stm);             // may throw an exception
}

Upon leaving the using-block, the compiler guarantees that the stm object is released.
Python's with statement and Ruby's block argument to File.open are used to similar effect.
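
For comparison, here is a minimal sketch of the same two patterns in Python. The function names are hypothetical and the file path is supplied by the caller; the first function uses try/finally, the second uses the with statement, which closes the file automatically when the block is left.

```python
def write_log_finally(path, text):
    """Explicit cleanup: the finally clause runs however the try block is left."""
    stream = None
    try:
        stream = open(path, "w")
        stream.write(text)
        return len(text)          # finally still runs before the function returns
    finally:
        if stream is not None:
            stream.close()


def write_log_with(path, text):
    """Same effect: the with statement closes the file when the block exits."""
    with open(path, "w") as stream:
        stream.write(text)
        return len(text)
```

Both functions return the number of characters passed in, and in both the file is closed even if write raises an exception.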

All these languages define standard exceptions and the circumstances under which they are thrown.
Users can throw exceptions of their own (in fact C++ and Python allow users to throw and catch almost
any type).

If there is no catch matching a particular throw, then control percolates back through subroutine calls
and/or nested blocks until a matching catch is found or until the end of the main program is reached, at
which point the program is forcibly stopped with a suitable error message.
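
This percolation through subroutine calls can be sketched in Python as follows (all function names are hypothetical). The exception is raised two calls deep, passes through a function that has no handler, and is caught in the outermost one.

```python
def inner():
    raise ValueError("bad value")     # thrown here


def middle():
    inner()                           # no handler: the exception passes through


def outer():
    try:
        middle()
    except ValueError as exc:         # first matching handler on the way out
        return "handled: " + str(exc)
    return "no error"


print(outer())
```

If outer had no matching except clause either, the exception would continue up to the top level and the interpreter would stop the program with a traceback.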

The AppleScript scripting language provides several pieces of information to a "try" block:

try
    set myNumber to myNumber / 0
on error e number n from f to t partial result pr
    if ( e = "Can't divide by zero" ) then display dialog "You idiot!"
end try

Continuations
Main article: Continuation

Non-local control flow cross reference

Programming language | conditions | exceptions
Ada                  | No         | Yes
C                    | No         | No
C++                  | No         | Yes
C#                   | No         | Yes
D                    | No         | Yes
Eiffel               | No         | Yes
Haskell              | No         | Yes
Java                 | No         | Yes
Mythryl              | Yes        | Yes
Objective-C          | No         | Yes
PHP                  | No         | Yes
PL/I                 | Yes        | No
Python               | No         | Yes
REBOL                | Yes        | Yes
Ruby                 | No         | Yes
Visual Basic .NET    | Yes        | Yes
Windows PowerShell   | No         | Yes

Proposed control structures


In a spoof Datamation article[5] in 1973, R. Lawrence Clark suggested that the GOTO statement could be
replaced by the COMEFROM statement, and provided some entertaining examples. This was actually
implemented in the INTERCAL programming language, a language designed to make programs as
obscure as possible.

In his 1974 article "Structured Programming with go to Statements",[6] Donald Knuth identified two
situations which were not covered by the control structures listed above, and gave examples of control
structures which could handle these situations. Despite their utility, these constructions have not yet found
their way into mainstream programming languages.

Loop with test in the middle
The following was proposed by Dahl in 1972:[7]

loop                      loop
    xxx1                      read(char);
while test;               while not atEndOfFile;
    xxx2                      write(char);
repeat;                   repeat;

If xxx1 is omitted we get a loop with the test at the top. If xxx2 is omitted we get a loop with the test at the
bottom. If while is omitted we get an infinite loop. Hence this single construction can replace several
constructions in most programming languages. A possible variant is to allow more than one while test;
within the loop, but the use of exitwhen (see next section) appears to cover this case better.

Languages lacking this construct generally emulate it using an equivalent infinite-loop-with-break idiom:

while (true) {
    xxx1
    if (not test)
        break
    xxx2
}
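
A concrete instance of this idiom, as a minimal sketch in Python (the function name is hypothetical, and treating exhausted input like a terminating 0 is an assumption made to keep the example self-contained):

```python
def squares_until_zero(values):
    """Collect the squares of the inputs up to, but not including, the first 0."""
    results = []
    stream = iter(values)
    while True:
        x = next(stream, 0)       # xxx1: read the next value (0 if input runs out)
        if x == 0:                # test in the middle of the loop body
            break
        results.append(x * x)     # xxx2: process the value
    return results


print(squares_until_zero([1, 2, 3, 0, 4]))
```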

In Ada, the above loop construct (loop-while-repeat) can be represented using a standard infinite loop
(loop - end loop) that has an exit when clause in the middle (not to be confused with
the exitwhen statement in the following section).

with Ada.Text_IO;
with Ada.Integer_Text_IO;

procedure Print_Squares is
    X : Integer;
begin
    Read_Data : loop
        Ada.Integer_Text_IO.Get(X);
    exit Read_Data when X = 0;
        -- Integer output needs Ada.Integer_Text_IO; Ada.Text_IO.Put takes Character or String
        Ada.Integer_Text_IO.Put (X * X);
        Ada.Text_IO.New_Line;
    end loop Read_Data;
end Print_Squares;

Naming a loop (like Read_Data in this example) is optional, but it allows leaving the outer loop of several
nested loops.

Multiple early exit/exit from nested loops


This was proposed by Zahn in 1974.[8] A modified version is presented here.

exitwhen EventA or EventB or EventC;
    xxx
exits
    EventA: actionA
    EventB: actionB
    EventC: actionC
endexit;

exitwhen is used to specify the events which may occur within xxx; their occurrence is indicated by using
the name of the event as a statement. When some event does occur, the relevant action is carried out,
and then control passes just after endexit. This construction provides a very clear separation between
determining that some situation applies, and the action to be taken for that situation.

exitwhen is conceptually similar to exception handling, and exceptions or similar constructs are used for
this purpose in many languages.

The following simple example involves searching a two-dimensional table for a particular item.

exitwhen found or missing;
    for I := 1 to N do
        for J := 1 to M do
            if table[I,J] = target then found;
    missing;
exits
    found:   print ("item is in table");
    missing: print ("item is not in table");
endexit
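
In a language without exitwhen, the same separation between detecting an event and acting on it can be approximated with exceptions. The following minimal sketch in Python is one such emulation, not Zahn's construct itself; the classes Found and Missing and the function describe are hypothetical names.

```python
class Found(Exception):
    """Event: the target was located."""


class Missing(Exception):
    """Event: the whole table was searched without success."""


def describe(table, target):
    try:
        for row in table:
            for value in row:
                if value == target:
                    raise Found       # signal the event by name
        raise Missing
    except Found:                     # the "exits" part: one action per event
        return "item is in table"
    except Missing:
        return "item is not in table"


print(describe([[1, 2], [3, 4]], 3))
```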

Control flow diagram



Example of a so called "performance seeking control flow diagram".[1]

A control flow diagram (CFD) is a diagram to describe the control flow of a business process, process or
program.

Control flow diagrams were developed in the 1950s, and are widely used in multiple engineering disciplines.
They are one of the classic business process modeling methodologies, along with flow charts, data flow
diagrams, functional flow block diagrams, Gantt charts, PERT diagrams, and IDEF.[2]

Overview

A control flow diagram can consist of a subdivision to show sequential steps, with if-then-else conditions,
repetition, and/or case conditions. Suitably annotated geometrical figures are used to represent operations,
data, or equipment, and arrows are used to indicate the sequential flow from one to another.[3]

There are several types of control flow diagrams, for example:

• Change control flow diagram, used in project management

• Configuration decision control flow diagram, used in configuration management

• Process control flow diagram, used in process management

• Quality control flow diagram, used in quality control.

In software and systems development control flow diagrams can be used in control flow analysis, data flow
analysis, algorithm analysis, and simulation. Control and data flow analysis are most applicable for real time and
data driven systems. These flow analyses transform logic and data requirements text into graphic flows which
are easier to analyze than the text. PERT, state transition, and transaction diagrams are examples of control
flow diagrams.[4]

Types of Control Flow Diagrams


Process Control Flow Diagram
A flow diagram can be developed for the process control system for each critical activity. Process control is
normally a closed cycle in which a sensor provides information to a process control software
application through a communications system. The application determines if the sensor information is within the
predetermined (or calculated) data parameters and constraints. The results of this comparison are fed to an
actuator, which controls the critical component. This feedback may control the component electronically or may
indicate the need for a manual action.[5]
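
The comparison step of such a closed cycle can be sketched in a few lines. The following Python fragment is purely illustrative: the function names, the limits, and the command strings "raise", "lower" and "hold" are assumptions, not part of any particular process control system.

```python
def compare_reading(reading, low, high):
    """Compare one sensor reading with its allowed range and choose a command."""
    if reading < low:
        return "raise"        # below the permitted range
    if reading > high:
        return "lower"        # above the permitted range
    return "hold"             # within constraints: no correction needed


def control_cycle(readings, low, high):
    """Turn a series of sensor readings into a series of actuator commands."""
    return [compare_reading(r, low, high) for r in readings]


print(control_cycle([5, 12, 8], low=6, high=10))
```

In a real installation each command would be sent to an actuator or shown to an operator as a request for manual action.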

This closed-cycle process has many checks and balances to ensure that it stays safe. The investigation of how
the process control can be subverted is likely to be extensive because all or part of the process control may be
oral instructions to an individual monitoring the process. It may be fully computer controlled and automated, or
it may be a hybrid in which only the sensor is automated and the action requires manual intervention. Further,
some process control systems may use prior generations of hardware and software, while others are state of
the art.[5]

Performance seeking control flow diagram


The figure presents an example of a performance seeking control flow diagram of the algorithm. The control
law consists of estimation, modeling, and optimization processes. In the Kalman filter estimator, the inputs,
outputs, and residuals were recorded. At the compact propulsion system modeling stage, all the estimated inlet
and engine parameters were recorded.[1]

In addition to temperatures, pressures, and control positions, such estimated parameters as stall margins,
thrust, and drag components were recorded. In the optimization phase, the operating condition constraints,
optimal solution, and linear programming health status condition codes were recorded. Finally, the actual
commands that were sent to the engine through the DEEC were recorded.[1]

Data flow diagram



Data flow diagram example.[1]

A data flow diagram (DFD) is a graphical representation of the "flow" of data through an information system.
DFDs can also be used for the visualization of data processing (structured design).

On a DFD, data items flow from an external data source or an internal data store to an internal data store or an
external data sink, via an internal process.

A DFD provides no information about the timing of processes, or about whether processes will operate in
sequence or in parallel. It is therefore quite different from a flowchart, which shows the flow of control through
an algorithm, allowing a reader to determine what operations will be performed, in what order, and under what
circumstances, but not what kinds of data will be input to and output from the system, nor where the data will
come from and go to, nor where the data will be stored (all of which are shown on a DFD).

Overview

Data flow diagram example.



Data flow diagram -Yourdon/DeMarco notation.

It is common practice to draw a context-level data flow diagram first, which shows the interaction between the
system and external agents which act as data sources and data sinks. On the context diagram (also known as
the 'Level 0 DFD') the system's interactions with the outside world are modelled purely in terms of data flows
across the system boundary. The context diagram shows the entire system as a single process, and gives no
clues as to its internal organization.

This context-level DFD is next "exploded", to produce a Level 1 DFD that shows some of the detail of the
system being modeled. The Level 1 DFD shows how the system is divided into sub-systems (processes), each
of which deals with one or more of the data flows to or from an external agent, and which together provide all of
the functionality of the system as a whole. It also identifies internal data stores that must be present in order for
the system to do its job, and shows the flow of data between the various parts of the system.

Data flow diagrams were proposed by Larry Constantine, the original developer of structured design,[2] based
on Martin and Estrin's "data flow graph" model of computation.

Data flow diagrams (DFDs) are one of the three essential perspectives of the Structured Systems Analysis and
Design Method (SSADM). The sponsor of a project and the end users will need to be briefed and consulted
throughout all stages of a system's evolution. With a data flow diagram, users are able to visualize how the
system will operate, what the system will accomplish, and how the system will be implemented. The old
system's dataflow diagrams can be drawn up and compared with the new system's data flow diagrams to draw
comparisons to implement a more efficient system. Data flow diagrams can be used to provide the end user
with a physical idea of where the data they input ultimately has an effect upon the structure of the whole system
from order to dispatch to report. How any system is developed can be determined through a data flow diagram.

In the course of developing a set of levelled data flow diagrams the analyst/designer is forced to address how
the system may be decomposed into component sub-systems, and to identify the transaction data in the data
model.

There are different notations to draw data flow diagrams (Yourdon & Coad and Gane & Sarson[3]), defining
different visual representations for processes, data stores, data flow, and external entities.[4]

Developing a data flow diagram


Event partitioning approach
Event partitioning was described by Edward Yourdon in Just Enough Structured Analysis.[5]

A context level Data flow diagram created using Select SSADM.

This level shows the overall context of the system and its operating environment and shows the whole system
as just one process. It does not usually show data stores unless they are "owned" by external systems, i.e.
are accessed by but not maintained by this system; however, these are often shown as external entities.[6]

Level 1 (high level diagram)

This level (level 1) shows all processes at the first level of numbering, data stores, external entities and the
data flows between them. The purpose of this level is to show the major, high-level processes of the system
and how they relate to one another. A process model will have one, and only one, level-1 diagram. A level-1
diagram must be balanced with its parent context-level diagram, i.e. there must be the same external entities
and the same data flows. These can be broken down into more detail in the level-1 diagram; for example, the
"enquiry" data flow could be split into "enquiry request" and "enquiry results" and still be valid.[6]
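The balancing rule can be checked mechanically. The following is a minimal sketch in Python, not part of any DFD tool: it assumes each diagram is reduced to a set of external entity names and a set of data flows that cross the system boundary, and that a hypothetical `refines` mapping records which level-1 flows are finer-grained versions of a context-level flow. The "Customer" entity and "order" flow are invented for illustration; only the "enquiry" names come from the text.

```python
def is_balanced(parent_entities, parent_flows, child_entities,
                child_boundary_flows, refines):
    """Check a child diagram against its parent diagram.

    refines maps a child flow name to the parent flow it details;
    flows not listed there are assumed to keep the parent's name.
    """
    # The external entities must be identical on both diagrams.
    if set(parent_entities) != set(child_entities):
        return False
    # Map every child boundary flow back to the parent flow it refines.
    mapped = {refines.get(flow, flow) for flow in child_boundary_flows}
    # Every parent flow must be represented, and no new flow may appear.
    return mapped == set(parent_flows)


# Hypothetical context diagram and level-1 diagram.
context_entities = {"Customer"}
context_flows = {"enquiry", "order"}
level1_entities = {"Customer"}
level1_flows = {"enquiry request", "enquiry results", "order"}
refines = {"enquiry request": "enquiry", "enquiry results": "enquiry"}

print(is_balanced(context_entities, context_flows,
                  level1_entities, level1_flows, refines))
```

Under these assumptions the split of "enquiry" into two flows still counts as balanced, while dropping "order" from the level-1 diagram or adding a new external entity would not.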

Level 2 (low level diagram)

A Level 2 Data flow diagram showing the "Process Enquiry" process for the same system.

This level is a decomposition of a process shown in a level-1 diagram; as such, there should be a level-2
diagram for each and every process shown in a level-1 diagram. In this example, processes 1.1, 1.2 and 1.3 are
all children of process 1. Together they wholly and completely describe process 1, and combined they must
perform the full capacity of this parent process. As before, a level-2 diagram must be balanced with its parent
level-1 diagram.

Cyclomatic complexity

Cyclomatic complexity (or conditional complexity) is a software metric (measurement). It was developed by
Thomas J. McCabe, Sr. in 1976 and is used to indicate the complexity of a program. It directly measures the
number of linearly independent paths through a program's source code. The concept, although not the method,
is somewhat similar to that of general text complexity measured by the Flesch-Kincaid Readability Test.

Cyclomatic complexity is computed using the control flow graph of the program: the nodes of
the graph correspond to indivisible groups of commands of a program, and a directed edge connects two
nodes if the second command might be executed immediately after the first command. Cyclomatic complexity
may also be applied to individual functions, modules, methods or classes within a program.

One testing strategy, called Basis Path Testing by McCabe who first proposed it, is to test each linearly
independent path through the program; in this case, the number of test cases will equal the cyclomatic
complexity of the program.[1]

Description

A control flow graph of a simple program. The program begins executing at the red node, then enters a loop (group of three
nodes immediately below the red node). On exiting the loop, there is a conditional statement (group below the loop), and
finally the program exits at the blue node. For this graph, E = 9, N = 8 and P = 1, so the cyclomatic complexity of the
program is 3.

The cyclomatic complexity of a section of source code is the count of the number of linearly
independent paths through the source code. For instance, if the source code contained no decision points such
as IF statements or FOR loops, the complexity would be 1, since there is only a single path through the code. If
the code had a single IF statement containing a single condition there would be two paths through the code,
one path where the IF statement is evaluated as TRUE and one path where the IF statement is evaluated as
FALSE.

Mathematically, the cyclomatic complexity of a structured program[note 1] is defined with reference to a directed
graph containing the basic blocks of the program, with an edge between two basic blocks if control may pass
from the first to the second (the control flow graph of the program). The complexity is then defined as:[2]

M = E − N + 2P

where

M = cyclomatic complexity

E = the number of edges of the graph

N = the number of nodes of the graph

P = the number of connected components
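As an illustration of the formula, here is a minimal Python sketch that computes M from a list of nodes and directed edges. The graph is a hypothetical one (a loop followed by an if-then-else), not the graph in the figure, and the node names are invented. Connected components are counted by ignoring edge direction.

```python
def count_components(nodes, edges):
    """Count connected components, treating edges as undirected."""
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen = set()
    components = 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:  # depth-first walk of one component
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(neighbours[node] - seen)
    return components


def cyclomatic_complexity(nodes, edges):
    """Return M = E - N + 2P for a control flow graph."""
    p = count_components(nodes, edges)
    return len(edges) - len(nodes) + 2 * p


# Hypothetical control flow graph: one loop, then one if-then-else.
nodes = ["entry", "loop_test", "loop_body", "cond", "then", "else", "exit"]
edges = [
    ("entry", "loop_test"),
    ("loop_test", "loop_body"),
    ("loop_body", "loop_test"),  # back edge of the loop
    ("loop_test", "cond"),
    ("cond", "then"),
    ("cond", "else"),
    ("then", "exit"),
    ("else", "exit"),
]
print(cyclomatic_complexity(nodes, edges))  # 8 - 7 + 2*1 = 3
```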



The same function as above, shown as a strongly connected control flow graph, for calculation via the alternative method.
For this graph, E = 10, N = 8 and P = 1, so the cyclomatic complexity of the program is still 3.

An alternative formulation is to use a graph in which each exit point is connected back to the entry point. In this
case, the graph is said to be strongly connected, and the cyclomatic complexity of the program is equal to the
cyclomatic number of its graph (also known as the first Betti number), which is defined as:[2]

M = E − N + P

This may be seen as calculating the number of linearly independent cycles that exist in the graph, i.e. those
cycles that do not contain other cycles within themselves. Note that because each exit point loops back to the
entry point, there is at least one such cycle for each exit point.

For a single program (or subroutine or method), P is always equal to 1. Cyclomatic complexity may, however,
be applied to several such programs or subprograms at the same time (e.g., to all of the methods in a class),
and in these cases P will be equal to the number of programs in question, as each subprogram will appear as a
disconnected subset of the graph.
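The two formulations can be compared on a small case with P = 2. The sketch below is an illustration only: it assumes two hypothetical subprograms, `f` (a single if-then-else) and `g` (straight-line code), held in one graph, and it closes each subprogram by adding an edge from its exit back to its entry.

```python
def components(nodes, edges):
    """Count connected components with a small union-find, ignoring direction."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(n) for n in nodes})


def complexity_open(nodes, edges):
    """M = E - N + 2P on the graph as drawn."""
    return len(edges) - len(nodes) + 2 * components(nodes, edges)


def complexity_closed(nodes, edges, exit_to_entry):
    """M = E - N + P after linking each exit back to its entry."""
    closed = list(edges) + list(exit_to_entry)
    return len(closed) - len(nodes) + components(nodes, closed)


# Hypothetical subprogram f: one if-then-else. Subprogram g: no decisions.
nodes = ["f_entry", "f_then", "f_else", "f_exit", "g_entry", "g_exit"]
edges = [
    ("f_entry", "f_then"), ("f_entry", "f_else"),
    ("f_then", "f_exit"), ("f_else", "f_exit"),
    ("g_entry", "g_exit"),
]
back_edges = [("f_exit", "f_entry"), ("g_exit", "g_entry")]

print(complexity_open(nodes, edges))                # 5 - 6 + 2*2 = 3
print(complexity_closed(nodes, edges, back_edges))  # 7 - 6 + 2 = 3
```

Both calculations give 3, which is the sum of the complexities of `f` (2) and `g` (1) taken separately.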

It can be shown that the cyclomatic complexity of any structured program with only one entrance point and one
exit point is equal to the number of decision points (i.e., 'if' statements or conditional loops) contained in that
program plus one.[2][3]
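The "decision points plus one" rule lends itself to a rough automated count. The sketch below uses Python's standard `ast` module on a hypothetical single-entry, single-exit function. It is a simplification under stated assumptions: it counts only `if`, `while`, `for` and conditional expressions, and ignores compound boolean conditions, exception handlers and comprehensions, which real tools usually also count.

```python
import ast

# Node types treated as decision points in this simplified count.
DECISION_NODES = (ast.If, ast.While, ast.For, ast.IfExp)


def decision_points_plus_one(source):
    """Estimate cyclomatic complexity as (number of decision points) + 1."""
    tree = ast.parse(source)
    decisions = sum(isinstance(node, DECISION_NODES) for node in ast.walk(tree))
    return decisions + 1


# Hypothetical function with one loop and two if statements.
SAMPLE = '''
def classify(values):
    total = 0
    for v in values:
        if v >= 0:
            total += v
    if total > 100:
        label = "large"
    else:
        label = "small"
    return label
'''

print(decision_points_plus_one(SAMPLE))  # 3 decision points + 1 = 4
```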

Cyclomatic complexity may be extended to a program with multiple exit points; in this case it is equal to:

π − s + 2

where π is the number of decision points in the program, and s is the number of exit points.[3][4]

Formal definition
Formally, cyclomatic complexity can be defined as a relative Betti number, the size of a relative
homology group:

M := b₁(G, t) := rank H₁(G, t)

which is read as “the first homology of the graph G, relative to the terminal nodes t”. This is a technical way of
saying “the number of linearly independent paths through the flow graph from an entry to an exit”, where:

 “linearly independent” corresponds to homology, and means one does not double-count backtracking;

 “paths” corresponds to first homology: a path is a 1-dimensional object;

 “relative” means the path must begin and end at an entry or exit point.

This corresponds to the intuitive notion of cyclomatic complexity, and can be calculated as above.

Alternatively, one can compute this via the absolute Betti number (absolute homology, not relative) by identifying
(gluing together) all terminal nodes on a given component (or equivalently, drawing paths connecting the exits to
the entrance), in which case (calling the new, augmented graph G̃, which is G/t), one obtains:

M = b₁(G̃) = rank H₁(G̃)

This corresponds to the characterization of cyclomatic complexity as “number of loops plus number of
components”.

Etymology / Naming
The name Cyclomatic Complexity presents some confusion, as this metric does not only count cycles (loops)
in the program. Instead, the name refers to the number of different cycles in the program control flow graph,
after having added an imagined branch back from the exit node to the entry node.[2]

A better name for popular usage would be Conditional Complexity, as "it has been found to be more
convenient to count conditions instead of predicates when calculating complexity".[5]

Applications

Limiting complexity during development


One of McCabe's original applications was to limit the complexity of routines during program development; he
recommended that programmers should count the complexity of the modules they are developing, and split
them into smaller modules whenever the cyclomatic complexity of the module exceeded 10.[2] This practice was
adopted by the NIST Structured Testing methodology, with an observation that since McCabe's original
publication, the figure of 10 had received substantial corroborating evidence, but that in some circumstances it
may be appropriate to relax the restriction and permit modules with a complexity as high as 15. As the
methodology acknowledged that there were occasional reasons for going beyond the agreed-upon limit, it
phrased its recommendation as: "For each module, either limit cyclomatic complexity to [the agreed-upon limit]
or provide a written explanation of why the limit was exceeded."[6]

Implications for Software Testing


Another application of cyclomatic complexity is in determining the number of test cases that are necessary to
achieve thorough test coverage of a particular module.

It is useful because of two properties of the cyclomatic complexity, M, for a specific module:

 M is an upper bound for the number of test cases that are necessary to achieve a complete branch
coverage.

 M is a lower bound for the number of paths through the control flow graph (CFG). Assuming each test
case takes one path, the number of cases needed to achieve path coverage is equal to the number of paths
that can actually be taken. But some paths may be impossible, so although the number of paths through the
CFG is clearly an upper bound on the number of test cases needed for path coverage, this latter number
(of possible paths) is sometimes less than M.

All three of the above numbers may be equal: branch coverage ≤ cyclomatic complexity ≤ number of paths.

For example, consider a program that consists of two sequential if-then-else statements.

if( c1() )
    f1();
else
    f2();

if( c2() )
    f3();
else
    f4();

The control flow graph of the source code above; the red circle is the entry point of the function, and the blue circle is the exit
point. The exit has been connected to the entry to make the graph strongly connected.

In this example, two test cases are sufficient to achieve complete branch coverage, while four are necessary
for complete path coverage. The cyclomatic complexity of the program is 3, as the strongly connected graph
for the program contains 9 edges, 7 nodes and 1 connected component (9 − 7 + 1 = 3).

In general, in order to fully test a module all execution paths through the module should be exercised. This
implies a module with a high complexity number requires more testing effort than a module with a lower value
since the higher complexity number indicates more pathways through the code. This also implies that a module
with higher complexity is more difficult for a programmer to understand since the programmer must understand
the different pathways and the results of those pathways.

Unfortunately, it is not always practical to test all possible paths through a program. Considering the example
above, each time an additional if-then-else statement is added, the number of possible paths doubles. As the
program grows in this fashion, it quickly reaches the point where testing all of the paths is impractical.
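The growth can be made concrete with a short Python sketch. It assumes a program made of n sequential if-then-else statements whose conditions are independent, so that every combination of outcomes is a feasible path. The function names are illustrative only.

```python
from itertools import product


def all_paths(n):
    """Every combination of outcomes for n independent sequential conditions."""
    return list(product([True, False], repeat=n))


def cyclomatic_complexity(n):
    """Decision points plus one, for n single-condition if-then-else statements."""
    return n + 1


def covered_branches(tests):
    """The set of (condition index, outcome) pairs exercised by the tests."""
    return {(i, outcome) for test in tests for i, outcome in enumerate(test)}


for n in (1, 2, 3, 10):
    # Two tests (all true, all false) always reach every branch.
    two_tests = [(True,) * n, (False,) * n]
    full_branch_coverage = len(covered_branches(two_tests)) == 2 * n
    print(n, full_branch_coverage, cyclomatic_complexity(n), len(all_paths(n)))
```

For n = 10 the complexity is only 11, while the number of paths is already 1024. The tests needed for branch coverage stay at two throughout.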

One common testing strategy, espoused for example by the NIST Structured Testing methodology, is to use
the cyclomatic complexity of a module to determine the number of white-box tests that are required to obtain
sufficient coverage of the module. In almost all cases, according to such a methodology, a module should have
at least as many tests as its cyclomatic complexity; in most cases, this number of tests is adequate to exercise
all the relevant paths of the function.[6]

As an example of a function that requires more than simply branch coverage to test accurately, consider again
the above function, but assume that to avoid a bug occurring, any code that calls either f1() or f3() must also
call the other.[note 2] Assuming that the results of c1() and c2() are independent, that means that the function as
presented above contains a bug. Branch coverage would allow us to test the method with just two tests, and
one possible set of tests would be to test the following cases:

 c1() returns true and c2() returns true

 c1() returns false and c2() returns false

Neither of these cases exposes the bug. If, however, we use cyclomatic complexity to indicate the number of
tests we require, the number increases to 3. We must therefore test one of the following paths:

 c1() returns true and c2() returns false

 c1() returns false and c2() returns true

Either of these tests will expose the bug.
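The scenario can be simulated in a few lines of Python. This is a sketch under the stated assumption that f1() and f3() must be called together or not at all. The `run` and `violates_rule` helpers are hypothetical stand-ins that only record which functions would have been called.

```python
def run(c1, c2):
    """Mirror the two if-then-else statements and record which functions ran."""
    called = []
    called.append("f1" if c1 else "f2")
    called.append("f3" if c2 else "f4")
    return called


def violates_rule(called):
    """Assumed rule: f1 and f3 must both be called, or neither."""
    return ("f1" in called) != ("f3" in called)


# The two tests that already give full branch coverage.
branch_coverage_tests = [(True, True), (False, False)]
# A third test, bringing the total up to the cyclomatic complexity.
extra_test = (True, False)

print([violates_rule(run(*t)) for t in branch_coverage_tests])  # [False, False]
print(violates_rule(run(*extra_test)))  # True
```

The branch-coverage pair never mixes outcomes, so the violation only shows up once a test takes one of the two remaining paths.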

Cohesion

One would also expect that a module with higher complexity would tend to have lower cohesion (less than
functional cohesion) than a module with lower complexity. The possible correlation between a higher complexity
measure and a lower level of cohesion is predicated on a module with more decision points generally
implementing more than a single well-defined function. A 2005 study showed stronger correlations between
complexity metrics and an expert assessment of cohesion in the classes studied than between
the expert's assessment and metrics designed to calculate cohesion.[7]

Correlation to number of defects


A number of studies have investigated cyclomatic complexity's correlation to the number of defects contained
in a module. Most such studies find a strong positive correlation between cyclomatic complexity and defects:
modules that have the highest complexity tend to also contain the most defects. For example, a 2008 study by
metric-monitoring software supplier Enerjy analyzed classes of open-source Java applications and divided
them into two sets based on how commonly faults were found in them. They found a strong correlation between
cyclomatic complexity and faultiness, with classes with a combined complexity of 11 having a probability
of being fault-prone of just 0.28, rising to 0.98 for classes with a complexity of 74.[8]

However, studies that control for program size (i.e., comparing modules that have different complexities but
similar size, typically measured in lines of code) are generally less conclusive, with many finding no significant
correlation, while others do find correlation. Some researchers who have studied the area question the validity
of the methods used by the studies finding no correlation.[9]

Les Hatton claimed, in a keynote at TAIC-PART 2008 (Windsor, UK, September 2008), that McCabe Cyclomatic
Complexity has the same prediction ability as lines of code.
