September 1999
Table of Contents

FUNCTIONS

ARRAYS
1. Introduction
2. Defining and referencing arrays
3. Array initialisation
4. Multi-dimensional arrays
5. Arrays as function arguments
6. Pointers and arrays
7. Character strings and variable pointers
8. Character string input/output
9. Arrays of pointers and pointers to pointers
10. Command line arguments
11. Initialising pointer arrays
12. Review
13. Summary
14. An array application - Stack of char

1. Introduction
2. The steps to produce an executable
3. Types, storage class and scope
4. Local duration
5. Declaration versus definition
6. Static duration
7. Storage class static
8. Static local variables
9. Static global variables
10. The C++ pre-processor
11. Conditional compilation
12. Conditional file inclusion

1. Data Types
2. Abstract Data Types
3. Classification
4. Categories of Collection
5. Stacks
6. Abstract Data Type?
7. Queues
8. Lists
9. Structs
10. Unions

1. Structures
2. Comparison between structs and arrays
3. Storage Management
4. Dynamic Data Structures - Linked Lists
5. Other dynamic structures

SORTING
1. Introduction
2. Components of Sorting
3. Sorting Files
4. Why sort?
5. Does it pay to sort?
6. What is the best sort?
7. Sorting efficiency
8. Simple Array Sort - Exchange (Bubble)
9. Insertion Sort
10. Simple Sort performance
11. Conclusions
12. Complex sorts
13. QuickSort
14. Efficiency of Quicksort
15. C++ code for function Quicksort ( see Wirth )
16. Comparison of complex sorting algorithms
17. Further Reading

TESTING
1. The context for testing - Verification and Validation
2. The objectives of testing
3. Testing & Debugging
4. Two different testing strategies
5. Categories of Testing
6. Test Planning
7. How much testing?
8. Test Data v Test Cases
9. Black box v White box testing
10. Black box testing
11. White box testing - Introduction
12. White box testing
13. Automated Testing

1. Representing Abstract Structure
2. Implementing Data Structures
3. Metrics
4. Mathematical Notations

1. Applications
2. Implementation
3. Variations
4. Example Declaration
5. Expression Trees
6. Tree Traversal
7. Parse Trees
8. Binary Search Trees
9. Importance of Balance
10. Other types of tree

1. Applications
2. Operations
3. Efficiency
4. Problem
5. Hashing
6. Collision Resolution
7. Hash Table example
8. Perfect Hashing Functions

LIBRARIES
1. The ctype library
2. The maths library
3. The standard library

BIBLIOGRAPHY
Basic C++

2. Data Types

There are a number of basic data types built in to all programming languages. A data type consists of a name and a specification of:

- the range of values that a variable of that type can hold - its domain. This range is often limited by the amount of storage that is used by such items.
- the operations that may be carried out on values of that type.

In C++, the most common data type is int - whole numbers that may be positive or negative, i.e. the integers. The amount of storage allocated to variables of type int is often 2 bytes and sometimes 4 bytes depending on the compiler. This allows a range of values from

- -32768 to 32767 in the case of 2 bytes, and
- -2,147,483,648 to 2,147,483,647 where 4 bytes are employed.

These peculiar ranges arise from use of the binary system. The fundamental native data types and their storage size in GNU C++ are:

    Type             Range of values                                    Bytes
    char             Character codes 0 - 127                            1
    unsigned char    Unsigned character codes 0 - 255                   1
    short int        Signed integer -32768 to 32767                     2
    int              Signed integer -2,147,483,648 to 2,147,483,647     4
    unsigned int     Unsigned integer 0 - 4,294,967,295                 4
    long int         Signed integer -2,147,483,648 to 2,147,483,647     4
    float            1.17549e-38 to 3.40282e+38                         4
    double           2.22507e-308 to 1.79769e+308                       8

Note that, unlike some compilers, GNU C++ uses 4 bytes for type int, thus providing the same range of values as type long int (or just long). Unsigned integers have double the positive capacity of signed integers because there is no need to store the sign.
3. String Constants

A string constant is a sequence of characters enclosed in double quotes, e.g. "MSc Information Technology". The sequence may be empty, e.g. "". If the string is to include certain characters, e.g. double quotes and the backslash, then these must be escaped with the backslash character '\', e.g.

    "She said \"I have lost my file mydir\\myprog.cpp\""

When output, this would display:

    She said "I have lost my file mydir\myprog.cpp"

Other special characters may be included, e.g.

    \n    newline         \?    question mark
    \t    tab             \'    single quote
    \f    formfeed        \a    alarm bell

A string constant can extend over 2 or more lines by placing a backslash at the end of an uncompleted line. Two adjacent strings are concatenated to form a single string, e.g.

    "This string " "is concatenated with this one"

There is no native data type string in C++. Instead, strings are implemented as an array of characters terminated by the special character '\0' (ASCII NUL), i.e. the unprintable character which has the ASCII code 0:

    index        0    1    2    3    4    5
    character    H    e    l    l    o    \0

We will cover arrays later - they are a very important compound data type holding a sequence of data items in a contiguous area of memory.

Strings and characters are not the same. A string containing only a single character, e.g. "W", actually occupies 2 bytes of storage: one for the 'W' and one for the ASCII NUL. A character variable can hold only one single character, e.g. 'W', normally occupying only one byte.

To declare a variable of type string and give it a value immediately:

    char myname[] = "Terry Chapman";

If the string is not intended to be changed, it should be declared as a constant:

    const char myname[] = "Terry Chapman";

The empty brackets signify an array whose size is determined automatically by the compiler, which also reserves space for the terminating ASCII NUL. The variable or constant can be output in the usual way, i.e.

    cout << myname;
4.
Identifiers must start with a letter or an underscore. After this, they may contain any number of letters, digits or the underscore character. They must not include spaces.

    int this_is_a_very_long_identifier_with_99 = 99;   // valid
    float The Average;                                  // invalid - contains a space
    char 2good;                                         // invalid - starts with a digit

You must use meaningful identifiers. They are part of the program's documentation and should be expressive of the purpose for which the identifier is required. An exception to this is loop control variables that have no other purpose than to access elements of an array. These are commonly a single character, e.g. i, j.

Constants are named items that cannot change. These are used for values in the program that will remain constant throughout the program's execution. They must be initialised with a value. Examples:

    const double pi = 3.14159265359;
    const int numitems = 350;
5. Arithmetic Operators

    +    unary plus or addition
    -    unary minus or subtraction
    *    multiplication
    /    division
    %    modulus

Note that there is no exponentiation operator that raises a number to a power. There are library routines that accomplish this. The above operators apply to all numeric types (except %). Modulus produces the remainder after integer division and applies only to integral types, e.g. 5 % 2 = 1, 11 % 3 = 2, 19 % 5 = 4.
You should find a table of operator precedence in your textbook. 2 + 3 * 4 means "add 2 to the product of 3 and 4". If you want it to mean "add 2 to 3 and then multiply by 4" you must change the precedence with parentheses: (2 + 3) * 4.

A combination of arithmetic operators and arithmetic constants or variables is known as an arithmetic expression. An expression has a value, thus 10 * 3 has the value 30. A statement on the other hand is a command to carry out processing, e.g. x = 10 * 3; is a statement that means assign to the variable x the value of the expression 10 * 3.

You might have rationalised the difference between a statement and an expression by thinking to yourself that an expression has a value whereas a statement does not. You would be correct if you were talking about most conventional programming languages like Pascal, Modula-2 and BASIC, where assignment is a statement. But you would be wrong if you were talking about C and C++ since, in these languages, an assignment is itself an expression and so has a value - in the above example, x = 10 * 3 has the value 30, and only becomes a statement when the terminating semicolon is added. This value can be used for further operations, e.g. for assignment to another variable:

    y = x = 10 * 3;   // both x and y now have the value 30
6. Type conversions
There are two aspects:

- Automatic conversions carried out by the compiler. These are discussed below (para 7).
- Type conversion operators. These use the name of a type as a function in order to force an expression into a particular type, e.g. int(99.21) will yield 99.
7. Assignment operator

C++ carries out automatic type conversion so that the result of an expression on the right hand side of the assignment symbol is automatically converted (if possible) into the type of the variable on the left hand side. This is convenient in many ways, but there are occasions when you need to know what the exact effect is. It is like letting a futuristic washing machine automatically decide what program to use according to the clothes you put in. What program does the machine decide to use when you wash a silk shirt and a very dirty towel? Do you get a grubby towel or a ruined silk shirt? Ultimately you will need to know what the conversion rules are, but do not worry about them at present. In any case it is desirable not to make a habit of mixing your washing since you may get a result you did not expect.

Briefly, fractional values (types float and double) are truncated when assigned to integral variables (int, unsigned int, long int). Large values that exceed the capacity of the integral variable to which they are assigned will cause overflow and the result will be meaningless. No overflow warning is issued, and care should be taken when writing expressions with integral values to ensure that overflow does not occur.
8.
Note that the last may only be used with integers, all others may be used with any arithmetic type. Note also the effect of sum /= 3 + 7. The expression 3 + 7 is evaluated first.
9.
10. Iostream library
Input and output in C++ is based on streams. A stream is an abstract concept that you do not need to worry about. Just think of the natural phenomenon. Whenever a C++ program executes, three streams are opened automatically - standard input, standard output and standard error. Normally, standard input is expected to come from the keyboard and standard output is sent to the display. However standard input and standard output can be redirected from the DOS command line using the < and > characters when the program is executed. Standard error cannot be redirected.
12. Streams
Access to istream (input stream) and ostream (output stream) operators is obtained by putting the preprocessor directive #include<iostream> at the top of each program file that needs to carry out standard input and/or standard output. This has the effect of including the header file iostream.h (a text file) in the compilation.
    cout << "A message : " << message << endl;

where message is a string constant or variable.

<< and >>
    are known as the insertion and extraction operators. The unusual notation arises from the object-oriented aspects of the language. Just take it for granted at present.
endl
    causes subsequent output to be displayed on the next line of the display.
cin.get(ch)
    gets a single character from standard input and returns the state of the standard input stream.
cout.put(ch)
    puts a single character to standard output and returns the state of the standard output stream.
cout.good(), cin.good()
    return true if there has been no error from the last output (input) operation.
cout.bad(), cin.bad()
    return true if the stream has suffered a serious error. This is broadly the opposite of good(), except that reaching end of input makes good() false without making bad() true.
cin.eof()
    returns true if end of input has been encountered, false otherwise. When entering from the keyboard, end of input is indicated with Ctrl Z.
All of the fundamental types supported by C++ (including strings) may be input using cin >> and output using cout <<.
13. Output manipulators
As their name implies, these allow formatting of the output stream for such things as the field width, justification, decimal precision etc. They are normally included within the output statement - see examples below and Skansholm pp 365-369. Use of these manipulators requires that the header file iomanip be included in the program:

    #include<iomanip>

setw(int)
    sets the field width to n characters for the output, e.g.
        cout << "22 right adjusted in field width of 4 is [" << setw(4) << 22 << "]";
    produces
        22 right adjusted in field width of 4 is [  22]
    setw must be repeated for each subsequent output for which a field width is required. In the absence of setw() the field width is the actual width of the output.

setfill(char)
    specifies the character that is to be used for padding output that is narrower than the field width, e.g.
        cout << "[" << setw(4) << setfill('*') << 22 << "]";
    produces
        [**22]

setprecision(int)
    changes the precision for the display of types float and double (the default is 6 digits). Normally it determines the number of significant digits displayed. If the showpoint flag (see below) has been set, trailing zeros are displayed as well, and if the fixed flag (ios::fixed) has been set, it controls the number of decimal places displayed.

setiosflags(flags)
    changes flags that control such things as justification, precision etc.
    setiosflags( ios::showpoint ) forces the decimal point to be displayed even for whole numbers.
    setiosflags( ios::left ) and setiosflags( ios::right ) determine the justification of the output, which will remain unchanged until the flag is modified by another call.
    setf() is a member function of the stream and does the same job as setiosflags except that it cannot be used within an output statement as setiosflags can. It would be called by e.g. cout.setf(ios::right);
The items starting with ios:: within the parentheses after setiosflags are constants that are defined in the iostream library. Their names are self-explanatory and you do not need to know their values. The meaning of ios:: will only be explained in a subsequent module unless you read up on it yourself. A program basiccpp.cpp is provided in the lab that shows the effect of setw(n) and some of the flags that can be set using setiosflags(), including display of integer in octal and hexadecimal.
14. Relational operators and expressions
14.1 Relational operators
    <     less than
    >     greater than
    <=    less than or equal
    >=    greater than or equal
    ==    equal (Note: 2 equal signs with no space between)
    !=    not equal
16. Logical operators and expressions
16.1 Logical operators.
The draft C++ ANSI standard introduced the new operator names and, or and not. These are not supported by the GNU C++ compiler, nor by Borland 4.5. Instead use &&, || and !

    &&    (or and)    logical AND
    ||    (or or)     logical OR
    !     (or not)    unary negation
18. The while statement
This is one of several iteration constructs provided by C++, and is the simplest.

    while ( logical expression == true )
        <statement>

The parentheses () are required. If there is more than one statement to be executed within the loop then braces { } are required:

    while ( logical expression == true )
    {
        statement1;
        statement2;
        etc..
    }

[Figure: flowchart of the while statement - the condition is set up, then tested; while it is true the statement is executed and the condition set up again, otherwise control passes to the next statement]
Example
    // show.cpp
    // copies its input to its output
    #include <iostream>
    using namespace std;

    int main(void)
    {
        char ch;
        cin.get(ch);            // get a character from the keyboard
        while ( cin.good() )    // becomes false if end of file or other input problem
        {
            cout.put(ch);       // output the character to the display
            cin.get(ch);        // get the next char in preparation for the next loop iteration
        }
        return(0);
    }

The while statement is preceded by a statement cin.get(ch) that sets up the value to be tested by while. This is important because the termination condition may already exist, in which case the loop should not be entered. If the loop is entered, then cin.get(ch) is repeated at the bottom of the loop to set up the condition again. This is invariably the way that files are processed since they may be empty. It is a common error to forget to initialise the test condition before entering the while loop.

This program can be used to display the contents of a text file if issued at the DOS command line using redirection:

    show < show.cpp

displays the source program file show.cpp at the terminal. The output can also be redirected, giving a file copy:

    show < show.cpp > showcpy.txt

showcpy.txt is now an exact copy of show.cpp.

Here is a refinement of the above program:
    // show2.cpp
    // copies its input to its output
    #include <iostream>
    using namespace std;

    int main(void)
    {
        char ch;
        while ( cin.get(ch) )   // becomes false if end of file or other input problem
            cout.put(ch);       // output the character to the display
        return(0);
    }

The get( ch ) function is called within the loop condition parentheses. The expression cin.get(ch) does two things: a) it gets a character from standard input and passes it back via its argument ch, and b) it returns a reference to the standard input stream cin as its function result. When used as a condition, the stream has the value false (0) when there is no further input, and this is the condition being tested by while. This does away with the need for the get prior to entry of the loop, and also with the get at the bottom of the loop.
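[Figure: flowchart of the if statement - the condition is tested; if it is true the statement(s) are executed, and control then passes to the next statement]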
20. Style for logical expressions
In natural language we can say "If late for lecture then hurry else have another coffee". We do not say "If late for lecture is true then ...". Similarly, in programming, the test of a logical value, e.g. in an if statement, would be written as

    if( late_for_lecture )
        hurry();
    else
        have_another_coffee();

and not

    if( late_for_lecture == true )
        hurry();

It is generally considered to be poor programming style to use this second approach and you will lose marks if you use it.
Functions
1. Introduction
You have already seen and used a function - the function main which every C++ program must have. Until now it has been reasonable to write all of the code of your programs in this function. However, as programs become larger, it is necessary to break them down into collections of smaller and more manageable units. One such subdivision is the function. Functions give us the ability to store a computation in a named block of code and to carry out the computation simply by referring to its name, i.e. by calling the function.

This facility for breaking programs down into simpler and more manageable units is a major weapon in the fight to reduce the complexity of large programs and involves the process of abstraction. Abstraction allows us to concentrate on the current task and to ignore details that are not relevant. So when we call a function, e.g. sqrt to find the square root of a number, we are concerned only with how to make the call and not what steps the function takes to achieve the computation. We do need to know the data type of the number to be passed to sqrt, the data type of the value returned by it and what happens if we pass a negative value etc. - these aspects are relevant to our making the call, but the actual details of the computation are not relevant.

Of course, at different times we will have different levels and views of abstraction - if we had been concerned with writing function sqrt then we would have been concentrating our attention on expressing the algorithm to compute the square root of a number and would have ignored unnecessary detail elsewhere (e.g. the other functions which make up the maths library). A further advantage of storing code in functions is, of course, the ability to re-use them in other programs.

This type of abstraction is called procedural abstraction after procedures, the name that most other languages use to refer to these named blocks of code.
Technically a function differs from a procedure in that it returns a value, whereas a procedure does not. C++ does not have procedures, but it is possible to specify that a function does not return a value. Functions in C, C++ and most other languages (except the functional languages) do not conform closely to the mathematical concept of a function that accepts a single argument and returns a single value. As we shall see, it is possible to pass more than one value to a function and to get back more than one result.
The structure of a function is:
    type_specifier function_name(argument_list)
    {
        definition_and_statement_list
    }
where:
type_specifier
    The data type of the value that is returned by the function.
function_name
    A programmer-defined identifier that conforms to the rules for identifiers. This is the name that is used to call the function.
formal argument_list
    The names and types of the values that are passed to the function on which it is to carry out some computation.
definition_and_statement_list
    Exactly what you have been writing in function main up until now, i.e. constant and variable definitions and statements including (normally) a return statement that provides the value returned back to the point of the call, e.g. the return(0) appearing at the bottom of main.
Example:
You are writing a program which needs to compute values raised to a power. There is no exponentiation operator in C++, so you must develop one yourself. You want to be able to write e.g.:
    result = power(12,3);
where result is an integral variable which is to be given the value 12 raised to the power 3 (i.e. 1728). On other occasions in the same program different numbers are to be raised to different powers, e.g. the statement
    cout << power(7,5) << endl;
outputs 7 raised to the power 5, i.e. 16,807. So the function must be generalised to handle a range of different inputs for its single result. This generalisation is provided by the argument list. In the call to the function, the values passed to the function are known as the actual arguments, i.e. 12 and 3 in the first example above, and 7 and 5 in the second example. In the definition of the function they are known as the formal arguments. It is important that you understand this distinction because these two terms are used frequently when talking about functions. Assuming that we want to be able to handle some large resulting values, the integral return type should probably be of type long int. The type of the arguments can be left as plain integer. The formal specification of function power is then:
    long int power(int a, int b)    // long power (without the int) is also OK
    {
        definition_and_statement_list
        return (<long_integer_expression>);
    }
where <long_integer_expression> is an expression yielding the result of raising a to the power b. When the call power(12,3) is made the actual argument values 12 and 3 are copied into their respective formal argument variables a and b. If the actual arguments had been integer variables (as opposed to constants) with the same values (12 and 3), then the values of the actual argument variables would have been copied into the formal argument variables producing exactly the same effect. In the function, the formal arguments a and b are effectively local variables of the function.
Any variable definitions made in the body of the function are also local variables. This means that they are not accessible from outside the function. In fact, normally, they only exist while the function is executing and are then removed from memory. Inside the body of power there will be an appropriate computation that produces a value representing a raised to the power b, and this value will be passed back by the return statement. A function normally has a value (unless its return type is void) and can therefore be used on the right hand side of an assignment or within a cout statement in just the same way as a variable or an arithmetic expression. In fact a call to a function which returns a value is an expression. In the case of the statement:- result = power(12,3), the returned value will therefore be assigned to result. The value returned by power can be used anywhere else that an expression of long int type is required, e.g. in cout << power(7,5) // 16,807 or even as the actual argument of a call to another function. cout << power( power( 2, 3 ), 4 ) // 4,096
2. Input and output in functions
In general, it is considered good practice to isolate input and output statements in one particular area of a program. This is because I/O tends to be hardware-specific and it is easier to make changes for a different machine platform or display device if all the I/O code is in one place. When writing small programs in a learning situation, it is not always easy to follow this guide for best practice. But, wherever possible, try to confine I/O to one or more suitable functions rather than spreading it across the program in a number of functions whose primary purpose is not I/O. In particular, it is not good practice to carry out I/O in low level functions. The reason for this is that a function that may be re-used many times in many different programs cannot know how the calling program wishes its output to be displayed, whereas the calling program does know this. Different operating environments have different ways of displaying output to the user of the program, so a low level routine that displays output for a character console could not be used in a program that runs in a windowing environment.
3. Multi-function programs
There must always be a function called main in any C++ program. There may be any number of other functions in the same source program file (or indeed in other source program files). The question then arises - where do you put these other functions? C++ does not allow functions to be nested within other functions (unlike Pascal and Modula-2). So additional functions may appear textually either before function main, or after it. When the compiler scans the source text of a program, it will flag an error if it finds a call to a function whose definition or declaration it has not yet encountered. So if a function is defined after main, then a function declaration must appear before the point at which the call is made. This declaration (also known as a function prototype) should normally be placed at the start of main giving the compiler sufficient information to enable it to check that the function has been called properly. This prototype will consist only of the return type, the function name, and the types of its arguments.
    // fun01.cpp
    // illustrates the placing of functions in relation to main
    // tdc 28/09/95
    #include <iostream>
    using namespace std;

    int add(int a, int b )          // this placing is deprecated
    {
        return(a + b);
    }

    int main(void)
    {
        // int mult(int a, int b);  // prototype commented out
        int x = 10, y = 3;
        cout << add( x, y ) << endl << mult(x,y) << endl;
        return(0);
    }

    int mult(int a, int b)
    {
        return (a * b);
    }
The compiler reports, at the call to mult:
    Error: Function 'mult' should have a prototype in function main()
Function add has been placed before main contrary to the recommendation for best practice above.
The prototype for function mult has been commented out, causing the compiler error. Removing the comments allows the program to compile successfully. Different organisations may set their own 'house' styles, but we will show the full definition of functions after main with prototypes normally appearing as the first definitions within the body of main. Note that the identifiers a and b in the prototype for mult are not essential. The prototype could have been int mult(int, int); // prototype with argument identifiers omitted But the argument identifiers may be included if they aid the understanding of their purpose. The compiler will also flag an error if the prototype does not match the formal definition as regards either its name, or its number and type of arguments. But it will not detect a difference between the return type as declared in the prototype and as defined in the formal definition. If there is such a difference then a run-time error is likely to result.
4.
code. They may well contain constructs such as branching (if, else) and loops (while etc.) within which the subsidiary functions are called.
5. Automatic variables
Variables declared within a function are called local variables and have the default storage class automatic (auto is the key word). Since this is the default, the storage class does not have to be given and it is normal to omit it. There are other storage classes that will be dealt with later. Scope is an important topic since the scope rules determine the visibility of objects. If an object is not visible, it cannot be changed. Your Unix password is invisible to others because, if others had access to your account, you do not know what they could do. They might let you have useful comments about your work. On the other hand they might change it, or delete it. The scope mechanism is employed to reduce the chances of some other programmer (or even yourself!) inadvertently corrupting the program by changing an object to which he/she should not have access. This is part of the concept of encapsulation which we shall cover in more detail in the second Semester. For now, work on the principle that functions should not, as a rule, use or modify global variables. As an example, if function x requires a variable to control a loop, declare that variable locally within the function. In that way, only errors within the function itself can cause the loop to run incorrectly. If a global variable were used for this purpose, there is a possibility of it being changed from outside the function while the loop is executing, causing errors that can be very difficult to identify and correct. Similarly, although there can be exceptions, functions should not modify global variables directly. Instead this should be done via arguments. More about how in a later lecture. An obvious corollary to the lack of visibility of a local variable from outside the function is that variable names may be duplicated within different functions without any clash.
6. Function values
A fairly obvious point - the value appearing after return should be of the same type as that in the definition. Thus int add(int a, int b) should, in its return statement, return an integer value. You have been doing this for some time in function main. As mentioned earlier, it is possible for a function to accept no arguments, or to return no value. In either of these cases, the reserved word void should be used, e.g. void dosomething(void) is a function which neither accepts arguments nor returns a value. In this case any return statement it contains must not supply a value (a plain return; is permitted), and a call to it must be used differently to reflect the fact that no value is returned.
    dosomething();              // i.e. a statement, not an expression
    result = dosomething();     // wrong
7. Function arguments
These are a means of passing information to a called function. It is also possible for a function to pass information back via its arguments and this will be dealt with later. Arguments are a comma-separated list of type/identifier pairs appearing within the parentheses after the function name, e.g. (int a, int b) as in function add above. Naturally, the number and type of the actual arguments supplied in the call must match the number
and type of the formal arguments with the exception of default arguments (see Default arguments on page 32). The function may modify the values of its arguments, and this will have no effect on the values of any actual argument variables used in the call. Remember that the values of the actual arguments are copied into the formal argument identifiers. This is the pass-by-value argument mechanism. The actual arguments may be any expression of the correct type. This includes a literal constant, e.g. 9.0, a variable, e.g. f, or even a call to another function which returns a value of the correct type, e.g.
    cout << sqrt( sqrt(81.0) );     // outputs 3
8.
9.
unary negation
binary subtraction of integers
binary subtraction of floats
binary subtraction of long int
We are allowed to use the same operator for semantically similar operations because it is convenient to do so even though the actual computation required is quite different - the compiler determines which computation to perform based on the type of the operands. But many languages will not allow the corresponding functions to have the same name, e.g. subtract( int, int ) - a function accepting two arguments of type int would not be permitted to exist in the same scope as subtract( float, float ) - a function accepting two float arguments. This is illogical. Fortunately for us, C++ does permit overloading of function names provided that they can be distinguished by their signatures, i.e. the number and type of their arguments. You have already seen this with the standard output stream cout that has a function << that accepts an argument of any one of the fundamental types. The language allows the function << to be declared in such a way that it can be used as an operator. Note that functions with the same name must be distinguishable by their number and type of arguments. The function return type is not taken into account in determining whether they are different.
    void print( int, int );
    void print( float, float );     // OK. different argument types
    int print( int, int );          // error: erroneous redeclaration, the return type is
                                    // not considered
10. Reference Arguments
This will be dealt with in a subsequent lecture.
12. Summary
We have looked at functions which may have formal arguments or should have the word void in the formal argument list to indicate that no arguments are required. Functions normally return a value via the return statement, and the type returned must agree with the return type provided in the definition. Functions are called by name, passing actual arguments whose values are copied into the formal arguments. Since a call to a function that returns a value is an expression (i.e. it has a value), a function call may be used in any case where an expression is expected. It is recommended that function definitions appear after the function main. This requires that function prototypes appear as the first lines of function main. Functions whose prototypes are supplied in main are private to main, i.e. the prototypes serve the requirements of main and no other functions. If there are other functions, defined after main, and before the functions they wish to call, then they will not be able to do so. There are two solutions:-
- Ensure that the definitions of the functions to be called appear before the definitions of the functions that wish to call them.
- Provide prototype declarations for the called functions before main so that they have file scope and can therefore be called from anywhere in that file.
Local variables of a function usually have the storage class auto and are not visible to code outside the function. They cease to exist after the function terminates. The formal arguments are also invisible from outside. Changes to formal arguments that are passed by value and changes to local auto variables have no effect outside the function, and their identifiers may duplicate identifiers appearing elsewhere in the program. Functions are one of the weapons that C++ provides in the war against complexity and the errors that this complexity may bring with it. They are an example of procedural abstraction and allow a program to be designed as a hierarchy of functions that progressively refine the problem by breaking it down into smaller problems. Large programs must be designed on paper using this process of stepwise refinement before the program is written. A suitable tool for this design process is a PDL (program description language), one variant of this being known as Structured English. Libraries of frequently used routines (functions) can be written and a very large number of libraries are provided with all compilers, each library containing a number of functions. Pre and post conditions provided as comments at the head of the function are an important way of specifying what they do and how they are to be used. This helps to ensure that, when a large number of tested functions is finally brought together to form a program, the various parts work together as specified. Ideally, input and output should be isolated in a limited number of functions designed for that purpose and not scattered about over many functions whose primary purpose is not I/O. Generally speaking, functions should not modify global variables and should never use global variables for such local uses as loop control.
2.
3.
Flow of Control
Equivalent to:
    if ( justify == 'L' )
        cout.setf( ios::left );
    else
        cout.setf( ios::right );
4.
In most programming languages, the for iteration construct is suitable mainly for loops whose number of iterations can be determined in advance. In C++, the for loop is much more general and can, in fact, be employed for any loop including while and do. The syntax is
    for ( expression1; expression2; expression3 )
        statement_block;
where:
expression1
    may consist of one or more expressions (separated by commas) that initialise the loop, e.g. count = 1, max = 10. A new variable declaration may be made here whose scope will extend to the end of the for loop block, e.g. int count = 1, max = 10
expression2
    is a logical expression which determines the continuation of the loop (in the same way as in a while loop), e.g. count <= max. This expression may consist of several logical expressions connected by the boolean operators && (and) and || (or).
expression3
    is an expression or expressions which will be executed at the end of each loop iteration. Normally this is used to modify the loop control variable, e.g. count++
statement_block
    is either one statement, or more than one statement surrounded by braces. The single statement may be empty.
If any of the 3 expressions is missing, the semi-colon separator must remain to show its absence.
Examples
a)  for( ; ; )
        cout << "hello" << endl;
    runs for ever, printing "hello" on a new line each time
b)  for( bool forever = true; forever ; )
        cout << "hello" << endl;
    behaves as a) because forever is forever true
c)  for ( cin.get(ch); !cin.eof() ; cin.get(ch) )
        cout.put(ch);
    this is the same as:
    cin.get( ch );
    while( !cin.eof() )
    {
        cout.put(ch);
        cin.get(ch);
    }
Note that, in example b) above, bool forever in the first expression is the declaration of a new boolean variable. It is convenient and makes programs easier to read if the declaration of variables is as close as possible to the point where they are used. This facility is one of the improvements over the C language provided by C++. Because of its versatility, there is a tendency for programmers to use the for loop exclusively and to ignore the while loop. However, the latter is designed to deal explicitly with cases where the loop should not be entered at all under certain conditions (e.g. when processing a file which may be empty). Although this condition can be handled by for as shown above, its primary purpose is for loops whose number of iterations can be determined before it is entered e.g. when processing arrays (to be covered soon). The very fact that a while loop is being used signals that it may never be entered whereas, in a for loop this fact can only be determined by inspection of its expressions.
5. The do statement
In a limited number of cases, processing requires that the loop condition is tested at the bottom rather than at the top of the loop. In other words, the statement(s) in the loop body will always be executed at least once. The format of the do loop is:
    do
        statement_block;
    while ( expression );
where expression is a logical expression yielding either true or false. As with all loop statements, if statement_block comprises more than one statement, it must be enclosed in braces:
    do
    {
        statement_1;
        statement_2;
        ...
    } while ( expression );     // Note: the test normally appears on same line as the
                                // closing brace
6. Nested loops
Frequently, a loop is nested within another loop or loops. The reasons why this might be necessary will become clearer when arrays are covered. Notice that the total number of iterations of the inner loop is the product of the number of its iterations and those of any surrounding loops. This number can escalate to very large values and can result in programs that run slowly.
    for ( int i = 0; i < 10; i++ )
        for ( int j = 0; j < 10; j++ )
            for ( int k = 0; k < 10; k++ )
                process( i, j, k );
Sometimes, it may not be obvious how many potential iterations of the inner statement will occur because, for instance, the second and third lines above may consist of function calls that, themselves, contain a loop. You should always be aware of the possibility of introducing inefficiencies into a program in this way because it may result in unacceptable performance.
8.
9.
constant_expression1,2,3 ..
statement_block
default
Example (this program is installed in the lab)
    int main(void)
    {
        void DoPrint( void );
        void DoDisplay( void );
        void DoEdit( void );
        const char EDIT = 'E', DISPLAY = 'D', PRINT = 'P', QUIT = 'Q';
        cout << "Enter choice P)rint D)isplay E)dit Q)uit : " << flush;
        for( char ch = '\0'; ch != QUIT; )      // ch initialised to null; loop until ch == QUIT
        {
            ch = toupper( getch() );    // getch from conio.h - char input without echo to the
                                        // display; toupper from ctype.h, so that a lower
                                        // case 'q' also ends the loop
            switch ( ch )
            {
                case PRINT   : DoPrint(); break;    // assumed functions DoPrint etc. are
                                                    // defined elsewhere
                case DISPLAY : DoDisplay(); break;
                case EDIT    : DoEdit(); break;
                case QUIT    : break;
                default      : cout << '\a' << endl;    // invalid response,
                               ch = '\0';               // sound the bell
            }
            cout << "Enter choice P)rint D)isplay E)dit Q)uit : " << flush;
        }
        return(0);
    }
2. Reference Type
We introduce a new data type, the reference, whose value is not an integer, float, char etc. but a reference to a variable which holds an integer, float, char etc. It is an alias for another object. Alias means another name for. Example:
    int k = 5;
    int& b = k;
and assume that variable k is stored in memory location number 46524. The value stored at memory location 46524 is 5.
Variable b, a reference variable, is declared to be a reference to variable k. It will therefore hold as its value the memory location of k, thus referencing the value of k, i.e. 5. Any assignment of a new value to k will therefore affect the value referenced by b, and any change to the value referenced by b will change the value of k. Note that the special symbol & is placed after the type (int) in the declaration of b and that this must be followed by an initialisation using a previously declared variable (not a value) of the correct type. Once this declaration and initialisation has been made, b behaves exactly as though it were an ordinary variable. The compiler looks after the necessary indirection so that e.g. the assignment
    b = 12;
is interpreted as 'assign the value 12 to the memory location referenced by b'. After this assignment, k also has the value 12, and after the further statement b++, k has the value 13. Note that b++ does not (in contrast to a pointer) increment the value stored in b, i.e. it does not change 46524 to 46525. Once a reference variable has been associated with another variable in this way, it cannot be changed so that it refers to a different variable. Thus &b = m intended to mean 'change b so that it is now an alias for m instead of k' is not allowed.
Abstraction
The fact that references hold an address does not need to be known in order to use them, whereas you must take specific action in order to make a pointer point to some other object and to obtain the value of the object pointed to (see Syntax below).
Syntax
Pointers require special symbols to be used by the programmer:
- to assign to a pointer the address of another object, i.e. to make it point to it - use the address operator &
- to yield the value of the object to which a pointer points, known as dereferencing - use the indirection operator *
Reference variables, once declared are treated as ordinary variables without the use of special symbols. The necessary indirection is looked after by the compiler. References are at a higher level of abstraction than pointers. A further difference is that pointers can be reassigned at will to point to another variable and can be incremented to step through memory. They are a much lower level tool than references as befits their origin. References cannot be reassigned to point to a different object.
Pointer variables
    int k = 5;
    int* ptr;           OK. Declaration of a pointer to int named ptr
    int* ptr = &k;      declaration and initialisation combined using address operator &
    *ptr = 12;          using indirection operator * to assign 12 to the variable to which ptr points
    cout << k;          prints 12
    cout << *ptr;       prints 12 using dereferencing
    cout << ptr;        prints the address of k e.g. 46524
    ptr++;              increments the address held by ptr which now references the memory location immediately following that of k. (k is unchanged)

Reference variables
    int k = 5;
    int& ref;           illegal. Must be initialised on declaration
    int& ref = k;       declaration and initialisation
    ref = 12;           assignment to the variable referenced by ref (no special syntax needed)
    cout << k;          prints 12
    cout << ref;        prints 12
    ref++;              increments the variable referenced by ref, k now has the value 13
5.
There may not seem to be a great deal of value in this mechanism until we meet compound data types, e.g. array and struct.
6.
7.
An example of the use of these two functions is:
    char source[25] = "GNU";
    const char *blank = " ", *cplus = "C++";    // string literals are constant
    char destination[25];
    char *p = destination;          // p points to the string destination
    p = strcat(source, blank);      // concatenate a blank onto source. p points to source
    strcat(source, cplus);          // concatenate "C++" onto source
    strcpy(destination, p);         // copy the result back into destination. p still points to
                                    // source which has been changed.
    cout << "destination = " << destination << endl;
Output:
    destination = GNU C++
9. Inline functions
Calling a function has an overhead that costs time. The runtime system has to set up a 'stack frame' and allocate space for the arguments and local variables. On termination, the stack frame has to be released and a jump made to the point immediately after the call. Very small functions can be specified as 'inline' so that the compiler will substitute the actual code of the function body for each occurrence of a call to the function. This will improve speed at the expense of code size. In fact, the use of inline is a recommendation only, and there is no guarantee that the compiler will honour it - this will depend on the compiler and the size of the inline function.
    inline int square( int a )      // inline function defined in full before its first use
    {
        return ( a * a );
    }

    int main ( void )
    {
        ...
        z = square( x );            // compiler should substitute z = x * x
        ...
    }
A test of the above program was timed for 100 million calls to function square. The elapsed time without inlining was approx 3.9 seconds and, with inlining, approx 3.05 seconds - an improvement of a little over 20%. The code size was increased by a very minor amount because the call to function square occurs only once.
Arrays
1. Introduction
Arrays are an aggregate type capable of holding a number of values all of the same type, contiguously in memory. The components may be any one of the fundamental data types int, long, unsigned, float, char, enumerated, pointer or one of the aggregate types, i.e. array, struct or class. The struct and class types have not yet been covered. The struct is referred to in other languages as record and consists of one or more fields of (possibly) different types (including arrays and records). The class data type will be covered in the Object-Oriented Programming & Design module. The advantage of the built-in array type is that a large number of data items can be held in a single named array variable whose components can be accessed randomly as we shall see later. The disadvantage is that its size is fixed at compile time and this cannot be varied at run time to accommodate the fluctuating requirements of the application. Most of the time, therefore, it is wasting space because it is not full and the type itself does not allow resizing. The solution, as we shall see later, is dynamic memory allocation.
2. Defining and referencing arrays
Example
    int table[6];               an array called table capable of holding 6 integers
    float temperatures[31];     an array called temperatures capable of holding 31 floats
    char name[16];              an array called name capable of holding 16 characters (but note that, allowing for the terminating NUL character, only 15 readable characters can be held).

    table
    index   0   1   2   3   4   5
    value   9  14   7   5   1   3
Arrays are indexed. That is, each element is uniquely numbered. The numbering always starts at 0 and always increments by 1 for each successive element (regardless of the size of the elements).
The value held by table element 0 is 9, the value held by table element 1 is 14 etc. Access to the elements (or components) is by subscripting the table name with the desired element number. Thus table[0] is an integer with the value 9, table[1] contains 14 etc. Notice that, since the numbering starts at 0, the last element always has an index one less than the number of elements. The subscripted array can be used anywhere that an expression of the component type is required:
    const int size = 6;
    int table[ size ];
    table[ 5 ] = 22;
    table[ 1 ] = table[ 5 ];    // change the value of element 1 to that of element 5
    cout << table[1];
The subscript may be any expression with an integer value, thus:
    int i = 3;
    table[ i ] = table[ size - 1 ];     // change the value of element 3 to that of element 5 (the last)
Since the array subscript can be a variable, we can process an array's elements by means of a loop using as subscript a variable that increments for each iteration of the loop:-
2.1
- The input is a valid integer
- The end of the array has not been reached
For this reason, the input is read into an auxiliary variable anint before the start of the loop and before it is assigned to an array element inside the loop. A further input is then assigned to anint at the bottom of the loop.
2.2
2.3 Shuffling array elements one position left (or down)
This requires care to avoid overwriting the changes.
    const int size = 6;
    int table[ size ] = { 0, 1, 2, 3, 4, 5 };   // initialised on declaration - see below
Original contents: 0 1 2 3 4 5
2.4
3. Array initialisation
Arrays may be initialised on declaration by enclosing a list of values within braces, separated by commas. If all elements of the array are given values in this way, the number of elements need not be supplied between the brackets after the array name:
    int table[] = { 9, 14, 7, 5, 1, 3 };
Multi-dimensional arrays may be initialised by placing braces around each row, and separating the rows with commas (see the definition of type Plane in section 4):
    Plane aPlane = { { 'X', ' ', 'X', 'X' },    // Row 1
                     { ' ', 'X', ' ', 'X' },    // Row 2 etc.
                     ....
                     { 'X', 'X', ' ', 'X' }     // Row 12, no comma
                   };
Where some initialisers are omitted, the remaining elements are set to 0, whatever the storage class of the array. Only when an auto (local function) array is declared with no initialiser at all are the values of its elements undefined. The number of elements in an array can be found by the built-in sizeof operator:
    cout << "sizeof(table) = " << sizeof(table) << endl
         << "sizeof(int) = " << sizeof(int) << endl
         << "num elements = " << sizeof(table) / sizeof(table[0]) << endl;
Output (on a machine where an int occupies 4 bytes):
    sizeof(table) = 24
    sizeof(int) = 4
    num elements = 6
But note that sizeof cannot be used in a function to find the size of an array formal argument since this is a pointer.
4. Multi-dimensional arrays
There is no theoretical limit to the number of dimensions an array may have, although the number of elements increases rapidly with the number of dimensions as do the chances of there being redundant elements. Two dimensional arrays are declared with 2 values, each enclosed in brackets:
    // airplane reservation system
    const int maxRows = 12, seatsPerRow = 4;
    typedef char Plane[maxRows][seatsPerRow];   // declares a new type based on a
                                                // fundamental type
    Plane aPlane;                               // aPlane is a variable of type Plane

    void makeEmpty( Plane aPlane )
    {
        for( int row = 0; row < maxRows; row++ )
            for( int seat = 0; seat < seatsPerRow; seat++ )
                aPlane[ row ][ seat ] = ' ';    // Space = empty
    }
Functions that operate on the Plane data structure
    bool seatFree( Plane aPlane, int row, int seat );       // return true if row,seat is a space, else false
    void allocateSeat( Plane aPlane, int row, int seat );   // mark seat allocated with an 'X'
    void showSeatingPlan( const Plane aPlane );             // show plan with spaces and Xs as opposite
[Figure: seating plan for aPlane, with seats 1 to 4 across and rows 1 to 12 down; allocated seats are marked X.]

5. Arrays as function arguments
An example of a 2 dimensional array, aPlane of type Plane, being passed to a function appears in section 4 above. In C++, an array formal argument to a function is always a pointer to the first element of the array. This is automatic, without any action on the part of the programmer. Within the function, the array may be subscripted in the normal way. This explains why, in the function makeEmpty above, it was not necessary to use a reference argument to ensure that the changed contents of the array were passed back to the point of the call. Since a pointer is passed automatically, any change made to the elements through the formal argument within the function body is, in fact, being made to the actual argument.

If it is not intended that the function should modify the array, then the argument should be declared const to indicate the fact. The compiler will then flag an error if the function body contains statements that might modify the array:

    void showSeatingPlan( const Plane aPlane )

The elements of aPlane are constant and may not appear on the LHS of an assignment within the function.
6. Pointers and arrays
This has already been introduced under pointers. Note that an unqualified array name is treated by the compiler as the address of the first element, so

    const int size = 6;
    int table[size] = { 0, 1, 2, 3, 4, 5 };
    int *ptr = table;        // assigns to ptr the address of the first element of table
    cout << *ptr;            // outputs the object to which ptr points, namely the integer 0
    *ptr = 10;               // changes the value of table[0] to 10
    ptr++;                   // moves ptr to point to the next element of the array
    cout << *ptr;            // outputs 1
    cout << *(table + 3);    // outputs 3
    cout << table[3];        // same as above, outputs 3
Unlike most other languages, C++ supports pointer arithmetic and, since the name table is converted to a pointer to its first element, a variable can be used to indicate an offset from the beginning:

    for ( int i = size - 1; i > 0; i-- )
        *( table + i ) = *( table + i - 1 );    // shuffle contents one element to the right

or, using a supplementary pointer:

    for ( int* p = table + size - 1; p > table; p-- )
        *p = *( p - 1 );

Here table + size - 1 is the address of table plus size (6) - 1 elements, i.e. the address of the last element. The compiler knows the size of an int, so p-- results in p being adjusted by sizeof(int), i.e. by 2 or 4 bytes on a PC (depending on the compiler); similarly with p - 1.
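The pointer version of the shuffle can be packaged as a function, as in the following sketch. The name shiftRight and the guard for very short arrays are assumptions added for illustration.

```cpp
// Hypothetical helper: moves every element of an int array one place to the
// right, discarding the last value. The first element keeps its old value.
void shiftRight( int *first, int count )
{
    if ( count < 2 )
        return;                 // nothing to move

    // p starts at the last element and walks back towards the first;
    // p - 1 is the element immediately to its left.
    for ( int *p = first + count - 1; p > first; p-- )
        *p = *( p - 1 );
}
```

Applied to the table above with count 6, the contents 0 1 2 3 4 5 become 0 0 1 2 3 4.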
7. Character strings and variable pointers
Notice the difference between char word[] = "hello" and char *greeting = "hello". word is an array holding a copy of the string; its name yields a constant address and cannot be assigned to. greeting is a pointer variable containing the address at which the string is stored, and it can be made to point elsewhere. (From C++11 onwards the pointer must be declared const char *, because the characters of a string literal must not be modified.)

    char word[] = "hello";
    const char *greeting = "hello";
    cout << "word[] = " << word << endl;
    cout << "greeting = " << greeting << endl;
    word = "fred";              // compiler error: "incompatible types in assignment
                                // of 'char[5]' to 'char[6]'" because word is an
                                // array name and can't be assigned
    strcpy(word, "wilfred");    // do this instead, but note that, if the new string
                                // is longer (as it is here), the extra chars are
                                // stored outside the array's allocated memory and
                                // may cause the program to crash
    greeting = "william";       // OK. No problem
    cout << "word[] = " << word << endl;    // OK. No problem
8. Character string input/output
As shown above, inserting into the output stream either the name of a character array, e.g. word, or a pointer to a character string, e.g. greeting, has the same effect. setw(<field_width>) causes a string to be output right-justified in field_width. It can be left-justified by calling

    cout.setf( ios::left, ios::adjustfield );

or by the manipulator setiosflags(ios::left), as in

    cout << setiosflags(ios::left) << setw(10) << word << endl;

cin can be used for string input, but input terminates at the first whitespace character (space, tab, newline). To avoid possible overflow by the input exceeding the space allocated to the string, setw can be used within cin to limit the number of characters entered. The excess characters are held in the input buffer and are used to satisfy any subsequent use of cin.

    const int MESSAGESIZE = 4;
    char input[MESSAGESIZE+1];
    cout << "Enter a message without spaces: ";
    cin >> setw(MESSAGESIZE+1) >> input;
    char overflow[80];
    cin >> setw(80) >> overflow;
    cout << "your input: " << input << endl
         << "the overflow was: " << overflow << endl;

To input lines of text whose length is unknown at compile time, use

    cin.getline( char *line, int limit, char delim = '\n' )

At most limit - 1 characters are stored, leaving room for the terminating '\0', so limit is normally the size of the array (e.g. 81 for a typical 80-character line of text). Input is terminated by the supplied delimiter, which defaults to newline and may be omitted to use the default. The delimiter is not stored in the array. The address at which the line is stored is held in the pointer line.
    const int linelen = 80;
    char line[linelen+1];
    while ( cin.getline( line, linelen+1 ) )    // false at end of input or if a line is too long
        cout << line << endl;
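The effect of setw on input, described earlier in this section, can be tried without typing at the keyboard. The sketch below is an illustration only: the function name splitInput is hypothetical, and a std::istringstream stands in for cin so that the behaviour is repeatable.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: reads one word of at most 4 characters from text,
// then reads whatever was left over as a second word.
// Returns the two pieces joined by '|' so that the split is easy to see.
std::string splitInput( const std::string& text )
{
    const int MESSAGESIZE = 4;
    char input[MESSAGESIZE + 1] = "";
    char overflow[80] = "";

    std::istringstream in( text );                   // stands in for cin
    in >> std::setw( MESSAGESIZE + 1 ) >> input;     // at most 4 chars + '\0'
    in >> std::setw( 80 ) >> overflow;               // picks up the excess

    return std::string( input ) + "|" + overflow;
}
```

With the text "abcdefgh" the first extraction stops after "abcd" and the second receives "efgh"; with a short word such as "hi" nothing is left over.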
9. Arrays of pointers and pointers to pointers
    -------- Inspecting ptr -------
    8F50:0FF0
    [0] 8F4C:001E "one"
    [1] 8F4C:0022 "two"
    [2] 8F4C:0026 "three"
    [3] 8F4C:002C "four"
This makes for efficient use of memory when storing large numbers of strings. The 4 arrays of char are allocated contiguously in memory and the above could be viewed as follows:

    o n e \0 t w o \0 t h r e e \0 f o u r \0

Printing this array of pointers can be done by

    for ( int i = 0; i < 4; i++ )
        cout << ptr[i] << endl;

The GNU C++ debugger built into RHIDE does not support inspect.
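The declaration of ptr is not shown in this section. The sketch below assumes that it was an array of four pointers to char initialised with string literals, which matches the debugger display; the function joinWords is a hypothetical helper added so that the loop has a visible result.

```cpp
#include <cstddef>
#include <string>

// Assumed declaration: four pointers, each holding the address of a
// string literal. The compiler counts the initialisers.
const char *ptr[] = { "one", "two", "three", "four" };

// Hypothetical helper: visits each pointer in turn and joins the strings
// they point to, separated by single spaces.
std::string joinWords()
{
    const std::size_t count = sizeof(ptr) / sizeof(ptr[0]);    // 4 pointers
    std::string result;
    for ( std::size_t i = 0; i < count; i++ )
    {
        if ( i > 0 )
            result += ' ';
        result += ptr[ i ];
    }
    return result;
}
```

Only the pointers are stored in the array; the strings themselves may be of different lengths, which is what makes this layout economical.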
10. Command line arguments
You have already encountered programs that accept command line arguments, e.g. dir /w. dir accepts an argument w that indicates a wide display of file names. The slash is just an indicator that an argument follows. MS-DOS provides the facility for programs to pick up arguments supplied at the command line when invoking a program. For example, pretty.exe might be a C++ program to 'pretty print' C++ source files; in the command line invocation

    pretty myprog.cpp

the argument myprog.cpp represents the name of the source file to be printed. In C++, information about these command line arguments is provided by 2 arguments to function main, named by convention:

- argc: the number of arguments (including the name of the executed program)
- argv: an array of pointers to char representing the strings appearing on the command line.
In the above example, argc = 2, argv[0] is a pointer to the string "pretty", and argv[1] is a pointer to the string "myprog.cpp". Whitespace on the command line separates the arguments into the individual components of argv[]. Thus a command line containing

    myprog /x/y/t myfile

would represent 3 arguments, with "myprog" in argv[0], "/x/y/t" in argv[1] and "myfile" in argv[2], whereas

    myprog /x /y /t myfile

would produce argc with the value 5, with argv[0] holding the string "myprog" and the four arguments /x, /y, /t and myfile held in elements argv[1], argv[2], argv[3] and argv[4] respectively. What these arguments mean, of course, is up to the author of myprog.

It is good practice to check the number of arguments in main and, if the number falls outside the range expected (often a variable number of arguments can be entered), to issue an error message and terminate the program. If no arguments are supplied (other than the program name, of course) and at least one is expected, then it is usual to print the program name together with a list of valid arguments. This list should not be verbose and should not exceed about 22 lines, otherwise some lines will disappear off the top of the screen.

There is a convention that MS-DOS programs expect arguments announced by the slash '/'. In Unix the character used is invariably minus '-'. Assume that you have written a program to pretty-print a C++ program, that the program name is pretty, and that 3 arguments are allowed:

1. /ln - print n lines per page, where n is an integer (optional - defaults to 60)
2. /fn - print with font size n, where n is an integer (required)
3. filename to print (required)
argc will hold a maximum value of 4 (the name of the program plus 3 arguments) and a minimum of 3. If argc < 3 or argc > 4 then there is an error and the program should display an error message to the terminal and then terminate. The error message would be something like:

    incorrect number of arguments
    usage: pretty [/ln] /fn filename
           /ln = print n lines per page
           /fn = use font size n (8..12)
Note the square brackets to indicate an optional argument. The program can then be terminated with either:

- return 1; when the error is detected in main, or
- exit(1); in other cases. exit is in cstdlib (or stdlib.h).
By convention, a non-zero value returned from main or as an argument to exit indicates an error. In both cases, other non-zero values can be used to indicate different error conditions.
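The argument checking described for pretty could be sketched as follows. This is an illustration under stated assumptions, not the course's own solution: the struct PrettyOptions and the function parseArgs are hypothetical names, the numbers are converted with atoi without further validation, and the font size range (8..12) is not checked.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical record of the settings taken from the command line.
struct PrettyOptions
{
    int linesPerPage;        // from /ln, defaults to 60
    int fontSize;            // from /fn, required
    const char *filename;    // last argument, required
};

// Hypothetical helper: returns true and fills in opts if the arguments have
// the form  pretty [/ln] /fn filename , otherwise returns false.
bool parseArgs( int argc, const char *const argv[], PrettyOptions& opts )
{
    if ( argc < 3 || argc > 4 )
        return false;                       // incorrect number of arguments

    opts.linesPerPage = 60;                 // default when /ln is omitted
    int next = 1;                           // argv[0] is the program name

    if ( argc == 4 )                        // optional /ln is present
    {
        if ( std::strncmp( argv[next], "/l", 2 ) != 0 )
            return false;
        opts.linesPerPage = std::atoi( argv[next] + 2 );
        next++;
    }

    if ( std::strncmp( argv[next], "/f", 2 ) != 0 )
        return false;                       // required /fn is missing
    opts.fontSize = std::atoi( argv[next] + 2 );
    next++;

    opts.filename = argv[next];
    return true;
}
```

In main, a false result from parseArgs( argc, argv, opts ) would be followed by the usage message and return 1.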
12. Review
You will, by now, have seen that arrays and pointers to arrays in C++ are somewhat complex and error-prone. This is because these facilities were designed over 20 years ago for 'C' (a language that was originally designed for writing operating systems) and have had to be retained in C++ for backward compatibility. In fact, the object-oriented facilities provided by C++ allow these deficiencies to be hidden from the application programmer who can use libraries of classes e.g. class string which hides the underlying shortcomings of the built-in array of char type. In particular, the disadvantage of the fixed size of built-in arrays and the absence of array bounds checking can be overcome in container classes which are provided with most C++ implementations and are now standardised as the Standard Template Library. However, we shall be concerned with how container classes are designed and written and we therefore need to understand the base facilities on which they are built. You will be provided with a simple String data type that can be used for assignments. You should read Skansholm pp 91-93 on the standard string type that is now part of the Standard Template Library. If you wish, you can use this standard type wherever strings are required.
13. Summary
- The array type allows a collection of items of the same type to be stored under a single name. The array declaration specifies the type of its components and the number of elements.
- Individual components of an array can be accessed by subscripting the array name with an integer expression, making them well suited to processing by loops.
- The compiler provides no run time checking of array bounds, so care needs to be taken to ensure that array bounds are not exceeded, otherwise memory may be corrupted.
- When an array is passed to a function, the address of the first element of the actual argument is copied into the corresponding formal argument.
- An array formal argument can be declared as either e.g. int table[] or int *table; they both mean a pointer to int (the first element of the array). Within the body of the function, the components of the array may be accessed either using subscripts, normally in the form of a variable whose values are controlled by a loop, e.g. table[ i ], or by a pointer.
- In the formal argument list of a function, a multi-dimensional array must specify the size of every dimension except the first.
- Arrays with 2 or more dimensions are likely to be specific to a particular application and are best given a new type name using typedef.
- Arrays can be initialised on declaration with values inside braces separated by commas. Any items unspecified in this way are initialised to 0. An auto array declared with no initialiser at all holds indeterminate values. This default initialisation only has meaning for the primitive types.
- Strings are one-dimensional arrays of char terminated by the ASCII NUL ('\0') character. Room must be allowed for this character, otherwise output and other routines will not behave correctly. In some programmer-defined functions that process arrays of char, the terminator must be provided by the programmer.
- Arrays of pointers to char can be used to handle arrays of strings. This is how command line arguments are provided as the second argument (argv) to function main, the first argument (argc) being an integer representing the number of arguments.
- An array of pointers to char can be initialised with a list of strings. The number of pointer elements, unless given within the brackets, is fixed by the number of strings in the initialisation list. Output of this array of char pointers could be by:

      int size = sizeof(course) / sizeof(course[0]);
      for ( int i = 0; i < size; i++ )
          cout << course[ i ] << " ";
      cout << endl;

  sizeof(course[0]) will yield either 2 or 4 (the size of a pointer), and sizeof(course) will yield either 10 or 20 (5 pointers). The value of size in either case will be 5.
14. An array application - Stack of char
A stack is an abstract data type - a type that is not provided by the programming language but which can be implemented by using the data structuring facilities of the language. A stack works on the LIFO (last in, first out) principle - the last item put onto the stack is the first to be removed from it. The last item put onto the stack is at the top of the stack and the next item to be removed will be taken from the top. Access to the stack is at one end only - the top. Compare it to a stack of plates - the next one to be used is the latest one to be placed onto the stack. The standard operations on a stack of char are:

    void push( char )         char is pushed onto the stack
    void pop( void )          the top of stack item is removed
    char top( void )          the top of stack char is returned, the stack is unchanged
    bool empty( void )        returns true if the stack is empty, otherwise false
    void makeempty( void )    the stack is made empty
One way of implementing a stack is to use an array:

    // charstck.cpp
    // illustrates an array implementation of a stack of char
    #include <iostream>
    using std::cout;
    using std::endl;

    const int MAXSTACK = 20;    // 20 elements
    char stack[MAXSTACK];       // the stack
    int thetop;                 // the index value of the current top of stack
                                // (initially empty)
    int main ( void )
    {
        void push( char ch );   // 5 function prototypes
        void pop ( void );
        char top( void );
        bool empty( void );
        void makeempty( void );

        char word[] = "abracadabra";
        makeempty();
        for ( int i = 0; word[ i ] != '\0'; i++ )    // push each letter of word
            push( word[i] );
        cout << word << " reversed = ";
        while ( !empty( ) )
        {
            cout << top( );     // output the top char and then pop
            pop( );
        }
        cout << endl;
        return 0;
    }

    void push( char ch )
    // post - ch has been placed at the top of the stack
    { ... }

    void pop ( void )
    // pre - the stack is not empty
    // post - the top of stack item has been removed
    { ... }
    char top( void )
    // pre - the stack is not empty
    // post - the top of stack item has been returned. The state of the stack is unchanged
    { ... }

    bool empty( void )
    // post - if the stack is empty, true is returned, else false is returned
    { ... }

    void makeempty( void )
    // post - the stack is empty
    { ... }

The program's output is:

    abracadabra reversed = arbadacarba

Note that the code in function main never accesses the array stack directly. All operations are carried out only via the provided routines makeempty, push, pop, top, empty. This is an example of data abstraction - the stack data structure is protected from corruption by requiring all accesses to be made through these functions. In the example, this discipline is not enforced - it is possible for the stack to be accessed directly since stack is a global variable that has file scope. We shall see later how direct access can be prevented, and how the stack can be encapsulated in a single entity that holds both the array and the variable that records the top of stack.
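The function bodies are left as { ... } in the listing. One possible way of filling them in is sketched below; it is not necessarily the intended solution. It assumes that thetop holds -1 when the stack is empty and that a push onto a full stack is silently ignored. The array is named theStack here, and the helper reversed is hypothetical, standing in for the loop in main.

```cpp
#include <string>

const int MAXSTACK = 20;        // 20 elements
char theStack[MAXSTACK];        // the stack
int thetop = -1;                // index of the current top; -1 means empty (assumption)

void makeempty( void ) { thetop = -1; }

bool empty( void ) { return thetop < 0; }

void push( char ch )
{
    if ( thetop < MAXSTACK - 1 )        // assumption: ignore a push when full
        theStack[ ++thetop ] = ch;
}

void pop( void )
{
    if ( !empty() )                     // pre - the stack is not empty
        thetop--;
}

char top( void )
{
    return theStack[ thetop ];          // pre - the stack is not empty
}

// Hypothetical helper: pushes each letter of word, then pops them all,
// which yields the letters in reverse order.
std::string reversed( const char *word )
{
    makeempty();
    for ( int i = 0; word[ i ] != '\0'; i++ )
        push( word[ i ] );

    std::string result;
    while ( !empty() )
    {
        result += top();
        pop();
    }
    return result;
}
```

As in the listing, reversed touches the array only through the five stack routines.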
- The main program file that includes a function main
- Zero or more modules providing support functions, data types etc., each comprising:
    - A header file ( .h ) that contains prototype declarations for the functions provided by the module and possibly type and data declarations.
    - A source ( .cpp ) file containing the definition of the functions, types and variables provided by the module. This file may or may not be present.
    - The object file ( .obj ) created by compiling the .cpp file (see above) that provides the definition of the functions whose declarations appear in the header.
The main program file contains compiler directives to #include the header file(s) for the supporting modules. This ensures that functions and variables, constants and types defined in the supporting source files can be accessed by the main program. In other words, the header files provide the prototypes for functions and referencing declarations for variables etc. that allow the compiler to generate code for the main program without the source of the supporting .cpp files themselves being present at compile time. At link time, the programmer must indicate which supporting object ( .obj ) files he wants to be linked with the object code of the main program. Within the GNU C++ IDE this is done by creating a project which defines all the required source files for a particular project and ensures that the object code of each is up to date before the linker links them all in to produce the executable. The project definition itself is saved as a .gpr file which can be opened and changed as required. By default, the name of the executable file will be the name of the project file. Thus assign1.gpr (the project file) will cause the executable resulting from linking all object files to be named assign1.exe regardless of the name of the main source program file. The default can be changed by the menu item Project.main targetname. Take iostream as an example. You must include the compiler directive #include<iostream> to ensure that the actual text of this header file is included in the compilation of your main program. Without this, the compiler would not be able to make sense of a call to e.g. cin.get(). You do not need the source of iostream (iostream.cpp) and it is not even present on the machine. At link time, the linker sees the header declaration and knows from this that the object file for iostream must be combined with the object code generated from the source of your main program in order to produce the executable. 
The integrated environment allows the location of the object code of iostream to be specified and the linker fetches it from that directory for inclusion. Thus we have the concept of separate program modules that consist of two parts:

- an interface part - the header file iostream
- an implementation part - the object file iostream.obj. (In fact, you will not find iostream.obj in the directory because the code is included in the library files in the lib directory).
The interface part defines the services provided by the module in terms of the functions, variables, constants and types that are provided (exported) by the module. The implementation part provides the actual implementation in the form of object code that is needed at link time. This is another example of abstraction. We need to know how to call the iostream functions, and it is convenient that objects like cin and cout are pre-declared. For this
reason, prototypes of the functions and the declaration of the standard I/O streams are made available to us in the header file iostream, but the implementation is hidden in the library files since we need not be concerned with how the functions are implemented nor how stream objects are represented. Consequently we can access the resources provided by iostream only via the routines and declarations provided in the header file (the interface). We cannot access the representation of streams because it is hidden and is therefore protected from the possible corruption that might have occurred had we been allowed direct access to it. Note that the ANSI C++ standard specifies that system header files such as iostream, string, vector etc. should not be given with a .h file name extension. However, all other modules (including those that you write) must have the extension .h. The GNU C++ compiler meets this requirement of the standard, but other, older, compilers may not and, in those cases you will have to use the old name for such system headers, e.g. iostream.h.
2.
- main.cpp: the main program file containing the function main
- other.cpp: a source file containing the definition of support functions, type and variable definitions
- other.h: the header file containing the external referencing declarations for the functions, types and variables that are defined in other.cpp

1. Select Project - Open project - call it myprog.prj
2. Add other.cpp and main.cpp to the project
3. Compile other.cpp
4. Compile main.cpp. (Header file other.h is brought in during compilation)
5. Link main.obj with other.obj

When you choose link with main.cpp:

- You are not linking the source files but the object files created by the compiler.
- The linker doesn't know what to link with main.obj unless you have a project.
- The linker links together the object code of other, main and of any library code required, e.g. iostream.

You could use make instead of compile and link. This will compile all modules whose object file has a time earlier than the source file (.cpp) and then link.
The name of the executable is the same as that of the project, i.e. myprog (not main). Some students find this process of setting up a project intimidating. But it is quite simple and has to be mastered in order to write real programs that consist of more than one file.
3.
- type
  This is important because it determines the amount of memory that is allocated for the representation of the object and also its bit pattern. Thus both the number of bytes and the pattern of the bits stored in those bytes will be completely different between e.g. an int and a float even if they appear to hold the same value.
- storage class
  This is important because it determines the lifetime of the object, i.e. how long it remains in existence occupying storage. Storage class has defaults which are determined by the position in the source code of the object's declaration. This may be varied by providing an explicit storage class on declaration. There are 3 categories of lifetime:
    - lifetime is transient and exists only for the lifetime of the enclosing block (usually a function, but see later).
    - lifetime exists for the duration of the program's execution.
    - allocated dynamically during a program's execution; lifetime is for the duration of the program, or until de-allocation, whichever is sooner. This will be dealt with later.
- scope
  This is the portion of the source code within which the object is visible. Thus a variable declared within a function is visible (in scope) only within the block of statements that constitute the function body, regardless of its storage class. See also Skansholm Chapter 4.3 Declaration, scope and visibility.
There can be different combinations of scope and storage class, e.g. a function local variable can be declared static. The effect is that its visibility (scope) remains limited to the enclosing block (i.e. the function body) but its lifetime continues for the duration of the program's execution.
4. Local duration
Unlike some programming languages (e.g. Pascal and Modula-2), the body of a function may not include the definition of another function. In other words, functions may not be nested in C++ and the only valid definitions appearing within a function are those for data items. Variables defined in a function have the default storage class auto and the formal arguments to the function are also treated as auto. The body of a function is a sequence of declarations and statements surrounded by braces {}. This construct is known as a compound statement or block. Within a function body, any statement may itself be a block. It is logical therefore that such a block, nested within a function body, should be allowed to contain data declarations, and that the scope of those declarations should be the surrounding block as with function local variables. Therefore the sequence of statements that depend on the truth or otherwise of the logical expression in an if statement may be a block that contains declarations whose scope is limited to that block. A block may even consist of just the braces surrounding one or more statements :-
    void swapifless ( int& a, int& b )
    {                                   // function body block
        if ( a < b )
        {                               // if block
            const int temp = a;
            a = b;
            b = temp;
            {                           // inner block
                int inner = temp;
                cout << a;
            }
            cout << inner << endl;      // error: undefined symbol inner
        }
        cout << temp << endl;           // error: undefined symbol temp
    }

Function swapifless above could have included a local variable definition int temp (declared before the if statement). This outer temp would have been invisible within the if block because the inner temp would have caused a 'hole' in its scope. This hole would extend for the scope of the if block only. A local variable can, of course, be initialised on definition. This initialisation can be by any expression that is valid at that point, for instance by an expression that contains reference to the formal arguments as above. In the absence of any initialisation, the value of a local auto variable is undefined.
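The 'hole' in scope can be seen in a version that does compile. The function scopeDemo below is a hypothetical example written for this purpose, not part of the course code.

```cpp
// Hypothetical example: an inner temp hides an outer temp inside the if block.
int scopeDemo( int a )
{
    int temp = a;               // outer temp, scope is the function body
    int seen = 0;

    if ( a > 0 )
    {
        int temp = a * 10;      // inner temp, hides the outer one in this block
        seen = temp;            // refers to the inner temp
    }                           // inner temp ceases to exist here

    return seen + temp;         // refers to the outer temp again
}
```

For a = 3 the inner temp is 30 and the outer temp is 3, so the function returns 33.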
5.
A definition of a variable is a statement that allocates storage with optional initialisation:

    int count = 0;        // allocates storage

A declaration of a variable is a notification to the compiler that a variable has been defined in another file, but is being referenced in the current file:

    extern int count;     // external referencing declaration. Does not
                          // allocate storage
You will not normally need to make external referencing declarations because our standard practice will be to #include a header file that serves the same purpose (see para 6 below).
6. Static duration
An external referencing declaration for a function is no different in form from the function prototypes with which you are already familiar. It informs the compiler that a function is to be called from a separate file from that in which it is defined. An external referencing declaration for a function is made in the source program file in which the call to the function is to be made, i.e. in the file in which it is not defined. The format is as follows:

    extern void print( void );    // declares a function that is defined in another file
                                  // extern may be omitted
External referencing declarations are usually made by placing in the main program file a compiler directive to #include a header file that provides the necessary external referencing declarations, as explained in paragraph 1.

Variables declared outside of any function, e.g. before function main, have file scope and are referred to as global variables. C++ guarantees to initialise any global variables to zero, but it is considered good practice to initialise them explicitly. As with any data declaration that uses the same identifier as another object declared in a surrounding block, a local variable causes a hole in the scope of a global variable with the same name - see the example below:

    #include <iostream>
    using namespace std;

    int sum;

    int main( void )
    {
        void subroutine( void );    // prototype declaration
        sum = 15;
        subroutine();
        cout << "Global sum is " << sum << endl;
        return 0;
    }

    void subroutine( void )
    {
        float sum = 1.234;
        cout << "Local sum is " << sum << endl;
    }

The global variable sum is distinct from the local variable of the same name in function subroutine. The latter causes a hole in the scope of the global from the point immediately after the definition of float sum. The only variable of that name visible within subroutine is the local one with the value 1.234. As a corollary, float sum is not visible within main because its scope is confined to the function in which it is defined. The program's output is:

    Local sum is 1.234
    Global sum is 15

It is possible to gain access to a global variable even when it is masked by a local variable of the same name. In function subroutine, for instance, the global variable sum can be referenced by preceding it with the double colon scope resolution operator, which you have already met in e.g. setiosflags( ios::left ):

    cout << "Local sum is " << sum << endl;
    cout << "Global sum is " << ::sum << endl;
7. Storage class static
Variables that are explicitly given the storage class static may be either local or global. The meaning of the static differs depending on whether its declaration appears within a function or outside.
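The local case can be illustrated by a small sketch. The function nextTicket is a hypothetical example: its static variable is visible only inside the function body, but it keeps its value from one call to the next because its lifetime is that of the whole program.

```cpp
// Hypothetical example of a static local variable.
int nextTicket( void )
{
    static int issued = 0;      // initialised once only, not on every call
    issued++;                   // the value survives between calls
    return issued;
}
```

Successive calls return 1, 2, 3 and so on. Had issued been an ordinary auto variable, every call would have returned 1.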
8.
9.
10. The C++ pre-processor
This is a simple macro processor that, in the case of GNU C++, constitutes a separate pass by the compiler. It makes a pass over the source file substituting all occurrences of defined identifiers with the token string that represents the macro definition. Thus, if you liked Pascal and also like typing, you could make C++ look more like Pascal by writing BEGIN in place of { and END in place of }, and by providing macros that carry out the conversion back to the C++ convention immediately prior to compilation. (A macro name must be an identifier, so the semicolon of Pascal's END; cannot be part of it.)

    #define BEGIN {
    #define END }

    int main( void )
    BEGIN
        int a = 1, b = 2;
        if ( a > b )
        BEGIN
            int temp = a;
            a = b;
            b = temp;
        END
        return(0);
    END

The macro processor was used extensively in C to produce the effect of inline functions and constant declarations which are now part of the C++ language. Its use in C++ is therefore mostly confined to controlling conditional compilation and the inclusion of header files.
In order to eliminate the debugging statements, it is only necessary to change the value of DEBUG from true to false (0), and re-compile and link. The GNU C++ IDE allows macro constant definitions to be changed via the menu item:Options.Compiler options To define a macro named DEBUG, go to this menu item and enter -DDEBUG. To undefine it, enter -UDEBUG. A file macro.cpp is installed in the labs for you to try this out. The conditional compilation facility may also be used to generate different versions of a program for different platforms or conditions.
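The debugging statements referred to above are not shown in this section. The following is a minimal sketch of the general technique, assuming that the statements are wrapped in #if DEBUG ... #endif; the function sumTo and the fallback definition of DEBUG are hypothetical.

```cpp
#include <iostream>

// If DEBUG has not been defined (e.g. by -DDEBUG on the compiler command
// line, which defines it as 1), give it the value 0 so that the debugging
// statements are left out of the compiled program.
#ifndef DEBUG
#define DEBUG 0
#endif

// Hypothetical function: adds up the integers 1..n, optionally tracing
// each step on the error stream.
int sumTo( int n )
{
    int total = 0;
    for ( int i = 1; i <= n; i++ )
    {
        total += i;
#if DEBUG
        std::cerr << "i = " << i << ", total = " << total << std::endl;
#endif
    }
    return total;
}
```

The trace line is removed by the pre-processor when DEBUG is 0, so it costs nothing in the final program.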
    #include "myheader.h"    look in the current directory first, then the standard include directories

The standard include directories are stored in a directory indicated by operating system path directives that are set up when the system starts, or that are indicated by values that can be configured from within the IDE.

When developing programs that consist of several modules (files) it is normal to supply a header file for each module other than the main module. The main module then requires compiler directives to #include these header files, using the form #include "filename.h". If necessary, the header file may also be included in the compilation of the .cpp file for which it is the header.

In cases where header files themselves contain include directives, there is the likelihood that some declarations will be included twice. In those cases, header file inclusion may be made conditional on the existence or otherwise of a definition. Initially, you will not be writing programs whose complexity requires the use of #ifndef and #define, so do not worry about them unduly. When the compiler or linker complains that you have multiple definitions of a function, type or variable, you will know that you have hit the problem. Then seek advice.
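For reference, conditional inclusion usually takes the form sketched below. The header name point.h, the macro POINT_H and the type Point are hypothetical; the header's text is written out twice in one file to imitate what happens when it is reached by two different #include routes.

```cpp
// ---- text of the hypothetical header point.h, first inclusion ----
#ifndef POINT_H                 // not yet defined, so the text is compiled
#define POINT_H
struct Point { int x; int y; };
#endif

// ---- the same header text, included a second time ----
#ifndef POINT_H                 // already defined, so this copy is skipped
#define POINT_H
struct Point { int x; int y; };
#endif

// Hypothetical function using the type declared in the header.
int sumOfCoordinates( const Point& p )
{
    return p.x + p.y;
}
```

Without the #ifndef test, the second copy would be a second definition of Point and the compilation would fail.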
2.
3. Classification
There are two main groups - single entities of which there may be many instances, e.g. Clock, and collections (or containers) of many objects of the same type, e.g. Set, List etc. The components of these collections may be of any type but, within one collection, must all be of the same type. Frequently, part of the definition of a collection is the relationship between the members.
Data Structures
4. Categories of Collection
The broad categories are:

- Collections in which there is no relationship between the members except that, in the domain of all possible values that may be members, each is either a member or is not, e.g. Set and Bag.
- Linear structures in which the members have a one to one relationship with each other.
- Hierarchical structures in which the members have a one to many relationship with each other.

[Figure: diagrams of the categories, labelled Set, Linear, Hierarchical (Tree) and Graph.]

5. Stacks
Definition This is the simplest of the linear collection types since the number of operations is typically small. As with all containers, the components may be of any type, but must be of the same type within any one stack. Additions to, and removals from the stack are made at one end only - the top. Access to components is limited to the item currently at the top. The consequence of this relationship between members is that the first item to be added is the last to be removed. This is known as a LIFO structure - last in, first out. Stacks are very widely used in Computer Science. When a function is called, a stack frame is built containing the address to which control must return when the function has finished execution. In addition, space is reserved in the stack frame for any auto local variables and for the values of any actual arguments passed to the function. This structure is pushed onto the system stack. When the function terminates, the stack frame is popped from the stack, causing the arguments and local variables to perish. Another application is recording the path taken through a structure so func that it can be retraced - the 'Hansel & Gretel' effect.
int funa( int y )
{
    return ( y * 2 );
}
int funb( int z )
{
    return ( funa( z ) / 2 );
}
int func( int a )
{
    return ( funb( a ) );
}
int main( void )
{
    int x = 4, y;
    y = func( x );
}
[Figure: the system stack during the calls above, shown in seven stages: main; main, func; main, func, funb; main, func, funb, funa; main, func, funb; main, func; main]
The classic operations are:
push    push a new item onto the stack
top     retrieve the top of stack item without removing it
pop     remove the top of stack item
empty   test if the stack is empty
Viewed as an abstract type, a stack cannot be full, but the actual implementation may have to place a limit on the number of items that can be held on the stack. This gives rise to a further operation:
full    test if the stack is full
Operations on abstract data types can typically be categorised into those that:
• change the state of the data type, e.g. push, pop
• report on the state of the data type without changing it, e.g. top, empty, full
• create and/or initialise an instance of the type - no example here
Each operation is provided with a pre-condition and post-condition that state:
i) pre - any requirement placed on the caller as to the state of the structure prior to the call, or on the values passed as arguments; for instance, top and pop must not be called on an empty stack.
ii) post - the state of the structure that is guaranteed to hold after the operation has been carried out, provided that the pre-condition has been met; for instance, after a push, the number pushed is at the top of the stack.
The definition of a stack of integers can be placed in a header file which is then available for importing (using #include "intstack.h") by any client program requiring it:

// intstack.h
// definition of a stack of integers
void push( int arg );
// pre  - !full()
// post - stack contains the value of arg, top() = arg
void pop( void );
// pre  - !empty()
// post - top() has been removed
int top( void );
// pre  - !empty()
// post - stack is unchanged, the item at the top of the stack has been returned
bool empty();
// pre  - none
// post - returns TRUE if stack is empty, otherwise FALSE
bool full();
// pre  - none
// post - returns TRUE if stack is full, otherwise FALSE
Representation
The obvious first choice for representing a stack is an array, although this has the disadvantage that an upper limit for the number of items to be stored must be chosen before compiling, and this cannot be varied at run-time. This representation should be hidden from a user of the stack by specifying the storage class static:

// intstack.cpp
// representation and implementation of a stack of integers
#include "intstack.h"
const int MAX_STACK = 10;      // the maximum number of items that can be stored
static int data[MAX_STACK];    // the container for the stack members
static int Top;                // the index of the top item.
// Top will need to be initialised on startup, incremented
// before pushing a new member, and decremented after
// popping a member.
// When Top = MAX_STACK - 1, the stack is full

Implementation of the operations
This is left as an exercise. The full definition of the functions would be placed after the global data definitions in intstack.cpp. Note that intstack.cpp contains an include compiler directive for the header file. intstack.cpp would contain only the data declarations shown above and the function definitions. There must be no function main.

Using the stack
A client program wishing to use the integer stack would import the definition (i.e. #include "intstack.h") and then carry out operations on it as though it had been defined in the same file. Because of the static qualifiers used for the array definition data and the integer variable Top, the client program cannot access the representation directly even if extern declarations are made for these two items in the client's source code. const MAX_STACK also cannot be accessed, because a const object defined at file scope has internal linkage by default in C++.
#include <iostream>
#include "intstack.h"
using namespace std;

extern int Top;                // extern declarations compile, but do not help:
extern const int MAX_STACK;    // both names have internal linkage in intstack.cpp

int main( void )
{
    // push some items
    cout << endl << endl;
    while ( !full() )
    {
        static int item = 0;
        push( ++item );
        cout << "pushing " << item << endl;
    }

    // Now an attempt to access the stack variables directly - causes linker errors:
    Top = -1;                     // Linker error undefined symbol _Top - defined as static
    cout << "MAX_STACK = "        // Linker error undefined symbol
         << MAX_STACK << endl;    // MAX_STACK is const in intstack.cpp

    // pop them
    while ( !empty() )
    {
        cout << "popping " << top() << endl;
        pop();
    }
    return 0;
}
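The operations left as an exercise above could be sketched as follows. This is only one possible solution, not the author's: it assumes that an empty stack is marked by Top = -1 (the notes say only that Top must be initialised on startup), and it uses assert to check the pre-conditions, which is a choice made for this sketch.

```cpp
// intstack.cpp - a possible implementation of the stack operations (sketch)
#include <cassert>

const int MAX_STACK = 10;        // the maximum number of items that can be stored
static int data[MAX_STACK];      // the container for the stack members
static int Top = -1;             // assumption: -1 marks the empty stack

bool empty() { return Top == -1; }
bool full()  { return Top == MAX_STACK - 1; }

void push( int arg )
{
    assert( !full() );           // pre-condition from intstack.h
    data[++Top] = arg;           // increment Top before storing the new member
}

void pop( void )
{
    assert( !empty() );          // pre-condition from intstack.h
    --Top;                       // the old top item is simply abandoned
}

int top( void )
{
    assert( !empty() );          // pre-condition from intstack.h
    return data[Top];            // the stack itself is unchanged
}
```

With this representation push and pop never move any data; only the index Top changes.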
6. Abstract Data Type?
Can this implementation of a stack of integers be classed as an abstract data type? It has been defined in terms of its set of operations. It is encapsulated by being placed in separate files and its representation is hidden from its clients - its state can only be altered through the supplied operations. But only one stack can exist at any one time in any one client program. The client cannot declare instances of the type by e.g. IntStack astack, bstack; This is clear since there is no mechanism provided by the stack module for specifying on which stack the operations are to be carried out - there is only one. This single instance of an encapsulated type is sometimes referred to as an abstract state machine and is simple to implement and useful when only one instance of the type is required at any one time. Later, we will see how a true abstract data type can be defined of which as many instances may be created as the client program requires.
7. Queues
A queue follows closely the real-world example. Operations are permitted at both 'ends' with additions (enqueue or append) being made at the tail and removals (serve or remove) being taken from the head. Effectively, the elements are ordered physically according to the time of their arrival. It is known as a FIFO structure - first in, first out. Typical operations are:
• add an element at the tail
• remove an element from the head
• return the length of the queue
• query whether the queue is empty
• query whether the queue is full
Implementation
Again, an array implementation is considered. We need two integers to indicate the head and tail of the queue and possibly a further integer to record the size (although this can be computed from head and tail).

// A queue of characters
const int MAX_QUEUE = 10;
static char queueitems[ MAX_QUEUE ];
static int head = 0, tail = -1, count = 0;
Initially, the indicator (technically cursor) tail is set to a special value to indicate the empty state. The head of the queue can be viewed as being at the 'left hand' or 'bottom' of the array, while the tail grows 'right' or 'up' the array as items are appended.
[Figure: the array after each step, with the head and tail cursors marked: 1. Empty; 2. append('A'); 3. append('B'); 4. ch = serve(); 5. append('C')]
The problem with this method of handling the array is that as items are appended and served, the queue moves up the array, and will eventually bump up against the end when, in fact, there may be space available lower down caused by elements being removed from the head, e.g. A in this case. One solution is to slide all items in the queue down the array once the tail has reached the top, but data moves are relatively expensive - particularly if the queue elements are large. A satisfactory solution is to view the array as circular so that the first element follows on immediately after the last. Spare space in the array caused by removals will always be available for use as long as the number of elements remains below MAX_QUEUE. Instead of simply incrementing head on each removal, and tail on each append, these two cursors must be taken modulo MAX_QUEUE each time they are incremented. Thus, if e.g. tail is presently 9, and a further element is appended, tail becomes ( 9 + 1 ) % 10 = 0, and the newly arrived element is inserted at array element 0.
[Figure: the array drawn as a circle, holding a queue with count = 6; the head and tail cursors mark the two ends of the occupied elements]
void enqueue( char element )
{
    tail = (tail + 1) % MAX_QUEUE;
    queueitems[tail] = element;
    count++;
}

The simplest way of implementing the test for full and empty is to maintain the size of the queue in a variable (e.g. count) within the queue module. As with all data structures based on an array, the storage space is fixed at compile time and the number of items that can therefore be stored is bounded. This inflexibility means that arrays can only be used in cases where the maximum number of components can be determined in advance.
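The remaining queue operations might be sketched as follows. The function names serve, empty, full and length follow the operation list given earlier, but their exact signatures are assumptions made for this sketch, as is the use of assert to check that the queue is not full or empty.

```cpp
// A circular queue of characters held in an array (sketch)
#include <cassert>

const int MAX_QUEUE = 10;
static char queueitems[ MAX_QUEUE ];
static int head = 0, tail = -1, count = 0;

bool empty()  { return count == 0; }
bool full()   { return count == MAX_QUEUE; }
int  length() { return count; }

void enqueue( char element )
{
    assert( !full() );
    tail = (tail + 1) % MAX_QUEUE;   // wrap round to 0 after the last element
    queueitems[tail] = element;
    count++;
}

char serve()
{
    assert( !empty() );
    char element = queueitems[head];
    head = (head + 1) % MAX_QUEUE;   // the freed element can be reused later
    count--;
    return element;
}
```

Because both cursors wrap round, no element is ever moved: the space freed by serve is reused by a later enqueue.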
8. Lists
Basically a list is a sequence of elements, each element other than the first and the last having a predecessor and a successor. Another way of expressing this is that a list is either empty or consists of a head representing a single data item followed by a tail which is a list of data items. The elements may be ordered:
• by their time of arrival, i.e. each successive addition is placed after the previous last, or
• inversely by their time of arrival - each element is inserted before the previous in a similar way to a stack, although access may be allowed to any element.
• by some quality of the data, e.g. a list of names ordered alphabetically.
• by requesting insertion at the 'current' position as indicated by some cursor.
Again, an array is considered as the method of representation. However, we find that there is a high cost involved where insertion and deletion is permitted other than at the ends. Each insertion within the list will require all elements following it to be moved up the array to make room, and, since there can be no null elements, each deletion will require all following elements to be moved down to close the gap. The time required to carry out these moves makes this method of representation less than optimal. There are more efficient and flexible ways of implementing lists in cases where insertions and deletions are permitted within the list.
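The cost described above can be seen in a small sketch of an array-based list of integers. The names MAX_LIST, listitems, length, insertAt and removeAt are hypothetical, chosen only for this illustration.

```cpp
// An array-based list: insertion and deletion within the list (sketch)
#include <cassert>

const int MAX_LIST = 10;             // hypothetical capacity
static int listitems[ MAX_LIST ];    // hypothetical container
static int length = 0;               // number of elements in use

// insert item at position pos (0 .. length)
void insertAt( int pos, int item )
{
    assert( length < MAX_LIST && pos >= 0 && pos <= length );
    for ( int i = length; i > pos; i-- )     // move following elements up
        listitems[i] = listitems[i - 1];
    listitems[pos] = item;
    length++;
}

// remove the element at position pos (0 .. length - 1)
void removeAt( int pos )
{
    assert( pos >= 0 && pos < length );
    for ( int i = pos; i < length - 1; i++ ) // move following elements down
        listitems[i] = listitems[i + 1];
    length--;
}
```

Inserting or removing at position 0 moves every element in the list, so the time for one operation grows in proportion to the length of the list.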
9. Structs
Frequently there is a need to store information about an entity under a single name where the information describing that entity involves different data types. The struct is an aggregate type that provides this facility:

struct student
{
    char name[30];
    int age;
    char coursecode[6];
};
student courserep;    // student is a type, not a variable.
Each separate data item within the structure is referred to as a data member. Once the new type student has been declared, a collection with that component type can be defined.

student aclass[16];    // aclass is an array of 16 students

Access to the members of a struct is by dot notation:

strcpy( courserep.name, "William Brown" );    // simple assignment not allowed
courserep.age = 21;
strcpy( courserep.coursecode, "mit96" );
cout << courserep.name << endl
     << courserep.age << endl
     << courserep.coursecode << endl;

A queue of students could be declared as:

// A queue of students
const int MAX_QUEUE = 16;
static student stuqueue[ MAX_QUEUE ];
static int head = 0, tail = -1, count = 0;
10. Unions
This is similar to the struct in that it can hold one or more items of different types. It differs from struct in that it can hold only one of its components at any one time. The compiler allocates storage for the largest of the specified members and all members are overlaid onto the same storage. In other programming languages this type is usually known as a variant record. There are two main uses for unions.
• In cases where different instances of the same entity may have different characteristics, i.e. they are described by a different set of variables. This might arise in a collection of students where part-time students require a record of their employer whereas full-time students do not.
• In low level programming when a location in memory may be viewed as two different sets of data, e.g. either two separate integer values or a long integer.
Example:

typedef short TwoInts[2];
union cheat
{
    TwoInts twoints;
    long along;
};
cheat x;
x.twoints[0] = 255;
x.twoints[1] = 1;
cout << x.along << endl;

Output (on a machine with a 16-bit short, a 32-bit long and the least significant byte stored first):
65791
These examples illustrate several things about the data type struct.
• The members (referred to as fields in other languages) may be of the same type, or of different types.
• There is no limit to the number of members, but large records can be built up from other struct types, for instance, type Person has a field birthdate which is itself a struct type (Date).
• The members may be of any type, including arrays (and other structs).
• The type name can be used in declarations of arrays whose elements are of struct type, e.g. mscit is an array of 40 elements, each of whose data type is Student. Each Student has a data member called personaldata of type Person; a tutorGrp of type char; and an array of 9 elements of type int called modulemarks.
• The type-name appearing after the reserved word struct is known as the structure tag. It is desirable that this name (e.g. Date, Person, Student) be unique within its own scope.
As you can see, structures can be used in combination with other structures and with arrays to create arbitrarily complex types capable of modelling many real-world entities.
Component data type
The elements of an array must all be of the same type whereas structs may contain data members of different types.
Assignment
An array may not be assigned to another array because an array name is a constant pointer, whereas the use of a structure variable name accesses the whole structure. The consequences of this are important: variables of structure type may be assigned to other variables of the same type. The effect of assignment is to copy all of the fields from the source structure to the target structure (including each element of any array members of the structure). Thus we could write
Jane = Fred;
or
mscit[1] = mscit[2];
Function arguments and return
Structure arguments are, by default, passed to a function by value (not as a pointer, as is the case with arrays). However, a reference argument may be used to reduce the cost of copying large structures and/or to enable any changes to the structure to be reflected in the actual argument. If the objective is to eliminate the cost of copying large structures when a function is called and it is not the intention to modify the structure within the function, then the formal reference argument can be const modified, e.g.

void printDate( const Date& aDate )
{
    cout << aDate.day << '/' << aDate.month << '/' << aDate.year << endl;
}

There is no intention to change the value of the argument aDate since it is only being output. However, to reduce the cost of copying the actual argument into the formal argument, the formal argument is made a reference to the actual argument (Date&). Copying a reference involves only a few bytes. A function may return a structure or a reference to a structure as its result. Example:

Date changeDate( Date aDate )
{
    aDate.year++;
    return aDate;
}
Access to components Elements of an array can be accessed by subscripting the array name as in the example above. The subscript can be a variable that is modified within a loop c.f. the Plane example. This allows computed random access to any array component. The members of a struct, on the other hand, are accessed using dot notation i.e. the structure variable name followed by a dot followed by the member name. The dot is known as the structure member operator. If the member name is itself a structure and access is required to its members, then further dots are required to tunnel down through the member hierarchy, viz.
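Tunnelling down through the member hierarchy can be illustrated with the types named earlier in these notes. The sketch below is an assumption about how Date, Person and Student might be declared: the member names birthdate, personaldata, tutorGrp, modulemarks, day, month and year come from the text, while the order of the Date members, the name member of Person and the two helper functions are hypothetical.

```cpp
// Nested structures and dot notation (sketch)
struct Date    { int year, month, day; };              // member order assumed
struct Person  { char name[30]; Date birthdate; };     // name is hypothetical
struct Student
{
    Person personaldata;
    char   tutorGrp;
    int    modulemarks[9];
};

Student mscit[40];             // an array of 40 Students

// two dots are needed to reach a member of a member of a member
int birthYear( const Student& s )
{
    return s.personaldata.birthdate.year;
}

// subscripting and dot notation can be combined freely
int totalMarks( const Student& s )
{
    int total = 0;
    for ( int i = 0; i < 9; i++ )
        total += s.modulemarks[i];
    return total;
}
```

For example, mscit[0].personaldata.birthdate.year selects the first Student in the array, then its Person member, then that Person's Date, and finally the year.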
Pointers to structures
If a structure is referenced by a pointer then the de-referencing operator applied to the pointer provides the access:

Date* dptr = &today;    // dptr is a pointer to Date and points to the Date today
Date dt = *dptr;        // dt is assigned the value of today by dereferencing the
                        // pointer dptr

However, the structure member operator (dot) has a higher precedence than the dereferencing operator (*). So access to a member of today via the pointer dptr must use parentheses to resolve the precedence:

cout << (*dptr).year;   // displays the year member of today via the
                        // pointer dptr

This type of access is frequently required and the syntax is rather clumsy. A new operator is introduced for this purpose - the structure pointer operator ->. This does two things - dereferences the pointer to access the whole structure, and then accesses the member given after the operator (year in this example).

cout << dptr->year;
Initialisation As with arrays, structures may be initialised at the time they are defined, e.g. Date his_birthday = { 1995, 11, 15 };
3. Storage Management
So far we have only been able to use data items that have been defined at compile-time. Thus, an array defined in the source code of a program as:

int table[100];

will hold 100 integers and, if the requirements of the program exceed this number of elements, then the excess cannot be handled. Clearly this is unsatisfactory. The programmer cannot predict the demands that will be made on his program when it is being used by a client. What may have seemed a generous estimate when the program was written might soon turn out in practice to be a ludicrous under-estimate. What is more, if the estimate is indeed generous, then a large amount of storage space remains unused and therefore wasted because it cannot be used temporarily by other data items.
An example is a windowing system like MS Windows. The programmers of Windows could not possibly have worked on the assumption that the number of open windows should never exceed a certain fixed limit. Since that code was written, the memory installed in the average PC has at least doubled, redoubled, and redoubled again. To have fixed this limit 3 or 4 years ago would have put all users in a straitjacket which would now appear intolerable.
So how can we create and delete data items dynamically at run-time in response to the demands of the application program? By using the memory allocation and de-allocation operators new and delete. The use of these operators is closely bound up with pointers, and equivalent facilities are to be found in most of the conventional programming languages such as Ada, Pascal, Modula-2 and C.
3.2
3.3 delete
The delete operator has two forms: without brackets for single data items, and with brackets for arrays. Note that, whereas the form of new requires the brackets to be placed after the type name:

char* chptr = new char[20];

the syntax of delete requires the brackets to be placed after delete:

delete intptr;     // de-allocate memory occupied by int pointed to by intptr
delete[] chptr;    // de-allocate memory occupied by string pointed to by chptr

The effect of delete is to return to the heap the memory referenced by the pointer (intptr and chptr in the above examples) and not to delete the pointer itself. After this, it is an error to attempt to de-reference these pointers in order to access the item they previously referenced.
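The two forms of new and delete can be seen together in a short sketch. The function names duplicate and sumTo are hypothetical and exist only to show storage whose size is decided at run-time.

```cpp
// Dynamic allocation with new and delete (sketch)
#include <cstring>

// copy a string into heap storage of exactly the size required;
// the caller must later release it with delete[]
char* duplicate( const char* source )
{
    char* chptr = new char[ std::strlen( source ) + 1 ];   // +1 for the '\0'
    std::strcpy( chptr, source );
    return chptr;
}

// add up the numbers 1..n using an array sized at run-time
int sumTo( int n )
{
    int* intptr = new int;        // a single int on the heap
    int* table  = new int[n];     // an array of n ints on the heap
    *intptr = 0;
    for ( int i = 0; i < n; i++ )
    {
        table[i] = i + 1;
        *intptr += table[i];
    }
    int result = *intptr;
    delete intptr;                // no brackets: single item
    delete[] table;               // brackets: array
    return result;
}
```

Each new is matched by exactly one delete of the corresponding form; the pointers themselves are ordinary variables and disappear only when they go out of scope.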
3.4 Lifetime
The lifetime of objects allocated by new is from allocation to the earlier of deallocation (via delete) or termination of the program. Notice that lifetime may be different from scope. If a pointer providing access to a dynamically allocated item goes out of scope (perhaps because it is a local function
A list is either empty or consists of a head representing a single data item followed by a tail which is a list of data items.
Lists may be implemented using arrays but dynamic memory allocation is more flexible in that the list may grow and shrink in response to the demands of the application. The list should be viewed as a series of nodes, each node containing some data and a link to the next node. The link is a pointer to a node, and the node is most usefully implemented as a struct.
[Figure: Linked List - a LinkList record with members count, first and last, pointing into a chain of Nodes, each Node holding data and a link to the next Node]
For simplicity, a list of integers will be illustrated, but the data contained in a node (struct) may be as large or as complex as the application requires. The node is therefore defined as:

struct Node
{
    int data;
    Node* link;
};

Each node therefore consists of a data field (in this case an integer) and a pointer to the next node. The list itself can be implemented as a structure containing links to the first and last nodes in the list, and a count of the number of nodes. These links are, again, of type pointer to node. If the list is empty, then the links to the first and last nodes are given the special value 0 referred to above. The same principle will be applied to the link member of the last node in the list since it will have no successor:

struct LinkList
{
    int count;
    Node* first, * last;
};

The operations for a list are much less closely prescribed than those for stacks and queues since it is a more general structure and access may be provided at any point. There are also several possibilities for the ordering of the nodes. For simplicity therefore, the example shown below will add new items to the end of the list, and remove items from the front. This is therefore, in effect, a queue.
4.2
// count of elements = 0
// pointer to first element does not point to anything
// pointer to last element does not point to anything
4.3
Node* newnode( void )
{
    Node* n = new Node;    // allocate memory from the heap sufficient to
                           // accommodate a Node and store a pointer to it
                           // in n.
    return(n);             // return the pointer as the function's result
}
4.4
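Node *n = newnode();
n->data = item;
n->link = 0;
t.last->link = n;    // a)
t.last = n;          // b)
t.count++;           // c)
[Figure: appending a node - a new Node (data, link) is allocated on the Heap; steps a), b) and c) link it after the current last Node of the LinkList, move last to it and raise count from 2 to 3]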
int tempdata = t.first -> data;    // a)
Node *tempnode = t.first;          // b)
t.first = t.first -> link;         // c)
t.count--;                         // d)
delete tempnode;                   // e)
[Figure: removing the first node - steps a) to e) save the data of Node 1, move first on to Node 2, reduce count from 3 to 2 and return Node 1 (tempnode) to the Heap]
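The fragments above can be gathered into complete functions. This is a sketch under stated assumptions: the function names initialise, append and removeFirst and the parameter name t are hypothetical, and the handling of the empty list in append and removeFirst is not shown in the fragments and has been added here.

```cpp
// A linked list used as a queue: add at the end, remove from the front (sketch)
#include <cassert>

struct Node     { int data; Node* link; };
struct LinkList { int count; Node* first, * last; };

void initialise( LinkList& t )
{
    t.count = 0;      // count of elements = 0
    t.first = 0;      // pointer to first element does not point to anything
    t.last  = 0;      // pointer to last element does not point to anything
}

Node* newnode( void )
{
    Node* n = new Node;          // allocate a Node from the heap
    return n;
}

void append( LinkList& t, int item )
{
    Node* n = newnode();
    n->data = item;
    n->link = 0;                 // the new node has no successor
    if ( t.count == 0 )
        t.first = n;             // assumption: in an empty list it is also first
    else
        t.last->link = n;        // a) link it after the current last node
    t.last = n;                  // b)
    t.count++;                   // c)
}

int removeFirst( LinkList& t )
{
    assert( t.count > 0 );       // pre-condition: the list is not empty
    int tempdata = t.first->data;    // a)
    Node* tempnode = t.first;        // b)
    t.first = t.first->link;         // c)
    if ( t.first == 0 )
        t.last = 0;              // assumption: the list has become empty
    t.count--;                       // d)
    delete tempnode;                 // e)
    return tempdata;
}
```

Unlike the array-based queue, this list has no fixed capacity: it grows and shrinks one Node at a time as items are appended and removed.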
4.8
5.
Sorting
1. Introduction
There are two main types of sorting - sorting arrays held in random access memory, and sorting files. In the early period of computing, file sorting tended to be dominant because RAM was very expensive and mass storage was held on magnetic tape, access to which is sequential. In contrast, magnetic disk storage provides the possibility of accessing file records by reference to their position in the file.
2. Components of Sorting
Sorting involves rearranging the elements so that they are in order. This, in turn, consists of two operations:
• Comparing elements - usually by reference to a key field
• Moving elements - usually by swapping pairs of elements
There are normally many more comparisons than moves and the number of comparisons will be the most significant operation in terms of time, and therefore the prime indicator of the efficiency of a sorting algorithm.
3. Sorting Files
Database systems are now universal, and file sorting has become less important. Instead, a number of different indexes are held - either within the data file, or as separate files - that allow the data file to be read (and output) in different orderings. If the amount of RAM permits it, and indexes are not supported, then the fastest way of sorting a file is to read it into an array, sort the array and write the data back out to file. If the file is too big, then it can be broken up into chunks, each of which is sorted in an array and written out to a separate file. Then the several ordered files are merged back into a single file. The traditional file merge requires only 2 elements of the file to be in memory at any one time and works as follows:
• split the original file into two new files, writing 1 item to each new file alternately. Then merge back into the original file in pairs, creating n/2 runs of 2 items per run
• split the original file into 2, writing 2 items to each file alternately. Then merge back into the original file in quadruples, creating n/4 runs of 4 items per run
• split the original file into 2, writing 4 items to each file alternately. Then merge back into the original file in octuples, creating n/8 runs of 8 items per run
• etc.
The sort has finished when the original file contains 1 run of n items. The following is a simplified example based on a file of 8 items. The principle is exactly the same for any number of items.
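Pass  Description                                     Files
1     Original file                                   5 8 3 6 7 2 4 1
      Split into 2 files consisting of 1 item from    file 1: 5 3 7 4
      the original file alternately                   file 2: 8 6 2 1
      Merge in pairs                                  5 8, 3 6, 2 7, 1 4
2     Split into 2 files consisting of 2 items from   file 1: 5 8, 2 7
      the original file alternately                   file 2: 3 6, 1 4
      Merge in quadruples:
        compare 5 and 3, write 3                      3
        compare 5 and 6, write 5                      3 5
        compare 8 and 6, write 6                      3 5 6
        only 1 item remaining from this run, write it 3 5 6 8
        the second pair of runs is merged in the
        same way                                      3 5 6 8, 1 2 4 7
3     Split into 2 files consisting of 4 items from   file 1: 3 5 6 8
      the original file alternately                   file 2: 1 2 4 7
      Merge in octuples                               1 2 3 4 5 6 7 8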
Note that:
• There are only 2 elements from the file present in memory at any one time
• The process is dominated by I/O time
• The number of passes required to sort the original file is log2 n
n            Passes
8            3
64           6
512          9
4,096        12
32,768       15
262,144      18
2,097,152    21
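The merge step at the heart of each pass can be sketched with C++ streams standing in for the two files. The function name mergeRuns is hypothetical, and the sketch merges one ascending run from each input; a full file sort would repeat this for every pair of runs in every pass.

```cpp
// Merging two ordered runs while holding only 2 items in memory (sketch)
#include <istream>
#include <ostream>
#include <sstream>

void mergeRuns( std::istream& a, std::istream& b, std::ostream& out )
{
    int x, y;                                        // the only 2 items held
    bool haveX = static_cast<bool>( a >> x );
    bool haveY = static_cast<bool>( b >> y );
    while ( haveX && haveY )
    {
        if ( x <= y )
        {
            out << x << ' ';                         // write the smaller item
            haveX = static_cast<bool>( a >> x );     // and replace it
        }
        else
        {
            out << y << ' ';
            haveY = static_cast<bool>( b >> y );
        }
    }
    while ( haveX )                                  // copy whatever remains
    {
        out << x << ' ';
        haveX = static_cast<bool>( a >> x );
    }
    while ( haveY )
    {
        out << y << ' ';
        haveY = static_cast<bool>( b >> y );
    }
}
```

Applied to the runs 3 5 6 8 and 1 2 4 7 from the final pass of the example, this writes the items in the order 1 2 3 4 5 6 7 8.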
4. Why sort?
• Sorting is used to optimise searching for and retrieving data either by humans or by the computer
• To produce a report which, because it is sorted, simplifies the manual retrieval of information
• To make more efficient searches for items held in either main memory or external storage
5.
• For very small amounts of data, sequential searching may be sufficiently fast to avoid the need for sorting.
• But a simple sorting technique can be employed for low data volumes, needing little overhead.
6.
• already ordered, or nearly so
• in random order
• already inversely ordered, or nearly so

• temporary local variables
• an explicit stack
• additional space on the system stack for stack frames if a recursive algorithm is used
7. Sorting efficiency
We are not usually concerned with the absolute amount of time required for a sort. But we are concerned with how the time t taken for a sort varies with the number of items n required to be sorted. If there is a linear relationship, then t will vary directly with n, i.e. it will be O(n). But no general sort that works by comparing keys can be O(n): in the worst case such a sort needs a number of comparisons proportional to at least n.log2 n.
If t varies as a function of n^2 then an increase in n by a factor of, say, 10 will increase t 100 times, and increasing n by 100 will increase t 10,000 times. The simple sorting algorithms are all O(n^2).
8.
Pass  K    Array
           44 55 12 42 94 18  6 67
1     8    44 12 42 55 18  6 67 94
2     7    12 42 44 18  6 55 67 94
3     6    12 42 18  6 44 55 67 94
4     5    12 18  6 42 44 55 67 94
5     4    12  6 18 42 44 55 67 94
6     3     6 12 18 42 44 55 67 94
7     2     6 12 18 42 44 55 67 94
Notice that after each pass, the heaviest element in the unsorted part of the array has settled to the bottom, increasing the sorted portion by one and decreasing the unsorted portion by one. The indicators of the efficiency of this algorithm are:
Comparisons = (n-1) + (n-2) + ... + 1
Max moves   = 3/2 (n^2 - n)
Ave moves   = 3/4 (n^2 - n)
This algorithm can be improved by employing a flag that is set when no exchanges take place on a pass. In this case the array is sorted and no further passes are required. This is an O(n^2) algorithm. It is hardly ever used in real applications because it is the least efficient of the simple sorting algorithms. It is introduced here because it is relatively easy to understand and so that you will know never to use it!
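The exchange sort with the flag improvement might be written as below. This is a sketch rather than code from these notes: the function name exchangeSort, the array name r and the decision to return the number of passes are assumptions, and the flag here records that an exchange did take place, which is the opposite sense to the description above but has the same effect.

```cpp
// Exchange (bubble) sort with early exit (sketch)
// sorts r[0..n-1] into ascending order and returns the number of passes made
int exchangeSort( int r[], int n )
{
    int passes = 0;
    bool exchanged = true;                  // forces the first pass
    for ( int k = n; k > 1 && exchanged; k-- )
    {
        exchanged = false;
        for ( int j = 0; j < k - 1; j++ )   // compare neighbours in r[0..k-1]
        {
            if ( r[j] > r[j + 1] )
            {
                int temp = r[j];            // a swap costs 3 moves
                r[j] = r[j + 1];
                r[j + 1] = temp;
                exchanged = true;
            }
        }
        passes++;                           // heaviest item is now at r[k-1]
    }
    return passes;
}
```

On data that is already in order the first pass makes no exchanges, so the function stops after n-1 comparisons and no moves.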
9. Insertion Sort
This works in a similar way to the sorting of a hand of cards:
• Pick up the last but one element and place it in the correct order in the last 2
• Pick up the last but 2 and place in the correct order in the last 3
• etc.

If the number of items to be sorted > 1 then
    For each element k from last item but one down to 0
        j = k + 1
        save = k'th element
        While j <= last item AND the key of save > the key of the j'th element
            r[j-1] = r[j];
            increment j
        endwhile
        r[j-1] = save
    endfor
endif
Pass  K   Save   Array
1     7    6     44 55 12 42 94 18  6 67
2     6   18     44 55 12 42 94  6 18 67
3     5   94     44 55 12 42  6 18 67 94
4     4   42     44 55 12  6 18 42 67 94
5     3   12     44 55  6 12 18 42 67 94
6     2   55     44  6 12 18 42 55 67 94
7     1   44      6 12 18 42 44 55 67 94
• On average, there are half as many comparisons as Exchange sort
• The algorithm is efficient if the data is already in order
• It is an O(n^2) algorithm
• It is stable - equal keys are not moved. This can be important if 2 or more consecutive sorts are required - each using a different key - the second being the tie breaker when the first keys contain duplicates.
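The pseudocode above translates almost line for line into C++. The sketch assumes an array of int named r holding n items, so the 'key' of an element is simply its value; the function name insertionSort is hypothetical.

```cpp
// Insertion sort working from the right-hand end of the array (sketch)
void insertionSort( int r[], int n )
{
    if ( n > 1 )
    {
        for ( int k = n - 2; k >= 0; k-- )      // from last item but one down to 0
        {
            int j = k + 1;
            int save = r[k];                    // the element being placed
            while ( j <= n - 1 && save > r[j] ) // smaller items move down one place
            {
                r[j - 1] = r[j];
                j++;
            }
            r[j - 1] = save;                    // drop it into the gap
        }
    }
}
```

Because the test is save > r[j] rather than >=, an element never moves past another with an equal key, which is what makes the sort stable.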
10. Simple Sort performance
           Selection Sort              Insertion Sort         Exchange Sort
           (not covered in this note)
           Moves     Compares          Moves     Compares     Moves        Compares
Worst      3(n-1)    n(n-1)            n(n-1)    n(n-1)       1.5n(n-1)    n(n-1)
Average    3(n-1)    n(n-1)            n(n-1)    n(n-1)       3/4n(n-1)    n(n-1)
Best       3(n-1)    n(n-1)            2(n-1)    n-1          0.00         n-1
11. Conclusions
11.1 Insertion sort is better for small data items and large keys. It also gives good performance when the data is already ordered (or nearly so). For this reason it is often used in conjunction with advanced sorting algorithms, e.g. Quicksort.
11.2 Exchange sort is the slowest sorting algorithm and is only used in teaching or trivial applications because it is the simplest to code.
11.3 Selection sort (not shown) is better for large data items with small keys. It has shown slightly better performance than Insertion on inversely ordered data.

• Shell sort - derived from insertion sort
• Quicksort - see later
• Heapsort
These are in a different class to the simple sorts. The number of comparisons tends to vary in proportion to n.log2 n and they are therefore O(n.log n) sorts.
13. QuickSort
This was invented by C.A.R. Hoare - a famous Oxford professor of computing - and is an advanced algorithm, based on the exchange sort, that normally employs recursion. It is the most efficient of the advanced sorts although it becomes inefficient under certain very exceptional conditions. The more data items, the less likely these conditions are to arise. Insertion sort is often used in conjunction with Quicksort to sort small partitions.
The technique is to split the array into two partitions and then to sort the first partition followed by the second partition:

void QuickSort( AnyType array[] )
{
    If sorting is needed then
        split array into partitions S1 and S2
        QuickSort(S1);
        QuickSort(S2);
    EndIf
}

All the keys in partition S1 must be less than (or possibly equal to) each of the keys in partition S2. The recursive routine sorts successively smaller and smaller partitions until a partition contains only one item and is therefore sorted. The partitions are portions of the array itself - described by starting and ending indexes - and not some additional temporary data structure. Here is a refinement of the first description using four array index variables:

void QuickSort( AnyType array[], int first, int last )
{
    if( first < last )
    {
        split the array into 2 partitions
        QuickSort( array, first, last_of_first_partition );
        QuickSort( array, first_of_last_partition, last );
    }
}

The 'partition' portion of the algorithm is where all the work is done. The second and third statements are simply recursive calls to the function itself. The partitioning process ensures that all items in the first partition have values that are <= all items in the second partition - although neither partition is necessarily sorted. One of the keys in the partition currently under consideration is selected as the pivot (the central element in this example). The items in the current partition are scanned:
• first from left to right looking for an element >= pivot
• then from right to left looking for an element <= pivot
• when each scan has stopped, and provided the scan indexes have not crossed over, the two items are swapped.
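[Figure: partitioning the array about the pivot 42, the central element]
44 55 12 42 94  6 18 67    Scan, then Swap 44 and 18
18 55 12 42 94  6 44 67    Scan, then Swap 55 and 6
18  6 12 42 94 55 44 67    Scan
1st Partition: 18 6 12     2nd Partition: 94 55 44 67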
Scanning continues until the 2 pointers cross over. In this example the pivot is now in its correct position in the array and is no longer involved in the partitioning. It may have been moved from its original position. Quicksort is called recursively to partition the lower and upper partitions, provided there are at least 2 elements in them.
14.3 Average
Averaged over all possible orderings of the keys, the number of comparisons is about 1.39n.log2 n. Mathematicians can see the proof in Algorithms - see para 17. below.
15. C++ code for function Quicksort ( see Wirth )
void QuickSort( int array[], int first, int last )
{
    int lb = first, ub = last;                 // lower bound and upper bound
    int pivot = array[ (first + last) / 2 ];   // pivot = central element
    int temp;                                  // for the swap
    do
    {
        while ( array[ lb ] < pivot )          // search up for item >= pivot
            lb++;
        while ( pivot < array[ ub ] )          // search down for item <= pivot
            ub--;
        if ( lb <= ub )                        // if not crossed over, then swap
        {
            temp = array[ lb ];                // swap elements using their index
            array[ lb ] = array[ ub ];
            array[ ub ] = temp;
            lb++;                              // increment ready for next scan
            ub--;                              // decrement ready for next scan
        }
    } while ( lb <= ub );                      // until indexes cross over
    if ( first < ub )                          // if > 1 item in the partition
        QuickSort( array, first, ub );         // partition the lower partition
    if ( lb < last )                           // if > 1 item in the partition
        QuickSort( array, lb, last );          // partition the upper partition
}
16.3 Quicksort - is significantly faster than either of the above whatever the initial ordering of the data.
[Figure: chart of time taken by Shell Sort, Heap Sort and Quicksort]
Testing
1. The context for testing - Verification and Validation
Verification and Validation is a generic term for all processes which ensure that the software meets its requirements, and that the specification meets the needs of the client. In other words:

Verification means "Are we building the product right?" This involves checking that the software product conforms to its specification.

Validation means "Are we building the right product?" This involves checking to ensure that the software product meets the expectations of the client.
Techniques required
• Analysis of the design and program listing. Includes Walkthroughs, Inspections, Formal verification
• Exercising the program using test data similar to real data, i.e.
2.
• To show that the software system meets its specification.
• To exercise the system in such a way that any latent defects are exposed.
Testing cannot prove the absence of defects, only their presence. A successful test is one that discovers defects.
are effectively infinite. For large programs, testing all possible combinations of pathways through the code and all possible variations in categories of input would take until the end of the universe even at the rate of one test per millisecond.
3. Testing & Debugging
• Testing is required to discover errors in software.
• Debugging is the process of correcting errors discovered by testing.
[Figure: the debugging cycle: Locate Error, Design Repair, Repair Error, Re-Test]
It is much more economical to discover errors at the design stage than after the program has been coded because this avoids the correction process i.e. it avoids the need to debug and re-test.
4.
Bottom-up Top-down
4.1 Bottom-up testing
As each component (e.g. function or module) is developed it is tested 'stand-alone' by using a specially written 'test harness' or 'test driver'. This is referred to as unit testing. In C++ a module is a file pair - the interface (header file) and the implementation (object code file). Usually this pair will implement either:
• A set of useful functions, e.g. iostream, math
• An abstract type, e.g. a linked list or string abstraction
Re-usable components (e.g. a linked list module) should be distributed with test drivers.

Individual components, e.g. functions, are tested to ensure that they operate correctly. Each component is treated as a stand-alone entity that does not need other components in order for it to be tested. Functions are assembled into modules that are then tested - module testing. Several modules may be amalgamated to produce sub-systems which are then tested - sub-system testing.

One of the problems that module or sub-system testing might reveal is a mismatch between the interfaces. This can occur when the module using the facilities of another module has been designed on assumptions that differ from those made in the design of that module. This might result from a lack of understanding of the interface specification on the part of either the author or the user of the module. Or it might be caused by an error in implementation.
[Figure: stages of testing: Unit Testing, Module Testing, Sub-System Testing, System Testing and Acceptance Testing, grouped as Component Testing, Integration Testing and User Testing]
Finally, all modules are combined to produce the program - system testing.
After this, the user carries out acceptance testing. For bespoke systems developed for a single user, this is sometimes referred to as alpha testing. For marketable software products beta testing may be used where a number of users agree to use the system and to report on any problems. In exchange for this they may get the software either free or at a preferential rate.
Advantage
• It is easier to create test conditions. The functionality is there - it just needs code to test it.

Disadvantages
• If combined with top-down development, all system components must be available before testing can start because the last items to be completed under this development strategy are the lowest level components - the first to be tested.
• If top-down development is not employed, then special test drivers have to be written for each component. Eventually these are replaced by the actual higher level components when they are implemented.
4.2 Top-Down Testing
This starts with a skeleton of the system: an 'executive module' at the top of the hierarchy. Some or all of the lower level modules may not have been implemented and exist only as stubs. Stubs are functions whose body has not yet been implemented. They simply report e.g. the name of the function or the value of the arguments and/or return a dummy value. Initially, the tests are very limited - the purpose is only to exercise the interfaces between major sub-systems. As more and more modules are implemented the tests can become more comprehensive.
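A stub of the kind described might look like the following minimal sketch. The names calculateTax and netPay are hypothetical, chosen only to illustrate a higher level function being exercised before the lower level one has been written.

```cpp
#include <iostream>

// Stub: the real body has not yet been written. It reports its name and
// argument, then returns a dummy value so that the caller can be exercised.
double calculateTax( double grossPay )
{
    std::cout << "STUB calculateTax called, grossPay = " << grossPay << '\n';
    return 0.0;                                  // dummy value
}

// Higher level function: already implemented, and reaches the lower level
// only through its interface.
double netPay( double grossPay )
{
    return grossPay - calculateTax( grossPay );
}
```

When the real calculateTax is implemented it replaces the stub without any change to netPay, provided the interface is unchanged.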
Advantages
• The testing process matches the top-down design approach.
• Structural errors - perhaps faults in the design - are found earlier. This may avoid extensive re-design at a later stage.
• The availability of a limited working system is a morale booster and may be available to demonstrate to the client.

Disadvantages
• It may be difficult to provide stubs which simulate the behaviour of a complex component.
• In most systems, output is generated by lower level modules. There may therefore be a need for an artificial environment to generate test results for higher level modules.
4.3 Conclusion
The top-down approach is generally considered preferable for most systems today (Yourdon). But, in practice, it will always be necessary to include a certain amount of bottom-up testing of low level components.
5. Categories of Testing
5.1 Functional testing
The most common form. Its purpose is to ensure that the program performs its normal functions correctly - see above.
5.2 Thread testing
This may be used in real-time systems which are usually made up of a number of co-operating processes. An external event such as an input from a sensor may cause control to be transferred from the current process to the process that handles that event. Real time systems are difficult to test because of the time-dependent interactions between the processes. An error may occur only when the processes are each in a particular state. Thread testing follows the functional testing of the processes and is designed to trace the effect of the different external events as they thread through the various processes. The number of combinations of state of the various processes may be so great that it is impossible to test all of them, e.g. 10 processes, each with 10 possible states produces 10,000,000,000 different combinations.
5.3 Recovery Testing
Purpose - to ensure that the system can recover from various types of failure. This is important in on-line and real-time systems e.g. controlling manufacturing processes. It may be necessary to simulate in software such failures as hardware, power, operating system etc.
5.4
6. Test Planning
The planning of tests should be carried out during the Specification and Design phases of the software project:-
[Figure: test planning against the project phases: Req'ments Spec, System Spec, System Design, Detailed Design, Service, Acceptance test]
6.1 Test Plan & Test Log
The Test plan includes
• A unique identifying number for the test.
• A description of the purpose of the test.
• A specification of the data to be used.
• A description of the expected result.

The Test log includes
• A reference to a test plan item.
• The date of the test.
• The result of the test.
• An indication of whether or not the expected result was obtained.
• A reference to any corrective action required if a fault is found.
• A possible reference to re-testing if this is needed.
7.
8.
Test data can sometimes be generated automatically, but it is impossible to generate test cases automatically.
9.
These two methods are NOT alternatives. White box testing may be carried out early in the testing process, while black box testing may be applied later. They are likely to uncover different classes of error.
10. Black box testing
There are two techniques for deriving the test data: equivalence partitioning and boundary value analysis.

• any value in range 18..65
• any value in range MIN(int)..17
• any value in range 66..MAX(int)
Test cases can then be designed for each valid equivalence class and for each invalid equivalence class - a total of 3 tests in this simple case. If there is more than one argument, the test cases should cover the invalid classes for only one argument at a time because one erroneous argument may mask the effect of another erroneous argument.
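The three equivalence classes above can be turned into a tiny test table. This is a minimal sketch only: the component inRange and the representative values 40, 5 and 1000 are assumptions made for illustration, with 18..65 taken to be the valid class.

```cpp
// Hypothetical component under test: accepts values in the range 18..65.
bool inRange( int value )
{
    return value >= 18 && value <= 65;
}

// One representative test value per equivalence class.
struct ClassTest { int value; bool expected; };

const ClassTest classTests[] = {
    { 40,   true  },   // valid class:   18..65
    { 5,    false },   // invalid class: MIN(int)..17
    { 1000, false },   // invalid class: 66..MAX(int)
};

// Returns the number of test cases whose result differs from the expected one.
int runClassTests()
{
    int failures = 0;
    for ( const ClassTest& t : classTests )
        if ( inRange( t.value ) != t.expected )
            failures++;
    return failures;
}
```

Any other member of a class should behave in the same way as its representative, which is why one value per class is considered enough.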
Another Example
    /* Pre  - The array is ordered, numitems >= 1, numitems <= no. of array elements
       Post - If target is present in the array, then location records the element
              number at which target was found and true is returned, else location
              records the correct insertion point and false is returned */
    bool binsearch( int array[], int numitems, int target, int& location )
    {
        int low = 0, high = numitems - 1, mid;
        bool found = false;
        do
        {
            mid = (low + high) / 2;
            if( target > array[ mid ] )
                low = mid + 1;
            else
                high = mid - 1;
        } while( target != array[ mid ] && low <= high );
        found = ( target == array[ mid ] );
        if ( found )
            location = mid;
        else
            location = low;
        return found;
    }
Valid Equivalence classes for input arguments:

The choice of VECs may require experience, e.g. that the binary search of an ordered array may, if not correctly coded, behave differently depending on whether the number of items stored in the array is odd or even, or if there is only one item.

• Array
  - has 1 item (numitems = 1)
  - has even number of items (e.g. numitems = 6)
  - has odd number of items (e.g. numitems = 7)
• Target
  - is present in the array
  - is not present in the array
Invalid Equivalence classes for input arguments:

These are all cases where the pre-conditions are not met. The specification of the binsearch function says nothing about how it will respond to such error conditions. C++ provides the facility for an exception to be raised in such cases and for error handlers implemented elsewhere in the code to catch the exception and take the necessary action. In a production program these invalid equivalence classes would be tested to ensure that the exception and handling mechanisms dealt correctly with the various causes of the error.
Valid return values (there are no invalid return values):
  - non-zero (true)
  - zero (false)
10.2 Boundary Value Analysis
This complements equivalence partitioning and, in practice, is used at the same time as equivalence partitioning to determine the test data required for testing a component. Boundary values are those
• directly on
• just below
• just above
the boundaries of the equivalence classes.
It is an observed fact that a greater number of errors occur at the boundaries of the input domain than in the centre.
Examples
• Range of values, e.g. 18..65: test 17, 18, 65 and 66
• Discrete set of values, e.g. 2, 3, 5, 8, 13: test 1, 2, 13, 14
• Data structure (e.g. array) has 1..100 elements: test 0, 1, 100, 101
• Loop iterations: none, 1, 2, max, max + 1
For the binary search example, the three array classes
• has 1 item (numitems = 1)
• has even number of items (numitems = e.g. 6)
• has odd number of items (numitems = e.g. 7)
combined with the two target classes (present, not present) give 3 x 2 = 6 combinations of valid equivalence classes.

Experience shows that programmers often make errors in an algorithm due to a misunderstanding of its behaviour at the boundaries of its input domain. In the case of the binary search algorithm, these errors might occur when the target (if present) is located in the first element of the array, or in the last element. Obviously it is necessary also to test the normal case when the target is in neither of these locations.
Thus the following further test cases are added to those above:
• Target is in the first element of the array
• Target is in the last element of the array
• Target is in neither the first nor the last element

When the equivalence classes already developed are combined with these boundary values, the following 10 test cases arise:
• numitems = 1, target is present
• numitems = 1, target is not present
• numitems is even, target is in the first element
• numitems is even, target is in the last element
• numitems is even, target is present and in neither the first nor the last element
• numitems is even, target is not present
• numitems is odd, target is in the first element
• numitems is odd, target is in the last element
• numitems is odd, target is present and in neither the first nor the last element
• numitems is odd, target is not present.
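The ten cases can be collected into a simple test driver. The sketch below repeats binsearch from the notes (with the statement terminator supplied after the assignment to found); the array contents, the Case structure and runBinsearchTests are assumptions made for illustration.

```cpp
#include <vector>

// binsearch as given in the notes.
bool binsearch( int array[], int numitems, int target, int& location )
{
    int low = 0, high = numitems - 1, mid;
    do
    {
        mid = (low + high) / 2;
        if ( target > array[ mid ] ) low = mid + 1;
        else                         high = mid - 1;
    } while ( target != array[ mid ] && low <= high );
    bool found = ( target == array[ mid ] );
    location = found ? mid : low;
    return found;
}

// Hypothetical test record: input data, target, and the expected results.
struct Case { std::vector<int> data; int target; bool found; int location; };

// Runs the ten cases listed above and returns the number that fail.
int runBinsearchTests()
{
    const std::vector<int> one  = { 10 };
    const std::vector<int> even = { 2, 4, 6, 8, 10, 12 };
    const std::vector<int> odd  = { 2, 4, 6, 8, 10, 12, 14 };
    const Case cases[] = {
        { one,  10, true,  0 },   // 1 item, present
        { one,   7, false, 0 },   // 1 item, not present
        { even,  2, true,  0 },   // even, first element
        { even, 12, true,  5 },   // even, last element
        { even,  8, true,  3 },   // even, neither first nor last
        { even,  7, false, 3 },   // even, not present
        { odd,   2, true,  0 },   // odd, first element
        { odd,  14, true,  6 },   // odd, last element
        { odd,   8, true,  3 },   // odd, neither first nor last
        { odd,   9, false, 4 },   // odd, not present
    };
    int failures = 0;
    for ( Case c : cases )        // copied so that data can be passed as int[]
    {
        int location = -1;
        const bool found = binsearch( c.data.data(),
                                      static_cast<int>( c.data.size() ),
                                      c.target, location );
        if ( found != c.found || location != c.location )
            failures++;
    }
    return failures;
}
```

Keeping the expected result beside each input means the driver doubles as a record of the test plan items.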
[Figure: flow chart of a loop executed twice, containing a Branching Decision that selects one of three Statement Blocks (A, B or C)]
How many different sets of paths exist for this simple piece of code?
    Set   First iteration   Second iteration
    1     A                 A
    2     A                 B
    3     A                 C
    4     B                 B
    5     B                 A
    6     B                 C
    7     C                 C
    8     C                 A
    9     C                 B
The answer is 9, i.e. 3 paths raised to the power of the number of loop iterations (3^2).

And this?

[Figure: flow chart with a loop executed 20 times]

The answer is 95,367,431,640,625 = 5^20 different sets of paths. Evaluating every possible set of paths at 1 test/millisecond would take 3,022 years. So exhaustive testing is not possible. In practice, tests should guarantee that
• Each path (not necessarily all sets of paths) has been exercised.
• All logical branches have both values tested (true and false).
• All loops are exercised at their boundaries and within their bounds.
• All internal data structures have been exercised to ensure their validity.
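The arithmetic behind these figures can be checked with a short sketch. The function names are hypothetical, and a year of 365.25 days is assumed when converting milliseconds to years.

```cpp
#include <cstdint>

// Number of distinct sets of paths: (paths per iteration) ^ (loop iterations).
std::uint64_t pathSets( std::uint64_t pathsPerIteration, unsigned iterations )
{
    std::uint64_t total = 1;
    for ( unsigned i = 0; i < iterations; i++ )
        total *= pathsPerIteration;
    return total;
}

// Whole years needed at one test per millisecond (assumes a 365.25-day year).
std::uint64_t yearsAtOneTestPerMs( std::uint64_t tests )
{
    const std::uint64_t msPerYear = 1000ULL * 60 * 60 * 24 * 36525 / 100;
    return tests / msPerYear;
}
```

With 3 paths and 2 iterations pathSets gives 9, and with 5 paths and 20 iterations it gives the 14-digit figure quoted above.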
But why do we need to go to all this trouble? Wouldn't we spend our time better simply ensuring that the function/module/program requirements have been met? In other words why don't we confine our tests to black box testing?
Because
• Logic errors and incorrect assumptions tend to occur in inverse proportion to the probability that a path will be executed. Normal processing tends to be well understood and scrutinised, but special cases tend to fall down the cracks.
• We often believe that a path is unlikely to be executed when, in fact, it may be executed regularly.
• Typing errors are usually picked up by the compiler. But those that are not detected are just as likely to occur on an obscure logical path as on a mainstream path.
• Statement coverage
• Condition coverage
  - Branch testing
  - Domain testing
• Loop coverage
12.3 Path testing
A technique for finding the number of unique paths through a program, thus providing the number of test cases. It uses flow graphs derived from the program code or from the PDL (program description language) for the routine, together with metrics for calculating the cyclomatic complexity.
• Number of regions (including the one outside the graph)
• Number of edges - number of nodes + 2
• Number of predicate nodes + 1. (Predicates are simple 2 branch constructs. Each diamond in the flow chart opposite is a predicate).
Each of these three methods produces the same cyclomatic complexity metric (i.e. the number of independent paths through the code). In this example it is 5. The number of independent paths also provides the number of different test cases required to ensure that all statements are exercised.
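Two of the three calculations are easy to mechanise. The sketch below uses a small hypothetical flow graph (a loop whose body contains an if/else), not the binary search graph from the figure, and shows that both formulas agree.

```cpp
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;   // (from node, to node)

// Cyclomatic complexity = number of edges - number of nodes + 2.
int complexityFromEdges( const std::vector<Edge>& edges, int nodeCount )
{
    return static_cast<int>( edges.size() ) - nodeCount + 2;
}

// Cyclomatic complexity = number of predicate nodes + 1, where a predicate
// node is one with exactly two outgoing edges.
int complexityFromPredicates( const std::vector<Edge>& edges, int nodeCount )
{
    std::vector<int> outDegree( nodeCount, 0 );
    for ( const Edge& e : edges )
        outDegree[ e.first ]++;
    int predicates = 0;
    for ( int d : outDegree )
        if ( d == 2 )
            predicates++;
    return predicates + 1;
}

// Hypothetical flow graph with 6 nodes:
// 0 = loop test, 1 = if test, 2 = then part, 3 = else part, 4 = join, 5 = exit
const std::vector<Edge> sampleGraph = {
    { 0, 1 }, { 0, 5 }, { 1, 2 }, { 1, 3 }, { 2, 4 }, { 3, 4 }, { 4, 0 }
};
```

For sampleGraph there are 7 edges, 6 nodes and 2 predicate nodes, so both functions return 3.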
12.6 Condition Testing
Conditions are made up of:
• Arithmetic & character expressions involving arithmetic and character variables and constants
• Relational expressions - logical expressions involving arithmetic and character expressions and relational operators. They have the value of either TRUE or FALSE.
• Boolean variables - values non-zero (TRUE), zero (FALSE).
• Boolean operators (&&, ||, !) joining one or more logical expressions.
• Parentheses surrounding simple or compound conditions
Condition testing focuses on testing each condition in the component (including each of the simple conditions making up a compound condition).
[Figure: Flow graph for binary search]
The advantages of condition testing are i) it is easy to generate test cases and ii) it is likely to reveal other errors in the program.
Example
if ( A > 1 && B == 0 ) X /= A;
    A > 1                    B == 0                   A > 1 && B == 0
    TRUE / FALSE   Value     TRUE / FALSE   Value     TRUE / FALSE
    T              3         T              0         T
    T              3         F              1         F
    F              1         T              0         F
    F              1         F              1         F
For the above 2 conditions there are 4 test cases, i.e. 2^2. For 3 conditions, there are 2^3 = 8 possible combinations etc. This technique is therefore only practicable for small numbers of conditions.
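The four combinations in the table can be written down directly as test data. In this minimal sketch the structure Combination and the function conditionFailures are hypothetical names; the values of A and B are those used in the table.

```cpp
// The compound condition from the example.
bool compound( int A, int B )
{
    return A > 1 && B == 0;
}

// One row of the truth table: values for A and B and the expected outcome.
struct Combination { int A; int B; bool expected; };

const Combination combinations[] = {
    { 3, 0, true  },   // A > 1 TRUE,  B == 0 TRUE
    { 3, 1, false },   // A > 1 TRUE,  B == 0 FALSE
    { 1, 0, false },   // A > 1 FALSE, B == 0 TRUE
    { 1, 1, false },   // A > 1 FALSE, B == 0 FALSE
};

// Returns the number of rows for which the condition gives the wrong outcome.
int conditionFailures()
{
    int failures = 0;
    for ( const Combination& c : combinations )
        if ( compound( c.A, c.B ) != c.expected )
            failures++;
    return failures;
}
```

Adding a third simple condition would double the table to eight rows, which is why the approach is limited to small numbers of conditions.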
There are therefore 3 test cases for each of the two variables in the example compound condition, leading to 3^2 = 9 test cases. Again, the number of test cases rises rapidly as the number of variables involved in a relational expression increases.
[Figure: flow graphs of simple loops, nested loops and concatenated loops]
Simple loops
The following tests should be applied to simple loops, where n is the maximum number of allowable iterations of the loop:
• Skip (loop is not entered)
• One pass
• 2 passes
• m passes (m < n)
• n - 1, n, n + 1 passes
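The list above translates naturally into a table-driven check. This is a sketch under stated assumptions: processItems is a hypothetical component whose loop must never make more than maxPasses passes, and m is taken to be n / 2.

```cpp
// Hypothetical component: processes up to maxPasses items and returns
// the number of loop passes actually made.
int processItems( int requested, int maxPasses )
{
    int passes = 0;
    for ( int i = 0; i < requested && i < maxPasses; i++ )
        passes++;
    return passes;
}

// Applies the simple-loop tests for a maximum of n passes (n >= 3 assumed);
// returns the number of failing tests.
int runLoopTests( int n )
{
    const int m = n / 2;                                 // a typical value, m < n
    const int requests[] = { 0, 1, 2, m, n - 1, n, n + 1 };
    int failures = 0;
    for ( int r : requests )
    {
        const int expected = ( r > n ) ? n : r;          // n + 1 must be capped at n
        if ( processItems( r, n ) != expected )
            failures++;
    }
    return failures;
}
```

The n + 1 request is the interesting one: it checks that the loop refuses to exceed its allowed maximum.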
Nested loops
The number of times that statements within the inner loop are executed is the product of the number of iterations of all nested loops within which it appears. Thus a triply nested loop, where each loop iterates 10 times, will cause statements in the inner loop to be executed 1,000 times. The number of test cases grows geometrically and full testing may be impracticable. The suggested solution is:
a) Start with the innermost loop, setting all outer loop control variables to their minimum.
b) Test the inner loop as Simple above.
c) Work outwards to the next innermost etc., keeping outer loop control variables at their minimums, and the inner at typical values.
d) Continue until all nested loops have been tested.
Concatenated loops
Where the concatenated loops are independent of each other, treat each as a simple loop. Where the second loop has the same control variable as the first and starts with its value unchanged, treat the two loops as nested.
• Static Analysers
  Carry out a static analysis of the program's structure and format.
• Code auditors
  Special purpose filters that check the quality of software to ensure it meets minimum coding standards.
• Assertion processors
  The programmer writes assertions about the state of the program. The assertion processor tests whether they are true or false. C incorporates a simple form of assertion testing:

    #include <assert.h>
    int main ( void )
    {
        int i = 0;
        for( ; i <= 10; i++ );
        assert( i == 10 );
        return 0;
    }
    /* Assertion failed: i == 10, file ASSERT.CPP, line 7
       Abnormal program termination */

  The loop leaves i at 11, so the assertion fails and the program is terminated. C++ provides exception handling which gives greater flexibility and permits an exception handler to attempt recovery from an error.
• Test file & Test data generators
• Test verifiers - measure and report on internal test coverage
• Test harnesses - allow the program to be installed in a test environment, and fed input data. The behaviour of subordinate modules is simulated by stubs.
• Output comparators - compare output from the current version of the program with that from an earlier version to determine any differences

This is an area of growing importance and descendants of the first generation testing tools are expected to cause radical changes in the way software is tested.
We can find the address of the next name by adding 8 to the address of the current element. Thus, Scheme 1 implements the logical structure of the data by locating its elements in physically adjacent memory locations. But if we wish to retrieve a name (in order to access some other data associated with it), then we would have to scan the list from the start, looking for the name to be retrieved.

Scheme 2
Each name is positioned in memory according to the value of its first letter. The address for a particular name is found by

    1000 + 8 * (int(firstletter) - int('A'))

In this case there is no way of finding the logical successor of a record. We are prevented from operating on the data using its logical structure. But if we wished to retrieve a particular name, we could do so very quickly by calculating the address directly from the name.

    Address   Name
    1000      Arnold
    1008
    1016      Conrad
    1024      Dickens
    1032      Eliot
    ..        ..
    1096      Milton
    Scheme 2

Scheme 3
Each element contains both a name and an address pointing to the element's logical successor. Given the address of any element, we can find its successor by simply going to the address contained in that element. Scheme 3 implements the logical order by linking the elements together in the proper sequence, which is not the same as the physical sequence. Address 992 is used to hold the address of the first name in the list. Milton has a blank successor address field.

    Address   Name
    992       (address of the first name)
    1000      Milton
    1008      Dickens
    1016      Eliot
    1024      Arnold
    1032      Conrad
    Scheme 3
As with Scheme 1 we cannot find a given name other than by starting at the beginning of the list and comparing each successive name with the target. These three schemes illustrate the three fundamental methods of implementing abstract list data types - by an array, a hash table and a linked list.
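The address calculation used by Scheme 2 is a very simple hashing function. A minimal sketch follows; the function name schemeTwoAddress is hypothetical, and it assumes that names start with an upper-case letter in 'A'..'Z', that the letters have consecutive character codes, and that each element occupies 8 bytes.

```cpp
// Scheme 2: the address of a name is computed directly from its first letter.
// Assumes an upper-case first letter and 8 bytes per element, starting at 1000.
int schemeTwoAddress( const char* name )
{
    return 1000 + 8 * ( int( name[ 0 ] ) - int( 'A' ) );
}
```

Two names with the same first letter would be given the same address, a collision that Scheme 2 as described does not deal with.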
3. Metrics
One way of implementing a list is to use an array. It is true that arrays are relatively unsuitable for this purpose because of their inflexibility and because of the need to shuffle array elements down to fill the hole left by a deletion, but they have the advantage of requiring no overhead in terms of space. Linked lists, of course, carry an overhead in the form of the links (pointers) that connect the nodes. Envisage then a list implemented as an array as in Scheme 1 above and assume that we wish to find the name Eliot in the list.
3.1 Number of Comparisons
We simply start at the first name in the list (Milton) and search through the list, comparing each name encountered with Eliot. One measure of the time required to find this name is the number of comparisons made of each name with the target. Unless the list is very short, the time required to initialise and finalise the search will be relatively unimportant when set against the number of comparisons. It is generally true that the number of comparisons made when searching a data structure will be one of the major factors in determining the speed of execution.
3.2
4. Mathematical Notations
One way of ascertaining the efficiency of algorithms used in operations on data structures is to write a program which tests the algorithm on a large number of different types and sizes of data. This approach is useful in trying to understand an algorithm and the factors which affect its efficiency, but the problem is that:
a) The data would only be valid for the computer, operating system and language we have employed and the nature of the data stored in the data structure.
b) We could not possibly examine exhaustively all possible combinations of data (there are over 358,000 different combinations of just four characters, ignoring case).
c) We would finish up with a mass of results which would be difficult to understand and distil into a general indication of the efficiency of the algorithm under consideration.
We require a crude indicator of the time complexity of an algorithm that relates the time taken to the number of elements held in the data structure. We are not particularly concerned with the absolute amount of time, which, for one algorithm, will depend on the factors mentioned in a) above. Looking at the search example above, how many comparisons, on average, will be required to find a name in the list? Let n denote the number of names in the list:

    Element Number          1   2   3   ..   ..   n
    Number of Comparisons   1   2   3   ..   ..   n
To find the average number of comparisons necessary to locate a name present in the list, we first find the total required to find each of the names, and then divide by n. Thus, n comparisons would be needed to find the last name, n - 1 to find the last but one ... through to just one comparison to find the first. We can calculate the average number of comparisons for n items without needing to know the value of n:-
Write the sequence of comparisons, the same sequence reversed beneath it, and the sum of each column in a third row:

    1        2        3        ..   n
    n        n - 1    n - 2    ..   1
    n + 1    n + 1    n + 1    ..   n + 1

Since there are n items in the sequence, the total of the third row is n(n + 1). To find the average number of comparisons, we need to divide by n and also by 2, since we added the 2 sequences together. Dividing by 2n gives the average for any one name:

    n(n + 1) / 2n = (n + 1) / 2

Thus the average number of comparisons required to find a name in the list is about half n, whatever the value of n. Since we have seen that the number of comparisons is a major determinant of the time required, we can say that the time taken for this search is proportional to (n + 1) / 2. Since the constant is not significant in relation to other possible factors of n, we can say that the order of magnitude of the efficiency of the search is n, and we write this as O(n). This is sometimes referred to as the Big O notation.

Only the dominant term is chosen to represent a crude notion of the order of magnitude of the entire expression, e.g.

    n(n+1)
    15n log n + 0.1n^2 + 5
    6 log n + 3n + 7
    2n - 5

Why is the second item above classified as O(n^2) when this appears to form only a small part of the expression? Table 1 shows the value of this function for various values of n. The last column shows the value of the expression divided by 0.1n^2. Note that, for the largest values of n, the value in this last column settles down to about 1.0, indicating the overwhelming importance of the 0.1n^2 component.

    n            log2 n   15n
    8            3        120
    32           5        480
    128          7        1,920
    512          9        7,680
    4,096        12       61,440
    65,536       16       983,040
    1,048,576    20       15,728,640
    TABLE 1
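The dominance of the 0.1n^2 term can be examined numerically. The following sketch assumes base 2 logarithms for the expression 15n log n + 0.1n^2 + 5; the function names f and ratioToDominantTerm are hypothetical.

```cpp
#include <cmath>

// f(n) = 15 n log2 n + 0.1 n^2 + 5
double f( double n )
{
    return 15.0 * n * std::log2( n ) + 0.1 * n * n + 5.0;
}

// The whole expression divided by its n^2 term alone.
double ratioToDominantTerm( double n )
{
    return f( n ) / ( 0.1 * n * n );
}
```

Evaluating ratioToDominantTerm for the values of n in the table shows the ratio falling steadily towards 1.0 as n increases.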
Table 2 gives some idea of the values of several different functions of n. Some simple sorting methods (e.g. Exchange or Bubble sort) operate in a time which is O(n^2) whereas other, more complex algorithms (e.g. Shell sort and Quicksort), operate in a time which is O(n log2 n). If there were 1024 items to sort, the simple method would take approx 1,000,000 units of time. Compare this with a time of only approx 10,000 for the O(n log2 n) sort. However, it is not true to say that the complex sort is 100 times faster than the simple sort. Because of its complexity, the more powerful O(n log2 n) sort will carry an overhead which results in constants which are present in the true value of the function but which are ignored in arriving at the crude order of magnitude value. For this reason also, the complex sort may not be as fast as the simple sort for small values of n.
[Table 2: values of several functions of n; e.g. for n = 8, n^2 = 64, log2 n = 3 and n log2 n = 24]
In some sources you may find logarithms specified without the base, e.g. O(n log n). Does it matter which logarithm base is used in these order of magnitude expressions? The answer is no because, although the absolute values of the expressions will differ according to the base used, the rate of increase of the function for increasing values of n will remain the same for all logarithm bases. Table 3 illustrates this by showing the values of the expression n log n for logarithms base 2, e and 10 and for values of n which double in each row. Note that the rate of increase is exactly the same for all three bases, and is approximately 2.2 times for each doubling of n.
    n         n.log2n    rate of     n.ln n     rate of     n.log10n   rate of
                         increase               increase               increase
    128       896                    621                    270
    256       2,048      2.29        1,420      2.29        617        2.29
    512       4,608      2.25        3,194      2.25        1,387      2.25
    1,024     10,240     2.22        7,098      2.22        3,083      2.22
    2,048     22,528     2.20        15,615     2.20        6,782      2.20
    4,096     49,152     2.18        34,070     2.18        14,796     2.18
    8,192     106,496    2.17        73,817     2.17        32,058     2.17
    16,384    229,376    2.15        158,991    2.15        69,049     2.15
    TABLE 3
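The independence from the logarithm base can be confirmed with a few lines of code. In this sketch the function growthOnDoubling is a hypothetical name; it uses the change-of-base rule log_b(x) = ln(x) / ln(b).

```cpp
#include <cmath>

// Growth factor of n log n when n doubles, for a logarithm of the given base.
double growthOnDoubling( double n, double base )
{
    auto nLogN = [base]( double x )
    {
        return x * std::log( x ) / std::log( base );   // x * log_base( x )
    };
    return nLogN( 2.0 * n ) / nLogN( n );
}
```

The factor 1 / ln(base) appears in both the numerator and the denominator and so cancels, which is why the rate of increase is the same for every base.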
Trees
1. Applications
• Trees are hierarchical structures and can be used in any application that models a hierarchical structure, e.g. disk directory and file structure.
• In some forms they can provide rapid searching and lookup.
• They can maintain their data ordered (usually on a unique key that is associated with their data).
2. Implementation
Trees cannot normally be based on a fixed size structure such as an array. They are normally implemented using dynamically allocated nodes linked by pointers.
3. Variations
• Binary Search trees
• Expression Trees
• Balanced Trees
• N'ary Trees
• B Trees
4. Example Declaration
    struct DataItem
    {
        int key;            // key to search on
        anytype value;      // depends on the application
    };

    struct Node
    {
        DataItem data;
        Node* left, *right;
    };

    struct BinaryTree
    {
        int count;
        Node* root;
    };
5. Expression Trees
Assume the expression ( 3 + 4 ) * ( 6 - 4 ) is to be evaluated. Parsing and evaluating an infix expression of this sort in a single pass is very difficult because the string has to be searched back and forth to recognise and allow for the modifying effect that the parentheses have on the meaning of the expression. A tree of nodes representing operators ( +, -, *, / ) and values (or variables) can be built to represent the semantics of the expression without the parentheses. The tree can then be traversed to retrieve the symbols and values in an appropriate order for evaluation - see Traversal below.
6. Tree Traversal
There are several possible ways in which the tree can be traversed; the most common are known as inorder, postorder and preorder:

    Inorder     <left tree> Node <right tree>     (3 + 4) * (6 - 4)
    PostOrder   <left tree> <right tree> Node     3 4 + 6 4 - *
    PreOrder    Node <left tree> <right tree>     * + 3 4 - 6 4
The postorder traversal would produce the nodes in an order suitable for evaluating the resultant postfix expression using a stack. The algorithm for binary tree traversal is one of the most elegant in computer science. It is recursive:

    void inorderTraverse( Node* p )
    {
        if ( p != 0 )
        {
            inorderTraverse( p->left );
            Process( p->data );
            inorderTraverse( p->right );
        }
    }

Process( p->data ) is the operation that is to be carried out on each node. Note that this algorithm effectively maintains its own stack of nodes visited but not yet processed. This is represented by the series of stack frames that is pushed onto the system stack for each call to the function. A non-recursive version of this algorithm requires an explicit stack of nodes to be maintained and is quite inelegant when compared to the above.
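As an illustration, the expression tree for ( 3 + 4 ) * ( 6 - 4 ) can be evaluated by visiting the nodes in postorder. This is a sketch only: the ExprNode structure is an assumption made for the example and differs from the Node declaration given earlier, and the recursion uses the system stack in place of the explicit stack mentioned above.

```cpp
// Node of an expression tree: an operator with two subtrees, or a value leaf.
struct ExprNode
{
    char op;                  // '+', '-', '*', '/' or 0 for a value leaf
    double value;             // used only when op == 0
    const ExprNode* left;
    const ExprNode* right;
};

// Postorder evaluation: left subtree, right subtree, then the node itself.
double evaluate( const ExprNode* p )
{
    if ( p->op == 0 )
        return p->value;
    const double l = evaluate( p->left );
    const double r = evaluate( p->right );
    switch ( p->op )
    {
        case '+': return l + r;
        case '-': return l - r;
        case '*': return l * r;
        default:  return l / r;      // '/'
    }
}

// The tree for ( 3 + 4 ) * ( 6 - 4 ), built from the leaves upwards.
const ExprNode three   = { 0, 3, nullptr, nullptr };
const ExprNode four    = { 0, 4, nullptr, nullptr };
const ExprNode six     = { 0, 6, nullptr, nullptr };
const ExprNode fourB   = { 0, 4, nullptr, nullptr };
const ExprNode sum     = { '+', 0, &three, &four };
const ExprNode diff    = { '-', 0, &six, &fourB };
const ExprNode product = { '*', 0, &sum, &diff };
```

Calling evaluate( &product ) works through the nodes in the order 3 4 + 6 4 - * and yields 14. Note that the parentheses of the original expression are not stored anywhere; the shape of the tree carries their meaning.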
7. Parse Trees
    Sentence = Subject Verb Object
    Subject  = Noun | Noun Phrase
    Object   = Noun | Noun Phrase
    Noun     = Cat | Mat | Dog
    Verb     = sat | ate | chased

[Figure: parse tree in which a Sentence divides into Subject, Verb and Object, and the Subject and Object are each either a Noun OR a Noun Phrase]
Parse trees such as the very simple example above may be used in natural language recognition and language translation software.
8.
A binary tree is either empty or consists of a node with left and right binary trees.
Binary search trees are ordered on a unique key field. The first data item to arrive causes a new node to be allocated which becomes the root node. Access to the tree is always via the root. For subsequent additions, the tree is traversed, looking for an empty left or right child node starting at the root. If the key of the data to be added is less than that of the current node, then the left child of the current node is visited. If the data to be inserted is greater than that of the current node, then the right child is visited. If the two data values are equal, then the data cannot be added since binary trees rely on the keys being unique. Eventually, an empty left link or right link is encountered. A new node is allocated and linked in to the tree as the left, or right child of the node currently being visited. All additions therefore take place at the lower levels of the tree - as leaf nodes.
• Searching for 6: Left, Right, found
• Searching for 11: Right, Left, Right, not found
• Inserting 13: Right, Right, Left, not found, so insert as Left child of 14
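The insertion and search rules described above can be sketched as follows. The simplified Node holds only a key, the function names are hypothetical, and the root key 8 is an assumption inferred from the search paths listed above; nodes are never freed, to keep the sketch short.

```cpp
#include <initializer_list>

struct Node
{
    int key;
    Node* left;
    Node* right;
};

// Inserts key as a new leaf; returns false if the key is already present.
bool insert( Node*& root, int key )
{
    Node** link = &root;                      // the link currently being examined
    while ( *link != nullptr )
    {
        if ( key == (*link)->key )
            return false;                     // keys must be unique
        link = ( key < (*link)->key ) ? &(*link)->left : &(*link)->right;
    }
    *link = new Node{ key, nullptr, nullptr };
    return true;
}

// Returns the number of nodes compared with the key, or 0 if it is not found.
int search( const Node* root, int key )
{
    int comparisons = 0;
    for ( const Node* p = root; p != nullptr; )
    {
        comparisons++;
        if ( key == p->key )
            return comparisons;
        p = ( key < p->key ) ? p->left : p->right;
    }
    return 0;
}

// Builds the example tree (root key 8 assumed).
Node* buildExample()
{
    Node* root = nullptr;
    for ( int key : { 8, 4, 12, 2, 6, 10, 14 } )
        insert( root, key );
    return root;
}
```

In this tree, searching for 6 visits three nodes (root, left, right), searching for 11 fails, and inserting 13 attaches it as the left child of 14, as in the list above.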
[Figure: binary search tree containing the keys 2, 4, 6, 10, 12 and 14]
The total number of nodes in a perfectly balanced binary search tree is 2^levels - 1. With 20 levels, the total number of nodes would be 1,048,575.
The efficiency of a perfectly balanced tree is measured by the average number of comparisons required to find a key that is present in the tree. Since it requires one comparison to visit the root node, two comparisons to examine the root node and one of its child nodes etc. the maximum number of comparisons is the number of levels and, since the number of nodes doubles at each level, the average number of comparisons for a perfectly balanced tree is the number of levels - 1. Thus for a perfectly balanced tree of 1,048,000 nodes, the average number of comparisons is Number of Levels - 1 = 19.
This makes binary search trees a suitable structure for fast retrieval of data by reference to a key and, for this reason, implementations of the C++ Standard Template Library typically use balanced binary search trees to implement searchable structures such as map and set.
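As a brief illustration of keyed retrieval with std::map, the sketch below wraps three common operations. The helper names addUnique, lookup and keysInOrder are hypothetical; only std::map and its insert, find and iteration behaviour come from the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::map<int, std::string> Table;

// Add an item; returns false if the key is already present (keys are unique).
bool addUnique(Table& table, int key, const std::string& data) {
    return table.insert(std::make_pair(key, data)).second;
}

// Retrieve the data stored under key, or fallback if the key is absent.
std::string lookup(const Table& table, int key, const std::string& fallback) {
    Table::const_iterator it = table.find(key);
    return it == table.end() ? fallback : it->second;
}

// Collect the keys in iteration order, which std::map keeps ascending.
std::vector<int> keysInOrder(const Table& table) {
    std::vector<int> keys;
    for (Table::const_iterator it = table.begin(); it != table.end(); ++it)
        keys.push_back(it->first);
    assert(std::is_sorted(keys.begin(), keys.end()));
    return keys;
}
```

Whatever order the items are added in, iterating over the map visits the keys in ascending order, which is the property an ordered tree provides.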
9.
Importance of Balance
This tree was generated by inserting the data in numeric order: 2, 4, 6 .. 16. If, as in this case, the tree is not balanced, search efficiency degrades towards that of a simple sequential search, i.e. from an average number of comparisons of about (number of levels - 1) for a balanced tree to (n + 1) / 2 for n items. There is little difference between the two in this small example but, for large numbers of items, the difference in searching efficiency is extremely large.
[Figure: degenerate tree produced by inserting 2, 4, 6, 8, 10, 12, 14, 16 in order; each node has only a right child]
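The effect of insertion order on the shape of the tree can be demonstrated by measuring its height. The sketch below is a simplified illustration with hypothetical names (Node, insert, height, heightAfterInserting); it uses an unbalanced insertion with no rebalancing and assumes the keys are unique.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

struct Node {
    int key;
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
    explicit Node(int k) : key(k) {}
};

// Plain (unbalanced) binary search tree insertion.
void insert(std::unique_ptr<Node>& root, int key) {
    std::unique_ptr<Node>* link = &root;
    while (*link) {
        assert(key != (*link)->key);        // this sketch assumes unique keys
        link = (key < (*link)->key) ? &(*link)->left : &(*link)->right;
    }
    link->reset(new Node(key));
}

// Number of levels in the tree; an empty tree has none.
int height(const std::unique_ptr<Node>& root) {
    if (!root)
        return 0;
    return 1 + std::max(height(root->left), height(root->right));
}

// Build a tree from the keys in the order given and report its height.
int heightAfterInserting(const std::vector<int>& keys) {
    std::unique_ptr<Node> root;
    for (int key : keys)
        insert(root, key);
    return height(root);
}
```

Inserting 2, 4, 6, 8, 10, 12, 14, 16 in that order gives a tree of height 8, one level per item, whereas inserting the same keys in the order 8, 4, 12, 2, 6, 10, 14, 16 gives a height of 4.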
AVL trees (named after Adelson-Velskii and Landis) apply a balancing algorithm on every insertion and deletion, which ensures that the tree maintains an adequate (although not perfect) balance. Red-black trees are another self-balancing scheme, and are the one typically used in implementations of the Standard Template Library.
Level  Nodes in level    2^level  Total nodes  Comps for level  Total comps  Ave comps
    1               1          2            1                1            1      1.000
    2               2          4            3                4            5      1.667
    3               4          8            7               12           17      2.429
    4               8         16           15               32           49      3.267
    5              16         32           31               80          129      4.161
    6              32         64           63              192          321      5.095
    7              64        128          127              448          769      6.055
    8             128        256          255            1,024        1,793      7.031
    9             256        512          511            2,304        4,097      8.018
   10             512      1,024        1,023            5,120        9,217      9.010
   11           1,024      2,048        2,047           11,264       20,481     10.005
   12           2,048      4,096        4,095           24,576       45,057     11.003
   13           4,096      8,192        8,191           53,248       98,305     12.002
   14           8,192     16,384       16,383          114,688      212,993     13.001
   15          16,384     32,768       32,767          245,760      458,753     14.000
   16          32,768     65,536       65,535          524,288      983,041     15.000
   17          65,536    131,072      131,071        1,114,112    2,097,153     16.000
   18         131,072    262,144      262,143        2,359,296    4,456,449     17.000
   19         262,144    524,288      524,287        4,980,736    9,437,185     18.000
   20         524,288  1,048,576    1,048,575       10,485,760   19,922,945     19.000
Total nodes = (2^level - 1). Comps for level = no. of comparisons for level = level * nodes in level. Ave comps = average comparisons per node.
Compilers (see later under perfect hashing functions)
Basis for other Abstract Data Types, e.g. Set, Dictionary
Very efficient retrieval
2.
Operations
3.
Efficiency
The efficiency of searching and sorting is expressed using big O notation (see Data Structure Metrics on page 99). This is a very crude measure of the relationship between time and the number of items being dealt with. The important factor is the rate at which time increases as the number of items increases. Hash tables are unusual among data structures in that, provided collisions are kept rare, their efficiency does not depend on the number of items stored, and it is therefore given as O(1).
4.
Problem
The penalty paid for this exceptional measure of efficiency is that hashing destroys the lexical order of keys, so that they cannot subsequently be retrieved in their lexical order.
5.
Hashing
Data is stored in a Hash Table that is based on the fundamental array structure provided by the language. The size of the table is always a prime number. Insertion (and searching) is performed by applying some function to the key which converts it into an integer in the range 0 .. table_size - 1. The modulus operation is used to achieve wrap-around. In this example the column headed ASC represents the sum of the ASCII codes of the first 3 characters of the name. This is then taken modulo 11 (the table size) to produce the table index.

Name        Key  ASC  Table Index
SHELLEY     SHE  224            4
WORDSWORTH  WOR  248            6
KEATS       KEA  209            0
BYRON       BYR  237            6
BLAKE       BLA  207            9
BETJEMAN    BET  219           10

The insertion of the first three items is shown in the hash table (second of the two tables). The fourth key, BYR, produces the same index as that of WORDSWORTH: a collision. This is not surprising since we are trying to insert a very large domain of values into a table with only 11 locations.
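The hashing scheme used in this example can be sketched in a few lines. The function names asciiSum3 and tableIndex are hypothetical, and the sketch assumes that the character codes are ASCII, as in the table above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

const int TABLE_SIZE = 11;      // a prime number, as in the example table

// Sum of the character codes of the first three characters of the name
// (fewer if the name is shorter than three characters).
int asciiSum3(const std::string& name) {
    int sum = 0;
    for (std::size_t i = 0; i < name.size() && i < 3; ++i)
        sum += static_cast<unsigned char>(name[i]);
    return sum;
}

// Reduce the sum to an index in the range 0 .. tableSize - 1.
int tableIndex(const std::string& name, int tableSize = TABLE_SIZE) {
    assert(tableSize > 0);
    return asciiSum3(name) % tableSize;
}
```

With these definitions tableIndex("WORDSWORTH") and tableIndex("BYRON") both evaluate to 6, which is the collision described above.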
Hash Tables
6. Collision Resolution
There are two strategies for resolving collisions:
Open Addressing
A second hashing function is used to give a new table location and a further attempt is made to enter the key into the table. The simplest function to produce a new location after a collision is to successively add 1 to the result of hashing the key. But this can cause clustering, where the relative density of certain areas of the table is higher than average. This can give rise to a higher than necessary number of collisions. An improved second hashing function is:
    hashvalue = hashvalue + step
    where step = hashvalue % (table size - 2) + 1
step is computed only once, before the loop is entered.
[Figure: hash table with SHELLEY and WORDSWORTH in the Data column]
Probing continues until an empty slot is found or, after a certain number of tries, the table is deemed to be full.
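The sequence of slots visited by this second hashing function can be listed with a short routine. This is a sketch only: probeSequence is a hypothetical name, the hash value is assumed to be non-negative, and step is assumed to be derived from the hash value before it is reduced modulo the table size.

```cpp
#include <cassert>
#include <vector>

// List the slots that would be probed, in order, for a given hash value.
std::vector<int> probeSequence(int rawHash, int tableSize, int maxProbes) {
    assert(rawHash >= 0 && tableSize > 2 && maxProbes >= 0);
    std::vector<int> slots;
    int step = rawHash % (tableSize - 2) + 1;   // computed once, before the loop
    int slot = rawHash % tableSize;             // first location to try
    for (int probe = 0; probe < maxProbes; ++probe) {
        slots.push_back(slot);
        slot = (slot + step) % tableSize;       // wrap around the table
    }
    return slots;
}
```

For the key BYR (hash value 237) in a table of 11 slots, step is 237 % 9 + 1 = 4, so the probes visit slots 6, 10, 3, 7 and so on. Because the table size is prime, the first 11 probes visit every slot exactly once.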
Chaining
The table entry contains a data entry and a pointer to the head of a list of data items that collided with the first or, more simply, just a pointer to the head of a list.
[Figure: chained list holding the colliding item BYR (BYRON)]
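The simpler form of chaining, in which each table entry is just the head of a list, might be sketched as follows. ChainedTable and its members are hypothetical names, std::list stands in for a hand-written linked list, and the hash function is the sum of the first three character codes used in the earlier example (an assumption made so that the two poets collide).

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

class ChainedTable {
    typedef std::pair<std::string, std::string> Entry;   // key and its data
    typedef std::list<Entry> Chain;
    std::vector<Chain> buckets_;                          // one chain per slot

    // Sum of the first three character codes, modulo the table size.
    std::size_t indexOf(const std::string& key) const {
        std::size_t sum = 0;
        for (std::size_t i = 0; i < key.size() && i < 3; ++i)
            sum += static_cast<unsigned char>(key[i]);
        return sum % buckets_.size();
    }

public:
    explicit ChainedTable(std::size_t size) : buckets_(size) {
        assert(size > 0);
    }

    // Append to the chain for the key's slot; false if the key is present.
    bool add(const std::string& key, const std::string& data) {
        Chain& chain = buckets_[indexOf(key)];
        for (const Entry& entry : chain)
            if (entry.first == key)
                return false;
        chain.push_back(std::make_pair(key, data));
        return true;
    }

    // Pointer to the stored data, or nullptr if the key is absent.
    const std::string* find(const std::string& key) const {
        const Chain& chain = buckets_[indexOf(key)];
        for (const Entry& entry : chain)
            if (entry.first == key)
                return &entry.second;
        return nullptr;
    }

    std::size_t chainLength(std::size_t index) const {
        return buckets_.at(index).size();
    }
};
```

With a table of 11 slots, adding WORDSWORTH and then BYRON leaves both in the chain at index 6. A collision therefore costs one extra step along a list rather than a further probe into the table.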
7.
void init(void)
{
    for ( int i = 0; i < TABLESIZE; i++ )
        tbl[i].occupied = false;           // every slot starts empty
    theSize = TABLESIZE;
    itemcount = 0;
}

void add( const String& key, const String& data )
{
    // for best efficiency, the number of occupied slots should be <=
    // 80% of table size
    if ( itemcount > theSize * 8 / 10 )
    {
        resize( );
    }
    int hash = key.hashvalue();            // key must support a hashvalue function
    int step = hash % (theSize - 2) + 1;   // step size for collision resolution
    hash %= theSize;                       // hash mod table size
    int numprobes = 1;                     // to count the number of probes
    // look for an unoccupied slot
    bool foundslot = ( !tbl[hash].occupied );
    // loop not entered if unoccupied slot found first time
    while ( !foundslot && (numprobes < theSize) )   // second cond is belt & braces
    {
        hash = ( hash + step ) % theSize;
        foundslot = ( !tbl[hash].occupied );
        numprobes++;
    }
    assert( foundslot );                   // should always be true
    tbl[hash].Key = key;                   // store the key
    tbl[hash].Data = data;                 // and the associated data
    tbl[hash].occupied = true;             // slot is now occupied
    itemcount++;                           // increment count of items
}
8.
[2] do          [3] end         [4] else        [5] case
[6] downto      [7] goto        [8] to          [9] otherwise
[10] type       [11] while      [12] const      [13] div
[14] and        [15] set        [16] or         [17] of
[18] mod        [19] file       [20] record     [21] packed
[22] not        [23] then       [24] procedure  [25] with
[26] repeat     [27] var        [28] in         [29] array
[30] if         [31] nil        [32] for        [33] begin
[34] until      [35] label      [36] function   [37] program
Libraries
1. The ctype library
This is a 'C' library of functions that operate on characters. They include functions to test whether a char is a letter, a digit, punctuation etc. and also to carry out case conversion. The functions available from ctype.h are:
    int isalnum(int c);
    int isalpha(int c);
    int isascii(int c);
    int toascii(int c);
    int iscntrl(int c);
    int isdigit(int c);
    int isgraph(int c);
    int islower(int c);
    int isprint(int c);
    int ispunct(int c);
    int isspace(int c);
    int isupper(int c);
    int isxdigit(int c);
    int tolower(int c);
    int toupper(int c);
The use of int instead of char in the return and argument types is historical. For the is.. functions, the return type can be understood to be boolean. In all cases the argument type can be read as type char. Help on each of these functions is provided from the RHIDE menu Help.libc reference.functional categories.ctype.
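A short sketch of how these functions are typically called is given below. The CharCounts structure and the classify and toUpperCopy functions are hypothetical names. Each character is converted to unsigned char before being passed to a ctype function, which is the usual precaution on systems where plain char is signed.

```cpp
#include <cassert>
#include <cctype>
#include <string>

struct CharCounts {
    int letters;
    int digits;
    int spaces;
    int others;
};

// Classify every character of the text using the is.. functions.
CharCounts classify(const std::string& text) {
    CharCounts counts = {0, 0, 0, 0};
    for (char raw : text) {
        unsigned char ch = static_cast<unsigned char>(raw);
        if (std::isalpha(ch))
            ++counts.letters;
        else if (std::isdigit(ch))
            ++counts.digits;
        else if (std::isspace(ch))
            ++counts.spaces;
        else
            ++counts.others;
    }
    // every character falls into exactly one category
    assert(counts.letters + counts.digits + counts.spaces + counts.others
           == static_cast<int>(text.size()));
    return counts;
}

// Return a copy of the text with lower-case letters converted to upper case.
std::string toUpperCopy(std::string text) {
    for (std::string::size_type i = 0; i < text.size(); ++i)
        text[i] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(text[i])));
    return text;
}
```

The is.. functions return non-zero for true rather than exactly 1, so their results are best used directly in a condition, as here, rather than compared with true.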
2. The maths library
These are to be found in math.h. To use them you need to #include <cmath> or #include <math.h>. The functions and constants to be found are:
    double acos(double x);
    double asin(double x);
    double atan(double x);
    double atan2(double y, double x);
    double ceil(double x);
    double cos(double x);
    double cosh(double x);
    double exp(double x);
    double fabs(double x);
    double floor(double x);
    double fmod(double x, double y);
    double frexp(double x, int *pexp);
    double ldexp(double x, int _exp);
    double log(double y);
    double log10(double x);
    double modf(double x, double *pint);
    double pow(double x, double y);
    double sin(double x);
    double sinh(double x);
    double sqrt(double x);
    double tan(double x);
    double tanh(double x);
    double acosh(double a);
    double asinh(double a);
    double atanh(double a);
    double hypot(double x, double y);
    double log2(double x);
    long double modfl(long double x, long double *pint);
    double pow10(double x);
    double pow2(double x);

    #define M_E         2.7182818284590452354
    #define M_LOG2E     1.4426950408889634074
    #define M_LOG10E    0.43429448190325182765
    #define M_LN2       0.69314718055994530942
    #define M_LN10      2.30258509299404568402
    #define M_PI        3.14159265358979323846
    #define M_PI_2      1.57079632679489661923
    #define M_PI_4      0.78539816339744830962
    #define M_1_PI      0.31830988618379067154
    #define M_2_PI      0.63661977236758134308
    #define M_2_SQRTPI  1.12837916709551257390
    #define M_SQRT2     1.41421356237309504880
    #define M_SQRT1_2   0.70710678118654752440
    #define PI          M_PI
    #define PI2         M_PI_2
The usage of any of these functions can be found by running the info program from the DOS command line. Move the cursor to * libc.a: (libc.inf). The Standard C Library Reference, press Enter, and choose the menu options Functional Categories and math functions. Press Q to exit the info program.
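A few of these functions in use are sketched below. The wrapper names (hypotenuse, toDegrees, fractionalPart, logBase) are hypothetical. The value of pi is obtained from acos(-1.0) rather than M_PI, on the assumption that the M_ constants may not be available from every compiler's math.h.

```cpp
#include <cassert>
#include <cmath>

// Length of the hypotenuse of a right-angled triangle with sides a and b.
double hypotenuse(double a, double b) {
    return std::sqrt(a * a + b * b);
}

// Convert an angle from radians to degrees.
double toDegrees(double radians) {
    const double pi = std::acos(-1.0);
    return radians * 180.0 / pi;
}

// Fractional part of x; modf stores the whole part through its pointer.
double fractionalPart(double x) {
    double wholePart = 0.0;
    return std::modf(x, &wholePart);
}

// Logarithm of x to an arbitrary base, built from the natural logarithm.
double logBase(double x, double base) {
    assert(x > 0.0 && base > 0.0 && base != 1.0);
    return std::log(x) / std::log(base);
}
```

For example, toDegrees(std::atan2(1.0, 1.0)) evaluates to 45 to within rounding error, and logBase(1048576.0, 2.0) to 20.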
3. The standard library
This requires the inclusion of cstdlib or stdlib.h. It is a miscellaneous collection of functions for such operations as converting strings to numeric types, sorting and searching, exiting or aborting a program, and executing DOS commands.
    void           abort(void);
    int            abs(int _i);
    int            atexit(void (*_func)(void));
    double         atof(const char *_s);
    int            atoi(const char *_s);
    long           atol(const char *_s);
    void *         bsearch(const void *_key, const void *_base, size_t _nelem, size_t _size,
                           int (*_cmp)(const void *_ck, const void *_ce));
    div_t          div(int _numer, int _denom);
    void           exit(int _status) __attribute__((noreturn));
    char *         getenv(const char *_name);
    long           labs(long _i);
    ldiv_t         ldiv(long _numer, long _denom);
    void           qsort(void *_base, size_t _nelem, size_t _size,
                         int (*_cmp)(const void *_e1, const void *_e2));
    int            rand(void);
    void           srand(unsigned _seed);
    double         strtod(const char *_s, char **_endptr);
    long           strtol(const char *_s, char **_endptr, int _base);
    unsigned long  strtoul(const char *_s, char **_endptr, int _base);
    int            system(const char *_s);
Some functions in the standard library have been omitted from the above list, either because they are 'C' functions that have a better counterpart in C++ or because they refer to the wide char type, which is not covered on this course. Help on these functions can be obtained from within RHIDE by selecting Help.libc reference.alphabetical list, or by entering info at a DOS prompt, moving the cursor to * libc.a: (libc). The Standard C Library Reference and pressing Enter, then Alphabetical list.
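The sorting, searching and conversion functions might be used as sketched below. The names compareInts, sortInts, containsSorted and parseLong are hypothetical wrappers; the library calls themselves (qsort, bsearch, strtol) are those listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Comparison callback of the form that qsort and bsearch expect.
int compareInts(const void* a, const void* b) {
    int x = *static_cast<const int*>(a);
    int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);       // negative, zero or positive
}

// Sort an array of ints into ascending order.
void sortInts(int* values, std::size_t count) {
    std::qsort(values, count, sizeof(int), compareInts);
}

// Binary search of an array that is already sorted in ascending order.
bool containsSorted(const int* values, std::size_t count, int key) {
    return std::bsearch(&key, values, count, sizeof(int), compareInts) != nullptr;
}

// Convert a decimal string to long. Returns false, leaving result
// untouched, unless the whole string is a number.
bool parseLong(const char* text, long& result) {
    assert(text != nullptr);
    char* end = nullptr;
    long value = std::strtol(text, &end, 10);
    if (end == text || *end != '\0')
        return false;               // nothing converted, or trailing characters
    result = value;
    return true;
}
```

Unlike atoi, strtol reports where conversion stopped through its second argument, which is what allows parseLong to reject input such as "19x".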
Bibliography
Title                                             Author                              Publisher
C++ From the Beginning                            Skansholm J                         Addison-Wesley
C++ for Engineers                                 Bramer B & Bramer S                 Arnold
Instant C++ Programming                           Wilks Ian                           Wrox
C++ Primer 3rd Edition                            Lippman Stanley B                   Addison-Wesley
The C++ Programming Language 3rd Edition          Stroustrup Bjarne                   Addison-Wesley
Object-Oriented Programming using C++             Romanovskaya, Shapetko & Svitovsky  Wrox
Software Engineering 4th Edition                  Sommerville I                       Addison-Wesley
Software Engineering - A Practitioner's Approach  Pressman R S                        McGraw-Hill
Algorithms + Data Structures = Programs           Wirth N                             Prentice Hall
Classic Data Structures in C++                    Budd Timothy A                      Addison-Wesley