
LUND INSTITUTE OF TECHNOLOGY Department of Computer Science

C++ Programming 2011/12

Laboratory Exercises, C++ Programming


General information: The course has four compulsory laboratory exercises.

You are to work in groups of two people. You sign up for the labs at sam.cs.lth.se/Labs (see the course plan for instructions).

The labs are mostly homework. Before each lab session, you must have done all the assignments (A1, A2, ...) in the lab, written and tested the programs, and so on. Reasonable attempts at solutions count; the lab assistant is the judge of what's reasonable. Contact a teacher if you have problems solving the assignments.

You will not be allowed to participate in the lab if you haven't done all the assignments.

Smaller problems with the assignments, e.g., details that do not function correctly, can be solved with the help of the lab assistant during the lab session.

Extra labs are organized only for students who cannot attend a lab because of illness. Notify Per Holm (Per.Holm@cs.lth.se) if you fall ill, before the lab.

The labs are about:
1. Basic C++, compiling, linking.
2. The GNU make and gdb tools, valgrind.
3. C++ strings and streams.
4. Programming with the STL, operator overloading.

Practical information: You will use many half-written "program skeletons" during the lab. You must download the necessary files from the course homepage before you start working on the lab assignments. The lab files are in separate directories lab[1-4] and are available in gzipped tar format. Download the tar file and unpack it like this:

    tar xzf lab1.tar.gz

This will create a directory lab1 in the current directory.

General sources of information about C++ and the GNU tools:
    http://www.cplusplus.com
    http://www.cppreference.com
    Info in Emacs (Ctrl-h i)

The GNU Compiler Collection and C++

1
Objective: to demonstrate basic GCC usage for C++ compiling and linking.

Read:
    Book: basic C++, variables and types including pointers, expressions, statements, functions, simple classes, namespaces, ifstream, ofstream.
    Manpages for gcc, g++, and ld.
    An Introduction to GCC, by Brian Gough, http://www.network-theory.co.uk/docs/gccintro/. Treats the same material as this lab, and more. Also available in book form.

Introduction

The GNU Compiler Collection, GCC, is a suite of native compilers for C, C++, Objective C, Fortran 77, Java and Ada. Front-ends for other languages, such as Fortran 95, Pascal, Modula-2 and Cobol are in experimental development stages. The list of supported target platforms is impressive: GCC has been ported to all kinds of Unix systems, as well as Microsoft Windows and special purpose embedded systems. GCC is free software, released under the GNU GPL.

Each compiler in GCC contains six programs. Only two or three of the programs are language specific. For C++, the programs are:

Driver: the "engine" that drives the whole set of compiler tools. For C++ programs you invoke the driver with g++ (use gcc for C, g77 for Fortran, gcj for Java, etc.). The driver invokes the other programs one by one, passing the output of each program as input to the next.

Preprocessor: normally called cpp. It takes a C++ source file and handles preprocessor directives (#include files, #define macros, conditional compilation with #ifdef, etc.).

Compiler: the C++ compiler, cc1plus. This is the actual compiler that translates the input file into assembly language.

Optimizer: sometimes a separate module, sometimes integrated in the compiler module. It handles optimization on a language-independent code representation.

Assembler: normally called as. It translates the assembly code into machine code, which is stored in object files.

Linker-Loader: called ld, collects object files into an executable file.

The assembler and the linker are not actually parts of GCC. They may be proprietary operating system tools or free equivalents from the GNU binutils package.

A C++ source code file is recognized by its extension. We will use .cc, which is the recommended extension. Other extensions, such as .cxx, .cpp, .C, are sometimes used.

In C++ (and in C) declarations are collected in header files with the extension .h.
To distinguish C++ headers from C headers other extensions are sometimes used, such as .hh, .hxx, .hpp, .H. We will use the usual .h extension.

The C++ standard library is declared in headers without an extension. These are, for example, iostream, vector, string. Older C++ systems use pre-standard library headers, named iostream.h, vector.h, string.h. Do not use the old headers in new programs.

A C++ program may not use any identifier that has not been previously declared in the same translation unit. For example, suppose that main uses a class MyClass. The program code can be organized in several ways, but the following is always used (apart from toy examples):


Define the class in a file myclass.h:

    #ifndef MYCLASS_H // include guard
    #define MYCLASS_H
    // #include necessary headers here
    class MyClass {
    public:
        MyClass(int x);
        // ...
    private:
        // ...
    };
    #endif

Define the member functions in a file myclass.cc:

    #include "myclass.h"
    // #include other necessary headers
    MyClass::MyClass(int x) { ... }
    // ...

Define the main function in a file test.cc (the file name is arbitrary):

    #include "myclass.h"
    int main() {
        MyClass m(5);
        // ...
    }

The include guard is necessary to prevent multiple definitions of names. Do not write function definitions in a header (except for inline functions and template functions).

The g++ command line looks like this:

    g++ [options] [-o outfile] infile1 [infile2 ...]

All the files in a program can be compiled and linked with one command (the -o option specifies the name of the executable file; if this is omitted the executable is named a.out):

    g++ -o test test.cc myclass.cc

To execute the program, just enter its name:

    ./test

However, it is more common that files are compiled separately and then linked:

    g++ -c myclass.cc
    g++ -c test.cc
    g++ -o test test.o myclass.o

The -c option directs the driver to stop before the linking phase and produce an object file, named as the source file but with the extension .o instead of .cc.

The driver can be interrupted also at other stages, using the -S or -E options. The -S option stops the driver after assembly code has been generated and produces an assembly code file named file.s. The -E option stops the driver after preprocessing and prints the result from this first stage on standard output.

A1. Write a "Hello, world!" program in a file hello.cc, compile and test it.

A2. Generate preprocessed output in hello.ii, assembly code output in hello.s, and object code in hello.o. Study the files:

    hello.ii: this is just to give you an impression of the size of the <iostream> header. You will find your program at the end of the file.
    hello.s: your code starts at the label main.
    hello.o: since this is a binary file there is not much to look at. The command nm hello.o (study the manpage) prints the file's symbol table (entry points in the program and the names of the functions that are called by the program and must be found by the linker).

Options and messages

There are many options to the g++ command that we didn't mention in the previous section. In the future, we will require that your source files compile correctly using the following command line:

    g++ -c -pipe -O2 -Wall -W -ansi -pedantic-errors -Wmissing-braces \
        -Wparentheses -Wold-style-cast file.cc

Short explanations (you can read more about these and other options in the gcc and g++ manpages):

    -c                  just produce object code, do not link
    -pipe               use pipes instead of temporary files for communication between tools (faster)
    -O2                 optimize the object code on "level 2"
    -Wall               print most warnings
    -W                  print extra warnings
    -ansi               follow standard C++ syntax rules
    -pedantic-errors    treat "serious" warnings as errors
    -Wmissing-braces    warn for missing braces {} in some cases
    -Wparentheses       warn for missing parentheses () in some cases
    -Wold-style-cast    warn for old-style casts, e.g., (int) instead of static_cast<int>

Do not disregard warning messages. Even though the compiler chooses to "only" issue warnings instead of errors, your program is erroneous or at least questionable. Another option, -Werror, turns all warnings into errors, thus forcing you to correct your program and remove all warnings before object code is produced. However, this makes it impossible to test a "half-written" program so we will normally not use it.

Some of the warning messages are produced by the optimizer, and will therefore not be output if the -O2 flag is not used. But you must be aware that optimization takes time, and on a slow machine you may wish to remove this flag during development to save compilation time. Some platforms define higher optimization levels, -O3, -O4, etc. You should not use these levels, unless you know very well what their implications are. Some of the optimizations that are performed at these levels are very aggressive and may result in a faster but much larger program.
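As an illustration of the kind of code these options are meant to catch, here is a small sketch. The function names average and same are made up for this example and are not part of the lab files; the comments describe the warnings one would expect from -Wold-style-cast and -Wparentheses if the code were written the "old" way.

```cpp
// Hypothetical helpers, used only to illustrate the warning options.

// With -Wold-style-cast, a C-style cast such as (double) sum is reported.
// static_cast states the intent explicitly and compiles without warning.
double average(int sum, int count) {
    return static_cast<double>(sum) / count;
}

// With -Wparentheses (enabled by -Wall), writing "if (a = b)" is reported,
// since an assignment used as a condition is usually a mistyped comparison.
bool same(int a, int b) {
    if (a == b) {
        return true;
    }
    return false;
}
```

Compiling a variant of this file that uses (double) sum or if (a = b) with the command line above is a quick way to see what the corresponding messages look like.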


It is important that you become used to reading and understanding the GCC error messages. The messages are sometimes long and may be difficult to understand, especially when the errors involve the standard STL classes (or any other complex template classes).

Introduction to make, and a list example

You have to type a lot in order to compile and link C++ programs: the command lines are long, and it is easy to forget an option or two. You also have to remember to recompile all files that depend on a file that you have modified.

There are tools that make it easier to compile and link, "build", programs. These may be integrated development environments (Eclipse, Visual Studio, ...) or separate command line tools. In Unix, make is the most important tool. We will explain make in detail in lab 2. For now, you only have to know this:

    make reads a Makefile when it is invoked,
    the makefile contains a description of dependencies between files (which files must be recompiled/relinked if a file is updated),
    the makefile also contains a description of how to perform the compilation/linking.

As an example, we take the program from assignment A3 (see below). There, two files (list.cc and ltest.cc) must be compiled and the program linked. Instead of typing the command lines, you just enter the command make. Make reads the makefile and executes the necessary commands.

The makefile looks like this:

    # Define the compiler options
    CXXFLAGS = -pipe -O2 -Wall -W -ansi -pedantic-errors
    CXXFLAGS += -Wmissing-braces -Wparentheses -Wold-style-cast

    # Define what to do when make is executed without arguments.
    all: ltest

    # The following rule means "if ltest is older than ltest.o or list.o,
    # then link ltest".
    ltest: ltest.o list.o
            g++ -o ltest ltest.o list.o

    # Define the rules to create the object files.
    ltest.o: ltest.cc list.h
            g++ -c $(CXXFLAGS) ltest.cc
    list.o: list.cc list.h
            g++ -c $(CXXFLAGS) list.cc

You may add rules for other programs to the makefile. All "action lines" must start with a tab, not eight spaces.

A3. The class List describes a linked list of integer numbers. [1] The numbers are stored in nodes. A node has a pointer to the next node (0 in the last node). Before the data nodes there is an empty node, a "head". An empty list contains only the head node.
The only purpose of the head node is to make it easier to delete an element at the front of the list.
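To see why a head node simplifies deletion, consider the following minimal sketch. It is not the lab's List class: the names Node, push_front and remove_first_equal are assumptions chosen for this illustration, and the list is manipulated through free functions to keep the example short.

```cpp
// A minimal sketch of a singly linked list with a head (sentinel) node.
// Illustrative only; this is not the interface of the lab's List class.
struct Node {
    long value;
    Node* next;
    Node(long v = 0, Node* n = 0) : value(v), next(n) {}
};

// Insert d directly after the head node, i.e. as the first data element.
void push_front(Node* head, long d) {
    head->next = new Node(d, head->next);
}

// Remove the first node whose value equals d, if there is one.
// Every data node, including the first, has a predecessor (the head),
// so deleting at the front needs no special case.
void remove_first_equal(Node* head, long d) {
    Node* pre = head;
    while (pre->next != 0 && pre->next->value != d) {
        pre = pre->next;
    }
    if (pre->next != 0) {
        Node* victim = pre->next;
        pre->next = victim->next;
        delete victim;
    }
}
```

Without the head node, the same function would have to treat "the node to delete is the first one" separately, because then the pointer to update is the list's own start pointer rather than some node's next member.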

[1] In practice, you would never write your own list class. There are several alternatives in the standard library.


    namespace cpp_lab1 {
        /* List is a list of long integers */
        class List {
        public:
            /* create an empty list */
            List();

            /* destroy this list */
            ~List();

            /* insert d into this list as the first element */
            void insert(long d);

            /* remove the first element less than/equal to/greater than d,
               depending on the value of df. Do nothing if there is no value
               to remove. The public constants may be accessed with
               List::LESS, List::EQUAL, List::GREATER outside the class */
            enum DeleteFlag { LESS, EQUAL, GREATER };
            void remove(long d, DeleteFlag df = EQUAL);

            /* returns the size of the list */
            int size() const;

            /* returns true if the list is empty */
            bool empty() const;

            /* returns the value of the largest number in the list */
            long largest() const;

            /* print the contents of the list (for debugging) */
            void debugPrint() const;

        private:
            /* a list node */
            struct Node {
                long value; // the node value
                Node* next; // pointer to the next node, 0 in the last node
                Node(long value = 0, Node* next = 0);
            };

            Node* head; // the pointer to the list head, which contains
                        // a pointer to the first list node

            /* forbid copying of lists */
            List(const List&);
            List& operator=(const List&);
        };
    }

Notes: Node is a struct, i.e., a class where the members are public by default. This is not dangerous, since Node is private to the class. The copy constructor and assignment operator are private, so you cannot copy lists.

The file list.h contains the class definition and list.cc contains a skeleton of the class implementation. Complete the implementation in accordance with the specification. Also implement, in a file ltest.cc, a test program that checks that your List implementation is correct. Be careful to check exceptional cases, such as removing the first or the last element in the list.

Type make to build the program. Note that you will get warnings about unused parameters when the implementation skeleton is compiled. These warnings will disappear


when you implement the functions.

Try to write code with testing and debugging in mind. The skeleton file list.cc includes <cassert>. This makes it possible to introduce assertions on state invariants. The file also contains a definition of a debug macro, TRACE, which writes messages to the clog output stream. Assertions as well as TRACE statements are removed from the code when NDEBUG is defined. [2]

A4. Implement a class Coding with two static methods:

    /* For any character c, encode(c) is a character different from c */
    static unsigned char encode(unsigned char c);

    /* For any character c, decode(encode(c)) == c */
    static unsigned char decode(unsigned char c);

Use a simple method for the coding and decoding. Then write a program, encode, that reads a text file [3], encodes it, and writes the encoded text to another file. The command line:

    ./encode file

should run the program, encode file, and write the output to file.enc.

Write another program, decode, that reads an encoded file, decodes it, and writes the decoded text to another file. The command line should be similar to that of the encode program. Add rules to the makefile for building the programs.

Test your programs and check that a file that is first encoded and then decoded is identical to the original.

Note: the programs will work also for files that are UTF-8 encoded. In this encoding, national Swedish characters are encoded in two bytes, and the encode and decode functions will be called twice for each such character.

Object Code Libraries

A lot of software is shipped in the form of libraries, e.g., class packages. In order to use a library, a developer does not need the source code, only the object files and the headers. Object file libraries may contain thousands of files and cannot reasonably be shipped as separate files. Instead, the files are collected into library files that are directly usable by the linker.

4.1 Static Libraries

The simplest kind of library is a static library. The linker treats the object files in a static library in the same way as other object files, i.e., all code is linked into the executable files, which as a result may grow very large. In Unix, a static library is an archive file, libname.a. In addition to the object files, an archive contains an index of the symbols that are defined in the object files.

A collection of object files f1.o, f2.o, f3.o, ..., are collected into a library using the ar command:

    ar crv libfoo.a f1.o f2.o f3.o ...
[2] There are two ways to define a global macro like NDEBUG. It can either be specified in the source file:

    #define NDEBUG

or be given on the compiler command line, using the -D option:

    g++ -c -DNDEBUG $(OTHER_CXXFLAGS) list.cc
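The effect of NDEBUG on assertions and trace output can be sketched as follows. The TRACE definition shown here is an assumption made for this example; the macro in the lab skeleton may be written differently. The function largest_of is likewise hypothetical.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical trace macro; the TRACE macro in list.cc may differ.
// When NDEBUG is defined, TRACE(...) expands to nothing.
#ifdef NDEBUG
#define TRACE(msg)
#else
#define TRACE(msg) (std::clog << "TRACE: " << msg << std::endl)
#endif

// Returns the largest of the n values in the array (n must be at least 1).
long largest_of(const long* values, int n) {
    assert(values != 0 && n > 0);   // compiled away when NDEBUG is defined
    TRACE("largest_of called with n = " << n);
    long best = values[0];
    for (int i = 1; i < n; ++i) {
        if (values[i] > best) {
            best = values[i];
        }
    }
    return best;
}
```

Built without -DNDEBUG, a call with n == 0 aborts with an assertion message and each call writes a trace line to clog; built with -DNDEBUG, both the check and the trace output disappear from the object code.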
[3] Note that you cannot use while (infile >> ch) to read all characters in infile, since >> skips whitespace. Use infile.get(ch) instead. Output with outfile << ch should be ok, but outfile.put(ch) looks more symmetric.
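The difference between get and >> can be sketched like this. The function names copy_chars and read_skipping_whitespace are made up for the example; they work on any istream and ostream, so they can be tried with string streams as well as with ifstream and ofstream objects.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Copies every character, including spaces and newlines, from in to out.
// Returns the number of characters copied.
long copy_chars(std::istream& in, std::ostream& out) {
    long count = 0;
    char ch;
    while (in.get(ch)) {
        out.put(ch);
        ++count;
    }
    return count;
}

// The same loop written with >>, shown only for comparison:
// whitespace characters are skipped and therefore lost.
std::string read_skipping_whitespace(std::istream& in) {
    std::string result;
    char ch;
    while (in >> ch) {
        result += ch;
    }
    return result;
}
```

For the five-character input "a b", a newline, and "c", copy_chars reproduces the input exactly, whereas read_skipping_whitespace returns only the three letters.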


(Some Unix versions require that you also create the symbol table with ranlib libfoo.a.) In order to link a program main.o with the object files obj1.o, obj2.o and with object files from the library libfoo.a, you use the following command line:

    g++ -o main main.o obj1.o obj2.o -L. -lfoo

The linker always searches for libraries in certain system directories. The -L. option makes the linker search also in the current directory. [4] The library name (without lib and .a) is given after -l.

A5. Collect the object files generated in the other assignments, except those containing main functions, in a library liblab1.a. Then link the programs (ltest, encode, decode) again, using the library.

4.2 Shared Libraries

Since most programs use a large amount of code from libraries, executable files can grow very large. Instead of linking library code into each executable that needs it, the code can be loaded at runtime. The object files should then be in shared libraries. When linking programs with shared libraries, the files from the library are not actually linked into the executable. Instead a "pointer" is established from the program to the library. The obvious advantage is that common code does not need to be reproduced in all programs.

In Unix, shared library files are named libname.so[.x.y.z] (.so for shared objects, .x.y.z is an optional version number). The dynamic linker, which loads the shared libraries when a program is started, uses the environment variable LD_LIBRARY_PATH as a search path for shared libraries. In Microsoft Windows, shared libraries are known as DLL files (dynamic-link libraries).

A6. (Advanced, optional) Create a library as in the previous exercise, but make it shared instead of static. Then link the executables, with different names, using the shared library. Make sure they run correctly. Compare the sizes of the dynamically linked executables to the statically linked (there will not be a big size difference, since the size of the library code is small). Use the command ldd (list dynamic dependencies) to inspect the linkage of your programs.

Hints: shared libraries are created by the linker, not the ar archiver. Use the gcc manpage and the ld manpage (and, if needed, other manpages) to explain the following sequence of operations:

    g++ -fPIC -c *.cc
    g++ -shared -Wl,-soname,liblab1.so.1 -o liblab1.so.1.0 *.o
    ln -s liblab1.so.1.0 liblab1.so.1
    ln -s liblab1.so.1 liblab1.so

You then link with -L. -llab1 as usual. The linker merely checks that all referenced symbols are in the shared library. Before you execute the program, you must define LD_LIBRARY_PATH so it includes the current directory (export is for zsh, setenv for tcsh):

    export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH
    setenv LD_LIBRARY_PATH .:$LD_LIBRARY_PATH
[4] You may have several -L and -l options on a command line. Example, where the current directory and the directory /usr/local/mylib are searched for the libraries libfoo1.a and libfoo2.a:

    g++ -o main main.o obj1.o obj2.o -L. -L/usr/local/mylib -lfoo1 -lfoo2


Final Remarks

If a source file includes a header that is not in the current directory, the path to the directory containing the header can be given to the compiler using the -I option (similar to -L):

    g++ -c -Ipath file.cc

To make GCC print all it is doing during compilation (what commands it is calling and a lot of other information), the option -v (for verbose) can be added to the command line.

Tools for Practical C++ Development


Objective: to introduce a set of tools which are often used to facilitate C++ development in a Unix environment. The make tool (GNU Make) is used for compilation and linking, gdb (the GNU debugger) for controlled execution and testing, valgrind (not GNU) for finding memory-related errors.

Read:
    GNU Make, http://www.gnu.org/software/make/manual/
    GDB User Manual, http://www.gnu.org/software/gdb/documentation/
    valgrind Quick Start, http://www.valgrind.org/docs/manual/QuickStart.html
    Manpages for the tools used in the lab.

The manuals have introductory tutorials that are recommended as starting points. The manuals are very well written and you should consult them if you want to learn more than what is treated during the lab.

1 GNU Make

1.1 Introduction

As you saw in lab 1, make is a good tool: it sees to it that only files that have been modified are compiled (and files that depend on modified files). To do this, it compares the modification times [5] of source files and object files.

We will use the GNU version of make. There are other make programs, which are almost but not wholly compatible with GNU make. In a Linux or Darwin system, the command make usually refers to GNU make. You can check what variant you have with the command make --version. It should give output like the following:

    make --version
    GNU Make 3.81
    Copyright (C) 2006 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.
    ...

If make turns out to be something else than GNU make, try the command gmake instead of make; this should refer to GNU make.

Make uses a description of the source project in order to figure out how to generate executables. The description is written in a file, usually called Makefile. When make is executed, it reads the makefile and executes the compilation and linking (and maybe other) commands that are necessary to build the project.

The most important construct in a makefile is a rule. A rule specifies how a file (the target), which is to be generated, depends on other files (the prerequisites). For instance, the following rule specifies that the file ltest.o depends on the files ltest.cc and list.h:
[5] In a network environment with a distributed file system, there may occur problems with unsynchronized clocks between a client and the file server. Make will report errors like:

    make: ***Warning: File "foo" has modification time in the future.
    make: warning: Clock skew detected. Your build may be incomplete.

Unfortunately, there is not much you can do about this problem, if you don't have root privileges on the computer. Inform the system administrators about the problem and switch to another computer if possible. If no other computer is available, temporarily work on a local file system where you have write access, such as /tmp.


    ltest.o: ltest.cc list.h

On the line following the dependency specification you write a shell command that generates the target. This is called an explicit rule. Example (with a short command line without options):

    ltest.o: ltest.cc list.h
            g++ -c ltest.cc

Make has a very unusual syntax requirement: the shell command must be preceded by a tab character; spaces are not accepted.

An implicit rule does not give an explicit command to generate the target. Instead, it relies on make's large collection of default rules. For instance, make knows how to generate .o files from most kinds of source files, for instance that g++ is used to generate .o files from .cc files. The actual implicit rule is:

    $(CXX) $(CPPFLAGS) $(CXXFLAGS) -c -o $@ $<

CXX, CPPFLAGS, and CXXFLAGS are variables that the user can define. The syntax $(VARIABLE) is used to evaluate a variable, returning its value. CXX is the name of the C++ compiler, CPPFLAGS are the options to the preprocessor, CXXFLAGS are the options to the compiler. $@ expands to the name of the target, $< expands to the first of the prerequisites.

We are now ready to write our first makefile, which will build the ltest program from lab 1. The value g++ for CXX is default, thus that definition is optional. CPPFLAGS and CXXFLAGS are empty by default. CXXFLAGS is almost always redefined.

    # Compiler and compiler options:
    CXX = g++
    CXXFLAGS = -pipe -O2 -Wall -W -ansi -pedantic-errors
    CXXFLAGS += -Wmissing-braces -Wparentheses -Wold-style-cast

    # Linking:
    ltest: ltest.o list.o
            $(CXX) -o $@ $^

    # Dependencies, the implicit rule .cc => .o is used:
    ltest.o: ltest.cc list.h
    list.o: list.cc list.h

$^ expands to the complete list of prerequisites. We will make improvements to this makefile later.

Suppose that none of the files ltest, ltest.o, list.o exists. Then, the following commands are executed when you run make (the long command lines have been wrapped):

    make ltest
    g++ -pipe -O2 -Wall -W -ansi -pedantic-errors -Wmissing-braces
        -Wparentheses -Wold-style-cast -c -o ltest.o ltest.cc
    g++ -pipe -O2 -Wall -W -ansi -pedantic-errors -Wmissing-braces
        -Wparentheses -Wold-style-cast -c -o list.o list.cc
    g++ -o ltest ltest.o list.o

If an error occurs in one of the commands, make will be aborted. If there, for instance, is an error in list.cc, the compilation of that file will be aborted and the program will not be linked. When you have corrected the error and run make again, it will discover that ltest.o is up to date and only remake list.o and ltest.

If you run make without a target name as parameter, make builds the first target it finds in the


makefile. Since ltest is the first target, make ltest and make are equivalent. By convention, the first target should be named all. Therefore, the first rules in the makefile should look like this:

    # Default target all, make everything
    all: ltest

    # Linking:
    ltest: ltest.o list.o
            $(CXX) -o $@ $^

Makefiles can also contain directives that control the behavior of make. For example, a makefile can include other files with the include directive.

A1. The file Makefile contains the makefile that has been used as an example. Experiment with make: copy the necessary source files (list.h, list.cc and ltest.cc) to the lab2 directory and run make. Run make again. Delete the executable program and run make again. Change one or more of the source files (it is sufficient to touch them) and see what happens. Run make ltest.o. Run make notarget. Read the manpage and try other options. Etc., etc.

1.2 Phony targets

Makefiles may contain targets that do not actually correspond to files. The all target in the previous section is an example. Now, suppose that a file all is created in the directory that contains the makefile. If that file is newer than the ltest file, a make invocation will do nothing but say make: Nothing to be done for 'all'., which is not the desired behavior. The solution is to specify the target all as a phony target, like this:

    .PHONY: all

Another common phony target is clean. Its purpose is to remove intermediate files, such as object files, and it has no prerequisites. It typically looks like this:

    .PHONY: clean
    clean:
            $(RM) $(OBJS)

The variable RM defaults to rm -f, where the option -f tells rm not to complain about non-existent files. With -f, rm does not fail just because a file is missing, so the clean target succeeds even when there is nothing to remove. A cleaning rule that removes also executable files and libraries could be called, e.g., cleaner or realclean.

All Unix source distributions contain one or more makefiles. A makefile should contain a phony target install, with all as its prerequisite. The make install command copies the programs and libraries built by the all rule to a system-wide location where they can be reached by other users. This location, which is called the installation prefix, is usually the directory /usr/local. Installation uses the GNU command install (or plain cp) to copy programs to $(PREFIX)/bin and libraries to $(PREFIX)/lib. Since only root has permission to write in /usr/local, you will have to use one of your own directories as prefix.

A2. Create the bin and lib directories. Add all, clean and install targets to your makefile and specify suitable phony targets. Also provide an uninstall target that removes the files that were installed by the install target.


1.3 Rules for Linking

In our example makefile, we used an explicit rule for linking. Actually, there is an implicit rule for linking, which looks (after some variable expansions) like this:

    $(CC) $(LDFLAGS) $^ $(LOADLIBES) $(LDLIBS) -o $@

LDFLAGS are options to the linker, such as -Ldirectory. LOADLIBES and LDLIBS [6] are variables intended to contain libraries, such as -llab1. So this is a good rule, except for one thing: it uses $(CC) to link, and CC is by default gcc, not g++. But if you change the definition of CC, the implicit rule works also for C++:

    # Define the linker
    CC = g++

1.4 Generating Prerequisites Automatically

While you're working with a project the prerequisites are often changed. New #include directives are added and others are removed. In order for make to have correct information of the dependencies, the makefile must be modified accordingly. The necessary modifications can be performed automatically by make itself.

A3. Read the make manual, section 4.14 "Generating Prerequisites Automatically", and implement such functionality in your makefile. Hint: Copy the %.d rule (it is not necessary that you understand everything in it) to your makefile and make suitable modifications. Do not forget to include the *.d files.

The first time you run make you will get a warning about .d files that don't exist. This is normal and not an error. Look at the .d files that are created and see that they contain the correct dependencies.

One useful make facility that we haven't mentioned earlier is wildcards: as an example, you can define the variable SRC as a list of all .cc files with SRC = $(wildcard *.cc). And from this you can get all .o files with OBJ = $(SRC:.cc=.o).

From now on, you must update the makefile when you add new programs, so that you can always issue the make command in the build directory to build everything. When you add a new program, you should only have to add a rule that defines the .o files that are necessary to link the program, and maybe add the program name to a list of executables. This is important: it will save you many hours of command typing.

(Optional) Collect all object files not containing main functions into a library, and link executables against that library. If you know how to create shared libraries, use a shared library libcxx.so, otherwise create a static archive libcxx.a.

1.5 Writing a Good Makefile

A makefile should always be written to run in sh (Bourne Shell), not in csh. Do not use special features from, e.g., bash, zsh, or ksh. The tools used in explicit rules should be accessed through make variables, so that the user can substitute alternatives to the defaults.

A makefile should also be written so that it doesn't have to be located in the same directory as the source code. You can achieve this by defining a SRCDIR variable and making all rules refer to the source code from that prefix. Even better is to use the variable VPATH, which specifies a search path for source files. VPATH is described in section 4.5 of the make manual.
[6] There doesn't seem to be any difference between LOADLIBES and LDLIBS: they always appear together and are concatenated. Use LDLIBS.


A4. (Optional, VPATH is difficult) Create a new directory, src, for C++ lab source code and copy the source files from lab 1 into it. In a parallel directory build, install the Makefile, and modify it so that you can build the programs (and possibly libraries), standing in the build directory. The all target, which should be default, should build your ltest, encode, and decode programs. No command should write to the src directory.

Debugging

Consider the following program (reciprocal.cc):

    #include <iostream>
    #include <cstdlib> // for atoi, see below

    double reciprocal(int i) {
        return 1.0 / i;
    }

    int main(int argc, char** argv) {
        int i = std::atoi(argv[1]); // atoi: "ascii-to-integer"
        double r = reciprocal(i);
        std::cout << "argv[1] = " << argv[1] << ", 1/argv[1] = " << r << std::endl;
    }

The program prints the reciprocal of the number given as the first argument on the command line. If you forget the argument when you execute the program, an error message is printed:

    ./reciprocal
    8231 segmentation fault ./reciprocal

This is the most common runtime error message in C and C++ programs. Unlike in Java, where the error message would be something like "ArrayIndexOutOfBoundsException at line 2 of main", you don't get any help at all from the runtime system. The error message is not informative ("segmentation fault" means that you have tried to use a pointer that points to unallocated memory) and you are not informed of the location of the error. So, you need help in locating the error (not in this program, but in larger programs).

One method is to insert print statements at strategic places in the program, but this is a slow process. A better method is to use a debugger, which is a program that controls the execution of a program, giving you the possibility to execute the program one line at a time, to insert breakpoints where execution stops, to print variable values, etc. Here we will present the GNU Debugger, GDB.

To start with, you must compile and link your program with the option -ggdb and without optimization (no -O2):

    g++ -c -ggdb reciprocal.cc     (lots of options omitted)
    g++ -o reciprocal -ggdb reciprocal.o

This causes debugging information to be inserted into the program (line number information, variable names, etc.). If you run the program again, the same error message is produced (naturally, the error is still there).
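The crash in this small program comes from handing argv[1] to atoi when no argument was given. Independently of the debugger session that follows, a guarded variant of the computation might look like the sketch below. The helper name reciprocal_from_args and its bool return convention are assumptions made for this example, not part of the lab code.

```cpp
#include <cstdlib>

// Hypothetical helper: computes 1/i for the first command line argument.
// Returns false, leaving result untouched, if the argument is missing
// or if it converts to 0 (atoi returns 0 for "0" and for non-numeric text).
bool reciprocal_from_args(int argc, char** argv, double& result) {
    if (argc < 2) {
        return false;               // no argument: argv[1] is a null pointer
    }
    int i = std::atoi(argv[1]);
    if (i == 0) {
        return false;               // avoid dividing by zero
    }
    result = 1.0 / i;
    return true;
}
```

A main function could call this helper and print a usage message when it returns false, instead of letting atoi dereference a null pointer.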
To run the program under control of the debugger, issue the gdb command with the program name as argument:

    gdb reciprocal
    GNU gdb ... 7.2
    Copyright (C) ...
    ...
    (gdb)


(gdb) is a command prompt. The first step is to run the program inside the debugger. Enter the command run and the program arguments. First run the program without arguments, like this:

    (gdb) run
    Starting program: /h/dd/c/.../reciprocal
    Program received signal SIGSEGV, Segmentation fault.
    *__GI_____strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=0, loc=0xb7f1c380) at strtol_l.c:298
    ...

The error occurred in the strtol_l_internal function, which is an internal function that atoi calls. Inspect the call stack with the where command:

    (gdb) where
    #0  *__GI_____strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=0, loc=0xb7f1c380) at strtol_l.c:298
    #1  0xb7e07fc6 in *__GI_strtol (nptr=0x0, endptr=0x0, base=10) at strtol.c:110
    #2  0xb7e051b1 in atoi (nptr=0x0) at atoi.c:28
    #3  0x080487b0 in main (argc=1, argv=0xbfa1be44) at reciprocal.cc:9

You see that the atoi function was called from main. Use the up command (three times) to go to the main function in the call stack:

    ...
    (gdb) up
    #3  0x080487b0 in main (argc=1, argv=0xbfa1be44) at reciprocal.cc:9
    9           int i = std::atoi(argv[1]); // atoi: "ascii-to-integer"

Note that GDB finds the source code for main, and it shows the line where the function call occurred. You can examine the value of a variable using the print command:

    (gdb) print argv[1]
    $1 = 0x0

This confirms the suspicion that the problem is indeed a null pointer passed to atoi. Set a breakpoint on the first line of main with the break command:

    (gdb) break main
    Breakpoint 1 at 0x80487a0: file reciprocal.cc, line 9.

Now run the program with an argument:

    (gdb) run 5
    The program being debugged has been started already.
    Start it from the beginning? (y or n) y
    Starting program: /h/dd/c/.../reciprocal 5
    Breakpoint 1, main (argc=2, argv=0xbff7aae4) at reciprocal.cc:9
    9           int i = std::atoi(argv[1]); // atoi: "ascii-to-integer"

The debugger has stopped at the breakpoint. Step over the call to atoi using the next command:


    (gdb) next
    10          double r = reciprocal(i);

If you want to see what is going on inside reciprocal, step into the function with step:

    (gdb) step
    reciprocal (i=5) at reciprocal.cc:5
    5           return 1.0 / i;

The next statement to be executed is the return statement. Check the value of i and then issue the continue command to execute to the next breakpoint (which doesn't exist, so the program terminates):

    (gdb) print i
    $2 = 5
    (gdb) continue
    Continuing.
    argv[1] = 5, 1/argv[1] = 0.2
    Program exited normally.
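The null argv[1] that the session above tracks down can be avoided by validating the command line before using it. The following is only a sketch of one way to do that: parse_int and read_argument are hypothetical helpers, not part of reciprocal.cc, and strtol is used instead of atoi because it can report errors.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper (not part of reciprocal.cc): convert text to an int.
// Returns false for a null pointer, an empty string, trailing garbage,
// or a value that does not fit in an int.
bool parse_int(const char* text, int& value) {
    if (text == 0 || *text == '\0') {
        return false;
    }
    char* end = 0;
    errno = 0;
    long n = std::strtol(text, &end, 10);
    if (*end != '\0' || errno == ERANGE || n < INT_MIN || n > INT_MAX) {
        return false;
    }
    value = static_cast<int>(n);
    return true;
}

// Hypothetical replacement for the first line of main: check that an
// argument exists before argv[1] is touched.
bool read_argument(int argc, char** argv, int& value) {
    return argc >= 2 && parse_int(argv[1], value);
}
```

In main, a failed read_argument call would typically lead to a usage message and a nonzero exit code instead of a segmentation fault.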

A5. Try the debugger by working through the example. Experiment with the commands (the table below gives short explanations of some useful commands). Commands may be abbreviated and you may use tab for command completion. Ctrl-a takes you to the beginning of the command line, Ctrl-e to the end, and so on. Use the help facility and the manual if you want to learn more.

    help [command]    Get help about gdb commands.
    run [args...]     Run the program with arguments specified.
    continue          Continue execution.
    next              Step to the next line over called functions.
    step              Step to the next line into called functions.
    where             Print the call stack.
    up                Go up to caller.
    down              Go down to callee.
    list [nbr]        List 10 lines around the current line or around line nbr
                      (the following lines if repeated).
    break func        Set a breakpoint on the first line of a function func.
    break nbr         Set a breakpoint at line nbr in the current file.
    catch [arg]       Break on events, for example exceptions that are thrown
                      or caught (catch throw, catch catch).
    info [arg]        Generic command for showing things about the program, for
                      example information about breakpoints (info break).
    delete [nbr]      Delete all breakpoints or breakpoint nbr.
    print expr        Print the value of the expression expr.
    display var       Print the value of variable var every time the program stops.
    undisplay [nbr]   Delete all displays or display nbr.
    set var = expr    Assign the value of expr to var.
    kill              Terminate the process being debugged.
    watch var         Set a watchpoint, i.e., watch all modifications of a variable.
                      This can be very slow but can be the best solution to find
                      bugs when random pointers are used to write in the wrong
                      memory location.
    whatis var|func   Print the type of a variable or function.
    source file       Read and execute the commands in file.
    make              Run the make command.

The debugging information that is written to the executable file can be removed using the strip program from binutils. See the manpage for more information.

You may wish to disable optimization (remove the -O2 option) while you are debugging a program. GCC allows you to use a debugger together with optimization but the optimization may produce surprising results: some variables that you have defined may not exist in the object code, flow of control may move where you did not expect it, some statements may not be executed because they compute constant results, etc.

To make it easier to switch between generating debugging versions and production versions of your programs, you can add something like this to your makefile (note that a makefile comment starts with # and that a comment on the same line as the assignment would leave trailing spaces in the value of DEBUG):

    # insert these lines somewhere at the top of the file
    # DEBUG is either true or false
    DEBUG = true

    # insert the following lines after the definitions of CXXFLAGS and LDFLAGS
    ifeq ($(DEBUG), true)
    CXXFLAGS += -ggdb
    LDFLAGS += -ggdb
    else
    CXXFLAGS += -O2
    endif

You can specify the value of DEBUG on the command line. A variable given on the command line overrides any definition in the makefile; the -e option additionally makes environment variables take precedence over definitions in the makefile:

    make -e DEBUG=true

Finding Memory-Related Errors

C++ is less programmer-friendly than Java. In Java, many common errors are caught by the compiler (use of uninitialized variables) or by the runtime system (addressing outside array bounds, dereferencing null pointers, etc.). In C++, errors of this kind are not caught; instead they result in erroneous results or traps during program execution. Furthermore, you get no information whatsoever about where in the program the error occurred. Since allocation of dynamic memory in C++ is manual, you also have a whole new class of errors (double delete, memory leaks).

Valgrind (http://www.valgrind.org) is a tool (available only under Linux and Mac OS X) that helps you to find memory-related errors at the precise locations at which they occur. It does this by emulating an x86 processor and supplying each data bit with information about the usage of the bit. This results in slower program execution, but this is more than compensated for by the reduced time spent in hunting bugs.

It is very easy to use valgrind. To find the error in the reciprocal program from the previous section, you just do this (compile and link with -ggdb and without optimization):

    valgrind --leak-check=yes ./reciprocal
    ==2938== Memcheck, a memory error detector
    ==2938== Copyright (C) 2002-2010, and GNU GPL'd, by Julian Seward et al.
    ==2938== ...
    ==2938== Invalid read of size 1
    ==2938==    at 0x41A602F: ____strtol_l_internal (strtol_l.c:298)
    ==2938==    by 0x41A5DE6: strtol (strtol.c:110)
    ==2938==    by 0x41A3160: atoi (atoi.c:28)
    ==2938==    by 0x80486B9: main (reciprocal.cc:9)
    ==2938==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
    ==2938== ...
    ==2938==
    ==2938== HEAP SUMMARY:
    ==2938==     in use at exit: 0 bytes in 0 blocks
    ==2938==   total heap usage: 0 allocs, 0 frees, 0 bytes allocated
    ==2938==
    ==2938== All heap blocks were freed -- no leaks are possible
    ==2938==
    ==2938== For counts of detected and suppressed errors, rerun with: -v
    ==2938== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 17 from 6)
    [1] 2938 segmentation fault valgrind --leak-check=yes ./reciprocal

When the error occurs, you get an error message and a stack trace (and a lot of other information). In this case, it is almost the same information as when you used gdb, but this is only because the error resulted in a segmentation violation. But most errors don't result in immediate traps:

    void zero(int* v, size_t n) {
        for (size_t i = 0; i < n; ++i)
            v[i] = 0;
    }

    int main() {
        int* v = new int[5];
        zero(v, 10); // oops, should have been 5
        // ... this error may result in a trap now or much later, or wrong results
    }
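For comparison with zero() above, the following sketch shows how the same mistake is reported immediately when the array is a std::vector and the elements are accessed with at(). The names zero_checked and overrun_detected are made up for this illustration; this complements valgrind and does not replace it.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Same idea as zero() above, but at() checks every index and throws
// std::out_of_range instead of silently writing outside the array.
void zero_checked(std::vector<int>& v, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        v.at(i) = 0;
    }
}

// Returns true if zeroing n elements of a vector with the given size
// was rejected as out of range.
bool overrun_detected(std::size_t size, std::size_t n) {
    std::vector<int> v(size, 1);
    try {
        zero_checked(v, n);
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

With a vector of 5 elements, asking for 10 elements to be zeroed throws at index 5, so the error surfaces at the faulty access rather than "now or much later".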

A6. The file vgtest.cc contains a test program with two functions with common programming errors. Write more functions to check other common errors. Note that valgrind cannot find addressing errors in stack-allocated arrays (like int v[10]; v[10] = 0). Run the program under valgrind, try to interpret the output.

Advice: always use valgrind to check your programs! Note that valgrind may give false error reports on C++ programs that use STL or C++ strings. See the valgrind FAQ, http://valgrind.org/docs/manual/faq.html#faq.reports

4 Makefiles and Topological Sorting

4.1 Introduction

Targets and prerequisites of a makefile form a graph where the vertices are files and an arc between two vertices corresponds to a dependency. When make is invoked, it computes an order in which the files are to be compiled so the dependencies are satisfied. This corresponds to a topological sort of the graph.

Consider a simplified makefile containing only implicit rules, i.e., lines reading target: prerequisites. You are to implement a C++ program topsort that reads such a file, whose name is given on the command line, and outputs the targets to standard output in an order where the dependencies are satisfied. If the makefile contains syntax errors or cyclic dependencies, the program should exit with an error message.

This is not a trivial task. The solution will be developed stepwise in the following sections.


4.2 Graph Representation

The first task is to choose a representation of the dependency graph. We will use a representation where each vertex has a list of its neighbors. This list is called an adjacency list. Consider the following makefile:

    D: A
    E: B D
    B: A C

The dependencies in the makefile form the following graph (the graph to the left, the vertices with their adjacency lists to the right):

[Figure: the dependency graph and the vertex list. Adjacency lists: A: B D; B: E; C: B; D: E; E: (empty).]

A graph has a list of Vertex objects (A, B, C, D, E, in the figure). A vertex is represented by an object of class Vertex, which has the attributes name (the label on the vertex) and adj (the adjacency list). In the figure, the adjacency list of vertex A contains B and D. The class definitions follow. First, class Vertex (the attribute color is used when traversing the graph, see section 4.3):

    class Vertex {
        friend class VertexList; // give VertexList access to private members
    private:
        /* create a vertex with name nm */
        Vertex(const std::string& nm);

        std::string name;  // name of the vertex
        VertexList adj;    // list of adjacent vertices
        enum Color { WHITE, GRAY, BLACK };
        Color color;       // used in the traversal algorithm
    };

In class VertexList, the adjacency list is implemented as a vector of pointers to other vertices (vptrs). The functions top_sort and dfs_visit are described in section 4.3.

    struct cyclic {}; // exception type, the graph is cyclic

    class VertexList {
    public:
        /* create an empty vertex list, destroy the list */
        VertexList();
        ~VertexList();

        /* insert a vertex with the label name in the list */
        void add_vertex(const std::string& name);

        /* insert an arc from the vertex from to the vertex to */
        void add_arc(const std::string& from, const std::string& to);

        /* return the vertex names in reverse topological order */
        std::stack<std::string> top_sort() throw(cyclic);

        /* print the vertex list (for debugging) */
        void debugPrint() const;
    private:
        void insert(Vertex* v); // insert the vertex pointer v in vptrs, if
                                // it's not already there
        Vertex* find(const std::string& name) const; // return a pointer to the
                                // vertex named name, 0 if not found
        void dfs_visit(Vertex* v, std::stack<std::string>& result) throw(cyclic);
                                // visit v in the traversal algorithm

        std::vector<Vertex*> vptrs; // vector of pointers to vertices

        VertexList(const VertexList&); // forbid copying
        VertexList& operator=(const VertexList&);
    };

Finally, the class Graph is merely a typedef for a VertexList:

    typedef VertexList Graph;

Notes:

- add_vertex in VertexList should create a new Vertex object and add it to the vertex list. It should do nothing if a vertex with that name already is present in the list.
- add_arc should insert to in from's adjacency list. It should start with inserting from and to in the vertex list; in that way arcs and vertices may be added in arbitrary order.
- find is used in add_vertex and in insert.
- The VertexList destructor is quite difficult to write correctly, since you have to be careful so no Vertex object is deleted more than once. It is easiest to start by removing pointers from the adjacency lists so the graph isn't circular, before deleting the objects. You may treat the destructor as optional, if you wish.

Advice: draw your own picture of a graph structure, with all objects, vptrs and pointers.

A7. The classes are in the files vertex.h, vertex.cc, vertexlist.h, vertexlist.cc, graph.h. Implement the classes, except top_sort and dfs_visit, and test using the program graph_test.cc. Comment out the lines in the function testGraph that check the top_sort algorithm.

4.3 Topological Sort

A topological ordering of the vertices of a graph is essentially obtained by a depth-first search from all vertices. (There are other algorithms for topological sorting. If you prefer another algorithm you are free to use that one instead.) The topological ordering is not necessarily unique. For instance, the vertices of the graph in the preceding section may be ordered like this:

    A C B D E    or    C A D B E    or    ...

The algorithm is described below, in pseudo-code. The algorithm produces the vertices in reverse topological order, which is why we use a stack for the result.


The algorithm should detect cycles in the graph. If that happens, your function should throw a cyclic exception. The cycle detection is not described in the algorithm; you must add that yourself.

    Topological-Sort(Graph G) {
        /* 1. initialize the graph and the result stack */
        for each vertex v in G
            v.color = WHITE;
        Stack result;
        /* 2. search from all unvisited vertices */
        for each vertex v in G
            if (v.color == WHITE)
                DFS-visit(v, result);
    }

    DFS-visit(Vertex v, Stack result) {
        /* 1. white vertex v is just discovered, color it gray */
        v.color = GRAY;
        /* 2. recursively visit all unvisited adjacent nodes */
        for each vertex w in v.adj
            if (w.color == WHITE)
                DFS-visit(w, result);
        /* 3. finish vertex v (color it black and output it) */
        v.color = BLACK;
        result.push(v.name);
    }
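As an illustration of the pseudo-code above, here is a sketch that runs the same three-color traversal on a plain map from vertex names to adjacent names. It deliberately does not use the lab's Vertex and VertexList classes: AdjacencyMap, dfs_visit_sketch and top_sort_sketch are hypothetical names, and std::runtime_error stands in for the cyclic exception. The cycle check shown (an arc leading to a GRAY vertex) is one possible way to add the detection that the pseudo-code leaves out.

```cpp
#include <cstddef>
#include <map>
#include <stack>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the lab's graph: vertex name -> adjacent names.
typedef std::map<std::string, std::vector<std::string> > AdjacencyMap;

enum Color { WHITE, GRAY, BLACK };

// Visit v. A GRAY neighbor is still on the current recursion path, so an
// arc leading to it closes a cycle.
void dfs_visit_sketch(const AdjacencyMap& g, const std::string& v,
                      std::map<std::string, Color>& color,
                      std::stack<std::string>& result) {
    color[v] = GRAY;
    AdjacencyMap::const_iterator it = g.find(v);
    if (it != g.end()) {
        const std::vector<std::string>& adj = it->second;
        for (std::size_t i = 0; i < adj.size(); ++i) {
            Color c = color[adj[i]];   // names not seen before start as WHITE
            if (c == GRAY) {
                throw std::runtime_error("cyclic graph");
            }
            if (c == WHITE) {
                dfs_visit_sketch(g, adj[i], color, result);
            }
        }
    }
    color[v] = BLACK;
    result.push(v);
}

// Returns a stack that yields the vertices in topological order when popped.
std::stack<std::string> top_sort_sketch(const AdjacencyMap& g) {
    std::map<std::string, Color> color;
    std::stack<std::string> result;
    for (AdjacencyMap::const_iterator it = g.begin(); it != g.end(); ++it) {
        if (color[it->first] == WHITE) {
            dfs_visit_sketch(g, it->first, color, result);
        }
    }
    return result;
}
```

For the graph in section 4.2 (arcs A to B and D, B to E, C to B, D to E), popping the returned stack gives the order C A D B E, one of the orderings listed above.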

A8. Implement top_sort and dfs_visit and test the program.

4.4 Parsing the Makefile

Finally, you are ready to solve the last part of the assignment: to read a makefile, build a graph, sort it topologically and print the results. The input to the program is a makefile containing lines of the type:

    target: prereq1 prereq2 ...

The names of the target and the prerequisites are strings. The list of prerequisites may be empty. The program should perform reasonable syntax checks (the most important is probably that a line must contain a target).

A9. The file topsort.cc contains a test program. Implement the function buildGraph and test. The files test[1-5].txt contain test cases (the same that were used in the graph test program in assignment A8), but you are encouraged to also test other cases.
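The line format described in section 4.4 can be taken apart with string::find and an istringstream. The sketch below shows one possible approach under simple assumptions: parse_rule is a hypothetical helper (it is not the buildGraph function from topsort.cc), and the only syntax checks are that a colon is present and that exactly one word precedes it.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split "target: prereq1 prereq2 ..." into its parts.
// Returns false if the line has no colon or if the text before the colon
// is not exactly one word.
bool parse_rule(const std::string& line, std::string& target,
                std::vector<std::string>& prereqs) {
    std::string::size_type colon = line.find(':');
    if (colon == std::string::npos) {
        return false;
    }
    std::istringstream left(line.substr(0, colon));
    std::string extra;
    if (!(left >> target) || (left >> extra)) {
        return false;   // no target, or more than one word before ':'
    }
    prereqs.clear();
    std::istringstream right(line.substr(colon + 1));
    std::string name;
    while (right >> name) {
        prereqs.push_back(name);
    }
    return true;
}
```

Each successfully parsed line then corresponds to one arc per prerequisite in the dependency graph.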


3 Strings and Streams

Objective: to practice using the string class and the stream classes.

Read:
Book: strings, streams, function templates.

1 Class string

1.1 Introduction

In C, a string is a null-terminated array of characters. This representation is the cause of many errors: overwriting array bounds, trying to access arrays through uninitialized or incorrect pointers, and leaving dangling pointers after an array has been deallocated. The <cstring> library contains some useful operations on C-strings, such as copying and comparing strings.

C++ strings hide the physical representation of the sequence of characters. A C++ string object knows its starting location in memory, its contents (the characters), its length, and the length to which it can grow before it must resize its internal data buffer (i.e., its capacity). The exact implementation of the string class is not defined by the C++ standard. The specification in the standard is intended to be flexible enough to allow different implementations, yet guarantee predictable behavior.

The string identifier is not actually a class, but a typedef for a specialized template:

    typedef std::basic_string<char> string;

So string is a string containing characters of the type char. There is another standard string specialization, wstring, which describes strings of characters of the type wchar_t. wchar_t is the type representing wide characters. The standard does not specify the size of these characters; usually it's 16 or 32 bits (32 in gcc). In this lab we will ignore all internationalization problems and assume that all characters fit in one byte.

Strings in C++ are not limited to the classes string and wstring; you may specify strings of characters of any type, such as int. However, this is discouraged: there are other sequence containers that are better suited for non-char types.

Since we're only interested in the string specialization of basic_string, you may wonder why we mention basic_string at all. There are two reasons. First, std::basic_string<> is all the compiler knows about in all but the first few translation passes. Hence, diagnostic messages only mention std::basic_string<>. Secondly, it is important for the general understanding of the ideas behind the string method implementations to have a grasp of the basics of a C++ concept called character traits, i.e., character characteristics. For now, we will just show parts of the definition of basic_string:

    /* Standard header <string> */
    namespace std {
        template<typename char_t, typename traits_t = char_traits<char_t> >
        class basic_string {
            // types:
            typedef traits_t traits_type;
            typedef typename traits_t::char_type char_type;
            typedef unsigned long size_type;
            static const size_type npos = -1;
            // much more
        };
    }


char_type is the type of the characters in the string. size_type is a type used for indexing in the string. npos ("no position") indicates a position beyond the end of the string; it may be returned by functions that search for characters in a string. Since size_type is unsigned, the initial value -1 is converted to the largest value that size_type can hold, so npos represents positive infinity.

1.2 Operations on Strings

We now show some of the most important operations on strings (actually basic_strings). The operations are both member functions and global helper functions. We start with the member functions (for formatting reasons, some parameters of type size_type have been specified as int):

    class string {
    public:
        /*** construction ***/
        string();                 // create an empty string
        string(const string& s);  // create a copy of s
        string(const char* cs);   // create a string with the characters from cs
        string(const string& s, int start, int n); // create a string with n
                                  // characters from s, starting at position start
        string(int n, char ch);   // create a string with n copies of ch

        /*** information ***/
        int size();               // number of characters in the string
        int capacity();           // the string can hold this many characters
                                  // before resizing is necessary
        void reserve(int n);      // set the capacity to n, resize if necessary

        /*** character access ***/
        const char& operator[](size_type pos) const;
        char& operator[](size_type pos);

        /*** substrings ***/
        string substr(size_type start, int n); // the substring starting at
                                  // position start, containing n characters

        /*** finding things ***/
        // see below

        /*** inserting, replacing, and removing ***/
        void insert(size_type pos, const string& s); // insert s at position pos
        void append(const string& s);                // append s at the end
        void replace(size_type start, int n, const string& s); // replace n
                                  // characters starting at start with s
        void erase(size_type start = 0, int n = npos); // remove n characters
                                  // starting at start

        /*** assignment and concatenation ***/
        string& operator=(const string& s);
        string& operator=(const char* cs);
        string& operator=(char ch);
        string& operator+=(const string& s); // also for const char* and char

        /*** access to C-string representation ***/
        const char* c_str();
    };

Note that there is no constructor string(char). Use string(1, char) instead.


The subscript functions operator[] do not check for a valid index. There are similar at() functions that do check, and that throw out_of_range if the index is not valid.

The substr() member function takes a starting position as its first argument and the number of characters as the second argument. This is different from the substring() method in java.lang.String, where the second argument is the end position of the substring.

There are overloads of most of the functions. You can use C-strings or characters as parameters instead of strings.

There is a bewildering variety of member functions for finding strings, C-strings or characters. They all return npos if the search fails. The functions have the following signature (the string parameter may also be a C-string or a character):

    size_type FIND_VARIANT(const string& s, size_type pos = 0) const;

s is the string to search for, pos is the starting position. (The default value for pos is npos, not 0, in the functions that search backwards.)

The find variants are find (find a string, forwards), rfind (find a string, backwards), find_first_of and find_last_of (find one of the characters in a string, forwards or backwards), find_first_not_of and find_last_not_of (find a character that is not one of the characters in a string, forwards or backwards). Example:

    void f() {
        string s = "accdcde";
        int i1 = s.find("cd");              // i1 = 2 (s[2]=='c' && s[3]=='d')
        int i2 = s.rfind("cd");             // i2 = 4 (s[4]=='c' && s[5]=='d')
        int i3 = s.find_first_of("cd");     // i3 = 1 (s[1]=='c')
        int i4 = s.find_last_of("cd");      // i4 = 5 (s[5]=='d')
        int i5 = s.find_first_not_of("cd"); // i5 = 0 (s[0]!='c' && s[0]!='d')
        int i6 = s.find_last_not_of("cd");  // i6 = 6 (s[6]!='c' && s[6]!='d')
    }
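The role of npos and the difference between at() and operator[] can be seen in a small sketch. The helper names count_occurrences and index_is_valid are made up for this illustration. Note that the search position is kept in a size_type variable, so the comparison with npos is well defined.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Count the non-overlapping occurrences of pattern in s by calling find
// repeatedly; the loop stops when find reports npos.
std::size_t count_occurrences(const std::string& s, const std::string& pattern) {
    if (pattern.empty()) {
        return 0;   // an empty pattern would match at every position
    }
    std::size_t count = 0;
    std::string::size_type pos = s.find(pattern);
    while (pos != std::string::npos) {
        ++count;
        pos = s.find(pattern, pos + pattern.size());
    }
    return count;
}

// at() checks the index and throws std::out_of_range; operator[] does not.
bool index_is_valid(const std::string& s, std::string::size_type pos) {
    try {
        (void)s.at(pos);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

For the string "accdcde" from the example above, count_occurrences finds "cd" twice, at the positions that find and rfind report.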

The global overloaded operator functions are for concatenation (operator+) and for comparison (operator==, operator<, etc.). They all have the expected meaning. You cannot use + to concatenate a string with a number, only with another string, C-string or character (this is unlike Java). The comparison functions rely on two functions that compare individual characters: eq() and lt(). These are defined in the character traits class.

A1. Implement the following function:

    void replace_all(string& s, const string& from, const string& to);

so the following code executes correctly:

    string text = "A man, a plan, a canal, Panama!";
    replace_all(text, "an", "XXX");
    assert(text == "A mXXX, a plXXX, a cXXXal, PXXXama!");

    text = "ananan";
    replace_all(text, "an", "XXX");
    assert(text == "XXXXXXXXX");

    text = "ananan";
    replace_all(text, "an", "anan");
    assert(text == "anananananan");


A2. The Sieve of Eratosthenes is an ancient method for finding all prime numbers less than some fixed number M. It starts by enumerating all numbers in the interval [0, M] and assuming they are all primes. The first two numbers, 0 and 1, are marked, as they are not primes. The algorithm then starts with the number 2, marks all subsequent multiples of 2 as composites, and repeats the process for the next prime candidate. When the initial sequence is exhausted, the numbers not marked as composites are the primes in [0, M].

In this assignment you shall use a string for the enumeration. Initialize a string with appropriate length to PPPPP...PPP. The characters at positions that are not prime numbers should be changed to C. Write a test program that prints the prime numbers between 1 and 200 and also the largest prime that is less than 100,000.

Example with the numbers 0-27:

                                        1         2
                              0123456789012345678901234567
    Initial:                  CCPPPPPPPPPPPPPPPPPPPPPPPPPP
    Find 2, mark 4,6,8,...:   CCPPCPCPCPCPCPCPCPCPCPCPCPCP
    Find 3, mark 6,9,12,...:  CCPPCPCPCCCPCPCCCPCPCCCPCPCC
    Find 5, mark 10,15,20,25: CCPPCPCPCCCPCPCCCPCPCCCPCCCC
    Find 7, mark 14,21:       CCPPCPCPCCCPCPCCCPCPCCCPCCCC
    ...

2 The iostream Library

2.1 Input/Output of User-Defined Objects

We have already used most of the stream classes: istream, ifstream, and istringstream for reading, and ostream, ofstream, and ostringstream for writing. There are also iostreams that allow both reading and writing. The stream classes are organized in the following (simplified) generalization hierarchy:

    ios_base
        ios
            istream
                ifstream
                istringstream
            ostream
                ofstream
                ostringstream
            iostream (derived from both istream and ostream)
                fstream
                stringstream

The classes ios_base and ios contain, among other things, information about the stream state. There are, for example, functions bool good() (the state is ok) and bool eof() (end-of-file has been reached). There is also a conversion operator void*() that returns nonzero if the state is good, and a bool operator!() that returns nonzero if the state is not good. We have used these operators with input files, writing for example while (infile >> ch) or if (! infile).

Now, we are interested in reading objects of a class A from an istream using operator>> and in writing objects to an ostream using operator<<. We do it by defining the following operator functions:

    std::istream& operator>>(std::istream& is, A& aobj);
    std::ostream& operator<<(std::ostream& os, const A& aobj);


Note:

- Since the parameters are of type istream or ostream, the operator functions work for both file streams and string streams.
- The functions are global helper functions [7].
- Both operators return the stream object that is the first argument to the operator, enabling us to write, e.g., cout << a << b; (i.e., operator<<(operator<<(cout,a), b);).
- The operators usually need access to private members of class A and may be friends of the class.
- The input operator should be written so that it can (minimally) read data produced by the output operator.

As an example of an input operator, we here show (slightly simplified) the GNU implementation of input to objects of std::complex<num_type>.

    namespace std {
        template <typename num_type>
        istream& operator>>(istream& is, complex<num_type>& cplx)
        /*
         * Input formats for a complex number: re, or (re), or (re,im)
         * Possibly whitespace between the tokens.
         *
         * operator>>(istream&, num_type&) is assumed to set the stream state
         * appropriately when reading the numbers.
         */
        {
            num_type re = 0, im = 0;
            char c = 0;
            is >> c;
            if (c == '(') {
                is >> re >> c;
                if (c == ',') {
                    is >> im >> c;
                    if (c == ')') {
                        cplx = complex<num_type>(re, im);
                    } else {
                        // format error, set the stream state to fail
                        is.setstate(ios_base::failbit);
                    }
                } else if (c == ')') {
                    cplx = complex<num_type>(re, num_type(0));
                } else {
                    // format error, set the stream state to fail
                    is.setstate(ios_base::failbit);
                }
            } else {
                is.putback(c);
                is >> re;
                cplx = complex<num_type>(re, num_type(0));
            }
            return is;
        }
    }

[7] The alternative implementation, to write operator>> and operator<< as member functions, is not possible here: the functions would have to be members of class istream and ostream, respectively, and we cannot add to the standard classes.
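To complement the std::complex example, here is a sketch of a matching pair of input and output operators for a small user-defined type. Point, its (x,y) text format and the round_trip helper are assumptions made for this illustration; they are not part of the lab files. The input operator reads what the output operator writes and sets failbit on a format error.

```cpp
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical class used only for illustration.
struct Point {
    int x;
    int y;
};

// Output format: (x,y)
std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << '(' << p.x << ',' << p.y << ')';
}

// Reads the format written by operator<<. On a format error the stream
// is put in the fail state and p is left unchanged.
std::istream& operator>>(std::istream& is, Point& p) {
    char open = 0, comma = 0, close = 0;
    int x = 0, y = 0;
    if (is >> open >> x >> comma >> y >> close) {
        if (open == '(' && comma == ',' && close == ')') {
            p.x = x;
            p.y = y;
        } else {
            is.setstate(std::ios_base::failbit);
        }
    }
    return is;
}

// Round trip through a string stream: write p, then read it back.
bool round_trip(const Point& p) {
    std::ostringstream out;
    out << p;
    std::istringstream in(out.str());
    Point q = { 0, 0 };
    in >> q;
    return !in.fail() && q.x == p.x && q.y == p.y;
}
```

Because the operators take istream and ostream parameters, the same code works unchanged with file streams.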


A3. The files date.h, date.cc, and date_test.cc describe a simple date class. Implement the class and add operators for input and output of dates. Dates should be output in the form 2012-01-10. The input operator should accept dates in the same format. (You may consider dates written like 2012-1-10 and 2012 -001 - 10 as legal, if you wish.)

2.2 String Streams

The string stream classes (istringstream and ostringstream) function as their file counterparts (ifstream and ofstream). The only difference is that characters are read from/written to a string instead of a file.

A4. In Java, the class Object defines a method toString() that is supposed to produce a readable representation of an object. This method can be overridden in subclasses. Write a C++ function toString for the same purpose. Examples of usage:

    double d = 1.234;
    Date today;
    std::string sd = toString(d);
    std::string st = toString(today);

You may assume that the argument object can be output with <<.

A5. Type casting in C++ can be performed with, for example, the static_cast operator. Casting from a string to a numeric value is naturally not supported, since this would involve converting a sequence of characters to a number. Write a function string_cast that can be used to cast a string to an object of another type. Examples of usage:

    try {
        int i = string_cast<int>("123");
        double d = string_cast<double>("12.34");
        Date date = string_cast<Date>("2011-01-10");
    } catch (StringCastException) {
        cout << "... error" << endl;
    }

You must also define the class StringCastException. You may assume that the argument object can be input with >>.
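The statement in this section that string streams behave like their file counterparts can be illustrated with a short sketch. The function names split_words and describe are invented for the example; the reading and writing code is what one would write for an ifstream or ofstream, only the source and destination are strings.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Read whitespace-separated words from a string, exactly as they would
// be read from a file with an ifstream.
std::vector<std::string> split_words(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string word;
    while (in >> word) {
        words.push_back(word);
    }
    return words;
}

// Build a string with <<, exactly as it would be written to an ofstream.
std::string describe(const std::string& name, int count) {
    std::ostringstream out;
    out << name << " occurs " << count << " times";
    return out.str();
}
```

The str() member function of ostringstream returns a copy of the characters written so far.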


Programming with the STL

Objective: to practice using the STL container classes and algorithms, with emphasis on efficiency. You will also learn more about operator overloading and STL iterators.

Read:
Book: STL containers and STL algorithms. Operator overloading, iterators.

Domain Name Servers and the STL Container Classes

On the web, computers are identified by IP addresses (32-bit numbers). Humans identify computers by symbolic names. A Domain Name Server (DNS) is a component that translates a symbolic name to the corresponding IP address. The DNS system is a very large distributed database that contains billions (or at least many millions) of IP addresses and that receives billions of lookup requests every day. Furthermore, the database is continuously updated. If you wish to know more about name servers, read the introduction at http://computer.howstuffworks.com/dns.htm.

In this lab, you will implement a local DNS in C++. With "local", we mean that the DNS does not communicate with other name servers; it can only perform translations using its own database. The goal is to develop a time-efficient name server, and you will study different implementation alternatives. You shall implement three versions (classes) of the name server, using different STL container classes. All three classes should implement the interface NameServerInterface:

    typedef std::string HostName;
    typedef unsigned int IPAddress;
    const IPAddress NON_EXISTING_ADDRESS = 0;

    class NameServerInterface {
    public:
        virtual ~NameServerInterface() {}
        virtual void insert(const HostName&, const IPAddress&) = 0;
        virtual bool remove(const HostName&) = 0;
        virtual IPAddress lookup(const HostName&) const = 0;
    };

insert() inserts a name/address pair into the database, without checking if the name already exists. remove() removes a name/address pair and returns true if the name exists; it does nothing and returns false if the name doesn't exist. lookup() returns the IP address for a specified name, or NON_EXISTING_ADDRESS if the name doesn't exist.

Since this is an STL lab, you shall use STL containers and algorithms as much as possible. This means, for example, that you are not allowed to use any for or while statements in your solutions. (There is one exception: you may use a for or while statement in the hash function, see assignment A1c.)

A1. The definition of the class NameServerInterface is in the file nameserverinterface.h.

a) Implement a class VectorNameServer that uses an unsorted vector to store the name/address pairs. Use linear search to search for an element. The intent is to demonstrate that such an implementation is inefficient. Use the STL find_if algorithm to search for a host name. Consider carefully the third parameter to the algorithm: it should be a function or a functor that compares an element in the vector (a name/address pair) with a name. If you use a functor, the call may look like this:


find_if(v.begin(), v.end(), bind2nd(first_equal(), name)) first equal is a functor (a class) that takes two parameters, a pair and a host name, and compares the rst component of the pair with the name. b) Implement a class MapNameServer that uses a map to store the name/address pairs. The average search time in this implementation will be considerably better than that for the vector implementation. c) It is possible that the map implementation of the name server is sufciently efcient, but you shall also try a third implementation. Implement a class HashNameServer that uses a hash table to store the name/address pairs. Use a vector of vectors to store the name/address pairs.8 The hash table implementation is open for experimentation: you must select an appropriate size for the hash table and a suitable hash function.9 It can also be worthwhile to exploit the fact that most (in fact all, in the test program) computer names start with www.. Without any greater difculty, you should be able to improve the search times that you obtained in the map implementation by about 50%. Use the program nstest.cc to check that the insert/remove/lookup functions work correctly. Then, use the program nstime.cc to measure and print the search times for the three different implementations, using the le nameserverdata.txt as input (the le contains 25,143 name/address pairs10 ). A2. Examples of average search times in milliseconds for a name server with 25,143 names are in the following table. 25,143 vector map hash 0.185 0.00066 0.00029 100,000

Search the Internet for information about search algorithms for different data structures, or use your knowledge from the algorithms and data structures course, and fill in the blanks in the table. Write a similar table for your own implementation.
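The two ideas in A1 that are easiest to get wrong are the predicate passed to find_if and the hash function. The following is a minimal sketch, not the lab solution. The pair type NameAddress, the functor FirstEquals, and the functions lookup_linear and hash_name are hypothetical names, and returning 0 for "not found" is an assumption. Instead of bind2nd (which requires an adaptable functor and was removed in C++17), the functor here stores the name in a member. The hash function is a simple polynomial hash, one of many possible choices. It uses every character and depends on their order, so "abc" and "cba" get different codes.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pair type; the real one is defined in nameserverinterface.h.
typedef std::pair<std::string, unsigned int> NameAddress;

// Unary functor: remembers a host name and compares it with pair.first.
class FirstEquals {
public:
    explicit FirstEquals(const std::string& n) : name(n) {}
    bool operator()(const NameAddress& entry) const {
        return entry.first == name;
    }
private:
    std::string name;
};

// Linear search with find_if; returns 0 if the name is absent (assumption).
unsigned int lookup_linear(const std::vector<NameAddress>& v,
                           const std::string& name) {
    std::vector<NameAddress>::const_iterator it =
        std::find_if(v.begin(), v.end(), FirstEquals(name));
    return it != v.end() ? it->second : 0;
}

// Order-sensitive polynomial hash over all characters of the string.
// The multiplier 31 is a conventional choice, not a requirement.
std::size_t hash_name(const std::string& s, std::size_t table_size) {
    std::size_t h = 0;
    for (std::string::size_type i = 0; i != s.size(); ++i) {
        h = 31 * h + static_cast<unsigned char>(s[i]);
    }
    return h % table_size;
}
```

In a vector-of-vectors hash table, hash_name(name, table.size()) would select the bucket, and the same find_if call could then search within that bucket.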

2 Bitsets, Subscripting, and Iterators

2.1 Bitsets

To manipulate individual bits in a word, C++ provides the bitwise operators & (and), | (or), ^ (exclusive or), and the shift operators << (shift left) and >> (shift right). The STL class bitset<N> generalizes this notion and provides operations on a set of N bits indexed from 0 through N-1. N may be arbitrarily large, so the bitset may occupy many words. For historical reasons, bitset doesn't provide any iterators. We will develop a simplified version of the bitset class where all the bits fit in one word, and extend the class with iterators so it becomes possible to use the standard STL algorithms with the class. Our goal is to provide enough functionality to make the following program work correctly:
8. GNU STL contains a class hash_map (in newer versions unordered_map), a map implementation that uses a hash table. You should not use any of these classes in this assignment.
9. Note that a good hash function should take all (or at least many) of the characters of a string into account and that "abc" and "cba" should have different hash codes. For instance, a hash function that merely adds the first and last characters of a string is not acceptable.
10. It was difficult to find a file with real name/address pairs, so the computer names are common words from a dictionary file, with www. first and .se or similar last. The IP addresses are running numbers.


    int main() {
        using namespace std;
        using namespace cpp_lab4;

        // Define an empty bitset, set some bits, print the bitset
        Bitset bs;
        bs[3] = true;
        bs[0] = bs[3];
        copy(bs.begin(), bs.end(), ostream_iterator<int>(cout));
        cout << endl;

        // Clear a bit, use the STL find algorithm to find the first
        // set bit, clear this bit, set the next bit, print the bitset
        bs[0] = false;
        BitsetIterator it = find(bs.begin(), bs.end(), true);
        if (it != bs.end()) {
            *it = false;
            ++it;
            if (it != bs.end())
                *it = true;
        }
        copy(bs.begin(), bs.end(), ostream_iterator<int>(cout));
        cout << endl;
    }

The output from the program should be (you may get 64 digits if you run the program on a 64-bit machine):

    10010000000000000000000000000000
    00001000000000000000000000000000

A BitsetIterator has to support both reading and writing, so it must be a model of ForwardIterator. Actually, it is not difficult to make it a model of RandomAccessIterator, but this would mean that we had to supply more functions.

We will develop the solution in several steps:

1. Implement the bit fiddling methods necessary to set, clear, and test an individual bit in a word (this we have written for you).
2. Investigate how operator[] should be implemented. This is rather difficult.
3. Add iterators to the class. This turns out to be relatively simple.

A3. The files simplebitset.h and simplebitset.cc contain the implementation of a simple version of the Bitset class, with get and set functions instead of an overloaded subscripting operator. Study the class and convince yourself that you understand how the bits are manipulated. Use the program in simplebittest.cc to check the function of the class.

2.2 Subscripting

Subscripting, e.g., bs[0], is handled by operator[]. In order to allow subscripting to be used on the left hand side of an assignment operator, operator[] must return a reference (e.g., like int& operator[](int i) in a vector class). The problem with the Bitset class is that we need a reference to an individual bit in a word, and there are no pointers to bits in C++. We must write a proxy class, BitReference, to represent the reference. This class contains a pointer to the word that contains the bits (the member bits in class Bitset), and an integer that is the position of the bit in the word.


An outline of the class (BitsetStorage is the type of the word that contains the bits):

    class BitReference {
    public:
        BitReference(Bitset::BitsetStorage* pb, size_t p)
            : p_bits(pb), pos(p) {}
        // Operations will be added later
    protected:
        Bitset::BitsetStorage* p_bits; // pointer to the word containing bits
        size_t pos;                    // position of the bit in the word
    };

Now, operator[] in class Bitset may be defined as follows:

    BitReference operator[](size_t pos) {
        return BitReference(&bits, pos);
    }

The actual bit fiddling is performed in the BitReference class. In order to see what we need to implement in this class, we must study the results of some expressions involving operator[]:

    bs[3] = true;   // bs.operator[](3) = true;                  =>
                    // BitReference(&bs.bits,3) = true;          =>
                    // BitReference(&bs.bits,3).operator=(true);

From this it follows that we must implement the following operator function in BitReference:

    BitReference& operator=(bool x); // for bs[i] = x

This function should set the bit referenced by the BitReference object to the value of x (just like the set function in the original Bitset class). When we investigate (do this on your own) other possibilities to use operator[] we find that we also need the following functions:

    BitReference& operator=(const BitReference& bsr); // for bs[i] = bs[j]
    operator bool() const;                            // for x = bs[i]
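To make the proxy idea concrete, here is a small self-contained sketch of the same pattern outside the lab framework. The names BitRef, TinyBitset, and raw() are hypothetical, and unsigned long stands in for the lab's storage type. Treat it as an illustration of how the three operators cooperate rather than as the contents of bitreference.cc.

```cpp
#include <cstddef>

// Proxy object standing in for "a reference to one bit".
class BitRef {
public:
    BitRef(unsigned long* pb, std::size_t p) : p_bits(pb), pos(p) {}

    // bs[i] = x: set or clear the referenced bit with a mask.
    BitRef& operator=(bool x) {
        const unsigned long mask = 1UL << pos;
        if (x)
            *p_bits |= mask;
        else
            *p_bits &= ~mask;
        return *this;
    }

    // bs[i] = bs[j]: copy the bit value, not the pointer and position.
    BitRef& operator=(const BitRef& other) {
        return *this = static_cast<bool>(other);
    }

    // x = bs[i]: read the referenced bit.
    operator bool() const {
        return ((*p_bits >> pos) & 1UL) != 0;
    }

private:
    unsigned long* p_bits; // word that holds the bits
    std::size_t pos;       // position of the bit in the word
};

// Hypothetical one-word bitset that hands out proxies from operator[].
class TinyBitset {
public:
    TinyBitset() : bits(0) {}
    BitRef operator[](std::size_t pos) { return BitRef(&bits, pos); }
    unsigned long raw() const { return bits; } // for inspection only
private:
    unsigned long bits;
};
```

With these definitions, bs[3] = true followed by bs[0] = bs[3] leaves bits 0 and 3 set, so raw() yields 9. Note that the copy assignment operator deliberately does not copy p_bits and pos: assigning one proxy to another must change the referenced bit.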

A4. Use the files bitset.h, bitset.cc, bitreference.h, bitreference.cc, and bitsettest1.cc. Implement the functions in bitreference.cc and test. It is a good idea to execute the program line by line under control of gdb, even if your implementation is correct, so you can follow the execution closely and see what happens (turn off the -O2 switch on the compilation command line if you do this).

2.3 Iterators

From one of the OH slides: An iterator points to a value. All iterators are DefaultConstructible and Assignable and support ++it and it++. A ForwardIterator should additionally be EqualityComparable and support *it, both for reading and writing via the iterator. The most important requirement is that an iterator should point to a value. A BitsetIterator should point to a Boolean value, and we already have something that does this: the class BitReference! The additional requirements (++, equality test, and *) can be implemented in a derived class. It will look like this (see footnote 11):
11. This class only contains the constructs that are necessary for the bitset test program. For example, we have not implemented postfix ++ and comparison with ==.


    class BitsetIterator : public BitReference {
    public:
        // ... typedefs, see below
        BitsetIterator(Bitset::BitsetStorage* pb, size_t p)
            : BitReference(pb, p) {}
        BitsetIterator& operator=(const BitsetIterator& bsi);
        bool operator!=(const BitsetIterator& bsi) const;
        BitsetIterator& operator++();
        BitReference operator*();
    };

The assignment operator must be redefined so it copies the member variables (as opposed to the assignment operator in the base class BitReference, which sets a bit in the bitset). The typedefs are necessary to make the iterator STL-compliant, i.e., to make it possible to use the iterator with the standard STL algorithms. The only typedef that isn't obvious is:

    typedef std::forward_iterator_tag iterator_category;

This is an example of an iterator tag, and it informs the compiler that the iterator is a forward iterator.

A5. Implement the member functions in bitsetiterator.cc. Uncomment the lines in bitset.h and bitset.cc that have to do with iterators, and implement the begin() and end() functions. Use the program bitsettest2.cc to test your classes.
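The sketch below shows one way the pieces can fit together so that std::copy and std::find accept the iterator. It is an illustration under assumptions, not the lab's design. The names BitRef, BitIter, TinyBitset, and bits_as_text are hypothetical, and the iterator is written as a standalone class holding a pointer and a position rather than deriving from the proxy. It also supplies ==, postfix ++, and a default constructor, which the lab class omits.

```cpp
#include <algorithm>
#include <climits>
#include <cstddef>
#include <iterator>
#include <sstream>
#include <string>

// Proxy for one bit (same idea as BitReference).
class BitRef {
public:
    BitRef(unsigned long* pb, std::size_t p) : p_bits(pb), pos(p) {}
    BitRef& operator=(bool x) {
        const unsigned long mask = 1UL << pos;
        if (x) *p_bits |= mask; else *p_bits &= ~mask;
        return *this;
    }
    BitRef& operator=(const BitRef& o) { return *this = static_cast<bool>(o); }
    operator bool() const { return ((*p_bits >> pos) & 1UL) != 0; }
private:
    unsigned long* p_bits;
    std::size_t pos;
};

// Forward iterator over the bits of one word.
class BitIter {
public:
    // Member types that std::iterator_traits looks for.
    typedef std::forward_iterator_tag iterator_category;
    typedef bool value_type;
    typedef std::ptrdiff_t difference_type;
    typedef void pointer;
    typedef BitRef reference;

    BitIter() : p_bits(0), pos(0) {}
    BitIter(unsigned long* pb, std::size_t p) : p_bits(pb), pos(p) {}

    bool operator==(const BitIter& o) const {
        return p_bits == o.p_bits && pos == o.pos;
    }
    bool operator!=(const BitIter& o) const { return !(*this == o); }
    BitIter& operator++() { ++pos; return *this; }
    BitIter operator++(int) { BitIter old(*this); ++pos; return old; }
    BitRef operator*() const { return BitRef(p_bits, pos); } // proxy, not bool&
private:
    unsigned long* p_bits;
    std::size_t pos;
};

class TinyBitset {
public:
    TinyBitset() : bits(0) {}
    static std::size_t size() { return CHAR_BIT * sizeof(unsigned long); }
    BitRef operator[](std::size_t pos) { return BitRef(&bits, pos); }
    BitIter begin() { return BitIter(&bits, 0); }
    BitIter end() { return BitIter(&bits, size()); }
private:
    unsigned long bits;
};

// Writes the bits as a string of 0s and 1s, bit 0 first.
std::string bits_as_text(TinyBitset& bs) {
    std::ostringstream out;
    std::copy(bs.begin(), bs.end(), std::ostream_iterator<int>(out));
    return out.str();
}
```

The compiler-generated copy assignment of BitIter copies the pointer and the position, which is the behaviour an iterator needs. In the lab's derived-class design the same effect has to be written by hand, because the inherited assignment operator sets a bit instead.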
