
CS246 — Notes on C, C++, and Java, University of Waterloo, Winter 2002

CS246 — Software Abstraction and Specification

Notes on C, C++, and Java


Edited by Michael W. Godfrey
Version 2.0 — January 3, 2002

1 Introduction

The purpose of this set of notes is to give CS246 students some background on, and an overview of, the C, C++, and Java
programming languages. While there may be only a few pages now, we anticipate that the notes will continue to grow over time.

2 C, C++, and Java

The C, C++, and Java programming languages share an interesting history. Because I believe that history is undervalued in the
study of computer science (well, everywhere really), I’d like to give you a short lesson that might make your understanding of
these languages more resonant.

2.1 The C programming language

The C programming language was developed at Bell Labs in Murray Hill, NJ in the late 1960s and early 1970s. It was developed
in tandem with the original implementation of UNIX, and was designed to be used on the PDP-11 computer. Most of the original
work on UNIX was done by Ken Thompson, with the C language being the work of Dennis Ritchie; later, Brian Kernighan (a
Toronto boy) helped in the design and implementation of C and, together with Ritchie, wrote the first book on the C language
called (oddly enough), The C Programming Language. 3

3 This book is so well known, it is sometimes called “the bible”; when the second edition came out in the 1980s (the ANSI C
version), it was often referred to as “the new testament”. If you pick up a copy, you will note that UW’s own Prof. Ric Holt is
thanked in the foreword.

Prior to the development of C, most “applications” programming (i.e., business- or data-oriented) was done in a high-level
language such as COBOL, Fortran, or PL/1, and most “systems” programming (i.e., grungy, OS service-level tasks) was done in
assembly language. Broadly speaking, this meant that most applications programs were inefficient, and most systems programs
were not portable. One of the overriding design goals of C, therefore, was to provide a portable programming language that
allowed one to write very efficient code, both for systems programs and application programs. It accomplished this by setting
its abstraction level to be just above that of most hardware architectures of the day, e.g., data structures correspond to pieces
of memory and pointers to pieces of memory. In this way, it was relatively easy to create compilers for different architectures
(since the abstraction level was just above that of most hardware), and it was possible to program very efficiently, by keeping
more or less direct control of the computer’s resources.

C was a major step forward for programming; this fact can hardly be overemphasized. However, there are serious drawbacks
to the language as well. First, its syntax is notoriously arcane and hard to learn. Second, there is very little error checking built
in to the language (and if you want error checking, you have to build it yourself for the most part); for example, if you ask for
element 45 of an array of ints but your array really only has 20 elements, the system will simply calculate where element 45
would have been, if it existed, and return whatever bit pattern is stored in memory there, interpreted as if it were an int. Third,
C provides very little ability to design and manage your own abstractions; it was this fact that led directly to the creation of
C++. Fourth, C leaves storage management to the programmer. In Java you simply declare and then new an object; when you’re
done with it, the storage is automatically reclaimed by the system without any effort on your part (this is called garbage
collection). In C, storage allocation and deallocation are performed by the programmer; for each (non-trivial) entity you want,
you typically have to tell the system how big it is, ask for just that many bytes, and then later return the storage when you’re
done with it.
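The storage-management steps described above (say how big the entity is, ask for that many bytes, give the storage back later) might be sketched as follows. This is only an illustration: it uses the C library calls malloc and free, but it is written so that it compiles as C++, which needs an explicit cast on the result of malloc. The helper names make_int_array and checked_get are hypothetical, invented for this sketch; checked_get shows the kind of bounds check that you have to build yourself.

```cpp
#include <cassert>
#include <cstddef>   // std::size_t
#include <cstdlib>   // std::malloc, std::free

// Hypothetical helper: allocate room for `count` ints, C style.
// The caller states the size in bytes and must later free() the result.
int* make_int_array(std::size_t count) {
    // malloc returns void*; C++ requires the explicit cast, C does not.
    int* block = static_cast<int*>(std::malloc(count * sizeof(int)));
    if (block == nullptr) {
        return nullptr;              // allocation failed
    }
    for (std::size_t i = 0; i < count; ++i) {
        block[i] = 0;                // malloc does not initialise the storage
    }
    return block;
}

// Hypothetical helper: a hand-built bounds check, since the language has none.
// Returns true and stores the element in *out only when index is in range.
bool checked_get(const int* array, std::size_t length, std::size_t index, int* out) {
    assert(array != nullptr && out != nullptr);
    if (index >= length) {
        return false;                // array[index] would read memory we do not own
    }
    *out = array[index];
    return true;
}
```

A caller would obtain the array with make_int_array(20), use it, and finish with free; asking checked_get for element 45 of that array reports failure instead of returning an arbitrary bit pattern.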

2.1.1 Data and control structures

The data and control structures of C are fairly simple. The data structures include simple data types, such as integers and
floating-point numbers, as well as composite types such as arrays and structs, which are like objects with data fields but
no methods. And then there are pointers, which are a bit like references in Java, except that they are really just numbers that
correspond to memory addresses. References in Java are fairly clean abstractions; they refer to an object of some kind. When
you have a reference to a Java object, you are only permitted to perform “appropriate” actions, such as calling a method or
retrieving a data value that the caller is permitted to access. A pointer in C could point to anything: an integer, the first element
of an array, another pointer, a piece of a piece of a struct, etc. And you can do just about anything you want to a pointer;
you can perform arithmetic on pointers 4, you can write to the nth byte in an array of structs (want to know which field of
which element you just changed? you’d better understand how the compiler lays out data structures in memory!), and you can ask
the system to pretend that your data structure has a different shape than you originally stated.

Low-level control structures in C are similar to those of Java: declarations, assignment statements, ifs, loops, etc. In fact, the
syntax is (intentionally) almost identical. Higher level control structures basically boil down to functions, which can either be
local to a single file or global.
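As a rough illustration of the pointer manipulations described in this section, the sketch below walks an array using pointer arithmetic and then treats an array of structs as raw bytes. The struct Point and the functions sum_with_pointer and poke_byte are hypothetical names invented here; the code is C++, and in C the cast would be written (unsigned char *)points.

```cpp
#include <cassert>
#include <cstddef>   // std::size_t, offsetof

// Hypothetical struct used only for illustration: data fields, no methods.
struct Point {
    int x;
    int y;
};

// Pointer arithmetic: p + k means "k elements further on", so *(p + k) is p[k].
int sum_with_pointer(const int* first, std::size_t count) {
    assert(first != nullptr || count == 0);
    int total = 0;
    for (const int* p = first; p != first + count; ++p) {
        total += *p;
    }
    return total;
}

// Pretend the array of structs is just a run of bytes and overwrite one of them.
// Which field of which element changes depends on how the compiler lays out Point.
void poke_byte(Point* points, std::size_t byte_index, unsigned char value) {
    unsigned char* raw = reinterpret_cast<unsigned char*>(points);
    raw[byte_index] = value;
}
```

Nothing in poke_byte checks that byte_index lies inside the array; as with everything else in this section, that responsibility stays with the programmer.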

2.1.2 Files, compilation, and linking

The C language does not support programming abstractions above that of variables and functions. In particular, there is no idea
of a variable or function container, such as a class or module. Instead, C programmers use the convention of letting the file
system act as a kind of module boundary mechanism. If you have a set of variables and functions that you would like to place
together, you decide which ones should be local (visible only within the module) and which should be global (potentially visible
to any other module). You place the declarations for the global variables and functions in a header file, called something like
blarg.h, and you place the definitions of all of the variables and functions in an implementation file, called something like
blarg.c. Any other module that wishes to use anything declared in module blarg will include the header file blarg.h.
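The header/implementation convention just described can be sketched as below. To keep the example in one piece, both files are shown in a single listing with comments marking where each part would live. The names blarg_total, blarg_add, blarg_calls, call_count, and note_call are hypothetical. The text above does not name a mechanism for making something local; marking a file-scope declaration static is one common way to do it in C and C++, and that is what is assumed here.

```cpp
#include <cassert>

// ---- blarg.h: declarations of the global (exported) entities ----
#ifndef BLARG_H
#define BLARG_H
extern int blarg_total;        // declared here, defined in the implementation file
void blarg_add(int amount);    // amount must not be negative
int blarg_calls();             // how many times blarg_add has been called
#endif

// ---- blarg.c (or blarg.cc): definitions of everything in the module ----
// A real implementation file would start with: #include "blarg.h"
int blarg_total = 0;           // global: visible to any file that includes blarg.h

static int call_count = 0;     // local: "static" hides it from other files

static void note_call() {      // local helper function
    ++call_count;
}

void blarg_add(int amount) {
    assert(amount >= 0);       // precondition documented in the header
    note_call();
    blarg_total += amount;
}

int blarg_calls() {
    return call_count;
}
```

Another module that includes blarg.h can call blarg_add and read blarg_total, but it has no way to name call_count or note_call.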
C and C++ programs are typically compiled file-by-file and then linked together to form a single executable program. Each
implementation file (e.g., foo.c for C, foo.cc for C++) is compiled into an object file (e.g., foo.o) that is peculiar to the
operating system and hardware platform. For example, on MFCF Unix machines, “gcc -c foo.cc” results in the creation
of foo.o. Each implementation file is compiled by first including all desired .h files into the compilation; these tell the
compiler the “shape” of the entities (typically functions) that are used within this implementation file but defined elsewhere.
Effectively, the compiler leaves empty slots for these functions in the object files that will be resolved later, when linking is
performed.

Once all of the files have been compiled, a single executable file is created by merging the contents of the object files and
performing linking on the unresolved references. For example,

    gcc -o execMe foo.o blarg.o frobozz.o

results in the creation of an executable named execMe that contains the functions defined in foo, blarg, and frobozz, and
also resolves all external references (e.g., function f defined in foo calls function g defined in blarg). When you ask the
operating system to execute this new program execMe, the main function is called; there must be exactly one main function
defined somewhere in the system among foo.c, blarg.c, and frobozz.c.

4 This actually makes sense, but I’m not going to explain it here. Pick up a book on C if you’re interested.


2.2 The C++ programming language

The C++ programming language was developed in the 1980s, primarily by Bjarne Stroustrup, also at Bell Labs in Murray
Hill, NJ. Originally, C++ was implemented as a preprocessor that added some extensibility to C; later, it was turned into a full
language with its own compiler.

The major reason for the success of C++ is simple: it is a powerful and fully object-oriented programming language that is also
backward compatible with C. Almost every legal C program is also a legal C++ program. 5 This backward compatibility has
several important advantages for commercial software developers; most importantly, you can adapt older “legacy” C programs
by simply adding newer C++ portions to the existing system. 6

The attitude of C++ toward safety, efficiency, and complexity can be summarized as “pay as you go”. Most structures are
efficient by default, but also somewhat fragile and sometimes complicated to use. If, however, you are willing to sacrifice
some efficiency, then you can improve safety and simplicity (i.e., roughly speaking, you can write Java programs in C++).
For example, in Java arrays are perfectly safe, reasonably efficient for most uses, and even flexible in the sense that they are
very nearly fully fledged objects. Arrays in C++ are borrowed directly from C; they are fast but unchecked (i.e., if you say
“A[12] = 24;”, then the value 24 is inserted into the memory location where element 12 of A would have been, even if
there are only 10 elements in A). However, the Standard Template Library of C++ also includes a vector data structure, which
is similar to java.util.Vector in that it can be grown or shrunk as needed. If v is a vector, then you can access its
elements in one of two ways: like an array (e.g., “v[12]”), which is fast but unchecked, and through a method call (e.g.,
“v.at(12)”), which is slower but safe, as a run-time check is performed to ensure that there really is an element 12.
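The two ways of accessing a vector can be compared in a small sketch. The helper names try_get and get_unchecked are hypothetical and exist only for this illustration; std::vector, its at member, and std::out_of_range are part of the standard library.

```cpp
#include <cassert>
#include <cstddef>    // std::size_t
#include <stdexcept>  // std::out_of_range
#include <vector>

// Hypothetical helper: read an element through the checked accessor.
// Returns true and stores the value in `out` only if the index is valid.
bool try_get(const std::vector<int>& v, std::size_t index, int& out) {
    try {
        out = v.at(index);          // at() throws std::out_of_range on a bad index
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}

// Hypothetical helper: the unchecked form. The caller must guarantee that
// index < v.size(); otherwise the behaviour is undefined, as with a C array.
int get_unchecked(const std::vector<int>& v, std::size_t index) {
    assert(index < v.size());       // debug-only guard added for this sketch
    return v[index];
}
```

With a vector of 10 elements, try_get(v, 12, out) simply reports failure, whereas v[12] would quietly touch memory that does not belong to the vector.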
Java is often described as a cleaner, simpler C++. The other side of this coin is that C++ allows the programmer very great
control over issues such as efficiency. The cost is complexity: if there’s only one way to do something in Java, there are often
several ways of doing it in C++, each with its own set of tradeoffs (efficiency, expressiveness, flexibility, simplicity, safety, etc.).
When you have a very large program, this complexity can become overwhelming.

2.3 The Java programming language

Java is primarily the design of James Gosling, of Sun Microsystems. 7 According to Gosling, he started out trying to write small
embedded applications for various devices, such as VCRs and toasters, in a way that would be portable and reasonably efficient.
He tried C++ but was unhappy with the results, so he set about doing it “right”, at least according to his own programming
language aesthetics. The first version was introduced in 1995; it’s clear that Java has benefitted from a lot of hindsight with
respect to C++.

Superficially, Java looks and feels a lot like C++: the low-level syntax is very similar, and it’s an object-oriented language. The
significant differences include:

• Java removes some of the more controversial and complex features of C++ such as multiple inheritance, operator
overloading, and templates.
• Java code compiles to a platform-independent format called byte code, rather than to a machine-specific binary. The basic
idea is that byte code is close enough in detail level to most machine architectures that it is relatively straightforward to
create a byte code interpreter for any given hardware and operating system platform. This philosophy is sometimes
summarized as “write once, run everywhere”.8
5 If you want to know why I said “almost”, do some reading. For example, you could write a legal C program with variables named
class or protected; however, this would not be a legal C++ program as class and protected are reserved keywords.
6 C++ is not the only object-oriented version of C. The Objective-C language, which is used in the NeXT development environment,
was another contender. Objective-C is also built on top of C, but it never became as widely adopted as C++.
7 Gosling is originally from Calgary. Way back when, he was accepted into the undergraduate program at Waterloo, but at the last
minute he decided to do his bachelor’s degree back home. Something to do with skiing, he said.

8 And because this portability has often been imperfectly implemented, this slogan is sometimes rephrased as “write once, debug
everywhere”.


• Unlike Java, C++ classes and objects can be declared and created in two ways (we will discuss the details in class). In
C++, you can create objects on the heap and manage them by reference as in Java; alternatively, in C++ you can create
objects on the stack and treat them like instances of fundamental types.

• In Java, objects that are no longer needed are gathered up by the system using an approach called garbage collection. In
C++, it is the programmer who must decide when, where, and how each (heap-based) object is to be destroyed.9 That
is, in C++ programmers must perform their own storage deallocation for objects whereas in Java this task is mostly
performed by the system automatically. In C and C++, incorrect storage allocation is a major source of errors; the most
common problems are the accessing of an entity that’s been deleted, and a memory leak, where unneeded storage is not
returned to the system and your program starts to have problems allocating storage for new objects.

• Java is “fully” object oriented. In C++, functions and variables may “stand alone” (as in C) or may be part of a class. In
Java, all functions and variables must be members of a class.10

• In C++, you can define complex and interesting abstractions, and then you can define ways of “cheating” or otherwise
breaking them; the philosophy of Java is simply that you shouldn’t cheat and you shouldn’t break abstractions.

• In C++, you often have a choice between safety and efficiency. In Java, there is usually no choice: you get safety. What
this means in practice is that Java programs do a lot more checking at run time, which can slow down your execution
speed. Every time you access an array element or reference an object in Java, there is a run-time check to make sure
what you are asking for is reasonable. For compute-intensive programs, this can entail a significant overhead. This is
why large, efficient “back-end” types of applications are often still written in C or C++. However, for the vast majority
of applications being written today, it’s not clear that this overhead is unreasonable. “Is Java too slow” is a debate that
has been raging since the language was introduced; the answer seems to be that Java is just fine for a very large class of
programs.
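The object-lifetime differences in the list above (stack-based versus heap-based objects, and who is responsible for destroying them) can be sketched in C++ as follows. The class Tracker and the functions live_inside_scope, make_on_heap, and destroy are hypothetical names invented for this illustration; the class merely counts how many of its instances are alive so that the effect of each style is visible. The sketch uses nullptr and a deleted copy constructor, which come from later C++ standards than these notes.

```cpp
#include <cassert>

// Hypothetical class that counts how many of its instances currently exist.
class Tracker {
public:
    Tracker() { ++live_count; }
    ~Tracker() { --live_count; }
    Tracker(const Tracker&) = delete;      // no copies, to keep the count simple
    static int live() { return live_count; }
private:
    static int live_count;
};
int Tracker::live_count = 0;

// Stack-based object: created by its declaration and destroyed automatically
// at the end of the enclosing scope. This is also a standalone function,
// which Java would not allow outside a class.
int live_inside_scope() {
    Tracker t;
    return Tracker::live();                // t is still alive at this point
}

// Heap-based object: created with new and alive until someone deletes it.
Tracker* make_on_heap() {
    return new Tracker();
}

// The programmer decides when a heap-based object dies.
void destroy(Tracker* t) {
    assert(t != nullptr);
    delete t;                              // forgetting this call is a memory leak
}
```

Calling make_on_heap without a matching destroy leaves the count above zero for the rest of the program, which is the memory leak described above; using the pointer after destroy is the other common error, access to an entity that has been deleted.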

3 In summary

Boiling C, C++, and Java down to a few sentences, I think this might be a fair description:

• C is widespread in systems software. C is a “procedural” language: its basic elements are functions and variables.
It’s possible to write very efficient systems in C, but it doesn’t give you much help with higher-level programming
abstractions. The notation is arcane and hard to learn, and you have to program defensively. C is pretty “unsafe”; when
you make a mistake you rarely get a warning.

• C++ is becoming more widespread, partly because it is backwardly compatible with C. It is possible to write very efficient
systems in C++. It is possible to program “cleanly” in an object-oriented style but the language permits you to break most
of your abstractions when you feel the need. You usually have a choice between, say, efficiency versus clean abstraction.
This choice has made for a monstrously complicated language in which it is easy to write monstrously complicated
systems.

• Java is similar to C++ in many ways, but it is simpler, cleaner, and possibly less efficient. Its use of a virtual machine
and byte code gives portability almost for free (someone has to write a reasonable VM for your hardware and operating
system and you’re done). Java is in widespread use in web-enabled applications and web applets; it’s not clear that it will
be as useful in computationally intense “back-end” kinds of systems.

9 Stack-based objects in C++ are automatically destroyed at the end of their defining scope, as you might expect.
10 Standalone functions and variables can be faked in Java by grouping them in a class and declaring them as static, which
basically means “there’s only one of these things”.


A question that students often ask is “which is the best programming language?” The correct answer is always “it depends”.
Each language has its own set of trade-offs: efficiency, simplicity, cost and availability of supporting tools, etc. In commercial
settings, the question is asked only rarely: most software developers work on systems that have been around for a long time, and
they use whatever language the system was written in. Even older “legacy” languages such as COBOL and FORTRAN, which
no one would suggest as examples of elegant programming language design, have their uses: there is a large amount of human
expertise, legacy applications and libraries, and supporting infrastructure for them.

In universities, you sometimes hear people say “if only industry used language XXX, they would solve so many of their
problems”.11 Well, there is no doubt that over time we have made great progress in programming language design, and there
is some merit to this view. However, it is mostly impractical advice for a number of reasons. First, “academic” languages
are often weakly supported by tools; commercial software development requires fast, efficient, robust, and industrial-strength
compilers, profilers, debuggers, etc. Second, many of the problems that plague commercial software development would not
be much aided by a new programming language. At one end of the spectrum, the whiz-bang Silicon Valley dot-com world is
troubled by constant change, both in technology and personnel. The big problem is in getting something reasonable out the door
quickly. It doesn’t matter much what language you use; that tends to be dictated by your application area and other technical
needs. At the other end, the more staid business of corporate IT is dominated by large legacy systems that no one understands
well, older employees with skill sets that are unlikely to change much, and stultifying corporate infrastructure that is largely
resistant to fast change.12 That is to say, no matter where you are, you have more difficult problems than “what programming
language should we use”.

Finally, there is what I call the Technological Passive Peter Principle. The Peter Principle, I hope you know, states that within
a business organization, people rise to their respective levels of “natural incompetence”. That is, all the while you are doing a
good job, you keep getting promoted. You only stop your rise when you reach a job that you’re not very good at. That’s your
level of natural incompetence. Godfrey’s Technological Passive Peter Principle states that all useful technologies are used up to
(and sometimes just beyond) their level of natural incompetence. That is to say, we are always on the bleeding edge, writing the
most complicated software systems that we can. So when we come up with better tools and abstractions, such as object-oriented
programming, we build the biggest systems we can until we exhaust our intellectual abilities to manage and evolve them. That
is to say whenever we get a new programming language and accompanying tools, we will find a way to create programs that,
while they are bigger and more complex than last year’s unmaintainable monsters, are nonetheless unmaintainable monsters
themselves. Food for thought.

11 Common values of XXX include Java, ML, Lisp, Modula, and Eiffel.
12 OK, I’m exaggerating somewhat.
