
Relational Theory

Database Technology [DBTECO601]


Thomas Devine

http://noucamp

thomas.devine@lyit.ie

November 15, 2007

Contents
1 Introduction
2 The Structure Of Relational Models
2.1 The structure of relations
2.1.1 Properties of relations
2.1.2 Relational terminology
2.1.3 Null
2.1.4 The University relations
2.2 Domains
2.3 Developing a relational model
2.3.1 Declaring relations
2.3.2 The University relational model
2.4 Candidate keys, primary keys and alternate keys
2.4.1 Candidate keys
2.4.2 Primary keys
2.4.3 Alternate keys
2.5 Representing relationships
2.5.1 Qualified names and the dot notation
2.5.2 Representing 1:n relationships using foreign keys
2.6 Representing E-R models
2.6.1 The Hospital relational model
2.6.2 Representing 1:1 relationships
2.6.3 Recursive relationships
2.7 Summary
3 Manipulating Relations
3.1 The select and project operators
3.1.1 The select operator
3.1.2 The project operator
3.1.3 Combining expressions
3.2 The join and divide operators
3.2.1 The join operator
3.2.2 Using join with other operators
3.2.3 The divide operator
4 Updating
5 Practical Manipulation Using RAS
5.1 An overview of RAS
5.1.1 Using the join operator
6 Constraints
6.1 Key constraints
6.1.1 Primary keys
6.1.2 Alternate keys
6.1.3 Foreign keys and referential integrity
6.2 Attribute constraints
6.2.1 Allowing Null

List of Figures
1 The Student relation depicted as a table
2 The Enrolment relation
3 A table that cannot be called a relation
4 The Student relation in a different order
5 Relational terminology
6 Venn diagram for Student relation
7 University Conceptual Data Model
8 Hospital E-R Diagram
9 An extension of the Nurse relation
10 Students registered after 1996
11 Students registered after 1996 excluding student s09
12 Projecting student names
13 Projecting counsellor numbers
14 Course titles and credit values
15 A join of Student and Staff
16 The Studies relation
17 The NewCourses relation
18 Students studying new courses
19 University E-R Model

1 Introduction
Relational theory is concerned with what is required rather than how it can be achieved,
providing an understanding of DBMSs that is independent of implementation concerns.
Because relational theory is independent of implementation concerns, it can also be used to
represent conceptual data models equivalent to the E-R models that you have met earlier. In
this section we shall be exploring the use of relational theory for this purpose. We shall use
the terminology of models rather than the terminology of a DBMS because we do not want
other DBMS issues to intrude. Bearing in mind that we intend our discussion to be at the
conceptual level, we shall be using terms like database to mean a collection of data without
any implication of an implementation using a computer system.
Given that this block is about relational theory, its main aims are as follows:-
• Define and explain the components of relational theory: structure, manipulation and
constraints.
• Explain, in some detail, relational theory in terms of these components.
• Provide practical computing experience in using the manipulative part of relational theory.
This is done by means of a relational algebra system in the practical activities.
In the remainder of this section, we shall explain and define the three components which
make up relational theory. Relational theory is a prescription for the representation and
manipulation of data. We shall, therefore, be concerned with the data structures that relational
theory supports, for it is out of these structures that any relational model must be
constructed. The term relational model refers to a conceptual model produced by applying
relational theory. We shall also be concerned with the types of constraint that relational
theory supports. A constraint is a restriction on the allowable operations so that a database
is always consistent and contains no data that are invalid representations of the real world.
To summarise, relational theory will consist of three components:-
• a structural part;
• a manipulative part;
• a constraints part.
The structural part defines the data structures on which the theory is based, the
manipulative part the operators which can act on the data in those structures, and the constraints
part the constraints that may be applied to instances of those structures.
The structural part of relational theory is based on a number of fundamental constructs
from which the data structures of a relational model can be built. These constructs are
relations (and their attributes) and domains. Any relational model will contain named
examples of these constructs. That is, it will contain named relations, attributes and
domains, appropriate to the needs of the organisation in question. You have, in fact,
already encountered practical applications of some of these relational constructs in the
activities in earlier sections, in which you were shown tables (depicting relations), and
columns (representing attributes).
The manipulative part of relational theory is concerned with the way in which a set of
relations may be manipulated to produce other relations. The result of any manipulation
is always itself a relation. The purpose of the manipulation can be thought of as
providing the means for describing the insertion, deletion, amendment and retrieval of
data. We shall be studying manipulation of relations using the set of relational operators
known as the relational algebra. We shall study the common operations that produce a
desired relation from one or more other relations. One such simple operation is to produce
a relation that contains only some of the attributes of a given source relation.
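As a concrete illustration of that last operation, here is a minimal Python sketch. It assumes a relation is modelled as a Python set of tuples, each tuple being a frozenset of (attribute, value) pairs, so that neither row order nor column order is recorded. The helper names make_relation and project are hypothetical and are not the notation of the relational algebra system used later; the two Student tuples are the ones quoted in Section 2.

```python
# Minimal sketch: a relation as a set of tuples, each tuple a frozenset of
# (attribute name, value) pairs. Helper names are hypothetical.

def make_relation(heading, rows):
    """Build a relation (a set of tuples) from a heading and rows of values."""
    return {frozenset(zip(heading, row)) for row in rows}


def project(relation, attributes):
    """Keep only the named attributes; the result is again a relation."""
    wanted = set(attributes)
    # Building a set means tuples that become identical collapse into one.
    return {frozenset((a, v) for a, v in t if a in wanted) for t in relation}


student = make_relation(
    ("StudentId", "Name", "Registered", "CounsellorNo", "Region"),
    [("s02", "Thompson", 1998, 5212, 4),
     ("s10", "Urbach", 1997, 5212, 4)],
)
counsellors = project(student, ["CounsellorNo", "Region"])
print(len(counsellors))  # 1: both students have counsellor 5212 in region 4
```

Because the result is built as a set, the two source tuples collapse into a single tuple once only CounsellorNo and Region are kept, so the result is itself a relation with no duplicate tuples.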
The constraints part of relational theory consists of the features that constrain a database
to contain only valid combinations of data. These features are termed constraint
definitions and form part of a relational model.

2 The Structure Of Relational Models


The structural part of relational theory is based on three constructs: relation, attribute and
domain. We begin this account of the structural part by giving some informal definitions of
these constructs, accompanied by examples, in order to build up to the more formal definitions
that we shall give later. We shall discuss the fundamental construct relation and its attributes
together, in subsection 2.1, before introducing the domain construct in subsection 2.2.

2.1 The structure of relations

A relation can be pictured as a form of table, with attributes being named columns of the
table. Hence we speak of a relation as being composed of its attributes. It can be used to
represent a set of occurrences of an entity type, with each row representing one occurrence.
We give as our first example of a named relation the Student relation. Figure 1 depicts
the Student relation as a table, with an example collection of rows. The table has five
columns, labelled StudentId, Name, Registered, CounsellorNo and Region. That is to say,
the table depicts the Student relation as being composed of five named attributes, each being
an example of the attribute construct of relational theory. Strictly, we have to say that this
table is only a depiction of the relation. Incidentally, here the term attribute is being used as
a construct in a relation, whereas previously you have met it as an attribute of an entity type.
In Figure 1, each row of the table corresponds to a Student entity: an occurrence of the
entity type Student in the University E-R model, as introduced earlier. Each column of the
table (that is, each attribute of the relation) represents the values of a particular attribute
of the Student entity type. The column headings for the table (that is, the names of the
attributes of the relation) correspond to the attributes that comprise the Student entity type.

Figure 1: The Student relation depicted as a table

We may write down the individual rows of a relation using an angled-bracket notation.
Thus, the row <s02, Thompson, 1998, 5212, 4> corresponds to one occurrence of the
Student entity type: namely, the occurrence for which the value of the identifier StudentId
is s02. Within this row, the value Thompson (that is, a value of the Name attribute in the
relation) corresponds to the value of the Name attribute for that entity. The other entries
in the row similarly correspond to values of the other attributes of that entity.
Tables, such as that in Figure 1, are only a convenient depiction of a relation. In particular,
the order of the rows and columns, as inevitably shown in a table, has no significance to a
relation.
You will have noticed that we have chosen the same name for the relation as for the entity
type, and the same names for the columns (that is, for the attributes of the relation) as for
the attributes of the entity type. This is a deliberate choice on our part to reinforce the
correspondence between entity types and relations. However, you should realise that this is
just our choice: the names do not have to be the same. When the names are the same, it is
important to remember that, for example, the Student relation is different from the Student
entity type. A similar distinction holds between the names of the attributes of an entity type
and the names of the attributes of a relation.
In order to reinforce this distinction we have printed E-R model names in a different style
from relational names. For example, by Student (in the E-R style) we intend a reference to an
entity type, while by Student (in the relational style) we intend a reference to a relation. In your own work you will need to make a
similar distinction, usually by including either the phrase 'entity type' or the word 'relation'
as appropriate.

2.1.1 Properties of relations


As you have seen, a relation is composed of attributes and may be depicted as a form of
table. It is not, however, just any kind of table. Specifically, it is a table that adheres to a
set of rules, as follows.
1. Each value in the table is atomic: that is to say, for each row, the value within a column
is always one value and never a group of values. For example, the row <s10, Urbach,
1997, 5212, 4> is made up of just one StudentId attribute value, one Name value, one
Registered value, one CounsellorNo value and one Region value.
2. The values within a column (that is, the values of an attribute) are all of the same
kind. For example, the values for the Registered column (that is, for the Registered
attribute) are four-digit numbers, each of which represents a year.
3. Each column of the table has a name, different from any other in the table, by which
it may be identified, e.g. Region.
4. Each row is unique, meaning that it is different in some respect from every other row.
5. The ordering of rows and columns is not significant. For example, the rows need not
have been printed in ascending order of StudentId, nor need StudentId have been the
first column.
Student is a valid relation according to these rules, as is a second example of a named
relation, the Enrolment relation (which corresponds to the Enrolment entity type of the
University E-R model). Figure 2 depicts the Enrolment relation as a table, with an example
collection of rows.
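Rules 1 to 4 can be checked mechanically against a table. The sketch below is a hypothetical Python helper, not part of any DBMS. It approximates rule 2 by requiring every value in a column to have the same Python type (a real domain carries more meaning than a type, as subsection 2.2 explains) and does not test rule 5, which concerns how a table is interpreted rather than what it contains. The Student rows are the two quoted in this section; the multivalued rows are illustrative, with only 'c2, c7' taken from the discussion of Rule 1 below.

```python
def check_table(heading, rows):
    """Report which of rules 1-4 a table breaks (hypothetical helper)."""
    problems = []
    # Rule 1: every entry is a single value, not a group of values.
    for row in rows:
        for name, value in zip(heading, row):
            if isinstance(value, (list, tuple, set, frozenset)):
                problems.append(f"rule 1: {name} value {value!r} is not atomic")
    # Rule 2 (approximated): all values in a column have the same Python type.
    for i, name in enumerate(heading):
        if len({type(row[i]) for row in rows}) > 1:
            problems.append(f"rule 2: column {name} mixes kinds of value")
    # Rule 3: column names are distinct.
    if len(set(heading)) != len(heading):
        problems.append("rule 3: duplicate column names")
    # Rule 4: no two rows are identical (pairwise test avoids hashing lists).
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if rows[i] == rows[j]:
                problems.append(f"rule 4: rows {i} and {j} are identical")
    return problems


student_heading = ("StudentId", "Name", "Registered", "CounsellorNo", "Region")
student_rows = [("s02", "Thompson", 1998, 5212, 4),
                ("s10", "Urbach", 1997, 5212, 4)]
print(check_table(student_heading, student_rows))  # []

# Illustrative multivalued rows; only "c2, c7" comes from the text.
figure3_rows = [("s01", ["c2", "c7"]), ("s02", ["c4"])]
for problem in check_table(("StudentId", "CourseCodes"), figure3_rows):
    print(problem)
```

The Student rows pass every check, while each multivalued CourseCodes entry is reported as a breach of rule 1.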

Figure 2: The Enrolment relation

We now explore some of the implications of these rules. Later, when we have given a
formal definition of a relation, you will be in a position to appreciate why these rules must
hold.
Rule 1 The first rule (atomic entries) prevents certain tables from being regarded as the
depictions of relations. As an example of a table which contradicts this first rule,
consider the table shown in Figure 3.
The table in Figure 3 consists of two columns. The first, StudentId, is straightforward
and its values conform to the rules. The second column, CourseCodes, is a column
in which the entry in any given row is sometimes multivalued, for example 'c2, c7'.
Thus the table in Figure 3 does not conform to the first rule because the entries in
the CourseCodes column are not all atomic. It is therefore not a depiction of a valid
relation.

Figure 3: A table that cannot be called a relation

Rule 2 The tables in Figures 1 and 2 also obey the second rule. The values within a column
are all of the same kind: they are homogeneous. All the values within a column are
of the same data type. More precisely, we say that all the values within a column are
drawn from the same domain.
Rule 3 The third rule, which relates to unique column names, means that within a relation
a column may be referred to uniquely by its name. All the tables that we have given so
far obey the third rule. Some of the consequences of this third rule will become clearer
when we consider the fifth rule.
Rule 4 The fourth rule means that we shall always be able to distinguish one row from another
and so be able to define at least one attribute or combination of attributes that will
identify a row uniquely. In other words, no two rows in the relation can have the same
values for this combination of attributes. We call such an attribute or combination of
attributes a key. A key identifies a row uniquely.
In particular, we shall be able to choose one of these keys (if there is more than one) to
be the primary key. Primary keys (of relations) correspond to identifiers (of entities).
The fourth rule requires that such a primary key always exists, if the table is to be the
depiction of a relation.
When we are choosing a primary key from several candidate keys, we are looking for a key
that is a single attribute or the smallest combination of attributes that identifies rows
uniquely. Later we shall provide a fuller definition of a primary key that will formalise
this.
It is possible to construct relations in which all the attributes of the relation are needed
to form the primary key, but in practice a subset of the attributes usually
proves to be sufficient and is often a single attribute. For example, the primary key of
the Student relation is the StudentId attribute.
Where the primary key of a relation is some combination of attributes, we shall enclose
the component attributes in brackets. Thus the primary key of Enrolment will be written
(StudentId, CourseCode).
Rule 5 The fifth, and last, rule states that the ordering of rows and columns in a relation
is not significant. This means that the table depicted in Figure 4 represents the same
relation (Student) as that in Figure 1, even though they may be considered to be
different tables. The rule means that any reference to the ordering of rows and columns
in a relation has no meaning.

Figure 4: The Student relation in a different order


One result of the fifth rule is that you can only refer to a column by means of its name,
not by means of its position. For example, a reference to the third column in Student
has no meaning: it could be the CounsellorNo or the Registered column, depending
on whether the relation is represented by Figure 4 or Figure 1, or it could be some
other column, depending on some other table representing the relation. (The third
rule, different column names within a relation, can be seen as necessary for this fifth
rule: we must be able to refer to columns uniquely, within a relation, by means of their
names alone.)
Similarly, you cannot refer to a row by means of its position. For example, a reference
to the second row of Student is undefined. A row can only be identified by the values
it contains. In particular, a row may be identified uniquely by means of the value of the
primary key for that row. (Hence the need for distinct rows.)
In summary, a relation is an abstract structure whereas a table is a depiction of such a
structure, with certain features (such as a physical ordering of columns and rows) that are
merely properties of the depiction rather than of the abstraction.
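The distinction between a table and the relation it depicts can be made concrete in Python. In this minimal sketch each tuple is assumed to be a frozenset of (attribute, value) pairs and a relation a frozenset of tuples, so two tables that differ only in the order of their rows and columns yield equal relations. The helper as_relation is hypothetical; the data are the two Student tuples quoted earlier, with two columns swapped and the rows reversed in the second table.

```python
def as_relation(heading, rows):
    """Turn an ordered table into an order-free set of tuples (hypothetical helper)."""
    # Each tuple pairs values with attribute names, so column position is lost;
    # the outer frozenset discards row order.
    return frozenset(frozenset(zip(heading, row)) for row in rows)


table_1 = as_relation(
    ("StudentId", "Name", "Registered", "CounsellorNo", "Region"),
    [("s02", "Thompson", 1998, 5212, 4),
     ("s10", "Urbach", 1997, 5212, 4)],
)
# The same data with two columns swapped and the rows listed in reverse order.
table_2 = as_relation(
    ("StudentId", "Name", "CounsellorNo", "Registered", "Region"),
    [("s10", "Urbach", 5212, 1997, 4),
     ("s02", "Thompson", 5212, 1998, 4)],
)
print(table_1 == table_2)  # True

# Attributes are reached by name and rows by their values, never by position.
thompson = next(t for t in table_1 if ("StudentId", "s02") in t)
print(dict(thompson)["Registered"])  # 1998
```

Since the relation keeps no positions, the only way to reach a row is through its values, here the primary key value s02, exactly as rule 5 requires.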
Although we have stressed the distinction between relations and tables, we shall
nevertheless often refer to relations as tables when it is simpler to do so.

2.1.2 Relational terminology


Some specialised terminology is used for relations so that we do not have to use words like row
and column which are more properly the terminology of tables. This terminology is illustrated
in Figure 5 for the Student relation.
The number of attributes in a relation is known as the degree of the relation (not to be
confused with the degree of a relationship in an E-R model). Hence, the degree of Student
is 5. Rows are known as tuples. Thus one tuple in Student is <s10, Urbach, 1997, 5212,
4>. The number of tuples in a relation is known as the cardinality of the relation. Student,
as depicted in Figure 5, has a cardinality of six. The Student relation, represented by Figure
5, consists of the attributes (that is, the column names) and the six tuples.

Figure 5: Relational terminology

Figure 6 depicts the Student relation as a Venn diagram. Here, we represent a relation
as a set of tuples bounded by a circle. Venn diagrams have the advantage that they depict
relations as a set of tuples that are in no particular order.

Figure 6: Venn diagram for Student relation


The tuples of a relation at any given instant are known as the extension (or body) of
the relation. The extension, therefore, varies with time, changing as tuples are inserted,
deleted and amended. The attributes (that is, the column names) comprise the heading
of a relation. Unlike its extension, the heading of a relation does not vary with time. If a
further attribute were added to the Student relation, say, the attribute DateOfBirth, then
the resulting relation would be considered as constituting a new, different relation.
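Heading, extension, degree and cardinality can be sketched as a small Python class. This is an illustrative sketch under stated assumptions: the class name Relation and its insert method are hypothetical, and each tuple is stored as a frozenset of (attribute, value) pairs. Only two of the six tuples in Figure 5 are quoted in the text, so the example's cardinality is 2 rather than 6.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    """Hypothetical relation: a fixed heading plus a time-varying extension."""
    heading: tuple                            # attribute names; never changes
    body: set = field(default_factory=set)    # the extension; changes over time

    def insert(self, *values):
        """Add a tuple given its values in heading order."""
        # Pairing values with names means the stored tuple has no column order.
        self.body.add(frozenset(zip(self.heading, values)))

    @property
    def degree(self):
        return len(self.heading)        # number of attributes

    @property
    def cardinality(self):
        return len(self.body)           # number of tuples in the extension


student = Relation(("StudentId", "Name", "Registered", "CounsellorNo", "Region"))
student.insert("s02", "Thompson", 1998, 5212, 4)
student.insert("s10", "Urbach", 1997, 5212, 4)
print(student.degree, student.cardinality)  # 5 2
```

Inserting a tuple changes the extension and hence the cardinality, while the heading, and so the degree, stays fixed; adding an attribute such as DateOfBirth would mean constructing a different Relation object.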
By convention, the heading of a relation is written in a form very similar to that employed
for entity type headings:-

Relation-name(Attribute-1, Attribute-2, ..., Attribute-n)

Remember that we are using a different style of printing from that employed with entity
types, in order to emphasise that it is a relation that is being written.
When writing down the heading of a relation, it is frequently convenient to indicate
the primary key of the relation. This is done by underlining the appropriate attribute(s). By
convention, the primary key is placed first. Thus we write the heading of the Student relation
as:
Student(StudentId, Name, Registered, CounsellorNo, Region)

2.1.3 Null
So far we have assumed that every attribute in the tuple must have a valid value for the tuple
to be acceptable. However, in practice we may want to record some of the details about an
entity in a tuple while not yet having all the information available to complete it. Another
reason for not having values available for all the attributes is that sometimes an attribute
may not be applicable to all the tuples in the relation. In either of these cases it is legitimate
to replace the value of the attribute by a marker indicating the absence of a value. This
marker is known as null. Note that null is a marker, not a value of the attribute, although
sometimes it is loosely referred to as a null value.

2.1.4 The University relations


We have already seen the complete set of relations that correspond to the entity types of the
University E-R model in a previous printed attachment. You will need to refer to this set of
relations several times during your study.

2.2 Domains

The previous subsection introduced the constructs relation and attribute of the relational
model, with the intuitive understanding that they can be used to represent entity types and
their attributes from E-R models. In this subsection we shall introduce the domain construct.
Domains are important in terms of data typing, as mentioned in Rule 2 of the previous
subsection. We shall emphasise the way in which domains are semantic features.
A domain may be defined as follows:-
A domain is a named set of values, with a common meaning, from which one or more
attributes draw their actual values.
Domains (of relations) correspond to value sets (of entity attributes). The most important
consequence of two attributes being defined on the same domain is that their attribute
values are comparable. Domains are similar to user-defined data types in many programming
languages but we emphasise their semantic role. In particular, attributes are not comparable
if they have been declared to be in different domains, even if the underlying data type of both
domains is the same.
Just as in many programming languages a user-defined data type must be named and its
base data type (integer, real, Boolean, string, etc.) given, so the same must be done for
domains. This is done below, for the domains needed for the University relations.

domains
  IdentifiersOfStudents = s01 .. s99
  PersonNames = string
  Years = Yearnumber
  StaffNumbers = 1000 .. 9999
  Regions = 1 .. 9
  CodesOfCourses = c1 .. c9
  TitlesOfCourses = string
  CreditValues = (30, 60)
  AssignmentNumbers = 1 .. 5
  Grades = 0 .. 100

The notation used above is not the notation used in any particular relational DBMS.
Remember that in this block we are examining relational theory, so any understandable notation
would do.
The notation used in the domains declaration has the following features. The domain is
defined as a range of values where this is appropriate. Hence:-
IdentifiersOfStudents = s01 .. s99

indicates that any attribute defined on the domain IdentifiersOfStudents may only take
values in the range from s01 to s99, inclusive. The double dots mean that all the intermediate
values are included.
Where the range of permissible values is so small that all the values may be enumerated,
this is done, as in:-
CreditValues = (30, 60)

which states that any attribute defined on the domain CreditValues may only take the
values 30 or 60.
Where neither a range nor an enumeration can be given, the appropriate base data type
is specified. For example:-
PersonNames = string

This means any sequence of characters. In the case of years, you will notice that we have
assumed that an underlying base type Yearnumber is available.
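The three styles of domain definition (range, enumeration and base type) translate naturally into membership tests. The Python functions below are a hypothetical sketch of four of the University domains; in particular, the assumption that a student identifier is always the letter s followed by exactly two digits is ours, not part of the declaration. The test values s100 and 14 are the ones used later in this section as examples of values outside their domains.

```python
import re


def in_identifiers_of_students(value):
    """IdentifiersOfStudents = s01 .. s99 (a range, inclusive)."""
    # Assumption: an identifier is 's' followed by exactly two digits.
    match = re.fullmatch(r"s([0-9]{2})", value)
    return match is not None and 1 <= int(match.group(1)) <= 99


def in_regions(value):
    """Regions = 1 .. 9 (a range, inclusive)."""
    return isinstance(value, int) and 1 <= value <= 9


def in_credit_values(value):
    """CreditValues = (30, 60) (an enumeration)."""
    return value in (30, 60)


def in_person_names(value):
    """PersonNames = string (a base type only)."""
    return isinstance(value, str)


print(in_identifiers_of_students("s99"), in_identifiers_of_students("s100"))  # True False
print(in_regions(14), in_credit_values(45))                                   # False False
```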

2.3 Developing a relational model

We are now in a position to bring together all the ideas we have been discussing regarding
relations, attributes and domains in order to construct relational models (that is, models
defined in terms of relational theory). A relational model consists simply of the definitions
of the domains and relations that make up the model. It is written using a more detailed
notation than that employed for the headings of relations, giving domains first and declaring
the primary key of a relation separately from the declaration of the attribute(s) that comprise
the primary key.
We have already developed a set of suitable domains for the University relational model,
so we shall now proceed to declare in a similar way the relations required.

2.3.1 Declaring relations


Given the declarations of domains, we can declare, in a model of the University, the composition
of the relations and the types of the attributes. The relation declarations below show some
typical relations in the University model. (Once again, the syntax that we use corresponds to
no particular language.)

relation Student
  StudentId: IdentifiersOfStudents
  Name: PersonNames
  Registered: Years
  CounsellorNo: StaffNumbers
  Region: Regions
  primary key StudentId
relation Course
  CourseCode: CodesOfCourses
  Title: TitlesOfCourses
  Credit: CreditValues
  primary key CourseCode
relation Enrolment
  StudentId: IdentifiersOfStudents
  CourseCode: CodesOfCourses
  TutorNo: StaffNumbers
  primary key (StudentId, CourseCode)
The various domains capture some of the meaning of the data involved. In particular,
they proscribe certain comparisons and assignments as being invalid. For example, in a given
tuple in the Student relation it is valid to assign a value of s99 to the StudentId attribute
but not the value s100, because this value is not within the declared domain.
Similarly, it is legitimate to compare values between attributes in different relations only
if those attributes have been declared to have the same domain. For example, it is legitimate
to ask if the attribute StudentId in a particular tuple of Student is equal to the value of
the attribute StudentId in a tuple of Enrolment. This comparison is valid not because the
attributes happen to have the same names but because they have been declared to be in the
same domain.
It would not, however, be legitimate to ask whether the value of AssignmentNo
in a tuple of Assignment was equal to the value of Region in a Student tuple, even though
the underlying data type might actually be an integer in both relations.
Thus domains express some of the meaning of the data by limiting the allowable
comparisons and operations.
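The rule that only attributes on the same domain may be compared can be sketched by tagging each value with its domain name. The DomainValue class and same_value function below are hypothetical Python illustrations, not a DBMS feature; raising TypeError is simply one way of signalling an illegitimate comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainValue:
    """A value tagged with the (hypothetical) name of its domain."""
    domain: str
    value: object


def same_value(a, b):
    """Compare two attribute values, allowed only within a single domain."""
    if a.domain != b.domain:
        raise TypeError(f"cannot compare {a.domain} with {b.domain}")
    return a.value == b.value


# StudentId in Student and StudentId in Enrolment share a domain: comparable.
print(same_value(DomainValue("IdentifiersOfStudents", "s10"),
                 DomainValue("IdentifiersOfStudents", "s10")))  # True

# AssignmentNo and Region are both integers underneath, but are not comparable.
try:
    same_value(DomainValue("AssignmentNumbers", 4), DomainValue("Regions", 4))
except TypeError as err:
    print(err)  # cannot compare AssignmentNumbers with Regions
```

The check looks only at the domain names, not at the underlying Python type, which mirrors the point that domains are semantic rather than merely data types.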

2.3.2 The University relational model


Below is a first version of a relational model for the University example:-

model University
domains
  IdentifiersOfStudents = s01 .. s99
  PersonNames = string
  Years = Yearnumber
  StaffNumbers = 1000 .. 9999
  Regions = 1 .. 9
  CodesOfCourses = c1 .. c9
  TitlesOfCourses = string
  CreditValues = (30, 60)
  AssignmentNumbers = 1 .. 5
  Grades = 0 .. 100

relation Student
  StudentId: IdentifiersOfStudents
  Name: PersonNames
  Registered: Years
  CounsellorNo: StaffNumbers
  Region: Regions
  primary key StudentId
relation Course
  CourseCode: CodesOfCourses
  Title: TitlesOfCourses
  Credit: CreditValues
  primary key CourseCode
relation Enrolment
  StudentId: IdentifiersOfStudents
  CourseCode: CodesOfCourses
  TutorNo: StaffNumbers
  primary key (StudentId, CourseCode)
relation Staff
  StaffNo: StaffNumbers
  Name: PersonNames
  Region: Regions
  primary key StaffNo
relation Assignment
  StudentId: IdentifiersOfStudents
  CourseCode: CodesOfCourses
  AssignmentNo: AssignmentNumbers
  Grade: Grades
  primary key (StudentId, CourseCode, AssignmentNo)
We need to note two specific issues raised by the example of the University relational
model.
First, as you will have noticed, as well as defining each relation by means of a name, and
giving the attributes that comprise the relation and the domains on which those attributes
are defined, the relational model defines the primary key of each relation. For example:-
primary key (StudentId, CourseCode, AssignmentNo)
occurs as part of the definition of Assignment and so declares the primary key of
Assignment to be (StudentId, CourseCode, AssignmentNo). The above declaration is interpreted as
being that the primary key is StudentId combined with CourseCode and with AssignmentNo,
rather than that the primary key is StudentId or CourseCode or AssignmentNo. Each relation
has exactly one primary key.
Second, the relational model has expressed some of the meaning of the data by means
of constraints (that is to say, by its semantic features). For instance, an attempt to insert
the tuple <s09, Shannon, 1998, 5212, 4> into the Student relation will be invalid according
to the University model, since it violates the constraint of the primary key assertion: there
already exists a tuple whose StudentId value is s09. Similarly, an attempt to insert the tuple
<s11, Mercier, 1996, 3158, 14> is invalid, since the Region value, 14, is not a permissible
value for an attribute defined on the domain Regions. Similar rejections would occur should
amendments to existing tuples violate the semantics of the University relational model.
It is particularly important to note that these constraints are recorded in a single descrip-
tion: the relational model.
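The two rejected insertions just described can be reproduced with a small validator that checks each value against its domain and then checks the primary key. This is a Python sketch under assumptions: the validator, its names and its list-of-dicts extension are hypothetical, Registered is checked only as an integer because Yearnumber is not defined further, and the existing s09 tuple uses placeholder values because the text only tells us that a tuple with StudentId s09 is already present.

```python
import re

# Hypothetical domain checks for the Student relation of the University model.
DOMAIN_CHECKS = {
    "StudentId": lambda v: re.fullmatch(r"s(0[1-9]|[1-9][0-9])", v) is not None,  # s01 .. s99
    "Name": lambda v: isinstance(v, str),                                   # PersonNames
    "Registered": lambda v: isinstance(v, int),                             # Years (assumed int)
    "CounsellorNo": lambda v: isinstance(v, int) and 1000 <= v <= 9999,     # StaffNumbers
    "Region": lambda v: isinstance(v, int) and 1 <= v <= 9,                 # Regions
}
HEADING = ("StudentId", "Name", "Registered", "CounsellorNo", "Region")
PRIMARY_KEY = "StudentId"


def insert(extension, values):
    """Add a tuple to the extension, or raise ValueError if it breaks a constraint."""
    row = dict(zip(HEADING, values))
    for attr, in_domain in DOMAIN_CHECKS.items():
        if not in_domain(row[attr]):
            raise ValueError(f"{attr} value {row[attr]!r} is outside its domain")
    if any(t[PRIMARY_KEY] == row[PRIMARY_KEY] for t in extension):
        raise ValueError(f"primary key {row[PRIMARY_KEY]!r} already exists")
    extension.append(row)


student = []
insert(student, ("s09", "Placeholder", 1997, 5212, 4))  # placeholder values except s09

for attempt in [("s09", "Shannon", 1998, 5212, 4),
                ("s11", "Mercier", 1996, 3158, 14)]:
    try:
        insert(student, attempt)
    except ValueError as err:
        print(err)
```

Running it reports the duplicate primary key s09 and the out-of-domain Region value 14, and leaves the extension unchanged; all the constraints live in one place, echoing the point that they are recorded in a single description.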

2.4 Candidate keys, primary keys and alternate keys

Recall the informal definition of a primary key of a relation as an attribute, or combination of
attributes, such that no two rows (i.e. no two tuples) in the relation can have the same value
for the attribute(s) comprising the primary key. In this section we shall explore the concept
of primary (and other) keys in more detail.

2.4.1 Candidate keys


The distinctness of tuples implies that, for some attribute, or combination of attributes,
within a relation, no two tuples will have the same value for that attribute (or attributes).
We term such an attribute (or combination of attributes), K, a candidate key if and only if
it has the properties of uniqueness and minimality:-
uniqueness: at any given time, no two tuples of the relation have the same value for K.
minimality: if K is a combination of attributes, no attribute may be discarded from K
without destroying the uniqueness.
Every relation has at least one candidate key, by virtue of the distinctness of tuples. Note
that once we declare that a certain combination of attributes is a candidate key, this is
a constraint on what the relation may contain. Such a constraint determines what is allowed
and what is not allowed in an extension; it does not just reflect the current extension.
As an example to illustrate the definition of a candidate key, we shall consider the Enrolment
relation. The combination of attributes (StudentId, CourseCode) is a candidate key
of Enrolment since no two tuples could have the same combined (StudentId, CourseCode)
value (so it has the uniqueness property) and because if we discard either StudentId or
CourseCode from this combination of attributes then the uniqueness property is lost (so it has
the minimality property). The combination of attributes (StudentId, CourseCode, TutorNo)
is not a candidate key of Enrolment because, although it has the uniqueness property, it does
not have the minimality property. We may discard an attribute, TutorNo, yet still retain the
uniqueness property.
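The two properties can be tested against an extension. This is a minimal sketch, assuming a relation is held as a Python set of tuples; the helper names are hypothetical. Because a candidate key is a declared constraint rather than an observation, an extension can only show that a combination of attributes is *consistent* with being a candidate key (or that it is not); it can never prove that it is one. The three s09 tuples come from the case study; the s05 tuple is invented.

```python
from itertools import combinations

HEADING = ("StudentId", "CourseCode", "TutorNo")
# The three s09 tuples come from the text; the s05 tuple is hypothetical.
ENROLMENT = {
    ("s09", "c4", 5324),
    ("s09", "c2", 8431),
    ("s09", "c7", 5324),
    ("s05", "c4", 5324),
}

def values_over(tuples, heading, attrs):
    """Sub-tuples over attrs, one per tuple (duplicates kept)."""
    positions = [heading.index(a) for a in attrs]
    return [tuple(t[p] for p in positions) for t in tuples]

def is_unique(tuples, heading, attrs):
    """Uniqueness: no two tuples share a value for attrs."""
    vals = values_over(tuples, heading, attrs)
    return len(vals) == len(set(vals))

def is_minimal(tuples, heading, attrs):
    """Minimality, as far as this extension can show: no proper subset is unique."""
    for size in range(1, len(attrs)):
        for subset in combinations(attrs, size):
            if is_unique(tuples, heading, subset):
                return False
    return True

def consistent_with_candidate_key(tuples, heading, attrs):
    return is_unique(tuples, heading, attrs) and is_minimal(tuples, heading, attrs)

print(consistent_with_candidate_key(ENROLMENT, HEADING, ("StudentId", "CourseCode")))             # True
print(consistent_with_candidate_key(ENROLMENT, HEADING, ("StudentId", "CourseCode", "TutorNo")))  # False
```

The second call fails on minimality: (StudentId, CourseCode) is already unique, so TutorNo can be discarded.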

2.4.2 Primary keys


If you now look back at the informal definition of a primary key, you will see that, although
that definition guarantees uniqueness, it does not guarantee minimality. It makes sense for
a primary key to exhibit minimality: why should we want to identify tuples using more
attribute values than necessary? Hence we shall redefine a primary key formally as follows:-

The primary key of a relation is one particular key chosen from the candidate keys.

Thus every primary key is a candidate key, and hence has the properties of identifying
tuples uniquely and of being minimal.
Most relations have just one candidate key, in which case the primary key must be that
key. However, some relations have more than one candidate key, as demonstrated by the
following examples.
Our first example is a modified version of the Staff relation:-

ModifiedStaff (StaffNo, Name, Region, NationalInsuranceNumber)

The relation ModifiedStaff has two candidate keys, StaffNo and NationalInsuranceNumber,
because, fairly obviously, both have the properties of uniqueness and minimality.
Our second example concerns the relation:-
Appointment (PatientId, ApptDate, ApptTime, ConsultantNo)

Appointment records data about the appointments patients have with consultants. We
shall assume that no patient may have more than one appointment on any given day and that
consultants see only one patient at a time.
As a patient can have no more than one appointment on any given day, (PatientId,
ApptDate) is a candidate key for Appointment. So one way of writing the heading of the
relation, with this candidate key identified, is:-
Appointment (PatientId, ApptDate, ApptTime, ConsultantNo), candidate key (PatientId, ApptDate)

This candidate key can be read as reflecting the fact that a patient has only one
appointment on a given date (that is to say, some of the semantics are recorded). However,
there is an alternative to this candidate key, arising from looking at the relation from the
consultants' point of view, which is identified in a practical exercise.

2.4.3 Alternate keys


Where a relation has more than one candidate key, just one of these candidate keys is
designated the primary key. The remaining candidate keys are termed alternate keys. Obviously,
where there is only one candidate key, that is the primary key and there are no alternate keys.
Where there is a choice to be made among candidate keys, we shall treat the designation of
the primary key as an arbitrary choice. Alternate keys are declared in a relational model
in exactly the same way as primary keys, that is, by using the words alternate key followed
by the name of the alternate key, as part of the definition of a relation. It is important to
include this in a model, since an alternate key is a constraint no different from the primary
key: it is the fact that the key is a candidate key that expresses the constraint.
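To see that an alternate key constrains the extension exactly as the primary key does, here is a minimal in-memory sketch in Python. The `Relation` class is a hypothetical helper written for illustration, and the staff numbers, names and National Insurance numbers are invented.

```python
class Relation:
    """Hypothetical in-memory relation that enforces every candidate key on insert.

    primary_key and each alternate key are tuples of attribute names.
    """

    def __init__(self, heading, primary_key, alternate_keys=()):
        self.heading = tuple(heading)
        # The primary key and the alternate keys are checked in exactly the same way.
        self.keys = [tuple(primary_key)] + [tuple(k) for k in alternate_keys]
        self.tuples = set()

    def _key_value(self, row, key):
        return tuple(row[self.heading.index(a)] for a in key)

    def insert(self, row):
        row = tuple(row)
        for key in self.keys:
            value = self._key_value(row, key)
            if any(self._key_value(t, key) == value for t in self.tuples):
                raise ValueError(f"duplicate value {value} for key {key}")
        self.tuples.add(row)


staff = Relation(
    ("StaffNo", "Name", "Region", "NationalInsuranceNumber"),
    primary_key=("StaffNo",),
    alternate_keys=[("NationalInsuranceNumber",)],
)
staff.insert((1111, "Staff A", 3, "AB123456C"))  # hypothetical values
try:
    staff.insert((2222, "Staff B", 3, "AB123456C"))  # repeats the NI number
    rejected = False
except ValueError as err:
    rejected = True
    print("rejected:", err)
```

The second insert has a fresh StaffNo, so the primary key alone would accept it; only the declared alternate key rejects it.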

2.5 Representing relationships

We have spent a considerable amount of time showing how entity types in an E-R diagram
may be represented by relations, but so far we have seen nothing about the representation of
the relationships. The reason for this is simple: they are not explicitly represented. Instead,
they may be inferred from the attributes that different relations have in common. Let us see
how this inference works in the specific case of the University model.
Figure 7 gives the E-R diagram for the University E-R model. In an E-R model,
relationships are between entity types, such as the relationship Enrolled between Student and
Enrolment. (You may find it helps to have the University case study pages to hand for the
following discussion.)

Figure 7: University Conceptual Data Model

If you look back at the University case study pages and examine the relations Student
and Enrolment, you can see that they both have an attribute which identifies the student
concerned. That a relationship exists between the two relations can be inferred from the fact
that they have common attributes: StudentId in Student and StudentId in Enrolment.
That the degree of the relationship is 1:n, from Student to Enrolment, can be inferred
from the fact that StudentId is the primary key of Student but is not a key of Enrolment.
That is to say, a particular StudentId attribute value, such as s09, occurs in only one tuple
of Student, specifically <s09, Reeves, 1998, 5212, 4>, but may occur in several tuples of
Enrolment, specifically <s09, c4, 5324>, <s09, c2, 8431> and <s09, c7, 5324>.

2.5.1 Qualified names and the dot notation


Before we go any further it is useful to introduce a notation for distinguishing attributes that
have the same name in different relations. Since relational theory requires attribute names
to be unique only within a relation, it is quite possible for a pair of attributes, each in a
different relation, to have the same name. For example, CourseCode is used as the name of
an attribute in three relations: Course, Enrolment and Assignment. Of course, we deal with
any ambiguity by indicating the relations involved: CourseCode in Course, CourseCode in
Enrolment, and so on.
We can formalise this approach simply by prefixing the name of an attribute with the name
of its relation whenever any ambiguity arises. The combination of relation name and attribute
name is called the qualified attribute name. The particular notation used for achieving this
is known colloquially as 'dot' notation since, for example, the StudentId attribute in Student
may be qualified as Student.StudentId (spoken as 'Student dot StudentId'). Similarly, the
StudentId attribute in Enrolment may be qualified as Enrolment.StudentId, which now clearly
distinguishes it from Student.StudentId.

2.5.2 Representing 1:n relationships using foreign keys
Inferring the existence of relationships from common domains is not very satisfactory. We
can improve on this situation, and also determine the degree of the relationship, using another
mechanism, called the foreign key mechanism.
When we want to represent a 1:n relationship using common attributes, we can ensure
that the tuple on the '1' side is unique by choosing an attribute that is a candidate key of
that relation. We can declare in the relation that is on the 'n' side of the relationship that it
has an attribute matching this key in the relation on the '1' side of the relationship. In other
words, we want an attribute in one relation to match a candidate key in another relation.
Such an attribute is called a foreign key. Formally, the definition of a foreign key is:-

A foreign key is an attribute (or combination of attributes) in one relation, R2, whose values
are the same as values of a candidate key (usually the primary key) of some other relation, R1.

As an example of a straightforward foreign key, consider the Counsels relationship between
the Staff and Student relations. The Counsels relationship is represented by matching
values of attributes defined on the StaffNumbers domain; in particular, by the values of
Student.CounsellorNo matching values of Staff.StaffNo. The attribute Student.CounsellorNo is
declared to be a foreign key in the Student relation (the R2 of the definition) and its values
are the same as values of Staff.StaffNo, which is the primary key of Staff (the R1 of the
definition). A suitable way of doing this in the University model is a declaration in the Student
relation:-

foreign key CounsellorNo references Staff


The values of a foreign key are values of the primary key of some relation that are placed
(or posted, as it is sometimes known) in another relation in order to establish the relationship
between the two relations. So, the values of Student.CounsellorNo are values of the primary
key of Staff (that is, Staff.StaffNo) placed in Student in order to indicate that a relationship
(named Counsels in the E-R model) exists between the two relations. The attribute(s) of
the foreign key and the associated primary key must, of course, be defined on the same
domain(s).
Note that the foreign key normally has no uniqueness property for the relation into which it
has been placed. For example, CounsellorNo is not a candidate key for Student: the University
case study relations show several tuples in Student that have the same CounsellorNo value.
In summary, this primary key/foreign key mechanism is the mechanism for representing
1:n relationships between relations. The relation on the 'n' side of the relationship includes,
as a foreign key, an attribute that is defined on the same domain as a candidate key (usually
the primary key) of the relation on the '1' side of the relationship.
The primary key/foreign key mechanism is important because the declaration of a foreign
key, and of the primary key that it matches (or references), is an assertion about the existence
and degree of a relationship between relations. That is, it is an assertion about (some of)
the semantics of the data represented in a relational model.
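Both halves of this assertion can be checked mechanically: every foreign key value must appear among the values of the referenced primary key, while nothing requires the foreign key itself to be unique. The sketch below is a minimal illustration under stated assumptions: the staff numbers are taken from the case study tuples quoted earlier, but only s09's counsellor comes from the text, and the other rows and the helper names are hypothetical.

```python
# Staff numbers known from the text (counsellor 5212, tutors 5324 and 8431).
STAFF_NOS = {5212, 5324, 8431}

# Hypothetical projection of Student onto (StudentId, CounsellorNo).
# Only s09 -> 5212 comes from the text; the other rows are invented.
STUDENT_COUNSELLOR = [
    ("s09", 5212),
    ("s05", 5212),  # same counsellor as s09
    ("s12", 9999),  # inside the StaffNumbers domain, but no such member of staff
]

def dangling(fk_rows, referenced_keys):
    """Rows whose foreign key value matches no value of the referenced key."""
    return [row for row in fk_rows if row[1] not in referenced_keys]

def is_unique(values):
    return len(values) == len(set(values))

print(dangling(STUDENT_COUNSELLOR, STAFF_NOS))        # [('s12', 9999)]
print(is_unique([c for _, c in STUDENT_COUNSELLOR]))  # False
```

Note that 9999 passes the domain check for StaffNumbers; it is only the foreign key declaration that exposes it.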
We can now update the relational model with the additional declaration of the appropriate
foreign keys. This is shown below:-

model University
  domains
    IdentifiersOfStudents = s01 .. s99
    PersonNames = string
    Years = Yearnumber
    StaffNumbers = 1000 .. 9999
    Regions = 1 .. 9
    CodesOfCourses = c1 .. c9
    TitlesOfCourses = string
    CreditValues = (30, 60)
    AssignmentNumbers = 1 .. 5
    Grades = 0 .. 100

  relation Student
    StudentId: IdentifiersOfStudents
    Name: PersonNames
    Registered: Years
    CounsellorNo: StaffNumbers
    Region: Regions
    primary key StudentId
    foreign key CounsellorNo references Staff
  relation Course
    CourseCode: CodesOfCourses
    Title: TitlesOfCourses
    Credit: CreditValues
    primary key CourseCode
  relation Enrolment
    StudentId: IdentifiersOfStudents
    CourseCode: CodesOfCourses
    TutorNo: StaffNumbers
    primary key (StudentId, CourseCode)
    foreign key StudentId references Student
    foreign key CourseCode references Course
    foreign key TutorNo references Staff
  relation Staff
    StaffNo: StaffNumbers
    Name: PersonNames
    Region: Regions
    primary key StaffNo
  relation Assignment
    StudentId: IdentifiersOfStudents
    CourseCode: CodesOfCourses
    AssignmentNo: AssignmentNumbers
    Grade: Grades
    primary key (StudentId, CourseCode, AssignmentNo)
    foreign key (StudentId, CourseCode) references Enrolment
The relational model now differs from the earlier one only for those relations that have
foreign keys; that is, Student, Enrolment and Assignment. Taking Assignment as an example,
we have the additional entry:-

foreign key (StudentId, CourseCode) references Enrolment

Following the declaration of the foreign key, foreign key (StudentId, CourseCode), the
primary key that matches the foreign key is specified by giving the name of the appropriate
relation (the referenced relation, as it is called): references Enrolment. Giving the name
of the referenced relation is sufficient to denote the primary key that is being referenced
(matched) since each relation has just one primary key. In the case of Enrolment, this is
(StudentId, CourseCode). Note, however, that if we had chosen to allow a foreign key to
reference any candidate key, not necessarily the primary key, then the declaration would need
to make the candidate key explicit.
Although the primary key/foreign key mechanism is crucial to the representation of rela-
tionships between relations, it may not be the sole mechanism involved. As was stated above,
it is the mechanism for representing 1:n relationships. As we shall see, in Subsection 2.6, this
mechanism needs an additional constraint to represent 1:1 relationships.
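The University relational model maps fairly directly onto SQL. The sketch below assumes SQLite, driven through Python's standard `sqlite3` module, and declares the same primary and foreign keys, including the composite foreign key of Assignment. SQL has no exact equivalent of the model's domains, so they are only approximated here by column types and CHECK constraints. The key values come from the case study; the staff names, staff regions, course title and grade are hypothetical.

```python
import sqlite3

# SQL rendering of the University relational model (a sketch).
DDL = """
CREATE TABLE Staff (
    StaffNo INTEGER NOT NULL PRIMARY KEY CHECK (StaffNo BETWEEN 1000 AND 9999),
    Name    TEXT NOT NULL,
    Region  INTEGER NOT NULL CHECK (Region BETWEEN 1 AND 9)
);
CREATE TABLE Student (
    StudentId    TEXT NOT NULL PRIMARY KEY,
    Name         TEXT NOT NULL,
    Registered   INTEGER NOT NULL,
    CounsellorNo INTEGER NOT NULL REFERENCES Staff (StaffNo),
    Region       INTEGER NOT NULL CHECK (Region BETWEEN 1 AND 9)
);
CREATE TABLE Course (
    CourseCode TEXT NOT NULL PRIMARY KEY,
    Title      TEXT NOT NULL,
    Credit     INTEGER NOT NULL CHECK (Credit IN (30, 60))
);
CREATE TABLE Enrolment (
    StudentId  TEXT NOT NULL REFERENCES Student (StudentId),
    CourseCode TEXT NOT NULL REFERENCES Course (CourseCode),
    TutorNo    INTEGER NOT NULL REFERENCES Staff (StaffNo),
    PRIMARY KEY (StudentId, CourseCode)
);
CREATE TABLE Assignment (
    StudentId    TEXT NOT NULL,
    CourseCode   TEXT NOT NULL,
    AssignmentNo INTEGER NOT NULL CHECK (AssignmentNo BETWEEN 1 AND 5),
    Grade        INTEGER NOT NULL CHECK (Grade BETWEEN 0 AND 100),
    PRIMARY KEY (StudentId, CourseCode, AssignmentNo),
    FOREIGN KEY (StudentId, CourseCode) REFERENCES Enrolment (StudentId, CourseCode)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript(DDL)

def try_insert(sql, params):
    """Attempt one INSERT and report whether the declared constraints allowed it."""
    try:
        conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

# Staff names and regions are hypothetical; the staff numbers come from the text.
for staff in [(5212, "Staff A", 4), (5324, "Staff B", 2), (3158, "Staff C", 1)]:
    try_insert("INSERT INTO Staff VALUES (?, ?, ?)", staff)
try_insert("INSERT INTO Student VALUES (?, ?, ?, ?, ?)", ("s09", "Reeves", 1998, 5212, 4))
try_insert("INSERT INTO Course VALUES (?, ?, ?)", ("c4", "Course Four", 60))
try_insert("INSERT INTO Enrolment VALUES (?, ?, ?)", ("s09", "c4", 5324))

results = {
    "assignment for (s09, c4)": try_insert(
        "INSERT INTO Assignment VALUES (?, ?, ?, ?)", ("s09", "c4", 1, 75)),
    "assignment for (s09, c5), no Enrolment tuple": try_insert(
        "INSERT INTO Assignment VALUES (?, ?, ?, ?)", ("s09", "c5", 1, 75)),
    "student with Region 14": try_insert(
        "INSERT INTO Student VALUES (?, ?, ?, ?, ?)", ("s11", "Mercier", 1996, 3158, 14)),
}
for case, outcome in results.items():
    print(f"{case}: {outcome}")
```

The second insert is rejected by the composite foreign key, because there is no Enrolment tuple (s09, c5); the third is rejected by the CHECK standing in for the Regions domain.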

2.6 Representing E-R models

In this subsection we shall examine how the Hospital E-R model may be represented as
a relational model. In the course of constructing this model we shall meet not only 1:n
relationships but also 1:1 relationships. We shall not discuss m:n relationships.

2.6.1 The Hospital relational model


The E-R diagram for the version of the Hospital E-R model that we shall use is given as
Figure 8.

Figure 8: Hospital E-R Diagram

The entity types, their attributes and identifiers are as follows:-

Ward (WardNo, WardName)
Patient (PatientId, PatientName)
Nurse (StaffNo, NurseName)
Doctor (StaffNo, DoctorName, Position, Specialism)
Team (TeamCode, TelephoneNo)
Treatment (StaffNo, PatientId, StartDate, Reason)
Drug (DrugCode, DrugName)
Prescription (PrescriptionNo, Quantity, DailyDosage)

As a first step in representing this Hospital E-R model as a relational model, we may
represent each of these entity types (provisionally) as a relation having attributes that
correspond to the attributes of the appropriate entity type. In addition, each entity identifier
becomes the primary key of the relevant relation. That is, we have the following eight
corresponding relations:-

Ward (WardNo, WardName)
Patient (PatientId, PatientName)
Nurse (StaffNo, NurseName)
Doctor (StaffNo, DoctorName, Position, Specialism)
Team (TeamCode, TelephoneNo)
Treatment (StaffNo, PatientId, StartDate, Reason)
Drug (DrugCode, DrugName)
Prescription (PrescriptionNo, Quantity, DailyDosage)

The set of relations and their attributes given for the Hospital relational model is
incomplete since not all the desired relationships between the relations are represented.
In fact, the set of relations and attributes can be said to represent only two relationships:
Provides and Receives. (Even these relationships are only implicitly represented by common
names for attributes.) For example, the Provides relationship is represented by the shared
attribute StaffNo in both Doctor and Treatment. To be more precise, the Provides
relationship is represented by the declaration of Treatment.StaffNo as a foreign key
referencing the primary key Doctor.StaffNo. It is necessary that Doctor.StaffNo and
Treatment.StaffNo are both defined on the same domain.
The provisional set of relations can represent the Provides and Receives relationships
simply by declaring the foreign keys, because the attributes of Treatment already include the
necessary attribute, PatientId, for the foreign key. We just happened to be lucky with this one,
but in general we shall have to add attributes with appropriate domains for the foreign keys.
In order to represent seven of the other eight relationships of the E-R model (ConsistsOf,
IsResponsibleFor, OccupiedBy, StaffedBy, Supervises, Requires and IsUsed), we need to add
the appropriate foreign keys to the relevant relations. For example, in order to represent
the ConsistsOf relationship, the primary key of Team is added as an attribute of Doctor,
where it can be declared to be a foreign key.
In representing ConsistsOf, IsResponsibleFor, OccupiedBy and StaffedBy, the added
attribute for the foreign key has so far been given the same name as the primary key in the
referenced relation in our solutions. This need not be the case, since the essential
requirement is simply that both foreign key and primary key are defined on the same domain.
Indeed, a different name for the foreign key can sometimes emphasise the intended meaning.
For example, we shall now work with Patient as:-

Patient (PatientId, PatientName, ConsultantNo, WardNo)

on the grounds that ConsultantNo is a more meaningful name than StaffNo for the
foreign key that is used to represent the IsResponsibleFor relationship, since only consultants
are allowed to be responsible for patients.

2.6.2 Representing 1:1 relationships


Finally, we come to the representation of the eighth relationship, HeadedBy, which is 1:1
between Team and Doctor. Here are the two relevant relations, in their current state:-

Team (TeamCode, TelephoneNo)


Doctor (StaffNo, DoctorName, Position, Specialism, TeamCode)

We represent the HeadedBy relationship by adding a foreign key, as we have done before
with 1:n relationships. However, there are two important points of difference, dictated by the
1:1 degree of the relationship. First, as the relationship is symmetrical, we have a choice
about which of the relations will contain the foreign key. Second, we need to ensure that the
values of the foreign key attribute are unique, since we want the relationship to be 1:1,
not 1:n.
Let us first consider which of the two relations, Team or Doctor, would be better for the
inclusion of the attribute for the foreign key. We can either add the foreign key HeadNo to
Team to show who heads that team, or we can add the foreign key HeadsTeamCode to Doctor,
showing which team that doctor heads.
Choosing the first option yields:-

TeamOne (TeamCode, TelephoneNo, HeadNo)
DoctorOne (StaffNo, DoctorName, Position, Specialism, TeamCode)

We have named the attribute for this foreign key HeadNo to reflect its purpose. (Note
that DoctorOne contains, as before, the foreign key TeamCode as the means for representing
the ConsistsOf relationship.)
The second option yields:-

TeamTwo (TeamCode, TelephoneNo)
DoctorTwo (StaffNo, DoctorName, Position, Specialism, TeamCode, HeadsTeamCode)

with HeadsTeamCode added as an attribute to be declared as a foreign key in DoctorTwo.


Although both these options are possible, it is much better to choose the first option
because it represents a more natural grouping of the data. HeadNo is a natural additional
attribute of the relation Team because it is reasonable to think of the head of a team as a
property of the team. In the second option the attribute HeadsTeamCode seems unnatural
since most doctors do not head a team, as reflected by the fact that many tuples of
DoctorTwo would contain null for the HeadsTeamCode attribute.
Given that we have chosen this first option, how is the relationship constrained to be 1:1
rather than 1:n? We require a mechanism to ensure that values of HeadNo are unique in
TeamOne. You have already met such a mechanism: HeadNo must be declared to be an
alternate key of TeamOne. If HeadNo is declared to be an alternate key then no two tuples
of TeamOne can have the same value for the attribute HeadNo.
So, in summary, the representation of a 1:1 relationship has two steps. First, decide which
one of the two relations will have the attribute declared as a foreign key, and then declare
that same attribute to be an alternate key.
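The two steps can be seen working together in SQL, where an alternate key is typically declared with UNIQUE. This is a minimal sketch, assuming SQLite through Python's `sqlite3` module. The ConsistsOf foreign key on DoctorOne.TeamCode is deliberately left out to avoid a circular reference between the two tables, and all staff numbers, names and telephone numbers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE DoctorOne (
    StaffNo    INTEGER NOT NULL PRIMARY KEY,
    DoctorName TEXT NOT NULL,
    Position   TEXT NOT NULL,
    Specialism TEXT,
    TeamCode   TEXT  -- ConsistsOf foreign key omitted in this sketch
);
CREATE TABLE TeamOne (
    TeamCode    TEXT NOT NULL PRIMARY KEY,
    TelephoneNo INTEGER,
    HeadNo      INTEGER UNIQUE REFERENCES DoctorOne (StaffNo)  -- foreign key and alternate key
);
""")

def try_insert(sql, params):
    """Attempt one INSERT and report whether the declared constraints allowed it."""
    try:
        conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

# Hypothetical doctors.
try_insert("INSERT INTO DoctorOne VALUES (?, ?, ?, ?, ?)", (1001, "Dr A", "Consultant", "Cardiology", None))
try_insert("INSERT INTO DoctorOne VALUES (?, ?, ?, ?, ?)", (1002, "Dr B", "Consultant", "Neurology", None))

first_team = try_insert("INSERT INTO TeamOne VALUES (?, ?, ?)", ("t1", 1234, 1001))
same_head = try_insert("INSERT INTO TeamOne VALUES (?, ?, ?)", ("t2", 1235, 1001))
unknown_head = try_insert("INSERT INTO TeamOne VALUES (?, ?, ?)", ("t3", 1236, 9999))
print(first_team, same_head, unknown_head, sep="\n")
```

Without UNIQUE, the second team would be accepted and the relationship would silently become 1:n; the third insert is rejected by the foreign key because no doctor 9999 exists.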
The final version of the eight relations for the Hospital example is:-

Ward (WardNo, WardName)
Patient (PatientId, PatientName, ConsultantNo, WardNo)
Nurse (StaffNo, NurseName, WardNo, SupervisorNo)
Doctor (StaffNo, DoctorName, Position, Specialism, TeamCode)
Team (TeamCode, TelephoneNo, HeadNo)
Treatment (StaffNo, PatientId, StartDate, Reason)
Drug (DrugCode, DrugName)
Prescription (PrescriptionNo, Quantity, DailyDosage, PrescribingStaffNo, ReceivingPatientId,
PrescribedDrugCode)
We may now complete the relational model. First, we require the domain definitions.
These are given below (we have chosen the domain values):-
domains
  WardNumbers = w1 .. w9
  NamesOfWards = string
  IdentifiersOfPatients = p01 .. p99
  PersonNames = string
  StaffNumbers = 1000 .. 9999
  PositionsOfDoctors = (Consultant, Registrar, House Officer)
  SpecialismsOfConsultants = string
  NumbersOfTelephones = 1000 .. 8000
  Reasons = string
  CodesOfTeams = t1 .. t8
  CodesOfDrugs = d01 .. d99
  NamesOfDrugs = string
  Dates = calendardates
  NumbersOfPrescriptions = integer
  QuantitiesOfDrugs = string
  DailyDosagesOfDrugs = string

2.6.3 Recursive relationships


We have dealt with the representation of all the straightforward 1:n relationships for the
Hospital E-R model and now want to consider the recursive relationship Supervises. You
will remember that Supervises is a recursive 1:n relationship, representing the hierarchy of
supervision within the nurses on a ward.
Supervises may be represented by the same primary key/foreign key mechanism as has
been employed for the other relationships considered so far. The definition of a foreign key
allows the possibility that the relations concerned are not distinct. This means that the
foreign key mechanism can still be used even when the relationship represented is recursive.
Using the mechanism with R1 and R2 not distinct allows the representation of the
recursive relationship Supervises. We may add an attribute referencing the primary key of
Nurse (the R1) to give the staff number of the supervisor, which will be declared as a foreign
key in Nurse (the R2) itself. That is, we have:-

Nurse (StaffNo, NurseName, WardNo, SupervisorNo)

where SupervisorNo is the foreign key, with values that match the values of the primary
key, StaffNo. Note here that the foreign key cannot have the same name as the primary key
it references, since names of attributes must be unique within a relation.
Exactly the same reasoning as we have employed for non-recursive relationships applies to
the representation of the Supervises relationship. StaffNo and SupervisorNo must be defined
on the same domain. A value of StaffNo will identify at most one tuple of Nurse, since it is
the primary key, but the same SupervisorNo value will potentially occur in many Nurse tuples.
Thus the degree of the Supervises relationship is 1:n.
Figure 9 gives an extension of Nurse. Two tuples in Figure 9 have a SupervisorNo entry
which is the primary key value of that same nurse. For example, the nurse with staff number
126 has supervisor number 126. This self-reference has been used here to indicate that the
nurse with the StaffNo value of 126 is the senior nurse on the ward and has no other supervisor.

Figure 9: An extension of the Nurse relation
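Since Figure 9 itself is not reproduced here, the sketch below uses a small hypothetical extension of Nurse in which only the self-referencing senior nurse 126 is taken from the text. It is a minimal Python illustration, with hypothetical helper names, of three consequences of the recursive foreign key: every SupervisorNo must match a StaffNo in the same relation, following SupervisorNo upwards ends at a self-supervising senior nurse, and one nurse can supervise many others (the 'n' side).

```python
# Hypothetical extension of Nurse: StaffNo -> (NurseName, WardNo, SupervisorNo).
# Only nurse 126 supervising herself comes from the text; the rest is invented.
NURSE = {
    126: ("Nurse A", "w1", 126),  # senior nurse on ward w1
    131: ("Nurse B", "w1", 126),
    140: ("Nurse C", "w1", 131),
    152: ("Nurse D", "w1", 131),
    201: ("Nurse E", "w2", 201),  # senior nurse on ward w2
}

def foreign_key_holds(nurse):
    """Every SupervisorNo must match some StaffNo in the same relation."""
    return all(sup in nurse for (_, _, sup) in nurse.values())

def supervision_chain(nurse, staff_no):
    """Follow SupervisorNo upwards until reaching a self-supervising nurse."""
    chain = [staff_no]
    while True:
        sup = nurse[chain[-1]][2]
        if sup == chain[-1]:
            return chain
        if sup in chain:
            raise ValueError("supervision cycle")
        chain.append(sup)

def supervisees(nurse, staff_no):
    """The nurses (other than staff_no itself) directly supervised by staff_no."""
    return sorted(s for s, (_, _, sup) in nurse.items() if sup == staff_no and s != staff_no)

print(foreign_key_holds(NURSE))           # True
print(supervision_chain(NURSE, 140))      # [140, 131, 126]
print(supervisees(NURSE, 131))            # [140, 152]
```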

2.7 Summary

The structural part of the relational model is based on three constructs: relation, attribute
and domain. A relation is distinct from a table but it is conveniently depicted as a table with
the understanding that no ordering of attributes or tuples is to be implied. A relation may be
used to represent an entity type.
More formally, a relation consists of a fixed set of attributes, each of which is defined
on some underlying domain. This constitutes the heading of the relation. In addition, a
relation has a potentially time-varying set of tuples that constitute the extension (or body)
of the relation. Each relation has at least one candidate key, each value of which will identify
precisely one tuple in the relation. One of the candidate keys in a relation is designated
the primary key, which, like every candidate key, must be minimal. Any remaining candidate
keys become alternate keys.
One method for representing relationships between relations uses an attribute (or
combination of attributes) which is posted in one relation and whose values correspond to the
values of a primary key in the other relation. The posted attribute and that primary key must
be defined on the same domain. The posted attribute is called a foreign key. This mechanism
may be used to represent 1:n relationships and, when the foreign key is also declared to be
an alternate key, 1:1 relationships.

Domains, primary keys, foreign keys and alternate keys assert (some of) the semantics of
the data represented by a set of relations. A set of relations may be described by a relational
model.
Having completed this section you should now be able to:-
1. use the terminology of relational theory to describe and discuss an example set of
relations;
2. represent an E-R model by a set of relations using the primary key/foreign key mecha-
nism with a posted attribute to represent relationships between the relations;
3. describe a set of relations by means of a relational model.

3 Manipulating Relations
You have gained a sound understanding of the structural part of relational theory from Section
2. We shall now examine the manipulative part, which is about manipulating relations using a
set of operators. The operators that we shall describe throughout most of this section consist
of a collection of set-level operators that make up the relational algebra plus a relational
assignment operator.
Before examining the relational algebra in detail we must make three points that are
crucial to an understanding of the utility and simplicity of the algebra.
First, the operators act on whole relations. They take whole relations (sets of tuples) as
their operands, as opposed to single tuples at a time.
Second, the relational algebra has what is termed the closure property. The closure prop-
erty guarantees that the result of applying any relational algebra operation to any relation(s)
will always produce a relation as the result. Hence the result of any relational algebra expres-
sion may be used as the operand in another relational algebra expression or may be assigned,
using the relational assignment operator, to some named relation.
Third, the relational algebra is not necessarily intended to be a language that would be
used seriously in an implementation of a relational DBMS. Rather, its intent is to stand as
a theoretical basis that is relationally complete, in the sense that it addresses the subtleties
and issues of relational processing.
The relational algebra operators that we shall now examine in detail are select, project,
join and divide. We shall explain and illustrate these operators by means of the University
relational model. It is reproduced here below. We shall refer also to the extensions of the
University case study.

model University
  domains
    IdentifiersOfStudents = s01 .. s99
    PersonNames = string
    Years = Yearnumber
    StaffNumbers = 1000 .. 9999
    Regions = 1 .. 9
    CodesOfCourses = c1 .. c9
    TitlesOfCourses = string
    CreditValues = (30, 60)
    AssignmentNumbers = 1 .. 5
    Grades = 0 .. 100

  relation Student
    StudentId: IdentifiersOfStudents
    Name: PersonNames
    Registered: Years
    CounsellorNo: StaffNumbers
    Region: Regions
    primary key StudentId
    foreign key CounsellorNo references Staff
  relation Course
    CourseCode: CodesOfCourses
    Title: TitlesOfCourses
    Credit: CreditValues
    primary key CourseCode
  relation Enrolment
    StudentId: IdentifiersOfStudents
    CourseCode: CodesOfCourses
    TutorNo: StaffNumbers
    primary key (StudentId, CourseCode)
    foreign key StudentId references Student
    foreign key CourseCode references Course
    foreign key TutorNo references Staff
  relation Staff
    StaffNo: StaffNumbers
    Name: PersonNames
    Region: Regions
    primary key StaffNo
  relation Assignment
    StudentId: IdentifiersOfStudents
    CourseCode: CodesOfCourses
    AssignmentNo: AssignmentNumbers
    Grade: Grades
    primary key (StudentId, CourseCode, AssignmentNo)
    foreign key (StudentId, CourseCode) references Enrolment
In Section 5, we shall introduce you to RAS (relational algebra system), a software
package which will allow you to explore the algebra on your computer.

3.1 The select and project operators

The select and project operators are both unary operators: they process one relation as
their single operand.

3.1.1 The select operator


The select operator can be thought of as 'slicing' a relation horizontally. The select operator
applied to a given relation produces as its result a relation whose extension is a subset of
the extension of the given relation, the contents of the subset being determined by a selection
condition. The general form of a select expression is:-

select <relation> where <selection condition>


For example, we may have:-

select Student where Registered > 1996


Evaluating this relational algebra expression will result in a relation (by the closure
property). Specifically, it will result in a relation whose heading is the same as Student, the
<relation>, but whose extension will consist of those tuples of the extension of Student for
which the <selection condition>, Registered > 1996, is true. That is, the expression results
in (or evaluates to, as it is sometimes expressed) the relation in Figure 10.

Figure 10: Students registered after 1996


The selection condition can consist of a simple condition or combination of simple condi-
tions. For example:-

select Student where Registered > 1996 and StudentId <> s09
which will result in the relation in Figure 11.

Figure 11: Students registered after 1996 excluding student s09

Each simple condition in a selection condition is typically a comparison of an attribute
with a constant. For example, in processing the simple condition Registered > 1996, the
values of the attribute Registered are compared with the constant 1996. As you might expect,
the constant must be a value drawn from the same domain as that on which the attribute
is based. Alternatively, the simple condition may compare two attribute values provided, of
course, that the two attributes occur in the relation and are defined on the same domain.
For example, using the relation:-

GeneralPractitioner (GPId, GPName, SecId, SecName)

introduced in Subsection 2.4, the following is a valid select expression:-

select GeneralPractitioner where GPName = SecName


The expression produces all those tuples of GeneralPractitioner in which the name of the
secretary is the same as the name of the general practitioner.
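The behaviour of select can be sketched in Python, assuming a relation is modelled as a pair (heading, body): a tuple of attribute names plus a frozenset of value tuples, so duplicate tuples cannot arise. The `select` function is a hypothetical illustration, not RAS syntax, and it does not check that a constant comes from the attribute's domain. Only the s09 tuple is from the case study; the other Student and GeneralPractitioner tuples are invented.

```python
def select(relation, condition):
    """select <relation> where <condition>: same heading, filtered body."""
    heading, body = relation
    kept = frozenset(t for t in body if condition(dict(zip(heading, t))))
    return heading, kept

STUDENT = (
    ("StudentId", "Name", "Registered", "CounsellorNo", "Region"),
    frozenset({
        ("s09", "Reeves", 1998, 5212, 4),       # from the text
        ("s01", "Student One", 1995, 5212, 4),  # hypothetical
        ("s02", "Student Two", 1997, 5324, 2),  # hypothetical
    }),
)

# select Student where Registered > 1996
recent = select(STUDENT, lambda r: r["Registered"] > 1996)
# select Student where Registered > 1996 and StudentId <> s09
recent_not_s09 = select(
    STUDENT, lambda r: r["Registered"] > 1996 and r["StudentId"] != "s09")

# select GeneralPractitioner where GPName = SecName (hypothetical tuples)
GP = (
    ("GPId", "GPName", "SecId", "SecName"),
    frozenset({("g1", "Smith", "x1", "Smith"), ("g2", "Jones", "x2", "Patel")}),
)
same_name = select(GP, lambda r: r["GPName"] == r["SecName"])

print(sorted(t[0] for t in recent[1]))          # ['s02', 's09']
print(sorted(t[0] for t in recent_not_s09[1]))  # ['s02']
print(sorted(same_name[1]))                     # [('g1', 'Smith', 'x1', 'Smith')]
```

The result of each call is again a (heading, body) pair, which is the closure property in miniature.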

3.1.2 The project operator


The project operator can be thought of as 'slicing' a relation vertically, picking out the wanted
attributes from a given relation. For example:-

project Student over Name


produces a relation whose heading consists of just one attribute, Name, and whose ex-
tension consists of all the distinct Name values in Student, as depicted in Figure 12.
This relation happens to have the same cardinality as Student, since all Name values in
the extension of Student are distinct. However:-

project Student over CounsellorNo


produces the relation depicted in Figure 13.
The cardinality of the relation of Figure 13 is different from that of Student since there
are duplicated CounsellorNo values in Student which are reduced to a single value as part of
the project operation. This can be seen as a consequence of the closure property: the result
of a project expression is a relation, and tuples in a relation are required to be distinct.

Figure 12: Projecting student names

Figure 13: Projecting counsellor numbers
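The duplicate removal just described falls out naturally if a relation's body is a set. A minimal sketch in Python, assuming the same (heading, frozenset-of-tuples) representation as a modelling choice; `project` is a hypothetical helper, and every Student tuple except s09 is invented, with two students deliberately sharing counsellor 5212.

```python
def project(relation, attributes):
    """project <relation> over <attribute list>: keep the named columns, drop duplicates."""
    heading, body = relation
    positions = [heading.index(a) for a in attributes]
    # Building a frozenset removes any duplicate tuples the projection creates.
    return tuple(attributes), frozenset(tuple(t[p] for p in positions) for t in body)

STUDENT = (
    ("StudentId", "Name", "Registered", "CounsellorNo", "Region"),
    frozenset({
        ("s09", "Reeves", 1998, 5212, 4),       # from the text
        ("s01", "Student One", 1995, 5212, 4),  # hypothetical
        ("s02", "Student Two", 1997, 5324, 2),  # hypothetical
    }),
)

names = project(STUDENT, ["Name"])
counsellors = project(STUDENT, ["CounsellorNo"])
print(len(STUDENT[1]), len(names[1]), len(counsellors[1]))  # 3 3 2
print(sorted(counsellors[1]))                               # [(5212,), (5324,)]
```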


The general form of a project expression is:-

project <relation> over <attribute list>


where <attribute list> is one or more of the attributes in the heading of <relation>. The
result of evaluating this expression is a relation with a heading formed from the attribute list
given in the expression and an extension produced from the extension of the given relation
with duplication removed. (The domains of the attributes are unchanged from the attributes
of the original relation.)
For example, the expression:-

project Course over Title, Credit


produces the relation depicted in Figure 14.

Figure 14: Course titles and credit values

3.1.3 Combining expressions

Relational expressions, such as those involving select and project, can be combined so that
the result of one expression is used as the operand in a subsequent expression. Note that
such a combination is legitimate by virtue of the closure property. We shall describe two ways
of achieving such a combination.
One way is to link a sequence of expressions together by naming the relations that result
from individual expressions. This may be done using the term giving. An example of this is:-

select Staff where Region = 3 giving A

where the use of giving allows the relation produced by the select expression to be referred
to as A. The relation named A may now be used as the operand relation in a subsequent
expression. For example, suppose we want the names of all the staff in Region 3. This may
be achieved using a sequence of two linked expressions, as follows:-

select Staff where Region = 3 giving A
project A over Name giving B
The label A refers to the result of the initial select expression and B refers to a relation
of degree 1 and cardinality 1, consisting of the single tuple <Jennings>. In general, any
number of expressions may be linked together by means of the term giving and temporary
relations such as A, B, C, etc., to produce a sequence of linked expressions. The relations
produced are temporary because we do not want them to become a permanent part of the
database. Any names, not just a single letter, may be used to label them, but we have used
single letters to emphasise their temporary nature.
The other way of achieving a combination of expressions can be illustrated by building up
another single expression which produces exactly the same result as the two linked expressions
above. We can produce the single expression that we want as follows.
The second expression of the linked pair is:-

project A over Name giving B


Since A is the relation resulting from the first expression of the linked pair, select Staff
where Region = 3, it may be replaced in the second expression by that first expression,
enclosed in brackets, giving:-

project (select Staff where Region = 3) over Name giving B


We refer to this as a nested expression, since we nest the first individual operation to be
performed (in the brackets) within the next operation to be performed. In general, a nested
expression may contain any number of subsidiary expressions, nested one within another.
If you find it difficult to develop a nested expression directly, try developing it first as a
sequence of expressions and then convert the sequence to the nested form by replacing the
temporary relations with the appropriate bracketed relational expressions.
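The equivalence of the linked and nested forms can be checked with a small Python sketch. It assumes the same (heading, body) model of a relation, a pair of an attribute-name tuple and a frozenset of tuples. The select and project helpers are hypothetical, the select condition is passed as a Python function, and the Staff tuples are invented for illustration.

```python
def select(rel, condition):
    """Keep the tuples for which condition(row) is true; row maps names to values."""
    heading, body = rel
    return (heading, frozenset(t for t in body if condition(dict(zip(heading, t)))))

def project(rel, attrs):
    """Keep only the listed attributes; the frozenset removes duplicates."""
    heading, body = rel
    positions = [heading.index(a) for a in attrs]
    return (tuple(attrs), frozenset(tuple(t[p] for p in positions) for t in body))

# Hypothetical Staff extension (invented values).
staff = (
    ("StaffNo", "Name", "Region"),
    frozenset({("1001", "Jennings", 3), ("1002", "Singh", 4), ("1003", "Brown", 2)}),
)

# Linked form: each intermediate result gets a temporary name, like "giving A".
A = select(staff, lambda row: row["Region"] == 3)
B = project(A, ["Name"])

# Nested form: the select expression is written in place of A.
B_nested = project(select(staff, lambda row: row["Region"] == 3), ["Name"])

print(B == B_nested, sorted(B[1]))  # True [('Jennings',)]
```

In Python the nested form is simply a function call used as an argument, which mirrors the way the bracketed select expression replaces the temporary relation A.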

3.2 The join and divide operators

The join and divide operators are binary operators: they take two relations as their
operands. As such they allow one relation to be processed with another, using attribute
values that the two relations share.

3.2.1 The join operator


There are, in fact, several versions of the join operation in the relational algebra. We shall
concentrate on the operation known as a natural join, performed by what we shall simply
term the join operator. A join expression has the general format:-

join <relation 1> and <relation 2> where <attribute 1> = <attribute 2>

where <attribute 1> is in <relation 1> and <attribute 2> is in <relation 2>. For example,
the following is a join expression:-

join Student and Staff where CounsellorNo = StaffNo


and it produces the relation depicted in Figure 15.

Figure 15: A join of Student and Staff


If select and project 'slice' relations, then join 'pastes' relations together using shared
attribute values. Thus, in the above example, join 'pastes' the tuples of the Student and
Staff relations together. A Student tuple is only 'pasted' to a Staff tuple where the condition
in the join expression, CounsellorNo = StaffNo, holds. In general, this condition involves the
comparison of two sets of attribute values: <attribute 1> from <relation 1> and
<attribute 2> from <relation 2>. As with any comparison in the relational model, these
attribute values must be defined on the same domain.
Only one set of the compared attribute values occurs in the result of a join expression.
By convention, these attribute values are named from the first relation (that is,
<relation 1>). For example, Figure 15 results from the values of CounsellorNo in the Student
relation (<relation 1>) being compared with the values of StaffNo in the Staff relation
(<relation 2>), and so the compared values appear in a single column labelled CounsellorNo.
Note, however, that attributes not named in the where clause of a join expression are not
combined, so that in Figure 15, for example, two attribute names appear twice.
The possible duplication of attribute names in the result of a join is a minor problem.
At first sight, Figure 15 is not a relation, since the table's second and sixth columns have the
same name, as indeed do the table's fifth and seventh columns. We resolve this problem as
you might expect, by taking the attribute names in the result of a join as qualified attribute
names, with their qualifier being the name of their original relation. Thus the heading for
Figure 15, using qualified attribute names, is:-

(Student.StudentId, Student.Name, Student.Registered, Student.CounsellorNo, Student.Region,
Staff.Name, Staff.Region)

Hence, Figure 15, with these qualified attribute names, does depict a relation.
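The join convention described above (compared values kept once, under the name from the first relation, with every attribute name qualified by its original relation) can be sketched in Python. This is a minimal illustration under assumed conventions: relations are (heading, body) pairs, a hypothetical qualify helper prefixes each attribute name with its relation name, the hypothetical join helper drops the compared attribute of the second relation, and the tuples are invented rather than taken from Figure 15.

```python
def qualify(rel, relation_name):
    """Prefix every attribute name with its relation name, e.g. Student.Name."""
    heading, body = rel
    return (tuple(f"{relation_name}.{a}" for a in heading), body)

def join(r1, r2, attr1, attr2):
    """Pair tuples of r1 and r2 where attr1 = attr2; attr2 is dropped from the result."""
    (h1, b1), (h2, b2) = r1, r2
    i1, i2 = h1.index(attr1), h2.index(attr2)
    keep = [j for j in range(len(h2)) if j != i2]
    heading = h1 + tuple(h2[j] for j in keep)
    body = frozenset(
        t1 + tuple(t2[j] for j in keep)
        for t1 in b1 for t2 in b2
        if t1[i1] == t2[i2]
    )
    return (heading, body)

# Hypothetical extensions (invented values).
student = qualify((("StudentId", "Name", "Registered", "CounsellorNo", "Region"),
                   frozenset({("s90", "Adams", 1995, "1001", 3),
                              ("s91", "Baker", 1998, "1002", 4)})), "Student")
staff = qualify((("StaffNo", "Name", "Region"),
                 frozenset({("1001", "Jennings", 3), ("1002", "Singh", 4)})), "Staff")

result = join(student, staff, "Student.CounsellorNo", "Staff.StaffNo")
print(result[0])
```

The printed heading has seven qualified names, in the same order as the heading given above for Figure 15; Staff.StaffNo does not appear because its values are already held under Student.CounsellorNo.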

3.2.2 Using join with other operators


In conjunction with other operators, join can be used as a powerful aid to constructing an
expression to give a wide range of results. For example, suppose we want to know just the
names of staff who counsel students who were registered after 1996. First, we join Student
and Staff so that we have together all the data about the students and the staff who counsel
them:-

join Student and Staff where CounsellorNo = StaffNo giving A


Then, we select just those tuples that relate to students registered after 1996:-

select A where Registered > 1996 giving B


Finally, we project out just the attribute that is of interest to complete the query:-

project B over Staff.Name giving C


That is, we have the sequence of three linked expressions:-

join Student and Staff where CounsellorNo = StaffNo giving A
select A where Registered > 1996 giving B
project B over Staff.Name giving C
Note that we have used the qualified attribute name Staff.Name in the third expression.
This is necessary since the relation B contains two attributes that, unqualified, are both
labelled Name: one Name from the Student relation and another Name from the Staff relation.
We shall employ a general rule that qualified attribute names are used only where ambiguity
would arise without them. So, Registered in the second expression does not need
qualification since its use there is not ambiguous. Note also that, where a qualified attribute
name is used, the qualifier is the name of the original relation, rather than the name
of any temporary relation.
Rather than writing the above query as three linked expressions, it could have been written
as a single nested expression.
Starting with the last expression of the sequence, we write it down exactly as it appears
until we reach the name of a temporary relation. At this point, instead of that name, we
write a pair of brackets enclosing a large space, and finish off the final expression. Into the
bracketed space we then insert the expression from earlier in the sequence which defined
that temporary relation. If we encounter yet another temporary relation, we insert another
pair of brackets enclosing a space, and so on until all the intermediate temporary relations
have been replaced.
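Applying this procedure to the three linked expressions gives the nested expression project (select (join Student and Staff where CounsellorNo = StaffNo) where Registered > 1996) over Staff.Name. The Python sketch below checks that the linked and nested forms agree. It is a minimal illustration, assuming the (heading, body) model of a relation and hypothetical qualify, select, project and join helpers; the data are invented. Because the helper keys every row by qualified name, the select condition uses Student.Registered, whereas the notation in the text can leave the unambiguous Registered unqualified.

```python
def qualify(rel, name):
    heading, body = rel
    return (tuple(f"{name}.{a}" for a in heading), body)

def select(rel, condition):
    heading, body = rel
    return (heading, frozenset(t for t in body if condition(dict(zip(heading, t)))))

def project(rel, attrs):
    heading, body = rel
    pos = [heading.index(a) for a in attrs]
    return (tuple(attrs), frozenset(tuple(t[p] for p in pos) for t in body))

def join(r1, r2, attr1, attr2):
    """Equality join that keeps the compared values once, under attr1."""
    (h1, b1), (h2, b2) = r1, r2
    i1, i2 = h1.index(attr1), h2.index(attr2)
    keep = [j for j in range(len(h2)) if j != i2]
    return (h1 + tuple(h2[j] for j in keep),
            frozenset(t1 + tuple(t2[j] for j in keep)
                      for t1 in b1 for t2 in b2 if t1[i1] == t2[i2]))

# Hypothetical extensions (invented values).
student = qualify((("StudentId", "Name", "Registered", "CounsellorNo", "Region"),
                   frozenset({("s90", "Adams", 1995, "1001", 3),
                              ("s91", "Baker", 1998, "1002", 4),
                              ("s92", "Clark", 1999, "1002", 2)})), "Student")
staff = qualify((("StaffNo", "Name", "Region"),
                 frozenset({("1001", "Jennings", 3), ("1002", "Singh", 4)})), "Staff")

# Linked form: A, B and C are temporary relations.
A = join(student, staff, "Student.CounsellorNo", "Staff.StaffNo")
B = select(A, lambda row: row["Student.Registered"] > 1996)
C = project(B, ["Staff.Name"])

# Nested form: each temporary name replaced by the expression that defined it.
C_nested = project(
    select(join(student, staff, "Student.CounsellorNo", "Staff.StaffNo"),
           lambda row: row["Student.Registered"] > 1996),
    ["Staff.Name"])

print(C == C_nested, sorted(C[1]))  # True [('Singh',)]
```

Both Baker and Clark are counselled by Singh, yet Singh appears only once in C because the final project removes the duplicate.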
As another example of the power of the relational algebra, we shall consider the requirement
to list the names of the students who are taking 60-point courses. We shall develop the
query as a series of linked expressions. First, the tuples in Course which are 60-point courses
are identified:-

select Course where Credit= 60 giving A


These tuples are then joined to Enrolment in order to gain the StudentId values for the
students studying these courses:-

join A and Enrolment where CourseCode = CourseCode giving B


The tuples of B are then joined to Student in order to gain the Name values:-

join B and Student where StudentId = StudentId giving C


Finally, the Name values are projected out:-

project C over Name
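The four linked expressions can be traced in a Python sketch. This is a minimal illustration, assuming the (heading, body) model of a relation and hypothetical qualify, select, project and join helpers. The data are invented, and Enrolment and Student are cut down to the attributes this query needs, so they are not the full University relations.

```python
def qualify(rel, name):
    heading, body = rel
    return (tuple(f"{name}.{a}" for a in heading), body)

def select(rel, condition):
    heading, body = rel
    return (heading, frozenset(t for t in body if condition(dict(zip(heading, t)))))

def project(rel, attrs):
    heading, body = rel
    pos = [heading.index(a) for a in attrs]
    return (tuple(attrs), frozenset(tuple(t[p] for p in pos) for t in body))

def join(r1, r2, attr1, attr2):
    """Equality join that keeps the compared values once, under attr1."""
    (h1, b1), (h2, b2) = r1, r2
    i1, i2 = h1.index(attr1), h2.index(attr2)
    keep = [j for j in range(len(h2)) if j != i2]
    return (h1 + tuple(h2[j] for j in keep),
            frozenset(t1 + tuple(t2[j] for j in keep)
                      for t1 in b1 for t2 in b2 if t1[i1] == t2[i2]))

# Hypothetical, simplified extensions (invented values).
course = qualify((("CourseCode", "Title", "Credit"),
                  frozenset({("c1", "Databases", 60), ("c2", "Networks", 30),
                             ("c3", "Compilers", 60)})), "Course")
enrolment = qualify((("StudentId", "CourseCode"),
                     frozenset({("s90", "c1"), ("s91", "c2"), ("s92", "c3"), ("s90", "c3")})),
                    "Enrolment")
student = qualify((("StudentId", "Name"),
                   frozenset({("s90", "Adams"), ("s91", "Baker"), ("s92", "Clark")})), "Student")

A = select(course, lambda row: row["Course.Credit"] == 60)
B = join(A, enrolment, "Course.CourseCode", "Enrolment.CourseCode")
C = join(B, student, "Enrolment.StudentId", "Student.StudentId")
names = project(C, ["Student.Name"])
print(sorted(names[1]))  # [('Adams',), ('Clark',)]
```

Adams takes two 60-point courses and so appears in two tuples of C, but only once in the projected result.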

3.2.3 The divide operator
We have said that join 'pastes' relations together using shared attribute values. The divide
operator also processes two relations on the basis of shared attribute values. However, rather
than 'pasting', divide reduces one of the relations to a smaller relation, which contains only
those tuples for which the second relation was a 'factor'.
An immediate illustration will help. We shall use divide to list the student identifiers of
those students who are studying all of the new courses. To begin, we need to describe the two
relations that will be used in the division. The first relation, Studies, is shown in Figure 16.
It shows every course being studied by each student.

Figure 16: The Studies relation


You can think of Studies as being produced from the Enrolment relation by the expression:-

project Enrolment over StudentId, CourseCode giving Studies


The second relation, NewCourses, which will be used to divide the first, is shown in
Figure 17. NewCourses contains the CourseCode values for new courses.
We are now in a position to write the divide expression that will produce the student
identifiers of those students who are studying all of the new courses:-

divide Studies by NewCourses over CourseCode giving C

Figure 17: The NewCourses relation

The relation C that results from this expression is shown in Figure 18.

Figure 18: Students studying new courses


The result contains two values, s05 and s09, because only these StudentId values appear in
Studies together with all three values of CourseCode that are in NewCourses: c2, c4
and c7. The other StudentId values in Studies do not appear in C since they do not appear
in Studies together with all of the values of CourseCode that are in NewCourses.
A divide expression has the general form:-

divide <dividend relation> by <divisor relation> over <attribute>


The <dividend relation> is the one that will be reduced and it consists of at least two
attributes. The <divisor relation> must consist of a subset of the attributes of the <dividend
relation>. Since it is possible for both attributes of the <dividend relation> to be defined on
the same domain, the pair of attributes over which the division is to be performed is indicated
by giving, as <attribute>, the name of the attribute in the <dividend relation> which is to
be 'divided out' by the <divisor relation>.
If we consider the <dividend relation> to have the two attributes A and B and the <divisor
relation> to have the attribute C, where B is to be 'divided out' of the <dividend relation>
by C, then the result will be a relation with the single attribute A such that each value of A in
the result occurs in tuples of the <dividend relation> that contain values of B corresponding
to all of the values of C in the <divisor relation>.
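The definition above can be sketched in Python for the common case of a two-attribute dividend and a one-attribute divisor. This is a minimal illustration, not a general implementation: it assumes the (heading, body) model of a relation, the divide helper is hypothetical, and it rejects dividends with more than two attributes. The Studies data are invented so that, as in the text, only s05 and s09 study all of c2, c4 and c7.

```python
def divide(dividend, divisor, over):
    """Divide a two-attribute dividend by a one-attribute divisor over `over`.

    The result holds each value of the dividend's other attribute (A in the text)
    that occurs in the dividend together with every value in the divisor (C)."""
    heading, body = dividend
    _, divisor_body = divisor
    if len(heading) != 2:
        raise ValueError("this sketch handles only two-attribute dividends")
    i = heading.index(over)  # the attribute to be 'divided out' (B in the text)
    k = 1 - i                # the remaining attribute (A in the text)
    required = {t[0] for t in divisor_body}
    result = frozenset(
        (a,) for a in {t[k] for t in body}
        if required <= {t[i] for t in body if t[k] == a}  # subset test
    )
    return ((heading[k],), result)

# Hypothetical data (invented values).
studies = (("StudentId", "CourseCode"),
           frozenset({("s05", "c2"), ("s05", "c4"), ("s05", "c7"),
                      ("s09", "c1"), ("s09", "c2"), ("s09", "c4"), ("s09", "c7"),
                      ("s12", "c2"), ("s12", "c4")}))
new_courses = (("CourseCode",), frozenset({("c2",), ("c4",), ("c7",)}))

C = divide(studies, new_courses, "CourseCode")
print(sorted(C[1]))  # [('s05',), ('s09',)]
```

Student s09 also studies c1, which is not a new course; that does not prevent s09 from appearing in the result, because division only requires every divisor value to be present. Student s12 is excluded because it lacks c7.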

4 Updating
One of the purposes of the operators that make up the manipulative part of the relational
model was to provide a means of describing the insertion, deletion, amendment and retrieval
of data in a database. The relational operators that have been described so far provide the
means for describing only the retrieval of data. We have not yet seen how relational theory
provides for the updating of relations: the insertion, deletion and amendment of data.
Updating may be achieved by means of the relational assignment operator, :=. The relational
assignment operator allows any relation to be assigned to any other union-compatible
relation, with the tuples of one relation being replaced by a copy of the tuples of the other.
Two relations are union-compatible if they are of the same degree and there is a one-to-one
mapping between their attributes such that each pair of corresponding attributes is defined
on the same domain.
Thus:-

UpdatedRelation := RelationalExpression

assigns the result of RelationalExpression to UpdatedRelation, with the tuples of
UpdatedRelation being replaced by copies of the tuples of RelationalExpression (the extension
of the relational expression replacing that of the relation to be updated, with the headings
unchanged).
Using the relational assignment operator, union may be used to insert tuples into a relation.
For example,

Student := Student union <s11, Levinson, 1998, 3158, 3>

adds a tuple to Student. This shows the insertion of a single tuple, the values of which
have been explicitly specified. Of course, the second relation in the union may be any named
relation (and its cardinality can be greater than 1), provided it is union-compatible.
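The insertion can be mimicked in Python as a minimal sketch, assuming relations are (heading, body) pairs with a frozenset body, so that union is ordinary set union and rebinding the name student plays the part of :=. The union helper is hypothetical and checks only that the degrees match, a simplification of union-compatibility, since domains are not modelled. The Akeroyd and Levinson tuples come from the examples in this section; holding staff numbers as strings is an arbitrary choice.

```python
def union(r1, r2):
    """Union of two relations; the heading of r1 is kept.

    Simplified union-compatibility check: degree only, domains are not modelled."""
    (h1, b1), (h2, b2) = r1, r2
    if len(h1) != len(h2):
        raise ValueError("relations are not union-compatible")
    return (h1, b1 | b2)

student = (("StudentId", "Name", "Registered", "CounsellorNo", "Region"),
           frozenset({("s01", "Akeroyd", 1993, "3158", 3)}))
new_tuple = (student[0], frozenset({("s11", "Levinson", 1998, "3158", 3)}))

# Student := Student union <s11, Levinson, 1998, 3158, 3>
student = union(student, new_tuple)
print(len(student[1]))  # 2
```

Repeating the same insertion leaves the relation unchanged, since a set cannot hold the same tuple twice.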
In a similar vein, difference may be used to delete tuples from relations. For example:-

Student := Student difference <s11, Levinson, 1998, 3158, 3>

deletes a tuple from the Student relation. More powerfully, the linked expressions:

select Student where Region = 4 giving Region4Students


Student := Student difference Region4Students

delete the tuples of all the students in Region 4.
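The two linked expressions can be sketched in Python under the same assumptions: relations are (heading, body) pairs, and the select and difference helpers are hypothetical, with difference checking only the degree of its operands. The Akeroyd tuple comes from this section; the two Region 4 students are invented.

```python
def select(rel, condition):
    heading, body = rel
    return (heading, frozenset(t for t in body if condition(dict(zip(heading, t)))))

def difference(r1, r2):
    """Tuples of r1 that are not in r2 (degree-only compatibility check)."""
    (h1, b1), (h2, b2) = r1, r2
    if len(h1) != len(h2):
        raise ValueError("relations are not union-compatible")
    return (h1, b1 - b2)

student = (("StudentId", "Name", "Registered", "CounsellorNo", "Region"),
           frozenset({("s01", "Akeroyd", 1993, "3158", 3),
                      ("s90", "Adams", 1995, "1001", 4),    # hypothetical
                      ("s91", "Baker", 1998, "1002", 4)}))  # hypothetical

# select Student where Region = 4 giving Region4Students
region4_students = select(student, lambda row: row["Region"] == 4)
# Student := Student difference Region4Students
student = difference(student, region4_students)
print(sorted(student[1]))  # [('s01', 'Akeroyd', 1993, '3158', 3)]
```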


Amendment of tuples can be achieved by, first, deleting the tuples to be amended, and
then inserting the tuples in amended form. For example:-

Student := Student difference <s01, Akeroyd, 1993, 3158, 3>


Student := Student union <s01, Akeroyd, 1996, 3158, 3>

changes the Registered value of the student with a StudentId of s01 from 1993 to 1996.
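The delete-then-insert pattern can be sketched in Python, again as a minimal illustration assuming (heading, body) relations and hypothetical union and difference helpers that check only the degree of their operands. The s01 and s11 tuples are taken from the examples in this section.

```python
def _check_union_compatible(r1, r2):
    # Simplified check: degree only; domains are not modelled in this sketch.
    if len(r1[0]) != len(r2[0]):
        raise ValueError("relations are not union-compatible")

def union(r1, r2):
    _check_union_compatible(r1, r2)
    return (r1[0], r1[1] | r2[1])

def difference(r1, r2):
    _check_union_compatible(r1, r2)
    return (r1[0], r1[1] - r2[1])

heading = ("StudentId", "Name", "Registered", "CounsellorNo", "Region")
student = (heading, frozenset({("s01", "Akeroyd", 1993, "3158", 3),
                               ("s11", "Levinson", 1998, "3158", 3)}))

# Student := Student difference <s01, Akeroyd, 1993, 3158, 3>
student = difference(student, (heading, frozenset({("s01", "Akeroyd", 1993, "3158", 3)})))
# Student := Student union <s01, Akeroyd, 1996, 3158, 3>
student = union(student, (heading, frozenset({("s01", "Akeroyd", 1996, "3158", 3)})))

registered = {t[0]: t[2] for t in student[1]}
print(registered["s01"])  # 1996
```

The deletion must match the old tuple exactly; if any value in the deleted tuple were wrong, difference would remove nothing and the union would leave two tuples for s01.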

5 Practical Manipulation Using RAS
This section is devoted to practical work, and you will use your computer to run software,
to illustrate and explore some of the features of relational theory which you have studied
in previous sections. The software is directed at the manipulative part of relational theory
and allows you to execute relational algebra expressions against selected relational databases.
The software is called the relational algebra system, or simply RAS.
Before you go on to study RAS in detail, we should emphasise that it is a piece of
educational software built to demonstrate aspects of relational theory, such as the select,
project and join operators. In particular, it is not meant to be a realistic relational DBMS
and we would not wish you to judge RAS against the criteria appropriate for such products;
its intentions are quite different.

5.1 An overview of RAS

RAS permits access to data described by the University relational model and part of the
Hospital relational model, and provides a facility for displaying these models. RAS also allows
the extensions of the relations in each of the models to be examined in detail.
RAS supports manipulation via the relational algebra described in Section 3, with the
proviso that each expression in RAS must involve only a single relational operator. RAS
therefore does not support nested expressions, and so you will need to make use of linked
relational algebra expressions (linked by means of the giving construct).
First you will explore some of the features of RAS, and then the activity concentrates on
executing example relational algebra expressions similar to those you studied in Section 3.
For this activity you will need your computer, and you will need first to have completed the
RAS installation.
Into the input window you can type this command sequence:-
select Student where Registered > 1996 giving RecentStudents;
display RecentStudents;

In a moment, you can execute this sequence, but first just look at its structure.
There are two separate commands: a select command, which ends at the first semicolon,
and a display command, which ends at the second.
Look at the first command. The command is an instruction to do some selection of tuples
from a relation, according to some condition, and use the result of this selection to give a
temporary relation called RecentStudents. That is, find all students registered after 1996.
The first command merely does the selection; if you need to see the resulting relation,
you need to display it, which is just what the second command does.
Now execute this command sequence. From the Execute menu choose the menu item
All commands. The result will be shown in the output window, and you can see the relation
produced by scrolling through it. The data looks very similar to the data you have been
studying in the text, except that RAS lists the attributes in the heading vertically
rather than horizontally. Notice that the heading of RecentStudents is given using qualified
attribute names, using the dot notation, such as Student.StudentId.
Before going any further with specific commands you need to take a look at the model
and the database. We shall give you several commands for this but they will be executed
singly, so there is no need to remove old commands from the input window until you need to.
Type the following command and execute it:
show model;
If you scroll through the output, you should see that this is very similar to the University
model. This command shows the model but not any of the tuples. If you do not need to see
the whole model you may choose to see just the domains or individual relations.
For example, try typing and executing each of the following commands separately:-
show domains;
show Student;

You have seen how you can examine the model; what about the tuples that make up the
model data? You can list these by typing and executing the command:-
display model data;

You should see a display of all the relations, their attributes and all the tuples contained
in the University database as they stand at the moment. Note that this command used the
word 'display' rather than 'show'. In RAS you use 'show' to obtain a listing of the structural
elements of the database model, domains, and so on; while you use 'display' to see a collection
of tuples.
Looking at the output, you can see that RAS has listed all five of the permanent relations
of the model and also, for convenience, the temporary relation RecentStudents that you
created. This is done in a standard way: first the definition of the relation (the heading),
and then the tuples in that relation (the extension), in headed columns.
So Student has the familiar attributes of:-

StudentId, Name, Registered, CounsellorNo, Region

and the primary key is underlined in the usual way.


Notice also, in the extension of the Enrolment relation, the presence of nulls in the last
three tuples. These nulls indicate that an Enrolment entity can exist without being related
to a Staff entity via the Tutors relationship. That is to say, student s46 can be enrolled for
the three courses c3, c2 and c4 without yet having a tutor assigned to him.
Now let us try a slightly more complex command using a combination of conditions:-
select Student where Registered > 1996 and StudentId <> 's09' giving Temp-A;
display Temp-A;

Before you execute these commands, notice how two simple conditions have been linked
by and, and how it is necessary to place non-numeric values, like the student identifier 's09',
in single quotes. Execute these commands and you will see the effect of the extra condition,
StudentId not equal to 's09': in the resulting display, no tuple for student s09 appears.
The next RAS command uses the project operator:-
project Student over Name giving Temp-C;
display Temp-C;

Before you execute these commands, look back to the Student relation in the output
window. 'Patel' occurs as the value for the Name attribute of two separate tuples s57 and
s38. So, what do you expect project to do when it produces a relation consisting of names
alone?
Execute the commands and you will see that RAS has ignored duplicate Name values in
producing the Temp-C relation, just as you would expect.
Of course, project can be used to include more than just one attribute. You can see this
in the next command:-
project Course over Title, Credit giving Temp-C;
display Temp-C;

Notice that these commands use the same name for the temporary relation, Temp-C, as
the previous ones. What effect might this have? Execute them now to see the result.
These commands have replaced the old version of Temp-C with a new relation having the
same name. This illustrates the general point that temporary relations, which exist only for
the duration of a RAS session, can be replaced by an entirely different relation at any point
in that session.
You can use combinations of RAS commands together, linking them via temporary
relations. You can see this in the next command sequence:-
select Student where Registered > 1996 giving RecentStudents;
display RecentStudents;
project RecentStudents over Name giving JustNames;
display JustNames;

Before you execute these four commands, spend a moment thinking about what they will
do. So exactly what do you expect to see when the final display is executed? When you have
decided what to expect, execute the whole set of four commands.

5.1.1 Using the join operator


Now let us look at the join operator. This is the last operator that you will be using in the
introductory section, although there are several others available in RAS.
Before doing the join it would be useful to display both the Student relation and the Staff
relation, so do that first using two suitable commands. You can see that some attribute
values are shared between the two relations, although the attributes have different names.
In the Student relation, the attribute CounsellorNo is the number of a member of staff, and
those same values appear in the Staff relation as StaffNo values.
These are the attributes that you will join together:-
join Student and Staff where CounsellorNo = StaffNo giving StudentWithCounsellors;
display StudentWithCounsellors;

This syntax assumes that the first-named attribute, CounsellorNo, is found in the
first-named relation, Student, and that the second attribute, StaffNo, is found in the second
relation, Staff. Now execute these commands and check that the result is as expected.
Notice the labelling of the attributes in the heading of the resulting relation, which we
have given the name StudentWithCounsellors. This is where qualified attribute names are
important: they tell you the original relation that the attributes came from.
In order to save space, only the unqualified names have been given in the column headings,
but you can always check by looking above at the qualified names in the vertical heading. So
the second column heading Name is an abbreviation for the second qualified attribute in the
heading, Student.Name. The sixth column is also headed Name, but that corresponds to the
sixth qualified attribute in the heading, Staff.Name.
Of course, in a joined relation, there is one set of attribute values that came from both
of the original relations: the values for which the comparison for equality evaluated to true.
Here, the single example of this is labelled as Student.CounsellorNo because it is conventional
in RAS to give compared values the first of their possible labels from the join command. That
is just the way we described it in Section 3.
Also, in Section 3, there was a query that you are now going to answer using a join
together with some of the other operators. The query is intended to provide the names of
staff who counsel students registered after 1996. That uses a join, a select and a project, in
that order. To see the commands, type:-
join Student and Staff where CounsellorNo = StaffNo giving StudentWithCounsellors;
display StudentWithCounsellors;
select StudentWithCounsellors where Registered > 1996 giving RecentOnes;
display RecentOnes;
project RecentOnes over Staff.Name giving JustCounsellorNames;
display JustCounsellorNames;

Notice that we have inserted a display command after every command involving a
relational operator. Apart from the last of these, which is necessary so that you can see the
results, these display commands are not needed for the successful execution of the query. We
have included them so that you can see the intermediate results in this introductory session.
In fact, this is often a useful technique while you are developing a string of commands, so
that you can check what relation is produced at each step.
Finally, before you execute these commands, notice the project command in the last
command but one. There, a qualified attribute name, Staff.Name, is needed because without
the qualification, the command would be ambiguous: Name could be either Student.Name
or Staff.Name.
You can now execute these commands, study the output and convince yourself that you
understand exactly what each command is achieving.
You should now be able to formulate and execute your own commands and use RAS to
produce the outputs.

6 Constraints
This section examines the constraints part of relational theory, which allows constraint
definitions to be recorded in a relational model. A constraint declaration constrains the
operations that may be performed on relations, so that at all times the data is consistent
according to the model. These constraints may be used to forbid the insertion of a new
invalid tuple or the deletion of a tuple that is required to be retained for consistency of the
model. Constraint definitions take a variety of forms depending on the semantics that are to
be represented. This variety can be represented by three main types of constraint: key
constraints, attribute constraints and general constraints.

6.1 Key constraints

From Subsections 2.4 and 2.5, you should be aware of the part played by the various key
declarations in a relational model (e.g. primary keys, alternate keys, foreign keys) in asserting
semantics. We begin this section on key constraints by summarising the semantic significance
of such key declarations and then continue by describing the appropriate constraint definitions.

6.1.1 Primary keys


The declaration of the primary key of a relation is one form of constraint definition for
candidate keys. The University relational model has a primary key declaration for each
relation. For example, the following is a portion of that model:-

relation Staff
StaffNo: StaffNumbers
Name: PersonNames
Region: Regions
primary key StaffNo

This last line, the declaration of the primary key StaffNo, is an example of a constraint
definition, because it constrains the set of valid states of the University relational database.
Specifically, the Staff relation cannot contain two tuples which have the same value for
the StaffNo primary key: the value of the StaffNo primary key is the means by which the
uniqueness of tuples in Staff is guaranteed.

Furthermore, the StaffNo primary key in a Staff tuple must always have a value; otherwise
the basis for guaranteeing the uniqueness of tuples in Staff is undermined. This will
be true for any relation, not just Staff. The primary key of any relation must always have a
value. This property of all primary keys in all relations is reflected in a general rule of
relational theory, known as the entity integrity rule: no attribute that forms part of the
primary key of a relation is allowed to be null.
Null is here taken to indicate that a value is missing.
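The two primary key constraints (uniqueness and entity integrity) can be expressed as a check over a relation's extension. The Python sketch below is a minimal illustration, not how any DBMS enforces keys: it assumes the (heading, body) model of a relation, models null as Python None, and the primary_key_violations helper and the Staff tuples are hypothetical.

```python
def primary_key_violations(rel, key):
    """Report tuples that break entity integrity (a null key part) or key uniqueness.

    Null is modelled as Python None, an assumption of this sketch."""
    heading, body = rel
    positions = [heading.index(a) for a in key]
    seen, problems = set(), []
    for t in sorted(body, key=repr):  # sort only to make the report order stable
        value = tuple(t[p] for p in positions)
        if any(v is None for v in value):
            problems.append(("null key", t))
        elif value in seen:
            problems.append(("duplicate key", t))
        seen.add(value)
    return problems

# Hypothetical Staff extension that breaks both rules.
staff = (("StaffNo", "Name", "Region"),
         frozenset({("1001", "Jennings", 3), ("1001", "Singh", 4), (None, "Brown", 2)}))

for kind, t in primary_key_violations(staff, ["StaffNo"]):
    print(kind, t)
```

The key is passed as a list so that a composite primary key, such as one made of two attributes, is checked in the same way: a null in any part of it breaks entity integrity.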

6.1.2 Alternate keys


The declaration of an alternate key in a model also imposes a constraint but it is a less
demanding constraint than the primary key constraint.
We shall use the GeneralPractitioner relation as an example. GeneralPractitioner has a
heading of:-

GeneralPractitioner (GPId, GPName, SecId, SecName)

with candidate keys of GPId and SecId. Assuming that we choose GPId as the primary
key, then GeneralPractitioner may be declared in a relational model as:-

relation GeneralPractitioner
GPId: IdentifiersOfGPs
GPName: PersonNames
SecId: IdentifiersOfSecs
SecName: PersonNames
primary key GPId
alternate key SecId

The declaration of the alternate key in the last line is another example of a constraint
definition because, like the primary key declaration, it constrains the set of valid states of
the GeneralPractitioner relation. Specifically, the GeneralPractitioner relation cannot contain
two tuples which have the same value for the SecId attribute. However, unlike the primary key
declaration, an alternate key declaration is not the means by which the uniqueness of tuples
in a relation is guaranteed. In particular, an alternate key may, depending on the semantics
being represented, be null. That is to say, entity integrity does not apply to alternate keys,
because the uniqueness of tuples is already guaranteed by the primary key.
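An alternate key check differs from the primary key check in one respect: nulls may be acceptable. The Python sketch below is a minimal illustration, assuming the (heading, body) model of a relation, null modelled as Python None, and that two nulls never count as duplicates of each other. The alternate_key_violations helper, its allow_null flag and the GeneralPractitioner tuples are all hypothetical.

```python
def alternate_key_violations(rel, attr, allow_null=True):
    """Non-null values of an alternate key must be unique; nulls (None) are
    reported only when the key is declared as not allowed to be null."""
    heading, body = rel
    i = heading.index(attr)
    values = [t[i] for t in body]
    problems = set()
    for v in values:
        if v is None:
            if not allow_null:
                problems.add(f"null {attr}")
        elif values.count(v) > 1:
            problems.add(f"duplicate {attr} {v}")
    return sorted(problems)

heading = ("GPId", "GPName", "SecId", "SecName")
# Hypothetical data: two GPs without a secretary.
gp = (heading, frozenset({("gp1", "Shaw", "sec1", "Moore"),
                          ("gp2", "Ng", None, None),
                          ("gp3", "Roe", None, None)}))

print(alternate_key_violations(gp, "SecId"))                    # []
print(alternate_key_violations(gp, "SecId", allow_null=False))  # ['null SecId']
```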

6.1.3 Foreign keys and referential integrity


You are already very familiar with foreign key declarations. For example, here is a portion of
the University relational model:-

relation Student
StudentId: IdentifiersOfStudents
Name: PersonNames
Registered: Years
CounsellorNo: StaffNumbers
Region: Regions
primary key StudentId
foreign key CounsellorNo references Staff

The declaration of CounsellorNo as a foreign key constrains the set of valid states of the
Student relation. Specifically, a value of CounsellorNo in a Student tuple cannot be just any
value: if it has a value, then it must be the same as the value of StaffNo in
some tuple in Staff.
Hence we have another general rule of relational theory, known as the referential integrity
rule:-

If a relation, R2, has a foreign key, F, that references the primary key, P, in another
relation, R1, then every R2.F entry must either be a value equal to an R1.P primary key value
or be null.

As with entity integrity, null is taken to indicate that a value is missing. Referential
integrity is a rule that must hold for any set of relations representing a relational model. It
enforces the semantics that a foreign key value must always match some existing value of the
primary key which it references. Whether a given foreign key is allowed to be null depends on
the particular semantics that are being represented.
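The referential integrity rule translates directly into a check over two extensions. The Python sketch below is a minimal illustration, assuming the (heading, body) model of a relation and null modelled as Python None; the referential_violations helper and the Staff and Student tuples are hypothetical. Whether a null foreign key is acceptable is a separate question, covered by the attribute constraints discussed later.

```python
def referential_violations(child, foreign_key, parent, primary_key):
    """Return child tuples whose foreign key is neither null (None) nor equal to
    some primary key value in the parent relation."""
    (hc, bc), (hp, bp) = child, parent
    i, j = hc.index(foreign_key), hp.index(primary_key)
    keys = {t[j] for t in bp}
    return sorted((t for t in bc if t[i] is not None and t[i] not in keys), key=repr)

staff = (("StaffNo", "Name", "Region"),
         frozenset({("1001", "Jennings", 3), ("1002", "Singh", 4)}))
student = (("StudentId", "Name", "Registered", "CounsellorNo", "Region"),
           frozenset({("s90", "Adams", 1995, "1001", 3),
                      ("s91", "Baker", 1998, None, 4),     # null: permitted by the rule
                      ("s92", "Clark", 1999, "9999", 2)})) # no Staff tuple has 9999

print(referential_violations(student, "CounsellorNo", staff, "StaffNo"))
```

Only the s92 tuple is reported: its CounsellorNo has a value, but that value matches no StaffNo.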

6.2 Attribute constraints

Attribute constraint definitions constrain the set of valid states of a relational database by
constraining the values an attribute can take. You have already met an important type of
attribute constraint definition, namely the domain definition, which constrains an attribute
to take values from the specified domain.
We shall consider here one other type of attribute constraint definition: those concerned
with null.

6.2.1 Allowing Null


Null indicates the absence of a value drawn from the domain on which an attribute is
defined. Whether or not an attribute is allowed to be null is an important attribute constraint.
Therefore, in principle, we should extend each attribute declaration in a relational model with
a constraint definition that states whether or not null is allowed for the attribute. However,
we shall assume that a policy is employed whereby the absence of a null constraint definition
is taken to mean that null is allowed for the attribute. Thus, for example, given that every
student must have a name and that such a name must be known in order to record the
student in the University relational database, we might amend the declaration for the
attribute Name in the University relational model as follows:-
relation Student
StudentId: IdentifiersOfStudents
Name: PersonNames not allowed null
Registered: Years
CounsellorNo: StaffNumbers
Region: Regions
primary key StudentId
foreign key CounsellorNo references Staff

The presence of a not allowed null constraint definition as part of the Name declaration
constrains the valid states of the University relational database. Specifically, the Name
attribute in a Student tuple cannot be null. The absence of a not allowed null constraint
definition is generally taken to mean that such attributes may be null without placing the
database in an invalid state.
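Under the default policy, only attributes explicitly declared not allowed null need checking. The Python sketch below is a minimal illustration of that policy, assuming the (heading, body) model of a relation and null modelled as Python None. The null_violations helper and the Student tuples are hypothetical, and the set {"Name"} stands in for the declaration above.

```python
def null_violations(rel, not_allowed_null):
    """List (attribute, tuple) pairs where a 'not allowed null' attribute is None.

    Attributes not named in not_allowed_null may be null (the default policy)."""
    heading, body = rel
    return sorted(((a, t) for t in body for a in not_allowed_null
                   if t[heading.index(a)] is None), key=repr)

student = (("StudentId", "Name", "Registered", "CounsellorNo", "Region"),
           frozenset({("s90", "Adams", 1995, "1001", 3),
                      ("s91", None, 1998, "1002", 4),     # Name missing: invalid
                      ("s92", "Clark", 1999, None, 2)}))  # CounsellorNo null: allowed here

print(null_violations(student, {"Name"}))
```

The s92 tuple passes because the declaration above places no null constraint on CounsellorNo.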
There are three important exceptions to this policy of generally allowing null: primary
keys, alternate keys and foreign keys.
The case of primary keys is simple: they cannot be null, otherwise entity integrity is
violated. Consequently, an attribute that is part of a primary key does not require a not
allowed null constraint definition because the constraint always applies.
Unlike primary keys, alternate keys may or may not be allowed to be null, depending on
the semantics that are being represented. However, it is good practice to make an exception
to the default policy by insisting that an explicit declaration is made as to whether or not null
is allowed in the case of an alternate key. For example, with the GeneralPractitioner relation,
the alternate key SecId may be allowed to be null, given that a general practitioner may not
always have a secretary. That is, the GeneralPractitioner relation is declared as follows:-

relation GeneralPractitioner
GPId: IdentifiersOfGPs
GPName: PersonNames
SecId: IdentifiersOfSecs
SecName: PersonNames
primary key GPId
alternate key SecId allowed null

On the other hand, the ModifiedStaff relation below can be seen as an example in which
the alternate key is not allowed to be null:-

relation ModifiedStaff
StaffNo: StaffNumbers
Name: PersonNames
Region: Regions
NationalInsuranceNumber: NationalInsuranceNumbers
primary key StaffNo
alternate key NationalInsuranceNumber not allowed null
The intention here is that a member of staff cannot be entered into the database unless
their National Insurance Number is known. (The clause not allowed null could equally well
have been appended to the attribute declaration of NationalInsuranceNumber in the same
way as for Name in the Student relation above.)
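A plausible SQL counterpart combines UNIQUE with NOT NULL on the alternate key column. The sketch below assumes SQLite; the helper add_staff and all sample values (staff numbers, names, National Insurance Numbers) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ModifiedStaff (
        StaffNo                 TEXT NOT NULL PRIMARY KEY,
        Name                    TEXT,
        Region                  TEXT,
        NationalInsuranceNumber TEXT NOT NULL UNIQUE  -- alternate key, not allowed null
    )
""")


def add_staff(staff_no, name, region, ni_number):
    """Hypothetical helper: return True if the tuple is accepted, False if rejected."""
    try:
        conn.execute(
            "INSERT INTO ModifiedStaff VALUES (?, ?, ?, ?)",
            (staff_no, name, region, ni_number),
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    add_staff("3158", "Pat Jones", "R1", "AB123456C"),  # accepted
    add_staff("4265", "Sam Reid", "R2", None),          # rejected: NI number unknown
    add_staff("5212", "Lee Ford", "R1", "AB123456C"),   # rejected: duplicate NI number
]
print(results)  # [True, False, False]
```

The two rejections come from different constraints: NOT NULL enforces "not allowed null", while UNIQUE enforces the key property itself.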
The situation with foreign keys is similar to that with alternate keys in that, generally,
a foreign key may or may not be allowed to be null but an explicit declaration is needed to
assert the particular semantics. We have already seen the importance of foreign keys in the
representation of a relationship, including the degree of a relationship. Here, we show that
whether or not a foreign key is allowed to be null may represent more of the semantics of
relationships: specifically, the participation condition for an entity on the :n side of a 1:n
relationship.
We shall proceed by considering Figure 19 of the University E-R model.

Figure 19: University E-R Model


Consider the Staff and Student entity types and the Counsels relationship. The Counsels
relationship is mandatory with respect to the Student entity type, since each student must
be related to a member of staff via the Counsels relationship. In terms of the primary
key/foreign key mechanism, this mandatory participation condition for Counsels with
respect to Student means that the foreign key Student.CounsellorNo, which references
Staff.StaffNo, must not be null. This constraint, coupled with referential integrity,
ensures that each Student tuple is always related to a Staff tuple. Such a constraint may be
declared in the relational model with a suitable modification to the foreign key declaration,
for example:-

foreign key CounsellorNo references Staff not allowed null
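The two halves of this guarantee (no null, and referential integrity) can be seen separately in an SQL sketch. It assumes SQLite, which only enforces foreign keys when PRAGMA foreign_keys is switched on for the connection; the trimmed-down Staff and Student tables, the helper add_student and all sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.execute("CREATE TABLE Staff (StaffNo TEXT NOT NULL PRIMARY KEY, Name TEXT)")
conn.execute("""
    CREATE TABLE Student (
        StudentId    TEXT NOT NULL PRIMARY KEY,
        Name         TEXT NOT NULL,
        CounsellorNo TEXT NOT NULL REFERENCES Staff (StaffNo)  -- mandatory participation
    )
""")
conn.execute("INSERT INTO Staff VALUES ('3158', 'Pat Jones')")


def add_student(student_id, name, counsellor_no):
    """Hypothetical helper: return True if the Student tuple is accepted."""
    try:
        conn.execute(
            "INSERT INTO Student VALUES (?, ?, ?)", (student_id, name, counsellor_no)
        )
        return True
    except sqlite3.IntegrityError:
        return False


outcomes = {
    "existing counsellor": add_student("s01", "Ann Smith", "3158"),
    "no counsellor": add_student("s02", "Bob Lynch", None),       # NOT NULL fails
    "unknown counsellor": add_student("s03", "Cy Moore", "9999"),  # referential integrity fails
}
print(outcomes)
# {'existing counsellor': True, 'no counsellor': False, 'unknown counsellor': False}
```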


Implicitly, any attribute may be null if not explicitly constrained, but we can choose to
make the optionality of a relationship explicit by an appropriate declaration. Allowing a
foreign key to be null could be expressed in a relational model with a suitable modification
to the foreign key declaration, for example:-

foreign key TutorNo references Staff allowed null
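For an optional relationship, the SQL column simply omits NOT NULL: a null foreign key value needs no matching parent, but a non-null value must still reference an existing Staff tuple. In this SQLite sketch we assume, for illustration only, that TutorNo is an attribute of Student; the helper add_student and all sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce REFERENCES
conn.execute("CREATE TABLE Staff (StaffNo TEXT NOT NULL PRIMARY KEY, Name TEXT)")
conn.execute("""
    CREATE TABLE Student (
        StudentId TEXT NOT NULL PRIMARY KEY,
        Name      TEXT NOT NULL,
        TutorNo   TEXT REFERENCES Staff (StaffNo)  -- optional participation: null allowed
    )
""")
conn.execute("INSERT INTO Staff VALUES ('5212', 'Lee Ford')")


def add_student(student_id, name, tutor_no):
    """Hypothetical helper: return True if the Student tuple is accepted."""
    try:
        conn.execute("INSERT INTO Student VALUES (?, ?, ?)", (student_id, name, tutor_no))
        return True
    except sqlite3.IntegrityError:
        return False


with_tutor = add_student("s01", "Ann Smith", "5212")
without_tutor = add_student("s02", "Bob Lynch", None)  # null: no Staff tuple required
bad_tutor = add_student("s03", "Cy Moore", "0000")     # non-null value must match a Staff tuple
print(with_tutor, without_tutor, bad_tutor)  # True True False
```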
