Student Notebook
ERC 2.0
Trademarks
IBM® is a registered trademark of International Business Machines Corporation.
The following are trademarks of International Business Machines Corporation in the United
States, or other countries, or both:
DB2 DB2 Universal Database RACF
z/OS 400
Other company, product, and service names may be trademarks or service marks of
others.
The information contained in this document has not been submitted to any formal IBM test and is distributed on an “as is” basis without
any warranty either express or implied. The use of this information or the implementation of any of these techniques is a customer
responsibility and depends on the customer’s ability to evaluate and integrate them into the customer’s operational environment. While
each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will
result elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk. The original
repository material for this course has been certified as being Year 2000 compliant.
© Copyright International Business Machines Corporation 2000, 2002. All rights reserved.
This document may not be reproduced in whole or in part without the prior written permission of IBM.
Note to U.S. Government Users — Documentation related to restricted rights — Use, duplication or disclosure is subject to restrictions
set forth in GSA ADP Schedule Contract with IBM Corp.
V1.2.2
Contents
Trademarks
Agenda
Index
Purpose
This course presents a methodology for modeling and designing relational databases.
Audience
People responsible for designing relational databases and people who need an in-depth
understanding of data modeling.
Prerequisites
The course does not require any special prerequisites.
Objectives
After completing this course, you should be able to:
• Design relational databases.
• Consider logical and physical aspects including integrity requirements during the design.
Contents
This course covers the following major topics:
• Relational concepts
• Views and results during database design
• Problem statement
• Entity-relationship modeling
• Data and process inventories
• Tuple types
• From tuple types to tables
• Integrity rules
• Indexes
• Logical data structures and views
Agenda
Day 1
Relational Concepts
Views and Results During Design
Problem Statement
Exercises: Problem Statement
Review Exercises: Problem Statement
Entity-Relationship Model (Part 1)
Day 2
Entity-Relationship Model (Part 2)
Exercises: ER Model
Review Exercises: ER Model
Data and Process Inventories
Exercises: Data and Process Inventories
Day 3
Review Exercises: Data and Process Inventories
Tuple Types
Exercises: Tuple Types
Review Exercises: Tuple Types
From Tuple Types to Tables
Exercises: From Tuple Types to Tables
Day 4
Review Exercises: From Tuple Types to Tables
Integrity Rules
Exercises: Integrity Rules
Review Exercises: Integrity Rules
Indexes
Exercises: Indexes
Review Exercises: Indexes
Logical Data Structures
Unit Objectives
Notes:
The relational data model describes the conceptual representation of the data objects of
relational databases and gives guidelines for their implementation. In this unit, we will
discuss the main relational data object, the table, and some of the guidelines applicable to
the implementation of tables.
Conceptually, all data in relational databases is stored in tables. Also, when data is
presented to a user externally, it has the appearance of a table. Tables consist of rows and
columns as we will discuss in this unit.
In this unit, we will also discuss guidelines of the relational data model pertaining to the
uniqueness of rows and columns, the physical ordering of rows and columns, i.e., the
stored sequence of rows and columns, and the linkage of tables. The discussions will
emphasize the implications of these guidelines for the design and processing of tables.
Components of Tables
AIRCRAFT_MODEL

TYPE  MODEL  CATEGORY  MANUFACTURER  ENGINES
A340  100    JET       AIRBUS        4
B737  500    JET       BOEING        2
B737  700    JET       BOEING        2
A320  200    JET       AIRBUS        2

Callouts on the visual: COLUMN, ROW, VALUE, FIELD
Notes:
Tables are the main data object described by the relational data model. Conceptually, all
data of relational databases is stored in tables. Also, all data returned to a user is
presented in the form of a table.
Structurally, as with tables in books or newspapers, a table is subdivided into rows and
columns. Horizontally, a table is subdivided into rows. The data stored in a row is logically
related and belongs to a single object, such as a person or an aircraft model. Conversely,
the data for a single object is stored in a single row.
You can compare rows to records in flat files for regular access methods or to segments in
hierarchical databases. From the access method's point of view, records are unstructured.
In contrast, from the database management system's perspective, rows are structured.
Their structure is determined by the columns of the table.
Vertically, a table is subdivided into columns. All data stored in a column has the same
semantic meaning and is of the same type. Columns have names. You can define the
name of a column and should choose it so that it expresses the meaning of the
column.
The columns of a table subdivide the rows of the table into fields. The fields are the actual
receptacles for the data stored in a table. All rows of a table are subdivided in the same
manner, i.e., have the same columns in the same order.
A field may or may not contain data. The data in a field is also referred to as the value of
the field or the value of the column for the appropriate row. From the relational database
management system's point of view, the data in a field is atomic and unstructured. This
means that, from the relational database management system's point of view, a field
contains a single value. This does not prevent the relational database management
system from offering (column) functions that allow you to further manipulate the data of a
column.
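The terms above can be tried out in a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for a relational database management system; the column data types are assumptions, since the visual shows only names and values.

```python
import sqlite3

# In-memory database; SQLite is only a stand-in for the target system.
conn = sqlite3.connect(":memory:")

# The column data types are assumed; the visual does not state them.
conn.execute(
    """
    CREATE TABLE AIRCRAFT_MODEL (
        TYPE         TEXT,
        MODEL        TEXT,
        CATEGORY     TEXT,
        MANUFACTURER TEXT,
        ENGINES      INTEGER
    )
    """
)

# Each tuple is one row; each element of a tuple fills one field.
rows = [
    ("A340", "100", "JET", "AIRBUS", 4),
    ("B737", "500", "JET", "BOEING", 2),
    ("B737", "700", "JET", "BOEING", 2),
    ("A320", "200", "JET", "AIRBUS", 2),
]
conn.executemany("INSERT INTO AIRCRAFT_MODEL VALUES (?, ?, ?, ?, ?)", rows)

# Retrieve one row and pair every column name with the value of its field.
cursor = conn.execute("SELECT * FROM AIRCRAFT_MODEL WHERE TYPE = 'A340'")
column_names = [description[0] for description in cursor.description]
row = cursor.fetchone()
fields = dict(zip(column_names, row))

print(fields)
```

The dictionary pairs each column name with the single value held by the corresponding field of the selected row.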
Column names must be unique

AIRCRAFT

SERIAL_NUMBER  ACQUIRED    ENGINE    ENGINE    ENGINE    ENGINE
B238725737     1994-07-21  P0102313  P0102314
B238768737     1997-05-12  R0942497  R0942498
B167029747     1992-10-20  G0015237  G0015240  G0025635  G0025678
A11599320      1994-02-19  R0307023  R0307025
A11599320      1994-02-19  R0307023  R0307025
A203623340     1996-08-01  R0346723  R0346724  R0346743  R0346744
Notes:
In contrast to records, which are always retrieved in their entirety, you need not retrieve all
columns of the rows. You can select particular columns by providing their names. You can
also change only selected columns of the rows of a table. For this reason, the relational
data model requires that all column names of a table be unique. Thus, in the example on
the visual, you cannot have four columns with the name ENGINE. If you need all four
columns, you must name them differently. In the example, the four columns have been
renamed to ENGINE_1, ENGINE_2, ENGINE_3, and ENGINE_4.
In many cases, if you have naming conflicts for columns of a table, the semantics of the
conflicting columns has not been defined sufficiently. By better defining the meaning of the
columns, you may find different, more meaningful, names for the columns as is the case for
the illustrated example. (The engines of an aircraft are generally referred to as Engine 1,
Engine 2, and so on.)
From a design point of view, the illustrated solution may not even be the desirable solution.
What happens, for example, if new aircraft models are introduced whose aircraft have
more than four engines? This will be discussed later in the course.
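The uniqueness rule for column names can be observed in a small sketch. It assumes Python's built-in sqlite3 module as a stand-in system and assumed column data types; other relational database management systems report the conflict with their own messages.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two columns named ENGINE in one table: the system rejects the definition.
try:
    conn.execute(
        "CREATE TABLE AIRCRAFT "
        "(SERIAL_NUMBER TEXT, ACQUIRED TEXT, ENGINE TEXT, ENGINE TEXT)"
    )
    duplicate_rejected = False
except sqlite3.OperationalError:
    duplicate_rejected = True

# Renamed columns, as described in the notes (data types are assumed).
conn.execute(
    """
    CREATE TABLE AIRCRAFT (
        SERIAL_NUMBER TEXT,
        ACQUIRED      TEXT,
        ENGINE_1      TEXT,
        ENGINE_2      TEXT,
        ENGINE_3      TEXT,
        ENGINE_4      TEXT
    )
    """
)
conn.execute(
    "INSERT INTO AIRCRAFT VALUES (?, ?, ?, ?, ?, ?)",
    ("B238725737", "1994-07-21", "P0102313", "P0102314", None, None),
)

# Because the names are unique, particular columns can be selected by name.
selected = conn.execute("SELECT SERIAL_NUMBER, ENGINE_2 FROM AIRCRAFT").fetchall()

print(duplicate_rejected, selected)
```

The unused engine columns of a twin-engine aircraft simply stay empty here, which hints at the design question raised above.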
Just as you can retrieve or update selected columns of a table, you can
retrieve, update, or delete specific rows of a table. To ensure that you can do this, the
relational data model recommends that all rows be unique, i.e., that no two rows contain
exactly the same data. Many relational database management systems do not enforce this
rule, but there are some which do. Therefore, if your database design is to be
system-independent, you should make sure that the rows of your tables are unique. We will
see later in the course how you can achieve this.
There are also other design considerations that make it highly advisable to ensure
that all rows of a table are unique. A design should not be short-lived; it should
last. At this moment, duplicate rows in a table may be fine because you might
not intend to retrieve, update, or delete rows individually. However, your requirements may
change as new applications are introduced.
Ask yourself why you might want multiple identical rows. If you only need them to
determine how often the event creating the rows occurred, you might be better off adding a
column that counts the occurrences to the table and removing the duplicates. This may reduce
the space required for your table and improve performance.
The design methodology taught in this course will insist that rows in the resulting tables are
unique.
DB2 allows duplicate rows in tables, but can automatically ensure that no two rows of a
table are alike (for example, by means of a unique index).
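The advice about replacing duplicate rows by an occurrence counter can be sketched as follows. The tables EVENT_LOG and EVENT_COUNT and their columns are hypothetical names invented for this illustration, and Python's built-in sqlite3 module is assumed as a stand-in system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table without any uniqueness rule: duplicates are accepted.
conn.execute("CREATE TABLE EVENT_LOG (EVENT TEXT, EVENT_DATE TEXT)")
conn.executemany(
    "INSERT INTO EVENT_LOG VALUES (?, ?)",
    [
        ("LANDING", "1994-02-19"),
        ("LANDING", "1994-02-19"),  # exact duplicate of the previous row
        ("TAKEOFF", "1994-02-19"),
    ],
)

# Alternative design: one row per event plus a column counting occurrences.
# The primary key makes the system enforce the uniqueness of the rows.
conn.execute(
    """
    CREATE TABLE EVENT_COUNT (
        EVENT       TEXT,
        EVENT_DATE  TEXT,
        OCCURRENCES INTEGER,
        PRIMARY KEY (EVENT, EVENT_DATE)
    )
    """
)
conn.execute(
    "INSERT INTO EVENT_COUNT "
    "SELECT EVENT, EVENT_DATE, COUNT(*) FROM EVENT_LOG "
    "GROUP BY EVENT, EVENT_DATE"
)
counts = conn.execute(
    "SELECT EVENT, OCCURRENCES FROM EVENT_COUNT ORDER BY EVENT"
).fetchall()

# A second row for the same event and date is now rejected.
try:
    conn.execute("INSERT INTO EVENT_COUNT VALUES ('LANDING', '1994-02-19', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(counts, duplicate_rejected)
```

Three stored rows shrink to two, and each remaining row can be addressed individually through its event and date.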
Unordered Retrieval
Next time, rows may be returned in a different sequence
Next time, columns may be returned in a different sequence
Notes:
According to the relational data model, the sequence in which the rows and columns of a
table are physically stored in a relational database is completely up to the relational
database management system. Conversely, the physical sequence of the rows and
columns does not imply the sequence in which the rows or columns are returned if an
ordering has not been requested by the end user or application. As a matter of fact, the
same (unordered) retrieval request issued twice may return the rows and columns in a
different sequence the second time it is issued.
This means that an application cannot rely on the physical sequence of the rows or
columns in the database. If the order of the rows or columns is important to the application
during retrieval, it must tell the relational database management system how the returned
rows should be ordered. The ordering of the rows can be based on the values of one or
more columns of the table; the order can be by ascending or descending column values.
The order that can be requested is always a logical order and never a physical order.
The application can define the order in which the columns are to be returned by specifying
the column names in the desired sequence in the retrieval request.
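As a minimal sketch of requesting a logical order, the following example sorts by one column ascending and another descending, and names the columns in the desired sequence. It assumes Python's built-in sqlite3 module as a stand-in system and assumed column data types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AIRCRAFT_MODEL "
    "(TYPE TEXT, MODEL TEXT, CATEGORY TEXT, MANUFACTURER TEXT, ENGINES INTEGER)"
)
conn.executemany(
    "INSERT INTO AIRCRAFT_MODEL VALUES (?, ?, ?, ?, ?)",
    [
        ("A340", "100", "JET", "AIRBUS", 4),
        ("B737", "500", "JET", "BOEING", 2),
        ("B737", "700", "JET", "BOEING", 2),
        ("A320", "200", "JET", "AIRBUS", 2),
    ],
)

# Without ORDER BY, the row sequence is up to the system.
# Here the request states both the row order and the column order.
ordered = conn.execute(
    "SELECT MANUFACTURER, TYPE, MODEL FROM AIRCRAFT_MODEL "
    "ORDER BY MANUFACTURER ASC, MODEL DESC"
).fetchall()

for row in ordered:
    print(row)
```

The columns come back in the sequence named in the request, regardless of the sequence in which they were defined for the table.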
Linkage of Tables

AIRCRAFT_MODEL

TYPE  MODEL  CATEGORY  MANUFACTURER  ENGINES
A340  100    JET       AIRBUS        4
B737  500    JET       BOEING        2
B737  700    JET       BOEING        2
A320  200    JET       AIRBUS        2

MANUFACTURER

MID     NAME                CITY
AIRBUS  AIRBUS INDUSTRIES   TOULOUSE
BOEING  BOEING CORPORATION  SEATTLE
Notes:
A table seldom stands on its own: a relational database normally consists of
multiple tables which are logically interconnected.
The visual shows two tables, AIRCRAFT_MODEL and MANUFACTURER. There is clearly
an interconnection between the two tables. Each row of table AIRCRAFT_MODEL contains
an identifier for the manufacturer of the corresponding aircraft model, but does not give any
details for the manufacturer. The details for the manufacturer are contained in table
MANUFACTURER. Logically, each row of table AIRCRAFT_MODEL with a specific
manufacturer-id is interconnected with the row of table MANUFACTURER having the same
manufacturer-id.
The relational data model prescribes that logical associations are not physically
implemented in the relational database and that they are dynamically established, by
means of Join operations, on a request-by-request basis. In particular, there are no
physical pointers, such as addresses, in the columns referring to rows of other tables. The
request-based joining of tables is accomplished by means of the values of the columns
named in the join operation.
In the example of the visual, the joining of the rows for aircraft models A340, Model 100,
and A320, Model 200, in table AIRCRAFT_MODEL with the proper manufacturer is
achieved by having the same value (AIRBUS) in columns MANUFACTURER and MID,
respectively. Of course, the columns must be specified in the request performing the join
operation.
Similarly, all rows for aircraft models having the value BOEING in the MANUFACTURER
column are joined with the appropriate row of table MANUFACTURER.
The important point is that, during relational database design, you need not worry about
physical pointers. However, you will have to worry about logical relationships, which are
realized through column values rather than pointers. Column values are not affected by
reorganizations, whereas physical pointers may be.
The relational data model disallows externally visible pointers, but does not prohibit internal
pointers (e.g., in index entries) that are not externally visible.
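The value-based linkage of the two tables on the visual can be sketched with a join request. The example assumes Python's built-in sqlite3 module as a stand-in system and assumed column data types; the table contents are those shown on the visual.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AIRCRAFT_MODEL "
    "(TYPE TEXT, MODEL TEXT, CATEGORY TEXT, MANUFACTURER TEXT, ENGINES INTEGER)"
)
conn.execute("CREATE TABLE MANUFACTURER (MID TEXT, NAME TEXT, CITY TEXT)")
conn.executemany(
    "INSERT INTO AIRCRAFT_MODEL VALUES (?, ?, ?, ?, ?)",
    [
        ("A340", "100", "JET", "AIRBUS", 4),
        ("B737", "500", "JET", "BOEING", 2),
        ("B737", "700", "JET", "BOEING", 2),
        ("A320", "200", "JET", "AIRBUS", 2),
    ],
)
conn.executemany(
    "INSERT INTO MANUFACTURER VALUES (?, ?, ?)",
    [
        ("AIRBUS", "AIRBUS INDUSTRIES", "TOULOUSE"),
        ("BOEING", "BOEING CORPORATION", "SEATTLE"),
    ],
)

# The association is established per request by comparing column values;
# no stored pointer connects the two tables.
joined = conn.execute(
    """
    SELECT A.TYPE, A.MODEL, M.NAME, M.CITY
    FROM AIRCRAFT_MODEL AS A
    JOIN MANUFACTURER AS M ON A.MANUFACTURER = M.MID
    ORDER BY A.TYPE, A.MODEL
    """
).fetchall()

for row in joined:
    print(row)
```

Each aircraft model row is paired with the manufacturer row whose MID value equals its MANUFACTURER value.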
Checkpoint
Unit Summary
Notes:
© Copyright IBM Corp. 2000, 2002 Unit 2. Views and Results During Database Design 2-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Unit Objectives
Notes:
When designing a relational database, different views are assumed for the data of the
subject application domain. These views are:
• The conceptual view
• The storage view
• The logical view
During this unit, we will discuss these data views, give an overview of the steps performed
during database design, list their results, and relate the results of the steps to the data
views.
[Visual: diagram with the boxes Tuple Types, Tables, Logical Data Structures, and Integrity Rules]
Notes:
When designing the database for an application domain, you start with a problem
statement for the application domain, i.e., a document describing the types of business
objects for the application domain, the relationships between them, and the business
constraints for both of them. The problem statement must be established by an application
domain expert (analyst). In general, it is not produced by the database designer, who does
not have the domain expertise; rather, it is input for him/her.
Starting with the problem statement, a series of steps is performed during the design.
These steps look at the data of the application domain from three different angles, called
views:
• The conceptual view
• The storage view
• The logical view
For each of these views, a set of results is produced during database design. You can
associate a view with its results and describe it by its results. For this reason, it is quite
common to say "the ... view consists of ..." rather than "the ... view establishes ...". In the
latter case, the view is seen more as the activity of looking at the application domain from a
specific angle and producing certain results whereas, in the former case, the view is seen
as the results produced. During this course, both terminologies are used.
The conceptual view scrutinizes and structures the data of the application domain based
on their semantic meaning, i.e., their meaning for the business (application domain). It
does this independently of the business processes accessing the data and without regard
to any existing or planned method for storing the data.
Thus, during the conceptual view, the process- and implementation-independent
architecture of the data of the application domain is established.
The storage view looks at the data of the application domain from a storage point of view.
During the storage view, in a series of steps, the objects of the conceptual view are
mapped into objects (in particular, tables) of the relational database management system
chosen for the implementation of the data. Thus, the storage view is not an
implementation-independent view of the data. Rather, it is an implementation-oriented view
of the data during which the conceptual view is physically implemented in the selected
relational database management system.
The logical view looks at the data of the application domain from a process point of view.
Generally, a particular business process does not access all data of the application
domain, but only a part of the data. Thus, it has its own process-dependent view of the data
of the application domain. Accordingly, during the logical view, the process-dependent
views for the business processes of the application domain are established.
Conceptual View
[Visual: diagram with the boxes Problem Statement, Tuple Types, Tables, Logical Data Structures, and Integrity Rules]
Notes:
As mentioned before, during the conceptual view, the process- and
implementation-independent architecture of the data of the application domain is
established. The application domain, described by the problem statement, is scrutinized for
its business object types, the relationships between them, and the business constraints.
As a result of this scrutiny, an entity-relationship model is established visualizing and
structuring the business object types of the application domain as entity types; illustrating
the relationships between the business object types by means of relationship types; and
modeling the constraints for the entity types and relationship types imposed by the
business constraints.
In a second step, which is not directly performed by the database designer, but requires
his/her participation, the elementary data of the application domain, referred to as data
elements, are identified and described in detail. The descriptions are recorded in a
document, the data inventory.
As the data elements are collected, they are assigned to the business object types to which
they belong. More precisely, they are assigned to the corresponding entity types, which verifies
whether or not the entity-relationship model established during the previous step is
complete.
Storage View
[Visual: diagram with the boxes Problem Statement, Tuple Types, Tables, Logical Data Structures, and Integrity Rules]
Notes:
During the storage view, the conceptual view is physically implemented in a relational database
management system. More precisely, the results of the conceptual view are implemented.
The steps executed during the storage view transform the results of the conceptual view into
tables and related objects of the chosen relational database management system. The
initial steps are mostly independent of the chosen relational database management
system. The further you proceed, the more system-dependent aspects have to be
considered although many of the considerations are of a global nature.
The first step of the storage view uses the data inventory to construct tuple types for the
entity types and relationship types of the entity-relationship model developed during the
conceptual view and normalizes them. Tuple types are the precursors of tables and provide
the basis for the computerized processing of the entity types and relationship types for the
application domain. During the normalization of the tuple types, data redundancies and
anomalies that may lead to data inconsistencies are resolved.
Based on a prescribed set of rules, the next step of the storage view converts the tuple
types into tables of the chosen relational database management system, taking into
account the supported functions and features.
Also as part of the storage view, any rules concerning the integrity of the data, including the
constraints defined as part of the entity-relationship model, must be converted into rules for
the tables created by the previous step and implemented if the chosen relational database
management system provides the necessary functions such as check constraints,
referential constraints, and triggers.
Some of the associations between the tables (especially those implied by referential
constraints) make it imperative that indexes be defined for certain columns of the tables.
The last step of the storage view will define these indexes.
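To give a first impression of these functions, the sketch below declares a check constraint, a referential constraint, and an index for the sample tables. It assumes Python's built-in sqlite3 module as a stand-in system; the specific rules (permitted categories, at least one engine), the data types, and the index name are assumptions made for illustration, not rules taken from the course.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks referential constraints only when this setting is on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE MANUFACTURER (MID TEXT PRIMARY KEY, NAME TEXT, CITY TEXT)")
conn.execute(
    """
    CREATE TABLE AIRCRAFT_MODEL (
        TYPE         TEXT,
        MODEL        TEXT,
        -- assumed check constraint: only these categories are permitted
        CATEGORY     TEXT CHECK (CATEGORY IN ('JET', 'TURBOPROP')),
        -- referential constraint: the value must exist in MANUFACTURER.MID
        MANUFACTURER TEXT REFERENCES MANUFACTURER (MID),
        -- assumed check constraint: a model has at least one engine
        ENGINES      INTEGER CHECK (ENGINES > 0),
        PRIMARY KEY (TYPE, MODEL)
    )
    """
)
# Index on the column named in the referential constraint (hypothetical name).
conn.execute(
    "CREATE INDEX AIRCRAFT_MODEL_MANUFACTURER ON AIRCRAFT_MODEL (MANUFACTURER)"
)

conn.execute("INSERT INTO MANUFACTURER VALUES ('AIRBUS', 'AIRBUS INDUSTRIES', 'TOULOUSE')")
conn.execute("INSERT INTO AIRCRAFT_MODEL VALUES ('A340', '100', 'JET', 'AIRBUS', 4)")


def is_rejected(statement):
    """Return True if the system refuses the statement for integrity reasons."""
    try:
        conn.execute(statement)
        return False
    except sqlite3.IntegrityError:
        return True


# Violates the check constraint on CATEGORY.
bad_category = is_rejected(
    "INSERT INTO AIRCRAFT_MODEL VALUES ('A320', '200', 'GLIDER', 'AIRBUS', 2)"
)
# Violates the referential constraint: BOEING is not in MANUFACTURER.
bad_manufacturer = is_rejected(
    "INSERT INTO AIRCRAFT_MODEL VALUES ('B737', '500', 'JET', 'BOEING', 2)"
)

print(bad_category, bad_manufacturer)
```

Both invalid rows are refused by the system itself, so no application has to repeat these checks.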
Logical View
[Visual: diagram with the boxes Problem Statement, Tuple Types, Tables, Logical Data Structures, and Integrity Rules]
Notes:
The logical view looks at the data of the application domain from the perspective of the
processes for the application domain. During the logical view, the required process-specific
views for the processes of the application domain are established.
The first step during the logical view for an application domain describes, in an
implementation- and database-independent fashion, the processes retrieving and/or
manipulating the data of the application domain. The process descriptions are collected in
a document referred to as the process inventory. For each process, they must identify the data
elements used by the process.
After the tables for the application domain and the integrity rules for them have been
defined, as part of the logical view, the necessary logical data structures are established for all
processes described in the process inventory. Each logical data structure describes a view
that a process (or part of a process) has of the tables defined during the storage view. More
precisely, the logical data structure describes the subset of the tables for the application
domain required by the process or a part of the process. It also illustrates how the process
or the part of the process must logically navigate through the tables to achieve its function.
Thus, it reflects the tables needed, the subsets required, and the flow between the tables.
As you can see by now, the steps of the various views are interconnected and may be
dependent on each other. The process inventory of the logical view is the primary source
for the data inventory since it identifies the data elements used by the processes. Similarly,
the tables and the integrity rules of the storage view are required input for the logical data
structures.
Design Methodology
[Visual: diagram with the boxes Problem Statement, Tuple Types, Tables, Logical Data Structures, and Integrity Rules]
Notes:
This visual illustrates the complete design methodology used during the course. The
entity-relationship model developed during the conceptual view is used and updated by the
later steps of the design as additional knowledge becomes available. This ensures that the
model remains valid and useful at all times.
The methodology described on the previous pages and illustrated by the diagram on this
visual is not a pure top-down approach. Design is and must be an iterative process. When
you start with the design, your knowledge of the application domain is most likely
incomplete even if the problem statement was prepared carefully. No matter how
thoroughly you execute the various steps, subsequent steps will detect holes and errors in
the results of the preceding steps that will force you to revisit these steps.
Unless the problem statement is incomplete, you should always start an iteration with the
entity-relationship model. Check if the required change impacts the entity-relationship
model. If it does not, proceed to the next step and verify its results.
If the problem detected reveals that the problem statement is incomplete or incorrect, have
it extended or corrected by the application domain expert. It is not your, i.e., the database
designer's, responsibility to change the problem statement. This must be done by a person
with the proper domain competence. However, it is your responsibility to make the
application domain expert aware of the problem. After the problem statement has been
corrected, continue the iteration with the entity-relationship model as before.
The fact that relational database design is an iterative process should not make you sloppy.
The better the problem statement and the more carefully the various steps are performed,
the better your design will be. However, it does not make sense to dwell endlessly on a
specific step.
Checkpoint
4. During the conceptual view, the data of the application domain are
structured taking into account the business processes for the
application domain. (T/F)
5. The logical view looks at the data of the application domain from
the viewpoint of the business processes for the application domain.
(T/F)
6. Match the three data views with the results produced by them:
a. Conceptual view
b. Storage view
c. Logical view
____ Tuple types
____ Entity-relationship model
____ Data inventory
____ Process inventory
____ Tables
____ Integrity rules
____ Logical data structures
____ Indexes
Unit Summary
Notes:
Unit Objectives
Notes:
When designing the database for an application domain, you start with a problem
statement for the application domain. This unit discusses the problem statement in detail
and describes:
• The purpose of the problem statement
• Who is responsible for the creation of the problem statement
• The role of the database designer in the creation of the problem statement
• The contents the problem statement needs in order to be usable input for database
design.
Database designer should work with application domain expert
Notes:
The problem statement must be created by someone who has detailed knowledge of the
application domain, i.e., an application domain expert. Only then will the problem statement
reflect the application domain correctly and completely.
The problem statement is input for the database designer and is a global description of the
application domain for which the database designer is to develop a database. It is a global
description rather than a detailed description. This means it describes the important
characteristics of the application domain rather than the various data elements and
processes of the application domain or any implementation-dependent details. It should be
a functional description of the application domain. It should not describe the current or a
planned implementation. It must allow the database designer to:
• Gain a basic understanding of the application domain so that he/she can comprehend
the context of business objects, business relationships, and business constraints
important for the design; detect inconsistencies; and discuss problems detected during
the design with sufficient ease and knowledge with the application domain expert or the
responsible department.
• Create an entity-relationship model for the application domain visualizing and clarifying
the types of business objects and business relationships and the business constraints to
be implemented in the database.
As mentioned before, the problem statement should not contain detailed information about
the application domain. The detailed information about the application domain is provided
by the data inventory and the process inventory discussed later.
Although the application domain expert is responsible for the problem statement, the
database designer should work with the application domain expert during the creation of
the problem statement. The database designer knows best which input he/she needs for
the design of the appropriate database and, thus, can provide the necessary guidance to
the application domain expert.
Furthermore, by working with the application domain expert, the database designer will
gain a better understanding of the application domain easing his/her work considerably.
Notes:
First, the problem statement should contain a textual description, an overview, of the
application domain. This overview should describe, in simple words, what the application
domain does so that the database designer gets at least a certain idea of what is going on.
Furthermore, the overview should indicate what the application domain (or more precisely,
the appropriate departments) wants to achieve by using a database management system.
In particular, the overview should point out which areas of the application domain should be
implemented in the target database. The actual application domain may be much larger,
and it may not be intended or possible to implement the entire domain in the database.
Secondly, the problem statement should list and describe all categories (types) of
business objects which are important for the application domain and about which
information is to be stored in the target database. These categories are referred to as
business object types.
As a category, a business object type represents all business objects having the same
meaning and characteristics rather than distinct business objects. For an airline company,
for example, the problem statement should describe that information about aircraft models
in general is to be stored rather than about a specific aircraft model such as a Boeing 747,
Model 400.
For each business object type to be implemented in the target database, the following
information should be provided:
• A textual description of the semantics of the business object type without going into
details such as the individual attributes of the business object type. The details for the
business object types will be provided via the data inventory discussed later.
• How the distinct objects belonging to the business object type can be identified.
Overview
Notes:
Throughout this course, we will use a sample application domain to demonstrate the
various items discussed. This application domain comprises the flight planning, pilot
assignment, and aircraft maintenance for an airline called Come Aboard or, in short, CAB.
This visual illustrates the overview section of the problem statement for our sample
application domain. The amount of information provided in the overview depends on how
familiar the application domain generally is. If the application domain is less well known or
more complex, the overview will require more information. The sample application domain
used in this course is generally well-known and is not really complex. Consequently, the
short description on the visual should be sufficient.
You should note that the second paragraph limits the application domain being considered
to the flight planning, the pilot assignment, and the aircraft maintenance. Without this
restriction, the application domain for the management of an airline would comprise
additional areas such as flight reservation or seat selection.
The entire problem statement for the sample application domain can be found in Appendix
A - Sample Problem Statement.
Sample Problem Statement (2 of 8)
CAB wants to store information about the following business object types in
its database:
AIRCRAFT MODELS
For its flying activities, CAB uses aircraft of different types or, more precisely,
models such as Boeing 737, Model 500, or Airbus A320, Model 200. For the
aircraft models it owns or has on order, CAB wants to maintain information in
its database such as their category (e.g., JET or TURBOPROP), length,
height, wing span, or number of engines.
The aircraft models can be uniquely identified by their type code (e.g., B737)
together with their model number (e.g., 500).
unique identifier
Notes:
This visual illustrates a business object type for our sample airline company called CAB.
Aircraft Models is a business object type for the application domain being considered since
flight planning, pilot assignment, and aircraft maintenance are dealing with aircraft models.
When a flight is planned and an aircraft is assigned to the flight, that aircraft cannot be an
arbitrary aircraft. It must be an aircraft of a specific aircraft model because the model is
published in the timetables and the starting and landing airports require the aircraft to be of
a certain model.
Similarly, pilots are only allowed to fly aircraft of those models they have a license for, and
mechanics may only service aircraft of models they have been trained for.
Aircraft Models is a business object type rather than a business object because it
represents a set of objects with the same meaning (being models of aircraft) and the same
characteristics such as manufacturer, category (jet or turboprop), or number of engines.
As highlighted on the visual, the individual aircraft models can be uniquely identified by
their type code (e.g., B737) together with their model number (e.g., 500).
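The two-part identifier for aircraft models can be illustrated with a minimal sketch. The class name AircraftModelKey, the function add_aircraft_model, and the in-memory dictionary are assumptions made for illustration only; they are not part of the problem statement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AircraftModelKey:
    """Hypothetical identifier: type code together with model number."""
    type_code: str     # e.g., "B737"
    model_number: str  # e.g., "500"


# Illustrative in-memory store; the key alone decides whether two models are the same.
aircraft_models = {}


def add_aircraft_model(type_code, model_number, category):
    """Register an aircraft model; reject it if its identifier is already used."""
    key = AircraftModelKey(type_code, model_number)
    if key in aircraft_models:
        raise ValueError(f"aircraft model {type_code}-{model_number} already exists")
    aircraft_models[key] = {"category": category}
    return key


add_aircraft_model("B737", "500", "JET")
add_aircraft_model("B737", "400", "JET")  # same type code, different model number
```

Neither the type code nor the model number is unique by itself in this sketch; only the combination is.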
AIRCRAFT
CAB owns multiple aircraft of the various aircraft models. For the aircraft it
owns, CAB wants to maintain information such as the date when the aircraft
was acquired, the engines mounted on the aircraft, or the seats of the
aircraft.
Each aircraft has a unique serial number. This serial number is unique
across aircraft models.
unique identifier
AIRPORTS
CAB services a set of airports with its aircraft. For these airports as well as
for airports CAB plans to service in the near future, CAB wants to keep
information in its database such as the airport code, the location of the
airport, the address of CAB's city ticketing office, or the address of CAB's
airport office.
The airport codes uniquely identify the various airports.
Notes:
This visual illustrates two further business object types for our sample application domain:
Aircraft and Airports. Again, both of these types are of interest for the application domain
and represent true categories. Each of them represents a set of objects having the same
meaning and the same characteristics.
As highlighted on the visual, the various aircraft of the business object type Aircraft can be
identified by means of a unique serial number. Even for aircraft of different models, this
serial number is unique, i.e., no two aircraft can have the same serial number.
For airports, their international airport codes (e.g., SFO for San Francisco, CA, JFK for
John F. Kennedy Airport in New York, NY, or STR for Stuttgart, Germany) serve as unique
identifiers.
The full set of business object types for the CAB application domain can be found in
Appendix A - Sample Problem Statement.
Contents of Problem Statement (2 of 3)
Notes:
As a third item, the problem statement should contain a listing of all types (categories) of
logical relationships that exist between business objects of the various business object
types. A business relationship logically interconnects two or more business objects which
may belong to different or to the same business object type. The objects may even be
identical, i.e., an object may have a relationship with itself.
Business relationships of the same type always interconnect business objects of the same
respective types. For example, if r1 and r2 are business relationships of the same type and
r1 associates an object of business object type O1 with an object of business object type
O2, then r2 must also interconnect an object of O1 with an object of O2.
Note that we are talking about types or categories of relationships, referred to as business
relationship types, rather than individual relationships between business objects. For the
problem statement, it is not important which specific business objects have a relationship
with each other. It is only important to identify the type of the business relationship and to
understand its semantics and characteristics. For the sample airline application domain, for
example, it is only important to know that aircraft belong to aircraft models and that an
individual aircraft always belongs to one and only one model. For the problem statement, it
is not important to know that the aircraft with serial number B238725737 is a Boeing 737,
Model 500.
Depending on the business object type from which you look at the business relationship
type, there are different (directional) views of the same business relationship type. For the
above airline example, you may look at the business relationship type from Aircraft's point
of view or from Aircraft Models' point of view. From Aircraft's point of view, the semantics is
that an aircraft belongs to an aircraft model; from Aircraft Models' point of view, the
meaning is that a specific aircraft model comprises an aircraft. As expected, the meanings
are complementary. You can think of them as separate directional business relationship
types that make up a single nondirectional (or bidirectional) business relationship type.
For each business relationship type, the problem statement should include:
• A textual description of the business relationship type, i.e., describe its meaning and the
business object types involved.
• How many relationships of the same type an object can have. The important fact is
whether it can have many relationships or at most one.
• If the type of the relationship requires every existing object of an object type to have at
least one relationship of the considered business relationship type.
• If the objects having a relationship of the considered type with an object must be deleted
as well if that object is deleted, i.e., the consequences of delete operations on objects
that are interconnected by means of relationships.
It is possible that the objects of two business object types are interconnected by multiple
(different) business relationship types.
Sample Problem Statement (4 of 8)
For an aircraft model, CAB may have any number of aircraft. In particular, it
is possible that there are no aircraft (yet) for an aircraft model. Conversely,
an aircraft belongs to one and only one aircraft model.
1 aircraft ~ 1 to 1 aircraft model (mandatory relationship type)
Notes:
This visual illustrates a business relationship type for our sample application domain. As
mentioned before, there exists a business relationship type linking objects of business
object type Aircraft Models to objects of business object type Aircraft.
From Aircraft Models' point of view, the meaning of the business relationship type is that an
aircraft model comprises an aircraft. Conversely, from Aircraft's point of view, the meaning
is that an aircraft belongs to an aircraft model.
As highlighted on the visual, there may be many aircraft for an aircraft model, but it is also
possible that an aircraft model does not have any aircraft.
A given aircraft, in contrast, can only belong to a single aircraft model. Furthermore, an
aircraft must always belong to an aircraft model. Accordingly, every (existing) aircraft must
have a relationship to an aircraft model. Thus, from Aircraft's point of view, the business
relationship type is a mandatory business relationship type. From Aircraft Models' point of
view, the business relationship type is not mandatory because an aircraft model need not
have a relationship to an aircraft.
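The two directional views of this business relationship type can be recorded side by side, as in the following sketch. The class DirectionalView, the name "Aircraft Model Comprises Aircraft", and the "0 to n" notation are assumptions chosen for illustration; only the facts (many or at most one, mandatory or not) come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectionalView:
    """One directional view of a business relationship type (illustrative)."""
    name: str
    from_type: str
    to_type: str
    many: bool       # may an object have many relationships, or at most one?
    mandatory: bool  # must every existing object have at least one relationship?


def cardinality(view):
    """Render the view as 'low to high', e.g. '1 to 1' or '0 to n'."""
    low = 1 if view.mandatory else 0
    high = "n" if view.many else "1"
    return f"{low} to {high}"


aircraft_view = DirectionalView(
    "Aircraft Belongs to Aircraft Model", "Aircraft", "Aircraft Models",
    many=False, mandatory=True)
model_view = DirectionalView(
    "Aircraft Model Comprises Aircraft", "Aircraft Models", "Aircraft",
    many=True, mandatory=False)

print(cardinality(aircraft_view))  # 1 to 1
print(cardinality(model_view))     # 0 to n
```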
When the aircraft is removed from the list of aircraft, its maintenance
records are deleted as well.
Notes:
This visual illustrates another business relationship type for CAB. This business
relationship type interrelates aircraft and maintenance records: The objects of business
object type Aircraft (may) have relationships with objects of business object type
Maintenance Records (Aircraft Has Maintenance Record). Conversely, each object of
Maintenance Records must have a relationship to one and only one object of Aircraft
(Maintenance Record for Aircraft).
As the description states, all maintenance records for an aircraft are to be deleted when the
aircraft is deleted. From Aircraft's point of view, the business relationship type is a
cascading business relationship type because delete operations are cascaded down to the
associated objects of the other business object type.
The above description of the business relationship type does not match with the description
in Appendix A - Sample Problem Statement since we want to illustrate a cascading
business relationship type. The description in Appendix A - Sample Problem Statement has
some peculiarities which will be discussed later.
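The cascading behavior described for aircraft and their maintenance records can be sketched as follows. The dictionaries, the serial numbers, and the function delete_aircraft are hypothetical and serve only to show how a delete operation is cascaded down to the associated objects.

```python
# Illustrative data: two aircraft and their maintenance records.
aircraft = {"SN-1": {}, "SN-2": {}}
maintenance_records = {
    1: {"aircraft": "SN-1", "text": "engine check"},
    2: {"aircraft": "SN-1", "text": "tire change"},
    3: {"aircraft": "SN-2", "text": "engine check"},
}


def delete_aircraft(serial_number):
    """Delete an aircraft and cascade the delete to its maintenance records."""
    del aircraft[serial_number]
    # Collect the affected record ids first, then delete them.
    affected = [record_id for record_id, record in maintenance_records.items()
                if record["aircraft"] == serial_number]
    for record_id in affected:
        del maintenance_records[record_id]


delete_aircraft("SN-1")
```

After the call, only the maintenance record of aircraft SN-2 remains.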
The remaining business relationship types for our sample application domain can be found
in Appendix A - Sample Problem Statement.
Notes:
The fourth section of the problem statement should list all business constraints for the
business object types and business relationship types of the application domain.
Business constraints represent restrictions that exist for the objects of business object
types or the relationships of business relationship types or a mixture thereof. For example,
such a restriction could require that for each (existing) business object of business object
type O1 a corresponding business object of business object type O2 must exist. We will
see further, more intuitive, examples for our sample airline application domain on the
subsequent visuals.
For each business constraint, the problem statement should contain the following
information:
• A textual description of the business constraint, i.e., of the restriction the business
objects or business relationships involved must adhere to.
• The description should identify the business object types and/or business relationship
types to whose objects or relationships the restriction applies, i.e., whose insert, update,
or delete operations are limited by the business constraint. As mentioned before, a
single business constraint can restrict the objects or relationships of a single or multiple
business object types or business relationship types, or of a mixture thereof.
• The description should specify when the appropriate restriction is to be applied, i.e.,
what triggers the application of the restriction. This has two facets:
1. There may be circumstances or conditions attached to a business constraint
specifying that the restriction is to be exercised only if these conditions are met. For
example, the condition could specify that the restriction only concerns aircraft
manufactured by Boeing or that the restriction only applies to aircraft put in service
before January 1, 1985.
2. For the affected business object types or business relationship types, the description
should specify the type of operations (insert, update, or delete) for which the
constraint must be enforced provided that the before-mentioned conditions are met.
• The description of the business constraint should specify the action to be performed
when the constraint is violated. The simplest form of action is to reject the operation.
However, there are more complex actions possible. For example, the violation of the
constraint could trigger the creation of a business object for another business object
type.
As you can see from the description, a business constraint may not only involve the
business object types or business relationship types to which its restriction applies, but also
other business object types or business relationship types for evaluating the triggering
condition or for the action to be performed if the constraint is violated.
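The items listed for a business constraint (description, affected types, triggering condition, triggering operations, and action) can be captured in a small structure. The following is a sketch under assumptions: the class BusinessConstraint, the function evaluate, and the sample constraint about inspections are hypothetical; only the date condition echoes the example given above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass
class BusinessConstraint:
    description: str                     # textual description of the restriction
    applies_to: str                      # business object or relationship type restricted
    operations: frozenset                # subset of {"insert", "update", "delete"}
    condition: Callable[[dict], bool]    # circumstances under which the restriction applies
    is_violated: Callable[[dict], bool]  # the restriction itself
    action: str = "reject"               # action to perform on violation


def evaluate(constraint, operation, obj):
    """Return the action to perform, or None if the operation may proceed."""
    if operation not in constraint.operations:
        return None
    if not constraint.condition(obj):
        return None
    return constraint.action if constraint.is_violated(obj) else None


# Hypothetical constraint, invented purely for illustration.
old_aircraft_need_inspection = BusinessConstraint(
    description="Aircraft put in service before January 1, 1985 need an inspection on record.",
    applies_to="Aircraft",
    operations=frozenset({"insert", "update"}),
    condition=lambda a: a["in_service"] < date(1985, 1, 1),
    is_violated=lambda a: not a["inspected"],
)
```

For example, evaluate(old_aircraft_need_inspection, "insert", {"in_service": date(1980, 5, 1), "inspected": False}) yields the action "reject", whereas the same object passes for a delete operation because that operation does not trigger the constraint.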
Business Constraints
The following constraints exist for the business object and relationship types
that CAB wants to maintain in its database:
Notes:
This visual illustrates a business constraint for our sample airline called Come Aboard. The
business constraint limits the number of engines for an aircraft to the number of engines for
the corresponding aircraft model.
The business object type to which the constraint applies is Aircraft. The restriction controls
how many engines an aircraft can have.
There is not a particular condition under which the constraint is to be applied. The
constraint must be verified (enforced) whenever an engine is added to (mounted on) an
aircraft.
The request to add an engine to an aircraft should be rejected if it would exceed the limit
for the corresponding aircraft model, i.e., if it would violate the constraint.
As mentioned before, the business constraint applies to business object type Aircraft.
However, in order to verify it, business relationship type Aircraft Belongs to Aircraft Model
and business object type Aircraft Models are needed.
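A minimal sketch of this check follows. The data layout, the identifiers SN-1 and E-1, and the function mount_engine are assumptions for illustration; the point is that the check follows the relationship from the aircraft to its aircraft model to find the limit.

```python
# Illustrative data: one aircraft model and one aircraft belonging to it.
aircraft_models = {("B737", "500"): {"number_of_engines": 2}}
aircraft = {"SN-1": {"model": ("B737", "500"), "engines": ["E-1"]}}


def mount_engine(serial_number, engine_id):
    """Add an engine to an aircraft; reject if the model's limit would be exceeded."""
    plane = aircraft[serial_number]
    # Follow Aircraft Belongs to Aircraft Model to obtain the limit.
    limit = aircraft_models[plane["model"]]["number_of_engines"]
    if len(plane["engines"]) + 1 > limit:
        raise ValueError(f"rejected: aircraft {serial_number} may have at most {limit} engines")
    plane["engines"].append(engine_id)


mount_engine("SN-1", "E-2")  # accepted: the aircraft now has two engines
```

A further call for the same aircraft would raise an error in this sketch, which corresponds to rejecting the request.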
Sample Problem Statement (7 of 8)
Business
relationship type
Notes:
The above business constraint requires that the captain and the copilot for a flight must be
different.
This business constraint applies to the relationships of a business relationship type rather
than to the objects of a business object type. It applies to business relationship type Pilot
for Flight which interconnects business object types Pilots and Flights.
The corresponding restriction must be applied in two cases:
• When a pilot is assigned to a flight, i.e., when a new relationship for the business
relationship type is added.
• When a pilot assignment is changed, i.e., an existing relationship for the business
relationship type is changed. (You could also view this as the deletion of the old
business relationship followed by the addition of a new business relationship.)
The pilot assignment (new or changed) should be rejected if it would make the captain and
the copilot of the flight the same person.
Note that the business constraint applies to Pilot for Flight and also needs Pilot for Flight to
check if the constraint has been violated.
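The following sketch shows one way to check this constraint for both new and changed assignments. Storing the relationships of Pilot for Flight as a dictionary keyed by flight and role, and the names used, are assumptions made for illustration.

```python
# Illustrative store for Pilot for Flight: (flight, role) -> pilot.
pilot_for_flight = {}

ROLES = ("captain", "copilot")


def assign_pilot(flight, role, pilot):
    """Add or change a pilot assignment; reject it if captain and copilot would coincide."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role}")
    other_role = "copilot" if role == "captain" else "captain"
    # The check itself only needs the existing relationships of Pilot for Flight.
    if pilot_for_flight.get((flight, other_role)) == pilot:
        raise ValueError(f"rejected: {pilot} is already the {other_role} of flight {flight}")
    pilot_for_flight[(flight, role)] = pilot


assign_pilot("F1", "captain", "P1")
assign_pilot("F1", "copilot", "P2")
```

Because an assignment overwrites any earlier entry for the same flight and role, the same function covers both triggering cases.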
Sample Problem Statement (8 of 8)
PILOTS FOR FLIGHT MUST HAVE LICENSE FOR AIRCRAFT MODEL FOR LEG
A pilot for a flight must have the license to fly the aircraft model for the leg for
the flight.
Notes:
The business constraint on this visual requires that the pilots for a flight must be licensed to
fly the aircraft model used for the leg for the flight. This means that the above business
constraint applies to the same business relationship type as the previous example: Pilot for
Flight.
However, there is a peculiarity for this business constraint. The associated restriction is to
be checked:
1. When a pilot is assigned to a flight or a pilot assignment is changed.
2. When the aircraft model for the leg for the flight is changed. As a consequence, pilots
previously assigned to flights for the leg might no longer be licensed to fly the new
aircraft model.
The point illustrated here is that a constraint may also have to be enforced when business
objects or business relationships of another business object type or business relationship
type are inserted, updated, or deleted. In the case of the above example, the constraint has to
be verified when a relationship of business relationship type Aircraft Model for Leg is
changed.
As shown on the visual, the action to be performed when the constraint is violated depends
on what was causing the violation, the assignment of a pilot or the reassignment of an
aircraft model.
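The two triggers can be sketched as two separate checks. All names and sample data below are hypothetical. Since the actions shown on the visual are not reproduced in these notes, the sketch rejects an invalid assignment and, for a change of the aircraft model, merely reports the affected assignments without deciding what to do with them; both choices are assumptions.

```python
# Illustrative data; an aircraft model is identified by (type code, model number).
licenses = {
    "P1": {("B737", "500")},
    "P2": {("B737", "500"), ("A320", "200")},
    "P3": {("A320", "200")},
}
model_for_leg = {"LEG-1": ("B737", "500")}
leg_for_flight = {"F1": "LEG-1"}
pilots_for_flight = {"F1": set()}


def assign_pilot(flight, pilot):
    """Trigger 1: check the license when a pilot is assigned to a flight."""
    model = model_for_leg[leg_for_flight[flight]]
    if model not in licenses[pilot]:
        raise ValueError(f"rejected: {pilot} has no license for {model}")
    pilots_for_flight[flight].add(pilot)


def unlicensed_after_model_change(leg, new_model):
    """Trigger 2: list (flight, pilot) pairs that would violate the constraint."""
    return sorted(
        (flight, pilot)
        for flight, flight_leg in leg_for_flight.items() if flight_leg == leg
        for pilot in pilots_for_flight[flight] if new_model not in licenses[pilot]
    )


assign_pilot("F1", "P1")
assign_pilot("F1", "P2")
print(unlicensed_after_model_change("LEG-1", ("A320", "200")))  # [('F1', 'P1')]
```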
Checkpoint
11. List the items that the problem statement should contain for each
business constraint.
_____________________________________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________
Unit Summary
Notes:
Unit Objectives
After completion of this unit, you should be able to:
Notes:
This unit describes how to develop an entity-relationship model for an
application domain. You will learn to analyze a given problem statement and to define the
entity types for the application domain represented by the problem statement. There are
basic entity types corresponding to the truly independent business object types and
dependent entity types that are based on other entity types. Their instances require the
existence of corresponding instances of the entity type they are based upon.
Furthermore, you will learn to determine the relationship types for an application domain
and to represent them in an entity-relationship model. Most of the relationship types of an
entity-relationship model correspond to the business relationship types described in the
problem statement for the application domain, but there will also be others. Most
relationship types do not have attributes further describing them, but you will experience
relationship types with attributes as well and learn how to represent their attributes in an
entity-relationship model.
Supertypes and subtypes form a special category of entity types. They are interconnected
by so-called is-bundles (relationship types) and allow you to form categories and classify
the represented entity instances, i.e., the objects represented by the entity types.
Supertypes and subtypes are advanced modeling constructs.
Constraints for the entity types and relationship types of an application domain are a
further topic of this unit. Most of the constraints are derived from the business constraints
for the application domain, but there are also others.
All elements discussed in this unit are ingredients of entity-relationship models. Thus, you
will learn in this unit how to establish an entity-relationship model for an application domain.
During the unit, we will use the sample application domain for the airline company called
Come Aboard (CAB) that we used in the previous unit. Based on the problem statement
described in Appendix A - Sample Problem Statement, we will establish an
entity-relationship model for our sample airline company. We will pick specific items of the
problem statement and illustrate how they are modeled.
Tuple Types
Tables
Logical Data Structures
Integrity Rules
Notes:
The development of an entity-relationship model for an application domain is the first step
of the conceptual view during which the process- and implementation-independent
architecture of the data of the application domain is established.
The application domain, described by the problem statement, is scrutinized for its business
object types, the relationships between them, and for business constraints. As a result of
the scrutiny, an entity-relationship model is established visualizing and structuring the
business object types of the application domain as entity types; illustrating the relationship
types between the entity types resulting from the business relationship types of the
application domain; and modeling the constraints for the entity types and relationship types
imposed by the business constraints.
As a general rule, the better the problem statement describing the application domain, the
easier it will be to establish the corresponding entity-relationship model. Therefore, you
should insist on a good problem statement as outlined in Unit 3 - Problem Statement and
assist the domain expert in producing it.
Even if you have a good problem statement, you will most likely encounter items that are
obscure. Consult the domain expert or the appropriate department of expertise to clarify
the open issues. Do not rely on assumptions that are your own speculation rather than
knowledge of the application domain. They might be wrong!
The entity-relationship model is the basis for all subsequent design steps. A wrong
assumption for the entity-relationship model will produce incorrect results for the
subsequent steps. If the problem is detected during a later step, you must repeat all
preceding steps, starting with the entity-relationship model or even the problem statement,
and correct the erroneous results. Therefore, it is advisable to resolve open questions
concerning the application domain with the competent people right away.
The development of the entity-relationship model is not a one-time affair. The
entity-relationship model is maintained constantly and changed as the subsequent steps
reveal errors or discover undocumented business object types, business relationship
types, or business constraints. If the problems found concern undocumented items of the
application domain or items not properly described in the problem statement, the problem
statement must be corrected as well. It should be corrected by the domain expert.
Entity Type
An independent conceptual unit representing a class of
objects with the same meaning and characteristics
about which information is to be stored and maintained
Entity (Instance)
An actual object belonging to an entity type
Attribute
A conceptual piece of information with a
distinct meaning stored for the instances of an entity type,
not actual values
Notes:
As the name entity-relationship model suggests, entity types are one of the building blocks
of an entity-relationship model. They constitute independent conceptual units
representing classes of objects with the same meaning and characteristics about
which information is to be stored and maintained.
Many of these classes are derived from the business object types for the application
domain, but not necessarily all of them. Some of the entity types, especially those added
later in the design process, result from design rules, e.g., the rules for avoiding
redundancies in the information stored.
You should realize that an entity type represents a class of objects rather than a specific
object. It is a conceptual category of items. The items may physically exist, such as an
aircraft, or they may not physically exist and only be imaginary, such as an aircraft model.
(An aircraft model does not physically exist, it only exists on paper, i.e., it is imaginary.)
An entity type must be an independent conceptual unit. This means that the objects of the
entity type must have a conceptual meaning by themselves; that the information
represented by the objects is understandable by itself; and that, from the application
domain's point of view, it makes sense to process the information represented by the
objects independently. For our sample airline, it makes sense to have an entity type PILOT
representing the pilots of the company and providing information such as the name, age,
and even shoe size of the pilots (if CAB provides the shoes for the pilots as part of their
uniforms). However, it would not make sense to have an entity type combining the shoe
size of pilots with the wing span of aircraft models because the individual objects of the
entity type would not have a reasonable conceptual meaning.
The term independent in this context does not mean that the objects of an entity type are
completely unrelated to objects of other entity types. On the contrary, in a real-life
entity-relationship model, there are many interconnections between the objects of the
various entity types. It is fairly suspicious if there are entity types having no associations
with other entity types. For our sample airline company, PILOT and AIRCRAFT MODEL are
two apparent entity types making sense on their own, but, nevertheless, being interrelated
with each other: Pilots have licenses to fly aircraft models.
Up to now, we talked about the objects belonging to an entity type. In modeling
terminology, the actual objects belonging to an entity type are referred to as instances of
the entity type, entity instances, or simply entities.
The term attribute is used to denote a conceptual piece of information with a distinct
meaning stored for the instances of an entity type. Attributes represent the conceptual type
of information stored, such as last name, and not actual values (such as MILLER for last
name). Therefore, it would be better to talk about attribute types, but this is not the
terminology generally used.
Attributes represent partial information for an entity type. Whether several pieces of
information together form a distinct entity type or just a set of attributes of a larger entity
type depends on their importance for the application domain and their independence. For
example, addresses consisting of country, state, city, and street only represent a set of
attributes for the pilots of our sample airline company. They would form a separate entity
type, identifying buildings, for a shipping company.
Notes:
Entity types and attributes have the following properties:
• Each entity type receives a unique name. This should be the generic class name
expressing the function of the instances of the entity type. By convention, the class
name is used in the singular form, as is done in biological classification, where
you talk about the class Human Being and not about the class Human Beings. All
capital letters will be used for the name. Since the name for an entity type is used for
reference purposes, it must be unique.
For our sample airline application domain, entity types are for example: AIRCRAFT
MODEL, AIRCRAFT, PILOT, and AIRPORT.
• For all instances of an entity type, the same common characteristics are stored.
Primarily, this means that the same attributes are stored. However, as we will see later
on, it also means the same types of relationships and/or constraints are recorded.
• The attributes stored for the instances of an entity type have a direct bearing on the
meaning of the entity type. In other words, attributes are not stored for the instances of
an entity type if they have nothing to do with the semantics of the entity type. For
example, Wing Span should not be an attribute of entity type PILOT. It has nothing to do
with pilots. It is an attribute of aircraft models and, thus, of entity type AIRCRAFT
MODEL.
As obvious as this statement seems to be, attributes are assigned to the wrong entity
type again and again.
• The attributes for an entity type receive a unique name. The name should clearly
express the meaning of the attribute, i.e., the characteristic it represents. The name of
an attribute may consist of multiple words. We will start each word with a capital letter
except for connecting words such as of, for, or and.
Examples of attributes are:
For entity type AIRCRAFT MODEL:
  Number of Engines, i.e., the number of engines for an aircraft model.
  Manufacturer, i.e., the company manufacturing an aircraft model.
For entity type AIRCRAFT:
  Aircraft Number, i.e., the unique serial number identifying an aircraft.
  Seat, i.e., the seats on an aircraft.
For entity type PILOT:
  Last Name, i.e., the last name of a pilot.
For entity type AIRPORT:
  Airport Code, i.e., the three-letter designator used in aviation for the various airports.
• For an entity instance, an attribute may assume no, one, or multiple values (an array of
values). However, all values assumed have the same meaning, namely, the meaning
imposed by the attribute. For example, attribute Seat for entity type AIRCRAFT may
assume multiple values: one for each seat on the particular aircraft. How many values
an attribute must assume at least or at most depends on the entity type.
• Attributes can be elementary or composite. From the perspective of the application
domain, the values of an elementary attribute are (logically) indivisible. This means they
cannot be subdivided into smaller units that, by themselves, have a meaning for the
application domain. Thus, they are not structured. For our sample airline company,
examples of elementary attributes are: the serial number for an aircraft and the number
of engines for an aircraft model.
In contrast, composite attributes consist of components. This means their values can be
decomposed into smaller units that have their own meaning for the application domain. All
values have the same structure imposed by the components. The components of a
composite attribute are logically related. They can be elementary attributes or again
composite attributes. Each value of a composite attribute for an entity instance is
composed of the appropriate values of its components for the entity instance.
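The distinction between elementary, composite, and multi-valued attributes can be illustrated with a short sketch. The Python class and field names below are assumptions that mirror attribute names used in this course, and the numeric values are made-up sample data, not actual aircraft dimensions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dimensions:
    """Composite attribute: every value has the same three components."""
    length: float
    height: float
    wing_span: float


@dataclass
class AircraftModel:
    type_code: str          # elementary attribute
    model_number: str       # elementary attribute
    dimensions: Dimensions  # composite attribute
    category: str           # elementary attribute


@dataclass
class Aircraft:
    aircraft_number: str                       # elementary, exactly one value
    seats: list = field(default_factory=list)  # Seat: no, one, or multiple values


# Made-up sample values for illustration only.
model = AircraftModel("B737", "500", Dimensions(30.0, 11.0, 28.0), "JET")
plane = Aircraft("SN-1", seats=["1A", "1B", "2A"])
```

A value of the composite attribute is composed of the values of its components, so model.dimensions.wing_span has a meaning of its own, whereas the characters of aircraft_number do not.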
Representation of Entity Types
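[Visual: entity type AIRCRAFT MODEL in standard representation (a rectangle containing only the name) and in attribute representation, followed by sample entity instances.]
Attribute representation:
AIRCRAFT MODEL
K Type Code
K Model Number
  Dimensions
    Length
    Height
    Wing Span
  Category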
Notes:
In an entity-relationship model, entity types are represented as rectangles. Most of the
time, the rectangle for an entity type just contains the name of the entity type because of
the limited size of the drawing area available for the entity-relationship model. This
representation is referred to as standard representation of the entity types.
A more detailed representation of an entity type includes attributes for the entity type. In
this case, the rectangle contains a header separated from the rest of the information by a
horizontal line. The header contains the name of the entity type. Below the header, the
attributes for the entity type are listed. For a composite attribute, its components may be
shown as well and are indented to identify them as components.
The attributes belonging to the entity key are preceded by the letter K, the nonkey
attributes are not. If a composite attribute (i.e., all its components) belongs to the entity key,
it is marked appropriately and not its components. This representation of entity types is
referred to as attribute representation of the entity types.
To make the key attributes better visible, their names will be italicized throughout this
document especially when representing entity instances.
Frequently, only a few sample attributes are shown to restrict the size of the rectangle. In
general, the illustrated attributes include the attributes of the entity key. Often, only the
attributes of the entity key are shown.
Because it tends to reduce the clarity of the entity-relationship model and because of the
limited size of the drawing area, generally, the attribute representation is only used if:
• the attributes are important for the understanding
• a small portion of the entity-relationship model is illustrated.
In general, tools only show the standard representation, i.e., the rectangles with the names.
If you click with the mouse on the rectangle for an entity type, a separate window is opened
providing a textual description of the entity type and listing the attributes as far as they have
been entered. A similar approach can be applied when using paper for the
entity-relationship model: The entity-relationship model is drawn in standard representation
and, for each entity type, a page is added providing details about the entity type including
the name, a textual description, and the attributes known.
Sometimes, it is desirable to illustrate a few sample instances for an entity type. An entity
instance is represented as a rectangular box with a header containing the name of the
entity type. The header is followed by a line for each desired attribute. For an elementary
attribute, the name of the attribute is followed by a colon (:) which, in turn, is followed by the
values of the attribute for the represented entity instance. If the attribute assumes multiple
values for the entity instance, the values are separated by commas.
If the components for a composite attribute are shown, the line for a composite attribute
only contains the name of the composite attribute. The components of composite attributes
are indented as for the attribute representation. If the components for a composite attribute
are not shown, the line for a composite attribute has the same format as a line for an
elementary attribute. The components of a value are enclosed in parentheses and
separated by commas.
As described for the attribute representation, we will italicize the lines (name and values)
for the key attributes throughout this course. If a composite attribute belongs to the entity
key and its components are shown, only the line for the composite attribute is italicized.
The examples on the visual list both key and nonkey attributes. Only a subset of the
attributes for entity type AIRCRAFT MODEL is shown. Generally, when developing the
initial entity-relationship model, you do not know all attributes for the entity types yet.
However, you should know the entity keys! If the key for an entity type cannot be derived
from the description of the related business object type in the problem statement, contact
the domain expert to identify the entity key.
The visual illustrates both the standard representation and the attribute representation for
entity type AIRCRAFT MODEL. It also shows two entity instances, a Boeing 747, Model
400, and an Airbus 310, Model 300.
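The attribute representation and the entity-instance layout described above can be sketched in a few lines of Python. This is only an illustration: the class EntityType and its methods are hypothetical names, and the choice of Type Code and Model Number as the entity key of AIRCRAFT MODEL is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityType:
    """Hypothetical helper: an entity type with key and nonkey attributes."""
    name: str
    key_attributes: tuple
    nonkey_attributes: tuple = ()

    def attribute_representation(self):
        # Header line, then one line per attribute; key attributes get a leading K.
        lines = [self.name]
        lines += [f"K {a}" for a in self.key_attributes]
        lines += [f"  {a}" for a in self.nonkey_attributes]
        return "\n".join(lines)

    def key_of(self, values):
        # The key value of an instance is the tuple of its key attribute values.
        return tuple(values[a] for a in self.key_attributes)

    def render_instance(self, values):
        # One line per attribute: name, colon, then the value(s) separated by commas.
        lines = [self.name]
        for attr in self.key_attributes + self.nonkey_attributes:
            if attr not in values:
                continue
            value = values[attr]
            parts = value if isinstance(value, (list, tuple)) else [value]
            lines.append(f"{attr}: {', '.join(str(p) for p in parts)}")
        return "\n".join(lines)


# Assumed key: Type Code + Model Number (not confirmed by the text above).
AIRCRAFT_MODEL = EntityType(
    "AIRCRAFT MODEL", ("Type Code", "Model Number"), ("Cruising Speed",)
)
boeing = {"Type Code": "B747", "Model Number": 400, "Cruising Speed": "930 km/h"}

print(AIRCRAFT_MODEL.attribute_representation())
print(AIRCRAFT_MODEL.render_instance(boeing))
```

Keeping the key attributes separate from the nonkey attributes mirrors the K marker of the attribute representation and makes it easy to extract the key value of an instance.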
Determining the Entity Types (1 of 2)
BUT . . .
Notes:
Since the entity-relationship model must reflect the application domain and visualize its
business object types, the problem statement for the application domain constitutes the
primary source for determining the entity types. For each business object type of the
application domain, there is normally an entity type in the entity-relationship model. The
entity types derived this way are usually referred to as basic entity types since they are
inherent (basic) to the application domain.
This illustrates how important it is that a good problem statement is available when the
modeling begins. Therefore, the database designer should insist on a good problem
statement being established by the domain expert (with the help of the database designer
to ensure that it contains the proper information). The better the problem statement, the
easier it is to develop the corresponding entity-relationship model.
You should realize that the final entity-relationship model will contain additional entity types
that were not apparent from the problem statement. For some of these entity types, the
corresponding business object types were simply forgotten in the problem statement, and
the problem statement should be corrected accordingly by the domain expert. Other entity
types were part of more complex business object types and must be separated out. We will
see such cases later in this unit. For them, you should also request an update of the
problem statement by the domain expert.
Furthermore, structuring requirements of the later steps of the design process may
introduce additional entity types. The entity-relationship model is updated as these entity
types are found. You may rightfully ask if these entity types do not have corresponding
business object types? In many cases (if not all), they indeed should have corresponding
business object types. However, these business object types are frequently not
immediately obvious to the domain expert because they play a secondary role from the
perspective of the application domain. It is highly advisable that the database designer
discusses these entity types with the domain expert and convinces him/her to update the
problem statement accordingly.
Determining the Entity Types (2 of 2)
Ask yourself the following questions:
Considered by themselves, do the entity instances have a
meaning for the application domain?
What is the generic class name?
Would the application domain conceivably process the
instances on their own?
How can the entity instances uniquely be identified?
What is the entity key?
Do the entity instances also have nonkey attributes?
Will there be eventually multiple instances of that type
for the application domain?
Notes:
This visual lists a set of questions you might want to ask yourself before accepting
something (a business object type) as entity type:
• Considered by themselves, do the instances of the candidate entity type have a reasonable
meaning for the application domain?
The term by themselves emphasizes the independence of the instances. Further
subquestions leading to the answer are:
- What would be the generic class name and does it make sense in the context of the
application domain? Does it indeed represent a conceptual entity compatible with the
application domain?
- Would the application domain conceivably process the instances on their own, i.e., do
the instances have a meaning by themselves, or are the instances only meaningful
when processed together with the instances of another entity type? In the latter case,
you may rather be dealing with a subset of attributes for the other entity type and not
with a separate entity type.
• How would the instances of the candidate entity type be uniquely identified, i.e., what
would be the entity key?
You should be able to find a set of attributes that identifies the entity instances in a
manner natural to the application domain.
• Do the instances of the candidate entity type also have nonkey attributes?
It is possible, but very rare, for all attributes of an entity type to belong to the entity key.
Therefore, you should be suspicious if there are no nonkey attributes.
• Will the candidate entity type eventually contain multiple instances or will there always
be only a single instance for the entity type?
Again, it is possible that an entity type will always contain just a single entity instance
(something like a control record), but it is very unusual and should make you suspicious.
If the problem statement does not contain the answers to the above questions, go back to
the domain expert or the appropriate department of expertise. Do not make unfounded
assumptions!
If you can answer all the above questions satisfactorily and affirmatively, the candidate
entity type is most likely a real entity type. However, you should be aware that the
affirmative answers only provide clues and not proofs that something is an entity type. That
is because the entity types depend on the application domain.
Entity Types - A Piece of Advice
You should establish the entity types carefully
However:
Do not linger on endlessly!!!
You may not have all information yet
Some entity types may be hidden and reveal themselves in
the subsequent steps
Some entity types may turn out not to be entity types as you get
more information and become wiser
Remember: It is an iterative
process!!!
Figure 4-8. Entity Types - A Piece of Advice CF182.0
Notes:
Since the entity-relationship model is the basis for all further steps of the design process,
you should establish the entity types very carefully. However, you should not linger on
endlessly. Projects have failed in the past because the participants fought endlessly over
what the entity types for the application domain were.
At such an early stage of the design process, you may not have all information to be a
hundred percent certain of the entity types, especially since the problem statement does
not list all data elements yet that play a role for the application domain. It only lists a few
sample data elements for each business object type.
Despite all good intentions when writing the problem statement, some entity types may
be hidden and only emerge in the subsequent steps or when all data elements are
compiled.
Conversely, some of the entity types may have been overrated and prove not to be entity
types after all as more information becomes available during the subsequent steps and you
become more familiar with the application domain.
Thus, establish the entity types carefully, but continue on to the next steps of the design
after you feel confident with what you have done. Remember that the design methodology
used in this course represents an iterative approach allowing you to continuously improve
the entity-relationship model and the dependent results.
Entity Types for CAB
MECHANIC PILOT
AIRCRAFT
MODEL
AIRPORT
AIRCRAFT
MAINTENANCE
RECORD
ITINERARY FLIGHT
Notes:
This visual illustrates the entity types for our sample airline company called Come Aboard.
The entity types were derived from the problem statement in Appendix A - Sample Problem
Statement. Since this is a fairly good problem statement, there is an entity type for each
business object type. However, later in this unit, we will see that some additional entity
types will have to be added.
The entity types have the following entity keys:
Relationship Type
A conceptual association between the entity
instances, one each, of two not necessarily different entity types
Relationship (Instance)
A specific interrelation of a given relationship type
between specific entity instances of the entity types for
the relationship type
Notes:
As the name already suggests, relationship types form the second component of
entity-relationship models. Initially, we will concentrate on relationship types between entity
types. Later, we will expand, i.e., generalize, the relationship type definition given here.
A relationship type (between entity types) is a conceptual association between the entity
instances, one each, of two not necessarily different entity types. Thus, it describes a class
of interrelationships, having the same characteristics, connecting the entity instances of
two entity types.
The terms relationship instance and relationship are used to denote a specific
interrelationship of a given relationship type between specific instances of the entity types
for the relationship type.
Relationship instances of the same relationship type always interconnect instances of the
same respective entity types. If r1 and r2 are relationship instances of the same
relationship type and r1 associates an instance of entity type E1 with an instance of entity
type E2, then r2 must also interconnect an instance of E1 to an instance of E2.
Furthermore, r1 and r2 must have the same meaning and characteristics.
Taking this into account, you can conceive of a relationship type as the entirety of all
(potential) relationships, with the same meaning, between entity instances of two (not
necessarily different) entity types.
By definition, relationship instances exist only as long as the instances they interconnect
exist. If one of these instances is deleted, the relationship instance no longer exists.
By definition, relationship types are binary in the sense that their instances always
interconnect two entity instances. At first glance, this seems to be restrictive, but it will
prove not to be the case when the relationship type definition is extended later in this unit.
Please note the similarity of the relationship type definition to the definition of business
relationship types given in Unit 3 - Problem Statement. Therefore, you may already suspect
that the business relationship types will be the primary source for the relationship types of
the entity-relationship model. This is indeed the case, but there will be additional
relationship types as we will see later in this unit.
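The rule that a relationship instance exists only as long as both interconnected instances exist can be sketched as follows. The class RelationshipType, its method names, and the use of key values to stand for entity instances are assumptions made for this illustration; the sample keys are illustrative values for the entity types PILOT and AIRCRAFT MODEL.

```python
class RelationshipType:
    """Hypothetical sketch: a binary relationship type between two entity types."""

    def __init__(self, source_type, name, target_type):
        self.source_type = source_type
        self.name = name
        self.target_type = target_type
        # Each relationship instance interconnects exactly two instances,
        # represented here by a (source key, target key) pair.
        self.instances = set()

    def connect(self, source_key, target_key):
        self.instances.add((source_key, target_key))

    def entity_deleted(self, entity_type, key):
        # A relationship instance ceases to exist when its source or
        # target instance is deleted.
        self.instances = {
            (s, t)
            for (s, t) in self.instances
            if not (
                (entity_type == self.source_type and s == key)
                or (entity_type == self.target_type and t == key)
            )
        }


can_fly = RelationshipType("PILOT", "_can_fly_", "AIRCRAFT MODEL")
can_fly.connect("0491337", ("B747", 400))
can_fly.connect("0491337", ("A340", 100))
can_fly.connect("1662951", ("A340", 100))

# Deleting the aircraft model removes every relationship instance that used it.
can_fly.entity_deleted("AIRCRAFT MODEL", ("A340", 100))
print(can_fly.instances)
```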
Arrow for
primary direction
Notes:
In the entity-relationship model, the entity types for a relationship type are interconnected.
Since relationship types are binary by definition as explained before, each relationship type
can be viewed from two directions. One of the directions is referred to as primary direction,
the other as inverse direction. The term primary seems to indicate that one of the directions
is more important than the other. From a data modeling perspective, this is not the case
and it is irrelevant which direction is chosen as primary direction. From an application point
of view, you may want to choose as primary direction the one that is more important to the
application.
In the above example, the relationship type interconnecting the entity types PILOT and
AIRCRAFT MODEL can be looked at from PILOT's point of view meaning that a pilot can
fly an aircraft model. The relationship type can also be looked at from AIRCRAFT MODEL's
point of view. Then, the meaning is that an aircraft model can be flown by a pilot. As
expected, the meanings are complementary. Let us choose the direction from PILOT to
AIRCRAFT MODEL as the primary direction.
In the entity-relationship model, the primary direction of a relationship type is indicated by
an arrow specifying the direction of the view.
To allow referencing them, all relationship types are uniquely named. More precisely, each
direction receives a unique name. In the entity-relationship model, the names for the
directions are placed next to the connecting arrow and the name for the inverse direction is
enclosed in parentheses. This convention together with the arrow for the primary direction
allows you to understand and interpret the relationship type correctly from the
entity-relationship model.
When talking about a direction of a relationship type, it makes sense to talk about the
source and the target of the direction. The source of the direction is the entity type from
which you look at the relationship type. The target is the opposite entity type.
In the example on the visual, _can_fly_ (more precisely,
PILOT_can_fly_AIRCRAFT MODEL as we will see in a minute) is the name of the primary
direction. PILOT is the source for the primary direction and AIRCRAFT MODEL its target.
For the inverse direction, the name is _can_be_flown_by_ (more precisely, AIRCRAFT
MODEL_can_be_flown_by_PILOT); AIRCRAFT MODEL is the source; and PILOT is the
target.
From a data modeling point of view, it is only important to be able to identify the relationship
type as such and not the various directions. For this, it is sufficient to list a single name for
the relationship type in the entity-relationship model. For simplicity, the name of the primary
direction is used since it does not require the enclosing parentheses.
People often talk about the source and target of a relationship type without mentioning a
specific direction. In this case, the source and the target of the primary direction are meant.
We will follow this convention as well throughout this course.
As mentioned before, the directions of a relationship type receive unique names. To avoid
overly lengthy names in the entity-relationship model, we are using the following naming
convention throughout the course:
• The full name for a direction always starts with the name of the source followed by an
underscore and always ends with the name of the target preceded by an underscore. All
words in between the source and target names are separated by underscores rather
than blanks.
• In the entity-relationship model, only the part of the name is shown that follows the name
of the source and precedes the name of the target. The names of the source and the
target are not shown. Thus, the illustrated name portion (abbreviated name) always
starts with an underscore and always ends with an underscore signaling the absence of
the source and target names.
This convention allows us to use the same abbreviated name in the entity-relationship
model for the directions of different relationship types or for both directions of a relationship
type while still being able to determine their full unique names.
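The naming convention can be expressed as two small helper functions. The function names full_name and abbreviated_name are hypothetical; the sketch only assumes the rules stated above (source name, underscore-delimited middle part, target name).

```python
def full_name(source, abbreviated, target):
    """Build the full name of a direction from its abbreviated name."""
    # The abbreviated name always starts and ends with an underscore.
    if not (abbreviated.startswith("_") and abbreviated.endswith("_")):
        raise ValueError("abbreviated name must start and end with an underscore")
    return f"{source}{abbreviated}{target}"


def abbreviated_name(full, source, target):
    """Strip the source and target names from a full direction name."""
    if not (full.startswith(source + "_") and full.endswith("_" + target)):
        raise ValueError("full name does not match the given source and target")
    return full[len(source):len(full) - len(target)]


primary = full_name("PILOT", "_can_fly_", "AIRCRAFT MODEL")
inverse = full_name("AIRCRAFT MODEL", "_can_be_flown_by_", "PILOT")
print(primary)  # PILOT_can_fly_AIRCRAFT MODEL
print(inverse)  # AIRCRAFT MODEL_can_be_flown_by_PILOT
```

Because the source and target are part of the full name, two directions may share an abbreviated name in the diagram and still have distinct full names.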
In addition to illustrating the relationship types in the entity-relationship model, you should
provide a detailed description for them on a separate piece of paper including the names
for both directions, the names of their sources and targets, and a textual description of the
meaning of the relationship type. When following the above naming convention, the names
of the source and target for a direction are implicitly identified and need not be specified
explicitly.
Relationship Instance Diagram
PILOT --_can_fly_ (_can_be_flown_by_)--> AIRCRAFT MODEL

PILOT
Employee Number: 0491337
Last Name: Miller
First Name: Jack
...

PILOT
Employee Number: 1662951
Last Name: Smith
First Name: Joe
...

AIRCRAFT MODEL
Type Code: B747
Model Number: 400
Cruising Speed: 930 km/h
...

AIRCRAFT MODEL
Type Code: A340
Model Number: 100
Cruising Speed: 890 km/h
...

AIRCRAFT MODEL
Type Code: A310
Model Number: 300
Cruising Speed: 860 km/h
...

Relationship instances (_can_fly_):
PILOT 0491337 --> AIRCRAFT MODEL B747, 400
PILOT 0491337 --> AIRCRAFT MODEL A340, 100
PILOT 1662951 --> AIRCRAFT MODEL A340, 100
PILOT 1662951 --> AIRCRAFT MODEL A310, 300
Notes:
Relationship instance diagrams are a useful means to illustrate a relationship type by
example. They cannot replace entity-relationship models. They can only help to better
visualize small parts of an entity-relationship model by means of examples.
In a relationship instance diagram, sample entity instances are interconnected by named
arrows in the manner intended by the subject relationship type.
The topmost part of the above visual shows how the relationship type is represented in an
entity-relationship model. The representation is followed by a relationship instance diagram
for the relationship type. The relationship instance diagram shows that pilot Miller, Jack
(employee number 0491337) can fly Boeing 747, Model 400 (type code B747, model
number 400), and Airbus 340, Model 100 (type code A340, model number 100). Pilot
Smith, Joe can fly Airbus 340, Model 100, and 310, Model 300.
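The sample relationship instances can also be written down as a set of pairs and read in either direction. The following is a minimal sketch; the helper names targets_of and sources_of are hypothetical, and pilots and aircraft models are represented by the key values shown on the visual.

```python
# Relationship instances of PILOT_can_fly_AIRCRAFT MODEL:
# (employee number, (type code, model number))
can_fly = {
    ("0491337", ("B747", 400)),
    ("0491337", ("A340", 100)),
    ("1662951", ("A340", 100)),
    ("1662951", ("A310", 300)),
}


def targets_of(relationship, source):
    """Primary direction: all targets connected to the given source."""
    return {t for s, t in relationship if s == source}


def sources_of(relationship, target):
    """Inverse direction: all sources connected to the given target."""
    return {s for s, t in relationship if t == target}


# Primary direction (_can_fly_): which models can pilot Miller fly?
print(targets_of(can_fly, "0491337"))
# Inverse direction (_can_be_flown_by_): which pilots can fly the A340, Model 100?
print(sources_of(can_fly, ("A340", 100)))
```

Both directions are answered from the same set of pairs, which illustrates why the choice of primary direction is irrelevant from a data modeling perspective.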
PILOT
Employee Number: 0491337
Last Name: Miller
First Name: Jack
...

PILOT
Employee Number: 1662951
Last Name: Smith
First Name: Joe
...

PILOT
Employee Number: 0844092
Last Name: Ferguson
First Name: Jane
...

FLIGHT
Flight Number: YY1842
From: FRA
To: JFK
Flight Locator: 453
Planned Departure
  Departure Date: 1999-07-21
  Departure Time: 10:30
...

FLIGHT
Flight Number: YY2843
From: ATL
To: SJC
Flight Locator: 210
Planned Departure
  Departure Date: 1999-08-01
  Departure Time: 16:35
...

Relationship instances:
PILOT 0491337 --_captain_for_--> FLIGHT YY1842, flight locator 453
PILOT 1662951 --_copilot_for_--> FLIGHT YY1842, flight locator 453
PILOT 1662951 --_captain_for_--> FLIGHT YY2843, flight locator 210
PILOT 0844092 --_copilot_for_--> FLIGHT YY2843, flight locator 210
Notes:
The above visual demonstrates that there may exist multiple different relationship types
between two entity types underlining why it is important to name the relationship types
(more precisely, their directions). The names allow you to differentiate the various
relationship types.
As described by the problem statement for our sample airline company called Come
Aboard, to each flight, one pilot is assigned as (flight) captain and another as copilot. This
gives rise to two relationship types between entity types PILOT and FLIGHT:
PILOT_captain_for_FLIGHT
PILOT_copilot_for_FLIGHT
The upper part of the picture illustrates their representation in an entity-relationship model.
Only the primary names are shown for the relationship types. For better distinguishability,
the connecting arrow for PILOT_copilot_for_FLIGHT has been dotted. This does not imply
a special meaning.
The lower part of the picture illustrates a relationship instance diagram comprising the two
relationship types. Pilot Miller, Jack with employee number 0491337 is captain for flight
YY1842, flight locator 453, from Frankfurt (FRA) to New York Kennedy airport (JFK). Pilot
Smith, Joe (employee number 1662951) is captain for flight YY2843, flight locator 210, from
Atlanta (ATL) to San Jose, California (SJC). Smith, Joe is also copilot for the flight for
which Miller, Jack is captain.
Pilot Ferguson, Jane is copilot for captain Smith's flight from Atlanta to San Jose.
The requirement that captain and copilot for a flight must be different cannot be modeled by
these relationship types. It must be expressed by means of constraints discussed later.
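To illustrate why this requirement needs a separate constraint, the following sketch stores the two relationship types as independent sets of pairs and checks the rule across them. The function name and the identification of a flight by flight number plus flight locator are assumptions made for the example.

```python
# (employee number, (flight number, flight locator)); the flight
# identification used here is an assumption for illustration only.
captain_for = {
    ("0491337", ("YY1842", 453)),
    ("1662951", ("YY2843", 210)),
}
copilot_for = {
    ("1662951", ("YY1842", 453)),
    ("0844092", ("YY2843", 210)),
}


def same_pilot_violations(captains, copilots):
    """Return the (pilot, flight) pairs where one pilot holds both roles."""
    # Neither relationship type can express this rule by itself; it only
    # becomes visible when both sets of instances are compared.
    return sorted(captains & copilots)


print(same_pilot_violations(captain_for, copilot_for))
```

Each set on its own is a valid collection of relationship instances; the rule only concerns the combination of the two.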
AIRPORT --_nonstop_to_--> AIRPORT

AIRPORT
Airport Code: SJC
Country: USA
City: San Jose
...

AIRPORT
Airport Code: ATL
Country: USA
City: Atlanta
...

AIRPORT
Airport Code: STR
Country: Germany
City: Stuttgart
...

Relationship instances (_nonstop_to_):
AIRPORT ATL --> AIRPORT SJC
AIRPORT STR --> AIRPORT ATL
AIRPORT STR --> AIRPORT SJC
AIRPORT SJC --> AIRPORT STR
Notes:
The problem statement for our sample airline company states that itineraries consist of
ordered collections of nonstop connections between airports. As the term connection
implies, nonstop connections are relationships between two airports: an airport has a
nonstop connection to another airport. As indicated on the visual, the abbreviated name for
the relationship type is _nonstop_to_. Accordingly, the full name is
AIRPORT_nonstop_to_AIRPORT. The first airport is the airport of departure, the second
airport the airport of arrival.
Even though the individual relationship instances are binary in that they interconnect two
entity instances, a relationship type interconnecting instances of the same entity type is
referred to as unary relationship type.
The upper part of the visual illustrates the representation of a unary relationship type in an
entity-relationship model: the arrow returns to the entity type it starts from.
The lower part of the visual illustrates instances for the represented relationship type.
Atlanta (ATL) has a nonstop connection to San Jose, California (SJC). Stuttgart, Germany
(STR), has nonstop connections to Atlanta and San Jose. San Jose has a nonstop
connection to Stuttgart.
AIRPORTS - ITINERARIES
An itinerary consists of one or more legs. The legs are nonstop connections
between two airports, the starting and ending airports for the leg. Airports
can be the starting or ending points for legs of multiple itineraries.
Notes:
Let us look closer at the business relationship type for Come Aboard associating airports
with itineraries. It states that itineraries consist of legs. The legs are nonstop connections
between two airports, the starting airport (airport of departure) and the ending airport
(airport of arrival) for the respective leg.
As explained on the previous visual, the legs (nonstop connections) are relationship
instances and not entity instances because they only reflect the fact that two airports are
interconnected, i.e., have a relationship. The appropriate relationship type was called
AIRPORT_nonstop_to_AIRPORT.
Since itineraries consist of one or more legs, they have relationships with legs, i.e., with
relationships (relationship instances). This means that we need to extend the definition of
relationship types to allow relationship types as source or target.
Relationship Types - Generalized Definition
Relationship Type
A conceptual association between:
The entity instances, one each, of two not necessarily
different entity types
The relationship instances, one each, of two not
necessarily different relationship types
The entity instances and the relationship instances,
one of each, of an entity type and a relationship type
Relationship (Instance)
A specific interrelation of a given relationship type
Notes:
This visual contains the general relationship type definition. It extends the previous
definition, which only allowed entity types as source or target, by allowing entity types or
relationship types as source or target of relationship types. All kinds of combinations are
allowed:
• The relationship instances can interconnect the instances of two (not necessarily
different) entity types. This was the initial, restricted, definition of relationship types.
• The relationship instances can interconnect the instances of two (not necessarily
different) relationship types.
• The relationship instances can interconnect an entity instance and a relationship
instance. Either one can be the source or the target. It is not important here which one is
the source or the target because the role can be reversed by selecting the other
direction of the relationship type as the primary direction.
A relationship instance in this extended sense is nothing else than a specific interrelation of
the considered relationship type.
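AIRPORT --_nonstop_to_--> AIRPORT
(AIRPORT_nonstop_to_AIRPORT) --_in_--> ITINERARY

AIRPORT
Airport Code: STR
...

AIRPORT
Airport Code: FRA
...

AIRPORT
Airport Code: ATL
...

AIRPORT
Airport Code: SFO
...

ITINERARY
Flight Number: YY3367
...

ITINERARY
Flight Number: YY0025
...

ITINERARY
Flight Number: YY0100
...

Relationship instances (_in_):
(ATL _nonstop_to_ STR) --> ITINERARY YY3367
(STR _nonstop_to_ FRA) --> ITINERARY YY3367
(STR _nonstop_to_ FRA) --> ITINERARY YY0025
(FRA _nonstop_to_ ATL) --> ITINERARY YY0025
(ATL _nonstop_to_ SFO) --> ITINERARY YY0025
(SFO _nonstop_to_ ATL) --> ITINERARY YY0100
(ATL _nonstop_to_ STR) --> ITINERARY YY0100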
Notes:
As described in the problem statement for Come Aboard, an itinerary consists of one or
more legs. We have already determined that the legs are relationship instances of
relationship type AIRPORT_nonstop_to_AIRPORT. Accordingly, we need a relationship
type interconnecting this relationship type and entity type ITINERARY. In the
entity-relationship model portion above, this relationship type is represented as an arrow
from relationship type AIRPORT_nonstop_to_AIRPORT (source) to entity type ITINERARY
(target). Its abbreviated name is _in_. According to our naming convention, the full name of
the (primary direction of the) relationship type is:
AIRPORT_nonstop_to_AIRPORT_in_ITINERARY
If necessary, parentheses may be used to avoid duplicate names or any
misunderstandings.
The lower part of the visual illustrates a relationship instance diagram for the
entity-relationship model portion of the upper part:
• The itinerary for flight number YY3367 is composed of two nonstop connections: one
from Atlanta (ATL) to Stuttgart (STR) and one from Stuttgart to Frankfurt (FRA).
• The itinerary for flight YY0025 consists of three legs (nonstop connections): one from
Stuttgart to Frankfurt, one from Frankfurt to Atlanta, and one from Atlanta to San
Francisco (SFO).
• The itinerary for flight number YY0100 consists of two legs: one from San Francisco to
Atlanta and one from Atlanta to Stuttgart.
If the airline company had flights that return to their airport of departure (e.g.,
sightseeing flights), airports could be connected to themselves.
The model does not make a statement about the order of the legs although the problem
statement specifies that an itinerary is an ordered collection of nonstop connections. For
the sample itineraries above, we have used the implicit rule that the starting airport for the
next leg must be the ending airport for the previous leg. This rule may not always hold true
or it may not provide the order of the legs if the starting and ending airports for an itinerary
are the same (around-the-world trips). It is possible to model the order of the legs. We will
do this after we have talked about the necessary modeling constructs later in this unit.
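The implicit ordering rule and its limits can be sketched as follows. The relationship type _in_ is stored as pairs whose source is itself a relationship instance (a leg); the function names legs_of and order_legs are hypothetical, and the procedure shown is only one possible way to apply the rule.

```python
# Instances of AIRPORT_nonstop_to_AIRPORT_in_ITINERARY:
# ((starting airport, ending airport), flight number of the itinerary)
in_ = {
    (("ATL", "STR"), "YY3367"), (("STR", "FRA"), "YY3367"),
    (("STR", "FRA"), "YY0025"), (("FRA", "ATL"), "YY0025"),
    (("ATL", "SFO"), "YY0025"),
    (("SFO", "ATL"), "YY0100"), (("ATL", "STR"), "YY0100"),
}


def legs_of(itinerary):
    """The unordered set of legs connected to an itinerary."""
    return {leg for leg, it in in_ if it == itinerary}


def order_legs(legs):
    """Order legs so that each leg starts where the previous one ended.

    Returns None when the rule does not determine a unique order,
    e.g., when the itinerary ends at its starting airport.
    """
    legs = set(legs)
    ends = {end for _, end in legs}
    first = [leg for leg in legs if leg[0] not in ends]
    if len(first) != 1:
        return None
    ordered = [first[0]]
    remaining = legs - {first[0]}
    while remaining:
        following = [leg for leg in remaining if leg[0] == ordered[-1][1]]
        if len(following) != 1:
            return None
        ordered.append(following[0])
        remaining.remove(following[0])
    return ordered


print(order_legs(legs_of("YY0025")))
print(order_legs({("SFO", "ATL"), ("ATL", "SFO")}))  # no unique first leg
```

The second call shows the case mentioned above: when the starting and ending airports of an itinerary coincide, the rule alone cannot provide the order of the legs.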
As defined, relationship types are binary in nature in that all relationship instances
interconnect two instances. Many modeling methodologies only allow entity types as
source or target of relationship types and, in order to compensate for the loss of
functionality, introduce n-ary relationship types.
N-ary relationship types interconnect the instances of n entity types. The business
relationship type Airports - Itineraries used for this visual would be considered as a ternary
(3-ary) relationship type by these methodologies whose instances interconnect three entity
instances: a starting airport, an ending airport, and the itinerary. For the correct
interpretation of n-ary relationship types, you need to define the roles of the entity types
within the relationship types.
In case of our sample ternary relationship type, you need to specify that the first airport is
the starting airport for the leg of the itinerary and the second airport the ending airport of
that leg. By doing this, you implicitly define a relationship between the two airports, namely,
that they are the starting and ending airports for a nonstop connection. This is the
relationship type explicitly implemented in the entity-relationship model portion of the
visual. It more clearly expresses the actual situation.
Binary relationship types are sufficient if relationship types are allowed as source or target.
By using only binary relationship types as we have defined them, the application domain is
much better structured and hidden relationship types are revealed. Furthermore, using only
binary relationship types avoids violations of the Fourth Normal Form and the Fifth Normal
Form.
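As a rough illustration of the comparison with n-ary relationship types, the sketch below starts from ternary instances (starting airport, ending airport, itinerary) and splits them into the two binary relationship types used in this unit. The variable names are hypothetical; the data are the sample legs and itineraries from the preceding visual.

```python
# Ternary view: each instance interconnects three entity instances, and the
# roles (first = starting airport, second = ending airport) must be defined.
ternary = {
    ("ATL", "STR", "YY3367"), ("STR", "FRA", "YY3367"),
    ("STR", "FRA", "YY0025"), ("FRA", "ATL", "YY0025"),
    ("ATL", "SFO", "YY0025"),
    ("SFO", "ATL", "YY0100"), ("ATL", "STR", "YY0100"),
}

# Binary view: the relationship between the two airports becomes explicit ...
nonstop_to = {(start, end) for start, end, _ in ternary}
# ... and a second binary relationship type uses it as its source.
in_ = {((start, end), itinerary) for start, end, itinerary in ternary}

# The ternary instances can be rebuilt from the binary ones, so nothing is lost.
rebuilt = {(start, end, itinerary) for (start, end), itinerary in in_}

print(len(nonstop_to), len(in_), rebuilt == ternary)
```

The seven ternary instances contain only five distinct nonstop connections; the binary view makes this otherwise hidden relationship type visible.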
Notes:
It is not always clear whether a business relationship type must be modeled as a
relationship type or just constitutes an attribute of an entity type. This is illustrated on the
visual by means of business relationship type AIRCRAFT - MAINTENANCE RECORDS for
our sample airline company.
The fact that there is a business relationship type seems to indicate that the
entity-relationship model should include a relationship type
AIRCRAFT_has_MAINTENANCE RECORD interconnecting entity types AIRCRAFT and
MAINTENANCE RECORD.
However, the description of the business relationship type states that the maintenance
records for an aircraft contain the serial number for the aircraft (aircraft number). This
seems to indicate that the aircraft number should be an attribute of entity type
MAINTENANCE RECORD. But, do not be fooled! The aforementioned text only
expresses that the aircraft number is displayed with a maintenance record (e.g., in the
maintenance-record form on paper or in a window on a screen). It does not describe how
the associations between aircraft and maintenance records are internally stored in a
database. For a relational database management system, they would not be stored as part
of the maintenance records if a maintenance record belonged to multiple aircraft. (In case
of our sample airline company, it can only belong to one aircraft.)
Moreover, the entity-relationship model is part of the conceptual view during which only the
conceptual interrelationships, and not any physical implementations, should be considered.
Accordingly, the fact that a maintenance record contains the aircraft number rather
expresses the relationship between maintenance records and aircraft (the inverse direction
of relationship type AIRCRAFT_has_MAINTENANCE RECORD).
Well, we must disappoint you in this case ... Unfortunately, the description of the business
relationship type includes the remark that the maintenance records (including the aircraft
number) must be kept even after the remaining information for the aircraft has been
deleted. This means that the association with the aircraft must be maintained.
This requirement prevents modeling the business relationship type between aircraft and
maintenance records as a relationship type in the entity-relationship model since
relationship instances, at all times, require the existence of their source and target
instances. A relationship instance (automatically) disappears when its source or target
instance is deleted.
Thus, the considered business relationship type cannot be expressed as a relationship
type. It must be expressed as an attribute of entity type MAINTENANCE RECORD.
If such an anomaly, as exemplified by the considered business relationship type, does not
exist, an association between two entity types should always be expressed as a
relationship type in the entity-relationship model regardless of any future implementation
considerations.
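The difference between the two modeling choices can be sketched with a small example. All identifiers and sample values below (aircraft numbers, record numbers, function name) are hypothetical and only serve to show what happens when an aircraft is deleted.

```python
# Hypothetical sample data: aircraft numbers and maintenance record numbers.
aircraft = {"AC-001", "AC-002"}

# Choice 1: the association as a relationship type (aircraft, record).
has_record = {("AC-001", "MR-17"), ("AC-002", "MR-18")}

# Choice 2: the aircraft number as an attribute of MAINTENANCE RECORD.
maintenance_records = {
    "MR-17": {"Aircraft Number": "AC-001"},
    "MR-18": {"Aircraft Number": "AC-002"},
}


def delete_aircraft(number, aircraft, has_record):
    """Delete an aircraft; relationship instances using it disappear as well."""
    remaining_aircraft = aircraft - {number}
    remaining_relationships = {(a, r) for (a, r) in has_record if a != number}
    return remaining_aircraft, remaining_relationships


aircraft, has_record = delete_aircraft("AC-001", aircraft, has_record)

# With the relationship type, the association to AC-001 is gone ...
print(has_record)
# ... while the attribute still records which aircraft MR-17 belonged to.
print(maintenance_records["MR-17"]["Aircraft Number"])
```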
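[Visual: entity-relationship model for Come Aboard with the entity types MECHANIC, PILOT,
AIRCRAFT MODEL, AIRPORT, AIRCRAFT, MAINTENANCE RECORD, ITINERARY, and FLIGHT. Abbreviated
relationship type names shown: _can_fly_, _trained_for_, _can_land_at_, _captain_for_,
_copilot_for_, _from_, _nonstop_to_, _in_, _scheduled_for_, _belongs_to_, and several
relationship types named _for_.]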
Notes:
This visual contains the relationship types for our sample airline company called Come
Aboard. The relationship types were derived from the problem statement contained in
Appendix A - Sample Problem Statement. Since this is a fairly good problem statement, the
relationship types could easily be derived from the business relationship types described
by the problem statement. However, later in this unit, we will see that some additional
relationship types will have to be added.
Note that there is no relationship type between MAINTENANCE RECORD and AIRCRAFT
as discussed before.
Cardinalities
[Visual: Upper part: AIRCRAFT MODEL 1..1 _for_ (_belongs_to_) 0..m AIRCRAFT.
Lower part: AIRCRAFT 0..1 _for_ (_has_been_assigned_) 0..m FLIGHT.
Possible cardinalities: 0..1, 0..m, 1..1, 1..m. The visual also shows the short forms 1 and m.]
Notes:
For modeling purposes and for the transformation of an entity-relationship model into tuple
types and tables, it is important to know if an instance of the source of a relationship type
can have relationships with multiple target instances, or vice versa, or only with a single
target or source instance. It is also important to know if a source or target instance must
always be connected to at least one target or source instance, respectively.
Since the relationship types of the entity-relationship model mostly correspond to the
business relationship types of the problem statement, the multiplicities for the relationship
types should be reflected by the descriptions of the corresponding business relationship
types in the problem statement. The multiplicities were required input for the business
relationship types. Since they are application-domain specific, the database designer
should not make assumptions about them on his/her own. He/She should consult the
domain expert or the appropriate department of expertise to obtain the correct information.
In the entity-relationship model, the multiplicities are expressed by cardinalities:
• The cardinality for the target describes how many target instances may be associated
with a single source instance and is placed close to the connecting arrow at the target
(end) of the relationship type.
• The cardinality for the source expresses how many source instances may be associated
with a single target instance and is placed close to the connection arrow at the source
(end) of the relationship type.
• A cardinality consists of two values, a minimum value and a maximum value separated
by two periods:
minimum .. maximum
Minimum can be 0 (zero) or 1. Maximum can be 1 or m where m is used as abbreviation
for many.
A minimum of 0 for the cardinality of the target (source) means that a source (target)
instance may not necessarily have a relationship with a target (source) instance.
A minimum of 1 for the cardinality of the target (source) means that a source (target)
instance must always have at least one relationship with the instances of the target
(source).
A maximum of 1 for the cardinality of the target (source) means that a source (target)
instance cannot have more than one relationship with the instances of the target
(source).
A maximum of m for the cardinality of the target (source) means that a source (target)
instance can have many relationships with the instances of the target (source).
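The minimum..maximum notation can be sketched as a small Python type. This is a minimal illustration, not part of the course material: the names Cardinality, parse_cardinality, and allows are hypothetical, and m is represented by None.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cardinality:
    minimum: int            # 0 or 1
    maximum: Optional[int]  # 1, or None standing for m (many)

    def allows(self, count: int) -> bool:
        """Return True if `count` associated instances satisfy this cardinality."""
        if count < self.minimum:
            return False
        return self.maximum is None or count <= self.maximum


def parse_cardinality(text: str) -> Cardinality:
    """Parse 'minimum..maximum', e.g. '0..1' or '1..m'."""
    low, high = text.split("..")
    minimum = int(low)
    if minimum not in (0, 1):
        raise ValueError("minimum must be 0 or 1")
    if high == "m":
        return Cardinality(minimum, None)
    if high != "1":
        raise ValueError("maximum must be 1 or m")
    return Cardinality(minimum, 1)
```

For example, parse_cardinality("1..1").allows(0) is False because the minimum of 1 requires at least one relationship, while parse_cardinality("0..m") accepts any count.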
The upper relationship type on the visual interconnects aircraft models and aircraft. For the
corresponding business relationship type, the problem statement states the following:
• An aircraft belongs to one and only one aircraft model.
• An aircraft model may apply to multiple aircraft.
The fact that an aircraft belongs to one and only one aircraft model is expressed by a
cardinality of 1..1 at the AIRCRAFT MODEL end of the relationship type since it describes
the cardinality for the source. The fact that an aircraft model may apply to multiple aircraft is
expressed by a cardinality of 0..m at the AIRCRAFT end of the relationship type. The
minimum value of zero allows for aircraft models for which there is no aircraft.
The lower part of the visual illustrates the cardinalities for relationship type
AIRCRAFT_for_FLIGHT for our sample airline company. According to the problem
statement, an aircraft may be used for many flights resulting in a target cardinality of 0..m.
Note that aircraft need not be assigned to flights at all times. According to the problem
statement, at most one aircraft can be assigned to a flight, but there need not be an aircraft
assigned to a flight. This results in a cardinality of 0..1 for the source of the relationship
type.
Because of the minimum and maximum values they can assume, the possible cardinalities
are 0..1, 0..m, 1..1, and 1..m.
Cardinalities (Example 1)
[Visual: MAINTENANCE RECORD (m) _from_ (1..1) MECHANIC.
At the MAINTENANCE RECORD end: possibly many maintenance records from a mechanic.
At the MECHANIC end: maintenance record from at least one mechanic.]
Notes:
The above visual illustrates the cardinalities for relationship type MAINTENANCE
RECORD_from_MECHANIC describing the interrelationships between maintenance
records and mechanics for our sample airline company.
A maintenance record must be from at least one mechanic as indicated by the minimum
value of 1 for the target cardinality (at the MECHANIC end of the relationship type).
Accordingly, in the relationship instance diagram, there must be at least one connection
from each maintenance record to a mechanic. The maximum value of 1 for the cardinality
reflects that a maintenance record can be from at most one mechanic. Consequently, there
must not be more than one connection from a maintenance record to mechanics.
The source cardinality of m is equivalent to a cardinality of 0..m. It specifies that a
mechanic may be responsible for multiple (many) maintenance records, but need not be
responsible for any:
• Mechanic 9163488 is responsible for a single maintenance record.
• Mechanic 0275912 is responsible for two, i.e., multiple, maintenance records.
• Mechanic 4712002 (currently) is not responsible for any maintenance records.
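The check behind these statements can be sketched as follows. The mechanic numbers are the ones named above; the maintenance numbers M-001 to M-003 are made up for illustration, and the variable names are hypothetical.

```python
from collections import Counter

# Relationship instances of MAINTENANCE RECORD_from_MECHANIC as
# (maintenance number, mechanic number) pairs. The maintenance numbers
# are assumptions; only the mechanic numbers come from the text.
from_mechanic = [
    ("M-001", "9163488"),
    ("M-002", "0275912"),
    ("M-003", "0275912"),
]
maintenance_records = ["M-001", "M-002", "M-003"]
mechanics = ["9163488", "0275912", "4712002"]

# Target cardinality 1..1: every maintenance record has exactly one mechanic.
per_record = Counter(record for record, _ in from_mechanic)
target_ok = all(per_record[record] == 1 for record in maintenance_records)

# Source cardinality m (= 0..m): any number of records per mechanic, including none.
per_mechanic = Counter(mechanic for _, mechanic in from_mechanic)
records_per_mechanic = {mechanic: per_mechanic[mechanic] for mechanic in mechanics}
```

Here target_ok is True, and records_per_mechanic maps the three mechanics to 1, 2, and 0 maintenance records, all of which the source cardinality permits.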
Cardinalities (Example 2)
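[Visual: AIRPORT is connected to itself by _nonstop_to_ with cardinality m at both ends.
AIRPORT_nonstop_to_AIRPORT is connected by _in_ to ITINERARY, with cardinality 1..m at the
source end (written vertically to save space) and m at the ITINERARY end.]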
Notes:
On the above visual, AIRPORT_nonstop_to_AIRPORT is a m:m relationship type: An
airport can be the airport of arrival or the airport of departure for any number of nonstop
connections (legs).
According to the problem statement for Come Aboard, an itinerary must always have at
least one leg and can have multiple legs. Thus, the source cardinality must be 1..m for the
_in_ relationship type. Note the way the cardinality is written on the visual to save space.
The target cardinality for the _in_ relationship type is m, meaning that a leg may be part
of multiple itineraries, but need not belong to any itinerary. It is fair to ask whether a leg
should not always belong to at least one itinerary, which would result in a target cardinality
of 1..m rather than 0..m. Why have a nonstop connection otherwise? The problem statement for
our airline company is not precise in this regard and we must consult the domain expert for
the correct cardinality. His/Her answer is that CAB wants to record planned nonstop
connections between airports even before itineraries are established. This means that the
cardinality of m is correct.
Relationship types with cardinalities of 1.. at both ends represent a kind of "chicken and
egg" problem when adding instances for the source or target. If an instance for the source
is added, the corresponding target instance, if it does not exist yet, and the interconnecting
relationship instance must be added at the same time. A similar scenario applies to adding
a target instance. The transaction concept of relational database management systems
allows this provided that the completeness check is performed at the end of the
transaction, i.e., when the transaction is committed, and not when the source or target
instance is inserted.
To avoid the problem from the beginning, it may be preferable to change one of the
minimum cardinalities to 0. In case of our example, this allows the legs to be established
(first) without a check being performed. However, you should note that this is an
implementation problem and not a conceptual design problem. Therefore, you should use
1.. cardinalities at both ends, if that is what the application domain requires, and handle the
resulting problem during the later design phases.
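The idea of postponing the completeness check to the end of the transaction can be sketched in plain Python. This is a toy model under assumptions, not how a relational database management system implements it: the class Transaction and its methods are hypothetical, the sample values are illustrative, and a minimum cardinality of 1 is assumed at both ends of the _in_ relationship type.

```python
class Transaction:
    """Collects inserts and checks the 1.. minimum cardinalities only at commit."""

    def __init__(self):
        self.itineraries = set()   # flight numbers
        self.legs = set()          # (from airport, to airport) pairs
        self.legs_in = set()       # relationship instances: (leg, flight number)

    def add_itinerary(self, flight_number):
        self.itineraries.add(flight_number)   # no check yet

    def add_leg(self, leg):
        self.legs.add(leg)                    # no check yet

    def connect(self, leg, flight_number):
        self.legs_in.add((leg, flight_number))

    def commit(self):
        """Every leg and every itinerary must take part in a relationship instance."""
        connected_legs = {leg for leg, _ in self.legs_in}
        connected_itineraries = {flight for _, flight in self.legs_in}
        if self.legs - connected_legs or self.itineraries - connected_itineraries:
            raise ValueError("minimum cardinality of 1 violated at commit")
        return True


itinerary_first = Transaction()
itinerary_first.add_itinerary("YY3367")           # itinerary inserted first ...
itinerary_first.add_leg(("ATL", "STR"))           # ... its leg afterwards
itinerary_first.connect(("ATL", "STR"), "YY3367")
committed = itinerary_first.commit()              # the check runs only here
```

Had the check run inside add_itinerary, the first insert would already have failed; deferring it to commit lets the instances be added in any order.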
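[Visual: the relationship key as a function of the maximum cardinalities.
Source ..1, target ..1: relationship key = key of source OR key of target.
Source ..1, target ..m: relationship key = key of target.
Source ..m, target ..1: relationship key = key of source.
Source ..m, target ..m: relationship key = key of target AND key of source.]
Figure 4-23. Defining Attributes and Relationship Key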
Notes:
To fully describe a relationship instance, you must specify the source and target instances
interconnected by the relationship instance. The source and target instances can be
identified by means of the values of their keys. If the source and target of the relationship
type are entity types, the keys are the respective entity keys. We will see in a moment what
the key is if the source or the target is a relationship type.
Since the keys of source and target completely describe and define the possible
relationship instances, they are referred to as defining attributes of the relationship type.
The defining attributes of a relationship type are completely independent of the cardinalities
for the relationship type.
Similar to the introduction of the term entity key, the term relationship key is introduced to
denote a subset of the defining attributes of a relationship type that can be used to uniquely
identify the potential relationship instances and does not contain any defining attributes not
needed for the unique identification (minimum principle).
It depends on the cardinalities for the relationship type which of the defining attributes can
form the relationship key:
• If both cardinalities of the relationship type are ..m cardinalities (i.e., 0..m or 1..m), each
source instance can be associated with multiple target instances and each target
instance with multiple source instances. Thus, each source or target key value may
occur as defining attribute of multiple relationship instances.
Consequently, a relationship instance can only be uniquely identified (referred to) by
providing both the key value for the source and the target. Accordingly, the relationship
key for the relationship type consists of the key of the source and the key of the target.
• If the cardinality of the source is ..1 and the cardinality of the target is ..m, there may be
multiple target instances for each source instance, but there may be only one source
instance for any target instance. Thus, a source key value may occur as defining
attribute of multiple relationship instances whereas a target key value can only occur as
defining attribute of a single relationship instance.
Consequently, a relationship instance can uniquely be identified by providing the value
of its target defining attribute, i.e., the key value of its target instance. In other words, the
relationship key consists of (the attributes of) the key of the target of the relationship
type.
• Similarly, if the cardinality of the target is ..1 and the cardinality of the source is ..m, there
may be multiple source instances for each target instance, but there may be only one
target instance for any source instance. Thus, the target key value may occur as
defining attribute of multiple relationship instances whereas a source key value can only
occur as defining attribute of a single relationship instance.
Consequently, a relationship instance can uniquely be identified by providing the value
of its source defining attribute, i.e., the key value of its source instance. In other words,
the relationship key consists of (the attributes of) the key of the source of the relationship
type.
• If both cardinalities are ..1, for every source instance there may only be one target
instance and vice versa. Thus, each source or target key value may occur once as
defining attribute of a relationship instance. A relationship instance can be uniquely
identified by providing the key value of its source instance or the key value of its target
instance. Only one is required.
Accordingly, you can choose as relationship key either the key of the source or the key
of the target of the relationship type, but not both (minimum principle).
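The four cases above can be condensed into one function. The following is a sketch only; the function name relationship_key and its parameters are hypothetical, and the maximum cardinalities are passed as the strings "1" and "m".

```python
def relationship_key(source_key, target_key, source_max, target_max):
    """Return the candidate relationship keys as a list of attribute tuples.

    source_max / target_max are the maximum values of the source and target
    cardinalities: "1" or "m".
    """
    source_key, target_key = tuple(source_key), tuple(target_key)
    if source_max == "m" and target_max == "m":
        return [source_key + target_key]   # both keys are needed
    if source_max == "1" and target_max == "m":
        return [target_key]                # one source per target instance
    if source_max == "m" and target_max == "1":
        return [source_key]                # one target per source instance
    if source_max == "1" and target_max == "1":
        return [source_key, target_key]    # choose either one, not both
    raise ValueError("maximum must be '1' or 'm'")
```

Only the 1:1 case returns two candidates, reflecting the free choice between the key of the source and the key of the target. The minimum values play no role, which is why they are not parameters.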
[Visual: relationship type AIRCRAFT MODEL_for_AIRCRAFT]
Defining Attributes: Type Code, Model Number, Aircraft Number
Relationship Key: Aircraft Number
Notes:
The visual shows relationship type AIRCRAFT MODEL_for_AIRCRAFT, a 1:m relationship
type since there may be multiple aircraft for each aircraft model, but one and only one
aircraft model for each aircraft.
The defining attributes for the relationship type are the keys of the source and the target,
i.e., Type Code and Model Number from AIRCRAFT MODEL and Aircraft Number from
AIRCRAFT.
Since there is only one aircraft model for each aircraft, Aircraft Number, i.e., the key of
entity type AIRCRAFT, becomes the relationship key.
Relationship Key (Example 2)
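[Visual: AIRPORT (K Airport Code) is connected to itself by _nonstop_to_ with roles From and To
and cardinality m at both ends. AIRPORT_nonstop_to_AIRPORT is connected by _in_ to ITINERARY
(K Flight Number), with cardinality 1..m at the source end and m at the ITINERARY end.
Defining Attributes of _nonstop_to_: From (Airport Code), To (Airport Code)
Defining Attributes of _in_: Flight Number, From (Airport Code), To (Airport Code)]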
Notes:
If we want to determine the defining attributes or the relationship key of relationship type
AIRPORT_nonstop_to_AIRPORT_in_ITINERARY, we first need to find the relationship key
of relationship type AIRPORT_nonstop_to_AIRPORT. Its source and target are entity types
so that we can immediately derive its defining attributes and relationship key. The defining
attributes are twice Airport Code, once playing the role of the airport of departure (From)
and once the role of the airport of arrival (To).
To make this apparent, you can (and should) indicate the respective roles at the
appropriate ends of the relationship type. The defining attributes for the relationship type
should be named accordingly. As done on the visual, you should add, in parentheses, the
original name of the attributes since the roles only act as synonyms for them.
AIRPORT_nonstop_to_AIRPORT is a m:m relationship type. Therefore, the relationship
key consists of all defining attributes.
After having determined the relationship key of AIRPORT_nonstop_to_AIRPORT, we also
know the defining attributes of relationship type
AIRPORT_nonstop_to_AIRPORT_in_ITINERARY. They consist of the key of the target
and the key of the source for the relationship type, i.e., of Flight Number (from ITINERARY)
and From and To (from AIRPORT_nonstop_to_AIRPORT). The sequence of the attributes
is not important.
Since AIRPORT_nonstop_to_AIRPORT_in_ITINERARY is a m:m relationship type, its
relationship key consists of its defining attributes, i.e., Flight Number, From, and To.
When determining the defining attributes or the relationship key of a relationship type, you
must back-step until you finally reach relationship types whose source and target are entity
types. Start determining the defining attributes and the relationship keys from there. If the
source or target of the _nonstop_to_ relationship type had been relationship types, you
would have had to back-step further to determine the defining attributes and the relationship
key.
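The back-stepping can be sketched as a recursion that bottoms out at entity types. This is an illustrative sketch: the classes EntityType and RelationshipType and the functions key_of, end_keys, and defining_attributes are hypothetical names, and which end of _nonstop_to_ carries the role From is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class EntityType:
    name: str
    key: Tuple[str, ...]


@dataclass(frozen=True)
class RelationshipType:
    source: Union[EntityType, "RelationshipType"]
    target: Union[EntityType, "RelationshipType"]
    source_max: str                     # "1" or "m"
    target_max: str                     # "1" or "m"
    source_roles: Tuple[str, ...] = ()  # optional role names replacing the source key
    target_roles: Tuple[str, ...] = ()  # optional role names replacing the target key


def key_of(node) -> Tuple[str, ...]:
    """Entity key for an entity type, relationship key for a relationship type."""
    if isinstance(node, EntityType):
        return node.key
    source_key, target_key = end_keys(node)
    if node.source_max == "m" and node.target_max == "m":
        return source_key + target_key
    if node.source_max == "1" and node.target_max == "m":
        return target_key
    # m:1, or 1:1 where either key would do; this sketch picks the source key.
    return source_key


def end_keys(rel):
    """Back-step: resolve the keys of both ends, recursing into relationship types."""
    source_key = rel.source_roles or key_of(rel.source)
    target_key = rel.target_roles or key_of(rel.target)
    return source_key, target_key


def defining_attributes(rel) -> Tuple[str, ...]:
    source_key, target_key = end_keys(rel)
    return source_key + target_key


airport = EntityType("AIRPORT", ("Airport Code",))
itinerary = EntityType("ITINERARY", ("Flight Number",))
# Assumption: From is the role at the source end, To the role at the target end.
nonstop_to = RelationshipType(airport, airport, "m", "m", ("From",), ("To",))
leg_in = RelationshipType(nonstop_to, itinerary, "m", "m")
```

With these definitions, defining_attributes(leg_in) yields From, To, and Flight Number, the same attributes as in the text in a different but equally valid sequence.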
Cardinalities for CAB
[Visual: Cardinalities for the relationship types of Come Aboard (CAB). Entity types:
MECHANIC, PILOT, AIRCRAFT MODEL, AIRPORT (roles From and To), AIRCRAFT,
MAINTENANCE RECORD (role Owner), ITINERARY, and FLIGHT, with the cardinality noted at
each end of every relationship type.]
Notes:
This visual contains the cardinalities for the relationship types for our sample airline
company called Come Aboard. The cardinalities for the relationship types were derived
from the description of the business relationship types contained in the problem statement
in Appendix A - Sample Problem Statement.
Based on the cardinalities, the relationship types have the following relationship keys:
Notes:
A closer look at entity type AIRCRAFT MODEL for our sample airline company reveals that
it contains some attributes that are not really aircraft model specific, but rather aircraft type
specific. For example, Category (JET, TURBOPROP, etc.), Manufacturer, and Number of
Engines are only dependent on Type Code, i.e., the type of the aircraft (e.g., B747), and not
on the specific model. They are the same for all models of the same type.
This leads to the conclusion that business object type Aircraft Models as described by the
problem statement is rather a combination of two entity types, namely, of entity types
AIRCRAFT TYPE and AIRCRAFT MODEL as illustrated on the right-hand side of the
visual. AIRCRAFT TYPE only contains the type-specific attributes and AIRCRAFT MODEL
the model-specific attributes (e.g., Dimensions consisting of Length, Height, and Wing
Span) that may be different for the models of a type.
The entity key of AIRCRAFT TYPE is Type Code. For AIRCRAFT MODEL, it consists of
Type Code and Model Number as before since Model Number alone is not unique.
Of course, entity types AIRCRAFT TYPE and AIRCRAFT MODEL are interconnected by a
relationship type:
[Visual: AIRCRAFT TYPE 1..1 _for_ 1..m AIRCRAFT MODEL. The AIRCRAFT MODEL end is marked D;
no cardinality is shown at the AIRCRAFT TYPE end since it is always 1..1.
Instance diagram (the type codes of connected instances must be equal):
AIRCRAFT TYPE (Type Code: B747) _for_ AIRCRAFT MODEL (Type Code: B747, Model Number: 400)
AIRCRAFT TYPE (Type Code: B747) _for_ AIRCRAFT MODEL (Type Code: B747, Model Number: 200)
AIRCRAFT TYPE (Type Code: A310) _for_ AIRCRAFT MODEL (Type Code: A310, Model Number: 200)
AIRCRAFT TYPE (Type Code: A310) _for_ AIRCRAFT MODEL (Type Code: A310, Model Number: 300)]
Notes:
An aircraft model cannot be connected to an arbitrary aircraft type. It can only be
associated with the aircraft type having the same type code as the aircraft model: A Boeing
747, Model 400 (Type Code = B747, Model Number = 400) is a Boeing 747 (type) and,
therefore, can only be associated with the instance of AIRCRAFT TYPE having the entity
key value B747.
An entity type being dependent on another entity type or on a relationship type in such a
way that
• a part of its entity key or its full key is the key of the other entity type or the key of the
relationship type
• each of its instances must be connected to, and only to, the entity instance or
relationship instance with the matching key value
is referred to as dependent entity type. In the entity-relationship model, the dependent
entity type is identified by the letter D at its end of the relationship type establishing the
dependency.
Because of this key interdependency, each dependent entity instance must belong to one
and only one parent instance. Thus, the cardinality at the parent end of the relationship
type establishing the dependency must always be 1..1 and is omitted to simplify the
diagram.
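The matching-key rule can be sketched with the instances from the visual. The function parent_of and the dictionary layout are hypothetical; the attribute names and values are taken from the instance diagram.

```python
# Instances as plain dicts; attribute names follow the visual.
aircraft_types = [{"Type Code": "B747"}, {"Type Code": "A310"}]
aircraft_models = [
    {"Type Code": "B747", "Model Number": "400"},
    {"Type Code": "B747", "Model Number": "200"},
    {"Type Code": "A310", "Model Number": "200"},
    {"Type Code": "A310", "Model Number": "300"},
]


def parent_of(model, types):
    """Return the one parent whose key equals the Type Code portion of the model key."""
    matches = [t for t in types if t["Type Code"] == model["Type Code"]]
    if len(matches) != 1:
        raise ValueError("dependent instance needs exactly one matching parent")
    return matches[0]


# Parent type code for each model, in the order listed above.
parents = [parent_of(m, aircraft_types)["Type Code"] for m in aircraft_models]

# Model Number alone is not unique (200 occurs for B747 and A310) ...
model_numbers = {m["Model Number"] for m in aircraft_models}
# ... but the full entity key (Type Code, Model Number) is.
full_keys = {(m["Type Code"], m["Model Number"]) for m in aircraft_models}
```

A model whose Type Code has no counterpart among the aircraft types raises an error, which mirrors the implied 1..1 cardinality at the parent end.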
Notes:
As described before, a dependent entity type is an entity type fulfilling the following
conditions:
• A part of its key or its entire key is equal to the key of another entity type or of a
relationship type. This entity type or relationship type is referred to as parent entity type
or parent relationship type, respectively.
• There must exist a relationship type between the parent entity type or relationship type
and the dependent entity type with the following characteristics:
- Each instance of the dependent entity type is, at all times, connected to one and only
one parent instance.
- The dependent and parent instances interconnected are those with matching values:
The value of the appropriate key portion of the dependent entity instance must be
equal to the key value of the parent instance.
The relationship type interconnecting the parent entity type or relationship type and the
dependent entity type is referred to as owning relationship type.
An entity type must not be dependent on more than one entity type or relationship type.
Should you see the need for a dependency on two parents, a relationship type between the
parents is missing and should be established. Should you see a dependency on more than
two parents, multiple interrelated relationship types are missing, and the dependent entity
type is to be based on the last of them.
As discussed before, the defining attributes of a relationship type are the keys of its source
and target. However, because of the matching values of the key/key portion, the key of the
dependent entity type is sufficient to completely describe the owning relationship type.
Therefore, the key of the parent is omitted.
As a consequence of the implied 1..1 cardinality at the parent end, the key of the
dependent entity type is also the key of the owning relationship type.
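[Visual: AIRPORT (K Airport Code) is connected to itself by _nonstop_to_ with roles From and To
and cardinality m at both ends. Relationship Key of _nonstop_to_: From (Airport Code),
To (Airport Code). AIRPORT_nonstop_to_AIRPORT is connected by _in_ (1..m at the source end,
m at the target end) to ITINERARY (K Flight Number). Relationship Key of _in_: Flight Number,
From (Airport Code), To (Airport Code). Dependent entity type LEG (marked D, cardinality 1..1)
is attached to _in_ by _as_; its attributes are K Flight Number, K From, K To, and Leg Number.]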
Notes:
Besides the defining attributes, relationship types may have additional attributes further
characterizing them. These attributes are referred to as nondefining attributes of the
relationship type.
When talking about the nonstop connections for an itinerary, we observed that the legs of
an itinerary must be ordered. Each instance of relationship type _in_
(AIRPORT_nonstop_to_AIRPORT_in_ITINERARY) having as target the considered
itinerary represents a leg of that itinerary. Its defining attributes specify the flight number for
the itinerary and the nonstop connection (starting and ending airports) for the leg.
To order the legs, it is necessary to assign an attribute (Leg Number) to relationship type
_in_ by means of which the sequence of the legs for the itinerary can be established. The
attribute cannot simply be added to entity type ITINERARY since, in this case, the
itineraries would be ordered without considering the legs. Nor can it be an
attribute of relationship type _nonstop_to_ (AIRPORT_nonstop_to_AIRPORT) since, in
this case, the nonstop connections would be ordered without consideration for the itineraries.
The order for a leg, however, depends on both the itinerary and the nonstop connection for
the leg. Consequently, Leg Number must be an attribute for relationship type _in_
(AIRPORT_nonstop_to_AIRPORT_in_ITINERARY).
Nondefining attributes are assigned to a relationship type by basing a dependent entity
type on the relationship type containing the attributes and the relationship key. Thus, to
each instance of the relationship type, zero, one, or more instances of the dependent entity
type are attached. The cardinality for the dependent entity type determines how many
instances of the dependent entity type can and must be attached to a relationship instance.
In case of the example on the visual, a dependent entity type is based on relationship type
_in_ containing nonkey attribute Leg Number. For each relationship instance, i.e., each leg
of an itinerary, it specifies the sequence number (leg number) for the appropriate nonstop
connection for the itinerary. Since the dependent entity type further describes the legs of
the itinerary, we have called it LEG in the above visual. The abbreviated name of the
owning relationship type is _as_. According to our naming convention, its full name is:
AIRPORT_nonstop_to_AIRPORT_in_ITINERARY_as_LEG
Because each leg only receives a single leg number, the cardinality for the dependent
entity type must be 0..1 or 1..1. Which of these it is depends on how you want to model it. If
you also want to assign a leg number to the only leg of a one-leg itinerary, the cardinality
must be 1..1. If you only want to sequence the legs of multi-leg itineraries, the cardinality
should be 0..1. We have chosen the first alternative because it treats itineraries more
uniformly, ensures that the legs of a multi-leg itinerary are always sequenced, and tends to be
more general.
In addition to the leg number, the dependent entity type contains the key of the parent
relationship type, i.e., the attributes Flight Number (coming from ITINERARY), From, and
To (both from AIRPORT_nonstop_to_AIRPORT).
Because of the maximum cardinality of 1, the key of the (parent) relationship type becomes
the key of the dependent entity type.
In addition to the key attributes, the dependent entity type may contain any number of
(nondefining) attributes for the relationship type (e.g., the planned departure and arrival
times for the leg) as long as the maximum value of the cardinality for the dependent
entity type is observed. However, dependent entity types should follow the rules for entity
types we have established before. In particular, the dependent entity type should have a
sensible meaning for the relationship type (and application domain) and its attributes
should all support that meaning. The dependent entity type should not be a garbage
collection. If necessary, introduce multiple dependent entity types for the relationship type
each having a well-defined meaning.
You also may want to introduce multiple dependent entity types for the relationship type if
many of the nondefining attributes are optional. In this case, you might prefer to have a
dependent entity type with cardinality 1..1 for the mandatory attributes and one or more
others with cardinality 0..1 for the optional attributes. All of the attributes of such an entity
type should be optional at the same time: If one of the attributes does not apply, the others
do not apply either. The resulting dependent entity types should again have a well-defined
meaning for the relationship type whose nondefining attributes they contain.
For attributes requiring a different maximum cardinality, you need different dependent entity
types (possibly multiple ones in accordance with the discussions above).
Nondefining Attributes - Sample Diagram
[Instance diagram for itinerary YY3367:
AIRPORT (Airport Code: ATL) _nonstop_to_ AIRPORT (Airport Code: STR) _in_ ITINERARY (Flight Number: YY3367)
   _as_ LEG (Flight Number: YY3367, From: ATL, To: STR, Leg Number: 1)
AIRPORT (Airport Code: STR) _nonstop_to_ AIRPORT (Airport Code: FRA) _in_ ITINERARY (Flight Number: YY3367)
   _as_ LEG (Flight Number: YY3367, From: STR, To: FRA, Leg Number: 2)]
Notes:
The above instance diagram illustrates dependent entity type LEG, containing the
nondefining attributes for relationship type
AIRPORT_nonstop_to_AIRPORT_in_ITINERARY (= _in_), for a sample itinerary. Because
of cardinality 1..1 for dependent entity type LEG, there is only one dependent entity
instance for each instance of relationship type _in_.
The dependent entity instance contains the key of its parent relationship instance and the
assigned leg number. The nonstop connection from Atlanta (ATL) to Stuttgart (STR) is the
first leg (Leg Number = 1) for itinerary YY3367. The nonstop connection from Stuttgart to
Frankfurt (FRA) is the second leg (Leg Number = 2) of the itinerary.
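The two LEG instances can be written down as a small Python sketch. The class Leg and its field names are hypothetical; the values are those of the instance diagram.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leg:
    flight_number: str   # key portion from ITINERARY
    from_airport: str    # key portion From (Airport Code)
    to_airport: str      # key portion To (Airport Code)
    leg_number: int      # nondefining attribute of relationship type _in_

    @property
    def key(self):
        return (self.flight_number, self.from_airport, self.to_airport)


legs = [
    Leg("YY3367", "STR", "FRA", 2),
    Leg("YY3367", "ATL", "STR", 1),
]

# Cardinality 1..1: at most one LEG instance per relationship key.
assert len({leg.key for leg in legs}) == len(legs)

# Leg Number establishes the sequence of the legs within the itinerary.
route = [
    (leg.from_airport, leg.to_airport)
    for leg in sorted(legs, key=lambda leg: leg.leg_number)
    if leg.flight_number == "YY3367"
]
```

Sorting by leg_number yields the route ATL to STR followed by STR to FRA, regardless of the order in which the instances were stored.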
Notes:
Using (nondefining) attributes, i.e., a dependent entity type, you can replace the two
relationship types PILOT_captain_for_FLIGHT and PILOT_copilot_for_FLIGHT by a single
relationship type PILOT_assigned_to_FLIGHT as illustrated in the lower part of the above
visual. Each instance of the new relationship type has associated with it an instance of
dependent entity type PILOT ASSIGNMENT specifying the function (CAPTAIN or
COPILOT) for the selected pilot on the selected flight.
Since a pilot can be assigned to multiple flights and multiple pilots can be assigned to a
flight, relationship type PILOT_assigned_to_FLIGHT is a m:m relationship type.
Accordingly, its key consists of the keys for PILOT and FLIGHT.
Since the cardinality for PILOT ASSIGNMENT is 1..1 (a pilot assigned to a flight has one
and only one function for that flight), no additional attributes are needed to achieve
uniqueness of the entity instances and the key of the parent relationship type becomes the
entity key of the dependent entity type.
Both approaches have advantages and disadvantages. The first approach of using two
relationship types ensures that not more than two pilots are assigned to a flight and not
more than one pilot as captain or copilot, respectively. However, without additional
constraints, it does not prevent a pilot from being assigned as captain and copilot to the
same flight. (Constraints are discussed later in this unit.)
The second approach, using a single relationship type, does not prevent the assignment of
multiple captains or copilots to a flight without additional constraints. Nor does it prevent
more than two pilots from being assigned to a flight. However, because of the uniqueness
requirement for the entity key, it ensures that a pilot only assumes one role for a flight.
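What the entity key of PILOT ASSIGNMENT does and does not guarantee can be sketched with a dictionary keyed by pilot and flight. The pilot and flight identifiers and the function names are hypothetical; CAPTAIN and COPILOT are the functions named in the text.

```python
# PILOT ASSIGNMENT instances keyed by the key of PILOT_assigned_to_FLIGHT.
# Pilot and flight identifiers are made up for illustration.
assignments = {}


def assign(pilot, flight, function):
    """Add a PILOT ASSIGNMENT; the (pilot, flight) key must be unique."""
    key = (pilot, flight)
    if key in assignments:
        raise KeyError(f"{pilot} already has a function on flight {flight}")
    assignments[key] = function


def captains(flight):
    """Pilots assigned as CAPTAIN to the given flight."""
    return [p for (p, f), fn in assignments.items() if f == flight and fn == "CAPTAIN"]


def one_captain_one_copilot(flight):
    """The extra constraint that the key alone does not enforce."""
    functions = sorted(fn for (_, f), fn in assignments.items() if f == flight)
    return functions == ["CAPTAIN", "COPILOT"]


assign("P-1", "F-100", "CAPTAIN")
assign("P-2", "F-100", "CAPTAIN")   # accepted: the key (pilot, flight) is still unique
```

The second call succeeds, so flight F-100 ends up with two captains unless one_captain_one_copilot is enforced separately. Assigning P-1 to F-100 a second time, in whatever function, is rejected by the key.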
The second solution is more flexible and open-ended. By removing the appropriate
constraints and allowing additional values for attribute Pilot Function, it enables Come
Aboard to introduce substitute captains and copilots (i.e., standbys for pilots that fall sick) or
to assign multiple captains or copilots to long flights for which the maximum flying period for
pilots would be exceeded. However, before introducing these new functions, they must be
discussed with and approved by the domain expert or the appropriate department of
expertise. In case of multiple captains and copilots for long flights, you can easily think of
additional attributes for dependent entity type PILOT ASSIGNMENT: for example, the time
in the flight when a pilot is captain or copilot.
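[Visual: an owning relationship type connects a parent with a dependent entity type (marked D).
A second relationship type leads to a target, starting either from the owning relationship type
or from the dependent entity type. In both cases:
Defining Attributes: Key of Dependent Entity Type, Key of Target
Relationship Key of the owning relationship type: Key of Dependent Entity Type]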
Notes:
It is conceivable that an owning relationship type is the source or target of another
relationship type. However, in this case, you can base the other relationship type on the
dependent entity type rather than on the owning relationship type as explained in the
following.
For simplicity, let us assume that the owning relationship type is the source of the second
relationship type. As explained before, the key of the owning relationship type is the key of
the dependent entity type. Therefore, the defining attributes for the second relationship
type are the key of the dependent entity type and the key of the target.
If the second relationship type had as source the dependent entity type, its defining
attributes would also be the key of the dependent entity type and the key of the target. This
means that the potential relationship instances are the same in both cases and that the two
implementations of the second relationship type are equivalent. Consequently, you can
base the second relationship type on the dependent entity type rather than on the owning
relationship type simplifying the entity-relationship model.
Controlling Property
[Visual: unary relationship type MAINTENANCE RECORD_belongs_to_MAINTENANCE RECORD with role
Owner and cardinality 1 at the target end, and with cardinality m and the controlling property C
at the source end. The instance diagram shows maintenance records 004711, 004712, 004713, and
004714 connected by _belongs_to_ relationship instances and marks the deletion of a relationship
instance followed by the deletion of the controlled instance.]
Notes:
The controlling property can be specified for the source or the target of a relationship type
or for both. In the entity-relationship model, it is indicated by the letter C at the end of the
relationship type to which it applies.
If you specify the controlling property for the source (target), the source (target) instance
belonging to a relationship instance is to be deleted when the relationship instance is
deleted.
As a modeling construct, the controlling property can only describe what should happen if a
relationship instance is deleted. Nevertheless, when talking about an example, people
often say: If this relationship instance is deleted, then this source (or target) instance is
deleted. This means that they talk about the effects of the controlling property when it is
implemented.
The above visual illustrates the controlling property for relationship type
MAINTENANCE RECORD_belongs_to_MAINTENANCE RECORD. The problem
statement for our sample airline company Come Aboard specifies that a maintenance
record should be deleted if its owning maintenance record is deleted. CAB's maintenance
records are hierarchically structured. A maintenance record can belong to another (one)
maintenance record, the owning maintenance record, and can have multiple subrecords.
The implied deletion of the subrecords is modeled by specifying the controlling property for
the subrecord end (the source) of relationship type
MAINTENANCE RECORD_belongs_to_MAINTENANCE RECORD. As indicated on the
visual, the controlling property implies that, as a result of the deletion of the relationship
instance connecting maintenance record 004712 to maintenance record 004711,
maintenance record 004712 is to be deleted.
The deletion of a maintenance record implies the deletion of all relationship instances
having the deleted maintenance record as their target. Consequently, the controlling
property for the source of the relationship type implies that all subrecords of a maintenance
record are to be deleted if the maintenance record is deleted. Thus, if maintenance record
004711 is deleted, maintenance records 004712, 004713, and 004714 should be deleted
as well.
Cascading Effect
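[Visual: relationship type MAINTENANCE RECORD_belongs_to_MAINTENANCE RECORD with
cardinality 1 at the Owner end and cardinality m, marked C, at the other end, together with
maintenance record instances and the deletion steps numbered ( 1 ) to ( 7 ).]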
Notes:
The controlling property may have a cascading effect: One deletion may "cause" many
others. This is especially true for unary relationship types as illustrated on the above visual:
• The maintenance record originally being deleted is the record with maintenance number
004711 ( 1 ).
• The deletion of maintenance record 004711 causes the deletion of the relationship
instances associating maintenance record 004711 with maintenance records 004721
and 004722 ( 2 ) because their target instance is deleted.
• The deletion of the two relationship instances, in conjunction with the controlling
property, implies that maintenance records 004721 and 004722 are to be deleted ( 3 ).
• Since maintenance record 004722 was the target of the relationship instance connecting
maintenance record 004801 to it, the relationship instance is deleted as well ( 4 ).
• Together with the controlling property, the deletion of the relationship instance
interconnecting maintenance records 004722 and 004801 implies that maintenance
record 004801 is to be deleted ( 5 ).
• The deletion of maintenance record 004801 causes the deletion of the relationship
instance interconnecting maintenance records 004801 and 004802 because the target
of the relationship instance has been deleted ( 6 ).
• Finally, the deletion of the relationship instance implies that maintenance record 004802
is to be deleted ( 7 ).
Thus, due to the controlling property, the deletion of a single maintenance record implies
the deletion of all maintenance records except maintenance record 002907. Maintenance
record 002907 is not interconnected to any of the deleted maintenance records.
The example illustrates very clearly that you must be careful when using the controlling
property and must understand its explicit and implicit effects. If the cascading effect is what
the application domain wants to achieve (as is the case for the maintenance records), the
usage of the controlling property is perfectly all right.
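
The cascading effect can be traced with a short Python sketch. It is only an illustration of the steps listed above, not part of the modeling notation: the dictionary `belongs_to` and the function `cascade_delete` are hypothetical names, and the relationship instances are the ones named in the bullets.

```python
from collections import defaultdict


def cascade_delete(belongs_to, deleted_first):
    """Return all maintenance records removed when `deleted_first` is deleted.

    `belongs_to` maps a subrecord (the source) to its owning record (the
    target). The controlling property is assumed to apply to the source.
    """
    # Index the relationship instances by their target (the owning record).
    subrecords = defaultdict(list)
    for subrecord, owner in belongs_to.items():
        subrecords[owner].append(subrecord)

    deleted = set()
    pending = [deleted_first]
    while pending:
        record = pending.pop()
        if record in deleted:
            continue
        deleted.add(record)
        # Deleting a record deletes the relationship instances that have it
        # as their target; the controlling property then deletes the sources.
        pending.extend(subrecords[record])
    return deleted


# Relationship instances taken from the steps ( 1 ) to ( 7 ) above.
belongs_to = {
    "004721": "004711",
    "004722": "004711",
    "004801": "004722",
    "004802": "004801",
}
all_records = set(belongs_to) | set(belongs_to.values()) | {"002907"}
removed = cascade_delete(belongs_to, "004711")
survivors = all_records - removed
```

With these instances, `survivors` contains only maintenance record 002907, which is not connected to any deleted record.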
Controlling for Relationship Type Attributes
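[Visual: relationship type AIRPORT_nonstop_to_AIRPORT_in_ITINERARY connecting AIRPORT
(K Airport Code, ...), in the roles From and To, with ITINERARY (K Flight Number, ...).
Dependent entity type LEG (K Flight Number, K From, K To, Leg Number) is attached to the
relationship type by _as_ with cardinality 1..1 and is marked C and D.]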
Notes:
As we discussed before, the nondefining attributes for relationship types are modeled by
means of dependent entity types. When a relationship instance is deleted, the dependent
entity instance or instances containing the nondefining attributes (values) for the
relationship instance must be deleted as well. This can be achieved by means of the
controlling property for the dependent entity type.
The visual illustrates this for dependent entity type LEG. If a nonstop connection is
removed from an itinerary, its leg number (a nondefining attribute for relationship type
AIRPORT_nonstop_to_AIRPORT_in_ITINERARY) should be deleted as well, as indicated
by the controlling property for LEG.
[Visual: entity types PILOT and MECHANIC side by side]
General employee information (both entity types):
K Employee Number, Last Name, First Name, Address, Date of Birth, ...
Pilot-specific information (PILOT only):
Date Last Checkup, Result Last Checkup, Date Next Checkup, Last Flown On, ...
Mechanic-specific information (MECHANIC only):
Area of Expertise, Type of Certification, Date Certification, Security Status, ...
Notes:
If you scrutinize the attributes for entity types PILOT and MECHANIC for our sample airline
company Come Aboard, you will realize that they have attributes (e.g., Last Name, First
Name, Address, and Date of Birth) that are common to both of them. They also have
attributes that are specific to the particular entity type: Date Last Checkup, Result Last
Checkup, Date Next Checkup, and Last Flown On only apply to pilots; Area of Expertise,
Type of Certification, Date of Certification, and Security Status only to mechanics.
The common attributes are not specific to pilots or mechanics. Rather, they are common to
all employees. Pilots and mechanics are subcategories (or subtypes) of employees. As
employees, the common attributes apply to them as well.
Since CAB does not want to distinguish the different types of employees when only
processing the employee information, it makes sense to introduce another entity type,
called EMPLOYEE, which functions as a supertype and contains the attributes common to
all employees. The common attributes are removed from PILOT and MECHANIC so they
only contain the attributes that are specific to pilots or mechanics, respectively.
The introduction of entity type EMPLOYEE is illustrated on the next visual. It leads to
supertypes and subtypes.
[Visual: subtypes PILOT (K Employee Number, Date Last Checkup, Result Last Checkup,
Date Next Checkup, Last Flown On, ...) and MECHANIC (K Employee Number, Area of
Expertise, Type of Certification, Date Certification, Security Status, ...), each with
cardinality 1 and marked D and C, connected to supertype EMPLOYEE.]
Notes:
When categorizing items, you form classes and subclasses. The subclasses structure the
elements of the classes. They do not contain different elements. Each member of a
subclass also belongs to the (superior) class to which the subclass belongs.
In modeling, the items categorized are the instances of entity types. The superior class is
called supertype entity type or supertype. The subclasses are referred to as subtype entity
types or subtypes. A supertype may have one or more subtypes. The term class structure
is used to denote the structure consisting of a supertype and its subtypes.
For each instance of a subtype, the supertype contains one, and only one, corresponding
instance reflecting the fact that a member of a subclass, at the same time, is a member of
the corresponding superior class. In the example on the visual, pilots and mechanics are,
at the same time, employees. Therefore, for each instance of entity types PILOT or
MECHANIC, there must be a corresponding instance in entity type EMPLOYEE.
A supertype can be considered as the (common) generalization of its subtypes.
Conversely, the subtypes can be considered as specializations of the supertype. Therefore,
the terms generalization and specialization are used in conjunction with supertypes and
subtypes.
Whereas there must be a supertype instance for each subtype instance, there need not be
a subtype instance for every supertype instance. This means that the specialization can be
incomplete (partial). Come Aboard, for example, has employees (other than pilots or
mechanics) who do not have specific attributes. For them, there is not a subtype.
The supertype/subtype concept implies that the total set of attributes for the conceptual
object represented by a subtype instance consists of its subtype attributes and the
attributes for the corresponding supertype instance. In our example, the total set of
attributes for a pilot consists of his/her pilot-specific attributes (as represented by the PILOT
instance) and his/her attributes as an employee (i.e., the attributes of the corresponding
EMPLOYEE instance).
Processing-wise, you want both the attributes of the subtype instance and of the
corresponding supertype instance when referring to a subtype instance. In contrast, when
referring to a supertype instance, i.e., when processing the represented object in the
quality expressed by the supertype, you only want the attributes of the supertype instance
and not the attributes of any subtype instances associated with it.
When a supertype instance is deleted, any associated subtype instances must be deleted
as well because the conceptual object associated with the supertype instance no longer
exists. By themselves, a subtype instance and its corresponding supertype instance can be
considered as partial instances. Together, they form the complete instance.
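
The processing rules above can be sketched in Python. This is a minimal illustration under stated assumptions: the dictionaries stand in for entity types EMPLOYEE, PILOT, and MECHANIC keyed by Employee Number, the attribute values are invented sample data, and `subtype_view` and `delete_employee` are hypothetical helper names.

```python
# Partial instances, keyed by Employee Number (sample values are invented).
EMPLOYEE = {
    1001: {"Last Name": "Example", "First Name": "Pat"},
    1002: {"Last Name": "Sample", "First Name": "Kim"},
}
PILOT = {1001: {"Last Flown On": "2002-01-15"}}
MECHANIC = {}
SUBTYPES = {"PILOT": PILOT, "MECHANIC": MECHANIC}


def subtype_view(subtype_name, employee_number):
    """Complete instance: supertype attributes plus subtype attributes."""
    supertype_part = EMPLOYEE[employee_number]
    subtype_part = SUBTYPES[subtype_name][employee_number]
    return {**supertype_part, **subtype_part}


def delete_employee(employee_number):
    """Deleting a supertype instance also deletes its subtype instances."""
    del EMPLOYEE[employee_number]
    for instances in SUBTYPES.values():
        instances.pop(employee_number, None)
```

Referring to employee 1001 as a pilot (`subtype_view("PILOT", 1001)`) yields the employee attributes together with the pilot-specific ones, whereas `EMPLOYEE[1001]` alone yields only the supertype attributes.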
Logically, the supertype has a relationship type of the form
supertype_is_subtype
with each subtype (e.g., EMPLOYEE_is_PILOT and EMPLOYEE_is_MECHANIC). To
indicate that these relationship types belong to the same class structure, they are
combined into a fork whose handle starts at the supertype. (Note that an entity type may be
structured in more than one way into subtypes making it necessary to group the
relationship types for a class structure.)
In addition, the supertype is identified by the letter S next to it. Without this indication, the
supertype could not be identified for class structures having just one subtype.
The set of _is_ relationship types interconnecting a supertype and its subtypes is referred
to as is-bundle. All relationship types of the is-bundle have a supertype cardinality of 1..1
since there must be one and only one supertype instance for every subtype instance.
Therefore, the supertype cardinality is omitted. It is considered implied by the letter S for
the supertype.
For each supertype instance, a subtype may contain at most one instance. Consequently,
the cardinality for the subtype of the _is_ relationship type must be ..1.
As entity types, the subtypes must have an entity key. The most natural choice is the entity
key of the supertype. In this case, a subtype instance is always to be connected to the
supertype instance with the matching key value. Accordingly, the subtype becomes a
dependent entity type and is marked as such.
Since a subtype instance is to be deleted when its supertype instance is deleted, the
controlling property applies to the subtypes.
Bundle Cardinalities
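[Visual: supertype EMPLOYEE (marked S) connected by the _is_ fork to subtypes PILOT and
MECHANIC, each with cardinality 1 and marked D and C. The bundle cardinality is written at
the fork.]
Bundle cardinality   Meaning for the example
0..1 or 1            Employee may be a pilot or a mechanic, but not both
1..1                 Employee must be a pilot or a mechanic, but not both
0..m or m            Employee may be a pilot and/or a mechanic
1..m                 Employee must be a pilot and/or a mechanic
Cardinalities of the form ..1 are exclusive; cardinalities of the form 1.. are covering.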
Notes:
As discussed before, for each subtype instance, there must be one, and only one,
supertype instance. The reversal of this statement is not true. There need not necessarily
be a subtype instance for a supertype instance. For a supertype instance, there may also
be instances in multiple subtypes. However, a subtype may contain at most one instance
for any supertype instance.
Whether every supertype instance must have a corresponding subtype instance in at least
one of the subtypes is one characteristic property of a class structure.
Whether a supertype instance can have corresponding subtype instances in multiple
subtypes is another characteristic property of a class structure.
These two properties are controlled by the bundle cardinality. The bundle cardinality is a
cardinality for the is-bundle rather than for the sources and targets of the relationship types
it comprises. The bundle cardinality specifies how many relationship instances of the
is-bundle (is-relationship instances) a supertype instance must have at least and may have at
most. In other words, it specifies how many prongs the fork must have at least and may
have at most for a supertype instance.
In the entity-relationship model, the bundle cardinality is specified at the point of the fork for
the is-bundle where the handle and the prongs meet. It can assume the following values:
0..1 or 1
A supertype instance may have at most one is-relationship instance. This means that a
supertype instance need not have a corresponding subtype instance in any of the
subtypes. It may have a corresponding instance in at most one subtype.
For the example on the visual, this would mean that an employee need not be a pilot or a
mechanic. The employee can be a pilot or mechanic, but cannot be both.
1..1
Every supertype instance must have one and only one is-relationship instance. This
means that a supertype instance must have a corresponding subtype instance in one and
only one subtype.
For the example on the visual, this would mean that an employee must be a pilot or a
mechanic, but cannot be both.
0..m or m
A supertype instance may have any number of is-relationship instances. This means that
a supertype instance need not have a corresponding subtype instance in any of the
subtypes. It may have corresponding instances (one each) in multiple subtypes.
For the example on the visual, this would mean that an employee need not be a pilot or a
mechanic, but can be a pilot or mechanic and can be both.
1..m
A supertype instance must have one or more is-relationship instances. This means that a
supertype instance must have corresponding subtype instances (one each) in at least one
subtype. It may have corresponding instances in multiple subtypes.
For the example on the visual, this would mean that an employee must be a pilot or
mechanic and can be both.
The bundle cardinality is only specified if the supertype has more than one subtype. In case
of only a single subtype, the subtype cardinality is sufficient.
As always, the correct choice of the bundle cardinality depends on the application domain.
The problem statement for our sample airline company implies that there are employees
other than pilots and mechanics. Thus, the bundle cardinality can only be 0..1 or 0..m.
Since the business constraints for Come Aboard state that a pilot cannot be a mechanic at
the same time, the bundle cardinality must be 0..1 (= 1) for the illustrated example.
Bundle cardinalities of the form ..1 specify that a supertype instance may have subtype
instances in at most one subtype. Therefore, the subtype set (the set of subtypes) is
referred to as exclusive.
Bundle cardinalities of the form 1.. specify that a supertype instance must have a
corresponding subtype instance in at least one subtype. Therefore, the subtype set is
referred to as covering.
You should ensure that the subtype cardinalities and the bundle cardinality are compatible.
If at least one of the subtype cardinalities is 1..1 (meaning that, for each supertype
instance, there must be a corresponding instance in this subtype), the bundle cardinality
should be 1.. (1..1 or 1..m).
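
The four bundle cardinalities and the compatibility rule can be summarized in a short Python sketch. It is an illustration only: the function names are hypothetical, a maximum of `None` stands for m, and a subtype minimum of 1 stands for a subtype cardinality of 1..1.

```python
def bundle_ok(minimum, maximum, subtype_count):
    """Check one supertype instance against a bundle cardinality.

    `subtype_count` is the number of subtypes that contain a corresponding
    instance; `maximum` is None for m.
    """
    if subtype_count < minimum:
        return False
    return maximum is None or subtype_count <= maximum


def classify(minimum, maximum):
    """Name the subtype set: ..1 is exclusive, 1.. is covering."""
    labels = []
    if maximum == 1:
        labels.append("exclusive")
    if minimum >= 1:
        labels.append("covering")
    return labels


def compatible(bundle_minimum, subtype_minimums):
    """A subtype cardinality of 1..1 calls for a bundle cardinality of 1.. ."""
    return bundle_minimum >= 1 or all(m == 0 for m in subtype_minimums)
```

For Come Aboard, the bundle cardinality 0..1 gives `classify(0, 1) == ["exclusive"]`: an employee who is neither pilot nor mechanic passes `bundle_ok(0, 1, 0)`, while an employee who is both fails `bundle_ok(0, 1, 2)`.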
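[Visual: supertype MAINTENANCE RECORD (K Maintenance Number, Date of Maintenance, Type of
Maintenance, ...; marked S) with unary relationship type _belongs_to_ (cardinality 1 at the
Owner end; cardinality m, marked C, at the other end). The _is_ fork with bundle
cardinality 1..1 connects it to subtypes ARCHIVE RECORD (K Maintenance Number, Aircraft
Number, Retention Date, ...) and ACTIVE RECORD (K Maintenance Number, ... ???), each with
cardinality 1 and marked D. Relationship type _for_ connects ACTIVE RECORD (m) with
AIRCRAFT (1..1).]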
Notes:
When we discussed the business relationship type between aircraft and maintenance
records before, we determined that the business relationship type cannot be expressed by
a relationship type in the entity-relationship model. The reason was that the maintenance
records for an aircraft, including the aircraft number, must be kept even after the remaining
information about the aircraft has been deleted. This led to the conclusion that the aircraft
number must be an attribute of entity type MAINTENANCE RECORD.
Using a class structure for the maintenance records, the business relationship type can be
expressed by means of a relationship type, but only for the maintenance records of
existing aircraft:
• The maintenance records are subdivided into two subtypes: Maintenance records for
aircraft owned by CAB (entity type ACTIVE RECORD) and maintenance records for
aircraft no longer owned by CAB (entity type ARCHIVE RECORD).
Since a maintenance record must be either an active record or an archive record, the
bundle cardinality must be 1..1. Both entity types are dependent entity types and the key
of the supertype (Maintenance Number) is also the entity key of the subtypes.
• Since the remaining aircraft information no longer exists for archive maintenance
records, subtype ARCHIVE RECORD must include the serial number of the aircraft to
which it belonged (Aircraft Number). It may contain other attributes, such as the date
until when the maintenance record must be retained (Retention Date), that only exist for
archive maintenance records.
• Besides the entity key, subtype ACTIVE RECORD may contain additional attributes that
only exist for active maintenance records. But are there any? This illustrates the
possibility of entity types just containing the entity key. As we mentioned before, you
should be suspicious of entity types not having nonkey attributes. So you should be
suspicious here, too, and question whether this is a good solution.
• By definition, active maintenance records belong to aircraft owned by Come Aboard.
Therefore, their relationship to aircraft can be expressed by a relationship type between
ACTIVE RECORD and AIRCRAFT.
• In general, other relationship types having MAINTENANCE RECORD as their source or
target are not affected by the introduction of the class structure.
If an aircraft is no longer owned by CAB and its entity instance is removed, the appropriate
instances for its maintenance records must be moved from subtype ACTIVE RECORD to
subtype ARCHIVE RECORD. This is enforced by the entity-relationship model:
• The target cardinality of 1..1 for relationship type ACTIVE RECORD_for_AIRCRAFT
requires, at all times, an aircraft for an active maintenance record. Consequently, if an
aircraft is deleted, its active maintenance record instances must either be assigned to
other aircraft or removed from ACTIVE RECORD. Since it would be incorrect to assign
them to other aircraft, they must be removed from ACTIVE RECORD.
• On the other hand, bundle cardinality 1..1 requires that each instance of
MAINTENANCE RECORD has an instance in either ARCHIVE RECORD or ACTIVE
RECORD. If an aircraft is deleted, its maintenance records cannot have instances in
ACTIVE RECORD as explained before. Thus, they must have instances in ARCHIVE
RECORD.
As an instance is moved from ACTIVE RECORD to ARCHIVE RECORD, the serial number
for the aircraft it belonged to (Aircraft Number) must be added along with any other
attributes for archive maintenance records.
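
The move from ACTIVE RECORD to ARCHIVE RECORD can be sketched in Python. This is a minimal illustration under assumptions: `active` maps a maintenance number to the aircraft number of its ACTIVE RECORD_for_AIRCRAFT relationship instance, `archive` holds the archive attributes, and all names, keys, and sample values are hypothetical.

```python
def delete_aircraft(aircraft_number, aircraft, active, archive, retention_date):
    """Delete an aircraft and move its active maintenance records to archive."""
    aircraft.discard(aircraft_number)
    for maintenance_number, for_aircraft in list(active.items()):
        if for_aircraft == aircraft_number:
            # Target cardinality 1..1: without its aircraft, the instance
            # cannot remain in ACTIVE RECORD.
            del active[maintenance_number]
            # Bundle cardinality 1..1: it must then exist in ARCHIVE RECORD,
            # which carries the aircraft number as an attribute.
            archive[maintenance_number] = {
                "Aircraft Number": aircraft_number,
                "Retention Date": retention_date,
            }


def exactly_one_subtype(maintenance_numbers, active, archive):
    """Bundle cardinality 1..1: each record is in exactly one subtype."""
    return all((n in active) != (n in archive) for n in maintenance_numbers)


# Invented sample instances.
aircraft = {"A1", "A2"}
active = {"M1": "A1", "M2": "A1", "M3": "A2"}
archive = {}
delete_aircraft("A1", aircraft, active, archive, "2010-12-31")
```

After the call, only M3 remains in `active`, M1 and M2 are in `archive` with Aircraft Number A1, and `exactly_one_subtype(["M1", "M2", "M3"], active, archive)` still holds.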
[Visual: entity-relationship model for Come Aboard]
Notes:
The above entity-relationship model for our sample airline company includes the changes
discussed since we established the cardinalities for the relationship types. However, it does
not include the alternate maintenance record solution on the previous visual since it does
not really provide an improvement in our case.
Note the following changes:
• Entity type AIRCRAFT TYPE has been introduced. Entity type AIRCRAFT MODEL
becomes dependent on it.
• Relationship type MAINTENANCE RECORD_belongs_to_MAINTENANCE RECORD is
controlling for its source.
• Relationship type AIRPORT_nonstop_to_AIRPORT_in_ITINERARY has Leg Number
as nondefining attribute in dependent entity type LEG.
• Relationship type AIRPORT_nonstop_to_AIRPORT_in_ITINERARY_for_FLIGHT has
been replaced by relationship type FLIGHT_for_LEG. Note that entity type FLIGHT
becomes dependent on LEG: The values of a portion of its entity key must always match
up with the appropriate values of the key of LEG.
The new relationship type seems to be more natural because its target is entity type
LEG. However, we can do this only because:
- We chose to have a leg number even for one-leg itineraries (cardinality of LEG is 1..1
in AIRPORT_nonstop_to_AIRPORT_in_ITINERARY_as_LEG); otherwise, there
would not be an entity instance of LEG for nonstop connections without leg number to
which we could connect flights.
- The two relationship types have the same defining attributes, the key of entity type
FLIGHT. (Also for the old relationship type, FLIGHT would have been a dependent
entity type.)
• AIRCRAFT MODEL_for_AIRPORT_nonstop_to_AIRPORT_in_ITINERARY has been
replaced by relationship type AIRCRAFT MODEL_for_LEG. Again, this can only be
done because the cardinality of LEG is 1..1 in relationship type
AIRPORT_nonstop_to_AIRPORT_in_ITINERARY_as_LEG and the defining attributes
of the new relationship type are the same as for the old relationship type. (Note that the
key of LEG is the same as the key of
AIRPORT_nonstop_to_AIRPORT_in_ITINERARY.)
• The two relationship types PILOT_captain_for_FLIGHT and
PILOT_copilot_for_FLIGHT have been replaced by relationship type
PILOT_assigned_to_FLIGHT as discussed.
• EMPLOYEE has been introduced as a supertype for PILOT and MECHANIC.
Constraints
Constraint
An interdependency between objects of an
entity-relationship model restricting the possible
instances of entity or relationship types
The interdependent objects can be attributes,
entity types, or relationship types
A single constraint can restrict the instances
of multiple entity or relationship types
Notes:
Constraints are interdependencies between the objects of an entity-relationship model
restricting the possible instances that entity types or relationship types can assume. The
interdependent objects can be attributes, entity types, or relationship types. A single
constraint can restrict the instances of multiple entity types and/or relationship types.
Logically, a constraint consists of three components: a set of constraining objects, a set of
constrained objects, and a rule. The constraining objects can be attributes, entity types, or
relationship types. Their values (attributes) or instances (entity types or relationship types)
restrict the instances of the constrained objects. The constrained objects may be entity
types or relationship types. The rule specifies how the values or instances of the
constraining objects restrict the instances of the constrained objects.
There may be all kinds of constraints for the entity types and relationship types of an
entity-relationship model. The simplest form of a constraint restricts the values of an
attribute of an entity type and, thus, the instances that the entity type can assume. In this
case, the constraining object is the attribute and the constrained object is the entity type.
The rule describes how the values of the attribute, and, thus, the instances of the entity
type, are constrained.
In principle, the value ranges (domains) of attributes could be considered as constraints.
However, these are not the constraints you would like to visualize in an entity-relationship
model since they would clutter it. You do need to document the domains of the attributes
(more precisely, of the data elements on which the attributes are based), but you do this
outside the entity-relationship model, namely, in the data inventory described in Unit 5 -
Data and Process Inventories.
In the entity-relationship model, you should only document restrictions for attributes that go
beyond the limitations imposed by the domains. The constraints that you really want to
visualize in an entity-relationship model are those where an attribute, entity type, or
relationship type restricts the instances of a different entity type or relationship type.
You should not formulate something as a constraint if it can reasonably be expressed by
other modeling constructs. However, there will always be constraints that cannot be
expressed by other modeling constructs even if additional modeling constructs were
introduced. The variety of possible constraints is so immense that it is impossible to cover
them all by additional modeling constructs.
The primary source of constraints is the set of business constraints for the considered
application domain. Generally, a nontrivial application domain will have many constraints.
So does the application domain for our sample airline company. For example, the business
constraint that an aircraft cannot have more engines mounted than the aircraft type (!)
allows gives rise to a constraint for entity type AIRCRAFT. We will study this and further
examples on the subsequent visuals.
Besides the constraints resulting from the business constraints, an entity-relationship
model may also contain other constraints that cannot directly be derived from business
constraints. We will see such an example as well.
Constraints in ER Model
Notes:
As mentioned before, a constraint can limit the instances of a single object or of multiple
objects. If a single entity type or relationship type is constrained, the constraint is positioned
near the constrained object.
If multiple objects are constrained by the same constraint, the constrained objects are
interconnected by a dotted line and the constraint is placed next to the connecting dotted
line. To avoid cluttering and to maintain the clearness of the entity-relationship model, you
may prefer not to connect the constrained objects, but rather repeat the constraint for every
constrained object. If the constrained objects are far apart in the entity-relationship model,
it may be difficult or even impractical to interconnect them.
To visualize the interdependency, a dotted arrow may be drawn from the constraining object
(in case of an attribute from its entity type) to a constrained object and the constraint placed
next to it.
In the entity-relationship model, the constraints themselves are documented as follows:
• The constraints are enclosed in braces.
• Each constraint consists of a unique identifier which is optionally followed by a colon (:)
and the rule describing the interdependency.
• The unique identifier for the constraint can be anything you like. Usually, it is a number.
Its purpose is to tie together repetitions of the same constraint (if multiple objects are
constrained by a single constraint as explained above) and to identify a detailed
description of the constraint outside the diagram.
• The colon and the rule may only be omitted if an outside description of the constraint is
provided.
Multiple constraints for an object may be placed within the same enclosing braces. The
different constraints are separated by semicolons (;).
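
The notation rules above can be illustrated with a small Python sketch that renders the text placed in the diagram. The function name `format_constraints` is hypothetical, and the exact spacing inside the braces is an assumption; a rule of `None` stands for a constraint that is only described outside the diagram.

```python
def format_constraints(constraints):
    """Render (identifier, rule) pairs in the brace notation.

    The colon and the rule are omitted when `rule` is None, which is only
    permissible if an outside description of the constraint exists.
    """
    parts = []
    for identifier, rule in constraints:
        parts.append(f"{identifier}: {rule}" if rule else str(identifier))
    # Multiple constraints for one object share the braces, separated by ";".
    return "{ " + "; ".join(parts) + " }"
```

For example, `format_constraints([("2a", None)])` gives `{ 2a }`, and two constraints for the same object are rendered within one pair of braces.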
For the rule, you may use conditional expressions or formulas, if applicable, or natural
language text. Natural language text may be easier to understand, but holds the danger of
ambiguities. However, many of the rules can only be formulated using natural language.
It would be possible to define a formal notation using conditional expressions,
mathematical symbols, set symbols, and functional operators covering most of the cases,
but this formal notation would be complex and not necessarily enhance the clarity of the
entity-relationship model. Most of the time, natural language may still be your best choice.
Therefore, we will use natural language in most of the examples in this document.
Constraints (Example 1)
[Visual: AIRCRAFT TYPE (K Type Code, Category, Manufacturer, Number of Engines, ...),
connected by _for_ to dependent entity type AIRCRAFT MODEL, which is connected by _for_ to
AIRCRAFT (K Aircraft Number, Date Acquired, Engine: value-1, value-2, value-3, value-4, ...).
A dotted arrow from AIRCRAFT TYPE to AIRCRAFT carries the constraint
{ 1: Number of engines for aircraft ≤ Number of engines for aircraft type }]
Constraint No. 1
Rule: Number of engines for aircraft ≤ Number of engines for aircraft type
Explanation: An aircraft cannot have more engines mounted than the
aircraft type allows
Figure 4-44. Constraints (Example 1) CF182.0
Notes:
The problem statement for Come Aboard in Unit 3 - Problem Statement states as a
business constraint that an aircraft cannot have more engines mounted than the aircraft
model allows. In the meantime, we have learned that the number of engines is rather a
characteristic of the aircraft type and, therefore, an attribute of entity type AIRCRAFT TYPE
(and not of entity type AIRCRAFT MODEL) as illustrated in the above entity-relationship
model portion.
Attribute Number of Engines of entity type AIRCRAFT TYPE is the constraining object of
the constraint. It restricts how many values the attribute Engine may have for instances of
entity type AIRCRAFT. Thus, it constrains the instances of entity type AIRCRAFT, the
constrained object.
The dotted arrow from AIRCRAFT TYPE to AIRCRAFT visualizes who constrains whom.
The braces next to the arrow contain the identifier (1) and the rule for the constraint.
At the bottom of the visual, an outside (of the entity-relationship model) description of the
constraint is given. It repeats the rule and provides an explanation. A more complete
outside description should list the constraining objects (Number of Engines in entity type
AIRCRAFT TYPE) and the constrained objects (AIRCRAFT).
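
As an illustration, the rule of constraint 1 can be written as a check in Python. This is a sketch under assumptions: the dictionaries model the path from AIRCRAFT via AIRCRAFT MODEL to AIRCRAFT TYPE, and all keys and values are invented sample data.

```python
# Constraining object: attribute Number of Engines of AIRCRAFT TYPE.
number_of_engines = {"T1": 2}
# Hypothetical instances of the two _for_ relationship types.
type_of_model = {"MOD1": "T1"}
model_of_aircraft = {"AC1": "MOD1"}


def satisfies_constraint_1(aircraft_number, engines):
    """True if the aircraft has no more Engine values than its type allows."""
    aircraft_type = type_of_model[model_of_aircraft[aircraft_number]]
    return len(engines) <= number_of_engines[aircraft_type]
```

An AIRCRAFT instance for AC1 with two values for attribute Engine is admissible; one with three values is not.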
Constraints (Example 2)
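[Visual: relationship types PILOT_captain_for_FLIGHT and PILOT_copilot_for_FLIGHT, each
with cardinality 1 for PILOT and m for FLIGHT, annotated with constraints { 2a } and { 2b }.]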
Constraint No. 2a
Rule: The captain for a flight cannot become the copilot at the same time
Explanation: A pilot that has been assigned as captain to a flight cannot become
copilot for the flight at the same time. This means that relationship
type PILOT_copilot_for_FLIGHT cannot receive a relationship
instance already contained in PILOT_captain_for_FLIGHT.
Constraint No. 2b
Rule: The copilot for a flight cannot become the captain at the same time
Explanation: A pilot that has been assigned as copilot to a flight cannot become
captain for the flight at the same time. This means that relationship
type PILOT_captain_for_FLIGHT cannot receive a relationship
instance already contained in PILOT_copilot_for_FLIGHT.
Figure 4-45. Constraints (Example 2)
Notes:
This example demonstrates that a business constraint may result in multiple constraints for
the entity-relationship model.
CAB has a business constraint specifying that the captain and copilot for a flight must be
different. Assuming that the pilot assignment is modeled by the two relationship types
PILOT_captain_for_FLIGHT and PILOT_copilot_for_FLIGHT (original solution), the
translation of the business constraint into constraints for the entity-relationship model
results in two constraints:
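• The first constraint (2a) constrains the instances of relationship type
PILOT_copilot_for_FLIGHT by requiring that a pilot that has already been assigned as
captain to a flight cannot become the copilot of the flight as well. Thus, the instances of
PILOT_copilot_for_FLIGHT are constrained by the instances of
PILOT_captain_for_FLIGHT. If PILOT_captain_for_FLIGHT already contains an
instance for a specified pilot and flight, an instance for them must not be added to
PILOT_copilot_for_FLIGHT.
• The second constraint (2b) constrains the instances of relationship type
PILOT_captain_for_FLIGHT in the same way: A pilot that has already been assigned as
copilot to a flight cannot become the captain of the flight as well. If
PILOT_copilot_for_FLIGHT already contains an instance for a specified pilot and flight, an
instance for them must not be added to PILOT_captain_for_FLIGHT.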
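
Constraints 2a and 2b can be sketched as two symmetric checks in Python. This is only an illustration: each relationship type is represented as a set of (pilot, flight) pairs, and the function names and sample identifiers are hypothetical.

```python
def may_add_copilot(pilot, flight, captain_for):
    """Constraint 2a: the pair must not already be in PILOT_captain_for_FLIGHT."""
    return (pilot, flight) not in captain_for


def may_add_captain(pilot, flight, copilot_for):
    """Constraint 2b: the pair must not already be in PILOT_copilot_for_FLIGHT."""
    return (pilot, flight) not in copilot_for


# Invented sample instances of the two relationship types.
captain_for = {("P1", "F100")}
copilot_for = {("P2", "F100")}
```

Here `may_add_copilot("P1", "F100", captain_for)` is false because P1 is already captain for F100, while another pilot could still be added as copilot.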
Constraints (Example 3)
Constraint No. 2
Rule: Pilot Function must be unique for a flight
Explanation: For a flight, each function (CAPTAIN or COPILOT) must only be
assigned once. This means the combination (Flight Number,
From, To, Flight Locator, Pilot Function) must be unique.
Figure 4-46. Constraints (Example 3)
Notes:
This visual illustrates the constraint required if the pilot assignment is modeled using a
single relationship type with nondefining attributes (dependent entity type PILOT
ASSIGNMENT). As we discussed before, in this case, the uniqueness of the entity key of
dependent entity type PILOT ASSIGNMENT automatically takes care of the requirement
that a pilot be assigned only once to a flight.
However, in order to ensure that not more than one captain and not more than one copilot
are assigned to a flight, we need a constraint. The rule for the constraint is simply that the
value of attribute Pilot Function must be unique for each flight. In other words, the
quintuplet of attributes (Flight Number, From, To, Flight Locator, Pilot Function) must be
unique.
In this case, the five attributes are the constraining objects and entity type PILOT
ASSIGNMENT is the constrained object.
Note that the above constraint does not restrict the values of attribute Pilot Function to the
two values CAPTAIN and COPILOT. This should be achieved through the domain definition
for the appropriate data element.
Furthermore, this constraint is not a direct derivative of a business constraint. It originates
from the way we have modeled the pilot assignment. It does not enforce that the captain
and copilot for a flight are different. (This is achieved in another way.) It only enforces that
each function is only assigned once to a flight.
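The uniqueness rule of Constraint No. 2 can be sketched as a check over the instances of PILOT ASSIGNMENT. This is only an illustration: the class name, the field names, the employee number used to identify the pilot, and all sample values are assumptions and are not taken from the Come Aboard model.

```python
from collections import Counter
from typing import NamedTuple


class PilotAssignment(NamedTuple):
    # Assumed layout: key of the flight, an identifier for the pilot,
    # and the nondefining attribute Pilot Function.
    flight_number: str
    from_airport: str
    to_airport: str
    flight_locator: str
    employee_number: str
    pilot_function: str


def violations_of_constraint_2(assignments):
    """Return the (flight, function) combinations that occur more than once."""
    counts = Counter(
        (a.flight_number, a.from_airport, a.to_airport,
         a.flight_locator, a.pilot_function)
        for a in assignments
    )
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical instances: two copilots for the same flight.
assignments = [
    PilotAssignment("CA100", "FRA", "JFK", "L1", "00017", "CAPTAIN"),
    PilotAssignment("CA100", "FRA", "JFK", "L1", "00042", "COPILOT"),
    PilotAssignment("CA100", "FRA", "JFK", "L1", "00099", "COPILOT"),
]
print(violations_of_constraint_2(assignments))
```

The check says nothing about whether captain and copilot are different pilots or whether Pilot Function holds a legal value. As the notes point out, those requirements are covered by the entity key and the domain definition.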
Constraints (Example 4)
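[Figure: entity-relationship diagram with entity types PILOT, AIRCRAFT MODEL, LEG, and FLIGHT, relationship types PILOT_can_fly_AIRCRAFT MODEL, AIRCRAFT MODEL_for_LEG, FLIGHT_for_LEG, and PILOT_assigned_to_FLIGHT, and constraint {3}]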
Constraint No. 3
Rule: Pilot for flight must have license to fly aircraft model for leg
Explanation: A pilot assigned to a flight must be able, i.e., have the license,
to fly the aircraft model for the leg for the flight.
Notes:
Come Aboard has a business constraint requiring that the pilots for a flight must have the
license to fly the aircraft model for the leg for the flight, i.e., can fly the aircraft model. The
above visual illustrates how this business constraint can be translated into a constraint for
the entity-relationship model.
For the constraint, relationship types PILOT_can_fly_AIRCRAFT MODEL,
FLIGHT_for_LEG, and AIRCRAFT MODEL_for_LEG are the constraining objects since
their instances determine the pilots that can be assigned to a given flight: The aircraft
model must be for the leg of the flight and the pilot must be able to fly the aircraft model. In
other words, for a given flight, the aircraft model must be determined using relationship
types FLIGHT_for_LEG and AIRCRAFT MODEL_for_LEG. Then, the resulting aircraft
model must be used to determine the pilots that can fly the aircraft model.
The constrained object is relationship type PILOT_assigned_to_FLIGHT because only
special pilots can be assigned to a flight, namely, those that can fly the aircraft model for
the leg for the flight (rule).
Note that dependent entity type PILOT ASSIGNMENT for relationship type
PILOT_assigned_to_FLIGHT is not shown on the visual.
You might get the idea that you could avoid the constraint by having a relationship type
(PILOT_can_fly_AIRCRAFT MODEL)_assigned_to_(AIRCRAFT MODEL_for_LEG),
interconnecting PILOT_can_fly_AIRCRAFT MODEL and AIRCRAFT MODEL_for_LEG,
and basing the relationship type assigning pilots to flights on this relationship type rather
than on PILOT. Not so since an instance of AIRCRAFT MODEL_for_LEG could be paired
with any instance of PILOT_can_fly_AIRCRAFT MODEL, even with one that had a different
aircraft model! There is nothing in the relationship type definition enforcing that only
particular instances can be interconnected. Only a constraint could ensure that only
instances with the same aircraft model were interconnected.
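The way the three constraining relationship types determine the pilots for a flight can be sketched with plain sets of instance pairs. The sketch assumes that each relationship instance is a simple pair of identifiers. All identifiers and the function names are made up for illustration.

```python
# Hypothetical instances of the constraining relationship types.
pilot_can_fly_model = {("P1", "B747"), ("P2", "A320"), ("P2", "B747")}
model_for_leg = {("B747", "FRA-JFK"), ("A320", "FRA-LHR")}
flight_for_leg = {("F1", "FRA-JFK"), ("F2", "FRA-LHR")}


def eligible_pilots(flight):
    """Pilots licensed for the aircraft model for the leg for the flight."""
    # Step 1: FLIGHT_for_LEG gives the leg of the flight.
    legs = {leg for f, leg in flight_for_leg if f == flight}
    # Step 2: AIRCRAFT MODEL_for_LEG gives the aircraft model for that leg.
    models = {m for m, leg in model_for_leg if leg in legs}
    # Step 3: PILOT_can_fly_AIRCRAFT MODEL gives the pilots for that model.
    return {p for p, m in pilot_can_fly_model if m in models}


def satisfies_constraint_3(pilot_assigned_to_flight):
    """True if every (pilot, flight) pair respects Constraint No. 3."""
    return all(p in eligible_pilots(f) for p, f in pilot_assigned_to_flight)


print(eligible_pilots("F2"))
print(satisfies_constraint_3({("P1", "F1"), ("P2", "F2")}))
print(satisfies_constraint_3({("P1", "F2")}))
```

In this sample data, pilot P1 is not licensed for the model flying the leg of flight F2, so the last assignment violates the constraint.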
Notes:
Most of the time, the entity-relationship model for an application domain will not fit onto a
single page. Sure, you can use a bigger piece of paper, but this will only alleviate the
problem and not solve it. The consequence is that the entity-relationship model must be
split into pieces fitting on a single page.
Your first attempt should be to identify autonomous subareas of the application domain and
to separate their entity-relationship models. If you cannot find such subareas or their
entity-relationship models do not fit on a single page, try to identify different views with
which you can look at the application domain and establish the entity-relationship models
for them. A view comprises all objects of the entity-relationship model that a specific group
of people needs to know or that concerns them.
For our sample airline company called Come Aboard, a sample view would be the Pilot
View which comprises all entity types, relationship types, and constraints that pilots need to
know about or apply to them. Another view would be the Maintenance View including all
entity types, relationship types, and constraints concerning the aircraft maintenance. Both
views are illustrated on the subsequent visuals.
If the subareas or views are still too large, try to find smaller logical units that you can break
out and that will fit onto one page.
If none of the above helps, you just have to break out any pieces of the
entity-relationship model that will fit onto one page. Try to break them out in such a way that
the entity types and relationship types on a page are connected by as few relationship types
as possible to entity types or relationship types on other pages. Of course, you need to
illustrate the page-crossing relationship types on other pages. There, you need to repeat the
entity types or relationship types of this page that are their sources or targets.
Generally, the submodels on the various pages will overlap. Some parts of the entire
entity-relationship model will occur on multiple pages. The submodels must not conflict with
each other. Together, they must cover the entire application domain, i.e., cover all portions
of the entire, undivided, entity-relationship model for the application domain.
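The coverage and overlap of submodels can be sketched with sets of object names. The listing below is deliberately abbreviated and only names a few entity types from the two sample views. The variable and function names are assumptions for illustration.

```python
# Abbreviated, illustrative listing of the undivided model (entity types only).
full_model = {
    "PILOT", "PILOT ASSIGNMENT", "FLIGHT",
    "AIRCRAFT", "AIRCRAFT MODEL",
    "MECHANIC", "MAINTENANCE RECORD",
}

# One set of objects per page (submodel).
pages = {
    "Pilot View": {"PILOT", "PILOT ASSIGNMENT", "FLIGHT",
                   "AIRCRAFT", "AIRCRAFT MODEL"},
    "Maintenance View": {"MECHANIC", "MAINTENANCE RECORD",
                         "AIRCRAFT", "AIRCRAFT MODEL"},
}


def uncovered(model, submodels):
    """Objects of the undivided model that appear on no page."""
    return model - set().union(*submodels.values())


def shared(submodels):
    """Objects that appear on more than one page (the overlap)."""
    seen, repeated = set(), set()
    for objects in submodels.values():
        repeated |= seen & objects
        seen |= objects
    return repeated


print(sorted(uncovered(full_model, pages)))
print(sorted(shared(pages)))
```

An empty result from `uncovered` means that the pages together cover the listed objects. The overlap itself is expected and harmless as long as the repeated objects are defined identically on every page.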
[Figure: Pilot View diagram with entity types PILOT, PILOT ASSIGNMENT, FLIGHT, LEG, ITINERARY, AIRPORT, AIRCRAFT, AIRCRAFT MODEL, and AIRCRAFT TYPE, the relationship types connecting them, and constraint { 2 : Pilot Function must be unique for flight }]
Notes:
This visual illustrates the Pilot View for Come Aboard. It comprises all entity types,
relationship types, and constraints that pilots need to know or are concerned with.
Pilots want to know the flights they have been assigned to and their function on the flight.
Therefore, the view needs to include, besides entity type PILOT, entity types FLIGHT and
PILOT ASSIGNMENT and relationship types PILOT_assigned_to_FLIGHT and
PILOT_assigned_to_FLIGHT_by_PILOT ASSIGNMENT.
Furthermore, pilots want to know to which leg of the itinerary a flight belongs and all
information about the airports for the leg. Thus, the view must include entity types LEG,
ITINERARY, and AIRPORT and relationship types FLIGHT_for_LEG,
AIRPORT_nonstop_to_AIRPORT, AIRPORT_nonstop_to_AIRPORT_in_ITINERARY, and
AIRPORT_nonstop_to_AIRPORT_in_ITINERARY_as_LEG.
In addition, pilots need to know everything about the aircraft for the flight including its model
and type. Consequently, the view must comprise entity types AIRCRAFT, AIRCRAFT
MODEL, and AIRCRAFT TYPE and relationship types AIRCRAFT_for_FLIGHT,
AIRCRAFT MODEL_for_AIRCRAFT, and AIRCRAFT TYPE_for_AIRCRAFT MODEL.
The illustrated constraint is the only one concerning the entity types and relationship types
of this entity-relationship model view. Business constraints Pilots for Flight Must Have
License for Aircraft Model for Leg and Only Aircraft Model With Start and Landing Rights
for Legs concern relationship type AIRCRAFT MODEL_for_LEG which is not part of this
view. Pilots need not necessarily know the corresponding constraints. The constraints
rather deal with flight planning and pilot assignment, done by different groups of people,
and would have to appear in the appropriate views.
Notes:
The Maintenance View comprises all entity types, relationship types, and constraints
needed for the scheduling or performance of aircraft maintenance.
The scheduling concerns mechanics, aircraft, and aircraft models and must select
mechanics from those trained for the aircraft model for the aircraft to be serviced. Thus, the
maintenance view must include entity types MECHANIC, AIRCRAFT MODEL, and
AIRCRAFT and relationship types AIRCRAFT MODEL_for_AIRCRAFT,
MECHANIC_for_AIRCRAFT MODEL, and MECHANIC_scheduled_for_AIRCRAFT.
During maintenance, mechanics must write maintenance records or look at them.
Therefore, the view must include entity type MAINTENANCE RECORD and relationship
types MAINTENANCE RECORD_from_MECHANIC and
MAINTENANCE RECORD_belongs_to_MAINTENANCE RECORD.
Furthermore, a mechanic needs to know information about the aircraft, its model, and its
type. Consequently, the view must also include entity type AIRCRAFT TYPE and
relationship type AIRCRAFT TYPE_for_AIRCRAFT MODEL.
Notes:
These days, many companies want to establish an enterprise-wide entity-relationship
model. For bigger companies, an enterprise-wide entity-relationship model comprises
multiple application domains. The enterprise-wide entity-relationship model might be too
complex to build in its entirety right away, especially since you will rarely find a single
domain expert who fully understands all application domains involved.
As a consequence, it might be better to start with separate models for the various
application domains and then to consolidate them. It is necessary to consolidate the results
of every step of the design process for all application domains before continuing on to the
next step for any of the application domains. If you do not do this, the results most likely will
not fit together.
For the entity-relationship models, this means two things:
1. You should consolidate the problem statements of the various application domains
before developing the respective entity-relationship models.
2. You should consolidate the entity-relationship models before proceeding to the next step
of the design process for any of the application domains.
Notes:
During the consolidation of the entity-relationship models for the various application
domains, you may experience problems concerning the entity types, the relationship types,
or the constraints of the different models.
The different models may contain entity types that have the same names, but a different
meaning. Thus, the names must be changed to achieve uniqueness. Conversely, entity
types with different names may correspond to the same business object types and,
therefore, should be named the same.
Furthermore, because of the different perspectives of the individual application domains,
an entity type for one application domain may just be a set of attributes of another entity
type in another application domain. In this case, the set of attributes must become an entity
type in the other application domain as well and the necessary relationship types using this
new entity type as source or target must be established.
A further problem that may surface is that the entity keys for the same entity types in
different entity-relationship models may be different. This problem is easy to resolve. Since
the entity types are the same, they have the same attributes so that the same entity key
can be chosen for all entity-relationship models.
As for entity types, the different entity-relationship models may contain relationship types
with the same name, but with a different meaning. To remove the problem, the names must
be changed to achieve uniqueness. This applies to the names for both directions of the
relationship types. Conversely, differently named relationship types may have the same
meaning and, therefore, should be named the same.
The cardinalities of the same relationship types may be different in different
entity-relationship models. In this case, the true cardinalities must be determined and the
erroneous entity-relationship models changed accordingly.
Some of the properties for relationship types may differ. If there is a difference for the
controlling property, it must be determined if the deletion of the appropriate source or target
instances should indeed take place on an enterprise-wide scale. If so, the controlling
property must be added where it was omitted. If not, it must be dropped where specified.
If a relationship type is an owning relationship type in one entity-relationship model, but not
in another, it must be checked if the dependent entity type really fulfills the dependency
requirements and the proper corrections must be made in one of the models. Furthermore,
a class structure may have been recognized in one entity-relationship model, but not the
other. In this case, it must be introduced in the entity-relationship model where it is missing,
the supertype must appropriately be identified, and relationship types using the former
entity types as source or target must be verified.
In case of 1:1 relationship types, there are two choices for the relationship key. Thus,
different models may have chosen a different relationship key. Just choose the same
relationship key for all models concerned.
In some models, constraints may be missing that have been identified in other models. It
must be verified if the constraints have an enterprise-wide scope and the erroneous model
must be changed accordingly.
Furthermore, the different models may contain conflicting constraints. The conflicts must
be resolved by the domain experts and the models changed accordingly.
There may be other problems during the consolidation, but this list should already give you
a pretty good idea of what to look for.
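Some of the checks described above lend themselves to mechanical support. As a minimal sketch, the following compares the cardinalities recorded for equally named relationship types in two models. It assumes that the names have already been harmonized and that each model is available as a simple mapping. The mapping layout and the sample cardinalities are assumptions for illustration.

```python
# Relationship type name -> (cardinality for source, cardinality for target).
# The values are illustrative only.
model_a = {
    "PILOT_assigned_to_FLIGHT": ("m", "m"),
    "FLIGHT_for_LEG": ("m", "1..1"),
}
model_b = {
    "PILOT_assigned_to_FLIGHT": ("m", "m"),
    "FLIGHT_for_LEG": ("m", "m"),
    "AIRCRAFT_for_FLIGHT": ("1", "m"),
}


def cardinality_conflicts(a, b):
    """Relationship types present in both models with different cardinalities."""
    return {
        name: (a[name], b[name])
        for name in a.keys() & b.keys()
        if a[name] != b[name]
    }


print(cardinality_conflicts(model_a, model_b))
```

Such a comparison only reports where the models disagree. Which cardinalities are the true ones still has to be determined with the domain experts.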
Checkpoint
5. The business object types for the application domain are the
primary source for the entity types of an entity-relationship model.
(T/F)
11. Assume that you have two entity types SEAT and PASSENGER
and a relationship type PASSENGER_has_SEAT expressing
which seats have been assigned to passengers. A passenger may
have zero, one, or multiple seats assigned to him/her. A seat can
only be assigned to a single passenger.
Specify the cardinalities for the source and target of relationship
type PASSENGER_has_SEAT:
a. Cardinality for source: _______
b. Cardinality for target: _______
12. Describe the terms 1:1 relationship type, 1:m relationship type, and
m:m relationship type.
_____________________________________________________
_____________________________________________________
_____________________________________________________
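[Figure: entity-relationship model with relationship types r1 and r2; only entity types A and B and cardinalities 1 and m are recoverable]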
14. Based on the entity-relationship model for the previous checkpoint
question, assume that entity types A, B, and C have the following
instances:
_____________________________________________________
_____________________________________________________
Would you expect any relationship instances for relationship type
r2?
_____________________________________________________
_____________________________________________________
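[Figure: entity-relationship model with entity types A and B and relationship types r1 and r2; cardinalities m, m, 1, and 1..m appear in the diagram]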
List the defining attributes and the relationship keys for relationship
types r1 and r2. Use the term key of ... to describe them.
Defining attributes for r1: _______________________________
Relationship key for r1: _______________________________
Defining attributes for r2: _______________________________
Relationship key for r2: _______________________________
16. The entity key of a dependent entity type must be equal to the
entity key of another entity type or the relationship key of a
relationship type. (T/F)
17. Name the criteria that an entity type must fulfill to be a dependent
entity type.
_____________________________________________________
_____________________________________________________
_____________________________________________________
18. Assume that you have the following entity-relationship model:
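[Figure: entity-relationship model with entity types A and B and relationship type r1; the markers D and m appear in the diagram]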
19. How can you represent the nondefining attributes for a relationship
type in an entity-relationship model?
_____________________________________________________
_____________________________________________________
_____________________________________________________
20. If you specify the controlling property for the target of a relationship
type, a relationship instance is to be deleted when its target
instance is deleted. (T/F)
21. If you specify the controlling property for the target of a relationship
type, the target instance belonging to a relationship instance is to
be deleted when the relationship instance is deleted. (T/F)
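[Figure: entity-relationship model with entity types A, B, C, and D and relationship types r1, r2, r3, and r4; several sources or targets carry the controlling property (C)]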
Object   Instances
A        A1, A2, A3
B        B1, B2
C        C1, C2, C3
D        D1, D2, D3
r1       (C1, A2), (C2, A3)
r2       (A1, B1), (A2, B1), (A3, B2)
r3       (C1, D1), (C1, D2), (C2, D3)
r4       ((A1, B1), (C1, D2)), ((A1, B1), (C2, D3))
Which instances will the various entity types and relationship types
have after entity instance C2 of entity type C has been deleted?
_____________________________________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________
26. Match the following partial sentences with the proper bundle
cardinalities:
a. The subtype set is exclusive, but not covering if the bundle cardinality is ...
b. The subtype set is covering, but not exclusive if the bundle cardinality is ...
c. The subtype set is exclusive and covering if the bundle cardinality is ...
d. The subtype set is not covering and not exclusive if the bundle cardinality is ...
____ 1..1
____ 1..m
____ 0..1
____ 0..m
27. Which mechanism can you use to restrict the instances of entity
types or relationship types?
_____________________________________________________
_____________________________________________________
Unit Summary (1 of 3)
• The three major components of entity-relationship models are:
- Entity types, relationship types, constraints
• Entity types are conceptual units representing classes of objects
with the same meaning and characteristics
- Have attributes (conceptual pieces of information)
- Instances uniquely identified by entity key
- Primary source: business object types for application domain
• Relationship types are classes of interrelationships between
the instances of entity types and/or relationship types
- All interrelationships have the same meaning and characteristics
- All interrelationships interconnect two instances
- A relationship type has a primary and an inverse direction
- Primary source: business relationship types for application domain
• Cardinalities for relationship types determine:
- How many interrelationships a source instance can and must have
with a target instance
- How many interrelationships a target instance can and must have
with a source instance
Notes:
Unit Summary (2 of 3)
• Defining attributes completely describe instances of relationship type
• Relationship key uniquely identifies instances of relationship type
• Relationship type may have nondefining attributes
- Modeled by means of dependent entity types
• Dependent entity type is an entity type connected to a parent
entity type or a relationship type via an owning relationship type
- Each dependent instance connected to one and only one parent instance
- Key portion of dependent entity type = key of parent
- Only instances interconnected with matching key portion/key values
• Controlling property for relationship type specifies if source or
target instance to be deleted when relationship instance is deleted
- Cascading effect
• Class structures allow the categorization of entity instances
- Supertype = generalization of subtypes
- Subtypes = specializations of supertype
- Subtype set may be exclusive and/or covering
Notes:
Unit Summary (3 of 3)
Notes:
© Copyright IBM Corp. 2000, 2002 Unit 5. Data and Process Inventories 5-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook
Unit Objectives
After completion of this unit, you should be able to:
Notes:
Up to now, from the problem statement, the entity-relationship model for the application
domain has been developed. To develop the corresponding database, you must determine
the data that should be contained in the database before you can proceed. This means that
you must establish a list of all data for the application domain, that is, the data inventory.
In this unit, we will talk about the data inventory and the process inventory which is
interrelated with it. We will describe their purposes and explain the significance of the data
inventory for database design. You will find out whose responsibility it is to establish the
data and process inventories for the application domain.
In addition, you will learn what the content of data and process inventories should be from
the perspective of database design. The process inventory is primarily intended for
application programmers, but is important for database design: The descriptions of its
business processes reveal the data that should be contained in the database for the
application domain.
[Figure: overview of the design process; recoverable labels: Tuple Types, Tables, Logical Data Structures, Integrity Rules, Logical View, Storage View, Indexes]
Notes:
The preceding steps of the design process dealt with the problem statement and the
entity-relationship model for the application domain. To establish the database for the
application domain from the entity-relationship model, you need to know the various pieces
of data to be stored in the database. The data for the application domain are described in
the data inventory.
Data inventory and process inventory are developed in parallel during the conceptual view
of the design process. They are established after the entity-relationship model because the
entity-relationship model can be used in their development and is verified as part of their
development.
In principle, the data inventory can be developed without the process inventory. However,
the best method for developing the data inventory is to couple its development and the
development of the process inventory. The process inventory contains a description of all
business processes for the application domain. The description for a business process lists
the data used by the business process. Hence, the process inventory reveals the data
elements that should be contained in the database for the application domain and, thus,
should be described in the data inventory.
By coupling the data and process inventories, you can ensure that all data needed by
documented business processes of the application domain are contained in the data
inventory and only these data. Consequently, the database will contain precisely the
required data. Furthermore, you can ensure that the data inventory is updated as new
business processes are planned and recorded in the process inventory.
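The coupling of the two inventories can be sketched as a comparison of two sets: the data elements listed by the business processes and the data elements described in the data inventory. The process names, the data element lists, and the function name are hypothetical.

```python
# Business process -> data elements it uses (illustrative entries only).
process_inventory = {
    "Assign Pilot to Flight": {"Flight Number", "Employee Number", "Pilot Function"},
    "Record Departure": {"Flight Number", "Departure Time"},
}

# Data elements currently described in the data inventory (illustrative).
data_inventory = {"Flight Number", "Employee Number", "Departure Time", "Seat Number"}


def compare_inventories(processes, data_elements):
    """Return (elements used but not described, elements described but unused)."""
    used = set().union(*processes.values())
    return used - data_elements, data_elements - used


missing, unused = compare_inventories(process_inventory, data_inventory)
print(sorted(missing))
print(sorted(unused))
```

Elements in the first set must be added to the data inventory. Elements in the second set are candidates for removal unless a business process using them has simply not been documented yet.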
Notes:
The data inventory contains a detailed description of all data for the application domain: It
describes all abstract data types, data elements, and data groups for the application
domain.
Data elements are indivisible pieces of data. They cannot be divided into smaller pieces
meaningful for the application domain.
In contrast, data groups are sets of logically related (for the application domain) data
elements and/or data groups. This is a recursive definition and implies that data groups can
contain data groups. The data elements or data groups of a data group are referred to as
components of the (owning) data group.
A data group can be viewed as a tree structure of one or more levels whose lowest level
nodes (terminal nodes) are data elements. An example of a data group may be Name of
Person, i.e., the name of a person, consisting of data elements Last Name, First Name,
and Middle Initial.
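The recursive definition of data groups can be sketched as a small tree structure whose terminal nodes are data elements. The class and function names are assumptions. The data group Pilot Contact and the data element Phone Number are made up to show a second level. Only Name of Person and its three data elements come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class DataElement:
    name: str


@dataclass
class DataGroup:
    name: str
    # Components may be data elements or, recursively, data groups.
    components: List[Union["DataElement", "DataGroup"]] = field(default_factory=list)


def terminal_elements(node):
    """Return the names of all data elements at the leaves of the tree."""
    if isinstance(node, DataElement):
        return [node.name]
    names = []
    for component in node.components:
        names.extend(terminal_elements(component))
    return names


name_of_person = DataGroup("Name of Person", [
    DataElement("Last Name"),
    DataElement("First Name"),
    DataElement("Middle Initial"),
])

# Hypothetical two-level data group containing another data group.
pilot_contact = DataGroup("Pilot Contact", [name_of_person, DataElement("Phone Number")])

print(terminal_elements(pilot_contact))
```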
The correct identification of data groups is important for the later steps of the design
process since they identify items that logically belong together. In particular, they enable
the recognition of repetitive groups and of groups of data that can be separated out
(vertical splitting).
Since the data inventory is part of the conceptual view, it should be purely
application-domain oriented. It should not contain data elements or data groups caused by
implementation and not having a direct meaning for the application domain.
The entity-relationship model is input for the development of the data inventory. It helps
identify data elements and data groups. Data elements and data groups can be viewed as
abstractions or generalizations of elementary attributes and composite attributes,
respectively. They define elementary or composite data for the application domain
independent of their usage by entity types. Therefore, the definition of the data elements
and data groups should be independent of the entity types of the entity-relationship model.
In this context, the question arises whether you should have two different data elements or data
groups for data with the same fundamental meaning, but a (slightly) different usage. For
example, for our sample airline company, we want to store the planned departure time and
the actual departure time for a flight. Should you have different data elements Planned
Departure Time and Actual Departure Time or just a single data element Departure Time?
Planned Departure Time and Actual Departure Time are certainly different attributes for
entity type FLIGHT.
The answer is that both solutions are feasible. If you choose a single data element, data
element Departure Time is used in two different roles (purposes) by entity type FLIGHT: It
is used as planned departure time (attribute name Planned Departure Time) and as actual
departure time (attribute name Actual Departure Time). If you choose two different data
elements, they are used by entity type FLIGHT in their fundamental meanings. In this case,
the attribute names can be the names of the data elements.
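The two solutions can be put side by side in a small sketch. The record layout (attribute name, data element, role) and the role labels are assumptions chosen for illustration.

```python
from typing import NamedTuple, Optional


class Attribute(NamedTuple):
    attribute_name: str
    data_element: str
    role: Optional[str] = None  # None: data element used in its fundamental meaning


# Solution 1: a single data element used in two roles by entity type FLIGHT.
flight_with_roles = [
    Attribute("Planned Departure Time", "Departure Time", role="planned"),
    Attribute("Actual Departure Time", "Departure Time", role="actual"),
]

# Solution 2: two data elements; the attribute names are the data element names.
flight_without_roles = [
    Attribute("Planned Departure Time", "Planned Departure Time"),
    Attribute("Actual Departure Time", "Actual Departure Time"),
]


def data_elements_used(attributes):
    """Return the distinct data elements that the data inventory must describe."""
    return sorted({a.data_element for a in attributes})


print(data_elements_used(flight_with_roles))
print(data_elements_used(flight_without_roles))
```

Entity type FLIGHT ends up with the same two attributes either way. What changes is how many data elements have to be described and whether roles have to be documented in addition.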
It is a matter of taste and judgement where you make the assignment of roles: on the data
element level, the data group level, or the entity type level. If you make the differentiation
on the data element level, you must make the description of the data elements more
restrictive, but need not deal with roles. If you differentiate on the data group or entity type
level, you can keep the description of the data elements more general, but have to deal
with roles. Although the roles must be described as well, the definition work may be
somewhat less in the latter case.
However, you should make a sensible trade-off. In the extreme, you could decide on having
a single data element for all data with the same data type and use roles for all usages. For
example, you could have a single data element Time representing a time and define all
kinds of roles for it: as planned time of departure, as actual time of departure, as planned
time of arrival, as actual time of arrival, and so on. By now, you should know that this is
certainly not the way to go. Having data elements Departure Time and Arrival Time would
probably be adequate in this case.
As described above, data elements or data groups may be used as components by other
data groups. They may also be used as attributes by entity types or tuple types (as we will
see later on).
Data elements and data groups as such do not have cardinalities. However, when used as
component or attribute, a cardinality is associated with a data element or data group. The
cardinality specifies how many values the data element/data group must assume at least
and at most and will assume on average for this usage. Note that two sets, having the
same data elements and data groups, are considered different data groups if the
cardinalities of the components are different.
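The effect of cardinalities on the identity of a data group can be sketched as follows. The class names and the sample cardinalities (including the average of 0.6 for Middle Initial) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Usage:
    component: str           # name of the data element or data group used
    minimum: int             # values it must assume at least
    maximum: Optional[int]   # values it may assume at most; None means unlimited
    average: float           # values it will assume on average


@dataclass(frozen=True)
class GroupDefinition:
    components: Tuple[Usage, ...]


# Same components, but Middle Initial is optional in one set and mandatory in the other.
optional_initial = GroupDefinition((
    Usage("Last Name", 1, 1, 1.0),
    Usage("First Name", 1, 1, 1.0),
    Usage("Middle Initial", 0, 1, 0.6),
))
mandatory_initial = GroupDefinition((
    Usage("Last Name", 1, 1, 1.0),
    Usage("First Name", 1, 1, 1.0),
    Usage("Middle Initial", 1, 1, 1.0),
))

# Prints False: the cardinalities differ, so these are different data groups.
print(optional_initial == mandatory_initial)
```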
A data element or data group may be used by many data groups and entity types. A data
element or data group may even be used multiple times (for different purposes) by the
same data group or entity type.
When a data group is used by an entity type, it becomes a composite attribute of the entity
type. This means that the entity type contains elementary attributes for all data elements of
the tree structure for the data group.
The data inventory must be created by someone with detailed knowledge of the application
domain. Thus, the application domain expert must be involved in the creation of the data
inventory. However, he/she needs the help of the database designer. The database
designer knows best what is needed for database design. He/She knows the entity types of
the entity-relationship model that may contain the data elements or data groups and has a
better understanding of data types. The data inventory identifies the entity types using the
data elements or data groups and describes the abstract data types the data elements are
based upon.
According to the above, the application domain expert and the database designer must
jointly develop the data inventory.
The data inventory is input for the database designer who needs it during the later steps of
the design process. The data inventory also helps application programmers when
designing the application programs or queries for the business processes of the application
domain.
Contents of Data Inventory (1 of 3)
Abstract Data Types
Signature: A unique name for the data type followed, in parentheses and
separated by commas, by the parameters for the data type
Values: A description of the values that data belonging to the data
type can assume
For a finite number of values, a list of the possible values
Can be values of another data type or a subset thereof
Can be defined by a formula
Can be a textual description of the values
Notes:
The first component of the data inventory is the description of the abstract data types for
the application domain.
Data types describe the values data of that data type can assume and the operations that
can be performed with the data. Thus, by associating a data element with a data type, you
define the fundamental values and the operations for the data element.
In support of the SQL standard, all relational database management systems provide a set
of standard data types such as INTEGER, DECIMAL, CHARACTER, DATE, or TIME.
These standard data types are general-purpose data types covering many situations, but
are imprecise.
Abstract data types go beyond standard data types. You tailor them for your application
domain so they reflect the values the associated data elements can assume and the
operations that can be performed with the data elements.
As an example, take the employee numbers for pilots and mechanics of our sample airline
company called Come Aboard. They consist of digits, and you would be tempted to
assign them to the standard data type INTEGER. However, they are not really integers:
You should not perform the usual integer operations (such as integer addition and
subtraction) with them. Furthermore, they cannot be negative and leading zeros have a
meaning and should not be suppressed. You might suggest defining them as
CHARACTER data. However, this would result in a different problem: Employee numbers
could then contain letters, which is not correct either. The solution is an abstract data type
reflecting that employee numbers consist of digits and cannot be added or subtracted.
By implementing abstract data types, you can ensure that the values of the data of your
database are always "syntactically" correct. You can also prevent undesired or illegal
operations for them.
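As a minimal sketch of the idea, an abstract data type for employee numbers could be modeled as a class that accepts only digits, keeps leading zeros, and offers comparison but no arithmetic. The class name and its behavior are assumptions for illustration and not a definition taken from the course.

```python
class EmployeeNumber:
    """Hypothetical abstract data type: a non-empty string of decimal digits."""

    def __init__(self, digits: str):
        # Reject letters, signs, blanks, and the empty string.
        if not digits or any(ch not in "0123456789" for ch in digits):
            raise ValueError(f"not a valid employee number: {digits!r}")
        self._digits = digits  # kept as text so that leading zeros survive

    def __eq__(self, other):
        if not isinstance(other, EmployeeNumber):
            return NotImplemented
        return self._digits == other._digits

    def __hash__(self):
        return hash(self._digits)

    def __str__(self):
        return self._digits


number = EmployeeNumber("00042")
print(number)                            # leading zeros are preserved
print(number == EmployeeNumber("42"))    # a different employee number
```

Because the class defines neither addition nor subtraction, an expression such as `number + 1` raises a `TypeError`, and a value such as `"4A2"` or `"-42"` is rejected when the object is created.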
The same data type can be used by many data elements. Sometimes, different data
elements have only slightly different requirements on their data types raising the question:
can the same data type be used? For example, for two character-string type data elements,
only the allowable maximum length may be different.
To enable the common usage for slight differences, data types can be parameterized. For
each data element, you can specify different parameter values. For the character-string
example, the parameters could be the minimum length and the maximum length for the
respective character-strings.
Data types can only be associated with data elements. They cannot be associated with
data groups since these just have a grouping function. Data groups may be composed of
many data elements all having different data types.
The data inventory should contain a description of all abstract data types for your
application domain. The descriptions of the data elements will refer to the data types.
Preferably, you should only use your own abstract data types. However, realistically, most
data inventories will also use the standard data types of the SQL standard. Include a list of
the standard data types used by your data elements.
For each abstract data type, you should describe the following:
• The Signature of the Abstract Data Type
The signature consists of a unique name for the data type followed, in parentheses and
separated by commas, by the parameters for the data type.
• The Values for the Abstract Data Type
The description of the values depends on the abstract data type:
- If the abstract data type can only assume a finite number of values, the values can
be listed.
- The values of the abstract data type may be the values of another abstract data
type, of a standard data type, or of a subset thereof. In this case, specify the
appropriate subset.
- The values of the abstract data type may be defined by a formula. This is especially
true for integer data which must satisfy a specific formula.
Equal Comparison
EQUAL(text-data-1, text-data-2) → { TRUE | FALSE }
Normalizes text-data-1 and text-data-2 and compares them character by
character
Result is TRUE if all characters are equal; FALSE otherwise
Figure 5-5. Sample Data Types (1 of 4) CF182.0
Notes:
The abstract data type described on the visual deals with text data, i.e., descriptive text
such as remarks added to the maintenance records for Come Aboard. The values consist
of arbitrary strings of printable characters. The strings must have at least as many
characters as specified by parameter minimum-length of the signature and not more than
specified by parameter maximum-length.
Both parameters of the signature are optional. If minimum-length is not specified, a default
minimum length of 1 is assumed. If maximum-length is not specified, the length of the
string is not limited.
Two operations are allowed for text data:
• Operation
NORM(text-data-1) → text-data-2
normalizes text-data-1. This means it removes leading and trailing blanks and replaces
intermediate groups of blanks by a single blank each. The result is again a text data
string, text-data-2.
For example, " This is text ..." becomes "This is text ..." when normalized. (Note that
the surrounding double quotes do not belong to the text. They are used here only to
delimit the text.)
• The second operation is the equal comparison for text data:
EQUAL(text-data-1, text-data-2) → {TRUE | FALSE}
Text-data-1 and text-data-2 are input for the operation. The operation normalizes both
strings and compares the normalized text data character by character. If the normalized
strings are character-wise identical, they are considered equal and the result is TRUE. If
they are character-wise different, the result is FALSE. In particular, the strings are not
considered equal if the corresponding normalized strings have a different length.
Accordingly, the strings " Equal strings" and "Equal strings " are considered equal.
In the database, you want text data to be stored in the normalized form. Normalization of
text data is important when searching for a specific string based on user input. User input
may not always be normalized.
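The two operations for text data can be sketched in Python as follows. The function names norm_text, equal_text, and is_text_data are hypothetical, and applying the length parameters to the normalized value is an assumption of this sketch, not something the course text prescribes.

```python
from typing import Optional


def norm_text(text: str) -> str:
    """NORM: remove leading/trailing blanks, collapse groups of blanks."""
    # split(" ") yields empty strings for consecutive blanks; drop them.
    return " ".join(part for part in text.split(" ") if part)


def equal_text(a: str, b: str) -> bool:
    """EQUAL: compare the normalized strings character by character."""
    return norm_text(a) == norm_text(b)


def is_text_data(text: str, minimum_length: int = 1,
                 maximum_length: Optional[int] = None) -> bool:
    """Check the normalized value against the length parameters.

    Defaults follow the notes: minimum length 1, no maximum length.
    """
    value = norm_text(text)
    if not value.isprintable():
        return False
    if len(value) < minimum_length:
        return False
    return maximum_length is None or len(value) <= maximum_length
```

With these definitions, equal_text(" Equal strings", "Equal strings ") is true, because both arguments normalize to the same string.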
Notes:
The abstract data type described on the visual deals with name data such as the last name
or first name of a person. Names do not allow arbitrary characters. They allow letters,
blanks, and single dashes (-) or periods (.). Thus, the values for abstract data type Name
Data consist of strings of these characters of the specified minimum and maximum lengths.
For name data, you have the same operations as for text data. However, the normalization
of name data uppercases all letters. For example, "Miller", "MiLLer", and "miller" all become
"MILLER" when normalized. (Note that the surrounding double-quotes do not belong to the
text.)
As for text data, the equal comparison compares the normalized strings. Thus, "Miller",
"MiLLer", and "miller" are all considered equal.
In the database, you want name data to be stored in the normalized form. Normalization of
name data is important when searching for a specific name based on user input. User input
may not always be normalized.
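A minimal Python sketch of name data follows. The function names are hypothetical. Two points are assumptions of this sketch: letters are restricted to A through Z, and "single dashes or periods" is read as "no two dashes and no two periods in a row".

```python
import string

# Allowed characters after normalization (assumption: ASCII letters only).
_ALLOWED = set(string.ascii_uppercase + " -.")


def norm_name(name: str) -> str:
    """Normalize like text data, then uppercase all letters."""
    collapsed = " ".join(part for part in name.split(" ") if part)
    return collapsed.upper()


def equal_name(a: str, b: str) -> bool:
    """Equal comparison on the normalized names."""
    return norm_name(a) == norm_name(b)


def is_name_data(name: str, minimum_length: int, maximum_length: int) -> bool:
    """Check the normalized name against character and length rules."""
    value = norm_name(name)
    if not minimum_length <= len(value) <= maximum_length:
        return False
    if "--" in value or ".." in value:  # only single dashes or periods
        return False
    return all(ch in _ALLOWED for ch in value)
```

Storing norm_name(user_input) rather than the raw input means that a later search for "miller" finds the row stored as "MILLER".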
Sample Data Types (3 of 4)
Alphanumeric String
Equal Comparison
Notes:
The abstract data type on this visual deals with alphanumeric strings. They consist of
letters and digits. Examples are the aircraft serial numbers for Come Aboard. Again, the
abstract data type is parameterized: the minimum and maximum lengths for the
alphanumeric string can be specified.
For this abstract data type, lowercase letters and uppercase letters are considered different
since a normalization operation has not been provided. Leading or trailing blanks are not
allowed. They are considered as improper input.
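As a sketch only, the alphanumeric string type could be checked in Python as shown below. The function names are hypothetical, and restricting the characters to ASCII letters and digits is an assumption. Since no normalization is provided, the comparison is a plain character-wise comparison.

```python
def is_alphanumeric(value: str, minimum_length: int, maximum_length: int) -> bool:
    """True if value consists of letters and digits within the length bounds."""
    if not minimum_length <= len(value) <= maximum_length:
        return False
    # isalnum() is False for blanks, so leading or trailing blanks are rejected.
    return value.isascii() and value.isalnum()


def equal_alphanumeric(a: str, b: str) -> bool:
    """Equal comparison without normalization: case matters."""
    return a == b
```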
Airport Code
Operations:
Equal Comparison
Notes:
The values of the abstract data type on this visual are the international codes for the
airports serviced by Come Aboard. Thus, they form a finite set of values that can be listed. As
indicated by the ellipsis (three dots) at the end, only a few values are shown on the visual.
The abstract data type is not parameterized. It only supports the equal comparison.
Contents of Data Inventory (2 of 3)
Data Elements and Data Groups
For each data element/data group:
Name: A unique identifier for the data element or data group
- One or more words
- Each word starting with a capital letter except for connecting words such as of or for
- As natural as possible for the application domain
Type: Data element or data group
Description: A detailed textual description of the meaning of the data element or data
group for the application domain
- As precise as possible to avoid synonyms and homonyms
Data Type: If data element, data type for the data element
Lengths: If data element: Minimum Length, Maximum Length, Average Length, Number of
Digits, Decimal Places
Domain: If data element, value constraints for the data element over and above those
imposed by the data type
Figure 5-9. Contents of Data Inventory (2 of 3) CF182.0
Notes:
The second component of the data inventory is the inventory of the data elements and data
groups for the application domain. For the application domain, data elements are indivisible
pieces of information. Data groups are groups of logically related data elements or data
groups as explained before.
For each data element or data group, you should provide the following basic items:
Name
The unique name for the data element or data group. Each data element and data group
receives a unique name. The name should clearly express the meaning of the data
element or data group for the application domain. It may consist of multiple words. We will
start each word with a capital letter except for connecting words such as of or for.
The names are used by the business processes, described in the process inventory, to
refer to the data elements and data groups they need.
Type
The type of the object described, i.e., if the object is a data element or a data group. Thus,
the proper values are data element and data group, respectively.
Description
A detailed textual description of the meaning of the data element or data group for the
application domain. The description should be as precise as possible to avoid synonyms
and homonyms.
Synonyms are data elements or data groups that have different names but denote the
same object. Synonymous data elements or data groups can lead to the same information
being stored multiple times, i.e., to the redundant storage of information.
Homonyms are data elements or data groups whose names and descriptions are
ambiguous, so that they can be interpreted to mean different things. Their names
and descriptions should be made unambiguous. If necessary, they must be split into
multiple data elements or data groups.
Data Type
For data elements, the data type (standard data type or abstract data type) of the data
element including the applicable values for parameters of the data type.
For data groups, this item is not applicable. It should either be marked as not applicable or
be omitted.
Lengths
For data elements, the lengths applicable for them. For string-type data elements,
important lengths are: their minimum length, their maximum length, and their average
length. The average length is important for estimates made by the database administrator
when allocating space for tables.
For numbers, important values are: their number of digits and, if applicable, the number of
decimal places.
If the data type for the data element is parameterized, some of these values may already
have been specified as parameters for the data type.
For integers, you may prefer to specify a domain, that is, the range of values that can be
assumed. The number of digits can then be derived from the specified range.
For data groups, this item is not applicable. It should either be marked as not applicable or
be omitted.
Domain
For data elements, value constraints over and above those implied by the data type for the
data element.
If you use abstract data types extensively and correctly, you probably will not need
additional value constraints.
If you use standard data types, such as INTEGER, you might want additional value
constraints such as the minimum and maximum values the data element can assume.
If multiple data elements basically have the same data type and only differ marginally in
their allowable values, you might decide to use a single abstract data type covering the
values of all data elements concerned and define value constraints for the various data
elements.
For data groups, this item is not applicable. It should either be marked as not applicable or
be omitted.
Entity Types: For each entity type the data element or data group belongs to and for
each role it plays for the entity type: Entity Type Name, Role Description, Role Name,
and Cardinality (Min, Max, Avg)
Entity-Relationship Model: Completeness check for the entity-relationship model
Figure 5-10. Contents of Data Inventory (3 of 3) CF182.0
Notes:
In addition to the basic items on the previous visual, you should provide the following items
for a data element or data group:
Data Groups
The data element or data group being described may be a component of other data
groups. It may belong to a data group more than once, however, in different roles. For each
data group the data element or data group belongs to and each role played, provide the
following:
• The name of the owning data group.
• The role played in the other data group. Provide a textual description of the role and the
name the data element or data group assumes in that role. Description and name need
only be provided if they are different from the fundamental purpose and name of the data
element or data group.
• The cardinality of the data element or data group for the role in the data group. Provide
minimum, maximum, and average cardinality. That is, specify how many values the
data element or data group must assume at least and at most and will assume on
average for this usage. If the maximum cardinality is not limited, use an asterisk (*).
Entity Types
Most of the time, data elements or data groups are immediately used by entity types and
not indirectly through data groups. If the data element or data group is a direct attribute of
an entity type, provide the following:
• The name of the entity type. The entity-relationship model helps you determine the
appropriate entity types.
• The role played for the entity type. Provide a textual description of the role and the name
the data element or data group assumes in that role. Description and name need only be
provided if they are different from the fundamental purpose and name for the data
element or data group.
• The cardinality of the data element or data group for the role it plays in the entity type.
Provide minimum, maximum, and average cardinality. This means, specify how many
values the data element or data group must assume at least and at most and will
assume on average for this usage. If the maximum cardinality is not limited, use an
asterisk (*).
Do not provide an entry for this item if the data element or data group is not immediately
used by entity types.
By determining the entity types for data elements and data groups, you verify the
completeness of the entity-relationship model. If you find a data element or data group that
cannot be associated with another data group or an entity type, the entity-relationship
model is incomplete and must be corrected.
Relationship types are not of interest in this context because their defining attributes are
derivatives of the keys of their source and target.
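The items of an inventory entry and the completeness check can be sketched in Python as follows. The class and function names (Usage, InventoryEntry, unassigned) are hypothetical, the defaults of 1 for the cardinalities are chosen only for brevity, and the entry "Orphan Item" is invented to show what the check reports. The other two entries use values from the Come Aboard samples.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Usage:
    """One use of an entry by a data group or an entity type."""
    owner: str                       # name of the data group or entity type
    minimum: int = 1
    maximum: Optional[int] = 1       # None stands for the asterisk (*)
    average: float = 1.0
    role: Optional[str] = None       # given only if it differs from the default
    role_name: Optional[str] = None


@dataclass
class InventoryEntry:
    name: str
    kind: str                        # "data element" or "data group"
    description: str
    data_type: Optional[str] = None  # not applicable for data groups
    data_groups: List[Usage] = field(default_factory=list)
    entity_types: List[Usage] = field(default_factory=list)


def unassigned(entries):
    """Names of entries used by neither a data group nor an entity type."""
    return [e.name for e in entries
            if not e.data_groups and not e.entity_types]


inventory = [
    InventoryEntry("Last Name", "data element", "Last name of a person",
                   data_type="NAMEDATA(1, 60)",
                   data_groups=[Usage("Name of Person")]),
    InventoryEntry("Name of Person", "data group", "Full name of a person",
                   entity_types=[Usage("EMPLOYEE")]),
    # Hypothetical entry with no owner, to show what the check reports.
    InventoryEntry("Orphan Item", "data element", "Not yet assigned"),
]
print(unassigned(inventory))  # prints ['Orphan Item']
```

A non-empty result signals that the entity-relationship model is incomplete with respect to the data inventory.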
Last Name
Notes:
The above visual illustrates a data element for our sample airline company. The data
element is a component of a data group. It is not used as direct attribute of an entity type.
The data element is called Last Name and represents the last name of a person (for
example, a pilot or mechanic). The data type for the data element is Name Data defined
before as an abstract data type. The signature NAMEDATA(1, 60) specifies that a last
name must consist of at least one character and must not have more than 60 characters.
The abstract data type is described on page 5-14.
The lengths relevant for last names are the minimum length, the maximum length, and the
average length. Minimum length and maximum length must be the same as for the
signature for the data type.
There are no value restrictions over and above those for name data.
The data element is a component of data group Name of Person described on the next
visual. For each instance of the data group, it may assume one and only one value.
Therefore, Minimum, Maximum, and Average all have the value 1. Role and Role Name
have not been provided. The data element plays its fundamental role and is used under its
defined name (Last Name) in the data group. Repeating the name and description of the
data element is therefore unnecessary.
The data element is not used as a direct attribute by any entity type.
Name of Person
Notes:
This visual describes data group Name of Person for data element Last Name.
The data group represents the full name for a person consisting of the last name, first
name, and middle initial for the person. The data inventory must contain descriptions for
the appropriate data elements. We have seen the description for data element Last Name.
Items Data Type, Lengths, and Domain do not apply to data groups.
Data group Name of Person is not itself a component of another data group. It is used as a
direct (composite) attribute by entity type EMPLOYEE. For each entity instance, it assumes
one and only one value (minimum cardinality = maximum cardinality = average cardinality
= 1).
Sample Data Elements and Groups (3 of 7)
Aircraft Number
Notes:
Data element Aircraft Number, the universal aircraft serial number for aircraft, is an
alphanumeric string of 10 characters (data type ALPHANUMERIC(10, 10)). Therefore,
Minimum Length, Maximum Length, and Average Length have the same value 10.
The data element is not a component of any data group. It is used as a direct attribute
by entity types AIRCRAFT and MAINTENANCE RECORD.
For entity type AIRCRAFT, it is the unique identifier for the various aircraft that Come
Aboard owns. Since it plays a single role for the entity type, its fundamental role, the data
element need not be named differently. As unique identifier, the data element assumes one
and only one value for every instance of entity type AIRCRAFT.
For entity type MAINTENANCE RECORD, the data element represents the aircraft serial
number of the aircraft for the maintenance record. Also in this case, there is no need to
rename the data element since it is used in a single role by the entity type. Its original name
clearly expresses the purpose it is used for. Since every maintenance record contains one
and only one aircraft number, minimum, maximum, and average cardinality are all 1.
Engine Number
Notes:
This visual illustrates another data element for Come Aboard, the serial number for aircraft
engines.
Data element Engine Number uses the same abstract data type as data element Aircraft
Number, however, with different parameter values. Whereas aircraft serial numbers were
10 characters long, engine serial numbers may consist of 8 to 12 alphanumeric characters.
Using the same data type is perfectly all right as long as you want to allow the various
data elements to be compared with each other. If you do not want aircraft serial numbers
to be compared with engine serial numbers, you should define two different abstract data
types.
Engine Number is a component of data group Engine. It is not used by other data groups or
directly by entity types. For each instance of data group Engine, Engine Number assumes
one and only one value.
Sample Data Elements and Groups (5 of 7)
Engine
Name Engine
Type Data Group
Description An engine for an aircraft
Data Type N/A
Lengths N/A
Domain N/A
Data Groups
Entity Types AIRCRAFT
Role: -
Role Name: -
Cardinality: Minimum = 0, Maximum = 4, Average = 2
Notes:
Data Group Engine is the data group for data element Engine Number. It has additional
components such as the type of the engine and information about the manufacturer for the
engine. Engine is a repetitive group for entity type AIRCRAFT. This is because an aircraft
may have multiple engines mounted. Consequently, for each instance of entity type
AIRCRAFT, Engine may assume multiple values each of which is composed of appropriate
values for the components of Engine.
The minimum cardinality of 0 signals that aircraft need not have engines mounted. The
maximum cardinality of 4 specifies that an aircraft cannot have more than four engines
mounted. The average cardinality of 2 indicates that, on the average, an aircraft has two
engines mounted.
Manufacturer
Name Manufacturer
Type Data Group
Description All information concerning a manufacturer (e.g., manufacturer code, company
name, complete address, and phone number)
Data Type N/A
Lengths N/A
Domain N/A
Data Groups Engine
Role: -
Role Name: -
Cardinality: Minimum = 1, Maximum = 1, Average = 1
Entity Types AIRCRAFT TYPE
Role: -
Role Name: -
Cardinality: Minimum = 1, Maximum = 1, Average = 1
Notes:
Manufacturer is a data group consisting of all information pertaining to a manufacturer. In
particular, it includes:
• the manufacturer code (a unique identification for the manufacturer)
• the name of the manufacturer's company
• the address of the manufacturer
• the phone number of the manufacturer
The address of the manufacturer is again a data group.
Data group Manufacturer is a component of data group Engine of entity type AIRCRAFT. It
also is a direct attribute of entity type AIRCRAFT TYPE.
This example illustrates a hierarchy of data groups:
Address → Manufacturer → Engine
Address is a data group representing an address. Assuming that Address consists of the
data elements Street Address, Post Office Box, City, State, Country, and Postal Code, the
tree structure for Engine looks as follows:
Engine
   Engine Number
   Engine Type
   Manufacturer
      Manufacturer Code
      Company Name
      Address
         Street
         Post Office Box
         City
         State
         Country
         Postal Code
      Phone Number
Engine, Manufacturer, and Address are data groups; all other items are data elements.
Indentation indicates the next level of the tree structure. Items with the same indentation
are on the same level of the tree structure.
Since data group Engine is used as a composite attribute by entity type AIRCRAFT, the
data elements at the terminal nodes become elementary attributes of entity type
AIRCRAFT.
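The step from the tree structure to the elementary attributes can be sketched in Python as follows. The representation of a data group as a (name, components) tuple and the function name elementary_attributes are assumptions of this sketch. The component names are those of the Engine tree.

```python
# A data element is a string; a data group is a (name, components) tuple.
ENGINE = ("Engine", [
    "Engine Number",
    "Engine Type",
    ("Manufacturer", [
        "Manufacturer Code",
        "Company Name",
        ("Address", [
            "Street", "Post Office Box", "City",
            "State", "Country", "Postal Code",
        ]),
        "Phone Number",
    ]),
])


def elementary_attributes(node):
    """Return the data elements at the terminal nodes, in tree order."""
    if isinstance(node, str):
        return [node]            # a data element is a terminal node
    _name, components = node     # a data group contributes its components
    leaves = []
    for component in components:
        leaves.extend(elementary_attributes(component))
    return leaves


print(len(elementary_attributes(ENGINE)))  # prints 11
```

The 11 names returned are the elementary attributes that entity type AIRCRAFT receives through its composite attribute Engine.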
Number of Engines
Notes:
In the illustrated example, data element Number of Engines is associated with standard
data type INTEGER which is not parameterized. This means that a value range cannot be
specified for the data type. However, the minimum value that Number of Engines can
assume is 0. The maximum value is 4. To indicate this, you can use a domain specification
as done in the example. The domain specification must be implemented by database
functions such as check constraints, if available, or by checking user input.
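Where the domain has to be enforced by checking user input, the check could look like the following Python sketch. The function names are hypothetical; the bounds 0 and 4 are those of Number of Engines.

```python
def in_domain(value: int, minimum: int, maximum: int) -> bool:
    """True if value lies in the closed range minimum..maximum."""
    return minimum <= value <= maximum


def checked_number_of_engines(value: int) -> int:
    """Reject input outside the domain 0..4 of Number of Engines."""
    if not in_domain(value, 0, 4):
        raise ValueError(f"Number of Engines must be 0..4, got {value}")
    return value
```

If the database supports check constraints, the same range condition would be stated there instead, so that it holds for every program that writes the column.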
Methods for Establishing a Data Inventory
Notes:
There are many ways to establish a data inventory. However, three methods are normally
considered:
• You can survey the departments of expertise and ask their members for the data
elements and data groups needed by the application domain.
• If available, you can review existing data (in files or databases) and programs to
determine the data elements and data groups needed for the application domain.
• You can develop the data inventory in parallel with the process inventory which
describes the business processes for the application domain. As a business process is
described, the data elements and data groups it uses become apparent.
You can use one of these methods or a combination thereof. We will discuss the
advantages and the disadvantages of these methods in the following.
Notes:
This method suggests that the application domain expert asks the members of the
departments of expertise for the data needed by their tasks.
The quality of the result depends on several communicative factors:
• It depends on the ability of the domain expert to extract the proper information from the
members of the departments of expertise. From the discussions in the unit so far, he/she
should know which information is needed. However, the answers received are frequently
not very well structured. They must be scrutinized and filtered to reveal the actual facts.
• It depends on the ability of the interviewee to tell the application domain expert precisely
what is needed. The members of the departments of expertise are not computer
experts. Frequently, they do not have a feeling for the information needed.
• It depends on the willingness of the members of the departments of expertise to
cooperate with the domain expert. Since they do not see a direct benefit for them, they
may find it tiresome and annoying to be involved in the interviews. Their willingness
largely depends on the pressure they are under as a consequence of their actual work.
Even if the above-mentioned problems do not occur, there are some other pitfalls with this
technique. The approach is fairly unstructured and, during a discussion, it is very easy to
forget something as you probably know from your own experience.
Conversely, during a discussion, things may surface that are on the interviewee's mind
or are imagined rather than factual. This may lead to superfluous data elements
and data groups in the data inventory and, thus, unnecessary fields in the database being
designed.
A further disadvantage of this method is that it is a one-time effort. Consequently, later
changes (e.g., new data elements or extensions of data groups) are not reflected in the
data inventory.
In summary, surveying the departments of expertise is an auxiliary method rather than
the primary method. Together with other methods, it may be quite helpful, especially
since it promotes contact with the members of the departments of expertise, the actual
"customers" of the database being developed.
Existing data files (on paper, in flat files, etc.) and program
listings are screened for data of the application domain
Notes:
This method screens existing data files (which may be on paper, in flat files, or in old
databases) and program listings for data used by the application domain. From the data
found, the data elements or data groups for the application domain are derived and
registered in the data inventory.
For a large application domain, a great number and variety of files and documents may
have to be inspected. This is not a problem as such because any way of arriving at
a data inventory requires considerable effort; otherwise, the data inventory will be
incomplete. The success depends on the availability and the quality of the documentation
of the data and the programs. The poorer the documentation, the more effort you must put
in.
The amount of information to be scanned may cause potential data elements or data
groups to be overlooked. Conversely, you may find data elements or data groups that are
not really objects of the application domain, but caused by the particular implementation
used so far. You do not want these data elements and data groups in the data inventory.
Summarizing the preceding points, we can say that this is a feasible method provided the
required information is available. However, you must be wary of implementation-dependent
data elements or data groups and ignore them.
Notes:
The method discussed on this visual synchronizes the data inventory with the process
inventory. It couples the development and maintenance of the data and process
inventories. The process inventory contains a detailed description of the business
processes for the application domain. It is input for application programmers and enables
them to write the programs supporting the application domain. As we will see later in this
unit, the description of the business processes includes, for each business process, a list of
the data elements and data groups used.
As a business process is described or changed, the affected data elements or data groups
are described or their description is updated in the data inventory. Thus, the process
inventory and the data inventory remain synchronized at all times. If processes require new
data elements or data groups, they are associated with the proper entity types as described
before. If an entity type cannot be found for a data element or data group, the
entity-relationship model must be changed as well. As you can imagine, this will result in
changes for your database.
The advantage of this approach is that it leaves the responsibility for the data elements and
data groups where it belongs, namely, with the business processes. As a consequence,
the data inventory contains the data elements and data groups for the documented
processes and only for those. These may be existing or planned business processes. A
positive side effect is that the planned business processes must have materialized at least
to the point of being documented; they are no longer vague ideas in some people's heads
that are never realized.
The discussed method ensures that all needed data elements and data groups are in the
data inventory and, thus, will be in the database.
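Keeping the two inventories synchronized can be supported by a simple cross-check, sketched here in Python. The function name compare_inventories, the representation of the inventories as a set and a dictionary, and the process titles and usage lists in the example are all hypothetical.

```python
def compare_inventories(data_inventory, process_inventory):
    """Return (missing, unused) names.

    data_inventory: set of data element and data group names.
    process_inventory: dict mapping a process title to the names it uses.
    """
    used = set()
    for names in process_inventory.values():
        used |= set(names)
    missing = sorted(used - data_inventory)  # referenced but not described
    unused = sorted(data_inventory - used)   # described but never referenced
    return missing, unused


data_inventory = {"Aircraft Number", "Engine Number", "Last Name"}
process_inventory = {
    # Hypothetical process titles and usage lists.
    "Record Maintenance": {"Aircraft Number", "Remarks"},
    "Register Aircraft": {"Aircraft Number", "Engine Number"},
}
missing, unused = compare_inventories(data_inventory, process_inventory)
print(missing)  # prints ['Remarks']
print(unused)   # prints ['Last Name']
```

A name in the first list must be added to the data inventory; a name in the second list is a candidate for removal, since no documented process needs it.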
Notes:
As mentioned before, the process inventory contains a detailed description of all
business processes for the application domain. The descriptions should be completely
business oriented. They should be independent of any implementation considerations.
The process inventory is established by the application domain expert because he/she has
the overall knowledge of the application domain required. Of course, he/she needs to
discuss the business processes and verify their descriptions with the departments of
expertise. Whereas the members of the departments of expertise are frequently not willing
to discuss the data elements or data groups, they generally are interested in talking about
the business processes. The reason is that the business processes represent their daily
work. They want to ensure that their implementation makes their work as easy as possible.
The process inventory is input for the application programmers. It must allow them to
understand the business processes for the application domain and to develop the required
programs, queries, etc.
The descriptions for the business processes must identify all data elements and data
groups for the business processes. Since the data elements and data groups are process
independent, they are not described in the process inventory, but in the data inventory. The
business processes only refer to the data elements and data groups in the data inventory.
Therefore, the data inventory is also input for the application programmers.
When describing a business process, the application domain expert should verify that the
entity-relationship model contains all entity types and relationship types necessary for the
implementation of the business process. We will discuss this in more detail on one of the
subsequent visuals.
Except for assisting the application domain expert in verifying the entity-relationship model,
the database designer is not involved in the establishment of the process inventory.
Notes:
For each business process, the process inventory should contain the following items:
Title
The unique title under which the business process is known throughout the application
domain. The business process should be easily recognizable from the title.
Purpose
A short description of the purpose of the business process, i.e., an outline of what, from a
business perspective, the business process is supposed to achieve.
Input
A description of all data, including their role, that are external input for the business
process. This means a description of all data that are perceived as input by the (end) users
of the business process. In particular, these may be data entered by them in entry fields or
selected via check boxes, radio buttons, or combination boxes.
As mentioned, for each input, its role should be identified. For example, the description
should not just say "aircraft number", but rather "aircraft number for the maintenance record
of the specified aircraft". This is important for the application programmers for two reasons:
1. They may have to provide an appropriate description for the corresponding input field on
a window or in help information.
2. They need to know which data to access. As you know already from the
entity-relationship model, the aircraft number may occur in multiple entity types and,
thus, later on, in multiple tables.
Textual Description
A detailed textual description of the various steps of the business process. A textual
description is necessary since the description must be verified by the departments of
expertise and will be available to the users of the business process. Generally, a formal
description of the business process is not understood by these people. The textual
description also helps the database designer when verifying design steps by means of the
business processes described in the process inventory.
Formal Description
A formal description (for example, by means of decision tables) of the conditions, rules, and
actions for the business process. Application programmers may prefer such a formal
description over a textual description because it is more precise and, thus, eases their task.
Notes:
In addition to the items on the previous visual, the description for a business process
should contain the following items:
Output
A description of all data, including their role, which are external output for the business
process, i.e., perceived as output by the users of the business process. In particular, these
may be data displayed in a window or in a listing. It may also be something as abstract as
an interrelationship established by the business process (e.g., the assignment of an aircraft
to a flight) or a message.
Furthermore, the output may be conditional. This means, it can depend on the input
provided for the business process and on situations encountered during its execution.
As mentioned, for each output, its role should be identified as far as applicable. For
example, the description should not just say "airport code", but rather "airport code for
airport of departure for leg". Application programmers need this information to properly
describe the output on windows or in listings.
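The items of a business process description can be pictured as a simple record. The following Python sketch is illustrative only: the class names, field names, and the sample title are assumptions and not part of the course material. It also shows how a missing role for an input or output could be detected.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    """One external input or output of a business process (hypothetical structure)."""
    data_element: str  # name of the data element in the data inventory
    role: str          # role the data element plays for this business process

    def describe(self) -> str:
        return f"{self.data_element} {self.role}".strip()


@dataclass
class BusinessProcess:
    """One entry of the process inventory (field names are assumptions)."""
    title: str
    purpose: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    textual_description: list = field(default_factory=list)
    formal_description: str = ""

    def items_without_role(self) -> list:
        """Return the inputs and outputs whose role has not been identified."""
        return [i.data_element for i in self.inputs + self.outputs if not i.role.strip()]


# Hypothetical sample entry; the title is invented for illustration.
process = BusinessProcess(
    title="Display Maintenance Record for Aircraft",
    purpose="Display the maintenance record of the specified aircraft.",
    inputs=[DataItem("aircraft number", "for the maintenance record of the specified aircraft")],
    outputs=[DataItem("airport code", "")],  # role deliberately left out
)

print(process.items_without_role())
```

A check like `items_without_role` is one possible way to enforce the rule that a description should not just say "airport code" without stating its role.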
Business Processes
Notes:
The next few visuals show the description of a business process for our sample airline
company called Come Aboard. The business process assigns a pilot as captain to a flight.
Appropriately enough, the unique title under which the business process is known
throughout the application domain is Assign Captain for Flight.
Item Purpose explains in more detail what the business process will accomplish: It will
assign the specified pilot to the specified flight.
The input for the business process must identify the flight as well as the pilot who becomes
the captain for the flight. To identify the flight, the flight number, the airport of departure, the
airport of arrival, and the locator for the flight must be provided. Note that the addendum for
flight in the visual identifies the role of the input. This is important because flight number,
airport of departure, and airport of arrival can be used to identify different things (e.g., legs
rather than flights).
Sample Business Process (2 of 5)
Textual Description:
This business process performs the following operations:
1. It is verified that the specified flight and pilot exist.
   If flight or pilot do not exist, an appropriate error message is displayed
   and the business process ends.
2. If pilot and flight exist, it is checked if the pilot has the license to fly the
   aircraft model for the leg for the flight.
   If the pilot cannot fly the aircraft model, an appropriate error message is
   displayed and the business process ends.
3. If the pilot has the license to fly the aircraft model, it is checked if the
   pilot has already been assigned to the flight.
   If the pilot is already captain or copilot for the flight, an appropriate
   message is displayed and the business process ends.
4. If the pilot has not yet been assigned to the flight, it is checked if
   another pilot is already captain for the flight.
   If so, a message is displayed containing employee number, last name,
   and first name of the current captain and the business process ends.
5. If a captain has not yet been assigned to the flight, the specified pilot
   becomes the captain for the flight.
6. A message is displayed confirming that the pilot has been assigned as
   captain to the flight. The message includes employee number, last
   name, and first name of the assigned captain.
Figure 5-26. Sample Business Process (2 of 5) CF182.0
Notes:
This visual lists the individual steps that, from a business perspective, must be performed
by the business process. It does not describe an implementation. The implementation may
look completely different, and in this case it will: It can make use of two constraints of
the entity-relationship model provided these are implemented. As a consequence of the
constraints, a lot of the checking for the business process need not be implemented since it
is handled by the constraints.
Because the description is self-explanatory, we need not discuss it further here.
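To make the business logic concrete, the six steps of the textual description can be traced in a few lines of Python. This is only a sketch of the business perspective, not the implementation the course has in mind (which, as noted, may rely on constraints instead). The in-memory dictionaries, all sample values, and the simplification of storing the aircraft model directly with the flight rather than with its leg are assumptions made for illustration.

```python
# Hypothetical in-memory data; names and values are illustrative assumptions.
# Flight key: (flight number, airport of departure, airport of arrival, locator).
FLIGHTS = {("CA100", "FRA", "JFK", "2002-05-01"): {"aircraft_model": "B747-400"}}
PILOTS = {
    1001: {"last_name": "Smith", "first_name": "Ann", "licenses": {"B747-400"}},
    1002: {"last_name": "Jones", "first_name": "Bob", "licenses": {"A320"}},
    1003: {"last_name": "Meyer", "first_name": "Eva", "licenses": {"B747-400"}},
}
# (flight key, employee number) -> "captain" or "copilot"
ASSIGNMENTS = {}


def assign_captain_for_flight(flight_key, employee_number):
    """Follow steps 1 to 6 of the textual description and return the message."""
    # Step 1: the specified flight and pilot must exist.
    if flight_key not in FLIGHTS or employee_number not in PILOTS:
        return "Error: flight or pilot does not exist."
    pilot = PILOTS[employee_number]
    # Step 2: the pilot needs the license for the aircraft model.
    if FLIGHTS[flight_key]["aircraft_model"] not in pilot["licenses"]:
        return "Error: pilot is not licensed for the aircraft model."
    # Step 3: the pilot must not yet be captain or copilot for the flight.
    if (flight_key, employee_number) in ASSIGNMENTS:
        return "Pilot is already assigned to the flight."
    # Step 4: no other pilot may already be captain for the flight.
    for (key, number), function in ASSIGNMENTS.items():
        if key == flight_key and function == "captain":
            captain = PILOTS[number]
            return (f"Flight already has a captain: {number} "
                    f"{captain['last_name']}, {captain['first_name']}.")
    # Step 5: the specified pilot becomes the captain.
    ASSIGNMENTS[(flight_key, employee_number)] = "captain"
    # Step 6: confirm the assignment.
    return (f"Assigned as captain: {employee_number} "
            f"{pilot['last_name']}, {pilot['first_name']}.")
```

Calling `assign_captain_for_flight(("CA100", "FRA", "JFK", "2002-05-01"), 1001)` on the sample data assigns pilot 1001; a second call for pilot 1003 then reports the existing captain.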
Notes:
This visual illustrates the external output for the sample business process assigning a pilot
as captain to a flight.
The flight information, i.e., the flight number, the airport of departure, the airport of arrival,
and the locator for the flight are always returned as output. They also were input for the
business process. Again, the addendum for flight on the visual indicates the role of the
output.
The further output is dependent on conditions encountered during the execution of the
business process:
If another pilot has already been assigned as captain to the flight, employee number, last
name, and first name of that pilot and the employee number of the specified pilot are
returned. (As a consequence, the specified pilot was not assigned to the flight.)
If the specified pilot has been assigned as captain to the flight, his/her employee number,
last name, and first name are returned. In addition, a message is issued that the pilot has
been assigned successfully. Accordingly, the fact that the pilot has been assigned to the
flight is perceived as an output by the user of the business process.
You could think of further conditions resulting in different output. Such a condition is that the
pilot has already been assigned as copilot to the flight. These conditions and their output
should be described as well. We have not done this here to keep the output for the sample
business process on a single visual.
Notes:
For each business process, you should verify the entity-relationship model for the
application domain. You should check if it contains all required entity types and relationship
types by scrutinizing all steps of the business process.
To determine the entity types needed, you must determine the data elements and data
groups used by the steps. Thus, in the course of the verification, you determine all data
elements and data groups read or written by the business process.
When verifying the entity-relationship model for a business process, you perform a walk
through the entity-relationship model and determine the view needed for the business
process.
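The verification can be thought of as a comparison between what the steps of a business process need and what the entity-relationship model offers. The sketch below is a hedged illustration of that idea: the model excerpt, the required data elements, and the attribute name "pilot function" are assumptions, chosen so that the check reports gaps.

```python
# Hypothetical excerpt of an entity-relationship model: entity type -> attributes.
er_model = {
    "FLIGHT": {"flight number", "airport of departure",
               "airport of arrival", "flight locator"},
    "PILOT": {"employee number", "last name"},
}

# Data elements read or written by the steps of a business process,
# grouped by the entity type expected to hold them (assumed values).
required = {
    "FLIGHT": {"flight number", "airport of departure",
               "airport of arrival", "flight locator"},
    "PILOT": {"employee number", "last name", "first name"},
    "PILOT ASSIGNMENT": {"pilot function"},
}


def missing_from_model(model, needed):
    """Return, per entity type, the required data elements the model lacks."""
    gaps = {}
    for entity_type, elements in needed.items():
        lacking = elements - model.get(entity_type, set())
        if lacking:
            gaps[entity_type] = lacking
    return gaps


gaps = missing_from_model(er_model, required)
print(gaps)
```

An empty result would mean that the view needed by the business process is fully covered by the model excerpt.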
We will do this now for the sample business process assigning the captain for a flight. The
steps of the business process have been described on page 5-47. They will be repeated
here as far as required for the understanding:
1. The first step of the business process verifies that the specified pilot and flight exist. If
not, an appropriate message is displayed and the business process ends.
The specified flight exists if entity type FLIGHT contains an entity instance for the
specified flight number, airport of departure, airport of arrival, and flight locator. Thus,
entity type FLIGHT must have attributes for the following data elements:
Thus, the entity-relationship model includes all entity types and relationship types
required for the step.
Accessing a relationship type means accessing its defining attributes since they
completely describe the relationship instances. As we know, the defining attributes are
the keys of the source and target for the relationship type. Consequently, source and
target of the relationship type are the primary receptacles for the data elements and data
groups corresponding to the defining attributes. If they do not contain them, the
relationship type cannot contain them. If they are their keys, the relationship type will
automatically contain them. Therefore, in the data inventory, the data elements/data
groups for the accessed defining attributes are associated with the source and target
entity types rather than with the relationship type.
In view of this convention, the walk through the entity-relationship model for this step of
the business process requires the following data elements for the indicated entity types.
The roles are included in parentheses:
interest whether the pilot has been assigned as captain or copilot, entity type PILOT
ASSIGNMENT is not needed.
Accordingly, this step of the business process uses the following data elements in the
indicated entity types:
As a consequence, the following data elements in entity type PILOT ASSIGNMENT are
written (see also page 5-57):
Notes:
The data elements and data groups on this visual have already been discussed in the
notes for the previous visual on page 5-50.
Note that column Contained In is not part of the description for a business process. It has
been added here to indicate the data groups and entity types the various data
elements/data groups will be associated with in the data inventory. It does not make sense
to describe this in the process inventory. The implementation of the business processes
must be based on the actual tables rather than on the entity types of the entity-relationship
model.
Sample Business Process (5 of 5)
Notes:
This visual illustrates the data elements written by the sample business process. They have
already been discussed on page 5-50 ff.
Note that column Contained In is not part of the description for a business process. It has
been added here to indicate the data groups and entity types the various data
elements/data groups will be associated with in the data inventory.
Process Decomposition
To attain a complete set of business processes for the application domain
Business process = Process actually performed by the application domain
Step-by-step decomposition of application domain into groups of
related business processes and, finally, individual business processes
Next iteration is a refinement of the previous iteration
Next iteration creates business-related subsets of groups for previous iteration
Iteration stops if group consists of a single business process
Independent of whether or not the business process will employ other
business processes to achieve its task (implementation detail)
Business process then described in process inventory
Result is a process tree
Lowest level are business processes to be described in process inventory
Higher levels are groups of related business processes
Only a grouping of business processes
Does not imply an implementation structure
Does not specify if a business process internally uses another business
process to accomplish its task
Notes:
To ensure the completeness of the data inventory, you need a comprehensive process
inventory. This requires that you have a complete set of the business processes for the
application domain, i.e., the processes (tasks) actually performed by the application
domain.
One technique for obtaining a comprehensive set of business processes is process
decomposition. It is a step-by-step decomposition of the application domain into groups of
related business processes and, finally, individual business processes.
Process decomposition is an iterative process. The next iteration is a refinement of the
previous iteration and creates business-related subgroups (subsets) for the groups
resulting from the previous iteration. The iteration stops when a group finally consists of a
single business process.
Process decomposition is a pure grouping of the business processes based on the tasks
performed by the application domain. It just describes which business processes are
performed by the various subfunctions of the application domain. The same business
process may be performed by multiple subfunctions.
Process decomposition neither considers nor reflects whether or not a business process
internally uses other business processes to perform its work. For example, the business
process displaying all maintenance records for an aircraft may very well use the business
process displaying an individual maintenance record. However, this is not a concern of
process decomposition and not reflected in its output.
Neither does process decomposition occupy itself with modules internally used or
invocation sequences. These are implementation details. Only externally visible tasks, i.e.,
tasks performed by the application domain, are considered and reflected. Remember that
we still are in the conceptual view. At this stage, you should not make any assumptions
about the implementation of the business processes.
As a business process is identified during process decomposition, it is described in the
process inventory.
The result of process decomposition is a process tree. The nodes at the lowest level of the
process tree are the business processes described in the process inventory. The nodes at
the higher levels are groups of related business processes. They act like folders of
directory structures. The process tree groups the business processes in accordance with
their usage by subfunctions of the application domain.
The process tree does not imply an implementation structure or an invocation sequence. It
does not specify if a business process internally uses another business process to
accomplish a task. It neither establishes nor enforces that separate business processes
become separate programs or queries.
The process tree should be incorporated in the process inventory. It provides an
overview of the business processes described in the process inventory.
[Figure: part of the process tree. Top-level node AVIATION; groups AIRCRAFT ASSIGNMENT and PILOT ASSIGNMENT]
Notes:
This visual illustrates a part of the process tree for our sample airline company called Come
Aboard.
We have used folders for the higher-level nodes to demonstrate the similarity to directory
structures on personal computers. A folder contains a list of items. In our case, these are
business processes or other folders containing business processes.
The top-level folder (node), called Aviation, represents the entire application domain. It
covers all business processes for the application domain.
The first iteration of the process decomposition resulted in groups of business processes
for subfunctions Airport Management, Itinerary Management, Flight Management, Aircraft
Management, and Aircraft Maintenance. Some additional subfunctions (such as Employee
Management) are not shown on the visual.
The visual shows a second iteration for Flight Management resulting in groups of business
processes for subfunctions Aircraft Assignment and Pilot Assignment. The business
processes for these groups (third iteration) are also listed on the visual.
Many business processes access a single business object type or business relationship
type or, if you prefer, a single entity type or relationship type. However, there are also
business processes accessing multiple entity types and/or relationship types.
Note that business process Display All Aircraft Information for Flight might invoke business
processes Display Aircraft for Flight, Display Aircraft Model for Flight, and Display Aircraft
Type for Flight. Since this is an implementation detail, the process tree does not show it.
The actual implementation may look different.
Business processes Display Flights for Pilot and Display Flights for Aircraft may very well
be used by other subfunctions of the application domain as well. The first business process
may be used by Employee Management; the second by Aircraft Maintenance. The unique
title for a business process prevents it from being implemented twice. You can view the business
process as belonging to one subfunction (its major user) and the other subfunctions having
shortcuts for it.
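Because the process tree behaves like a directory structure, it can be sketched as nested folders whose leaves are business processes. The excerpt below is an assumption: the assignment of the named business processes to the groups Aircraft Assignment and Pilot Assignment is inferred from the notes, and only two of the Itinerary Management processes are listed. The tree carries no information about which process invokes which.

```python
# A group (folder) is a dict or a list; a business process (leaf) is a string.
# The grouping is an assumed excerpt, not a copy of the visual.
process_tree = {
    "Aviation": {
        "Flight Management": {
            "Aircraft Assignment": [
                "Display Aircraft for Flight",
                "Display All Aircraft Information for Flight",
                "Display Flights for Aircraft",
            ],
            "Pilot Assignment": [
                "Assign Captain for Flight",
                "Display Flights for Pilot",
            ],
        },
        "Itinerary Management": [
            "Create Itinerary",
            "Display Itinerary",
        ],
    }
}


def business_processes(node):
    """Collect the leaves of the tree, i.e., the individual business processes."""
    if isinstance(node, str):
        return [node]
    children = node.values() if isinstance(node, dict) else node
    leaves = []
    for child in children:
        leaves.extend(business_processes(child))
    return leaves


titles = business_processes(process_tree)
# Titles must be unique so that no business process is described twice.
duplicates = {t for t in titles if titles.count(t) > 1}
print(len(titles), duplicates)
```

Note that the number of levels differs between branches, matching the observation that the number of iterations may vary for different parts of the tree.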
AVIATION
Create Itinerary
Change Itinerary
Remove Itinerary
Display Itinerary
Add Single Leg to Itinerary
Add All Legs for Itinerary
Change Leg of Itinerary
Remove Leg of Itinerary
Display Single Leg of Itinerary
Display All Legs of Itinerary
Change Aircraft Model for Leg of Itinerary
Display Flights for Leg of Itinerary
Notes:
The next iteration for subfunction Itinerary Management does not result in groups of
business processes. Rather, it immediately provides the business processes for Itinerary
Management. This illustrates that the number of iterations required may vary for different
parts of the process tree.
This part of the process tree also shows a business process, Display All Legs of Itinerary,
whose implementation might use another business process (Display Single Leg of
Itinerary). Again, the process tree is not supposed to show such implementation details.
Checkpoint
5. Name the three items that should be described for an abstract data
type.
_____________________________________________________
_____________________________________________________
_____________________________________________________
6. Which of the following items should you specify for a data group?
a. A unique name.
b. A textual description.
c. Its data type.
d. The data groups using it as components.
e. The entity types using it as attributes.
f. Its minimum, and average lengths.
g. A domain for its values.
7. Why should you associate data elements or data groups with entity
types when adding them to the data inventory?
_____________________________________________________
_____________________________________________________
_____________________________________________________
10. Which principle is behind coupling the data and process
inventories?
_____________________________________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________
12. Name at least six items that the description for a business process
should include.
_____________________________________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________
Unit Summary (1 of 2)
Notes:
Unit Summary (2 of 2)
Notes:
Unit Objectives
After completion of this unit, you should be able to:
Notes:
Up to now, the entity-relationship model and the data and process inventories for the
application domain have been established. Now, it is time to transform the information
collected so far into objects that are machine processable. This requires a sequence of
steps. The first step is to establish the tuple types for the application domain and to
normalize them.
In this unit, we will talk about the purpose of tuple types, describe for which objects of the
entity-relationship model they are established, and how they are established.
The tuple types established this way may contain anomalies and redundant information
and are submitted to a process called Normalization. We will talk about the purpose of
normalization and Normal Forms.
Thus, after the completion of this unit, you should be able to establish the tuple types for an
application domain and to normalize them.
Tuple Types
Tables
Logical Data Structures
Integrity Rules
Notes:
As part of the conceptual view, the entity-relationship model and the data and process
inventories for the application domain were established. The first step of the storage view
uses the data inventory to construct tuple types for the entity types and relationship types
of the entity-relationship model and normalizes them.
Tuple types are an intermediate result of the design process. They are the precursors of
tables and provide the basis for the computerized processing of the entity types and
relationship types for the application domain. They are part of the storage view since they
represent the first step in the physical implementation of the conceptual view.
You can view the design process as a layered approach transforming the objects of the
application domain step-by-step into more and more physical representations. Tuple types
are an intermediate result of this transformation process which, finally, results in the tables
and related objects of the target relational database management system.
Tuple Types
Tuple Type
A construct:
Representing a class of objects with the same meaning,
structure, and characteristics
Consisting of a set of attributes
Forming the basis for the computerized processing
of the objects belonging to the tuple type
Tuple
A specific instance of a given tuple type
Notes:
Tuple types are the first result of the storage view. They are established when the
entity-relationship model is transformed step-by-step into the physical objects for the target
relational database management system. They are not yet the tables for the target system.
They are an intermediate result.
Similar to entity types, tuple types are constructs representing classes of objects with the
same meaning, structure, and characteristics. Like entity types, they consist of attributes
which may be elementary or composite and can assume zero, one, or multiple values.
Nevertheless, they are not entity types. They rather could be seen as a generalization or
standardization of entity types and relationship types.
Tuple types form the basis for the computerized processing of the objects they represent.
Whereas entity types and relationship types were purely conceptual classes, tuple types
should be seen as semi-physical constructs. They can be compared with logical files as
further discussed on the subsequent visual.
A specific instance of a tuple type is referred to as tuple.
In the literature, tuple types are frequently referred to as relations. We have chosen the
term tuple types to avoid confusion with relationships and relationship types and to
emphasize that they are classes of tuples.
Characteristics of Tuple Types
Tuple types can be viewed as logical data sets
Tuples form computational units and can be viewed as logical records
Attributes determine contents of logical records
All tuples of a tuple type have the same meaning, structure, and characteristics
Attributes can be elementary or composite
Attributes can assume zero, one, or multiple values for a tuple
Cardinality determines number of values
Tuple type must have a set of attributes uniquely identifying its potential tuples
Primary key
Each tuple type receives a unique name
Tuple types established for entity types and most relationship types
[Figure: entity types and relationship types are transformed into tuple types, which in turn become tables]
Notes:
Tuple types can be viewed as logical data sets. They are the logical containers for the
structured information represented by the tuples. Accordingly, the tuples can be viewed as
logical records. They are the computational units being processed. The attributes of a tuple
type determine the structure and contents of the tuples.
As logical records of logical data sets, all tuples of a tuple type have the same meaning,
structure, and characteristics. This means that they are composed of the same type of
information (attributes) and that the same constraints apply to them.
The attributes for a tuple type can be elementary or composite attributes. For a tuple, an
attribute can assume zero, one, or multiple values. The cardinality for the attribute
determines how many values the attribute must assume at least and at most for each tuple.
Each tuple type must have a set of attributes whose values uniquely identify all potential
tuples of the tuple type. This set of attributes is referred to as primary key of the tuple type.
For reference purposes, each tuple type receives a unique name. This should be the
unique class name expressing the function of the tuples.
In the design process, tuple types are an intermediate result in the process of transforming
the entity types and relationship types of the entity-relationship model into tables of the
target relational database management system. Tuple types are established for all entity
types and most relationship types. As for entity types, the attributes of tuple types are
affiliated with data elements and data groups of the data inventory. Basically, when
establishing a tuple type, the data elements and data groups corresponding to the
attributes or defining attributes of the associated entity type or relationship type are
compiled.
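The characteristics described above (named attributes, a cardinality per attribute, and a primary key) can be illustrated with a small data structure. This is a sketch under assumptions: the class names, the use of `None` for "no upper limit", and the sample attribute License are invented for illustration and do not come from the course material.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Attribute:
    name: str
    min_card: int = 1               # minimum number of values per tuple
    max_card: Optional[int] = 1     # None means no upper limit (assumption)


@dataclass
class TupleType:
    name: str
    attributes: list
    primary_key: tuple

    def check_tuple(self, values: dict) -> bool:
        """Check a tuple (attribute name -> list of values) against the cardinalities."""
        for attr in self.attributes:
            count = len(values.get(attr.name, []))
            if count < attr.min_card:
                return False
            if attr.max_card is not None and count > attr.max_card:
                return False
        return True

    def key_is_unique(self, tuples: list) -> bool:
        """Check that the primary key values identify each tuple uniquely."""
        keys = [tuple(tuple(t[k]) for k in self.primary_key) for t in tuples]
        return len(keys) == len(set(keys))


# Hypothetical tuple type with one multi-valued, optional attribute.
pilot = TupleType(
    name="PILOT",
    attributes=[Attribute("Employee Number"), Attribute("Last Name"),
                Attribute("License", min_card=0, max_card=None)],
    primary_key=("Employee Number",),
)
t1 = {"Employee Number": [1001], "Last Name": ["Smith"], "License": ["B747-400", "A320"]}
t2 = {"Employee Number": [1002], "Last Name": ["Jones"]}
print(pilot.check_tuple(t1), pilot.check_tuple(t2), pilot.key_is_unique([t1, t2]))
```

Representing every attribute value as a list keeps zero, one, and multiple values uniform, which mirrors the role of cardinality before normalization.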
Tuple Types for Entity Types
Notes:
For every entity type of the entity-relationship model for the application domain, one tuple
type is established.
As name of the tuple type, we will use the name for the entity type. If you wish, you can use
a different name, but there is no need for that.
The tuple type consists of all attributes for the entity type. Thus, when forming the tuple
type, the data elements and data groups of the data inventory corresponding to the
attributes of the entity type are compiled and cardinalities assigned to them.
The primary key for the tuple type consists of the attributes for the entity key of the entity
type. Since entity keys satisfy the minimum principle (all attributes are necessary for the
unique identification of the entity instances), the primary key also follows the minimum
principle: All attributes are necessary for the unique identification of the individual tuples.
As you can imagine, the constraints for the entity type must be translated into equivalent
constraints for the tuple type. However, at this point in time, we will not worry about the
constraints since tuple types are only an intermediate result.
Note that we have already prepared the establishment of tuple types for entity types. In the
data inventory, we have recorded to which entity types the various data elements and data
groups belong. Thus, to obtain the tuple type for an entity type, you just need to compile the
data elements and data groups for the entity type.
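Compiling the tuple type for an entity type from the data inventory can be sketched as a simple filter. The inventory layout, the field names, and the data element Manufacturer are assumptions for illustration; the cardinality defaults to 1..1 here only to keep the example short.

```python
# Hypothetical data inventory excerpt: each data element records the entity
# type it is associated with and whether it is part of the entity key.
data_inventory = [
    {"name": "Type Code", "entity_type": "AIRCRAFT MODEL", "in_entity_key": True},
    {"name": "Model Number", "entity_type": "AIRCRAFT MODEL", "in_entity_key": True},
    {"name": "Manufacturer", "entity_type": "AIRCRAFT MODEL", "in_entity_key": False},
    {"name": "Aircraft Number", "entity_type": "AIRCRAFT", "in_entity_key": True},
]


def tuple_type_for_entity_type(entity_type, inventory, cardinalities=None):
    """Compile the data elements of an entity type into a tuple type.

    cardinalities maps attribute names to (min, max); missing entries
    default to (1, 1), which is an assumption of this sketch.
    """
    cardinalities = cardinalities or {}
    members = [e for e in inventory if e["entity_type"] == entity_type]
    return {
        "name": entity_type,  # the tuple type takes over the entity type's name
        "attributes": {e["name"]: cardinalities.get(e["name"], (1, 1)) for e in members},
        "primary_key": [e["name"] for e in members if e["in_entity_key"]],
    }


aircraft_model = tuple_type_for_entity_type(
    "AIRCRAFT MODEL", data_inventory, {"Manufacturer": (0, 1)}
)
print(aircraft_model)
```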
Tuple Types for Relationship Types
Name for tuple type = Full name for relationship type
[Figure: relationship type AIRCRAFT MODEL_for_AIRCRAFT. Entity type AIRCRAFT MODEL has key attributes (K) Type Code and Model Number. Defining attributes: Type Code, Model Number, Aircraft Number. Relationship key: Aircraft Number.]
Notes:
As for tuple types for entity types, we will choose the full name of the relationship type as
name of the corresponding tuple type.
Since the defining attributes describe the relationship type, the tuple type must consist of all
defining attributes for the relationship type. Thus, to form the tuple type, the data elements and data
groups of the data inventory for the defining attributes of the relationship type are compiled.
Since a tuple expresses a single relationship, all attributes must assume one and only one
value for each tuple. Accordingly, minimum cardinality and maximum cardinality must be 1
for all attributes of the tuple type.
As you would expect, the attributes of the relationship key become the attributes of the
primary key for the tuple type. Since the relationship key had to follow the minimum
principle, the primary key for the tuple type follows the minimum principle as well: All
attributes are required to uniquely identify the individual tuples of the tuple type.
The example on the visual shows relationship type AIRCRAFT MODEL_for_AIRCRAFT, a
1:m relationship type. As you know, the defining attributes for this relationship type are the
entity keys of its source and target: attributes Type Code and Model Number from
AIRCRAFT MODEL and attribute Aircraft Number from AIRCRAFT. They become the
attributes of tuple type AIRCRAFT MODEL_for_AIRCRAFT.
Since the relationship type is a 1:m relationship type, Aircraft Number, the entity key of
AIRCRAFT, becomes the relationship key. Therefore, it also becomes the primary key for
tuple type AIRCRAFT MODEL_for_AIRCRAFT.
Any constraints for the relationship type must be translated into equivalent constraints for
the tuple type. Again, we will not worry about them right now.
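The rules for AIRCRAFT MODEL_for_AIRCRAFT can be written down as a short function. This is a sketch, not a general algorithm: the dictionary layout is an assumption, the source is taken to be the "1" side of a 1:m relationship type, and the m:m rule (all defining attributes form the relationship key) is included as the only other case. Other kinds are rejected rather than guessed.

```python
def tuple_type_for_relationship_type(source, target, connector, kind):
    """Derive a tuple type for a relationship type.

    source, target: dicts with 'name' and 'key' (list of key attribute names).
    kind: '1:m' (source on the "1" side, by assumption) or 'm:m'.
    """
    # Defining attributes are the keys of source and target.
    defining = list(source["key"]) + list(target["key"])
    if kind == "1:m":
        relationship_key = list(target["key"])
    elif kind == "m:m":
        relationship_key = list(defining)
    else:
        raise ValueError(f"kind {kind!r} is not covered by this sketch")
    return {
        "name": f"{source['name']}{connector}{target['name']}",
        # A tuple expresses a single relationship: every attribute is 1..1.
        "attributes": {a: (1, 1) for a in defining},
        "primary_key": relationship_key,
    }


aircraft_model = {"name": "AIRCRAFT MODEL", "key": ["Type Code", "Model Number"]}
aircraft = {"name": "AIRCRAFT", "key": ["Aircraft Number"]}
tt = tuple_type_for_relationship_type(aircraft_model, aircraft, "_for_", "1:m")
print(tt["name"], tt["primary_key"])
```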
Up to now, we have only described how to establish the tuple type for a relationship type.
We have not yet answered the question of whether there is a tuple type for every relationship type of
the entity-relationship model. Usually, there is one tuple type for a relationship type.
However, there are some exceptions. For some relationship types, there must not be a
tuple type. These cases are described by the subsequent visuals.
No Tuple Type for Relationship Type (1 of 3)
Notes:
An owning relationship type connects an entity type or relationship type, the parent, to a
dependent entity type. (The rectangle with rounded corners indicates that the represented
object may be an entity type or a relationship type.) The key of the parent is part of the key
of the dependent entity type and only instances with matching values are interconnected.
As we have seen before, the defining attributes for an owning relationship type consist of
the key of the dependent entity type. Accordingly, the tuple type for the owning relationship
type would just consist of the key of the dependent entity type.
The tuple type for the dependent entity type also contains the key for the dependent entity
type. Since only instances with matching values are interconnected, the tuple type for the
dependent entity type expresses all interconnections, and only those, established via the
owning relationship type. Consequently, a tuple type for the owning relationship type would
be redundant. Therefore, a tuple type for the owning relationship type need not and must
not be provided.
[Visual labels: parent shown as "Entity type or relationship type"; callout "Tuple type for r2 includes defining attributes for r1"]
Notes:
There is a second case when a tuple type must not be provided for a relationship type.
Assume that you have an m:m relationship type r1 which is the source of another
relationship type r2 with a minimum target cardinality of 1 (cardinality 1..). Because r1 is an
m:m relationship type, its relationship key consists of all its defining attributes.
Consequently, the defining attributes of r2 include the defining attributes of r1. Accordingly,
the tuple type for r2 includes all attributes of the tuple type for r1.
The target cardinality of 1.. of relationship type r2 implies that each instance of r1 is
connected to at least one target instance of r2. In turn, this entails that r2 contains an
instance for every instance of r1.
This means that the tuple type for r2 completely describes the instances for r1 and r2. As a
consequence, a tuple type for r1 would be redundant and, therefore, need not and must
not be provided.
Note that it is imperative that the minimum target cardinality of r2 be 1. Otherwise, there
would not necessarily be an instance of r2 for every instance of r1. Thus, the instances of
r2 would not describe all instances of r1 and a separate tuple type would be required for r1.
Note that it is also necessary that r1 is an m:m relationship type. Otherwise, the key of r1
would not consist of all defining attributes. Thus, the tuple type for r2 would not include all
defining attributes for r1 and, therefore, not completely describe the instances for r1.
Of course, a tuple type would also not be required if r1 were the target of a relationship type
r2 with a minimum source cardinality of 1. This case can be reduced to the case discussed
above by redefining the primary direction of r2.
[Visual labels: r2, D, 1.., "Dependent Entity Type"]
Notes:
This visual combines the cases discussed on the previous two visuals. Thus, it is a
corollary of them.
If r1 is an m:m relationship type and r2 an owning relationship type with a minimum
cardinality of 1 for the dependent entity type, tuple types must not be provided for them.
A tuple type must not be provided for r1 because the tuple type for r2 would fully describe
all instances for r1. A tuple type for r2 must not be provided either because the tuple type
for the dependent entity type completely describes the appropriate instances.
In particular, the situation on the visual exists for mandatory nondefining attributes for m:m
relationship types.
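The two exclusion rules and their corollary can be written down as a small decision function. The following is a minimal sketch, assuming a hypothetical RelationshipType record; the field names are illustrative and not part of the course material, and the kind given to the owning relationship type is an assumption that the rules do not use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RelationshipType:
    name: str
    kind: str                       # "1:1", "1:m", or "m:m"
    owning: bool = False            # True for owning relationship types
    source: Optional[str] = None    # name of the source entity/relationship type
    min_target_cardinality: int = 0


def needs_tuple_type(rel, all_rels):
    """Apply the exclusion rules described on the visuals."""
    # Case 1: an owning relationship type never gets a tuple type.
    if rel.owning:
        return False
    # Case 2: an m:m relationship type that is the source of another
    # relationship type with a minimum target cardinality of 1.
    if rel.kind == "m:m":
        for other in all_rels:
            if other.source == rel.name and other.min_target_cardinality >= 1:
                return False
    return True


rels = [
    RelationshipType("PILOT_assigned_to_FLIGHT", "m:m"),
    RelationshipType(
        "PILOT_assigned_to_FLIGHT_by_PILOT ASSIGNMENT",
        "1:1",  # assumed; not used by the rules above
        owning=True,
        source="PILOT_assigned_to_FLIGHT",
        min_target_cardinality=1,
    ),
    RelationshipType("AIRCRAFT MODEL_for_AIRCRAFT", "1:m"),
]
required = [r.name for r in rels if needs_tuple_type(r, rels)]
print(required)
```

In this sample, only AIRCRAFT MODEL_for_AIRCRAFT keeps a tuple type: the owning relationship type is excluded by the first rule, and the m:m relationship type it depends on is excluded by the second.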
Required Tuple Types for CAB
[Visual: entity-relationship diagram for Come Aboard with entity types EMPLOYEE, MECHANIC, PILOT, AIRCRAFT TYPE, AIRCRAFT MODEL, AIRCRAFT, AIRPORT, ITINERARY, LEG, FLIGHT, PILOT ASSIGNMENT, and MAINTENANCE RECORD and their connecting relationship types]
Notes:
The above visual illustrates for which entity types and relationship types of our sample
airline company called Come Aboard tuple types are required:
• Tuple types are required for all entity types of the entity-relationship model for Come
Aboard. Therefore, the entity types are shown in reverse video.
• Because they are owning relationship types, tuple types must not be provided for
relationship types:
AIRCRAFT TYPE_for_AIRCRAFT MODEL
AIRPORT_nonstop_to_AIRPORT_in_ITINERARY_as_LEG
FLIGHT_for_LEG
PILOT_assigned_to_FLIGHT_by_PILOT ASSIGNMENT
EMPLOYEE_is_MECHANIC
EMPLOYEE_is_PILOT
The arrows for these relationship types have been shaded.
Note that the is-bundle for supertype EMPLOYEE represents a set of relationship types
(two). All of them are owning relationship types.
• Tuple types must not be provided for relationship types:
AIRPORT_nonstop_to_AIRPORT_in_ITINERARY
PILOT_assigned_to_FLIGHT
They are m:m relationship types being the source of other relationship types whose
minimum target cardinality is 1. Both are cases of m:m relationship types with mandatory
nondefining attributes.
The arrows for these relationship types have been shaded.
• Tuple types are required for all remaining relationship types. Their connecting arrows
have been highlighted.
Documentation of Tuple Types
Notes:
As described before, tuple types consist of attributes. A tuple type for an entity type
consists of the attributes of the entity type. A tuple type for a relationship type consists of
the defining attributes for the relationship type. To describe a tuple type, you need to list its
attributes in a way that reflects that the attributes may be composite.
Each line on the visual following the name for the tuple type represents an attribute. The
components of a composite attribute immediately follow the line for the composite attribute
itself and are indented. If a component is again a composite attribute, its components are
indented even further.
In the example, Manufacturer is a composite attribute having composite attribute Address
as a component.
To highlight the name of the tuple type, it is in boldface and has been underlined. The
names of composite attributes have been bold-faced as well.
Attributes belonging to the primary key are marked by the letters PK separated from the
name of the attribute by a comma. If a composite attribute belongs to the primary key (i.e.,
all its components belong to the primary key), the name of the composite attribute is
marked with the letters PK rather than the individual components.
As discussed before, attributes have cardinalities specifying the minimum number and
maximum number of values that an instance of the attribute must/can have. As part of the
documentation of a tuple type, we want to show the cardinalities for its attributes. They are
needed during normalization and in later steps of the design process.
The cardinality for an attribute follows its name or, if applicable, the letters PK and is
specified as follows: Minimum cardinality and maximum cardinality are separated by two
periods and enclosed in brackets:
[minimum .. maximum]
If there is no upper limit for the number of values the attribute can assume, an asterisk (*) is
used as maximum cardinality. Enclosing brackets are used in analogy to the dimension
specification for arrays in programming languages.
If the cardinality for an attribute is omitted, [1..1] is assumed.
Note that the specified cardinalities are relative: If the attribute is a direct component of the
tuple type, the cardinality expresses how many values the attributes must/can assume for
each tuple of the tuple type. If the attribute is a component of a composite attribute, the
cardinality rather specifies how many values the attribute must/can assume within each
instance of the composite attribute.
As a consequence, it is possible that, despite a minimum cardinality of 1, an attribute
does not assume a value for a specific tuple! This happens if the owning composite
attribute has a minimum cardinality of 0 and does not assume a value for a tuple.
The cardinality for an attribute can be derived from the cardinality specifications for the data
element or data group the attribute is based upon. Thus, it would be possible to omit the
cardinalities in the documentation of a tuple type and to go back to the data inventory when
the cardinalities are needed. However, it is quite handy to have the cardinalities in the tuple
type documentation.
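To make the notation concrete, here is a minimal Python sketch that reads one line of tuple type documentation. The function name parse_attribute and the returned tuple layout are assumptions made for this illustration; the notation itself (", PK", "[minimum .. maximum]", "*" for no upper limit, default [1..1]) is the one described above.

```python
import re

# One documentation line: name, optional ", PK", optional "[min..max]".
LINE = re.compile(
    r"^(?P<name>.+?)(?P<pk>,\s*PK)?\s*"
    r"(?:\[(?P<min>\d+)\s*\.\.\s*(?P<max>\d+|\*)\])?$"
)


def parse_attribute(line):
    """Parse e.g. 'Seat [0..*]' into (name, is_pk, minimum, maximum).

    maximum is None when the documentation uses '*' (no upper limit).
    An omitted cardinality is read as [1..1].
    """
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError(f"cannot parse attribute line: {line!r}")
    if match.group("min") is None:
        minimum, maximum = 1, 1
    else:
        minimum = int(match.group("min"))
        maximum = None if match.group("max") == "*" else int(match.group("max"))
    return match.group("name"), match.group("pk") is not None, minimum, maximum


for text in ["Flight Number, PK", "Seat [0..*]", "Effective From [0..1]"]:
    print(parse_attribute(text))
```

The sketch treats each line on its own. It does not track indentation, so the relative nature of the cardinalities (per tuple versus per instance of the owning composite attribute) would have to be handled by the caller.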
Tuple Types With Roles
FLIGHT
   Flight Number, PK
   Airport Code AS From, PK
   Airport Code AS To, PK
   Flight Locator, PK
   Departure AS Planned Departure
      Departure Date
      Departure Time
   Arrival AS Planned Arrival
      Arrival Date
      Arrival Time
   Departure AS Actual Departure [0..1]
      Departure Date
      Departure Time
   Arrival AS Actual Arrival [0..1]
      Arrival Date
      Arrival Time
Callouts on the visual: in "Airport Code AS From", Airport Code is the name of the data element and From is the attribute/role name. In "Arrival AS Planned Arrival", Arrival is the name of the data group and Planned Arrival is the attribute/role name. The qualified name of Departure Time within Actual Departure is Departure Time OF Actual Departure.
Notes:
Generally, an attribute of a tuple type receives the same name as the data element or data
group it is based upon. However, you might want to give it a different name. In some cases,
you even have to. If a data element or data group is used by multiple attributes at the same
level in different roles, you need to give the attributes different names. Same level in this
context means as direct components of the tuple type or of a composite attribute.
For example, in tuple type FLIGHT, data element Airport Code is used twice as direct
attribute of the tuple type. Once it is used as airport code for the airport of departure, once
as airport code for the airport of arrival. Without naming them differently, the two roles could
not be differentiated. In the data inventory for Come Aboard, the two roles for the data
element have been identified with different role names (From and To). Therefore, the
names of the attributes should be the role names.
However, you still want to keep the link to the appropriate data element or data group in the
data inventory. You can achieve this by specifying the data element or data group name
and the attribute name by means of an AS clause as done on the visual.
In addition to data element Airport Code, tuple type FLIGHT uses data groups Departure
and Arrival in different roles. The different usages of data group Departure have been
highlighted. Departure is used as planned departure (role/attribute name Planned
Departure) and as actual departure (role/attribute name Actual Departure).
Data group Departure contains data elements Departure Date and Departure Time. This
raises the question of whether the attributes for the different usages need to be named
differently. They need not, because they are components of differently named composite
attributes and are unique in the scope of the composite attributes.
Formally, the full name of a component is qualified by the name of the composite attribute.
For example, the full name of attribute Departure Time of composite attribute Actual
Departure is Departure Time OF Actual Departure.
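The qualification rule can be illustrated with a short Python sketch. The nested dictionary layout and the function name qualified_names are assumptions made for this example. Chaining several OF clauses for deeper nesting is also an assumption, since the visual only shows one level.

```python
# Part of tuple type FLIGHT: composite attributes map to a dict of their
# components, elementary attributes map to None.
FLIGHT = {
    "Flight Number": None,
    "Planned Departure": {"Departure Date": None, "Departure Time": None},
    "Actual Departure": {"Departure Date": None, "Departure Time": None},
}


def qualified_names(attributes, owner=None):
    """Yield the full name of every elementary attribute."""
    for name, components in attributes.items():
        full = name if owner is None else f"{name} OF {owner}"
        if components is None:
            yield full
        else:
            # Components are qualified by the name of their composite attribute.
            yield from qualified_names(components, owner=full)


names = list(qualified_names(FLIGHT))
for name in names:
    print(name)
```

Although Departure Date and Departure Time each occur twice, every qualified name in the result is unique, which is why the components do not need role names of their own.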
Some Sample Tuple Types for CAB
Notes:
The above visual illustrates some further tuple types for our sample airline company called
Come Aboard. The tuple types in the upper box are for entity types. Tuple type AIRCRAFT
MODEL has only mandatory attributes, i.e., all attributes have a minimum cardinality of 1.
They also all have a maximum cardinality of 1.
Tuple type ITINERARY has only a few attributes. You are probably missing the (starting)
weekdays on which the itinerary is operated as described in the problem statement for
Come Aboard. A closer examination reveals that the weekdays on which itineraries are
operated are not inherent characteristics of itineraries. Rather, they are characteristics of
the legs for itineraries. The (starting) weekdays for an itinerary can be derived from the
(starting) weekdays for its legs.
Only a few attributes of tuple type MAINTENANCE RECORD are shown. Note that the
tuple type contains an attribute Aircraft Number expressing to which aircraft the
maintenance record belongs. In Unit 4 - Entity-Relationship Model, we determined that the
interrelationship between maintenance records and aircraft could not be expressed by
A Special Consideration
But be careful!!!
As such, the pure existence of an entity type or the
existence of a relationship type using it as source or
target are information that you may lose
Notes:
It is possible that the tuple type for an entity type just consists of the primary key. However,
it is pretty unusual. Therefore, you should discuss with the application domain expert if the
appropriate entity type is really necessary. When establishing the problem statement for
the application domain, the application domain expert might have thought that there would
be information of that type. However, the data inventory may not contain any data elements
and data groups for the entity type.
If the application domain expert agrees, remove the entity type from the entity-relationship
model, adjust the relationship types using the entity type as source or target accordingly,
and correct the tuple types.
However, you really should examine the case carefully. The pure existence of an entity type
or the use of it by relationship types as source or target constitutes already information that
you may lose by removing the entity type. An entity type represents a class of objects with
the same meaning and characteristics. Being an instance of that class identifies the
appropriate object as a member of the class even if there are no further characteristics to
be stored for the object.
Normalization - An Introduction
Established Tuple Types . . .
• Generally, cannot be converted one-to-one into tables
     Attributes can assume multiple values whereas columns cannot
• May contain redundant information
     May lead to inconsistent tuples
• May contain insert, update, and delete anomalies
     Information cannot be stored because of missing unrelated information
     Information may become inconsistent due to updates
     Information may be lost when a tuple is deleted
Normalization
• Improves condition of tuple types by raising their quality level
• Normal Forms define quality levels of tuple types
     Five Normal Forms: 1st Normal Form through 5th Normal Form
     Subsequent Normal Form based on previous Normal Form
     The higher the Normal Form the better the quality of the tuple type
     Only first three Normal Forms of practical relevance
Notes:
The tuple types established so far may have the following problems:
• They may have attributes with a maximum cardinality other than 1, i.e., have repeating
groups. A tuple type with repeating groups cannot immediately be converted into a table.
This is because the columns of tables can only accept a single value.
• Even within tuple types, redundant information may be stored. This may lead to
inconsistencies between the tuples of a tuple type if update operations do not change all
affected tuples.
• The tuple types may contain insert, update, and delete anomalies. These anomalies
may prevent the storage of information, cause inconsistent tuples, or result in the loss of
information.
Normalization remedies these deficiencies within, but not across tuple types. It improves
the condition of the tuple types by raising their quality level step-by-step.
There are five quality levels defined for tuple types by means of Normal Forms. These
Normal Forms are referred to as First Normal Form, Second Normal Form, and so on.
Each subsequent Normal Form requires that the previous Normal Form is satisfied
together with some additional conditions. Thus, the higher the Normal Form for a tuple
type, the better and more stable it is and the fewer of the above-mentioned problems may
occur.
Only the first three Normal Forms are of practical relevance. Hardly anyone ensures that
their tuple types satisfy the Fourth Normal Form or even the Fifth Normal Form. Both
Normal Forms deal with n-ary many-to-many relationship types and are more of a
theoretical nature. They are very complex and violations are extremely hard to detect.
Normally, when establishing tuple types based on an entity-relationship model with only
binary relationship types, you should not have violations of the Fourth Normal Form or the
Fifth Normal Form. This assumes that you have dutifully identified your relationship types
and not hidden and combined them in artificial entity types.
Because of the limited practical value of the remaining Normal Forms, we will concentrate
on the first three Normal Forms. However, to illustrate how difficult it is to verify the higher
Normal Forms, we will address the Fourth Normal Form as well, but skip the Fifth Normal
Form.
ITINERARY (In 1st Normal Form)
   Flight Number, PK
   Established On
   Effective From [0..1]
   Effective Until [0..1]

AIRCRAFT (Not in 1st Normal Form)
   Aircraft Number, PK
   Date Manufactured
   Seat [0..*]   (Repeating Group)
      Seat Number
      Seat Location
      Seat Class
      Section
   Date in Service [0..1]
   ...
Notes:
The First Normal Form deals with repeating groups. This means, it deals with attributes
having a maximum cardinality higher than 1 (considering *, meaning unlimited, also as
higher than 1). Repeating groups represent a problem when mapping tuple types into
tables because the columns of tables only allow a single value. Therefore, the First Normal
Form requires that all attributes, elementary or composite, have at most one value. It is
allowed that an attribute may not have a value for some tuples.
Tuple type ITINERARY for our sample airline company called Come Aboard is in First
Normal Form. None of its attributes has a maximum cardinality higher than 1.
Tuple type AIRCRAFT violates the First Normal Form because it contains a repeating
group. Composite attribute Seat has a maximum cardinality of *. This means, an aircraft
can have many seats and an upper limit has not been established.
Since Seat is a composite attribute, its values are composed of values for its components.
Each value of Seat consists of a value for Seat Number, Seat Location, Seat Class, and
Section. Effectively, this means that these attributes assume multiple values as well,
namely, as many as the composite attribute.
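The check for the First Normal Form amounts to inspecting maximum cardinalities. Below is a minimal Python sketch, assuming each tuple type is given as a mapping from attribute name to a (minimum, maximum) pair with None standing for "*"; this representation and the function name are hypothetical.

```python
def repeating_groups(tuple_type):
    """Return the attributes whose maximum cardinality is higher than 1.

    tuple_type maps attribute name -> (minimum, maximum); a maximum of None
    stands for '*' (no upper limit) and also counts as higher than 1.
    """
    return [
        name
        for name, (_minimum, maximum) in tuple_type.items()
        if maximum is None or maximum > 1
    ]


ITINERARY = {
    "Flight Number": (1, 1),
    "Established On": (1, 1),
    "Effective From": (0, 1),
    "Effective Until": (0, 1),
}
AIRCRAFT = {
    "Aircraft Number": (1, 1),
    "Date Manufactured": (1, 1),
    "Seat": (0, None),
    "Date in Service": (0, 1),
}

print(repeating_groups(ITINERARY))  # empty list: in First Normal Form
print(repeating_groups(AIRCRAFT))   # lists Seat: not in First Normal Form
```

A minimum cardinality of 0 does not matter for this check. An attribute may lack a value for some tuples and the tuple type can still be in First Normal Form.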
First Normal Form - Solution
AIRCRAFT
   Aircraft Number, PK
   Date Manufactured
   Date in Service [0..1]
   ...

SEAT
   Aircraft Number, PK
   Seat
      Seat Number, PK
      Seat Location
      Seat Class
      Section
Notes:
You can solve the violation of the First Normal Form as follows:
• Remove attribute Seat from tuple type AIRCRAFT and create a new tuple type SEAT.
The new tuple type contains one tuple for each seat on every aircraft. Accordingly, the
cardinality of composite attribute Seat in the new tuple type is [1..1].
• To not lose the interconnection to aircraft, the new tuple type must contain, for each
seat, the serial number of the aircraft to which the seat belongs (attribute Aircraft
Number).
• None of the attributes alone can form the primary key for the new tuple type since none
uniquely identifies the tuples of the tuple type. Seat numbers are not unique across
aircraft. Different aircraft may have the same seat numbers. However, seat numbers are
unique per aircraft. Therefore, the primary key must consist of two attributes:
Aircraft Number and Seat Number
Sometimes, it is necessary to introduce an additional attribute (e.g., a sequence
number) to attain the unique identification of the tuples. Sometimes, it is desirable to
introduce an additional attribute which, together with other attributes, uniquely identifies
the tuples. However, remember that the primary key is used to reference the individual
tuples of a tuple type. Therefore, it should be as natural as possible. A time which,
together with other attributes, could be used to uniquely identify the tuples is not a good
component for a primary key. Who remembers the various times for the tuples?!
When creating the new tuple type, all logically related attributes with the same maximum
cardinality should be moved to the same tuple type. If the data groups for the composite
attributes of a tuple type were established properly, all logically related attributes should be
part of the same composite attribute. In case of our example, they all belong to composite
attribute Seat. The composite attribute is then the only one (in addition to the primary key of
the original tuple type) to be moved to the new tuple type.
If data groups have not been established at all or improperly, you must determine during
normalization which attributes logically belong together and should be moved together. In
other words, the data groups must be established in any case. Why not establish them
correctly from the start, i.e., when the data inventory is established?!
Repeating groups may be nested and should be resolved from outside in. Thus, a tuple
type resulting from normalization must be inspected again for violations of the First Normal
Form.
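The steps above can be sketched at the level of the tuple type description. The function split_repeating_group and its arguments are hypothetical names chosen for this illustration; the cardinality pairs use None for "*".

```python
def split_repeating_group(parent_pk, attributes, group, group_key):
    """Move repeating group `group` out of a tuple type.

    parent_pk: primary key attributes of the original tuple type.
    attributes: attribute name -> (minimum, maximum) of the original tuple type.
    group: name of the repeating composite attribute to move.
    group_key: component(s) of the group that are unique per parent tuple.
    Returns (remaining_attributes, new_attributes, new_primary_key).
    """
    # The original tuple type keeps everything except the repeating group.
    remaining = {n: c for n, c in attributes.items() if n != group}
    # The new tuple type carries the parent's key and the group, now [1..1].
    new_attributes = {name: (1, 1) for name in parent_pk}
    new_attributes[group] = (1, 1)
    # The parent's key alone is not unique, so the group key is added.
    new_primary_key = list(parent_pk) + list(group_key)
    return remaining, new_attributes, new_primary_key


aircraft = {
    "Aircraft Number": (1, 1),
    "Date Manufactured": (1, 1),
    "Seat": (0, None),
    "Date in Service": (0, 1),
}
remaining, seat, seat_pk = split_repeating_group(
    ["Aircraft Number"], aircraft, "Seat", ["Seat Number"]
)
print(remaining)
print(seat)
print(seat_pk)
```

Because repeating groups may be nested, the new tuple type would have to be checked again for attributes with a maximum cardinality higher than 1.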
First Normal Form - Instance Example
BEFORE

AIRCRAFT
                          Seat
Aircraft    Date          Seat    Seat      Seat      Section    Date in
Number      Manufactured  Number  Location  Class                Service
B474001323  1994-10-12    1A      WINDOW    FIRST     N/SMOKING  1997-01-01
                          1B      MIDDLE    FIRST     N/SMOKING
                          1C      AISLE     FIRST     N/SMOKING
                          ...     ...       ...       ...
                          46J     WINDOW    ECONOMY   SMOKING
B171004217  1999-10-23    1A      WINDOW    BUSINESS  N/SMOKING  1999-11-15
                          1B      AISLE     BUSINESS  N/SMOKING
                          ...     ...       ...       ...
                          28G     WINDOW    ECONOMY   N/SMOKING

AFTER

AIRCRAFT
Aircraft    Date          Date in
Number      Manufactured  Service
B474001323  1994-10-12    1997-01-01
B171004217  1999-10-23    1999-11-15

SEAT
            Seat
Aircraft    Seat    Seat      Seat      Section
Number      Number  Location  Class
B474001323  1A      WINDOW    FIRST     N/SMOKING
B474001323  1B      MIDDLE    FIRST     N/SMOKING
B474001323  1C      AISLE     FIRST     N/SMOKING
...         ...     ...       ...       ...
B474001323  46J     WINDOW    ECONOMY   SMOKING
B171004217  1A      WINDOW    BUSINESS  N/SMOKING
B171004217  1B      AISLE     BUSINESS  N/SMOKING
...         ...     ...       ...       ...
B171004217  28G     WINDOW    ECONOMY   N/SMOKING
Notes:
This visual uses an instance example for the tuple types considered on the previous
visuals. The tuple types are represented as tables to illustrate some sample tuples for
them.
The top portion of the visual illustrates tuple type AIRCRAFT before normalization. For both
tuples shown, composite attribute Seat, and, thus, its components Seat Number, Seat
Location, Seat Class, and Section, assume many values. The component values in a line
belong together. They form the components of the appropriate value for the composite
attribute. (As you may correctly conclude from the visual, the components of a composite
attribute become separate columns in the tables of the relational database management
system.)
The bottom half of the visual illustrates the situation after normalization. Tuple type
AIRCRAFT no longer contains any seat information. The seat information is contained in
tuple type SEAT. For each seat on an aircraft, SEAT contains one tuple. Aircraft Number
identifies to which aircraft the seat belongs.
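The same transformation can be shown on instance data. The following Python sketch uses a few of the sample values from the visual; the nested list representation of the repeating group and the function name normalize_aircraft are assumptions made for this illustration.

```python
aircraft_before = [
    {
        "Aircraft Number": "B474001323",
        "Date Manufactured": "1994-10-12",
        "Date in Service": "1997-01-01",
        "Seat": [
            {"Seat Number": "1A", "Seat Location": "WINDOW",
             "Seat Class": "FIRST", "Section": "N/SMOKING"},
            {"Seat Number": "1B", "Seat Location": "MIDDLE",
             "Seat Class": "FIRST", "Section": "N/SMOKING"},
        ],
    },
    {
        "Aircraft Number": "B171004217",
        "Date Manufactured": "1999-10-23",
        "Date in Service": "1999-11-15",
        "Seat": [
            {"Seat Number": "1A", "Seat Location": "WINDOW",
             "Seat Class": "BUSINESS", "Section": "N/SMOKING"},
        ],
    },
]


def normalize_aircraft(tuples):
    """Split tuples with a repeating Seat group into AIRCRAFT and SEAT tuples."""
    aircraft_rows, seat_rows = [], []
    for t in tuples:
        # AIRCRAFT keeps everything except the seat information.
        aircraft_rows.append({k: v for k, v in t.items() if k != "Seat"})
        # SEAT gets one tuple per seat, tagged with the owning aircraft.
        for seat in t.get("Seat", []):
            seat_rows.append({"Aircraft Number": t["Aircraft Number"], **seat})
    return aircraft_rows, seat_rows


aircraft_rows, seat_rows = normalize_aircraft(aircraft_before)
for row in seat_rows:
    print(row)
```

Seat number 1A appears for both aircraft, so only the combination of Aircraft Number and Seat Number identifies a SEAT tuple.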
Notes:
The fact that a new tuple type has been created to achieve First Normal Form should be
reflected in the entity-relationship model. In case of our example, this means that a
dependent entity type SEAT for entity type AIRCRAFT must be introduced together with the
associated owning relationship type AIRCRAFT_has_SEAT. The entity type is indeed a
dependent entity type:
• The key of entity type AIRCRAFT is part of the entity key for SEAT.
• Instances with matching key/key portion values, and only those, are interconnected.
The target cardinality for the owning relationship type is m (0..m) because the cardinality for
composite attribute Seat was [0..*] in the original tuple type. This means that there are
aircraft without seats (cargo planes). If necessary, go back to the application domain expert
to verify the cardinality.
The problem statement for the application domain should be updated as well (by the
application domain expert).
First Normal Form - 2nd Example (1 of 2)
AIRCRAFT (with repeating group)
   Aircraft Number, PK
   Date Manufactured
   Date in Service [0..1]
   Engine [0..4]   (Repeating Group)
      Engine Number
      Engine Type
      Manufacturer
      Engine Position
   ...

AIRCRAFT (repeating group replaced)
   Aircraft Number, PK
   Date Manufactured
   Date in Service [0..1]
   Engine 1 [0..1]
      Engine Number
      Engine Type
      Manufacturer
      Engine Position
   Engine 2 [0..1]
      Engine Number
      Engine Type
      Manufacturer
      Engine Position
   Engine 3 [0..1]
      Engine Number
      Engine Type
      Manufacturer
      Engine Position
   Engine 4 [0..1]
      Engine Number
      Engine Type
      Manufacturer
      Engine Position
   ...

Are you really sure that this is the solution???
• Can you control that there will never be more than four engines?
• What about engines not mounted on aircraft?
Go back to the application domain expert and ...
Figure 6-20. First Normal Form - 2nd Example (1 of 2) CF182.0
Notes:
For the Seat example considered so far, you would have had another, but not attractive,
solution: You could have introduced a separate tuple in tuple type AIRCRAFT for each value
of composite attribute Seat by repeating the corresponding values for the other attributes.
However, in this way, you would have created a lot of redundancy endangering the
consistency of the tuples through update operations not changing all related tuples. Thus,
this is not really a solution to be considered.
This visual discusses another possible solution for repeating groups with a low fixed
maximum cardinality. Look at the example on the visual. Tuple type AIRCRAFT has
another repeating group, namely, the engines belonging to the aircraft. In this repeating
group, Manufacturer is again a composite attribute. Its components have not been listed
since they are not relevant for the present discussion.
In contrast to the previous example, composite attribute Engine has a low fixed maximum
cardinality. Its maximum cardinality is four.
To abolish the repeating group, you could replace Engine by four composite attributes
Engine 1, Engine 2, Engine 3, and Engine 4. All of these would have the same components
as Engine, but a cardinality of [0..1].
Formally, the violation of the First Normal Form has vanished. However, you should ask
yourself if that is really the solution that you want because it has serious limitations and
drawbacks:
• Are you really sure that the maximum cardinality will not increase over time? Is it really
under your control that the maximum cardinality will not increase or can somebody else
just change the rules on you? If the maximum cardinality increases, you need additional
attributes reflecting the cardinality increase. This will cause changes in your queries and,
especially, your programs because they will handle the various engines individually.
In contrast, if you have a new tuple type with one tuple for each engine of an aircraft, you
can use loop processing. If the proper end-of-data conditions are tested, processing can
be independent of the number of engines mounted and the maximum number of
engines for an aircraft.
• Another question to consider for this solution (as well as for the original tuple type) is:
What happens with engines not mounted on an aircraft? Do you not keep the referenced
information for them as well? As the entity-relationship model and the tuple type for
Come Aboard stand right now, you would not know where to keep information about
engines not mounted.
The case on the visual reveals a problem with the conceptual view of your database
design, especially, with the entity-relationship model. You should go back to the application
domain expert and ask whether the engine information must be kept for engines that are not
mounted. If so, you should solve the violation of the First Normal Form by first correcting
your entity-relationship model and then changing your tuple types accordingly. This is
illustrated on the next visual.
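The difference between the two layouts for program logic can be illustrated with a short Python sketch. The engine numbers and positions below are made-up sample values, and both function names are hypothetical.

```python
def mounted_engines_wide(aircraft, max_engines=4):
    """Wide layout: one composite attribute 'Engine 1' .. 'Engine n' per slot."""
    engines = []
    # The number of slots is baked into the tuple type and into this loop bound.
    for slot in range(1, max_engines + 1):
        engine = aircraft.get(f"Engine {slot}")
        if engine is not None:
            engines.append(engine)
    return engines


def mounted_engines_rows(aircraft_number, engine_rows):
    """Separate tuple type: one tuple per mounted engine, any number of them."""
    return [row for row in engine_rows if row["Aircraft Number"] == aircraft_number]


# Hypothetical sample data for one aircraft with two mounted engines.
wide = {
    "Aircraft Number": "B474001323",
    "Engine 1": {"Engine Number": "E1001", "Engine Position": 1},
    "Engine 2": {"Engine Number": "E1002", "Engine Position": 2},
    "Engine 3": None,
    "Engine 4": None,
}
rows = [
    {"Aircraft Number": "B474001323", "Engine Number": "E1001", "Engine Position": 1},
    {"Aircraft Number": "B474001323", "Engine Number": "E1002", "Engine Position": 2},
]

print(len(mounted_engines_wide(wide)))
print(len(mounted_engines_rows("B474001323", rows)))
```

If the maximum cardinality ever grew beyond four, the wide layout would need new attributes and a changed loop bound, whereas the row-based function would work unchanged.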
First Normal Form - 2nd Example (2 of 2)
... get your ER model in order!!!
[Visual labels: dependent entity type ENGINE LOCATION with key (K) Engine Number and attribute Engine Position; cardinality 1..1; marker DC]
Notes:
In case of our example, the application domain expert has confirmed that information about
engines is also required for engines not mounted on aircraft. Consequently, the engines
represent an independent conceptual unit, a class of objects with the same meaning and
characteristics. Therefore, they must be represented by an entity type in the
entity-relationship model. Accordingly, the entity-relationship model for Come Aboard is
incomplete. It should be corrected before the tuple types are corrected:
• An entity type ENGINE is introduced containing elementary attributes Engine Number
and Engine Type and composite attribute Manufacturer.
Since the serial numbers for engines are unique across engine manufacturers, Engine
Number becomes the entity key for ENGINE.
• In addition to the entity type, a relationship type ENGINE_on_AIRCRAFT must be
introduced specifying which engines are mounted on the individual aircraft.
• You may wonder why attribute Engine Position has not been added to entity type
ENGINE. Engine Position specifies in which position the appropriate engine is mounted
on an aircraft. The engine position is not a characteristic of the engine as such, but
rather a characteristic of the relationship linking the engine to an aircraft. Accordingly,
Engine Position is a nondefining attribute of relationship type ENGINE_on_AIRCRAFT.
As described in Unit 4 - Entity-Relationship Model, dependent entity types are used to
model the nondefining attributes of relationship types. Therefore, dependent entity type
ENGINE LOCATION is introduced containing attribute Engine Position. Its parent is
relationship type ENGINE_on_AIRCRAFT and its owning relationship type is
ENGINE_on_AIRCRAFT_in_ENGINE POSITION.
The target cardinality of the owning relationship type is 1..1 because each mounted
engine must be in one and only one position of the aircraft. The entity key of ENGINE
LOCATION is Engine Number, the relationship key of ENGINE_on_AIRCRAFT. The
cascading property for the target of the owning relationship type expresses the fact that
the engine position is to be deleted when the engine is taken off the aircraft.
After we have corrected the entity-relationship model, we can establish the corresponding
tuple types:
• We need tuple types for the three entity types, i.e., for AIRCRAFT, ENGINE, and
ENGINE LOCATION. The tuple type for AIRCRAFT no longer contains engine
information. The tuple type for ENGINE contains only the really engine-specific
information. Tuple type ENGINE LOCATION describes, for mounted engines, on which
engine position they are mounted. It does not specify on which aircraft the engine is
mounted.
• We need a tuple type for relationship type ENGINE_on_AIRCRAFT. The tuple type
contains the engine number of the mounted engine and the aircraft number of the
aircraft on which the engine is mounted.
• Since ENGINE_on_AIRCRAFT_in_ENGINE POSITION is an owning relationship type,
we must not have a tuple type for it.
Tuple types ENGINE LOCATION and ENGINE_on_AIRCRAFT have the same primary key.
Since every tuple of ENGINE LOCATION has a corresponding tuple with the same primary
key value in ENGINE_on_AIRCRAFT and vice versa, the two tuple types can be combined.
The resulting tuple type is again called ENGINE_on_AIRCRAFT. We will not further discuss
here when tuple types can be combined. We will leave this to the next unit.
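As a preview only, the combination of two tuple types with the same primary key can be sketched as follows. The function name combine and the sample engine numbers and positions are hypothetical. The sketch refuses to combine unless every tuple on one side has exactly one counterpart on the other, which is the condition stated above.

```python
def combine(left_rows, right_rows, key):
    """Merge two tuple types whose tuples correspond one-to-one on `key`."""
    left_keys = [row[key] for row in left_rows]
    right_keys = [row[key] for row in right_rows]
    # Keys must be unique and must match exactly in both tuple types.
    if len(set(left_keys)) != len(left_keys) or sorted(left_keys) != sorted(right_keys):
        raise ValueError("tuples do not correspond one-to-one")
    right_by_key = {row[key]: row for row in right_rows}
    return [{**row, **right_by_key[row[key]]} for row in left_rows]


# Hypothetical sample tuples.
engine_on_aircraft = [
    {"Engine Number": "E1001", "Aircraft Number": "B474001323"},
    {"Engine Number": "E1002", "Aircraft Number": "B474001323"},
]
engine_location = [
    {"Engine Number": "E1001", "Engine Position": 1},
    {"Engine Number": "E1002", "Engine Position": 2},
]

combined = combine(engine_on_aircraft, engine_location, "Engine Number")
for row in combined:
    print(row)
```

Each combined tuple carries Engine Number, Aircraft Number, and Engine Position, which corresponds to the resulting tuple type ENGINE_on_AIRCRAFT.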
Second Normal Form - Definition
A tuple type is in the Second Normal Form if:
• It is in the First Normal Form
• All its elementary nonkey attributes are functionally dependent on the entire primary key

FLIGHT (In 2nd Normal Form)
   Flight Number, PK
   Airport Code AS From, PK
   Airport Code AS To, PK
   Flight Locator, PK
   Departure AS Planned Departure
      Departure Date
      Departure Time
   Arrival AS Planned Arrival
      Arrival Date
      Arrival Time
   Departure AS Actual Departure [0..1]
      Departure Date
      Departure Time
   Arrival AS Actual Arrival [0..1]
      Arrival Date
      Arrival Time

LEG (Not in 2nd Normal Form)
   Flight Number, PK
   Airport Code AS From, PK
   Airport Code AS To, PK
   Leg Number
   Mileage Credit   (only dependent on From and To)
   ...
Figure 6-22. Second Normal Form - Definition CF182.0
Notes:
Basically, the Second Normal Form deals with the improper assignment of attributes to
tuple types. It applies to tuple types whose primary keys consist of more than one
elementary attribute.
A tuple type is in the Second Normal Form if:
• It is in First Normal Form.
• All its elementary nonkey attributes are functionally dependent on the entire primary
key, i.e., on all attributes belonging to the primary key.
As mentioned before, if a composite attribute belongs to the primary key, all its
components belong to the primary key. Thus, the functional dependence must be on all
components of the composite attribute.
Similarly, all elementary components of a composite attribute must be functionally
dependent on the entire primary key for the Second Normal Form to be satisfied.
The primary key of tuple type FLIGHT for our sample airline company consists of four
attributes. All elementary nonkey attributes of the tuple type are functionally dependent on
the entire primary key, i.e., on all four attributes. Therefore, tuple type FLIGHT is in Second
Normal Form.
The primary key of tuple type LEG consists of the three attributes Flight Number, From and
To. From and To identify the airport of departure and the airport of arrival for the leg of the
considered flight. Leg Number depends on all attributes of the primary key because another
itinerary (flight number) may contain the same nonstop connection as a leg with a different number.
In contrast, attribute Mileage Credit, i.e., the miles credited for the leg on frequent-flyer
accounts, does not depend on Flight Number. It only depends on the airport of
departure and the airport of arrival, i.e., on From and To. Thus, the tuple type violates the
Second Normal Form.
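The dependence on only part of the primary key can be tried out on sample tuples. The following Python sketch is an illustration only: the function name determines and all flight numbers, airport codes, and mileages are invented and are not part of the Come Aboard model. Note that sample tuples can only refute a functional dependence; whether it really holds must be confirmed by the application domain expert.

```python
def determines(rows, lhs, rhs):
    """Return True if equal values of the lhs attributes always come with
    the same value of the rhs attribute in the given tuples."""
    seen = {}
    for row in rows:
        key = tuple(row[name] for name in lhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True

# Invented LEG tuples (all values are hypothetical).
LEG = [
    {"Flight Number": "CA100", "From": "AAA", "To": "BBB",
     "Leg Number": 1, "Mileage Credit": 500},
    {"Flight Number": "CA100", "From": "BBB", "To": "CCC",
     "Leg Number": 2, "Mileage Credit": 700},
    {"Flight Number": "CA200", "From": "BBB", "To": "CCC",
     "Leg Number": 1, "Mileage Credit": 700},
]

# Mileage Credit is already determined by From and To (part of the key).
mileage_on_part_of_key = determines(LEG, ["From", "To"], "Mileage Credit")
# Leg Number is not: the same connection is leg 2 of CA100 and leg 1 of CA200.
leg_number_on_part_of_key = determines(LEG, ["From", "To"], "Leg Number")
```

In this sample, mileage_on_part_of_key is true and leg_number_on_part_of_key is false, which mirrors the reasoning above.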
Second Normal Form - Solution
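[Visual: entity-relationship diagram for the solution, with the m:m relationship type between
AIRPORT (From) and AIRPORT (To). Tuple types for AIRPORT and ITINERARY unchanged.]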
Notes:
As mentioned before, the Second Normal Form deals with attributes assigned to the wrong
tuple type. Attribute Mileage Credit in our example should not have been assigned to tuple
type Leg.
To determine the proper tuple type, you should consult the entity-relationship model for the
application domain. There are two possibilities:
• The entity-relationship model contains the entity type to which the improperly assigned
attribute really belongs. In this case, add the attribute to the tuple type for the entity type.
• The entity-relationship model is incomplete since it does not contain the proper entity
type for the attribute. In this case, correct the entity-relationship model and reestablish
the tuple types concerned based on the corrected entity-relationship model.
In case of our example, the entity-relationship model is missing the proper entity type for
attribute Mileage Credit. As a matter of fact, Mileage Credit is rather a nondefining attribute
for nonstop connections, i.e., for relationship type AIRPORT_nonstop_to_AIRPORT. Thus,
it is modeled as a dependent entity type for that relationship type as illustrated on the
visual. The dependent entity type is called NONSTOP CONNECTION.
The cardinality of 1..1 for the target of the owning relationship type requires the mileage
credit to be provided when the nonstop connection is established.
Having introduced dependent entity type NONSTOP CONNECTION, the relationship type
specifying the nonstop connections for the various itineraries can now interconnect entity
types NONSTOP CONNECTION and ITINERARY. It need no longer interconnect
relationship type AIRPORT_nonstop_to_AIRPORT and entity type ITINERARY. The new
relationship type is called NONSTOP CONNECTION_in_ITINERARY. As a consequence,
dependent entity type LEG must now be based on this relationship type.
Of course, the problem statement for the application domain and the data inventory should
be updated accordingly by the application domain expert and the database designer.
After we have corrected the entity-relationship model, we can reestablish the tuple types
for the entity types and relationship types concerned:
• The tuple types for entity types AIRPORT and ITINERARY remain unchanged.
• The new tuple type NONSTOP CONNECTION contains attribute Mileage Credit and the
key of the dependent entity type, i.e., the attributes From and To.
• The tuple type for entity type LEG no longer contains attribute Mileage Credit.
• Tuple types must not be provided for any of the relationship types on the visual for the
following reasons:
- Relationship types AIRPORT_nonstop_to_AIRPORT and
NONSTOP CONNECTION_in_ITINERARY are m:m relationship types being the
source of other relationship types with a minimum target cardinality of 1 (see
page 6-14).
- Relationship types AIRPORT_nonstop_to_AIRPORT_in_NONSTOP CONNECTION
and NONSTOP CONNECTION_in_ITINERARY_as_LEG are owning relationship
types (see page 6-13).
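At the instance level, the effect of the correction can be sketched as two projections of the old LEG tuples. The Python fragment below is only an illustration under assumptions: the helper project and all attribute values are invented, and the attribute names follow the visuals.

```python
def project(rows, attributes):
    """Project tuples onto the given attributes and drop duplicates,
    keeping the order of first occurrence."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[name] for name in attributes)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result

# Old LEG tuples that still carry Mileage Credit (hypothetical values).
OLD_LEG = [
    {"Flight Number": "CA100", "From": "AAA", "To": "BBB",
     "Leg Number": 1, "Mileage Credit": 500},
    {"Flight Number": "CA100", "From": "BBB", "To": "CCC",
     "Leg Number": 2, "Mileage Credit": 700},
    {"Flight Number": "CA200", "From": "BBB", "To": "CCC",
     "Leg Number": 1, "Mileage Credit": 700},
]

# New tuple type: one tuple per nonstop connection, keyed by From and To.
NONSTOP_CONNECTION = project(OLD_LEG, ["From", "To", "Mileage Credit"])
# LEG keeps its key and Leg Number, but no longer Mileage Credit.
LEG = project(OLD_LEG, ["Flight Number", "From", "To", "Leg Number"])
```

The mileage credit for the connection from BBB to CCC is now stored once instead of once per itinerary using that connection.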
During the establishment of the entity-relationship model for Come Aboard, we already
resolved another violation of the Second Normal Form. The attributes of entity type
AIRCRAFT TYPE originally belonged to entity type AIRCRAFT MODEL which represented
a violation of the Second Normal Form.
Third Normal Form - Definition
A tuple type is in the Third Normal Form if:
It is in the Second Normal Form
None of its elementary nonkey attributes is
functionally dependent on other nonkey attributes
AIRCRAFT MODEL (in 3rd Normal Form)
  Type Code, PK
  Model Number, PK
  Dimensions
    Length
    Height
    Wing Span
  Weights
    Net Weight
    Maximum Weight
  Cruising Speed
AIRCRAFT TYPE (not in 3rd Normal Form)
  Type Code, PK
  Category
  Manufacturer
    Manufacturer Code
    Company Name (functionally dependent on Manufacturer Code)
    Address (functionally dependent on Manufacturer Code)
      Street [0..1]
      Post Office Box [0..1]
      City
      State [0..1]
      Country
      Postal Code [0..1]
    Phone Number (functionally dependent on Manufacturer Code)
  Number of Engines
Figure 6-24. Third Normal Form - Definition CF182.0
Notes:
The Third Normal Form requires that a tuple type is in Second Normal Form and none of its
elementary nonkey attributes is functionally dependent on other nonkey attributes.
If attribute-1 and attribute-2 are attributes of a tuple type, attribute-2 is functionally
dependent on attribute-1 if, for each occurrence of a value of attribute-1, attribute-2
assumes the same value. For different values of attribute-1, attribute-2 may assume
different values. However, for the same value of attribute-1, it must always assume the
same value. Functional dependence may not just exist on a single elementary attribute; it
can also exist on a composite attribute, meaning dependence on all components, or on a
set of attributes.
For the Third Normal Form, functional independence is not only required for the direct
elementary attributes of the tuple type, but for all components of composite attributes. This
means, it is required for all elementary attributes of the tree structure for the tuple type.
Furthermore, there must not be a functional dependence on components of composite
attributes.
Tuple type AIRCRAFT MODEL on the visual is in Third Normal Form because none of its
elementary nonkey attributes is dependent on other nonkey attributes. The dimensions,
weights, and the cruising speed are all functionally independent of each other. For the
same dimensions, different weights and cruising speeds may apply and vice versa.
In tuple type AIRCRAFT TYPE, elementary attributes Company Name, Phone Number,
and all components of composite attribute Address are functionally dependent on attribute
Manufacturer Code. Thus, tuple type AIRCRAFT TYPE is not in Third Normal Form.
Violations of the Third Normal Form can lead to inconsistent tuples as a consequence of
update operations changing only some of the tuples with the same dependent values. They
may also lead to the loss of the dependent information if the last tuple for a value is deleted.
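Both effects can be made visible with a few sample tuples. The following Python sketch is an illustration only: the function name is invented, only three attributes are shown, and the changed company name is a made-up value.

```python
from collections import defaultdict

def names_per_manufacturer(rows):
    """Collect, for each Manufacturer Code, the set of Company Names stored."""
    names = defaultdict(set)
    for row in rows:
        names[row["Manufacturer Code"]].add(row["Company Name"])
    return dict(names)

# Unnormalized AIRCRAFT TYPE tuples (reduced to three attributes).
AIRCRAFT_TYPE = [
    {"Type Code": "B747", "Manufacturer Code": "BOEING",
     "Company Name": "BOEING CORPORATION"},
    {"Type Code": "B737", "Manufacturer Code": "BOEING",
     "Company Name": "BOEING CORPORATION"},
    {"Type Code": "A310", "Manufacturer Code": "AIRBUS",
     "Company Name": "AIRBUS INDUSTRIES"},
]

before = names_per_manufacturer(AIRCRAFT_TYPE)

# Update changing only one of the two BOEING tuples (hypothetical new name):
AIRCRAFT_TYPE[0]["Company Name"] = "NEW COMPANY NAME"
after_update = names_per_manufacturer(AIRCRAFT_TYPE)

# Deleting the last tuple for AIRBUS also loses the manufacturer information:
remaining = [row for row in AIRCRAFT_TYPE if row["Type Code"] != "A310"]
after_delete = names_per_manufacturer(remaining)
```

After the partial update, two different company names are stored for BOEING. After the deletion, nothing is known about AIRBUS anymore.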
If the data groups for the composite attributes of a tuple type were established properly, all
related functional dependences should be within the same composite attribute. For our
example, they are all in composite attribute Manufacturer. Thus, the usage of properly
created composite attributes can ease your task of determining functional dependences. If
you have not formed data groups/composite attributes or have not established them
correctly, functional dependences may exist across composite attributes.
Third Normal Form - Solution
AIRCRAFT TYPE
  Type Code, PK
  Category
  Manufacturer Code
  Number of Engines
MANUFACTURER
  Manufacturer Code, PK
  Company Name
  Address
    Street [0..1]
    Post Office Box [0..1]
    City
    State [0..1]
    Country
    Postal Code [0..1]
  Phone Number
Notes:
To solve a violation of the Third Normal Form, you must move all attributes being
functionally dependent on the same set of attributes to a new tuple type. The attributes the
moved attributes were dependent on are repeated in the new tuple type. They become the
primary key of the new tuple type.
In case of our example, the attributes Company Name, Phone Number and all components
of Address are removed from tuple type AIRCRAFT TYPE. They become attributes of a
new tuple type MANUFACTURER. Attribute Manufacturer Code remains in tuple type
AIRCRAFT TYPE, but is repeated in MANUFACTURER. It becomes the primary key of
MANUFACTURER. In this way, the association between aircraft types and manufacturers is
maintained.
If the composite attributes for a tuple type have been formed correctly, the resolution of a
Third Normal Form violation incorporates the following:
• A new tuple type is created for the entire composite attribute having the functional
dependences.
• The primary key of the new tuple type is repeated (remains) in the original tuple type.
For our sample tuple type, the composite attributes have been formed correctly.
Accordingly, a new tuple type MANUFACTURER has been created for composite attribute
Manufacturer and the primary key of that tuple type is repeated in the original tuple type.
Third Normal Form - Instance Example
BEFORE
AIRCRAFT TYPE (Manufacturer Code through Country belong to composite attribute Manufacturer)
Type Code  Manufacturer Code  Company Name        City      State  Country  Number of Engines
B747       BOEING             BOEING CORPORATION  SEATTLE   WA     USA      4
A310       AIRBUS             AIRBUS INDUSTRIES   TOULOUSE         FRANCE   2
A340       AIRBUS             AIRBUS INDUSTRIES   TOULOUSE         FRANCE   4
B737       BOEING             BOEING CORPORATION  SEATTLE   WA     USA      2
A319       AIRBUS             AIRBUS INDUSTRIES   TOULOUSE         FRANCE   2
B777       BOEING             BOEING CORPORATION  SEATTLE   WA     USA      2
AFTER
AIRCRAFT TYPE
Type Code  Manufacturer Code  Number of Engines
B747       BOEING             4
A310       AIRBUS             2
A340       AIRBUS             4
B737       BOEING             2
A319       AIRBUS             2
B777       BOEING             2
Notes:
This visual gives an instance example for the tuple types of the previous visuals. However,
because of the limited size of the visual, some attributes have been omitted: Category,
Street, Post Office Box, Postal Code, and Phone Number are not shown. The tuple types
have been presented in form of tables to show multiple instances for them.
The top portion of the visual illustrates tuple type AIRCRAFT TYPE before normalization.
The information for a manufacturer must be repeated for each aircraft type that the
manufacturer produces. As you can envisage, this leads to inconsistent information if only some of the
tuples for a manufacturer are updated when the manufacturer information changes.
The bottom half of the visual illustrates the situation after normalization. Tuple type
AIRCRAFT TYPE now only contains the manufacturer code and no longer the information
functionally dependent on it. The information for a manufacturer is contained in tuple type
MANUFACTURER. Tuple type MANUFACTURER contains one tuple for every
manufacturer.
The new tuple type allows Come Aboard to store information about manufacturers without
having aircraft types from them. This was not possible before normalization.
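The situation after normalization can be tried out with any relational system. The sketch below uses SQLite through Python's sqlite3 module purely for illustration; it is not the target database management system of this course. The table and column names, the foreign key clause, and the third manufacturer row are assumptions made for the example. The remaining values are taken from the visual.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE MANUFACTURER (
    MANUFACTURER_CODE TEXT PRIMARY KEY,
    COMPANY_NAME      TEXT NOT NULL,
    CITY              TEXT NOT NULL,
    STATE             TEXT,             -- optional, cardinality [0..1]
    COUNTRY           TEXT NOT NULL
);
CREATE TABLE AIRCRAFT_TYPE (
    TYPE_CODE         TEXT PRIMARY KEY,
    MANUFACTURER_CODE TEXT NOT NULL
                      REFERENCES MANUFACTURER (MANUFACTURER_CODE),
    NUMBER_OF_ENGINES INTEGER NOT NULL
);
""")
conn.executemany("INSERT INTO MANUFACTURER VALUES (?, ?, ?, ?, ?)", [
    ("BOEING", "BOEING CORPORATION", "SEATTLE", "WA", "USA"),
    ("AIRBUS", "AIRBUS INDUSTRIES", "TOULOUSE", None, "FRANCE"),
    # Hypothetical manufacturer without any aircraft type:
    ("MFR_X", "HYPOTHETICAL MANUFACTURER", "SOMEWHERE", None, "NOWHERE"),
])
conn.executemany("INSERT INTO AIRCRAFT_TYPE VALUES (?, ?, ?)", [
    ("B747", "BOEING", 4), ("A310", "AIRBUS", 2), ("A340", "AIRBUS", 4),
    ("B737", "BOEING", 2), ("A319", "AIRBUS", 2), ("B777", "BOEING", 2),
])

# A join reproduces the rows of the unnormalized tuple type.
joined = conn.execute("""
    SELECT T.TYPE_CODE, T.MANUFACTURER_CODE, M.COMPANY_NAME, M.CITY,
           M.STATE, M.COUNTRY, T.NUMBER_OF_ENGINES
    FROM AIRCRAFT_TYPE T
    JOIN MANUFACTURER M ON M.MANUFACTURER_CODE = T.MANUFACTURER_CODE
    ORDER BY T.TYPE_CODE
""").fetchall()
manufacturer_count = conn.execute(
    "SELECT COUNT(*) FROM MANUFACTURER").fetchone()[0]
```

The join returns the six rows of the BEFORE table, while the manufacturer information is stored only once per manufacturer. The third manufacturer can be stored although no aircraft type refers to it.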
Notes:
The fact that a new tuple type has been created to comply with the Third Normal Form
should be reflected in the entity-relationship model. The creation of a new tuple type really
means that the appropriate information has become an independent conceptual unit
representing a class of objects with the same meaning and characteristics. Thus, it means
that the entity-relationship model should contain a new entity type.
Since the new tuple type has an association with the old tuple type, the new entity type
must have a relationship type with the entity type (or relationship type) for the old tuple
type.
In case of our example, the entity-relationship model must be extended by a new entity
type (MANUFACTURER) and a new relationship type between entity types AIRCRAFT
TYPE and MANUFACTURER. The relationship type is called AIRCRAFT
TYPE_from_MANUFACTURER. The relationship type is a 1:m relationship type: An
aircraft type can be from one and only one manufacturer, but a manufacturer may
manufacture multiple aircraft types. Accordingly, the key of the relationship type is Type
Code, the key of entity type AIRCRAFT TYPE.
The source cardinality of m (0..m) indicates that Come Aboard wants to keep information
about manufacturers even if it does not own one of their aircraft types. However, you must
verify this with the application domain expert. The unnormalized tuple type for AIRCRAFT
TYPE would not have allowed you to store information about manufacturers without an
aircraft type. This was a further reason for resolving the violation of the Third Normal Form.
As a matter of principle, you should correct the entity-relationship model first and then
reestablish the tuple types based on the corrected entity-relationship model. When
reestablishing the tuple types based on the corrected entity-relationship model, you get
tuple types for entity types AIRCRAFT TYPE and MANUFACTURER and for relationship
type AIRCRAFT TYPE_from_MANUFACTURER.
The tuple type for AIRCRAFT TYPE does not contain attribute Manufacturer Code! The
interrelationship between aircraft types and manufacturers is rather expressed by tuple
type AIRCRAFT TYPE_from_MANUFACTURER.
The fact that we get three tuple types seems to be conflicting with the solution developed
before. It is not. Tuple types AIRCRAFT TYPE and
AIRCRAFT TYPE_from_MANUFACTURER have the same primary key. For each tuple in
AIRCRAFT TYPE, AIRCRAFT TYPE_from_MANUFACTURER has a corresponding
tuple, and vice versa. Therefore, the two tuple types can be combined as will be discussed
further in the subsequent unit.
ENGINE (before normalization)
  Engine Number, PK
  Engine Type
  Manufacturer
    Manufacturer Code
    Company Name
    Address
      Street [0..1]
      Post Office Box [0..1]
      City
      State [0..1]
      Country
      Postal Code [0..1]
    Phone Number
After conversion to 3NF:
ENGINE
  Engine Number, PK
  Engine Type
  Manufacturer Code
MANUFACTURER
  Manufacturer Code, PK
  Company Name
  Address
    Street [0..1]
    Post Office Box [0..1]
    City
    State [0..1]
    Country
    Postal Code [0..1]
  Phone Number
Notes:
This visual illustrates another violation of the Third Normal Form for our sample airline
company. Tuple type ENGINE contains the same composite attribute Manufacturer as
tuple type AIRCRAFT TYPE before normalization. Thus, it violates the Third Normal Form
as well.
The resolution of the violation is the same as for AIRCRAFT TYPE. The composite attribute
forms its own tuple type (MANUFACTURER). Except for Manufacturer Code, the attributes
of composite attribute Manufacturer are removed from tuple type ENGINE.
This raises the question whether tuple type MANUFACTURER is the same as the one created
for AIRCRAFT TYPE. Both tuple types have the same attributes. As usual, the question must
be answered by the application domain expert. From the real world, we know that some
manufacturers produce both aircraft and engines whereas others only manufacture engines
or aircraft. Thus, how should we solve the problem? The alternatives are discussed on the
next two visuals.
3rd NF in Multiple Tuple Types (Alternative 1)
Notes:
This visual illustrates a possible solution for the problem raised on the previous visual. The
solution is discussed for the entity-relationship model changes required. The tuple types
then follow automatically. Since the attributes for engine and aircraft manufacturers are the
same, you can use the same entity type MANUFACTURER (and, thus, tuple type) to store
information about both. For each manufacturer, the entity type contains one entity instance.
To complete the entity-relationship model, you need relationship types
AIRCRAFT TYPE_from_MANUFACTURER and ENGINE_from_MANUFACTURER
expressing the interrelationships between aircraft types and manufacturers and engines
and manufacturers.
However, the solution has one problem: It is possible to establish relationships between
aircraft types and manufacturers just producing engines and between engines and
manufacturers only manufacturing aircraft.
This problem is generally considered a data-entry problem. Your data would also be wrong
if you specified the wrong aircraft manufacturer for an aircraft type or the wrong engine
manufacturer for an engine. Therefore, most application domain experts and database
designers will just go with this solution without further constraints. To solve the problem
completely, you can:
• introduce an additional attribute Manufacturer Type specifying the type of manufacturer
(engine manufacturer, aircraft manufacturer, or both engine and aircraft manufacturer)
• define constraints for relationship types AIRCRAFT TYPE_from_MANUFACTURER and
ENGINE_from_MANUFACTURER restricting the instances of the relationship types
based on the values of attribute Manufacturer Type of entity type MANUFACTURER.
The constraints for the relationship types prevent the improper assignment of
manufacturers.
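The constraint for the two relationship types can be sketched as a simple rule on the value of Manufacturer Type. The Python fragment below is only an illustration: the manufacturer codes, the encoding of Manufacturer Type as the strings AIRCRAFT, ENGINE, and BOTH, and the function name are assumptions for this example.

```python
# Hypothetical manufacturers with their Manufacturer Type.
MANUFACTURER_TYPE = {
    "MFR_A": "AIRCRAFT",   # produces aircraft only
    "MFR_E": "ENGINE",     # produces engines only
    "MFR_B": "BOTH",       # produces aircraft and engines
}

def may_be_assigned(manufacturer_code, product):
    """Return True if a relationship instance to the manufacturer is allowed.

    product is "AIRCRAFT" for AIRCRAFT TYPE_from_MANUFACTURER and
    "ENGINE" for ENGINE_from_MANUFACTURER.
    """
    manufacturer_type = MANUFACTURER_TYPE[manufacturer_code]
    return manufacturer_type == "BOTH" or manufacturer_type == product

aircraft_type_from_engine_maker = may_be_assigned("MFR_E", "AIRCRAFT")
engine_from_engine_maker = may_be_assigned("MFR_E", "ENGINE")
engine_from_both_maker = may_be_assigned("MFR_B", "ENGINE")
```

In a database, such a rule would typically be enforced by the constraint mechanisms of the target system rather than by application code.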
3rd NF in Multiple Tuple Types (Alternative 2)
[Visual: entity-relationship diagram. MANUFACTURER is the supertype of subtypes AIRCRAFT
MANUFACTURER and ENGINE MANUFACTURER. AIRCRAFT TYPE is related to AIRCRAFT MANUFACTURER,
and ENGINE to ENGINE MANUFACTURER, by _from_ relationship types. The diagram also shows
AIRCRAFT MODEL, AIRCRAFT, and ENGINE LOCATION.]
Notes:
This visual illustrates an alternate solution using supertypes and subtypes.
MANUFACTURER is made a supertype for subtypes AIRCRAFT MANUFACTURER and
ENGINE MANUFACTURER. MANUFACTURER contains instances for all manufacturers.
AIRCRAFT MANUFACTURER contains instances for manufacturers producing aircraft
(and possibly engines) and ENGINE MANUFACTURER instances for manufacturers
producing engines (and possibly aircraft).
In addition, relationship types are established between entity type AIRCRAFT TYPE and
subtype AIRCRAFT MANUFACTURER and entity type ENGINE and subtype ENGINE
MANUFACTURER.
Unless you want to store additional information for the different manufacturer types (a
possible by-product of the solution), subtypes AIRCRAFT MANUFACTURER and ENGINE
MANUFACTURER have a single attribute (Manufacturer Code). Therefore, if you do not
have additional information for the different manufacturer types, the solution would probably
be considered excessive.
Notes:
The Fourth Normal Form requires that:
• the tuple type is in the Third Normal Form and
• its attributes do not have multivalued dependencies on each other.
A multivalued dependency involves three attributes. If attribute-1, attribute-2, and
attribute-3 are attributes of the same tuple type, attribute-3 is said to be multivalued
dependent on attribute-1 by the way of attribute-2 if the following is true:
For each value of attribute-2 occurring with a specific, but arbitrary, value of attribute-1, the
tuple type must contain tuples with the same values for attribute-3.
To make this definition more understandable, let us assume a21, a22, and a23 are values
of attribute-2 occurring with value a11 for attribute-1 in tuples of the tuple type.
Furthermore, assume that the tuples for a11 and a21 have the following values for
attribute-3:
Notes:
The above tuple type has not been the result of the creation of the tuple types for our
sample airline company called Come Aboard. It has been created artificially to demonstrate
a violation of the Fourth Normal Form. It has been created by joining the tuple types for two
m:m relationship types.
The tuple type lists, for the various aircraft models, both the pilots that can fly them and the
mechanics that are trained for them, i.e., can maintain them.
Each tuple contains an aircraft model (type code and model number), a pilot employee
number, and a mechanic employee number. Composite attribute Aircraft Model has been
used for clarity reasons. It groups the two attributes Type Code and Model Number
uniquely identifying aircraft models. All cardinalities of the tuple type are implicitly defined
and, therefore, are [1..1]. Thus, the tuple type is in First Normal Form.
All attributes belong to the primary key for the tuple type since:
• An aircraft model can be flown by many pilots, and a pilot can fly many aircraft models.
• Many mechanics may be trained for an aircraft model, and a mechanic may be trained
for many aircraft models.
Consequently, the tuple type is in the Second Normal Form and even in the Third Normal
Form.
It is a further assumption for the tuple type that there are not any special interdependencies
between pilots and mechanics.
Notes:
This visual illustrates an instance example for the tuple type explained on the previous
visual. Attribute Mechanic Employee Number is multivalued dependent on composite
attribute Aircraft Model, i.e., on Type Code and Model Number, by the way of Pilot
Employee Number:
• Take a specific aircraft model, for example, the Boeing B747, Model 400.
• It occurs together with pilot employee numbers 0491337, 0844092, and 0003613.
• For the selected aircraft model and pilot employee number 0491337, the mechanic
employee numbers are 5219330 and 6027005.
• Since the mechanics trained for an aircraft model have nothing to do with the pilots, the
same mechanics need be listed for pilot numbers 0844092 and 0003613.
• Similar considerations apply to any other aircraft model selected. For the Airbus A310,
Model 300, tuples with the same mechanic employee numbers must exist for pilots
3721040 and 1662951.
Accordingly, the tuple type violates the Fourth Normal Form. Improper insertions or
deletions of tuples could result in inconsistent data by violating the multivalued
dependencies. For example, if the tuple for aircraft model B747, Model 400, pilot employee
number 0844092, and mechanic employee number 6027005 were deleted, the data would
be inconsistent.
By the way, multivalued dependencies always come in pairs. If attribute-3 is multivalued
dependent on attribute-1 by the way of attribute-2, then attribute-2 is multivalued dependent
on attribute-1 by the way of attribute-3. In case of our example, Pilot Employee Number is
multivalued dependent on Aircraft Model by the way of Mechanic Employee Number. The
proof is left to you.
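The multivalued dependency can be checked mechanically: for every aircraft model, each pilot occurring with the model must occur with each mechanic occurring with the model. The following Python sketch is an illustration only; the function name and the representation of tuples as (aircraft model, pilot, mechanic) triples are assumptions. The employee numbers are those used in the notes above.

```python
from itertools import product

def missing_tuples(rows):
    """Return the tuples that would have to exist so that every pilot of an
    aircraft model is combined with every mechanic of that model."""
    missing = set()
    for model in {m for m, _, _ in rows}:
        pilots = {p for m, p, _ in rows if m == model}
        mechanics = {k for m, _, k in rows if m == model}
        for pilot, mechanic in product(pilots, mechanics):
            if (model, pilot, mechanic) not in rows:
                missing.add((model, pilot, mechanic))
    return missing

B747_400 = ("B747", "400")
rows = {
    (B747_400, pilot, mechanic)
    for pilot in ("0491337", "0844092", "0003613")
    for mechanic in ("5219330", "6027005")
}

consistent = missing_tuples(rows)                 # empty set
# Deleting a single tuple violates the multivalued dependency:
deleted = (B747_400, "0844092", "6027005")
inconsistent = missing_tuples(rows - {deleted})   # contains the deleted tuple
```

The same function also confirms the paired dependency, since it treats pilots and mechanics symmetrically.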
MECHANIC_trained_for_AIRCRAFT MODEL
  Aircraft Model, PK
    Type Code
    Model Number
  Employee Number, PK
Instances:
Type Code  Model Number  Mechanic Employee Number
B747       400           5219330
B747       400           6027005
A310       300           4421026
A310       300           6027005
A310       300           1427254
Notes:
To solve Fourth Normal Form violations, the multivalued interdependencies between the
attributes must be unbundled by creating separate tuple types. One tuple type is created
for each relationship type. Accordingly, you get a tuple type for:
• the interdependency between aircraft models and pilots which is nothing else than
relationship type PILOT_can_fly_AIRCRAFT MODEL
• the interdependency between aircraft models and mechanics corresponding to
relationship type MECHANIC_trained_for_AIRCRAFT MODEL
Since each tuple type only contains a single employee number, AS clauses need not be
used. The purpose of the employee numbers is apparent from the meaning of the tuple
types.
The instances for the two tuple types are shown on the right-hand side of the visual.
If you have properly identified all relationship types in the entity-relationship model and
have not hidden them in entity types, you should not have violations of the Fourth Normal
Form. The above example was created by joining the tuple types for the two relationship
types.
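That the two separate tuple types carry the same information as the joined tuple type can be verified on the sample values. The Python sketch below is an illustration only; the variable names are assumptions, and the employee numbers are those mentioned in the notes and on the visual.

```python
B747_400 = ("B747", "400")
A310_300 = ("A310", "300")

# Joined tuple type: (aircraft model, pilot number, mechanic number).
joined = {
    (B747_400, pilot, mechanic)
    for pilot in ("0491337", "0844092", "0003613")
    for mechanic in ("5219330", "6027005")
} | {
    (A310_300, pilot, mechanic)
    for pilot in ("3721040", "1662951")
    for mechanic in ("4421026", "6027005", "1427254")
}

# One tuple type per relationship type (projections of the joined tuples).
pilot_can_fly = {(model, pilot) for model, pilot, _ in joined}
mechanic_trained_for = {(model, mechanic) for model, _, mechanic in joined}

# Joining the two tuple types on the aircraft model restores the original.
rejoined = {
    (model, pilot, mechanic)
    for model, pilot in pilot_can_fly
    for other_model, mechanic in mechanic_trained_for
    if model == other_model
}
```

Twelve joined tuples are replaced by five tuples in each of the two tuple types, and no tuple needs to be repeated to keep the data consistent.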
Checkpoint
7. For which relationship types must tuple types not be established?
_____________________________________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________
18. If a tuple type is in Third Normal Form, all its elementary nonkey
attributes are functionally dependent on the entire primary key.
(T/F)
21. Tuple types for relationship types can never violate any of the
Normal Forms. (T/F)
Unit Summary (1 of 2)
Notes:
Unit Summary (2 of 2)
Notes:
© Copyright IBM Corp. 2000, 2002 Unit 7. From Tuple Types to Tables 7-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook
Unit Objectives
After completion of this unit, you should be able to:
Notes:
Conceptually, the tuple types established so far could immediately be converted into tables
of the target database management system. However, this would result in more tables than
necessary making it harder and more expensive than necessary to retrieve and maintain
the data for the application domain. Therefore, it is desirable to combine multiple tuple
types into a single tuple type, and thus a single table, if possible and reasonable. We will
discuss in this unit when tuple types can be combined.
Furthermore, limitations of the target database management system may not allow you to
convert the tuple types one-to-one into tables. The limitations as well as performance
considerations may force you to split tuple types vertically or horizontally into multiple
smaller tuple types which then can be implemented as tables.
Performance considerations may induce you to reverse normalizations you performed and
to take care of the resulting problems in a different manner. You might also want to
denormalize tuple types that were separate and not created by normalizations.
After these steps, you can establish the tables for the application domain. This includes:
• Implementing the abstract data types for the attributes of the tuple types.
• Determining the data types (abstract or built-in) and column attributes for the columns of
the tables.
• Documenting the tables, their columns, and the related database objects for the
application domain.
[Visual: design overview showing the logical view and the storage view, with the elements
Logical Data Structures, Tuple Types, Integrity Rules, Tables, and Indexes.]
Notes:
This unit deals with the establishment of the tables for the target database management
system. The tables are the containers for the data of the application domain. Thus, we are
right in the heart of physical design, i.e., of the storage view.
Tables for Tuple Types
AIRCRAFT MODEL (tuple type)
  Type Code, PK
  Model Number, PK
  Dimensions
    Length
    Height
    Wing Span
  Weights
    Net Weight
    Maximum Weight
  Cruising Speed
AIRCRAFT_MODEL (table)
Figure 7-3. Tables for Tuple Types CF182.0
Notes:
Formally, the tuple types established so far can be translated into tables of the target
database management system as follows:
• For each tuple type, one table is created. The name for the table must follow the rules
for table names of the target database management system. There are length
restrictions for table names as well as restrictions on the characters they may include.
Unless you use delimited identifiers for table names, the table name may, for example,
not include blanks. However, table names may generally include underscores (_). Thus, it is a
good idea to replace blanks in the names of the tuple types by underscores. Delimited
identifiers have the disadvantage that you need to specify enclosing double-quotes for
all references in SQL statements.
• Each (direct or indirect) elementary attribute of the tuple type becomes a column of the
table for the tuple type.
At present, composite attributes cannot be reflected in tables, only their elementary
components, the elementary components of their composite components, and so on.
Thus, tables cannot reflect the structure imposed by the composite attributes.
The columns receive names in the target database management system. To column
names, the same restrictions apply as to table names. Furthermore, the names for the
columns of a table must be unique. Thus, if the same data group is used multiple times
(in different roles) by a tuple type, you must name the components of the corresponding
composite attributes differently. One way to achieve this is including the name of the
composite attribute (or part of it) in the column name. However, you must ensure that the
length restrictions for column names are adhered to.
• As for tuple types, a primary key is established for each table uniquely identifying the
rows of the table. The elementary attributes of the primary key for the tuple type become
the columns of the primary key for the table.
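The formal translation of the elementary attributes into column names can be sketched as follows. The Python fragment is an illustration only: representing a tuple type as a nested dictionary, converting names to uppercase, and the function names are assumptions for this example. The real naming rules depend on the target database management system.

```python
def to_identifier(name):
    """Replace blanks by underscores (uppercase is an assumed convention)."""
    return name.upper().replace(" ", "_")

def columns(tuple_type, qualify=False, prefix=""):
    """List the column names for all elementary attributes of a tuple type.

    Composite attributes are nested dictionaries. With qualify=True, the
    name of the composite attribute is included in the column name.
    """
    result = []
    for name, components in tuple_type.items():
        if isinstance(components, dict):
            inner = prefix + name + " " if qualify else prefix
            result.extend(columns(components, qualify, inner))
        else:
            result.append(to_identifier(prefix + name))
    return result

AIRCRAFT_MODEL = {
    "Type Code": None,
    "Model Number": None,
    "Dimensions": {"Length": None, "Height": None, "Wing Span": None},
    "Weights": {"Net Weight": None, "Maximum Weight": None},
    "Cruising Speed": None,
}
# The same data group used twice in different roles, as in tuple type FLIGHT.
DEPARTURES = {
    "Planned Departure": {"Departure Date": None, "Departure Time": None},
    "Actual Departure": {"Departure Date": None, "Departure Time": None},
}

model_columns = columns(AIRCRAFT_MODEL)
plain = columns(DEPARTURES)                      # contains duplicate names
qualified = columns(DEPARTURES, qualify=True)    # names are unique
```

The qualified names are unique but long, for example PLANNED_DEPARTURE_DEPARTURE_DATE, so you may have to shorten them to stay within the length restrictions.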
Conversion of Tuple Types into Tables
Data types for data elements associated with elementary attributes must
be implemented by means of:
Built-in data types for target DBMS or user defined distinct types
User defined functions, check constraints, and/or triggers
However . . .
Figure 7-4. Conversion of Tuple Types into Tables CF182.0
Notes:
The bullets in the gray box on this visual have already been described in the student notes
for the previous visual.
The elementary attributes for the tuple type are based on data elements in the data
inventory. In turn, the data elements are associated with data types. These data types must
be reflected in the target database management system. They can be implemented by
means of:
• Built-in data types for the target database management system or user defined distinct
types. Built-in data types are data types provided by the target database management
system. They are also referred to as standard data types. User defined distinct types are
data types that you can define yourself based on the built-in data types. Both built-in
data types and user defined distinct types will be discussed later in this unit.
• In addition, the implementation of the data types for the data elements may need user
defined functions, check constraints, and triggers. User defined functions allow you to
perform customized operations for your data. Check constraints allow you to introduce
value constraints for the columns of a table. Triggers allow you to perform selected
actions as the consequence of database insert, update, or delete operations.
All these items will be discussed later in this unit.
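To give a first impression of a check constraint and a trigger, the following sketch uses SQLite through Python's sqlite3 module. This is for illustration only; the syntax and capabilities of the target database management system may differ. The rule that an aircraft type has at least one engine, the log table, and all names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AIRCRAFT_TYPE (
    TYPE_CODE         TEXT PRIMARY KEY,
    -- Check constraint: value constraint for a column (assumed rule).
    NUMBER_OF_ENGINES INTEGER NOT NULL CHECK (NUMBER_OF_ENGINES >= 1)
);
CREATE TABLE AIRCRAFT_TYPE_LOG (
    TYPE_CODE TEXT,
    ACTION    TEXT
);
-- Trigger: action performed as a consequence of an insert operation.
CREATE TRIGGER AIRCRAFT_TYPE_INSERTED
AFTER INSERT ON AIRCRAFT_TYPE
BEGIN
    INSERT INTO AIRCRAFT_TYPE_LOG VALUES (NEW.TYPE_CODE, 'INSERT');
END;
""")

conn.execute("INSERT INTO AIRCRAFT_TYPE VALUES ('B747', 4)")
try:
    # Hypothetical row violating the check constraint.
    conn.execute("INSERT INTO AIRCRAFT_TYPE VALUES ('X000', 0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

log = conn.execute("SELECT TYPE_CODE, ACTION FROM AIRCRAFT_TYPE_LOG").fetchall()
```

The second insert is rejected by the check constraint, and the trigger has recorded only the successful insert.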
As we mentioned before, the tuple types can formally be translated into tables in the
manner described. However, you should further manipulate the tuple types before
converting them into tables. The subsequent visuals will discuss why you should do this
and what you should do.
Problems With One-to-One Conversion
Notes:
As described before, formally, the normalized tuple types could be converted into tables
one-to-one. However, this may result in more tables than required unnecessarily
complicating queries and programs by Join operations. In addition, the Join operations
result in performance degradations for queries and programs.
To avoid these problems, tuple types should be combined into a single tuple type where
possible and reasonable before converting them into tables.
Size limitations for the target database management system are a second problem
preventing the one-to-one conversion of tuple types into tables. Such limitations are upper
limits for the row size, the number of columns, or the table size. They may force you to split
a tuple type vertically or horizontally into multiple tuple types before creating the tables.
A third consideration is that the resulting tables may have columns with very different
usage characteristics and different importance for the application domain. Some of the
columns may never be used together. Some of them may only be used by unimportant and
not performance-critical business processes whereas the others are used by important,
performance-critical, processes. In this case, it may also make sense to split the tuple
types vertically before creating the tables to separate columns with different usage profiles.
Merging Partial Tuple Types
BEFORE

AIRCRAFT
Aircraft Number   Date Manufactured   Date in Service
B474001323        1994-10-12          1997-01-01
B373004518        1999-02-28          1999-03-15
B373004519        1999-03-31          1999-04-20
A103000534        1998-05-12          1998-07-21
A103003167        1997-08-01          1997-09-01
A402004217        1999-10-23          1999-11-15

AIRCRAFT MODEL_for_AIRCRAFT
Aircraft Number   Type Code   Model Number
B474001323        B747        400
B373004518        B737        300
B373004519        B737        300
A103000534        A310        300
A103003167        A310        300
A402004217        A340        200

Same primary key; one-to-one correspondence of primary key values
Notes:
Tuple types having the same primary key can be united in a single tuple type if they
contain, at all times, tuples with corresponding primary key values. This means that each
primary key value in one tuple type also occurs in the other tuple type and vice versa.
The two tuple types can be combined by adding the nonkey attributes of one tuple type to
the other tuple type. Note that it may be necessary to rename some of the added attributes.
It does not matter which tuple type is integrated in the other tuple type. In general, you will
integrate the tuple type with the smaller number of nonkey attributes in the other tuple type.
You may consider renaming the unified tuple type.
Since the original tuple types form parts of the larger, unified, tuple type, the unification is
referred to as merging of partial tuple types.
The example on the visual merges tuple types AIRCRAFT and
AIRCRAFT MODEL_for_AIRCRAFT, which are the tuple types for an entity type and a
relationship type, respectively. For both tuple types, Aircraft Number is the primary key.
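
The merge described above can be sketched in a few lines of Python. This is only an
illustration under stated assumptions: each tuple type is modeled as a dictionary that maps
a primary key value to its nonkey attributes, and the function name merge_partial and the
lowercase attribute names are hypothetical, not part of the course material. The data is
taken from the visual.

```python
# Sketch only: a tuple type is modeled as {primary key value: {nonkey attribute: value}}.
aircraft = {
    "B474001323": {"date_manufactured": "1994-10-12", "date_in_service": "1997-01-01"},
    "B373004518": {"date_manufactured": "1999-02-28", "date_in_service": "1999-03-15"},
    "A402004217": {"date_manufactured": "1999-10-23", "date_in_service": "1999-11-15"},
}
model_for_aircraft = {
    "B474001323": {"type_code": "B747", "model_number": "400"},
    "B373004518": {"type_code": "B737", "model_number": "300"},
    "A402004217": {"type_code": "A340", "model_number": "200"},
}


def merge_partial(t1, t2):
    """Merge two partial tuple types (hypothetical helper).

    Both tuple types must contain exactly the same primary key values.
    """
    if t1.keys() != t2.keys():
        raise ValueError("primary key values do not correspond one-to-one")
    merged = {}
    for key, attrs in t1.items():
        clash = attrs.keys() & t2[key].keys()
        if clash:
            # The notes point out that added attributes may need renaming.
            raise ValueError(f"rename these attributes first: {sorted(clash)}")
        merged[key] = {**attrs, **t2[key]}
    return merged


merged_aircraft = merge_partial(aircraft, model_for_aircraft)
```

The key-set comparison corresponds to the requirement that each primary key value of one
tuple type also occurs in the other and vice versa. If it fails, the tuple types are not
partial tuple types and the merge is refused.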
Finding Partial Tuple Types from ER Model
[Visual: an entity type or relationship type (Tuple Type 1) and a dependent entity type,
marked D (Tuple Type 2), with cardinality 1..1.]
Notes:
For the sample tuple types on the previous visual, we used the entity-relationship model to
determine if the tuple types could be combined. This raises the question of whether the
entity-relationship model can generally be used to determine the partial tuple types that can
be merged. Indeed, the entity-relationship model helps to determine them.
In the following cases, the tuple types for entity types or relationship types represent partial
tuple types and can be merged:
• One of the tuple types is for a dependent entity type with a cardinality of 1..1 (for the
owning relationship type). The other tuple type may be for an entity type or a relationship
type. In this case, the two tuple types can be combined, for example, by integrating the
tuple type for the dependent entity type into the other tuple type.
Because of cardinality 1..1 for the dependent entity type, both tuple types have the same
primary key: Being a dependent entity type means that its own entity key includes the
key of the parent. Because of maximum cardinality 1, the entity key of the dependent
entity type need not and must not contain additional attributes.
Being a dependent entity type also means that, for every entity instance, the parent
contains an instance with the corresponding key value. Conversely, the minimum
cardinality of 1 requires that the dependent entity type contains an instance for every
parent instance.
• One of the tuple types is for a relationship type with cardinality 1..1 for one end (e.g., the
target) and maximum cardinality m for the other end. In this case, the tuple types for the
relationship type and for the end with maximum cardinality m can be combined. For
example, the tuple type for the relationship type can be integrated into the tuple type for
the end with maximum cardinality m.
Because of the cardinalities, the key for the relationship type consists of the key of the
end with maximum cardinality m. Thus, the corresponding tuple types have the same
primary key.
Cardinality 1..1 enforces that, for every instance of the end with maximum cardinality m,
the relationship type contains one and only one instance with the same key values.
Since source and target must exist for relationship instances, the end with maximum
cardinality m must contain, for every relationship instance, an instance with the same
key value. Thus, the corresponding tuple types are partial tuple types and can be
combined. For example, the tuple type for the relationship type can be integrated into
the tuple type for the end with maximum cardinality m.
This constellation represents the one on the previous visual.
• One of the tuple types is for a relationship type with cardinality 1..1 for one end (e.g., the
target) and cardinality 0..1 for the other end. In addition, the key of the relationship type
has been chosen to be the key of the end with cardinality 0..1. In this case, the tuple
types for the relationship type and for the end with cardinality 0..1 are partial tuple types
and can be combined. For example, the tuple type for the relationship type can be
integrated into the tuple type for the end with cardinality 0..1.
Since the key of the end with cardinality 0..1 has been chosen as relationship key, the
primary keys of the two tuple types are the same. (Note that there was a choice for the
relationship key because both maximum cardinalities were 1.) For the same reasons as
for the previous case, the two tuple types must at all times have corresponding primary
key values.
• One of the tuple types is for a relationship type with cardinality 1..1 for both ends. In this
case, the tuple type for the relationship type can be combined with the tuple type for the
source or with the tuple type for the target. With which tuple type it can be combined,
depends on which of the keys has been made the relationship key: If the key of the
source has been selected, the tuple type for the relationship type and the tuple type for
the source can be combined. If the key of the target has been selected, the tuple type for
the relationship type and the tuple type for the target can be combined.
• As you can imagine, combinations of the above cases may lead to cascaded mergers of
tuple types.
Theoretically, it is possible that other tuple types can be combined as well. However, you
should only combine tuple types that can be combined directly or through several mergers.
Tuple types that cannot be combined by subsequent mergers have nothing to do with each
other. They lead to columns in tables that are never used together and, therefore, may
negatively impact performance.
It must be decided from case to case whether or not the combination of the tuple types
should be reflected in the entity-relationship model. If tuple types for relationship types are
involved, you do not want to reflect the merging of the tuple types in the entity-relationship
model. The entity-relationship model would no longer correctly describe the
interrelationships between entity types and relationship types.
Notes:
Tuple type T2 can be imbedded into tuple type T1 if:
1. Both tuple types have the same primary key.
2. The primary key values of T2 form, at all times, a subset of the primary key values of T1.
3. For each tuple of T2, at least one of the nonkey attributes has a value. It need not
necessarily be the same attribute for all tuples.
The resulting extended tuple type T1 contains all attributes it contained before and the
nonkey attributes of tuple type T2. Note that it may be necessary to rename some of the
added attributes.
Tuples of old tuple type T1 not having a counterpart in T2 do not have a value for any
attributes added to new tuple type T1. Tuples of old tuple type T1 with a counterpart in T2
have a value for at least one attribute added to new tuple type T1 (third condition).
After the elimination of T2, it is still possible to determine the original tuple types (and, thus,
entity types or relationship types) for the various tuples. Thus, their (original) identity has
been preserved and no information has been lost.
Since the tuples of T2 provide additional details for tuples of T1, tuple type T2 is referred
to as a detail tuple type.
In the example on the visual, tuple type ENGINE_on_AIRCRAFT is a detail tuple type for
tuple type ENGINE. It provides further detail information for engines, namely, where they
are mounted. ENGINE_on_AIRCRAFT was created in Unit 6 - Tuple Types as a
consequence of normalization.
Both tuple types have the same primary key Engine Number. Since not all engines are
mounted on aircraft, the primary key values of ENGINE_on_AIRCRAFT form a subset of
the primary key values of tuple type ENGINE. Attribute Aircraft Number of tuple type
ENGINE_on_AIRCRAFT always has a value so that the third condition for the imbedding of
tuple types is satisfied. Accordingly, ENGINE_on_AIRCRAFT is indeed a detail tuple type
of ENGINE and can be imbedded.
Resulting new tuple type ENGINE contains all attributes it had before plus the nonkey
attributes of ENGINE_on_AIRCRAFT. Tuples of old tuple type ENGINE that did not have a
counterpart in ENGINE_on_AIRCRAFT do not have values for the attributes added to tuple
type ENGINE. Tuples that had a counterpart in ENGINE_on_AIRCRAFT have values for
the added attributes.
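
The three conditions for imbedding, and the claim that no information is lost, can be
checked with a small Python sketch. It rests on assumptions: tuple types are modeled as
dictionaries from primary key value to nonkey attributes, the engine numbers and the
attribute engine_type are invented sample data, and the helper names imbed_detail and
recover_detail are hypothetical.

```python
# Sketch only: a tuple type is modeled as {primary key value: {nonkey attribute: value}}.
# Engine numbers and engine_type values are invented for illustration.
engine = {
    "E1001": {"engine_type": "TypeA"},
    "E1002": {"engine_type": "TypeA"},
    "E1003": {"engine_type": "TypeB"},
}
engine_on_aircraft = {
    "E1001": {"aircraft_number": "B474001323"},
    "E1003": {"aircraft_number": "A103000534"},
}


def imbed_detail(t1, t2):
    """Imbed detail tuple type t2 into t1 (hypothetical helper)."""
    # Condition 2: the primary key values of t2 form a subset of those of t1.
    if not t2.keys() <= t1.keys():
        raise ValueError("t2 contains primary key values missing from t1")
    added = set()
    for attrs in t2.values():
        # Condition 3: at least one nonkey attribute has a value in every tuple.
        if all(value is None for value in attrs.values()):
            raise ValueError("a tuple of t2 has no nonkey attribute value")
        added.update(attrs)
    result = {}
    for key, attrs in t1.items():
        detail = t2.get(key, {})
        # Tuples without a counterpart in t2 get no value (None) for added attributes.
        result[key] = {**attrs, **{name: detail.get(name) for name in sorted(added)}}
    return result


def recover_detail(t1_new, added):
    """Rebuild the original detail tuple type from the extended tuple type."""
    return {
        key: {name: row[name] for name in added if row[name] is not None}
        for key, row in t1_new.items()
        if any(row[name] is not None for name in added)
    }


new_engine = imbed_detail(engine, engine_on_aircraft)
recovered = recover_detail(new_engine, ["aircraft_number"])
```

Here recovered is equal to the original engine_on_aircraft, which illustrates why the third
condition matters: a tuple whose added attributes are all empty could not be distinguished
from a tuple that never had a counterpart in the detail tuple type.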
[Visual: three constellations of an entity type or relationship type (Tuple Type 1) and an
entity type or relationship type (Tuple Type 2) in which a defining attribute not belonging
to the key always has a value: cardinalities ..m and 0..1; cardinalities 1..1 and 0..1 with
relationship key = key of source; and cardinalities 0..1 and 0..1 (Tuple Type 1, Tuple
Type 2, or Tuple Type 3, depending on the relationship key selected).]
Figure 7-9. Finding Detail Tuple Types from ER Model CF182.0
Notes:
As for partial tuple types, the entity-relationship model can be used to determine the detail
tuple types that can be imbedded into other tuple types.
In the following cases, the tuple types for entity types or relationship types represent detail
tuple types and can be imbedded in other tuple types:
• One of the tuple types is for a dependent entity type with a cardinality of 0..1 (for the
owning relationship type). The other tuple type may be for an entity type or a relationship
type. In addition, for each instance of the dependent entity type, at least one nonkey
attribute must always have a value. In this case, the tuple type for the dependent entity
type can be imbedded in the tuple type for the parent.
Because of cardinality 0..1 for the dependent entity type, both tuple types have the same
primary key: Being a dependent entity type means that its own entity key includes the
key of the parent. Because of maximum cardinality 1, the entity key of the dependent
entity type need not and must not contain additional attributes.
Being a dependent entity type also means that, for every entity instance, the parent
contains an instance with the corresponding key value. Minimum cardinality 0 permits
that the dependent entity type does not contain an instance for every parent instance.
• One of the tuple types is for a relationship type with cardinality 0..1 for one end (e.g., the
target) and maximum cardinality m for the other end. In this case, the tuple type for the
relationship type can be imbedded in the tuple type for the end with maximum cardinality
m.
Because of the cardinalities, the key for the relationship type consists of the key of the
end with maximum cardinality m. Thus, the corresponding tuple types have the same
primary key.
Cardinality 0..1 permits that the relationship type does not contain an instance for every
instance of the end with maximum cardinality m. Since source and target must exist for a
relationship instance, the end with maximum cardinality m must contain, for every
relationship instance, an instance with the same key value. Since the defining attributes
not being part of the relationship key contain a value for every relationship instance, the
third condition for detail tuple types is automatically satisfied. Thus, the tuple type for the
relationship type is a detail tuple type. It can be imbedded in the tuple type for the end
with maximum cardinality m.
• One of the tuple types is for a relationship type with cardinality 0..1 for one end (e.g., the
target) and cardinality 1..1 for the other end. In addition, the key of the relationship type
has been chosen to be the key of the end with cardinality 1..1. In this case, the tuple type
for the relationship type is a detail tuple type and can be imbedded in the tuple type for
the end with cardinality 1..1.
Since the key of the end with cardinality 1..1 has been chosen as relationship key, the
primary keys of the two tuple types are the same. (Note that there was a choice for the
relationship key because both maximum cardinalities were 1.)
• One of the tuple types is for a relationship type with cardinality 0..1 at both ends. In this
case, the tuple type for the relationship type can be imbedded in the tuple type for the
source or in the tuple type for the target: If the key of the source has been selected as
relationship key, the tuple type for the relationship type can be imbedded in the tuple
type for the source. If the key of the target has been selected as relationship key, the
tuple type for the relationship type can be imbedded in the tuple type for the target.
• As you can imagine, combinations of the above cases may lead to cascaded imbeds of
tuple types.
Theoretically, other cases are possible. However, in cases that are not equivalent to
cascaded imbeds, you should not imbed the detail tuple type. The two tuple types
concerned have nothing to do with each other. Imbedding the detail tuple type leads to
columns in tables that are never used together and, therefore, may negatively impact
performance.
It must be decided from case to case whether or not the combination of the tuple types
should be reflected in the entity-relationship model. If tuple types for relationship types are
involved, you do not want to reflect the imbedding of the tuple types in the entity-relationship
model. The entity-relationship model would no longer correctly describe the
interrelationships between entity types and relationship types.
As a conclusion of the previous three visuals, you can say:
Tuple types for 1:1 or 1:m relationship types can always be merged or imbedded.
Decomposition of Super Tuple Types (1 of 2)
BEFORE

EMPLOYEE
Employee Number   Last Name    First Name   Date of Birth
4627953           Miller       Jonathan     1968-02-29
7003001           Ambrose      Anna         1980-05-12
0562091           Repairmaid   Susan        1975-03-17
2342007           Handyman     Peter        1974-04-20
0491337           Miller       Jack         1961-07-21
1662951           Smith        Joe          1962-09-01
0844092           Ferguson     Jane         1965-04-15

PILOT
Employee Number   Pilot Level
0844092           Copilot
1662951           Captain
0491337           Captain

MECHANIC
Employee Number   Date of Certification
2342007           1998-03-31
0562091           1999-02-25

All tuple types have same primary key
Key values of first tuple type occur in at most one of the other tuple types
All key values of other tuple types occur in first tuple type
Notes:
Let T, T1, T2, ..., Tn be tuple types with the following characteristics:
• All tuple types have the same primary key.
• At all times, each primary key value of T occurs in at most one of the tuple types T1
through Tn. This means that the primary key value sets of T1 through Tn are disjoint.
• At all times, the primary key values of T1 through Tn occur in tuple type T.
By adding the nonkey attributes of T to each of the tuple types T1 through Tn, the primary
key value sets of T and T1 through Tn can be made disjoint. The tuples of T with
counterparts in T1 through Tn are removed from T and combined with the appropriate
tuples in T1 through Tn.
T1 through Tn are called a (partial) decomposition of T. Since the role of tuple type T has
changed, you should consider renaming it to correctly reflect its changed role.
If, at all times, each primary key value of T occurs in one of the tuple types T1 through Tn,
tuple type T can be eliminated. T1 through Tn then form a perfect decomposition of T.
The situation described here exists for class (supertype/subtype) structures with exclusive
subtype sets. For this reason, tuple type T is referred to as a super tuple type. If the subtype
set is also covering, the super tuple type can be eliminated.
The example on the visual illustrates the tuple types for a class structure with an exclusive,
but not covering, subtype set. The employees of Come Aboard may be pilots, mechanics,
or other types of employees. However, they may not be pilots and mechanics at the same
time. Since pilots or mechanics are employees at the same time, each tuple of PILOT or
MECHANIC has a counterpart in EMPLOYEE.
As illustrated on the next visual, the primary key value sets of EMPLOYEE, PILOT, and
MECHANIC can be made disjoint.
Decomposition of Super Tuple Types (2 of 2)
AFTER
Notes:
After the decomposition, tuple types PILOT and MECHANIC include all nonkey attributes of
EMPLOYEE (e.g., Last Name, First Name, and Date of Birth). Tuple type EMPLOYEE has
been renamed to OTHER EMPLOYEE to emphasize its changed role. Now, an employee
is either in OTHER EMPLOYEE or in PILOT or in MECHANIC, but not in more than one.
If the employees of Come Aboard could only be pilots or mechanics, tuple type OTHER
EMPLOYEE would not be needed, i.e., tuple type EMPLOYEE would be eliminated completely.
You should note that, for the illustrated tuple types, generally, you would not perform a
decomposition of the super tuple type.
If the decomposition is a perfect decomposition, it should be reflected in the
entity-relationship model.
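
The decomposition of EMPLOYEE into OTHER EMPLOYEE, PILOT, and MECHANIC can be sketched
in Python as follows. The sketch assumes that tuple types are modeled as dictionaries from
primary key value to nonkey attributes; the function name decompose and the lowercase
attribute names are hypothetical. The data is taken from the visuals.

```python
# Sketch only: a tuple type is modeled as {primary key value: {nonkey attribute: value}}.
employee = {
    "4627953": {"last_name": "Miller", "first_name": "Jonathan", "date_of_birth": "1968-02-29"},
    "7003001": {"last_name": "Ambrose", "first_name": "Anna", "date_of_birth": "1980-05-12"},
    "0562091": {"last_name": "Repairmaid", "first_name": "Susan", "date_of_birth": "1975-03-17"},
    "2342007": {"last_name": "Handyman", "first_name": "Peter", "date_of_birth": "1974-04-20"},
    "0491337": {"last_name": "Miller", "first_name": "Jack", "date_of_birth": "1961-07-21"},
    "1662951": {"last_name": "Smith", "first_name": "Joe", "date_of_birth": "1962-09-01"},
    "0844092": {"last_name": "Ferguson", "first_name": "Jane", "date_of_birth": "1965-04-15"},
}
pilot = {
    "0844092": {"pilot_level": "Copilot"},
    "1662951": {"pilot_level": "Captain"},
    "0491337": {"pilot_level": "Captain"},
}
mechanic = {
    "2342007": {"date_of_certification": "1998-03-31"},
    "0562091": {"date_of_certification": "1999-02-25"},
}


def decompose(super_type, subtypes):
    """Decompose a super tuple type over exclusive subtypes (hypothetical helper).

    Returns the remaining super tuple type and the extended subtype tuple types.
    """
    seen = set()
    for name, sub in subtypes.items():
        if not sub.keys() <= super_type.keys():
            raise ValueError(f"{name} has key values missing from the super tuple type")
        if seen.intersection(sub):
            raise ValueError(f"{name} overlaps another subtype; subtype set is not exclusive")
        seen.update(sub)
    extended = {
        name: {key: {**super_type[key], **attrs} for key, attrs in sub.items()}
        for name, sub in subtypes.items()
    }
    remainder = {key: attrs for key, attrs in super_type.items() if key not in seen}
    return remainder, extended


other_employee, parts = decompose(employee, {"PILOT": pilot, "MECHANIC": mechanic})
```

An empty remainder would indicate a perfect decomposition, in which case the super tuple
type can be dropped. In this example two employees remain in OTHER EMPLOYEE, so the
decomposition is only partial.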
When combining tuple types, size limitations for the target DBMS may
become effective forcing you to split the tuple type again
Notes:
The merging, imbedding, and decomposition of tuple types described on the preceding
visuals can be performed without the loss of information. However, a few considerations
may lead you not to combine the tuple types:
• Do not combine tuple types which have nothing to do with each other; whose attributes
are never processed together; or whose attributes are only processed together by
business processes that are not performance-critical.
If you combined the tuple types, other critical business processes might experience a
performance degradation. The rows for the appropriate tables would become longer
resulting in fewer rows per page (physical blocks) and, thus, fewer rows per buffer. This
might increase the number of I/O operations required when processing or searching the
table sequentially.
• When imbedding a detail tuple type, other tuple types should not be referentially
dependent on the detail tuple type. A tuple type is referentially dependent on another
tuple type if the values of one or more of its attributes must always be a subset of the
values of a corresponding set of attributes of the other tuple type.
If you imbed a detail tuple type on which other tuple types are referentially dependent,
their referential integrity can no longer be enforced by means of the referential integrity
support of the target database management system. You then must use other means to
ensure the integrity of the data (e.g., program logic or, if supported, triggers).
• When decomposing a super tuple type, other tuple types should not be referentially
dependent on the super tuple type.
If you decompose the super tuple type, the referential integrity of dependent tuple types
can no longer be enforced by means of the referential integrity support of the target
database management system.
• When combining tuple types, restrictions or limitations for the referential integrity support
of the target database management system may become effective which would not exist
otherwise. These limitations deal with referential cycles and delete-connected tables.
Referential cycles and delete-connected tables will be discussed in a later unit.
These restrictions can also become effective when you merge or imbed the tuple types
for 1:1 or 1:m relationship types, so you might decide not to merge or imbed them.
• When combining tuple types, size limitations for the target database management
system may become effective forcing you not to combine the tuple types or to split them
differently.
Possible Consequences
Cannot combine tuple types in a table that could be combined otherwise
Cannot denormalize tuple types
Must perform additional normalizations of tuple types
Must vertically split tuple types
Must horizontally split tuple types
Figure 7-13. Limitations and Consequences CF182.0
Notes:
All database management systems have limitations. The above visual illustrates the typical
limitations.
Most database management systems store the rows of tables in fixed-length pages, i.e.,
blocks of a fixed length. In general, a row must fit into a single page. The page size can be
chosen from predefined values and is the same for all pages of a table (or a set of tables).
For DB2 Universal Database, for example, the page size can be 4096, 8192, 16384, or
32768 bytes.
The selection of a page size causes two problems:
• The maximum length of a row is restricted by the chosen page size. As a solution, you
could choose a bigger page size provided the target database management system
supports a bigger page size.
However, you do not always want to choose a larger page size just for a few exceptional
rows. For the direct retrieval of rows, a larger page size may mean that you read more
data than necessary for the majority of rows. The I/O operation for the larger page size
takes longer, resulting in an undesirable performance degradation. Even for sequential
retrieval, a larger page size can negatively impact the overall system performance since
it may hamper concurrent requests for other tables.
• The fixed page size may result in a lot of unused space. Assume that all rows of a table
have the same fixed length and that the length is just a little over half a page. As a
consequence, only a single row fits into a page and nearly half the page is wasted. If the
row size is just over one third of the page size, only two rows fit into a page and about
one third of the space is wasted, and so on. The smaller the row size, the less space is
wasted.
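
The arithmetic behind the wasted space can be made concrete with a short Python sketch.
It is a simplification under stated assumptions: all rows have the same fixed length, and
the page header and per-row overhead that a real database management system reserves are
ignored. The function names are hypothetical.

```python
def rows_per_page(page_size, row_size):
    """Number of fixed-length rows that fit into one page (overhead ignored)."""
    if row_size > page_size:
        raise ValueError("a row must fit into a single page")
    return page_size // row_size


def wasted_fraction(page_size, row_size):
    """Fraction of each page left unused by fixed-length rows."""
    used = rows_per_page(page_size, row_size) * row_size
    return (page_size - used) / page_size


PAGE = 4096  # one of the page sizes named in the notes

# Row just over half a page: one row fits, nearly half the page is unused.
half = (rows_per_page(PAGE, 2100), wasted_fraction(PAGE, 2100))

# Row just over one third of a page: two rows fit, about one third is unused.
third = (rows_per_page(PAGE, 1400), wasted_fraction(PAGE, 1400))

# Small rows: many rows fit and very little space is unused.
small = (rows_per_page(PAGE, 100), wasted_fraction(PAGE, 100))
```

With these sample row sizes, the unused share of a 4096-byte page is 1996 bytes for the
2100-byte row, 1296 bytes for the 1400-byte row, and 96 bytes for the 100-byte row.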
As a second limitation, there is typically an upper limit for the number of columns that a
table can have.
The third limitation common to all database management systems is that there is an upper
limit for the amount of space a table can occupy. In the course of time, the last two
limitations have been relaxed and will be relaxed even more.
If you hit one of the limitations mentioned above, the consequences are that:
• You cannot combine tuple types that could be combined otherwise.
• You cannot denormalize tuple types although you would like to.
• You must perform additional normalizations you did not want to do.
• You must vertically split tuple types.
• You must horizontally split tuple types.
Denormalization
BEFORE

AIRCRAFT
Aircraft Number   Date Manufactured   Date in Service   Type Code   Model Number
B474001323        1994-10-12          1997-01-01        B747        400
B373004518        1999-02-28          1999-03-15        B737        300
B373004519        1999-03-31          1999-04-20        B737        300
A103000534        1998-05-12          1998-07-21        A310        300
A103003167        1997-08-01          1997-09-01        A310        300
A402004217        1999-10-23          1999-11-15        A340        200

AIRCRAFT MODEL
Type Code   Model Number   Length   Height
A340        200            59.40    16.91
A310        300            46.67    15.81
B737        300            33.41    11.13
B747        400            70.67    19.33
Notes:
Let T1 and T2 be tuple types satisfying the following conditions:
• T1 and T2 have different primary keys.
• T1 has a set of attributes corresponding to the primary key of tuple type T2.
• At all times, the primary key of T2 and the corresponding attributes of T1 contain the
same values.
In this case, tuple type T2 can be integrated into tuple type T1 without loss of information
by adding the nonkey attributes of T2 to T1. However, as a consequence, information may
have to be stored redundantly in the integrated attributes.
This process is called denormalization since it represents a conscious violation of the
Second Normal Form or the Third Normal Form.
Frequently, the primary key values of T2 form a superset of the values of the corresponding
attributes of T1. In this case, you must decide if you can do without the tuples of T2 which
do not have a counterpart in T1. This means you accept the loss of information.
The example on the visual integrates tuple type AIRCRAFT MODEL into tuple type
AIRCRAFT. Together, attributes Type Code and Model Number of tuple type AIRCRAFT
(T1) correspond to the primary key of tuple type AIRCRAFT MODEL (T2). Since the source
cardinality for relationship type AIRCRAFT MODEL_for_AIRCRAFT, in the
entity-relationship model for Come Aboard, is 1..1, there is an aircraft model for every
aircraft. Thus, each value of attribute pair (Type Code, Model Number) in T1 occurs as a
primary key value of T2. However, because of target cardinality m for the relationship type,
there need not be an aircraft for every aircraft model. Consequently, AIRCRAFT MODEL
cannot be integrated into AIRCRAFT unless you decide not to keep information about
aircraft models for which there is no aircraft.
Since denormalization can be seen as a reversal of normalization, it reintroduces the
problems you tried to solve by normalization:
• Since every primary key value of T2 may occur multiple times in T1, information is
redundantly stored in the resulting combined tuple type. Consequently, you must ensure
that the attributes of T2 added to T1 are changed for all tuples with the same primary
key value of T2 at the same time. This can be achieved by using proper mass UPDATE
SQL statements for the (table of the) resulting tuple type.
Similarly, when adding a new tuple, it must be ensured that redundant information is
consistent with information already contained in existing tuples. This can be achieved by
copying the corresponding information from the existing tuples rather than entering it
again.
To reduce the risk of inconsistent redundant information as much as possible, you
should not allow end users to issue UPDATE or INSERT SQL statements against tables
that are not normalized. Rather, you should provide front-ends (to be used by the end
users) that include the proper UPDATE and INSERT statements.
• If the last tuple for a former primary key value of T2 is deleted, all T2-related information
for this value is lost. Similarly, you cannot add information about a new primary key value
of T2 without adding T1-related information at the same time.
For the example on the visual, when you delete the last Boeing 747, Model 400 aircraft
(B474001323), the information about the aircraft model is lost as well. Also, as outlined
above, you cannot add information about a new aircraft model without entering
information about an aircraft for that aircraft model at the same time.
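
The update, insert, and delete problems listed above can be reproduced with a small Python
sketch that uses the standard sqlite3 module as a stand-in for the target database
management system. This is an assumption made for illustration only: the table layout, the
column names, the helper functions, the new length value, and aircraft number B373004520
are hypothetical. The helpers play the role of the front-ends recommended above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Denormalized AIRCRAFT table: the model data (length, height) is stored per aircraft.
conn.execute(
    """
    CREATE TABLE aircraft (
        aircraft_number TEXT PRIMARY KEY,
        type_code       TEXT NOT NULL,
        model_number    TEXT NOT NULL,
        length          REAL,
        height          REAL
    )
    """
)
conn.executemany(
    "INSERT INTO aircraft VALUES (?, ?, ?, ?, ?)",
    [
        ("B373004518", "B737", "300", 33.41, 11.13),
        ("B373004519", "B737", "300", 33.41, 11.13),
        ("B474001323", "B747", "400", 70.67, 19.33),
    ],
)


def update_model_length(conn, type_code, model_number, length):
    """Mass UPDATE: change the redundant value in all rows of one model at once."""
    cur = conn.execute(
        "UPDATE aircraft SET length = ? WHERE type_code = ? AND model_number = ?",
        (length, type_code, model_number),
    )
    return cur.rowcount


def insert_aircraft(conn, aircraft_number, type_code, model_number):
    """INSERT that copies the redundant model data from an existing row."""
    row = conn.execute(
        "SELECT length, height FROM aircraft"
        " WHERE type_code = ? AND model_number = ? LIMIT 1",
        (type_code, model_number),
    ).fetchone()
    if row is None:
        raise LookupError("no existing aircraft of this model to copy from")
    conn.execute(
        "INSERT INTO aircraft VALUES (?, ?, ?, ?, ?)",
        (aircraft_number, type_code, model_number, row[0], row[1]),
    )


changed = update_model_length(conn, "B737", "300", 33.4)
insert_aircraft(conn, "B373004520", "B737", "300")

# Deleting the last aircraft of a model also deletes all knowledge of that model.
conn.execute("DELETE FROM aircraft WHERE aircraft_number = ?", ("B474001323",))
remaining_747 = conn.execute(
    "SELECT COUNT(*) FROM aircraft WHERE type_code = 'B747'"
).fetchone()[0]
```

After the DELETE, no row describes the Boeing 747, Model 400 any longer, which is the loss
of information described for the example on the visual.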
When denormalizing tuple types, other tuple types should not be referentially dependent on
the integrated tuple types. Otherwise, the referential integrity of dependent tuple types can
no longer be enforced by means of the referential integrity support of the target database
management system.
If you look at the entity-relationship model for Come Aboard, you will see that entity type
AIRCRAFT MODEL is source or target of many relationship types. This means that many
tuple types are referentially dependent on it. Therefore, you would never integrate tuple
type AIRCRAFT MODEL into tuple type AIRCRAFT.
It must be decided from case to case, if the denormalization should be reflected in the
entity-relationship model. It should be reflected in the entity-relationship model if it
combines the tuple types for two entity types.
The primary reason for denormalization is performance. However, because of the problems
involved with denormalization, you should investigate very carefully if the gain is worth the
trouble. If the table for the integrated tuple type always contains only a very few rows (e.g.,
just a page), denormalization will not gain much. After the first request, the page will be in
the buffers of the target database management system. Immediate subsequent requests
will not require an I/O operation. Also, locating the appropriate rows in the page does not
dramatically add to the processor time. However, to come to a reliable decision, you should
use the tools provided by the target database management system (such as EXPLAIN) to
determine the behavior of critical requests.
Vertical Splitting of Tuple Types
BEFORE

AIRCRAFT MODEL (Length, Height, and Wing Span form composite attribute Dimensions)
Type Code   Model Number   Length   Height   Wing Span   Net Weight   Maximum Weight   Cruising Speed   Range
A340        200            59.40    16.91    60.30       156500       274980           890              14800
A310        300            46.67    15.81    43.90       93710        164400           860              9600
B737        300            33.41    11.13    28.88       35805        62820            795              4175
B747        400            70.67    19.33    64.31       226237       396890           930              13570
Notes:
Vertical splitting of a tuple type means that some attributes of the tuple type are moved to a
new tuple type with the same primary key. Of course, you should not arbitrarily split a tuple
type, but rather move attributes that logically belong together to the new tuple type. The
composite attributes for a tuple type identify attributes that belong together. They are a big
help when splitting a tuple type.
Limitations of the target database management system are one reason for splitting tuple
types. Another, equally important, reason is that the attributes of the tuple type have
different usage profiles:
• Some attributes are never used together with other attributes.
• Some attributes are used very seldom and, then, together with other attributes, only in
business processes that are not performance-critical.
In these cases, splitting the tuple type may increase the performance of other,
performance-critical, business processes. As a consequence of the splitting, the rows for
the important tables become shorter and more rows will fit into a page. Thus, more rows
can be made available with a single I/O operation and kept in buffers of the target database
management system.
You should note, however, that vertical splitting only makes sense if the rows for the
corresponding table are not already very small. Some database management systems limit
the maximum number of rows per page. Thus, if the row size becomes too small, you lose
space without gaining performance.
When vertically splitting a tuple type, you effectively create a new dependent entity type.
The dependent entity type should be reflected in the entity-relationship model.
In the example on the visual, the dimensions for aircraft models are removed from tuple
type AIRCRAFT MODEL. They are moved to a new tuple type called AIRCRAFT MODEL
DIMENSIONS. The dimensions are less frequently used than the weights for the aircraft
models. Dimensions was a composite attribute of old tuple type AIRCRAFT MODEL.
Vertical splitting is the inverse of merging and imbedding of tuple types. If the original tuple
type contained tuples not having a value for any of the removed attributes, the new
dependent tuple type contains fewer tuples than the parent tuple type. You need not and
should not keep tuples just consisting of a value for the primary key and not containing
other useful information.
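The split described above can be sketched as follows. This is only an illustration under assumptions: Python's sqlite3 module stands in for the target database management system, and the table names, column names, and data types are hypothetical renderings of the attributes on the visual.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE aircraft_model (
    type_code      TEXT    NOT NULL,
    model_number   INTEGER NOT NULL,
    net_weight     INTEGER,
    maximum_weight INTEGER,
    cruising_speed INTEGER,
    flight_range   INTEGER,
    PRIMARY KEY (type_code, model_number)
);
-- New dependent table: same primary key, only the Dimensions attributes.
CREATE TABLE aircraft_model_dimensions (
    type_code       TEXT    NOT NULL,
    model_number    INTEGER NOT NULL,
    length_of_model REAL,
    height_of_model REAL,
    wing_span       REAL,
    PRIMARY KEY (type_code, model_number),
    FOREIGN KEY (type_code, model_number)
        REFERENCES aircraft_model (type_code, model_number)
);
""")
conn.executemany(
    "INSERT INTO aircraft_model VALUES (?, ?, ?, ?, ?, ?)",
    [("A340", 200, 156500, 274980, 890, 14800),
     ("B747", 400, 226237, 396890, 930, 13570)],
)
conn.executemany(
    "INSERT INTO aircraft_model_dimensions VALUES (?, ?, ?, ?, ?)",
    [("A340", 200, 59.40, 16.91, 60.30),
     ("B747", 400, 70.67, 19.33, 64.31)],
)

# Performance-critical request: reads only the shorter rows.
weights = conn.execute(
    "SELECT type_code, net_weight, maximum_weight"
    " FROM aircraft_model ORDER BY type_code"
).fetchall()

# Less frequent request: a join on the shared primary key restores the
# original tuple. The outer join keeps models without dimensions.
full_rows = conn.execute("""
    SELECT m.type_code, m.model_number,
           d.length_of_model, d.height_of_model, d.wing_span, m.net_weight
    FROM aircraft_model m
    LEFT JOIN aircraft_model_dimensions d
      ON d.type_code = m.type_code AND d.model_number = m.model_number
    ORDER BY m.type_code
""").fetchall()
print(weights)
print(full_rows)
```

The outer join reflects the remark that the dependent tuple type may contain fewer tuples than the parent tuple type.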
Horizontal Splitting of Tuple Types
BEFORE: ENGINE

Engine      Engine     Manufacturer  Aircraft    Engine
Number      Type       Code          Number      Position
PW9880193   PW4062     PW            B474001323  1
PW9880194   PW4062     PW            B474001323  2
PW9880195   PW4062     PW
PW9882345   PW4062     PW            B474001323  3
PW9974034   PW4062     PW            B474001323  4
A862946RR   RB211-254  RR
A59A350RR   RB211-254  RR
R375184566  CF6-80C2   GE            A103003167  1
R375184567  CF6-80C2   GE
R375184568  CF6-80C2   GE            A103003167  2

AFTER: ENGINE

Engine      Engine     Manufacturer  Aircraft    Engine
Number      Type       Code          Number      Position
PW9880193   PW4062     PW            B474001323  1
PW9880194   PW4062     PW            B474001323  2
PW9880195   PW4062     PW
PW9882345   PW4062     PW            B474001323  3
PW9974034   PW4062     PW            B474001323  4
R375184566  CF6-80C2   GE            A103003167  1
R375184567  CF6-80C2   GE
R375184568  CF6-80C2   GE            A103003167  2

AFTER: RETIRED ENGINE

Engine      Engine     Manufacturer  Aircraft    Engine
Number      Type       Code          Number      Position
A862946RR   RB211-254  RR
A59A350RR   RB211-254  RR
Notes:
Horizontal splitting of tuple types means that you partition the tuples of a tuple type.
Basically, you create multiple tuple types with the same attributes as the original tuple type.
Each of the new tuple types contains a part of the tuples of the old tuple type. The new
tuple types are referred to as partitions (of the old tuple type).
How the tuples are partitioned is completely up to the application domain, and you should
consult the application domain expert for advice. The partitioning need not be based on key
ranges for the primary key.
In the example on the visual, the engines are partitioned into active engines and retired
engines. Retired engines are engines permanently taken out of service. Active engines are
engines still used by aircraft, even though they may not be mounted at present. The
appropriate tuple types have been called ENGINE (for the active engines) and RETIRED
ENGINE. We could have called the tuple type for the active engines differently, but it
seemed handy to still call it ENGINE.
As illustrated on the visual, it might happen that, for a partition, some of the attributes do
not assume a value for any of the tuples. These attributes can be dropped from the tuple
type. On the visual, this is the case for attributes Aircraft Number and Engine Position of
tuple type RETIRED ENGINE: Retired engines are not and will not be mounted on aircraft.
The horizontal splitting should be reflected in the entity-relationship model, especially if
the partitions take on a new meaning.
When horizontally splitting tuple types, other tuple types should not be referentially
dependent on the split tuple type; otherwise, their referential integrity can no longer be
enforced by means of the referential integrity support of the target database management
system.
One reason for the horizontal splitting of tuple types is size limitations for tables. Another
reason may be that you want to assign the tuples for different responsibilities, branches, or
uses to different tables to avoid concurrent access problems.
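The partitioning on the visual can be sketched as follows. Python's sqlite3 module stands in for the target database management system (an assumption), the table and column names are hypothetical, and the list of retired engines is assumed to come from the application domain expert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE engine (
    engine_number     TEXT NOT NULL PRIMARY KEY,
    engine_type       TEXT NOT NULL,
    manufacturer_code TEXT NOT NULL,
    aircraft_number   TEXT,
    engine_position   INTEGER
);
-- Partition for retired engines. The two mounting attributes never
-- assume a value for this partition and are therefore dropped.
CREATE TABLE retired_engine (
    engine_number     TEXT NOT NULL PRIMARY KEY,
    engine_type       TEXT NOT NULL,
    manufacturer_code TEXT NOT NULL
);
""")
conn.executemany(
    "INSERT INTO engine VALUES (?, ?, ?, ?, ?)",
    [("PW9880193", "PW4062", "PW", "B474001323", 1),
     ("PW9880195", "PW4062", "PW", None, None),
     ("A862946RR", "RB211-254", "RR", None, None),
     ("A59A350RR", "RB211-254", "RR", None, None),
     ("R375184566", "CF6-80C2", "GE", "A103003167", 1)],
)

# Assumed input from the application domain expert.
retired_numbers = ["A862946RR", "A59A350RR"]

with conn:  # commit the move of all rows together
    for number in retired_numbers:
        conn.execute(
            "INSERT INTO retired_engine"
            " SELECT engine_number, engine_type, manufacturer_code"
            " FROM engine WHERE engine_number = ?",
            (number,),
        )
        conn.execute("DELETE FROM engine WHERE engine_number = ?", (number,))

active = [row[0] for row in conn.execute(
    "SELECT engine_number FROM engine ORDER BY engine_number")]
retired = [row[0] for row in conn.execute(
    "SELECT engine_number FROM retired_engine ORDER BY engine_number")]
print(active)
print(retired)
```

Note that the partitioning criterion here is not a key range: an unmounted engine such as PW9880195 stays in ENGINE because it is still in service.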
Notes:
When creating the tables for the target database management system, you must define the
columns for the tables. Defining the columns means that you have to specify a name, a
data type, and some additional column attributes for them. The names for the columns
must follow the rules for the target database management system and must be unique for
each table as was discussed earlier in this unit.
For the data types, you must translate the application-domain specific data types for the
corresponding data elements into data types supported by the target system. Each
database management system provides a set of built-in (standard) data types. For many
columns, the built-in data types are sufficient. For data elements based on abstract data
types, the built-in data types might not be sufficient and additional functions of the target
database management system must be used to simulate the abstract data types as closely
as possible. For now, let us concentrate on the built-in data types. Abstract data types will
be discussed later in this topic.
Most of the database management systems provide built-in data types for character
strings, numeric data, datetime data, and binary strings:
• The data types for numeric data generally support integers, decimal numbers, and
floating-point numbers of varying sizes. The data types intended for integers generally
have binary representations of two (SMALLINT), four (INTEGER), or eight (BIGINT)
bytes supporting integers of different sizes. Check the reference manuals for your
database management system to determine the data types supported and their value
ranges. , for example, currently does not support BIGINT.
Decimal numbers are generally specified by means of DECIMAL(m[,n]) or
NUMERIC(m[,n]). Both specifications represent the same data type. m specifies the
number of digits and n the number of decimal places. If n is not specified, zero is
assumed, i.e., the numbers are integers. Internally, decimal numbers are mostly stored
in packed format. This means that each digit and the sign occupy half a byte. Again,
check the reference manuals for the supported syntax and the value ranges.
Floating-point numbers are approximations of real numbers. Normally, the target
database management systems support data types for single precision (REAL) and
double precision (DOUBLE). DOUBLE provides a better approximation of the real
numbers, but occupies more storage. In general, the representations occupy four and
eight bytes, respectively. Because of the different internal representations, check the
reference manuals for your database management system for the types supported and
their value ranges.
• The data types for character strings support single-byte character strings and
double-byte character strings. Single-byte character strings are sequences of one-byte
characters. Thus, each byte of the string represents a character of the underlying
character set. Frequently, if the context is clear, the term character string is used to
denote single-byte character strings.
Double-byte character strings are also referred to as graphic strings. They are
sequences of two-byte characters as required, for example, for some Asian character
sets. Thus, every two bytes of the string represent a character of the underlying
character set.
Both for single-byte and double-byte character strings, there are data types for
fixed-length strings, short varying-length strings, and large varying-length strings. The
latter are referred to as character large objects. For single-byte character strings, the
appropriate data types are CHARACTER(), VARCHAR(), and CLOB(). For double-byte
strings, they are GRAPHIC(), VARGRAPHIC(), and DBCLOB(). The maximum length
for the various data types depends on the target database management system. Thus,
check the reference manuals for your database management system to determine the
types and the maximum lengths supported. Character large objects allow millions or
even billions of characters.
• The datetime data types include data types for the date, the time, and timestamps. The
appropriate data types are DATE, TIME, and TIMESTAMP. As usual, the date includes
two digits for the day, two digits for the month, and four digits for the year. The time
includes two digits each for the hour, the minute, and the second. Timestamps include
date, time, and microseconds.
• Binary strings can be binary large objects (data type BLOB()). They are strings of bytes.
Unlike character strings which usually contain text data, they are used to hold
nontraditional data such as pictures. The maximum length can be millions or even
billions of bytes.
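The sizes given above for the numeric data types translate into value ranges and storage requirements. The following Python sketch computes them under two assumptions that your reference manuals should confirm: binary integers are signed and stored in two's complement form, and a packed decimal number stores every digit plus one sign in half a byte each. The function names are hypothetical.

```python
def binary_integer_range(num_bytes):
    """Value range of a signed two's-complement integer of the given size."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def packed_decimal_bytes(digits):
    """Bytes needed when each digit and the sign occupy half a byte."""
    half_bytes = digits + 1        # all digits plus one sign
    return (half_bytes + 1) // 2   # round up to whole bytes


for name, size in (("SMALLINT", 2), ("INTEGER", 4), ("BIGINT", 8)):
    low, high = binary_integer_range(size)
    print(f"{name:8} {size} bytes  {low} .. {high}")

for m in (4, 5, 15):
    print(f"DECIMAL({m}) needs {packed_decimal_bytes(m)} bytes in packed format")
```

An even number of digits leaves half a byte unused, so DECIMAL(4) and DECIMAL(5) occupy the same storage under this assumption.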
Here are some design considerations for the built-in data types:
• Data that are numeric should be defined as numeric data and not as character strings
even if you will not perform calculations with them. When the data are defined as
numeric to the database management system, the database management system can
verify the correctness of the data for you. In addition, it can check if the data fall in the
supported or defined (check constraints) value ranges.
If the data are defined as character strings, all characters of the character set are valid
and the business processes must verify the correctness of the data themselves.
• For integer data, you have multiple choices for the data type. If binary integer data types
support the expected value range for the column, choose one of them because
binary-integer operations are generally cheaper. Choose the data type that best fits the
size of your expected data, but make sure that future extensions will not make the data
type obsolete. Rather, choose the next bigger data type. To change the data type
afterwards, you must delete the table and recreate it. This has consequences for the
objects based on the table and for authorizations you have granted.
• For character columns, you may have the choice between CHARACTER and
VARCHAR. If the actual length of the values varies, VARCHAR may save space.
However, you should be aware that the system adds two bytes for storing the length in
case of VARCHAR. Also, VARCHAR may slightly increase the processing time.
Furthermore, programmers do not like to work with varying-length data.
Therefore, only use VARCHAR if the length of the data varies considerably or you do
not have another choice because of the maximum length of the data. As a ballpark
figure, the difference between the average length and the maximum length for the
column should be greater than 25 bytes. The information for the corresponding data
element in the data inventory should tell you this.
If your target system supports compression, the space argument for VARCHAR
disappears and there is even less reason to use VARCHAR if you can use
CHARACTER instead.
• If you have VARCHAR columns, you should define them as last columns of the table to
save processing time. The sequence in which the columns are defined does not
mandate a sequence for their retrieval. For mass retrieval, some database
management systems calculate the offsets of the various columns once and not for
every row retrieved. They can only do this for the columns preceding the first
varying-length column and for the first varying-length column.
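The choice between CHARACTER and VARCHAR described above can be written down as a small rule of thumb. The sketch below is only a decision aid under stated assumptions: the two length bytes and the 25-byte ballpark figure come from the text, whereas the function names and the default limit of 254 bytes for fixed-length strings are hypothetical and must be replaced by the values for your target system.

```python
LENGTH_PREFIX_BYTES = 2    # bytes the system adds for a VARCHAR value (from the text)
BALLPARK_DIFFERENCE = 25   # ballpark figure from the text, in bytes


def stored_bytes(average_length, maximum_length):
    """Approximate bytes per value for both alternatives, ignoring compression."""
    return {
        "CHARACTER": maximum_length,
        "VARCHAR": average_length + LENGTH_PREFIX_BYTES,
    }


def suggest_character_type(average_length, maximum_length, max_fixed_length=254):
    """Suggest a data type; max_fixed_length is an assumed system limit."""
    if maximum_length > max_fixed_length:
        return "VARCHAR"       # no other choice because of the maximum length
    if maximum_length - average_length > BALLPARK_DIFFERENCE:
        return "VARCHAR"       # the length varies considerably
    return "CHARACTER"


print(suggest_character_type(18, 20), stored_bytes(18, 20))
print(suggest_character_type(30, 120), stored_bytes(30, 120))
```

The average and maximum lengths are the figures you would take from the data inventory for the corresponding data element.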
Column Attributes - Nullable Columns
• Columns need not assume a value for every row, i.e., a value need not
  necessarily be provided for each row
  - Characteristic (attribute) for column
  - Column referred to as nullable
• Special indicator used to indicate if the column has a value for a row
• For rows without values, column is said to assume a value of NULL
• If the value for a column of a row is NULL, no value has been provided for it
  - Different from 0 (zero) for numeric columns
  - Different from blanks for fixed-length character columns
  - Different from a string of length 0 for varying-length character columns
Notes:
For a tuple type, some attributes (e.g., the primary key attributes) need assume a value for
every tuple whereas others need not. To correctly reflect this, it must be possible to specify
for the columns of the corresponding tables whether or not they must assume a value for
every row.
Indeed, it can be specified for a column, as a column attribute (characteristic), whether or
not a value must be provided for every row. A column that need not assume a value for
every row is referred to as a nullable column.
Internally, most database management systems use a special indicator, referred to as null
indicator, to indicate if the column has a value for a row. If a column does not have a value
for a row, it is said that the column has the value NULL for the row. This is a way of
speaking even though it is a contradiction in terms.
If the value for a column is NULL for a row, a value has not been provided for that row. For
numeric columns, this is different from a value of 0 (zero) for the column. For fixed-length
character columns, it is different from a value of all blanks. For varying-length character
columns, it is different from a character string of length 0.
In the example on the visual, engine M18940012 has an engine position of 0 meaning that
it is mounted on an aircraft in position 0. It does not mean that the engine is not mounted.
Zero may be a valid engine position.
In contrast, engines PW9880195 and M18940168 have an engine position of NULL. This
means that an engine position has not been provided for them: they are not mounted on an
aircraft.
You should be aware that NULL values may lead to different results for SQL functions or
operations than values of 0 or blanks or strings of length 0. For example, this is the case for
the column functions AVG and COUNT and for Join operations.
Nullable columns occupy a little additional storage and their handling requires a little extra
processing time. However, the additional storage or processing time is insignificant. You
should define columns that, from the perspective of the application domain, may not
contain a value as nullable and not try to save the extra overhead.
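The difference between NULL and 0 for the column functions can be tried out with a few rows. The sketch uses Python's sqlite3 module as a stand-in for the target database management system (an assumption); the engine numbers and positions follow the examples in this unit, and the table layout is reduced to two hypothetical columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE engine ("
    " engine_number TEXT NOT NULL PRIMARY KEY,"
    " engine_position INTEGER)"          # nullable column
)
conn.executemany(
    "INSERT INTO engine VALUES (?, ?)",
    [("M18940012", 0),       # mounted in position 0
     ("PW9880193", 1),
     ("PW9880194", 2),
     ("PW9880195", None),    # not mounted: NULL
     ("M18940168", None)],   # not mounted: NULL
)

all_rows, rows_with_position, average_position = conn.execute(
    "SELECT COUNT(*), COUNT(engine_position), AVG(engine_position) FROM engine"
).fetchone()

# A comparison with 0 finds only the engine in position 0, not the NULLs.
position_zero = conn.execute(
    "SELECT COUNT(*) FROM engine WHERE engine_position = 0"
).fetchone()[0]

print(all_rows, rows_with_position, average_position, position_zero)
```

COUNT(engine_position) and AVG(engine_position) skip the two rows without a value. Had 0 been stored instead of NULL for the unmounted engines, the average would have been computed over five rows rather than three.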
Nullable Columns and Cardinalities
ENGINE
  Engine Number, PK                       NOT NULL
  Engine Type                             NOT NULL
  Manufacturer Code                       NOT NULL
  Aircraft Number [0..1]                  Nullable
  Engine Position [0..1]                  Nullable

FLIGHT
  Flight Number, PK                       NOT NULL
  Airport Code AS From, PK                NOT NULL
  Airport Code AS To, PK                  NOT NULL
  Flight Locator, PK                      NOT NULL
  Departure AS Planned Departure
    Departure Date                        NOT NULL
    Departure Time                        NOT NULL
  Arrival AS Planned Arrival
    Arrival Time                          NOT NULL
    Arrival Date                          NOT NULL
  Departure AS Actual Departure [0..1]
    Departure Date                        Nullable
    Departure Time                        Nullable
  Arrival AS Actual Arrival [0..1]
    Arrival Date                          Nullable
    Arrival Time                          Nullable
Notes:
As you certainly remember, we have introduced cardinalities for the attributes of tuple
types. The minimum cardinality for an attribute determines whether or not, in the context
used, the attribute must always assume a value. Thus, the minimum cardinalities for the
attributes determine whether or not the corresponding columns must always have a value.
The first example on the visual shows tuple type ENGINE. Its first three attributes do not
have a cardinality specified. This means that their implied cardinality is [1..1]. Since their
minimum cardinality is 1, the attributes and, thus, the corresponding columns must always
have a value. This can be defined by specifying NOT NULL for the columns.
The last two attributes of tuple type ENGINE have a minimum cardinality of 0. Therefore,
they need not assume a value for every tuple. Accordingly, the corresponding columns
need not assume a value for every row, i.e., the columns are nullable. This can be defined
by not specifying NOT NULL for the columns. By default, columns are nullable.
The second example on the visual illustrates the cardinalities for tuple type FLIGHT and
demonstrates that the cardinalities must be interpreted in the context of the comprising
structure: The first four attributes of FLIGHT are elementary attributes of the tuple type and
have an implied minimum cardinality of 1. Since they are direct attributes of the tuple type,
their minimum cardinality determines directly whether or not the corresponding columns
are nullable.
The other elementary attributes are components of composite attributes. All their minimum
cardinalities are 1. The minimum cardinalities of components do not alone determine
whether or not the corresponding columns are nullable. However, if the minimum
cardinality is 0, the column must be nullable.
If the minimum cardinality is 1, the associated column may still have to be nullable. This
depends on the minimum cardinality of the comprising composite attribute, the minimum
cardinality of the composite attribute comprising the composite attribute, and so on. If the
minimum cardinality of the comprising composite attribute is 1 and the composite attribute
is not again a component of another composite attribute, the corresponding column is not
nullable. It must be defined with NOT NULL. If the composite attribute is again contained in
a composite attribute, the minimum cardinality of the latter decides if the column will be
nullable.
If the minimum cardinality of the composite attribute comprising the elementary attribute is
0, the corresponding column must be defined as nullable.
In the example on the visual, composite attributes Planned Departure and Planned Arrival
have a minimum cardinality of 1. Since they are not again components of another
composite attribute, the columns for their elementary attributes must be defined with NOT
NULL. In contrast, composite attributes Actual Departure and Actual Arrival have a
minimum cardinality of 0. Accordingly, the columns associated with their elementary
attributes must be defined as nullable despite the minimum cardinality of 1 for the
elementary attributes.
This added complexity stems from the fact that relational database management systems
currently do not support composite attributes.
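The rules above amount to this: a column is nullable as soon as the elementary attribute or any composite attribute comprising it has a minimum cardinality of 0. The following Python sketch expresses that rule; the function names, column names, and data types are hypothetical.

```python
def column_is_nullable(minimum_cardinalities):
    """Decide nullability from a chain of minimum cardinalities.

    The chain starts with the elementary attribute and continues with
    every comprising composite attribute, innermost first.
    """
    return any(minimum == 0 for minimum in minimum_cardinalities)


def column_clause(column_name, data_type, minimum_cardinalities):
    """Build the column definition; nullable is the default, so nothing is added."""
    suffix = "" if column_is_nullable(minimum_cardinalities) else " NOT NULL"
    return f"{column_name} {data_type}{suffix}"


# Direct attributes of ENGINE: the chain has a single entry.
print(column_clause("Engine_Number", "CHAR(10)", [1]))
print(column_clause("Aircraft_Number", "CHAR(10)", [0]))
# Components of composite attributes of FLIGHT: component first, then composite.
print(column_clause("Planned_Departure_Date", "DATE", [1, 1]))
print(column_clause("Actual_Departure_Date", "DATE", [1, 0]))
```

For Actual Departure, the component has a minimum cardinality of 1, but the composite attribute has a minimum cardinality of 0, so the column is generated without NOT NULL.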
Column Attributes - Default Values
[Visual: default values are divided into system defaults and user defaults.]
Notes:
The discussions about columns that always must have a value or need not have a value for
a row raise some questions:
• Independent of whether or not the column is nullable, what happens if a value is not
provided for a row? Does the system provide a default value?
• For nullable columns, does the column receive the value NULL or another default value?
This and the next visual will answer these questions.
Most target database management systems allow you to specify that a default value should
be assumed if a value is not provided for a row. The default values assumed can be system
defaults or user defaults.
System defaults are default values used by the database management system if:
• The column may assume default values.
• The database administrator has not defined a user default for the column.
• The user has not provided a column value for a row.
Selection of Default Values
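[Visual: decision diagram for the value a column assumes when none is provided for a row.
Outcomes: NULL, the system default (WITH DEFAULT specified), the user-provided default
value, or a value must be provided.]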
Notes:
When defining a column for a table, you specify if the column may assume default values
and which default value it should assume. This is controlled by the WITH DEFAULT
keywords.
If the column is nullable and you do not specify WITH DEFAULT, the implicit default for the
column is the NULL value. That is, the column will not contain a value for a row, if the user
does not provide a value for the row on inserts.
If you specify WITH DEFAULT for nullable columns, the default value assumed depends on
whether or not you have provided your own default value. If you have not provided a default
value, the column will assume the system default value for the category of its data type. If
you provide your own default value, you can specify any value compatible with the data type
for the column or explicitly request that the column be set to NULL.
Similarly, for columns that always must assume a value (NOT NULL), you can request that
they assume a default value for a row if a value has not been provided. If you specify WITH
DEFAULT, but do not provide your own default value, the system default for the appropriate
category of data type is assumed. If you provide your own default value, it is assumed.
Finally, if you do not specify WITH DEFAULT for a column that always must have a value, a
value must be provided for every row inserted; otherwise, the request fails.
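Three of these cases can be tried out with the following sketch. It uses Python's sqlite3 module as a stand-in for the target database management system, which is an assumption with consequences: SQLite writes the clause as DEFAULT value rather than WITH DEFAULT and has no system defaults, so that case is not shown. The default values 'PW' and 'ACTIVE' and the column status are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE engine (
    engine_number     TEXT NOT NULL PRIMARY KEY,
    engine_type       TEXT NOT NULL,               -- must have a value, no default
    manufacturer_code TEXT NOT NULL DEFAULT 'PW',  -- must have a value, user default
    aircraft_number   TEXT,                        -- nullable, implicit default NULL
    status            TEXT DEFAULT 'ACTIVE'        -- nullable, user default
)
""")

# Only the columns without a default are provided.
conn.execute(
    "INSERT INTO engine (engine_number, engine_type)"
    " VALUES ('PW9880195', 'PW4062')"
)
row = conn.execute(
    "SELECT manufacturer_code, aircraft_number, status FROM engine"
).fetchone()

# No value for a NOT NULL column without a default: the request fails.
try:
    conn.execute("INSERT INTO engine (engine_number) VALUES ('PW9882345')")
    insert_failed = False
except sqlite3.IntegrityError:
    insert_failed = True

print(row, insert_failed)
```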
Considerations for Abstract Data Types
• You must ensure that the data for the abstract data type are properly
  represented in the database
• You must ensure that the data for the abstract data type satisfy any
  value and length constraints imposed on them
• You must ensure that the desired operations, and only those, can
  be performed with data of the abstract data type
Notes:
When implementing abstract data types as discussed in Unit 5 - Data and Process
Inventories, the following considerations apply.
• Each abstract data type has its own set of allowable values and you must ensure that
the values are properly represented in the database of the target database management
system.
In some cases (e.g., for our sample abstract data type called name data), you want to
store the data in a normalized format. Thus, you must ensure that the data in the
database are in the normalized format.
• Abstract data types can be parameterized. In particular, they may allow you to specify
minimum and maximum lengths for each usage by data elements. Thus, when
implementing the abstract data type, you must ensure that the length constraints for
data elements are reflected as constraints for the columns and enforced for each usage.
In addition, the data elements of the application domain may have domains, i.e., value
constraints further restricting the values of their abstract data types. When defining a
column based on such a data element, you must ensure that its value constraints are
adhered to.
• Abstract data types generally provide a set of operations. When implementing an
abstract data type, you must ensure that these operations can be performed. You must
also ensure that other operations, which are illegal for the abstract data type, cannot be
performed with its data.
• If data can be entered by end users in different formats, but you want to store the data in
a normalized format, you should provide functions converting the external input into the
normalized format. You need these functions for comparing entered data with the stored
data (e.g., in the WHERE clause of SELECT statements). If the data entered were
compared directly with the stored data, you would not necessarily find the requested
data.
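The last consideration can be sketched as follows. The normalized format chosen here (surrounding blanks removed, single blanks between words, upper case) is purely hypothetical, as are the function names and the table passenger; Python's sqlite3 module stands in for the target database management system. The point is only that the same conversion function is applied before storing and before comparing.

```python
import sqlite3


def normalize_name(external_input):
    """Convert external input into the (hypothetical) normalized format."""
    return " ".join(external_input.split()).upper()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passenger (last_name TEXT NOT NULL)")


def store_name(external_input):
    """Store the name in the normalized format only."""
    conn.execute(
        "INSERT INTO passenger VALUES (?)", (normalize_name(external_input),)
    )


def count_matches(external_input, normalize=True):
    """Count stored names equal to the input, with or without conversion."""
    value = normalize_name(external_input) if normalize else external_input
    return conn.execute(
        "SELECT COUNT(*) FROM passenger WHERE last_name = ?", (value,)
    ).fetchone()[0]


store_name("  van  der Berg ")
print(count_matches("Van der berg"))                   # converted before comparing
print(count_matches("Van der berg", normalize=False))  # compared directly
```

Compared directly, the entered data do not match the stored data even though they denote the same name.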
User Defined Distinct Types
• Allow you to define your own data types based on built-in data types
• Cannot be parameterized
• Always have a fixed maximum length
  - Even if based on a varying-length built-in data type
  - For a varying-length source data type, the maximum length is the
    length specified when the user defined distinct type is created
  - Cannot specify a different (smaller) maximum length when the user
    defined distinct type is used by a column
  - Must define it with the maximum length intended for any columns
    and restrict the actual column lengths by other means
• Disallow all operations for source data type except comparisons
  - Can only compare data of same user defined distinct type
  - Cannot compare directly with data of source data type
  - Must cast to source data type to compare with source data type
• Prevents illegal operations and incorrect comparisons
Notes:
User defined distinct types (UDTs) allow you to define your own data types based on the
built-in data types provided by the target database management system. However, they are
fairly simple-minded data types and cannot be parameterized.
When you create a user defined distinct type, you must select a built-in data type.
The built-in data type is referred to as source data type. If the source data type allows you
to specify a length, a number of digits, or a number of decimal places, you must specify the
appropriate values when you create the user defined distinct type.
Even if the user defined distinct type is based on a varying-length built-in data type, you
cannot specify a length later when the user defined distinct type is used as data type for a
column. The maximum length for the column is that defined for the user defined distinct
type. If you want to use the same user defined distinct type for multiple columns, you must
define it with the maximum length for any anticipated columns. You must use other means
to restrict the actual lengths of the columns. Alternatively, use different user defined distinct
types.
With the exception of the comparison operations, the user defined distinct type does not
inherit any functions or operations of its source data type. Without further actions, you
cannot use any scalar or column functions for the source data type.
The comparison operations inherited are limited to the comparison of data belonging to the
user defined distinct type. You cannot directly compare data of the user defined distinct
type with data of the source data type. When you create a user defined distinct type, cast
functions are provided allowing you to change source data to data of the user defined
distinct type and vice versa. The cast data can then be compared with data of the
appropriate data type. The cast function changing data of the source data type to data of
the user defined distinct type has the same name as the user defined distinct type:
udt-name(source-data) → udt-data
The cast function changing data of the user defined distinct type to data of the source data
type has the same name as the source data type:
source-name(udt-data) → source-data
By using user defined distinct types, you can prevent illegal operations and incorrect
comparisons for columns of the same source data type having different semantics.
User defined distinct types are not supported by all target database management systems.
User Defined Distinct Types - Example
[Visual: a comparison of columns with different user defined distinct types is marked
ILLEGAL.]
Notes:
The example on the visual creates two user defined distinct types: One user defined
distinct type is based on built-in data type DECIMAL and is intended to represent
measurements in meters; the other is based on built-in data type INTEGER and is
supposed to represent measurements in centimeters. Their names are METER and CM,
respectively.
When creating the table for tuple type AIRCRAFT MODEL, columns Length_of_Model and
Height_of_Model are defined with user defined distinct type CM. Column Wing_Span is
defined with user defined distinct type METER. (Note that you should really have defined
all three dimensions with the same user defined distinct type.)
If you want to determine all aircraft models whose length is smaller than their wing span,
you cannot specify Length_of_Model < Wing_Span in the WHERE clause of the SELECT
statement. This is because the user defined distinct types of Length_of_Model and
Wing_Span are different. The comparison would indeed provide an incorrect result and is,
therefore, considered illegal.
For a valid and correct comparison, you must cast the two columns to their source data
types and convert meters to centimeters in the WHERE clause:
WHERE INTEGER(Length_of_Model) < 100 * DECIMAL(Wing_Span)
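The behavior of the two distinct types can be imitated outside the database. The following Python sketch is an analogy, not the way a database management system implements user defined distinct types: the classes DistinctType, Meter, and Cm are hypothetical, comparisons are allowed only between values of the same class, and to_source plays the role of the cast function to the source data type.

```python
from decimal import Decimal


class DistinctType:
    """Value of a distinct type that can only be compared with its own kind."""

    source_type = None  # set by each subclass

    def __init__(self, value):
        self._value = self.source_type(value)   # cast: source data -> UDT data

    def to_source(self):
        return self._value                      # cast: UDT data -> source data

    def _require_same_type(self, other):
        if type(other) is not type(self):
            raise TypeError(
                f"cannot compare {type(self).__name__} with {type(other).__name__}"
            )

    def __eq__(self, other):
        self._require_same_type(other)
        return self._value == other._value

    def __lt__(self, other):
        self._require_same_type(other)
        return self._value < other._value


class Meter(DistinctType):
    source_type = Decimal


class Cm(DistinctType):
    source_type = int


length_of_model = Cm(5940)        # A340 length in centimeters
wing_span = Meter("60.30")        # A340 wing span in meters

try:
    length_of_model < wing_span   # different distinct types: illegal
    comparison_allowed = True
except TypeError:
    comparison_allowed = False

# Valid comparison: cast to the source types and convert meters to centimeters.
shorter_than_span = length_of_model.to_source() < 100 * wing_span.to_source()
print(comparison_allowed, shorter_than_span)
```

Without the factor 100, the comparison of the source values would run but give the wrong answer, which is exactly what the distinct types are meant to prevent.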
User Defined Functions (UDFs)
[Visual: User Defined Functions are divided into External Functions and Sourced Functions.]
• Allow you to write your own functions
  - To be used in SQL DML statements
  - To be used in SQL DDL statements
• External functions are based on programs written by you
• Sourced functions based on existing built-in or user defined functions
  - Allow to extend existing functions to new user defined distinct types
• Scalar functions are passed arguments and return a single value
• Column functions are passed a column and return a single value
• Table functions are passed arguments and return a table
  - One row for each invocation
  - Can only be used in FROM clause
Notes:
User defined functions (UDFs) allow you to write your own functions for use in SQL
statements. The user defined functions provided by you can be used in Data Manipulation
Language (DML) statements or Data Definition Language (DDL) statements. DML
statements are SELECT, INSERT, UPDATE, or DELETE. DDL statements are SQL
statements creating, altering, and deleting database objects, such as tables, indexes, user
defined distinct types, or user defined functions.
User defined functions can either be external functions or sourced functions. External
functions are based on programs, written in any of the programming languages supported
by the target database management system, that you provide. Of course, the functions
have to follow certain conventions concerning the passing and returning of arguments, but,
in the programs, you can pretty much do what you want. Depending on the database
management system, you may even issue SQL statements.
Sourced functions are based on existing built-in (system provided) functions or existing
user defined functions. Their primary purpose is to extend existing functions (e.g., the AVG
function or the LENGTH function) for the source data type to a newly created user defined
distinct type. They also allow you to rename an existing built-in function or user defined
function.
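As a minimal sketch of both purposes of sourced functions, the following statements extend the AVG built-in function to a distinct type and make the LENGTH built-in function available under another name. The distinct type KILOGRAMS and the function name STRLEN are assumptions chosen for illustration.

```sql
-- Hypothetical distinct type; name and source type are assumptions.
CREATE DISTINCT TYPE KILOGRAMS AS INTEGER WITH COMPARISONS;

-- Extends the built-in AVG function to the new distinct type.
CREATE FUNCTION AVG(KILOGRAMS)
  RETURNS KILOGRAMS
  SOURCE SYSIBM.AVG(INTEGER);

-- Makes the built-in LENGTH function available under another
-- (hypothetical) name.
CREATE FUNCTION STRLEN(VARCHAR(100))
  RETURNS INTEGER
  SOURCE SYSIBM.LENGTH(VARCHAR());
```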
User defined functions can be scalar functions, column functions, or table functions. Scalar
functions are passed a set of arguments and return a single value. An example of a built-in
scalar function is the LENGTH function which returns the length of the expression
(argument) passed to it.
Column functions are passed the values of a column (or a subset thereof) and return a
single value which generally is derived from the values of the column. An example of a
built-in column function is the MIN function which returns the minimum of the column
values passed to it.
Table functions are passed a set of arguments and return a table row for each invocation.
They can only be used in the FROM clause of SELECT statements.
External functions can either be scalar functions or table functions. They cannot be column
functions. Sourced functions can only be scalar functions or column functions.
You can overload functions. You can define multiple functions with the same name as long
as the signatures of the various functions are different. This means that the number of
parameters or the data type of at least one parameter must be different. Based on the data
types of the arguments passed,
the database management system is capable of selecting the proper function.
User defined functions are not supported by all target database management systems.
UDFs - Definition and Invocation
CREATE DISTINCT TYPE TEXTDATA
  AS VARCHAR(100)
  WITH COMPARISONS

CREATE FUNCTION NORM(TEXTDATA)
  RETURNS TEXTDATA
  EXTERNAL NAME 'program'
  LANGUAGE programming-language
  ...
(NORM checks the text data string for correctness and converts it into stored format, i.e., normalizes it. The visual shows the program as part of a program library.)

CREATE FUNCTION SUBSTR(TEXTDATA, INTEGER, INTEGER)
  RETURNS VARCHAR(100)
  SOURCE SYSIBM.SUBSTR(VARCHAR(), INTEGER, INTEGER)
Notes:
This visual illustrates the definition of an external scalar and a sourced scalar user defined
function using user defined distinct type TEXTDATA.
The first user defined function, called NORM, checks data of user defined distinct type
TEXTDATA passed to it for correctness (it may only contain certain characters) and
converts it into a normalized text-data format.
Since the function is passed arguments and returns a single value, it is a scalar function.
When you define a function, you must describe the signature of the function. You must
specify:
• The name of the function.
• The data type(s) (including lengths) of the arguments passed to the function (i.e., of the
parameters for the function) or of the column passed. The latter applies to column
functions.
You must also describe the output returned. For scalar or column functions, you must
specify the data type of the value returned. For table functions, you must specify the names
and the data types of the columns returned.
Function NORM is an external scalar function since it is based on a user-provided program.
To allow the database management system to establish the connection to the program
when the function is used, the object program to be executed must be identified when the
function is defined. The programming language in which the program has been written
must be specified as well.
The second function on the visual is a sourced function extending built-in function SUBSTR
to text data. When you define it, you must again specify its signature and output for the new
data type. Furthermore, you must tell the system on which existing function it is based
(SOURCE). For the source function, you must provide its signature as well. For the
parameters of the source function, you need not provide lengths or decimal places since
they are already known to the system. However, you must specify the enclosing
parentheses if the data type has parameters.
The qualifier SYSIBM in the example identifies the source function as a built-in function of
an IBM database management system.
A user defined function is invoked by specifying its name followed, in parentheses, by the
arguments passed to the function.
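The invocation of the two functions from the visual might look as follows. This is only a sketch: it assumes that TEXTDATA, NORM, and SUBSTR have been defined as on the visual, and the table MAINTENANCE_NOTE with its columns is hypothetical.

```sql
-- Hypothetical table with a TEXTDATA column.
CREATE TABLE MAINTENANCE_NOTE
  ( Note_Id   INTEGER NOT NULL,
    Note_Text TEXTDATA );

-- NORM expects a TEXTDATA argument. The character string is first cast
-- to TEXTDATA by the system-provided cast function.
INSERT INTO MAINTENANCE_NOTE (Note_Id, Note_Text)
  VALUES (1, NORM(TEXTDATA('left engine checked')));

-- The sourced SUBSTR function accepts the TEXTDATA column directly.
SELECT Note_Id, SUBSTR(Note_Text, 1, 20)
  FROM MAINTENANCE_NOTE;
```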
Check Constraints
• Allow you to restrict acceptable values for columns
• Can be defined on column or table level
  - On column level, to restrict accepted values for column concerned
  - On table level, to restrict accepted values for columns of table in relationship to each other
• Basically, check expression is a search condition evaluating to true, false, or unknown
  - Predicates can be combined by AND and OR
• Restrictions for check expressions depending on target DBMS
  - For example, for DB2 UDB for OS/390:
    - Subselects not allowed
    - Built-in or user-defined functions not allowed
    - EXISTS and quantified predicates not allowed
    - CASE expressions not allowed
    - First operand of predicate must be a column
  - For example, for DB2 UDB for UNIX- and Intel-Based Platforms:
    - Subselects not allowed
    - Some restrictions on use of user-defined functions
• Enforced during the insertion, updating, and loading of rows
Notes:
Check constraints allow you to restrict the accepted values for columns of tables beyond
the values permitted by the column's data type.
Check constraints can be defined on the column level or on the table level. This means that
they can be defined for a particular column or for the table as such. When a check
constraint is defined for a column, it can just restrict the values for the column concerned.
References to other columns are not allowed.
In contrast, a check constraint that is defined on the table level can refer to any defined
column of the table. Thus, it can restrict the values of columns in relationship to each other.
For example, you may enforce that the value of one column does not exceed the value of
another column of the same row.
Check constraints use check expressions. Basically, a check expression is a search
condition evaluating to true, false, or unknown. It may consist of predicates combined by
the logical operators AND and OR. A predicate specifies a condition that is true, false, or
unknown. The result is unknown, for example, when a value is compared with NULL.
If the check expression for a constraint evaluates to true or unknown, the constraint is
considered as satisfied.
Some database management systems impose severe restrictions on the check
expressions of check constraints. The visual lists some for DB2 Universal Database for
z/OS and DB2 Universal Database for UNIX- and Intel-Based Platforms. For the precise
restrictions, see the reference manuals for your database management system.
Check constraints are enforced during the insertion, updating, and loading of rows.
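Check constraints need not be defined when the table is created. They can be added later.
Depending on the database management system and its settings, the existing rows may be
verified when a check constraint is added, or the constraint may only be enforced during
subsequent operations. See the reference manuals for your database management system.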
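A table-level check constraint that is added after the table has been created could be sketched as follows. The table, column, and constraint names are assumptions made for illustration.

```sql
-- Hypothetical table; all names are assumptions.
CREATE TABLE AIRCRAFT_TYPE_SEATS
  ( Type_Code         CHAR(4)  NOT NULL,
    Seats_Total       SMALLINT NOT NULL,
    Seats_First_Class SMALLINT NOT NULL );

-- Table-level check constraint relating two columns of the same row,
-- added after the table has been created.
ALTER TABLE AIRCRAFT_TYPE_SEATS
  ADD CONSTRAINT Seats_Consistent
  CHECK (Seats_First_Class <= Seats_Total);
```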
Check Constraints - Examples
Notes:
The first example on the visual illustrates how abstract data type AIRPORT CODE defined
in Unit 5 - Data and Process Inventories could be implemented. The abstract data type has
a finite set of values, namely, the three-letter codes for airports. Columns of the abstract
data type could be defined as 3-character columns with the check constraint shown on the
visual. The check expression for the check constraint uses the IN predicate listing the valid
character strings. On the visual, only a few values are shown as indicated by the ellipsis.
The second example on the visual implements the domain for data element Number of
Engines defined in Unit 5 - Data and Process Inventories. It uses the BETWEEN predicate
to enforce that the values for column Number_of_Engines, i.e., the number of engines for
an aircraft type, are between 0 and 4.
Note that check constraints can be named.
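The two examples described above might be coded as follows. This is a sketch, not the content of the visual: the table names, the constraint names, the additional columns, and the three airport codes listed are assumptions, and the list of codes is abbreviated.

```sql
-- Abstract data type AIRPORT CODE as a 3-character column with an
-- IN predicate (list of valid codes abbreviated).
CREATE TABLE AIRPORT
  ( Airport_Code CHAR(3) NOT NULL
      CONSTRAINT Valid_Airport_Code
      CHECK (Airport_Code IN ('FRA', 'JFK', 'LHR')),
    Airport_Name VARCHAR(60) );

-- Domain for Number of Engines enforced by a BETWEEN predicate.
CREATE TABLE AIRCRAFT_TYPE
  ( Type_Code         CHAR(4) NOT NULL,
    Number_of_Engines SMALLINT
      CONSTRAINT Valid_Number_of_Engines
      CHECK (Number_of_Engines BETWEEN 0 AND 4) );
```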
Triggers
A trigger is a set of actions to be performed when a specific event occurs
Notes:
A trigger defines a set of actions to be performed when a specific event occurs. Triggers are
defined for tables. The execution of the actions for the trigger can be triggered by insert,
update, or delete operations on the table for the trigger.
Triggers can be used to cause updates to other tables; automatically generate or transform
values for inserted or updated rows; or invoke functions to perform tasks such as issuing
alerts.
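As an illustration of a trigger causing an update to another table, consider the following sketch in DB2 syntax. The tables FLIGHT and BOOKING, their columns, and the trigger name BOOKADD are hypothetical.

```sql
-- Hypothetical tables; all names are assumptions.
CREATE TABLE FLIGHT
  ( Flight_Number CHAR(6) NOT NULL,
    Seats_Booked  INTEGER NOT NULL );

CREATE TABLE BOOKING
  ( Booking_Id    INTEGER NOT NULL,
    Flight_Number CHAR(6) NOT NULL );

-- After each inserted booking, the seat counter of the flight is
-- increased. The statement terminator to be used for a trigger body
-- containing semicolons depends on the tool submitting the statement.
CREATE TRIGGER BOOKADD
  AFTER INSERT ON BOOKING
  REFERENCING NEW AS N
  FOR EACH ROW MODE DB2SQL
  BEGIN ATOMIC
    UPDATE FLIGHT
       SET Seats_Booked = Seats_Booked + 1
     WHERE Flight_Number = N.Flight_Number;
  END
```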
Triggers are a useful mechanism to define and enforce transitional business rules, i.e.,
rules involving different states of the data. Using triggers places the logic to enforce the
business rules in the database and relieves the business processes using the tables from
having to enforce it. Centralized logic means easier maintenance since no program
changes are required when the logic changes.
The following items must be considered when defining a trigger:
Notes:
Triggers can reference the values of the affected rows. They can refer to the values before
the execution (update or delete operations) and/or after the execution (update or insert
operations) of the triggering SQL operation. The appropriate version of the data (OLD or
NEW) can be identified by means of the REFERENCING clause when defining the trigger.
As mentioned for the previous visual, before triggers can change the values of columns of
the affected rows. They can do this by setting transition variables via the SET transition-
variable SQL statement. Transition variables use the names of the columns, qualified by a
correlation name assigned to the version of the data via the REFERENCING clause. The
SET transition-variable SQL statement is also referred to as SET assignment SQL
statement.
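The use of the REFERENCING clause and of transition variables can be sketched as follows. The table SEAT_INVENTORY, its columns, and the trigger name INVUPD are assumptions; the rule enforced (a negative new value is replaced by the old value) is chosen only to show both versions of the data.

```sql
-- Hypothetical table; all names are assumptions.
CREATE TABLE SEAT_INVENTORY
  ( Flight_Number CHAR(6) NOT NULL,
    Seats_Free    INTEGER NOT NULL );

-- O refers to the row before the update, N to the row after the update.
-- The SET transition-variable statement changes the value to be stored.
CREATE TRIGGER INVUPD
  NO CASCADE BEFORE UPDATE OF Seats_Free ON SEAT_INVENTORY
  REFERENCING OLD AS O NEW AS N
  FOR EACH ROW MODE DB2SQL
  WHEN ( N.Seats_Free < 0 )
  BEGIN ATOMIC
    SET N.Seats_Free = O.Seats_Free;
  END
```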
In contrast to check expressions which, most of the time, are more restrictive, triggers can
generally use built-in functions and user defined functions. The functions can be used by
the search condition of the WHEN clause as well as by the triggered actions.
The actions of after triggers can cause other triggers to fire, namely, triggers for the tables
maintained by the triggered actions. Since INSERT, UPDATE, and DELETE statements are
not permitted for before triggers, they cannot cause other triggers to fire.
Multiple triggers can be defined for the same table. You can even define multiple triggers for
the same event. If multiple triggers are defined for the same event, the trigger created first
fires first.
There is one drawback associated with triggers: triggers are not effective during the loading
of data.
Not all of the target database management systems support triggers. The various database
management systems supporting triggers may have restrictions. However, in general, the
restrictions are minor and less severe than the restrictions for check constraints.
Notes:
Now, we want to illustrate the implementation of a sample abstract data type. We have
chosen abstract data type Name Data described in Unit 5 - Data and Process Inventories.
Its description is repeated on the visual. Its values consist of strings of letters, blanks, and
single dashes (-) or periods (.).
There are two operations defined for the abstract data type. The Normalization operation
(NORM) removes all leading and trailing blanks from a name-data string; reduces
intermediate groups of blanks to a single blank each; and uppercases all letters. In other
words, it produces a normalized version of the string.
The Equal Comparison operation (EQUAL) defines when two name-data strings are
considered equal. They are considered equal if their normalized versions are the same.
As you can see from the signature of the data type, it is parameterized. For a data element
using it, the minimum length and the maximum length of the accepted strings can be
specified.
In the database, we want to store all data in the normalized format.
Setting Up the Abstract Data Type
Normalization function:
CREATE FUNCTION NORM(NAMEDATA)
  RETURNS NAMEDATA
  EXTERNAL NAME 'program'
  LANGUAGE programming-language
  ...
• Checks name data string for valid name data
• Returns nonzero SQL state if not valid name data
• Returns zero SQL state and normalized name data string otherwise

CREATE FUNCTION LENGTH(NAMEDATA)
  RETURNS INTEGER
  SOURCE SYSIBM.LENGTH(VARCHAR())
• Extends LENGTH built-in function to name data
• Required for enforcing length ranges for columns
Notes:
The approach chosen for the implementation of the abstract data type uses a user defined
distinct type for the abstract data type because we want to discuss some related problems.
It prevents the comparison of character strings that are not name data with name data. It
would be possible to implement the abstract data type without a user defined distinct type
which has some advantages, but also some disadvantages.
First, we define a user defined distinct type called NAMEDATA consisting of varying-length
character strings. When defining the user defined distinct type, you must provide a
maximum length for the source data type. Since the abstract data type is parameterized,
we need to specify the maximum length that any columns using it may have. However, you
must choose the maximum length carefully to ensure that the rows for the tables will fit into
the pages for the tables. The system will enforce this when the tables are created. Thus,
the lengths of the candidate columns should not vary too much.
In the example, we have restricted the maximum length of NAMEDATA columns to 100
characters. The columns using the data type may use smaller maximum lengths. We must
enforce the length ranges for the columns by other means. We will see later on how this
can be achieved.
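The definition of the user defined distinct type described above, and a table using it, might look as follows. The CREATE DISTINCT TYPE statement follows the pattern shown earlier for TEXTDATA; the table MANUFACTURER and its columns are hypothetical.

```sql
-- Distinct type for name data with the overall maximum length of 100.
CREATE DISTINCT TYPE NAMEDATA AS VARCHAR(100) WITH COMPARISONS;

-- Hypothetical table with a name-data column. The individual length
-- range for the column (for example, 2 to 40 characters) cannot be
-- expressed here and must be enforced by other means.
CREATE TABLE MANUFACTURER
  ( Manufacturer_Id   INTEGER  NOT NULL,
    Manufacturer_Name NAMEDATA NOT NULL );
```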
Next, we define a user defined function, called NORM, corresponding to the Normalization
function. However, it is not quite the Normalization function since it performs some
additional validity checking. The function is an external scalar function accepting strings of
user defined distinct type NAMEDATA. It checks if the input string has a valid name-data
format, i.e., only contains letters (small or capital), blanks, and single dashes or periods. If
the string is invalid, a nonzero SQL state is returned by the function.
If the input string is valid, the function returns a zero SQL state and converts the input string
to its normalized name-data format. The data type for the output is NAMEDATA, the user
defined distinct type. Note that user defined functions must return an SQL state in
addition to the output described in their definition. The SQL state is checked by the target
database management system to determine whether to continue or terminate the operation being
performed.
The function does not immediately accept variable-length character strings that are not of
type NAMEDATA. If you want to use it to convert other character strings to normalized
name data, you must first apply the system-provided cast function for user defined distinct
type NAMEDATA:
NORM(NAMEDATA(character-string))
On the visual UDFs - Definition and Invocation, we defined a user defined function NORM whose only input parameter was
of type TEXTDATA, another user defined distinct type. Note that both user defined
functions may exist at the same time because their signatures are different.
Since the enforcement of the length ranges for the columns needs to determine the length
of input data, we must extend the LENGTH built-in function to user defined distinct type
NAMEDATA. This is done by the second user defined function on the visual, a sourced
scalar function.
INSERT Triggers for Abstract Data Type
MODE DB2SQL
WHEN ( LENGTH(N.column-name)
NOT BETWEEN minimum-length AND maximum-length )
BEGIN ATOMIC
SIGNAL SQLSTATE '72001' ('INVALID COLUMN LENGTH');
END
Notes:
By means of the user defined functions on the previous visual, we can enforce that:
• The data of the column is always valid, i.e., only contains characters and character
sequences permitted for the abstract data type.
• The data of the column is stored in normalized format: leading and trailing blanks are
removed, intermediate blanks are reduced to a single blank each, and alphabetical
characters are in upper case.
• The minimum length and the maximum length for the column are observed.
For insert operations, this can be achieved by the two triggers on this visual. The triggers
are defined for each table containing name-data columns. The table must be created
before the triggers for the table can be defined.
Symbolic variables (in italics) are used in the CREATE TRIGGER statements on the visual.
If you want to create the triggers, you must replace them by the actually applicable values.
Table-name, column-name, minimum-length, and maximum-length must be replaced by the
name of the table, the name of the column, the minimum length for the column, and the
maximum length, respectively.
Both triggers are activated for each row to be inserted. They are activated before the row is
inserted. The first trigger (INSNAME1) uses user defined function NORM, in a SET
transition-variable SQL statement, to verify the correctness of the input string and to
normalize it. You need the correlation name defined via the REFERENCING clause on both
sides of the equal sign. On the right-hand side, you need it for the user defined function to
refer to the entered value for the new row. On the left-hand side, you need it because you
are changing the column value for the new row.
If the user defined function returns a zero SQL state, the value of the column for the row
becomes the normalized string and this will be the value inserted. If the user defined
function returns a nonzero SQL state, the SET transition-variable SQL statement fails and
the INSERT statement fails.
Note that the input string for the column is of type NAMEDATA when the trigger receives it.
A character string entered as input for the column in the INSERT statement is converted to
type NAMEDATA by the cast function for the user defined distinct type.
The second trigger (INSNAME2) ensures that the length range for the column is enforced
during insert operations. The WHEN clause checks if the length of the column is outside
the range defined by minimum-length and maximum-length. If it is outside, the WHEN
condition evaluates to true and a nonzero SQL state is signaled by means of the SIGNAL
SQLSTATE SQL statement. The nonzero SQL state causes the INSERT statement to
terminate.
The sequence in which the triggers are defined is relevant. The triggers must be created in
the sequence on the visual. As a consequence, the length check is performed for the
normalized string (which may be shorter) and not for the original input string.
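With the symbolic variables replaced, the two insert triggers might read as follows. This is a sketch under assumptions: the table MANUFACTURER, the column Manufacturer_Name, and the length range 2 to 40 are hypothetical, FOR EACH ROW is inferred from the description above, and the functions NORM and LENGTH for NAMEDATA are taken to exist as defined on the previous visual.

```sql
-- First trigger: verifies and normalizes the entered value.
-- It must be created before INSNAME2 so that it fires first.
CREATE TRIGGER INSNAME1
  NO CASCADE BEFORE INSERT ON MANUFACTURER
  REFERENCING NEW AS N
  FOR EACH ROW MODE DB2SQL
  BEGIN ATOMIC
    SET N.Manufacturer_Name = NORM(N.Manufacturer_Name);
  END

-- Second trigger (a separate statement): enforces the length range
-- for the normalized value.
CREATE TRIGGER INSNAME2
  NO CASCADE BEFORE INSERT ON MANUFACTURER
  REFERENCING NEW AS N
  FOR EACH ROW MODE DB2SQL
  WHEN ( LENGTH(N.Manufacturer_Name) NOT BETWEEN 2 AND 40 )
  BEGIN ATOMIC
    SIGNAL SQLSTATE '72001' ('INVALID COLUMN LENGTH');
  END
```

The statement terminator to be used between the two CREATE TRIGGER statements depends on the tool submitting them.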
You may ask whether a check constraint for the column could be used instead of the
second trigger. The restrictions for check constraints are generally more severe than those
for triggers and your database management system may not allow you to use an equivalent
check constraint. For example, Version 6 of DB2 Universal Database for z/OS does not
allow you to use built-in functions or user defined functions in check expressions. In
addition, the result would not be quite the same. A check expression would verify the length
of the unnormalized string whereas the trigger verifies the length of the normalized string.
If you have multiple name-data columns for a table, you need not have two triggers for each
column. In the first trigger, you can use multiple SET transition-variable SQL statements as
triggered actions to check and normalize all columns. In the second trigger, you can
combine the length checks for all columns by logical ORs.
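For a table with two name-data columns, the combined triggers could be sketched as follows. The table PILOT, its columns, the length ranges, and the trigger names are assumptions; NORM and LENGTH for NAMEDATA are taken to exist as described earlier.

```sql
-- Hypothetical table with two name-data columns.
CREATE TABLE PILOT
  ( Pilot_Id   INTEGER  NOT NULL,
    First_Name NAMEDATA NOT NULL,
    Last_Name  NAMEDATA NOT NULL );

-- First trigger: one SET transition-variable statement per column.
CREATE TRIGGER INSPIL1
  NO CASCADE BEFORE INSERT ON PILOT
  REFERENCING NEW AS N
  FOR EACH ROW MODE DB2SQL
  BEGIN ATOMIC
    SET N.First_Name = NORM(N.First_Name);
    SET N.Last_Name  = NORM(N.Last_Name);
  END

-- Second trigger (a separate statement): the length checks for
-- all columns are combined by OR.
CREATE TRIGGER INSPIL2
  NO CASCADE BEFORE INSERT ON PILOT
  REFERENCING NEW AS N
  FOR EACH ROW MODE DB2SQL
  WHEN ( LENGTH(N.First_Name) NOT BETWEEN 1 AND 30
      OR LENGTH(N.Last_Name)  NOT BETWEEN 2 AND 40 )
  BEGIN ATOMIC
    SIGNAL SQLSTATE '72001' ('INVALID COLUMN LENGTH');
  END
```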
UPDATE Triggers for Abstract Data Type
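[Visual only partially preserved: the second update trigger, defined with MODE DB2SQL and a WHEN ( LENGTH(N.column-name) ... ) condition, checks the column length and sets an SQL state.]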
Notes:
This visual illustrates the triggers needed for update operations to ensure the correctness
of the new column values; to ensure the observance of length constraints for the column;
and to store the new column values in normalized format.
Both triggers are activated for each row before the row is updated. They are only activated
if the appropriate column is updated (UPDATE OF column-name). Otherwise, the same
remarks apply as to the triggers for insert operations.
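By analogy with the insert triggers, the update triggers might be sketched as follows. The trigger names UPDNAME1 and UPDNAME2, the table MANUFACTURER, the column Manufacturer_Name, and the length range are assumptions; only the UPDATE OF column-name clause and the general structure are taken from the text.

```sql
-- First trigger: verifies and normalizes the new value, but only if
-- the name-data column is updated.
CREATE TRIGGER UPDNAME1
  NO CASCADE BEFORE UPDATE OF Manufacturer_Name ON MANUFACTURER
  REFERENCING NEW AS N
  FOR EACH ROW MODE DB2SQL
  BEGIN ATOMIC
    SET N.Manufacturer_Name = NORM(N.Manufacturer_Name);
  END

-- Second trigger (a separate statement): enforces the length range
-- for the normalized new value.
CREATE TRIGGER UPDNAME2
  NO CASCADE BEFORE UPDATE OF Manufacturer_Name ON MANUFACTURER
  REFERENCING NEW AS N
  FOR EACH ROW MODE DB2SQL
  WHEN ( LENGTH(N.Manufacturer_Name) NOT BETWEEN 2 AND 40 )
  BEGIN ATOMIC
    SIGNAL SQLSTATE '72001' ('INVALID COLUMN LENGTH');
  END
```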
INSERT INTO table-name ( . . . , column-name , . . . )
VALUES( . . . , 'wright bros.' , . . . )
(Normalized to 'WRIGHT BROS.'; length checks by second trigger)
UPDATE table-name
SET column-name = 'wright bros..'
Notes:
The above visual illustrates the flow of control and the conversions of input during insert
and update operations.
If a character string is assigned to a field defined as NAMEDATA, the system-provided cast
function for the user defined distinct type is automatically invoked. Thus, you need not
invoke it yourself. It casts the character string to user defined distinct type NAMEDATA, the
data type of the input parameter for user defined function NORM.
Next, the first (insert or update) trigger is activated which uses user defined function NORM
to normalize the value and assigns the normalized value to the column for the row. If the
value passes the length checks of the second trigger, the row is stored with the normalized
value for the column.
In the first example, an insert request, string 'wright bros.' is converted to
'WRIGHT BROS.' which then is assigned to the column and stored since it passes the
length checks.
The second example on the visual illustrates, for an update request, what happens if the
new value for a column, defined as NAMEDATA, is invalid. User defined function NORM,
called by the SET transition-variable SQL statement of the triggered action, determines that
the input string does not have a valid name-data format (two successive periods). It returns
a nonzero SQL state which is passed on by the trigger and causes the update request to
fail.
SELECT . . .
FROM table-name
WHERE column-name = NORM ( NAMEDATA ( string ) )
Notes:
To retrieve specific rows based on a search condition for a column defined as NAMEDATA,
you must use both the system-provided cast function and user defined function NORM.
As described for user defined distinct types, you cannot directly compare values of a
column of a user defined distinct type with values of the source type. Accordingly, you
cannot directly compare the values of a column of user defined distinct type NAMEDATA
with a character string. You must first convert the character string to user defined distinct
type NAMEDATA. Furthermore, the input string should be normalized to ensure that the
corresponding rows are found in the table independent of the way they have been entered.
Both are achieved by first applying system-provided cast function NAMEDATA to the string
and then user defined function NORM:
NORM(NAMEDATA(string))
System-provided cast function NAMEDATA casts the string to user defined distinct type
NAMEDATA. Only then, user defined function NORM can be applied since its input must be
of type NAMEDATA. You cannot apply user defined function NORM directly to the
character string. Since the output of NORM is of type NAMEDATA, it can be compared with
the values of the column.
To avoid the invocation of two functions, you could define an additional user defined
function whose input parameter is of type VARCHAR(); whose output is of type
NAMEDATA; and which normalizes the input string.
Note that you will receive an operands-not-comparable SQL code when directly
comparing the character string with the values of the columns.
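One possible way to provide the additional function is a function written in SQL that wraps the two invocations. This is only a sketch and assumes that the target database management system supports SQL functions; otherwise, an external function would be needed. The function name NORMNAME, the table MANUFACTURER, and its columns are hypothetical.

```sql
-- Accepts a character string, casts it to NAMEDATA, and normalizes it.
CREATE FUNCTION NORMNAME (S VARCHAR(100))
  RETURNS NAMEDATA
  LANGUAGE SQL
  RETURN NORM(NAMEDATA(S));

-- The search string can now be normalized with a single invocation.
SELECT Manufacturer_Id
  FROM MANUFACTURER
 WHERE Manufacturer_Name = NORMNAME('wright bros.');
```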
An Alternate Implementation (1 of 2)
CREATE FUNCTION NAMEDATA(VARCHAR(100))
  RETURNS VARCHAR(100)
  EXTERNAL NAME 'program'
  LANGUAGE programming-language
  ...
(VARCHAR(100): maximum length of any name-data columns intended)
Notes:
This visual and the next illustrate an alternate implementation for abstract data type
NAMEDATA. The implementation does not use a user defined distinct type. The name-data
columns for a table are defined with built-in data type VARCHAR(). As length of the
column, the actual maximum length for the column is chosen.
As before, we need a user defined function checking the correctness of input for the
columns and normalizing the input strings. This time, we call the function NAMEDATA. The
data type for its only parameter as well as for its output is VARCHAR(). As length, we use
the maximum length for any anticipated name-data column. This allows us to use the same
function for all columns.
Because we do not use a user defined distinct type, we need not define a sourced user
defined function LENGTH. For determining the length of strings, we can use the LENGTH
built-in function.
An Alternate Implementation (2 of 2)
Triggers use function NAMEDATA and need only check for minimum length
CREATE TRIGGER INSNAME1
NO CASCADE BEFORE INSERT ON table-name
REFERENCING NEW AS N
...
END
Notes:
The triggers needed are basically the same as for the other solution. The only differences
are:
• User defined function NAMEDATA is used instead of user defined function NORM.
• The second trigger only needs to check the minimum length. The maximum length is
enforced by the column length.
Again, you need two triggers each for insert operations and for update operations.
On SELECT statements, you use the NAMEDATA function to normalize the search string.
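The alternate implementation might be sketched as follows for insert operations. The table SUPPLIER, its columns, and the minimum length of 2 are assumptions; FOR EACH ROW MODE DB2SQL is carried over from the first implementation, and user defined function NAMEDATA is taken to exist as defined on the previous visual.

```sql
-- Hypothetical table; the column length is the actual maximum length.
CREATE TABLE SUPPLIER
  ( Supplier_Id   INTEGER     NOT NULL,
    Supplier_Name VARCHAR(40) NOT NULL );

-- First trigger: verifies and normalizes the entered value.
CREATE TRIGGER INSNAME1
  NO CASCADE BEFORE INSERT ON SUPPLIER
  REFERENCING NEW AS N
  FOR EACH ROW MODE DB2SQL
  BEGIN ATOMIC
    SET N.Supplier_Name = NAMEDATA(N.Supplier_Name);
  END

-- Second trigger (a separate statement): only the minimum length
-- needs to be checked; the built-in LENGTH function can be used.
CREATE TRIGGER INSNAME2
  NO CASCADE BEFORE INSERT ON SUPPLIER
  REFERENCING NEW AS N
  FOR EACH ROW MODE DB2SQL
  WHEN ( LENGTH(N.Supplier_Name) < 2 )
  BEGIN ATOMIC
    SIGNAL SQLSTATE '72001' ('INVALID COLUMN LENGTH');
  END

-- Retrieval (a separate statement): the search string is normalized
-- by the same function.
SELECT Supplier_Id
  FROM SUPPLIER
 WHERE Supplier_Name = NAMEDATA('wright bros.');
```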
Notes:
Frequently, the columns of tables contain a well-defined, previously known small set of
values. For table SEAT on the top of the visual, this is the case for columns
SEAT_LOCATION, SEAT_CLASS, and SECTION.
To save space, shorter tokens (often numbers) are frequently stored in the table
instead of the lengthy actual values. Descriptions for the tokens are kept in separate tables
as illustrated in the lower part of the visual. The descriptive tables are referred to as token
translation tables.
To display the rows of the main table with the actual values and not with the tokens, you
need Join operations to fill in the actual values. Even though the token translation tables
are small compared to the main table and their rows will probably be in the buffers of the
database management system, the Join operations may create a performance problem. In
addition, the Join operations will complicate the retrieval of the rows. Furthermore, the
number of tables that can be joined is generally limited.
The use of token translation tables is certainly not recommended if compression is used
since the savings in this case do not warrant the performance degradation and effort.
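The Join operations needed with token translation tables can be sketched as follows. The translation table names, the column Seat_Number, and the data types are assumptions; only table SEAT and its columns SEAT_LOCATION and SEAT_CLASS are named in the text.

```sql
-- Hypothetical token translation tables.
CREATE TABLE SEAT_LOCATION_T
  ( Token       SMALLINT    NOT NULL,
    Description VARCHAR(20) NOT NULL );

CREATE TABLE SEAT_CLASS_T
  ( Token       SMALLINT    NOT NULL,
    Description VARCHAR(20) NOT NULL );

-- Main table storing tokens instead of the actual values.
CREATE TABLE SEAT
  ( Seat_Number   CHAR(4)  NOT NULL,
    Seat_Location SMALLINT NOT NULL,
    Seat_Class    SMALLINT NOT NULL );

-- One Join operation per token column is needed to display
-- the actual values.
SELECT S.Seat_Number,
       L.Description AS Seat_Location,
       C.Description AS Seat_Class
  FROM SEAT S, SEAT_LOCATION_T L, SEAT_CLASS_T C
 WHERE S.Seat_Location = L.Token
   AND S.Seat_Class    = C.Token;
```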
Token Translation Tables - An Alternative
Notes:
Instead of token translation tables, you can use check constraints in conjunction with CASE
expressions to achieve the same space savings without the problems of Joins.
For a column concerned, you can provide a check expression using the IN predicate to list
all allowed tokens. Using a check expression ensures that only correct values are in the
columns.
On retrieval, you use a CASE expression when selecting the column. The CASE
expression allows you to translate the tokens into the actual values that should be returned.
There is one disadvantage with this method you should be aware of: When new values are
added, you must change the SELECT statements. If they are contained in views, you must
drop and re-create the views. The consequence is that authorizations for the views are lost and must be
reestablished. The check constraints pose no such problem because they can be
dropped and added again without impact.
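The method can be sketched as follows. The tokens and their meanings, the column Seat_Number, and the constraint name are assumptions made for illustration.

```sql
-- The check constraint lists all allowed tokens.
CREATE TABLE SEAT
  ( Seat_Number   CHAR(4)  NOT NULL,
    Seat_Location SMALLINT NOT NULL
      CONSTRAINT Valid_Seat_Location
      CHECK (Seat_Location IN (1, 2, 3)) );

-- On retrieval, the CASE expression translates the tokens.
SELECT Seat_Number,
       CASE Seat_Location
         WHEN 1 THEN 'Window'
         WHEN 2 THEN 'Aisle'
         WHEN 3 THEN 'Middle'
       END AS Seat_Location
  FROM SEAT;

-- Adding a new token: the constraint is dropped and added again.
-- The SELECT statements using the CASE expression must be changed as well.
ALTER TABLE SEAT DROP CONSTRAINT Valid_Seat_Location;
ALTER TABLE SEAT
  ADD CONSTRAINT Valid_Seat_Location
  CHECK (Seat_Location IN (1, 2, 3, 4));
```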
Notes:
For user defined distinct types, you just need to provide their name and source data type
and a description of the types of data (columns) for which they should be used. For the source
data type, the (maximum) length, the number of digits, and/or the number of decimal places
must be provided in accordance with the requirements for the source data type.
To use a user defined distinct type with a varying-length source data type for multiple
columns, you must specify the maximum length of any columns using it.
Documenting User Defined Functions (1 of 2)
For each user defined function:
Notes:
The documentation for a user defined function includes:
• Name, signature, and output returned by the user defined function.
• The category of the user defined function (scalar function, column function, or table
function).
• The type of the user defined function (external or sourced).
• A textual description of the user defined function.
• For an external function, name, location, and programming language for the object
program used by the user defined function.
• For a sourced user defined function, the built-in or user defined function on which the
user defined function is sourced including the appropriate parameters.
The items are described on this visual and the next. For the name, signature, and the
output returned, all relevant information is contained on the current visual.
Notes:
The textual description should outline in detail what the function does. This is especially
important for external functions. The description should include any SQL states returned
and their meaning.
For the program source, the name and library for the object program (load module or DLL),
should be provided if they are already known. The object program is invoked by the user
defined function, not the source program. (Note that the title of the item is Program Source
and not Source Program.) At the time the function is documented, some of the information
for this item may not yet be available. However, you can already select a name for the
object program. The missing information must be provided later.
Documenting Check Constraints
Notes:
For check constraints, the following items need to be documented:
• The name of the table to which the constraint applies.
• If the constraint applies to a particular column, the name of the column to which it
applies.
This item is not applicable to check constraints defined on the table level.
• Although the database management systems do not generally force you to specify a
name for a check constraint, you should give a name to each check constraint. This
eases the maintenance of check constraints.
The names for check constraints need only be unique for each table. Nevertheless, it is
recommended that you use unique names for all check constraints of your application
domain.
• A detailed textual description outlining what the check constraint achieves.
• The search condition for the check constraint. When specifying the search condition for
the check constraint, verify with your database administrator that it can be implemented,
i.e., only uses functions supported by your database management system.
Documenting Tables - Table Info (1 of 2)
For each table:
Table Name:       The name of the table in the target DBMS. The maximum
                  length for table names depends on the target DBMS
Table Long Name:  Optional. An additional long name for the table referred to
                  as label. Can be stored in system tables. Cannot be used
                  in SQL statements
Notes:
The information to be documented for a table can be subdivided into table-related
information and column-related information. The current visual and the next describe the
table-related information to be documented.
The long table name can be stored into the system tables for the target database
management system by means of the LABEL ON TABLE SQL statement if that is
supported by the target database management system. The description can be stored by
means of the COMMENT ON TABLE SQL statement if that is supported by the target
database management system.
If the primary key consists of multiple columns, it is important to establish and specify the
logical sequence of the columns within the primary key. This will become relevant when
talking about foreign keys in a later unit.
Under the heading Check Constraints, only constraints should be listed that are not column
specific. The column-specific check constraints are listed for the columns.
Notes:
The items on this visual represent information the database administrator needs to know
for the assignment of primary and secondary allocation units and for scheduling
reorganizations.
Length changes for rows during updates may cause rows to be relocated. This may
lead to indirect accesses, which decrease performance.
Documenting Tables - Column Information
For each column of a table:
Notes:
The long column name can be stored into the system tables for the target database
management system by means of the LABEL ON [COLUMN] SQL statement if supported
by the target database management system. The description can be stored by means of
the COMMENT ON [COLUMN] SQL statement if that is supported by the target database
management system.
Under the heading Check Constraints, constraints just involving the column are to be listed
and not table-level constraints, i.e., constraints that involve multiple columns.
Documenting Triggers
For each trigger:
Notes:
For each trigger, all the items we discussed in detail should be documented. Verify with
your database administrator that the search condition for the WHEN clause of the trigger
can be implemented, i.e., only uses functions supported by your database management
system. Also verify that the intended actions are supported by the target database
management system.
Checkpoint
3. When can you imbed a tuple type into another tuple type?
_____________________________________________________
_____________________________________________________
_____________________________________________________
4. The tuple types for 1:1 or 1:m relationship types can always be
merged or imbedded. (T/F)
6. Give two reasons why you may not want to combine two tuple
types that theoretically could be combined.
_____________________________________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________
12. Horizontal splitting of a tuple type always creates tuple types for
different primary key ranges of the original tuple type. (T/F)
13. Match the following categories with the listed built-in data types:
a. Binary integers ____ VARCHAR
b. Decimal numbers ____ INTEGER
c. Floating-point numbers ____ REAL
d. Binary strings ____ DECIMAL
e. Datetime data ____ BIGINT
f. Single-byte character strings ____ BLOB
g. Double-byte character strings ____ DATE
____ SMALLINT
____ CHARACTER
____ DOUBLE
____ GRAPHIC
____ CLOB
____ NUMERIC
____ TIMESTAMP
14. For varying-length character strings, a value of NULL has the same
meaning as a string of length 0. (T/F)
16. Describe the difference between system default values and user
default values.
_____________________________________________________
_____________________________________________________
_____________________________________________________
17. How can you provide your own default value for a column?
_____________________________________________________
_____________________________________________________
_____________________________________________________
18. User defined distinct types must be based on built-in data types.
They cannot be based on other user defined distinct types. (T/F)
20. Describe the difference between external and sourced user defined
functions.
_____________________________________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________
24. Check constraints defined on the column level may refer to other
columns of the table. (T/F)
27. A trigger can be executed for each row processed or once for the
triggering SQL statement. (T/F)
29. Which of the following SQL statements are allowed for before
triggers?
a. Fullselects.
b. INSERT statements.
c. UPDATE statements.
d. DELETE statements.
e. SIGNAL SQLSTATE statements.
f. SET transition-variable statements.
30. Which of the following SQL statements are allowed for after
triggers?
a. Fullselects.
b. INSERT statements.
c. UPDATE statements.
d. DELETE statements.
e. SIGNAL SQLSTATE statements.
f. SET transition-variable statements.
31. Triggers can change the values of columns before they are stored.
(T/F)
Unit Summary (1 of 3)
• Tuple types with always corresponding primary key values can be merged
• Tuple types whose primary key values are always a subset of the primary
  key values of another tuple type can be imbedded in the other tuple type if:
  - For each potential tuple, at least one nonkey attribute has a value
• Tuple types for 1:1 and 1:m relationship types can always be merged or
  imbedded
• Tuple type for a supertype with an exclusive and covering subtype set can
  be eliminated (perfect decomposition)
• You do not always want to combine tuple types
  - If tuple types have nothing to do with each other
  - If other tuple types are referentially dependent on the tuple type to be eliminated
  - If restrictions for the database management system become effective
• If necessary for performance reasons, tuple types can be denormalized
  - (Re)introduces problems for not normalized tuple types
• For performance reasons or because of limitations for the target DBMS, you
  may want to split tuple types vertically or horizontally
Notes:
Unit Summary (2 of 3)
• Tuple types are converted into tables as follows:
  - Each tuple type becomes a table
  - Each elementary attribute becomes a column
  - Each elementary primary key attribute becomes a primary key column
• Built-in data types include data types for:
  - Numeric data: INTEGER, SMALLINT, BIGINT, DECIMAL, NUMERIC,
    REAL, DOUBLE
  - Single-byte character strings: CHARACTER, VARCHAR, CLOB
  - Double-byte character strings: GRAPHIC, VARGRAPHIC, DBCLOB
  - Datetime data: DATE, TIME, TIMESTAMP
  - Binary strings: BLOB
• Columns can be defined as nullable or NOT NULL
  - Nullable: Column need not assume a value for every row
  - NOT NULL: Column must assume a value for every row
• Columns can assume system-provided or user-provided default values
• To implement an abstract data type, you need:
  - User defined distinct types
  - User defined functions
  - Check constraints and/or triggers
Notes:
Unit Summary (3 of 3)
Notes:
Unit Objectives
After completion of this unit, you should be able to:
Notes:
This unit discusses the different types of integrity that must be enforced for a database.
They are:
• Referential integrity
• Domain integrity
• Redundancy integrity
• Constraint integrity
The unit will describe: the integrity rules that can be enforced to achieve referential
integrity; how the referential constraints can be implemented; and how to establish the
referential structure for an application domain. The referential structure provides a
graphical overview of the referential constraints.
The unit will not discuss domain integrity in detail since it has been discussed by the
previous unit. It will discuss how the integrity of redundant information can be achieved.
Furthermore, the unit will explain how constraint integrity can be enforced, i.e., how the
business constraints for an application domain can be implemented.
Tuple Types
Tables
Logical Data Structures
Integrity Rules
Notes:
This unit deals with the establishment of the integrity rules for the database being
designed. Thus, we are in the third step of the storage view.
Integrity - Areas of Concern and Types
Notes:
As we have seen, multiple tables are using the same columns. For example, column
Type_Code occurs in tables AIRCRAFT_TYPE and AIRCRAFT_MODEL. The values it can
assume in table AIRCRAFT_MODEL are dependent on the values of the column in table
AIRCRAFT_TYPE since they are references to rows in table AIRCRAFT_TYPE. Therefore,
they must always be a subset of the current values of column Type_Code in
AIRCRAFT_TYPE.
It is a concern of database design to ensure that references to other tables are always
correct. The appropriate integrity is referred to as referential integrity.
A similar concern is the correctness of the values in the columns of the tables. A column
must only assume values allowed by the abstract data type for the data element associated
with the column. Furthermore, the values must be within the limits defined by the domain
for the data element.
Column Type_Code mentioned above must only assume codes for valid aircraft types.
Similarly, column Number_of_Engines for table AIRCRAFT_TYPE must only assume
integer values between 0 and 4.
The corresponding type of integrity is referred to as value integrity or, more commonly,
domain integrity.
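The limit for Number_of_Engines can be sketched as a named check constraint. The following is a minimal illustration only, assuming SQLite (through Python's standard sqlite3 module) as a stand-in for the target database management system; the column lengths, the constraint name CK_ENGINES, the helper try_insert, and the rejected sample row ('X999', 6) are hypothetical and not taken from the course material.

```python
import sqlite3

# Assumption: SQLite stands in for the target DBMS; syntax may differ elsewhere.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE AIRCRAFT_TYPE (
        Type_Code         CHAR(4)  NOT NULL PRIMARY KEY,
        Number_of_Engines SMALLINT NOT NULL,
        -- Hypothetical constraint name; enforces the domain 0..4
        CONSTRAINT CK_ENGINES CHECK (Number_of_Engines BETWEEN 0 AND 4)
    )
""")

def try_insert(type_code, engines):
    """Return True if the row is accepted, False if the check constraint rejects it."""
    try:
        conn.execute("INSERT INTO AIRCRAFT_TYPE VALUES (?, ?)", (type_code, engines))
        return True
    except sqlite3.IntegrityError:
        return False

ok_in_domain = try_insert("B747", 4)      # value within the domain
ok_out_of_domain = try_insert("X999", 6)  # hypothetical row outside the domain
row_count = conn.execute("SELECT COUNT(*) FROM AIRCRAFT_TYPE").fetchone()[0]
print(ok_in_domain, ok_out_of_domain, row_count)
```

Giving the constraint a name makes it easier to identify and maintain later.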
The third cause for concern is the redundant storage of information in the tables of the
application domain. Redundancy can occur as the consequence of the repetitive storage of
data or the storage of data that can be derived from other stored data (derivable data). If
redundancy cannot be avoided (e.g., for performance reasons), the redundant
information on whose currency business processes depend must be kept consistent at
all times.
The corresponding type of integrity is referred to as redundancy integrity.
The data in the tables are also not correct if they violate business constraints (business
rules) for the application domain. This may be a rule as simple as that an employee cannot
be a pilot and a mechanic at the same time. It may also be a more complex rule such as
that a mechanic can only be assigned to the maintenance of an aircraft if he/she has been
trained for the appropriate aircraft model.
The corresponding type of integrity is referred to as (business) constraint integrity.
The integrity of data can be jeopardized by maintenance operations, i.e., insert, delete, or
update operations. Therefore, to guarantee the integrity of the data, rules must be
established that govern these types of operations and that must be followed. The rules are
referred to as integrity rules. In accordance with the type of operation to which they apply,
the rules are referred to as Insert Rules, Delete Rules, and Update Rules, respectively.
Referential Integrity - Terminology
AIRCRAFT_MODEL (parent table; primary key Type_Code, Model_Number)
Type_Code  Model_Number  Length_of_Model
A340       200           59.40
A310       300           46.67
B737       300           33.41
B747       400           70.67

AIRCRAFT (dependent table of AIRCRAFT_MODEL with foreign key Type_Code, Model_Number;
parent table of ENGINE with primary key Aircraft_Number)
Aircraft_Number  Date_Manufactured  Type_Code  Model_Number
B474001323       1994-10-12         B747       400
B373004518       1999-02-28         B737       300
B373004519       1999-03-31         B737       300
A103000534       1998-05-12         A310       300
A103003167       1997-08-01         A310       300
A402004217       1999-10-23         A340       200

ENGINE (dependent table of AIRCRAFT; foreign key Aircraft_Number; a dash marks a missing value)
Engine_Number  Engine_Type  Aircraft_Number
PW9880193      PW4062       B474001323
PW9880194      PW4062       B474001323
PW9880195      PW4062       -
PW9882345      PW4062       B474001323
PW9974034      PW4062       B474001323
R375184566     CF6-80C2     A103003167
R375184567     CF6-80C2     -
R375184568     CF6-80C2     A103003167
Notes:
In conjunction with referential integrity, some terms are used you need to be familiar with:
Key
A logically ordered set of columns of a table. The physical order of the columns in the table
is not relevant. If the key consists of multiple columns, it is referred to as a composite key.
Parent Key
A (logically ordered) set of columns of a table that uniquely identifies the rows of the table.
This need not be the primary key of the table. However, since we have established a
primary key for every table and the primary key can be a parent key, this course will
assume the parent key is a primary key.
On the visual, the parent key is the primary key of table AIRCRAFT_MODEL. It is a
composite key. It consists of columns Type_Code and Model_Number. We will define that
Type_Code is the first column and Model_Number the second column.
Foreign Key
A key which relates to the parent key of another table or the same table and whose values
must always be a subset of the values of the related parent key. Meaning and order of the
parent-key and foreign-key columns must be the same. The names of the columns can be
different. There is a one-to-one correspondence of the columns.
As mentioned before, we will always use the primary key of a table as parent key so that
the foreign key relates to a primary key.
On the visual, columns Type_Code and Model_Number together, and in that order, are a
foreign key of table AIRCRAFT referring to primary key (Type_Code, Model_Number) of
table AIRCRAFT_MODEL.
Referential Constraint
The correlation existing between a foreign key and the corresponding parent key.
On the visual, the correlation between foreign key (Type_Code, Model_Number) of table
AIRCRAFT and primary key (Type_Code, Model_Number) of table AIRCRAFT_MODEL
represents a referential constraint.
The arrow illustrating a referential constraint in a diagram points from the parent key to the
foreign key. A single-headed arrow is used if a parent key value can occur only once as
foreign key value. A double-headed arrow is used if a parent key value can occur more
than once as foreign key value.
Since each foreign key value can only occur once as parent key value, an arrowhead is not
necessary for the inverse direction.
Parent Table
The table of a referential constraint that contains the parent key.
On the visual, AIRCRAFT_MODEL is the parent table for the referential constraint between
foreign key (Type_Code, Model_Number) of table AIRCRAFT and the primary key of
AIRCRAFT_MODEL.
Dependent Table
The table of a referential constraint that contains the foreign key.
On the visual, AIRCRAFT is the dependent table for the referential constraint between
foreign key (Type_Code, Model_Number) of table AIRCRAFT and the primary key of
AIRCRAFT_MODEL.
Self-Referencing Table
A table having a self-referencing constraint.
Table MAINTENANCE_RECORD for Come Aboard is a self-referencing table since it has
the self-referencing constraint mentioned before.
Parent Row
A row of the parent table whose parent key value exists as foreign key value in the
dependent table.
On the visual, all rows of AIRCRAFT_MODEL are parent rows for the referential constraint
between foreign key (Type_Code, Model_Number) of table AIRCRAFT and the primary key
of AIRCRAFT_MODEL.
Dependent Row
A row of the dependent table whose foreign key contains a value.
On the visual, all rows of AIRCRAFT are dependent rows for the referential constraint
between foreign key (Type_Code, Model_Number) of table AIRCRAFT and the primary key
of AIRCRAFT_MODEL.
Referential Integrity
For a referential constraint, referential integrity exists if, for every foreign key value of the
dependent table, the appropriate parent key value exists in the parent table.
On the visual, referential integrity exists for the referential constraint between foreign key
(Type_Code, Model_Number) of table AIRCRAFT and the primary key of
AIRCRAFT_MODEL. The visual illustrates a second referential constraint: the referential
constraint between column Aircraft_Number (foreign key) of table ENGINE and the primary
key of table AIRCRAFT (parent key). For this referential constraint, AIRCRAFT is the
parent table and ENGINE the dependent table.
The row for aircraft B373004518 is not a parent row since none of ENGINE's rows is
dependent on it. The row for engine PW9880195 is not a dependent row since column
Aircraft_Number does not contain a value for it.
The referential integrity for a referential constraint must be controlled via insert, delete, and
update rules for the parent table and the dependent table. For different referential
constraints, the integrity rules may be different.
For each referential constraint, you can decide if you want the database management
system to enforce the referential integrity or if you want to take care of it yourself. You may
even decide not to care about referential integrity or to check it only periodically and correct
problems when you find time.
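If you decide to check referential integrity yourself, a periodic check could look like the following sketch. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in for the target database management system. The tables are reduced to the key columns, no foreign key is declared so nothing is enforced, and the last ENGINE row ('XX00000001' referring to 'Z999999999') is a hypothetical orphan added only for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for the target DBMS.
conn = sqlite3.connect(":memory:")
# No FOREIGN KEY clause: the DBMS does not enforce the referential constraint here.
conn.executescript("""
    CREATE TABLE AIRCRAFT (
        Aircraft_Number CHAR(10) NOT NULL PRIMARY KEY
    );
    CREATE TABLE ENGINE (
        Engine_Number   CHAR(10) NOT NULL PRIMARY KEY,
        Aircraft_Number CHAR(10)              -- nullable foreign key column
    );
    INSERT INTO AIRCRAFT VALUES ('B474001323');
    INSERT INTO ENGINE VALUES ('PW9880193', 'B474001323');  -- dependent row
    INSERT INTO ENGINE VALUES ('PW9880195', NULL);          -- no foreign key value
    INSERT INTO ENGINE VALUES ('XX00000001', 'Z999999999'); -- hypothetical orphan
""")

# Orphans: rows with a foreign key value that has no matching parent key value.
orphans = conn.execute("""
    SELECT E.Engine_Number
    FROM ENGINE E
    WHERE E.Aircraft_Number IS NOT NULL
      AND NOT EXISTS (SELECT 1
                      FROM AIRCRAFT A
                      WHERE A.Aircraft_Number = E.Aircraft_Number)
""").fetchall()
print(orphans)
```

Rows without a foreign key value are not reported because they are not dependent rows.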
Referential Integrity - Insert Rules
AIRCRAFT_MODEL (parent table)
Type_Code  Model_Number  Length_of_Model
A340       200           59.40
A310       300           46.67
B737       300           33.41
B747       400           70.67

Always:
    INSERT INTO AIRCRAFT_MODEL
      ( Type_Code, Model_Number, . . . )
    VALUES
      ( 'B777', '200', . . . )

AIRCRAFT (dependent table)
Aircraft_Number  Date_Manufactured  Type_Code  Model_Number
B474001323       1994-10-12         B747       400
B373004518       1999-02-28         B737       300
B373004519       1999-03-31         B737       300
A103000534       1998-05-12         A310       300
A103003167       1997-08-01         A310       300
A402004217       1999-10-23         A340       200

Only if parent row exists:
    INSERT INTO AIRCRAFT
      ( Aircraft_Number, Type_Code, Model_Number, . . . )
    VALUES
      ( 'B373004863', 'B737', '300', . . . )

    INSERT INTO AIRCRAFT
      ( Aircraft_Number, Type_Code, Model_Number, . . . )
    VALUES
      ( 'A006003012', 'A300', '600', . . . )
Notes:
For the insertion of rows, the following rules ensure the integrity of referential constraints:
• If the referential constraint is not a self-referencing constraint, rows can be added to the
parent table at all times.
If the referential constraint is a self-referencing constraint, the parent table is the
dependent table as well and the restrictions for dependent tables apply.
The insertion of rows into the parent table for a referential constraint may also be
impaired by the parent table being the dependent table of another referential constraint.
• A row may be added to the dependent table for a referential constraint if:
- The foreign key for the row does not contain a value (if allowed for the columns of the
foreign key).
- The foreign key value has a matching parent key value in the parent table.
In short: A row can only be inserted if its foreign key does not have a value or matches
an existing parent key value, i.e., an appropriate parent row exists.
For the referential constraint on the visual, rows can be added to table AIRCRAFT_MODEL
without restrictions. The row for aircraft number B373004863 can be added to table
AIRCRAFT since its foreign key value (B737, 300) has a matching parent row in table
AIRCRAFT_MODEL.
The row for aircraft number A006003012 cannot be added to table AIRCRAFT because
AIRCRAFT_MODEL does not contain a row for aircraft model (A300, 600).
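The insert rules can be tried out with the following sketch. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in for the target database management system; the tables are reduced to their key columns, the column lengths are assumptions, and the helper try_insert and aircraft number 'N000000001' are hypothetical. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

# Assumption: SQLite stands in for the target DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only if enabled
conn.executescript("""
    CREATE TABLE AIRCRAFT_MODEL (
        Type_Code    CHAR(4) NOT NULL,
        Model_Number CHAR(3) NOT NULL,
        PRIMARY KEY (Type_Code, Model_Number)
    );
    CREATE TABLE AIRCRAFT (
        Aircraft_Number CHAR(10) NOT NULL PRIMARY KEY,
        Type_Code       CHAR(4),
        Model_Number    CHAR(3),
        FOREIGN KEY (Type_Code, Model_Number)
            REFERENCES AIRCRAFT_MODEL (Type_Code, Model_Number)
    );
    INSERT INTO AIRCRAFT_MODEL VALUES ('B737', '300');
""")

# Parent table: rows can always be added.
conn.execute("INSERT INTO AIRCRAFT_MODEL VALUES ('B777', '200')")

def try_insert(aircraft_number, type_code, model_number):
    """Return True if the dependent row is accepted, False if it is rejected."""
    try:
        conn.execute("INSERT INTO AIRCRAFT VALUES (?, ?, ?)",
                     (aircraft_number, type_code, model_number))
        return True
    except sqlite3.IntegrityError:
        return False

with_parent = try_insert("B373004863", "B737", "300")     # parent row exists
without_parent = try_insert("A006003012", "A300", "600")  # no parent row
no_fk_value = try_insert("N000000001", None, None)        # hypothetical row without foreign key value
print(with_parent, without_parent, no_fk_value)
```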
Referential Integrity - Delete Rules
AIRCRAFT_MODEL (parent table of AIRCRAFT; delete rule NA)
Type_Code  Model_Number  Length_of_Model
A340       200           59.40
A310       300           46.67
B737       300           33.41
B747       400           70.67

AIRCRAFT (dependent table of AIRCRAFT_MODEL; parent table of ENGINE and SEAT)
Aircraft_Number  Date_Manufactured  Type_Code  Model_Number
B474001323       1994-10-12         B747       400
B373004518       1999-02-28         B737       300
B373004519       1999-03-31         B737       300
A103000534       1998-05-12         A310       300
A103003167       1997-08-01         A310       300
A402004217       1999-10-23         A340       200

ENGINE (dependent table of AIRCRAFT; delete rule SN; a dash marks a missing value)
Engine_Number  Engine_Type  Aircraft_Number
PW9880193      PW4062       B474001323
PW9880194      PW4062       B474001323
PW9880195      PW4062       -
PW9882345      PW4062       B474001323
PW9974034      PW4062       B474001323
R375184566     CF6-80C2     A103003167
R375184567     CF6-80C2     -
R375184568     CF6-80C2     A103003167

SEAT (dependent table of AIRCRAFT; delete rule C)
Aircraft_Number  Seat_Number
B474001323       1A
B474001323       1B
B474001323       1C
...              ...
B474001323       46J
B171004217       1A
B171004217       1B
...              ...
B171004217       28G
Notes:
For the deletion of rows, the following delete rules ensure the integrity of referential
constraints:
• If the referential constraint is not a self-referencing constraint, a row can be deleted from
the dependent table at any time.
If the referential constraint is a self-referencing constraint, the dependent table is the
parent table at the same time. Since dependent rows may be parent rows at the same
time, the delete rule for the referential constraint may prevent the deletion of dependent
rows or cause the deletion of additional rows.
The deletion of rows from the dependent table of a referential constraint can also be
impaired by other referential constraints for which the dependent table is the parent
table.
• For the deletion of rows from the parent table of a referential constraint, one of the
following options can be chosen:
NO ACTION
For NO ACTION, rows of the parent table can only be deleted if none of the rows of the
dependent table becomes an orphan, i.e., does not have a matching parent key value
afterwards.
Conceptually, first, all rows requested by the delete request are deleted. After they have
been deleted, it is checked if the dependent table contains rows dependent on the
deleted rows and now being orphans. If so, the deletions are backed out and the request
is rejected. Thus, NO ACTION checks for conflicts after the deletion.
The subsequent RESTRICT option is very similar to NO ACTION, but its effects may be
different. NO ACTION is the SQL92 standard whereas RESTRICT is a DB2
implementation.
On the visual, NO ACTION (abbreviated as NA) has been chosen for the illustrated
constraint between tables AIRCRAFT_MODEL and AIRCRAFT. This means that an
aircraft model can only be deleted if an aircraft is no longer dependent on it.
SET NULL
The foreign key values of rows dependent on deleted parent rows are deleted, i.e., the
dependencies are removed.
SET NULL is only an option if the foreign key of the dependent table need not have a
value for every row. This raises the question of when a composite foreign key is considered
to have no value for a row. In general, the foreign key of a row is considered not to have
a value if at least one column of the foreign key does not have a value. Thus, SET NULL
is only an option if at least one of the foreign key columns has been defined as nullable.
SET NULL resets all columns to NULL which have been defined as nullable.
On the visual, SET NULL (abbreviated as SN) has been chosen for the illustrated
referential constraint between tables AIRCRAFT and ENGINE. This means that, for an
engine, the reference to the aircraft is removed if the aircraft is deleted. Practically, this
implies that the engine is no longer mounted on an aircraft.
SET NULL can be used because column Aircraft_Number of table ENGINE has been
defined as nullable.
CASCADE
Rows dependent on deleted parent rows are deleted as well.
On the visual, CASCADE (abbreviated as C) has been chosen for the illustrated
referential constraint between tables AIRCRAFT and SEAT. Thus, if an aircraft is
deleted, information about its seats is no longer kept.
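The three delete rules chosen on the visual can be sketched as follows. This is an illustration only, assuming SQLite (through Python's standard sqlite3 module) as a stand-in for the target database management system; the tables are reduced to their key columns, the column lengths are assumptions, and only a few sample rows from the visual are loaded.

```python
import sqlite3

# Assumption: SQLite stands in for the target DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only if enabled
conn.executescript("""
    CREATE TABLE AIRCRAFT_MODEL (
        Type_Code    CHAR(4) NOT NULL,
        Model_Number CHAR(3) NOT NULL,
        PRIMARY KEY (Type_Code, Model_Number)
    );
    CREATE TABLE AIRCRAFT (
        Aircraft_Number CHAR(10) NOT NULL PRIMARY KEY,
        Type_Code       CHAR(4)  NOT NULL,
        Model_Number    CHAR(3)  NOT NULL,
        FOREIGN KEY (Type_Code, Model_Number)
            REFERENCES AIRCRAFT_MODEL (Type_Code, Model_Number)
            ON DELETE NO ACTION
    );
    CREATE TABLE ENGINE (
        Engine_Number   CHAR(10) NOT NULL PRIMARY KEY,
        Aircraft_Number CHAR(10),             -- nullable, so SET NULL is possible
        FOREIGN KEY (Aircraft_Number)
            REFERENCES AIRCRAFT (Aircraft_Number)
            ON DELETE SET NULL
    );
    CREATE TABLE SEAT (
        Aircraft_Number CHAR(10) NOT NULL,
        Seat_Number     CHAR(3)  NOT NULL,
        PRIMARY KEY (Aircraft_Number, Seat_Number),
        FOREIGN KEY (Aircraft_Number)
            REFERENCES AIRCRAFT (Aircraft_Number)
            ON DELETE CASCADE
    );
    INSERT INTO AIRCRAFT_MODEL VALUES ('B747', '400');
    INSERT INTO AIRCRAFT VALUES ('B474001323', 'B747', '400');
    INSERT INTO ENGINE VALUES ('PW9880193', 'B474001323');
    INSERT INTO SEAT VALUES ('B474001323', '1A');
    INSERT INTO SEAT VALUES ('B474001323', '1B');
""")

# NO ACTION: the model cannot be deleted while an aircraft depends on it.
try:
    conn.execute("DELETE FROM AIRCRAFT_MODEL WHERE Type_Code = 'B747'")
    model_deleted = True
except sqlite3.IntegrityError:
    model_deleted = False

# Deleting the aircraft triggers SET NULL for ENGINE and CASCADE for SEAT.
conn.execute("DELETE FROM AIRCRAFT WHERE Aircraft_Number = 'B474001323'")
engine_ref = conn.execute(
    "SELECT Aircraft_Number FROM ENGINE WHERE Engine_Number = 'PW9880193'"
).fetchone()[0]
seat_count = conn.execute("SELECT COUNT(*) FROM SEAT").fetchone()[0]
print(model_deleted, engine_ref, seat_count)
```

After the aircraft has been deleted, the engine row still exists without a reference, and no seat rows remain.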
Referential Integrity - Update Rules (Figure 8-7)
Parent Table
• NO ACTION
  The parent key values of parent rows can only be changed if the
  dependent table does not have orphans afterwards
• RESTRICT
  Can only change parent key values of rows that are not parent rows
• SET NULL
  If the parent key value of a parent row is changed, the foreign key
  values of all dependent rows are set to NULL (if permitted)
• CASCADE
  If parent key value of a parent row is changed, the foreign key values
  of all dependent rows are changed accordingly
Most systems only support NO ACTION or RESTRICT
Notes:
The relational data model defines the following update rules for referential constraints:
• The foreign key values of dependent rows can be changed to matching parent key
values of the parent table or can be deleted (set to NULL) if permitted. As explained for
the delete rules, the deletion of foreign key values generally requires at least one of the
foreign key columns being defined as nullable.
• For the updating of parent key values, the relational data model provides the following
options:
NO ACTION
For NO ACTION, the parent key values of rows of the parent table can only be changed
if none of the rows of the dependent table becomes an orphan, i.e., does not have a
matching parent key value, afterwards.
As for the delete rules, conceptually, the checking for dependent rows is done after all
parent key values have been changed. If orphans are detected, the changes are rolled
back and the request is rejected.
SET NULL
The foreign key values of all rows dependent on parent rows whose parent key values
are changed are deleted, i.e., the dependencies are removed.
SET NULL is only an option if the foreign key of the dependent table need not have a
value for every row. The same foreign key considerations apply as for the delete rules.
CASCADE
The foreign key values of all dependent rows are changed to the new parent key values
of their parent rows.
For parent tables, most database management systems (in particular, DB2) only support
NO ACTION or RESTRICT. To change the parent key value for a parent row, you can use
the following procedure:
1. Add an identical row with the new parent key value to the parent table.
2. Change the foreign key values of the dependent rows to the new parent key value.
3. Delete the former parent row. (Since the foreign key values of the formerly dependent
rows have been changed, the former parent row is no longer a parent row and,
therefore, can be deleted.)
Alternatively, you can temporarily remove the referential constraint and reestablish it after
the parent key values have been changed. However, this requires that the integrity of the
referential constraint is checked by means of utilities before the dependent table can be
processed again.
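The three-step procedure for changing a parent key value can be sketched as follows. This is an illustration only, assuming SQLite (through Python's standard sqlite3 module) as a stand-in for the target database management system; the tables are reduced to a few columns, the column lengths are assumptions, and the new model number '30X' is a hypothetical value.

```python
import sqlite3

# Assumption: SQLite stands in for the target DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only if enabled
conn.executescript("""
    CREATE TABLE AIRCRAFT_MODEL (
        Type_Code       CHAR(4) NOT NULL,
        Model_Number    CHAR(3) NOT NULL,
        Length_of_Model DECIMAL(5,2),
        PRIMARY KEY (Type_Code, Model_Number)
    );
    CREATE TABLE AIRCRAFT (
        Aircraft_Number CHAR(10) NOT NULL PRIMARY KEY,
        Type_Code       CHAR(4)  NOT NULL,
        Model_Number    CHAR(3)  NOT NULL,
        FOREIGN KEY (Type_Code, Model_Number)
            REFERENCES AIRCRAFT_MODEL (Type_Code, Model_Number)
    );
    INSERT INTO AIRCRAFT_MODEL VALUES ('B737', '300', 33.41);
    INSERT INTO AIRCRAFT VALUES ('B373004518', 'B737', '300');
    INSERT INTO AIRCRAFT VALUES ('B373004519', 'B737', '300');
""")

# Changing the parent key value directly is rejected (orphans would result).
try:
    conn.execute("""UPDATE AIRCRAFT_MODEL SET Model_Number = '30X'
                    WHERE Type_Code = 'B737' AND Model_Number = '300'""")
    direct_update_ok = True
except sqlite3.IntegrityError:
    direct_update_ok = False

# Step 1: add an identical row with the new parent key value.
conn.execute("""INSERT INTO AIRCRAFT_MODEL (Type_Code, Model_Number, Length_of_Model)
                SELECT Type_Code, '30X', Length_of_Model
                FROM AIRCRAFT_MODEL
                WHERE Type_Code = 'B737' AND Model_Number = '300'""")
# Step 2: change the foreign key values of the dependent rows.
conn.execute("""UPDATE AIRCRAFT SET Model_Number = '30X'
                WHERE Type_Code = 'B737' AND Model_Number = '300'""")
# Step 3: delete the former parent row, which no longer has dependent rows.
conn.execute("""DELETE FROM AIRCRAFT_MODEL
                WHERE Type_Code = 'B737' AND Model_Number = '300'""")
conn.commit()

models = conn.execute(
    "SELECT Type_Code, Model_Number FROM AIRCRAFT_MODEL").fetchall()
aircraft_models = conn.execute(
    "SELECT DISTINCT Model_Number FROM AIRCRAFT").fetchall()
print(direct_update_ok, models, aircraft_models)
```

Running the three steps in one unit of work keeps other applications from seeing the intermediate state with two model rows.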
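Visual: Delete rules for dependent entity types (D), four examples.
Left two examples (cardinality . .m and . .1, controlling property not specified): NA (NO ACTION)
Right two examples (cardinality . .m and . .1, controlling property C specified): C (CASCADE)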
Notes:
As mentioned before, we will always use the primary key of the parent table as parent key.
Therefore, we will no longer use the term parent key and only talk about primary
key/foreign key relationships in conjunction with referential constraints in the remainder of
the unit.
The existence of a referential constraint means that rows of the dependent table refer to
rows of the parent table. This implies an interrelationship between the parent table and the
dependent table. Since the tables are derived from tuple types, which are derived from
entity types and relationship types, referential constraints are the consequence of
relationship types of the entity-relationship model. In many cases, the entity-relationship
model also helps you determine the proper delete rules for the referential constraints as
illustrated by the next series of visuals.
The above visual discusses the resulting delete rules if one of the tables is for (a tuple type
belonging to) a dependent entity type. The key of the dependent entity type contains, as a
part, the key of the parent entity type or relationship type. For each dependent instance, the
appropriate parent instance must exist at all times.
The interrelationships between source and target instances are expressed by the key of
the dependent entity type. As we know, a tuple type, and thus a table, is not established for
the owning relationship type. For the established tables, the interrelationships between the
instances (rows) are expressed by the primary key of the dependent table. The primary key
of the parent table is part of the primary key of the dependent table and constitutes a
foreign key for the dependent table.
The left two examples illustrate the delete rule to be chosen if the controlling property has
not been chosen for the dependent entity type. In these cases, the instances of the
dependent entity type, and thus the rows of the dependent table, are dependent on the
existence of the appropriate parent instances or rows. Since controlling has not been
specified for the dependent entity type, a parent instance (row) cannot be deleted as long
as an instance (row) is dependent on it. Consequently, the proper delete option for the
referential constraint is NO ACTION (or RESTRICT) independently of the cardinality for the
dependent entity type.
If the controlling property has been specified for the dependent entity type, dependent
instances, and thus rows, are to be deleted if the associated parent instances (rows) are
deleted. Thus, in this case, the proper delete option for the referential constraint is
CASCADE as illustrated for the right two examples on the visual.
Visual: Delete rules for m:m relationship types, four cases, each showing Target and R.
Notes:
The cases on this visual consider m:m relationship types. For them, tuple types and tables
are established for source, target, and relationship type. Let us call them SOURCE,
TARGET, and R, respectively. Since the relationship key consists of the keys of source and
target, the primary key of R consists of the primary keys of SOURCE and TARGET. None
of the columns of the primary key may be nullable.
For a relationship instance, the associated source and target instances must exist at all
times. Therefore, for each row of R, rows with corresponding primary key values must exist
in SOURCE and TARGET. Consequently, as part of the primary key of R, the primary keys
of SOURCE and TARGET constitute foreign keys for R.
For relationship instances, the rule applies that they are deleted if their source or target
instances are deleted and the relationship instance can be deleted. Whether or not and
when a relationship instance can be deleted is controlled by the minimum cardinalities for
the relationship type. The controlling property may also have a certain effect as will be
illustrated on the next visual. If a relationship instance cannot be deleted, its source or
target instances cannot be deleted.
The cases on the visual assume that the controlling property has not been specified for
either end of the relationship type. For the top left case, the minimum cardinalities for both
ends of the relationship type are 0. Thus, a relationship instance need not exist for a source
or target instance and nothing prevents the deletion of relationship instances. As a
consequence, if a parent row of one of the two parent tables is deleted, the corresponding
dependent rows must be deleted as well. Thus, CASCADE is the proper delete rule for
both referential constraints.
For the top right case, the minimum cardinality for the source is 1 and the minimum
cardinality for the target is 0. Consequently, for each target instance, at least one
relationship instance must exist. Since the controlling property has not been specified for
the target, the deletion of a relationship instance should not cause the automatic deletion of
its target instance.
Both points together disallow the deletion of a source instance if it would leave a target
instance without a relationship instance. They require a delete rule of NO ACTION (or
RESTRICT) for the referential constraint between the tables for the source and the
relationship type: You do not allow the deletion of a row of the table for the source as long
as a row is dependent on it; otherwise, you could delete the row for the last relationship
instance for the target.
Note that this does not completely match the meaning of minimum cardinality 1 for the
source of the relationship type, but this is as close as you can get.
Since the minimum target cardinality is 0, a source instance need not have a relationship
instance. Thus, a relationship instance can be and must be deleted if its target instance is
deleted. For the referential constraint between the tables for the target and the relationship
type, this translates into a delete rule of CASCADE.
For the bottom right case, the roles of source and target have been reversed. Thus, the
delete rules are: CASCADE for the referential constraint between the tables for the source
and the relationship type; NO ACTION (or RESTRICT) for the referential constraint
between the tables for the target and the relationship type.
For the bottom left case, both minimum cardinalities are 1. This means that a relationship
instance cannot be deleted if it is the last relationship instance for the source instance or
the target instance. This translates into delete rules of NO ACTION (or RESTRICT) for both
referential constraints with the caveat mentioned above.
Whenever you have a delete rule of NO ACTION or RESTRICT, you must first delete the
dependent rows (relationship instances). Only then can you delete the parent rows.
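As an illustration of the top right case (minimum cardinality 1 for the source, 0 for the target), the following minimal sketch uses Python's built-in sqlite3 module. SQLite is only a stand-in for the target database management system, and the column names s_id and t_id are assumptions. The constraint from SOURCE to R uses NO ACTION, the constraint from TARGET to R uses CASCADE.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
con.executescript("""
CREATE TABLE SOURCE (s_id INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE TARGET (t_id INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE R (
    s_id INTEGER NOT NULL,
    t_id INTEGER NOT NULL,
    PRIMARY KEY (s_id, t_id),
    FOREIGN KEY (s_id) REFERENCES SOURCE (s_id) ON DELETE NO ACTION,
    FOREIGN KEY (t_id) REFERENCES TARGET (t_id) ON DELETE CASCADE
);
INSERT INTO SOURCE VALUES (1);
INSERT INTO TARGET VALUES (10), (20);
INSERT INTO R VALUES (1, 10), (1, 20);
""")


def count(table):
    """Number of rows in one of the three sketch tables."""
    return con.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# Deleting a TARGET row cascades to its relationship rows.
con.execute("DELETE FROM TARGET WHERE t_id = 20")
r_after_target_delete = count("R")

# Deleting the SOURCE row is rejected while a row of R depends on it.
try:
    con.execute("DELETE FROM SOURCE WHERE s_id = 1")
    source_delete_blocked = False
except sqlite3.IntegrityError:
    source_delete_blocked = True

# Remove the dependent rows first; only then can the parent row be deleted.
con.execute("DELETE FROM R WHERE s_id = 1")
con.execute("DELETE FROM SOURCE WHERE s_id = 1")
sources_left = count("SOURCE")
targets_left = count("TARGET")
```

Note that NO ACTION rejects the deletion of the source row even when the affected target has other relationship rows, which is the caveat about minimum cardinality 1 mentioned in the notes.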
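[Visual: an m:m relationship type r with the controlling property (C) specified for the target and cardinality 1. .m, with tables SOURCE, TARGET, and R. Two alternatives are shown: delete rules NA and C, or delete rules C and C combined with the following trigger.]
CREATE TRIGGER . . .
AFTER DELETE ON R
REFERENCING OLD AS O
FOR EACH ROW MODE DB2SQL
BEGIN ATOMIC
DELETE FROM TARGET
WHERE primary-key = O.foreign-key;
END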
Notes:
For the previous visual, we assumed that the controlling property had not been specified for
either end of the relationship type. As a consequence, the deletion of a relationship
instance did not affect any source or target instances. Likewise, the deletion of a row of the
table for the relationship type did not have an effect on rows of the tables for source and
target.
If the controlling property has been specified for an end of the relationship type, the
deletion of the row for a relationship instance affects rows in other tables: If the controlling
property has been specified for the source, the row for the source instance must be
deleted. If the controlling property has been specified for the target, the row for the target
instance must be deleted. As on the visual, let us assume that the controlling property has
been specified for the target of the relationship type. Then, the deletion of the row for a
relationship instance should automatically trigger the deletion of the row for the target
instance (from the target table).
Referential integrity does not provide for the automatic deletion of rows in tables other than
the dependent table. For the example on the visual, you can still use NO ACTION or
RESTRICT for the referential constraint between the tables for the source and the
relationship type. This would prevent rows for target instances without corresponding rows
for relationship instances, but would ignore the controlling property.
If you want to implement the controlling property correctly and your database management
system supports triggers, you can do the following:
• Use a delete rule of CASCADE for the referential constraint between the table for the
end opposing the controlling property and the table for the relationship type.
In the example, this is for the referential constraint between the tables for the source and
the relationship type. (The other delete rule is already CASCADE because of minimum
cardinality 0 for the target.)
• To achieve the automatic deletion of the source or target instance for the relationship
instance, you need a trigger for the table for the relationship type. This trigger must be
an after trigger activated on each deletion of a row from the table for the relationship
type. It must delete the row of the controlled end of the relationship type associated with
the row being deleted. The controlled end is the end for which the controlling property
has been specified.
In the example on the visual, the controlling property has been specified for the target.
Therefore, the appropriate row in the table for the target must be deleted. The trigger
shown on the visual will achieve this. The WHERE clause of the DELETE statement has
been simplified. It assumes that the primary key and the foreign key are not composite
keys. If they were composite keys, multiple predicates, combined by logical ANDs,
would be needed in the WHERE clause.
Note that the correlation name of the REFERENCING clause is needed to refer to the
foreign key value of the row for the relationship instance being deleted.
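A trigger of this kind can be tried out with Python's built-in sqlite3 module. This is a sketch under assumptions, not the DB2 trigger itself: SQLite's trigger syntax has no REFERENCING clause (the deleted row is always addressed as OLD), no MODE DB2SQL, and uses BEGIN ... END instead of BEGIN ATOMIC. The trigger name r_delete_target and the column names s_id and t_id are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
con.executescript("""
CREATE TABLE SOURCE (s_id INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE TARGET (t_id INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE R (
    s_id INTEGER NOT NULL,
    t_id INTEGER NOT NULL,
    PRIMARY KEY (s_id, t_id),
    FOREIGN KEY (s_id) REFERENCES SOURCE (s_id) ON DELETE CASCADE,
    FOREIGN KEY (t_id) REFERENCES TARGET (t_id) ON DELETE CASCADE
);
-- After trigger on the relationship table: delete the controlled (target) row.
CREATE TRIGGER r_delete_target
AFTER DELETE ON R
FOR EACH ROW
BEGIN
    DELETE FROM TARGET WHERE t_id = OLD.t_id;
END;
INSERT INTO SOURCE VALUES (1), (2);
INSERT INTO TARGET VALUES (10);
INSERT INTO R VALUES (1, 10), (2, 10);
""")


def count(table):
    """Number of rows in one of the three sketch tables."""
    return con.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# Delete one relationship row; the trigger removes the controlled target row,
# and CASCADE then removes the remaining relationship row for that target.
con.execute("DELETE FROM R WHERE s_id = 1 AND t_id = 10")
targets_left = count("TARGET")
relationships_left = count("R")
sources_left = count("SOURCE")
```

The sketch deletes the relationship row directly. Whether rows deleted by CASCADE processing also activate the trigger depends on the database management system and should be checked for the target system.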
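[Visual: four cases of 1:m relationship types. Delete rules shown: SN (top left), C (top right), NA (bottom left), C (bottom right).]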
Notes:
The cases on this visual consider 1:m relationship types. For simplicity, let us assume that
the maximum cardinality for the source is 1. (If the maximum cardinality for the target is 1
instead, just reverse the directions of the relationship type.) In this case, the tuple type for
the relationship type can be integrated or imbedded into the tuple type for the target.
Tables are created for the source tuple type and the extended target tuple type. As foreign
key, the table for the extended target tuple type contains the primary key of the table for the
source.
If the minimum source cardinality is 0, a relationship instance need not exist for a target
instance. Thus, if a source instance is deleted, the target does not prevent the deletion of
relationship instances for the source.
If the controlling property has not been specified for the target (top left case), this translates
into a delete rule of SET NULL for the referential constraint. CASCADE would be wrong
since it would delete the row for the target instance as well. NO ACTION or RESTRICT
would be too restrictive since it would prevent the deletion of the relationship instance.
If the controlling property has been specified (top right case), the corresponding target
instance should be deleted as well if a relationship instance is deleted. Thus, CASCADE is
the proper delete rule.
The bottom two cases deal with a minimum cardinality of 1 for the source of the relationship
type. Consequently, for each target instance, at least one relationship instance must exist.
For the bottom right case, the controlling property has been specified for the target
meaning that the target instance should be deleted as well if a relationship instance is
deleted. Thus, the minimum source cardinality of 1 does not block the deletion of the
relationship instance and CASCADE is the proper delete rule for the referential constraint.
For the bottom left case, the controlling property has not been specified for the target.
Accordingly, the deletion of a relationship instance should not cause the automatic deletion
of its target instance. Minimum cardinality 1 together with the absence of the controlling
property disallow the deletion of a source instance if it would leave a target instance without
a relationship instance. They require a delete rule of NO ACTION (or RESTRICT) for the
referential constraint: You do not allow the deletion of a row of the table for the source as
long as a row is dependent on it; otherwise, you could delete the row for the last
relationship instance for the target.
Note that this does not completely match the meaning of minimum cardinality 1 for the
source of the relationship type, but this is as close as you can get.
For 1:1 relationship types, the delete rules are determined in the same manner. The only
consideration that is different is that you have a choice for the relationship key. It can either
be the key of the source or the key of the target. Depending on the choice, the tuple type
for the relationship type can be integrated into the tuple type for the source or for the target.
The table for the tuple type into which the tuple type for the relationship type is integrated
contains the foreign key.
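The three delete rules discussed for 1:m relationship types can be compared in a minimal sketch using Python's built-in sqlite3 module. SQLite only stands in for the target database management system, and the three target tables TARGET_SN, TARGET_C, and TARGET_NA are hypothetical names, one per delete rule.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
con.executescript("""
CREATE TABLE SOURCE (s_id INTEGER NOT NULL PRIMARY KEY);
-- Minimum source cardinality 0, target not controlled: SET NULL.
CREATE TABLE TARGET_SN (
    t_id INTEGER NOT NULL PRIMARY KEY,
    s_id INTEGER REFERENCES SOURCE (s_id) ON DELETE SET NULL
);
-- Target controlled: CASCADE.
CREATE TABLE TARGET_C (
    t_id INTEGER NOT NULL PRIMARY KEY,
    s_id INTEGER REFERENCES SOURCE (s_id) ON DELETE CASCADE
);
-- Minimum source cardinality 1, target not controlled: NO ACTION.
CREATE TABLE TARGET_NA (
    t_id INTEGER NOT NULL PRIMARY KEY,
    s_id INTEGER NOT NULL REFERENCES SOURCE (s_id) ON DELETE NO ACTION
);
INSERT INTO SOURCE VALUES (1), (2);
INSERT INTO TARGET_SN VALUES (100, 1);
INSERT INTO TARGET_C  VALUES (200, 1);
INSERT INTO TARGET_NA VALUES (300, 2);
""")

# Source 1 is referenced by a SET NULL row and a CASCADE row.
con.execute("DELETE FROM SOURCE WHERE s_id = 1")
sn_rows = con.execute("SELECT t_id, s_id FROM TARGET_SN").fetchall()
c_row_count = con.execute("SELECT COUNT(*) FROM TARGET_C").fetchone()[0]

# Source 2 is referenced by a NO ACTION row, so its deletion is rejected.
try:
    con.execute("DELETE FROM SOURCE WHERE s_id = 2")
    na_blocked = False
except sqlite3.IntegrityError:
    na_blocked = True
```

With SET NULL, the target row survives and only loses its reference; with CASCADE, it is deleted together with the source row; with NO ACTION, the source row cannot be deleted while the target row refers to it.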
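[Visual: r1 is an m:m relationship type between A and B, and r1 is the source of r2 with target C. Tables TA, TB, TC, and TR2. Delete rules for the referential constraints from TA, TC, and TB to TR2: C, NA, C in the upper case; NA, NA, NA in the lower case.]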
Notes:
In Unit 6 - Tuple Types, we saw that tuple types must not be provided for m:m relationship
types being the source (target) of another relationship type with a minimum target (source)
cardinality of 1. We will study the delete rules for the appropriate cases.
On the current visual and the next four visuals, the m:m relationship type and the other
relationship type are called r1 and r2, respectively. The source and target of r1 are called A
and B. Without loss of generality, we assume that r1 is the source of r2. The target for r2 is
called C.
Furthermore, we assume for this visual and the next that the maximum source cardinality of
r2 is m; otherwise, the tuple type for r2 could be combined with the tuple type for its target.
As we saw in Unit 6 - Tuple Types, a tuple type for r1 must not be provided since the tuple
type for r2 accurately describes the relationship instances for r1. Therefore, tables must
only be created for A, B, C, and r2. Let us name them TA, TB, TC, and TR2, respectively.
To describe the relationship instances for r1, TR2 contains columns for the primary keys of
TA and TB. In addition, it contains columns for the primary key of TC.
Since the relationship key for r1 and the key of C are defining attributes for r2, the
appropriate columns of TR2 must not be nullable. They are even foreign keys of TR2
because their values must exist as primary key values in the respective parent tables.
For the cases on this visual, the minimum cardinality is 0 for both the source and the target
of r1. This means that the affiliated relationship instances can be deleted if a source or
target instance is deleted.
For the upper case on the visual, the minimum source cardinality for r2 is 0 implying that a
relationship instance need not exist for an instance of C. Consequently, the deletion of the
source or target instance for a relationship instance of r1 must remove the relationship
instance and any relationship instances of r2 for which it is the source. This translates into
delete rules of CASCADE for the referential constraints between TA and TR2 and TB and
TR2.
For the referential constraint between TC and TR2, the following considerations apply:
The minimum target cardinality of 1 for r2 requires an instance of r2 for each instance of r1.
Since the controlling property has not been specified for the source of r2, the delete rule
must be NO ACTION or RESTRICT; otherwise, the last row of TR2 for an instance of r1
could be deleted when a row of TC is deleted.
Again, note that delete rule NO ACTION (or RESTRICT) does not completely match the
meaning of the minimum target cardinality.
For the second case on the visual, the minimum source cardinality of r2 is 1. Therefore,
each instance of C requires an instance of r2. Since the controlling property has not been
specified for C, the delete rules for the referential constraints between TA and TR2 and TB
and TR2 must be NO ACTION or RESTRICT; otherwise, the last row of TR2 for an instance
of C could be deleted when rows of TA or TB are deleted.
Again, note that delete rule NO ACTION (or RESTRICT) does not completely match the
meaning of the minimum source cardinality.
If the controlling property for C were specified, delete rules of CASCADE could be used for
the referential constraints in conjunction with a trigger as outlined on page 8-22.
The trigger would need to delete the appropriate row in TC if a row of TR2 were deleted.
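[Visual: the same structure with a minimum cardinality of 1 for the source of r1 (upper case) and for both ends of r1 (lower case). Delete rules for the referential constraints from TA, TC, and TB to TR2: NA, NA, C in the upper case; NA, NA, NA in the lower case.]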
Notes:
The cases on this visual illustrate the delete rules if the minimum cardinality of source or
target of r1 is 1.
If, for example, the minimum cardinality of the source of r1 is 1, a relationship instance
must exist for each target instance of r1. The deletion of a source instance must not cause
the deletion of the last relationship instance of r1 for a target instance. Consequently, the
delete rule for the referential constraint between TA and TR2 must be NO ACTION or
RESTRICT. (Again, with the caveat that this does not match completely the meaning of
minimum cardinality 1.)
For the second case, both minimum cardinalities of r1 are 1. As a consequence, the delete
rules between TA and TR2 and TB and TR2 must both be NO ACTION or RESTRICT.
Delete Rules and ER Model (7 of 8)
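[Visual: r1 is an m:m relationship type between A and B; r2 is an owning relationship type for dependent entity type C. Tables TA, TB, and TC. Delete rules for the two referential constraints: NA, NA if C is a dependent entity type without the controlling property (D); C, C if C has the controlling property (DC).]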
Notes:
As we saw in Unit 6 - Tuple Types, tuple types must not be provided for r1 and r2, if:
• r1 is an m:m relationship type,
• r2 is an owning relationship type whose source or target is r1, and
• the minimum cardinality for the dependent entity type is 1.
This is because the key of r1 is part of the key of the dependent entity type and a
dependent entity instance must exist for each instance of r1. Accordingly, tables are only
established for A, B, and C. They are called TA, TB, and TC, respectively. The primary key
of TC comprises foreign keys referring to TA and TB.
For the cases on the visual, the minimum cardinalities are 0 for the source and the target of
r1.
Similar conclusions as for the previous visuals lead to delete rules of NO ACTION or
RESTRICT if the controlling property has not been specified for the dependent entity type.
The delete rules cannot be CASCADE since the deletion of rows of TA or TB would cause
the deletion of rows of TC. The delete rules cannot be SET NULL either. SET NULL could
cause instances for C without instances for r1 which is not allowed for dependent entity
types.
If the controlling property has been specified for the dependent entity type, both delete
rules must be CASCADE. If instances of A or B are deleted, affiliated relationship instances
of r1 should be deleted. Because of the controlling property for C, the associated
dependent entity instances are to be deleted as well.
Delete Rules and ER Model (8 of 8)
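[Visual: the same structure with the controlling property specified for dependent entity type C (DC). Delete rules for the referential constraints from TA and TB to TC: NA, C if the minimum cardinality of the source of r1 is 1; NA, NA if both minimum cardinalities of r1 are 1.]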
Notes:
This visual illustrates two cases for which the minimum cardinalities for source or target of
relationship type r1 are 1.
Let us discuss the first case for which the minimum cardinality of the source is 1. The
minimum cardinality of 1 for A blocks the deletion of the last relationship instance for an
instance of B if an instance of A is to be deleted. Thus, a delete rule of NO ACTION or
RESTRICT is appropriate for the referential constraint between TA and TC. (Again, the
caveat for the minimum cardinality of 1 applies.) Note that the delete rules are independent
of whether or not the controlling property has been specified for the dependent entity type:
In any case, you must block the deletion of the relationship instance for r1.
As a consequence of the controlling property for the dependent entity type, the delete rule
for the referential constraint between TB and TC must be CASCADE. If the controlling
property were not specified, the delete rule would be NO ACTION or RESTRICT.
For the second case on the visual, the minimum cardinalities are 1 for both ends of
relationship type r1. This leads to delete rules of NO ACTION or RESTRICT for both
referential constraints.
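[Visual: r1 is an m:m relationship type between A and B; r2 has a source cardinality of 1 and target C. Tables TA, TB, and TC, with delete rule SN for both referential constraints and the following trigger.]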
CREATE TRIGGER . . .
AFTER UPDATE ON TC
REFERENCING NEW AS N
FOR EACH ROW MODE DB2SQL
WHEN( (N.foreign-key-TA IS NULL AND N.foreign-key-TB IS NOT NULL)
OR (N.foreign-key-TB IS NULL AND N.foreign-key-TA IS NOT NULL) )
BEGIN ATOMIC
UPDATE TC
SET foreign-key-TA = NULL, foreign-key-TB = NULL
WHERE primary-key = N.primary-key;
END
Notes:
From the preceding discussions, we know that a tuple type is not required for relationship
type r1. Its relationship instances are completely described by the tuples for r2. However,
the source cardinality of 1 for r2 allows us to imbed the tuple type for r2 into the tuple type
for its target C.
Accordingly, tables are only established for A, B, and extended tuple type C. Let us call
them TA, TB, and TC, respectively. As the consequence of the imbedding, TC contains the
primary keys of TA and TB as foreign keys and the appropriate columns must be defined as
nullable.
If an instance of A or B is deleted, any relationship instances of r1 for it must be deleted as
well. In turn, the relationship instances of r2 being dependent on the deleted instances of r1
must be deleted. The cardinalities for r2 do not prevent the deletion of the instances of r2.
However, the target instances for the relationship instances (these are instances of C) must
not be deleted.
For the two referential constraints, this seems to translate into delete rules of SET NULL.
However, not quite so. If a row of TA is deleted, only the references to it in TC are deleted.
Likewise, if a row of TB is deleted, only the references to it in TC are deleted. However, the
rows of TC describe relationship instances for r1 and not unrelated references to TA and
TB. A relationship instance consists of a pair of references and not a single reference.
Therefore, the references to TA and TB in a row of TC should be deleted at the same time.
You may decide not to care about a reference to TA or TB in TC if the other reference is
NULL. However, if you want to correctly implement relationship type r1, you need a trigger
synchronizing the foreign key columns when one of them is set to NULL. The trigger on the
visual achieves this.
The trigger is activated after a row of TC has been updated. The changing of foreign key
columns by the referential integrity support is considered as an update of the row. Some
systems (e.g., DB2) do not consider it as an update of the columns. Therefore, you should
not specify individual columns (UPDATE OF ...).
The WHEN clause ensures that the triggered action is executed only if one of the new
values for the foreign keys is NULL and the other is not. The triggered action, i.e., the
UPDATE statement, sets both foreign key values for a row to NULL.
In the trigger, synonyms foreign-key-TA and foreign-key-TB are used to denote the foreign
key columns in TC referring to TA and TB, respectively. The REFERENCING clause
enables us to refer to the new values of the updated rows of TC.
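The synchronizing trigger can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions: SQLite replaces the REFERENCING clause by the predefined name NEW, and the names tc_sync_fk, c_id, fk_ta, and fk_tb are hypothetical. Because the treatment of changes made by SET NULL processing is specific to the database management system, the sketch sets one foreign key column to NULL with an explicit UPDATE and shows that the trigger clears the other one.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
con.executescript("""
CREATE TABLE TA (a_id INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE TB (b_id INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE TC (
    c_id  INTEGER NOT NULL PRIMARY KEY,
    fk_ta INTEGER REFERENCES TA (a_id) ON DELETE SET NULL,
    fk_tb INTEGER REFERENCES TB (b_id) ON DELETE SET NULL
);
-- Keep the pair of references in step: if exactly one is NULL, clear both.
CREATE TRIGGER tc_sync_fk
AFTER UPDATE ON TC
FOR EACH ROW
WHEN ((NEW.fk_ta IS NULL AND NEW.fk_tb IS NOT NULL)
   OR (NEW.fk_tb IS NULL AND NEW.fk_ta IS NOT NULL))
BEGIN
    UPDATE TC SET fk_ta = NULL, fk_tb = NULL WHERE c_id = NEW.c_id;
END;
INSERT INTO TA VALUES (1);
INSERT INTO TB VALUES (7);
INSERT INTO TC VALUES (500, 1, 7), (501, 1, 7);
""")

# Clear only the reference to TA in row 500; the trigger clears the other one.
con.execute("UPDATE TC SET fk_ta = NULL WHERE c_id = 500")
tc_rows = con.execute(
    "SELECT c_id, fk_ta, fk_tb FROM TC ORDER BY c_id").fetchall()
```

Row 500 ends up with both references set to NULL, while row 501, which was not updated, keeps its pair of references.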
Delete Connection
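[Visual: T1 is the parent of T2 and T4 (delete rule CASCADE); T2 is the parent of T3 (CASCADE); T3 and T4 are parents of the delete-connected table T (NO ACTION). Annotation: the delete rules for T must be the same and not SET NULL.]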
Notes:
A table T is delete-connected to a table T1 if the deletion of a row of T1 may require
(immediate or indirect) accesses to T. For example, for the deletion of a row of T1, it may
be necessary to determine if T contains rows with a foreign key value equal to the primary
key value of the deleted row.
In the example on the visual, T is delete-connected to T1 over two paths of referential
constraints:
• Let us first consider the left path of referential constraints. Delete rule CASCADE
between tables T1 and T2 may cause the deletion of rows of T2 if a row is deleted from
T1. Because of delete rule CASCADE between T2 and T3, this may, in turn, cause the
deletion of rows from T3. Delete rule NO ACTION between T3 and T requires that table
T is checked for matching foreign key values to determine if the rows of T3 can be
deleted.
Thus, T is delete-connected to T1 via the left path. Of course, T2 and T3 are also
delete-connected to T1.
• T is also delete-connected to T1 via the right path of referential constraints: Delete rule
CASCADE between tables T1 and T4 may cause the deletion of rows of T4 if a row is
deleted from T1. Delete rule NO ACTION between T4 and T requires that table T is
checked for matching foreign key values to determine if the rows of T4 can be deleted.
Of course, T4 is also delete-connected to T1.
For delete-connected tables, the following restriction applies:
If T is delete-connected to T1 via multiple paths with different referential constraints for T,
then the delete rules for the referential constraints involving T must be the same and must
not be SET NULL.
Otherwise, the result of the deletion of a row from T1 would depend on the sequence the
various paths are processed in by the database management system. The relational data
model requires, however, that the result be independent of the sequence chosen. Also, the
number of variations could be so large that checking whether the result is the same for all
processing sequences of the paths is impractical.
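The restriction can be checked mechanically. The following Python sketch is one possible reading of it, not a description of how any database management system implements the check: the function name check_delete_connection and the triple format are assumptions, and a table is treated as delete-connected to the starting table if it is a dependent of that table or of a table reached from it through CASCADE rules only. Referential cycles are not examined here.

```python
from collections import defaultdict


def check_delete_connection(constraints, start):
    """Return the tables violating the restriction for deletions from `start`.

    `constraints` is a list of (parent, dependent, delete_rule) triples.
    """
    dependents = defaultdict(list)
    for parent, dependent, rule in constraints:
        dependents[parent].append((dependent, rule))

    reached_by = defaultdict(list)  # table -> delete rules of the constraints reaching it
    visited = {start}
    pending = [start]
    while pending:
        parent = pending.pop()
        for dependent, rule in dependents[parent]:
            reached_by[dependent].append(rule)
            # A deletion propagates further only through CASCADE.
            if rule == "CASCADE" and dependent not in visited:
                visited.add(dependent)
                pending.append(dependent)

    violations = []
    for table, rules in reached_by.items():
        # Several constraints reach the table: rules must match and not be SET NULL.
        if len(rules) > 1 and (len(set(rules)) > 1 or "SET NULL" in rules):
            violations.append(table)
    return sorted(violations)


# The structure described in the notes: T is reached via T3 and via T4.
structure = [
    ("T1", "T2", "CASCADE"), ("T2", "T3", "CASCADE"), ("T3", "T", "NO ACTION"),
    ("T1", "T4", "CASCADE"), ("T4", "T", "NO ACTION"),
]
valid_result = check_delete_connection(structure, "T1")

# Changing one of the two rules for T to SET NULL violates the restriction.
changed = structure[:-1] + [("T4", "T", "SET NULL")]
invalid_result = check_delete_connection(changed, "T1")
```

For the structure from the notes, no table is reported; after the change, T is reported.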
Referential Cycles
[Visual: a referential cycle involving tables T1, T2, and T3, annotated "At least two must not be CASCADE", and a self-referencing constraint, annotated "Must be NO ACTION or CASCADE".]
Notes:
A referential cycle is a sequence of referential constraints leading back to the same table.
The visual shows two cycles: First, it illustrates a cycle consisting of multiple referential
constraints involving tables T1, T2, and T3. Then, it illustrates a cycle just involving a single
table, i.e., a self-referencing constraint.
For referential cycles, the following restrictions apply:
1. In a referential cycle of two or more tables, the tables must not be delete-connected to
themselves.
This implies that at least two of the delete rules must not be CASCADE.
2. The delete rule for a self-referencing constraint must be CASCADE or NO ACTION. It
cannot be RESTRICT or SET NULL.
In both cases, the result of operations deleting multiple rows from a table would depend
on the sequence in which the rows are deleted. The relational data model requires the
results to be independent of the processing sequences of the rows.
Note that the table of a self-referencing constraint is always delete-connected to itself.
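The two restrictions for referential cycles, exactly as stated in these notes, can be written down as a small Python check. The function name cycle_delete_rules_ok and the representation of a cycle as a list of delete rules are assumptions made for this sketch.

```python
def cycle_delete_rules_ok(rules):
    """Check the delete rules of one referential cycle, listed in cycle order."""
    if len(rules) == 1:
        # Self-referencing constraint: only CASCADE or NO ACTION is allowed.
        return rules[0] in ("CASCADE", "NO ACTION")
    # Cycle of two or more tables: at least two rules must not be CASCADE.
    return sum(rule != "CASCADE" for rule in rules) >= 2


three_table_cycle_ok = cycle_delete_rules_ok(["CASCADE", "NO ACTION", "SET NULL"])
too_many_cascades = cycle_delete_rules_ok(["CASCADE", "CASCADE", "NO ACTION"])
self_reference_set_null = cycle_delete_rules_ok(["SET NULL"])
self_reference_cascade = cycle_delete_rules_ok(["CASCADE"])
```

The first and the last rule list satisfy the restrictions; the second has only one rule that is not CASCADE, and the third uses SET NULL for a self-referencing constraint.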
Parent Table
Notes:
If you want to use your database management system for enforcing a referential constraint
between two tables, you must define the referential constraint to the database
management system. A referential constraint concerns two tables: the parent table and the
dependent table.
Assuming that the parent key of the parent table is the primary key (as we do), you must
define which columns form the primary key for the table. If the primary key is a composite
key, you must define the sequence of the columns for the primary key. This sequence is
relevant for the foreign key of the dependent table. The referential constraint itself must be
defined for the dependent table. First, you must specify the columns of the foreign key.
They must be specified in the same sequence as the corresponding primary key columns.
Next, you must specify the parent table. In addition, you must specify the delete rule and, if
your database management system allows it and gives you a choice, the update rule for
the referential constraint.
You may also give the referential constraint a name. The name can be used to delete the
constraint again if it is no longer needed.
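The definition steps can be illustrated with a minimal sketch using Python's built-in sqlite3 module. SQLite only stands in for the target database management system, whose syntax and options may differ, and the names PARENT_T, DEPENDENT_T, fk_dep_parent, and the column names are hypothetical. The sketch defines a composite primary key, a named foreign key listing its columns in the same sequence, and then reads the definition back with SQLite's PRAGMA foreign_key_list.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
con.executescript("""
CREATE TABLE PARENT_T (
    key_1 INTEGER NOT NULL,
    key_2 INTEGER NOT NULL,
    PRIMARY KEY (key_1, key_2)              -- column sequence of the primary key
);
CREATE TABLE DEPENDENT_T (
    dep_id INTEGER NOT NULL PRIMARY KEY,
    ref_1  INTEGER NOT NULL,
    ref_2  INTEGER NOT NULL,
    CONSTRAINT fk_dep_parent                -- optional constraint name
        FOREIGN KEY (ref_1, ref_2)          -- same sequence as the primary key
        REFERENCES PARENT_T (key_1, key_2)
        ON DELETE NO ACTION
        ON UPDATE NO ACTION
);
""")

# PRAGMA foreign_key_list reports one row per foreign key column:
# (id, seq, parent table, from-column, to-column, on_update, on_delete, match).
fk_rows = sorted(
    con.execute("PRAGMA foreign_key_list(DEPENDENT_T)").fetchall(),
    key=lambda row: row[1],
)
parent_table = fk_rows[0][2]
column_pairs = [(row[3], row[4]) for row in fk_rows]
update_rule, delete_rule = fk_rows[0][5], fk_rows[0][6]
```

How a named constraint is removed again depends on the database management system and is not shown here.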
Referential Integrity - Documentation
Notes:
The documentation for a referential constraint should be added to the documentation for
the dependent table. For each referential constraint, provide the following information:
• A name for the referential constraint. You should name each referential constraint. The
name must be unique per dependent table, but we suggest making it unique for the
application domain.
• An ordered list of the columns making up the foreign key. The order of the columns
must match the order of the corresponding primary key columns. The names can be
different.
• The name of the parent table, i.e., the table containing the corresponding primary key.
• The delete rule for the referential constraint, i.e., NO ACTION, RESTRICT, SET NULL,
or CASCADE.
• If your database management system gives you a choice, the update rule for the
referential constraint. In most cases, this will be NO ACTION or RESTRICT.
• A unique constraint number. You should give each referential constraint of the
application domain a unique number. This number is not needed for defining the
referential constraint to the target database management system. It is used to identify
the constraint in referential structures (described later in this topic).
A referential structure provides an overview of the referential constraints for the
application domain or for a subset thereof. It shows how the tables for the application
domain are interconnected by referential constraints. The constraint number in the
referential structure serves as reference to the documentation for the referential
constraint. It can be used to find details about the referential constraint.
Maintenance View - Updated ER Model
[Visual: updated entity-relationship diagram of the Maintenance View with entity types MANUFACTURER, AIRCRAFT TYPE, AIRCRAFT MODEL, AIRCRAFT, MECHANIC, MAINTENANCE RECORD, SEAT, ENGINE, and ENGINE LOCATION and the relationship types between them.]
Notes:
In Unit 4 - Entity-Relationship Model, we established the initial Maintenance View for our
sample airline company called Come Aboard. This visual shows an update of the
Maintenance View. It includes the changes caused by normalization.
The major changes are:
• Dependent entity type SEAT has been added as a result of normalization (First Normal
Form).
• Entity types ENGINE and ENGINE LOCATION and relationship types
ENGINE_on_AIRCRAFT and ENGINE_on_AIRCRAFT_in_ENGINE LOCATION have
been added due to normalization (First Normal Form).
• Entity type MANUFACTURER and relationship types
AIRCRAFT TYPE_from_MANUFACTURER and ENGINE_from_MANUFACTURER
have been added due to normalization (Third Normal Form).
On the visual, the relationship types for which tuple types and, thus, tables must not be
established have been grayed out.
Referential Structure
[Visual: Referential Structure for Maintenance View. Tables: MANUFACTURER, AIRCRAFT_TYPE, AIRCRAFT_MODEL, AIRCRAFT, MECHANIC, MECHANIC_FOR_AM, MECHANIC_FOR_AC, MAINTENANCE_RECORD, SEAT, and ENGINE, connected by referential constraints numbered 1 to 12, each labeled with its delete rule (NA, SN, or C).]
Notes:
A referential structure is a graphical representation of the referential constraints for an
application domain. It gives an overview of the referential constraints, not a detailed
description.
The referential structure for the entire application domain may not fit onto a single page
with the consequence that it must be split into subsets fitting onto a page. On the visual, we
have concentrated on the subset corresponding to the (updated) Maintenance View for our
sample airline company called Come Aboard.
The referential structure contains rectangles for all tables of the considered subset. The
rectangles contain the names of the tables. A referential constraint between two tables is
represented by a single-headed or double-headed arrow leading from the parent table to
the dependent table. A single-headed arrow is used if a primary key value can occur at
most once in the dependent table. A double-headed arrow is used if a primary key value
can occur more than once in the dependent table. (Note that a foreign key value can occur
only once in the parent table.)
Next to the arrowhead and next to the dependent table, the delete rule for the referential
constraint is specified. The abbreviations NA, R, SN, and C are used for NO ACTION,
RESTRICT, SET NULL, and CASCADE, respectively.
A little square with a number is placed on the arrow to identify the referential constraint. We
have already talked about that number, referred to as constraint number, in conjunction with
the documentation of referential constraints. The constraint number identifies the
documentation for the referential constraint being part of the documentation for the
dependent table. You could think of using the constraint name instead, but the constraint
name is generally too long and clumsy for use in diagrams.
You can also add a constraint summary, in the form of a listing or table (see page 8-46),
providing the names of the tables involved and the foreign key columns.
As mentioned before, the referential structure for the entire application domain may not fit
onto a single page with the consequence that it must be split into subsets fitting onto a
page. If possible, the subsets should correspond to the submodels you established for the
entity-relationship model of the application domain. Otherwise, proceed in the same
manner as for the entity-relationship model and establish referential (sub)structures for
autonomous subareas or different views of the application domain. Only if they will not fit
onto a single page, establish referential (sub)structures for sets of tables logically belonging
together and fitting onto a page.
Generally, the referential (sub)structures of the various pages will overlap. Some tables and
referential constraints will occur on multiple pages. The (sub)structures must not conflict
with each other. Together, they must cover all referential constraints for the application
domain.
Now, let us discuss the referential structure for the Maintenance View of Come Aboard in
more detail:
• Tables must be established for all entity types of the Maintenance View with the
exception of ENGINE LOCATION. Its tuple type together with the tuple type for
relationship type ENGINE_on_AIRCRAFT could be imbedded in the tuple type for
ENGINE.
Furthermore, tables must be established for all m:m relationship types, i.e., for
MECHANIC_trained_for_AIRCRAFT MODEL and
MECHANIC_scheduled_for_AIRCRAFT. The appropriate tables have been called
MECHANIC_for_AM and MECHANIC_for_AC, respectively.
Tables are not needed for any 1:m relationship types. Their tuple types can be combined
with tuple types for their source or target.
• Since the tuple type for relationship type AIRCRAFT TYPE_from_MANUFACTURER
has been imbedded into the tuple type for AIRCRAFT TYPE, table AIRCRAFT_TYPE
has a foreign key referring to table MANUFACTURER.
Because of minimum cardinality 1 for entity type MANUFACTURER and the absence of
the controlling property for AIRCRAFT TYPE, the delete rule for the referential constraint
must be NO ACTION. (An aircraft type must always have a manufacturer.)
• Because AIRCRAFT MODEL is a dependent entity type, table AIRCRAFT_MODEL
contains, as foreign key, the primary key of table AIRCRAFT_TYPE. Since the
controlling property has not been specified for the dependent entity type, the delete rule
must be NO ACTION. (An aircraft model must always have an aircraft type.)
• Since the tuple type for relationship type AIRCRAFT MODEL_for_AIRCRAFT has been
imbedded into the tuple type for AIRCRAFT, table AIRCRAFT has a foreign key referring
to table AIRCRAFT_MODEL.
Because of minimum cardinality 1 for entity type AIRCRAFT MODEL and the absence of
the controlling property for AIRCRAFT, the delete rule for the referential constraint must
be NO ACTION. (An aircraft must always have an aircraft model.)
• Because SEAT is a dependent entity type, table SEAT contains, as foreign key, the
primary key of table AIRCRAFT.
Since the controlling property has been specified for the dependent entity type, the
delete rule must be CASCADE. (If the aircraft is removed, information about the seats
on the aircraft need no longer be kept.)
• Since the tuple type for relationship type ENGINE_from_MANUFACTURER has been
imbedded into the tuple type for ENGINE, table ENGINE has a foreign key referring to
table MANUFACTURER.
Because of minimum cardinality 1 for entity type MANUFACTURER and the absence of
the controlling property for ENGINE, the delete rule for the referential constraint must be
NO ACTION. (An engine must always have a manufacturer.)
• As we mentioned before, the tuple types for entity type ENGINE LOCATION and
relationship type ENGINE_on_AIRCRAFT have been imbedded into the tuple type for
ENGINE. Therefore, table ENGINE has a foreign key referring to table AIRCRAFT.
Because the minimum cardinality of AIRCRAFT is 0 for relationship type
ENGINE_on_AIRCRAFT, the delete rule must be SET NULL. (An engine need not be
mounted on an aircraft.)
• Relationship type MECHANIC_trained_for_AIRCRAFT MODEL is an m:m relationship
type. Therefore, table MECHANIC_FOR_AM has foreign keys referring to tables
AIRCRAFT_MODEL and MECHANIC, respectively.
Since the minimum cardinalities for both ends of the relationship type are 0, both delete
rules must be CASCADE. (The relationship between an aircraft model and a mechanic
can be deleted if either one is "deleted".)
• Relationship type MECHANIC_scheduled_for_AIRCRAFT is an m:m relationship type.
Therefore, table MECHANIC_FOR_AC has foreign keys referring to tables AIRCRAFT
and MECHANIC, respectively.
Since the minimum cardinalities for both ends of the relationship type are 0, both delete
rules must be CASCADE. (The relationship between an aircraft and a mechanic can be
deleted if either one is "deleted".)
• Since the tuple type for relationship type MAINTENANCE RECORD_from_MECHANIC
has been imbedded into the tuple type for MAINTENANCE RECORD, table
MAINTENANCE_RECORD has a foreign key referring to table MECHANIC.
Because of minimum cardinality 1 for entity type MECHANIC and the absence of the
controlling property for MAINTENANCE RECORD, the delete rule for the referential
constraint must be NO ACTION. (A maintenance record must always have a mechanic.)
• Since the tuple type for relationship type
MAINTENANCE RECORD_belongs_to_MAINTENANCE RECORD has been imbedded
into the tuple type for MAINTENANCE RECORD, table MAINTENANCE_RECORD has
a self-referencing constraint.
Because of the controlling property for the target end of the relationship type, the delete
rule for the referential constraint must be CASCADE. (A maintenance record should be
thrown away if its owning maintenance record is deleted.)
If you assumed that the controlling property had not been specified, the delete rule
should be SET NULL. (If the owning maintenance record is deleted, the dependent
records are kept, but their references to the owning record are reset.) However, the
restrictions for self-referencing constraints would not allow us to choose SET NULL as
delete rule. We would have to choose NO ACTION because CASCADE, the other
alternative, would delete the dependent records.
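The delete rules chosen above can be tried out with a small sketch. It uses Python's built-in sqlite3 module rather than DB2, so details of syntax and behavior may differ from your database management system. Only the table names and delete rules are taken from the text; the column names (Aircraft_Number, Seat_Number, Engine_Number) are assumptions made for illustration, and only four of the tables are shown.

```python
import sqlite3

# Column names are assumptions; table names and delete rules follow the text.
DDL = """
CREATE TABLE AIRCRAFT_MODEL (
    Type_Code    TEXT NOT NULL,
    Model_Number TEXT NOT NULL,
    PRIMARY KEY (Type_Code, Model_Number)
);
CREATE TABLE AIRCRAFT (
    Aircraft_Number TEXT PRIMARY KEY,
    Type_Code       TEXT NOT NULL,
    Model_Number    TEXT NOT NULL,
    FOREIGN KEY (Type_Code, Model_Number)
        REFERENCES AIRCRAFT_MODEL (Type_Code, Model_Number)
        ON DELETE NO ACTION      -- an aircraft must always have a model
);
CREATE TABLE SEAT (
    Aircraft_Number TEXT NOT NULL
        REFERENCES AIRCRAFT (Aircraft_Number)
        ON DELETE CASCADE,       -- controlling property specified
    Seat_Number     TEXT NOT NULL,
    PRIMARY KEY (Aircraft_Number, Seat_Number)
);
CREATE TABLE ENGINE (
    Engine_Number   TEXT PRIMARY KEY,
    Aircraft_Number TEXT
        REFERENCES AIRCRAFT (Aircraft_Number)
        ON DELETE SET NULL       -- an engine need not be mounted
);
"""


def delete_rule_demo():
    """Return (model_delete_allowed, seats_left, engine_aircraft_reference)."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only on request
    con.executescript(DDL)
    con.execute("INSERT INTO AIRCRAFT_MODEL VALUES ('A340', '200')")
    con.execute("INSERT INTO AIRCRAFT VALUES ('AC1', 'A340', '200')")
    con.executemany("INSERT INTO SEAT VALUES ('AC1', ?)", [("1A",), ("1B",)])
    con.execute("INSERT INTO ENGINE VALUES ('E1', 'AC1')")

    # NO ACTION: the model cannot be deleted while an aircraft refers to it.
    try:
        con.execute("DELETE FROM AIRCRAFT_MODEL")
        model_delete_allowed = True
    except sqlite3.IntegrityError:
        model_delete_allowed = False

    # CASCADE removes the seats; SET NULL keeps the engine but resets its reference.
    con.execute("DELETE FROM AIRCRAFT WHERE Aircraft_Number = 'AC1'")
    seats_left = con.execute("SELECT COUNT(*) FROM SEAT").fetchone()[0]
    engine_ref = con.execute("SELECT Aircraft_Number FROM ENGINE").fetchone()[0]
    con.close()
    return model_delete_allowed, seats_left, engine_ref


if __name__ == "__main__":
    print(delete_rule_demo())
```

Deleting the aircraft model is rejected, whereas deleting the aircraft removes its seats and leaves the engine in place without an aircraft reference.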
Notes:
This visual shows the constraint summary for the Maintenance View. The line numbers
match the numbers for the constraints. For each constraint, the dependent table, the parent
table, and the foreign key columns are listed. The numbers in front of foreign key columns
specify their sequence in the foreign key.
Domain Integrity
Notes:
Domain integrity, also referred to as value integrity, deals with the correctness of values in
columns of tables.
A column must only assume values allowed by the abstract data type for the data element
associated with the column. Furthermore, the values must adhere to the domain
specifications (restrictions) for the column's data element. They must also observe length
requirements or restrictions for the data element such as the minimum length, the
maximum length, the number of digits, or the number of decimal places. The length
requirements may have been expressed by parameters for the abstract data type for the
data element.
For example, column Type_Code for table AIRCRAFT must only assume 3-letter codes for
valid airports. Column Number_of_Engines for table AIRCRAFT_TYPE must only assume
integer values between 0 and 4. Column Last_Name of table EMPLOYEE may only
assume values of abstract data type NAMEDATA and must not be longer than 50
characters.
The ingredients required to ensure domain integrity are user defined distinct types, user
defined functions, triggers, and check constraints.
Since the implementation of domain integrity is closely related to the implementation of
abstract data types described in Unit 7 - From Tuple Types to Tables, we need not discuss
it further.
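As a brief reminder, the following sketch shows check constraints for two of the examples above. It uses Python's sqlite3 module instead of DB2 and covers check constraints only; distinct types such as NAMEDATA are not shown. Column Employee_Number and the sample values are assumptions.

```python
import sqlite3

# SQLite is dynamically typed, so the integer requirement is checked explicitly.
DDL = """
CREATE TABLE AIRCRAFT_TYPE (
    Type_Code         TEXT PRIMARY KEY,
    Number_of_Engines INTEGER NOT NULL
        CHECK (typeof(Number_of_Engines) = 'integer'
               AND Number_of_Engines BETWEEN 0 AND 4)
);
CREATE TABLE EMPLOYEE (
    Employee_Number INTEGER PRIMARY KEY,   -- assumed column
    Last_Name       TEXT NOT NULL
        CHECK (length(Last_Name) <= 50)
);
"""


def try_insert(con, sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        con.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


def domain_demo():
    con = sqlite3.connect(":memory:")
    con.executescript(DDL)
    results = [
        try_insert(con, "INSERT INTO AIRCRAFT_TYPE VALUES (?, ?)", ("A340", 4)),
        try_insert(con, "INSERT INTO AIRCRAFT_TYPE VALUES (?, ?)", ("X999", 5)),
        try_insert(con, "INSERT INTO EMPLOYEE VALUES (?, ?)", (1, "Smith")),
        try_insert(con, "INSERT INTO EMPLOYEE VALUES (?, ?)", (2, "N" * 51)),
    ]
    con.close()
    return results


if __name__ == "__main__":
    print(domain_demo())
```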
Redundancy Integrity
Violations of 2nd and 3rd Normal Forms:
• Do not allow end users to maintain tables directly through SQL DML statements
• Provide front-ends with proper SQL DML statements ensuring integrity
- Update all rows concerned at the same time
- On insert, copy existing redundant information
• Can use triggers to maintain integrity for update operations
Notes:
Redundancy integrity deals with the redundant storage of information in tables.
There are three major causes for the redundant storage of data:
• Violations of the Second Normal Form or Third Normal Form lead to the redundant
storage of data in the same table. Nonkey columns are solely dependent on columns
that are not primary key columns (Third Normal Form) or only on some of the primary
key columns (Second Normal Form). To ensure consistency, the dependent columns
must have the same values for all rows having the same values for the columns on
which the dependency exists.
For this type of redundancy, you can maintain integrity as follows:
- Do not allow end users to maintain the tables concerned directly through SQL Data
Manipulation Language (DML) statements (INSERT, UPDATE, or DELETE) via
dynamic SQL.
- Instead, provide front-ends with attractive user interfaces for the business processes
concerned. In the front-ends, use the proper program logic and SQL statements to
ensure the consistency of the redundant data.
For update operations, if redundant information is changed, all rows having the same
values for the columns on which the functional dependency exists must be changed at
the same time.
If new rows are inserted, copy the redundant information from existing rows already
containing the information rather than having the end user enter the information
again. The end user must only provide the information the first time around, i.e., when
the information does not yet exist.
Even though this does not make the information inconsistent, you may want to
prevent the deletion of the last row for a value of the columns on which the functional
dependency exists. However, you only need to do this if the redundant information is
still needed.
- If your database management system supports triggers, you may be able to use
triggers to ensure consistency of the redundant information for update operations.
The next visual illustrates what such a trigger must look like.
• Multiple copies of the same data are a second cause for redundant information. For
performance reasons, you may have decided to:
- repeat columns in other tables
- provide multiple copies of entire tables
If the information in the various tables must be consistent at all times, you can use
triggers to enforce the consistency.
Frequently, if you have provided multiple copies of entire tables, one of the tables is the
master table and must be up-to-date at all times. The copies are only used for reference
purposes and need not be up-to-date at all times. In this case, you should disallow
inserts, updates, or deletes for the copies. In addition, you may want to provide new
versions (refreshes) of the copies periodically or from time to time.
• Redundancy can also be caused by stored data that can be derived from other stored
data. Data that can be derived from other data is referred to as derivable data.
For our sample airline company, all seats for an aircraft have a row in table SEAT and the
number of seats on the aircraft is the number of rows in the table. Thus, the number of
seats can be derived from the information in table SEAT. To avoid scanning table SEAT
every time you need the number of seats, you may prefer to store the number of seats in
the rows for the aircraft in AIRCRAFT. If the seat arrangement for an aircraft changes
and you forget to update the appropriate row in table AIRCRAFT, the derivable data
becomes wrong.
If your database management system supports triggers, you can use triggers to
maintain the correctness of derivable data. Whenever data affecting the derivable data
are changed, a trigger must reevaluate and store the derivable data. The triggers
achieving this for the number of seats for our sample airline company are illustrated on
page 8-55.
An alternative is not to store derivable data and to derive them every time they are
needed. Which way is better depends on the usage profiles of your business processes.
Most of the time, retrieval operations are much more frequent than insert, update, or
delete operations (80-20 rule) and are more performance-critical. Then, triggers are
preferable.
Violation of Normal Forms - Trigger
CREATE TRIGGER . . .
  AFTER UPDATE OF dependent-column-1,
                  dependent-column-2,
                  ...
  ON table-name
  REFERENCING NEW AS N OLD AS O
  FOR EACH ROW MODE DB2SQL
  WHEN (
    N.dependent-column-1 <> O.dependent-column-1 OR
    N.dependent-column-2 <> O.dependent-column-2 OR
    ...
  )
  BEGIN ATOMIC
    UPDATE table-name
       SET dependent-column-1 = N.dependent-column-1,
           dependent-column-2 = N.dependent-column-2,
           ...
     WHERE reference-column = N.reference-column AND
           primary-key <> N.primary-key ;
  END
Notes:
As we discussed, violations of the Second Normal Form or Third Normal Form lead to the
redundant storage of data in the same table. Nonkey columns are solely dependent on
columns that are not primary key columns (Third Normal Form) or only on some of the
primary key columns (Second Normal Form). To ensure consistency, the dependent
columns must have the same values for all rows having the same values for the columns
on which the dependency exists.
The visual illustrates how a trigger maintaining the integrity of the redundant information for
update operations should look. In the visual the dependent columns are called
dependent-column-1, dependent-column-2, and so on.
Furthermore, to simplify matters, it is assumed that the columns are dependent on a single
column and that the primary key for the table is not composite; otherwise, additional AND
operators would be needed in the WHERE clause. The column the dependent columns are
functionally dependent on is called reference-column on the visual.
The trigger is activated on update requests for the table violating the Normal Form. It is
executed for each row updated if any of the dependent columns, i.e., the columns
containing the redundant information, has been changed (UPDATE OF ...). However, the
triggered actions are only performed if the value of at least one of the dependent columns
has changed.
The triggered action changes the same table as the table being updated. It changes the
values of the dependent columns of rows other than the row being updated (primary-key <>
N.primary-key) to the new values for the updated row (SET ... dependent-column-n =
N.dependent-column-n). It only changes the columns of the rows having the same value,
as the row being updated, for the column the dependent columns are functionally
dependent on (reference-column = N.reference-column).
You can easily see the importance of the REFERENCING clause in this case because we
need to refer to three different states for a column: the state before the row was updated,
the state after the row has been updated, and the column for the rows being updated by the
triggered action.
Because the triggered action updates the same columns for the same table, the trigger is
invoked recursively. Thus, without the proper precautions, looping could occur. The search
condition of the WHEN clause prevents an endless recursion because all dependent
columns will have the same old and new value after some iterations.
Depending on how the iterations are performed by your database management system,
you may experience a serious performance degradation when using the trigger!
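The trigger pattern can be tried out on a small scale. The following sketch uses Python's sqlite3 module; table DEMO and its three columns are hypothetical stand-ins for table-name, primary-key, reference-column, and dependent-column-1. SQLite's trigger syntax differs from the DB2 syntax on the visual (NEW and OLD are predefined, and there is no MODE DB2SQL). By default, SQLite does not fire triggers recursively, so the recursion discussed above does not arise in this sketch.

```python
import sqlite3

# Table DEMO and its columns are hypothetical; they mirror the placeholders
# used on the visual.
DDL = """
CREATE TABLE DEMO (
    Primary_Key      INTEGER PRIMARY KEY,
    Reference_Column TEXT NOT NULL,
    Dependent_Column TEXT NOT NULL
);
CREATE TRIGGER SYNC_DEPENDENT
AFTER UPDATE OF Dependent_Column ON DEMO
FOR EACH ROW
WHEN NEW.Dependent_Column <> OLD.Dependent_Column
BEGIN
    -- Propagate the new value to all other rows with the same reference value.
    UPDATE DEMO
       SET Dependent_Column = NEW.Dependent_Column
     WHERE Reference_Column = NEW.Reference_Column
       AND Primary_Key <> NEW.Primary_Key;
END;
"""


def redundancy_demo():
    con = sqlite3.connect(":memory:")
    con.executescript(DDL)
    con.executemany(
        "INSERT INTO DEMO VALUES (?, ?, ?)",
        [(1, "R1", "old"), (2, "R1", "old"), (3, "R2", "other")],
    )
    # Only row 1 is updated explicitly; the trigger keeps row 2 consistent.
    con.execute("UPDATE DEMO SET Dependent_Column = 'new' WHERE Primary_Key = 1")
    rows = con.execute("SELECT * FROM DEMO ORDER BY Primary_Key").fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    print(redundancy_demo())
```

Row 3 is left untouched because it has a different value in the reference column.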
Derivable Data - Sample Triggers
In table AIRCRAFT, maintain number of
seats on aircraft (Number_of_Seats)
Notes:
The example on the visual illustrates how the integrity of stored derivable data can be
maintained by means of triggers.
For our sample airline company, all seats for an aircraft have a row in table SEAT and the
number of seats on the aircraft is the number of rows in the table. Thus, the number of
seats can be derived from the information in table SEAT. To avoid scanning table SEAT
each time the number of seats is needed, the number of seats for an aircraft is also kept in
table AIRCRAFT. The appropriate column is Number_of_Seats and must be maintained as
seats are added or deleted for an aircraft.
The first trigger (ADDSEAT) is activated each time a seat is added to table SEAT. For each
seat added, it increases the number of seats in the row for the aircraft to which the seat
belongs. This requires that column Number_of_Seats was initialized to zero when the row
for the aircraft was inserted into table AIRCRAFT (default values).
Note that the row for an aircraft must exist before seats can be added to the aircraft. Also
note that each row in table SEAT contains the serial number for the aircraft to which the
seat belongs.
The second trigger (DELSEAT) is activated each time a seat is deleted from table SEAT.
For each seat deleted for an aircraft, it decreases the number of seats in the row for the
aircraft in table AIRCRAFT.
For both triggers, the REFERENCING clause is needed to be able to refer to the aircraft
number for the seat added or deleted, respectively.
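The two triggers described above might be sketched as follows. The sketch uses Python's sqlite3 module, so the trigger syntax is SQLite's rather than DB2's: NEW and OLD are predefined and take the place of the REFERENCING clause. The trigger names ADDSEAT and DELSEAT and column Number_of_Seats come from the text; columns Aircraft_Number and Seat_Number are assumptions.

```python
import sqlite3

DDL = """
CREATE TABLE AIRCRAFT (
    Aircraft_Number TEXT PRIMARY KEY,
    Number_of_Seats INTEGER NOT NULL DEFAULT 0   -- must start at zero
);
CREATE TABLE SEAT (
    Aircraft_Number TEXT NOT NULL REFERENCES AIRCRAFT (Aircraft_Number),
    Seat_Number     TEXT NOT NULL,
    PRIMARY KEY (Aircraft_Number, Seat_Number)
);
CREATE TRIGGER ADDSEAT
AFTER INSERT ON SEAT
FOR EACH ROW
BEGIN
    UPDATE AIRCRAFT
       SET Number_of_Seats = Number_of_Seats + 1
     WHERE Aircraft_Number = NEW.Aircraft_Number;
END;
CREATE TRIGGER DELSEAT
AFTER DELETE ON SEAT
FOR EACH ROW
BEGIN
    UPDATE AIRCRAFT
       SET Number_of_Seats = Number_of_Seats - 1
     WHERE Aircraft_Number = OLD.Aircraft_Number;
END;
"""


def seat_count_demo():
    """Return the stored seat count after three inserts and after one delete."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")
    con.executescript(DDL)
    con.execute("INSERT INTO AIRCRAFT (Aircraft_Number) VALUES ('AC1')")
    con.executemany(
        "INSERT INTO SEAT VALUES ('AC1', ?)", [("1A",), ("1B",), ("1C",)]
    )
    count_sql = "SELECT Number_of_Seats FROM AIRCRAFT WHERE Aircraft_Number = 'AC1'"
    after_insert = con.execute(count_sql).fetchone()[0]
    con.execute("DELETE FROM SEAT WHERE Seat_Number = '1C'")
    after_delete = con.execute(count_sql).fetchone()[0]
    con.close()
    return after_insert, after_delete


if __name__ == "__main__":
    print(seat_count_demo())
```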
Constraint Integrity
• Do not allow end users to maintain tables concerned by SQL DML statements
• Provide proper front-ends to end users ensuring that business constraints are observed
Notes:
The data in the tables are also not correct if they violate business constraints (business
rules) for the application domain. This may be a rule as simple as that an employee cannot
be a pilot and a mechanic at the same time. It may also be a more complex rule such as
that a mechanic can only be assigned to the maintenance of an aircraft if he/she has been
trained for the appropriate aircraft model. Constraint integrity requires that the business
constraints for the application domain are observed.
In Unit 3 - Problem Statement, business constraints were discussed as part of the problem
statement for the application domain and the information to be provided for them was
listed. In Unit 4 - Entity-Relationship Model, it was described how business constraints are
represented in the entity-relationship model.
Some business constraints are expressed by basic modeling constructs in the
entity-relationship model. For example, the controlling property is really a business
constraint. In some cases, the controlling property translates into delete rule CASCADE for
a referential constraint. However, for other cases, it does not. It does not if specified for an
m:m relationship type as we have seen during our discussions about referential integrity. It
then has to be handled in the same manner as other business constraints.
Of course, the business constraints must be translated into constraints for the tables of the
application domain. During the discussions about referential constraints, we saw that the
controlling property could be implemented by means of triggers in some cases. Triggers,
possibly in conjunction with user defined functions, are indeed in many cases the means
for implementing business constraints within the database management system.
If your database management system does not support triggers or you do not want to use
them, you can enforce business constraints by:
• Not allowing end users to maintain the tables concerned by directly using INSERT,
UPDATE, or DELETE statements.
• Providing proper front-ends to the end users that ensure that the business constraints
are observed.
Sometimes, other functions of the database management system, such as unique indexes,
may do the trick. A unique index ensures that the set of columns for which it is defined
contains every value only once. Indexes will be discussed in a later unit.
A unique index would solve the business constraint we modeled in Unit 4 -
Entity-Relationship Model that, for a flight, each pilot function (CAPTAIN or COPILOT) must
only be assigned once. The business constraint translated into the requirement that the
combined values for attributes Flight Number, From, To, Flight Locator, and Pilot Function
of entity type PILOT ASSIGNMENT must be unique. The resulting implementation is a
unique index for the appropriate columns in table PILOT_ASSIGNMENT.
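A minimal sketch of such a unique index follows, using Python's sqlite3 module rather than DB2. The index name and the column names are assumptions: Flight_From and Flight_To stand for attributes From and To (which are reserved words in SQL), and Employee_Number identifies the pilot.

```python
import sqlite3

DDL = """
CREATE TABLE PILOT_ASSIGNMENT (
    Flight_Number   TEXT NOT NULL,
    Flight_From     TEXT NOT NULL,
    Flight_To       TEXT NOT NULL,
    Flight_Locator  TEXT NOT NULL,
    Pilot_Function  TEXT NOT NULL
        CHECK (Pilot_Function IN ('CAPTAIN', 'COPILOT')),
    Employee_Number INTEGER NOT NULL
);
-- Each pilot function may be assigned only once per flight.
CREATE UNIQUE INDEX PILOT_FUNCTION_ONCE
    ON PILOT_ASSIGNMENT
       (Flight_Number, Flight_From, Flight_To, Flight_Locator, Pilot_Function);
"""


def assign(con, pilot_function, employee_number):
    """Try to assign a pilot to one fixed sample flight; return True on success."""
    try:
        con.execute(
            "INSERT INTO PILOT_ASSIGNMENT VALUES ('CA100', 'FRA', 'JFK', 'L1', ?, ?)",
            (pilot_function, employee_number),
        )
        return True
    except sqlite3.IntegrityError:
        return False


def pilot_demo():
    con = sqlite3.connect(":memory:")
    con.executescript(DDL)
    results = [
        assign(con, "CAPTAIN", 1),
        assign(con, "COPILOT", 2),
        assign(con, "CAPTAIN", 3),   # second captain for the same flight
    ]
    con.close()
    return results


if __name__ == "__main__":
    print(pilot_demo())
```

The third assignment is rejected by the unique index because the flight already has a captain.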
Constraint Integrity - Example 1
(Visual: business constraint {4} between entity types AIRCRAFT and MAINTENANCE RECORD
and, correspondingly, between tables AIRCRAFT and MAINTENANCE_RECORD.
{4}: New maintenance record only for existing aircraft.)
Notes:
As we have seen in Unit 4 - Entity-Relationship Model, maintenance records for Come
Aboard include the serial number of the aircraft the maintenance was performed for. The
business constraints for Come Aboard state that maintenance records must be kept even if
the aircraft is no longer owned by CAB. Even though the record for the aircraft no longer
exists, the maintenance records must still contain the serial number for the aircraft they
were established for. When a maintenance record is established, a record for the aircraft
must exist.
Because the aircraft number of maintenance records may point to aircraft no longer owned
by CAB, we could not model the interrelationship as a relationship type. Instead, we
introduced a business constraint between entity types AIRCRAFT and MAINTENANCE
RECORD: When an instance is added to entity type MAINTENANCE RECORD, entity type
AIRCRAFT must contain an instance for the aircraft the maintenance record is established
for.
As the entity types are converted into tuple types and into tables, the constraint must be
translated into an equivalent constraint for the tables: When a row is added to table
MAINTENANCE_RECORD, table AIRCRAFT must contain a row for the aircraft the
maintenance record is established for.
Because column Aircraft_Number of table MAINTENANCE_RECORD may contain aircraft
numbers not contained in table AIRCRAFT, Aircraft_Number is not a foreign key of table
MAINTENANCE_RECORD. Therefore, we cannot use referential constraints to ensure that
the aircraft for new maintenance records exist.
However, we can use a trigger to ensure the existence of the aircraft for new maintenance
records as illustrated by the bottom portion of the visual. Before a row is inserted into table
MAINTENANCE_RECORD, the WHEN clause of the trigger checks if the aircraft number
for the row exists in table AIRCRAFT. If it does not exist, a nonzero SQL state is raised
causing the INSERT statement to terminate.
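Since the trigger itself is not reproduced here, the following sketch shows how such a trigger might look. It uses Python's sqlite3 module, where RAISE(ABORT, ...) plays the role of signaling a nonzero SQL state; the DB2 syntax on the visual differs. The trigger name and column Record_Number are assumptions.

```python
import sqlite3

DDL = """
CREATE TABLE AIRCRAFT (
    Aircraft_Number TEXT PRIMARY KEY
);
CREATE TABLE MAINTENANCE_RECORD (
    Record_Number   INTEGER PRIMARY KEY,   -- assumed column
    Aircraft_Number TEXT NOT NULL          -- deliberately not a foreign key
);
CREATE TRIGGER NEW_MAINTENANCE_RECORD
BEFORE INSERT ON MAINTENANCE_RECORD
FOR EACH ROW
WHEN NOT EXISTS (SELECT 1
                   FROM AIRCRAFT
                  WHERE Aircraft_Number = NEW.Aircraft_Number)
BEGIN
    SELECT RAISE(ABORT, 'aircraft does not exist');
END;
"""


def add_record(con, record_number, aircraft_number):
    """Return True if the maintenance record could be inserted."""
    try:
        con.execute(
            "INSERT INTO MAINTENANCE_RECORD VALUES (?, ?)",
            (record_number, aircraft_number),
        )
        return True
    except sqlite3.DatabaseError:
        return False


def maintenance_demo():
    con = sqlite3.connect(":memory:")
    con.executescript(DDL)
    con.execute("INSERT INTO AIRCRAFT VALUES ('AC1')")
    first = add_record(con, 1, "AC1")          # aircraft exists
    con.execute("DELETE FROM AIRCRAFT")        # aircraft is sold
    kept = con.execute("SELECT COUNT(*) FROM MAINTENANCE_RECORD").fetchone()[0]
    second = add_record(con, 2, "AC1")         # aircraft no longer exists
    con.close()
    return first, kept, second


if __name__ == "__main__":
    print(maintenance_demo())
```

The existing maintenance record survives the deletion of the aircraft, but no new record can be added for it.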
Constraint Integrity - Example 2 (1 of 2)
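(Visual: entity-relationship model and referential structure for business constraint {5}.
Relationship types MECHANIC_trained_for_AIRCRAFT MODEL and AIRCRAFT MODEL_for_AIRCRAFT
are input for the constraint on MECHANIC_scheduled_for_AIRCRAFT. In the referential
structure, tables MECHANIC_FOR_AM and AIRCRAFT are input for the constraint on table
MECHANIC_FOR_AC.)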
Notes:
In Unit 4 - Entity-Relationship Model, we also modeled the business constraint that
mechanics must only be scheduled for the maintenance of an aircraft if they have been
trained for the aircraft model. The business constraint applies to relationship type
MECHANIC_scheduled_for_AIRCRAFT. As input, it has relationship types
AIRCRAFT MODEL_for_AIRCRAFT and MECHANIC_trained_for_AIRCRAFT MODEL.
When translated into a constraint for tables, it applies to table MECHANIC_FOR_AC which
contains a row for each mechanic scheduled for the maintenance of an aircraft. Tables
AIRCRAFT and MECHANIC_FOR_AM are input for the constraint. Note that table
AIRCRAFT has, as foreign key, columns Type_Code and Model_Number specifying the
aircraft model for the various aircraft. Therefore, table AIRCRAFT_MODEL is not needed
as input for the business constraint.
Notes:
The visual illustrates the trigger for the business constraint discussed on the previous
visual. The trigger ensures that mechanics are scheduled for the maintenance of an aircraft
only if they have been trained for the appropriate aircraft model. The trigger achieves this
as follows:
• In the WHEN clause, it joins tables MECHANIC_FOR_AM and AIRCRAFT on columns
Type_Code and Model_Number. Each row of the intermediate result contains, for an
aircraft, the employee number of an employee that has been trained for the aircraft
model for the aircraft.
The WHERE clause extracts the rows for the aircraft number and the employee number
of the row to be inserted into table MECHANIC_FOR_AC. The NOT EXISTS predicate
determines if such rows were found. The result is true if rows were not found and false if
rows were found, i.e., the mechanic has been trained for the aircraft model.
• If the WHEN clause is true, i.e., if the mechanic has not been trained for the aircraft
model for the aircraft, the triggered action is performed.
The triggered action signals a nonzero SQL state that causes the INSERT statement for
table MECHANIC_FOR_AC to terminate. If the mechanic has been trained for the
aircraft model, a zero SQL state is signaled and the row can be inserted.
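The logic just described might be sketched as follows. The sketch uses Python's sqlite3 module, so the syntax is SQLite's and not that of the DB2 trigger on the visual. The trigger name and columns Employee_Number and Aircraft_Number are assumptions; Type_Code and Model_Number are the join columns named in the text.

```python
import sqlite3

DDL = """
CREATE TABLE AIRCRAFT (
    Aircraft_Number TEXT PRIMARY KEY,
    Type_Code       TEXT NOT NULL,
    Model_Number    TEXT NOT NULL
);
CREATE TABLE MECHANIC_FOR_AM (
    Employee_Number INTEGER NOT NULL,
    Type_Code       TEXT NOT NULL,
    Model_Number    TEXT NOT NULL,
    PRIMARY KEY (Employee_Number, Type_Code, Model_Number)
);
CREATE TABLE MECHANIC_FOR_AC (
    Employee_Number INTEGER NOT NULL,
    Aircraft_Number TEXT NOT NULL,
    PRIMARY KEY (Employee_Number, Aircraft_Number)
);
CREATE TRIGGER SCHEDULE_TRAINED_ONLY
BEFORE INSERT ON MECHANIC_FOR_AC
FOR EACH ROW
WHEN NOT EXISTS (
    SELECT 1
      FROM MECHANIC_FOR_AM AS M
      JOIN AIRCRAFT AS A
        ON A.Type_Code = M.Type_Code
       AND A.Model_Number = M.Model_Number
     WHERE A.Aircraft_Number = NEW.Aircraft_Number
       AND M.Employee_Number = NEW.Employee_Number)
BEGIN
    SELECT RAISE(ABORT, 'mechanic not trained for aircraft model');
END;
"""


def schedule(con, employee_number, aircraft_number):
    """Return True if the mechanic could be scheduled for the aircraft."""
    try:
        con.execute(
            "INSERT INTO MECHANIC_FOR_AC VALUES (?, ?)",
            (employee_number, aircraft_number),
        )
        return True
    except sqlite3.DatabaseError:
        return False


def schedule_demo():
    con = sqlite3.connect(":memory:")
    con.executescript(DDL)
    con.execute("INSERT INTO AIRCRAFT VALUES ('AC1', 'A340', '200')")
    con.execute("INSERT INTO MECHANIC_FOR_AM VALUES (7, 'A340', '200')")
    con.execute("INSERT INTO MECHANIC_FOR_AM VALUES (8, 'B737', '300')")
    results = (schedule(con, 7, "AC1"), schedule(con, 8, "AC1"))
    con.close()
    return results


if __name__ == "__main__":
    print(schedule_demo())
```

Mechanic 7 has been trained for the aircraft model of AC1 and can be scheduled; mechanic 8 has not and is rejected.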
(Visual: entity-relationship model with entity types AIRCRAFT TYPE, AIRCRAFT MODEL,
AIRCRAFT, ENGINE LOCATION, and ENGINE, and referential structure with tables
AIRCRAFT_TYPE, AIRCRAFT_MODEL, AIRCRAFT, and ENGINE, both showing business
constraint {1}.)
Notes:
Another business constraint for our sample airline company was that the number of
engines mounted on an aircraft must not be larger than the number of engines for the
aircraft type.
The left-hand portion of the visual repeats that, in the entity-relationship model, the
business constraint is modeled as a constraint between entity types AIRCRAFT TYPE and
AIRCRAFT. The aircraft type must be the one for the aircraft whose number of engines is
matched against the number of engines that can be mounted. Therefore, in principle, also
relationship types AIRCRAFT MODEL_for_AIRCRAFT and
AIRCRAFT TYPE_for_AIRCRAFT MODEL are input for the constraint. This could have
been indicated by dashed lines in the entity-relationship model. However, since it is
self-evident and to avoid cluttering the entity-relationship model, it has not been shown in
the entity-relationship model.
When translating the business constraint into a constraint for the tables of the application
domain, it becomes a constraint between tables AIRCRAFT_TYPE and ENGINE. Again, a
similar remark applies: To come to the proper aircraft type for the aircraft on which an
engine is to be mounted, you must navigate, from ENGINE, via the referential constraints to
table AIRCRAFT_TYPE. Since entity type AIRCRAFT MODEL is a dependent entity type of
AIRCRAFT TYPE, you can directly go from table AIRCRAFT to table AIRCRAFT_TYPE.
To illustrate this, we have also shown table AIRCRAFT as input for the constraint in the
referential structure on the right-hand side of the visual.
Notes:
The implementation of the business constraint that the number of engines mounted on an
aircraft must not exceed the number of engines for the aircraft type requires two triggers: a
trigger controlling insert operations and a trigger controlling update operations. The visual
illustrates the trigger for the insert operations:
• The trigger is activated each time a row is added to table ENGINE. It is activated before
the row is inserted and checks if the new engine violates the constraint.
• The appropriate check is made in the WHEN clause.
The first SELECT statement counts the number of engines for the aircraft for the engine
being added.
The second SELECT statement joins tables AIRCRAFT and AIRCRAFT_TYPE on
column Type_Code. The intermediate result contains, for each aircraft, the number of
engines for its aircraft type.
The SELECT statement further extracts the number of engines for the aircraft type of the
aircraft to which the new engine is to be added.
The results of the two SELECT statements are compared with each other.
• The triggered action is only performed if the WHEN condition evaluates to true, i.e., if the
number of engines for the aircraft becomes larger than allowed for the type. In this case,
a nonzero SQL state is signaled which causes the INSERT statement to terminate.
If the WHEN condition evaluates to false or unknown, the triggered action is not
performed and the new row can be added to table ENGINE.
The trigger for update operations looks the same except that the name for the trigger must
be different and the second line must read:
NO CASCADE BEFORE UPDATE ON ENGINE
Because column Aircraft_Number of table ENGINE can also be changed by the delete rule
of the referential constraint between tables AIRCRAFT and ENGINE, you should not
specify UPDATE OF Aircraft_Number ON ENGINE.
A further note of caution: Because of current restrictions, the trigger may not work on all
database management systems.
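The insert trigger might be sketched as follows, using Python's sqlite3 module instead of DB2. The trigger name and columns Engine_Number and Aircraft_Number are assumptions. Because the trigger runs before the row is inserted, the engine being added is not yet counted; the sketch therefore rejects the row if the aircraft already carries as many engines as its type allows. This is one possible formulation and not necessarily the comparison used on the visual.

```python
import sqlite3

DDL = """
CREATE TABLE AIRCRAFT_TYPE (
    Type_Code         TEXT PRIMARY KEY,
    Number_of_Engines INTEGER NOT NULL
);
CREATE TABLE AIRCRAFT (
    Aircraft_Number TEXT PRIMARY KEY,
    Type_Code       TEXT NOT NULL,
    Model_Number    TEXT NOT NULL
);
CREATE TABLE ENGINE (
    Engine_Number   TEXT PRIMARY KEY,
    Aircraft_Number TEXT                  -- null if the engine is not mounted
);
CREATE TRIGGER ENGINE_LIMIT_INSERT
BEFORE INSERT ON ENGINE
FOR EACH ROW
WHEN (SELECT COUNT(*)
        FROM ENGINE
       WHERE Aircraft_Number = NEW.Aircraft_Number)
     >=
     (SELECT T.Number_of_Engines
        FROM AIRCRAFT AS A
        JOIN AIRCRAFT_TYPE AS T ON T.Type_Code = A.Type_Code
       WHERE A.Aircraft_Number = NEW.Aircraft_Number)
BEGIN
    SELECT RAISE(ABORT, 'too many engines for aircraft type');
END;
"""


def mount(con, engine_number, aircraft_number):
    """Return True if the engine row could be inserted."""
    try:
        con.execute(
            "INSERT INTO ENGINE VALUES (?, ?)", (engine_number, aircraft_number)
        )
        return True
    except sqlite3.DatabaseError:
        return False


def engine_demo():
    con = sqlite3.connect(":memory:")
    con.executescript(DDL)
    con.execute("INSERT INTO AIRCRAFT_TYPE VALUES ('A320', 2)")
    con.execute("INSERT INTO AIRCRAFT VALUES ('AC1', 'A320', '200')")
    results = [
        mount(con, "E1", "AC1"),
        mount(con, "E2", "AC1"),
        mount(con, "E3", "AC1"),   # third engine on a two-engine type
        mount(con, "E4", None),    # unmounted engine: condition is unknown
    ]
    con.close()
    return results


if __name__ == "__main__":
    print(engine_demo())
```

For the unmounted engine, the second subquery returns no row, the comparison evaluates to unknown, and the row is accepted.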
Checkpoint
9. Assume that the controlling property has been specified for the
source of an m:m relationship type. How can you ensure that the
row for a source instance is deleted if the row for an affiliated
relationship instance is deleted?
_____________________________________________________
_____________________________________________________
_____________________________________________________
15. Referential constraints must be defined for the parent table. (T/F)
22. How can you ensure that derivable data are always correct?
_____________________________________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________
24. What are the main ingredients for achieving constraint integrity?
_____________________________________________________
_____________________________________________________
_____________________________________________________
Unit Summary
• Referential integrity requires that all foreign key values have matching
parent key values
• The delete rules for referential constraints are NO ACTION, RESTRICT,
SET NULL, and CASCADE
• The update rules for referential constraints supported by most systems are NO
ACTION and RESTRICT
• There exist restrictions for delete-connected tables and referential cycles
• A referential structure provides an overview of the referential constraints for
the application domain or a subset thereof
• Domain integrity requires the correctness of the values of the columns for
the tables of the application domain
• Redundancy integrity requires the consistency of redundant information
• Constraint integrity requires the observance of the business constraints for
the application domain
• For achieving redundancy or constraint integrity, triggers can be
used (if necessary, in conjunction with user defined functions)
Notes:
Unit Objectives
Notes:
Conceptually, the tables established can be implemented without indexes. However, in this
case, accessing a row may mean searching the entire table for the row and may be very
time-consuming and expensive. Indexes present a means for directly accessing specific
rows and are needed to ensure performance.
In this unit, we will describe the basic structure of indexes and demonstrate how they are
used for directly accessing a row. Furthermore, we will talk about various options for (forms
of) indexes such as unique or nonunique indexes.
In addition, we will discuss for which columns, from a database design perspective, you
should establish indexes. We will not talk about the usage of indexes from the
business-process perspective. The requirements of the business processes for indexes
depend on their usage patterns and may change in the course of time. Therefore, indexes
for business processes should be established as and when needed and dropped when
they are no longer needed.
The database management systems generally provide means for analyzing queries to
determine the need for and effectiveness of indexes.
Tuple Types
Tables
Logical Data Structures
Integrity Rules
Notes:
This unit deals with the establishment of indexes for the tables of the application domain.
Therefore, it follows the establishment of the tables. Because the referential integrity
support of most database management systems requires indexes, the establishment of
indexes even follows the establishment of the integrity rules. It is the last step of the
storage view.
Purpose of an Index
(Visual: data pages of table AIRCRAFT_MODEL with the rows in physical order:
A340 200, A300 600, B737 300, A320 200, B737 600, B777 200, B747 400)
Notes:
The main purpose of an index is to improve performance in cases in which, otherwise, the
rows of the table would have to be scanned for locating a row. The visual illustrates this for
table AIRCRAFT_MODEL for our sample airline company called Come Aboard. Without an
index, when searching for an aircraft model, the data pages with the rows of the table must
be retrieved and scanned until the model has been found.
If the row is not contained in the table or multiple rows may exist for the same search
criterion, all rows for the table must be inspected. As you can see, this may require a lot of
pages (blocks) to be read and, as a consequence, a lot of I/O operations and may be very
expensive. The situation can be remedied by an index.
Indexes allow the database management system to directly access individual rows rather
than having to scan the rows of the table.
As we will see on the next visual, indexes logically order the rows of the table according to
the columns to which they apply. Per se, they do not order the rows physically even though
they may be used to ensure that the physical order corresponds to the logical order as
closely as possible.
The logical order of an index allows, without sorting, the rows of the table to be processed
in that logical order rather than in their physical order. Combining the logical order with the
direct-access capability, an index allows you to start logical sequential processing at a
specific row and/or end it with a specific row. In particular, this supports the BETWEEN
predicate for SQL queries.
As already indicated, by using the logical ordering of an index, the database management
system may be able to avoid internal sorting of the rows retrieved. In particular, this may be
the case for SELECT statements using ORDER BY, GROUP BY, or DISTINCT.
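The following sketch illustrates a range query that can benefit from an index. It uses Python's sqlite3 module; the index name AM_KEY is an assumption, and the sample rows are those shown on the visual, inserted in their physical order. Whether an index is actually used is decided by the optimizer of the database management system; the plan printed at the end is SQLite's and its wording depends on the SQLite version.

```python
import sqlite3

DDL = """
CREATE TABLE AIRCRAFT_MODEL (
    Type_Code    TEXT NOT NULL,
    Model_Number TEXT NOT NULL
);
CREATE INDEX AM_KEY ON AIRCRAFT_MODEL (Type_Code, Model_Number);
"""

# Rows in the physical order shown on the visual.
ROWS = [
    ("A340", "200"), ("A300", "600"), ("B737", "300"), ("A320", "200"),
    ("B737", "600"), ("B777", "200"), ("B747", "400"),
]

QUERY = """
SELECT Type_Code, Model_Number
  FROM AIRCRAFT_MODEL
 WHERE Type_Code BETWEEN ? AND ?
 ORDER BY Type_Code, Model_Number
"""


def open_database():
    con = sqlite3.connect(":memory:")
    con.executescript(DDL)
    con.executemany("INSERT INTO AIRCRAFT_MODEL VALUES (?, ?)", ROWS)
    return con


def models_between(low, high):
    """Return the models whose type code lies in [low, high], in key order."""
    con = open_database()
    rows = con.execute(QUERY, (low, high)).fetchall()
    con.close()
    return rows


def query_plan(low, high):
    """Return the textual plan steps SQLite reports for the range query."""
    con = open_database()
    steps = [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + QUERY, (low, high))]
    con.close()
    return steps


if __name__ == "__main__":
    print(models_between("A320", "B737"))
    print(query_plan("A320", "B737"))
```

The rows are returned in the logical order of the index key even though they were stored in a different physical order.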
Structure of Indexes
Index Key = (Type_Code, Model_Number)
Root Page: (B737, 300), (B777, 200), X'FF...FF'
Leaf Pages: (A300, 600), (A320, 200), (A340, 200), (B737, 300), (B737, 600), (B747, 400), (B777, 200)
Data Pages: (A340, 200), (A300, 600), (B737, 300), (A320, 200), (B737, 600), (B777, 200), (B747, 400)
Notes:
An index is based on a key, i.e., an ordered set of columns of a table. It is a multilevel tree
structure logically ordering the rows of a table in accordance with the key for the index. The
order can be ascending or descending depending on what you have requested. You can
determine the order when defining the index.
Assuming that the physical order of the rows may be different from the logical order implied
by the key, the index must be a dense index. This means that all key values must be
reflected by index entries in the lowest index level.
On the bottom of the visual, you see data pages with sample rows for table
AIRCRAFT_MODEL. On the lowest index level, there must be an index entry for each row.
The index entries are generally grouped into index pages. Within an index page, the index
entries are sorted in the requested order in accordance with the key for the index. The key
ranges for the index pages do not overlap.
Each index entry contains a key value and a pointer to the appropriate row(s) as indicated
on the visual. Thus, all rows of the table must be pointed to by index entries (dense
indexes).
The pages of the lowest index level are referred to as leaf pages. In general, the leaf pages
are chained together, forward and backward, in the ordering sequence emphasizing that
the index logically orders the rows of the table. The chaining of the leaf pages is used for
the logical sequential processing of rows.
Since an index may consist of many leaf pages (even more than data pages), it would still
be very inefficient to search the leaf pages for a particular row. Therefore, higher index
levels are introduced, again consisting of index pages referred to as nonleaf pages. The
index entries of the second index level (the one above the leaf pages) order the leaf pages.
Each index entry contains a key value and a pointer to a leaf page. Assuming an ascending
key sequence, the key value must be a key value higher than or equal to the highest key
value of the leaf page and lower than or equal to the lowest key value of the logically next
leaf page. On the visual, the lowest key value of the logically next leaf page is used as DB2
does. This has an advantage when inserting rows as we will see later.
The last index entry on any higher index level has a key value of all bytes hexadecimal FF,
the highest key value possible.
Since the second index level orders pages rather than rows, it will generally contain only a
few pages. If it contains more than one nonleaf page, a third index level is introduced to
order the pages of the second index level, and so on. The tree structure stops with an index
level that has only one index page.
The index page of the highest index level is referred to as root page.
The indexes established this way are balanced trees meaning that the number of index
levels to be traversed from the root page to a row is the same for all rows. There are other
types of indexes possible, but balanced trees have proven to be the best especially if the
distribution of the key values is random and cannot be predicted in advance.
Most indexes have two or three levels.
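As a rough sketch of how the entries of the second index level follow from the leaf pages, the Python fragment below applies the rule described above: each entry carries the lowest key value of the logically next leaf page, and the last entry carries the highest possible key value. The grouping of keys into three leaf pages, the HIGH stand-in for X'FF...FF', and the function name are assumptions made for this illustration; leaf entries are reduced to their key values.

```python
# Stand-in for the all-X'FF' key: it compares higher than every sample key.
HIGH = ("\uffff",)

# Leaf pages in ascending key sequence (page boundaries are assumed).
leaf_pages = [
    [("A300", 600), ("A320", 200), ("A340", 200)],
    [("B737", 300), ("B737", 600), ("B747", 400)],
    [("B777", 200)],
]

def build_nonleaf_entries(pages):
    """Return (key value, leaf page number) entries for the level above the leaf pages."""
    entries = []
    for page_no in range(len(pages)):
        if page_no + 1 < len(pages):
            key = pages[page_no + 1][0]   # lowest key of the logically next leaf page
        else:
            key = HIGH                    # last entry: highest key value possible
        entries.append((key, page_no))
    return entries

root = build_nonleaf_entries(leaf_pages)
print(root)
```

With only three leaf pages, the resulting entries fit into a single page, which is therefore the root page.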
Searching Via an Index
Index Key = (Type_Code, Model_Number)
Searching for: (B737, 600)
Root Page: (B737, 300), (B777, 200), X'FF...FF'
Leaf Pages: (A300, 600), (A320, 200), (A340, 200), (B737, 300), (B737, 600), (B747, 400), (B777, 200)
Data Pages: (A340, 200), (A300, 600), (B737, 300), (A320, 200), (B737, 600), (B777, 200), (B747, 400)
Notes:
Using the index illustrated on this visual and the previous visual, aircraft model Boeing
B737, Model 600, is searched for. The index is an index in ascending order.
First, the root page of the index is searched for the proper index entry. The proper index
entry is the first index entry whose key value is higher than the given key value. This is the
search rule for all index levels above the leaf-page level.
In our example, the proper index entry is the second index entry of the root page, i.e., the
one with key value (B777, 200). The entry points to the second leaf page.
When searching leaf pages, you look for the last index entry whose key value is lower
than or equal to the given key value. If you find an index entry with the given key value,
the desired row exists and is pointed to by the index entry. If the key value of the index
entry is lower, the desired row does not exist.
In our case, the entry found is the second index entry of the second leaf page and has the
key value searched for. Thus, the row exists and, indeed, the index entry points to the row.
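The two search rules can be written down as a small Python sketch. It is a simplified model under stated assumptions, not the algorithm of a specific product: the index has exactly two levels, HIGH stands in for X'FF...FF', the leaf page boundaries are assumed, and the row pointers are simply the positions of the rows in the physical order shown on the visual.

```python
HIGH = ("\uffff",)  # stand-in for the all-X'FF' key value

# Root page: (key value, number of the leaf page pointed to).
root = [(("B737", 300), 0), (("B777", 200), 1), (HIGH, 2)]

# Leaf pages: (key value, row pointer). The pointers are hypothetical row
# positions in the physical order A340, A300, B737 300, A320, B737 600, B777, B747.
leaf_pages = [
    [(("A300", 600), 1), (("A320", 200), 3), (("A340", 200), 0)],
    [(("B737", 300), 2), (("B737", 600), 4), (("B747", 400), 6)],
    [(("B777", 200), 5)],
]

def search(key):
    """Return the row pointer for key, or None if no such row exists."""
    # Rule above the leaf level: first entry whose key value is higher.
    leaf_no = next(ptr for k, ptr in root if k > key)
    # Rule on the leaf level: last entry whose key value is lower or equal.
    candidates = [(k, ptr) for k, ptr in leaf_pages[leaf_no] if k <= key]
    if candidates and candidates[-1][0] == key:
        return candidates[-1][1]
    return None   # entry found is lower (or page has no such entry): row does not exist

print(search(("B737", 600)))   # second entry of the second leaf page
print(search(("B757", 300)))   # not in the table
```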
Consider a table with 20,000 rows of 100 characters each and assume that the key for the
index is 8 characters long. Furthermore, assume that the size of the data pages and of the
index pages is 4K.
Under these assumptions, the rows occupy approximately 500 pages and the index
consists of only two index levels. Reading a row using the index requires three pages to be
accessed. In contrast, scanning the rows would require on the average 250 pages to be
accessed assuming the system stops scanning when it has found the row (which is
generally not the case). This illustrates very clearly the advantage of having an index.
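The figures in this example can be reproduced with a short calculation. The sketch below ignores page headers and free space, and it assumes a 4-byte row pointer per index entry; the notes do not state the pointer length, so that value is an assumption.

```python
import math

PAGE_SIZE = 4096      # 4K data and index pages
ROWS = 20_000
ROW_LENGTH = 100      # characters per row
KEY_LENGTH = 8        # characters per index key
POINTER_LENGTH = 4    # assumed size of a row or page pointer

rows_per_page = PAGE_SIZE // ROW_LENGTH
data_pages = math.ceil(ROWS / rows_per_page)

entries_per_index_page = PAGE_SIZE // (KEY_LENGTH + POINTER_LENGTH)
leaf_pages = math.ceil(ROWS / entries_per_index_page)

# Add index levels until a level consists of a single page (the root page).
levels = 1
pages_on_level = leaf_pages
while pages_on_level > 1:
    pages_on_level = math.ceil(pages_on_level / entries_per_index_page)
    levels += 1

pages_via_index = levels + 1        # one page per index level plus the data page
avg_scan_pages = data_pages // 2    # average when the scan stops at the first hit

print(data_pages, leaf_pages, levels, pages_via_index, avg_scan_pages)
```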
Unique and Nonunique Indexes
Unique-Where-Not-NULL Index
Every value of the key excluding the NULL value may occur at most once;
NULL value may occur multiple times
For foreign keys for imbedded 1:1 relationship types
Nonunique Index
Every value of the key may occur multiple times
For foreign keys for merged or imbedded 1:m relationship types
For individual columns of composite keys
Notes:
Unique indexes come in two flavors:
• Plain unique indexes consider the NULL value as a regular value and require/enforce
that each key value occurs in at most one row.
For an index key consisting of one column, this means that the NULL value may occur
in at most one row. Thus, uniqueness is enforced for all values including the NULL
value.
For an index key consisting of two columns, two key values (a,b) and (c,d) are
considered equal if a=c and b=d. This includes the NULL value: (a,NULL) and (c,NULL)
are considered different if a and c are different and identical if they are the same.
In particular, (plain) unique indexes can be used for the following two purposes:
- They can be used to guarantee the uniqueness of the values of the primary key for a
table.
- They can be used to guarantee the uniqueness of the values of a foreign key
resulting from the merging of the tuple type for a 1:1 relationship type.
Because the relationship type is a 1:1 relationship type, each defining attribute could
be the relationship key. Therefore, the corresponding (composite) attributes of the
related tuple type can assume each value only once. This also applies to the attribute that
has not been made the primary key of the tuple type.
For tuple types to be merged, they must, at all times, have the same primary key
values and, thus, number of tuples. As a consequence, the foreign key resulting from
the defining attribute that has not been made the primary key can assume each value
only once. Therefore, a (plain) unique index can be used to ensure the uniqueness of
the foreign key values.
• Unique-where-not-NULL indexes treat each occurrence of the NULL value as different
and require/enforce that each key value occurs in at most one row.
For an index consisting of one column, this means that each value except the NULL
value must occur in at most one row. Thus, uniqueness is only enforced for those
values that are not NULL.
For a key consisting of two columns, each occurrence of (a,NULL) is considered
different. Thus, uniqueness is only enforced for those key values for which none of their
components is the NULL value.
Unique-where-not-NULL indexes can be used to guarantee the uniqueness of the
values (that are not NULL) of foreign keys resulting from the imbedding of tuple types
for 1:1 relationship types.
The rationale is similar to the one for merged tuple types of 1:1 relationship types.
However, the imbedded tuple type may have, at any point in time, fewer tuples than the
target tuple type. As a consequence, for some of the rows, the foreign key may not have
a value requiring a unique-where-not-NULL index rather than a plain unique index.
Nonunique indexes allow any value of the key to occur in any number of rows.
In particular, nonunique indexes can be used for:
• The foreign keys resulting from merged or imbedded 1:m relationship types.
• Individual columns of composite keys. Even though the values of the composite key
may have to be unique, the values of the individual columns need not be unique.
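The difference between the two flavors of unique indexes can be made concrete with a small Python sketch. It only models the rules stated above: key values are tuples, NULL is represented by None, and the two function names are assumptions made for this illustration.

```python
def violates_plain_unique(keys):
    """Plain unique index: NULL counts as a regular value; no key value may repeat."""
    return len(set(keys)) != len(keys)

def violates_unique_where_not_null(keys):
    """Unique-where-not-NULL index: only key values without a NULL component must be unique."""
    complete = [key for key in keys if None not in key]
    return len(set(complete)) != len(complete)

# (a, NULL) occurring twice: rejected by a plain unique index,
# accepted by a unique-where-not-NULL index.
two_nulls = [("a", None), ("a", None)]
print(violates_plain_unique(two_nulls), violates_unique_where_not_null(two_nulls))

# The same complete key value twice: rejected by both.
duplicates = [("a", 1), ("a", 1)]
print(violates_plain_unique(duplicates), violates_unique_where_not_null(duplicates))
```

A nonunique index would correspond to performing no check at all.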
Clustering Index
Notes:
Indexes marked as clustering indexes are used by the database management system to
control where new rows are inserted. They are used to determine the insertion point for the
new rows.
By using an index to determine the insertion point, the database management system
attempts to make the physical sequence of the data pages equal to the logical sequence
implied by the index. However, a new row is inserted at the point determined via the index
only if the located page contains enough free space for the row. Therefore, when defining
the space object for the table, you should request that free space is left in the data pages
for later insertions when the rows are loaded.
As mentioned before, the database management system attempts to insert the new row in
the page determined by means of the clustering index. If the data page does not contain
enough space for the row, the row is not inserted into the page. The data page is not split
either. Instead, the database management system inserts the new row into the closest
page with sufficient free space in the neighborhood of the ideal insertion point. If none of
the pages in the neighborhood has enough free space, the row is inserted somewhere
else.
Since the logical order of the index determines the physical order of the data pages and the
rows can only be ordered according to one criterion, there can only be one clustering index
for a table.
A clustering index is advantageous if you have business processes processing the rows in
the logical order imposed by the index. Since the data pages are pretty much in the logical
order of the key, the database management system need not jump constantly from one
place on the storage volume to another. It can efficiently use techniques such as sequential
prefetch to read a set of physically adjacent data pages with a single I/O operation.
Clustering indexes are only supported by a few database management systems. They are
supported, for example, by DB2 Universal Database for z/OS.
Clustering Index - First Insertion (1 of 2)
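Index Key = (Type_Code, Model_Number)
Inserting: (B757, 300)
Root Page: (B747, 400), (B777, 200), X'FF...FF'
Leaf Pages: (A300, 600), (A320, 200), (A340, 200), (B747, 400), (B777, 200)
Data Pages: (A300, 600), (A320, 200), (A340, 200), (B747, 400), (B777, 200)
Insertion Point: third data page, following (B747, 400)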
Notes:
The visual illustrates how the insertion point for a new row of table AIRCRAFT_MODEL is
determined using a clustering index. The aircraft model to be inserted has type code B757
and model number 300.
The index is searched in precisely the same manner as for the retrieval of rows: The
second index entry of the root page is the first index entry with a higher key value than the
new row. Therefore, it is the one that points to the proper leaf page, the second leaf page
on the visual.
In the leaf page, you look for the last index entry with a key value lower than or equal to the
key value of the new row. Since the leaf page contains a single index entry, it is the entry
found and its key is lower. As a consequence, the new row will be inserted into the data
page pointed to by the index entry provided the data page has sufficient free space.
For the example, the new row is to be inserted into the third data page. It contains enough
free space. The new row will follow the row with key (B747, 400). This is indeed the proper
place for maintaining the data pages in the logical order implied by the index. The next
visual shows the data pages and the index after the insertion of the row.
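Clustering Index - First Insertion (2 of 2)
Leaf Pages: (A300, 600), (A320, 200), (A340, 200), (B747, 400), (B757, 300), (B777, 200)
Data Pages: (A300, 600), (A320, 200), (A340, 200), (B747, 400), (B757, 300), (B777, 200)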
Notes:
Since the third data page has enough free space for the new row, the row with key
(B757, 300) is inserted into this data page. An appropriate index entry is added to the
second leaf page.
Clustering Index - Second Insertion (1 of 2)
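Index Key = (Type_Code, Model_Number)
Inserting: (B767, 200)
Root Page: (B747, 400), (B777, 200), X'FF...FF'
Leaf Pages: (A300, 600), (A320, 200), (A340, 200), (B747, 400), (B757, 300), (B777, 200)
Data Pages: (A300, 600), (A320, 200), (A340, 200), (B747, 400), (B757, 300), (B777, 200)
Insertion Point: fourth data page, following (B777, 200)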
Notes:
The visual illustrates the locating of the insertion point for a second insertion: A new row
with key (B767, 200) is to be inserted.
This time, the second index entry of the second leaf page is the last index entry whose key
value is lower than or equal to the key value of the row to be inserted. The data page
pointed to by the index entry is the third data page, the same as for the previous insert
request.
Since the third data page does not have any free space, the new row cannot be inserted
into the data page. The system looks for the closest data page with enough free space. The
second data page and the fourth data page have enough free space and are equally close.
Since the index is in ascending order, later pages in logical order are preferred and the
fourth data page is chosen. The new row will be inserted into the free space of the fourth
data page, i.e., following the row with key (B777, 200).
The next visual illustrates the insertion of the row.
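Clustering Index - Second Insertion (2 of 2)
Leaf Pages: (A300, 600), (A320, 200), (A340, 200), (B747, 400), (B757, 300), (B767, 200), (B777, 200)
Data Pages: (A300, 600), (A320, 200), (A340, 200), (B747, 400), (B757, 300), (B777, 200), (B767, 200)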
Notes:
Since the third data page does not have enough free space for the new row, the row with
key (B767, 200) is inserted into the fourth data page. An appropriate index entry is added
to the second leaf page.
As you can see, the physical order of the rows no longer coincides with the logical order.
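The choice of the data page in the two insertions can be summarized in a short Python sketch. It is a simplified model and not the placement algorithm of any particular system: the page capacity of two rows, the size of the neighborhood, the distribution of the sample rows over four data pages, and the function name choose_page are all assumptions made for this illustration. The ideal page is passed in directly instead of being located via the index.

```python
CAPACITY = 2       # assumed number of rows that fit into a data page
NEIGHBORHOOD = 2   # assumed distance (in pages) that counts as "close"

def choose_page(pages, target):
    """Return the number of the data page that receives a row whose ideal page is target."""
    if len(pages[target]) < CAPACITY:
        return target                      # ideal page has enough free space
    for distance in range(1, NEIGHBORHOOD + 1):
        # Equally close pages: the later page in logical order is tried first,
        # as described for an ascending index.
        for candidate in (target + distance, target - distance):
            if 0 <= candidate < len(pages) and len(pages[candidate]) < CAPACITY:
                return candidate
    for candidate, rows in enumerate(pages):
        if len(rows) < CAPACITY:
            return candidate               # no room nearby: somewhere else
    pages.append([])                       # no free space at all: use a new page
    return len(pages) - 1

# Assumed layout of the four data pages (page numbers start at 0).
pages = [
    [("A300", 600), ("A320", 200)],
    [("A340", 200)],
    [("B747", 400)],
    [("B777", 200)],
]

first = choose_page(pages, 2)              # (B757, 300): ideal page is the third one
pages[first].append(("B757", 300))
second = choose_page(pages, 2)             # (B767, 200): third page is now full
pages[second].append(("B767", 200))
print(first, second, pages)
```

After the second insertion, the fourth data page holds (B777, 200) followed by (B767, 200), so the physical order deviates from the logical order exactly as on the visual.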
Partitioning Index
Notes:
Partitioning indexes are a special form of clustering indexes. Thus, they have all the
features of clustering indexes. In addition, you must define key ranges for the key values of
the index. In turn, the key ranges for the index subdivide the rows for the table into
corresponding key ranges referred to as partitions.
Assuming that a partitioning index has been defined on column Employee_Number of table
EMPLOYEE for Come Aboard, the employees are partitioned in accordance with the key
ranges for the index. The example on the visual illustrates a partitioning into three
partitions: The first partition contains all rows for employees with an employee number
smaller than or equal to 1350000; the second partition all rows for employees with
employee numbers larger than 1350000, but not larger than 2999999; and the third
partition the rows for all remaining employees.
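The assignment of a row to a partition via the key ranges can be sketched as follows. The limit values are the ones from the example; the names LIMITS and partition_for are assumptions made for this illustration, and real systems define the key ranges as part of the index or table definition rather than in application code.

```python
from bisect import bisect_left

# Highest employee number of each partition except the last one.
LIMITS = [1350000, 2999999]

def partition_for(employee_number):
    """Return the partition number (1, 2, or 3) for an employee number."""
    # bisect_left counts the limits that are smaller than the employee number,
    # so a number equal to a limit still falls into that limit's partition.
    return bisect_left(LIMITS, employee_number) + 1

for number in (1, 1350000, 1350001, 2999999, 3000000):
    print(number, partition_for(number))
```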
The partitioning can, however, only have an effect if something more is connected to it.
Generally, the following functions come with the subdivision into partitions:
• The rows of the partitions are placed into different physical spaces which may reside on
different cylinders or even different volumes. The rows for a partition are always placed
into the physical space for that partition and never into the physical space for another
partition.
• Utilities such as a load utility, a backup utility, a recovery utility, or a
reorganization utility can process individual partitions, can process the partitions
separately or jointly, and can process the partitions in parallel.
• SQL operations can process the partitions in parallel using multiple tasks or processes
of the operating system. This can considerably reduce the run time for SQL operations,
especially, queries.
These points imply that partitioning indexes are worth considering if you have large tables.
However, it is a prerequisite that you can reasonably subdivide the rows of the table into
partitions.
Since clustering indexes are only supported by a few database management systems,
partitioning indexes are also only supported by a few database management systems. For
example, they are supported by DB2 Universal Database for z/OS. Other systems use
other means to partition the rows of tables.
Use of Indexes
Need not have a second index if other index exists whose key contains:
Key columns for second index as leading key columns
Key columns for second index in same sequence
If the table does not change after loading, you can create any indexes
Notes:
From a database design perspective, the following rules apply for the use of indexes:
• For each primary key, define a (plain) unique index independent of the number of data
pages the table occupies. By allowing each primary key value to occur only once, the
index ensures the unique identification of the rows. Without the index, each primary key
value could occur more than once.
If you are using the referential integrity support of your system for a referential constraint,
most systems require a unique index for the primary key. The index (and only the index)
is used to check if the parent row exists for a row inserted into the dependent table.
• For each foreign key, define an index if the rows of the table occupy more than three data
pages.
When using the referential integrity support of your database management system, the
system will generally not force you to have an index for the foreign key. If an index exists
for the foreign key, it is used to ensure the referential integrity when deleting rows of the
parent table.
For delete rules NO ACTION or RESTRICT, the index (and only the index) is used to
check if the dependent table has dependent rows. For delete rule SET NULL, it is used
to determine the dependent rows whose foreign key values must be reset to NULL. For
delete rule CASCADE, it is used to determine the rows of the dependent table that must
be deleted.
The missing index for the foreign key of a referential constraint is often the reason for
complaints about the poor performance of the referential integrity support.
• If you have an index for a composite key and need an index on leading columns of that
key, you do not need to define an additional index, provided the columns are in the
required order. The system is generally able to use the index for the composite key
because that index is, in particular, ordered in accordance with the leading key columns.
• The maintenance of indexes due to insert, update, or delete operations cannot be
neglected. Therefore, you should introduce additional indexes only if they are really
required by the business processes and not as a precautionary measure. Of course, you
can add any indexes, as long as you do not mind the space they occupy, if the rows of
the table do not change after the loading of the table.
• Good candidates for indexes are columns that are used for Join operations, the SQL
ORDER BY, GROUP BY, or DISTINCT clauses/keywords, or for the direct access of
rows.
Most systems have tools that allow you to determine the effectiveness of an index.
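The rule about leading key columns amounts to a simple prefix test. The Python sketch below illustrates that test only; the function name covered_by is an assumption, and whether a given system actually uses the composite index is decided by that system.

```python
def covered_by(wanted, existing):
    """True if the wanted key columns are the leading columns of the existing key, in the same sequence."""
    return tuple(existing[:len(wanted)]) == tuple(wanted)

composite_key = ["Type_Code", "Model_Number"]

print(covered_by(["Type_Code"], composite_key))                  # leading column
print(covered_by(["Model_Number"], composite_key))               # not a leading column
print(covered_by(["Model_Number", "Type_Code"], composite_key))  # wrong sequence
```

Only in the first case is a second index unnecessary.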
No Index for Leading Foreign Key
Employee_Number
MECHANIC
Aircraft_Number
AIRCRAFT
Notes:
For m:m relationship types, you need a table containing columns for the defining attributes
for the relationship type. As you know, the defining attributes together form the relationship
key and, therefore, the primary key of the table. Thus, you should have a unique index
comprising all columns for the defining attributes.
For each of the defining attributes, the key consisting of the columns for the defining
attribute represents a foreign key. One of these keys is the first part of the primary key.
Thus, its columns are the leading columns of the primary key. Therefore, you do not need
an additional index for that key. However, you should have an index for the other foreign key
because the primary (key) index is ordered differently.
The visual illustrates this for table MECHANIC_FOR_AC, the table for m:m relationship
type MECHANIC_scheduled_for_AIRCRAFT. The table consists of the primary key
columns for tables MECHANIC and AIRCRAFT, i.e., columns Employee_Number and
Aircraft_Number. Together, the columns form the primary key. Let us assume that
Employee_Number is the first column of the primary key and that there is a unique index for
the primary key.
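The two indexes for table MECHANIC_FOR_AC could be defined as in the following sketch. It uses SQLite through Python's sqlite3 module purely for illustration; the syntax and behavior of your database management system may differ. The index names and the INTEGER column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MECHANIC_FOR_AC (
        Employee_Number INTEGER NOT NULL,
        Aircraft_Number INTEGER NOT NULL
    );
    -- Unique index for the primary key; Employee_Number is the leading column,
    -- so the foreign key to MECHANIC needs no index of its own.
    CREATE UNIQUE INDEX MECHANIC_FOR_AC_PK
        ON MECHANIC_FOR_AC (Employee_Number, Aircraft_Number);
    -- Additional nonunique index for the foreign key to AIRCRAFT, because the
    -- primary key index is ordered by Employee_Number first.
    CREATE INDEX MECHANIC_FOR_AC_AC
        ON MECHANIC_FOR_AC (Aircraft_Number);
""")

# List every index with its key columns in key sequence.
index_names = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' ORDER BY name"
).fetchall()
index_columns = {}
for (name,) in index_names:
    info = conn.execute(f"PRAGMA index_info('{name}')").fetchall()
    index_columns[name] = [row[2] for row in info]   # row = (seqno, cid, column name)

print(index_columns)
```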
Indexes - Documentation
Index Name: Name for index. Should be unique for application domain
Key Ranges: For a partitioning index, key ranges for the partitions
Notes:
For each index, you should provide the following information:
• The name of the table for which the index is established.
• You should select a name for each index in agreement with the naming rules for indexes
for your database management system. The name for the index should be unique for the
application domain and must not be the name of a table.
The name is only used to identify the index to the database management system. It is
not needed by end-users or for any other objects being defined. Therefore, you could
omit it and leave it to the database administrator to select a name when he/she defines
the index.
• The ordered list of columns making up the key for the index.
• The properties the index should have: If it should be a (plain) unique index, a
unique-where-not-NULL index, or a nonunique index; if it should be a clustering index or
a partitioning index or not.
• For a partitioning index, the key ranges for the partitions of the table.
Checkpoint
7. For an index whose key consists of one column, match the
following definitions with the corresponding type of index:
a. Every value including the NULL value must occur in at most one row.
b. Every value excluding the NULL value must occur in at most one row.
c. Every value can occur in any number of rows.
____ Nonunique index
____ Plain unique index
____ Unique-where-not-NULL index
10. What does the system attempt to do if you have a clustering index
for a table?
_____________________________________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________
11. Which of the following actions are taken if the data page for a new
row determined by a clustering index does not have enough free
space for the row?
a. The row is not inserted.
b. The data page is split in half and the row is inserted into one of
the new data pages.
c. If there is a data page in the neighborhood that has enough free
space, the row is inserted into that data page; otherwise, it is
inserted somewhere else.
d. The row is always inserted at the end of the data pages.
_____________________________________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________
12. From the database design perspective, for which columns should
you establish an index?
_____________________________________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________
_____________________________________________________
Unit Summary
Indexes allow the database management system to directly access the rows of
a table
Indexes support the logical sequential processing of rows without sorting
Indexes help avoid internal sorts by the database management system
Plain unique indexes ensure that each value of the index key including the
NULL value occurs in at most one row
Notes:
© Copyright IBM Corp. 2000, 2002 Unit 10. Logical Data Structures 10-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Student Notebook
Unit Objectives
After completion of this unit, you should be able to:
Notes:
After having established the tables, integrity rules, and indexes for the application domain,
we must make the transition from storage view to logical view. The transition verifies the
design of the database and proves that it meets the requirements of the business
processes. The verification is accomplished by establishing the logical data structures for
the business processes described in the process inventory.
This unit describes logical data structures and briefly discusses views which complement
them. It describes:
• The purpose of logical data structures.
• Who is responsible for establishing the logical data structures for the business
processes and the role of the database designer.
• The components of logical data structures and how they are represented.
• The relationship between the business processes and logical data structures.
• The interrelationship between logical data structures and views. Views are relational
database objects describing subsets and combinations of one or more tables.
Tuple Types
Tables
Logical Data Structures
Integrity Rules
Notes:
After the tables, integrity rules, and indexes for the application domain have been
determined, it is time to verify that the database design meets the requirements of the
business processes. As part of logical view, the logical data structures are established for
all business processes described in the process inventory.
The logical view looks at the data of the application domain from the perspective of the
business processes for the application domain. Accordingly, the logical data structures
show which tables of the application domain contain the data needed by the business
processes. They also describe how to navigate from table to table when accessing the
data.
The tables established for the application domain are the primary input for the
establishment of the logical data structure. The integrity rules, more precisely, the
referential constraints between primary keys and foreign keys, are a second input because
they show the natural paths between the various tables.
Logical Data Structures - Purpose
Data Inventory
Data Elements and Data Groups
Process
Tables
Logical Data Structures
Notes:
As the last step of the conceptual view, the data needed by the business processes of the
process inventory were described as data elements and data groups in the data inventory.
Based on the data elements and data groups in the data inventory, the tables for the
application design were developed. The data elements became the columns of the tables.
The data groups only provided structural information needed for normalization and the
splitting of tuple types.
In general, the data elements for a single business process constitute a small subset of the
columns of the tables and may be located in different tables. Therefore, for the individual
business processes, it is necessary to identify:
• The columns and tables corresponding to the data elements used by the business
process.
• How the business process can find, using the data found in one table, related data in
other tables.
This is where the logical data structures come into play. The logical data structures for a
business process describe this. They describe the subset of tables and columns needed by
the business process or a part of it. They also illustrate how the business process or the
appropriate part must navigate logically through the tables to achieve its function. Thus,
they reflect the logical view the business process (or the part) has of the tables and the
data flow between the tables for the business process.
Logical Data Structures - Responsibilities
Logical views of tables and data
flows between tables for processes
Must show application
programmers:
Which tables contain data for processes
How to navigate from one table to the next
Joint effort between
Database designer
Knows tables
Knows referential constraints
and application programmers
Must write programs for processes
(Should) know processes
Input for application programmers
Allows verification of tables for application
domain
Figure 10-4. Logical Data Structures - Responsibilities CF182.0
Notes:
The logical data structures describe the logical views the business processes have of the
tables and the flow of data between the tables for the processes. They must show the
application programmers which tables contain the data (columns) for the business
processes and how to navigate from one table to the next. Thus, when establishing the
logical data structures, the interfaces between the business processes and the tables are
exposed.
The development of the logical data structures is a joint effort between the database
designer and the application programmers. The database designer must participate in the
development because he/she knows the tables, their columns, and the referential
constraints between the tables. The referential constraints, representing relationships
between primary keys and foreign keys, provide natural paths between the tables. They
are the primary vehicles for interconnecting the various tables.
However, the database designer cannot establish the logical data structures on his/her
own. The establishment of the logical data structures requires a detailed knowledge of the
business processes and may already consider implementation details. Therefore, the
application programmers writing the programs or queries for the business processes must
participate in the development of the logical data structures. They should have a primary
interest in the logical data structures. They should also have the required knowledge of the
business processes. How else can they implement them?!
Instead of the application programmers, the application domain expert could participate in
the development of the logical data structures. However, since implementation
considerations may affect the logical data structures for a business process, the
participation of the application programmers is preferable. Because the logical data
structures are input for the application programmers, they should be the driving force in
establishing them.
You may ask what interest the database designer has in the development of the logical data
structures. He/She has a very good reason for participating in their development. By
establishing the logical data structures, the correctness and completeness of his/her
database design is verified. In addition, some performance bottlenecks may be revealed
leading to additional denormalizations, the combining of tables, and the splitting of tables.
The detection of design problems requires a reiteration of the design process rather than
patches to the tables. By just patching the tables, the quality of the design is jeopardized
and the rationale for design decisions is easily abandoned. If the changes are minor, it
does not take much time to verify and correct the intermediate design steps and, thus,
validate the basic design concept. If the changes are major, you had better follow the design
steps from top to bottom when rectifying the problem.
Sample Business Process
Display Maintenance Record Summary
1. The date when the maintenance was performed and the type of
maintenance performed.
2. The employee number and the name of the mechanic who performed the
maintenance.
3. The aircraft number of the aircraft for which the maintenance was performed.
4. If the aircraft is still owned by CAB, the date when the aircraft was
manufactured, the date when the aircraft was put into service, the model
number and type code for the aircraft, and the name of the manufacturer.
5. For each subrecord (direct or indirect) for the maintenance record, the date
of the maintenance and the type of maintenance performed.
Notes:
The business process on this visual is a business process for our sample airline company
called Come Aboard. For a given maintenance number, the business process displays
information about the maintenance record, the aircraft for the maintenance record, and the
subrecords for the maintenance record.
For the maintenance record itself, it displays the date when the maintenance was
performed and the type of maintenance performed. In addition, it displays the employee
number and the name (last name, first name, and middle initial) of the employee that
performed the maintenance. Furthermore, the aircraft number of the aircraft for which the
maintenance was performed is displayed.
If Come Aboard still contains data about the aircraft, the date when the aircraft was
manufactured and the date when the aircraft was put into service are displayed. In addition,
the model number and type code for the aircraft and the name of the manufacturer are
displayed.
A maintenance record may have subrecords which again may have subrecords and so on.
For each subrecord, the date and type of maintenance are displayed.
We will come back to the various points on the visual when discussing the logical data
structure for the business process.
Sample Structure Diagram
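[Figure: structure diagram. INPUT leads to MAINTENANCE_RECORD/1 (path 1). From
MAINTENANCE_RECORD/1, paths 2, 3, and 6 lead to EMPLOYEE, AIRCRAFT, and
MAINTENANCE_RECORD/2. Path 4 leads from AIRCRAFT to AIRCRAFT_TYPE and path 5 from
AIRCRAFT_TYPE to MANUFACTURER. Path 7 loops from MAINTENANCE_RECORD/2 back to itself
and carries the delete rule C (CASCADE).]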
Notes:
A logical data structure consists of three components:
• A Structure Diagram illustrating how the various tables for the logical data structure are
interconnected. Since the structure diagram is the component that most closely resembles
what you would expect from a structure, the term logical data structure is frequently used
as a synonym for it.
• A Path Summary describing the columns through which the tables of the structure
diagram are interconnected.
• A Table Summary listing the columns needed for the various tables of the structure
diagram.
The current visual illustrates the structure diagram for the logical data structure for our
sample business process. Basically, the structure diagram looks as follows:
• The rectangular boxes in the structure diagram represent the tables used by the
business process (or a part of it) associated with the logical data structure. The boxes
contain the names of the tables.
If a business process uses the same table multiple times, for the same purpose or for
different purposes, the table occurs multiple times in the structure diagram. A different
usage may require different columns of the table. To tell the different uses apart and
correctly assign the columns to their uses in the table summary, the names of tables
occurring multiple times are suffixed with "/n", where n uniquely numbers the different uses. In
the example, table MAINTENANCE_RECORD is used for two purposes as will be
described later.
• An arrow interconnecting two tables illustrates a data flow in the direction of the arrow.
The table at the beginning of the arrow is referred to as source table for the flow, the
table at the end as target table. A value found in the source table is used unmodified to
access the corresponding rows in the target table. For example, the employee number
found in a maintenance record is used to access the row for the appropriate employee in
table EMPLOYEE ( 2 ). This corresponds to a Join operation for the tables.
The tables can be joined through a single column or multiple columns. The columns in
the target table can be named differently, but their function must be the same.
• The arrows are labeled to establish a reference to the path summary for the logical data
structure. For each interconnection of two tables (path), the path summary lists the
columns of the source table as well as the columns of the target table. If the
interconnection is through multiple columns, the column names are preceded by
sequence numbers establishing the correspondence between the respective source and
target columns.
• As for referential structures, single-headed and double-headed arrows are used to
indicate how many rows may be found in the target table for a value. A single-headed
arrow means that at most one row with the source value may be found in the target
table. A double-headed arrow means that multiple rows with the source value may be
found in the target table.
• If a path corresponds to a referential constraint (a primary-key/foreign-key relationship)
in the direction of the arrow, the delete rule is specified at the target end. The referential
constraint may allow the application programmer to skip steps of the business process
because they are automatically done by the referential integrity support of the system.
• It may happen that a table is accessed recursively (for the same purpose). In this case,
the arrow for the path leads back to the same table as is the case for table
MAINTENANCE_RECORD/2.
It is conceivable that the recursive loop comprises multiple tables.
• The data flow for a business process (or a part of a business process) always starts with
a specific table referred to as entry table. The entry table is identified by an oval labeled
INPUT pointing to it. In case of the sample logical data structure, the business process
starts with table MAINTENANCE_RECORD/1.
The interconnection between the INPUT box and the entry table is also labeled and
described in the path summary, the entry table being the target table.
Most of the time, a subset of the rows of the entry table is selected based on the values
of certain columns. The columns used for the selection are specified as target columns
in the path summary. Since they do not apply to this path, the fields Source Table and
Source Columns remain blank.
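The three components described above can be captured in a few lines of code. The following is only a sketch: the class name Path, the function check, and the consistency rules it applies (exactly one path from INPUT, matching column counts, every path column listed in the table summary) are assumptions made for illustration and are not part of the course material. The data uses three paths of the sample logical data structure discussed in these notes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Path:
    """One row of a path summary (hypothetical representation)."""
    number: int
    source_table: Optional[str]   # None for the path coming from INPUT
    source_columns: tuple
    target_table: str
    target_columns: tuple
    multiple_rows: bool = False   # True corresponds to a double-headed arrow


def check(paths, table_summary):
    """Return a list of inconsistencies between path summary and table summary."""
    problems = []
    entries = [p for p in paths if p.source_table is None]
    if len(entries) != 1:
        problems.append("expected exactly one path from INPUT")
    for p in paths:
        if p.source_table is not None and len(p.source_columns) != len(p.target_columns):
            problems.append(f"path {p.number}: column counts differ")
        for table, cols in ((p.source_table, p.source_columns),
                            (p.target_table, p.target_columns)):
            if table is None:
                continue
            missing = [c for c in cols if c not in table_summary.get(table, ())]
            if missing:
                problems.append(f"path {p.number}: {table} lacks {missing}")
    return problems


PATHS = [
    Path(1, None, (), "MAINTENANCE_RECORD/1", ("Maintenance_Number",)),
    Path(2, "MAINTENANCE_RECORD/1", ("Employee_Number",),
         "EMPLOYEE", ("Employee_Number",)),
    Path(6, "MAINTENANCE_RECORD/1", ("Maintenance_Number",),
         "MAINTENANCE_RECORD/2", ("Owning_Record",), multiple_rows=True),
]
TABLES = {
    "MAINTENANCE_RECORD/1": ("Maintenance_Number", "Date_Maintenance",
                             "Type_Maintenance", "Employee_Number", "Aircraft_Number"),
    "EMPLOYEE": ("Employee_Number", "Last_Name", "First_Name", "Middle_Initial"),
    "MAINTENANCE_RECORD/2": ("Owning_Record", "Date_Maintenance",
                             "Type_Maintenance", "Maintenance_Number"),
}

print(check(PATHS, TABLES))   # an empty list means no inconsistency was found
```

A check of this kind is one way to notice that a column used for navigation was forgotten in the table summary.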
So much for the general description of the structure diagram. Now, let us explain how we arrived
at the logical data structure for the sample business process:
1. The business process displays information about the maintenance record whose
maintenance number is specified as input. Table MAINTENANCE_RECORD is the entry
table for the logical data structure and the oval labeled INPUT points to it. The
connecting arrow ( 1 ) is a single-headed arrow because table
MAINTENANCE_RECORD can only contain a single row with the specified
maintenance number.
The first step of the business process requests that the date of the maintenance record
and the type of maintenance performed be displayed.
2. As a consequence of the first step of the business process, the path summary for the
logical data structure contains a row for path 1 . The row identifies column
Maintenance_Number as target column for table MAINTENANCE_RECORD.
3. The first step of the business process needs the following columns of table
MAINTENANCE_RECORD: Maintenance_Number, Date_Maintenance, and
Type_Maintenance. Therefore, they are included in the table summary for table
MAINTENANCE_RECORD.
4. The second step of the business process requests that the employee number and name of
the mechanic who performed the maintenance be displayed.
Table MAINTENANCE_RECORD contains the employee number of the mechanic who
performed the maintenance. The employee number is used to retrieve the row for the
mechanic in table EMPLOYEE. The row contains the name of the mechanic.
Accordingly, we have a path ( 2 ) from table MAINTENANCE_RECORD to table
EMPLOYEE. The connecting arrow must be a single-headed arrow because table
EMPLOYEE contains a single row for the employee number.
Note that we do not need to access table MECHANIC since we do not need any data of that
table. Consequently, path 2 does not correspond to a relationship type of the
entity-relationship model or a referential constraint of the referential structure.
5. The path summary must include a row for path 2 . The row describes that tables
MAINTENANCE_RECORD and EMPLOYEE are joined via column Employee_Number.
The value found for column Employee_Number in table MAINTENANCE_RECORD is
used as search argument for column Employee_Number of table EMPLOYEE.
As a consequence of the second step of the business process, column
Employee_Number is added to table MAINTENANCE_RECORD in the table summary.
In addition, the table summary states that the following columns of table EMPLOYEE
are needed by the business process: Employee_Number, Last_Name, First_Name, and
Middle_Initial.
6. The third step of the business process requests that the aircraft number of the aircraft
for which the maintenance was performed be displayed. Since the aircraft number is
contained in the maintenance record, the structure diagram need not change.
7. Because of Step 3 of the business process, column Aircraft_Number must be added to
the columns needed from table MAINTENANCE_RECORD. The path summary remains
unchanged.
8. The fourth step of the business process requests that the date when the aircraft was
manufactured, the date when the aircraft was put into service, and the model number
and type code for the aircraft be displayed.
If Come Aboard still has information about the aircraft, the requested information is
contained in the row for the aircraft in table AIRCRAFT. To retrieve the row, the aircraft
number in the maintenance record is used (path 3 ). The arrow must be a
single-headed arrow because at most one row can be found in table AIRCRAFT for a
given aircraft number.
Note that there is not a relationship type interconnecting entity types AIRCRAFT and
MAINTENANCE RECORD in the entity-relationship model. Remember that the
maintenance records for an aircraft must be kept even if the remaining information about
the aircraft is deleted. For that reason, there is also not a referential constraint for the
tables.
9. Because of Step 4 of the business process, the path summary must contain a row for
path 3 . The row shows that column Aircraft_Number of table
MAINTENANCE_RECORD is used as search argument for column Aircraft_Number of
table AIRCRAFT.
The table summary comprises a row for table AIRCRAFT listing all columns requested
by Step 4 of the business process.
10. The fourth step of the business process also requests that the name of the
manufacturer of the aircraft be displayed. To find the manufacturer name, we must use
the type code for the aircraft found in table AIRCRAFT and retrieve the row for the
aircraft type from table AIRCRAFT_TYPE. (We need not go to table AIRCRAFT_MODEL
since we do not need model-specific information.)
The row retrieved contains the manufacturer code which is then used to retrieve the row
for the manufacturer from table MANUFACTURER. The retrieved row contains the name
of the manufacturer.
The structure diagram is extended by the two interconnections ( 4 and 5 ) required to
accomplish the requested task.
11. As a consequence of the retrieval of the manufacturer name, the path summary
contains two additional rows describing the transitions from table AIRCRAFT to table
AIRCRAFT_TYPE and from table AIRCRAFT_TYPE to table MANUFACTURER.
The table summary reflects that columns Type_Code and Manufacturer_Code of table
AIRCRAFT_TYPE and columns Manufacturer_Code and Company_Name of table
MANUFACTURER are needed.
12. The fifth step of the business process requests that, for all subrecords of the specified
maintenance record, the date and type of the maintenance be displayed.
To obtain the subrecords for the maintenance record, we must retrieve all rows of table
MAINTENANCE_RECORD for which the value of column Owning_Record is equal to
the maintenance number of the specified maintenance record. This is expressed by
path 6 whose source and target is table MAINTENANCE_RECORD.
The structure diagram shows table MAINTENANCE_RECORD twice, and not an arrow
returning to the same table, because we have two different uses of table
MAINTENANCE_RECORD: Once, it is used for the original maintenance record and
once for the subrecords. That the uses are different is underlined by the fact that the
columns needed for the subrecords are different (fewer) and that there are different
interconnections from the subrecords.
The arrow for path 6 must be double-headed because multiple subrecords may exist
for a maintenance record.
13. To obtain unique references for table MAINTENANCE_RECORD in the path summary
and the table summary, "/1" and "/2" are appended to the table name, respectively.
14. The path summary contains a row for path 6 describing that columns
Maintenance_Number of table MAINTENANCE_RECORD/1 and Owning_Record of
table MAINTENANCE_RECORD/2 are joined.
The table summary describes that columns Owning_Record, Date_Maintenance,
Type_Maintenance, and Maintenance_Number of table MAINTENANCE_RECORD/2
are needed by the business process. Even though not expressed explicitly by the
description of the business process, the maintenance numbers for the subrecords must
be displayed to identify the subrecords. Column Maintenance_Number is also needed
for a different reason as we will see in a moment.
15. Looking more closely at the description of Step 5 reveals that not only the immediate
subrecords of the maintenance record are needed, but also all indirect subrecords. This
means that also the subrecords of the subrecords, and again their subrecords, are
needed.
Thus, we need the recursion represented by path 7 : The maintenance number of a
subrecord is used to locate all maintenance records whose column Owning_Record
contains that maintenance number. Since the interconnection corresponds to the
self-referencing constraint for table MAINTENANCE_RECORD, the delete rule
(CASCADE) is specified at the target end of the arrow.
16. The path summary must contain a row for path 7 describing the recursion. The table
summary remains unchanged since additional columns are not needed for table
MAINTENANCE_RECORD/2.
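Paths 6 and 7 together describe a recursive retrieval. As a rough illustration, the sketch below expresses them as a recursive query, run here against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in chosen so that the example is runnable, and other database systems may spell recursive queries differently. The table is reduced to the columns needed for the subrecords, and the sample rows and maintenance type values are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE MAINTENANCE_RECORD (
    Maintenance_Number INTEGER PRIMARY KEY,
    Date_Maintenance   TEXT,
    Type_Maintenance   TEXT,
    Owning_Record      INTEGER REFERENCES MAINTENANCE_RECORD ON DELETE CASCADE
);
-- Invented sample data: record 1 owns 2 and 4, record 2 owns 3.
INSERT INTO MAINTENANCE_RECORD VALUES
    (1, '2002-03-01', 'C-CHECK', NULL),
    (2, '2002-03-02', 'ENGINE',  1),
    (3, '2002-03-03', 'TURBINE', 2),
    (4, '2002-03-04', 'CABIN',   1),
    (9, '2002-04-01', 'A-CHECK', NULL);
""")

SUBRECORDS = """
WITH RECURSIVE SUB (Maintenance_Number, Date_Maintenance, Type_Maintenance) AS (
    -- Path 6: direct subrecords of the specified maintenance record
    SELECT Maintenance_Number, Date_Maintenance, Type_Maintenance
    FROM MAINTENANCE_RECORD
    WHERE Owning_Record = ?
    UNION ALL
    -- Path 7: subrecords of the subrecords found so far
    SELECT M.Maintenance_Number, M.Date_Maintenance, M.Type_Maintenance
    FROM MAINTENANCE_RECORD M
    JOIN SUB S ON M.Owning_Record = S.Maintenance_Number
)
SELECT Maintenance_Number, Date_Maintenance, Type_Maintenance
FROM SUB
ORDER BY Maintenance_Number
"""


def subrecords(maintenance_number):
    """Return all direct and indirect subrecords of a maintenance record."""
    return con.execute(SUBRECORDS, (maintenance_number,)).fetchall()


for row in subrecords(1):
    print(row)
```

The maintenance number of each subrecord is selected because it is both displayed and used as the search argument for the next level, which mirrors the reason given for including Maintenance_Number in the table summary for MAINTENANCE_RECORD/2.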
Sample Path and Table Summaries
Path Summary
#  Source Table          Source Columns      Target Table          Target Columns
1                                            MAINTENANCE_RECORD/1  Maintenance_Number
2  MAINTENANCE_RECORD/1  Employee_Number     EMPLOYEE              Employee_Number
3  MAINTENANCE_RECORD/1  Aircraft_Number     AIRCRAFT              Aircraft_Number
4  AIRCRAFT              Type_Code           AIRCRAFT_TYPE         Type_Code
5  AIRCRAFT_TYPE         Manufacturer_Code   MANUFACTURER          Manufacturer_Code
6  MAINTENANCE_RECORD/1  Maintenance_Number  MAINTENANCE_RECORD/2  Owning_Record
7  MAINTENANCE_RECORD/2  Maintenance_Number  MAINTENANCE_RECORD/2  Owning_Record

Table Summary
Table                 Columns
MAINTENANCE_RECORD/1  Maintenance_Number, Date_Maintenance, Type_Maintenance,
                      Employee_Number, Aircraft_Number
EMPLOYEE              Employee_Number, Last_Name, First_Name, Middle_Initial
AIRCRAFT              Aircraft_Number, Date_Manufactured, Date_in_Service, Type_Code,
                      Model_Number
AIRCRAFT_TYPE         Type_Code, Manufacturer_Code
MANUFACTURER          Manufacturer_Code, Company_Name
MAINTENANCE_RECORD/2  Owning_Record, Date_Maintenance, Type_Maintenance,
                      Maintenance_Number
Notes:
The visual illustrates the path summary and the table summary for the sample business
process described on page 10-22. For each path of the structure diagram, the path
summary lists the source table and the target table. It also specifies the source-table and
target-table columns that are joined.
For each usage of a table, the table summary specifies the columns needed.
The notes for the previous visual describe how the path summary and the table summary
for the sample business process are derived.
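Because each path uses the value found in the source table unmodified, paths 1 to 5 of the path summary can be read as the join conditions of a single query. The sketch below shows one possible reading, again using an in-memory SQLite database as a stand-in. The table definitions are reduced to the columns of the table summary, the key choices and sample rows are invented, and the use of an inner join for EMPLOYEE (assuming the mechanic's row always exists) and of outer joins for the aircraft data (which may have been deleted) is an assumption of this sketch.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE EMPLOYEE (Employee_Number INTEGER PRIMARY KEY,
    Last_Name TEXT, First_Name TEXT, Middle_Initial TEXT);
CREATE TABLE MANUFACTURER (Manufacturer_Code TEXT PRIMARY KEY, Company_Name TEXT);
CREATE TABLE AIRCRAFT_TYPE (Type_Code TEXT PRIMARY KEY, Manufacturer_Code TEXT);
CREATE TABLE AIRCRAFT (Aircraft_Number INTEGER PRIMARY KEY,
    Date_Manufactured TEXT, Date_in_Service TEXT, Type_Code TEXT, Model_Number TEXT);
CREATE TABLE MAINTENANCE_RECORD (Maintenance_Number INTEGER PRIMARY KEY,
    Date_Maintenance TEXT, Type_Maintenance TEXT,
    Employee_Number INTEGER, Aircraft_Number INTEGER, Owning_Record INTEGER);

-- Invented sample data; aircraft 555 is no longer known to the airline.
INSERT INTO EMPLOYEE VALUES (7, 'Doe', 'Jane', 'Q');
INSERT INTO MANUFACTURER VALUES ('MF1', 'Example Aircraft Works');
INSERT INTO AIRCRAFT_TYPE VALUES ('T1', 'MF1');
INSERT INTO AIRCRAFT VALUES (100, '1995-01-10', '1995-06-01', 'T1', 'M200');
INSERT INTO MAINTENANCE_RECORD VALUES (1, '2002-03-01', 'C-CHECK', 7, 100, NULL);
INSERT INTO MAINTENANCE_RECORD VALUES (2, '2002-03-05', 'A-CHECK', 7, 555, NULL);
""")

SUMMARY = """
SELECT MR.Date_Maintenance, MR.Type_Maintenance,
       MR.Employee_Number, E.Last_Name, E.First_Name, E.Middle_Initial,
       MR.Aircraft_Number,
       A.Date_Manufactured, A.Date_in_Service, A.Model_Number, A.Type_Code,
       M.Company_Name
FROM MAINTENANCE_RECORD MR
JOIN EMPLOYEE E           ON E.Employee_Number   = MR.Employee_Number    -- path 2
LEFT JOIN AIRCRAFT A      ON A.Aircraft_Number   = MR.Aircraft_Number    -- path 3
LEFT JOIN AIRCRAFT_TYPE T ON T.Type_Code         = A.Type_Code           -- path 4
LEFT JOIN MANUFACTURER M  ON M.Manufacturer_Code = T.Manufacturer_Code   -- path 5
WHERE MR.Maintenance_Number = ?                                          -- path 1
"""


def summary(maintenance_number):
    """Return the summary row for one maintenance record, or None."""
    return con.execute(SUMMARY, (maintenance_number,)).fetchone()


print(summary(1))   # aircraft data present
print(summary(2))   # aircraft data missing: the aircraft columns are NULL
```

The subrecords reached through paths 6 and 7 are not part of this query since they return multiple rows per maintenance record.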
An Alternate Representation
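[Figure: the sample structure diagram redrawn with the joined columns (Employee_Number,
Aircraft_Number, Type_Code, Manufacturer_Code, Owning_Record, Maintenance_Number) shown
inside the boxes and table names such as EMPLOYEE, AIRCRAFT, MANUFACTURER, and
MAINTENANCE_RECORD/2 next to them. Arrows lead from source columns to target columns;
the recursive path carries the delete rule C. Callout: Still need table summary, but not
path summary.]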
Notes:
You might already have wondered why the path summary is needed. Indeed, you can
show the joined columns immediately in the structure diagram as done on the above visual
for the sample business process used so far. The arrows then point from the source
column to the target column and labels are no longer needed for the arrows. The names for
the tables are outside the boxes, next to them. The table summary is still necessary since it
is impractical to incorporate all needed columns into the diagram.
The resulting diagram seems to be simpler and clearer. However, this representation does
not always work well. It works well for those cases where a single column is used to
navigate from table to table. The representation becomes complex and blurred if you
must join the tables on multiple columns and the columns are named differently in the two
tables. It becomes especially confusing if you must join a table with multiple other tables on
multiple columns and the columns overlap.
Furthermore, the above representation requires more space and you might find it more
difficult to squeeze it onto a single page.
In contrast, if you are using a path summary, you can even omit the structure diagram since
path summary and table summary together contain all necessary information. The
structure diagram just provides a graphical view of the flow between the tables.
Now, you have a choice. Make the best of it.
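To illustrate that the flow between the tables can be read off the path summary alone, the following sketch prints one text line per path of the sample path summary. The tuple layout and the function name render are assumptions made for this illustration. The sample path summary does not record whether an arrow is single-headed or double-headed, so the sketch does not print that either.

```python
# Rows of the sample path summary:
# (number, source table, source columns, target table, target columns)
PATH_SUMMARY = [
    (1, None, (), "MAINTENANCE_RECORD/1", ("Maintenance_Number",)),
    (2, "MAINTENANCE_RECORD/1", ("Employee_Number",), "EMPLOYEE", ("Employee_Number",)),
    (3, "MAINTENANCE_RECORD/1", ("Aircraft_Number",), "AIRCRAFT", ("Aircraft_Number",)),
    (4, "AIRCRAFT", ("Type_Code",), "AIRCRAFT_TYPE", ("Type_Code",)),
    (5, "AIRCRAFT_TYPE", ("Manufacturer_Code",), "MANUFACTURER", ("Manufacturer_Code",)),
    (6, "MAINTENANCE_RECORD/1", ("Maintenance_Number",),
     "MAINTENANCE_RECORD/2", ("Owning_Record",)),
    (7, "MAINTENANCE_RECORD/2", ("Maintenance_Number",),
     "MAINTENANCE_RECORD/2", ("Owning_Record",)),
]


def render(path_summary):
    """Return one 'source -> target' line per path; a blank source means INPUT."""
    lines = []
    for number, src, src_cols, tgt, tgt_cols in path_summary:
        source = "INPUT" if src is None else f"{src}({', '.join(src_cols)})"
        lines.append(f"{number}: {source} -> {tgt}({', '.join(tgt_cols)})")
    return lines


for line in render(PATH_SUMMARY):
    print(line)
```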
Most of the time, the input for a logical data structure is part of the input
for the business process, but not always
Columns needed for the tables should coincide with the data read or written
by the business process as described in the process inventory
Notes:
The example we studied on the previous pages only required a single logical data
structure. As we have already discussed, a logical data structure describes a continuous
flow through a subset of the tables of the application domain. As for Join operations, the
data found in a row is used unmodified to select the rows of the next table. This entails that
many business processes will require multiple logical data structures since they do not just
use the values found to select the rows of the next table. Rather, they use additional criteria
(search arguments) or derived search arguments. Different or additional search arguments
require a separate logical data structure.
Many logical data structures are simple because they access a single table. In particular,
this applies to the logical data structures involving insert, update, or delete operations
because the corresponding SQL statements only allow the specification of a single table.
The structure diagrams consist of an input box, the box for the table, and an arrow
connecting them. The table summary lists the accessed columns of the table. The path
summary shows through which columns the table is entered, i.e., the search argument for
the rows retrieved, updated, or deleted or the columns inserted. You may opt to omit the
structure diagram for these logical data structures since it does not provide much
information.
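For such single-table logical data structures, the target columns of the path summary translate almost mechanically into a statement: they become the search arguments of a retrieval or deletion, or the column list of an insertion. The helper functions below are a hypothetical sketch of that correspondence and only build statement text with parameter markers. The table and column names in the usage lines are taken from the PILOT_ASSIGNMENT example discussed later in this unit.

```python
def quoted(columns):
    # Quote the names so that columns such as From and To (SQL keywords) are legal.
    return [f'"{c}"' for c in columns]


def where_clause(target_columns):
    # Each target column of the path summary becomes one search argument.
    return " AND ".join(f"{c} = ?" for c in quoted(target_columns))


def select_sql(table, needed_columns, target_columns):
    # needed_columns come from the table summary, target_columns from the path summary.
    return (f"SELECT {', '.join(quoted(needed_columns))} "
            f"FROM {table} WHERE {where_clause(target_columns)}")


def delete_sql(table, target_columns):
    return f"DELETE FROM {table} WHERE {where_clause(target_columns)}"


def insert_sql(table, target_columns):
    # For an insertion, the target columns are the columns inserted.
    marks = ", ".join("?" for _ in target_columns)
    return f"INSERT INTO {table} ({', '.join(quoted(target_columns))}) VALUES ({marks})"


flight_key = ["Flight_Number", "From", "To", "Flight_Locator"]
print(select_sql("PILOT_ASSIGNMENT", ["Employee_Number", "Pilot_Function"], flight_key))
print(insert_sql("PILOT_ASSIGNMENT", flight_key + ["Employee_Number", "Pilot_Function"]))
```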
Many paths in logical data structures represent primary-key/foreign-key interrelationships,
but not all, as we have seen for the previous example.
Most of the time, the input for a logical data structure is part of the input for the business
process, but not always. Secondary logical data structures may use a derived input.
The columns needed for the various tables should coincide with the data read or written by
the business process as described in the process inventory (see Unit 5 - Data and Process
Inventories).
1. It is verified that the specified flight and pilot exist. If the flight or the pilot does not exist, an
appropriate error message is displayed and the business process ends.
2. If pilot and flight exist, it is checked if the pilot has the license to fly the aircraft model
for the leg for the flight. If the pilot cannot fly the aircraft model, an appropriate error
message is displayed and the business process ends.
3. If the pilot has the license to fly the aircraft model, it is checked if the pilot has
already been assigned to the flight. If the pilot is already captain or copilot for the
flight, an appropriate message is displayed and the business process ends.
4. If the pilot has not yet been assigned to the flight, it is checked if another pilot is
already captain for the flight. If so, a message is displayed containing employee
number, last name, and first name of the current captain and the business process
ends.
5. If a captain has not yet been assigned to the flight, the specified pilot becomes the
captain for the flight.
6. A message is displayed confirming that the pilot has been assigned as captain to the
flight. The message includes employee number, last name, and first name of the
assigned captain.
Notes:
The visual displays the textual description of business process Assign Captain for Flight
which we already discussed in Unit 5 - Data and Process Inventories. This business
process will require multiple, fairly simple, logical data structures.
Example 2 - Structure Diagrams
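[Figure: five structure diagrams.
Structure 1: INPUT leads to FLIGHT (path 1), FLIGHT leads to LEG (path 2).
Structure 2: INPUT leads to PILOT_FOR_AM (path 1).
Structure 3: INPUT leads to PILOT_ASSIGNMENT (path 1).
Structure 4: INPUT leads to EMPLOYEE (path 1).
Structure 5: INPUT leads to PILOT_ASSIGNMENT (path 1).]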
Notes:
The sample business process described on the previous visual requires multiple logical
data structures as explained in the following:
1. The first two steps of the business process verify that the specified flight and pilot exist
and the pilot has the license to fly the aircraft model for (the leg for) the flight.
As a matter of fact, we need not explicitly verify that the specified employee is a pilot. It is
sufficient to verify that he/she belongs to the persons having the license to fly the aircraft
model for the flight. If we do not find him/her in the list of the persons, the business
process ends anyway. If he/she is in the list, we know that the specified employee is a
pilot. The referential constraints for table PILOT_FOR_AM enforce this. Table
PILOT_FOR_AM which contains a row for every valid pilot/aircraft model combination is
constrained by table PILOT.
The point discussed represents an implementation detail. It confirms that the application
programmers should participate in the establishment of the logical data structures.
To verify that the flight exists, we must access table FLIGHT using the specified flight
number, airport of departure, airport of arrival, and flight locator. Using the values found
in columns Flight_Number, From, and To, we must navigate to table LEG to determine
the aircraft model for the flight (columns Type_Code and Model_Number).
To verify that the specified pilot can fly the aircraft model, we have two choices:
•We can access table PILOT_FOR_AM just using the type code and the model number
for the aircraft model. In this case, we need to retrieve all pilots that can fly the aircraft
model until we have found the specified pilot or know that he/she is not in the list.
•We can access table PILOT_FOR_AM using the type code and the model number for
the aircraft model and the employee number of the specified pilot. In this case, we will
retrieve at most one row. If a row is returned, the specified employee is a pilot and can
fly the aircraft model. If a row is not returned, the specified employee is not a pilot or
cannot fly the aircraft model. In either case, he/she must not be considered for the flight.
If we took the first choice, we could continue the structure diagram to table
PILOT_FOR_AM since the value found in table LEG is used unmodified to navigate to
table PILOT_FOR_AM.
For the second choice, the search arguments for table PILOT_FOR_AM are the type
code and model number found in table LEG and the specified employee number. Thus,
a second logical data structure is required.
The first choice is a poor performer and we will choose the second alternative assuming
that an index is provided for the primary key of table PILOT_FOR_AM.
Since we choose the second alternative, the structure diagram for Structure 1 ends with
table LEG. Path summary and table summary for the logical data structure are illustrated
on page 10-26.
2. As explained before, we will use the type code and model number for the aircraft model
and the employee number for the pilot to access table PILOT_FOR_AM. Therefore, we
need a second logical data structure (Structure 2). Its structure diagram is extremely
simple since only one table is accessed. It consists of an input box, table
PILOT_FOR_AM, and the arrow interconnecting them. The structure diagram does not
continue further because we must use different inputs for the subsequent steps of the
business process.
Path summary and table summary for the logical data structure are on page 10-26.
3. Steps 3 and 4 of the business process check if the pilot has already been assigned to
the flight or if another pilot is already captain for the flight.
Both questions can be answered by a single access to table PILOT_ASSIGNMENT. For
this access, only the flight information (flight number, airport of departure, airport of
arrival, and flight locator) is used and not the employee number of the pilot. At most, two
rows are returned: one for the captain of the flight and one for the copilot. The returned
rows are then examined.
The appropriate logical data structure is Structure 3. Its path summary and table
summary are on page 10-26.
Note that columns Employee_Number and Pilot_Function must be retrieved to make the
necessary decisions.
4. If another pilot has already been assigned as captain to the flight, the fourth step of the
business process requests that employee number, last name, and first name of that pilot
be displayed. For this, we need a further logical data structure (Structure 4). Its path
summary and table summary are on page 10-26.
You might ask why Structure 3 is not continued to table EMPLOYEE. Continuing the
structure to table EMPLOYEE would mean that table EMPLOYEE would be accessed for
every row retrieved from table PILOT_ASSIGNMENT. However, this is not the case for
an existing copilot assignment and we do not want to make unnecessary accesses for
the no-error cases.
5. The fifth step of the business process assigns the specified pilot as captain to the flight.
This gives rise to logical data structure Structure 5. Its path summary and table summary
are on page 10-26.
At first glance, the logical data structure seems to be the same as Structure 3.
However, Structure 3 was for retrieval whereas Structure 5 is for the insertion of rows
and its path summary is different. As target columns, it shows all columns of table
PILOT_ASSIGNMENT meaning that they are input for the insert request.
6. The final step of the business process (Step 6) requests that employee number, last
name, and first name of the newly assigned captain be displayed. This requires an
access to table EMPLOYEE. For this access, we do not need an additional logical data
structure. Structure 4 can be used with the employee number of the new captain for the
flight.
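The way the five logical data structures cooperate can be sketched as one small program. The following is a minimal illustration using an in-memory SQLite database as a stand-in, not the course's implementation. The column lists of the tables, the value 'CAPTAIN' for column Pilot_Function, the message texts, and the sample rows are all assumptions made for this sketch. Columns From and To are quoted because they are SQL keywords.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE FLIGHT (Flight_Number TEXT, "From" TEXT, "To" TEXT, Flight_Locator TEXT);
CREATE TABLE LEG (Flight_Number TEXT, "From" TEXT, "To" TEXT,
                  Type_Code TEXT, Model_Number TEXT);
CREATE TABLE PILOT_FOR_AM (Employee_Number INTEGER, Type_Code TEXT, Model_Number TEXT,
                  PRIMARY KEY (Employee_Number, Type_Code, Model_Number));
CREATE TABLE PILOT_ASSIGNMENT (Flight_Number TEXT, "From" TEXT, "To" TEXT,
                  Flight_Locator TEXT, Employee_Number INTEGER, Pilot_Function TEXT);
CREATE TABLE EMPLOYEE (Employee_Number INTEGER PRIMARY KEY,
                  Last_Name TEXT, First_Name TEXT);
-- Invented sample data
INSERT INTO FLIGHT VALUES ('CA100', 'FRA', 'JFK', 'L1');
INSERT INTO LEG VALUES ('CA100', 'FRA', 'JFK', 'T1', 'M200');
INSERT INTO EMPLOYEE VALUES (7, 'Doe', 'Jane'), (8, 'Roe', 'Rick');
INSERT INTO PILOT_FOR_AM VALUES (7, 'T1', 'M200'), (8, 'T1', 'M200');
""")


def employee_name(employee_number):
    # Structure 4: INPUT -> EMPLOYEE, used for step 4 and reused for step 6.
    number, last, first = con.execute(
        "SELECT Employee_Number, Last_Name, First_Name FROM EMPLOYEE "
        "WHERE Employee_Number = ?", (employee_number,)).fetchone()
    return f"{number} {last}, {first}"


def assign_captain(flight, pilot):
    """flight = (Flight_Number, From, To, Flight_Locator); pilot = Employee_Number."""
    # Structure 1: INPUT -> FLIGHT -> LEG yields the aircraft model for the flight.
    model = con.execute("""
        SELECT L.Type_Code, L.Model_Number
        FROM FLIGHT F JOIN LEG L
          ON L.Flight_Number = F.Flight_Number AND L."From" = F."From" AND L."To" = F."To"
        WHERE F.Flight_Number = ? AND F."From" = ? AND F."To" = ? AND F.Flight_Locator = ?
        """, flight).fetchone()
    if model is None:
        return "flight does not exist"
    # Structure 2: one keyed access to PILOT_FOR_AM (the second choice in the notes).
    licensed = con.execute("""
        SELECT 1 FROM PILOT_FOR_AM
        WHERE Type_Code = ? AND Model_Number = ? AND Employee_Number = ?
        """, (*model, pilot)).fetchone()
    if licensed is None:
        return "not a pilot or no license for the aircraft model"
    # Structure 3: at most two rows (captain and copilot) for the flight.
    rows = con.execute("""
        SELECT Employee_Number, Pilot_Function FROM PILOT_ASSIGNMENT
        WHERE Flight_Number = ? AND "From" = ? AND "To" = ? AND Flight_Locator = ?
        """, flight).fetchall()
    if any(emp == pilot for emp, _ in rows):
        return "pilot already assigned to the flight"
    captain = next((emp for emp, func in rows if func == "CAPTAIN"), None)
    if captain is not None:
        return "captain already assigned: " + employee_name(captain)
    # Structure 5: insertion; all columns of PILOT_ASSIGNMENT are input.
    con.execute("INSERT INTO PILOT_ASSIGNMENT VALUES (?, ?, ?, ?, ?, 'CAPTAIN')",
                (*flight, pilot))
    return "assigned as captain: " + employee_name(pilot)
```

With the sample data, calling assign_captain(('CA100', 'FRA', 'JFK', 'L1'), 7) assigns employee 7, and a second call for employee 8 reports the existing captain. Table EMPLOYEE is only accessed in those two situations, which is the reason given above for not continuing Structure 3 to table EMPLOYEE.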
Notes:
This visual illustrates the path and table summaries for logical data structures Structure 1
and Structure 2 for the second sample business process.
Example 2 - Path and Table Summaries (2 of 3)
Structure 3 - Path Summary
#  Source Table  Source Columns  Target Table      Target Columns
1                                PILOT_ASSIGNMENT  1: Flight_Number, 2: From, 3: To,
                                                   4: Flight_Locator
Notes:
This visual illustrates the path and table summaries for logical data structures Structure 3
and Structure 4 for the second sample business process.
Notes:
This visual illustrates path summary and table summary for logical data structure Structure
5 for the second sample business process.
Characteristics of Views
Represent subsets of the data in the tables of the
application domain
May comprise rows and columns of multiple tables
Selected columns can be ordered in any way desired
Selected columns can be renamed
When defining a view, a description of the data represented by the
view is stored
A view receives a name which can be used in SQL
statements where table names can be used
During execution, SQL statement replaced by SQL statement only containing
actual column and table names which is then executed
Views comprising multiple tables cannot be used in INSERT, UPDATE, or
DELETE statements
When the data described by the view is displayed, it is presented
in form of a table
All data comes from the base tables and not from a table corresponding
to the view
Data is always up-to-date
Notes:
Views are database objects representing subsets of columns and rows of one or more
tables. Thus, by means of views, you can represent the views that logical data structures have
of the data in the tables of the application domain.
For example, you could define a view joining tables MAINTENANCE_RECORD,
AIRCRAFT, and EMPLOYEE and selecting a subset of their rows and columns:
• The rows of tables MAINTENANCE_RECORD and AIRCRAFT having the same aircraft
number should be combined.
• The resulting rows should be joined with the rows of table EMPLOYEE having the same
employee number.
• The view should contain the following columns of the three tables:
From table MAINTENANCE_RECORD:
Maintenance_Number, Date_Maintenance, Type_Maintenance,
Aircraft_Number, and Employee_Number
Usage of Views
Views allow you to limit the data an end user can see or change
Data security
Notes:
Views are an important tool for achieving data security for your tables since they limit the
data end users or programs can see or change. By not allowing direct access to the actual
tables, you can limit the access of people to the data of the views you authorized them for.
Another positive aspect of views is that end users and business processes only see the
data they are interested in. Thus, the data presented to end users are more readily
understandable and the programs for the business processes need not provide variables
for data they do not need. Consequently, views ease the work of end users and application
programmers.
Explicitly naming the columns in the view definition makes your business processes more
resilient against database changes. If the sequence of the columns in the database changes
due to the redefinition of a table, end users and programs using the view will not notice the
changes and are not impacted. If the actual names of columns change, you can change the
view definition in such a way that end users and programs using the view do not notice the
name changes.
By explicitly naming the columns in the view, you also ensure that end users or programs
do not notice the addition of new columns they are not interested in.
From a design perspective, you should not allow end users or programs to directly access
the base tables, i.e., the actual tables. As a consequence, you have more freedom to
change and extend the tables as long as you ensure that the external appearance of the
views remains unchanged. Furthermore, if all columns are explicitly named in the view
definition, end users and programs selecting all columns via "SELECT *" are not impacted
if new columns are added to the table that are not contained in the view definition.
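The effect on "SELECT *" can be tried out with a small sketch using Python's sqlite3 module. The table layout, the Salary and Phone columns, and the view name EMPLOYEE_PUBLIC are assumptions made for this illustration.

```python
import sqlite3

def select_star_before_and_after():
    """Run SELECT * on a view before and after a column is added to its base table."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE EMPLOYEE (
            Employee_Number INTEGER PRIMARY KEY,
            Last_Name       TEXT,
            Salary          INTEGER      -- not exposed through the view
        );
        INSERT INTO EMPLOYEE VALUES (10, 'Miller', 5000);
        -- The view names its columns explicitly instead of using SELECT *.
        CREATE VIEW EMPLOYEE_PUBLIC AS
            SELECT Employee_Number, Last_Name FROM EMPLOYEE;
    """)

    def star():
        cur = con.execute("SELECT * FROM EMPLOYEE_PUBLIC")
        return [d[0] for d in cur.description], cur.fetchall()

    before = star()
    con.execute("ALTER TABLE EMPLOYEE ADD COLUMN Phone TEXT")  # table is extended
    after = star()
    return before, after

before, after = select_star_before_and_after()
print(before == after)
```

The view user sees the same two columns before and after the change, and never sees Salary, which also illustrates the data security aspect.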
As we have illustrated by means of the example in the notes for the previous visual, views
complement the logical data structures. For a logical data structure, you may have multiple
views. Conversely, a single view may serve multiple logical data structures.
Checkpoint
3. A logical data structure reflects the data flow between the tables of
the application domain for a business process or a part of it. (T/F)
4. Since the logical data structures are intended for the application
programmers, the database designer is not involved in their
development. (T/F)
11. Views are only descriptions of data. They are not real tables. (T/F)
Unit Summary
Notes:
Overview
Come Aboard (CAB) is an airline servicing a set of airports with its
aircraft. As employees, it has pilots flying the aircraft, mechanics
maintaining and servicing the aircraft, and other personnel for various
service functions.
CAB wants to administer flight planning, pilot assignment, and aircraft
maintenance activities by means of a database management system.
© Copyright IBM Corp. 2000, 2002 Appendix A. Sample Problem Statement A-1
Course materials may not be reproduced in whole or in part
without the prior written permission of IBM.
Pilots
CAB wants to store information (e.g., name, address, phone number,
or date of previous medical check-up) for its pilots. Like every employee,
pilots have a unique employee serial number.
Mechanics
CAB wants to store information (e.g., name, address, phone number,
or area of expertise) for its mechanics. Like every employee, mechanics
have a unique employee serial number.
Itineraries
Itineraries are ordered collections of consecutive nonstop connections
between airports which are called legs. This means that the ending
airport for the previous leg is always the starting airport for the next
leg.
Itineraries have unique flight numbers (e.g., YY1842). All legs of an
itinerary are operated under the flight number of the itinerary. CAB
wants to maintain information about the itineraries such as the seating
classes offered, the weekdays on which the itinerary is operated
(starting days), and the planned departure and arrival times for the
legs.
Flights
A flight is a scheduled or executed nonstop trip between two airports.
Flights are always related to the legs of itineraries. The information
kept about flights includes, for example, the estimated departure and
arrival times (which might be different from the planned departure and
arrival times for the appropriate leg because of delays) and the actual
departure and arrival times.
The individual flights can be identified by means of a sequence
number, referred to as flight locator, which is unique per itinerary and
leg. Thus, to identify a particular flight, you need to know the flight
number for the itinerary (e.g., YY1842), the airports for the legs (e.g.,
FRA - JFK), and the flight locator (e.g., 453) for the flight.
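One way to picture this identification scheme is a composite primary key, sketched here with Python's sqlite3 module. The table and column names (FLIGHT, Airport_From, Airport_To) and the second leg JFK - ORD are assumptions for illustration; the problem statement does not prescribe them.

```python
import sqlite3

def flight_key_demo():
    """Show that the flight locator is unique only per itinerary and leg."""
    con = sqlite3.connect(":memory:")
    con.execute("""
        CREATE TABLE FLIGHT (
            Flight_Number  TEXT    NOT NULL,   -- itinerary, e.g. YY1842
            Airport_From   TEXT    NOT NULL,   -- leg start
            Airport_To     TEXT    NOT NULL,   -- leg end
            Flight_Locator INTEGER NOT NULL,   -- sequence number per leg
            PRIMARY KEY (Flight_Number, Airport_From, Airport_To, Flight_Locator)
        )""")
    con.execute("INSERT INTO FLIGHT VALUES ('YY1842', 'FRA', 'JFK', 453)")
    # The same locator on another leg of the itinerary is a different flight.
    con.execute("INSERT INTO FLIGHT VALUES ('YY1842', 'JFK', 'ORD', 453)")
    try:
        # Same itinerary, same leg, same locator: rejected as a duplicate.
        con.execute("INSERT INTO FLIGHT VALUES ('YY1842', 'FRA', 'JFK', 453)")
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True
    count = con.execute("SELECT COUNT(*) FROM FLIGHT").fetchone()[0]
    return duplicate_rejected, count

print(flight_key_demo())
```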
Maintenance Records
As the aircraft are maintained, maintenance records are established
for them. The information gathered as part of the maintenance records
includes, for example, the type of the maintenance performed and the
date of the maintenance.
Each maintenance record has a unique sequence number referred to
as maintenance number.
Itineraries - Flights
For each leg of an itinerary, there may be multiple flights. These can
be scheduled flights or completed flights. Completed flights are kept
for a certain period of time.
A flight always applies to one leg of one itinerary.
Aircraft Models - Legs
Aircraft models are assigned to the legs of an itinerary to define the
kind of aircraft for the flights for the legs. At all times, a leg must have
one, and only one, aircraft model assigned to it. The assignment is
made when the leg is established, but may be changed.
An aircraft model may be assigned to multiple legs. It need not be
assigned to any legs.
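As a rough sketch of how such a rule might later surface in a table design, the following Python/sqlite3 example uses a mandatory (NOT NULL) foreign key. All table names, column names, and model codes are hypothetical.

```python
import sqlite3

def try_sql(con, sql):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        con.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

def leg_model_demo():
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
    con.executescript("""
        CREATE TABLE AIRCRAFT_MODEL (Model_Code TEXT PRIMARY KEY);
        CREATE TABLE LEG (
            Flight_Number TEXT NOT NULL,
            Airport_From  TEXT NOT NULL,
            Airport_To    TEXT NOT NULL,
            -- NOT NULL plus the foreign key: exactly one existing model per leg.
            Model_Code    TEXT NOT NULL REFERENCES AIRCRAFT_MODEL (Model_Code),
            PRIMARY KEY (Flight_Number, Airport_From, Airport_To)
        );
        -- M2 is not assigned to any leg, which is allowed.
        INSERT INTO AIRCRAFT_MODEL VALUES ('M1'), ('M2');
        INSERT INTO LEG VALUES ('YY1842', 'FRA', 'JFK', 'M1');
    """)
    return {
        "leg without model": try_sql(
            con, "INSERT INTO LEG VALUES ('YY1842', 'JFK', 'ORD', NULL)"),
        "leg with unknown model": try_sql(
            con, "INSERT INTO LEG VALUES ('YY1842', 'JFK', 'ORD', 'M9')"),
        "assignment changed": try_sql(
            con, "UPDATE LEG SET Model_Code = 'M2' WHERE Airport_From = 'FRA'"),
    }

print(leg_model_demo())
```

The first two attempts are rejected, while changing the assignment of an existing leg to another known model is accepted.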
Aircraft - Flights
Aircraft are assigned to flights. Flights represent nonstop connections.
Therefore, only one aircraft is assigned to a flight. An aircraft can be
assigned to multiple flights. The aircraft assignment is not necessarily
made at the point in time when the flight is scheduled.
It is possible that, at a given point in time, an aircraft has not been
assigned to any flight.
Pilots - Flights
To each flight, one pilot is assigned as (flight) captain and another pilot
as copilot. This assignment is not necessarily made at the point in time
when the flight is scheduled, but at least three weeks before the flight
is performed.
A pilot can function as captain or copilot for multiple flights. It is
possible that, at a given point in time, a pilot does not have any flight
assignments.
Mechanics - Aircraft Models
Mechanics are trained to repair the aircraft of a specific aircraft model.
A mechanic can be trained for multiple aircraft models. For an aircraft
model, multiple mechanics may have the required training.
It is possible that, temporarily, a mechanic does not have the training
for any of the aircraft models. Conversely, it is possible that, for an
aircraft model, CAB does not have a trained mechanic.
Mechanics - Aircraft
CAB wants to record which mechanics are scheduled for the next
maintenance service of an aircraft. A mechanic may perform the
Business Constraints
The following constraints exist for the business object types and
business relationship types that CAB wants to maintain in its
database:
1.
The relational data model describes the conceptual representation
of the data objects of relational databases and gives guidelines for
their implementation.
2.
c
3.
False
4.
Fields are the columns for a particular row of a table. They are the
actual receptacles for the data stored into a table.
5.
True
6.
True
7.
a, b, d
8.
The main reasons are:
- Identical rows cannot be modified or deleted individually.
9.
False
10.
True
11.
c
1.
The problem statement for an application domain is a document
describing the types of business objects for the application domain,
the relationships between them, and the business constraints for
both of them.
2.
c
3.
c, a, b
4.
False
5.
True
6.
b, a, a, c, b, b, c, b
7.
An entity-relationship model visualizes the business object types of
the application domain, the relationships between them, and the
business constraints for both of them.
8.
The data inventory is a description of the data elements, i.e., the
elementary data, of the application domain.
9.
a, b
10.
Logical data structures apply to processes or parts of them. They
describe:
- The subset of the tables (of the database for the application
domain) used by the process or the pertinent part of the
process.
- How the process or the part of the process must logically
navigate through the tables in order to accomplish its function.
11.
False
1.
a, b, c, e, g
2.
The main sections of a problem statement are:
- An overview of the application domain.
- A description of the business object types.
- A description of the business relationship types.
- A description of the business constraints.
3.
The overview section should:
- Describe what the application domain does.
- Identify the areas of the application domain to be implemented
in the target database.
4.
b, c
5.
True
6.
A business relationship type represents a category of business
relationships, with the same meaning and characteristics, between
the objects of one or more business object types.
7.
For each business relationship type, the problem statement should:
- Contain a textual description of the business relationship type.
- Identify the business object types linked by the business
relationship type.
8.
True
9.
Cascading business relationship type.
10.
A business constraint represents a restriction for the objects of
business object types, for the relationships of business relationship
types, or for a mixture thereof.
11.
For each business constraint, the problem statement should:
- Contain a textual description of the restriction that must be
adhered to.
- Identify the business object types or business relationship types
to which the restriction applies.
- Specify when the constraint is to be applied.
- Describe the action to be performed if the constraint is violated.
12.
True
1.
The three major components of entity-relationship models are:
- Entity types
- Relationship types
- Constraints
2.
False
3.
An entity type is a conceptual unit representing a class of objects
with the same meaning and characteristics about which information
is to be stored and maintained.
An entity instance is an actual object belonging to an entity type.
4.
The entity key allows you to uniquely identify the instances belonging to
an entity type.
The minimum principle requires that all attributes of the entity key
are necessary for the unique identification of the instances of the
entity type. If an attribute is omitted, the remaining attributes no
longer uniquely identify the instances of the entity type.
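The minimum principle can be checked mechanically against sample instances, as the following Python sketch suggests. It is only an illustration: the FLIGHTS sample data and attribute names are hypothetical, and a check on sample data can refute a candidate key but cannot prove it for all possible instances.

```python
from itertools import combinations

def is_unique(rows, attrs):
    """True if no two rows agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def satisfies_minimum_principle(rows, key):
    """True if key is unique on rows and loses uniqueness when any attribute is dropped."""
    if not is_unique(rows, key):
        return False
    # Dropping one attribute at a time is enough: if a smaller subset were
    # unique, every superset of it (including one of these) would be unique too.
    return not any(is_unique(rows, subset)
                   for subset in combinations(key, len(key) - 1))

# Hypothetical sample instances of a FLIGHT entity type.
FLIGHTS = [
    {"Flight_Number": "YY1842", "Leg": "FRA-JFK", "Flight_Locator": 453, "Status": "completed"},
    {"Flight_Number": "YY1842", "Leg": "FRA-JFK", "Flight_Locator": 454, "Status": "scheduled"},
    {"Flight_Number": "YY1842", "Leg": "JFK-ORD", "Flight_Locator": 453, "Status": "completed"},
    {"Flight_Number": "YY1900", "Leg": "FRA-JFK", "Flight_Locator": 453, "Status": "scheduled"},
]

print(satisfies_minimum_principle(FLIGHTS, ("Flight_Number", "Leg", "Flight_Locator")))
print(satisfies_minimum_principle(FLIGHTS, ("Flight_Number", "Leg", "Flight_Locator", "Status")))
```

The second call fails the test because Status is superfluous: the remaining three attributes already identify each instance.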
5.
True
6.
A relationship type is a conceptual association between:
- The entity instances, one each, of two not necessarily different
entity types.
- The relationship instances, one each, of two not necessarily
different relationship types.
7.
True
8.
True
9.
False
10.
b, a, a, b, d, c
11.
The cardinalities for relationship type PASSENGER_has_SEAT
are the following:
Cardinality for source: 0..1 or 1
Cardinality for target: 0..m or m
12.
A 1:1 relationship type is a relationship type with cardinalities ..1 at
both ends of the relationship type.
A 1:m relationship type is a relationship type with cardinality ..1 at
one end and cardinality ..m at the other end of the relationship
type.
An m:m relationship type is a relationship type with cardinalities ..m
at both ends of the relationship type.
13.
a. For relationship type r1, any number of instances of entity type
B can be connected to an instance of entity type A.
14.
Since relationship type r1 has a source cardinality of 1, entity
instance B3 cannot be connected to multiple instances of entity
type A.
Since relationship type r2 has a source cardinality of 1..1, entity
instance C1 of entity type C must be connected to one and only
one instance of entity type A.
15.
The defining attributes and the relationship keys for relationship
types r1 and r2 are:
Defining attributes for r1: Key of A and key of B
Relationship key for r1: Key of A (target cardinality of 1)
Defining attributes for r2: Key of r1 and key of C, i.e., key of A and
key of C
Relationship key for r2: Key of r1 and key of C, i.e., key of A and
key of C, since r2 is an m:m relationship type
16.
False
17.
To be a dependent entity type, the entity type must fulfill the
following requirements:
- A part of its key or its entire key must be equal to the key of
another entity type or of a relationship type (referred to as
parent entity type or relationship type, respectively).
- There must exist a relationship type between the parent entity
type or relationship type and the dependent entity type so that:
• Each instance of the dependent entity type is, at all times,
connected to one and only one parent instance.
• The dependent and parent instances interconnected are
those with matching key values: The value of the
appropriate key portion of the dependent entity instance
must be equal to the key value of the parent instance.
18.
Owning relationship type r1 cannot have the instance (A1, A2.B1)
because the value of the appropriate key portion for the entity
instance of B is different from the key value for the instance of A.
19.
By means of dependent entity types.
20.
False
21.
True
22.
Deletion of C2
- Deletion of (C2, A3) for r1
- Deletion of A3 (controlling property)
- Deletion of (A3, B2) for r2
- Deletion of B2 (controlling property)
- Deletion of (C2, D3) for r3
- Deletion of ((A1, B1), (C2, D3)) for r4
Remaining Instances:
Object   Instances
A        A1, A2
B        B1
C        C1, C3
D        D1, D2, D3
r1       (C1, A2)
r2       (A1, B1), (A2, B1)
r3       (C1, D1), (C1, D2)
r4       ((A1, B1), (C1, D2))
23.
True
24.
The components of a class structure are:
Supertype
Subtypes
Is-bundle
25.
The is-bundle is the set of _is_ relationship types connecting the
supertype to its subtypes.
26.
b, c, a, d
27.
The instances of entity types and relationship types can be
restricted by means of constraints.
28.
The three components of constraints are:
The constraining objects
The constrained objects
The rule specifying how the constraining objects restrict the
instances of the constrained objects.
29.
The format of a constraint in the entity-relationship model is:
{ identifier [ : rule ] }
1.
a, b, e, f
2.
A data inventory should contain:
- A description of the abstract data types for the application
domain.
- A description of the data elements and data groups for the
application domain.
3.
From the application-domain perspective, a data element is an
indivisible piece of data.
A data group consists of one or more related data elements and/or
data groups and, thus, generally is not an indivisible piece of data.
4.
Data elements can be associated with standard data types or
abstract data types. Abstract data types are an extension of
standard data types. They can be tailored to the application
domain. They describe the values that the data elements
associated with them can assume and the operations that can be
performed with them.
5.
For an abstract data type, you should provide:
- Its signature, i.e., its name and parameters.
- The values that can be assumed.
- The operations that can be performed.
6.
a, b, d, e
7.
By associating data elements and data groups with the entity types
using them as attributes, you can verify the completeness of the
entity-relationship model for your application domain. If you cannot
find an entity type for a data element or data group not belonging to
a data group, the entity-relationship model is incomplete.
8.
The usual methods for establishing a data inventory are:
- Surveying the departments of expertise.
- Screening existing data and programs.
- Coupling the data and process inventories.
9.
Some of the problems in surveying the departments of expertise
are:
- Communicative problems:
• The application domain expert may not be able to extract
the proper information from the members of the
departments of expertise.
• The members of the departments of expertise may not be
able to communicate their thoughts and ideas.
• Due to workload pressure, the members of the departments
of expertise may be reluctant to talk with the application
domain expert about database related topics.
- In discussions, it is easy to forget something.
- You may obtain data elements and data groups not actually
needed.
- It is a one-time effort. Later changes are not reflected in the
data inventory.
10.
The principle behind coupling the data and process inventories is
the following:
- When a business process is described or updated in the
process inventory, the data elements and data groups it uses
are identified or changed accordingly.
11.
b, d, e
12.
The description of a business process should contain the following
items:
Title
Purpose
Input
Textual description
Formal description
Output
Data read
Data written
Others (such as window formats or listing formats)
13.
Data read for a business process are the data elements or data
groups read internally during the execution of the business
process.
For each data element or data group read, its name in the data
inventory and all purposes it is read for should be described.
14.
For each step of the business process, you determine the entity
types and relationship types of the entity-relationship model
needed to access the data elements and data groups for the step.
The entity types are the receptacles for the appropriate data. The
relationship types are the paths for navigating from a piece of
15.
Process decomposition is an iterative, step-by-step decomposition
of the application domain into groups of functionally related
business processes. Each iteration decomposes the groups for the
previous iteration into functionally related subsets until the groups
cannot be broken down any further. The result is a process tree.
The purpose of process decomposition is to obtain the complete
set of business processes for the application domain.
1.
True
2.
False
3.
The cardinality for an attribute determines how many values the
attribute must assume at least and can assume at most in the
scope it is used.
If the attribute is used as direct component of the tuple type, the
cardinality specifies how many values the attribute must assume at
least and can assume at most for each tuple.
If the attribute is used as component of a composite attribute, the
cardinality specifies how many values the attribute must assume at
least and can assume at most for each value of the composite
attribute.
4.
c, e
5.
False
6.
The tuple type for an entity type is established by compiling the
data elements and data groups of the data inventory associated
with the attributes of the entity type.
7.
Tuple types must not be established for:
- Owning relationship types.
8.
The components of a composite attribute are indented.
9.
In the tuple type documentation, the role of a data element or data
group for an attribute can be identified by means of the AS clause:
name of data element/group AS role name
10.
d
11.
MAINTENANCE RECORD_belongs_to_MAINTENANCE RECORD
Maintenance Number, PK
Maintenance Number AS Owner
12.
a, d, f
13.
The Normal Forms describe states or quality levels for the tuple
types. The higher the Normal Form of a tuple type, the more stable
the tuple type is, the fewer data inconsistencies are possible, and
the less redundant information it contains.
14.
The resulting tuple types no longer contain repeating groups, i.e.,
all attributes can assume at most one value.
15.
The attributes for repeating groups have a maximum cardinality
higher than 1. This includes a maximum cardinality of * meaning
that the appropriate attribute can assume any number of values
within its scope.
16.
Generally, in the entity-relationship model, you need:
- A new dependent entity type.
- A new owning relationship type interconnecting the new
dependent entity type and the entity type/relationship type for
the original tuple type.
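The removal of a repeating group can be sketched in plain Python as follows. The PILOT tuples and the multi-valued Phone_Number attribute are hypothetical; the function merely mirrors the idea that each value of the repeating group moves to a dependent tuple that carries the key of its parent.

```python
def split_repeating_group(tuples, key, group):
    """Move a repeating attribute into dependent tuples keyed by the parent key."""
    parents, dependents = [], []
    for t in tuples:
        # The parent keeps every attribute except the repeating group.
        parents.append({a: v for a, v in t.items() if a != group})
        # Each value becomes one dependent tuple: (parent key, value).
        for value in t[group]:
            dependents.append({key: t[key], group: value})
    return parents, dependents

# Hypothetical tuples violating the First Normal Form (Phone_Number repeats).
PILOTS = [
    {"Employee_Number": 1, "Name": "A. Pilot", "Phone_Number": ["111", "222"]},
    {"Employee_Number": 2, "Name": "B. Pilot", "Phone_Number": ["333"]},
]

parents, dependents = split_repeating_group(PILOTS, "Employee_Number", "Phone_Number")
print(parents)
print(dependents)
```

After the split, every attribute of both tuple types assumes at most one value per tuple.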
17.
False
18.
True
19.
Generally, in the entity-relationship model, you need:
- A new entity type.
- A new relationship type interconnecting the new entity type and
the entity type/relationship type for the original tuple type.
20.
If the data groups on which the attributes of a tuple type are based
have been established properly, they contain all attributes (and
only those) that, during normalization, must be moved together to a
new tuple type.
21.
True
1.
Tuple types are translated into tables as follows:
- Each tuple type becomes a table.
- Each elementary attribute becomes a column.
- Each elementary attribute of the tuple type's primary key
becomes a column of the table's primary key.
2.
Tuple types whose primary key values always correspond can be
merged.
3.
A tuple type whose primary key values always are a subset of the
primary key values of another tuple type can be imbedded in the
other tuple type if the following condition is met: For each
potentially imbedded tuple, at least one of its nonkey attributes has
a value.
4.
True
5.
For T1 through Tn to be a perfect decomposition of T, the following
condition must be satisfied as well:
At all times, each primary key value of T must occur in one and
only one of the tuple types T1 through Tn.
6.
The following are some reasons for not combining tuple types:
- The tuple types have nothing to do with each other.
- The tuple types are only processed together by business
processes that are not performance-critical.
7.
Some typical limitations for relational database management
systems are:
- The rows must fit entirely into a single page of a chosen size.
This limits the row size.
- The maximum number of rows per page is limited.
- The maximum number of columns per table is limited.
- The maximum size of a table is limited.
8.
True
9.
True
10.
False
11.
True
12.
False
13.
f, a, c, b, a, d, e, a, f, c, g, f, b, e
14.
False
15.
False
16.
System default values are system-provided, predefined, default
values for the various data types. They are independent of
columns.
User default values are default values you define for specific
columns. As user default for a column, any value can be chosen
that is compatible with the data type for the column.
17.
You can provide your own default value for a column by specifying
the value in the WITH DEFAULT clause for the column.
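A user default can be tried out with Python's sqlite3 module. Note the assumptions: SQLite, used here only for illustration, spells the clause DEFAULT rather than WITH DEFAULT, and the table layout and the value 'ROUTINE' are hypothetical.

```python
import sqlite3

def default_demo():
    """Insert one row that relies on the user default and one that overrides it."""
    con = sqlite3.connect(":memory:")
    con.execute("""
        CREATE TABLE MAINTENANCE_RECORD (
            Maintenance_Number INTEGER PRIMARY KEY,
            -- User default: applied whenever no value is supplied for the column.
            Type_Maintenance   TEXT NOT NULL DEFAULT 'ROUTINE'
        )""")
    con.execute("INSERT INTO MAINTENANCE_RECORD (Maintenance_Number) VALUES (100)")
    con.execute("INSERT INTO MAINTENANCE_RECORD VALUES (101, 'REPAIR')")
    return con.execute(
        "SELECT * FROM MAINTENANCE_RECORD ORDER BY Maintenance_Number"
    ).fetchall()

print(default_demo())
```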
18.
True
19.
False
20.
External user defined functions are based on programs written by
you. Sourced user defined functions are based on existing built-in
functions or user defined functions.
21.
True
22.
b, c, a
23.
Check constraints allow you to restrict the values of columns
beyond the values permitted by the data types of the columns.
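As an illustration, the following Python/sqlite3 sketch uses a check constraint for the CAB rule that captain and copilot of a flight are different pilots. The simplified FLIGHT table and its column names are assumptions; the data type alone (INTEGER) would accept any pair of employee numbers.

```python
import sqlite3

def check_demo():
    """Return, per attempted insert, whether the check constraint accepted it."""
    con = sqlite3.connect(":memory:")
    con.execute("""
        CREATE TABLE FLIGHT (
            Flight_Locator INTEGER PRIMARY KEY,
            Captain        INTEGER,
            Copilot        INTEGER,
            -- Restricts the values beyond what the data type permits.
            CHECK (Captain <> Copilot)
        )""")
    attempts = [
        (453, 7, 8),        # two different pilots
        (454, 7, 7),        # same pilot in both roles
        (455, None, None),  # pilots not assigned yet
    ]
    results = []
    for row in attempts:
        try:
            con.execute("INSERT INTO FLIGHT VALUES (?, ?, ?)", row)
            results.append(True)
        except sqlite3.IntegrityError:
            results.append(False)
    return results

print(check_demo())
```

A row with unassigned (NULL) pilots passes because the check condition is then unknown rather than false, which fits flights whose crew has not been assigned yet.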
24.
False
25.
A trigger is a set of actions to be performed when a specific event
occurs.
26.
False
27.
True
28.
A trigger can be activated before the changes for the row or SQL
statement are applied or after they have been applied.
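The two activation times can be contrasted in a short Python/sqlite3 sketch. The trigger names, the AUDIT_LOG table, and the rule that a maintenance date is required are assumptions made for this example, and the SQLite trigger syntax shown may differ from that of other database management systems.

```python
import sqlite3

def trigger_demo():
    """Return whether an invalid row was rejected and what the after trigger logged."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE MAINTENANCE_RECORD (
            Maintenance_Number INTEGER PRIMARY KEY,
            Date_Maintenance   TEXT
        );
        CREATE TABLE AUDIT_LOG (Message TEXT);

        -- Before trigger: runs before the row is inserted and can veto it.
        CREATE TRIGGER MR_BEFORE_INSERT BEFORE INSERT ON MAINTENANCE_RECORD
        FOR EACH ROW WHEN NEW.Date_Maintenance IS NULL
        BEGIN
            SELECT RAISE(ABORT, 'maintenance date is required');
        END;

        -- After trigger: runs once the row has been inserted.
        CREATE TRIGGER MR_AFTER_INSERT AFTER INSERT ON MAINTENANCE_RECORD
        FOR EACH ROW
        BEGIN
            INSERT INTO AUDIT_LOG VALUES ('inserted ' || NEW.Maintenance_Number);
        END;
    """)
    con.execute("INSERT INTO MAINTENANCE_RECORD VALUES (100, '2002-01-15')")
    try:
        con.execute("INSERT INTO MAINTENANCE_RECORD VALUES (101, NULL)")
        rejected = False
    except sqlite3.DatabaseError:
        rejected = True
    log = [row[0] for row in con.execute("SELECT Message FROM AUDIT_LOG")]
    return rejected, log

print(trigger_demo())
```

The rejected row never reaches the after trigger, so only the first insert appears in the log.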
29.
a, e, f
30.
a, b, c, d, e
31.
True
32.
True
1.
The four basic types of integrity to be maintained for a database
are:
- Referential integrity
- Domain integrity
- Redundancy integrity
- Constraint integrity
2.
A foreign key is an ordered set of columns whose values are, at all
times, a subset of the values of a parent key of the same or another
table.
3.
True
4.
e, b, d, a, c
5.
NO ACTION checks for orphans after the deletion of the rows of
the parent table and rejects the request if orphans are detected.
RESTRICT checks, before the deletion of the rows of the parent table,
whether any of them are parent rows, i.e., have dependent rows, and
rejects the request if such rows are found.
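Both delete rules can be observed in a small Python/sqlite3 sketch. Table names, column names, and sample rows are hypothetical, and SQLite is used only for illustration. In this simple case the two rules lead to the same outcome; the difference in the point in time at which the check is made does not become visible here.

```python
import sqlite3

def delete_rule_demo():
    """Try to delete three aircraft: two with dependent rows, one without."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
    con.executescript("""
        CREATE TABLE AIRCRAFT (Aircraft_Number INTEGER PRIMARY KEY);
        CREATE TABLE MAINTENANCE_RECORD (
            Maintenance_Number INTEGER PRIMARY KEY,
            Aircraft_Number    INTEGER
                REFERENCES AIRCRAFT (Aircraft_Number) ON DELETE NO ACTION
        );
        CREATE TABLE FLIGHT (
            Flight_Locator  INTEGER PRIMARY KEY,
            Aircraft_Number INTEGER
                REFERENCES AIRCRAFT (Aircraft_Number) ON DELETE RESTRICT
        );
        INSERT INTO AIRCRAFT VALUES (1), (2), (3);
        INSERT INTO MAINTENANCE_RECORD VALUES (100, 1);  -- depends on aircraft 1
        INSERT INTO FLIGHT VALUES (453, 2);              -- depends on aircraft 2
    """)

    def can_delete(aircraft_number):
        try:
            con.execute("DELETE FROM AIRCRAFT WHERE Aircraft_Number = ?",
                        (aircraft_number,))
            return True
        except sqlite3.IntegrityError:
            return False

    return {n: can_delete(n) for n in (1, 2, 3)}

print(delete_rule_demo())
```

Only aircraft 3, which has no dependent rows, can be deleted.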
6.
True
7.
a, b
8.
The deletion of a parent row fails if:
- Another referential constraint with delete rule NO ACTION or
RESTRICT prevents the deletion of the parent row.
- Another referential constraint with delete rule NO ACTION or
RESTRICT for which the dependent table is the parent table
prevents the deletion of a dependent row.
9.
You need an after trigger for the table for the relationship type. The
trigger must be activated for each deletion of a row for the
relationship type and must delete the row for the appropriate
source instance.
10.
Table T is delete-connected to table T1 if the deletion of a row of T1
requires that rows of T are accessed.
11.
False
12.
True
13.
True
14.
For referential cycles, the following restrictions exist:
- For a cycle of two or more tables, at least two delete rules must
be different from CASCADE.
- For a self-referencing constraint, the delete rule must be NO
ACTION or CASCADE.
15.
False
16.
The purpose of a referential structure is to provide an overview of
the referential constraints for the tables of an application domain or
a subset thereof.
17.
True
18.
A double-headed arrow in a referential structure indicates that a
parent key value may occur more than once as foreign key value in
the dependent table.
19.
Domain integrity requires that the values of the columns for the
tables are correct. This means that:
- The values belong to the values supported by the abstract data
types for the data elements for the columns.
- The values adhere to domain restrictions for the data elements
for the columns.
- The values observe length restrictions for the data elements for
the columns.
20.
The three major causes for the redundancy of data are:
- Violations of the Second Normal Form or Third Normal Form
- Multiple copies of columns or tables
- Derivable data
21.
False
22.
You can ensure the correctness of derivable data by:
- Not storing them and deriving them each time they are needed.
- Triggers reevaluating and storing the derivable data each time
data affecting the derivable data are inserted, updated, or
deleted.
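The trigger-based alternative might look like the following Python/sqlite3 sketch. The derivable column Maintenance_Count, the trigger names, and the sample data are assumptions introduced for illustration.

```python
import sqlite3

def derived_count_demo():
    """Compare the stored (derivable) counts with counts derived on demand."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE AIRCRAFT (
            Aircraft_Number   INTEGER PRIMARY KEY,
            Maintenance_Count INTEGER NOT NULL DEFAULT 0   -- derivable data
        );
        CREATE TABLE MAINTENANCE_RECORD (
            Maintenance_Number INTEGER PRIMARY KEY,
            Aircraft_Number    INTEGER
        );
        -- Reevaluate the stored count on every insert, delete, and update.
        CREATE TRIGGER MR_COUNT_INS AFTER INSERT ON MAINTENANCE_RECORD
        BEGIN
            UPDATE AIRCRAFT SET Maintenance_Count = Maintenance_Count + 1
            WHERE Aircraft_Number = NEW.Aircraft_Number;
        END;
        CREATE TRIGGER MR_COUNT_DEL AFTER DELETE ON MAINTENANCE_RECORD
        BEGIN
            UPDATE AIRCRAFT SET Maintenance_Count = Maintenance_Count - 1
            WHERE Aircraft_Number = OLD.Aircraft_Number;
        END;
        CREATE TRIGGER MR_COUNT_UPD AFTER UPDATE OF Aircraft_Number ON MAINTENANCE_RECORD
        BEGIN
            UPDATE AIRCRAFT SET Maintenance_Count = Maintenance_Count - 1
            WHERE Aircraft_Number = OLD.Aircraft_Number;
            UPDATE AIRCRAFT SET Maintenance_Count = Maintenance_Count + 1
            WHERE Aircraft_Number = NEW.Aircraft_Number;
        END;

        INSERT INTO AIRCRAFT (Aircraft_Number) VALUES (1), (2);
        INSERT INTO MAINTENANCE_RECORD VALUES (100, 1), (101, 1), (102, 1);
        DELETE FROM MAINTENANCE_RECORD WHERE Maintenance_Number = 101;
        UPDATE MAINTENANCE_RECORD SET Aircraft_Number = 2 WHERE Maintenance_Number = 102;
    """)
    stored = con.execute(
        "SELECT Aircraft_Number, Maintenance_Count FROM AIRCRAFT "
        "ORDER BY Aircraft_Number").fetchall()
    derived = con.execute(
        "SELECT Aircraft_Number, COUNT(*) FROM MAINTENANCE_RECORD "
        "GROUP BY Aircraft_Number ORDER BY Aircraft_Number").fetchall()
    return stored, derived

print(derived_count_demo())
```

The stored counts agree with the counts derived on demand, which is the correctness property that redundancy integrity asks for.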
23.
For constraint integrity, all business constraints of the application
domain must be observed.
24.
The main ingredients for achieving constraint integrity are triggers
and user defined functions. Sometimes, unique indexes or
referential constraints can be used.
Unit 9 - Indexes
1.
The main purpose of an index is to improve performance when
locating a row would otherwise require scanning the rows of the
table.
2.
True
3.
True
4.
An index is a dense index if each key value has an index entry in
the lowest index level.
5.
False
6.
At most one.
7.
c, a, b
8.
Plain unique indexes can be used for:
- The primary key of a table.
- The foreign key resulting from merging the tuple type for a 1:1
relationship type.
9.
Unique-where-not-NULL indexes can be used for the foreign key
resulting from imbedding the tuple type for a 1:1 relationship type.
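The behavior of such an index can be imitated with Python's sqlite3 module, under the assumption (true for SQLite, which is used here only for illustration) that an ordinary unique index treats NULL values as distinct. The PILOT table and its optional Locker_Number column, standing in for an imbedded 1:1 relationship type, are hypothetical.

```python
import sqlite3

def unique_where_not_null_demo():
    """Allow any number of NULL foreign key values but no duplicate non-NULL value."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE PILOT (
            Employee_Number INTEGER PRIMARY KEY,
            Locker_Number   INTEGER      -- imbedded 1:1 relationship, optional
        );
        -- In SQLite, NULL values never collide in a unique index.
        CREATE UNIQUE INDEX PILOT_LOCKER ON PILOT (Locker_Number);
        INSERT INTO PILOT VALUES (1, NULL), (2, NULL), (3, 17);
    """)
    try:
        con.execute("INSERT INTO PILOT VALUES (4, 17)")  # locker 17 is already taken
        duplicate_rejected = False
    except sqlite3.IntegrityError:
        duplicate_rejected = True
    count = con.execute("SELECT COUNT(*) FROM PILOT").fetchone()[0]
    return duplicate_rejected, count

print(unique_where_not_null_demo())
```

Two pilots without a locker coexist, while a second pilot for locker 17 is rejected, which is what keeps the imbedded relationship 1:1.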
10.
If you have a clustering index for a table, the database
management system attempts to store the rows of the table in such
a way that the physical sequence of the data pages agrees with the
logical order implied by the index.
11.
c
12.
From a database design perspective, you should establish an
index for:
- Each primary key.
- Each foreign key.
Unit 10 - Logical Data Structures
1.
The two major inputs for the development of the logical data
structures are:
- The tables for the application domain.
- The referential structure for the application domain.
2.
The main purposes of logical data structures are to identify:
- The columns (and the tables containing the columns)
corresponding to the data elements used by the business
processes.
- How the business processes can navigate, with the data found,
from one table to the next.
3.
True
4.
False
5.
b
6.
The components of a logical data structure are:
The structure diagram.
The path summary.
The table summary.
7.
The structure diagram for a logical data structure illustrates the
paths interconnecting the tables of the logical data structure.
8.
For each path of the structure diagram, the path summary specifies
the source table, the target table, and the interconnected columns.
9.
For each use of a table of the logical data structure, the table
summary specifies the columns needed.
10.
False
11.
True
12.
Views provide data security, ease of use, resilience against
database changes, and freedom to change the table definitions.