
EECS 317 Data Management and

Information Processing
Fall 2013
Goce Trajcevski
Dept. of EECS
Northwestern University
Relational Data Model
E-R to Relational Schemas

Quality of DB design

Relational Data Model Basics
Relational Data Model describes the logical schema of a Database that is
implemented on top of a Relational Database Management System ((R)-DBMS)


More Formally:

Relational Schema = a (logical) description of the
Name of that schema +
The names of the attributes of a given relation, along with their types,
e.g.,
Student(stID:integer, Name: String, Major: String, GPA:real, Email:string).

Note that this is very similar to the Entity Class and its attributes description.
Often (when there is no ambiguity and/or it is clear from the context) we will omit the types of
the attributes from the description.


Relational Data Model Basics
(Relational) Database Schema = a collection of individual Relation
Schemas

NOTE: part of the database schema description also involves some implicit
issues:
Keys specification (recall the discussion of superkey; candidate key;
primary key). Once a given (set of) attribute(s) is declared as a key, it
implicitly means that its value is unique for individual instances
Integrity Constraints (will address more formally later on)

The design of the schema is seemingly the same as the ER-diagram design

However, in general, there will be more relations (than entity sets)

Plus (we'll see in a few lectures) the schema design also considers issues like
redundancy (minimize replication), efficiency (don't spend too much time
processing queries), and correctness (the answer indeed corresponds to the snapshot
of the world recorded in the database)

Relational Data Model Basics
Fundamental concept = Relation, which is essentially a table that has a distinct
name, and:
Columns: represent the individual attributes.
Rows: correspond to individual objects/instances for which each column represents
the particular value of the given attribute, for that particular object.
Also, commonly known as records or tuples.

Relation Name: Student

stID   Name         Major         GPA   email
13579  John Smith   Liberal Arts  3.47  js@las.uni1.edu
24687  Peter Jones  EECS          3.49  pj@eecs.uni.edu
12456  Anna Berg    Journalism    3.61  ab@jour.uni1.edu

Columns = Attributes; Rows = Records/Tuples; stID is the key
Relational Database =
a collection of such tables

The relational schema is:
Student(stID, Name, Major, GPA,
email)

#-of-tuples = cardinality;
#-of-attributes = arity
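
As a rough illustration (not part of the original slides), the Student relation can be pictured in Python as a schema tuple plus a set of row tuples. The helper names `arity`, `cardinality` and `is_key` are made up for this sketch; `is_key` only checks uniqueness on this one instance, which is weaker than declaring a key on the schema.

```python
# Hypothetical in-memory picture of the Student relation from the slide.
student_schema = ("stID", "Name", "Major", "GPA", "email")
student_rows = {
    (13579, "John Smith", "Liberal Arts", 3.47, "js@las.uni1.edu"),
    (24687, "Peter Jones", "EECS", 3.49, "pj@eecs.uni.edu"),
    (12456, "Anna Berg", "Journalism", 3.61, "ab@jour.uni1.edu"),
}

def arity(schema):
    """#-of-attributes."""
    return len(schema)

def cardinality(rows):
    """#-of-tuples."""
    return len(rows)

def is_key(schema, rows, key_attrs):
    """True if no two rows of this instance agree on all of key_attrs."""
    idx = [schema.index(a) for a in key_attrs]
    projected = [tuple(row[i] for i in idx) for row in rows]
    return len(projected) == len(set(projected))

print(arity(student_schema), cardinality(student_rows))
print(is_key(student_schema, student_rows, ["stID"]))
```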

OK, a little recollectional
discussion++
Relational Data Model
Sequence:
Basic Definitions
ER-to-Relational Model (rules)
EER-to-Relational Model (rules)
Goal:
Familiarize with the model that can be (almost) directly translated into an
actual Relational Database
History-recap:
E. F. (Ted) Codd, 1970s
Simple, elegant and highly-expressive model
Strong Mathematical Foundation (recall, relations are subsets of Cartesian
Products)
Turing Award
Dominant industry model
Even for many scientific and streaming-data applications
Now also coupled with the Object-Oriented Modeling (Object-Relational
DBMS)
NOTE: relation ≠ relationship (from the ER context)



Relational Data Model Settings
The focus of this part of the course will be:
Given:
[EER diagram: Entity1 (attribute11, attribute12) with a "d" specialization into
Entity11 (attribute111, attribute112) and Entity12 (attribute121, attribute122)]
Generate the collection:
Rel_1(A11, A12, A13, ...)
Rel_2(A21, A22, A23, ...)
Rel_3(A31, A32, A33, ...)
...
Rel_k(Ak1, Ak2, Ak3, ...)

Faithfully translating the
given ER-diagram (as input)
which, in turn, represents the
model of the problem-domain
at hand
ER-to-Relational
Basic Rule
An entity class typically becomes a relation, where its
name becomes the name of the schema and its attributes
are the ones that constitute the attributes of the schema.
[ER diagram: Entity_E1 with attributes attribute1, attribute2, attribute3]

Becomes:

Entity_E1(attribute1, attribute2, attribute3)
e.g.,
[ER diagram: loan with attributes loan_ID, amount]
Becomes:
loan(loan_ID, amount)
ER-to-Relational

In case the Entity Class has a composite attribute:
flatten it out, i.e., incorporate the primitive/atomic
attributes as first-class citizens of the schema (+
ignore the original composite attribute)
[ER diagram: Entity_E1 with attributes attribute1, attribute2, and composite
attribute3 made of attribute31, attribute32]
Becomes:

Entity_E1(attribute1, attribute2, attribute31, attribute32)
ER-to-Relational
Example:
[ER diagram: Customer with attributes name, cust_ID#, and composite
attribute address made of street, city, state, zip]

Becomes:

Customer(cust_ID#, name, street, city, state, zip)
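
The flattening rule can be sketched in a few lines of Python. This is only an illustration: the dictionary encoding of an entity (atomic attribute mapped to `None`, composite attribute mapped to the list of its components) and the function name `flatten` are assumptions made for this example, not notation from the course.

```python
def flatten(entity):
    """Return the flat attribute list of the relational schema for an entity.

    entity: dict mapping attribute name -> None (atomic attribute)
            or -> list of component names (composite attribute).
    """
    flat = []
    for name, components in entity.items():
        if components is None:
            flat.append(name)          # atomic attribute is kept as-is
        else:
            flat.extend(components)    # composite name itself is ignored
    return flat

# The Customer entity: address is composite.
customer = {
    "cust_ID#": None,
    "name": None,
    "address": ["street", "city", "state", "zip"],
}
print("Customer(" + ", ".join(flatten(customer)) + ")")
```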
ER-to-Relational
In case the Entity class has a multivalued attribute
Possible solution 1:
Enter each value of the multivalued attribute in consecutive tuples
for each instance of the particular object
- BAD: replicate the same data (e.g., name, ssn, address) for
different values of only one attribute
Possible solution 2:
Fix the number of allowed values (cardinality) for the multivalued
attribute
Does not fully alleviate the replication problem;
BAD++:
Loss of completeness (e.g., if Jones has 5 phone numbers,
but we limit the representation to 3 -> we cannot represent
the entire information needed)
NULL values: If Smith has only one phone number, and we
insist on having three per object-instance, we need to fill
the other two with NULL.


compilation
ER-to-Relational
Multivalued representation:
Create an extra relation (i.e., relational schema)

[ER diagram: Entity_1 with attributes attribute1, attribute2, attribute3, ...,
attribute(N-1), and multivalued attribute attributeN]
Becomes:
Entity_1(attribute1, attribute2, attribute3, ..., attribute(N-1))
Entity_1Mult(attribute1, attributeN)
ER-to-Relational
Example:
[ER diagram: Customer with attributes name, cust_ID#, composite attribute
address made of street, city, state, zip, and multivalued attribute cust_Phone#]
Becomes:

Customer(cust_ID#, name, street, city, state, zip)
CustomerMV(cust_ID#, cust_Phone#)
NOTES:
1. We'll see weak-entities
2. Names MUST be unique,
hence, CustomerMV (could have
been CustomerPhones)
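
A small sketch of the "extra relation" rule, under the assumption that the raw data arrives as records whose multivalued attribute is a Python list. The sample phone numbers and the helper name `split_multivalued` are invented for the illustration.

```python
# Hypothetical input: each record carries ALL phone numbers of the customer.
customers = [
    {"cust_ID#": 1, "name": "Jones", "phones": ["555-0101", "555-0102", "555-0103"]},
    {"cust_ID#": 2, "name": "Smith", "phones": ["555-0199"]},
]

def split_multivalued(records, key, mv_attr):
    """Split records into a base relation and a (key, value) relation."""
    base, extra = [], []
    for rec in records:
        # Base relation: everything except the multivalued attribute.
        base.append({k: v for k, v in rec.items() if k != mv_attr})
        # Extra relation: one (key, single value) tuple per value.
        for value in rec[mv_attr]:
            extra.append((rec[key], value))
    return base, extra

customer, customer_mv = split_multivalued(customers, "cust_ID#", "phones")
print(customer)      # one row per customer, no replication, no NULLs
print(customer_mv)   # one row per (customer, phone) pair
```

Neither of the two "BAD" options is needed here: Jones keeps all three numbers and Smith needs no NULL padding.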
ER-to-Relational
What about relationships?
They still need to become some type of a table
Basic temptation:
Get the attributes from each participating entity set, plus the attributes of the
relationship itself, and form a separate table with them
Ex:
Create the schema:
depositor(customer_id, customer_name, customer_street, customer_city, access_date, account_number, balance)
ER-to-Relational
Why is the schema:
depositor(customer_id, customer_name, customer_street, customer_city, access_date, account_number, balance)
NOT GOOD???
Think in terms of the tables with the actual tuples (AND recall that each entity class represents some
significant player from the problem-domain, ergo, ...)
[ER diagram: Customer and Loan connected by the Depositor relationship]
Way too much replication
ER-to-Relational
Recall:
stID Name Major GPA email

Attributes
-Detailed discussion about individual entities
(plus, composite and multivalued attributes (Q = ??))

- Started with converting relationships to Relational model

ER-to-Relational
So:
Don't want replication!!!
Just *WHAT* are the semantical properties of the relationships that can
hint how to represent the true nature of the problem domain:
Without redundancy;
With preserving the information that is inherent to that domain;
The key aspects of translating a relationship from the ER diagram into a
relational schema are:
Cardinality Constraints;
Participation Constraints;
While preserving the keys
ER-to-Relational
The impact of the cardinality type/constraint of the relationship:
1-to-M
[ER diagram: Entity_E1 (attribute11, attribute12) on the 1 side and
Entity_E2 (attribute21, attribute22) on the M side, connected by
relationship R12 with attribute attributeR]
RULE:
- Add the key of the 1-entity, as a foreign key,
to the table of the M-entity
AND
- Add the attributes of the relationship type
Entity_1(attribute11, attribute12)

Entity_2(attribute21, attribute22, attribute11, attributeR)
Q: why not the other way around???
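
One way to try the 1-to-M rule out is with Python's built-in `sqlite3` module. The sketch below is an illustration only: the column types and sample values are assumptions, and SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite

# The "1" side keeps its own attributes only.
conn.execute(
    "CREATE TABLE Entity_1 (attribute11 INTEGER PRIMARY KEY, attribute12 TEXT)"
)
# The "M" side receives the key of the "1" side as a foreign key,
# plus the attribute of the relationship type.
conn.execute(
    """CREATE TABLE Entity_2 (
           attribute21 INTEGER PRIMARY KEY,
           attribute22 TEXT,
           attribute11 INTEGER REFERENCES Entity_1(attribute11),
           attributeR  TEXT)"""
)

conn.execute("INSERT INTO Entity_1 VALUES (1, 'one-side row')")
conn.executemany(
    "INSERT INTO Entity_2 VALUES (?, ?, ?, ?)",
    [(10, "m-side a", 1, "r1"), (11, "m-side b", 1, "r2")],
)

# A row on the M side that points to a non-existent "1" row is rejected.
try:
    conn.execute("INSERT INTO Entity_2 VALUES (12, 'orphan', 99, 'r3')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

count = conn.execute(
    "SELECT COUNT(*) FROM Entity_2 WHERE attribute11 = 1"
).fetchone()[0]
print(orphan_rejected, count)
```

Many Entity_2 rows can share the same attribute11 value, while each Entity_2 row has room for only one; that asymmetry is a hint for the "other way around" question.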
ER-to-Relational
Dealing with M-to-1 relationships:
mirror image of 1-to-M

Dealing with 1-to-1 relationships:
similar in spirit to 1-to-M: just pick one of the participating entities to
act as if it's the M side and add the foreign key (+ the attributes of
the relationship type)
NOTE: it may be tempting to create a single table with all of the
Entity_1, Entity_2 and R12 attributes, because, if we know it's 1-to-1,
then there will be no redundancy, in the sense of the
previously-discussed example
HOWEVER, this is a poor design because it blurs the actual
semantics of the problem domain (in reality, Entity_1 and
Entity_2 are separate classes!).


Q: how about if a given entity participates in > 1 relationships?
A: handle each relationship one-at-a-time

ER-to-Relational

M-to-M:
Observe that 1-to-M was handled by squeezing the relationship type in as an
attribute, plus taking care of the keys (foreign) for the purpose of proper
maintenance of the data-relationships
Now we must create a separate table for the relationship type, with added
foreign keys = the keys of the participating entity classes (plus the own
attributes of the relationship)
[ER diagram: Entity_E1 (attribute11, attribute12) and Entity_E2 (attribute21,
attribute22), connected by M-to-M relationship R12 with attribute attributeR]
Becomes:
Entity_1(attribute11, attribute12)

Entity_2(attribute21, attribute22)

R12(attribute11, attribute21, attributeR)
Q: what is the problem if we try to mimic 1-to-M?
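
The M-to-M rule can be tried out the same way with Python's built-in `sqlite3` module. Again this is only a sketch: the column types, the sample rows, and the choice of the pair of foreign keys as the primary key of R12 are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Entity_1 (attribute11 INTEGER PRIMARY KEY, attribute12 TEXT);
    CREATE TABLE Entity_2 (attribute21 INTEGER PRIMARY KEY, attribute22 TEXT);
    -- Separate table for the relationship type: both keys + its own attribute.
    CREATE TABLE R12 (
        attribute11 INTEGER REFERENCES Entity_1(attribute11),
        attribute21 INTEGER REFERENCES Entity_2(attribute21),
        attributeR  TEXT,
        PRIMARY KEY (attribute11, attribute21)
    );
    INSERT INTO Entity_1 VALUES (1, 'x'), (2, 'y');
    INSERT INTO Entity_2 VALUES (10, 'p'), (20, 'q');
    INSERT INTO R12 VALUES (1, 10, 'r'), (1, 20, 's'), (2, 10, 't');
    """
)

# One Entity_1 row is related to many Entity_2 rows ...
partners_of_1 = [row[0] for row in conn.execute(
    "SELECT attribute21 FROM R12 WHERE attribute11 = 1 ORDER BY attribute21")]
# ... and one Entity_2 row is related to many Entity_1 rows.
partners_of_10 = [row[0] for row in conn.execute(
    "SELECT attribute11 FROM R12 WHERE attribute21 = 10 ORDER BY attribute11")]
print(partners_of_1, partners_of_10)
```

A single foreign-key column inside Entity_1 or Entity_2 could hold only one partner per row, which is why the 1-to-M trick does not carry over.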
ER-to-Relational (Weak Entities)
Recall that a weak entity class needs to have
Identifying Relationship;
Owner Entity Class;
Partial Key (discriminator)

Rule: create a table for the weak entity in which the key is composed of foreign
key (from the owner entity) + the discriminator;
AND add the rest of the attributes

[ER diagram: owner Entity_ES (attributeS1, attributeS2) and weak Entity_EW
(attributeW1, attributeW2), connected by identifying relationship RSW]
Entity_EW(attributeS1, attributeW1, attributeW2)
NOTE: the rest of the model is translated into a schema based on the
respective rules (i.e., the schemas for Entity_ES; RSW)
ER-to-Relational: Specialization/Generalization


There are three basic strategies for converting this type of EER
model into a Relational model
Recall:
[EER diagram: Entity1 (attribute11, attribute12) with a "d" specialization into
Entity11 (attribute111, attribute112) and Entity12 (attribute121, attribute122)]
ER-to-Relational: Specialization/Generalization
I: create a table for each entity class, plus add the key of the superclass as a
foreign key to each subclass (specialization)
Entity1(attribute11, attribute12)
Entity11(attribute11, attribute111, attribute112)
Entity12(attribute11, attribute121, attribute122)

II: create a single relation for the entire hierarchy and dump everything in
there
Entity1(attribute11, attribute12, attribute13, attribute111, attribute112, attribute121, attribute122)
NOTE: may create too many NULL values in the actual table

III: eliminate the superclass and put its attributes in each of the subclasses
(similar to the memory-image of the objects in an OO programming language)
Entity11(attribute11, attribute12, attribute111, attribute112)
Entity12(attribute11, attribute12, attribute121, attribute122)
ER-to-Relational: Specialization/Generalization
Example
Consider:
[EER diagram: Person specialized into Employee and Customer]
(assume the usual attributes)
Method 1:
Form a schema for the higher-level entity
Form a schema for each lower-level entity set, include primary key of
higher-level entity set and local attributes

schema     attributes
person     name, street, city
customer   name, credit_rating
employee   name, salary
Drawback: getting information about an employee requires accessing two
relations, the one corresponding to the low-level schema and the one
corresponding to the high-level schema
ER-to-Relational: Specialization/Generalization
Example
Method 2:
Form a schema for each entity set with all local and inherited attributes

schema attributes
person name, street, city
customer name, street, city, credit_rating
employee name, street, city, salary

If specialization is total, the schema for the generalized entity set (person)
is not required to store information
Can be defined as a "view" relation containing union of specialization
relations
But explicit schema may still be needed for foreign key constraints
Drawback: street and city may be stored redundantly for people who are
both customers and employees

ER-to-Relational: Complex
Specializations/Generalizations
Multiple Inheritance:
I-strategy is applicable;
II- strategy is NOT applicable (why?)
III-strategy is applicable;
Overlap:
I-strategy is applicable;
II-strategy is applicable;
III-strategy is bad/NA; (why?)
Multiple Specialization:
I and II strategy are applicable;
III-strategy is bad (why?)
Union:
Yet another testimony that the I-strategy is the most-often applicable
one
Reading Assignment: the EER diagram with TransactionItem (Ch. 4).
ER-to-Relational: Aggregation
RECALL:
One particular issue that we discussed in class, but is NOT presented formally in the
textbook is the AGGREGATION
It occurs when there is a need to form an association (relationship type) between
a particular entity class AND another relationship.
Key idea is to group all the entities around the relationship into a single meta-class
which aggregates their information together
Possible Idea:
1. Construct a "fake" schema:
E-J-B-W(A1, A2, A3, ..., Ak)
2. Consider the nature of the Aggregate-Relationship
(e.g., is it 1-to-1, 1-to-M, ...)
3. Handle the Manages relationship accordingly
4. NOTE: we will see the details later, when we
introduce the Views in SQL
Another possibility:
manages (employee_id, branch_name, title,
manager_name)
Schema works_on is redundant provided we are
willing to store null values for attribute
manager_name in relation on schema manages
ER-to-Relational: other issues(?)
Although we mentioned both cardinality AND participation constraints, so far we
explored only the translation of cardinality ones (e.g., 1-to-M, M-to-M)
Q: how is the "Must" vs. "May" (i.e., Total vs. Partial) handled in the
Relational Model???


A: Integrity Constraints
IC = set of formulae that specify WHAT ARE THE CORRECT STATES that an
instance of the database can be in (i.e., what are the properties that any correct
state must exhibit)
Specification;
Verification;
Enforcement
Topics addressed in the near future
DB-design: Overview/Positioning
Motivation
Redundancy
(Possible) Anomalies
Functional Dependencies
Definition
Closure
Armstrong Axioms
Keys and FDs
Normalization
1NF
2NF
3NF
BCNF
DB-design: Motivation
We know (E)ER, as well as (E)ER->to->Relational model, e.g.,
[EER diagram: Entity1 (attribute11, attribute12) with a "d" specialization into
Entity11 (attribute111, attribute112) and Entity12 (attribute121, attribute122)]
Rel_1(A11, A12, A13, ...)
Rel_2(A21, A22, A23, ...)
Rel_3(A31, A32, A33, ...)
...
Rel_k(Ak1, Ak2, Ak3, ...)

Faithfully representing the given
ER-diagram from the input
A schema which is correct, in terms of translation from (E)ER, may yield a
database that is ...
DB-design: Motivation
Consider the schema:
HourlyEmployee(e_ID, eName, eDept, eCategoryRate, eHourlyWage, eTotalHours)
Observe the following actual-instance:
#  eID   eName    eDept  eCategory  eHourlyRt  eTotalHrs
1  1234  Atishoo  D1     8          15         40
2  1359  Smiley   D1     5          12         32
3  2234  Marx     D3     10         20         5
4  2468  Sundan   D1     5          12         40
5  3578  Jones    D2     8          15         36
6  5507  Engels   D4     10         20         8
7  5677  Smith    D2     8          15         36
Q: Anything possibly wrong???
A: A LOT of REDUNDANCY stored:
(observe that there's an association)

8 -> 15
10 -> 20
5 -> 12

in terms of the values between
eCategory and eHourlyRate
DB-design: Motivation
#  eID   eName    eDept  eCategory  eHourlyRt  eTotalHrs
1  1234  Atishoo  D1     8          15         40
2  1359  Smiley   D1     5          12         32
3  2234  Marx     D3     10         20         5
4  2468  Sundan   D1     5          12         40
5  3578  Jones    D2     8          15         36
6  5507  Engels   D4     10         20         8
7  5677  Smith    D2     8          15         36
What are the drawbacks of having
such associations go undetected???
(possible) ANOMALIES =
canNOT AUTOMATICALLY ENFORCE
some natural consistency requirements
Ex: Update Anomaly

Assume a decision is made to raise the HourlyRate of every hourly employee of category 8 from
15 to 16.50 => this cannot be done in only one row (e.g., #5)
without also
updating rows #1 and #7!!!
MUST manually enforce the consistency

DB-design: Motivation
Using the same settings:
Insertion anomaly: adding a new tuple may add an inconsistency
E.g., (3456, Tompson, D4, 8, 17, 33) will lead to an inconsistency in terms of the
8 -> 15 dependency/association.
Deletion anomaly: removing a (set of) tuple(s) may also remove an existing
dependency
E.g., if both of the tuples with Marx and Engels are removed, we will lose the
dependency/association 10 -> 20.
Ideally, would like NO Redundancy, whatsoever, but sometimes it may be
needed/allowed (for the purpose of efficient retrieval; mostly the DB-programmer's
call)
Hence, we define the following Goal:
Whenever there's some association among values of given attributes, extract it
as a separate table
HourlyEmployee(e_ID, eName, eDept, eCategoryRate, eTotalHours)

RankWage(eCategoryRate, eHourlyWage)
eCategoryRate  eHourlyWage
8              15
5              12
10             20
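
To see why extracting the association helps, here is a short Python sketch using the rows of the HourlyEmployee instance. The variable names (`employee`, `rank_wage`, `rejoined`) are invented for the example, and plain Python lists and dicts stand in for actual tables.

```python
# Rows: (eID, eName, eDept, eCategory, eHourlyRt, eTotalHrs).
hourly_employee = [
    (1234, "Atishoo", "D1", 8, 15, 40),
    (1359, "Smiley", "D1", 5, 12, 32),
    (2234, "Marx", "D3", 10, 20, 5),
    (2468, "Sundan", "D1", 5, 12, 40),
    (3578, "Jones", "D2", 8, 15, 36),
    (5507, "Engels", "D4", 10, 20, 8),
    (5677, "Smith", "D2", 8, 15, 36),
]

# HourlyEmployee without the hourly rate ...
employee = [(i, n, d, c, h) for (i, n, d, c, r, h) in hourly_employee]
# ... and RankWage: the association category -> rate, stored once per category.
rank_wage = {c: r for (_, _, _, c, r, _) in hourly_employee}

# The raise for category 8 is now ONE change instead of three row updates.
rank_wage[8] = 16.50

# Joining the two tables on the category gives the full rows back.
rejoined = [(i, n, d, c, rank_wage[c], h) for (i, n, d, c, h) in employee]
for row in rejoined:
    print(row)
```

The deletion anomaly also disappears in this sketch: removing Marx and Engels from `employee` leaves the entry for category 10 in `rank_wage` untouched.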
DB-design: desiderata
Decide whether a particular relation R is in "good" form.
In the case that a relation R is not in "good" form, decompose it into a
set of relations {R1, R2, ..., Rn} such that
each relation Ri is in good form
the decomposition is a lossless-join decomposition
(defined shortly)
Theoretical framework founded upon:
functional dependencies
multivalued dependencies

DB-design: desiderata
Formally: A decomposition of R into R1 and R2 is lossless join if and
only if at least one of the following dependencies is in F+:
R1 ∩ R2 -> R1
R1 ∩ R2 -> R2

Informally/Intuitively:
If a given relation is split into two, then there should be no loss of
information, in the sense that joining the two sub-relations
should be capable of answering the queries that were answerable prior
to the split-up


ASIDE: it's OK not to quite understand it at the time being; stick
with the intuitive guidance
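
The intuitive reading can be tested on a concrete instance: project, join back, and compare. The sketch below is only an instance-level check (the formal criterion talks about F+ and therefore about all legal instances); the helper names and the small samples, taken from the HourlyEmployee rows, are choices made for this example.

```python
def project(rows, schema, attrs):
    """Projection of a set of tuples onto attrs (duplicates vanish)."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(r[i] for i in idx) for r in rows}

def natural_join(rows1, schema1, rows2, schema2):
    """Natural join of two relations given as (set of tuples, attribute list)."""
    common = [a for a in schema1 if a in schema2]
    extra = [a for a in schema2 if a not in schema1]
    out = set()
    for r1 in rows1:
        for r2 in rows2:
            if all(r1[schema1.index(a)] == r2[schema2.index(a)] for a in common):
                out.add(r1 + tuple(r2[schema2.index(a)] for a in extra))
    return out, schema1 + extra

def is_lossless_on_instance(rows, schema, attrs1, attrs2):
    """True if joining the two projections gives back exactly the original rows."""
    joined, joined_schema = natural_join(
        project(rows, schema, attrs1), attrs1,
        project(rows, schema, attrs2), attrs2,
    )
    order = [joined_schema.index(a) for a in schema]
    return {tuple(t[i] for i in order) for t in joined} == set(rows)

# Split on eCategory, which determines eHourlyRt: nothing is lost.
rates = {(1234, 8, 15), (1359, 5, 12), (5677, 8, 15)}
good = is_lossless_on_instance(
    rates, ["eID", "eCategory", "eHourlyRt"],
    ["eID", "eCategory"], ["eCategory", "eHourlyRt"])

# Split on eDept, which determines neither side: spurious tuples appear.
depts = {(1234, "D1", 8), (1359, "D1", 5), (3578, "D2", 8)}
bad = is_lossless_on_instance(
    depts, ["eID", "eDept", "eCategory"],
    ["eID", "eDept"], ["eDept", "eCategory"])
print(good, bad)
```

In the second split the join produces five tuples instead of three, e.g. a made-up row pairing employee 1234 with category 5.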
DB-Quality: Normalization
Goal:
Given a set of relational schemas, reason out how they should be re-arranged
so that some nice properties are ensured
Expressed in terms of, so called, Normal Forms of a given DB.
NOTE: as we will see, there is a well-defined formalism behind this
However:
Before we can dwell on NORMALIZATION, we need to introduce some
techniques which, in a sense, are a computational vehicle to
(pre)normalization
Hence:
Our next topic is on Functional Dependencies

DB-Quality: Functional Dependencies (FD)
Essentially, FDs can be viewed as a special kind of Integrity Constraints (IC)
IC = conditions that the (records in an) instance of the database MUST satisfy
E.g., canNOT insert a new employee, unless the department in which he's
supposed to work is already stored in the database
We'll see more of a formal treatment of such statements
In this context, FDs are a kind of ICs that are focused on attributes + keys
Typically, represented in a form:
A, B, C -> D, E
Meaning: the values of the attributes A, B, and C uniquely determine the values of
the attributes D and E.
An alternative interpretation:
X -> Y means:
Whenever there are (at least) two tuples, say, tuple_i and tuple_j, for which their
values of the attribute X are equal (i.e., tuple_i.X = tuple_j.X), then their values
on the attribute Y must also be equal (i.e., tuple_i.Y = tuple_j.Y)

DB-Quality: Functional Dependencies (FD)
Example - Consider the following relation:
A   B   C   D
a1  b1  c1  d1
a1  b1  c1  d2
a1  b2  c2  d3
a2  b1  c3  d5
a2  b1  c3  d7
Observe:
This instance satisfies:

A,B -> C
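
The tuple-based reading of X -> Y translates almost word for word into a check over an instance. The sketch below uses the five tuples of the example relation; the function name `satisfies_fd` and the list-of-dicts encoding are choices made for the illustration.

```python
def satisfies_fd(rows, lhs, rhs):
    """True if any two rows that agree on lhs also agree on rhs."""
    seen = {}  # lhs-values -> rhs-values of the first row with those lhs-values
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same X-values, different Y-values
    return True

rows = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"A": "a1", "B": "b1", "C": "c1", "D": "d2"},
    {"A": "a1", "B": "b2", "C": "c2", "D": "d3"},
    {"A": "a2", "B": "b1", "C": "c3", "D": "d5"},
    {"A": "a2", "B": "b1", "C": "c3", "D": "d7"},
]

print(satisfies_fd(rows, ["A", "B"], ["C"]))  # the FD claimed on the slide
print(satisfies_fd(rows, ["A"], ["C"]))       # a1 appears with c1 and c2
print(satisfies_fd(rows, ["A", "B"], ["D"]))  # (a1, b1) appears with d1 and d2
```

Such a check only says that this particular instance satisfies the FD; whether the FD holds on the schema is a statement about all legal instances.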
DB-Quality: Functional Dependencies (FD)
We use functional dependencies to:
test relations to see if they are legal under a given set of functional
dependencies.
If a relation r is legal under a set F of functional dependencies, we say that r
satisfies F.
specify constraints on the set of legal relations
We say that F holds on R if all legal relations on R satisfy the set of functional
dependencies F.
Note: A specific instance of a relation schema may satisfy a functional
dependency even if the functional dependency does not hold on all legal
instances.
For example, a specific instance of loan may, by chance, satisfy
amount -> customer_name.
NOTE: A functional dependency is trivial if it is satisfied by all instances of a
relation
Example:
customer_name, loan_number -> customer_name
customer_name -> customer_name
In general, α -> β is trivial if β ⊆ α

DB-Quality: Functional Dependencies (FD)
Examples:

Employee(e_ID, eName, eDept, eLocation, eSalary, eStartDate)
Apparently:
e_ID -> eName;
e_ID -> eDept;
(after all, e_ID is unique)


Customer(custID, cName, cStreet, cCity, cState, cZip)
Apparently:
cZip -> {cState, cCity}
Can we say that the other way around is also true: cState -> cZip ???
Note: may omit "{" and "}"; also, we often use arrows to denote FDs
DB-Quality: Functional Dependencies (FD)
Given a set F of functional dependencies, there are certain other
functional dependencies that are logically implied by F.
For example: If A -> B and B -> C, then we can infer that A -> C

Specifically: e_ID -> eDept, and eDept -> eLocation, implies e_ID ->
eLocation

The set of all the functional dependencies logically implied by a given
set F is called the closure of F.
The closure of F is typically denoted by F+.
Clearly, F+ is a superset of F,
i.e., there are more dependencies in F+ than in the original F.

DB-Quality: Functional Dependencies (FD)
So, given a starting set F of FDs, how do we compute its closure F+ (and why?)
Why = the subtle line between soundness vs. completeness of ANY formal
inference system
How = Given F, apply some valid inference rules to generate newer, and newer,
and ..., and some more_newer, ..., FDs (just when do we terminate???)
Well, first-thing-first: what are those valid inference rules
Armstrong Axioms (a bit of lingo-caution here)
R1 (Reflexivity): If X ⊇ Y, then X -> Y
R2 (Augmentation): If X -> Y, then XZ -> YZ
R3 (Transitivity): If X -> Y AND Y -> Z, then X -> Z (recall the example of
e_ID)
Additional rules (convenience only, not fundamental)
R4 (Decomposition): If X -> YZ, then X -> Y (equivalently, X -> Z)
R5 (Union): If X -> Y AND X -> Z, then X -> YZ
R6 (Pseudo-Transitivity): If X -> Y AND WY -> Z, then WX -> Z


DB-Quality: FDs and Inference/Closure
Closure of FDs:
Let the initial set of FDs be F_init
See if some of the rules can be applied to F_init so that a new set of FDs is
obtained, call it
F_1 = F_init ∪ F_new-ones-init
See if some of the rules can be applied to F_1 so that a newer-new set of FDs is
obtained, call it
F_2 = F_1 ∪ F_newer-new-ones

Keep on repeating this procedure (recursively), UNTIL, at some point, F_N =
F_(N-1), i.e., there canNOT be any newer-new ones (F_newer-new-ones(N-1) = ∅)


Visually:
[Figure: Armstrong Axioms++ applied repeatedly, F_init => F_1 => F_2 => ...,
until the set of newly added FDs is ∅]

DB-Quality: FDs and Inference/Closure
ASIDE:
Sometimes, NOT INTERESTED in the entire F+ (closure of all the FDs)
Instead, interested in whether a particular dependency X -> Y holds
Brute_Force:
Compute the entire F+ and check for X -> Y in it
More focused:
Compute all the FDs that have a form X -> something, and can be obtained from
the initial F via Armstrong's Axioms
Check if the desired X -> Y is there
Such subset of FDs is often called the Attribute-Closure of X

END_ASIDE

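
The "more focused" route is usually implemented as a fixpoint computation of X+, the set of all attributes determined by X. The sketch below shows that standard computation; the function names and the encoding of an FD as a (lhs, rhs) pair of sets are assumptions of this example, and the FDs are the e_ID / eDept / eLocation ones mentioned earlier.

```python
def attribute_closure(attrs, fds):
    """Return X+ : every attribute functionally determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole LHS is already determined, so is the RHS.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def implies(fds, lhs, rhs):
    """True if lhs -> rhs follows from fds, i.e. rhs is inside lhs+."""
    return set(rhs) <= attribute_closure(lhs, fds)

fds = [({"e_ID"}, {"eDept"}), ({"eDept"}, {"eLocation"})]
print(attribute_closure({"e_ID"}, fds))
print(implies(fds, {"e_ID"}, {"eLocation"}))  # the transitivity example
print(implies(fds, {"eDept"}, {"e_ID"}))      # does not follow
```

The loop stops because the closure can only grow and is bounded by the (finite) set of attributes.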
DB-Quality: FDs and Inference/Closure
Example application of Armstrong's Axioms:
F1: a -> {b,j}
F2: b -> c
F3: b -> {d,g}
F4: g -> {e,f}
F5: h -> i
F6: {i,k} -> j
F7: i -> h
When deriving new FDs from the existing ones,
the important aspect is justification:
- Which rule is applied
- To which existing FD
for a particular new-FD
F8: {a,b,d} -> {b,d}    Reflexivity
F9: {g,k} -> {e,f,k}    Augmentation, F4
F10: a -> b             Decomposition, F1
F11: a -> c             Transitivity, F10 and F2
F12: g -> e             Decomposition, F4
F13: b -> {c,d,g}       Union, F2 and F3
F14: {h,k} -> j         Pseudo-Transitivity, F5 and F6
NOTE: It may be tempting to infer that if one continuously
applies reflexivity, the process will never terminate.
However, there's a limit on the # of attributes
Derive some more as a HW-exercise
DB-Quality: Functional Dependencies (FD)
Formally (Closure of a given set F of FDs) :


F+ := F

repeat
    for each functional dependency f (= X -> Y) in F+
        apply the reflexivity and augmentation rules on f
        add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
        if f1 and f2 can be combined using transitivity rule
            add the resulting FD to F+
until F+ does not change any further
Note: recall that R4, R5 and R6 are not the base ones;
hence, the algorithm above need not rely upon them.

However, it makes it a bit slower/less_elegant
DB-Quality: Functional Dependencies (FD)
Semi-Example of the algorithm:
Let R (A, B, C, D)
Let F = {A -> BC; C -> D; B -> CD}
So, F+ := F;
Here are *some* of the reflexivity-based FDs:
A -> A; B -> B; C -> C; D -> D;

AB -> A; AB -> B; AC -> A; AC -> C; AD -> A; AD -> D; BC -> B; BC -> C; BD -> B; BD -> D; CD -> C;
CD -> D;

ABC -> AB; ABC -> AC; ABC -> BC; ABD -> AB; ABD -> AD; ABD -> BD; ACD -> AC; ACD -> AD;
ACD -> CD;
BCD -> BC; BCD -> BD; BCD -> CD;

ABCD -> ABC; ABCD -> ABD; ABCD -> ACD; ABCD -> BCD;

ABCD -> ABCD;
Here are *some* augmentation-based FDs:
AD -> BCD (augment 1st FD in F); CA -> DA (augment 2nd FD in F); CAB -> DAB (augment 2nd
FD in F), etc.
Add all these into F+;
Work with transitivity-loop;
Repeat all
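
For a schema this small, the repeat/until procedure (reflexivity, augmentation, transitivity, until nothing new appears) can be run literally. The Python sketch below is one possible rendering, not the course's reference code: FDs are encoded as pairs of `frozenset`s, and the names `subsets` and `closure_of_fds` are invented. It is exponential in the number of attributes, so it is only meant for toy inputs like R(A, B, C, D).

```python
from itertools import combinations

def subsets(attrs):
    """All subsets of attrs (including the empty set) as frozensets."""
    attrs = sorted(attrs)
    return [frozenset(c) for n in range(len(attrs) + 1)
            for c in combinations(attrs, n)]

def closure_of_fds(attrs, fds):
    """F+ as a set of (lhs, rhs) frozenset pairs, via the three base rules."""
    all_sets = subsets(attrs)
    f_plus = {(frozenset(lhs), frozenset(rhs)) for lhs, rhs in fds}
    # Reflexivity: X -> Y whenever Y is a subset of X.
    f_plus |= {(x, y) for x in all_sets for y in all_sets if y <= x}
    while True:
        new = set()
        for lhs, rhs in f_plus:              # augmentation: XZ -> YZ
            for z in all_sets:
                new.add((lhs | z, rhs | z))
        for l1, r1 in f_plus:                # transitivity: X -> Y, Y -> Z
            for l2, r2 in f_plus:
                if r1 == l2:
                    new.add((l1, r2))
        if new <= f_plus:                    # F+ does not change any further
            return f_plus
        f_plus |= new

fs = frozenset
f_plus = closure_of_fds("ABCD", [("A", "BC"), ("C", "D"), ("B", "CD")])
print((fs("A"), fs("ABCD")) in f_plus)   # A determines everything
print((fs("AD"), fs("BCD")) in f_plus)   # the augmentation example above
print((fs("C"), fs("B")) in f_plus)      # not implied by F
```

In practice one would rather compute attribute closures than materialize all of F+, which is exactly the point of the earlier aside.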
DB-Quality: Keys
Recall:
SuperKey = a subset of the set of attributes, which is sufficient to uniquely
identify the rest of the values.
In the new terminology: SuperKey = a subset of attributes sufficient to
functionally determine all the rest of the attributes in a given table
CandidateKey = a set of attributes which is
A SuperKey;
No proper subset of it can be a SuperKey (i.e., it is minimal)
Ex: VideoTape(tapeID, title, genre, minutes, storeID)
{tapeID, title} is a SuperKey, but NOT a CandidateKey
tapeID is a CandidateKey
At the ER (or ER-to-Relational) stage, Keys are declared based on
intuition
However, the theory of FDs gives a formalism for the concept of keys
DB-Quality: Keys
Specifically, let:
S = (Attr1, Attr2, Attr3, ..., Attr_n) denote a schema
F denote the set of initial dependencies;
F+ denote the closure(F)
Then, the set of attributes X = {B1, B2, ..., B_k} (⊆ {Attr1, Attr2, Attr3,
..., Attr_n}) is called a Key of S if:
X -> AllAttributes - X is an FD in F+
For each set of attributes Y ⊂ X, Y -> X is NOT in F+
Alternatively:
If X -> Z, and
X ∪ Z = AllAttributes
Then, X is a SuperKey
Now, we have:
Prime Attributes = attributes which are part of at least one key for a given
schema (as per the FD-based definition), e.g., if {a,b} -> {c,d} in a schema
over (a, b, c, d) and {a,b} is a key, then both a and b are prime
Non-prime Attributes = attributes which do not participate in any key.
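
The FD-based definition of a key suggests a brute-force search: try attribute sets from small to large and keep those whose closure is the whole schema and that contain no smaller key. The sketch below does this for the R(A, B, C, D) example with F = {A -> BC; C -> D; B -> CD}; the helper names are assumptions of this illustration, and the search is exponential, so it is meant for small schemas only.

```python
from itertools import combinations

def attribute_closure(attrs, fds):
    """X+ : every attribute functionally determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def candidate_keys(all_attrs, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    all_attrs = set(all_attrs)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            x = set(combo)
            if any(k <= x for k in keys):
                continue  # contains a smaller key: a SuperKey, but not minimal
            if attribute_closure(x, fds) == all_attrs:
                keys.append(x)
    return keys

fds = [({"A"}, {"B", "C"}), ({"C"}, {"D"}), ({"B"}, {"C", "D"})]
keys = candidate_keys("ABCD", fds)
prime = set().union(*keys)
print(keys, prime)
```

Here A alone is the only candidate key, so A is the only prime attribute and B, C, D are non-prime.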
DB-Quality: Normal Forms
Basically, a set of rules which, if applied, bring the database schema to a
stage that ensures some quality guarantees of the DB design
Essentially, each Normal Form (NF) has specifications that precisely limit the
types of FDs that are allowed in the (schema of the) DB.
A relational schema R is in First Normal Form (1NF) if the domains of all
attributes of R are atomic
Motivation: Non-atomic values complicate storage and encourage redundant
(repeated) storage of data
Example: Set of accounts stored with each customer, and set of owners stored with
each account

1NF is the historically-first Normal Form
DB-Quality: Normal Forms - 2NF
2NF eliminates FDs which have a part-key on the LHS
Formally:
A relational schema is in 2NF if No NON-PRIME attributes are partially
dependent on any key of that schema
Alternatively:
A relational schema violates (is not in) the 2NF if there exists an FD X -> Y,
such that X is a proper subset of a key, and the attributes in Y are all non-prime
Example:
Assume we have the relational schema R(a, b, c, d, e, f, g, h, i, j, k), where the
key = {a,h}
Also, assume that we have the FD (in F+): a -> {b, c, d, e, f, g, k}
Due to the fact that {a, h} is a key, and all the b, c, d, e, f, g, k are non-prime,
R is NOT in 2NF
Q: how does one bring R into 2NF?
A: Decomposition

DB-Quality: Normal Forms - 2NF
Rule:
If a given schema violates 2NF with respect to some FD, then break it into (at least)
two smaller schemas, such that the partial-key dependencies (i.e., the reason for
violation) do NOT hold anymore
Example:
R (a, b, c, d, e, f, g, h, i, j, k),
Key: {a,h}
FD: a -> {b, c, d, e, f, g, k}
R1 (a, h, i, j)
All the rest of the attributes that are NOT
part of the RHS of the troubled FD
R2 (a, b, c, d, e, f, g, k)
All the attributes that participate in
the troubled FD + make the LHS
of that FD a Key
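
The 2NF test and the split can be mimicked mechanically. In the sketch below the key {a, h} is taken as given (as on the slide) rather than derived, and the helper names are invented for the example: for every proper part of a key, any non-prime attribute in its closure is a partial dependency.

```python
from itertools import combinations

def attribute_closure(attrs, fds):
    """X+ : every attribute functionally determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def partial_dependencies(keys, fds):
    """(part of a key, non-prime attributes it determines) pairs violating 2NF."""
    prime = set().union(*keys)
    found = []
    for key in keys:
        for size in range(1, len(key)):          # proper, non-empty parts only
            for part in combinations(sorted(key), size):
                dependents = attribute_closure(part, fds) - prime
                if dependents:
                    found.append((set(part), dependents))
    return found

all_attrs = set("abcdefghijk")
keys = [{"a", "h"}]                    # assumed to be given, as on the slide
fds = [({"a"}, set("bcdefgk"))]        # the troubled FD

(part, dependents), = partial_dependencies(keys, fds)
r1 = all_attrs - dependents            # everything NOT on the RHS of the troubled FD
r2 = part | dependents                 # the troubled FD, its LHS becomes the key
print(sorted(r1), sorted(r2))
```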
DB-Quality: Normal Forms - 3NF
OK, so we transformed R into R1 and R2, which is OK for 2NF.
Assume that we also have the following FDs (as part of F+):
FD1 = b -> {c, d, e, f, g}
FD2 = g -> {e, f}
FD3 = {d,e,f} -> g
Recall that R2 (a, b, c, d, e, f, g, k), where a is the key, hence:
Each of FD1, FD2 and FD3 has a non-key attribute on its LHS
In a sense, we have a hint for a transitive dependency of the form: Key -> NonKey;
NonKey -> NonPrime
A relational schema is in 3NF (Third Normal Form) if:
It is already in the 2NF;
There are no transitive-dependencies
To bring a schema into 3NF (provided it is in 2NF), again we break it into
partial schemas, w.r.t. a given troubled FD.

DB-Quality: Normal Forms - 3NF
Example:
Continue with R2 (a, b, c, d, e, f, g, k)
Recall:
FD1 = b -> {c, d, e, f, g}
FD2 = g -> {e, f}
FD3 = {d,e,f} -> g
NOTE: when > 1
troubled FDs,
random-pick
Pick FD1 = b -> {c, d, e, f, g}
R21 (a, b, k) in 3NF
Remove the attributes from the
RHS of the troubled FD
R22 (b, c, d, e, f, g)
Separate schema for the troubled FD, where its LHS
becomes a Key
Still violates 3NF; pick FD2
g -> {e,f}
R221 (b, c, d, g) R222 (g, e, f)
Same split-rules applied
DB-Quality: Normal Forms - 3NF
SUMMARY: so, started with some R and some FDs
Ended up with:
R1, due to 2NF violation via a -> {b, c, d, e, f, g, k}
R2, due to 2NF, which violated 3NF for some other FDs, so:
R21 (3NF) via b -> {c, d, e, f, g}
R22 still violating 3NF (via g -> {e,f}), so split it in
R221
R222 both in 3NF

NOTE: it was a coincidence that:
R221 and R222 are in 3NF w.r.t. FD3 ({d,e,f} -> g)
Lucky choice of following FD1 (b -> {c, d, e, f, g}) with FD2 (g -> {e,f})
Homework:
Split R22 with respect to FD3 first, and see what happens with respect to the last FD
left (FD2)

DB-Quality: Normal Forms - 3NF
3NF features:
Allows some redundancy;
But functional dependencies can be checked on individual relations
without computing a join.
It is a dependency-preserving decomposition, and obtaining the 3NF is
always possible (algorithmically).

Example:
Let Fc be a canonical cover for F;
i := 0;
for each functional dependency α -> β in Fc do
    if none of the schemas Rj, 1 ≤ j ≤ i contains α β
    then begin
        i := i + 1;
        Ri := α β
    end
if none of the schemas Rj, 1 ≤ j ≤ i contains a candidate key for R
then begin
    i := i + 1;
    Ri := any candidate key for R;
end
return (R1, R2, ..., Ri)

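
The pseudocode above maps quite directly onto Python. The sketch below assumes that the canonical cover and the candidate keys are already known and handed in as arguments (computing them is a separate task); the function name `synthesize_3nf` and the second, three-attribute example are made up for the illustration.

```python
def synthesize_3nf(canonical_cover, candidate_keys):
    """3NF schemas from a canonical cover, following the loop in the pseudocode.

    canonical_cover: list of (lhs, rhs) pairs of attribute sets (assumed canonical)
    candidate_keys:  non-empty list of candidate keys of R (assumed given)
    """
    schemas = []
    for lhs, rhs in canonical_cover:
        attrs = set(lhs) | set(rhs)
        # Add a schema for the FD unless an existing schema already contains it.
        if not any(attrs <= s for s in schemas):
            schemas.append(attrs)
    # Make sure at least one schema contains a candidate key of R.
    if not any(set(k) <= s for k in candidate_keys for s in schemas):
        schemas.append(set(candidate_keys[0]))
    return schemas

# R = (J, K, L) with JK -> L and L -> K, candidate keys JK and JL.
jkl = synthesize_3nf(
    [({"J", "K"}, {"L"}), ({"L"}, {"K"})],
    [{"J", "K"}, {"J", "L"}],
)
# Hypothetical R = (A, B, C) with A -> B only, candidate key AC.
abc = synthesize_3nf([({"A"}, {"B"})], [{"A", "C"}])
print(jkl, abc)
```

For (J, K, L) nothing is split, since the first FD already covers the whole schema; for the second example the extra schema (A, C) is added only to keep a candidate key.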
DB-Quality: Normal Forms - 3NF
Consider:
R = (J, K, L)
FDs = {JK -> L, L -> K}
Assume two candidate keys: JK and JL
Hence, R is in 3NF
JK -> L is OK because JK is a superkey
L -> K is OK because K is contained in a candidate key
However, there still may be some problems with this schema:

J     L   K
j1    l1  k1
j2    l1  k1
j3    l1  k1
null  l2  k2

- Repetition of information (e.g., the relationship l1, k1)
- May need to use null values (e.g., to represent the
relationship l2, k2 where there is no corresponding
value for J).

DB-Quality: Normal Forms - BCNF
Boyce-Codd Normal Form:
The easiest NF to specify:
No Non-Key Dependencies
(i.e., eliminates the problem of a prime-attribute being dependent on a non-key)
Catches:
Since it is NOT purely-key-based (for the LHS of the FDs), it may yield a
decomposition that does NOT preserve certain original dependencies
Stronger than 3NF, but in practice 3NF is often good enough
There are higher-NFs
4NF
Based on Multivalued Dependencies
Even 5NF

Not covered in this class
No BCNF for the exams
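
For the curious, the difference between 3NF and BCNF can be seen on the R = (J, K, L) example. The sketch below uses the textbook FD-by-FD tests (a non-trivial FD is fine for BCNF only if its LHS is a superkey; for 3NF it is also fine if every attribute it adds on the RHS is prime). The helper names are invented, and the candidate keys are passed in rather than computed.

```python
def attribute_closure(attrs, fds):
    """X+ : every attribute functionally determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def violations(all_attrs, fds, candidate_keys):
    """Return (FDs violating BCNF, FDs violating 3NF) among the given FDs."""
    prime = set().union(*candidate_keys)
    bcnf, third = [], []
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if rhs <= lhs:
            continue                      # trivial FD: always fine
        if attribute_closure(lhs, fds) == set(all_attrs):
            continue                      # LHS is a superkey: fine for both
        bcnf.append((lhs, rhs))           # LHS is not a superkey
        if not (rhs - lhs) <= prime:
            third.append((lhs, rhs))      # and a non-prime attribute depends on it
    return bcnf, third

fds = [({"J", "K"}, {"L"}), ({"L"}, {"K"})]
bcnf_bad, third_bad = violations("JKL", fds, [{"J", "K"}, {"J", "L"}])
print(bcnf_bad, third_bad)
```

L -> K is reported for BCNF but not for 3NF, matching the observation that (J, K, L) is in 3NF yet still repeats the (l1, k1) association.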
DB-Quality: Practice Problems
1:
Given R (A, B, C, D, E)
F = {A -> BC; CD -> E; B -> D; E -> A}
Compute all the non-trivial functional dependencies that can be derived from F, using
Armstrong Axioms
2:
Suppose we have the schema R(A, B, C, D, E), and the same set of functional
dependencies as in 1 above
Assuming that (AE) constitutes the key of R, is the decomposition R1(A,B,C) and
R2(A,D,E) in 3NF? Is it lossless?

