Design Theory for Relational DBs: Functional Dependencies, Decompositions, Normal Forms
Introduction to databases
CSCC43 Winter 2012
Ryan Johnson
Thanks to Manos Papagelis, John Mylopoulos, Arnold Rosenbloom and Renee Miller for material in these slides

Database Design Theory
Guides systematic improvements to database schemas
General idea:
  Express constraints on the data
  Use these to decompose the relations
Ultimately, get a schema that is in a "normal form" that guarantees certain desirable properties
  "Normal" in the sense of conforming to a standard
The process of converting a schema to a normal form is called normalization

Goal #1: remove redundancy
Consider this schema

StudentName  StudentEmail   Course  Instructor
Xiao         xiao@gmail     CSCC43  Johnson
Xiao         xiao@gmail     CSCD08  Bretscher
Jaspreet     jaspreet@utsc  CSCC43  Johnson

What if...
  Xiao changes email addresses? (update anomaly)
  Xiao drops CSCD08? (deletion anomaly)
  UTSC creates a new course, CSCC44? (insertion anomaly)
Multiple relations => exponentially worse

Goal #2: expressing constraints
Consider the following sets of schemas:
  Students(utorid, name, email)
  vs.
  Students(utorid, name)
  Emails(utorid, address)
Consider also:
  House(street, city, value, owner, propertyTax)
  vs.
  House(street, city, value, owner)
  TaxRates(city, value, propertyTax)
Dependencies, constraints are domain-dependent

Part I: Functional Dependencies

Functional dependencies
Let X, Y be sets of attributes from relation R
X -> Y is an assertion about tuples in R
  Any tuples which agree in all attributes of X must also agree in all attributes of Y
  "X functionally determines Y"
  Or, "The values of attributes Y are a function of those in X"
  Not necessarily an easy function to compute, mind you
  => Consider X -> h, where h is the hash of attributes in X
Notational conventions
  "a", "b", "c": specific attributes
  "A", "B", "C": sets of (unnamed) attributes
  abc -> def: same as {a,b,c} -> {d,e,f}
Most common to see singletons (X -> y or abc -> d)
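
The definition above can be tested mechanically on a concrete set of tuples. The following is a minimal sketch, not part of the original slides: the function name `satisfies_fd` and the dictionary-per-row representation are assumptions made for illustration, and the rows are the three tuples from the "Goal #1" slide. Passing this test only shows that an instance does not violate the FD; it does not establish that the FD holds for the whole domain.

```python
def satisfies_fd(rows, lhs, rhs):
    """Return True if every pair of rows agreeing on lhs also agrees on rhs."""
    seen = {}  # maps each LHS value combination to the RHS values first seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # two tuples agree on lhs but differ on rhs
    return True


# Instance from the "Goal #1" slide (hypothetical dict encoding).
enrolment = [
    {"StudentName": "Xiao", "StudentEmail": "xiao@gmail",
     "Course": "CSCC43", "Instructor": "Johnson"},
    {"StudentName": "Xiao", "StudentEmail": "xiao@gmail",
     "Course": "CSCD08", "Instructor": "Bretscher"},
    {"StudentName": "Jaspreet", "StudentEmail": "jaspreet@utsc",
     "Course": "CSCC43", "Instructor": "Johnson"},
]

name_determines_email = satisfies_fd(enrolment, ["StudentName"], ["StudentEmail"])
name_determines_course = satisfies_fd(enrolment, ["StudentName"], ["Course"])
print(name_determines_email, name_determines_course)  # True False
```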

FD: relaxes the concept of a "key"
Functional dependency: X -> Y
Superkey: X -> R
A superkey must include all remaining attributes of the relation on the RHS
An FD can involve just a subset of them
Example:
  Houses(street, city, value, owner, tax)
  street,city -> value,owner,tax (both FD and key)
  city,value -> tax (FD only)

Rules and principles about FDs
Rules
  The splitting/combining rule
  Trivial FDs
  The transitive rule
Algorithms related to FDs
  the closure of a set of attributes of a relation
  a minimal basis of a relation

Trivial FDs
Not all functional dependencies are useful
  A -> A always holds
  abc -> a also always holds (right side is subset of left side)
FD with an attribute on both sides is "trivial"
  Simplify by removing L ∩ R from R
  abc -> ad becomes abc -> d
  Or, in singleton form, delete trivial FDs
  abc -> a and abc -> d becomes just abc -> d

Transitive rule
The transitive rule holds for FDs
  Consider the FDs: a -> b and b -> c; then a -> c holds
  Consider the FDs: ad -> b and b -> cd; then ad -> cd holds, or just ad -> c (because of the trivial dependency rule)
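
The splitting and trivial-FD simplifications are mechanical, so a short sketch may help. The helper names `split_and_drop_trivial` and `show` are hypothetical, and attributes are assumed to be single characters so that a string such as "abc" stands for the set {a, b, c}.

```python
def split_and_drop_trivial(lhs, rhs):
    """Split lhs -> rhs into singleton FDs, dropping attributes already on the LHS."""
    lhs = frozenset(lhs)
    return [(lhs, a) for a in sorted(set(rhs) - lhs)]


def show(fds):
    """Render (lhs, attribute) pairs in the slides' notation."""
    return ["".join(sorted(l)) + " -> " + r for l, r in fds]


print(show(split_and_drop_trivial("abc", "ad")))  # ['abc -> d']
print(show(split_and_drop_trivial("abc", "a")))   # []
print(show(split_and_drop_trivial("ad", "cd")))   # ['ad -> c']
```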

Identifying functional dependencies
FDs are domain knowledge
  Intrinsic features of the data you're dealing with
  Something you know (or assume) about the data
Database engine cannot identify FDs for you
  Designer must specify them as part of schema
  DBMS can only enforce FDs when told to
  DBMS cannot safely "optimize" FDs either
  It has only a finite sample of the data
  An FD constrains the entire domain

Coincidence or FD?

ID    Email             City      Country  Surname
1983  tom@gmail.com     Toronto   Canada   Fairgrieve
8624  mar@bell.com      London    Canada   Samways
9141  scotty@gmail.com  Winnipeg  Canada   Samways
1204  birds@gmail.com   Aachen    Germany  Lakemeyer

What if we try to infer FDs from the data?
  ID -> email, city, country, surname
  email -> city, country, surname
  city -> country
  surname -> country
Domain knowledge required to validate FDs
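
To see how misleading inference from a sample can be, the sketch below enumerates every single-attribute FD that the four rows above happen to satisfy. The names `holds_in_sample` and `apparent` are made up for this illustration. The output includes Surname -> Country, which is plainly a coincidence of the sample rather than a property of the domain.

```python
from itertools import permutations

columns = ["ID", "Email", "City", "Country", "Surname"]
sample = [
    (1983, "tom@gmail.com", "Toronto", "Canada", "Fairgrieve"),
    (8624, "mar@bell.com", "London", "Canada", "Samways"),
    (9141, "scotty@gmail.com", "Winnipeg", "Canada", "Samways"),
    (1204, "birds@gmail.com", "Aachen", "Germany", "Lakemeyer"),
]


def holds_in_sample(x, y):
    """True if no two sample rows agree on column x but differ on column y."""
    i, j = columns.index(x), columns.index(y)
    seen = {}  # value of x -> value of y first seen with it
    return all(seen.setdefault(row[i], row[j]) == row[j] for row in sample)


# Every ordered pair of distinct columns that looks like an FD in this sample.
apparent = [(x, y) for x, y in permutations(columns, 2) if holds_in_sample(x, y)]
for x, y in apparent:
    print(f"{x} -> {y}")
```

With so few rows, ID, Email and City each appear to determine every other column simply because their values are all distinct.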

Keys and FDs
Consider relation R with attributes A
Superkey
  Any S ⊆ A s.t. S -> A
  => Any subset of A which determines all remaining attributes in A
Candidate key (or key)
  C ⊆ A s.t. C -> A and X -> A does not hold for any X ⊂ C
  => A superkey which contains no other superkeys
  => Remove any attribute and you no longer have a key
Primary key
  The candidate key we use to identify the relation
  => Always exists, only one allowed, doesn't matter which C we use
Prime attribute
  ∃ candidate key C s.t. x ∈ C (attribute that participates in at least one key)

Candidate keys vs. superkeys
Consider these relations
  Students(ID, surname, name, email, address, major)
  Houses(street, city, value, owner, tax)
What are the candidate keys?
  Students: ID, what else?
  Houses: ?
What other superkeys exist?
  Students: {ID, surname}, {ID, name}, {ID, name, surname}
  Houses: ?
Prime attributes?
  Students: ?
  Houses: ?
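
One way to explore the Houses questions is by brute force. The sketch below assumes that the only FDs on Houses are the two given on the earlier "FD: relaxes the concept of a key" slide, and it relies on attribute closure, one of the algorithms listed on the "Rules and principles about FDs" slide. The function names `closure` and `candidate_keys` are illustrative. It tries attribute subsets in increasing size and keeps those that determine everything and contain no smaller key.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Brute force: try subsets by increasing size, keep the minimal superkeys."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key: a superkey, but not minimal
            if closure(s, fds) == set(attributes):
                keys.append(s)
    return keys


houses = {"street", "city", "value", "owner", "tax"}
house_fds = [  # assumed to be the complete FD set for Houses
    ({"street", "city"}, {"value", "owner", "tax"}),
    ({"city", "value"}, {"tax"}),
]
keys = candidate_keys(houses, house_fds)
prime = set().union(*keys)  # attributes that appear in at least one key
print(keys, prime)
```

Under that assumption the only candidate key is {street, city}, so street and city are the prime attributes. The search is exponential in the number of attributes, which is acceptable for slide-sized schemas only.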

Cyclic functional dependencies?
Attributes on right side of one FD may appear on left side of another!
  Simple example: assume relation (A, B) & FDs: A -> B, B -> A
  What does this say about A and B?
Example
  studentID -> email, email -> studentID

Geometric view of FDs
Let D be the domain of tuples in R
  Every possible tuple is a point in D
FD X on R restricts tuples in R to a subset of D
  Points in D which violate X cannot be in R
Example: D(x, y, z)
  xy -> z
  => z = abs(x) + abs(y)
  z -> x,y
  => x = y = abs(z)/2
[Figure: sample points of D; surviving labels include (1,1,2), (1,1,0), (0,0,1), (0,0,0), (2,2,4), (3,2,1) and (1,2,3)]

Inferring functional dependencies
Problem
  Given FDs X1 -> a1, X2 -> a2, etc.
  Does some FD Y -> B (not given) also hold?
Consider the dependencies
  A -> B, B -> C
  Intuitively, A -> C also holds
  The given FDs "entail" (imply) it (transitivity rule)
How to prove it in the general case?

Closure test for FDs
Given attribute set A and FD set F
  Denote A_F+ as the closure of A relative to F
  => A_F+ = set of all attributes determined by A under the FDs given in or implied by F
Computing the [transitive] closure of A
  Start: A_F+ = A, F' = F
  While ∃ X ∈ F' s.t. LHS(X) ⊆ A_F+:
    A_F+ = A_F+ ∪ RHS(X)
    F' = F' - X
  At end: A -> B ∀ B ∈ A_F+

Is AB -> E entailed by F? Yes
Is D -> C entailed by F? No
[Figure: two diagrams over attributes a, b, c, d, e, f]
Result: X_F+ allows us to determine all FDs of the form X -> Y entailed by F
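
The closure loop translates almost line for line into code. This is a sketch under simple assumptions: attributes are single characters, an FD is a (lhs, rhs) pair of strings, and the names `closure` and `entails` are chosen here for illustration. Since the FD set behind the AB -> E question is not reproduced in the text, the example uses A -> B, B -> C from the "Inferring functional dependencies" slide.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds, following the slide's loop."""
    result = set(attrs)        # Start: closure = A
    remaining = list(fds)      # F' = F
    progress = True
    while progress:
        progress = False
        for fd in remaining:
            lhs, rhs = fd
            if set(lhs) <= result:      # LHS(X) is contained in the closure so far
                result |= set(rhs)      # closure = closure U RHS(X)
                remaining.remove(fd)    # F' = F' - X
                progress = True
                break                   # restart the scan over the smaller F'
    return result


def entails(fds, lhs, rhs):
    """lhs -> rhs is entailed by fds iff rhs lies inside the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)


F = [("A", "B"), ("B", "C")]
print(sorted(closure("A", F)))  # ['A', 'B', 'C']
print(entails(F, "A", "C"))     # True
print(entails(F, "C", "A"))     # False
```

Because each FD is removed from F' once it has fired, the loop runs at most once per FD, which is what guarantees termination.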

Discarding redundant FDs
Minimal basis: opposite extreme from closure
Given a set of FDs F, want to minimize F' s.t.
  F' ⊆ F
  F' entails X ∀ X ∈ F
Properties of a minimal basis F'
  RHS is always singleton
  If any FD is removed from F', F' is no longer a minimal basis
  If for any FD in F' we remove one or more attributes from the LHS of that FD, the result is no longer a minimal basis

Constructing a minimal basis
Straightforward but time-consuming
  1. Split all RHS into singletons
  2. ∀ X ∈ F', test whether J = (F' - X)+ is still equivalent to F+
     => Might make F' too small
  3. ∀ i ∈ LHS(X) ∀ X ∈ F', let LHS(X') = LHS(X) - i
     Test whether (F' - X + X')+ is still equivalent to F+
     => Might make F' too big
  4. Repeat (2) and (3) until neither makes progress

Minimal Basis: Example
Relation R: R(A, B, C, D)
Defined FDs:
  F = {A -> AC, B -> ABC, D -> ABC}
Find the minimal basis M of F

Minimal Basis: Example (cont.)
1st Step
  H = {A -> A, A -> C, B -> A, B -> B, B -> C, D -> A, D -> B, D -> C}
2nd Step
  A -> A, B -> B: can be removed as trivial
  A -> C: can't be removed, as there is no other LHS with A
  B -> A: can't be removed, because for J = H - {B -> A}, B+ = BC
  B -> C: can be removed, because for J = H - {B -> C}, B+ = ABC
  D -> A: can be removed, because for J = H - {D -> A}, D+ = DBAC
  D -> B: can't be removed, because for J = H - {D -> B}, D+ = DC
  D -> C: can be removed, because for J = H - {D -> C}, D+ = DBAC
  Step outcome => H = {A -> C, B -> A, D -> B}

Minimal Basis: Example (cont.)
3rd Step
  H doesn't change as all LHS in H are single attributes
4th Step
  H doesn't change
Minimal Basis: M = H = {A -> C, B -> A, D -> B}

Minimal Basis: Example 2
Relation R: R(A, B, C)
Defined FDs:
  A -> B, A -> C, B -> C, B -> A, C -> A, C -> B
  AB -> C, AC -> B, BC -> A
  A -> BC
  A -> A
Possible Minimal Bases:
  {A -> B, B -> A, B -> C, C -> B} or
  {A -> B, B -> C, C -> A}
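
The four construction steps can be sketched in code and checked against the first worked example. This is an illustrative implementation, not the course's reference code: FDs are (lhs, rhs) pairs of single-character attribute strings, and the names `closure` and `minimal_basis` are assumptions. An FD is treated as removable when the remaining FDs still entail it, which is the closure test used in the 2nd step of the example.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, where each FD is (frozenset lhs, single attribute)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def minimal_basis(fds):
    # Step 1: split RHS into singletons, dropping trivial and duplicate FDs.
    basis = []
    for lhs, rhs in fds:
        for a in rhs:
            fd = (frozenset(lhs), a)
            if a not in lhs and fd not in basis:
                basis.append(fd)
    progress = True
    while progress:  # Step 4: repeat until neither step changes anything
        progress = False
        # Step 2: drop any FD that the remaining FDs still entail.
        for fd in list(basis):
            rest = [g for g in basis if g != fd]
            if fd[1] in closure(fd[0], rest):
                basis = rest
                progress = True
        # Step 3: drop any LHS attribute the FD does not need.
        for fd in list(basis):
            lhs, rhs = fd
            for a in sorted(lhs):
                smaller = lhs - {a}
                if rhs in closure(smaller, basis):
                    basis.remove(fd)
                    if (smaller, rhs) not in basis:
                        basis.append((smaller, rhs))
                    progress = True
                    break
    return basis


def show(basis):
    return sorted("".join(sorted(l)) + "->" + r for l, r in basis)


F1 = [("A", "AC"), ("B", "ABC"), ("D", "ABC")]
print(show(minimal_basis(F1)))  # ['A->C', 'B->A', 'D->B']

F2 = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "A"), ("C", "A"), ("C", "B"),
      ("AB", "C"), ("AC", "B"), ("BC", "A"), ("A", "BC"), ("A", "A")]
M2 = minimal_basis(F2)
print(show(M2))
```

For Example 2 the outcome depends on the order in which FDs are examined, so this sketch may return a basis other than the two listed on the slide; any result should still contain three or four FDs and make A, B and C determine one another.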

Representing FDs as graphs
Insight: treat an FD as a directed edge in a graph
  Entire LHS becomes a "closed" node (or node cluster)
  Each attribute of RHS becomes an "open" node
  Draw edge from LHS to RHS
  OK to merge open node(s) with a matching closed node
  => Illegal to merge open nodes with each other directly!
Terminology in terms of graphs
  Superkey: set of nodes which reaches all sinks
  Candidate key: any non-redundant set of sources which reaches all sinks (e.g. removing any source orphans 1+ sinks)
  => Source node <=/=> prime attribute

Example: FD set as graph
Example 1: a -> bc, b -> c, d -> b
[Figure: graph for Example 1, annotated "d not prime (dominated by a)" and "two different c sinks"]
Example 2: ab -> c, c -> d, ce -> a
[Figure: graph for Example 2 over nodes a, b, c, d, e]

Part II: Schema decomposition

FDs and redundancy
Given relation R and FDs F
  R often exhibits anomalies due to redundancy
  F identifies many (not all) of the underlying problems
Idea
  Use F to identify "good" ways to split relations
  Split R into 2+ smaller relations having less redundancy
  Split up F into subsets which apply to the new relations (compute the projection of functional dependencies)

Gotcha: lossy join decomposition
Consider a relation with one more tuple

StudentName  StudentEmail   Course  Instructor
Xiao         xiao@gmail     CSCC43  Johnson
Xiao         xiao@gmail     CSCD08  Bretscher
Jaspreet     jaspreet@utsc  CSCC43  Johnson
Mary         mary@utsc      CSCD08  Rosenburg

Students ⋈ Taking ⋈ Courses has bogus tuples!
  Mary is not taking Bretscher's section of D08
  Xiao is not in Rosenburg's section of D08
Why did this happen? How to prevent it?

Information loss with decomposition
Decompose R into S and T
  Consider FD a -> b, with a only in S and b only in T
FD loss
  Attributes a and b no longer in same relation
  => Must join T and S to enforce a -> b (expensive)
Join loss
  LHS and RHS no longer in same relation, no other connection
  Neither (S ∩ T) -> S nor (S ∩ T) -> T in F+
  => Joining T and S produces bogus tuples (irreparable)
In our example:
  ({email, course} ∩ {course, instructor}) = {course}
  course -/-> instructor and course -/-> email
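
The bogus tuples can be reproduced directly. The sketch below assumes the decomposition is Students(name, email), Taking(email, course) and Courses(course, instructor); the latter two attribute sets come from the "In our example" line, while the Students schema is an inference, since the slide that defines it is not reproduced here. It projects the four-tuple relation onto the three schemas, joins them back, and lists whatever was not in the original.

```python
rows = [
    ("Xiao", "xiao@gmail", "CSCC43", "Johnson"),
    ("Xiao", "xiao@gmail", "CSCD08", "Bretscher"),
    ("Jaspreet", "jaspreet@utsc", "CSCC43", "Johnson"),
    ("Mary", "mary@utsc", "CSCD08", "Rosenburg"),
]

# Project onto the three smaller relations (sets remove duplicate tuples).
students = {(name, email) for name, email, _, _ in rows}
taking = {(email, course) for _, email, course, _ in rows}
courses = {(course, instr) for _, _, course, instr in rows}

# Natural join: Students with Taking on email, then with Courses on course.
joined = {
    (name, email, course, instr)
    for name, email in students
    for email2, course in taking if email2 == email
    for course2, instr in courses if course2 == course
}

bogus = sorted(joined - set(rows))  # tuples the join invented
for t in bogus:
    print(t)
# ('Mary', 'mary@utsc', 'CSCD08', 'Bretscher')
# ('Xiao', 'xiao@gmail', 'CSCD08', 'Rosenburg')
```

Every original tuple survives the round trip; the loss is that the join can no longer tell which CSCD08 section each student is in.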

FD loss as a graph / Join loss as a graph
[Figures: dependency graphs; surviving node labels are "title" and "star"]

Projecting FDs
Once we've split a relation we have to refactor our FDs to match
  Each FD must only mention attributes from one relation
Similar to geometric projection
  Many possible projections (depends on how we slice it)
  Keep only the ones we need (minimal basis)
Projection is expensive
  Suppose R1 has n attributes
  How many subsets of R1 are there?

FD projection algorithm
Start with Fi = ∅
For each subset X of Ri
  Compute X+
  For each attribute a in X+
    If a is in Ri
      add X -> a to Fi
Compute the minimal basis of Fi
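
The projection algorithm can be sketched as follows. The function names are illustrative, attributes are single characters, and the final minimal-basis step is deliberately left out, so the result may contain redundant FDs. The sketch also skips trivial FDs and the two trivial subsets (the empty set and Ri itself). It is exercised on the relation ABC with FDs A -> B and B -> C, the same input as the slides' projection example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project_fds(fds, ri):
    """Project fds onto the attribute string ri, with singleton right-hand sides."""
    fi = []                                        # Start with Fi = empty
    for size in range(1, len(ri)):                 # skip the empty set and Ri itself
        for x in combinations(sorted(ri), size):   # each remaining subset X of Ri
            for a in sorted(closure(x, fds)):      # each attribute a in X+
                if a in ri and a not in x:         # a is in Ri and X -> a is not trivial
                    fi.append(("".join(x), a))
    return fi


F = [("A", "B"), ("B", "C")]
print(project_fds(F, "AC"))  # [('A', 'C')]
print(project_fds(F, "BC"))  # [('B', 'C')]
```

The loop visits 2^n - 2 subsets for a relation with n attributes, which is the cost the "Projection is expensive" bullet alludes to.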

Making projection more efficient
Ignore trivial dependencies
  No need to add X -> A if A is in X itself
Ignore trivial subsets
  The empty set or the set of all attributes (both are subsets of X)
Ignore supersets of X if X+ = R
  They can only give us "weaker" FDs (with more on the LHS)

Example: Projecting FDs
ABC with FDs A -> B and B -> C
  A+ = ABC; yields A -> B, A -> C
  We do not need to compute AB+ or AC+
  B+ = BC; yields B -> C
  C+ = C; yields nothing.
  BC+ = BC; yields nothing.

Example (Continued)
Resulting FDs: A -> B, A -> C, and B -> C
Projection onto AC: A -> C
  Only FD that involves a subset of {A, C}
Projection on BC: B -> C
  Only FD that involves subset of {B, C}

Part III: Normal forms

3NF in terms of graphs
3NF violation: transitive dependency
[Figure: graph over "title" and "year" nodes, with an edge marked "lost redundant FD"]

3NF, dependencies, and join loss
Theorem: always possible to convert a schema to join-lossless, dependency-preserving 3NF
Caveat: always possible to create schemas in 3NF for which these properties do not hold
Join loss example 1:
  MovieInfo(title, year, studioName)
  StudioAddress(title, year, studioAddress)
  => Cannot enforce studioName -> studioAddress
Note: OK for decomposition to lose redundant FDs

Graphs and lossy decomposition
Loss: an FD which spans two relations
  Join loss if no transitive connection between the two nodes
  => No set of joins can reconstruct the connection
Our 3NF example showed a lost dependency
  title,year -> studioAddr
  => No join loss because title,year -> studioName -> studioAddr

Boyce-Codd normal form (BCNF)
One additional restriction over 3NF
  All non-trivial FDs have superkey LHS
Counterexample
  CanadianAddress(street, city, province, postalCode)
  Candidate keys: {street, postalCode}, {street, city, province}
  FD: postalCode -> city, province
  Satisfies 3NF: city, province both prime (each belongs to the candidate key {street, city, province})
  Violates BCNF: postalCode is not a superkey
  => Possible anomalies involving postalCode
Do we care? How often do postal codes change?
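
The CanadianAddress counterexample can be checked mechanically. The sketch below is illustrative only: the function names are hypothetical, and the FD street, city, province -> postalCode is an assumption added because the slide lists {street, city, province} as a candidate key. For each given FD whose LHS is not a superkey, it records a BCNF violation, and additionally a 3NF violation if the RHS attribute is not prime. Only the FDs passed in are examined.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) == set(attributes):
                keys.append(s)
    return keys


def violations(attributes, fds):
    """Return (bcnf_violations, third_nf_violations) as lists of (lhs, attribute)."""
    prime = set().union(*candidate_keys(attributes, fds))
    bcnf, third = [], []
    for lhs, rhs in fds:
        if closure(lhs, fds) == set(attributes):
            continue                      # superkey LHS satisfies both normal forms
        for a in sorted(set(rhs) - set(lhs)):
            bcnf.append((lhs, a))         # non-trivial FD without a superkey LHS
            if a not in prime:
                third.append((lhs, a))    # and the RHS attribute is not prime either
    return bcnf, third


address = {"street", "city", "province", "postalCode"}
address_fds = [
    (("postalCode",), ("city", "province")),
    (("street", "city", "province"), ("postalCode",)),  # assumed from the stated key
]
bcnf_bad, third_bad = violations(address, address_fds)
print(bcnf_bad)   # [(('postalCode',), 'city'), (('postalCode',), 'province')]
print(third_bad)  # []
```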

Another Example
emps(emp_id, emp_name, emp_phone, dept_name, emp_city, emp_straddr)
empadds(emp_city, emp_zip, emp_straddr)
FDs:
  emp_id -> emp_name emp_phone dept_name
  emp_city emp_straddr -> emp_zip
  emp_zip -> emp_city
The FD emp_zip -> emp_city is preserved in the relation empadds but emp_zip is not a key. The schema is not in BCNF.
The attribute emp_city is prime (there is key emp_city emp_straddr). Hence the schema is in 3NF.

More Examples

Manager  Project  Branch      Division
Brown    Mars     Chicago     1
Green    Jupiter  Birmingham  1
Green    Mars     Birmingham  1
Hoskins  Saturn   Birmingham  2
Hoskins  Venus    Birmingham  2

Functional dependencies:
  Manager -> Branch, Division: each manager works at one branch and manages one division
  Branch, Division -> Manager: for each branch and division there is a single manager
  Project, Branch -> Division, Manager: for each branch, a project is allocated to a single division and has a sole manager responsible

A good decomposition

Manager  Branch      Division
Brown    Chicago     1
Green    Birmingham  1
Hoskins  Birmingham  2

Project  Branch      Division
Mars     Chicago     1
Jupiter  Birmingham  1
Mars     Birmingham  1
Saturn   Birmingham  2
Venus    Birmingham  2

Note: The first relation has a second key {Branch, Division}
The decomposition is in 3NF but not in BCNF; moreover, it is lossless and dependencies are preserved
This example demonstrates that BCNF is too strong a condition to impose on a relational schema

Limits of decomposition
Pick two...
  Lossless join
  Dependency preservation
  Anomaly-free
3NF
  Always allows join lossless and dependency preserving
  May allow some anomalies
BCNF
  Always excludes anomalies
  May give up one of join lossless or dependency preserving
Use domain knowledge to choose 3NF vs. BCNF
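
The lossless-join claim for the Manager/Project decomposition can be checked with the test from the "Information loss with decomposition" slide: a split of R into S and T is lossless if (S ∩ T) -> S or (S ∩ T) -> T follows from F. The sketch below is an illustration with hypothetical function names. For contrast it applies the same test to the earlier Taking/Courses split, where the FD set is assumed empty because the slides state that course determines neither instructor nor email.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_binary(s, t, fds):
    """True if the shared attributes determine all of S or all of T under fds."""
    common = set(s) & set(t)
    reach = closure(common, fds)
    return set(s) <= reach or set(t) <= reach


fds = [
    (("Manager",), ("Branch", "Division")),
    (("Branch", "Division"), ("Manager",)),
    (("Project", "Branch"), ("Division", "Manager")),
]
r1 = ("Manager", "Branch", "Division")
r2 = ("Project", "Branch", "Division")
good = lossless_binary(r1, r2, fds)

taking = ("email", "course")
courses = ("course", "instructor")
bad = lossless_binary(taking, courses, [])  # no FD has course alone as its LHS

print(good, bad)  # True False
```

The shared attributes {Branch, Division} determine Manager, hence all of the first relation, which is why joining the two tables back together cannot invent tuples.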