
Design Theory for Relational DBs:
Functional Dependencies, Decompositions, Normal Forms
Introduction to databases
CSCC43 Winter 2012
Ryan Johnson
Thanks to Manos Papagelis, John Mylopoulos, Arnold Rosenbloom and Renee Miller for material in these slides

Database Design Theory
Guides systematic improvements to database schemas
General idea:
  Express constraints on the data
  Use these to decompose the relations
Ultimately, get a schema that is in a "normal form" that guarantees certain desirable properties
  "Normal" in the sense of conforming to a standard
The process of converting a schema to a normal form is called normalization


Goal #1: remove redundancy
Consider this schema
  StudentName  StudentEmail   Course  Instructor
  Xiao         xiao@gmail     CSCC43  Johnson
  Xiao         xiao@gmail     CSCD08  Bretscher
  Jaspreet     jaspreet@utsc  CSCC43  Johnson
What if...
  Xiao changes email addresses? (update anomaly)
  Xiao drops CSCD08? (deletion anomaly)
  UTSC creates a new course, CSCC44? (insertion anomaly)
Multiple relations => exponentially worse

Goal #2: expressing constraints
Consider the following sets of schemas:
  Students(utorid, name, email)
  vs.
  Students(utorid, name)
  Emails(utorid, address)
Consider also:
  House(street, city, value, owner, propertyTax)
  vs.
  House(street, city, value, owner)
  TaxRates(city, value, propertyTax)
Dependencies, constraints are domain-dependent


Part I: Functional Dependencies

Functional dependencies
Let X, Y be sets of attributes from relation R
X -> Y is an assertion about tuples in R
  Any tuples which agree in all attributes of X must also agree in all attributes of Y
  "X functionally determines Y"
  Or, "The values of attributes Y are a function of those in X"
  Not necessarily an easy function to compute, mind you
  => Consider X -> h, where h is the hash of attributes in X
Notational conventions
  "a", "b", "c": specific attributes
  "A", "B", "C": sets of (unnamed) attributes
  abc -> def: same as {a,b,c} -> {d,e,f}
Most common to see singletons (X -> y or abc -> d)
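
The definition can be checked mechanically against a set of sample tuples. The sketch below is only an illustration: the function name fd_holds, the dict-per-tuple representation, and the short attribute names are assumptions, not part of the slides. A True result says only that the sample does not contradict the FD; whether the FD really holds is a statement about the whole domain.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if rows that agree on every lhs attribute also agree on every rhs attribute."""
    seen = {}  # maps each lhs value combination to the rhs values first seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # two tuples agree on lhs but differ on rhs
    return True


# Sample tuples copied from the redundancy example earlier in the slides
# (attribute names shortened for this sketch).
enrolment = [
    {"name": "Xiao", "email": "xiao@gmail", "course": "CSCC43", "instructor": "Johnson"},
    {"name": "Xiao", "email": "xiao@gmail", "course": "CSCD08", "instructor": "Bretscher"},
    {"name": "Jaspreet", "email": "jaspreet@utsc", "course": "CSCC43", "instructor": "Johnson"},
]

print(fd_holds(enrolment, ["email"], ["name"]))   # True
print(fd_holds(enrolment, ["name"], ["course"]))  # False: Xiao takes two courses
```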


FD: relaxes the concept of a "key"
Functional dependency: X -> Y
Superkey: X -> R
  A superkey must include all remaining attributes of the relation on the RHS
  An FD can involve just a subset of them
Example:
  Houses(street, city, value, owner, tax)
  street,city -> value,owner,tax (both FD and key)
  city,value -> tax (FD only)

Rules and principles about FDs
Rules
  The splitting/combining rule
  Trivial FDs
  The transitive rule
Algorithms related to FDs
  the closure of a set of attributes of a relation
  a minimal basis of a relation


The Splitting/Combining rule of FDs
Attributes on right independent of each other
  Consider a,b,c -> d,e,f
  "Attributes a, b, and c functionally determine d, e, and f"
  => No mention of d relating to e or f directly
Splitting rule (Useful to split up right side of FD)
  abc -> def becomes abc -> d, abc -> e and abc -> f
No safe way to split left side
  abc -> def is NOT the same as ab -> def and c -> def!
Combining rule (Useful to combine right sides):
  if abc -> d, abc -> e, abc -> f holds, then abc -> def holds

Splitting FDs - example
Consider the relation and FD
  EmailAddress(user, domain, firstName, lastName)
  user,domain -> firstName,lastName
The following hold
  user,domain -> firstName
  user,domain -> lastName
The following do NOT hold!
  user -> firstName,lastName
  domain -> firstName,lastName
Gotcha: "doesn't hold" = "not all tuples" != "all tuples not"


Trivial FDs
Not all functional dependencies are useful
  A -> A always holds
  abc -> a also always holds (right side is subset of left side)
FD with an attribute on both sides is "trivial"
  Simplify by removing L ∩ R from R
    abc -> ad becomes abc -> d
  Or, in singleton form, delete trivial FDs
    abc -> a and abc -> d becomes just abc -> d

Transitive rule
The transitive rule holds for FDs
  Consider the FDs: a -> b and b -> c; then a -> c holds
  Consider the FDs: ad -> b and b -> cd; then ad -> cd holds, or just ad -> c (because of the trivial dependency rule)


Identifying functional dependencies
FDs are domain knowledge
  Intrinsic features of the data you're dealing with
  Something you know (or assume) about the data
Database engine cannot identify FDs for you
  Designer must specify them as part of schema
  DBMS can only enforce FDs when told to
DBMS cannot safely "optimize" FDs either
  It has only a finite sample of the data
  An FD constrains the entire domain

Coincidence or FD?
  ID    Email             City      Country  Surname
  1983  tom@gmail.com     Toronto   Canada   Fairgrieve
  8624  mar@bell.com      London    Canada   Samways
  9141  scotty@gmail.com  Winnipeg  Canada   Samways
  1204  birds@gmail.com   Aachen    Germany  Lakemeyer
What if we try to infer FDs from the data?
  ID -> email, city, country, surname
  email -> city, country, surname
  city -> country
  surname -> country
Domain knowledge required to validate FDs
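
To see how easily a small sample suggests spurious FDs, the sketch below enumerates every single-attribute FD a -> b that the four sample tuples happen to satisfy. The helper names (fd_holds, single_attribute_fds) and the lower-case attribute names are assumptions made for illustration.

```python
from itertools import permutations


def fd_holds(rows, lhs, rhs):
    """Return True if rows that agree on lhs also agree on rhs (sample-level check only)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


def single_attribute_fds(rows):
    """List every pair (a, b) with a != b such that a -> b holds in the sample."""
    attrs = list(rows[0])
    return [(a, b) for a, b in permutations(attrs, 2) if fd_holds(rows, [a], [b])]


COLUMNS = ["id", "email", "city", "country", "surname"]
people = [dict(zip(COLUMNS, values)) for values in [
    (1983, "tom@gmail.com", "Toronto", "Canada", "Fairgrieve"),
    (8624, "mar@bell.com", "London", "Canada", "Samways"),
    (9141, "scotty@gmail.com", "Winnipeg", "Canada", "Samways"),
    (1204, "birds@gmail.com", "Aachen", "Germany", "Lakemeyer"),
]]

found = single_attribute_fds(people)
print(("surname", "country") in found)  # True, but only by coincidence
print(("city", "id") in found)          # True, because no city repeats in the sample
```

Both reported dependencies are artifacts of the sample: adding one more person named Samways outside Canada, or a second resident of Toronto, would break them.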


Keys and FDs
Consider relation R with attributes A
Superkey
  Any S ⊆ A s.t. S -> A
  => Any subset of A which determines all remaining attributes in A
Candidate key (or key)
  C ⊆ A s.t. C -> A and X -> A does not hold for any X ⊂ C
  => A superkey which contains no other superkeys
  => Remove any attribute and you no longer have a key
Primary key
  The candidate key we use to identify the relation
  => Always exists, only one allowed, doesn't matter which C we use
Prime attribute
  ∃ candidate key C s.t. x ∈ C (attribute that participates in at least one key)

Candidate keys vs. superkeys
Consider these relations
  Students(ID, surname, name, email, address, major)
  Houses(street, city, value, owner, tax)
What are the candidate keys?
  Students: ID, what else?
  Houses: ?
What other superkeys exist?
  Students: {ID, surname}, {ID, name}, {ID, name, surname}
  Houses: ?
Prime attributes?
  Students: ?
  Houses: ?
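
For a small schema these definitions can be applied by brute force. The sketch below is one possible approach, not the slides' method: it tries attribute subsets in order of increasing size, computes the set of attributes each subset determines, and keeps the subsets that determine everything and contain no smaller key. The function names and the encoding of FDs as (frozenset, frozenset) pairs are assumptions; the Houses FDs are the two given earlier in the slides.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key: a superkey, but not minimal
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys


houses = {"street", "city", "value", "owner", "tax"}
house_fds = [
    (frozenset({"street", "city"}), frozenset({"value", "owner", "tax"})),
    (frozenset({"city", "value"}), frozenset({"tax"})),
]

keys = candidate_keys(houses, house_fds)
prime = set().union(*keys)
print(keys)   # one candidate key: street and city
print(prime)  # the prime attributes: street and city
```

The search is exponential in the number of attributes, so it is only practical for classroom-sized schemas.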


Cyclic functional dependencies?
Attributes on right side of one FD may appear on left side of another!
  Simple example: assume relation (A, B) & FDs: A -> B, B -> A
  What does this say about A and B?
Example
  studentID -> email, email -> studentID

Geometric view of FDs
Let D be the domain of tuples in R
  Every possible tuple is a point in D
FD X on R restricts tuples in R to a subset of D
  Points in D which violate X cannot be in R
Example: D(x,y,z)
  xy -> z
  => z = abs(x) + abs(y)
  z -> x,y
  => x = y = abs(z)/2
[Figure: sample points (x,y,z) in D, shown as satisfying or violating the two FDs; the coordinates did not survive extraction]


Inferring functional dependencies
Problem
  Given FDs X1 -> a1, X2 -> a2, etc.
  Does some FD Y -> B (not given) also hold?
Consider the dependencies
  A -> B, B -> C
Intuitively, A -> C also holds
  The given FDs "entail" (imply) it ("transitivity rule")
How to prove it in the general case?

Closure test for FDs
Given attribute set A and FD set F
  Denote A_F^+ as the closure of A relative to F
  => A_F^+ = set of all attributes B such that A -> B is given or implied by F
Computing the [transitive] closure of A
  Start: A_F^+ = A, F' = F
  While ∃ X ∈ F' s.t. LHS(X) ⊆ A_F^+:
    A_F^+ = A_F^+ ∪ RHS(X)
    F' = F' - X
  At end: A -> B ∀ B ∈ A_F^+
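
The closure loop translates almost line for line into code. The sketch below assumes single-character attribute names and writes each FD as a (lhs, rhs) pair of strings, so ("ab", "c") stands for ab -> c; that encoding and the function name are conveniences of this example, not something the slides prescribe.

```python
def closure(attrs, fds):
    """Closure of attrs relative to fds, following the pseudocode step by step."""
    result = set(attrs)    # A_F^+ starts out as A
    remaining = list(fds)  # F' starts out as F
    progress = True
    while progress:
        progress = False
        for fd in remaining:
            lhs, rhs = fd
            if set(lhs) <= result:    # LHS(X) is contained in the closure so far
                result |= set(rhs)    # add RHS(X)
                remaining.remove(fd)  # F' = F' - X
                progress = True
                break                 # restart the scan over the shrunken F'
    return result


# R(a,b,c,d,e,f) with FDs ab -> c, ac -> d, c -> e, ade -> f
F1 = [("ab", "c"), ("ac", "d"), ("c", "e"), ("ade", "f")]
print(sorted(closure("ab", F1)))  # ['a', 'b', 'c', 'd', 'e', 'f']

# Entailment test: X -> Y follows from F exactly when Y is inside the closure of X.
F2 = [("AB", "C"), ("A", "D"), ("D", "E"), ("AC", "B")]
print(set("E") <= closure("AB", F2))  # True:  AB -> E is entailed
print(set("C") <= closure("D", F2))   # False: D -> C is not
```

Each FD fires at most once, so the loop terminates after at most |F| additions.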


Closure test - example
Consider R(a,b,c,d,e,f) with FDs ab -> c, ac -> d, c -> e, ade -> f
Find A+ if A = ab, or find {a,b}+
[Figure: four copies of the attribute row a b c d e f, one per step of the closure computation]
{a,b}+ = {a,b,c,d,e,f} or ab -> cdef
=> ab is a candidate key!

Example: Closure Test
F: AB -> C
   A -> D
   D -> E
   AC -> B

  X    X_F^+
  A    {A,D,E}
  AB   {A,B,C,D,E}
  AC   {A,C,B,D,E}
  B    {B}
  D    {D,E}

Is AB -> E entailed by F? Yes
Is D -> C entailed by F? No
Result: X_F^+ allows us to determine all FDs of the form X -> Y entailed by F


Discarding redundant FDs
Minimal basis: opposite extreme from closure
Given a set of FDs F, want to minimize F' s.t.
  F' ⊆ F
  F' entails X ∀ X ∈ F
Properties of a minimal basis F'
  RHS is always singleton
  If any FD is removed from F', F' is no longer a minimal basis
  If for any FD in F' we remove one or more attributes from the LHS of F, the result is no longer a minimal basis

Constructing a minimal basis
Straightforward but time-consuming
1. Split all RHS into singletons
2. ∀ X ∈ F', test whether J = (F' - X)+ is still equivalent to F+
   => Might make F' too small
3. ∀ i ∈ LHS(X), ∀ X ∈ F', let LHS(X') = LHS(X) - i
   Test whether (F' - X + X')+ is still equivalent to F+
   => Might make F' too big
4. Repeat (2) and (3) until neither makes progress
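
A compact sketch of the four construction steps follows. It makes several assumptions that are not in the slides: attributes are single characters, an FD is passed as a (lhs, rhs) pair of strings, trivial singletons are discarded during step 1, and the two equivalence tests are carried out with attribute closures (an FD is redundant if the rest still determine its right side; an LHS attribute is extraneous if the smaller LHS already determines the right side). When several minimal bases exist, which one comes out depends on the order in which FDs are visited.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under singleton FDs given as (frozenset, attr)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            if lhs <= result and a not in result:
                result.add(a)
                changed = True
    return result


def minimal_basis(fds):
    # Step 1: split right sides into singletons (trivial FDs are dropped here).
    basis = []
    for lhs, rhs in fds:
        for a in rhs:
            fd = (frozenset(lhs), a)
            if a not in lhs and fd not in basis:
                basis.append(fd)
    changed = True
    while changed:  # Step 4: repeat until neither step makes progress.
        changed = False
        # Step 2: drop any FD that the remaining FDs still entail.
        for fd in list(basis):
            rest = [g for g in basis if g != fd]
            if fd[1] in closure(fd[0], rest):
                basis = rest
                changed = True
        # Step 3: drop LHS attributes that are not needed.
        for fd in list(basis):
            lhs, a = fd
            for i in sorted(lhs):
                smaller = lhs - {i}
                if a in closure(smaller, basis):
                    basis[basis.index(fd)] = (smaller, a)
                    fd, lhs = (smaller, a), smaller
                    changed = True
        basis = list(dict.fromkeys(basis))  # remove duplicates created by step 3
    return basis


F = [("A", "AC"), ("B", "ABC"), ("D", "ABC")]
for lhs, a in minimal_basis(F):
    print("".join(sorted(lhs)), "->", a)
```

For this F the sketch prints A -> C, B -> A and D -> B, one FD per line.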


Minimal Basis: Example
Relation R: R(A,B,C,D)
Defined FDs:
  F = {A -> AC, B -> ABC, D -> ABC}
Find the minimal basis M of F

Minimal Basis: Example (cont.)
1st Step
  H = {A -> A, A -> C, B -> A, B -> B, B -> C, D -> A, D -> B, D -> C}
2nd Step
  A -> A, B -> B: can be removed as trivial
  A -> C: can't be removed, as there is no other LHS with A
  B -> A: can't be removed, because for J = H - {B -> A}, B+ = BC
  B -> C: can be removed, because for J = H - {B -> C}, B+ = ABC
  D -> A: can be removed, because for J = H - {D -> A}, D+ = DBAC
  D -> B: can't be removed, because for J = H - {D -> B}, D+ = DC
  D -> C: can be removed, because for J = H - {D -> C}, D+ = DBAC
  Step outcome => H = {A -> C, B -> A, D -> B}


Minimal Basis: Example (cont.)
3rd Step
  H doesn't change as all LHS in H are single attributes
4th Step
  H doesn't change
Minimal Basis: M = H = {A -> C, B -> A, D -> B}

Minimal Basis: Example 2
Relation R: R(A,B,C)
Defined FDs:
  A -> B, A -> C, B -> C, B -> A, C -> A, C -> B
  AB -> C, AC -> B, BC -> A
  A -> BC
  A -> A
Possible Minimal Bases:
  {A -> B, B -> A, B -> C, C -> B} or
  {A -> B, B -> C, C -> A}


Representing FDs as graphs
Insight: treat an FD as a directed edge in a graph
  Entire LHS becomes a "closed" node (or node cluster)
  Each attribute of RHS becomes an "open" node
  Draw edge from LHS to RHS
  OK to merge open node(s) with a matching closed node
  => Illegal to merge open nodes with each other directly!
Terminology in terms of graphs
  Superkey: set of nodes which reaches all sinks
  Candidate key: any non-redundant set of sources which reaches all sinks (e.g. removing any source orphans 1+ sinks)
  => Source node <=/=> prime attribute

Example: FD set as graph
Example 1: a -> bc, b -> c, d -> b
[Figure: graph for Example 1, drawn before and after merging nodes; annotations: "d not prime (dominated by a)", "two different c sinks"]
Example 2: ab -> c, c -> d, ce -> a
[Figure: graph for Example 2 over nodes a, b, c, d, e]


Part II: Schema decomposition

FDs and redundancy
Given relation R and FDs F
  R often exhibits anomalies due to redundancy
  F identifies many (not all) of the underlying problems
Idea
  Use F to identify "good" ways to split relations
  Split R into 2+ smaller relations having less redundancy
  Split up F into subsets which apply to the new relations (compute the projection of functional dependencies)


Schema decomposition
Given relation R and FDs F
  Split R into Ri s.t. ∀i Ri ⊆ R (no new attributes)
  Split F into Fi s.t. ∀i F entails Fi (no new FDs)
  Fi involves only attributes in Ri
Caveat: entirely possible to lose information
  F+ may entail FD X which is not in (∪i Fi)+
  => Decomposition lost some FDs
  Possible to have R ⊂ ⋈i Ri
  => Decomposition lost some relationships
Goal: minimize anomalies without losing info
  We'll revisit information loss later

Splitting relations - example
Consider the following relation:
  StudentName  StudentEmail   Course  Instructor
  Xiao         xiao@gmail     CSCC43  Johnson
  Xiao         xiao@gmail     CSCD08  Bretscher
  Jaspreet     jaspreet@utsc  CSCC43  Johnson
One possible decomposition
  Students(email, name)
  Courses(name, instructor)
  Taking(studentEmail, courseName)


Gotcha: lossy join decomposition
Consider a relation with one more tuple
  StudentName  StudentEmail   Course  Instructor
  Xiao         xiao@gmail     CSCC43  Johnson
  Xiao         xiao@gmail     CSCD08  Bretscher
  Jaspreet     jaspreet@utsc  CSCC43  Johnson
  Mary         mary@utsc      CSCD08  Rosenburg
Students ⋈ Taking ⋈ Courses has bogus tuples!
  Mary is not taking Bretscher's section of D08
  Xiao is not in Rosenburg's section of D08
Why did this happen? How to prevent it?

Information loss with decomposition
Decompose R into S and T
  Consider FD a -> b, with a only in S and b only in T
FD loss
  Attributes a and b no longer in same relation
  => Must join T and S to enforce a -> b (expensive)
Join loss
  LHS and RHS no longer in same relation, no other connection
  Neither (S ∩ T) -> S nor (S ∩ T) -> T in F+
  => Joining T and S produces bogus tuples (irreparable)
In our example:
  ({email, course} ∩ {course, instructor}) = {course}
  course -/-> instructor and course -/-> email
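
The bogus tuples can be reproduced by projecting the four-tuple relation onto the three smaller schemas and joining the pieces back together. The sketch below is an illustration under assumptions: the helpers project, natural_join and as_tuples are hypothetical, and the three relations share the attribute names email and course so that a natural join applies directly, whereas the slides name the corresponding columns differently in each relation.

```python
ATTRS = ["name", "email", "course", "instructor"]


def project(rows, attrs):
    """Project dict-rows onto attrs and remove duplicate tuples."""
    distinct = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in distinct]


def natural_join(left, right):
    """Join two non-empty lists of dict-rows on all attribute names they share."""
    common = set(left[0]) & set(right[0])
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]


def as_tuples(rows):
    """Normalize rows to a set of tuples in a fixed attribute order."""
    return {tuple(row[a] for a in ATTRS) for row in rows}


original = [dict(zip(ATTRS, values)) for values in [
    ("Xiao", "xiao@gmail", "CSCC43", "Johnson"),
    ("Xiao", "xiao@gmail", "CSCD08", "Bretscher"),
    ("Jaspreet", "jaspreet@utsc", "CSCC43", "Johnson"),
    ("Mary", "mary@utsc", "CSCD08", "Rosenburg"),
]]

students = project(original, ["email", "name"])
taking = project(original, ["email", "course"])
courses = project(original, ["course", "instructor"])

joined = natural_join(natural_join(students, taking), courses)
bogus = as_tuples(joined) - as_tuples(original)
for row in sorted(bogus):
    print(row)
```

The join returns every original tuple plus two that were never in the relation: Mary in Bretscher's section and Xiao in Rosenburg's section of CSCD08.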


FD loss as a graph
[Figure: FD graph over title, year, studio, studioAddr, split across two relations; one edge is marked "lost FD!"]
Joining recovers original relation because studio -> studioAddr

Join loss as a graph
[Figure: FD graph over title, year, star, studio, studioAddr, salary, split across two relations; the break between them is marked "join loss"]


Projecting FDs
Once we've split a relation we have to refactor our FDs to match
  Each FD must only mention attributes from one relation
Similar to geometric projection
  Many possible projections (depends on how we slice it)
  Keep only the ones we need (minimal basis)
Projection is expensive
  Suppose R1 has n attributes
  How many subsets of R1 are there?

FD projection algorithm
Start with Fi = ∅
For each subset X of Ri
  Compute X+
  For each attribute a in X+
    If a is in Ri
      add X -> a to Fi
Compute the minimal basis of Fi
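
A direct sketch of the projection loop is given below. The names project_fds and closure and the (frozenset, frozenset) encoding of FDs are assumptions of this example. It skips the empty set, the full attribute set and trivial FDs, and it leaves out the final minimal-basis step, so the result can still contain redundant FDs. The loop visits on the order of 2^n subsets of an n-attribute relation, which is why projection is expensive.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def project_fds(sub_attrs, fds):
    """Non-trivial singleton FDs X -> a entailed by fds with X and a inside sub_attrs."""
    projected = []
    # Sizes 1 .. n-1 only: the empty set and the full set yield nothing useful.
    for size in range(1, len(sub_attrs)):
        for combo in combinations(sorted(sub_attrs), size):
            x = frozenset(combo)
            for a in sorted(closure(x, fds)):
                if a in sub_attrs and a not in x:
                    projected.append((x, a))
    return projected


# R(A,B,C) with FDs A -> B and B -> C
F = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
print(project_fds({"A", "C"}, F))  # only A -> C survives
print(project_fds({"B", "C"}, F))  # only B -> C survives
```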


Making projection more efficient
Ignore trivial dependencies
  No need to add X -> A if A is in X itself
Ignore trivial subsets
  The empty set or the set of all attributes (both are subsets of X)
Ignore supersets of X if X+ = R
  They can only give us "weaker" FDs (with more on the LHS)

Example: Projecting FDs
ABC with FDs A -> B and B -> C
  A+ = ABC; yields A -> B, A -> C
    We do not need to compute AB+ or AC+
  B+ = BC; yields B -> C
  C+ = C; yields nothing.
  BC+ = BC; yields nothing.


Example - Continued
Resulting FDs: A -> B, A -> C, and B -> C
Projection onto AC: A -> C
  Only FD that involves a subset of {A,C}
Projection onto BC: B -> C
  Only FD that involves a subset of {B,C}

Part III: Normal forms


Motivation for normal forms
Identify a "good" schema
  For some definition of "good"
  Avoid anomalies, redundancy, etc.
Many normal forms
  1st
  2nd
  3rd
  Boyce-Codd
  ... and several more we won't discuss
BCNF ⊆ 3NF ⊆ 2NF ⊆ 1NF (focus on 3NF/BCNF)

1st normal form (1NF)
No multi-valued attributes allowed
  Imagine storing a list/set of things in an attribute
  => Not really even expressible in RA
Counterexample
  Course(name, instructor, [student, email]*)
  Redundancy in non-list attributes
  Name    Instructor  StudentName  StudentEmail
  CSCC43  Johnson     Xiao         xiao@gmail
                      Jaspreet     jaspreet@utsc
                      Mary         mary@utsc
  CSCD08  Rosenburg   Jaspreet     jaspreet@utsc


1NF in terms of graphs?
We only need to graph the schema
  => Structure of tuples does not vary from tuple to tuple
Consider again our example
  => Cannot capture the structure at schema level only
  Name    Instructor  StudentName  StudentEmail
  CSCC43  Johnson     Xiao         xiao@gmail
                      Jaspreet     jaspreet@utsc
                      Mary         mary@utsc
  CSCD08  Rosenburg   Jaspreet     jaspreet@utsc

2nd normal form (2NF)
Non-prime attributes depend on candidate keys
  Consider non-prime (i.e. not part of a key) attribute 'a'
  Then ∃ FD X s.t. X -> a and X is a candidate key
Counterexample
  Movies(title, year, star, studio, studioAddress, salary)
  FD: title,year -> studio; studio -> studioAddress; star -> salary
  Title          Year  Star    Studio     StudioAddr   Salary
  Star Wars      1977  Hamill  Lucasfilm  1 Lucas Way  $100,000
  Star Wars      1977  Ford    Lucasfilm  1 Lucas Way  $100,000
  Star Wars      1977  Fisher  Lucasfilm  1 Lucas Way  $100,000
  Patriot Games  1992  Ford    Paramount  Cloud 9      $2,000,000
  Last Crusade   1989  Ford    Lucasfilm  1 Lucas Way  $1,000,000


2NF in terms of graphs
Require a path from every source to every sink
  No trivial edges allowed!
  Disconnected components violate 2NF
  Watch for node clusters which are subsets of candidate keys
[Figure: FD graph over title, year, star, studio, studioAddr, salary; annotations: "trivial", "redundant", "no path from title,year to salary, nor from title,year,star to studio/studioAddr"]

3rd normal form (3NF)
Non-prime attr. depend only on candidate keys
  Consider FD X -> a
  Either a ∈ X OR X is a superkey OR a is prime (part of a key)
  => No transitive dependencies allowed
Counterexample:
  studio -> studioAddr
  (studioAddr depends on studio which is not a candidate key)
  Title          Year  Studio     StudioAddr
  Star Wars      1977  Lucasfilm  1 Lucas Way
  Patriot Games  1992  Paramount  Cloud 9
  Last Crusade   1989  Lucasfilm  1 Lucas Way


3NF in terms of graphs
3NF violation: transitive dependency
[Figure: FD graph over title, year, studioName, studioAddr, before and after decomposition; one edge is marked "lost redundant FD"]
Note: OK for decomposition to lose redundant FDs

3NF, dependencies, and join loss
Theorem: always possible to convert a schema to join-lossless, dependency-preserving 3NF
Caveat: always possible to create schemas in 3NF for which these properties do not hold
Join loss example 1:
  MovieInfo(title, year, studioName)
  StudioAddress(title, year, studioAddress)
  => Cannot enforce studioName -> studioAddress
Join loss example 2:
  Movies(title, year, star)
  StarSalary(star, salary)
  => Movies ⋈ StarSalary yields bogus tuples (irreparable)


Graphs and lossy decomposition
Loss: an FD which spans two relations
  Join loss if no transitive connection between the two nodes
  => No set of joins can reconstruct the connection
Our 3NF example showed a lost dependency
  title,year -> studioAddr
  => No join loss because title,year -> studioName -> studioAddr

Boyce-Codd normal form (BCNF)
One additional restriction over 3NF
  All non-trivial FDs have superkey LHS
Counterexample
  CanadianAddress(street, city, province, postalCode)
  Candidate keys: {street, postalCode}, {street, city, province}
  FD: postalCode -> city, province
  Satisfies 3NF: city, province both prime
  Violates BCNF: postalCode is not a superkey
  => Possible anomalies involving postalCode
Do we care? How often do postal codes change?
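
The 3NF and BCNF conditions can be tested per FD once the candidate keys are known. The sketch below takes the keys as input rather than deriving them, and inspects only the FDs it is handed; the function name nf_violations and the set-based encoding are assumptions made for this example.

```python
def nf_violations(fds, candidate_keys):
    """Return (bcnf_violations, third_nf_violations) among the given FDs.

    fds is a list of (lhs, rhs) attribute sets; candidate_keys is a list of sets.
    """
    prime = set().union(*candidate_keys)  # attributes that appear in some key
    bcnf, third = [], []
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        nontrivial = rhs - lhs
        if not nontrivial:
            continue  # trivial FD: allowed by both normal forms
        if any(set(key) <= lhs for key in candidate_keys):
            continue  # LHS is a superkey: allowed by both normal forms
        bcnf.append((lhs, rhs))
        if not nontrivial <= prime:
            third.append((lhs, rhs))  # some determined attribute is non-prime
    return bcnf, third


address_keys = [{"street", "postalCode"}, {"street", "city", "province"}]
address_fds = [({"postalCode"}, {"city", "province"})]
bcnf_bad, third_bad = nf_violations(address_fds, address_keys)
print(len(bcnf_bad), len(third_bad))  # 1 0: violates BCNF, satisfies 3NF

movie_keys = [{"title", "year", "star"}]
movie_fds = [
    ({"title", "year"}, {"studio"}),
    ({"studio"}, {"studioAddress"}),
    ({"star"}, {"salary"}),
]
movie_bcnf, movie_third = nf_violations(movie_fds, movie_keys)
print(len(movie_bcnf), len(movie_third))  # 3 3: every FD violates both
```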


Another Example
emps(emp_id, emp_name, emp_phone, dept_name, emp_city, emp_straddr)
empadds(emp_city, emp_zip, emp_straddr)
FDs:
  emp_id -> emp_name emp_phone dept_name
  emp_city emp_straddr -> emp_zip
  emp_zip -> emp_city
The FD emp_zip -> emp_city is preserved in the relation empadds but emp_zip is not a key. The schema is not in BCNF.
The attribute emp_city is prime (there is key emp_city emp_straddr). Hence the schema is in 3NF.

More Examples
  Manager  Project  Branch      Division
  Brown    Mars     Chicago     1
  Green    Jupiter  Birmingham  1
  Green    Mars     Birmingham  1
  Hoskins  Saturn   Birmingham  2
  Hoskins  Venus    Birmingham  2
Functional dependencies:
  Manager -> Branch, Division: each manager works at one branch and manages one division
  Branch, Division -> Manager: for each branch and division there is a single manager
  Project, Branch -> Division, Manager: for each branch, a project is allocated to a single division and has a sole manager responsible


A good decomposition
  Manager  Branch      Division
  Brown    Chicago     1
  Green    Birmingham  1
  Hoskins  Birmingham  2

  Project  Branch      Division
  Mars     Chicago     1
  Jupiter  Birmingham  1
  Mars     Birmingham  1
  Saturn   Birmingham  2
  Venus    Birmingham  2
Note: The first relation has a second key {Branch, Division}
The decomposition is in 3NF but not in BCNF; moreover, it is lossless and dependencies are preserved
This example demonstrates that BCNF is too strong a condition to impose on a relational schema

Limits of decomposition
Pick two...
  Lossless join
  Dependency preservation
  Anomaly-free
3NF
  Always allows join lossless and dependency preserving
  May allow some anomalies
BCNF
  Always excludes anomalies
  May give up one of join lossless or dependency preserving
Use domain knowledge to choose 3NF vs. BCNF
