
Best Practices for Semantic Data Modeling for Performance and Scalability

SQL Server Technical Article

Writer: Sharon Bjeletich Technical Reviewer: Thomas Kejser

Published: August 2008


Applies To: SQL Server 2008

Summary: More and more business applications are architected as business frameworks, where the core data model of the framework must support customers who work with different database objects and attributes, as well as allow for extensive customization. This paper covers some of the issues that can arise when it is difficult to decide whether to use an object-oriented or relational approach to designing the database. It includes approaches to improve performance and scalability. This paper is for database developers who are familiar with semantic modeling challenges.


Copyright
The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, email address, logo, person, place or event is intended or should be inferred.

© 2008 Microsoft Corporation. All rights reserved.

Microsoft, SQL Server, Visio, and the Server Identity Logo are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

The names o& actual com"anies and "roducts mentioned herein ma- be the trademar's o& their res"ective owners*

Table of Contents
Introduction
The "Universal" Data Model
Supertypes and Subtypes
Extensible Attributes
Normalize, Normalize, Normalize
Nullability
Three-Valued Logic
Compensating Actions for Denormalization
Parent/Child Tables and Sequence IDs
Surrogate Keys
Sequence IDs
Data Model Designs
Indexing
Query Builders
Paging
Lazy Loading
Semantic/Metadata/Runtime Data Model Checklist
Summary


Introduction
More and more business applications are architected as business frameworks, where the core data model of the framework must support customers who work with different database objects and attributes, as well as allow for extensive customization. For example, take a manufacturing company that develops an application to capture all sensor data coming from a plant's equipment. Every plant floor is potentially different, with different equipment types, sensors, readings, and needs. A plant where automobiles are made has very different equipment and sensors than a plant where chocolate bars are made. Relational databases that are developed for these applications tend to be very object oriented, because there is no real way to identify all of the data definitions at design time. Objects with attributes are commonly used. These are often called semantic models. When implemented, these very generic or "universal" data models can be complex on many levels. Queries against them are very difficult to write because the "object" table is aliased over and over in a query, making the query very difficult to understand. Furthermore, cost-based optimizers have a difficult time with a database that has many self-joins. In addition, the most common data tends to be stored close together on disk, resulting in scalability issues such as contention. This paper covers some of the main issues that can arise in these scenarios and some approaches to improve performance and scalability. It is targeted at database developers who are familiar with semantic modeling challenges.

The "Universal" Data Model


Most objects and transactions can be modeled by using a "universal" data model: a model of nouns, adjectives, verbs, and adverbs, if you will. The following model could be created to store just about any kind of data for any kind of application. (This example model is extremely simplistic and is for illustration purposes only.)

Figure 1
Microsoft Corporation 2008


This model is translated as &ollows:

Figure 2
If this model is applied to an online book-selling application, books and stores are nouns (objects); book names and types are adjectives (attributes); the sale is the verb (relationship); and the date of sale and quantity are adverbs. This is a very simple example, and it would be common to add grouping, containers, and types. As a foundation, however, this model can support most applications, even without the specific table and column names that a traditional relational data model would require. Because the model is data driven at implementation time, the database structure is essentially defined at "runtime." However, it is very difficult to write queries against this model. A query that returns a list of book titles sold during a particular day at a particular store (a very simple query) would look like the following:

select convert(varchar, SaleDateValue.ObjectAttributeValue, 101) as SaleDate
    ,Store.ObjectName as StoreName
    ,Book.ObjectName as BookName
    ,Author.ObjectName as AuthorName
    ,Customer.ObjectName as CustomerName
    ,SaleBookQtyValue.RelationshipAttributeValue as BooksSold
from Object as Sale
join ObjectAttribute as SaleDateValue
    on Sale.ObjectID = SaleDateValue.ObjectID
join Attribute as SaleDate
    on SaleDateValue.AttributeID = SaleDate.AttributeID
    and SaleDate.AttributeName = 'SaleDate'
join Relationship as SaleStore
    on Sale.ObjectID = SaleStore.FromObjectID
    and SaleStore.RelationshipType = 'SaleStore'
join Object as Store
    on SaleStore.ToObjectID = Store.ObjectID
    and Store.ObjectType = 'Store'
join Relationship as SaleBook
    on Sale.ObjectID = SaleBook.FromObjectID
    and SaleBook.RelationshipType = 'SaleBook'
join Object as Book
    on SaleBook.ToObjectID = Book.ObjectID
    and Book.ObjectType = 'Book'
join Relationship as BookAuthor
    on Book.ObjectID = BookAuthor.FromObjectID
    and BookAuthor.RelationshipType = 'BookAuthor'
join Object as Author
    on BookAuthor.ToObjectID = Author.ObjectID
    and Author.ObjectType = 'Person'
join Relationship as SaleCustomer
    on Sale.ObjectID = SaleCustomer.FromObjectID
    and SaleCustomer.RelationshipType = 'SaleCust'
join Object as Customer
    on SaleCustomer.ToObjectID = Customer.ObjectID
    and Customer.ObjectType = 'Person'
join RelationshipAttribute as SaleBookQtyValue
    on SaleBook.RelationshipID = SaleBookQtyValue.RelationshipID
join Attribute as SaleBookQty
    on SaleBookQtyValue.AttributeID = SaleBookQty.AttributeID
    and SaleBookQty.AttributeName = 'SaleQty'
where Sale.ObjectType = 'Sale'

This query is complex and constrained. Many companies have failed at applications based on this type of model because of the abstraction of the objects. Although this model can support many customer scenarios because it is so extensible, it is almost impossible for customers to write queries and reports against it.
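The following is a minimal T-SQL sketch of the generic tables that the query above navigates. The table and column names come from the query itself. The data types, the keys, and the use of sql_variant for attribute values are assumptions, since Figure 1 is not reproduced here. Run it in a scratch database.

```sql
-- Sketch of the "universal" model. Names come from the query above;
-- data types and constraints are illustrative assumptions.
CREATE TABLE dbo.Object (
    ObjectID   int           NOT NULL PRIMARY KEY,
    ObjectType varchar(50)   NOT NULL,  -- 'Sale', 'Store', 'Book', 'Person', ...
    ObjectName nvarchar(200) NULL
);

CREATE TABLE dbo.Attribute (
    AttributeID   int         NOT NULL PRIMARY KEY,
    AttributeName varchar(50) NOT NULL UNIQUE  -- 'SaleDate', 'SaleQty', ...
);

-- Attribute values that belong to a single object (for example, SaleDate).
CREATE TABLE dbo.ObjectAttribute (
    ObjectID             int         NOT NULL REFERENCES dbo.Object (ObjectID),
    AttributeID          int         NOT NULL REFERENCES dbo.Attribute (AttributeID),
    ObjectAttributeValue sql_variant NULL,
    PRIMARY KEY (ObjectID, AttributeID)
);

-- Typed links between two objects.
CREATE TABLE dbo.Relationship (
    RelationshipID   int         NOT NULL PRIMARY KEY,
    RelationshipType varchar(50) NOT NULL,  -- 'SaleStore', 'SaleBook', 'BookAuthor', 'SaleCust'
    FromObjectID     int         NOT NULL REFERENCES dbo.Object (ObjectID),
    ToObjectID       int         NOT NULL REFERENCES dbo.Object (ObjectID)
);

-- Attribute values that belong to a relationship (for example, the SaleQty of a SaleBook link).
CREATE TABLE dbo.RelationshipAttribute (
    RelationshipID             int         NOT NULL REFERENCES dbo.Relationship (RelationshipID),
    AttributeID                int         NOT NULL REFERENCES dbo.Attribute (AttributeID),
    RelationshipAttributeValue sql_variant NULL,
    PRIMARY KEY (RelationshipID, AttributeID)
);
```

Store, Book, Author, and Customer are all rows in dbo.Object. As a result, every role in a query needs its own alias plus an ObjectType or RelationshipType filter, which is the repetitive pattern visible in the query above.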

If you know at design time that the application is for a bookseller, the model might look like the following. The query would be easy to write, understand, and optimize.

Figure 3
The same query against the model in Figure 3 would look more like this:

select convert(varchar, Sale.SaleDate, 101) as SaleDate
    ,Store.StoreName
    ,Book.BookName
    ,Author.AuthorName
    ,Customer.CustomerName
    ,SaleDetail.Qty
from Sale
join Store
    on Sale.StoreID = Store.StoreID
join SaleDetail
    on Sale.SaleID = SaleDetail.SaleID
join Book
    on SaleDetail.BookID = Book.BookID
join Customer
    on Sale.CustomerID = Customer.CustomerID
join Author
    on Book.AuthorID = Author.AuthorID

Although this is much easier to understand, it is not extensible unless a customer adds columns and modifies the structure after implementation, which has obvious shortcomings.
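For comparison, here is a minimal sketch of tables that would support the query above. Only the table and column names used in the query come from the text. The data types and the CHECK constraint are assumptions, since Figure 3 is not reproduced here. Every relationship is a declared, non-nullable foreign key, which is the kind of metadata the optimizer can use.

```sql
-- Sketch of the bookseller model; data types are assumptions.
CREATE TABLE dbo.Store    (StoreID    int NOT NULL PRIMARY KEY, StoreName    nvarchar(100) NOT NULL);
CREATE TABLE dbo.Customer (CustomerID int NOT NULL PRIMARY KEY, CustomerName nvarchar(100) NOT NULL);
CREATE TABLE dbo.Author   (AuthorID   int NOT NULL PRIMARY KEY, AuthorName   nvarchar(100) NOT NULL);

CREATE TABLE dbo.Book (
    BookID   int           NOT NULL PRIMARY KEY,
    BookName nvarchar(200) NOT NULL,
    AuthorID int           NOT NULL REFERENCES dbo.Author (AuthorID)
);

CREATE TABLE dbo.Sale (
    SaleID     int      NOT NULL PRIMARY KEY,
    SaleDate   datetime NOT NULL,
    StoreID    int      NOT NULL REFERENCES dbo.Store (StoreID),
    CustomerID int      NOT NULL REFERENCES dbo.Customer (CustomerID)
);

-- One row per book on a sale; the quantity is an ordinary typed column.
CREATE TABLE dbo.SaleDetail (
    SaleID int NOT NULL REFERENCES dbo.Sale (SaleID),
    BookID int NOT NULL REFERENCES dbo.Book (BookID),
    Qty    int NOT NULL CHECK (Qty > 0),
    PRIMARY KEY (SaleID, BookID)
);
```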

Supertypes and Subtypes


A compromise between the two extremes exemplified by these models is necessary. That compromise will differ from customer to customer, depending on the differences in the end systems, the amount of structure that can be predetermined, and the perceived abilities of the customer. Supertypes and subtypes are one way to allow for a more understandable data model that can still logically support the object model. A supertype is a construct that keeps all common data in one table while splitting off the data that is significantly different into subtype tables. This enables all of the parts to be seen as one logical entity, and obtuse "object" tables become more understandable.

In the model in Figure 1, the Object table represents "nouns," such as Authors, Customers, Stores, and Books. In this example, we assume that all final customer models will need a customizable application that handles people (Authors and Customers), companies (Stores), and things (Books). Subtyping the Object table into these new tables removes a great deal of the difficulty in understanding these entities. It still allows for the flexibility and extensibility that are necessary to sell this application to any type of retailer. To design a subtype table, we must first determine which entities are in common. At the highest level is the Object table. Supertype tables also require a discriminator column, which splits the supertype table rows by object type and usually corresponds to the subtype table name. The ObjectType column is a natural discriminator, so we leave it in the supertype table. The model for the supertype and subtype tables might now look like this:

Figure 4
Each subtype table has a primary key that is actually a foreign key reference to the Object table, renamed to something that is easy to understand. The CustomObject table enables users to add other subtypes as needed. In this scenario, no Author can have the same ID as any other object. This is very important, because some columns could refer to an object of either subtype, and that cannot be modeled as one column without a supertype table. For example, in a Sale table, the same person could be either a Customer or an Author.

It becomes clear that Author and Customer are not correctly modeled: these are roles, not things. The object in this scenario probably should be a Person. When a PersonID is in the Sale table, the person is a customer; when a person is in the Book table, he or she is an author. In addition, a sale could be made to a company as well as to a person. The optimal way to model this is to have two subtype tables, one named Individual and one named Company. This is a common practice to ensure that this data can be treated interchangeably when needed and separated when that is required. The new model looks like the following:

Figure 5
The Book subtype table may prove to be problematic, as the end user may sell magazines, magnifying glasses, and other products in addition to books. Changing the table to the more general name Product makes the model more usable. Supertype and subtype tables do not impose performance or scalability constraints on databases, and they allow flexibility. In some cases the Object table need not even be physically implemented. The object name can be moved into each subtype table, as long as the primary keys of the subtypes are guaranteed programmatically to be unique across all subtypes, which preserves the logical role of the Object table.
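The pattern can be sketched in T-SQL with the Individual, Company, and Product subtypes discussed above. As the text describes, each subtype's primary key is also a foreign key to Object. The sketch goes one step further and carries the ObjectType discriminator into every subtype, referencing (ObjectID, ObjectType). This is a common technique for guaranteeing that a row can only be attached to the matching subtype. It is an assumption added here, not something the paper prescribes. Columns other than ObjectID, ObjectType, and ObjectName are hypothetical.

```sql
-- Supertype: data shared by every object, plus the discriminator.
CREATE TABLE dbo.Object (
    ObjectID   int           NOT NULL PRIMARY KEY,
    ObjectType varchar(20)   NOT NULL
        CHECK (ObjectType IN ('Individual', 'Company', 'Product')),
    ObjectName nvarchar(200) NOT NULL,
    CONSTRAINT UQ_Object_ID_Type UNIQUE (ObjectID, ObjectType)  -- target for subtype FKs
);

-- Subtypes: the primary key is the Object key under an easier name, and the
-- pinned ObjectType value stops, say, a Company row from pointing at an Individual.
CREATE TABLE dbo.Individual (
    IndividualID int           NOT NULL PRIMARY KEY,
    ObjectType   varchar(20)   NOT NULL DEFAULT 'Individual' CHECK (ObjectType = 'Individual'),
    EmailAddress nvarchar(256) NULL,
    FOREIGN KEY (IndividualID, ObjectType) REFERENCES dbo.Object (ObjectID, ObjectType)
);
CREATE TABLE dbo.Company (
    CompanyID  int         NOT NULL PRIMARY KEY,
    ObjectType varchar(20) NOT NULL DEFAULT 'Company' CHECK (ObjectType = 'Company'),
    TaxNumber  varchar(30) NULL,
    FOREIGN KEY (CompanyID, ObjectType) REFERENCES dbo.Object (ObjectID, ObjectType)
);
CREATE TABLE dbo.Product (
    ProductID  int         NOT NULL PRIMARY KEY,
    ObjectType varchar(20) NOT NULL DEFAULT 'Product' CHECK (ObjectType = 'Product'),
    ListPrice  money       NULL,
    FOREIGN KEY (ProductID, ObjectType) REFERENCES dbo.Object (ObjectID, ObjectType)
);

-- A sale can go to an individual or a company, so the customer column
-- references the supertype rather than either subtype.
CREATE TABLE dbo.Sale (
    SaleID     int      NOT NULL PRIMARY KEY,
    CustomerID int      NOT NULL REFERENCES dbo.Object (ObjectID),
    SaleDate   datetime NOT NULL
);
```

Because IDs are allocated in dbo.Object, an individual and a company can never share an ID. As sketched, Sale.CustomerID would also accept a Product's ID. Excluding products would need a further rule, such as another discriminator-based key or a trigger.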

Extensible Attributes
In the model, many attributes can be predetermined. For example, we can presume that most users will want to store individual names and e-mail addresses. These can be added as columns. The problem now becomes how users will add their own columns. There are two options to solve this, and both have usability and scalability compromises. One option is to add columns to the schema. This carries numerous risks, such as users who might delete necessary columns or overwrite columns, in addition to potential upgrade issues. Adding many columns ahead of time for every possible option that might be needed is also problematic: it takes space, and scalability can be severely limited when there are queries that require many OR clauses. Almost every enterprise application needs some kind of ad hoc query builder screen, and the query that such a screen builds programmatically is often written as a sequence of OR clauses. At some number of OR clauses, cost-based optimizers will decide to switch from an index seek to an index scan. Once that threshold is reached, the query becomes long running and can lock large amounts of resources. This must be avoided to help ensure that the application is enterprise scalable.

One advantage of adding columns to the schema is that the data type is known both by the user and by the database engine, which allows the database engine to raise errors on invalid functions. It is also much easier for end users to create custom reports when using this option.

The second option is to add a property bag table, which is usually modeled as a name/value pair. This enables new columns to be added as rows, without changing the schema. Programmatically built queries can be built as UNION clauses. This is a highly scalable option, and index seeks can always be performed. Finding one attribute in a table of a billion rows requires about the same resources as finding one attribute in a table of a thousand rows. However, name/value pairs do not carry enough information to make them really useful. A decision must be made about how to store the value when it could be of any type. One option is to store it as SQL_VARIANT; another is to place values in data type tables: one name/value pair table for strings, one for floats, and so on. In addition, more metadata must be available, for example to answer the question "Can you aggregate the value?" SQL_VARIANT has not been shown to cause performance or scalability issues, but it is not as easy and clean to use as a typed table. The biggest issue by far is that all of this data must be pivoted so that it can be reported against. SQL Server Reporting Services (SSRS) is a very popular product because you can use it to develop an enterprise application without building the reports individually. End users can point SSRS to the database schema or to an SSRS Report Model and create their own reports. This is clearly a desired feature. Using name/value pairs requires some work by the development team to pivot rows into columns, either in data marts or programmatically. SQL Server Analysis Services does know how to pivot these rows into measures, but the problem of losing the data type still remains.

The best practice is to pick a mix of the two approaches that works in your application and your implementation environment. Adding custom columns that can be managed by the customer, together with a name/value pair approach for the more obscure attributes, seems to work best. In the model in Figure 6, the Individual table contains known columns, custom columns, and an attribute table that has been subtyped from the main attribute table.
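The name/value option can be sketched in T-SQL as follows. All table, column, and attribute names are hypothetical. The sketch does four things:

1. Stores values as sql_variant, one of the two storage choices mentioned above.
2. Keeps a small metadata table recording each attribute's intended type and whether it can be aggregated.
3. Builds a search as a UNION of single-attribute lookups rather than a chain of ORs.
4. Pivots the rows back into columns for reporting.

```sql
-- Metadata about each user-defined attribute.
CREATE TABLE dbo.IndividualAttributeDef (
    AttributeName varchar(50) NOT NULL PRIMARY KEY,
    DataType      sysname     NOT NULL,  -- intended base type, e.g. 'int', 'nvarchar'
    IsAggregable  bit         NOT NULL   -- "Can you aggregate the value?"
);

-- Property bag: one row per individual per attribute, so a new "column" is just new rows.
CREATE TABLE dbo.IndividualAttribute (
    IndividualID   int         NOT NULL,
    AttributeName  varchar(50) NOT NULL
        REFERENCES dbo.IndividualAttributeDef (AttributeName),
    AttributeValue sql_variant NULL,
    PRIMARY KEY (IndividualID, AttributeName)
);
-- Lets each search branch seek on (attribute, value).
-- SQL Server may warn that the maximum key length exceeds 900 bytes.
CREATE INDEX IX_IndividualAttribute_Name_Value
    ON dbo.IndividualAttribute (AttributeName, AttributeValue);

INSERT dbo.IndividualAttributeDef VALUES ('FavoriteColor', N'nvarchar', 0), ('ShoeSize', N'int', 1);
-- Cast each value to sql_variant: a multi-row VALUES list otherwise converts
-- every row to the highest-precedence type in the column (int here).
INSERT dbo.IndividualAttribute VALUES
    (1, 'FavoriteColor', CAST(N'Blue' AS sql_variant)),
    (1, 'ShoeSize',      CAST(42 AS sql_variant)),
    (2, 'ShoeSize',      CAST(40 AS sql_variant));

-- Search "FavoriteColor = Blue OR ShoeSize = 42" as a UNION of seeks.
SELECT IndividualID FROM dbo.IndividualAttribute
WHERE AttributeName = 'FavoriteColor' AND AttributeValue = CAST(N'Blue' AS sql_variant)
UNION
SELECT IndividualID FROM dbo.IndividualAttribute
WHERE AttributeName = 'ShoeSize' AND AttributeValue = CAST(42 AS sql_variant);

-- Pivot rows back into columns for reporting. The values come back as
-- strings, which is the lost-data-type problem described above.
SELECT IndividualID, [FavoriteColor], [ShoeSize]
FROM (SELECT IndividualID, AttributeName,
             CONVERT(nvarchar(100), AttributeValue) AS AttributeValue
      FROM dbo.IndividualAttribute) AS src
PIVOT (MAX(AttributeValue) FOR AttributeName IN ([FavoriteColor], [ShoeSize])) AS p;
```

The PIVOT column list has to be spelled out. In practice it would be generated from dbo.IndividualAttributeDef, which is one reason the metadata table matters.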


Figure 6
Limiting the number of possible OR clauses increases scalability, and limiting the number of name/value pairs increases usability. The optimal mix must be determined for each application. The only way to determine this is to run load tests to find where the model falls over. You then must fine-tune the mix and test again.

Microsoft® SQL Server® 2008 provides another option: sparse column support with column sets and filtered indexes. Sparse columns were designed specifically for the optimized storage of columns that often contain null values. Column sets enable these columns to be retrieved together as a single XML value. Filtered indexes provide the ability to create an index on a subset of data; in this scenario, that would be only the rows with actual values. Filtered indexes improve performance and could conceivably be created on every custom or extended column, helping to alleviate table scans on query builder queries.
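Here is a minimal T-SQL sketch of the SQL Server 2008 option. It assumes a hypothetical Individual table whose custom columns (LoyaltyTier and ShoeSize here) are declared SPARSE. A column set exposes those columns together as XML, and a filtered index covers only the rows where a custom value is present.

```sql
-- Filtered indexes require these SET options (the defaults in most client tools).
SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;

CREATE TABLE dbo.Individual (
    IndividualID   int           NOT NULL PRIMARY KEY,
    IndividualName nvarchar(100) NOT NULL,
    EmailAddress   nvarchar(256) NULL,
    -- Custom columns: NULLs in SPARSE columns take no storage space.
    LoyaltyTier    varchar(20)   SPARSE NULL,
    ShoeSize       int           SPARSE NULL,
    -- Column set: all sparse columns exposed together as one xml value.
    CustomColumns  xml COLUMN_SET FOR ALL_SPARSE_COLUMNS
);

-- Filtered index: only rows that actually have a ShoeSize are indexed.
CREATE NONCLUSTERED INDEX IX_Individual_ShoeSize
    ON dbo.Individual (ShoeSize)
    WHERE ShoeSize IS NOT NULL;

-- Sparse columns can still be written individually by name.
INSERT dbo.Individual (IndividualID, IndividualName, ShoeSize) VALUES (1, N'Pat', 42);
INSERT dbo.Individual (IndividualID, IndividualName) VALUES (2, N'Lee');

-- With a column set defined, SELECT * returns CustomColumns instead of the sparse columns.
SELECT IndividualID, CustomColumns FROM dbo.Individual;

-- A query builder predicate on a custom column can seek the small filtered index.
SELECT IndividualID FROM dbo.Individual WHERE ShoeSize = 42;
```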

Normalize, Normalize, Normalize


A common question in database modeling is when and where to denormalize. The current answer for semantic models is to normalize as much as possible unless there is a proven and tested reason not to. There are some common areas where we know we probably need to denormalize. One example is folder hierarchies. A table containing folder hierarchies might look like the following:


Figure 7
This model is the most efficient for space, and the data will always be consistent. However, retrieving the full path of a folder requires writing a recursive common table expression to iterate through the hierarchy, which can significantly impact scalability. In this scenario, if folder locations do not often change, it is a useful denormalization to add a column with the materialized path, which can then be retrieved with one seek operation. SQL Server 2008 provides new functionality for this in the hierarchyid data type, which can be used to represent this tree in an extremely compact manner and can be sorted.

The SQL Server engine was completely re-architected in version 7.0, away from the original shared Microsoft/Sybase code. This meant SQL Server could be designed around a more purely relational model. That set the foundation for applications that could scale and perform out of the box, provided the database model also follows the relational model. This is a huge advantage. All of the metadata for a model (the primary keys, foreign keys, nullability, check constraints, and so on) is used by the optimizer to find an optimal execution plan. The best practice is to design at least a third normal form semantic model and materialize that in the database with all of the corresponding constraints. This is especially critical for semantic models, because there tends to be a lot of self-referencing in them. The impact of having no relational information will therefore be felt in scalability much more than in other types of models.
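The trade-off can be sketched in T-SQL with a hypothetical adjacency-list Folder table, since Figure 7 itself is not reproduced here. A recursive common table expression walks the hierarchy to build each full path. Storing that result in a MaterializedPath column is the denormalization; afterward, reading a folder's path is a single seek on the primary key. The path format /Root/Docs/ is an assumption.

```sql
-- Hypothetical adjacency-list table: each folder stores only its parent.
CREATE TABLE dbo.Folder (
    FolderID         int            NOT NULL PRIMARY KEY,
    ParentFolderID   int            NULL REFERENCES dbo.Folder (FolderID),  -- NULL for a root folder
    FolderName       nvarchar(255)  NOT NULL,
    MaterializedPath nvarchar(4000) NULL   -- the optional denormalization
);

INSERT dbo.Folder (FolderID, ParentFolderID, FolderName) VALUES
    (1, NULL, N'Root'), (2, 1, N'Docs'), (3, 2, N'Reports');

-- Normalized approach: walk the tree from the roots to rebuild each full path.
-- Without the stored column, this work is repeated on every query.
WITH FolderPath AS (
    SELECT FolderID, CAST(N'/' + FolderName + N'/' AS nvarchar(4000)) AS FullPath
    FROM dbo.Folder
    WHERE ParentFolderID IS NULL
    UNION ALL
    SELECT c.FolderID, CAST(p.FullPath + c.FolderName + N'/' AS nvarchar(4000))
    FROM FolderPath AS p
    JOIN dbo.Folder AS c ON c.ParentFolderID = p.FolderID
)
-- Denormalized approach: run the walk once and store the result.
UPDATE f
SET MaterializedPath = fp.FullPath
FROM dbo.Folder AS f
JOIN FolderPath AS fp ON fp.FolderID = f.FolderID;

-- Reading a path is now a single seek on the clustered primary key.
SELECT MaterializedPath FROM dbo.Folder WHERE FolderID = 3;  -- /Root/Docs/Reports/
```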

Nullability
Nullable columns and relationships can limit scalability, because they give the optimizer many more options to consider. If a foreign key relationship is nullable, programmatically built queries must use outer joins to ensure that all parts of the data are presented. Forcing all foreign key relationships to be non-nullable and creating a foreign key constraint on them gives the optimizer information that allows it to optimize the query, and outer joins are usually not needed. The optimizer can even eliminate parts of a query. For example, take the following model:


Figure 8
In this model, the relationship between the SalesOrderHeader table and the SalesPerson table is not nullable, and there is a foreign key relationship defined. There is also an index on OrderDate to support the query predicate. Query builder applications commonly generate queries in a standard form that includes the tables behind every column on the application's search form. As a result, a query against two tables that actually returns columns from only one of them is not uncommon. This query and its execution plan would look like this:

select SalesOrderHeader.OrderDate
from SalesOrderHeader
join SalesPerson
    on SalesOrderHeader.SalesPersonID = SalesPerson.SalesPersonID
where SalesOrderHeader.OrderDate between '<start date>' and '<end date>'

Figure 9
Removing only the foreign key constraint on the SalesOrderHeader table and rerunning the query produces the following query execution plan:


Figure 10

Clearly the foreign key constraint metadata is being used by the optimizer to eliminate the table that is not needed.
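The setup behind Figures 8 through 10 can be sketched in T-SQL with trimmed-down, hypothetical column lists and date values. The foreign key is declared on a NOT NULL column, so every SalesOrderHeader row has exactly one matching SalesPerson row, and the join can neither add nor remove rows. That guarantee is what lets the optimizer drop the join. The optimizer can rely on the constraint only when it is trusted. A foreign key created or re-enabled WITH NOCHECK is marked as not trusted in sys.foreign_keys.

```sql
-- Hypothetical, trimmed-down versions of the two tables in Figure 8.
CREATE TABLE dbo.SalesPerson (
    SalesPersonID   int           NOT NULL PRIMARY KEY,
    SalesPersonName nvarchar(100) NOT NULL
);
CREATE TABLE dbo.SalesOrderHeader (
    SalesOrderID  int      NOT NULL PRIMARY KEY,
    OrderDate     datetime NOT NULL,
    SalesPersonID int      NOT NULL   -- NOT NULL plus the FK is what permits join elimination
        CONSTRAINT FK_SalesOrderHeader_SalesPerson
        REFERENCES dbo.SalesPerson (SalesPersonID)
);
CREATE INDEX IX_SalesOrderHeader_OrderDate ON dbo.SalesOrderHeader (OrderDate);

-- Only SalesOrderHeader columns are returned, so with a trusted FK the
-- optimizer can answer this from IX_SalesOrderHeader_OrderDate alone.
SELECT soh.OrderDate
FROM dbo.SalesOrderHeader AS soh
JOIN dbo.SalesPerson AS sp ON soh.SalesPersonID = sp.SalesPersonID
WHERE soh.OrderDate BETWEEN '20080101' AND '20080131';

-- Check that the constraint is enabled and trusted. If is_not_trusted = 1,
-- revalidate it with:
--   ALTER TABLE dbo.SalesOrderHeader WITH CHECK CHECK CONSTRAINT FK_SalesOrderHeader_SalesPerson;
SELECT name, is_not_trusted, is_disabled
FROM sys.foreign_keys
WHERE parent_object_id = OBJECT_ID('dbo.SalesOrderHeader');
```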
Note that there is another important issue with nulls that is often overlooked: their correct usage is not always understood by either developers or customers. Developers must code differently for nulls because the correct syntax is different from the syntax for non-null values, and not doing so leads to bugs that surface only at run time. Queries can also return different results depending on the CONCAT_NULL_YIELDS_NULL setting. The following query should, and does, return 'AB':

declare @nvar1 nchar(1)
declare @nvar2 nchar(1)
set @nvar1 = 'A'
set @nvar2 = 'B'
select @nvar1 + @nvar2

If you set one of the variables to null:

set @nvar1 = NULL

the result returned depends on the CONCAT_NULL_YIELDS_NULL setting. With the setting ON (the default), the result is NULL. With it OFF, the null is treated as an empty string and the result is 'B'.
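The following short T-SQL sketch, built on a hypothetical Contact table, shows what "coding differently for nulls" means in practice:

- An equality test against NULL never matches.
- A not-equal test silently skips null rows.
- COALESCE makes concatenation independent of the CONCAT_NULL_YIELDS_NULL setting.

The comments assume the default ANSI_NULLS ON behavior.

```sql
-- Hypothetical table with a nullable column.
CREATE TABLE dbo.Contact (
    ContactID  int          NOT NULL PRIMARY KEY,
    MiddleName nvarchar(50) NULL
);
INSERT dbo.Contact VALUES (1, N'Lee'), (2, NULL);

-- Wrong: "= NULL" evaluates to UNKNOWN for every row, so nothing is returned.
SELECT ContactID FROM dbo.Contact WHERE MiddleName = NULL;     -- no rows

-- Right: nulls need their own syntax.
SELECT ContactID FROM dbo.Contact WHERE MiddleName IS NULL;    -- ContactID 2

-- "<> N'Lee'" is also UNKNOWN for the null row, which must be included explicitly.
SELECT ContactID FROM dbo.Contact
WHERE MiddleName <> N'Lee' OR MiddleName IS NULL;              -- ContactID 2

-- Concatenation that gives the same answer whatever CONCAT_NULL_YIELDS_NULL is set to.
DECLARE @nvar1 nchar(1) = NULL, @nvar2 nchar(1) = N'B';
SELECT COALESCE(@nvar1, N'') + COALESCE(@nvar2, N'');          -- B
```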

Three-Valued Logic
Many of us are not completely familiar with three-valued logic. Take the following test to illustrate this point. Complete each box for two expressions that are ANDed. (That is, if exp1 evaluates to TRUE AND exp2 evaluates to TRUE, the Boolean AND of both is TRUE.)

AND     TRUE    FALSE   UNK
TRUE
FALSE
UNK

Now complete the following for OR expressions.



OR      TRUE    FALSE   UNK
TRUE
FALSE
UNK

Answers are in the following tables. The point is that avoiding nulls avoids this situation and its potential problems for customers.

Three-Valued Logic Answers

AND     TRUE    FALSE   UNK
TRUE    true    false   unk
FALSE   false   false   false
UNK     unk     false   unk

OR      TRUE    FALSE   UNK
TRUE    true    true    true
FALSE   true    false   unk
UNK     true    unk     unk
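The answer tables can be checked directly in T-SQL. T-SQL has no Boolean data type, but it evaluates predicates with three-valued logic. In this sketch each truth value is produced by a comparison: V = 1 is TRUE when V is 1, FALSE when V is 0, and UNKNOWN when V is NULL (under the default ANSI_NULLS ON). A CASE expression then reports which of the three values a compound predicate produced. The nine output rows should match the two answer tables.

```sql
-- One row per truth value; the predicate "V = 1" yields TRUE, FALSE, or UNKNOWN.
WITH Truth AS (
    SELECT 'TRUE' AS Name, 1 AS V UNION ALL
    SELECT 'FALSE', 0        UNION ALL
    SELECT 'UNK',   NULL
)
SELECT a.Name AS A,
       b.Name AS B,
       -- First branch fires only for TRUE, second only for FALSE
       -- (NOT UNKNOWN is still UNKNOWN), so ELSE catches UNKNOWN.
       CASE WHEN a.V = 1 AND b.V = 1       THEN 'true'
            WHEN NOT (a.V = 1 AND b.V = 1) THEN 'false'
            ELSE 'unk' END AS [A AND B],
       CASE WHEN a.V = 1 OR b.V = 1        THEN 'true'
            WHEN NOT (a.V = 1 OR b.V = 1)  THEN 'false'
            ELSE 'unk' END AS [A OR B]
FROM Truth AS a
CROSS JOIN Truth AS b;
```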

Compensating Actions for Denormalization


It is very important that every denormalization have a compensating action. Normalization provides conformance to the rules of the relational model, for which relational engines are designed. Thus, normalization gives the optimizer the best chance to do its job effectively. It also protects data consistency. Both of these are critical elements of any system. If a decision to denormalize is made, there must be a compensating action to ensure data consistency. In the example of the folder hierarchy and a materialized path, if a folder is moved by changing its parent, the materialized path is not automatically updated, either for that folder or for any folder beneath it. The compensating action is a trigger or programmatic solution that changes those materialized paths as well. A denormalization always has other repercussions that may make it less scalable, or make it perform less well, than intended. In the case of the folder hierarchy, avoiding the work required to build the path on every query is worth the cost of the compensating action.
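Here is a sketch of such a compensating trigger. It assumes a hypothetical Folder table whose MaterializedPath has the form /Root/Docs/. When folders are inserted, moved, or renamed, the trigger recomputes the path of every affected folder and all of its descendants. It starts from the topmost affected folders, so each one builds on a parent path that is already correct. Hierarchies deeper than 100 levels would exceed the default MAXRECURSION limit, so treat this as a starting point rather than a finished implementation. The GO separators are for sqlcmd or Management Studio, because CREATE TRIGGER must start its own batch.

```sql
CREATE TABLE dbo.Folder (
    FolderID         int            NOT NULL PRIMARY KEY,
    ParentFolderID   int            NULL REFERENCES dbo.Folder (FolderID),  -- NULL for a root folder
    FolderName       nvarchar(255)  NOT NULL,
    MaterializedPath nvarchar(4000) NULL   -- denormalized; maintained by the trigger
);
GO

CREATE TRIGGER dbo.trg_Folder_MaintainPath ON dbo.Folder
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- Only structural changes invalidate paths. This also makes the trigger's own
    -- update of MaterializedPath a no-op if recursive triggers are enabled.
    IF NOT (UPDATE(ParentFolderID) OR UPDATE(FolderName)) RETURN;

    WITH Affected AS (            -- changed folders plus all of their descendants
        SELECT i.FolderID FROM inserted AS i
        UNION ALL
        SELECT c.FolderID
        FROM Affected AS a
        JOIN dbo.Folder AS c ON c.ParentFolderID = a.FolderID
    ),
    NewPaths AS (                 -- rebuild from the topmost affected folders down
        SELECT f.FolderID,
               CAST(ISNULL(p.MaterializedPath, N'/') + f.FolderName + N'/' AS nvarchar(4000)) AS NewPath
        FROM dbo.Folder AS f
        LEFT JOIN dbo.Folder AS p ON p.FolderID = f.ParentFolderID
        WHERE f.FolderID IN (SELECT FolderID FROM Affected)
          AND (f.ParentFolderID IS NULL
               OR f.ParentFolderID NOT IN (SELECT FolderID FROM Affected))
        UNION ALL
        SELECT c.FolderID, CAST(n.NewPath + c.FolderName + N'/' AS nvarchar(4000))
        FROM NewPaths AS n
        JOIN dbo.Folder AS c ON c.ParentFolderID = n.FolderID
    )
    UPDATE f
    SET MaterializedPath = n.NewPath
    FROM dbo.Folder AS f
    JOIN NewPaths AS n ON n.FolderID = f.FolderID;
END;
GO

-- Usage: build a small tree, then move Docs (and with it Reports) under Archive.
INSERT dbo.Folder (FolderID, ParentFolderID, FolderName) VALUES
    (1, NULL, N'Root'), (2, 1, N'Docs'), (3, 2, N'Reports'), (4, 1, N'Archive');
UPDATE dbo.Folder SET ParentFolderID = 4 WHERE FolderID = 2;

SELECT FolderID, MaterializedPath FROM dbo.Folder ORDER BY FolderID;
-- FolderID 3 now has /Root/Archive/Docs/Reports/
```

The trigger keeps the denormalized column correct only for changes that go through the Folder table. Bulk operations that disable triggers would need to rebuild the paths themselves.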

Parent/Child Tables and Sequence IDs


Semantic models tend to be very deep in parent/child relationships; this is simply a common characteristic that is not often found in OLTP or OLAP data models. Problems with incorrectly modeling parent/child relationships typically do not affect scalability for the parent and child, but for all the grandchildren. Each generation compounds the problem, because each generation provides a natural narrowing of the data. Without the right keys, a parent/child relationship also looks to the optimizer much like the relationship between a regular table and a domain table. (A domain table is generally a type table, with a foreign key relationship with a regular table. This table provides the domain of allowed values for the column in the regular table.) The following tables represent common entities on a manufacturing plant floor (simplified for this example). There is a sequence ID for each table, and there are currently no relationships defined. The optimizer therefore does not have any information about relationships (these relationships are a concept only and are not defined in the system). A site is the plant building, an area is inside the plant building, a line is a production line inside an area, a node is a piece of equipment on that line, and sensors live on nodes.

Figure 11
This is a common data model that we see when databases are migrated from Oracle. Even though some column names are the same in each table, that does not indicate to the system that the tables have a relationship. The optimizer can make a good guess about whether there is a relationship between the SiteID in Site and the SiteID in Area, but there is no guarantee that the guess will be correct. A model with foreign key constraints for each table looks like this:

Figure 12
The problem with this model is that the optimizer (as well as a developer) will see these as domain tables that constrain the value of the NodeID in the Sensor table. But that is not actually correct. A sensor cannot exist by itself: it is a subpart of a node, so Sensor is a child table. This is true of the entire group of tables: each is a child of the previous one. Children are identified in the relational model by a compound key (the parent's key plus an instance or line number that identifies the child within that parent).

Figure 13 (panels: Domain Relationship; Parent/Child Relationship)

Data modeling tools usually pull the primary key from the parent to the child table when you draw that relationship. This is what the model should actually look like:


Figure 14
It is good practice to keep the parent keys in the same order in each child table for ease of reading. It is very clear in this diagram that each table is a parent of the next. Another rule of relational databases is that you must join on the entire key; this model makes that clear as well. Joining the Sensor table to the Node table requires a join on SiteID, AreaID, LineID, and NodeID on both sides of the join. Note that incorrect modeling, such as the model in Figure 12, can lead to the entry of incorrect data.

This task looks very onerous: there are so many keys and so much code. However, it is normal form, and it makes a huge difference in scalability in semantic models. The optimizer has the information it needs to pick correct join execution plans, and you can jump into the hierarchy at any point. For instance, finding all the sensors in one area can be accomplished by querying only the Sensor table. In the previous model in Figure 12, the parent keys are not carried down, and you would have to join every single table to get that information.

Watch for queries that return duplicate data and that use a DISTINCT clause to fix this. Duplicate data is a clear sign of a model problem, and the problem is usually just this one: domain relationships instead of parent/child relationships. The DISTINCT clause should not be necessary to make a query correct. To test the physical data model, reverse engineer the database into Microsoft Visio® and see whether all of the relationships are intact. If Visio can pull out the entire model and relationships, so can the optimizer.
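The parent/child (identifying) relationships can be sketched in T-SQL as follows. Table and key names follow the plant example; the column types and the SiteName and AreaName columns are assumptions. Each child's primary key is its parent's full key plus its own instance number, and each foreign key uses the parent's entire key. Because the carried-down keys lead the Sensor table's clustered primary key, the "all sensors in one area" question is answered from the Sensor table alone.

```sql
CREATE TABLE dbo.Site (
    SiteID   int           NOT NULL PRIMARY KEY,
    SiteName nvarchar(100) NOT NULL UNIQUE
);
CREATE TABLE dbo.Area (
    SiteID   int           NOT NULL REFERENCES dbo.Site (SiteID),
    AreaID   int           NOT NULL,   -- instance number within the site
    AreaName nvarchar(100) NOT NULL,
    PRIMARY KEY (SiteID, AreaID)
);
CREATE TABLE dbo.Line (
    SiteID int NOT NULL, AreaID int NOT NULL, LineID int NOT NULL,
    PRIMARY KEY (SiteID, AreaID, LineID),
    FOREIGN KEY (SiteID, AreaID) REFERENCES dbo.Area (SiteID, AreaID)
);
CREATE TABLE dbo.Node (
    SiteID int NOT NULL, AreaID int NOT NULL, LineID int NOT NULL, NodeID int NOT NULL,
    PRIMARY KEY (SiteID, AreaID, LineID, NodeID),
    FOREIGN KEY (SiteID, AreaID, LineID) REFERENCES dbo.Line (SiteID, AreaID, LineID)
);
CREATE TABLE dbo.Sensor (
    SiteID int NOT NULL, AreaID int NOT NULL, LineID int NOT NULL, NodeID int NOT NULL,
    SensorID int NOT NULL,
    PRIMARY KEY (SiteID, AreaID, LineID, NodeID, SensorID),
    FOREIGN KEY (SiteID, AreaID, LineID, NodeID)
        REFERENCES dbo.Node (SiteID, AreaID, LineID, NodeID)
);

-- Joining a child to its parent uses the parent's entire key.
SELECT s.SensorID, n.NodeID
FROM dbo.Sensor AS s
JOIN dbo.Node AS n
  ON  n.SiteID = s.SiteID AND n.AreaID = s.AreaID
  AND n.LineID = s.LineID AND n.NodeID = s.NodeID;

-- "All sensors in one area": a range seek on Sensor's clustered key, no other tables.
SELECT SensorID FROM dbo.Sensor WHERE SiteID = 1 AND AreaID = 2;
```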

Surrogate Keys
The fact that the number of columns in the primary key grows as you move down a hierarchy certainly looks problematic, and many would be tempted to use a surrogate key instead of the composite key. Surrogate keys were originally used to save space, which mattered because keys are carried into all other indexes; that was the major concern a number of years ago. Surrogate keys have no business meaning: they are meant to stand in for something else. A CompanyID in a Company table is a surrogate key because it takes the place of the CompanyName, which is long, hard to use, and may change. (A primary key must not change.) In the plant model, the SiteID is a surrogate key for the SiteName. However, the SiteName column is still in the table. Generally, when a surrogate key is used, you should still store the business data in the table; otherwise that data is lost. Surrogate keys save space only where primary keys are carried to other indexes or tables. In our example, we could use a surrogate for the Sensor or Node tables, but the SiteID, AreaID, and LineID must be in the table because they are part of the definition of the table.
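The point about keeping business data next to a surrogate key can be sketched with a hypothetical Site table. SiteID is an IDENTITY surrogate, and SiteName stays in the table under a UNIQUE constraint. The business identity is therefore neither lost nor allowed to be duplicated, and renaming a site never changes the key.

```sql
CREATE TABLE dbo.Site (
    SiteID   int IDENTITY(1, 1) NOT NULL PRIMARY KEY,  -- surrogate: no business meaning, never changes
    SiteName nvarchar(100)      NOT NULL               -- business data stays in the table...
        CONSTRAINT UQ_Site_SiteName UNIQUE             -- ...and is still enforced as a key
);

INSERT dbo.Site (SiteName) VALUES (N'Plant A');

-- Renaming touches one row; SiteID, and every child key that carries it, stays put.
UPDATE dbo.Site SET SiteName = N'Plant A (North)' WHERE SiteName = N'Plant A';

-- A second site with the same name is rejected by UQ_Site_SiteName.
BEGIN TRY
    INSERT dbo.Site (SiteName) VALUES (N'Plant A (North)');
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();
END CATCH;
```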

Sequence IDs
Sequence IDs tend to be found in databases migrated from Oracle. Because Oracle DBAs are usually available to optimize queries, the problems that sequence IDs cause for relational query engines are hidden in the cost of the DBAs' services. One of the main architecture goals for SQL Server was to minimize the need for optimization experts. To achieve this low total cost of ownership (TCO), take the time to model your application correctly, and remove ALL sequence IDs. Let the optimizer do its work for you.

Data Model Designs


Data model designs are the first and most important piece of any database application. The first logical model is about the business requirements. After the logical model is complete and approved, the physical model is materialized into a database with constraints, indexes, nullability, data types, and so on. This enables the optimizer to work as a relational optimizer. The data model also indicates how queries can be correctly written: joins should be used only along the relationship lines shown in the diagram. Because there tends to be a lack of modeling discipline in some database groups, developers join on columns that seem to have the same name, or on whatever criteria they can come up with. In the plant model, although there is an AreaID in the Sensor table, there is no relationship line between Sensor and Area, so there should not be a join between those two tables. In fact, if a query is written that way, duplicate rows will be returned and a DISTINCT clause would be required. Joining all of the tables correctly results in good performance: joining numerous tables is not a problem; the problem is incorrect joining of numerous tables.

Indexing
After the semantic data model is relationally correct, the next step is to add indexes for all foreign key constraints shown in the model, and to add a single-column index for each column that you anticipate will end up in predicates. When there are many small indexes, the optimizer can combine them in the most appropriate manner. It is a best practice to make the primary key the clustered key in semantic models, unless there are good reasons not to (such as performance tests that reveal a problem with this). Because the model must stay relationally correct and tight for scalability, the keys are eventually carried forward anyway, so a clustered primary key saves space and gives the optimizer the best options. This should take care of about 80% of your query optimization needs. You will find the other 20% during your performance and scalability tests. It is important to reiterate that there is an optimal balance between an object view and a relational view of the data, and you can find it for your application only by testing.
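SQL Server does not create indexes on foreign key columns automatically, so one way to apply the first step is to generate them from the relationship metadata. The sketch below does this against SQLite through Python's sqlite3 module as a stand-in; the helper names and the IX_<table>_<columns> naming scheme are hypothetical, and the clustered-primary-key recommendation, which is specific to SQL Server, is not shown.

```python
import sqlite3

def index_foreign_keys(conn):
    """Create an index on the columns of every declared foreign key.

    Relationship metadata comes from SQLite's PRAGMA foreign_key_list.
    Returns the generated index names (CREATE INDEX IF NOT EXISTS skips
    any index that already exists under the same name).
    """
    names = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        # Rows: (id, seq, table, from, to, on_update, on_delete, match).
        # A composite foreign key has one row per column, sharing the same id.
        fk_columns = {}
        for fk_id, seq, _parent, column, *_ in conn.execute(
                f"PRAGMA foreign_key_list({table})"):
            fk_columns.setdefault(fk_id, []).append((seq, column))
        for pairs in fk_columns.values():
            cols = [column for _, column in sorted(pairs)]  # keep key order
            name = f"IX_{table}_{'_'.join(cols)}"
            conn.execute(
                f"CREATE INDEX IF NOT EXISTS {name} ON {table} ({', '.join(cols)})")
            names.append(name)
    return names

def index_predicate_columns(conn, predicate_columns):
    """Create a single-column index for each column expected in predicates."""
    for table, columns in predicate_columns.items():
        for column in columns:
            conn.execute(
                f"CREATE INDEX IF NOT EXISTS IX_{table}_{column} ON {table} ({column})")

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Area (SiteID INTEGER NOT NULL, AreaID INTEGER NOT NULL,
                   PRIMARY KEY (SiteID, AreaID));
CREATE TABLE Sensor (SensorID INTEGER PRIMARY KEY, SiteID INTEGER NOT NULL,
                     AreaID INTEGER NOT NULL, SensorName TEXT NOT NULL,
                     FOREIGN KEY (SiteID, AreaID) REFERENCES Area (SiteID, AreaID));
""")
created = index_foreign_keys(conn)                         # FK index on Sensor
index_predicate_columns(conn, {"Sensor": ["SensorName"]})  # anticipated filter
```

Which columns count as "anticipated predicates" is a design-time judgment, so the second helper takes them as explicit input rather than guessing.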

Query Builders
Most applications designed by vendors for end-customer use supply some way to perform ad hoc searches without knowing ahead of time what those searches might contain. Generally, a query builder page is created that allows users to fill in whatever data is needed and perform a search. These pages often perform poorly and inhibit scalability because the queries they generate tend to turn into table scans. Having many table scans limits scalability because there is too much contention on the physical data layout. Because the query must be built programmatically, there might be many OR clauses that combine the user-selected data. After a certain number of OR clauses, these queries generally degrade into table or index scans, because the optimizer determines that it is more efficient to scan each row and check it for inclusion against the many criteria. To lessen the impact of this, try to avoid data conversions; they generally degrade to a table or index scan because the index is no longer of value. If you keep metadata about the columns that are included in the query builder page, you can programmatically ensure that numbers are entered into the query as numbers, and not as converted strings or another data type. Design your queries to use seeks as much as possible.
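The metadata idea can be sketched as follows. The COLUMN_TYPES dictionary and the build_search helper are hypothetical, and SQLite via Python's sqlite3 module stands in for SQL Server. Each raw form value is converted to its column's type before it becomes a query parameter, and columns that are not in the metadata are rejected.

```python
import sqlite3

# Hypothetical metadata kept for the query builder page:
# column name -> Python type the user's input must be converted to.
COLUMN_TYPES = {"SensorID": int, "SensorName": str, "Reading": float}

def build_search(table, criteria):
    """Build a parameterized query that ANDs together the user's criteria.

    `table` comes from application code, never from user input. Each value
    is converted to its column's type here, so the query sends a number as
    a number rather than as a string the database would have to convert.
    """
    if not criteria:
        raise ValueError("at least one search criterion is required")
    clauses, params = [], []
    for column, raw_value in criteria.items():
        if column not in COLUMN_TYPES:                  # only known columns
            raise ValueError(f"unknown column: {column}")
        clauses.append(f"{column} = ?")
        params.append(COLUMN_TYPES[column](raw_value))  # e.g. "42" -> 42
    return f"SELECT * FROM {table} WHERE " + " AND ".join(clauses), params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sensor (SensorID INTEGER PRIMARY KEY, SensorName TEXT, Reading REAL)")
conn.execute("INSERT INTO Sensor VALUES (42, 'Temp-1', 20.5)")

sql, params = build_search("Sensor", {"SensorID": "42"})  # raw text from a form field
rows = conn.execute(sql, params).fetchall()
```

Using bound parameters rather than concatenated literals also keeps user text out of the SQL string itself.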

Paging
Customers can submit queries that return large numbers of results, even though the customer often does not actually need all of the returned data. To keep a system scalable, you must design into the application a way to limit this impact. Including a TOP clause does not guarantee that the engine avoids scanning the entire table before it returns the limited number of rows; for example, if the rows must be sorted on a column with no supporting index, every qualifying row must be read and sorted first. The underlying work must still be performed, after which the end results are limited, so large amounts of resources are still consumed and potentially locked. A useful design pattern is to page the results, giving the user the top few results with an option to request more data. Do not provide user controls that allow scrolling through the entire result set, because doing so requires that all data be returned. Use the design pattern often used in web searches: a result set of a certain size is returned, and the option to retrieve more is offered.
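One common way to implement this pattern is keyset (or "seek") paging, sketched below; the paper does not prescribe a specific technique. SQLite via Python's sqlite3 module stands in for SQL Server, where the same query would use SELECT TOP (n) ... WHERE SensorID > @last ORDER BY SensorID. The fetch_page helper and the Sensor table are hypothetical.

```python
import sqlite3

def fetch_page(conn, page_size, after_id=None):
    """Return (rows, has_more) for one page of sensors ordered by SensorID.

    Each request resumes after the last key already shown, so the query can
    seek on the primary key instead of re-reading all earlier rows. One extra
    row is requested only to learn whether to offer a "show more" option.
    """
    if after_id is None:
        rows = conn.execute(
            "SELECT SensorID, SensorName FROM Sensor ORDER BY SensorID LIMIT ?",
            (page_size + 1,),
        ).fetchall()
    else:
        rows = conn.execute(
            "SELECT SensorID, SensorName FROM Sensor "
            "WHERE SensorID > ? ORDER BY SensorID LIMIT ?",
            (after_id, page_size + 1),
        ).fetchall()
    return rows[:page_size], len(rows) > page_size

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sensor (SensorID INTEGER PRIMARY KEY, SensorName TEXT NOT NULL)")
conn.executemany("INSERT INTO Sensor VALUES (?, ?)", [(i, f"S{i}") for i in range(1, 26)])

page1, more1 = fetch_page(conn, 10)                          # first screen
page2, more2 = fetch_page(conn, 10, after_id=page1[-1][0])   # user asked for more
page3, more3 = fetch_page(conn, 10, after_id=page2[-1][0])   # last, partial page
```

Because each page is bounded and ordered by an indexed key, the cost of a request stays roughly constant no matter how far the user pages.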

Lazy Loading
Like paging, lazy loading is a client design decision that can have a large impact on the scalability of the database. Trees should initially be displayed collapsed, with only the data required for the collapsed version. When a node is selected, a database call should be made to return just that node's data. Developers often worry about the round trips required for lazy loading trees, but each request returns a small amount of data that should come back well within the time needed to display it, and limiting resource usage in this way increases the scalability of the application.
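A minimal sketch of the per-node call, assuming a hypothetical self-referencing Node table (NodeID, ParentID, Name) and using SQLite via Python's sqlite3 module as a stand-in for SQL Server. Only the direct children of the expanded node are returned, each with a flag telling the client whether to draw an expand control.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Node (
    NodeID   INTEGER PRIMARY KEY,
    ParentID INTEGER REFERENCES Node (NodeID),   -- NULL for root nodes
    Name     TEXT NOT NULL
);
CREATE INDEX IX_Node_ParentID ON Node (ParentID);  -- supports the child lookup
INSERT INTO Node VALUES (1, NULL, 'Plant'), (2, 1, 'Area A'),
                        (3, 1, 'Area B'), (4, 2, 'Line 1');
""")

def load_children(conn, parent_id):
    """Return (NodeID, Name, has_children) for the direct children of one node.

    Called only when the user expands that node. has_children (1 or 0) lets
    the client draw an expand control without fetching the grandchildren.
    """
    return conn.execute(
        """
        SELECT n.NodeID, n.Name,
               EXISTS (SELECT 1 FROM Node AS c WHERE c.ParentID = n.NodeID)
        FROM Node AS n
        WHERE n.ParentID IS ?          -- SQLite's IS also matches NULL (roots)
        ORDER BY n.NodeID
        """,
        (parent_id,),
    ).fetchall()

roots = load_children(conn, None)   # initial, collapsed display
children = load_children(conn, 1)   # user expands 'Plant'
```

The ParentID index keeps each expand request a small seek rather than a scan of the whole tree.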

Semantic/Metadata/Runtime Data Model Checklist
Evaluate semantic models for the following to help isolate potential scalability issues. Mitigating each area at design time will help to limit run-time problems.
- Build a logical model and make sure it is approved by the business
- Subtype from a Universal model
- Normalize
- If you denormalize, add a compensating action
- Remove all Sequence IDs
- Model parent/child relationships correctly
- Create a physical model and implement it in the database
- Avoid nullable relationships as much as possible
- Reverse engineer the database into Visio to check for correctness
- Design the user interface to support paging
- Lazy load trees as much as possible
- Evaluate all queries that require a DISTINCT clause
- Load test for scalability

Summary
Semantic data models can be very complex, and until semantic databases are commonly available, the challenge remains to find the optimal balance between the pure object model and the pure relational model for each application. The key to success is to understand the issues, make the necessary mitigations for those issues, and then test, test, and test. Scalability testing is a critical success factor if you are going to find that optimal design.

For more information:
- SQL Server Web site
- SQL Server TechCenter
- SQL Server DevCenter

Did this paper help you? Please give us your feedback. Tell us, on a scale of 1 (poor) to 5 (excellent), how you would rate this paper and why you have given it this rating. For example:
- Are you rating it high because it has good examples, excellent screenshots, clear writing, or another reason?
- Are you rating it low because of poor examples, fuzzy screenshots, or unclear writing?

This feedback will help us improve the quality of white papers we release. Send feedback.
