
Designing the Star Schema Database

By Craig Utley

Introduction
Creating a Star Schema Database is one of the most important, and sometimes the final, steps in creating a data warehouse. Given how important this process is to our data warehouse, it is important to understand how we move from a standard, on-line transaction processing (OLTP) system to a final star schema (which, here, we will call an OLAP system). This paper attempts to address some of the issues that have no doubt kept you awake at night. As you stared at the ceiling, wondering how to build a data warehouse, questions began swirling in your mind: What is a Data Warehouse? What is a Data Mart? What is a Star Schema Database? Why do I want/need a Star Schema Database? The Star Schema looks very denormalized. Won't I get in trouble for that? What do all these terms mean? Should I repaint the ceiling?

These are certainly burning questions. This paper will attempt to answer these questions, and show you how to build a star schema database to support decision support within your organization.

Terminology
Usually, you are bored with terminology at the end of a chapter, or buried in an appendix at the back of the book. Here, however, I have the thrill of presenting some terms up front. The intent is not to bore you earlier than usual, but to present a baseline off of which we can operate. The problem in data warehousing is that the terms are often used loosely by different parties. The Data Warehousing Institute (http://www.dwinstitute.com) has attempted to standardize some terms and concepts. I will present my best understanding of the terms I will use throughout this lecture. Please note, however, that I do not speak for the Data Warehousing Institute.

OLTP

!"#$ stand for !nline #ransaction $rocessing. #his is a standard, normali/ed database structure. !"#$ is designed for transactions, which means that inserts, updates, and deletes must be fast. -magine a call center that ta'es orders. Call ta'ers are continually ta'ing calls and entering orders that may contain numerous items. 3ach order and each item must be inserted into a database. Since the performance of the database is critical, we want to ma1imi/e the speed of inserts and updates and deletes%. #o ma1imi/e performance, we typically try to hold as few records in the database as possible. OLAP and Star Schema !"&$ stands for !nline &nalytical $rocessing. !"&$ is a term that means many things to many people. 2ere, we will use the term !"&$ and Star Schema pretty much interchangeably. *e will assume that a star schema database is an !"&$ system. #his is not the same thing that ,icrosoft calls !"&$4 they e1tend !"&$ to mean the cube structures built using their product, !"&$ Services. 2ere, we will assume that any system of read-only, historical, aggregated data is an !"&$ system. -n addition, we will assume an !"&$.Star Schema can be the same thing as a data warehouse. -t can be, although often data warehouses have cube structures built on top of them to speed (ueries. Data Warehouse and Data Mart Before you begin grumbling that - have ta'en two very different things and lumped them together, let me e1plain that Data *arehouses and Data ,arts are conceptually different 5 in scope. 2owever, they are built using the e1act same methods and procedures, so will define them together here, and then discuss the differences. & data warehouse or mart% is way of storing data for later retrieval. #his retrieval is almost always used to support decision-ma'ing in the organi/ation. #hat is why many data warehouses are considered to be DSS Decision-Support Systems%. 6ou will hear some people argue that not all data warehouses are DSS, and that0s fine. 
Some data warehouses are merely archive copies of data. Still, the full benefit of taking the time to create a star schema, and then possibly cube structures, is to speed the retrieval of data. In other words, it supports queries. These queries are often across time. And why would anyone look at data across time? Perhaps they are looking for trends. And if they are looking for trends, you can bet they are making decisions, such as how much raw material to order. Guess what: that's decision support! Enough of the soap box. Both a data warehouse and a data mart are storage mechanisms for read-only, historical, aggregated data. By read-only, we mean that the person looking at the data won't be changing it. If a user wants to look at the sales yesterday for a certain product, they should not have the ability to change that number. Of course, if we know that number is wrong, we need to correct it, but more on that later.

The "historical" part may just be a few minutes old, but usually it is at least a day old. A data warehouse usually holds data that goes back a certain period in time, such as five years. In contrast, standard OLTP systems usually only hold data as long as it is "current" or active. An order table, for example, may move orders to an archive table once they have been completed, shipped, and received by the customer. When we say that data warehouses and data marts hold aggregated data, we need to stress that there are many levels of aggregation in a typical data warehouse. In this section, on the star schema, we will just assume the "base" level of aggregation: all the data in our data warehouse is aggregated to a certain point in time. Let's look at an example: we sell 2 products, dog food and cat food. Each day, we record sales of each product. At the end of a couple of days, we might have data that looks like this:

Date      Order Number   Quantity Sold
                         Dog Food   Cat Food
4/24/99   1              5          2
          2              3          0
          3              2          6
          4              2          2
          5              3          3
4/25/99   1              3          7
          2              2          1
          3              4          0

Table 1

Now, as you can see, there are several transactions. This is the data we would find in a standard OLTP system. However, our data warehouse would usually not record this level of detail. Instead, we summarize, or aggregate, the data to daily totals. Our records in the data warehouse might look something like this:

Date      Quantity Sold
          Dog Food   Cat Food
4/24/99   15         13
4/25/99   9          8

Table 2

You can see that we have reduced the number of records by aggregating the individual transaction records into daily records that show the number of each product purchased each day.
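Rolling the transactions up to daily totals is just a GROUP BY over the transaction table. Here is a minimal sketch in Python with SQLite, using the sample numbers from Table 1 (the table and column names are invented for illustration):

```python
import sqlite3

# In-memory database with a hypothetical OLTP-style transaction table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Orders (
        OrderDate   TEXT,
        OrderNumber INTEGER,
        DogFood     INTEGER,
        CatFood     INTEGER
    )
""")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?)",
    [("4/24/99", 1, 5, 2), ("4/24/99", 2, 3, 0), ("4/24/99", 3, 2, 6),
     ("4/24/99", 4, 2, 2), ("4/24/99", 5, 3, 3),
     ("4/25/99", 1, 3, 7), ("4/25/99", 2, 2, 1), ("4/25/99", 3, 4, 0)])

# Aggregate the individual transactions to the daily totals of Table 2.
daily_totals = conn.execute("""
    SELECT OrderDate, SUM(DogFood), SUM(CatFood)
    FROM Orders
    GROUP BY OrderDate
    ORDER BY OrderDate
""").fetchall()

print(daily_totals)  # [('4/24/99', 15, 13), ('4/25/99', 9, 8)]
```

In a real warehouse this rollup would run as part of a scheduled load process, not at query time.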

We can certainly get from the OLTP system to what we see in the OLAP system just by running a query. However, there are many reasons not to do this, as we will see later.

Aggregations

There is no magic to the term "aggregations." It simply means a summarized, additive value. The level of aggregation in our star schema is open for debate. We will talk about this later. Just realize that almost every star schema is aggregated to some base level, called the grain.

OLTP Systems
!"#$, or !nline #ransaction $rocessing, systems are standard, normali/ed databases. !"#$ systems are optimi/ed for inserts, updates, and deletes4 in other words, transactions. #ransactions in this conte1t can be thought of as the entry, update, or deletion of a record or set of records. !"#$ systems achieve greater speed of transactions through a couple of means) they minimi/e repeated data, and they limit the number of inde1es. >irst, let0s e1amine the minimi/ation of repeated data. -f we ta'e the concept of an order, we usually thin' of an order header and then a series of detail records. #he header contains information such as an order number, a bill-to address, a ship-to address, a $! number, and other fields. &n order detail record is usually a product number, a product description, the (uantity ordered, the unit price, the total price, and other fields. 2ere is what an order might loo' li'e)

Figure 1

Now, the data behind this looks very different. If we had a flat structure, we would see the detail records looking like this:

Order Number:      12345            Customer Zip:         40202
Order Date:        4/24/99          Contact Name:         Jane Doe
Customer ID:       451              Contact Number:       502-555-1212
Customer Name:     ACME Products    Product ID:           A13J2
Customer Address:  123 Main Street  Product Name:         Widget
Customer City:     Louisville       Product Description:  Brass Widget
Customer State:    KY               Category:             Brass Goods
                                    SubCategory:          Widgets
                                    Product Price:        $1.00
                                    Quantity Ordered:     200
                                    Etc.                  Etc.

Table " =otice, however, that for each detail, we are repeating a lot of information) the entire customer address, the contact information, the product information, etc. *e need all of this information for each detail record, but we don0t want to have to enter the customer and product information for each record. #herefore, we use relational technology to tie each detail to the header record, without having to repeat the header information in each detail record. #he new detail records might loo' li'e this)
Order Number:      12473
Product Number:    A4R12J
Quantity Ordered:  200

Table 4

A simplified logical view of the tables might look something like this:

Figure 2

Notice that we do not have the extended cost for each record in the OrderDetail table. This is because we store as little data as possible to speed inserts, updates, and deletes. Therefore, any number that can be calculated is calculated and not stored. We also minimize the number of indexes in an OLTP system. Indexes are important, of course, but they slow down inserts, updates, and deletes. Therefore, we use just enough indexes to get by. Over-indexing can significantly decrease performance.

Normalization
Database normalization is basically the process of removing repeated information. As we saw above, we do not want to repeat the order header information in each order detail record. There are a number of rules in database normalization, but we will not go through the entire process. First and foremost, we want to remove repeated records in a table. For example, we don't want an order table that looks like this:

!igure " -n this e1ample, we will have to have some limit of order detail records in the !rder table. -f we add ;D repeated sets of fields for detail records, we won0t be able to handle that order for ;A products. -n addition, if an order :ust has one product ordered, we still have all those fields wasting space. So, the first thing we want to do is brea' those repeated fields into a separate table, and end up with this)

Figure 4

Now, our order can have any number of detail records.
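As a sketch, the normalized order structure described above might be declared like this (SQLite via Python; the column names are illustrative, not prescriptive):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Header information is stored once per order.
    CREATE TABLE OrderHeader (
        OrderNumber INTEGER PRIMARY KEY,
        OrderDate   TEXT,
        CustomerID  INTEGER
    );
    CREATE TABLE Product (
        ProductID    TEXT PRIMARY KEY,
        ProductName  TEXT,
        ProductPrice REAL
    );
    -- Each detail row carries only keys and a quantity.
    CREATE TABLE OrderDetail (
        OrderNumber     INTEGER REFERENCES OrderHeader(OrderNumber),
        ProductID       TEXT REFERENCES Product(ProductID),
        QuantityOrdered INTEGER
    );
""")
conn.execute("INSERT INTO OrderHeader VALUES (12345, '4/24/99', 451)")
conn.execute("INSERT INTO Product VALUES ('A4R12J', 'Widget', 1.00)")
conn.execute("INSERT INTO OrderDetail VALUES (12345, 'A4R12J', 200)")

# A join reassembles the flat view of Table 3 on demand.
row = conn.execute("""
    SELECT h.OrderNumber, h.OrderDate, p.ProductName, d.QuantityOrdered
    FROM OrderDetail d
    JOIN OrderHeader h ON h.OrderNumber = d.OrderNumber
    JOIN Product p     ON p.ProductID   = d.ProductID
""").fetchone()
print(row)  # (12345, '4/24/99', 'Widget', 200)
```

The point is that each detail row now carries only keys and a quantity; everything else is reachable through a join.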

OLTP Advantages
As stated before, OLTP allows us to minimize data entry. For each detail record, we only have to enter the primary key value from the OrderHeader table, and the primary key of the Product table, and then add the order quantity. This greatly reduces the amount of data entry we have to perform to add a product to an order. Not only does this approach reduce the data entry required, it greatly reduces the size of an OrderDetail record. Compare the size of the records in Table 3 to that in Table 4.

You can see that the OrderDetail records take up much less space when we have a normalized table structure. This means that the table is smaller, which helps speed inserts, updates, and deletes. In addition to keeping the table smaller, most of the fields that link to other tables are numeric. Queries generally perform much better against numeric fields than they do against text fields. Therefore, replacing a series of text fields with a numeric field can help speed queries. Numeric fields also index faster and more efficiently. With normalization, we may also have fewer indexes per table. This means that inserts, updates, and deletes run faster, because each insert, update, and delete may affect one or more indexes. Therefore, with each transaction, these indexes must be updated along with the table. This overhead can significantly decrease our performance.

OLTP Disadvantages
There are some disadvantages to an OLTP structure, especially when we go to retrieve the data for analysis. For one, we now must utilize joins and query multiple tables to get all the data we want. Joins tend to be slower than reading from a single table, so we want to minimize the number of tables in any single query. With a normalized structure, we have no choice but to query from multiple tables to get the detail we want on the report. One of the advantages of OLTP is also a disadvantage: fewer indexes per table. Fewer indexes per table are great for speeding up inserts, updates, and deletes. In general terms, the fewer indexes we have, the faster inserts, updates, and deletes will be. However, again in general terms, the fewer indexes we have, the slower select queries will run. For the purposes of data retrieval, we want a number of indexes available to help speed that retrieval. Since one of our design goals to speed transactions is to minimize the number of indexes, we are limiting ourselves when it comes to doing data retrieval. That is why we look at creating two separate database structures: an OLTP system for transactions, and an OLAP system for data retrieval. Last but not least, the data in an OLTP system is not user friendly. Most IT professionals would rather not have to create custom reports all day long. Instead, we like to give our customers some query tools and have them create reports without involving us. Most customers, however, don't know how to make sense of the relational nature of the database. Joins are something mysterious, and complex table structures (such as associative tables on a bill-of-material system) are hard for the average customer to use. The structures seem obvious to us, and we sometimes wonder why our customers can't get the hang of it. Remember, however, that our customers know how to do a FIFO-to-LIFO revaluation and other such tasks that we don't want to deal with; therefore, understanding relational concepts just isn't something our customers should have to worry about. If our customers want to spend the majority of their time performing analysis by looking at the data, we need to support their desire for fast, easy queries. On the other hand, we need to meet the speed requirements of our transaction-processing activities. If these two requirements seem to be in conflict, they are, at least partially. Many companies have solved this by having a second copy of the data in a structure reserved for analysis. This copy is more heavily indexed, and it allows customers to perform large queries against the data without impacting the inserts, updates, and deletes on the main data. This copy of the data is often not just more heavily indexed, but also denormalized to make it easier for customers to understand.

Reasons to Denormalize
Whenever I ask someone why you would ever want to denormalize, the first (and often only) answer is: speed. We've already discussed some disadvantages to the OLTP structure; it is built for data inserts, updates, and deletes, but not data retrieval. Therefore, we can often squeeze some speed out of it by denormalizing some of the tables and having queries go against fewer tables. These queries are faster because they perform fewer joins to retrieve the same recordset. Joins are slow, as we have already mentioned. Joins are also confusing to many end users. By denormalizing, we can present the user with a view of the data that is far easier for them to understand. Which view of the data is easier for a typical end-user to understand?

Figure 5

Figure 6

The second view is much easier for the end user to understand. We had to use joins to create this view, but if we put all of this in one table, the user would be able to perform this query without using joins. We could create a view that looks like this, but we are still using joins in the background and therefore not achieving the best performance on the query.

How We View Information


All of this leads us to the real question: how do we view the data we have stored in our database? This is not the question of how we view it with queries, but how do we logically view it? For example, are these intelligent questions to ask: How many bottles of Aniseed Syrup did we sell last week? Are overall sales of Condiments up or down this year compared to previous years? On a quarterly and then monthly basis, are Dairy Product sales cyclical? In what regions are sales down this year compared to the same period last year? What products in those regions account for the greatest percentage of the decrease?

All of these questions would be considered reasonable, perhaps even common. They all have a few things in common. First, there is a time element to each one. Second, they all are looking for aggregated data; they are asking for sums or counts, not individual transactions. Finally, they are looking at data in terms of "by" conditions. When I talk about "by" conditions, I am referring to looking at data by certain conditions. For example, if we take the question "On a quarterly and then monthly basis, are Dairy Product sales cyclical?" we can break this down into this: "We want to see total sales by category (just Dairy Products in this case), by quarter or by month."

Here we are looking at an aggregated value, the sum of sales, by specific criteria. We could add further "by" conditions by saying we wanted to see those sales by brand and then by individual product. Figuring out the aggregated values we want to see, like the sum of sales dollars or the count of users buying a product, and then figuring out these "by" conditions is what drives the design of our star schema.

Making the Database Match Our Expectations


If we want to view our data as aggregated numbers broken down along a series of "by" criteria, why don't we just store data in this format? That's exactly what we do with the star schema. It is important to realize that OLTP is not meant to be the basis of a decision support system. The "T" in OLTP stands for transactions, and a transaction is all about taking orders and depleting inventory, and not about performing complex analysis to spot trends. Therefore, rather than tie up our OLTP system by performing huge, expensive queries, we build a database structure that maps to the way we see the world. We see the world much like a cube. We won't talk about cube structures for data storage just yet. Instead, we will talk about building a database structure to support our queries, and we will speed it up further by creating cube structures later.

Facts and Dimensions


When we talk about the way we want to look at data, we usually want to see some sort of aggregated data. These data are called measures. These measures are numeric values that are measurable and additive. For example, our sales dollars are a perfect measure. Every order that comes in generates a certain sales volume measured in some currency. If we sell twenty products in one day, each for five dollars, we generate 100 dollars in total sales. Therefore, sales dollars is one measure we may want to track. We may also want to know how many customers we had that day. Did we have five customers buying an average of four products each, or did we have just one customer buying twenty products? Sales dollars and customer counts are two measures we will want to track. Just tracking measures isn't enough, however. We need to look at our measures using those "by" conditions. These "by" conditions are called dimensions. When we say we want to know our sales dollars, we almost always mean by day, or by quarter, or by year. There is almost always a time dimension on anything we ask for. We may also want to know sales by category or by product. These "by" conditions will map into dimensions: there is almost always a time dimension, and product and geographic dimensions are very common as well. Therefore, in designing a star schema, our first order of business is usually to determine what we want to see (our measures) and how we want to see it (our dimensions).

Mapping Dimensions into Tables


Dimension tables answer the "why" portion of our question: how do we want to slice the data? For example, we almost always want to view data by time. We often don't care what the grand total for all data happens to be. If our data happen to start on June 14, 1989, do we really care how much our sales have been since that date, or do we really care how one year compares to other years? Comparing one year to a previous year is a form of trend analysis, and one of the most common things we do with data in a star schema. We may also have a location dimension. This allows us to compare the sales in one region to those in another. We may see that sales are weaker in one region than any other region. This may indicate the presence of a new competitor in that area, or a lack of advertising, or some other factor that bears investigation. When we start building dimension tables, there are a few rules to keep in mind. First, all dimension tables should have a single-field primary key. This key is often just an identity column, consisting of an automatically incrementing number. The value of the primary key is meaningless; our information is stored in the other fields. These other fields contain the full descriptions of what we are after. For example, if we have a Product dimension (which is common) we have fields in it that contain the description, the category name, the sub-category name, etc. These fields do not contain codes that link us to other tables. Because the fields are the full descriptions, the dimension tables are often fat; they contain many large fields. Dimension tables are often short, however. We may have many products, but even so, the dimension table cannot compare in size to a normal fact table. For example, even if we have 30,000 products in our product table, we may track sales for these products each day for several years. Assuming we actually only sell 3,000 products in any given day, if we track these sales each day for ten years, we end up with this equation: 3,000 products sold x 365 days/year x 10 years equals almost 11,000,000 records! Therefore, in relative terms, a dimension table with 30,000 records will be short compared to the fact table. Given that a dimension table is fat, it may be tempting to denormalize the dimension table. Resist the urge to do so; we will see why in a little while when we talk about the snowflake schema.
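The record-count arithmetic above is easy to check:

```python
# Back-of-the-envelope figures from the text: a 30,000-row product
# dimension versus a fact table recording ~3,000 products sold per day.
products_sold_per_day = 3_000
days_per_year = 365
years = 10

fact_rows = products_sold_per_day * days_per_year * years
dimension_rows = 30_000

print(fact_rows)  # 10950000, i.e. "almost 11,000,000 records"

ratio = fact_rows // dimension_rows
print(ratio)      # 365 -- the fact table dwarfs the dimension table
```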

Dimensional Hierarchies
We have been building hierarchical structures in OLTP systems for years. However, hierarchical structures in an OLAP system are different because the hierarchy for the dimension is actually all stored in the dimension table. The product dimension, for example, contains individual products. Products are normally grouped into categories, and these categories may well contain sub-categories. For instance, a product with a product number of X12J3 may actually be a refrigerator.

Therefore, it falls into the category of major appliance, and the sub-category of refrigerator. We may have more levels of sub-categories, where we would further classify this product. The key here is that all of this information is stored in the dimension table. Our dimension table might look something like this:

Figure 7

Notice that both Category and Subcategory are stored in the table and not linked in through joined tables that store the hierarchy information. This hierarchy allows us to perform "drill-down" functions on the data. We can perform a query that performs sums by category. We can then drill down into that category by calculating sums for the subcategories for that category. We can then calculate the sums for the individual products in a particular subcategory. The actual sums we are calculating are based on numbers stored in the fact table. We will examine the fact table in more detail later.

Consolidated Dimensional Hierarchies (Star Schemas)


The above example (Figure 7) shows a hierarchy in a dimension table. This is how the dimension tables are built in a star schema; the hierarchies are contained in the individual dimension tables. No additional tables are needed to hold hierarchical information. Storing the hierarchy in a dimension table allows for the easiest browsing of our dimensional data. In the above example, we could easily choose a category and then list all of that category's subcategories. We would drill down into the data by choosing an individual subcategory from within the same table. There is no need to join to an external table for any of the hierarchical information. In this overly-simplified example, we have two dimension tables joined to the fact table. We will examine the fact table later. For now, we will assume the fact table has only one number: SalesDollars.

Figure 8

In order to see the total sales for a particular month for a particular category, our SQL would look something like this:

SELECT Sum(SalesFact.SalesDollars) AS SumOfSalesDollars
FROM TimeDimension INNER JOIN (ProductDimension INNER JOIN SalesFact
  ON ProductDimension.ProductID = SalesFact.ProductID)
  ON TimeDimension.TimeID = SalesFact.TimeID
WHERE ProductDimension.Category = 'Brass Goods'
  AND TimeDimension.Month = 3 AND TimeDimension.Year = 1999

To drill down to a subcategory, we would merely change the statement to look like this:

SELECT Sum(SalesFact.SalesDollars) AS SumOfSalesDollars
FROM TimeDimension INNER JOIN (ProductDimension INNER JOIN SalesFact
  ON ProductDimension.ProductID = SalesFact.ProductID)
  ON TimeDimension.TimeID = SalesFact.TimeID
WHERE ProductDimension.SubCategory = 'Widgets'
  AND TimeDimension.Month = 3 AND TimeDimension.Year = 1999
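The same drill-down can be sketched end-to-end with SQLite from Python. The table and column names follow the star schema described above; the sample rows and dollar amounts are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ProductDimension (
        ProductID INTEGER PRIMARY KEY,
        ProductName TEXT, SubCategory TEXT, Category TEXT);
    CREATE TABLE TimeDimension (
        TimeID INTEGER PRIMARY KEY, Day INTEGER, Month INTEGER, Year INTEGER);
    CREATE TABLE SalesFact (
        ProductID INTEGER, TimeID INTEGER, SalesDollars REAL);
""")
conn.executemany("INSERT INTO ProductDimension VALUES (?,?,?,?)",
    [(1, 'Brass Widget', 'Widgets', 'Brass Goods'),
     (2, 'Brass Valve',  'Valves',  'Brass Goods')])
conn.executemany("INSERT INTO TimeDimension VALUES (?,?,?,?)",
    [(1, 1, 3, 1999), (2, 2, 3, 1999)])
conn.executemany("INSERT INTO SalesFact VALUES (?,?,?)",
    [(1, 1, 100.0), (1, 2, 50.0), (2, 1, 25.0)])

def sales_where(column, value):
    # column is a ProductDimension field: 'Category' or 'SubCategory'.
    return conn.execute(f"""
        SELECT SUM(SalesFact.SalesDollars)
        FROM TimeDimension
        JOIN SalesFact        ON TimeDimension.TimeID = SalesFact.TimeID
        JOIN ProductDimension ON ProductDimension.ProductID = SalesFact.ProductID
        WHERE ProductDimension.{column} = ?
          AND TimeDimension.Month = 3 AND TimeDimension.Year = 1999
    """, (value,)).fetchone()[0]

category_total    = sales_where('Category', 'Brass Goods')
subcategory_total = sales_where('SubCategory', 'Widgets')
print(category_total, subcategory_total)  # 175.0 150.0
```

Only the WHERE clause changes as the user drills down; the hierarchy lives entirely inside the ProductDimension table.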

Snowflake Schemas
Sometimes, the dimension tables have the hierarchies broken out into separate tables. This is a more normalized structure, but it leads to more difficult queries and slower response times.

Figure 9 represents the beginning of the snowflake process. The category hierarchy is being broken out of the ProductDimension table. You can see that this structure increases the number of joins and can slow queries. Since the purpose of our OLAP system is to speed queries, snowflaking is usually not something we want to do. Some people try to normalize the dimension tables to save space. However, in the overall scheme of the data warehouse, the dimension tables usually only hold about 1% of the records. Therefore, any space savings from normalizing, or snowflaking, are negligible.

Figure 9

Building the Fact Table


The Fact Table holds our measures, or facts. The measures are numeric and additive across some or all of the dimensions. For example, sales are numeric and we can look at total sales for a product, or category, and we can look at total sales by any time period. The sales figures are valid no matter how we slice the data. While the dimension tables are short and fat, the fact tables are generally long and skinny. They are long because they can hold the number of records represented by the product of the counts in all the dimension tables. For example, take the following simplified star schema:

Figure 10

In this schema, we have product, time and store dimensions. If we assume we have ten years of daily data, 200 stores, and we sell 500 products, we have a potential of 365,000,000 records (3,650 days x 200 stores x 500 products). As you can see, this makes the fact table long. The fact table is skinny because of the fields it holds. The primary key is made up of foreign keys that have migrated from the dimension tables. These fields are just some sort of numeric value. In addition, our measures are also numeric. Therefore, the size of each record is generally much smaller than those in our dimension tables. However, we have many, many more records in our fact table.

Fact Granularity

One of the most important decisions in building a star schema is the granularity of the fact table. The granularity, or frequency, of the data is usually determined by the time dimension. For example, you may want to only store weekly or monthly totals. The lower the granularity, the more records you will have in the fact table. The granularity also determines how far you can drill down without returning to the base, transaction-level data.

,any !"&$ systems have a daily grain to them. #he lower the grain, the more records that we have in the fact table. 2owever, we must also ma'e sure that the grain is low enough to support our decision support needs. !ne of the ma:or benefits of the star schema is that the low-level transactions are summari/ed to the fact table grain. #his greatly speeds the (ueries we perform as part of our decision support. #his aggregation is the heart of our !"&$ system.

Fact Table Size


We have already seen how 500 products sold in 200 stores and tracked for 10 years could produce 365,000,000 records in a fact table with a daily grain. This, however, is the maximum size for the table. Most of the time, we do not have this many records in the table. One of the things we do not want to do is store zero values. So, if a product did not sell at a particular store for a particular day, we would not store a zero value. We only store the records that have a value. Therefore, our fact table is often sparsely populated. Even though the fact table is sparsely populated, it still holds the vast majority of the records in our database and is responsible for almost all of our disk space used. The lower our granularity, the larger the fact table. You can see from the previous example that moving from a daily to a weekly grain would reduce our potential number of records to only slightly more than 52,000,000 records. The data types for the fields in the fact table do help keep it as small as possible. In most fact tables, all of the fields are numeric, which can require less storage space than the long descriptions we find in the dimension tables. Finally, be aware that each added dimension can greatly increase the size of our fact table. If we added one dimension to the previous example that included 20 possible values, our potential number of records would reach 7.3 billion.
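The size figures in this section are straightforward to reproduce:

```python
# Potential fact-table row counts at different grains, from the text.
stores, products = 200, 500

daily_rows = 3_650 * stores * products   # ten years of daily data
weekly_rows = 520 * stores * products    # ten years at 52 weeks/year

print(daily_rows)   # 365000000
print(weekly_rows)  # 52000000

# Adding one more dimension with 20 members multiplies the potential size.
extra_dim_rows = daily_rows * 20
print(extra_dim_rows)  # 7300000000, i.e. 7.3 billion
```

Remember these are potential maximums; sparsity keeps the real table far smaller.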

Changing Attributes
One of the greatest challenges in a star schema is the problem of changing attributes. As an example, we will use the simplified star schema in Figure 10. In the StoreDimension table, we have each store being in a particular region, territory, and zone. Some companies realign their sales regions, territories, and zones occasionally to reflect changing business conditions. However, if we simply go in and update the table, and then try to look at historical sales for a region, the numbers will not be accurate. By simply updating the region for a store, our total sales for that region will not be historically accurate. In some cases, we do not care. In fact, we want to see what the sales would have been had this store been in that other region in prior years. More often, however, we do not want to change the historical data. In this case, we may need to create a new record for the store. This new record contains the new region, but leaves the old store record, and therefore the old regional sales data, intact. This approach, however, prevents us from comparing this store's current sales to its historical sales unless we keep track of its previous StoreID. This can require an extra field called PreviousStoreID or something similar. There are no right and wrong answers. Each case will require a different solution to handle changing attributes.
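One possible shape for the new-record approach is sketched below with SQLite; the StoreDimension columns, the PreviousStoreID field, and the sample values are all hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical StoreDimension with a PreviousStoreID link, as described above.
conn.execute("""
    CREATE TABLE StoreDimension (
        StoreID   INTEGER PRIMARY KEY,
        StoreName TEXT,
        Region    TEXT,
        PreviousStoreID INTEGER)
""")
conn.execute("INSERT INTO StoreDimension VALUES (1, 'Store 42', 'East', NULL)")

# The store moves to the West region: insert a NEW record rather than
# updating the old one, so historical facts still join to the old row.
conn.execute("INSERT INTO StoreDimension VALUES (2, 'Store 42', 'West', 1)")

# Follow PreviousStoreID to gather every surrogate key this store has had.
rows = conn.execute("""
    SELECT StoreID FROM StoreDimension
    WHERE StoreID = 2
       OR StoreID = (SELECT PreviousStoreID
                     FROM StoreDimension WHERE StoreID = 2)
    ORDER BY StoreID
""").fetchall()
store_keys = [r[0] for r in rows]
print(store_keys)  # [1, 2]
```

Queries about the store's full history must gather both surrogate keys; queries about a region see only the sales that occurred while the store actually belonged to it.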

Aggregations
Finally, we need to discuss how to handle aggregations. The data in the fact table is already aggregated to the fact table's grain. However, we often want to aggregate to a higher level. For example, we may want to sum sales to a monthly or quarterly number. In addition, we may be looking for totals just for a product or a category. These numbers must be calculated on the fly using a standard SQL statement. This calculation takes time, and therefore some people will want to decrease the time required to retrieve higher-level aggregations. Some people pre-calculate higher-level aggregations and store them in the database. This requires that the lowest-level records have special values put in them. For example, a TimeDimension record that actually holds weekly totals might have a 9 in the DayOfWeek field to indicate that this particular record holds the total for the week. This approach has been used in the past, but better alternatives exist. These alternatives usually consist of building a cube structure to hold pre-calculated values. We will examine Microsoft's OLAP Services, a tool designed to build cube structures to speed our access to warehouse data.
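A sketch of that special-value scheme, with an assumed flag of 9 in the DayOfWeek field marking pre-calculated weekly-total records (all names and values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TimeDimension (TimeID INTEGER, Week INTEGER, DayOfWeek INTEGER)")
conn.execute("CREATE TABLE SalesFact (TimeID INTEGER, SalesDollars REAL)")

# Seven daily records for week 1, $100 of sales each day.
for day in range(1, 8):
    conn.execute("INSERT INTO TimeDimension VALUES (?, 1, ?)", (day, day))
    conn.execute("INSERT INTO SalesFact VALUES (?, 100.0)", (day,))

# One pre-aggregated record for the same week, flagged with DayOfWeek = 9.
conn.execute("INSERT INTO TimeDimension VALUES (8, 1, 9)")
conn.execute("INSERT INTO SalesFact VALUES (8, 700.0)")

# A weekly query reads the flag row directly instead of summing seven rows.
weekly = conn.execute("""
    SELECT SalesDollars FROM SalesFact
    JOIN TimeDimension USING (TimeID)
    WHERE Week = 1 AND DayOfWeek = 9
""").fetchone()[0]
print(weekly)  # 700.0
```

Note the hazard: every daily-level query must now remember to exclude the DayOfWeek = 9 rows or it will double-count, which is exactly why cube structures are the better alternative.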
