Cooking up a Data Warehouse
Todd Saunders
Abstract
As a data warehousing professional, you know that your environment has many components that must work together and
interact just so to provide valuable information to your business. However, for colleagues who are beginning to familiarize
themselves with data warehousing, it is not always clear what
those components are and how they affect each other. This
article will provide an analogy to help you explain key
DW components and their interactions to neophytes.
Introduction
When explaining the basic components of a data warehouse environment, the analogy I like to use is that of a
restaurant. This is not a new analogy; in a Web search, I
found articles from several years ago that use it. However,
new techniques and technologies have influenced the way
data warehouses are developed and used, so it's time to
update the analogy.
Because we're all familiar with restaurants, you can use
this analogy to explain data warehousing to people
(friends or colleagues, perhaps) who are unfamiliar with
technology in general or with DW components such
as ETL tools or databases. Providing an easy way to
visualize what is happening in one of these solutions can
go a long way toward effectively communicating how a
data warehouse works. It should be easier for someone
new to data warehousing to visualize and remember how
ingredients are stored in a kitchen by type (for example,
frozen, canned, or fresh) than to visualize how data
is stored in a database according to subject area. The
analogy will provide DW neophytes with a clarifying
context about data warehouse solutions, so when they are
pulled into conversations regarding data structures, they
can discuss them meaningfully.
DW components
Data Sources
One of the complexities in data warehousing is determining what data to put into the data warehouse. In our
restaurant analogy, this equates to figuring out what raw
ingredients to order. The key is deciding what is going to
be on the menu. As a restaurant owner, you decide what
soups, salads, appetizers, main dishes, and desserts you
will offer. Each of these dishes requires ingredients, so
the complete menu gives you the total list of ingredients.
If one of the desserts you offer is a milk shake, you know
you will need to have milk, ice cream, and flavoring
available in the kitchen.
In the business world, knowing what information is
required by business users will help you determine what
data needs to be available in your data warehouse.
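For readers who want to see the idea in code, here is a minimal Python sketch of working backward from the "menu." It assumes each report is described by the set of data fields it needs; the report names and field names are hypothetical, not taken from any real system. The union of those sets is the total list of data the warehouse must hold, just as the full menu gives the total list of ingredients.

```python
# Hypothetical report specifications: each report (a "menu item") lists
# the data fields (its "ingredients") it needs.
REPORT_REQUIREMENTS: dict[str, set[str]] = {
    "weekly_receivables_by_customer": {"customer_id", "invoice_date", "amount_due"},
    "sales_by_region": {"region", "sale_date", "sales_amount"},
    "parts_sold_by_period": {"part_id", "sale_date", "units_sold"},
}


def required_data(reports: dict[str, set[str]]) -> set[str]:
    """Return the union of the fields every report needs.

    The full set of reports yields the data the warehouse must hold,
    just as the full menu yields the ingredients to order.
    """
    fields: set[str] = set()
    for needed in reports.values():
        fields |= needed  # shared fields (e.g. sale_date) are counted once
    return fields


if __name__ == "__main__":
    for field in sorted(required_data(REPORT_REQUIREMENTS)):
        print(field)
```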
Data Updates
Another complexity is the timing of data refreshes. In the previous example, a business user needed a weekly report
of receivables by customer, but what if
Data Standardization
Another complexity is standardization and hygiene.
Food orders can help in explaining what happens in the
standardization process.
The key to standardization and hygiene is getting
everything to look the way we expect it to. If we have
filet mignon on the menu, we need to know exactly what to
order and how much of it. We can't just place an order for
meat. We need to make sure we are ordering beef, and
we need to make sure we are ordering beef tenderloin
and not strip steak.
What happens if we are ordering our beef from two different suppliers? One supplier may ship us individual filets.
The other may ship us tenderloins that can be carved into
six filets each. When we want to know how many filet
dinners we'll be able to serve at a given time, we need to
know how many filets and tenderloins are on hand and
how they add up to individual filet mignon meals.
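The filet count is a small unit-standardization step: convert every supplier's shipping unit into one common unit, then add. A minimal Python sketch, assuming only the two units in the example (individual filets, and tenderloins that yield six filets each); `FILETS_PER_UNIT` and `servable_filet_dinners` are hypothetical names.

```python
# Conversion factors to the standard unit (one filet per dinner).
# Assumption taken from the example: a tenderloin carves into six filets.
FILETS_PER_UNIT: dict[str, int] = {
    "filet": 1,
    "tenderloin": 6,
}


def servable_filet_dinners(on_hand: dict[str, int]) -> int:
    """Convert mixed supplier units into filets and return the total."""
    total = 0
    for unit, count in on_hand.items():
        if unit not in FILETS_PER_UNIT:
            # Hygiene: refuse to guess (strip steak is not filet).
            raise ValueError(f"unknown unit: {unit!r}")
        total += count * FILETS_PER_UNIT[unit]
    return total


if __name__ == "__main__":
    print(servable_filet_dinners({"filet": 10, "tenderloin": 3}))
```

With 10 filets and 3 tenderloins on hand, the function returns 28 dinners. Raising an error for an unrecognized unit, rather than counting it, is the standardization point in miniature.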
In business, we need to know the number of items in an
order unit. Our business may have one supplier that ships
six oil filters per order and another that ships 24 filters per
order. In our data warehouse, we need to recognize how
many orders have been received from each supplier and
An experienced database
administrator with deep knowledge
of data warehousing is one of the
keys to creating a successful data
warehouse environment.
It makes more sense to keep information about customers
in one area, store attributes in another, and sales transactions in another. Our sales transaction record would have
the sales information (item ID, units sold, and amount),
with just a store ID and customer ID that can be used to
find out more information about the store or customer
later if needed.
Organizing the data in this manner helps with data
management as well as data retrieval. The ability to access
and retrieve data (relatively) quickly is a key attribute of
data warehouses.
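Here is one way to sketch that organization in Python. The classes, IDs, and sample values are hypothetical; a real warehouse would hold these as database tables, but the shape is the same: the sales record carries its own facts plus a store ID and a customer ID, and the details are looked up only when needed.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str


@dataclass(frozen=True)
class Store:
    store_id: int
    city: str


@dataclass(frozen=True)
class Sale:
    # Only the sales facts plus two keys; customer and store details
    # are kept in their own areas, like ingredients stored by type.
    item_id: str
    units_sold: int
    amount: Decimal
    store_id: int
    customer_id: int


customers = {1: Customer(1, "Acme Corp")}
stores = {10: Store(10, "Denver")}
sales = [Sale("OF-100", 6, Decimal("29.94"), store_id=10, customer_id=1)]


def describe(sale: Sale) -> str:
    """Fetch store and customer details by ID only when they are needed."""
    customer = customers[sale.customer_id]
    store = stores[sale.store_id]
    return f"{customer.name} bought {sale.units_sold} x {sale.item_id} in {store.city}"


if __name__ == "__main__":
    print(describe(sales[0]))
```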
ETL
In data warehousing, one of the biggest parts of the
development effort is the ETL process. ETL (extract,
transform, and load) refers to getting (extracting) data
from point A (the source system), transforming it (e.g.,
changing euros to U.S. dollars), and loading it into point B
(the correct table within the data warehouse). It is a much
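To make the three ETL steps concrete, here is a minimal Python sketch using only the standard library. It assumes a CSV extract from the source system, a single fixed euro-to-dollar rate chosen purely for illustration, and an in-memory SQLite table standing in for the warehouse; the table and function names are hypothetical.

```python
import csv
import io
import sqlite3
from decimal import Decimal

# Hypothetical source extract (point A): order amounts recorded in euros.
SOURCE_CSV = """order_id,amount_eur
1001,100.00
1002,250.50
"""

# Assumed fixed rate for illustration only; a real load would use the
# rate in effect on each transaction date.
EUR_TO_USD = Decimal("1.10")


def extract(text: str) -> list[dict[str, str]]:
    """Extract: read rows from the source system."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict[str, str]]) -> list[tuple[int, str]]:
    """Transform: convert euros to U.S. dollars, rounded to cents."""
    out = []
    for row in rows:
        usd = (Decimal(row["amount_eur"]) * EUR_TO_USD).quantize(Decimal("0.01"))
        out.append((int(row["order_id"]), str(usd)))
    return out


def load(conn: sqlite3.Connection, rows: list[tuple[int, str]]) -> None:
    """Load: insert into the correct warehouse table (point B)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, amount_usd TEXT)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(SOURCE_CSV)))
    print(conn.execute("SELECT * FROM fact_orders").fetchall())
```

Running it loads `(1001, '110.00')` and `(1002, '275.55')`. Using `Decimal` rather than floating point keeps the conversion exact until the final rounding to cents.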
Data Marts
Typically, a data mart contains summarized (or aggregated) data relevant to a particular subject area such as
marketing or sales. In our kitchen, a data mart would be
like food that is partially pre-made to expedite completion of the dish.
Picture one of those fast-food Chinese restaurants where
you can choose rice or noodles, then one or several main
courses such as orange chicken, garlic chicken, or beef
and broccoli. The rice and noodles have already been
cooked and are ready to serve, as are the main courses.
This is how the data mart works. You have your raw
ingredients (raw data) stored in the kitchen in the various
storage areas (freezer, refrigerator, or shelf) just like
the data in the tables in the data warehouse. You then
partially prepare the food (cook the rice or make the
orange chicken), much as you would aggregate the data
for the data mart (sum up all the sales by customer or
calculate total parts sold per time period).
When you want to prepare the final dish (orange chicken
on rice), you can quickly scoop the two ingredients
together on a plate. In the case of the data mart, you can
simply select the appropriate time period and see how
many parts were sold without having to go to the data
warehouse and select each and every individual transaction (some of which may be sales transactions, some
order corrections, and some returns).
The data has already been prepared, so you know that
when you ask for net parts sold during a time period, the
mart has already applied all the necessary logic to the raw
data to present you with the right answer.
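A minimal Python sketch of that "pre-cooking" step follows. It assumes each warehouse row records a period, a transaction type, and a unit count, that corrections are stored as signed adjustments, and that returns are stored as positive quantities to be subtracted; these conventions and the sample figures are hypothetical. The aggregation runs once, and each later question is a single lookup.

```python
from collections import defaultdict

# Hypothetical detail rows: (period, transaction_type, units).
# Assumed conventions: corrections are signed adjustments; returns are
# stored as positive quantities and subtracted.
TRANSACTIONS: list[tuple[str, str, int]] = [
    ("2009-Q1", "sale", 120),
    ("2009-Q1", "correction", -5),
    ("2009-Q1", "return", 10),
    ("2009-Q2", "sale", 80),
    ("2009-Q2", "return", 4),
]


def build_parts_mart(rows: list[tuple[str, str, int]]) -> dict[str, int]:
    """Pre-aggregate net parts sold per period, applying the logic once."""
    mart: dict[str, int] = defaultdict(int)
    for period, kind, units in rows:
        if kind in ("sale", "correction"):
            mart[period] += units
        elif kind == "return":
            mart[period] -= units
        else:
            raise ValueError(f"unknown transaction type: {kind!r}")
    return dict(mart)


# Built once, like rice cooked before the lunch rush.
NET_PARTS_SOLD = build_parts_mart(TRANSACTIONS)

if __name__ == "__main__":
    # Answering the question is now a lookup, not a scan of every transaction.
    print(NET_PARTS_SOLD["2009-Q1"])
```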
Reporting
Consider the dishes delivered to the customers at their
tables. The dishes are analogous to the presentation (i.e.,
reporting) layer in a data warehousing environment.
They are the end product, produced using our raw
ingredients as inputs.
The dishes are ordered by the customers based on choices
from the menu. The menu is not infinite. It has a set
selection from which the customers can choose, because
the kitchen cannot possibly stock all the ingredients necessary to produce any dish that a customer might desire.
Rather, the kitchen is stocked with the raw ingredients
needed to produce any of the items listed on the menu.
When a chef receives an order for veal parmesan, he or
she knows that the necessary ingredients are available in
the kitchen and can find those ingredients and produce
the dish in a timely manner.
Similarly, in our data warehouse environment, the end
users have been identified and their reporting needs captured. These reports are like the menu items. Just as the
menu items require certain raw ingredients, the reports
require certain data. Since the report specifications are
known before the warehouse is built (if the process works
as it should), we can be confident that the needed data is
in the warehouse and available for each report.
Standard Reports
The reporting environment often includes a set of
standard reports. It is useful in many cases for a business
unit to receive the same report every Monday morning
showing sales for the previous week, day, or other time
period, depending on the business need. The point is
that the report arrives at the expected time on a predetermined frequency containing the most recent information.
This is like having the same meal prepared and picked
up or delivered on a regular schedule. Maybe you like to
treat yourself to a favorite meal every Friday for lunch
and have it delivered. You talk to the restaurant, let them
know how you would like the meal prepared, and ask
them to deliver it to your office every Friday at noon. You
get the same food every week and it is prepared with
fresh ingredients each time.
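Behind a standard report is a simple date rule: each Monday, report on the previous Monday-through-Sunday week. A minimal Python sketch, assuming a Monday delivery and a Monday-to-Sunday reporting week (both illustrative choices); the function names and the (date, amount) sales shape are hypothetical.

```python
from datetime import date, timedelta


def previous_week(run_date: date) -> tuple[date, date]:
    """Return (Monday, Sunday) of the week before the week containing run_date."""
    this_monday = run_date - timedelta(days=run_date.weekday())  # Monday == 0
    start = this_monday - timedelta(days=7)
    return start, start + timedelta(days=6)


def next_run(today: date) -> date:
    """Return the next scheduled Monday delivery on or after today."""
    return today + timedelta(days=(7 - today.weekday()) % 7)


def weekly_sales_total(sales: list[tuple[date, int]], run_date: date) -> int:
    """Sum the previous week's sales: same report, fresh data each run."""
    start, end = previous_week(run_date)
    return sum(amount for day, amount in sales if start <= day <= end)


if __name__ == "__main__":
    run = next_run(date.today())
    print(run, previous_week(run))
```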
Configurable Reports
Sometimes when reading a menu, you like a particular
dish but would like to exchange one of the ingredients or
side dishes. You might ask the server for soup instead of
salad, or chips instead of fries. Reporting can operate in a
similar fashion.
Reporting environments will often provide a list of standard reports that an end user can select to view. However,
the end user may want to vary that report slightly. For
example, a particular report may show sales by region
by year for the last 10 years, but the end user would
prefer sales by month over the last 12 months. It is often
possible to make reports like this configurable, where
the end user can select some of these parameters, such
as time period or region, but the basic structure of the
report and the supporting data remain the same.
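A configurable report can be sketched as one function whose parameters the end user sets while the underlying data stays fixed. This minimal Python version assumes rows of (sale date, region, amount) and supports yearly or monthly totals, an optional region filter, and an optional start date; all names and sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Optional

# Hypothetical warehouse rows: (sale_date, region, amount).
SALES: list[tuple[date, str, int]] = [
    (date(2008, 11, 3), "West", 500),
    (date(2009, 2, 14), "East", 300),
    (date(2009, 2, 20), "West", 200),
]


def sales_report(
    rows: list[tuple[date, str, int]],
    grain: str = "year",
    region: Optional[str] = None,
    since: Optional[date] = None,
) -> dict[str, int]:
    """Total sales per time bucket; the user picks grain, region, and start date."""
    if grain not in ("year", "month"):
        raise ValueError("grain must be 'year' or 'month'")
    totals: dict[str, int] = defaultdict(int)
    for day, row_region, amount in rows:
        if region is not None and row_region != region:
            continue  # user-selected region filter
        if since is not None and day < since:
            continue  # user-selected time window
        bucket = str(day.year) if grain == "year" else f"{day.year}-{day.month:02d}"
        totals[bucket] += amount
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    print(sales_report(SALES))
    print(sales_report(SALES, grain="month", since=date(2009, 1, 1)))
```

Here `sales_report(SALES)` gives yearly totals `{'2008': 500, '2009': 500}`, while the monthly call gives `{'2009-02': 500}` from the same rows: the "soup instead of salad" swap, with the kitchen unchanged.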
Summary
It is a little surprising how closely the process of producing meals at a restaurant resembles the processes in a
data warehousing environment. When people you know
are thinking about the components of a data warehouse
solution, this restaurant analogy should be a good
way to help them keep the components and processes
straight and provide a clearer picture of what is going on
in your solution.
Of course, no analogy is perfect, but this one does a
good job of providing an easy-to-understand overview
of what could be a whole new environment for those
new to the technology.
Submissions
www.tdwi.org/journalsubmissions
Materials should be submitted to:
Jennifer Agee, Managing Editor
E-mail: journal@tdwi.org
Upcoming Deadlines
Volume 14, Number 4
Submissions Deadline: September 4, 2009
Distribution Date: December 2009
Volume 15, Number 1
Submissions Deadline: December 18, 2009
Distribution Date: March 2010