
DATA WAREHOUSE

SHIVA

OLTP (ONLINE TRANSACTION PROCESSING)
CHARACTERISTICS:
Volatile data
Normalization
Maintains current data
It doesn't support business analysis
Designed for transactional processing

History
W. H. Inmon / Ralph Kimball - regarded as the fathers of data warehousing
Designed for decision making process.
Historical data.
De-Normalization.
Static.
[Figure: OLTP, OLTP, OLTP -> ETL -> DWH]

Data warehouse
According to Inmon, author of several data warehouse books, "A data
warehouse is a subject oriented, integrated, time variant, non volatile
collection of data in support of management's decision making process".

Characteristics of DW
Subject-Oriented D.B.
Time variant.
Non-volatile.
Integrated DB.

Time variant
Characteristics and Features

Historical data is kept in a data warehouse. For example, one can retrieve
data from 3 months, 6 months, or 12 months ago, or even older data, from a
data warehouse. This contrasts with a transaction system, where often only
the most recent data is kept. For example, a transaction system may hold the
most recent address of a customer, whereas a data warehouse can hold all
addresses associated with a customer.

Integrated DB
Characteristics and Features

A data warehouse integrates data from multiple data sources.
For example, source A and source B may have different ways of identifying a
product, but in a data warehouse, there will be only a single way of identifying
a product.
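
A minimal sketch of this kind of integration, assuming two hypothetical sources that identify the same product differently (source A by a SKU string, source B by a numeric code). The mapping tables, column names, and warehouse keys are illustrative assumptions, not part of the original slides.

```python
# Hypothetical cross-reference tables: each source's identifier -> one warehouse key.
SOURCE_A_TO_DW = {"SKU-001": "P1", "SKU-002": "P2"}
SOURCE_B_TO_DW = {1001: "P1", 1002: "P2"}


def integrate(rows_a, rows_b):
    """Return rows from both sources keyed by the single warehouse product key."""
    integrated = []
    for row in rows_a:
        integrated.append({"product_key": SOURCE_A_TO_DW[row["sku"]], "qty": row["qty"]})
    for row in rows_b:
        integrated.append({"product_key": SOURCE_B_TO_DW[row["code"]], "qty": row["qty"]})
    return integrated


rows = integrate([{"sku": "SKU-001", "qty": 2}], [{"code": 1001, "qty": 3}])
# Both source rows now refer to the same product, so they can be totalled together.
total_p1 = sum(r["qty"] for r in rows if r["product_key"] == "P1")
print(total_p1)  # 5
```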

Subject Oriented
Characteristics and Features
A data warehouse can be used to analyze a particular subject area. For
example, "sales" can be a particular subject.

NONVOLATILE
Once data is in the data warehouse, it will not change. So, historical data in a
data warehouse should never be altered.

Differences between operational DB & DW

ETL (Extraction, Transformation, Loading)

ETL (Extract, Transform and Load) is a process in data warehousing responsible
for pulling data out of the source systems and placing it into a data warehouse.
ETL involves the following tasks:
- extracting the data from source systems (SAP, ERP, other operational
systems); data from the different source systems is converted into
one consolidated data warehouse format which is ready for transformation
processing.
- transforming the data, which may involve the following tasks:
  - applying business rules (so-called derivations, e.g., calculating new
    measures and dimensions)
  - cleaning (e.g., mapping NULL to 0 or "Male" to "M" and "Female" to "F")
  - filtering (e.g., selecting only certain columns to load)
  - splitting a column into multiple columns and vice versa
  - joining together data from multiple sources (e.g., lookup, merge)
  - transposing rows and columns
  - applying any kind of simple or complex data validation (e.g., if the first
    3 columns in a row are empty then reject the row from processing)
- loading the data into a data warehouse, a data repository, or other
reporting applications
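
The three tasks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names, the sample rows, and the list standing in for the warehouse table are all hypothetical, and a real ETL tool would do far more.

```python
def extract():
    """Stand-in for reading from a source system; returns raw rows."""
    return [
        {"name": "Ann", "gender": "Female", "amount": None, "note": "x"},
        {"name": "Bob", "gender": "Male", "amount": 40, "note": "y"},
        {"name": None, "gender": None, "amount": None, "note": "z"},
    ]


GENDER_CODES = {"Male": "M", "Female": "F"}


def transform(rows):
    """Validate, clean, and filter rows."""
    out = []
    for row in rows:
        # Validation: reject the row if its first 3 columns are all empty.
        if all(row[col] is None for col in ("name", "gender", "amount")):
            continue
        out.append({
            "name": row["name"],
            # Cleaning: map "Male"/"Female" to "M"/"F".
            "gender": GENDER_CODES.get(row["gender"], row["gender"]),
            # Cleaning: map NULL (None) to 0.
            "amount": 0 if row["amount"] is None else row["amount"],
            # Filtering: the "note" column is not selected for loading.
        })
    return out


def load(rows, warehouse):
    """Append transformed rows to the target (a list standing in for a DW table)."""
    warehouse.extend(rows)


warehouse = []
load(transform(extract()), warehouse)
print(warehouse)
```

Keeping extract, transform, and load as separate functions mirrors the task list and lets each step be checked on its own.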

Data Warehouse architecture


DW Approaches
There are two types of approaches:
1. TOP-DOWN Approach (W.H. Inmon)
2. BOTTOM-UP Approach (Ralph Kimball)

TOP-DOWN Approach (W.H. Inmon)

BOTTOM-UP Approach(Ralph Kimball)


Meta Data
The metadata in a data warehouse system describes the definitions, meaning, origin and
rules of the data used in a data warehouse. There are two main types of metadata in a
data warehouse system: business metadata and technical metadata. These two
types reflect the business and the technical points of view on the data.
The data warehouse metadata is usually stored in a metadata repository which is
accessible by a wide range of users.
Typically, the following information needs to be provided to describe business
metadata:
DW Table Name
DW Column Name
Business Name - short and descriptive header information
Definition - extended description with a brief overview of the business rules for the field
Field Type - a flag that may indicate whether a given field stores a key or a discrete value,
whether it is active or not, or what data type it is. The content of that field (or fields) may
vary with business needs.
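
As a rough illustration, one business metadata entry with the fields listed above could be modelled as a small record. The class name, the dictionary used as a repository, and the sample values are assumptions made for this sketch only.

```python
from dataclasses import dataclass


@dataclass
class BusinessMetadata:
    table_name: str     # DW Table Name
    column_name: str    # DW Column Name
    business_name: str  # short, descriptive header
    definition: str     # extended description and business rules
    field_type: str     # e.g. "key" or "discrete value"


# A tiny in-memory stand-in for a metadata repository, keyed by (table, column).
repository = {}


def register(entry):
    """Store one metadata entry under its table and column name."""
    repository[(entry.table_name, entry.column_name)] = entry


register(BusinessMetadata(
    table_name="sales_fact",
    column_name="net_amount",
    business_name="Net Sales Amount",
    definition="Gross amount minus discounts (hypothetical rule).",
    field_type="discrete value",
))

entry = repository[("sales_fact", "net_amount")]
print(entry.business_name)  # Net Sales Amount
```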

Data Warehouse Project Lifecycle

DATA MART
A data mart is a subset of a data warehouse that is designed
for a particular line of business, such as sales, marketing, or
finance. In a dependent data mart, data is derived from
an enterprise-wide data warehouse. In an independent data
mart, data is collected directly from sources.
DMs are known as high-performance query structures.
There are two types of DM:
1. Dependent DM
2. Independent DM
A DM may be purely logical: such a DM contains only views
and never stores data itself.

Dependent DM

Inmon:

According to Bill Inmon, a dependent data mart is one
whose data comes from a data warehouse. Data in a data
warehouse is aggregated, restructured, and summarized
when it passes into the dependent data mart.
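
A minimal sketch of a dependent data mart, using Python's built-in sqlite3 module and implementing the mart as a view over a warehouse table. The table, view, and column names (dw_sales, dm_sales_by_region) and the sample rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical warehouse table holding detail-level rows for several departments.
con.execute("CREATE TABLE dw_sales (region TEXT, department TEXT, amount REAL)")
con.executemany(
    "INSERT INTO dw_sales VALUES (?, ?, ?)",
    [
        ("North", "sales", 100.0),
        ("North", "sales", 50.0),
        ("South", "sales", 70.0),
        ("North", "finance", 10.0),
    ],
)
# The dependent data mart: a view that restricts the warehouse data to one
# line of business and summarizes it by region.
con.execute(
    """
    CREATE VIEW dm_sales_by_region AS
    SELECT region, SUM(amount) AS total_amount
    FROM dw_sales
    WHERE department = 'sales'
    GROUP BY region
    """
)
mart_rows = con.execute(
    "SELECT region, total_amount FROM dm_sales_by_region ORDER BY region"
).fetchall()
print(mart_rows)  # [('North', 150.0), ('South', 70.0)]
```

Because the mart is a view, it holds no data of its own and always reflects the current contents of the warehouse table.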

Independent DM

Kimball:

A stand-alone data mart focuses exclusively on one subject area and is not designed in
an enterprise context. For example, manufacturing has its data mart, human resources
has theirs, finance has theirs, and so on. A stand-alone data mart gets data from multiple
transaction systems in one subject area or department to support specific business needs.
A stand-alone data mart may use a dimensional design or an entity-relationship model. Analytic
or business intelligence tools query data directly from the data mart and present information
to the user. The picture below is a typical stand-alone data mart.
A stand-alone data mart takes a very short time to build and brings visible results to
specific departments at lower cost. However, if you look at the whole system landscape
where multiple data marts exist, you will see that different ETL tools need to be built for
different transaction systems in different technologies, and the data is duplicated in several
data marts. From a business perspective, each data mart is built to address a set of specific
business needs. What if the needs expand? And what if you want to analyze data across
functions or departments? Inconsistent data, such as differing definitions of a product, will make
comparing information between departments impossible.

DW- Database Design


A DW is designed with the following types of schemas:
Star Schema
Snowflake Schema
Galaxy Schema

Fact Table
A fact table is a table in a star schema that contains facts and is connected to dimensions. A fact table typically
has two types of columns: those that contain facts and those that are foreign keys to dimension
tables. The primary key of a fact table is usually a composite key that is made up of all of its
foreign keys.
A fact table might contain either detail-level facts or facts that have been aggregated (fact tables
that contain aggregated facts are often called summary tables instead). A fact table usually
contains facts with the same level of aggregation.
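
The column layout described above can be sketched with Python's built-in sqlite3 module. The table and column names (fact_sales, dim_date, dim_product) and the sample rows are assumptions for illustration; the point is the composite primary key built from the foreign keys.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
con.executescript(
    """
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_date TEXT);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales (
        date_key INTEGER REFERENCES dim_date(date_key),          -- foreign key
        product_key INTEGER REFERENCES dim_product(product_key), -- foreign key
        quantity INTEGER,                                        -- fact
        amount REAL,                                             -- fact
        PRIMARY KEY (date_key, product_key)                      -- composite key
    );
    INSERT INTO dim_date VALUES (20240101, '2024-01-01');
    INSERT INTO dim_product VALUES (1, 'Widget');
    INSERT INTO fact_sales VALUES (20240101, 1, 3, 29.97);
    """
)


def try_insert(row):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        con.execute("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = try_insert((20240101, 1, 1, 9.99))  # same composite key as the first row
orphan_ok = try_insert((20240101, 99, 1, 9.99))    # product 99 is not in dim_product
print(duplicate_ok, orphan_ok)  # False False
```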

Dimension
A dimension is descriptive data about the major aspects of a business.
Dimensions are used to describe key performance indicators, known as
facts.
A dimension table contains dimensions, which are de-normalized.

STAR SCHEMA
Star Schema is a relational database schema for representing multidimensional data. It
is the simplest form of data warehouse schema that contains one or more dimensions
and fact tables. It is called a star schema because the entity-relationship diagram
between dimensions and fact tables resembles a star where one fact table is connected
to multiple dimensions. The center of the star schema consists of a large fact table and
it points towards the dimension tables. The advantages of a star schema are easy slicing
of data, better query performance, and easy understanding of data.
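
A small sketch of a star schema query, again using Python's built-in sqlite3 module. One fact table is joined to two dimension tables, and the data is sliced by fixing a value in one dimension. All table names, column names, and rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE fact_sales (product_key INTEGER, store_key INTEGER, amount INTEGER);
    INSERT INTO dim_product VALUES (1, 'Toys'), (2, 'Books');
    INSERT INTO dim_store VALUES (10, 'Pune'), (20, 'Delhi');
    INSERT INTO fact_sales VALUES (1, 10, 100), (2, 10, 40), (1, 20, 60), (2, 20, 25);
    """
)
# Slice: restrict one dimension (city = 'Pune'), then total the fact by category.
slice_rows = con.execute(
    """
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_store AS s ON s.store_key = f.store_key
    WHERE s.city = 'Pune'
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()
print(slice_rows)  # [('Books', 40), ('Toys', 100)]
```

Every dimension is one join away from the fact table, which is what keeps such queries simple.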

SNOWFLAKE SCHEMA
A snowflake schema is a term that describes a star schema structure
normalized through the use of outrigger tables, i.e. dimension table
hierarchies are broken into simpler tables.
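
To illustrate, the sketch below (Python's built-in sqlite3 module, hypothetical table and column names) moves the category level of a product dimension into its own outrigger table. Reaching the category then takes one more join than it would in a star schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    -- Outrigger table: the category level split out of the product dimension.
    CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        product_name TEXT,
        category_key INTEGER REFERENCES dim_category(category_key)
    );
    CREATE TABLE fact_sales (product_key INTEGER, amount INTEGER);
    INSERT INTO dim_category VALUES (1, 'Toys'), (2, 'Books');
    INSERT INTO dim_product VALUES (1, 'Robot', 1), (2, 'Kite', 1), (3, 'Atlas', 2);
    INSERT INTO fact_sales VALUES (1, 50), (2, 20), (3, 30);
    """
)
# fact -> product dimension -> category outrigger
by_category = con.execute(
    """
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_category AS c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
    """
).fetchall()
print(by_category)  # [('Books', 30), ('Toys', 70)]
```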

Galaxy Schema
Fact Constellation: A schema in which two or more fact
tables are joined through shared dimensions.
Conformed Dimension (Reusable): A dimension which can
be shared by multiple fact tables is known as a conformed
dimension.
A galaxy schema contains many fact tables with some common
dimensions (conformed dimensions). This schema is a
combination of many data marts.
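
A minimal sketch of two fact tables sharing one conformed dimension, using Python's built-in sqlite3 module. The names (fact_sales, fact_shipments, dim_product) and the rows are hypothetical. Each fact table is aggregated on its own and the results are then combined through the shared dimension.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    -- Conformed dimension shared by both fact tables.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales (product_key INTEGER, units_sold INTEGER);
    CREATE TABLE fact_shipments (product_key INTEGER, units_shipped INTEGER);
    INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO fact_sales VALUES (1, 5), (1, 3), (2, 4);
    INSERT INTO fact_shipments VALUES (1, 10), (2, 4), (2, 2);
    """
)
# Aggregate each fact table separately, then combine on the shared dimension.
# Joining the two fact tables row by row would double count.
rows = con.execute(
    """
    SELECT d.name, s.sold, h.shipped
    FROM dim_product AS d
    JOIN (SELECT product_key, SUM(units_sold) AS sold
          FROM fact_sales GROUP BY product_key) AS s
      ON s.product_key = d.product_key
    JOIN (SELECT product_key, SUM(units_shipped) AS shipped
          FROM fact_shipments GROUP BY product_key) AS h
      ON h.product_key = d.product_key
    ORDER BY d.name
    """
).fetchall()
print(rows)  # [('Gadget', 4, 6), ('Widget', 8, 10)]
```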

Factless fact table: A fact table without any facts is known as a
factless fact table.
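
As an illustration, a hypothetical attendance table can serve as a factless fact table: it holds only foreign keys, and the analysis counts rows instead of summing a measure. The sketch uses Python's built-in sqlite3 module, and all names and rows are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE dim_student (student_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY);
    -- Factless fact table: only foreign keys, no measure columns.
    CREATE TABLE fact_attendance (
        student_key INTEGER REFERENCES dim_student(student_key),
        date_key INTEGER REFERENCES dim_date(date_key),
        PRIMARY KEY (student_key, date_key)
    );
    INSERT INTO dim_student VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO dim_date VALUES (20240101), (20240102);
    INSERT INTO fact_attendance VALUES (1, 20240101), (1, 20240102), (2, 20240101);
    """
)
# Each row records that an event happened, so the "measure" is the row count.
days_attended = con.execute(
    """
    SELECT s.name, COUNT(*)
    FROM fact_attendance AS a
    JOIN dim_student AS s ON s.student_key = a.student_key
    GROUP BY s.name
    ORDER BY s.name
    """
).fetchall()
print(days_attended)  # [('Asha', 2), ('Ravi', 1)]
```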

Slowly Changing Dimension (SCD)


Slowly Changing Dimensions: Slowly changing dimensions are the
dimensions in which the data changes slowly, rather than changing
regularly on a time basis.
Type1 SCD: SCD type 1 methodology is used when there is no need to store
historical data in the dimension table. This method overwrites the old data
in the dimension table with the new data.
Type2 SCD: SCD type 2 stores the entire history of the data in the dimension
table. With type 2 we can store unlimited history in the dimension table. In
type 2, you can store the data in three different ways. They are:
Versioning
Flagging
Effective Date
Type3 SCD: In the type 3 method, only the current status and previous status
of the row are maintained in the table. To track these changes, two separate
columns are created in the table.
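
The three types can be sketched on plain Python dictionaries standing in for dimension rows. The column names (city, version, is_current, start_date, end_date, previous_city) are assumptions for illustration; the type 2 function combines versioning, flagging, and effective dates in one row layout, although in practice any one of them may be used alone.

```python
from datetime import date


def scd_type1(row, new_city):
    """Type 1: overwrite the old value; no history is kept."""
    row["city"] = new_city


def scd_type2(rows, customer_id, new_city, change_date):
    """Type 2: close the current row and append a new version of it."""
    current = [r for r in rows if r["customer_id"] == customer_id and r["is_current"]][0]
    current["is_current"] = False        # flagging
    current["end_date"] = change_date    # effective date
    rows.append({
        "customer_id": customer_id,
        "city": new_city,
        "version": current["version"] + 1,  # versioning
        "start_date": change_date,
        "end_date": None,
        "is_current": True,
    })


def scd_type3(row, new_city):
    """Type 3: keep only the current and the previous value, in two columns."""
    row["previous_city"] = row["current_city"]
    row["current_city"] = new_city


t1 = {"customer_id": 1, "city": "Pune"}
scd_type1(t1, "Delhi")

t2 = [{"customer_id": 1, "city": "Pune", "version": 1,
       "start_date": date(2023, 1, 1), "end_date": None, "is_current": True}]
scd_type2(t2, 1, "Delhi", date(2024, 6, 1))

t3 = {"customer_id": 1, "current_city": "Pune", "previous_city": None}
scd_type3(t3, "Delhi")

print(t1["city"], len(t2), t3["previous_city"])  # Delhi 2 Pune
```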

Dimensional Modeling
Dimensional modeling is a methodology or approach used
for designing the star schema. Dimensional modeling
contains the following phases in designing the star
schema:
1. Conceptual Modeling
2. Logical Modeling
3. Physical Modeling

Thank you
