DW Concepts Shiva

DATA WAREHOUSE
SHIVA
OLTP (ONLINE TRANSACTION PROCESSING)
CHARACTERISTICS:
Data volatile
Normalization
Maintains current data
It doesn't support business analysis
Designed for transactional processing
History
W. H. Inmon is widely regarded as the father of the data warehouse; Ralph Kimball is known for the dimensional (bottom-up) approach.
Designed for decision making process.
Historical data.
De-Normalization.
Static.
[Diagram: several OLTP sources feed ETL, which loads the DWH]
Data warehouse
According to Inmon, author of several data warehouse books, "A data
warehouse is a subject-oriented, integrated, time-variant, non-volatile
collection of data in support of management's decision-making process".
Characteristics of DW
Subject-Oriented D.B.
Time variant.
Non-volatile.
Integrated DB.
Characteristics and Features
Time variant
Historical data is kept in a data warehouse. For example, one can retrieve
data from 3 months, 6 months, or 12 months ago, or even older data, from a
data warehouse. This contrasts with a transaction system, where often only
the most recent data is kept. For example, a transaction system may hold the
most recent address of a customer, whereas a data warehouse can hold all
addresses associated with a customer.
Integrated DB
A data warehouse integrates data from multiple data sources.
For example, source A and source B may have different ways of identifying a
product, but in a data warehouse, there will be only a single way of identifying
a product.
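The integration idea can be sketched as follows. This is a minimal illustration, assuming two hypothetical sources that identify the same product differently (source A by a SKU string, source B by a numeric code); the mapping tables and field names are invented for the example and do not come from the slides.

```python
# Hypothetical cross-reference tables: each source's local product
# identifier maps to one warehouse-wide product key.
SOURCE_A_TO_DW = {"SKU-001": 1, "SKU-002": 2}
SOURCE_B_TO_DW = {9001: 1, 9002: 2}


def integrate(rows_a, rows_b):
    """Return rows from both sources keyed by the single warehouse product key."""
    integrated = []
    for row in rows_a:
        integrated.append({"product_key": SOURCE_A_TO_DW[row["sku"]],
                           "qty": row["qty"], "source": "A"})
    for row in rows_b:
        integrated.append({"product_key": SOURCE_B_TO_DW[row["code"]],
                           "qty": row["qty"], "source": "B"})
    return integrated


rows = integrate([{"sku": "SKU-001", "qty": 3}],
                 [{"code": 9001, "qty": 5}])
```

After integration, both rows carry the same `product_key`, so quantities from the two sources can be added together.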
Subject Oriented
A data warehouse can be used to analyze a particular subject area. For
example, "sales" can be a particular subject.
NONVOLATILE
Once data is in the data warehouse, it will not change. So, historical data in a
data warehouse should never be altered.
Differences between operational DB and DW
ETL (Extraction, Transformation, Loading)

ETL (Extract, Transform, and Load) is a process
in data warehousing responsible for pulling data out of the source systems and
placing it into a data warehouse. ETL involves the following tasks:
- extracting the data from source systems (SAP, ERP, other operational
systems); data from the different source systems is converted into
one consolidated data warehouse format that is ready for transformation
processing.
- transforming the data, which may involve the following tasks:
applying business rules (so-called derivations, e.g., calculating new measures
and dimensions), cleaning (e.g., mapping NULL to 0 or "Male" to "M" and
"Female" to "F"), filtering (e.g., selecting only certain columns to load),
splitting a column into multiple columns and vice versa, joining together data
from multiple sources (e.g., lookup, merge), transposing rows and columns, and
applying any kind of simple or complex data validation (e.g., if the first 3
columns in a row are empty, then reject the row from processing).
- loading the data into a data warehouse, a data repository, or
other reporting applications.
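The three tasks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source is an in-memory list, the warehouse is a plain list, and the column names (`customer`, `gender`, `amount`, `note`) are hypothetical. The transformations are the ones named in the text: NULL to 0, "Male"/"Female" to "M"/"F", column filtering, and rejecting rows whose first 3 columns are empty.

```python
def extract(source_rows):
    """Extract: copy raw rows from a source system (here, an in-memory list)."""
    return [dict(row) for row in source_rows]


def transform(rows):
    """Transform: validate, clean, and filter rows."""
    gender_codes = {"Male": "M", "Female": "F"}
    cleaned = []
    for row in rows:
        # Validation: reject the row if its first 3 columns are all empty.
        first_three = list(row.values())[:3]
        if all(value in (None, "") for value in first_three):
            continue
        cleaned.append({
            # Filtering: keep only the columns the warehouse needs.
            "customer": row["customer"],
            # Cleaning: map "Male"/"Female" to "M"/"F".
            "gender": gender_codes.get(row["gender"], row["gender"]),
            # Cleaning: map NULL (None) to 0.
            "amount": row["amount"] if row["amount"] is not None else 0,
        })
    return cleaned


def load(rows, warehouse):
    """Load: append transformed rows to the target table (a list here)."""
    warehouse.extend(rows)
    return warehouse


source = [
    {"customer": "Ann", "gender": "Female", "amount": None, "note": "x"},
    {"customer": None, "gender": "", "amount": None, "note": "orphan"},
    {"customer": "Bob", "gender": "Male", "amount": 25, "note": ""},
]
warehouse = load(transform(extract(source)), [])
```

The second source row is rejected by the validation rule, and the `note` column is dropped by the filter, so only two cleaned rows reach the warehouse.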
Data Warehouse architecture
DW Approaches
There are two types of approaches:
1. TOP-DOWN Approach (W.H. Inmon)
2. BOTTOM-UP Approach (Ralph Kimball)
TOP-DOWN Approach (W.H. Inmon)
BOTTOM-UP Approach(Ralph Kimball)
Metadata
The metadata in a data warehouse system describes the definitions, meaning, origin, and
rules of the data used in the data warehouse. There are two main types of metadata in a
data warehouse system: business metadata and technical metadata. These two
types reflect the business and technical points of view on the data.
The data warehouse metadata is usually stored in a metadata repository that is
accessible to a wide range of users.
Typically, the following information needs to be provided to describe business
metadata:
DW Table Name
DW Column Name
Business Name - short and descriptive header information
Definition - extended description with a brief overview of the business rules for the field
Field Type - a flag may indicate whether a given field stores a key or a discrete value,
whether it is active or not, or what data type it is. The content of that field (or fields) may
vary depending on business needs.
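One possible way to picture a business metadata entry is as a small record with the five fields listed above. The sketch below is an assumption for illustration only: the class name, the dictionary used as a repository, and the sample table and column names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessMetadata:
    table_name: str      # DW Table Name
    column_name: str     # DW Column Name
    business_name: str   # short, descriptive header
    definition: str      # extended description and business rules
    field_type: str      # e.g. "key" or "discrete value"


# Hypothetical metadata repository: entries keyed by (table, column).
repository = {}


def register(entry):
    """Store one metadata entry under its table and column name."""
    repository[(entry.table_name, entry.column_name)] = entry


register(BusinessMetadata(
    table_name="DIM_CUSTOMER",
    column_name="CUST_ID",
    business_name="Customer ID",
    definition="Surrogate key identifying one customer record.",
    field_type="key",
))

entry = repository[("DIM_CUSTOMER", "CUST_ID")]
```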
Datawarehouse_ProjectLifecycle
DATA MART
A data mart is a subset of a data warehouse that is designed
for a particular line of business, such as sales, marketing, or
finance. In a dependent data mart, data is derived from
an enterprise-wide data warehouse. In an independent data
mart, data is collected directly from sources.
DMs are known as high-performance query structures.
There are two types of DM:
1. Dependent DM
2. Independent DM
A DM can also be purely logical: it then contains only views
over the warehouse and stores no data of its own.
Dependent DM
Inmon:
According to Bill Inmon, a dependent data mart is one
whose data comes from a data warehouse. Data in the data
warehouse is aggregated, restructured, and summarized
when it passes into the dependent data mart.
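As a rough sketch of that summarization step, the snippet below aggregates detail-level warehouse rows into one total per region and month for a sales data mart. The row layout and the function name are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical detail-level rows held in the enterprise data warehouse.
warehouse_sales = [
    {"region": "East", "month": "2024-01", "amount": 100},
    {"region": "East", "month": "2024-01", "amount": 50},
    {"region": "West", "month": "2024-01", "amount": 70},
]


def build_sales_mart(rows):
    """Summarize warehouse detail into one total per (region, month)."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["region"], row["month"])] += row["amount"]
    return dict(totals)


sales_mart = build_sales_mart(warehouse_sales)
```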
Independent DM
Kimball:
A stand-alone data mart focuses exclusively on one subject area and is not designed in
an enterprise context. For example, manufacturing has its data mart, human resources
has its own, finance has its own, and so on. A stand-alone data mart gets data from multiple
transaction systems in one subject area or department to support specific business needs.
A stand-alone data mart may use a dimensional design or an entity-relationship model. Analytic
or business intelligence tools query data directly from the data mart and present information
to the user. The picture below is a typical stand-alone data mart.
A stand-alone data mart takes very little time to build and brings visible results to
specific departments at lower cost. However, if you look at the whole system landscape
where multiple data marts exist, you will see that different ETL tools need to be built for
different transaction systems in different technologies, and the data is duplicated in several
data marts. From a business perspective, each data mart is built to address a set of specific
business needs. What if the needs expand? And what if you want to analyze data across
functions or departments? Inconsistent data, such as differing definitions of a product, will make
it impossible to compare information between departments.
DW - Database Design
A DW is designed with the following types of schemas:
Star Schema
Snowflake Schema
Galaxy Schema
Fact Table
A table in a star schema that contains facts and is connected to dimensions. A fact table typically
has two types of columns: those that contain facts and those that are foreign keys to dimension
tables. The primary key of a fact table is usually a composite key that is made up of all of its
foreign keys.
A fact table might contain either detail-level facts or facts that have been aggregated (fact tables
that contain aggregated facts are often called summary tables instead). A fact table usually
contains facts with the same level of aggregation.
Dimension
A dimension is descriptive data about the major aspects of the business.
Dimensions are used to describe the key performance indicators known as
facts.
A dimension table contains dimension attributes, which are de-normalized.
STAR SCHEMA
A star schema is a relational database schema for representing multidimensional data. It
is the simplest form of data warehouse schema and contains one or more dimensions
and fact tables. It is called a star schema because the entity-relationship diagram
between dimensions and fact tables resembles a star, where one fact table is connected
to multiple dimensions. The center of the star schema consists of a large fact table that
points towards the dimension tables. The advantages of a star schema are easy slicing of data,
better query performance, and ease of understanding.
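A small star schema can be sketched with Python's built-in `sqlite3` module. The table and column names here (`dim_product`, `dim_date`, `fact_sales`) are hypothetical; the point is only to show one fact table whose composite primary key is made up of its foreign keys to the dimension tables, and a query that slices the facts by a dimension attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date    (date_key    INTEGER PRIMARY KEY, month TEXT);
-- Fact table: foreign keys to each dimension plus one numeric fact.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    amount      REAL,
    PRIMARY KEY (product_key, date_key)
);
INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Book');
INSERT INTO dim_date    VALUES (10, '2024-01'), (11, '2024-02');
INSERT INTO fact_sales  VALUES (1, 10, 5.0), (1, 11, 7.0), (2, 10, 20.0);
""")

# Slice the facts by one dimension attribute (product name).
totals = conn.execute("""
    SELECT p.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
```

Every analytical query follows the same shape: join the central fact table to one or more dimensions, then group by the dimension attributes of interest.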
SNOWFLAKE SCHEMA
A snowflake schema is a star schema structure that has been
normalized through the use of outrigger tables, i.e. dimension table
hierarchies are broken into simpler tables.
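To illustrate the normalization step, the sketch below takes a de-normalized product dimension and moves its category level into a separate outrigger table. The data and the function name `snowflake` are hypothetical, and real designs would normally do this in the database rather than in application code.

```python
# De-normalized star-schema dimension: category repeated on every row.
dim_product_star = [
    {"product_key": 1, "name": "Pen", "category": "Stationery"},
    {"product_key": 2, "name": "Pencil", "category": "Stationery"},
    {"product_key": 3, "name": "Novel", "category": "Books"},
]


def snowflake(dimension):
    """Split the category hierarchy level into its own outrigger table."""
    category_keys = {}
    products = []
    for row in dimension:
        # Assign each distinct category a key the first time it is seen.
        key = category_keys.setdefault(row["category"], len(category_keys) + 1)
        products.append({"product_key": row["product_key"],
                         "name": row["name"], "category_key": key})
    categories = [{"category_key": k, "category": name}
                  for name, k in category_keys.items()]
    return products, categories


dim_product, dim_category = snowflake(dim_product_star)
```

The product table now stores only a `category_key`, and each category name is stored once in the outrigger table.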
Galaxy Schema
Fact Constellation: A schema in which two or more fact
tables are joined through shared dimensions.
Conformed Dimension (Reusable): A dimension that can
be shared by multiple fact tables is known as a conformed
dimension.
A galaxy schema contains many fact tables with some common dimensions
(conformed dimensions). This schema is a combination of many data marts.
Factless fact table: A fact table without any facts (measures) is known as a
factless fact table. It contains only foreign keys to dimension tables.
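A factless fact table can be pictured as a list of events that carry only dimension keys. The attendance example below is a hypothetical illustration, not taken from the slides; analysis is done by counting rows rather than summing a measure.

```python
from collections import Counter

# Hypothetical factless fact table: each row records that a student
# attended a class on a date. There are only dimension keys, no measures.
fact_attendance = [
    {"student_key": 1, "class_key": 100, "date_key": 20240101},
    {"student_key": 2, "class_key": 100, "date_key": 20240101},
    {"student_key": 1, "class_key": 200, "date_key": 20240102},
]

# Analysis is done by counting rows rather than summing a fact column.
attendance_per_class = Counter(row["class_key"] for row in fact_attendance)
```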
Slowly Changing Dimension (SCD)

Slowly Changing Dimensions: Slowly changing dimensions are
dimensions in which the data changes slowly, rather than changing
regularly on a time basis.
Type 1 SCD: SCD type 1 is used when there is no need to store
historical data in the dimension table. This method overwrites the old data
in the dimension table with the new data.
Type 2 SCD: SCD type 2 stores the entire history of the data in the dimension
table. With type 2 we can store unlimited history in the dimension table. In
type 2, you can store the data in three different ways. They are:
Versioning
Flagging
Effective Date
Type 3 SCD: In the type 3 method, only the current status and previous status
of the row are maintained in the table. To track these changes, two separate
columns are created in the table.
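The three SCD types can be compared side by side in a short sketch. The row layout, column names (`city`, `previous_city`, `version`, `is_current`, `start_date`, `end_date`), and function names are assumptions made for illustration. The type 2 function combines all three techniques named above (versioning, flagging, effective date) in one row, although in practice a design may use only one of them.

```python
from datetime import date


def scd_type1(row, new_city):
    """Type 1: overwrite the old value; no history is kept."""
    return {**row, "city": new_city}


def scd_type2(history, new_city, change_date):
    """Type 2: close the current row and append a new version.

    Combines the three techniques: a version number, a current flag,
    and effective dates.
    """
    current = history[-1]
    closed = {**current, "is_current": False, "end_date": change_date}
    new_row = {**current, "city": new_city,
               "version": current["version"] + 1,
               "is_current": True,
               "start_date": change_date, "end_date": None}
    return history[:-1] + [closed, new_row]


def scd_type3(row, new_city):
    """Type 3: keep only the current and previous values in two columns."""
    return {**row, "previous_city": row["city"], "city": new_city}


customer = {"customer_id": 7, "city": "Pune"}
t1 = scd_type1(customer, "Delhi")
t3 = scd_type3(customer, "Delhi")

history = [{**customer, "version": 1, "is_current": True,
            "start_date": date(2023, 1, 1), "end_date": None}]
t2 = scd_type2(history, "Delhi", date(2024, 6, 1))
```

Type 1 returns a single row with the new city, type 3 returns a single row with both the current and previous city, and type 2 returns two rows: the closed original and the new current version.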
Dimensional Modeling
Dimensional modeling is a methodology or approach used
for designing the star schema. Dimensional modeling
consists of the following phases in designing the star
schema.
1.Conceptual modeling
2.Logical Modeling
3.Physical Modeling
Thank you
