SHIVA
OLTP (ONLINE TRANSACTION PROCESSING)
CHARACTERISTICS:
Volatile data
Normalization
Maintains current data
Does not support business analysis
Designed for transactional processing
History
W. H. Inmon / Ralph Kimball - fathers of the data warehouse
Designed for decision making process.
Historical data.
De-Normalization.
Static.
OLTP sources -> ETL -> DWH
Data warehouse
According to Inmon, famous author for several data
Characteristics of DW
Subject-Oriented D.B.
Time variant.
Non-volatile.
Integrated DB.
Time variant
Characteristics and Features
Historical data is kept in a data warehouse. For example, one can retrieve
data from 3 months, 6 months, or 12 months ago, or even older, from a data
warehouse. This contrasts with a transaction system, where often only the
most recent data is kept. For example, a transaction system may hold only the
most recent address of a customer, whereas a data warehouse can hold all
addresses associated with that customer.
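The address example above can be sketched in a few lines. This is only an illustration: the names `AddressRecord`, `oltp_addresses`, `dw_address_history` and `change_address` are hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AddressRecord:
    """One historical address row (hypothetical warehouse layout)."""
    customer_id: int
    address: str
    valid_from: date


# Transaction system: exactly one current address per customer.
oltp_addresses: dict[int, str] = {}
# Data warehouse: every address ever recorded, with its effective date.
dw_address_history: list[AddressRecord] = []


def change_address(customer_id: int, address: str, on: date) -> None:
    oltp_addresses[customer_id] = address  # overwrite the current value
    dw_address_history.append(AddressRecord(customer_id, address, on))  # keep history


def addresses_for(customer_id: int) -> list[str]:
    """All addresses the warehouse holds for a customer, oldest first."""
    return [r.address for r in dw_address_history if r.customer_id == customer_id]


change_address(1, "12 Oak St", date(2023, 1, 10))
change_address(1, "98 Elm Ave", date(2024, 6, 1))
print(oltp_addresses[1])   # only the latest address survives in the OLTP store
print(addresses_for(1))    # the warehouse still has both
```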
Integrated DB
Characteristics and Features
Subject Oriented
Characteristics and Features
A data warehouse can be used to analyze a particular subject area. For
example, "sales" can be a particular subject.
NONVOLATILE
Once data is in the data warehouse, it will not change. So, historical data in a
data warehouse should never be altered.
ETL (Extract, Transform, Load) is the process in data warehousing responsible for
pulling data out of the source systems and placing it into a data warehouse. ETL
involves the following tasks:
- extracting the data from source systems (SAP, ERP, other operational
systems); data from the different source systems is converted into
one consolidated data warehouse format which is ready for transformation
processing.
- transforming the data may involve the following tasks:
applying business rules (so-called derivations, e.g., calculating new measures
and dimensions), cleaning (e.g., mapping NULL to 0 or "Male" to "M" and
"Female" to "F" etc.), filtering (e.g., selecting only certain columns to load),
splitting a column into multiple columns and vice versa, joining together data
from multiple sources (e.g., lookup, merge), transposing rows and columns,
applying any kind of simple or complex data validation (e.g., if the first 3
columns in a row are empty then reject the row from processing)
- loading the data into a data warehouse, data repository, or
other reporting applications
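The three tasks above can be sketched as a tiny in-memory pipeline. This is a minimal illustration, not a real ETL tool: the column names, the `GENDER_MAP` table and the list used as a "warehouse" are all assumptions made for the example.

```python
# Hypothetical source rows, as they might arrive from an operational system.
source_rows = [
    {"customer": "Ann", "gender": "Female", "quantity": 2, "unit_price": 5.0, "internal_note": "x"},
    {"customer": "Bob", "gender": "Male", "quantity": None, "unit_price": 3.0, "internal_note": "y"},
    {"customer": None, "gender": None, "quantity": None, "unit_price": 1.0, "internal_note": "z"},
]

GENDER_MAP = {"Male": "M", "Female": "F"}


def extract(rows):
    """Extract: copy rows out of the source so the source is left untouched."""
    return [dict(row) for row in rows]


def transform(rows):
    """Transform: validate, clean, filter columns, and derive a new measure."""
    result = []
    for row in rows:
        # Validation: reject the row if its first 3 columns are all empty.
        if all(value in (None, "") for value in list(row.values())[:3]):
            continue
        quantity = row["quantity"] if row["quantity"] is not None else 0  # NULL -> 0
        result.append({
            # Filtering: only the columns listed here are loaded.
            "customer": row["customer"],
            "gender": GENDER_MAP.get(row["gender"], row["gender"]),  # "Female" -> "F"
            "quantity": quantity,
            "unit_price": row["unit_price"],
            "revenue": quantity * row["unit_price"],  # derivation: a new measure
        })
    return result


def load(rows, target):
    """Load: append the transformed rows to the target store."""
    target.extend(rows)


warehouse = []
load(transform(extract(source_rows)), warehouse)
for row in warehouse:
    print(row)
```

The third source row is rejected by the validation rule, and the `internal_note` column is dropped by the column filter.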
DW Approaches
There are two types of approaches:
DWH
Meta Data
The metadata in a data warehouse system describes the definitions, meaning, origin and
rules of the data used in the data warehouse. There are two main types of metadata in a
data warehouse system: business metadata and technical metadata. These two
types reflect the business and the technical points of view on the data.
The data warehouse metadata is usually stored in a metadata repository which is
accessible to a wide range of users.
Typically, the following information needs to be provided to describe business
metadata:
- DW Table Name
- DW Column Name
- Business Name - short and descriptive header information
- Definition - extended description with a brief overview of the business rules for the field
- Field Type - a flag that may indicate whether a given field stores a key or a discrete value,
whether it is active or not, or what data type it is. The content of that field (or fields) may
vary with business needs.
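The business metadata fields listed above could be held in a simple record type. The sketch below is an assumption about one possible layout; `BusinessMetadata`, `metadata_repository`, `register` and `describe` are hypothetical names, and the sample entry is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessMetadata:
    dw_table_name: str
    dw_column_name: str
    business_name: str   # short, descriptive header
    definition: str      # extended description and business rules
    field_type: str      # key / discrete value, active flag, data type, ...


# A toy metadata repository keyed by (table, column).
metadata_repository: dict[tuple[str, str], BusinessMetadata] = {}


def register(entry: BusinessMetadata) -> None:
    metadata_repository[(entry.dw_table_name, entry.dw_column_name)] = entry


def describe(table: str, column: str) -> str:
    """Return the business-facing description of a warehouse column."""
    entry = metadata_repository[(table, column)]
    return f"{entry.business_name}: {entry.definition}"


register(BusinessMetadata(
    dw_table_name="fact_sales",
    dw_column_name="revenue",
    business_name="Sales Revenue",
    definition="Invoiced amount after discounts, excluding tax.",
    field_type="measure; active; DECIMAL",
))
print(describe("fact_sales", "revenue"))
```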
Datawarehouse_ProjectLifecycle
DATA MART
A data mart is a subset of data warehouse that is designed
Dependent DM
Inmon:
Independent DM
Kimball:
A stand-alone data mart focuses exclusively on one subject area and is not designed in
an enterprise context. For example, manufacturing has its own data mart, human resources
has its own, finance has its own, and so on. A stand-alone data mart gets data from multiple
transaction systems in one subject area or department to support specific business needs.
It may use a dimensional design or an entity-relationship model. Analytic
or business intelligence tools query data directly from the data mart and present information
to the user. The picture below is a typical stand-alone data mart.
A stand-alone data mart takes very little time to build and brings visible results to
specific departments at lower cost. However, if you look at the whole system landscape
where multiple data marts exist, you will see that different ETL processes need to be built for
different transaction systems in different technologies, and that data is duplicated across several
data marts. From a business perspective, each data mart is built to address a set of specific
business needs. What if the needs expand? And what if you want to analyze data across
functions or departments? Inconsistent data, such as differing definitions of a product, will make
comparing information between departments impossible.
Fact Table
A fact table is a table in a star schema that contains facts and is connected to dimensions. A fact
table typically has two types of columns: those that contain facts and those that are foreign keys
to dimension tables. The primary key of a fact table is usually a composite key made up of all of its
foreign keys.
A fact table might contain either detail-level facts or facts that have been aggregated (fact tables
that contain aggregated facts are often called summary tables instead). A fact table usually
contains facts with the same level of aggregation.
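The column layout described above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch under assumed names: `dim_date`, `dim_product`, `fact_sales` and their columns are hypothetical, chosen only to show fact columns, foreign keys, and a composite primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, calendar_date TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE fact_sales (
    date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),       -- foreign key
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key), -- foreign key
    units_sold  INTEGER NOT NULL,                                     -- fact
    revenue     REAL    NOT NULL,                                     -- fact
    PRIMARY KEY (date_key, product_key)  -- composite key made of the foreign keys
);
""")
conn.execute("INSERT INTO dim_date VALUES (20240101, '2024-01-01')")
conn.execute("INSERT INTO dim_product VALUES (1, 'Widget')")
conn.execute("INSERT INTO fact_sales VALUES (20240101, 1, 3, 29.97)")


def try_insert(row):
    """Return True if the fact row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = try_insert((20240101, 1, 1, 9.99))  # same composite key as the first row
orphan_ok = try_insert((20240101, 999, 1, 9.99))   # product 999 is not in dim_product
fact_count = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
print(duplicate_ok, orphan_ok, fact_count)
```

Both extra inserts are rejected: the first repeats the composite key, the second refers to a dimension row that does not exist.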
Dimension
A dimension is descriptive data about the major aspects of a business.
Dimensions are used to describe key performance indicators, known as
facts.
A dimension table contains dimensions, which are de-normalized.
STAR SCHEMA
Star Schema is a relational database schema for representing multidimensional data. It
is the simplest form of data warehouse schema and contains one or more fact tables
and dimensions. It is called a star schema because the entity-relationship diagram
between dimensions and fact tables resembles a star, with one fact table connected
to multiple dimensions. The center of the star consists of a large fact table that
points towards the dimension tables. The advantages of a star schema are easy slicing
of data, better query performance, and easy understanding of data.
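A typical star-schema query joins the central fact table to its dimensions, filters on one dimension (a slice), and groups by another. The sketch below uses Python's built-in `sqlite3`; the tables, columns and sample figures are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, region   TEXT);
CREATE TABLE fact_sales  (product_key INTEGER, store_key INTEGER, revenue REAL);

INSERT INTO dim_product VALUES (1, 'Hardware'), (2, 'Software');
INSERT INTO dim_store   VALUES (10, 'North'), (20, 'South');
INSERT INTO fact_sales  VALUES (1, 10, 100.0), (1, 20, 50.0),
                               (2, 10, 30.0),  (2, 20, 70.0);
""")

# Slice on one dimension (category = 'Hardware'), group by another (region).
rows = conn.execute("""
    SELECT s.region, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_store   AS s ON s.store_key   = f.store_key
    WHERE p.category = ?
    GROUP BY s.region
    ORDER BY s.region
""", ("Hardware",)).fetchall()

hardware_by_region = dict(rows)
print(hardware_by_region)
```

Every dimension is one join away from the fact table, which is what keeps such queries short and easy to read.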
SNOWFLAKE SCHEMA
A snowflake schema is a term that describes a star schema structure
Galaxy Schema
Fact Constellation: It is the process of joining two fact
tables.
Conformed Dimension (Reusable): A dimension which can
be shared by multiple fact tables is known as conformed
dimension.
A galaxy schema contains many fact tables with some common dimensions
(conformed dimensions). This schema is a combination of many data marts.
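As a rough illustration of a conformed dimension, the sketch below has two fact tables sharing one product dimension. Each fact table is aggregated separately and the results are then combined through the shared dimension. All table and column names are hypothetical, and the example uses Python's built-in `sqlite3`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conformed dimension: shared by both fact tables.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);

CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    units_sold  INTEGER
);
CREATE TABLE fact_inventory (
    product_key   INTEGER REFERENCES dim_product(product_key),
    units_on_hand INTEGER
);

INSERT INTO dim_product    VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO fact_sales     VALUES (1, 3), (1, 2), (2, 4);
INSERT INTO fact_inventory VALUES (1, 10), (2, 0);
""")

# Aggregate each fact table on its own, then combine via the shared dimension.
sales_vs_stock = conn.execute("""
    SELECT p.product_name, s.sold, i.on_hand
    FROM dim_product AS p
    JOIN (SELECT product_key, SUM(units_sold) AS sold
          FROM fact_sales GROUP BY product_key) AS s
      ON s.product_key = p.product_key
    JOIN (SELECT product_key, SUM(units_on_hand) AS on_hand
          FROM fact_inventory GROUP BY product_key) AS i
      ON i.product_key = p.product_key
    ORDER BY p.product_key
""").fetchall()
print(sales_vs_stock)
```

Aggregating before joining avoids multiplying rows when the two fact tables hold different numbers of rows per product.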
Dimensional Modeling
Dimensional Modeling is a methodology or approach used
Thank you