Data Warehouse

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process.
Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For example, "sales" can be a particular subject.
Integrated: A data warehouse integrates data from multiple data sources. For example, source A and source B may have different ways of identifying a product, but in a data warehouse there will be only a single way of identifying a product.
Data Warehouse Contd

Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, 12 months, or even earlier. This contrasts with a transaction system, where often only the most recent data is kept. For example, a transaction system may hold only the most recent address of a customer, whereas a data warehouse can hold all addresses associated with a customer.
Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a data warehouse should never be altered.
Ralph Kimball provided a more concise definition of a data warehouse: A data warehouse is a copy of transaction data specifically structured for query and analysis.
ETL
ETL stands for Extract, Transform and Load. It is the method or technology used to implement a data warehouse.
Extract: Extract data from the source systems.
Transform: Transform the data as per the business requirements.
Load: Load the data into the data warehouse.
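The three ETL steps can be sketched in a few lines of Python. This is only an illustrative sketch: the source rows, the `sales` table, and the cleaning rules (trim and title-case names, cast amounts to numbers) are assumptions, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical source rows; in practice these would come from files or source databases.
SOURCE_ROWS = [
    {"product": " soap ", "amount": "15.00"},
    {"product": "Powder", "amount": "55.50"},
]


def extract(rows):
    """Extract: read raw records from the source (here, an in-memory list)."""
    return list(rows)


def transform(rows):
    """Transform: apply business rules (trim and title-case names, cast amounts)."""
    return [(r["product"].strip().title(), float(r["amount"])) for r in rows]


def load(conn, rows):
    """Load: write the cleaned rows into a warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


def run_etl(conn, source=SOURCE_ROWS):
    """Run the three steps in order and return what ended up in the warehouse."""
    load(conn, transform(extract(source)))
    return conn.execute("SELECT product, amount FROM sales ORDER BY product").fetchall()
```

Calling `run_etl(sqlite3.connect(":memory:"))` runs the pipeline end to end against a throwaway database.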
OLTP and OLAP systems

Difference between OLTP systems and a data warehouse
OLTP: Online transaction processing, used primarily for transactions. An OLTP system is characterized by a large number of short online transactions (INSERT, UPDATE, DELETE). The main emphasis is on very fast query processing, maintaining data integrity in multi-access environments, and effectiveness measured by the number of transactions per second. An OLTP database holds detailed, current data, and the schema used to store transactional data is the entity model (usually 3NF).
OLAP
Online analytical processing. These systems are implemented where fast data retrieval matters more than fast data entry, so the speed of data entry can be compromised. An OLAP system is characterized by a relatively low volume of transactions. Queries are often very complex and involve aggregations. For OLAP systems, response time is the effectiveness measure. OLAP applications are widely used with data mining techniques. An OLAP database holds aggregated, historical data, stored in multi-dimensional schemas (usually a star schema).
Difference between OLTP and OLAP

Source of data
OLTP: Operational data; OLTPs are the original source of the data.
OLAP: Consolidation data; OLAP data comes from the various OLTP databases.

Purpose of data
OLTP: To control and run fundamental business tasks.
OLAP: To help with planning, problem solving, and decision support.

What the data reveals
OLTP: Reveals a snapshot of ongoing business processes.
OLAP: Multi-dimensional views of various kinds of business activities.

Inserts and updates
OLTP: Short and fast inserts and updates initiated by end users.
OLAP: Periodic long-running batch jobs refresh the data.

Queries
OLTP: Relatively standardized and simple queries returning relatively few records.
OLAP: Often complex queries involving aggregations.

Processing speed
OLTP: Typically very fast.
OLAP: Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes.

Space requirements
OLTP: Can be relatively small if historical data is archived.
OLAP: Larger due to the existence of aggregation structures and history data; requires more indexes than OLTP.

Database design
OLTP: Highly normalized with many tables.
OLAP: Typically de-normalized with fewer tables; use of star and/or snowflake schemas.

Backup and recovery
OLTP: Back up religiously; operational data is critical to run the business, and data loss is likely to entail significant monetary loss and legal liability.
OLAP: Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method.
Data Warehousing concepts

Dimensional modelling
Fact and dimension tables
Star schema
Snowflake schema
Fact constellation schema
Types of dimension: junk dimension, conformed dimension
Slowly changing dimension (SCD)
Dimensional Modelling
Dimensional modelling (DM) is a design technique for databases intended to support end-user queries in a data warehouse. It is oriented around understandability and performance.
Dimensional modeling always uses the concepts of facts (measures), and dimensions (context). Facts are typically (but not always) numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that define the facts.
Fact and Dimension tables
Fact table: A fact table is the table that contains the measures of interest, i.e. the business facts.
Example: sales amount for a product by store and day.
A fact table mostly contains additive values that can be aggregated to provide figures that help in making business decisions.
Dimension Table
A dimension is a category of information. For example, the product dimension, the store dimension, the time dimension.
Attribute: A unique level within a dimension. For example, product category in the product dimension, or month in the time dimension.
Types of Dimensional Modelling

Star schema
Snowflake schema
Fact constellation schema
Star Schema
In a star schema, a single fact table is surrounded by multiple dimension tables.
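A small star schema can be sketched with an in-memory SQLite database: one fact table holding the sales amount, joined directly to each dimension table. The table names, column names, and sample rows below are illustrative assumptions, not taken from a specific system.

```python
import sqlite3


def build_star_schema():
    """Create one fact table surrounded by two dimension tables (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, item TEXT, category TEXT);
        CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, store_name TEXT);
        CREATE TABLE fact_sales (
            product_key INTEGER REFERENCES dim_product(product_key),
            store_key INTEGER REFERENCES dim_store(store_key),
            sale_day TEXT,
            sales_amount REAL
        );
        INSERT INTO dim_product VALUES (1, 'Soap', 'Toiletries'), (2, 'Powder', 'Laundry');
        INSERT INTO dim_store VALUES (1, 'Store A'), (2, 'Store B');
        INSERT INTO fact_sales VALUES
            (1, 1, '02-Dec', 15), (1, 2, '02-Dec', 30), (2, 1, '02-Dec', 55);
    """)
    return conn


def sales_by_category(conn):
    """Aggregate the fact table by a dimension attribute with a single join."""
    return conn.execute("""
        SELECT p.category, SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()
```

Each dimension is reached from the fact table in one join, which is what keeps star schema queries simple.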
Snow Flake Schema

A snowflake schema applies normalization to a star schema: very large dimension tables are normalized into multiple tables. Dimensions with hierarchies can be decomposed into a snowflake structure when you want to avoid joins to big dimension tables while using an aggregate of the fact table.
Advantages and Disadvantages

Normalizing dimension tables tends to increase the number of dimension and sub-dimension tables, which requires more foreign key joins when querying the data and therefore reduces query performance.
A snowflake schema helps save space by normalizing dimension tables.
Queries against a snowflake schema are more complex than queries against a star schema because of the multiple joins from dimension tables to sub-dimension tables.
A snowflake schema is more difficult for business users of the data warehouse, because they have to work with more tables than in a star schema.
Creating aggregate tables and joining them to the required dimension tables improves performance by reducing execution time.
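The extra join that a snowflake schema introduces can be seen in a small sketch. Here a hypothetical product dimension is normalized so that the category lives in its own sub-dimension table; all names and rows are assumptions for illustration.

```python
import sqlite3


def build_snowflake_schema():
    """Product dimension normalized into product and category tables (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,
            item TEXT,
            category_key INTEGER REFERENCES dim_category(category_key)
        );
        CREATE TABLE fact_sales (
            product_key INTEGER REFERENCES dim_product(product_key),
            sales_amount REAL
        );
        INSERT INTO dim_category VALUES (1, 'Toiletries'), (2, 'Laundry');
        INSERT INTO dim_product VALUES (1, 'Soap', 1), (2, 'Powder', 2);
        INSERT INTO fact_sales VALUES (1, 15), (1, 20), (2, 55);
    """)
    return conn


def sales_by_category(conn):
    """Two joins are needed: fact -> product dimension -> category sub-dimension."""
    return conn.execute("""
        SELECT c.category, SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        JOIN dim_category AS c ON c.category_key = p.category_key
        GROUP BY c.category
        ORDER BY c.category
    """).fetchall()
```

The category text is stored once per category rather than once per product, which is the space saving, at the cost of the second join.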
Fact Constellation schema

A fact constellation schema contains multiple fact tables that use the same dimension tables.
A fact constellation schema can be implemented between aggregated fact tables, or when a complex fact table is decomposed into independent, simpler fact tables.
A conformed dimension is a dimension with a common structure that is shared across the various fact tables in a data warehouse. Conformed dimensions are used to avoid redundant data in the data warehouse.
Slowly changing Dimensions

SCD Type 1: The new record replaces the old record. No trace of the old record exists.

Before the change:
Product ID   Item     Price   Load_date   Update_date
1            Soap     15      02-Dec      02-Dec
2            Powder   55      02-Dec      02-Dec

After the change:
Product ID   Item     Price   Load_date   Update_date
1            Soap     20      02-Dec      05-Dec
2            Powder   55      02-Dec      02-Dec
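A Type 1 change is an in-place overwrite. The sketch below replays the Soap price change from 15 to 20 against an in-memory SQLite table; the table name `product` and the helper names are assumptions for illustration.

```python
import sqlite3


def make_product_dim():
    """Create the product dimension with the two starting rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product (product_id INTEGER PRIMARY KEY, item TEXT, "
        "price REAL, load_date TEXT, update_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO product VALUES (?, ?, ?, ?, ?)",
        [(1, "Soap", 15, "02-Dec", "02-Dec"), (2, "Powder", 55, "02-Dec", "02-Dec")],
    )
    return conn


def scd1_update(conn, product_id, new_price, change_date):
    """Overwrite the price in place; the old value is not kept anywhere."""
    conn.execute(
        "UPDATE product SET price = ?, update_date = ? WHERE product_id = ?",
        (new_price, change_date, product_id),
    )
    conn.commit()
```

After `scd1_update(conn, 1, 20, "05-Dec")` the table still has two rows, and the price of 15 can no longer be recovered from it.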
Slowly Changing Dimension

SCD Type 2: A new record is added to the table, so both the old and the new record exist. The latest record can be tracked in various ways.
1) Effective date and end date

Before the change:
Product ID   Item     Price   Eff_dt   End_Dt
1            Soap     15      02-Dec   NULL
2            Powder   55      02-Dec   NULL

After the change:
Product ID   Item     Price   Eff_dt   End_Dt
1            Soap     15      02-Dec   05-Dec
3            Soap     20      05-Dec   NULL
2            Powder   55      02-Dec   NULL
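The effective date and end date approach can be sketched with plain Python dictionaries standing in for table rows. The function names are hypothetical, and `None` plays the role of NULL in `end_dt`.

```python
def initial_rows():
    """Starting state of the product dimension."""
    return [
        {"product_id": 1, "item": "Soap", "price": 15, "eff_dt": "02-Dec", "end_dt": None},
        {"product_id": 2, "item": "Powder", "price": 55, "eff_dt": "02-Dec", "end_dt": None},
    ]


def scd2_apply(rows, item, new_price, change_date):
    """Expire the current row for `item` and append a new current row."""
    next_id = max(r["product_id"] for r in rows) + 1
    for r in rows:
        if r["item"] == item and r["end_dt"] is None:
            r["end_dt"] = change_date  # close the old version
    rows.append({"product_id": next_id, "item": item, "price": new_price,
                 "eff_dt": change_date, "end_dt": None})
    return rows


def current_row(rows, item):
    """The latest record is the one whose end date is still open (None)."""
    return next(r for r in rows if r["item"] == item and r["end_dt"] is None)
```

Calling `scd2_apply(initial_rows(), "Soap", 20, "05-Dec")` leaves three rows: the old Soap row closed on 05-Dec and a new open row for the price of 20.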
Slowly Changing Dimension Cont

SCD Type 2: Versioning

Before the change:
Product ID   Item     Price   Version
1            Soap     15      1
2            Powder   55      1

After the change:
Product ID   Item     Price   Version
1            Soap     15      1
3            Soap     20      2
2            Powder   55      1
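Versioning can be sketched the same way: each change appends a row with the next version number for that item, and the latest record is the row with the highest version. The helper names below are assumptions for illustration.

```python
def initial_versions():
    """Starting state: every item is at version 1."""
    return [
        {"product_id": 1, "item": "Soap", "price": 15, "version": 1},
        {"product_id": 2, "item": "Powder", "price": 55, "version": 1},
    ]


def scd2_add_version(rows, item, new_price):
    """Append a new row for `item` carrying the next version number."""
    latest = max(r["version"] for r in rows if r["item"] == item)
    next_id = max(r["product_id"] for r in rows) + 1
    rows.append({"product_id": next_id, "item": item,
                 "price": new_price, "version": latest + 1})
    return rows


def latest_version(rows, item):
    """Return the row with the highest version number for `item`."""
    return max((r for r in rows if r["item"] == item), key=lambda r: r["version"])
```

Unlike the date-based approach, no existing row is modified; the old Soap row stays exactly as it was loaded.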
Slowly Changing Dimension Cont

SCD Type 3
Junk Dimension
In data warehouse design, we frequently run into a situation where there are yes/no indicator fields in the source system. If we keep all those indicator fields in the fact table, not only do we need to build many small dimension tables, but the amount of information stored in the fact table also increases tremendously, leading to possible performance issues. A junk dimension combines these indicators into a single dimension table that the fact table references with one key.

Fact table: Customer_id, prd_id, prepay_ind, coupon_ind

Junk dimension:
Junk_ind   prepay_ind   coupon_ind
1          Y            Y
2          Y            N
3          N            N
4          N            Y
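As a rough sketch, the junk dimension above can be held as a lookup from each indicator combination to its key, so that a fact row stores one small key instead of two flag columns. The function name `to_fact_row` and the sample customer values are hypothetical.

```python
# Junk dimension rows: Junk_ind -> (prepay_ind, coupon_ind)
JUNK_DIM = {
    1: ("Y", "Y"),
    2: ("Y", "N"),
    3: ("N", "N"),
    4: ("N", "Y"),
}

# Reverse lookup used at load time: (prepay_ind, coupon_ind) -> Junk_ind
JUNK_LOOKUP = {flags: key for key, flags in JUNK_DIM.items()}


def to_fact_row(customer_id, prd_id, prepay_ind, coupon_ind):
    """Replace the two indicator columns with a single junk dimension key."""
    return (customer_id, prd_id, JUNK_LOOKUP[(prepay_ind, coupon_ind)])
```

For example, `to_fact_row(101, 1, "N", "Y")` stores the key 4 in place of the two indicators.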
Types of Fact Tables

1) Additive fact
2) Semi-additive fact
3) Non-additive fact
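The difference between the three kinds of fact shows up when aggregating. The sketch below uses the commonly cited examples, which are assumptions here rather than taken from the slides: deposits as an additive fact, account balance as a semi-additive fact (it can be summed across accounts but not across days), and a margin percentage as a non-additive fact.

```python
# Hypothetical rows: (day, account, deposits, end_of_day_balance)
ROWS = [
    ("02-Dec", "A", 100, 500),
    ("02-Dec", "B", 50, 250),
    ("03-Dec", "A", 20, 520),
    ("03-Dec", "B", 0, 250),
]


def total_deposits(rows):
    """Additive: can be summed across every dimension (days and accounts)."""
    return sum(r[2] for r in rows)


def balance_on(rows, day):
    """Semi-additive: balances sum across accounts, but only within one day."""
    return sum(r[3] for r in rows if r[0] == day)


def margin_pct(revenue, cost):
    """Non-additive: a ratio is recomputed from additive parts, never summed."""
    return 100 * (revenue - cost) / revenue
```

Summing the balance column over both days would give 1520, a figure with no business meaning, which is why the balance is only aggregated within a single day.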
Surrogate key
A surrogate key is a primary key generated by the DWH and used to uniquely identify a record in the DWH.
Why a surrogate key is implemented:
1) When data is loaded from multiple sources.
2) When history needs to be maintained and the primary key from the source would violate the primary key constraint in the DWH.
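A minimal sketch of surrogate key assignment, assuming a simple running counter: records from two sources may share the same source primary key, so the warehouse hands out its own key per (source, source key) pair. The class and method names are hypothetical.

```python
import itertools


class SurrogateKeyGenerator:
    """Assigns a warehouse-generated key to each (source, natural key) pair."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._assigned = {}

    def key_for(self, source, natural_key):
        """Return the existing surrogate key, or generate the next one."""
        pair = (source, natural_key)
        if pair not in self._assigned:
            self._assigned[pair] = next(self._counter)
        return self._assigned[pair]
```

With this, product 1 from source A and product 1 from source B receive different warehouse keys, while repeated loads of the same source record keep the key they were first given.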
Data Mart
A data warehouse incorporates information about many subject areas, often the entire enterprise, while a data mart focuses on one or more subject areas. A data mart represents only a portion of an enterprise's data, perhaps data related to a business unit or work group.
Typically, a data mart's data is targeted to a smaller audience of end users or used to present information on a smaller scope.
Thank You
