Sei sulla pagina 1di 25

Data Warehouse

Definition
Importance of Data Warehouse
Its Components
Two Data Warehousing Strategies
ETL Processes
For a Successful Warehouse
Data Warehouse Pitfalls

Data Warehouse
A subject oriented, integrated, time-variant, non-volatile
collection of data in support of management decisions (Bill
Inmon)
Subject oriented -- data are organized around sales,
products, etc.
Integrated -- data are integrated to provide a
comprehensive view
Time variant -- historical data are maintained
Nonvolatile -- data are not updated by users

Limitations of Traditional
Databases
lack of on-line historical data
residing in different operational systems
extremely poor query performance
operational database designs not suited for
decision support

The Importance of Data


Warehousing
More cost effective decision making
Increase quality and flexibility of enterprise analysis as
data warehouse contain accurate and reliable data
Ability to maintain better customer relationships
Unlimited analyses of enterprise information

Components of Data warehouse


Summarized data
Basically of two type: 1) Lightly (departmental information)
2) Highly (enterprise wide decision)
Current detail

Comes directly from operational system


But stored by subject area and represent entire organization not a department

System of record
Maintaining the source of record
Integration and transformation Programs

Programs that convert an application specific data to enterprise data

Cont..

Performs many function like


Reformatting, recalculating
Adding time element
Identifying the default value
Summarizing and merging the data
Filling up the blank fields

Archives
Contain old data which hold some amount of significance to the organization
Used for trend analysis
Metadata
Control access and analysis of the data warehouse contents

To manage and control data warehouse creation and maintenance

Two Data Warehousing


Strategies
Enterprise-wide warehouse, top down, the
Inmon methodology
Data mart, bottom up, the Kimball
methodology
When properly executed, both result in an
enterprise-wide data warehouse

The Data Mart Strategy


The most common approach
Begins with a single mart and are added over
time for more subject areas
Relatively inexpensive and easy to implement
Can be used as a proof of concept for data
warehousing
Requires an overall integration plan

The Enterprise-wide Strategy


A comprehensive warehouse is built initially
An initial dependent data mart is built using a subset
of the data in the warehouse
Additional data marts are built using subsets of the
data in the warehouse
Like all complex projects, it is expensive, time
consuming, and prone to failure
When successful, it results in an integrated, scalable
warehouse

ETL Processes
Extraction, Transformation, and Loading Process

The plumbing work of data warehousing


Data are moved from source to target data
bases
A very costly, time consuming part of data
warehousing

Sample ETL Tools


Teradata Warehouse Builder from Teradata
DataStage from Ascential Software
SAS System from SAS Institute
Power Mart/Power Center from Informatica
Sagent Solution from Sagent Software

Reasons for Dirty Data

Dummy Values
Absence of Data
Multipurpose Fields
Inappropriate Use of Address Lines
Violation of Business Rules
Non-Unique Identifiers
Data Integration Problems

I. Data Cleansing and


Extracting
Source systems contain dirty data that must be cleansed
ETL software contains rudimentary data cleansing capabilities
Specialized data cleansing software is often used. Important
for performing name and address correction and householding
functions
Leading data cleansing vendors include Vality (Integrity),
Harte-Hanks (Trillium), and Firstlogic (i.d.Centric)

Steps in Data Cleansing


Parsing
Correcting
Standardizing
Matching
Consolidating

Parsing
Parsing locates and identifies individual data
elements in the source files and then isolates
these data elements in the target files.
Examples include parsing the first, middle,
and last name; street number and street
name; and city and state.

Correcting
Corrects parsed individual data components
using sophisticated data algorithms and
secondary data sources.
Example include replacing a vanity address
and adding a zip code.

Standardizing
Standardizing applies conversion routines to
transform data into its preferred (and
consistent) format using both standard and
custom business rules.
Examples include adding a pre name,
replacing a nickname, and using a preferred
street name

Matching
Searching and matching records within and
across the parsed, corrected and standardized
data based on predefined business rules to
eliminate duplications.
Examples include identifying similar names
and addresses.

Consolidating
Analyzing and identifying relationships between
matched records and consolidating/merging
them into ONE representation.

II. Data Transformation


Transforms the data in accordance with the
business rules and standards that have been
established
Example include: format changes,
deduplication, splitting up fields,
replacement of codes, derived values, and
aggregates

III. Data Loading


Data are physically moved to the data
warehouse
The loading takes place within a load
window
The trend is to near real time updates of the
data warehouse as the warehouse is
increasingly used for operational applications

For a Successful Warehouse


From day one establish that warehousing is a joint
user/builder project
Establish that maintaining data quality will be an
ONGOING joint user/builder responsibility
Train the users one step at a time
Consider doing a high level corporate data model in no
more than three weeks
Look closely at the data extracting, cleaning, and loading
tools

Cont..
Determine a plan to test the integrity of the data in the
warehouse
From the start get warehouse users in the habit of 'testing'
complex queries
Coordinate system roll-out with network administration
personnel
Implement a user accessible automated directory to information
stored in the warehouse

Data Warehouse Pitfalls


Many warehouse end users will be trained and never or
seldom apply their training
Large scale data warehousing can become an exercise in
data homogenizing
Loading information only because it is available
Providing no maintenance to the data warehouse

Contact Us
BusinessName:SkylineBusinessSchool
Address:HauzKhasEnclave,
NewDelhi110016,India.
Phone:911126864848,:911126866968
Email:info@skylinecollege.com
Resource:
www.skylinecollege.com/ourprogrammes/pgpdatawarehousingbusiness
intelligence

Potrebbero piacerti anche