
Data Warehousing

- A decision support database that is maintained separately from the
  organization’s operational databases.
- A data warehouse is a
  - subject-oriented,
  - integrated,
  - time-varying,
  - non-volatile
  collection of data that is used primarily in organizational decision making.
- Companies, over the years, gathered huge volumes of data
- Data not suitable for ready usage: an information crisis
- Technology changes
- Inability of past DSS to provide information
How do we make strategic decisions?
Can we analyze this data to gain a competitive advantage?
Need for Strategic Decision Making
- Strategic information: information required to make strategic decisions
- Managers and executives use this to improve business and market share,
  to retain customers, etc.
- Operational data is good for day-to-day activities but not good for
  strategic decision making
Characteristics of Strategic Information

Integrated      Must have a single, enterprise-wide view
Data Integrity  Information must be accurate and must conform to business rules
Accessible      Easily accessible with intuitive access paths, and responsive for analysis
Credible        Every business factor must have one and only one value
Timely          Information must be available within the stipulated time frame
Operational Vs Decision Support Systems (DSS)

                  Operational (OLTP)          Informational (OLAP)
Data Content      Current data                Archived, derived, summarized
Data Structure    Optimized for transactions  Optimized for complex queries
Access Frequency  High                        Medium to low
Access Type       Read, update, delete        Read
Usage             Predictable, repetitive     Ad hoc, random, heuristic
Response Time     Sub-seconds                 Several seconds to minutes
Users             Large number                Relatively small number
- A data warehouse is designed around “subjects” rather than processes
- A company may have
  - Retail Sales System
  - Outlet Sales System
  - Catalog Sales System
- The DW will have a Sales Subject Area
- Heterogeneous source systems
- Little or no control
- Need to integrate source data
  - Remove inconsistencies
  - For example: product codes could be different in different systems
  - Arrive at a common code in the DW
- Standardize various data elements
  - Naming conventions
  - Codes
  - Data attributes
  - Measurements
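The integration steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: the source-system names, the record layouts, the codes in `CODE_MAP`, and the choice of kilograms as the standard unit are all invented for the example and do not come from any particular warehouse.

```python
# Hypothetical records from two source systems with different
# field names, product codes, and measurement units.
RETAIL = [{"prod": "R-1001", "weight_lb": 2.0}]
CATALOG = [{"item_code": "CAT_77", "weight_kg": 0.5}]

# Assumed mapping from (source system, local code) to one common DW code.
CODE_MAP = {
    ("retail", "R-1001"): "P001",
    ("catalog", "CAT_77"): "P002",
}

LB_TO_KG = 0.45359237  # pounds to kilograms


def integrate():
    """Return rows with a common product code and a single weight unit."""
    rows = []
    for rec in RETAIL:
        rows.append({
            "product_code": CODE_MAP[("retail", rec["prod"])],
            "weight_kg": round(rec["weight_lb"] * LB_TO_KG, 3),
        })
    for rec in CATALOG:
        rows.append({
            "product_code": CODE_MAP[("catalog", rec["item_code"])],
            "weight_kg": rec["weight_kg"],
        })
    return rows
```

Keeping the code mapping in one table makes the inconsistency between systems explicit, so a new source system only adds mapping entries rather than new logic.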
- Most business analysis has a time component
- Trend analysis (historical data is required)
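As a small illustration of why trend analysis needs history, the sketch below computes month-over-month changes from a list of dated sales. The data in `HISTORY` and both function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical historical sales facts: (ISO date, amount).
HISTORY = [
    ("2023-11-03", 100.0),
    ("2023-11-20", 50.0),
    ("2023-12-11", 180.0),
    ("2024-01-09", 90.0),
]


def monthly_totals(rows):
    """Sum amounts per month, returned in chronological order."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[day[:7]] += amount  # the "YYYY-MM" prefix identifies the month
    return dict(sorted(totals.items()))


def month_over_month(totals):
    """Change in total between each month and the one before it."""
    months = list(totals)
    return {m: totals[m] - totals[prev] for prev, m in zip(months, months[1:])}
```

With only the current month's data, `month_over_month` would have nothing to compare against, which is the point of keeping history in the warehouse.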
- Extract, Transform, Load (ETL) tools
- DW databases & DBMS tools
- Data marts
- Metadata
- DW administration & management tools
- Information delivery system
- Data extraction
- Data cleaning
- Data transformation
  - Convert from legacy/host format to warehouse format
- Load
  - Sort, summarize, consolidate, compute views, check integrity,
    build indexes, partition
- Consumes 70-80% of project time
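The four ETL steps above can be sketched end to end with Python's built-in `sqlite3` module. This is a minimal sketch, not a production pipeline: the rows in `LEGACY_ROWS`, the table names, and the cleaning rule (drop rows whose amount is not numeric) are assumptions made for the example.

```python
import sqlite3

# Hypothetical legacy extract: amounts as text, inconsistent region names,
# and one row that cannot be cleaned.
LEGACY_ROWS = [
    ("2024-01-05", " north ", "120.50"),
    ("2024-01-05", "SOUTH", "80.00"),
    ("2024-01-06", "North", "n/a"),
    ("2024-01-06", "north", "30.25"),
]


def extract():
    """Extraction: read rows from the (simulated) source system."""
    return list(LEGACY_ROWS)


def clean(rows):
    """Cleaning: drop rows whose amount cannot be parsed as a number."""
    good = []
    for day, region, amount in rows:
        try:
            good.append((day, region, float(amount)))
        except ValueError:
            continue
    return good


def transform(rows):
    """Transformation: trimmed, upper-case region names as warehouse format."""
    return [(day, region.strip().upper(), amount) for day, region, amount in rows]


def load(conn, rows):
    """Load: insert the rows, build an index, and compute a summary table."""
    conn.execute("CREATE TABLE sales (day TEXT, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX idx_sales_region ON sales (region)")
    conn.execute(
        "CREATE TABLE sales_by_region AS "
        "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
    )
    conn.commit()


def run_etl():
    conn = sqlite3.connect(":memory:")
    load(conn, transform(clean(extract())))
    return conn
```

Calling `run_etl()` returns a connection whose `sales_by_region` table holds one total per standardized region name.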
- Heterogeneous source systems
- Little or no control over source systems
- Source systems scattered
- Different currencies, measurement units
- Ensuring data quality
- A storage area where extracted data is cleaned, transformed and
  deduplicated.
- Initial storage for data
- Need not be based on the relational model
- Mainly sorting and sequential processing
- Does not provide data access to users
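Deduplication in a staging area is often done exactly as the bullets suggest: sort, then make one sequential pass. The sketch below assumes a hypothetical record layout of `(customer_id, name, last_updated)` and an assumed rule of keeping the newest record per customer.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical staged customer records: (customer_id, name, last_updated).
STAGED = [
    (2, "Bo Li", "2024-02-01"),
    (1, "Ann Ray", "2024-01-10"),
    (1, "Ann Ray", "2024-03-15"),
    (2, "Bo Li", "2024-02-01"),
]


def deduplicate(records):
    """Sort by key and date, then keep the newest record per customer_id."""
    ordered = sorted(records, key=itemgetter(0, 2))
    result = []
    # groupby needs sorted input; each group holds one customer's records.
    for _, group in groupby(ordered, key=itemgetter(0)):
        result.append(list(group)[-1])  # last in the group has the latest date
    return result
```

No relational engine is involved here, which matches the point that a staging area need not be based on the relational model.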
- Commercial tools:
  - Oracle Warehouse Builder
  - MS Data Transformation Services
  - DataStage
  - SAS ETL Server
- Typical functions
  - Define source, query (run SQL), define transformation, define target,
    verify transformation, schedule run, audit report
- Almost always a relational DB
  - Oracle, DB2, Sybase, SQL Server
- New DB design for the special purpose of DW (e.g., scale up, speed up,
  parallel processing)
- OLTP systems are data capture systems
  - “DATA IN” systems
- DWs are “DATA OUT” systems
- Design of the DW must directly reflect the way the managers look at the
  business
- Should capture the measurements of importance along with the parameters
  by which these measurements are viewed
- Must facilitate data analysis, i.e., answering business questions
- ER modeling is a logical design technique that seeks to eliminate data
  redundancy
- Illuminates the microscopic relationships among data elements
- Perfect for OLTP systems
- Responsible for the success of transaction processing in relational
  databases
Why are ER models NOT suitable for DW?
- End users cannot understand or remember an ER model
- Many DWs have failed because of overly complex ER designs
- Not optimized for complex, ad hoc queries
- Data retrieval becomes difficult due to normalization
- Browsing becomes difficult
- Facts are stored in FACT tables
- Dimensions are stored in DIMENSION tables
- Dimension tables contain textual descriptors of the business
- Fact and dimension tables form a Star Schema
- “BIG” fact table in the center surrounded by “SMALL” dimension tables
- Measures or facts
  - Facts are “numeric” & “additive”
  - For example: Sale Amount, Sale Units
- Factors or dimensions
- Star schemas
- Snowflake schemas & fact constellations
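A star schema as described above can be sketched with Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_product`, and so on) and the sample rows are assumptions chosen for illustration; only the Sale Amount and Sale Units measures come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Small dimension tables hold textual descriptors.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

-- The central fact table holds numeric, additive measures plus dimension keys.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    sale_amount REAL,
    sale_units  INTEGER
);

INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Mug', 'Kitchen');
INSERT INTO dim_store VALUES (1, 'Pune'), (2, 'Delhi');
INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
INSERT INTO fact_sales VALUES
    (1, 1, 20240101, 10.0, 5),
    (1, 2, 20240201, 20.0, 10),
    (2, 1, 20240201, 30.0, 3);
""")


def sales_by_category(connection):
    """Star join: aggregate the additive facts by one dimension attribute."""
    return connection.execute("""
        SELECT p.category, SUM(f.sale_amount), SUM(f.sale_units)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()
```

Because the facts are additive, the same query shape works for any dimension: swap `dim_product` for `dim_store` or `dim_date` and group by a different attribute.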
- Data mart = subset of DW for community users, e.g. accounting department
- Sometimes exists as a multidimensional database
- Info mart = summarized data + reports for community users
- Top down: data pulled from DW
- Bottom up: data pushed to DW
Data Warehouse Vs Data Mart

Data Warehouse                   Data Mart
Corporate / enterprise-wide      Departmental
Union of all data marts          A single business process
Data received from staging area  Star-join (fact & dimensions)
Organized on E-R model           Uses structure to suit departmental view
- Data about data
- Helps users understand content & locate data
- Types
  1. Operational Metadata
     Field, field length, data types, structure
  2. Extraction and Transformation Metadata
     Extraction frequencies, methods, business rules
  3. End User Metadata
     Navigational map for DW
- Security & priority
- Keep track of updates
- Quality control (QC)
- Purging & copying to data marts
- Security is a critical issue (users at many levels)
- Some security measures to protect a DW
  - Views = limit users to seeing certain rows/columns
  - Access control = grant rights to specific users to access selected data
    (can be set up by the DBA through SQL commands such as GRANT/REVOKE)
  - Admin controls such as group access, firewall, encryption
  - Audit = track what users are doing
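The view-based measure can be illustrated with Python's built-in `sqlite3` module. The table, the view name, and the sample rows are hypothetical. SQLite itself has no GRANT/REVOKE, so this sketch shows only how a view restricts rows and columns; on a server DBMS the DBA would additionally grant users access to the view rather than to the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_sales (emp TEXT, dept TEXT, salary REAL, sales REAL);
INSERT INTO employee_sales VALUES
    ('ann', 'accounting', 5000, 120),
    ('bo',  'marketing',  4500, 300);

-- The view hides the salary column and every non-accounting row.
CREATE VIEW accounting_sales AS
    SELECT emp, sales FROM employee_sales WHERE dept = 'accounting';
""")

cursor = conn.execute("SELECT * FROM accounting_sales")
columns = [d[0] for d in cursor.description]  # column names exposed by the view
visible = cursor.fetchall()                   # rows exposed by the view
```

A user who can query only `accounting_sales` sees neither salaries nor other departments' rows.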
- Tools
  - Query & reporting
  - OLAP
  - Data mining, visualization, segmentation, clustering
- New developments: text mining, web mining & personalization
  - Mining multimedia data
- Commercial tools
  - Crystal Reports, Impromptu, WebFocus
- Increasingly common mode of delivery:
  - Web-enabled
Thank you
