Sei sulla pagina 1di 48

Data Warehouse Architecture

Decision Support
Information technology to help the knowledge worker (executive, manager, analyst) make faster & better decisions
What were the sales volumes by region and product category for the last year? How did the share price of comp. manufacturers correlate with quarterly profits over the past 10 years? Which orders should we fill to maximize revenues?

On-line analytical processing (OLAP) is an element of decision support systems (DSS)

OLAP Conceptual Data Model


Goal of OLAP is to support ad-hoc querying for the business analyst Business analysts are familiar with spreadsheets Extend spreadsheet analysis model to work with warehouse data Multidimensional view of data is the foundation of OLAP

Three-Tier Decision Support Systems


Warehouse database server
Almost always a relational DBMS, rarely flat files

OLAP servers
Relational OLAP (ROLAP): extended relational DBMS that maps operations on multidimensional data to standard relational operators Multidimensional OLAP (MOLAP): special-purpose server that directly implements multidimensional data and operations

Clients
Query and reporting tools Analysis tools Data mining tools

Approaches to OLAP Servers


Three possibilities for OLAP servers (1) Relational OLAP (ROLAP) Relational and specialized relational DBMS to store and manage warehouse data OLAP middleware to support missing pieces (2) Multidimensional OLAP (MOLAP) Array-based storage structures Direct access to array data structures (3) Hybrid OLAP (HOLAP)

Storing detailed data in RDBMS Storing aggregated data in MDBMS User access via MOLAP tools

OLTP vs. OLAP


On-Line Transaction Processing (OLTP):
technology used to perform updates on operational or transactional systems (e.g., point of sale systems)

On-Line Analytical Processing (OLAP):


technology used to perform complex analysis of the data in a data warehouse
OLAP is a category of software technology that enables analysts, managers, and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the dimensionality of the enterprise as understood by the user. [source: OLAP Council: www.olapcouncil.org]

OLTP vs. OLAP Source: Datta, GT


OLTP OLAP

User Function DB Design

Clerk, IT Professional Day to day operations Application-oriented (E-R based) Current, Isolated Detailed, Flat relational Structured, Repetitive Short, Simple transaction Read/write Index/hash on prim. Key Tens Thousands 100 MB-GB Trans. throughput

Knowledge worker Decision support Subject-oriented (Star, snowflake) Historical, Consolidated Summarized, Multidimensional Ad hoc Complex query Read Mostly Lots of Scans Millions Hundreds 100GB-TB Query throughput, response

Data View Usage Unit of work Access Operations # Records accessed #Users Db size Metric

Approaches to OLAP Servers


Multidimensional OLAP (MOLAP) Array-based storage structures Direct access to array data structures Example: Essbase (Arbor)

Relational OLAP (ROLAP)


Relational and Specialized Relational DBMS to store and manage warehouse data OLAP middleware to support missing pieces Optimize for each DBMS backend Aggregation Navigation Logic Additional tools and services Example: Microstrategy, MetaCube (Informix)

ROLAP
Special schema design: star, snowflake Special indexes: bitmap, multi-table join Special tuning: maximize query throughput Proven technology (relational model, DBMS), tend to outperform specialized MDDB especially on large data sets Products
IBM DB2, Oracle, Sybase IQ, RedBrick, Informix

Points to be noticed about ROLAP


Defines complex, multi-dimensional data with simple model Reduces the number of joins a query has to process Allows the data warehouse to evolve with rel. low maintenance Can contain both detailed and summarized data. ROLAP is based on familiar, proven, and already selected technologies. BUT!!! SQL for multi-dimensional manipulation of calculations.

MOLAP
MDDB: a special-purpose data model Facts stored in multi-dimensional arrays Dimensions used to index array Sometimes on top of relational DB Products
Pilot, Arbor Essbase, Gentia

Multidimensional Data
Sales Volume as a function of time, city and product

Juice Cola Milk Cream

10 47 30 12

3/1 3/2 3/3 3/4

Date

Operations in Multidimensional Data Model


Aggregation (roll-up) dimension reduction: e.g., total sales by city summarization over aggregate hierarchy: e.g., total sales by city and year -> total sales by region and by year Selection (slice) defines a subcube e.g., sales where city = Palo Alto and date = 1/15/96 Navigation to detailed data (drill-down) e.g., (sales - expense) by city, top 3% of cities by average income Visualization Operations (e.g., Pivot)

A Visual Operation: Pivot (Rotate)


Juice Cola Milk
10

47
30 Product

Cream 12

3/1 3/2 3/3 3/4

Date

Advantages of ROLAP Dimensional Modeling


Define complex, multi-dimensional data with simple model Reduces the number of joins a query has to process Allows the data warehouse to evolve with rel. low maintenance HOWEVER! Star schema and relational DBMS are not the magic solution
Query optimization is still problematic

Aggregates
Add up amounts for day 1 In SQL: SELECT sum(amt) FROM SALE WHERE date = 1
sale prodId p1 p2 p1 p2 p1 p1 storeId s1 s1 s3 s2 s1 s2 date 1 1 1 1 2 2 amt 12 11 50 8 44 4

81

Aggregates
Add up amounts by day In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date
sale prodId p1 p2 p1 p2 p1 p1 storeId s1 s1 s3 s2 s1 s2 date 1 1 1 1 2 2 amt 12 11 50 8 44 4

ans

date 1 2

sum 81 48

Another Example
Add up amounts by day, product In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date, prodId
sale prodId p1 p2 p1 p2 p1 p1 storeId s1 s1 s3 s2 s1 s2 date 1 1 1 1 2 2 amt 12 11 50 8 44 4

sale

prodId p1 p2 p1

date 1 1 2

amt 62 19 48

rollup
drill-down

Aggregates
Operators: sum, count, max, min, median, ave Having clause Using dimension hierarchy
average by region (within store) maximum by month (within date)

ROLAP vs. MOLAP


ROLAP: Relational On-Line Analytical Processing MOLAP: Multi-Dimensional On-Line Analytical Processing

The MOLAP Cube


Fact table view:
sale prodId p1 p2 p1 p2 storeId s1 s1 s3 s2 amt 12 11 50 8

Multi-dimensional cube:
p1 p2 s1 12 11 s2 8 s3 50

dimensions = 2

3-D Cube
Fact table view:
sale prodId p1 p2 p1 p2 p1 p1 storeId s1 s1 s3 s2 s1 s2 date 1 1 1 1 2 2 amt 12 11 50 8 44 4

Multi-dimensional cube:

day 2 day 1

p1 p2 s1 p1 12 p2 11

s1 44 s2 8

s2 4 s3 50

s3

dimensions = 3

Example
roll-up to region

NY
SF LA Juice Milk Coke Cream Soap Bread 10 34 56 32 12 56 M T W Th F S S

Dimensions: Time, Product, Store roll-up to brand Attributes: Product (ucp, price, ) Store Hierarchies: Product Brand Day Week Quarter roll-up to week Store Region Country

Product

Time
56 units of bread sold in LA on M

Cube Aggregation: Roll-up


Example: computing sums
day 2
day 1
p1 p2 s1 p1 12 p2 11 s1 44 s2 8 s2 4 s3 50 s3

...

sum
p1 p2 s1 56 11 s2 4 8 s3 50

s1 67

s2 12

s3 50

129
p1 p2 sum 110 19

rollup drill-down

Cube Operators for Roll-up


day 2
day 1
p1 p2 s1 p1 12 p2 11 s1 44 s2 8 s2 4 s3 50 s3

... sale(s1,*,*)
sum s1 67 s2 12 s3 50

p1 p2

s1 56 11

s2 4 8

s3 50

129
p1 p2 sum 110 19

sale(s2,p2,*)

sale(*,*,*)

Extended Cube
*

day 2

p1 p2 * s1

s1 56 11 67 s2

s2 4 8 12 s3
* 62 19 81

s3 50 50 * 48

* 110 19 129

day 1

p1 p2 *

p1 p2 s1 * 12 11 23

44
s2 44 8 8

4
s3 4 50 50

48

sale(*,p2,*)

Aggregation Using Hierarchies


day 2 day 1
p1 p2 s1 p1 12 p2 11 s1 44 s2 8 s2 4 s3 50 s3

store region country

p1 p2

region A region B 56 54 11 8

(store s1 in Region A; stores s2, s3 in Region B)

Slicing
day 2
day 1
p1 p2 s1 p1 12 p2 11 s1 44 s2 8 s2 4 s3 50 s3

TIME = day 1
s1 12 11 s2 8 s3 50

p1 p2

Slicing & Pivoting

Products Store s1 Electronics Toys Clothing Cosmetics Electronics Toys Clothing Cosmetics

Store s2

Sales ($ millions) Time d1 d2 $5.2 $1.9 $2.3 $1.1 $8.9 $0.75 $4.6 $1.5 Sales ($ millions) d1 Store s1 Store s2 $5.2 $8.9 $1.9 $0.75 $2.3 $4.6 $1.1 $1.5

Products Store s1 Electronics Toys Clothing Cosmetics Electronics Toys Clothing

Store s2

Summary of Operations
Aggregation (roll-up) aggregate (summarize) data to the next higher dimension element e.g., total sales by city, year total sales by region, year Navigation to detailed data (drill-down) Selection (slice) defines a subcube e.g., sales where city =Gainesville and date = 1/15/90 Calculation and ranking e.g., top 3% of cities by average income Visualization operations (e.g., Pivot) Time functions e.g., time average

Points to be noticed about MOLAP


Pre-calculating or pre-consolidating transactional data improves speed. BUT Fully pre-consolidating incoming data, MDDs require an enormous amount of overhead both in processing time and in storage. An input file of 200MB can easily expand to 5GB

MDDs are great candidates for the <50GB department data marts.
Rolling up and Drilling down through aggregate data. With MDDs, application design is essentially the definition of dimensions and calculation rules, while the RDBMS requires that the database schema be a star or snowflake.

ROLAP

Relational DBMS as Warehouse Server


Schema design Specialized scan, indexing and join techniques Handling of aggregate views (querying and materialization) Supporting query language extensions beyond SQL Complex query processing and optimization Data partitioning and parallelism

MOLAP vs. OLAP


Commercial offerings of both types are available In general, MOLAP is good for smaller warehouses and is optimized for canned queries In general, ROLAP is more flexible and leverages relational technology on the data server and uses a ROLAP server as intermediary. May pay a performance penalty to realize flexibility

The MOLAP Cube


Fact table view:
sale prodId p1 p2 p1 p2 storeId s1 s1 s3 s2 amt 12 11 50 8

Multi-dimensional cube:
p1 p2 s1 12 11 s2 8 s3 50

dimensions = 2

Hybrid OLAP (HOLAP)


HOLAP = Hybrid OLAP:
Best of both worlds

Storing detailed data in RDBMS


Storing aggregated data in MDBMS User access via MOLAP tools

Data Flow in HOLAP


RDBMS Server MDBMS Server
Multidimensional access Multidimensional data SQL-Reach Through

Client

SQL-Read User data

Meta data Derived data

Multidimensional Viewer

Relational Viewer
SQL-Read

When deciding which technology to go for, consider:


1) Performance:
How fast will the system appear to the end-user? MDD server vendors believe this is a key point in their favor.

2) Data volume and scalability:


While MDD servers can handle up to 50GB of storage, RDBMS servers can handle hundreds of gigabytes and terabytes.

An experiment with Relational and the Multidimensional models on a data set


The analysis of the authors example illustrates the following differences between the best Relational alternative and the Multidimensional approach. relational MultiImprovement dimensiona l Disk space requirement (Gigabytes) 17 10 1 2* 1.7 240 110*

Retrieve the corporate measures 240 Actual Vs Budget, by month (I/Os) Calculation of Variance Budget/Actual for the whole database (I/O time in hours) 237

* This may include the calculation of many other derived data without any additional I/O. Reference: http://dimlab.usc.edu/csci599/Fall2002/paper/I2_P064.pdf

What-if analysis
IF A. You require write access B. Your data is under 50 GB C. Your timetable to implement is 60-90 days D. Lowest level already aggregated E. Data access on aggregated level F. Youre developing a general-purpose application for inventory movement or assets management THEN Consider an MDD /MOLAP solution for your data mart IF A. Your data is over 100 GB B. You have a "read-only" requirement C. Historical data at the lowest level of granularity D. Detailed access, long-running queries E. Data assigned to lowest level elements THEN Consider an RDBMS/ROLAP solution for your data mart. IF A. OLAP on aggregated and detailed data B. Different user groups C. Ease of use and detailed data THEN Consider an HOLAP for your data mart

Examples
ROLAP
Telecommunication startup: call data records (CDRs) ECommerce Site Credit Card Company

MOLAP
Analysis and budgeting in a financial department Sales analysis

HOLAP
Sales department of a multi-national company Banks and Financial Service Providers

Tools: Warehouse Servers


The RDBMS dominates:
Oracle 8i/9i IBM DB2 Microsoft SQL Server Informix (IBM) Red Brick Warehouse (Informix/IBM) NCR Teradata Sybase

Tools: OLAP Servers


Support multidimensional OLAP queries Often characterized by how the underlying data stored Relational OLAP (ROLAP) Servers
Data stored in relational tables Examples: Microstrategy Intelligence Server, MetaCube (Informix/IBM)

Multidimensional OLAP (MOLAP) Servers


Data stored in array-based structures Examples: Hyperion Essbase, Fusion (Information Builders)

Hybrid OLAP (HOLAP)


Examples: PowerPlay (Cognos), Brio, Microsoft Analysis Services, Oracle Advanced Analytic Services

DOLAP: Brio.Enterprise BusinessObjects Cognos PowerPlay MOLAP SAS CFO Vision Comshare Decision Hyperion Essbase PowerPlay Enterprise Server ROLAP Cartesis Carat MicroStrategy HOLAP Oracle Express Seagate Holos Speedware Media/M Microsoft OLAP Services

This list is neither all inclusive nor complete. Product classification and vendor classification might vary.
Source: OLAP architectures, http://www.olapreport.com/Architectures.htm

Tools: Extraction, Transformation, & Load (ETL)


Cognos Accelerator Copy Manager, Data Migrator for SAP, PeopleSoft (Information Builders) DataPropagator (IBM) ETI Extract (Evolutionary Technologies) Sagent Solution (Sagent Technology) PowerMart (Informatica)

Tools: Report & Query


Actuate e.Reporting Suite (Actuate) Brio One (Brio Technologies) Business Objects Crystal Reports (Crystal Decisions) Impromptu (Cognos) Oracle Discoverer, Oracle Reports QMF (IBM) SAS Enterprise Reporter

Tools: Data Mining


BusinessMiner (Business Objects) Decision Series (Accrue) Enterprise Miner (SAS) Intelligent Miner (IBM) Oracle Data Mining Suite Scenario (Cognos)

www.microstrategy.com www.businessobjects.com www.cognos.com www.brio.com www.hyperion.com www.oracle.com/ip/analyze/warehouse/bus_intell www.microsoft.com/sql/techinfo/datawarehousing.htm

Potrebbero piacerti anche