
Prof. Chandan Singhavi
17/08/2014
What is OLAP?
On-Line: A process controlled by a computer.
Analytical Processing needs Analytical Data.
Analytical Data: Data that involve analysis.
Analytical Data consist of Business Data.
Business Data: Time, Customers, Sales, Stores,
Products, etc.
Business Data -> Analytical Data -> Analytical Processing -> Client
Possible Views of Sale (dimensions: Products, Time, Customers)
How many Products were sold at a given Time to specific Customer(s)?
How many Customers bought the Product(s) at a specific Time?
At which Time(s) did the Customer(s) buy the specific Product(s)?
Interactive, exploratory analysis of
multidimensional data to discover patterns

[Figure: multidimensional data with axes age, accidents and gender]
A definition:



Data representation is in the form of a CUBE
OLAP goes beyond SQL with its analysis
capabilities
Key feature of OLAP: Relevant multi-
dimensional views such as products, time,
geography

Online analytical processing is a category of software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that has been transformed from raw data to reflect the real dimensionality of the enterprise as understood by the user.
Advanced data analysis environment
Supports decision making, business modeling,
and operations research activities
Characteristics of OLAP
Use multidimensional data analysis techniques
Provide advanced database support
Provide easy-to-use end-user interfaces
Support client/server architecture
Facilitate interactive query and complex analysis for the
user
Allow drill down or roll up
Ability to perform intricate calculations and comparisons
Present result in meaningful ways like chart graphs
August 17, 2014 Data Mining: Concepts and Techniques 10
                     OLTP                          OLAP
users                clerk, IT professional        knowledge worker
function             day-to-day operations         decision support
DB design            application-oriented          subject-oriented
data                 current, up-to-date,          historical,
                     detailed, flat relational,    summarized, multidimensional,
                     isolated                      integrated, consolidated
usage                repetitive                    ad-hoc
access               read/write,                   lots of scans
                     index/hash on prim. key
unit of work         short, simple transaction     complex query
# records accessed   tens                          millions
# users              thousands                     hundreds
DB size              100MB-GB                      100GB-TB
metric               transaction throughput        query throughput, response


Multidimensional conceptual view
Transparency
Accessibility
Consistent reporting performance
Client server architecture
Generic dimensionality
Dynamic sparse matrix handling
Multiuser support
Unrestricted cross dimensional operations
Intuitive data manipulation
Flexible reporting
Unlimited dimensions and aggregation levels



17/08/2014
Theodoros CHRYSAFIS - Academix
s3ctit03 -
www.city.academic.gr/academix 14
OLAP Taxonomy
Multi-dimensional OLAP (MOLAP)
A k-dimensional matrix based on a non-relational storage structure. Agrawal et al.
Relational OLAP (ROLAP)
A relational back-end wherein operations on the data are translated to relational queries. Agrawal et al.
Hybrid OLAP (HOLAP)
Integration of MOLAP and ROLAP.
Desktop OLAP (DOLAP)
Provides a specific cube for analysis. Simplified version of
MOLAP or ROLAP.
Extends OLAP functionality to multidimensional databases (MDBMS)
Stores data in a multidimensional data cube
N-dimensional cubes are called hypercubes
Cube cache memory speeds processing
Affected by how the database system handles the density of the data cube, called sparsity
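To make the sparsity point concrete, here is a minimal sketch comparing dense and sparse storage of a small cube. The dimension sizes, variable names and the use of a Python dict are illustrative assumptions; the cell values mirror the small sale table used later in these slides. Real multidimensional engines use far more elaborate storage schemes.

```python
# Hypothetical cube: 2 products x 3 stores x 2 days, holding sales amounts.
products = ["p1", "p2"]
stores = ["c1", "c2", "c3"]
days = [1, 2]

# Sparse storage: keep only the cells that actually hold a value.
sparse_cube = {
    ("p1", "c1", 1): 12,
    ("p2", "c1", 1): 11,
    ("p1", "c3", 1): 50,
    ("p2", "c2", 1): 8,
    ("p1", "c1", 2): 44,
    ("p1", "c2", 2): 4,
}

# Dense storage: allocate every cell, using None for the empty ones.
dense_cube = [
    [[sparse_cube.get((p, s, d)) for d in days] for s in stores]
    for p in products
]

total_cells = len(products) * len(stores) * len(days)
filled_cells = len(sparse_cube)
sparsity = 1 - filled_cells / total_cells  # fraction of empty cells

print(f"{filled_cells} of {total_cells} cells filled, sparsity = {sparsity:.0%}")
```

Even in this tiny cube half of the dense cells are empty, which is why the way an engine handles empty cells matters for both space and speed.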
OLAP functionality
Uses relational DB query tools
Extensions to RDBMS
Multidimensional data schema support
Data access language and query performance
optimized for multidimensional data
Support for very large databases (VLDBs)
General features
Basic features are
Multidimensional analysis
Consistent performance
Fast response time
Drill down and roll up
Navigation in and out of details
Slice and dice rotation
Multiple view modes
Easy scalability
Time intelligence
Star Structure (quite common)
The Cube
Three-Dimensional Cube Display
[Figure: page = Region (North); columns = Sales (Red blob, Blue blob, Total); rows = Year (1996, 1997, Total)]
Six-Dimensional Cube
Dimension          Example
Brand              Mt. Airy
Store              Atlanta
Customer segment   Business
Product group      Desks
Period             January
Variable           Units sold
MDS structure
A hypercube is a general metaphor for representing multidimensional data


Region          Sales variance
Africa          105%
Asia            57%
Europe          122%
North America   97%
Pacific         85%
South America   163%

Nation          Sales variance
China           123%
Japan           52%
India           87%
Singapore       95%
Just a snippet from http://www.olapreport.com/ProductsIndex.htm ; not an endorsement
Advance Database Techniques 28
MOLAP
The database is stored in a special structure that is optimized for multidimensional analysis.
Data is aggregated and stored according to predicted usage.
Very fast query response time, as data is mostly pre-calculated.
Systems are best used when data is desired for a specific application.
Tight coupling between application and presentation layer.
Practical limit on the size:
- the time taken to calculate the database and the space required to hold these pre-calculated values
- good for smaller storage space (< 50 GB)
Navigation of data is limited
Costly to maintain
Does not scale well
Advantages:

Excellent performance:
MOLAP cubes are built for fast data retrieval and are optimal for slicing and dicing operations.

Can perform complex calculations:
All calculations have been pre-generated when the cube is created. Hence, complex calculations are not only doable, but they return quickly.
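The idea of pre-generating calculations when the cube is built can be sketched as follows. This is a minimal illustration under stated assumptions, not how any particular MOLAP product works: the `build_cube` helper, the `FACTS` list (copied from the sale table used later in these slides) and the use of `"*"` for "all values" are choices made for this example.

```python
from itertools import product

# (prodId, storeId, date, amt) rows, taken from the sale table in these slides.
FACTS = [
    ("p1", "c1", 1, 12),
    ("p2", "c1", 1, 11),
    ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8),
    ("p1", "c1", 2, 44),
    ("p1", "c2", 2, 4),
]

def build_cube(facts):
    """Pre-compute sum(amt) for every combination of dimension values,
    where '*' stands for 'all values' of that dimension."""
    cube = {}
    for prod, store, date, amt in facts:
        # Each fact contributes to 2**3 = 8 cells: every dimension is either
        # kept at its own value or generalized to '*'.
        for key in product((prod, "*"), (store, "*"), (date, "*")):
            cube[key] = cube.get(key, 0) + amt
    return cube

cube = build_cube(FACTS)

# After the build step, every aggregate is a dictionary lookup.
print(cube[("p1", "*", "*")])   # total for product p1
print(cube[("*", "*", "*")])    # grand total
print(len(FACTS), "facts ->", len(cube), "stored cells")
```

The last line hints at the trade-off: queries become lookups, but the number of stored cells grows much faster than the number of facts.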


Disadvantages:

Handles limited data:
Because all calculations are performed when the cube is built, it is not possible to include a large amount of data in the cube itself.

Requires additional investment:
Cube technology is often proprietary and does not already exist in the organization. Therefore, to adopt MOLAP technology, chances are that additional investments in human and capital resources are needed.

ROLAP is an alternative to the MOLAP technology.

ROLAP differs significantly in that it does not require the pre-computation and storage of information.

ROLAP tools access the data in a relational database and generate SQL queries to calculate information at the appropriate level when an end user requests it.

It is possible to create additional database tables (summary tables and aggregations) that summarize the data at any desired combination of dimensions.
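A minimal sketch of the two ideas above: generating SQL for the requested level on demand, and optionally materializing a summary table. The `aggregate_sql` helper, the `sale_by_day` table name and the use of an in-memory SQLite database are assumptions made for this illustration; the rows come from the sale table shown later in these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)"
)
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
     ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)],
)

ALLOWED_DIMENSIONS = ("prodId", "storeId", "date")

def aggregate_sql(dimensions):
    """Build the kind of SQL a ROLAP-style tool might generate for one level."""
    for dim in dimensions:
        if dim not in ALLOWED_DIMENSIONS:  # never interpolate unchecked names
            raise ValueError(f"unknown dimension: {dim}")
    if not dimensions:
        return "SELECT sum(amt) AS total FROM sale"
    cols = ", ".join(dimensions)
    return f"SELECT {cols}, sum(amt) AS total FROM sale GROUP BY {cols} ORDER BY {cols}"

# Computed on request, straight from the detail rows.
by_day = conn.execute(aggregate_sql(["date"])).fetchall()
grand_total = conn.execute(aggregate_sql([])).fetchone()[0]

# Optional summary table for a frequently requested combination of dimensions.
conn.execute("CREATE TABLE sale_by_day AS " + aggregate_sql(["date"]))
day1_total = conn.execute(
    "SELECT total FROM sale_by_day WHERE date = 1"
).fetchone()[0]

print(by_day, grand_total, day1_total)
```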

The database is a standard relational
database and the database model is a
multidimensional model, often referred to
as a star or snowflake model or schema.
ROLAP
A multi-dimensional user view on relational
data storage using Star or Snowflake
Database Schemata.
[Figure: Star Schema: a Sales fact table with Product, Time, Region and Customer dimensions]
[Figure: Snowflake Schema: a Sales fact table with Product, Year, Country and Customer dimensions, plus the further tables Product Kind, Month, Region and Customer Characteristics]
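A star schema like the one pictured can be sketched in SQL as one fact table referencing four dimension tables. Everything below is a hypothetical illustration: the table and column names, the sample rows and the choice of SQLite are assumptions, not part of the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per product, time period, region, customer.
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE time_dim (time_id     INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE region   (region_id   INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);

-- Fact table: one row per sale, pointing at each dimension.
CREATE TABLE sales (
    product_id  INTEGER REFERENCES product(product_id),
    time_id     INTEGER REFERENCES time_dim(time_id),
    region_id   INTEGER REFERENCES region(region_id),
    customer_id INTEGER REFERENCES customer(customer_id),
    amount      INTEGER
);

-- Invented sample data.
INSERT INTO product  VALUES (1, 'bolt'), (2, 'nut');
INSERT INTO time_dim VALUES (1, 2014, 7), (2, 2014, 8);
INSERT INTO region   VALUES (1, 'North'), (2, 'South');
INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO sales VALUES (1, 1, 1, 1, 12), (2, 1, 1, 2, 11), (1, 2, 2, 1, 44);
""")

# A multidimensional view: sales amount by product and region.
rows = conn.execute("""
    SELECT p.name, r.name, sum(s.amount)
    FROM sales s
    JOIN product p ON p.product_id = s.product_id
    JOIN region  r ON r.region_id  = s.region_id
    GROUP BY p.name, r.name
    ORDER BY p.name, r.name
""").fetchall()
print(rows)
```

A snowflake variant would split a dimension further, for example moving a product kind into its own table referenced from `product`.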
ROLAP
Advantages: Easy to understand, easy to
model, easy to implement.
Further Research on dynamic optimisation, on
meta-models, on functional extensions for
the ROLAP engines, on user-defined
functions for the OLAP.

Advantages:

Can handle large amounts of data:
The data size limitation of ROLAP technology is the limitation on data size of the underlying relational database. In other words, ROLAP itself places no limitation on data amount.
Can leverage functionalities inherent in the relational database:
Often, the relational database already comes with a host of functionalities. ROLAP technologies, since they sit on top of the relational database, can therefore leverage these functionalities.
Easy to understand, easy to model, easy to implement.


Disadvantages:

Performance can be slow:
Because each ROLAP report is essentially a SQL query (or multiple SQL queries) in the relational database, the query time can be long if the underlying data size is large.
Limited by SQL functionalities:
Because ROLAP technology mainly relies on generating SQL statements to query the relational database, and SQL statements do not fit all needs (for example, it is difficult to perform complex calculations using SQL), ROLAP technologies are therefore traditionally limited by what SQL can do. ROLAP vendors have mitigated this risk by building into the tool out-of-the-box complex functions as well as the ability to allow users to define their own functions.

ROLAP v/s MOLAP and HOLAP


Relational vs. Multidimensional OLAP
HOLAP
A hybrid of ROLAP and MOLAP
Can be thought of as a virtual database in which the higher levels of the database are implemented as MOLAP and the lower levels as ROLAP
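One way to picture this split is a small sketch in which summary-level questions are answered from a pre-computed in-memory structure (the MOLAP-like part) and detail-level questions are sent as SQL to the relational store (the ROLAP-like part). The function names, the choice of "total per product" as the summary level and the use of SQLite are all assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)"
)
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
     ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)],
)

# Higher level: totals per product, computed once and kept in memory.
summary = dict(conn.execute("SELECT prodId, sum(amt) FROM sale GROUP BY prodId"))

def total_for_product(prod):
    """High-level query: answered from the pre-computed summary."""
    return summary.get(prod, 0)

def detail_for_product(prod):
    """Low-level query: answered by SQL against the relational detail rows."""
    return conn.execute(
        "SELECT storeId, date, amt FROM sale WHERE prodId = ? "
        "ORDER BY date, storeId",
        (prod,),
    ).fetchall()

print(total_for_product("p1"))
print(detail_for_product("p2"))
```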
A system that supports (and integrates) multi-dimensional and relational storage for data in an equivalent manner, in order to benefit from the corresponding characteristics and optimization techniques.
Advantages: use of the best techniques introduced in MOLAP and ROLAP; transparency between MOLAP and ROLAP systems.
Development Issues
Results in a lot of data redundancy
Allows users to build custom cubes, causing data inconsistencies
Only limited amounts of data can be maintained efficiently
Almost all systems utilize HOLAP to some extent
DOLAP (Desktop OLAP)

The previous terms are used to refer to server based OLAP
technologies

DOLAP enables users to quickly pull together small cubes that run on
their desktops or laptops
2014.08.17. OLAP operations 51
Roll up (drill-up): summarize data
by climbing up hierarchy or by dimension reduction
Drill down (roll down): reverse of roll-up
from higher level summary to lower level summary or
detailed data, or introducing new dimensions
Slice and dice:
project and select
Pivot (rotate):
reorient the cube, visualization, 3D to series of 2D
planes.
Other operations
drill across: involving (across) more than one fact table
drill through: through the bottom level of the cube to its
back-end relational tables (using SQL)
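The roll-up, drill-down and pivot operations listed above can be sketched on a tiny fact table. This is a minimal illustration, assuming an in-memory list of tuples and hypothetical helper names (`aggregate`, `pivot`); the rows are those of the sale table used in the following slides.

```python
from collections import defaultdict

DIMS = ("prodId", "storeId", "date")
FACTS = [
    ("p1", "c1", 1, 12),
    ("p2", "c1", 1, 11),
    ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8),
    ("p1", "c1", 2, 44),
    ("p1", "c2", 2, 4),
]

def aggregate(facts, keep):
    """Sum amt grouped by the dimensions named in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for row in facts:
        out[tuple(row[i] for i in idx)] += row[3]
    return dict(out)

# Drill-down level: sums by product and day.
detailed = aggregate(FACTS, ["prodId", "date"])

# Roll-up by dimension reduction: drop prodId, keep only the day.
rolled_up = aggregate(FACTS, ["date"])

def pivot(table):
    """Swap the two axes of a 2-D aggregate (rotate the view)."""
    return {(b, a): v for (a, b), v in table.items()}

print(detailed)
print(rolled_up)
print(pivot(detailed))
```

Going from `rolled_up` to `detailed` is the drill-down direction; it needs the underlying facts because the coarser result no longer contains the product breakdown.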
sale:
prodId  storeId  date  amt
p1      c1       1     12
p2      c1       1     11
p1      c3       1     50
p2      c2       1     8
p1      c1       2     44
p1      c2       2     4
Add up amounts for day 1
In SQL: SELECT sum(amt) FROM SALE WHERE date = 1
Result: 81
sale:
prodId  storeId  date  amt
p1      c1       1     12
p2      c1       1     11
p1      c3       1     50
p2      c2       1     8
p1      c1       2     44
p1      c2       2     4
Add up amounts by day
In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date
ans:
date  sum
1     81
2     48
sale:
prodId  storeId  date  amt
p1      c1       1     12
p2      c1       1     11
p1      c3       1     50
p2      c2       1     8
p1      c1       2     44
p1      c2       2     4
Add up amounts by day, product
In SQL: SELECT prodId, date, sum(amt) FROM SALE GROUP BY date, prodId
Result:
prodId  date  amt
p1      1     62
p2      1     19
p1      2     48
Going from the by-day sums to these by-day, by-product sums is a drill-down; the reverse is a rollup.
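The aggregation queries on these slides can be tried against an in-memory SQLite database. The sketch below is one possible setup (the use of Python's `sqlite3` module is an assumption; the slides do not name a database). It also checks that rolling the by-day, by-product sums up over prodId reproduces the by-day sums.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)"
)
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
     ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)],
)

# Add up amounts for day 1.
day1 = conn.execute("SELECT sum(amt) FROM sale WHERE date = 1").fetchone()[0]

# Add up amounts by day.
by_day = conn.execute(
    "SELECT date, sum(amt) FROM sale GROUP BY date ORDER BY date"
).fetchall()

# Add up amounts by day and product (the drill-down level).
by_day_product = conn.execute(
    "SELECT date, prodId, sum(amt) FROM sale "
    "GROUP BY date, prodId ORDER BY date, prodId"
).fetchall()

# Roll up: summing the finer result over prodId gives the by-day result.
rolled = {}
for date, _prod, total in by_day_product:
    rolled[date] = rolled.get(date, 0) + total

print(day1, by_day, by_day_product, rolled)
```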
Operators: sum, count, max, min, median, avg
Having clause
Using dimension hierarchy:
- average by region (within store)
- maximum by month (within date)
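As a small illustration of the operators and the HAVING clause, the sketch below runs them on the sale table from these slides. SQLite via Python's `sqlite3` is an assumed setup, and the median is computed in Python with `statistics.median` rather than in SQL, which is a choice made for this example.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)"
)
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [("p1", "c1", 1, 12), ("p2", "c1", 1, 11), ("p1", "c3", 1, 50),
     ("p2", "c2", 1, 8), ("p1", "c1", 2, 44), ("p1", "c2", 2, 4)],
)

# Several aggregate operators per product.
per_product = conn.execute(
    "SELECT prodId, sum(amt), count(*), max(amt), min(amt), avg(amt) "
    "FROM sale GROUP BY prodId ORDER BY prodId"
).fetchall()

# HAVING filters groups after aggregation: stores whose total exceeds 20.
big_stores = conn.execute(
    "SELECT storeId, sum(amt) FROM sale "
    "GROUP BY storeId HAVING sum(amt) > 20 ORDER BY storeId"
).fetchall()

# Median of the p1 amounts, computed on the client side.
p1_amounts = [row[0] for row in
              conn.execute("SELECT amt FROM sale WHERE prodId = 'p1'")]
p1_median = statistics.median(p1_amounts)

print(per_product, big_stores, p1_median)
```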
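Example: computing sums

day 1:
      c1   c2   c3
p1    12   -    50
p2    11   8    -

day 2:
      c1   c2   c3
p1    44   4    -
p2    -    -    -

Rollup over date:
      c1   c2   c3
p1    56   4    50
p2    11   8    -

Rollup over date and product:
      c1   c2   c3
sum   67   12   50

Rollup over date and store:
      sum
p1    110
p2    19

Rollup over all dimensions: 129

Drill-down goes in the opposite direction, from the sums back to the more detailed tables.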
Cube cells, where * stands for all values of a dimension:
sale(c1,*,*) = 67
sale(*,*,*) = 129
sale(c2,p2,*) = 8
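The star notation can be read as a function of (store, product, date) in which `*` matches every value. A minimal sketch, assuming the facts are held in a Python list and computing each cell on demand by scanning them; the `sale` function name simply mirrors the slide's notation.

```python
# (prodId, storeId, date, amt) rows from the sale table.
FACTS = [
    ("p1", "c1", 1, 12),
    ("p2", "c1", 1, 11),
    ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8),
    ("p1", "c1", 2, 44),
    ("p1", "c2", 2, 4),
]

def sale(store="*", prod="*", date="*"):
    """Sum amt over all facts matching the arguments; '*' matches any value."""
    total = 0
    for p, s, d, amt in FACTS:
        if store in ("*", s) and prod in ("*", p) and date in ("*", d):
            total += amt
    return total

print(sale("c1", "*", "*"))
print(sale("*", "*", "*"))
print(sale("c2", "p2", "*"))
```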
Cube extended with * (all values) along each dimension:

day 1:
      c1   c2   c3   *
p1    12   -    50   62
p2    11   8    -    19
*     23   8    50   81

day 2:
      c1   c2   c3   *
p1    44   4    -    48
p2    -    -    -    -
*     44   4    -    48

date = *:
      c1   c2   c3   *
p1    56   4    50   110
p2    11   8    -    19
*     67   12   50   129

sale(*,p2,*) = 19
Aggregation using a dimension hierarchy: customer -> region -> country
(customer c1 in Region A; customers c2, c3 in Region B)

      region A   region B
p1    56         54
p2    11         8
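Rolling up along the customer -> region hierarchy amounts to replacing each customer by its region before summing. A minimal sketch, assuming the hierarchy is held in a plain dictionary (`REGION_OF` is a name invented for this example) and using the membership stated on the slide:

```python
from collections import defaultdict

# (prodId, customer/store id, date, amt) rows from the sale table.
FACTS = [
    ("p1", "c1", 1, 12),
    ("p2", "c1", 1, 11),
    ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8),
    ("p1", "c1", 2, 44),
    ("p1", "c2", 2, 4),
]

# One level of the hierarchy: customer c1 in Region A; c2, c3 in Region B.
REGION_OF = {"c1": "A", "c2": "B", "c3": "B"}

by_region = defaultdict(int)
for prod, cust, _date, amt in FACTS:
    # Climb the hierarchy: map the customer to its region, then sum.
    by_region[(prod, REGION_OF[cust])] += amt

print(dict(by_region))
```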
Fact table view:
sale:
prodId  storeId  date  amt
p1      c1       1     12
p2      c1       1     11
p1      c3       1     50
p2      c2       1     8
p1      c1       2     44
p1      c2       2     4

Multi-dimensional cube:
day 1:
      c1   c2   c3
p1    12   -    50
p2    11   8    -

day 2:
      c1   c2   c3
p1    44   4    -
p2    -    -    -

Summed over date:
      c1   c2   c3
p1    56   4    50
p2    11   8    -

Slicing and Dicing
Slicing is selecting a group of cells from the entire multidimensional array by specifying a specific value for one or more dimensions.
Dicing involves selecting a subset of cells by specifying a range of attribute values. This is equivalent to defining a subarray from the complete array.
In practice, both operations can also be accompanied by aggregation over some dimensions.
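Slicing and dicing as defined above can be sketched on a cube stored as a dictionary keyed by (prodId, storeId, date). The helper names and the use of value sets to stand in for "a range of attribute values" are assumptions made for this illustration; the cell values come from the sale table in these slides.

```python
DIMS = ("prodId", "storeId", "date")
CUBE = {
    ("p1", "c1", 1): 12,
    ("p2", "c1", 1): 11,
    ("p1", "c3", 1): 50,
    ("p2", "c2", 1): 8,
    ("p1", "c1", 2): 44,
    ("p1", "c2", 2): 4,
}

def slice_cube(cube, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}

def dice_cube(cube, allowed):
    """Dice: keep cells whose coordinates fall inside the allowed values.
    `allowed` maps a dimension name to the set of values to keep."""
    return {
        k: v for k, v in cube.items()
        if all(k[DIMS.index(d)] in vals for d, vals in allowed.items())
    }

day1 = slice_cube(CUBE, "date", 1)
sub = dice_cube(CUBE, {"prodId": {"p1"}, "storeId": {"c1", "c2"}})

# Either operation can be followed by aggregation over the remaining cells.
day1_total = sum(day1.values())

print(day1, sub, day1_total)
```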

[Figure: age index pointing to data records; index entries 20, 23 and 18, 19, 20, 21, 22, 23, 25, 26]
Data records:
id  name   age
1   joe    20
2   fred   20
3   sally  21
4   nancy  20
5   tom    20
6   pat    25
7   dave   21
8   jeff   26
...
Customers(custid: integer, name: string, gender: boolean, rating: integer)
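An index on age like the one pictured can be sketched as a mapping from each age value to the matching records. The example below builds two variants over the data records shown: a value-list index and a bitmap per value. Both structures and all names are illustrative assumptions, not the exact layout in the figure.

```python
RECORDS = [
    (1, "joe", 20), (2, "fred", 20), (3, "sally", 21), (4, "nancy", 20),
    (5, "tom", 20), (6, "pat", 25), (7, "dave", 21), (8, "jeff", 26),
]

# Value-list index: age -> ids of the records with that age.
age_index = {}
for rec_id, _name, age in RECORDS:
    age_index.setdefault(age, []).append(rec_id)

# Bitmap index: age -> integer whose bit i is set when RECORDS[i] has that age.
age_bitmap = {}
for pos, (_id, _name, age) in enumerate(RECORDS):
    age_bitmap[age] = age_bitmap.get(age, 0) | (1 << pos)

def ids_with_age_in(ages):
    """Answer 'age IN (...)' by OR-ing the bitmaps of the requested values."""
    mask = 0
    for a in ages:
        mask |= age_bitmap.get(a, 0)
    return [RECORDS[pos][0] for pos in range(len(RECORDS)) if (mask >> pos) & 1]

print(age_index[20])
print(ids_with_age_in([21, 25]))
```

Bitmaps combine cheaply with bitwise AND and OR, which is why they suit columns with few distinct values.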

Sparse column

Combine SALE, PRODUCT relations
In SQL: SELECT * FROM SALE, PRODUCT WHERE SALE.prodId = PRODUCT.id

sale:
prodId  storeId  date  amt
p1      c1       1     12
p2      c1       1     11
p1      c3       1     50
p2      c2       1     8
p1      c1       2     44
p1      c2       2     4

product:
id  name  price
p1  bolt  10
p2  nut   5

joinTb:
prodId  name  price  storeId  date  amt
p1      bolt  10     c1       1     12
p2      nut   5      c1       1     11
p1      bolt  10     c3       1     50
p2      nut   5      c2       1     8
p1      bolt  10     c1       2     44
p1      bolt  10     c2       2     4
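The joinTb result pairs each sale row with the product row that has the same id. A minimal sketch of that join in plain Python, using a dictionary lookup on the product id (a simple hash join, chosen here for illustration); it also shows that pairing every sale with every product, with no join condition, would give twice as many rows.

```python
SALE = [
    ("p1", "c1", 1, 12),
    ("p2", "c1", 1, 11),
    ("p1", "c3", 1, 50),
    ("p2", "c2", 1, 8),
    ("p1", "c1", 2, 44),
    ("p1", "c2", 2, 4),
]
PRODUCT = [("p1", "bolt", 10), ("p2", "nut", 5)]

# Build a lookup on the join key, then probe it once per sale row.
product_by_id = {pid: (name, price) for pid, name, price in PRODUCT}
join_tb = [
    (prod, *product_by_id[prod], store, date, amt)
    for prod, store, date, amt in SALE
    if prod in product_by_id
]

# Without a join condition, every sale would pair with every product.
cross_product_rows = len(SALE) * len(PRODUCT)

for row in join_tb:
    print(row)
print(len(join_tb), "joined rows vs", cross_product_rows, "cross-product rows")
```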
Join index

product:
id  name  price  jIndex
p1  bolt  10     r1,r3,r5,r6
p2  nut   5      r2,r4

sale:
rId  prodId  storeId  date  amt
r1   p1      c1       1     12
r2   p2      c1       1     11
r3   p1      c3       1     50
r4   p2      c2       1     8
r5   p1      c1       2     44
r6   p1      c2       2     4

Bitmapped join index
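The jIndex column can be derived by recording, for each product, the ids of the sale rows that reference it. The sketch below builds that join index and a bitmapped variant with one bit per sale row. The dictionary layout and the list-of-bits representation are assumptions made for readability.

```python
# Sale rows keyed by row id: rId -> (prodId, storeId, date, amt).
SALE = {
    "r1": ("p1", "c1", 1, 12),
    "r2": ("p2", "c1", 1, 11),
    "r3": ("p1", "c3", 1, 50),
    "r4": ("p2", "c2", 1, 8),
    "r5": ("p1", "c1", 2, 44),
    "r6": ("p1", "c2", 2, 4),
}

# Join index: product id -> ids of the sale rows that join with it.
join_index = {}
for rid, (prod, *_rest) in SALE.items():
    join_index.setdefault(prod, []).append(rid)

# Bitmapped join index: product id -> one bit per sale row, in row order.
row_ids = list(SALE)
bitmap_join_index = {
    prod: [1 if rid in rids else 0 for rid in row_ids]
    for prod, rids in join_index.items()
}

# Using the index: total amount for p2 without scanning every sale row.
p2_total = sum(SALE[rid][3] for rid in join_index["p2"])

print(join_index)
print(bitmap_join_index)
print(p2_total)
```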

File organisation


Web based OLAP

Web OLAP approaches
Browser plug ins
Precreated HTML documents
OLAP in the server
OLAP engine design

Dependence on the RDBMS
Dependence on engine
Intelligent OLAP engine
