OLAP

On-Line Analytical Processing
If you want to analyze data, then On-Line Analytical Processing (OLAP) is often the best way to
organize the data. OLAP organization provides several benefits:
Fast
Data is organized for rapid query and analysis. The database structure
uses efficient multidimensional or tuned relational approaches.
Visual
Tools enable the analyst to navigate and view results through graphics
such as bar charts, pie charts, and tree structures.
Multidimensional
Supports "slicing and dicing" along multiple dimensions such as
product, customer, and location. Also supports "pivot" or "cross tab"
views, where the investigator changes the direction of the analysis.
Aggregation
Supports both drill down to details and roll up to summaries. Some
aggregations may be pre-calculated to save analysis time. This
pre-calculation is where MOLAP (multidimensional OLAP) provides value.
See below.
Time Series
Supports trend analysis. Most data marts include a calendar dimension.
This dimension supports time hierarchies: year, quarter, month, week,
day of week, and day.
Ranking
Find the top, bottom, or quartile members of a group, such as the
10 most profitable products or the 10 least profitable sales
territories.
Clusters and Outliers
Gain an understanding of groupings of items with common characteristics
(clusters) as well as items with unusual characteristics (outliers).
Complex Criteria
Gain an understanding of root causes and patterns by using complex
criteria to look at meaningful slices of data.
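Several of the operations listed above (slicing and dicing, pivoting, rolling up, and ranking) can be illustrated with a minimal sketch in plain Python. The sales records, field names, and helper functions below are hypothetical and exist only for illustration; a real OLAP tool performs these operations against a multidimensional or tuned relational store rather than an in-memory list.

```python
from collections import defaultdict

# Hypothetical atomic sales records (illustrative data only).
FIELDS = ("product", "customer", "location", "quarter", "revenue")
SALES = [
    ("widget", "acme", "east", "Q1", 100.0),
    ("widget", "acme", "west", "Q2", 150.0),
    ("gadget", "zenith", "east", "Q1", 200.0),
    ("gadget", "acme", "east", "Q2", 60.0),
    ("gizmo", "zenith", "west", "Q1", 75.0),
]
ROWS = [dict(zip(FIELDS, record)) for record in SALES]


def slice_rows(rows, **criteria):
    """Slice and dice: keep rows that match every dimension=value criterion."""
    return [r for r in rows if all(r[d] == v for d, v in criteria.items())]


def roll_up(rows, *dims):
    """Roll up: total revenue grouped by the given dimensions."""
    totals = defaultdict(float)
    for r in rows:
        totals[tuple(r[d] for d in dims)] += r["revenue"]
    return dict(totals)


def pivot(rows, row_dim, col_dim):
    """Pivot / cross tab: revenue by row_dim (outer key) and col_dim (inner key)."""
    table = defaultdict(lambda: defaultdict(float))
    for r in rows:
        table[r[row_dim]][r[col_dim]] += r["revenue"]
    return {key: dict(cols) for key, cols in table.items()}


def top_n(rows, dim, n):
    """Ranking: the n members of dim with the highest total revenue."""
    totals = roll_up(rows, dim)
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(key[0], total) for key, total in ranked[:n]]
```

For example, slice_rows(ROWS, location="east") keeps only the eastern sales, pivot(ROWS, "location", "quarter") turns the analysis from product to location by quarter, and top_n(ROWS, "product", 2) ranks the two products with the highest revenue.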
OLAP Cubes Support Business Intelligence

OLAP helps to visualize data as cube structures. A cube is a
multidimensional structure consisting of dimensions and measurements.
Cells are the points where dimensions intersect, and they contain the
measurements.
Dimensions provide the context for analysis and are used as labels on
reports and as selection criteria for queries. Dimensions answer
questions like:
Who (customers, employees, partners, ...)
When (year, quarter, month, ...)
What (products, contracts, ...)
Where (state, zip code, territory, ...)
How (method, process, formula, ...)
Cells supply quantitative information. Cells answer questions like:
How many (customer count, inventory count, ...)
How much (revenue amount, budget amount, ...)
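One way to picture the cube structure described above is as a mapping from dimension coordinates to measurements. The sketch below is only an illustration under assumed names: the dimensions, members, measures, and helper functions are hypothetical and do not reflect how any particular OLAP product stores its cubes.

```python
from itertools import product

# Hypothetical dimensions: the context (who, when, what) for analysis.
DIMENSIONS = {
    "customer": ["acme", "zenith"],
    "quarter": ["Q1", "Q2"],
    "product": ["widget", "gadget"],
}

# Cells sit where dimension members intersect and hold the measurements
# ("how many" as units, "how much" as revenue). Only populated cells are stored.
CELLS = {
    ("acme", "Q1", "widget"): {"units": 10, "revenue": 100.0},
    ("acme", "Q2", "gadget"): {"units": 3, "revenue": 60.0},
    ("zenith", "Q1", "gadget"): {"units": 8, "revenue": 200.0},
}

EMPTY_CELL = {"units": 0, "revenue": 0.0}


def cell(customer, quarter, product_name):
    """Return the measurements at one intersection of the three dimensions."""
    return CELLS.get((customer, quarter, product_name), EMPTY_CELL)


def all_coordinates():
    """Every possible intersection of dimension members, populated or not."""
    return list(product(*DIMENSIONS.values()))


def total(measure, **fixed):
    """Sum one measurement over all cells matching the fixed dimension members."""
    names = list(DIMENSIONS)
    result = 0
    for coords, measures in CELLS.items():
        if all(coords[names.index(dim)] == member for dim, member in fixed.items()):
            result += measures[measure]
    return result
```

Here total("units", quarter="Q1") answers a "how many" question for one quarter, while cell("acme", "Q1", "widget") reads a single intersection. Storing only the populated cells is a design choice of this sketch; most of the eight possible coordinates are empty.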
An atomic measurement is one that is stored at the lowest level, such as an individual sale or a
single receipt of goods. The benefits of atomic data are that it supports detailed analysis and
can be summed as needed. The drawbacks are that it takes more space to store and requires time
to aggregate into totals for analysis.
Aggregated data is a summation of atomic data. For example, sales by quarter and rejects by
month are aggregations. The benefit is that query and analysis time are reduced. The
drawbacks are that analysis detail can be lost and it is difficult to predict which aggregations
the analyst will want to use.
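The trade-off between atomic and aggregated data can be seen in a small sketch. The sales records and function names below are hypothetical: the atomic list can answer any question but must be scanned each time, while the pre-computed quarterly totals answer one question quickly and can no longer say anything about individual products.

```python
from collections import defaultdict

# Hypothetical atomic measurements: one record per individual sale.
ATOMIC_SALES = [
    {"date": "2024-01-15", "product": "widget", "amount": 100.0},
    {"date": "2024-02-03", "product": "gadget", "amount": 200.0},
    {"date": "2024-05-20", "product": "widget", "amount": 150.0},
    {"date": "2024-06-11", "product": "gadget", "amount": 60.0},
]


def quarter_of(iso_date):
    """Map a 'YYYY-MM-DD' date to a label such as '2024-Q1'."""
    year, month, _day = iso_date.split("-")
    return f"{year}-Q{(int(month) - 1) // 3 + 1}"


def aggregate_by_quarter(sales):
    """Sum atomic sale amounts into one total per quarter."""
    totals = defaultdict(float)
    for sale in sales:
        totals[quarter_of(sale["date"])] += sale["amount"]
    return dict(totals)


def revenue_for_product(sales, product_name):
    """A detail question that only the atomic data can answer."""
    return sum(s["amount"] for s in sales if s["product"] == product_name)


# Pre-calculated aggregate: a fast lookup, but the product detail is gone.
SALES_BY_QUARTER = aggregate_by_quarter(ATOMIC_SALES)
```

A lookup such as SALES_BY_QUARTER["2024-Q1"] needs no scan of the sales records, whereas revenue_for_product(ATOMIC_SALES, "widget") has to go back to the atomic data because the quarterly aggregate did not keep the product dimension.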
Data mining
Data mining can discover information hidden within valuable data assets. Knowledge discovery,
using advanced information technologies, can uncover veins of surprising, golden insights in a
mountain of factual data. Data mining consists of a panoply of powerful tools that are intuitive,
easy to explain, understandable, and simple to use. These advanced information technologies
include artificial intelligence methods (e.g., expert systems and fuzzy logic), decision trees,
rule induction methods, genetic algorithms and genetic programming, neural networks, and
clustering techniques.
The synergy created between data warehousing and data mining allows knowledge seekers to
leverage their massive data assets, thus improving the quality and effectiveness of their
decisions. The growing requirements for data mining and real-time analysis of information will
drive the development of new data warehouse architectures and methods and, conversely, of new
data mining methods and applications.
Data mining terms
analytical model
A structure and process for analyzing a dataset. For example, a decision
tree is a model for the classification of a dataset.
anomalous data
Data that results from errors (for example, data entry keying errors) or that
represents unusual events. Anomalous data should be examined carefully
because it may carry important information.
artificial neural networks
Non-linear predictive models that learn through training and resemble
biological neural networks in structure.
CART
Classification and Regression Trees. A decision tree technique used for
classification of a dataset. Provides a set of rules that you can apply to a
new (unclassified) dataset to predict which records will have a given
outcome. Segments a dataset by creating 2-way splits. Requires less data
preparation than CHAID.
CHAID
Chi Square Automatic Interaction Detection. A decision tree technique
used for classification of a dataset. Provides a set of rules that you can
apply to a new (unclassified) dataset to predict which records will have a
given outcome. Segments a dataset by using chi square tests to create
multi-way splits. Predates CART and requires more data preparation.
classification
The process of dividing a dataset into mutually exclusive groups such that
the members of each group are as "close" as possible to one another, and
different groups are as "far" as possible from one another, where distance
is measured with respect to specific variable(s) you are trying to predict.
For example, a typical classification problem is to divide a database of
companies into groups that are as homogeneous as possible with respect
to a creditworthiness variable with values "Good" and "Bad."
clustering
The process of dividing a dataset into mutually exclusive groups such that
the members of each group are as "close" as possible to one another, and
different groups are as "far" as possible from one another, where distance
is measured with respect to all available variables.
data cleansing
The process of ensuring that all values in a dataset are consistent and
correctly recorded.
data mining
The extraction of hidden predictive information from large databases.
data navigation
The process of viewing different dimensions, slices, and levels of detail of
a multidimensional database. See OLAP.
data visualization
The visual interpretation of complex relationships in multidimensional
data.
data warehouse
A system for storing and delivering massive quantities of data.
decision tree
A tree-shaped structure that represents a set of decisions. These decisions
generate rules for the classification of a dataset. See CART and CHAID.
dimension
In a flat or relational database, each field in a record represents a
dimension. In a multidimensional database, a dimension is a set of similar
entities; for example, a multidimensional sales database might include the
dimensions Product, Time, and City.
exploratory data analysis
The use of graphical and descriptive statistical techniques to learn about
the structure of a dataset.
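The CART entry in the glossary above mentions segmenting a dataset with 2-way splits. The sketch below shows one such split on a single numeric variable, chosen by minimizing Gini impurity, which is the criterion commonly associated with CART classification trees. It is a simplified illustration, not the full CART procedure: there is no recursion, pruning, or regression. The debt-ratio values and the function names are hypothetical, and the "Good" and "Bad" labels follow the creditworthiness example in the classification entry.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the group is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def best_binary_split(values, labels):
    """Find the threshold whose 2-way split (value <= threshold versus
    value > threshold) gives the lowest weighted Gini impurity.

    Returns (threshold, weighted_impurity). The threshold is None when no
    split improves on the impurity of the unsplit group.
    """
    pairs = list(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between adjacent distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [label for value, label in pairs if value <= threshold]
        right = [label for value, label in pairs if value > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


# Hypothetical companies: debt ratio in percent and a creditworthiness label.
DEBT_RATIO = [10, 20, 30, 60, 70, 90]
CREDIT = ["Good", "Good", "Good", "Bad", "Bad", "Bad"]
```

Calling best_binary_split(DEBT_RATIO, CREDIT) on this toy data selects the midpoint between 30 and 60, which separates the "Good" and "Bad" companies completely. A full decision tree would repeat the same search inside each resulting group.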
