
On-Line Analytical Processing

If you want to analyze data, then On-Line Analytical Processing (OLAP) is often the best way to
organize the data. OLAP organization provides several benefits:
Fast

Data is organized for rapid query and analysis.
The database structure uses efficient multidimensional or tuned
relational approaches.

Visual

Tools enable the analyst to navigate and view results through
graphics such as bar charts, pie charts and tree structures.

Multidimensional

Supports "slicing and dicing" along multiple dimensions such as
product, customer and location.
In addition, supports "pivot" / "cross tabs", where the investigator
changes the direction of the analysis.

Aggregation

Supports both drill down to details and roll up to summaries.
Some aggregations may be pre-calculated to save analysis time.
This pre-calculation is where MOLAP provides value. See below.

Time Series

Supports trend analysis. Most data marts include a calendar
dimension. This dimension supports time hierarchies: year, quarter,
month, week, day of week, and day.

Ranking

Find the top, bottom or quartile members of a group, such as the
top 10 most profitable products or the 10 least profitable sales
territories.

Clusters and Outliers

Gain an understanding of groupings of items with common
characteristics (clusters) as well as items with unusual
characteristics (outliers).

Complex Criteria

Gain an understanding of root causes and patterns by using
complex criteria to look at meaningful slices of data.
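The slicing, pivoting, roll-up and ranking operations listed above can be illustrated with a small sketch. It uses only the Python standard library; the `SALES` rows and the helper names `slice_rows`, `roll_up` and `top_n` are hypothetical and invented for illustration, not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical atomic sales facts: (product, location, quarter, revenue).
SALES = [
    ("widget", "east", "Q1", 100),
    ("widget", "west", "Q1", 150),
    ("widget", "east", "Q2", 120),
    ("gadget", "east", "Q1", 80),
    ("gadget", "west", "Q2", 200),
    ("gizmo", "west", "Q1", 50),
]

def slice_rows(rows, index, value):
    """Slice: keep only the rows whose dimension at `index` equals `value`."""
    return [row for row in rows if row[index] == value]

def roll_up(rows, *indexes):
    """Roll up: sum revenue grouped by the chosen dimension columns."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in indexes)
        totals[key] += row[3]
    return dict(totals)

def top_n(totals, n):
    """Rank: the n groups with the largest totals, largest first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]

by_product = roll_up(SALES, 0)              # roll up to product level
by_product_quarter = roll_up(SALES, 0, 2)   # drill down: product, then quarter
by_quarter_product = roll_up(SALES, 2, 0)   # pivot: quarter, then product
east_only = slice_rows(SALES, 1, "east")    # slice on location = east
best = top_n(by_product, 2)                 # top 2 products by revenue
```

Pivoting here only changes the order of the grouping keys; the cell values are the same, which is why a cross tab can be re-oriented without recomputing the underlying totals.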

OLAP Cubes Support Business Intelligence


OLAP helps to visualize data as cube structures. A cube
is a multidimensional structure consisting of dimensions
and measurements. Cells are the points where dimensions
intersect and contain the measurements.

Dimensions provide the context for analysis and are used as labels
on reports and as selection criteria for queries. Dimensions
answer questions like:

Who (customers, employees, partners, ...)

When (year, quarter, month, ...)

What (products, contracts, ...)

Where (state, zip code, territory, ...)

How (method, process, formula, ...)

Cells supply quantitative information. Cells answer questions like:

How many (customer count, inventory count, ...)

How much (revenue amount, budget amount, ...)
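As a rough sketch of the cube idea, a cube can be modelled as a mapping from dimension coordinates to cells, where each cell holds the measurements. The example below assumes three dimensions (who, what, where) and two measurements (a "how many" count and a "how much" amount); the `TRANSACTIONS` data and the `build_cube` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical transactions: (customer, product, state, revenue).
TRANSACTIONS = [
    ("acme", "widget", "CA", 100.0),
    ("acme", "widget", "CA", 50.0),
    ("zenith", "widget", "NY", 70.0),
    ("zenith", "gadget", "CA", 30.0),
]

def build_cube(transactions):
    """Build a cube: one cell per (who, what, where) intersection."""
    cube = defaultdict(lambda: {"sale_count": 0, "revenue": 0.0})
    for customer, product, state, revenue in transactions:
        cell = cube[(customer, product, state)]
        cell["sale_count"] += 1      # "how many"
        cell["revenue"] += revenue   # "how much"
    return dict(cube)

CUBE = build_cube(TRANSACTIONS)

# Look up the cell where customer=acme, product=widget and state=CA intersect.
acme_widget_ca = CUBE[("acme", "widget", "CA")]
```

The dimension members act as the labels and selection criteria, while the numbers live only in the cells.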

An atomic measurement is one that is stored at the lowest level, such as an individual sale or a
single receipt of goods. The benefit of atomic data is that it supports detailed analysis and
atomic data can be summed as needed. The drawbacks of atomic data are that it takes more
space to store and it requires time to aggregate into totals for analysis.

Aggregated data is a summation of atomic data. For example, sales by quarter and rejects by
month are aggregations. The benefit is that query and analysis time are reduced. The
drawbacks are that analysis detail can be lost and it is difficult to predict which aggregations
the analyst will want to use.
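The trade-off between atomic and aggregated data can be sketched as follows. The example pre-calculates sales by quarter from a handful of atomic sales; the `ATOMIC_SALES` rows and the function names are hypothetical, and calendar quarters are assumed.

```python
from collections import defaultdict
from datetime import date

# Hypothetical atomic measurements: one row per individual sale.
ATOMIC_SALES = [
    (date(2023, 1, 15), 40.0),
    (date(2023, 2, 3), 60.0),
    (date(2023, 5, 20), 25.0),
    (date(2023, 11, 30), 75.0),
]

def quarter_of(day):
    """Map a date to a calendar-quarter label such as '2023-Q1'."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"

def aggregate_by_quarter(sales):
    """Sum atomic sale amounts into one total per quarter."""
    totals = defaultdict(float)
    for day, amount in sales:
        totals[quarter_of(day)] += amount
    return dict(totals)

# Pre-calculated aggregate: fast to query, but the individual sales
# (and any monthly or daily breakdown) cannot be recovered from it.
QUARTERLY = aggregate_by_quarter(ATOMIC_SALES)
```

A query for first-quarter sales reads one stored total instead of scanning every sale, but a question about February alone can only be answered from the atomic rows.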

Data mining

Data mining can discover information hidden within valuable data assets. Knowledge discovery,
using advanced information technologies, can uncover veins of surprising, golden insights in a
mountain of factual data. Data mining consists of a panoply of powerful tools which are intuitive,
easy to explain, understandable, and simple to use. These advanced information technologies
include artificial intelligence methods (e.g. expert systems, fuzzy logic, etc.), decision trees,
rule induction methods, genetic algorithms and genetic programming, neural networks and
clustering techniques.
The synergy created between data warehousing and data mining allows knowledge seekers to
leverage their massive data assets, thus improving the quality and effectiveness of their
decisions. The growing requirements for data mining and real time analysis of information will be
a driving force in the development of new data warehouse architectures and methods and,
conversely, the development of new data mining methods and applications.

Data mining terms

analytical model

A structure and process for analyzing a dataset. For example, a decision
tree is a model for the classification of a dataset.

anomalous data

Data that result from errors (for example, data entry keying errors) or that
represent unusual events. Anomalous data should be examined carefully
because it may carry important information.

artificial neural networks

Non-linear predictive models that learn through training and resemble
biological neural networks in structure.

CART

Classification and Regression Trees. A decision tree technique used for
classification of a dataset. Provides a set of rules that you can apply to a
new (unclassified) dataset to predict which records will have a given
outcome. Segments a dataset by creating 2-way splits. Requires less data
preparation than CHAID.

CHAID

Chi Square Automatic Interaction Detection. A decision tree technique
used for classification of a dataset. Provides a set of rules that you can
apply to a new (unclassified) dataset to predict which records will have a
given outcome. Segments a dataset by using chi square tests to create
multi-way splits. CHAID preceded CART and requires more data
preparation than CART.

classification

The process of dividing a dataset into mutually exclusive groups such that
the members of each group are as "close" as possible to one another, and
different groups are as "far" as possible from one another, where distance
is measured with respect to specific variable(s) you are trying to predict.
For example, a typical classification problem is to divide a database of
companies into groups that are as homogeneous as possible with respect
to a creditworthiness variable with values "Good" and "Bad."

clustering

The process of dividing a dataset into mutually exclusive groups such that
the members of each group are as "close" as possible to one another, and
different groups are as "far" as possible from one another, where distance
is measured with respect to all available variables.

data cleansing

The process of ensuring that all values in a dataset are consistent and
correctly recorded.

data mining

The extraction of hidden predictive information from large databases.

data navigation

The process of viewing different dimensions, slices, and levels of detail of
a multidimensional database. See OLAP.

data visualization

The visual interpretation of complex relationships in multidimensional
data.

data warehouse

A system for storing and delivering massive quantities of data.

decision tree

A tree-shaped structure that represents a set of decisions. These decisions
generate rules for the classification of a dataset. See CART and CHAID.

dimension

In a flat or relational database, each field in a record represents a
dimension. In a multidimensional database, a dimension is a set of similar
entities; for example, a multidimensional sales database might include the
dimensions Product, Time, and City.

exploratory data analysis

The use of graphical and descriptive statistical techniques to learn about
the structure of a dataset.
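To make the decision tree entries above more concrete, the following sketch finds a single 2-way split of the kind the CART entry describes, using the "Good"/"Bad" creditworthiness example from the classification entry. It assumes Gini impurity as the split criterion, which is a common choice for CART-style trees but is not stated in the glossary; the `DEBT_RATIO` values and function names are hypothetical.

```python
def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_binary_split(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`.

    Returns None when all values are equal, since no split is possible.
    """
    best = None
    n = len(values)
    # The largest value is skipped: it would leave the right group empty.
    for threshold in sorted(set(values))[:-1]:
        left = [lab for val, lab in zip(values, labels) if val <= threshold]
        right = [lab for val, lab in zip(values, labels) if val > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical companies: one predictor and the creditworthiness label.
DEBT_RATIO = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
RATING = ["Good", "Good", "Good", "Bad", "Bad", "Bad"]

SPLIT = best_binary_split(DEBT_RATIO, RATING)
```

A full tree would apply the same search recursively to each resulting group; this sketch stops after the first split.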
