If you want to analyze data, then On-Line Analytical Processing (OLAP) is often the best way to
organize the data. OLAP organization provides several benefits:
Fast
Visual
Multidimensional
Aggregation
Time Series
Ranking
Complex Criteria
An atomic measurement is one that is stored at the lowest level, such as an individual sale or a
single receipt of goods. The benefit of atomic data is that it supports detailed analysis and
can be summed as needed. The drawbacks are that atomic data takes more space to store and
requires time to aggregate into totals for analysis.
Aggregated data is a summation of atomic data. For example, sales by quarter and rejects by
month are aggregations. The benefit is that query and analysis time are reduced. The
drawbacks are that analysis detail can be lost and it is difficult to predict which aggregations
the analyst will want to use.
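The trade-off between atomic and aggregated data can be illustrated with a minimal sketch. The sale records, the `quarter_of` helper, and the `aggregate_by_quarter` function are hypothetical names chosen for this example; they are not taken from any particular OLAP product.

```python
from collections import defaultdict
from datetime import date

# Hypothetical atomic records: one (date, amount) pair per individual sale.
atomic_sales = [
    (date(2023, 1, 15), 120.0),
    (date(2023, 2, 3), 80.0),
    (date(2023, 4, 22), 200.0),
    (date(2023, 5, 9), 50.0),
    (date(2023, 11, 30), 75.0),
]

def quarter_of(d):
    """Return a (year, quarter) key for a date."""
    return (d.year, (d.month - 1) // 3 + 1)

def aggregate_by_quarter(sales):
    """Sum atomic sale amounts into per-quarter totals."""
    totals = defaultdict(float)
    for sale_date, amount in sales:
        totals[quarter_of(sale_date)] += amount
    return dict(totals)

sales_by_quarter = aggregate_by_quarter(atomic_sales)
```

Once only `sales_by_quarter` is stored, quarter-level queries need no further summation, but a question such as "sales by month" can no longer be answered without going back to the atomic records.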
Data mining
Data mining can discover information hidden within valuable data assets. Knowledge discovery,
using advanced information technologies, can uncover veins of surprising, golden insights in a
mountain of factual data. Data mining consists of a panoply of powerful tools which are intuitive,
easy to explain, understandable, and simple to use. These advanced information technologies
include artificial intelligence methods (e.g., expert systems and fuzzy logic), decision trees,
rule induction methods, genetic algorithms and genetic programming, neural networks and
clustering techniques.
The synergy created between data warehousing and data mining allows knowledge seekers to
leverage their massive data assets, thus improving the quality and effectiveness of their
decisions. The growing requirements for data mining and real-time analysis of information will be
a driving force in the development of new data warehouse architectures and methods and,
conversely, the development of new data mining methods and applications.
analytical model
anomalous data
Data that result from errors (for example, data entry keying errors) or that
represent unusual events. Anomalous data should be examined carefully
because they may carry important information.
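One possible way to surface anomalous values for review is sketched below. It flags values that lie far from the median, measured in units of the median absolute deviation. The sample values, the `flag_anomalies` name, and the cutoff of 5 are assumptions made for illustration only; flagged values are returned for examination rather than deleted.

```python
import statistics

# Hypothetical measurements; 95 could be a keying error or a real unusual event.
values = [10, 12, 11, 13, 12, 95]

def flag_anomalies(data, cutoff=5.0):
    """Return values whose distance from the median exceeds cutoff * MAD."""
    median = statistics.median(data)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = statistics.median(abs(x - median) for x in data)
    if mad == 0:
        return []  # No spread to measure against; nothing is flagged.
    return [x for x in data if abs(x - median) > cutoff * mad]

flagged = flag_anomalies(values)
```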
artificial neural networks
CART
CHAID
classification
The process of dividing a dataset into mutually exclusive groups such that
the members of each group are as "close" as possible to one another, and
different groups are as "far" as possible from one another, where distance
is measured with respect to the specific variable(s) you are trying to predict.
For example, a typical classification problem is to divide a database of
companies into groups that are as homogeneous as possible with respect
to a creditworthiness variable with values "Good" and "Bad."
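The creditworthiness example can be sketched as a single-variable split, which is the basic step a decision tree repeats. The company records, the `debt_ratio` variable, and the misclassification-count impurity measure are assumptions for illustration; real tools such as CART and CHAID use their own splitting criteria.

```python
# Hypothetical companies: (debt_ratio, creditworthiness).
companies = [
    (0.10, "Good"), (0.25, "Good"), (0.30, "Good"), (0.45, "Good"),
    (0.55, "Bad"), (0.60, "Good"), (0.70, "Bad"), (0.85, "Bad"),
]

def impurity(group):
    """Count members that disagree with the group's majority label."""
    good = sum(1 for _, label in group if label == "Good")
    return min(good, len(group) - good)

def best_split(rows):
    """Try each midpoint between sorted values; keep the purest split."""
    rows = sorted(rows)
    best = None
    for i in range(1, len(rows)):
        threshold = (rows[i - 1][0] + rows[i][0]) / 2
        score = impurity(rows[:i]) + impurity(rows[i:])
        # Strict comparison keeps the first of several equally good splits.
        if best is None or score < best[0]:
            best = (score, threshold)
    return best

score, threshold = best_split(companies)
```

Here the groups are judged only by how homogeneous they are in the target variable ("Good" or "Bad"), which is what distinguishes classification from clustering.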
clustering
The process of dividing a dataset into mutually exclusive groups such that
the members of each group are as "close" as possible to one another, and
different groups are as "far" as possible from one another, where distance
is measured with respect to all available variables.
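As a rough illustration of clustering, the sketch below uses k-means, one common technique among many. The data points, the choice of two clusters, and the starting centroids are all assumptions for the example. Distance is computed over every variable in each record, with no target variable involved.

```python
import math

# Hypothetical records with two numeric variables each.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def k_means(data, initial_centroids, iterations=10):
    """Alternate between assigning points and recomputing group means."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in data:
            groups[nearest(p, centroids)].append(p)
        # New centroid is the mean of its group; an empty group keeps its old one.
        centroids = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return [nearest(p, centroids) for p in data], centroids

labels, centroids = k_means(points, initial_centroids=[points[0], points[3]])
```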
data cleansing
The process of ensuring that all values in a dataset are consistent and
correctly recorded.
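A minimal sketch of data cleansing follows, assuming records arrive as dictionaries with free-text fields. The field names, the `STATE_ALIASES` table, and the `cleanse` function are hypothetical. Inconsistent spellings are mapped to one canonical value, and records that cannot be repaired are set aside rather than silently dropped.

```python
# Hypothetical raw records with inconsistent spellings and one invalid amount.
raw = [
    {"state": " ny ", "amount": "100.50"},
    {"state": "New York", "amount": "75"},
    {"state": "CA", "amount": "abc"},
]

# Assumed lookup table mapping known variants to a canonical code.
STATE_ALIASES = {"ny": "NY", "new york": "NY", "ca": "CA"}

def cleanse(records):
    """Return (clean, rejected): normalized records and those needing review."""
    clean, rejected = [], []
    for rec in records:
        state = STATE_ALIASES.get(rec["state"].strip().lower())
        try:
            amount = float(rec["amount"])
        except ValueError:
            amount = None
        if state is None or amount is None:
            rejected.append(rec)
        else:
            clean.append({"state": state, "amount": amount})
    return clean, rejected

clean, rejected = cleanse(raw)
```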
data mining
data navigation
data visualization
data warehouse
decision tree
dimension
exploratory data analysis