DATA WAREHOUSING AND
DATA MINING
By:
Raghav Agrawal
B.Tech (E&T)
III yr – A
A1607107107
Overview
❚ Introduction
❚ Data Warehousing
❚ Data Warehousing vs. OLTP
❚ Data Mining
Motivation: “Necessity is the
Mother of Invention”
Data explosion problem
What is a Data Warehouse?
Warehouses are Very Large Databases
[Figure: bar chart; axis: Respondents (5% to 35%); series: Initial, Projected 2Q96]
Data Warehousing --
It is a process
❚ A technique for assembling and
managing data from various
sources for the purpose of
answering business
questions, thus enabling
decisions that were not
previously possible
❚ A decision support database
maintained separately from
the organization’s operational
database
Characteristics of Data Warehouse
❚ A data warehouse is a
❙ subject-oriented
❙ integrated
❙ time-varying
❙ non-volatile
collection of data that is used primarily in
organizational decision making.
Data Warehouse Architecture
[Figure: architecture diagram with these components: relational databases, ERP systems, extraction, cleansing, optimized loader, data warehouse engine, metadata repository, query, analyze]
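The extraction, cleansing, and loading components named in the architecture slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a real warehouse loader: the two source lists, the field names, and the cleansing rules are all hypothetical.

```python
# Hypothetical source extracts: a relational database and an ERP system.
relational_rows = [
    {"customer": " Alice ", "amount": "120.50", "date": "2009-01-15"},
    {"customer": "Bob", "amount": "", "date": "2009-01-16"},  # missing amount
]
erp_rows = [
    {"customer": "alice", "amount": "80", "date": "2009-02-01"},
]


def extract(*sources):
    """Pull rows from every source into one stream, tagging each row's origin."""
    for name, rows in sources:
        for row in rows:
            yield {**row, "source": name}


def cleanse(rows):
    """Normalize names, convert amounts to numbers, and drop unusable rows."""
    for row in rows:
        if not row["amount"]:
            continue  # assumption: rows without an amount are discarded
        yield {
            "customer": row["customer"].strip().title(),
            "amount": float(row["amount"]),
            "date": row["date"],
            "source": row["source"],
        }


def load(rows, warehouse):
    """Append cleansed rows; existing warehouse rows are never updated in place."""
    warehouse.extend(rows)
    return warehouse


warehouse = load(
    cleanse(extract(("relational", relational_rows), ("erp", erp_rows))), []
)
for row in warehouse:
    print(row)
```

The append-only `load` step mirrors the "non-volatile" characteristic, and the name normalization in `cleanse` is one small instance of "integrated" data: the same customer arriving from two systems ends up under one spelling.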
Application Areas
Industry | Application
Finance | Credit card analysis
Insurance | Claims, fraud analysis
Telecommunication | Call record analysis
Transport | Logistics management
Consumer goods | Promotion analysis
Data service providers | Value-added data
Utilities | Power usage analysis
What makes data mining possible?
DISADVANTAGES OF DATA
WAREHOUSES
❙ Data warehouses are not the optimal
environment for unstructured data.
❙ There is an element of latency in data
warehouse data.
❙ Data warehouses can have high costs.
Maintenance costs are high.
❙ Data warehouses can get outdated relatively
quickly.
So, what’s different between OLTP & DW?
OLTP vs Data Warehouse
❚ OLTP
❙ Application oriented
❙ Used to run business
❙ Detailed data
❙ Current, up to date
❙ Isolated data
❚ Warehouse
❙ Subject oriented
❙ Used to analyze business
❙ Summarized and refined
❙ Snapshot data
❙ Integrated data
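The contrast between detailed OLTP data and summarized warehouse data can be made concrete with a small roll-up. The sketch below assumes a hypothetical list of order records; the field names and values are invented for illustration.

```python
from collections import defaultdict

# Hypothetical OLTP-style records: one detailed, current row per transaction.
transactions = [
    {"order_id": 1, "region": "North", "month": "2009-01", "amount": 100.0},
    {"order_id": 2, "region": "North", "month": "2009-01", "amount": 50.0},
    {"order_id": 3, "region": "South", "month": "2009-01", "amount": 70.0},
    {"order_id": 4, "region": "North", "month": "2009-02", "amount": 30.0},
]


def summarize(rows):
    """Roll detailed rows up into totals keyed by subject: (region, month)."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["region"], row["month"])] += row["amount"]
    return dict(totals)


summary = summarize(transactions)
for key, total in sorted(summary.items()):
    print(key, total)
```

The OLTP side answers "what is order 3?", while the summary answers "how did each region do per month?", which is the kind of question a warehouse is built for.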
OLTP V/S Data Warehouse
To summarize DW & OLTP...
❚ OLTP Systems are
used to “run” a
business
❚ The Data
Warehouse helps
to “optimize” the
business
What Is Data Mining?
❚ The objective of data mining is to extract
valuable information from your data, to discover
the “hidden gold.”
❚ Note that you do not need a data warehouse to
successfully use data mining; all you need is
data.
❚ On-Line Analytical Processing (OLAP): a data mining (DM) tool.
DATA MINING MODELS
According to IBM:
❚ Verification Model
❚ Discovery Model
Steps for Data Mining
❚ Identify
❙ Find sales relationships between specific
products or services
❙ Identify specific purchasing patterns over
time
❙ Identify potential types of customers
❙ Find product sales trends.
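One simple way to look for sales relationships between products is to count how often pairs of products appear in the same purchase. The sketch below is only an illustration of that idea: the baskets are hypothetical, and `pair_support` is a name made up here, not part of any library.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets, one set of products per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
]


def pair_support(baskets):
    """Return, for each product pair, the fraction of baskets containing both."""
    counts = Counter()
    for basket in baskets:
        # Sorting makes ("bread", "milk") and ("milk", "bread") the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}


support = pair_support(baskets)
for pair, share in sorted(support.items(), key=lambda item: -item[1]):
    print(pair, share)
```

Pairs with a high share are candidates for a closer look; the same counting can be repeated per time period to examine purchasing patterns over time.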
❚ Select
❙ Are the data adequate to describe the
phenomena the data mining analysis is
attempting to model?
❙ Can you enhance internal customer records with
external lifestyle and demographic data?
❙ Are the data stable—will the mined attributes be
the same after the analysis?
❙ If you are merging databases can you find a
common field for linking them?
❙ How current and relevant are the data to the
business goal?
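The questions about enhancing internal records with external data and finding a common linking field can be checked with a small join. This is a sketch under assumptions: the records, the `customer_id` key, and the `enrich` helper are hypothetical.

```python
# Hypothetical internal customer records and external demographic data.
customers = [
    {"customer_id": 1, "name": "Alice"},
    {"customer_id": 2, "name": "Bob"},
    {"customer_id": 3, "name": "Carol"},
]
demographics = [
    {"customer_id": 1, "age_band": "25-34"},
    {"customer_id": 3, "age_band": "45-54"},
]


def enrich(internal, external, key):
    """Join two record lists on a common field and report rows with no match."""
    lookup = {row[key]: row for row in external}
    merged, unmatched = [], []
    for row in internal:
        extra = lookup.get(row[key])
        if extra is None:
            unmatched.append(row)
        else:
            merged.append({**row, **extra})
    return merged, unmatched


merged, unmatched = enrich(customers, demographics, "customer_id")
print("enriched:", merged)
print("no external match:", unmatched)
```

The size of `unmatched` gives a quick indication of whether the external source covers enough customers to be worth merging.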
❚ Prepare
❙ Establish strategies for handling missing data,
extraneous noise, and outliers
❙ Identify redundant variables in the dataset and
decide which fields to exclude
❙ Decide on a log or square transformation, if
necessary
❙ Visually inspect the dataset to get a feel for
the database
❙ Determine the distribution frequencies of the
data
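The preparation steps above can be sketched with the Python standard library. The strategies chosen here (median fill, interquartile-range fences, base-10 log, fixed-width bins) are just one reasonable set of assumptions, and the income values are hypothetical.

```python
import math
from collections import Counter
from statistics import median, quantiles

# Hypothetical attribute with one missing value and one extreme value.
incomes = [32000, 41000, None, 38000, 45000, 1_000_000, 36000]


def fill_missing(values):
    """Replace missing entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    mid = median(observed)
    return [mid if v is None else v for v in values]


def find_outliers(values, factor=1.5):
    """Return values that fall outside the interquartile-range fences."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - factor * spread, q3 + factor * spread
    return [v for v in values if v < low or v > high]


def log_transform(values):
    """Compress a skewed, strictly positive attribute with a base-10 log."""
    return [math.log10(v) for v in values]


def frequency_table(values, width):
    """Count how many values fall into each bin of the given width."""
    return Counter(int(v // width) * width for v in values)


filled = fill_missing(incomes)
outliers = find_outliers(filled)
kept = [v for v in filled if v not in outliers]
logged = log_transform(kept)
frequencies = frequency_table(kept, 10_000)

print("outliers:", outliers)
print("frequencies:", dict(frequencies))
```

Whether an outlier is dropped, capped, or kept depends on the business goal; here it is only removed so that the frequency table is readable.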
❚ Audit the data
❙ What is the ratio of categorical/binary
attributes in the database?
❙ What is the nature and structure of the
database?
❙ What is the overall condition of the
dataset?
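The audit questions can be answered with a short profiling pass over the records. The following is a minimal sketch: the sample records are hypothetical, and the rule used to label an attribute as binary, numeric, or categorical (by Python value type) is a simplifying assumption.

```python
from collections import Counter

# Hypothetical dataset: each record is one row, None marks a missing value.
records = [
    {"age": 34, "gender": "F", "owns_home": True, "region": "North", "income": 42000.0},
    {"age": 51, "gender": "M", "owns_home": False, "region": "South", "income": 58000.0},
    {"age": 27, "gender": "F", "owns_home": False, "region": "East", "income": None},
]


def attribute_kind(values):
    """Label an attribute by the type of its non-missing values."""
    present = [v for v in values if v is not None]
    if not present:
        return "empty"
    # bool is checked first because Python treats bool as a subtype of int.
    if all(isinstance(v, bool) for v in present):
        return "binary"
    if all(isinstance(v, (int, float)) for v in present):
        return "numeric"
    return "categorical"


def audit(records):
    """Summarize structure, attribute kinds, and missing cells of a dataset."""
    columns = list(records[0].keys())
    kinds = {c: attribute_kind([r[c] for r in records]) for c in columns}
    counts = Counter(kinds.values())
    missing = sum(1 for r in records for c in columns if r[c] is None)
    return {
        "rows": len(records),
        "attributes": len(columns),
        "kinds": kinds,
        "categorical_or_binary_ratio": (counts["categorical"] + counts["binary"]) / len(columns),
        "missing_cells": missing,
    }


report = audit(records)
print(report)
```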
Data Mining Algorithms
❚ Neural Networks
❚ Decision Trees
Neural Network
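As a rough illustration of the idea, the simplest neural network is a single neuron (a perceptron) that adjusts its weights whenever it misclassifies a training example. The sketch below is not taken from the slides: the logical-AND training set, the learning rate, and the epoch count are all assumptions chosen to keep the example tiny.

```python
# Truth table for logical AND, used as a tiny hypothetical training set.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]


def predict(weights, bias, inputs):
    """Fire (return 1) when the weighted sum of the inputs is positive."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train(samples, epochs=20, rate=1):
    """Perceptron learning rule: nudge weights by the error on each sample."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias


weights, bias = train(samples)
predictions = [predict(weights, bias, inputs) for inputs, _ in samples]
print(weights, bias, predictions)
```

Practical data mining networks stack many such units in layers and are trained with more elaborate methods, but the core loop of predicting, measuring the error, and adjusting the weights is the same.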
Decision Trees
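A decision tree is built by repeatedly choosing the attribute test that best separates the classes. The sketch below shows only that first choice (a one-level tree, sometimes called a stump), scored with Gini impurity. The customer data, the feature order (age, income in thousands), and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical training data: (age, income in thousands) -> bought the product?
rows = [
    ((25, 30), "no"),
    ((32, 45), "no"),
    ((41, 60), "yes"),
    ((38, 80), "yes"),
    ((52, 85), "yes"),
    ((29, 35), "no"),
]


def gini(labels):
    """Gini impurity: 0.0 when all labels agree, higher when they are mixed."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(rows):
    """Try every 'feature <= threshold' test and keep the purest split found first."""
    best = None
    for feature in range(len(rows[0][0])):
        for threshold in sorted({x[feature] for x, _ in rows}):
            left = [y for x, y in rows if x[feature] <= threshold]
            right = [y for x, y in rows if x[feature] > threshold]
            if not left or not right:
                continue  # a split must send rows to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best


score, feature, threshold = best_split(rows)
print(f"split on feature {feature} at <= {threshold} (weighted Gini {score})")
```

A full tree would apply `best_split` again to the rows on each side of the chosen test until the groups are pure or too small to split further.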
Thank you!