Concepts and Techniques
Examples of Clustering Applications
Pattern Recognition
Spatial Data Analysis
    Create thematic maps in GIS by clustering feature spaces
    Detect spatial clusters and explain them in spatial data mining
Image Processing
Economic Science (especially market research)
WWW
    Document classification
    Cluster Weblog data to discover groups of similar access patterns
Scalability
Data Structures
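Data matrix
(two modes)
$$
\begin{bmatrix}
x_{11} & \cdots & x_{1f} & \cdots & x_{1p} \\
\vdots & & \vdots & & \vdots \\
x_{i1} & \cdots & x_{if} & \cdots & x_{ip} \\
\vdots & & \vdots & & \vdots \\
x_{n1} & \cdots & x_{nf} & \cdots & x_{np}
\end{bmatrix}
$$
Dissimilarity matrix
(one mode)
$$
\begin{bmatrix}
0 \\
d(2,1) & 0 \\
d(3,1) & d(3,2) & 0 \\
\vdots & \vdots & \vdots \\
d(n,1) & d(n,2) & \cdots & \cdots & 0
\end{bmatrix}
$$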
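The relationship between the two structures can be illustrated with a minimal Python sketch that derives a lower-triangular dissimilarity matrix from a data matrix. The function name `dissimilarity_matrix` and the choice of Euclidean distance for d(i, j) are assumptions made for illustration; any other dissimilarity measure could be substituted.

```python
import math


def dissimilarity_matrix(data):
    """Build the lower-triangular dissimilarity matrix of an n-by-p data matrix.

    Row i holds d(i, 0) .. d(i, i), so the diagonal entry is always 0.
    Euclidean distance is an assumption of this sketch.
    """
    n = len(data)
    d = [[0.0] * (i + 1) for i in range(n)]
    for i in range(n):
        for j in range(i):
            d[i][j] = math.dist(data[i], data[j])
    return d


# Three objects (rows) described by two variables (columns).
data = [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]]
for row in dissimilarity_matrix(data):
    print(row)
```

Only the lower triangle is stored because d(i, j) = d(j, i) and d(i, i) = 0, which is why the dissimilarity matrix is called one-mode.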
Interval-scaled variables
Binary variables
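Standardizing data
Calculate the mean absolute deviation:
$$s_f = \frac{1}{n}\left(|x_{1f} - m_f| + |x_{2f} - m_f| + \cdots + |x_{nf} - m_f|\right)$$
where
$$m_f = \frac{1}{n}\left(x_{1f} + x_{2f} + \cdots + x_{nf}\right)$$
Calculate the standardized measurement (z-score):
$$z_{if} = \frac{x_{if} - m_f}{s_f}$$
Then use distances/similarities based on standardized scores.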
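As an illustration of the standardization step, the following Python sketch computes z-scores for a single variable f using the mean absolute deviation s_f rather than the standard deviation. The function name `standardize` and the sample values are hypothetical.

```python
def standardize(values):
    """Return z-scores of one variable using the mean absolute deviation.

    m_f is the mean of the variable and s_f is its mean absolute deviation.
    """
    n = len(values)
    m_f = sum(values) / n
    s_f = sum(abs(x - m_f) for x in values) / n
    if s_f == 0:
        raise ValueError("all values are identical; s_f is zero")
    return [(x - m_f) / s_f for x in values]


# Hypothetical measurements of one variable, for example heights in cm.
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
print(standardize(heights))
```

Applying this to every column of the data matrix before computing distances keeps a variable with a large numeric range from dominating the result.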
Examples: longitude and latitude coordinates (for instance, when you cluster houses),
weights, heights, and weather temperatures
Real/Interval-valued variables
(continuous measurements)
Some popular ones include:
Minkowski distance:
$$d(i,j) = \sqrt[q]{|x_{i1} - x_{j1}|^q + |x_{i2} - x_{j2}|^q + \cdots + |x_{ip} - x_{jp}|^q}$$
Manhattan distance:
$$d(i,j) = |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}| + \cdots + |x_{ip} - x_{jp}|$$
Euclidean distance:
$$d(i,j) = \sqrt{|x_{i1} - x_{j1}|^2 + |x_{i2} - x_{j2}|^2 + \cdots + |x_{ip} - x_{jp}|^2}$$
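The Minkowski, Manhattan, and Euclidean distances belong to one family: Manhattan is the Minkowski distance with q = 1 and Euclidean is the case q = 2. A short Python sketch of that relationship follows; the function name `minkowski` and the sample objects are hypothetical.

```python
def minkowski(x, y, q):
    """Minkowski distance of order q between two p-dimensional objects."""
    if len(x) != len(y):
        raise ValueError("objects must have the same number of variables")
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)


i = [0.0, 0.0]
j = [3.0, 4.0]
print(minkowski(i, j, 1))  # Manhattan distance (q = 1)
print(minkowski(i, j, 2))  # Euclidean distance (q = 2)
```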
Binary/Nominal Variables
/categorical variable
Ordinal Variables
Example
[Figure: K-means with K=2 on scatter plots whose axes both run from 0 to 10. Panels, in order: arbitrarily choose K objects as initial cluster centers; assign each object to the most similar center; update the cluster means; reassign; update the cluster means; reassign.]
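The loop shown in this example (choose K initial centers, assign each object to the most similar center, update the cluster means, reassign until nothing changes) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the "arbitrary" initial choice is simply the first K objects, similarity is Euclidean distance, and the name `k_means` and the sample points are hypothetical.

```python
import math


def k_means(points, k, max_iter=100):
    """Cluster points into k groups; return (centers, assignment)."""
    # Arbitrary initial centers: the first k objects (an assumption).
    centers = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assign each object to the most similar (nearest) center.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no object changed cluster, so the means are stable
        assignment = new_assignment
        # Update each cluster mean from its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, assignment = k_means(points, k=2)
print(centers)
print(assignment)
```

Because the result depends on the initial centers, practical implementations usually run the procedure several times from different starting points.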
Weakness
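Hierarchical Clustering
[Figure: single objects (a, b, ..., e) are merged step by step into ab, de, cde, and finally abcde. Agglomerative clustering (AGNES) follows the numbered steps in one direction; divisive clustering (DIANA) follows them in the opposite direction.]
December 11, 2015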
Min distance
Max distance
Average distance
Distance between their centroids.
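These four ways of measuring the distance between two clusters can be compared side by side in a short Python sketch. The function names and the two sample clusters are hypothetical, and Euclidean distance between objects is an assumption.

```python
import math
from itertools import product


def min_distance(a, b):
    """Smallest distance between an object in a and an object in b."""
    return min(math.dist(p, q) for p, q in product(a, b))


def max_distance(a, b):
    """Largest distance between an object in a and an object in b."""
    return max(math.dist(p, q) for p, q in product(a, b))


def average_distance(a, b):
    """Mean distance over all pairs (one object from a, one from b)."""
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def centroid_distance(a, b):
    """Distance between the centroids (coordinate-wise means) of a and b."""
    centroid_a = [sum(col) / len(a) for col in zip(*a)]
    centroid_b = [sum(col) / len(b) for col in zip(*b)]
    return math.dist(centroid_a, centroid_b)


cluster_a = [(0, 0), (0, 2)]
cluster_b = [(3, 0), (3, 2)]
for measure in (min_distance, max_distance, average_distance, centroid_distance):
    print(measure.__name__, measure(cluster_a, cluster_b))
```

The choice of measure determines which pair of clusters an agglomerative method merges next, so the same data can produce different hierarchies.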
Record 1
Name : carla
Prediction: yes
Age: 21
Balance: 2300$
Income: high
Eyes: blue
Gender: F
Record 2
Name : carl
Prediction: no
Age: 27
Balance: 5400$
Income: high
Eyes: brown
Gender: M
example
CHURN ANALYSIS
Data Mining Goal
IDENTIFY CUSTOMERS WITH A DELINQUENT NATURE.
Scope
Assign a churn score to every customer in order to identify those who are
most likely to churn within a given period (a quarter, etc.).
Clearly define segments that are strongly separated by their churn-related
behavior.
Basic Understanding
1. Customer Request
2. Forced Churn (Defaulters)
Information Sources
Call Statistics (CDR)
Credit History
Billing History
Revenue History
Payment History
Survey Data
Demographic data
Complaint information
Suggested Analysis
Pareto analysis
Also called 80/20 analysis. It has been observed that roughly 80% of
revenue or profit comes from 20% of the customers. The key business
improvement is to identify those 20% and serve them better.
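A quick way to check whether the 80/20 pattern holds in a given customer base is to rank customers by revenue and measure the share contributed by the top 20%. The Python sketch below illustrates this; the function name `top_share` and the revenue figures are hypothetical.

```python
def top_share(revenues, fraction=0.2):
    """Share of total revenue contributed by the top `fraction` of customers."""
    ranked = sorted(revenues, reverse=True)
    total = sum(ranked)
    if total == 0:
        raise ValueError("total revenue is zero")
    # Always include at least one customer in the top group.
    top_n = max(1, round(len(ranked) * fraction))
    return sum(ranked[:top_n]) / total


# Hypothetical revenue per customer for ten customers.
revenues = [400, 400, 25, 25, 25, 25, 25, 25, 25, 25]
print(top_share(revenues))
```

A result well below 0.8 would suggest that revenue is spread more evenly, so targeting only the top customers would matter less.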
Techniques/ Reports/Algorithms
Loyalty Analysis
Techniques/ Reports/Algorithms
Characterization and summarization, top-10 reports, lists, cross-tab reports,
graphs, charts, etc.
Customer Profit Analysis
Techniques/ Reports/Algorithms
Characterization and summarization, top-10 reports, lists, cross-tab reports,
graphs, charts, etc.
Trend Analysis
Customer profiling
Segments include inactive accounts, light users, risky customers, active
accounts, loss-making accounts, and profit-making accounts. This segmentation
helps in mapping to the predictive segments.
Techniques/ Reports/Algorithms
Lists, cross-tab reports, clustering, graphs, charts
LTV Analysis
Also called lifetime value analysis. Revenue is projected over 25 years,
together with the projected churn loss and churn rate.
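One simple way to project revenue and churn loss over a horizon is sketched below in Python. The slides do not specify a model, so everything here is an assumption made for illustration: constant annual revenue per customer, a constant annual churn rate, and no discounting. The function name `projected_ltv` and the input figures are hypothetical.

```python
def projected_ltv(annual_revenue, annual_churn_rate, years=25):
    """Return (expected revenue, projected churn loss) over the horizon.

    Assumes constant annual revenue, a constant annual churn rate, and no
    discounting; all three are simplifications of this sketch.
    """
    if not 0.0 <= annual_churn_rate <= 1.0:
        raise ValueError("churn rate must be between 0 and 1")
    survival = 1.0  # probability the customer is still active
    expected = 0.0
    for _ in range(years):
        # Revenue is earned in a year only if the customer is still active.
        expected += annual_revenue * survival
        survival *= 1.0 - annual_churn_rate
    no_churn_total = annual_revenue * years
    return expected, no_churn_total - expected


expected, churn_loss = projected_ltv(annual_revenue=1200.0, annual_churn_rate=0.1)
print(expected, churn_loss)
```

The second value is the gap between the no-churn projection and the churn-adjusted one, which is one possible reading of "projected churn loss".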
Techniques/ Reports/Algorithms
Churn Modeling
Techniques/ Reports/Algorithms
Survival Analysis
This predicts how long a customer will continue with the existing
service, and indicates which measures can be taken. One
popular technique is K. Hazard analysis.
Techniques/ Reports/Algorithms
K. Hazard technique
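The slides name the technique only as "K. Hazard" and this sketch does not attempt to identify it. As a stand-in, the Python example below uses the Kaplan-Meier product-limit estimator, a standard survival-analysis method, to estimate the probability that a customer is still with the service after a given time. The function name `kaplan_meier` and the sample data are hypothetical.

```python
def kaplan_meier(durations, churned):
    """Return [(t, S(t))] for each time t at which at least one churn occurred.

    durations[i] is how long customer i has been observed; churned[i] is True
    if the customer churned at that time and False if the customer was still
    active when observation ended (censored).
    """
    events = sorted(zip(durations, churned))
    at_risk = len(events)
    survival = 1.0
    curve = []
    i = 0
    while i < len(events):
        t = events[i][0]
        churn_count = 0
        removed = 0
        # Group all customers whose observation ends at time t.
        while i < len(events) and events[i][0] == t:
            if events[i][1]:
                churn_count += 1
            removed += 1
            i += 1
        if churn_count:
            survival *= 1.0 - churn_count / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical tenures in months; False marks customers who are still active.
durations = [1, 2, 2, 3, 4]
churned = [True, True, False, True, False]
print(kaplan_meier(durations, churned))
```

Customers who are still active contribute to the at-risk count until their observation ends, which is what distinguishes this estimate from a plain churn percentage.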
Suggested Approach