Commercial applications
A chain of radio stores uses cluster analysis to identify three customer types with differing needs. An insurance company uses cluster analysis to classify customers into segments such as the self-confident customer, the price-conscious customer, etc. A producer of copying machines succeeds in classifying industrial customers as satisfied, non-satisfied, or quarrelling customers.
[Figure 11.1: Input data and output data for cluster analysis and factor analysis. Cluster analysis classifies the rows (observations 1 to m) of the data matrix X1 … Xn into clusters, e.g. as a binary membership matrix (CL1: 1 0 1 1 0; CL2: 0 1 0 0 1) or as a single cluster variable (CL: 1 2 1 1 2). Factor analysis groups the columns (variables X1 … Xn) into a smaller number of factors; illustrative loadings:

F1: X1 = 0.8, X2 = 0.2, X3 = -0.7, Xj = 0.6, Xn = 0.0
F2: X1 = -0.1, X2 = 0.7, X3 = 0.1, Xj = -0.2, Xn = 0.5
Fj: X1 = 0.0, X2 = -0.1, X3 = 0.1, Xj = 0.0, Xn = -0.6]
Independence Methods: We do not assume that any variable(s) are caused by or determined by others. Basically, we only have X1, X2, …, Xn (but no Y). The model is defined a posteriori, i.e. after the survey and/or estimation has been carried out. Examples: cluster analysis, factor analysis, etc. When using independence methods we let the data speak for themselves!
[Figure: Example data matrix. Each observation (Obs 1 to Obs 10) is measured on variables such as X1 (price: 95, 90, 80, 85, …), X2 (price of competitor: 100, 80, 75, 90, …) and X3 (advertising: 300.000, 200.000, 200.000, 250.000, …), partly recoded as rankings. On the basis of these variables the observations are assigned to Cluster 1, Cluster 2 or Cluster 3.]
Cluster analysis: a cross-tabulation between the cluster variable and background characteristics + opinions is established:

            Age   %-Females   Household size   Opinion 1   Opinion 2   Opinion 3
Cluster 1    32       31           1.4             3.2         2.1         2.2
Cluster 2    44       54           2.9             4.0         3.4         3.3
Cluster 3    56       46           2.1             2.6         3.2         3.0
Governing principle:
Maximization of homogeneity within clusters and, simultaneously, maximization of heterogeneity across clusters.
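The governing principle can be sketched numerically: for a given partition, within-cluster homogeneity is measured by the sum of squared distances of points to their own cluster mean, and across-cluster heterogeneity by the weighted squared distances of cluster means to the grand mean. The data points and partition below are hypothetical illustration values.

```python
# Minimal sketch of the governing principle: a good partition has a small
# within-cluster sum of squares and a large between-cluster sum of squares.

def mean(points):
    """Component-wise mean of a list of 2-D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(2))

def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return sum((a[i] - b[i]) ** 2 for i in range(2))

def within_between(clusters):
    """Return (within-cluster SS, between-cluster SS) for a partition."""
    all_points = [p for c in clusters for p in c]
    grand = mean(all_points)
    within = sum(sq_dist(p, mean(c)) for c in clusters for p in c)
    between = sum(len(c) * sq_dist(mean(c), grand) for c in clusters)
    return within, between

# Two compact, well-separated groups: small within-SS, large between-SS.
clusters = [[(1, 1), (1, 2), (2, 1)], [(8, 8), (9, 8), (8, 9)]]
w, b = within_between(clusters)
print(w, b)  # within is much smaller than between
```

A poor partition (e.g. mixing points from both groups) would raise the within-SS and lower the between-SS, which is exactly what clustering algorithms try to avoid.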
Figure 12.1: Taxonomy of clustering methods.

Non-overlapping (exclusive) methods:
- Hierarchical
  - Agglomerative
    - Linkage methods: Between (1), Within (2), Weighted, Single (Ordinary (3), Density, Two-stage density), Complete (4)
    - Centroid methods
    - Variance methods: Ward (7)
  - Divisive
- Non-hierarchical / Partitioning / k-means
  - Sequential threshold
  - Parallel threshold
  - Neural networks
  - Optimized partitioning (8)

Overlapping methods:
- Non-hierarchical
  - Overlapping k-centroids
  - Overlapping k-means
  - Latent class techniques
  - Fuzzy clustering

Note: Methods in italics are available in SPSS. Neural networks necessitate the SPSS data-mining tool Clementine.
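The partitioning (k-means) branch of the taxonomy can be sketched as follows: observations are repeatedly assigned to the nearest centroid, and centroids are recomputed until the partition stabilizes. The toy data points, k = 2, and iteration count are hypothetical illustration values.

```python
# Hedged sketch of a partitioning (k-means-style) method from the taxonomy
# above, for 2-D points; not the SPSS implementation.
import random

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k random observations
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda i: sum((p[d] - centroids[i][d]) ** 2
                                      for d in range(2)))
            groups[j].append(p)
        # Update step: move each centroid to the mean of its group.
        centroids = [
            tuple(sum(p[d] for p in g) / len(g) for d in range(2))
            if g else centroids[i]
            for i, g in enumerate(groups)
        ]
    return groups

pts = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
groups = kmeans(pts, 2)
print(sorted(len(g) for g in groups))  # the two natural groups of three
```

Unlike the hierarchical methods, k must be chosen in advance, and the result can depend on the random starting centroids.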
[Figure 12.2: Overview of clustering approaches: non-overlapping vs. overlapping, hierarchical (agglomerative or divisive) vs. non-hierarchical. Distance definitions between clusters:
- Single linkage: minimum distance
- Complete linkage: maximum distance
- Average linkage: average distance
- Centroid method: distance between cluster centroids]
Euclidean distance:

d = √((x2 − x1)² + (y2 − y1)²)

Example from the figure: for point B(3, 5) the distance shown is d = 3.61.

Other distances available in SPSS: City-block, which uses absolute differences instead of squared differences of coordinates. Moreover: Minkowski distance, cosine distance, Chebychev distance, Pearson correlation.
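A minimal sketch of the Euclidean and city-block measures. Point B(3, 5) matches the figure; point A(1, 2) is an assumed coordinate pair chosen to reproduce the figure's distance of 3.61.

```python
# Euclidean vs. city-block distance; A(1, 2) is a hypothetical point,
# B(3, 5) is taken from the figure.
from math import sqrt

def euclidean(p, q):
    """Straight-line distance: square root of summed squared differences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def city_block(p, q):
    """City-block (Manhattan) distance: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

A, B = (1, 2), (3, 5)
print(round(euclidean(A, B), 2))  # 3.61
print(city_block(A, B))           # 5
```

Because the Euclidean measure squares the coordinate differences, large differences on a single variable dominate it more than they dominate the city-block measure.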
["Quo vadis, C?" — to which cluster should point C be assigned? The figure compares two existing clusters, {A, B, G} and {D, E, H}, with candidate point C between them. The distances shown (12.0, 10.5, 9.5, 9.0, 8.5, 7.0) differ by criterion, so C may end up in different clusters depending on the method:

- Complete linkage: minimize the longest distance from cluster to point.
- Average linkage: minimize the average distance from cluster to point.
- Single linkage: minimize the shortest distance from cluster to point.]
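The three linkage criteria can be sketched as one small decision function. The clusters and candidate point below are hypothetical coordinates, deliberately chosen so that single and complete linkage disagree, as in the "Quo vadis, C?" figure.

```python
# Which cluster does a point join under each linkage criterion?
# Clusters and point C are hypothetical 2-D coordinates.
from math import dist  # Python 3.8+

def linkage_distance(point, cluster, kind):
    """Distance from a point to a cluster under a linkage criterion."""
    ds = [dist(point, p) for p in cluster]
    return {"single": min,                      # shortest distance
            "complete": max,                    # longest distance
            "average": lambda v: sum(v) / len(v)}[kind](ds)

def assign(point, clusters, kind):
    """Join the cluster with the smallest linkage distance to the point."""
    return min(clusters, key=lambda c: linkage_distance(point, c, kind))

left  = [(0.0, 0.0), (1.0, 0.0)]   # compact cluster
right = [(4.0, 0.0), (9.0, 0.0)]   # stretched cluster
C = (3.0, 0.0)

print(assign(C, [left, right], "single"))    # nearest neighbour decides
print(assign(C, [left, right], "complete"))  # farthest neighbour decides
```

Here single linkage sends C to the stretched right cluster (its nearest member is only 1 unit away), while complete and average linkage send C to the compact left cluster, illustrating why the criterion choice matters.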
With single linkage, cluster formation begins with the two closest points, and thereafter the closest observation is always put into the existing cluster(s). This can produce chaining, i.e. snake-like clusters, with loosely scattered points forming an "entropy group". On the other hand, single linkage offers a good outlier detection and removal procedure in cases with noisy data sets.
Cluster analysis: more potential pitfalls & problems.

Do our data permit the use of means at all? Some methods (e.g. Ward's) are biased toward producing clusters with approximately the same number of observations. Other methods (e.g. centroid) require metrically scaled data as input. So, strictly speaking, it is not allowable to use such an algorithm when clustering data measured on interval-like rating scales (Likert or semantic-differential scales).
Dendrogram

The dendrogram plots the observations (OBS 1 to OBS 6) against the fusion distance (axis from 0.2 to 1.0):

- Step 0: Each observation is treated as a separate cluster.
- Step 1: The two observations with the smallest pairwise distance are merged into Cluster 1.
- Step 2: Two other observations with the smallest distance amongst the remaining points/clusters are merged into Cluster 2.
- Step 3: Observation 3 joins Cluster 1.
- Step 4: Clusters 1 and 2 from Step 3 are joined into a supercluster. A single observation remains unclustered (an outlier).
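The agglomerative procedure the dendrogram depicts can be sketched directly: start with singleton clusters and repeatedly merge the two closest clusters, recording the fusion distance at each step. The sketch below uses single linkage and hypothetical one-dimensional observation values, with one deliberate outlier that joins last.

```python
# Agglomerative clustering sketch (single linkage, 1-D data).
# Observation values are hypothetical illustration values.
from math import inf
from itertools import combinations

def agglomerate(points):
    """Return the merge sequence as (members of merged cluster, fusion distance)."""
    clusters = [frozenset([i]) for i in range(len(points))]  # Step 0: singletons
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-linkage distance.
        best, best_d = None, inf
        for a, b in combinations(clusters, 2):
            d = min(abs(points[i] - points[j]) for i in a for j in b)
            if d < best_d:
                best, best_d = (a, b), d
        a, b = best
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        history.append((sorted(a | b), best_d))
    return history

obs = [0.1, 0.2, 0.35, 0.8, 0.85, 2.0]   # last observation is an outlier
for members, d in agglomerate(obs):
    print(members, round(d, 2))
```

Reading the output top to bottom reproduces the dendrogram: the tightest pairs merge first at small fusion distances, and the outlier (value 2.0) is absorbed only at the final, largest fusion distance.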