
Computing Clusters

Unsupervised Learning

Luc Anselin

http://spatial.uchicago.edu

Copyright © 2016 by Luc Anselin, All Rights Reserved


• dimension reduction
• classical clustering methods
• spatially-constrained clustering



Dimension Reduction



Principles



• Curse of dimensionality
• in a nutshell

• low-dimensional techniques break down in high dimensions

• complexity of some functions increases exponentially with the variable dimension



Example 1

change with p (the variable dimension) of the distance in the unit cube required to reach a given fraction of the total data volume

Source: Hastie, Tibshirani, Friedman (2009)



Example 2

nearest neighbor distance in one vs. two dimensions

Source: Hastie, Tibshirani, Friedman (2009)



• Dimension reduction
• reduce multiple variables into a smaller number of functions of the original variables

• principal component analysis (PCA)

• visualize the multivariate similarity (distance) between observations in a lower dimension

• multidimensional scaling (MDS)



Principal Components Analysis (PCA)



• Principle
• capture the variance in the p by p covariance matrix X’X through a set of k principal components, with k << p

• principal components capture most of the variance

• principal components are orthogonal

• principal component coefficients (loadings) are scaled



• More formally
• ci = ai1x1 + ai2x2 + … + aipxp
• each principal component is a weighted sum of the original variables

• ci’cj = 0
• components are orthogonal to each other

• Σk aik² = 1
• the sum of the squared loadings equals one

• computation
• matrix decomposition (eigenvalue decomposition of X’X, or singular value decomposition of X)



• Typical results of interest (each illustrated in the sketch below)
• loadings for each principal component
• the contribution of each of the original variables to that component

• principal component score
• the value of the principal component for each observation

• variance proportion explained
• the proportion of the overall variance each principal component explains

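A minimal sketch of these quantities in Python with scikit-learn (a software assumption; the slides do not prescribe a tool), using random placeholder data in place of the actual 77 x 12 indicator matrix:

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    X = rng.normal(size=(77, 12))            # placeholder: 77 areas x 12 indicators

    Xs = StandardScaler().fit_transform(X)   # PCA is scale-sensitive: standardize first

    pca = PCA(n_components=4)
    scores = pca.fit_transform(Xs)           # principal component score per observation
    loadings = pca.components_               # one row of loadings per component
    print(pca.explained_variance_ratio_)     # variance proportion explained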


• Example
• 77 Community Areas in Chicago (2014)

• 12 health indicator variables (all % or rates)

• teen birth rate, pre-term births, infant mortality rate, gonorrhea, breast cancer, lung cancer, colorectal cancer, prostate cancer, lead, diabetes, stroke, tuberculosis



first four principal components with their loadings



biplot

scatter plot of the first two principal components; each vector (arrow) shows the relative loadings for that variable
variance explained by each component



how many components?
look for the elbow in the scree plot (see the sketch below)

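Continuing the sketch above (Xs is the standardized data matrix), the scree plot is simply the explained variance ratio of a full decomposition plotted against the component number; matplotlib is assumed:

    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA

    pca_full = PCA().fit(Xs)                 # keep all 12 components
    plt.plot(range(1, 13), pca_full.explained_variance_ratio_, "o-")
    plt.xlabel("component")
    plt.ylabel("proportion of variance explained")
    plt.show()                               # look for the elbow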


mapping principal components (pc1)

neighbors in multivariate space are not necessarily neighbors in geographical space (Hyde Park and Near North Side)


Multidimensional Scaling (MDS)



• Principle
• n observations are points in a p-dimensional data hypercube

• p-variate distance or dissimilarity between all pairs of points (e.g., Euclidean distance in p dimensions)

• represent the n observations in a lower-dimensional space (at most p − 1) while respecting the pairwise distances



• More formally
• n by n distance or dissimilarity matrix D

• dij = ||xi − xj||, the Euclidean distance in p dimensions

• find values z1, z2, …, zn in k-dimensional space (with k << p) that minimize the stress function (see the sketch below)

• S(z) = Σi,j (dij − ||zi − zj||)²

• least squares or Kruskal-Shepard scaling

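A minimal sketch with scikit-learn's MDS, reusing the standardized matrix Xs from the PCA sketch; note that scikit-learn's SMACOF solver minimizes a stress criterion of this least-squares form rather than implementing Kruskal-Shepard scaling verbatim:

    from scipy.spatial.distance import pdist, squareform
    from sklearn.manifold import MDS

    D = squareform(pdist(Xs))                # n x n Euclidean distances in p dimensions
    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
    Z = mds.fit_transform(D)                 # n points represented in 2 dimensions
    print(mds.stress_)                       # value of the stress function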


MDS representation of 12 dimensions into 2



neighbors in multivariate space vs neighbors in geographical space



Classical Clustering
Methods



• Principle
• grouping of similar observations

• maximize within-group similarity

• minimize between-group similarity

• or, maximize between-group dissimilarity

• each observation belongs to one and only one group



• Issues
• similarity criterion
• Euclidean distance, correlation
• how many groups
• many “rules of thumb”
• computational challenges
• combinatorial problem, NP-hard
• n observations in k groups
• kⁿ possible partitions
• k = 4 with n = 77: kⁿ = 2.3 × 10⁴⁶

• no guarantee of a global optimum



• Two main approaches
• hierarchical clustering

• start from the bottom (each observation is its own cluster)

• determine the number of clusters later

• partitioning clustering (k-means)

• start with a random assignment to k groups

• number of clusters pre-determined

• many clustering algorithms

Hierarchical Clustering



• Algorithm
• find the two observations that are closest (most similar)
• they form a cluster

• determine the next closest pair
• include the existing clusters in the comparisons

• continue grouping until all observations have been included
• result is a dendrogram
• a hierarchical tree structure (see the sketch below)

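A minimal sketch with scipy (a software assumption), clustering the component scores from the PCA sketch; the method argument selects the linkage discussed below:

    from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

    Z = linkage(scores, method="complete")   # build the tree bottom-up
    dendrogram(Z)                            # draw the tree (matplotlib assumed)

    labels4 = fcluster(Z, t=4, criterion="maxclust")   # "cut" the tree into 4 clusters
    labels6 = fcluster(Z, t=6, criterion="maxclust")   # ... or into 6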


[figure: step-by-step illustration of the hierarchical clustering algorithm on eight points]

• 2 and 3 are the closest points; they become a cluster
• 5 and 7 are the closest points; they become a cluster
• 1 and the cluster of 2 and 3 are the closest
• 4 and the cluster of 1, 2, and 3 are the closest
• 8 and the cluster of 5 and 7 are the closest
• the two remaining clusters merge; the algorithm has finished
• rewind (“cut”) the tree to reveal the desired number of clusters, e.g., two or four

hierarchical clustering algorithm

Source: Grolemund and Wickham (2016)



• Practical issues
• measure of similarity (dissimilarity) between clusters = linkage
• complete
• compact clusters
• single
• elongated clusters, singletons
• average
• centroid
• others …
• how many clusters
• where to cut the tree



[figure: types of linkage (complete, single, average, centroid), each defining the inter-cluster distance δ differently]

Source: Grolemund and Wickham (2016)



complete linkage dendrogram
(using first four principal components)



Complete linkage hierarchical cluster maps

k=4

k=6



single linkage dendrogram
(using first four principal components)



Single linkage hierarchical cluster maps

k=4

k=6



k-Means Clustering



• Algorithm
• randomly assign the n observations to k groups

• compute each group centroid (or other representative point)

• assign each observation to the closest centroid

• iterate until convergence



[figure: step-by-step illustration of the k-means algorithm, k = 3]

• randomly assign the points to k groups (here k = 3)
• compute the centroid of each group
• reassign each point to the group of the closest centroid
• re-compute the centroids and reassign, repeating until group membership ceases to change

k-means clustering algorithm

Source: Grolemund and Wickham (2016)



• Practical issues
• which k to select?
• compare solutions on within-group and between-group similarities

• sensitivity to the starting point
• use several random assignments and pick the best

• avoid local optima
• sensitivity analysis

• replicability
• set the random seed (see the sketch below)

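A minimal sketch with scikit-learn, again on the component scores; n_init restarts the algorithm from several random assignments and keeps the best solution, and random_state fixes the seed for replicability:

    from sklearn.cluster import KMeans

    km = KMeans(n_clusters=4, n_init=25, random_state=123).fit(scores)
    labels = km.labels_                      # group membership (labels are arbitrary)

    # compare candidate k on the total within-group sum of squares
    for k in range(2, 11):
        wss = KMeans(n_clusters=k, n_init=25, random_state=123).fit(scores).inertia_
        print(k, wss)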


k-means clustering on the first four principal components
note: the cluster labels themselves are arbitrary



K-means cluster maps

k=4

k=6



Spatially-Constrained
Clustering



• Regionalization
• grouping contiguous objects that are similar into
new aggregate areal units
• multiple objectives
• classical clustering
• within-group similarity, between-group dissimilarity

• spatial similarity
• only contiguous objects in same group

• shape
• compactness



• Solution strategies
• classical clustering with updates

• multi-objective approach

• automatic zoning

• graph-based approaches

• explicit optimization



• Classical clustering with updates
• start with hierarchical clustering or k-means
solution

• split/combine clusters that are not contiguous

• inefficient approach

• number of clusters is indeterminate



• Multi-objective approach
• introduce location (x, y) as variables within the clustering routine

• assign weights to the similarity objective vs the spatial objective (see the sketch below)

• difficult to set the weights

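A hedged sketch of the weighting idea, continuing the earlier sketches (the coordinates and the weight w are illustrative placeholders, not from the slides): standardize the centroid coordinates, scale the two blocks, and hand the stacked matrix to an ordinary clustering routine:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    coords = rng.uniform(size=(77, 2))       # placeholder for the area centroids (x, y)
    w = 0.5                                  # hypothetical weight; w = 0 ignores space entirely
    xy = StandardScaler().fit_transform(coords)
    Xw = np.hstack([(1 - w) * Xs, w * xy])   # attribute similarity vs spatial objective
    labels = KMeans(n_clusters=6, n_init=25, random_state=0).fit_predict(Xw)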


• Automatic Zoning
• AZP

• automatic zoning procedure (Openshaw and Rao)

• heuristic

• starts from random initial feasible solutions

• optimization (NP-hard problem)



• Graph-based approaches
• represent the contiguity structure of the objects
as a graph

• graph pruning

• e.g., using minimum spanning tree (SKATER)

• maximize internal similarity objective



• Explicit optimization
• formulate as an integer programming problem

• decision variables to allocate object i to region j

• formalize adjacency constraints

• typically as a graph representation

• several heuristics



• Example: SKATER
• Spatial ’K’luster Analysis by Tree Edge Removal

• Assunção et al. (2006)

• algorithm (first step sketched below)

• construct a minimum spanning tree from the adjacency graph

• prune the tree (cut edges) to achieve maximum internal homogeneity

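A hedged sketch of the first step only, with a random placeholder standing in for a real contiguity matrix (e.g., queen contiguity; rng and Xs as in the earlier sketches): weight each contiguity edge by the attribute dissimilarity of the two areas and extract the minimum spanning tree with scipy:

    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import minimum_spanning_tree
    from scipy.spatial.distance import cdist

    W = rng.random((77, 77)) < 0.1           # placeholder contiguity matrix
    W = np.triu(W, 1)
    W = W | W.T                              # symmetric, zero diagonal

    cost = cdist(Xs, Xs) * W                 # dissimilarities on contiguity edges only
    mst = minimum_spanning_tree(csr_matrix(cost))   # step 1 of SKATER
    # step 2 (not shown): prune MST edges to maximize internal homogeneity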


Chicago Community Areas - neighbor structure (as a tree)



Chicago Community Areas - minimum spanning tree (MST)



Chicago Community Areas - pruned tree



Contiguity Constrained Clusters

k=4

k=6



• Issues
• many algorithms/heuristics
• different search spaces
• different conceptualizations of attribute similarity
• different considerations of spatial contiguity

• additional constraints
• districting: target population size

• number of clusters
• exogenous
• endogenous: the max-p regions problem
