
CLUSTERING

DENSITY-BASED METHODS
Elsayed Hemayed
Data Mining Course
Outline

• Density-Based Clustering Methods
• Density-Based Clustering Background
• Terminology
• How does DBSCAN find clusters?
• DBSCAN

Density-based Clustering Methods


Clustering Methods

• Partitioning methods
  - K-Means
• Hierarchical methods
  - Agglomerative hierarchical clustering
  - Divisive hierarchical clustering
• Density-based methods
  - DBSCAN: Density-Based Spatial Clustering of Applications with Noise
• Grid-based methods
  - STING: A Statistical Information Grid Approach to Spatial Data Mining
• Model-based methods
  - Expectation-Maximization
  - Neural network approach
• High-dimensional data clustering
  - CLIQUE: A Dimension-Growth Subspace Clustering Method



Density-based Clustering Methods
DBSCAN



Density-Based Clustering Methods

• Clustering based on density (e.g., density-connected points) rather than on distance alone.
• Cluster = set of "density-connected" points.
• Major features:
  - Discover clusters of arbitrary shape
  - Handle noise
  - Need "density parameters" as a termination condition (clustering stops when no new objects can be added to a cluster)
• Examples:
  - DBSCAN (Ester et al. 1996)
  - OPTICS (Ankerst et al. 1999)
  - DENCLUE (Hinneburg & Keim 1998)



Density-Based Clustering: Background

• Eps-neighborhood: the neighborhood within a radius Eps of a given object.
• MinPts: the minimum number of points in the Eps-neighborhood of an object.
• Core object: if the Eps-neighborhood of an object contains at least MinPts points, the object is a core object.
• Directly density-reachable: a point p is directly density-reachable from a point q with respect to Eps and MinPts if
  1) p is within the Eps-neighborhood of q, and
  2) q is a core object.
[Figure: point p in the Eps-neighborhood of core object q; MinPts = 5, Eps = 1]
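The definitions above can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: it assumes 2-D points stored as tuples, Euclidean distance, and that a point counts itself in its own Eps-neighborhood (texts differ on this convention). The function names and the sample coordinates are hypothetical, chosen to mirror the figure (MinPts = 5, Eps = 1).

```python
import math

def eps_neighborhood(points, i, eps):
    """Indices of all points within distance eps of points[i] (assumption: i itself is included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def is_core(points, i, eps, min_pts):
    """A core object has at least min_pts points in its Eps-neighborhood."""
    return len(eps_neighborhood(points, i, eps)) >= min_pts

def directly_density_reachable(points, p, q, eps, min_pts):
    """True if points[p] is directly density-reachable from points[q]:
    1) p is within the Eps-neighborhood of q, and 2) q is a core object."""
    return p in eps_neighborhood(points, q, eps) and is_core(points, q, eps, min_pts)

# Hypothetical data: q has 5 points (itself included) within Eps = 1, p has only 2.
points = [(0.0, 0.0), (0.9, 0.0), (-0.5, 0.0), (0.0, 0.5), (0.0, -0.5)]
Q, P = 0, 1
print(directly_density_reachable(points, P, Q, eps=1.0, min_pts=5))  # True
print(directly_density_reachable(points, Q, P, eps=1.0, min_pts=5))  # False: p is not a core object
```

The relation is not symmetric: q also lies within Eps of p, but because p is not a core object, q is not directly density-reachable from p.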
Density Reachability and Density Connectivity
• M, P, O and R are core objects, since each has an Eps-neighborhood containing at least MinPts = 3 points.

[Figure: MinPts = 3; Eps = radius of the circles]
Directly density reachable

• Q is directly density-reachable from M.
• M is directly density-reachable from P, and vice versa (both are core objects and each lies in the other's Eps-neighborhood).



Indirectly density reachable

• Q is indirectly density-reachable from P, since Q is directly density-reachable from M and M is directly density-reachable from P. However, P is not density-reachable from Q, since Q is not a core object.
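The chain argument on this slide can be sketched as a breadth-first search that only expands core objects: a point is density-reachable from another if a chain of directly density-reachable steps leads from one to the other. This is an illustrative sketch, not from the slides; it assumes Euclidean distance, a neighborhood that includes the point itself, and hypothetical coordinates for P, M and Q placed on a line (the figure gives none), plus a hypothetical extra point A that makes P a core object, with MinPts = 3 and Eps = 1.

```python
import math
from collections import deque

def eps_neighborhood(points, i, eps):
    """Indices of all points within distance eps of points[i] (i itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def is_core(points, i, eps, min_pts):
    return len(eps_neighborhood(points, i, eps)) >= min_pts

def density_reachable(points, target, source, eps, min_pts):
    """True if points[target] is density-reachable from points[source]: a chain
    source -> ... -> target exists in which every step is directly density-reachable,
    so every point the chain passes through must be a core object."""
    seen = {source}
    queue = deque([source])
    while queue:
        i = queue.popleft()
        if i == target:
            return True
        if not is_core(points, i, eps, min_pts):
            continue  # a non-core point cannot extend the chain
        for j in eps_neighborhood(points, i, eps):
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return False

# Hypothetical layout on a line: A - P - M - Q, one unit apart.
points = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
A, P, M, Q = range(4)
print(density_reachable(points, Q, P, eps=1.0, min_pts=3))  # True: P -> M -> Q
print(density_reachable(points, P, Q, eps=1.0, min_pts=3))  # False: Q is not a core object
```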



Core, border, and noise points

• DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise.
• Density = number of points within a specified radius (Eps).
• A point is a core point if it has at least a specified number of points (MinPts) within Eps.
  - These are points in the interior of a cluster.
• A border point has fewer than MinPts points within Eps, but lies in the neighborhood of a core point.
• A noise point is any point that is neither a core point nor a border point.
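A minimal sketch of the core/border/noise labelling, assuming 2-D points as tuples, Euclidean distance, and that a point counts itself in its own Eps-neighborhood. The function name and the sample data are hypothetical; the pairwise comparison makes it O(n²), which is fine for illustration only.

```python
import math

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border' or 'noise'."""
    # Eps-neighborhood of every point (assumption: a point counts itself).
    neighbors = [[j for j, q in enumerate(points) if math.dist(p, q) <= eps] for p in points]
    core = [len(nb) >= min_pts for nb in neighbors]
    labels = []
    for i, nb in enumerate(neighbors):
        if core[i]:
            labels.append("core")    # at least MinPts points within Eps
        elif any(core[j] for j in nb):
            labels.append("border")  # not core, but within Eps of a core point
        else:
            labels.append("noise")   # neither core nor border
    return labels

# Hypothetical data: a small dense group, one point at its edge, one isolated point.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.9, 0.0), (5.0, 5.0)]
print(classify_points(points, eps=1.0, min_pts=3))
# ['core', 'core', 'core', 'border', 'noise']
```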



How does DBSCAN find clusters?

• DBSCAN searches for clusters by checking the Eps-neighborhood of each point in the database.
• If the Eps-neighborhood of a point p contains at least MinPts points, a new cluster with p as a core object is created.
• DBSCAN then iteratively collects directly density-reachable objects from these core objects, which may involve merging a few density-reachable clusters.
• The process terminates when no new point can be added to any cluster.



DBSCAN Algorithm

• Arbitrarily select a point p.
• Retrieve all points density-reachable from p with respect to Eps and MinPts.
• If p is a core point, a cluster is formed.
• If p is a border point (or any other non-core point), no points are density-reachable from p, and DBSCAN visits the next point of the database.
• Continue the process until all of the points have been processed.
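The steps above can be sketched in Python as follows. This is a minimal, unoptimized illustration (every neighborhood query scans all points, so it runs in O(n²) time), not the reference implementation of Ester et al. It assumes Euclidean distance, counts a point in its own Eps-neighborhood, uses index order as the "arbitrary" visiting order, and uses -1 as a hypothetical noise label. A point first marked as noise is relabelled if a cluster later reaches it as a border point.

```python
import math
from collections import deque

NOISE = -1  # hypothetical label for noise points

def region_query(points, i, eps):
    """Indices of all points in the Eps-neighborhood of points[i] (i itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return a cluster id (0, 1, ...) for every point, or NOISE."""
    labels = [None] * len(points)        # None = not processed yet
    cluster_id = 0
    for p in range(len(points)):         # "arbitrary" order: here, index order
        if labels[p] is not None:
            continue
        neighbors = region_query(points, p, eps)
        if len(neighbors) < min_pts:     # p is not a core point
            labels[p] = NOISE            # may later be relabelled as a border point
            continue
        labels[p] = cluster_id           # p is a core point: a new cluster is formed
        queue = deque(neighbors)
        while queue:                     # collect everything density-reachable from p
            q = queue.popleft()
            if labels[q] == NOISE:
                labels[q] = cluster_id   # former noise point is a border point
            if labels[q] is not None:
                continue                 # already assigned (or just reassigned above)
            labels[q] = cluster_id
            q_neighbors = region_query(points, q, eps)
            if len(q_neighbors) >= min_pts:
                queue.extend(q_neighbors)  # only core points extend the cluster
        cluster_id += 1
    return labels

# Hypothetical data: two dense groups, a border point listed first, one isolated point.
points = [
    (2.4, 1.0),
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
    (5.0, 5.0),
]
print(dbscan(points, eps=1.5, min_pts=3))
# [0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In this sketch a border point that lies within Eps of core points from two different clusters keeps the label of whichever cluster reaches it first, so its assignment depends on the visiting order.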



DBSCAN Summary

• DBSCAN is a density-based clustering method based on connected regions with sufficiently high density.
• The algorithm grows regions with sufficiently high density into clusters and discovers clusters of arbitrary shape in spatial databases with noise.
• It defines a cluster as a maximal set of density-connected points. So, unlike hierarchical methods, cluster membership is not decided by the distance between points or clusters; distance is used only to define the Eps-neighborhood.
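To illustrate "maximal set of density-connected points", the sketch below checks density-connectivity: p and q are density-connected if some point o has both p and q density-reachable from it. It assumes Euclidean distance and an Eps-neighborhood that includes the point itself; the function names and the chain-shaped sample data are hypothetical. The two ends of the chain are 9 units apart and neither is density-reachable from the other (both are border points), yet they are density-connected through the core points in between, so they belong to the same elongated cluster.

```python
import math
from collections import deque

def eps_neighborhood(points, i, eps):
    """Indices of all points within distance eps of points[i] (i itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def reachable_from(points, o, eps, min_pts):
    """Indices density-reachable from points[o] (BFS that only expands core points)."""
    seen, queue = {o}, deque([o])
    while queue:
        i = queue.popleft()
        nb = eps_neighborhood(points, i, eps)
        if len(nb) < min_pts:
            continue  # non-core: cannot extend the chain
        for j in nb:
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return seen

def density_connected(points, p, q, eps, min_pts):
    """p and q are density-connected if some point o reaches both of them."""
    return any({p, q} <= reachable_from(points, o, eps, min_pts) for o in range(len(points)))

# Hypothetical data: a straight chain of 10 points one unit apart, plus one far-away point.
points = [(float(x), 0.0) for x in range(10)] + [(20.0, 0.0)]
print(density_connected(points, 0, 9, 1.0, 3))   # True: connected through the core points 1..8
print(9 in reachable_from(points, 0, 1.0, 3))     # False: point 0 is not a core object
print(density_connected(points, 0, 10, 1.0, 3))   # False: point 10 is isolated
```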



Summary

• Density-Based Clustering Methods
• Density-Based Clustering Background
• Terminology
• How does DBSCAN find clusters?
• DBSCAN
