Data Mining-Density and Grid Methods

CLUSTERING: DENSITY AND GRID BASED METHODS
Density based methods

Clusters: dense regions of objects
Low-density regions: noise
DBSCAN
Density Based Spatial Clustering of Applications
with Noise
OPTICS
Ordering Points To Identify the Clustering
Structure
DENCLUE
DENsity Based CLUstEring
DBSCAN
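Cluster: a maximal set of density-connected points
Grows regions with sufficiently high density into clusters
ε-neighborhood: the objects within a radius ε of a given object
MinPts and core object: an object is a core object if its ε-neighborhood contains at least MinPts objects
Directly Density Reachable
An object p is directly density-reachable from an object q if p is within the ε-neighborhood of q and q is a core object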
Density Reachable
An object p is density-reachable from q if there is a chain of objects p1, ..., pn with p1 = q and pn = p such that pi+1 is directly density-reachable from pi
Density Connected
An object p is density-connected to an object q if there is an object o such that both p and q are density-reachable from o.
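The definitions above can be sketched in a few lines of Python. This is only an illustration: the function names, the Euclidean distance, and the convention that a point counts as a member of its own neighborhood are assumptions, not part of the slides.

```python
from math import dist

def neighborhood(points, i, eps):
    """Indices of all points within distance eps of points[i] (i itself included)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def is_core(points, i, eps, min_pts):
    """A core object has at least min_pts points in its eps-neighborhood."""
    return len(neighborhood(points, i, eps)) >= min_pts

def directly_density_reachable(points, p, q, eps, min_pts):
    """p is directly density-reachable from q if p is in q's neighborhood and q is core."""
    return p in neighborhood(points, q, eps) and is_core(points, q, eps, min_pts)

# Four points on a unit square plus one point just outside it (hypothetical data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (2.4, 1.0)]
forward = directly_density_reachable(points, 4, 3, eps=1.5, min_pts=4)
backward = directly_density_reachable(points, 3, 4, eps=1.5, min_pts=4)
```

With this data, point 4 is directly density-reachable from the core point 3, but not the other way round, because point 4 is only a border point. The relation is therefore not symmetric.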
Arbitrarily select a point p
Retrieve all points density-reachable from p with respect to ε and MinPts
If p is a core point, a cluster is formed.
If p is a border point, no points are density-reachable from p, and DBSCAN visits the next point of the database.
Continue the process until all of the points have been processed.
Complexity: O(n log n) with a spatial index, O(n^2) without
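The procedure above might look as follows in Python. This is a minimal sketch, not a reference implementation: the names `dbscan`, `region` and `NOISE` are hypothetical, the distance is assumed Euclidean, and the brute-force neighborhood query gives the quadratic running time rather than the indexed one.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, NOISE for outliers."""
    labels = [None] * len(points)            # None means "not yet visited"

    def region(i):
        # Brute-force eps-neighborhood query (includes i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:         # not a core point: noise for now
            labels[i] = NOISE
            continue
        cluster += 1                         # core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:           # border point reached from a core point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:  # j is also core: keep growing
                seeds.extend(j_neighbors)
    return labels

# Two small squares and one isolated point (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (5, 5), (5, 6), (6, 5), (6, 6), (10, 0)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Replacing `region` with a lookup in a spatial index (for example an R*-tree or k-d tree) is what brings the cost down from O(n^2) toward O(n log n).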
OPTICS: A Cluster-Ordering Method
OPTICS: Ordering Points To Identify the Clustering
Structure
Produces a special order of the database with
respect to its density-based clustering structure
Good for both automatic and interactive cluster
analysis, including finding intrinsic clustering
structure
Can be represented graphically or using
visualization techniques
In DBSCAN, for a constant MinPts value, density-based clusters with respect to a higher density (a lower value of ε) are completely contained in density-connected sets obtained with respect to a lower density.
DBSCAN is extended so that objects are processed in a specific order.
Selects an object that is density-reachable with respect to the lowest ε value, so that clusters with higher density are finished first
Core distance of an object p: the smallest ε' value that makes p a core object (undefined if p is not a core object)
Reachability distance of an object q with respect to p = max(core-distance of p, d(p, q))
Complexity : O(n log n)
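The two OPTICS distances can be illustrated with a short Python sketch. The function names are hypothetical, the distance is assumed Euclidean, a point is assumed to count toward its own MinPts, and an undefined distance is represented by infinity.

```python
from math import dist, inf

def core_distance(points, p, eps, min_pts):
    """Smallest radius that makes points[p] a core object, or inf if none <= eps."""
    d = sorted(dist(points[p], q) for q in points)   # d[0] is 0.0 (p itself)
    if len(d) < min_pts or d[min_pts - 1] > eps:
        return inf
    return d[min_pts - 1]

def reachability_distance(points, q, p, eps, min_pts):
    """max(core-distance of p, d(p, q)); inf if p is not a core object."""
    return max(core_distance(points, p, eps, min_pts), dist(points[p], points[q]))

# Four points on a line (hypothetical data).
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
cd0 = core_distance(points, 0, eps=3.0, min_pts=3)
rd10 = reachability_distance(points, 1, 0, eps=3.0, min_pts=3)
cd3 = core_distance(points, 3, eps=3.0, min_pts=3)
```

Here point 1 is closer to point 0 than the core distance of point 0, so its reachability distance is lifted to that core distance. Point 3 has too few neighbors within ε, so its core distance is undefined.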

DENCLUE: using density functions
DENsity-based CLUstEring
Major features
Solid mathematical foundation
Good for data sets with large amounts of noise
Allows a compact mathematical description of
arbitrarily shaped clusters in high-dimensional
data sets
Significantly faster than existing algorithms (faster than DBSCAN by a factor of up to 45)
But needs a large number of parameters
Influence function: describes the impact of a data point within its neighborhood.
x, y: objects in $F^d$, a d-dimensional input space
Influence of object y on x is: $f_B^y(x) = f_B(x, y)$
Can be determined by distance, e.g. a Gaussian: $f_{Gauss}(x, y) = e^{-d(x, y)^2 / (2\sigma^2)}$
Overall density of the data space can be calculated as
the sum of the influence function of all data points.
Clusters can be determined mathematically by
identifying density attractors.
Density attractors are local maxima of the overall density function.
A point x is said to be density-attracted to a density attractor x* if there exists a set of points x0, x1, ..., xk such that x0 = x, xk = x*, and the gradient of the density function at xi-1 points in the direction of xi
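A hill-climbing sketch shows how a point can be followed to its density attractor. It assumes a Gaussian influence function with width `sigma`, a fixed step size, and a stop as soon as the density would decrease. The names `influence`, `density`, `gradient` and `climb` are hypothetical, and the gradient is taken up to a positive constant factor, which does not change its direction.

```python
from math import dist, exp

def influence(x, y, sigma):
    """Gaussian influence of data point y on location x."""
    return exp(-dist(x, y) ** 2 / (2 * sigma ** 2))

def density(x, data, sigma):
    """Overall density at x: sum of the influences of all data points."""
    return sum(influence(x, y, sigma) for y in data)

def gradient(x, data, sigma):
    """Direction of steepest density increase at x (constant factor dropped)."""
    g = [0.0] * len(x)
    for y in data:
        w = influence(x, y, sigma)
        for k in range(len(x)):
            g[k] += (y[k] - x[k]) * w
    return g

def climb(x, data, sigma, step=0.05, max_iters=200):
    """Follow the gradient uphill until the density stops increasing."""
    x = tuple(x)
    for _ in range(max_iters):
        g = gradient(x, data, sigma)
        norm = sum(c * c for c in g) ** 0.5
        if norm == 0.0:
            break
        candidate = tuple(xi + step * gi / norm for xi, gi in zip(x, g))
        if density(candidate, data, sigma) < density(x, data, sigma):
            break                            # overshot the local maximum
        x = candidate
    return x

# Two tight groups of points (hypothetical data).
data = [(0.0, 0.0), (0.2, 0.0), (-0.2, 0.0),
        (5.0, 5.0), (5.2, 5.0), (4.8, 5.0)]
start = (1.0, 0.5)
attractor = climb(start, data, sigma=0.5)
```

Points whose climbs end at the same attractor would be assigned to the same cluster. The result is only accurate to about one step length.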
Center-defined clusters
For a density attractor x*: the subset of points that are density-attracted by x*, where the density function at x* is no less than a threshold ξ
Others are outliers
Arbitrary-shape cluster
For a set of density attractors: a set of subsets Cs, each density-attracted to its respective attractor
There should be a path from each density attractor to another along which the density function value at each point is no less than ξ
Grid based methods
Uses a multi-resolution grid data structure
Quantizes space into a finite number of cells that form a grid structure
Fast processing time
STING
WaveCluster
CLIQUE: CLustering In QUEst
STING
STatistical Information Grid
Spatial area is divided into rectangular cells
Several levels of cells at different levels of
resolution
Each high-level cell is partitioned into several lower-level cells
Statistical attributes are stored in each cell: mean, maximum, minimum
Parameters of higher-level cells are computed from those at lower levels
To answer queries
Identify the level at which to start
Estimate each cell's relevance to the query
Process relevant cells at lower levels
Continue to the lowest level
Computation is query independent
Parallel processing supported
Data is processed in a single pass
Quality depends on granularity
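The bottom-up computation of cell parameters can be sketched as follows. The `CellStats` class and both function names are hypothetical, and the per-cell `count` is an assumption added here because a parent mean cannot be derived from child means alone. Empty cells are not handled.

```python
from dataclasses import dataclass

@dataclass
class CellStats:
    count: int
    mean: float
    minimum: float
    maximum: float

def leaf_stats(values):
    """Statistics of a lowest-level cell, computed once from the raw data."""
    return CellStats(len(values), sum(values) / len(values), min(values), max(values))

def merge(children):
    """Parameters of a higher-level cell, computed only from its child cells."""
    n = sum(c.count for c in children)
    mean = sum(c.count * c.mean for c in children) / n
    return CellStats(n, mean,
                     min(c.minimum for c in children),
                     max(c.maximum for c in children))

# Four lower-level cells forming one higher-level cell (hypothetical data).
children = [leaf_stats(v) for v in ([1, 2, 3], [4], [10, 20], [5, 5])]
parent = merge(children)
```

Because `merge` never touches the raw values, the hierarchy can be built in a single pass over the data and then reused for any query.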
WaveCluster
A multi-resolution clustering approach which applies
wavelet transform to the feature space
A wavelet transform is a signal processing technique that decomposes a signal into different frequency sub-bands.
Both grid-based and density-based
Input parameters:
Number of grid cells for each dimension
The wavelet, and the number of applications of the wavelet transform
Using wavelet transform to find clusters
Summarises the data by imposing a multidimensional grid structure onto the data space
These multidimensional spatial data objects are represented in an n-dimensional feature space
Apply the wavelet transform on the feature space to find the dense regions in the feature space
Apply the wavelet transform multiple times, which results in clusters at different scales from fine to coarse
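The quantize-then-transform idea can be illustrated in one dimension. This is a strong simplification under stated assumptions: an unnormalized Haar wavelet is used, the signal length is assumed even, and the names `quantize`, `haar_step` and the density threshold of 1.0 are hypothetical. The actual method works on multidimensional grids and then looks for connected components.

```python
def quantize(values, lo, hi, n_cells):
    """Count how many values fall into each of n_cells equal-width cells on [lo, hi)."""
    counts = [0] * n_cells
    width = (hi - lo) / n_cells
    for v in values:
        k = min(int((v - lo) / width), n_cells - 1)
        counts[k] += 1
    return counts

def haar_step(signal):
    """One level of the unnormalized Haar transform: pairwise averages and differences."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail

# Two groups of values and one stray value at 3.5 (hypothetical data).
values = [0.1, 0.2, 0.3, 0.4, 0.5, 6.1, 6.3, 6.6, 6.9, 3.5]
counts = quantize(values, lo=0.0, hi=8.0, n_cells=8)
approx, detail = haar_step(counts)           # finer scale
coarse, _ = haar_step(approx)                # applying it again gives a coarser scale
dense = [i for i, a in enumerate(approx) if a >= 1.0]
```

The averaged band keeps the two crowded regions and pushes the lone value below the threshold, which is the sense in which the transform suppresses outliers.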
Quantization and transformation in WaveCluster (figures not reproduced)
Reasons for using Wavelet transformation in
clustering
Unsupervised clustering
It uses filters to emphasize regions where points cluster, while simultaneously suppressing weaker information at their boundaries
Effective removal of outliers
Multi-resolution
Cost efficiency
Major features:
Complexity O(N)
Detects arbitrarily shaped clusters at different scales
Not sensitive to noise, not sensitive to input order
Only applicable to low-dimensional data
CLIQUE (Clustering In QUEst)
Automatically identifies subspaces of a high-dimensional data space that allow better clustering than the original space
CLIQUE can be considered both density-based and grid-based
It partitions each dimension into the same number of equal-length intervals
It partitions an m-dimensional data space into non-overlapping rectangular units
A unit is dense if the fraction of total data points contained in the unit exceeds the input model parameter (the density threshold)
A cluster is a maximal set of connected dense units within a subspace
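The last two definitions can be sketched for a single, fixed subspace. The function names, the threshold name `tau`, and the rule that two units are connected when they share a face are assumptions for illustration. The search over candidate subspaces, which is the core of CLIQUE, is not shown.

```python
from collections import Counter

def dense_units(points, lo, hi, n_intervals, tau):
    """Units of an equal-width grid holding more than a fraction tau of all points."""
    width = (hi - lo) / n_intervals

    def unit(p):
        return tuple(min(int((c - lo) / width), n_intervals - 1) for c in p)

    counts = Counter(unit(p) for p in points)
    return {u for u, n in counts.items() if n / len(points) > tau}

def clusters(units):
    """Group dense units that share a face into maximal connected sets."""
    remaining, result = set(units), []
    while remaining:
        stack = [remaining.pop()]
        component = set(stack)
        while stack:
            u = stack.pop()
            for k in range(len(u)):
                for step in (-1, 1):
                    v = u[:k] + (u[k] + step,) + u[k + 1:]
                    if v in remaining:
                        remaining.remove(v)
                        component.add(v)
                        stack.append(v)
        result.append(component)
    return result

# Ten 2-D points on a 4 x 4 grid over [0, 4) x [0, 4) (hypothetical data).
points = [(0.2, 0.3), (0.5, 0.8), (0.7, 0.1),
          (1.2, 0.4), (1.6, 0.6), (1.9, 0.9),
          (3.1, 3.2), (3.5, 3.5), (3.8, 3.9),
          (2.5, 1.5)]
dense = dense_units(points, lo=0.0, hi=4.0, n_intervals=4, tau=0.2)
found = clusters(dense)
```

The unit holding the single point at (2.5, 1.5) falls below the threshold, the two neighboring dense units merge into one cluster, and the unit in the opposite corner forms a second cluster.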
