
Computational Intelligence: Methods and Applications

Lecture 20

Clustering to form semantic concepts – Linguistic Labels

SCE, NTU, Singapore


Overview
• Interpretability of fuzzy representation
• What is clustering for semantic labels?
• Histogram analysis
• LVQ (Learning Vector Quantization)
• FCM (Fuzzy C-Means)
• FKP (Fuzzy Kohonen Partitioning)
• PFKP (Pseudo Fuzzy Kohonen Partitioning)
Semantic Label Clustering
• Semantic properties of a linguistic variable
  – A linguistic variable is a quintuple (L, T(L), U, G, M),
    where L is the name of the variable; T(L) is the linguistic term set of L;
    U is a universe of discourse; G is a syntactic rule which generates T(L);
    and M is a semantic rule that associates each term in T(L) with its meaning.

  – Each linguistic term set is characterized by a fuzzy set,
    which is described using a membership function.

[Figure: two example membership functions on x ∈ [0, 10] — a trapezoidal MF μT(x) and a Gaussian MF μG(x)]
Example of a Linguistic Variable
• Linguistic variable x named L = “performance”
• Five linguistic terms, where T(L) = {“very small”, “small”, “medium”, “large” and “very large”}
• Semantic assignment M is shown in the figure – normal and convex
• Semantic ordering such that “very small” ≺ “small” ≺ “medium” ≺ “large” ≺ “very large”
• Universe of discourse U = [0, 100] of the base variable x

[Figure: membership functions μT(x) of the five terms “very small”, “small”, “medium”, “large”, “very large” over x (performance) ∈ [0, 100]]
Criteria of Interpretability
• Coverage: the MFs cover the entire universe of discourse
• Normalised: $\exists\, x \in X_i$ such that $\mu_{X_i}(x) = 1$
• Convex: $x \le y \le z \;\Rightarrow\; \mu_{X_i}(y) \ge \min\!\left( \mu_{X_i}(x), \mu_{X_i}(z) \right)$
• Ordered: $X_1 \prec X_2 \prec \dots \prec X_j \prec \dots \prec X_n$,
  where $X_1 \prec X_2$ denotes that $X_1$ precedes $X_2$
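These criteria are easy to check numerically on a sampled universe of discourse. Below is a minimal sketch (assuming NumPy; the triangular labels are hypothetical, not from the slides) that tests normality, convexity and coverage:

```python
import numpy as np

x = np.linspace(0, 10, 201)              # sampled universe of discourse U = [0, 10]

def is_normal(mu):
    """Normalised: some x attains membership 1."""
    return bool(np.isclose(mu.max(), 1.0))

def is_convex(mu):
    """Fuzzy convexity on a grid: mu rises to its peak and then falls,
    so mu(y) >= min(mu(x), mu(z)) whenever x <= y <= z."""
    p = int(mu.argmax())
    return bool(np.all(np.diff(mu[:p + 1]) >= 0) and np.all(np.diff(mu[p:]) <= 0))

def covers(mus, eps=1e-9):
    """Coverage: at every sampled x at least one label is active."""
    return bool(np.all(mus.max(axis=0) > eps))

# Hypothetical triangular labels "small", "medium", "large" (not from the slides)
small  = np.interp(x, [0, 5], [1, 0])        # left shoulder
medium = np.interp(x, [0, 5, 10], [0, 1, 0])
large  = np.interp(x, [5, 10], [0, 1])       # right shoulder
mus = np.stack([small, medium, large])
print([is_normal(m) and is_convex(m) for m in mus], covers(mus))  # [True, True, True] True
```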
Clustering
• Clustering is a method that organizes patterns into clusters such that patterns
  within a cluster are more similar to each other than to patterns in other clusters.
• When the crisp partition in classical clustering analysis is replaced with a fuzzy
  partition or a fuzzy pseudo-partition, it is referred to as fuzzy clustering.
• Examples: LVQ (Kohonen), FCM (Bezdek), MLVQ (Ang and Quek), DIC (Tung and Quek), etc.
Example: Particle Classification
• Particles on an air filter

[Figure: sample images of particles from the three classes P1, P2 and P3]
Histogram Analysis

[Figure: class-conditional histograms (Number vs. Area) for particle classes P1, P2 and P3]
Histogram Analysis

[Figure: scatter plot of Perimeter vs. Area for classes P1, P2 and P3, alongside the Number vs. Area histogram]
Sample Data Sets
• Sample data is divided into two disjoint sets:
  – Design set (or training set) is used for designing a classifier
  – Test set (or cross-validation set) is used for evaluating the obtained classifier
• Sample data is usually represented by an m by (n+1) matrix, where m is the number
  of sample data entries and n is the number of features.
Sample data set

  Area   Perimeter   Class
   3        6         P1
   5        7         P1
   4        4         P1
   7        6         P1
  12       11         P2
  15       10         P2
  14       12         P2
  17       13         P2
  14       19         P3
  13       20         P3
  15       22         P3
  12       18         P3
   …        …          …

Design set: odd-indexed entries. Test set: even-indexed entries.
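The odd/even split can be expressed in a couple of lines. A minimal sketch (assuming NumPy; the class labels are encoded as integers for convenience):

```python
import numpy as np

# m-by-(n+1) sample matrix: columns are Area, Perimeter, Class (P1=1, P2=2, P3=3)
data = np.array([[ 3,  6, 1], [ 5,  7, 1], [ 4,  4, 1], [ 7,  6, 1],
                 [12, 11, 2], [15, 10, 2], [14, 12, 2], [17, 13, 2],
                 [14, 19, 3], [13, 20, 3], [15, 22, 3], [12, 18, 3]])

design = data[0::2]                      # 1st, 3rd, 5th, ... rows -> design (training) set
test   = data[1::2]                      # 2nd, 4th, 6th, ... rows -> test set
X_design, y_design = design[:, :-1], design[:, -1]
```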
Flowchart for Histogram Analysis

General flowchart (particle example in parentheses):

  Feature extraction    (from image to features)
          ↓
  Data reduction        (none)
          ↓
  Probability estimate  (histogram analysis)
Histogram Analysis
• Number of intervals vs. number of data points

[Figure: histograms of 50 samples from a Gaussian distribution using 3 bins, 10 bins and 25 bins]
Histogram Analysis
• Properties:
  – A nonparametric technique that does not require explicit use of density functions
  – Dilemma between the number of intervals and the number of data points
  – Rule of thumb: the number of intervals equals the square root of the number of points (sketched below)
  – Intervals may be unequally spaced
  – To convert to a density function, the total area must be unity
  – Can be used with any number of features, but is subject to the curse of dimensionality
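A minimal sketch of the rule of thumb and the unit-area normalisation (assuming NumPy; the Gaussian sample mirrors the figure on the previous slide):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=1.0, size=50)   # 50 samples from a Gaussian

bins = int(round(np.sqrt(len(x))))            # rule of thumb: sqrt(no. of points) ~ 7
counts, edges = np.histogram(x, bins=bins)

# Convert counts to a density estimate: total area under the bars must be unity
widths = np.diff(edges)
density = counts / (counts.sum() * widths)    # same as np.histogram(..., density=True)
assert np.isclose((density * widths).sum(), 1.0)
```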
Kernel and Window Estimators

[Figure: kernel density estimates of the same data with widths σ = 0.1 and σ = 0.3]

• Properties:
  – Also known as the Parzen estimator
  – Its computation is similar to convolution
  – Can be used in multi-feature estimation
  – Width is found by trial and error
  – Normal optimal smoothing strategy, where σ denotes the standard deviation
    of the distribution and n the number of samples:

$$h_{\mathrm{opt}} = \left( \frac{4}{3n} \right)^{\frac{1}{5}} \sigma$$

A. W. Bowman and A. Azzalini. Applied Smoothing Techniques for Data Analysis:
The Kernel Approach with S-Plus Illustrations. New York: Oxford University Press, 1997.
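A minimal sketch of a one-dimensional Parzen estimator with a Gaussian kernel, using the normal optimal smoothing h_opt above (assuming NumPy; the data are illustrative):

```python
import numpy as np

def parzen_kde(samples, grid):
    """Gaussian-kernel Parzen estimate of the density evaluated on `grid`."""
    n = len(samples)
    sigma = samples.std(ddof=1)
    h = (4.0 / (3.0 * n)) ** 0.2 * sigma          # normal optimal smoothing h_opt
    # Place a normalised Gaussian "bump" of width h on every sample and average
    u = (grid[:, None] - samples[None, :]) / h
    kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return kernels.mean(axis=1) / h               # density; integrates to ~1

rng = np.random.default_rng(0)
data = rng.normal(5.0, 1.0, size=100)
xs = np.linspace(0.0, 10.0, 200)
density = parzen_kde(data, xs)
```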
Learning Vector Quantization
• LVQ networks are unsupervised neural networks that determine the weights of
  cluster centers in an iterative and sequential manner.
• Each output neuron j has a weight vector vj that is adjusted during learning.
• The winner, whose weight has the minimum distance from the input, updates its
  weights and those of its neighbors.
• This is repeated until the weights are forced to stabilize through the
  specification of a learning rate.

[Figure: network with input layer x1..xn, output layer y1..yc, weight vectors v1..vc (weights wji); the winning neuron is highlighted]
LVQ – Cont’d

$$\left\| x - v_i^{(T)} \right\| = \min_{j=1..c} \left\| x - v_j^{(T)} \right\|$$

$$v_j^{(T+1)} = \begin{cases} v_j^{(T)} + \alpha^{(T)} \left( x - v_j^{(T)} \right) & \text{if } j = i \\[4pt] v_j^{(T)} & \text{if } j \neq i \end{cases}$$

||x − y|| is the Euclidean distance, c is the number of clusters, x is the input
vector, v_i is the ith cluster centre and α is the learning constant.

Pseudo code: (1) Define the number of clusters c and a small terminating condition ε.
(2) Initialise the weights. (3) Determine the winning neuron based on distance.
(4) Update the winner: $v_i^{(T)} = v_i^{(T-1)} + \alpha_i^{(T)} \left( x_k - v_i^{(T-1)} \right)$ for $i \le N^{(T)}$.
(5) Check the terminating condition; otherwise repeat with a new vector.
(A runnable sketch follows below.)
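A minimal sketch of the sequential winner-take-all update above (assuming NumPy, a fixed learning rate α and random initialisation; the neighbourhood update for i ≤ N^(T) is omitted for brevity):

```python
import numpy as np

def lvq(data, c, alpha=0.05, eps=1e-4, max_epochs=100, seed=0):
    """Sequential vector quantization on data of shape (n_samples, n_features):
    move the winning cluster centre toward each input until the centres stabilize."""
    rng = np.random.default_rng(seed)
    v = data[rng.choice(len(data), size=c, replace=False)].astype(float)
    for _ in range(max_epochs):
        shift = 0.0
        for x in data:
            i = np.argmin(np.linalg.norm(x - v, axis=1))  # winning neuron
            step = alpha * (x - v[i])                     # v_i <- v_i + alpha (x - v_i)
            v[i] += step
            shift = max(shift, np.linalg.norm(step))
        if shift < eps:                                   # weights have stabilized
            break
    return v
```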
Fuzzy C-Means (FCM – Bezdek)
• A fuzzy pseudo-partition of a finite data set X is defined by:

$$\sum_{i=1}^{c} \mu_i(x_k) = 1 \qquad \text{for all } k = 1..n$$

$$0 < \sum_{k=1}^{n} \mu_i(x_k) < n \qquad \text{for all } i = 1..c$$

• An objective function for fuzzy clustering is (m defines the degree of fuzziness):

$$J_m(P) = \sum_{k=1}^{n} \sum_{i=1}^{c} \left( \mu_i(x_k) \right)^m \left\| x_k - v_i \right\|^2$$
FCM – Cont’d
• Pseudo Code:
  – Define the number of clusters (c), the degree of fuzziness (m) and the terminating condition (ε)
  – Initialise T and the pseudo-partition P^(0)
  – Compute the cluster centres v_1, v_2, …, v_c:

$$v_i^{(T)} = \frac{\displaystyle\sum_{k=1}^{n} \left( \mu_i(x_k) \right)^m x_k}{\displaystyle\sum_{k=1}^{n} \left( \mu_i(x_k) \right)^m} \qquad \text{for } i = 1..c$$
FCM – Cont’d 2
• Pseudo Code:
  – Update the new pseudo-partition:

$$\mu_i^{(T+1)}(x_k) = \left[ \sum_{j=1}^{c} \left( \frac{\left\| x_k - v_i^{(T)} \right\|^2}{\left\| x_k - v_j^{(T)} \right\|^2} \right)^{\frac{1}{m-1}} \right]^{-1} \qquad \text{for } i = 1..c,\; k = 1..n$$

  – Compare the distance between the partitions:

$$E = \left| P^{(T+1)} - P^{(T)} \right| = \sum_{i=1}^{c} \sum_{k=1}^{n} \left| \mu_i^{(T+1)}(x_k) - \mu_i^{(T)}(x_k) \right|$$

  – Terminate if E < ε (a runnable sketch follows below)
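A minimal batch implementation of the two update equations above (assuming NumPy and random initialisation of the pseudo-partition):

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-4, max_iter=200, seed=0):
    """Fuzzy C-Means on X of shape (n, d): alternate the centre and
    partition updates until the partition change E falls below eps."""
    rng = np.random.default_rng(seed)
    u = rng.random((c, len(X)))
    u /= u.sum(axis=0)                                  # columns sum to 1
    for _ in range(max_iter):
        w = u ** m
        v = (w @ X) / w.sum(axis=1, keepdims=True)      # cluster centres
        d2 = ((X[None, :, :] - v[:, None, :]) ** 2).sum(axis=2)  # squared distances
        d2 = np.fmax(d2, 1e-12)                         # guard against division by zero
        u_new = 1.0 / (d2 ** (1.0 / (m - 1)))           # proportional to d^(-2/(m-1))
        u_new /= u_new.sum(axis=0)                      # normalise: partition update
        E = np.abs(u_new - u).sum()                     # distance between partitions
        u = u_new
        if E < eps:
            break
    return v, u
```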
Examples: Iris data

[Figure: scatter plots of the Iris data — Sepal width vs. Sepal length (cm) and Petal width vs. Petal length (cm) for Iris setosa, Iris versicolor and Iris virginica]

[Figure: error trends — total error and number of mistakes vs. number of iterations; left: LVQ with α=0.005, δ=0.8, ε=0.0001; right: FCM with m=1.5, ε=0.0001]

Properties of FCM
• FCM is a non-sequential, batch-learning optimization algorithm.
• Computationally and memory intensive.
• Unable to perform on-line training.
• Performance depends on a good choice of the weighting exponent m.
Results of FCM

[Figure: IRIS data set — FCM with m=1.5, ε=0.0001. Membership degree μ(x) vs. Sepal length, Sepal width, Petal length and Petal width (cm) for setosa, versicolor and virginica; trapezoidal-like membership functions]

Results of FCM – Cont’d

[Figure: IRIS data set — FCM with m=2.0, ε=0.0001. Membership degree μ(x) vs. Sepal length, Sepal width, Petal length and Petal width (cm) for setosa, versicolor and virginica; Gaussian-like membership functions]

Results of MLVQ (Gaussian)
• Width of Gaussian MF_i:

$$w_i = \frac{v_i - v_{\text{closest}}}{\sigma}$$

[Figure: MLVQ with λ=0.02, σ=1.5, ε=0.0001. Membership degree μ(x) vs. Sepal length, Sepal width, Petal length and Petal width (cm) for setosa, versicolor and virginica]

Results of MLVQ (Gaussian) – Cont’d
• Width of Gaussian MF_i: $w_i = \dfrac{v_i - v_{\text{closest}}}{\sigma}$

[Figure: MLVQ with λ=0.02, σ=3.0, ε=0.0001. Membership degree μ(x) vs. Sepal length, Sepal width, Petal length and Petal width (cm) for setosa, versicolor and virginica]

Trapezoidal MF
• A trapezoidal membership function μ(x) can be described by a fuzzy interval
  formed by four parameters (α, β, γ, δ) and a centroid v.
• This fuzzy interval is also known as a trapezoidal fuzzy number.

[Figure: trapezoidal μ(x) rising from 0 at α to 1 at β, constant to γ, and falling back to 0 at δ; the centroid v lies in [β, γ]]
Trapezoidal MF – Cont’d
• The subinterval [β, γ] where μ(x) = 1 is called the kernel of the fuzzy interval,
  and the subinterval [α, δ] is called the support.
• The MLVQ algorithm can be used to derive the centroid v, but it cannot derive
  the parameters (α, β, γ, δ) of the trapezoidal-shaped membership function:

$$\mu(x) = \begin{cases} 0 & \text{if } x < \alpha \text{ or } x > \delta \\[4pt] \dfrac{x - \alpha}{\beta - \alpha} & \text{if } \alpha \le x \le \beta \\[4pt] 1 & \text{if } \beta \le x \le \gamma \\[4pt] \dfrac{\delta - x}{\delta - \gamma} & \text{if } \gamma \le x \le \delta \end{cases}$$
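A direct transcription of the piecewise definition (assuming NumPy, and that α < β and γ < δ):

```python
import numpy as np

def trapezoid_mf(x, alpha, beta, gamma, delta):
    """Trapezoidal MF: support [alpha, delta], kernel [beta, gamma]."""
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)                           # 0 outside the support
    rise = (alpha <= x) & (x < beta)
    y[rise] = (x[rise] - alpha) / (beta - alpha)   # rising edge
    y[(beta <= x) & (x <= gamma)] = 1.0            # kernel
    fall = (gamma < x) & (x <= delta)
    y[fall] = (delta - x[fall]) / (delta - gamma)  # falling edge
    return y

mu = trapezoid_mf(np.linspace(0, 10, 101), 2, 4, 6, 8)
```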
The Fuzzy Kohonen Partition Algorithm – Supervised
• Define:
  – c as the number of classes,
  – λ ≤ 1/Ω as the learning constant, where Ω = number of data vectors in a cluster,
  – η as the learning width and a small positive number ε as a stopping criterion;
    n = total number of data vectors
• Initialise the weights:

$$v_i^{(0)} = \min_k(x_k) + \frac{2i-1}{2c} \left( \max_k(x_k) - \min_k(x_k) \right) \qquad \text{for } i = 1..c,\; k = 1..n$$

• Determine the ith cluster that the data $x_k$ belongs to, and update the weights $v_i$ of the ith cluster.
The Fuzzy Kohonen Partition Algorithm – Supervised (cont’d)
• Compute the error to the cluster and the difference in error between iterations:

$$e^{(T+1)} = \sum_{k=1}^{n} \left\| x_k - v_i^{(T+1)} \right\|$$

$$de^{(T+1)} = e^{(T+1)} - e^{(T)}$$

• Repeat while ¬(de^(T+1) ≤ ε)
  – End of determining the centroids

The Fuzzy Kohonen Partition Algorithm – Supervised (cont’d)
• Initialize $\alpha_i = \beta_i = \gamma_i = \delta_i = \varphi_i = v_i^{(T+1)}$ for $i = 1..c$,
  where $\varphi_i$ is the pseudo weight of $v_i$.

[Figure: the point-valued TrFNs collapsed onto their centroids, i = 1, 2, 3]

• Determine the ith cluster that the data $x_k$ belongs to, and update the pseudo weight $\varphi_i$ of the ith cluster:

$$\varphi_i = \varphi_i + \eta \left( x_k - \varphi_i \right)$$
The Fuzzy Kohonen Partition Algorithm – Supervised (cont’d)
• Update the four points of the Trapezoidal Fuzzy Number (TrFN):

$$\alpha_i = \min(\alpha_i, x_k) \qquad \beta_i = \min(\beta_i, \varphi_i) \qquad \gamma_i = \max(\gamma_i, \varphi_i) \qquad \delta_i = \max(\delta_i, x_k)$$
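A compact sketch of this TrFN-fitting pass for a single feature (assuming NumPy, that the centroids v have already been learned, and that each sample is assigned to its nearest centroid — a simplification of the supervised assignment in the slides):

```python
import numpy as np

def fkp_trfn(x, v, eta=0.5):
    """Fit one trapezoidal fuzzy number (alpha, beta, gamma, delta) per cluster."""
    alpha, beta, gamma, delta = (v.astype(float).copy() for _ in range(4))
    phi = v.astype(float).copy()              # pseudo weights start at the centroids
    for xk in x:
        i = int(np.argmin(np.abs(xk - v)))    # cluster that xk belongs to (nearest centroid)
        phi[i] += eta * (xk - phi[i])         # phi_i <- phi_i + eta (xk - phi_i)
        alpha[i] = min(alpha[i], xk)          # support stretches toward the raw data
        delta[i] = max(delta[i], xk)
        beta[i]  = min(beta[i], phi[i])       # kernel stretches toward the pseudo weight
        gamma[i] = max(gamma[i], phi[i])
    return np.stack([alpha, beta, gamma, delta], axis=1)

# e.g. trfns = fkp_trfn(sepal_length, centroids) -> one (alpha, beta, gamma, delta) per label
```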
The Fuzzy Kohonen Partition Algorithm – Results

[Figure: FKP with λ=0.02, η=0, ε=0.0005. Membership degree μ(x) vs. Sepal length, Sepal width, Petal length and Petal width (cm) for setosa, versicolor and virginica]

The Fuzzy Kohonen Partition Algorithm – Results (cont’d)

[Figure: FKP with λ=0.02, η=0.5, ε=0.0005. Membership degree μ(x) vs. Sepal length, Sepal width, Petal length and Petal width (cm) for setosa, versicolor and virginica]

Pseudo Fuzzy Kohonen Partition Algorithm
• PFKP operates similarly to FKP except in the determination of the trapezoidal MF parameters.
• Initialize $\varphi_i = \alpha_i = \beta_i = \gamma_i = \delta_i = v_i^{(T+1)}$ for $i = 1..c$,
  where $\varphi_i$ is the pseudo weight of $v_i$.
• Determine the TrFN parameters:
  – Determine the winner centroid: $\left\| x_k - \varphi_i \right\| = \min_j \left\| x_k - \varphi_j \right\|$ for $j = 1..c$
  – Update the pseudo weight of the winner: $\varphi_i = \varphi_i + \eta \left( x_k - \varphi_i \right)$
  – Update the Trapezoidal Fuzzy Number:

$$\alpha_i = \begin{cases} \min(\alpha_i, x_k) & \text{for } i = 1 \\ \gamma_{i-1} & \text{for } i > 1 \end{cases} \qquad \beta_i = \min(\beta_i, \varphi_i) \qquad \gamma_i = \max(\gamma_i, \varphi_i) \qquad \delta_i = \begin{cases} \max(\delta_i, x_k) & \text{for } i = c \\ \beta_{i+1} & \text{for } i < c \end{cases}$$
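Compared with the FKP sketch above, the only change is that adjacent labels share boundaries so the result forms a pseudo-partition. A sketch under the same assumptions (the α_i = γ_{i−1} and δ_i = β_{i+1} ties are applied once at the end as a simplification of the per-sample update):

```python
import numpy as np

def pfkp_trfn(x, v, eta=0.5):
    """PFKP variant: adjacent TrFNs share boundaries, forming a pseudo-partition."""
    c = len(v)                                # centroids assumed sorted in ascending order
    alpha, beta, gamma, delta = (v.astype(float).copy() for _ in range(4))
    phi = v.astype(float).copy()
    for xk in x:
        i = int(np.argmin(np.abs(xk - phi)))  # winner centroid by distance to pseudo weight
        phi[i] += eta * (xk - phi[i])
        beta[i]  = min(beta[i], phi[i])
        gamma[i] = max(gamma[i], phi[i])
        if i == 0:
            alpha[0] = min(alpha[0], xk)      # only the first label extends left
        if i == c - 1:
            delta[-1] = max(delta[-1], xk)    # only the last label extends right
    alpha[1:]  = gamma[:-1]                   # tie inner boundaries: alpha_i = gamma_{i-1}
    delta[:-1] = beta[1:]                     #                       delta_i = beta_{i+1}
    return np.stack([alpha, beta, gamma, delta], axis=1)
```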
PFKP – Results

[Figure: PFKP with λ=0.02, η=0, ε=0.0005. Membership degree μ(x) vs. Sepal length, Sepal width, Petal length and Petal width (cm) for setosa, versicolor and virginica]

PFKP – Results (cont’d)

[Figure: PFKP with λ=0.02, η=0.01, ε=0.0005. Membership degree μ(x) vs. Sepal length, Sepal width, Petal length and Petal width (cm) for setosa, versicolor and virginica]

Concluding Remarks
• The Fuzzy Kohonen Partition (FKP) and the Pseudo Fuzzy Kohonen Partition (PFKP)
  were proposed to directly derive appropriate membership functions from training data.
• Both algorithms directly derive trapezoidal membership functions that are convex
  and normal from training data, while the latter derives membership functions that
  form a pseudo-partition of the input space.
