
Computational Intelligence: Methods and Applications

Lecture 20

Clustering to form semantic concepts – Linguistic Labels

SCE, NTU, Singapore


Overview
• Interpretability of fuzzy representation
• What is clustering for semantic labels?
• Histogram analysis
• LVQ (Learning Vector Quantization)
• FCM (Fuzzy C-Means)
• FKP (Fuzzy Kohonen Partitioning)
• PFKP (Pseudo Fuzzy Kohonen Partitioning)
Semantic Label Clustering
• Semantic properties of a linguistic variable
  – A linguistic variable is a quintuple (L, T(L), U, G, M),
    where L is the name of the variable; T(L) is the linguistic term set of L;
    U is a universe of discourse; G is a syntactic rule which generates T(L);
    and M is a semantic rule that associates each term in T(L) with its meaning.

  – Each linguistic term set is characterized by a fuzzy set,
    which is described using a membership function.

[Figure: two example membership functions on x ∈ [0, 10] — a trapezoidal MF μT(x) and a Gaussian MF μG(x)]
Example of a Linguistic Variable
• Linguistic variable x named L = “performance”
• Five linguistic terms, where T(L) = {“very small”, “small”, “medium”, “large” and “very large”}
• Semantic assignment M is shown in the figure – normal and convex
• Semantic ordering such that “very small” ≺ “small” ≺ “medium” ≺ “large” ≺ “very large”
• Universe of discourse U = [0, 100] of the base variable x

[Figure: membership functions μT(x) of the five terms “very small”, “small”, “medium”, “large”, “very large” over x (performance) ∈ [0, 100]]
Criteria of Interpretability
• Coverage: the MFs cover the entire universe of discourse
• Normalised: $\exists\, x \in X_i$ such that $\mu_{X_i}(x) = 1$
• Convex: $x \le y \le z \;\Rightarrow\; \mu_{X_i}(y) \ge \min\!\left( \mu_{X_i}(x), \mu_{X_i}(z) \right)$
• Ordered: $X_1 \prec X_2 \prec \dots \prec X_j \prec \dots \prec X_n$,
  where $X_1 \prec X_2$ denotes that $X_1$ precedes $X_2$
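These criteria are easy to check numerically on a sampled universe of discourse. Below is a minimal sketch (assuming NumPy; the triangular labels are hypothetical, not from the slides) that tests normality, convexity and coverage:

```python
import numpy as np

x = np.linspace(0, 10, 201)              # sampled universe of discourse U = [0, 10]

def is_normal(mu):
    """Normalised: some x attains membership 1."""
    return bool(np.isclose(mu.max(), 1.0))

def is_convex(mu):
    """Fuzzy convexity on a grid: mu rises to its peak and then falls,
    so mu(y) >= min(mu(x), mu(z)) whenever x <= y <= z."""
    p = int(mu.argmax())
    return bool(np.all(np.diff(mu[:p + 1]) >= 0) and np.all(np.diff(mu[p:]) <= 0))

def covers(mus, eps=1e-9):
    """Coverage: at every sampled x at least one label is active."""
    return bool(np.all(mus.max(axis=0) > eps))

# Hypothetical triangular labels "small", "medium", "large" (not from the slides)
small  = np.interp(x, [0, 5], [1, 0])        # left shoulder
medium = np.interp(x, [0, 5, 10], [0, 1, 0])
large  = np.interp(x, [5, 10], [0, 1])       # right shoulder
mus = np.stack([small, medium, large])
print([is_normal(m) and is_convex(m) for m in mus], covers(mus))  # [True, True, True] True
```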
Clustering
• Clustering is a method that organizes patterns into clusters such that patterns
  within a cluster are more similar to each other than to patterns in other clusters.
• When the crisp partition in classical clustering analysis is replaced with a fuzzy
  partition or a fuzzy pseudo-partition, it is referred to as fuzzy clustering.
• Examples: LVQ (Kohonen), FCM (Bezdek), MLVQ (Ang and Quek), DIC (Tung and Quek), etc.
Example: Particle Classification
• Particles on an air filter

[Figure: sample images of particles from the three classes P1, P2 and P3]
Histogram Analysis

[Figure: class-conditional histograms (Number vs. Area) for particle classes P1, P2 and P3]
Histogram Analysis

[Figure: scatter plot of Perimeter vs. Area for classes P1, P2 and P3, alongside the Number vs. Area histogram]
Sample Data Sets
• Sample data is divided into two disjoint sets:
  – Design set (or training set) is used for designing a classifier
  – Test set (or cross-validation set) is used for evaluating the obtained classifier
• Sample data is usually represented by an m by (n+1) matrix, where m is the number
  of sample data entries and n is the number of features.
Sample data set

  Area   Perimeter   Class
   3        6         P1
   5        7         P1
   4        4         P1
   7        6         P1
  12       11         P2
  15       10         P2
  14       12         P2
  17       13         P2
  14       19         P3
  13       20         P3
  15       22         P3
  12       18         P3
   …        …          …

Design set: odd-indexed entries. Test set: even-indexed entries.
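The odd/even split can be expressed in a couple of lines. A minimal sketch (assuming NumPy; the class labels are encoded as integers for convenience):

```python
import numpy as np

# m-by-(n+1) sample matrix: columns are Area, Perimeter, Class (P1=1, P2=2, P3=3)
data = np.array([[ 3,  6, 1], [ 5,  7, 1], [ 4,  4, 1], [ 7,  6, 1],
                 [12, 11, 2], [15, 10, 2], [14, 12, 2], [17, 13, 2],
                 [14, 19, 3], [13, 20, 3], [15, 22, 3], [12, 18, 3]])

design = data[0::2]                      # 1st, 3rd, 5th, ... rows -> design (training) set
test   = data[1::2]                      # 2nd, 4th, 6th, ... rows -> test set
X_design, y_design = design[:, :-1], design[:, -1]
```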
Flowchart for Histogram Analysis

General flowchart (particle example in parentheses):

  Feature extraction    (from image to features)
          ↓
  Data reduction        (none)
          ↓
  Probability estimate  (histogram analysis)
Histogram Analysis
• Number of intervals vs. number of data points

[Figure: histograms of 50 samples from a Gaussian distribution using 3 bins, 10 bins and 25 bins]
Histogram Analysis
• Properties:
  – A nonparametric technique that does not require explicit use of density functions
  – Dilemma between the number of intervals and the number of data points
  – Rule of thumb: the number of intervals equals the square root of the number of points (sketched below)
  – Intervals may be unequally spaced
  – To convert to a density function, the total area must be unity
  – Can be used with any number of features, but is subject to the curse of dimensionality
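A minimal sketch of the rule of thumb and the unit-area normalisation (assuming NumPy; the Gaussian sample mirrors the figure on the previous slide):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=1.0, size=50)   # 50 samples from a Gaussian

bins = int(round(np.sqrt(len(x))))            # rule of thumb: sqrt(no. of points) ~ 7
counts, edges = np.histogram(x, bins=bins)

# Convert counts to a density estimate: total area under the bars must be unity
widths = np.diff(edges)
density = counts / (counts.sum() * widths)    # same as np.histogram(..., density=True)
assert np.isclose((density * widths).sum(), 1.0)
```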
Kernel and Window Estimators

[Figure: kernel density estimates of the same data with widths σ = 0.1 and σ = 0.3]

• Properties:
  – Also known as the Parzen estimator
  – Its computation is similar to convolution
  – Can be used in multi-feature estimation
  – Width is found by trial and error
  – Normal optimal smoothing strategy, where σ denotes the standard deviation
    of the distribution and n the number of samples:

$$h_{\mathrm{opt}} = \left( \frac{4}{3n} \right)^{\frac{1}{5}} \sigma$$

A. W. Bowman and A. Azzalini. Applied Smoothing Techniques for Data Analysis:
The Kernel Approach with S-Plus Illustrations. New York: Oxford University Press, 1997.
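A minimal sketch of a one-dimensional Parzen estimator with a Gaussian kernel, using the normal optimal smoothing h_opt above (assuming NumPy; the data are illustrative):

```python
import numpy as np

def parzen_kde(samples, grid):
    """Gaussian-kernel Parzen estimate of the density evaluated on `grid`."""
    n = len(samples)
    sigma = samples.std(ddof=1)
    h = (4.0 / (3.0 * n)) ** 0.2 * sigma          # normal optimal smoothing h_opt
    # Place a normalised Gaussian "bump" of width h on every sample and average
    u = (grid[:, None] - samples[None, :]) / h
    kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return kernels.mean(axis=1) / h               # density; integrates to ~1

rng = np.random.default_rng(0)
data = rng.normal(5.0, 1.0, size=100)
xs = np.linspace(0.0, 10.0, 200)
density = parzen_kde(data, xs)
```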
Learning Vector Quantization
• LVQ networks are unsupervised neural networks that determine the weights of
  cluster centers in an iterative and sequential manner.
• Each output neuron j has a weight vector vj that is adjusted during learning.
• The winner, whose weight has the minimum distance from the input, updates its
  weights and those of its neighbors.
• This is repeated until the weights are forced to stabilize through the
  specification of a learning rate.

[Figure: network with input layer x1..xn, output layer y1..yc, weight vectors v1..vc (weights wji); the winning neuron is highlighted]
LVQ – Cont’d

$$\left\| x - v_i^{(T)} \right\| = \min_{j=1..c} \left\| x - v_j^{(T)} \right\|$$

$$v_j^{(T+1)} = \begin{cases} v_j^{(T)} + \alpha^{(T)} \left( x - v_j^{(T)} \right) & \text{if } j = i \\[4pt] v_j^{(T)} & \text{if } j \neq i \end{cases}$$

||x − y|| is the Euclidean distance, c is the number of clusters, x is the input
vector, v_i is the ith cluster centre and α is the learning constant.

Pseudo code: (1) Define the number of clusters c and a small terminating condition ε.
(2) Initialise the weights. (3) Determine the winning neuron based on distance.
(4) Update the winner: $v_i^{(T)} = v_i^{(T-1)} + \alpha_i^{(T)} \left( x_k - v_i^{(T-1)} \right)$ for $i \le N^{(T)}$.
(5) Check the terminating condition; otherwise repeat with a new vector.
(A runnable sketch follows below.)
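A minimal sketch of the sequential winner-take-all update above (assuming NumPy, a fixed learning rate α and random initialisation; the neighbourhood update for i ≤ N^(T) is omitted for brevity):

```python
import numpy as np

def lvq(data, c, alpha=0.05, eps=1e-4, max_epochs=100, seed=0):
    """Sequential vector quantization on data of shape (n_samples, n_features):
    move the winning cluster centre toward each input until the centres stabilize."""
    rng = np.random.default_rng(seed)
    v = data[rng.choice(len(data), size=c, replace=False)].astype(float)
    for _ in range(max_epochs):
        shift = 0.0
        for x in data:
            i = np.argmin(np.linalg.norm(x - v, axis=1))  # winning neuron
            step = alpha * (x - v[i])                     # v_i <- v_i + alpha (x - v_i)
            v[i] += step
            shift = max(shift, np.linalg.norm(step))
        if shift < eps:                                   # weights have stabilized
            break
    return v
```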
Fuzzy C-Means (FCM – Bezdek)
• A fuzzy pseudo-partition of a finite data set X is defined by:

$$\sum_{i=1}^{c} \mu_i(x_k) = 1 \qquad \text{for all } k = 1..n$$

$$0 < \sum_{k=1}^{n} \mu_i(x_k) < n \qquad \text{for all } i = 1..c$$

• An objective function for fuzzy clustering is (m defines the degree of fuzziness):

$$J_m(P) = \sum_{k=1}^{n} \sum_{i=1}^{c} \left( \mu_i(x_k) \right)^m \left\| x_k - v_i \right\|^2$$
FCM – Cont’d
• Pseudo Code:
  – Define the number of clusters (c), the degree of fuzziness (m) and the terminating condition (ε)
  – Initialise T and the pseudo-partition P^(0)
  – Compute the cluster centres v_1, v_2, …, v_c:

$$v_i^{(T)} = \frac{\displaystyle\sum_{k=1}^{n} \left( \mu_i(x_k) \right)^m x_k}{\displaystyle\sum_{k=1}^{n} \left( \mu_i(x_k) \right)^m} \qquad \text{for } i = 1..c$$
FCM – Cont’d 2
• Pseudo Code:
  – Update the new pseudo-partition:

$$\mu_i^{(T+1)}(x_k) = \left[ \sum_{j=1}^{c} \left( \frac{\left\| x_k - v_i^{(T)} \right\|^2}{\left\| x_k - v_j^{(T)} \right\|^2} \right)^{\frac{1}{m-1}} \right]^{-1} \qquad \text{for } i = 1..c,\; k = 1..n$$

  – Compare the distance between the partitions:

$$E = \left| P^{(T+1)} - P^{(T)} \right| = \sum_{i=1}^{c} \sum_{k=1}^{n} \left| \mu_i^{(T+1)}(x_k) - \mu_i^{(T)}(x_k) \right|$$

  – Terminate if E < ε (a runnable sketch follows below)
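A minimal batch implementation of the two update equations above (assuming NumPy and random initialisation of the pseudo-partition):

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-4, max_iter=200, seed=0):
    """Fuzzy C-Means on X of shape (n, d): alternate the centre and
    partition updates until the partition change E falls below eps."""
    rng = np.random.default_rng(seed)
    u = rng.random((c, len(X)))
    u /= u.sum(axis=0)                                  # columns sum to 1
    for _ in range(max_iter):
        w = u ** m
        v = (w @ X) / w.sum(axis=1, keepdims=True)      # cluster centres
        d2 = ((X[None, :, :] - v[:, None, :]) ** 2).sum(axis=2)  # squared distances
        d2 = np.fmax(d2, 1e-12)                         # guard against division by zero
        u_new = 1.0 / (d2 ** (1.0 / (m - 1)))           # proportional to d^(-2/(m-1))
        u_new /= u_new.sum(axis=0)                      # normalise: partition update
        E = np.abs(u_new - u).sum()                     # distance between partitions
        u = u_new
        if E < eps:
            break
    return v, u
```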
Examples: Iris data

[Figure: scatter plots of the Iris data — Sepal width vs. Sepal length (cm) and Petal width vs. Petal length (cm) for Iris setosa, Iris versicolor and Iris virginica]

[Figure: error trends — total error and number of mistakes vs. number of iterations; left: LVQ with α=0.005, δ=0.8, ε=0.0001; right: FCM with m=1.5, ε=0.0001]

Properties of FCM
• FCM is a non-sequential, batch-learning optimization algorithm.
• Computationally and memory intensive.
• Unable to perform on-line training.
• Performance depends on a good choice of the weighting exponent m.
Results of FCM

[Figure: IRIS data set — FCM with m=1.5, ε=0.0001. Membership degree μ(x) vs. Sepal length, Sepal width, Petal length and Petal width (cm) for setosa, versicolor and virginica; trapezoidal-like membership functions]

Results of FCM – Cont’d

[Figure: IRIS data set — FCM with m=2.0, ε=0.0001. Membership degree μ(x) vs. Sepal length, Sepal width, Petal length and Petal width (cm) for setosa, versicolor and virginica; Gaussian-like membership functions]

Results of MLVQ (Gaussian)
• Width of Gaussian MF_i:

$$w_i = \frac{v_i - v_{\text{closest}}}{\sigma}$$

[Figure: MLVQ with λ=0.02, σ=1.5, ε=0.0001. Membership degree μ(x) vs. Sepal length, Sepal width, Petal length and Petal width (cm) for setosa, versicolor and virginica]

Results of MLVQ (Gaussian) – Cont’d
• Width of Gaussian MF_i: $w_i = \dfrac{v_i - v_{\text{closest}}}{\sigma}$

[Figure: MLVQ with λ=0.02, σ=3.0, ε=0.0001. Membership degree μ(x) vs. Sepal length, Sepal width, Petal length and Petal width (cm) for setosa, versicolor and virginica]

Trapezoidal MF
• A trapezoidal membership function μ(x) can be described by a fuzzy interval
  formed by four parameters (α, β, γ, δ) and a centroid v.
• This fuzzy interval is also known as a trapezoidal fuzzy number.

[Figure: trapezoidal μ(x) rising from 0 at α to 1 at β, constant to γ, and falling back to 0 at δ; the centroid v lies in [β, γ]]
Trapezoidal MF – Cont’d
• The subinterval [β, γ] where μ(x) = 1 is called the kernel of the fuzzy interval,
  and the subinterval [α, δ] is called the support.
• The MLVQ algorithm can be used to derive the centroid v, but it cannot derive
  the parameters (α, β, γ, δ) of the trapezoidal-shaped membership function:

$$\mu(x) = \begin{cases} 0 & \text{if } x < \alpha \text{ or } x > \delta \\[4pt] \dfrac{x - \alpha}{\beta - \alpha} & \text{if } \alpha \le x \le \beta \\[4pt] 1 & \text{if } \beta \le x \le \gamma \\[4pt] \dfrac{\delta - x}{\delta - \gamma} & \text{if } \gamma \le x \le \delta \end{cases}$$
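A direct transcription of the piecewise definition (assuming NumPy, and that α < β and γ < δ):

```python
import numpy as np

def trapezoid_mf(x, alpha, beta, gamma, delta):
    """Trapezoidal MF: support [alpha, delta], kernel [beta, gamma]."""
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)                           # 0 outside the support
    rise = (alpha <= x) & (x < beta)
    y[rise] = (x[rise] - alpha) / (beta - alpha)   # rising edge
    y[(beta <= x) & (x <= gamma)] = 1.0            # kernel
    fall = (gamma < x) & (x <= delta)
    y[fall] = (delta - x[fall]) / (delta - gamma)  # falling edge
    return y

mu = trapezoid_mf(np.linspace(0, 10, 101), 2, 4, 6, 8)
```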
The Fuzzy Kohonen Partition Algorithm – Supervised
• Define:
  – c as the number of classes,
  – λ ≤ 1/Ω as the learning constant, where Ω = number of data vectors in a cluster,
  – η as the learning width and a small positive number ε as a stopping criterion;
    n = total number of data vectors
• Initialise the weights:

$$v_i^{(0)} = \min_k(x_k) + \frac{2i-1}{2c} \left( \max_k(x_k) - \min_k(x_k) \right) \qquad \text{for } i = 1..c,\; k = 1..n$$

• Determine the ith cluster that the data $x_k$ belongs to, and update the weights $v_i$ of the ith cluster.
The Fuzzy Kohonen Partition Algorithm – Supervised (cont’d)
• Compute the error to the cluster and the difference in error between iterations:

$$e^{(T+1)} = \sum_{k=1}^{n} \left\| x_k - v_i^{(T+1)} \right\|$$

$$de^{(T+1)} = e^{(T+1)} - e^{(T)}$$

• Repeat while ¬(de^(T+1) ≤ ε)
  – End of determining the centroids

The Fuzzy Kohonen Partition Algorithm – Supervised (cont’d)
• Initialize $\alpha_i = \beta_i = \gamma_i = \delta_i = \varphi_i = v_i^{(T+1)}$ for $i = 1..c$,
  where $\varphi_i$ is the pseudo weight of $v_i$.

[Figure: the point-valued TrFNs collapsed onto their centroids, i = 1, 2, 3]

• Determine the ith cluster that the data $x_k$ belongs to, and update the pseudo weight $\varphi_i$ of the ith cluster:

$$\varphi_i = \varphi_i + \eta \left( x_k - \varphi_i \right)$$
The Fuzzy Kohonen Partition Algorithm – Supervised (cont’d)
• Update the four points of the Trapezoidal Fuzzy Number (TrFN):

$$\alpha_i = \min(\alpha_i, x_k) \qquad \beta_i = \min(\beta_i, \varphi_i) \qquad \gamma_i = \max(\gamma_i, \varphi_i) \qquad \delta_i = \max(\delta_i, x_k)$$
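A compact sketch of this TrFN-fitting pass for a single feature (assuming NumPy, that the centroids v have already been learned, and that each sample is assigned to its nearest centroid — a simplification of the supervised assignment in the slides):

```python
import numpy as np

def fkp_trfn(x, v, eta=0.5):
    """Fit one trapezoidal fuzzy number (alpha, beta, gamma, delta) per cluster."""
    alpha, beta, gamma, delta = (v.astype(float).copy() for _ in range(4))
    phi = v.astype(float).copy()              # pseudo weights start at the centroids
    for xk in x:
        i = int(np.argmin(np.abs(xk - v)))    # cluster that xk belongs to (nearest centroid)
        phi[i] += eta * (xk - phi[i])         # phi_i <- phi_i + eta (xk - phi_i)
        alpha[i] = min(alpha[i], xk)          # support stretches toward the raw data
        delta[i] = max(delta[i], xk)
        beta[i]  = min(beta[i], phi[i])       # kernel stretches toward the pseudo weight
        gamma[i] = max(gamma[i], phi[i])
    return np.stack([alpha, beta, gamma, delta], axis=1)

# e.g. trfns = fkp_trfn(sepal_length, centroids) -> one (alpha, beta, gamma, delta) per label
```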
The Fuzzy Kohonen Partition Algorithm – Results

[Figure: FKP with λ=0.02, η=0, ε=0.0005. Membership degree μ(x) vs. Sepal length, Sepal width, Petal length and Petal width (cm) for setosa, versicolor and virginica]

The Fuzzy Kohonen Partition Algorithm – Results (cont’d)

[Figure: FKP with λ=0.02, η=0.5, ε=0.0005. Membership degree μ(x) vs. Sepal length, Sepal width, Petal length and Petal width (cm) for setosa, versicolor and virginica]

Pseudo Fuzzy Kohonen Partition Algorithm
• PFKP operates similarly to FKP except in the determination of the trapezoidal MF parameters.
• Initialize $\varphi_i = \alpha_i = \beta_i = \gamma_i = \delta_i = v_i^{(T+1)}$ for $i = 1..c$,
  where $\varphi_i$ is the pseudo weight of $v_i$.
• Determine the TrFN parameters:
  – Determine the winner centroid: $\left\| x_k - \varphi_i \right\| = \min_j \left\| x_k - \varphi_j \right\|$ for $j = 1..c$
  – Update the pseudo weight of the winner: $\varphi_i = \varphi_i + \eta \left( x_k - \varphi_i \right)$
  – Update the Trapezoidal Fuzzy Number:

$$\alpha_i = \begin{cases} \min(\alpha_i, x_k) & \text{for } i = 1 \\ \gamma_{i-1} & \text{for } i > 1 \end{cases} \qquad \beta_i = \min(\beta_i, \varphi_i) \qquad \gamma_i = \max(\gamma_i, \varphi_i) \qquad \delta_i = \begin{cases} \max(\delta_i, x_k) & \text{for } i = c \\ \beta_{i+1} & \text{for } i < c \end{cases}$$
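Compared with the FKP sketch above, the only change is that adjacent labels share boundaries so the result forms a pseudo-partition. A sketch under the same assumptions (the α_i = γ_{i−1} and δ_i = β_{i+1} ties are applied once at the end as a simplification of the per-sample update):

```python
import numpy as np

def pfkp_trfn(x, v, eta=0.5):
    """PFKP variant: adjacent TrFNs share boundaries, forming a pseudo-partition."""
    c = len(v)                                # centroids assumed sorted in ascending order
    alpha, beta, gamma, delta = (v.astype(float).copy() for _ in range(4))
    phi = v.astype(float).copy()
    for xk in x:
        i = int(np.argmin(np.abs(xk - phi)))  # winner centroid by distance to pseudo weight
        phi[i] += eta * (xk - phi[i])
        beta[i]  = min(beta[i], phi[i])
        gamma[i] = max(gamma[i], phi[i])
        if i == 0:
            alpha[0] = min(alpha[0], xk)      # only the first label extends left
        if i == c - 1:
            delta[-1] = max(delta[-1], xk)    # only the last label extends right
    alpha[1:]  = gamma[:-1]                   # tie inner boundaries: alpha_i = gamma_{i-1}
    delta[:-1] = beta[1:]                     #                       delta_i = beta_{i+1}
    return np.stack([alpha, beta, gamma, delta], axis=1)
```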
PFKP – Results

[Figure: PFKP with λ=0.02, η=0, ε=0.0005. Membership degree μ(x) vs. Sepal length, Sepal width, Petal length and Petal width (cm) for setosa, versicolor and virginica]

PFKP – Results (cont’d)

[Figure: PFKP with λ=0.02, η=0.01, ε=0.0005. Membership degree μ(x) vs. Sepal length, Sepal width, Petal length and Petal width (cm) for setosa, versicolor and virginica]

Concluding Remarks
• The Fuzzy Kohonen Partition (FKP) and the Pseudo Fuzzy Kohonen Partition (PFKP)
  were proposed to directly derive appropriate membership functions from training data.
• Both algorithms directly derive trapezoidal membership functions that are convex
  and normal from training data, while the latter derives membership functions that
  form a pseudo-partition of the input space.
