Università di Verona
A.A. 2012-13
Teoria e Tecniche del Riconoscimento

Generative models: Hidden Markov Models,
Observed Influence Models

Outline
1. Markov processes and models;
2. Markov processes and models with hidden states (Hidden Markov Models, HMMs);
3. Research activity and applications of HMMs.
Marco Cristani
Markov process

[Figure: a three-state Markov chain (s1, s2, s3), N = 3, at time t = 1]

It is characterized by discrete time steps, t = 1, t = 2, ...
The probability of starting from a given state is dictated by the initial distribution:

    π = {πi} :  πi = P(q1 = si),  with 1 ≤ i ≤ N,  πi ≥ 0,  and  Σ_{i=1}^N πi = 1
Markov process

[Figure: the three-state chain at t = 1; the current state is q1 = s3]
Markov process

[Figure: the three-state chain at t = 2; the current state is q2 = s2]
Markov process

Defining the transition matrix, with aij = P(q_{t+1} = sj | q_t = si):

    A = | a11  a12  a13 |
        | a21  a22  a23 |
        | a31  a32  a33 |

the model is λ = (A, π).
[Figure: a three-state Markov chain]

It models Markovian stochastic behaviors (of order N) of a system in which the states are directly observable.
Example: traffic light

[Figure: the traffic light modeled as a three-state Markov chain, shown at t = 1 and t = 2 with the current state highlighted]

    π1 = 0.33,  π2 = 0.33,  π3 = 0.33

    A:  a11 = 0.1   a12 = 0.9   a13 = 0
        a21 = ?     a22 = ?     a23 = 0.99
        a31 = 1     a32 = 0     a33 = 0
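Such a chain is easy to simulate; a minimal numpy sketch (the entries a21 and a22 are not given on the slide, so the values used below are purely hypothetical, chosen only so that the row sums to 1):

```python
import numpy as np

rng = np.random.default_rng(0)

# Initial distribution (the slide's 0.33 values, taken as exact thirds so
# they sum to 1) and transition matrix of the traffic-light chain.
pi = np.array([1/3, 1/3, 1/3])
A = np.array([[0.10, 0.90, 0.00],
              [0.01, 0.00, 0.99],   # a21, a22 are assumptions, not from the slide
              [1.00, 0.00, 0.00]])

def simulate(pi, A, T):
    """Sample a state sequence q_1, ..., q_T from the chain lambda = (A, pi)."""
    q = [rng.choice(len(pi), p=pi)]
    for _ in range(T - 1):
        q.append(rng.choice(len(pi), p=A[q[-1]]))
    return q

print(simulate(pi, A, 10))   # e.g. [2, 0, 1, 2, 0, 1, 2, 0, 1, 2]
```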
Traffic light: inferences

[Figure: the three-state chain with the current state highlighted]
An important inference

    P(qt, qt-1, ..., q1) = P(qt | qt-1, ..., q1) P(qt-1, ..., q1)
                         = P(qt | qt-1) P(qt-1, qt-2, ..., q1)
                         = P(qt | qt-1) P(qt-1 | qt-2) P(qt-2, ..., q1)
                         = P(qt | qt-1) P(qt-1 | qt-2) ... P(q2 | q1) P(q1)
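This factorization translates directly into code; a small sketch (reusing pi and A from the traffic-light example above):

```python
def seq_prob(q, pi, A):
    """P(q_1, ..., q_T) of a state sequence under a first-order Markov chain."""
    p = pi[q[0]]
    for prev, cur in zip(q, q[1:]):
        p *= A[prev, cur]     # each factor is P(q_t | q_{t-1}) = a_{q_{t-1} q_t}
    return p

# P(q1 = s3, q2 = s1, q3 = s2), with 0-based state indices:
print(seq_prob([2, 0, 1], pi, A))   # = 1/3 * a31 * a12 = 1/3 * 1 * 0.9 = 0.3
```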
Graphical model

[Figure: the chain q1 → q2 → q3 → q4]

Each arrow qt-1 → qt carries P(qt | qt-1)
(= P(qt | qt-1, ..., q1) in this case).
STEP 1: for a (fully observed) Markov model,

    P(O | λ) = P(O) = P(qT, qT-1, ..., q1)

computed with the chain rule above (for the traffic light, a product of factors such as 0.33 and 0.99).
STEP 2: use this method to compute P(qT = sj), i.e.:

    p_{t+1}(j) = P(q_{t+1} = sj)
               = Σ_{i=1}^N P(q_{t+1} = sj, qt = si)
               = Σ_{i=1}^N P(q_{t+1} = sj | qt = si) P(qt = si)
               = Σ_{i=1}^N aij p_t(i)

Use this method starting from P(qT = sj) and proceeding backwards (unrolling the recursion down to p_1(i) = πi).
The cost of this computation is O(TN^2).
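In vector form the recursion is just a repeated matrix-vector product, p_{t+1} = Aᵀ p_t; a sketch:

```python
def state_marginal(pi, A, T):
    """p_T(j) = P(q_T = s_j), computed with the O(T N^2) recursion above."""
    p = pi.copy()
    for _ in range(T - 1):
        p = A.T @ p           # p_{t+1}(j) = sum_i a_ij p_t(i)
    return p

print(state_marginal(pi, A, 1))   # for T = 1 this is just pi
```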
[Figure: a three-state system whose states are unknown ("?")]

I watch the video: I see that there is a system evolving, but I cannot tell which states regulate it.
[Figure: the same system, abstracted into states S1, S2, S3 emitting the symbols v1, v2, v3]

I watch the video: I see that there is a system evolving, but I cannot tell which states regulate it.
The system nevertheless evolves through states; I understand this by observing the phenomenon (I introduce prior knowledge).
[Figure: a three-state HMM: each state S1, S2, S3 can emit any of the symbols v1, v2, v3]
HMMs describe the dynamics of a system probabilistically, giving up on identifying its causes directly.
Hidden Markov Models

[Figure: three states S1, S2, S3, each able to emit the symbols v1, ..., vM]

As in the Markov model:

    π1 = 0.33,  π2 = 0.33,  π3 = 0.33

    A:  a11 = 0.1   a12 = 0.9   a13 = 0
        a21 = ?     a22 = ?     a23 = 0.99
        a31 = 1     a32 = 0     a33 = 0

In addition:
- a set V = {v1, v2, ..., vM} of observation symbols;
- a probability distribution over the observation symbols, B = {bjk} = {bj(vk)}, which gives the probability of emitting symbol vk when the state of the system is sj: bj(vk) = P(vk | sj).

    B:  b11 = 0.8   b12 = 0.1   ...   b1M = 0.1
        b21 = 0.1   b22 = 0.8   ...   b2M = 0.1
        b31 = 0.1   b32 = 0.1   ...   b3M = 0.8
Assumptions on the observations
Conditional independencies:

    P(Ot = X | qt = sj, qt-1, qt-2, ..., q1, Ot-1, Ot-2, ..., O1) = P(Ot = X | qt = sj)

[Figure: the graphical model q1 → q2 → q3 → q4, each state qt emitting its observation Ot]
[Figure: the classical urn example, three urns containing colored balls]

                 Urn 1     Urn 2     Urn 3
    P(red)    =  b1(1)     b2(1)     b3(1)
    P(blue)   =  b1(2)     b2(2)     b3(2)
    P(green)  =  b1(3)     b2(3)     b3(3)
    P(purple) =                      b3(4)
Example

[Figure: a three-state HMM; each state can emit the symbols O, CA, OR]

    π1 = 0.33,  π2 = 0.33,  π3 = 0.33

    A:  a11 = 0     a12 = 1     a13 = 0
        a21 = 1/3   a22 = 2/3   a23 = 0
        a31 = 1/2   a32 = 1/2   a33 = 0

    b1(O) = 0.8   b1(OR) = 0.1   b1(CA) = 0.1
    b2(O) = 0.1   b2(OR) = 0.0   b2(CA) = 0.9
    b3(O) = 0.1   b3(OR) = 0.8   b3(CA) = 0.1
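These parameters are fully specified, so the generative process can be sketched directly (0-based indices; the symbol order O, OR, CA is an assumption of this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)

# Parameters of the example HMM (states S1..S3, symbols O, OR, CA).
pi = np.array([1/3, 1/3, 1/3])
A = np.array([[0.0, 1.0, 0.0],
              [1/3, 2/3, 0.0],
              [0.5, 0.5, 0.0]])
B = np.array([[0.8, 0.1, 0.1],    # b1(O), b1(OR), b1(CA)
              [0.1, 0.0, 0.9],    # b2(.)
              [0.1, 0.8, 0.1]])   # b3(.)
symbols = ["O", "OR", "CA"]

def sample(pi, A, B, T):
    """Generate a (states, observations) pair of length T from the HMM."""
    q = rng.choice(3, p=pi)
    states, obs = [], []
    for _ in range(T):
        states.append(int(q))
        obs.append(symbols[rng.choice(3, p=B[q])])
        q = rng.choice(3, p=A[q])
    return states, obs

print(sample(pi, A, B, 3))   # e.g. ([1, 1, 0], ['CA', 'CA', 'O'])
```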
[Figure: the same HMM generating a sequence]

    q0 = S2,  O1 = CA
    q1 = S2,  O2 = CA
    q2 = S1,  O3 = O
Our problem is that the states are not directly observable!

    q0 = ?,  O1 = CA
    q1 = ?,  O2 = CA
    q2 = ?,  O3 = O
Problem 1:
Probability of a sequence of observations

P(O) = P(O1, O2, O3) = P(O1 = CA, O2 = CA, O3 = O) = ?

Brute-force strategy:

    P(O) = Σ_{Q ∈ paths of length 3} P(O, Q) = Σ_Q P(O | Q) P(Q)
Problem 1:
Probability of a sequence of observations

P(O) = P(O1, O2, O3) = P(O1 = X, O2 = X, O3 = Z) = ?

Brute-force strategy: P(O, Q) = P(O | Q) P(Q)

    P(Q) = P(q1, q2, q3)
         = P(q1) P(q2, q3 | q1)
         = P(q1) P(q2 | q1) P(q3 | q2)

For Q = S2 S2 S1:
    P(Q) = π2 a22 a21 = 1/3 · 2/3 · 1/3 = 2/27
Problem 1:
Probability of a sequence of observations

P(O) = P(O1, O2, O3) = P(O1 = X, O2 = X, O3 = Z) = ?

Brute-force strategy: P(O, Q) = P(O | Q) P(Q)

    P(O | Q) = P(O1, O2, O3 | q1, q2, q3)
             = P(O1 | q1) P(O2 | q2) P(O3 | q3)

For Q = S2 S2 S1:
    P(O | Q) = 9/10 · 9/10 · 8/10 = 0.648
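A quick numeric check of this single term of the sum, reusing pi, A, B and the symbol indices from the sketch above:

```python
Q = [1, 1, 0]                  # Q = S2 S2 S1, 0-based
O = [2, 2, 0]                  # O = CA, CA, O as symbol indices

p_Q = pi[Q[0]] * A[Q[0], Q[1]] * A[Q[1], Q[2]]                # = 2/27
p_O_given_Q = B[Q[0], O[0]] * B[Q[1], O[1]] * B[Q[2], O[2]]   # = 0.648
print(p_Q, p_O_given_Q, p_Q * p_O_given_Q)                    # one term of P(O)
```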
Observations

The previous computations solve only one term of the summation; to compute P(O) we need 27 values of P(Q) and 27 values of P(O | Q).
For a sequence of 20 observations we would need 3^20 values of P(Q) and 3^20 values of P(O | Q).
There is a more efficient way, based on the definition of a particular probability.

In general:

    P(O | λ) = Σ_{all sequences Q1, ..., QT} π_{Q1} b_{Q1}(O1) Π_{t=2}^T a_{Q_{t-1} Q_t} b_{Q_t}(Ot)
Forward procedure

Given the observations O1, O2, ..., OT, define

    α_t(i) = P(O1, O2, ..., Ot, qt = si | λ),  where 1 ≤ t ≤ T

i.e.:
- we have seen the first t observations;
- we ended up in si as the t-th visited state.

Recursion:

    α_{t+1}(j) = Σ_{i=1}^N P(O_{t+1}, q_{t+1} = sj | O1, O2, ..., Ot, qt = si) P(O1, O2, ..., Ot, qt = si)
               = Σ_{i=1}^N P(O_{t+1} | q_{t+1} = sj) P(q_{t+1} = sj | qt = si) α_t(i)
               = [Σ_{i=1}^N aij α_t(i)] bj(O_{t+1})

[Figure: the graphical model q1 → q2 → q3 → q4 with observations O1, ..., O4]

Termination:

    P(O1, O2, ..., OT) = Σ_{i=1}^N α_T(i)

with complexity O(N^2 T).
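A direct numpy transcription of the procedure (0-based indices, reusing pi, A, B from the example; initialization α_1(i) = πi bi(O1)):

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward procedure: alpha[t, i] = P(O_1..O_{t+1}, q_{t+1} = s_i | lambda)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                          # initialization
    for t in range(T - 1):
        alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]  # the recursion above
    return alpha

# P(O) for O = (CA, CA, O) under the example HMM:
alpha = forward(pi, A, B, [2, 2, 0])
print(alpha[-1].sum())        # = P(O | lambda), summing alpha_T(i) over i
```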
But other useful quantities as well, for example the filtering distribution:

    P(qt = si | O1, O2, ..., Ot) = α_t(i) / Σ_{j=1}^N α_t(j)

(verify it!)
α_t(i) accounts for all the paths that end in state si and produce O1, O2, ..., Ot as output.
Viterbi algorithm

[Slide content lost: the Viterbi recursion for the most likely state sequence]

NOTE: computed for every j!
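The slides' formulas are lost here, but the standard Viterbi recursion, δ_{t+1}(j) = max_i [δ_t(i) aij] · bj(O_{t+1}) with backpointers ψ, can be sketched as follows:

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely state sequence argmax_Q P(Q, O | lambda)."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))            # best score of a path ending in each state
    psi = np.zeros((T, N), dtype=int)   # backpointers
    delta[0] = pi * B[:, obs[0]]
    for t in range(T - 1):
        scores = delta[t][:, None] * A           # scores[i, j] = delta_t(i) a_ij
        psi[t + 1] = scores.argmax(axis=0)       # computed for every j!
        delta[t + 1] = scores.max(axis=0) * B[:, obs[t + 1]]
    q = [int(delta[-1].argmax())]                # best final state
    for t in range(T - 1, 0, -1):                # backtrack
        q.append(int(psi[t][q[-1]]))
    return q[::-1]

print(viterbi(pi, A, B, [2, 2, 0]))   # [1, 1, 0], i.e. S2 S2 S1
```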
ML estimation of HMMs:
the Baum-Welch re-estimation procedure

Define

    ξ_t(i, j) = P(qt = si, q_{t+1} = sj | O, λ)

    γ_t(i) = Σ_{j=1}^N ξ_t(i, j)

(E step)

The transition probabilities are then re-estimated as

    â_ij = Σ_{t=1}^{T-1} ξ_t(i, j) / Σ_{t=1}^{T-1} γ_t(i)
ML estimation of HMMs:
the Baum-Welch re-estimation procedure

Baum-Welch algorithm
These quantities are used in the iterative estimation of the HMM parameters.
It is a variant of the Expectation-Maximization (EM) algorithm,
which performs a local optimization, maximizing the log-likelihood of the model with respect to the data:

    λ_opt = arg max_λ log P({O^l} | λ),   with λ = (A, B, π)

Algorithm:
1) initialize the model;
2) the current model is λ = (A0, B0, π0);
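A compact sketch of one re-estimation step for a discrete-output HMM, reusing forward() from the sketch above; this is a minimal implementation of the standard update formulas, without the numerical scaling a production implementation would need:

```python
import numpy as np

def backward(A, B, obs):
    """beta[t, i] = P(O_{t+2}..O_T | q_{t+1} = s_i, lambda) (0-based t)."""
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

def baum_welch_step(pi, A, B, obs):
    """One EM iteration: E step computes xi and gamma, M step re-estimates."""
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
    pO = alpha[-1].sum()
    # E step: xi[t, i, j] = P(q_t = s_i, q_{t+1} = s_j | O, lambda)
    xi = (alpha[:-1, :, None] * A[None, :, :] *
          (B[:, obs[1:]].T * beta[1:])[:, None, :]) / pO
    gamma = np.concatenate([xi.sum(axis=2), alpha[-1:] * beta[-1:] / pO])
    # M step: expected counts -> new parameters.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[np.array(obs) == k].sum(axis=0) / gamma.sum(axis=0)
    return new_pi, new_A, new_B
```

Iterating this step until log P(O | λ) stops improving yields the local optimum mentioned above.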
HMM training

Fundamental issues:
- Baum-Welch is a local optimization technique (like gradient descent, it finds only local optima), while the log-likelihood is highly multimodal;
- model selection.
[Figures: fits on synthetic data for different numbers of states k: k = 2 is too low (underfitting), k = 4 seems ok, k = 12 is too high (overfitting)]

[Figure: model-selection score as a function of k: the likelihood term increases with k, the complexity penalty decreases with k (penalizes larger k); the best k balances the two. The estimated model is compared with the ground truth]
[Figure: a standard ergodic (fully connected) HMM over states S1, ..., S4, next to a second topology over S1, ..., S4 whose label is lost]
Preliminary note:
the Bayesian Network formalism

Bayes Net: a graph where nodes represent variables and edges represent causality:
- hidden variables
- observable variables
- causal dependencies

EX.: HMM
[Figure: the HMM as a Bayesian network: the hidden state sequence Qt-1 → Qt → Qt+1, each state emitting an element of the output sequence Ot-1, Ot, Ot+1]
[Figures: Bayesian-network variants of the HMM: a model with an additional input sequence It-1, It, It+1; models with several hidden chains (state sequences 1, 2, 3), each chain Qt-1 → Qt → Qt+1 with its own output sequence Ot-1, Ot, Ot+1, ...]
Discriminative training

Standard HMM training is blind (no information from other classes is used).
IDEA: train HMMs with discriminative criteria, i.e. considering also the information from other classes.
Two popular examples:
- maximum mutual information (MMI) [Bahl, Brown, de Souza, Mercer, ICASSP 00]:
  maximize the likelihood for the objects in the class while minimizing the likelihood for the other objects
Discriminative classification

Standard HMM classification: train one HMM per class and apply the Bayes rule:

    class = arg max_i P(O | λi)

[Figure: a sequence is scored by the trained HMMs λ1, ..., λN and assigned via the Bayes rule]
Discriminative classification

Idea of discriminative classification: use the trained HMMs to derive a feature space, where a discriminative classifier is trained.

[Figure: a sequence is mapped by the trained HMMs λ1, ..., λN to a feature vector x1, x2, x3, ..., xK, which feeds a discriminative vector-based classifier]
Discriminative classification

Kinds of features:
- the gradient of the log-likelihood (or other related quantities): this is the well-known Fisher kernel [Jaakkola, Haussler, NIPS 99]
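For reference, the Fisher score behind this kernel (its standard definition, not spelled out on the slide) maps each sequence O to the gradient of the log-likelihood, and the kernel is the information-normalized inner product:

```latex
U_O = \nabla_{\lambda} \log P(O \mid \lambda),
\qquad
K(O, O') = U_O^{\top} \, \mathcal{I}^{-1} \, U_{O'}
```

where \mathcal{I} is the Fisher information matrix.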
HMM application:
2D shape classification

2D shape classification
A well-studied topic in Computer Vision, often a building block for three-dimensional object recognition.
Fundamental: contour representation
- Fourier descriptors
- chain code
- curvature-based techniques
- invariants
- auto-regressive coefficients
- Hough-based transforms
- associative memories
Motivations

The use of HMMs for shape analysis is very poorly addressed.
Previous works:
- He, Kundu (PAMI '91), using AR coefficients
- Fred, Marques, Jorge (ICIP '97), using chain code
- Arica, Yarman-Vural (ICPR 2000), using a circular HMM

Objectives

Investigate the capability of HMMs in discriminating object classes, with respect to object translation, rotation, occlusion, noise, and affine projections.
We use a curvature representation for the object contour.
No assumptions about HMM topology or closedness of boundaries.
Curvature representation

- The object boundary is approximated with segments of approximately fixed length.
- The curvature is computed as the angle between two consecutive segments.
- The contour is smoothed with a Gaussian filter before computing the curvature (see the sketch below).
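A minimal sketch of this signature extraction, assuming numpy/scipy and a contour given as an (n, 2) array of boundary points:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def curvature_signature(contour, sigma=2.0):
    """Turning-angle curvature of a contour (an (n, 2) array of points).

    The coordinates are smoothed with a Gaussian filter first; the curvature
    at each vertex is the angle between two consecutive segments.
    """
    sm = gaussian_filter1d(contour.astype(float), sigma, axis=0)
    seg = np.diff(sm, axis=0)               # consecutive boundary segments
    ang = np.arctan2(seg[:, 1], seg[:, 0])  # orientation of each segment
    curv = np.diff(ang)                     # angle between consecutive segments
    return (curv + np.pi) % (2 * np.pi) - np.pi   # wrap into (-pi, pi]
```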
Curvature representation

Advantages:
- invariant to object translation;
- a rotation of the object equals a phase translation of the curvature signal;
- can be computed for open contours.

Disadvantages:
- noise sensitivity.
HMM initialisation

Gaussian Mixture Model clustering of the curvature coefficients: each cluster centroid is used to initialise the parameters of one state.
Strategy

Training: for every object we perform these steps:
- extract edges with the Canny edge detector;
- compute the related curvature signature;
- train an HMM on it:
  - the HMM is initialised with GMM clustering;
  - the number of HMM states is estimated with the BIC criterion (see the sketch below);
  - each HMM is trained with the Baum-Welch algorithm.
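A sketch of the state-number selection step; it assumes the hmmlearn library (its GaussianHMM estimator) as one possible implementation, with the BIC penalty computed by hand:

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

def select_n_states(signature, k_range=range(2, 8)):
    """Choose the number of HMM states by BIC on a 1-D curvature signature."""
    X = signature.reshape(-1, 1)        # hmmlearn expects (n_samples, n_features)
    best = (None, np.inf, None)
    for k in k_range:
        model = GaussianHMM(n_components=k, covariance_type="diag",
                            n_iter=50, random_state=0)
        model.fit(X)                    # Baum-Welch training
        logL = model.score(X)
        # Free parameters: initial probs + transitions + means + variances.
        n_params = (k - 1) + k * (k - 1) + k + k
        bic = -2 * logL + n_params * np.log(len(X))
        if bic < best[1]:
            best = (k, bic, model)
    return best[0], best[2]
```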
Strategy (cont.)

Classification: given an unknown sequence O,
- compute, for each model λi, the probability P(O | λi) of generating the sequence O;
- classify O as belonging to the class whose model shows the highest probability P(O | λi) (see the snippet below).
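The decision rule is then a one-liner over the trained models (hmmlearn-style models with a score method; the models list is a hypothetical name):

```python
import numpy as np

def classify(models, X):
    """Assign sequence X to the class whose HMM gives the highest likelihood."""
    scores = [m.score(X) for m in models]   # log P(O | lambda_i), one per class
    return int(np.argmax(scores))
```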
The models

[Figure: the object models used in the experiments]
Rotation

The test set is obtained by rotating each object 10 times by a random angle between 0 and 2π.
Results: accuracy 100%
Occlusion

Each object is occluded; the occlusion varies from 5% to 50% (at 50% only half of the whole object is visible).

Occlusion: results
[Figure: classification accuracy as a function of the occlusion level]
Noise

Gaussian noise (with mean 0 and variance σ²) is added to the X-Y coordinates of the object; σ² varies from 1 to 5.
Accuracy: 100%. The Gaussian filter applied before computing the curvature is able to remove this kind of noise completely.

[Figure: noisy contours for σ² = 1, σ² = 4, σ² = 5]
Alternative noise

Noise is added to the first derivative instead.

[Figure: noisy contours for σ² = 0.3, σ² = 0.5, σ² = 0.7, σ² = 0.9]
Noise: results

    Noise variance σ²    Classification accuracy
    0.1                  100.00%
    0.3                   97.50%
    0.5                   88.75%
    0.7                   82.50%
    0.9                   73.75%
Affine projections

[Figure: objects under affine projections, with tilt = 10, 20, 30, 40, 50 and slant = 20, 30, 40, 50]
Conclusions

The system is able to recognize objects that may be translated, rotated and occluded, also in the presence of noise.
- Translation invariance: due to curvature
- Rotation invariance: due to curvature and HMM
- Occlusion invariance: due to HMM
- Robustness to noise: due to HMM
HMM application:
Video Analysis

Each pixel location v is modeled with an HMM whose emission probability for state i is Gaussian:

    b_i(O^v) = N(O^v; μ_i, σ_i²)

where μ_i represents the gray-level ranges assumed by the v-th pixel signal, and B = {b_i}.
The idea

Define the distances between locations on the basis of the distances between the trained Hidden Markov Models.
The segmentation process is obtained using a spatial clustering of HMMs.
We need to define a similarity measure:
- decide when a group (at least, a couple) of neighboring pixels must be labelled as belonging to the same region.

The similarity measure

    D(i, j) = 1/2 [ L(O_i | λ_j) + L(O_j | λ_i) ]

where L(O | λ) is the log-likelihood of the sequence O under model λ.
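Under this reading of the formula, the measure is a symmetrized cross log-likelihood; a sketch (model_i, model_j and the sequences are hypothetical names; score stands for the log-likelihood L(O | λ)):

```python
def hmm_similarity(model_i, model_j, seq_i, seq_j):
    """Symmetrized cross log-likelihood D(i, j) between two pixel models."""
    return 0.5 * (model_j.score(seq_i) + model_i.score(seq_j))
```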
Results (real)

Corridoio.avi
[Figure: frames comparing the HMM-based segmentation with the image-based segmentation]

M. Cristani, M. Bicego, V. Murino, "Integrated Region- and Pixel-based Approach to Background Modelling", IEEE Workshop on Motion and Video Computing, Orlando, FL, pp. 3-8, Dec. 2002.