distributions
Jakub M. Tomczak
November 28, 2012
1 Notations
Let x be a random variable and consider a parametric distribution of x with parameters θ, p(x|θ). A continuous random variable x ∈ R can be modelled by the normal (Gaussian) distribution:

    p(x|θ) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Big\{ -\frac{(x-\mu)^2}{2\sigma^2} \Big\} = N(x|\mu, \sigma^2),    (1)

where θ = [\mu \;\; \sigma^2]^T.
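The density in Eq. (1) is easy to evaluate directly. The short sketch below (the function name `normal_pdf` is my own, not from the note) checks numerically that the density integrates to one over a wide grid:

```python
import math

def normal_pdf(x, mu, sigma2):
    """Density N(x | mu, sigma^2) from Eq. (1)."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

# A crude Riemann sum over [mu - 10, mu + 10] should be very close to 1.
mu, sigma2 = 1.5, 0.8
step = 1e-3
total = sum(normal_pdf(mu - 10 + i * step, mu, sigma2) * step
            for i in range(int(20 / step)))
```

The grid width of ±10 standard-deviation-scale units and the step size are arbitrary choices; any range covering most of the probability mass behaves the same way.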
A discrete (categorical) random variable x ∈ X, where X is a finite set of K values, can be modelled by the categorical distribution (with x represented in 1-of-K coding, so x_k ∈ {0, 1} and \sum_k x_k = 1):

    p(x|θ) = \prod_{k=1}^{K} \theta_k^{x_k} = Cat(x|θ),    (2)

where 0 ≤ θ_k ≤ 1 and \sum_k \theta_k = 1.
For X = {0, 1} we get a special case of the categorical distribution, the Bernoulli distribution:

    p(x|θ) = \theta^x (1-\theta)^{1-x} = Bern(x|θ).    (3)
2.2 Example 1: Bernoulli distribution
Let us calculate the Fisher information matrix for the Bernoulli distribution (3). First, we need to take the logarithm:

    \ln p(x|\theta) = x \ln\theta + (1-x) \ln(1-\theta).

Differentiating with respect to θ, we get the following Fisher score for the Bernoulli distribution:

    g(\theta, x) = \frac{x}{\theta} - \frac{1-x}{1-\theta} = \frac{x-\theta}{\theta(1-\theta)}.    (8)

The Fisher information matrix (here it is a scalar) for the Bernoulli distribution is the expectation of the squared score. Since E_x[(x-\theta)^2] = \theta(1-\theta), it follows that

    F = E_x[g(\theta, x)^2] = \frac{\theta(1-\theta)}{\theta^2(1-\theta)^2} = \frac{1}{\theta(1-\theta)}.
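Because x takes only two values, the expectation defining F can be computed exactly by summing over {0, 1}. A minimal sketch (function names are mine) confirming F = 1/(θ(1−θ)):

```python
def bern_fisher_score(theta, x):
    # Fisher score from Eq. (8): d/dtheta log p(x|theta)
    return (x - theta) / (theta * (1.0 - theta))

def bern_fisher_info(theta):
    # F = E[g^2], expectation taken over x in {0, 1} with P(x=1) = theta
    return sum(p * bern_fisher_score(theta, x) ** 2
               for x, p in ((1, theta), (0, 1.0 - theta)))
```

For example, `bern_fisher_info(0.3)` should equal 1 / (0.3 · 0.7), and the minimum F = 4 occurs at θ = 0.5.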
2.3 Example 2: Categorical distribution
For the categorical distribution (2) the Fisher score has components g_k(\theta, x) = x_k/\theta_k. Now, let us calculate the product of the Fisher score and its transposition:

    g(\theta, x)\, g(\theta, x)^T =
    \begin{bmatrix} \frac{x_1}{\theta_1} \\ \vdots \\ \frac{x_K}{\theta_K} \end{bmatrix}
    \begin{bmatrix} \frac{x_1}{\theta_1} & \cdots & \frac{x_K}{\theta_K} \end{bmatrix}
    =
    \begin{bmatrix}
    \frac{x_1^2}{\theta_1^2} & \frac{x_1 x_2}{\theta_1 \theta_2} & \cdots & \frac{x_1 x_K}{\theta_1 \theta_K} \\
    \vdots & \vdots & \cdots & \vdots \\
    \frac{x_K x_1}{\theta_K \theta_1} & \frac{x_K x_2}{\theta_K \theta_2} & \cdots & \frac{x_K^2}{\theta_K^2}
    \end{bmatrix}
    =
    \begin{bmatrix}
    g_{11} & g_{12} & \cdots & g_{1K} \\
    \vdots & \vdots & \cdots & \vdots \\
    g_{K1} & g_{K2} & \cdots & g_{KK}
    \end{bmatrix}.    (13)

Taking the expectation, the 1-of-K coding gives E_x[x_i x_j] = \theta_i for i = j and 0 otherwise, so all off-diagonal terms vanish. Finally, we get:

    F = \mathrm{diag}\Big\{ \frac{1}{\theta_1}, \ldots, \frac{1}{\theta_K} \Big\}.    (16)
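Since x ranges over only K one-hot vectors, the expectation E_x[g g^T] can again be computed exactly by enumeration. A sketch (the helper name `cat_fisher_info` is mine) confirming Eq. (16):

```python
def cat_fisher_info(theta):
    # F = E[g g^T] with score components g_k = x_k / theta_k,
    # expectation taken over the K one-hot outcomes of x.
    K = len(theta)
    F = [[0.0] * K for _ in range(K)]
    for k, p in enumerate(theta):
        # Outcome k: x is one-hot at position k, so g_i = [i == k] / theta_i.
        g = [(1.0 if i == k else 0.0) / theta[i] for i in range(K)]
        for i in range(K):
            for j in range(K):
                F[i][j] += p * g[i] * g[j]
    return F
```

For `theta = [0.2, 0.5, 0.3]` the result is diagonal with entries 1/0.2, 1/0.5, 1/0.3, matching diag{1/θ_1, …, 1/θ_K}.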
2.4 Example 3: Normal distribution
For the normal distribution (1), differentiating \ln p(x|\theta) with respect to \mu and \sigma^2 gives the Fisher score

    g(\theta, x) = \begin{bmatrix} \frac{1}{\sigma^2}(x-\mu) \\ -\frac{1}{2\sigma^2} + \frac{1}{2\sigma^4}(x-\mu)^2 \end{bmatrix}.

Now, let us calculate the product of the Fisher score and its transposition:

    \begin{bmatrix} \frac{1}{\sigma^2}(x-\mu) \\ -\frac{1}{2\sigma^2} + \frac{1}{2\sigma^4}(x-\mu)^2 \end{bmatrix}
    \begin{bmatrix} \frac{1}{\sigma^2}(x-\mu) & -\frac{1}{2\sigma^2} + \frac{1}{2\sigma^4}(x-\mu)^2 \end{bmatrix}
    =
    \begin{bmatrix}
    \frac{1}{\sigma^4}(x-\mu)^2 & -\frac{1}{2\sigma^4}(x-\mu) + \frac{1}{2\sigma^6}(x-\mu)^3 \\
    -\frac{1}{2\sigma^4}(x-\mu) + \frac{1}{2\sigma^6}(x-\mu)^3 & \frac{1}{4\sigma^4} - \frac{1}{2\sigma^6}(x-\mu)^2 + \frac{1}{4\sigma^8}(x-\mu)^4
    \end{bmatrix}    (21)
    \equiv
    \begin{bmatrix} g_{11} & g_{12} \\ g_{21} & g_{22} \end{bmatrix},

where g_{12} = g_{21}.
In order to calculate the Fisher information matrix we need to determine the expected value of each g_{ij}. Hence,² for g_{11}:

    E_x[g_{11}] = E_x\Big[ \frac{1}{\sigma^4}(x-\mu)^2 \Big]
               = \frac{1}{\sigma^4}\big( E_x[x^2] - 2\mu^2 + \mu^2 \big)
               = \frac{1}{\sigma^4}\big( \mu^2 + \sigma^2 - 2\mu^2 + \mu^2 \big)
               = \frac{1}{\sigma^2},    (22)
and for g_{12}:

    E_x[g_{12}] = E_x\Big[ -\frac{1}{2\sigma^4}(x-\mu) + \frac{1}{2\sigma^6}(x-\mu)^3 \Big]
               = -\frac{1}{2\sigma^4}\big( E_x[x] - \mu \big) + \frac{1}{2\sigma^6} E_x\big[ x^3 - 3x^2\mu + 3x\mu^2 - \mu^3 \big]
               = \frac{1}{2\sigma^6}\big( E_x[x^3] - 3\mu E_x[x^2] + 3\mu^2 E_x[x] - \mu^3 \big)
               = \frac{1}{2\sigma^6}\big( \mu^3 + 3\mu\sigma^2 - 3\mu(\mu^2 + \sigma^2) + 3\mu^3 - \mu^3 \big)
               = 0,    (23)
and for g_{22}:

    E_x[g_{22}] = E_x\Big[ \frac{1}{4\sigma^4} - \frac{1}{2\sigma^6}(x-\mu)^2 + \frac{1}{4\sigma^8}(x-\mu)^4 \Big]
               = \frac{1}{4\sigma^4} - \frac{1}{2\sigma^6} E_x\big[ x^2 - 2x\mu + \mu^2 \big] + \frac{1}{4\sigma^8} E_x\big[ x^4 - 4x^3\mu + 6x^2\mu^2 - 4x\mu^3 + \mu^4 \big]
               = \frac{1}{4\sigma^4} - \frac{1}{2\sigma^6}\big( E_x[x^2] - 2\mu E_x[x] + \mu^2 \big) + \frac{1}{4\sigma^8}\big( E_x[x^4] - 4\mu E_x[x^3] + 6\mu^2 E_x[x^2] - 4\mu^3 E_x[x] + \mu^4 \big)
               = \frac{1}{4\sigma^4} - \frac{1}{2\sigma^6}\sigma^2 + \frac{1}{4\sigma^8} \cdot 3\sigma^4
               = \frac{1}{2\sigma^4}.    (24)
Finally, we get:

    F = \begin{bmatrix} \frac{1}{\sigma^2} & 0 \\ 0 & \frac{1}{2\sigma^4} \end{bmatrix}.    (25)
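The derivation of (22)–(24) works entirely through raw moments, so it can be replayed numerically: plug the raw moments of N(µ, σ²) into the expanded expectations and check that the matrix (25) comes out. A sketch (the function name is mine; `s2` stands for σ²):

```python
def normal_fisher_info(mu, s2):
    # Raw moments of N(mu, s2), as in Section 3.
    m1 = mu
    m2 = mu**2 + s2
    m3 = mu**3 + 3 * mu * s2
    m4 = mu**4 + 6 * mu**2 * s2 + 3 * s2**2
    # Central moments E[(x-mu)^k] expanded in raw moments, mirroring Eqs. (22)-(24).
    c2 = m2 - 2 * mu * m1 + mu**2
    c3 = m3 - 3 * mu * m2 + 3 * mu**2 * m1 - mu**3
    c4 = m4 - 4 * mu * m3 + 6 * mu**2 * m2 - 4 * mu**3 * m1 + mu**4
    g11 = c2 / s2**2                                        # Eq. (22)
    g12 = -(m1 - mu) / (2 * s2**2) + c3 / (2 * s2**3)       # Eq. (23)
    g22 = 1 / (4 * s2**2) - c2 / (2 * s2**3) + c4 / (4 * s2**4)  # Eq. (24)
    return [[g11, g12], [g12, g22]]
```

For any µ and σ² the result should be [[1/σ², 0], [0, 1/(2σ⁴)]]; e.g. µ = 1, σ² = 2 gives [[0.5, 0], [0, 0.125]].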
² See Section 3 for the raw moments of the univariate normal distribution.
2.5 Summary
The Fisher information matrices for the given distributions:
• Bernoulli distribution: F = \frac{1}{\theta(1-\theta)},
• Categorical distribution: F = \mathrm{diag}\Big\{ \frac{1}{\theta_1}, \ldots, \frac{1}{\theta_K} \Big\},
• Normal distribution: F = \begin{bmatrix} \frac{1}{\sigma^2} & 0 \\ 0 & \frac{1}{2\sigma^4} \end{bmatrix}.
3 Raw moments of univariate normal distribution

    k    E_x[x^k]
    1    \mu
    2    \mu^2 + \sigma^2
    3    \mu^3 + 3\mu\sigma^2
    4    \mu^4 + 6\mu^2\sigma^2 + 3\sigma^4
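These closed-form moments can be sanity-checked by simulation. The sketch below (sample size, seed, and tolerances are arbitrary choices of mine) compares seeded Monte Carlo estimates against the formulas for k = 3 and k = 4:

```python
import random

# Seeded Monte Carlo estimates of the third and fourth raw moments of N(mu, sigma^2).
random.seed(0)
mu, sigma = 0.5, 1.5
n = 200_000
xs = [random.gauss(mu, sigma) for _ in range(n)]
m3_hat = sum(x**3 for x in xs) / n
m4_hat = sum(x**4 for x in xs) / n

# Closed-form values from the table above.
m3 = mu**3 + 3 * mu * sigma**2
m4 = mu**4 + 6 * mu**2 * sigma**2 + 3 * sigma**4
```

With 200,000 samples the estimates typically land within a fraction of a percent of m3 = 3.5 and m4 = 18.625 for these parameters.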