Probability Distributions
This appendix archives a number of useful results from texts by Papoulis [44], Lee [33]
and Cover [12]. Table 16.1 in Cover (page 486) gives entropies of many distributions
not listed here.
Standard Error
As an example, the mean
$$ m = \frac{1}{N} \sum_i x_i \qquad \text{(D.12)} $$
of uncorrelated variables $x_i$ has a variance
$$ \mathrm{Var}(m) = \frac{1}{N^2} \sum_i \mathrm{Var}(x_i) = \frac{\sigma_x^2}{N} \qquad \text{(D.13)} $$
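As an informal numerical check of (D.13), the following Python sketch (assuming NumPy is available; the variable names are illustrative) estimates the variance of the sample mean by simulation:

    import numpy as np

    rng = np.random.default_rng(0)
    N = 100                     # samples per mean
    sigma2 = 4.0                # variance of each x_i
    # 10000 independent sample means, each from N uncorrelated draws
    means = rng.normal(0.0, np.sqrt(sigma2), size=(10000, N)).mean(axis=1)
    print(means.var())          # ~ sigma2 / N = 0.04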
[Figure: probability density $p(x)$ plotted against $x$.]
D.3.1 Entropy
The entropy of a Gaussian variable is
$$ H(x) = \frac{1}{2} \log \sigma^2 + \frac{1}{2} \log 2\pi + \frac{1}{2} \qquad \text{(D.18)} $$
For a given variance, the Gaussian distribution has the highest entropy. For a proof
of this see Bishop ([3], page 240).
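As a numerical illustration of (D.18), the sketch below (assuming SciPy is available) compares the closed form with SciPy's built-in differential entropy of a Gaussian:

    import numpy as np
    from scipy.stats import norm

    sigma = 2.0
    # Closed form (D.18): H = 0.5*log(sigma^2) + 0.5*log(2*pi) + 0.5
    H_formula = 0.5 * np.log(sigma**2) + 0.5 * np.log(2 * np.pi) + 0.5
    H_scipy = norm(scale=sigma).entropy()
    print(H_formula, H_scipy)   # both ~ 2.112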
For Normal densities $q(x) = N(x; \mu_q, \sigma_q)$ and $p(x) = N(x; \mu_p, \sigma_p)$ the KL-divergence is
$$ D[q||p] = \frac{1}{2} \log \frac{\sigma_p^2}{\sigma_q^2} + \frac{\sigma_q^2 + \mu_q^2 + \mu_p^2 - 2\mu_q\mu_p}{2\sigma_p^2} - \frac{1}{2} \qquad \text{(D.19)} $$
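A minimal sketch of (D.19), assuming NumPy; `kl_gauss` is an illustrative helper, not a library function:

    import numpy as np

    def kl_gauss(mu_q, var_q, mu_p, var_p):
        """KL divergence D[q||p] between two 1-D Gaussians, as in (D.19)."""
        return (0.5 * np.log(var_p / var_q)
                + (var_q + mu_q**2 + mu_p**2 - 2 * mu_q * mu_p) / (2 * var_p)
                - 0.5)

    print(kl_gauss(0.0, 1.0, 0.0, 1.0))  # 0.0 for identical densities
    print(kl_gauss(1.0, 1.0, 0.0, 2.0))  # positive otherwise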
The $\chi^2$ distribution is a special case of the Gamma distribution with $b = 2$ and $c = N/2$. This gives
$$ \chi^2(x; N) = \frac{1}{\Gamma(N/2)\, 2^{N/2}} \, x^{N/2 - 1} \exp\left(-\frac{x}{2}\right) \qquad \text{(D.26)} $$
The mean and variance are $N$ and $2N$. The entropy and relative entropy can be found by substituting the values $b = 2$ and $c = N/2$ into equations D.23 and D.24. The distribution is only defined for positive variables.
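The Gamma connection can be checked numerically, assuming SciPy (whose Gamma density is parameterised by shape $a = c$ and scale $= b$):

    import numpy as np
    from scipy.stats import chi2, gamma

    N = 5
    x = np.linspace(0.1, 10, 50)
    # chi^2 with N degrees of freedom == Gamma with shape N/2 and scale 2
    print(np.allclose(chi2.pdf(x, df=N), gamma.pdf(x, a=N/2, scale=2)))  # True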
If $x$ has a $\chi^2$-density and
$$ y = \sqrt{x} \qquad \text{(D.27)} $$
then $y$ has a $\chi$-density with $N$ degrees of freedom. For $N = 3$ we have a Maxwell density and for $N = 2$ a Rayleigh density ([44], page 96).
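These special cases can be verified numerically, assuming SciPy:

    import numpy as np
    from scipy.stats import chi, maxwell, rayleigh

    y = np.linspace(0.1, 4, 40)
    print(np.allclose(chi.pdf(y, df=3), maxwell.pdf(y)))   # True: N = 3
    print(np.allclose(chi.pdf(y, df=2), rayleigh.pdf(y)))  # True: N = 2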
If $x$ is the sample mean and $s$ the sample standard deviation of $N$ samples drawn from a Gaussian with true mean $m$, then
$$ t = \frac{x - m}{s / \sqrt{N}} \qquad \text{(D.28)} $$
has a $t$-distribution with $N - 1$ degrees of freedom. It is written
$$ t(x; D) = \frac{1}{B(D/2, 1/2)\sqrt{D}} \left(1 + \frac{x^2}{D}\right)^{-(D+1)/2} \qquad \text{(D.29)} $$
where $B(\cdot, \cdot)$ is the beta function. For $D = 1$ the $t$-distribution reduces to the standard Cauchy distribution ([33], page 281).
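A sketch of (D.29), assuming NumPy and SciPy; `t_pdf` is an illustrative helper:

    import numpy as np
    from scipy.stats import t as t_dist, cauchy
    from scipy.special import beta

    def t_pdf(x, D):
        """Student t density (D.29), normalised with the beta function."""
        return (1.0 + x**2 / D) ** (-(D + 1) / 2) / (beta(D / 2, 0.5) * np.sqrt(D))

    x = np.linspace(-5, 5, 11)
    print(np.allclose(t_pdf(x, 3), t_dist.pdf(x, df=3)))   # True
    print(np.allclose(t_pdf(x, 1), cauchy.pdf(x)))         # True: D = 1 is Cauchy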
Figure D.4: The t-distribution with (a) N = 3 and (b) N = 49 degrees of freedom.
Figure D.5: The generalised exponential distribution with (a) R = 1, w = 5 and (b) R = 6, w = 5. The parameter R fixes the weight of the tails and w fixes the width of the distribution. For (a) we have a Laplacian, which has positive kurtosis (k = 3); heavy tails. For (b) we have a light-tailed distribution with negative kurtosis (k = -1).
where $\Gamma(\cdot)$ is the gamma function [49], the mean of the distribution is zero¹, the width of the distribution is determined by $1/\beta$ and the weight of its tails is set by $R$. This gives rise to a Gaussian distribution for $R = 2$, a Laplacian for $R = 1$ and a uniform distribution in the limit $R \rightarrow \infty$. The density is equivalently parameterised by a variable $w$, which defines the width of the distribution, where $w = \beta^{-1/R}$, giving
$$ p(a) = \frac{R}{2 w \Gamma(1/R)} \exp\left( -\left|\frac{a}{w}\right|^R \right) \qquad \text{(D.32)} $$
The variance is
$$ V = w^2 \, \frac{\Gamma(3/R)}{\Gamma(1/R)} \qquad \text{(D.33)} $$
which for $R = 2$ gives $V = 0.5 w^2$. The kurtosis is given by [7]
$$ K = \frac{\Gamma(5/R)\, \Gamma(1/R)}{\Gamma(3/R)^2} - 3 \qquad \text{(D.34)} $$
where we have subtracted 3 so that a Gaussian has zero kurtosis. Samples may be generated from the density using a rejection method [59].
¹ For non-zero mean we simply replace $a$ with $a - \mu$, where $\mu$ is the mean.
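The rejection method of [59] is not detailed here; an alternative exact sampler uses the fact that $|x/w|^R$ has a Gamma$(1/R, 1)$ density. A sketch assuming NumPy, with `sample_gen_exp` an illustrative helper:

    import numpy as np

    def sample_gen_exp(R, w, size, rng=None):
        """Draw from p(x) proportional to exp(-|x/w|^R) via a Gamma transform:
        if y ~ Gamma(1/R, 1) then x = +/- w * y**(1/R) has the target density."""
        if rng is None:
            rng = np.random.default_rng()
        y = rng.gamma(1.0 / R, 1.0, size)
        sign = rng.choice([-1.0, 1.0], size)
        return sign * w * y ** (1.0 / R)

    rng = np.random.default_rng(1)
    x = sample_gen_exp(R=2.0, w=1.0, size=100_000, rng=rng)
    print(x.var())   # ~ 0.5 * w**2 for R = 2, matching (D.33)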
D.8 PDFs for Time Series
Given a signal $a = f(t)$ which is sampled uniformly over a time period $T$, its PDF $p(a)$ can be calculated as follows. Because the signal is uniformly sampled we have $p(t) = 1/T$. The function $f(t)$ acts to transform this density from one over $t$ to one over $a$. Hence, using the method for transforming PDFs, we get
$$ p(a) = \frac{p(t)}{\left| \frac{da}{dt} \right|} \qquad \text{(D.35)} $$
where $|\cdot|$ denotes the absolute value and the derivative is evaluated at $t = f^{-1}(a)$.
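A numerical illustration of (D.35), assuming NumPy and using the illustrative example signal $a = f(t) = t^2$, for which $p(a) = (1/T)/(2\sqrt{a})$:

    import numpy as np

    T = 1.0
    t = np.random.default_rng(2).uniform(0.0, T, 500_000)
    a = t ** 2                                  # example signal a = f(t) = t^2
    counts, edges = np.histogram(a, bins=50, range=(0.0, 1.0))
    hist = counts / (a.size * np.diff(edges))   # empirical density of a
    centres = 0.5 * (edges[:-1] + edges[1:])
    p_analytic = (1.0 / T) / (2.0 * np.sqrt(centres))  # (D.35) with da/dt = 2t
    print(np.max(np.abs(hist - p_analytic)[1:]))       # agrees away from a = 0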
D.8.1 Sampling
When we convert an analogue signal into a digital one the sampling process can have a crucial effect on the resulting density. If, for example, we attempt to sample uniformly but the sampling frequency is a multiple of the signal frequency we are, in effect, sampling non-uniformly. For true uniform sampling it is necessary that the ratio of the sampling and signal frequencies be irrational.
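The effect can be seen by sampling a sine wave at a rational versus an irrational frequency ratio (a sketch assuming NumPy; the ratios 4 and $\sqrt{2}$ are illustrative):

    import numpy as np

    n = 100_000
    # Ratio 4 (rational): the same few phases are revisited forever.
    rational = np.sin(2 * np.pi * np.arange(n) / 4)
    # Ratio sqrt(2) (irrational): the phases fill out the whole cycle.
    irrational = np.sin(2 * np.pi * np.arange(n) / np.sqrt(2))
    print(np.unique(np.round(rational, 6)).size)    # 3 distinct values: -1, 0, 1
    print(np.unique(np.round(irrational, 6)).size)  # many: approaches the true density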
where $\cos(t)$ is evaluated at $t = \sin^{-1}(a)$. The inverse sine is only defined for $-\pi/2 \le t \le \pi/2$ and $p(t)$ is uniform within this range. Hence, $p(t) = 1/\pi$. Therefore
$$ p(a) = \frac{1}{\pi \sqrt{1 - a^2}} \qquad \text{(D.37)} $$
This density is multimodal, having peaks at $+1$ and $-1$. For a more general sine wave
$$ a = R \sin(wt) \qquad \text{(D.38)} $$
we get $p(t) = w/\pi$ and
$$ p(a) = \frac{1}{\pi R \sqrt{1 - (a/R)^2}} \qquad \text{(D.39)} $$
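A numerical check of (D.39), assuming NumPy; the parameter values are illustrative:

    import numpy as np

    R, w = 2.0, 2 * np.pi
    t = np.random.default_rng(3).uniform(0.0, 1000.0, 1_000_000)
    a = R * np.sin(w * t)
    counts, edges = np.histogram(a, bins=60, range=(-1.9, 1.9))
    hist = counts / (a.size * np.diff(edges))   # empirical density of a
    centres = 0.5 * (edges[:-1] + edges[1:])
    p_analytic = 1.0 / (np.pi * R * np.sqrt(1.0 - (centres / R) ** 2))
    print(np.max(np.abs(hist - p_analytic)))    # small away from the peaks at +/-R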
Figure E.1: (a) 3D plot and (b) contour plot of a multivariate Gaussian PDF with $\mu = [1, 1]^T$, $\Sigma_{11} = \Sigma_{22} = 1$ and $\Sigma_{12} = \Sigma_{21} = 0.6$, i.e. a positive correlation of $r = 0.6$.
gives rise to a random vector $y$ with mean
$$ \mu_y = F \mu_x + C \qquad \text{(E.5)} $$
and covariance
$$ \Sigma_y = F \Sigma_x F^T \qquad \text{(E.6)} $$
If we generate another random vector, this time from a different linear transformation of $x$,
$$ z = G x + D \qquad \text{(E.7)} $$
then the covariance between the random vectors $y$ and $z$ is given by
$$ \Sigma_{y,z} = F \Sigma_x G^T \qquad \text{(E.8)} $$
The $(i,j)$th entry in this matrix is the covariance between $y_i$ and $z_j$.
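Equation (E.8) can be verified by simulation; a sketch assuming NumPy, with $F$, $G$ and $\Sigma_x$ chosen arbitrarily for illustration:

    import numpy as np

    rng = np.random.default_rng(4)
    Sigma_x = np.array([[1.0, 0.6], [0.6, 1.0]])
    F = np.array([[2.0, 0.0], [1.0, 3.0]])
    G = np.array([[0.5, -1.0]])
    x = rng.multivariate_normal([0.0, 0.0], Sigma_x, size=200_000)
    y = x @ F.T                       # y = F x (the offsets C, D do not affect covariances)
    z = x @ G.T                       # z = G x
    # Empirical cross-covariance vs the prediction F Sigma_x G^T
    emp = np.cov(np.hstack([y, z]).T)[:2, 2:]
    print(emp)
    print(F @ Sigma_x @ G.T)          # should agree to ~2 decimal places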
E.2.1 Entropy
The entropy is
$$ H(x) = \frac{1}{2} \log |\Sigma| + \frac{d}{2} \log 2\pi + \frac{d}{2} \qquad \text{(E.10)} $$
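As a check of (E.10), assuming SciPy; the covariance matrix is illustrative:

    import numpy as np
    from scipy.stats import multivariate_normal

    Sigma = np.array([[1.0, 0.6], [0.6, 1.0]])
    d = Sigma.shape[0]
    H_formula = (0.5 * np.log(np.linalg.det(Sigma))
                 + 0.5 * d * np.log(2 * np.pi) + 0.5 * d)
    print(H_formula, multivariate_normal(cov=Sigma).entropy())  # both ~ 2.6147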
E.2.2 Relative Entropy
For Normal densities $q(x) = N(x; \mu_q, \Sigma_q)$ and $p(x) = N(x; \mu_p, \Sigma_p)$ the KL-divergence is
$$ D[q||p] = \frac{1}{2} \log \frac{|\Sigma_p|}{|\Sigma_q|} + \frac{1}{2} \mathrm{Trace}\left(\Sigma_p^{-1} \Sigma_q\right) + \frac{1}{2} (\mu_q - \mu_p)^T \Sigma_p^{-1} (\mu_q - \mu_p) - \frac{d}{2} \qquad \text{(E.11)} $$
where $|\Sigma_p|$ denotes the determinant of the matrix $\Sigma_p$.
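A minimal sketch of (E.11), assuming NumPy; `kl_mvn` is an illustrative helper:

    import numpy as np

    def kl_mvn(mu_q, S_q, mu_p, S_p):
        """KL divergence (E.11) between multivariate normals q and p."""
        d = len(mu_q)
        S_p_inv = np.linalg.inv(S_p)
        diff = mu_q - mu_p
        return 0.5 * (np.log(np.linalg.det(S_p) / np.linalg.det(S_q))
                      + np.trace(S_p_inv @ S_q)
                      + diff @ S_p_inv @ diff
                      - d)

    I = np.eye(2)
    print(kl_mvn(np.zeros(2), I, np.zeros(2), I))      # 0.0 for identical densities
    print(kl_mvn(np.ones(2), I, np.zeros(2), 2 * I))   # positive otherwise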
$$ q(\pi) = \Gamma(\lambda_{tot}) \prod_{s=1}^{m} \frac{\pi_s^{\lambda_s - 1}}{\Gamma(\lambda_s)} \qquad \text{(E.13)} $$
defines a Dirichlet distribution over these parameters where
$$ \lambda_{tot} = \sum_s \lambda_s \qquad \text{(E.14)} $$
The mean value of $\pi_s$ is $\lambda_s / \lambda_{tot}$.
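The mean can be checked by simulation, assuming NumPy; the $\lambda$ values are illustrative:

    import numpy as np

    lam = np.array([2.0, 5.0, 3.0])                    # lambda_s
    samples = np.random.default_rng(5).dirichlet(lam, size=100_000)
    print(samples.mean(axis=0))                        # ~ lam / lam.sum() = [0.2, 0.5, 0.3]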