Transformation of Probability Densities
This Wikibook shows how to transform the probability density of a continuous random variable in both the one-dimensional and multidimensional case. In other words, it shows how to calculate the distribution of a function of continuous random variables. The first section formulates the general problem and provides its solution. However, this general solution is often quite hard to evaluate, and simplifications are possible in special cases, e.g., if the random vector is one-dimensional or if the components of the random vector are independent. Formulas for these special cases are derived in the subsequent sections. This Wikibook also aims to provide an overview of the different equations used in this field and to show their connections.
For the general case of a mapping $\vec f : \mathbb{R}^n \to \mathbb{R}^m$, $\vec Y = \vec f(\vec X)$, the density of $\vec Y$ follows from differentiating the joint distribution function:

$\mathbb{R}^n \to \mathbb{R}^m$ mapping:
$$\varrho_{\vec Y}(\vec y) = \frac{\partial^m}{\partial y_1 \cdots \partial y_m} \int_{\{\vec x \in \mathbb{R}^n \,|\, \vec f(\vec x) \le \vec y\}} \varrho_{\vec X}(\vec x)\, d^n x \tag{3}$$
The following sections will provide simplifications in special cases.
2 Function of a Random Variable (n=1, m=1)
If n=1 and m=1, $X$ is a continuously distributed random variable with the density $\varrho_X$, and $f : \mathbb{R} \to \mathbb{R}$ is a Borel-measurable function. Then $Y := f(X)$ is also continuously distributed, and we are looking for its density $\varrho_Y(y)$. In the following, $f$ is assumed to be at least differentiable.
Let us first note that there may be values that are never reached by $f$, e.g. $y < 0$ if $f(x) = x^2$. For all those $y$, necessarily $\varrho_Y(y) = 0$.

$$\varrho_Y(y) = \begin{cases} 0, & \text{if } y \notin f(\mathbb{R}) \\ ?, & \text{if } y \in f(\mathbb{R}) \end{cases}$$
Following equations (1) and (2), we obtain

$$\varrho_Y(y) = \frac{d}{dy} F_Y(y) = \frac{d}{dy} P(Y \le y) = \frac{d}{dy} P(f(X) \le y) \tag{4}$$
We will now rearrange this expression in various ways.
If $f$ is piecewise strictly monotone, we can split its domain at the $N_P$ points where the monotonicity changes, obtaining $N_P + 1$ intervals $I_j$ on each of which $f$ is invertible with inverse $f_{I_j}^{-1}$. Then

$$P(f(X) \le y) = \sum_{j=1}^{N_P+1} P\!\left(f_{I_j}(X) \le y\right)$$

and differentiating with respect to $y$ yields

$$\varrho_Y(y) = \frac{d}{dy}\, P(f(X) \le y) = \sum_{j=1}^{N_P+1} \varrho_X\!\left(f_{I_j}^{-1}(y)\right) \left| \frac{d f_{I_j}^{-1}(y)}{dy} \right| \tag{5}$$
With the convention that the sum over 0 addends is 0, and using the inverse function theorem, it is possible to write this in a more compact form (read as: sum over all $x$ where $f(x) = y$):

$\mathbb{R} \to \mathbb{R}$ mapping:
$$\varrho_Y(y) = \sum_{x:\,f(x)=y} \frac{\varrho_X(x)}{|f'(x)|} \tag{6}$$
2.3 Derivation using the Delta Distribution
In this section we consider another derivation, often used in physics.
We start again with equation (4) and write it as an integral, using the Heaviside step function $\Theta$ and the fact that its derivative is the delta distribution $\delta$:

$$\begin{aligned}
\varrho_Y(y) &= \frac{d}{dy}\, P(f(X) \le y) \\
&= \frac{d}{dy} \int_{\{x \in \mathbb{R} \,|\, f(x) \le y\}} \varrho_X(x)\, dx \\
&= \frac{d}{dy} \int_{\mathbb{R}} \Theta(y - f(x))\, \varrho_X(x)\, dx \\
&= \int_{\mathbb{R}} \frac{d}{dy}\, \Theta(y - f(x))\, \varrho_X(x)\, dx \\
&= \int_{\mathbb{R}} \delta(y - f(x))\, \varrho_X(x)\, dx
\end{aligned} \tag{7}$$

The intuitive interpretation of the last expression is: one integrates over all possible $x$-values and uses the delta function to pick out all positions where $y = f(x)$. This formula is often found in physics books, possibly written as an expectation value, $\langle \dots \rangle$:

$$\varrho_Y(y) = \langle \delta(y - f(X)) \rangle$$
We can see that this formula is equivalent to equation (6) using the following identity:

$$\int_{\mathbb{R}} \delta(h(x))\, g(x)\, dx = \sum_{x:\,h(x)=0} \frac{g(x)}{|h'(x)|}$$
2.4 Example
Let us consider the following specific example: let $\varrho_X(x) = \frac{1}{\sqrt{2\pi}} \exp[-x^2/2]$ (the standard normal density) and $f(x) = x^2$. We choose to use equation (6) (equations (5) and (7) lead to the same result). We calculate the derivative $f'(x) = 2x$ and find all $x$ for which $f(x) = y$: these are $-\sqrt{y}$ and $+\sqrt{y}$ if $y > 0$, and none otherwise. For $y > 0$ we have:

$$\varrho_Y(y) = \sum_{x:\,f(x)=y} \frac{\varrho_X(x)}{|f'(x)|} = \frac{\varrho_X(-\sqrt{y})}{|f'(-\sqrt{y})|} + \frac{\varrho_X(+\sqrt{y})}{|f'(+\sqrt{y})|} = \frac{\exp[-y/2]}{2\sqrt{2\pi}\sqrt{y}} + \frac{\exp[-y/2]}{2\sqrt{2\pi}\sqrt{y}} = \frac{\exp[-y/2]}{\sqrt{2\pi y}}$$
Since $f$ never reaches negative values, the sum remains 0 for $y < 0$, and we finally obtain:

$$\varrho_Y(y) = \begin{cases} 0, & \text{if } y \le 0 \\ \dfrac{\exp[-y/2]}{\sqrt{2\pi y}}, & \text{if } y > 0 \end{cases}$$
This example is illustrated in figure 1.
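The analytic result can also be cross-checked numerically in the spirit of figure 1: draw many standard normal samples, square them, and compare the normalized histogram with $e^{-y/2}/\sqrt{2\pi y}$. A minimal sketch in Python with NumPy (the sample size and binning are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
x = rng.standard_normal(1_000_000)   # X ~ N(0, 1)
y = x**2                             # Y = f(X) = X^2

# Empirical density: normalized histogram of the transformed samples.
# The range starts slightly above 0 to avoid the integrable singularity at y = 0.
counts, edges = np.histogram(y, bins=50, range=(0.01, 4.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Analytic density from equation (6): rho_Y(y) = exp(-y/2) / sqrt(2*pi*y)
rho = np.exp(-centers / 2) / np.sqrt(2 * np.pi * centers)

print(np.max(np.abs(counts - rho)))  # should be small, shrinking with more samples
```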
[Figure 1: (a) the parabola $y = x^2$; (b) the density $\varrho_X(x) = \frac{e^{-x^2/2}}{\sqrt{2\pi}}$; (c) the density $\varrho_Y(y) = \frac{e^{-y/2}}{\sqrt{2\pi y}}$.]
Figure 1: Random numbers are generated according to the standard normal distribution $X \sim N(0,1)$. They are shown on the x-axis of figure (a); many of them are around 0. Each of those numbers is mapped according to $y = x^2$, which is shown with grey arrows for two example points. For many generated realisations of $X$, a histogram on the y-axis will converge towards the wanted probability density function $\varrho_Y$ shown in (c). In order to derive this function analytically, we start by observing that for $Y$ to lie between some $v$ and $v + \Delta v$, $X$ must have been either between $-\sqrt{v + \Delta v}$ and $-\sqrt{v}$, or between $\sqrt{v}$ and $\sqrt{v + \Delta v}$. Those intervals are marked in figures (b) and (c). Because the probability is equal to the area under the probability density function, we can determine $\varrho_Y$ from the condition that the grey shaded area in (c) must be equal to the sum of the areas in (b). The areas are calculated using integrals, and it is useful to take the limit $\Delta v \to 0$ in order to get the formula noted in (c).
5
Another example is the inverse transformation method. Suppose a computer generates random numbers $X$ with a uniform distribution on $[0, 1]$, i.e.

$$\varrho_X(x) = \begin{cases} 1, & \text{if } 0 \le x \le 1 \\ 0, & \text{else} \end{cases}$$
[Figure 2: the distribution function $F_X$.]
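The idea of the method is that if $F$ is the desired (invertible) distribution function, then $Y = F^{-1}(X)$ has exactly the distribution function $F$, because $P(Y \le y) = P(X \le F(y)) = F(y)$ for uniform $X$. A minimal sketch in Python with NumPy, using the exponential distribution $F(y) = 1 - e^{-y}$ as an illustrative target (this concrete target is an assumption, not taken from the text above):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
u = rng.uniform(0.0, 1.0, size=1_000_000)  # X ~ Uniform[0, 1]

# Inverse transformation method: Y = F^{-1}(X) has distribution function F.
# For the exponential distribution F(y) = 1 - exp(-y), the inverse is
# F^{-1}(u) = -ln(1 - u).
y = -np.log(1.0 - u)

# Check against the exponential density rho_Y(y) = exp(-y):
counts, edges = np.histogram(y, bins=50, range=(0.0, 5.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(counts - np.exp(-centers))))  # should be close to 0
```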
3 Mapping of a Random Vector to a Random Variable (n>1, m=1)
We will now investigate the case where a random vector $\vec X$ with known density $\varrho_{\vec X}$ is mapped to a (scalar) random variable $Y$, and calculate the new density $\varrho_Y(y)$. According to (3), we find:

$$\varrho_Y(y) = \frac{d}{dy} \int_{\{\vec x \in \mathbb{R}^n \,|\, f(\vec x) \le y\}} \varrho_{\vec X}(\vec x)\, d^n x \tag{8}$$
The direct evaluation of this equation is sometimes the easiest way, e.g., if there is a known formula for the area or volume represented by the integral. Otherwise one needs to solve a parameter-dependent multiple integral.

If the components of the random vector $\vec X$ are independent, then the probability density factorizes:

$$\varrho_{\vec X}(x_1, \dots, x_n) = \varrho_{X_1}(x_1) \cdots \varrho_{X_n}(x_n)$$

In this case the delta function may provide a fast tool for the evaluation:

$$\varrho_Y(y) = \int_{\mathbb{R}^n} \delta(y - f(\vec x))\, \varrho_{X_1}(x_1) \cdots \varrho_{X_n}(x_n)\, d^n x \tag{9}$$
3.1 Examples
Let $Y = f(\vec X) = X_1 + X_2$ with independent, continuous random variables $X_1$ and $X_2$. According to equation (9), we have:

$$\varrho_Y(y) = \int_{\mathbb{R}} \varrho_{X_2}(x_2) \int_{\mathbb{R}} \varrho_{X_1}(x_1)\, \delta(y - x_2 - x_1)\, dx_1\, dx_2 = \int_{\mathbb{R}} \varrho_{X_2}(x_2)\, \varrho_{X_1}(y - x_2)\, dx_2$$
If $Y = X_1 X_2$ with independent $X_1$ and $X_2$, then

$$\varrho_Y(y) = \int_{\mathbb{R}} \varrho_{X_1}(x_1)\, \varrho_{X_2}\!\left(\frac{y}{x_1}\right) \frac{1}{|x_1|}\, dx_1.$$
If $Y = \dfrac{X_1}{X_2}$ with independent $X_1$ and $X_2$, then

$$\varrho_Y(y) = \int_{\mathbb{R}} \varrho_{X_1}(x_2 y)\, \varrho_{X_2}(x_2)\, |x_2|\, dx_2.$$
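As a quick numerical sanity check of the convolution formula for $Y = X_1 + X_2$ above: for two independent Uniform[0,1] variables, the convolution evaluates to the triangular density $\varrho_Y(y) = y$ for $0 \le y \le 1$ and $2 - y$ for $1 \le y \le 2$. A sketch in Python with NumPy (sample size and binning arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
x1 = rng.uniform(0, 1, 1_000_000)
x2 = rng.uniform(0, 1, 1_000_000)
y = x1 + x2                        # Y = X1 + X2

# The convolution of two Uniform[0,1] densities is the triangular density:
# rho_Y(y) = y for 0 <= y <= 1, and 2 - y for 1 <= y <= 2.
counts, edges = np.histogram(y, bins=40, range=(0.0, 2.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
rho = np.where(centers <= 1.0, centers, 2.0 - centers)
print(np.max(np.abs(counts - rho)))   # should be small
```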
As an example for the direct evaluation of equation (8), let $\vec X = (X_1, X_2)$ be uniformly distributed in the unit circle, i.e. $\varrho_{\vec X}(\vec x) = \frac{1}{\pi}$ for $x_1^2 + x_2^2 \le 1$ and 0 otherwise, and let $Y = \sqrt{X_1^2 + X_2^2}$ be the distance from the origin. The integral in equation (8) is then over a circle with radius $y \le 1$, hence with the area $\pi y^2$. This simplifies the calculation: for $0 \le y \le 1$,

$$\varrho_Y(y) = \frac{1}{\pi} \frac{d}{dy} \left(\pi y^2\right) = 2y.$$

If $y < 0$, we integrate over an empty set, which gives 0. If $y > 1$, the integral is constantly 1, since $\varrho_{\vec X} = 0$ outside the unit circle. Therefore, the final result is:

$$\varrho_Y(y) = \begin{cases} 2y, & \text{if } 0 \le y \le 1 \\ 0, & \text{otherwise} \end{cases}$$
This example is illustrated in figure 3.
[Figure 3: (a) sample points in the unit circle, subdivided into rings; (b) the density $\varrho_Y(y)$; (c) the area $A_v$ inside radius $v$; (d) the distribution function $F_Y(y)$.]
Figure 3: Figure (a) shows a sketch of random sample points from a uniform distribution in a circle of radius 1, subdivided into rings. In figure (b), we count how many of those points fell into each of the rings of equal width $\Delta v$. Since the area of the rings increases linearly with the radius, one can expect more points at larger radii. For $\Delta v \to 0$, the normalized histogram in (b) will converge to the wanted probability density function $\varrho_Y$. In order to calculate $\varrho_Y$ analytically, we first derive the cumulative distribution function $F_Y$, plotted in figure (d). $F_Y(v)$ is the probability to find a point inside the circle of radius $v$ (shown in grey in figure (c)). For $v$ between 0 and 1, we find $F_Y(v) = \frac{A_v}{A_{tot}} = \frac{\pi v^2}{\pi \cdot 1^2} = v^2$. The slope of $F_Y$ is the wanted probability density function $\varrho_Y(y) = \frac{dF_Y(y)}{dy} = 2y$, in agreement with figure (b).
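The construction described in the caption can be reproduced numerically: sample points uniformly in the unit circle (here by rejection from the enclosing square, one of several possible methods) and compare the histogram of the radii with $\varrho_Y(y) = 2y$. A sketch in Python with NumPy:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Sample uniformly in the unit circle by rejection from the square [-1, 1]^2.
pts = rng.uniform(-1.0, 1.0, size=(4_000_000, 2))
pts = pts[np.sum(pts**2, axis=1) <= 1.0]

y = np.sqrt(pts[:, 0]**2 + pts[:, 1]**2)   # Y = distance from the origin

counts, edges = np.histogram(y, bins=40, range=(0.0, 1.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(counts - 2.0 * centers)))  # analytic density: rho_Y(y) = 2y
```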
4 Invertible Transformation of a Random Vector (n=m)
Let $\vec X = (X_1, \dots, X_n)$ be a random vector with the density $\varrho_{\vec X}(x_1, \dots, x_n)$ and let $\vec f : \mathbb{R}^n \to \mathbb{R}^n$ be a diffeomorphism. For the density $\varrho_{\vec Y}$ of $\vec Y := \vec f(\vec X)$ we have:

$$\int_G \varrho_{\vec X}(\vec x)\, d^n x = \int_{\vec f(G)} \varrho_{\vec X}\!\left(\vec f^{-1}(\vec y)\right) \left| \frac{\partial(x_1, \dots, x_n)}{\partial(y_1, \dots, y_n)} \right| d^n y$$

and therefore

$\mathbb{R}^n \to \mathbb{R}^n$ mapping:
$$\varrho_{\vec Y}(\vec y) = \varrho_{\vec X}\!\left(\vec f^{-1}(\vec y)\right) \left| \frac{\partial(x_1, \dots, x_n)}{\partial(y_1, \dots, y_n)} \right|, \tag{10}$$

where $\frac{\partial(x_1, \dots, x_n)}{\partial(y_1, \dots, y_n)}$ is the Jacobian determinant of $\vec f^{-1}$. Note that the Jacobian matrix of $\vec f^{-1}$ is the inverse of the Jacobian matrix of $\vec f$. In the one-dimensional case (n=1), equation (10) coincides with equation (5).
4.0.1 Examples
Given a random vector $\vec X$, an invertible matrix $A$ and a vector $\vec b$, let $\vec Y = A\vec X + \vec b$. Then

$$\varrho_{\vec Y}(\vec y) = \varrho_{\vec X}\!\left(A^{-1}(\vec y - \vec b)\right) |\det A^{-1}|.$$

Also, $\det A^{-1} = 1/\det A$.
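Equation (10) for this linear example can be checked against a known closed form: for $\vec X \sim N(0, I)$, $\vec Y = A\vec X + \vec b$ is normal with mean $\vec b$ and covariance $AA^T$. A sketch in Python with NumPy (the concrete $A$, $\vec b$ and evaluation point are arbitrary choices):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])          # an invertible example matrix
b = np.array([1.0, -1.0])

def rho_X(x):
    # standard normal density in R^2
    return np.exp(-0.5 * x @ x) / (2 * np.pi)

def rho_Y(y):
    # equation (10): rho_Y(y) = rho_X(A^{-1}(y - b)) * |det A^{-1}|
    x = np.linalg.solve(A, y - b)
    return rho_X(x) / abs(np.linalg.det(A))

# Cross-check: Y = A X + b with X ~ N(0, I) is N(b, A A^T).
y = np.array([0.5, 0.2])
cov = A @ A.T
diff = y - b
direct = np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / (
    2 * np.pi * np.sqrt(np.linalg.det(cov)))
print(rho_Y(y), direct)   # both values agree
```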
5 Possible simplifications for multidimensional mappings (n>1, m>1)
Even if none of the above special cases apply, simplifications can still be possible. Some of them are listed below:
5.1.1 Example
Given the random vector $\vec X = (X_1, X_2, X_3, X_4)$ with independent components, let $\vec f : \mathbb{R}^4 \to \mathbb{R}^2$, $\vec Y = \vec f(\vec X) := (X_1 + X_2,\; X_3 + X_4)$. Obviously, the components $Y_1 = X_1 + X_2$ and $Y_2 = X_3 + X_4$ are independent, and therefore:

$$\varrho_{Y_1}(y_1) = \int_{\mathbb{R}} \varrho_{X_1}(y_1 - x_2)\, \varrho_{X_2}(x_2)\, dx_2 \quad \text{and} \quad \varrho_{Y_2}(y_2) = \int_{\mathbb{R}} \varrho_{X_3}(y_2 - x_4)\, \varrho_{X_4}(x_4)\, dx_4.$$

Note that the components of $\vec Y$ can be independent even if the components of $\vec X$ are not.
5.2.1 Example
To illustrate the idea, we use a simple $\mathbb{R}^n \to \mathbb{R}$ example: let $Y = X_1^2 + X_2^2 + X_3$ with

$$\varrho_{\vec X}(x_1, x_2, x_3) = \begin{cases} \frac{1}{\pi} e^{-x_3}, & \text{if } x_1^2 + x_2^2 \le 1 \text{ and } x_3 \ge 0 \\ 0, & \text{else} \end{cases}$$

The parametrisation of the region where at the same time $x_1^2 + x_2^2 + x_3 \le y$ and $x_1^2 + x_2^2 \le 1$ and $x_3 \ge 0$ may not be obvious, so we use the two above formulas:
$$\begin{aligned}
\varrho_Y(y) &= \int_{\mathbb{R}^3} \varrho_{\vec X}(\vec x)\, \delta(y - f(\vec x))\, dx_1\, dx_2\, dx_3 \\
&= \int_0^\infty \iint_{x_1^2 + x_2^2 \le 1} \frac{e^{-x_3}}{\pi}\, \delta(y - x_1^2 - x_2^2 - x_3)\, dx_1\, dx_2\, dx_3 \\
&= \int_0^\infty \iint_{x_1^2 + x_2^2 \le 1} \frac{e^{-x_3}}{\pi} \int_{\mathbb{R}} \delta(\rho - x_1^2 - x_2^2)\, \delta(y - \rho - x_3)\, d\rho\, dx_1\, dx_2\, dx_3 \\
&= \int_0^\infty \int_{\mathbb{R}} \left[ \iint_{x_1^2 + x_2^2 \le 1} \frac{e^{-x_3}}{\pi}\, \delta(\rho - x_1^2 - x_2^2)\, dx_1\, dx_2 \right] \delta(y - \rho - x_3)\, d\rho\, dx_3
\end{aligned}$$

Now we have split the integral such that the expression in brackets can be evaluated separately, because the integration region depends on $x_1$ and $x_2$ only and may contain $x_3$ only as a parameter:

$$\iint_{x_1^2 + x_2^2 \le 1} \frac{e^{-x_3}}{\pi}\, \delta(\rho - x_1^2 - x_2^2)\, dx_1\, dx_2 = \begin{cases} e^{-x_3}, & \text{if } 0 \le \rho \le 1 \\ 0, & \text{else} \end{cases}$$
Therefore:

$$\varrho_Y(y) = \int_0^\infty \int_0^1 e^{-x_3}\, \delta(y - \rho - x_3)\, d\rho\, dx_3 = \int_{\max(0,\,y-1)}^{\max(0,\,y)} e^{-x_3}\, dx_3 = \begin{cases} e^{1-y} - e^{-y}, & \text{if } y > 1 \\ 1 - e^{-y}, & \text{if } 0 \le y \le 1 \\ 0, & \text{if } y \le 0 \end{cases}$$
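The piecewise result can be verified by simulation: the density $\frac{1}{\pi}e^{-x_3}$ on the cylinder factorizes into a uniform distribution on the unit disk for $(X_1, X_2)$ and a standard exponential for $X_3$. A Monte Carlo sketch in Python with NumPy (sample size and binning arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# (X1, X2) uniform in the unit disk (density 1/pi), X3 exponential (density
# e^{-x3} for x3 >= 0), so the joint density is e^{-x3}/pi on the cylinder.
pts = rng.uniform(-1.0, 1.0, size=(4_000_000, 2))
pts = pts[np.sum(pts**2, axis=1) <= 1.0]
x3 = rng.exponential(1.0, size=len(pts))

y = pts[:, 0]**2 + pts[:, 1]**2 + x3       # Y = X1^2 + X2^2 + X3

counts, edges = np.histogram(y, bins=50, range=(0.0, 5.0), density=True)
c = 0.5 * (edges[:-1] + edges[1:])
rho = np.where(c <= 1.0, 1.0 - np.exp(-c), np.exp(1.0 - c) - np.exp(-c))
print(np.max(np.abs(counts - rho)))        # should be small
```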
5.3.1 Example
Given the random vector $\vec X = (X_1, X_2, X_3)$ with the density $\varrho_{\vec X}(\vec x)$ and the following mapping:

$$\begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \begin{pmatrix} X_1 \\ X_2 \\ X_3 \end{pmatrix}$$

Now we introduce the helper coordinate $Y_3 = X_3$, which results in the transformation matrix

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 0 & 0 & 1 \end{pmatrix}$$

with the corresponding pdf $\varrho_{\vec X}(A^{-1}\vec y)\, |\det A^{-1}|$. Thus, we finally obtain

$$\varrho_{\vec Y}(y_1, y_2) = \int_{\mathbb{R}} \varrho_{\vec X}\!\left(A^{-1} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}\right) |\det A^{-1}|\, dy_3.$$
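A small numerical sketch of the helper-coordinate construction in Python with NumPy; the standard normal choice for $\varrho_{\vec X}$ and the integration grid are illustrative assumptions, not part of the example above:

```python
import numpy as np

# The 2x3 mapping from the example, extended by the helper coordinate Y3 = X3:
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [0.0, 0.0, 1.0]])      # square and invertible: det A = -3
A_inv = np.linalg.inv(A)
jac = abs(np.linalg.det(A_inv))      # |det A^{-1}| = 1/3

def rho_X(x):
    # example density: standard normal in R^3 (an assumption for illustration)
    return np.exp(-0.5 * np.sum(x**2, axis=0)) / (2 * np.pi)**1.5

def rho_Y(y1, y2):
    # integrate the helper coordinate y3 out on a grid
    y3 = np.linspace(-30.0, 30.0, 4001)
    y = np.stack([np.full_like(y3, y1), np.full_like(y3, y2), y3])
    return np.trapz(rho_X(A_inv @ y) * jac, y3)

print(rho_Y(0.0, 0.0))   # density of (Y1, Y2) at the origin
```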
Remark: If the joint pdf $\varrho_{\vec Y}(y_1, y_2)$, i.e. the full dependence structure of $Y_1$ and $Y_2$, is not of interest and one is only interested in the marginal distributions, such as $\varrho_{Y_1}(y_1) = \int_{\mathbb{R}} \varrho_{\vec Y}(y_1, y_2)\, dy_2$, then it is possible to calculate the density as described in the section Mapping of a Random Vector to a Random Variable for the mapping $Y_1 = 1 \cdot X_1 + 2 \cdot X_2 + 3 \cdot X_3$ (and likewise for $Y_2 = 4 \cdot X_1 + 5 \cdot X_2 + 6 \cdot X_3$).
6 Real-World Applications
In order to show some possible applications, we present the following questions, which can be answered using the techniques outlined in this Wikibook. In principle, the answers could also be approximated using a numerical random number simulation: generate several realizations of $\vec X$, calculate $\vec Y = \vec f(\vec X)$ and make a histogram of the results. However, many such random numbers are needed for reasonable results, especially for higher-dimensional random vectors. Fortunately, we can always calculate the resulting distribution analytically using the above formulas.
Suppose we consider the values of one gram of gold, silver and platinum one year from now as independent random variables $G$, $S$ and $P$, respectively. Box A contains 1 gram of gold, 2 grams of silver and 3 grams of platinum; box B contains 4, 5 and 6 grams, respectively. Thus,

$$\begin{pmatrix} A \\ B \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \begin{pmatrix} G \\ S \\ P \end{pmatrix}.$$

What is the value of the contents of box A (or box B) one year from now? (The answer is given in an example above.) Note that $A$ and $B$ are correlated.
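This is exactly the kind of question the numerical simulation mentioned above can approximate. A sketch in Python with NumPy; the distributions chosen for $G$, $S$ and $P$ are purely illustrative assumptions, since the text does not specify them:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 1_000_000

# Hypothetical one-year values of one gram of each metal (the distributions
# are illustrative assumptions only, not part of the original question):
G = rng.normal(60.0, 5.0, n)    # gold
S = rng.normal(0.8, 0.1, n)     # silver
P = rng.normal(30.0, 4.0, n)    # platinum

# Contents of the two boxes:
A = 1 * G + 2 * S + 3 * P
B = 4 * G + 5 * S + 6 * P

print(A.mean(), A.std())           # approximate distribution of box A's value
print(np.corrcoef(A, B)[0, 1])     # A and B are strongly correlated
```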