- The concept of probability density functions (pdf)
- Bayes theorem
- States of information
- Shannon's information content
- Combining states of information
- The solution to inverse problems
Albert Tarantola
This lecture follows Tarantola, Inverse Problem Theory, pp. 1-88.
For mutually exclusive subsets $A_i$ of a space $X$, a probability $P$ satisfies

P\Big(\bigcup_i A_i\Big) = \sum_i P(A_i)
Example: a coin toss with the two possible outcomes Head and Tail.
For a region $A \subset X$ the probability follows from the probability density $f(x)$:

P(A) = \int_A f(x)\, dx, \qquad dx = dx_1\, dx_2\, dx_3 \ldots
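As a quick numerical illustration (not from Tarantola's text), P(A) can be approximated on a grid; the Gaussian pdf and the interval A = [1, 2] below are arbitrary choices:

import numpy as np

x = np.linspace(-6.0, 6.0, 2001)                # discretized 1-D parameter space X
f = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)  # example pdf: standard normal density

mask = (x >= 1.0) & (x <= 2.0)                  # the region A = [1, 2]
P_A = np.trapz(f[mask], x[mask])                # P(A) ~ integral of f over A
print(P_A)                                      # about 0.136 for this choice of f and A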
Marginal probabilities

f_Y(y) = \int_X f(x, y)\, dx
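A small grid-based sketch of marginalization (illustrative only; the joint pdf below is an arbitrary Gaussian):

import numpy as np

x = np.linspace(-5.0, 5.0, 401)
y = np.linspace(-5.0, 5.0, 401)
X, Y = np.meshgrid(x, y, indexing="ij")

f_xy = np.exp(-0.5 * (X**2 + Y**2)) / (2.0 * np.pi)  # example joint pdf f(x, y)

f_y = np.trapz(f_xy, x, axis=0)   # integrate over x -> marginal density f_Y(y)
print(np.trapz(f_y, y))           # about 1.0: the marginal is again normalized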
Conditional probabilities

f_{Y|X}(y \mid x_0) = \frac{f(y, x_0)}{\int f(y, x_0)\, dy}

or

f_{X|Y}(x \mid y_0) = \frac{f(x, y_0)}{\int f(x, y_0)\, dx}
Bayes Theorem

f(x, y) = f_{X|Y}(x \mid y)\, f_Y(y)

or

f(x, y) = f_{Y|X}(y \mid x)\, f_X(x)

Bayes' theorem gives the probability for event y to happen given event x:

f_{Y|X}(y \mid x) = \frac{f_{X|Y}(x \mid y)\, f_Y(y)}{\int f_{X|Y}(x \mid y)\, f_Y(y)\, dy}
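A hedged discrete sketch of Bayes' theorem (the prior and likelihood values are made up for illustration):

prior = {"y1": 0.3, "y2": 0.7}        # f_Y(y): assumed prior over two hypotheses
likelihood = {"y1": 0.8, "y2": 0.1}   # f_{X|Y}(x_obs | y): assumed likelihood of the observation

evidence = sum(likelihood[y] * prior[y] for y in prior)              # normalizing denominator
posterior = {y: likelihood[y] * prior[y] / evidence for y in prior}  # f_{Y|X}(y | x_obs)
print(posterior)                      # {'y1': ~0.774, 'y2': ~0.226}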
State of information

To be more formal: let P denote the probability for a given state of information on the parameter space X, and let f(x) be the corresponding probability density:

P(A) = \int_A f(x)\, dx

f_Y(y) = \int_X f(x, y)\, dx
f(x) = \delta(x - x_0)

where \delta(\cdot) represents the Dirac delta function and

\int \delta(x - x_0)\, dx = 1

This state (exact knowledge of the value x_0) is only useful in the sense that sometimes one parameter, compared with the others, is associated with much less error.
The homogeneous (null-information) state of information is described by the non-informative density \mu(x):

M(A) = \int_A \mu(x)\, dx
Claude Shannon (1916-2001)

H = -\sum_i p_i \log p_i
Two events: Event 1 with probability p, Event 2 with probability q = 1 - p, so that

H = -p \log p - (1 - p) \log(1 - p)

which is largest for p = q = 1/2.
H = -\sum_i p_i \log p_i

The unit of information depends on the base of the logarithm:

- \log_2 -> bits, bytes
- \log_e -> neps
- \log_{10} -> digits

-> high order: small thermodynamic entropy, small Shannon entropy, not a lot of information
-> disorder: large thermodynamic entropy, large Shannon entropy, a wealth of information
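A minimal sketch of Shannon's measure (the probability vectors are arbitrary examples):

import numpy as np

def shannon_entropy(p, base=2.0):
    """H = -sum p_i log p_i; base 2 gives bits, e gives neps, 10 gives digits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                       # terms with p_i = 0 contribute nothing
    return -np.sum(p * np.log(p)) / np.log(base)

print(shannon_entropy([0.5, 0.5]))     # 1.0 bit: a fair coin (Head/Tail)
print(shannon_entropy([0.99, 0.01]))   # about 0.08 bits: high order, little information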
H(f, \mu) = \int f(x) \log \frac{f(x)}{\mu(x)}\, dx

is called the information content of f(x), where \mu(x) represents the state of null information.

Finally: what is the information content of your name? (Use the frequency of letters in German and H = -\sum_i p_i \log p_i.)
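A hedged sketch of the name exercise: sum the surprisal -log2 p(letter) over the letters of a name. The frequency values below are rough illustrative numbers, not an authoritative table of German letter frequencies:

import math

freq = {"e": 0.17, "n": 0.10, "i": 0.08, "s": 0.07, "r": 0.07, "a": 0.07,
        "t": 0.06, "d": 0.05, "h": 0.05, "u": 0.04, "l": 0.03, "o": 0.03}

def name_information(name, freq=freq, fallback=0.01):
    """Information content of a name in bits: sum of -log2 p for each letter."""
    letters = [c for c in name.lower() if c.isalpha()]
    return sum(-math.log2(freq.get(c, fallback)) for c in letters)

print(name_information("Tarantola"))   # a few bits per letter, summed over the name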
With basic principles from mathematical logic it can be shown that, given two states of information f_1(x) and f_2(x) (e.g. from two data sets, two experiments, etc.), the combination of the two sources of information (with a logical "and") comes down to

\sigma(x) = \frac{f_1(x)\, f_2(x)}{\mu(x)}

This is called the conjunction of states of information (Tarantola and Valette, 1982). Here \mu(x) is the non-informative pdf and \sigma(x) will turn out to be the a posteriori probability density function.
This equation is the basis for probabilistic inverse problems:
We will proceed to combine information obtained from measurements with information
from a physical theory.
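A minimal numerical sketch of the conjunction (assumptions: a 1-D parameter, a uniform non-informative density mu(x), and two Gaussian states of information chosen arbitrarily):

import numpy as np

x = np.linspace(-5.0, 15.0, 2001)
mu = np.ones_like(x)                          # non-informative pdf, assumed uniform
f1 = np.exp(-0.5 * ((x - 4.0) / 1.5) ** 2)    # state of information 1 (e.g. experiment 1)
f2 = np.exp(-0.5 * ((x - 6.0) / 1.0) ** 2)    # state of information 2 (e.g. experiment 2)

sigma = f1 * f2 / mu                          # conjunction sigma(x) = f1 f2 / mu
sigma /= np.trapz(sigma, x)                   # normalize to a pdf
print(x[np.argmax(sigma)])                    # combined estimate lies between 4 and 6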
d^{\mathrm{cal}} = g(m)
Examples:
- ground displacements for an earthquake source and a given earth model
- travel times for a regional or global earth model
- polarities and amplitudes for a given source radiation pattern
- magnetic polarities for a given plate tectonic model and field reversal history
- shaking intensity map for a given earthquake and model
- ...

But: our modeling may contain errors, or may not be the right physical theory. How can we take this into account?
Following the previous ideas, the most general way of describing information from a physical theory is to define, for given values of the model m, a probability density over the data space, i.e. a conditional probability density denoted \theta(d \mid m).

Examples:

1. An exact theory: \theta(d \mid m) = \delta(d - g(m))
2. An inexact theory, with the density spread around the prediction d = g(m).
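One common way to model an inexact theory is a Gaussian density over the data space centered on the prediction; the forward model g and the error s below are purely illustrative assumptions, not taken from the lecture:

import numpy as np

def g(m):
    """Hypothetical forward model: predicted data for model parameter m."""
    return 2.0 * m + 1.0

def theta(d, m, s=0.5):
    """theta(d | m) for an inexact theory with modeling error s."""
    return np.exp(-0.5 * ((d - g(m)) / s) ** 2) / (np.sqrt(2.0 * np.pi) * s)

d = np.linspace(0.0, 20.0, 401)
print(np.trapz(theta(d, m=4.0), d))   # about 1: a proper density over the data space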
\Theta(d, m) summarized:

The expected correlations between model and data space can be described using the joint density function \Theta(d, m). When there is an inexact physical theory (which is always the case), the theoretical probability density is given by \Theta(d, m) = \theta(d \mid m)\, \mu(m).

Graphically, this may for example amount to putting error bars around the predicted data d = g(m).
(Figures: \theta(d \mid m) for good data and for noisy data, illustrating the associated uncertainty.)
Nonlinear Inverse Problems
Example:
We observe the maximum acceleration (data d) at a given site as a function of earthquake magnitude (model m). We expect earthquakes to have a magnitude smaller than 9 and larger than 4 (because the accelerometer would not trigger below that). We also expect the maximum acceleration not to exceed 18 m/s^2 and not to be below 5 m/s^2.
(Figure: the joint prior \rho(d, m) in the magnitude-acceleration plane, with marginals \rho_M(m) over magnitude and \rho_D(d) over acceleration in m/s^2.)
Probability and information
\sigma(d, m) = k\, \frac{\rho(d, m)\, \theta(d, m)}{\mu(d, m)}

where \rho(d, m) is the prior information (from measurements and a priori model information), \theta(d, m) the information from the physical theory, \mu(d, m) the non-informative (null-information) density, and k a normalization constant; \sigma(d, m) is the a posteriori probability density, the solution to the inverse problem.

Let's try and look at this graphically.
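Before the graphical discussion, here is a hedged numerical sketch of this example on a grid: boxcar priors on magnitude (4 to 9) and acceleration (5 to 18 m/s^2), an ASSUMED illustrative forward model g(m) (the lecture does not specify one), Gaussian theoretical uncertainty, and a constant mu, so that sigma(d, m) is proportional to rho(d, m) theta(d, m):

import numpy as np

m = np.linspace(3.0, 10.0, 351)                  # model space: magnitude
d = np.linspace(0.0, 25.0, 501)                  # data space: peak acceleration (m/s^2)
D, M = np.meshgrid(d, m, indexing="ij")

rho_M = ((M > 4.0) & (M < 9.0)).astype(float)    # prior information on the model
rho_D = ((D > 5.0) & (D < 18.0)).astype(float)   # prior information on the data
rho = rho_D * rho_M                              # joint prior rho(d, m)

def g(mag):
    """Purely illustrative forward model: predicted acceleration vs. magnitude."""
    return 0.3 * np.exp(0.45 * mag)

s = 2.0                                          # assumed modeling/observation error (m/s^2)
theta = np.exp(-0.5 * ((D - g(M)) / s) ** 2)     # theoretical information theta(d, m)

sigma = rho * theta                              # mu constant; k fixed by normalization below
sigma /= sigma.sum() * (d[1] - d[0]) * (m[1] - m[0])

sigma_M = np.trapz(sigma, d, axis=0)             # posterior marginal over the model space
print(m[np.argmax(sigma_M)])                     # most probable magnitude under these assumptions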
Summary