
Lecture Notes

Adjustment Theory
Nico Sneeuw
Geodätisches Institut
Universität Stuttgart
November 4, 2007

© Nico Sneeuw, 2006



These are lecture notes in progress. Please contact me (sneeuw@gis.uni-stuttgart.de)
for remarks, errors, suggestions, etc.

Contents

1 Introduction
  1.1 Adjustment theory: a first look
  1.2 Historical development

2 Least squares adjustment
  2.1 Adjustment with observation equations
  2.2 Adjustment with condition equations
  2.3 Synthesis

3 Generalizations
  3.1 Higher dimensions: the A-model
  3.2 Higher dimensions: the B-model
  3.3 The datum problem
  3.4 Linearization of non-linear observation equations

4 Weighted least squares
  4.1 Weighted observation equations
    4.1.1 Geometry
    4.1.2 Übertragung auf Ausgleichungsrechnung
    4.1.3 Higher dimensions
  4.2 Weighted condition equations
  4.3 Stochastics
  4.4 Best Linear Unbiased Estimation (BLUE)

5 Geomatics examples & mixed models
  5.1 Ebenes Dreieck
  5.2 Linearisierung von Richtungsbeobachtungen
  5.3 Linearisierung von Bedingungsgleichungen
  5.4 Bogenschnitt im Raum mit zusätzlichen Höhenwinkeln
  5.5 Polynomausgleich (mit Nebenbedingungen)
  5.6 Mixed model

6 Statistics
  6.1 Expectation of sum of squared residuals
  6.2 Basics
  6.3 Hypotheses
  6.4 Distributions

7 Statistical Testing
  7.1 Global model test: a first approach
  7.2 Testing procedure
  7.3 DIA-Testprinciple
  7.4 Internal reliability
  7.5 External reliability
  7.6 Reliability: a synthesis

8 Recursive estimation
  8.1 Partitioned model
    8.1.1 Batch / offline / Stapel / standard
    8.1.2 Rekursiv / sequentiell / real-time
    8.1.3 Umformen
    8.1.4 Formulierung nach Bedingungsgleichungen
  8.2 Allgemeiner

A Partitioning
  A.1 Inverse Partitioning Method
  A.2 Partitioning

B Buchempfehlungen
  B.1 Wissenschaftliche Bücher
  B.2 Populärwissenschaftliche Bücher, Literatur

1 Introduction
Adjustment theory deals with the optimal combination of redundant measurements together with the estimation of unknown parameters.
(Teunissen, 2000)

1.1 Adjustment theory: a first look

To understand the purpose of adjustment theory consider the following simple high-school
example that is supposed to demonstrate how to solve for unknown quantities. In case
0 the price of apples and pears is determined after doing groceries twice. After that we
will discuss more interesting shopping scenarios.
Case 0)

Two equations in two unknowns:

    3 apples + 4 pears = 5.00 €
    5 apples + 2 pears = 6.00 €

or

    5 = 3x_1 + 4x_2
    6 = 5x_1 + 2x_2

As matrix-vector system:

    \begin{pmatrix} 5 \\ 6 \end{pmatrix} =
    \begin{pmatrix} 3 & 4 \\ 5 & 2 \end{pmatrix}
    \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}

In terms of linear algebra: y = Ax.
The determinant of matrix A reads det A = 3·2 − 5·4 = −14. Thus the above linear
system can be inverted:

    x = A^{-1} y :
    \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} =
    \frac{1}{-14} \begin{pmatrix} 2 & -4 \\ -5 & 3 \end{pmatrix}
    \begin{pmatrix} 5 \\ 6 \end{pmatrix} =
    \begin{pmatrix} 1 \\ 0.5 \end{pmatrix}

So each apple costs 1 € and each pear 50 cents. The price can be determined because
there are as many unknowns (the price of apples and the price of pears) as there are
observations (shopping twice). The square and regular matrix A is invertible.
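For readers who want to reproduce the toy computation, here is a minimal Python/NumPy
sketch of case 0 (the quantities and prices are the ones used above):

    import numpy as np

    # Case 0: two shopping trips, two unknown prices (apples and pears)
    A = np.array([[3.0, 4.0],
                  [5.0, 2.0]])      # quantities bought on the two trips
    y = np.array([5.0, 6.0])        # amounts paid (euro)

    print(np.linalg.det(A))         # -14 (up to rounding): A is square and regular
    x = np.linalg.solve(A, y)       # same as A^{-1} y
    print(x)                        # [1.  0.5] -> 1 euro per apple, 0.50 euro per pear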

Ausgleichungsrechnung

Remark 1.1 (terminology) The left-hand vector y contains the observations. The vector
x contains the unknown parameters. The two vectors are linked through the design
matrix A. The linear model y = Ax is known as the model of observation equations.
The following cases demonstrate that determining unknowns from observations is not as
straightforward as it may seem from the above example.
Case 1a)

If one buys twice as many apples and pears the second time, and has to pay twice as
much as well, no new information is added to the system of linear equations:

    3a + 4p = 5 €
    6a + 8p = 10 €

    \begin{pmatrix} 5 \\ 10 \end{pmatrix} =
    \begin{pmatrix} 3 & 4 \\ 6 & 8 \end{pmatrix}
    \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}

The matrix A has linearly dependent columns (and rows), i.e. it is singular. Correspondingly
det A = 0 and the inverse A^{-1} does not exist. The observations (5 € and 10 €) are
consistent, but the vector x of unknowns (price per apple or pear) cannot be determined.
This situation will return later with the so-called datum problems. Seemingly trivial, case
1a) is of fundamental importance.
Case 1b)
Suppose the same shopping scenario as above, but now one needs to pay 8 € the second
time:

    y = \begin{pmatrix} 5 \\ 8 \end{pmatrix}
In this alternative scenario, the matrix is still singular and x cannot be determined. But
worse still, the observations y are inconsistent with the linear model. Mathematically,
they do not fulfil the compatibility conditions. In data analysis inconsistency is not
necessarily a weakness. In fact, it may add information to the linear system. It might
indicate observation errors (in y), for instance a miscalculation of the total grocery
bill. Or it might indicate an error in the linear model: the prices may have changed in
between, which leads to a different A.
Case 2)

We go back to the consistent and invertible case 0. Suppose a third combination of
apples and pears gives an inconsistent result:

    \begin{pmatrix} 5 \\ 6 \\ 3 \end{pmatrix} =
    \begin{pmatrix} 3 & 4 \\ 5 & 2 \\ 1 & 2 \end{pmatrix}
    \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}


The third row is inconsistent with x_1 = 1, x_2 = 1/2 from case 0. But one can equally
maintain that the first row is inconsistent with the second and third. In short, we have
redundant and inconsistent information: the number of observations (m = 3) is larger
than the number of unknowns (n = 2). Consequently, matrix A is not a square matrix.
Although a standard inversion is not possible anymore, redundancy is a positive characteristic in engineering disciplines. In data analysis redundancy provides information
on the quality of the observations, it strengthens the estimation of the unknowns and
allows us to perform statistical tests. Thus, redundancy provides a handle to quality
control.
But obviously the inconsistencies have to be eliminated. This is done by spreading them
out in an optimal way. This is the task of adjustment: to combine redundant and
inconsistent data in an optimal way. Two main questions will be addressed in the first
part of this course:
How to combine inconsistent data optimally?
Which criterion defines what optimal is?
Errors
The inconsistencies may be caused by model errors. If the green grocer changed his prices
between two rounds of shopping we need to introduce new parameters. In surveying,
however, the observation models are usually well-defined, e.g. the sum of angles in a
plane triangle equals π. So usually the inconsistencies arise from observation errors. To
make the linear system y = Ax consistent again, we need to introduce an error vector e
with the same dimension as the observation vector:

    y = A x + e ,        y, e: (m×1),  A: (m×n),  x: (n×1).                    (1.1)

Errors go under several names: inconsistencies, residuals, improvements, deviations,
discrepancies, and so on.
Remark 1.2 (sign convention) In many textbooks the error vector is put at the same
side of the equation as the observations: y + e = Ax. Where to put the e-vector is rather
a philosophical question. Practically, though, one should be aware of the definitions
used, how the sign of e is defined.
Three different types of errors are usually identified:
i) Gross error, also known as blunder or outlier (German: grober Fehler).
ii) Systematic error, or bias (systematischer Fehler).
iii) Random error (Zufallsfehler).

These types are visualized in fig. 1.1. In this figure, one can think of the marks left
behind by the arrow points in a game of darts, in which one attempts to aim at the
bull's eye.

Figure 1.1: Different types of errors: (a) gross error, (b) systematic error, (c) random error.

Whatever the type, errors are stochastic quantities (German: Zufallsvariable = random
variable). Thus, the vector e is an m-dimensional stochastic variable. The vector of
observations is consequently also a stochastic variable. Such quantities will be underlined,
if necessary:

    y = Ax + e .

Nevertheless, it will be assumed in the sequel that e is drawn from a distribution of
random errors.

1.2 Historical development


The question how to combine redundant and inconsistent data has been treated in
many different ways in the past. To compare the different approaches, the following
mathematical framework is used:
    observation model:   y = A x
    combination:         L y = L A x ,     L: (n×m)
    inversion:           x = (LA)^{-1} L y = B y

From a modern viewpoint matrix B is a left-inverse of A because BA = I. Note that


such a left-inverse is not unique, as it depends on the choice of the combination matrix
L.



Method of selected points (before 1750)

A simple way out of the overdetermined problem is to select only as many observations
(points) as there are unknowns. The remaining unused observations may be used to
validate the estimated result. This is the so-called method of selected points. Suppose
one uses only the first n observations. Then:

    L = [ I   0 ] ,       I: (n×n),  0: (n×(m−n)).

The trouble with this approach, obviously, is the arbitrariness of the choice of n
observations. There are \binom{m}{n} possible choices.

From a modern perspective the method of selected points resembles the principle of
cross-validation. The idea of this principle is to deliberately leave out a limited number
of observations during the estimation and to use the estimated parameters to predict
values for those observations that were left out. A comparison between actual and
predicted observations provides information on the quality of the estimated parameters.
Method of averages (ca. 1750)

In 1714 the British government offered the Longitude Prize for the precise determination
of a ship's longitude. Tobias Mayer's¹ approach was to determine longitude, or rather
time, through the motion of the moon. In the course of his investigations he needed
to determine the libration of the moon through measurements of the lunar surface (craters).
This led him to overdetermined systems of observation equations:

    y = A x ,       y: (27×1),  A: (27×3),  x: (3×1).

Mayer called them equations of conditions, which is, from today's viewpoint, an
unfortunate designation.

Mayer's adjustment strategy:
  - distribute the observations into three groups,
  - sum up the equations within each group,
  - solve the resulting 3×3 system.

This corresponds to the (3×27) combination matrix

    L = \begin{pmatrix} 1 \cdots 1 & 0 \cdots 0 & 0 \cdots 0 \\
                        0 \cdots 0 & 1 \cdots 1 & 0 \cdots 0 \\
                        0 \cdots 0 & 0 \cdots 0 & 1 \cdots 1 \end{pmatrix} ,

each row containing nine ones.

Tobias Mayer (17231762) made the breakthrough that enabled the lunar distance method to become
a practicable way of finding longitude at sea. As a young man, he displayed an interest in cartography and mathematics. In 1750, he was appointed professor in the Georg-August Academy in
G
ottingen, where he was able to devote more time to his interests in lunar theory and the longitude
problem. From 1751 to 1755, he had an extensive correspondence with Leonhard Euler, whose work
on differential equations enabled Mayer to calculate lunar distance tables.

Mayer actually believed each aggregate of 9 observations to be 9 times more precise
than a single observation. Today we know that the factor should be √9 = 3.
Euler's attempt (1749)

Leonhard Euler²

Background:
  - orbital motion of Saturn under the influence of Jupiter,
  - stability of the solar system,
  - prize (1748) of the Academy of Sciences, Paris,
  - 75 observations from the years 1582–1745; 6 unknowns → gave up!
  - Euler, as a mathematician, thought in terms of error bounds.
Laplace's attempt (ca. 1787)

Laplace³

Background: Saturn again.
Reformulated: 4 unknowns.
Best data: 24 observations.
Approach: like Mayer, but with other combinations:

    y = A x ,          y: (24×1),  A: (24×4),  x: (4×1)
    L y = L A x ,      L: (4×24)
    x = (LA)^{-1} L y

² Leonhard Euler (1707–1783) was a Swiss mathematician and physicist. He is considered to be one of
the greatest mathematicians who ever lived. Euler was the first to use the term function (defined by
Leibniz in 1694) to describe an expression involving various arguments, i.e. y = F(x). He is credited
with being one of the first to apply calculus to physics.
³ Pierre-Simon, Marquis de Laplace (1749–1827) was a French mathematician and astronomer who put
the final capstone on mathematical astronomy by summarizing and extending the work of his
predecessors in his five-volume Mécanique Céleste (Celestial Mechanics) (1799–1825). This masterpiece
translated the geometrical study of mechanics used by Newton to one based on calculus, known as
physical mechanics. He is also the discoverer of Laplace's equation and the Laplace transform, which
appear in all branches of mathematical physics, a field he took a leading role in forming. He became
count of the Empire in 1806 and was named a marquis in 1817 after the restoration of the Bourbons.
Pierre-Simon Laplace was among the most influential scientists in history.


Laplace's combination matrix L is a (4×24) matrix with entries +1, −1 and 0, i.e. each
observation enters each of the four combinations with coefficient +1, −1 or 0.
Method of least absolute deviations (1760)

Roger Boscovich⁴

  - ellipticity of the Earth,
  - 5 observations (Quito, Cape Town, Rome, Paris, Lapland),
  - 2 unknowns.

The meridian radius of curvature is

    M(φ) = a (1 − e²) / (1 − e² sin²φ)^{3/2}
         = a (1 − e²) (1 + (3/2) e² sin²φ + ...) ,

    M(0) = a (1 − e²) < a ,
    M(π/2) = a (1 − e²) / (1 − e²)^{3/2} = a / \sqrt{1 − e²} > a ,

which leads to observation equations of the type

    M = x_1 + sin²φ · x_2 .

First attempt: all \binom{5}{2} = 5!/(2!·3!) = 10 systems of 2×2 equations, 10 solutions,
and a comparison of the results. His result: gross variations of the ellipticity → reject
the ellipsoidal hypothesis.
Second attempt: the mean deviation (or sum of deviations) should be zero,

    Σ_{i=1}^{5} e_i = 0 ,

and the sum of absolute deviations should be minimum:

    Σ_{i=1}^{5} |e_i| = min .

⁴ Rudjer Josip Bošković, aka Roger Boscovich (1711–1787), was a Croatian Jesuit, mathematician and
innovative physicist; he was also active in astronomy, natural philosophy and poetry, and worked as a
technician and geodesist.

This is an objective adjustment criterion, although its implementation is mathematically
difficult. This is the approach of L1-norm minimization.
Method of least squares (1805)
(German: Methode der kleinsten Quadrate)

In 1805 Legendre⁵ published his method of least squares (in French: moindres carrés).
The name least squares refers to the fact that the sum of squared residuals is minimized.
Legendre developed the method for the determination of orbits of comets and to derive
the Earth's ellipticity. As will be derived in the next chapter, the matrix L will be the
transpose of the design matrix A:

    E = Σ_i e_i² = e^T e = (y − Ax)^T (y − Ax) = min over x

    ⇒   L = A^T
    ⇒   x̂ = (A^T A)^{-1} A^T y ,       A^T: (n×m),  y: (m×1),  x̂: (n×1),  A^T A: (n×n).

After Legendre's publication Gauss stated that he had already developed and used the
method of least squares in 1794. He published his own theory only several years later.
A bitter argument over the scientific priority broke out. Nowadays it is acknowledged
that Gauss's claim of priority is very likely valid, but that he refrained from publication
because he found his results still premature.

⁵ Adrien-Marie Legendre (1752–1833) was a French mathematician. He made important contributions
to statistics, number theory, abstract algebra and mathematical analysis.

2 Least squares adjustment

Legendre's method of least squares is actually not a method. Rather, it provides the
criterion for the optimal combination of inconsistent data: combine the observations
such that the sum of squared residuals is minimal. It was seen already that this criterion
defines the combination matrix L:

    L y = L A x   ⇒   x̂ = (LA)^{-1} L y .

But what is so special about L = A^T? In this chapter we will derive the equations of
least squares adjustment from several mathematical viewpoints:
  - geometry: smallest distance (Pythagoras),
  - linear algebra: orthogonality between the optimal ê and the columns of A: A^T ê = 0,
  - calculus: minimizing a target function → differentiation,
  - probability theory: BLUE (Best Linear Unbiased Estimate).

These viewpoints are elucidated by a simple but fundamental example in which a distance
is measured twice.

2.1 Adjustment with observation equations

We will start with the model of the introduction, y = Ax. This is the model of observation
equations, in which observations are linearly related to unknowns.

Suppose that, in order to determine a certain distance, it is measured twice. Let the
unknown distance be x and the observations y_1 and y_2:

    y_1 = x
    y_2 = x
    ⇒   \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} =
        \begin{pmatrix} 1 \\ 1 \end{pmatrix} x
    ⇒   y = a x                                                                (2.1)

If y_1 = y_2 the equations are consistent and the parameter x clearly solvable: x = y_1 = y_2.
If, on the other hand, y_1 ≠ y_2, the equations are inconsistent and x is not solvable directly.

(German terms: vermittelnde Ausgleichung = adjustment with observation equations;
direkte Beobachtungen = direct observations)

Given a limited measurement precision the latter scenario will be more likely. Let's
therefore take measurement errors e into account:

    \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} =
    \begin{pmatrix} 1 \\ 1 \end{pmatrix} x +
    \begin{pmatrix} e_1 \\ e_2 \end{pmatrix}
    ⇒   y = a x + e                                                            (2.2)
A geometric view

The column vector a spans a line y = ax in R². This line is the 1D model space
or range space of A: R(A) (German: Spaltenraum = column space). Inconsistency of the
observation vector means that y does not lie on this line. Instead, there is some vector of
discrepancies e that connects the observations to the line. Both this vector e and the
point on the line, defined by the unknown parameter x, must be found; see the left panel
of fig. 2.1.

Figure 2.1: (a) Inconsistent data: the observation vector y is not in the model space, i.e.
not on the line spanned by a. (b) Least squares adjustment means orthogonal projection
of y onto the line ax. This guarantees the shortest e.
Adjustment of observations is about finding the optimal e and x. An intuitive choice
for optimality is to make the vector e as short as possible. The shortest possible e is
indicated by a hat: ê. Its squared length ê^T ê = Σ_i ê_i² is the smallest among all possible
e^T e = Σ_i e_i², which explains the name least squares. If ê is determined, we will at the
same time know the optimal x̂.

How do we get the shortest e? The right panel of fig. 2.1 shows that the shortest e is
perpendicular to a:

    ê ⊥ a


Subtracting ê from the vector of observations y leads to the point ŷ = a x̂ that is on
the line and closest to y. This is the vector of adjusted observations. Being on the line
means that ŷ is consistent.

If we now substitute ê = y − a x̂, the least squares criterion leads us subsequently to
optimal estimates of x, y and e:

    orthogonality ê ⊥ a:         a^T ê = 0                                    (2.3a)
                                 a^T (y − a x̂) = 0                            (2.3b)
    normal equation:             a^T a x̂ = a^T y                              (2.3c)
    LS estimate of x:            x̂ = (a^T a)^{-1} a^T y                       (2.3d)
    LS estimate of y:            ŷ = a x̂ = a (a^T a)^{-1} a^T y               (2.3e)
    LS estimate of e:            ê = y − ŷ = [I − a (a^T a)^{-1} a^T] y        (2.3f)
    sum of squared residuals:    ê^T ê = y^T [I − a (a^T a)^{-1} a^T] y        (2.3g)

Exercise 2.1 Call the matrix in square brackets P and convince yourself that the sum
of squares of the residuals (the squared length of e) in the last line indeed follows from
the line above. Two things should be shown: that P is symmetric, and that P P = P .
The least squares criterion leads us to the above algorithm. Indeed, the combination
matrix reads L = A^T.
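The following Python/NumPy sketch runs through equations (2.3c)–(2.3g) and the checks
of Exercise 2.1 for the twice-measured distance; the two observation values are made up
for illustration:

    import numpy as np

    a = np.array([[1.0],
                  [1.0]])                      # design matrix of the twice-measured distance
    y = np.array([[10.02],
                  [9.98]])                     # assumed observation values

    x_hat = np.linalg.solve(a.T @ a, a.T @ y)  # normal equation (2.3c), estimate (2.3d)
    y_hat = a @ x_hat                          # adjusted observations (2.3e)
    e_hat = y - y_hat                          # residuals (2.3f)

    P = np.eye(2) - a @ np.linalg.inv(a.T @ a) @ a.T   # matrix in square brackets of (2.3f/g)
    print(x_hat.item())                                 # 10.0, the arithmetic mean
    print((e_hat.T @ e_hat).item(), (y.T @ P @ y).item())  # both 0.0008: equation (2.3g)
    print(np.allclose(P, P.T), np.allclose(P @ P, P))      # P symmetric and idempotent (Exercise 2.1)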
A calculus view

Let us define the Lagrangian or cost function

    L_a(x) = ½ e^T e ,                                                         (2.4)

which is half of the sum of squared residuals. Its graph would be a parabola. The factor
½ shouldn't worry us: if we find the minimum of L_a, then any scaled version of it is also
minimized. The task is now to find the x̂ that minimizes the Lagrangian. With e = y − ax
we get the minimization problem:

    min_x L_a(x) = min_x ½ (y − ax)^T (y − ax)
                 = min_x ( ½ y^T y − x a^T y + ½ a^T a x² ) .


The term ½ y^T y is just a constant that doesn't play a role in the minimization. The
minimum occurs at the location where the derivative of L_a is zero (necessary condition):

    dL_a/dx (x̂) = −a^T y + a^T a x̂ = 0 .

The solution of this equation, which happens to be the normal equation (2.3c), is the x̂
we are looking for:

    x̂ = (a^T a)^{-1} a^T y .

To make sure that the derivative does not give us a maximum, we must check that the
second derivative of L_a is positive at x̂ (sufficiency condition):

    d²L_a/dx² (x̂) = a^T a > 0 ,

which is indeed a positive constant for all x.
Projectors

Figure 2.1 shows that the optimal, consistent ŷ is obtained by an orthogonal projection
of the original y onto the line ax. Mathematically this was expressed by (2.3e) as:

    ŷ = a (a^T a)^{-1} a^T y                                                   (2.5a)
    ŷ = P_a y                                                                  (2.5b)
with
    P_a = a (a^T a)^{-1} a^T .                                                 (2.5c)

The matrix P_a is an orthogonal projector. It is an idempotent matrix, meaning:

    P_a P_a = a (a^T a)^{-1} a^T a (a^T a)^{-1} a^T = P_a .                     (2.6)

It projects onto the line ax along a direction orthogonal to a. With this projection in
mind, the property P_a P_a = P_a becomes clear: if a vector has been projected already,
the second projection has no effect anymore.

Also (2.3f) can be abbreviated:

    ê = y − P_a y = (I − P_a) y = P_a^⊥ y ,

which is also a projection. In order to give ê, the vector y is projected onto the line
perpendicular to ax, along the direction a. And, of course, P_a^⊥ is idempotent as well:

    P_a^⊥ P_a^⊥ = (I − P_a)(I − P_a) = I − 2 P_a + P_a P_a = I − P_a = P_a^⊥ .


Moreover, the definition (2.5c) makes clear that P_a and P_a^⊥ are symmetric. Therefore
the sum of squared residuals (2.3g) could be simplified to:

    ê^T ê = y^T (P_a^⊥)^T P_a^⊥ y = y^T P_a^⊥ P_a^⊥ y = y^T P_a^⊥ y .

At a more fundamental level the definition of the orthogonal projector P_a^⊥ = I − P_a can
be recast into the equation:

    I = P_a + P_a^⊥ .

Thus, we can decompose (German: zerlegen) every vector, say z, into two components:
one in the subspace defined by P_a, the other mapped by P_a^⊥ onto the complementary
subspace:

    z = I z = (P_a + P_a^⊥) z = P_a z + P_a^⊥ z .

In the case of LS adjustment, the subspaces are defined by the range space R(a) and its
orthogonal complement R(a)^⊥:

    y = P_a y + P_a^⊥ y = ŷ + ê ,

which is visualized in fig. 2.1.
Numerical example

With a = (1, 1)^T we follow the steps from (2.3a) onwards:

    (a^T a) x̂ = a^T y             ⇒   2 x̂ = y_1 + y_2
    x̂ = (a^T a)^{-1} a^T y        ⇒   x̂ = ½ (y_1 + y_2)                       (average)
    ŷ = a (a^T a)^{-1} a^T y      ⇒   ŷ = ½ (y_1 + y_2,  y_1 + y_2)^T
    ê = y − ŷ                     ⇒   ê = ½ (y_1 − y_2,  −y_1 + y_2)^T          (error distribution)
    ê^T ê                         ⇒   ê^T ê = ½ (y_1 − y_2)²                    (least squares)
Exercise 2.2 Verify that the projectors are

    P_a = ½ \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}
    and
    P_a^⊥ = I − P_a = ½ \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}

and check the equations ŷ = P_a y and ê = P_a^⊥ y with the numerical results above.

2.2 Adjustment with condition equations

In the ideal case, in which the measurements y_1 and y_2 are without error, both
observations would be equal: y_1 = y_2, or y_1 − y_2 = 0. In matrix notation:

    (1   −1) \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} = 0
    ⇔   b^T y = 0 ,          b^T: (1×2),  y: (2×1).                            (2.7)

In reality, though, both observations do contain errors, i.e. they are not equal: y_1 − y_2 ≠ 0,
or b^T y ≠ 0. Instead of 0 one would obtain a misclosure w (German: Widerspruch). If we
recast the observation equation into y − e = ax, it is clear that it is (y − e) that has to
obey the above condition:

    b^T (y − e) = 0   ⇒   w := b^T y = b^T e .                                  (2.8)

In this condition equation (German: Bedingungsgleichung) the vector e is unknown. The
task of adjustment according to the model of condition equations is to find the smallest
possible e that fulfils the condition (2.8). At this stage, the model of condition equations
does not involve any kind of parameters x.
A geometric view

The condition (2.8) describes a line with normal vector b that goes through the point y.
This line is the set of all possible vectors e. We are looking for the shortest e, i.e. the
point closest to the origin. Figure 2.2 makes it clear that ê is perpendicular to the line
b^T e = w. So ê lies on a line through b.

Geometrically, ê is obtained by projecting y onto a line through b. Knowing the definition
of the projectors from the previous section, we here define the following estimators
(German: Schätzer) by using the projector P_b:

    ê = P_b y = b (b^T b)^{-1} b^T y                                            (2.9a)
    ŷ = y − ê = y − b (b^T b)^{-1} b^T y
      = [I − b (b^T b)^{-1} b^T] y = P_b^⊥ y                                     (2.9b)
    ê^T ê = y^T P_b y = y^T b (b^T b)^{-1} b^T y                                 (2.9c)

Exercise 2.3 Confirm that the orthogonal projector P_b^⊥ is idempotent and verify that
the equation for ê^T ê is correct.

Figure 2.2: (a) The condition equation describes a line in R², perpendicular to b and
going through y. We are looking for a point e on this line. (b) Least squares adjustment
with condition equations means orthogonal projection of y onto the line through b. This
guarantees the shortest e.
Numerical example

With b^T = (1  −1) we get b^T b = 2, (b^T b)^{-1} = ½ and

    P_b = b (b^T b)^{-1} b^T = ½ \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}

    ê = P_b y = ½ \begin{pmatrix} y_1 − y_2 \\ −y_1 + y_2 \end{pmatrix}

    P_b^⊥ = I − P_b = ½ \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}

    ŷ = P_b^⊥ y = ½ \begin{pmatrix} y_1 + y_2 \\ y_1 + y_2 \end{pmatrix}

These results for ŷ and ê are the same as those for the adjustment with observation
equations. The estimator ŷ describes the mean of the two observations, whereas the
estimator ê distributes the inconsistencies equally. Also note that P_b = P_a^⊥ and vice
versa.


A calculus view

Alternatively we can again determine the optimal e by minimizing the target function
L_b(e) = e^T e, but now under the condition b^T (y − e) = 0:

    min_e L_b(e) = e^T e      under   b^T (y − e) = 0 ,                         (2.10a)

    min_{e,λ} L_b(e, λ) = ½ e^T e + λ (b^T y − b^T e) .                         (2.10b)

The main trick here, due to Lagrange, is not to consider the condition as a constraint or
limitation of the minimization problem. Instead, the minimization problem is extended:
the condition, multiplied by a factor λ, is added to the original cost function. Such
factors are called Lagrange multipliers. In case of more than one condition, each gets its
own multiplier. The target function L_b is now a function of e and λ.

The minimization problem now consists in finding the ê and λ̂ that minimize the extended
L_b. Thus we need the partial derivatives of L_b with respect to e and λ, and we impose
the condition that these partial derivatives are zero when evaluated at ê and λ̂:

    ∂L_b/∂e (ê, λ̂) = 0   ⇒   ê − λ̂ b = 0
    ∂L_b/∂λ (ê, λ̂) = 0   ⇒   b^T y − b^T ê = 0

In matrix terms, the minimization problem leads to:

    \begin{pmatrix} I & -b \\ b^T & 0 \end{pmatrix}
    \begin{pmatrix} ê \\ λ̂ \end{pmatrix} =
    \begin{pmatrix} 0 \\ b^T y \end{pmatrix} .                                  (2.11)

Because of the extension of the original minimization problem, this system is square. It
might be inverted in a straightforward manner, see also A.1. Instead, we will solve it
stepwise. First, rewrite the first line:

    ê − λ̂ b = 0   ⇒   ê = λ̂ b .

This result is then used to eliminate ê in the second line:

    b^T y − b^T b λ̂ = 0 ,

which is solved by:

    λ̂ = (b^T b)^{-1} b^T y .

With this result we go back to the first line:

    ê − b (b^T b)^{-1} b^T y = 0 ,

which is finally solved by:

    ê = b (b^T b)^{-1} b^T y = P_b y .

This is the same estimator ê as in (2.9a).

2.3 Synthesis

Both the calculus and the geometric approach provide the same LS estimators. This is
due to

    P_a = P_b^⊥    and    P_b = P_a^⊥ ,

as can be seen in fig. 2.3. The deeper reason is that a is perpendicular to b:

    b^T a = (1   −1) \begin{pmatrix} 1 \\ 1 \end{pmatrix} = 0 ,                 (2.12)

which fundamentally connects the model with observation equations to the model with
condition equations. Starting with the observation equations and applying the
orthogonality, one ends up with the condition equation:

    y = ax + e   ⇒   b^T y = b^T a x + b^T e   ⇒   b^T y = b^T e     (since b^T a = 0).

Figure 2.3: Least squares adjustment with observation equations and with condition
equations in terms of the projectors P_a and P_b: ŷ = P_a y = P_b^⊥ y and ê = P_b y = P_a^⊥ y.
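The synthesis can be verified numerically. A small Python sketch, again with made-up
observation values, showing that P_a = P_b^⊥ and that both models return the same ŷ and ê:

    import numpy as np

    a = np.array([[1.0], [1.0]])
    b = np.array([[1.0], [-1.0]])
    y = np.array([[10.02], [9.98]])              # assumed observation values

    P_a = a @ np.linalg.inv(a.T @ a) @ a.T       # projector of the observation-equation model
    P_b = b @ np.linalg.inv(b.T @ b) @ b.T       # projector of the condition-equation model

    print(np.allclose(P_a + P_b, np.eye(2)))     # True: P_a = I - P_b, i.e. P_a = P_b-perp
    print((P_a @ y).ravel(), (P_b @ y).ravel())  # y_hat and e_hat, identical in both models
    print((b.T @ a).item())                      # 0.0: equation (2.12)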

3 Generalizations
In this chapter we will apply several generalizations. First we will take the LS adjustment
problems to higher dimensions. What we will basically do is replace the vector a by an
(m×n) matrix A and replace the vector b by an (m×(m−n)) matrix B. The basic structure
of the projectors and estimators will remain the same.

Moreover, we need to be able to formulate the two LS problems with constant terms:

    y = Ax + a_0 + e      and      B^T (y − e) = b_0 .

Next, we will deal with nonlinear observation equations and nonlinear condition equations.
This will involve linearization, the use of approximate values, and iteration.

We will also touch upon the datum problem, which arises if A contains linearly dependent
columns. Mathematically we then have rank A < n, so that the normal matrix satisfies
det(A^T A) = 0 and is not invertible.

3.1 Higher dimensions: the A-model

The vector of observations y, the vector of inconsistencies e and their respective LS
estimators will be (m×1) vectors. The vector x will contain n unknown parameters.
Thus the redundancy, that is the number of redundant observations, is:

    redundancy:  r = m − n .

Geometry

y = Ax + e is the multidimensional extension of y = ax + e, with given (reduced) vector
of observations y (German margin term: Absolutgliedvektor). We split the (m×n) matrix A
into its n column vectors a_i, i = 1, ..., n, each of size (m×1):

    A = [ a_1, a_2, a_3, ..., a_n ] ,

    y = Σ_{i=1}^{n} a_i x_i + e ,

so the columns a_i span an n-dimensional vector space as a subspace of E^m.

Example: m = 3, n = 2 (y lies in E³).

Figure 3.1: (a) The vectors y, a_1 and a_2 all lie in R³. (b) To see that y is inconsistent,
the space spanned by a_1 and a_2 is shown as the base plane. The observation vector is
not in this plane, y ∉ R(A), i.e. y cannot be written as a linear combination of a_1 and a_2.

    ê = P_A^⊥ y = [I − A (A^T A)^{-1} A^T] y
    ŷ = P_A y = A (A^T A)^{-1} A^T y = A x̂
    x̂ = (A^T A)^{-1} A^T y

(A^T A)^{-1} exists if and only if (German: genau dann, wenn) rank A = n = rank(A^T A).
Calculus

    L_A(x) = ½ e^T e
           = ½ (y − Ax)^T (y − Ax)
           = ½ y^T y − ½ y^T A x − ½ x^T A^T y + ½ x^T A^T A x

    min_x:   ∂L_A/∂x (x̂) = 0   ⇒   x̂ = (A^T A)^{-1} A^T y ,
    hence    ê = y − ŷ = [I − A (A^T A)^{-1} A^T] y = P_A^⊥ y .

Is P_A^⊥ idempotent?

    P_A^⊥ P_A^⊥ = [I − A (A^T A)^{-1} A^T] [I − A (A^T A)^{-1} A^T]
                = I − 2 A (A^T A)^{-1} A^T + A (A^T A)^{-1} A^T A (A^T A)^{-1} A^T
                = I − A (A^T A)^{-1} A^T
                = P_A^⊥

    ŷ = P_A y = A (A^T A)^{-1} A^T y

Example: height network

Figure 3.2: Levelling network connecting the unknown points P_1, P_2, P_3 with the given
bench marks P_A and P_B; the lengths of the levelling lines (in km) are indicated in the
sketch.

The observation equations are:

    h_1B = H_B − H_1 + e_1B
    h_13 = H_3 − H_1 + e_13
    h_12 = H_2 − H_1 + e_12
    h_32 = H_2 − H_3 + e_32
    h_1A = H_A − H_1 + e_1A

with
    h^T := [h_1B, h_13, h_12, h_32, h_1A]   the vector of levelled height differences,
    H_1, H_2, H_3                           the unknown heights of points P_1, P_2, P_3,
    H_A, H_B                                the given bench marks.

In matrix notation:

    \begin{pmatrix} h_{1B} \\ h_{13} \\ h_{12} \\ h_{32} \\ h_{1A} \end{pmatrix} =
    \begin{pmatrix} -1 & 0 & 0 \\ -1 & 0 & 1 \\ -1 & 1 & 0 \\ 0 & 1 & -1 \\ -1 & 0 & 0 \end{pmatrix}
    \begin{pmatrix} H_1 \\ H_2 \\ H_3 \end{pmatrix} +
    \begin{pmatrix} H_B \\ 0 \\ 0 \\ 0 \\ H_A \end{pmatrix} +
    \begin{pmatrix} e_{1B} \\ e_{13} \\ e_{12} \\ e_{32} \\ e_{1A} \end{pmatrix} ,

or, after moving the known bench-mark heights to the observation side,

    \begin{pmatrix} h_{1B} - H_B \\ h_{13} \\ h_{12} \\ h_{32} \\ h_{1A} - H_A \end{pmatrix} =
    \begin{pmatrix} -1 & 0 & 0 \\ -1 & 0 & 1 \\ -1 & 1 & 0 \\ 0 & 1 & -1 \\ -1 & 0 & 0 \end{pmatrix}
    \begin{pmatrix} H_1 \\ H_2 \\ H_3 \end{pmatrix} +
    \begin{pmatrix} e_{1B} \\ e_{13} \\ e_{12} \\ e_{32} \\ e_{1A} \end{pmatrix} ,

i.e. y = A x + e with y, e: (5×1), A: (5×3), x: (3×1).
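A sketch of this height-network adjustment in Python/NumPy; the bench-mark heights
and the levelled height differences are assumed values for illustration only:

    import numpy as np

    # Design matrix of the levelling network, unknowns x = [H1, H2, H3]
    A = np.array([[-1.0, 0.0,  0.0],    # h_1B = H_B - H_1
                  [-1.0, 0.0,  1.0],    # h_13 = H_3 - H_1
                  [-1.0, 1.0,  0.0],    # h_12 = H_2 - H_1
                  [ 0.0, 1.0, -1.0],    # h_32 = H_2 - H_3
                  [-1.0, 0.0,  0.0]])   # h_1A = H_A - H_1

    H_A, H_B = 100.000, 101.000                          # assumed bench-mark heights
    h = np.array([2.513, 1.108, 0.602, -0.504, 1.515])   # assumed levelled height differences

    y = h - np.array([H_B, 0.0, 0.0, 0.0, H_A])          # reduced observation vector
    x_hat = np.linalg.solve(A.T @ A, A.T @ y)            # adjusted heights [H1, H2, H3]
    e_hat = y - A @ x_hat                                # residuals
    print(x_hat)
    print(e_hat)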

3.2 Higher dimensions: the B-model

In the ideal case we had

    h_1B − h_1A = (H_B − H_1) − (H_A − H_1) = H_B − H_A ,
    h_13 + h_32 − h_12 = (H_3 − H_1) + (H_2 − H_3) − (H_2 − H_1) = 0 ,

or

    \begin{pmatrix} 1 & 0 & 0 & 0 & -1 \\ 0 & 1 & -1 & 1 & 0 \end{pmatrix}
    \begin{pmatrix} h_{1B} \\ h_{13} \\ h_{12} \\ h_{32} \\ h_{1A} \end{pmatrix} =
    \begin{pmatrix} 1 & 0 & 0 & 0 & -1 \\ 0 & 1 & -1 & 1 & 0 \end{pmatrix}
    \begin{pmatrix} H_B \\ 0 \\ 0 \\ 0 \\ H_A \end{pmatrix} .

Due to erroneous observations a vector e of unknown inconsistencies must be introduced
in order to make our linear model consistent:

    \begin{pmatrix} 1 & 0 & 0 & 0 & -1 \\ 0 & 1 & -1 & 1 & 0 \end{pmatrix}
    \begin{pmatrix} h_{1B} - e_{1B} \\ h_{13} - e_{13} \\ h_{12} - e_{12} \\ h_{32} - e_{32} \\ h_{1A} - e_{1A} \end{pmatrix} =
    \begin{pmatrix} 1 & 0 & 0 & 0 & -1 \\ 0 & 1 & -1 & 1 & 0 \end{pmatrix}
    \begin{pmatrix} H_B \\ 0 \\ 0 \\ 0 \\ H_A \end{pmatrix} ,

or

    B^T (h − e) = B^T c ,        B^T: (2×5),   h, e, c: (5×1).

Connected with this example are the questions:

Q 1: How to handle constants like the vector c?
Q 2: How many conditions must be set up?
Q 3: Is the solution of the B-model identical to the one of the A-model?

A 1: Starting from

        B^T (h − e) = B^T c ,

     where solely e is unknown, we collect all unknown parts on the left and all known
     quantities on the right-hand side:

        −B^T e = −B^T h + B^T c   ⇒   B^T e = B^T h − B^T c =: B^T y =: w ,

     with
        w : vector of misclosures, w := B^T y   (r×1),
        y : reduced vector of observations,
        r : number of conditions   (B^T: r×m, e: m×1).

A 2: The number of conditions equals the redundancy:

        r = m − n .

     Sometimes the number of conditions can hardly be determined without knowledge
     of the number n of unknowns in the A-model. This will be treated later in more
     detail together with the so-called datum problem.

A 3: Minimize the Lagrangian

        L_B(e, λ) = ½ e^T e + λ^T (B^T e − w)   →   min over e, λ :

        ∂L_B/∂e (ê, λ̂) = ê + B λ̂ = 0
        ∂L_B/∂λ (ê, λ̂) = B^T ê − w = 0          (w = B^T y)

     or, in matrix form,

        \begin{pmatrix} I & B \\ B^T & 0 \end{pmatrix}
        \begin{pmatrix} ê \\ λ̂ \end{pmatrix} =
        \begin{pmatrix} 0 \\ w \end{pmatrix} .

     Stepwise elimination, using rank(B^T B) = r:

        ê = −B λ̂ ,
        B^T ê = −B^T B λ̂ = w   ⇒   λ̂ = −(B^T B)^{-1} w ,
        ⇒   ê = B (B^T B)^{-1} w = B (B^T B)^{-1} B^T y = P_B y ,
        ŷ = y − ê = [I − B (B^T B)^{-1} B^T] y = P_B^⊥ y ,

     which is indeed the same solution as in the A-model.

Transition from the parametric model y = Ax + e to the model of condition equations
B^T e = B^T y: left-multiply y = Ax + e by B^T and use B^T A = 0:

    B^T y = B^T A x + B^T e = B^T e .

For the height network indeed:

    \begin{pmatrix} 1 & 0 & 0 & 0 & -1 \\ 0 & 1 & -1 & 1 & 0 \end{pmatrix}
    \begin{pmatrix} -1 & 0 & 0 \\ -1 & 0 & 1 \\ -1 & 1 & 0 \\ 0 & 1 & -1 \\ -1 & 0 & 0 \end{pmatrix} =
    \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} ,      (2×5)(5×3) = (2×3).

3.3 The datum problem

Approach 1: reduce the problem (shrink the solution space)

Consider the loop network without bench marks:

    h_12 = H_2 − H_1
    h_13 = H_3 − H_1
    h_32 = H_2 − H_3

    ⇒   \begin{pmatrix} h_{12} \\ h_{13} \\ h_{32} \end{pmatrix} =
        \begin{pmatrix} -1 & 1 & 0 \\ -1 & 0 & 1 \\ 0 & 1 & -1 \end{pmatrix}
        \begin{pmatrix} H_1 \\ H_2 \\ H_3 \end{pmatrix}
    ⇒   y = A x ,        m = 3,  n = 3,  rank A = 2 .

Expanding the determinant shows det A = 0, so A and (A^T A) are not invertible:
dim N(A) > 0, and Ax = 0 has a nontrivial homogeneous solution x_hom ≠ 0.

Consequences:
  - mathematically:  A (x + x_hom) = A x + A x_hom = A x = y ;
  - in words: the observations are blind to the null space, and the unknowns (i.e. the
    height network) can be changed without affecting y.

Homogeneous solution:
  - geometrically: a vertical translation of the whole network,
  - mathematically: x_hom = α (1, 1, 1)^T .

A rank defect in A constitutes the datum problem:
  - the observations are not affected,
  - the unknowns cannot be solved for.

Possible solution:
  - fix dim N(A) parameters (datum fixing or datum definition),
  - how? Eliminate one column and adapt the observations, e.g. fix H_1:

    \begin{pmatrix} h_{12} + H_1 \\ h_{13} + H_1 \\ h_{32} \end{pmatrix} =
    \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & -1 \end{pmatrix}
    \begin{pmatrix} H_2 \\ H_3 \end{pmatrix} .
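A sketch of approach 1 in Python/NumPy, with assumed numbers: the rank defect is visible
in the full design matrix, and fixing H_1 removes it:

    import numpy as np

    # Loop network h12, h13, h32 expressed in all three unknowns [H1, H2, H3]
    A_full = np.array([[-1.0, 1.0,  0.0],
                       [-1.0, 0.0,  1.0],
                       [ 0.0, 1.0, -1.0]])
    print(np.linalg.matrix_rank(A_full))        # 2: one-dimensional null space along (1, 1, 1)

    # Datum fixing: hold H1 at an assumed value and drop its column
    H1 = 98.485                                  # assumed datum value
    h = np.array([0.602, 1.108, -0.504])         # assumed observed differences h12, h13, h32
    A_red = A_full[:, 1:]                        # design matrix for the remaining unknowns [H2, H3]
    y_red = h + np.array([H1, H1, 0.0])          # move the fixed H1 to the observation side
    x_hat = np.linalg.solve(A_red.T @ A_red, A_red.T @ y_red)
    print(x_hat)                                 # adjusted [H2, H3]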
Approach 2: augment the problem (augment the solution space)

    h_13 + h_32 − h_12 = 0
    ⇒   (−1   1   1) \begin{pmatrix} h_{12} \\ h_{13} \\ h_{32} \end{pmatrix} = 0 .

Redundancy = 1 ⇒ one condition equation.

Datum problem? Since B^T y = B^T A x + B^T e = B^T e, the model of condition equations
has no datum problem.

Material Krumm: datum fixing in the distance network via (additional) condition equations

So far the datum of the distance network of exercise 5 was fixed by setting the coordinate
corrections Δx_A, Δx_C, Δy_D to zero, i.e. by fixing the given coordinates x_A, x_C, y_D
and eliminating the corresponding columns of the design matrix.

Better, because more flexible: adding condition equations to the linear model with a
rank-deficient design matrix A,

    y = A x + e ,        y, e: (m×1),  A: (m×n),  x: (n×1).

3 Generalizations
Bedingungsgleichungen zur Datumdefinition oder auch zur Erzwingung anderer Zwangsbedingungen, wobei r = n rank A (im Allgemeinen r Bedingungsgleichungen)
BT x = c
rn n1

r1

Streckennetz der unbekannten Parameter xA , yA , . . . , xI , yI

1 0 0 0 0 0 0 0 ... 0
BT = 0 0 0 0 1 0 0 0 . . . 0 ,
0 0 0 0 0 0 0 1 ... 0


0
c = 0
0

Im Allgemeinen sind auch beliebige andersartige Bedingungen zulassig (z. B. Summe der
Koordinatenzuschlage = vorgegebener Wert)

B T = 1 1 1 1 1 1 1 1 ... 1 ,
c = ...
Behandlung u
ber Lagrangefunktion (Qy = Py = I)
L(x, ) =
=

1 T
e
e + T ( B T x c ) min
x,
2 1m m1
1r
rn n1
r1
1
(y Ax)T (y Ax) + T (B T x c)
2

L
= AT y + AT A
c =! 0
(
x, )
x + B
x
L
!
= BTx
(
x, )
=0

T

T
x

A A B
A y
nr n1
n1

= nn
=
M z = v

BT 0
c

rn
rr
(n+r)(n+r)

r1
(n+r)1

r1
(n+r)1

 
x

= z = = M 1 v . . .

Material Ersatz Sneeuw


y = Ax + e under

DT x = c
rn n1

e. g.

30

H1
1 0 0 H2 = H
H3


r1

3.3 The datum problem

1 T
e e + (DT x c)
2
1 T
1
=
y y y T Ax + xT AT Ax + (DT x c)
2
2

LD (x, ) =

LD
= AT y + AT Ax + D := 0
x
LD
= DT x c := 0

 T
    T 
x

A AD
A y
=
= M z = v
=
DT 0
c

(n+r)(n+r)

(n+r)1

E. g.

1
A = 1
0

1 0
1
0 1 = AT A = 1
1 1
0

1 0
1
0 1 1
1 1
0

2 1 1
1 2 1
M =
1 1 2
1 0 0

1 0
2 1 1
0 1 = 1 2 1
1 1
1 1 2

1
0

0
0



1 2 1
2
1
det M = 1 det 1 1 2 = 1 1 det
= 3
1 2
1 0 0

= M regular

Solution in general form?


1
x
= AT A + DDT
AT y

Side remark: more general conditions on parameters


B T x = b0

L=

B T = 1 1 1 1 1 1 1 1 ... 1

1 T
e e + T (Ax + e y)
2

31

3 Generalizations
L
= e+
e
L
= Ax + e y

L
= AT
x
1
LD = (y Ax)T (y Ax) + T (DT x c)
2
 T
   T 
x

A AD
A y
=
DT 0
c

= AT y
AT A
x + D
DDT x
= Dc
= (AT A + DDT )
x = AT y + Dc
Material Sharifi

y = A x + e
m1

mn n1

m1

DT x = 0
dn n1

d1

1 T
e e = min
2
1
= Lf (x, e) = (y Ax)T (y Ax) + T (DT x)
2
1 T
= (y xT AT )(y Ax) + T DT x
2
1
= [y T y y T Ax xT AT y + xT AT Ax] + T DT x
2
1
= [y T y 2y T Ax + xT AT Ax] + T DT x
2
Lf
x
Lf

32

T=0
= y T A + x
T AT A + D
T
= (DT x
)

=0
AT y + AT A
x + D
T
D x
=0

3.4 Linearization of non-linear observation equations




AT A D
DT 0

   T 
x

A y
=

Case 1: rank(AT A) = rank A = n d and rank DT = d


since
dim N (A) = d = H | A H T = 0
dn

mn nd

AH T = 0
AT AH T = 0

md

H(AT A) = 0

3.4 Linearization of non-linear observation equations

Planar distance observation:

    s_AB = \sqrt{(x_B − x_A)² + (y_B − y_A)²}      (how to bring this into the form y = Ax?)

Answer: linearize, i.e. expand into a Taylor series.


General 1-D formulation

    y = f(x) ,

expanded about an approximate value x_0:

    y = f(x_0) + (df/dx)|_{x_0} (x − x_0) + ½ (d²f/dx²)|_{x_0} (x − x_0)² + ... ,

where the quadratic and higher terms are negligible if x − x_0 is small. Hence

    y − y_0 = (df/dx)|_{x_0} (x − x_0) + ...

    Δy = (df/dx)|_{x_0} Δx + O(Δx²) ,        Δx := x − x_0 ,

in which the first term is the linear model and the neglected terms of higher order
become model errors.

General multi-D formulation

    y_i = f_i(x_j) ,        i = 1, ..., m;   j = 1, ..., n .

With approximate values x_{j,0} and y_{i,0} = f_i(x_{j,0}):

    Δy_1 = (∂f_1/∂x_1)|_0 Δx_1 + (∂f_1/∂x_2)|_0 Δx_2 + ... + (∂f_1/∂x_n)|_0 Δx_n
    Δy_2 = (∂f_2/∂x_1)|_0 Δx_1 + (∂f_2/∂x_2)|_0 Δx_2 + ... + (∂f_2/∂x_n)|_0 Δx_n
      ...
    Δy_m = (∂f_m/∂x_1)|_0 Δx_1 + (∂f_m/∂x_2)|_0 Δx_2 + ... + (∂f_m/∂x_n)|_0 Δx_n

Terms of second order and higher have been neglected. In matrix form:

    Δy = A(x_0) Δx ,

with the (m×n) Jacobian matrix A = [ ∂f_i/∂x_j ]|_0 .

Example: linearization of the planar distance observation equation. The Taylor point of
expansion is given by approximate values x_A^0, y_A^0, x_B^0, y_B^0 of the unknown point
coordinates; explicit differentiation.

Measured:

    s_AB = \sqrt{(x_B − x_A)² + (y_B − y_A)²} = \sqrt{x_{AB}² + y_{AB}²} ,

with

    x_A = x_A^0 + Δx_A ,   y_A = y_A^0 + Δy_A ,   x_B = x_B^0 + Δx_B ,   y_B = y_B^0 + Δy_B .

The Taylor expansion reads

    s_AB ≈ s_AB^0 + (∂s_AB/∂x_A)|_0 Δx_A + (∂s_AB/∂x_B)|_0 Δx_B
                  + (∂s_AB/∂y_A)|_0 Δy_A + (∂s_AB/∂y_B)|_0 Δy_B ,

where

    s_AB^0 = \sqrt{(x_B^0 − x_A^0)² + (y_B^0 − y_A^0)²}

is the distance computed from the approximate coordinates. The partial derivatives are

    ∂s_AB/∂x_A = −(x_B − x_A)/s_AB ,       ∂s_AB/∂x_B = +(x_B − x_A)/s_AB ,
    ∂s_AB/∂y_A = −(y_B − y_A)/s_AB ,       ∂s_AB/∂y_B = +(y_B − y_A)/s_AB ,

evaluated at the approximate values. With the reduced observation Δs_AB := s_AB − s_AB^0
the linearized observation equation becomes

    Δs_AB = ( −(x_B^0 − x_A^0)/s_AB^0   −(y_B^0 − y_A^0)/s_AB^0
              +(x_B^0 − x_A^0)/s_AB^0   +(y_B^0 − y_A^0)/s_AB^0 )
            ( Δx_A, Δy_A, Δx_B, Δy_B )^T ,

i.e. Δy = A(x_0) Δx.

Sometimes it is more convenient to use implicit differentiation within the linearization
of observation equations. Depart from s_AB² = (x_B − x_A)² + (y_B − y_A)² instead of from
s_AB and calculate the total differential:

    2 s_AB ds_AB = 2 (x_B − x_A)(dx_B − dx_A) + 2 (y_B − y_A)(dy_B − dy_A) .

Solve for ds_AB, introduce the approximate values and switch from d to Δ:

    Δs_AB := s_AB − s_AB^0
           = (x_B^0 − x_A^0)/s_AB^0 · (Δx_B − Δx_A) + (y_B^0 − y_A^0)/s_AB^0 · (Δy_B − Δy_A) .
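A small Python sketch of this linearization; the helper function below is a name chosen
here for illustration, and the approximate coordinates and measured value are assumed:

    import numpy as np

    def distance_row(xA, yA, xB, yB, s_measured):
        """Reduced observation and Jacobian row of one planar distance s_AB
        (coordinates are the approximate values)."""
        dx, dy = xB - xA, yB - yA
        s0 = np.hypot(dx, dy)                      # distance from approximate coordinates
        # partial derivatives w.r.t. (xA, yA, xB, yB), evaluated at the approximate values
        row = np.array([-dx / s0, -dy / s0, dx / s0, dy / s0])
        return s_measured - s0, row                # Delta s_AB and the corresponding A-row

    ds, row = distance_row(0.0, 0.0, 100.0, 50.0, 111.82)   # assumed coordinates and measurement
    print(ds, row)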

Iteration (fig. 3.3)

Functional model:   y = f(x) .

Taylor:

    f(x) = Σ_{n=0}^{∞} f^{(n)}(x_0)/n! · (x − x_0)^n .

Linearization:

    f(x) = f(x_0) + f'(x_0)(x − x_0) + O ,

where O collects the small terms (model errors) of degree > 1:

    f(x) − f(x_0) = f'(x_0)(x − x_0) + O
    Δy = f'(x_0) Δx + O = (df/dx)|_{x_0} Δx + O .

This results in the linear model:

    Δy = (df/dx)|_{x_0} Δx + e = A(x_0) Δx + e .

The datum problem again

The (m×n) matrix A is rank deficient (rank A < n):
  - A has linearly dependent columns,
  - Ax = 0 has a non-trivial solution x_hom ≠ 0,
  - det(A^T A) = 0,
  - A^T A has zero eigenvalues.

Example: planar distance network (fig. 3.4). Rank defect:
  - translation: 2 parameters (x- and y-direction),
  - rotation: 1 parameter,
  ⇒ a total of 3 parameters, so rank A = n − 3.

With 9 points: n = 18, rank A = 18 − 3 = 15; m = 19, thus the redundancy is r = 4.

Conditional adjustment: how many conditions? Answer: b = r.

Figure 3.3: Iterative scheme. Starting from the nonlinear observation equations y = f(x)
and approximate values x_0, the linear model Δy(x_0) = A(x_0) Δx + e, with Δy(x_0) :=
y(x) − y(x_0), is set up, possibly with additional constraints (e.g. datum constraints).
The correction Δx̂ = [A^T(x_0) A(x_0)]^{-1} A^T(x_0) Δy(x_0) yields updated approximate
values x̂ = x_0 + Δx̂; with x_0 := x̂ the linearization is repeated until the stop criterion
||Δx̂||² < ε is met. Then the adjusted observations ŷ = A x̂ and the estimated residuals
(inconsistencies) ê = y − ŷ are computed and checked: the orthogonality check A^T ê = 0,
and the main check that the nonlinear observation equations are satisfied by the adjusted
observations, ŷ − f(x̂) = 0. If a check fails, there is an error in the iteration process or an
erroneous linearization.
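A minimal Gauss-Newton sketch of the scheme in fig. 3.3 for a small example (one new
point determined from distances to three known points); all coordinates and observations
are assumed for illustration:

    import numpy as np

    known = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 80.0]])   # assumed known points
    s_obs = np.array([71.07, 77.79, 51.46])                      # assumed measured distances
    x0 = np.array([50.0, 50.0])                                  # approximate new point

    for _ in range(10):
        diff = x0 - known
        s0 = np.hypot(diff[:, 0], diff[:, 1])      # distances from the approximate coordinates
        dy = s_obs - s0                            # reduced observations
        A = diff / s0[:, None]                     # Jacobian w.r.t. (x, y) of the new point
        dx = np.linalg.solve(A.T @ A, A.T @ dy)    # LS estimate of the corrections
        x0 = x0 + dx                               # updated approximate values
        if dx @ dx < 1e-12:                        # stop criterion ||dx||^2 < eps
            break

    print(x0)                                      # adjusted point, close to (45, 55)
    print(s_obs - np.hypot(x0[0] - known[:, 0], x0[1] - known[:, 1]))   # residuals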

Figure 3.4: (a) Planar distance network (coordinates in metres). (b) Four lines may be
deleted without destabilizing the net.

4 Weighted least squares

The observations are no longer of equal weight (i.e. they are of different precision).

4.1 Weighted observation equations

Analytical approach

Suppose y_1 carries weight w_1 and y_2 carries weight w_2. Target function:

    E_aw = ½ [ w_1 (y_1 − ax)² + w_2 (y_2 − ax)² ]
         = ½ (y − ax)^T \begin{pmatrix} w_1 & 0 \\ 0 & w_2 \end{pmatrix} (y − ax)
         = ½ (y − ax)^T W (y − ax)
         = ½ y^T W y − y^T W a x + ½ a^T W a x²
         = ½ e^T W e

Necessary condition:

    x̂: min_x E_a(x)   ⇒   dE_a/dx (x̂) = −a^T W y + a^T W a x̂ = 0
    ⇒   a^T W a x̂ = a^T W y         (normal equation)

Sufficient condition:

    d²E_a/dx² = a^T W a > 0 ,    satisfied since W is positive definite.

From the normal equation, a^T W (y − a x̂) = 0, i.e. a^T W ê = 0, so ê ⊥ W a.

Weighted least-squares (WLS) estimates:

    x̂ = (a^T W a)^{-1} a^T W y
    ŷ = a x̂ = a (a^T W a)^{-1} a^T W y
    ê = y − ŷ = [I − a (a^T W a)^{-1} a^T W] y

Numerical example with

    a = (1, 1)^T ,      W = \begin{pmatrix} w_1 & 0 \\ 0 & w_2 \end{pmatrix} :

    a^T W = (w_1, w_2) ,      a^T W a = w_1 + w_2 ,

so the estimate is the weighted mean

    x̂ = (w_1 y_1 + w_2 y_2) / (w_1 + w_2)
      = w_1/(w_1 + w_2) · y_1 + w_2/(w_1 + w_2) · y_2 ,

with residuals

    ê = y − ŷ = 1/(w_1 + w_2) \begin{pmatrix} w_2 (y_1 − y_2) \\ w_1 (y_2 − y_1) \end{pmatrix} .

If w_1 > w_2, more of y_1 than of y_2 enters x̂, and |ê_1| < |ê_2|.
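A sketch of the weighted estimate for the twice-measured distance, with assumed
observation values and weights:

    import numpy as np

    a = np.array([[1.0], [1.0]])
    y = np.array([[10.02], [9.98]])     # assumed observation values
    W = np.diag([4.0, 1.0])             # assumed weights: y1 four times "heavier" than y2

    x_hat = np.linalg.solve(a.T @ W @ a, a.T @ W @ y)   # weighted normal equation
    e_hat = y - a @ x_hat
    print(x_hat.item())                 # 10.012: the weighted mean, pulled towards y1
    print(e_hat.ravel())                # [ 0.008 -0.032]: |e1| < |e2| because w1 > w2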

    P_a = a (a^T W a)^{-1} a^T W =: P_{a,(Wa)^⊥}

    P_a P_a = a (a^T W a)^{-1} a^T W a (a^T W a)^{-1} a^T W = a (a^T W a)^{-1} a^T W = P_a ,

an idempotent matrix ⇒ a projection, here an oblique projector.
4.1.1 Geometry

Figure 4.1: Level curves z^T W z = c: (a) circle, (b) ellipse in principal-axes position,
(c) general ellipse, depending on the elements w_11, w_22, w_12 of W.

F (z) = z T W z = c
w1 z12 + w2 z22 = c
w1 2 w2 2
z +
z =1
c 1
c 2
z12
z22
w1 + w2 = 1
c

z12 z22
+
=1
a
b

Ellipsengleichung

Eine Familie (weil c variieren kann) von Ellipsen, die im Allgemeinen nicht in Hauptachsenlage sind.

41

4 Weighted least squares


Kreis
Ellipse in Hauptachsenlage

z12 w11 + 2w12 z1 z2 + w22 z22 = c

zTW z = c
u
bliche Ellipsenform aber:

z12 z22
+ 2 =1
a2
b

also
z12 w11 + 2w12 z1 z2 = 0, weil Hauptachsenlage + w22 z22 = c
| {z }
z2
z2
= q 1 2 + q 2 2 = 1
c
w11

c
w22

Allgemeine Ellipse

grad F (z0 ) = 2W z0

der zur Ellipse orthogonale Vektor in Punkt z0

z z0 W z 0

oder

z0 T W (z z0 ) = 0

4.1.2 Ubertragung
auf Ausgleichungsrechnung
Gesucht: Vektor, der von der Linie ax startet, bei y endet und parallel zu z a oder
orthogonal zu aT W ist = e.
Aussagen:
y = a
x ist die Projektion von y
auf a
in Richtung orthogonal zu W a (entlang (W a) )
= y = Pa,(W a) y mit Pa,(W a) = a(aT W a)1 aT W
e ist die Projektion von y
auf (W a)

42

4.2 Weighted condition equations


in Richtung a

= e = P(W a) ,a y

P(W a) ,a = Pa,(W
a)

mit

= I a(aT W a)1 aT W

= [I a(aT W a)1 aT W ]y
Wegen e 6 a (oder aT e 6= 0) handelt es sich um schiefe Projektionen bzw. um
bez
uglich W orthogonale Projektionen, da ja e W a (oder aT W e = 0)

4.1.3 Higher dimensions


Von einer Unbekannten zu vielen Unbekannten
m=2
y = a
21

x + e

21 11

21

wird zu
y = A
m1

x + e

mn n1

m1

Ersetze a durch A!
P(Wa ) ,a = I A (AT W A)1 AT W
nm mm
mn
mm
{z
}
|
nn

4.2 Weighted condition equations


Geometry
Starting point again: bT a = 0 (a b):
Direction of (W a) :
bT a = 0 = bT W 1 W a = 0 = W a W 1 b = W 1 b = (W a)
Zu minimieren: eT W e unter bT e = bT y (oder bT (y e) = 0)
Von allen moglichen e auf der Gerade bT e = bT y ergibt e das Minimum von eT W w =
bT e = bT y = Tangente von eT W e = eT W e

43

4 Weighted least squares

bTe=bTy
z2

ax

e
ax
zTWz=aTWa

Wa

a
z1

b
T

a Wz=a Wa
W -1b

Figure 4.2: weighted condition


m Ber
uhrpunkt: Normal in Richtung b aber auch in Richtung W e = e = W 1 b
bestimmen: e liegt auf bT e = bT y
= bT e = bT W 1 b = bT y
= = (bT W 1 b)1 bT y
= e = W 1 b(bT W 1 b)1 bT y
h
i
= y = y e = I W 1 b(bT W 1 b)1 bT y
NB.: e ist nicht das kleinste e auf bT e = bT y!
Calculus
1
Eb (e, y) = eT W e + (bT y bT e)
2
e : min eT W e

unter Bedingung bT e = bT y

Lagrange:

1
Lb (e, ) = eT W e + (bT e bT y)
2
Gesucht: e und , die Lb minimieren
=

44

etc.

 Lb

= W e + b
=0
T
T
=b eb y =0

e, )
e (
Lb

(
e, )

4.2 Weighted condition equations

b e=b y
e2

ax

e
e1
e TWe
=c1
e TWe
=c2 W -1b
e TWe
=c3

Figure 4.3: possible ellipses

W b
bT 0

  

e
0
= bT y

Eliminierung 1. Zeile:
= 0 = e = W 1 b

W e + b
einsetzen:
= bT y
bT e = bT y = bT W 1 b
losen:
= (bT W 1 b)1 bT y

einsetzen:
e = W 1 b(bT W 1 b)1 bT y
h
i
y = y e = I W 1 b(bT W 1 b)1 bT y
Higher dimensions
Nur b durch B ersetzen.
b = m n Bedingungsgleichungen, Lagrange-Multiplikatoren
BTy = BTe
BTA = 0
y = Ax + e

= B T y = B T Ax + B T e = B T e

45

4 Weighted least squares

mm
BT
bm

m1 =

mb
bb

b1

0
BTy

e = W 1 B(B T W 1 B)1 B T y
h
i
y = I W 1 B(B T W 1 B)1 B T y
Constant term (RHS)
Ideal case without errors:
BTy = c
In reality:
B T (y e) = c = B T e = B T y c := w

e = W 1 B(B T W 1 B)1 [B T y c]
=
y = y e = etc.

4.3 Stochastics
Probabilistic formulation
Version 1:
y = Ax + e,
Version 2:

E {e} = 0,

D {e} = Qy


E y = Ax

Version 1 + 2 zusammen: Funktionales Modell.


Stochastisches Modell:


D y = Qy

Kovarianzmatrix

stochastisches + funktionales Modell: mathematisches Modell


Visualisierung y
Schar von Realisierungen/Experimenten
Vektor der beobachteten Werte = Realisierung von y = nicht stochastisch

46

4.4 Best Linear Unbiased Estimation (blue)


Variance-covariance propagation
In general:
z = M y,

Qz = M Qy M T

x
= (AT W A)1 AT W y
= Schatzer (LS)
= My

E {
x} = (AT W A)1 AT W E y
= (AT W A)1 AT W Ax

= x (erwartungstreuer/unverzerrter) Schatzer
Qx = M Qy M T

= (AT W A)1 AT W Qy W AT (AT W A)1

y = A
x
= PA y


E y = A E {
x} = Ax = E y
Qy = PA Qy PA T

e = y a
x = (I PA )y

E {
e} = E y Ax = 0

Qe = Qy PA Qy Qy PA T + PA Qy PA T

Questions:
Is x
the best estimator?
Or: When is Qx smallest?

4.4 Best Linear Unbiased Estimation (BLUE)

Best       Q_x̂ minimal (within the class of linear unbiased estimators)
Linear     x̂ = L^T y
Unbiased   E{x̂} = x
Estimate
2D example (as before):

    a = (1, 1)^T ,
    E{y} = a x ,      D{y} = Q_y = \begin{pmatrix} σ_1² & σ_{12} \\ σ_{12} & σ_2² \end{pmatrix} .

L-property:   x̂ = l^T y .
U-property:   E{x̂} = l^T E{y} = l^T a x = x   ⇒   l^T a = 1 .
B-property:   x̂ = l^T y   ⇒   σ_x̂² = l^T Q_y l .

We look for the l that minimizes l^T Q_y l while satisfying l^T a = 1:

    min_l  l^T Q_y l      under   l^T a = 1 .

Solution? Compare with the weighted model of condition equations:

    weighted B-model:   minimize  e^T W e        under  b^T e = b^T y = w ,
                        estimator ê = W^{-1} b (b^T W^{-1} b)^{-1} w ;

    BLUE:               minimize  l^T Q_y l      under  a^T l = 1 ,
                        solution  l̂ = Q_y^{-1} a (a^T Q_y^{-1} a)^{-1} · 1 ,

    ⇒   x̂ = l̂^T y = (a^T Q_y^{-1} a)^{-1} a^T Q_y^{-1} y .

Higher dimensions: replace a by A and write Q_y^{-1} = P_y.


Gauss coined the variable P from the Latin pondus, which means weight.

    BLUE:            x̂ = (A^T P_y A)^{-1} A^T P_y y
    deterministic:   x̂ = (A^T W A)^{-1} A^T W y
    ⇒  the weighted LS estimate is BLUE if W = P_y = Q_y^{-1}.

Variance-covariance propagation:

    x̂ = (A^T P_y A)^{-1} A^T P_y y
    ⇒  Q_x̂ = (A^T P_y A)^{-1} A^T P_y Q_y P_y A (A^T P_y A)^{-1} = (A^T P_y A)^{-1}

    ŷ = A x̂ = P_A y
    ⇒  Q_ŷ = P_A Q_y P_A^T = P_A Q_y = Q_y P_A^T

    ê = (I − P_A) y = P_A^⊥ y = y − ŷ
    ⇒  Q_ê = Q_y − P_A Q_y − Q_y P_A^T + P_A Q_y P_A^T = P_A^⊥ Q_y = Q_y − Q_ŷ

Besides, from I = P_A + P_A^⊥:

    Q_y = P_A Q_y + P_A^⊥ Q_y = Q_ŷ + Q_ê .
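A sketch of BLUE and of the variance-covariance propagation for the two-measurement
example, with an assumed covariance matrix Q_y:

    import numpy as np

    a = np.array([[1.0], [1.0]])
    y = np.array([[10.02], [9.98]])             # assumed observation values
    Q_y = np.array([[0.0004, 0.0001],
                    [0.0001, 0.0009]])           # assumed covariance matrix of y

    P_y = np.linalg.inv(Q_y)                     # weight matrix W = Q_y^{-1}
    N = a.T @ P_y @ a
    x_hat = np.linalg.solve(N, a.T @ P_y @ y)    # BLUE of x
    Q_x = np.linalg.inv(N)                       # variance of the estimate
    print(x_hat.item(), Q_x.item())

    P_A = a @ Q_x @ a.T @ P_y                    # oblique projector
    Q_yhat = P_A @ Q_y                           # covariance of the adjusted observations
    Q_ehat = Q_y - Q_yhat                        # decomposition Q_y = Q_yhat + Q_ehat
    print(np.allclose(Q_yhat, P_A @ Q_y @ P_A.T))   # True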

5 Geomatics examples & mixed models

5.1 Ebenes Dreieck

mit Strecken und Winkeln beobachtet


Beobachtungen: Winkel , , [ ], Strecken s12 , s13 , s23 [m]
Hilfsgroen: Richtungswinkel T12 , T13 [ ]
1
N
aherungskoordinaten:
2
3

x
0
1

y
0
0

3
2

1
2

y
3

S13

T13
T12

S23

S12

Figure 5.1: Skizze zur Beobachtungssituation ,,ebenes Dreieck

50

5.1 Ebenes Dreieck


Naherungskoordinaten [m]

x0
0
1

1
2
3

y0
0
0

3
2

1
2

Beobachtung
S12
S23
S13

y [Einheit]

:= 180

0.01[m]
0.03[m]
0.02[m]
0[rad]

0.3 [rad]

0.2 [rad]

Naherungswerte
0
S12
1m
0
S13
1m
0
S23
1m
0
60
0
60
0
60

Messwerte
S12
1.01 m
S13
1.02 m
S23
0.97 m

60

59.7

60.2

Koeffizientenmatrix, Designmatrix A
dx1 dy1
dx2 dy2
dx3 dy3
1
0
1
0 0
0
3
3
1
1
0
0
2 2
2
2
3
1
1

23 0
0
2
2
2
3
3
1
1
0 1

2
2
2
2
3
1
0
1
23 12
2
2

3
2

1
2

3
2

1
2

0.01 m
0.02 m
0.01 m
1
1
1

Einheiten

[]

[m1 ]

Unbekannte
[m]
dx1
dy1
dx2
dy2
dx3
dy3

Nichtlineare Beobachtungsgleichungen
Strecken:
Sij =
Winkel:

q
(xi xj )2 + (yi yj )2

= T12 T13
x2 x1
x3 x1
= arctan
arctan
y2 y1
y3 y1
= T23 T21
x3 x2
x1 x2
arctan
= arctan
y3 y2
y1 y2
= T31 T32
x1 x3
x2 x3
= arctan
arctan
y1 y3
y2 y3
Richtungswinkel:
Tij = arctan

xj xi
yj yi

Linearisierte Beobachtungsgleichungen
(Taylorpunkt = Nullstelle = Menge der Naherungskoordinaten)

51

5 Geomatics examples & mixed models


Strecken:
0
Sij = Sij
+

x0i x0j
0
Sij

xi +

yi0 yj0
0
Sij

yi

x0i x0j
0
Sij

xj +

yi0 yj0
0
Sij

yj

0 . . . aus N
wobei: Sij . . . beobachtete Strecke (Mewert), Sij
aherungskoordinaten berech0
nete Strecke (Naherungsstrecke) und xi := xi xi , yi := yi yi0 etc. . . . unbekannte, im Ausgleichungsproze zu schatzende Parameter (= ausgeglichene Koordinaten
x
i = x0i +
xi etc.)

Winkel: (zuerst Richtungswinkel Tij )


1

Tij = Tij0 +
1+
=

Tij0

= Tij0

x0j x0i
yj0 yi0

(yj0 yi0 )2
0 )2
(Sij

yj0 yi0

2

x0j x0i
x0j x0i
1
1
0
xi + 0
yi + 0
xj 0
yj
yj yi0
(yj yi0 )2
yj yi0
(yj yi0 )2

[. . .]

xi +

x0j x0i

yi +

yj0 yi0

xj +

x0j x0i

0 )2
0 )2
0 )2
0 )2 yj
(Sij
(Sij
(Sij
(Sij
 0

 0

y2 y10 y30 y10
x2 x01 x03 x01
0
0
= = T12 T13 + 0 2 +
x1 +
+
y1
(s12 )
(s013 )2
(s012 )2
(s013 )2

x02 x01
y30 y10
x03 x01
y20 y10
x

x
+
y3
2
2
3
(s012 )2
(s012 )2
(s013 )2
(s013 )2
= 0 + . . .
+

!Physikalische Einheiten! Datumproblem bei unterschiedlichen Beobachtungsszenarien?


Physikalische Einheiten der V-K-Matrix?

5.2 Linearisierung von Richtungsbeobachtungen

rij = Tij wi
xj xi
= arctan
wi
yj yi
0

= rij

52

yj0 yi0
(s0ij )2

xi +

x0j x0i
(s0ij )2

yi +

yj0 yi0
(s0ij )2

xj

x0j x0i
(s0ij )2

yi wi

zusatzliche Unbekannte

5.3 Linearisierung von Bedingungsgleichungen

y
0 Teilkreis
i

Pj

Tij

rij

Pi

Figure 5.2: Linearisierung von Richtungsbeobachtungen, rij : Richtungsmessung, wi :


Orientierungsunbekannte

5.3 Linearisierung von Bedingungsgleichungen

Figure 5.3: Linearisierung von Bedingungsgleichungen


Idealfall fehlerfreier Beobachtungen
a
b
=
sin
sin

a sin b sin = 0

Realfall fehlerbehafteter Beobachtungen


(a + ea ) sin( + e ) (b + eb ) sin( + e ) = 0
Unbekannt sind die Inkonsistenzen ea , eb , e , e , also aben wir eine nichtlineare Funktion
f (ea , eb , e , e ) = (a + ea ) sin( + e ) (b + eb ) sin( + e ) = 0,
die in eine Taylorreihe mit Taylorpunkt (die ei sind klein)

(e0a = e0b = e0 = e0 = 0) =: 0
53

5 Geomatics examples & mixed models


zu entwickeln ist.

f (ea , eb , e , e ) =

f (e0a , e0b , e0 , e0 )



f
f
0
+
(ea ea ) + . . . +
(e e0 )
ea 0
e 0

= a sin b sin + sin ea sin eb + a cos e b cos e = 0


Modell der Ausgleichung nach bedingten Beobachtungen
BTe = w
BT =
e=

sin sin b cos a cos


T
ea eb e e

w = a sin b sin

Widerspruch

5.4 Bogenschnitt im Raum mit zus


atzlichen H
ohenwinkeln
Raumstrecken
Sij =

(xi xj )2 + (yi yj )2 + (zi zj )2

(i = 1, . . . , 4; j P )

... Linearisierung wie u


blich
Hohenwinkel
ij

p
(xi xj )2 + (yi yj )2
= arccot
zi zj
= arccot
0
= ij

(auch andere trigonometrische Beziehungen verwendbar!)

dij
zi zj

dij
zi zj

2 . . . xi + . . . yi + . . . + . . . zj

5.5 Polynomausgleich (mit Nebenbedingungen)


gemessen: an fest vorgegebenen Stellen xi , i = 1, . . . , m gemessene y-Werte yi , i =
1, . . . , m

54

5.5 Polynomausgleich (mit Nebenbedingungen)

z
P
P4

S4
4
S3 3

S1
1
S2

P1

P2

P3

x
Figure 5.4: Hohenwinkel
gesucht: Parameter des ausgleichenden Polynoms
f (x) = y =

nX
max

an xn

n=0

(moglicherweise mit Nebenbedingungen, z. B. der Art die Tangente im Punkt xT , yT


soll durch den Punkt xP , yP gehen oder das Polynom soll durch den Punkt xQ , yQ
gehen oder der Koeffizient ak soll den Wert a
k annehmen, etc.)
Beobachtungsgleichung
yi =

nX
max

an xn + ei

n=0

y1 = a0 x01 + a1 x11 + a2 x21 + . . . + e1


..
.
ym = a0 x0m + a1 x1m + a2 x2m + . . . + em

Figure 5.5: Adjusting polynomials of different degrees (linear n = 1, quadratic n = 2,
cubic n = 3) fitted to measurements at the fixed abscissas x_1, ..., x_10.


Vandermonde matrix A:

    \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} =
    \begin{pmatrix} 1 & x_1 & \cdots & x_1^{n_{max}} \\
                    1 & x_2 & \cdots & x_2^{n_{max}} \\
                    \vdots & & & \vdots \\
                    1 & x_m & \cdots & x_m^{n_{max}} \end{pmatrix}
    \begin{pmatrix} a_0 \\ a_1 \\ \vdots \\ a_{n_{max}} \end{pmatrix} +
    \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_m \end{pmatrix}
    ⇔   y = A x + e .
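A sketch of the polynomial adjustment through the Vandermonde matrix; the data values
and the chosen degree are assumed for illustration:

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])         # fixed, error-free abscissas x_i
    y = np.array([1.02, 1.48, 1.84, 1.99, 2.30, 2.51])   # assumed measured values y_i
    n_max = 2                                             # degree of the adjusting polynomial

    A = np.vander(x, n_max + 1, increasing=True)          # Vandermonde design matrix [1, x, x^2]
    a_hat = np.linalg.solve(A.T @ A, A.T @ y)             # LS estimate of a_0 ... a_{n_max}
    e_hat = y - A @ a_hat
    print(a_hat)                                          # estimated polynomial coefficients
    print(e_hat @ e_hat)                                  # sum of squared residuals e^T e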

1. Ausgleichungsprinzip eT e min x

2. Meorte xi alle fehlerfrei: Inkonsistenzen werden nur in y-Richtung angebracht


3. yi konnen V-K-Matrix haben
4. beste Anpassung: je kleiner eT e f
ur variierende nmax ist, umso besser ist die Anpassung. Aber: je groer nmax ist, umso mehr schwingt das ausgleichende Polynom.
Durch Wahl eines groen nmax ist sogar eT e = 0 erreichbar Beschrankung auf
niedrige Grade (Ausnahme: Fourierkoeffizienten)
5. Einbringen von Nebenbedingungen
a) der Koeffizient ak soll den Wert a
k erhalten: entweder ak aus dem Unbekanntenvektor eliminieren oder u
ber Nebenbedingungen der Art B T x = c einbinden ( Lagrangefunktion)
nP
max
b) das Polynom soll durch den Punkt xQ , yQ gehen. yQ =
an xnQ = B T x
n=0

yQ = 0, B T = 1 xQ x2Q . . . xnQmax . Einbinden u
ber Lagrangefunktion.

56

5.5 Polynomausgleich (mit Nebenbedingungen)


c) die in xT x angelegte Tangente soll auch durch den Punkt xP , yP gehen.
Tangente im Punkt xT , yT : g(x) = f (xT ) + f (xT )(x xT ) soll durch xP , yP
gehen: = yP = g(xP ) = f (xT ) + f (xT )(x xT )
... Beispiel mit nmax = 2
f (x) = a0 + a1 x + a2 x2

Parabel

f (x) = a1 + 2a2 x
Tangente in xT : g(x) = a0 + a1 xT + a2 x2T + (a1 + 2a2 xT )(x xT )
Tangente in xT , die durch xP , yP geht
yP = a0 + a1 xT + a2 x2T + (a1 + 2a2 xT )(xP xT )

= a0 + a1 xT + a2 x2T + a1 (xP xT ) + 2a2 (xP xT )xT


= a0 + xP a1 + xT (2xP xT )a2

= B T x = yP
BT =

1 xP xT (2xP xT )

Einbinden u
ber Lagrange oder eine Unbekannte zugunsten der anderen eliminieren
Im Allgemeinfall des Polynoms von Grad n gilt
B T = 1 xP xT (2xP xT ) . . . xn1
T [nxP (n 1)xT ]

Praktische Umsetzung ...


... der Zeichnung der Tangente an ein Polynom n-ten Grades:
Bedingungsgleichung Bx = c
B =
x=

1 xP xT (2xP xT ) x2T (3xP 2xT ) . . . xn1


T (nxP (n 1)xT )

a0 a1 . . . an

c = yP

xT . . . x-Koordinate des Punktes, wo die Tangente an das Polynom angelegt wird


xP , yP . . . Koordinaten des Punktes, durch den die Tangente (noch) gehen soll
yT = a
0 + a
1 xT + . . . + a
n xnT geschatzte yT -Koordinate

57

5 Geomatics examples & mixed models


Tangente kann zwischen (xP , yP ) und (xT , yT ) gezeichnet werden. Verlangerung der
Tangente bis zu den Koordinatenrandern
Steigung der Tangente
Achsenabschnitt yT

yT yP
xT xP
yT yP
xT xP

=: aT

xT =: bT =

yP xT
yT x P
xT xP

= yi = aT xi + bT
Problem: ausgleichender Kreis als Spezialfall einer Ellipse gemischtes Modell
gemessen: x- und y-Koordinaten von Punkten, die nahezu auf einer Ellipse liegen
gesucht: ausgeglichene Positionen der Messpunkte, Mittelpunktkoordinaten, Halbachsen
der Ellipse
Bedingungsgleichung:
(xi + exi xM )2 (yi + eyi yM )2
+
1=0
a2
b2

Ellipsengleichung

Introduction of approximate values x_M^0, y_M^0, a_0, b_0 and e_{x_i}^0 = e_{y_i}^0 = 0:

    f(e_{x_i}, e_{y_i}, x_M, y_M, a, b) = \ldots   (Taylor expansion)

    \frac{(x_i - x_M^0)^2}{a_0^2} + \frac{(y_i - y_M^0)^2}{b_0^2} - 1
    + \frac{2(x_i - x_M^0)}{a_0^2}\, e_{x_i} + \frac{2(y_i - y_M^0)}{b_0^2}\, e_{y_i}
    - \frac{2(x_i - x_M^0)}{a_0^2}\, \Delta x_M - \frac{2(y_i - y_M^0)}{b_0^2}\, \Delta y_M
    - \frac{2(x_i - x_M^0)^2}{a_0^3}\, \Delta a - \frac{2(y_i - y_M^0)^2}{b_0^3}\, \Delta b = 0

Sorting by inconsistencies and unknown parameters:

    \left( \frac{2(x_i - x_M^0)}{a_0^2} \;\; \frac{2(y_i - y_M^0)}{b_0^2} \right)
    \begin{pmatrix} e_{x_i} \\ e_{y_i} \end{pmatrix}
    - \left( \frac{2(x_i - x_M^0)}{a_0^2} \;\; \frac{2(y_i - y_M^0)}{b_0^2} \;\; \frac{2(x_i - x_M^0)^2}{a_0^3} \;\; \frac{2(y_i - y_M^0)^2}{b_0^3} \right)
    \begin{pmatrix} \Delta x_M \\ \Delta y_M \\ \Delta a \\ \Delta b \end{pmatrix}
    + \frac{(x_i - x_M^0)^2}{a_0^2} + \frac{(y_i - y_M^0)^2}{b_0^2} - 1 = 0

One equation (per point) with six unknowns.


Collecting all points:

    B^T e + A \Delta x + w = 0

with

    B^T \; (m \times 2m): row i contains \; \frac{2(x_i - x_M^0)}{a_0^2} \; and \; \frac{2(y_i - y_M^0)}{b_0^2} \; in the columns of e_{x_i}, e_{y_i}, and zeros elsewhere,

    A \; (m \times 4) = -2 \begin{pmatrix}
    \frac{x_1 - x_M^0}{a_0^2} & \frac{y_1 - y_M^0}{b_0^2} & \frac{(x_1 - x_M^0)^2}{a_0^3} & \frac{(y_1 - y_M^0)^2}{b_0^3} \\
    \vdots & \vdots & \vdots & \vdots \\
    \frac{x_m - x_M^0}{a_0^2} & \frac{y_m - y_M^0}{b_0^2} & \frac{(x_m - x_M^0)^2}{a_0^3} & \frac{(y_m - y_M^0)^2}{b_0^3}
    \end{pmatrix}

    e \; (2m \times 1) = (e_{x_1}, e_{y_1}, e_{x_2}, e_{y_2}, \ldots, e_{x_m}, e_{y_m})^T
    \Delta x \; (4 \times 1) = (\Delta x_M, \Delta y_M, \Delta a, \Delta b)^T
    w \; (m \times 1), \quad w_i = \frac{(x_i - x_M^0)^2}{a_0^2} + \frac{(y_i - y_M^0)^2}{b_0^2} - 1

Lagrange function:

    L(e, \Delta x, \lambda) = \tfrac{1}{2} e^T e + \lambda^T (B^T e + A \Delta x + w) \to \min_{e, \Delta x, \lambda}

    \frac{\partial L}{\partial e}\Big|_{(\hat e, \Delta\hat x, \hat\lambda)} = 0 \;\Rightarrow\; \hat e + B \hat\lambda = 0
    \frac{\partial L}{\partial \Delta x}\Big|_{(\hat e, \Delta\hat x, \hat\lambda)} = 0 \;\Rightarrow\; A^T \hat\lambda = 0
    \frac{\partial L}{\partial \lambda}\Big|_{(\hat e, \Delta\hat x, \hat\lambda)} = 0 \;\Rightarrow\; B^T \hat e + A \Delta\hat x + w = 0


In matrix form:

    \begin{pmatrix} I & 0 & B \\ 0 & 0 & A^T \\ B^T & A & 0 \end{pmatrix}_{(3m+4)\times(3m+4)}
    \begin{pmatrix} \hat e \\ \Delta\hat x \\ \hat\lambda \end{pmatrix}
    = \begin{pmatrix} 0 \\ 0 \\ -w \end{pmatrix}

Eliminating \hat e = -B\hat\lambda from the last block row gives B^T B \hat\lambda - A \Delta\hat x = w, i.e.

    \begin{pmatrix} B^T B & -A \\ A^T & 0 \end{pmatrix}
    \begin{pmatrix} \hat\lambda \\ \Delta\hat x \end{pmatrix}
    = \begin{pmatrix} w \\ 0 \end{pmatrix}

Hence

    \hat\lambda = (B^T B)^{-1} (A \Delta\hat x + w)
    A^T (B^T B)^{-1} A \,\Delta\hat x + A^T (B^T B)^{-1} w = 0
    \Rightarrow \Delta\hat x = -\left( A^T (B^T B)^{-1} A \right)^{-1} A^T (B^T B)^{-1} w
    \Rightarrow \hat e = -B (B^T B)^{-1} (A \Delta\hat x + w)
                = B (B^T B)^{-1} \left( A \left( A^T (B^T B)^{-1} A \right)^{-1} A^T (B^T B)^{-1} - I \right) w
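As an illustration, here is a minimal Python sketch (not part of the original notes) of one iteration of this ellipse adjustment with unit weights. It simply evaluates the linearized matrices and the closed-form solution derived above; the function name and the iteration strategy are assumptions.

```python
import numpy as np

# Sketch: one Gauss-Helmert iteration of the ellipse fit,
#   B^T e + A dx + w = 0 ,  e^T e -> min
#   dx = -(A^T (B^T B)^{-1} A)^{-1} A^T (B^T B)^{-1} w
#   lam = (B^T B)^{-1}(A dx + w) ,  e = -B lam
# x, y: measured coordinates; xM0, yM0, a0, b0: approximate values.

def ellipse_iteration(x, y, xM0, yM0, a0, b0):
    m = len(x)
    u = (x - xM0) / a0**2
    v = (y - yM0) / b0**2
    w = (x - xM0)**2 / a0**2 + (y - yM0)**2 / b0**2 - 1.0   # misclosure vector
    BT = np.zeros((m, 2 * m))                                # B^T, one row per condition
    BT[np.arange(m), 2 * np.arange(m)] = 2 * u
    BT[np.arange(m), 2 * np.arange(m) + 1] = 2 * v
    A = -2 * np.column_stack([u, v, (x - xM0)**2 / a0**3, (y - yM0)**2 / b0**3])
    M = BT @ BT.T                                            # B^T B  (m x m)
    Minv_w = np.linalg.solve(M, w)
    Minv_A = np.linalg.solve(M, A)
    dx = -np.linalg.solve(A.T @ Minv_A, A.T @ Minv_w)        # (dxM, dyM, da, db)
    lam = np.linalg.solve(M, A @ dx + w)
    e = -BT.T @ lam                                          # (ex1, ey1, ..., exm, eym)
    return dx, e

# In practice the approximate values are updated with dx and the step is repeated
# until the corrections become negligible.
```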

5.6 Mixed model


Functional constraints:

    f(x, y) = 0   (observation model)
    g(x) = 0      (constraints on the parameters only)

i.e. f(x_0 + \Delta x, \; y - e) = 0 and g(x_0 + \Delta x) = 0. Linearization:

    f(x_0, y) + \frac{\partial f}{\partial x}\Big|_{x_0, y} \Delta x + \frac{\partial f}{\partial y}\Big|_{x_0, y} (-e) + \text{HOT} = 0
    g(x_0) + \frac{\partial g}{\partial x}\Big|_{x_0} \Delta x + \text{HOT} = 0

or, writing A := \partial f / \partial x, B^T := \partial f / \partial y, w := f(x_0, y), A_f := \partial g / \partial x, w_f := g(x_0):

    A \Delta x - B^T e + w = 0
    A_f \Delta x + w_f = 0

under the least-squares principle \tfrac{1}{2} e^T W e \to \min.

Lagrange function:

    L_f(\Delta x, e, \lambda, \lambda_f) = \tfrac{1}{2} e^T W e + \lambda^T (A \Delta x - B^T e + w) + \lambda_f^T (A_f \Delta x + w_f)

    \frac{\partial L_f}{\partial \Delta x} = \lambda^T A + \lambda_f^T A_f = 0 \;\Rightarrow\; A^T \hat\lambda + A_f^T \hat\lambda_f = 0
    \frac{\partial L_f}{\partial e} = e^T W - \lambda^T B^T = 0 \;\Rightarrow\; W \hat e - B \hat\lambda = 0



    \frac{\partial L_f}{\partial \lambda} = (A \Delta x - B^T e + w)^T = 0 \;\Rightarrow\; A \Delta\hat x - B^T \hat e + w = 0
    \frac{\partial L_f}{\partial \lambda_f} = (A_f \Delta x + w_f)^T = 0 \;\Rightarrow\; A_f \Delta\hat x + w_f = 0

Collected in one system:

    \begin{pmatrix} W & -B & 0 & 0 \\ -B^T & 0 & 0 & A \\ 0 & 0 & 0 & A_f \\ 0 & A^T & A_f^T & 0 \end{pmatrix}
    \begin{pmatrix} \hat e \\ \hat\lambda \\ \hat\lambda_f \\ \Delta\hat x \end{pmatrix}
    = \begin{pmatrix} 0 \\ -w \\ -w_f \\ 0 \end{pmatrix}

B^T W^{-1} \cdot \text{row 1} + \text{row 2}, then reduce the equations and the unknowns (drop \hat e):

    \begin{pmatrix} -B^T W^{-1} B & 0 & A \\ 0 & 0 & A_f \\ A^T & A_f^T & 0 \end{pmatrix}
    \begin{pmatrix} \hat\lambda \\ \hat\lambda_f \\ \Delta\hat x \end{pmatrix}
    = \begin{pmatrix} -w \\ -w_f \\ 0 \end{pmatrix}

With M := B^T W^{-1} B, adding A^T M^{-1} \cdot \text{row 1} to row 3 and performing the second reduction (drop \hat\lambda) gives

    \begin{pmatrix} A^T M^{-1} A & A_f^T \\ A_f & 0 \end{pmatrix}
    \begin{pmatrix} \Delta\hat x \\ \hat\lambda_f \end{pmatrix}
    = \begin{pmatrix} -A^T M^{-1} w \\ -w_f \end{pmatrix}

Case 1: AT (B T W 1 B)1 A is a full-rank matrix



Use the partitioning formula:

    \begin{pmatrix} N_{11} & N_{12} \\ N_{21} & N_{22} \end{pmatrix}
    \begin{pmatrix} Q_{11} & Q_{12} \\ Q_{21} & Q_{22} \end{pmatrix}
    = \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix}

    Q_{22} = (N_{22} - N_{21} N_{11}^{-1} N_{12})^{-1}
    Q_{12} = -N_{11}^{-1} N_{12} Q_{22}
    Q_{21} = -Q_{22} N_{21} N_{11}^{-1}
    Q_{11} = N_{11}^{-1} + N_{11}^{-1} N_{12} Q_{22} N_{21} N_{11}^{-1}

Here, with M = B^T W^{-1} B and N := A^T M^{-1} A:

    N_{11} = A^T (B^T W^{-1} B)^{-1} A = A^T M^{-1} A = N
    N_{12} = A_f^T, \quad N_{21} = N_{12}^T = A_f, \quad N_{22} = 0

    Q_{22} = (0 - A_f N^{-1} A_f^T)^{-1} = -(A_f N^{-1} A_f^T)^{-1}
    Q_{12} = N^{-1} A_f^T (A_f N^{-1} A_f^T)^{-1}, \qquad Q_{21} = Q_{12}^T
    Q_{11} = N^{-1} - N^{-1} A_f^T (A_f N^{-1} A_f^T)^{-1} A_f N^{-1}

    \Delta\hat x = -Q_{11} A^T M^{-1} w - Q_{12} w_f
        = -N^{-1} A^T M^{-1} w
          + N^{-1} A_f^T (A_f N^{-1} A_f^T)^{-1} A_f N^{-1} A^T M^{-1} w
          - N^{-1} A_f^T (A_f N^{-1} A_f^T)^{-1} w_f
        = \Delta\hat x_{\text{without constraints}} + \text{constraint correction}

    \hat\lambda_f = -Q_{21} A^T M^{-1} w - Q_{22} w_f
        = (A_f N^{-1} A_f^T)^{-1} \left( w_f - A_f N^{-1} A^T M^{-1} w \right)

    \hat\lambda = M^{-1} (A \Delta\hat x + w)
    \hat e = W^{-1} B \hat\lambda = W^{-1} B M^{-1} (A \Delta\hat x + w)
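A quick numerical cross-check of these Case-1 formulas (a sketch, not part of the original notes; all matrices are randomly generated stand-ins): solving the reduced bordered system directly must satisfy the functional constraints exactly.

```python
import numpy as np

# Sketch: solve  [ N  Af^T ; Af  0 ] [dx; lam_f] = [ -A^T M^{-1} w ; -w_f ]  directly.
rng = np.random.default_rng(1)
m, n, q = 8, 4, 2                        # hypothetical dimensions
A  = rng.normal(size=(m, n))
Af = rng.normal(size=(q, n))             # functional constraints A_f dx + w_f = 0
M  = np.eye(m)                           # stands for B^T W^{-1} B
w  = rng.normal(size=m)
wf = rng.normal(size=q)

N = A.T @ np.linalg.solve(M, A)          # N = A^T M^{-1} A (full rank here)
K = np.block([[N, Af.T], [Af, np.zeros((q, q))]])
rhs = np.concatenate([-A.T @ np.linalg.solve(M, w), -wf])
sol = np.linalg.solve(K, rhs)
dx, lam_f = sol[:n], sol[n:]
print(np.allclose(Af @ dx, -wf))         # True: the constraints are satisfied
```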
Case 2: A^T M^{-1} A is rank deficient:

    \operatorname{rank}(A^T M^{-1} A) = \operatorname{rank} A = n - d


    \begin{pmatrix} N & A_f^T \\ A_f & 0 \end{pmatrix}^{-1} =: \begin{pmatrix} R & S^T \\ S & Q \end{pmatrix}

    N R + A_f^T S = I          (5.1)
    N S^T + A_f^T Q = 0        (5.2)
    A_f R = 0                  (5.3)
    A_f S^T = I                (5.4)

Since A is rank deficient, A H^T = 0 where H = null(A), H of size d \times n. Therefore

    A^T M^{-1} A H^T = 0 \;\Rightarrow\; N H^T = 0 \;\Rightarrow\; H N = 0 \quad (N \text{ is symmetric})

    H \cdot (5.1): \; \underbrace{H N}_{0} R + H A_f^T S = H \;\Rightarrow\; S = (H A_f^T)^{-1} H
    H \cdot (5.2): \; \underbrace{H N}_{0} S^T + H A_f^T Q = 0 \;\Rightarrow\; H A_f^T Q = 0
    H A_f^T \text{ full rank} \;\Rightarrow\; Q = 0

    (5.1) \;\Rightarrow\; N R + A_f^T (H A_f^T)^{-1} H = I
    (5.3) \;\Rightarrow\; A_f R = 0 \;\Rightarrow\; A_f^T A_f R = 0

Adding both:

    (N + A_f^T A_f) R = I - A_f^T (H A_f^T)^{-1} H
    \Rightarrow\; R = (N + A_f^T A_f)^{-1} \left( I - A_f^T (H A_f^T)^{-1} H \right)

Then

    \Delta\hat x = -R A^T M^{-1} w - S^T w_f
        = -(N + A_f^T A_f)^{-1} A^T M^{-1} w
          + (N + A_f^T A_f)^{-1} A_f^T (H A_f^T)^{-1} \underbrace{H A^T}_{=0} M^{-1} w - S^T w_f
        = -(N + A_f^T A_f)^{-1} A^T M^{-1} w - H^T (A_f H^T)^{-1} w_f



If w_f = 0:

    \Delta\hat x = -(N + A_f^T A_f)^{-1} A^T M^{-1} w
    \hat\lambda = M^{-1} (A \Delta\hat x + w) = M^{-1} \left( I - A (N + A_f^T A_f)^{-1} A^T M^{-1} \right) w
    \hat e = W^{-1} B \hat\lambda = W^{-1} B M^{-1} \left( I - A (N + A_f^T A_f)^{-1} A^T M^{-1} \right) w


6 Statistics
6.1 Expectation of sum of squared residuals
    E\{\hat e^T Q_y^{-1} \hat e\} = ?

NB: \hat e^T Q_y^{-1} \hat e is the quantity that is minimized.

    \hat e^T_{(1\times m)}\, Q_y^{-1}{}_{(m\times m)}\, \hat e_{(m\times 1)} = \sum_{i=1}^m \sum_{j=1}^m (P_y)_{ij}\, \hat e_i \hat e_j

    \Rightarrow E\{\hat e^T Q_y^{-1} \hat e\}
    = \sum_i \sum_j (P_y)_{ij}\, E\{\hat e_i \hat e_j\}
    = \sum_i \sum_j (P_y)_{ij}\, (Q_{\hat e})_{ij}
    = \sum_i \sum_j (P_y)_{ij}\, (Q_{\hat e})_{ji}
    = \sum_i [P_y Q_{\hat e}]_{ii}
    = \operatorname{trace}(P_y Q_{\hat e})
    = \operatorname{trace}(P_y (Q_y - Q_{\hat y}))
    = \operatorname{trace}(I_m - P_y Q_{\hat y})
    = m - \operatorname{trace}(P_y Q_{\hat y})

    \operatorname{trace}(P_y Q_{\hat y}) = \operatorname{trace}(Q_{\hat y} P_y)
    = \operatorname{trace}(A Q_{\hat x} A^T P_y)
    = \operatorname{trace}\left( A (A^T P_y A)^{-1} A^T P_y \right)
    = \operatorname{trace} P_A \quad (\text{projector})

From linear algebra: trace X = sum of the eigenvalues of X.
Question: what are the eigenvalues of the projector?

    P_A z = \lambda z
    P_A P_A z = P_A z = \lambda z \quad\text{and}\quad P_A P_A z = \lambda P_A z = \lambda^2 z
    \Rightarrow \lambda^2 z = \lambda z \;\Rightarrow\; \lambda(\lambda - 1) z = 0 \;\Rightarrow\; \lambda = 0 \text{ or } \lambda = 1

⇒ trace P_A = number of eigenvalues equal to 1.
Question: how many eigenvalues are equal to 1? Answer: dim R(A) = n. Hence

    E\{\hat e^T P_y \hat e\} = m - n \;\; (= r, \text{ the redundancy})
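A short Monte-Carlo illustration of this result (a sketch, not part of the original notes; dimensions and data are arbitrary assumptions):

```python
import numpy as np

# Sketch: verify E{e_hat^T Qy^{-1} e_hat} = m - n for a model y = A x + e with Q_y = I.
rng = np.random.default_rng(0)
m, n = 50, 3
A = rng.normal(size=(m, n))
P = A @ np.linalg.solve(A.T @ A, A.T)          # projector P_A onto R(A)

vals = []
for _ in range(2000):
    e = rng.normal(size=m)                     # E{e} = 0, Q_y = I
    y = A @ rng.normal(size=n) + e
    e_hat = y - P @ y                          # e_hat = P_A^perp y
    vals.append(e_hat @ e_hat)                 # e_hat^T Qy^{-1} e_hat with Q_y = I

print(np.mean(vals), m - n)                    # the two numbers should be close
```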


6.2 Basics
Random variable: \underline{x}
Realization: x

Probability density function (pdf): f(x)

Figure 6.1: (a) probability density function, (b) probability calculations by integrating over the pdf, (c) interval [a, b]

    \int_{-\infty}^{\infty} f(x)\, dx = 1

NB: not necessarily the normal distribution!

    E\{\underline{x}\} = \mu_x = \int_{-\infty}^{\infty} x f(x)\, dx
    D\{\underline{x}\} = \sigma_x^2 = \int_{-\infty}^{\infty} (x - \mu_x)^2 f(x)\, dx = E\{(\underline{x} - \mu_x)^2\}

Probability calculations by integrating over the pdf:

    P(\underline{x} < x_0) = \int_{-\infty}^{x_0} f(x)\, dx

Cumulative distribution function (cdf):

    F(x) = \int_{-\infty}^{x} f(y)\, dy = P(\underline{x} < x)

Figure 6.2: cumulative distribution function

e.g.:

    P(a \le \underline{x} \le b) = \int_a^b f(x)\, dx
    = \int_{-\infty}^b f(x)\, dx - \int_{-\infty}^a f(x)\, dx
    = F(b) - F(a)
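A one-line numerical illustration of F(b) - F(a) (a sketch, not part of the original notes; the distribution and its parameters are arbitrary assumptions):

```python
from scipy.stats import norm

# Sketch: P(a <= x <= b) = F(b) - F(a) for x ~ N(mu, sigma^2), using the cdf.
mu, sigma = 5.0, 2.0
a, b = 3.0, 8.0
p = norm.cdf(b, loc=mu, scale=sigma) - norm.cdf(a, loc=mu, scale=sigma)
print(p)        # probability that x falls in [a, b]
```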

6.3 Hypotheses
A hypothesis is an assumption or statement that can be tested statistically:

    H: \; \underline{x} \sim f(x)

Assumption: x is distributed according to a given f(x).

    P(a \le \underline{x} \le b) = 1 - \alpha
    1 - \alpha = \text{confidence probability} = \text{confidence level}


Figure 6.3: Confidence and significance level: acceptance region A = [a, b] with probability 1 - \alpha, critical regions K (reject) in the two tails with probability \alpha/2 each

    P(\underline{x} \notin [a, b]) = \alpha = \text{error probability} = \text{significance level}
    [a, b] = \text{confidence region} = \text{acceptance region}
    [-\infty, a] \cup [b, \infty] = \text{critical region} = \text{rejection region}
    a, b = \text{critical values}

Now take the realization x of \underline{x}:
if a \le x \le b: accept the hypothesis; if not: reject the hypothesis.

e.g.:

    \hat e = P_A^{\perp} y, \qquad Q_{\hat e} = P_A^{\perp} Q_y = Q_y - Q_{\hat y}

Example: normal distribution (two-sided). Define a, b → determine \alpha:

    P(\mu - \sigma \le \underline{x} \le \mu + \sigma) = 68.3\% \;\Rightarrow\; \alpha = 0.317
    P(\mu - 2\sigma \le \underline{x} \le \mu + 2\sigma) = 95.5\% \;\Rightarrow\; \alpha = 0.045
    P(\mu - 3\sigma \le \underline{x} \le \mu + 3\sigma) = 99.7\% \;\Rightarrow\; \alpha = 0.003

Matlab: normpdf

    1 - \alpha = F(b) - F(a) = F(\mu + k\sigma) - F(\mu - k\sigma), \qquad k = \text{critical value}

Fix \alpha → determine a, b:

    P(\mu - 1.96\sigma \le \underline{x} \le \mu + 1.96\sigma) = 95\% \;\Rightarrow\; \alpha = 0.05 \;(\approx 2\sigma)
    P(\mu - 2.58\sigma \le \underline{x} \le \mu + 2.58\sigma) = 99\% \;\Rightarrow\; \alpha = 0.01
    P(\mu - 3.29\sigma \le \underline{x} \le \mu + 3.29\sigma) = 99.9\% \;\Rightarrow\; \alpha = 0.001

Matlab: norminv
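The equivalent of Matlab's norminv in Python is scipy's norm.ppf; a small sketch (not part of the original notes) reproduces the critical values quoted above:

```python
from scipy.stats import norm

# Sketch: two-sided critical values k with P(-k <= x <= k) = 1 - alpha for x ~ N(0,1).
for alpha in (0.05, 0.01, 0.001):
    k = norm.ppf(1 - alpha / 2)        # inverse cdf
    print(alpha, round(k, 2))          # 1.96, 2.58, 3.29
```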
Rejection of the hypothesis ⇒ another hypothesis must be true:

    H_0: \; \underline{x} \sim f_0(x)   (null hypothesis)
    H_a: \; \underline{x} \sim f_a(x)   (alternative hypothesis)

Figure 6.4: accept or reject the hypothesis? (densities of H_0 and H_a, acceptance region A, critical region K with boundary k)

    x \in K \;\Rightarrow\; \text{reject } H_0
    x \notin K \;\Rightarrow\; \text{accept } H_0

                       | H_0 true                     | H_0 false
    x in K (reject)    | type I error (false alarm),  | OK
                       | P(x \in K | H_0) = \alpha    |
    x not in K (accept)| OK                           | type II error (missed alarm),
                       |                              | P(x \notin K | H_a) = \beta

    \alpha = level of significance of the test = size of the test
    \gamma = 1 - \beta = power of the test (Testgüte)


6.4 Distributions
Standard normal distribution (univariate)

    x \sim N(0, 1), \qquad f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}
    E\{x\} = 0, \qquad D\{x\} = E\{x^2\} = 1, \qquad x^2 \sim \chi^2(1, 0), \qquad E\{x^2\} = 1

Standard normal (multivariate) → Chi-square distribution

    x \;(k\text{-vector}) \sim N(0, I), \qquad f(x) = \frac{1}{(2\pi)^{k/2}} \exp\left( -\tfrac{1}{2} x^T x \right), \qquad E\{x x^T\} = I

    x^T x = x_1^2 + x_2^2 + \ldots + x_k^2 \sim \chi^2(k, 0)
    E\{x^T x\} = E\{x_1^2\} + \ldots + E\{x_k^2\} = k

Non-standard normal → central Chi-square distribution

    x \sim N(0, Q_x), \qquad Q_x = \operatorname{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2)

    f(x_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left( -\tfrac{1}{2} \frac{x_i^2}{\sigma_i^2} \right), \qquad x_i \sim N(0, \sigma_i^2), \qquad y_i = \frac{x_i}{\sigma_i} \sim N(0, 1)

    x^T Q_x^{-1} x = \frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2} + \ldots + \frac{x_k^2}{\sigma_k^2} \sim \chi^2(k, 0)
    \Rightarrow E\{x^T Q_x^{-1} x\} = k


    f(x) = \frac{1}{(2\pi)^{k/2} (\det Q_x)^{1/2}} \exp\left( -\tfrac{1}{2} x^T Q_x^{-1} x \right)

The same is true when x \sim N(0, Q_x) with Q_x a full matrix.

Non-standard normal → non-central Chi-square distribution

    x \sim N(\nabla, I) \;\Rightarrow\; x^T x \sim \chi^2(k, \lambda), \qquad E\{x\} = \nabla
    E\{x^T x\} = k + \lambda, \qquad \lambda = \nabla^T \nabla = \nabla_1^2 + \nabla_2^2 + \ldots + \nabla_k^2 \quad (\text{non-centrality parameter})

General case

    x \sim N(\nabla, Q_x)
    f(x) = \frac{1}{(2\pi)^{k/2} (\det Q_x)^{1/2}} \exp\left( -\tfrac{1}{2} (x - \nabla)^T Q_x^{-1} (x - \nabla) \right)
    E\{x\} = \nabla, \qquad D\{x\} = Q_x
    E\{x^T Q_x^{-1} x\} = k + \lambda, \qquad \lambda = \nabla^T Q_x^{-1} \nabla
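A quick Monte-Carlo check of the last expectation (a sketch, not part of the original notes; the covariance and mean below are arbitrary assumptions):

```python
import numpy as np

# Sketch: verify E{x^T Qx^{-1} x} = k + lambda for x ~ N(nabla, Qx),
# with lambda = nabla^T Qx^{-1} nabla.
rng = np.random.default_rng(0)
k = 4
L = rng.normal(size=(k, k))
Qx = L @ L.T + k * np.eye(k)                   # some positive-definite covariance
nabla = rng.normal(size=k)
lam = nabla @ np.linalg.solve(Qx, nabla)

x = rng.multivariate_normal(nabla, Qx, size=20000)
vals = np.einsum('ij,ij->i', x, np.linalg.solve(Qx, x.T).T)   # x_i^T Qx^{-1} x_i
print(vals.mean(), k + lam)                    # the two numbers should be close
```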

7 Statistical Testing
7.1 Global model test: a first approach
Statistics of estimated residuals

    \hat e = y - \hat y = P_A^{\perp} y
    E\{\hat e\} = 0, \qquad D\{\hat e\} = Q_{\hat e} = Q_y - Q_{\hat y} = P_A^{\perp} Q_y (P_A^{\perp})^T
    \hat e \sim N(0, Q_{\hat e})

Question: is \hat e^T Q_{\hat e}^{-1} \hat e \sim \chi^2(m, 0), and thus E\{\hat e^T Q_{\hat e}^{-1} \hat e\} = m?

No, because Q_{\hat e} is singular and therefore not invertible. However, in 6.1:

    E\{\hat e^T Q_y^{-1} \hat e\} = \operatorname{trace}(Q_y^{-1} E\{\hat e \hat e^T\}) = \operatorname{trace}(Q_y^{-1} \underbrace{(Q_y - Q_{\hat y})}_{Q_{\hat e}}) = m - n

Test statistic
As residuals tell us something about the mismatch between data and model, they will be the basis for our testing. In particular the sum of squared estimated residuals will be used as our test statistic T:

    T = \hat e^T Q_y^{-1} \hat e \sim \chi^2(m - n, 0), \qquad E\{T\} = m - n

Thus, we have a test statistic and we know its distribution. This is the starting point for global model testing.


Figure 7.1: Distribution of the test statistic T under the null and alternative hypotheses.
(Non-centrality parameter to be explained later)

T > k_\alpha: reject H_0
In case T, the realization of \underline{T}, is larger than a chosen critical value k_\alpha (based on \alpha), the null hypothesis H_0 should be rejected. At this point, we haven't formulated an alternative hypothesis H_a yet. The rejection may be due to:
- an error in the (deterministic) observation model A,
- a measurement error: E\{e\} \ne 0,
- wrong assumptions in the stochastic model: D\{e\} \ne Q_y.

Variance of unit weight
A possible error in the stochastic model would be a wrong scale factor. Let us write Q_y = \sigma^2 Q and see how an unknown variance factor \sigma^2 propagates through the various estimators:

    \hat x = (A^T Q_y^{-1} A)^{-1} A^T Q_y^{-1} y, \qquad Q_{\hat x} = (A^T Q_y^{-1} A)^{-1}
    Q_y = \sigma^2 Q \;\Rightarrow\; P_y = Q_y^{-1} = \sigma^{-2} Q^{-1}
    \hat x = (A^T \sigma^{-2} Q^{-1} A)^{-1} A^T \sigma^{-2} Q^{-1} y = \sigma^2 (A^T Q^{-1} A)^{-1} A^T \sigma^{-2} Q^{-1} y



        = (A^T Q^{-1} A)^{-1} A^T Q^{-1} y \;\;\Rightarrow\;\; \text{independent of } \sigma^2
    Q_{\hat x} = \sigma^2 (A^T Q^{-1} A)^{-1} \;\;\Rightarrow\;\; \text{fully dependent on } \sigma^2

Thus, the estimator \hat x is independent of the variance factor and therefore insensitive to this kind of stochastic model error. However, the covariance matrix Q_{\hat x} is scaled by the variance factor. How about the test statistic T?

    E\{\hat e^T Q_y^{-1} \hat e\} = E\{\sigma^{-2} \hat e^T Q^{-1} \hat e\} = m - n
    \Rightarrow E\{\hat e^T Q^{-1} \hat e\} = \sigma^2 (m - n)

Alternative test statistic
This leads to a new test statistic:

    \hat\sigma^2 = \frac{\hat e^T Q^{-1} \hat e}{m - n} \;\Rightarrow\; E\{\hat\sigma^2\} = \sigma^2,

which shows that \hat\sigma^2 is an unbiased estimator (unverzerrter Schätzer) of \sigma^2.
If we consider Q as the a priori variance-covariance matrix, then \hat Q_y = \hat\sigma^2 Q is the a posteriori one.
Now consider the ratio between a posteriori and a priori variance as an alternative test statistic:

    \frac{\hat\sigma^2}{\sigma^2} = \frac{\hat e^T \sigma^{-2} Q^{-1} \hat e}{m - n} = \frac{\hat e^T Q_y^{-1} \hat e}{m - n} = \frac{T}{m - n} \sim F(m - n, \infty, 0)

The ratio has a so-called Fisher distribution.

    E\left\{ \frac{\hat\sigma^2}{\sigma^2} \right\} = 1
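A compact sketch of the global model test as used in practice (not part of the original notes; the function name and the default significance level are assumptions):

```python
import numpy as np
from scipy.stats import chi2

# Sketch: global model test with T = e_hat^T Qy^{-1} e_hat ~ chi^2(m - n) under H0.
def global_model_test(A, y, Qy, alpha=0.05):
    m, n = A.shape
    Py = np.linalg.inv(Qy)
    x_hat = np.linalg.solve(A.T @ Py @ A, A.T @ Py @ y)
    e_hat = y - A @ x_hat
    T = e_hat @ Py @ e_hat
    k_alpha = chi2.ppf(1 - alpha, df=m - n)     # critical value
    sigma2_hat = T / (m - n)                    # a-posteriori variance factor
    return T, k_alpha, T > k_alpha, sigma2_hat  # last flag: reject H0?
```

If Q_y = sigma^2 Q with unknown sigma^2, the same routine applied with Q instead of Q_y returns the estimate of the variance factor as described above.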

7.2 Testing procedure


Null hypothesis and alternative hypothesis
If the null hypothesis is described by E\{y\} = Ax, D\{y\} = Q_y, and if we assume that our stochastic model is correct, then we formulate an alternative hypothesis by augmenting the model. We will add q new parameters \nabla (which here is not an operator). Consequently we will need a design matrix C of size m \times q for \nabla.


    H_0: \; E\{y\} = A x, \quad D\{y\} = Q_y
         \hat x_0, \quad \hat y_0 = A \hat x_0, \quad \hat e_0, \quad \hat e_0^T Q_y^{-1} \hat e_0 \sim \chi^2(m - n)

    H_a: \; E\{y\} = A x + C \nabla = (A \;\; C) \begin{pmatrix} x \\ \nabla \end{pmatrix}, \quad D\{y\} = Q_y
         \hat x_a, \hat\nabla, \quad \hat y_a = A \hat x_a + C \hat\nabla, \quad \hat e_a, \quad \hat e_a^T Q_y^{-1} \hat e_a \sim \chi^2(m - n - q)

Under H_a there are more parameters ⇒ the sum of squared residuals is smaller:

    \hat e_a^T Q_y^{-1} \hat e_a < \hat e_0^T Q_y^{-1} \hat e_0

⇒ take the difference, which is a measure of improvement, as test statistic:

    T = \hat e_0^T Q_y^{-1} \hat e_0 - \hat e_a^T Q_y^{-1} \hat e_a

How is it distributed?

    H_0: \; T \sim \chi^2(q, 0) \qquad\text{and}\qquad H_a: \; T \sim \chi^2(q, \lambda)


Geometry of H_0 and H_a

Figure 7.2: Hypotheses H_0 and H_a: (a) y, \hat y_0, \hat y_a, \hat e_0, \hat e_a relative to R(A), R(C) and R(A|C); (b) the triangle \hat y_0, \hat y_a, y; (c) decomposition of C\hat\nabla into P_A C\hat\nabla and P_A^{\perp} C\hat\nabla

    \hat y_a = A \hat x_a + C\hat\nabla - P_A C\hat\nabla + P_A C\hat\nabla
             = \underbrace{A \hat x_a + P_A C\hat\nabla}_{\hat y_0} + P_A^{\perp} C\hat\nabla

    \Rightarrow \hat y_a - \hat y_0 = P_A^{\perp} C\hat\nabla

    \Rightarrow T = (P_A^{\perp} C\hat\nabla)^T Q_y^{-1} P_A^{\perp} C\hat\nabla
       = \hat\nabla^T C^T \underbrace{(P_A^{\perp})^T Q_y^{-1} P_A^{\perp}}_{= Q_y^{-1} P_A^{\perp} = Q_y^{-1} Q_{\hat e_0} Q_y^{-1}} C\hat\nabla
       = \hat\nabla^T C^T Q_y^{-1} Q_{\hat e_0} Q_y^{-1} C\hat\nabla

Already three versions of T:

    T = \hat e_0^T Q_y^{-1} \hat e_0 - \hat e_a^T Q_y^{-1} \hat e_a
      = (\hat y_0 - \hat y_a)^T Q_y^{-1} (\hat y_0 - \hat y_a)
      = \hat\nabla^T C^T Q_y^{-1} Q_{\hat e_0} Q_y^{-1} C\hat\nabla

All three require the adjustment under H_a (\hat e_a, \hat y_a, \hat\nabla). Now: a version that uses only \hat e_0 and C.

Normal equations under H_0 and H_a:

    H_0: \; A^T Q_y^{-1} A \,\hat x_0 = A^T Q_y^{-1} y

    H_a: \; \begin{pmatrix} A^T \\ C^T \end{pmatrix} Q_y^{-1} (A \;\; C) \begin{pmatrix} \hat x_a \\ \hat\nabla \end{pmatrix} = \begin{pmatrix} A^T \\ C^T \end{pmatrix} Q_y^{-1} y

    \begin{pmatrix} A^T Q_y^{-1} A & A^T Q_y^{-1} C \\ C^T Q_y^{-1} A & C^T Q_y^{-1} C \end{pmatrix}
    \begin{pmatrix} \hat x_a \\ \hat\nabla \end{pmatrix}
    = \begin{pmatrix} A^T Q_y^{-1} y \\ C^T Q_y^{-1} y \end{pmatrix}

(blocks of size n \times n, n \times q, q \times n, q \times q).

1st row:

    A^T Q_y^{-1} A \,\hat x_a + A^T Q_y^{-1} C \hat\nabla = A^T Q_y^{-1} y = A^T Q_y^{-1} A \,\hat x_0
    \Rightarrow \hat x_a = \hat x_0 - (A^T Q_y^{-1} A)^{-1} A^T Q_y^{-1} C \hat\nabla
    \Rightarrow A \hat x_a = A \hat x_0 - P_A C \hat\nabla
    \Rightarrow \hat y_a = A \hat x_a + C\hat\nabla = A \hat x_0 + (I - P_A) C\hat\nabla
    \Rightarrow \hat y_a = \hat y_0 + P_A^{\perp} C\hat\nabla

2nd row: insert \hat x_a and solve; this is a lengthy derivation/manipulation, therefore only the result:

    \hat\nabla = (C^T Q_y^{-1} Q_{\hat e_0} Q_y^{-1} C)^{-1} C^T Q_y^{-1} \hat e_0

Inserting into T gives the 4th form:

    T = \hat e_0^T Q_y^{-1} C \,(C^T Q_y^{-1} Q_{\hat e_0} Q_y^{-1} C)^{-1}\, C^T Q_y^{-1} \hat e_0



Distribution of T
Variable transformation:

    z_{(q\times 1)} = C^T_{(q\times m)}\, Q_y^{-1}{}_{(m\times m)}\, \hat e_0{}_{(m\times 1)}
    Q_z = C^T Q_y^{-1} Q_{\hat e_0} Q_y^{-1} C
    \hat\nabla = Q_z^{-1} z \;\Leftrightarrow\; z = Q_z \hat\nabla
    T = z^T Q_z^{-1} z \sim \chi^2_q

    H_0: \; z \sim N(0, Q_z), \qquad T \sim \chi^2(q, 0)
    H_a: \; z \sim N(Q_z \nabla, Q_z), \qquad T \sim \chi^2(q, \lambda), \qquad \lambda = \nabla^T Q_z \nabla = \nabla^T C^T Q_y^{-1} Q_{\hat e_0} Q_y^{-1} C \nabla

Summary
The test statistic T = \hat e_0^T Q_y^{-1} \hat e_0 - \hat e_a^T Q_y^{-1} \hat e_a possibly indicates that H_0 is to be rejected in favour of H_a, i.e. that the model E\{y\} = Ax is possibly inappropriate.

T \sim \chi^2_{q, 0} under H_0 and T \sim \chi^2_{q, \lambda} under H_a, with the non-centrality parameter \lambda = \nabla^T C^T Q_y^{-1} Q_{\hat e_0} Q_y^{-1} C \nabla.

Alternative formulations for T:

    (1) \hat e_0^T Q_y^{-1} \hat e_0 - \hat e_a^T Q_y^{-1} \hat e_a
    (2) (\hat y_0 - \hat y_a)^T Q_y^{-1} (\hat y_0 - \hat y_a)
    (3) \hat\nabla^T C^T Q_y^{-1} Q_{\hat e_0} Q_y^{-1} C \hat\nabla
    (4) \hat e_0^T Q_y^{-1} C (C^T Q_y^{-1} Q_{\hat e_0} Q_y^{-1} C)^{-1} C^T Q_y^{-1} \hat e_0
    (5) z^T Q_z^{-1} z, \qquad z := C^T Q_y^{-1} \hat e_0, \qquad Q_z = C^T Q_y^{-1} Q_{\hat e_0} Q_y^{-1} C

where (1), (2) and (3) involve the computation of the alternative hypothesis, while (4) and (5) only require \hat e_0 and C.
Because z \sim N(0, Q_z) under H_0 and z \sim N(Q_z \nabla, Q_z) under H_a, it follows that

    T \sim \chi^2(q, 0) \text{ under } H_0, \qquad T \sim \chi^2(q, \lambda), \; \lambda = \nabla^T Q_z \nabla \text{ under } H_a.
How many additional parameters can be chosen, minimally and maximally?
The total number of parameters x and \nabla is n + q; the number of observations is m. Solvability requires n + q \le m, hence 0 < q \le m - n.

Case (i) q = m - n: global model test

    \operatorname{rank}(A|C) = n + q = n + (m - n) = m, \qquad (A|C) \text{ is } m \times m, \text{ i.e. square}
    \text{redundancy} = m - n - q = 0 \;\Rightarrow\; \hat e_a = 0, \quad \hat y_a = y
    T = \hat e_0^T Q_y^{-1} \hat e_0

    H_0: E\{y\} = Ax \quad\text{versus}\quad H_a: E\{y\} \in R^m
    T \sim \chi^2_{m-n, 0} \text{ under } H_0, \qquad T \sim \chi^2_{m-n, \lambda} \text{ under } H_a, \text{ as before}

As an alternative to the test statistic T, often the test statistic \hat\sigma^2 is used, with

    \hat\sigma^2 = \frac{\hat e_0^T Q_y^{-1} \hat e_0}{m - n} = \frac{T}{m - n}

and

    H_0: \; \frac{\hat\sigma^2}{\sigma^2} \sim F(m - n, \infty, 0), \qquad H_a: \; \frac{\hat\sigma^2}{\sigma^2} \sim F(m - n, \infty, \lambda), \text{ as before.}

Meaning of \hat\sigma^2
In case the variance-covariance matrix is given as D\{y\} = Q_y = \sigma^2 Q, where \sigma^2 is an unknown scale factor, it follows for the quadratic form:

    E\{\hat e^T Q^{-1} \hat e\} = \sigma^2 (m - n)
    \Rightarrow E\left\{ \frac{\hat e^T Q^{-1} \hat e}{m - n} \right\} = E\{\hat\sigma^2\} = \sigma^2

\hat\sigma^2 is an unbiased estimate of the unknown variance factor \sigma^2.

Aside
Einschub
Einflu eines Varianzfaktors auf die Ergebnisse der Parameterschatzung
E {y} = Ax

D {y} = Qy = 2 Q = W 1

2 . . . unbekannter Varianzfaktor
Q1 = W . . . gegebene Gewichtsmatrix

1 T 1
x
= (AT Q1
y A) A Qy y

= (AT 2 Q1 A)1 AT 2 Q1 y
= 2 (AT Q1 A)1 AT 2 Q1 y
= (AT Q1 A)1 AT Q1 y
von 2 unabh
angig (egal ob bekannt oder unbekannt)

81

7 Statistical Testing

Qx = (AT Qy1 A)1


= 2 (AT Q1 A)1
! abhangig
y = A
x
unabhangig

T 1
Qy = (AQ1
x A )

= 2 A(AT Q1 A)1 AT
! abhangig
e = y y
unabhangig

! abhangig
allgemein

Qe = Qy Qy
h
i
= 2 Q A(AT Q1 A)1 AT
f = F x

lineare Funktion der Schatzwerte der Parameter

Qf = F Qx F T
= 2 F (AT Q1 A)1 F T
Die VK-Matrizen sind von 2 abhangig, d. h. nicht berechenbar, wenn 2 unbekannt
ist. Aber wir haben mit
2 ja eine erwartungstreue Schatzung f
ur 2 , so da anstatt
Qx , Qy, Qe, . . . , Qf die VK-Matrizen
x =
=
Q
2 Qx , . . . , Q
2 Qf
f

82

7.2 Testing procedure


angegeben werden konnen. Und das gibt Sinn!
Example: straight-line fit, parabola fit
Under the alternative hypothesis T \sim \chi^2_{q, \nabla^T Q_z \nabla}, because E\{z\} = Q_z \nabla.

Investigate two data sets, 1 and 2, by a straight-line fit and by a parabola fit:

a) Straight-line fit: which qualitative statement does one expect regarding the accuracy of the parameters? [The parameters themselves are identical.]
Answer: the accuracy of the parameters is better for data set 1 than for data set 2, which is reflected via \hat\sigma^2 = \hat e^T Q^{-1} \hat e / (m - n) in \hat Q_{\hat x}.

b) Parabola fit: the model fits data set 2 perfectly, so that \hat Q_{\hat x} = 0, since \hat e^T Q^{-1} \hat e = 0 because \hat e = 0!

Meaning of the test
Because \hat e_a = 0, obviously no C-matrix has to be specified in order to carry out the test. In the case q < m - n this is different, and it is not always trivial to find a C suitable for all situations. This test, however, can always be carried out and is therefore called the overall (global) model test.
Case (ii) q = 1: data snooping
⇒ C is an m \times 1 vector, \nabla a scalar.

    T = w^2 = \hat e_0^T Q_y^{-1} C \,(C^T Q_y^{-1} Q_{\hat e_0} Q_y^{-1} C)^{-1}\, C^T Q_y^{-1} \hat e_0
            = \frac{(C^T Q_y^{-1} \hat e_0)^2}{C^T Q_y^{-1} Q_{\hat e_0} Q_y^{-1} C}
            = \frac{\hat\nabla^2}{\sigma_{\hat\nabla}^2}

with

    \sigma_{\hat\nabla}^2 = (C^T Q_y^{-1} Q_{\hat e_0} Q_y^{-1} C)^{-1}, \qquad \hat\nabla = \frac{C^T Q_y^{-1} \hat e_0}{C^T Q_y^{-1} Q_{\hat e_0} Q_y^{-1} C}

Most important application: detection of a gross error in the observations, which leads to a wrong model formulation:

    H_0: E\{y\} = Ax, \qquad H_a: E\{y\} = Ax + C\nabla, \qquad C = (0, 0, \ldots, \underbrace{1}_{i\text{-th pos.}}, 0, \ldots, 0)^T

Reject H_0 if T = w^2 > k_\alpha, or equivalently if w < -k_{\alpha/2} or w > k_{\alpha/2} (\nabla can be positive or negative!).

If H_0 is rejected, y_i must be re-measured. The test is carried out for all i = 1, \ldots, m and is therefore called data snooping. If Q_y is a diagonal matrix, it follows that

    w = \sqrt{T} = \frac{\hat e_i}{\sigma_{\hat e_i}}

    H_0: \; w \sim N(0, 1), \qquad H_a: \; w \sim N(\sqrt{\lambda}, 1), \qquad \lambda = \nabla^T C^T Q_y^{-1} Q_{\hat e} Q_y^{-1} C \nabla
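A compact sketch of data snooping for the diagonal-Q_y case (not part of the original notes; the function name and the default alpha are assumptions):

```python
import numpy as np
from scipy.stats import norm

# Sketch: w-test  w_i = e_hat_i / sigma_{e_hat_i} ~ N(0,1) under H0 (Q_y diagonal).
def data_snooping(A, y, Qy_diag, alpha1=0.001):
    m, n = A.shape
    Py = np.diag(1.0 / Qy_diag)
    Qx = np.linalg.inv(A.T @ Py @ A)
    e_hat = y - A @ Qx @ A.T @ Py @ y
    Qe_diag = Qy_diag - np.einsum('ij,jk,ik->i', A, Qx, A)   # diag(Q_y - A Qx A^T)
    w = e_hat / np.sqrt(Qe_diag)                              # w-statistics
    k = norm.ppf(1 - alpha1 / 2)                              # two-sided critical value
    return w, np.abs(w) > k                                   # flag suspected outliers

# The observation with the largest |w_i| exceeding k is re-measured or removed,
# and the adjustment is repeated (identification/adaptation steps of the DIA scheme).
```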

7.3 DIA-Testprinciple
Detection: global test → is there a model error at all?
Identification: data snooping → where is the outlier? max(w_i)
Adaptation: iteration, remodelling, re-measurement

Question: how to ensure consistent testing parameters? We must avoid the situation that the global test rejects the null hypothesis while the error cannot be identified during the data snooping.



Neither can we afford to have detectable outliers that go undetected in the global model test. Consistency is guaranteed if the probability of detecting an outlier under the alternative hypothesis with q = 1 (data snooping) is the same as for the global test. Thus, both tests must use the same power \gamma = 1 - \beta, called \gamma_0 here, and the same non-centrality parameter:

    \lambda_0 = \lambda(\alpha, q = m - n, \gamma = \gamma_0) = \lambda(\alpha_1, q = 1, \gamma = \gamma_0)

    q = 1:       \alpha_1, \gamma_0 \;\Rightarrow\; \lambda_0
    q = m - n:   \lambda_0, \gamma_0 \;\Rightarrow\; \alpha = \alpha(\lambda_0) = \alpha_{m-n}

e.g.: \alpha_1 = 1\% (usually small), \beta_0 = 20\% \;\Rightarrow\; \alpha_{m-n} \approx 30\%

7.4 Internal reliability


Which model error C\nabla results in the power of test \gamma_0? Or the other way around: which model error C\nabla can (just barely) be detected with probability \gamma_0? This question is discussed in the framework of internal reliability (interne Zuverlässigkeit).

Analysis

    \lambda = \nabla^T C^T Q_y^{-1} Q_{\hat e} Q_y^{-1} C \nabla
    Q_{\hat e} = Q_y - Q_{\hat y} = Q_y - A (A^T Q_y^{-1} A)^{-1} A^T

    \Rightarrow \lambda = (C\nabla)^T \left[ Q_y^{-1} - Q_y^{-1} A (A^T Q_y^{-1} A)^{-1} A^T Q_y^{-1} \right] (C\nabla)

Question: for a given \lambda = \lambda_0, how can one influence C\nabla?

via Q_y: better measurements ⇒ Q_y smaller ⇒ Q_y^{-1} larger ⇒ C\nabla smaller
⇒ with more precise measurements, the model error C\nabla that can (just barely) be detected is smaller.

via A: more observations ⇒ larger redundancy;
for unchanged C\nabla ⇒ \lambda larger, or: for the same \lambda ⇒ C\nabla smaller;
better network design and measurement set-up (⇒ self-checking).
Minimum Detectable Bias (MDB)

    \nabla y_{(m\times 1)} = C_{(m\times q)} \nabla_{(q\times 1)} = E\{y | H_a\} - E\{y | H_0\}

\nabla y describes the internal reliability. It is a measure of the smallest error that can be detected with probability \gamma.
Question: how does one determine \nabla from \lambda_0 = (C\nabla)^T Q_y^{-1} Q_{\hat e} Q_y^{-1} C\nabla?

Case q = 1 (data snooping): \nabla is a scalar, \nabla y_i = c_i \nabla:

    \lambda_0 = \nabla^2 \, c_i^T Q_y^{-1} Q_{\hat e} Q_y^{-1} c_i
    \Rightarrow |\nabla_i| = \sqrt{ \frac{\lambda_0}{c_i^T Q_y^{-1} Q_{\hat e} Q_y^{-1} c_i} }

|\nabla_i| = minimum detectable bias (the smallest error that can just be detected).

Assumption: Q_y is a diagonal matrix:

    c_i^T Q_y^{-1} Q_{\hat e} Q_y^{-1} c_i = \sigma_{y_i}^{-4} \, c_i^T \underbrace{Q_{\hat e}}_{Q_y - Q_{\hat y}} c_i = \frac{\sigma_{y_i}^2 - \sigma_{\hat y_i}^2}{\sigma_{y_i}^4}

    \Rightarrow |\nabla_i| = \sqrt{ \frac{\lambda_0 \, \sigma_{y_i}^4}{\sigma_{y_i}^2 - \sigma_{\hat y_i}^2} }
    = \sigma_{y_i} \sqrt{ \frac{\lambda_0}{r_i} }, \qquad r_i = \frac{\sigma_{y_i}^2 - \sigma_{\hat y_i}^2}{\sigma_{y_i}^2} = 1 - \frac{\sigma_{\hat y_i}^2}{\sigma_{y_i}^2} = \text{local redundancy}

a) If the adjustment brings no improvement, \sigma_{\hat y_i} = \sigma_{y_i}: r_i = 0 and |\nabla_i| = \infty, i.e. y_i is not controlled at all.
b) If \sigma_{\hat y_i} \ll \sigma_{y_i}: r_i \approx 1 and |\nabla_i| = \sigma_{y_i} \sqrt{\lambda_0}.

y_i not controlled \;\leftrightarrow\; 0 \le r_i \le 1 \;\leftrightarrow\; very well controlled.


    \sum_i r_i = m - n

because

    r_i = c_i^T (I - Q_{\hat y} Q_y^{-1}) c_i = c_i^T (I - P_A) c_i = c_i^T P_A^{\perp} c_i
    \Rightarrow \sum_i r_i = \operatorname{trace} P_A^{\perp} = m - n

NB: E\{\hat e^T Q_y^{-1} \hat e\} = m - n.

Mean local redundancy:

    \bar r = \frac{1}{m} \sum_{i=1}^m r_i = \frac{m - n}{m}
    \Rightarrow |\nabla_i| = \sigma_{y_i} \sqrt{ \frac{\lambda_0}{\bar r} } = \sigma_{y_i} \sqrt{ \frac{\lambda_0 \, m}{m - n} }

Adjustment: the redundancy is distributed over the observations, depending on A and Q_y.

Redundancy

    \hat e = P_A^{\perp} y = \left[ I - A (A^T Q_y^{-1} A)^{-1} A^T Q_y^{-1} \right] y = R\, y, \qquad R = \text{redundancy matrix}

    \nabla \hat e_i = \sum_j R_{ij} \nabla y_j = r_i \nabla y_i + \ldots \;\Rightarrow\; \nabla \hat e_i \approx r_i \nabla y_i

⇒ The local redundancy is a measure of how the overall redundancy is distributed over the individual measurements, or of how model errors \nabla y are projected onto the residuals.

Not treated here: the intermediate cases 1 < q < m - n.
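A small sketch computing the redundancy numbers and MDBs for a diagonal Q_y (not part of the original notes; the default value of lambda0 is only a placeholder, the actual value follows from the chosen alpha_1 and beta_0):

```python
import numpy as np

# Sketch: local redundancy numbers r_i and minimum detectable biases
#   |nabla_i| = sigma_{y_i} * sqrt(lambda0 / r_i)   (diagonal Q_y).
def redundancy_and_mdb(A, Qy_diag, lambda0=17.0):
    Py = np.diag(1.0 / Qy_diag)
    Qx = np.linalg.inv(A.T @ Py @ A)
    Qyhat_diag = np.einsum('ij,jk,ik->i', A, Qx, A)   # diag(Q_yhat) = diag(A Qx A^T)
    r = 1.0 - Qyhat_diag / Qy_diag                    # local redundancies, sum(r) = m - n
    mdb = np.sqrt(Qy_diag) * np.sqrt(lambda0 / r)     # minimum detectable bias per observation
    return r, mdb
```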

7.5 External reliability


How does an undetected error affect the unknowns? That is the question of external reliability.

    \nabla y \;(= C\nabla) \;\longrightarrow\; \nabla\hat x

    \hat x = (A^T Q_y^{-1} A)^{-1} A^T Q_y^{-1} y
    \hat x + \nabla\hat x = (A^T Q_y^{-1} A)^{-1} A^T Q_y^{-1} (y + \nabla y)
    \Rightarrow \nabla\hat x = (A^T Q_y^{-1} A)^{-1} A^T Q_y^{-1} \nabla y

= the effect of the (just detectable) model error on the estimated parameters.

Problem: \nabla\hat x depends on H_a, i.e. on C\nabla; it is a vector; and the parameters x may be of different kinds.


A scalar measure is therefore used:

    \lambda_{\hat x} = \nabla\hat x^T Q_{\hat x}^{-1} \nabla\hat x
                     = \nabla\hat x^T A^T Q_y^{-1} A \,\nabla\hat x
                     = (P_A \nabla y)^T Q_y^{-1} (P_A \nabla y)
                     = \| P_A \nabla y \|^2

7.6 Reliability: a synthesis

    \nabla y = P_A \nabla y + P_A^{\perp} \nabla y, \qquad\text{or}\qquad C\nabla = A \nabla\hat x + P_A^{\perp} C\nabla

    \| \nabla y \|^2 = \| P_A \nabla y \|^2 + \| P_A^{\perp} \nabla y \|^2

or

    \underbrace{\nabla y^T Q_y^{-1} \nabla y}_{\lambda_{\bar y}}
    = \underbrace{\nabla\hat x^T A^T Q_y^{-1} A \,\nabla\hat x}_{\lambda_{\hat x}}
    + \underbrace{\nabla y^T (P_A^{\perp})^T Q_y^{-1} P_A^{\perp} \nabla y}_{\lambda_0},
    \qquad (P_A^{\perp})^T Q_y^{-1} P_A^{\perp} = Q_y^{-1} Q_{\hat e} Q_y^{-1} = Q_y^{-1} P_A^{\perp}

    \lambda_{\bar y} = \lambda_{\hat x} + \lambda_0

Special case q = 1, C = c_i, Q_y diagonal:

    \nabla y_i = \sigma_{y_i} \sqrt{ \frac{\lambda_0}{r_i} }

    \Rightarrow \lambda_{\hat x} = \lambda_{\bar y} - \lambda_0 = \frac{\nabla y_i^2}{\sigma_{y_i}^2} - \lambda_0 = \frac{\lambda_0}{r_i} - \lambda_0 = \frac{1 - r_i}{r_i}\, \lambda_0

    \frac{1 - r_i}{r_i} = \frac{\sigma_{\hat y_i}^2 / \sigma_{y_i}^2}{(\sigma_{y_i}^2 - \sigma_{\hat y_i}^2)/\sigma_{y_i}^2}
    = \frac{\sigma_{\hat y_i}^2}{\sigma_{y_i}^2 - \sigma_{\hat y_i}^2}
    = \frac{1}{\sigma_{y_i}^2 / \sigma_{\hat y_i}^2 - 1}


8 Recursive estimation
8.1 Partitioned model
    E\left\{ \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \right\}_{(m_1+m_2)\times 1}
    = \begin{pmatrix} A_1 \\ A_2 \end{pmatrix}_{(m_1+m_2)\times n} x;
    \qquad
    D\left\{ \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \right\}
    = \begin{pmatrix} Q_1 & 0 \\ 0 & Q_2 \end{pmatrix}_{(m_1+m_2)\times(m_1+m_2)}

8.1.1 Batch / offline / Stapel / standard

    \hat x_{(2)} = \left( A_1^T Q_1^{-1} A_1 + A_2^T Q_2^{-1} A_2 \right)^{-1} \left( A_1^T Q_1^{-1} y_1 + A_2^T Q_2^{-1} y_2 \right)
    Q_{\hat x(2)} = \left( A_1^T Q_1^{-1} A_1 + A_2^T Q_2^{-1} A_2 \right)^{-1}

8.1.2 Recursive / sequential / real-time

    E\{y_1\} = A_1 x, \qquad D\{y_1\} = Q_1
    \Rightarrow\; \hat x_{(1)} = (A_1^T Q_1^{-1} A_1)^{-1} A_1^T Q_1^{-1} y_1, \qquad Q_{\hat x(1)} = (A_1^T Q_1^{-1} A_1)^{-1}

    \hat x_{(2)} = \left( Q_{\hat x(1)}^{-1} + A_2^T Q_2^{-1} A_2 \right)^{-1} \left( Q_{\hat x(1)}^{-1} \hat x_{(1)} + A_2^T Q_2^{-1} y_2 \right)   (measurement update)
    Q_{\hat x(2)} = \left( Q_{\hat x(1)}^{-1} + A_2^T Q_2^{-1} A_2 \right)^{-1}   (covariance update)

⇒ update equations (Aufdatierungsgleichungen).

But this is also the solution of

    E\left\{ \begin{pmatrix} \hat x_{(1)} \\ y_2 \end{pmatrix} \right\} = \begin{pmatrix} I \\ A_2 \end{pmatrix} x;
    \qquad
    D\left\{ \begin{pmatrix} \hat x_{(1)} \\ y_2 \end{pmatrix} \right\} = \begin{pmatrix} Q_{\hat x(1)} & 0 \\ 0 & Q_2 \end{pmatrix}

⇒ the basis of recursive estimation.

8.1.3 Rearranging
With

    Q_{\hat x(2)}^{-1} = Q_{\hat x(1)}^{-1} + A_2^T Q_2^{-1} A_2
    \Rightarrow\; Q_{\hat x(1)}^{-1} = Q_{\hat x(2)}^{-1} - A_2^T Q_2^{-1} A_2

inserting gives

    \hat x_{(2)} = Q_{\hat x(2)} \left[ \left( Q_{\hat x(2)}^{-1} - A_2^T Q_2^{-1} A_2 \right) \hat x_{(1)} + A_2^T Q_2^{-1} y_2 \right]
                 = \hat x_{(1)} + Q_{\hat x(2)} A_2^T Q_2^{-1} \left( y_2 - A_2 \hat x_{(1)} \right)
                 = \hat x_{(1)} + K V_2

    V_2 = y_2 - A_2 \hat x_{(1)}, \qquad K = Q_{\hat x(2)} A_2^T Q_2^{-1}

A_2 \hat x_{(1)} ... predicted observation, V_2 ... predicted residual, K ... gain matrix.

Problem: many matrix inversions:

    Q_{\hat x(1)} \;(n \times n), \qquad Q_{\hat x(1)}^{-1} + A_2^T Q_2^{-1} A_2 \;(n \times n), \qquad Q_2 \;(m_2 \times m_2)

8.1.4 Formulation with condition equations

    B^T A = 0 \;\Rightarrow\; B^T = (-A_2 \;\; I), \qquad (-A_2 \;\; I) \begin{pmatrix} I \\ A_2 \end{pmatrix} = 0

    (-A_2 \;\; I)\, E\left\{ \begin{pmatrix} \hat x_{(1)} \\ y_2 \end{pmatrix} \right\} = 0;
    \qquad
    D\left\{ \begin{pmatrix} \hat x_{(1)} \\ y_2 \end{pmatrix} \right\} = \begin{pmatrix} Q_{\hat x(1)} & 0 \\ 0 & Q_2 \end{pmatrix}

Adjustment with condition equations, B^T E\{y\} = 0, D\{y\} = Q_y:

    \hat y = \left[ I - Q_y B (B^T Q_y B)^{-1} B^T \right] y

Applied here:

    \begin{pmatrix} \hat x_{(2)} \\ \hat y_2 \end{pmatrix}
    = \left\{ \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix}
      - \begin{pmatrix} Q_{\hat x(1)} & 0 \\ 0 & Q_2 \end{pmatrix}
        \begin{pmatrix} -A_2^T \\ I \end{pmatrix}
        \left( Q_2 + A_2 Q_{\hat x(1)} A_2^T \right)^{-1}
        (-A_2 \;\; I) \right\}
      \begin{pmatrix} \hat x_{(1)} \\ y_2 \end{pmatrix}

    \Rightarrow \hat x_{(2)} = \hat x_{(1)} + Q_{\hat x(1)} A_2^T \left( Q_2 + A_2 Q_{\hat x(1)} A_2^T \right)^{-1} \left( y_2 - A_2 \hat x_{(1)} \right)
               = \hat x_{(1)} + K V_2

    K = Q_{\hat x(1)} A_2^T \left( \underbrace{Q_2 + A_2 Q_{\hat x(1)} A_2^T}_{m_2 \times m_2} \right)^{-1}

    Q_{\hat x(2)} = Q_{\hat x(1)} - Q_{\hat x(1)} A_2^T \left( Q_2 + A_2 Q_{\hat x(1)} A_2^T \right)^{-1} A_2 Q_{\hat x(1)}
                  = Q_{\hat x(1)} - K A_2 Q_{\hat x(1)}
                  = (I - K A_2)\, Q_{\hat x(1)}

The gain matrix is numerically identical. Only one inversion of size m_2 \times m_2 is needed, for example after each new observation with m_2 = 1.

8.2 Generalization

Batch:

    E\left\{ \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{pmatrix} \right\}
    = \begin{pmatrix} A_1 \\ A_2 \\ \vdots \\ A_k \end{pmatrix} x;
    \qquad
    D\left\{ \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{pmatrix} \right\}
    = \begin{pmatrix} Q_1 & & & 0 \\ & Q_2 & & \\ & & \ddots & \\ 0 & & & Q_k \end{pmatrix}

    \hat x = \left( \sum_{i=1}^k A_i^T Q_i^{-1} A_i \right)^{-1} \sum_{i=1}^k A_i^T Q_i^{-1} y_i
Recursive:

    \hat x_{(k)} = \hat x_{(k-1)} + K_k z_k
    z_k = y_k - A_k \hat x_{(k-1)}
    K_k = Q_{\hat x(k-1)} A_k^T \left( Q_k + A_k Q_{\hat x(k-1)} A_k^T \right)^{-1}
    Q_{\hat x(k)} = \left[ I - K_k A_k \right] Q_{\hat x(k-1)}
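A direct transcription of this recursion as a Python sketch (not part of the original notes; the function name is an assumption):

```python
import numpy as np

# Sketch: one measurement/covariance update of the recursive estimator
# in the gain-matrix form derived above.
def recursive_update(x_prev, Qx_prev, A_k, y_k, Q_k):
    z = y_k - A_k @ x_prev                                    # predicted residual
    S = Q_k + A_k @ Qx_prev @ A_k.T                           # innovation covariance (m_k x m_k)
    K = Qx_prev @ A_k.T @ np.linalg.inv(S)                    # gain matrix
    x_new = x_prev + K @ z                                    # measurement update
    Qx_new = (np.eye(len(x_prev)) - K @ A_k) @ Qx_prev        # covariance update
    return x_new, Qx_new

# Starting from the batch solution of the first observation group (x_hat_(1), Qx_(1)),
# each new group y_k updates the estimate without re-solving the whole system.
```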


A Partitioning
A.1 Inverse Partitioning Method


    \begin{pmatrix} I & b \\ b^T & 0 \end{pmatrix}
    \begin{pmatrix} A & B \\ C & D \end{pmatrix}
    = \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix}, \qquad A, B, C, D \text{ are unknown}

    1.\; A + bC = I
    2.\; B + bD = 0
    3.\; b^T A = 0
    4.\; b^T B = I

    b^T \cdot 1.: \; \underbrace{b^T A}_{=0} + b^T b\, C = b^T \;\Rightarrow\; C = (b^T b)^{-1} b^T
    b^T \cdot 2.: \; b^T B + b^T b\, D = 0 \;\Rightarrow\; I + b^T b\, D = 0 \;\Rightarrow\; D = -(b^T b)^{-1}
    1.: \; A + b (b^T b)^{-1} b^T = I \;\Rightarrow\; A = I - b (b^T b)^{-1} b^T
    2.: \; B - b (b^T b)^{-1} = 0 \;\Rightarrow\; B = b (b^T b)^{-1}

    \begin{pmatrix} I & b \\ b^T & 0 \end{pmatrix}^{-1}
    = \begin{pmatrix} I - b (b^T b)^{-1} b^T & b (b^T b)^{-1} \\ (b^T b)^{-1} b^T & -(b^T b)^{-1} \end{pmatrix}

Applied to the condition-adjustment system with right-hand side (0, \; b^T y)^T this yields

    \hat e = b (b^T b)^{-1} b^T y, \qquad \hat\lambda = -(b^T b)^{-1} b^T y


A.2 Partitioning
The normal matrix of the linear system is symmetric, therefore

    \begin{pmatrix} A^T A & D \\ D^T & 0 \end{pmatrix}
    \begin{pmatrix} R & S^T \\ S & Q \end{pmatrix}
    = \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix}

then

    (A^T A) R + D S = I        (A.1)
    (A^T A) S^T + D Q = 0      (A.2)
    D^T R = 0                  (A.3)
    D^T S^T = I                (A.4)

Let the rows of H span the null space of A (A H^T = 0, hence H A^T A = 0). Then

    H \cdot (A.1): \; \underbrace{H (A^T A)}_{0} R + H D S = H \;\Rightarrow\; S = (H D)^{-1} H
    H \cdot (A.2): \; \underbrace{H (A^T A)}_{0} S^T + H D Q = 0 \;\Rightarrow\; H D Q = 0

Since H D is a d \times d full-rank matrix:

    H D Q = 0 \;\Rightarrow\; Q = 0

    (A.1) + D \cdot (A.3): \; (A^T A) R + D (H D)^{-1} H + D D^T R = I
    \Rightarrow (A^T A + D D^T) R = I - D (H D)^{-1} H
    \Rightarrow R = (A^T A + D D^T)^{-1} \left( I - D (H D)^{-1} H \right)

Inserting R and S into the normal equations:

    \hat x = R A^T y = (A^T A + D D^T)^{-1} A^T y - (A^T A + D D^T)^{-1} D (H D)^{-1} \underbrace{H A^T}_{=0} y
    \hat\lambda = S A^T y = (H D)^{-1} \underbrace{H A^T}_{=0} y = 0

    \Rightarrow \hat x = (A^T A + D D^T)^{-1} A^T y
    \hat e = y - A \hat x = \left( I - A (A^T A + D D^T)^{-1} A^T \right) y
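A tiny numerical illustration of this result (a sketch, not part of the original notes; the rank-deficient design matrix and the datum vector D are arbitrary assumptions, with D chosen to complement the null space of A so that H D is regular):

```python
import numpy as np

# Sketch: datum-constrained solution  x_hat = (A^T A + D D^T)^{-1} A^T y,
# which automatically satisfies D^T x_hat = 0.
rng = np.random.default_rng(2)
m = 6
A = rng.normal(size=(m, 3)); A[:, 2] = A[:, 0] + A[:, 1]   # rank-deficient design (d = 1)
D = np.array([[1.0], [1.0], [-1.0]])                       # spans the null space of A
y = rng.normal(size=m)

x_hat = np.linalg.solve(A.T @ A + D @ D.T, A.T @ y)
e_hat = y - A @ x_hat                                       # residuals of the constrained solution
print(D.T @ x_hat)                                          # ~ 0: the datum condition is met
```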


B Book recommendations

B.1 Scientific books

Teunissen, P. J. G.: Adjustment theory: an introduction. Delft University Press, 2003. ISBN 90-407-1974-8

Teunissen, P. J. G.: Testing theory: an introduction. Delft University Press, 2000-2006. ISBN 90-407-1975-6

Teunissen, P. J. G.: Dynamic data processing: recursive least-squares. Delft University Press, 2001. ISBN 90-407-1976-4

Niemeier, Wolfgang: Ausgleichungsrechnung. de Gruyter, 2002. ISBN 3-11-014080-2

Grafarend, Erik W.: Linear and Nonlinear Models: Fixed Effects, Random Effects, and Mixed Models. de Gruyter, 2006. ISBN 978-3-11-016216-5

Koch, Karl-Rudolf: Parameterschätzung und Hypothesentests in linearen Modellen. Dümmler. ISBN 3-427-78923-3

B.2 Popular science books and literature

Sobel, Dava: Longitude: The True Story of a Lone Genius Who Solved the Greatest Scientific Problem of His Time. Fourth Estate, 1996. ISBN 1-85702-502-4
German translation: Längengrad. Die wahre Geschichte eines einsamen Genies, welches das größte wissenschaftliche Problem seiner Zeit löste. Berliner Taschenbuch Verlag, 2003. ISBN 3-8333-0271-2

Kehlmann, Daniel: Die Vermessung der Welt. Rowohlt, Reinbek, 2005. ISBN 3-498-03528-2


