
Unit 3: Quasi Newton Methods in Optimization

Let f : Rn → R, the objective function.

Find x∗ such that


f(x∗) = min_{x ∈ Rⁿ} f(x)

Führer: FMNN25 2018


3.1: Notations

We denote

• The gradient ∇f (x) =: g(x) and write it as a row matrix.

• The Hessian ∇²f(x) =: G(x). It is an n × n matrix.

Note: G is a symmetric matrix.

3.2: Characterization

x∗ is a local minimizer of f if

• g(x∗) = 0

• G(x∗) is (strictly) positive definite.

A test for positive definiteness can be made together with linear system
solving by Cholesky's method. See scipy.linalg.cholesky (or scipy.linalg.cho_factor and cho_solve).
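A minimal sketch of this combined test-and-solve (the matrices below are made-up examples; a failed factorization is exactly the negative outcome of the test):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, LinAlgError

def solve_if_posdef(G, b):
    """Try to Cholesky-factorize G; success means G is positive
    definite, and the factor is reused to solve G x = b."""
    try:
        c, low = cho_factor(G)
    except LinAlgError:
        return None                      # G is not positive definite
    return cho_solve((c, low), b)

G = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite
b = np.array([1.0, 2.0])
x = solve_if_posdef(G, b)
```

Factorizing once and reusing the factor is cheaper than first testing definiteness and then solving separately.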

3.3: Base Method: Newton Iteration

Solve g(x∗) = 0 by iteration:

• Choose a start value (initial guess) x(0)

• Loop over k until a termination criterion is fulfilled:

x(k+1) := x(k) − G(x(k))⁻¹g(x(k))

We write g(k) := g(x(k)) and G(k) := G(x(k)).

s(k) := −G(k)⁻¹g(k) is called the Newton direction.
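The loop above can be sketched as follows (the strictly convex test function f(x) = exp(x₁ + x₂) + x₁² + x₂² and the tolerances are my own choices for illustration):

```python
import numpy as np

def newton(g, G, x0, tol=1e-10, maxit=50):
    """Plain Newton iteration: solve G(x) s = -g(x), then x <- x + s."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        gx = g(x)
        if np.linalg.norm(gx) < tol:        # termination criterion
            break
        x = x - np.linalg.solve(G(x), gx)   # step along the Newton direction
    return x

# Example: f(x) = exp(x1 + x2) + x1^2 + x2^2 (strictly convex)
def g(x):
    e = np.exp(x[0] + x[1])
    return np.array([e + 2 * x[0], e + 2 * x[1]])

def G(x):
    e = np.exp(x[0] + x[1])
    return np.array([[e + 2.0, e], [e, e + 2.0]])

xstar = newton(g, G, np.array([1.0, 1.0]))
```

Note that G⁻¹ is never formed explicitly; the Newton system is solved instead.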

3.4: Newton Iteration - Problems

Problems and their remedies:

• Local convergence: requires a good initial guess x(0). Remedy: globalization → line search methods

• Requires the Hessian G. Remedy: choose a numerical approximation of G, or even better of G⁻¹ → quasi Newton methods

x(k+1) := x(k) + α(k)s(k)

with

s(k) := −H(k)g(k)

3.5: Line search

Determine α(k) either

by exact line search:

α(k) := argmin_α f(x(k) + αs(k))

or

by inexact line search:

see References [1] and [3] in the course literature.

(An interval containing acceptable points is determined and α is taken as an element of this interval.)
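Exact line search can be sketched with a generic 1-D minimizer (here SciPy's minimize_scalar; the quadratic test function, the descent direction, and the search bracket are assumptions for illustration):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def exact_line_search(f, x, s):
    """Return alpha minimizing f_alpha(alpha) = f(x + alpha*s)."""
    res = minimize_scalar(lambda a: f(x + a * s),
                          bounds=(0.0, 100.0), method='bounded')
    return res.x

f = lambda x: x[0]**2 + 10.0 * x[1]**2     # simple quadratic
x = np.array([1.0, 1.0])
s = -np.array([2.0, 20.0])                 # steepest-descent direction -g(x)
alpha = exact_line_search(f, x, s)
```

For this quadratic the exact step is gᵀg / (gᵀGg) = 404/8008 ≈ 0.0505, which the bounded minimizer recovers to its default tolerance.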
3.6 Acceptable Points

Define fα(α) = f (x(k) + αs(k)).

Consider two points αL (= 0) and αU (= 10⁹⁹, a big number).

A point α0 ∈ [αL, αU] is called acceptable if it fulfils a left condition LC and a right condition RC.

We give two examples for these conditions on the next slides.

3.7 Goldstein conditions

α0 is an acceptable point if:

LC: fα(α0) ≥ fα(αL) + (1 − ρ)(α0 − αL)fα′(αL)

RC: fα(α0) ≤ fα(αL) + ρ(α0 − αL)fα′(αL)

are fulfilled for a given method parameter ρ ∈ [0, 1/2]. (See Fig. 2.5.1 on p. 28 in Ref. [1].)
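The two inequalities translate directly into code; a minimal sketch (the helper name and the test function fα(α) = (α − 1)² are my own choices):

```python
def goldstein_acceptable(f_a, df_a, aL, a0, rho=0.1):
    """Goldstein test for f_alpha at candidate a0.
    f_a, df_a: f_alpha and its derivative; df_a(aL) should be negative."""
    base, slope = f_a(aL), df_a(aL)
    # LC rules out steps that are too short (point lies above the steep line)
    lc = f_a(a0) >= base + (1.0 - rho) * (a0 - aL) * slope
    # RC demands sufficient decrease, ruling out steps that are too long
    rc = f_a(a0) <= base + rho * (a0 - aL) * slope
    return lc and rc

# f_alpha(a) = (a - 1)^2: a0 = 1 is acceptable, a0 = 0.05 is too short
ok = goldstein_acceptable(lambda a: (a - 1.0)**2, lambda a: 2.0 * (a - 1.0),
                          0.0, 1.0)
```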

3.8 Wolfe-Powell conditions

α0 is an acceptable point if:

LC: fα′(α0) ≥ σfα′(αL)

RC: fα(α0) ≤ fα(αL) + ρ(α0 − αL)fα′(αL)

are fulfilled for given method parameters ρ ∈ [0, 1/2] and σ ∈ [0, 1] with σ > ρ.
(See Fig. 2.5.2 on p. 28 in Ref. [1].)
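The same pattern works for the Wolfe-Powell test (helper name and test function are mine; for a ready-made routine, SciPy's scipy.optimize.line_search performs an inexact line search based on Wolfe-type conditions):

```python
def wolfe_powell_acceptable(f_a, df_a, aL, a0, rho=0.1, sigma=0.7):
    """Wolfe-Powell test for f_alpha at candidate a0."""
    slope_L = df_a(aL)                      # negative for a descent direction
    # LC (curvature condition): the slope at a0 has flattened enough
    lc = df_a(a0) >= sigma * slope_L
    # RC: sufficient decrease, same as in the Goldstein conditions
    rc = f_a(a0) <= f_a(aL) + rho * (a0 - aL) * slope_L
    return lc and rc

# f_alpha(a) = (a - 1)^2: a0 = 0.8 is acceptable, a0 = 0.05 is too short
ok = wolfe_powell_acceptable(lambda a: (a - 1.0)**2, lambda a: 2.0 * (a - 1.0),
                             0.0, 0.8)
```

Replacing the function-value inequality of Goldstein's LC with a slope inequality makes the test insensitive to the scale of f.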

3.9 Algorithm for inexact line search (Flow chart)

Consider one of the two ways of choosing conditions LC, RC given above.

Assume that method parameters ρ, σ, τ, χ are given as well as a guess α0 and the function
values fα(α0), fα(αL) and the corresponding derivatives.

3.10 Algorithm for inexact line search (Blocks)

New estimates for α0 are computed by the two blocks depending on the
cases above:

Block 1 (extrapolation):

∆α0 by extrapolation
∆α0 := max(∆α0, τ(α0 − αL))
∆α0 := min(∆α0, χ(α0 − αL))
αL := α0
α0 := α0 + ∆α0

Block 2 (interpolation):

αU := min(α0, αU)
ᾱ0 by interpolation
ᾱ0 := max(ᾱ0, αL + τ(αU − αL))
ᾱ0 := min(ᾱ0, αU − τ(αU − αL))
α0 := ᾱ0
3.11 Algorithm for inexact line search (Inter-
/Extrapolation)

The extrapolation used in Block 1 is given by

∆α0 = (α0 − αL) fα′(α0) / (fα′(αL) − fα′(α0))

and the interpolation used in Block 2 by

ᾱ0 = (α0 − αL)² fα′(αL) / (2(fα(αL) − fα(α0) + (α0 − αL)fα′(αL)))
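Both formulas are short enough to transcribe directly (helper names are mine; as I read the slide, the interpolated value is measured relative to αL, so with αL = 0 it is the new point itself):

```python
def extrapolate(aL, a0, dfL, df0):
    """Secant-type extrapolation: step Delta-alpha0 from the slopes
    of f_alpha at aL and a0 (added to a0 in Block 1)."""
    return (a0 - aL) * df0 / (dfL - df0)

def interpolate(aL, a0, fL, f0, dfL):
    """Minimizer (relative to aL) of the quadratic that matches
    f_alpha and its slope at aL and f_alpha at a0."""
    return (a0 - aL)**2 * dfL / (2.0 * (fL - f0 + (a0 - aL) * dfL))
```

For fα(α) = (α − 2)² with αL = 0 and α0 = 1, both recover the exact minimizer: extrapolation gives ∆α0 = 1 (so α0 + ∆α0 = 2) and interpolation gives ᾱ0 = 2.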

3.12 Algorithm for inexact line search (Parameter values)

Recommended default values for the method parameters are:

• ρ = 0.1

• σ = 0.7

• τ = 0.1

• χ = 9.

3.13: Quasi Newton methods

A typical step looks like:

• Compute s(k) := −H(k)g(k)

• Perform a line search to compute α(k).

• Compute x(k+1) := x(k) + α(k)s(k)

• Update H(k) → H(k+1) by some method.
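A minimal skeleton of this step loop, with the update rule supplied as a function; the crude halving line search and the identity "update" in the demo (which reduces the method to steepest descent) are my own simplifications, standing in for the line search of 3.10 and the updates of the following slides:

```python
import numpy as np

def quasi_newton(g, x0, update, tol=1e-8, maxit=500):
    """Generic quasi-Newton loop; update: (H, delta, gamma) -> new H."""
    x = np.asarray(x0, dtype=float)
    H = np.eye(len(x))                  # initial approximation of G^{-1}
    gx = g(x)
    for _ in range(maxit):
        if np.linalg.norm(gx) < tol:
            break
        s = -H @ gx                     # quasi-Newton direction
        alpha = 1.0                     # crude halving line search (placeholder)
        while (np.linalg.norm(g(x + alpha * s)) >= np.linalg.norm(gx)
               and alpha > 1e-12):
            alpha *= 0.5
        x_new = x + alpha * s
        g_new = g(x_new)
        H = update(H, x_new - x, g_new - gx)   # H^(k) -> H^(k+1)
        x, gx = x_new, g_new
    return x

# Demo: identity "update" = steepest descent on f(x) = x1^2 + 10*x2^2
g = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
xstar = quasi_newton(g, np.array([1.0, 1.0]), lambda H, d, y: H)
```

Only g is evaluated; neither G nor its inverse is ever formed, which is the point of the quasi-Newton framework.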

3.14 Generalization of Secant method

In R¹:

x(k+1) := x(k) − H(k)g(k)

with

1/H(k) = Q(k) = (g(k) − g(k−1)) / (x(k) − x(k−1))

In general, in Rⁿ:

Q(k)(x(k) − x(k−1)) = g(k) − g(k−1)

These are n equations for the n² unknowns Q(k)_ij

→ extra conditions needed.
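In R¹ this is just the classical secant method applied to g; a sketch (the test function g(x) = 2x + sin x, the derivative of f(x) = x² − cos x, and the start values are my own choices):

```python
import math

def secant(g, x0, x1, tol=1e-12, maxit=100):
    """Secant iteration for g(x) = 0; Q approximates g'(x)."""
    g0, g1 = g(x0), g(x1)
    for _ in range(maxit):
        if abs(g1) < tol:
            break
        Q = (g1 - g0) / (x1 - x0)     # secant slope, Q = 1/H
        x0, g0 = x1, g1
        x1 = x1 - g1 / Q              # x <- x - H g
        g1 = g(x1)
    return x1

# g(x) = 2x + sin(x) is strictly increasing with its only root at x = 0
root = secant(lambda x: 2.0 * x + math.sin(x), 1.0, 0.5)
```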

3.15 Broyden condition

Find Q(k) by solving

min ‖Q(k) − Q(k−1)‖_F

subject to

Q(k)δ(k) = γ(k),   where δ(k) := x(k) − x(k−1) and γ(k) := g(k) − g(k−1)

This gives

Q(k) = Q(k−1) + (γ(k) − Q(k−1)δ(k)) δ(k)ᵀ / (δ(k)ᵀδ(k))
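The update is a one-liner with an outer product (the function name and example data are mine); after the update, the secant condition Q δ = γ holds by construction:

```python
import numpy as np

def broyden_update(Q, delta, gamma):
    """Rank-1 Broyden update: the minimal change to Q in the
    Frobenius norm that enforces Q_new @ delta == gamma."""
    return Q + np.outer(gamma - Q @ delta, delta) / (delta @ delta)

Q0 = np.eye(2)
delta = np.array([1.0, 2.0])
gamma = np.array([3.0, 1.0])
Q1 = broyden_update(Q0, delta, gamma)
```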

3.16 Broyden condition (Cont.)

The Broyden update is a rank-1 update of the form

A1 = A0 + vwᵀ

The Sherman–Morrison formula gives for the inverse

A1⁻¹ = A0⁻¹ − (A0⁻¹v wᵀA0⁻¹) / (1 + wᵀA0⁻¹v)
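This is easy to verify numerically; a sketch (the matrix and vectors are arbitrary examples; the formula requires 1 + wᵀA0⁻¹v ≠ 0):

```python
import numpy as np

def sherman_morrison_inv(A0inv, v, w):
    """Inverse of A1 = A0 + v w^T, given A0^{-1}."""
    u = A0inv @ v
    return A0inv - np.outer(u, w @ A0inv) / (1.0 + w @ u)

A0 = np.array([[3.0, 1.0], [0.0, 2.0]])
v = np.array([1.0, 1.0])
w = np.array([2.0, -1.0])
A1inv = sherman_morrison_inv(np.linalg.inv(A0), v, w)
```

This is why quasi-Newton methods can maintain an approximation of G⁻¹ directly: a rank-1 change to Q costs only O(n²) to push through to the inverse, instead of O(n³) for a fresh solve.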

