
Unit 3: Quasi Newton Methods in Optimization

Let f : Rn → R, the objective function.

Find x∗ such that


f(x∗) = min_{x ∈ Rⁿ} f(x)

Führer: FMNN25 2018


3.1: Notations

We denote

• The gradient ∇f (x) =: g(x) and write it as a row matrix.

• The Hessian ∇²f(x) =: G(x). It is an n × n matrix.

Note: G is a symmetric matrix.

3.2: Characterization

x∗ is a local minimizer of f if

• g(x∗) = 0

• G(x∗) is (strictly) positive definite.

A test for positive definiteness can be made together with linear system
solving by Cholesky's method. See scipy.linalg.cholesky (or scipy.linalg.cho_factor and cho_solve).
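A minimal sketch of this combined test-and-solve (the matrices below are made-up examples; a failed factorization is exactly the negative outcome of the test):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, LinAlgError

def solve_if_posdef(G, b):
    """Try to Cholesky-factorize G; success means G is positive
    definite, and the factor is reused to solve G x = b."""
    try:
        c, low = cho_factor(G)
    except LinAlgError:
        return None                      # G is not positive definite
    return cho_solve((c, low), b)

G = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite
b = np.array([1.0, 2.0])
x = solve_if_posdef(G, b)
```

Factorizing once and reusing the factor is cheaper than first testing definiteness and then solving separately.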

3.3: Base Method: Newton Iteration

Solve g(x∗) = 0 by iteration:

• Choose a start value (initial guess) x(0)

• Loop over k until a termination criterion is fulfilled:

x(k+1) := x(k) − G(x(k))⁻¹g(x(k))

We write g(k) := g(x(k)) and G(k) := G(x(k)).

s(k) := −G(k)⁻¹g(k) is called the Newton direction.
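The loop above can be sketched as follows (the strictly convex test function f(x) = exp(x₁ + x₂) + x₁² + x₂² and the tolerances are my own choices for illustration):

```python
import numpy as np

def newton(g, G, x0, tol=1e-10, maxit=50):
    """Plain Newton iteration: solve G(x) s = -g(x), then x <- x + s."""
    x = np.asarray(x0, dtype=float)
    for _ in range(maxit):
        gx = g(x)
        if np.linalg.norm(gx) < tol:        # termination criterion
            break
        x = x - np.linalg.solve(G(x), gx)   # step along the Newton direction
    return x

# Example: f(x) = exp(x1 + x2) + x1^2 + x2^2 (strictly convex)
def g(x):
    e = np.exp(x[0] + x[1])
    return np.array([e + 2 * x[0], e + 2 * x[1]])

def G(x):
    e = np.exp(x[0] + x[1])
    return np.array([[e + 2.0, e], [e, e + 2.0]])

xstar = newton(g, G, np.array([1.0, 1.0]))
```

Note that G⁻¹ is never formed explicitly; the Newton system is solved instead.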

3.4: Newton Iteration - Problems

Problems and their remedies:

• Local convergence: requires a good initial guess x(0). Remedy: globalization → line search methods

• Requires the Hessian G. Remedy: choose a numerical approximation of G, or even better of G⁻¹ → quasi Newton methods

x(k+1) := x(k) + α(k)s(k)

with

s(k) := −H(k)g(k)

3.5: Line search

Determine α(k) either

by exact line search:

α(k) := argmin_α f(x(k) + αs(k))

or

by inexact line search:

see References [1] and [3] in the course literature.

(An interval containing acceptable points is determined and α is taken as an element of this interval.)
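Exact line search can be sketched with a generic 1-D minimizer (here SciPy's minimize_scalar; the quadratic test function, the descent direction, and the search bracket are assumptions for illustration):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def exact_line_search(f, x, s):
    """Return alpha minimizing f_alpha(alpha) = f(x + alpha*s)."""
    res = minimize_scalar(lambda a: f(x + a * s),
                          bounds=(0.0, 100.0), method='bounded')
    return res.x

f = lambda x: x[0]**2 + 10.0 * x[1]**2     # simple quadratic
x = np.array([1.0, 1.0])
s = -np.array([2.0, 20.0])                 # steepest-descent direction -g(x)
alpha = exact_line_search(f, x, s)
```

For this quadratic the exact step is gᵀg / (gᵀGg) = 404/8008 ≈ 0.0505, which the bounded minimizer recovers to its default tolerance.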
3.6 Acceptable Points

Define fα(α) = f (x(k) + αs(k)).

Consider two points αL (= 0) and αU (= 10⁹⁹, a big number).

A point α0 ∈ [αL, αU] is called acceptable if it fulfils a left condition LC and a right condition RC.

We give two examples for these conditions on the next slides.

3.7 Goldstein conditions

α0 is an acceptable point if:

LC: fα(α0) ≥ fα(αL) + (1 − ρ)(α0 − αL)fα′(αL)

RC: fα(α0) ≤ fα(αL) + ρ(α0 − αL)fα′(αL)

are fulfilled for a given method parameter ρ ∈ [0, 1/2]. (See Fig. 2.5.1 on p. 28 in Ref. [1].)
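The two inequalities translate directly into code; a minimal sketch (the helper name and the test function fα(α) = (α − 1)² are my own choices):

```python
def goldstein_acceptable(f_a, df_a, aL, a0, rho=0.1):
    """Goldstein test for f_alpha at candidate a0.
    f_a, df_a: f_alpha and its derivative; df_a(aL) should be negative."""
    base, slope = f_a(aL), df_a(aL)
    # LC rules out steps that are too short (point lies above the steep line)
    lc = f_a(a0) >= base + (1.0 - rho) * (a0 - aL) * slope
    # RC demands sufficient decrease, ruling out steps that are too long
    rc = f_a(a0) <= base + rho * (a0 - aL) * slope
    return lc and rc

# f_alpha(a) = (a - 1)^2: a0 = 1 is acceptable, a0 = 0.05 is too short
ok = goldstein_acceptable(lambda a: (a - 1.0)**2, lambda a: 2.0 * (a - 1.0),
                          0.0, 1.0)
```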

3.8 Wolfe-Powell conditions

α0 is an acceptable point if:

LC: fα′(α0) ≥ σfα′(αL)

RC: fα(α0) ≤ fα(αL) + ρ(α0 − αL)fα′(αL)

are fulfilled for given method parameters ρ ∈ [0, 1/2] and σ ∈ [0, 1] with σ > ρ.
(See Fig. 2.5.2 on p. 28 in Ref. [1].)
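The same pattern works for the Wolfe-Powell test (helper name and test function are mine; for a ready-made routine, SciPy's scipy.optimize.line_search performs an inexact line search based on Wolfe-type conditions):

```python
def wolfe_powell_acceptable(f_a, df_a, aL, a0, rho=0.1, sigma=0.7):
    """Wolfe-Powell test for f_alpha at candidate a0."""
    slope_L = df_a(aL)                      # negative for a descent direction
    # LC (curvature condition): the slope at a0 has flattened enough
    lc = df_a(a0) >= sigma * slope_L
    # RC: sufficient decrease, same as in the Goldstein conditions
    rc = f_a(a0) <= f_a(aL) + rho * (a0 - aL) * slope_L
    return lc and rc

# f_alpha(a) = (a - 1)^2: a0 = 0.8 is acceptable, a0 = 0.05 is too short
ok = wolfe_powell_acceptable(lambda a: (a - 1.0)**2, lambda a: 2.0 * (a - 1.0),
                             0.0, 0.8)
```

Replacing the function-value inequality of Goldstein's LC with a slope inequality makes the test insensitive to the scale of f.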

3.9 Algorithm for inexact line search (Flow chart)

Consider one of the two ways of choosing conditions LC, RC given above.

Assume that method parameters ρ, σ, τ, χ are given as well as a guess α0 and the function
values fα(α0), fα(αL) and the corresponding derivatives.

3.10 Algorithm for inexact line search (Blocks)

New estimates for α0 are computed by the two blocks depending on the
cases above:

Block 1 (extrapolation):

∆α0 by extrapolation
∆α0 := max(∆α0, τ(α0 − αL))
∆α0 := min(∆α0, χ(α0 − αL))
αL := α0
α0 := α0 + ∆α0

Block 2 (interpolation):

αU := min(α0, αU)
ᾱ0 by interpolation
ᾱ0 := max(ᾱ0, αL + τ(αU − αL))
ᾱ0 := min(ᾱ0, αU − τ(αU − αL))
α0 := ᾱ0
3.11 Algorithm for inexact line search (Inter-
/Extrapolation)

The extrapolation used in Block 1 is given by

∆α0 = (α0 − αL) fα′(α0) / (fα′(αL) − fα′(α0))

and the interpolation used in Block 2 by

ᾱ0 = (α0 − αL)² fα′(αL) / (2(fα(αL) − fα(α0) + (α0 − αL)fα′(αL)))
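Both formulas are short enough to transcribe directly (helper names are mine; as I read the slide, the interpolated value is measured relative to αL, so with αL = 0 it is the new point itself):

```python
def extrapolate(aL, a0, dfL, df0):
    """Secant-type extrapolation: step Delta-alpha0 from the slopes
    of f_alpha at aL and a0 (added to a0 in Block 1)."""
    return (a0 - aL) * df0 / (dfL - df0)

def interpolate(aL, a0, fL, f0, dfL):
    """Minimizer (relative to aL) of the quadratic that matches
    f_alpha and its slope at aL and f_alpha at a0."""
    return (a0 - aL)**2 * dfL / (2.0 * (fL - f0 + (a0 - aL) * dfL))
```

For fα(α) = (α − 2)² with αL = 0 and α0 = 1, both recover the exact minimizer: extrapolation gives ∆α0 = 1 (so α0 + ∆α0 = 2) and interpolation gives ᾱ0 = 2.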

3.12 Algorithm for inexact line search (Parameter values)

Recommended default values for the method parameters are:

• ρ = 0.1

• σ = 0.7

• τ = 0.1

• χ = 9.

3.13: Quasi Newton methods

A typical step looks like:

• Compute s(k) := −H(k)g(k)

• Perform a line search to compute α(k).

• Compute x(k+1) := x(k) + α(k)s(k)

• Update H(k) → H(k+1) by some method.
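A minimal skeleton of this step loop, with the update rule supplied as a function; the crude halving line search and the identity "update" in the demo (which reduces the method to steepest descent) are my own simplifications, standing in for the line search of 3.10 and the updates of the following slides:

```python
import numpy as np

def quasi_newton(g, x0, update, tol=1e-8, maxit=500):
    """Generic quasi-Newton loop; update: (H, delta, gamma) -> new H."""
    x = np.asarray(x0, dtype=float)
    H = np.eye(len(x))                  # initial approximation of G^{-1}
    gx = g(x)
    for _ in range(maxit):
        if np.linalg.norm(gx) < tol:
            break
        s = -H @ gx                     # quasi-Newton direction
        alpha = 1.0                     # crude halving line search (placeholder)
        while (np.linalg.norm(g(x + alpha * s)) >= np.linalg.norm(gx)
               and alpha > 1e-12):
            alpha *= 0.5
        x_new = x + alpha * s
        g_new = g(x_new)
        H = update(H, x_new - x, g_new - gx)   # H^(k) -> H^(k+1)
        x, gx = x_new, g_new
    return x

# Demo: identity "update" = steepest descent on f(x) = x1^2 + 10*x2^2
g = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
xstar = quasi_newton(g, np.array([1.0, 1.0]), lambda H, d, y: H)
```

Only g is evaluated; neither G nor its inverse is ever formed, which is the point of the quasi-Newton framework.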

3.14 Generalization of Secant method

In R¹:

x(k+1) := x(k) − H(k)g(k)

with

1/H(k) = Q(k) = (g(k) − g(k−1)) / (x(k) − x(k−1))

In general, in Rⁿ:

Q(k)(x(k) − x(k−1)) = g(k) − g(k−1)

These are n equations for the n² unknowns Q(k)_ij

→ extra conditions needed.
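In R¹ this is just the classical secant method applied to g; a sketch (the test function g(x) = 2x + sin x, the derivative of f(x) = x² − cos x, and the start values are my own choices):

```python
import math

def secant(g, x0, x1, tol=1e-12, maxit=100):
    """Secant iteration for g(x) = 0; Q approximates g'(x)."""
    g0, g1 = g(x0), g(x1)
    for _ in range(maxit):
        if abs(g1) < tol:
            break
        Q = (g1 - g0) / (x1 - x0)     # secant slope, Q = 1/H
        x0, g0 = x1, g1
        x1 = x1 - g1 / Q              # x <- x - H g
        g1 = g(x1)
    return x1

# g(x) = 2x + sin(x) is strictly increasing with its only root at x = 0
root = secant(lambda x: 2.0 * x + math.sin(x), 1.0, 0.5)
```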

3.15 Broyden condition

Find Q(k) by solving

min ‖Q(k) − Q(k−1)‖_F

subject to

Q(k)δ(k) = γ(k),   where δ(k) := x(k) − x(k−1) and γ(k) := g(k) − g(k−1)

This gives

Q(k) = Q(k−1) + (γ(k) − Q(k−1)δ(k)) δ(k)ᵀ / (δ(k)ᵀδ(k))
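The update is a one-liner with an outer product (the function name and example data are mine); after the update, the secant condition Q δ = γ holds by construction:

```python
import numpy as np

def broyden_update(Q, delta, gamma):
    """Rank-1 Broyden update: the minimal change to Q in the
    Frobenius norm that enforces Q_new @ delta == gamma."""
    return Q + np.outer(gamma - Q @ delta, delta) / (delta @ delta)

Q0 = np.eye(2)
delta = np.array([1.0, 2.0])
gamma = np.array([3.0, 1.0])
Q1 = broyden_update(Q0, delta, gamma)
```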

3.16 Broyden condition (Cont.)

The Broyden update is a rank-1 update of the form

A1 = A0 + vwᵀ

The Sherman–Morrison formula gives for the inverse

A1⁻¹ = A0⁻¹ − (A0⁻¹v wᵀA0⁻¹) / (1 + wᵀA0⁻¹v)
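This is easy to verify numerically; a sketch (the matrix and vectors are arbitrary examples; the formula requires 1 + wᵀA0⁻¹v ≠ 0):

```python
import numpy as np

def sherman_morrison_inv(A0inv, v, w):
    """Inverse of A1 = A0 + v w^T, given A0^{-1}."""
    u = A0inv @ v
    return A0inv - np.outer(u, w @ A0inv) / (1.0 + w @ u)

A0 = np.array([[3.0, 1.0], [0.0, 2.0]])
v = np.array([1.0, 1.0])
w = np.array([2.0, -1.0])
A1inv = sherman_morrison_inv(np.linalg.inv(A0), v, w)
```

This is why quasi-Newton methods can maintain an approximation of G⁻¹ directly: a rank-1 change to Q costs only O(n²) to push through to the inverse, instead of O(n³) for a fresh solve.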

