
Static optimization: unconstrained problems
Graduate course on Optimal and Robust Control (spring 12)

Zdeněk Hurák
Department of Control Engineering, Faculty of Electrical Engineering, Czech Technical University in Prague

February 19, 2013

Lecture outline

  Derivative-free optimization
    Nelder-Mead simplex method
  Derivative-based optimization
    Line search methods
      Methods for line search (step length)
      Methods for descent direction search
    Trust region methods

Numerical algorithms for unconstrained optimization

  The key classification:
    Methods based on derivatives
    Derivative-free methods (Nelder-Mead)

Derivative-free methods: Nelder-Mead simplex method

  Not to be confused with the simplex method in linear programming!
  [Figure: a simplex in the plane, i.e. a triangle with vertices 1, 2, 3.]
  fminsearch() in Matlab
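A minimal illustration of derivative-free minimization in Python (not part of the slides): scipy.optimize.minimize with method="Nelder-Mead" plays the role of Matlab's fminsearch(); the Rosenbrock test function is only an example.

  import numpy as np
  from scipy.optimize import minimize

  # Rosenbrock test function (an arbitrary smooth, non-quadratic example)
  f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

  res = minimize(f, x0=np.array([-1.2, 1.0]), method="Nelder-Mead")
  print(res.x, res.fun, res.nfev)   # minimizer, minimum value, number of function evaluations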

Derivative-based methods

  Line search methods
  Trust region methods

Line search methods

    x_{k+1} = x_k + α_k d_k

  1. descent direction search ... d_k
  2. line search (step length determination) ... α_k
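The two-step structure above can be written as a generic loop; a sketch (not from the slides) in which direction and step_length stand for any of the methods discussed on the following slides:

  import numpy as np

  def line_search_minimize(f, grad, x0, direction, step_length, tol=1e-8, max_iter=200):
      """Generic line-search loop: x_{k+1} = x_k + alpha_k * d_k."""
      x = np.asarray(x0, dtype=float)
      for _ in range(max_iter):
          g = grad(x)
          if np.linalg.norm(g) < tol:          # terminal condition on the gradient
              break
          d = direction(x, g)                  # e.g. -g for steepest descent
          alpha = step_length(f, grad, x, d)   # e.g. backtracking, exact step, Newton
          x = x + alpha * d
      return x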

Methods for line search

  1. Fibonacci, golden section
  2. Bisection
  3. Newton
  4. Inexact line search

Fibonacci search

  Fibonacci sequence: 1, 1, 2, 3, 5, 8, 13, ...
  Fix the number of intervals at the beginning. Say, 13.
  [Figure: f(x) on [a, b] with the interval divided into Fibonacci proportions 1, 2, 3, 5, 8, 13.]
  Start by evaluating f(x) at x = 5 and x = 8; 4 further evaluations are then needed.
  In general, n − 2 steps reduce the uncertainty to (b − a)/F_n.
  Improvement in the uncertainty per step: F_{n−1}/F_n.

    lim_{n→∞} F_{n−1}/F_n = 2/(1 + √5) ≈ 0.618
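A sketch of Fibonacci search (not from the slides); all trial points stay on a grid of F_n cells, so only integer bookkeeping is needed, and f is assumed unimodal on [a, b]:

  def fibonacci_search(f, a, b, n=7):
      """Fibonacci search for a unimodal f on [a, b]; grid of F_n cells (F_7 = 13 as on the slide)."""
      F = [0, 1, 1]                       # F[1] = F[2] = 1, F[3] = 2, ...
      while len(F) <= n:
          F.append(F[-1] + F[-2])
      h = (b - a) / F[n]                  # width of one grid cell
      lo, hi = 0, F[n]                    # current bracket, in grid units
      i1, i2 = F[n - 2], F[n - 1]         # e.g. 5 and 8 when F_n = 13
      f1, f2 = f(a + i1 * h), f(a + i2 * h)
      for _ in range(n - 3):              # shrink 13 -> 8 -> 5 -> 3 -> 2 cells
          if f1 <= f2:                    # minimizer bracketed in [lo, i2]
              hi, i2, f2 = i2, i1, f1
              i1 = lo + hi - i2           # mirror the surviving interior point
              f1 = f(a + i1 * h)
          else:                           # minimizer bracketed in [i1, hi]
              lo, i1, f1 = i1, i2, f2
              i2 = lo + hi - i1
              f2 = f(a + i2 * h)
      return a + 0.5 * (lo + hi) * h      # centre of the final two-cell bracket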

Golden section search

    d_{k+1}/d_k ≈ 0.618

  [Figure: f(x) on [a, b] with interior points x_1 and x_2.]

Speed of convergence: order of convergence

  Order p of convergence of the sequence {r_k} → r*:

    0 ≤ lim sup_{k→∞} |r_{k+1} − r*| / |r_k − r*|^p < ∞

  Examples:
    r_k = a^k,      0 < a < 1   (order 1)
    r_k = a^(2^k),  0 < a < 1   (order 2)
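Golden-section search keeps the reduction ratio 0.618 at every step and reuses one interior point; a minimal sketch (not from the slides), again assuming f unimodal on [a, b]:

  import math

  def golden_section_search(f, a, b, tol=1e-6):
      """Golden-section search: each step shrinks [a, b] by the factor (sqrt(5)-1)/2."""
      r = (math.sqrt(5) - 1) / 2          # ≈ 0.618
      x1, x2 = b - r * (b - a), a + r * (b - a)
      f1, f2 = f(x1), f(x2)
      while b - a > tol:
          if f1 <= f2:                    # minimizer in [a, x2]
              b, x2, f2 = x2, x1, f1
              x1 = b - r * (b - a)
              f1 = f(x1)
          else:                           # minimizer in [x1, b]
              a, x1, f1 = x1, x2, f2
              x2 = a + r * (b - a)
              f2 = f(x2)
      return 0.5 * (a + b)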

Linear convergence

    lim_{k→∞} |r_{k+1} − r*| / |r_k − r*| = β < 1

  Geometric series: r_k = c β^k
  Comparison of two linearly converging algorithms is based on their convergence ratios β.
  For β = 0: superlinear convergence. For β = 1: sublinear convergence. Ex.: r_k = 1/k.

Bisection method

  [Figure: f(x) on [a, b] with the new point x_1 at the midpoint of the interval.]
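The bisection slide is purely graphical; one common reading for line search (an assumption about the figure) is to bisect on the sign of the derivative φ'(α) of the one-dimensional function φ(α) = f(x + α d):

  def bisection_line_search(dphi, lo, hi, tol=1e-8):
      """Bisection on dphi(alpha); assumes dphi(lo) < 0 < dphi(hi), i.e. a minimizer is bracketed."""
      while hi - lo > tol:
          mid = 0.5 * (lo + hi)
          if dphi(mid) > 0:     # phi still increases at mid, so the minimizer lies to the left
              hi = mid
          else:
              lo = mid
      return 0.5 * (lo + hi)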

Newton's method for line search

  Approximate the function by a parabola (use f(x_k), f'(x_k) and f''(x_k)):

    q(x) = f(x_k) + f'(x_k)(x − x_k) + (1/2) f''(x_k)(x − x_k)^2

  Find the minimum of the approximating function; this can be done analytically:

    0 = q'(x) = f'(x_k) + f''(x_k)(x − x_k)

    x_{k+1} = x_k − f'(x_k) / f''(x_k)

Newton's method for line search

  [Figure: f(x) together with its quadratic model f(x_0) + f'(x_0)(x − x_0) + (1/2) f''(x_0)(x − x_0)^2 around x_0.]
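The iteration derived above in code form (a sketch, not from the slides); fprime and fsecond are assumed to be callables returning f' and f'':

  def newton_1d(fprime, fsecond, x0, tol=1e-10, max_iter=50):
      """Newton's method for 1-D minimization: x_{k+1} = x_k - f'(x_k)/f''(x_k)."""
      x = x0
      for _ in range(max_iter):
          step = fprime(x) / fsecond(x)
          x = x - step
          if abs(step) < tol:
              break
      return x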

Another look at Newton's method: equation solving

  Solving g(x) = 0
  [Figure: g(x) with the tangent at x_k; its zero crossing gives x_{k+1}, the root is x*.]

    x_{k+1} = x_k − g(x_k) / g'(x_k)

Quadratic convergence of Newton's method

  Let's stay with the equation-solving formulation.

    x_{k+1} − x* = x_k − x* − g(x_k)/g'(x_k)
                 = [g'(x_k)(x_k − x*) − g(x_k)] / g'(x_k)

  Taylor expansion around x_k, using g(x*) = 0:

    0 = g(x*) = g(x_k) + g'(x_k)(x* − x_k) + (1/2) g''(ξ)(x* − x_k)^2

  hence

    x_{k+1} − x* = (1/2) [g''(ξ)/g'(x_k)] (x_k − x*)^2

    |x_{k+1} − x*| ≤ (k_1 / (2 k_2)) |x_k − x*|^2

  with |g''| ≤ k_1 and |g'(x_k)| ≥ k_2 near x*.
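A tiny numerical illustration of the quadratic rate (the test equation g(x) = x^2 − 2 is just an example, not from the slides); the error roughly squares each step, so the number of correct digits doubles:

  x, root = 1.0, 2 ** 0.5
  for k in range(6):
      x = x - (x**2 - 2) / (2 * x)     # x_{k+1} = x_k - g(x_k)/g'(x_k)
      print(k, abs(x - root))          # errors roughly 1e-1, 2e-3, 2e-6, 2e-12, ...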

Methods for descent direction search

  1. steepest descent
  2. Newton
  3. quasi-Newton
  4. conjugate direction / conjugate gradient method (CG)

Steepest descent

  Condition for a descent direction:

    d_k^T ∇f(x_k) < 0

  Recall the geometric interpretation of an inner product:

    d_k^T ∇f(x_k) = ‖d_k‖ ‖∇f(x_k)‖ cos θ

  The steepest descent:

    x_{k+1} = x_k − α_k ∇f(x_k)

Steepest descent applied to quadratic cost

    f(x) = (1/2) x^T Q x − b^T x

  Find the α that minimizes f(x_k − α ∇f_k):

    f(x_k − α ∇f_k) = (1/2) (x_k − α ∇f_k)^T Q (x_k − α ∇f_k) − b^T (x_k − α ∇f_k)

  Using that the gradient is ∇f(x) = Q x − b, we get upon differentiation with respect to α:

    α_k = (∇f_k^T ∇f_k) / (∇f_k^T Q ∇f_k)

  Hence the steepest descent method is

    x_{k+1} = x_k − [(∇f_k^T ∇f_k) / (∇f_k^T Q ∇f_k)] ∇f_k
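The exact-step steepest descent above as a sketch (not from the slides); the poorly scaled Q in the example is an arbitrary choice that makes the zigzagging on the next slide visible:

  import numpy as np

  def steepest_descent_quadratic(Q, b, x0, tol=1e-10, max_iter=500):
      """Steepest descent with the exact step length for f(x) = 0.5 x'Qx - b'x."""
      x = np.asarray(x0, dtype=float)
      for _ in range(max_iter):
          g = Q @ x - b                     # gradient of the quadratic
          if np.linalg.norm(g) < tol:
              break
          alpha = (g @ g) / (g @ Q @ g)     # exact minimizer along -g
          x = x - alpha * g
      return x

  Q = np.diag([1.0, 100.0])                 # poorly scaled example
  b = np.zeros(2)
  print(steepest_descent_quadratic(Q, b, np.array([100.0, 1.0])))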

Zigzagging of the steepest descent method

  [Figure: contour plot of a poorly scaled quadratic cost (x on the horizontal axis, x2 on the vertical axis) with the zigzagging steepest-descent iterates.]

  Poor convergence rate, depending on the scaling.

Newton's search (also Newton-Raphson)

  Idea: the function to be minimized is approximated locally by a quadratic function and this approximating function is minimized exactly.

    x_{k+1} = x_k − [∇²f(x_k)]^{-1} ∇f(x_k)        (Hessian, gradient)

  Local convergence guaranteed, but not global!

Solving symmetric positive definite linear equations

  Solve A x = b.
  Cholesky factorization A = X X^T (X triangular).
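One Newton step written with a Cholesky solve of the symmetric positive definite system above (a sketch, not from the slides; grad and hess are assumed callables for ∇f and ∇²f):

  import numpy as np
  from scipy.linalg import cho_factor, cho_solve

  def newton_step(grad, hess, x):
      """One Newton step: solve [hess(x)] p = -grad(x) via Cholesky, return x + p."""
      g, H = grad(x), hess(x)
      c, low = cho_factor(H)          # Cholesky factorization (H assumed positive definite)
      p = cho_solve((c, low), -g)     # Newton direction
      return x + p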

Modifications of Newton's search: damping

  A search parameter α_k is introduced:

    x_{k+1} = x_k − α_k [∇²f(x_k)]^{-1} ∇f(x_k)

  Interpretation in the scalar nonlinear equation case:
  [Figure: g(x) with a damped Newton step.]

Modifications of Newton's search: positive definiteness

  Positive definite matrix M_k instead of the Hessian:

    x_{k+1} = x_k − α_k M_k^{-1} ∇f(x_k)

  Another approach: B_k = ∇²f(x_k) + E_k, where E_k = 0 if ∇²f(x_k) is sufficiently positive definite, otherwise E_k is chosen so that B_k > 0.
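One common realization of the E_k idea (a sketch, not from the slides): add a multiple of the identity and increase it until the Cholesky factorization succeeds.

  import numpy as np

  def modified_hessian(H, beta=1e-3, max_tries=60):
      """Return B = H + tau*I with tau >= 0 chosen so that B is positive definite."""
      tau = 0.0 if np.min(np.diag(H)) > 0 else beta - np.min(np.diag(H))
      for _ in range(max_tries):
          try:
              np.linalg.cholesky(H + tau * np.eye(H.shape[0]))   # succeeds iff B > 0
              return H + tau * np.eye(H.shape[0])
          except np.linalg.LinAlgError:
              tau = max(2 * tau, beta)                           # try a larger shift
      raise RuntimeError("could not make the Hessian positive definite")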

Quasi-Newton

  From the definition of the Hessian:

    ∇²f_k (x_{k+1} − x_k) ≈ ∇f_{k+1} − ∇f_k,    s_k = x_{k+1} − x_k,    y_k = ∇f_{k+1} − ∇f_k

  Find a matrix B_{k+1} that mimics the Hessian behaviour above:

    B_{k+1} s_k = y_k

  Typically two requirements:
    symmetry (as the Hessian)
    low-rank update between the steps

Popular updates in quasi-Newton methods

  Symmetric-rank-one (SR1):

    B_{k+1} = B_k + (y_k − B_k s_k)(y_k − B_k s_k)^T / [(y_k − B_k s_k)^T s_k]

  BFGS:

    B_{k+1} = B_k − (B_k s_k s_k^T B_k)/(s_k^T B_k s_k) + (y_k y_k^T)/(y_k^T s_k)

  As the inverse of B_k is what is actually needed, the update can be applied to the inverse directly: DFP (Davidon, Fletcher and Powell).
  Other updates keep the Hessian in factored form H = R R^T (Matlab chol() function ... Cholesky factorization).
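The BFGS formula above transcribed directly (a sketch, not from the slides; in practice one would also check the curvature condition y_k^T s_k > 0 before updating):

  import numpy as np

  def bfgs_update(B, s, y):
      """BFGS update of the Hessian approximation B, given s_k and y_k."""
      Bs = B @ s
      return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)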

Conjugate gradient directions
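The slide's illustration of conjugate directions is not reproduced here; as a reminder, a minimal sketch (not from the slides) of the linear conjugate gradient method for the quadratic cost f(x) = (1/2) x^T Q x − b^T x used earlier:

  import numpy as np

  def conjugate_gradient(Q, b, x0, tol=1e-10):
      """Linear CG for f(x) = 0.5 x'Qx - b'x, i.e. for solving Qx = b."""
      x = np.asarray(x0, dtype=float)
      r = Q @ x - b                              # gradient / residual
      d = -r                                     # first direction: steepest descent
      while np.linalg.norm(r) > tol:
          Qd = Q @ d
          alpha = (r @ r) / (d @ Qd)             # exact step along d
          x = x + alpha * d
          r_new = r + alpha * Qd
          beta = (r_new @ r_new) / (r @ r)
          d = -r_new + beta * d                  # next direction, Q-conjugate to d
          r = r_new
      return x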

Inexact line search

  Armijo
  Goldstein
  Wolfe

Intuitive approach: step size reduction

  Start with an initial step size s, and if the corresponding vector x_k + s d does not yield an improved (smaller) value of f(·), that is, if f(x_k + s d) ≥ f(x_k), reduce the step size, possibly by a fixed factor. Repeat.

However, convergence to a minimum is not guaranteed:

    f(x) = { s(1 − x)²/4 − 2(1 − x)   if x > 1
           { x² − 1                   if −1 ≤ x ≤ 1
           { s(1 + x)²/4 − 2(1 + x)   if x < −1

  [Figure: graph of this piecewise f(x).]

    x_{k+1} = x_k − 1 · f'(x_k)
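The step-size-reduction scheme as a sketch (not from the slides); Armijo's condition on the next slides strengthens the acceptance test so that failures of the kind above are excluded:

  import numpy as np

  def backtracking(f, x, d, s=1.0, shrink=0.5, max_reductions=50):
      """Start from step size s and shrink by a fixed factor until f improves."""
      x, d = np.asarray(x, dtype=float), np.asarray(d, dtype=float)
      fx = f(x)
      alpha = s
      for _ in range(max_reductions):
          if f(x + alpha * d) < fx:    # improvement found, accept this step size
              return alpha
          alpha *= shrink              # reduce the step size by a fixed factor
      return alpha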

Armijo's condition

Goldstein's condition

Wolfe's condition

Terminal conditions
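These four slides are mostly graphical; as a reminder (not taken from the slides), the standard forms of the Armijo and Wolfe conditions with the usual constants 0 < c1 < c2 < 1, plus a typical terminal condition:

  import numpy as np

  def armijo_ok(f, g, x, d, alpha, c1=1e-4):
      """Armijo (sufficient decrease): f(x + a d) <= f(x) + c1 a g(x)'d."""
      return f(x + alpha * d) <= f(x) + c1 * alpha * (g(x) @ d)

  def wolfe_curvature_ok(g, x, d, alpha, c2=0.9):
      """Wolfe curvature condition: g(x + a d)'d >= c2 g(x)'d."""
      return g(x + alpha * d) @ d >= c2 * (g(x) @ d)

  def terminate(g, x, tol=1e-6):
      """A typical terminal condition: the gradient norm is (nearly) zero."""
      return np.linalg.norm(g(x)) < tol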

Trust region methods

  Recall

    f(x_k + p) ≈ f(x_k) + ∇f^T(x_k) p + (1/2) p^T ∇²f(x_k) p

  We seek the minimum of the quadratic model function

    m_k(p) = f(x_k) + ∇f^T(x_k) p + (1/2) p^T B_k p    subject to ‖p‖ ≤ Δ_k.

  For B_k = ∇²f(x_k): trust-region Newton method.

Trust region methods

  [Figure: contours of f(x) and of the model m_k(p), the trust region ‖p‖ ≤ Δ_k, the line search direction and the trust region step.]
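The slides do not spell out how the constrained model minimization is carried out; one simple possibility (an illustration, not necessarily the method used in the lecture) is the Cauchy point, the minimizer of m_k along −∇f_k within the trust region:

  import numpy as np

  def cauchy_point(g, B, delta):
      """Cauchy point: minimize m_k along -g subject to ||p|| <= delta."""
      gBg = g @ B @ g
      if gBg <= 0:                          # model not convex along -g: go to the boundary
          tau = 1.0
      else:
          tau = min(1.0, np.linalg.norm(g) ** 3 / (delta * gBg))
      return -tau * (delta / np.linalg.norm(g)) * g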

Software

  1. Optimization Toolbox for Matlab: fminunc() (trust-region Newton), fminsearch() (Nelder-Mead simplex)
  2. UnconstrainedProblems package for Mathematica: FindMinimumPlot, FindMinimum

Summary

  Line search methods: direction search, step length determination, Newton methods, quasi-Newton.
  Trust region methods.
