
Topic and contents

School of Mathematics and Statistics

MATH3161/MATH5165 – Optimization
Prof Jeya Jeyakumar
Topic 01 – Optimization – What is it?

1 Optimization
    What is optimization?
    Optimization in the “real world”
    Mathematics of optimization
    Variables
    Objective
    Constraints
    Standard formulation

2 Mathematical background
    Vector norms
    Cauchy-Schwarz Inequality
    Optima and optimizers
    Existence
    Relaxation
    Derivatives
    Positive definite matrices


What is optimization?
Definition 1 (Optimization/Optimisation)
Optimization is a process that finds the “best” possible solutions from a
set of feasible solutions. When you optimize something, you are “making
it best”.

“Optimization” comes from the same root as “optimal”, which means best. But “best” can vary.
Meaning of “best” =⇒ concept of ordering
Business/Economics:
    Maximize: profit, return, utility, ...
    Minimize: cost, risk, ...
Engineering:
    Maximize: strength, production, ...
    Minimize: cost, materials, time, ...
Definition 2 (What is an optimization problem?)
An optimization problem is a mathematical problem of finding the best possible
solution from a set of feasible solutions. It has the form of minimizing (or
maximizing) an objective function subject to constraints.

Optimization in the “Real World”


The applicability of optimization is widespread, reaching into almost
every activity in which numerical information is processed. To provide
a comprehensive account of all these application areas would therefore
be unrealistic, but a selection might include:
Traditional Application Areas: Portfolio management problems in
Banking and Finance; structural design problems in manufacturing
and engineering; resource allocation and scheduling problems in
commerce; farm planning problems in Agriculture etc.
Emerging Scientific Areas: Sensor network localization problems in
wireless communications; data mining problems in machine learning
and information sciences; optimization models as decision support
tools in medicine (e.g. optimization based screening algorithms for
identifying neurological and other disorders).
See the course webpage on Moodle for recent research papers on
these emerging application areas (in the folder “Additional resources”).

Mathematics of Optimization – Outline

Mathematical model
    Variables
    Objective
    Constraints
Characterising optima ⇐⇒ Optimality principles
    What is an optimum/optimizer?
    Formulae for identifying/characterizing optima
    Derivatives: I Newton (1642 – 1726), G W Leibniz (1646 – 1711)
    Analytic methods for finding optima:
        Fermat (1601 – 1665), L Euler (1707 – 1783), Lagrange (1736 – 1813)
Finding optima =⇒ Numerical methods
    Newton’s method: I Newton (1642 – 1726), C F Gauss (1777 – 1855)
    Linear programming: L Kantorovich (1912 – 1986), G B Dantzig (1914 – 2005)
    Computer algorithms, linear algebra
Convexity: R T Rockafellar (1938 – )
Duality: J von Neumann (1903 – 1957)
Maximum principle: L S Pontryagin (1908 – 1988)

Variables

Decision variables: what can you change

Finite dimensional: x ∈ R^n, number of variables n
    Column vector x = (x₁, x₂, ..., xₙ)ᵀ
    Univariate: n = 1, x ∈ R
    Multivariate: n = 2, 3, 4, ... up to millions
    Mathematical background: column vectors, symmetric matrices, matrix operations
    Discrete optimization: xᵢ ∈ {0, 1}, xᵢ ∈ N, xᵢ ∈ Z
    Matrix of variables: X ∈ R^{m×m} =⇒ x ∈ R^{m²}
Infinite dimensional: the control
    A function u ∈ C([a, b])
    A function u ∈ L¹([0, T])

Objective
A mathematical function of the variables quantifying the idea of “best”.
Finite dimensional: variables x ∈ R^n, objective function f : R^n → R
    Linear: f(x) = gᵀx, with g ∈ R^n fixed
    Affine: f(x) = gᵀx + f₀, with g ∈ R^n, f₀ ∈ R fixed
    Quadratic: f(x) = ½ xᵀGx + gᵀx + f₀, with G ∈ R^{n×n}, g ∈ R^n, f₀ ∈ R fixed
Infinite dimensional: f : C([0, T]) → R, variables u ∈ C([0, T]), e.g.

    f(u) = ∫₀ᵀ u(t) dt

The co-domain of the objective function must be ordered (total order).
If α, β, γ ∈ R, then
    α ≤ β and β ≤ α ⇐⇒ α = β
    α ≤ β and β ≤ γ =⇒ α ≤ γ
    Either α ≤ β or β ≤ α
If u, v ∈ R^n, n ≥ 2, the componentwise order
    u ≤ v ⇐⇒ v − u ≥ 0 ⇐⇒ vᵢ − uᵢ ≥ 0, i = 1, ..., n
is only a partial order: not every pair u, v is comparable.


Constraints
Constraints describe restrictions on the allowable values of the variables.
Constraint structure for variables x ∈ R^n:
    Equality constraints: cᵢ(x) = 0, i = 1, ..., m_E
    Inequality constraints: cᵢ(x) ≤ 0, i = m_E + 1, ..., m
Feasible region Ω ⊆ R^n:

    Ω = {x ∈ R^n : cᵢ(x) = 0, i = 1, ..., m_E;  cᵢ(x) ≤ 0, i = m_E + 1, ..., m}

Unconstrained problem ⇐⇒ Ω = R^n
Standard formulation: ĉᵢ(x) ≥ 0 ⇐⇒ −ĉᵢ(x) ≤ 0
Algebraic structure of constraints:
    Simple bounds: ℓ ≤ x ≤ u ⇐⇒ ℓᵢ ≤ xᵢ ≤ uᵢ, i = 1, ..., n
    Linear constraints: cᵢ(x) = aᵢᵀx − bᵢ
    Nonlinear constraints


Constraint representation
Example 3 (Constraint representation)
What feasible regions do the following constraints represent?
    √((x₁ − a₁)² + (x₂ − a₂)²) = r
    √((x₁ − a₁)² + (x₂ − a₂)²) ≤ r
    √((x₁ − a₁)² + (x₂ − a₂)²) ≥ r
    x₂ = x₁²
    x₂ ≥ x₁²
    x₁² − 1 = 0


Standard formulation

Definition 4 (Standard formulation)

The standard formulation of a continuous finite dimensional optimization problem is

    minimize_{x ∈ R^n}  f(x)
    subject to  cᵢ(x) = 0, i = 1, ..., m_E;
                cᵢ(x) ≤ 0, i = m_E + 1, ..., m

Conversions
    Maximize to minimize: max f̂(x) = − min (−f̂(x))
    Constraint right-hand side: ĉᵢ(x) = bᵢ ⇐⇒ ĉᵢ(x) − bᵢ = 0
    Greater-than-or-equal inequalities: ĉᵢ(x) ≥ 0 ⇐⇒ −ĉᵢ(x) ≤ 0
    Strict inequality: ĉᵢ(x) < 0 ⇐⇒ ĉᵢ(x) + ε ≤ 0 for some ε > 0


A simplified farm planning problem


Example 5 (Farm Planning)
Farmer Jack has 100 acres to devote to wheat and corn and wishes to plan
his planting to maximize the expected revenue. Jack has only $800 in
capital to apply to planting the crops, and it costs $5 to plant an acre of
wheat and $10 to plant an acre of corn. Other activities leave the Jack
family only 150 days of labour to devote to the crops. Two days are
required for each acre of wheat and one day for each acre of corn. If past
experience indicates a return of $40 from each acre of wheat and $30 from
each acre of corn, how many acres of each should be planted to maximize
revenue?
Pose this as an optimization problem in standard form.
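As a sketch of where the standard formulation leads: once posed as a linear program, the problem can be solved numerically. The encoding below uses SciPy's linprog and is an illustration under stated assumptions (SciPy is available; the variable names x1 for acres of wheat and x2 for acres of corn are ours), not part of the course material.

    # Farm planning LP, converted to standard (minimization) form:
    #   minimize   -40*x1 - 30*x2        (negated revenue, max -> min)
    #   subject to   x1 +    x2 <= 100   (land, acres)
    #              5*x1 + 10*x2 <= 800   (capital, $)
    #              2*x1 +    x2 <= 150   (labour, days)
    #              x1, x2 >= 0
    from scipy.optimize import linprog

    c = [-40, -30]                       # negate returns to convert max to min
    A_ub = [[1, 1], [5, 10], [2, 1]]     # land, capital, labour rows
    b_ub = [100, 800, 150]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)
    print(res.x, -res.fun)               # expect 50 acres of each, revenue $3500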


Post Office Parcel problem


Example 6 (Post Office Parcel Problem)
At one time the post office regulations were that
the length plus the girth of a parcel must not exceed 1.8 metres.
What is the parcel of largest volume that could be sent through the post?

Pose this as an optimization problem in standard form. Assume that


The parcel has rectangular sides
The length of the parcel is the longest edge
The girth is the distance around the parcel perpendicular to the
length. For a rectangular box, girth is 2×(height + depth).

[Figure: a rectangular parcel with edge lengths x₁, x₂, x₃.]
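One possible standard-form statement, as a sketch (writing x₁ for the length and x₂, x₃ for the other two edges, so the volume is x₁x₂x₃ and the girth is 2(x₂ + x₃)):

    minimize_{x ∈ R³}  −x₁x₂x₃
    subject to  x₁ + 2(x₂ + x₃) − 1.8 ≤ 0,
                x₂ − x₁ ≤ 0,  x₃ − x₁ ≤ 0,
                −xᵢ ≤ 0,  i = 1, 2, 3.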

Standard formulation – Example 2


Example 7 (Standard formulation)
Maximize −x₁² − (x₂ − 1)²(x₂ − 3)² − x₂/2 on the set

    Ω = {x ∈ R² : x₂ ≥ x₁², x₁ ≤ 1, x₂ ≤ 2 + x₁, x₂ ≤ 4 − x₁}

Write this problem in standard form and plot the feasible region Ω.
What is the feasible region if the first constraint becomes x₂ = x₁²?
[Plot: the feasible region Ω in the (x₁, x₂)-plane, bounded below by the parabola x₂ = x₁² and above by the lines x₂ = 2 + x₁ and x₂ = 4 − x₁, for −2 ≤ x₁ ≤ 2.]
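One standard-form version, as a sketch (negating the objective to convert maximize to minimize, and applying the ĉᵢ(x) ≥ 0 ⇐⇒ −ĉᵢ(x) ≤ 0 conversion to each constraint):

    minimize_{x ∈ R²}  x₁² + (x₂ − 1)²(x₂ − 3)² + x₂/2
    subject to  x₁² − x₂ ≤ 0,  x₁ − 1 ≤ 0,
                x₂ − x₁ − 2 ≤ 0,  x₁ + x₂ − 4 ≤ 0.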


Measures of size - norms


If α ∈ R then its magnitude (or absolute value) |α| is

    |α| = α if α ≥ 0;  −α if α < 0.

In R^n there are several possible measures of size – norms, denoted ‖·‖.
Definition 8 (Vector norm)
A vector norm on R^n is a function ‖·‖ from R^n to R such that
    1. ‖x‖ ≥ 0 for all x ∈ R^n, and ‖x‖ = 0 ⇐⇒ x = 0.
    2. ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ R^n. (Triangle inequality)
    3. ‖αx‖ = |α| ‖x‖ for all α ∈ R, x ∈ R^n.
Example 9 (Vector norms)
The most widely used vector norms are
    1-norm: ‖x‖₁ = ∑ᵢ₌₁ⁿ |xᵢ|
    2-norm: ‖x‖₂ = (∑ᵢ₌₁ⁿ |xᵢ|²)^{1/2} = (xᵀx)^{1/2}
    ∞-norm (maximum norm): ‖x‖∞ = max_{i=1,...,n} |xᵢ|
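A quick numerical check of the three norms, as a sketch assuming NumPy is available:

    # Evaluate the 1-, 2- and infinity-norms of a small example vector.
    import numpy as np

    x = np.array([3.0, -4.0, 0.0])
    print(np.linalg.norm(x, 1))       # 1-norm:   |3| + |-4| + |0| = 7
    print(np.linalg.norm(x, 2))       # 2-norm:   sqrt(9 + 16)     = 5
    print(np.linalg.norm(x, np.inf))  # inf-norm: max(3, 4, 0)     = 4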

Cauchy-Schwarz Inequality
An important property connecting the dot product of two vectors and their norms is the Cauchy-Schwarz inequality:

    |xᵀy| ≤ ‖x‖₂ ‖y‖₂ for any x, y ∈ R^n.

Equality holds if and only if x and y are linearly dependent.


Ex*. Show that the 2-norm, given by f(x) = ‖x‖₂, satisfies (1)–(3) of Definition 8 (vector norm). [Hint: the Cauchy-Schwarz inequality is useful for verifying (2).]
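A sketch of the key step (2), using Cauchy-Schwarz on the cross term:

    ‖x + y‖₂² = (x + y)ᵀ(x + y) = ‖x‖₂² + 2xᵀy + ‖y‖₂²
              ≤ ‖x‖₂² + 2‖x‖₂‖y‖₂ + ‖y‖₂² = (‖x‖₂ + ‖y‖₂)²,

and taking square roots gives the triangle inequality.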


Local and global minima


Definition 10 (Global minimum)
A point x∗ ∈ Ω is a global minimizer of f (x) over Ω ⊆ Rn ⇐⇒
f (x∗ ) ≤ f (x) for all x ∈ Ω. The global minimum is f (x∗ ).

Definition 11 (Strict global minimum)


A point x∗ ∈ Ω is a strict global minimizer of f (x) over Ω ⊆ Rn ⇐⇒
f(x∗) < f(x) for all x ∈ Ω, x ≠ x∗.
Definition 12 (Local minimum)
A point x∗ ∈ Ω is a local minimizer of f (x) over Ω ⊆ Rn ⇐⇒
there exists a δ > 0 such that f (x∗ ) ≤ f (x) for all x ∈ Ω with
kx − x∗ k ≤ δ. Then f (x∗ ) is a local minimum.
Definition 13 (Strict local minimum)
A point x∗ ∈ Ω is a strict local minimizer of f (x) over Ω ⊆ Rn ⇐⇒
there exists a δ > 0 such that f (x∗ ) < f (x) for all x ∈ Ω with
0 < kx − x∗ k ≤ δ.

Local and global maxima


Definition 14 (Global maximum)
A point x∗ ∈ Ω is a global maximizer of f (x) over Ω ⊆ Rn ⇐⇒
f (x∗ ) ≥ f (x) for all x ∈ Ω. The global maximum is f (x∗ ).
Definition 15 (Strict global maximum)
A point x∗ ∈ Ω is a strict global maximizer of f (x) over Ω ⊆ Rn ⇐⇒
f(x∗) > f(x) for all x ∈ Ω, x ≠ x∗.
Definition 16 (Local maximum)
A point x∗ ∈ Ω is a local maximizer of f (x) over Ω ⊆ Rn ⇐⇒
there exists a δ > 0 such that f (x∗ ) ≥ f (x) for all x ∈ Ω with
kx − x∗ k ≤ δ. Then f (x∗ ) is a local maximum.
Definition 17 (Strict local maximum)
A point x∗ ∈ Ω is a strict local maximizer of f (x) over Ω ⊆ Rn ⇐⇒
there exists a δ > 0 such that f (x∗ ) > f (x) for all x ∈ Ω with
0 < kx − x∗ k ≤ δ.

Local and global minima and maxima


Example 18 (Local and global minima and maxima)
Ω = [0, 5],

f(x) =
    (x − 1)²                    if x ≤ 0.5;
    0.25                        if 0.5 < x ≤ 1;
    1.25 − (x − 2)²             if 1 < x ≤ 2.5;
    x − 1.5                     if 2.5 < x ≤ 3;
    1.5 − 0.25 sin(π(x − 3))    if 3 < x.

[Plot: graph of f(x) on Ω = [0, 5], showing the local and global extrema identified in Solution 19 below.]

Local and global minima and maxima


Solution 19 (Local and global minima and maxima)
Consider f(x) on the interval Ω = [0, 5] and the points 0, [0.5, 1], 2, 2.5, 3, 3.5, 4.5, 5:
    x(a) = 0 is a strict local maximizer with f(x(a)) = 1;
    any point x(b) in the interval [0.5, 1] is a local and global minimizer with f(x(b)) = 0.25 (but not strict, as adjacent points have the same function value);
    x(c) = 2 is a strict local maximizer with f(x(c)) = 1.25;
    x(d) = 2.5 is a strict local minimizer with f(x(d)) = 1;
    x(e) = 3 is a strict local maximizer with f(x(e)) = 1.5;
    x(f) = 3.5 is a strict local minimizer with f(x(f)) = 1.25;
    x(g) = 4.5 is a strict local and global maximizer with f(x(g)) = 1.75;
    x(h) = 5 is a strict local minimizer with f(x(h)) = 1.5.


Existence
Definition 20 (Extrema)
The global/local extrema of f over Ω are all the global/local minima and all the global/local maxima.

Proposition 1 (Existence of global extrema)


Let Ω be a compact set and let f be continuous on Ω. Then the global
extrema of f over Ω exist.
Finite dimensional Ω ⊆ Rn is compact ⇐⇒ Ω is closed and bounded

Example 21 (Existence)
Find the global extrema, if they exist, for the following problems
    f(x) = e^{−x} on Ω = [0, 1]
    f(x) = e^{−x} on Ω = [0, ∞)
    f(x) = sin x on Ω = [0, 2π)

Relaxation

Proposition 2 (Relaxation)
If f : R^n → R and Ω̄ ⊆ Ω, then

    min_{x∈Ω} f(x) ≤ min_{x∈Ω̄} f(x)

Thus, the minimum value of the relaxed problem is at most the minimum value of the original problem.

Proof.
Let x∗ ∈ Ω̄ be a global minimizer of f over Ω̄, so that min_{x∈Ω̄} f(x) = f(x∗). As Ω̄ ⊆ Ω, x∗ ∈ Ω̄ =⇒ x∗ ∈ Ω. Thus min_{x∈Ω} f(x) ≤ f(x∗) = min_{x∈Ω̄} f(x).

If you make the feasible region larger, the minimum value of the objective function cannot increase.
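A one-line numeric illustration, with Ω = [0, 2] and Ω̄ = [1, 2]:

    min_{x∈[0,2]} x² = 0 ≤ 1 = min_{x∈[1,2]} x².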

Gradients
Definition 22 (Gradient)
Let f : R^n → R be continuously differentiable. The gradient ∇f : R^n → R^n of f at x is

    ∇f(x) = ( ∂f(x)/∂x₁, ∂f(x)/∂x₂, ..., ∂f(x)/∂xₙ )ᵀ

The gradient is a column vector with n elements.

The gradient ∇f(x̄) of f at x̄ is orthogonal to the contour {x ∈ R^n : f(x) = f(x̄)}.


Hessians
Definition 23 (Hessian)
Let f : R^n → R be twice continuously differentiable. The Hessian ∇²f : R^n → R^{n×n} of f at x is the n × n matrix whose (i, j) entry is

    [∇²f(x)]ᵢⱼ = ∂²f(x)/∂xᵢ∂xⱼ,  i, j = 1, ..., n.

The Hessian ∇²f(x) is an n by n matrix.

If f is twice continuously differentiable at x, then

    ∂²f(x)/∂xᵢ∂xⱼ = ∂²f(x)/∂xⱼ∂xᵢ for all i ≠ j,

that is, the Hessian matrix G = ∇²f(x) is symmetric (Gᵀ = G).

Gradient and Hessian – Example


Find the gradient and Hessian of

    f(x) = −2x₁² − 3x₂² + 4x₁x₂ + 2x₁ + 6x₂ + 8.
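A worked sketch, differentiating term by term (the Hessian is constant because f is quadratic):

    ∇f(x) = (−4x₁ + 4x₂ + 2, 4x₁ − 6x₂ + 6)ᵀ,    ∇²f(x) = [ −4 4; 4 −6 ].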


Gradient and Hessian – Exercise


Example 24 (Gradients and Hessians)
Let f(x) = x₁² + (x₂ − 1)²(x₂ − 3)² + x₂/2 and

    c₁(x) = −x₁² + x₂,
    c₂(x) = −x₁ + 1,
    c₃(x) = −x₁ + x₂ − 2,
    c₄(x) = −x₁ − x₂ + 4.

For each of the functions f(x) and cᵢ(x), i = 1, 2, 3, 4 (a computer-algebra check is sketched below):
    Find the gradient and Hessian
    Determine whether the function is linear, quadratic or nonlinear
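A sketch of checking these by computer algebra, assuming SymPy is available (the dictionary keys are just labels for printing):

    # Compute the gradient and Hessian of f and c1..c4 symbolically.
    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    funcs = {
        'f':  x1**2 + (x2 - 1)**2 * (x2 - 3)**2 + x2 / 2,
        'c1': -x1**2 + x2,
        'c2': -x1 + 1,
        'c3': -x1 + x2 - 2,
        'c4': -x1 - x2 + 4,
    }
    for name, g in funcs.items():
        grad = [sp.diff(g, v) for v in (x1, x2)]   # gradient components
        hess = sp.hessian(g, (x1, x2))             # 2 x 2 Hessian matrix
        print(name, grad, hess)                    # zero Hessian => linear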


Example 1 – Plots

[Contour plot of f(x) = x₁² + (x₂ − 1)²(x₂ − 3)² + x₂/2 in the (x₁, x₂)-plane for −2 ≤ x₁ ≤ 2, with contour levels ranging from 0.1 to 21.]


Linear and Quadratic functions


Example 25 (Linear and Quadratic functions)
Let f₀ ∈ R, g ∈ R^n and G ∈ R^{n×n}, G symmetric, be fixed.
Find the gradient ∇f(x) and Hessian ∇²f(x) for the
    Linear function f(x) = gᵀx + f₀
    Quadratic function f(x) = ½ xᵀGx + gᵀx + f₀

Solution 26
Linear: f(x) = gᵀx + f₀ = ∑ᵢ₌₁ⁿ gᵢxᵢ + f₀
    ∇f(x) = g, which does not depend on x
    ∇²f(x) = 0 ∈ R^{n×n}, the n by n zero matrix
Quadratic: f(x) = ½ xᵀGx + gᵀx + f₀ = ½ ∑ᵢ₌₁ⁿ ∑ⱼ₌₁ⁿ xᵢGᵢⱼxⱼ + ∑ᵢ₌₁ⁿ gᵢxᵢ + f₀
    ∇f(x) = Gx + g
    ∇²f(x) = G, which does not depend on x

Positive definite matrices – Definition

Definition 27
A real square matrix A ∈ R^{n×n} is
    positive definite ⇐⇒ xᵀAx > 0 for all x ∈ R^n, x ≠ 0
    positive semi-definite ⇐⇒ xᵀAx ≥ 0 for all x ∈ R^n
    negative definite ⇐⇒ xᵀAx < 0 for all x ∈ R^n, x ≠ 0
    negative semi-definite ⇐⇒ xᵀAx ≤ 0 for all x ∈ R^n
    indefinite ⇐⇒ there exist x₀, y₀ ∈ R^n with x₀ᵀAx₀ > 0 and y₀ᵀAy₀ < 0

Generalization of the nonnegative order to symmetric matrices:
    A ⪰ B ⇐⇒ A − B ⪰ 0 ⇐⇒ A − B positive semi-definite
This is a theoretical definition, not a practical test.
A is negative definite ⇐⇒ −A is positive definite

Positive definite matrices – Eigenvalues


A symmetric matrix A ∈ R^{n×n}
    has n real eigenvalues λᵢ, i = 1, ..., n
    has an orthogonal matrix Q (QᵀQ = I) such that A = QDQᵀ, where D = diag(λ₁, ..., λₙ) and Q = [v₁ v₂ ··· vₙ], with vᵢ an eigenvector of A corresponding to eigenvalue λᵢ
    Determinant: det(A) = ∏ᵢ₌₁ⁿ λᵢ
    Trace: trace(A) := ∑ᵢ₌₁ⁿ aᵢᵢ = ∑ᵢ₌₁ⁿ λᵢ
Proposition 3
A symmetric matrix A ∈ Rn×n is
positive definite ⇐⇒ λi > 0 for all i = 1, . . . , n
positive semi-definite ⇐⇒ λi ≥ 0 for all i = 1, . . . , n
negative definite ⇐⇒ λi < 0 for all i = 1, . . . , n
negative semi-definite ⇐⇒ λi ≤ 0 for all i = 1, . . . , n
indefinite ⇐⇒ there exist i, j : λi > 0 and λj < 0
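Proposition 3 translates directly into a numerical test. A minimal sketch assuming NumPy is available; the tolerance handling is our addition, needed because floating-point eigenvalues are rarely exactly zero:

    # Classify a symmetric matrix by the signs of its eigenvalues.
    import numpy as np

    def classify(A, tol=1e-12):
        lam = np.linalg.eigvalsh(A)      # real eigenvalues of a symmetric matrix
        if np.all(lam > tol):
            return "positive definite"
        if np.all(lam >= -tol):
            return "positive semi-definite"
        if np.all(lam < -tol):
            return "negative definite"
        if np.all(lam <= tol):
            return "negative semi-definite"
        return "indefinite"

    print(classify(np.array([[3.0, -3.0], [-3.0, 5.0]])))   # positive definite
    print(classify(np.array([[-3.0, 3.0], [3.0, -5.0]])))   # negative definite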

Positive definite matrices – Principal Minors


Definition 28
The ith leading principal minor, Δᵢ, of a symmetric matrix A ∈ R^{n×n} is the determinant of the leading i × i submatrix of A.

Proposition 4
A symmetric matrix A ∈ R^{n×n} is positive definite if and only if all the leading principal minors Δᵢ, i = 1, 2, ..., n, of A are positive.
If instead each Δᵢ, i = 1, 2, ..., n, has the sign of (−1)ⁱ (i.e. the values of Δᵢ alternate in sign, starting negative), then the matrix A is negative definite.

Example 29
    (a) A = [ 3 −3; −3 5 ]      (b) B = [ −3 3; 3 −5 ]
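A worked sketch using Proposition 4: for (a), Δ₁ = 3 > 0 and Δ₂ = 3·5 − (−3)² = 6 > 0, so A is positive definite; for (b), Δ₁ = −3 < 0 and Δ₂ = (−3)(−5) − 3² = 6 > 0, so the minors alternate in sign starting negative and B is negative definite.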


Positive definite matrices – Cholesky factorization


Proposition 5 (Cholesky factorization)
A symmetric matrix A ∈ R^{n×n} is positive definite ⇐⇒ the Cholesky factorization A = RᵀR exists, with R a nonsingular upper triangular matrix.
Proof.
⇐= Suppose the Cholesky factorization A = RᵀR exists. Then

    xᵀAx = xᵀ(RᵀR)x = (Rx)ᵀ(Rx) = yᵀy = ∑ᵢ₌₁ⁿ yᵢ² ≥ 0,

where y = Rx. Moreover, as R is nonsingular,

    xᵀAx = 0 ⇐⇒ y = 0 ⇐⇒ Rx = 0 ⇐⇒ x = 0.

=⇒ See a textbook on linear algebra, e.g. Golub and Van Loan.

Matlab: chol (a Python analogue is sketched below)
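A sketch of using the Cholesky factorization as a positive-definiteness test, assuming NumPy (note numpy.linalg.cholesky returns the lower-triangular factor L with A = LLᵀ, so R = Lᵀ):

    # Test positive definiteness by attempting a Cholesky factorization.
    import numpy as np

    def is_positive_definite(A):
        try:
            np.linalg.cholesky(A)        # succeeds iff A is positive definite
            return True
        except np.linalg.LinAlgError:    # raised when the factorization fails
            return False

    print(is_positive_definite(np.array([[3.0, -3.0], [-3.0, 5.0]])))   # True
    print(is_positive_definite(np.array([[-3.0, 3.0], [3.0, -5.0]])))   # False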
