
Topic and contents

School of Mathematics and Statistics

MATH3161/MATH5165 – Optimization
Prof Jeya Jeyakumar
Topic 01 – Optimization – What is it?

1 Optimization
    What is optimization?
    Optimization in the “real world”
    Mathematics of optimization
    Variables
    Objective
    Constraints
    Standard formulation

2 Mathematical background
    Vector norms
    Cauchy-Schwarz Inequality
    Optima and optimizers
    Existence
    Relaxation
    Derivatives
    Positive definite matrices


What is optimization?
Definition 1 (Optimization/Optimisation)
Optimization is a process that finds the “best” possible solutions from a
set of feasible solutions. When you optimize something, you are “making
it best”.

“Optimization” comes from the same root as “optimal”, which means best. But “best” can vary.
Meaning of “best” =⇒ concept of ordering
Business/Economics:
    Maximize: profit, return, utility, ...
    Minimize: cost, risk, ...
Engineering:
    Maximize: strength, production, ...
    Minimize: cost, materials, time, ...
Definition 2 (What is an optimization problem?)
An optimization problem is a mathematical problem of finding the best possible
solution from a set of feasible solutions. It has the form of minimizing (or
maximizing) an objective function subject to constraints.

Optimization in the “Real World”


The applicability of optimization is widespread, reaching into almost
every activity in which numerical information is processed. To provide
a comprehensive account of all these application areas would therefore
be unrealistic, but a selection might include:
Traditional Application Areas: Portfolio management problems in
Banking and Finance; structural design problems in manufacturing
and engineering; resource allocation and scheduling problems in
commerce; farm planning problems in Agriculture etc.
Emerging Scientific Areas: Sensor network localization problems in
wireless communications; data mining problems in machine learning
and information sciences; optimization models as decision support
tools in medicine (e.g. optimization based screening algorithms for
identifying neurological and other disorders).
See the course webpage on Moodle for recent research papers on
these emerging application areas (in the folder “Additional resources”).

Mathematics of Optimization – Outline

Mathematical model
    Variables
    Objective
    Constraints
Characterising optima ⇐⇒ Optimality principles
    What is an optimum/optimizer?
    Formulae for identifying/characterizing optima
    Derivatives: I Newton (1642 – 1726), G W Leibniz (1646 – 1711)
    Analytic methods for finding optima:
        Fermat (1601 – 1665), L Euler (1707 – 1783), Lagrange (1736 – 1813)
Finding optima =⇒ Numerical methods
    Newton’s method: I Newton (1642 – 1726), C F Gauss (1777 – 1855)
    Linear programming: L Kantorovich (1912 – 1986), G B Dantzig (1914 – 2005)
    Computer algorithms, linear algebra
Convexity: R T Rockafellar (1938 – )
Duality: J von Neumann (1903 – 1957)
Maximum principle: L S Pontryagin (1908 – 1988)

Variables

Decision variables: what can you change

Finite dimensional: x ∈ R^n, number of variables n
    Column vector x = (x₁, x₂, ..., xₙ)ᵀ
    Univariate: n = 1, x ∈ R
    Multivariate: n = 2, 3, 4, ... up to millions
    Mathematical background: column vectors, symmetric matrices, matrix operations
    Discrete optimization: xᵢ ∈ {0, 1}, xᵢ ∈ N, xᵢ ∈ Z
    Matrix of variables: X ∈ R^{m×m} =⇒ x ∈ R^{m²}
Infinite dimensional: the control
    A function u ∈ C([a, b])
    A function u ∈ L¹([0, T])

Objective
A mathematical function of the variables quantifying the idea of “best”.
Finite dimensional: variables x ∈ R^n, objective function f : R^n → R
    Linear: f(x) = gᵀx, with g ∈ R^n fixed
    Affine: f(x) = gᵀx + f₀, with g ∈ R^n, f₀ ∈ R fixed
    Quadratic: f(x) = ½ xᵀGx + gᵀx + f₀, with G ∈ R^{n×n}, g ∈ R^n, f₀ ∈ R fixed
Infinite dimensional: f : C([0, T]) → R, variables u ∈ C([0, T]), e.g.

    f(u) = ∫₀ᵀ u(t) dt

The co-domain of the objective function must be ordered (total order).
If α, β, γ ∈ R, then
    α ≤ β and β ≤ α ⇐⇒ α = β
    α ≤ β and β ≤ γ =⇒ α ≤ γ
    Either α ≤ β or β ≤ α
If u, v ∈ R^n, n ≥ 2, the componentwise order
    u ≤ v ⇐⇒ v − u ≥ 0 ⇐⇒ vᵢ − uᵢ ≥ 0, i = 1, ..., n
is only a partial order: not every pair u, v is comparable.


Constraints
Constraints describe restrictions on the allowable values of the variables.
Constraint structure for variables x ∈ R^n:
    Equality constraints: cᵢ(x) = 0, i = 1, ..., m_E
    Inequality constraints: cᵢ(x) ≤ 0, i = m_E + 1, ..., m
Feasible region Ω ⊆ R^n:

    Ω = {x ∈ R^n : cᵢ(x) = 0, i = 1, ..., m_E;  cᵢ(x) ≤ 0, i = m_E + 1, ..., m}

Unconstrained problem ⇐⇒ Ω = R^n
Standard formulation: ĉᵢ(x) ≥ 0 ⇐⇒ −ĉᵢ(x) ≤ 0
Algebraic structure of constraints:
    Simple bounds: ℓ ≤ x ≤ u ⇐⇒ ℓᵢ ≤ xᵢ ≤ uᵢ, i = 1, ..., n
    Linear constraints: cᵢ(x) = aᵢᵀx − bᵢ
    Nonlinear constraints


Constraint representation
Example 3 (Constraint representation)
What feasible regions do the following constraints represent?
    √((x₁ − a₁)² + (x₂ − a₂)²) = r
    √((x₁ − a₁)² + (x₂ − a₂)²) ≤ r
    √((x₁ − a₁)² + (x₂ − a₂)²) ≥ r
    x₂ = x₁²
    x₂ ≥ x₁²
    x₁² − 1 = 0


Standard formulation

Definition 4 (Standard formulation)

The standard formulation of a continuous finite dimensional optimization problem is

    minimize_{x ∈ R^n}  f(x)
    subject to  cᵢ(x) = 0, i = 1, ..., m_E;
                cᵢ(x) ≤ 0, i = m_E + 1, ..., m

Conversions
    Maximize to minimize: max f̂(x) = − min (−f̂(x))
    Constraint right-hand side: ĉᵢ(x) = bᵢ ⇐⇒ ĉᵢ(x) − bᵢ = 0
    Greater-than-or-equal inequalities: ĉᵢ(x) ≥ 0 ⇐⇒ −ĉᵢ(x) ≤ 0
    Strict inequality: ĉᵢ(x) < 0 ⇐⇒ ĉᵢ(x) + ε ≤ 0 for some ε > 0


A simplified farm planning problem


Example 5 (Farm Planning)
Farmer Jack has 100 acres to devote to wheat and corn and wishes to plan
his planting to maximize the expected revenue. Jack has only $800 in
capital to apply to planting the crops, and it costs $5 to plant an acre of
wheat and $10 to plant an acre of corn. Other activities leave the Jack
family only 150 days of labour to devote to the crops. Two days are
required for each acre of wheat and one day for each acre of corn. If past
experience indicates a return of $40 from each acre of wheat and $30 from
each acre of corn, how many acres of each should be planted to maximize
revenue?
Pose this as an optimization problem in standard form.
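As a sketch of where the standard formulation leads: once posed as a linear program, the problem can be solved numerically. The encoding below uses SciPy's linprog and is an illustration under stated assumptions (SciPy is available; the variable names x1 for acres of wheat and x2 for acres of corn are ours), not part of the course material.

    # Farm planning LP, converted to standard (minimization) form:
    #   minimize   -40*x1 - 30*x2        (negated revenue, max -> min)
    #   subject to   x1 +    x2 <= 100   (land, acres)
    #              5*x1 + 10*x2 <= 800   (capital, $)
    #              2*x1 +    x2 <= 150   (labour, days)
    #              x1, x2 >= 0
    from scipy.optimize import linprog

    c = [-40, -30]                       # negate returns to convert max to min
    A_ub = [[1, 1], [5, 10], [2, 1]]     # land, capital, labour rows
    b_ub = [100, 800, 150]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)
    print(res.x, -res.fun)               # expect 50 acres of each, revenue $3500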


Post Office Parcel problem


Example 6 (Post Office Parcel Problem)
At one time the post office regulations were that
the length plus the girth of a parcel must not exceed 1.8 metres.
What is the parcel of largest volume that could be sent through the post?

Pose this as an optimization problem in standard form. Assume that


The parcel has rectangular sides
The length of the parcel is the longest edge
The girth is the distance around the parcel perpendicular to the
length. For a rectangular box, girth is 2×(height + depth).

[Figure: a rectangular parcel with edge lengths x₁, x₂, x₃.]
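One possible standard-form statement, as a sketch (writing x₁ for the length and x₂, x₃ for the other two edges, so the volume is x₁x₂x₃ and the girth is 2(x₂ + x₃)):

    minimize_{x ∈ R³}  −x₁x₂x₃
    subject to  x₁ + 2(x₂ + x₃) − 1.8 ≤ 0,
                x₂ − x₁ ≤ 0,  x₃ − x₁ ≤ 0,
                −xᵢ ≤ 0,  i = 1, 2, 3.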

Standard formulation – Example 2


Example 7 (Standard formulation)
Maximize −x₁² − (x₂ − 1)²(x₂ − 3)² − x₂/2 on the set

    Ω = {x ∈ R² : x₂ ≥ x₁², x₁ ≤ 1, x₂ ≤ 2 + x₁, x₂ ≤ 4 − x₁}

Write this problem in standard form and plot the feasible region Ω.
What is the feasible region if the first constraint becomes x₂ = x₁²?
[Plot: the feasible region Ω in the (x₁, x₂)-plane, bounded below by the parabola x₂ = x₁² and above by the lines x₂ = 2 + x₁ and x₂ = 4 − x₁, for −2 ≤ x₁ ≤ 2.]
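One standard-form version, as a sketch (negating the objective to convert maximize to minimize, and applying the ĉᵢ(x) ≥ 0 ⇐⇒ −ĉᵢ(x) ≤ 0 conversion to each constraint):

    minimize_{x ∈ R²}  x₁² + (x₂ − 1)²(x₂ − 3)² + x₂/2
    subject to  x₁² − x₂ ≤ 0,  x₁ − 1 ≤ 0,
                x₂ − x₁ − 2 ≤ 0,  x₁ + x₂ − 4 ≤ 0.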


Measures of size - norms


If α ∈ R then its magnitude (or absolute value) |α| is

    |α| = α if α ≥ 0;  −α if α < 0.

In R^n there are several possible measures of size – norms, denoted ‖·‖.
Definition 8 (Vector norm)
A vector norm on R^n is a function ‖·‖ from R^n to R such that
    1. ‖x‖ ≥ 0 for all x ∈ R^n, and ‖x‖ = 0 ⇐⇒ x = 0.
    2. ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ R^n. (Triangle inequality)
    3. ‖αx‖ = |α| ‖x‖ for all α ∈ R, x ∈ R^n.
Example 9 (Vector norms)
The most widely used vector norms are
    1-norm: ‖x‖₁ = ∑ᵢ₌₁ⁿ |xᵢ|
    2-norm: ‖x‖₂ = (∑ᵢ₌₁ⁿ |xᵢ|²)^{1/2} = (xᵀx)^{1/2}
    ∞-norm (maximum norm): ‖x‖∞ = max_{i=1,...,n} |xᵢ|
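A quick numerical check of the three norms, as a sketch assuming NumPy is available:

    # Evaluate the 1-, 2- and infinity-norms of a small example vector.
    import numpy as np

    x = np.array([3.0, -4.0, 0.0])
    print(np.linalg.norm(x, 1))       # 1-norm:   |3| + |-4| + |0| = 7
    print(np.linalg.norm(x, 2))       # 2-norm:   sqrt(9 + 16)     = 5
    print(np.linalg.norm(x, np.inf))  # inf-norm: max(3, 4, 0)     = 4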

Cauchy-Schwarz Inequality
An important property connecting the dot product of two vectors and their norms is the Cauchy-Schwarz inequality:

    |xᵀy| ≤ ‖x‖₂ ‖y‖₂ for any x, y ∈ R^n.

Equality holds if and only if x and y are linearly dependent.


Ex*. Show that the 2-norm, given by f(x) = ‖x‖₂, satisfies (1)–(3) of Definition 8 (vector norm). [Hint: the Cauchy-Schwarz inequality is useful for verifying (2).]
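A sketch of the key step (2), using Cauchy-Schwarz on the cross term:

    ‖x + y‖₂² = (x + y)ᵀ(x + y) = ‖x‖₂² + 2xᵀy + ‖y‖₂²
              ≤ ‖x‖₂² + 2‖x‖₂‖y‖₂ + ‖y‖₂² = (‖x‖₂ + ‖y‖₂)²,

and taking square roots gives the triangle inequality.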


Local and global minima


Definition 10 (Global minimum)
A point x∗ ∈ Ω is a global minimizer of f (x) over Ω ⊆ Rn ⇐⇒
f (x∗ ) ≤ f (x) for all x ∈ Ω. The global minimum is f (x∗ ).

Definition 11 (Strict global minimum)


A point x∗ ∈ Ω is a strict global minimizer of f (x) over Ω ⊆ Rn ⇐⇒
f(x∗) < f(x) for all x ∈ Ω, x ≠ x∗.
Definition 12 (Local minimum)
A point x∗ ∈ Ω is a local minimizer of f (x) over Ω ⊆ Rn ⇐⇒
there exists a δ > 0 such that f (x∗ ) ≤ f (x) for all x ∈ Ω with
kx − x∗ k ≤ δ. Then f (x∗ ) is a local minimum.
Definition 13 (Strict local minimum)
A point x∗ ∈ Ω is a strict local minimizer of f (x) over Ω ⊆ Rn ⇐⇒
there exists a δ > 0 such that f (x∗ ) < f (x) for all x ∈ Ω with
0 < kx − x∗ k ≤ δ.

Local and global maxima


Definition 14 (Global maximum)
A point x∗ ∈ Ω is a global maximizer of f (x) over Ω ⊆ Rn ⇐⇒
f (x∗ ) ≥ f (x) for all x ∈ Ω. The global maximum is f (x∗ ).
Definition 15 (Strict global maximum)
A point x∗ ∈ Ω is a strict global maximizer of f (x) over Ω ⊆ Rn ⇐⇒
f(x∗) > f(x) for all x ∈ Ω, x ≠ x∗.
Definition 16 (Local maximum)
A point x∗ ∈ Ω is a local maximizer of f (x) over Ω ⊆ Rn ⇐⇒
there exists a δ > 0 such that f (x∗ ) ≥ f (x) for all x ∈ Ω with
kx − x∗ k ≤ δ. Then f (x∗ ) is a local maximum.
Definition 17 (Strict local maximum)
A point x∗ ∈ Ω is a strict local maximizer of f (x) over Ω ⊆ Rn ⇐⇒
there exists a δ > 0 such that f (x∗ ) > f (x) for all x ∈ Ω with
0 < kx − x∗ k ≤ δ.

Local and global minima and maxima


Example 18 (Local and global minima and maxima)
Ω = [0, 5],

f(x) =
    (x − 1)²                    if x ≤ 0.5;
    0.25                        if 0.5 < x ≤ 1;
    1.25 − (x − 2)²             if 1 < x ≤ 2.5;
    x − 1.5                     if 2.5 < x ≤ 3;
    1.5 − 0.25 sin(π(x − 3))    if 3 < x.

[Plot: graph of f(x) on Ω = [0, 5], showing the local and global extrema identified in Solution 19 below.]

Local and global minima and maxima


Solution 19 (Local and global minima and maxima)
Consider f(x) on the interval Ω = [0, 5] and the points 0, [0.5, 1], 2, 2.5, 3, 3.5, 4.5, 5:
    x(a) = 0 is a strict local maximizer with f(x(a)) = 1;
    any point x(b) in the interval [0.5, 1] is a local and global minimizer with f(x(b)) = 0.25 (but not strict, as adjacent points have the same function value);
    x(c) = 2 is a strict local maximizer with f(x(c)) = 1.25;
    x(d) = 2.5 is a strict local minimizer with f(x(d)) = 1;
    x(e) = 3 is a strict local maximizer with f(x(e)) = 1.5;
    x(f) = 3.5 is a strict local minimizer with f(x(f)) = 1.25;
    x(g) = 4.5 is a strict local and global maximizer with f(x(g)) = 1.75;
    x(h) = 5 is a strict local minimizer with f(x(h)) = 1.5.


Existence
Definition 20 (Extrema)
The global/local extrema of f over Ω are all the global/local minima and all the global/local maxima.

Proposition 1 (Existence of global extrema)


Let Ω be a compact set and let f be continuous on Ω. Then the global
extrema of f over Ω exist.
Finite dimensional Ω ⊆ Rn is compact ⇐⇒ Ω is closed and bounded

Example 21 (Existence)
Find the global extrema, if they exist, for the following problems
    f(x) = e^{−x} on Ω = [0, 1]
    f(x) = e^{−x} on Ω = [0, ∞)
    f(x) = sin x on Ω = [0, 2π)

Relaxation

Proposition 2 (Relaxation)
If f : R^n → R and Ω̄ ⊆ Ω, then

    min_{x∈Ω} f(x) ≤ min_{x∈Ω̄} f(x)

Thus, the minimum value of the relaxed problem is at most the minimum value of the original problem.

Proof.
Let x∗ ∈ Ω̄ be a global minimizer of f over Ω̄, so that min_{x∈Ω̄} f(x) = f(x∗). As Ω̄ ⊆ Ω, x∗ ∈ Ω̄ =⇒ x∗ ∈ Ω. Thus min_{x∈Ω} f(x) ≤ f(x∗) = min_{x∈Ω̄} f(x).

If you make the feasible region larger, the minimum value of the objective function cannot increase.
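A one-line numeric illustration, with Ω = [0, 2] and Ω̄ = [1, 2]:

    min_{x∈[0,2]} x² = 0 ≤ 1 = min_{x∈[1,2]} x².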

Gradients
Definition 22 (Gradient)
Let f : R^n → R be continuously differentiable. The gradient ∇f : R^n → R^n of f at x is

    ∇f(x) = ( ∂f(x)/∂x₁, ∂f(x)/∂x₂, ..., ∂f(x)/∂xₙ )ᵀ

The gradient is a column vector with n elements.

The gradient ∇f(x̄) of f at x̄ is orthogonal to the contour {x ∈ R^n : f(x) = f(x̄)}.


Hessians
Definition 23 (Hessian)
Let f : R^n → R be twice continuously differentiable. The Hessian ∇²f : R^n → R^{n×n} of f at x is the n × n matrix whose (i, j) entry is

    [∇²f(x)]ᵢⱼ = ∂²f(x)/∂xᵢ∂xⱼ,  i, j = 1, ..., n.

The Hessian ∇²f(x) is an n by n matrix.

If f is twice continuously differentiable at x, then

    ∂²f(x)/∂xᵢ∂xⱼ = ∂²f(x)/∂xⱼ∂xᵢ for all i ≠ j,

that is, the Hessian matrix G = ∇²f(x) is symmetric (Gᵀ = G).

Gradient and Hessian – Example


Find the gradient and Hessian of

    f(x) = −2x₁² − 3x₂² + 4x₁x₂ + 2x₁ + 6x₂ + 8.
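A worked sketch, differentiating term by term (the Hessian is constant because f is quadratic):

    ∇f(x) = (−4x₁ + 4x₂ + 2, 4x₁ − 6x₂ + 6)ᵀ,    ∇²f(x) = [ −4 4; 4 −6 ].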


Gradient and Hessian – Exercise


Example 24 (Gradients and Hessians)
Let f(x) = x₁² + (x₂ − 1)²(x₂ − 3)² + x₂/2 and

    c₁(x) = −x₁² + x₂,
    c₂(x) = −x₁ + 1,
    c₃(x) = −x₁ + x₂ − 2,
    c₄(x) = −x₁ − x₂ + 4.

For each of the functions f(x) and cᵢ(x), i = 1, 2, 3, 4 (a computer-algebra check is sketched below):
    Find the gradient and Hessian
    Determine whether the function is linear, quadratic or nonlinear
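A sketch of checking these by computer algebra, assuming SymPy is available (the dictionary keys are just labels for printing):

    # Compute the gradient and Hessian of f and c1..c4 symbolically.
    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    funcs = {
        'f':  x1**2 + (x2 - 1)**2 * (x2 - 3)**2 + x2 / 2,
        'c1': -x1**2 + x2,
        'c2': -x1 + 1,
        'c3': -x1 + x2 - 2,
        'c4': -x1 - x2 + 4,
    }
    for name, g in funcs.items():
        grad = [sp.diff(g, v) for v in (x1, x2)]   # gradient components
        hess = sp.hessian(g, (x1, x2))             # 2 x 2 Hessian matrix
        print(name, grad, hess)                    # zero Hessian => linear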


Example 1 – Plots

[Contour plot of f(x) = x₁² + (x₂ − 1)²(x₂ − 3)² + x₂/2 in the (x₁, x₂)-plane for −2 ≤ x₁ ≤ 2, with contour levels ranging from 0.1 to 21.]


Linear and Quadratic functions


Example 25 (Linear and Quadratic functions)
Let f₀ ∈ R, g ∈ R^n and G ∈ R^{n×n}, G symmetric, be fixed.
Find the gradient ∇f(x) and Hessian ∇²f(x) for the
    Linear function f(x) = gᵀx + f₀
    Quadratic function f(x) = ½ xᵀGx + gᵀx + f₀

Solution 26
Linear: f(x) = gᵀx + f₀ = ∑ᵢ₌₁ⁿ gᵢxᵢ + f₀
    ∇f(x) = g, which does not depend on x
    ∇²f(x) = 0 ∈ R^{n×n}, the n by n zero matrix
Quadratic: f(x) = ½ xᵀGx + gᵀx + f₀ = ½ ∑ᵢ₌₁ⁿ ∑ⱼ₌₁ⁿ xᵢGᵢⱼxⱼ + ∑ᵢ₌₁ⁿ gᵢxᵢ + f₀
    ∇f(x) = Gx + g
    ∇²f(x) = G, which does not depend on x

Positive definite matrices – Definition

Definition 27
A real square matrix A ∈ R^{n×n} is
    positive definite ⇐⇒ xᵀAx > 0 for all x ∈ R^n, x ≠ 0
    positive semi-definite ⇐⇒ xᵀAx ≥ 0 for all x ∈ R^n
    negative definite ⇐⇒ xᵀAx < 0 for all x ∈ R^n, x ≠ 0
    negative semi-definite ⇐⇒ xᵀAx ≤ 0 for all x ∈ R^n
    indefinite ⇐⇒ there exist x₀, y₀ ∈ R^n with x₀ᵀAx₀ > 0 and y₀ᵀAy₀ < 0

Generalization of the nonnegative order to symmetric matrices:
    A ⪰ B ⇐⇒ A − B ⪰ 0 ⇐⇒ A − B positive semi-definite
This is a theoretical definition, not a practical test.
A is negative definite ⇐⇒ −A is positive definite

Positive definite matrices – Eigenvalues


A symmetric matrix A ∈ R^{n×n}
    has n real eigenvalues λᵢ, i = 1, ..., n
    has an orthogonal matrix Q (QᵀQ = I) such that A = QDQᵀ, where D = diag(λ₁, ..., λₙ) and Q = [v₁ v₂ ··· vₙ], with vᵢ an eigenvector of A corresponding to eigenvalue λᵢ
    Determinant: det(A) = ∏ᵢ₌₁ⁿ λᵢ
    Trace: trace(A) := ∑ᵢ₌₁ⁿ aᵢᵢ = ∑ᵢ₌₁ⁿ λᵢ
Proposition 3
A symmetric matrix A ∈ Rn×n is
positive definite ⇐⇒ λi > 0 for all i = 1, . . . , n
positive semi-definite ⇐⇒ λi ≥ 0 for all i = 1, . . . , n
negative definite ⇐⇒ λi < 0 for all i = 1, . . . , n
negative semi-definite ⇐⇒ λi ≤ 0 for all i = 1, . . . , n
indefinite ⇐⇒ there exist i, j : λi > 0 and λj < 0
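Proposition 3 translates directly into a numerical test. A minimal sketch assuming NumPy is available; the tolerance handling is our addition, needed because floating-point eigenvalues are rarely exactly zero:

    # Classify a symmetric matrix by the signs of its eigenvalues.
    import numpy as np

    def classify(A, tol=1e-12):
        lam = np.linalg.eigvalsh(A)      # real eigenvalues of a symmetric matrix
        if np.all(lam > tol):
            return "positive definite"
        if np.all(lam >= -tol):
            return "positive semi-definite"
        if np.all(lam < -tol):
            return "negative definite"
        if np.all(lam <= tol):
            return "negative semi-definite"
        return "indefinite"

    print(classify(np.array([[3.0, -3.0], [-3.0, 5.0]])))   # positive definite
    print(classify(np.array([[-3.0, 3.0], [3.0, -5.0]])))   # negative definite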

Positive definite matrices – Principal Minors


Definition 28
The ith leading principal minor, Δᵢ, of a symmetric matrix A ∈ R^{n×n} is the determinant of the leading i × i submatrix of A.

Proposition 4
A symmetric matrix A ∈ R^{n×n} is positive definite if and only if all the leading principal minors Δᵢ, i = 1, 2, ..., n, of A are positive.
If instead each Δᵢ, i = 1, 2, ..., n, has the sign of (−1)ⁱ (i.e. the values of Δᵢ alternate in sign, starting negative), then the matrix A is negative definite.

Example 29
    (a) A = [ 3 −3; −3 5 ]      (b) B = [ −3 3; 3 −5 ]
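A worked sketch using Proposition 4: for (a), Δ₁ = 3 > 0 and Δ₂ = 3·5 − (−3)² = 6 > 0, so A is positive definite; for (b), Δ₁ = −3 < 0 and Δ₂ = (−3)(−5) − 3² = 6 > 0, so the minors alternate in sign starting negative and B is negative definite.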


Positive definite matrices – Cholesky factorization


Proposition 5 (Cholesky factorization)
A symmetric matrix A ∈ R^{n×n} is positive definite ⇐⇒ the Cholesky factorization A = RᵀR exists, with R a nonsingular upper triangular matrix.
Proof.
⇐= Suppose the Cholesky factorization A = RᵀR exists. Then

    xᵀAx = xᵀ(RᵀR)x = (Rx)ᵀ(Rx) = yᵀy = ∑ᵢ₌₁ⁿ yᵢ² ≥ 0,

where y = Rx. Moreover, as R is nonsingular,

    xᵀAx = 0 ⇐⇒ y = 0 ⇐⇒ Rx = 0 ⇐⇒ x = 0.

=⇒ See a textbook on linear algebra, e.g. Golub and Van Loan.

Matlab: chol (a Python analogue is sketched below)
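A sketch of using the Cholesky factorization as a positive-definiteness test, assuming NumPy (note numpy.linalg.cholesky returns the lower-triangular factor L with A = LLᵀ, so R = Lᵀ):

    # Test positive definiteness by attempting a Cholesky factorization.
    import numpy as np

    def is_positive_definite(A):
        try:
            np.linalg.cholesky(A)        # succeeds iff A is positive definite
            return True
        except np.linalg.LinAlgError:    # raised when the factorization fails
            return False

    print(is_positive_definite(np.array([[3.0, -3.0], [-3.0, 5.0]])))   # True
    print(is_positive_definite(np.array([[-3.0, 3.0], [3.0, -5.0]])))   # False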
