
UUM 526 Optimization Techniques in Engineering

Lecture 2: Mathematical Preliminaries

Asst. Prof. N. Kemal Ure

Istanbul Technical University


ure@itu.edu.tr

February 5, 2019



Overview

1 Proof Methods

2 Linear Algebra

3 Geometry

4 Calculus



Introduction

▶ We need to review/build some mathematical tools before we can start attacking optimality tests and convergence of algorithms
▶ We are going to review:
∎ Proof Methods
∎ Linear Algebra
∎ Geometry
∎ Single and Multivariable Calculus

▶ If you took UUM 535, everything here should look familiar.


∎ We will skip most proofs here, since we already did them at UUM 535.
∎ All the proofs are available in Chong’s anyway.

▶ The main goal is to understand quadratic forms and Taylor's Theorem in Rn, which will be our main tools in the analysis of optimization problems and algorithms.

Proof Methods


Proofs

▶ A theorem is usually a statement of the form A ⟹ B
∎ A is the assumption of the theorem (what is given to us), B is the conclusion of the theorem. If A is true, then B holds

▶ Example (Triangle Inequality): A ∶ a, b ∈ R, B ∶ ∣a + b∣ ≤ ∣a∣ + ∣b∣

▶ Some theorems work both ways: A ⟺ B
∎ A is true if and only if B is true
∎ In this case we have to prove both A ⟹ B and B ⟹ A. Example:

Theorem
Let a, b be real numbers. Then a = b if and only if for all ε > 0, ∣a − b∣ < ε


Proof By Contradiction (PBC) and Proof By Induction

▶ We just used PBC in the last proof! It is a very popular technique
▶ Based on the fact that A ⟹ B is equivalent to not(A and (not B))
Theorem
There are infinitely many prime numbers

▶ Proof by induction is also a popular technique. Suppose that the property we want to prove is indexed by N, so we need to prove A(1), A(2), . . .
∎ First prove that it is true for A(1)
∎ Next prove that A(n) ⟹ A(n + 1)

Theorem
Let n ∈ N. Then ∑_{i=1}^{n} i = n(n + 1)/2
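A quick numeric sanity check of this closed form (not a substitute for the induction proof; plain Python, nothing assumed beyond the formula above):

```python
# Check 1 + 2 + ... + n against n(n+1)/2 for a range of n.
for n in range(1, 100):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
print("closed form verified for n = 1, ..., 99")
```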

▶ Warning: induction proves A(n) for each finite n ∈ N; it does not establish the limiting statement as n → ∞



Linear Algebra


Linear Combination

▶ A vector space is a set V equipped with vector addition (+) and scalar multiplication (⋅) such that:

α ⋅ v1 + β ⋅ v2 ∈ V, ∀v1, v2 ∈ V, α, β ∈ F (1)

▶ We will be mainly dealing with V = Rn and F = R.


▶ If S ⊆ V satisfies Eq. 1, then S is called a subspace of V .
Definition (Linear Combination)
Let ak, k = 1, . . . , n be a finite number of vectors. The vector b is said to be a linear combination of the vectors ak if there exist scalars αk such that:

b = α1 a1 + α2 a2 + ⋅ ⋅ ⋅ + αn an = ∑_{k=1}^{n} αk ak
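To make this concrete, here is a minimal numpy sketch (the specific vectors are illustrative, not from the slides): given a1, a2 and a target b, a least-squares solve recovers the coefficients αk whenever b lies in their span.

```python
import numpy as np

A = np.column_stack([[1.0, 0.0, 1.0], [0.0, 2.0, 1.0]])  # a1, a2 as columns
b = np.array([2.0, 2.0, 3.0])                            # equals 2*a1 + 1*a2

alpha, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
print(alpha)                      # approximately [2., 1.]
print(np.allclose(A @ alpha, b))  # True: b is a linear combination of a1, a2
```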


Linear Dependency

Definition (Linearly Dependent List)
The list of vectors A = (ak ∶ k = 1, . . . , n) is said to be linearly dependent if one of the vectors is a linear combination of the others.

▶ If list A is not linearly dependent, it is called linearly independent.


Theorem (Test for linear independence)
The list A = (ak ∶ k = 1, . . . , n) is linearly independent iff the equality

∑_{k=1}^{n} αk ak = 0, (2)

implies that αk = 0, k = 1, . . . , n.
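Numerically, stacking the vectors as columns and checking the matrix rank is a convenient stand-in for this test; a sketch with made-up vectors:

```python
import numpy as np

# Third column is the sum of the first two, so the list is dependent.
A = np.column_stack([[1.0, 0.0, 0.0],
                     [1.0, 1.0, 0.0],
                     [2.0, 1.0, 0.0]])

# Columns are linearly independent iff rank equals the number of columns.
print(np.linalg.matrix_rank(A))   # -> 2 (not 3): linearly dependent
```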


Span, Basis and Dimension

Definition (Span)
The set of all linear combinations of a list of vectors {ak} is called the span of {ak}:

span[a1, . . . , an] = { ∑_{k=1}^{n} αk ak ∶ αk ∈ R }

▶ Span is always a subspace!


Definition (Basis)
If the list (ak ) is linearly independent and span[a1 , . . . , an ] = V , then
(ak ) is a basis for V .

▶ By fixing a basis, we can represent vectors in V in terms of that basis.


Span, Basis and Dimension

Theorem (Unique Coordinates for a Fixed Basis)
Let {ak} be a basis for V . Then any vector v ∈ V can be represented uniquely as

v = ∑_{k=1}^{n} αk ak.

▶ There are usually infinitely many bases for a subspace. For instance, if (ak) is a basis, so is (c ak) for any nonzero c ∈ R.
▶ What about the number of vectors in a basis?
Theorem (Unique Number of Vectors in a Basis)
Let ak , k = 1, . . . , n and bi , i = 1, . . . , m be two different bases for V .
Then n = m.
▶ Hence every space (or subspace) V has the same number of vectors in every one of its bases. That number is called the dimension of V .
▶ We call αk the coordinates of the vector v in the basis {ak}.

Vector and Matrix Notation

▶ For a ∈ Rn we will write vectors in column notation:

a = [a1 a2 . . . an]T, ai ∈ R
▶ A matrix A ∈ Rm×n is a rectangular array of real numbers aij ∈ R, i = 1, . . . , m, j = 1, . . . , n:

A =
⎡ a11 a12 . . . a1n ⎤
⎢ a21 a22 . . . a2n ⎥
⎢  ⋮   ⋮   ⋱    ⋮  ⎥
⎣ am1 am2 . . . amn ⎦
▶ It is much more useful to think of A as a collection of n vectors lying in Rm: A = [a1, a2, . . . , an], ak ∈ Rm
∎ Also, matrix-vector multiplication Av makes more sense this way
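In this column view, Av is simply a linear combination of the columns of A weighted by the entries of v; a small sketch (values arbitrary):

```python
import numpy as np

A = np.array([[1.0, 4.0],
              [2.0, 5.0],
              [3.0, 6.0]])        # columns a1, a2 in R^3
v = np.array([2.0, -1.0])

# A @ v equals v1*a1 + v2*a2: a combination of the columns.
print(np.allclose(A @ v, v[0] * A[:, 0] + v[1] * A[:, 1]))  # True
```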

Matrix Rank

Definition (Rank of a Matrix)
The rank r is the maximal number of linearly independent columns of A ∈ Rm×n.

▶ Notice that r ≤ min(m, n). When r = n, we say that the matrix is full rank.


Theorem (Invariance of Rank)
The rank of a matrix A ∈ Rm×n is invariant under the following operations:
1 Multiplication of columns of A by nonzero scalars.
2 Interchange of columns.
3 Addition to a given column of a linear combination of the other columns.

▶ Nice, but is there a formula for testing if the matrix has full rank? Is there a scalar quantity that measures the independence of the columns?
∎ For square matrices (m = n), the answer is yes! It is called the determinant of the matrix.

Determinant

▶ The determinant of a matrix (denoted by ∣A∣) is a confusing concept at first and has many different interpretations.
▶ The properties of the determinant are more important than its explicit formula
Definition (Determinant)
The determinant is a function det ∶ Rn×n → R that possesses the following properties:
1 The determinant is linear in the columns of the matrix:

∣A∣ = det A = det[a1, . . . , αak + βbk, . . . , an]
= α det[a1, . . . , ak, . . . , an] + β det[a1, . . . , bk, . . . , an].

2 If for some k, ak = ak+1, then ∣A∣ = 0.
3 The determinant of the identity matrix is 1, that is, ∣In∣ = 1.

Consequences of Determinant Definition

▶ If there is a zero column in the matrix, the determinant is zero:

det[a1, . . . , 0, . . . , an] = 0.

▶ If we add to a column a linear combination of the other columns, the determinant does not change:

det[a1, . . . , ak + ∑_{j≠k} αj aj, . . . , an] = det[a1, . . . , ak, . . . , an]

▶ The determinant changes sign if we interchange two columns:

det[a1, . . . , ak−1, ak, . . . , an] = −det[a1, . . . , ak, ak−1, . . . , an].

▶ So, if the columns of A are not linearly independent, then ∣A∣ = 0.


∎ Hence for a square matrix: full rank ⇐⇒ nonzero determinant.
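A quick numerical illustration of this equivalence (matrices made up for the example):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])             # second column = 2 * first column
print(np.linalg.det(A))                # ~0.0: singular
print(np.linalg.matrix_rank(A))        # 1: not full rank

B = np.array([[1.0, 2.0],
              [0.0, 3.0]])
print(np.linalg.det(B))                # 3.0: nonzero, so B is full rank
```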

Determinant and Rank

▶ Only square matrices have determinants. What if I want to test the rank of a rectangular matrix?
∎ Rectangular matrices have square sub-matrices! Wonder if their determinant is useful... First we need to define it:
Definition (Minor)
The pth order minor of a matrix A ∈ Rm×n is the determinant of a square sub-matrix formed by deleting m − p rows and n − p columns.

▶ Then we have this cool theorem:
Theorem (Minors and Rank)
If A ∈ Rm×n (m ≥ n) has a nonzero nth order minor, then rank A = n.

▶ It is straightforward to show that the rank of a matrix is the maximal order of its nonzero minors.

Nonsingular matrices and Inverses

▶ A square matrix A with det A ≠ 0 is called nonsingular.


▶ A matrix A ∈ Rn×n is nonsingular if and only if there exists a matrix
B ∈ Rn×n such that:
AB = BA = In

▶ The matrix B is called the inverse of A and is denoted A−1.
▶ It shows up in the solution of linear equations Ax = b: the unique solution exists if A is nonsingular (x = A−1 b)
▶ What about non-square linear systems?
Theorem (Existence of Solution in a Linear System)
The set of equations represented by Ax = b has a solution if and only if

rankA = rank[A, b].

▶ If rankA = m < n, then we have infinitely many solutions.
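A sketch of the rank test in numpy (the system is invented for illustration):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 2.0]])

def has_solution(A, b):
    # Ax = b is solvable iff rank A == rank [A, b]
    return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(np.column_stack([A, b]))

print(has_solution(A, np.array([1.0, 2.0])))  # True: b is in the range of A
print(has_solution(A, np.array([1.0, 3.0])))  # False: inconsistent system
```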



Euclidean Inner Product

▶ We need to turn our vector space into a metric space by adding a "length" function.
∎ Such a function already exists for V = R: the absolute value function ∣.∣
∎ Some very useful properties: −∣a∣ ≤ a ≤ ∣a∣, ∣ab∣ = ∣a∣∣b∣
∎ The most useful property: ∣a + b∣ ≤ ∣a∣ + ∣b∣

▶ For Rn, before defining the length function, it is helpful to define the inner product first
Definition (Euclidean Inner Product)
The Euclidean inner product of two vectors x, y ∈ Rn is defined as

⟨x, y⟩ = ∑_{i=1}^{n} xi yi = xT y


Inner Product Properties

▶ Inner product has the following properties


∎ Positivity: ⟨x, x⟩ ≥ 0, x = 0 ⇐⇒ ⟨x, x⟩ = 0.
∎ Symmetry: ⟨x, y⟩ = ⟨y, x⟩.
∎ Additivity: ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩.

∎ Homogeneity: ⟨rx, y⟩ = r⟨x, y⟩, r ∈ R.

▶ By symmetry, additivity and homogeneity also hold in the second argument.


▶ Two vectors are orthogonal if ⟨x, y⟩ = 0.
▶ Now we can define the length; it is called the Euclidean norm:

∥x∥ = √⟨x, x⟩ = √(xT x)

Theorem (Cauchy–Schwarz Inequality)
For any x, y ∈ Rn,

∣⟨x, y⟩∣ ≤ ∥x∥∥y∥.

Equality holds iff x and y are linearly dependent (e.g., x = αy for some α ∈ R).
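A minimal numerical check of the inequality and its equality case (random vectors, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(5), rng.standard_normal(5)

print(abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y))  # True

# Equality when the vectors are linearly dependent (y = 3x here)
print(np.isclose(abs(x @ (3 * x)),
                 np.linalg.norm(x) * np.linalg.norm(3 * x)))  # True
```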

Norm Properties

▶ The norm possesses many properties of the absolute value function
∎ Positivity: ∥x∥ ≥ 0, ∥x∥ = 0 ⟺ x = 0
∎ Homogeneity: ∥rx∥ = ∣r∣∥x∥, r ∈ R
∎ Triangle Inequality: ∥x + y∥ ≤ ∥x∥ + ∥y∥

▶ There are many other vector norms. Actually, any function that satisfies the properties above is a norm.
∎ p-norm: ∥x∥p = (∣x1∣^p + ∣x2∣^p + ⋅ ⋅ ⋅ + ∣xn∣^p)^{1/p}
∎ p = 2 gives the Euclidean norm
∎ What do p = 1 or p = 0 correspond to? What happens when p → ∞?

▶ Continuity of f ∶ Rn → Rm can be formulated in terms of norms
∎ f is continuous at x0 ∈ Rn if and only if for all ε > 0, there exists a δ > 0 such that ∥x − x0∥ < δ ⟹ ∥f (x) − f (x0)∥ < ε

▶ If x, y ∈ Cn (complex vectors), the inner product is defined as ∑_{i=1}^{n} xi ȳi; hence ⟨x, y⟩ is the complex conjugate of ⟨y, x⟩, and ⟨x, ry⟩ = r̄⟨x, y⟩.

Linear Transformations

Definition (Linear Transformation)


A function L ∶ Rn → Rm is called a linear transformation if

L(ax) = aL(x), a ∈ R
L(x + y) = L(x) + L(y)

▶ If we fix a basis for Rn and Rm we can represent L with a matrix, such that L(x) = Ax. Hence different bases correspond to different matrix representations.
∎ Let {u1, . . . , un} be a basis for Rn and let {v1, . . . , vm} be a basis for Rm. Then we can express L(uk) = Auk ∈ Rm as:

Auk = ak,1 v1 + ⋅ ⋅ ⋅ + ak,m vm

where ak,1, . . . , ak,m make up the kth column of A.


Similarity Transformations

▶ Let {ei} and {e′i} be two different bases of Rn. Let x be the coordinates of v ∈ Rn with respect to {ei} and let x′ be the coordinates of v with respect to {e′i}.
▶ Then there is a transformation matrix T ∈ Rn×n such that x′ = T x, where

T = [e′1, . . . , e′n]−1 [e1, . . . , en]
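A small numpy sketch of this change of coordinates (the two bases are chosen arbitrarily):

```python
import numpy as np

E = np.column_stack([[1.0, 0.0], [1.0, 1.0]])    # old basis e1, e2 as columns
Ep = np.column_stack([[2.0, 0.0], [0.0, 1.0]])   # new basis e1', e2'

x = np.array([3.0, 1.0])         # coordinates of v in the old basis
v = E @ x                        # the vector itself

T = np.linalg.inv(Ep) @ E        # T = [e1', ..., en']^{-1} [e1, ..., en]
x_new = T @ x
print(np.allclose(Ep @ x_new, v))   # True: same vector, new coordinates
```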

▶ Nice, now we know how to transform coordinates between two bases. How about transforming matrices?
Theorem (Transformation of Bases)
Let A, B ∈ Rn×n be two representations of the same linear
transformation according to different bases. Then there exists a
nonsingular matrix T ∈ Rn×n such that B = T AT −1

▶ Two matrices are called similar if there exists a T such that B = T AT −1. Similar matrices represent the same transformation.

Eigenvalues and Eigenvectors

▶ Later we will see that proving convergence of many optimization algorithms relies on computing powers A^k of a square matrix A.
∎ Can we find a basis of Rn such that A^k is easy to compute?
∎ This leads us to the study of eigenvalues and eigenvectors:

Definition (Eigenvalue and Eigenvector)
Let A ∈ Rn×n. If a nonzero vector v ∈ Rn satisfies the equation Av = λv, it is called an eigenvector of A, and λ is called the eigenvalue that corresponds to v.
▶ Has many different physical and geometrical interpretations
∎ We will see that eigenvectors of a linear transformation will form a basis that results in a very special matrix representation
▶ How do we find v? How many are there? It is difficult to answer directly... That is where the eigenvalues come in
∎ Rearrange Av = λv to get (A − λI)v = 0
∎ What are the solutions?

Eigenvalues

▶ (A − λI)v = 0 has a non-trivial solution only if (A − λI) is singular.
▶ Hence the equation for v turns into an equation for λ: det(A − λI) = 0
▶ This results in an nth order polynomial equation

det(A − λI) = λ^n + a_{n−1} λ^{n−1} + ⋅ ⋅ ⋅ + a1 λ + a0 = 0

▶ Hence there are at most n distinct eigenvalues and corresponding eigenvectors!
∎ Once a λ is found, the corresponding v is easy to find.
▶ Although the number of eigenvalues is finite, this is not the case for eigenvectors...
∎ If v is an eigenvector, so is cv for any c ≠ 0.
∎ So there are actually an infinite number of eigenvectors
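A quick numpy illustration (the matrix is made up):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, eigvecs = np.linalg.eig(A)   # columns of eigvecs are eigenvectors
print(eigvals)                        # 3 and 1 (order may vary)

v, lam = eigvecs[:, 0], eigvals[0]
print(np.allclose(A @ v, lam * v))              # True: Av = λv
print(np.allclose(A @ (5 * v), lam * (5 * v)))  # scalar multiples work too
```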


Eigenvectors as a Basis

▶ Is there a case where the eigenvectors are linearly independent, so that we can form a basis out of them?
Theorem (Linearly Independent Eigenvectors)
Suppose that A has distinct eigenvalues (i ≠ j ⟹ λi ≠ λj). Then the corresponding eigenvectors vi are linearly independent.

▶ Now let's collect the eigenvectors into a basis, take T = [v1, . . . , vn]−1, apply B = T AT −1, and see what happens...

B =
⎡ λ1          0 ⎤
⎢    λ2         ⎥
⎢       ⋱       ⎥
⎣ 0          λn ⎦
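A sketch of this diagonalization in numpy, which also shows why it makes A^k cheap to compute (the payoff promised earlier):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam, V = np.linalg.eig(A)          # columns of V are eigenvectors

B = np.linalg.inv(V) @ A @ V       # representation in the eigenvector basis
print(np.allclose(B, np.diag(lam)))    # True: B is diagonal

# Powers become trivial: A^k = V diag(lam^k) V^{-1}
k = 5
Ak = V @ np.diag(lam ** k) @ np.linalg.inv(V)
print(np.allclose(Ak, np.linalg.matrix_power(A, k)))  # True
```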

▶ What if we also wanted the basis to be orthogonal?



Symmetric Matrices

▶ A real matrix A is symmetric if AT = A. This leads to lots of cool properties
Theorem (Eigenvalues of Symmetric Matrices)
All eigenvalues of real symmetric matrices are real.

Theorem (Eigenvectors of Symmetric Matrices)
An n × n symmetric matrix has a set of n mutually orthogonal eigenvectors.

▶ A matrix with orthogonal columns is called an orthogonal matrix
∎ When we normalize each column to unit length, we get an orthonormal matrix
∎ Orthonormal matrices have this super cool property: T −1 = T T
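A numerical sketch using numpy's symmetric eigensolver (matrix arbitrary but symmetric):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

lam, T = np.linalg.eigh(A)     # real eigenvalues, orthonormal eigenvectors
print(lam)                     # real, returned in ascending order

print(np.allclose(T.T @ T, np.eye(2)))         # T^{-1} = T^T (orthonormal)
print(np.allclose(T @ np.diag(lam) @ T.T, A))  # A = T diag(λ) T^T
```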


Orthogonal Subspaces

Definition (Orthogonal Complement)
Let V be a subspace of Rn. The orthogonal complement V ⊥ consists of all vectors that are orthogonal to V . That is:

V ⊥ = {x ∈ Rn ∶ v T x = 0 for all v ∈ V }

▶ Every vector x can be uniquely decomposed as x = x1 + x2, where x1 ∈ V and x2 ∈ V ⊥. We call x1 and x2 the orthogonal projections of x onto V and V ⊥. We say that a linear transformation P is an orthogonal projector if P x ∈ V and x − P x ∈ V ⊥ for all x.


Range and Nullspace

Definition (Range and Nullspace)


Let A ∈ Rm×n . Range of A is defined as R(A) = {Ax ∶ x ∈ Rn }.
Nullspace of A (also called the kernel) is defined as
N (A) = {x ∈ Rn ∶ Ax = 0}.

▶ R and N are subspaces! Moreover:


Theorem (Orthogonality of Matrix Subspaces)
Let A be a given matrix. Then R(A)⊥ = N (AT )

▶ The theorem above is important on its own; it also allows us to prove:
Theorem (Projection Matrix)
A matrix P is an orthogonal projector if and only if P² = P = P T
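As a sketch, the classical projector onto the column space of a full-column-rank matrix A, P = A(AT A)−1 AT (a standard construction, not stated on the slides), passes both tests:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])                 # full column rank

P = A @ np.linalg.inv(A.T @ A) @ A.T       # projector onto R(A)
print(np.allclose(P @ P, P))               # P^2 = P
print(np.allclose(P, P.T))                 # P = P^T

x = np.array([1.0, 2.0, 3.0])
print(np.allclose(A.T @ (x - P @ x), 0))   # x - Px lies in N(A^T) = R(A)⊥
```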


Quadratic Forms

Definition (Quadratic Forms)


A function f ∶ Rn → R is a quadratic form if it can be represented as

f (x) = xT Qx,

where Q is a real matrix.

▶ Note that we can always assume Q is symmetric: just replace Q with (1/2)(Q + QT), which is symmetric and leaves xT Qx unchanged.
▶ f (x) is called positive definite (p.d.) if xT Qx > 0 for all x ≠ 0
▶ f (x) is called positive semidefinite (p.s.d.) if xT Qx ≥ 0 for all x
▶ We love p.d. functions in optimization!
∎ They are basically the generalization of parabolas to higher dimensions
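A small check that the symmetrized matrix defines the same quadratic form (values arbitrary):

```python
import numpy as np

Q = np.array([[1.0, 4.0],
              [0.0, 2.0]])       # not symmetric
Qs = 0.5 * (Q + Q.T)             # symmetric part

x = np.array([0.7, -1.3])
print(np.isclose(x @ Q @ x, x @ Qs @ x))   # True: same quadratic form
```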


Positive Definite Matrices

▶ Q is called a p.d. matrix when the associated quadratic form is p.d.
∎ How can we check if a given matrix is p.d.?
▶ First we need to define what a leading principal minor is: the determinant of the top-left k × k sub-matrix of Q. Then we can use the following test
Theorem (Sylvester's Test)
The symmetric matrix Q is p.d. iff its leading principal minors are positive.
∎ The test does not work for non-symmetric matrices! For p.s.d., non-negative leading principal minors are only a necessary condition; for sufficiency we need to check all the principal minors.
▶ An alternative test is to check the eigenvalues
Theorem (Eigenvalue Test)
The symmetric matrix Q is p.d. iff all eigenvalues are positive.
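A sketch of both tests in numpy (the matrix is chosen to be p.d.):

```python
import numpy as np

Q = np.array([[2.0, -1.0],
              [-1.0, 2.0]])      # symmetric

# Sylvester's test: leading principal minors det(Q[:k, :k]), k = 1, ..., n
minors = [np.linalg.det(Q[:k, :k]) for k in range(1, Q.shape[0] + 1)]
print(minors, all(m > 0 for m in minors))   # [2.0, 3.0] True

# Eigenvalue test
print(np.all(np.linalg.eigvalsh(Q) > 0))    # True
```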

Matrix Norms

▶ Matrix norms satisfy these conditions
∎ ∥A∥ ≥ 0, and ∥A∥ = 0 ⟺ A = 0.
∎ ∥cA∥ = ∣c∣∥A∥.
∎ ∥A + B∥ ≤ ∥A∥ + ∥B∥.
▶ We will also enforce them to obey ∥AB∥ ≤ ∥A∥∥B∥ (submultiplicativity).


▶ Example: flatten A ∈ Rn×m into a vector in Rnm and apply the Euclidean norm. The result is the Frobenius norm ∥.∥F:

∥A∥F = ( ∑_{i=1}^{n} ∑_{j=1}^{m} a_{ij}² )^{1/2}

▶ If we have a norm ∥.∥n on Rn and a norm ∥.∥m on Rm, these two norms induce the following norm on all linear transformations from Rn to Rm:

∥A∥ = max_{∥x∥n = 1} ∥Ax∥m


Induced Norms

▶ Since the norm ∥.∥m is a continuous function and the set {x ∶ ∥x∥n = 1} is compact, the maximum is always attained. We can also check all 4 norm properties to confirm that the induced norm is indeed a norm.
▶ A nice property of induced norms is that they can usually be linked to spectral properties of the matrix, such as:
Theorem (Matrix Norm Induced by Euclidean Norm)
Let ∥.∥n and ∥.∥m be the Euclidean norms on Rn and Rm. Then the induced matrix norm is ∥A∥ = √λ1, where λ1 is the largest eigenvalue of AT A.

▶ The following important theorem also uses similar arguments:
Theorem (Rayleigh's Inequality)
If P is an n × n real symmetric p.d. matrix, then

λmin(P )∥x∥² ≤ xT P x ≤ λmax(P )∥x∥²
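A numerical sketch of both results (random matrices, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Induced 2-norm equals the square root of the largest eigenvalue of A^T A
A = rng.standard_normal((2, 3))
print(np.isclose(np.linalg.norm(A, 2),
                 np.sqrt(np.linalg.eigvalsh(A.T @ A).max())))   # True

# Rayleigh's inequality for a symmetric p.d. matrix
M = rng.standard_normal((3, 3))
P = M.T @ M + 3 * np.eye(3)          # p.d. by construction
x = rng.standard_normal(3)
lam = np.linalg.eigvalsh(P)          # ascending eigenvalues
print(lam[0] * (x @ x) <= x @ P @ x <= lam[-1] * (x @ x))       # True
```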



Geometry


Line Segments

▶ The line segment between two points x, y ∈ Rn is the set of points on the straight line joining these two points
▶ If z lies on the line segment, there exists an α ∈ [0, 1] such that z = αx + (1 − α)y
▶ Hence the line segment can be defined as:

l(x, y) = {αx + (1 − α)y ∶ α ∈ [0, 1]}


Hyperplanes

Definition (Hyperplane)
Let u ∈ Rn, u ≠ 0, and v ∈ R. A hyperplane is the set of points of the form

{x ∈ Rn ∶ ⟨u, x⟩ = v}

▶ Do not forget that a hyperplane is not necessarily a subspace!
∎ A hyperplane's dimension is always n − 1
∎ Hyperplanes divide the space into two halfspaces
∎ Positive halfspace: ⟨u, x⟩ ≥ v
∎ Negative halfspace: ⟨u, x⟩ ≤ v
∎ An alternative definition of a hyperplane is ⟨u, x − a⟩ = 0, where a is a point on the hyperplane

Linear Varieties

Definition (Linear Variety (Affine Set))
A linear variety is a set of the form:

{x ∈ Rn ∶ Ax = b},

where A ∈ Rm×n and b ∈ Rm.

▶ We say that the linear variety has dimension r if dim N (A) = r.
▶ A linear variety is a subspace if and only if b = 0
▶ It is easy to see that a linear variety contains the whole line through any two of its points
∎ What about the converse? If a set contains the line through any two of its points, is it an affine set?


Convex Sets

Definition (Convex Set)
A set Θ is convex if for all x, y ∈ Θ we have l(x, y) ⊂ Θ.

▶ A point on l(x, y) is also called a convex combination of x and y. Hence a set is convex if it contains all convex combinations of its points.


Properties of Convex Sets

Theorem (Properties of Convex Sets)


Convex subsets of Rn have the following properties
1 If Θ is convex and β is a real number, then the set

βΘ = {x ∶ x = βv, v ∈ Θ}

is also convex.
2 If Θ1 and Θ2 are convex, then the set

Θ1 + Θ2 = {x ∶ x = v1 + v2 , v1 ∈ Θ1 , v2 ∈ Θ2 }

is also convex.
3 If Θi is a collection of convex sets, then ∩i Θi is also convex.

▶ A point x ∈ Θ is called an extreme point if it is not a convex


combination of any two other points in the set.

Neighborhoods

Definition (Neighborhood)
Let ε > 0. An ε-neighborhood of a point x ∈ Rn is the set

N_ε(x) = {y ∈ Rn ∶ ∥y − x∥ < ε}

▶ A point x ∈ S is called an interior point if there exists an ε > 0 such that N_ε(x) ⊂ S
▶ A point x is called a limit point of S if for all ε > 0 we have {N_ε(x) ∩ S} ∖ {x} ≠ ∅
▶ A set is open if all of its points are interior points.
▶ A set is closed if it contains all of its limit points.
▶ A set S is bounded if there exists an M ∈ R such that S ⊂ N_M(x) for some x
▶ A set is compact if it is closed and bounded

Polytopes and Polyhedra

▶ A polytope is a set that can be expressed as the intersection of a finite number of halfspaces.
▶ If a polytope is nonempty and bounded, we call it a polyhedron
▶ Polytopes will be of major interest when we start studying linear programming problems.


Calculus


Limit of a sequence

Definition (Convergent Sequence)
We say that the sequence xk ∈ R is convergent if there exists an x ∈ R such that for all ε > 0 there exists a K ∈ N such that

k ≥ K ⟹ ∣xk − x∣ < ε

▶ If xk converges to x we usually write xk → x or lim_{k→∞} xk = x.
▶ If xk ∈ Rn, then we can use the norm ∥.∥ instead of ∣.∣.
Theorem (Unique Limit)
A convergent sequence has a unique limit point.

▶ A sequence is bounded if there exists an M ∈ R such that ∣xk∣ ≤ M for all k.


Theorem (Convergence implies boundedness)
Every convergent sequence is bounded.

Upper and Lower Bounds

▶ A number B ∈ R is called an upper bound for xk if xk ≤ B for all k.
▶ The smallest upper bound of xk is called the supremum:
Definition (Least Upper Bound)
Let xk be a sequence that is bounded above. The number B is called the least upper bound or the supremum of xk if
1 B is an upper bound of xk
2 For all ε > 0 there exists a term xK such that xK ≥ B − ε

▶ A sequence is increasing if xk < xk+1 , non-decreasing if xk ≤ xk+1


▶ A sequence is decreasing if xk > xk+1 , non-increasing if xk ≥ xk+1
▶ A sequence is monotone if it is either increasing or decreasing.
Theorem (Convergence of Monotone Sequences)
Every bounded monotone sequence in R converges.

Subsequences

▶ Let xk be a sequence in Rn and let mk be an increasing sequence of natural numbers. We call the sequence

xm1, xm2, . . .

a subsequence of xk.
Theorem (Subsequences of Convergent Sequences)
Subsequences of convergent sequences converge to the same limit.

Theorem (Bolzano-Weierstrass)
Every bounded sequence contains a convergent subsequence.

▶ Sequences allow us to look at continuous functions in a different light: f ∶ Rn → Rm is continuous at x iff

xk → x ⟹ f (xk) → f (x) for every sequence xk in Rn

Matrix Limits

▶ We can easily extend the notion of limits and convergence to matrices. We say that Ak → A if

lim_{k→∞} ∥Ak − A∥ = 0

Theorem (Convergence to Zero Matrix)
Let A ∈ Rn×n. Then lim_{k→∞} A^k = 0 if and only if all eigenvalues of A satisfy ∣λi∣ < 1.

Theorem (Geometric Matrix Series)
The series ∑_{k=0}^{∞} A^k converges iff lim_{k→∞} A^k = 0. In that case the series converges to (In − A)−1
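A numerical sketch of the geometric series (the matrix is chosen so its spectral radius is below 1):

```python
import numpy as np

A = np.array([[0.5, 0.2],
              [0.1, 0.3]])
assert np.all(np.abs(np.linalg.eigvals(A)) < 1)   # eigenvalue condition

# Partial sums of I + A + A^2 + ... approach (I - A)^{-1}
S, P = np.zeros((2, 2)), np.eye(2)
for _ in range(200):
    S, P = S + P, P @ A

print(np.allclose(S, np.linalg.inv(np.eye(2) - A)))   # True
```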


Matrix Valued Functions

▶ We can define a matrix-valued function such as A ∶ Rr → Rm×n
∎ Hence A(ζ) returns an m × n matrix for a given ζ ∈ Rr
▶ We say that a matrix-valued function is continuous at ζ0 if

lim_{∥ζ−ζ0∥→0} ∥A(ζ) − A(ζ0)∥ = 0

Theorem (Invertibility of continuous matrix functions)
Let A ∶ Rr → Rn×n be continuous at ζ0. If A(ζ0)−1 exists, then A(ζ) is invertible in a sufficiently small neighbourhood of ζ0 and A(.)−1 is continuous at ζ0.

∎ Can be proven using the inverse function theorem


Differentiability

▶ The main objective of differential calculus is to approximate an arbitrary function f with an affine function A(x):

A(x) = L(x) + y

▶ A global approximation is usually not possible, hence we consider a local approximation around the point x0 ∈ Rn, which leads to:

lim_{x→x0} ∥f (x) − L(x − x0) − f (x0)∥ / ∥x − x0∥ = 0

▶ If a linear transformation L exists such that the above limit is zero, we say that f is differentiable at x0. The linear transformation L is called the derivative of f at x0.


Gradient and Jacobian

▶ A matrix is associated with every linear transformation; hence L(x) = Dx, where D ∈ Rm×n
▶ When f ∶ Rn → R, this matrix is the transpose of the gradient ∇f (x0):

Df (x0) = [∂f/∂x1  ∂f/∂x2  . . .  ∂f/∂xn] evaluated at x = x0, i.e. Df (x0) = ∇f (x0)T
▶ When f ∶ Rn → Rm, the matrix is called the Jacobian:

Df (x0) =
⎡ ∂f1/∂x1  ∂f1/∂x2  . . .  ∂f1/∂xn ⎤
⎢    ⋮        ⋮     . . .     ⋮    ⎥
⎣ ∂fm/∂x1  ∂fm/∂x2  . . .  ∂fm/∂xn ⎦ evaluated at x = x0
▶ Hence we approximate f at x0 with the affine function:

A(x) = f (x0) + Df (x0)(x − x0)

∎ This is an approximation in the sense that f (x) = A(x) + r(x) with lim_{x→x0} ∥r(x)∥/∥x − x0∥ = 0
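A sketch comparing an analytic Jacobian with finite differences for a toy f ∶ R² → R² (the function is invented for illustration):

```python
import numpy as np

def f(x):
    return np.array([x[0] ** 2 * x[1], np.sin(x[1])])

def Df(x):
    # Analytic Jacobian of the toy f above
    return np.array([[2 * x[0] * x[1], x[0] ** 2],
                     [0.0, np.cos(x[1])]])

x0, h = np.array([1.0, 2.0]), 1e-6
num = np.column_stack([(f(x0 + h * e) - f(x0)) / h for e in np.eye(2)])
print(np.allclose(num, Df(x0), atol=1e-4))   # True: Df matches finite differences
```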

The Hessian

▶ Given f ∶ Rn → R, if f is twice differentiable, we can define the Hessian matrix as:

D²f (x0) =
⎡ ∂²f/∂x1²     ∂²f/∂x1∂x2   . . .  ∂²f/∂x1∂xn ⎤
⎢     ⋮             ⋮        . . .      ⋮      ⎥
⎣ ∂²f/∂xn∂x1   ∂²f/∂xn∂x2   . . .  ∂²f/∂xn²   ⎦ evaluated at x = x0
▶ Schwarz's Theorem: If f is twice continuously differentiable, then

∂²f/∂xi∂xj = ∂²f/∂xj∂xi

∎ which makes the Hessian matrix symmetric!
∎ If the second partial derivatives are not continuous, then the Hessian may not be symmetric (see the example in the book).
▶ Since the Hessian is symmetric, we can use Sylvester's test or the eigenvalue test to see whether it is positive definite. This will be a huge part of checking optimality conditions later in the course.
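A sketch for a concrete function (f(x, y) = x² + xy + 2y², chosen for illustration), combining Schwarz's theorem with the eigenvalue test from earlier:

```python
import numpy as np

# Hessian of f(x, y) = x^2 + x*y + 2*y^2 (constant, computed by hand)
H = np.array([[2.0, 1.0],
              [1.0, 4.0]])

print(np.allclose(H, H.T))                 # symmetric, as Schwarz's theorem predicts
print(np.all(np.linalg.eigvalsh(H) > 0))   # True: positive definite
```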

Differentiation Rules

Theorem (Chain Rule)
Let g ∶ A → R be differentiable on A ⊂ Rn and let f ∶ (a, b) → A be differentiable on (a, b). Define the composite function h ∶ (a, b) → R by h(t) = g(f (t)). Then h is differentiable and the derivative is

h′(t) = Dg(f (t))Df (t) = ∇g(f (t))⊺ [f1′(t), . . . , fn′(t)]⊺

Theorem (Product Rule)
Let f, g ∶ Rn → Rm be two differentiable functions. Define h ∶ Rn → R as h(x) = f (x)T g(x). Then h is also differentiable and the derivative is

Dh(x) = f (x)T Dg(x) + g(x)T Df (x)

Some Other Useful Lemmas

▶ D(yT Ax) = yT A (derivative taken with respect to x).
▶ D(xT Ax) = xT (A + AT) for square A.
▶ D(yT x) = yT.
▶ D(xT Qx) = 2xT Q if Q is symmetric.
▶ D(xT x) = 2xT
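A finite-difference spot check of the second identity (random data, illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
x = rng.standard_normal(3)

analytic = x @ (A + A.T)          # claimed derivative of x^T A x
h = 1e-6
numeric = np.array([((x + h * e) @ A @ (x + h * e) - x @ A @ x) / h
                    for e in np.eye(3)])
print(np.allclose(analytic, numeric, atol=1e-4))   # True
```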


Level Set

▶ The level set of a function f ∶ Rn → R at level c ∈ R is the set of points

S = {x ∶ f (x) = c}

▶ Level sets can be represented as parametric curves g ∶ [0, 1] → Rn.
▶ What is the relationship between the gradient of f and the level set curve?
∎ Set h(t) = f (g(t)); then h′(t) = 0 since f is constant on the level set, so ∇f (g(t))T g′(t) = 0
∎ They are orthogonal! This will be the main principle in the design of many optimization algorithms.
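A quick check on the unit circle, a level set of f(x, y) = x² + y² (example chosen for illustration):

```python
import numpy as np

t = 0.7
g = np.array([np.cos(t), np.sin(t)])          # point on the level set f = 1
g_prime = np.array([-np.sin(t), np.cos(t)])   # tangent of the level-set curve
grad_f = 2 * g                                # gradient of f at g(t)

print(np.isclose(grad_f @ g_prime, 0.0))      # True: gradient ⟂ level set
```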

Taylor’s Theorem

▶ Basis of many optimization methods and convergence proofs.


Theorem (Taylor's Theorem for 1D)
Let f ∶ [a, b] → R be of class C^m. Denote h = b − a. Then

f (b) = f (a) + (h/1!) f (1)(a) + (h²/2!) f (2)(a) + ⋅ ⋅ ⋅ + (h^{m−1}/(m − 1)!) f (m−1)(a) + Rm,

where f (i) is the ith order derivative of f and

Rm = (h^m/m!) f (m)(a + θh), θ ∈ (0, 1)
▶ The proof uses the Generalized Mean Value Theorem: if f, g are differentiable on [a, b], there exists a point c ∈ (a, b) such that

f ′(c)/g′(c) = (f (b) − f (a))/(g(b) − g(a))
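A numerical sanity check of the expansion with f = exp (every derivative is exp) and m = 3, so the remainder should scale like h³:

```python
import numpy as np

a, h = 0.5, 0.1
approx = np.exp(a) * (1 + h + h ** 2 / 2)   # terms up to f''(a) h^2 / 2!
remainder = abs(np.exp(a + h) - approx)

print(remainder < np.exp(a + h) * h ** 3)   # True: |R_3| <= e^(a+h) h^3 / 3!
```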

Order Symbols

▶ For further examination of the remainder term Rm we introduce the order symbols
Definition (Order Symbols)
Let Ω ⊂ Rn, 0 ∈ Ω, g ∶ Ω → R with g(x) ≠ 0 if x ≠ 0, and f ∶ Ω → Rm.
∎ We say that f (x) = O(g(x)) if the quotient ∥f (x)∥/∣g(x)∣ is bounded near 0. That is, there exist numbers δ, K > 0 such that

∥x∥ < δ ⟹ ∥f (x)∥/∣g(x)∣ ≤ K

∎ We say that f (x) = o(g(x)) if ∥f (x)∥ goes to zero faster than ∣g(x)∣:

lim_{x→0} ∥f (x)∥/∣g(x)∣ = 0


Order Symbol Examples

▶ Examples for the Big-Oh symbol:
∎ x = O(x)
∎ [x³, 2x² + 3x⁴]T = O(x²)
∎ cos(x) = O(1)
∎ sin(x) = O(x)
▶ Examples for the Little-Oh symbol:
∎ x² = o(x)
∎ [x³, 2x² + 3x⁴]T = o(x)
∎ x³ = o(x²)
∎ x = o(1)

▶ It is evident that f (x) = o(g(x)) ⟹ f (x) = O(g(x)), but the converse is not necessarily true.
▶ Inspection of the remainder term Rm in Taylor's theorem yields

f (b) = f (a) + (h/1!) f (1)(a) + ⋅ ⋅ ⋅ + (h^m/m!) f (m)(a) + o(h^m), if f ∈ C^m
f (b) = f (a) + (h/1!) f (1)(a) + ⋅ ⋅ ⋅ + (h^m/m!) f (m)(a) + O(h^{m+1}), if f ∈ C^{m+1}
1! m!


Taylor’s Theorem for Higher Dimensions

▶ Let f ∶ Rn → R be C². Can we expand f in a Taylor series around a point x0?
▶ Let z(α) = x0 + α (x − x0)/∥x − x0∥ and examine the function φ(α) = f (z(α))
∎ φ(∥x − x0∥) = f (x)
∎ Since φ is a single-variable function, we can apply Taylor's Theorem!

▶ Doing so yields (if f ∈ C²):

f (x) = f (x0) + Df (x0)(x − x0) + (1/2)(x − x0)⊺ D²f (x0)(x − x0) + o(∥x − x0∥²)

▶ If f ∈ C³:

f (x) = f (x0) + Df (x0)(x − x0) + (1/2)(x − x0)⊺ D²f (x0)(x − x0) + O(∥x − x0∥³)
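A numerical sketch of the second-order expansion for a toy f (gradient and Hessian computed by hand for this example):

```python
import numpy as np

def f(x):
    return np.exp(x[0]) * np.cos(x[1])

x0 = np.array([0.0, 0.0])
grad = np.array([1.0, 0.0])               # Df(x0)
hess = np.array([[1.0, 0.0],
                 [0.0, -1.0]])            # D^2 f(x0)

d = np.array([0.01, -0.02])               # x - x0
taylor2 = f(x0) + grad @ d + 0.5 * d @ hess @ d
print(abs(f(x0 + d) - taylor2) < np.linalg.norm(d) ** 3)   # remainder is o(||d||^2)
```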


Summary

▶ In this lecture we studied:
∎ Basic proof techniques we are going to use throughout the class
∎ Basics of linear algebra (span, basis, dimension, matrices, determinant)
∎ Linear transformations: how to transform between bases, eigenvalues, quadratic forms, matrix norms
∎ Basic geometry: lines, planes, linear varieties, convex sets, polyhedra
∎ Basic calculus: convergence, differentiability, Taylor's Theorem

▶ Next:
∎ Basics of optimization theory.

