
UUM 526 Optimization Techniques in Engineering

Lecture 2: Mathematical Preliminaries

Asst. Prof. N. Kemal Ure

Istanbul Technical University


ure@itu.edu.tr

February 5, 2019



Overview

1 Proof Methods

2 Linear Algebra

3 Geometry

4 Calculus



Introduction

▶ We need to review/build some mathematical tools before we can start attacking optimality tests and convergence of algorithms
▶ We are going to review:
∎ Proof Methods
∎ Linear Algebra
∎ Geometry
∎ Single and Multivariable Calculus

▶ If you took UUM 535, everything here should look familiar.


∎ We will skip most proofs here, since we already did them at UUM 535.
∎ All the proofs are available in Chong’s anyway.

▶ The main goal is to understand quadratic forms and Taylor's Theorem in Rn, which will be our main tools in the analysis of optimization problems and algorithms.

Proof Methods


Proofs

▶ A theorem is usually a statement of the form A ⟹ B
∎ A is the assumption of the theorem (what is given to us), B is the conclusion of the theorem. If A is true, then B holds

▶ Example (Triangle Inequality): A ∶ a, b ∈ R, B ∶ ∣a + b∣ ≤ ∣a∣ + ∣b∣

▶ Some theorems work both ways: A ⟺ B
∎ A is true if and only if B is true
∎ In this case we have to prove both A ⟹ B and B ⟹ A. Example:

Theorem
Let a, b be real numbers. Then a = b if and only if for all ε > 0, ∣a − b∣ < ε


Proof By Contradiction (PBC) and Proof By Induction

▶ We just used PBC in the last proof! It is a very popular technique
▶ Based on the fact that A ⟹ B is equivalent to not(A and (not B))
Theorem
There are infinitely many prime numbers

▶ Proof by induction is also a popular technique. Suppose that the property we want to prove is indexed by N, so we need to prove A(1), A(2), . . .
∎ First prove that it is true for A(1)
∎ Next prove that A(n) ⟹ A(n + 1)

Theorem
Let n ∈ N. Then ∑_{i=1}^{n} i = n(n + 1)/2
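A quick numeric sanity check of this closed form (not a substitute for the induction proof; plain Python, nothing assumed beyond the formula above):

```python
# Check 1 + 2 + ... + n against n(n+1)/2 for a range of n.
for n in range(1, 100):
    assert sum(range(1, n + 1)) == n * (n + 1) // 2
print("closed form verified for n = 1, ..., 99")
```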

▶ Warning: induction proves A(n) for each finite n ∈ N; it does not establish the limiting statement as n → ∞



Linear Algebra


Linear Combination

▶ A vector space is a set V equipped with vector addition (+) and scalar multiplication (⋅) such that:

α ⋅ v1 + β ⋅ v2 ∈ V, ∀v1, v2 ∈ V, α, β ∈ F (1)

▶ We will be mainly dealing with V = Rn and F = R.


▶ If S ⊆ V satisfies Eq. 1, then S is called a subspace of V .
Definition (Linear Combination)
Let ak, k = 1, . . . , n be a finite number of vectors. The vector b is said to be a linear combination of the vectors ak if there exist scalars αk such that:

b = α1 a1 + α2 a2 + ⋅ ⋅ ⋅ + αn an = ∑_{k=1}^{n} αk ak
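To make this concrete, here is a minimal numpy sketch (the specific vectors are illustrative, not from the slides): given a1, a2 and a target b, a least-squares solve recovers the coefficients αk whenever b lies in their span.

```python
import numpy as np

A = np.column_stack([[1.0, 0.0, 1.0], [0.0, 2.0, 1.0]])  # a1, a2 as columns
b = np.array([2.0, 2.0, 3.0])                            # equals 2*a1 + 1*a2

alpha, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
print(alpha)                      # approximately [2., 1.]
print(np.allclose(A @ alpha, b))  # True: b is a linear combination of a1, a2
```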


Linear Dependency

Definition (Linearly Dependent List)
The list of vectors A = (ak ∶ k = 1, . . . , n) is said to be linearly dependent if one of the vectors is a linear combination of the others.

▶ If list A is not linearly dependent, it is called linearly independent.


Theorem (Test for linear independence)
The list A = (ak ∶ k = 1, . . . , n) is linearly independent iff the equality

∑_{k=1}^{n} αk ak = 0, (2)

implies that αk = 0, k = 1, . . . , n.
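Numerically, stacking the vectors as columns and checking the matrix rank is a convenient stand-in for this test; a sketch with made-up vectors:

```python
import numpy as np

# Third column is the sum of the first two, so the list is dependent.
A = np.column_stack([[1.0, 0.0, 0.0],
                     [1.0, 1.0, 0.0],
                     [2.0, 1.0, 0.0]])

# Columns are linearly independent iff rank equals the number of columns.
print(np.linalg.matrix_rank(A))   # -> 2 (not 3): linearly dependent
```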


Span, Basis and Dimension

Definition (Span)
The set of all linear combinations of a list of vectors {ak} is called the span of {ak}:

span[a1, . . . , an] = { ∑_{k=1}^{n} αk ak ∶ αk ∈ R }

▶ Span is always a subspace!


Definition (Basis)
If the list (ak ) is linearly independent and span[a1 , . . . , an ] = V , then
(ak ) is a basis for V .

▶ By fixing a basis, we can represent vectors in V in terms of that basis.


Span, Basis and Dimension

Theorem (Unique Coordinates for a Fixed Basis)
Let {ak} be a basis for V . Then any vector v ∈ V can be represented uniquely as

v = ∑_{k=1}^{n} αk ak.

▶ There are usually infinitely many bases for a subspace. For instance, if (ak) is a basis, so is (c ak) for any nonzero c ∈ R.
▶ What about the number of vectors in a basis?
Theorem (Unique Number of Vectors in a Basis)
Let ak , k = 1, . . . , n and bi , i = 1, . . . , m be two different bases for V .
Then n = m.
▶ Hence every space (or subspace) V has the same number of vectors in every one of its bases. That number is called the dimension of V .
▶ We call αk the coordinates of the vector v in the basis {ak}.

Vector and Matrix Notation

▶ For a ∈ Rn we will write vectors in column notation:

a = [a1 a2 . . . an]T, ai ∈ R
▶ A matrix A ∈ Rm×n is a rectangular array of real numbers aij ∈ R, i = 1, . . . , m, j = 1, . . . , n:

A =
⎡ a11 a12 . . . a1n ⎤
⎢ a21 a22 . . . a2n ⎥
⎢  ⋮   ⋮   ⋱    ⋮  ⎥
⎣ am1 am2 . . . amn ⎦
▶ It is much more useful to think of A as a collection of n vectors lying in Rm: A = [a1, a2, . . . , an], ak ∈ Rm
∎ Also, matrix-vector multiplication Av makes more sense this way
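In this column view, Av is simply a linear combination of the columns of A weighted by the entries of v; a small sketch (values arbitrary):

```python
import numpy as np

A = np.array([[1.0, 4.0],
              [2.0, 5.0],
              [3.0, 6.0]])        # columns a1, a2 in R^3
v = np.array([2.0, -1.0])

# A @ v equals v1*a1 + v2*a2: a combination of the columns.
print(np.allclose(A @ v, v[0] * A[:, 0] + v[1] * A[:, 1]))  # True
```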

Matrix Rank

Definition (Rank of a Matrix)
The rank r is the maximal number of linearly independent columns of A ∈ Rm×n.

▶ Notice that r ≤ min(m, n). When r = n, we say that the matrix is full rank.


Theorem (Invariance of Rank)
The rank of a matrix A ∈ Rm×n is invariant under the following operations:
1 Multiplication of columns of A by nonzero scalars.
2 Interchange of columns.
3 Addition to a given column of a linear combination of the other columns.

▶ Nice, but is there a formula for testing if the matrix has full rank? Is there a scalar quantity that measures the independence of the columns?
∎ For square matrices (m = n), the answer is yes! It is called the determinant of the matrix.

Determinant

▶ The determinant of a matrix (denoted by ∣A∣) is a confusing concept at first and has many different interpretations.
▶ The properties of the determinant are more important than its explicit formula
Definition (Determinant)
The determinant is a function det ∶ Rn×n → R that possesses the following properties:
1 The determinant is linear in the columns of the matrix:

∣A∣ = det A = det[a1, . . . , αak + βbk, . . . , an]
= α det[a1, . . . , ak, . . . , an] + β det[a1, . . . , bk, . . . , an].

2 If for some k, ak = ak+1, then ∣A∣ = 0.
3 The determinant of the identity matrix is 1, that is, ∣In∣ = 1.

Consequences of Determinant Definition

▶ If there is a zero column in the matrix, the determinant is zero:

det[a1, . . . , 0, . . . , an] = 0.

▶ If we add to a column a linear combination of the other columns, the determinant does not change:

det[a1, . . . , ak + ∑_{j≠k} αj aj, . . . , an] = det[a1, . . . , ak, . . . , an]

▶ The determinant changes sign if we interchange two columns:

det[a1, . . . , ak−1, ak, . . . , an] = −det[a1, . . . , ak, ak−1, . . . , an].

▶ So, if the columns of A are not linearly independent, then ∣A∣ = 0.


∎ Hence for a square matrix: full rank ⇐⇒ nonzero determinant.
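A quick numerical illustration of this equivalence (matrices made up for the example):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])             # second column = 2 * first column
print(np.linalg.det(A))                # ~0.0: singular
print(np.linalg.matrix_rank(A))        # 1: not full rank

B = np.array([[1.0, 2.0],
              [0.0, 3.0]])
print(np.linalg.det(B))                # 3.0: nonzero, so B is full rank
```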

Determinant and Rank

▶ Only square matrices have determinants. What if I want to test the rank of a rectangular matrix?
∎ Rectangular matrices have square sub-matrices! Wonder if their determinant is useful... First we need to define it:
Definition (Minor)
The pth order minor of a matrix A ∈ Rm×n is the determinant of a square sub-matrix formed by deleting m − p rows and n − p columns.

▶ Then we have this cool theorem:
Theorem (Minors and Rank)
If A ∈ Rm×n (m ≥ n) has a nonzero nth order minor, then rank A = n.

▶ It is straightforward to show that the rank of a matrix is the maximal order of its nonzero minors.

Nonsingular matrices and Inverses

▶ A square matrix A with det A ≠ 0 is called nonsingular.


▶ A matrix A ∈ Rn×n is nonsingular if and only if there exists a matrix
B ∈ Rn×n such that:
AB = BA = In

▶ The matrix B is called the inverse of A and is denoted A−1.
▶ It shows up in the solution of linear equations Ax = b: the unique solution exists if A is nonsingular (x = A−1 b)
▶ What about non-square linear systems?
Theorem (Existence of Solution in a Linear System)
The set of equations represented by Ax = b has a solution if and only if

rankA = rank[A, b].

▶ If rankA = m < n, then we have infinitely many solutions.
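A sketch of the rank test in numpy (the system is invented for illustration):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [2.0, 2.0]])

def has_solution(A, b):
    # Ax = b is solvable iff rank A == rank [A, b]
    return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(np.column_stack([A, b]))

print(has_solution(A, np.array([1.0, 2.0])))  # True: b is in the range of A
print(has_solution(A, np.array([1.0, 3.0])))  # False: inconsistent system
```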



Euclidean Inner Product

▶ We need to turn our vector space into a metric space by adding a "length" function.
∎ Such a function already exists for V = R: the absolute value function ∣.∣
∎ Some very useful properties: −∣a∣ ≤ a ≤ ∣a∣, ∣ab∣ = ∣a∣∣b∣
∎ The most useful property: ∣a + b∣ ≤ ∣a∣ + ∣b∣

▶ For Rn, before defining the length function, it is helpful to define the inner product first
Definition (Euclidean Inner Product)
The Euclidean inner product of two vectors x, y ∈ Rn is defined as

⟨x, y⟩ = ∑_{i=1}^{n} xi yi = xT y


Inner Product Properties

▶ Inner product has the following properties


∎ Positivity: ⟨x, x⟩ ≥ 0, x = 0 ⇐⇒ ⟨x, x⟩ = 0.
∎ Symmetry: ⟨x, y⟩ = ⟨y, x⟩.
∎ Additivity: ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩.

∎ Homogeneity: ⟨rx, y⟩ = r⟨x, y⟩, r ∈ R.

▶ By symmetry, additivity and homogeneity also hold in the second argument.


▶ Two vectors are orthogonal if ⟨x, y⟩ = 0.
▶ Now we can define the length; it is called the Euclidean norm:

∥x∥ = √⟨x, x⟩ = √(xT x)

Theorem (Cauchy–Schwarz Inequality)
For any x, y ∈ Rn,

∣⟨x, y⟩∣ ≤ ∥x∥∥y∥.

Equality holds iff x and y are linearly dependent (e.g., x = αy for some α ∈ R).
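A minimal numerical check of the inequality and its equality case (random vectors, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(5), rng.standard_normal(5)

print(abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y))  # True

# Equality when the vectors are linearly dependent (y = 3x here)
print(np.isclose(abs(x @ (3 * x)),
                 np.linalg.norm(x) * np.linalg.norm(3 * x)))  # True
```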

Norm Properties

▶ The norm possesses many properties of the absolute value function
∎ Positivity: ∥x∥ ≥ 0, ∥x∥ = 0 ⟺ x = 0
∎ Homogeneity: ∥rx∥ = ∣r∣∥x∥, r ∈ R
∎ Triangle Inequality: ∥x + y∥ ≤ ∥x∥ + ∥y∥

▶ There are many other vector norms. Actually, any function that satisfies the properties above is a norm.
∎ p-norm: ∥x∥p = (∣x1∣^p + ∣x2∣^p + ⋅ ⋅ ⋅ + ∣xn∣^p)^{1/p}
∎ p = 2 gives the Euclidean norm
∎ What do p = 1 or p = 0 correspond to? What happens when p → ∞?

▶ Continuity of f ∶ Rn → Rm can be formulated in terms of norms
∎ f is continuous at x0 ∈ Rn if and only if for all ε > 0, there exists a δ > 0 such that ∥x − x0∥ < δ ⟹ ∥f (x) − f (x0)∥ < ε

▶ If x, y ∈ Cn (complex vectors), the inner product is defined as ∑_{i=1}^{n} xi ȳi; hence ⟨x, y⟩ is the complex conjugate of ⟨y, x⟩, and ⟨x, ry⟩ = r̄⟨x, y⟩.

Linear Transformations

Definition (Linear Transformation)


A function L ∶ Rn → Rm is called a linear transformation if

L(ax) = aL(x), a ∈ R
L(x + y) = L(x) + L(y)

▶ If we fix a basis for Rn and Rm we can represent L with a matrix, such that L(x) = Ax. Hence different bases correspond to different matrix representations.
∎ Let {u1, . . . , un} be a basis for Rn and let {v1, . . . , vm} be a basis for Rm. Then we can express L(uk) = Auk ∈ Rm as:

Auk = ak,1 v1 + ⋅ ⋅ ⋅ + ak,m vm

where ak,1, . . . , ak,m make up the kth column of A.


Similarity Transformations

▶ Let {ei} and {e′i} be two different bases of Rn. Let x be the coordinates of v ∈ Rn with respect to {ei} and let x′ be the coordinates of v with respect to {e′i}.
▶ Then there is a transformation matrix T ∈ Rn×n such that x′ = T x, where

T = [e′1, . . . , e′n]−1 [e1, . . . , en]
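A small numpy sketch of this change of coordinates (the two bases are chosen arbitrarily):

```python
import numpy as np

E = np.column_stack([[1.0, 0.0], [1.0, 1.0]])    # old basis e1, e2 as columns
Ep = np.column_stack([[2.0, 0.0], [0.0, 1.0]])   # new basis e1', e2'

x = np.array([3.0, 1.0])         # coordinates of v in the old basis
v = E @ x                        # the vector itself

T = np.linalg.inv(Ep) @ E        # T = [e1', ..., en']^{-1} [e1, ..., en]
x_new = T @ x
print(np.allclose(Ep @ x_new, v))   # True: same vector, new coordinates
```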

▶ Nice, now we know how to transform coordinates between two bases. How about transforming matrices?
Theorem (Transformation of Bases)
Let A, B ∈ Rn×n be two representations of the same linear
transformation according to different bases. Then there exists a
nonsingular matrix T ∈ Rn×n such that B = T AT −1

▶ Two matrices are called similar if there exists a T such that B = T AT −1. Similar matrices represent the same transformation.

Eigenvalues and Eigenvectors

▶ Later we will see that proving convergence of many optimization algorithms relies on computing powers A^k of a square matrix A.
∎ Can we find a basis of Rn such that A^k is easy to compute?
∎ This leads us to the study of eigenvalues and eigenvectors:

Definition (Eigenvalue and Eigenvector)
Let A ∈ Rn×n. If a nonzero vector v ∈ Rn satisfies the equation Av = λv, it is called an eigenvector of A, and λ is called the eigenvalue that corresponds to v.
▶ Has many different physical and geometrical interpretations
∎ We will see that eigenvectors of a linear transformation will form a basis that results in a very special matrix representation
▶ How do we find v? How many are there? It is difficult to answer directly... That is where the eigenvalues come in
∎ Rearrange Av = λv to get (A − λI)v = 0
∎ What are the solutions?

Eigenvalues

▶ (A − λI)v = 0 has a non-trivial solution only if (A − λI) is singular.
▶ Hence the equation for v turns into an equation for λ: det(A − λI) = 0
▶ This results in an nth order polynomial equation

det(A − λI) = λ^n + a_{n−1} λ^{n−1} + ⋅ ⋅ ⋅ + a1 λ + a0 = 0

▶ Hence there are at most n distinct eigenvalues and corresponding eigenvectors!
∎ Once a λ is found, the corresponding v is easy to find.
▶ Although the number of eigenvalues is finite, this is not the case for eigenvectors...
∎ If v is an eigenvector, so is cv for any c ≠ 0.
∎ So there are actually an infinite number of eigenvectors
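A quick numpy illustration (the matrix is made up):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, eigvecs = np.linalg.eig(A)   # columns of eigvecs are eigenvectors
print(eigvals)                        # 3 and 1 (order may vary)

v, lam = eigvecs[:, 0], eigvals[0]
print(np.allclose(A @ v, lam * v))              # True: Av = λv
print(np.allclose(A @ (5 * v), lam * (5 * v)))  # scalar multiples work too
```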


Eigenvectors as a Basis

▶ Is there a case where the eigenvectors are linearly independent, so that we can form a basis out of them?
Theorem (Linearly Independent Eigenvectors)
Suppose that A has distinct eigenvalues (i ≠ j ⟹ λi ≠ λj). Then the corresponding eigenvectors vi are linearly independent.

▶ Now let's collect the eigenvectors into a basis, take T = [v1, . . . , vn]−1, apply B = T AT −1, and see what happens...

B =
⎡ λ1          0 ⎤
⎢    λ2         ⎥
⎢       ⋱       ⎥
⎣ 0          λn ⎦
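A sketch of this diagonalization in numpy, which also shows why it makes A^k cheap to compute (the payoff promised earlier):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
lam, V = np.linalg.eig(A)          # columns of V are eigenvectors

B = np.linalg.inv(V) @ A @ V       # representation in the eigenvector basis
print(np.allclose(B, np.diag(lam)))    # True: B is diagonal

# Powers become trivial: A^k = V diag(lam^k) V^{-1}
k = 5
Ak = V @ np.diag(lam ** k) @ np.linalg.inv(V)
print(np.allclose(Ak, np.linalg.matrix_power(A, k)))  # True
```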

▶ What if we also wanted the basis to be orthogonal?



Symmetric Matrices

▶ A real matrix A is symmetric if AT = A. This leads to lots of cool properties
Theorem (Eigenvalues of Symmetric Matrices)
All eigenvalues of real symmetric matrices are real.

Theorem (Eigenvectors of Symmetric Matrices)
An n × n symmetric matrix has a set of n mutually orthogonal eigenvectors.

▶ A matrix with orthogonal columns is called an orthogonal matrix
∎ When we normalize each column to unit length, we get an orthonormal matrix
∎ Orthonormal matrices have this super cool property: T −1 = T T
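A numerical sketch using numpy's symmetric eigensolver (matrix arbitrary but symmetric):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

lam, T = np.linalg.eigh(A)     # real eigenvalues, orthonormal eigenvectors
print(lam)                     # real, returned in ascending order

print(np.allclose(T.T @ T, np.eye(2)))         # T^{-1} = T^T (orthonormal)
print(np.allclose(T @ np.diag(lam) @ T.T, A))  # A = T diag(λ) T^T
```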


Orthogonal Subspaces

Definition (Orthogonal Complement)
Let V be a subspace of Rn. The orthogonal complement V ⊥ consists of all vectors that are orthogonal to V . That is:

V ⊥ = {x ∈ Rn ∶ v T x = 0 for all v ∈ V }

▶ Every vector x can be uniquely decomposed as x = x1 + x2, where x1 ∈ V and x2 ∈ V ⊥. We call x1 and x2 the orthogonal projections of x onto V and V ⊥. We say that a linear transformation P is an orthogonal projector if P x ∈ V and x − P x ∈ V ⊥ for all x.


Range and Nullspace

Definition (Range and Nullspace)


Let A ∈ Rm×n . Range of A is defined as R(A) = {Ax ∶ x ∈ Rn }.
Nullspace of A (also called the kernel) is defined as
N (A) = {x ∈ Rn ∶ Ax = 0}.

▶ R and N are subspaces! Moreover:


Theorem (Orthogonality of Matrix Subspaces)
Let A be a given matrix. Then R(A)⊥ = N (AT )

▶ The theorem above is important on its own; it also allows us to prove:
Theorem (Projection Matrix)
A matrix P is an orthogonal projector if and only if P² = P = P T
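As a sketch, the classical projector onto the column space of a full-column-rank matrix A, P = A(AT A)−1 AT (a standard construction, not stated on the slides), passes both tests:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])                 # full column rank

P = A @ np.linalg.inv(A.T @ A) @ A.T       # projector onto R(A)
print(np.allclose(P @ P, P))               # P^2 = P
print(np.allclose(P, P.T))                 # P = P^T

x = np.array([1.0, 2.0, 3.0])
print(np.allclose(A.T @ (x - P @ x), 0))   # x - Px lies in N(A^T) = R(A)⊥
```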


Quadratic Forms

Definition (Quadratic Forms)


A function f ∶ Rn → R is a quadratic form if it can be represented as

f (x) = xT Qx,

where Q is a real matrix.

▶ Note that we can always assume Q is symmetric: just replace Q with (1/2)(Q + QT), which is symmetric and leaves xT Qx unchanged.
▶ f (x) is called positive definite (p.d.) if xT Qx > 0 for all x ≠ 0
▶ f (x) is called positive semidefinite (p.s.d.) if xT Qx ≥ 0 for all x
▶ We love p.d. functions in optimization!
∎ They are basically the generalization of parabolas to higher dimensions
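A small check that the symmetrized matrix defines the same quadratic form (values arbitrary):

```python
import numpy as np

Q = np.array([[1.0, 4.0],
              [0.0, 2.0]])       # not symmetric
Qs = 0.5 * (Q + Q.T)             # symmetric part

x = np.array([0.7, -1.3])
print(np.isclose(x @ Q @ x, x @ Qs @ x))   # True: same quadratic form
```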


Positive Definite Matrices

▶ Q is called a p.d. matrix when the associated quadratic form is p.d.
∎ How can we check if a given matrix is p.d.?
▶ First we need to define what a leading principal minor is: the determinant of the top-left k × k sub-matrix of Q. Then we can use the following test
Theorem (Sylvester's Test)
The symmetric matrix Q is p.d. iff its leading principal minors are positive.
∎ The test does not work for non-symmetric matrices! For p.s.d., non-negative leading principal minors are only a necessary condition; for sufficiency we need to check all the principal minors.
▶ An alternative test is to check the eigenvalues
Theorem (Eigenvalue Test)
The symmetric matrix Q is p.d. iff all eigenvalues are positive.
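A sketch of both tests in numpy (the matrix is chosen to be p.d.):

```python
import numpy as np

Q = np.array([[2.0, -1.0],
              [-1.0, 2.0]])      # symmetric

# Sylvester's test: leading principal minors det(Q[:k, :k]), k = 1, ..., n
minors = [np.linalg.det(Q[:k, :k]) for k in range(1, Q.shape[0] + 1)]
print(minors, all(m > 0 for m in minors))   # [2.0, 3.0] True

# Eigenvalue test
print(np.all(np.linalg.eigvalsh(Q) > 0))    # True
```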

Matrix Norms

▶ Matrix norms satisfy these conditions
∎ ∥A∥ ≥ 0, and ∥A∥ = 0 ⟺ A = 0.
∎ ∥cA∥ = ∣c∣∥A∥.
∎ ∥A + B∥ ≤ ∥A∥ + ∥B∥.
▶ We will also enforce them to obey ∥AB∥ ≤ ∥A∥∥B∥ (submultiplicativity).


▶ Example: flatten A ∈ Rn×m into a vector in Rnm and apply the Euclidean norm. The result is the Frobenius norm ∥.∥F:

∥A∥F = ( ∑_{i=1}^{n} ∑_{j=1}^{m} a_{ij}² )^{1/2}

▶ If we have a norm ∥.∥n on Rn and a norm ∥.∥m on Rm, these two norms induce the following norm on all linear transformations from Rn to Rm:

∥A∥ = max_{∥x∥n = 1} ∥Ax∥m


Induced Norms

▶ Since the norm ∥.∥m is a continuous function and the set {x ∶ ∥x∥n = 1} is compact, the maximum is always attained. We can also check all 4 norm properties to confirm that the induced norm is indeed a norm.
▶ A nice property of induced norms is that they can usually be linked to spectral properties of the matrix, such as:
Theorem (Matrix Norm Induced by Euclidean Norm)
Let ∥.∥n and ∥.∥m be the Euclidean norms on Rn and Rm. Then the induced matrix norm is ∥A∥ = √λ1, where λ1 is the largest eigenvalue of AT A.

▶ The following important theorem also uses similar arguments:
Theorem (Rayleigh's Inequality)
If P is an n × n real symmetric p.d. matrix, then

λmin(P )∥x∥² ≤ xT P x ≤ λmax(P )∥x∥²
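A numerical sketch of both results (random matrices, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Induced 2-norm equals the square root of the largest eigenvalue of A^T A
A = rng.standard_normal((2, 3))
print(np.isclose(np.linalg.norm(A, 2),
                 np.sqrt(np.linalg.eigvalsh(A.T @ A).max())))   # True

# Rayleigh's inequality for a symmetric p.d. matrix
M = rng.standard_normal((3, 3))
P = M.T @ M + 3 * np.eye(3)          # p.d. by construction
x = rng.standard_normal(3)
lam = np.linalg.eigvalsh(P)          # ascending eigenvalues
print(lam[0] * (x @ x) <= x @ P @ x <= lam[-1] * (x @ x))       # True
```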



Geometry


Line Segments

▶ The line segment between two points x, y ∈ Rn is the set of points on the straight line joining these two points
▶ If z lies on the line segment, there exists an α ∈ [0, 1] such that z = αx + (1 − α)y
▶ Hence the line segment can be defined as:

l(x, y) = {αx + (1 − α)y ∶ α ∈ [0, 1]}


Hyperplanes

Definition (Hyperplane)
Let u ∈ Rn, u ≠ 0, and v ∈ R. A hyperplane is the set of points of the form

{x ∈ Rn ∶ ⟨u, x⟩ = v}

▶ Do not forget that a hyperplane is not necessarily a subspace!
∎ A hyperplane's dimension is always n − 1
∎ Hyperplanes divide the space into two halfspaces
∎ Positive halfspace: ⟨u, x⟩ ≥ v
∎ Negative halfspace: ⟨u, x⟩ ≤ v
∎ An alternative definition of a hyperplane is ⟨u, x − a⟩ = 0, where a is a point on the hyperplane

Linear Varieties

Definition (Linear Variety (Affine Set))
A linear variety is a set of the form:

{x ∈ Rn ∶ Ax = b},

where A ∈ Rm×n and b ∈ Rm.

▶ We say that the linear variety has dimension r if dim N (A) = r.
▶ A linear variety is a subspace if and only if b = 0
▶ It is easy to see that a linear variety contains the whole line through any two of its points
∎ What about the converse? If a set contains the line through any two of its points, is it an affine set?


Convex Sets

Definition (Convex Set)
A set Θ is convex if for all x, y ∈ Θ we have l(x, y) ⊂ Θ.

▶ A point on l(x, y) is also called a convex combination of x and y. Hence a set is convex if it contains all convex combinations of its points.


Properties of Convex Sets

Theorem (Properties of Convex Sets)


Convex subsets of Rn have the following properties
1 If Θ is convex and β is a real number, then the set

βΘ = {x ∶ x = βv, v ∈ Θ}

is also convex.
2 If Θ1 and Θ2 are convex, then the set

Θ1 + Θ2 = {x ∶ x = v1 + v2 , v1 ∈ Θ1 , v2 ∈ Θ2 }

is also convex.
3 If Θi is a collection of convex sets, then ∩i Θi is also convex.

▶ A point x ∈ Θ is called an extreme point if it is not a convex


combination of any two other points in the set.

Neighborhoods

Definition (Neighborhood)
Let ε > 0. An ε-neighborhood of a point x ∈ Rn is the set

N_ε(x) = {y ∈ Rn ∶ ∥y − x∥ < ε}

▶ A point x ∈ S is called an interior point if there exists an ε > 0 such that N_ε(x) ⊂ S
▶ A point x is called a limit point of S if for all ε > 0 we have {N_ε(x) ∩ S} ∖ {x} ≠ ∅
▶ A set is open if all of its points are interior points.
▶ A set is closed if it contains all of its limit points.
▶ A set S is bounded if there exists an M ∈ R such that S ⊂ N_M(x) for some x
▶ A set is compact if it is closed and bounded

Polytopes and Polyhedra

▶ A polytope is a set that can be expressed as the intersection of a finite number of halfspaces.
▶ If a polytope is nonempty and bounded, we call it a polyhedron
▶ Polytopes will be of major interest when we start studying linear programming problems.


Calculus


Limit of a sequence

Definition (Convergent Sequence)
We say that the sequence xk ∈ R is convergent if there exists an x ∈ R such that for all ε > 0 there exists a K ∈ N such that

k ≥ K ⟹ ∣xk − x∣ < ε

▶ If xk converges to x we usually write xk → x or lim_{k→∞} xk = x.
▶ If xk ∈ Rn, then we can use the norm ∥.∥ instead of ∣.∣.
Theorem (Unique Limit)
A convergent sequence has a unique limit point.

▶ A sequence is bounded if there exists an M ∈ R such that ∣xk∣ ≤ M for all k.


Theorem (Convergence implies boundedness)
Every convergent sequence is bounded.

Upper and Lower Bounds

▶ A number B ∈ R is called an upper bound for xk if xk ≤ B for all k.
▶ The smallest upper bound of xk is called the supremum:
Definition (Least Upper Bound)
Let xk be a sequence that is bounded above. The number B is called the least upper bound or the supremum of xk if
1 B is an upper bound of xk
2 For all ε > 0 there exists a term xK such that xK ≥ B − ε

▶ A sequence is increasing if xk < xk+1 , non-decreasing if xk ≤ xk+1


▶ A sequence is decreasing if xk > xk+1 , non-increasing if xk ≥ xk+1
▶ A sequence is monotone if it is either increasing or decreasing.
Theorem (Convergence of Monotone Sequences)
Every bounded monotone sequence in R converges.

Subsequences

▶ Let xk be a sequence in Rn and let mk be an increasing sequence of natural numbers. We call the sequence

xm1, xm2, . . .

a subsequence of xk.
Theorem (Subsequences of Convergent Sequences)
Subsequences of convergent sequences converge to the same limit.

Theorem (Bolzano-Weierstrass)
Every bounded sequence contains a convergent subsequence.

▶ Sequences allow us to look at continuous functions in a different light: f ∶ Rn → Rm is continuous at x iff

xk → x ⟹ f (xk) → f (x) for every sequence xk in Rn

Matrix Limits

▶ We can easily extend the notion of limits and convergence to matrices. We say that Ak → A if

lim_{k→∞} ∥Ak − A∥ = 0

Theorem (Convergence to Zero Matrix)
Let A ∈ Rn×n. Then lim_{k→∞} A^k = 0 if and only if all eigenvalues of A satisfy ∣λi∣ < 1.

Theorem (Geometric Matrix Series)
The series ∑_{k=0}^{∞} A^k converges iff lim_{k→∞} A^k = 0. In that case the series converges to (In − A)−1
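A numerical sketch of the geometric series (the matrix is chosen so its spectral radius is below 1):

```python
import numpy as np

A = np.array([[0.5, 0.2],
              [0.1, 0.3]])
assert np.all(np.abs(np.linalg.eigvals(A)) < 1)   # eigenvalue condition

# Partial sums of I + A + A^2 + ... approach (I - A)^{-1}
S, P = np.zeros((2, 2)), np.eye(2)
for _ in range(200):
    S, P = S + P, P @ A

print(np.allclose(S, np.linalg.inv(np.eye(2) - A)))   # True
```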


Matrix Valued Functions

▶ We can define a matrix-valued function such as A ∶ Rr → Rm×n
∎ Hence A(ζ) returns an m × n matrix for a given ζ ∈ Rr
▶ We say that a matrix-valued function is continuous at ζ0 if

lim_{∥ζ−ζ0∥→0} ∥A(ζ) − A(ζ0)∥ = 0

Theorem (Invertibility of continuous matrix functions)
Let A ∶ Rr → Rn×n be continuous at ζ0. If A(ζ0)−1 exists, then A(ζ) is invertible in a sufficiently small neighbourhood of ζ0 and A(.)−1 is continuous at ζ0.

∎ Can be proven using the inverse function theorem


Differentiability

▶ The main objective of differential calculus is to approximate an arbitrary function f with an affine function A(x):

A(x) = L(x) + y

▶ A global approximation is usually not possible, hence we consider a local approximation around the point x0 ∈ Rn, which leads to:

lim_{x→x0} ∥f (x) − L(x − x0) − f (x0)∥ / ∥x − x0∥ = 0

▶ If a linear transformation L exists such that the above limit is zero, we say that f is differentiable at x0. The linear transformation L is called the derivative of f at x0.


Gradient and Jacobian

▶ A matrix is associated with every linear transformation; hence L(x) = Dx, where D ∈ Rm×n
▶ When f ∶ Rn → R, this matrix is the transpose of the gradient ∇f (x0):

Df (x0) = [∂f/∂x1  ∂f/∂x2  . . .  ∂f/∂xn] evaluated at x = x0, i.e. Df (x0) = ∇f (x0)T
▶ When f ∶ Rn → Rm, the matrix is called the Jacobian:

Df (x0) =
⎡ ∂f1/∂x1  ∂f1/∂x2  . . .  ∂f1/∂xn ⎤
⎢    ⋮        ⋮     . . .     ⋮    ⎥
⎣ ∂fm/∂x1  ∂fm/∂x2  . . .  ∂fm/∂xn ⎦ evaluated at x = x0
▶ Hence we approximate f at x0 with the affine function:

A(x) = f (x0) + Df (x0)(x − x0)

∎ This is an approximation in the sense that f (x) = A(x) + r(x) with lim_{x→x0} ∥r(x)∥/∥x − x0∥ = 0
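A sketch comparing an analytic Jacobian with finite differences for a toy f ∶ R² → R² (the function is invented for illustration):

```python
import numpy as np

def f(x):
    return np.array([x[0] ** 2 * x[1], np.sin(x[1])])

def Df(x):
    # Analytic Jacobian of the toy f above
    return np.array([[2 * x[0] * x[1], x[0] ** 2],
                     [0.0, np.cos(x[1])]])

x0, h = np.array([1.0, 2.0]), 1e-6
num = np.column_stack([(f(x0 + h * e) - f(x0)) / h for e in np.eye(2)])
print(np.allclose(num, Df(x0), atol=1e-4))   # True: Df matches finite differences
```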

The Hessian

▶ Given f ∶ Rn → R, if f is twice differentiable, we can define the Hessian matrix as:

D²f (x0) =
⎡ ∂²f/∂x1²     ∂²f/∂x1∂x2   . . .  ∂²f/∂x1∂xn ⎤
⎢     ⋮             ⋮        . . .      ⋮      ⎥
⎣ ∂²f/∂xn∂x1   ∂²f/∂xn∂x2   . . .  ∂²f/∂xn²   ⎦ evaluated at x = x0
▶ Schwarz's Theorem: If f is twice continuously differentiable, then

∂²f/∂xi∂xj = ∂²f/∂xj∂xi

∎ which makes the Hessian matrix symmetric!
∎ If the second partial derivatives are not continuous, then the Hessian may not be symmetric (see the example in the book).
▶ Since the Hessian is symmetric, we can use Sylvester's test or the eigenvalue test to see whether it is positive definite. This will be a huge part of checking optimality conditions later in the course.
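A sketch for a concrete function (f(x, y) = x² + xy + 2y², chosen for illustration), combining Schwarz's theorem with the eigenvalue test from earlier:

```python
import numpy as np

# Hessian of f(x, y) = x^2 + x*y + 2*y^2 (constant, computed by hand)
H = np.array([[2.0, 1.0],
              [1.0, 4.0]])

print(np.allclose(H, H.T))                 # symmetric, as Schwarz's theorem predicts
print(np.all(np.linalg.eigvalsh(H) > 0))   # True: positive definite
```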

Differentiation Rules

Theorem (Chain Rule)
Let g ∶ A → R be differentiable on A ⊂ Rn and let f ∶ (a, b) → A be differentiable on (a, b). Define the composite function h ∶ (a, b) → R by h(t) = g(f (t)). Then h is differentiable and the derivative is

h′(t) = Dg(f (t))Df (t) = ∇g(f (t))⊺ [f1′(t), . . . , fn′(t)]⊺

Theorem (Product Rule)
Let f, g ∶ Rn → Rm be two differentiable functions. Define h ∶ Rn → R as h(x) = f (x)T g(x). Then h is also differentiable and the derivative is

Dh(x) = f (x)T Dg(x) + g(x)T Df (x)

Some Other Useful Lemmas

▶ D(yT Ax) = yT A (derivative taken with respect to x).
▶ D(xT Ax) = xT (A + AT) for square A.
▶ D(yT x) = yT.
▶ D(xT Qx) = 2xT Q if Q is symmetric.
▶ D(xT x) = 2xT
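A finite-difference spot check of the second identity (random data, illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
x = rng.standard_normal(3)

analytic = x @ (A + A.T)          # claimed derivative of x^T A x
h = 1e-6
numeric = np.array([((x + h * e) @ A @ (x + h * e) - x @ A @ x) / h
                    for e in np.eye(3)])
print(np.allclose(analytic, numeric, atol=1e-4))   # True
```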


Level Set

▶ The level set of a function f ∶ Rn → R at level c ∈ R is the set of points

S = {x ∶ f (x) = c}

▶ Level sets can be represented as parametric curves g ∶ [0, 1] → Rn.
▶ What is the relationship between the gradient of f and the level set curve?
∎ Set h(t) = f (g(t)); then h′(t) = 0 since f is constant on the level set, so ∇f (g(t))T g′(t) = 0
∎ They are orthogonal! This will be the main principle in the design of many optimization algorithms.
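A quick check on the unit circle, a level set of f(x, y) = x² + y² (example chosen for illustration):

```python
import numpy as np

t = 0.7
g = np.array([np.cos(t), np.sin(t)])          # point on the level set f = 1
g_prime = np.array([-np.sin(t), np.cos(t)])   # tangent of the level-set curve
grad_f = 2 * g                                # gradient of f at g(t)

print(np.isclose(grad_f @ g_prime, 0.0))      # True: gradient ⟂ level set
```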

Taylor’s Theorem

▶ Basis of many optimization methods and convergence proofs.


Theorem (Taylor's Theorem for 1D)
Let f ∶ [a, b] → R be of class C^m. Denote h = b − a. Then

f (b) = f (a) + (h/1!) f (1)(a) + (h²/2!) f (2)(a) + ⋅ ⋅ ⋅ + (h^{m−1}/(m − 1)!) f (m−1)(a) + Rm,

where f (i) is the ith order derivative of f and

Rm = (h^m/m!) f (m)(a + θh), θ ∈ (0, 1)
▶ The proof uses the Generalized Mean Value Theorem: if f, g are differentiable on [a, b], there exists a point c ∈ (a, b) such that

f ′(c)/g′(c) = (f (b) − f (a))/(g(b) − g(a))
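A numerical sanity check of the expansion with f = exp (every derivative is exp) and m = 3, so the remainder should scale like h³:

```python
import numpy as np

a, h = 0.5, 0.1
approx = np.exp(a) * (1 + h + h ** 2 / 2)   # terms up to f''(a) h^2 / 2!
remainder = abs(np.exp(a + h) - approx)

print(remainder < np.exp(a + h) * h ** 3)   # True: |R_3| <= e^(a+h) h^3 / 3!
```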

Order Symbols

▶ For further examination of the remainder term Rm we introduce the order symbols
Definition (Order Symbols)
Let Ω ⊂ Rn, 0 ∈ Ω, g ∶ Ω → R with g(x) ≠ 0 if x ≠ 0, and f ∶ Ω → Rm.
∎ We say that f (x) = O(g(x)) if the quotient ∥f (x)∥/∣g(x)∣ is bounded near 0. That is, there exist numbers δ, K > 0 such that

∥x∥ < δ ⟹ ∥f (x)∥/∣g(x)∣ ≤ K

∎ We say that f (x) = o(g(x)) if ∥f (x)∥ goes to zero faster than ∣g(x)∣:

lim_{x→0} ∥f (x)∥/∣g(x)∣ = 0


Order Symbol Examples

▶ Examples for the Big-Oh symbol:
∎ x = O(x)
∎ [x³, 2x² + 3x⁴]T = O(x²)
∎ cos(x) = O(1)
∎ sin(x) = O(x)
▶ Examples for the Little-Oh symbol:
∎ x² = o(x)
∎ [x³, 2x² + 3x⁴]T = o(x)
∎ x³ = o(x²)
∎ x = o(1)

▶ It is evident that f (x) = o(g(x)) ⟹ f (x) = O(g(x)), but the converse is not necessarily true.
▶ Inspection of the remainder term Rm in Taylor's theorem yields

f (b) = f (a) + (h/1!) f (1)(a) + ⋅ ⋅ ⋅ + (h^m/m!) f (m)(a) + o(h^m), if f ∈ C^m
f (b) = f (a) + (h/1!) f (1)(a) + ⋅ ⋅ ⋅ + (h^m/m!) f (m)(a) + O(h^{m+1}), if f ∈ C^{m+1}
1! m!


Taylor’s Theorem for Higher Dimensions

▶ Let f ∶ Rn → R be C². Can we expand f in a Taylor series around a point x0?
▶ Let z(α) = x0 + α (x − x0)/∥x − x0∥ and examine the function φ(α) = f (z(α))
∎ φ(∥x − x0∥) = f (x)
∎ Since φ is a single-variable function, we can apply Taylor's Theorem!

▶ Doing so yields (if f ∈ C²):

f (x) = f (x0) + Df (x0)(x − x0) + (1/2)(x − x0)⊺ D²f (x0)(x − x0) + o(∥x − x0∥²)

▶ If f ∈ C³:

f (x) = f (x0) + Df (x0)(x − x0) + (1/2)(x − x0)⊺ D²f (x0)(x − x0) + O(∥x − x0∥³)
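A numerical sketch of the second-order expansion for a toy f (gradient and Hessian computed by hand for this example):

```python
import numpy as np

def f(x):
    return np.exp(x[0]) * np.cos(x[1])

x0 = np.array([0.0, 0.0])
grad = np.array([1.0, 0.0])               # Df(x0)
hess = np.array([[1.0, 0.0],
                 [0.0, -1.0]])            # D^2 f(x0)

d = np.array([0.01, -0.02])               # x - x0
taylor2 = f(x0) + grad @ d + 0.5 * d @ hess @ d
print(abs(f(x0 + d) - taylor2) < np.linalg.norm(d) ** 3)   # remainder is o(||d||^2)
```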


Summary

▶ In this lecture we studied:
∎ Basic proof techniques we are going to use throughout the class
∎ Basics of linear algebra (span, basis, dimension, matrices, determinant)
∎ Linear transformations: how to transform between bases, eigenvalues, quadratic forms, matrix norms
∎ Basic geometry: lines, planes, linear varieties, convex sets, polyhedra
∎ Basic calculus: convergence, differentiability, Taylor's Theorem

▶ Next:
∎ Basics of optimization theory.

