Sei sulla pagina 1di 4

Derivatives with respect to vectors

Let x ∈ Rn (a column vector) and let f : Rn → R. The derivative of f


with respect to x is the row vector:
∂f ∂f ∂f
=( ,..., )
∂x ∂x1 ∂xn
∂f
∂x
is called the gradient of f.

The Hessian matrix is the square matrix of second partial derivatives of


a scalar valued function f:
 ∂2f ∂2f

∂x 2 . . . ∂x1 ∂xn
 ∂ 2 f1 ∂2f 
 ∂x x . . . ∂x 2 ∂xn 

H(f ) = 
 ..
2 1
(1)
 .


∂2f ∂2f
∂xn ∂x1
... ∂x21

The mixed derivatives of f are the entries off the main diagonal in the
Hessian. Assuming that they are continuous, the order of differentiation
does not matter.
If the gradient of f is zero at some point x, then f has a critical point at x.
The determinant of the Hessian at x is then called the discriminant. If this
determinant is zero then x is called a degenerate critical point of f. Otherwise
it is non-degenerate.
For a non-degenerate critical point x, if the Hessian is positive definite at x,
then f attains a local minimum at x. If the Hessian is negative definite at x,
then f attains a local maximum at x. If the Hessian has both positive and
negative eigenvalues then x is a saddle point for f (this is true even if x is
degenerate). Otherwise the test is inconclusive.

1
Let x ∈ Rn (a column vector) and let f : Rn → Rm . The derivative of f
with respect to x is the m × n matrix:

 ∂f (x)1 ∂f (x)1 
∂x1
... ∂xn
∂f
=  ... (2)
 
∂x

∂f (x)m ∂f (x)m
∂x1
... ∂xn

∂f
∂x
is called the Jacobian matrix of f.

Examples:
Let u, x ∈ Rn (column vectors).

1. The derivative of uT x = ni=1 ui xi with respect to x:


P

∂ ni=1 uixi ∂uT x


P
= ui ⇒ = (u1 , . . . , un ) = uT (3)
∂xi ∂x

2. The derivative of xT x = ni=1 xi with respect to x:


P

∂ ni=1 x2i ∂xT x


P
= 2xi ⇒ = (2x1 , . . . , 2xn ) = 2xT (4)
∂xi ∂x

We will compute this derivative once again using the product rule: first
holding x constant and then holding xT constant.

∂(f (x)T g(x)) ∂(f (x)T g(x̄)) ∂(f (x̄)T g(x))


= + (5)
∂x ∂x ∂x

Placing a bar over a vector to indicate that it is being treated as con-


stant, we have:

∂xT x ∂xT x̄ ∂ x̄T x


= + = xT + xT = 2xT (6)
∂x ∂x ∂x

2
3. Let A ∈ Rm×n and x ∈ Rn . Let aT1 , . . . , aTn be the rows of A.
   
aT1 aT1 x
 aT   aT x 
 2  2 
Ax =  ..  x =  ..  (7)
 .   . 
aTm aTm x
 T 
∂a1 x  
aT
 ∂a∂xT2 x   1T 
∂Ax  a
∂x  =  2 

= .  ..  (8)
∂x  ..   . 
 
∂aTmx aTm
∂x
∂Ax
⇒ =A (9)
∂x

4. Example: Let A ∈ Rm×n and x ∈ Rn .


∂xT Ax ∂xT Ax̄ ∂ x̄T Ax
= + (10)
∂x ∂x ∂x
∂xT u ∂uT x
To compute these derivatives we will use ∂x
= ∂x
= uT by substi-
tuting u1 = Ax̄ and uT2 = x̄T A.

∂xT Ax ∂xT Ax̄ ∂ x̄T Ax


= + = (11)
∂x ∂x ∂x
∂xT u1 ∂uT2 x
+ = uT1 + uT2 = xT AT + xT A = xT (A + AT )
∂x ∂x
∂xT Ax
If A is symmetric then A = AT and ∂x
= 2xT A. Taking the second
derivative, we have:
∂ 2 xT Ax
= A + AT (12)
∂x2

The chain rule: Let U ⊆ Rn , V ⊆ Rn be open sets. Let f : U → V be


differentiable in a ∈ U and let g : V → Rp be differentiable in b = f (a) then
g(f (x)) : U → Rp is differentiable in a and ∂g(f∂a(a)) = ∂g(b)
∂b
∂f (a)
∂a
.
Example: We will compute again example 4, this time we will define u = x̄
and use the chain rule:
∂xT Ax ∂xT Ax̄ ∂ x̄T Ax
= + = (13)
∂x ∂x ∂x
∂(xT A)u ∂uT (Ax)
+
∂x ∂x
3
Now define b1 = AT x and b2 = Ax,

∂(xT A)u ∂uT (Ax) ∂bT u ∂b1 ∂uT b2 ∂b2


+ = 1 + = (14)
∂x ∂x ∂b ∂x ∂b2 ∂x
uT AT + uT A = xT (A + AT )

Potrebbero piacerti anche