
INTRODUCTION TO VECTOR AND MATRIX DIFFERENTIATION


Econometrics 2
Heino Bohn Nielsen
September 21, 2005

This note expands on appendix A.7 in Verbeek (2004) on matrix differentiation. We first present the conventions for derivatives of scalar and vector functions; then we present the derivatives of a number of special functions particularly useful in econometrics; and, finally, we apply the ideas to derive the ordinary least squares (OLS) estimator in the linear regression model. We should emphasize that this note is cursory reading; the rules for specific functions needed in this course are indicated with a (∗).

Conventions for Scalar Functions

Let β = (β₁, ..., β_k)′ be a k × 1 vector and let f(β) = f(β₁, ..., β_k) be a real-valued function that depends on β, i.e. f(β): R^k ↦ R maps the vector β into a single number, f(β). Then the derivative of f(β) with respect to β is defined as

\[
\frac{\partial f(\beta)}{\partial \beta} =
\begin{pmatrix}
\frac{\partial f(\beta)}{\partial \beta_1} \\
\vdots \\
\frac{\partial f(\beta)}{\partial \beta_k}
\end{pmatrix}.
\tag{1}
\]

This is a k × 1 column vector with typical element i given by the partial derivative ∂f(β)/∂β_i. Sometimes this vector is referred to as the gradient. It is useful to remember that the derivative of a scalar function with respect to a column vector gives a column vector as the result.¹
¹ We can note that Wooldridge (2003, p. 783) does not follow this convention, and lets ∂f(β)/∂β be a 1 × k row vector.

Similarly, the derivative of a scalar function with respect to the row vector β′ yields the 1 × k row vector

\[
\frac{\partial f(\beta)}{\partial \beta'} =
\begin{pmatrix}
\frac{\partial f(\beta)}{\partial \beta_1} & \cdots & \frac{\partial f(\beta)}{\partial \beta_k}
\end{pmatrix}.
\]

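As a purely numerical illustration of these conventions (not part of the original note), the sketch below approximates the gradient by central finite differences and stores it as a k × 1 column vector; NumPy is assumed to be available, and the particular function f and evaluation point are made up for the example.

```python
import numpy as np

def f(beta):
    # An arbitrary scalar function of a k x 1 vector, used only for illustration:
    # f(beta) = beta_1^2 + 3*beta_1*beta_2 + exp(beta_3)
    b = beta.ravel()
    return b[0] ** 2 + 3 * b[0] * b[1] + np.exp(b[2])

def gradient(func, beta, h=1e-6):
    """Central finite-difference gradient, returned as a k x 1 column vector."""
    beta = beta.astype(float).reshape(-1, 1)
    k = beta.shape[0]
    grad = np.zeros((k, 1))
    for i in range(k):
        e = np.zeros((k, 1)); e[i] = h
        grad[i] = (func(beta + e) - func(beta - e)) / (2 * h)
    return grad

beta = np.array([[1.0], [2.0], [0.5]])
print(gradient(f, beta))      # k x 1 column vector: the gradient of f at beta
print(gradient(f, beta).T)    # its transpose is the 1 x k row variant
```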
Conventions for Vector Functions

Now let

\[
g(\beta) =
\begin{pmatrix}
g_1(\beta) \\
\vdots \\
g_n(\beta)
\end{pmatrix}
\]

be a vector function depending on β = (β₁, ..., β_k)′, i.e. g(β): R^k ↦ R^n maps the k × 1 vector β into an n × 1 vector, where g_i(β) = g_i(β₁, ..., β_k), i = 1, 2, ..., n, is a real-valued function.
Since g(β) is a column vector it is natural to consider the derivatives with respect to a row vector, β′, i.e.

\[
\frac{\partial g(\beta)}{\partial \beta'} =
\begin{pmatrix}
\frac{\partial g_1(\beta)}{\partial \beta_1} & \cdots & \frac{\partial g_1(\beta)}{\partial \beta_k} \\
\vdots & \ddots & \vdots \\
\frac{\partial g_n(\beta)}{\partial \beta_1} & \cdots & \frac{\partial g_n(\beta)}{\partial \beta_k}
\end{pmatrix},
\tag{2}
\]

where each row, i = 1, 2, ..., n, contains the derivative of the scalar function g_i(β) with respect to the elements in β. The result is therefore an n × k matrix of derivatives with typical element (i, j) given by ∂g_i(β)/∂β_j. If the vector function is defined as a row vector, it is natural to take the derivative with respect to the column vector, β.
We can note that it holds in general that

\[
\frac{\partial g(\beta)'}{\partial \beta} =
\left( \frac{\partial g(\beta)}{\partial \beta'} \right)',
\tag{3}
\]

which in the case above is a k × n matrix.


Applying the conventions in (1) and (2) we can define the Hessian matrix of second derivatives of a scalar function f(β) as

\[
\frac{\partial^2 f(\beta)}{\partial \beta \, \partial \beta'} =
\frac{\partial}{\partial \beta'} \left( \frac{\partial f(\beta)}{\partial \beta} \right) =
\begin{pmatrix}
\frac{\partial^2 f(\beta)}{\partial \beta_1 \partial \beta_1} & \cdots & \frac{\partial^2 f(\beta)}{\partial \beta_1 \partial \beta_k} \\
\vdots & \ddots & \vdots \\
\frac{\partial^2 f(\beta)}{\partial \beta_k \partial \beta_1} & \cdots & \frac{\partial^2 f(\beta)}{\partial \beta_k \partial \beta_k}
\end{pmatrix},
\]

which is a k × k matrix with typical element (i, j) given by the second derivative ∂²f(β)/(∂β_i ∂β_j). Note that it does not matter if we first take the derivative with respect to the column or the row.
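Again as an illustrative sketch (same assumptions as before: NumPy, and an arbitrary example function), the Hessian can be approximated by differencing twice; the result is a k × k matrix, and it is symmetric because the order of differentiation does not matter.

```python
import numpy as np

def f(beta):
    # Same illustrative function as above: f(beta) = beta_1^2 + 3*beta_1*beta_2 + exp(beta_3)
    b = beta.ravel()
    return b[0] ** 2 + 3 * b[0] * b[1] + np.exp(b[2])

def hessian(func, beta, h=1e-4):
    """Finite-difference Hessian: differentiate twice, giving a k x k matrix."""
    beta = beta.astype(float).reshape(-1, 1)
    k = beta.shape[0]
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros((k, 1)); ei[i] = h
            ej = np.zeros((k, 1)); ej[j] = h
            H[i, j] = (func(beta + ei + ej) - func(beta + ei - ej)
                       - func(beta - ei + ej) + func(beta - ei - ej)) / (4 * h ** 2)
    return H

beta = np.array([[1.0], [2.0], [0.5]])
H = hessian(f, beta)
H_exact = np.array([[2.0, 3.0, 0.0],
                    [3.0, 0.0, 0.0],
                    [0.0, 0.0, np.exp(0.5)]])   # analytical Hessian of the example function
print(np.allclose(H, H.T))                       # True: the Hessian is symmetric
print(np.allclose(H, H_exact, atol=1e-5))        # True, up to discretisation error
```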

Some Special Functions

First, let c be a k × 1 vector and let β be a k × 1 vector of parameters. Next define the scalar function f(β) = c′β, which maps the k parameters into a single number. It holds that

\[
\frac{\partial (c'\beta)}{\partial \beta} = c.
\]

To see this, we can write the function as

\[
f(\beta) = c'\beta = c_1 \beta_1 + c_2 \beta_2 + \cdots + c_k \beta_k.
\]

Taking the derivative with respect to β yields

\[
\frac{\partial f(\beta)}{\partial \beta} =
\begin{pmatrix}
\frac{\partial (c_1\beta_1 + c_2\beta_2 + \cdots + c_k\beta_k)}{\partial \beta_1} \\
\vdots \\
\frac{\partial (c_1\beta_1 + c_2\beta_2 + \cdots + c_k\beta_k)}{\partial \beta_k}
\end{pmatrix}
=
\begin{pmatrix}
c_1 \\ \vdots \\ c_k
\end{pmatrix}
= c,
\]

which is a k × 1 vector as expected. Also note that since β′c = c′β, it holds that

\[
\frac{\partial (\beta'c)}{\partial \beta} = c.
\]

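A quick numerical check of this rule (a sketch only; NumPy assumed, and the particular c and evaluation point are arbitrary): the finite-difference gradient of f(β) = c′β equals c regardless of where it is evaluated.

```python
import numpy as np

c = np.array([[2.0], [-1.0], [0.5]])           # an arbitrary k x 1 vector
f = lambda beta: (c.T @ beta).item()           # f(beta) = c'beta, a scalar

beta = np.array([[0.3], [1.2], [-0.7]])        # any evaluation point
h = 1e-6
grad = np.zeros((3, 1))
for i in range(3):
    e = np.zeros((3, 1)); e[i] = h
    grad[i] = (f(beta + e) - f(beta - e)) / (2 * h)

print(np.allclose(grad, c))                    # True: d(c'beta)/d(beta) = c
```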
Now, let A be an n × k matrix and let β be a k × 1 vector of parameters. Furthermore define the vector function g(β) = Aβ, which maps the k parameters into n function values. g(β) is an n × 1 vector and the derivative with respect to β′ is an n × k matrix given by

\[
\frac{\partial (A\beta)}{\partial \beta'} = A.
\]

To see this, write the function as

\[
g(\beta) = A\beta =
\begin{pmatrix}
A_{11}\beta_1 + A_{12}\beta_2 + \cdots + A_{1k}\beta_k \\
\vdots \\
A_{n1}\beta_1 + A_{n2}\beta_2 + \cdots + A_{nk}\beta_k
\end{pmatrix},
\]

and find the derivative

\[
\frac{\partial g(\beta)}{\partial \beta'} =
\begin{pmatrix}
\frac{\partial (A_{11}\beta_1 + \cdots + A_{1k}\beta_k)}{\partial \beta_1} & \cdots & \frac{\partial (A_{11}\beta_1 + \cdots + A_{1k}\beta_k)}{\partial \beta_k} \\
\vdots & \ddots & \vdots \\
\frac{\partial (A_{n1}\beta_1 + \cdots + A_{nk}\beta_k)}{\partial \beta_1} & \cdots & \frac{\partial (A_{n1}\beta_1 + \cdots + A_{nk}\beta_k)}{\partial \beta_k}
\end{pmatrix}
=
\begin{pmatrix}
A_{11} & \cdots & A_{1k} \\
\vdots & \ddots & \vdots \\
A_{n1} & \cdots & A_{nk}
\end{pmatrix}
= A.

Similarly, if we consider the transposed function, g(β)′ = β′A′, which is a 1 × n row vector, we can find the k × n matrix of derivatives as

\[
\frac{\partial (\beta'A')}{\partial \beta} = A'.
\]

This is just an application of the result in (3).
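The rule ∂(Aβ)/∂β′ = A can be checked the same way (NumPy assumed; A and β below are arbitrary illustrations): column j of the finite-difference Jacobian of g(β) = Aβ holds the derivatives with respect to β_j, and the whole matrix reproduces A, while its transpose gives ∂(β′A′)/∂β = A′ as in (3).

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.5, -1.0, 3.0]])               # an arbitrary n x k matrix (n = 2, k = 3)
g = lambda beta: A @ beta                       # g(beta) = A beta, an n x 1 vector

beta = np.array([[0.3], [1.2], [-0.7]])
h = 1e-6
n, k = A.shape
J = np.zeros((n, k))                            # Jacobian d g(beta) / d beta'
for j in range(k):
    e = np.zeros((k, 1)); e[j] = h
    J[:, [j]] = (g(beta + e) - g(beta - e)) / (2 * h)

print(np.allclose(J, A))        # True: d(A beta)/d beta' = A
print(np.allclose(J.T, A.T))    # True: d(beta'A')/d beta = A', the result in (3)
```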



Now consider a quadratic function f(β) = β′Vβ for some k × k matrix V. This function maps the k parameters into a single number. Here we find the derivatives as the k × 1 column vector

\[
\frac{\partial (\beta'V\beta)}{\partial \beta} = (V + V')\beta,
\]

or the row variant

\[
\frac{\partial (\beta'V\beta)}{\partial \beta'} = \beta'(V + V').
\]

If V is symmetric this reduces to 2Vβ and 2β′V, respectively. To see how this works, consider the simple case k = 3 and write the function as

\[
\beta'V\beta =
\begin{pmatrix} \beta_1 & \beta_2 & \beta_3 \end{pmatrix}
\begin{pmatrix}
V_{11} & V_{12} & V_{13} \\
V_{21} & V_{22} & V_{23} \\
V_{31} & V_{32} & V_{33}
\end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}
= V_{11}\beta_1^2 + V_{22}\beta_2^2 + V_{33}\beta_3^2 + (V_{12}+V_{21})\beta_1\beta_2 + (V_{13}+V_{31})\beta_1\beta_3 + (V_{23}+V_{32})\beta_2\beta_3.
\]

Taking the derivative with respect to β, we get

\[
\frac{\partial (\beta'V\beta)}{\partial \beta} =
\begin{pmatrix}
\frac{\partial (\beta'V\beta)}{\partial \beta_1} \\
\frac{\partial (\beta'V\beta)}{\partial \beta_2} \\
\frac{\partial (\beta'V\beta)}{\partial \beta_3}
\end{pmatrix}
=
\begin{pmatrix}
2V_{11}\beta_1 + (V_{12}+V_{21})\beta_2 + (V_{13}+V_{31})\beta_3 \\
2V_{22}\beta_2 + (V_{12}+V_{21})\beta_1 + (V_{23}+V_{32})\beta_3 \\
2V_{33}\beta_3 + (V_{13}+V_{31})\beta_1 + (V_{23}+V_{32})\beta_2
\end{pmatrix}
\]
\[
=
\begin{pmatrix}
2V_{11} & V_{12}+V_{21} & V_{13}+V_{31} \\
V_{12}+V_{21} & 2V_{22} & V_{23}+V_{32} \\
V_{13}+V_{31} & V_{23}+V_{32} & 2V_{33}
\end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}
=
\left(
\begin{pmatrix}
V_{11} & V_{12} & V_{13} \\
V_{21} & V_{22} & V_{23} \\
V_{31} & V_{32} & V_{33}
\end{pmatrix}
+
\begin{pmatrix}
V_{11} & V_{21} & V_{31} \\
V_{12} & V_{22} & V_{32} \\
V_{13} & V_{23} & V_{33}
\end{pmatrix}
\right)
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}
= (V + V')\beta.
\]
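The quadratic-form rule admits the same kind of numerical check (a sketch; NumPy assumed, and V is an arbitrary, deliberately non-symmetric matrix): the finite-difference gradient of f(β) = β′Vβ equals (V + V′)β, and it collapses to 2Vβ once V is symmetric.

```python
import numpy as np

V = np.array([[2.0, 1.0, 0.0],
              [0.5, 3.0, -1.0],
              [1.5, 0.0, 4.0]])                # an arbitrary (non-symmetric) k x k matrix
f = lambda beta: (beta.T @ V @ beta).item()    # f(beta) = beta'V beta, a scalar

beta = np.array([[0.3], [1.2], [-0.7]])
h = 1e-6
grad = np.zeros((3, 1))
for i in range(3):
    e = np.zeros((3, 1)); e[i] = h
    grad[i] = (f(beta + e) - f(beta - e)) / (2 * h)

print(np.allclose(grad, (V + V.T) @ beta))     # True: d(beta'V beta)/d beta = (V + V')beta

S = (V + V.T) / 2                              # a symmetric matrix
fs = lambda beta: (beta.T @ S @ beta).item()
grad_s = np.zeros((3, 1))
for i in range(3):
    e = np.zeros((3, 1)); e[i] = h
    grad_s[i] = (fs(beta + e) - fs(beta - e)) / (2 * h)
print(np.allclose(grad_s, 2 * S @ beta))       # True: reduces to 2V beta for symmetric V
```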

The Linear Regression Model

To illustrate the use of matrix differentiation, consider the linear regression model in matrix notation,

\[
Y = X\beta + \epsilon,
\]

where Y is a T × 1 vector of stacked left-hand-side variables, X is a T × k matrix of explanatory variables, β is a k × 1 vector of parameters to be estimated, and ε is a T × 1 vector of error terms. Here k is the number of explanatory variables and T is the number of observations.

One way to motivate the ordinary least squares (OLS) principle is to choose the estimator of β, β̂_OLS, as the value that minimizes the sum of squared residuals, i.e.

\[
\hat\beta_{OLS} = \arg\min_{\hat\beta} \sum_{t=1}^{T} \hat\epsilon_t^2 = \arg\min_{\hat\beta} \hat\epsilon'\hat\epsilon.
\]

Looking at the function to be minimized, we find that

\[
\begin{aligned}
\hat\epsilon'\hat\epsilon
&= \left( Y - X\hat\beta \right)' \left( Y - X\hat\beta \right) \\
&= Y'Y - \hat\beta'X'Y - Y'X\hat\beta + \hat\beta'X'X\hat\beta \\
&= Y'Y - 2Y'X\hat\beta + \hat\beta'X'X\hat\beta,
\end{aligned}
\]

where the last line uses the fact that Y′Xβ̂ and β̂′X′Y are identical scalar variables. Note that ε̂′ε̂ is a scalar function, and taking the first derivative with respect to β̂ yields the k × 1 vector

\[
\frac{\partial (\hat\epsilon'\hat\epsilon)}{\partial \hat\beta}
= \frac{\partial \left( Y'Y - 2Y'X\hat\beta + \hat\beta'X'X\hat\beta \right)}{\partial \hat\beta}
= -2X'Y + 2X'X\hat\beta.
\]

Solving the k equations, ∂(ε̂′ε̂)/∂β̂ = 0, yields the OLS estimator

\[
\hat\beta_{OLS} = \left( X'X \right)^{-1} X'Y,
\]

provided that X′X is non-singular.
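To make the formula concrete, here is a small simulated example (a sketch only; NumPy is assumed, and T, k, the true β and the error variance are arbitrary choices): generate Y = Xβ + ε, compute β̂_OLS = (X′X)⁻¹X′Y, and compare with NumPy's least-squares routine.

```python
import numpy as np

rng = np.random.default_rng(0)
T, k = 200, 3                                   # sample size and number of regressors
beta_true = np.array([[1.0], [-0.5], [2.0]])    # arbitrary "true" parameter vector

X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])  # include a constant
eps = rng.normal(scale=0.5, size=(T, 1))
Y = X @ beta_true + eps

# OLS estimator: beta_hat = (X'X)^(-1) X'Y  (solving the normal equations is
# numerically preferable to forming the inverse explicitly)
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

print(beta_hat.ravel())
print(np.allclose(beta_hat, np.linalg.lstsq(X, Y, rcond=None)[0]))  # True
```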


To make sure that β̂_OLS is a minimum of ε̂′ε̂ and not a maximum, we should formally take the second derivative and make sure that it is positive definite. The k × k Hessian matrix of second derivatives is given by

\[
\frac{\partial^2 (\hat\epsilon'\hat\epsilon)}{\partial \hat\beta \, \partial \hat\beta'}
= \frac{\partial \left( -2X'Y + 2X'X\hat\beta \right)}{\partial \hat\beta'}
= 2X'X,
\]

which is positive definite by construction whenever X′X is non-singular, i.e. whenever X has full column rank.
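Continuing the simulated example above (same assumptions), positive definiteness can be verified directly by checking that all eigenvalues of 2X′X are strictly positive.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])  # same design matrix as above

H = 2 * X.T @ X                       # Hessian of the sum of squared residuals
eigvals = np.linalg.eigvalsh(H)       # eigenvalues of a symmetric matrix
print(eigvals)
print(np.all(eigvals > 0))            # True: H is positive definite, so we have a minimum
```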

References
[1] Verbeek, Marno (2004): A Guide to Modern Econometrics, 2nd edition, John Wiley and Sons.
[2] Wooldridge, Jeffrey M. (2003): Introductory Econometrics: A Modern Approach, 2nd edition, South-Western College Publishing.
