This note expands on appendix A.7 in Verbeek (2004) on matrix differentiation. We first present the conventions for derivatives of scalar and vector functions; then we present the derivatives of a number of special functions particularly useful in econometrics; and, finally, we apply the ideas to derive the ordinary least squares (OLS) estimator in the linear regression model. We should emphasize that this note is cursory reading; the rules for specific functions needed in this course are indicated with a $(\star)$.
Let $\beta$ be a $k \times 1$ vector of parameters and let $f(\beta)$ be a scalar function of $\beta$. The derivative of $f(\beta)$ with respect to the column vector $\beta$ is defined as

$$\frac{\partial f(\beta)}{\partial \beta} = \begin{pmatrix} \dfrac{\partial f(\beta)}{\partial \beta_1} \\ \vdots \\ \dfrac{\partial f(\beta)}{\partial \beta_k} \end{pmatrix}. \qquad (1)$$

This is a $k \times 1$ column vector with typical element $i$ given by the partial derivative $\partial f(\beta)/\partial \beta_i$. Sometimes this vector is referred to as the gradient. It is useful to remember that the derivative of a scalar function with respect to a column vector gives a column vector as the result.¹

¹ We can note that Wooldridge (2003, p. 783) does not follow this convention, and lets $\partial f(\beta)/\partial \beta$ be a $1 \times k$ row vector.
Similarly, the derivative of a scalar function with respect to the row vector $\beta'$ yields the $1 \times k$ row vector

$$\frac{\partial f(\beta)}{\partial \beta'} = \left( \frac{\partial f(\beta)}{\partial \beta_1}, \ \ldots, \ \frac{\partial f(\beta)}{\partial \beta_k} \right).$$
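As a quick numerical companion to these conventions, the sketch below (illustrative code of our own, not part of the original note) approximates the gradient by central finite differences and returns it as a $k \times 1$ column vector; the derivative with respect to the row vector $\beta'$ is then simply its transpose.

```python
import numpy as np

def num_gradient(f, beta, h=1e-6):
    """Central finite-difference gradient of a scalar function f(beta),
    returned as a k x 1 column vector, matching the convention in (1)."""
    k = beta.size
    grad = np.zeros((k, 1))
    for i in range(k):
        e = np.zeros(k)
        e[i] = h
        grad[i, 0] = (f(beta + e) - f(beta - e)) / (2.0 * h)
    return grad

# Example: f(beta) = beta_1^2 + 3*beta_1*beta_2 has gradient (2b1 + 3b2, 3b1)'.
f = lambda b: b[0] ** 2 + 3.0 * b[0] * b[1]
beta = np.array([1.0, 2.0])
print(num_gradient(f, beta))  # approximately [[8.], [3.]]
```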
Now let

$$g(\beta) = \begin{pmatrix} g_1(\beta) \\ \vdots \\ g_n(\beta) \end{pmatrix}$$

be an $n \times 1$ vector function. Its derivative with respect to the row vector $\beta'$ is defined as

$$\frac{\partial g(\beta)}{\partial \beta'} = \begin{pmatrix} \dfrac{\partial g_1(\beta)}{\partial \beta_1} & \cdots & \dfrac{\partial g_1(\beta)}{\partial \beta_k} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial g_n(\beta)}{\partial \beta_1} & \cdots & \dfrac{\partial g_n(\beta)}{\partial \beta_k} \end{pmatrix}, \qquad (2)$$

where each row, $i = 1, 2, \ldots, n$, contains the derivative of the scalar function $g_i(\beta)$ with respect to the elements in $\beta$. The result is therefore an $n \times k$ matrix of derivatives with typical element $(i, j)$ given by $\partial g_i(\beta)/\partial \beta_j$. If the vector function is defined as a row vector, it is natural to take the derivative with respect to the column vector, $\beta$.
We can note that it holds in general that

$$\frac{\partial \left( g(\beta)' \right)}{\partial \beta} = \left( \frac{\partial g(\beta)}{\partial \beta'} \right)'. \qquad (3)$$
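To make the $n \times k$ convention in (2) concrete, here is a small finite-difference sketch; the helper `num_jacobian` and the example function are our own illustrative choices, not from the note.

```python
import numpy as np

def num_jacobian(g, beta, h=1e-6):
    """Finite-difference derivative dg(beta)/dbeta', an n x k matrix as in (2)."""
    n, k = g(beta).size, beta.size
    J = np.zeros((n, k))
    for j in range(k):
        e = np.zeros(k)
        e[j] = h
        J[:, j] = (g(beta + e) - g(beta - e)) / (2.0 * h)
    return J

# Example: g(beta) = (beta_1*beta_2, beta_1^2)', so row i holds dg_i/dbeta'.
g = lambda b: np.array([b[0] * b[1], b[0] ** 2])
beta = np.array([1.0, 2.0])

J = num_jacobian(g, beta)
J_analytic = np.array([[beta[1], beta[0]],
                       [2.0 * beta[0], 0.0]])
print(np.allclose(J, J_analytic, atol=1e-6))  # True
# By (3), the derivative of the row vector g(beta)' w.r.t. beta is simply J'.
```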
The second derivative of the scalar function $f(\beta)$ is

$$\frac{\partial^2 f(\beta)}{\partial \beta \, \partial \beta'} = \frac{\partial^2 f(\beta)}{\partial \beta' \, \partial \beta} = \begin{pmatrix} \dfrac{\partial^2 f(\beta)}{\partial \beta_1 \partial \beta_1} & \cdots & \dfrac{\partial^2 f(\beta)}{\partial \beta_1 \partial \beta_k} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f(\beta)}{\partial \beta_k \partial \beta_1} & \cdots & \dfrac{\partial^2 f(\beta)}{\partial \beta_k \partial \beta_k} \end{pmatrix},$$

which is a $k \times k$ matrix with typical element $(i, j)$ given by the second derivative $\partial^2 f(\beta)/\partial \beta_i \partial \beta_j$. Note that it does not matter if we first take the derivative with respect to the column or the row.
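A small numerical illustration of the last point: the sketch below (again our own illustrative code) builds the $k \times k$ matrix of second derivatives by finite differences and checks that it is symmetric, i.e. that the order of differentiation is irrelevant.

```python
import numpy as np

def num_hessian(f, beta, h=1e-5):
    """Finite-difference k x k matrix of second derivatives of f."""
    k = beta.size
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k); ei[i] = h
            ej = np.zeros(k); ej[j] = h
            H[i, j] = (f(beta + ei + ej) - f(beta + ei - ej)
                       - f(beta - ei + ej) + f(beta - ei - ej)) / (4.0 * h ** 2)
    return H

f = lambda b: b[0] ** 3 * b[1] + np.exp(b[1])
beta = np.array([1.0, 0.5])
H = num_hessian(f, beta)
print(np.allclose(H, H.T, atol=1e-4))  # True: column-first equals row-first
```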
We now turn to the derivatives of a number of special functions. First, let $c$ be a $k \times 1$ vector and let $\beta$ be a $k \times 1$ vector of parameters. Next define the scalar function $f(\beta) = c'\beta$, which maps the $k$ parameters into a single number. It holds that

$$\frac{\partial (c'\beta)}{\partial \beta} = c. \qquad (\star)$$

To see this, note that $f(\beta) = c'\beta = c_1\beta_1 + \cdots + c_k\beta_k$, so that

$$\frac{\partial f(\beta)}{\partial \beta} = \begin{pmatrix} \dfrac{\partial (c_1\beta_1 + \cdots + c_k\beta_k)}{\partial \beta_1} \\ \vdots \\ \dfrac{\partial (c_1\beta_1 + \cdots + c_k\beta_k)}{\partial \beta_k} \end{pmatrix} = \begin{pmatrix} c_1 \\ \vdots \\ c_k \end{pmatrix} = c.$$
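A quick numerical check of this rule, using made-up values of our own choosing:

```python
import numpy as np

# Verify d(c'beta)/d(beta) = c by central finite differences.
rng = np.random.default_rng(seed=0)
k = 4
c = rng.normal(size=k)
beta = rng.normal(size=k)
f = lambda b: c @ b  # f(beta) = c'beta, a scalar

h = 1e-6
grad = np.array([(f(beta + h * np.eye(k)[i]) - f(beta - h * np.eye(k)[i])) / (2.0 * h)
                 for i in range(k)])
print(np.allclose(grad, c, atol=1e-6))  # True: the gradient is the vector c itself
```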
Next, let $A$ be an $n \times k$ matrix and define the $n \times 1$ vector function

$$g(\beta) = A\beta = \begin{pmatrix} A_{11}\beta_1 + \cdots + A_{1k}\beta_k \\ \vdots \\ A_{n1}\beta_1 + \cdots + A_{nk}\beta_k \end{pmatrix}.$$

It holds that

$$\frac{\partial (A\beta)}{\partial \beta'} = A, \qquad (\star)$$

since

$$\frac{\partial g(\beta)}{\partial \beta'} = \begin{pmatrix} \dfrac{\partial (A_{11}\beta_1 + \cdots + A_{1k}\beta_k)}{\partial \beta_1} & \cdots & \dfrac{\partial (A_{11}\beta_1 + \cdots + A_{1k}\beta_k)}{\partial \beta_k} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial (A_{n1}\beta_1 + \cdots + A_{nk}\beta_k)}{\partial \beta_1} & \cdots & \dfrac{\partial (A_{n1}\beta_1 + \cdots + A_{nk}\beta_k)}{\partial \beta_k} \end{pmatrix} = \begin{pmatrix} A_{11} & \cdots & A_{1k} \\ \vdots & \ddots & \vdots \\ A_{n1} & \cdots & A_{nk} \end{pmatrix} = A.$$
Similarly, for the transposed function $g(\beta)' = \beta'A'$ it holds, using (3), that

$$\frac{\partial (\beta'A')}{\partial \beta} = A'. \qquad (\star)$$
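These two linear rules can also be checked numerically; the dimensions and values below are illustrative.

```python
import numpy as np

# Verify d(A beta)/d(beta') = A for an illustrative 3 x 2 matrix A.
rng = np.random.default_rng(seed=1)
n, k = 3, 2
A = rng.normal(size=(n, k))
beta = rng.normal(size=k)
g = lambda b: A @ b  # g(beta) = A beta

h = 1e-6
J = np.column_stack([(g(beta + h * np.eye(k)[j]) - g(beta - h * np.eye(k)[j])) / (2.0 * h)
                     for j in range(k)])
print(np.allclose(J, A, atol=1e-6))  # True: d(A beta)/d(beta') = A
# By (3), d(beta'A')/d(beta) = J' = A'.
```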
Now let $V$ be a $k \times k$ matrix and define the scalar function $f(\beta) = \beta'V\beta$. It holds that

$$\frac{\partial (\beta'V\beta)}{\partial \beta} = (V + V')\beta, \qquad (\star)$$

and

$$\frac{\partial (\beta'V\beta)}{\partial \beta'} = \beta'(V + V'). \qquad (\star)$$

To illustrate the first result, consider the case $k = 3$ and write out the quadratic form as

$$\beta'V\beta = \begin{pmatrix} \beta_1 & \beta_2 & \beta_3 \end{pmatrix} \begin{pmatrix} V_{11} & V_{12} & V_{13} \\ V_{21} & V_{22} & V_{23} \\ V_{31} & V_{32} & V_{33} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix} = \sum_{i=1}^{3} \sum_{j=1}^{3} \beta_i V_{ij} \beta_j.$$

Taking the derivative with respect to the column vector $\beta$ gives

$$\frac{\partial (\beta'V\beta)}{\partial \beta} = \begin{pmatrix} 2V_{11} & V_{12} + V_{21} & V_{13} + V_{31} \\ V_{12} + V_{21} & 2V_{22} & V_{23} + V_{32} \\ V_{13} + V_{31} & V_{23} + V_{32} & 2V_{33} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix} = \left[ \begin{pmatrix} V_{11} & V_{12} & V_{13} \\ V_{21} & V_{22} & V_{23} \\ V_{31} & V_{32} & V_{33} \end{pmatrix} + \begin{pmatrix} V_{11} & V_{21} & V_{31} \\ V_{12} & V_{22} & V_{32} \\ V_{13} & V_{23} & V_{33} \end{pmatrix} \right] \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix} = (V + V')\beta.$$

If $V$ is symmetric, so that $V' = V$, the rule simplifies to $\partial(\beta'V\beta)/\partial\beta = 2V\beta$.
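The quadratic-form rule, and its symmetric special case, can be verified numerically as well (illustrative values of our own):

```python
import numpy as np

# Verify d(beta'V beta)/d(beta) = (V + V')beta for a general k x k matrix V.
rng = np.random.default_rng(seed=2)
k = 3
V = rng.normal(size=(k, k))            # a general (non-symmetric) matrix
beta = rng.normal(size=k)
f = lambda b: b @ V @ b                # f(beta) = beta'V beta

h = 1e-6
grad = np.array([(f(beta + h * np.eye(k)[i]) - f(beta - h * np.eye(k)[i])) / (2.0 * h)
                 for i in range(k)])
print(np.allclose(grad, (V + V.T) @ beta, atol=1e-5))  # True

# With a symmetric matrix the rule reduces to 2 V beta:
S = V + V.T                            # symmetric by construction
fs = lambda b: b @ S @ b
grad_s = np.array([(fs(beta + h * np.eye(k)[i]) - fs(beta - h * np.eye(k)[i])) / (2.0 * h)
                   for i in range(k)])
print(np.allclose(grad_s, 2.0 * S @ beta, atol=1e-5))  # True
```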
To illustrate the use of matrix differentiation, consider the linear regression model in matrix notation,

$$Y = X\beta + \epsilon,$$

where $Y$ is a $T \times 1$ vector of stacked left-hand-side variables, $X$ is a $T \times k$ matrix of explanatory variables, $\beta$ is a $k \times 1$ vector of parameters to be estimated, and $\epsilon$ is a $T \times 1$ vector of error terms. Here $k$ is the number of explanatory variables and $T$ is the number of observations.
One way to motivate the ordinary least squares (OLS) principle is to choose the estimator, $\hat\beta_{OLS}$, of $\beta$ as the value that minimizes the sum of squared residuals, i.e.

$$\hat\beta_{OLS} = \arg\min_{\hat\beta} \sum_{t=1}^{T} \hat\epsilon_t^{\,2} = \arg\min_{\hat\beta} \ \hat\epsilon'\hat\epsilon,$$

where $\hat\epsilon = Y - X\hat\beta$ denotes the $T \times 1$ vector of residuals. Expanding the objective function gives

$$\begin{aligned}
\hat\epsilon'\hat\epsilon &= (Y - X\hat\beta)'(Y - X\hat\beta) \\
&= Y'Y - \hat\beta'X'Y - Y'X\hat\beta + \hat\beta'X'X\hat\beta \\
&= Y'Y - 2Y'X\hat\beta + \hat\beta'X'X\hat\beta,
\end{aligned}$$

where the last line uses the fact that $Y'X\hat\beta$ and $\hat\beta'X'Y$ are identical scalar variables.
Note that $\hat\epsilon'\hat\epsilon$ is a scalar function, and taking the first derivative with respect to the $k \times 1$ vector $\hat\beta$ yields

$$\frac{\partial (\hat\epsilon'\hat\epsilon)}{\partial \hat\beta} = \frac{\partial \left( Y'Y - 2Y'X\hat\beta + \hat\beta'X'X\hat\beta \right)}{\partial \hat\beta} = -2X'Y + 2X'X\hat\beta,$$

using the rules $(\star)$ above with $c' = Y'X$ and $V = X'X$, which is symmetric. Setting the derivative equal to a $k \times 1$ vector of zeros,

$$\frac{\partial (\hat\epsilon'\hat\epsilon)}{\partial \hat\beta} = 0,$$

and solving for $\hat\beta$ gives the OLS estimator

$$\hat\beta_{OLS} = (X'X)^{-1}X'Y,$$

provided that $X'X$ is non-singular.
The second derivative,

$$\frac{\partial^2 (\hat\epsilon'\hat\epsilon)}{\partial \hat\beta \, \partial \hat\beta'} = \frac{\partial \left( -2X'Y + 2X'X\hat\beta \right)}{\partial \hat\beta'} = 2X'X,$$

is a positive definite matrix when $X'X$ is non-singular, confirming that $\hat\beta_{OLS}$ is indeed a minimum.
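To close, a small simulation (illustrative data and variable names of our own, not from the note) confirms the three ingredients of the derivation: the closed-form estimator, the first-order condition, and the positive definite second derivative.

```python
import numpy as np

# Simulate a linear regression Y = X beta + eps and apply the OLS formula.
rng = np.random.default_rng(seed=3)
T, k = 100, 3
X = rng.normal(size=(T, k))
beta_true = np.array([1.0, -2.0, 0.5])
Y = X @ beta_true + rng.normal(size=T)

# Closed-form estimator from the first-order condition: (X'X)^{-1} X'Y.
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)

# It matches numpy's least-squares solver...
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(np.allclose(beta_ols, beta_lstsq))        # True

# ...satisfies the first-order condition -2X'Y + 2X'X beta = 0...
print(np.allclose(-2.0 * X.T @ Y + 2.0 * X.T @ X @ beta_ols,
                  np.zeros(k), atol=1e-8))      # True

# ...and the second derivative 2X'X is positive definite (all eigenvalues > 0).
print(np.all(np.linalg.eigvalsh(2.0 * X.T @ X) > 0))  # True
```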
References

[1] Verbeek, Marno (2004): A Guide to Modern Econometrics, Second edition, John Wiley and Sons.

[2] Wooldridge, Jeffrey M. (2003): Introductory Econometrics: A Modern Approach, Second edition, South-Western College Publishing.