Lecture Notes
1. Introduction
► The principal purpose of this lecture is to demonstrate how matrices can
  be used to simplify the development of statistical models.
► A secondary purpose is to review, and extend, some material in linear
  models.
► I will take up the following topics:
  • Expressing linear models for regression, dummy regression, and
    analysis of variance in matrix form.
  • Deriving the least-squares coefficients using matrices.
  • Distribution of the least-squares coefficients.
  • The least-squares coefficients as maximum-likelihood estimators.
  • Statistical inference for linear models.
Sociology 761
Copyright © 2014 by John Fox
Linear Models Using Matrices
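Collecting these $n$ equations into a single matrix equation:
$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
=
\begin{bmatrix}
1 & x_{11} & \cdots & x_{1k} \\
1 & x_{21} & \cdots & x_{2k} \\
\vdots & \vdots & & \vdots \\
1 & x_{n1} & \cdots & x_{nk}
\end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
$$
$$
\underset{(n \times 1)}{\mathbf{y}} = \underset{(n \times k+1)}{\mathbf{X}} \; \underset{(k+1 \times 1)}{\boldsymbol{\beta}} + \underset{(n \times 1)}{\boldsymbol{\varepsilon}}
$$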
The X matrix in the linear model is called the model matrix (or the
design matrix).
Note the column of 1s for the constant.
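As a minimal sketch of how such a model matrix might be assembled, the following uses NumPy with made-up data for 5 observations on 2 regressors. The names `x_raw` and `X` and all data values are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

# Hypothetical data: 5 observations on 2 regressors (illustrative values only).
x_raw = np.array([
    [12.0, 3.0],
    [16.0, 5.0],
    [10.0, 1.0],
    [14.0, 8.0],
    [18.0, 2.0],
])
n_obs, n_regressors = x_raw.shape

# Model matrix: a leading column of 1s for the constant, then the regressors.
X = np.column_stack([np.ones(n_obs), x_raw])

print(X.shape)  # (5, 3): one row per observation, one column per coefficient
```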
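The assumptions about the errors can be written compactly as
$\boldsymbol{\varepsilon} \sim N_n(\mathbf{0}, \sigma_\varepsilon^2 \mathbf{I}_n)$
or, equivalently,
$\mathbf{y} \sim N_n(\mathbf{X}\boldsymbol{\beta}, \sigma_\varepsilon^2 \mathbf{I}_n)$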
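► Recall that this model implies potentially different intercepts and slopes
  (that is, potentially different regression lines) for the two groups:
  • for men (dummy regressor equal to 1),
$$
\begin{aligned}
y_i &= \alpha + \beta x_i + \gamma \cdot 1 + \delta (x_i \cdot 1) + \varepsilon_i \\
    &= (\alpha + \gamma) + (\beta + \delta) x_i + \varepsilon_i
\end{aligned}
$$
  • for women (dummy regressor equal to 0),
$$
\begin{aligned}
y_i &= \alpha + \beta x_i + \gamma \cdot 0 + \delta (x_i \cdot 0) + \varepsilon_i \\
    &= \alpha + \beta x_i + \varepsilon_i
\end{aligned}
$$
  • and so $\gamma$ is the difference in intercepts between men and women, and
    $\delta$ is the difference in slopes.
  • Because men and women can have different slopes, this model
    permits gender to interact with education in determining income.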
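► Written as a matrix equation, the dummy-regression model becomes
$$
\begin{bmatrix} y_1 \\ \vdots \\ y_{n_1} \\ y_{n_1+1} \\ \vdots \\ y_n \end{bmatrix}
=
\begin{bmatrix}
1 & x_1 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots \\
1 & x_{n_1} & 0 & 0 \\
1 & x_{n_1+1} & 1 & x_{n_1+1} \\
\vdots & \vdots & \vdots & \vdots \\
1 & x_n & 1 & x_n
\end{bmatrix}
\begin{bmatrix} \alpha \\ \beta \\ \gamma \\ \delta \end{bmatrix}
+
\begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_{n_1} \\ \varepsilon_{n_1+1} \\ \vdots \\ \varepsilon_n \end{bmatrix}
$$
$$
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}
$$
  where, for clarity, the $n_1$ observations for women precede the $n - n_1$
  observations for men.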
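The layout of this model matrix can be sketched numerically. The example below is only an illustration: the education values and the names `education` and `man` are invented, with the dummy regressor coded 1 for men and 0 for women and the women listed first, as in the text.

```python
import numpy as np

# Hypothetical data (illustrative only): education in years, and a dummy
# regressor coded 1 for men and 0 for women, with women listed first.
education = np.array([10.0, 12.0, 16.0, 11.0, 14.0, 18.0])
man = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

# Columns: constant, education, dummy, education-by-dummy interaction.
X = np.column_stack([np.ones_like(education), education, man, education * man])

print(X)
```

For the women's rows the last two columns are zero; for the men's rows the last column repeats the education value, which is what lets the slope differ between the groups.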
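► The matrix form of the one-way ANOVA model is
$$
\begin{bmatrix}
y_{11} \\ \vdots \\ y_{n_1,1} \\
y_{12} \\ \vdots \\ y_{n_2,2} \\
\vdots \\
y_{1,m-1} \\ \vdots \\ y_{n_{m-1},m-1} \\
y_{1m} \\ \vdots \\ y_{n_m,m}
\end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
1 & 1 & 0 & \cdots & 0 & 0 \\
1 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
1 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
1 & 0 & 0 & \cdots & 1 & 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
1 & 0 & 0 & \cdots & 1 & 0 \\
1 & 0 & 0 & \cdots & 0 & 1 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
1 & 0 & 0 & \cdots & 0 & 1
\end{bmatrix}
\begin{bmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_{m-1} \\ \alpha_m \end{bmatrix}
+
\begin{bmatrix}
\varepsilon_{11} \\ \vdots \\ \varepsilon_{n_1,1} \\
\varepsilon_{12} \\ \vdots \\ \varepsilon_{n_2,2} \\
\vdots \\
\varepsilon_{1,m-1} \\ \vdots \\ \varepsilon_{n_{m-1},m-1} \\
\varepsilon_{1m} \\ \vdots \\ \varepsilon_{n_m,m}
\end{bmatrix}
$$
  where the rows are ordered by group: group 1 first, then group 2, and so on
  through group $m - 1$ and group $m$.
$$
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}
$$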
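A short numerical check brings out one property of this model matrix: the column of 1s is the sum of the group indicator columns, so the matrix has one more column than its rank. The sketch below assumes three groups with invented sizes; `group_sizes`, `groups`, and `indicators` are illustrative names.

```python
import numpy as np

# Hypothetical group sizes for 3 groups (illustrative only).
group_sizes = [2, 3, 2]
n_groups = len(group_sizes)
groups = np.repeat(np.arange(n_groups), group_sizes)   # group index of each row

# Indicator (0/1) columns, one per group, preceded by a column of 1s.
indicators = (groups[:, None] == np.arange(n_groups)).astype(float)
X = np.column_stack([np.ones(len(groups)), indicators])

print(X.shape)                    # (7, 4)
print(np.linalg.matrix_rank(X))   # 3
```

Because the rank (3) is less than the number of columns (4), a model matrix coded this way does not have full column rank.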
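$$
\underset{(n \times m)}{\mathbf{X}} =
\begin{array}{l|rrrcr}
 & (\mu) & (\alpha_1) & (\alpha_2) & \cdots & (\alpha_{m-1}) \\ \hline
\text{group } 1 & 1 & 1 & 0 & \cdots & 0 \\
 & \vdots & \vdots & \vdots & & \vdots \\
 & 1 & 1 & 0 & \cdots & 0 \\
\text{group } 2 & 1 & 0 & 1 & \cdots & 0 \\
 & \vdots & \vdots & \vdots & & \vdots \\
 & 1 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
\text{group } m-1 & 1 & 0 & 0 & \cdots & 1 \\
 & \vdots & \vdots & \vdots & & \vdots \\
 & 1 & 0 & 0 & \cdots & 1 \\
\text{group } m & 1 & -1 & -1 & \cdots & -1 \\
 & \vdots & \vdots & \vdots & & \vdots \\
 & 1 & -1 & -1 & \cdots & -1
\end{array}
$$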
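A model matrix of this form can be sketched as follows, assuming sum-to-zero (deviation) coding in which the rows for the last group are coded -1 on every regressor after the constant. The group sizes and the names `coding` and `groups` are invented for illustration.

```python
import numpy as np

# Hypothetical group sizes for 3 groups (illustrative only).
group_sizes = [2, 3, 2]
n_groups = len(group_sizes)
groups = np.repeat(np.arange(n_groups), group_sizes)

# One coding row per group: identity rows for all groups but the last,
# and a row of -1s for the last group.
coding = np.vstack([np.eye(n_groups - 1), -np.ones((1, n_groups - 1))])

# Model matrix: constant column followed by the deviation-coded regressors.
X = np.column_stack([np.ones(len(groups)), coding[groups]])

print(X)
print(np.linalg.matrix_rank(X))   # 3: full column rank
```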
3. Least-Squares Fit
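► The fitted linear model is
$$
\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}
$$
  where
  • $\mathbf{b} = [b_0, b_1, \ldots, b_k]'$ is the vector of fitted coefficients.
  • $\mathbf{e} = [e_1, e_2, \ldots, e_n]' = \mathbf{y} - \mathbf{X}\mathbf{b}$ is the vector of residuals.
► We want the coefficient vector $\mathbf{b}$ that minimizes the residual sum of
  squares, expressed as a function of $\mathbf{b}$:
$$
\begin{aligned}
S(\mathbf{b}) = \sum e_i^2 = \mathbf{e}'\mathbf{e} &= (\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b}) \\
&= \mathbf{y}'\mathbf{y} - \mathbf{y}'\mathbf{X}\mathbf{b} - \mathbf{b}'\mathbf{X}'\mathbf{y} + \mathbf{b}'\mathbf{X}'\mathbf{X}\mathbf{b} \\
&= \mathbf{y}'\mathbf{y} - (2\mathbf{y}'\mathbf{X})\mathbf{b} + \mathbf{b}'(\mathbf{X}'\mathbf{X})\mathbf{b}
\end{aligned}
$$
  • The last line of the equation is justified because
    $\underset{(1 \times n)}{\mathbf{y}'} \, \underset{(n \times k+1)}{\mathbf{X}} \, \underset{(k+1 \times 1)}{\mathbf{b}}$
    and
    $\underset{(1 \times k+1)}{\mathbf{b}'} \, \underset{(k+1 \times n)}{\mathbf{X}'} \, \underset{(n \times 1)}{\mathbf{y}}$
    are both scalars, and consequently equal.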
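  • Differentiating the residual sum of squares with respect to $\mathbf{b}$ gives
    $-2\mathbf{X}'\mathbf{y} + 2\mathbf{X}'\mathbf{X}\mathbf{b}$. Setting the derivative to 0 produces the
    normal equations for the linear model
$$
\begin{aligned}
-2\mathbf{X}'\mathbf{y} + 2\mathbf{X}'\mathbf{X}\mathbf{b} &= \mathbf{0} \\
\mathbf{X}'\mathbf{X}\mathbf{b} &= \mathbf{X}'\mathbf{y}
\end{aligned}
$$
    a system of $k + 1$ linear equations in $k + 1$ unknowns (i.e., $b_0, b_1, \ldots, b_k$).
  • We can solve the normal equations uniquely for $\mathbf{b}$ if the $(k + 1) \times (k + 1)$
    matrix $\mathbf{X}'\mathbf{X}$ is nonsingular, which will be the case as long as
    · there are at least as many observations as coefficients, that is,
      $n \geq k + 1$;
    · no column of the model matrix $\mathbf{X}$ is a perfect linear function of the
      other columns.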
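The normal equations can be solved numerically along the following lines. This is a sketch on simulated data, not the lecture's own example: the sample size, the coefficient values in `beta`, and the random seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (illustrative only): n = 50 observations, k = 2 regressors.
n, k = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.array([1.0, 2.0, -0.5])           # made-up "true" coefficients
y = X @ beta + rng.normal(scale=0.3, size=n)

# Check the two conditions for a unique solution.
assert n >= k + 1
assert np.linalg.matrix_rank(X) == k + 1

# Solve the normal equations X'X b = X'y without forming the inverse explicitly.
b = np.linalg.solve(X.T @ X, X.T @ y)

# At the solution, the residuals are orthogonal to every column of X.
e = y - X @ b
print(b)
print(X.T @ e)   # numerically close to the zero vector
```

Using `np.linalg.solve` rather than inverting the cross-product matrix is a common numerical choice; both express the same normal equations.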
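► Notice the strong analogy between the formulas for the slope coefficient
  in least-squares simple regression (i.e., with a single $x$) and for the
  coefficients of the linear model in matrix form. Here $x^*$ and $y^*$ denote
  deviations from the respective means:

  Model
    Simple regression: $y_i = \alpha + \beta x_i + \varepsilon_i$, or in mean-deviation form $y^* = \beta x^* + \varepsilon^*$
    Linear model:      $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$

  Least-squares estimator
    Simple regression: $b = \dfrac{\sum x^* y^*}{\sum x^{*2}} = \left(\sum x^{*2}\right)^{-1} \sum x^* y^*$
    Linear model:      $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$

  Sampling variance
    Simple regression: $V(b) = \dfrac{\sigma_\varepsilon^2}{\sum x^{*2}} = \sigma_\varepsilon^2 \left(\sum x^{*2}\right)^{-1}$
    Linear model:      $V(\mathbf{b}) = \sigma_\varepsilon^2 (\mathbf{X}'\mathbf{X})^{-1}$

  Distribution
    Simple regression: $b \sim N\left[\beta, \sigma_\varepsilon^2 \left(\sum x^{*2}\right)^{-1}\right]$
    Linear model:      $\mathbf{b} \sim N_{k+1}\left[\boldsymbol{\beta}, \sigma_\varepsilon^2 (\mathbf{X}'\mathbf{X})^{-1}\right]$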
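The analogy can be checked numerically. The sketch below uses simulated data with arbitrary values and computes the simple-regression slope and its variance multiplier both ways: with the scalar formulas in deviations from the means, and with the matrix formulas for a model matrix that includes a column of 1s.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated simple-regression data (illustrative only).
n = 30
x = rng.uniform(0, 10, size=n)
y = 3.0 + 1.5 * x + rng.normal(size=n)

# Scalar formulas, using deviations from the means.
x_star = x - x.mean()
y_star = y - y.mean()
slope_scalar = (x_star * y_star).sum() / (x_star ** 2).sum()

# Matrix formulas, with a column of 1s for the constant.
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y

print(slope_scalar, b[1])                      # same slope both ways
print(1 / (x_star ** 2).sum(), XtX_inv[1, 1])  # same variance multiplier
```

The slope's diagonal entry of the inverse cross-product matrix equals the reciprocal of the sum of squared deviations of x, which is why the two sampling-variance formulas agree for the slope.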
Note: $\exp(a)$ in a formula means $e^a$, for the constant $e \approx 2.718$.
► In maximum-likelihood estimation, recall, we find the values of the
  parameters that make the probability of observing the data as high as
  possible.
► Notice that
  • The MLE $\widehat{\boldsymbol{\beta}}$ is just the vector of least-squares coefficients $\mathbf{b}$.
  • The MLE of the error variance, $\widehat{\sigma}_\varepsilon^2 = \sum e_i^2 / n$, is biased.
    The usual unbiased estimator, $s_e^2$, divides by the residual degrees of
    freedom $n - k - 1$ rather than by $n$.
  • The MLE is consistent, however, since the bias (along with the
    variance of the estimator) goes to zero as $n$ gets larger.
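The two variance estimators differ only in their divisors, which a small simulation makes concrete. The data, sample size, and coefficient values below are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data (illustrative only): n = 20 observations, k = 3 regressors.
n, k = 20, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, -1.0, 2.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
rss = e @ e                      # residual sum of squares

sigma2_mle = rss / n             # maximum-likelihood estimator (divides by n)
s2 = rss / (n - k - 1)           # usual unbiased estimator

# The ratio is (n - k - 1) / n, which approaches 1 as n grows with k fixed.
print(sigma2_mle, s2, sigma2_mle / s2)
```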
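  • For example:
    · To test
      $$H_0\colon \beta_j = 0$$
      we compute
      $$t_0 = \frac{b_j}{\mathrm{SE}(b_j)}$$
    · To form a 95-percent confidence interval for $\beta_j$ we take
      $$\beta_j = b_j \pm t_{.975,\, n-k-1} \, \mathrm{SE}(b_j)$$
      where $t_{.975,\, n-k-1}$ is the .975 quantile of the $t$-distribution with $n - k - 1$
      degrees of freedom.
► More generally, suppose that we want to test the linear hypothesis
$$
H_0\colon \underset{(q \times k+1)}{\mathbf{L}} \; \underset{(k+1 \times 1)}{\boldsymbol{\beta}} = \underset{(q \times 1)}{\mathbf{c}}
$$
  where the hypothesis matrix $\mathbf{L}$ and the right-hand-side vector $\mathbf{c}$ (usually
  $\mathbf{0}$) encode the hypothesis.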
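The test statistic for a single coefficient can be sketched as follows. This assumes that the standard errors are taken as the square roots of the diagonal of the estimated coefficient covariance matrix (the unbiased error-variance estimate times the inverse of X'X); the data are simulated and all values are illustrative. The t quantile needed for the confidence interval would come from a table or a statistics library and is not computed here.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated data (illustrative only): n = 40 observations, k = 2 regressors.
n, k = 40, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([2.0, 1.0, 0.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = (e @ e) / (n - k - 1)              # unbiased estimate of the error variance

# Estimated standard errors: square roots of the diagonal of s2 * (X'X)^{-1}.
se = np.sqrt(s2 * np.diag(XtX_inv))

# t-statistics for the hypotheses that each coefficient is zero.
t0 = b / se
print(t0)
```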
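$$
\begin{aligned}
F_0 &= \frac{(\mathbf{L}\mathbf{b})' \left[\mathbf{L}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{L}'\right]^{-1} \mathbf{L}\mathbf{b}}{q s_e^2} \\[1ex]
&= \frac{
[0.599,\ 0.546]
\left(
\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix}
0.1021 & 0.0008 & 0.0008 \\
0.0008 & 0.0001 & 0.0000 \\
0.0008 & 0.0000 & 0.0001
\end{bmatrix}
\begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}
\right)^{-1}
\begin{bmatrix} 0.599 \\ 0.546 \end{bmatrix}
}{2 \times 178.7309} \\[1ex]
&= 101.22 \text{ with 2 and 42 degrees of freedom, } p \approx 0
\end{aligned}
$$
(The entries of $(\mathbf{X}'\mathbf{X})^{-1}$ are displayed rounded to four decimal places, so the
result cannot be reproduced exactly from the displayed values.)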
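To test the hypothesis that the slopes are equal:
$$
\mathbf{L} = \begin{bmatrix} 0 & 1 & -1 \end{bmatrix}
$$
$$
\mathbf{L}\mathbf{b} = \begin{bmatrix} 0 & 1 & -1 \end{bmatrix}
\begin{bmatrix} 6.06466 \\ 0.59873 \\ 0.54583 \end{bmatrix}
= 0.05290 \quad \text{(i.e., the difference in slopes)}
$$
$$
\begin{aligned}
F_0 &= \frac{(\mathbf{L}\mathbf{b})' \left[\mathbf{L}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{L}'\right]^{-1} \mathbf{L}\mathbf{b}}{q s_e^2} \\[1ex]
&= \frac{
0.053
\left(
\begin{bmatrix} 0 & 1 & -1 \end{bmatrix}
\begin{bmatrix}
0.1021 & 0.0008 & 0.0008 \\
0.0008 & 0.0001 & 0.0000 \\
0.0008 & 0.0000 & 0.0001
\end{bmatrix}
\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}
\right)^{-1}
0.053
}{1 \times 178.7309} \\[1ex]
&= 0.068 \text{ with 1 and 42 degrees of freedom, } p = .80
\end{aligned}
$$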
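The same computation can be wrapped in a small function. The sketch below is an illustration on simulated data, not a reproduction of the example above: `f_statistic` is a hypothetical helper name, the data are random, and the function subtracts the right-hand-side vector from the estimated linear combination so that it also covers a nonzero right-hand side (with a zero vector it reduces to the formula shown above).

```python
import numpy as np

def f_statistic(X, y, L, c):
    """F statistic for a linear hypothesis (hypothetical helper for illustration)."""
    n_obs, n_coef = X.shape              # n_coef = number of coefficients
    q = L.shape[0]                       # number of hypothesis rows
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    s2 = (e @ e) / (n_obs - n_coef)      # unbiased error-variance estimate
    d = L @ b - c                        # estimated departure from the hypothesis
    F0 = d @ np.linalg.solve(L @ XtX_inv @ L.T, d) / (q * s2)
    return F0, q, n_obs - n_coef

rng = np.random.default_rng(4)
n = 45                                   # simulated data, illustrative only
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([6.0, 0.6, 0.55]) + rng.normal(size=n)

# Hypothesis that both slopes are zero (two rows).
L_both = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
# Hypothesis that the two slopes are equal (one row).
L_equal = np.array([[0.0, 1.0, -1.0]])

print(f_statistic(X, y, L_both, np.zeros(2)))
print(f_statistic(X, y, L_equal, np.zeros(1)))
```

Each call returns the F statistic together with its numerator and denominator degrees of freedom. With a single-row hypothesis matrix that picks out one coefficient, the statistic equals the square of that coefficient's t-statistic.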