

STAT 371 (Winter 2013 - 1135)


Statistics for Businesses I
Prof. H. Fahmy
University of Waterloo
LaTeXer: W. Kong
<http://stochasticseeker.wordpress.com/>
Last Revision: June 9, 2013

Table of Contents

1 Review
  1.1 Methods of Estimation

2 The General Linear Regression Model (GLRM)
  2.1 The Classical Assumptions of the GLRM

3 Estimates and Estimators

4 Analysis of Variance (ANOVA)
  4.1 Adjusted R^2 Statistic
  4.2 Generalized ANOVA

5 Statistical Inference and the GLRM
  5.1 Single Variable Inference
  5.2 Inference in the GLRM
  5.3 R Framework
  5.4 R Test
  5.5 Single Restriction R Test
  5.6 Applications
  5.7 One-sided vs. Two-sided Tests
  5.8 Multiple Restriction R Test
  5.9 Test of the Goodness of Fit
  5.10 Tests for Multiple Restrictions

Appendix A

These notes are currently a work in progress, and as such may be incomplete or contain errors.


Acknowledgments:
Special thanks to Michael Baker and his LaTeX-formatted notes. They were the inspiration for the structure of these notes.



Abstract
The purpose of these notes is ...



Errata
Dr. H. Fahmy
Office: M3 2018
Office hours: T, Th, 4-5pm
Midterm: Thursday, June 13th, 2013 @ 30% (2:30pm-4:30pm)
Assignments: 4 assignments @ 5% each = 20%
Exam: Final exam @ 50%

1 Review

Definition 1.1. Given a regression $Y_t = f(X_t)$, $Y_t$ is called the response variable or regressand, and $X_t$ is called the explanatory variable or regressor.
Definition 1.2. Here are the steps of model building:
1) Specification (define variables, gather data)
2) Estimation (MLE, Least Squares, GMM, Report/ANOVA)
3) Evaluation (inference)
4) Assessing the validity of your results
Definition 1.3. The error term is a random part of a regression model that accounts for all information that is not captured
by the model. The presence of an error term indicates a stochastic formulation and the lack of one makes it a deterministic
formulation.
Note 1. In the model $Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t$, $Y_t$ and $X_t$ are observed variables, $\beta_0$ and $\beta_1$ are true unknown parameters, and $\varepsilon_t$ is an unobserved error term.

1.1 Methods of Estimation

(Reviewed in the Tutorial and omitted here; the methods discussed were Least Squares and MLE for a simple linear regression.)

2 The General Linear Regression Model (GLRM)

From this point forward, the author assumes that the reader has a good understanding of linear algebra.
Definition 2.1. We define the GLRM as follows. Suppose that we have $k$ explanatory variables (including the constant variable) and $n$ equations ($n$ is the number of observations) with $k < n$. Let $X_{ab}$ be the $b$-th observation of the $a$-th variable, $Y_t$ be the $t$-th observation, and $\varepsilon_t$ be the $t$-th error term.

Define $Y = (Y_1, Y_2, \ldots, Y_n)^t$, $U = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)^t$, $\beta = (\beta_1, \beta_2, \ldots, \beta_k)^t$, and a matrix $X \in M_{n \times k}(\mathbb{R})$ whose $n$-th row and $m$-th column entry is $X_{mn}$, with $X_{1n} = 1$ for all $n$. That is, the $l$-th column is the vector of observations of the $l$-th explanatory variable.

The GLRM in compact form is
1) The true model: $Y = X\beta + U$
We also define:
2) The estimated model: $\hat{Y} = X\hat{\beta}$
3) The residual: $\hat{U} = Y - \hat{Y}$

Note that $Y = X\hat{\beta} + \hat{U}$.

From the least squares method,
$$RSS = \sum_{t=1}^{n} \hat{\varepsilon}_t^2 = \langle \hat{U}, \hat{U} \rangle$$
and we want to minimize RSS by changing $\hat{\beta}$ (ordinary least squares). Note that
$$RSS = \langle \hat{U}, \hat{U} \rangle = (Y - X\hat{\beta})^t (Y - X\hat{\beta}) = Y^t Y - Y^t X\hat{\beta} - \hat{\beta}^t X^t Y + \hat{\beta}^t X^t X \hat{\beta} = Y^t Y - 2\hat{\beta}^t X^t Y + \hat{\beta}^t X^t X \hat{\beta}$$
and using first order conditions, we want $\frac{\partial RSS}{\partial \hat{\beta}_{k \times 1}} = 0_{k \times 1}$ where
$$\frac{\partial RSS}{\partial \hat{\beta}_{k \times 1}} = -2X^t Y + 2X^t X \hat{\beta} = 0_{k \times 1} \implies \hat{\beta}_{OLS} = (X^t X)^{-1} X^t Y$$
and note that the order of the variable in the denominator of the partial must match the order of the result of the partial. The equation
$$\frac{\partial RSS}{\partial \hat{\beta}_{k \times 1}} = -2X^t Y + 2X^t X \hat{\beta} = 0_{k \times 1}$$
is called the normal equation.

Note that we assume that $X$ is of full column rank $k$ in order for $X^t X$ to be invertible, since $\text{null}(A^t A) = \text{null}(A)$ together with the rank-nullity theorem.
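As a quick illustration of the normal equations (the notes contain no code; this is a sketch in Python with NumPy, and the data is made up):

```python
import numpy as np

# Hypothetical data: n = 50 observations, k = 3 columns (constant + 2 regressors)
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 0.7, 0.2])
Y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Solve the normal equation X'X beta = X'Y rather than inverting X'X explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Fitted values and residuals: Y_hat = X beta_hat, U_hat = Y - Y_hat
Y_hat = X @ beta_hat
U_hat = Y - Y_hat
print(beta_hat, U_hat @ U_hat)  # estimates close to beta_true, and the RSS
```

Solving the linear system is numerically preferable to forming $(X^t X)^{-1}$ directly, though both implement the same estimator.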
Example 2.1. In a simple regression, we have
$$X^t X = \begin{pmatrix} n & \sum X_{2t} \\ \sum X_{2t} & \sum X_{2t}^2 \end{pmatrix}, \quad X^t Y = \begin{pmatrix} \sum Y_t \\ \sum X_{2t} Y_t \end{pmatrix}$$
Note that we also use the notation $x_t = (X_t - \bar{X})$, $y_t = (Y_t - \bar{Y})$, which we call deviation form.
Example 2.2. Consider the stochastic presentation of the Cobb-Douglas production function
$$Q_t = c L_t^{\alpha} K_t^{\beta} e^{\varepsilon_t}$$
where $\varepsilon_t$ is the error term. If we are given data from 2000 to 2010 per year, we are given 11 observations.

To model this we do the following:
1) (Estimation) Linearize the model [Log-log model]:
$$\ln Q_t = \ln c + \alpha \ln L_t + \beta \ln K_t + \varepsilon_t$$
2) (Estimation) Re-parametrize: $Y_t = \ln Q_t$, $\beta_1 = \ln c$, $\beta_2 = \alpha$, $X_{2t} = \ln L_t$, $\beta_3 = \beta$, $X_{3t} = \ln K_t$, and so
$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \varepsilon_t$$
3) (Estimation) Calculate $\hat{\beta}_{OLS} = (X^t X)^{-1} X^t Y$ where $X \in M_{11 \times 3}(\mathbb{R})$ and $Y \in M_{11 \times 1}(\mathbb{R})$,
$$X^t X = \begin{pmatrix} n & \sum X_{2t} & \sum X_{3t} \\ \sum X_{2t} & \sum X_{2t}^2 & \sum X_{2t} X_{3t} \\ \sum X_{3t} & \sum X_{2t} X_{3t} & \sum X_{3t}^2 \end{pmatrix}_{3 \times 3}$$
which is fairly difficult to invert. Instead we work with the deviation form: from
$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \varepsilon_t$$
$$\bar{Y} = \beta_1 + \beta_2 \bar{X}_{2t} + \beta_3 \bar{X}_{3t} + \bar{\varepsilon}$$
subtracting the second equation from the first gives
$$y_t = \beta_2 x_{2t} + \beta_3 x_{3t} + \varepsilon_t$$


This creates a new matrix form $y = x\beta' + U$ where $y$ contains the $y_t$'s, $x$ contains the $x_t$'s, $U$ contains the $\varepsilon_t$'s, and $\beta'$ contains only $\beta_2$ and $\beta_3$. So we now have
$$x^t x = \begin{pmatrix} \sum x_{2t}^2 & \sum x_{2t} x_{3t} \\ \sum x_{2t} x_{3t} & \sum x_{3t}^2 \end{pmatrix}$$
which is easier to invert. Thus,
$$\hat{\beta}' = \begin{pmatrix} \sum x_{2t}^2 & \sum x_{2t} x_{3t} \\ \sum x_{2t} x_{3t} & \sum x_{3t}^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum x_{2t} y_t \\ \sum x_{3t} y_t \end{pmatrix}$$
and we can deduce $\hat{\beta}_1$ using
$$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}_2 - \hat{\beta}_3 \bar{X}_3$$

Example 2.3. Let the true model be
$$Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \varepsilon_t, \quad t = 1, 2, \ldots, 23$$
where $Y_t$ is the log output, $X_{2t}$ is the log labour output, and $X_{3t}$ is the log capital output. The data given (in deviation form) is
$$\sum x_{2t}^2 = 12, \quad \sum x_{3t}^2 = 12, \quad \sum x_{2t} x_{3t} = 8, \quad \sum x_{2t} y_t = 10, \quad \sum x_{3t} y_t = 8, \quad \sum y_t^2 = 10.$$
We want to estimate the model using the least squares estimate and explain the meaning of the estimated coefficients. The following is the solution.
$$y_t = \beta_2 x_{2t} + \beta_3 x_{3t} + \varepsilon_t, \quad n = 23,\ k = 2$$
where
$$\hat{\beta}' = \begin{pmatrix} 12 & 8 \\ 8 & 12 \end{pmatrix}^{-1} \begin{pmatrix} 10 \\ 8 \end{pmatrix} = \begin{pmatrix} 0.15 & -0.10 \\ -0.10 & 0.15 \end{pmatrix} \begin{pmatrix} 10 \\ 8 \end{pmatrix} = \begin{pmatrix} 0.7 \\ 0.2 \end{pmatrix}$$
and so $\hat{\beta}_2 = 0.7$, $\hat{\beta}_3 = 0.2$. Thus, our model is
$$\hat{Y}_t = \hat{\beta}_1 + 0.7 X_{2t} + 0.2 X_{3t}$$
and note that the betas are actually the $X_t$ elasticities of $Y_t$. Here the $A$ elasticity of $B$ is given by
$$E_{AB} = \frac{\%\Delta B}{\%\Delta A} = \frac{dB}{dA} \cdot \frac{A}{B}$$
Summary 1. Here are a few relevant statistical models:
1) Log-log model: $\ln Y_t = \beta_1 + \beta_2 \ln X_t$
2) Semi-log model: $\ln Y_t = \beta_1 + \beta_2 X_t$
3) Linear model: $Y_t = \beta_1 + \beta_2 X_t$
4) Growth models: $\Delta \ln Y_t = \beta_1 + \beta_2 \Delta \ln X_t$
(For Midterm Review: p. 1-10 (Chapter 1: Simple linear regression), deviation form, p. 57-61 (Chapter 2: GLRM))
Summary 2. Recall the normal equations that are determined by the condition
$$\frac{\partial RSS}{\partial \hat{\beta}} = 0$$
which produces the (normal) equations
$$X^t X \hat{\beta} = X^t Y \implies \hat{\beta}_{OLS} = (X^t X)^{-1} X^t Y$$
where $RSS = \hat{U}^t \hat{U}$.
Definition 2.2. We say that $A$ and $B$ are orthogonal ($A \perp B$) when $A^t B = B^t A = 0$.

Remark 2.1. Note that $\hat{Y} = X\hat{\beta}$ is the projection of $Y = X\hat{\beta} + \hat{U}$ onto the column space of $X$, with orthogonal component $\hat{U}$ (that is, $X \perp \hat{U}$). This can also be shown using the above normal equations.
Corollary 2.1. (The following are found in p. 61-69 in the course book)
(1) $\hat{\beta}$ is unique.
(2) You can find $\hat{Y} = X\hat{\beta}$ by projecting $Y$ onto the column space of $X$.

Remark 2.2. Any idempotent and symmetric matrix is a projection matrix, and for any idempotent matrix, its rank is equal to its trace. Using this, note that the linear operator $M = (I - \text{Proj}_X) = (I - X(X^t X)^{-1} X^t)$ applied to $U$ produces $\hat{U}$. [Called result #11]

2.1 The Classical Assumptions of the GLRM

1. The model is true as specified (below are some examples of violations)
   (a) Overidentification or adding irrelevant variables (too many variables)
   (b) Underfitting or omitting a relevant variable (too few variables)
   (c) Wrong functional form (e.g. linear model instead of log-linear model)
2. The $X$s are non-stochastic in repeated sampling; $X$ is treated as constant; if not satisfied, this could indicate a sampling problem
3. The model is linear in the parameters and the error term; it is a linear function in the polynomial ring with coefficients span{Parameters, Error}
4. $X^t X$ is of full rank; $\text{nullity}(X) = 0$; no multicollinearity
5. Assumptions related to the disturbance term $U_{n \times 1}$; if satisfied, the error is said to be white noise:
   (a) If assumption 1 is satisfied, then $E[U] = 0_{n \times 1}$; $E[U|X] = 0_{n \times 1}$
   (b) Homoskedastic error term; $\text{Var}[u_t] = \sigma_u^2$ for all $t$
   (c) No serial correlation between the errors; $\text{Cov}(\varepsilon_t, \varepsilon_s) = 0$, $t \neq s$


Notation 1. We first define notation for the assumptions for simple regression:
(5a) $E[u_t | x_{1t}, \ldots, x_{kt}] = E[u_t] = 0$
(5b) $\text{Var}[u_t] = E[(u_t - 0)^2] = E[u_t^2] = \sigma_u^2$
(5c) $\text{Cov}[u_t, u_s] = E[(u_t - 0)(u_s - 0)] = E[u_t u_s] = 0$, $s \neq t$
And now for general regression (matrix form):
(5a) $E[U_{n \times 1}] = 0_{n \times 1}$
(5b) $\text{Var}[U] = E\left[ (U - E[U])(U - E[U])^t \right] = E[U U^t] = \sigma_u^2 I$, which is a diagonal matrix with diagonal entries equal to the error variance


3 Estimates and Estimators

Summary 3. In general, any estimator (formula) coming from any method of estimation should satisfy certain properties to ensure its reliability. These differ between large and small samples. For small samples ($n \leq 30$), it should have:
1) Unbiasedness: for $\hat{\beta}$, $E[\hat{\beta}] = \beta$
2) Minimum Variance/Efficiency: $\text{Var}(\hat{\beta})$ is small
For large samples ($n \to \infty$), it should have:
1) Consistency: $\text{plim}_{n \to \infty} \hat{\beta} = \beta$
2) Asymptotic Normality: the distribution of $\hat{\beta}$ approaches a normal distribution as $n \to \infty$
Summary 4. We investigate a few properties of $\hat{\beta}_{OLS} = (X^t X)^{-1} X^t Y$. First, we find a few key facts:

a) First note that
$$\hat{\beta} = (X^t X)^{-1} X^t (X\beta + U) = \beta + (X^t X)^{-1} X^t U$$
and so
$$E[\hat{\beta}] = \beta + (X^t X)^{-1} X^t E[U] = \beta$$
by assumption 2, which says that $X$ is non-stochastic, and assumption 5a). So $\hat{\beta}$ is unbiased.

b) Next, let's take a look at the variance. In the $k = 2$ case,
$$\text{Var}\left[ \hat{\beta}_{k \times 1} \right] = \begin{pmatrix} \text{Var}(\hat{\beta}_0) & \text{Cov}(\hat{\beta}_0, \hat{\beta}_1) \\ \text{Cov}(\hat{\beta}_0, \hat{\beta}_1) & \text{Var}(\hat{\beta}_1) \end{pmatrix}$$
Writing this in an alternate form,
$$\text{Var}[\hat{\beta}] = E\left[ (\hat{\beta} - E[\hat{\beta}])(\hat{\beta} - E[\hat{\beta}])^t \right]_{k \times k} = E\left[ (\hat{\beta} - \beta)(\hat{\beta} - \beta)^t \right]_{k \times k}$$
and recall from part a) the equations
$$(1)\ \hat{\beta} = (X^t X)^{-1} X^t (X\beta + U), \quad (2)\ \hat{\beta} = \beta + (X^t X)^{-1} X^t U, \quad (3)\ E[\hat{\beta}] = \beta$$
and from (2) we get
$$\text{Var}[\hat{\beta}] = E\left[ (X^t X)^{-1} X^t U U^t X (X^t X)^{-1} \right] = (X^t X)^{-1} X^t E[U U^t] X (X^t X)^{-1}$$
So using the form of $\text{Var}[U] = E[U U^t] = \sigma_u^2 I$, we get that
$$(4)\ \text{Var}[\hat{\beta}] = \sigma_u^2 (X^t X)^{-1}$$
Thus, we need an estimator for $\sigma_u^2$. The first guess would be $RSS = \hat{U}^t \hat{U}$; that is, $\hat{U}$ is a proxy for $U$ and $RSS$ could be a proxy for $\sigma_u^2$. However, note that this is slightly biased. To see this, first note that
$$(5)\ E[\hat{U}^t \hat{U}] = E\left[ (MU)^t (MU) \right] = E\left[ U^t M^t M U \right], \quad M = I_n - X(X^t X)^{-1} X^t$$
and
$$(6)\ \text{Rank}(M) = \text{tr}(M) = n - \text{tr}\left( (X^t X)^{-1} X^t X \right) = n - \text{tr}(I_k) = n - k$$
Continuing from (5), since $M$ is idempotent and symmetric, note that
$$(7)\ RSS = U^t M^t M U = U^t M U$$
and that for a general $n \times 1$ vector $e$, we have
$$(8)\ e^t e = \text{tr}(e e^t)$$
So finally, using all equations,
$$E[RSS] = E[(MU)^t (MU)] = E[\text{tr}(MU (MU)^t)] = E[\text{tr}(U U^t M^t M)] = \text{tr}\left( E[U U^t] M \right) = \sigma_u^2 \text{tr}(M)$$
and thus
$$(9)\ E[RSS] = \sigma_u^2 (n - k)$$
To create an unbiased estimate then, we use the estimate
$$(10)\ \hat{\sigma}^2 = \frac{RSS}{n - k}$$
We then have
$$(11)\ \widehat{\text{Var}}[\hat{\beta}] = \hat{\sigma}^2 (X^t X)^{-1}$$

c) (Gauss-Markov Theorem) We now show that our estimate $\hat{\beta}_{OLS}$ is efficient and is the best linear unbiased estimator (BLUE). See the course notes for the proof. The formal statement of the theorem is: in the class of linear and unbiased estimators, $\hat{\beta}_{OLS}$ has the minimum variance. That is,
$$\text{Var}\left[ \hat{\beta}_{OLS} \right] \leq \text{Var}\left[ \hat{\beta}_M \right]$$
for any other method $M$ that is linear and unbiased. Thus, $\hat{\beta}_{OLS}$ is the BLUE.
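Collecting the estimator formulas above into one place, here is a minimal sketch (illustrative only) that returns $\hat{\beta}$, $\hat{\sigma}^2 = RSS/(n-k)$, and the estimated covariance matrix $\hat{\sigma}^2 (X^t X)^{-1}$:

```python
import numpy as np

def ols_with_covariance(X, Y):
    """OLS estimate, unbiased residual variance, and covariance of beta_hat."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y
    resid = Y - X @ beta_hat
    sigma2_hat = (resid @ resid) / (n - k)  # unbiased: E[RSS] = sigma^2 (n - k)
    cov_beta = sigma2_hat * XtX_inv         # estimate of Var[beta_hat]
    return beta_hat, sigma2_hat, cov_beta
```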

4 Analysis of Variance (ANOVA)

Remark 4.1. The mean of the residuals is 0; that is, $\bar{\hat{u}} = 0$. This can be seen from the normal equation $X^t X \hat{\beta} - X^t Y = 0$, or from the fact that $X \perp \hat{U}$: since the first column of $X$ is all ones, $\sum \hat{u}_t = 0 \implies \bar{\hat{u}} = 0$.
Definition 4.1. Recall that
$$\underbrace{Y_t}_{\text{Total}} = \underbrace{\hat{Y}_t}_{\text{Explained}} + \underbrace{\hat{u}_t}_{\text{Residual}}$$
We construct the ANOVA table as follows, where everything is expressed in deviation form. Summing the above equation and dividing by $n$, we get
$$\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X}_1 + \cdots + \hat{\beta}_n \bar{X}_n + 0$$
and subtracting this from the first equation, while squaring the result, we get
$$\sum y_t^2 = \sum \left( \hat{\beta}_1 x_{1,t} + \cdots + \hat{\beta}_n x_{n,t} + \hat{u}_t \right)^2 = \sum \left( x\hat{\beta} + \hat{u}_t \right)^2$$
For simple regression (p. 21-24), we get
$$\underbrace{\sum y_t^2}_{TSS} = \sum \left( x_t \hat{\beta}_1 + \hat{u}_t \right)^2 = \underbrace{\hat{\beta}_1^2 \sum x_t^2}_{ESS} + 2\hat{\beta}_1 \underbrace{\sum x_t \hat{u}_t}_{= 0,\ X \perp \hat{U}} + \underbrace{\sum \hat{u}_t^2}_{RSS}$$
We use the notation that $ESS$ is the explained sum of squares, $TSS$ is the total sum of squares, and $RSS$ is the residual sum of squares, all in deviation form. So $ESS = \hat{\beta}_1^2 \sum x_t^2$, $RSS = \sum \hat{u}_t^2$, and $TSS = \sum y_t^2$. The actual ANOVA table looks like

Source      SS (Sum of Squares)                   Df (Degrees of Freedom)   MSS (Mean SS)
Explained   $ESS = \hat\beta_1^2 \sum x_t^2$      $k - 1$                   $ESS/(k-1)$
Residual    $RSS = \sum \hat u_t^2$               $n - k$                   $RSS/(n-k)$
Total       $TSS = \sum y_t^2$                    $n - 1$                   $TSS/(n-1)$

Note that for a simple regression, $k = 2$. The corresponding $F$ statistic is
$$F_{\text{Statistic}} = \frac{ESS/(k-1)}{RSS/(n-k)}$$
and the coefficient of determination or $R^2$ (a crude estimate for the correlation coefficient) is
$$R^2 = 1 - \frac{RSS}{TSS} = \frac{TSS - RSS}{TSS} = \frac{ESS}{TSS}$$
The interpretation for $R^2$ is that it is a measure of the goodness of fit. It shows how much, in percent, of the variation of the dependent variable is being explained by the $X$s of the model.

4.1 Adjusted R^2 Statistic

Remark 4.2. The drawback of the coefficient of determination is that it only improves by adding more $x$s (explanatory variables). This might not always be feasible, so we use the adjusted $R^2$, defined by
$$\bar{R}^2 = 1 - \frac{RSS/(n-k)}{TSS/(n-1)}$$
and this is a better measure since it includes the number of observations; that is, it can be improved by increasing the number of observations. It can be shown (p. 24) that
$$\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-k}$$
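All of these ANOVA quantities can be computed together; a short sketch (assuming $X$ contains a constant column, so the decomposition $TSS = ESS + RSS$ holds):

```python
import numpy as np

def anova_summary(X, Y):
    """TSS/ESS/RSS decomposition, F statistic, R^2 and adjusted R^2."""
    n, k = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    resid = Y - X @ beta_hat
    RSS = resid @ resid
    TSS = np.sum((Y - Y.mean()) ** 2)
    ESS = TSS - RSS
    F = (ESS / (k - 1)) / (RSS / (n - k))
    R2 = 1 - RSS / TSS
    R2_adj = 1 - (1 - R2) * (n - 1) / (n - k)
    return ESS, RSS, TSS, F, R2, R2_adj
```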

4.2 Generalized ANOVA

The following can be found in p. 76 in the course notes.


Definition 4.2. Starting with the general $RSS$, we recall that
$$RSS = \hat{U}^t \hat{U} = (Y - X\hat{\beta})^t (Y - X\hat{\beta}) = Y^t Y - 2\hat{\beta}^t X^t Y + \hat{\beta}^t X^t X \hat{\beta}$$
and substituting $\hat{\beta} = (X^t X)^{-1} X^t Y$ in the last term,
$$RSS = Y^t Y - 2\hat{\beta}^t X^t Y + \hat{\beta}^t X^t Y = Y^t Y - \hat{\beta}^t X^t Y = \sum Y_t^2 - \hat{\beta}^t X^t Y$$
and so $\sum Y_t^2 = \sum \hat{u}_t^2 + \hat{\beta}^t X^t Y = RSS + \hat{\beta}^t X^t Y$. Subtracting $n\bar{Y}^2$ from both sides, we get
$$\underbrace{\sum y_t^2}_{TSS} = \underbrace{\sum \hat{u}_t^2}_{RSS} + \underbrace{\hat{\beta}^t X^t Y - n\bar{Y}^2}_{ESS}$$
Thus, the general ANOVA table is

Source      SS (Sum of Squares)                         Df (Degrees of Freedom)   MSS (Mean SS)
Explained   $ESS = \hat\beta^t X^t Y - n\bar Y^2$       $k - 1$                   $ESS/(k-1)$
Residual    $RSS = \sum \hat u_t^2$                     $n - k$                   $RSS/(n-k)$
Total       $TSS = \sum y_t^2$                          $n - 1$                   $TSS/(n-1)$

The $F$ statistic and the coefficient of determination are defined in the same way as in the previous section (in terms of $TSS$, $ESS$, $RSS$).
Summary 5. Let's recap all relevant information up to this point.
1. True model: $Y = X\beta + U$
2. Estimated residual: $\hat{U} = Y - \hat{Y}$
3. ANOVA
   (a) The decomposition $Y = X\hat{\beta} + \hat{U}$, which gives
   $$\underbrace{Y^t Y - n\bar{Y}^2}_{TSS} = \underbrace{\hat{\beta}^t X^t Y - n\bar{Y}^2}_{ESS} + \underbrace{\sum \hat{u}_t^2}_{RSS}$$
   (b) $R^2 = 1 - \frac{RSS}{TSS} = \frac{TSS - RSS}{TSS} = \frac{ESS}{TSS}$
   (c) $F_{\text{Statistic}} = \frac{ESS/(k-1)}{RSS/(n-k)}$
   (d) $\bar{R}^2 = 1 - \frac{RSS/(n-k)}{TSS/(n-1)} = 1 - (1 - R^2)\frac{n-1}{n-k}$
4. Regression
   (a) $\hat{\beta} = (X^t X)^{-1} X^t Y$
   (b) $E[\hat{\beta}] = \beta$; $\hat{\beta}$ is unbiased
   (c) $\text{Var}[\hat{\beta}] = \sigma_u^2 (X^t X)^{-1}$
   (d) $\hat{\beta}$ is the BLUE [Gauss-Markov]
   (e) $\hat{\sigma}_u^2 = \frac{RSS}{n-k}$

5 Statistical Inference and the GLRM

We start with some basic results from statistical theory.

(1) Recall that if $X \sim N(\mu, \sigma^2)$, then
$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right]$$
(2) $Z = \frac{X - \mu}{\sigma}$ is distributed as $Z \sim N(0, 1)$.
(3) The sum of squares of $n$ independent standard normal variates is distributed as $\chi^2(n)$ with $n$ degrees of freedom. That is,
$$Z_i \sim N(0, 1),\ Z_i \perp Z_j,\ i \neq j \implies \sum_{i=1}^{n} Z_i^2 \sim \chi^2(n)$$
(4) (W. Gosset) The ratio of a standard normal variable $Z$ over the square root of a chi-square distributed r.v. $V$ over its degrees of freedom $r$ is distributed as a $t$-distribution with $r$ degrees of freedom, provided that $Z \perp V$. That is,
$$\frac{Z}{\sqrt{V/r}} \sim t(r), \quad Z \perp V$$
(5) The ratio of two chi-square random variables over their corresponding degrees of freedom gives a Fisher $F$ distribution, provided that the r.v.s are statistically independent. That is, if $U \sim \chi^2(r_1)$ and $V \sim \chi^2(r_2)$ then
$$\frac{U/r_1}{V/r_2} \sim F(r_1, r_2), \quad U \perp V$$
(6) Any linear combination of a set of normal random variables is also normal, with a different mean and a different variance.
Example 5.1. Let $Y_t = \beta_0 + \beta_1 X_t + u_t$, $\hat{Y}_t = \hat{\beta}_0 + \hat{\beta}_1 X_t$ and $\hat{u}_t = Y_t - \hat{Y}_t$. If $u_t$ is normally distributed with mean 0 and variance $\sigma_u^2$, then $u_t \sim N(0, \sigma_u^2)$ for all $t$. Then, $Y_t = \beta_0 + \beta_1 X_t + u_t$ satisfies $Y_t \sim N(\beta_0 + \beta_1 X_t, \sigma_u^2)$. So if $u_t$ is normal, then $Y_t$ is normal with the same variance.
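Result (4) is easy to sanity-check by simulation; the following sketch (an illustration, not from the notes) builds the ratio $Z/\sqrt{V/r}$ and compares an empirical tail probability with the $t(7)$ critical value 2.365 used later in these notes:

```python
import numpy as np

rng = np.random.default_rng(1)
r = 7
Z = rng.standard_normal(100_000)       # Z ~ N(0, 1)
V = rng.chisquare(df=r, size=100_000)  # V ~ chi^2(r), independent of Z
T = Z / np.sqrt(V / r)                 # result (4): T ~ t(r)

print(np.mean(np.abs(T) > 2.365))      # approximately 0.05
```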

5.1 Single Variable Inference

(7) Hypothesis Testing + Confidence Intervals / Inference:

1. Formulate the hypothesis: $H_0$, $H_1$
   (a) (Example 1) Suppose we are given a model and estimate:
   $$\text{Cons}_t = \beta_0 + \beta_1 \text{Income}_t + u_t$$
   $$\widehat{\text{Cons}}_t = 10 + (0.8)\text{Income}_t$$
   We claim that $\beta_1 = 0.9$. The null hypothesis is the claim ($H_0: \beta_1 = 0.9$) and the alternate hypothesis is the objective, goal, or defense of the study ($H_1: \beta_1 \neq 0.9$).
   (b) (Example 2) Testing if income is a significant variable in your model gives $H_0: \beta_k = 0$, $H_1: \beta_k \neq 0$ (called the Test of Significance of One Parameter).
   (c) (Example 3) Using the model in Ex. 1, we claim that we expect the sign of $\beta_1$ to be positive. We have $H_0: \beta_1 \leq 0$, $H_1: \beta_1 > 0$ (called the Test of Expected Sign of a Coefficient).
2. Create the test statistic
   (a) (Example 1) From above, we need a distribution for the estimator of $\beta_1$. To do this, we need some assumptions regarding the disturbance term $u_t$. Thus, we assume it to be normally distributed for all $t$ (Assumption 7), which is needed to perform inference. It can be shown that
   $$t = \frac{\hat{\beta}_1 - \beta_1}{sd(\hat{\beta}_1)} \sim t(n - k)$$
   (b) Aside: we do not need the normality assumption above to ensure that $\hat{\beta}_{OLS}$ is B.L.U.E.
3. Decision (critical value vs. statistic OR p-value)
   (a) (Example 1) From above, we must make a decision by comparing
   $$|t_{\text{Statistic}}| > t_{\text{Critical}} \quad [t\text{ table}]$$

5.2 Inference in the GLRM

Suppose that
$$Y = X\beta + U, \quad U_{n \times 1} \sim N(0_{n \times 1}, \sigma^2 I_n)$$
and using result (6), we have that
$$Y_{n \times 1} \sim N(X\beta, \sigma^2 I_n)$$
and since $\hat{\beta} = (X^t X)^{-1} X^t Y$, which is a linear combination of a normal r.v., we have
$$\hat{\beta} \sim N(\beta, \sigma^2 (X^t X)^{-1})$$
where we estimate $\sigma^2$ with $\hat{\sigma}^2 = \frac{RSS}{n-k}$.

The following is the general framework for testing, called the R test.

5.3 R Framework
Example 5.2. Let $k = 5$, $Y_t = \beta_1 + \beta_2 X_{2t} + \cdots + \beta_5 X_{5t} + u_t$, and suppose we are testing the hypothesis that $\beta_1 + \beta_2 + \beta_3 = 0$. That is,
$$H_0: \beta_1 + \beta_2 + \beta_3 = 0, \quad H_1: \beta_1 + \beta_2 + \beta_3 \neq 0$$
Suppose we have $q$ restrictions, written as
$$r_{q \times 1} = R_{q \times k} \beta_{k \times 1}$$
Then since $q = 1$ in this case,
$$r = 0_{1 \times 1}, \quad R = \begin{pmatrix} 1 & 1 & 1 & 0 & 0 \end{pmatrix}$$
and we can reform the hypothesis as $H_0: R\beta = r$ and $H_1: R\beta \neq r$.


Example 5.3. If the restriction is $\beta_1 + \beta_2 + \beta_4 = 1$ and $\beta_3 = 0$, then
$$r = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad R = \begin{pmatrix} 1 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{pmatrix}$$
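Setting up $R$ and $r$ numerically is straightforward; a small sketch for Example 5.3 (illustration only):

```python
import numpy as np

# k = 5; restrictions beta1 + beta2 + beta4 = 1 and beta3 = 0, i.e. R beta = r
R = np.array([[1.0, 1.0, 0.0, 1.0, 0.0],   # beta1 + beta2 + beta4
              [0.0, 0.0, 1.0, 0.0, 0.0]])  # beta3
r = np.array([1.0, 0.0])

beta = np.array([0.5, 0.3, 0.0, 0.2, 0.7])  # a hypothetical beta satisfying H0
print(np.allclose(R @ beta, r))             # True
```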

Definition 5.1. Let $\hat{\beta} = (X^t X)^{-1} X^t Y$ be the unrestricted $\hat{\beta}_{OLS}$. Then let $\hat{\beta}_R$ be the restricted OLS under $H_0: R\beta = r$. The derivation of the formula for $\hat{\beta}_R$ is as follows. We want to
$$\min_{\{\hat{\beta}_R\}} RSS_R = \hat{U}_R^t \hat{U}_R \quad \text{subject to} \quad R\hat{\beta}_R = r$$
and using Lagrange multipliers, define
$$\mathcal{L} = \underbrace{(Y - X\hat{\beta}_R)^t (Y - X\hat{\beta}_R)}_{1 \times 1} + \underbrace{\lambda^t_{1 \times q} \left[ r - R\hat{\beta}_R \right]_{q \times 1}}_{1 \times 1}$$
The first order condition with respect to $\hat{\beta}_R$ gives
$$(1)\ \frac{\partial \mathcal{L}}{\partial \hat{\beta}_R} = -2X^t Y + 2X^t X \hat{\beta}_R - R^t_{k \times q} \lambda_{q \times 1} = 0_{k \times 1}$$
and the first order condition with respect to $\lambda$ gives
$$(2)\ \frac{\partial \mathcal{L}}{\partial \lambda} = r - R\hat{\beta}_R = 0_{q \times 1}$$
From (1), $\lambda$ cannot be isolated directly because $R$ is generally not invertible. To get an expression for $\lambda$, we multiply (1) by $R(X^t X)^{-1}$ to get
$$-2R\hat{\beta} + 2R\hat{\beta}_R - R(X^t X)^{-1} R^t \lambda = 0$$
and under $H_0$ (using (2), $R\hat{\beta}_R = r$), we can rewrite this as
$$(3)\ -2R\hat{\beta} + 2r - R(X^t X)^{-1} R^t \lambda = 0 \implies \lambda = 2\left[ R(X^t X)^{-1} R^t \right]^{-1} (r - R\hat{\beta})$$
and plugging (3) back into (1), we can solve for $\hat{\beta}_R$ to get
$$(4)\ \hat{\beta}_R = \hat{\beta} + (X^t X)^{-1} R^t \left[ R(X^t X)^{-1} R^t \right]^{-1} (r - R\hat{\beta})$$
where our restricted $\hat{\beta}_R$ is a function of the unrestricted $\hat{\beta}$. (Note here that p. 86 and 87 in the textbook is just extra and not testing material.) It can also be shown that
$$E[\hat{\beta}_R] = \beta$$
$$\text{Var}[\hat{\beta}_R] = \sigma_u^2 (I - AR)(X^t X)^{-1}(I - AR)^t, \quad A = (X^t X)^{-1} R^t \left[ R(X^t X)^{-1} R^t \right]^{-1}$$
which is used in the construction of confidence intervals.
Summary 6. In short, given
$$H_0: R\beta = r \quad \text{and} \quad H_1: R\beta \neq r$$
then
$$\hat{\beta} = (X^t X)^{-1} X^t Y$$
$$\hat{\beta}_R = \hat{\beta} + (X^t X)^{-1} R^t \left[ R(X^t X)^{-1} R^t \right]^{-1} (r - R\hat{\beta})$$
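Formula (4) translates directly into code; a minimal sketch (illustrative, not the course's software):

```python
import numpy as np

def restricted_ols(X, Y, R, r):
    """Restricted least squares under H0: R beta = r, via formula (4) above."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y                # unrestricted OLS
    RXR_inv = np.linalg.inv(R @ XtX_inv @ R.T)  # [R (X'X)^{-1} R']^{-1}
    beta_R = beta_hat + XtX_inv @ R.T @ RXR_inv @ (r - R @ beta_hat)
    return beta_hat, beta_R
```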

5.4 R Test

Summary 7. There are 3 important tests that we use the above framework (R framework) for:
(1) Testing the significance of ONE coefficient [t-test] (two-sided)
(2) Testing the expected sign of ONE coefficient [t-test] (right/left sided)
(3) Testing the significance of the WHOLE relation [F-test] (based on the ANOVA table)

5.5 Single Restriction R Test

Let's test the validity of one single restriction. The applications include (1) testing the significance of one parameter and (2) testing the expected sign of the coefficient. Let $H_0: R\beta = r$, $H_1: R\beta \neq r$.
Given $U \sim N(0, \sigma_u^2 I_n)$, then
$$\frac{U - 0}{\sigma_u} \sim N(0, I_n), \quad \left( \frac{U}{\sigma_u} \right)^t \left( \frac{U}{\sigma_u} \right) = \frac{U^t U}{\sigma_u^2} \sim \chi^2(n)$$
and it also follows that $Y = X\beta + U$ is normal such that $Y \sim N(X\beta, \sigma^2 I_n)$, and since $\hat{\beta} = (X^t X)^{-1} X^t Y$, then $\hat{\beta} \sim N(\beta, \sigma^2 (X^t X)^{-1})$. We also use
$$\hat{\sigma}_u^2 = \frac{RSS}{n-k} = \frac{\hat{U}^t \hat{U}}{n-k} \implies \hat{U}^t \hat{U} = (n-k)\hat{\sigma}_u^2$$
and from the above,
$$R\hat{\beta} \sim N(R\beta, \sigma^2 R(X^t X)^{-1} R^t)$$
Define
$$Z = \frac{R\hat{\beta} - R\beta}{\sqrt{\sigma^2 R(X^t X)^{-1} R^t}} = \frac{R\hat{\beta} - r}{\sqrt{\sigma^2 R(X^t X)^{-1} R^t}} \sim N(0, 1)$$
by $H_0$. Since $\sigma^2$ is not known, we use the estimate for $\sigma^2$ above instead.

Claim 5.1. The statistic computed using $\hat{\sigma}_u^2$ instead of $\sigma_u^2$ is distributed as $t(n-k)$.
Proof. Let $M = I - P = I - X(X^t X)^{-1} X^t$. From the above, note that
$$\frac{U^t M^t M U}{\sigma_u^2} = \frac{U^t M U}{\sigma_u^2} = \left( \frac{MU}{\sigma_u} \right)^t \left( \frac{MU}{\sigma_u} \right) \sim \chi^2(n-k)$$
(which is a sum of squares of normal r.v.s), and since $MU = \hat{U}$, then
$$\frac{\hat{U}^t \hat{U}}{\sigma_u^2} = \frac{(n-k)\hat{\sigma}_u^2}{\sigma_u^2} \sim \chi^2(n-k)$$
and so
$$(13)\ \frac{R\hat{\beta} - r}{\sqrt{\hat{\sigma}_u^2 R(X^t X)^{-1} R^t}} = \frac{R\hat{\beta} - r}{\sqrt{\sigma_u^2 R(X^t X)^{-1} R^t}} \bigg/ \sqrt{\frac{(n-k)\hat{\sigma}_u^2 / \sigma_u^2}{n-k}} \sim t(n-k)$$
if and only if $Z$ and $V$ are independent (proof in textbook). $\square$

5.6 Applications

(1) Test the significance of one parameter.

Equation (13) simplifies to $t = \frac{\hat{\beta}_j - \beta_{j,\text{Null}}}{sd(\hat{\beta}_j)}$.

Example 5.4. For a simple linear regression, we know that
$$\text{Var}(\hat{\beta}_1) = \frac{\hat{\sigma}_u^2}{\sum x_t^2}, \quad \sum x_t^2 = \sum X_t^2 - n\bar{X}^2$$
and we want to test the significance of $\beta_1$. We are given $sd(\hat{\beta}_1) = 0.1$, $\hat{\beta}_1 = 0.8$. Here, let's say that $Y_t$ is consumption and $X_t$ is income. Testing the significance gives us:

1) $H_0: \beta_1 = 0 \implies R\beta = r$; $H_1: \beta_1 \neq 0 \implies R\beta \neq r$.

2) $t = \frac{\hat{\beta}_1 - 0}{\sqrt{\text{Var}(\hat{\beta}_1)}} \sim t(n-k)$, which is equivalent to $\frac{R\hat{\beta} - r}{\sqrt{\hat{\sigma}^2 R(X^t X)^{-1} R^t}} \sim t(n-k)$. To see this, note that
$$R = \begin{pmatrix} 0 & 1 \end{pmatrix}, \quad r = 0$$
and
$$(X^t X)^{-1} = \frac{1}{n\sum X_t^2 - \left( \sum X_t \right)^2} \begin{pmatrix} \sum X_t^2 & -\sum X_t \\ -\sum X_t & n \end{pmatrix} \implies R(X^t X)^{-1} R^t = \frac{n}{n\sum X_t^2 - \left( \sum X_t \right)^2} = \frac{1}{\sum X_t^2 - n\bar{X}^2}$$
which in deviation form reduces to
$$R(X^t X)^{-1} R^t = \frac{1}{\sum x_t^2} \implies \hat{\sigma}_u^2 R(X^t X)^{-1} R^t = \frac{\hat{\sigma}_u^2}{\sum x_t^2}$$
and so
$$\sqrt{\text{Var}(\hat{\beta}_1)} = \sqrt{\hat{\sigma}_u^2 R(X^t X)^{-1} R^t} = \sqrt{\frac{\hat{\sigma}_u^2}{\sum x_t^2}} = sd(\hat{\beta}_1)$$
Thus our statistic is
$$t_{\text{Statistic}} = \frac{\hat{\beta}_1 - 0}{\sqrt{\text{Var}(\hat{\beta}_1)}} = \frac{0.8}{0.1} = 8$$
and we reject the hypothesis based on the $t_{\text{Critical}} = t_{\text{Tabulated}}$ defined by
$$2P(t \geq t_{\text{Critical}}) = \alpha$$
for a level of significance $\alpha$ in a two-tailed test. $H_0$ is rejected when $|t_{\text{Statistic}}| > t_{\text{Critical}}$. Therefore this parameter $\beta_1$, income, is significant with a 95% confidence level.

To construct the confidence interval, we want the interval determined by
$$\Pr\left( \hat{\beta}_1 - sd(\hat{\beta}_1)\, t^{\alpha/2}_{n-k} < \beta_1 < \hat{\beta}_1 + sd(\hat{\beta}_1)\, t^{\alpha/2}_{n-k} \right) = 1 - \alpha$$
$$\implies \Pr(0.604 < \beta_1 < 0.996) = 0.95$$
where we interpret this as: we are 95% confident that the interval $[0.604, 0.996]$ contains $\beta_1$.
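A numerical sketch of this t-test and confidence interval (using only the numbers given above; the interval $[0.604, 0.996]$ implies the large-sample critical value $\approx 1.96$):

```python
from scipy import stats

beta1_hat, sd_beta1 = 0.8, 0.1         # given in Example 5.4
t_stat = (beta1_hat - 0.0) / sd_beta1  # = 8.0

t_crit = stats.norm.ppf(0.975)         # 1.96; use stats.t.ppf(0.975, n - k) for small n
print(abs(t_stat) > t_crit)            # True -> reject H0: beta1 = 0

ci = (beta1_hat - t_crit * sd_beta1, beta1_hat + t_crit * sd_beta1)
print(ci)                              # (0.604, 0.996)
```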


Proposition 5.1. The following are equivalent (interchangeable):
1) $H_0: R\beta = r$, $H_1: R\beta \neq r$
2) $(H_0, H_1)$: [Test of significance] $\beta_j = 0$, $\beta_j \neq 0$ {Two-sided}; [Expected sign] $\beta_j \geq (\leq)\ 0$, $\beta_j < (>)\ 0$, where this is the left (right) sided test {One-sided}; [Test of any claim] $\beta_j =$ any value, $\beta_j \neq$ same value {Two-sided}

Remark that it can also be shown that, in the case of a single restriction, the quantity
$$t^2_{\text{statistic}} = \left( \frac{\hat{\beta}_j - \beta_{j,\text{Null}}}{sd(\hat{\beta}_j)} \right)^2 = F = \frac{(R\hat{\beta} - r)^t \left[ R(X^t X)^{-1} R^t \right]^{-1} (R\hat{\beta} - r)/q}{(\hat{U}^t \hat{U})/(n-k)}$$

5.7 One-sided vs. Two-sided Tests

In the two-sided test, the significance level is divided between both tails of the given distribution, while in the one-sided test, the significance level is entirely placed on one tail of the distribution.

5.8 Multiple Restriction R Test

It can be shown that
$$F = \frac{(RSS_R - RSS_{UN})/q}{RSS_{UN}/(n-k)} = \frac{(R\hat{\beta} - r)^t \left[ R(X^t X)^{-1} R^t \right]^{-1} (R\hat{\beta} - r)/q}{(\hat{U}^t \hat{U})/(n-k)}$$

5.9 Test of the Goodness of Fit

This test is based on the ANOVA table, and we sometimes refer to it as the test of the overall significance of the whole relation. The statistic in question is
$$F = \frac{ESS/(k-1)}{RSS/(n-k)} \sim F(k-1, n-k)$$
Example 5.5. (The Theory of the Allocation of Time, Gary Becker)

Suppose that we record $S_t$ = your test score, $T_t$ = study time measured in hours, $E_t$ = your consumption of energy drinks. Define $Y_t = \ln S_t$, $X_{2t} = \ln E_t$ and $X_{3t} = \ln T_t$. Our model will be
$$S_t = c E_t^{\beta_2} T_t^{\beta_3} e^{\varepsilon_t} \implies \ln S_t = \ln c + \beta_2 \ln E_t + \beta_3 \ln T_t + \varepsilon_t$$
which can be rewritten as $Y_t = \beta_1 + \beta_2 X_{2t} + \beta_3 X_{3t} + \varepsilon_t$ where $\beta_1 = \ln c$. Suppose that $n = 10$. If the data is in deviation form,
$$y_t = \beta_2 x_{2t} + \beta_3 x_{3t} + \varepsilon_t$$
and we are given
$$x^t x = \begin{pmatrix} 2 & -1 \\ -1 & 3 \end{pmatrix}, \quad x^t y = \begin{pmatrix} -1 \\ 8 \end{pmatrix}, \quad \sum y_t^2 = 48.2$$
We want to:

1) Estimate $\hat{\beta}_2$, $\hat{\beta}_3$, $SE[\hat{\beta}_2]$, $SE[\hat{\beta}_3]$ and explain the meaning of the coefficients.

Use the least squares unrestricted strategy $\hat{\beta} = (x^t x)^{-1} x^t y$ with
$$(x^t x)^{-1} = \frac{1}{6 - 1} \begin{pmatrix} 3 & 1 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 0.6 & 0.2 \\ 0.2 & 0.4 \end{pmatrix}$$
to get
$$\hat{\beta} = \begin{pmatrix} 0.6 & 0.2 \\ 0.2 & 0.4 \end{pmatrix} \begin{pmatrix} -1 \\ 8 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \end{pmatrix} \implies \hat{\beta}_2 = 1,\ \hat{\beta}_3 = 3$$
with $\hat{Y}_t = \hat{\beta}_1 + X_{2t} + 3X_{3t}$. The coefficients represent the elasticities of test scores. That is, a 1% increase in energy drinks increases test scores by 1% (unit elastic). Similarly, a 1% increase in time studied increases test scores by 3%. To get the variances and covariance, calculate
$$\text{Var}[\hat{\beta}] = \hat{\sigma}_u^2 (x^t x)^{-1}$$
where $\hat{\sigma}_u^2 = \frac{RSS}{n-k}$ and $RSS = TSS - ESS$, $TSS = \sum y_t^2 = 48.2$, $ESS = \hat{\beta}^t x^t y = 23$. So
$$\text{Var}[\hat{\beta}] = \frac{25.2}{10 - 3} \begin{pmatrix} 0.6 & 0.2 \\ 0.2 & 0.4 \end{pmatrix} = 3.6 \begin{pmatrix} 0.6 & 0.2 \\ 0.2 & 0.4 \end{pmatrix} = \begin{pmatrix} 2.16 & 0.72 \\ 0.72 & 1.44 \end{pmatrix}$$
and $\text{Var}[\hat{\beta}_2] = 2.16$, $\text{Var}[\hat{\beta}_3] = 1.44$, $\text{Cov}[\hat{\beta}_2, \hat{\beta}_3] = 0.72$.


2) Test the hypothesis that $\beta_2 = \beta_3$ using a t-test.

Start with $H_0: \beta_2 - \beta_3 = 0$ and $H_1: \beta_2 - \beta_3 \neq 0$. The t-statistic is
$$t_{\text{statistic}} = \frac{(\hat{\beta}_2 - \hat{\beta}_3) - 0}{\sqrt{\text{Var}(\hat{\beta}_2 - \hat{\beta}_3)}} = \frac{(\hat{\beta}_2 - \hat{\beta}_3) - 0}{\sqrt{\text{Var}(\hat{\beta}_2) + \text{Var}(\hat{\beta}_3) - 2\text{Cov}(\hat{\beta}_2, \hat{\beta}_3)}} = \frac{1 - 3 - 0}{\sqrt{2.16}} = -1.36$$
The $t_{\text{critical}} = 2.365$ is based on $\alpha = 5\%$ and $df = 10 - 3 = 7$. Hence, since $|t_{\text{statistic}}| < |t_{\text{critical}}|$, we don't reject $H_0$. So we have a problem in our results.
3) Re-estimate the coefficients imposing the restriction $R$ ($\beta_2 = \beta_3$).

The new model with the restriction imposed is of the form
$$y_t = \beta_3 (x_{2t} + x_{3t}) + u_t$$
We now estimate $\hat{\beta}_{2R}$ and $\hat{\beta}_{3R}$ using
$$\hat{\beta}_R = \hat{\beta} + (x^t x)^{-1} R^t \left[ R(x^t x)^{-1} R^t \right]^{-1} (r - R\hat{\beta})$$
where
$$r = (0), \quad R = \begin{pmatrix} 1 & -1 \end{pmatrix}, \quad R\hat{\beta} = 1 - 3 = -2 \implies r - R\hat{\beta} = 2$$
$$R(x^t x)^{-1} R^t = \begin{pmatrix} 1 & -1 \end{pmatrix} \begin{pmatrix} 0.6 & 0.2 \\ 0.2 & 0.4 \end{pmatrix} \begin{pmatrix} 1 \\ -1 \end{pmatrix} = 0.6 \implies \left[ R(x^t x)^{-1} R^t \right]^{-1} = \frac{5}{3}$$
and so
$$\hat{\beta}_R = \begin{pmatrix} 1 \\ 3 \end{pmatrix} + \begin{pmatrix} 0.6 & 0.2 \\ 0.2 & 0.4 \end{pmatrix} \begin{pmatrix} 1 \\ -1 \end{pmatrix} \cdot \frac{5}{3} \cdot 2 = \begin{pmatrix} 1 \\ 3 \end{pmatrix} + \begin{pmatrix} 0.4 \\ -0.2 \end{pmatrix} \cdot \frac{10}{3} = \begin{pmatrix} 7/3 \\ 7/3 \end{pmatrix}$$
and so $\hat{\beta}_{2R} = \hat{\beta}_{3R} = \frac{7}{3}$.

4) Construct the 95% C.I. for the restricted $\beta_2$.

First note that
$$\text{Var}[\hat{\beta}_R] = \hat{\sigma}_u^2 (I - AR)(x^t x)^{-1}(I - AR)^t, \quad A = (x^t x)^{-1} R^t \left[ R(x^t x)^{-1} R^t \right]^{-1}$$
where using our calculation above gives us
$$A = \begin{pmatrix} 0.4 \\ -0.2 \end{pmatrix} \cdot \frac{5}{3} = \begin{pmatrix} 2/3 \\ -1/3 \end{pmatrix} \implies I - AR = \begin{pmatrix} 1/3 & 2/3 \\ 1/3 & 2/3 \end{pmatrix}$$
and working out the matrices gives us
$$\text{Var}[\hat{\beta}_R] = 3.6 \begin{pmatrix} 1/3 & 2/3 \\ 1/3 & 2/3 \end{pmatrix} \begin{pmatrix} 0.6 & 0.2 \\ 0.2 & 0.4 \end{pmatrix} \begin{pmatrix} 1/3 & 1/3 \\ 2/3 & 2/3 \end{pmatrix} = \begin{pmatrix} 1.2 & 1.2 \\ 1.2 & 1.2 \end{pmatrix}$$
(as expected, both restricted estimates have the same variance, since $\hat{\beta}_{2R} = \hat{\beta}_{3R}$ by construction). So $\text{Var}[\hat{\beta}_{2R}] = 1.2 \implies SE[\hat{\beta}_{2R}] = \sqrt{1.2}$. Now since $t^{\alpha/2 = 2.5\%}_{n-k=7} = 2.365$, our confidence interval becomes
$$\frac{7}{3} \pm 2.365\sqrt{1.2} \approx [-0.26, 4.92]$$
and we say that we are 95% confident that this interval contains $\beta_{2R}$.

5) Test the same hypothesis using a general R test with an $F$ distribution and conclude that $t^2$ (from 2) $= F$ (from 5).

Use the hypotheses $H_0: R\beta = r$, $H_1: R\beta \neq r$. We know that
$$F = \frac{(R\hat{\beta} - r)^t \left[ R(X^t X)^{-1} R^t \right]^{-1} (R\hat{\beta} - r)/q}{(\hat{U}^t \hat{U})/(n-k)} = \frac{(R\hat{\beta} - r)^t \left[ R(X^t X)^{-1} R^t \right]^{-1} (R\hat{\beta} - r)/q}{\hat{\sigma}_u^2} \sim F(q, n-k)$$
and calculating $R\hat{\beta} - r$ gives us $R\hat{\beta} - r = 1 - 3 - 0 = -2$. Hence the statistic is
$$F = \frac{(-2)\left( \frac{5}{3} \right)(-2)/1}{3.6} = 1.85 \implies F = 1.85 = t^2$$
So this is equivalent to the single restriction case. Now the $F$ critical value is
$$F_{\text{Critical}}(\alpha = 5\%, df_1 = q = 1, df_2 = n - k = 7) = 5.59$$
and so we do not reject $H_0$, which is the same conclusion as in 2).
6) Test the significance of the whole relation.

Here, we use the hypotheses $H_0: \beta_2 = \beta_3 = 0$, $H_1: \beta_2 \neq 0$ OR $\beta_3 \neq 0$ OR both. The statistic in this case is
$$F = \frac{ESS/(k-1)}{RSS/(n-k)} \sim F(k-1, n-k)$$
where $k = 3$, $n - k = 7$, $TSS = \sum y_t^2 = 48.2$, $ESS = \hat{\beta}^t x^t y = 23$ and $RSS = TSS - ESS = 25.2$. Thus,
$$F = \frac{23/2}{25.2/7} = 3.19$$
and our $F_{\text{Critical}}$ is
$$F_{\text{Critical}}(\alpha = 5\%, df_1 = k - 1 = 2, df_2 = n - k = 7) = 4.74$$
Since $F < F_{\text{Critical}}$, we do not reject $H_0$ and our relation is insignificant. You will need to correct your specification or adjust your sample.
7) Test the significance of Energy drinks in the model
This is a simple 2-sided t-test so it will be left as an exercise to the reader.
8) Test your belief that the time elasticity of test score is elastic.

We want to test that $\beta_3 > 1$, so we use the hypotheses $H_0: \beta_3 \leq 1$ and $H_1: \beta_3 > 1$. Our t-statistic is
$$t_{\text{Statistic}} = \frac{\hat{\beta}_3 - 1}{SE[\hat{\beta}_3]} = \frac{3 - 1}{\sqrt{1.44}} = 1.67$$
and our $t_{\text{Critical}}$ is
$$t_{\text{Critical}}(\alpha = 5\% \text{ (right-tailed)}, n - k = 7) = 1.895$$
Since our $t_{\text{Statistic}} < t_{\text{Critical}}$, we do not reject $H_0$.
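For completeness, here is a sketch (not part of the notes) that reproduces the main numbers of this example:

```python
import numpy as np

xtx = np.array([[2.0, -1.0], [-1.0, 3.0]])
xty = np.array([-1.0, 8.0])
TSS, n, k = 48.2, 10, 3

beta = np.linalg.solve(xtx, xty)  # [1, 3]
ESS = beta @ xty                  # 23
RSS = TSS - ESS                   # 25.2
s2 = RSS / (n - k)                # 3.6
V = s2 * np.linalg.inv(xtx)       # [[2.16, 0.72], [0.72, 1.44]]

# 2) t-test of beta2 = beta3
t = (beta[0] - beta[1]) / np.sqrt(V[0, 0] + V[1, 1] - 2 * V[0, 1])  # -1.36

# 3)-4) restricted estimate and its variance under beta2 = beta3
R, r = np.array([[1.0, -1.0]]), np.array([0.0])
xtx_inv = np.linalg.inv(xtx)
A = xtx_inv @ R.T @ np.linalg.inv(R @ xtx_inv @ R.T)
beta_R = beta + A @ (r - R @ beta)      # [7/3, 7/3]
V_R = s2 * (xtx_inv - A @ R @ xtx_inv)  # 1.2 on the diagonal

# 5)-6) F statistics
d = R @ beta - r
F_restr = (d @ np.linalg.inv(R @ xtx_inv @ R.T) @ d) / s2  # 1.85 = t^2
F_whole = (ESS / (k - 1)) / (RSS / (n - k))                # 3.19
print(beta, t, beta_R, F_restr, F_whole)
```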
Summary 8. Here are the relevant pages and content for the midterm:
CH. 2, P. 57 (Algebra, properties, geometry of LS, projection, residual matrices, derivations)
P. 67, S. 2.4.3 (Read), P. 68 to beginning of 69
P. 71 (LS Estimators; IMPORTANT)
P. 75 (Everything but the Gauss-Markov derivation; you will need to know the Theorem, though)
P. 76 (ANOVA Tables), Eq. 2.6.9, P. 77
P. 78-79 is NOT required EXCEPT for the formulas at the bottom of P. 79
CH. 3, 83-86 (Proofs for P. 85 and P. 86 are not required), Eq. 3.20, Eq. 3.16 ONLY
P. 87-89 (Hypothesis testing), where 88-89 is just reading
P. 90-94 (Setting up R, validity testing)
P. 94-100 are VERY IMPORTANT
P. 101, S. 3.5
P. 101-104 READ
P. 104-106
Other: use the $\bar{R}^2$, the F test, and significance tests (t-test) to check validity. Refer back to the theory from the field that is being studied to check validity. These 4 methods help check and validate the model.

The midterm itself will be one question with 14 requirements, equally weighted. There is some computation and some theory. Some proofs will be included. t-tables and F-tables will be provided.

5.10 Tests for Multiple Restrictions

In this section (p. 87), we work under the R framework under multiple restrictions:
$$H_0: R\beta = r, \quad H_1: R\beta \neq r, \quad q \neq 1$$
We claim that
$$F = \frac{(R\hat{\beta} - r)^t \left[ R(X^t X)^{-1} R^t \right]^{-1} (R\hat{\beta} - r)/q}{(\hat{U}^t \hat{U})/(n-k)} \sim F(q, n-k)$$

Proof. Under $H_0$ and the normality of $U$,
$$(\hat{\beta} - \beta) \sim N(0, \sigma_u^2 (X^t X)^{-1}) \implies (R\hat{\beta} - R\beta) \sim N(0, \sigma_u^2 R(X^t X)^{-1} R^t)$$
and with $H_0$ we get $(R\hat{\beta} - r) \sim N(0, \sigma_u^2 R(X^t X)^{-1} R^t)$. We then have
$$\frac{R\hat{\beta} - r}{\sqrt{\sigma_u^2 \left[ R(X^t X)^{-1} R^t \right]}} \sim N(0, 1)$$
where this quantity is $q$ standard normal variates. The sum of squares of this is
$$(1)\ (R\hat{\beta} - r)^t \left[ \sigma_u^2 R(X^t X)^{-1} R^t \right]^{-1} (R\hat{\beta} - r) \sim \chi^2(q)$$
We also know that
$$(2)\ \frac{U^t M U}{\sigma_u^2} = \frac{(n-k)\hat{\sigma}_u^2}{\sigma_u^2} \sim \chi^2(n-k)$$
and so
$$F = \frac{(R\hat{\beta} - r)^t \left[ R(X^t X)^{-1} R^t \right]^{-1} (R\hat{\beta} - r)/q}{\hat{U}^t \hat{U}/(n-k)} = \frac{\text{Eq}(1)/q}{\text{Eq}(2)/(n-k)} \sim F(q, n-k)$$
provided that the r.v.s in Eq. (1) and Eq. (2) are independent. Note that this is also equivalent to
$$F = \frac{(RSS_R - RSS_{UN})/q}{RSS_{UN}/(n-k)}$$
$\square$
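The claim above is the standard Wald form of the F test, and it translates directly into a short routine; a minimal sketch (illustrative):

```python
import numpy as np

def wald_F(X, Y, R, r):
    """F statistic for H0: R beta = r, following the formula proved above."""
    n, k = X.shape
    q = R.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y
    resid = Y - X @ beta_hat
    sigma2_hat = (resid @ resid) / (n - k)
    d = R @ beta_hat - r
    F = (d @ np.linalg.inv(R @ XtX_inv @ R.T) @ d / q) / sigma2_hat
    return F  # compare with the F(q, n - k) critical value
```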


Appendix A
