EATZAZ AHMED
Econometrics
Y1 = μ + U1
Y2 = μ + U2
Y3 = μ + U3
:
Yi = μ + Ui
:
Yn = μ + Un
The estimation of μ depends on the assumptions of the model. The classical
assumptions are as follows:
1). Ui is a random variable for each i.
This means that U1, U2, U3, …, Un are all random variables.
[Random variable: a random variable is one that can take at least two values with
non-zero probability.]
Each Ui is one draw out of infinitely many possible values.
Examples: time is a fixed variable; a person's age (given the date) is not a random
variable; a person's weight is a random variable.
2). E(Ui) = 0 for each i.
On average the errors are equal to zero. Since Ui = Yi − E(Yi),
E(Ui) = E(Yi) − E(Yi) = 0.
This assumption holds by construction.
3). Var(Ui) = σ² for all i.
All error terms have the same variance. This is known as the homoscedasticity
assumption; if it is violated we have heteroscedasticity.
4). Cov(Ui, Uj) = 0 for all i ≠ j.
In time-series data the errors are often correlated, but usually not in cross-section
data. If Cov(Ui, Uj) ≠ 0 for some i ≠ j, we say that Ui is autocorrelated with Uj:
correlation of one variable with itself at different times (for example, food expenditure).
5). Ui is distributed normally.
Sometimes we also make the assumption that
Ui ~ N   [Ui is distributed normally].
For example, with E(Y) = 800 [mean income], an individual value of Y may be 0 or
1,000,000; the deviation of Y from E(Y) is the random error U.
Estimation of μ:
Let μ̂ be an estimator of μ, and define the residuals êi = Yi − μ̂.
We cannot judge μ̂ by the plain sum of residuals Σêi, because positive and negative
residuals cancel: a sample of residuals can sum to zero while Σ|êi| = 19, say.
We should minimize a weighted sum of errors such that larger errors are assigned
greater weights. Suppose we set the weights proportional to the absolute size of the
errors: wi = |êi|. Then we minimize
Σ wi|êi| = Σ |êi|·|êi| = Σ êi².
The estimator μ̂ which minimizes Σêi² is known as the Ordinary Least Squares (OLS)
estimator.
Y = μ + U
[Basic equation]
Estimation: the residual is e = Y − μ̂.
OLS estimator of μ: minimize Σei² = Σ(Yi − μ̂)².
First-order condition:
d(Σei²)/dμ̂ = −2Σ(Yi − μ̂) = 0
⇒ μ̂ = (1/n)ΣYi = Ȳ.
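The first-order condition can be verified numerically: no candidate value beats the sample mean on the sum of squared errors. A minimal Python sketch (the data values are hypothetical):

```python
# Verify that the sample mean minimizes the sum of squared errors,
# as the OLS first-order condition implies.
Y = [4, 8, 10, 12, 11]  # hypothetical sample

def sse(m, data):
    """Sum of squared errors sum((Yi - m)^2) for a candidate estimate m."""
    return sum((y - m) ** 2 for y in data)

mu_hat = sum(Y) / len(Y)  # OLS estimator: the sample mean

# The mean beats every other candidate on a grid of alternatives.
assert all(sse(mu_hat, Y) <= sse(m / 10, Y) for m in range(0, 201))
print(mu_hat, sse(mu_hat, Y))
```

Any shift away from Ȳ strictly increases Σêi², which is exactly what the first-order condition says.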
2). μ̂ is a random variable (a linear function of the random Yi).
3). μ̂ is unbiased.
Proof:
E(μ̂) = E[μ + (1/n)U1 + (1/n)U2 + (1/n)U3 + … + (1/n)Un]
= μ + (1/n)E(U1) + (1/n)E(U2) + (1/n)E(U3) + … + (1/n)E(Un)
= μ + (1/n)(0) + (1/n)(0) + (1/n)(0) + … + (1/n)(0)
[as we know that E(Ui) = 0]
E(μ̂) = μ.
Quaid-e-Azam University Islamabad
4). μ̂ has minimum variance in the class of linear unbiased estimators.
Proof: (a).
Var(μ̂) = E[μ̂ − E(μ̂)]²
= E[μ + (1/n)U1 + (1/n)U2 + (1/n)U3 + … + (1/n)Un − μ]²
= E[(1/n)(U1 + U2 + U3 + … + Un)]²
= (1/n²) E[U1² + U2² + U3² + … + Un² + Σi≠j UiUj]
= (1/n²)[E(U1²) + E(U2²) + E(U3²) + … + E(Un²) + Σi≠j E(UiUj)]
= (1/n²)[σ² + σ² + … + σ² + Σi≠j (0)]   {since Cov(Ui, Uj) = 0}
= (1/n²)(nσ²)
Var(μ̂) = σ²/n --------------- (a)
(b). Now consider any linear unbiased estimator μ* = b1Y1 + b2Y2 + … + bnYn.
(i). Unbiasedness requires Σbi = 1.
(ii). Var(μ*) = σ²(b1² + b2² + b3² + … + bn²) + Σi≠j bibj(0)
= σ²Σbi².
(iii). Comparison: Var(μ̂) < Var(μ*) unless bi = 1/n for all i.
Consider Var(μ*) and minimize it by choosing the bi:
Min over (b1, …, bn): Var(μ*) = σ²Σbi², subject to Σbi = 1.
Make the Lagrangian
L = σ²(b1² + b2² + b3² + … + bn²) + λ[1 − (b1 + b2 + … + bn)]
First-order conditions:
∂L/∂bi = 2σ²bi − λ = 0  (i = 1, 2, 3, …, n) --------------- (A)
∂L/∂λ = 1 − (b1 + b2 + … + bn) = 0 --------------- (B)
From (A), 2σ²bi = λ, so every bi is equal; substituting into (B) gives nbi = 1, hence
bi = 1/n.
⇒ μ* = b1Y1 + b2Y2 + b3Y3 + … + bnYn
= (1/n)Y1 + (1/n)Y2 + (1/n)Y3 + … + (1/n)Yn
= (1/n)[Y1 + Y2 + Y3 + … + Yn]
= (1/n)ΣYi
μ* = μ̂.
Recap: the OLS estimator is linear, unbiased and has minimum variance in the class of
linear unbiased estimators; that is, μ̂ is the best linear unbiased estimator:
μ̂ is BLUE.
μ̂ is a linear function of Y:
μ̂ = (1/n)Y1 + (1/n)Y2 + (1/n)Y3 + … + (1/n)Yn.
Theorem: If X1 ~ N, X2 ~ N, X3 ~ N, …, Xn ~ N, then any linear combination of X1, X2,
X3, …, Xn,
Z = a1X1 + a2X2 + a3X3 + … + anXn ~ N.
By this theorem, since Ui ~ N and Yi = μ + Ui is a linear function of Ui, we infer that Yi ~ N.
Further, μ̂, being a linear function of Y1 to Yn, is also distributed normally:
Ui ~ N ⇒ Yi ~ N ⇒ μ̂ ~ N.
We can then use the standard tools of statistical inference; we can say big things with a
limited amount of data. There is a counter-argument that the above chain of reasoning is too
long and unnecessary: we could just assume μ̂ ~ N. Linearity is not very important
(indispensable); dropping it gives us more options for estimation. Unbiasedness means
E(μ̂) = μ: if we draw all possible random samples of Y and estimate μ from each sample one
by one, then the mean value of μ̂ will be equal to μ. This property is desirable because we
do not want any systematic error in estimation, but it is not indispensable either.
Consider the following example:
Prob(μ − E < μ̂ < μ + E) = 0.6, with E(μ̂) = μ, so μ̂ is unbiased.
Prob(μ − E < μ* < μ + E) = 0.9, with E(μ*) ≠ μ, so the estimator μ* is biased.
We can see in the figure that a biased estimator can be better than an unbiased one
if it is more concentrated around the true value.
Best/minimum variance means that Var(μ̂) < Var(μ*), where μ̂ is the OLS estimator
and μ* is any other linear unbiased estimator. If we compare with a nonlinear or biased
estimator, the property does not help. The BLUE property is desirable, but insisting on
unbiasedness limits our choices, and so does linearity.
The above model determines E(Y) as a constant:
Y = μ + U,
μ = E(Y).
Now suppose we want to determine E(Y) given some information set (I). This
information is usually in the form of data on variables called explanatory variables, e.g.
gender, height, etc.
Suppose such variables are X1, X2, X3, …, Xm. If the set of information is complete
then we can write
Y = f(X1, X2, X3, …, Xm).
Complete information means:
The list of variables X1, X2, X3, …, Xm is complete.
All data are measured accurately.
The functional form f(·) is exactly known.
The three sources of error:
An incomplete list of X variables.
Measurement error in the data.
Misspecification of the functional form.
These will produce the following type of equation:
Y = β2X2 + β3X3 + β4X4 + … + βkXk + Z   [k < m]
Or
Y = β2X2 + β3X3 + β4X4 + … + βkXk + E(Z) + [Z − E(Z)]
Or
Y = β1 + β2X2 + β3X3 + … + βkXk + U,   [X1 = 1]
where β1 = E(Z) and U = Z − E(Z).
The two-variable model:
Y = α + βX + U
Suppose we have data from a random sample of size n; then we can write
Yi = α + βXi + Ui   (i = 1, 2, 3, …, n).
Estimation:
Suppose α̂ and β̂ are estimators of α and β respectively; then we have the estimated
values of Y given as
Ŷi = α̂ + β̂Xi.
The regression residual:
ei = Yi − Ŷi
= Yi − (α̂ + β̂Xi)
Σei² = Σ(Yi − α̂ − β̂Xi)².
For OLS we minimize Σei² with respect to α̂ and β̂:
∂(Σei²)/∂α̂ = Σ2(Yi − α̂ − β̂Xi)(−1) = 0 ----------- (i)
∂(Σei²)/∂β̂ = Σ2(Yi − α̂ − β̂Xi)(−Xi) = 0 ----------- (ii)
From (i): ΣYi = nα̂ + β̂ΣXi. Dividing both sides by n:
α̂ = Ȳ − β̂X̄. ----------- (iii)
Consider (ii) together with (iii); solving them gives
β̂ = Σxiyi / Σxi²,
where xi = Xi − X̄ and yi = Yi − Ȳ.
Thus we have:
1). β̂ is a linear function of Y.
Proof:
β̂ = Σ(xi/Σxi²)Yi
= a1Y1 + a2Y2 + a3Y3 + … + anYn
= ΣaiYi, where ai = xi/Σxi². ----------------------- (vii)
2). β̂ is a linear function of U.
Proof:
β̂ = ΣaiYi
= Σai(α + βXi + Ui)
= αΣai + βΣaiXi + ΣaiUi. ---------------- (viii)
Now consider Σai:
Σai = Σ(xi/Σxi²)
= (1/Σxi²)Σxi
= (1/Σxi²)(0) ⇒ Σai = 0. ------------------ (ix)
Next consider ΣaiXi:
ΣaiXi = (1/Σxi²)ΣxiXi = Σxi²/Σxi²
⇒ ΣaiXi = 1. ------------------ (x)
Substitute (ix) and (x) into (viii):
β̂ = α(0) + β(1) + ΣaiUi
β̂ = β + ΣaiUi ------------------ (xi)
= β + a1U1 + a2U2 + … + anUn, a linear function of U.
It follows that:
2). β̂ ~ N, since it is a linear function of the normal errors Ui.
3). β̂ is unbiased.
Proof:
β̂ = β + ΣaiUi
E(β̂) = E[β + ΣaiUi]
= β + ΣaiE(Ui)   (since the ai are fixed)
= β + Σai(0)   [where E(Ui) = 0]
E(β̂) = β. --------------------- (xii)
4). Variance of β̂:
Var(β̂) = E[β̂ − E(β̂)]²
= E[β + ΣaiUi − β]²
= E[ΣaiUi]²
= E[Σai²Ui² + Σi≠j aiajUiUj]
= Σai²E(Ui²) + Σi≠j aiajE(UiUj)   (since the x values are fixed)
Consider E(Ui²): since E(Ui) = 0,
Var(Ui) = E[Ui − E(Ui)]² = E(Ui²) = σ².
Also E(UiUj) = Cov(Ui, Uj) = 0 for i ≠ j. Hence
Var(β̂) = σ²Σai² = σ²Σ(xi/Σxi²)² = σ²/Σxi². ---------------------- (xiii)
Minimum variance: consider any other linear unbiased estimator β* = ΣbiYi.
β* = Σbi(α + βXi + Ui)
= αΣbi + βΣbiXi + ΣbiUi.
Unbiasedness requires Σbi = 0 and ΣbiXi = 1, so
β* = α(0) + β(1) + ΣbiUi
β* = β + ΣbiUi. ----------------------- (xvii)
Substituting in (xvii):
Var(β*) = E[β + ΣbiUi − β]²
= E[ΣbiUi]²
= E[Σbi²Ui² + Σi≠j bibjUiUj]
= Σbi²E(Ui²) + Σi≠j bibjE(UiUj)   [E(UiUj) = 0]
Var(β*) = σ²Σbi². -------------------------- (xviii)
We need to prove that Var(β*) ≥ Var(β̂):
Var(β*) = σ²Σbi²
= σ²Σ(bi − ai + ai)²
= σ²[Σ(bi − ai)² + Σai² + 2Σ(bi − ai)ai].
[Recall Σai = 0, ΣaiXi = 1, Σbi = 0, ΣbiXi = 1.]
Consider Σ(bi − ai)ai = Σbiai − Σai²:
Σbiai = Σbi(xi/Σxi²) = Σbixi/Σxi² = 1/Σxi² = Σai²,
using Σbixi = ΣbiXi − X̄Σbi = 1 − 0 = 1. Hence Σ(bi − ai)ai = 0 and
Var(β*) = σ²Σ(bi − ai)² + σ²Σai² ≥ Var(β̂), -------------- (xix)
with equality only when bi = ai for all i.
Practice Equation:
Suppose we want to estimate the equation
Yi = α + β(1/Xi) + Ui.
Again derive the OLS estimator of β, and compare Var(β̂) under homoscedasticity
and under heteroscedasticity of the form Var(Ui) = σi².
Properties of OLS residuals:
1). OLS residuals are orthogonal to the regressors.
[Two vectors (a1, a2) and (b1, b2) are orthogonal if a1·b1 + a2·b2 = 0.]
From the first-order condition with respect to α̂:
Σ(Yi − α̂ − β̂Xi) = 0
⇒ Σ(Yi − Ŷi) = 0
⇒ Σei = 0.
From the first-order condition with respect to β̂:
Σ(Yi − α̂ − β̂Xi)Xi = 0
⇒ Σ(Yi − Ŷi)Xi = 0
⇒ ΣXiei = 0.
It follows that the residuals are also orthogonal to the fitted values:
ΣŶiei = Σ(α̂ + β̂Xi)ei = α̂Σei + β̂ΣXiei = 0.
Decomposition of variation:
Σyi² = Σŷi² + Σei² ---------- (i)
Total variation = Explained variation + Residual variation.
(The cross-product term Σŷiei = 0 by the orthogonality results above.)
Using this result we can define the coefficient of determination:
R² = Σŷi²/Σyi² = 1 − Σei²/Σyi².
In the extreme cases R² = 0 (no fit) and R² = 1 (perfect fit), so
0 ≤ R² ≤ 1.
There is no benchmark for how large R² should be; it depends on the context in which
R² is taken.
Example:
Age of Ali = α + β(age of Ali's dad) + U → R² = 1
Weight of Ali = α + β(weight of Ali's dad) + U → R² < 1
Pakistan consumption function → R² = 0.95
In cross-section data R² = 0.4 can be good, but in time-series data R² = 0.9 is not
remarkable. Because R² is bounded, one might think it is the best measure of fit, but a
problem with R² is that it increases whenever we add more variables to the regression.
Example:
Dependent variable: consumption of household. Data: cross-section.
C = α + βY + U → R² = 0.25
C = α + βY + γN + U → R² = 0.46
C = α + βY + γ1Nc + γ2Nm + γ3Nf + δR + U → R² = 0.56
C = linear function of Y, Nc, Nm, Nf, residence, female education, male education,
wealth, etc. → R² = 0.9899
If we make R² the criterion for choosing the number of variables in the equation, we
will end up with as many variables as the number of sample points and R² = 1.
Also note that as the sample size decreases, R² will in general increase.
R² puts no limit on the number of variables included in the model, yet a model
should remain parsimonious.
Adjusted R²:
Consider the formula for R², and adjust each sum of squares for its degrees of
freedom:
R̄² = 1 − [Σei²/(n − k)] / [Σyi²/(n − 1)]
= 1 − (1 − R²)(n − 1)/(n − k).
Unlike R², R̄² can fall when an irrelevant variable is added.
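The two measures can be sketched numerically. The bivariate data below are hypothetical, and k = 2 counts the two estimated parameters of a simple regression:

```python
# R-squared and adjusted R-squared for a simple two-variable regression.
X = [1, 2, 3, 4, 5]          # hypothetical regressor
Y = [4, 8, 10, 12, 11]       # hypothetical dependent variable
n, k = len(Y), 2             # k = number of estimated parameters

xbar, ybar = sum(X) / n, sum(Y) / n
x = [xi - xbar for xi in X]  # deviations from the means
y = [yi - ybar for yi in Y]

beta = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
tss = sum(b * b for b in y)                    # total sum of squares
sse = tss - beta * sum(a * b for a, b in zip(x, y))  # residual sum of squares

r2 = 1 - sse / tss                             # R^2
r2_adj = 1 - (sse / (n - k)) / (tss / (n - 1)) # adjusted R^2
print(round(r2, 4), round(r2_adj, 4))
```

Here R̄² < R², and the gap widens as more parameters are estimated from the same n.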
Then
Z = (X − μ)/σ ~ N(0, 1).
4). If V1 ~ χ²(m1) and V2 ~ χ²(m2), and V1 and V2 are independent, then
F = (V1/m1)/(V2/m2) ~ F(m1, m2)
[the Fisher distribution with numerator degrees of freedom m1 and denominator
degrees of freedom m2].
5). Suppose
X ~ N(μ, σ²),   V ~ χ²(m),
and X and V are mutually independent. Standardize the normal and divide by √(V/m):
t = [(X − μ)/σ] / √(V/m) ~ t(m).
Back to Econometrics:
Consider
Y = α + βX + U.
Suppose we want to test the null hypothesis
H0: β = β0   [where β0 is a given value]
against the alternative
H1: β ≠ β0.
As we know,
β̂ ~ N(β, σ²/Σxi²).
Therefore
Z = (β̂ − β)/√(σ²/Σxi²) ~ N(0, 1).
In testing the null hypothesis H0, we use the estimated value β̂, the hypothetical
value β0 in place of β, and Σxi² computed from the actual data. That is:
β̂ from the data,
β = β0 from H0,
Σxi² from the data.
Note that σ² remains unknown; one option is to replace σ² by its unbiased estimator
σ̂² = Σei²/(n − 2).
[The proof is in the book.] Then
t = (β̂ − β0)/√(σ̂²/Σxi²) ~ t(n − 2).
Recall the formulas β̂ = Σxy/Σx² and α̂ = Ȳ − β̂X̄.
Example 1:
Y = α + βX + U, where Y = weight and X = age.

 X      Y      x = X−X̄   y = Y−Ȳ    x²      y²      xy
 1      4        −2         −5        4       25      10
 2      8        −1         −1        1        1       1
 3     10         0          1        0        1       0
 4     12         1          3        1        9       3
 5     11         2          2        4        4       4
ΣX=15  ΣY=45    Σx=0       Σy=0    Σx²=10   Σy²=40  Σxy=18

X̄ = 3, Ȳ = 9.
β̂ = Σxy/Σx² = 18/10 = 1.8
α̂ = Ȳ − β̂X̄ = 9 − 1.8 × 3
= 9 − 5.4
= 3.6
The estimated equation is Ŷ = 3.6 + 1.8X:
3.6 = weight at time of birth;
1.8 = rate of increase in weight per year increase in age.
The fitted values Ŷ = 3.6 + 1.8X are:
when X = 1, Ŷ = 5.4
X = 2, Ŷ = 7.2
X = 3, Ŷ = 9.0
X = 4, Ŷ = 10.8
X = 5, Ŷ = 12.6.
We can compute the t-statistic for H0: β = 0 as t = β̂/SE(β̂), where SE(β̂) = √(σ̂²/Σx²).
Set the level of significance (or probability of Type I error, equal to one minus the level of
confidence) = 0.05.
The calculated t-value falls in the rejection range, so we reject H0. This means the effect
of age on weight is significantly different from zero.
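The whole worked example can be reproduced in a few lines, using the same hypothetical age-weight data:

```python
# OLS fit and t-test for the age-weight example.
X = [1, 2, 3, 4, 5]      # age
Y = [4, 8, 10, 12, 11]   # weight
n = len(Y)

xbar, ybar = sum(X) / n, sum(Y) / n
x = [xi - xbar for xi in X]
y = [yi - ybar for yi in Y]

beta = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)  # slope
alpha = ybar - beta * xbar                                       # intercept

resid = [yi - (alpha + beta * xi) for xi, yi in zip(X, Y)]
sigma2 = sum(e * e for e in resid) / (n - 2)       # unbiased sigma-hat^2
se_beta = (sigma2 / sum(a * a for a in x)) ** 0.5  # SE(beta-hat)
t = beta / se_beta                                 # t-stat for H0: beta = 0

print(round(beta, 2), round(alpha, 2), round(t, 2))
```

With only n − 2 = 3 degrees of freedom, the computed t still exceeds the two-sided 5% critical value, matching the conclusion above.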
Testing of Hypothesis:
Suppose we have estimated the following equation:
Ĉ = 10.0 + 0.8Y,   R² = 0.97, n = 25
      (2.5)   (0.1)
[The values in brackets are standard errors.]
1).
H0: α = 0
H1: α ≠ 0
t = 10.0/2.5 = 4.0.
Degrees of freedom = 25 − 2 = 23.
Level of significance = 0.05.
Critical t-values = ±2.069.
Since 4.0 > 2.069, we reject H0.
2).
H0: β = 0
H1: β ≠ 0
t = 0.8/0.1 = 8.0 > 2.069, so we reject H0.
3).
H0: β = 1
H1: β < 1
t = (0.8 − 1)/0.1 = −2.0; the one-tailed critical value is −1.714, so we reject H0.
Interpretation: The results show that 97% of the variation in consumption expenditure is
explained by our model, which indicates that the overall performance of the equation is
satisfactory. The intercept is positive and significantly different from zero; its magnitude
shows that subsistence or autonomous consumption expenditure is 10 thousand rupees per
capita per year. Further, the marginal propensity to consume (MPC) is significantly different
from zero and less than one; the estimated value shows that the MPC is 0.8, i.e. 80% of each
incremental rupee of income is consumed, while the remaining 20% is saved.
Testing a linear restriction on two or more parameters:
Y = α + βX + U
H0: α + β = 1
H1: α + β ≠ 1
The test statistic is
t = (α̂ + β̂ − 1)/SE(α̂ + β̂),
where SE(α̂ + β̂) = √[Var(α̂) + Var(β̂) + 2Cov(α̂, β̂)].
Actual application:
Variances and covariances are obtained from the coefficient variance-covariance
matrix.
Suppose
α̂ = 1.7, β̂ = −0.2.
Other restrictions are tested the same way, e.g.
H0: α + β = 0 against H1: α + β ≠ 0, or
H0: α − β = 0 against H1: α − β ≠ 0.
Assumptions:
1). Ui is a random variable for each i.
1b). Ui is normally distributed (Ui ~ N) for each i.
2). E(Ui) = 0 for each i.
This assumption holds by construction:
Ui = Zi − E(Zi)
E(Ui) = E(Zi) − E[E(Zi)]
= E(Zi) − E(Zi)
= 0.
3). Var(Ui) = σ² for all i.
4). Cov(Ui, Uj) = 0 for all i ≠ j.
5). The X variables are fixed, or exogenous, or non-random.
6). The correlation between X2 and X3 is not equal to ±1.
[Venn-diagram illustration: if the circles for X2 and X3 coincide, the separate
information content of each is zero; if they do not overlap, each has full information
content; with partial overlap the information is rich.]
Estimation:
Yi = β1 + β2X2i + β3X3i + Ui
Replace the unknown parameters by their estimators and set U = 0:
Ŷi = β̂1 + β̂2X2i + β̂3X3i.
First-order conditions: minimizing Σei² yields, in deviation form,
β̂2 = (Σx2y Σx3² − Σx3y Σx2x3) / (Σx2² Σx3² − (Σx2x3)²)
β̂3 = (Σx3y Σx2² − Σx2y Σx2x3) / (Σx2² Σx3² − (Σx2x3)²)
β̂1 = Ȳ − β̂2X̄2 − β̂3X̄3.
Also note:
Var(β̂2) = σ² / [Σx2²(1 − r23²)],   Var(β̂3) = σ² / [Σx3²(1 − r23²)],
where r23 is the correlation between X2 and X3.
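The first-order conditions yield the deviation-form formulas for β̂2 and β̂3; a sketch on a small, entirely hypothetical data set follows, with the orthogonality of the residuals to both regressors serving as a built-in check:

```python
# Two-regressor OLS in deviation form:
# b2 = (Sx2y*Sx3x3 - Sx3y*Sx2x3) / (Sx2x2*Sx3x3 - Sx2x3^2), symmetrically b3.
X2 = [1, 2, 3, 4, 5]
X3 = [2, 1, 4, 3, 5]
Y  = [3, 4, 8, 9, 12]   # hypothetical data

n = len(Y)
def dev(v):
    """Deviations of a series from its mean."""
    m = sum(v) / n
    return [vi - m for vi in v]

x2, x3, y = dev(X2), dev(X3), dev(Y)
S = lambda a, b: sum(p * q for p, q in zip(a, b))  # cross-product sums

den = S(x2, x2) * S(x3, x3) - S(x2, x3) ** 2
b2 = (S(x2, y) * S(x3, x3) - S(x3, y) * S(x2, x3)) / den
b3 = (S(x3, y) * S(x2, x2) - S(x2, y) * S(x2, x3)) / den
b1 = sum(Y) / n - b2 * sum(X2) / n - b3 * sum(X3) / n

# OLS residuals are orthogonal to an intercept and to both regressors.
resid = [yi - (b1 + b2 * p + b3 * q) for p, q, yi in zip(X2, X3, Y)]
assert abs(sum(resid)) < 1e-9
assert abs(sum(e * p for e, p in zip(resid, X2))) < 1e-9
assert abs(sum(e * q for e, q in zip(resid, X3))) < 1e-9
```

The denominator `den` shrinks toward zero as the correlation between X2 and X3 approaches ±1, which is exactly the multicollinearity problem discussed later.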
3). Var(β̂1), Var(β̂2), Var(β̂3): the OLS estimators have minimum variance in the class of
linear unbiased estimators. It can be shown that an unbiased estimator of the variance of U
is
σ̂² = Σei²/(n − k),
where k is the number of estimated parameters.
4). Testing linear restrictions (the F-test). Suppose in
Y = β1 + β2X2 + β3X3 + β4X4 + U
we want to test H0: β2 = 1, β3 = 0, i.e. r = 2 restrictions.
[We count the number of restrictions, not the number of parameters they involve.]
1). Estimate the unrestricted model; compute the fitted values
Ŷi = β̂1 + β̂2X2i + β̂3X3i + β̂4X4i
and the residuals ei = Yi − Ŷi.
Finally compute the unrestricted residual sum of squares ΣeU².
Suppose ΣeU² = 50.
2). Impose the given restrictions; in our example this yields
Y = β1 + (1)X2 + (0)X3 + β4X4 + U
or
Y − X2 = β1 + β4X4 + U.
Estimate β1 and β4, compute the fitted values
Ŷi = β̂1 + X2i + β̂4X4i
and the residuals ei = Yi − Ŷi, and finally compute the restricted residual sum of
squares ΣeR². Suppose ΣeR² = 60.
3). Compute the F-statistic:
F = [(ΣeR² − ΣeU²)/r] / [ΣeU²/(n − k)].
Note these values: ΣeU² = 50, ΣeR² = 60, r = 2, n = 34 and k = 4. Now plug in:
F = [(60 − 50)/2] / [50/(34 − 4)] = 5/(50/30) = 5 × 30/50 = 3.
4). Conclusion:
We conclude by comparing the calculated F-value with the critical F-value; in our
case the critical F-value at r = 2 and n − k = 30 degrees of freedom is supposed to be 2.87.
In our example the calculated F-value > critical F-value, so we reject H0.
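The arithmetic in step 3 fits in one small function; the residual sums of squares below are the supposed values from the example:

```python
# F-statistic for testing r linear restrictions:
# F = ((SSR_R - SSR_U) / r) / (SSR_U / (n - k)), rewritten as one quotient.
def f_restrictions(ssr_r, ssr_u, r, n, k):
    """How much the restrictions raise the error, per restriction,
    relative to the unrestricted error per degree of freedom."""
    return (ssr_r - ssr_u) * (n - k) / (r * ssr_u)

# Supposed values from the example: SSR_U = 50, SSR_R = 60, r = 2, n = 34, k = 4.
F = f_restrictions(60.0, 50.0, 2, 34, 4)
print(F)  # 3.0
```

A larger gap between the restricted and unrestricted sums of squares pushes F up, which is why a big F leads to rejecting the restrictions.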
Now, in the special case where the restricted model contains only the intercept
(H0: all slope coefficients are zero), ΣeR² = Σyi² and the F-statistic can be written in
terms of R²:
F = [R²/(k − 1)] / [(1 − R²)/(n − k)],
where all values come from the unrestricted model, so we can ignore the subscript U.
The F-statistic indicates how much imposing the restrictions increases the error;
equivalently, it indicates the increase in R² due to the removal of the restrictions.
Note 2: in this special case the F-test is the test of the overall significance of the
regression.
MULTICOLLINEARITY:
It is an econometric problem. There are four questions to address:
What is the problem?
What are the consequences of the problem?
How do we test for the problem?
What is the solution?
Recall the three-variable regression model:
Yi = β1 + β2X2i + β3X3i + Ui.
Also recall the OLS formulas for β̂2 and β̂3 and their variances; both involve the
factor (1 − r23²), where r23 is the correlation between X2 and X3.
Case 1: r23 = ±1.
In this case the denominators of the OLS formulas are zero:
β̂2 = 0/0,   β̂3 = 0/0,
so we cannot estimate β2 and β3 by OLS. In fact β2 and β3 cannot be estimated
at all; their true values are not even properly defined:
β2 = ∂E(Y)/∂X2 and β3 = ∂E(Y)/∂X3
do not exist, because X2 cannot vary while X3 is held fixed.
Case 2: r23 = 0.
In this case
β2 = ∂E(Y)/∂X2 = dE(Y)/dX2,
β3 = ∂E(Y)/∂X3 = dE(Y)/dX3.
Finally note that in this case the multiple regression equation and the partial (simple)
regression equations produce identical results.
Recap: in the case r23 = ±1, the multiple regression equation fails both theoretically
and in application. In the other extreme case r23 = 0, the multiple regression equation is not
needed. So the only practical use of the multiple regression equation is when
r23 ≠ 0 and r23 ≠ ±1.
[Venn-diagram illustration: when r23 = ±1 the circles for X2 and X3 coincide and the
separate information content of each variable is zero; when r23 = 0 the circles do not
overlap and each variable's information content is full; in between, |r23| may be low
or high.]
In this equation for the CPI, DD plus TD come from commercial banks and CC is the
major part of money. The equation can be written in any of the following ways:
CPI = α + β[CC + DD + TD + OD] + U
Or
CPI = α + βM2 + U
[M2 = CC + DD + TD + OD]
Or
CPI = α + β(CC + DD) + γ(TD + OD) + U.
There could be a model-specification problem here; it is a matter of judgment, not a
matter of science.
Consequences of Multicollinearity:
Note that the OLS estimators remain BLUE.
(1) The variances of the OLS estimators become large.
Recall the formulas for the variances:
Var(β̂2) = σ²/[Σx2²(1 − r23²)],   Var(β̂3) = σ²/[Σx3²(1 − r23²)].
If |r23| is high, (1 − r23²) will be low, and therefore Var(β̂2) and Var(β̂3) will be large.
It follows that
the standard errors will also be large. Thus the t-value for H0: β2 = 0,
t = (β̂2 − 0)/SE(β̂2),
will be small. Therefore we may accept H0 when we should not. In other words, we
may wrongly conclude that the X variables do not affect Y.
Example:
CPIt = α + βM2t + γERt + δYt + θCPIt−1 + Ut
If the regressors share a strong common trend there is more multicollinearity, and we
may accept H0: βj = 0: the standard errors will be greater and produce misleading t-values.
Another implication of this consequence is that β̂ varies quite a bit from sample to sample.
For instance, β̂ estimated on annual data from 1970 to 2002 may become β̂ = −3.7 when
the data run from 1970 to 2005. The conclusion is that the β̂'s change erratically with small
changes in the data, specification, etc.; the t-values are small and the β̂'s volatile, so there is
no robustness (stability) in the model and no trust in it.
(2) Recall the formula for Cov(β̂2, β̂3): when |r23| is large, this covariance is large
(with sign opposite to that of r23).
If X2 and X3 are positively and highly correlated with each other, then β̂2 and β̂3 have
a negative and very large correlation: under-estimation of β2 will accompany over-estimation
of β3 and vice versa.
Example: with β2 = 100 and β3 = 250, one sample may give β̂2 = 100 and β̂3 = 300,
with the compensating error appearing in the other coefficient in another sample.
Likewise, if X2 and X3 are negatively and highly correlated, then over- (under-)
estimation of β2 will accompany over- (under-) estimation of β3.
Consequences (1) and (2) imply that the estimated parameters (β̂'s) become
volatile (unreliable, unstable) and too sensitive; their magnitudes are quite likely to be
unrealistic in terms of sign and size (even a significant parameter may carry the wrong sign).
Example: in a CPI equation on Y, ER, IR, an estimated MPC of 1.3 or −1.3; or an
own-price elasticity that should be negative coming out positive. Such estimates are
neither realistic nor reliable.
Testing and Diagnostics of Multicollinearity:
Formal tests of multicollinearity are too complex and not very fruitful. In practice
we rely on certain clues and symptoms (indicators).
(1) Multicollinearity is likely to be present if the data are time-series data observed at
low frequency (for example annual rather than monthly data), unless the data are de-trended
(removing the common trend).
(2) A very popular symptom of multicollinearity is that the overall performance of the
estimated equation is good in terms of a high value of R², but the t-statistics for the
individual regression coefficients under H0: βj = 0 are mostly insignificant.
Example 1:
log CPIt = 1.2 + 0.3 log M2t + 0.7 log Yt − 0.25 log ERt + 0.97 log CPIt−1,  R² = 0.9938
t-values:    (0.85)         (1.37)        (−0.09)          (44.73)
— insignificant, insignificant, insignificant, highly significant.
With a small change in the specification the signs even flip:
log CPIt = 1.2 + 0.3 log M2t + (−)0.7 log Yt + (+)0.25 log ERt + 0.97 log CPIt−1,  R² = 0.99
t-values:    (1.57)         (−0.73)        (1.21)           (17.43)
— insignificant, insignificant, insignificant, highly significant.
Where (in a wheat-demand equation)
log Y → income elasticity,
log Pw → own-price elasticity of wheat,
log Pr → cross-price elasticity with respect to rice.
If the signs come out as expected, the results are fine even when individual t-values
are low.
(3) Parameter estimates are too sensitive to changes in the sample, the definition of
variables, and the specification of the model.
If we change the sample a little to add new data and re-run the regression, the
results may change drastically (a symptom that Var(β̂) is too high). Variables can be
defined in more than one way, e.g. output as GNP, GDP or GNI, and which definition
we use can change the results drastically. The same holds for how we specify the model:
C = α + β1Y + γR + …
log C = α + β2 log Y + γ log R + …
β1 = dC/dY,
β2 = d log C/d log Y = (dC/dY)(Y/C),
therefore
β2 = β1(Y/C).
Pairwise correlations among the explanatory variables, e.g. money supply (Ms), real
GDP and the real exchange rate (real ER), can also be examined:
Case 2: r(P, M) = 0.95; r(M, Y) = 0.60 (OK); r(P, Y) = 0.65;
r(GDP, ER) = 0.55 (OK); r(P, ER) = 0.70.
High pairwise correlations among the regressors signal multicollinearity.
Solutions of Multicollinearity:
(1) Exclude the variable(s) causing multicollinearity.
This solution makes sense only when the variable being dropped is not important in the
overall framework of our analysis.
Example:
Pt = α + βMt + γGDPt + δERt + θ1Pt−1 + θ2Pt−2 + U
If Pt−2 is causing multicollinearity we can exclude it, since it is not very important;
but if Mt is causing multicollinearity we should not exclude it, because without Mt
(money supply) we cannot model inflation.
Note: unfortunately, it is often the important variables that cause multicollinearity.
(2) Transform the data, e.g. by first-differencing. Differencing reduces the chances of
multicollinearity drastically, but it also filters out valuable (level) information, and the
intercept is gone; it is not a good solution. Suppose the multicollinearity is caused by a
common trend: why not control for the trend instead? To control for the trend, we include a
time variable in the model/equation:
t = 0, 1, 2, 3, …
Yt = α + λt + βXt + γZt + Ut
Yt−1 = α + λ(t − 1) + βXt−1 + γZt−1 + Ut−1
Subtracting would give ΔYt = λ + βΔXt + γΔZt + ΔUt.
(3) Use prior restrictions. Consider a production function with
K = capital, L = labor, M = material, E = energy:
log Q = log A + α log K + β log L + γ log E + δ log M + U
Or
log Q = a0 + α log K + β log L + γ log E + δ log M + U.
Firms with a higher capital stock also employ more labor, energy and material, so
these regressors are highly correlated.
Test:
H0: α + β + γ + δ = 1
H1: α + β + γ + δ ≠ 1
Suppose H0 is accepted; then we can write
δ = 1 − α − β − γ.
Now the production function becomes, after substitution, a regression in ratio form
(e.g. log(Q/M) on log(K/M), log(L/M) and log(E/M)), which reduces the collinearity.
FRISCH-WAUGH THEOREM:
Suppose we have
Y = β1 + β2X + β3Z + U.
Then
β2 = ∂E(Y)/∂X,
the effect of X on Y holding Z constant. The effect of Z can also be eliminated as follows:
Regress Y on Z:
Y = a0 + a1Z + V,
and obtain the residuals V̂ (by OLS).
Regress X on Z:
X = b0 + b1Z + W,
and obtain the residuals Ŵ (by OLS).
Regressing V̂ on Ŵ then yields the same estimate of β2 as the multiple regression.
AUTOCORRELATION:
Definition:
Correlation between Xi and Yj for i ≠ j, where X and Y may be the same or different
variables, is called serial correlation.
Example:
Ct = α + βYt + Ut
[4 years of monthly data]
If we exclude variables which capture the inertia, the error term captures the inertia
and becomes autocorrelated.
Consequences of Autocorrelation:
Note: OLS estimators remain linear and unbiased.
1). OLS estimators no longer have minimum variance in the class of linear unbiased
estimators; they are no longer best.
2). The ordinary formula for calculating variances is no longer valid:
Var(β̂) ≠ σ²/Σx², because the omitted term Σi≠j aiaj Cov(Ui, Uj) ≠ 0.
OLS estimators are thus inefficient: their variances are larger than necessary. This is
not a big problem in principle, since we can correct the inference by using the proper
variance formula (though OLS still remains not best). But if we keep using the ordinary
formula, the computed standard errors are wrong and the t-values misleading, and we may
accept hypotheses we should reject.
[Testing and solutions for autocorrelation are postponed until we understand the
various forms of autocorrelation.]
Forms of Autocorrelation:
Consider the model
Yt = β0 + β1Xt + Ut.
1). Autoregressive model [AR(p)]:
Ut = ρ0 + ρ1Ut−1 + … + ρpUt−p + εt,
where ρ1Ut−1 + … + ρpUt−p is the autocorrelated portion and εt the non-autocorrelated
portion: the innovation (news, shock), a white-noise error.
2). Moving-average model [MA(q)]:
Ut = θ0εt + θ1εt−1 + … + θqεt−q   [θ0 = 1],
a moving average of the innovations εt.
3). ARMA(p, q) model:
Ut = ρ0 + ρ1Ut−1 + … + ρpUt−p + θ0εt + θ1εt−1 + … + θqεt−q,
combining the AR(p) and MA(q) parts.
AR(1) Model:
Ut = ρ0 + ρ1Ut−1 + εt
This is the most popular and a simple way to model autocorrelation.
Assumptions:
1). εt is a random variable for all t.
2). E(εt) = 0.
3). Var(εt) = σε² for all t.
4). Cov(εt, εt′) = 0 for all t ≠ t′.
[At two different points in time the innovations are not correlated.]
5). |ρ1| < 1.
Properties of Ut:
Solve for Ut by repeated substitution:
Ut = ρ0 + ρ1Ut−1 + εt
= ρ0 + ρ1[ρ0 + ρ1Ut−2 + εt−1] + εt
= ρ0 + ρ0ρ1 + ρ1²Ut−2 + ρ1εt−1 + εt
= ρ0 + ρ0ρ1 + ρ1²[ρ0 + ρ1Ut−3 + εt−2] + ρ1εt−1 + εt
= ρ0 + ρ0ρ1 + ρ0ρ1² + ρ1³Ut−3 + ρ1²εt−2 + ρ1εt−1 + εt
[We will end up with the following equation]
Ut = ρ0 + ρ0ρ1 + ρ0ρ1² + …
+ εt + ρ1εt−1 + ρ1²εt−2 + ρ1³εt−3 + …
+ ρ1^∞ Ut−∞   [ρ1^∞ = 0]
Or
Ut = ρ0[1 + ρ1 + ρ1² + …] + εt + ρ1εt−1 + ρ1²εt−2 + …
= ρ0/(1 − ρ1) + εt + ρ1εt−1 + ρ1²εt−2 + …,
an MA(∞) process. So the AR(1) model equals an MA(∞) model:
Ut = constant + θ0εt + θ1εt−1 + θ2εt−2 + …, with θi = ρ1^i:
a weighted average of past innovations (shocks), with geometrically declining weights.
Parametric Properties of Ut:
1). E(Ut) = ρ0/(1 − ρ1) + E(εt) + ρ1E(εt−1) + ρ1²E(εt−2) + …
= ρ0/(1 − ρ1) + (0) + ρ1(0) + ρ1²(0) + …
= ρ0/(1 − ρ1).
E(Ut) = 0 requires ρ0/(1 − ρ1) = 0, i.e. ρ0 = 0. So in the AR process for Ut, since
E(εt) = 0, we set ρ0 = 0 to have E(Ut) = 0:
Ut = ρ1Ut−1 + εt.
2). Var(Ut) = Var(εt + ρ1εt−1 + ρ1²εt−2 + …)
= Var(εt) + Var(ρ1εt−1) + Var(ρ1²εt−2) + … + (covariances, all zero)
= σε² + ρ1²σε² + ρ1⁴σε² + …
= σε²[1 + ρ1² + ρ1⁴ + …]
Var(Ut) = σε²/(1 − ρ1²). --------------------- (1a)
This is a constant variance for all t; there is no heteroscedasticity in Ut.
3). Cov(Ut, Ut−i) = ρ1^i σε²/(1 − ρ1²) = ρ1^i Var(Ut).
In particular,
Cov(Ut, Ut−1) = ρ1 Var(Ut),
Cov(Ut, Ut−2) = ρ1² Var(Ut),
and so on.
The autocorrelation function is
ρi = Cov(Ut, Ut−i)/Var(Ut) = ρ1^i,
which is a function of i.
Case ρ1 > 0 (e.g. ρ1 = 0.8): the autocorrelation function is geometrically declining,
approaching zero as the lag length increases.
Case ρ1 < 0 (e.g. ρ1 = −0.5): the autocorrelation function is oscillatory, starting with a
negative value at lag length one and approaching zero.
Example: Price level = α + β(money supply) + Ut.
Another case of AR(p):
Suppose we have quarterly data to estimate the equation
Yt = α + βXt + Ut.
We expect seasonal autocorrelation:
Ut = ρ0 + ρ1Ut−1 + ρ2Ut−2 + ρ3Ut−3 + ρ4Ut−4 + εt.
To simplify matters we assume ρ0 = ρ1 = ρ2 = ρ3 = 0, so that
Ut = ρ4Ut−4 + εt.
It follows that the autocorrelations are non-zero only at lags that are multiples of 4
(e.g. ρ5 = 0, ρ6 = 0):
ρi = ρ4^(i/4) for i = 4, 8, 12, …;  ρi = 0 otherwise.
The autocorrelation function:
for ρ4 > 0 (e.g. ρ4 = 0.5) the spikes at lags 4, 8, 12, … are positive and geometrically
declining;
for ρ4 < 0 (e.g. ρ4 = −0.5) they alternate in sign.
[Correlogram plots for AR(1) with ρ1 > 0, AR(4) with ρ4 < 0, and AR(4) with ρ4 > 0.]
The correlogram is just a symptom; reading it is a kind of art, not a perfect science,
but it is a very useful tool.
MA(1) Model:
Ut = εt + θ1εt−1,
where εt satisfies all the standard properties. We can show that
1). E(Ut) = 0.
2). Var(Ut) = σε²(1 + θ1²).
3). Cov(Ut, Ut−i) = θ1σε² for i = 1, and = 0 for i ≥ 2.
To see this, write
Ut = εt + θ1εt−1
Ut−1 = εt−1 + θ1εt−2
Ut−2 = εt−2 + θ1εt−3.
Ut and Ut−1 share the innovation εt−1, while Ut and Ut−2 share no innovation, hence
no correlation.
The autocorrelation function is therefore
ρ1 = θ1/(1 + θ1²) for i = 1, and ρi = 0 for i ≥ 2.
[Correlogram plots for MA(1) with θ1 > 0 and with θ1 < 0: a single spike at lag one,
positive or negative respectively.]
Testing of Autocorrelation:
1). Durbin-Watson test:
The DW test is based on the DW statistic
d = Σt(et − et−1)²/Σtet² ≈ 2(1 − ρ̂),
so that
H0: ρ = 0 ⇒ d = 2
H1: ρ ≠ 0 ⇒ d ≠ 2.
Now unfortunately the distribution of d is not unique; it depends on the actual data,
and we do not have the time and energy to calculate the exact distribution for every data
set. Durbin and Watson provided the two extreme (bounding) distributions, with critical
values dl and du, as shown in the following graph.
The table of critical values provides dl and du for various values of
n (number of observations) and
k′ (number of parameters minus one).
Example:
CPI = α + βM2 + γGDP + δER + U
Sample 1970-71 to 2004-05:
n = 35
k′ = 3
From the table we have
dl = 1.42
du = 1.71.
Suppose the calculated d = 2.74; we determine the right-tail critical values:
4 − dl = 4 − 1.42 = 2.58
4 − du = 4 − 1.71 = 2.29.
Since the calculated d > 4 − dl (and > 4 − du), we reject H0 and conclude that
(negative) autocorrelation is present. This test has some problems:
Notes on the test:
1). The test statistic has an inconclusive range, so it may not produce a concrete conclusion.
2). The test is specifically designed for the AR(1) process, not for higher-order AR
processes, MA processes or others.
3). Despite the above two limitations the test is powerful in detecting autocorrelation,
especially in its most common form, the AR(1) process.
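The statistic d = Σ(et − et−1)²/Σet² is easy to compute directly from residuals. Both residual series below are made up for illustration, one alternating (negative autocorrelation, d near 4) and one drifting (positive autocorrelation, d near 0):

```python
# Durbin-Watson statistic: d = sum((e_t - e_{t-1})^2) / sum(e_t^2).
# d is near 2 with no autocorrelation, near 0 for positive, near 4 for negative.
def durbin_watson(e):
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(et ** 2 for et in e)

# Hypothetical residuals that alternate in sign (negative autocorrelation).
e_neg = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
# Hypothetical residuals with a slow drift (positive autocorrelation).
e_pos = [1.0, 0.9, 0.7, 0.2, -0.3, -0.8]

print(durbin_watson(e_neg), durbin_watson(e_pos))
```

In practice the computed d is then compared against the tabulated dl and du bounds for the given n and k′.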
Decision table for any test (H0 true or false vs. the decision taken):
H0 true, decision accept → correct decision (confidence).
H0 true, decision reject → Type I error.
H0 false, decision reject → correct decision (power).
H0 false, decision accept → Type II error.
2). Durbin's h test (used when the regression contains a lagged dependent variable):
h ~ N(0, 1).
Critical values are ±1.96 for the 5% level of significance,
±1.645 for the 10% level of significance,
±2.576 for the 1% level of significance.
If h turns out to be an imaginary number, the test cannot be applied.
Estimation under autocorrelation:
The estimation procedure attempts to replace the autocorrelated error Ut by the
non-autocorrelated innovation εt.
Consider
Yt = α + βXt + Ut ----------------- (1)
with Ut = ρUt−1 + εt. ----------------- (2)
[Lag equation (1) by one period, multiply through by ρ, and subtract the new equation from (1).]
Yt = α + βXt + Ut ----------------- (1′)
ρYt−1 = ρα + ρβXt−1 + ρUt−1
Subtract: Yt − ρYt−1 = α(1 − ρ) + β(Xt − ρXt−1) + (Ut − ρUt−1).
Or, since Ut − ρUt−1 = εt by equation (2),
Yt − ρYt−1 = α(1 − ρ) + β(Xt − ρXt−1) + εt. ----------------- (A)
Now using equation (2) we can also write
Ut = Yt − α − βXt
ρUt−1 = ρ(Yt−1 − α − βXt−1)
Subtract:
Ut − ρUt−1 = (Yt − α − βXt) − ρ(Yt−1 − α − βXt−1) = εt
Or
(Yt − α − βXt) = ρ(Yt−1 − α − βXt−1) + εt. ----------------- (B)
Equation (A) or (B) has the error term εt, which satisfies all the classical assumptions.
However, the trouble is that both equations are non-linear in parameters: two unknown
coefficients multiply each other, so we cannot derive formulas for the OLS estimators of
α, β and ρ.
Since we cannot use any unique formula to compute the OLS estimators of α, β and ρ,
we have to apply some numerical algorithm.
We will consider two methods:
(1) the Cochrane-Orcutt two-step iterative method;
(2) a version of direct search.
Cochrane-Orcutt two-step iterative method:
Step 1a:
Start with some initial value of ρ; suppose we set ρ = 0.
Then equation (A) becomes
Yt = α + βXt + εt. -------------------------------- (A′)
Apply OLS to obtain α̂ and β̂, and use them in equation (B):
êt = ρêt−1 + εt, -------------------------- (B′)
where êt = Yt − α̂ − β̂Xt. Apply OLS to compute ρ̂.
Step 1b:
Now use ρ̂ in equation (A):
Yt* = α(1 − ρ̂) + βXt* + εt,
where Yt* = Yt − ρ̂Yt−1 and Xt* = Xt − ρ̂Xt−1. Apply OLS to yield α̂ and β̂.
These are the two-step estimators of α and β. Since ρ = 0 is not true, the initial α̂ and β̂
are poor, hence ρ̂ is poor, hence the two-step estimates can be improved.
Step 2a:
Use the latest α̂ and β̂ to recompute the residuals and re-estimate
êt = ρêt−1 + εt. ------------------- (B″)
Step 2b:
Use the new ρ̂ in equation (A) to compute new α̂ and β̂.
The process is repeated (iterated) until the estimates converge.
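The iteration can be sketched with a plain OLS helper. Everything below is illustrative: the data are generated from known α, β, ρ so the loop has something to recover, and the seed, sample size and tolerance are arbitrary choices:

```python
import random

def ols(x, y):
    """OLS intercept and slope for y = a + b*x."""
    n = len(y)
    xb, yb = sum(x) / n, sum(y) / n
    b = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / \
        sum((xi - xb) ** 2 for xi in x)
    return yb - b * xb, b

# Generate Y = alpha + beta*X + U with AR(1) errors U_t = rho*U_{t-1} + eps_t.
random.seed(0)
alpha, beta, rho = 2.0, 0.5, 0.7   # assumed true values
u, X, Y = 0.0, [], []
for t in range(500):
    u = rho * u + random.gauss(0, 1)
    X.append(0.1 * t)
    Y.append(alpha + beta * X[-1] + u)

# Step 1a: start with rho = 0, i.e. plain OLS on the original data.
a, b = ols(X, Y)
rho_hat = 0.0
for _ in range(100):
    # Estimate rho from the current residuals (equation B').
    e = [yi - a - b * xi for xi, yi in zip(X, Y)]
    new_rho = sum(e[t] * e[t - 1] for t in range(1, len(e))) / \
        sum(et ** 2 for et in e[:-1])
    # Quasi-difference the data with the current rho (equation A).
    Xs = [X[t] - new_rho * X[t - 1] for t in range(1, len(X))]
    Ys = [Y[t] - new_rho * Y[t - 1] for t in range(1, len(Y))]
    a_star, b = ols(Xs, Ys)
    a = a_star / (1 - new_rho)   # the intercept of (A) is alpha*(1 - rho)
    converged = abs(new_rho - rho_hat) < 1e-8
    rho_hat = new_rho
    if converged:
        break
```

On this kind of well-behaved data the loop settles in a handful of iterations, with ρ̂, α̂ and β̂ near their true values up to sampling error.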
This is the so-called numerical derivative: if the expression in (a) is positive, it means that
at ρ = 0.5 the errors are increasing in ρ, so we should next try ρ less than 0.5.
Repeat the same procedure for α and β.
Once we know the directions in which ρ, α and β should be searched, we can change
the initial values and repeat the entire process.
Example:
Start with α = 2, β = 0.5, ρ = 0.7. Suppose
derivative with respect to α > 0,
derivative with respect to β < 0,
derivative with respect to ρ < 0.
Now we can set α = 0.2, β = 0.8, ρ = 0.9.
Suppose the signs of the derivatives are now
positive for α,
positive for β,
negative for ρ.
Now set α = −0.3, β = 0.7, ρ = 0.95, and so on until the sum of squared errors is
minimized.
HETEROSCEDASTICITY:
Introduction:
If the assumption that Var(Ui) = σ² for all i is violated, we have Var(Ui) = σi², which
can vary from observation to observation; this situation is referred to as heteroscedasticity.
Examples:
(i) Qi = α + βKi + γLi + δAi + Ui
Q = wheat output, K = capital, L = labor, A = acreage.
In our sample we have farms of all sizes; Var(Ui) measures the size of the variation in
output due to random factors. We expect Var(Ui) to increase with the size of the farm.
(ii) Yi = α + βXi + Ui
Y = expenditure on snacks, X = income.
Random fluctuation in snack expenditure is low at low incomes and larger at high
incomes, mostly in cross-section data. So
Var(Ui) = σi², which varies across observation points. One reason can be that when the
value of Xi is larger, there are more chances of large unexpected variations in Yi, that is,
σi² = f(Xi).
Example 1):
Yi = α + βXi + Ui
Yi is food consumption, Xi is income, and the data are at household level. The
households with higher income levels are expected to experience larger fluctuations in food
consumption.
Example 2):
Yi = α + βXi + Ui
Yi is wheat output, Xi is area under the wheat crop, and Ui is the random fluctuation in
wheat output. Larger farms are expected to experience larger fluctuations in output; there
can be favorable and unfavorable effects of weather conditions on wheat output.
Obviously the heteroscedasticity problem is more likely to arise where there are larger
variations in Xi. This is more likely to happen in cross-section data than in time-series data.
Heteroscedasticity is mainly a problem of cross-section data; it may arise in time-series data
if the data are observed at high frequency, like daily or weekly data.
Consequences of heteroscedasticity:
The OLS estimators remain linear and unbiased.
1). OLS estimators no longer have minimum variance in the class of linear unbiased
estimators; they are no longer best.
2). The ordinary formula for calculating variances, Var(β̂) = σ²/Σx², is no longer
valid, so the usual standard errors and t-values are misleading.
The Goldfeld-Quandt test:
Yi = α + βXi + Ui
Steps:
1) Arrange the data in ascending order of Xi.
2) Omit the central 20% of observations (adjusted to get a whole number); this will yield two sub-samples: 40% of observations with small Xi and 40% of observations with large Xi.
3) Estimate a regression equation for each sub-sample, compute the residual sums of squares RSS1 and RSS2, and hence
σ̂1² = RSS1 / (n1 - k),  σ̂2² = RSS2 / (n2 - k)
4) Compute the F-statistic, with n1 = n2 = 0.4n:
F = σ̂1² / σ̂2²  if σ̂1² > σ̂2²
F = σ̂2² / σ̂1²  if σ̂2² > σ̂1²
The F-test is applied at the 5% level of significance and degrees of freedom (df) equal
to n1 - k and n2 - k. Our null and alternative hypotheses are as given below:
H0: σ1² = σ2²  [no heteroscedasticity]
H1: σ1² ≠ σ2²  [heteroscedasticity]
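The steps above can be sketched in Python. This is a minimal illustration on simulated data; the data-generating process and all names are assumptions for the example, not part of the notes.

```python
import numpy as np

def goldfeld_quandt(y, x, drop_frac=0.2):
    """Goldfeld-Quandt: sort by x, drop the central observations, and
    compare the residual variances of the two outer sub-samples."""
    order = np.argsort(x)
    y, x = y[order], x[order]
    n = len(y)
    n_sub = int(n * (1 - drop_frac) / 2)        # ~40% of n in each sub-sample
    k = 2                                       # parameters per regression

    def s2(ys, xs):
        X = np.column_stack([np.ones_like(xs), xs])
        beta, *_ = np.linalg.lstsq(X, ys, rcond=None)
        e = ys - X @ beta
        return (e @ e) / (len(ys) - k)          # sigma_hat^2 = RSS / (n_sub - k)

    s1 = s2(y[:n_sub], x[:n_sub])               # small-x sub-sample
    s2_ = s2(y[-n_sub:], x[-n_sub:])            # large-x sub-sample
    return max(s1, s2_) / min(s1, s2_)          # larger variance on top

# Simulated example where Var(U_i) grows with X_i, so F should be large.
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 200)
u = rng.normal(0, 0.5 * x)                      # heteroscedastic errors
y = 2.0 + 1.5 * x + u
F = goldfeld_quandt(y, x)
```

F is then compared with the F table at (n1 - k, n2 - k) degrees of freedom; in this simulation it comes out well above 1.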
Notes:
1). The test is very powerful.
2). If there is more than one X variable, the test becomes quite complicated, e.g.
Foodi = α + β Incomei + γ Familyi + Ui
or
Yi = α + βXi + γZi + Ui
3). The test does not indicate the form of heteroscedasticity (whether it arises through
linear, squared or cross-product terms).
White's general test:
W = n R²
[R² is obtained from the auxiliary regression, equation (2); if it is not negligible, it is
significant.]
F ~ F (m-1), (n-m)  [F-version of the test]
W ~ χ²(m)
where m is the number of regressors in the auxiliary regression:
m = 1 + (k-1) + (k-1) + (k-1)(k-2)/2
[intercept; linear terms; squared terms; cross-product terms]
m = 1 + (k-1) + (k-1) + (k-1)(k-2)/2
  = 1 + k - 1 + k - 1 + (k² - 3k + 2)/2
  = 2k - 1 + (k² - 3k + 2)/2
  = k²/2 + k/2
m = k (k+1)/2
Hypotheses:
H0: ai = 0, bi = 0, ci = 0 for all i except the intercept.
H1: At least one parameter in H0 is ≠ 0.
Rejection of H0 indicates the presence of heteroscedasticity.
Notes:
1) If k is large then m will also be large, and this will reduce the power of the test.
For example, if k = 6 then m = 6*7/2 = 21.
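The mechanics can be sketched in Python (simulated data; the degrees-of-freedom convention and all names are assumptions for illustration):

```python
import numpy as np
from itertools import combinations

def white_test(y, X):
    """White's general test: regress squared OLS residuals on levels,
    squares and cross-products of the regressors; statistic = n * R^2."""
    n, k = X.shape                              # first column of X = intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ beta) ** 2                    # squared OLS residuals
    cols = [X[:, j] for j in range(1, k)]
    Z = [np.ones(n)] + cols                     # intercept + (k-1) levels
    Z += [c * c for c in cols]                  # (k-1) squares
    Z += [a * b for a, b in combinations(cols, 2)]  # (k-1)(k-2)/2 cross terms
    Z = np.column_stack(Z)                      # m = k(k+1)/2 columns in total
    g, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    resid = e2 - Z @ g
    tss = (e2 - e2.mean()) @ (e2 - e2.mean())
    r2 = 1 - (resid @ resid) / tss
    return n * r2, Z.shape[1]                   # compare with the chi-square table

rng = np.random.default_rng(1)
n = 300
x1, x2 = rng.uniform(1, 5, n), rng.uniform(1, 5, n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1 + 2 * x1 - x2 + rng.normal(0, x1)         # error variance rises with x1
stat, m = white_test(y, X)                      # m = 3*4/2 = 6 here
```

With the simulated heteroscedasticity, the statistic lands far above the relevant chi-square critical value.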
3) The test is very general in application; it can detect more than one form of heteroscedasticity.
Solutions to heteroscedasticity:
Informal solution:
In some contexts we can respecify our model to reduce the chances of
heteroscedasticity.
Example 1:
Suppose we suspect that heteroscedasticity relates to K (capital), and we also expect the
output elasticities to sum to one (constant returns to scale); then we can rewrite the
equation in per-unit (intensive) form.
This is a more stable model than the previous one; the respecified equation is less likely to
suffer from heteroscedasticity.
Example 2:
Consider a quadratic expenditure system (QES):
Yi = α + βXi + γXi² + Ui
Yi = food,
Xi = income.
Suppose Var (Ui) = σ²Xi². Dividing the equation through by Xi gives the budget-share form:
Si = Yi/Xi = α (1/Xi) + β + γXi + Vi,  where Vi = Ui/Xi
Var (Vi) = Var ((1/Xi) Ui)
= (1/Xi²) Var (Ui)
= (1/Xi²) σ²Xi²
= σ²  -------- no heteroscedasticity.
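A quick numerical check of this transformation (a sketch: the data-generating process with Var (Ui) = σ²Xi² and all parameter values are assumptions):

```python
import numpy as np

# If Var(U_i) = sigma^2 * X_i^2, dividing Y_i = alpha + beta*X_i + U_i
# through by X_i gives Y_i/X_i = alpha*(1/X_i) + beta + V_i with V_i
# homoscedastic, so OLS on the transformed equation is appropriate.
rng = np.random.default_rng(2)
n = 500
x = rng.uniform(1, 10, n)
u = rng.normal(0, 2.0 * x)                  # error sd proportional to x
y = 3.0 + 1.5 * x + u

Z = np.column_stack([1 / x, np.ones(n)])    # regress y/x on [1/x, 1]
coef, *_ = np.linalg.lstsq(Z, y / x, rcond=None)
alpha_wls, beta_wls = coef                  # recover alpha and beta
```

The transformed regression recovers the parameters of the original equation with homoscedastic errors.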
Consider the model
Y = α + βD + U  --------------------- (i)
where D = 1 for female, = 0 for male.
From equation (i), if we assume as usual that E (U) = 0 and D is fixed, we can infer the
following:
E (Y) = α + βD
[Mean income depends on gender]
E (YM) = α
E (YF) = α + β
E (YF) - E (YM) = (α + β) - α = β
Now define two dummies:
D1 = 1 for female, = 0 for male
D2 = 1 for male, = 0 for female
We can write the model in three different forms (ways):
Y = α0 + α1D1 + U  --------------------- (i)
Y = β0 + β1D2 + V  --------------------- (ii)
Y = γ1D1 + γ2D2 + W  --------------------- (iii)
If we include dummies for all categories of a qualitative variable and also include an
intercept, it will create the dummy-variable trap: this creates perfect collinearity and
estimation breaks down.
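The trap is easy to demonstrate numerically: with an intercept plus a dummy for every category, the regressor matrix loses full column rank (a sketch; the data are simulated):

```python
import numpy as np

rng = np.random.default_rng(3)
male = rng.integers(0, 2, 50).astype(float)     # D1 = 1 if male
female = 1.0 - male                             # D2 = 1 if female
ones = np.ones(50)

X_trap = np.column_stack([ones, male, female])  # intercept + ALL dummies
X_ok = np.column_stack([ones, female])          # intercept + one dummy

rank_trap = int(np.linalg.matrix_rank(X_trap))  # 2, not 3: perfect collinearity
rank_ok = int(np.linalg.matrix_rank(X_ok))      # 2: full column rank
```

Since the intercept column equals male + female exactly, X'X is singular in the first design and OLS cannot be computed.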
E (Y) = α0 + α1D1 = β0 + β1D2 = γ1D1 + γ2D2
E (YM) = α0 = β0 + β1 = γ2  [if D1 = 0, D2 = 1, i.e. male]
E (YF) = α0 + α1 = β0 = γ1  [if D1 = 1, D2 = 0, i.e. female]
E (YF) - E (YM) = α1 = -β1 = γ1 - γ2  [difference]
Equation (iii), which has no intercept, is not a very good specification, but essentially
there is no difference in the results across the three forms (with male as the base
category). The same device applies, e.g., to the relationship of education and literacy with
income.
Education dummies (base category: illiterate):
D2 = 1 if primary, = 0 otherwise.
D3 = 1 if secondary, = 0 otherwise.
D4 = 1 if senior secondary, = 0 otherwise.
D5 = 1 if higher, = 0 otherwise.
The regression model is:
Y = α1 + α2D2 + α3D3 + α4D4 + α5D5 + U
We can see that
E (Y) = α1 + α2D2 + α3D3 + α4D4 + α5D5
E (YI) = α1
E (YP) = α1 + α2
E (YS) = α1 + α3
E (YSS) = α1 + α4
E (YH) = α1 + α5
G =1 if female,
=0 otherwise
Categories and mean incomes (the δ's denote the female differentials entering through G):
G = 0:
Male, primary:  E (YP) = α1
Male, secondary:  E (YS) = α1 + α2
Male, higher:  E (YH) = α1 + α3
G = 1:
Female, primary:  E (YP) = α1 + δ1
Female, secondary:  E (YS) = α1 + α2 + δ1 + δ2
Female, higher:  E (YH) = α1 + α3 + δ1 + δ3
----------------------------------------------------------------------------------------------.
Combining Qualitative and Quantitative Variables:
Suppose income depends upon experience and education. Experience is measured as a
quantitative variable (the years of experience); education has three categories:
1). M Sc. or equivalent
2). M Phil or equivalent
3). PhD or equivalent
Defining dummies (base category: M Sc.):
D2 = 1 if M.Phil, = 0 otherwise
D3 = 1 if PhD, = 0 otherwise
The model can be constructed as follows:
Y = α + βE + U  -------------------- (1)
α = α1 + α2D2 + α3D3  ---------------------- (2a)
β = β1 + β2D2 + β3D3  ---------------------- (2b)
Substitute (2a) and (2b) into (1); then
Y = α1 + α2D2 + α3D3 + [β1 + β2D2 + β3D3] E + U
[Mean income:]
E (Y) = α1 + α2D2 + α3D3 + [β1 + β2D2 + β3D3] E
E (Y | M Sc) = α1 + β1 E
E (Y | M.Phil) = (α1 + α2) + (β1 + β2) E
E (Y | PhD) = (α1 + α3) + (β1 + β3) E
We expect that
α3 > α2 > 0,  β3 > β2 > 0,  α1 > 0,  β1 > 0
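A simulated sketch of estimating this specification by OLS on the expanded regressor set (the parameter values and category sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 600
E = rng.uniform(0, 20, n)                       # years of experience
cat = rng.integers(0, 3, n)                     # 0 = MSc, 1 = MPhil, 2 = PhD
D2 = (cat == 1).astype(float)                   # M.Phil dummy
D3 = (cat == 2).astype(float)                   # PhD dummy

# True (illustrative) parameters: intercept and slope both rise with education.
y = 10 + 5 * D2 + 9 * D3 + (1.0 + 0.5 * D2 + 0.8 * D3) * E + rng.normal(0, 1, n)

# Regressors: 1, D2, D3, E, D2*E, D3*E, matching the substituted equation.
X = np.column_stack([np.ones(n), D2, D3, E, D2 * E, D3 * E])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
# b = [alpha1, alpha2, alpha3, beta1, beta2, beta3]
```

The interaction columns D2*E and D3*E are what allow the slope on experience to differ by education category.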
Suppose the assumption that the X variables are exogenous is not true. This situation is
called the case of stochastic / random regressors.
In a typical equation we have
Y = α + βX + U
IS:
R=
Where G is given but R and Y are not: the government can change G according to its
needs, while R and Y are determined endogenously.
Example 6:
Weight = α + β Age + U
Age is given (fixed) and the information is complete; in practice more factors, like age,
are given.
Consequences of the stochastic / random regressors problem:
Consider
Y = α + βX + U
where U satisfies all the standard assumptions, but X is not fixed; it is random.
is biased.
3). It can be shown that if x is random and merely uncorrelated with u (rather than fully
independent of it), then the OLS estimator is biased, but the bias approaches zero as the
sample size increases.
Examples (in each case the regressor is random and may be correlated with U):
1). Weight = α + βF + U
2). Q = α + βP + U
3). Y = α + βR + γG + U
A good example is
Yt = α + βYt-1 + Ut
Since Yt-1 = α + βYt-2 + Ut-1 depends on the random variable Ut-1, Yt-1 is random; but Ut is
uncorrelated with Yt-1 (today's shock does not change yesterday's event).
Because Ut and Yt-1 are uncorrelated, as the sample size tends to infinity (or becomes
reasonably large) the bias becomes negligible.
4). It can be shown that if x is random and x is not independent of U, then the OLS estimator
is biased and the amount of bias does not diminish with an increase in sample size.
5). Consider a special case of example 4:
Yt = α + βYt-1 + Ut  [current CPI depends upon previous CPI]
Ut = ρUt-1 + εt
Ut depends on Ut-1; but Yt-1 also depends on Ut-1, so
Ut and Yt-1 are correlated.
Now the OLS estimator becomes biased. [Recall that the DW statistic also becomes biased;
now we know the reason.] Autocorrelated errors and a lagged dependent variable together
create a more serious problem.
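A small Monte Carlo sketch of case 5 (the parameter values are illustrative assumptions): with β = 0.5 and ρ = 0.6 the OLS slope tends to roughly (β + ρ)/(1 + βρ) ≈ 0.85 rather than 0.5, and the bias does not vanish as the sample grows.

```python
import numpy as np

rng = np.random.default_rng(5)

def mean_ols_slope(beta=0.5, rho=0.6, T=200, reps=200):
    """Average OLS slope in Y_t = beta*Y_{t-1} + U_t with U_t = rho*U_{t-1} + eps_t."""
    est = []
    for _ in range(reps):
        u, y = 0.0, 0.0
        ys = []
        for _ in range(T + 50):                 # 50 burn-in periods
            u = rho * u + rng.normal()          # AR(1) error
            y = beta * y + u                    # lagged dependent variable
            ys.append(y)
        ys = np.array(ys[50:])
        X = np.column_stack([np.ones(T - 1), ys[:-1]])
        b, *_ = np.linalg.lstsq(X, ys[1:], rcond=None)
        est.append(b[1])
    return float(np.mean(est))

mean_b = mean_ols_slope()                       # lands well above the true 0.5
```

The upward-distorted slope illustrates why OLS (and the DW statistic) cannot be trusted in this setting.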
Solution / Estimation Procedure:
Consider the model
Y = α + βX + U,  Cov (X, U) ≠ 0
[X is random and correlated with U, as in example 4.]
Now we define an instrumental variable, say Z, as a variable that satisfies two conditions:
1). Z and X are closely correlated within the given sample.
2). Cov (Z, U) = 0 in the population. [This condition cannot be verified directly from the
sample.]
So X is correlated with U, but Z is not correlated with U.
Example:
Food = α + β Weight + U
X = weight
Z = age
Weight is correlated with U (sickness, for example, affects both food intake and weight),
but age is not.
Example:
C = α + βY + U
Y = C + Z
Z = exogenous  [U affects C, which affects Y]
Z (and, in time-series data, Yt-1) is a good instrument for Y in this example.
Stage 1:
Y = α + βX + U  ------------ (i)
Cov (X, U) ≠ 0, and Z is a valid instrument.
Regress X on Z:
X = a + b Z + V
Apply OLS to obtain â and b̂, and hence the fitted values X̂ = â + b̂ Z.
X̂ contains only those variations in X which are determined by Z.
Stage 2: Rewrite (i) as
Y = α + β (X̂ + X - X̂) + U
Y = α + βX̂ + [U + β (X - X̂)]
[error-in-X term]
or
Y = α + βX̂ + W,  where W = U + β (X - X̂).
Now apply OLS.
These estimators are called 2SLS estimators as well as Instrumental Variables Least
Squares [IVLS] estimators.
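The two stages can be sketched with simulated data (the data-generating process, instrument strength and all names are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2000
z = rng.normal(size=n)                      # instrument: drives x, unrelated to u
u = rng.normal(size=n)
x = 1.0 * z + 0.8 * u + rng.normal(size=n)  # endogenous: Cov(x, u) != 0
y = 2.0 + 1.5 * x + u                       # true beta = 1.5

def fit(dep, reg):
    A = np.column_stack([np.ones(len(reg)), reg])
    b, *_ = np.linalg.lstsq(A, dep, rcond=None)
    return b

b_ols = fit(y, x)[1]                        # biased upward by the endogeneity

a = fit(x, z)                               # Stage 1: regress X on Z
x_hat = a[0] + a[1] * z                     # fitted values, the "clean" part of X
b_2sls = fit(y, x_hat)[1]                   # Stage 2: regress Y on X_hat
```

Here b_ols drifts well above the true 1.5 while b_2sls recovers a value close to it; note that proper 2SLS standard errors need a correction this sketch omits.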
Qd = α + βP + U  [Demand]
Qs = γ + δP + V  [Supply]
P and Q are both endogenous variables: they are determined simultaneously, each one
entering the determination of the other. This is referred to as the simultaneous-equations
case.
! ~~~~~~~~~~~~~~~~~~~~~~~~~ !
f) Following is a set of simultaneous equations:
Z = α + βY + U
Y = C + I + G + X - M
Answer: False. Z and Y are not simultaneously determined: Y is given by the second
equation without reference to Z, so the two equations do not have to be solved at the same
time and do not form a simultaneous system.
g) The inconclusive range in the DW test is the result of type-2 error.
Answer: False, because a type-2 error is accepting H0 when it is false, whereas the
inconclusive range means that we are unable to give a concrete result.
h) Goldfeld-Quandt test is a powerful method of estimating an equation in the presence of
heteroscedasticity.
Answer: False. Goldfeld-Quandt is a test for detecting heteroscedasticity, not an
estimation method.
Q.2: Critically evaluate the following statements. Give details to justify your answer.
a) Hybrid equations are used in order to remove both autocorrelation and multicollinearity from
an equation.
Answer: We have not covered the hybrid-equations topic.
b) In the presence of multicollinearity, the OLS estimator is linear and unbiased and its variance
is smaller than the variance of any other linear and unbiased estimator.
Answer: True. Multicollinearity (short of perfect collinearity) violates none of the
Gauss-Markov assumptions, so the OLS estimator remains linear, unbiased and best:
its variance is the smallest in the class of linear unbiased estimators.
c) In the presence of autocorrelation OLS estimators of regression parameters are likely to have
large sampling error and, therefore, they are unbiased.
Answer: Partly true. Under autocorrelation the OLS estimators do remain unbiased and are
likely to have large sampling errors, but the word "therefore" is misplaced: unbiasedness
is not a consequence of the large sampling error.
d) The estimators based on Cochrane-Orcutt iterative method are linear and unbiased with
minimum variance.
Q.3: Using a cross section data of 500 household you are to study the effects of income, rural-urban
residence and education level of the household on household savings. The information on
education is classified as no education, school level education and higher education. Formulate an
appropriate regression equation.
Answer: Let S be household savings and Y household income. Define R = 1 if urban, = 0 if
rural; E2 = 1 if school-level education, = 0 otherwise; E3 = 1 if higher education, = 0
otherwise (no education is the base category). Then:
S = α + βY + U  ------------------- (1)
α = α1 + α2 R + α3 E2 + α4 E3  ------------------- (2a)
β = β1 + β2 R + β3 E2 + β4 E3  ------------------- (2b)
W = n R²
[R² is obtained from the auxiliary regression; if it is not negligible, heteroscedasticity is
significant.]
Without cross-product terms the auxiliary regression has
m = 2k - 1 regressors
= 2*9 - 1
= 18 - 1
m = 17  [k = 9]
n = 500
n - m = 500 - 17 = 483.
With cross-product terms, m = k (k+1)/2.
Test the hypotheses:
H0: ai = 0, bi = 0, ci = 0 for all i except the intercept.
H1: At least one parameter in H0 is ≠ 0.
Rejection of H0 indicates the presence of heteroscedasticity; acceptance of H0 indicates
no heteroscedasticity.
Q.5: Consider the following estimated regression equation based on a sample of 26 firms in a
manufacturing industry of Pakistan, where MPL and L denote the marginal product of labor and
the number of labor units respectively. The values in parentheses are the computed t-values.
MPL = 100 + 0.012 (1/L),  R² = 0.4
      (40.0)  (0.03)
[Elasticities of money demand:]
β1 = ∂Log (Mt) / ∂Log (Pt)
β2 = ∂Log (Mt) / ∂Rt
β3 = ∂Log (Mt) / ∂Log (Wt)
β4 = ∂Log (Mt) / ∂Log (Mt-1)
H1: β ≠ 0
ii. The output elasticity of money demand is greater than the price elasticity.
Answer:
H0: βY = βP
H1: βY > βP
Answer:
H0: β1 = β2, β3 = 0, β4 = 0
H1: β1 ≠ β2, β3 ≠ 0, β4 ≠ 0
iii. When a lagged dependent variable appears on the right-hand side, the DW test is not
appropriate for detecting first-order autocorrelation; Durbin's h-test is suitable in this
case.
Q.8: Can White's general test detect all types of autocorrelation in a random variable?
Answer: No. White's general test is a test for heteroscedasticity, not for autocorrelation;
autocorrelation is a separate problem (more likely in time-series data) with its own tests.
Q.11: Interpret the following regression equation as an economist. C, Y and W are per capita
consumption, income and wealth respectively, all in thousand rupees. Numbers in parentheses are
the t-values.
Ct = 1.17 + 0.45Yt + 0.55Wt,  R² = 0.9643,  DW = 0.09
     (2.13)  (6.17)   (1.43)
Answer: Interpretation:
rupees per capita per year. Further, note that the marginal propensity to consume (MPC)
out of income is significantly different from zero and less than one: the estimated MPC of
0.45 means that 45% of each incremental rupee of income is consumed. The coefficient on
wealth suggests that 55% of each incremental rupee of wealth is consumed, although its
t-value (1.43) is not significant.
Q.12: Suppose in the equation Yt = a + bXt + Ut, the stochastic variables Xt and Ut are correlated with
each other.
a) Does this imply that we have problems of autocorrelation and/or multicollinearity and/or
heteroscedasticity?
Answer: No: correlation between a regressor and the error term is none of these; it is the
endogeneity problem.
b) Can in this case the equation be estimated by White's general test or Durbin-Watson test or
Durbin's h-test?
Answer: No. These are tests, not estimation methods; the equation calls for an
instrumental-variables estimator such as 2SLS.
Q.13: Suppose you have estimated two alternative cost functions for wheat using data on 500 farms.
The cost (C) is measured in thousands of rupees while output (Q) is measured in tons. The
regression results are given below. The values in parentheses are standard errors.
C/Q = 4568 + 0.2284 Q + 4.84 Q⁻¹
      (1141)  (0.5521)   (4.40)
Can you test the null hypothesis that the marginal cost is an increasing function of output for each
equation? If yes, apply the test and draw your conclusion. If not, explain why the test cannot be
applied and what additional information, if any, is required to perform the test.
Answer:
Log (C) = -10.48 + 1.12 log (Q)
          (2.56)   (0.16)
Null hypothesis:
H0: β = 0
H1: β ≠ 0
t = 1.12 / 0.16 = 7.0
We reject H0.
H0: β = 1
H1: β < 1
We accept H0  [t = (1.12 - 1) / 0.16 = 0.75].
C/Q = 4568 + 0.2284 Q + 4.84 Q⁻¹
      (1141)  (0.5521)   (4.40)
Null hypotheses:
H0: β2 = 0
H1: β2 ≠ 0
We reject H0.
H0: β3 = 0
H1: β3 ≠ 0
We reject H0.
H0: β2 = 1
H1: β2 < 1
We accept H0.
H0: β3 = 1
H1: β3 > 1
We accept H0.
Conclusion:
Tests on the individual coefficients are reported above, but to judge whether each equation
is satisfactory overall we would also need its R², which is not given for each equation. The
results indicate that marginal cost is not an increasing function of output for either
equation.
Q.14: Suppose you want to study the propositions:
i. Loan recovery rate varies considerably across private commercial banks, publicly owned
commercial banks and development finance institutions.
ii. The loan recovery rate declines with the size of loan.
Formulate an appropriate econometric equation, giving special attention to the construction of
variables and the type of data to be used for estimation.
Answer:
i. Let R be the loan recovery rate and S the size of loan (cross-section data on loans):
R = α + βS + U  ---------------- (1)
Dummies for type of bank (base category: private commercial banks):
D2 = 1 if publicly owned commercial bank, = 0 otherwise.
D3 = 1 if development finance institution, = 0 otherwise.
α = α1 + α2 D2 + α3 D3  ------------------- (2a)
β = β1 + β2 D2 + β3 D3  ------------------- (2b)
ii. Proposition ii implies β < 0 (recovery rate declines with loan size).
DW = 1.82
Y = α + βR + γZ + U
M = λ + μR + νY + θW + V
Model:
Y = α + βX + U  ------------ (i)
Cov (X, U) ≠ 0, and Z is a valid instrument.
Stage 1:
Regress X on Z:
X = a + b Z + V
Apply OLS to obtain â and b̂, and hence the fitted values X̂ = â + b̂ Z.
X̂ is such that it contains only those variations in X which are determined by Z (basically
we are filtering out the problem).
In other words we have
Y = α + β (X̂ + X - X̂) + U
Y = α + βX̂ + [U + β (X - X̂)]
[error-in-X term]
or
Y = α + βX̂ + W,  where W = U + β (X - X̂).
Now apply OLS.
These estimators are called 2SLS estimators as well as Instrumental Variables Least
Squares [IVLS] estimators.
c) Provide interpretation for each parameter in the light of the economic model you have chosen.
Answer:
Q.18:
a) Explain the use of dummy variables in determining the effects of gender (male or female) and
education (matriculation, intermediate, bachelor or higher) on wage rates among clerical
personnel.
Answer:
Gender dummy:
G = 1 if female, = 0 otherwise.
Education dummies (base category: matriculation):
E2 = 1 if intermediate, = 0 otherwise.
E3 = 1 if bachelor or higher, = 0 otherwise.
We can construct the model in this way:
W = α + βG + U  ------------------- (1)
α = α1 + α2 E2 + α3 E3  ------------------- (2a)
β = β1 + β2 E2 + β3 E3  ------------------- (2b)
Mean wages by category:
G = 0:
Male, Matriculation:  α1
Male, Intermediate:  α1 + α2
Male, Bachelor or higher:  α1 + α3
----------------------------------------------------------------------------------------------.
G = 1:
Female, Matriculation:  α1 + β1
Female, Intermediate:  (α1 + α2) + (β1 + β2)
Female, Bachelor or higher:  (α1 + α3) + (β1 + β3)
----------------------------------------------------------------------------------------------.
Interpretation of parameters:
α1 = Mean wage of males with matriculation-level education.
α2 = Wage differential of intermediate over matriculation education (for males).
α3 = Wage differential of bachelor-or-higher over matriculation education (for males).
β1 = Differential effect of being female at matriculation-level education.
β2 = Additional effect of being female at intermediate-level education.
β3 = Additional effect of being female at bachelor-or-higher education.
Q.19: Using the regression equation Yi = β + Ui, provide a precise answer to the following questions,
with or without mathematical proofs.
a) Under what assumption is the OLS estimator of β linear?
Answer: None is required: the OLS estimator β̂ = Ȳ is a linear function of the Yi by
construction.
b) Under what assumption is the OLS estimator of β unbiased?
Answer: Under the assumption E (Ui) = 0 for all i, since then E (β̂) = β.
Q.20: Consider the following demand function for rice where Q is per capita consumption of rice in
kilograms, P is price of rice per kilogram and M is per capita income in rupees. The regression
equation has been estimated on the basis of time series data for 9 years. The values in
parentheses are standard errors.
ln Q = 2.46 - 0.45 ln P + 0.65 ln M,  R² = 0.90,  F = 12.00
       (0.82)  (0.20)      (0.50)
1. The test statistic has an inconclusive range, so it may not produce a concrete
conclusion.
2. The test is specifically designed for an AR(1) process, not for higher-order
autoregressive processes, MA processes or others.
3. DW gives biased results when a lagged dependent variable appears on the right-hand
side.
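For reference, the DW statistic itself is easy to compute from residuals; a sketch on simulated AR(1) residuals (ρ = 0.7 is an assumption for the example):

```python
import numpy as np

rng = np.random.default_rng(8)
T = 400
eps = rng.normal(size=T)
e = np.empty(T)
e[0] = eps[0]
for t in range(1, T):
    e[t] = 0.7 * e[t - 1] + eps[t]          # AR(1) residuals with rho = 0.7

# DW = sum((e_t - e_{t-1})^2) / sum(e_t^2), approximately 2*(1 - rho_hat).
dw = float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))
```

With ρ = 0.7 the statistic comes out near 2(1 - 0.7) = 0.6, far below the no-autocorrelation value of 2.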
Answer: Apply OLS, estimate the equation, and compute the regression residuals êi.
Q.26: Interpret multicollinearity problem as poor information content in data. Consider any
estimation strategy and explain how it can improve the information content.
Q.27: What econometric problems arise in the estimation of an equation with lagged dependent
variable on the right hand side? Suggest solution(s) to these problems.
Q.28: Specify an econometric equation to determine monthly earning in a cross section of 300
economists in Pakistan. Define all the variables in your model and explain how they can be
measured in practice.
Q.29: Determine identification of the following two equations by hybrid equations method and
explain the steps for estimation of each equation by 2SLS method. Consider the following set
of equations.
a) Y = β1 + β2 R + β3 Z + U
Answer: Here Cov (R, U) ≠ 0, so R is the endogenous regressor.
b) M = λ1 + λ2 Y + λ3 R + V
Answer: Here Cov (Y, V) ≠ 0, so Y is the endogenous regressor.
Stage 1: Regress Y on R:
Y = π1 + π2 R + U
Apply OLS and obtain π̂1, π̂2 and the fitted values Ŷ,
where Ŷ is such that it contains only that variation in Y which is explained (determined) by R.
In other words we have
Y = Ŷ + (Y - Ŷ). Y is endogenous, so there is some trouble; Ŷ "de-endogenizes" Y.
Stage 2: Rewrite the main equation as follows:
M = λ1 + λ2 (Ŷ + Y - Ŷ) + λ3 R + V
M = λ1 + λ2 Ŷ + λ3 R + [V + λ2 (Y - Ŷ)]
[error-in-Y term]
Or, with W = V + λ2 (Y - Ŷ):
M = λ1 + λ2 Ŷ + λ3 R + W
Q.30: Consider an econometric equation involving four or more variables. Suppose you have access
to only annual data for 25 years for Pakistan and no other data are available in or outside
Pakistan. Further suppose that there is severe multicollinearity in data that can not be
eliminated by dropping any variable from the equation. How would you handle this situation?
Provide an elaborate answer.
Q.31: The daily demand for strawberries in Islamabad depends on the price of strawberries only. On
each day a fixed quantity of strawberries (which can change from day to day) is brought to the
market and the price is determined at a level that clears the market. If it were known that the
elasticity of demand is constant, would you be able to obtain an unbiased estimator of the elasticity?
Answer:
Qd = α + βP + U  [Demand function]
Qs = Q̄ (fixed)  [Supply function]
With a constant elasticity of demand, take logs:
Log Qd = α + β Log P + U  [Demand function; β is the elasticity]
Log Qs = Log Q̄ (fixed)  [Supply function]
Here we cannot take P out of the expectation because P is not a fixed variable: since the
price adjusts to clear the market, P and U are correlated with each other, and the estimator
of the elasticity is biased.
Q.32: You want to estimate a Cobb-Douglas production function for the manufacturing sector of
Pakistan with capital, labor and energy as the factor inputs, with only 16 time series
observations available. Multicollinearity problem is likely to arise. In order to tackle
this problem one can use 16 observations on the private sector and other 16 on public
sector to make a pooled sample of 32 observations. What complications are likely to
arise due to pooling and how would you respond to these complications?
!~~~~~~~~~~~~~~~~~~~~~~~~~!
E523 Econometrics
Sir Eatzaz Ahmed
Terminal Paper
2.
Are the following statements true, false or uncertain? Explain your answer.
a) The sample mean of the random error terms, Ū = (1/n) Σ Ui, is equal to zero.
b) In the regression equation Y/X = b + U the OLS estimator of b is equal to Ȳ / X̄.
c) If the variable Y is regressed on X and log (X), it may create multicollinearity due
to strong linear relationship between the variables X and log (X).
d) In the equation Xt = α + βYt + γYt-1 + Ut a major limitation of the DW test is that it
produces biased results due to the presence of Yt-1 on the right-hand side of the
equation.
e) Since a dummy variable can take only two values, it must be fixed (exogenous).
f) Instrumental variables are used to test the presence of endogenous variables in the
equation.
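The near-linearity claimed in statement (c) can be checked numerically; over a narrow range of X, X and log(X) are almost perfectly correlated (the range below is an assumption for the example):

```python
import numpy as np

x = np.linspace(50, 60, 100)                # a narrow range of X
r = float(np.corrcoef(x, np.log(x))[0, 1])  # correlation of X and log(X)
# r is essentially 1 here, so including both regressors invites severe
# multicollinearity; over a wide range of X the correlation is lower.
```
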
3. Consider the regression model:
Yt = α + βXt + Ut
Ut = ρUt-2 + εt,
where εt is white noise.
a) Derive autocorrelation coefficients for the lag lengths 0, 1, 2, 3, 4.
b) Explain the Two-Step Iterative method of estimation.
4.
E523 Econometrics
Sir Eatzaz Ahmed
1st Mid Term
1. Suppose you have estimated two alternative cost functions for wheat using data on 500 farms. The
cost (C) is measured in thousands of rupees while output (Q) is measured in tons. The regression
results are given below. The values in parentheses are standard errors.
C/Q = 4568 + 0.2284 Q + 4.84 Q⁻¹
      (1141)  (0.5521)   (4.40)
Can you test the null hypothesis that the marginal cost is an increasing function of output for each
equation? If yes, apply the test and draw your conclusion. If not, explain why the test cannot be
applied and what additional information, if any, is required to perform the test.
Answer:
Log (C) = -10.48 + 1.12 log (Q)
          (2.56)   (0.16)
Null hypothesis:
H0: β = 0
H1: β ≠ 0
t = 1.12 / 0.16 = 7.0
We reject H0.
H0: β = 1
H1: β < 1
We accept H0.
Null hypotheses:
H0: β2 = 0
H1: β2 ≠ 0
We reject H0.
H0: β3 = 0
H1: β3 ≠ 0
We reject H0.
H0: β2 = 1
H1: β2 < 1
We accept H0.
H0: β3 = 1
H1: β3 > 1
We accept H0.
Conclusion:
Tests on the individual coefficients are reported above, but to judge whether each equation
is satisfactory overall we would also need its R², which is not given for each equation. It is
shown from the results that marginal cost is not an increasing function of output for each
equation.
2. Using the regression equation Yi = α + βXi + Ui, provide a precise answer to the following question,
with or without mathematical proofs.
a) Derive the OLS estimator of α (taking the slope β as given) and show that it is unbiased.
Answer:
Yi = α + βXi + Ui
Estimation:
ei = Yi - α̂ - βXi
Min Σei² = Σ(Yi - α̂ - βXi)²
First-order condition:
-2 Σ(Yi - α̂ - βXi) = 0
=> ΣYi - nα̂ - βΣXi = 0
=> ΣYi - βΣXi = nα̂
α̂ = (1/n)(ΣYi - βΣXi)
= (1/n) Σ(Yi - βXi)
= (1/n) Σ(α + βXi + Ui - βXi)
= (1/n) Σ(α + Ui)
= (1/n)(nα) + (1/n) ΣUi
= α + (1/n) ΣUi
= α + Σ ai Ui,  where ai = 1/n.
α̂ is unbiased, i.e. E (α̂) = α.
Proof:
E (α̂) = E [α + Σ ai Ui]
= α + Σ ai E (Ui)  [since ai is fixed]
= α + Σ ai (0)  [since E (Ui) = 0]
E (α̂) = α
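The unbiasedness result can be verified with a quick Monte Carlo experiment (a sketch; the parameter values are illustrative assumptions, and the slope is treated as known as in the derivation above):

```python
import numpy as np

rng = np.random.default_rng(7)
alpha, beta, n, reps = 5.0, 1.5, 30, 20000
X = rng.uniform(0, 10, n)                   # fixed regressors, reused every rep
U = rng.normal(0, 2, size=(reps, n))        # E(U_i) = 0
Y = alpha + beta * X + U                    # one simulated sample per row

# With beta known, alpha_hat = mean of (Y_i - beta*X_i) = alpha + mean(U_i).
alpha_hats = (Y - beta * X).mean(axis=1)
avg = float(alpha_hats.mean())              # averages out very close to alpha
```

Each individual estimate fluctuates around α, but their average over many samples settles on α, which is exactly what E (α̂) = α asserts.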
! ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ !
E523 Econometrics
Sir Eatzaz Ahmed
2nd Mid Term
1.
a) How would you simply define multicollinearity?
b) What type of procedure do you suggest to diagnose or test multicollinearity?
c) How would you deal with multicollinearity?
2.
d) Derive the autocorrelation coefficient function at lag lengths 0, 1, 2, 3, 4.
e) Consider the following model and solve through the iterative two-step procedure.