
INTRODUCTION TO ECONOMETRICS

Basic Econometrics
M. Hashem Pesaran
Paper 3, Lent Term

1 Statistical Inference
While mathematics is concerned (almost
exclusively) with deductive methods, statistics
employs inductive methods of inference.
The following examples clarify the meaning of
these two methods and the differences
between them.
1.1 Deductive methods
The deductive process involves deriving a set
of “conclusions” (B) from a set of premises
(A) in a logically consistent manner.
Premises A = (A1, A2, ...)
Conclusions B = (B1, B2, ...)
A implies B but not necessarily the reverse.
A1 : All employed persons in the UK
are over 16 years old.
A2 : Paul is employed.
Then we must necessarily have:

B : Paul is over 16 years old.

1.2 Inductive inference
The inference from the knowledge of the par-
ticular instances of which we have experience
to the knowledge of cases of which we have
no experience represents an inductive process.
Put less rigorously: the extension from the
particular to the general is called inductive in-
ference. Unlike deductive methods, induction
is not conclusive and is subject to a number of
problems.

2 Hypothesis Testing -
Statistical Inference
There are two basic approaches to statistical
inference: classical and Bayesian. The
focus of mainstream econometrics is on
the classical procedure, which involves a “null
hypothesis” (often denoted by H0) and an
“alternative hypothesis” denoted by H1.
The null hypothesis is the initial position
maintained by the investigator, and it is
also often referred to as the “maintained”
hypothesis. The null hypothesis can be
“simple” or “composite”.
2.1 Example of a Simple
Hypothesis
Suppose we are interested in testing the
hypothesis that the mean of yi is 2.5, namely
whether
E(yi) = 2.5.
For this purpose we are given the T
observations y1, y2, ..., yT, and it is assumed
that
yi = µ + εi,  εi ∼ N(0, σ²),  i = 1, 2, ..., T,
where σ² is known (given). In this example
we have
H0 : µ = 2.5,
H1 : µ ≠ 2.5.
Since H0 fully specifies the distribution of
y1, y2, ..., yT (recall that σ² is assumed to be
given), the null hypothesis is said to be
simple. But if σ² is not known then H0 is
composite. Another example of a simple null
hypothesis is given by
H0 : µ = 2.5, σ² = 1,
H1 : µ ≠ 2.5 and/or σ² ≠ 1.
2.1.1 Type I and Type II error and the
power of test
A Type I error occurs when a true null
hypothesis is rejected; its probability, α, is
the size of the test.
A Type II error occurs when a false null
hypothesis is not rejected; its probability is
denoted by β.
The power of a test is 1 − β, that is, the
probability of rejecting a false null hypothesis.
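As a sketch, the trade-off between size and power can be computed numerically for a one-sided test of H0: µ = µ0 against an alternative µ = µ1 > µ0 with σ known. The particular values below (µ0 = 2.5, µ1 = 3.0, σ = 1, T = 25, α = 0.05) are illustrative, not from the notes.

```python
# Sketch: power of a one-sided test of H0: mu = mu0, sigma known.
# mu1, T and alpha below are illustrative values, not from the notes.
from statistics import NormalDist

def power_one_sided(mu0, mu1, sigma, T, alpha=0.05):
    """Pr(reject H0 | H1), rejecting when sqrt(T)*(ybar - mu0)/sigma
    exceeds the critical value C_alpha."""
    z = NormalDist()
    c_alpha = z.inv_cdf(1 - alpha)         # critical value C_alpha
    # Under H1 the standardized statistic is N(shift, 1), where
    shift = (T ** 0.5) * (mu1 - mu0) / sigma
    return 1 - z.cdf(c_alpha - shift)      # power = 1 - beta

print(round(power_one_sided(2.5, 3.0, 1.0, 25), 3))
```

Increasing T or the distance µ1 − µ0 raises the power, while a smaller α (a larger critical value) lowers it.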
2.2 Percentiles, Critical Values
and Value at Risk
Suppose a random variable r (say the daily
return on an instrument) has probability
density function f(r). The p-th percentile of
the distribution of r, denoted Cp, is defined
as the value of the return below which a
proportion p of returns fall. Mathematically,

p = Pr(r < Cp) = ∫₋∞^Cp f(r) dr.

In the risk management literature Cp is used
to compute “Value at Risk”, or VaR for short.
The BIS-compliant VaR corresponds to the 1%
one-sided critical value of a normal
distribution, namely −2.33σ, where σ is the
standard deviation of returns.
In hypothesis testing Cp is known as
the critical value of the test associated with
a (one-sided) test of size p. In the case of
two-sided tests of size p, the associated
critical value is C_{p/2}.
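As a sketch, the −2.33σ figure is simply the 1st percentile of a normal distribution; here σ = 0.02 (a 2% daily volatility) is an illustrative value, not from the notes.

```python
# Sketch: 1% one-sided critical value of a normal return distribution,
# as used for BIS-compliant VaR. sigma is an assumed illustrative
# daily volatility.
from statistics import NormalDist

sigma = 0.02
c_01 = NormalDist(mu=0.0, sigma=sigma).inv_cdf(0.01)   # 1st percentile
print(round(c_01 / sigma, 2))                          # the -2.33 multiplier
```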
2.3 Testing Simple Hypothesis
Suppose we are interested in testing whether
the mean of yi, i = 1, 2, ..., T, is equal to µ0,
where yi ∼ N(µ, σ²), with known σ². The
null and the alternative hypotheses could be

H0 : µ = µ0,
H1 : µ ≠ µ0   (two-sided alternative),

or one of the one-sided alternatives

H1 : µ > µ0   or   H1 : µ < µ0.

Under H0,

ȳT − µ0 ∼ N(0, σ²/T),

where ȳT = (1/T) Σᵢ yi is the sample mean.
Define the standardized variable

ZT = (ȳT − µ0)/(σ/√T) = √T (ȳT − µ0)/σ.

Hence under H0, ZT ∼ N(0, 1), and for the
one-sided test against µ > µ0,

Type I error = Pr{ZT ≥ Cα | H0} = α,
Type II error = Pr{ZT < Cα | H1} = β,
Power = 1 − Type II error = 1 − β
      = Pr{ZT ≥ Cα | H1}.

If σ² is not known the null hypothesis is not
simple and the statistic ZT cannot be used. An
operational version of ZT is the so-called
t-ratio statistic, defined by

tT = √T (ȳT − µ0)/sT,

where sT² is an unbiased estimator of σ² given by

sT² = Σᵢ (yi − ȳT)²/(T − 1),

the sum being over i = 1, 2, ..., T. The
distribution of tT is then Student’s t with
T − 1 degrees of freedom.
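A minimal sketch of computing ZT and the t-ratio follows; the sample, µ0 = 2.5 and the “known” σ = 0.4 are made-up illustrative values.

```python
# Sketch: Z_T and t_T for H0: mu = mu0. The data, mu0 and the
# "known" sigma are illustrative, not from the notes.
from statistics import mean

y = [2.1, 2.9, 2.4, 3.1, 2.6, 2.2, 2.8, 2.7]
mu0, sigma = 2.5, 0.4
T = len(y)
ybar = mean(y)
s2 = sum((yi - ybar) ** 2 for yi in y) / (T - 1)   # unbiased s_T^2
z_T = T ** 0.5 * (ybar - mu0) / sigma              # valid when sigma known
t_T = T ** 0.5 * (ybar - mu0) / s2 ** 0.5          # operational version
print(round(z_T, 3), round(t_T, 3))
```

z_T would be compared with a N(0, 1) critical value, and t_T with a critical value from Student’s t with T − 1 = 7 degrees of freedom.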

3 Relationship Between Two
Variables
There are a number of ways that a regression
between two or more variables can be
motivated. It can, for example, arise because
we know a priori that there exists an exact
linear relationship between Y and X , with
Y being observed with measurement errors.
Alternatively, it could arise if we are
interested in the conditional expectation of Y
given X, namely E(Y | X), which is a
linear function of X either if the underlying
relationship between Y and X is linear, or if
Y and X have a bivariate normal distribution.
A regression line can also be considered
without any underlying statistical model, just
as a method of fitting a line to a scatter of
points in a two dimensional space.

4 The Curve Fitting
Approach
We first consider the problem of regression
purely as an act of fitting a line to a scatter
diagram. Suppose (y1, x1), (y2, x2), ..., (yT, xT)
are T pairs of observations on the variables
Y and X. We are interested in obtaining the
equation of a straight line such that, for each
observation xi, the corresponding value of Y
on the line in the (Y, X) plane is as “close”
as possible to the observed value yi.
Different criteria of “closeness” or “fit”
immediately present themselves. Two basic
issues are involved:
(i) How to define and measure the distance of
the points in the scatter diagram from the
fitted line;
(ii) How to add up all such distances over the
sampled observations.
(A) Distance: There are three plausible
ways to measure the distance of a point
from the fitted line:
A(i) perpendicular to the x-axis;
A(ii) perpendicular to the y-axis;
A(iii) perpendicular to the fitted line.
(B) Weighting (adding-up) schemes:
B(i) simple average of the squares;
B(ii) simple average of the absolute values;
B(iii) weighted averages of either the squared
distance measures or their absolute values.
The simplest is the combination A(i) +
B(i), which gives the Ordinary Least Squares
(OLS) estimates of the regression of Y on
X. The difference between A(i) and A(ii)
can also be characterized by which of
the two variables, X or Y, is represented
on the horizontal axis. The combination
A(ii) + B(i) is also referred to as the “reverse
regression of X on Y”. Other combinations
of distance/weighting schemes are also
considered. For example, A(iii) + B(i) is
called orthogonal regression, A(i) + B(ii)
yields the absolute minimum distance
regression, and A(i) + B(iii) gives
weighted least squares regression.

5 Method of Least Squares


Treating X as the regressor and Y as the
regressand, and choosing the distance
measure di = |yi − α − βxi|, the OLS
criterion function to be minimized is¹

Q(α, β) = Σᵢ di² = Σᵢ (yi − α − βxi)².

The necessary conditions for this
minimization problem are given by

∂Q(α, β)/∂α = −2 Σᵢ (yi − α̂ − β̂xi) = 0,   (1)

∂Q(α, β)/∂β = −2 Σᵢ xi (yi − α̂ − β̂xi) = 0.   (2)

¹ The notation Σᵢ is used to denote the sum of the terms after the
summation sign over i = 1, 2, ..., T.
5.1 Normal Equations
Equations (1) and (2) are called the “normal
equations” of the OLS problem and can be
written as

Σᵢ ei = 0,      (3)
Σᵢ ei xi = 0,   (4)

where ei = yi − α̂ − β̂xi are the OLS
residuals. The condition Σᵢ ei = 0 also
gives ȳ = α̂ + β̂x̄, where x̄ = Σᵢ xi/T
and ȳ = Σᵢ yi/T, and demonstrates that the
least squares regression line, ŷi = α̂ + β̂xi,
goes through the sample means of Y and X.
Solving (3) and (4) for β̂ we have

β̂ = (Σᵢ xi yi − T x̄ȳ) / (Σᵢ xi² − T x̄²),

or equivalently, since

Σᵢ (xi − x̄)(yi − ȳ) = Σᵢ xi yi − T x̄ȳ,
Σᵢ (xi − x̄)² = Σᵢ xi² − T x̄²,

β̂ = Σᵢ (xi − x̄)(yi − ȳ) / Σᵢ (xi − x̄)² = SXY / SXX,

where

SXY = Σᵢ (xi − x̄)(yi − ȳ) / T = SYX,
SXX = Σᵢ (xi − x̄)² / T.
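The normal equations and the β̂ = SXY/SXX formula can be sketched directly in code. The helper name `ols` and the data are illustrative, not from the notes.

```python
# Sketch: OLS via the closed-form solution of the normal equations.
def ols(x, y):
    T = len(x)
    xbar, ybar = sum(x) / T, sum(y) / T
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / T  # S_XY
    sxx = sum((a - xbar) ** 2 for a in x) / T                     # S_XX
    beta = sxy / sxx               # slope: S_XY / S_XX
    alpha = ybar - beta * xbar     # line passes through (xbar, ybar)
    return alpha, beta

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
alpha, beta = ols(x, y)
# The normal equations say the residuals sum to zero and are
# uncorrelated with x:
e = [b - alpha - beta * a for a, b in zip(x, y)]
print(round(sum(e), 8), round(sum(ei * a for ei, a in zip(e, x)), 8))
```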

6 Correlation Coefficient
Between X and Y
The coefficient of correlation between X and
Y is defined by

ρ̂XY = SXY / (SYY SXX)^(1/2) = ρ̂YX,   (5)

and is easily seen to lie between −1 and +1,
so that ρ̂XY² ≤ 1. Note also that the correlation
coefficient between Y and X is the same as
the correlation coefficient between X and Y,
namely ρ̂XY = ρ̂YX. In the bivariate case
we have the following interesting relationship
between ρ̂XY and the regression coefficients
of the regression of Y on X and the “reverse”
regression of X on Y. Denoting these two
regression coefficients by β̂Y·X and β̂X·Y
respectively, we have

β̂Y·X β̂X·Y = SYX SXY / (SXX SYY) = ρ̂XY².   (6)

If β̂Y·X > 0 then β̂X·Y > 0. Since ρ̂XY² ≤ 1,
it follows from (6) that when β̂Y·X > 0 we have
β̂X·Y ≤ 1/β̂Y·X, and if in addition β̂Y·X < 1, then
β̂X·Y = ρ̂XY²/β̂Y·X > ρ̂XY², again
assuming that β̂Y·X > 0.
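A quick numerical check of equation (6), using illustrative data:

```python
# Sketch: numerical check that beta_YX * beta_XY = rho^2 (equation (6)).
# The data are illustrative.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.4, 3.8, 5.1]
T = len(x)
xbar, ybar = sum(x) / T, sum(y) / T
sxx = sum((a - xbar) ** 2 for a in x) / T
syy = sum((b - ybar) ** 2 for b in y) / T
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / T

b_yx = sxy / sxx            # regression of Y on X
b_xy = sxy / syy            # reverse regression of X on Y
rho2 = sxy ** 2 / (sxx * syy)
print(abs(b_yx * b_xy - rho2) < 1e-12)
```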

Exercise 1: Consider the following
observations on the heights and weights of 10
different individuals:

Height in cm (X)   Weight in kg (Y)
169.6              71.2
166.8              58.2
157.1              56.0
181.1              64.5
158.4              53.0
165.6              52.4
166.7              56.8
156.5              49.2
168.1              55.6
165.3              77.8

X̄ = 165.52, Ȳ = 59.47,
SXX = 472.076, SYY = 731.961,
SXY = 274.786.

Plot Y against X. Run the OLS regression
of Y on X and the reverse regression of X on
Y. Check that the fitted regression line goes
through the means of X and Y.
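A minimal sketch of the computation for Exercise 1, using the summary statistics quoted above (only the ratio SXY/SXX matters for the slope, so any common scaling of the S’s is immaterial):

```python
# Sketch for Exercise 1: slope and intercept implied by the summary
# statistics quoted in the exercise.
SXY, SXX = 274.786, 472.076
xbar, ybar = 165.52, 59.47

beta = SXY / SXX              # slope of weight (Y) on height (X)
alpha = ybar - beta * xbar    # fitted line goes through the means
print(round(beta, 4), round(alpha, 2))
```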
7 A Simple Application:
Engel’s Curve (Law)
Engel’s Law (associated with Ernst Engel, a
19th-century German statistician) states that
the share of food in household expenditure
declines with household income (total
expenditure); that is, the income elasticity of
demand for food is less than unity. Studies of
Engel curves often use data from “budget
surveys” on patterns of expenditure across
individual households. The surveys are
normally completed within a brief time
interval, ensuring that households face almost
identical prices. Hence, this type of data is
ideal for focusing on the responsiveness of
demand to changes in income (or expenditure).
Suppose household i’s share of expenditure
on food is governed by the following
simple Engel curve:

W_i^f = P^f Q_i^f / E_i = α + β/E_i + u_i,   (7)

where P^f denotes the price of food, Q_i^f and
E_i denote household i’s quantity of food
demanded and total expenditure, respectively,
and u_i denotes an error term.
From equation (7) the quantity demanded
by household i is Q_i^f = αE_i/P^f + β/P^f.
Hence the income (expenditure) elasticity of
demand for food is

(E_i/Q_i^f) ∂Q_i^f/∂E_i = αE_i/(P^f Q_i^f) = α/W_i^f.

Since W_i^f = α + β/E_i, it follows that the
elasticity can be rewritten as α/(α + β/E_i).
Therefore, the income elasticity of demand
for food will be less than one when α > 0 and
β > 0.
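The elasticity formula α/(α + β/E_i) can be illustrated as follows; the coefficient values are made up, chosen only to satisfy α > 0 and β > 0.

```python
# Sketch: income elasticity implied by the share Engel curve (7),
# alpha/(alpha + beta/E). The coefficients are illustrative.
def engel_elasticity(alpha, beta, E):
    """Elasticity of food demand with respect to total expenditure E."""
    return alpha / (alpha + beta / E)

# With alpha, beta > 0 the elasticity is below one and rises with E:
for E in (5000.0, 15000.0, 45000.0):
    print(round(engel_elasticity(0.05, 500.0, E), 3))
```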
An alternative specification often used in
empirical demand analysis is the log-linear
specification

ln W_i^f = ln(P^f Q_i^f / E_i) = α̃ + β̃ ln E_i + ũ_i.   (8)

The income (expenditure) elasticity of demand
for food in this case is given by 1 + β̃, and
Engel’s Law holds if β̃ < 0. The log-
linear specification is more robust to extreme
outliers and is less likely to be associated
with the heteroscedasticity problem. Its
drawback, particularly in the context of a
system of demand equations, is that it does
not satisfy the adding-up restriction, namely
the requirement that the sum of expenditures
on different commodity groups should add up
to total expenditure.
An alternative way of dealing with
outliers and/or measurement errors is to
aggregate the data by income (expenditure)
classes. Suppose the observations are grouped
into m income classes and there are Ng
households within group g. Consider now
the following share equation applicable to
the groups:

W̄_g^f = P^f Q̄_g^f / Ē_g = a + b/Ē_g + ū_g,   (9)

where g is the group index, and P^f Q̄_g^f and
Ē_g denote income class g’s mean expenditure
on food and mean total expenditure,
respectively. The disturbance term, ū_g, is
equal to a weighted average of the underlying
household disturbances,

ū_g = (1/Ng) Σᵢ ω_i u_i,  ω_i = E_i/Ē_g,

where the sum runs over the Ng households i
belonging to income group g. The
coefficients a and b are comparable to α and
β defined above.
7.1 Some Estimates Based on
PSID Data
The Panel Study of Income Dynamics (PSID)
in the US provides a rich source of cross-
section and short panel data sets for the
analysis of family and individual incomes and
other characteristics. For more information
visit http://www.isr.umich.edu/src/psid/ .
Using the cross-section of 2049
households in 1988 we obtain the following
OLS estimates for the log-linear specification:

LC88 = 3.3622 + 0.4052 LY88,
       (0.1317)  (0.0139)
R² = 0.2928, standard errors in ( . ),

where LC88 = log(Expenditure on Food) and
LY88 = log(Total Family Income). The income
elasticity of food expenditure = 0.4052. This
estimate seems to be quite robust to outliers:
dropping the bottom and top 50 income
households we obtain 0.40403 (0.0162).
However, for the share specification we
have

W88 = C88/Y88 = 0.0137 + 1113.2/Y88,
               (0.0036)  (17.08)
R² = 0.6747.

The income elasticity of food expenditure is
given by α/(α + β/Yi). Evaluating this at
the mean family income in the full sample
(namely at Ȳ = $16,250), we have

Income elasticity = 0.0137/(0.0137 + 1113.2/16250) = 0.1667,

which is significantly smaller than the estimate
obtained using the log-linear specification.
This is largely due to the high degree of
sensitivity of the share equation approach to
outliers. Dropping the top and bottom 100
households and re-estimating, we obtain

Income elasticity = 0.0383/(0.0383 + 905.6/16250) = 0.4073.

Notice also that the income elasticity based on
the share equation varies with household
income, whilst the estimate based on the
log-linear specification is fixed and does not
change with the income or total expenditure of
the household.
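A sketch replicating the two elasticity calculations above from the reported share-equation estimates:

```python
# Sketch: elasticity alpha/(alpha + beta/Y) at mean income Y = 16,250,
# using the share-equation estimates reported in the notes.
def share_elasticity(alpha, beta, Y):
    return alpha / (alpha + beta / Y)

full = share_elasticity(0.0137, 1113.2, 16250.0)   # full sample
trim = share_elasticity(0.0383, 905.6, 16250.0)    # top/bottom 100 dropped
print(round(full, 4), round(trim, 4))
```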
Exercise 2: Obtain the income elasticity
of food expenditure using the following
specification of the Engel curve:

W_i^f = P^f Q_i^f / E_i = a + b ln(E_i) + ε_i.

The PSID data for 1988 are supplied in the
psidh.fit file available from http://www.econ.
cam.ac.uk/faculty/pesaran/teaching.htm
Exercise 3: Re-estimate the above
Engel curve specifications using the grouped
data supplied in psidg.fit and comment
on your results. In particular, discuss the
relative advantages of using household
and grouped observations for estimation of
income elasticities of demand from household
surveys.

Table 1: Group Mean Food Expenditure
and Family Income in 1988
(in US dollars, per annum)

Income Classes   Food Expenditure   Family Income   # Per Group
0-3500               847.2151           2296.3           96
3501-5000            980.1776           4307.9          113
5001-6500            1112.1             5874.5          139
6501-8000            1175.1             7252.8          192
8001-9500            1224.9             8796.7          175
9501-11000           1275.8            10323.6          170
11001-12500          1363.8            11703.2          164
12501-14000          1430.6            13257.6          126
14001-15500          1622.3            14703.1          119
15501-17000          1535.5            16256.2           88
17001-18500          1577.9            17704.9          105
18501-20000          1824.5            19221.5           61
20001-21500          1711.4            20811.6           64
21501-23000          1787.1            22169.6           58
23001-24500          1742.3            23743.3           40
24501-26000          1877.9            25191.2           38
26001-27500          1781.1            26764.2           41
27501-29000          1882.3            28205.4           33
29001-30500          2664.0            29650.3           25
30501-32000          2513.9            31270.8           24
32001-33500          2405.8            32801.9           21
33501-35000          1980.6            34442.1           20
35001-36500          2560.3            35612.9           22
36501-38000          2435.2            37266.1           11
38001+               2590.1            58411.4          104

Source: Panel Study of Income Dynamics Household Surveys.
