
2.1
The Bivariate Regression: The Mechanics of OLS
2.2
Aims and Learning Objectives
By the end of this session students should be able to:
• Understand the difference between a population regression and a sample regression
• Explain the nature of the error term
• Apply the technique of OLS and evaluate the goodness of fit of the resulting regression
2.3
Regression Analysis
Regression analysis is concerned with describing and
evaluating the relationship between a given variable
(usually called the dependent variable) and one or more
other variables (usually known as the independent
variable(s)).
2.4
1. Estimate a relationship among
economic variables, such as Y = f(X).

2. Interpret the results, perform appropriate regression
diagnostics and conduct hypothesis tests

3. Forecast or predict the value of one
variable, Y, based on the value of
another variable, X.

Y = dependent (response) variable
X = independent (explanatory) variable
2.1 The Bivariate Population Regression
2.5
Terminology
y is the:                  the xs are the:
predictand                 predictors
regressand                 regressors
explained variable         explanatory variables
dependent variable         independent variables
endogenous variable        exogenous variables
left-hand-side variable    right-hand-side variables
2.6
Regression and Correlation
If we say y and x are correlated, we are treating y and x in a symmetric way.
In regression, we treat the dependent variable (y) and the independent variable(s) (the xs) very differently.
The y variable is assumed to be random, or stochastic, i.e. to have a probability distribution.
The x variables are assumed to have fixed (non-stochastic) values in repeated samples.
2.7
The Bivariate Population Regression
The relationship between X and the expected value of Y, when X takes a specific value, might be linear:

E(Y|X) = β₁ + β₂X

where β₁ = intercept parameter and β₂ = slope parameter.
Here, "linear" means linear in the parameters β₁ and β₂.
2.8
The Bivariate Population Regression
[Figure 2.1 The Economic Model: a linear relationship between Y and X. The line E(Y|X) = β₁ + β₂X has intercept β₁ and slope β₂ = ΔE(Y|X)/ΔX.]
2.9
The Error (Disturbance) Term
Y is a random variable composed of two parts:

I. Systematic component: E(Y) = β₁ + β₂X
This is the mean (or expected value) of Y.

II. Random component: U = Y − E(Y) = Y − β₁ − β₂X
This is called the random error.

Together E(Y) and U form the model:

Yᵢ = β₁ + β₂Xᵢ + Uᵢ
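To make the two components concrete, here is a minimal simulation sketch in Python (NumPy assumed available; the values β₁ = 2, β₂ = 0.5 and the standard-normal error are illustrative assumptions, not values from the slides):

```python
# Minimal sketch: simulate Y_i = beta1 + beta2 * X_i + U_i
# (beta1 = 2, beta2 = 0.5 and the error scale are illustrative assumptions).
import numpy as np

rng = np.random.default_rng(42)

beta1, beta2 = 2.0, 0.5          # population intercept and slope
n = 100                          # sample size
X = np.linspace(0, 10, n)        # x treated as fixed in repeated samples
U = rng.normal(0.0, 1.0, n)      # random error with mean zero
Y = beta1 + beta2 * X + U        # observations scatter around E(Y|X)

print(Y[:5])
```

Each fresh draw of U produces a new sample of Y scattered around the same population line.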

2.10
Unobservable Nature of the Error Term
1. Omitted explanatory variables (very likely in the bivariate case): more variables should be in the equation.
2. Measurement error in the variables (especially for y).
3. Randomness of (human) nature: people and markets are not machines; they contain some intrinsic randomness that cannot be explained no matter how hard we try.
4. The error term collects many small, hardly observable factors that do not systematically affect the analysis.
2.2 The Bivariate Sample Regression
2.11
In practice we will hardly ever observe the population regression line. Instead we take a random sample of observations from the population regression in order to estimate β₁ and β₂. We distinguish the sample regression from the population regression as follows:

Yᵢ = β̂₁ + β̂₂Xᵢ + eᵢ = b₁ + b₂Xᵢ + eᵢ

Yᵢ = Ŷᵢ + eᵢ
2.12
Determining the Regression Coefficients
So how do we determine what β̂₁ and β̂₂ are?
Choose β̂₁ and β̂₂ so that the distances from the data points to the fitted line are minimised (so that the line fits the data as closely as possible).
The most common method used to fit a line to the data is known as OLS (ordinary least squares).
2.13
Data/sample
Suppose we have n observations on y and x:

Cross section: yᵢ = b₁ + b₂xᵢ + eᵢ,  i = 1, 2, …, n
Time series: yₜ = b₁ + b₂xₜ + eₜ,  t = 1, 2, …, n
2.14
[Figure 2.2 The relationship among Y, e and the fitted regression line: each observation (Xᵢ, Yᵢ) deviates from the fitted line Ŷ = β̂₁ + β̂₂X by its residual eᵢ, so Yᵢ = β̂₁ + β̂₂Xᵢ + eᵢ.]
2.3 The Method of Ordinary Least Squares
2.15
Getting good point estimates of β̂₁ and β̂₂ given the data can be done using OLS.
What we actually do is:
1. take each vertical distance between the data point and the fitted line,
2. square it, and
3. minimise the total sum of the squares (hence "least squares").
2.16
The Method of Ordinary Least Squares
The ordinary least squares (OLS) method is to minimise the residual sum of squares (RSS):

RSS = Σᵢ₌₁ⁿ eᵢ² = Σᵢ₌₁ⁿ (Yᵢ − β̂₁ − β̂₂Xᵢ)²

The solution to this problem is:

β̂₂ = [n ΣXᵢYᵢ − (ΣXᵢ)(ΣYᵢ)] / [n ΣXᵢ² − (ΣXᵢ)²] = Σ(Xᵢ − X̄)(Yᵢ − Ȳ) / Σ(Xᵢ − X̄)² = Σxᵢyᵢ / Σxᵢ²

and

β̂₁ = Ȳ − β̂₂X̄

where yᵢ = (Yᵢ − Ȳ), xᵢ = (Xᵢ − X̄), Ȳ = ΣYᵢ/n and X̄ = ΣXᵢ/n.
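A minimal sketch of these formulas in Python follows (NumPy assumed available; the function name ols_bivariate is an illustrative choice, not from the slides):

```python
# Closed-form OLS estimates for the bivariate model.
# Assumes X and Y are NumPy arrays of equal length, e.g. the simulated
# data from the earlier sketch.
import numpy as np

def ols_bivariate(X, Y):
    x = X - X.mean()                      # deviations x_i = X_i - Xbar
    y = Y - Y.mean()                      # deviations y_i = Y_i - Ybar
    b2 = (x * y).sum() / (x ** 2).sum()   # slope: sum(x_i y_i) / sum(x_i^2)
    b1 = Y.mean() - b2 * X.mean()         # intercept: Ybar - b2 * Xbar
    return b1, b2
```

Applied to the simulated data above, b1, b2 = ols_bivariate(X, Y) should return values close to the assumed β₁ = 2 and β₂ = 0.5.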
2.17
Population regression values: yₜ = β₁ + β₂xₜ + eₜ
Population regression line: E(yₜ|xₜ) = β₁ + β₂xₜ
Sample regression values: yₜ = b₁ + b₂xₜ + êₜ
Sample regression line: ŷₜ = b₁ + b₂xₜ
2.18
yₜ = β₁ + β₂xₜ + eₜ

eₜ = yₜ − β₁ − β₂xₜ

Minimise the error sum of squared deviations:

S(β₁, β₂) = Σₜ₌₁ⁿ (yₜ − β₁ − β₂xₜ)²
2.19
Minimise with respect to β₁ and β₂:

S(β₁, β₂) = Σₜ₌₁ⁿ (yₜ − β₁ − β₂xₜ)²

∂S(.)/∂β₁ = −2 Σ (yₜ − β₁ − β₂xₜ)
∂S(.)/∂β₂ = −2 Σ xₜ(yₜ − β₁ − β₂xₜ)

Set each of these two derivatives equal to zero and solve these two equations for the two unknowns: β₁, β₂.
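As a sanity check on these two derivatives, a small SymPy sketch (SymPy assumed available; the three-observation setup is purely illustrative) confirms them symbolically:

```python
# Verify the partial derivatives of S(beta1, beta2) symbolically.
import sympy as sp

b1, b2 = sp.symbols("beta1 beta2")
ys = sp.symbols("y1:4")  # y1, y2, y3
xs = sp.symbols("x1:4")  # x1, x2, x3

S = sum((y - b1 - b2 * x) ** 2 for y, x in zip(ys, xs))

# dS/dbeta1 should equal -2 * sum(y_t - beta1 - beta2 * x_t)
lhs1 = sp.diff(S, b1)
rhs1 = -2 * sum(y - b1 - b2 * x for y, x in zip(ys, xs))
print(sp.simplify(lhs1 - rhs1))  # prints 0

# dS/dbeta2 should equal -2 * sum(x_t * (y_t - beta1 - beta2 * x_t))
lhs2 = sp.diff(S, b2)
rhs2 = -2 * sum(x * (y - b1 - b2 * x) for y, x in zip(ys, xs))
print(sp.simplify(lhs2 - rhs2))  # prints 0
```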
2.20
Minimise with respect to β₁ and β₂:

S(.) = Σₜ₌₁ⁿ (yₜ − β₁ − β₂xₜ)²

[Figure: S(.) plotted against βᵢ is a parabola with its minimum at bᵢ: ∂S(.)/∂βᵢ < 0 to the left of bᵢ, ∂S(.)/∂βᵢ = 0 at bᵢ, and ∂S(.)/∂βᵢ > 0 to the right.]
2.21
To minimise S(.), you set the two derivatives equal to zero to get:

∂S(.)/∂β₁ = −2 Σ (yₜ − b₁ − b₂xₜ) = 0
∂S(.)/∂β₂ = −2 Σ xₜ(yₜ − b₁ − b₂xₜ) = 0

When these two terms are set to zero, β₁ and β₂ become b₁ and b₂ because they no longer represent just any values of β₁ and β₂ but the special values that correspond to the minimum of S(.).
2.22
−2 Σ (yₜ − b₁ − b₂xₜ) = 0
−2 Σ xₜ(yₜ − b₁ − b₂xₜ) = 0

Σyₜ − nb₁ − b₂Σxₜ = 0
Σxₜyₜ − b₁Σxₜ − b₂Σxₜ² = 0

nb₁ + b₂Σxₜ = Σyₜ
b₁Σxₜ + b₂Σxₜ² = Σxₜyₜ
2.23
Solve for b₁ and b₂ using the definitions of x̄ and ȳ:

nb₁ + b₂Σxₜ = Σyₜ
b₁Σxₜ + b₂Σxₜ² = Σxₜyₜ

b₂ = [n Σxₜyₜ − (Σxₜ)(Σyₜ)] / [n Σxₜ² − (Σxₜ)²]

b₁ = ȳ − b₂x̄
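To check that this solution agrees with a standard library routine, here is a brief sketch (NumPy assumed; the simulated data and the true values b₁ = 1.5, b₂ = 0.8 are illustrative assumptions):

```python
# Compare the hand-derived normal-equation solution with numpy.polyfit.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 1.5 + 0.8 * x + rng.normal(0.0, 1.0, 50)  # assumed true b1 = 1.5, b2 = 0.8

n = len(x)
b2 = (n * (x * y).sum() - x.sum() * y.sum()) / (n * (x ** 2).sum() - x.sum() ** 2)
b1 = y.mean() - b2 * x.mean()

b2_np, b1_np = np.polyfit(x, y, 1)            # polyfit returns highest power first
print(np.allclose([b1, b2], [b1_np, b2_np]))  # True
```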
2.24
The Method of Ordinary Least Squares
Why use OLS?
(1) OLS is relatively easy to use.
(2) The goal of minimising the residual sum of squares is appropriate from a theoretical point of view.
(3) OLS estimators have a number of useful properties.
2.4 Describing the Overall Fit of the Estimated Model
2.25
DEFINITION: The coefficient of determination, or R², is a measure of the goodness of fit of a regression: the proportion of the variation in Yᵢ which is explained by the regression.
2.26
Explaining Variation in Yᵢ

Yᵢ = β̂₁ + β̂₂Xᵢ + eᵢ

Explained variation: Ŷᵢ = β̂₁ + β̂₂Xᵢ

Unexplained variation: eᵢ = Yᵢ − Ŷᵢ = Yᵢ − β̂₁ − β̂₂Xᵢ
2.27
[Figure 2.3 Measuring Goodness of Fit: the observations Yᵢ scatter around the fitted line Ŷ = β̂₁ + β̂₂X with residuals eᵢ; the sample mean Ȳ is drawn as a horizontal baseline, so that, for example, Y₄ − Ȳ splits into (Ŷ₄ − Ȳ) and e₄.]
2.28
Explaining Variation in Yᵢ
Using Ȳ as baseline.

Specifically: Y₄ = Ŷ₄ + e₄, so Y₄ − Ȳ = (Ŷ₄ − Ȳ) + e₄

Generally: Yᵢ = Ŷᵢ + eᵢ, so Yᵢ − Ȳ = (Ŷᵢ − Ȳ) + eᵢ
2.29
Explaining Variation in Yᵢ
Square both sides and sum over the sample (the cross-product term vanishes because the OLS residuals sum to zero and are uncorrelated with the regressor):

Σ(Yᵢ − Ȳ)² = Σ(Ŷᵢ − Ȳ)² + Σeᵢ²

or

Σyᵢ² = Σŷᵢ² + Σeᵢ²

or

TSS = ESS + RSS
2.30
Coefficient of Determination
What proportion of the variation in Yᵢ is explained?

R² = ESS/TSS = Σŷᵢ² / Σyᵢ²

or

R² = 1 − RSS/TSS = 1 − Σeᵢ² / Σyᵢ²

with 0 ≤ R² ≤ 1 (T = Total, E = Explained, R = Residual sum of squares).

Caveat: do not rely solely on the R² when determining the overall quality of the regression.
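A minimal sketch of this decomposition in Python (NumPy assumed; it presumes a fit such as the one returned by the illustrative ols_bivariate function sketched earlier):

```python
# Compute TSS, ESS, RSS and R^2 for a fitted bivariate regression,
# assuming X, Y are NumPy arrays and b1, b2 come from an OLS fit.
import numpy as np

def r_squared(X, Y, b1, b2):
    Y_hat = b1 + b2 * X                    # fitted values
    e = Y - Y_hat                          # residuals
    TSS = ((Y - Y.mean()) ** 2).sum()      # total variation
    ESS = ((Y_hat - Y.mean()) ** 2).sum()  # explained variation
    RSS = (e ** 2).sum()                   # unexplained variation
    assert np.isclose(TSS, ESS + RSS)      # TSS = ESS + RSS holds for OLS
    return ESS / TSS                       # equivalently 1 - RSS / TSS
```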