
Applied Econometrics: Handouts Week 3

Bivariate Regression Models, Properties


Filippo Ferroni
Banque de France and University of Surrey
e-mail: filippo.ferroni@banque-france.fr

August 30, 2014


In previous classes we defined the population model $y = \alpha + \beta x + \varepsilon$, and we claimed that the
key assumption for simple regression analysis to be useful is that the expected value of $\varepsilon$ given any
value of $x$ is zero. We now return to the population model and study the statistical properties of
OLS. In other words, we now view $a$ and $b$ as estimators for the parameters $\alpha$ and $\beta$ that appear in
the population model. This means that we will study properties of the distributions of $a$ and $b$ over
different random samples from the population.

Moreover, we tackle the issue of statistical significance. Sometimes the statistical relationship
between two variables is feeble and, as a consequence, the estimated slope is almost flat. We provide
a formal test to decide whether the slope coefficient is statistically non-significant (zero) or statistically
significant (different from zero).
1 Properties of the OLS estimator
We begin by establishing the unbiasedness of OLS under a simple set of assumptions. For future
reference, it is useful to number these assumptions using the prefix SLR for simple linear regression.
The first assumption defines the population model.
Assumption 1. In the population model, the dependent variable $y$ is related to the independent
variable $x$ and the error (or disturbance) $\varepsilon$ as
$$y = \alpha + \beta x + \varepsilon$$
where $\alpha$ and $\beta$ are the population intercept and slope parameters, respectively.
We are interested in using data on $y$ and $x$ to estimate the parameters $\alpha$ and, especially, $\beta$. We
assume that our data were obtained as a random sample. We can write the equation in Assumption
1 in terms of the random sample as
$$y_i = \alpha + \beta x_i + \varepsilon_i$$
where $\varepsilon_i$ is the error or disturbance for observation $i$ (for example, person $i$, firm $i$, city $i$, etc.). Thus,
$\varepsilon_i$ contains the unobservables for observation $i$ which affect $y_i$. The $\varepsilon_i$ should not be confused with
the residuals, $e_i$. Later on, we will explore the relationship between the errors and the residuals.
Assumption 2. (zero conditional mean)
$$E(\varepsilon) = 0$$
Assumption 3. (orthogonality between error term and regressor)
$$E(\varepsilon \mid x) = E(\varepsilon)$$
For a random sample, these assumptions imply that $E(\varepsilon_i \mid x_i) = 0$, for all $i = 1, \ldots, n$. Assumption
3 is verified if the error term and the regressor are independent.
Assumption 4. In the sample, the independent variables $x_i$, $i = 1, 2, \ldots, n$, are not all equal to the
same constant. This requires some variation in $x$ in the population.

Otherwise $\sum_i (x_i - \bar{x})^2 = 0$ and $b$ is not finite. Of the four assumptions made, this is the least important
because it essentially never fails in interesting applications.
1.1 Expected value of the OLS estimator
Theorem. If assumptions 1) to 4) hold, then the OLS estimator is unbiased, i.e.
$$E(b \mid x) = \beta \quad \text{and} \quad E(a \mid x) = \alpha$$
Proof. Notice that, since $\bar{y} = \alpha + \beta \bar{x} + \bar{\varepsilon}$,
$$y_i - \bar{y} = \alpha + \beta x_i + \varepsilon_i - \bar{y} = \beta(x_i - \bar{x}) + (\varepsilon_i - \bar{\varepsilon})$$
so that
$$b = \frac{\sum_i (y_i - \bar{y})(x_i - \bar{x})}{\sum_i (x_i - \bar{x})^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})\left(\beta(x_i - \bar{x}) + \varepsilon_i - \bar{\varepsilon}\right)}{\sum_i (x_i - \bar{x})^2} = \beta + \frac{\sum_i (x_i - \bar{x})\varepsilon_i}{\sum_i (x_i - \bar{x})^2}$$
where the last equality uses $\sum_i (x_i - \bar{x})\bar{\varepsilon} = 0$. Similarly,
$$a = \bar{y} - b\bar{x} = \alpha + \beta\bar{x} + \frac{1}{n}\sum_i \varepsilon_i - b\bar{x} = \alpha + (\beta - b)\bar{x} + \frac{1}{n}\sum_i \varepsilon_i$$
Taking expectations conditional on $x$ and using Assumptions 2 and 3,
$$E(b \mid x) = \beta + E\left[\frac{\sum_i (x_i - \bar{x})\varepsilon_i}{\sum_i (x_i - \bar{x})^2} \,\Big|\, x\right] = \beta$$
$$E(a \mid x) = E\left[\alpha + (\beta - b)\bar{x} + \frac{1}{n}\sum_i \varepsilon_i \,\Big|\, x\right] = \alpha$$
Unbiasedness generally fails if any of our four assumptions fails. This means that it is important
to think about the veracity of each assumption for a particular application.

The assumption we should concentrate on for now is Assumption 3. If it holds, the OLS estimators are
unbiased. Likewise, if it fails, the OLS estimators generally will be biased. The possibility that $x$
is correlated with $\varepsilon$ is almost always a concern in simple regression analysis with nonexperimental
data. Using simple regression when $\varepsilon$ contains factors affecting $y$ that are also correlated with $x$ can
result in spurious correlation: that is, we find a relationship between $y$ and $x$ that is really due to
other unobserved factors that affect $y$ and also happen to be correlated with $x$.
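Unbiasedness can be illustrated with a small Monte Carlo experiment. The sketch below (an illustration under assumed parameter values, not part of the handout) holds the design $x$ fixed, redraws the errors many times, and shows that the averages of the OLS estimates are close to $\alpha$ and $\beta$:

    import numpy as np

    rng = np.random.default_rng(1)
    n, alpha, beta = 50, 1.0, 0.5
    x = rng.uniform(0, 10, size=n)            # fixed design: we condition on x

    a_draws, b_draws = [], []
    for _ in range(10_000):
        eps = rng.normal(0, 1, size=n)        # fresh errors for each random sample
        y = alpha + beta * x + eps
        b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        a = y.mean() - b * x.mean()
        a_draws.append(a)
        b_draws.append(b)

    print(np.mean(a_draws), np.mean(b_draws))  # close to 1.0 and 0.5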
1.2 Variance of the OLS estimator
In addition to knowing that the sampling distribution of $b$ is centered about $\beta$ (unbiasedness), it is
important to know how far we can expect $b$ to be from $\beta$ on average. Among other things,
this allows us to choose the best estimator among all, or at least a broad class of, the unbiased
estimators. The measure of spread in the distribution that is easiest to work with is the variance or
its square root, the standard deviation. We need to add an assumption.
Assumption 5. (constant variance or homoscedasticity, and no autocorrelation)
$$V(\varepsilon_i \mid x) = \sigma^2 \quad \text{and} \quad E(\varepsilon_j \varepsilon_i \mid x) = 0$$
for all $i$ and for $j \neq i$.
The first equation requires that the error term has the same dispersion for every observation. The
second requires that the errors are uncorrelated across observations and do not show any pattern.
With Assumption 5 in place, we are ready to prove the following:
Theorem. If assumptions 1) to 5) hold, then the OLS variances are given by
$$V(b \mid x) = \frac{\sigma^2}{\sum_i (x_i - \bar{x})^2}$$
$$V(a \mid x) = \frac{\sigma^2 \, \frac{1}{n}\sum_i x_i^2}{\sum_i (x_i - \bar{x})^2}$$
Proof.
$$V(b \mid x) = V\left[\frac{\sum_i (x_i - \bar{x})\varepsilon_i}{\sum_i (x_i - \bar{x})^2} \,\Big|\, x\right] = \frac{\sigma^2}{\sum_i (x_i - \bar{x})^2}$$
$$V(a \mid x) = V\left[\alpha + (\beta - b)\bar{x} + \frac{1}{n}\sum_i \varepsilon_i \,\Big|\, x\right] = \frac{\sigma^2 \bar{x}^2}{\sum_i (x_i - \bar{x})^2} + \frac{\sigma^2}{n} = \frac{\sigma^2 \, \frac{1}{n}\sum_i x_i^2}{\sum_i (x_i - \bar{x})^2}$$
where the cross term between $(\beta - b)\bar{x}$ and $\frac{1}{n}\sum_i \varepsilon_i$ vanishes because $\sum_i (x_i - \bar{x}) = 0$, and the
last equality uses $\sum_i x_i^2 = \sum_i (x_i - \bar{x})^2 + n\bar{x}^2$.
For most purposes, we are interested in $V(b \mid x)$. It is easy to summarize how this variance
depends on the error variance and the total variation in $x_1, x_2, \ldots, x_n$. First, the larger the error
variance, the larger is $V(b \mid x)$. This makes sense since more variation in the unobservables affecting
$y$ makes it more difficult to precisely estimate $\beta$. On the other hand, more variability in the
independent variable is preferred: as the variability in the $x_i$ increases, the variance of $b$ decreases.
This also makes intuitive sense since the more spread out is the sample of independent variables, the
easier it is to trace out the relationship between $E(y \mid x)$ and $x$. That is, the easier it is to estimate
$\beta$. If there is little variation in the $x_i$, then it can be hard to pinpoint how $E(y \mid x)$ varies with $x$.
1.3 Error term, residuals and the estimator of the error variance
The formulas in the previous sections allow us to isolate the factors that contribute to the variance
of the OLS estimator. But the volatility of the error term is unknown, except in extremely rare
cases. Nevertheless, we can use the data to estimate $\sigma^2$, which then allows us to estimate $V(b \mid x)$
and $V(a \mid x)$.

This is a good place to emphasize the difference between the errors (or disturbances) and
the residuals, since this distinction is crucial for constructing an estimator of $\sigma^2$. The former are
random variables; the latter are the specific values computed from a given sample.
The equation in Assumption 1 shows how to write the population model in terms of a randomly
sampled observation as
$$y_i = \alpha + \beta x_i + \varepsilon_i$$
where $\varepsilon_i$ is the error or disturbance for observation $i$. We can also express $y_i$ in terms of its fitted
value and residual as
$$y_i = a + b x_i + e_i$$
Comparing these two equations, we see that the errors show up in the equation containing the
population parameters, $\alpha$ and $\beta$. On the other hand, the residuals show up in the estimated equation
with $a$ and $b$. The errors are never observable, while the residuals are computed from the data. We
can use the above equations and write the residuals as a function of the errors:
$$e_i = y_i - a - b x_i = \alpha + \beta x_i + \varepsilon_i - a - b x_i$$
$$e_i = \varepsilon_i + (\alpha - a) + (\beta - b) x_i$$
The difference between the residuals and the errors has an expected value of zero.
Now consider the estimation of $\sigma^2$. First, $E(\varepsilon_i^2) = \sigma^2$, so an unbiased estimator of $\sigma^2$ is
$\frac{1}{n}\sum_i \varepsilon_i^2$. Unfortunately, this is not a true estimator, because we do not observe the errors $\varepsilon_i$.
But we do have estimates of the $\varepsilon_i$, namely the OLS residuals $e_i$, and replacing the errors with the
residuals gives $\frac{1}{n}\sum_i e_i^2$. This is a true estimator, because it gives a computable rule for any
sample of data on $x$ and $y$. One slight drawback to this estimator is that it turns out to be biased
(although for large $n$ the bias is small). The estimator is biased essentially because it does not
account for two restrictions that must be satisfied by the OLS residuals. These restrictions are given
by the two OLS first order conditions:
$$\sum_i e_i = 0 \quad \text{and} \quad \sum_i e_i x_i = 0$$
One way to view these restrictions is this: if we know $n - 2$ of the residuals, we can always get the
other two residuals by using the restrictions implied by the first order conditions in the minimization
problem. Thus, there are only $n - 2$ degrees of freedom in the OLS residuals (as opposed to $n$ degrees
of freedom in the errors). The unbiased estimator of $\sigma^2$ that we will use makes a degrees-of-freedom
adjustment:
$$s^2 = \frac{1}{n-2}\sum_i e_i^2 = \frac{RSS}{n-2}$$
Therefore, we can construct measures of dispersion of the OLS estimators as functions of observables.
In particular, the estimated variances of the OLS slope and intercept are defined as
$$s_b^2 = \frac{s^2}{\sum_i (x_i - \bar{x})^2} \quad (1)$$
$$s_a^2 = \frac{s^2 \, \frac{1}{n}\sum_i x_i^2}{\sum_i (x_i - \bar{x})^2} \quad (2)$$
where $s^2 = \frac{1}{n-2}\sum_i e_i^2$.
2 Inference and Testing the significance of the regressor
Knowing the expected value and variance of the OLS estimators is useful for describing their precision.
However, in order to perform statistical inference, we need to know more
than just the first two moments of $b$; we need the full sampling distribution of $b$. When we
condition on the values of the independent variables in our sample, it is clear that the sampling
distributions of the OLS estimators depend on the underlying distribution of the errors. To pin down
the sampling distribution of the slope $b$, we assume that the unobserved error is normally distributed
in the population.
Assumption 6. The population error $\varepsilon$ is independent of the explanatory variables $x_1, x_2, \ldots, x_k$ and
is normally distributed with zero mean and variance $\sigma^2$: $\varepsilon \sim N(0, \sigma^2)$.

Assumptions 1) to 6) are called the classical linear model (CLM) assumptions.
Theorem. (t distribution for the standardized estimators)
If assumptions 1) to 6) hold, then the standardized OLS estimates are distributed as
$$t_a = \frac{a - \alpha}{s_a} \sim t_{n-2} \qquad t_b = \frac{b - \beta}{s_b} \sim t_{n-2}$$
where $a$ and $b$ are the OLS estimates and $s_a$ and $s_b$ are defined in (1) and (2).
The theorem is important in that it allows us to test hypotheses involving the OLS coefficients.
Type of hypothesis | $H_0$ | $H_1$ | Decision rule: reject $H_0$ if
two-tailed | $\beta = \beta_0$ | $\beta \neq \beta_0$ | $|t| > t_{\alpha/2, df}$
one-tailed | $\beta \leq \beta_0$ | $\beta > \beta_0$ | $t > t_{\alpha, df}$
one-tailed | $\beta \geq \beta_0$ | $\beta < \beta_0$ | $t < -t_{\alpha, df}$

Table 1: $\beta_0$ is the hypothesized numerical value of $\beta$. $|t|$ means the absolute value of $t$. $t_{\alpha, df}$ and
$t_{\alpha/2, df}$ denote the critical $t$ value at the $\alpha$ or $\alpha/2$ level of significance. df: degrees of freedom, $(n-2)$
for the two-variable model. The same procedure holds to test hypotheses about $\alpha$.
We can test two different types of null hypothesis,
$$H_0 : \beta = \beta_0 \qquad H_0 : \beta \leq (\geq)\ \beta_0$$
where $\beta_0$ is a prespecified value; the two $H_0$ give rise to two-sided and one-sided tests, respectively.
Recalling from the review class, the classical methodology involves partitioning the sample space
into two regions. If the test statistic falls in the rejection region, then the null hypothesis is
rejected; if it falls in the acceptance region, then it is not. These regions are constructed using the
result of the theorem of this section. Table 1 summarizes the decision rules.
If on the basis of a test of significance, say, the t test, we decide to accept the null hypothesis,
all we are saying is that on the basis of the sample evidence we have no reason to reject it; we are
not saying that the null hypothesis is true beyond any doubt.
[Figure: density plots showing the acceptance region and rejection area(s); panel (a) one-sided, panel (b) two-sided.]
Figure 1: Acceptance and rejection areas in one- and two-sided tests.
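As an illustration (assuming SciPy is available; the numerical values are made up for the example), the three decision rules of Table 1 can be evaluated with the quantiles of the t distribution:

    from scipy import stats

    t_stat, df, alpha = 2.3, 40, 0.05   # alpha is the significance level here, not the intercept
    print(abs(t_stat) > stats.t.ppf(1 - alpha / 2, df))  # two-tailed: |t| > t_{alpha/2,df}
    print(t_stat > stats.t.ppf(1 - alpha, df))           # one-tailed: t > t_{alpha,df}
    print(t_stat < -stats.t.ppf(1 - alpha, df))          # one-tailed: t < -t_{alpha,df}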
Example 7. Suppose you regress $y$ on $x$ using $n = 42$ observations and you obtain the following
estimates:
$$a = 0.25 \qquad b = 0.23 \qquad RSS = 40 \qquad \sum_i (x_i - \bar{x})^2 = 100$$
You wish to test the hypothesis that the slope is not significantly different from zero at a 5% significance
level, i.e.
$$H_0 : \beta = 0$$
Then, the test statistic is
$$\left|\frac{b}{s_b}\right| = \left|\frac{0.23}{\sqrt{\frac{40/40}{100}}}\right| = \left|\frac{0.23}{0.1}\right| = 2.3$$
Since $2.3 > t_{0.025, 40} = 2.021$, we reject the null of non-significance.
2.1 The level of significance and the p-value
It should be clear from the discussion so far that whether we reject or do not reject the null hypothesis
depends critically on $\alpha$, the level of significance, i.e. the probability of committing a Type I error:
the probability of rejecting a true hypothesis.
But even then, why is $\alpha$ commonly fixed at the 1, 5, or at most 10 percent levels? As a matter
of fact, there is nothing sacrosanct about these values; any other value would do just as well. It is
not possible here to discuss in depth why one chooses the 1, 5, or 10 percent levels of significance,
for that would take us into the field of statistical decision making, a discipline unto itself.
A brief summary, however, can be offered. For a given sample size, if we try to reduce a Type I
error, a Type II error increases, and vice versa. That is, given the sample size, if we try to reduce
the probability of rejecting a true hypothesis, we at the same time increase the probability of
accepting a false hypothesis. So there is a trade-off involved between these two types of errors,
given the sample size. Now the only way we can decide about the trade-off is to find out the relative
costs of the two types of errors.
The rub is that we rarely know the costs of making the two types of errors. Thus, applied
econometricians generally follow the practice of setting the value of $\alpha$ at a 1 or a 5 or at most a 10
percent level.
This problem of choosing the appropriate value of $\alpha$ can be avoided if we use what is known
as the p-value of the test statistic. As noted, the Achilles heel of the classical approach to hypothesis
testing is its arbitrariness in selecting $\alpha$. Once a test statistic (e.g., the t statistic) is obtained in
a given example, why not simply go to the appropriate statistical table and find out the actual
probability of obtaining a value of the test statistic as extreme as or more extreme than that obtained
in the example? This probability is called the p-value (i.e., probability value), also known as the
observed or exact level of significance, or the exact probability of committing a Type I error. More
technically, the p-value is defined as the lowest significance level at which a null hypothesis can be rejected.
Example 8. Consider the setting of the previous example, where
$$\left|\frac{b}{s_b}\right| = 2.3$$
The p-value is the probability of observing a value larger than 2.3 in absolute terms. For a Student t
with 40 degrees of freedom, this probability amounts to $p \approx 0.027$.
This means that the slope coefficient is statistically significant at the 5% level but not at the 1% level.
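For instance (a sketch, assuming SciPy), the two-sided p-value of Example 8 can be computed as:

    from scipy import stats

    p = 2 * stats.t.sf(2.3, df=40)  # two-sided p-value from the t survival function
    print(p)                        # about 0.027: significant at 5%, not at 1%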
One can obtain the p-value from electronic statistical tables to several decimal places. Unfortunately,
conventional printed statistical tables, for lack of space, are not that refined. Most statistical
packages now routinely print out the p-values.