
Topic 4

Panel Data
Regression Models

(I). The Nature of Panel data


Often we deal with cross-sectional or time-series data,
each of which has its own unique features.
In this topic we deal with panel data regression models,
that is, models that study the same group of entities
(individuals, firms, states, countries, and the like)
over time.

The focus of this topic:


(I). The Nature of Panel Data
Basic notation
Why panel data?

(II). Panel Data Models


Pooled OLS
Fixed effects vs. Random effects

(III). Issues in Panel Data Models

Poolability test: Pooled OLS vs. FEM;
Breusch and Pagan test: Pooled OLS vs. REM;
Hausman test: FEM vs. REM

(I). The Nature of Panel data


Basic notation:
We observe variables X for N units, called the cross-sections,
over T consecutive periods:

$i = 1, \dots, N$, with N the cross-sectional dimension;

$t = 1, \dots, T$, with T the temporal dimension.

This gives a panel of size $N \times T$.

Balanced panel: the number of observations is the same for
each individual.
Unbalanced panel: the number of observations differs across
individuals.
Short panel: N > T
Long panel: T > N
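These definitions can be checked mechanically. Below is a minimal Python sketch. It assumes the panel's index is available as (unit, period) pairs, with one pair per observation and no duplicates. `describe_panel` is a hypothetical helper, not part of any library.

```python
from collections import Counter


def describe_panel(index_pairs):
    """Classify a panel from its (unit, period) observation pairs.

    Assumes one pair per observation and no duplicate pairs.
    Returns N (number of units), T (number of distinct periods),
    whether the panel is balanced, and whether it is short or long.
    """
    pairs = list(index_pairs)
    obs_per_unit = Counter(unit for unit, _ in pairs)
    n_units = len(obs_per_unit)
    n_periods = len({period for _, period in pairs})
    # Balanced: every unit has the same number of observations
    balanced = len(set(obs_per_unit.values())) == 1
    if n_units > n_periods:
        shape = "short"   # N > T
    elif n_periods > n_units:
        shape = "long"    # T > N
    else:
        shape = "N = T"
    return {"N": n_units, "T": n_periods, "balanced": balanced, "shape": shape}


# Six airlines observed every year from 1970 to 1984
airline_index = [(firm, year) for firm in range(1, 7) for year in range(1970, 1985)]
print(describe_panel(airline_index))
# {'N': 6, 'T': 15, 'balanced': True, 'shape': 'long'}
```

By these definitions, the 6 × 15 airline panel used later is balanced. Because T > N, it is also a long panel.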

Why Panel Data?

Panel data enrich empirical analysis in that they:
Take cross-sectional heterogeneity explicitly into
account.
Provide more informative data: more variability, less
collinearity among variables, more degrees of freedom, and
more efficiency.
Are better suited to study the dynamics of change. They can
better detect and measure effects that simply cannot be
observed in pure cross-section or pure time-series data.
Enable us to study more complicated behavioral
models.
Minimize aggregation bias.

(II). Basics of Panel Data Models


Panel data models examine individual-specific effects,
time effects, or both. They do so in order to deal with
heterogeneity, or individual effects (cross-sectional or
time-specific), that may or may not be observed.
Pooled OLS: if no individual effect exists, ordinary least
squares (OLS) produces efficient and consistent parameter
estimates.

Fixed effects vs. random effects: when individual effects are
present, they can be treated as either fixed or random effects.
A fixed-effect model examines whether intercepts vary across
groups or time periods. A random-effect model instead
explores differences in error variance components across
individuals or time periods.

Example: Panel Data Models
The airline data constructed by Christensen Associates of
Madison, Wisconsin, are a frequently cited dataset.
For this example, let's consider the costs of six airline
firms for the period 1970-1984. This gives a total of 90
(6 × 15) panel observations, as in Table 1 in the next slide.
The variables are defined as:

FIRM = airline id; YEAR = time in years; COST = total cost
(in $000); OUTPUT = revenue passenger miles (index number);
FUEL = fuel price ($); and LOAD = loading factor, the average
capacity utilization of the fleet. We wish to estimate an
airline cost function:
Cost = f(output, fuel price, loading factor)

Table 1: Cost of 6 airline firms, 1970-1984 (table not reproduced; annotated with its time dimension and cross-sectional dimension)

Example (Panel Data Models):
Consider the following model:

$\ln C_{it} = f(\ln Q_{it}, \ln PF_{it}, LF_{it})$

where $i = 1, 2, \dots, 6$ and $t = 1, 2, \dots, 15$.
What options do we have?

There are 3 possibilities:


1. Pooled OLS model
2. The fixed effects model
- least squares dummy
variable (LSDV) model
-within-group model.
3. The random effects model.

1. Pooled OLS
We simply pool all 90 observations and estimate a
grand regression, neglecting the cross-section and
time-series nature of the data.
There is no distinction between airlines: one airline is as
good as another. Therefore,

$\ln C_{it} = \alpha + \beta_1 \ln Q_{it} + \beta_2 \ln PF_{it} + \beta_3 LF_{it} + u_{it}$

where $i = 1, 2, \dots, 6$ and $t = 1, 2, \dots, 15$.

Pooled OLS
Step 1: Stack the data, placing the data for each firm on
top of one another.
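Stacking and pooled estimation can be sketched in Python with NumPy. This is a minimal illustration under three assumptions:
- Each variable is held as an N × T array, with firms in rows and years in columns.
- The data are synthetic, generated with made-up coefficients rather than taken from the Table 1 airline data.
- The helper names `stack` and `pooled_ols` are hypothetical.

```python
import numpy as np


def stack(*wide_vars):
    """Stack (N, T) arrays into vectors of length N*T.

    A row-major reshape lists all T observations of firm 1, then all T
    observations of firm 2, and so on: the firms' data sit on top of one another.
    """
    return [np.asarray(v, dtype=float).reshape(-1) for v in wide_vars]


def pooled_ols(y, *regressors):
    """OLS of y on one common intercept plus the regressors (no firm effects)."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, resid


# Synthetic data (made-up parameters, NOT the Table 1 airline data)
rng = np.random.default_rng(0)
N, T = 6, 15
lnQ = rng.normal(0.0, 1.0, (N, T))
lnPF = rng.normal(0.0, 1.0, (N, T))
LF = rng.uniform(0.0, 1.0, (N, T))
lnC = 1.0 + 0.8 * lnQ + 0.4 * lnPF - 1.5 * LF + rng.normal(0.0, 0.05, (N, T))

y, q, pf, lf = stack(lnC, lnQ, lnPF, LF)
coef, resid = pooled_ols(y, q, pf, lf)
print(coef)  # [alpha, beta_1, beta_2, beta_3]; should be near the made-up 1.0, 0.8, 0.4, -1.5
```

Because the reshape is row-major, the first 15 stacked rows are firm 1's years, the next 15 are firm 2's, and so on. The single intercept column is what makes this the pooled model.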

Pooled OLS model (estimation output not reproduced)

Pooled OLS
OLS rests on five core assumptions:
1. Linearity in the parameters.
2. Exogeneity: the disturbances are not correlated with any
regressor.
3. The disturbances have constant variance
(homoscedasticity) and are not correlated with each other
(no autocorrelation).
4. There is variability in the X values, and the X values are
fixed in repeated sampling.
5. No multicollinearity.
If individual effects are not zero, heterogeneity may violate
Assumptions 2 and 3. Here heterogeneity means
individual-specific characteristics, like intelligence and
personality, that are not captured in the regressors.
Hence, the OLS estimator is no longer the best linear
unbiased estimator (BLUE). Panel data models provide a way
to deal with these problems.

Dealing with individual effects

How do we account for the unobservable, or heterogeneity,
effect(s)? We want consistent and/or efficient estimates of
the parameters of the variables of prime interest, which in
our case are output, fuel price, and loading factor.
Our prime interest may not be in estimating the impact of
the unobservable variables, because they remain the same
for a given subject: they are nuisance parameters.

2. Fixed Effect Model (FEM)

An individual-specific effect is time invariant and is
treated as part of the intercept. The individual effect is
therefore allowed to be correlated with the other regressors.
Hence, OLS Assumption 2 is not violated.
The fixed effect model can be estimated by
(a) Least Squares Dummy Variable (LSDV) regression
(OLS with a set of dummies), and
(b) within-group (within) estimation.

(a). Least Squares Dummy Variable (LSDV) FEM
Dummy for each firm

Step 1: Stack data
Step 2: Apply OLS

Example: LSDV FEM


Back to our example.
The LSDV model allows for heterogeneity among subjects
by allowing each entity to have its own intercept value:

$\ln C_{it} = \alpha_i + \beta_1 \ln Q_{it} + \beta_2 \ln PF_{it} + \beta_3 LF_{it} + u_{it}$

where $i = 1, 2, \dots, 6$ and $t = 1, 2, \dots, 15$.

Notice that we have put the subscript i on the intercept
term to suggest that the intercepts of the 6 airlines
may be different.
The difference may be due to special features of each
airline, such as managerial style or the type of market
each airline is serving.

How does it work?
How do we actually allow the (fixed effect) intercept to
vary among the airlines?
We can easily do this by using the dummy variable
technique, particularly the differential intercept dummy
technique.

LSDV
Now we can write the FEM as

$\ln C_{it} = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \alpha_4 D_{4i} + \alpha_5 D_{5i} + \alpha_6 D_{6i} + \beta_1 \ln Q_{it} + \beta_2 \ln PF_{it} + \beta_3 LF_{it} + u_{it}$

where $D_{2i} = 1$ for airline 2 and 0 otherwise; $D_{3i} = 1$ for
airline 3 and 0 otherwise; and so on.

Notice that since we have 6 airlines, we have introduced
only 5 dummy variables. This avoids falling into the dummy
variable trap (i.e., the situation of perfect multicollinearity).
Here we treat airline 1 as the base, or reference, category.
Of course, you can choose any airline as the reference
point.

As a result, $\alpha_1$ is the intercept value of airline 1.
The other $\alpha$ coefficients represent by how much the
intercept values of the other airlines differ from the
intercept value of the first airline.
Thus, $\alpha_2$ tells by how much the intercept value of the
second airline differs from $\alpha_1$.

The sum $(\alpha_1 + \alpha_2)$ gives the actual value of the
intercept for airline 2.
The intercept values of the other airlines can be
computed similarly.
Keep in mind that if we want to introduce a dummy for
each airline, we need to drop the (common) intercept.
Otherwise we will fall into the dummy variable trap.
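The differential intercept dummy technique can be sketched in Python with NumPy. This is an illustrative sketch, not the author's code, and it makes three assumptions:
- Firms are numbered 0 to N-1, and firm 0 plays the role of airline 1, the reference category.
- The data are simulated with made-up intercepts.
- `lsdv` and `dummy_trap_rank` are hypothetical helper names.

```python
import numpy as np


def lsdv(y, firm, X):
    """LSDV fixed effects: common intercept + (N - 1) differential dummies + slopes.

    y: stacked outcome (length N*T); X: stacked regressors, shape (N*T, K);
    firm: integer ids 0..N-1 per row. Firm 0 (airline 1 in the text) is the
    reference category, so its dummy is dropped to avoid the dummy variable trap.
    Returns (intercept of each firm, slope estimates).
    """
    firm = np.asarray(firm)
    n_firms = firm.max() + 1
    dummies = [(firm == j).astype(float) for j in range(1, n_firms)]
    Z = np.column_stack([np.ones(len(y)), *dummies, X])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    alpha_1, differentials = coef[0], coef[1:n_firms]
    # Intercept of firm j = alpha_1 + differential intercept of firm j
    intercepts = np.concatenate([[alpha_1], alpha_1 + differentials])
    return intercepts, coef[n_firms:]


def dummy_trap_rank(firm):
    """Columns vs. rank when a dummy for EVERY firm is kept alongside the intercept."""
    firm = np.asarray(firm)
    all_dummies = [(firm == j).astype(float) for j in range(firm.max() + 1)]
    Z = np.column_stack([np.ones(len(firm)), *all_dummies])
    return Z.shape[1], int(np.linalg.matrix_rank(Z))


# Synthetic balanced panel with made-up firm intercepts (NOT the airline data)
rng = np.random.default_rng(1)
N, T = 6, 15
firm = np.repeat(np.arange(N), T)  # firm 0 repeated T times, then firm 1, ...
true_alpha = np.array([1.0, 1.3, 0.7, 1.1, 0.9, 1.5])
X = np.column_stack([rng.normal(size=N * T), rng.normal(size=N * T), rng.uniform(size=N * T)])
y = true_alpha[firm] + X @ np.array([0.8, 0.4, -1.5]) + rng.normal(0.0, 0.05, N * T)

intercepts, slopes = lsdv(y, firm, X)
print(intercepts, slopes)
print(dummy_trap_rank(firm))  # (7, 6): 7 columns but rank 6
```

The second helper shows the dummy variable trap directly. With a common intercept and all 6 firm dummies, the design matrix has 7 columns but rank 6, so OLS cannot separate the intercepts.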

LSDV estimates (estimation output not reproduced)

Caution on the use of the LSDV FEM

1. If you introduce too many dummy variables, you will
run up against the degrees of freedom problem.
2. With too many dummy variables in the model, both
individual and interactive or multiplicative, there is
always the possibility of multicollinearity. This might
make precise estimation of one or more parameters
difficult.
3. LSDV cannot identify the impact of time-invariant
variables. These variables do not change over time for an
individual subject, so they are perfectly collinear with the
subject dummies. The LSDV approach therefore cannot identify
the impact of such time-invariant variables on Y: the
subject-specific intercepts absorb all heterogeneity that
may exist in Y and the Xs.

4. We have to think carefully about the error term, $u_{it}$.
The results above rest on the classical assumptions,
namely $u_{it} \sim N(0, \sigma^2)$. There are several
possibilities:
   1. The error variance can be homoscedastic or
   heteroscedastic.
   2. The error term of the cost function for airline #1
   can be nonautocorrelated or autocorrelated.
   3. The error term of airline #1 may also be correlated
   with that of, say, airline #2.

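(b). Fixed-effect within-group (WG) estimator

LSDV estimation is equivalent to the within transformation:
transform each variable by subtracting its group mean, i.e.,
the mean over time for each cross-sectional unit:

$X^*_{it} = X_{it} - \bar{X}_i, \qquad Y^*_{it} = Y_{it} - \bar{Y}_i, \qquad \bar{X}_i = \frac{1}{T}\sum_{t=1}^{T} X_{it}$

Regress the demeaned data using OLS.
This is used when the number of cross-sections is large,
since LSDV would then need too many dummies.
This is called the within estimator.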
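The claimed equivalence between LSDV and the within transformation can be checked numerically. Below is a minimal Python/NumPy sketch. It assumes stacked arrays and integer firm ids, the data are simulated, and `within_estimator` and `lsdv_slopes` are hypothetical names. The LSDV version keeps one dummy per firm and drops the common intercept, as described for LSDV earlier.

```python
import numpy as np


def within_estimator(y, firm, X):
    """Within-group (WG) estimator: demean each variable by its firm mean, then OLS.

    No intercept is estimated: the demeaning removes every firm-specific intercept.
    """
    firm = np.asarray(firm)
    y_star = np.asarray(y, dtype=float).copy()
    X_star = np.asarray(X, dtype=float).copy()
    for j in np.unique(firm):
        rows = firm == j
        y_star[rows] -= y_star[rows].mean()        # Y*_it = Y_it - mean of Y for unit i
        X_star[rows] -= X_star[rows].mean(axis=0)  # X*_it = X_it - mean of X for unit i
    beta, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
    return beta


def lsdv_slopes(y, firm, X):
    """LSDV with one dummy per firm and no common intercept; returns the slopes only."""
    firm = np.asarray(firm)
    D = (firm[:, None] == np.unique(firm)[None, :]).astype(float)
    coef, *_ = np.linalg.lstsq(np.column_stack([D, X]), y, rcond=None)
    return coef[D.shape[1]:]


# Synthetic panel (made-up values)
rng = np.random.default_rng(2)
N, T = 6, 15
firm = np.repeat(np.arange(N), T)
X = rng.normal(size=(N * T, 3))
y = rng.normal(size=N)[firm] + X @ np.array([0.8, 0.4, -1.5]) + rng.normal(0.0, 0.05, N * T)
print(within_estimator(y, firm, X))
print(lsdv_slopes(y, firm, X))
```

Both calls should print the same slopes, up to rounding error. The within version never builds the N dummy columns, which is why it is preferred when N is large. A time-invariant regressor becomes a column of zeros after demeaning. This is the same identification problem noted for LSDV.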

LSDV vs. WG estimator (side-by-side LSDV and within estimation output, not reproduced)

3. Random Effect Model (REM)

The random effect model assumes that the individual effects
are not correlated with any regressor. It estimates the error
variance specific to groups (or times). Hence,
$\varepsilon_i$ is an individual-specific random heterogeneity,
that is, a component of the error term.
The REM is also known as the Error Components Model.
The intercept and the slopes of the regressors are the same
across individuals. The differences among individuals (or time
periods) lie in their individual-specific errors, not in their
intercepts.

Example: REM

We can write the REM as

$\ln C_{it} = \alpha_i + \beta_1 \ln Q_{it} + \beta_2 \ln PF_{it} + \beta_3 LF_{it} + u_{it}$

Instead of treating $\alpha_i$ as fixed, we assume that it is a
random variable, $\alpha_i = \alpha + \varepsilon_i$. Here $\varepsilon_i$ is a random
error term with a mean of zero and a variance of $\sigma_\varepsilon^2$.

Substituting $\alpha_i = \alpha + \varepsilon_i$ into the model gives

$\ln C_{it} = \alpha + \beta_1 \ln Q_{it} + \beta_2 \ln PF_{it} + \beta_3 LF_{it} + w_{it}, \qquad w_{it} = \varepsilon_i + u_{it}$

Note that the composite error term $w_{it}$ consists of two
parts:
- $\varepsilon_i$ is the cross-section, or individual-specific,
error component.
- $u_{it}$ is the combined time-series and cross-section
error component. It is sometimes called the idiosyncratic
term because it varies over cross-section (i.e., subject) as
well as time.

It is important to note that $w_{it}$ is not correlated with any
of the explanatory variables included in the model.
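One standard way to compute the REM (GLS) estimates in a balanced panel is to quasi-demean each variable. Each variable has a fraction θ of its firm mean subtracted, with $\theta = 1 - \sqrt{\sigma_u^2 / (\sigma_u^2 + T\sigma_\varepsilon^2)}$, and OLS is then applied. The sketch below assumes the two variance components are known. In practice they are estimated first, which gives feasible GLS. The function name `re_gls` and the simulated data are hypothetical.

```python
import numpy as np


def re_gls(y, firm, X, sigma_u2, sigma_eps2):
    """Random effects GLS for a balanced panel via quasi-demeaning.

    Assumes the variance components are KNOWN: sigma_u2 = Var(u_it) and
    sigma_eps2 = Var(eps_i). Feasible GLS would estimate them first.
    Returns ([alpha, beta_1, ..., beta_K], theta).
    """
    firm = np.asarray(firm)
    y = np.asarray(y, dtype=float)
    T = np.count_nonzero(firm == firm[0])  # balanced: same T for every firm
    theta = 1.0 - np.sqrt(sigma_u2 / (sigma_u2 + T * sigma_eps2))
    y_q = y.copy()
    Z_q = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    for j in np.unique(firm):
        rows = firm == j
        # Subtract theta times the firm mean (theta = 0: pooled OLS; theta near 1: within)
        y_q[rows] -= theta * y[rows].mean()
        Z_q[rows] -= theta * Z_q[rows].mean(axis=0)
    coef, *_ = np.linalg.lstsq(Z_q, y_q, rcond=None)
    return coef, theta


# Simulated error-components data (made-up values)
rng = np.random.default_rng(3)
N, T = 200, 5
firm = np.repeat(np.arange(N), T)
X = rng.normal(size=(N * T, 2))
eps = rng.normal(0.0, 0.5, N)       # individual effects eps_i, variance 0.25
u = rng.normal(0.0, 0.3, N * T)     # idiosyncratic u_it, variance 0.09
y = 2.0 + X @ np.array([1.0, -0.5]) + eps[firm] + u
coef, theta = re_gls(y, firm, X, sigma_u2=0.09, sigma_eps2=0.25)
print(theta, coef)
```

Setting `sigma_eps2=0` gives θ = 0 and reproduces pooled OLS. This matches the point that the REM reduces to the pooled model when the individual effect has zero variance. As θ approaches 1, the transformation approaches the within transformation.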

(III). Issues in Panel Data Models

Selecting the models:
1. Poolability test: Pooled OLS vs. FEM
2. Breusch and Pagan test: Pooled OLS vs. REM
3. Hausman test: FEM vs. REM

Poolability test: Pooled OLS vs. FEM

F test for fixed effects:
o In the FEM (LSDV) regression, the null hypothesis is that
all the included dummy parameters are zero. (The reference
category's dummy has already been dropped.)
o The alternative hypothesis is that at least one dummy
parameter is not zero. This hypothesis is tested with an F
test based on the loss of goodness-of-fit.

The null hypothesis may be rejected, meaning that at least
one group- or time-specific intercept differs from the
others. You may then conclude that there is a significant
fixed effect, i.e., a significant increase in goodness-of-fit
in the fixed effect model. Therefore, the fixed effect model
is better than pooled OLS.
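The F test can be written in terms of the residual sums of squares of the restricted regression (pooled OLS) and the unrestricted regression (LSDV). With K slope coefficients:

$F = \frac{(RSS_{pooled} - RSS_{LSDV})/(N-1)}{RSS_{LSDV}/(NT - N - K)}$

Below is a minimal Python sketch, assuming a balanced panel. `poolability_f` is a hypothetical name, and the RSS values in the usage line are made up.

```python
def poolability_f(rss_pooled, rss_lsdv, n_units, n_periods, n_slopes):
    """F statistic for H0: all differential intercepts are zero (pooled OLS vs. FEM).

    rss_pooled: residual sum of squares of pooled OLS (restricted model).
    rss_lsdv:   residual sum of squares of the LSDV model (unrestricted model).
    Balanced panel with N units, T periods and K slope coefficients.
    Under H0, F follows an F(N - 1, NT - N - K) distribution.
    """
    df1 = n_units - 1                               # restrictions: N - 1 dummies
    df2 = n_units * n_periods - n_units - n_slopes  # residual df of LSDV
    f_stat = ((rss_pooled - rss_lsdv) / df1) / (rss_lsdv / df2)
    return f_stat, df1, df2


# Hypothetical RSS values for a 6 x 15 panel with 3 slopes (not actual airline results)
print(poolability_f(rss_pooled=1.2, rss_lsdv=0.3, n_units=6, n_periods=15, n_slopes=3))
```

If SciPy is available, `scipy.stats.f.sf(f_stat, df1, df2)` gives the p-value. A small p-value rejects the pooled model in favor of fixed effects.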

Breusch and Pagan test: Pooled OLS vs. REM

Breusch and Pagan test
Provides a test of the random effects model against the
pooled OLS model.
Tests the null hypothesis that $\sigma_\varepsilon^2 = 0$. This is the case
where the individual effects do not exist and OLS is
applicable: the random effects model reduces to the pooled
one if the variance of the individual effect is zero.

Denote the residuals from the pooled OLS regression as $e_{it}$.

Define:

$S_1 = \sum_{i=1}^{N} \Big( \sum_{t=1}^{T} e_{it} \Big)^2 \quad \text{and} \quad S_2 = \sum_{i=1}^{N} \sum_{t=1}^{T} e_{it}^2$

Then the test statistic is

$LM = \frac{NT}{2(T-1)} \left[ \frac{S_1}{S_2} - 1 \right]^2$

It is distributed as a $\chi^2$ statistic with 1 degree of freedom
under the null hypothesis.
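The LM statistic above needs only the pooled OLS residuals. Below is a minimal pure-Python sketch, assuming a balanced panel with residuals stacked firm by firm. `breusch_pagan_lm` is a hypothetical name, and the residuals in the usage line are a toy example.

```python
def breusch_pagan_lm(resid, n_units, n_periods):
    """Breusch-Pagan LM statistic from pooled OLS residuals (balanced panel).

    resid: residuals e_it stacked firm by firm (all T periods of firm 1 first).
    LM = NT / (2(T - 1)) * (S1 / S2 - 1)^2, chi-squared(1) under H0: sigma_eps^2 = 0.
    """
    if len(resid) != n_units * n_periods:
        raise ValueError("expected N*T residuals from a balanced panel")
    s1 = 0.0  # sum over i of (sum over t of e_it)^2
    s2 = 0.0  # sum over i and t of e_it^2
    for i in range(n_units):
        e_i = resid[i * n_periods:(i + 1) * n_periods]
        s1 += sum(e_i) ** 2
        s2 += sum(e * e for e in e_i)
    n_obs = n_units * n_periods
    return n_obs / (2.0 * (n_periods - 1)) * (s1 / s2 - 1.0) ** 2


# Toy residuals: 2 firms x 3 periods, each firm's residuals share one sign
print(breusch_pagan_lm([1.0, 1.0, 1.0, -1.0, -1.0, -1.0], n_units=2, n_periods=3))  # 6.0
```

The toy value 6.0 exceeds 3.84, the 5% critical value of $\chi^2(1)$, so these residuals would reject $\sigma_\varepsilon^2 = 0$. Residuals that share a sign within each firm are exactly the pattern an individual effect produces.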

Hausman test: FEM vs. REM

Hausman test
Usually applied to test fixed vs. random effects models:
it compares the random effects estimator $\hat{\beta}_{RE}$ directly with
the fixed effects estimator $\hat{\beta}_{FE}$.
- If the individual effects are correlated with the regressors,
the GLS (random effects) estimates are inconsistent, while the
OLS fixed effects results are consistent.
- If the individual effects are not correlated with the
regressors, both estimators are consistent, but the OLS fixed
effects estimator is inefficient.

Construct $q = \hat{\beta}_{FE} - \hat{\beta}_{RE}$ and $V(q) = V(\hat{\beta}_{FE}) - V(\hat{\beta}_{RE})$.

Test statistic: $m = q' [V(q)]^{-1} q$.
m is distributed as a $\chi^2$ statistic with k degrees of
freedom, where k is the dimension of $\beta$.
The null hypothesis is that the preferred model is the random
effects model. The alternative is that the fixed effects model
is preferred.
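The Hausman statistic is a quadratic form and can be computed directly once both models are estimated. Below is a minimal Python/NumPy sketch. `hausman` is a hypothetical helper, and the coefficient and covariance values in the usage lines are made up for illustration.

```python
import numpy as np


def hausman(b_fe, b_re, v_fe, v_re):
    """Hausman statistic m = q' [V(q)]^(-1) q, with q = b_FE - b_RE.

    b_fe, b_re: slope estimates for the same k regressors, in the same order
    (intercepts excluded). v_fe, v_re: their estimated covariance matrices.
    Returns (m, k); under H0 (random effects preferred) m ~ chi-squared(k).
    """
    q = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    v_q = np.asarray(v_fe, dtype=float) - np.asarray(v_re, dtype=float)
    # Solve V(q) x = q instead of forming the inverse explicitly
    m = float(q @ np.linalg.solve(v_q, q))
    return m, q.size


# Hypothetical estimates for k = 2 slopes (not results from the airline data)
m, k = hausman(b_fe=[1.0, 2.0], b_re=[0.8, 2.1],
               v_fe=np.diag([0.05, 0.02]), v_re=np.diag([0.01, 0.01]))
print(m, k)
```

In finite samples, $V(\hat\beta_{FE}) - V(\hat\beta_{RE})$ is not guaranteed to be positive definite. In that case the statistic can be negative or undefined, and this sketch does not handle it. Compare m with the $\chi^2(k)$ critical value: a large m rejects the null and favors the fixed effects model.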

Properties of various estimators

Pooled OLS estimator: consistent if the slope coefficients are
the same across subjects and the error term is uncorrelated
with the regressors. However, the error terms are likely to be
correlated over time for a given subject, so panel-corrected
standard errors should be used for hypothesis testing.
FEM estimators: always consistent, even if the underlying
true model is pooled or random.
REM estimators: consistent even if the true model is pooled.
If the true model is fixed, the REM estimator is inconsistent.
