Panel Data Models Explained

Topic 4: Panel Data Regression Models
(I). The Nature of Panel Data

We often work with cross-sectional data or time series data, and each type has its own distinctive features.
In this topic we deal with panel data regression models, that is, models that study the same group of entities (individuals, firms, states, countries, and the like) over time.
The focus of this topic:

(I). The Nature of Panel Data
- Basic notation
- Why panel data?
(II). Panel Data Models
- Pooled OLS
- Fixed effects vs. random effects
(III). Issues in Panel Data Models
- Poolability test: Pooled OLS vs. FEM
- Breusch and Pagan test: Pooled OLS vs. REM
- Hausman test: FEM vs. REM
(I). The Nature of Panel Data

Basic notation:
We observe a variable X for N units, called the cross-sections, over T consecutive periods:
i = 1, ..., N, with N the cross-sectional dimension.
t = 1, ..., T, with T the temporal dimension.
When every unit is observed in every period, this gives a panel of N × T observations.
Balanced panel: the number of observations is the same for each individual.
Unbalanced panel: the number of observations differs across individuals.
Short panel: N > T
Long panel: T > N
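As a small illustration of these definitions, the sketch below classifies a panel from its unit and period labels. The function name `describe_panel` and the label "N equals T" for the boundary case are assumptions made here for illustration, and the labels describe a hypothetical panel of 6 units observed over 15 years rather than any actual data.

```python
from collections import Counter


def describe_panel(unit_ids, periods):
    """Classify a panel from parallel lists of unit ids and time periods."""
    counts = Counter(unit_ids)              # number of observations per unit
    n_units = len(counts)                   # N, the cross-sectional dimension
    n_periods = len(set(periods))           # T, the number of distinct periods
    # Balanced: every unit contributes the same number of observations.
    balanced = len(set(counts.values())) == 1
    if n_units > n_periods:
        shape = "short"                     # N > T
    elif n_periods > n_units:
        shape = "long"                      # T > N
    else:
        shape = "N equals T"                # boundary case, label assumed here
    return {"N": n_units, "T": n_periods, "balanced": balanced, "shape": shape}


# Hypothetical labels: 6 units, each observed in the years 1970 to 1984.
firms = [i for i in range(1, 7) for _ in range(1970, 1985)]
years = [t for _ in range(1, 7) for t in range(1970, 1985)]
info = describe_panel(firms, years)
print(info)
```

With 6 units and 15 periods the panel is balanced and, by the definition above, long.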
Why Panel Data?

Panel data enrich empirical analysis in several ways:
- They take cross-sectional heterogeneity explicitly into account.
- They give more informative data, more variability, less collinearity among variables, more degrees of freedom and more efficiency.
- They are better suited to studying the dynamics of change.
- They can better detect and measure effects that simply cannot be observed in pure cross-section or pure time series data.
- They enable us to study more complicated behavioral models.
- They minimize aggregation bias.
(II). Basics of Panel Data Models

Panel data models examine individual-specific effects, time effects, or both, in order to deal with heterogeneity (an individual or time-specific effect) that may or may not be observed.
Pooled OLS: If there is no individual effect, ordinary least squares (OLS) produces efficient and consistent parameter estimates.
Fixed effects vs. random effects: If individual effects are present, they can be modeled as either fixed or random.
A fixed-effect model examines whether intercepts vary across groups or time periods, whereas a random-effect model explores differences in error variance components across individuals or time periods.
Example: Panel Data Models
The airline cost data constructed by Christensen Associates of Madison, Wisconsin, are a frequently cited dataset.
For this example, let's consider the costs of six airline firms for the period 1970-1984, for a total of 90 panel observations, as in Table 1.
The variables are defined as:
FIRM = airline id;
YEAR = time in years;
COST = total cost (in $000);
OUTPUT = revenue passenger miles (index number);
FUEL = fuel price ($);
LOAD = load factor, the average capacity utilization of the fleet.
We wish to estimate an airline cost function:
Cost = f(output, fuel price, load factor)
Table 1: Costs of 6 airline firms, 1970-1984
[Table not reproduced; the slide marks the time dimension and the cross-sectional dimension of the stacked data.]
Example (Panel Data Models):

Consider the following model:
$$\ln C_{it} = f(\ln Q_{it}, \ln PF_{it}, LF_{it})$$
where i = 1, 2, ..., 6 and t = 1, 2, ..., 15.
What options do we have?
There are 3 possibilities:

1. Pooled OLS model
2. The fixed effects model
- least squares dummy variable (LSDV) model
- within-group model
3. The random effects model.
1. Pooled OLS
We simply pool all 90 observations and estimate one grand regression, neglecting the cross-section and time series nature of the data.
No distinction is made between airlines: one airline is as good as another. Therefore,
$$\ln C_{it} = \alpha + \beta_1 \ln Q_{it} + \beta_2 \ln PF_{it} + \beta_3 LF_{it} + u_{it}$$
where i = 1, 2, ..., 6 and t = 1, 2, ..., 15.
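A minimal sketch of the pooled regression, assuming NumPy is available. The data are synthetic and the coefficient values are made up for illustration; they are not the airline data or its estimates. All 90 stacked rows share one intercept and one set of slopes.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 6, 15                                   # six firms, fifteen years
n_obs = N * T                                  # 90 stacked observations

# Synthetic regressors standing in for ln Q, ln PF and LF (illustration only).
ln_q = rng.normal(size=n_obs)
ln_pf = rng.normal(size=n_obs)
lf = rng.uniform(0.4, 0.7, size=n_obs)
u = rng.normal(scale=0.1, size=n_obs)

# Made-up "true" coefficients: intercept 9.5, slopes 0.9, 0.4 and -1.0.
ln_c = 9.5 + 0.9 * ln_q + 0.4 * ln_pf - 1.0 * lf + u

# Pooled OLS: a single design matrix for all firms and years.
X = np.column_stack([np.ones(n_obs), ln_q, ln_pf, lf])
beta_hat, *_ = np.linalg.lstsq(X, ln_c, rcond=None)
resid = ln_c - X @ beta_hat

print(beta_hat)
```

Because nothing in `X` identifies the firm, any firm-specific difference in cost ends up in the residuals.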
Pooled OLS
Step 1: Stack the data for the firms on top of one another.
Pooled OLS model
[Slide content not reproduced.]
OLS rests on five core assumptions:
1. Linearity in the parameters.
2. Exogeneity: the disturbances are not correlated with any regressor.
3. The disturbances have constant variance (homoscedasticity) and are not correlated with each other (no autocorrelation).
4. The X values vary, and they are fixed in repeated sampling.
5. No multicollinearity problem.
If the individual effects are not zero, heterogeneity (individual-specific characteristics, such as intelligence and personality, that are not captured by the regressors) may violate Assumptions 2 and 3.
Hence, the OLS estimator is no longer the best linear unbiased estimator (BLUE). Panel data models provide a way to deal with these problems.
Dealing with individual effects

How do we account for the unobservable, or heterogeneity, effect(s) so that we can obtain consistent and/or efficient estimates of the parameters of the variables of prime interest, which in our case are output, fuel price, and load factor?
Our prime interest may not be in the impact of the unobservable variables, because they remain the same for a given subject; they are treated as nuisance parameters.
2. Fixed Effect Model (FEM)

Since an individual-specific effect is time invariant and is treated as part of the intercept, the individual effect is allowed to be correlated with the other regressors. Hence, OLS Assumption 2 is not violated.
The fixed effect model is estimated by
(a) least squares dummy variable (LSDV) regression (OLS with a set of dummies), and
(b) within-effect estimation methods.
(a). Least Squares Dummy Variable (LSDV) FEM

Step 1: Stack the data, adding a dummy for each firm.
Step 2: Apply OLS.
Example: LSDV FEM

Back to our example.
The LSDV model allows for heterogeneity among subjects by allowing each entity to have its own intercept value:
$$\ln C_{it} = \alpha_i + \beta_1 \ln Q_{it} + \beta_2 \ln PF_{it} + \beta_3 LF_{it} + u_{it}$$
where i = 1, 2, ..., 6 and t = 1, 2, ..., 15.
Notice that we have put the subscript i on the intercept term to suggest that the intercepts of the 6 airlines may be different.
The difference may be due to special features of each airline, such as managerial style or the type of market each airline is serving.
How does it work?
How do we actually allow the (fixed effect) intercept to vary among the airlines?
We can easily do this with the dummy variable technique, in particular the differential intercept dummy technique.
LSDV
Now we can write the FEM as
$$\ln C_{it} = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \alpha_4 D_{4i} + \alpha_5 D_{5i} + \alpha_6 D_{6i} + \beta_1 \ln Q_{it} + \beta_2 \ln PF_{it} + \beta_3 LF_{it} + u_{it}$$
where $D_{2i}$ = 1 for airline 2, 0 otherwise; $D_{3i}$ = 1 for airline 3, 0 otherwise; and so on.
Notice that since we have 6 airlines, we have introduced only 5 dummy variables to avoid falling into the dummy variable trap (i.e. the situation of perfect multicollinearity).
Here we treat airline 1 as the base, or reference, category. Of course, you can choose any airline as the reference point.
As a result, the intercept $\alpha_1$ is the intercept value of airline 1, and the other $\alpha$ coefficients represent by how much the intercept values of the other airlines differ from the intercept value of the first airline.
Thus, $\alpha_2$ tells by how much the intercept value of the second airline differs from $\alpha_1$.
The sum $(\alpha_1 + \alpha_2)$ gives the actual value of the intercept for airline 2.
The intercept values of the other airlines can be computed similarly.
Keep in mind that if we want to introduce a dummy for each airline, we need to drop the (common) intercept; otherwise we will fall into the dummy variable trap.
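The two ways of setting up the dummies can be compared in a short sketch, assuming NumPy and synthetic data (the firm intercepts and slopes below are made up, not estimates from the airline data). Version 1 keeps a common intercept and five dummies; version 2 drops the intercept and uses six dummies. The last lines show the dummy variable trap as a rank deficiency.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 6, 15
firm = np.repeat(np.arange(N), T)                    # 0..5, each repeated 15 times
alpha = np.array([9.0, 9.2, 9.5, 9.1, 8.8, 9.4])     # made-up firm intercepts
beta = np.array([0.9, 0.4, -1.0])                    # made-up slopes
x = rng.normal(size=(N * T, 3))                      # synthetic regressors
y = alpha[firm] + x @ beta + rng.normal(scale=0.1, size=N * T)

# Version 1: common intercept plus 5 dummies (the first firm is the reference).
d_ref = (firm[:, None] == np.arange(1, N)).astype(float)       # D2..D6
X1 = np.column_stack([np.ones(N * T), d_ref, x])
coef1, *_ = np.linalg.lstsq(X1, y, rcond=None)
# Firm intercepts: the base intercept, then base plus each differential.
intercepts1 = coef1[0] + np.concatenate([[0.0], coef1[1:N]])

# Version 2: one dummy per firm and no common intercept.
d_all = (firm[:, None] == np.arange(N)).astype(float)          # D1..D6
X2 = np.column_stack([d_all, x])
coef2, *_ = np.linalg.lstsq(X2, y, rcond=None)
intercepts2 = coef2[:N]

print(np.round(intercepts1, 3))
print(np.round(intercepts2, 3))

# Dummy variable trap: intercept plus all six dummies is rank deficient.
X_trap = np.column_stack([np.ones(N * T), d_all, x])
trap_rank = np.linalg.matrix_rank(X_trap)
print(trap_rank, X_trap.shape[1])
```

Both versions give the same firm intercepts and the same slopes; they differ only in how the intercepts are parameterized.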
LSDV estimates
[Slide content not reproduced.]
Caution on the use of the LSDV FEM

1. If you introduce too many dummy variables, you will run up against the degrees of freedom problem.
2. With too many dummy variables in the model, both individual and interactive or multiplicative, there is always the possibility of multicollinearity, which might make precise estimation of one or more parameters difficult.
3. LSDV may not be able to identify the impact of time-invariant variables. Since these variables do not change over time for an individual subject, the subject-specific intercepts absorb all heterogeneity that may exist in Y and the Xs.
4. We have to think carefully about the error term, $u_{it}$. The model rests on the classical assumption that $u_{it} \sim N(0, \sigma^2)$. There are several possibilities:
(a) The error variance can be homoscedastic or heteroscedastic.
(b) The error term of the cost function for airline #1 can be non-autocorrelated or autocorrelated.
(c) The error term for airline #1 may be correlated with the error term for, say, airline #2.
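(b). Fixed-effect within-group (WG) estimator

LSDV estimation is equivalent to the within transformation: transform each variable by subtracting its mean over time for the same cross-sectional unit,
$$X^*_{it} = X_{it} - \bar{X}_i ; \quad Y^*_{it} = Y_{it} - \bar{Y}_i$$
and then regress the demeaned data using OLS.
This approach is used when the number of cross-sections is large, since LSDV would then need one dummy per unit.
This is called the within estimator.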
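The equivalence between the within estimator and LSDV can be checked numerically. The sketch below assumes NumPy, a balanced panel with rows sorted by firm, and synthetic data with made-up slopes; the helper name `within_transform` is introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, K = 6, 15, 3
firm = np.repeat(np.arange(N), T)                 # rows sorted by firm, then year
alpha = rng.normal(size=N)                        # synthetic firm effects
# Regressors are shifted by the firm effect, so they are correlated with it.
x = rng.normal(size=(N * T, K)) + alpha[firm][:, None]
y = alpha[firm] + x @ np.array([0.9, 0.4, -1.0]) + rng.normal(scale=0.1, size=N * T)


def within_transform(a, n_units, n_periods):
    """Subtract each unit's own time mean (balanced panel, rows sorted by unit)."""
    blocks = a.reshape(n_units, n_periods, -1)    # one block of T rows per unit
    demeaned = blocks - blocks.mean(axis=1, keepdims=True)
    return demeaned.reshape(a.shape)


# Within estimator: OLS on the demeaned data, with no intercept and no dummies.
x_star = within_transform(x, N, T)
y_star = within_transform(y, N, T)
beta_within, *_ = np.linalg.lstsq(x_star, y_star, rcond=None)

# LSDV: OLS on the original data with one dummy per firm.
dummies = (firm[:, None] == np.arange(N)).astype(float)
coef_lsdv, *_ = np.linalg.lstsq(np.column_stack([dummies, x]), y, rcond=None)
beta_lsdv = coef_lsdv[N:]

print(beta_within)
print(beta_lsdv)
```

The slope estimates agree up to floating-point error, while the within regression never builds the N dummy columns.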
LSDV vs. WG estimator
[Slide content comparing the LSDV and within estimators is not reproduced.]
3. Random Effect Model (REM)

Random effect model: assume the individual effects are not correlated with any regressor, and estimate the error variance specific to groups (or time periods). Hence, the individual effect is an individual-specific random heterogeneity, that is, a component of the error term.
The REM is also known as the Error Components Model.
The intercept and the slopes of the regressors are the same across individuals. The difference among individuals (or time periods) lies in their individual-specific errors, not in their intercepts.
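Example: REM
We can write the REM as
$$\ln C_{it} = \alpha_i + \beta_1 \ln Q_{it} + \beta_2 \ln PF_{it} + \beta_3 LF_{it} + u_{it}$$
Instead of treating $\alpha_i$ as fixed, we assume that it is a random variable: $\alpha_i = \alpha + \varepsilon_i$, where $\varepsilon_i$ is a random error term with a mean of zero and a variance of $\sigma^2_\varepsilon$.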
Note that the composite error term $w_{it} = \varepsilon_i + u_{it}$ consists of two components: $\varepsilon_i$, which is the cross-section, or individual-specific, error component, and $u_{it}$, which is the combined time series and cross-section error component and is sometimes called the idiosyncratic term because it varies over cross-section (i.e. subject) as well as time.
It is important to note that $w_{it}$ is not correlated with any of the explanatory variables included in the model.
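The moments of the composite error can be worked out under assumptions that are commonly made for this model but are not stated on the slide: the individual component $\varepsilon_i$ and the idiosyncratic term $u_{it}$ each have mean zero and constant variance, they are uncorrelated with each other, and $u_{it}$ is uncorrelated across units and over time. The symbols $\sigma^2_\varepsilon$ and $\sigma^2_u$ for the two variances are notation assumed here.

```latex
\begin{aligned}
w_{it} &= \varepsilon_i + u_{it} \\
\operatorname{Var}(w_{it}) &= \sigma^2_\varepsilon + \sigma^2_u \\
\operatorname{Cov}(w_{it}, w_{is}) &= \sigma^2_\varepsilon \quad (t \neq s) \\
\operatorname{Corr}(w_{it}, w_{is}) &= \frac{\sigma^2_\varepsilon}{\sigma^2_\varepsilon + \sigma^2_u} \quad (t \neq s)
\end{aligned}
```

So the errors of the same unit are correlated across periods whenever $\sigma^2_\varepsilon > 0$, which is why a GLS-type estimator rather than plain OLS is used for the random effects model.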
(III). Issues in Panel Data Models

Selecting the model:
1. Poolability test: Pooled OLS vs. FEM
2. Breusch and Pagan test: Pooled OLS vs. REM
3. Hausman test: FEM vs. REM
Poolability test: Pooled OLS vs. FEM

F test for fixed effects:
o In the FEM regression, the null hypothesis is that all dummy parameters, except for the one that has been dropped, are zero.
o The alternative hypothesis is that at least one dummy parameter is not zero. This hypothesis is tested with an F test based on the loss of goodness-of-fit.
If the null hypothesis is rejected (at least one group/time-specific intercept $u_i$ is not zero), you may conclude that there is a significant fixed effect, or a significant increase in goodness-of-fit in the fixed effect model; therefore, the fixed effect model is better than pooled OLS.
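One common way to compute this F test compares the residual sum of squares of the pooled (restricted) model with that of the LSDV (unrestricted) model: F = [(RSS_pooled - RSS_LSDV) / (N - 1)] / [RSS_LSDV / (NT - N - K)], where K is the number of slope coefficients. The sketch below assumes NumPy and synthetic data with made-up firm intercepts; the helper `rss` is introduced here for illustration.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


rng = np.random.default_rng(3)
N, T, K = 6, 15, 3
firm = np.repeat(np.arange(N), T)
alpha = np.array([9.0, 9.2, 9.5, 9.1, 8.8, 9.4])     # made-up firm intercepts
x = rng.normal(size=(N * T, K))                      # synthetic regressors
y = alpha[firm] + x @ np.array([0.9, 0.4, -1.0]) + rng.normal(scale=0.1, size=N * T)

# Restricted model: pooled OLS with one common intercept.
X_pooled = np.column_stack([np.ones(N * T), x])
# Unrestricted model: LSDV with one dummy per firm.
dummies = (firm[:, None] == np.arange(N)).astype(float)
X_lsdv = np.column_stack([dummies, x])

rss_pooled = rss(X_pooled, y)
rss_lsdv = rss(X_lsdv, y)

df_num = N - 1                 # number of restrictions (dummies set to zero)
df_den = N * T - N - K         # residual degrees of freedom of the LSDV model
f_stat = ((rss_pooled - rss_lsdv) / df_num) / (rss_lsdv / df_den)
print(df_num, df_den, round(f_stat, 2))
```

The statistic is compared with the F distribution with (N - 1, NT - N - K) degrees of freedom, here (5, 81); a large value rejects poolability in favour of the fixed effect model.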
Breusch and Pagan test: Pooled OLS vs. REM

The Breusch and Pagan test provides a test of the random effects model against the pooled OLS model.
It tests the null hypothesis that $\sigma^2_\varepsilon = 0$, which is the case where the individual effects do not exist and OLS is applicable (i.e., the random effects model reduces to the pooled one if the variance of the individual effect is zero).
Denote the residuals from the pooled OLS regression as $\hat{u}_{it}$.
Define:
$$S_1 = \sum_{i=1}^{N} \left( \sum_{t=1}^{T} \hat{u}_{it} \right)^2 \quad \text{and} \quad S_2 = \sum_{i=1}^{N} \sum_{t=1}^{T} \hat{u}_{it}^2$$
Then the test statistic is
$$\lambda = \frac{NT}{2(T-1)} \left( \frac{S_1}{S_2} - 1 \right)^2$$
which is distributed as a $\chi^2$ statistic with 1 degree of freedom under the null hypothesis.
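A minimal sketch of this statistic, assuming NumPy, a balanced panel with residuals sorted by unit, and synthetic data. The individual effects are hard-coded made-up values so that the example is reproducible; the function name `breusch_pagan_lm` is introduced here for illustration.

```python
import numpy as np


def breusch_pagan_lm(resid, n_units, n_periods):
    """LM statistic from pooled OLS residuals (balanced panel, sorted by unit)."""
    e = np.asarray(resid, dtype=float).reshape(n_units, n_periods)
    s1 = (e.sum(axis=1) ** 2).sum()     # sum over units of squared residual sums
    s2 = (e ** 2).sum()                 # sum of all squared residuals
    return n_units * n_periods / (2.0 * (n_periods - 1)) * (s1 / s2 - 1.0) ** 2


rng = np.random.default_rng(4)
N, T = 6, 15
firm = np.repeat(np.arange(N), T)
effect = np.array([-0.2, 0.0, 0.3, -0.1, -0.4, 0.2])   # made-up individual effects
x = rng.normal(size=(N * T, 3))                        # synthetic regressors
y = (9.0 + effect[firm] + x @ np.array([0.9, 0.4, -1.0])
     + rng.normal(scale=0.1, size=N * T))

# Pooled OLS residuals.
X = np.column_stack([np.ones(N * T), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

lm = breusch_pagan_lm(resid, N, T)
print(round(lm, 2))

# Tiny hand-checkable case: two units, two periods, residuals 1, 1, -1, -1.
toy = breusch_pagan_lm([1.0, 1.0, -1.0, -1.0], 2, 2)
print(toy)
```

In the hand-checkable case the inner sums are 2 and -2, so the first sum is 8, the second is 4, and the statistic is (4 / 2) × (8 / 4 - 1)² = 2. For the simulated panel the statistic is compared with the chi-square distribution with 1 degree of freedom, whose 5% critical value is about 3.84.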
Hausman test: FEM vs. REM

The Hausman test is usually applied to choose between fixed and random effects models. It directly compares the random effects estimator $\hat{\beta}_{RE}$ with the fixed effects estimator $\hat{\beta}_{FE}$.
In the presence of correlation between the individual effects and the regressors, the GLS (random effects) estimates are inconsistent, while the OLS fixed effects estimates are consistent.
If there is no correlation between the individual effects and the regressors, both estimators are consistent, but the OLS fixed effects estimator is inefficient.
Construct
$$\hat{q} = \hat{\beta}_{FE} - \hat{\beta}_{RE} \quad \text{and} \quad V(\hat{q}) = V(\hat{\beta}_{FE}) - V(\hat{\beta}_{RE})$$
Test statistic:
$$m = \hat{q}' [V(\hat{q})]^{-1} \hat{q}$$
where m is distributed as a $\chi^2$ statistic with k degrees of freedom under the null hypothesis (k is the dimension of $\beta$).
The null hypothesis is that the random effects model is preferred (the individual effects are uncorrelated with the regressors); the alternative is that the fixed effects model is preferred.
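Given the two sets of estimates and their covariance matrices, the statistic is a single quadratic form. The sketch below assumes NumPy; the function name `hausman` and all numbers are made up for illustration, and the code assumes that the difference of the covariance matrices is invertible, which may not hold in every sample.

```python
import numpy as np


def hausman(b_fe, b_re, v_fe, v_re):
    """Return the Hausman statistic m and its degrees of freedom k."""
    q = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    v_q = np.asarray(v_fe, dtype=float) - np.asarray(v_re, dtype=float)
    # m = q' [V(q)]^{-1} q, computed with a linear solve instead of an inverse.
    m = float(q @ np.linalg.solve(v_q, q))
    return m, q.size


# Made-up slope estimates and covariance matrices (not from the airline data).
b_fe = [1.0, 2.0]
b_re = [0.9, 2.2]
v_fe = np.diag([0.05, 0.08])
v_re = np.diag([0.01, 0.04])

m, k = hausman(b_fe, b_re, v_fe, v_re)
print(m, k)
```

Here the difference vector is (0.1, -0.2) and each variance difference is 0.04, so m = (0.01 + 0.04) / 0.04 = 1.25 with k = 2 degrees of freedom; a value this small would not reject the random effects model at conventional levels.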
Properties of various estimators

Pooled OLS estimator: consistent if the error term is uncorrelated with the regressors. However, the error terms are likely to be correlated over time for a given subject, so panel-corrected standard errors can be used for hypothesis testing.
FEM estimators: always consistent, even if the true underlying model is pooled or random effects.
REM estimators: consistent even if the true model is the pooled model. If the true model is fixed effects, the REM estimator is inconsistent.
