
Panel data methods in Stata

Session 2
By Ziyodullo Parpiev, PhD

Overview

Panel data
How to get to know the data

Change over time


Tabulating
Calculating transition probabilities

Using panel data in Stata

Data on n cases, over t time periods, giving a total of n × t observations

One record per observation, i.e. long format

Stata tools for analyzing panel data begin with the prefix xt

First you need to tell Stata that you have panel data, using xtset

Complete and incomplete person-wave data


+------------------------------------------------------------------+
|      pid   wave      sex   age     mastat     jbstat      fihhmn |
|------------------------------------------------------------------|
| 10019057      1   female    59   never ma    retired         780 |
| 10019057      2   female    60   never ma    retired      759.14 |
| 10019057      3   female    61   never ma    retired       923.5 |
| 10019057      4   female    62   never ma    retired        62.5 |
| 10019057      5   female    63   never ma    retired         663 |
| 10019057      6   female    64   never ma    retired   missing o |
| 10019057      7   female    65   never ma    retired    1254.963 |
| 10019057      8   female    66   never ma    retired    1270.432 |
| 10019057      9   female    67   never ma    retired    1364.555 |
| 10019057     10   female    67   never ma    retired     1479.74 |
| 10019057     11   female    68   never ma    retired     1328.25 |
| 10019057     12   female    69   never ma    retired     1371.49 |
| 10019057     13   female    71   never ma    retired   missing o |
| 10019057     14   female    71   never ma    retired    1372.333 |
| 10019057     15   female    73   never ma    retired    1475.812 |
|------------------------------------------------------------------|
| 10028005      1     male    30   never ma   employed    1501.155 |
| 10028005      2     male    31   never ma   employed    1636.259 |
| 10028005      3     male    32   never ma   employed    1943.283 |
| 10028005      6     male    35   never ma   employed     2001.54 |
| 10028005      7     male    36   never ma   employed     1634.33 |
| 10028005      9     male    38   never ma   employed    1587.945 |
+------------------------------------------------------------------+

Telling Stata you have time series data


. xtset pid wave
       panel variable:  pid (unbalanced)
        time variable:  wave, 1 to 15, but with gaps
                delta:  1 unit

pid: unique cross-wave identifier
wave: time variable
(unbalanced): cases not observed for every time period
delta: period between observations, in units of the time variable

Describing the patterns in panel data


. xtdes, patterns(20)

     Freq.   Percent    Cum. |  Pattern
 ----------------------------+-----------------
     1294     28.12    28.12 |  111111111111111
      248      5.39    33.51 |  1..............
      157      3.41    36.93 |  11.............
      115      2.50    39.43 |  ..............1
      105      2.28    41.71 |  111............
      104      2.26    43.97 |  1111...........
       73      1.59    45.56 |  11111..........
       69      1.50    47.05 |  ............111
       66      1.43    48.49 |  ..........11111
       62      1.35    49.84 |  .............11
       60      1.30    51.14 |  .1.............
       60      1.30    52.45 |  11111111111....
       58      1.26    53.71 |  11111111.......
       58      1.26    54.97 |  111111111......
       57      1.24    56.21 |  11111111111111.
       55      1.20    57.40 |  .....1.........
       54      1.17    58.57 |  ........1111111
       54      1.17    59.75 |  .11111111111111
       54      1.17    60.92 |  1111111111.....
       53      1.15    62.07 |  .........111111
     1745     37.93   100.00 | (other patterns)
 ----------------------------+-----------------
     4601    100.00          |  XXXXXXXXXXXXXXX

Examining change over two waves


      1991 |
Employment |      1992 Employment status
    status |         1          2          3 |     Total
-----------+---------------------------------+----------
         1 |       961         35         76 |     1,072
         2 |        36         38         24 |        98
         3 |        40         23        524 |       587
-----------+---------------------------------+----------
     Total |     1,037         96        624 |     1,757

      2001 |
Employment |      2002 Employment status
    status |         1          2          3 |     Total
-----------+---------------------------------+----------
         1 |       991         15         46 |     1,052
         2 |        20         12          9 |        41
         3 |        56         20        495 |       571
-----------+---------------------------------+----------
     Total |     1,067         47        550 |     1,664

Calculating transition probabilities


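The transition probability is the probability of transitioning from one state to another:

$$p_{ij} = \Pr\{X_t = j \mid X_{t-1} = i\}$$

So to calculate by hand, divide the cell count by the row total:

$$p_{ij} = N_{ij} \Big/ \sum_{j=1}^{n} N_{ij}$$

where $N_{ij}$ is the cell count and $\sum_{j=1}^{n} N_{ij}$ is the row total.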
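The hand calculation can be checked outside Stata. The following is a minimal Python sketch, not part of the original slides, that row-normalises the 1991 to 1992 employment-status counts from this deck; the function name `transition_matrix` is hypothetical.

```python
# Counts from the 1991 -> 1992 employment-status table
# (rows: status in 1991, columns: status in 1992).
counts = [
    [961, 35, 76],
    [36, 38, 24],
    [40, 23, 524],
]


def transition_matrix(table):
    """Row-normalise a table of counts: p_ij = N_ij / sum_j N_ij."""
    matrix = []
    for row in table:
        row_total = sum(row)
        matrix.append([cell / row_total for cell in row])
    return matrix


probs = transition_matrix(counts)
for state, row in enumerate(probs, start=1):
    # Print each row as percentages, as xttrans does.
    print(state, [round(100 * p, 2) for p in row])
```

Each printed row should agree with the row percentages that xttrans reports for waves 1 and 2.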

Transition probability matrix


      1991 |
Employment |      1992 Employment status
    status |         1          2          3 |     Total
-----------+---------------------------------+----------
         1 |      0.90       0.03       0.07 |      1.00
         2 |      0.37       0.39       0.24 |      1.00
         3 |      0.07       0.04       0.89 |      1.00
-----------+---------------------------------+----------

      2001 |
Employment |      2002 Employment status
    status |         1          2          3 |     Total
-----------+---------------------------------+----------
         1 |      0.94       0.01       0.04 |      1.00
         2 |      0.49       0.29       0.22 |      1.00
         3 |      0.10       0.04       0.87 |      1.00
-----------+---------------------------------+----------

Transition probability matrices in Stata


Leaving out the if statement gives mean transition probabilities for all waves t to t+1

. xttrans jbstat if wave<3, freq

   current |
  economic |    current economic activity
  activity |         1          2          3 |     Total
-----------+---------------------------------+----------
         1 |       961         35         76 |     1,072
           |     89.65       3.26       7.09 |    100.00
-----------+---------------------------------+----------
         2 |        36         38         24 |        98
           |     36.73      38.78      24.49 |    100.00
-----------+---------------------------------+----------
         3 |        40         23        524 |       587
           |      6.81       3.92      89.27 |    100.00
-----------+---------------------------------+----------
     Total |     1,037         96        624 |     1,757
           |     59.02       5.46      35.52 |    100.00

Change in a categorical variable over time


A decision tree

[Figure: decision tree of transitions between the states empl, unemp and olf over successive waves, with a transition probability on each branch.]

Change in a continuous variable over time

Size transition matrix

Quantile transition matrix

Mean transition matrix

Median transition matrix

Size transition matrix

Absolute mobility

Boundaries set exogenously i.e. predetermined

e.g. movement in and out of poverty

e.g. poverty defined a priori as an income below 5,000

Do not depend on distribution under investigation

e.g. comparing mobility in 1990s and 2000s

Incorporates both movements of positions of individuals and economic growth

Quantile transition matrix

Mobility as a relative concept

Same number of individuals in each class

Only records movements involving re-ranking

Cannot take account of economic growth, for example when comparing matrices

Cannot draw a complete picture if comparing mobility in different cohorts/countries/welfare regimes

Mean/median transition matrices

Both absolute and relative approaches incorporated into matrices
Class boundaries defined as percentages of mean or median income of the origin and destination distributions
Example: 25%, 50%, 75% of median income
Note that this is not the same as quartiles

Example: income 1991-1992


wave = 1
          household income: month before interview
-------------------------------------------------------------
      Percentiles      Smallest
 1%       181.86              0
 5%       349.82              0
10%       458.98              0       Obs                2795
25%     826.6895              0       Sum of Wgt.        2795

50%     1511.067                      Mean           1773.253
                        Largest       Std. Dev.      1299.089
75%     2365.493       9230.818
90%     3329.769       9230.818       Variance        1687633
95%     4062.217       9230.818       Skewness       1.836874
99%     6748.689       9230.818       Kurtosis       8.622895

wave = 2
          household income: month before interview
-------------------------------------------------------------
      Percentiles      Smallest
 1%     207.9433              0
 5%     338.7431              0
10%       460.68              0       Obs                2639
25%       861.67              5       Sum of Wgt.        2639

50%         1508                      Mean           1795.179
                        Largest       Std. Dev.      1229.827
75%     2449.813       8405.636
90%     3414.511       8405.636       Variance        1512476
95%     4103.649       10491.08       Skewness       1.352148
99%     5824.449       10491.08       Kurtosis       6.370836

Category boundaries for each method


Matrix     Year   Boundary 1 (n)    Boundary 2 (n)      Boundary 3 (n)      Boundary 4 (n)
--------------------------------------------------------------------------------------------
Size       1991   0 - 800 (580)     800 - 1500 (650)    1500 - 2200 (504)   2200 - 9231 (715)
           1992   0 - 800 (580)     800 - 1500 (645)    1500 - 2200 (473)   2200 - 10491 (751)
Quartile   1991   0 - 827 (609)     827 - 1511 (615)    1511 - 2365 (611)   2365 - 9231 (614)
           1992   0 - 862 (610)     862 - 1508 (612)    1508 - 2450 (612)   2450 - 10491 (615)
Mean       1991   0 - 887 (654)     887 - 1773 (814)    1773 - 2660 (506)   2660 - 9231 (475)
           1992   0 - 898 (652)     898 - 1795 (766)    1795 - 2693 (501)   2693 - 10491 (530)
Median     1991   0 - 750 (539)     750 - 1500 (685)    1500 - 2250 (540)   2250 - 9231 (685)
           1992   0 - 746 (536)     746 - 1491 (686)    1491 - 2237 (505)   2262 - 10491 (722)
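The Mean rows of the boundary table appear to use cut-offs at 50%, 100% and 150% of each wave's mean income. That reading is inferred from the numbers and is not stated in the slides. A minimal Python sketch under that assumption (the function name `mean_based_boundaries` and the `shares` parameter are hypothetical):

```python
def mean_based_boundaries(mean_income, shares=(0.5, 1.0, 1.5)):
    """Class boundaries as fixed shares of the mean, rounded to whole units.

    The 50/100/150% shares are an assumption inferred from the table.
    """
    return [round(share * mean_income) for share in shares]


# Means of household income reported for wave 1 (1991) and wave 2 (1992).
b1991 = mean_based_boundaries(1773.253)
b1992 = mean_based_boundaries(1795.179)
print(b1991)
print(b1992)
```

Because the boundaries move with the mean, the resulting matrix mixes absolute and relative mobility, unlike the fixed 800/1500/2200 cut-offs of the size matrix.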

Warning!

Measurement error

Causes an over-estimation of mobility

If mother's and baby's weights are reported to the nearest half pound, this can affect which band the observation falls in

A respondent may describe their marital status as separated in year 1 and single in year 2

Overview

Types of questions, types of variables: time-invariant, time-varying and trend
Between- and within-individual variation
Concept of individual heterogeneity
From OLS to models that allow causal interpretations: fixed effects and random effects models
The basics of these models' implementation in Stata

Types of variable

Those which vary between individuals but hardly ever over time

Sex
Ethnicity
Parents' social class when you were 14
The type of primary school you attended (once you've become an adult)

Those which vary over time, but not between individuals

The retail price index
National unemployment rates
Age, in a cohort study

Those which vary both over time and between individuals

Income
Health
Psychological wellbeing
Number of children you have
Marital status

Trend variables

Vary between individuals and over time, but in highly predictable ways:
Age
Year

Between- and within-individual variation

If you have a sample with repeated observations on the same individuals, there are two sources of variance within the sample:

The fact that individuals are systematically different from one another (between-individual variation)
The fact that individuals' behaviour varies between observations over time (within-individual variation)

With k individuals (i = 1, ..., k), each observed m times (j = 1, ..., m), the data are

$$\begin{pmatrix} x_{11} & x_{12} & \dots & x_{1m} \\ x_{21} & x_{22} & \dots & x_{2m} \\ \vdots & \vdots & & \vdots \\ x_{k1} & x_{k2} & \dots & x_{km} \end{pmatrix}$$

Total variation is the sum, over all individuals and years, of the square of the difference between each observation of x and the mean:

$$T = \sum_{i=1}^{k} \sum_{j=1}^{m} (x_{ij} - \bar{x})^2$$

Within variation is the sum of the squares of the deviations of each individual's observations from his or her own mean:

$$W = \sum_{i=1}^{k} \sum_{j=1}^{m} (x_{ij} - \bar{x}_i)^2$$

Between variation is the sum of squares of differences between individual means and the whole-sample mean:

$$B = \sum_{i=1}^{k} \sum_{j=1}^{m} (\bar{x}_i - \bar{x})^2$$

Remember: from the variation you get to the variance, and from the variance to the standard deviation, where N is the total number of observations:

$$SD = \sqrt{T/(N - 1)}$$
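To see the three sums at work, here is a minimal Python sketch on a small hypothetical balanced panel (the numbers are invented for illustration and are not from the slides' data). It computes total, within and between variation and the overall standard deviation; in a balanced panel the total equals within plus between.

```python
# Hypothetical balanced panel: 3 individuals (rows), 4 observations each (columns).
panel = [
    [10.0, 12.0, 11.0, 13.0],
    [20.0, 19.0, 22.0, 21.0],
    [15.0, 15.0, 14.0, 16.0],
]


def variation(data):
    """Return (total, within, between, number of observations)."""
    n_obs = sum(len(row) for row in data)
    grand_mean = sum(sum(row) for row in data) / n_obs
    person_means = [sum(row) / len(row) for row in data]
    # Squared deviations of every observation from the grand mean.
    total = sum((x - grand_mean) ** 2 for row in data for x in row)
    # Squared deviations of every observation from its own person mean.
    within = sum(
        (x - mean) ** 2 for row, mean in zip(data, person_means) for x in row
    )
    # Squared deviations of person means from the grand mean, once per observation.
    between = sum(
        len(row) * (mean - grand_mean) ** 2
        for row, mean in zip(data, person_means)
    )
    return total, within, between, n_obs


total, within, between, n_obs = variation(panel)
overall_sd = (total / (n_obs - 1)) ** 0.5
print(total, within, between, overall_sd)
```

Note that this follows the sums of squares as written here; it is not a claim about how xtsum scales its between and within standard deviations.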

xtsum in STATA

Similar to ordinary sum command

xtset pid wave
       panel variable:  pid (unbalanced)
        time variable:  wave, 1 to 15, but with gaps
                delta:  1 unit

xtsum female partner age ue_sick LIKERT wave if nwaves == 15

(nwaves == 15: have chosen a balanced sample)

Variable         |      Mean   Std. Dev.        Min        Max |    Observations
-----------------+---------------------------------------------+-----------------
female   overall |  .5397574   .4984321          0          1  |     N =    19410
         between |             .4989059          0          1  |     n =     1294
         within  |                    0   .5397574   .5397574  |     T =       15
                 |                                             |
partner  overall |  .6892954   .4627963          0          1  |     N =    16324
         between |             .4217842          0          1  |     n =     1237
         within  |              .243531   -.244038   1.622629  | T-bar =  13.1964
                 |                                             |
age      overall |  40.03349   19.74332          0         98  |     N =    16292
         between |             19.27238        6.4   90.93333  |     n =     1234
         within  |              4.31763   31.30015   54.30015  | T-bar =  13.2026
                 |                                             |
ue_sick  overall |  .0672924   .2505353          0          1  |     N =    16302
         between |             .1738938          0          1  |     n =     1237
         within  |             .1852756   -.866041   1.000626  | T-bar =  13.1787
                 |                                             |
LIKERT   overall |  11.26167   5.344825          0         36  |     N =    15661
         between |             3.609665          0   29.69231  |     n =     1225
         within  |             4.030974  -6.738331   35.12834  | T-bar =  12.7845
                 |                                             |
wave     overall |         8   4.320605          1         15  |     N =    19410
         between |                    0          8          8  |     n =     1294
         within  |             4.320605          1         15  |     T =       15

female: all variation is between
partner: most variation is between, because it's fairly rare to switch between having and not having a partner
wave: all variation is within, because this is a balanced sample

More on xtsum

Reading the Observations, Min and Max columns in the output of
xtsum female partner age ue_sick LIKERT wave if nwaves == 15

N: observations with non-missing variable
n: number of individuals
T-bar: average number of time-points
between row: Min & max refer to xi-bar (the individual means)
within row: Min & max refer to individual deviation from own averages, with global averages added back in.

The xttab command


For simplicity, omitted jbstats of missing, maternity leave, gov training and other.

. xttab jbstat if nwaves == 15 & jbstat >= 1 & jbstat != 5 & jbstat <= 8

                  Overall             Between            Within
   jbstat |    Freq.  Percent      Freq.  Percent        Percent
----------+-----------------------------------------------------
 self-emp |     1388     8.66        228    18.45          42.72
 employed |     8982    56.03        974    78.80          68.27
 unemploy |      539     3.36        274    22.17          17.51
  retired |     2687    16.76        314    25.40          58.49
 family c |     1159     7.23        292    23.62          28.97
 ft studt |      718     4.48        271    21.93          42.93
 lt sick, |      558     3.48        105     8.50          39.08
----------+-----------------------------------------------------
    Total |    16031   100.00       2458   198.87          50.28
                               (n = 1236)

Overall: pooled sample, broken down by person/years
Between: number of people who spent any time in this state
Within: of those who spent any time in this state, the proportion of their time (on average) they spent in it.
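The three blocks of xttab can be mimicked on a tiny example. This is a minimal Python sketch on hypothetical person-wave states (not the data used in the slides), assuming a balanced panel; the function name `xttab_like` is made up, and the sketch only illustrates the definitions above rather than reproducing Stata's implementation.

```python
from collections import Counter

# Hypothetical states for three people observed in three waves each.
states = {
    1: ["employed", "employed", "employed"],
    2: ["employed", "unemployed", "employed"],
    3: ["unemployed", "unemployed", "retired"],
}


def xttab_like(states_by_person):
    overall = Counter()  # person-years spent in each state
    between = Counter()  # number of people ever observed in each state
    shares = {}          # per state: each such person's share of time in it
    for person_states in states_by_person.values():
        counts = Counter(person_states)
        overall.update(counts)
        for state, count in counts.items():
            between[state] += 1
            shares.setdefault(state, []).append(count / len(person_states))
    # Average share of time in the state among people ever observed in it.
    within = {state: sum(v) / len(v) for state, v in shares.items()}
    return overall, between, within


overall, between, within = xttab_like(states)
print(overall["employed"], between["employed"], round(within["employed"], 3))
```

Here two of the three people are ever employed, and between them they spend on average five sixths of their waves in employment.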

Which statistical model for panel data?


Your research question will guide which models are most suitable, but the nature of your data is also important.
Is your research question cross-sectional or longitudinal, or both?

Cross-sectional: exploit variation between individuals
Longitudinal: exploit variation within individuals over time and permit causal interpretation of effects, and can consider between variation if needed

Example: what is the effect on income of having more children?

Cross-sectional: what is the difference in income between individuals who have a different number of children?
Longitudinal: what is the difference in income before and after the birth of a child?
Both: what is the difference in income between men and women, and before and after the birth of a child?
How does income change in the time leading up to the birth of a child? Survival analysis, later in this course!

Longitudinal analysis is concerned with modelling individual heterogeneity
A very simple concept: people are different!
In social science, when we talk about heterogeneity, we are really talking about unobservable (or unobserved) heterogeneity:

Observed heterogeneity: differences in education levels, or parental background, or anything else that we can measure and control for in regressions
Unobserved heterogeneity: anything which is fundamentally unmeasurable, or which is rather poorly measured, or which does not happen to be measured in the particular data set we are using.

With panel data we can do something about unobserved heterogeneity, as we can differentiate between person-level unobserved x that are identical over time and those that vary over time!

OLS with panel data

Example data (y = income, x1 = number of years since leaving school):

pid   wave      y    x1
  1      1   2340     0
  1      2   2405     5
  1      3   2730    10
  1      4   3250    15
  1      5   3705    20
  1      6   4030    25
  2      1   1885     5
  2      2   2145    10
  2      3   2275    15
  2      4   2470    20
  2      5   2762    25
  2      6   3120    30
  3      1    780    10
  3      2   1170    15
  3      3   1365    20
  3      4   2405    25
  3      5   2405    30
  3      6   2470    35

[Figure: two scatter plots of Income against Number of years since leaving school for pid=1, pid=2 and pid=3. Panel "OLS: cross-section" shows the fitted line OLS(t=1): y = 2448 - 156*x1. Panel "OLS: pooled" shows the fitted line OLS pooled: y = 1925 + 29*x1.]

Cross-sectional estimates may be quite misleading (omitted variable bias)!

By adding more data points from the same units at different points in time we can get better estimates. But assumptions of OLS may be violated!
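The two fitted lines can be reproduced from the 18 example observations. The following is a minimal Python sketch (plain bivariate OLS written out by hand, not Stata's regress; the helper name `ols` is hypothetical) that fits the wave 1 cross-section and the pooled sample.

```python
# Example panel from the slides: (pid, wave, y, x1).
rows = [
    (1, 1, 2340, 0), (1, 2, 2405, 5), (1, 3, 2730, 10),
    (1, 4, 3250, 15), (1, 5, 3705, 20), (1, 6, 4030, 25),
    (2, 1, 1885, 5), (2, 2, 2145, 10), (2, 3, 2275, 15),
    (2, 4, 2470, 20), (2, 5, 2762, 25), (2, 6, 3120, 30),
    (3, 1, 780, 10), (3, 2, 1170, 15), (3, 3, 1365, 20),
    (3, 4, 2405, 25), (3, 5, 2405, 30), (3, 6, 2470, 35),
]


def ols(xs, ys):
    """Bivariate OLS with a constant; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Cross-section: wave 1 only, one observation per person.
wave1 = [r for r in rows if r[1] == 1]
cross_section = ols([r[3] for r in wave1], [r[2] for r in wave1])

# Pooled: all person-waves treated as independent observations.
pooled = ols([r[3] for r in rows], [r[2] for r in rows])

print("wave 1: intercept %.0f, slope %.0f" % cross_section)
print("pooled: intercept %.0f, slope %.0f" % pooled)
```

The cross-section slope is negative because people with more years since leaving school happen to have lower income levels in this example, which is exactly the between-person difference that pooled OLS cannot separate from the effect of x1.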

An illustration of how unobserved heterogeneity matters

Considering this is from panel data, two problems become apparent:

Error terms for persons 1, 2 and 3 differ systematically
The association between x and y appears to be biased

[Figure: two panels, "OLS: pooled" and "OLS: unobs het", plotting Income against Number of years since leaving school for pid=1, pid=2 and pid=3; the error terms w1 and w3 and the person effect u1 are marked.]

Panel data allows you to:

Break down the error term (wi) into two components: the unobservable characteristics of the person (ui), and genuine error (ei)
Then model ui and ei
Expanding the OLS model to consider unobserved heterogeneity

Analytically, think of splitting the error term into its two components $u_i$ and $\varepsilon_i$:

$$y_i = \alpha + x_{i1}\beta_1 + x_{i2}\beta_2 + x_{i3}\beta_3 + \dots + x_{iK}\beta_K + u_i + \varepsilon_i$$

and consider that you have repeated observations over time:

$$y_{it} = \alpha + x_{it}\beta + u_i + \varepsilon_{it}$$

$u_i$: individual-specific, fixed over time
$\varepsilon_{it}$: varies over time, usual assumptions apply (mean zero, homoscedastic, uncorrelated with x or u or itself)

... and then reduce the complexity of the information available in some way, or add further assumptions. Your options:

Focus on between variation: lose info on within variation
Focus on within variation: lose info on between variation
Model both types of variation, making further assumptions

Within and between estimators


$$y_{it} = \alpha + x_{it}\beta + u_i + \varepsilon_{it}$$

$u_i$: individual-specific, fixed over time
$\varepsilon_{it}$: varies over time, usual assumptions apply (mean zero, homoscedastic, uncorrelated with x or u or itself)

Not interested in within variation? Use the means of all observations for each person i:

$$\bar{y}_i = \alpha + \bar{x}_i\beta + u_i + \bar{\varepsilon}_i$$

This is the between estimator

Not interested in between variation? Why not remove it in that case!

$$(y_{it} - \bar{y}_i) = (x_{it} - \bar{x}_i)\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)$$

And this is the within estimator: fixed effects

Interested in both? Well, let's treat $\bar{x}_i$ as an imperfect measure of the person fixed effect and use between variation where within variation is poorly captured:

$$(y_{it} - \theta\bar{y}_i) = (1 - \theta)\alpha + (x_{it} - \theta\bar{x}_i)\beta + \{(1 - \theta)u_i + (\varepsilon_{it} - \theta\bar{\varepsilon}_i)\}$$

$\theta$ measures the weight given to between-group variation, and is derived from the variances of $u_i$ and $\varepsilon_{it}$

Between estimator

$$y_{it} = \alpha + x_{it}\beta + u_i + \varepsilon_{it}$$
$$\bar{y}_i = \alpha + \bar{x}_i\beta + u_i + \bar{\varepsilon}_i$$

Interpret as how much y changes between different people

Not much used

Except to calculate the $\theta$ parameter for random effects, but Stata does this, not you!
It's inefficient compared to random effects
It doesn't use as much information as is available in the data (only uses means)

Assumption required: that $u_i$ is uncorrelated with $x_i$

Easy to see why: if they were correlated, how could one decide how much of the variation in y to attribute to the x's (via the betas) as opposed to the correlation?

Can't estimate effects of variables where the mean is invariant over individuals

Age in a cohort study
Macro-level variables

Focusing on within variation: the fixed effects family

Fixed effects estimator

Basic idea: for each individual, calculate the mean of x and the mean of y. Then run OLS on a transformed dataset where each $y_{it}$ is replaced by $(y_{it} - \bar{y}_i)$ and each $x_{it}$ is replaced by $(x_{it} - \bar{x}_i)$
xtreg y x, fe

Identical to:
Least Squares Dummy Variables regression: areg y x, absorb(pid)
Include a dummy indicator for each individual; all time-invariant individual-level differences, including the unobserved heterogeneity $u_i$, will then be captured in the person-specific intercept.

Members of the same family, which you may come across in the literature:
First differences: regress D.(y x)
For each individual and each time period, calculate the difference between the values of y and x in this period and those in the last period. Then run OLS on a transformed dataset where each $y_{it}$ is replaced by $(y_{it} - y_{it-1})$ and each $x_{it}$ is replaced by $(x_{it} - x_{it-1})$

Hybrid models: regress y x mean_x z
Run standard OLS but add the individual mean $\bar{x}_i$ of each time-varying variable as an additional regressor

Fixed effects estimator


$$y_{it} = \alpha + x_{it}\beta + u_i + \varepsilon_{it}$$
$$(y_{it} - \bar{y}_i) = (x_{it} - \bar{x}_i)\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)$$

pid   wave      y    x1    ybar_i   xbar_i   (y - ybar_i)   (x1 - xbar_i)
  1      1   2340     0    3076.7     12.5         -736.7           -12.5
  1      2   2405     5    3076.7     12.5         -671.7            -7.5
  1      3   2730    10    3076.7     12.5         -346.7            -2.5
  1      4   3250    15    3076.7     12.5          173.3             2.5
  1      5   3705    20    3076.7     12.5          628.3             7.5
  1      6   4030    25    3076.7     12.5          953.3            12.5
  2      1   1885     5    2442.8     17.5         -557.8           -12.5
  2      2   2145    10    2442.8     17.5         -297.8            -7.5
  2      3   2275    15    2442.8     17.5         -167.8            -2.5
  2      4   2470    20    2442.8     17.5           27.2             2.5
  2      5   2762    25    2442.8     17.5          319.2             7.5
  2      6   3120    30    2442.8     17.5          677.2            12.5
  3      1    780    10    1765.8     22.5         -985.8           -12.5
  3      2   1170    15    1765.8     22.5         -595.8            -7.5
  3      3   1365    20    1765.8     22.5         -400.8            -2.5
  3      4   2405    25    1765.8     22.5          639.2             2.5
  3      5   2405    30    1765.8     22.5          639.2             7.5
  3      6   2470    35    1765.8     22.5          704.2            12.5

[Figure: "Fixed Effects"; demeaned Income against demeaned Number of years since leaving school for pid=1, pid=2 and pid=3, with the fitted line Fixed effects: y = 65*x1.]
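The fixed effects slope of about 65 can be recovered by demeaning the example data within each person and running OLS through the origin on the demeaned values. This is a minimal Python sketch of that within transformation (the function name `within_slope` is hypothetical, and this is not how xtreg is implemented internally).

```python
# Example panel from the slides: (pid, wave, y, x1).
rows = [
    (1, 1, 2340, 0), (1, 2, 2405, 5), (1, 3, 2730, 10),
    (1, 4, 3250, 15), (1, 5, 3705, 20), (1, 6, 4030, 25),
    (2, 1, 1885, 5), (2, 2, 2145, 10), (2, 3, 2275, 15),
    (2, 4, 2470, 20), (2, 5, 2762, 25), (2, 6, 3120, 30),
    (3, 1, 780, 10), (3, 2, 1170, 15), (3, 3, 1365, 20),
    (3, 4, 2405, 25), (3, 5, 2405, 30), (3, 6, 2470, 35),
]


def within_slope(data):
    """Slope from regressing (y - ybar_i) on (x - xbar_i), pooled over persons."""
    by_person = {}
    for pid, _wave, y, x in data:
        by_person.setdefault(pid, []).append((x, y))
    sxy = 0.0
    sxx = 0.0
    for obs in by_person.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        # Accumulate cross-products of the person-demeaned values.
        sxy += sum((x - mean_x) * (y - mean_y) for x, y in obs)
        sxx += sum((x - mean_x) ** 2 for x, _ in obs)
    return sxy / sxx


fe_slope = within_slope(rows)
print(round(fe_slope, 1))  # 65.3
```

No intercept is needed because the demeaned variables have mean zero within every person.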

Ignores between-group variation, so it's an inefficient estimator
However, few assumptions are required for FE to be consistent: $u_i$ is allowed to correlate with $x_i$
Disadvantage: can't estimate the effects of any time-invariant variables
Need to consider change in interpretation of effects

Want to look at the effect of non-time-varying x? Use $x_{it}$ and $\bar{x}_i$ in OLS

$$y_{it} = \alpha + x_{it}\beta + u_i + \varepsilon_{it}$$
$$y_{it} = \alpha + \beta_1 x_{it} + \beta_2 \bar{x}_i + \beta_3 z_i + u_i^{\text{residual}} + \varepsilon_{it}$$

Hint: create $\bar{x}_i$ yourself

pid   wave      y    x    z   x_bar
  1      1   2340    1    1    1.5
  1      2   2405    2    1    1.5
  1      3   2730    2    1    1.5
  1      4   3250    2    1    1.5
  1      5   3705    1    1    1.5
  1      6   4030    1    1    1.5
  2      1   1885    0    2    0.66
  2      2   2145    1    2    0.66
  2      3   2275    1    2    0.66
  2      4   2470    1    2    0.66
  2      5   2762    1    2    0.66
  2      6   3120    0    2    0.66
  3      1    780    1    2    0.33
  3      2   1170    1    2    0.33
  3      3   1365    0    2    0.33
  3      4   2405    0    2    0.33
  3      5   2405    0    2    0.33
  3      6   2470    0    2    0.33

$z_i$: non-time-varying individual characteristics, for which you do not need to include group means

The effect of any unobserved characteristic otherwise transported in the effect of $x_{it}$ is shifted to the effect of $\bar{x}_i$: $\beta_1$ approximates the coefficient in the FE model, and $\beta_3$ gives you, approximately, the OLS estimate for the non-time-varying variables $z_i$
Typically there is no interest in the effect of $\bar{x}_i$, so no need to worry about its interpretation. Note that $\beta_1 + \beta_2$ is approximately equal to the effect in the pooled OLS
Disadvantage: can only control for unobserved heterogeneity associated with the observed time-varying variables $x_{it}$; the rest stays in $u_i^{\text{residual}}$

Random effects estimator


$$y_{it} = \alpha + x_{it}\beta + u_i + \varepsilon_{it}$$

Random effects model, here RE generalised least squares (GLS):

$$(y_{it} - \theta\bar{y}_i) = (1 - \theta)\alpha + (x_{it} - \theta\bar{x}_i)\beta + \{(1 - \theta)u_i + (\varepsilon_{it} - \theta\bar{\varepsilon}_i)\}$$

Uses both within- and between-group variation, so makes best use of the data and is efficient. Starts off with the idea that using $\bar{x}_i$ is not the best we can do to capture within variation.

The more imprecise the estimate of the person-level variation (as measured by the person's $\bar{x}_i$), the more we should draw on the information from other units ($\bar{x}$)

Assumption required: that $u_i$ is uncorrelated with $x_i$

Rather heroic assumption: think of examples
Will see a test for this later

Note that the within and between effects are constrained to be identical (much more like OLS in this respect, so no causal interpretation!).

E.g., when you include a location indicator in your model, you are saying that the effect on y of moving to a new town is the same as the effect on y of living in different towns. When you include a female dummy, you are saying that the effect of being female on y is the same as the effect on y of changing gender.

Estimating fixed effects in STATA


. xtreg LIKERT female ue_sick partner age age2 badh, fe

Fixed-effects (within) regression               Number of obs      =     24204
Group variable: pid                             Number of groups   =      3317

R-sq:  within  = 0.0501                         Obs per group: min =         1
       between = 0.1906                                        avg =       7.3
       overall = 0.1285                                        max =        14

                                                F(5,20882)         =    220.44
corr(u_i, Xb)  = 0.1561                         Prob > F           =    0.0000

------------------------------------------------------------------------------
      LIKERT |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  (dropped)
     ue_sick |   1.951485   .1394164    14.00   0.000     1.678218    2.224752
     partner |   -.298668    .118635    -2.52   0.012    -.5312018   -.0661342
         age |   .1141748   .0214403     5.33   0.000     .0721501    .1561994
        age2 |  -.0011833   .0002209    -5.36   0.000    -.0016163   -.0007503
   badhealth |   1.230831   .0428556    28.72   0.000      1.14683    1.314831
       _cons |   6.252975   .4932977    12.68   0.000     5.286073    7.219877
-------------+----------------------------------------------------------------
     sigma_u |  3.9934565
     sigma_e |  4.0525618
         rho |  .49265449   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0:     F(3316, 20882) =     4.56         Prob > F = 0.0000

R-sq: an R-square-like statistic
age and age2: peaks at age 48
sigma_u and sigma_e: u and e are the two parts of the error term
Talk about xtmixed
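The note that the age profile peaks at about 48 can be checked from the fixed-effects coefficients, assuming (as the variable name suggests, though the slides do not say so) that age2 is age squared. A quadratic $\beta_{age}\,age + \beta_{age2}\,age^2$ with $\beta_{age2} < 0$ is maximised where its derivative is zero:

```latex
\[
\text{age}^{*} = -\frac{\hat\beta_{age}}{2\,\hat\beta_{age2}}
             = \frac{0.1141748}{2 \times 0.0011833}
             \approx 48.2
\]
```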

Between regression:

Not much used, but useful to compare coefficients with fixed effects

. xtreg LIKERT female ue_sick partner age age2 badh, be

Between regression (regression on group means)  Number of obs      =     24204
Group variable: pid                             Number of groups   =      3317

R-sq:  within  = 0.0480                         Obs per group: min =         1
       between = 0.2322                                        avg =       7.3
       overall = 0.1482                                        max =        14

                                                F(6,3310)          =    166.80
sd(u_i + avg(e_i.))=  3.833357                  Prob > F           =    0.0000

------------------------------------------------------------------------------
      LIKERT |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   1.476659   .1350226    10.94   0.000     1.211923    1.741395
     ue_sick |   2.038192    .312191     6.53   0.000     1.426085    2.650299
     partner |  -.0101941   .1777423    -0.06   0.954      -.35869    .3383019
         age |   .0827335   .0219026     3.78   0.000     .0397895    .1256775
        age2 |  -.0009489   .0002263    -4.19   0.000    -.0013927   -.0005052
   badhealth |   2.275832   .0926521    24.56   0.000     2.094171    2.457493
       _cons |   3.953941   .4430909     8.92   0.000     3.085181    4.822701
------------------------------------------------------------------------------

Coefficient on partner was negative and significant in the FE model.
In FE, the partner coefficient really measures the events of gaining or losing a partner

Random effects regression


. xtreg LIKERT female ue_sick partner age age2 badh, re theta

Random-effects GLS regression                   Number of obs      =     24204
Group variable: pid                             Number of groups   =      3317

R-sq:  within  = 0.0500                         Obs per group: min =         1
       between = 0.2239                                        avg =       7.3
       overall = 0.1471                                        max =        14

Random effects u_i ~ Gaussian                   Wald chi2(6)       =   2013.32
corr(u_i, X)       = 0 (assumed)                Prob > chi2        =    0.0000

------------------- theta --------------------
  min      5%       median        95%      max
0.1986   0.1986     0.5482     0.6629   0.6629

------------------------------------------------------------------------------
      LIKERT |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   1.493431   .1259931    11.85   0.000     1.246489    1.740373
     ue_sick |   2.045302   .1271039    16.09   0.000     1.796183    2.294422
     partner |  -.1947691   .0973734    -2.00   0.045    -.3856175   -.0039207
         age |   .1058038    .014544     7.27   0.000     .0772981    .1343094
        age2 |  -.0011062   .0001498    -7.39   0.000    -.0013998   -.0008126
   badhealth |   1.433115   .0385506    37.17   0.000     1.357558    1.508673
       _cons |   5.181864   .3137662    16.52   0.000     4.566894    5.796835
-------------+----------------------------------------------------------------
     sigma_u |  3.0248563
     sigma_e |  4.0525618
         rho |   .3577895   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Option theta gives a summary of weights

theta tells you how good an approximation xi_bar is of the person-level effect; or how much of the within variation we used to determine the effect size
zero = OLS, 1 = FE estimators
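The theta summary can be reproduced from sigma_u and sigma_e. The sketch below uses the usual GLS weight for a group with T_i observations, theta_i = 1 - sqrt(sigma_e^2 / (T_i * sigma_u^2 + sigma_e^2)); the slides do not spell this formula out, so treat it as an assumption that is checked here against the reported values. The function name `re_theta` is hypothetical.

```python
from math import sqrt


def re_theta(sigma_u, sigma_e, t_i):
    """Random-effects weight for a group observed t_i times (assumed formula)."""
    return 1 - sqrt(sigma_e ** 2 / (t_i * sigma_u ** 2 + sigma_e ** 2))


# sigma_u and sigma_e as reported in the random-effects output.
sigma_u, sigma_e = 3.0248563, 4.0525618

# Groups with 1, 7 and 14 observations (the min, a mid value, and the max).
thetas = {t_i: re_theta(sigma_u, sigma_e, t_i) for t_i in (1, 7, 14)}
for t_i, theta in thetas.items():
    print(t_i, round(theta, 4))
```

With one observation per person theta is close to the reported minimum (0.1986), with 14 it is close to the maximum (0.6629), and T_i = 7 gives the reported median (0.5482). The more waves a person contributes, the closer the estimator moves towards fixed effects.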

And what about OLS?

OLS simply treats within- and between-group variation as the same

Pools data across waves

. reg LIKERT female ue_sick partner age age2 badh

      Source |       SS       df       MS              Number of obs =   24204
-------------+------------------------------           F(  6, 24197) =  706.54
       Model |  103583.505     6  17263.9175           Prob > F      =  0.0000
    Residual |  591239.694 24197  24.4344214           R-squared     =  0.1491
-------------+------------------------------           Adj R-squared =  0.1489
       Total |  694823.199 24203  28.7081436           Root MSE      =  4.9431

------------------------------------------------------------------------------
      LIKERT |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |   1.409466   .0640651    22.00   0.000     1.283895    1.535038
     ue_sick |   2.031815   .1240757    16.38   0.000     1.788619    2.275011
     partner |  -.0751296   .0769271    -0.98   0.329    -.2259116    .0756524
         age |   .0983746   .0103316     9.52   0.000      .078124    .1186252
        age2 |  -.0010613   .0001049   -10.12   0.000     -.001267   -.0008557
   badhealth |   1.841796   .0357165    51.57   0.000     1.771789    1.911802
       _cons |   4.450393   .2212733    20.11   0.000     4.016684    4.884102
------------------------------------------------------------------------------

Test whether pooling data is valid


$$y_{it} = \alpha + x_{it}\beta + u_i + \varepsilon_{it}$$

If the $u_i$ do not vary between individuals, they can be treated as part of $\alpha$ and OLS is fine.
Breusch-Pagan Lagrange multiplier test
H0: variance of $u_i$ = 0
H1: variance of $u_i$ not equal to zero
If H0 is not rejected, you can pool the data and use OLS
Post-estimation test after random effects

. quietly xtreg LIKERT female ue_sick partner age age2 badh, re
. xttest0

Breusch and Pagan Lagrangian multiplier test for random effects

        LIKERT[pid,t] = Xb + u[pid] + e[pid,t]

        Estimated results:
                         |       Var     sd = sqrt(Var)
                ---------+-----------------------------
                  LIKERT |   28.70814       5.357998
                       e |   16.42326       4.052562
                       u |   9.149756       3.024856

        Test:   Var(u) = 0
                              chi2(1) = 10816.48
                          Prob > chi2 =     0.0000

Comparing models

Compare coefficients between models

Reasonably similar; differences in partner and badhealth coeffs
R-squareds are similar
Within and between estimators maximise within and between R-2 respectively.

                  FE          RE          BE          OLS
female                     1.49 ***    1.48 ***    1.41 ***
ue_sick        1.95 ***    2.05 ***    2.04 ***    2.03 ***
partner       -0.30 **    -0.19 **    -0.01       -0.08
age            0.11 ***    0.11 ***    0.08 ***    0.10 ***
age2           0.00 ***    0.00 ***    0.00 ***    0.00 ***
badhealth      1.23 ***    1.43 ***    2.28 ***    1.84 ***
_cons          6.25 ***    5.18 ***    3.96 ***    4.45 ***
R-2 within     0.050       0.050       0.048
R-2 between    0.191       0.224       0.232
R-2 overall    0.129       0.147       0.148       0.149
