Stata Library: Panel Data Analysis using GEE

http://www.ats.ucla.edu/stat/stata/library/gee.htm

Stata Library
Panel Data Analysis using GEE
Note: This page uses Stata 11 syntax in the examples.

Introduction
Panel data analysis, also known as cross-sectional time-series analysis, looks at a group of people, the 'panel,' on more
than one occasion. Panel studies are essentially equivalent to longitudinal studies, although there may be many response
variables observed at each time point.
These data are from a 1996 study (Gregoire, Kumar, Everitt, Henderson & Studd) on the efficacy of estrogen patches in treating postnatal depression. Women
were randomly assigned to either a placebo control group (group=0, n=27) or an estrogen patch group (group=1, n=34). Prior to the first treatment all patients took
the Edinburgh Postnatal Depression Scale (EPDS). EPDS data were collected monthly for six months once the treatment began. Higher scores on the EPDS
are indicative of higher levels of depression.
Before reading in the data we need to increase the size of the largest matrix that Stata can use, because one of the analyses requires a
large number of coded variables:

set matsize 160


use http://www.ats.ucla.edu/stat/stata/library/depress, clear

Let the analyses begin


Note that the data are in the wide format. We will collect some information and perform two analyses while the data are in
this format.
sort group
by group: summarize pre dep1 dep2 dep3 dep4 dep5 dep6

-> group = 0

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
     pre |      27    20.77778   3.954874         15         28
    dep1 |      27    16.48148   5.279644          7         26
    dep2 |      22    15.88818   6.124177          4         27
    dep3 |      17    14.12882   4.974648       4.19         22
    dep4 |      17    12.27471   5.848791          2         23
    dep5 |      17    11.40294   4.438702       3.03         18
    dep6 |      17    10.89588    4.68157       3.45         20

-> group = 1

Variable |     Obs        Mean   Std. Dev.       Min        Max
---------+-----------------------------------------------------
     pre |      34    21.24882   3.574432         15         28
    dep1 |      34    13.36794   5.556373          1         27
    dep2 |      31    11.73677   6.575079          1         27
    dep3 |      29    9.134138   5.475564          1         24
    dep4 |      28    8.827857   4.666653          0         22
    dep5 |      28    7.309286   5.740988          0         24
    dep6 |      28    6.590714   4.730158          1         23
corr pre dep1 dep2 dep3 dep4 dep5 dep6
(obs=45)

         |      pre     dep1     dep2     dep3     dep4     dep5     dep6
---------+---------------------------------------------------------------
     pre |   1.0000
    dep1 |   0.1922   1.0000
    dep2 |   0.3904   0.4982   1.0000
    dep3 |   0.3958   0.5258   0.8672   1.0000
    dep4 |   0.1658   0.3933   0.7357   0.7831   1.0000
    dep5 |   0.2848   0.3674   0.7500   0.8520   0.8449   1.0000
    dep6 |   0.2688   0.2795   0.6900   0.7967   0.7894   0.9014   1.0000
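
The (obs=45) line shows that corr uses only the 45 women with complete data on all seven variables. As a hedged sketch, not part of the original analysis, pairwise correlations that use every available pair of observations could be requested as follows; the obs option prints the number of pairs behind each entry.

```stata
* Pairwise (rather than listwise) correlations for the wide-format data.
* Each coefficient may be based on a different number of women.
pwcorr pre dep1 dep2 dep3 dep4 dep5 dep6, obs
```

Comparing the two sets of correlations gives a rough feel for how much the dropout seen in the summary tables affects the descriptive picture.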

7/3/2013 3:15 PM


graph matrix dep1 dep2 dep3 dep4 dep5 dep6, half

Let's check to see if the groups differ on the pretest depression score:

ttest pre, by(group)

Two-sample t test with equal variances

------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       0 |      27    20.77778    .7611158    3.954874    19.21328    22.34227
       1 |      34    21.24882      .61301    3.574432    20.00165      22.496
---------+--------------------------------------------------------------------
combined |      61    21.04033     .476678    3.722975    20.08683    21.99383
---------+--------------------------------------------------------------------
    diff |           -.4710457    .9658499               -2.403707    1.461615
------------------------------------------------------------------------------
Degrees of freedom: 59

                  Ho: mean(0) - mean(1) = diff = 0

     Ha: diff < 0               Ha: diff ~= 0              Ha: diff > 0
       t =  -0.4877               t =  -0.4877              t =  -0.4877
   P < t =   0.3138           P > |t| =   0.6276          P > t =   0.6862

The groups do not differ significantly on the pretest (t = -0.49, two-sided p = 0.63), so let's continue on to the panel data analysis.

GEE with Continuous Response Variable


In order to use these data for our panel data analysis, the data must be reorganized into the long form using the reshape command.

reshape long dep, i(subj) j(visit)
(note: j = 1 2 3 4 5 6)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                       61   ->     366
Number of variables                   9   ->       5
j variable (6 values)                     ->   visit
xij variables:
                     dep1 dep2 ... dep6   ->   dep
-----------------------------------------------------------------------------
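
As an optional check on the reshaped data, and only as a sketch that the original page does not run, the panel structure can be declared and the pattern of observed visits summarized. The if qualifier restricts the summary to rows where dep was actually recorded, since reshape keeps a row for every subject-visit pair whether or not a score exists.

```stata
* Declare subj as the panel identifier and visit as the time variable
xtset subj visit

* Summarize participation patterns among non-missing depression scores
xtdescribe if !missing(dep)
```

The pattern table makes the dropout visible: some women contribute all six visits while others contribute only the first one or two.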


Before we begin the panel data analyses let's look at some other analyses for comparison. We will begin with a repeated measures analysis of variance. This is
the analysis that requires the larger matrix size.

anova dep group / subj|group visit group#visit /, repeated(visit)

                       Number of obs =     295     R-squared     =  0.7699
                       Root MSE      = 3.39594     Adj R-squared =  0.6980

      Source |  Partial SS    df       MS           F     Prob > F
 ------------+----------------------------------------------------
       Model |  8643.81572    70  123.483082      10.71     0.0000
             |
       group |  548.494938     1  548.494938       5.60     0.0212
  subj|group |  5775.54143    59  97.8905328
 ------------+----------------------------------------------------
       visit |  1050.05444     5  210.010889      18.21     0.0000
 group#visit |  19.3028953     5  3.86057906       0.33     0.8916
             |
    Residual |  2583.26536   224  11.5324346
 ------------+----------------------------------------------------
       Total |  11227.0811   294  38.1873506

Between-subjects error term:  subj|group
                     Levels:  61        (59 df)
     Lowest b.s.e. variable:  subj
     Covariance pooled over:  group     (for repeated variable)

Repeated variable: visit
                                      Huynh-Feldt epsilon        =  0.5930
                                      Greenhouse-Geisser epsilon =  0.5532
                                      Box's conservative epsilon =  0.2000

                                        ------------ Prob > F ------------
      Source |     df      F    Regular    H-F      G-G      Box
 ------------+----------------------------------------------------
       visit |      5    18.21   0.0000   0.0000   0.0000   0.0001
 group#visit |      5     0.33   0.8916   0.7979   0.7840   0.5658
    Residual |    224
 -----------------------------------------------------------------

matrix list e(Srep)

symmetric e(Srep)[6,6]
           c1         c2         c3         c4         c5         c6
r1  31.361171
r2   15.71989  38.927914
r3  13.555927  28.365674   27.90249
r4  9.4625252   22.74371  20.519069  26.403025
r5  8.6149335  23.887935  23.161248   22.47211  28.026157
r6  4.6830378  19.242424  18.721233   18.46616  22.103924  22.204237

This analysis indicates that both group and visit are significant while the group*visit interaction is not. Some researchers are critical of this type of analysis since
it is based on fixed-effects adjusted for the repeated factor. Also, this repeated measures analysis assumes compound symmetry in the covariance matrix
(which seems to be a stretch in this case). However, we can do worse. The next several analyses are not meant to answer the research question but to show
relationships among several different commands in Stata.
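
To judge the compound symmetry assumption more directly, the pooled covariance matrix e(Srep) can be rescaled to correlations. The lines below are a minimal sketch that assumes they are run immediately after the anova command, before another estimation command overwrites e(); S and R are illustrative matrix names.

```stata
* Copy the pooled within-subject covariance matrix left behind by anova
matrix S = e(Srep)

* corr() converts a covariance matrix into the matching correlation matrix
matrix R = corr(S)
matrix list R, format(%6.3f)
```

Under compound symmetry the off-diagonal correlations would all be roughly equal; correlations that shrink as visits get farther apart point toward a different structure.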

regress dep pre group visit

  Source |       SS       df       MS              Number of obs =     295
---------+------------------------------           F(  3,   291) =   48.05
   Model |  3719.12931     3  1239.70977           Prob > F      =  0.0000
Residual |  7507.95176   291  25.8005215           R-squared     =  0.3313
---------+------------------------------           Adj R-squared =  0.3244
   Total |  11227.0811   294  38.1873506           Root MSE      =  5.0794

------------------------------------------------------------------------------
     dep |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     pre |   .4769071   .0798565      5.972   0.000       .3197376    .6340767
   group |  -4.290664   .6072954     -7.065   0.000      -5.485912   -3.095416
   visit |  -1.307841    .169842     -7.700   0.000      -1.642116   -.9735667
   _cons |   8.233577   1.803945      4.564   0.000       4.683143    11.78401
------------------------------------------------------------------------------

glm dep pre group visit, fam(gaus) link(iden)

Iteration 1 : deviance = 7507.9518

Residual df  =       291                        No. of obs =       295
Pearson X2   =  7507.952                        Deviance   =  7507.952
Dispersion   =  25.80052                        Dispersion =  25.80052

Gaussian (normal) distribution, identity link

------------------------------------------------------------------------------
     dep |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     pre |   .4769071   .0798565      5.972   0.000       .3197376    .6340767
   group |  -4.290664   .6072954     -7.065   0.000      -5.485912   -3.095416
   visit |  -1.307841    .169842     -7.700   0.000      -1.642116   -.9735667
   _cons |   8.233577   1.803945      4.564   0.000       4.683143    11.78401
------------------------------------------------------------------------------
(Model is ordinary regression, use regress instead)

We are finally ready to try the panel data analysis using Stata's xtgee command. xtgee allows us to specify various working correlation structures through the
corr option. We will start with a working correlation structure of independence. We don't believe that this is the correct structure but it allows us
to compare results with the OLS regression and the glm results above. The estat wcorrelation command (which we will abbreviate as estat wcorr) allows us to view
the working correlation matrix.

xtgee dep pre group visit, fam(gaus) link(iden) i(subj) t(visit) corr(ind)

Iteration 1: tolerance = 3.270e-15

GEE population-averaged model                   Number of obs      =       295
Group variable:                       subj      Number of groups   =        61
Link:                             identity      Obs per group: min =         1
Family:                           Gaussian                     avg =       4.8
Correlation:                   independent                     max =         6
                                                Wald chi2(3)       =    146.13
Scale parameter:                  25.45068      Prob > chi2        =    0.0000

Pearson chi2(295):                 7507.95      Deviance           =   7507.95
Dispersion (Pearson):             25.45068      Dispersion         =  25.45068

------------------------------------------------------------------------------
     dep |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     pre |   .4769071   .0793133      6.013   0.000        .321456    .6323582
   group |  -4.290664   .6031641     -7.114   0.000      -5.472844   -3.108484
   visit |  -1.307841   .1686866     -7.753   0.000      -1.638461   -.9772215
   _cons |   8.233577   1.791673      4.595   0.000       4.721962    11.74519
------------------------------------------------------------------------------

estat wcorr

Estimated within-subj correlation matrix R:

         c1      c2      c3      c4      c5      c6
r1   1.0000
r2   0.0000  1.0000
r3   0.0000  0.0000  1.0000
r4   0.0000  0.0000  0.0000  1.0000
r5   0.0000  0.0000  0.0000  0.0000  1.0000
r6   0.0000  0.0000  0.0000  0.0000  0.0000  1.0000

The three previous analyses yielded identical coefficients (and nearly identical standard errors) but probably incorrect results. The common thread among them is that they all assume that the observations
within subjects are independent. This seems, on the face of it, to be highly unlikely. Scores on the depression scale are not likely to be independent from one
visit to the next.
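
One way to see how much the independence assumption matters for the standard errors, offered here as a side sketch rather than as part of the original sequence of models, is to keep the independence working correlation but request robust (sandwich) standard errors with the vce(robust) option.

```stata
* Same model and working correlation as above, but with standard errors
* that do not rely on the working correlation being correct
xtgee dep pre group visit, fam(gaus) link(iden) i(subj) t(visit) corr(ind) vce(robust)
```

The coefficients should be unchanged from the independence model; only the standard errors, test statistics and confidence intervals are recomputed.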
We can also try analyzing these data using compound symmetry for the correlation structure. Compound symmetry is obtained by specifying exchangeable for the corr
option in xtgee.

xtgee dep pre group visit, fam(gaus) link(iden) i(subj) t(visit) corr(exc)

GEE population-averaged model                   Number of obs      =       295
Group variable:                       subj      Number of groups   =        61
Link:                             identity      Obs per group: min =         1
Family:                           Gaussian                     avg =       4.8
Correlation:                  exchangeable                     max =         6
                                                Wald chi2(3)       =    135.08
Scale parameter:                  25.56569      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     dep |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     pre |   .4599018   .1441533      3.190   0.001       .1773666     .742437
   group |  -4.024676   1.081131     -3.723   0.000      -6.143654   -1.905698
   visit |  -1.226764   .1175009    -10.440   0.000      -1.457062   -.9964666
   _cons |   8.432806   3.120987      2.702   0.007       2.315783    14.54983
------------------------------------------------------------------------------

estat wcorr

Estimated within-subj correlation matrix R:

         c1      c2      c3      c4      c5      c6
r1   1.0000
r2   0.5554  1.0000
r3   0.5554  0.5554  1.0000
r4   0.5554  0.5554  0.5554  1.0000
r5   0.5554  0.5554  0.5554  0.5554  1.0000
r6   0.5554  0.5554  0.5554  0.5554  0.5554  1.0000

Note in particular the change in the standard errors between this analysis and the previous one. Next, what if we impose no preconceived notions about the
correlations among the responses over time? In this next example, we will request an unstructured correlation matrix. This is equivalent to the assumptions made
in a multivariate analysis.

xtgee dep pre group visit, fam(gaus) link(iden) i(subj) t(visit) corr(unstr)

GEE population-averaged model                   Number of obs      =       295
Group and time vars:            subj visit      Number of groups   =        61
Link:                             identity      Obs per group: min =         1
Family:                           Gaussian                     avg =       4.8
Correlation:                  unstructured                     max =         6
                                                Wald chi2(3)       =     94.13
Scale parameter:                  25.87029      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     dep |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     pre |   .3399185   .1326684      2.562   0.010       .0798932    .5999437
   group |  -4.134413   .9986306     -4.140   0.000      -6.091693   -2.177133
   visit |  -1.228327   .1492831     -8.228   0.000      -1.520916   -.9357372
   _cons |   11.13045   2.892903      3.848   0.000       5.460464    16.80044
------------------------------------------------------------------------------

estat wcorr

Estimated within-subj correlation matrix R:

         c1      c2      c3      c4      c5      c6
r1   1.0000
r2   0.4955  1.0000
r3   0.3477  0.8622  1.0000
r4   0.3012  0.7359  0.6677  1.0000
r5   0.2328  0.7431  0.7394  0.7701  1.0000
r6   0.0943  0.5671  0.5625  0.6166  0.7179  1.0000

Now, let's try a different correlation structure, autoregressive with lag one, AR(1). This is the correlation structure that is most
likely to be correct considering the repeated measures over time.
xtgee dep pre group visit, fam(gaus) link(iden) i(subj) t(visit) corr(ar1)

GEE population-averaged model                   Number of obs      =       287
Group and time vars:            subj visit      Number of groups   =        53
Link:                             identity      Obs per group: min =         2
Family:                           Gaussian                     avg =       5.4
Correlation:                         AR(1)                     max =         6
                                                Wald chi2(3)       =     64.55
Scale parameter:                  25.82413      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
     dep |      Coef.   Std. Err.       z     P>|z|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
     pre |   .4268002   .1376156      3.101   0.002       .1570785    .6965219
   group |  -4.218194   1.053504     -4.004   0.000      -6.283023   -2.153364
   visit |  -1.181975   .1907298     -6.197   0.000      -1.555799   -.8081517
   _cons |   9.037864   3.036076      2.977   0.003       3.087264    14.98846
------------------------------------------------------------------------------

estat wcorr

Estimated within-subj correlation matrix R:

         c1      c2      c3      c4      c5      c6
r1   1.0000
r2   0.6812  1.0000
r3   0.4641  0.6812  1.0000
r4   0.3161  0.4641  0.6812  1.0000
r5   0.2154  0.3161  0.4641  0.6812  1.0000
r6   0.1467  0.2154  0.3161  0.4641  0.6812  1.0000

This analysis probably more closely reflects the correlations among the depression scores over six visits that we observed in our descriptive analysis.
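
The AR(1) structure is driven by a single parameter: the correlation between scores k visits apart is the lag-one correlation raised to the power k. As a small worked sketch, taking the rounded lag-one value 0.6812 from the matrix above (the local macro name rho is only illustrative), the remaining entries can be regenerated by hand.

```stata
* Lag-one correlation as printed by estat wcorr (rounded to 4 decimals)
local rho = 0.6812

* Correlation at lag k under AR(1) is rho^k
forvalues k = 1/5 {
    display "lag `k': " %6.4f `rho'^`k'
}
```

The displayed values should agree with the first column of the working correlation matrix up to rounding of the lag-one estimate.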
Now, let's back up and reconsider the group by visit interaction. We will try a model with the interaction using the ar1 correlations.

xtgee dep pre group visit c.group#c.visit, fam(gaus) link(iden) i(subj) t(visit) corr(ar1)

note: some groups have fewer than 2 observations
      not possible to estimate correlations for those groups
      8 groups omitted from estimation

Iteration 1: tolerance = .08642572
Iteration 2: tolerance = .00129189
Iteration 3: tolerance = .00002644
Iteration 4: tolerance = 5.433e-07

GEE population-averaged model                   Number of obs      =       287
Group and time vars:            subj visit      Number of groups   =        53
Link:                             identity      Obs per group: min =         2
Family:                           Gaussian                     avg =       5.4
Correlation:                         AR(1)                     max =         6
                                                Wald chi2(4)       =     64.83
Scale parameter:                  25.81682      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         dep |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pre |   .4284649   .1377094     3.11   0.002     .1585595    .6983703
       group |   -3.55197   1.654127    -2.15   0.032       -6.794   -.3099395
       visit |  -1.057824   .3044115    -3.47   0.001    -1.654459   -.4611881
             |
    c.group# |
     c.visit |  -.2040059   .3905217    -0.52   0.601    -.9694144    .5614026
             |
       _cons |   8.606923   3.147897     2.73   0.006     2.437158    14.77669
------------------------------------------------------------------------------

The group by visit interaction still is not significant even though this may be a better approach for testing it. So far we have been treating visit as a continuous
variable. Is it possible that our analysis might change if we were to treat visit as a categorical variable, in the way that the anova did? Let's try one more analysis
using Stata's factor variable syntax (i.) introduced in Stata 11, to include dummy coded terms in the model.

xtgee dep pre group i.visit, fam(gaus) link(iden) i(subj) t(visit) corr(ar1)

note: some groups have fewer than 2 observations
      not possible to estimate correlations for those groups
      8 groups omitted from estimation

Iteration 1: tolerance = .12083034
Iteration 2: tolerance = .00138846
Iteration 3: tolerance = .00002034
Iteration 4: tolerance = 2.990e-07

GEE population-averaged model                   Number of obs      =       287
Group and time vars:            subj visit      Number of groups   =        53
Link:                             identity      Obs per group: min =         2
Family:                           Gaussian                     avg =       5.4
Correlation:                         AR(1)                     max =         6
                                                Wald chi2(7)       =     66.85
Scale parameter:                  25.67071      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         dep |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pre |   .4264589   .1372194     3.11   0.002     .1575137    .6954041
       group |  -4.197096   1.050645    -3.99   0.000    -6.256323   -2.137869
             |
       visit |
          2  |   -.964717   .5556079    -1.74   0.083    -2.053689    .1242546
          3  |  -2.790063   .7474989    -3.73   0.000    -4.255134   -1.324992
          4  |  -3.730425   .8528421    -4.37   0.000    -5.401964   -2.058885
          5  |  -5.127078   .9147959    -5.60   0.000    -6.920045   -3.334111
          6  |   -5.84916   .9534054    -6.14   0.000      -7.7178    -3.98052
             |
       _cons |   7.896145   2.998003     2.63   0.008     2.020168    13.77212
------------------------------------------------------------------------------

testparm i.visit

 ( 1)  2.visit = 0
 ( 2)  3.visit = 0
 ( 3)  4.visit = 0
 ( 4)  5.visit = 0
 ( 5)  6.visit = 0

           chi2(  5) =   40.56
         Prob > chi2 =    0.0000

We can test whether the categorical version of visit accounts for more variability than the continuous version by including both in the model but using only
k - 2 = 4 dummy variables for time.

xtgee dep pre group c.visit i.visit, fam(gaus) link(iden) i(subj) t(visit) corr(ar1)

note: 6.visit omitted because of collinearity
note: some groups have fewer than 2 observations
      not possible to estimate correlations for those groups
      8 groups omitted from estimation

Iteration 1: tolerance = .203814
Iteration 2: tolerance = .00172276
Iteration 3: tolerance = .000025
Iteration 4: tolerance = 3.675e-07

GEE population-averaged model                   Number of obs      =       287
Group and time vars:            subj visit      Number of groups   =        53
Link:                             identity      Obs per group: min =         2
Family:                           Gaussian                     avg =       5.4
Correlation:                         AR(1)                     max =         6
                                                Wald chi2(7)       =     66.85
Scale parameter:                  25.67071      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
         dep |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pre |   .4264589   .1372194     3.11   0.002     .1575137    .6954041
       group |  -4.197096   1.050645    -3.99   0.000    -6.256323   -2.137869
       visit |  -1.169832   .1906811    -6.14   0.000     -1.54356   -.7961039
             |
       visit |
          2  |    .205115   .5196299     0.39   0.693    -.8133408    1.223571
          3  |  -.4503992    .648481    -0.69   0.487    -1.721399    .8206003
          4  |  -.2209286   .6602134    -0.33   0.738    -1.514923    1.073066
          5  |  -.4477498   .5585628    -0.80   0.423    -1.542513    .6470131
          6  |  (omitted)
             |
       _cons |   9.065977   3.031614     2.99   0.003     3.124124    15.00783
------------------------------------------------------------------------------

testparm i.visit

 ( 1)  2.visit = 0
 ( 2)  3.visit = 0
 ( 3)  4.visit = 0
 ( 4)  5.visit = 0

           chi2(  4) =    1.92
         Prob > chi2 =    0.7506
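
The p-value that testparm reports is the upper tail of a chi-squared distribution with 4 degrees of freedom evaluated at the Wald statistic. As a quick hand check, using the rounded statistic 1.92 from the output above:

```stata
* Upper-tail probability of a chi-squared(4) variable exceeding 1.92
display chi2tail(4, 1.92)
```

Because 1.92 is itself rounded, the result should be close to, though not necessarily identical with, the 0.7506 shown.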

These results indicate that the categorical version of visit does not account for significantly more variability than the continuous version. In the final analysis, I
think that I prefer the following model, xtgee dep pre group visit, fam(gaus) link(iden) i(subj) t(visit) corr(ar1), of all the analyses run so far. Those results
looked as follows:

------------------------------------------------------------------------------
         dep |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pre |   .4268002   .1376156     3.10   0.002     .1570785    .6965219
       group |  -4.218194   1.053504    -4.00   0.000    -6.283023   -2.153364
       visit |  -1.181975   .1907298    -6.20   0.000    -1.555799   -.8081517
       _cons |   9.037864   3.036076     2.98   0.003     3.087264    14.98846
------------------------------------------------------------------------------

The final interpretation of these results indicates that there is a significant effect for the pretest, i.e., for every one point increase in the pretest score there is about
a 0.43 point increase in the depression score, when controlling for treatment and visit. There is also an effect for the estrogen patch when controlling for pretest
depression and visit. Use of the estrogen patch reduces the depression score by about 4.2 points. Finally, there is also a significant visit effect when controlling for
pretest depression and group membership. The depression score decreases on average by 1.18 points for each visit.
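
To make the interpretation concrete, the fitted coefficients can be combined into a predicted mean score. The sketch below plugs in a pretest score of 21, an illustrative value chosen because it is close to the overall pretest mean, and compares the two groups at the sixth visit.

```stata
* Predicted mean EPDS at visit 6, pretest = 21, estrogen patch group (group = 1)
display 9.037864 + 0.4268002*21 - 4.218194*1 - 1.181975*6

* Same prediction for the placebo group (group = 0)
display 9.037864 + 0.4268002*21 - 4.218194*0 - 1.181975*6
```

The two predictions differ by exactly the group coefficient, which is what "controlling for pretest and visit" means in this additive model.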

GEE with Binary Response Variable


The binary response variable in these examples was created from the data from the 1996 Gregoire, Kumar, Everitt,
Henderson & Studd study on the efficacy of estrogen patches in treating postnatal depression. Women were randomly
assigned to either a placebo control group (group=0, n=27) or an estrogen patch group (group=1, n=34). Prior to the first
treatment all patients took the Edinburgh Postnatal Depression Scale (EPDS). EPDS data were collected monthly for six
months once the treatment began. Depression scores greater than or equal to 11 were coded as 1.
use http://www.ats.ucla.edu/stat/stata/library/depres01, clear
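
The depres01 file already contains the binary variable depressd, so nothing needs to be created here. For readers who want to see the coding rule itself, a comparable indicator could be built from the continuous long-format data used earlier, roughly as sketched below; dep_hi is a hypothetical variable name and this is not necessarily how depressd was produced.

```stata
* Run on the long-format continuous data (variable dep), not on depres01.
* Stata treats missing values as larger than any number, so the if
* qualifier keeps missing scores from being coded as 1.
generate byte dep_hi = (dep >= 11) if !missing(dep)
tabulate dep_hi
```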
We will go through a series of analyses that largely parallel the models run above with the continuous response variable. To get a binary logit type
model we will set family to binomial and link to logit. We will start with the independent correlation structure, followed by exchangeable (compound symmetry) and
then unstructured.

xtgee depressd group visit, i(subj) fam(bin) link(logit) corr(ind)

Iteration 1: tolerance = 2.028e-12

GEE population-averaged model                   Number of obs      =       295
Group variable:                       subj      Number of groups   =        61
Link:                                logit      Obs per group: min =         1
Family:                           binomial                     avg =       4.8
Correlation:                   independent                     max =         6
                                                Wald chi2(2)       =     52.54
Scale parameter:                                Prob > chi2        =    0.0000

Pearson chi2(295):                  295.72      Deviance           =    338.95
Dispersion (Pearson):              1.00245      Dispersion         =  1.148974

------------------------------------------------------------------------------
    depressd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |  -1.606602    .277919    -5.78   0.000    -2.151313   -1.061891
       visit |  -.4402142   .0802387    -5.49   0.000    -.5974791   -.2829493
       _cons |    2.38366   .3675414     6.49   0.000     1.663292    3.104028
------------------------------------------------------------------------------

estat wcorr

Estimated within-subj correlation matrix R:

      |        c1         c2         c3         c4         c5         c6
------+-----------------------------------------------------------------
   r1 |         1
   r2 |         0          1
   r3 |         0          0          1
   r4 |         0          0          0          1
   r5 |         0          0          0          0          1
   r6 |         0          0          0          0          0          1

xtgee depressd group visit, i(subj) fam(bin) link(logit) corr(exc)

Iteration 1: tolerance = .0287363
Iteration 2: tolerance = .00515198
Iteration 3: tolerance = .00024259
Iteration 4: tolerance = .00001921
Iteration 5: tolerance = 1.162e-06
Iteration 6: tolerance = 5.879e-08

GEE population-averaged model                   Number of obs      =       295
Group variable:                       subj      Number of groups   =        61
Link:                                logit      Obs per group: min =         1
Family:                           binomial                     avg =       4.8
Correlation:                  exchangeable                     max =         6
                                                Wald chi2(2)       =     45.64
Scale parameter:                                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
    depressd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |  -1.616323   .4669082    -3.46   0.001    -2.531446   -.7011994
       visit |  -.3984038   .0613331    -6.50   0.000    -.5186145   -.2781931
       _cons |   2.409522   .4456646     5.41   0.000     1.536035    3.283008
------------------------------------------------------------------------------

estat wcorr

Estimated within-subj correlation matrix R:

      |        c1         c2         c3         c4         c5         c6
------+-----------------------------------------------------------------
   r1 |         1
   r2 |  .4518293          1
   r3 |  .4518293   .4518293          1
   r4 |  .4518293   .4518293   .4518293          1
   r5 |  .4518293   .4518293   .4518293   .4518293          1
   r6 |  .4518293   .4518293   .4518293   .4518293   .4518293          1

xtgee depressd group visit, i(subj) t(visit) fam(bin) link(logit) corr(unstr)

Iteration 1: tolerance = .03269114
Iteration 2: tolerance = .00216503
Iteration 3: tolerance = .00040481
Iteration 4: tolerance = .00005372
Iteration 5: tolerance = 6.971e-06
Iteration 6: tolerance = 8.126e-07

GEE population-averaged model                   Number of obs      =       295
Group and time vars:            subj visit      Number of groups   =        61
Link:                                logit      Obs per group: min =         1
Family:                           binomial                     avg =       4.8
Correlation:                  unstructured                     max =         6
                                                Wald chi2(2)       =     32.57
Scale parameter:                                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
    depressd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |    -1.5933   .4553165    -3.50   0.000    -2.485704   -.7008963
       visit |  -.3897561   .0748284    -5.21   0.000    -.5364169   -.2430952
       _cons |   2.311344   .4521761     5.11   0.000     1.425095    3.197593
------------------------------------------------------------------------------

estat wcorr

Estimated within-subj correlation matrix R:

      |        c1         c2         c3         c4         c5         c6
------+-----------------------------------------------------------------
   r1 |         1
   r2 |   .404501          1
   r3 |  .1803076   .6315383          1
   r4 |   .284646   .5602217   .5795466          1
   r5 |  .2789439   .6335593   .4888153   .7378928          1
   r6 |  .0837916   .3078638   .5616702   .4672481   .5587511          1

With these data, just as with the continuous response variable, it might be more reasonable to hypothesize that the correlation structure is
autoregressive.

xtgee depressd group visit, i(subj) t(visit) fam(bin) link(logit) corr(ar1)

note: some groups have fewer than 2 observations
      not possible to estimate correlations for those groups
      8 groups omitted from estimation

Iteration 1: tolerance = .00583303
Iteration 2: tolerance = .00029981
Iteration 3: tolerance = 7.003e-06
Iteration 4: tolerance = 1.599e-07

GEE population-averaged model                   Number of obs      =       287
Group and time vars:            subj visit      Number of groups   =        53
Link:                                logit      Obs per group: min =         2
Family:                           binomial                     avg =       5.4
Correlation:                         AR(1)                     max =         6
                                                Wald chi2(2)       =     26.04
Scale parameter:                                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
    depressd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |  -1.588712   .4391128    -3.62   0.000    -2.449358   -.7280672
       visit |  -.4036122   .0933711    -4.32   0.000    -.5866163   -.2206082
       _cons |   2.259702   .4961409     4.55   0.000     1.287284     3.23212
------------------------------------------------------------------------------

estat wcorr

Estimated within-subj correlation matrix R:

      |        c1         c2         c3         c4         c5         c6
------+-----------------------------------------------------------------
   r1 |         1
   r2 |  .5643214          1
   r3 |  .3184587   .5643214          1
   r4 |  .1797131   .3184587   .5643214          1
   r5 |  .1014159   .1797131   .3184587   .5643214          1
   r6 |  .0572312   .1014159   .1797131   .3184587   .5643214          1
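
Because the model uses a logit link, the coefficients are on the log-odds scale. A hand calculation with Stata's invlogit() function turns the linear predictor into a population-averaged probability of scoring 11 or higher; the sketch below uses the AR(1) estimates above and picks the sixth visit purely as an illustration.

```stata
* Estimated probability of depressd = 1 at visit 6, estrogen patch group
display invlogit(2.259702 - 1.588712*1 - .4036122*6)

* Same visit, placebo group
display invlogit(2.259702 - 1.588712*0 - .4036122*6)
```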

If we want, we can also obtain the results in the odds ratio metric using the eform option.
xtgee, eform

GEE population-averaged model                   Number of obs      =       287
Group and time vars:            subj visit      Number of groups   =        53
Link:                                logit      Obs per group: min =         2
Family:                           binomial                     avg =       5.4
Correlation:                         AR(1)                     max =         6
                                                Wald chi2(2)       =     26.04
Scale parameter:                                Prob > chi2        =    0.0000

------------------------------------------------------------------------------
    depressd | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       group |   .2041883   .0896617    -3.62   0.000      .086349    .4828413
       visit |   .6679031   .0623629    -4.32   0.000     .5562061    .8020309
------------------------------------------------------------------------------
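
The odds ratios are simply the exponentiated logit coefficients from the previous table, which can be confirmed by hand:

```stata
* Exponentiate the AR(1) logit coefficients for group and visit
display exp(-1.588712)
display exp(-.4036122)
```

The results should match the Odds Ratio column: the odds of being classified as depressed in the estrogen patch group are roughly one fifth of those in the placebo group, and the odds fall by about a third with each visit.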
Let's add in the pretest and a group by visit interaction.

xtgee depressd pre group visit c.group#c.visit, i(subj) t(visit) fam(bin) link(logit) corr(a

note: some groups have fewer than 2 observations
      not possible to estimate correlations for those groups
      8 groups omitted from estimation

Iteration 1: tolerance = .27128178
Iteration 2: tolerance = .00369124
Iteration 3: tolerance = .00040644
Iteration 4: tolerance = .00002584
Iteration 5: tolerance = 1.892e-06
Iteration 6: tolerance = 1.383e-07

GEE population-averaged model                   Number of obs      =       287
Group and time vars:            subj visit      Number of groups   =        53
Link:                                logit      Obs per group: min =         2
Family:                           binomial                     avg =       5.4
Correlation:                         AR(1)                     max =         6
                                                Wald chi2(4)       =     29.71
Scale parameter:                         1      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
    depressd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pre |   .1231682   .0565583     2.18   0.029      .012316    .2340204
       group |  -1.278468   .7833482    -1.63   0.103    -2.813802    .2568666
       visit |  -.3504923   .1484459    -2.36   0.018    -.6414409   -.0595436
             |
    c.group# |
     c.visit |  -.1279848   .1946883    -0.66   0.511    -.5095669    .2535973
             |
       _cons |  -.4669354   1.271484    -0.37   0.713    -2.958999    2.025128
------------------------------------------------------------------------------

The interaction is clearly not significant (p = 0.511), but we'll stick with the pretest for the moment. Next let's try the categorical version of visit and the model that contains both the
categorical and continuous versions of visit.

xtgee depressd pre group i.visit, i(subj) fam(bin) link(logit) t(visit) corr(ar1)

note: some groups have fewer than 2 observations
      not possible to estimate correlations for those groups
      8 groups omitted from estimation

Iteration 1: tolerance = .31672971
Iteration 2: tolerance = .00966096
Iteration 3: tolerance = .00059261
Iteration 4: tolerance = 3.397e-06
Iteration 5: tolerance = 9.420e-07

GEE population-averaged model                   Number of obs      =       287
Group and time vars:            subj visit      Number of groups   =        53
Link:                                logit      Obs per group: min =         2
Family:                           binomial                     avg =       5.4
Correlation:                         AR(1)                     max =         6
                                                Wald chi2(7)       =     30.86
Scale parameter:                                Prob > chi2        =    0.0001

------------------------------------------------------------------------------
    depressd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pre |   .1140311    .056433     2.02   0.043     .0034244    .2246378
       group |  -1.692654   .4377388    -3.87   0.000    -2.550607   -.8347021
             |
       visit |
          2  |  -.1751772   .3106588    -0.56   0.573    -.7840573    .4337028
          3  |  -1.015265   .3915632    -2.59   0.010    -1.782715   -.2478151
          4  |  -1.108258   .4287682    -2.58   0.010    -1.948628   -.2678878
          5  |  -1.489162   .4548596    -3.27   0.001    -2.380671    -.597654
          6  |   -2.14973   .4951443    -4.34   0.000    -3.120195   -1.179265
             |
       _cons |  -.4832614    1.18731    -0.41   0.684    -2.810346    1.843823
------------------------------------------------------------------------------

testparm i.visit

 ( 1)  2.visit = 0
 ( 2)  3.visit = 0
 ( 3)  4.visit = 0
 ( 4)  5.visit = 0
 ( 5)  6.visit = 0

           chi2(  5) =   21.92
         Prob > chi2 =    0.0005

xtgee depressd pre group c.visit i.visit, i(subj) fam(bin) link(logit) t(visit) corr(ar1)

note: 6.visit omitted because of collinearity
note: some groups have fewer than 2 observations
      not possible to estimate correlations for those groups
      8 groups omitted from estimation

Iteration 1: tolerance = .38986389
Iteration 2: tolerance = .0130539
Iteration 3: tolerance = .00083314
Iteration 4: tolerance = 4.664e-06
Iteration 5: tolerance = 1.294e-06
Iteration 6: tolerance = 6.999e-08

GEE population-averaged model                   Number of obs      =       287
Group and time vars:            subj visit      Number of groups   =        53
Link:                                logit      Obs per group: min =         2
Family:                           binomial                     avg =       5.4
Correlation:                         AR(1)                     max =         6
                                                Wald chi2(7)       =     30.86
Scale parameter:                         1      Prob > chi2        =    0.0001

------------------------------------------------------------------------------
    depressd |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         pre |   .1140311    .056433     2.02   0.043     .0034244    .2246378
       group |  -1.692654   .4377388    -3.87   0.000    -2.550607   -.8347021
       visit |   -.429946   .0990289    -4.34   0.000     -.624039    -.235853
             |
       visit |
          2  |   .2547688   .2901423     0.88   0.380    -.3138998    .8234373
          3  |  -.1553729   .3440849    -0.45   0.652     -.829767    .5190212
          4  |   .1815801   .3544878     0.51   0.608    -.5132033    .8763635
          5  |   .2306217   .3201945     0.72   0.471    -.3969481    .8581914
          6  |  (omitted)
             |
       _cons |  -.0533153   1.201905    -0.04   0.965    -2.409005    2.302375
------------------------------------------------------------------------------

testparm i.visit

 ( 1)  2.visit = 0
 ( 2)  3.visit = 0
 ( 3)  4.visit = 0
 ( 4)  5.visit = 0

           chi2(  4) =    3.04
         Prob > chi2 =    0.5507
