PSM in Stata Using Teffects

12/12/2014
Propensity Score Matching in Stata using teffects

For many years, the standard tool for propensity score matching in Stata has been the psmatch2 command, written by Edwin Leuven and Barbara Sianesi. However, Stata 13 introduced a new teffects command for estimating treatment effects in a variety of ways, including propensity score matching. The teffects psmatch command has one very important advantage over psmatch2: when it calculates standard errors, it takes into account the fact that propensity scores are estimated rather than known. This often turns out to make a significant difference, and sometimes in surprising ways. We thus strongly recommend switching from psmatch2 to teffects psmatch, and this article will help you make the transition.
An Example of Propensity Score Matching

Run the following command in Stata to load an example data set:
use http://ssc.wisc.edu/sscc/pubs/files/psm
It consists of four variables: a treatment indicator t, covariates x1 and x2, and an outcome y. This is constructed data, and the effect of the treatment is in fact a one unit increase in y. However, the probability of treatment is positively correlated with x1 and x2, and both x1 and x2 are positively correlated with y. Thus simply comparing the mean value of y for the treated and untreated groups badly overestimates the effect of treatment:
ttest y, by(t)
(Regressing y on t, x1, and x2 will give you a pretty good picture of the situation.) The psmatch2 command will give you a much better estimate of the treatment effect:
psmatch2 t x1 x2, out(y)
----------------------------------------------------------------------------------------
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
               y  Unmatched |  1.8910736  -.423243358   2.31431696   .109094342    21.21
                        ATT |  1.8910736   .871388246   1.01968536   .173034999     5.89
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.
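If you want to experiment without downloading psm.dta, here is a minimal Stata sketch that simulates data with the same qualitative structure. The data-generating process is our own assumption and is not how psm.dta was actually built: standard normal covariates, a probit-style treatment rule, and y = x1 + x2 + t plus noise. The seed is arbitrary, so the numbers will not match the output shown in this article. The scalar names naive and reg_t are illustrative.

```stata
* Hypothetical simulated data; NOT the data-generating process behind psm.dta
clear all
set seed 20141212
set obs 1000
generate x1 = rnormal()
generate x2 = rnormal()
* Treatment is more likely when x1 and x2 are large (assumed probit-style rule)
generate t = (x1 + x2 + rnormal() > 0)
* Outcome rises with x1 and x2; the true effect of t is exactly 1
generate y = x1 + x2 + t + rnormal()

* Naive comparison of means (group t=0 is mu_1, group t=1 is mu_2)
ttest y, by(t)
scalar naive = r(mu_2) - r(mu_1)

* Controlling for x1 and x2 should give an estimate close to 1
regress y t x1 x2
scalar reg_t = _b[t]

display "naive difference = " %6.3f naive "   regression estimate = " %6.3f reg_t
```

With this setup the naive difference in means should land well above 1. The regression coefficient on t should be close to the true value of 1.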
The teffects Command
You can carry out the same estimation with teffects. The basic syntax of the teffects command when used for propensity score matching is:
teffects psmatch (outcome) (treatment covariates)

In this case the basic command would be:
teffects psmatch (y) (t x1 x2)

However, the default behavior of teffects is not the same as that of psmatch2, so we'll need to use some options to get the same results.
First, psmatch2 by default reports the average treatment effect on the treated (which it refers to as ATT). The teffects command by default reports the average treatment effect (ATE), but it will calculate the average treatment effect on the treated (which it refers to as ATET) if given the atet option. Second, psmatch2 by default uses a probit model for the probability of treatment. The teffects command uses a logit model by default, but it will use probit if the probit option is applied to the treatment equation. So to run the same model using teffects, type:
teffects psmatch (y) (t x1 x2, probit), atet
Treatment-effects estimation                   Number of obs      =      1000
Estimator      : propensity-score matching     Matches: requested =         1
Outcome model  : matching                                     min =         1
Treatment model: probit                                       max =         1
------------------------------------------------------------------------------
             |              AI Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATET         |
           t |
    (1 vs 0) |   1.019685   .1227801     8.30   0.000     .7790407     1.26033
------------------------------------------------------------------------------
http://www.ssc.wisc.edu/sscc/pubs/stata_psmatch.htm#StandardErrors
The average treatment effect on the treated is identical, other than being rounded at a different place. But note that teffects reports a very different standard error (we'll discuss why shortly). It also reports a z-statistic, p-value, and 95% confidence interval rather than just a t-statistic.
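To see how the two options interact, the following sketch runs all four combinations of treatment model (logit or probit) and statistic (ATE or ATET). The data are simulated under a hypothetical data-generating process. With psm.dta loaded you would skip the data-generating lines. The logit/ate combination corresponds to the teffects defaults, and probit/atet corresponds to the psmatch2 defaults.

```stata
* Hypothetical simulated data with the same structure as psm.dta
clear all
set seed 20141212
set obs 1000
generate x1 = rnormal()
generate x2 = rnormal()
generate t = (x1 + x2 + rnormal() > 0)
generate y = x1 + x2 + t + rnormal()

* logit + ate is the teffects default; probit + atet mimics psmatch2's default
foreach link in logit probit {
    foreach stat in ate atet {
        quietly teffects psmatch (y) (t x1 x2, `link'), `stat'
        matrix b = e(b)
        * b[1,1] holds the estimated treatment effect
        display %-7s "`link'" %-5s "`stat'" %10.6f b[1,1]
    }
}
```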
Running teffects with the default options gives the following:
teffects psmatch (y) (t x1 x2)
Treatment-effects estimation                   Number of obs      =      1000
Estimator      : propensity-score matching     Matches: requested =         1
Outcome model  : matching                                     min =         1
Treatment model: logit                                        max =         1
------------------------------------------------------------------------------
             |              AI Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATE          |
           t |
    (1 vs 0) |   1.019367   .1164694     8.75   0.000     .7910912    1.247643
------------------------------------------------------------------------------
This is equivalent to:
psmatch2 t x1 x2, out(y) logit ate
----------------------------------------------------------------------------------------
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
               y  Unmatched |  1.8910736  -.423243358   2.31431696   .109094342    21.21
                        ATT |  1.8910736   .930722886   .960350715   .168252917     5.71
                        ATU |-.423243358   .625587554   1.04883091            .        .
                        ATE |                           1.01936701            .        .
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.
The ATE from this model is very similar to the ATT/ATET from the previous model. But note that psmatch2 is reporting a somewhat different ATT in this model. The teffects command reports the same ATET if asked:
teffects psmatch (y) (t x1 x2), atet
Treatment-effects estimation                   Number of obs      =      1000
Estimator      : propensity-score matching     Matches: requested =         1
Outcome model  : matching                                     min =         1
Treatment model: logit                                        max =         1
------------------------------------------------------------------------------
             |              AI Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATET         |
           t |
    (1 vs 0) |   .9603507   .1204748     7.97   0.000     .7242245    1.196477
------------------------------------------------------------------------------
Standard Errors
The output of psmatch2 includes the following caveat:
Note: S.E. does not take into account that the propensity score is estimated.
A recent paper by Abadie and Imbens (2012. Matching on the estimated propensity score. Harvard University and National Bureau of Economic Research) established how to take into account that propensity scores are estimated, and teffects psmatch relies on their work. Interestingly, the adjustment for ATE is always negative, leading to smaller standard errors: matching based on estimated propensity scores turns out to be more efficient than matching based on true propensity scores. However, for ATET the adjustment can be positive or negative, so the standard errors reported by psmatch2 may be too large or too small.
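Because teffects psmatch uses the Abadie-Imbens adjusted standard error, the z-statistic, p-value, and confidence interval it reports are all built from that adjusted value. The following sketch uses hypothetical simulated data. It recomputes those quantities from the stored results e(b) and e(V) and compares them with r(table), the coefficient table Stata leaves behind after displaying estimation results. The scalar names se, z, p, lo and hi are illustrative.

```stata
* Hypothetical simulated data with the same structure as psm.dta
clear all
set seed 20141212
set obs 1000
generate x1 = rnormal()
generate x2 = rnormal()
generate t = (x1 + x2 + rnormal() > 0)
generate y = x1 + x2 + t + rnormal()

teffects psmatch (y) (t x1 x2), atet
* Rows of r(table): b, se, z, pvalue, ll, ul, ...
matrix T = r(table)
matrix b = e(b)
matrix V = e(V)

* The reported (adjusted) standard error is the square root of e(V)
scalar se = sqrt(V[1,1])
scalar z  = b[1,1] / se
scalar p  = 2 * normal(-abs(z))
scalar lo = b[1,1] - invnormal(0.975) * se
scalar hi = b[1,1] + invnormal(0.975) * se
display "ATET = " %9.6f b[1,1] "  SE = " %9.6f se "  z = " %6.2f z
display "p = " %6.4f p "  95% CI = [" %9.6f lo ", " %9.6f hi "]"
```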
Handling Ties
Thus far we've used psmatch2 and teffects psmatch to do simple nearest-neighbor matching with one neighbor (and no caliper). However, this raises the question of what to do when two observations have the same propensity score and are thus tied for "nearest neighbor." Ties are common if the covariates in the treatment model are categorical or even integers.
The psmatch2 command by default matches with one of the tied observations, but with the ties option it matches with all tied observations. The teffects psmatch command always matches with all ties. If your data set has multiple observations with the same propensity score, you won't get exactly the same results from teffects psmatch as you were getting from psmatch2 unless you go back and add the ties option to your psmatch2 commands. (At this time we are not aware of any clear guidance as to whether it is better to match with ties or not.)
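The following sketch shows how ties arise with integer-valued covariates. The data are simulated and hypothetical. Because x1 and x2 each take only the values 0, 1 and 2, there are at most nine distinct propensity scores. Since teffects psmatch keeps every tied match, gen(match) has to create many match# variables rather than just match1.

```stata
* Hypothetical data with integer-valued covariates (values 0, 1, 2)
clear all
set seed 20141212
set obs 200
generate x1 = floor(3 * runiform())
generate x2 = floor(3 * runiform())
generate t = (x1 + x2 + rnormal() > 2)
generate y = x1 + x2 + t + rnormal()

teffects psmatch (y) (t x1 x2), gen(match)
predict ps0 ps1, ps

* Only a handful of distinct propensity scores, so most observations are tied
duplicates report ps1
* All tied observations are kept as matches: match1, match2, match3, ...
describe match*, simple
```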
Matching With Multiple Neighbors

By default teffects psmatch matches each observation with one other observation. You can change this with the nneighbor() (or just nn()) option. For example, you could match each observation with its three nearest neighbors with:
teffects psmatch (y) (t x1 x2), nn(3)
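With more than one neighbor, the imputed potential outcome is the average outcome of the matched observations. The following sketch checks this on hypothetical simulated data. The covariates are continuous, so ties are practically impossible and gen(match) creates exactly match1 through match3. The variable ybar3 is an illustrative name.

```stata
* Hypothetical simulated data with continuous covariates
clear all
set seed 20141212
set obs 1000
generate x1 = rnormal()
generate x2 = rnormal()
generate t = (x1 + x2 + rnormal() > 0)
generate y = x1 + x2 + t + rnormal()

* Three nearest neighbors; gen(match) creates match1, match2 and match3
teffects psmatch (y) (t x1 x2), nn(3) gen(match)
predict y0 y1, po

* For a treated observation, y0 is the mean outcome of its three matched controls
generate double ybar3 = (y[match1] + y[match2] + y[match3]) / 3 if t == 1
assert reldif(y0, ybar3) < 1e-5 if t == 1
```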
Postestimation
By default teffects psmatch does not add any new variables to the data set. However, a variety of useful variables can be created with options and post-estimation predict commands. The following table lists the 1st and 467th observations of the example data set after some of these variables have been created. We'll refer to it as we explain the commands that created the new variables. Reviewing these variables is also a good way to make sure you understand exactly how propensity score matching works.
      +-------------------------------------------------------------------------------------------------------+
      |        x1          x2   t          y   match1        ps0        ps1          y0         y1         te |
      |-------------------------------------------------------------------------------------------------------|
   1. |  .0152526   -1.793022   0   -1.79457      467   .9081651   .0918349    -1.79457   2.231719   4.026289 |
 467. | -2.057838    .5360286   1   2.231719      781    .907606    .092394   -.6012772   2.231719   2.832996 |
      +-------------------------------------------------------------------------------------------------------+
Start with a clean slate by typing:
use http://ssc.wisc.edu/sscc/pubs/files/psm, clear

The gen() option tells teffects psmatch to create a new variable (or variables). For each observation, this new variable will contain the number of the observation that observation was matched with. If there are ties, or if you told teffects psmatch to use multiple neighbors, then gen() will need to create multiple variables. Thus you supply the stem of the variable name, and teffects psmatch will add suffixes as needed.
teffects psmatch (y) (t x1 x2), gen(match)

In this case each observation is only matched with one other, so gen(match) only creates match1. Referring to the example output, the match of observation 1 is observation 467 (which is why those two are listed).
Note that these observation numbers are only valid in the current sort order, so make sure you can recreate that order if needed. If
necessary, run:
gen ob=_n
and then:
sort ob
to restore this sort order later.
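Because match1 holds observation numbers, Stata's explicit subscripting (for example t[match1]) is a convenient way to look up an observation's match, as long as the sort order has not changed. Here is a minimal sketch on hypothetical simulated data. The check that every match belongs to the other group relies on teffects psmatch matching treated observations to controls and vice versa.

```stata
* Hypothetical simulated data with the same structure as psm.dta
clear all
set seed 20141212
set obs 1000
generate x1 = rnormal()
generate x2 = rnormal()
generate t = (x1 + x2 + rnormal() > 0)
generate y = x1 + x2 + t + rnormal()

teffects psmatch (y) (t x1 x2), gen(match)
generate ob = _n                  // remember the current sort order

* Look up observation 1's match by its observation number
display "Observation 1 was matched with observation " match1[1]
* Each observation's match is in the other treatment group
assert t[match1] != t

* Re-sorting makes match1 point at the wrong rows...
sort y
* ...until the original order is restored
sort ob
assert t[match1] != t
```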
The predict command with the ps option creates two variables containing the propensity scores, that is, each observation's predicted probability of being in the control group or the treated group:
predict ps0 ps1, ps

Here ps0 is the predicted probability of being in the control group (t=0) and ps1 is the predicted probability of being in the treated group (t=1). Observations 1 and 467 were matched because their propensity scores are very similar.
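As a check on what the ps option returns, the following sketch compares ps1 with the fitted probabilities from running logit directly, since logit is the default treatment model for teffects. The data are simulated and hypothetical, and phat is an illustrative name. We expect agreement up to numerical tolerance rather than in every digit.

```stata
* Hypothetical simulated data with the same structure as psm.dta
clear all
set seed 20141212
set obs 1000
generate x1 = rnormal()
generate x2 = rnormal()
generate t = (x1 + x2 + rnormal() > 0)
generate y = x1 + x2 + t + rnormal()

teffects psmatch (y) (t x1 x2), gen(match)
predict ps0 ps1, ps

* The two scores are complementary probabilities
assert reldif(ps0 + ps1, 1) < 1e-6
* ps1 should equal the fitted values from a logit of t on x1 and x2
logit t x1 x2
predict phat, pr
assert reldif(phat, ps1) < 1e-4
* Observation 1 and its match have nearly identical propensity scores
display "ps1 for obs 1: " ps1[1] "   ps1 for its match: " ps1[match1[1]]
```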
The po option creates variables containing the potential outcomes for each observation:
predict y0 y1, po
Because observation 1 is in the control group, y0 contains its observed value of y. y1 is the observed value of y for observation 1's match, observation 467. The propensity score matching estimator assumes that if observation 1 had been in the treated group, its value of y would have been that of the observation in the treated group most similar to it (where "similarity" is measured by the difference in their propensity scores).
Observation 467 is in the treated group, so its value for y1 is its observed value of y, while its value for y0 is the observed value of y for its match, observation 781.
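The construction of the potential outcomes described above can be verified with a few assert commands. This sketch uses hypothetical simulated data, a single neighbor, and continuous covariates, so there are no ties:

```stata
* Hypothetical simulated data with the same structure as psm.dta
clear all
set seed 20141212
set obs 1000
generate x1 = rnormal()
generate x2 = rnormal()
generate t = (x1 + x2 + rnormal() > 0)
generate y = x1 + x2 + t + rnormal()

teffects psmatch (y) (t x1 x2), gen(match)
predict y0 y1, po

* The potential outcome for an observation's own group is its observed y
assert y0 == y if t == 0
assert y1 == y if t == 1
* The other potential outcome is the observed y of its match
assert y1 == y[match1] if t == 0
assert y0 == y[match1] if t == 1
```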
Running the predict command with no options gives the treatment effect itself:
predict te
The treatment effect is simply the difference between y1 and y0. You could calculate the ATE yourself (but emphatically not its standard error) with:
sum te
and the ATET with:
sum te if t
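To confirm that the teffects point estimates are just averages of te, the following sketch compares them on hypothetical simulated data. As noted above, this reproduces only the point estimates, not the standard errors. Scalar names such as ate_byhand are illustrative.

```stata
* Hypothetical simulated data with the same structure as psm.dta
clear all
set seed 20141212
set obs 1000
generate x1 = rnormal()
generate x2 = rnormal()
generate t = (x1 + x2 + rnormal() > 0)
generate y = x1 + x2 + t + rnormal()

teffects psmatch (y) (t x1 x2), gen(match)
matrix b = e(b)
scalar ate_teffects = b[1,1]
predict te

* Averages of the individual treatment effects (point estimates only)
summarize te, meanonly
scalar ate_byhand = r(mean)
summarize te if t, meanonly
scalar atet_byhand = r(mean)

* Re-estimate with atet for comparison
quietly teffects psmatch (y) (t x1 x2), atet
matrix b = e(b)
scalar atet_teffects = b[1,1]

display "ATE:  by hand " %9.6f ate_byhand "   teffects " %9.6f ate_teffects
display "ATET: by hand " %9.6f atet_byhand "   teffects " %9.6f atet_teffects
```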
Regression on the "Matched Sample"

Another way to conceptualize propensity score matching is to think of it as choosing a sample from the control group that "matches" the
treatment group. Any differences between the treatment and matched control groups are then assumed to be a result of the treatment.
Note that this gives the average treatment effect on the treated. To calculate the ATE you'd also create a sample of the treated group that matches the controls. Mathematically this is all equivalent to using matching to estimate what an observation's outcome would have been if it had been in the other group, as described above.
Sometimes researchers then want to run regressions on the "matched sample," defined as the observations in the treated group plus the
observations in the control group which were matched to them. We will discuss how this can be done without passing judgement on the
appropriateness or usefulness of the technique.
psmatch2 makes this easy by creating a _weight variable automatically. For observations in the treated group, _weight is 1. For observations in the control group it is the number of observations from the treated group for which the observation is a match. If the observation is not a match, _weight is missing. _weight thus acts as a frequency weight (fweight) and can be used with Stata's standard weighting syntax. For example (starting with a clean slate again):
use http://ssc.wisc.edu/sscc/pubs/files/psm, clear

psmatch2 t x1 x2, out(y) logit
reg y x1 x2 t [fweight=_weight]
Observations with a missing value for _weight are omitted from the regression, so it is automatically limited to the matched sample.
teffects psmatch does not create a _weight variable, but it is possible to create one based on the match1 variable created by its gen() option. Here is example code, with comments:
teffects psmatch (y) (t x1 x2), gen(match) // create match1 (skip if match1 already exists)
gen ob=_n // store the observation numbers for future use
save fulldata,replace // save the complete data set
keep if t // keep just the treated group
keep match1 // keep just the match1 variable (the observation numbers of their matches)
bysort match1: gen weight=_N // count how many times each control observation is a match
by match1: keep if _n==1 // keep just one row per control observation
ren match1 ob // rename for merging purposes
merge 1:m ob using fulldata // merge back into the full data
replace weight=1 if t // set weight to 1 for treated observations
The resulting weight variable will be identical to the _weight variable created by psmatch2, as can be verified with:
assert weight==_weight
It is used in the same way and will give exactly the same results:
reg y x1 x2 t [fweight=weight]
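The claim that the weight acts as a frequency weight can be checked directly. An fweight regression should give the same coefficients as an unweighted regression on a data set in which each matched control is physically duplicated. The following sketch does this on hypothetical simulated data. It builds the weight with contract instead of the bysort/merge steps; that alternative is our own suggestion, not the article's method. The names w, b_fw, b_ex and ntreated are illustrative. The expanded data set should have exactly twice as many rows as there are treated observations.

```stata
* Hypothetical simulated data with the same structure as psm.dta
clear all
set seed 20141212
set obs 1000
generate x1 = rnormal()
generate x2 = rnormal()
generate t = (x1 + x2 + rnormal() > 0)
generate y = x1 + x2 + t + rnormal()

teffects psmatch (y) (t x1 x2), gen(match)
generate ob = _n

* Count how many treated observations use each control as their match
preserve
keep if t
contract match1, freq(weight)     // one row per matched control
rename match1 ob
tempfile w
save `w'
restore
merge 1:1 ob using `w', nogenerate
replace weight = 1 if t           // unmatched controls keep a missing weight

* Frequency-weighted regression on the matched sample
regress y x1 x2 t [fweight=weight]
matrix b_fw = e(b)

* Same regression after duplicating each row weight times
count if t
scalar ntreated = r(N)
drop if missing(weight)           // expand would otherwise keep these rows once
expand weight
regress y x1 x2 t
matrix b_ex = e(b)
display "max relative difference in coefficients: " mreldif(b_fw, b_ex)
```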
Obviously this is a good bit more work than using psmatch2. If your propensity score matching model can be done using both teffects psmatch and psmatch2, you may want to run teffects psmatch to get the correct standard error and then psmatch2 if you need a _weight variable.
This regression has an N of 666: 333 from the treated group and 333 from the control group. However, it only uses 189 different observations from the control group. About 1/3 of them are the matches for more than one observation from the treated group and are thus duplicated in the regression (run tab weight if !t for details). Researchers sometimes use the norepl (no replacement) option in psmatch2 to ensure each observation is used just once, even though this generally makes the matching worse. To the best of our knowledge there is no equivalent with teffects psmatch.
The results of this regression leave something to be desired:
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |    1.11891   .0440323    25.41   0.000      1.03245    1.205369
          x2 |    1.05594   .0417253    25.31   0.000       .97401     1.13787
           t |   .9563751   .0802273    11.92   0.000     .7988445    1.113906
       _cons |   .0180986   .0632538     0.29   0.775    -.1061036    .1423008
------------------------------------------------------------------------------
By construction all the coefficients should be 1. Regression using all the observations (reg y x1 x2 t rather than reg y x1 x2 t [fweight=weight]) does better in this case:
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.031167   .0346941    29.72   0.000     .9630853    1.099249
          x2 |   .9927759   .0333297    29.79   0.000     .9273715     1.05818
           t |   .9791484   .0769067    12.73   0.000     .8282306    1.130066
       _cons |   .0591595   .0416008     1.42   0.155    -.0224758    .1407948
------------------------------------------------------------------------------
Other Methods of Estimating Treatment Effects
While propensity score matching is the most common method of estimating treatment effects at the SSCC, teffects also implements Regression Adjustment (teffects ra), Inverse Probability Weighting (teffects ipw), Augmented Inverse Probability Weighting (teffects aipw), Inverse Probability Weighted Regression Adjustment (teffects ipwra), and Nearest Neighbor Matching (teffects nnmatch). The syntax is similar, though the methods differ in whether you need to specify variables for the outcome model, the treatment model, or both:
teffects ra (y x1 x2) (t)

teffects ipw (y) (t x1 x2)
teffects aipw (y x1 x2) (t x1 x2)
teffects ipwra (y x1 x2) (t x1 x2)
teffects nnmatch (y x1 x2) (t)
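As a quick comparison, the following sketch runs each of these estimators, plus psmatch, on the same simulated data and prints the estimated ATE. The data-generating process is hypothetical, and its true effect is 1. We assume the ATE is stored as the first element of e(b), which matches the order in which teffects lists it in its output. The local name specs and scalar maxdev are illustrative.

```stata
* Hypothetical simulated data with the same structure as psm.dta
clear all
set seed 20141212
set obs 1000
generate x1 = rnormal()
generate x2 = rnormal()
generate t = (x1 + x2 + rnormal() > 0)
generate y = x1 + x2 + t + rnormal()

* Each string is the part of the command that follows "teffects"
local specs `" "ra (y x1 x2) (t)" "ipw (y) (t x1 x2)" "aipw (y x1 x2) (t x1 x2)" "ipwra (y x1 x2) (t x1 x2)" "nnmatch (y x1 x2) (t)" "psmatch (y) (t x1 x2)" "'
scalar maxdev = 0
foreach s of local specs {
    quietly teffects `s'
    matrix b = e(b)
    * Assumes the ATE is the first element of e(b)
    display %-30s "`s'" %9.4f b[1,1]
    * Track the largest distance from the true effect of 1
    scalar maxdev = max(maxdev, abs(b[1,1] - 1))
}
```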
Complete Example Code

The following is the complete code for the examples in this article.
clear all
use http://www.ssc.wisc.edu/sscc/pubs/files/psm
ttest y, by(t)
reg y x1 x2 t
psmatch2 t x1 x2, out(y)
teffects psmatch (y) (t x1 x2, probit), atet
teffects psmatch (y) (t x1 x2)
psmatch2 t x1 x2, out(y) logit ate
teffects psmatch (y) (t x1 x2), atet
use http://www.ssc.wisc.edu/sscc/pubs/files/psm, clear
teffects psmatch (y) (t x1 x2), gen(match)
predict ps0 ps1, ps
predict y0 y1, po
predict te
l if _n==1 | _n==467
use http://www.ssc.wisc.edu/sscc/pubs/files/psm, clear
psmatch2 t x1 x2, out(y) logit
reg y x1 x2 t [fweight=_weight]
teffects psmatch (y) (t x1 x2), gen(match)
gen ob=_n
save fulldata,replace
keep if t
keep match1
bysort match1: gen weight=_N
by match1: keep if _n==1
ren match1 ob
merge 1:m ob using fulldata
replace weight=1 if t
assert weight==_weight
reg y x1 x2 t [fweight=weight]
reg y x1 x2 t
teffects ra (y x1 x2) (t)
teffects ipw (y) (t x1 x2)
teffects aipw (y x1 x2) (t x1 x2)
teffects ipwra (y x1 x2) (t x1 x2)
teffects nnmatch (y x1 x2) (t)
Last Revised: 11/13/2013
2009-2014 UW Board of Regents, University of Wisconsin - Madison