
12/12/2014

Propensity Score Matching in Stata using teffects



For many years, the standard tool for propensity score matching in Stata has been the psmatch2 command, written by Edwin Leuven and Barbara Sianesi. However, Stata 13 introduced a new teffects command for estimating treatment effects in a variety of ways, including propensity score matching. The teffects psmatch command has one very important advantage over psmatch2: when it calculates standard errors, it takes into account the fact that propensity scores are estimated rather than known. This often turns out to make a significant difference, and sometimes in surprising ways. We thus strongly recommend switching from psmatch2 to teffects psmatch, and this article will help you make the transition.

An Example of Propensity Score Matching


Run the following command in Stata to load an example data set:

use http://ssc.wisc.edu/sscc/pubs/files/psm
It consists of four variables: a treatment indicator t, covariates x1 and x2, and an outcome y. This is constructed data, and the effect of the treatment is in fact a one-unit increase in y. However, the probability of treatment is positively correlated with x1 and x2, and both x1 and x2 are positively correlated with y. Thus simply comparing the mean value of y for the treated and untreated groups badly overestimates the effect of treatment:

ttest y, by(t)
(Regressing y on t, x1, and x2 will give you a pretty good picture of the situation.)

The psmatch2 command will give you a much better estimate of the treatment effect:

psmatch2 t x1 x2, out(y)

----------------------------------------------------------------------------------------
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
               y  Unmatched |  1.8910736  -.423243358   2.31431696   .109094342    21.21
                        ATT |  1.8910736   .871388246   1.01968536   .173034999     5.89
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.
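To see the confounding described above for yourself, here is a minimal sketch. It assumes the example data set still loads from the URL given earlier. It compares the covariates across groups and then runs the regression mentioned in the parenthetical.

```stata
* Load the example data set used throughout this article
use http://ssc.wisc.edu/sscc/pubs/files/psm, clear

* x1 and x2 differ systematically between the treated and untreated
* groups, which is why the raw difference in y overstates the effect
ttest x1, by(t)
ttest x2, by(t)

* Control for x1 and x2 directly; the coefficient on t should land
* much closer to the true effect of 1 than the raw difference does
regress y t x1 x2
```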
The teffects Command
You can carry out the same estimation with teffects. The basic syntax of the teffects command when used for propensity score matching is:

teffects psmatch (outcome) (treatment covariates)


In this case the basic command would be:

teffects psmatch (y) (t x1 x2)


However, the default behavior of teffects is not the same as that of psmatch2, so we'll need to use some options to get the same results.

First, psmatch2 by default reports the average treatment effect on the treated (which it refers to as ATT). The teffects command by default reports the average treatment effect (ATE), but it will calculate the average treatment effect on the treated (which it refers to as ATET) if given the atet option.

Second, psmatch2 by default uses a probit model for the probability of treatment. The teffects command uses a logit model by default, but it will use probit if the probit option is applied to the treatment equation.

So to run the same model using teffects, type:

teffects psmatch (y) (t x1 x2, probit), atet

Treatment-effects estimation                   Number of obs      =      1000
Estimator      : propensity-score matching     Matches: requested =         1
Outcome model  : matching                                     min =         1
Treatment model: probit                                       max =         1
------------------------------------------------------------------------------
             |              AI Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATET         |
           t |
   (1 vs 0)  |   1.019685   .1227801     8.30   0.000     .7790407     1.26033
------------------------------------------------------------------------------

http://www.ssc.wisc.edu/sscc/pubs/stata_psmatch.htm#StandardErrors

The average treatment effect on the treated is identical, other than being rounded at a different place. But note that teffects reports a very different standard error (we'll discuss why that is shortly), plus a Z-statistic, p-value, and 95% confidence interval rather than just a T-statistic.

Running teffects with the default options gives the following:

teffects psmatch (y) (t x1 x2)

Treatment-effects estimation                   Number of obs      =      1000
Estimator      : propensity-score matching     Matches: requested =         1
Outcome model  : matching                                     min =         1
Treatment model: logit                                        max =         1
------------------------------------------------------------------------------
             |              AI Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATE          |
           t |
   (1 vs 0)  |   1.019367   .1164694     8.75   0.000     .7910912    1.247643
------------------------------------------------------------------------------

This is equivalent to:

psmatch2 t x1 x2, out(y) logit ate

----------------------------------------------------------------------------------------
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
               y  Unmatched |  1.8910736  -.423243358   2.31431696   .109094342    21.21
                        ATT |  1.8910736   .930722886   .960350715   .168252917     5.71
                        ATU | -.423243358  .625587554   1.04883091            .        .
                        ATE |                           1.01936701            .        .
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.

The ATE from this model is very similar to the ATT/ATET from the previous model. But note that psmatch2 is reporting a somewhat different ATT in this model. The teffects command reports the same ATET if asked:

teffects psmatch (y) (t x1 x2), atet

Treatment-effects estimation                   Number of obs      =      1000
Estimator      : propensity-score matching     Matches: requested =         1
Outcome model  : matching                                     min =         1
Treatment model: logit                                        max =         1
------------------------------------------------------------------------------
             |              AI Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATET         |
           t |
   (1 vs 0)  |   .9603507   .1204748     7.97   0.000     .7242245    1.196477
------------------------------------------------------------------------------

Standard Errors
The output of psmatch2 includes the following caveat:

Note: S.E. does not take into account that the propensity score is estimated.

A recent paper by Abadie and Imbens (2012. Matching on the estimated propensity score. Harvard University and National Bureau of Economic Research) established how to take into account that propensity scores are estimated, and teffects psmatch relies on their work. Interestingly, the adjustment for ATE is always negative, leading to smaller standard errors: matching based on estimated propensity scores turns out to be more efficient than matching based on true propensity scores. However, for ATET the adjustment can be positive or negative, so the standard errors reported by psmatch2 may be too large or too small.
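The following is a minimal sketch for putting the two standard errors side by side. It assumes psmatch2 is installed from SSC. This article does not give the names psmatch2 uses for its stored results, so the sketch simply lists them with return list rather than assuming a particular name. The matrix names b and V are illustrative.

```stata
use http://ssc.wisc.edu/sscc/pubs/files/psm, clear

* psmatch2 is user-written (ssc install psmatch2); probit model, ATT
psmatch2 t x1 x2, out(y)
* Show whatever stored results psmatch2 leaves behind in r()
return list

* Same model with teffects: this standard error accounts for the
* propensity score being estimated (Abadie-Imbens)
teffects psmatch (y) (t x1 x2, probit), atet
matrix b = e(b)
matrix V = e(V)
display "ATET = " b[1,1] "   AI robust SE = " sqrt(V[1,1])
```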

Handling Ties
Thus far we've used psmatch2 and teffects psmatch to do simple nearest-neighbor matching with one neighbor (and no caliper). However, this raises the question of what to do when two observations have the same propensity score and are thus tied for "nearest neighbor." Ties are common if the covariates in the treatment model are categorical or even integers.


By default, the psmatch2 command matches with one of the tied observations, but with the ties option it matches with all tied observations. The teffects psmatch command always matches with all ties. If your data set has multiple observations with the same propensity score, you won't get exactly the same results from teffects psmatch as you were getting from psmatch2 unless you go back and add the ties option to your psmatch2 commands. (At this time we are not aware of any clear guidance as to whether it is better to match with ties or not.)
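Here is a minimal sketch for checking whether ties are an issue in your own data. It assumes psmatch2 is installed, and pscore is an illustrative variable name. Because x1 and x2 are continuous in the example data, the duplicates report should not show any tied scores here.

```stata
use http://ssc.wisc.edu/sscc/pubs/files/psm, clear

* Estimate the logit propensity score that teffects psmatch uses by default
logit t x1 x2
predict pscore, pr

* Report how many observations share a propensity score with another one
duplicates report pscore

* Ask psmatch2 (ssc install psmatch2) to match with all tied
* observations, as teffects psmatch always does
psmatch2 t x1 x2, out(y) logit ties
teffects psmatch (y) (t x1 x2), atet
```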

Matching With Multiple Neighbors


By default teffects psmatch matches each observation with one other observation. You can change this with the nneighbor() (or just nn()) option. For example, you could match each observation with its three nearest neighbors with:

teffects psmatch (y) (t x1 x2), nn(3)
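To see which observations were used as matches, a short sketch can combine nn(3) with the gen() option described under Postestimation. Here match is an illustrative stem name.

```stata
use http://ssc.wisc.edu/sscc/pubs/files/psm, clear

* Match each observation with its three nearest neighbors and store the
* observation numbers of the matches in match1, match2, and match3
* (more variables would be created if there were ties)
teffects psmatch (y) (t x1 x2), nn(3) gen(match)
describe match*
```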

Postestimation
By default teffects psmatch does not add any new variables to the data set. However, a variety of useful variables can be created with options and with post-estimation predict commands. The following table lists the 1st and 467th observations of the example data set after some of these variables have been created. We'll refer to it as we explain the commands that created the new variables. Reviewing these variables is also a good way to make sure you understand exactly how propensity score matching works.

     +-----------------------------------------------------------------------------------------------------+
     |        x1          x2   t          y   match1        ps0        ps1          y0          y1          te |
     |-----------------------------------------------------------------------------------------------------|
  1. |  .0152526   -1.793022   0   -1.79457      467   .9081651   .0918349    -1.79457    2.231719    4.026289 |
467. | -2.057838    .5360286   1   2.231719      781    .907606    .092394   -.6012772    2.231719    2.832996 |
     +-----------------------------------------------------------------------------------------------------+
Start with a clean slate by typing:

use http://ssc.wisc.edu/sscc/pubs/files/psm, clear


The gen() option tells teffects psmatch to create a new variable (or variables). For each observation, this new variable will contain the number of the observation that observation was matched with. If there are ties, or if you told teffects psmatch to use multiple neighbors, then gen() will need to create multiple variables. Thus you supply the stem of the variable name, and teffects psmatch will add suffixes as needed.

teffects psmatch (y) (t x1 x2), gen(match)


In this case each observation is only matched with one other, so gen(match) only creates match1. Referring to the example output, the match of observation 1 is observation 467 (which is why those two are listed).

Note that these observation numbers are only valid in the current sort order, so make sure you can recreate that order if needed. If necessary, run the following before changing the sort order:

gen ob=_n

and then run:

sort ob

to restore it.
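Because match1 holds observation numbers, Stata's explicit subscripting can pull values directly from each observation's match. The following is a minimal sketch that assumes the data are still in the order teffects psmatch saw them. The variable names y_match and t_match are illustrative.

```stata
use http://ssc.wisc.edu/sscc/pubs/files/psm, clear
gen ob = _n   // remember the current sort order
teffects psmatch (y) (t x1 x2), gen(match)

* y[match1] is the outcome of each observation's match; this is only
* valid while the data remain in the original sort order
gen y_match = y[match1]
gen t_match = t[match1]

* Every observation is matched with someone from the other group
assert t_match != t

* If the data are ever re-sorted, restore the original order first
sort ob
```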
The predict command with the ps option creates two variables containing the propensity scores, that is, each observation's predicted probability of being in the control group or in the treated group:

predict ps0 ps1, ps

Here ps0 is the predicted probability of being in the control group (t=0), and ps1 is the predicted probability of being in the treated group (t=1). Observations 1 and 467 were matched because their propensity scores are very similar.
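This minimal sketch runs two checks on these scores. The first confirms that ps0 and ps1 are complementary probabilities. The second measures how far each observation's propensity score is from its match's. The variable name ps_gap is illustrative.

```stata
use http://ssc.wisc.edu/sscc/pubs/files/psm, clear
teffects psmatch (y) (t x1 x2), gen(match)
predict ps0 ps1, ps

* The two scores are complementary probabilities
assert reldif(ps0 + ps1, 1) < 1e-6

* Absolute difference between each observation's propensity score and
* that of its match (valid only in the original sort order)
gen ps_gap = abs(ps1 - ps1[match1])
summarize ps_gap
```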
The po option creates variables containing the potential outcomes for each observation:

predict y0 y1, po

Because observation 1 is in the control group, y0 contains its observed value of y. y1 is the observed value of y for observation 1's match, observation 467. The propensity score matching estimator assumes that if observation 1 had been in the treated group, its value of y would have been that of the observation in the treated group most similar to it (where "similarity" is measured by the difference in their propensity scores).

Observation 467 is in the treated group, so its value for y1 is its observed value of y, while its value for y0 is the observed value of y for its match, observation 781.

Running the predict command with no options gives the treatment effect itself:


predict te
The treatment effect is simply the difference between y1 and y0. You could calculate the ATE yourself (but emphatically not its standard error) with:

sum te
and the ATET with:

sum te if t
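The following minimal sketch ties these variables together. It assumes there are no ties, which holds for the example data, where each observation has a single match. It checks three things: that te is y1 minus y0, that each imputed potential outcome is the observed y of the match, and that the mean of te reproduces the ATE point estimate. The matrix name b is illustrative.

```stata
use http://ssc.wisc.edu/sscc/pubs/files/psm, clear
teffects psmatch (y) (t x1 x2), gen(match)
predict y0 y1, po
predict te

* The predicted treatment effect is the difference of the potential outcomes
assert reldif(te, y1 - y0) < 1e-6

* With a single match per observation, the imputed potential outcome is
* the observed y of the match: y1 for controls, y0 for treated
assert reldif(y1, y[match1]) < 1e-6 if t == 0
assert reldif(y0, y[match1]) < 1e-6 if t == 1

* The mean of te reproduces the ATE point estimate (not its standard error)
matrix b = e(b)
quietly summarize te
display "mean of te = " r(mean) "   teffects ATE = " b[1,1]
```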

Regression on the "Matched Sample"


Another way to conceptualize propensity score matching is to think of it as choosing a sample from the control group that "matches" the
treatment group. Any differences between the treatment and matched control groups are then assumed to be a result of the treatment.
Note that this gives the average treatment effect on the treated; to calculate the ATE you'd also create a sample of the treated group that matches the controls. Mathematically this is all equivalent to using matching to estimate what an observation's outcome would have been if it had been in the other group, as described above.
Sometimes researchers then want to run regressions on the "matched sample," defined as the observations in the treated group plus the
observations in the control group which were matched to them. We will discuss how this can be done without passing judgement on the
appropriateness or usefulness of the technique.

psmatch2 makes this easy by creating a _weight variable automatically. For observations in the treated group, _weight is 1. For observations in the control group, it is the number of observations from the treated group for which the observation is a match. If the observation is not a match, _weight is missing. _weight thus acts as a frequency weight (fweight) and can be used with Stata's standard weighting syntax. For example (starting with a clean slate again):

use http://ssc.wisc.edu/sscc/pubs/files/psm, clear


psmatch2 t x1 x2, out(y) logit
reg y x1 x2 t [fweight=_weight]
Observations with a missing value for _weight are omitted from the regression, so it is automatically limited to the matched sample.

teffects psmatch does not create a _weight variable, but it is possible to create one based on the match1 variable. Here is example code, with comments:

gen ob=_n // store the observation numbers for future use
save fulldata, replace // save the complete data set
teffects psmatch (y) (t x1 x2), gen(match) // create match1: the observation number of each observation's match
keep if t // keep just the treated group
keep match1 // keep just the match1 variable (the observation numbers of their matches)
bysort match1: gen weight=_N // count how many times each control observation is a match
by match1: keep if _n==1 // keep just one row per control observation
ren match1 ob // rename for merging purposes
merge 1:m ob using fulldata // merge back into the full data
replace weight=1 if t // set weight to 1 for treated observations
The resulting weight variable will be identical to the _weight variable created by psmatch2, as can be verified with:

assert weight==_weight

It is used in the same way and will give exactly the same results:

reg y x1 x2 t [fweight=weight]
Obviously this is a good bit more work than using psmatch2. If your propensity score matching model can be done using both teffects psmatch and psmatch2, you may want to run teffects psmatch to get the correct standard error and then psmatch2 if you need a _weight variable.

This regression has an N of 666: 333 from the treated group and 333 from the control group. However, it only uses 189 different observations from the control group. About 1/3 of them are the matches for more than one observation from the treated group and are thus duplicated in the regression (run tab weight if !t for details). Researchers sometimes use the norepl (no replacement) option in psmatch2 to ensure each observation is used just once, even though this generally makes the matching worse. To the best of our knowledge there is no equivalent with teffects psmatch.
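To see how often controls are reused in the matched-sample regression, this minimal sketch uses psmatch2's _weight, which, as the assert above verifies, equals the weight variable built from match1. It assumes psmatch2 is installed.

```stata
use http://ssc.wisc.edu/sscc/pubs/files/psm, clear
* psmatch2 is user-written: ssc install psmatch2
psmatch2 t x1 x2, out(y) logit

* Distinct control observations that serve as a match at least once
count if !t & _weight < .
* How many times each of those controls is reused
tab _weight if !t

* Frequency-weighted size of the matched sample (treated plus reused controls)
quietly summarize _weight
display "matched-sample N = " r(sum)
```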
The results of this regression leave somewhat to be desired:

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |    1.11891   .0440323    25.41   0.000      1.03245    1.205369
          x2 |    1.05594   .0417253    25.31   0.000       .97401     1.13787
           t |   .9563751   .0802273    11.92   0.000     .7988445    1.113906
       _cons |   .0180986   .0632538     0.29   0.775    -.1061036    .1423008
------------------------------------------------------------------------------

By construction, the coefficients on x1, x2, and t should all be 1. Regression using all the observations (reg y x1 x2 t rather than reg y x1 x2 t [fweight=weight]) does better in this case:

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.031167   .0346941    29.72   0.000     .9630853    1.099249
          x2 |   .9927759   .0333297    29.79   0.000     .9273715     1.05818
           t |   .9791484   .0769067    12.73   0.000     .8282306    1.130066
       _cons |   .0591595   .0416008     1.42   0.155    -.0224758    .1407948
------------------------------------------------------------------------------

Other Methods of Estimating Treatment Effects
While propensity score matching is the most common method of estimating treatment effects at the SSCC, teffects also implements Regression Adjustment (teffects ra), Inverse Probability Weighting (teffects ipw), Augmented Inverse Probability Weighting (teffects aipw), Inverse Probability Weighted Regression Adjustment (teffects ipwra), and Nearest Neighbor Matching (teffects nnmatch). The syntax is similar, though whether you need to specify variables for the outcome model, the treatment model, or both varies by method:

teffects ra (y x1 x2) (t)


teffects ipw (y) (t x1 x2)
teffects aipw (y x1 x2) (t x1 x2)
teffects ipwra (y x1 x2) (t x1 x2)
teffects nnmatch (y x1 x2) (t)
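The following minimal sketch compares several of these estimators on the example data using Stata's estimates store and estimates table. The stored-result names (psm, ra, ipw, aipw) are illustrative.

```stata
use http://ssc.wisc.edu/sscc/pubs/files/psm, clear

* Fit several ATE estimators and store each set of results
quietly teffects psmatch (y) (t x1 x2)
estimates store psm
quietly teffects ra (y x1 x2) (t)
estimates store ra
quietly teffects ipw (y) (t x1 x2)
estimates store ipw
quietly teffects aipw (y x1 x2) (t x1 x2)
estimates store aipw

* Side-by-side table of coefficients and standard errors
estimates table psm ra ipw aipw, b se
```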

Complete Example Code


The following is the complete code for the examples in this article.

clear all
use http://www.ssc.wisc.edu/sscc/pubs/files/psm
ttest y, by(t)
reg y x1 x2 t
psmatch2 t x1 x2, out(y)
teffects psmatch (y) (t x1 x2, probit), atet
teffects psmatch (y) (t x1 x2)
psmatch2 t x1 x2, out(y) logit ate
teffects psmatch (y) (t x1 x2), atet
use http://www.ssc.wisc.edu/sscc/pubs/files/psm, replace
teffects psmatch (y) (t x1 x2), gen(match)
predict ps0 ps1, ps
predict y0 y1, po
predict te
l if _n==1 | _n==467
use http://www.ssc.wisc.edu/sscc/pubs/files/psm, replace
psmatch2 t x1 x2, out(y) logit
reg y x1 x2 t [fweight=_weight]
gen ob=_n
save fulldata,replace
teffects psmatch (y) (t x1 x2), gen(match)
keep if t
keep match1
bysort match1: gen weight=_N
by match1: keep if _n==1
ren match1 ob
merge 1:m ob using fulldata
replace weight=1 if t
assert weight==_weight
reg y x1 x2 t [fweight=weight]
reg y x1 x2 t
teffects ra (y x1 x2) (t)
teffects ipw (y) (t x1 x2)
teffects aipw (y x1 x2) (t x1 x2)
teffects ipwra (y x1 x2) (t x1 x2)
teffects nnmatch (y x1 x2) (t)
Last Revised: 11/13/2013
2009-2014 UW Board of Regents, University of Wisconsin - Madison
