use http://ssc.wisc.edu/sscc/pubs/files/psm
It consists of four variables: a treatment indicator t, covariates x1 and x2, and an outcome y. This is constructed data, and the effect of the treatment is in fact a one-unit increase in y. However, the probability of treatment is positively correlated with x1 and x2, and both x1 and x2 are positively correlated with y. Thus simply comparing the mean value of y for the treated and untreated groups badly overestimates the effect of treatment:
ttest y, by(t)
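(Regressing y on x1, x2, and t with reg y x1 x2 t puts the coefficient on t much closer to 1.)
The psmatch2 command also gives a much better estimate of the treatment effect:
psmatch2 t x1 x2, out(y)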
----------------------------------------------------------------------------------------
        Variable     Sample |     Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
               y  Unmatched |   1.8910736  -.423243358   2.31431696   .109094342    21.21
                        ATT |   1.8910736   .871388246   1.01968536   .173034999     5.89
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.
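To see why the naive comparison is biased, it can help to build a similar data set yourself. The following is a minimal sketch, not the code used to create the psm data: the seed, the coefficients, and the selection equation are all assumptions, chosen only so that treatment is more likely at high values of x1 and x2 and the true treatment effect is 1.

```stata
* Hypothetical data-generating process (assumed, not the one behind psm.dta)
clear
set seed 12345
set obs 1000
gen x1 = rnormal()
gen x2 = rnormal()
* the probability of treatment rises with x1 and x2
gen t = runiform() < invlogit(-1 + x1 + x2)
* the true treatment effect is exactly 1
gen y = t + x1 + x2 + rnormal()

* naive difference in means
ttest y, by(t)
* regression adjusting for x1 and x2
reg y x1 x2 t
* propensity score matching
teffects psmatch (y) (t x1 x2), atet
```

Because treated observations tend to have higher x1 and x2, the difference in means should come out well above 1, while the other two estimates should land much closer to it.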
The teffects Command
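You can carry out the same estimation with teffects. The basic syntax of the teffects command when used for propensity score matching is:
teffects psmatch (outcome) (treatment covariates)
By default psmatch2 reports the average treatment effect on the treated (ATT) using a probit model for treatment, while teffects reports the average treatment effect (ATE) using a logit model. To reproduce the psmatch2 results above, add the probit option to the treatment equation and request the ATET:
teffects psmatch (y) (t x1 x2, probit), atet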
Treatment-effects estimation                    Number of obs      =      1000
Estimator      : propensity-score matching      Matches: requested =         1
Outcome model  : matching                                      min =         1
Treatment model: probit                                        max =         1
------------------------------------------------------------------------------
             |              AI Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATET         |
           t |
   (1 vs 0)  |   1.019685   .1227801     8.30   0.000     .7790407     1.26033
------------------------------------------------------------------------------
http://www.ssc.wisc.edu/sscc/pubs/stata_psmatch.htm#StandardErrors
1/6
12/12/2014
The average treatment effect on the treated is identical, other than being rounded at a different place. But note that teffects reports a very different standard error (we'll discuss why that is shortly), plus a z-statistic, p-value, and 95% confidence interval rather than just a t-statistic.
Running teffects with the default options gives the following:
teffects psmatch (y) (t x1 x2)
Treatment-effects estimation                    Number of obs      =      1000
Estimator      : propensity-score matching      Matches: requested =         1
Outcome model  : matching                                      min =         1
Treatment model: logit                                         max =         1
------------------------------------------------------------------------------
             |              AI Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATE          |
           t |
   (1 vs 0)  |   1.019367   .1164694     8.75   0.000     .7910912    1.247643
------------------------------------------------------------------------------
This is equivalent to:
psmatch2 t x1 x2, out(y) logit ate
----------------------------------------------------------------------------------------
        Variable     Sample |     Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
               y  Unmatched |   1.8910736  -.423243358   2.31431696   .109094342    21.21
                        ATT |   1.8910736   .930722886   .960350715   .168252917     5.71
                        ATU | -.423243358   .625587554   1.04883091            .        .
                        ATE |                            1.01936701            .        .
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.
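The ATE from this model is very similar to the ATT/ATET from the previous model. But note that psmatch2 reports a somewhat different ATT in this model. The teffects command reports the same ATET when given the atet option:
teffects psmatch (y) (t x1 x2), atet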
Treatment-effects estimation                    Number of obs      =      1000
Estimator      : propensity-score matching      Matches: requested =         1
Outcome model  : matching                                      min =         1
Treatment model: logit                                         max =         1
------------------------------------------------------------------------------
             |              AI Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATET         |
           t |
   (1 vs 0)  |   .9603507   .1204748     7.97   0.000     .7242245    1.196477
------------------------------------------------------------------------------
Standard Errors
The output of psmatch2 includes the following caveat:
Note: S.E. does not take into account that the propensity score is estimated.
A recent paper by Abadie and Imbens (2012. Matching on the estimated propensity score. Harvard University and National Bureau of Economic Research) established how to take into account that propensity scores are estimated, and teffects psmatch relies on their work. Interestingly, the adjustment for ATE is always negative, leading to smaller standard errors: matching based on estimated propensity scores turns out to be more efficient than matching based on true propensity scores. However, for ATET the adjustment can be positive or negative, so the standard errors reported by psmatch2 may be too large or too small.
Handling Ties
Thus far we've used psmatch2 and teffects psmatch to do simple nearest-neighbor matching with one neighbor (and no caliper). However, this raises the question of what to do when two observations have the same propensity score and are thus tied for "nearest neighbor." Ties are common if the covariates in the treatment model are categorical or even integers.
The psmatch2 command by default matches with one of the tied observations, but with the ties option it matches with all tied observations. The teffects psmatch command always matches with all ties. If your data set has multiple observations with the same propensity score, you won't get exactly the same results from teffects psmatch as you were getting from psmatch2 unless you go back and add the ties option to your psmatch2 commands. (At this time we are not aware of any clear guidance as to whether it is better to match with ties or not.)
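One rough way to find out whether ties are a concern in your data is to look for repeated propensity scores after estimation. The sketch below assumes the example data set and the variable names used elsewhere in this article (match, ps0, ps1); it counts duplicates across all observations, so it only indicates that ties are possible, not which matches were actually tied.

```stata
use http://www.ssc.wisc.edu/sscc/pubs/files/psm, clear
teffects psmatch (y) (t x1 x2), gen(match)
* ps0 and ps1 hold each observation's predicted probabilities of t=0 and t=1
predict ps0 ps1, ps
* report how many observations share exactly the same propensity score
duplicates report ps1
* with ties, gen(match) creates match2, match3, ... in addition to match1
describe match*
```

If duplicates report shows only unique values and describe lists only match1, the ties option makes no difference for that model.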
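By default teffects psmatch matches each observation with one other observation. You can change this with the nneighbor() option.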
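As an illustration of nneighbor(), the following sketch matches each observation with its three nearest neighbors in the example data. The choice of three neighbors and the stem match are arbitrary assumptions for the example.

```stata
use http://www.ssc.wisc.edu/sscc/pubs/files/psm, clear
* match each observation with its three nearest neighbors
teffects psmatch (y) (t x1 x2), nneighbor(3) gen(match)
* one variable per requested neighbor (more if there are ties)
describe match*
```

The "Matches: requested" line in the output header should then show 3 rather than 1.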
Postestimation
By default teffects psmatch does not add any new variables to the data set. However, there are a variety of useful variables that can be created with options and post-estimation predict commands. The following table lists the 1st and 467th observations of the example data set after some of these variables have been created. We'll refer to it as we explain the commands that created the new variables. Reviewing these variables is also a good way to make sure you understand exactly how propensity score matching works.
     +-------------------------------------------------------------------------------------------------------+
     |        x1          x2   t          y   match1        ps0        ps1          y0         y1         te |
     |-------------------------------------------------------------------------------------------------------|
  1. |  .0152526   -1.793022   0   -1.79457      467   .9081651   .0918349    -1.79457   2.231719   4.026289 |
467. | -2.057838    .5360286   1   2.231719      781    .907606    .092394   -.6012772   2.231719   2.832996 |
     +-------------------------------------------------------------------------------------------------------+
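Start with a clean slate by typing:
use http://www.ssc.wisc.edu/sscc/pubs/files/psm, clear
The gen() option tells teffects psmatch to create a new variable (or variables). For each observation, this new variable will contain the number of the observation that observation was matched with. If there are ties or you told teffects psmatch to use multiple neighbors, then gen() will need to create multiple variables. Thus you supply the stem of the variable name, and teffects psmatch adds numeric suffixes as needed:
teffects psmatch (y) (t x1 x2), gen(match)
In this case each observation has a single match, so gen(match) creates only match1. In the example output, the match of observation 1 is observation 467.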
Note that these observation numbers are only valid in the current sort order, so make sure you can recreate that order if needed. If
necessary, run:
gen ob=_n
and then:
sort ob
to restore the current sort order.
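The predict command with the ps option creates two variables containing the propensity scores, that is, each observation's predicted probability of being in the control group (ps0) or the treated group (ps1):
predict ps0 ps1, ps
Observations 1 and 467 were matched because their propensity scores are very similar.
The po option creates variables containing the potential outcomes for each observation: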
predict y0 y1, po
Because observation 1 is in the control group, y0 contains its observed value of y. y1 is the observed value of y for observation 1's match, observation 467. The propensity score matching estimator assumes that if observation 1 had been in the treated group its value of y would have been that of the observation in the treated group most similar to it (where "similarity" is measured by the difference in their propensity scores).
Observation 467 is in the treated group, so its value for y1 is its observed value of y while y0 is the observed value of y for its match, observation 781.
Running the predict command with no options gives the treatment effect itself:
predict te
The treatment effect is simply the difference between y1 and y0. You could calculate the ATE yourself (but not its standard error) with:
sum te
and the ATET with:
sum te if t
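The relationship between te, y0, and y1 can be checked directly. This is a minimal sketch that assumes the example data set and the variable names used above; diff is a hypothetical helper variable introduced only for the check.

```stata
use http://www.ssc.wisc.edu/sscc/pubs/files/psm, clear
teffects psmatch (y) (t x1 x2), gen(match)
predict y0 y1, po
predict te
* diff should be zero up to float rounding if te is y1 minus y0
gen double diff = te - (y1 - y0)
sum diff
* the mean of te can be compared with the point estimate stored in e(b)
sum te
matrix list e(b)
```

The standard deviation that sum reports for te is not a substitute for the standard error from teffects psmatch, which accounts for the matching and for the estimated propensity score.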
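psmatch2 makes it easy to run a regression on the matched observations because it creates a _weight variable automatically. For observations in the treated group, _weight is 1. For observations in the control group it is the number of observations from the treated group for which the observation is a match. If the observation is not a match, _weight is missing. _weight thus acts as a frequency weight (fweight) and can be used with Stata's standard weighting syntax. For example (starting with a clean slate again):
use http://www.ssc.wisc.edu/sscc/pubs/files/psm, clear
psmatch2 t x1 x2, out(y) logit
reg y x1 x2 t [fweight=_weight]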
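Observations with a missing value for _weight are omitted from the regression, so it is automatically limited to the matched observations.
teffects psmatch does not create a _weight variable, but it is possible to create one based on the match1 variable. Here is example code, with comments:
* store the observation numbers and save the complete data set
gen ob=_n
save fulldata, replace
* create the match1 variable
teffects psmatch (y) (t x1 x2), gen(match)
* keep the treated group and the observation numbers of their matches
keep if t
keep match1
* count how many times each control observation is used as a match
bysort match1: gen weight=_N
by match1: keep if _n==1
* rename for merging, then merge back into the complete data set
ren match1 ob
merge 1:m ob using fulldata
* treated observations get a weight of 1
replace weight=1 if t
The resulting weight variable will be identical to the _weight variable created by psmatch2, as can be verified with: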
assert weight==_weight
It is used in the same way and will give exactly the same results:
reg y x1 x2 t [fweight=weight]
Obviously this is a good bit more work than using psmatch2. If your propensity score matching model can be done using both teffects psmatch and psmatch2, you may want to run teffects psmatch to get the correct standard error and then psmatch2 if you need a _weight variable.
This regression has an N of 666, 333 from the treated group and 333 from the control group. However, it only uses 189 different observations from the control group. About 1/3 of them are the matches for more than one observation from the treated group and are thus duplicated in the regression (run tab weight if !t for details). Researchers sometimes use the noreplacement option of psmatch2 to ensure that each observation is used just once, even though this generally makes the matching worse. To the best of our knowledge there is no equivalent with teffects psmatch.
The results of this regression leave somewhat to be desired:
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |    1.11891   .0440323    25.41   0.000      1.03245    1.205369
          x2 |    1.05594   .0417253    25.31   0.000       .97401     1.13787
           t |   .9563751   .0802273    11.92   0.000     .7988445    1.113906
       _cons |   .0180986   .0632538     0.29   0.775    -.1061036    .1423008
------------------------------------------------------------------------------
By construction the coefficients on x1, x2, and t should all be 1. Regression using all the observations (reg y x1 x2 t rather than reg y x1 x2 t [fweight=weight]) does better in this case:
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.031167   .0346941    29.72   0.000     .9630853    1.099249
          x2 |   .9927759   .0333297    29.79   0.000     .9273715     1.05818
           t |   .9791484   .0769067    12.73   0.000     .8282306    1.130066
       _cons |   .0591595   .0416008     1.42   0.155    -.0224758    .1407948
------------------------------------------------------------------------------
Other Methods of Estimating Treatment Effects
While propensity score matching is the most common method of estimating treatment effects at the SSCC, teffects also implements Regression Adjustment (teffects ra), Inverse Probability Weighting (teffects ipw), Augmented Inverse Probability Weighting (teffects aipw), Inverse Probability Weighted Regression Adjustment (teffects ipwra), and Nearest Neighbor Matching (teffects nnmatch). The syntax is similar, though it varies whether you need to specify variables for the outcome model, the treatment model, or both.
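As a sketch of how the syntax differs across these estimators, the commands below apply each one to the example data. Using x1 and x2 in both the outcome and the treatment model is an assumption made for illustration, not a recommendation for your own analysis.

```stata
use http://www.ssc.wisc.edu/sscc/pubs/files/psm, clear
* regression adjustment: covariates go in the outcome model only
teffects ra (y x1 x2) (t)
* inverse probability weighting: covariates go in the treatment model only
teffects ipw (y) (t x1 x2)
* augmented IPW and IPW regression adjustment: both models take covariates
teffects aipw (y x1 x2) (t x1 x2)
teffects ipwra (y x1 x2) (t x1 x2)
* nearest neighbor matching: matches on the covariates themselves
teffects nnmatch (y x1 x2) (t)
```

All of these report the ATE by default.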
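* Commands used in this article (psmatch2 is user-written: ssc install psmatch2)
clear all
use http://www.ssc.wisc.edu/sscc/pubs/files/psm

* naive comparison and regression
ttest y, by(t)
reg y x1 x2 t

* psmatch2 and the equivalent teffects commands
psmatch2 t x1 x2, out(y)
teffects psmatch (y) (t x1 x2, probit), atet
teffects psmatch (y) (t x1 x2)
psmatch2 t x1 x2, out(y) logit ate
teffects psmatch (y) (t x1 x2), atet

* postestimation variables
use http://www.ssc.wisc.edu/sscc/pubs/files/psm, clear
teffects psmatch (y) (t x1 x2), gen(match)
predict ps0 ps1, ps
predict y0 y1, po
predict te
list if _n==1 | _n==467

* regression using the _weight variable from psmatch2
use http://www.ssc.wisc.edu/sscc/pubs/files/psm, clear
psmatch2 t x1 x2, out(y) logit
reg y x1 x2 t [fweight=_weight]

* build the same weight from teffects psmatch
gen ob=_n
save fulldata, replace
teffects psmatch (y) (t x1 x2), gen(match)
keep if t
keep match1
bysort match1: gen weight=_N
by match1: keep if _n==1
ren match1 ob
merge 1:m ob using fulldata
replace weight=1 if t
assert weight==_weight
reg y x1 x2 t [fweight=weight]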