
Introduction to Impact Evaluation
Nur Cahyadi

Department of Economics
Universitas Padjadjaran
Readings
https://www.povertyactionlab.org/research-resources/teaching
https://www.betterevaluation.org/en/managers_guide
Outline
• Purpose of Impact Evaluation (IE)
• How to conduct IE: the big picture
• Introduction to IE problem
• Validity and threats to validity
• Overview of IE methods
Purpose of Impact Evaluation (IE)
Motivation
• Important question for policy makers, donors, taxpayers,
NGOs and the poor…
• How effective are policy reforms and interventions in
improving social outcomes and service delivery?
• Why evaluate interventions and policy reforms?
• Improve design and effectiveness of interventions
• Reallocate funds to optimize social welfare
• Accountability
• Increasing (public) demand for “hard evidence”
of the effectiveness of public programs on
outcomes
• Develop knowledge base
• Political support
Monitoring and evaluation
• Monitoring
• Setting goals, indicators and targets for programs
• Continuous process
• Evaluation
• Periodic assessment of projects, programs and policies
• Answer specific questions
• Descriptive questions
• Normative questions
• Cause-and-effect questions
– How does a program affect a specific outcome?
In this course we look at (quantitative) impact
evaluation
Answer these questions
• Why is evaluation valuable?

• What is the difference between Impact Evaluation and
o Monitoring?
o Other types of Evaluation?

• What makes a good impact evaluation?


Why Evaluate?
• Need evidence on what works
o Limited budget and bad policies could hurt
• Improve program/policy implementation
o Design (eligibility, benefits)
o Operations (efficiency & targeting)
• Information key to sustainability
o Budget negotiations
o Informing beliefs and the press
o Results agenda and aid effectiveness
• Social reasons
o Increases transparency & accountability
o Support of public sector reform / innovation
• Political reasons
o Credibility / break with “bad” practices of the past
Definition of IE
Governments and non-governmental organizations (NGOs) often implement projects,
programs or policies that are intended to change individuals’ economic or social
outcomes.

An important (and difficult to answer) question is:

How effective are these programs in changing those economic or social outcomes?

Governments and others would like to know the answer so that they can
compare the relative effectiveness of different programs, as well as
compare these programs’ benefits to their costs.
This question can be modified to obtain a definition of impact evaluation…
Definition of IE
For this course,

An impact evaluation is a study that attempts to measure the causal
impact of a project, program or policy on an outcome of interest
to governments and other interested parties.
Some comments on this definition:
1. Some people use the term “program evaluation” for impact evaluation. Some people use
the term “impact evaluation” to mean only randomized controlled trials (RCTs), but in this
course “impact evaluation” has a wider interpretation that includes quasi-experimental
methods/designs.

2. Measuring the causal impact of a project, program or policy is not an easy task, but
hopefully not an impossible one.
3. Usually, we can measure outcomes of interest for people who participate in the project or
program, or are affected by the policy. The hard part is to measure what their outcomes would
have been if they had not participated. This is often referred to as the counterfactual outcome,
or simply the counterfactual.
Association does not prove
causality!
Causal Inference and Counterfactual
Example: Does wearing eyeglasses cause better student
performance?
One might observe that students who wear eyeglasses outperform students who
do not on exams. But students who wear eyeglasses might study harder than
those who do not (which may be why they need eyeglasses in the first place).
Their better performance may simply be due to the fact that they work harder.

So to establish causality between a program (e.g., wearing eyeglasses) and an
outcome (student performance), we use impact evaluation methods to rule out
the possibility that any factors other than the program of interest (e.g., diligence)
explain the observed outcome.

Ideally, the causal impact of a program P on an outcome Y for the unit of analysis
(e.g., a student, a household or a village) is the difference between the outcome
with the program (Y | P = 1) and the outcome without the program (Y | P = 0) for
the same unit. We can define this as:

∆ = (Y | P = 1) – (Y | P = 0).
Causal Inference and Counterfactual

The problem is that we never observe both (Y | P = 1) and (Y | P = 0) for the same unit at the
same time. In particular, we can observe (Y | P = 1) for program participants but we cannot
observe (Y | P = 0) for them.
Causal Inference and Counterfactual

For example…

• For a student who has been wearing eyeglasses for a long time, we do not
know what her academic performance would have been had she never worn
eyeglasses.

• Even the performance of her twin sister is not a very good counterfactual,
because even identical twins might experience different learning
environments (for example, why does one twin wear eyeglasses while the
other does not?).
Causal Inference and Counterfactual

The term (Y | P = 0) represents the counterfactual, which is what
would have happened if a participant had not participated in the
program. Since (Y | P = 0) cannot be directly observed for program
participants, we need to estimate it.

(Y | P = 0) is usually estimated from a comparison group that is
similar to the treatment group in all respects except program participation.
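
The eyeglasses example can be illustrated with a small simulation. This is only a sketch under invented assumptions: the score equation, the true effect of 2 points, and the rule that diligent students wear eyeglasses more often are all hypothetical, and the function name `score_gap` is made up for this illustration. It compares a self-selected comparison group with one formed by coin flip.

```python
import random
import statistics

def score_gap(randomized, n=20000, true_effect=2.0, seed=0):
    """Difference in mean exam scores: eyeglass wearers minus non-wearers."""
    rng = random.Random(seed)
    wearers, non_wearers = [], []
    for _ in range(n):
        diligence = rng.gauss(0, 1)                 # unobserved by the evaluator
        if randomized:
            wears = rng.random() < 0.5              # eyeglasses assigned by coin flip
        else:
            # Assumed selection rule: diligent students wear eyeglasses more often.
            wears = rng.random() < (0.7 if diligence > 0 else 0.3)
        y0 = 60 + 5 * diligence + rng.gauss(0, 3)   # score without eyeglasses
        y1 = y0 + true_effect                       # score with eyeglasses
        if wears:
            wearers.append(y1)
        else:
            non_wearers.append(y0)
    return statistics.mean(wearers) - statistics.mean(non_wearers)

naive_gap = score_gap(randomized=False)
randomized_gap = score_gap(randomized=True)
print(f"self-selected comparison: {naive_gap:.2f}")
print(f"randomized comparison:    {randomized_gap:.2f}")
```

In this setup the self-selected comparison overstates the effect because it also picks up the difference in diligence, while the randomized comparison should land close to the assumed true effect.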
Causal Inference and Counterfactual

Discussion questions:

• A patient died 3 days after he took a pill newly prescribed by his doctor.
Can we infer from this fact that this new drug caused the death of this
patient? Why or why not?
(Hint: Which other factors would we need to rule out in order to establish
that the new drug caused this patient’s health outcome?)

• Last year, a city increased the size of its police force by 20%. This year,
we observed that the crime rate in this city increased by 15%. Can we
conclude that the increase in the police force caused the increase in the crime
rate? Why or why not?

• What are the counterfactuals in the above two cases?


The Difference between Monitoring and
Evaluation
The terms “monitoring” and “evaluation” are sometimes
used together, but they refer to two different processes.

What Is Monitoring?

Monitoring …

• is a continuous or regular collection and analysis of information about
implementation to review progress;

• compares actual progress with what was planned so that adjustments can be made in
implementation; and

• is an internal activity that is the responsibility of those who manage implementation.

Monitoring thus represents a good management practice.
The Difference between Monitoring and Evaluation (2)

Examples of Monitoring Questions

The key question in monitoring is “Are we doing what we are supposed to be doing?”, rather than
“Should we be doing this?”

• Is the activity being implemented well and on time?

• Have the outputs been delivered as planned, and is their quality as specified?

• We agreed to train 150 teachers by the end of the project’s second year. Have we done so?

• We are supposed to have 25 performance-based contracts signed within the project’s first
12 months. How many do we have, and how many are being implemented?
The Difference between Monitoring and Evaluation (2)

What Is Evaluation?

Evaluation…

• is a periodic assessment of the relevance, efficiency,
effectiveness, impact, and sustainability of a
project, program, or intervention;

• measures the effects of an intervention and
compares them with the goals and objectives of the
project, program, or intervention.
Two broad types of evaluation
• Formative
• Summative

Formative: information gathered during implementation
Summative: information gathered at the end of or after implementation
Two broad types of evaluation
Formative Evaluations

Formative evaluations are based on information gathered
during implementation and:

• provide information that helps to improve a program or intervention;

• focus primarily on activities, outputs, and short-term outcomes for
the purpose of monitoring progress and making mid-course
corrections; and

• are helpful in bringing suggestions for improvement to the attention
of program implementers.
Summative Evaluations

Summative evaluations are based on information gathered at the end of (or
after) implementation and:

• provide information that can be used to demonstrate results;

• focus on intermediate outcomes and impacts; the purpose is to
determine the value and worth of an intervention based on results; and

• are helpful in describing the quality and effectiveness of an intervention by
documenting its impact.

Evaluation Questions
When we are thinking about the type of evaluation approach that is most
appropriate to use, we should always ask ourselves this…

The focus of this course is on impact evaluation, yet impact evaluation will not
give us answers to many types of questions we have. Other types of evaluation
– or even monitoring – approaches will be more appropriate for answering
different types of questions.

The questions should lead the method/approach we choose, not
the other way around.

Three Types of Evaluation Questions
Evaluations can address three types of questions
(Imas and Rist 2009):

• Descriptive questions. The evaluation seeks to
determine what is taking place and describes processes, conditions,
organizational relationships, and stakeholder views. (“What is?”)

• Normative questions. The evaluation compares what is taking place to
what should be taking place; it assesses activities and whether or not
targets are accomplished. (“What should be?”)

• Cause-and-effect questions. The evaluation examines outcomes and tries
to assess what difference the intervention makes in outcomes. (“What
difference does the program make?”)

From the impact evaluation definition given earlier, we know that impact
evaluations are a particular type of evaluation that seeks to answer
cause-and-effect questions.

Seven Purposes (Uses) of Impact Evaluation

1. Measure the impacts of programs and policies on outcomes of interest

2. Provide information useful for improving existing programs and
policies

3. Based on results from other evaluations, or on estimates of behavioral
models, predict the impacts of proposed policies and programs

4. Develop a stock of knowledge on the effectiveness of various
projects, programs and policies (so that we know “what works” and
where we should be spending our money)

5. Generate political support for effective programs and policies

6. Generate political support for the government

Some Strengths and Weaknesses of Impact Evaluations (with examples)

A. Strengths

• Applies to the country in which the impact evaluation was
conducted, so the results are not “imported” from elsewhere

• When done correctly, one can have confidence in the results

• Provides specific quantitative impacts, not just general anecdotes

• Can be used for cost-benefit and cost-effectiveness analysis

B. Weaknesses
1. Some programs or policies are hard to evaluate. For example, a program
that was implemented nationwide, with no restrictions on who can
participate, will be difficult to evaluate if there is some major shock to the
economy exactly at the time the program started.

2. Short-run political danger if many programs appear to be ineffective.

3. Some methods (e.g. RCTs) offer little insight into WHY a program worked or
did not work. Thus, monitoring data and/or qualitative approaches should
also be used to help interpret results of impact evaluations (however, this
approach has not been widely adopted).

4. Can be expensive (e.g. RCTs require new data collection) and/or hard to
implement (e.g. sometimes the sample is too small to yield a significant
impact; or for difference-in-differences (DID) analysis it may be hard to collect
baseline data). Proper monitoring and better design are needed to reduce costs.

5. There could be ethical and political limitations. For example, in a
deworming program, leaving infected children in the control group
untreated (for the sake of the evaluation) may be considered unethical.
D. Retrospective vs. Prospective Evaluation

RCTs (often referred to as prospective evaluations, because they are
usually designed along with the program) are becoming much more
common for conducting impact evaluations in developing countries.
They have many advantages (e.g. better design to produce more
credible estimates) but tend to be relatively expensive (e.g. to collect
baseline data before the program has been implemented).

Studies that do not depend on randomization (often called
retrospective evaluations, for which the data used for impact evaluation
are usually collected after the program has been implemented) are
arguably less reliable but also less expensive.

How to Conduct an Impact Evaluation: The Big Picture
Introduction:

Almost all impact evaluations seek to answer the following question:

What is the CAUSAL impact of the program (or project or policy) on the
outcome variables of interest?

Yet this question raises other questions, such as: What are the outcome
variables of interest? It also assumes that the program/project/policy is clearly
defined.

This lecture provides an overview of the steps that any organization must take
to implement an impact evaluation. Note that the order of the steps is not
rigid; at times it may be useful to go back to an earlier step based on what
happens at a later step.
Here are the 5 steps that will be covered in this session:

1. Clarify what the program and the outcome variables of interest are.

2. Formulate a theory of change to refine the evaluation questions.

3. Depict the theory of change in a “results chain”.

4. Formulate specific hypotheses for the impact evaluation.

5. Select performance indicators for monitoring and evaluation.


Step 1. What is the Program? and What are the Outcomes of Interest?

We must be completely clear about:

1. What project, program or policy is being evaluated?

2. What outcome variables are of interest?


Both questions must be answered in consultation with all groups that have an
interest in the program (and thus have an interest in the program’s evaluation).
If not, the evaluation results may not be accepted by some of those groups.

Groups that will certainly have an interest in the program are:


1. Policymakers
2. Program managers
3. Program beneficiaries
What Are the Outcomes of Interest?

Any well-designed program should be clear about what social or economic
outcome variables it is trying to change. Some complications that may
arise are:

• There may be a large number of outcome variables, and it may be
expensive to measure all of them (for example, a health program could
affect many different types of health conditions).

• For any new program, a set of outcomes that the program is intended to
change is established as the program is designed. If the program
changes before or during implementation, the relevant outcomes may
also change.

• There may be impacts on outcome variables that the program was not
designed to change. These “unintended” outcomes may be “good”
or “bad”.

• Some outcomes are immediate (e.g. increase in hand washing) while the
real/ultimate intended outcomes occur later (e.g. reduced rate of diarrhea).

• Example: In the early 1980s, China manufactured and imported ultrasound
B-machines on a large scale. Their purpose was to monitor pregnancy,
check IUD placement, and perform other diagnostic tasks. Yet Chinese couples’
strong son preference, and China’s new one-child policy, led people to use
ultrasound to determine sex prenatally, which led to sex-selective
abortions (Chu 2001).

• Example: Cash transfer programs such as Program Keluarga Harapan (PKH) and
Mexico’s PROGRESA focused on health and education, but other outcomes (e.g.
labor supply) have also been analyzed.
Step 2: Forming a Theory of Change to Refine the Evaluation Questions

After Step 1, the team that is doing the impact evaluation should have a
good understanding of what the program is, and what its expected
outcomes are. Then the evaluation question (or set of evaluation questions)
becomes:

What is the causal impact of the “P” program (or project or policy) on
outcome variables “X, Y, and Z”?
Step 2: Forming a Theory of Change to Refine the Evaluation Questions

In reality, forming the evaluation question(s) is an iterative process, with
a first draft, a second draft, etc.

One way to get this process started is to form a theory of
change, which is simply an explanation of how the program
(or project or policy) should have an effect on the outcome
variables of interest. This is especially important if the
program is a new one that has not been implemented
elsewhere.

Ideally, program designers should have, at least implicitly, a
theory of change. If not, the evaluators will have to meet with program
designers to work one out.

A theory of change does not have to be very complicated. In its simplest form
it describes the sequence of events that one expects the program (or project or
policy) to put in place that will lead to the desired outcomes.

Some comments on developing a theory of change:

1. It is most important for programs designed to change participants’
behavior.

2. Ideally, it should be developed when the program is being designed.

3. It is useful to read reports on other, similar programs to get an idea of
what has happened with similar programs.

4. It is also useful to identify the conditions needed for the intervention to
work. Researchers should collect data on these conditions to see if they are
met.

5. Through the theory of change, program teams may develop more
expected indicators and measurement strategies in the results chain.
Step 3: Depicting a Theory of Change in a Results Chain (Logic Model)

1. Inputs or resources are provided to implement activities.

2. The activities should result in the production and delivery of services or
products, which are outputs.

3. These services or products should cause something to change in the
desired direction. In the near and medium term the program’s effects are
outcomes.

4. The longer-term effects of the program are referred to as impacts.

In short, a results chain (logic model, theory of change, program theory,
etc.) describes a plausible, causal relationship between inputs and how
these inputs will lead to the intended outcomes.
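A Visual Depiction of a Results Chain (Logic Model)
[Figure: results chain diagram]
Here Is a More Detailed View of a Results Chain (Logic Model)
[Figure: detailed results chain, with the parts covered by MONITORING and by EVALUATION marked]
Examples of a Results Chain (Logic Model)
[Figure: example results chains, with the parts covered by MONITORING and by EVALUATION marked]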
Here is some more detail on the 5 parts of the results chain (logic model):

• Inputs: the project’s resources: funds, staff, facilities/equipment, technical
expertise.

• Activities: what the project does with the inputs. Examples: building schools,
vaccinating children, training teachers, and developing plans and
partnerships.

• Outputs: the supply-side services or products generated by a project’s
activities. Examples: construction of 500 schools, training 1,500 nurses, a
20% increase in rice yields, and a plan to improve social protection programs.

• Outcomes: the results of activities and outputs. They reflect uptake/adoption/use
of outputs by the project’s intended beneficiaries. They are what is changed
by the project. Examples: children learn more because their schools have
higher quality, fewer illnesses due to cleaner water. “Good” outcomes can be
measured.

• Impacts: the long-term consequences of a project (sometimes called higher-level
outcomes). They usually refer to goal attainment. Examples: lower child
mortality, higher economic growth. Achieving impacts is usually beyond the
control of the project implementers; they should not be held accountable for
achieving them.
Example of a Results Chain: Conditional Cash Transfer Program (e.g.
Progresa, PKH)

INPUTS
Finance for cash transfers and to pay staff; staff to deliver services and check compliance
• Budget
• Staffing

ACTIVITIES
Collect data on eligibility; set up system to identify recipients; check compliance
• Data collection
• Train staff
• Explain program to eligible pop.

OUTPUTS
Provide funds for cash transfers; check compliance; health services
• Cash transfers delivered
• Health services
• Data collection

OUTCOMES
Increased use of education and health services; increased spending on goods
• Higher school enrollment
• Higher use of health services

IMPACTS
Reduction in current poverty and future poverty (due to higher human capital)
• Higher years of education
• Better health
• Lower poverty

Labels along the bottom of the chain: Activities of Implementing Agency (SUPPLY SIDE); Results (SUPPLY + BEHAVIOR)

You should fill this out for your proposed project in the exercise.
Step 4: Formulate Specific Hypotheses for the Impact Evaluation

After a clear results chain is developed, specific hypotheses can be
developed regarding the impact of the program (or project or policy) on the
outcome variables of interest. These hypotheses should be as specific as
possible:

• The outcomes, both short term and long term, must be very clearly
defined.
Example: Mexico’s Progresa program should increase school enrollment
rates in the short run, and completed years of schooling in the long run.

• The timing of when the outcome should appear should be specified.

• Perhaps allow for different outcomes on different subpopulations of
interest.
Example: A program to increase the number of female teachers should
have a larger impact on the enrollment of girls than on the enrollment of
boys.
Step 5: Select Performance Indicators for M&E

Outcome variables of interest are not always easy to
measure. It is very useful to measure progress in
program implementation as well as outcome variables.
“SMART” gives some ideas of how to select performance indicators:

• Specific: Indicators should reflect simple information that is communicable
and easily understood by the provider and the user of the information.
• Measurable: Be sure that it is possible to measure each outcome of
interest.
• Achievable: Indicators and their measurement units must be achievable
and sensitive to change during the life of the project.
• Relevant: Indicators should reflect information that is important and likely
to be used for management or immediate analytical purposes.
• Time-bound: Track progress at a desired frequency for a set period of time.
SPECIFIC: Indicators should reflect simple information that is
communicable and easily understood by the provider and the user of the
information.

Example: Consider a family planning program in a developing country where
families usually have more than two children. The outcome variable of interest
is the reduction in the number of children in a family. The program sets up new
clinics for women of child-bearing age, and provides low cost contraceptives
and health education. Which of the following indicators is more specific?

• Indicator 1: Increased number of small families.

• Indicator 2: Increased number of one-child and two-child families.

Example: Consider a program that aims to improve rural children’s health
status. Which of the following indicators is more measurable?

• Indicator 1: The number of children that became healthier.

• Indicator 2: The number of children who were ill in the past four weeks.
MEASURABLE: Be sure that it is possible to measure each outcome
of interest.

Example: Consider a program that aims to improve rural elementary schools’
physical facilities. Which of the following indicators is more measurable?

• Indicator 1: Percentage of schools that have poor quality of physical
facilities.
• Indicator 2: Percentage of schools that have leaking classrooms (or
are without electricity or libraries).

Example: Consider a program that aims to improve rural families’ living
standards. Which of the following indicators is more measurable?

• Indicator 1: The number of families that became wealthier.

• Indicator 2: The increase in monthly consumption of non-food items in the
target families.
ACHIEVABLE: Indicators and their measurement units must be
achievable and sensitive to change during the life of the project.

Example: A program offers free regular check-ups for pregnant women
from poor families. To evaluate this program’s impact on the health of
newborn children, which of the following indicators is more achievable?
• Indicator 1: Percentage of newborns with low birth weight (< 2,500 g).
• Indicator 2: Percentage of children of these women receiving daily iron
supplementation from 4 up to 24 months (note: a body’s iron store lasts for
4 to 6 months).
Example: Consider a school intervention program that provides free extra
after-school tutoring services to poor-performing students in 100 primary
schools. Which of the following indicators is more achievable?
• Indicator 1: A 0.2 standard deviation increase in test scores of students on
a standardized exam administered one year after the program started.
• Indicator 2: A 20% increase in the percentage of students who attend university.
RELEVANT: Indicators should reflect information that is important
and likely to be used for management or immediate analytical
purposes.

Example: A program offers free regular check-ups for pregnant women from
poor families. To evaluate this program’s impact on the health of newborn
children, which of the following indicators is more relevant?

• Indicator 1: Percentage of newborns with low birth weight (< 2,500 g).
• Indicator 2: Percentage of newborns with cleft lip.

Example: A program offers free school lunches to students in a developing
country where malnutrition is prevalent. To evaluate this program, which of
the following indicators is more relevant?

• Indicator 1: Weight-for-height z-score of students.

• Indicator 2: Students’ time spent participating in sports after school (which
may increase due to more nutrition intake).
TIME-BOUND: Track progress at a desired frequency for a set period of time.

Example: Consider a cash transfer program that provides cash
transfers to poor rural families in a developing country if the parents
in these families have their infants checked up at local health
facilities. Regarding the condition for receiving cash transfers, which
indicator is more time-bound?

• Indicator 1: whether a family had its infant checked up at least 4
times during the infant’s first six months of life.
• Indicator 2: whether a family had its infant checked up
when the infant was 1 month, 2 months, 4 months, and 6 months old.
Example: In the above example, to monitor the process of cash transfer, which
indicator is more time-bound?
• Indicator 1: whether an eligible household received a cash transfer
bimonthly for the first year of the program.
• Indicator 2: whether a family received a certain amount of cash sometime
during the first year of the program.
Introduction to the Evaluation Problem
A. The Evaluation Problem

Recall that almost all impact evaluations seek to answer the following question:

What is the causal impact of the program (or project or policy) on the outcomes of
interest?
Potential Outcomes, Observed Outcomes, and the Gain from Treatment

Let Y denote the outcome of interest, for example health status or school
enrollment.

For any person there are two potential values of Y, the value that would occur if he or
she were not “treated” by the program, which is denoted as Y0, and the value that would
occur if he or she were “treated”, which is denoted as Y1:

Y0 = value of Y if person is not treated

Y1 = value of Y if person is treated

Both Y0 and Y1 are defined for ALL people. For example, we may observe Y1 for a treated
person but if he or she had not been treated we would have observed Y0. Indeed, we
usually want to estimate Y1 – Y0, or at least an average of this for some population of
interest.
The main problem for impact evaluation is that for each person, we observe only Y0 or
Y1, but not both, at any point in time.
Another very important variable is P, which is a variable that indicates whether a person
participates in the program (is “treated” by the program).

P takes only two values:

P = 0: Individual was not treated (did not participate in the program)

P = 1: Individual was treated (did participate in the program)

From now on, Y represents the observed value of Y. That is, for a person with P = 1 we
have Y = Y1, and for a person with P = 0 we have Y = Y0.

Mathematically, this implies the following relationship between P, Y0, Y1 and Y:

Y = PY1 + (1 − P)Y0

The potential outcomes framework was first introduced by Fisher (1951) and Roy (1951).
Early discussions of it appear in Quandt (1972) and Rubin (1978). It is often called the Roy
Model in economics or the Causal Model in statistics.
For any individual, the gain from treatment is defined as the difference between Y1 and
Y0. This can be denoted by Δ:

Δ= Y1 − Y0
Because only one state is observed at any given time (that is, at any point in
time for a given individual we observe either Y0 or Y1, but not both), the gain from
treatment (Δ) is not directly observed for anyone.

Remember! Δ = the difference, or change (also known as ‘delta’).

Estimating Δ can therefore be seen as a missing data problem. The evaluation literature
has developed a variety of different approaches to solve this problem.

Be aware of the missing data problem…
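
To make the missing data problem concrete, the following sketch builds a tiny artificial dataset using the relationship Y = PY1 + (1 − P)Y0 and then shows what an evaluator would actually see. All numbers are invented, and the names `people` and `observed_view` are hypothetical helpers for this illustration only.

```python
import random

rng = random.Random(1)
people = []
for _ in range(6):
    y0 = round(rng.gauss(50, 10), 1)      # potential outcome if not treated
    y1 = round(y0 + rng.gauss(5, 2), 1)   # potential outcome if treated
    p = rng.randint(0, 1)                 # participation indicator P
    y = p * y1 + (1 - p) * y0             # observed outcome: Y = P*Y1 + (1-P)*Y0
    people.append({"P": p, "Y0": y0, "Y1": y1, "Y": y})

def observed_view(person):
    """What the evaluator sees: one potential outcome is always missing."""
    return {
        "P": person["P"],
        "Y": person["Y"],
        "Y0": person["Y0"] if person["P"] == 0 else None,
        "Y1": person["Y1"] if person["P"] == 1 else None,
    }

for person in people:
    print(observed_view(person))
```

Every printed row has a `None` in either the Y0 or the Y1 column, so Δ = Y1 − Y0 cannot be computed for any individual from the observed data alone.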
Parameters of Interest

The impacts of a project, program or policy are likely to be different for different
individuals, so almost all impact evaluations calculate some kind of average effect. The
different averages are often called “parameters of interest”.

The two most common parameters of interest for impact evaluation are:

• ATE (average impact of the treatment, or average treatment effect)

• ATT (average impact of treatment on the treated)


1. The average gain from the program for all persons in the population:

E[Y1 − Y0] = E[Δ]

This is commonly referred to as the average impact
of the treatment, or average treatment effect,
which is denoted by ATE. Note that E[ ] denotes
“expected value” (the mean of the expression in the
brackets).

2. The average gain from the program for program
participants:

E[Y1 − Y0 | P = 1] = E[Δ | P = 1]

This is called the average impact of treatment on the
treated, and is denoted by ATT.
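
A short derivation may help show why a simple comparison of participants and non-participants does not generally recover ATT. It uses only the definitions above; the label “selection bias” for the second term is standard terminology rather than something introduced in these slides.

```latex
\begin{align*}
E[Y \mid P=1] - E[Y \mid P=0]
  &= E[Y_1 \mid P=1] - E[Y_0 \mid P=0] \\
  &= \underbrace{E[Y_1 - Y_0 \mid P=1]}_{\text{ATT}}
   + \underbrace{E[Y_0 \mid P=1] - E[Y_0 \mid P=0]}_{\text{selection bias}}
\end{align*}
```

The second term is zero only if participants and non-participants would have had the same average outcome without the program, which is what a good comparison group is meant to ensure.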
In fact, it is sometimes possible to go further by estimating ATE and ATT for a person with
characteristics X, where X is a vector of observable variables. That is, we can define:

ATE(X) ≡ E[Y1 − Y0 | X] = E[Δ | X]

ATT(X) ≡ E[Y1 − Y0 | P = 1, X] = E[Δ | P = 1, X]

In words, ATE(X) is the gain from the program that would be experienced on
average if a randomly chosen person with characteristics X were assigned to participate in
the program, and ATT(X) is the average gain experienced by the subset of individuals with
characteristics X who actually participated in the program (for whom P = 1).

Question: Suppose that participants in the program tend to be the people who receive
the greatest benefit from it. Which of the following would you expect?

ATT(X) > ATE(X)

ATT(X) = ATE(X)

ATT(X) < ATE(X)
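
One way to explore the question is with simulated data in which the individual gains are known. This is a sketch under invented assumptions: gains are drawn from a normal distribution with mean 1, the participation rule is hypothetical, and conditioning on X is ignored for simplicity.

```python
import random
import statistics

rng = random.Random(42)
gains, participates = [], []
for _ in range(50000):
    delta = rng.gauss(1.0, 1.0)   # individual gain Y1 - Y0 (known only in a simulation)
    # Assumed participation rule: people with larger gains are more likely to join.
    p = 1 if delta + rng.gauss(0, 1) > 1.0 else 0
    gains.append(delta)
    participates.append(p)

ate = statistics.mean(gains)
att = statistics.mean([d for d, p in zip(gains, participates) if p == 1])
print(f"ATE = {ate:.2f}, ATT = {att:.2f}")
```

Changing the participation rule so that it does not depend on `delta` should bring the two averages together, which is a useful check on the intuition.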
Internal and External Validity
Definitions:

• A study has strong internal validity if the analyst is able to demonstrate
that an observed correlation between program participation and the
outcomes of interest represents a causal relationship.

• A study has strong external validity if a causal effect found in a study
environment can be generalized to other populations, places or times.

Internal validity deals with whether you
have successfully identified the causal
effects that the program or policy that you
are studying has on outcomes of interest.

External validity deals with whether or
not your findings provide a
reliable estimate of what similar programs’
effects would be in other contexts.
Intermediate Approaches

Often (though not always), there is a tradeoff between internal validity and external validity, as depicted in this chart.

Stronger internal validity  <---------->  Stronger external validity

Lab experiment (if well controlled)  |  Field experiment  |  Natural experiment  |  Empirical analysis of observational data
Consequences of Small Samples and Low Statistical Power

Finally, be wary of the consequences of a study with low statistical power. Studies with small sample sizes offer low precision. This means that even if the treatment has an effect on a given outcome, a study design with low precision may fail to detect this effect. Precision is lower if the study:

• Has a small sample size (total number of observations).

• Has a small number of clusters, where randomization is done by cluster.

• Measures outcomes with error.

• Has treatment and control groups of very different sizes.

Power analysis can be done before beginning an IE to estimate how precise your estimates will be, given a sample design. This is discussed in detail in the Sampling session, and reviewed here.
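
The link between sample size and the chance of detecting a true effect can be illustrated with a small Monte Carlo sketch. Everything here is an assumption for illustration: outcomes are standard normal, the true effect is 0.2 standard deviations, assignment is at the individual level, and the test uses a normal critical value rather than a t distribution. The function name simulated_power is hypothetical.

```python
import random
from statistics import NormalDist, fmean, variance


def simulated_power(n_per_arm, effect_sd, alpha=0.05, reps=400, seed=0):
    """Share of simulated experiments in which a true effect is detected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    detections = 0
    for _ in range(reps):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        treated = [rng.gauss(effect_sd, 1.0) for _ in range(n_per_arm)]
        diff = fmean(treated) - fmean(control)
        # Standard error of a difference in means between two groups.
        se = (variance(treated) / n_per_arm + variance(control) / n_per_arm) ** 0.5
        if abs(diff / se) > z_crit:
            detections += 1
    return detections / reps


# Same true effect (0.2 SD), two different sample sizes per group.
for n in (50, 500):
    print(n, simulated_power(n, effect_sd=0.2))
```

With 50 observations per group the true effect is detected in only a small share of simulated studies, while with 500 per group it is detected in most of them.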
The precision of estimates is summarized by the minimum detectable effect size (MDE). Measured in standard deviations, this is the smallest true effect that a given study design can be expected to detect as statistically significant. It is mostly a function of several parameters:
• Sample size
• Level of treatment assignment (individual level? clustered?)
• Correlation within or between clusters (ICC)
• Level of significance

For example, with an MDE of 0.2, the smallest effect that the study can reliably detect as significant is 0.2 standard deviations. An effect greater than 0.2 would likely be detected, while an effect smaller than 0.2 would likely not be. A difference in means of 0.15 might be observed, but this difference would typically not be statistically significant.

When conducting the analysis, if no significant effect is found, this suggests that the treatment did not have an effect at least as large as the MDE. The analyst cannot draw any conclusions about whether smaller effects might be present. It is important to clarify this when reporting results.
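
As a rough sketch of how the MDE depends on the design, the function below uses a common normal-approximation formula: MDE = (z for significance + z for power) × sqrt(design effect / (P(1 − P)N)), where the design effect is 1 + (cluster size − 1) × ICC. The choice of 80% power, the equal-sized clusters, and the function and parameter names are assumptions for this example and are not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist


def mde_in_sd(n_total, share_treated=0.5, alpha=0.05, power=0.80,
              cluster_size=1, icc=0.0):
    """Approximate minimum detectable effect, in standard deviations.

    Assumes a two-sided test, a normal approximation, and equal-sized
    clusters (cluster_size=1 means individual-level assignment).
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    # Inflation of the variance when treatment is assigned by cluster.
    design_effect = 1 + (cluster_size - 1) * icc
    se = sqrt(design_effect / (share_treated * (1 - share_treated) * n_total))
    return multiplier * se


print(round(mde_in_sd(1600), 3))                             # individual assignment
print(round(mde_in_sd(1600, cluster_size=20, icc=0.1), 3))   # clustered assignment
print(round(mde_in_sd(1600, share_treated=0.9), 3))          # very unequal groups
```

Holding the total sample fixed, both clustering and very unequal treatment and control groups raise the MDE, which matches the list of factors that lower precision.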
Final Comment on Sample Sizes:

Although larger sample sizes increase the precision of the estimated treatment effects, they also entail greater costs and more extensive supervision. If budgetary and other resources are limited, it may be better to go ahead with the analysis using a smaller than ideal sample than to abandon all attempts at evaluation.
Overview of Impact Evaluation Methods

Method: Randomized Evaluations
When to use? When one can randomly assign individuals, households or villages into treatment and control groups. Usually the experiment is designed along with the program.
Threats: Non-compliance with the treatment assignment, non-random attrition, control group contamination.

Method: Difference in Differences (Diff-in-Diff)
When to use? When there are data before and after program implementation, for participants and a comparison group, and both groups can be assumed to have a similar time trend for the outcome variable.
Threats: Violation of the common time trend assumption.

Method: (Propensity Score) Matching
When to use? When large samples and high quality data are available to construct a comparison group based on people’s probability of program participation.
Threats: Nonrandom selection into the program due to unobserved factors.

Method: Regression Discontinuity
When to use? When program participation is determined in part by an eligibility rule based on a continuous variable, e.g. test scores or poverty index, etc.
Threats: Small sample size; result may not be generalizable.

Method: Instrumental Variable
When to use? When there is non-compliance in a randomized evaluation, a randomized encouragement design, or a fuzzy regression discontinuity design.
Threats: Weak instruments; “local average treatment effects”.
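
To make the Diff-in-Diff idea concrete, the arithmetic can be sketched in a few lines. The group means below are hypothetical numbers chosen for illustration, and the function name diff_in_diff is not from the slides. The estimate is the change over time for participants minus the change over time for the comparison group, which is only a valid impact estimate if both groups would have followed a similar time trend without the program.

```python
def diff_in_diff(treat_before, treat_after, comp_before, comp_after):
    """Difference-in-differences estimate from four group means."""
    change_participants = treat_after - treat_before
    change_comparison = comp_after - comp_before  # proxy for the common time trend
    return change_participants - change_comparison


# Hypothetical mean outcomes before and after the program.
impact = diff_in_diff(treat_before=50.0, treat_after=62.0,
                      comp_before=48.0, comp_after=55.0)
print(impact)  # 12.0 - 7.0 = 5.0
```

A simple before-after comparison for participants alone would give 12.0 here; subtracting the comparison group's change of 7.0 removes the part attributed to the common time trend.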
Three Broad Types of Design
Before we get started on this session, let’s talk about the three broad categories of research designs that are used in different types of evaluations. In this course, we are focusing on the experimental (Randomized Evaluations, or RCTs) and quasi-experimental methods (Diff-in-Diff, Matching, RD, IV).

Typical Features

Types                 Random assignment   Before-After   Comparison Group
Experimental          Yes                 Yes            Yes
Quasi-Experimental    No                  Yes            Yes
Non-Experimental      No                  ?              ?
What we have discussed
• Purpose of Impact Evaluation (IE)
• How to conduct IE: the big picture
• Introduction to IE problem
• Validity and threats to validity
• Overview of IE methods
Thank You