Evaluation
Nur Cahyadi
Department of Economics
Universitas Padjadjaran
Readings
https://www.povertyactionlab.org/research-resources/teaching
https://www.betterevaluation.org/en/managers_guide
Outline
• Purpose of Impact Evaluation (IE)
• How to conduct IE: the big picture
• Introduction to IE problem
• Validity and threats to validity
• Overview of IE methods
Purpose of Impact Evaluation (IE)
Motivation
• An important question for policymakers, donors, taxpayers, NGOs and the poor…
• How effective are policy reforms and interventions in
improving social outcomes and service delivery?
• Why evaluate interventions and policy reforms?
• Improve design and effectiveness of interventions
• Reallocate funds to optimize social welfare
• Accountability
• Increasing (public) demand for “hard evidence”
of the effectiveness of public programs on
outcomes
• Develop knowledge base
• Political support
Monitoring and evaluation
• Monitoring
• Setting goals, indicators and targets for programs
• Continuous process
• Evaluation
• Periodic assessment of projects, programs and policies
• Answer specific questions
• Descriptive questions
• Normative questions
• Cause-and-effect questions
• How does a program affect a specific outcome?
In this course we look at (quantitative) impact
evaluation
Answer these questions
• Why is evaluation valuable?
Improve program/policy implementation
o Budget negotiations
o Informing beliefs and the press
o Results agenda and aid effectiveness
Social Reasons
o Increases transparency & accountability
o Supports public sector reform / innovation
Political Reasons
o Credibility / break with “bad” practices of the past
Definition of IE
Governments and non-governmental organizations (NGOs) often implement projects,
programs or policies that are intended to change individuals’ economic or social
outcomes.
Governments and others would like to know whether these interventions actually achieve their intended outcomes, so that they can
compare the relative effectiveness of different programs, as well as
compare these programs’ benefits to their costs.
This question can be modified to obtain a definition of impact evaluation…
For this course,
2. This is not an easy task, because it is difficult to measure the causal impact of a project, program or policy, but hopefully it is not an impossible one.
3. Usually, we can measure outcomes of interest for people who participate in the project or program, or are affected by the policy. The hard part is to measure what their outcomes would have been if they had not participated. This is often referred to as the counterfactual outcome, or simply the counterfactual.
Association does not prove causality!
Causal Inference and Counterfactual
Example: Does wearing eyeglasses cause better student performance?
One might observe that students who wear eyeglasses outperform students who
do not on exams. But students who wear eyeglasses might study harder than
those who do not (which may be why they need eyeglasses in the first place).
Their better performance may simply be due to the fact that they work harder.
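To see how an association can appear without any causal effect, here is a small simulation under made-up assumptions: a hypothetical "study effort" variable raises both the chance of wearing eyeglasses and exam scores, while eyeglasses have no effect on scores at all by construction. All parameters are illustrative, not estimates from real data.

```python
import random

random.seed(1)


def simulate_students(n=10_000):
    """Hypothetical data: effort drives both eyeglasses and scores.
    Eyeglasses have zero causal effect on scores by construction."""
    students = []
    for _ in range(n):
        effort = random.random()                        # 0 = little study, 1 = a lot
        glasses = random.random() < 0.2 + 0.6 * effort  # more study -> more likely to wear glasses
        score = 50 + 30 * effort + random.gauss(0, 5)   # note: no glasses term at all
        students.append((glasses, score))
    return students


def mean_score(students, wears_glasses):
    scores = [s for g, s in students if g == wears_glasses]
    return sum(scores) / len(scores)


students = simulate_students()
gap = mean_score(students, True) - mean_score(students, False)
print(f"Glasses minus no-glasses gap: {gap:.1f} points")
```

With these made-up parameters the gap comes out positive (around 6 points) even though the true effect of eyeglasses is zero: it reflects the difference in effort between the two groups, which is exactly the confounding the eyeglasses example warns about.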
Ideally, the causal impact of a program P on an outcome Y for the unit of analysis
(e.g., a student, a household or a village) is the difference between the outcome
with the program (Y | P = 1) and the outcome without the program (Y | P = 0) for
the same unit. We can define this as:
∆ = (Y | P = 1) – (Y | P = 0).
The problem is that we never observe both (Y| P= 1) and (Y| P= 0) at the same time. In
particular, we can observe (Y| P= 1) for program participants but we cannot observe (Y| P
= 0) for participants.
For example…
Even the performance of her twin sister is not a very good counterfactual, because even identical twins might experience different learning environments (for example, why does one twin wear eyeglasses while the other does not?).
Discussion questions:
• A patient died 3 days after he took a pill newly prescribed by his doctor.
Can we infer from this fact that this new drug caused the death of this
patient? Why or why not?
(Hint: Can you think of any factors that we need to rule out in order to establish that the new drug caused this patient’s health outcome?)
• Last year, a city increased the size of its police force by 20%. This year, we observed that the crime rate in this city increased by 15%. Can we conclude that the increase in the police force caused the increase in the crime rate? Why or why not?
What Is Monitoring?
Monitoring …
• compares actual progress with what was planned so that adjustments can be made in implementation; and
Examples of Monitoring Questions
The key question in monitoring is, “Are we doing what we are supposed to be doing?”, rather than “Should we be doing this?”
Have the outputs been delivered as planned, and is their quality as specified?
We agreed to train 150 teachers by the end of the project’s second year. Have we done so?
We are supposed to have 25 performance-based contracts signed within the project’s first
12 months. How many do we have, and how many are being implemented?
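The monitoring questions above boil down to comparing actual figures with planned targets. A minimal sketch, using the two targets from the examples (150 teachers, 25 contracts) and made-up "actual" figures, might look like this; the `Indicator` class and its methods are hypothetical helpers, not part of any particular monitoring system.

```python
from dataclasses import dataclass


@dataclass
class Indicator:
    name: str
    target: int  # planned value from the project design
    actual: int  # value reported so far (made-up numbers below)

    def share_of_target(self) -> float:
        return self.actual / self.target

    def on_track(self) -> bool:
        return self.actual >= self.target


indicators = [
    Indicator("Teachers trained by end of year 2", target=150, actual=120),
    Indicator("Performance-based contracts signed in first 12 months", target=25, actual=25),
]

for ind in indicators:
    status = "on track" if ind.on_track() else "behind plan"
    print(f"{ind.name}: {ind.actual}/{ind.target} ({ind.share_of_target():.0%}), {status}")
```

Note that this only answers "are we doing what we planned?"; it says nothing about whether the training or the contracts changed any outcomes, which is a question for evaluation.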
The Difference between Monitoring and Evaluation (2)
What Is Evaluation?
Evaluation…
Formative vs. summative evaluation
Evaluation Questions
When we are thinking about which evaluation approach is most appropriate to use, we should always ask ourselves this…
The focus of this course is on impact evaluation, yet impact evaluation will not
give us answers to many types of questions we have. Other types of evaluation
– or even monitoring – approaches will be more appropriate for answering
different types of questions.
Three Types of Evaluation Questions
Evaluations can address three types of questions
(Imas and Rist 2009):
From the impact evaluation definition given earlier, we know that impact evaluations are a particular type of evaluation that seeks to answer cause-and-effect questions.
Seven Purposes (Uses) of Impact Evaluation
Some Strengths and Weaknesses of Impact Evaluations (with examples)
A. Strengths
B. Weaknesses
1. Some programs or policies are hard to evaluate. For example, a program that was implemented nationwide, with no restrictions on who can participate, will be difficult to evaluate if there is some major shock to the economy exactly at the time the program started.
3. Some methods (e.g., RCTs) offer little insight into WHY a program worked or did not work. Thus, monitoring data and/or qualitative approaches should also be used to help interpret the results of impact evaluations (however, this approach has not been widely adopted).
4. Impact evaluations can be expensive (e.g., RCTs require new data collection) and/or hard to implement (e.g., sometimes the sample is too small to detect a statistically significant impact, or for difference-in-differences (DID) analysis it may be hard to collect baseline data). Proper monitoring and better design are needed to reduce costs.
How to Conduct an Impact Evaluation: The Big Picture
Introduction:
Yet this question raises other questions, such as: What are the outcome
variables of interest? It also assumes that the program/project/policy is clearly
defined.
This lecture provides an overview of the steps that any organization must take to implement an impact evaluation. Note that the order of the steps is not rigid; at times it may be useful to go back to an earlier step based on what happens at a later step.
Here are the 5 steps that will be covered in this session:
1. Clarify what the program and the outcome variables of interest are.
For any new program, a set of outcomes that the program is intended to change is established as the program is designed. If the program changes before or during implementation, the relevant outcomes may also change.
There may be impacts on outcome variables that the program was not
designed to change. These “unintended” outcomes may be “good”
or “bad”.
Some outcomes are immediate (e.g. increase in hand washing) while the
real/ultimate intended outcomes occur later (e.g. reduced rate of diarrhea).
After Step 1, the team that is doing the impact evaluation should have a good understanding of what the program is and what its expected outcomes are. Then the evaluation question (or set of evaluation questions) becomes:
What is the causal impact of the “P” program (or project or policy) on
outcome variables “X, Y, and Z”?
Step 2: Forming a Theory of Change to Refine the Evaluation Questions
A theory of change does not have to be very complicated. In its simplest form
it describes the sequence of events that one expects the program (or project or
policy) to put in place that will lead to the desired outcomes.
Some comments on developing a theory of change:
4. It is also useful to identify the conditions needed for the intervention to work. Researchers should collect data on these conditions to see whether they are met.
Examples of a Results Chain (Logic Model)
[Figure: results chain, with its stages grouped under MONITORING and EVALUATION]
Here is some more detail on the 5 parts of the results chain (logic model):
Activities: what the project does with the inputs. Examples: building schools,
vaccinating children, training teachers, and developing plans and
partnerships.
You should fill this out for your proposed project in the exercise.
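As a starting point for that exercise, a results chain can be written down as a simple record. This sketch assumes the five links are inputs, activities, outputs, outcomes and final outcomes, as in common results-chain templates (only Activities is spelled out above), and it assumes monitoring covers the implementation links while impact evaluation covers the outcome links. The `ResultsChain` class and the hand-washing entries (reusing the Step 1 example) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ResultsChain:
    """Hypothetical results-chain (logic model) template with five links."""
    inputs: list[str]
    activities: list[str]
    outputs: list[str]
    outcomes: list[str]
    final_outcomes: list[str]

    def monitoring_side(self) -> dict[str, list[str]]:
        # Assumption: monitoring asks "are we doing what we planned?",
        # so it tracks the implementation links of the chain.
        return {"inputs": self.inputs, "activities": self.activities, "outputs": self.outputs}

    def evaluation_side(self) -> dict[str, list[str]]:
        # Assumption: impact evaluation asks whether results changed,
        # so it focuses on the outcome links of the chain.
        return {"outcomes": self.outcomes, "final_outcomes": self.final_outcomes}


# Hypothetical hand-washing program, reusing the Step 1 example outcomes
handwashing = ResultsChain(
    inputs=["budget", "community health workers", "soap"],
    activities=["run hand-washing sessions in villages"],
    outputs=["number of sessions held", "number of households reached"],
    outcomes=["increase in hand washing"],
    final_outcomes=["reduced rate of diarrhea"],
)

for side in (handwashing.monitoring_side(), handwashing.evaluation_side()):
    for link, items in side.items():
        print(f"{link}: {', '.join(items)}")
```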
Step 4: Formulate Specific Hypotheses for the Impact Evaluation
The outcomes, both short term and long term, must be very clearly defined.
Example: Mexico’s Progresa program should increase school enrollment
rates in the short run, and completed years of schooling in the long run.
Example: A program offers free regular check-ups for pregnant women from poor families. To evaluate the program’s impact on the health of newborn children, which of the following indicators is more relevant?
Indicator 1: Percentage of newborns with low birth weight (< 2,500 g).
Indicator 2: Percentage of newborns with cleft lip.
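Part of defining an outcome clearly is writing down exactly how it is computed. As a sketch, Indicator 1 could be calculated from birth weights recorded in grams, using the standard cut-off of less than 2,500 g for low birth weight. The function name and the birth weights are invented for illustration.

```python
LOW_BIRTH_WEIGHT_GRAMS = 2500  # low birth weight: less than 2,500 g at birth


def percent_low_birth_weight(weights_grams: list[float]) -> float:
    """Indicator 1: percentage of newborns weighing less than 2,500 g."""
    if not weights_grams:
        raise ValueError("no births recorded")
    low = sum(1 for w in weights_grams if w < LOW_BIRTH_WEIGHT_GRAMS)
    return 100 * low / len(weights_grams)


# Invented birth weights (grams) for two groups of newborns
treated_group = [3100, 2900, 2450, 3300, 2800]
comparison_group = [2400, 3000, 2350, 2700, 3200]

print(percent_low_birth_weight(treated_group))     # 20.0
print(percent_low_birth_weight(comparison_group))  # 40.0
```

A gap between two such percentages only measures the program’s impact if the comparison group is a valid counterfactual for the treated group.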
Recall that almost all impact evaluations seek to answer the following question:
What is the causal impact of the program (or project or policy) on the outcomes of
interest?
Potential Outcomes, Observed Outcomes, and the Gain from Treatment
Let Y denote the outcome of interest, for example health status or school
enrollment.
For any person there are two potential values of Y, the value that would occur if he or
she were not “treated” by the program, which is denoted as Y0, and the value that would
occur if he or she were “treated”, which is denoted as Y1:
Both Y0 and Y1 are defined for ALL people. For example, we may observe Y1 for a treated
person but if he or she had not been treated we would have observed Y0. Indeed, we
usually want to estimate Y1 – Y0, or at least an average of this for some population of
interest.
The main problem for impact evaluation is that for each person, we observe only Y0 or
Y1, but not both, at any point in time.
Another very important variable is P, which is a variable that indicates whether a person participates in the program (is “treated” by the program).
From now on, Y represents the observed value of Y. That is, for a person with P = 1 we have Y = Y1, and for a person with P = 0 we have Y = Y0.
Y = PY1 + (1 − P)Y0
The potential outcomes framework was first introduced by Fisher (1951) and Roy (1951). Early discussions of it appear in Quandt (1972) and Rubin (1978). It is often called the Roy model in economics or the Rubin Causal Model in statistics.
For any individual, the gain from treatment is defined as the difference between Y1 and
Y0. This can be denoted by Δ:
Δ= Y1 − Y0
Remember!
Because only one state is observed at any given time (that is, at any point in time for a given individual we observe either Y0 or Y1, but not both), the gain from treatment (Δ) is not directly observed for anyone. (Δ = the difference, or change, also known as ‘delta’.)
Estimating Δ can therefore be seen as a missing data problem. The evaluation literature has developed a variety of different approaches to solve this problem.
Be aware of the missing data problem…
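The switching equation Y = PY1 + (1 − P)Y0 and the missing data problem can be made concrete with a few made-up people whose two potential outcomes we pretend to know. The `Person` type and the helper functions are hypothetical names for this sketch; real data would only ever contain what `analyst_view` returns.

```python
from typing import NamedTuple, Optional


class Person(NamedTuple):
    y0: float  # potential outcome without the program
    y1: float  # potential outcome with the program
    p: int     # 1 if the person participates (is treated), 0 otherwise


def observed_outcome(person: Person) -> float:
    # Switching equation: Y = P*Y1 + (1 - P)*Y0
    return person.p * person.y1 + (1 - person.p) * person.y0


def analyst_view(person: Person) -> dict:
    """What an evaluator actually observes: P and Y; the other potential outcome is missing."""
    return {
        "P": person.p,
        "Y": observed_outcome(person),
        "Y1": person.y1 if person.p == 1 else None,
        "Y0": person.y0 if person.p == 0 else None,
    }


def gain(view: dict) -> Optional[float]:
    # Δ = Y1 - Y0 needs both potential outcomes, which real data never provides
    if view["Y1"] is None or view["Y0"] is None:
        return None
    return view["Y1"] - view["Y0"]


people = [Person(10.0, 14.0, 1), Person(12.0, 13.0, 0), Person(8.0, 15.0, 1)]
for person in people:
    view = analyst_view(person)
    true_gain = person.y1 - person.y0  # only knowable in a simulation
    print(view, "| gain from observed data:", gain(view), "| true gain:", true_gain)
```

Every row has exactly one potential outcome missing, so Δ cannot be computed for anyone from the observed data alone.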
Parameters of Interest
The impacts of a project, program or policy are likely to be different for different
individuals, so almost all impact evaluations calculate some kind of average effect. The
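different averages are often called “parameters of interest”.
The two most common parameters of interest for impact evaluation are:
ATE(X) = E[Y1 − Y0 | X] (the average treatment effect)
ATT(X) = E[Y1 − Y0 | X, P = 1] (the average treatment effect on the treated)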
In words, ATE(X) is the gain from the program that would be experienced on average if a randomly chosen person with characteristics X were assigned to participate in the program, and ATT(X) is the average gain experienced by the subset of individuals who actually participated in the program (for whom P = 1).
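Following the definitions in words, ATE(X) averages Y1 − Y0 over everyone with characteristics X, while ATT(X) averages it over participants only (P = 1). The six people below are invented and share the same X; in real data these averages cannot be computed this way, because one potential outcome is always missing.

```python
from statistics import mean

# Invented potential outcomes (Y0, Y1) and participation P for six people
# who share the same characteristics X.
people = [
    {"Y0": 10, "Y1": 14, "P": 1},
    {"Y0": 12, "Y1": 13, "P": 1},
    {"Y0": 8, "Y1": 15, "P": 0},
    {"Y0": 11, "Y1": 11, "P": 0},
    {"Y0": 9, "Y1": 12, "P": 1},
    {"Y0": 10, "Y1": 16, "P": 0},
]


def ate(rows):
    """Average gain Y1 - Y0 over everyone."""
    return mean(r["Y1"] - r["Y0"] for r in rows)


def att(rows):
    """Average gain Y1 - Y0 over participants only (P = 1)."""
    return mean(r["Y1"] - r["Y0"] for r in rows if r["P"] == 1)


print(f"ATE(X) = {ate(people):.2f}")  # 3.50
print(f"ATT(X) = {att(people):.2f}")  # 2.67
```

Whether ATT(X) ends up above or below ATE(X) depends on who participates; in this arbitrary example the participants happen to have smaller gains.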
Question: Suppose that participants in the program tend to be the people who receive
the greatest benefit from it. Which of the following would you expect?
ATT(X) > ATE(X)
ATT(X) = ATE(X)
ATT(X) < ATE(X)
Internal and External Validity
Definitions:
Often (though not always), there is a tradeoff between internal validity and
external validity, as depicted in this chart.
[Chart: a spectrum running from lab experiments (if well controlled) through field experiments and natural experiments to empirical analysis of observational data]
Consequences of Small Samples and Low Statistical Power
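For example, with an MDE of 0.2, the study is designed so that a true effect of 0.2 standard deviations would be detected (found statistically significant) with the chosen power, typically 80%. Larger true effects would be detected with even higher probability, while smaller true effects, say 0.15 standard deviations, could well be present but would be detected with a noticeably lower probability, so a study of this size may fail to find them significant.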
When conducting analysis, if no significant effect is found, this suggests that the treatment probably did not have an effect at least as large as the MDE, since such an effect would have been detected with high probability. The analyst cannot draw any conclusions about whether smaller effects might be present. It is important to clarify this when reporting results.
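The MDE depends on the sample size, the share of the sample assigned to treatment, the significance level and the desired power. A common textbook approximation for a simple two-arm, individually randomized comparison of means is MDE = (z_{1−α/2} + z_{power}) × sqrt(1 / (p(1 − p)N)), in standard-deviation units, where p is the treated share and N is the total sample size. This sketch assumes equal outcome variances in both arms, no clustering and no covariates; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def mde(n_total, share_treated=0.5, alpha=0.05, power=0.80):
    """Minimum detectable effect (SD units) for a two-sided difference-in-means test."""
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)  # about 1.96 + 0.84
    se = math.sqrt(1 / (share_treated * (1 - share_treated) * n_total))  # SE of the difference, SD units
    return multiplier * se


def required_n(target_mde, share_treated=0.5, alpha=0.05, power=0.80):
    """Smallest total sample size whose MDE is at or below target_mde."""
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil((multiplier / target_mde) ** 2 / (share_treated * (1 - share_treated)))


def power_at(true_effect, n_total, share_treated=0.5, alpha=0.05):
    """Approximate probability of a significant positive result for a given true effect (SD units)."""
    z = NormalDist()
    se = math.sqrt(1 / (share_treated * (1 - share_treated) * n_total))
    return 1 - z.cdf(z.inv_cdf(1 - alpha / 2) - true_effect / se)


n = required_n(0.2)
print(n)                            # 785
print(round(power_at(0.20, n), 2))  # 0.8
print(round(power_at(0.15, n), 2))  # 0.56
```

Under these assumptions, about 785 people are needed for an MDE of 0.2, and a true effect of 0.15 would then be detected only a little more than half the time, which is why a non-significant result says little about effects smaller than the MDE.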
Final Comment on Sample Sizes:
Typical Features
Types | Random assignment | Before-After | Comparison Group
Experimental
Quasi-Experimental
Non-Experimental
? ?
What we have discussed
• Purpose of Impact Evaluation (IE)
• How to conduct IE: the big picture
• Introduction to IE problem
• Validity and threats to validity
• Overview of IE methods
Thank You