1 Jim Rugh and Michael Bamberger
RealWorld Evaluation: Designing Evaluations under Budget, Time, Data and Political Constraints
AEA Professional Pre-Conference Workshop, Anaheim, Nov. 1, 2011
Facilitated by Jim Rugh and Michael Bamberger
Note: This PowerPoint presentation and the summary chapter of the book are available at:
RealWorld Evaluation - Steps 1, 2 & 3

2 Workshop Objectives
1. The seven steps of the RealWorld Evaluation approach for addressing common issues and constraints faced by evaluators, such as: when the evaluator is not called in until the project is nearly completed and there was no baseline or comparison group; when the evaluation must be conducted with an inadequate budget and insufficient time; and when there are political pressures and expectations for how the evaluation should be conducted and what the conclusions should say.

3 Workshop Objectives
Defining what impact evaluation should be
Identifying and assessing various design options that could be used in a particular evaluation setting
Ways to reconstruct baseline data when the evaluation does not begin until the project is well advanced or completed
How to minimize threats to validity or adequacy by utilizing appropriate combinations of quantitative and qualitative approaches (i.e. mixed methods), with reference to the specific context of RealWorld evaluations

4 Workshop Objectives And, in particular, a preview of the new, 2nd edition of the RealWorld Evaluation book, which is scheduled to be published next month!

5 RealWorld Evaluation: Working Under Budget, Time, Data, and Political Constraints, 2nd Edition
A preview of the photo that will appear on the cover of the 2nd edition when it is published next month.

6 Workshop Objectives Note: In this workshop we focus on project-level impact evaluations. There are, of course, many other purposes, scopes, evaluands and types of evaluations. Some of these methods may apply to them, but our examples will be based on project impact evaluations, most of them in the context of developing countries.

7 Workshop agenda
1. Introduction [10 minutes]
2. Brief summary of the RealWorld Evaluation (RWE) approach [30 minutes]
3. Small group self-introductions and sharing of RWE issues you have faced in your own practice [30 minutes]
4. RWE Steps 1, 2 and 3: Scoping the evaluation and strategies for addressing budget and time constraints [75 minutes]
--- short break [15 minutes] ---
5. RWE Step 4: Addressing data constraints [30 minutes]
6. Small groups read their case studies and begin discussions [30 minutes]
--- lunch [60 minutes] ---
7. Quantitative, qualitative and mixed methods [20 minutes]
8. Small groups complete preparation of their case study ToR exercises [30 minutes]
9. Paired groups negotiate their ToRs [60 minutes]
10. Feedback from exercise [15 minutes]
11. Wrap-up discussion, evaluation of the workshop [30 minutes]

8 OVERVIEW OF THE RWE APPROACH
RealWorld Evaluation: Designing Evaluations under Budget, Time, Data and Political Constraints
OVERVIEW OF THE RWE APPROACH

9 RealWorld Evaluation Scenarios
Scenario 1: Evaluator(s) not brought in until near the end of the project. For political, technical or budget reasons:
There was no life-of-project evaluation plan
There was no baseline survey
Project implementers did not collect adequate data on project participants at the beginning of or during the life of the project
It is difficult to collect data on comparable control groups

10 RealWorld Evaluation Scenarios
Scenario 2: The evaluation team is called in early in the life of the project. But for budget, political or methodological reasons:
The 'baseline' was a needs assessment, not comparable to the eventual evaluation
It was not possible to collect baseline data on a comparison group

11 Reality Check – Real-World Challenges to Evaluation
All too often, project designers do not think evaluatively – the evaluation is not designed until the end
There was no baseline – at least not one with data comparable to the evaluation
There was, or can be, no control/comparison group
Limited time and resources for the evaluation
Clients have prior expectations for what they want the evaluation findings to say
Many stakeholders do not understand evaluation, distrust the process, or even see it as a threat (a dislike of being judged)

12 RealWorld Evaluation Quality Control Goals
Achieve the maximum possible evaluation rigor within the limitations of a given context
Identify and control for methodological weaknesses in the evaluation design
Negotiate with clients the trade-offs between desired rigor and available resources
Acknowledge, when presenting findings, the methodological weaknesses and how they affect generalization to broader populations

13 The Need for the RealWorld Evaluation Approach
As a result of these kinds of constraints, many of the basic principles of rigorous impact evaluation design (a comparable pre-test/post-test design, a control group, adequate instrument development and testing, random sample selection, control for researcher bias, thorough documentation of the evaluation methodology, etc.) are often sacrificed.

14 The RealWorld Evaluation Approach
An integrated approach to ensure acceptable standards of methodological rigor while operating under real-world budget, time, data and political constraints. See the RealWorld Evaluation book, or at least its summary chapter, for more details.

15 RealWorld Evaluation: Working Under Budget, Time, Data, and Political Constraints, 2nd Edition – Michael Bamberger, Jim Rugh, Linda Mabry
This book addresses the challenges of conducting program evaluations in real-world contexts where evaluators and their clients face budget and time constraints and where critical data may be missing. The book is organized around a seven-step model developed by the authors, which has been tested and refined in workshops and in practice. Vignettes and case studies – representing evaluations from a variety of geographic regions and sectors – demonstrate adaptive possibilities from small projects with budgets of a few thousand dollars to large-scale, long-term evaluations of complex programs. The text incorporates quantitative, qualitative, and mixed-method designs, and this Second Edition reflects important developments in the field over the last five years.
New to the Second Edition:
Adds two new chapters on organizing and managing evaluations, including how to strengthen capacity and promote the institutionalization of evaluation systems
Includes a new chapter on the evaluation of complex development interventions, with a number of promising new approaches presented
Incorporates new material, including on ethical standards, debates over the "best" evaluation designs and how to assess their validity, and the importance of understanding settings
Expands the discussion of program theory, incorporating theory of change, contextual and process analysis, multi-level logic models, using competing theories, and trajectory analysis
Provides case studies of each of the 19 evaluation designs, showing how they have been applied in the field
"This book represents a significant achievement. The authors have succeeded in creating a book that can be used in a wide variety of locations and by a large community of evaluation practitioners." —Michael D. Niles, Missouri Western State University
"This book is exceptional and unique in the way that it combines foundational knowledge from social sciences with theory and methods that are specific to evaluation." —Gary Miron, Western Michigan University
"The book represents a very good and timely contribution worth having on an evaluator's shelf, especially if you work in the international development arena." —Thomaz Chianca, independent evaluation consultant, Rio de Janeiro, Brazil

16 The RealWorld Evaluation approach
Developed to help evaluation practitioners and clients: managers, funding agencies and external consultants
Still a work in progress (we continue to learn more through workshops like this)
Originally designed for developing countries, but equally applicable in industrialized nations

17 Special Evaluation Challenges in Developing Countries
Unavailability of needed secondary data
Scarce local evaluation resources
Limited budgets for evaluations
Institutional and political constraints
Lack of an evaluation culture (though evaluation associations are addressing this)
Many evaluations are designed by and for external funding agencies and seldom reflect local and national stakeholder priorities

18 Expectations for "rigorous" evaluations
Despite these challenges, there is a growing demand for methodologically sound evaluations which assess the impacts, sustainability and replicability of development projects and programs. (We'll be talking more about that later.)

19 Most RealWorld Evaluation tools are not new— but promote a holistic, integrated approach
Most of the RealWorld Evaluation data collection and analysis tools will be familiar to experienced evaluators. What we emphasize is an integrated approach which combines a wide range of tools adapted to produce the best quality evaluation under RealWorld constraints.

20 What is Special About the RealWorld Evaluation Approach?
There is a series of steps, each with checklists for identifying constraints and determining how to address them. These steps are summarized on the following slide and then in the more detailed flow-chart …

21 The Steps of the RealWorld Evaluation Approach
Step 1: Planning and scoping the evaluation
Step 2: Addressing budget constraints
Step 3: Addressing time constraints
Step 4: Addressing data constraints
Step 5: Addressing political constraints
Step 6: Assessing and addressing the strengths and weaknesses of the evaluation design
Step 7: Helping clients use the evaluation

22 The RealWorld Evaluation Approach
Step 1: Planning and scoping the evaluation
A. Defining client information needs and understanding the political context
B. Defining the program theory model
C. Identifying time, budget, data and political constraints to be addressed by the RWE
D. Selecting the design that best addresses client needs within the RWE constraints
Step 2: Addressing budget constraints
A. Modify evaluation design
B. Rationalize data needs
C. Look for reliable secondary data
D. Revise sample design
E. Economical data collection methods
Step 3: Addressing time constraints (all Step 2 tools plus:)
F. Commissioning preparatory studies
G. Hire more resource persons
H. Revising format of project records to include critical data for impact analysis
I. Modern data collection and analysis technology
Step 4: Addressing data constraints
A. Reconstructing baseline data
B. Recreating comparison groups
C. Working with non-equivalent comparison groups
D. Collecting data on sensitive topics or from difficult-to-reach groups
E. Multiple methods
Step 5: Addressing political influences
A. Accommodating pressures from funding agencies or clients on evaluation design
B. Addressing stakeholder methodological preferences
C. Recognizing influence of professional research paradigms
Step 6: Assessing and addressing the strengths and weaknesses of the evaluation design – an integrated checklist for multi-method designs
A. Objectivity/confirmability
B. Replicability/dependability
C. Internal validity/credibility/authenticity
D. External validity/transferability/fittingness
Step 7: Helping clients use the evaluation
A. Utilization
B. Application
C. Orientation
D. Action
This chart, which summarizes the seven steps of the RealWorld Evaluation approach, can be found on page 4 of the Condensed Overview (summary chapter), or on page 21 of the RealWorld Evaluation book.

23 TIME FOR SMALL GROUP DISCUSSION

24 Self-introductions
What constraints of these types have you faced in your evaluation practice?
How did you cope with them?

25 PLANNING AND SCOPING THE EVALUATION
RealWorld Evaluation: Designing Evaluations under Budget, Time, Data and Political Constraints
Step 1: PLANNING AND SCOPING THE EVALUATION

26 Step 1: Planning and Scoping the Evaluation
A. Understanding client information needs
B. Defining the program theory model
C. Preliminary identification of constraints to be addressed by the RealWorld Evaluation

27 A. Understanding client information needs
Typical questions clients want answered:
Is the project achieving its objectives? Is it having the desired impact?
Are all sectors of the target population benefiting?
Will the results be sustainable?
Which contextual factors determine the degree of success or failure?

28 A. Understanding client information needs
A full understanding of client information needs can often reduce the types of information collected and the level of detail and rigor necessary. However, this understanding could also increase the amount of information required!

29 B. Defining the program theory model
All programs are based on a set of assumptions (hypotheses) about how the project's interventions should lead to desired outcomes. Sometimes this is clearly spelled out in project documents. Sometimes it is only implicit, and the evaluator needs to help stakeholders articulate the hypotheses through a logic model.

30 B. Defining the program theory model
Defining and testing critical assumptions are essential (but often ignored) elements of program theory models. The following is an example of a model to assess the impacts of microcredit on women's social and economic empowerment.

31 Critical logic chain hypothesis for a Gender-Inclusive Micro-Credit Program
Sustainability: Structural changes will lead to long-term impacts.
Medium/long-term impacts: Increased women's economic and social empowerment. The economic and social welfare of women and their families will improve.
Short-term outcomes: If women obtain loans, they will start income-generating activities. Women will be able to control the use of loans and reimburse them.
Outputs: If credit is available, women will be willing and able to obtain loans and technical assistance.
Typically project designers begin with their planned activities and outputs, then predict what outcomes and impact will come about due to those project interventions.

32 (Problem tree diagram: a central PROBLEM with Consequences above it and Primary Causes below, each with secondary causes (e.g. 2.1–2.3) and tertiary causes (e.g. 2.2.1–2.2.3).)
Ideally, project design should begin with key stakeholders (including intended beneficiaries) going through a process of identifying the main problem they want to address – the change they want to bring about – then identifying the primary causes of that problem, the secondary causes, and the tertiary causes (here illustrated with three at each level, though there could be fewer or more). There can, of course, be higher-level consequences; but the project should identify a major problem whose resolution can reasonably be expected to be achieved during the project's lifetime.

33 (Solution tree diagram: the DESIRED IMPACT at the top, with Outcomes beneath it, Outputs (e.g. 2.1–2.3) beneath each Outcome, and Interventions (e.g. 2.2.1–2.2.3) at the bottom.)
The problem tree is then converted to a "solution tree", with the high-level change (in human conditions) identified as the desired Impact Goal, the Outcome Objectives needed to achieve that Impact, the project Outputs needed to achieve each Outcome Objective, etc. At the lowest level are the actual interventions the project intends to undertake (e.g. training). Note that good design practice begins at the top, then works down.

34 (Diagram labels include: Schools built; Parents persuaded to send girls to school; School system hires and pays teachers; Female enrollment rates increase; Curriculum improved; Improved educational policies; Young women educated; Economic opportunities for women; Women in leadership roles; Women empowered; Reduction in poverty.)
We will illustrate this form of logic model with an education project – more specifically, one that is focused on building schools. (Reading typically begins at the bottom.) The designers need to recognize that more than classrooms are needed in order to achieve the outcome of increased enrollment of girls. And still other external assumptions must be fulfilled if higher outcomes and impact are to be achieved, such as girls completing a quality education, leading to their long-term empowerment, and even (a higher desired consequence) that this will all lead to a reduction of poverty in these households and communities.

35 Program Goal (at impact level): Young women educated
To have synergy and achieve impact, all of these need to address the same target population:
Advocacy Project Goal: Improved educational policies enacted
Teacher Education Project Goal: Improve quality of curriculum
Construction Project Goal: More classrooms built
(Diagram labels: ASSUMPTION – that others will do this; OUR project; PARTNER will do this.)
Here is an illustration of what a program looks like: two or more projects working in the same area, addressing the needs and rights of the same target population, collaborate to provide the synergy needed to achieve higher-level (program) impact. As in this example, each project has an Outcome Objective at a level that can be achieved and measured during the life of the project. Though our own agency may choose to directly implement one or more of the projects, it may be more appropriate to assist one or more partners to implement complementary projects. If one assumes that someone else will take care of some aspect (e.g. the Government Ministry of Education improving educational policies), it is important that we monitor that assumption. For the hypothesis developed from the problem analysis (in this example) is that all three of these causes must be addressed in order for the Program Impact to be achieved (young women completing quality education).

36 What does it take to measure indicators at each level?
Impact: population-based survey (baseline and endline evaluation)
Outcome: change in behavior of participants (can be surveyed annually)
Output: measured and reported by project staff (annually)
Activities: on-going (monitoring of interventions)
Inputs: on-going (financial accounts)
The quality of each level in the cause-effect hierarchy is measured at the next higher level. Inputs are commonly measured by the financial accounting system. (We're pretty good at this form of accountability.) Activities are frequently reported by field staff. (Too frequently, in fact – typically far more is recorded than is really needed for management decision-making.) At least in their annual reports, project staff should document what Outputs the project achieved. There are basically two types of Effect (Outcome) measurement: (A) following up project participants to see how many are practicing what they were taught; and (B) measuring the proportion of the community practicing these techniques (behaviors), which requires a population-based survey. Project impact goals commonly refer to the percentage of the target population that will be benefited; this too calls for a population-based survey, which requires extra effort and resources – it is not likely to be done every year, but should be done at least at baseline and at the final project evaluation. Program impact evaluations that assess the higher (and multi-sectoral) impact of a series of projects require even greater resources and sophistication; these should probably not be attempted at time intervals of less than 10 years.
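A population-based survey of the kind described above usually reduces to estimating an outcome proportion (e.g. an enrollment rate) at baseline and endline. The sketch below is illustrative only: it assumes simple random sampling and uses made-up counts, whereas a real survey would also need design weights and cluster adjustments.

```python
import math

def proportion_with_ci(successes, n, z=1.96):
    """Sample proportion with a normal-approximation 95% confidence interval."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)  # standard error under simple random sampling
    return p, (p - z * se, p + z * se)

# Hypothetical baseline and endline surveys of girls' enrollment (n = 800 households each)
base_p, base_ci = proportion_with_ci(312, 800)
end_p, end_ci = proportion_with_ci(456, 800)
print(f"baseline: {base_p:.1%}  endline: {end_p:.1%}")
```

Comparing the two intervals gives a first, rough sense of whether the observed change exceeds sampling noise; attributing that change to the project is the separate design question taken up in the slides that follow.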

37 We need to recognize which evaluative process is most appropriate for measurement at each level: performance monitoring covers inputs, activities and outputs; project evaluation reaches up to outcomes; impact evaluation addresses impact.
There has been a trend by USAID and some other donors and agencies to rely more and more on performance monitoring, moving away from regular project evaluation, much less occasional impact evaluation. This has its limitations, as illustrated on this slide. It is reasonable to expect a performance monitoring system to collect, aggregate and report quantitative data at the input, activity and output levels on a fairly routine basis. But determining whether or not effects and impacts have been achieved requires more than an analysis of monitoring data – it requires a special study, a greater "stepping back", and broader, more holistic perspectives. That is the unique role of evaluations.

38 One form of Program Theory (Logic) Model
Design → Inputs → Implementation Process → Outputs → Outcomes → Impacts → Sustainability, set within the economic context in which the project operates, the political context, the institutional and operational context, and the socio-economic and cultural characteristics of the affected populations.
Typically project logic models are linear. This is one illustration of a more holistic model in which key external conditions are considered, since the success of the project obviously depends on those conditions; and if those external conditions change (negatively or positively) during the life of the project, the project's own plans may need to be flexible and change accordingly. Note: the orange boxes are included in conventional program theory models; the addition of the blue boxes (the contexts) provides the recommended, more complete analysis.

39 There are many creative ways to depict logic models
Here is one developed by the Asia Foundation for a Trafficking In People (TIP) program in Nepal. It includes three components (sub-projects), with a variety of types of interventions. The challenge is to make such a model comprehensible enough to show the "big picture logic" of the program without becoming so complex that it is confusing. Another challenge is to know what aspects of a logic model like this should be tested by an evaluation.

40 Expanding the results chain for a multi-donor, multi-component program
Inputs: donor, government, other donors
Outputs: credit for small farmers, health services, rural roads, schools
Intermediate outcomes: increased production, access to off-farm employment, increased school enrolment, increased use of health services
Impacts: increased rural household income, increased political participation, improved education performance, improved health
Many programs are much more complex than the simple cause-effect hierarchy depicted in typical project logframes. Although a more comprehensive picture (such as the one on this slide) is more realistic, especially at broader geographical levels where multiple agencies are involved, assessing the attribution of each component gets very difficult! Ideally we can identify the plausible contributions each actor makes to the achievement of higher-level outcomes and ultimate impact.

41 Education Intervention Logic
Here is another example of a more comprehensive logic model, this one from the OECD/DAC Network on Development Evaluation: a multi-sectoral national set of programs aimed at having impact on the MDGs (Millennium Development Goals). (Diagram levels and labels include: Output Clusters – Education Facilities, Curricula & Teaching Materials, Teacher Recruitment & Training, Institutional Management; Outcomes – Equitable Access to Education, Quality of Education, Increased Affordability of Education; Specific Impact – Skills and Learning Enhancement; Intermediate Impacts – Optimal Employment, Greater Income Opportunities, Better Allocation of Educational Resources, Improved Family Planning & Health Awareness, Improved Participation in Society; Global Impacts – Economic Growth, Poverty Reduction, Social Development, Health – linked to MDGs 1, 2 and 3.)
Question: if you were designing an impact evaluation, what aspects of such a program would you focus on?
Source: OECD/DAC Network on Development Evaluation

42 So what should be included in a “rigorous impact evaluation”?
A direct cause-effect relationship between one output (or a very limited number of outputs) and an outcome that can be measured by the end of the research project? Pretty clear attribution.
… OR …
Changes in higher-level indicators of sustainable improvement in the quality of life of people, e.g. the MDGs (Millennium Development Goals)? More significant, but much more difficult to assess direct attribution.

43 So what should be included in a “rigorous impact evaluation”?
OECD-DAC (2002: 24) defines impact as "the positive and negative, primary and secondary long-term effects produced by a development intervention, directly or indirectly, intended or unintended. These effects can be economic, sociocultural, institutional, environmental, technological or of other types". Does it mention or imply direct attribution? Or point to the need for counterfactuals or Randomized Control Trials (RCTs)?

44 Coming to agreement on what levels of the logic model to include in the evaluation
This can be a sensitive issue: project staff generally don't like to be held accountable for more than the output level, while donors (and intended beneficiaries) may insist on evaluating higher-level outcomes. One approach evaluators might take: if the correlation between intermediary effects (outcomes) and impact has been adequately established through research and previous evaluations, then assessing intermediary outcome-level indicators might suffice, as long as the contexts can be shown to be sufficiently similar to those where such cause-effect correlations have been tested.

45 Definition of program evaluation
Program evaluation is the systematic collection of information about the activities, characteristics and results of a program to make judgments about the program, improve or further develop program effectiveness, inform decisions about future programming, and/or increase understanding. -- Michael Quinn Patton, Utilization-Focused Evaluation, 4th edition, 2008, page 39

46 Some of the purposes for program evaluation
Formative: learning and improvement, including early identification of possible problems
Knowledge generating: identify cause-effect correlations and generic principles about effectiveness
Accountability: demonstrate that resources are used efficiently to attain desired results
Summative judgment: determine the value and future of the program
Developmental evaluation: adaptation in complex, emergent and dynamic conditions
-- Michael Quinn Patton, Utilization-Focused Evaluation, 4th edition

47 Determining appropriate (and feasible) evaluation design
Based on the main purpose for conducting an evaluation, an understanding of client information needs, the required level of rigor, and what is possible given the constraints, the evaluator and client need to determine what evaluation design is required and possible under the circumstances.

48 Some of the considerations pertaining to evaluation design
1. When evaluation events take place (baseline, midterm, endline)
2. Different evaluation designs (experimental, quasi-experimental, other)
3. Levels of rigor
4. Qualitative & quantitative methods
5. A life-of-project evaluation design perspective

49 An introduction to various evaluation designs
(Diagram: the scale of a major impact indicator, plotted for project participants and a comparison group at baseline, end-of-project evaluation and post-project evaluation.)
Here's a very short course on evaluation (research) design. Let's assume a project's final goal has a quantifiable indicator – the enrollment rate of girls in schools might be one – where higher is better.
Evaluation Design #1: End-of-project evaluation only. The indicator appears to be high, but even if it is measured very precisely, it begs the question: compared to what?
Evaluation Design #2: Pre-test + post-test (baseline + evaluation). Now we see that the indicator went from low to high. Impressive. But did the project have impact? A good evaluator would ask whether the change is attributable to the project's interventions or could be due to general trends.
Evaluation Design #3: Post-test with control (comparison). In this scenario a "comparable" community was found and the indicator measured there during the evaluation. But the evaluator wonders whether we purposely chose a poor comparison group.
Evaluation Design #4: Quasi-experimental design. Pretty convincing that the project's participants did better than the comparison group – but it only involves two snapshots before and two after.
Evaluation Design #5: Longitudinal monitoring (at least a small sample of a proxy outcome indicator). Gives a more detailed view of trends during the life of the project. Looks good, but the evaluator should still ask about sustainability.
Evaluation Design #6: Quasi-experimental longitudinal design with ex-post evaluation some time after project completion. Now we see that the project's participants were spoiled (by some form of subsidy) and in the long run the "comparison" community did better.
This illustrates the need for a quasi-experimental longitudinal time-series evaluation design.

50 OK, let's stop the action to identify each of the major types of evaluation (research) design … one at a time, beginning with the most rigorous design.

51 First of all, the key to the traditional symbols:
X = Intervention (treatment), i.e. what the project does in a community
O = Observation event (e.g. baseline, mid-term evaluation, end-of-project evaluation)
P (top row) = Project participants
C (bottom row) = Comparison (control) group
Note: the 7 RWE evaluation designs are laid out on page 41 of the Condensed Overview of the RealWorld Evaluation book.

52 Design #1: Longitudinal Quasi-experimental
Project participants: P1 X P2 X P3 P4
Comparison group:   C1   C2   C3 C4
(baseline, midterm, end-of-project evaluation, post-project evaluation)

53 Design #2: Quasi-experimental (pre+post, with comparison)
Project participants: P1 X P2
Comparison group:   C1   C2
(baseline, end-of-project evaluation)
Notice how much less information is available as various components of the evaluation design are removed.
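With observations for both groups at baseline and endline, the usual impact estimate from this design is the difference-in-differences: the change among participants minus the change in the comparison group. A minimal sketch, with hypothetical indicator values invented for illustration:

```python
def difference_in_differences(p1, p2, c1, c2):
    """Impact estimate: (P2 - P1) - (C2 - C1)."""
    return (p2 - p1) - (c2 - c1)

# Hypothetical girls' enrollment rates (%) at baseline and endline
impact = difference_in_differences(p1=40, p2=65, c1=38, c2=48)
print(impact)  # a 25-point gain minus a 10-point general trend leaves 15 points
```

The comparison group's change stands in for the counterfactual trend; the estimate is only as good as the comparability of the two groups.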

54 Design #2+: Randomized Control Trial
Project participants: P1 X P2
Control group:     C1   C2
(baseline, end-of-project evaluation)
Research subjects are randomly assigned either to the project group or to the control group. Even with an initial random selection of who participates in the project and who is held as a "control" (the essential yet controversial element of Randomized Control Trials), notice how much information is still missing without longitudinal data and an ex-post evaluation.
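The defining element of this design is the random assignment itself. A minimal sketch of splitting eligible units into project and control groups; the household names and seed are illustrative, and real trials would often stratify before randomizing:

```python
import random

def randomize(units, seed=2011):
    """Randomly split eligible units into equal-sized project and control groups."""
    rng = random.Random(seed)  # a fixed seed keeps the assignment reproducible and auditable
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

project, control = randomize([f"household_{i}" for i in range(100)])
print(len(project), len(control))  # 50 50
```

Because chance alone decides who is treated, any sufficiently large difference at endline can more credibly be attributed to the intervention rather than to pre-existing differences between the groups.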

55 Design #3: Truncated Longitudinal
Project participants: X P1 X P2
Comparison group:    C1   C2
(midterm, end-of-project evaluation)
In this scenario no baseline was conducted at the beginning of the project, but there was a midterm survey (e.g. in year 2.5 of a 5-year project) when relevant data was measured. This helps, but the remaining time is so short that it will be hard to create much change.

56 Design #4: Pre+post of project; post-only comparison
Project participants: P1 X P2
Comparison group:          C
In this design no baseline data were collected on the comparison group, though data were collected on a comparison group at the endline. It requires seeking adequate information to indicate what the conditions of the comparison group were at the time the project started. Look for relevant secondary data, key informants, use of recall methods, etc. (baseline, end-of-project evaluation) 56 RealWorld Evaluation - Steps 1, 2 & 3

57 Design #5: Post-test only of project and comparison
Project participants: X P
Comparison group:       C
No baseline. This raises the question of what the conditions were for both the project's participants and the comparison group at the time the project started. Look for relevant secondary data, key informants, recall, etc. to fill in the missing information. (end-of-project evaluation) 57 RealWorld Evaluation - Steps 1, 2 & 3

58 Design #6: Pre+post of project; no comparison
Project participants: P1 X P2
Pre-test and post-test of the project participants measure change in the indicator, but this design provides no information on the counterfactual -- what would have happened without the project. It needs to be supplemented by information from other sources on what changes occurred in the general population or in comparable communities during the life of the project. (baseline, end-of-project evaluation) 58 RealWorld Evaluation - Steps 1, 2 & 3

59 Design #7: Post-test only of project participants
X P Project participants Need to fill in missing data through other means: What change occurred during the life of the project? What would have happened without the project (counterfactual)? How sustainable is that change likely to be? Status of high-level outcome indicator only measured at end of project, only of project beneficiaries. Obviously the weakest evaluation design – yet by far the most common scenario in the real world (at least in international development projects). Even if the indicator in question is measured very precisely (e.g. with a very rigorous survey, or exhaustive qualitative methods) there was no direct measurement of what change occurred during the life of the project, nor any form of counterfactual. Very important to use complementary methods to obtain other information. end of project evaluation 59 RealWorld Evaluation - Steps 1, 2 & 3

60 T1 (baseline) X T2 (midterm) T3 (endline) T4 (ex-post)
Design | T1 (baseline) | X (intervention) | T2 (midterm) | (intervention, cont.) | T3 (endline) | T4 (ex-post)
1      | P1 C1         | X                | P2 C2        | X                     | P3 C3        | P4 C4
2      | P1 C1         | X                |              | X                     | P2 C2        |
3      |               | X                | P1 C1        | X                     | P2 C2        |
4      | P1            | X                |              | X                     | P2 C         |
5      |               | X                |              | X                     | P  C         |
6      | P1            | X                |              | X                     | P2           |
7      |               | X                |              | X                     | P            |
This table summarizes these 7 evaluation designs.

61 Attribution and counterfactuals
How do we know if the observed changes in the project participants or communities (income, health, attitudes, school attendance, etc.) are due to the implementation of the project (credit, water supply, transport vouchers, school construction, etc.) or to other unrelated factors (changes in the economy, demographic movements, other development programs, etc.)?

62 The Counterfactual What change would have occurred in the relevant condition of the target population if there had been no intervention by this project?

63 Where is the counterfactual?
After families had been living in a new housing project for 3 years, a study found average household income had increased by 50%. Does this show that housing is an effective way to raise income?

64 Comparing the project with two possible comparison groups
Project group: a 50% increase (from 500 to 750 between 2004 and 2009).
Scenario 1: no increase in comparison group income; potential evidence of project impact.
Scenario 2: a 50% increase in comparison group income; no evidence of project impact.
The 50% higher incomes of those involved in the project look impressive if others' incomes stayed stagnant. But if the comparison group (people living in comparable communities) also enjoyed higher incomes, the net impact of the project on incomes was actually zero.
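The logic of the two scenarios is a simple difference-in-differences calculation. A minimal sketch using the slide's illustrative income figures (500 rising to 750 for the project group); the function name is ours, not from the book:

```python
def net_impact(project_before, project_after, comp_before, comp_after):
    """Difference-in-differences: the project group's change
    minus the comparison group's change over the same period."""
    return (project_after - project_before) - (comp_after - comp_before)

# Scenario 1: comparison incomes stayed flat, so the gain of 250
# is potential evidence of project impact
print(net_impact(500, 750, 500, 500))  # 250

# Scenario 2: comparison incomes also rose 50%, so net impact is zero
print(net_impact(500, 750, 500, 750))  # 0
```

The same subtraction underlies the quasi-experimental designs above: without the comparison row (designs 6 and 7), the second term is simply unknown.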

65 Control group and comparison group
Control group = randomized allocation of subjects to project and non-treatment group Comparison group = separate procedure for sampling project and non-treatment groups that are as similar as possible in all aspects except the treatment (intervention)

66 Some recent developments in impact evaluation in development
J-PAL is best understood as a network of affiliated researchers … united by their use of the randomized trial methodology… In recent years there has been an increasing emphasis on impact evaluation in public policy and development. In public policy there has been a focus on evidence-based policy and practice, drawing on approaches developed for evidence-based medicine. In development there have been calls for more rigorous impact evaluation. In both areas there have been debates about what constitutes 'rigorous evidence' and 'scientific approaches' to impact evaluation. Some people and organizations have argued exclusively for increased use of specific research designs – in particular Randomized Controlled Trials (RCTs). Others have argued that these designs are not always appropriate or feasible, and that we need other approaches to doing more systematic, rigorous and useful impact evaluation.

67 If so, under what circumstances should they be used?
So, is Jim saying that Randomized Control Trials (RCTs) are the Gold Standard and should be used in most if not all program impact evaluations? Yes or no? Why or why not? If so, under what circumstances should they be used? If not, under what circumstances would they not be appropriate?

68 Different lenses needed for different situations in the RealWorld
Simple: following a recipe. Recipes are tested to assure easy replication; the best recipes give good results every time.
Complicated: sending a rocket to the moon. Sending one rocket to the moon increases assurance that the next will also be a success; there is a high degree of certainty of outcome.
Complex: raising a child. Raising one child provides experience but is no guarantee of success with the next; uncertainty of outcome remains.
Sources: Westley et al. (2006) and Stacey (2007), cited in Patton 2008; also presented by Patricia Rogers at the Cairo impact conference, 2009.

69 Adapted from Patricia Rogers, RMIT University
Evidence-based policy for simple interventions (or simple aspects): when RCTs may be appropriate
Question needed for evidence-based policy: What works?
What interventions look like: discrete, standardized intervention
Logic model (how interventions work): simple, direct cause -> effect; pretty much the same everywhere
Process needed for evidence uptake: knowledge transfer
Adapted from Patricia Rogers, RMIT University

70 When might rigorous evaluations of higher-level “impact” indicators not be needed?
Complicated, complex programs where there are multiple interventions by multiple actors Projects working in evolving contexts (e.g. conflicts, natural disasters) Projects with multiple layered logic models, or unclear cause-effect relationships between outputs and higher level “vision statements” (as is often the case in the RealWorld of international development projects) RealWorld Evaluation - Steps 1, 2 & 3

71 When might rigorous evaluations of higher-level “impact” indicators not be needed?
An approach evaluators might take is that if the correlation between intermediary effects (outcomes) and higher-level impact has been adequately established through research and previous evaluations, then assessing intermediary outcome-level indicators might suffice, as long as the contexts (internal and external conditions) can be shown to be sufficiently similar to where such cause-effect correlations have been tested. RealWorld Evaluation - Steps 1, 2 & 3

72 Examples of cause-effect correlations that are generally accepted
Vaccinating young children with a standard set of vaccinations at prescribed ages leads to reduction of childhood diseases (means of verification involves viewing children’s health charts, not just total quantity of vaccines delivered to clinic) Other examples … ?

73 Conditional cash transfers The use of visual aids in Kenyan schools
But look at examples of what kinds of interventions have been "rigorously tested" using RCTs:
Conditional cash transfers
The use of visual aids in Kenyan schools
Deworming children (as if that's all that's needed to enable them to get a good education)
Note that this kind of research is based on the quest for Silver Bullets – simple, most cost-effective solutions to complex problems.

74 Quoted by Patricia Rogers, RMIT University
"Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise." J. W. Tukey (1962, p. 13), "The future of data analysis", Annals of Mathematical Statistics 33(1). Quoted by Patricia Rogers, RMIT University

75 “An expert is someone who knows more and more about less and less until he knows absolutely everything about nothing at all.”* *Quoted by a friend; also available at

76 Is that what we call “scientific method”?
There is much more to impact, to rigor, and to "the scientific method" than RCTs. Serious impact evaluations require a more holistic approach.

77 A more comprehensive design
[Diagram: a logic model in which Interventions 2.2.1, 2.2.2 and 2.2.3 feed Outputs 2.1-2.3, which feed Outcomes 1-3, the desired impact, and its consequences. A simple RCT covers only one intervention-to-outcome link; a more comprehensive design covers the full model.] Unfortunately, too often simplistic RCTs only conduct research on one intervention, and judge its "impact" by attributable change in a fairly near-term outcome, without adequate consideration of other causal chains. A more comprehensive design should factor in an adequate number of causal streams to determine not only which, but what combination of, interventions by one agency or others, or necessary pre-conditions, need to be in place to achieve higher-level impact.

78 There can be validity problems with RCTs
Internal validity
Quality issues: poor measurement, poor adherence to randomisation, inadequate statistical power, ignored differential effects, inappropriate comparisons, fishing for statistical significance, differential attrition between control and treatment groups, treatment leakage, unplanned cross-over, unidentified poor quality implementation
Other issues: random error, contamination from other sources, need for a complete causal package, lack of blinding
External validity
Effectiveness in real-world practice; transferability to new situations
Patricia Rogers, RMIT University

79 The limited use of strong evaluation designs
In the RealWorld (at least of international development programs) we estimate that: fewer than 5%-10% of project impact evaluations use strong experimental or even quasi-experimental designs; significantly fewer than 5% use randomized control trials ('pure' experimental designs)

80 What kinds of evaluation designs are actually used in the real world of international development? Findings from meta-evaluations of 336 evaluation reports of an INGO. Post-test only 59% Before-and-after 25% With-and-without 15% Other counterfactual 1% Data summarized from four bi-annual meta-evaluations of evaluation reports from CARE projects in many countries. Colleagues familiar with other agencies report proportions of evaluations with no pre-test + post-test, nor counterfactual, are typically higher than the percentages reported here.

81 There are other methods for assessing the counterfactual
Reliable secondary data that depicts relevant trends in the population Longitudinal monitoring data (if it includes non-reached population) Qualitative methods to obtain perspectives of key informants, participants, neighbors, etc. We’ll talk more about this later

82 Rigorous impact evaluation should include (but is not limited to):
thorough consultation with and involvement by a variety of stakeholders, articulating a comprehensive logic model that includes relevant external influences, getting agreement on desirable ‘impact level’ goals and indicators, adapting evaluation design as well as data collection and analysis methodologies to respond to the questions being asked, …

83 7) being sufficiently flexible to account for evolving contexts, …
Rigorous impact evaluation should include (but is not limited to): 5) adequately monitoring and documenting the process throughout the life of the program being evaluated, 6) using an appropriate combination of methods to triangulate evidence being collected, 7) being sufficiently flexible to account for evolving contexts, …

84 8) using a variety of ways to determine the counterfactual,
Rigorous impact evaluation should include (but is not limited to): 8) using a variety of ways to determine the counterfactual, 9) estimating the potential sustainability of whatever changes have been observed, 10) communicating the findings to different audiences in useful ways, 11) etc. …

85 The point is that the list of what’s required for ‘rigorous’ impact evaluation goes way beyond initial randomization into treatment and ‘control’ groups.

86 To attempt to conduct an impact evaluation of a program using only one pre-determined tool is to suffer from myopia, which is unfortunate. On the other hand, to prescribe to donors and senior managers of major agencies that there is a single preferred design and method for conducting all impact evaluations can have, and has had, unfortunate consequences for all of those involved in the design, implementation and evaluation of international development programs.

87 We must be careful that in using the “Gold Standard”
we do not violate the "Golden Rule": "Judge not, that you be not judged!" In other words: "Evaluate others as you would have them evaluate you."

88 Caution: Too often what is called Impact Evaluation is based on a “we will examine and judge you” paradigm. When we want our own programs evaluated we prefer a more holistic approach.

89 To use the language of the OECD/DAC, let’s be sure our evaluations are consistent with these criteria: RELEVANCE: The extent to which the aid activity is suited to the priorities and policies of the target group, recipient and donor. EFFECTIVENESS: The extent to which an aid activity attains its objectives. EFFICIENCY: Efficiency measures the outputs – qualitative and quantitative – in relation to the inputs. IMPACT: The positive and negative changes produced by a development intervention, directly or indirectly, intended or unintended. SUSTAINABILITY is concerned with measuring whether the benefits of an activity are likely to continue after donor funding has been withdrawn. Projects need to be environmentally as well as financially sustainable.

90 The bottom line is defined by this question:
Are our programs making plausible contributions towards positive impact on the quality of life of our intended beneficiaries? Let’s not forget them!

91 Thank you! 91

92 But enough of my rant about RCTs!

93 But enough of my rant about RCTs!
Moving on …

94 Who asked for the evaluation? (Who are the key stakeholders)?
Still part of Step 1: Other questions to answer as you customize an evaluation Terms of Reference (ToR): Who asked for the evaluation? (Who are the key stakeholders)? What are the key questions to be answered? Will this be a formative or summative evaluation? Will there be a next phase, or other projects designed based on the findings of this evaluation? 94 RealWorld Evaluation - Steps 1, 2 & 3

95 Other questions to answer as you customize an evaluation ToR:
What decisions will be made in response to the findings of this evaluation? What is the appropriate level of rigor? What is the scope / scale of the evaluation / evaluand (thing to be evaluated)? How much time will be needed / available? What financial resources are needed / available? 95 RealWorld Evaluation - Steps 1, 2 & 3

96 Other questions to answer as you customize an evaluation ToR:
Should the evaluation rely mainly on quantitative or qualitative methods? Should participatory methods be used? Can / should there be a household survey? Who should be interviewed? Who should be involved in planning / implementing the evaluation? What are the most appropriate media for communicating the findings to different stakeholder audiences? 96 RealWorld Evaluation - Steps 1, 2 & 3

97 Evaluation (research) design?
Key questions? Evaluand (what to evaluate)? Scope? Appropriate level of rigor? Time available? Resources available? Skills available? Qualitative or quantitative? Participatory or extractive? Evaluation FOR whom? Does this help, or just confuse things more? Who said evaluations (like life) would be easy?!! 97 RealWorld Evaluation - Steps 1, 2 & 3

98 Before we return to the RealWorld steps, let’s gain a perspective on levels of rigor, and what a life-of-project evaluation plan could look like 98 RealWorld Evaluation - Steps 1, 2 & 3

99 Different levels of rigor
depends on source of evidence, level of confidence, and use of information. At the top: objective, high precision – but requiring more time & expense. At the bottom: quick & cheap – but subjective, sloppy.
Level 5: A very thorough research project is undertaken to conduct in-depth analysis of the situation; P = +/- 1%. Book published!
Level 4: Good sampling and data collection methods used to gather data that are representative of the target population; P = +/- 5%. Decision maker reads full report.
Level 3: A rapid survey is conducted on a convenient sample of participants; P = +/- 10%. Decision maker reads 10-page summary of report.
Level 2: A fairly good mix of people are asked their perspectives about the project; P = +/- 25%. Decision maker reads at least the executive summary of the report.
Level 1: A few people are asked their perspectives about the project; P = +/- 40%. Decision made in a few minutes.
Level 0: Decision-maker's impressions based on anecdotes and sound bites heard during brief encounters (hallway gossip), mostly intuition; level of confidence +/- 50%. Decision made in a few seconds.
Some are now calling this the "Rugh rigor scale." Its purpose is simply to get us to think in terms of levels of rigor appropriate for the degree of precision required. Remember, the highest level of rigor is neither feasible nor necessary in most RealWorld Evaluation situations. At the low end decisions are based on "efficient" (quick and cheap) sources of information; unfortunately these are typically rather subjective, sloppy sources. At the high end are systems that provide much more objective, precise information, but they take more time and resources to collect and process. Note that levels of rigor apply to qualitative as well as quantitative methods – how well the evidence/data are collected and analyzed, in addition to sample design and size. They also call for triangulation, using a diversity of methods. What's important is to determine what level of informational precision is required to inform a particular set of decisions, and thus what level of rigor is appropriate for collecting it. If the consequences of a decision are not significant, it would probably not be justified to spend a whole lot of time and resources collecting the information. On the other hand, if the decision will affect the lives of thousands of people and involve hundreds of thousands of dollars, it had better be based on very reliable information. 99 RealWorld Evaluation - Steps 1, 2 & 3

100 CONDUCTING AN EVALUATION IS LIKE LAYING A PIPELINE
Pipeline components: randomized sample selection, reliability & validity of indicators, questionnaire quality, quality of data collection, depth of analysis, reporting & utilization. Consider the various elements involved in conducting a survey: when we consider rigor we need to consider the level of rigor involved in all of those elements. QUALITY OF INFORMATION GENERATED BY AN EVALUATION DEPENDS UPON THE LEVEL OF RIGOR OF ALL COMPONENTS

101 Relevance & validity of indicators Randomized sample selection
Questionnaire quality, depth of analysis, reporting & utilization, quality of data collection: the quality of information/data obtained from a survey or other evaluation method depends on all of these components. The "flow" (quality) will be limited by the level of rigor of the weakest component. THE AMOUNT OF "FLOW" (QUALITY) OF INFORMATION IS LIMITED BY THE SMALLEST COMPONENT OF THE SURVEY "PIPELINE"

102 Determining appropriate levels of precision for events in a life-of-project evaluation plan
[Diagram: evaluation events across the project life cycle – needs assessment, baseline study, annual self-evaluations, mid-term evaluation, special study, final evaluation – plotted from low to high rigor; the baseline study and final evaluation sit at the same level of rigor.] Here we look at a way to lay out plans for evaluation events during the life of a project, considering what level of rigor will be required for each. The initial diagnostic assessment should be rigorous enough to reveal what issues need to be addressed in the target community, but one should not spend a disproportionate amount of the budget on it. The baseline, as we will soon see, should be done with the same methodology and level of rigor as will be expected for the final evaluation. During the life of the project there may be other evaluation events, such as annual self-evaluations, a mid-term evaluation, or a special study to examine some aspect not considered earlier (e.g. gender equity). The appropriate level of rigor (and budget) for each of these needs to be determined, relative to other evaluation events. It is commonly agreed that the final evaluation should be quite rigorous and precise, to assess what was achieved by the project. However, if the baseline (the measure of how things were before the project started) was not done in a comparable way, it is difficult to prove what difference the project made. 102 RealWorld Evaluation - Steps 1, 2 & 3

103 Now, where were we? Oh, yes, we’re ready for Steps 2 and 3 of the RealWorld Evaluation Approach. Let’s continue … RealWorld Evaluation - Steps 1, 2 & 3

104 ADDRESSING BUDGET AND TIME CONSTRAINTS
RealWorld Evaluation Designing Evaluations under Budget, Time, Data and Political Constraints Steps 2 + 3 ADDRESSING BUDGET AND TIME CONSTRAINTS RealWorld Evaluation - Steps 1, 2 & 3

105 Step 2: Addressing budget constraints
Clarify client information needs
Simplify the evaluation design
Look for reliable secondary data
Review sample size
Reduce costs of data collection and analysis RealWorld Evaluation - Steps 1, 2 & 3

106 Rationalize data needs
Use information from Step 1 to identify client information needs Simplify evaluation design (but be prepared to compensate for ‘missing pieces’) Review all data collection instruments and cut out any questions not directly related to the objectives of the evaluation. RealWorld Evaluation - Steps 1, 2 & 3

107 Look for reliable secondary sources
Planning studies, project administrative records, government ministries, other NGOs, universities / research institutes, mass media. RealWorld Evaluation - Steps 1, 2 & 3

108 Look for reliable secondary sources, cont.
Assess the relevance and reliability of sources for the evaluation with respect to: Coverage of the target population Time period Relevance of the information collected Reliability and completeness of the data Potential biases RealWorld Evaluation - Steps 1, 2 & 3

109 Some ways to save time and money
Depending upon the purpose and level of rigor required, some of the options might include: Reducing the number of units studied (communities, families, schools) Reducing the number of case studies or the duration and complexity of the cases Reducing the duration or frequency of observations RealWorld Evaluation - Steps 1, 2 & 3

110 Seeking ways to reduce sample size
Accepting a lower level of precision significantly reduces the required number of interviews: To test for a 5% change in proportions requires a minimum sample of 1086 To test for a 10% change in proportions requires a minimum sample of 270 RealWorld Evaluation - Steps 1, 2 & 3
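The large saving from accepting lower precision follows from the standard two-proportion sample-size formula: the required n grows with the inverse square of the smallest detectable difference, so halving the precision roughly quarters the sample. A sketch only: the minimums quoted on the slide (1086 and 270) rest on the book's own baseline-prevalence, confidence and power assumptions, which differ from the illustrative values used below.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size to detect a change in a proportion
    from p1 to p2 (two-sided z-test, normal approximation)."""
    z = NormalDist().inv_cdf
    z_a, z_b = z(1 - alpha / 2), z(power)   # critical values
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_a + z_b) ** 2 * variance / (p2 - p1) ** 2)

print(n_per_group(0.50, 0.55))  # detect a 5-point change: 1562 per group
print(n_per_group(0.50, 0.60))  # detect a 10-point change: 385 per group
```

Whatever the exact assumptions, the roughly fourfold difference between the two tests is what matters for the evaluation budget.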

111 Reducing costs of data collection and analysis
Use self-administered questionnaires Reduce length and complexity of survey instrument Use direct observation Obtain estimates from focus groups and community forums Key informants Participatory assessment methods Multi-methods and triangulation RealWorld Evaluation - Steps 1, 2 & 3

112 Step 3: Addressing time constraints
In addition to Step 2 (budget constraint) methods: Reduce time pressures on external consultants Commission preparatory studies Video conferences Hire more consultants/researchers Incorporate outcome indicators in project monitoring systems and documents Technology for data inputting/coding RealWorld Evaluation - Steps 1, 2 & 3

113 Addressing time constraints
Negotiate with the client to discuss questions such as the following: What information is essential and what could be dropped or reduced? How much precision and detail is required for the essential information? E.g. is it necessary to have separate estimates for each geographical region or sub-group or is a population average acceptable? Is it necessary to analyze all project components and services or only the most important? Is it possible to obtain additional resources (money, staff, computer access, vehicles etc) to speed up the data collection and analysis process? RealWorld Evaluation - Steps 1, 2 & 3

114 TIME FOR A BREAK ! 114 RealWorld Evaluation - Steps 1, 2 & 3

115 Step 4 Addressing data constraints
RealWorld Evaluation Designing Evaluations under Budget, Time, Data and Political Constraints Step 4 Addressing data constraints

116 Ways to reconstruct baseline conditions
Secondary data Project records Recall Key informants

117 Ways to reconstruct baseline conditions
PRA (Participatory Rapid Appraisal) and PLA (Participatory Learning and Action) and other participatory techniques such as timelines and critical incidents to help establish the chronology of important changes in the community

118 Assessing the utility of potential secondary data
Reference period Population coverage Inclusion of required indicators Completeness Accuracy Free from bias

119 Examples of secondary data to reconstruct baselines
Census Other surveys by government agencies Special studies by NGOs, donors University research studies Mass media (newspapers, radio, TV) External trend data that might have been monitored by implementing agency

120 Using internal project records
Types of data Feasibility/planning studies Application/registration forms Supervision reports Management Information System (MIS) data Meeting reports Community and agency meeting minutes Progress reports Construction, training and other implementation records, including costs

121 Assessing the reliability of project records
Who collected the data and for what purpose? Were they collected for record-keeping or to influence policymakers or other groups? Do monitoring data only refer to project activities or do they also cover changes in outcomes? Were the data intended exclusively for internal use? For use by a restricted group? Or for public use?

122 Using recall to reconstruct baseline data
School attendance and time/cost of travel Sickness/use of health facilities Income and expenditures Community/individual knowledge and skills Social cohesion/conflict Water usage/quality/cost Periods of stress Travel patterns

123 Where Knowledge about Recall is Greatest
Areas where most research has been done on the validity of recall Income and expenditure surveys Demographic data and fertility behavior Types of Questions Yes/No; fact Scaled Easily related to major events

124 Limitations of recall Generally not reliable for precise quantitative data Sample selection bias Deliberate or unintentional distortion Few empirical studies (except on expenditure) to help adjust estimates

125 Sources of bias in recall
Who provides the information
Under-estimation of small and routine expenditures
"Telescoping" of recall concerning major expenditures
Distortion to conform to accepted behavior: intentional or unconscious; romanticizing the past; exaggerating (e.g. "We had nothing before this project came!")
Contextual factors: time intervals used in the question; respondents' expectations of what the interviewer wants to know
Implications for the interview protocol

126 Improving the validity of recall
Conduct small studies to compare recall with survey or other findings. Ensure all relevant groups interviewed Triangulation Link recall to important reference events Elections Drought/flood/tsunami/war/displacement Construction of road, school etc

127 Key informants Not just officials and high status people
Everyone can be a key informant on their own situation: Single mothers Factory workers Users of public transport Sex workers Street children

128 Guidelines for key-informant analysis
Triangulation greatly enhances validity and understanding Include informants with different experiences and perspectives Understand how each informant fits into the picture Employ multiple rounds if necessary Carefully manage ethical issues

129 PRA and related participatory techniques
PRA (Participatory Rapid Appraisal) and PLA (Participatory Learning and Action) techniques collect data at the group or community [rather than individual] level Can either seek to identify consensus or identify different perspectives Risk of bias: If only certain sectors of the community participate If certain people dominate the discussion

130 Summary of issues in baseline reconstruction
Variations in reliability of recall Memory distortion Secondary data not easy to use Secondary data incomplete or unreliable Key informants may distort the past

131 2. Ways to reconstruct comparison groups
Judgmental matching of communities When there is phased introduction of project services beneficiaries entering in later phases can be used as “pipeline” comparison groups Internal controls when different subjects receive different combinations and levels of services

132 Using propensity scores to strengthen comparison groups
Propensity score matching Rapid assessment studies can compare characteristics of project and comparison groups using: Observation Key informants Focus groups Secondary data Aerial photos and GIS data
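In propensity score matching, the scores are typically estimated first (e.g. with a logistic regression of project participation on observed characteristics from the rapid assessment), and each project unit is then paired with the comparison unit whose score is closest. A minimal sketch of that matching step, assuming the scores have already been computed; the village names and score values are invented for illustration:

```python
def match_on_propensity(project, comparison):
    """Nearest-neighbour matching (with replacement) on precomputed
    propensity scores, given as {unit_id: score} dicts."""
    return {
        p_id: min(comparison, key=lambda c_id: abs(comparison[c_id] - p_score))
        for p_id, p_score in project.items()
    }

project = {"village_A": 0.72, "village_B": 0.35}
comparison = {"village_X": 0.70, "village_Y": 0.40, "village_Z": 0.10}
print(match_on_propensity(project, comparison))
# {'village_A': 'village_X', 'village_B': 'village_Y'}
```

Note this sidesteps none of the issues on the next slide: unobserved differences between groups survive any amount of matching on observed scores.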

133 Issues in reconstructing comparison groups
Project areas often selected purposively and difficult to match Differences between project and comparison groups - difficult to assess whether outcomes were due to project interventions or to these initial differences Lack of good data to select comparison groups Contamination (good ideas tend to spread!) Econometric methods cannot fully adjust for initial differences between the groups [unobservables]

134 Enough of my presentations: it’s time for you (THE RealWorld PEOPLE
Enough of my presentations: it’s time for you (THE RealWorld PEOPLE!) to get involved yourselves.

135 Time for small-group work
Time for small-group work. Read your case studies and begin your discussions.

136

137 Small group case study work
Some of you are playing the role of evaluation consultants, others are clients commissioning the evaluation. Decide what your group will propose to do to address the given constraints/ challenges. Prepare to negotiate the ToR with the other group (later this afternoon).

138 The purpose of this exercise is to gain some practical 'feel' for applying what we've learned about RealWorld Evaluation. Group A (consultants) will consider how to propose a revised evaluation design and plan that reduces the budget by 25% to 50%, yet meets the needs of both clients: the City Housing Department and the international donor. Group B will also review the initial proposal in light of what they have learned about RealWorld Evaluation, and prepare to re-negotiate the plans with the consultancy group. Note: there are two types of clients: the Housing Department (project implementers) and the international donor (foundation). Groups are given 90 minutes to prepare their cases. Later the Consultants' group will meet with the Clients' groups to negotiate their proposed revisions in the plans for this evaluation. 45 minutes will be available for those negotiation sessions.

139 Mixed-method evaluations
RealWorld Evaluation: Designing Evaluations under Budget, Time, Data and Political Constraints. Mixed-method evaluations

140 It should NOT be a fight between pure
QUANTITATIVE (numbers alone) OR QUALITATIVE (verbiage alone) approaches. (Cartoon: “Quantoid!” vs. “Qualoid!”)

141 “Your numbers look impressive,
“Your numbers look impressive, but let me tell you the human interest story.” “Your human interest story sounds nice, but let me show you the statistics.”

142 What’s needed is the right combination of BOTH QUALITATIVE methods
AND QUANTITATIVE methods

143 Quantitative data collection methods
- Structured surveys (household, farm, transport usage, etc.)
- Structured observation
- Anthropometric methods
- Aptitude and behavioral tests
- Indicators that can be counted

144 Qualitative data collection methods: characteristics
- The researcher’s perspective is an integral part of what is recorded about the social world
- Scientific detachment is not possible
- Meanings given to social phenomena and situations must be understood
- Programs cannot be studied independently of their context
- Cause and effect are difficult to define clearly
- Change must be studied holistically

145 Using Qualitative methods to improve the Evaluation design and results
- Use recall to reconstruct the pre-test situation
- Interview key informants to identify other changes in the community or in gender relations
- Conduct interviews or focus groups with women and men to assess the effect of loans on gender relations within the household, such as changes in control of resources and decision-making
- Identify other important results or unintended consequences: an increase in women’s workload, or an increase in the incidence of gender-based or domestic violence

146 Mixed method evaluation designs
- Combine the strengths of both QUANT and QUAL approaches
- One approach (QUANT or QUAL) is often dominant and the other complements it
- Both approaches can be given equal weight, but this is harder to design and manage
- The approaches can be used sequentially or concurrently

147 Determining appropriate precision and mix of multiple methods
[Diagram: methods arrayed along two axes, from “Extractive --- Quantitative” to “Participatory --- Qualitative” and from “High rigor, high quality, more time & expense” to “Low rigor, questionable quality, quick and cheap”; examples include nutritional measurements, household surveys, focus groups, key informant interviews, and large-group discussions.]
Here we illustrate a sequential use of a mix of qualitative and quantitative methods, considering what levels of rigor would be appropriate for each. Beginning with discussions with representatives of the target community, there may be agreement on the need for household surveys to measure the nutritional status of children. The findings of the survey are then brought back to the community to involve them in the assessment and interpretation, and in joint decisions as to what should be done in the future.

148 Participatory approaches should be used as much as possible
but even they should be used with appropriate rigor: how many (and which) people’s perspectives contributed to the story?

149 Questions?

150 Time for consultancy teams to meet with clients to negotiate the revised ToRs for the evaluation of the housing project.

151 In conclusion: Evaluators must be prepared to:
- Enter at a late stage in the project cycle;
- Work under budget and time restrictions;
- Work without access to comparative baseline data;
- Do without a feasible comparison group;
- Work with very few well-qualified evaluation researchers;
- Reconcile different evaluation paradigms and the information needs of different stakeholders.

152 Main workshop messages
- Evaluators must be prepared for RealWorld evaluation challenges.
- There is considerable experience to learn from.
- A toolkit of practical “RealWorld” evaluation techniques is available (see
- Never use time and budget constraints as an excuse for sloppy evaluation methodology.
- A “threats to validity” checklist helps keep you honest by identifying potential weaknesses in your evaluation design and analysis.

153 We hope these ideas will be helpful as you go forth and practice evaluation in the RealWorld!


Download ppt "Jim Rugh and Michael Bamberger"

Similar presentations


Ads by Google