Paul Bakker – Social Impact Squared


1 Paul Bakker – Social Impact Squared
Differences in Using Common Metrics to Measure Business vs. Social Performance
Paul Bakker – Social Impact Squared
SIAA – Canada Launch, Sept. 13, 2013

2 Profit Bottom Line Only (Traditional Business)
Theories of Change

Profit Bottom Line Only (Traditional Business) vs. Social Bottom Line logic model:
- Inputs: Investment for Cost of Production (business); Income (Donations/Grants) (social)
- Activities: Manufacturing, Marketing, Sales, etc. (business); Distribution of Needed Goods, Social Services (social)
- Outputs: # of Goods and Services Sold (Income, Profit), which is the goal (business); # of People Served (social)
- Outcomes: Assumed Positive Outcomes (business); Positive Change in People’s Lives, which is the goal (social)

I recognize that this is a simplification, but I think the main points apply. Now let’s explore the similarities and differences between a profit logic model and a social change logic model.

The key differences relate to outputs and outcomes. The goal of making profit stops at the provision of outputs. Whether the products or services have positive effects on society is only a concern if negative results will discourage future sales. Traditional business theory assumes that a business endeavour has positive social value if people are willing to pay for it. This assumption freed traditional business from having to pay much attention to, or measure, its social impact. However, we can think of many examples of highly purchased products that have negative or questionable social consequences (e.g. tobacco, certain TV programming, guns).

When you are trying to achieve social change, outputs are only important if they lead to outcomes; that is, positive changes in people’s lives. So the social change logic model requires us to pay attention to and measure outcomes rather than just outputs.

Another important difference is that the goal of the traditional business model is always the same (profit), which facilitates benchmarking against other businesses. For social change efforts, the goal can take any number of forms, making benchmarking more difficult.

3 Measuring Outputs vs. Outcomes
Outputs
- Directly observable
- Count how many people bought your products or services.
- Record how much they cost you to produce.
- Record how much people paid for them.

Outcomes
- Not directly observable
- Hard to observe changes in ideas, attitudes, and beliefs
- Often lose contact with clients
- Observed changes might not be caused by you

Why can’t we just apply business’s knowledge of measuring outputs to measuring outcomes? Because, unlike outputs, outcomes are not directly observable. It is harder to observe what is going on inside people’s heads, although market researchers have these skills.

Social programs often lose contact with clients after their programs are over. Consider a program trying to prevent at-risk youth from re-engaging with the justice system. These youth often want their actions to remain unobserved, their families may move frequently, and privacy rules make it hard to obtain data from institutions. All of this makes it hard to observe what happened to youth after they left the program.

Even if you are able to observe what changed in clients’ lives, it is hard to attribute those changes to your actions. Consider the youth crime prevention example: did the youth stop engaging in criminal activity because of your program, a change in schools, a change in parenting, other community programs, or something else?

The key to figuring out how much of the observed change is attributable to you is to estimate what would have occurred if you hadn’t provided your goods or services. In technical terms, this is called the counterfactual. I don’t think traditional business performance measurement has paid much attention to establishing the counterfactual.

4 How to Describe Alternative Reality (i.e. the counterfactual)
It’s all about making comparisons:
- To similar groups that didn’t receive your goods/services (equivalent or nonequivalent comparison groups)
- To the past (pre-post, longitudinal, time-series)
- To groups that are different in known ways (cut-off score designs, regression point displacement)
- To what statistics predict would have happened
- To what experts think would have happened (including clients and other stakeholders)

So the additional challenge for social impact analysts is to somehow come up with an accurate description of what would have occurred in an alternative reality where clients did not receive your goods or services.

There are many different ways to estimate the counterfactual. We don’t have time to review all of them in detail. For now, it is important to highlight that the likelihood of obtaining accurate estimates varies by method. From my observations, the field of evaluation (at least in Canada) has come to a general consensus that the best methods depend on the context of the program and the focus of the evaluation.

Examples, if needed: Let’s consider our example of a program that is trying to prevent at-risk youth from engaging with the justice system. Let’s also say that many of the youth are already involved in criminal activity. What comparison should we use?
- Randomization: staff may be morally opposed; contamination between groups; youth might not want to show up if friends can’t come; poor generalizability.
- Comparison group: it is difficult to find other crime-involved youth whom you aren’t serving, who are willing to let you collect data from them, and who are not meaningfully different from the group in your program.
- Pre-post: not meaningful in this case, because you are trying to prevent something from happening, not change an existing characteristic.
- Statistical predictions: maybe, but it is very hard to create tools that can accurately predict people’s behaviour.
- Regression discontinuity (cut-off score design): maybe, but you would need hundreds of youth in the sample, and selection into the program would have to be based on a cut-off.

If the program costs $100,000 to run, do you want to spend more than that to measure its impact?

Another example: research has shown that the literacy and essential skills of those in training programs often improve in the short term, but not in comparison to similar people who are not in the training program (as their skills tended to improve too). What did improve was literacy habits, like reading books more often, which after many years led to better outcomes than those of people who did not take the training. But do we want to require every literacy training organization to undertake an expensive multi-year longitudinal study of its social impact? After a few strong studies, I would suggest that weaker designs would give enough confidence that things are working.
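To make the comparison logic concrete, here is a minimal Python sketch of why a pre-post measure alone can overstate impact when a comparison group improves too, as in the literacy example. All scores are hypothetical numbers invented for illustration, and the function names are assumptions, not part of any evaluation toolkit.

```python
from statistics import mean

def pre_post_estimate(pre, post):
    """Naive estimate: average change among one group, with no counterfactual."""
    return mean(post) - mean(pre)

def comparison_group_estimate(pre_t, post_t, pre_c, post_c):
    """Participant change minus comparison-group change.

    The comparison group's change stands in for the counterfactual:
    what would likely have happened without the program.
    """
    return pre_post_estimate(pre_t, post_t) - pre_post_estimate(pre_c, post_c)

# Hypothetical literacy scores (illustrative only)
participants_pre = [50, 55, 60, 65]
participants_post = [58, 63, 68, 73]
comparison_pre = [52, 54, 61, 63]
comparison_post = [59, 61, 68, 70]

naive = pre_post_estimate(participants_pre, participants_post)
adjusted = comparison_group_estimate(
    participants_pre, participants_post, comparison_pre, comparison_post
)

print(f"Pre-post estimate of impact: {naive}")
print(f"Estimate net of comparison group: {adjusted}")
```

In this made-up data the participants gain 8 points, but the comparison group gains 7, so only about 1 point is plausibly attributable to the program. The adjusted figure is only as good as the comparison group is similar to the participants.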

5 The State of Social Impact Measurement
- There are different ways to attribute observed change to social programs, some of which are complex and expensive.
- The most appropriate methods depend on the context of the program.
- The most frequently used methods provide estimates that have a good chance of being off the mark.

So which methods of estimating the counterfactual are typically being used? The 2012 State of Evaluation found that 65% of the nonprofit organizations surveyed said they used before-and-after measures, while only 6% said they used quasi-experimental designs or control groups, and only 4% said they used randomized controlled trials.

My description of the state of social impact measurement doesn’t sound so good, does it? I want to assure you that I believe that social impact analysis can have tremendous value, but it does have its limitations, and we need to be aware of them.

6 Social Impact Common Measurement Systems
- Some want to be able to assess the performance of social investments like they assess the performance of financial investments.
- To date, these efforts have developed good activity and output metrics that are standardized and comparable across programs.
- But they are challenged to incorporate outcome measurement.

Examples of Social Impact Common Measurement Systems:
- Impact Reporting and Investment Standards (IRIS)
- Global Impact Investing Rating System (GIIRS)
- Charity Navigator

Now, let’s change focus a bit and review efforts to create common measurement systems for social impact. I am not going to go over all these systems in detail right now. Maybe we can explore the details during the discussion that we’ll have soon. For now, here are the points I want to make.

The tools of financial performance measurement serve us well when developing systems to compare activities and outputs, but what really matters is outcomes, and each of these systems could do a better job of incorporating outcome measurement.

Part of the reason the systems currently don’t have a strong focus on outcome measurement is that outcome data in any form is not consistently available throughout the social sector. The different systems deal with this challenge in different ways, and some are taking action to encourage social impact reporting throughout the social sector.

7 Comparability of Outcome Metrics
Even if social outcome data were readily available, it wouldn’t be fully comparable:
- Differences in attribution methods (counterfactual estimates).
- Measurement tools like surveys have different degrees of error, and biases can vary by cultural group and context.

The systems do not prescribe study designs, and they shouldn’t. Even if programs used the same methods, results could still not be fully comparable. For example, social desirability biases in surveys have been found to be higher in more collectivistic cultures. So, if surveying is used to estimate social impact, programs serving different ethnic groups might look like they have different levels of social impact when in fact they don’t.

8 Where Do We Go From Here
Given that outcome measures are estimates and never fully comparable, what should we do with systems designed to compare social performance?
- Abandon them?
- Nothing, they are fine the way they are?
- Improve them? How?

Let’s discuss!

Nothing: am I making a mountain out of a molehill? My position is that we have to improve them.

9 Recognizing Uncertainty
Program 1
- SROI of 1:16 or GIIRS score of 140
- Average Level of Evidence of Effectiveness of Program Components: 6

Program 2
- SROI of 1:4 or GIIRS score of 95
- Average Level of Evidence of Effectiveness of Program Components: 2.75

Levels of Evidence
1 Systematic review of randomized controlled trials
2 At least one randomized controlled trial
3 Multiple well-designed quasi-experimental studies
4 At least one well-designed quasi-experimental study
5 Descriptive studies, such as correlation studies
6 Stakeholder/expert opinion
7 No good evidence

There are different approaches that we can take, but if we want to develop performance dashboards like those used in the financial sector, we need to incorporate measurement error into the dashboard. There needs to be an uncertainty metric. Here is an example of what it could look like if we recognized uncertainty in common performance systems for social impact.

This will:
- Help both risk-taking and cautious investors make investing decisions.
- Discourage the use of weak attribution methods that are more likely to provide overestimates of impact.
- At the same time, allow risk-taking investors to invest in innovative and promising approaches that have yet to be tested with more rigorous impact measurement approaches.

What do you think of this approach?
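As a rough illustration of the uncertainty metric proposed on this slide, the following Python sketch reports an average level of evidence next to each program’s SROI. The slide gives only the averages (6 and 2.75); the per-component evidence levels below, and the names Program, average_evidence_level and dashboard_row, are assumptions made up so that the averages match.

```python
from dataclasses import dataclass
from statistics import mean

# Evidence scale from the slide: 1 is the strongest evidence, 7 the weakest.
EVIDENCE_LEVELS = {
    1: "Systematic review of randomized controlled trials",
    2: "At least one randomized controlled trial",
    3: "Multiple well-designed quasi-experimental studies",
    4: "At least one well-designed quasi-experimental study",
    5: "Descriptive studies, such as correlation studies",
    6: "Stakeholder/expert opinion",
    7: "No good evidence",
}

@dataclass
class Program:
    name: str
    sroi_return: float            # social value returned per 1 unit invested
    component_levels: list        # hypothetical evidence level per component

    def average_evidence_level(self):
        # Reject levels that are not on the slide's 1-7 scale.
        for level in self.component_levels:
            if level not in EVIDENCE_LEVELS:
                raise ValueError(f"Unknown evidence level: {level}")
        return mean(self.component_levels)

def dashboard_row(program):
    """One dashboard line pairing the headline figure with its uncertainty."""
    return (
        f"{program.name}: SROI 1:{program.sroi_return:g}, "
        f"average level of evidence {program.average_evidence_level():g}"
    )

# Component levels are invented so the averages match the slide.
program_1 = Program("Program 1", 16, [6, 6, 6])
program_2 = Program("Program 2", 4, [2, 3, 3, 3])

for p in (program_1, program_2):
    print(dashboard_row(p))
```

Read side by side, Program 1 shows the higher return but rests on stakeholder or expert opinion, while Program 2 shows a lower return backed by stronger study designs. A simple average is only one possible way to summarize evidence strength; weighting components by their share of the claimed impact would be another.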

