Published by Jaren Roose. Modified over 2 years ago.

1
NATIONAL DEFENSE INTELLIGENCE COLLEGE Measuring Forecaster Performance Lt Col James E. Kajdasz, Ph.D., USAF

2
Scholarship of Intelligence Analysis
“A comprehensive review of the literature indicates that while much has been written, largely there has not been a progression of thinking relative to the core aspect and competencies of doing intelligence analysis.” (Mangio & Wilkinson, 2008)
“Do [they] teach structured methods because they are the best way to do analysis, or do they teach structured methods because that’s what they can teach?” (Marrin, 2009)

3
Grade forecasters on % correct judgments?
We could grade forecaster accuracy like a true/false test, with yes/no answers:
–Will Qadhafi still be in Libya at this time next year? No
–Will the government of Yemen fall in the next year? No
–Will I still be driving my 2001 Corolla in the year 2020? Yes
Wait until the outcomes occur or fail to occur, then calculate the percentage of correct forecasts. Compare Forecaster A with Forecaster B by seeing who has the higher % correct.
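
The percent-correct comparison described on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `percent_correct` and the forecasts and outcomes for the two forecasters are hypothetical, not data from the presentation.

```python
def percent_correct(forecasts, outcomes):
    """Return the percentage of yes/no forecasts that matched the outcomes."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need two non-empty lists of equal length")
    hits = sum(f == o for f, o in zip(forecasts, outcomes))
    return 100.0 * hits / len(forecasts)


# Hypothetical data: True means "yes", False means "no".
outcomes = [False, True, False, True, True]
forecaster_a = [False, True, True, True, True]
forecaster_b = [False, False, True, True, True]

score_a = percent_correct(forecaster_a, outcomes)
score_b = percent_correct(forecaster_b, outcomes)
print(f"Forecaster A: {score_a:.0f}% correct")
print(f"Forecaster B: {score_b:.0f}% correct")
```

Note that this measure ignores how confident each forecaster was, which is the limitation the next slides address.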

4
What about probabilistic judgments?
When there is a high level of uncertainty, laypeople and even experts often qualify their judgments:
–Will Qadhafi still be in Libya at this time next year? No (70% confidence)
–Will the government of Yemen fall in the next year? No (60% confidence)
–Will I still be driving my 2001 Corolla in the year 2020? Yes (95% confidence)

5
What about probabilistic judgments?
Subjective probability scale (Tetlock, 2005), running from 0 to 1.0 in steps of .1.
Anchor labels, from lowest to highest: Impossible; Highly unlikely; Somewhat unlikely; As likely as other two possibilities combined; Somewhat likely; Highly likely; Certainty.

6
Let’s compare analysts…
So which analyst performed best? It’s hard to say. We need a single statistic that summarizes total performance.
Probability assigned
Event  Occurred?  Analyst 1  Analyst 2  Analyst 3
1      No (0)     0          0          0.1
2      Yes (1)    0.9        0.7        0.7
3      No (0)     0.1        0.3        0
4      Yes (1)    0.7        0.5        0.5
5      Yes (1)    0.9        1          1

7
Mean Probability Score
Probability Score (PS), or Brier Score: $PS = (f - d)^2$
–Estimate ($f$): probability provided by the forecaster, from .00 to 1.00
–Outcome ($d$): 0 if the event did not occur, 1 if the event did occur

8
Mean Probability Score
Probability Score or Brier Score, worked example:
–Forecaster says 70% probability X will occur.
–X occurs.
–$PS = (0.70 - 1)^2 = 0.09$
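
The single-forecast case on this slide can be checked with a minimal sketch. The function name `probability_score` is an assumption made for illustration; the score itself is the standard squared difference between the probability estimate and the 0/1 outcome.

```python
def probability_score(estimate, outcome):
    """Squared error between a probability estimate and a 0/1 outcome."""
    if not 0.0 <= estimate <= 1.0:
        raise ValueError("estimate must lie in [0, 1]")
    if outcome not in (0, 1):
        raise ValueError("outcome must be 0 or 1")
    return (estimate - outcome) ** 2


# The slide's case: 70% forecast, and the event occurs.
ps_hit = probability_score(0.70, 1)
# The same forecast if the event had not occurred.
ps_miss = probability_score(0.70, 0)
print(f"event occurs:         PS = {ps_hit:.2f}")
print(f"event does not occur: PS = {ps_miss:.2f}")
```

A score of 0 is a perfect forecast and 1 is the worst possible, so the same 70% estimate is penalized far more heavily when the event fails to occur.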

9
Mean Probability Score
Mean Probability Score, or Mean Brier Score: $\overline{PS} = \frac{1}{N} \sum_{i=1}^{N} (f_i - d_i)^2$

10
Let’s compare analysts…
Probability assigned
Event    Occurred?  Analyst 1  Analyst 2  Analyst 3
1        No (0)     0          0          0.1
2        Yes (1)    0.9        0.7        0.7
3        No (0)     0.1        0.3        0
4        Yes (1)    0.7        0.5        0.5
5        Yes (1)    0.9        1          1
Mean PS             0.02       0.09       0.07
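
The mean scores in the bottom row can be reproduced with the short sketch below. The function name `mean_probability_score` is hypothetical. The probabilities for events 2 and 4 are taken as 0.9/0.7/0.7 and 0.7/0.5/0.5 for Analysts 1 to 3; that reading is an assumption, chosen because it reproduces the reported means of 0.02, 0.09, and 0.07.

```python
def mean_probability_score(estimates, outcomes):
    """Average squared error between probability estimates and 0/1 outcomes."""
    if len(estimates) != len(outcomes) or not estimates:
        raise ValueError("need two non-empty lists of equal length")
    total = sum((f - d) ** 2 for f, d in zip(estimates, outcomes))
    return total / len(estimates)


# Outcomes for events 1-5 (1 = occurred, 0 = did not occur).
outcomes = [0, 1, 0, 1, 1]

# Assumed reading of the table (see the note above about events 2 and 4).
analysts = {
    "Analyst 1": [0.0, 0.9, 0.1, 0.7, 0.9],
    "Analyst 2": [0.0, 0.7, 0.3, 0.5, 1.0],
    "Analyst 3": [0.1, 0.7, 0.0, 0.5, 1.0],
}

scores = {name: mean_probability_score(p, outcomes) for name, p in analysts.items()}
for name, score in scores.items():
    print(f"{name}: mean PS = {score:.2f}")
```

Lower is better, so under this reading Analyst 1 performed best overall.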

11
Components of Total Forecaster Error
Several things contribute to overall error, not all of which the forecaster can control:
–Discrimination errors
–Calibration errors
–Variance of the outcome

12
Decomposing Mean Probability Score
The mean probability score decomposes into four components: Bias, Slope, Scatter, and Var(d), the variance of the outcome.

13
Decomposing PS: Bias
$\text{Bias} = \bar{f} - \bar{d}$
Where: $\bar{f}$ = mean estimate, $\bar{d}$ = mean outcome
Figure: Estimated Probability of Survival (f) plotted against Outcome Index (d). Arkes, Dawson, Speroff, et al. (1995)

14
Decomposing PS: Slope
$\text{Slope} = \bar{f}_1 - \bar{f}_0$
Where: $\bar{f}_1$ = mean estimate when outcome was 1, $\bar{f}_0$ = mean estimate when outcome was 0
Figure: Estimated Probability of Survival (f) plotted against Outcome Index (d). Arkes, Dawson, Speroff, et al. (1995)

15
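Decomposing PS: Scatter
$\text{Scatter} = \frac{N_1 \operatorname{Var}(f_1) + N_0 \operatorname{Var}(f_0)}{N}$
Where: $\operatorname{Var}(f_1)$ = variance of estimates when outcome was 1, $\operatorname{Var}(f_0)$ = variance of estimates when outcome was 0; $N_1$ and $N_0$ are the numbers of cases with outcome 1 and 0, and $N = N_1 + N_0$
Figure: Estimated Probability of Survival (f) plotted against Outcome Index (d). Arkes, Dawson, Speroff, et al. (1995)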
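
The three components above, together with Var(d), can be computed and recombined as in the sketch below. It assumes the slides follow Yates's covariance decomposition, in which mean PS = Var(d) + Bias² + Var(d)·Slope·(Slope − 2) + Scatter, with Var(d) = d̄(1 − d̄) and Scatter taken as the case-weighted average of the two conditional variances. The function name `decompose_mean_ps` and the example estimates are illustrative assumptions, not material from the presentation.

```python
from statistics import fmean, pvariance


def decompose_mean_ps(estimates, outcomes):
    """Split the mean probability score into Bias, Slope, Scatter, and Var(d).

    Assumes Yates's covariance decomposition (an assumption about the slides).
    """
    f1 = [f for f, d in zip(estimates, outcomes) if d == 1]
    f0 = [f for f, d in zip(estimates, outcomes) if d == 0]
    if not f1 or not f0:
        raise ValueError("need at least one case of each outcome")

    n = len(estimates)
    d_bar = fmean(outcomes)
    var_d = d_bar * (1 - d_bar)              # variance of a 0/1 outcome
    bias = fmean(estimates) - d_bar          # mean estimate minus mean outcome
    slope = fmean(f1) - fmean(f0)            # separation between the two groups
    scatter = (len(f1) * pvariance(f1) + len(f0) * pvariance(f0)) / n

    mean_ps = fmean([(f - d) ** 2 for f, d in zip(estimates, outcomes)])
    recombined = var_d + bias ** 2 + var_d * slope * (slope - 2) + scatter
    return {
        "mean_ps": mean_ps,
        "var_d": var_d,
        "bias": bias,
        "slope": slope,
        "scatter": scatter,
        "recombined": recombined,
    }


# Illustrative estimates and outcomes (hypothetical, five forecasts).
result = decompose_mean_ps([0.0, 0.7, 0.3, 0.5, 1.0], [0, 1, 0, 1, 1])
for key, value in result.items():
    print(f"{key:>10}: {value:.4f}")
```

The `recombined` value should match `mean_ps` up to floating-point error, which is a quick check that the components were computed consistently.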

16
Patients and Doctors
          PS    Bias    Slope  Scatter
Patients  .23   0.13    .13    .05
Doctors   .18   -0.11   .26    .05
Figure: Estimated Probability of Survival (f) plotted against Outcome Index (d). Arkes, Dawson, Speroff, et al. (1995)

17
Prediction Markets

18
A Priori Hypotheses
H1: Discrimination will improve as the event nears – the Slope measure will increase over time.
H2: Scatter will decrease as the event nears – the Scatter measure will get smaller over time.
H3: Analysts will be biased toward predicting the status quo – the Bias measure will be negative.

19
T-70 Days

20
T-60 Days

21
T-50 Days

22
T-40 Days

23
T-30 Days

24
T-20 Days

25
T-10 Days

26
Total Error over Time
–PS is a measure of overall error; a low PS is better.
–Graph suggests a curvilinear relationship with time.

27
Components of Error
–PS is composed of Bias, Slope, Scatter, and Variance of the outcome.
–Graph suggests the decrease in error is primarily due to improvement in Slope.
–Slope is a measure of discrimination; a high Slope is better.

28
Modeling Slope Over Time
–The observed Slope was modeled.
–Curvilinear relationship modeled with Days and Days² (adjusted R² = .834, p = .01).
–H1 supported: discrimination improves as the date approaches.
Figure: Slope (vertical axis, -.2 to .6) over time.
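
A curvilinear model with Days and Days² terms is an ordinary least-squares fit of a quadratic. The sketch below shows one way such a fit and its adjusted R² could be computed with the standard library alone. The helper names `fit_quadratic` and `adjusted_r_squared` and the Slope values are hypothetical, so the output will not match the .834 reported on the slide, and the sketch does not compute the p-value.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    rows = [[1.0, x, x * x] for x in xs]
    # Augmented matrix [X'X | X'y] for the three coefficients.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(3):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][3] / aug[i][i] for i in range(3)]


def adjusted_r_squared(xs, ys, coefs):
    """Adjusted R² for a fitted quadratic (two predictors: x and x**2)."""
    n, p = len(ys), len(coefs) - 1
    y_bar = sum(ys) / n
    fitted = [coefs[0] + coefs[1] * x + coefs[2] * x * x for x in xs]
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical Slope values at T-70 ... T-10 days (not the study's data).
days = [70, 60, 50, 40, 30, 20, 10]
slopes = [0.05, 0.04, 0.08, 0.15, 0.27, 0.41, 0.58]

coefs = fit_quadratic(days, slopes)
adj_r2 = adjusted_r_squared(days, slopes, coefs)
print("coefficients (b0, b1, b2):", [round(c, 5) for c in coefs])
print(f"adjusted R^2: {adj_r2:.3f}")
```

For real analyses a statistics package would normally be used instead, since it also reports standard errors and significance tests.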

29
Scatter Over Time
–Scatter is a measure of the ‘spread’ of probability estimates.
–Slight linear trend, not significant.
–H2 not supported.

30
Bias Over Time
–Questions were recoded so that probability ‘0’ represents a continuation of the status quo and probability ‘1’ represents a change in the status quo.
–Analysts were biased toward predicting a change in the status quo, as indicated by positive Bias numbers: t(6) = 4.73, p < .01.
–H3 not supported, BUT the results are significant in the direction opposite to that hypothesized.
–Linear trend over time not statistically significant.
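
A statistic of the form t(6) is what a one-sample t-test gives for seven values, here presumably one Bias value per time point from T-70 to T-10, tested against zero. The sketch below shows that calculation under this assumption. The function name `one_sample_t` and the Bias values are hypothetical, so the result will differ from the 4.73 reported on the slide.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values, mu0=0.0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    standard_error = stdev(values) / sqrt(n)  # stdev uses the n - 1 denominator
    return (mean(values) - mu0) / standard_error, n - 1


# Hypothetical Bias values at T-70 ... T-10 days (not the study's data).
bias_by_period = [0.12, 0.08, 0.15, 0.10, 0.06, 0.11, 0.09]

t_stat, df = one_sample_t(bias_by_period)
print(f"t({df}) = {t_stat:.2f}")
```

A positive t statistic here corresponds to Bias values above zero on average, that is, a tendency to over-predict change under the recoding described on the slide.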

31
Lt Col James E. Kajdasz, Ph.D., USAF
James.kajdasz@dia.mil
The views expressed in this presentation are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.
