ABS VV&A Framework Study Phase II Pythagoras COIN – Application of the Validation Framework Lisa Jean Moya WernerAnderson, Inc.

Presentation transcript:

ABS VV&A Framework Study Phase II
Pythagoras COIN – Application of the Validation Framework
Lisa Jean Moya, WernerAnderson, Inc.
Phase II Workshop 3, 9 July 2008
P-COIN Validation Briefing Wkshp3 Moya

Scenario

Analysis Context
Can Pythagoras be used to model population dynamics?
In a Disaster Relief/Humanitarian Assistance mission for the stated scenario, is it better to base the MAGTF ashore or afloat?
Alternative selection drivers
– Do no harm: create no increase in insurgency activity.
– Improve the political situation: create an improvement in GOVT and Pro-GOVT sectors.
Measures
– Box & Whisker plot comparisons of the percent of population by population segment in each insurgency sector at end state (18 months)

Conceptual Model of Civilian Population
[Diagram: civilian population segments placed by insurgency behavior orientation across five sectors: FARC, Pro-FARC, Neutral, Pro-GoC, GoC. Labeled dynamics: natural drift, salience, influencing events.]
FARC = Revolutionary Armed Forces of Colombia
GoC = Govt of Colombia


Areas of Interest
Core
– MAGTF influence on Insurgency orientation
Cases
– MAGTF/No MAGTF
– Ashore/Afloat
Dynamic Influences
– Natural Drift
– Salience
Background
– Population segments

Areas of Interest (first order assessment)
Core
– MAGTF influence on Insurgency orientation
Cases
– MAGTF/No MAGTF
– Ashore/Afloat
Dynamic Influences
– Natural Drift
– Salience
Background
– Population segments

Pythagoras-COIN Building Blocks

Salience as a Dynamic Influence


As the simulation might progress …
An extreme example to demonstrate the issue

Implications
Influence changers are applied only with respect to the initial state; changes in orientation do not change the influence
→ Dynamic effects of salience and natural drift are not accounted for
→ Secondary and tertiary effects of MAGTF arrival are not accounted for … dampening on the insurgency orientation
! Risk is that the simulation does not model the desired population dynamics
! Risk is that the dynamics of the MAGTF arrival are not adequately captured

Data
Data imprecise
– Mitigated by “tolerance” and multiple runs
Data processing
– Need to verify that the process results in the expected directional & magnitude shifts
Data is perishable
– Natural drift data has an embedded perishability
  Would an actual model use require a “warm-up period” on the Markov chain?
– Other influencing events might significantly change the data values
"Humanitarian exchange." Wikipedia, The Free Encyclopedia. 3 Jul 2008, 10:26 UTC. Wikimedia Foundation, Inc. 9 Jul ?title=Humanitarian_exchange&oldid=
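The “warm-up period” question above can be illustrated with a minimal sketch. It assumes the natural drift is a first-order Markov chain over the five insurgency sectors; the transition matrix `P`, the tolerance, and the helper names `step` and `warm_up` are hypothetical and are not taken from the P-COIN data. Warming up here simply means iterating the chain until the distribution stops changing.

```python
# Hypothetical monthly drift matrix over the five insurgency sectors.
# The numbers are illustrative only; they are NOT the study's data.
SECTORS = ["FARC", "Pro-FARC", "Neutral", "Pro-GoC", "GoC"]
P = [
    [0.90, 0.10, 0.00, 0.00, 0.00],
    [0.05, 0.85, 0.10, 0.00, 0.00],
    [0.00, 0.05, 0.90, 0.05, 0.00],
    [0.00, 0.00, 0.10, 0.85, 0.05],
    [0.00, 0.00, 0.00, 0.10, 0.90],
]


def step(dist, matrix):
    """Advance a row-vector distribution by one time step: dist @ matrix."""
    n = len(dist)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def warm_up(dist, matrix, tol=1e-9, max_steps=100_000):
    """Iterate until the largest per-sector change falls below tol.

    Returns the (approximately) steady distribution and the step count.
    """
    for steps in range(1, max_steps + 1):
        nxt = step(dist, matrix)
        if max(abs(a - b) for a, b in zip(nxt, dist)) < tol:
            return nxt, steps
        dist = nxt
    raise RuntimeError("chain did not settle within max_steps")


# Start with the whole segment in the Neutral sector (an arbitrary choice).
start = [0.0, 0.0, 1.0, 0.0, 0.0]
steady, n_steps = warm_up(start, P)
for name, share in zip(SECTORS, steady):
    print(f"{name:9s} {share:.4f}")
print("steps needed:", n_steps)
```

The step count gives a rough sense of how long a warm-up would have to be before the starting distribution stops mattering, under these assumed transition rates.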

Uses of the Model
Markov assumptions in referent descriptions
– No long term effects
– May need a “warm-up” period
  Outside the salience or along with salience?
Data precision
– No exact results; results in the distribution
Data perishability
– Need to collect new data after significant events
– Including MAGTF departure
Were the dynamics captured …
– Could add other influencing events
  Could add additional dynamics
Q: Can we apply the influencers more robustly?
Q: Would changing our initial starting agents help?

Analysis Results
Given in box & whiskers plots at end state with data table
No statistical comparisons
No “hard” description of better
Point estimate in time (18 months)
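For reference, the numbers behind a box & whiskers plot can be tabulated directly, which makes side-by-side comparison of cases easier than reading plots. The sketch below is illustrative only: `five_number_summary` is a hypothetical helper, and the replicate values are made up rather than taken from the P-COIN runs.

```python
import statistics


def five_number_summary(values):
    """Return the min, quartiles, and max that a box & whiskers plot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Made-up end-state percentages (18 months) of one population segment in the
# Pro-GoC sector across replicate runs. Not the study's data.
afloat = [31.2, 29.8, 33.1, 30.5, 32.0, 28.9, 31.7]
ashore = [29.5, 30.1, 28.2, 31.0, 27.9, 29.0, 30.4]

for label, runs in (("afloat", afloat), ("ashore", ashore)):
    print(label, five_number_summary(runs))
```

A summary like this still gives only a point estimate in time; it does not by itself supply the statistical comparison the slide notes is missing.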

MAGTF Influence Data
Expect: Sea better than shore
– Urban Middle Class & Urban Poor “drive” result
  Catholic Church drive more right with Sea vs Shore
  Displaced Persons “wash”?
Salience causes Urban Poor and Middle Class to be like Military; Military to be like Catholic Church
– in opposition to the direct MAGTF influence … what would we expect?

Second Order Effects
What do we expect in the interactions?

Earlier iteration (Military)
[Chart: cases shown are No MAGTF, Afloat, Ashore]
Not clear from documentation what is being reported
These multiple influences appear to be captured
– Presuming no population segment strays “too far” from initial state
– Except … Military does!

Analysis Conclusions (IPR#5)
“Which COA is better?” cannot be answered with much confidence
“What is the chance that ashore is better than afloat?” can be answered with greater confidence
– More pro-Government sentiment if Marines stay afloat
– Lower pro-FARC sentiment if Marines stay afloat
– Marine arrival has a polarizing effect (fewer neutrals)
– Marine arrival in either case increases anti-Government sentiments of the Illicit Organizations and the Military
Afloat seems to usually do less harm.
– There is no factor in our influence estimation that BOTH reduces the negative impact of Ashore AND increases the negative impact of Afloat
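One simple way to put a number on a “what is the chance that one COA is better” question is the fraction of run pairs in which one case beats the other (the common-language effect size, which is the Mann-Whitney U statistic divided by the number of pairs). The sketch below is an assumption about how such a comparison could be done, not the method used in IPR#5; `prob_superior` and the sample values are hypothetical.

```python
def prob_superior(a, b):
    """Estimate P(a run from `a` exceeds a run from `b`).

    Every run in `a` is compared with every run in `b`; ties count as half.
    """
    wins = sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x in a
        for y in b
    )
    return wins / (len(a) * len(b))


# Made-up end-state pro-Government percentages across replicate runs.
# These are NOT the study's results.
afloat = [31.2, 29.8, 33.1, 30.5, 32.0, 28.9, 31.7]
ashore = [29.5, 30.1, 28.2, 31.0, 27.9, 29.0, 30.4]

print("P(afloat > ashore) ~", round(prob_superior(afloat, ashore), 3))
print("P(ashore > afloat) ~", round(prob_superior(ashore, afloat), 3))
```

With ties counted as half, the two estimates always sum to 1, so a value near 0.5 means the runs give little basis for preferring either case.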

Analysis Conclusions (IPR#5) (cont)
Because the current Markov chain will eventually return to the same steady state, regardless of MAGTF action, once the MAGTF leaves, we need to consider:
– Does the MAGTF commander care about leaving a lasting impression?
– At what point in time do we measure ‘better’?
– Pythagoras could change the final steady state as a function of one or more population segments exceeding or falling below some target value. However, this data was not collected.
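The return to the same steady state can be seen in a toy example. The sketch below assumes, purely for illustration, that MAGTF presence temporarily swaps a baseline transition matrix for an influenced one and that the baseline resumes on departure; the three-sector matrices, the six-month presence, and the names `BASE`, `INFLUENCED`, and `run` are all hypothetical and do not describe the P-COIN implementation.

```python
def step(dist, matrix):
    """Advance a row-vector distribution by one month: dist @ matrix."""
    n = len(dist)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


# Hypothetical three-sector chain (anti-government, neutral, pro-government).
BASE = [
    [0.90, 0.10, 0.00],
    [0.05, 0.90, 0.05],
    [0.00, 0.10, 0.90],
]
# Hypothetical transition rates while the MAGTF is present.
INFLUENCED = [
    [0.80, 0.20, 0.00],
    [0.02, 0.88, 0.10],
    [0.00, 0.05, 0.95],
]


def run(dist, months, influence_months):
    """Apply INFLUENCED for the first influence_months, then BASE."""
    for month in range(months):
        matrix = INFLUENCED if month < influence_months else BASE
        dist = step(dist, matrix)
    return dist


start = [0.3, 0.4, 0.3]
no_magtf_18 = run(start, 18, 0)      # end state without any MAGTF influence
magtf_18 = run(start, 18, 6)         # MAGTF present for the first 6 months
no_magtf_long = run(start, 2000, 0)  # long-run behaviour, no MAGTF
magtf_long = run(start, 2000, 6)     # long-run behaviour after a 6-month stay

print("18 months, no MAGTF:", [round(x, 3) for x in no_magtf_18])
print("18 months, MAGTF   :", [round(x, 3) for x in magtf_18])
print("long run, no MAGTF :", [round(x, 3) for x in no_magtf_long])
print("long run, MAGTF    :", [round(x, 3) for x in magtf_long])
```

Under these assumptions the two cases differ at 18 months but coincide in the long run, which is why the choice of measurement time matters for any claim that one COA is “better”.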

Validation Conclusions
1. The P-COIN simulation fails to capture the dynamic effects intended in the conceptual model of the insurgency in Colombia provided to the P-COIN developer. That is, P-COIN does not capture the secondary and tertiary effects of the natural drift of population segments between insurgency sectors or the salience between population segments resulting from the influencing event of the MAGTF.
2. The data supporting the P-COIN model is perishable and of low precision. Care should be taken when using the data beyond its origination date; perhaps “warming-up” the Markov chains supporting the data used to build the P-COIN model. Further, the data cannot be deemed valid if an influencing event occurs that would cause the base data used in this simulation to change.

Validation Conclusions (cont)
3. The P-COIN model should not be used to evaluate long term effects on the population resulting from the influencing event of the MAGTF arrival.
4. This model and simulation cannot be deemed as predictive of the actual population distributions amongst insurgency sectors in the event that the scenario described in the scenario documentation actually occurs.
5. There is little risk in using the results of the analysis since the analysis does not advocate a change in current Marine Corps procedure. However, item 1 implies that P-COIN also provides little insight into the ashore or afloat question in its current implementation.
Recommend: Applying influencers more robustly

What Would Be Useful
Better documentation on the P-COIN instantiation
Time series data
A descriptive walk-thru of results charts (meaning & implications)
Verification cases (isolated effects) to ensure dynamics have expected direction (first derivative) and order of magnitude
– Descriptions of why we believe it is correct
Referent
– Better explanations of expected resulting effects from data values
  Most had to be inferred
  Order of magnitude differences unknown
  Expected interaction effects would be “spectacular”
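A verification case of the kind listed above could be as small as a check that an isolated effect moves a measure in the expected direction and by roughly the expected order of magnitude. The sketch below is one hypothetical way to express that check; `check_shift`, its tolerance factor of 10, and the example numbers are assumptions, not part of the study.

```python
import math


def check_shift(before, after, expected_sign, expected_magnitude, factor=10.0):
    """Check direction and order of magnitude of a change in a measure.

    expected_sign is +1 or -1. The magnitude check passes when the observed
    change is within `factor` times (above or below) expected_magnitude.
    Returns (direction_ok, magnitude_ok).
    """
    delta = after - before
    direction_ok = delta != 0 and math.copysign(1, delta) == expected_sign
    lower = expected_magnitude / factor
    upper = expected_magnitude * factor
    magnitude_ok = lower <= abs(delta) <= upper
    return direction_ok, magnitude_ok


# Made-up example: with only one influence switched on, the referent is
# assumed to predict a rise of about 2 percentage points in one sector.
print(check_shift(before=20.0, after=23.0, expected_sign=+1, expected_magnitude=2.0))
print(check_shift(before=20.0, after=19.0, expected_sign=+1, expected_magnitude=2.0))
```

Running one such case per isolated influence, before combining influences, would separate errors in a single effect from errors in the interactions.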

Can the Results Be Trusted?
Without trusting the dynamics
– Take caution but …
– Recommendation is innocuous
Under current political circumstances
– No … new data is required
Can Pythagoras model population dynamics
– Probably … more care is required in the instantiation

Levels of Validation Process
Adapted from Harmon & Youngblood (2005) p. 186
[Diagram; recoverable labels follow]
Levels: Initial (level 0), Subjective (level 1), Complete (level 2), Accurate (level 3), Confident (level 4), Automated (level 5)
Stage labels: Subjective validation, Objective requirements, Objective results, Objective referent, Automated validation
Tier 0, “I have no idea.”
Tier 1, “It works; trust me.”
Tier 3, “It does the right things; its representations are complete enough.”
Tier 4, “For what it does; its representations are accurate enough.”
Tier 5, “I’m confident that this simulation is valid.”
Other labels: SME, Independent Observer, Formal Proof
Referent: None, SME opinion, Single source, Multiple sources, Rigorously derived, Correlated with statistical estimates of uncertainties
Conceptual Model: None, Validated, Analyzed