If you fix everything you lose fixes for everything else

Tim Menzies (WVU), Jairus Hihn (JPL), Oussama Elrawas (WVU), Dan Baker (WVU), Karen Lum (JPL)

International Workshop on Living with Uncertainty, IEEE ASE 2007, Atlanta, Georgia, Nov 5, 2007

This work was conducted at West Virginia University and the Jet Propulsion Laboratory under grants with NASA's Software Assurance Research Program. Reference herein to any specific commercial product, process, or service by trademark, manufacturer, or otherwise does not constitute or imply its endorsement by the United States Government.

2 What does this mean?
Q: For which models does (a few peeks) = (many hard stares)?
A supposedly NP-hard task: abduction over first-order theories (nogood/2).

3 A: models with "collars"
Grow:
– Monte Carlo a model, picking input settings at random
– For each run, score each output and add that score to each of the run's input settings
Harvest:
– Rule generation experiments, favoring settings with better scores
If "collars" exist, then small rules, learned quickly, will suffice.
"Collar" variables set the other variables:
– Narrows (Amarel, in the '60s)
– Minimal environments (DeKleer '85)
– Master variables (Crawford & Baker '94)
– Feature subset selection (Kohavi & John '97)
– Back doors (Williams et al. '03)
– Etc.
Implications for uncertainty? (Feather & Menzies, RE '02)
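The grow/harvest loop on this slide can be sketched in a few lines. Everything below is a hypothetical stand-in (the model, its four inputs, the credit table): the toy is built so that two "collar" inputs, a and b, dominate the output, and so dominate the harvested rules.

```python
import random

# Hypothetical toy model: the output is dominated by two "collar" inputs, a and b.
def model(x):
    return 10 * x["a"] + 5 * x["b"] + x["c"] - x["d"]

def grow(n=1000):
    """Monte Carlo the model, crediting every (input, setting) pair with each run's score."""
    credit = {}
    for _ in range(n):
        x = {k: random.randint(0, 3) for k in "abcd"}
        score = model(x)
        for pair in x.items():
            total, count = credit.get(pair, (0.0, 0))
            credit[pair] = (total + score, count + 1)
    return credit

def harvest(credit, top=4):
    """Keep the settings with the best mean score: small rules, learned quickly."""
    mean = lambda pair: credit[pair][0] / credit[pair][1]
    return sorted(credit, key=mean, reverse=True)[:top]

rules = harvest(grow())
# The collar settings ("a", 3) and ("b", 3) should dominate the harvested rules.
```

Note the harvest never has to hunt for the collars explicitly; because they swing the score on every run, they rise to the top of the credit table on their own.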

4 STAR: collars + simulated annealing on USC's (Boehm's) software process models
USC software process models for effort, defects, and threats:
– y[i] = impact[i] * project[i] + b[i], for i ∈ {1, 2, 3, …}
– lo ≤ project[i] ≤ hi : uncertainty in the project description (controllable)
– lo ≤ impact[i] ≤ hi : uncertainty in the model calibration (uncontrollable)
Random solution:
– pick project[i] and impact[i] at random from anywhere inside their (lo, hi) ranges
– project[i] ranges set via domain knowledge; e.g. process maturity in 3 to 5
– impact[i] ranges known from history
Score each solution by effort (Ef), defects (De), and threats (Th).
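A minimal sketch of one such random solution, assuming made-up attribute names, ranges, and b[i] offsets (the real USC models have many more attributes and calibrated coefficients):

```python
import random

# One STAR-style random solution over hypothetical attributes and ranges.
# project ranges come from domain knowledge; impact ranges from history.
project_ranges = {"pmat": (3, 5), "acap": (1, 5), "rely": (2, 4)}
impact_ranges = {"pmat": (-0.5, -0.1), "acap": (-0.9, -0.3), "rely": (0.2, 0.8)}
b = {"pmat": 1.0, "acap": 1.0, "rely": 1.0}

def random_solution():
    """Pick project[i] (controllable) and impact[i] (uncontrollable) inside their ranges."""
    project = {i: random.uniform(*project_ranges[i]) for i in project_ranges}
    impact = {i: random.uniform(*impact_ranges[i]) for i in impact_ranges}
    return project, impact

def score(project, impact):
    """y[i] = impact[i] * project[i] + b[i], summed as a stand-in for effort."""
    return sum(impact[i] * project[i] + b[i] for i in project)

project, impact = random_solution()
effort = score(project, impact)
```

The two studies on the next slide differ only in how much of this sampling they allow: the "certain" methods fix impact[i] from history and randomize only project[i]; STAR randomizes both.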

5 Two studies: y[i] = impact[i] * project[i] + b[i]
One: "certain" methods (tame the uncontrollables via historical records):
– Use much historical data to learn the magnitude of the impact[i] relationship
– With impact[i] fixed, Monte Carlo at random across the project[i] settings
– E.g. regression-based tools that learn impact[i] from historical records: 93 records of JPL systems; SCAT (JPL's current method); 2CEE (WVU's improvement over SCAT, currently under test)
Two: methods with more uncertainty:
– Use no historical data
– Monte Carlo at random across both the project[i] settings and the impact[i] settings
– E.g. STAR: Monte Carlo a model; score each output; sort settings by "C", their cumulative score; run rule generation experiments, favoring settings with better "C"

6 Inside STAR
1. Sampling: simulated annealing
2. Summarizing: a post-processor
– For each setting s ∈ S: value[s] += E
– Sort all settings by their value
– Ignore the uncontrollables, impact[i]
– Assume the top (1 ≤ i ≤ max) project[i] settings; randomly select the rest
"Policy point": the smallest i with the lowest E
– Median = 50th percentile; spread = (75-50)th percentile
In this study: 22 good ideas, 38 not-so-good ideas.
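The policy-point search can be sketched as follows. The model, the value-sorted ranking, and the settings are all hypothetical; the first two ranked settings are genuinely good and the third is not, so the best policy stops short of fixing everything, which is the talk's thesis.

```python
import random

# Hypothetical model to minimize, and a hypothetical value-sorted ranking where
# the first two settings are genuinely good and the third is not.
def model(x):
    return 10 * x[0] + 5 * x[1] + x[2] + x[3]

ranked = [(0, 0), (1, 0), (2, 3)]  # (input index, setting), sorted best-first

def median_score(top_i, runs=500):
    """Fix the top-i ranked settings, randomize the rest, report the median score."""
    scores = []
    for _ in range(runs):
        x = [random.randint(0, 3) for _ in range(4)]
        for inp, setting in ranked[:top_i]:
            x[inp] = setting
        scores.append(model(x))
    scores.sort()
    return scores[len(scores) // 2]

medians = [median_score(i) for i in range(len(ranked) + 1)]
policy_point = medians.index(min(medians))  # smallest i reaching the lowest median
# Here policy_point is 2: adopting the third "fix" makes the median worse.
```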

7-16 SCAT vs 2CEE vs STAR: project[i]
(Slides 7 through 16 build up the same chart, one annotation at a time.)
– SCAT and 2CEE control impact[i] via historical data; STAR staggers around a superset of the possible impact[i].
– Median = 50% point; spread = (75-50)%.
– Ratios of STAR's spreads and medians to the other methods':
  STAR/2cee = 50/800 = 6%; STAR/scat = 50/1300 = 4%
  STAR/2cee = 30/620 = 5%; STAR/scat = 30/730 = 4%
  STAR/2cee = 400/1600 = 25%; STAR/scat = 400/1900 = 21%
  STAR/2cee = 180/400 = 45%; STAR/scat = 180/1900 = 60%
So: ignoring historical data is useful (!!!?).
If you fix everything, you lose fixes for everything else.

Luke, trust the force... I mean, collars.
("The Strangest Thing About Software", IEEE Computer, Jan 2007)

Extra Material

19 Related work
Feather, DDP, treatment learning:
– Optimization of requirements models
Xerox PARC, 1980s, qualitative representations (QR):
– Not overly specific; quickly collected in a new domain
– Used for model diagnosis and repair
– Can find creative solutions in the larger space of possible qualitative behaviors, rather than in the tighter space of precise quantitative behaviors
Abduction:
– World W = a minimal (w.r.t. size) set of assumptions A such that T ∪ A ⊢ G and not(T ∪ A ⊢ error)
– A framework for validation, diagnosis, planning, monitoring, explanation, tutoring, test-case generation, prediction, …
– Theoretically slow (NP-hard), but this should be practical: abduction + stochastic sampling; find collars; learn constraints on collars
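The abductive definition above can be made concrete with a brute-force toy, assuming a tiny propositional Horn-clause theory rather than the first-order case. The exponential subset search below is exactly why the general task is NP-hard, and why the talk swaps it for stochastic sampling.

```python
from itertools import combinations

# Toy theory: Horn rules (body -> head). Assuming "c" yields the goal but also
# the error, so the minimal safe explanation is {"a", "b"}.
RULES = [({"a", "b"}, "g"), ({"c"}, "g"), ({"c"}, "error")]
ASSUMABLES = ["a", "b", "c"]

def closure(facts):
    """Forward-chain the rules to a fixed point."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in RULES:
            if body <= facts and head not in facts:
                facts.add(head)
                changed = True
    return facts

def abduce(goal="g"):
    """Smallest assumption set A with goal in closure(A) and error not in closure(A)."""
    for size in range(len(ASSUMABLES) + 1):
        for combo in combinations(ASSUMABLES, size):
            derived = closure(combo)
            if goal in derived and "error" not in derived:
                return set(combo)
    return None
```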

20 Possible optimizations (not used here)
STAR is an example of a general process:
– Stochastic sampling
– Sort settings by "value"
– Rule generation experiments favoring highly "value"-ed settings
– See also elite sampling in the cross-entropy method
If SA convergence is too slow:
– Try moving the select step back into the SA
– Constrain solution mutation to prefer highly "value"-ed settings
BORE (best or rest):
– n runs; best = top 10% of scores; rest = remaining 90%
– {a, b} = frequency of a discretized range in {best, rest}
– Sort settings by -1 * (a/n)^2 / (a/n + b/n)
Other valuable tricks:
– Incremental discretization: Gama & Pinto's PID + Fayyad & Irani
– Limited discrepancy search: Harvey & Ginsberg
– Treatment learning: Menzies & Yu
(Ask me why, off-line.)
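BORE's ranking can be sketched directly from the bullet points. The runs and scores below are fabricated, and lower scores are treated as better:

```python
# Fabricated runs: each run is a list of (input, setting) pairs plus a score
# (lower is better). BORE splits the runs into the best 10% and the rest, then
# ranks each setting by -1 * (a/n)**2 / (a/n + b/n).
def bore(runs, scores, top=0.10):
    n = len(runs)
    order = sorted(range(n), key=lambda i: scores[i])
    best = set(order[:max(1, int(n * top))])
    a, b = {}, {}
    for i, settings in enumerate(runs):
        bucket = a if i in best else b
        for s in settings:
            bucket[s] = bucket.get(s, 0) + 1
    def value(s):
        af, bf = a.get(s, 0) / n, b.get(s, 0) / n
        return -1 * af ** 2 / (af + bf) if af + bf else 0.0
    return sorted({s for r in runs for s in r}, key=value)  # most "best-ish" first

runs = [[("x", 1)]] * 10 + [[("x", 0)]] * 10
scores = [0] * 10 + [1] * 10
ranking = bore(runs, scores)
# ranking[0] is ("x", 1): it appears in every one of the best runs.
```

The squared term rewards settings that are both frequent in the best runs and rare in the rest, which is what makes the sort useful as a cheap post-processor.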

“Uncertainty helps planning” (questions? comments?)

22 At the "policy point", STAR's random solutions are surprisingly accurate
LC: learn impact[i] via regression (JPL data). STAR: no tuning; randomly pick impact[i].
Diff = ∑ mre(lc) / ∑ mre(star), where MRE = abs(predicted - actual) / actual.

∑ mre(lc) / ∑ mre(star):
           strategic   tactical
  ground      66%         63%
  all         91%         75%
  OSP2        99%        125%
  OSP        112%        111%
  flight     101%        121%

(The OSP2, OSP, and flight entries were statistically the same at {95, 99}% confidence, Mann-Whitney U.)
Why so little Diff (median = 75%)? The most influential inputs are tightly constrained.
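The Diff statistic is easy to reproduce; the predictions and actuals below are invented, not the JPL data:

```python
# MRE = abs(predicted - actual) / actual; Diff = sum(mre_lc) / sum(mre_star).
# The predictions and actuals below are invented, not the JPL data.
def mre(predicted, actual):
    return abs(predicted - actual) / actual

def diff(lc_preds, star_preds, actuals):
    lc = sum(mre(p, a) for p, a in zip(lc_preds, actuals))
    star = sum(mre(p, a) for p, a in zip(star_preds, actuals))
    return lc / star

actuals    = [100.0, 200.0, 400.0]
lc_preds   = [110.0, 180.0, 430.0]  # tuned via regression (hypothetical)
star_preds = [120.0, 170.0, 460.0]  # untuned, random impact[i] (hypothetical)
ratio = diff(lc_preds, star_preds, actuals)  # 0.275 / 0.5 = 0.55
```

A ratio below 100% means the tuned method's total error is smaller; the slide's point is how often that ratio is near 100% despite STAR doing no tuning at all.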

23 (Model uncertainty = collars) << inputs
In many models, a few "collar" variables set the other variables:
– Narrows (Amarel, in the '60s)
– Minimal environments (DeKleer '85)
– Master variables (Crawford & Baker '94)
– Feature subset selection (Kohavi & John '97)
– Back doors (Williams et al. '03)
– See "The Strangest Thing About Software" (IEEE Computer, Jan '07)
Collars appear in all execution traces (by definition):
– You don't have to find the collars; they'll find you
So, to handle uncertainty:
– Write a simulator
– Stagger over the uncertainties
– From the stagger, find the collars
– Constrain the collars
This talk: a very simple example of this process.

24 Comparisons
Standard software process modeling:
– Models written more than run (PROSIM community): limited sensitivity analysis, limited trade space
– Or expensive, error-prone, incomplete data collection programs: point solutions
Here:
– No data collection
– Stable conclusions found within a space of possibilities
– Search: very simple
– Solution: not brittle, with a trade-off space (22 good ideas, sorted)

25 Summary
Living with uncertainty is sometimes simpler than you may think, and more useful than you might think.
Simple:
– Here, the smallest of changes to simulated annealing
Useful:
– Sometimes uncertainty can teach you more than certainty
– If you fix everything, you lose fixes for everything else
Collars control certainty:
– Uncertainty plus constrained collars → more certainty
– Constrained collars can also drive the model to better performance
An example you can explain to any business user (22 good ideas, sorted).