# Cost Drivers Learning Event, 2nd November 2005: Correlation Tutorial. Raymond Covert, MCR, LLC; Timothy Anderson, The Aerospace Corporation

## Presentation transcript

Correlation Tutorial. Raymond Covert, MCR, LLC (rcovert1@cox.net); Timothy Anderson, The Aerospace Corporation (Timothy.P.Anderson@aero.org). This tutorial was developed by the authors at The Aerospace Corporation, 15049 Conference Center Drive, Suite 600, Chantilly, VA 20151. Copyright © 2004 The Aerospace Corporation.

Outline (slide footer date: 16 September 2005)

1. Introduction to Correlation in Risk Analysis
2. A Statistical View of Cost Analysis
3. Types of Correlation
4. The Correlation Matrix
5. Deriving Correlation Coefficients

Part 1: Introduction to Correlation in Risk Analysis

Purpose of this section: to answer six questions about correlation.

1. Who should understand it?
2. What is it?
3. Why is it used?
4. Where is it used?
5. When is it used?
6. How is it used?

Who and What

Who should understand correlation? All cost analysts performing quantitative cost risk analysis.

What is correlation? (Ref. 1) A measure of association between two variables. It measures how strongly the variables are related, or change, with each other. If two variables tend to move up or down together, they are said to be positively correlated. If they tend to move in opposite directions, they are said to be negatively correlated. The most common statistic for measuring association is the Pearson correlation coefficient, $\rho_P$.

Ref. 1: www.statlets.com/usermanual/glossary.htm

Correlation in Risk Analysis (2)

Why is correlation used? To quantify the effects of statistical dependence when performing algebra on random variables. It has a large impact on the statistical properties of the results, particularly when many random variables are involved.

Example: dice roll. What happens when we roll two dice and add the results? Consider three cases:

- Case 1: Uncorrelated. The outcome of one die is independent of the other.
- Case 2: Negatively correlated. The outcome of one die fixes the outcome of the other in the opposite direction: if one die shows "6", the other must show "1".
- Case 3: Positively correlated. The outcome of one die is the same as the other.

Example: Dice Roll

A roll of one die gives an equal chance of each outcome (1, 2, 3, 4, 5 or 6): $P(x) = 1/6$. This is a discrete uniform distribution with mean 3.5 and variance $\sigma^2 = 35/12 \approx 2.92$. What happens when we sum two correlated dice?

Figure: probability (0 to 1) versus roll of die (1 to 6).

Example: Dice Roll

Figure: three panels of probability (0 to 1) versus sum of dice (2 to 12).

- Case 1: $\rho = 0$ (uncorrelated, e.g. 2 + 1 = 3). Triangular, discrete shape. Moderate variance, $\sigma^2 = 35/6 \approx 5.83$. Mean = 7.
- Case 2: $\rho = -1$ (e.g. 5 + 2 = 7). $P(7) = 1$, $P(\neq 7) = 0$. No variance, $\sigma^2 = 0$. Mean = 7.
- Case 3: $\rho = +1$ (e.g. 2 + 2 = 4). Uniform, discrete shape: $P(\text{each even sum}) = 1/6$, $P(\text{odd sum}) = 0$. Wide variance, $\sigma^2 = 35/3 \approx 11.67$. Mean = 7.

What We Learned From Dice Roll

What we learned about the effects of correlation on sums of dice:

- It affects the variance and shape.
- It doesn't affect the mean.
- $\rho = 0$ changes the shape to a discrete triangular distribution.
- $\rho = -1$ changes the shape and removes the variance.
- $\rho = +1$ preserves the shape, adds the most variance, and is the same as multiplying by 2.

The sum-of-dice example used a discretely distributed random variable, but the same rules apply to continuously distributed random variables: uniform, triangular, normal, lognormal, Weibull.
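The three dice cases can be checked by exact enumeration. The sketch below is illustrative only: it assumes fair six-sided dice, uses the population variance (dividing by the number of equally likely outcomes), and the names `sum_stats` and `cases` are made up for this example.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)

def sum_stats(pairs):
    """Mean and population variance of die1 + die2 over equally likely pairs."""
    sums = [a + b for a, b in pairs]
    n = len(sums)
    mean = Fraction(sum(sums), n)
    var = sum((s - mean) ** 2 for s in sums) / n
    return mean, var

cases = {
    "rho = 0": list(product(FACES, FACES)),   # independent dice: all 36 pairs
    "rho = -1": [(a, 7 - a) for a in FACES],  # second die mirrors the first
    "rho = +1": [(a, a) for a in FACES],      # second die copies the first
}

results = {name: sum_stats(pairs) for name, pairs in cases.items()}
for name, (mean, var) in results.items():
    print(f"{name}: mean = {float(mean):.2f}, variance = {float(var):.2f}")
```

In every case the mean of the sum is 7; only the variance and the shape change with the correlation.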

Where and When Correlation Is Used

Where is correlation used? When performing algebra on random variables, that is, when quantifying the effects of random variables in cost estimates:

- Summing costs of WBS elements.
- Multiplying costs by random variables (e.g. inflation).
- Exponentiating one random variable by another (e.g. learning curves).

When is correlation used?

- Whenever we have random variables in our estimates.
- When we use Monte Carlo simulations.
- In analytic statistical sums.

How Correlation Is Used

Directly: through a correlation matrix.

Indirectly:

- By neglecting correlation, we are defining $\rho = 0$.
- By "reusing" random variables, we are defining $\rho = 1$. Example: we define inflation as a random variable and use the same random variable throughout our cost estimate.
- By multiplying random variables by a (positive) constant, we are defining $\rho = 1$. Example: we define spacecraft weight as a random variable and use fractions of it to define the weights of different subsystems.
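The indirect case of scaling one random variable can be illustrated with a small simulation. This is a sketch under stated assumptions: the weight distribution, the subsystem fractions (0.25 and 0.10) and all variable names are hypothetical and chosen only for illustration.

```python
import random

def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(2005)  # fixed seed so the sketch is repeatable

# Hypothetical spacecraft weight draws; the distribution is an assumption.
weight = [rng.gauss(1000.0, 100.0) for _ in range(1000)]

# Subsystem weights defined as fixed fractions of the same random variable.
structure = [0.25 * w for w in weight]
power = [0.10 * w for w in weight]

# A separately drawn variable that is never linked to weight.
other = [rng.gauss(50.0, 5.0) for _ in range(1000)]

rho_reused = pearson(structure, power)       # implicitly defined as 1
rho_independent = pearson(structure, other)  # close to 0, up to sampling noise
print(f"fractions of one variable: {rho_reused:.3f}")
print(f"separate draws:            {rho_independent:.3f}")
```

No correlation matrix appears anywhere in the sketch, yet the two subsystem weights are perfectly correlated because they share one underlying draw.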

Part 1 Summary

- All cost analysts performing quantitative cost risk analysis should understand correlation.
- Correlation measures how strongly the variables are related, or change, with each other.
- Correlation affects the variance and shape, but not the mean.
- We use correlation frequently, but may not even know it.

Part 2: A Statistical View of Cost Analysis

Purpose of this section: to understand the following.

1. Costs are uncertain quantities.
2. Costs can be treated as random variables.
3. How correlation affects variance.

A Statistical View of Cost Analysis

- WBS element costs are uncertain quantities that have probability distributions and statistical characteristics such as mean, median and mode. Costs are random variables.
- Our goal in cost risk analysis is to combine element cost distributions to generate the probability distribution of total cost, using Monte Carlo or another statistical procedure.
- Quantify confidence in the "best" estimate of total cost, e.g. the mode.
- Read off the mean, 50th-percentile cost, 70th-percentile cost, etc., from the cumulative distribution to estimate the amount of risk dollars needed.

Elements of Risk Model

Figure: cost drivers, risk drivers and assumptions feed the cost estimate model, which produces quantified risk.

Our cost/risk model quantifies:

- Costs
- Effects of uncertainty
- Uncertainty in program assumptions
- Risks to program

Example Cost Drivers

- Component, assembly and propellant weights
- Cooling requirements
- Data-processing requirements
- Power requirements
- Solar array area
- Orbit altitude
- Thrust requirements
- Special mission equipment

Example Risk Drivers

- Beyond-state-of-the-art technology: cooling, processing, survivability, power, laser communications
- Unusual production requirements: large quantities (space systems), toxic materials, yields
- Tight schedules: undeveloped technology, software development, supplier viability
- System integration: multi-contractor teams, system testing
- Limited resources: program funding stretch-out, premature commitment to RDT&E phase
- Unforeseen events

Cost-Element Probability Distributions: Low Cost, High Risk vs. High Cost, Low Risk

- Low risk: a narrow, symmetric distribution (mode = mean), with equal probability of the actual cost being higher or lower than the best estimate.
- High risk: a wide, right-skewed distribution, with a lower point estimate (the mode) but a high probability of the actual cost being greater than the point estimate.

These curves tell two very different stories. Would you believe both could come from the same estimate?

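Correlation Affects the Variance

$X_1, X_2, \ldots, X_n$ are the costs of WBS elements (random variables), with means $\mu_i$ and standard deviations $\sigma_i$, and $n$ is the number of WBS elements.

Total cost: $X = \sum_{i=1}^{n} X_i$

Mean of total cost: $\mu_{\text{Total}} = \sum_{i=1}^{n} \mu_i$

Variance of total cost:

$$\sigma_{\text{Total}}^2 = \sum_{i=1}^{n}\sum_{j=1}^{n} \rho_{ij}\,\sigma_i\sigma_j = \sum_{i=1}^{n} \sigma_i^2 + 2\sum_{i<j} \rho_{ij}\,\sigma_i\sigma_j$$

This is a very important relationship.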

Variance Measures Dispersion

Figure: two probability density curves of X, each with area = 1.00, one with small variance (narrow) and one with large variance (wide).

Does Correlation Matter?

- If WBS-element costs are uncorrelated (all $\rho_{ij} = 0$), variance of total cost $= \sum_{i=1}^{n} \sigma_i^2$.
- If WBS-element costs are correlated, variance of total cost $= \sum_{i=1}^{n} \sigma_i^2 + 2\sum_{i<j} \rho_{ij}\,\sigma_i\sigma_j$. Positive correlations increase dispersion; negative correlations reduce dispersion.
- If ("worst" case) all correlations $\rho_{ij} = 1$, variance of total cost $= \left(\sum_{i=1}^{n} \sigma_i\right)^2$.

"Ignoring" the correlation issue is tantamount to setting all $\rho_{ij} = 0$.

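Yes, Correlation Matters

Suppose for simplicity that there are $n$ cost elements $X_1, \ldots, X_n$, each with the same standard deviation $\sigma$ and each pair with the same correlation $\rho$. The total cost $X = \sum_{i=1}^{n} X_i$ then has variance

$$\sigma_{\text{Total}}^2 = n\sigma^2 + n(n-1)\rho\,\sigma^2 = n\sigma^2\bigl(1 + (n-1)\rho\bigr).$$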

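Magnitude of Correlation Impact

For $n$ cost elements with equal $\sigma$ and a common correlation $\rho$, the total-cost sigma is $\sigma\sqrt{n\,(1 + (n-1)\rho)}$.

- Percent underestimation of total-cost sigma when correlation is assumed to be 0 instead of $\rho$ is 100% times $1 - \dfrac{1}{\sqrt{1 + (n-1)\rho}}$.
- Percent overestimation of total-cost sigma when correlation is assumed to be 1 instead of $\rho$ is 100% times $\sqrt{\dfrac{n}{1 + (n-1)\rho}} - 1$.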

Maximum Possible Underestimation of Total-Cost Sigma

Figure: percent by which total-cost sigma is underestimated when correlation is assumed to be 0 instead of $\rho$.

Maximum Possible Overestimation of Total-Cost Sigma

Figure: percent by which total-cost sigma is overestimated when correlation is assumed to be 1 instead of $\rho$.
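The under- and overestimation percentages can be tabulated numerically. The sketch below assumes the simplified setting of the preceding slides (n elements with equal sigma and one common correlation); the function names are illustrative, not taken from any cost tool.

```python
import math

def sigma_ratio(n, rho):
    """Total-cost sigma, in units of the element sigma, for common correlation rho."""
    return math.sqrt(n * (1 + (n - 1) * rho))

def pct_underestimated(n, rho):
    """Percent by which total sigma is understated if rho is assumed to be 0."""
    return 100 * (1 - sigma_ratio(n, 0) / sigma_ratio(n, rho))

def pct_overestimated(n, rho):
    """Percent by which total sigma is overstated if rho is assumed to be 1."""
    return 100 * (sigma_ratio(n, 1) / sigma_ratio(n, rho) - 1)

for rho in (0.0, 0.1, 0.2, 0.5, 1.0):
    print(f"n = 30, rho = {rho:.1f}: "
          f"under = {pct_underestimated(30, rho):6.1f}%, "
          f"over = {pct_overestimated(30, rho):6.1f}%")
```

For 30 elements with a true correlation of 0.2, assuming independence understates the total sigma by roughly 60%, which is the case discussed at the end of the tutorial.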

Part 2 Summary

In this section we learned:

- Costs are uncertain quantities.
- Costs can be treated as random variables.
- Correlation affects variance, especially when we are summing large numbers of WBS elements.

Remember that in the total cost distribution:

- The means add.
- The standard deviation is the square root of the variance.
- The variance $= \sum_{i=1}^{n} \sigma_i^2 + 2\sum_{i<j} \rho_{ij}\,\sigma_i\sigma_j$.

Part 3: Types of Correlation

Purpose of this section: to learn about

1. Functional and statistical correlation
2. Pearson and Spearman correlation

Types of Correlation

Functional (causal) correlation:

- Between cost drivers (cost engineering tools)
- Between CERs (cost-dependent CERs, e.g. SEITPM)

Statistical correlation:

- Between CER errors: residual analysis (USCM, SSCM, NAFCOM); Retro-ICE method; estimate based on number of WBS items
- Between engineering drivers: complexity, weight, power, etc.

Functional Correlation Between Cost Drivers

- Cost drivers are functionally correlated: beginning-of-life power, EPS weight, solar array weight, battery weight, RCS weight, etc.
- Use sizing equations from cost engineering tools to examine the causal relationship.
- Two good examples: electrical power system sizing and reaction control system sizing.

Functional (Causal) Correlation Between Cost Drivers

Example: electrical power system sizing. The equations in the design loop are functional correlations. The power requirement drives EPS weight.

Functional Correlation Between Cost Drivers

Feedback example: reaction control system (RCS).

Figure: RCS design loop. Labels: natural disturbance torques; SV dimensions; solar array pointing error and slew rate; antenna pointing error and slew rate; inertia (SV body, antennas, solar array); gimbal torques; reaction torque; reaction control selection; RCS size, RCS power, RCS weight.

- RCS weight and size drive SV body inertia.
- RCS power drives solar array inertia.
- RCS size drives SV dimensions.

The equations in the design loop are functional correlations.

Functional Correlation Between CERs

There are two basic types of Cost Estimating Relationships (CERs):

1. Design-parameter dependent: subsystem hardware (HW) CERs, which use weight (or another design parameter) as a base.
2. Cost dependent: Systems Engineering, Integration and Test, and Program Management (SEITPM) CERs, which use estimated cost as a base.

CERs are used serially in risk analysis:

1. Subsystem HW estimated costs are driven by subsystem estimated weights (or other cost drivers).
2. SEITPM estimated costs are driven by HW cost estimates.
3. So, the variance of the SEITPM cost estimate is functionally correlated to the HW cost estimates.

Mathematically

The total estimate is a sum of the subsystem and SEITPM costs.* [equation not preserved in transcript]

Subsystem estimates follow the form: [equation not preserved in transcript], a function of weight with an error term for each subsystem (SS) estimate.

The SEITPM estimate follows the form: [equation not preserved in transcript], with an error term for SEITPM.

*Note: The "total cost" is actually a sum of the subsystem and SEITPM costs. It is not the sum of all costs associated with a spacecraft.

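Mathematically

So, the SEITPM estimate is actually represented by: [equation not preserved in transcript]

And the total estimate is represented by: [equation not preserved in transcript]

Equation labels: the SS CER terms and the SEITPM term; the SS CER terms appearing inside the SEITPM term are the functional correlation.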

Statistical Correlation

- Between CER errors: residual analysis (USCM, SSCM, NAFCOM); Retro-ICE method; estimate based on number of WBS items.
- Between engineering drivers: complexity, weight, power, etc.

These methods cover three situations: when you have data, when you have to guess, and when you have data but don't know the functional relationships.

We look at two types of statistical correlation: Pearson's correlation and Spearman's correlation.

Two Types of Statistical Correlation

Pearson product-moment (linear) correlation:

- $\rho = \pm 1$ if and only if X and Y are linearly related, i.e., the least-squares linear relationship between X and Y allows us to predict Y precisely, given X.
- $\rho^2$ = proportion of variation in Y that can be explained on the basis of a least-squares linear relationship between X and Y.
- $\rho = 0$ if and only if the least-squares linear relationship between X and Y provides no ability to predict Y, given X.

Spearman rank correlation:

- $\rho_s = 1$ if and only if the largest value of X corresponds to the largest value of Y, the second largest to the second largest, etc.
- $\rho_s = -1$ if and only if the largest value of X corresponds to the smallest value of Y, etc.
- $\rho_s = 0$ if and only if the rank of a particular X among all X values provides no ability to predict the rank of the corresponding Y among all Y values.

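Pearson "Product-Moment" Correlation

Suppose X and Y are two random variables, and $\mu_X = E(X)$ and $\mu_Y = E(Y)$ are their expected values ("means").

- "Variance" of X: $\sigma_X^2 = E\bigl[(X - \mu_X)^2\bigr]$
- "Variance" of Y: $\sigma_Y^2 = E\bigl[(Y - \mu_Y)^2\bigr]$
- "Covariance" of X and Y: $\operatorname{Cov}(X, Y) = E\bigl[(X - \mu_X)(Y - \mu_Y)\bigr]$
- "Correlation" of X and Y: $\rho = \dfrac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$

True theorem: $\operatorname{Var}(X + Y) = \sigma_X^2 + \sigma_Y^2 + 2\rho\,\sigma_X\sigma_Y$.

False theorem: $\operatorname{Var}(X + Y) = \sigma_X^2 + \sigma_Y^2$.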

Pearson Correlation Measures Linearity

A statistical relationship between two random variables X and Y.

Figure: two scatter plots of realizations of Y, given actual values of X.

Spearman Rank Correlation Coefficient

The slide showed the data structure and the rank statistics (not preserved in transcript).

Theorem: the Spearman rank correlation coefficient $\rho_s$ equals the Pearson (linear) correlation coefficient calculated between the two sets of ranks.
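The theorem suggests a direct way to compute the Spearman coefficient: rank both data sets, then apply the Pearson formula to the ranks. The sketch below is a minimal illustration that assumes no tied values; the data and function names are made up for the example.

```python
def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """Rank 1 = smallest value. Assumes no ties, to keep the sketch short."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the two sets of ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # monotonic in x, but not linear

rho_pearson = pearson(x, y)
rho_spearman = spearman(x, y)
print(f"Pearson  = {rho_pearson:.3f}")
print(f"Spearman = {rho_spearman:.3f}")
```

Because the relationship is monotonic but curved, the rank correlation is 1 while the linear correlation falls below 1.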

Linear vs. Rank Correlation

Figure: scatter-plot panels labeled LINEAR, POWER, "KNEE", ROOT, DECAY w/ OUTLIER and RANDOM w/ OUTLIER, ordered from linear to more nonlinear, each annotated with its $\rho$ and $\rho_s$ (values not recoverable from the transcript).

Linear data gives similar $\rho$ and $\rho_s$.

What Does Correlation Measure?

- PEARSON correlation measures the extent of LINEARITY of a relationship between two random variables.
- SPEARMAN correlation measures the extent of MONOTONICITY of a relationship between two random variables.

A way to remember: … L M N O P Q R S … Linear Is Pearson; Rank Is Spearman.

Part 3 Summary

Two types of correlation: functional and statistical.

- Functional correlation affects cost drivers and cost-dependent CERs.
- Statistical correlation is an observed relationship between data.

Two types of correlation statistics: Pearson and Spearman correlation, or linear and rank correlation.

- They are similar when the data are linear.
- They are different when the data are not linear.

Part 4: The Correlation Matrix

Purpose of this section: to learn

1. Correlation in risk rollups
2. Anatomy of a correlation matrix
3. How to use a correlation matrix
4. Which common cost models handle correlation

Cost-Risk Procedure

Use analytic or statistical sampling methods to arrive at a total cost distribution.

Figure: WBS-element triangular distributions, each defined by low (L), best-estimate (B) and high (H) costs in \$, e.g. (L1, B1, H1), (L2 = B2, H2), (L3, B3, H3), are merged into a total-cost distribution. Labels on the total-cost distribution: best-estimate cost (most likely), 70th-percentile cost, risk dollars.

Note: Addition of risk dollars brings confidence that the total appropriation (best estimate plus risk dollars) is sufficient to fund the program.

Total Cost Variance

Remember from Part 2, the total cost variance is $\sigma_{\text{Total}}^2 = \boldsymbol{\sigma}^T \boldsymbol{\rho}\,\boldsymbol{\sigma}$, where

- $\boldsymbol{\rho}$ is the correlation matrix (full matrix)
- $\boldsymbol{\sigma}$ is the vector of standard deviations (cost space)

Excel command: `SIGMA_TOT=SQRT(MMULT(MMULT(TRANSPOSE(SIGMA),RHO),SIGMA))`

Correlation is essential in calculating variance!
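The same quadratic form can be evaluated outside a spreadsheet. The sketch below is a plain-Python equivalent of the Excel expression; the three sigma values and the 0.2 correlations are hypothetical inputs chosen only to show the calculation.

```python
import math

def total_sigma(sigmas, rho):
    """sqrt(sigma^T * rho * sigma) for a full (symmetric) correlation matrix rho."""
    n = len(sigmas)
    variance = sum(
        sigmas[i] * rho[i][j] * sigmas[j] for i in range(n) for j in range(n)
    )
    return math.sqrt(variance)

# Hypothetical standard deviations of three WBS elements, in cost units.
sigmas = [10.0, 20.0, 30.0]

independent = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
correlated = [[1.0, 0.2, 0.2], [0.2, 1.0, 0.2], [0.2, 0.2, 1.0]]
perfect = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]

for label, rho in (("rho = 0", independent), ("rho = 0.2", correlated), ("rho = 1", perfect)):
    print(f"{label}: total sigma = {total_sigma(sigmas, rho):.2f}")
```

With all correlations equal to 1 the total sigma is simply the sum of the element sigmas, and with all correlations equal to 0 it is the root sum of squares.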

Representing Correlation Matrices

- Full matrix (you have to use this when you use the analytic function $\boldsymbol{\sigma}^T \boldsymbol{\rho}\,\boldsymbol{\sigma}$)
- Upper triangular
- Lower triangular

All three representations mean the same thing.

Representing Correlation Matrices

Single-value shorthand: a single $\rho$ means all of the off-diagonal terms have the same value.

The rules:

- Always positive semidefinite (positive definite if the matrix is to be factored, e.g. for simulation).
- Diagonal terms are always 1.0.
- Off-diagonal terms are correlation values.
- The matrix is symmetric: rows and columns are transposes of each other, $\rho_{j,k} = \rho_{k,j}$.

Now for some practical examples.
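These rules can be checked mechanically before a matrix is used. The sketch below is one possible check, not a feature of any cost tool: it tests the diagonal, symmetry and range of the entries, and then attempts a Cholesky factorization, which succeeds only for a positive definite matrix. The function name and tolerance are assumptions.

```python
import math

def is_valid_correlation_matrix(rho, tol=1e-12):
    """Check unit diagonal, symmetry, |rho| <= 1 and positive definiteness."""
    n = len(rho)
    if any(len(row) != n for row in rho):
        return False
    for j in range(n):
        if abs(rho[j][j] - 1.0) > tol:
            return False
        for k in range(j):
            if abs(rho[j][k] - rho[k][j]) > tol or abs(rho[j][k]) > 1.0:
                return False
    # Cholesky factorization (row by row); a non-positive pivot means
    # the matrix is not positive definite.
    lower = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j + 1):
            partial = sum(lower[j][m] * lower[k][m] for m in range(k))
            if j == k:
                pivot = rho[j][j] - partial
                if pivot <= tol:
                    return False
                lower[j][j] = math.sqrt(pivot)
            else:
                lower[j][k] = (rho[j][k] - partial) / lower[k][k]
    return True

def single_value_matrix(n, value):
    """Expand the single-value shorthand into a full n x n matrix."""
    return [[1.0 if i == j else value for j in range(n)] for i in range(n)]

print(is_valid_correlation_matrix(single_value_matrix(3, 0.2)))   # valid
print(is_valid_correlation_matrix(single_value_matrix(3, -0.6)))  # not positive definite
```

The second example shows that the single-value shorthand cannot take arbitrary negative values: for n elements the common correlation must exceed -1/(n-1).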

Cost Risk Rollup Procedure

We want to form a distribution for the total cost of the program.

Figure: rollup hierarchy.

- Segment level (Launch, Space, Ground): launch and other elements.
- System level (Spacecraft Bus, Payload).
- WBS-item level: spacecraft bus subsystems 1 to N plus spacecraft bus SEITPM (a function, f(\$), of spacecraft bus subsystem cost); payload subsystems 1 to N plus payload SEITPM (a function, f(\$), of payload subsystem cost).

Inter-element correlation is needed for all rollup (summed) costs.

Spacecraft Bus: USCM7 Correlation Coefficients

Table (values not preserved in transcript): correlation coefficients for the USCM7 weight-based, Minimum Unbiased Percentage Error (MUPE) CERs. Average correlation coefficient = 0.160.

These correlation coefficients should not be used for all spacecraft cost models.

The "Big" WBS Case Study (ISS Risk Estimate)

Suppose a risk analyst diligently applies distributions to all costs at the "level of estimating". This is good.

Assume that:

- There are 300 cost elements (N = 300).
- There are about four cost elements in each subsystem (n = 4).
- There are N/n = 75 subsystems.
- Correlation is defined between all elements within a subsystem.

This means:

- WBS elements are correlated, but only about 1% of the possible pairs of cost elements are correlated.
- Risk is very narrow and understated.
- The correlation appears "just off the diagonal" of the correlation matrix. This is bad.
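The "about 1%" figure follows from counting pairs, and the effect on total sigma can be sketched under simple assumptions. In the code below the equal element sigmas and the correlation values (0.5 within a subsystem, 0.05 elsewhere) are hypothetical and serve only to illustrate the size of the effect.

```python
import math
from math import comb

N = 300              # cost elements
n = 4                # elements per subsystem
subsystems = N // n  # 75

correlated_pairs = subsystems * comb(n, 2)  # pairs inside the same subsystem
all_pairs = comb(N, 2)                      # all off-diagonal pairs
fraction = correlated_pairs / all_pairs
print(f"{correlated_pairs} of {all_pairs} pairs = {100 * fraction:.2f}%")

def total_sigma(rho_within, rho_between):
    """Total sigma in units of a common element sigma (equal sigmas assumed)."""
    variance = N + 2 * (
        correlated_pairs * rho_within
        + (all_pairs - correlated_pairs) * rho_between
    )
    return math.sqrt(variance)

just_off_diagonal = total_sigma(0.5, 0.0)  # correlation only inside subsystems
with_nominal = total_sigma(0.5, 0.05)      # plus a few percent everywhere else
print(f"sigma ratio = {with_nominal / just_off_diagonal:.2f}")
```

Under these assumed values, adding a correlation of only 0.05 between subsystems more than doubles the total sigma, because those pairs make up about 99% of the matrix.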

"Just-Off-Diagonal" Correlation

Figure: a large correlation matrix in which the intra-subsystem WBS elements (blocks along the diagonal) are correlated and the inter-subsystem WBS elements (everything else) are effectively uncorrelated.

- Some tools cannot support this function.
- Some nominal statistical correlation does exist.
- Even a few percent makes a big difference with a big WBS.

How to Use Correlation Matrices

Typically, we wouldn't want to define all of the correlation coefficients for a big WBS (more than 10 elements). We can break it up into parts, get the statistics, and then sum at higher levels. This:

- Reduces the size of the correlation matrices.
- Provides a risk breakout by WBS summary level.

Let's use an example of a "big" WBS with 40 elements.

Yuck

Figure: 40 individual WBS elements and their correlation matrix. Imagine 300 WBS elements!

Big Correlation Matrix Layout

| | A | B | C | D | E |
|---|---|---|---|---|---|
| A | AA | BA | CA | DA | EA |
| B | AB | BB | CB | DB | EB |
| C | AC | BC | CC | DC | EC |
| D | AD | BD | CD | DD | ED |
| E | AE | BE | CE | DE | EE |

- A = SEITPM (5 elements)
- B = Space Segment (20 elements)
- C = Ground Segment (5 elements)
- D = Launch Segment (5 elements)
- E = Operations Segment (5 elements)

Each block represents a group of inter-element correlations. The full matrix requires (40 × 39)/2 = 780 different correlations.

Multilevel Risk

Look at the problem one small set of pieces at a time. For the SEITPM block:

- SEITPM mean = Sum(means)
- SEITPM sigma, $\sigma_{\text{SEITPM}}$ = `SQRT(MMULT(MMULT(TRANSPOSE(sigma),correl_matrix),sigma))`

Now do the same for SPACE, LAUNCH, GROUND and O&M.

Space Element Risk

In the Space element, first break out the Bus calculation to get $\sigma_{\text{S/C Bus}}$. Space Vehicle SEITPM is one line item. Let's assume we have already calculated the mean and sigma for the Payload, as we did for the Bus.

Space Element Risk

Now we roll up three items, using a small correlation matrix (values not preserved in transcript).

Figure: the resulting probability density function (PDF) of cost, plotted for costs from 0 to 400.

Economics of Multi-Level Risk

After summing all of the elements, we:

- Used 144 correlation coefficients vs. 780 (one big matrix).
- Have views of risk at all roll-up levels.
- Found it easier to obtain values for the correlation coefficients. We will discuss this in the next part.

What We Just Did

In the 5 × 5 block layout of the big correlation matrix (A = SEITPM, 5 elements; B = Space Segment, 20 elements; C = Ground Segment, 5 elements; D = Launch Segment, 5 elements; E = Operations Segment, 5 elements), we relied on the diagonal blocks AA, BB, CC, DD and EE for correlation.

Mathematically

Step 1: Calculate [equation not preserved in transcript], where [definitions not preserved in transcript].

Step 2: We need the correlation coefficients of partitions AA, BB, etc. and all the $\sigma$s to calculate it.

Mathematically

Step 3: Calculate the total variance using [equation not preserved in transcript].

This is useful when:

- We know the correlation between subsystem elements,
- but not the correlation between subsystems from different elements (e.g., the thermal control subsystem in the spacecraft to the ground command and control CSCIs),
- but do have an idea of the correlation between higher-level elements, such as space to ground.
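The multilevel procedure can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' spreadsheet: the segment names, element sigmas and correlation values are hypothetical, and each block uses the single-value shorthand for brevity.

```python
import math

def total_sigma(sigmas, rho):
    """sqrt(sigma^T * rho * sigma) for a full correlation matrix rho."""
    n = len(sigmas)
    return math.sqrt(
        sum(sigmas[i] * rho[i][j] * sigmas[j] for i in range(n) for j in range(n))
    )

def single_value_matrix(n, value):
    """Full n x n correlation matrix with one common off-diagonal value."""
    return [[1.0 if i == j else value for j in range(n)] for i in range(n)]

# Hypothetical inputs: (element sigmas, correlation within the segment).
segments = {
    "Space": ([12.0, 8.0, 20.0], 0.3),
    "Ground": ([5.0, 7.0], 0.2),
    "Launch": ([10.0], 0.0),
}

# Lower level: one small correlation matrix per segment (the diagonal blocks).
segment_sigma = {
    name: total_sigma(sigmas, single_value_matrix(len(sigmas), rho))
    for name, (sigmas, rho) in segments.items()
}

# Higher level: one small matrix between segments (assumed value 0.4).
top_rho = single_value_matrix(len(segments), 0.4)
program_sigma = total_sigma(list(segment_sigma.values()), top_rho)

for name, sigma in segment_sigma.items():
    print(f"{name}: sigma = {sigma:.2f}")
print(f"Program: sigma = {program_sigma:.2f}")
```

Only the within-segment correlations and the segment-to-segment correlations are needed; no element-to-element correlation across segments is ever specified.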

Mathematically

Figure: a 5 × 5 matrix whose blocks are indexed 11 through 55.

Models That Handle Correlation

Figure: chart of cost models by type of risk analysis method (simulation, analytic, discrete) and level of correlation specification (full, partial, cannot). Models shown: PRICE*, NAFCOM*, SSCM*, USCM + others, FRISK, @RISK and CB Monte Carlo simulators, Cost Model w/ Risk*, SEER*, RI\$K, SICM. (* With analytic or simulation method and correlation $\rho$ study.)

© 2003 The Aerospace Corporation

Part 4 Summary

- Showed how correlation is used in risk rollups.
- Provided the anatomy of a correlation matrix: 1's on the diagonal, correlation coefficients on the off-diagonals, and symmetry (rows and columns are transposes of each other).
- Showed how to use a correlation matrix: breaking down big risk jobs into smaller pieces makes them easier to understand and easier to correlate.
- Showed which common cost models handle correlation.

Part 5: Deriving Correlation Coefficients

Purpose of this section: to learn how to derive correlation coefficients

1. When data is available
2. When you have to make an educated guess

Deriving Correlation Coefficients

There are two ways to derive correlation coefficients $\rho$:

- Data available (CADRE, CERs), statistical methods: residual analysis, Retro-ICE, effective $\rho$.
- No data (educated guess), non-statistical methods: causal guess, N-effect guess, knee in curve (Steve Book method).


Determining Correlation When Data is Available

Statistical correlation:

- Residual analysis (USCM, SSCM, NAFCOM) (HARD). Needs a database of cost and cost drivers, the CERs, and the CER errors. We only have good bus correlations with standard models.
- Retro-ICE method (HARD). Needs actual cost data from several similar programs, a similar WBS structure, total error, and similar models.
- Estimate based on number of WBS items (EASY). Needs the number of WBS items and typical uncertainty. The result is a strong function of the number of correlated elements and decreases with the number of correlated elements.

Functional (causal) correlation:

- Between cost drivers (HARD). Needs cost engineering tools (CDC, SWAP model).
- Between CERs (EASY). Use cost-dependent CERs (SEITPM, etc.) linked to summary costs in the model.

Residual Analysis Method

Statistical Correlation From Residual Analysis

- Percentage error or standard error is a measure of residual errors.
- Uncertainty and risk calculations use residual errors to represent uncertainty.
- We want the correlation between residuals.

Figure: cost (\$K, 0 to 3000) versus weight (lbs, 0 to 100), showing data points scattered about a CER.

Deriving Correlation Coefficients

Sample calculation using randomly generated numbers (table not preserved in transcript). Error $X_i$ and Error $Y_i$ represent the regression residuals for two CERs (X and Y) for 8 programs.


Retro-ICE Method

Another way to look at correlation is the effect on the variance of the total vs. the variance of the components.

Example: the SPACE segment contains three elements: Space SEITPM, Spacecraft Bus, and Payload.

What you will need:

- Actuals for some programs (8 or so).
- Estimates using your "new" cost model or method (you will be doing a Retro-ICE).

Use the equation for Pearson correlation:

$$\rho = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \,\sum_i (y_i - \bar{y})^2}}$$

Retro-ICE Example

Start with a table of actuals and estimates for the 8 programs and the 3 WBS elements for which you wish to determine correlation. (Table not preserved in transcript: actual costs and re-estimated costs from the Retro-ICEs.)

Retro-ICE Example

Calculate the residuals (actual minus estimate):

- For each WBS element (and each program)
- For the total

Find the standard deviation, $\sigma$, of each set of residuals:

- For each WBS element (8.05, 6.63, 21.26)
- For the total (30.53)

Retro-ICE Example

Construct the correlation matrix:

- Ones on the diagonal.
- Use the Excel "CORREL" function on the off-diagonals.
- Remember the row/column transpose rule.

Check your work: $\boldsymbol{\sigma}^T \boldsymbol{\rho}\,\boldsymbol{\sigma}$ should match the square of the standard deviation of the total, $(30.53)^2$.
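The Retro-ICE steps can be sketched end to end. The actual and re-estimated costs below are invented for illustration (they are not the tutorial's table, which the transcript does not preserve), and the helper names are hypothetical. The final check mirrors the "check your work" step.

```python
import math
from statistics import stdev

def pearson(x, y):
    """Sample Pearson correlation (what Excel's CORREL computes)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical actual costs for 8 programs and 3 WBS elements.
actuals = {
    "SEITPM": [52, 40, 61, 35, 48, 70, 44, 57],
    "Bus": [110, 95, 130, 88, 102, 141, 99, 120],
    "Payload": [205, 170, 260, 150, 190, 300, 180, 240],
}
# Hypothetical re-estimated costs from the Retro-ICEs.
estimates = {
    "SEITPM": [45, 42, 50, 38, 47, 58, 46, 50],
    "Bus": [104, 97, 118, 92, 100, 128, 101, 112],
    "Payload": [190, 175, 225, 160, 185, 250, 185, 215],
}

names = list(actuals)
residuals = {k: [a - e for a, e in zip(actuals[k], estimates[k])] for k in names}
total_residuals = [sum(residuals[k][p] for k in names) for p in range(8)]

sigma = [stdev(residuals[k]) for k in names]  # one sigma per WBS element
rho = [
    [1.0 if a == b else pearson(residuals[a], residuals[b]) for b in names]
    for a in names
]

# Check: sqrt(sigma^T * rho * sigma) should equal the sigma of the total residuals.
sigma_check = math.sqrt(
    sum(sigma[i] * rho[i][j] * sigma[j] for i in range(3) for j in range(3))
)
print(f"rolled-up sigma = {sigma_check:.2f}, sigma of total = {stdev(total_residuals):.2f}")
```

The two numbers agree because the sample variance of a sum equals the sum of all pairwise sample covariances, which is exactly what the matrix product adds up.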


Effective Correlation

- Effective correlation is different from average correlation, which is the average of the correlation values in the upper (or lower) triangle of the correlation matrix.
- Effective correlation is weighted by the standard deviations of the constituent elements.
- The effective correlation may be much different from the average correlation.
- Look at SSCM as an example.

Effective Correlation

SSCM has an average correlation of 0.04, but an effective correlation of 0.10. Effective correlation was calculated with SSCM in the following way:

1. Calculate the total cost error of SSCM. For each data point in the database, calculate the sum of the actual costs, calculate the sum of the estimated costs, and determine the percentage error; then take the average.
2. For each data point, calculate the SSCM error by multiplying the sum of actual costs by the SSCM percent error, $\sigma_{\text{SSCM}}$, and square it to get $\sigma_{\text{TOT}}^2$.

Effective Correlation

3. Then, for each data point, calculate $\sigma_i$ by multiplying the actual cost for each WBS element by its respective percent error.
4. Calculate the dot product of $\sigma_i$ with itself to get $\sum \sigma_i^2$.
5. Now use the following formula to get the effective correlation, $\rho_{\text{eff}}$, for that data point:

$$\rho_{\text{eff}} = \frac{\sigma_{\text{TOT}}^2 - \sum_i \sigma_i^2}{\left(\sum_i \sigma_i\right)^2 - \sum_i \sigma_i^2}$$

6. Finally, take the average of the effective correlations to get $\rho_{\text{eff}}$ for the model.

The $\rho_{\text{eff}}$ for SSCM is 0.10. We should figure this out for all of our models!

Effective Correlation with the Retro-ICE Method

Remember the Retro-ICE example? Use the effective correlation equation to solve for $\rho_{\text{eff}}$.

Effective Correlation with the Retro-ICE Method

- Retro-ICE correlation matrix: average $\rho$ = 0.502
- Retro-ICE effective correlation: $\rho_{\text{eff}}$ = 0.508
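The Retro-ICE figure can be reproduced from the standard deviations quoted earlier (8.05, 6.63 and 21.26 for the elements, 30.53 for the total). The sketch assumes that effective correlation is the single common correlation that reproduces the total variance; the function name is illustrative.

```python
def effective_correlation(sigma_total, sigmas):
    """Common rho such that sum(s_i^2) + rho * sum_{i != j} s_i * s_j = sigma_total^2."""
    sum_of_squares = sum(s * s for s in sigmas)
    square_of_sum = sum(sigmas) ** 2
    return (sigma_total ** 2 - sum_of_squares) / (square_of_sum - sum_of_squares)

rho_eff = effective_correlation(30.53, [8.05, 6.63, 21.26])
print(f"effective correlation = {rho_eff:.3f}")
```

Under that assumption the result rounds to 0.508, matching the value on the slide.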

Determining Correlation When Data is Not Available

The Problem

It is not always possible to calculate statistical correlation between WBS elements:

- There may be insufficient data to determine statistical correlation.
- There may be no known functional relationship between WBS elements.

Yet there may be reason to believe that increases or decreases in the cost of a certain WBS element are likely to cause corresponding increases or decreases in the cost of another WBS element. In cases such as these, it is still desirable to construct a correlation matrix in order to ensure a truer picture of the total cost variance.

Slide 84: Potential Solutions. What can you do if you cannot construct a correlation matrix from statistical or empirical means?

- Assume independence.
  - Same as a correlation matrix with zeros off the diagonal.
  - Extremely easy, but, as we have shown, it is WRONG!
- Use the Knee-in-Curve Method (Steve Book’s Rule of Thumb).
  - When in doubt, assume all correlation values are 0.2.
  - Captures about 80% of the variance compared to assuming independence. Easy to do.
- Use the “N-effect”.
  - Modulate our guess at correlation by preserving the total error of the estimate.
- Develop a subjective correlation matrix.
  - Excellent results if you can do it accurately.
  - Can fill out an entire correlation matrix this way, but it is somewhat difficult.

Slide 85: Knee-in-Curve Method (Steve Book’s Rule of Thumb). [Diagram: methods for deriving ρ, grouped into statistical (residual analysis, Retro-ICE, effective ρ) and non-statistical (knee in curve, N-effect guess, causal guess).]

Slide 86: Steve Book’s Rule of Thumb. According to Dr. Steve Book… [Slide image not reproduced in this transcript.] From: 1999 Cost Risk Analysis Seminar, Manhattan Beach, CA.

Slide 87: Steve Book’s Rule of Thumb. Dr. Book plotted the theoretical underestimation of percent total cost standard deviation when correlation is assumed to be zero rather than its true value, ρ. From: 1999 Cost Risk Analysis Seminar, Manhattan Beach, CA.

Slide 88: Steve Book’s Rule of Thumb. For example, given a 30 x 30 correlation matrix in which each actual correlation coefficient is 0.2, if you instead assumed each coefficient to be zero, you would underestimate the standard deviation of the resulting cost probability distribution by about 60%. Dr. Book argues that since the “knee” of these curves lies at about 0.2, it is better to populate an unknown correlation matrix with 0.2s or 0.3s rather than zeros. Doing so reasonably captures a substantial amount of cost variance compared with doing nothing.
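The 30 x 30 example can be checked with a short calculation. This is a minimal sketch that assumes all N elements have the same standard deviation and share a single correlation ρ, so the total variance is Nσ²(1 + (N − 1)ρ); the function name is hypothetical and the closed form is an illustration, not necessarily the exact construction behind Dr. Book's curves.

```python
import math


def percent_sigma_underestimated(n, rho):
    """Percent by which total-cost sigma is understated if rho is taken as zero.

    ASSUMPTION: n elements with equal sigma and a common correlation rho,
    so true variance / independent variance = 1 + (n - 1) * rho.
    """
    variance_ratio = 1.0 + (n - 1) * rho
    return 100.0 * (1.0 - 1.0 / math.sqrt(variance_ratio))


# The slide's case: 30 elements, true correlation 0.2, assumed correlation 0.
underestimate = percent_sigma_underestimated(30, 0.2)
print(f"{underestimate:.1f}% underestimated")
```

Under these assumptions the result is a little over 60%, in line with the figure quoted on the slide.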

Slide 89: Steve Book’s Rule of Thumb. Again, according to Dr. Book… [Slide image not reproduced in this transcript.] From: 1999 Cost Risk Analysis Seminar, Manhattan Beach, CA.

Slide 90: “N-effect” Correlation. [Diagram: methods for deriving ρ, grouped into statistical (residual analysis, Retro-ICE, effective ρ) and non-statistical (knee in curve, N-effect guess, causal guess).]

Slide 91: “N-effect” Correlation. As N increases, the effective correlation (ρ_eff) will decrease in reaction to the central limit theorem. This is the “N-effect”. Why? There is a fundamental limit to the predictive capability of our CERs. Simply breaking the WBS up into more pieces does not improve our estimates.

Slide 92: “N-effect” Correlation. Average correlation* in models seems to be sensitive to the number (N) of CERs. As N increases, ρ decreases. [Chart: “Maximum Possible Underestimation of Total-Cost Sigma”; x-axis: actual correlation (0 to 1); y-axis: percent underestimated (0 to 100); curves for NAFCOM (N = 55), USCM7 Bus (N = 19), USCM7 FU Bus (N = 11), USCM8 Bus (N = 17), USCM8 Comm (N = 13), and SSCM (N = 9).] *The average correlation is different from the effective correlation, but the effect is similar. © 2003 The Aerospace Corporation

Slide 93: Determining Correlation from the Number of WBS Items. There appears to be a trend between the number of WBS elements (N) in a cost model and both the derived average correlation coefficient (ρ_AVG) and the effective correlation (ρ_EFF).

- ρ_EFF is a single number used to fill the correlation matrix.
- As N increases, ρ_EFF decreases.

We looked at the following models:

- NAFCOM (NASA/Air Force Cost Model)
- USCM7 (Unmanned Space Vehicle Cost Model, Ver. 7)
- USCM8 (Unmanned Space Vehicle Cost Model, Ver. 8)
- SSCM (Small Satellite Cost Model)

Slide 94: Determining Correlation from the Number of WBS Items. If we see a trend in the chart of percent underestimation of sigma vs. effective correlation, we have a sound basis for determining correlations when the number of WBS elements grows. If the actual percent underestimated is k, then the N-effect correlation ρ_N for a model with N CERs would be: [formula not reproduced in this transcript]. So, for k = 50%: [formula not reproduced in this transcript]. © 2003 The Aerospace Corporation
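Since the slide's formulas are missing from the transcript, the following is only a sketch of one plausible form. It assumes equal element sigmas and a common correlation, so that the fraction underestimated is k = 1 − 1/√(1 + (N − 1)ρ), and inverts that relation to get ρ_N = (1/(1 − k)² − 1)/(N − 1). The function name is hypothetical, and the source does not confirm that this is the authors' expression.

```python
def n_effect_correlation(k, n):
    """Correlation implied by a fraction k of sigma underestimated with n CERs.

    ASSUMPTION: equal sigmas and one common correlation, so that
    k = 1 - 1 / sqrt(1 + (n - 1) * rho). Inverting for rho gives the
    expression below. This is a reconstruction, not the slide's formula.
    """
    if not 0.0 <= k < 1.0:
        raise ValueError("k must be a fraction in [0, 1)")
    if n < 2:
        raise ValueError("need at least two CERs")
    return (1.0 / (1.0 - k) ** 2 - 1.0) / (n - 1)


# For k = 50% the assumed relation reduces to rho_N = 3 / (N - 1).
for n_cers in (9, 25, 55):
    print(n_cers, round(n_effect_correlation(0.5, n_cers), 3))
```

Under this assumed form, ρ_N falls as N grows, which matches the trend described on the slide; for instance, k = 50% with N = 25 gives 0.125.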

Slide 95: Causal Guess Method of Subjective Correlation (Tim Anderson’s Method). [Diagram: methods for deriving ρ, grouped into statistical (residual analysis, Retro-ICE, effective ρ) and non-statistical (knee in curve, N-effect guess, causal guess).]

Slide 96: Subjective Correlation. It has previously been shown that it is possible to derive the empirical residual correlation coefficients of a cost model such as USCM, NAFCOM or SSCM. However, this method requires exclusive use of one of these cost models to be effective. One alternative method is to subjectively develop approximate correlation coefficients between WBS elements. This can be as simple as determining whether any two WBS elements are correlated by a small amount or by a large amount, and whether that correlation is positive or negative.

Slide 97: Subjective Correlation. An example of a subjective correlation decision table might look like the following:

Subjective Correlation Coefficients

| | Positive correlation | Negative correlation |
|---|---|---|
| Uncorrelated | 0 | 0 |
| Small amount of correlation | 0.3 | -0.3 |
| Large amount of correlation | 0.75 | -0.75 |

For example, if you believe two WBS elements have a small amount of positive correlation, then you would choose a correlation value of 0.3.

Slide 98: Subjective Correlation. Using this technique, it is only necessary to ask the following question of any two WBS elements: “If circumstances cause the cost of one WBS element to change, is the other WBS element likely to change also? If so, how substantially and in what direction?” Thus, if the answer were that a change in the cost of one WBS element might cause a minor, similar change in the other WBS element, then one would assign a correlation value of 0.3 between the two WBS elements.
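The decision table and the question above lend themselves to a simple lookup. The sketch below is illustrative only: the coefficient values (0, ±0.3, ±0.75) come from the tutorial, while the function names, the label strings, and the WBS element names and judgments are hypothetical.

```python
# Magnitudes from the subjective correlation table.
SUBJECTIVE_RHO = {"none": 0.0, "small": 0.3, "large": 0.75}


def subjective_correlation(strength, direction="positive"):
    """Map a subjective judgment to a correlation coefficient."""
    if direction not in ("positive", "negative"):
        raise ValueError("direction must be 'positive' or 'negative'")
    rho = SUBJECTIVE_RHO[strength]
    return rho if direction == "positive" else -rho


def build_matrix(names, judgments):
    """Build a symmetric matrix; pairs without a judgment stay at 0."""
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    matrix = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for (a, b), (strength, direction) in judgments.items():
        rho = subjective_correlation(strength, direction)
        i, j = index[a], index[b]
        matrix[i][j] = matrix[j][i] = rho   # keep the matrix symmetric
    return matrix


# Hypothetical elements and judgments, for illustration only.
names = ["Payload", "Structure", "Thermal"]
judgments = {
    ("Payload", "Structure"): ("large", "positive"),
    ("Payload", "Thermal"): ("small", "positive"),
}
matrix = build_matrix(names, judgments)
for row in matrix:
    print(row)
```

Storing the judgments as labelled pairs keeps the reasoning behind each coefficient visible, which makes the matrix easier to review than a grid of bare numbers.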

Slide 99: But Does It Work? The subjective scoring scheme shown previously is based on averages. As the figure shows, the values 0.0, ±0.3, and ±0.75 are the average values in each of their respective ranges. [Figure not reproduced in this transcript.] The idea is this: while one might not know the true correlation between WBS elements, if one can instead subjectively bucket the correlation into one of these five ranges, then a correlation matrix composed of these averages should give approximately the same results as if the true correlations were known.

Slide 100: But Does It Work? To test the hypothesis, suppose we replace the true correlation coefficients in a typical correlation matrix with their corresponding averages from the subjective correlation table. If the resulting cost probability distributions are similar, then we can use the technique with some confidence. An important point, however, is that the subjectively derived values should reflect the actual values. If they do, then this method should give a more precise answer than Steve Book’s Rule of Thumb method. We will see this in the following example.

Slide 101: Application of Correlation Methods When Data is Not Available. Cost Risk Analysis of a Small Earth-Orbiting Visible Imaging Sensor.

Slide 102: Example. Consider an estimate developed using the following cost model: [cost model table not reproduced in this transcript]. Reprinted from Space Mission Analysis and Design, 3rd Edition, Wertz and Larson.

Slide 103: Input Variables. Suppose this spacecraft has the following set of (mean) input variables: [table not reproduced in this transcript].

Slide 104: Functional Relationships. Integration and Test, Program Management and LOOS are functionally correlated to PL and S/C bus cost. The input variables are functionally correlated through the following sizing relationships with their error terms (ε):

- Payload Wt. = 200 * (Aperture Diameter)^1.5 + ε_PL
- Structure Wt. = 0.4 * Payload Wt. + ε_STR
- Thermal Wt. = 0.05 * Payload Wt. + ε_TH
- BOLP = 5.0 * Payload Wt. + ε_BOLP
- EPS Wt. = 0.3 * BOLP + ε_EPS
- TTC Wt. = 50 + 0.01 * Payload Wt. + ε_TTC
- ADCS Wt. = 0.7 * Payload Wt. + ε_ADCS

Slide 105: Input Parameter Error Terms. Aperture Diameter has a discrete 20% probability of being 1.0 m and an 80% probability of being 1.2 m. Assume the error terms (ε) for the sizing relationships are all triangular probability density functions defined by Low = 0.9, Most Likely = 1.0, High = 1.4.
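A small Monte Carlo run shows how these sizing relationships induce functional correlation between input variables. This is a sketch under one loudly stated assumption: the slides write the error terms as "+ ε", but because the triangular distribution is centred near 1.0, the sketch treats each ε as a multiplicative factor. Only three of the relationships are modelled, and the function names, seed, and trial count are arbitrary choices.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def sample_weights(rng):
    """Draw one trial of payload, structure and thermal weight."""
    def err():
        # ASSUMPTION: error applied as a multiplicative factor, not added.
        # random.triangular takes its arguments as (low, high, mode).
        return rng.triangular(0.9, 1.4, 1.0)

    aperture = 1.0 if rng.random() < 0.2 else 1.2   # 20% / 80% discrete choice
    payload = 200.0 * aperture ** 1.5 * err()
    structure = 0.4 * payload * err()
    thermal = 0.05 * payload * err()
    return payload, structure, thermal


rng = random.Random(0)   # fixed seed so the run is repeatable
trials = [sample_weights(rng) for _ in range(5000)]
payload = [t[0] for t in trials]
structure = [t[1] for t in trials]
thermal = [t[2] for t in trials]
r = pearson(structure, thermal)
print(f"structure vs. thermal weight correlation: {r:.2f}")
```

Structure and thermal weight come out positively correlated even though their error terms are drawn independently, because both depend on the same payload weight.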

Slide 106: Cost Estimate. The resulting cost estimate has the following (deterministic) value: [table not reproduced in this transcript].

Slide 107: “Actual” Correlation Matrix. Suppose the “actual” correlation matrix is as follows: [matrix not reproduced in this transcript].

Slide 108: Total Cost Distribution: Actual Correlation Values. Total cost distribution with “actual” correlation values: lognormal, mean = \$534.7M, standard deviation = \$126.6M. Note: This is the distribution that follows as a result of using the “actual” correlation coefficients. The standard deviation is 24% of the mean. [Frequency chart: forecast of Total RDT&E + T1 in \$M; 5,000 trials, 0 outliers.]

Slide 109: Total Cost Distribution: No Statistical Correlation. We have included functional correlation, but no statistical correlation between the error terms, ε. Under these circumstances, the total cost distribution has the following statistics: lognormal, mean = \$525.8M, standard deviation = \$79.7M. Note: This is a narrow distribution. The standard deviation is 15% of the mean. This is very different from the percent errors of our CERs, which ranged from 34% to 61%. [Frequency chart: forecast of Total RDT&E + T1 in \$M; 5,000 trials, 0 outliers.]

Slide 110: Total Cost Distribution: Knee-in-Curve Method. Using Steve Book’s Rule of Thumb, the correlation is set to 0.2, and the total cost distribution has the following statistics: lognormal, mean = \$535.3M, standard deviation = \$148.2M. Note: This has caused the standard deviation to shift substantially compared to the zero correlation case. The standard deviation grew by about 86% from the uncorrelated case (\$148.2M vs. \$79.7M). [Frequency chart: forecast of Total RDT&E + T1 in \$M; 5,000 trials, 1 outlier.]

Slide 111: Total Cost Distribution: “N-Effect” Method. Using the N-Effect method, the correlation is 0.125, and the total cost distribution has the following statistics: lognormal, mean = \$532.6M, standard deviation = \$127.0M. Note: This has caused the standard deviation to shift substantially compared to the zero correlation case. The standard deviation grew by over 59% from the uncorrelated case. This is almost exactly the standard deviation of the “actual correlation” case. [Frequency chart: forecast of Total RDT&E + T1 in \$M; 5,000 trials, 0 outliers.]

Slide 112: “Subjective” Correlation Matrix. The corresponding “subjective” correlation matrix using the causal guess method is as follows: [matrix not reproduced in this transcript].

Slide 113: Total Cost Distribution: Causal Guess Subjective Values. Total cost distribution with “subjective” correlation values: lognormal, mean = \$531.2M, standard deviation = \$121.2M. Note: This is the distribution that follows as a result of using the “subjective” correlation coefficients. The mean is nearly identical to the “actual” case, and the standard deviation is approximately 4% lower (\$121.2M vs. \$126.6M).

Slide 114: Summary of Results.

| Method | Mean (\$M) | Sigma (\$M) |
|---|---|---|
| Actual | 534.7 | 126.6 |
| Uncorrelated | 525.8 | 79.7 |
| Knee-in-curve | 535.3 | 148.2 |
| N-Effect | 532.6 | 127.0 |
| Causal guess | 531.2 | 121.2 |
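The comparisons quoted in the example can be recomputed directly from the summary values. The sketch below uses only the means and sigmas listed above; the function and key names are hypothetical.

```python
# (mean, sigma) in $M, taken from the summary of results.
results = {
    "Actual": (534.7, 126.6),
    "Uncorrelated": (525.8, 79.7),
    "Knee-in-curve": (535.3, 148.2),
    "N-Effect": (532.6, 127.0),
    "Causal guess": (531.2, 121.2),
}


def compare(results, reference="Actual", baseline="Uncorrelated"):
    """Return sigma/mean and sigma changes versus two comparison cases."""
    ref_sigma = results[reference][1]
    base_sigma = results[baseline][1]
    rows = {}
    for name, (mean, sigma) in results.items():
        rows[name] = {
            "sigma_pct_of_mean": 100.0 * sigma / mean,
            "vs_uncorrelated_pct": 100.0 * (sigma / base_sigma - 1.0),
            "vs_actual_pct": 100.0 * (sigma / ref_sigma - 1.0),
        }
    return rows


rows = compare(results)
for name, row in rows.items():
    print(
        f"{name:14s} sigma/mean {row['sigma_pct_of_mean']:5.1f}%  "
        f"vs uncorrelated {row['vs_uncorrelated_pct']:+6.1f}%  "
        f"vs actual {row['vs_actual_pct']:+6.1f}%"
    )
```

Reading across the output, the N-Effect sigma lands within a fraction of a percent of the actual case, the causal guess sits a few percent below it, and the knee-in-curve sigma is the largest of the five.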

Slide 115: Summary of Results.

- Ignoring correlation understated the total cost sigma.
- The N-Effect and Causal Guess methods produced the best results.
- The Knee-in-curve method was close, but provided the largest variance.

Slide 116: Part 5 Review. We learned how to derive correlation coefficients:

- When data is available
- When you have to make an educated guess

The methods:

- Statistical (data is available)
  - Residual analysis
  - Retro-ICE
  - Effective correlation, ρ_eff
- When data is not available
  - Knee-in-curve method (Steve Book’s Rule of Thumb)
  - N-effect guess
  - Subjective guess (Tim Anderson’s Method)
- Example cost risk analysis using subjective methods

Slide 117: End
