# Regression Analysis. Unscheduled Maintenance Issue: 36 flight squadrons; each experiences unscheduled maintenance actions (UMAs); UMAs cost \$1000.

## Presentation transcript

Regression Analysis

Unscheduled Maintenance Issue:

- 36 flight squadrons
- Each experiences unscheduled maintenance actions (UMAs)
- UMAs cost \$1000 to repair, on average

You’ve got the Data… Now What? Unscheduled Maintenance Actions (UMAs)

What do you want to know?

- How many UMAs will there be next month?
- What is the average number of UMAs?

Sample Mean

Sample Standard Deviation

UMA Sample Statistics

UMAs Next Month 95% Confidence Interval

Average UMAs 95% Confidence Interval

Model: Cost of UMAs for one squadron

If the cost per UMA = \$1000, the expected cost for one squadron = 60 * \$1000 = \$60,000

Model: Total Cost of UMAs

Expected cost for all squadrons = 60 * \$1000 * 36 = \$2,160,000. How confident are we about this estimate?

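Sampling distribution of the mean: mean = 60, standard error = $12/\sqrt{36} = 2$.

About 95% of the distribution lies within 2 standard units of the mean, between 56 and 64 (1 standard unit = 2).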

95% Confidence Interval on our estimate of UMAs and costs

- 60 ± 2(2) = [56, 64]
- low cost: 56 * \$1000 * 36 = \$2,016,000
- high cost: 64 * \$1000 * 36 = \$2,304,000
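The interval arithmetic above can be reproduced in a few lines. This is a minimal sketch using only the figures quoted on the slides (mean 60, standard deviation 12, 36 squadrons, \$1000 per UMA); the variable and function names are illustrative assumptions, and the multiplier 2 is the rounded 95% value the slides use.

```python
import math

# Figures quoted on the slides
mean_umas = 60        # sample mean of UMAs per squadron
std_dev = 12          # sample standard deviation of UMAs
n_squadrons = 36      # number of squadrons (sample size)
cost_per_uma = 1000   # average repair cost per UMA, in dollars

# Standard error of the mean: s / sqrt(n)
std_error = std_dev / math.sqrt(n_squadrons)

# Approximate 95% interval: mean plus or minus 2 standard errors
low = mean_umas - 2 * std_error
high = mean_umas + 2 * std_error


def total_cost(umas_per_squadron):
    """Total cost across all squadrons for a given UMA count per squadron."""
    return umas_per_squadron * cost_per_uma * n_squadrons


print(f"standard error = {std_error}")
print(f"95% interval for mean UMAs = [{low}, {high}]")
print(f"cost range = [{total_cost(low):,.0f}, {total_cost(high):,.0f}]")
```

Changing `std_dev` or `n_squadrons` shows how the width of the cost range responds to sample variability and sample size.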

What do you want to know?

- How many UMAs will there be next month?
- What is the average number of UMAs?
- Is there a relationship between UMAs and some other variable that may be used to predict UMAs?
- What is that relationship?

Relationships

- What might be related to UMAs?
  - Pilot experience?
  - Flight hours?
  - Sorties flown?
  - Mean time to failure (for specific parts)?
  - Number of landings / takeoffs?

Regression:

- To estimate the expected or mean value of UMAs for next month:
  - Look for a linear relationship between UMAs and a “predictive” variable
  - If a linear relationship exists, use regression analysis

Regression analysis: describes and evaluates relationships between one variable (the dependent or explained variable) and one or more other variables (the independent or explanatory variables).

What is a good estimating variable for UMAs?

- quantifiable
- predictable
- logical relationship with dependent variable
- must be a linear relationship: Y = a + bX

Sorties

Pilot Experience

Sample Statistics

Describing the Relationship

- Is there a relationship?
  - Do the two variables (UMAs and sorties or experience) move together?
  - Do they move in the same direction or in opposite directions?
- How strong is the relationship?
  - How closely do they move together?

Positive Relationship

Strong Positive Relationship

Negative Relationship

Strong Negative Relationship

No Relationship

Relationship?

Correlation Coefficient

- Statistical measure of how closely two variables are moving together in a coordinated fashion
  - Measures strength and direction
- Value ranges from -1.0 to +1.0
  - +1.0 indicates “perfect” positive linear relation
  - -1.0 indicates “perfect” negative linear relation
  - 0 indicates no linear relation between the two variables
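To make the definition concrete, here is a minimal sketch of the standard Pearson correlation coefficient in plain Python. The function name `pearson_r` and the sortie and UMA numbers are hypothetical, chosen only for illustration; they are not the squadron data behind the presentation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, for illustration only
sorties = [80, 90, 100, 110, 120]
umas = [50, 54, 61, 64, 71]

print(round(pearson_r(sorties, umas), 4))
print(pearson_r([1, 2, 3], [2, 4, 6]))   # points on a rising line
print(pearson_r([1, 2, 3], [6, 4, 2]))   # points on a falling line
```

The last two calls illustrate the endpoints of the range: points exactly on a rising line give +1.0, and points exactly on a falling line give -1.0.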

Correlation Coefficient

Sorties vs. UMAs: r = .9788

Experience vs. UMAs: r = .1896

Correlation Matrix

A Word of Caution...

- Correlation does NOT imply causation
  - It simply measures the coordinated movement of two variables
- Variation in two variables may be due to a third common variable
- The observed relationship may be due to chance alone

What is the Relationship?

- In order to use the correlation information to help describe the relationship between two variables, we need a model
- The simplest one is a linear model: $Y = a + bX$

Fitting a Line to the Data

One Possibility: sum of errors = 0

Another Possibility: sum of errors = 0

Which is Better?

- Both have sum of errors = 0
- Compare sum of absolute errors:

Fitting a Line to the Data

One Possibility: sum of absolute errors = 6

Another Possibility: sum of absolute errors = 6

Which is Better?

- The sums of the absolute errors are equal
- Compare sum of errors squared:
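The comparison can be made concrete with a small sketch. The three data points and the two candidate lines below are hypothetical, constructed so that both lines have a sum of errors of 0 and a sum of absolute errors of 6, as in the slides; only the sum of squared errors separates them.

```python
# Hypothetical data points, constructed for illustration
xs = [1, 2, 3]
ys = [5, 1, 6]


def errors(a, b):
    """Errors (observed minus predicted) for the line Y = a + bX."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def summarize(a, b):
    """Return (sum of errors, sum of absolute errors, sum of squared errors)."""
    e = errors(a, b)
    return sum(e), sum(abs(v) for v in e), sum(v * v for v in e)


line_1 = summarize(2, 1)   # candidate line Y = 2 + 1X
line_2 = summarize(0, 2)   # candidate line Y = 0 + 2X

print(line_1)  # (0, 6, 14)
print(line_2)  # (0, 6, 18)
```

Both candidates tie on the first two criteria, but the first line has the smaller sum of squared errors because it avoids concentrating the error in a few large misses.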

The correct relationship: $Y = a + bX + U$, where $a + bX$ is the systematic part and $U$ is the random part. (Figure: scatter plot of Y against X, with the X axis running from 50 to 130.)

Least-Squares Method

- Penalizes large absolute errors
- Y-intercept: $a = \bar{Y} - b\bar{X}$
- Slope: $b = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$
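A minimal sketch of the least-squares fit for the model Y = a + bX follows, using the standard textbook formulas: the slope is the sum of cross-products of deviations divided by the sum of squared X deviations, and the intercept is the mean of Y minus the slope times the mean of X. The function name and the data are hypothetical, not taken from the presentation.

```python
def least_squares(xs, ys):
    """Return (a, b) minimizing the sum of squared errors for Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept: line passes through the means
    return a, b


# Hypothetical data, for illustration only
sorties = [80, 90, 100, 110, 120]
umas = [50, 54, 61, 64, 71]

a, b = least_squares(sorties, umas)
predicted = a + b * 105   # predicted UMAs for 105 sorties (inside the data range)

print(f"a = {a:.2f}, b = {b:.2f}, prediction at 105 sorties = {predicted:.1f}")
```

Because the intercept is computed from the means, the fitted line always passes through the point (mean of X, mean of Y), which is why its errors sum to zero.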

Assumptions

- Linear relationship: $Y = a + bX + U$
- Errors are random and normally distributed with mean = 0 and variance = $\sigma^2$
  - Supported by Central Limit Theorem

Least Squares Regression for Sorties and UMAs

Regression Calculations

Sorties vs. UMAs

Regression Calculations: Confidence in the predictions

Confidence Interval for Estimate

95% Confidence Interval for the model (figure: Y plotted against X)

Testing Model Parameters

- How well does the model explain the variation in the dependent variable?
- Does the independent variable really seem to matter?
- Is the intercept constant statistically significant?

Variation

Coefficient of Determination

- Values between 0 and 1
- $R^2 = 1$ when all data lie on the line (|r| = 1)
- $R^2 = 0$ when there is no correlation (r = 0)
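As a sketch of how the coefficient of determination is computed, the block below fits a least-squares line and reports one minus the ratio of unexplained variation (sum of squared errors) to total variation about the mean of Y. The function name and data are hypothetical; for a one-variable linear model this quantity equals the square of the correlation coefficient.

```python
def r_squared(xs, ys):
    """R^2 of the least-squares line Y = a + bX fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Unexplained variation: squared errors about the fitted line
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation: squared deviations about the mean of Y
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst


# Hypothetical data, for illustration only
sorties = [80, 90, 100, 110, 120]
umas = [50, 54, 61, 64, 71]

print(round(r_squared(sorties, umas), 4))
print(r_squared([1, 2, 3], [2, 4, 6]))   # all points on the line
print(r_squared([1, 2, 3], [1, 2, 1]))   # no linear relationship
```

The last two calls correspond to the two boundary cases listed above: a perfect fit and a fitted slope of zero.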

Regression Calculations: How well does the model explain the variation?

Does the Independent Variable Matter?

- If sorties do not help predict UMAs, we expect b = 0
- If b is not 0, is it statistically significant?
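One common way to answer this question is a t-test on the slope, sketched below under the usual regression assumptions. The data are hypothetical and the variable names are illustrative. The critical value 3.182 is the two-sided 95% t-table value for 3 degrees of freedom (5 points minus 2 estimated parameters); a sample of 36 squadrons would have 34 degrees of freedom, where the critical value is close to 2.

```python
import math

# Hypothetical data, for illustration only
sorties = [80, 90, 100, 110, 120]
umas = [50, 54, 61, 64, 71]

n = len(sorties)
mean_x = sum(sorties) / n
mean_y = sum(umas) / n
sxx = sum((x - mean_x) ** 2 for x in sorties)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sorties, umas))

# Least-squares slope and intercept
b = sxy / sxx
a = mean_y - b * mean_x

# Standard error of the estimate, with n - 2 degrees of freedom
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(sorties, umas))
s = math.sqrt(sse / (n - 2))

# Standard error of the slope and the t statistic for testing b = 0
se_b = s / math.sqrt(sxx)
t_stat = b / se_b

T_CRIT = 3.182  # two-sided 95% t value for n - 2 = 3 degrees of freedom
ci_low = b - T_CRIT * se_b
ci_high = b + T_CRIT * se_b
significant = abs(t_stat) > T_CRIT

print(f"b = {b:.3f}, se(b) = {se_b:.4f}, t = {t_stat:.2f}")
print(f"95% CI for slope = [{ci_low:.3f}, {ci_high:.3f}], significant: {significant}")
```

If the confidence interval for the slope excludes 0, the data are inconsistent with b = 0 at the 95% level, which is the same conclusion the t statistic gives.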

Regression Calculations: Does the Independent Variable Matter?

95% Confidence Interval for the slope (figure: Y plotted against X, with the mean of X and the mean of Y marked)

Confidence Interval for Slope

Is the Intercept Statistically Significant?

Confidence Interval for Y-intercept

Basic Steps of Regression Analysis

- Formulate the model
- Plot scatter diagram for visual inspection
- Compute correlation coefficient
- Fit the regression line
- Test the model

Factors affecting estimation accuracy

- Sample size (larger is better)
- Range of X values (wider is better)
- Standard deviation of U (smaller is better)

Uses and Limitations of Regression Analysis

- Identifying relationships
  - Not necessarily cause
  - May be due to chance only
- Forecasting future outcomes
  - Only valid over the range of the data
  - Past may not be a good predictor of future

Common pitfalls in regression

- Failure to draw scatter diagrams
- Omitting important variables from the model
- The “two point” phenomenon
- Unfounded claims of model sophistication
- Insufficient attention to interval estimates and predictions
- Predicting too far outside of known range

Lines can be deceiving... $R^2 = .6662$

Nonlinear Relationship

Best fit?