# Correlation and Regression Analysis – An Application


Systems Engineering Program, Department of Engineering Management, Information and Systems
EMIS 7370/5370 STAT 5340: Probability and Statistics for Scientists and Engineers
Dr. Jerrell T. Stracener, SAE Fellow
Leadership in Engineering

Montgomery, Peck, and Vining (2001) present data concerning the performance of the 28 National Football League teams in the 1976 season. It is suspected that the number of games won (y) is related to the number of yards gained rushing by opponents (x). The data are shown in the following table:

| Team | Games Won (y) | Yards Rushing by Opponent (x) | Team | Games Won (y) | Yards Rushing by Opponent (x) |
|---|---|---|---|---|---|
| Washington | 10 | 2205 | Detroit | 6 | 1901 |
| Minnesota | 11 | 2096 | Green Bay | 5 | 2288 |
| New England | 11 | 1847 | Houston | 5 | 2072 |
| Oakland | 13 | 1903 | Kansas City | 5 | 2861 |
| Pittsburgh | 10 | 1457 | Miami | 6 | 2411 |
| Baltimore | 11 | 1848 | New Orleans | 4 | 2289 |
| Los Angeles | 10 | 1564 | New York Giants | 3 | 2203 |
| Dallas | 11 | 1821 | New York Jets | 3 | 2592 |
| Atlanta | 4 | 2577 | Philadelphia | 4 | 2053 |
| Buffalo | 2 | 2476 | St. Louis | 10 | 1979 |
| Chicago | 7 | 1984 | San Diego | 6 | 2048 |
| Cincinnati | 10 | 1917 | San Francisco | 8 | 1786 |
| Cleveland | 9 | 1761 | Seattle | 2 | 2876 |
| Denver | 9 | 1709 | Tampa Bay | 0 | 2560 |
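For readers who want to reproduce the slides' arithmetic, the table can be checked in code. A minimal sketch (Python is an assumption here; the slides themselves use Excel) that verifies the column sums quoted on the Excel slide later in the deck:

```python
# Games won (y) and opponents' rushing yards (x) for the 28 NFL teams,
# exactly as tabulated above.
data = [
    ("Washington", 10, 2205), ("Minnesota", 11, 2096), ("New England", 11, 1847),
    ("Oakland", 13, 1903), ("Pittsburgh", 10, 1457), ("Baltimore", 11, 1848),
    ("Los Angeles", 10, 1564), ("Dallas", 11, 1821), ("Atlanta", 4, 2577),
    ("Buffalo", 2, 2476), ("Chicago", 7, 1984), ("Cincinnati", 10, 1917),
    ("Cleveland", 9, 1761), ("Denver", 9, 1709), ("Detroit", 6, 1901),
    ("Green Bay", 5, 2288), ("Houston", 5, 2072), ("Kansas City", 5, 2861),
    ("Miami", 6, 2411), ("New Orleans", 4, 2289), ("New York Giants", 3, 2203),
    ("New York Jets", 3, 2592), ("Philadelphia", 4, 2053), ("St. Louis", 10, 1979),
    ("San Diego", 6, 2048), ("San Francisco", 8, 1786), ("Seattle", 2, 2876),
    ("Tampa Bay", 0, 2560),
]
n = len(data)
sum_x = sum(x for _, _, x in data)          # total opponent rushing yards
sum_y = sum(y for _, y, _ in data)          # total games won
sum_xy = sum(x * y for _, y, x in data)     # cross-product sum
print(n, sum_x, sum_y, sum_xy)              # 28 59084 195 386127
```

The printed sums match the SUM row of the Excel slide (59084, 195, 386127), which confirms the table was reassembled correctly.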

Correlation Analysis
Statistical analysis used to obtain a quantitative measure of the strength of the relationship between a dependent variable and one or more independent variables.

Scatter Plot of games won (y) versus yards rushing by opponents (x)

Sample correlation coefficient

$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 \,\sum_{i=1}^{n}(y_i-\bar{y})^2}} = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}$$

Notes: $-1 \le r \le 1$, and $R = r^2 \times 100\%$ is the coefficient of determination.

For these data $r^2 = 0.5447$, so $R = r^2 \times 100\% = 54.47\%$: about 54% of the variability in games won is explained by opponents' rushing yards.
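The coefficient of determination can be recomputed from the column sums alone. A sketch using the sums from the Excel slide, plus $\sum x_i^2$ = 128,284,292, which is not shown on that slide but follows from the data table:

```python
import math

# Summary sums for the NFL data (n = 28); Sx2 computed from the data table.
n, Sx, Sy, Sxy_raw, Sy2 = 28, 59084, 195, 386127, 1685
Sx2 = 128_284_292

Sxx = Sx2 - Sx**2 / n            # corrected sum of squares for x
Syy = Sy2 - Sy**2 / n            # corrected sum of squares for y
Sxy = Sxy_raw - Sx * Sy / n      # corrected cross-product sum

r = Sxy / math.sqrt(Sxx * Syy)   # sample correlation coefficient
print(round(r, 4), round(r**2, 4))   # r is about -0.738, r^2 about 0.5447
```

The negative sign of r matches the downward trend in the scatter plot: giving up more rushing yards goes with winning fewer games.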

Correlation
To test for no linear association between x and y, calculate

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

where r is the sample correlation coefficient and n is the sample size. Under the hypothesis of no linear association, t follows a t-distribution with n − 2 degrees of freedom.

Correlation
Conclude no linear association if

$$|t| \le t_{\alpha/2,\,n-2}$$

and in that case treat y1, y2, …, yn as a random sample.

Correlation
Take α = 0.05; from the t-table, $t_{0.025,\,26} = 2.056$. Here $t = -5.577$. Since $|t| = 5.577 > 2.056$, we conclude that there is linear association between x and y and proceed with regression analysis.

Linear Regression Model
Simple linear regression model:

$$Y = \beta_0 + \beta_1 x + \varepsilon$$

where Y is the response (or dependent) variable, $\beta_0$ and $\beta_1$ are the unknown parameters, $\varepsilon \sim N(0, \sigma)$, and the data are (x1, y1), (x2, y2), ..., (xn, yn).

Least squares estimates of $\beta_0$ and $\beta_1$

Estimate of $\beta_1$:

$$b_1 = \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} = -0.007025$$

Estimate of $\beta_0$:

$$b_0 = \hat{\beta}_0 = \bar{y} - b_1\bar{x} = 21.788$$

Least squares regression equation
The point estimate of the linear model is the least squares regression equation:

$$\hat{y} = b_0 + b_1 x = 21.788 - 0.007025\,x$$
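The least squares fit can be reproduced from the summary sums. A sketch, again assuming $\sum x_i^2$ = 128,284,292 computed from the data table:

```python
# Least squares slope and intercept for the NFL data from summary sums.
n, Sx, Sy, Sxy_raw, Sx2 = 28, 59084, 195, 386127, 128_284_292
xbar, ybar = Sx / n, Sy / n      # sample means of x and y
Sxx = Sx2 - Sx**2 / n
Sxy = Sxy_raw - Sx * Sy / n
b1 = Sxy / Sxx                   # slope estimate
b0 = ybar - b1 * xbar            # intercept estimate
print(round(b0, 3), round(b1, 6))   # about 21.788 and -0.007025
```

Interpreting the slope: each additional 100 rushing yards given up is associated with roughly 0.7 fewer wins over the season.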

Regression Fitted Line Plot

Point estimate of $\sigma^2$

$$s^2 = \hat{\sigma}^2 = \frac{SSE}{n-2} = \frac{S_{yy} - b_1 S_{xy}}{n-2} = 5.726$$
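The residual variance step can be checked the same way; a sketch from the summary sums (with the same assumed $\sum x_i^2$):

```python
# Unbiased estimate of the error variance sigma^2 for the NFL fit.
n = 28
Sxx = 128_284_292 - 59084**2 / n
Sxy = 386127 - 59084 * 195 / n
Syy = 1685 - 195**2 / n
b1 = Sxy / Sxx
sse = Syy - b1 * Sxy        # residual (error) sum of squares
s2 = sse / (n - 2)          # divide by n - 2: two parameters were estimated
print(round(s2, 3))         # about 5.726
```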

Interval Estimates for y-intercept ($\beta_0$)
A $(1 - \alpha)100\%$ confidence interval for $\beta_0$ is

$$b_0 \pm t_{\alpha/2,\,n-2}\; s\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}}$$

where $s = \sqrt{s^2}$ and $S_{xx} = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$.

Interval Estimates for y-intercept ($\beta_0$)
Take α = 0.05; the 95% confidence interval for $\beta_0$ then uses $t_{0.025,\,26} = 2.056$.

Interval Estimates for y-intercept ($\beta_0$)
Applying the formula gives the lower and upper bounds for $\beta_0$:

$$16.24 \le \beta_0 \le 27.33$$
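The intercept interval can be reproduced end to end; a sketch assuming the summary sums used above and $t_{0.025,26} \approx 2.056$:

```python
import math

# 95% confidence interval for the intercept beta_0 of the NFL fit.
n, t_crit = 28, 2.056
Sx, Sy, Sxy_raw, Sx2, Sy2 = 59084, 195, 386127, 128_284_292, 1685
xbar = Sx / n
Sxx = Sx2 - Sx**2 / n
Sxy = Sxy_raw - Sx * Sy / n
Syy = Sy2 - Sy**2 / n
b1 = Sxy / Sxx
b0 = Sy / n - b1 * xbar
s = math.sqrt((Syy - b1 * Sxy) / (n - 2))        # residual standard deviation
half = t_crit * s * math.sqrt(1 / n + xbar**2 / Sxx)  # half-width of the CI
lo, hi = b0 - half, b0 + half
print(round(lo, 2), round(hi, 2))                # about 16.24 and 27.33
```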

Interval Estimates for slope ($\beta_1$)
A $(1 - \alpha)100\%$ confidence interval for $\beta_1$ is

$$b_1 \pm t_{\alpha/2,\,n-2}\; \frac{s}{\sqrt{S_{xx}}}$$

where s and $S_{xx}$ are as defined above.

Interval Estimates for slope ($\beta_1$)
With α = 0.05, the 95% confidence interval is

$$-0.00961 \le \beta_1 \le -0.00444$$
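A matching sketch for the slope interval, under the same assumptions (summary sums plus the t-table value 2.056). Note the interval excludes zero, agreeing with the earlier correlation test:

```python
import math

# 95% confidence interval for the slope beta_1 of the NFL fit.
n, t_crit = 28, 2.056
Sxx = 128_284_292 - 59084**2 / n
Sxy = 386127 - 59084 * 195 / n
Syy = 1685 - 195**2 / n
b1 = Sxy / Sxx
s = math.sqrt((Syy - b1 * Sxy) / (n - 2))  # residual standard deviation
half = t_crit * s / math.sqrt(Sxx)         # half-width of the CI
lo, hi = b1 - half, b1 + half
print(round(lo, 5), round(hi, 5))          # about -0.00961 and -0.00444
```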

Confidence interval for conditional mean of Y, given x = 2205
Given x = 2205, the $(1 - \alpha)100\%$ confidence interval for the conditional mean of Y is

$$\hat{y} \pm t_{\alpha/2,\,n-2}\; s\sqrt{\frac{1}{n} + \frac{(x-\bar{x})^2}{S_{xx}}}$$

Confidence interval for conditional mean of Y, given x = 2205
With $\hat{y}(2205) = 6.30$ and α = 0.05, the interval is

$$5.34 \le \mu_{Y|x=2205} \le 7.26$$

Prediction interval for a single future value of Y, given x
The $(1 - \alpha)100\%$ prediction interval is

$$\hat{y} \pm t_{\alpha/2,\,n-2}\; s\sqrt{1 + \frac{1}{n} + \frac{(x-\bar{x})^2}{S_{xx}}}$$

The extra 1 under the root accounts for the variability of a single future observation about its mean, so the prediction interval is always wider than the confidence interval for the mean.

Prediction interval for a single future value of Y, given x = 2000
With $\hat{y}(2000) = 7.74$ and α = 0.05, the 95% prediction interval is

$$2.72 \le Y \le 12.75$$

Excel Calculation
The spreadsheet tabulates, for each team, the columns X, Y, XY, X², Y², the fitted value Ŷ, (Y − Ŷ)², and (x − x̄)², with column sums Σx = 59084, Σy = 195, Σxy = 386127, and Σy² = 1685 over the n = 28 teams (so x̄ = 2110.1 and ȳ = 6.96). From these it computes r, s², b0 and b1 with their standard errors and 95% bounds, and the interval estimates for Y at x = 2205 and x = 2000.

Regression Statistics
Excel Regression Analysis Output (SUMMARY OUTPUT): Observations = 28. ANOVA: Regression df = 1, Residual df = 26, Total df = 27, Significance F = 7.381E-06. Coefficient p-values: Intercept 1.46E-08, X Variable 1 7.38E-06. The output also lists Multiple R, R Square, Adjusted R Square, the Standard Error, 95% bounds for each coefficient, and a residual output with the predicted Y and residual for each of the 28 observations.
