Presentation on theme: "Correlation and Regression"— Presentation transcript:
1 Correlation and Regression (Sections 9-2 / 9-3)
2 Linear Correlation Coefficient r. Definition: the linear correlation coefficient r measures the strength of the linear relationship between paired x and y values in a sample.
$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$$
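As a rough illustration of the formula for r, the Python sketch below computes it directly from the raw sums. The function name `pearson_r` is hypothetical, and the eight (x, y) pairs are an assumed reading of the Garbage Project table used in the review slides (x = plastic in lb, y = household size).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Linear correlation coefficient r from the computational (raw-sum) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Assumed pairing of the Garbage Project values (not guaranteed to match the slide exactly).
plastic = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]
household = [2, 3, 3, 6, 4, 2, 1, 5]

r = pearson_r(plastic, household)
print(round(r, 3))
```

With this pairing the result rounds to the r = 0.842 reported on the review slide, which suggests the assumed data are read correctly.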
3 Formula for b0 and b1.
y-intercept: $$b_0 = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2}$$
slope: $$b_1 = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$
Encourage the use of calculators for these formulas. Most inexpensive non-graphing calculators will compute these two values after the data have been entered.
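For readers without a calculator at hand, here is a minimal Python sketch of the two formulas. The function name `regression_coefficients` is hypothetical, and the data are an assumed reading of the Garbage Project table from the review slides (x = plastic in lb, y = household size).

```python
def regression_coefficients(xs, ys):
    """Return (b0, b1): y-intercept and slope from the raw-sum formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2  # shared by both formulas
    b0 = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b1 = (n * sum_xy - sum_x * sum_y) / denominator
    return b0, b1


# Assumed pairing of the Garbage Project values.
plastic = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]
household = [2, 3, 3, 6, 4, 2, 1, 5]

b0, b1 = regression_coefficients(plastic, household)
print(round(b0, 3), round(b1, 2))
```

Both formulas share the denominator, so it is computed once. With the assumed data the results round to the review slide's b0 = 0.549 and b1 = 1.48.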
4 Review Calculations. Data from the Garbage Project: find the correlation and the regression equation (line of best fit).
x Plastic (lb):    0.27  1.41  2.19  2.83  2.19  1.81  0.85  3.05
y Household size:  2     3     3     6     4     2     1     5
5 Review Calculations. Using a calculator on the Garbage Project data (x = plastic in lb, y = household size): b0 = 0.549, b1 = 1.48, and r = 0.842, so the regression equation is $\hat{y} = 0.549 + 1.48x$.
6 Notes on correlation. r represents the linear correlation coefficient for a sample; $\rho$ (rho) represents the linear correlation coefficient for a population. $-1 \le r \le 1$. r measures the strength of a linear relationship: -1 is perfect negative correlation and 1 is perfect positive correlation.
7 Interpreting the Linear Correlation Coefficient. If the absolute value of r exceeds the value in Table A-6, conclude that there is a significant linear correlation. Otherwise, there is not sufficient evidence to support the conclusion of a significant linear correlation. Discuss how large r needs to be in order to indicate a significant linear correlation.
8 Formal Hypothesis Test. Two methods. Both methods use $H_0: \rho = 0$ (no significant linear correlation) and $H_1: \rho \ne 0$ (significant linear correlation).
9 Method 1: Test Statistic is t (follows format of earlier chapters). Test statistic:
$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$
Critical values: use Table A-3 with degrees of freedom = n - 2. This is the first example where the degrees of freedom for Table A-3 differ from n - 1; special note should be made of this.
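A short Python sketch of Method 1, applied to the Garbage Project values reported elsewhere in this presentation (r = 0.842, n = 8). The function name `t_statistic` is hypothetical, and the critical value 2.447 is an assumption taken from a standard two-tailed t table for $\alpha = 0.05$ with 6 degrees of freedom, since Table A-3 itself is not reproduced here.

```python
from math import sqrt


def t_statistic(r, n):
    """Test statistic t for H0: rho = 0, with n - 2 degrees of freedom."""
    return r / sqrt((1 - r ** 2) / (n - 2))


r, n = 0.842, 8  # values reported for the Garbage Project data
t = t_statistic(r, n)

# Assumed critical value: two-tailed, alpha = 0.05, df = n - 2 = 6.
T_CRITICAL = 2.447

print(round(t, 2), abs(t) > T_CRITICAL)
```

Under these assumptions t is about 3.82, which exceeds 2.447, so Method 1 rejects $H_0$ for these data.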
10 Method 2: Test Statistic is r (uses fewer calculations). Test statistic: r. Critical values: refer to Table A-6 (no degrees of freedom). This method is preferred by some instructors because the calculations are much easier.
11 TABLE A-6 Critical Values of the Pearson Correlation Coefficient r
  n     $\alpha = .05$   $\alpha = .01$
  4     .950             .999
  5     .878             .959
  6     .811             .917
  7     .754             .875
  8     .707             .834
  9     .666             .798
  10    .632             .765
  11    .602             .735
  12    .576             .708
  13    .553             .684
  14    .532             .661
  15    .514             .641
  16    .497             .623
  17    .482             .606
  18    .468             .590
  19    .456             .575
  20    .444             .561
  25    .396             .505
  30    .361             .463
  35    .335             .430
  40    .312             .402
  45    .294             .378
  50    .279             .361
  60    .254             .330
  70    .236             .305
  80    .220             .286
  90    .207             .269
  100   .196             .256
12 Is there a significant linear correlation? Data from the Garbage Project (x = plastic in lb, y = household size): n = 8, $\alpha = 0.05$, $H_0: \rho = 0$, $H_1: \rho \ne 0$. Test statistic is r = 0.842. Method 2 is used to solve this problem.
13 Is there a significant linear correlation? n = 8, $\alpha = 0.05$, $H_0: \rho = 0$, $H_1: \rho \ne 0$. Test statistic is r = 0.842. Critical values are r = -0.707 and r = 0.707 (Table A-6 with n = 8 and $\alpha = 0.05$).
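14 Is there a significant linear correlation? 0.842 > 0.707; that is, the test statistic does fall within the critical region. [Figure: number line for r from -1 to 1, with "Reject $\rho = 0$" regions below r = -0.707 and above r = 0.707, a "Fail to reject $\rho = 0$" region between them, and the sample value r = 0.842 marked.]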
15 Is there a significant linear correlation? 0.842 > 0.707; that is, the test statistic does fall within the critical region. Therefore, we REJECT $H_0: \rho = 0$ (no correlation) and conclude there is a significant linear correlation between the weights of discarded plastic and household size. [Figure: number line for r from -1 to 1 with rejection regions below r = -0.707 and above r = 0.707; sample value r = 0.842.]
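The Method 2 decision above can be sketched in a few lines of Python. The names `CRITICAL_R_05` and `is_significant` are hypothetical, and only a handful of $\alpha = .05$ entries from Table A-6 are copied in for illustration.

```python
# A few alpha = .05 entries from Table A-6 (sample size n -> critical value of r).
CRITICAL_R_05 = {4: 0.950, 5: 0.878, 6: 0.811, 7: 0.754, 8: 0.707, 9: 0.666, 10: 0.632}


def is_significant(r, n, table=CRITICAL_R_05):
    """Method 2: reject H0 (rho = 0) when |r| exceeds the tabled critical value."""
    return abs(r) > table[n]


# Garbage Project example: r = 0.842 with n = 8.
print(is_significant(0.842, 8))
```

Taking the absolute value handles both tails at once, so a strongly negative r is treated the same way as a strongly positive one.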
16 Regression. Definition. Regression Model: $y = \beta_0 + \beta_1 x + \varepsilon$. Regression Equation: $\hat{y} = b_0 + b_1 x$. Given a collection of paired data, the regression equation algebraically describes the relationship between the two variables.
17 Notation for Regression Equation
                                      Population Parameter                      Sample Statistic
y-intercept of regression equation    $\beta_0$                                 $b_0$
Slope of regression equation          $\beta_1$                                 $b_1$
Equation of the regression line       $y = \beta_0 + \beta_1 x + \varepsilon$   $\hat{y} = b_0 + b_1 x$
18 Regression. Definition. Regression Equation: given a collection of paired data, the regression equation $\hat{y} = b_0 + b_1 x$ algebraically describes the relationship between the two variables. Regression Line (line of best fit or least-squares line): the graph of the regression equation.
19 Assumptions & Observations. 1. We are investigating only linear relationships. 2. For each x value, y is a random variable having a normal distribution. 3. There are many methods for determining normality. 4. The regression line goes through $(\bar{x}, \bar{y})$.
20 Guidelines for Using the Regression Equation. 1. If there is no significant linear correlation, don't use the regression equation to make predictions. 2. Stay within the scope of the available sample data when making predictions.
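One way to make the two guidelines concrete is to have a prediction helper refuse to answer when either is violated. The Python sketch below is only an illustration: the function name `predict` and its parameters are hypothetical, the coefficients are the Garbage Project values b0 = 0.549 and b1 = 1.48, and the x range 0.27 to 3.05 lb is taken as the smallest and largest plastic weights in that sample.

```python
def predict(x_new, b0, b1, x_min, x_max, significant):
    """Predict y from the regression equation, enforcing the two guidelines."""
    if not significant:
        # Guideline 1: no significant linear correlation, so no regression prediction.
        raise ValueError("no significant linear correlation")
    if not (x_min <= x_new <= x_max):
        # Guideline 2: stay within the scope of the sample data.
        raise ValueError("x is outside the scope of the sample data")
    return b0 + b1 * x_new


B0, B1 = 0.549, 1.48        # Garbage Project regression coefficients
X_MIN, X_MAX = 0.27, 3.05   # assumed range of the sample x values (lb of plastic)

y_hat = predict(2.0, B0, B1, X_MIN, X_MAX, significant=True)
print(round(y_hat, 3))
```

Raising an error, rather than silently extrapolating, is a design choice for this sketch; the slides state only that such predictions should not be made.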
21 Definitions. Outlier: a point lying far away from the other data points. Influential points: points which strongly affect the graph of the regression line. The slope b1 in the regression equation represents the marginal change in y that occurs when x changes by one unit.
22 Residuals and the Least-Squares Property. Definitions. Residual (error): for a sample of paired (x, y) data, the difference $(y - \hat{y})$ between an observed sample y-value and the value of $\hat{y}$, which is the value of y that is predicted by using the regression equation. Least-Squares Property: a straight line satisfies this property if the sum of the squares of the residuals is the smallest sum possible.
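The least-squares property can be checked numerically. The Python sketch below computes the residuals for the fitted line and compares its sum of squared residuals with that of a few nearby lines. All function names are hypothetical, the data are an assumed reading of the Garbage Project table (x = plastic in lb, y = household size), and checking a handful of perturbed lines illustrates the property without proving it.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) from the raw-sum formulas for the regression line."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    b0 = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b1 = (n * sum_xy - sum_x * sum_y) / denominator
    return b0, b1


def residuals(xs, ys, b0, b1):
    """Observed y minus predicted y-hat for each pair."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def sum_squared_residuals(xs, ys, b0, b1):
    return sum(e * e for e in residuals(xs, ys, b0, b1))


# Assumed pairing of the Garbage Project values.
plastic = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]
household = [2, 3, 3, 6, 4, 2, 1, 5]

b0, b1 = least_squares_line(plastic, household)
best = sum_squared_residuals(plastic, household, b0, b1)

# Any other line should give a larger sum of squared residuals.
for db0, db1 in [(0.1, 0.0), (0.0, 0.1), (-0.1, -0.1)]:
    other = sum_squared_residuals(plastic, household, b0 + db0, b1 + db1)
    print(other > best)
```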
23 Residuals and the Least-Squares Property. [Figure: scatterplot with the regression line $\hat{y} = 5 + 4x$ and four data points whose residuals are 7, 11, -13, and -5.]
24 Definitions. Total Deviation (from the mean) of the particular point (x, y): the vertical distance $y - \bar{y}$, which is the distance between the point (x, y) and the horizontal line passing through the sample mean $\bar{y}$. Explained Deviation: the vertical distance $\hat{y} - \bar{y}$, which is the distance between the predicted y value and the horizontal line passing through the sample mean $\bar{y}$. Unexplained Deviation: the vertical distance $y - \hat{y}$, which is the vertical distance between the point (x, y) and the regression line. (The distance $y - \hat{y}$ is also called a residual, as defined in Section 9-3.)
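The three deviations fit together as total = explained + unexplained, since $(y - \bar{y}) = (\hat{y} - \bar{y}) + (y - \hat{y})$. The Python sketch below works this out for one Garbage Project point, (2.83, 6), using the rounded coefficients b0 = 0.549 and b1 = 1.48. The function name `deviations` is hypothetical, and the household sizes used for the mean are an assumed reading of the data table.

```python
def deviations(x, y, b0, b1, y_bar):
    """Return (total, explained, unexplained) deviation for one point (x, y)."""
    y_hat = b0 + b1 * x          # predicted value on the regression line
    total = y - y_bar            # point to the mean line
    explained = y_hat - y_bar    # regression line to the mean line
    unexplained = y - y_hat      # point to the regression line (the residual)
    return total, explained, unexplained


# Assumed household sizes from the Garbage Project table; their mean is y-bar.
household = [2, 3, 3, 6, 4, 2, 1, 5]
y_bar = sum(household) / len(household)

total, explained, unexplained = deviations(2.83, 6, 0.549, 1.48, y_bar)
print(round(total, 4), round(explained, 4), round(unexplained, 4))
```

For this point the total deviation of 2.75 splits into roughly 1.4874 explained by the regression line and 1.2626 left unexplained.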