# Correlation: the strength of the linear relationship between two quantitative variables


Correlation measures the strength of the LINEAR relationship between two quantitative variables. It is labeled r and takes on the values $-1 \le r \le 1$.

- Values of r close to -1 or 1 represent strong linear correlation.
- Values close to 0 represent weak or no linear correlation.
- r is based on the z-scores of both variables. Since it uses the mean and standard deviation, it is NOT resistant to outliers.

$$
r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)
$$

Example: weight of students on a backpacking trip.

| Body weight | Backpack weight |
|---|---|
| 120 | 26 |
| 187 | 30 |
| 109 | 26 |
| 103 | 24 |
| 131 | 29 |
| 165 | 35 |
| 158 | 31 |
| 116 | 28 |

Calculator:

- STAT > 1: Edit
- STAT > CALC > 2: 2-Var Stats
- STAT > 1: Edit
- VARS > 5: Statistics
- 2nd STAT > MATH > 5: sum( 2nd L3 * 2nd L4 )
- ZOOM > 9: ZoomStat
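The z-score formula for r can also be checked off the calculator. The following is a minimal Python sketch, not part of the original slides; the names `mean`, `sample_sd` and `correlation` are hypothetical helpers written for illustration, and the data are the body and backpack weights from the example.

```python
from math import sqrt

# Data from the backpacking example (same order in both lists).
body = [120, 187, 109, 103, 131, 165, 158, 116]
pack = [26, 30, 26, 24, 29, 35, 31, 28]

def mean(values):
    """Arithmetic mean."""
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """r = 1/(n-1) * sum of the products of paired z-scores."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    z_products = (((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return sum(z_products) / (n - 1)

r = correlation(body, pack)
print(round(r, 3))
```

Swapping the two lists gives the same r, since the formula treats x and y symmetrically.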

When prices of coffee are high, farmers clear more land to plant crops.

| Price (per lb) | Deforestation (%) |
|---|---|
| 29 | 0.49 |
| 40 | 1.59 |
| 54 | 1.69 |
| 55 | 1.82 |
| 72 | 3.10 |

This data gives the lean body mass and metabolic rate for some women.

| Mass (kg) | Met. rate |
|---|---|
| 36.1 | 1666 |
| 54.6 | 1425 |
| 48.5 | 1396 |
| 42.0 | 1418 |
| 50.6 | 1502 |
| 42.0 | 1256 |
| 40.3 | 1189 |
| 33.1 | 913 |
| 42.4 | 1124 |
| 34.5 | 1052 |
| 51.1 | 1867 |
| 41.2 | 1204 |

Do you think body mass influences metabolic rate?

- Draw a scatterplot.
- Find the correlation coefficient, r.
- Do they seem correlated?
- Which value explains the other?

## Least Squares Regression

If data shows moderate or strong linear correlation, we might want to use it to make predictions. CHECK r FIRST.

Least-squares linear regression: fitting the straight line that MINIMIZES the sum of the squared differences between each actual data value and the predicted value.

Equation of the line: $\hat{y} = a + bx$, where a is the y-intercept and b is the slope.

The line must pass through the point $(\bar{x}, \bar{y})$ and has slope

$$
b = r \, \frac{s_y}{s_x}
$$

so the intercept is $a = \bar{y} - b\bar{x}$.

We must know which variable is explanatory (x) and which is the response (y).

The value $r^2$ tells us what fraction of the variation in y is accounted for (explained) by the x variable.
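As a rough illustration of how the slope and intercept follow from r, the standard deviations, and the means, here is a small Python sketch. It reuses the backpack data from the correlation example as an assumption (body weight as the explanatory variable, backpack weight as the response); the function name `least_squares_line` is hypothetical.

```python
from math import sqrt

# Assumed roles: body weight explains backpack weight.
body = [120, 187, 109, 103, 131, 165, 158, 116]   # x, explanatory
pack = [26, 30, 26, 24, 29, 35, 31, 28]           # y, response

def least_squares_line(xs, ys):
    """Return (a, b, r) for the line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_x = sqrt(sxx / (n - 1))      # sample SD of x
    s_y = sqrt(syy / (n - 1))      # sample SD of y
    r = sxy / sqrt(sxx * syy)      # correlation coefficient
    b = r * s_y / s_x              # slope
    a = y_bar - b * x_bar          # forces the line through (x-bar, y-bar)
    return a, b, r

a, b, r = least_squares_line(body, pack)
print(f"y-hat = {a:.2f} + {b:.4f} x")
print(f"r^2 = {r ** 2:.3f}")
```

Calling `least_squares_line(pack, body)` instead gives a different line, which is why the choice of explanatory and response variable matters for regression even though it does not matter for r.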

## Residuals

Residuals are the differences between the actual observed value y and the predicted value ŷ for each x value in the data: residual = y − ŷ.

Residuals will always add up to 0, so the mean of the residuals is 0.

Residuals should form a random scatterplot when graphed against x. There should NOT be patterns that are easy to see.

Patterns to watch out for:

- Curved pattern: the original relationship was not linear.
- Increasing spread: the predictions will be less reliable in areas of the graph where the spread is larger.
- Vertical outliers: large residuals could indicate an outlier, which may need to be removed.
- Horizontal outliers: if one point has a very different x value, it can pull the regression line towards itself and overly influence the results.

Calculator:

- Calculate LinReg as normal. This stores the values.
- Y= > VARS > 5: Statistics > EQ > 1: RegEQ will graph the line in Y1.
- STAT > 1: Edit. In column L3, enter L2 − Y1(L1). This calculates the residuals (observed minus predicted).
- STAT PLOT: turn on Plot 2 and graph L1 vs L3.
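The calculator steps can be mirrored in a few lines of Python. This is only a sketch under assumptions: it reuses the backpack data from the correlation example, `fit_line` is a hypothetical helper, and the slope is computed as the sum of cross-products over the sum of squared x deviations, which is algebraically the same as r·s_y/s_x.

```python
# Assumed data: body weight (x) and backpack weight (y) from the earlier example.
body = [120, 187, 109, 103, 131, 165, 158, 116]
pack = [26, 30, 26, 24, 29, 35, 31, 28]

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

a, b = fit_line(body, pack)

# Residual = observed y minus predicted y-hat, one per data point.
residuals = [y - (a + b * x) for x, y in zip(body, pack)]

for x, e in zip(body, residuals):
    print(f"x = {x:>3}   residual = {e:+.2f}")

# Up to floating-point rounding, the residuals add up to 0.
print("sum of residuals:", round(sum(residuals), 9))
```

Plotting `body` against `residuals` gives the residual plot described above; a scatterplot library would be needed for that step and is left out here.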

## Stats Assignment: Residuals

1. Weight of infants each month in a developing village in Egypt:

| Age (months) | Weight (kg) |
|---|---|
| 1 | 4.3 |
| 2 | 5.1 |
| 3 | 5.7 |
| 4 | 6.3 |
| 5 | 6.8 |
| 6 | 7.1 |
| 7 | 7.2 |
| 8 | 7.2 |
| 9 | 7.2 |
| 10 | 7.2 |
| 11 | 7.5 |
| 12 | 7.8 |

- (a) Find the least squares regression equation. _________________
- (b) Find the correlation (r). _______________
- (c) Find $r^2$. How much of the variation is accounted for? ________
- (d) Make a graph of the age vs the residuals. Describe the graph.
- (e) What conclusions can you make about your regression results?
