Presentation is loading. Please wait.

Presentation is loading. Please wait.

MATH 2311 Section 5.2 & 5.3. Correlation Coefficient.

Similar presentations


Presentation on theme: "MATH 2311 Section 5.2 & 5.3. Correlation Coefficient."— Presentation transcript:

1 MATH 2311 Section 5.2 & 5.3

2 Correlation Coefficient

3 Facts about Correlation 1.Positive r indicates positive association and negative r indicates negative association between variables. 2. r is always between –1 and 1. 3. The closer |r| is to 1, the stronger the association. A weak association will have an r value close to 0. 4. Correlation is strongly influenced by outliers.

4 Example of a Correlation Coefficient Calculating in R-Studio: cor(a,b) Using the monopoly example from Section 5.1: assign(“spaces”,c(1,3,5,6,8,9,11,12,13,14,15,16,18,19,21,23,24,25,26,27,28, 29,31,32,34,35,37,39)) assign(“cost”,c(60,60,200,100,100,120,140,150,140,160,200,180,180,200,22 0,220,240,200,260,260,150,280,300,300,320,200,350,400)) Determine the Correlation Coefficient. What does this mean?

5 Popper 12 Create a scatter plot from the data. Based on the plot: 1. Is this a positive, negative or no relationship? a.positiveb. negativec. none 2. Is the relationship linear or not? a. linearb. not linear International Journal of Morphology

6 Popper12Continued 3. Calculate the correlation coefficient. a. 0.7985b. 0.2393c. 0.1264d. 0.0794 4. Based on the correlation coefficient, determine the direction of the relationship? a.positiveb. negativec. neither 5. Based on the correlation coefficient, is this relationship strong (|r| > 0.75), moderate (0.5 < |r| < 0.74) or weak (|r| < 0.5)? a. strongb. moderatec. weak

7 Regression Lines A regression line is a line that describes the relationship between the explanatory variable x and the response variable y. Regression lines can be used to predict a value for y given a value of x.

8 Least Squares Regression Lines (LSRL) The least squares regression line (or LSRL) is a mathematical model used to represent data that has a linear relationship. We want a regression line that makes the vertical distances of the points in a scatter plot from the line as small as possible.

9 Calculating a Least Squares Regression Line

10 Example: Using the Monopoly Problem, Calculate the Regression Line: regline=lm(cost~spaces) regline >

11 Viewing the Scatterplot and the Regression Line Note that I assigned a name to the lm command, this is not required unless you wish to use it again. We will use it again to plot the regression line on top of the scatterplot. The command is abline. > abline(regline)

12 Making Predictions: The LSRL can be used to predict values of y given values of x. Let’s use our model to predict the cost of a property 50 spaces from GO We need to be careful when predicting. When we are estimating y based on values of x that are much larger or much smaller than the rest of the data, this is called extrapolation.

13 Interpreting the Slope Notice that the formula for slope is this means that a change in one standard deviation in x corresponds to a change of r standard deviations in y. This means that on average, for each unit increase in x, then is an increase (or decrease if slope is negative) of |b| units in y. Interpret the meaning of the slope for the Monopoly example

14 Interpreting the Slope Notice that the formula for slope is this means that a change in one standard deviation in x corresponds to a change of r standard deviations in y. This means that on average, for each unit increase in x, then is an increase (or decrease if slope is negative) of |b| units in y. Interpret the meaning of the slope for the Monopoly example: For every increase of 1 space from go, there is an increase of $6.79 of cost.

15 Coefficient of Determination The square of the correlation (r), r 2 is called the coefficient of determination. It is the fraction of the variation in the values of y that is explained by the regression line and the explanatory variable. When asked to interpret r 2 we say, “approximately r 2 *100% of the variation in y is explained by the LSRL of y on x.” This tells how accurate the measurement is based on the regression line.

16 Facts about the coefficient of determination: 1.The coefficient of determination is obtained by squaring the value of the correlation coefficient. 2. The symbol used is r 2 3. Note that 0 ≤ r 2 ≤1 4. r 2 values close to 1 would imply that the model is explaining most of the variation in the dependent variable and may be a very useful model. 5. r 2 values close to 0 would imply that the model is explaining little of the variation in the dependent variable and may not be a useful model.

17 Interpret r 2 for the Monopoly problem

18 Popper 13: The following 9 observations compare the Quetelet index, x (a measure of body build) and dietary energy density, y. 1.Compute the LSRL a. y=-.8985x +.0073b. y=.0073x -.8985 c. y=.0073x +.8985d. y=.8985x -.0073 2. Find the Correlation Coefficient a..6579b..7325c..9231d..0607 3. Find the coefficient of determination a..8111b..7834c..0023d..4328


Download ppt "MATH 2311 Section 5.2 & 5.3. Correlation Coefficient."

Similar presentations


Ads by Google