# Linear regression and correlation

## Presentation on theme: "Linear regression and correlation"— Presentation transcript:

International Baccalaureate Mathematical Studies

Learning outcomes

This work will help you to:

- Draw scatter diagrams.
- Draw the linear regression lines y on x and x on y and work out their equations.
- Identify the types of correlation and calculate the product-moment correlation coefficient.
- Use technology to find all of the above.

Linear Regression
When looking for a linear relationship between two sets of data, we can plot what is known as a scatter diagram.

[Figure: scatter diagram of y against x]

Looking at the graph, we can see that there is some positive correlation.

First, let's consider the y on x regression line.
It is possible to draw a line called a regression line. There are two types: y on x and x on y.

[Figure: scatter diagram of y against x with the y on x regression line]

The y on x line is the line that keeps the sum of the squares of the vertical distances from the data points to the line to a minimum. Note: the equation of this line is called the equation of the least squares regression line.

Now consider the x on y regression line.
The x on y line is the line that keeps the sum of the squares of the horizontal distances from the data points to the line to a minimum.

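Drawing both lines on the same graph, we have
[Figure: scatter diagram of y against x with both regression lines, labelled y on x and x on y]

We should note that both lines pass through the point given by the means of both sets of data, $(\bar{x}, \bar{y})$.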
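The difference between the two lines can be checked numerically. The following is a minimal sketch in plain Python, using a small hypothetical data set (the values are illustrative and are not taken from the presentation). It fits both lines by least squares, then compares the sum of squared vertical distances and the sum of squared horizontal distances for each line. The intercepts are chosen so that each line passes through the mean point, which is a standard property of least squares lines.

```python
# Illustrative data only: hypothetical values, not taken from the presentation.
xs = [1.0, 2.0, 4.0, 5.0, 7.0, 9.0]
ys = [2.0, 3.0, 3.5, 6.0, 6.5, 9.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sums of squared deviations and of cross products about the means.
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# y on x line, y = a*x + b (least squares on vertical distances).
a = sxy / sxx
b = y_bar - a * x_bar

# x on y line, x = c*y + d (least squares on horizontal distances).
c = sxy / syy
d = x_bar - c * y_bar


def vertical_ss(slope, intercept):
    """Sum of squared vertical distances to the line y = slope*x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def horizontal_ss(slope, intercept):
    """Sum of squared horizontal distances to the line x = slope*y + intercept."""
    return sum((x - (slope * y + intercept)) ** 2 for x, y in zip(xs, ys))


# Each line rewritten in the other line's form, so the two can be compared.
x_on_y_as_y = (1 / c, -d / c)   # x = c*y + d  ->  y = x/c - d/c
y_on_x_as_x = (1 / a, -b / a)   # y = a*x + b  ->  x = y/a - b/a

print("vertical SS:  ", vertical_ss(a, b), "<=", vertical_ss(*x_on_y_as_y))
print("horizontal SS:", horizontal_ss(c, d), "<=", horizontal_ss(*y_on_x_as_x))
print("mean point on both lines:",
      abs(a * x_bar + b - y_bar) < 1e-9, abs(c * y_bar + d - x_bar) < 1e-9)
```

The y on x line gives the smaller vertical total and the x on y line gives the smaller horizontal total, which is why the two lines generally differ.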

It is possible to calculate the equations of the y on x and x on y regression lines.
Important formulae

The y on x regression line is of the form $y = ax + b$ and can be calculated by using the formula

$$y - \bar{y} = \frac{s_{xy}}{s_x^2}\,(x - \bar{x}),$$

where $s_{xy}$ is called the covariance and links the x and y data, and $s_x^2$ is the variance of the x data.

The x on y regression line is of the form $x = cy + d$ and can be calculated by using the formula

$$x - \bar{x} = \frac{s_{xy}}{s_y^2}\,(y - \bar{y}),$$

where $s_{xy}$ is called the covariance and links the x and y data, and $s_y^2$ is the variance of the y data.
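The two formulae translate directly into code. The sketch below is one possible implementation, not the calculator's own method. It assumes that the covariance and variances are the population versions (dividing by n), and the function names and the coefficient names a, b, c, d are naming choices made here for illustration. The check data are hypothetical points lying exactly on y = 2x + 1.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Covariance s_xy, dividing by n (an assumption; see the lead-in)."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / len(xs)


def variance(values):
    """Variance s^2, dividing by n (the covariance of the data with itself)."""
    return covariance(values, values)


def y_on_x(xs, ys):
    """Return (a, b) for the y on x line y = a*x + b."""
    a = covariance(xs, ys) / variance(xs)
    return a, mean(ys) - a * mean(xs)


def x_on_y(xs, ys):
    """Return (c, d) for the x on y line x = c*y + d."""
    c = covariance(xs, ys) / variance(ys)
    return c, mean(xs) - c * mean(ys)


# Hypothetical check data lying exactly on y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
print(y_on_x(xs, ys))  # slope 2, intercept 1
print(x_on_y(xs, ys))  # slope 0.5, intercept -0.5
```

Because the check data lie on a single straight line, the two regression lines describe the same line: x = 0.5y - 0.5 is y = 2x + 1 rearranged.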

Example
In the table below are the results of ten students in both their Mathematics and Physics examinations. The teacher thinks there might be a relationship between the two. His hypothesis is “a student who has Mathematical ability also has ability in Physics.”

| Mathematics mark /100 (x) | Physics mark /100 (y) |
|---|---|
| 61 | 56 |
| 34 | 45 |
| 24 | 15 |
| 89 | 92 |
| 47 | 61 |
| 67 | 57 |
| 82 | 75 |
| 6 | 8 |
| 53 | 47 |
| 89 | 76 |

Drawing a scatter graph
[Figure: scatter graph of the Physics marks (y) against the Mathematics marks (x); horizontal axis: Maths/100]

Finding y on x using technology
[Calculator screenshot: linear regression output for y on x, including the product-moment correlation coefficient]

Finding x on y using technology
Remember that you have to interchange x and y when writing down the x on y regression line.

[Calculator screenshot: linear regression output for x on y, including the product-moment correlation coefficient]

Now calculating the regression lines
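With $\bar{x} = 55.2$ and $\bar{y} = 53.2$:

| | x | y | $x - \bar{x}$ | $y - \bar{y}$ | $(x - \bar{x})^2$ | $(y - \bar{y})^2$ | $(x - \bar{x})(y - \bar{y})$ |
|---|---|---|---|---|---|---|---|
| | 61 | 56 | 5.8 | 2.8 | 33.64 | 7.84 | 16.24 |
| | 34 | 45 | -21.2 | -8.2 | 449.44 | 67.24 | 173.84 |
| | 24 | 15 | -31.2 | -38.2 | 973.44 | 1459.24 | 1191.84 |
| | 89 | 92 | 33.8 | 38.8 | 1142.44 | 1505.44 | 1311.44 |
| | 47 | 61 | -8.2 | 7.8 | 67.24 | 60.84 | -63.96 |
| | 67 | 57 | 11.8 | 3.8 | 139.24 | 14.44 | 44.84 |
| | 82 | 75 | 26.8 | 21.8 | 718.24 | 475.24 | 584.24 |
| | 6 | 8 | -49.2 | -45.2 | 2420.64 | 2043.04 | 2223.84 |
| | 53 | 47 | -2.2 | -6.2 | 4.84 | 38.44 | 13.64 |
| | 89 | 76 | 33.8 | 22.8 | 1142.44 | 519.84 | 770.64 |
| Total | 552 | 532 | | | 7091.6 | 6191.6 | 6266.6 |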
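The working in the table can be reproduced in a few lines. This is a sketch in plain Python; the two mark lists are the values implied by the deviation columns and the totals 552 and 532 in the table above, and the variable names are illustrative.

```python
# Marks as implied by the deviation columns and totals in the table above.
maths = [61, 34, 24, 89, 47, 67, 82, 6, 53, 89]      # x
physics = [56, 45, 15, 92, 61, 57, 75, 8, 47, 76]    # y

n = len(maths)
x_bar = sum(maths) / n
y_bar = sum(physics) / n

# One row per student: x, y, both deviations, their squares and their product.
rows = []
for x, y in zip(maths, physics):
    dx, dy = x - x_bar, y - y_bar
    rows.append((x, y, dx, dy, dx * dx, dy * dy, dx * dy))

# Column totals of the three right-hand columns.
sum_dx2 = sum(r[4] for r in rows)
sum_dy2 = sum(r[5] for r in rows)
sum_dxdy = sum(r[6] for r in rows)

for r in rows:
    print("{:>3} {:>3} {:>7.1f} {:>7.1f} {:>9.2f} {:>9.2f} {:>9.2f}".format(*r))
print("totals:", sum(maths), sum(physics),
      round(sum_dx2, 2), round(sum_dy2, 2), round(sum_dxdy, 2))
```

Dividing the three right-hand totals by n = 10 gives the variance of x, the variance of y and the covariance, assuming the divide-by-n definitions.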

Using alternate formulae and the TI-nspire Calculator
Variance of x: $s_x^2 = \dfrac{\sum x^2}{n} - \bar{x}^2$

Variance of y: $s_y^2 = \dfrac{\sum y^2}{n} - \bar{y}^2$

Covariance: $s_{xy} = \dfrac{\sum xy}{n} - \bar{x}\,\bar{y}$

Having done the two-variable statistics calculation, the actual value of the variance (which is the standard deviation squared) can be found using the “Var” menu on the calculator.
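As a cross-check on the calculator values, the variances and the covariance can also be computed by hand. The sketch below assumes that the "alternate" formulae are the usual shortcut forms (mean of the squares minus the square of the mean, and mean of the products minus the product of the means) and that all three quantities divide by n. It compares them with the deviation-based definitions for the ten pairs of marks in the example.

```python
maths = [61, 34, 24, 89, 47, 67, 82, 6, 53, 89]      # x
physics = [56, 45, 15, 92, 61, 57, 75, 8, 47, 76]    # y

n = len(maths)
x_bar = sum(maths) / n
y_bar = sum(physics) / n

# Shortcut forms (assumed to be the "alternate formulae").
var_x = sum(x * x for x in maths) / n - x_bar ** 2
var_y = sum(y * y for y in physics) / n - y_bar ** 2
cov_xy = sum(x * y for x, y in zip(maths, physics)) / n - x_bar * y_bar

# Definition forms, based on deviations from the means, for comparison.
var_x_def = sum((x - x_bar) ** 2 for x in maths) / n
var_y_def = sum((y - y_bar) ** 2 for y in physics) / n
cov_xy_def = sum((x - x_bar) * (y - y_bar)
                 for x, y in zip(maths, physics)) / n

print(round(var_x, 2), round(var_y, 2), round(cov_xy, 2))
print(round(var_x_def, 2), round(var_y_def, 2), round(cov_xy_def, 2))
```

Both routes give about 709.16 for the variance of x, 619.16 for the variance of y and 626.66 for the covariance.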

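For the regression line y on x, which has the form

$$y - \bar{y} = \frac{s_{xy}}{s_x^2}\,(x - \bar{x}),$$

substitute the means together with the covariance and the variance of x.

For the regression line x on y, which has the form

$$x - \bar{x} = \frac{s_{xy}}{s_y^2}\,(y - \bar{y}),$$

substitute the means together with the covariance and the variance of y.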

Plotting both lines on the scatter diagram
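[Figure: scatter diagram of the marks with both regression lines, labelled y on x and x on y]

Note: for the x on y line, remember to rearrange it to make y the subject before trying to plot it. A line $x = cy + d$ becomes $y = \dfrac{x - d}{c}$.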
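The two regression lines for the marks, and the rearrangement needed to plot the x on y line on the same axes, can be sketched as follows. The coefficient names a, b, c, d are naming choices made here, the divide-by-n variance and covariance are assumed, and the mark lists are the ten pairs from the example.

```python
maths = [61, 34, 24, 89, 47, 67, 82, 6, 53, 89]      # x
physics = [56, 45, 15, 92, 61, 57, 75, 8, 47, 76]    # y

n = len(maths)
x_bar = sum(maths) / n
y_bar = sum(physics) / n

var_x = sum((x - x_bar) ** 2 for x in maths) / n
var_y = sum((y - y_bar) ** 2 for y in physics) / n
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(maths, physics)) / n

# y on x: y - y_bar = (cov_xy / var_x) * (x - x_bar), written as y = a*x + b.
a = cov_xy / var_x
b = y_bar - a * x_bar

# x on y: x - x_bar = (cov_xy / var_y) * (y - y_bar), written as x = c*y + d.
c = cov_xy / var_y
d = x_bar - c * y_bar

# Rearranged so that y is the subject, ready to plot against x.
plot_slope = 1 / c
plot_intercept = -d / c

print("y on x:  y = {:.3f}x + {:.2f}".format(a, b))
print("x on y:  x = {:.3f}y + {:.2f}".format(c, d))
print("x on y, rearranged:  y = {:.3f}x + {:.2f}".format(plot_slope, plot_intercept))
```

With these marks the sketch gives approximately y = 0.884x + 4.42 for y on x and x = 1.012y + 1.36 for x on y.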

Correlation
We need a way to determine whether there is linear correlation or not, so we calculate what is known as the product-moment correlation coefficient (r):

$$r = \frac{s_{xy}}{s_x\, s_y},$$

where $s_{xy}$ is the covariance, $s_x$ is the standard deviation of x and $s_y$ is the standard deviation of y.

From the following five sets of data, we can see that the quantity r tells us something about the degree of scatter of the two sets of data when we are looking for a linear relationship.

Table 1

- x: 5, 10, 15, 20, 25, 30, 35
- y: 38, 28, 26, 19, 17, 8, 1

[Figure: scatter diagram with the y on x and x on y regression lines]

In Table 1, we notice that the two regression lines (y on x and x on y) nearly coincide and that as the x-data increases the y-data decreases. The value of r is approximately -0.987, which is close to -1. Here we have what is called strong negative linear correlation.
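The claim that the two lines nearly coincide for Table 1 can be checked by comparing their gradients on the same axes. This is a sketch in plain Python using the Table 1 data; the variable names are illustrative. The x on y gradient is taken after rearranging that line to make y the subject.

```python
import math

x = [5, 10, 15, 20, 25, 30, 35]
y = [38, 28, 26, 19, 17, 8, 1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Product-moment correlation coefficient (the 1/n factors cancel).
r = sxy / math.sqrt(sxx * syy)

# Gradients of both lines when drawn with y on the vertical axis.
slope_y_on_x = sxy / sxx     # y - y_bar = (sxy / sxx) * (x - x_bar)
slope_x_on_y = syy / sxy     # x on y rearranged: y - y_bar = (syy / sxy) * (x - x_bar)

print(round(r, 3), round(slope_y_on_x, 3), round(slope_x_on_y, 3))
```

The two gradients, roughly -1.14 and -1.17, differ only slightly, which matches a value of r close to -1.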

Table 2

- x: 5 10 15 20 25 30 35
- y: 23 32 2

[Figure: scatter diagram with the y on x and x on y regression lines]

In Table 2, the two regression lines are further apart, although there is still weak negative linear correlation. The value of r is getting closer to 0.

Table 3

- x: 5 10 15 20 25 30 35
- y: 31 19 23 32 6

[Figure: scatter diagram with the y on x and x on y regression lines]

In Table 3, the two regression lines are virtually perpendicular and there is no linear correlation. The value of r is very close to 0.

Table 4

- x: 5, 10, 15, 20, 25, 30, 35
- y: 12, 17, 23, 9, 38, 18, 40

[Figure: scatter diagram with the y on x and x on y regression lines]

In Table 4, the two regression lines are further apart, but we notice that as the x-data increases the y-data increases. We say there is weak positive linear correlation. The value of r is approximately 0.640; it is moving away from 0 and getting closer to 1.

Table 5

- x: 5 10 15 20 25 30 35
- y: 2 4 12 16 18 26 27 32

[Figure: scatter diagram with the y on x and x on y regression lines]

In Table 5, we notice that the two regression lines (y on x and x on y) nearly coincide and that as the x-data increases the y-data increases. The value of r is 0.990, which is very close to 1. Here we have what is called strong positive linear correlation.

r is called the product-moment correlation coefficient.
The value of r determines the degree of linear scatter of the two sets of data, and $-1 \le r \le 1$:

- $r = -1$ indicates that the data have perfect negative linear correlation,
- $r = 0$ indicates that the data have no linear correlation,
- $r = 1$ indicates that the data have perfect positive linear correlation.
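The three boundary cases can be illustrated with small hypothetical data sets (the values below are chosen for illustration and do not come from the presentation): points on a rising line, points on a falling line, and points on a symmetric curve whose covariance is zero. The function name `pmcc` is a naming choice made here.

```python
import math


def pmcc(xs, ys):
    """Product-moment correlation coefficient r = s_xy / (s_x * s_y)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    # The factors of 1/n in the covariance and variances cancel in the ratio.
    return sxy / math.sqrt(sxx * syy)


xs = [-2, -1, 0, 1, 2]
r_rising = pmcc(xs, [2 * x + 1 for x in xs])     # points on a rising line
r_falling = pmcc(xs, [10 - 3 * x for x in xs])   # points on a falling line
r_curve = pmcc(xs, [x * x for x in xs])          # points on a symmetric curve
print(r_rising, r_falling, r_curve)
```

The last case is a reminder that r measures linear correlation only: the points lie exactly on a curve, yet r is 0.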

Returning to our example
[Figure: scatter diagram of the marks with the y on x and x on y regression lines; horizontal axis: Maths/100]

As r is close to 1, we can conclude that the results support the teacher's hypothesis: “a student who has Mathematical ability also has ability in Physics” might be true.
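The value of r for the marks can be checked with a short calculation. This sketch assumes the ten pairs of marks from the example and the divide-by-n definitions of covariance and standard deviation; the variable names are illustrative.

```python
import math

maths = [61, 34, 24, 89, 47, 67, 82, 6, 53, 89]      # x
physics = [56, 45, 15, 92, 61, 57, 75, 8, 47, 76]    # y

n = len(maths)
x_bar = sum(maths) / n
y_bar = sum(physics) / n

cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(maths, physics)) / n
sd_x = math.sqrt(sum((x - x_bar) ** 2 for x in maths) / n)
sd_y = math.sqrt(sum((y - y_bar) ** 2 for y in physics) / n)

# Product-moment correlation coefficient: covariance over product of SDs.
r = cov_xy / (sd_x * sd_y)
print(round(r, 3))
```

This gives r of about 0.946, a strong positive linear correlation. A value of r this close to 1 is consistent with the hypothesis but does not by itself prove it.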