Simple ideas of correlation Correlation refers to a connection between two sets of data. We will also be able to quantify the strength of that relationship. A powerful way to see if there is a connection between two data sets is to plot a graph of one against the other. The graph we get is called a scattergraph. By observing the shape of the graph, and by analysing the line (or curve) of best fit that goes through the points, we can determine how strong the correlation is.
Scattergraphs (with lines of best fit drawn) This graph shows very strong (almost perfect) positive correlation. Very strong: points are very close to the line Positive: as x increases, so does y This graph shows weak positive correlation. Weak: points are far from the line Positive: as x increases, so does y
Scattergraphs (continued) This graph shows weak negative correlation. Negative: as x increases, y decreases This graph shows no correlation. And this graph shows strong negative correlation.
A simple correlation exercise Here are some exam results of 10 students for Maths and Science:
Maths (%)   Science (%)
26          43
58          49
66          71
89          81
35          29
37          40
44          71
62          60
91          93
19          22
Scattergraph The information from the exam results is plotted as a (scatter)graph: This has been done using Geogebra, but it could also be done in Excel. The graph shows fairly strong positive correlation
Scattergraph The line of best fit is drawn: By eye, the line of best fit is not easy to draw – you have to balance the distances of the points above the line with those below it. Geogebra (or Excel) draws the line very easily. It even gives you the equation of the line.
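For readers who want to see where the equation of the line comes from, here is a minimal Python sketch. It assumes that the line drawn by the software is the ordinary least-squares line of Science on Maths, which the slides do not state explicitly. The function name best_fit_line and the variable names are illustrative only.

```python
# Exam marks from the exercise (Maths = x, Science = y).
maths = [26, 58, 66, 89, 35, 37, 44, 62, 91, 19]
science = [43, 49, 71, 81, 29, 40, 71, 60, 93, 22]


def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept.

    Assumption: "line of best fit" means the least-squares regression of y on x.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = best_fit_line(maths, science)
print(f"Science = {slope:.3f} * Maths + {intercept:.3f}")
```

Under that assumption the slope comes out at roughly 0.83 and the intercept at roughly 12.15, so each extra Maths mark goes with a little under one extra Science mark on average.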
An important idea Let’s go back to the original data:
Maths (%)   Science (%)
26          43
58          49
66          71
89          81
35          29
37          40
44          71
62          60
91          93
19          22
Mean = 52.7 Mean = 55.9
Here we have calculated the mean (average) of the 10 Maths marks and the 10 Science marks.
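The two means can be checked with a few lines of Python. This is only a sketch of the arithmetic (total of the marks divided by the number of students); the variable names are illustrative.

```python
# Exam marks from the exercise.
maths = [26, 58, 66, 89, 35, 37, 44, 62, 91, 19]
science = [43, 49, 71, 81, 29, 40, 71, 60, 93, 22]

# Mean = total of the marks divided by the number of students.
mean_maths = sum(maths) / len(maths)        # 527 / 10
mean_science = sum(science) / len(science)  # 559 / 10

print(round(mean_maths, 1), round(mean_science, 1))
```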
The Scattergraph again … The “average point” M(52.7, 55.9) is now added to the graph: Notice how the “average point” is right on the line. This is an important feature of the line of best fit: it always passes through the average point.
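This property can be checked numerically. The sketch below assumes the line of best fit is the least-squares line, computed here from the raw sums of the data, and then substitutes the mean Maths mark into its equation. All names are illustrative.

```python
# Exam marks from the exercise (Maths = x, Science = y).
maths = [26, 58, 66, 89, 35, 37, 44, 62, 91, 19]
science = [43, 49, 71, 81, 29, 40, 71, 60, 93, 22]

n = len(maths)
sum_x, sum_y = sum(maths), sum(science)
sum_xy = sum(x * y for x, y in zip(maths, science))
sum_xx = sum(x * x for x in maths)

# Least-squares slope and intercept from the raw sums (assumed definition of "best fit").
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n

# The "average point" M.
mean_x, mean_y = sum_x / n, sum_y / n

# Science mark predicted by the line at the mean Maths mark.
predicted_at_mean = slope * mean_x + intercept
print(f"mean Science = {mean_y:.1f}, line at mean Maths = {predicted_at_mean:.1f}")
```

The two printed values agree because the intercept of the least-squares line equals the mean of y minus the slope times the mean of x, which forces the line through M.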
How good is the correlation? In addition to plotting the line of best fit, Excel (and other programs, or devices like your GDC) will also calculate the value of r, the correlation coefficient. r will always be in the range -1 ≤ r ≤ 1:
r = -1 implies perfect negative correlation
r = 0 implies no (linear) correlation
r = 1 implies perfect positive correlation
In our example, r = 0.8917 (indicating fairly strong positive correlation). This gives r² = 0.79514. The interpretation of r² is important. In our example, we can say that about 79.5% of the variation in the Science marks can be accounted for by the linear relationship with the Maths marks.
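The values of r and r² quoted for the example can be reproduced by hand. The sketch below assumes r is the usual Pearson product-moment correlation coefficient, which is what Excel and most calculators report. The function name pearson_r is illustrative.

```python
import math

# Exam marks from the exercise (Maths = x, Science = y).
maths = [26, 58, 66, 89, 35, 37, 44, 62, 91, 19]
science = [43, 49, 71, 81, 29, 40, 71, 60, 93, 22]


def pearson_r(xs, ys):
    """Pearson correlation coefficient: S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(maths, science)
print(f"r = {r:.4f}, r^2 = {r * r:.5f}")
```

Swapping the two lists gives the same r, because the formula treats x and y symmetrically.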