# Chapter 10 Relationships between variables

Definition: A Scatter Plot is a picture of bivariate numerical data in which each observation (i.e. each pair of values (x, y)) is represented by a point on a rectangular coordinate system. The horizontal axis is identified with values of x and the vertical axis with values of y.

Example: Draw a Scatter Plot to represent the following dataset:

x: 1, 3, 2, 4, 7, 6, 5
y: 4, 2, 5, 6, 9, 8, 7

Another Example: Draw a Scatter Plot to represent the following dataset:

x: 1, 3, 2, 4, 7, 6, 5
y: 4, 6, 1, 3, 2, 4, 1
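If no plotting tool is to hand, the two examples can be sketched as rough text plots. The snippet below is only an illustration, not part of the original slides: the helper name `text_scatter` is made up here, and it assumes every x and y value is a positive whole number, which happens to be true for both datasets.

```python
def text_scatter(xs, ys):
    """Return a list of text rows; '*' marks an observation (x, y).

    Assumes all values are positive integers (true for the slide examples).
    """
    points = set(zip(xs, ys))
    rows = []
    for y in range(max(ys), 0, -1):  # top row holds the largest y value
        cells = ["*" if (x, y) in points else "." for x in range(1, max(xs) + 1)]
        rows.append(f"{y:2d} | " + " ".join(cells))
    return rows


x = [1, 3, 2, 4, 7, 6, 5]
y_first = [4, 2, 5, 6, 9, 8, 7]   # first example
y_second = [4, 6, 1, 3, 2, 4, 1]  # second example

plot_first = text_scatter(x, y_first)
plot_second = text_scatter(x, y_second)

print("\n".join(plot_first))
print()
print("\n".join(plot_second))
```

In the first plot the stars drift upward from left to right, while in the second they are scattered with no obvious pattern.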

Question: Any comments on these two datasets? Is there anything special about them? Looking at a scatter plot can sometimes allow us to determine whether a relationship exists between two variables. But in general we need to go beyond pictures and develop a numerical measure of how strongly the two variables x and y are related.

Definition: Pearson’s Sample Correlation Coefficient, r, is a measure of the strength of the linear relationship between two variables x and y.
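The slides do not spell out how r is calculated. A minimal sketch, assuming the usual textbook formula (the sum of products of deviations from the means, divided by the square root of the product of the two sums of squared deviations), applied to the two scatter plot datasets above. The function name `pearson_r` is chosen here for illustration only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's sample correlation coefficient for paired lists xs, ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


x = [1, 3, 2, 4, 7, 6, 5]
y_first = [4, 2, 5, 6, 9, 8, 7]
y_second = [4, 6, 1, 3, 2, 4, 1]

r_first = pearson_r(x, y_first)    # about 0.83
r_second = pearson_r(x, y_second)  # about -0.21

print(round(r_first, 2), round(r_second, 2))
```

The first dataset shows a fairly strong positive linear relationship, while the second shows only a weak negative one, which matches what the scatter plots suggest.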

Properties of r: The correct interpretation of r requires an appreciation of some general properties:

- The value of r does not depend on the unit of measurement for either variable, nor does it depend on which variable is labelled x and which y.
- The value of r is between -1 and 1.
- A positive value of r indicates a positive linear relationship between the variables: as x increases, y tends to increase.
- A negative value of r indicates a negative linear relationship: as x increases, y tends to decrease.

The value r = 1, which indicates the strongest possible positive relationship between x and y, results only when all points in the scatter plot lie exactly on a straight line that slopes upward. The value r = -1, which indicates the strongest possible negative relationship between x and y, results only when all points in the scatter plot lie exactly on a straight line that slopes downward.

The value of r is a measure of the extent to which x and y are linearly related, i.e. the extent to which the points in the scatter plot lie close to a straight line. A value of r close to zero does not rule out a strong relationship between x and y; there could still be a strong relationship, but one that is not linear.
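These properties can be checked numerically. The sketch below again assumes the usual textbook formula for r; the helper name and the small made-up datasets (points on exact lines, and points on the curve y = x squared) are illustrative assumptions, not data from the slides.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's sample correlation coefficient (standard textbook formula)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


x = [1, 3, 2, 4, 7, 6, 5]
y = [4, 2, 5, 6, 9, 8, 7]
r = pearson_r(x, y)

# Changing units (e.g. metres to centimetres, Celsius to Fahrenheit) leaves r unchanged.
r_new_units = pearson_r([100 * v for v in x], [1.8 * v + 32 for v in y])

# Swapping the labels x and y leaves r unchanged.
r_swapped = pearson_r(y, x)

# Points exactly on an upward or downward sloping line give r = 1 or r = -1.
line_x = [1, 2, 3, 4, 5]
r_up = pearson_r(line_x, [2 * v + 1 for v in line_x])
r_down = pearson_r(line_x, [10 - 3 * v for v in line_x])

# A perfect but non-linear relationship (y = x squared) can give r = 0.
curve_x = [-3, -2, -1, 0, 1, 2, 3]
r_curve = pearson_r(curve_x, [v ** 2 for v in curve_x])

print(r, r_new_units, r_swapped, r_up, r_down, r_curve)
```

The last case is the important warning: y is completely determined by x, yet r is zero because the relationship is not a straight line.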

Examples: For each of the following pairs of variables, indicate whether you would expect a positive correlation, a negative correlation or no correlation.

- Minimum daily temperature and heating costs
- Interest rate and number of loan applications
- Incomes of husbands and wives when both have full-time jobs
- Ages of boyfriends and girlfriends
- Height and IQ
- Height and shoe size
- Your Maths score in the Leaving Cert and your Irish score in the Leaving Cert

Correlation and causation
Years of research have established several facts: There is a strong correlation between the numbers of storks in a country and the number of births in that country. Countries with many storks have a high number of births and countries with low stork counts have low numbers of births. There is a high correlation among primary school children between vocabulary and numbers of tooth fillings. Children with many fillings have a larger vocabulary than children with only a small number or with no fillings.

What should we conclude from these facts? That storks really are responsible for bringing babies? That eating Mars bars will increase your vocabulary? No. These examples illustrate a very important point: correlation is not the same as causation.

Larger countries have larger stork populations and usually have larger human populations as well, so more babies are born there than in smaller countries. Young children have very few fillings because they have only been around for a few years, whereas older children have had time to eat lots of sweets, get a lot of bad teeth and learn a lot of new words. So be careful before you interpret a correlation as causation. It may be that a third, confounding variable is causing the correlation: the size of the country in the first example, the age of the child in the second.

Least Squares: Introduction. We have just seen that, when two variables are correlated, one should not automatically conclude that one variable is causing the other to behave in a certain way. Sometimes, however, this is the case, e.g. interest rate and number of loan applications. In this section we deal with datasets that are correlated and in which one variable, x, is classed as the independent variable and the other, y, is called the dependent variable because the value of y depends on x.

Least Squares: We saw that the correlation coefficient measures how close the data are to a linear relationship. A line is described by the equation y = a + bx, where b is the slope of the line and a is the intercept, i.e. where the line cuts the y axis. The intercept a is the value that y takes when x is zero. The slope b is how much y changes when x increases by one unit.

Suppose we have a dataset which is strongly correlated and so exhibits a linear relationship. How would we draw a line through the data so that it fits the points as well as possible? We use the principle of least squares: we draw the line so that the sum of the squares of the vertical deviations of the points from the line is minimised.
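The slides do not give formulas for the least squares line. A minimal sketch, assuming the standard closed-form solution (slope b equals the sum of products of deviations divided by the sum of squared x deviations, and intercept a equals mean of y minus b times mean of x), applied to the first scatter plot dataset. The function names are chosen here for illustration.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


def sum_squared_deviations(xs, ys, a, b):
    """Sum of squared vertical deviations of the points from y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


x = [1, 3, 2, 4, 7, 6, 5]
y = [4, 2, 5, 6, 9, 8, 7]

a, b = least_squares_line(x, y)
best = sum_squared_deviations(x, y, a, b)

# Any other line, e.g. one with a slightly different slope or intercept,
# gives a larger sum of squared deviations.
nudged_slope = sum_squared_deviations(x, y, a, b + 0.1)
nudged_intercept = sum_squared_deviations(x, y, a + 0.5, b)

print(round(a, 3), round(b, 3), round(best, 3))
```

For this dataset the line works out to roughly y = 2.14 + 0.93x, and nudging either coefficient makes the fit worse, which is what "least squares" means.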

Regression: Suppose we have a dataset and we have calculated the equation of the Least Squares Line, Y = a + bX. Then we can use this line to predict a value for Y if we know a value for X. Note that we should only predict for values of X which are bigger than the smallest X value in the dataset and smaller than the largest X value in the dataset, because the data tell us nothing about whether the line still holds outside that range.

Example of Regression: A study performed in the UK examined the relationship between husbands’ and wives’ ages. The data were analysed and a Least Squares Line computed: Y = 3.6 + (0.97) X, where Y is the husband’s age and X is the wife’s age. Predict the age of the husband of a 20 year old woman. Predict the age of the husband of a 25 year old woman.

Regression Answers: For a 20 year old woman, Y = 3.6 + (0.97)(20) = 23.0, so the husband is probably about 23 years old. For a 25 year old woman, Y = 3.6 + (0.97)(25) = 27.85, so the husband is probably about 27.9 years old.
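The same predictions can be reproduced in a few lines, together with the rule about only predicting inside the range of the data. This is a sketch under assumptions: the line Y = 3.6 + 0.97X comes from the example, but the slides do not say what range of wives' ages the study covered, so the limits 18 and 80 used below are purely hypothetical, as is the function name.

```python
def predict_husband_age(wife_age, x_min, x_max, a=3.6, b=0.97):
    """Predict Y = a + b*X, refusing to predict outside the observed X range.

    x_min and x_max stand for the smallest and largest wife's age in the
    dataset; the study's real values are not given in the slides.
    """
    if not x_min <= wife_age <= x_max:
        raise ValueError("wife_age is outside the range of the data")
    return a + b * wife_age


# Hypothetical range of wives' ages, for illustration only.
X_MIN, X_MAX = 18, 80

age_for_20 = predict_husband_age(20, X_MIN, X_MAX)
age_for_25 = predict_husband_age(25, X_MIN, X_MAX)

print(round(age_for_20, 1), round(age_for_25, 2))

# Predicting for a 5 year old "wife" would be extrapolation, so it is rejected.
try:
    predict_husband_age(5, X_MIN, X_MAX)
    extrapolation_rejected = False
except ValueError:
    extrapolation_rejected = True
```

Without the range check, the line would happily predict a husband aged about 8.5 for a 5 year old, which shows why extrapolating beyond the data is unsafe.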

Congratulations! It’s over! You have survived the dreaded course on STATISTICS. Hopefully none of you have died of Boredom.