Chapter 7 (found in Unit 5): Correlation & Causality. Section 1: Seeking Correlation (Page 286)


1 Chapter 7 (found in Unit 5): Correlation & Causality. Section 1: Seeking Correlation (Page 286). Can't type? Press F11. Can't hear? Check your speakers and volume, or re-enter the seminar. Put ? in front of questions so they are easier to see.

2 Definition: A correlation exists between two variables when there is a relationship between them; the strength of the correlation describes how strong that relationship is. A scatter diagram (or scatterplot) is a graph in which each point represents the values of two variables.

3 Types of Correlation (Page 289). Figure 7.3: Types of correlation seen on scatter diagrams.

4 Correlation: The correlation coefficient, r, is a unitless measure of the strength of the linear relationship between two variables.
– If r is positive, the two variables tend to increase together.
– If r is negative, one variable tends to decrease as the other increases.
– The value of r always lies between -1 and 1, inclusive. Values near -1 or 1 indicate a strong linear relationship; values near 0 indicate a weak one or none.

5 Linear Correlation Coefficient (Page 294): The formula for the correlation coefficient (r) was shown as an image on this slide and is not reproduced in the transcript. The sums it needs can be organized in a table, just as we did for the standard deviation. We can also calculate r in StatCrunch: Stats > Summary Stats > Correlation.
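
As a rough illustration of the table-of-sums approach, here is a minimal Python sketch. It assumes the common computational form r = [nΣxy - (Σx)(Σy)] / [√(nΣx² - (Σx)²) · √(nΣy² - (Σy)²)], which may differ in notation from the formula on the slide. The function name and the hours/scores data are made up for illustration.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r from the column sums of a table."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    # These five sums are the column totals you would build by hand.
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when one variable never changes")
    return numerator / denominator

# Hypothetical data: hours studied vs. quiz score
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
r = correlation(hours, scores)
print(f"r = {r:.3f}")  # r = 0.775
```

A program such as StatCrunch performs the same arithmetic; the sketch only makes the intermediate sums visible.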

6 Section 7.2: Interpreting Correlations (Page 299)

7 Cautions: Outliers can lead to misleading interpretations. If you remove outliers from the calculations, you must state that you did so and explain why. Inappropriate groupings of data can also mislead: they can hide a real correlation or show only what someone wants you to see.
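
To see how much a single outlier can matter, consider the following Python sketch. The data are invented for illustration, and the helper function is just one way to compute r.

```python
import math
import statistics

def correlation(xs, ys):
    """Correlation coefficient r computed from deviations about the means."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data with essentially no trend
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 1.9, 2.0, 2.2, 1.8, 2.0]
r_without = correlation(xs, ys)

# The same data plus one extreme point far from the rest
r_with = correlation(xs + [20], ys + [20])

print(f"without outlier: r = {r_without:.2f}")
print(f"with outlier:    r = {r_with:.2f}")
```

In this made-up case the single added point turns a weak correlation into one that looks very strong, which is why any removal (or inclusion) of outliers should be reported and justified.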

8 Correlation Does Not Imply Causality (Page 303). Possible explanations for a correlation:
1. The correlation may be a coincidence.
2. Both correlated variables might be directly influenced by some common underlying cause.
3. One of the correlated variables may actually be a cause of the other. But note that, even in this case, it may be just one of several causes.

9 Section 7.3: Best-Fit Lines & Prediction (Page 307)

10 The line of best fit (also called the regression line or the least squares line) is the line that fits the data best, i.e., it lies closer to the data points than any other line, in the sense that it minimizes the sum of the squared vertical distances from the points to the line. It can be written as y = mx + b, where:
– Slope: m = r(s_y / s_x), where s_y is the standard deviation of the y values and s_x is the standard deviation of the x values.
– Y-intercept: b = ȳ - (m · x̄), where ȳ is the mean of the y values and x̄ is the mean of the x values.
(Again, StatCrunch or another program is handy.) (Page 313)
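
The slope and intercept formulas on this slide can be sketched in Python as follows. This is only an illustration: the function names and the hours/scores data are assumptions, not part of the seminar.

```python
import math
import statistics

def correlation(xs, ys):
    """Correlation coefficient r computed from deviations about the means."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def best_fit_line(xs, ys):
    """Return (m, b) for y = mx + b using m = r(s_y / s_x) and b = mean(y) - m * mean(x)."""
    r = correlation(xs, ys)
    slope = r * statistics.stdev(ys) / statistics.stdev(xs)
    intercept = statistics.mean(ys) - slope * statistics.mean(xs)
    return slope, intercept

# Hypothetical data: hours studied vs. quiz score
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
slope, intercept = best_fit_line(hours, scores)
print(f"y = {slope:.2f}x + {intercept:.2f}")  # y = 0.60x + 2.20
```

Both standard deviations here are sample standard deviations; the ratio s_y / s_x comes out the same if population standard deviations are used for both.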

11 Cautions in Making Predictions from Best-Fit Lines
1. Don’t expect a best-fit line to give a good prediction unless the correlation is strong and there are many data points. If the sample points lie very close to the best-fit line, the correlation is very strong and the prediction is more likely to be accurate. If the sample points lie away from the best-fit line by substantial amounts, the correlation is weak and predictions tend to be much less accurate.
2. Don’t use a best-fit line to make predictions beyond the bounds of the data points to which the line was fit.
3. A best-fit line based on past data is not necessarily valid now and might not result in valid predictions of the future.
4. Don’t make predictions about a population that is different from the population from which the sample data were drawn.
5. Remember that a best-fit line is meaningless when there is no significant correlation or when the relationship is nonlinear.
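
Caution 2 can be turned into a simple guard in code. The sketch below is a minimal illustration in Python; the function name, the line's coefficients, and the data range of 0 to 3 hours are all hypothetical, chosen only so that the line gives 15,000 calories at 18 hours of exercise.

```python
def predict(x, slope, intercept, x_min, x_max):
    """Predict y from a best-fit line, refusing to go beyond the fitted data range."""
    if not x_min <= x <= x_max:
        raise ValueError(
            f"x = {x} lies outside the data range [{x_min}, {x_max}]; "
            "the prediction should not be trusted"
        )
    return slope * x + intercept

# Hypothetical line (calories per day vs. hours of exercise per day),
# assumed to be fitted to people who exercise between 0 and 3 hours per day.
slope, intercept = 750.0, 1500.0

print(predict(2, slope, intercept, 0, 3))  # 3000.0

try:
    predict(18, slope, intercept, 0, 3)
except ValueError as err:
    print(err)
```

The guard covers only caution 2; the other cautions (weak correlation, outdated data, a different population, nonlinearity) still call for judgment.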

12 EXAMPLE 1 Valid Predictions? State whether the prediction (or implied prediction) should be trusted in each of the following cases, and explain why or why not. Case: You’ve found a best-fit line for a correlation between the number of hours per day that people exercise and the number of calories they consume each day. You’ve used this correlation to predict that a person who exercises 18 hours per day would consume 15,000 calories per day. Solution: No one exercises 18 hours per day on an ongoing basis, so this much exercise must be beyond the bounds of any data collected. Therefore, a prediction about someone who exercises 18 hours per day should not be trusted.

13 EXAMPLE 1 Valid Predictions? (continued) Case: Historical data have shown a strong negative correlation between national birth rates and affluence. That is, countries with greater affluence tend to have lower birth rates. These data predict a high birth rate in Russia. Solution: We cannot automatically assume that the historical data still apply today. In fact, Russia currently has a very low birth rate, despite also having a low level of affluence.

14 EXAMPLE 1 Valid Predictions? (continued) Case: A study in China has discovered correlations that are useful in designing museum exhibits that Chinese children enjoy. A curator suggests using this information to design a new museum exhibit for Atlanta-area school children. Solution: The suggestion to use information from the Chinese study for an Atlanta exhibit assumes that predictions made from correlations in China also apply to Atlanta. However, given the cultural differences between China and Atlanta, the curator’s suggestion should not be considered without more information to back it up.

15 EXAMPLE 1 Valid Predictions? (continued) Case: Scientific studies have shown a very strong correlation between children’s ingesting of lead and mental retardation. Based on this correlation, paints containing lead were banned. Solution: Given the strength of the correlation and the severity of the consequences, this prediction and the ban that followed seem quite reasonable. In fact, later studies established lead as an actual cause of mental retardation, making the rationale behind the ban even stronger.

16 EXAMPLE 1 Valid Predictions? (continued) Case: Based on a large data set, you’ve made a scatter diagram for salsa consumption (per person) versus years of education. The diagram shows no significant correlation, but you’ve drawn a best-fit line anyway. The line predicts that someone who consumes a pint of salsa per week has at least 13 years of education. Solution: Because there is no significant correlation, the best-fit line and any predictions made from it are meaningless.

17 The square of the correlation coefficient, r², is the proportion of the variation in a variable that is accounted for by the best-fit line. Multiple regression allows the calculation of a best-fit equation that represents the best fit between one variable (such as price) and a combination of two or more other variables (such as weight and color). The coefficient of determination, R², tells us the proportion of the scatter in the data accounted for by the best-fit equation.
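
One way to check the meaning of r² is to compute it two ways: by squaring r, and by measuring how much of the variation in y remains after fitting the line. The Python sketch below does this for a made-up data set; all names and numbers are illustrative assumptions.

```python
import math
import statistics

# Hypothetical data: hours studied vs. quiz score
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)  # total variation in y

# Correlation coefficient and least squares line
r = sxy / math.sqrt(sxx * syy)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Variation left over after the line has made its predictions
unexplained = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
explained_fraction = 1 - unexplained / syy

print(f"r squared          = {r ** 2:.2f}")
print(f"explained fraction = {explained_fraction:.2f}")
```

For a single best-fit line the two numbers agree, which is what "proportion of the variation accounted for by the best-fit line" means in practice.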

18 EXAMPLE 4 Voter Turnout and Unemployment. Political scientists are interested in knowing what factors affect voter turnout in elections. One such factor is the unemployment rate. Data collected in presidential election years since 1964 show a very weak negative correlation between voter turnout and the unemployment rate, with a correlation coefficient of about r = -0.1. Based on this correlation, should we use the unemployment rate to predict voter turnout in the next presidential election? Note that there is a scatter diagram of the voter turnout data on page 312. Solution: The square of the correlation coefficient is r² = (-0.1)² = 0.01, which means that only about 1% of the variation in the data is accounted for by the best-fit line. Nearly all of the variation in the data must therefore be explained by other factors. We conclude that unemployment is not a reliable predictor of voter turnout.

19 Section 7.4: The Search for Causality (Page 315)

20 Guidelines for Establishing Causality. If you suspect that a particular variable (the suspected cause) is causing some effect:
1. Look for situations in which the effect is correlated with the suspected cause even while other factors vary.
2. Among groups that differ only in the presence or absence of the suspected cause, check that the effect is similarly present or absent.
3. Look for evidence that larger amounts of the suspected cause produce larger amounts of the effect.
4. If the effect might be produced by other potential causes (besides your suspected cause), make sure that the effect still remains after accounting for these other potential causes.
5. If possible, test the suspected cause with an experiment. If the experiment cannot be performed with humans for ethical reasons, consider doing the experiment with animals, cell cultures, or computer models.
6. Try to determine the physical mechanism by which the suspected cause produces the effect.

21 Hidden Causality: Sometimes correlations, or the lack of a correlation, can hide an underlying causality. For example, studies suggested that patients who had heart bypass surgery fared no better than those who didn’t. But researchers found confounding variables that early studies had not considered, such as amount of blockage and surgical techniques. These confounding variables prevented the studies from finding a real correlation between the surgery and prolonged life.

22 Broad Levels of Confidence in Causality
– Possible cause: We have discovered a correlation, but cannot yet determine whether the correlation implies causality.
– Probable cause: We have good reason to suspect that the correlation involves cause, perhaps because some of the guidelines for establishing causality are satisfied.
– Cause beyond reasonable doubt: We have found a physical model that is so successful in explaining how one thing causes another that it seems unreasonable to doubt the causality.

