Presentation on theme: "7.1 Seeking Correlation LEARNING GOAL"— Presentation transcript:
1 7.1 Seeking Correlation LEARNING GOAL Be able to define correlation, recognize positive and negative correlations on scatter diagrams, and understand the correlation coefficient as a measure of the strength of a correlation.
2 Statistical Thinking. Thought Question 1: For all cars manufactured in the U.S., there is a positive correlation between the size of the engine and horsepower. There is a negative correlation between the size of the engine and gas mileage. What does it mean for two variables to have a positive correlation or a negative correlation? Chapter 14
3 Types of Correlation
  Positive linear correlation: both variables tend to increase (or decrease) together.
  Negative linear correlation: one variable increases while the other variable decreases.
  No correlation: no apparent linear relationship.
  Nonlinear relationship: the variables are related, but not in a straight-line pattern.
4 Thought Question 2: What type of correlation would the following pairs of variables have: positive, negative, or none?
  Temperature during the summer and electricity bills
  Temperature during the winter and heating costs
  Number of years of education and height
  Frequency of brushing and number of cavities
  Number of churches and number of bars in a city
  Height of husband and height of wife
5 Scatter Diagram or Scatterplot
  A graph in which each point represents the values of two variables.
  Always plot the explanatory variable (independent variable) on the horizontal axis.
  Always plot the response variable (dependent variable) on the vertical axis.
  If there is no explanatory/response distinction, either variable can go on the horizontal axis.
6 Is there a correlation between car weight and fuel consumption? Draw a scatterplot.
  Car weight (lb)    Fuel consumption (mpg)
  3175               27
  3450               29
  3225               (missing)
  3985               24
  2440               37
  2500               34
  2290               (missing)
7 Measuring the Strength of a Correlation The strength of a correlation is measured with a number called the correlation coefficient, represented by the letter r.
8 Properties of the Correlation Coefficient, r
  The value of r always lies between -1 and 1.
  If there is no correlation, the value of r is close to 0.
  If there is positive correlation, r is positive. The closer r is to 1, the stronger the correlation.
  If there is negative correlation, r is negative. The closer r is to -1, the stronger the correlation.
  If r = 1, there is perfect positive correlation.
  If r = -1, there is perfect negative correlation.
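The properties above can be checked by computing r directly. The following is a minimal sketch, not the method used in the presentation: the function name `pearson_r` is an assumption, and the data are only the five cars from the weight and fuel consumption slide for which both values are listed, so the result describes that subset only.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# The five cars that have both a weight and an mpg value listed.
weights = [3175, 3450, 3985, 2440, 2500]
mpg = [27, 29, 24, 37, 34]

r = pearson_r(weights, mpg)
print(f"r = {r:.3f}")  # negative: heavier cars tend to get fewer mpg
```

A value near -1 here would indicate a strong negative correlation, in line with the properties listed on the slide.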
11 7.2 Interpreting Correlations LEARNING GOAL Be aware of important cautions concerning the interpretation of correlations, especially the effects of outliers, the effects of grouping data, and the crucial fact that correlation does not necessarily imply causality.
12 Beware of Outliers. Consider the two scatterplots below. How does the outlier impact the correlation for each plot? Does the outlier increase the correlation, decrease the correlation, or have no impact?
13 If the outlier is included, r = 0.880. If the outlier is removed, r = 0.
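The effect of a single outlier can be reproduced with a few lines of code. This is a sketch on hypothetical data (not the data behind the slide's scatterplots): four points with no linear trend, plus one far-away point. The helper `pearson_r` is an assumed name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: four points on a square have no linear trend at all.
xs = [1, 1, 2, 2]
ys = [1, 2, 1, 2]
r_without = pearson_r(xs, ys)

# Adding one far-away point makes the whole set look strongly correlated.
r_with = pearson_r(xs + [10], ys + [10])

print(f"without outlier: r = {r_without:.3f}")
print(f"with outlier:    r = {r_with:.3f}")
```

In this made-up case the outlier alone creates the apparent correlation, which is why it deserves a careful look before any conclusion is drawn.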
14 What should we do with outliers?
  If the outliers are mistakes in the data set, they may produce apparent correlations that are not real, or they may mask the presence of real correlations.
  If the outliers represent correct data points, they may help us to see relationships.
  Examine outliers carefully, but do not remove them unless we have strong reason to believe they do not belong.
15 Beware of Inappropriate Grouping. Scatterplot of heights versus weights of males and females: r = 0.545
16 Separate the previous data into males and females. Male height versus weight data: r = 0.522. Female height versus weight data: r = 0.366.
17 Correlation does not imply causality. Possible explanations for a correlation:
  1. The correlation may be a coincidence.
  2. Both correlated variables might be directly influenced by some common underlying cause.
  3. One of the correlated variables may actually be a cause of the other. Even then, it may be just one of several causes.
18 For each pair of variables: a. State the correlation clearly. b. Is the correlation due to coincidence, a common underlying cause, or a direct cause? Explain.
  The time spent in recreation on weekends and scores on a Monday exam.
  The outside temperature and the amount of ice cream sold.
  The number of wins of a basketball team and the number of spectators.
  The weight of a person and the time spent reading.
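Pooling two groups can produce a higher r than either group shows on its own, as in the male and female figures above. The sketch below uses hypothetical heights (in) and weights (lb), chosen so that each group has no correlation at all; it is an illustration of the grouping effect, not a reconstruction of the slide's data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical group A: shorter and lighter, no trend inside the group.
heights_a = [62, 62, 64, 64]
weights_a = [120, 130, 120, 130]

# Hypothetical group B: taller and heavier, no trend inside the group.
heights_b = [70, 70, 72, 72]
weights_b = [170, 180, 170, 180]

r_a = pearson_r(heights_a, weights_a)
r_b = pearson_r(heights_b, weights_b)
# Pooling the groups: the gap between the groups drives the correlation.
r_all = pearson_r(heights_a + heights_b, weights_a + weights_b)

print(f"group A: r = {r_a:.3f}")
print(f"group B: r = {r_b:.3f}")
print(f"pooled:  r = {r_all:.3f}")
```

The pooled r reflects the difference between the groups rather than any relationship within them.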
19 7.3 Best-Fit Lines and Prediction LEARNING GOAL Become familiar with the concept of a best-fit line for a correlation, recognize when such lines have predictive value and when they may not, understand how the square of the correlation coefficient is related to the quality of the fit, and qualitatively understand the use of multiple regression.
20 Best-fit line
  The best-fit line on a scatter diagram is a line that lies closer to the data points than any other possible line.
  Also called a regression line or least-squares line.
  It is called a least-squares line because it is the line that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.
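The least-squares idea can be sketched in code. The example below assumes a line of the form y = a + bx and uses the standard least-squares formulas for the slope and intercept; the function names are assumptions, and the data are only the five cars with both a weight and an mpg value listed, so the fitted line applies to that subset only.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept: the line passes through the means
    return a, b

def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical distances from the points to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# The five cars that have both a weight and an mpg value listed.
weights = [3175, 3450, 3985, 2440, 2500]
mpg = [27, 29, 24, 37, 34]

a, b = least_squares_line(weights, mpg)
best = sum_squared_errors(weights, mpg, a, b)
# Any other line, such as one shifted up by 1 mpg, has a larger sum.
shifted = sum_squared_errors(weights, mpg, a + 1.0, b)

print(f"mpg = {a:.2f} + ({b:.5f}) * weight")
print(f"sum of squares: best = {best:.2f}, shifted = {shifted:.2f}")
```

Comparing the two sums shows what "closer than any other possible line" means in practice: moving the line in any direction increases the total squared vertical distance.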
22 Best-Fit Lines
  Equation of best-fit line: y = a + bx
  x is the value of the explanatory variable.
  y is the average value of the response variable.
  Note that a and b are just the intercept and slope of a straight line.
  Note that r and b are not the same thing, but their signs will agree.
  We will use the regression equation to predict the response variable, y, given the explanatory variable, x.
  Use software to calculate the regression equation.
  The <Plot> link on this slide is to the Correlation & Regression applet found on the VCU Stat 208 website.
  Chapter 15
23 Car Weight and Fuel Consumption
  Car                    Car Weight (lb)    Fuel Consumption (mpg)
  Chrysler Sebring       3175               27
  Ford Mustang           3450               29
  BMW 3-series           3225               (missing)
  Ford Crown Victoria    3985               24
  Honda Civic            2440               37
  Mazda Protégé          2500               34
  Hyundai Accent         2290               (missing)
25 Use Excel to write the equation of the best-fit line
  Look at the Coefficient column. The equation of the best-fit line is: Y= x or mpg = (car weight in lb.)
  How many miles per gallon would a 2000 lb. car get?
  Is it reasonable to make a prediction for a 2000 lb. car? A lb. car? A 4000 lb. car?
26 Cautions in Making Predictions from Best-Fit Lines
  Best-fit lines only give a good prediction when the correlation is strong and there are many data points.
  Only use the best-fit line to make predictions within the bounds of the data points used.
  A best-fit line based on past data is not necessarily valid now or in the future.
  Don't make predictions about a population different from the one from which the data were drawn.
  A best-fit line is meaningless when there is no significant correlation or the relationship is nonlinear.
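The caution about staying within the bounds of the data can be built into a prediction helper. This is a sketch under stated assumptions: the function name `predict_within_range`, the observed x values, and the intercept and slope are all hypothetical and chosen only for illustration.

```python
def predict_within_range(x, observed_xs, a, b):
    """Predict y = a + b*x, but only for x inside the observed x range."""
    lo, hi = min(observed_xs), max(observed_xs)
    if not lo <= x <= hi:
        # Outside the data, the line is an extrapolation and is not trusted.
        raise ValueError(f"x = {x} is outside the observed range [{lo}, {hi}]")
    return a + b * x

# Hypothetical observed x values and a hypothetical fitted line y = 2.0 + 0.5x.
observed_xs = [1.0, 2.0, 3.0, 4.0, 5.0]
a, b = 2.0, 0.5

print(predict_within_range(3.0, observed_xs, a, b))  # 3.5

try:
    predict_within_range(8.0, observed_xs, a, b)
except ValueError as err:
    print("refused:", err)
```

Raising an error rather than returning a number makes the limit explicit: a prediction outside the observed range has to be a deliberate choice by the caller.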
28 Correlation between your bill and the tip you leave
  Bill: $10.15, $25.36, $7.38, $43.78, $55.89, $11.17, $33.89, $26.17, $21.18
  Tip: $2.00, $3.50, $1.50, $6.00, $9.00, $5.50, $4.00
  r squared = 0.933: 93.3% of the variation in the tip can be explained by the cost of the bill.
  What explains the other 6.7%?
  What should the slope of the best-fit line be?
  The equation of the best-fit line is y= x
  What does the line tell you about how people tip?
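The statement that r squared is the share of variation explained by the line can be checked numerically. The sketch below uses hypothetical bill and tip pairs (the pairing of the slide's values is not recoverable from the transcript), fits a least-squares line, and compares 1 - SSE/SST with the square of r. All names are assumptions.

```python
from math import sqrt

# Hypothetical bill and tip amounts in dollars, for illustration only.
bills = [10.0, 20.0, 30.0, 40.0, 50.0]
tips = [2.0, 3.0, 5.5, 6.0, 8.5]

n = len(bills)
mean_x = sum(bills) / n
mean_y = sum(tips) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(bills, tips))
sxx = sum((x - mean_x) ** 2 for x in bills)
syy = sum((y - mean_y) ** 2 for y in tips)

# Least-squares line y = a + b*x and the correlation coefficient r.
b = sxy / sxx
a = mean_y - b * mean_x
r = sxy / sqrt(sxx * syy)

# SSE: variation left over after the line; SST: total variation in tips.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(bills, tips))
sst = syy
r_squared = 1 - sse / sst

print(f"r^2 from r:          {r ** 2:.4f}")
print(f"1 - SSE/SST:         {r_squared:.4f}")
print(f"unexplained portion: {sse / sst:.4f}")
```

The two values of r squared agree for a least-squares line, and the unexplained portion is the part of the variation in tips that the bill alone does not account for.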