Presentation on theme: "Scatter Diagrams and Linear Correlation"— Presentation transcript:
1Section 4.1 Scatter Diagrams and Linear Correlation
2Scatter Diagram A graph in which data pairs (x, y) are plotted as individual points on a grid with horizontal axis x and vertical axis y. We call x the explanatory variable. We call y the response variable.
3Paired data x = phosphorus concentration at inlet y = phosphorus concentration at outlet
4Scatter Diagram Linear Correlation The general trend of the points seems to follow a straight line segment.
14Questions Arising Can we find a relationship between x and y? How strong is the relationship? The answer is that there is a mathematical measurement that describes the strength of the linear association between two variables. This measure is the sample correlation coefficient r.
15The Correlation Coefficient (r) A numerical measurement that assesses the strength of a linear relationship between two variables x and y
16Properties of the Correlation Coefficient r Also called the Pearson product-moment correlation coefficient, r is a unitless measurement between −1 and 1. That is, −1 ≤ r ≤ 1.
17Properties of the Correlation Coefficient r If r = 1, there is a perfect positive correlation.
18Properties of the Correlation Coefficient r If r = −1, there is a perfect negative correlation.
19Properties of the Correlation Coefficient r If r = 0, there is no linear correlation.
20Properties of the Correlation Coefficient r Positive values of r imply that as x increases, y tends to increase.
21Properties of the Correlation Coefficient r Negative values of r imply that as x increases, y tends to decrease.
22Properties of the Correlation Coefficient r The closer r is to −1 or +1, the better a line describes the relationship between the two variables x and y. The value of r does not change when either variable is converted to different units.
23Properties of the Correlation Coefficient r The value of r is the same regardless of which variable is the explanatory variable and which variable is the response variable. In other words, the value of r is the same for the pairs (x, y) as for the pairs (y, x).
24Computing the Correlation Coefficient r Obtain a random sample of n data pairs (x, y). Using the data pairs, compute Σx, Σy, Σx², Σy², and Σxy. Use the following formula: r = (nΣxy − (Σx)(Σy)) / (√(nΣx² − (Σx)²) · √(nΣy² − (Σy)²))
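The computation described above can be sketched in Python. This is only an illustration and not part of the original slides: the function name `correlation_r` and the four data pairs are hypothetical. It assumes the standard sums-based formula r = (nΣxy − (Σx)(Σy)) / (√(nΣx² − (Σx)²) · √(nΣy² − (Σy)²)).

```python
import math

def correlation_r(pairs):
    """Sample correlation coefficient r from a list of (x, y) pairs,
    using the sums-based computational formula."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_x2 = sum(x * x for x, _ in pairs)
    sum_y2 = sum(y * y for _, y in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data, for illustration only (not taken from the slides).
data = [(1, 2), (2, 3), (3, 5), (4, 4)]

r_xy = correlation_r(data)
# Swapping the roles of x and y should leave r unchanged.
r_yx = correlation_r([(y, x) for x, y in data])
# Converting x to different units (here, multiplying by 100) should too.
r_rescaled = correlation_r([(100 * x, y) for x, y in data])

print(round(r_xy, 3), round(r_yx, 3), round(r_rescaled, 3))
```

The last three lines also check two of the properties listed earlier: r is the same for (x, y) as for (y, x), and it does not change when a variable is converted to different units.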
26Computing r Interpretation of r: An r value close to +1 indicates a strong positive correlation between the variables x and y.
27GUIDED EXERCISE In one of the Boston city parks, there has been a problem with muggings in the summer months. A police officer took a random sample of 10 days (out of the 90-day summer) and compiled the following data. For each day, x represents the number of police officers on duty in the park and y represents the number of reported muggings on that day. x y
28GUIDED EXERCISE Cont. Construct a scatter diagram of the x and y values. Plot the (x, y) pairs. From the scatter diagram, r will be negative. The general trend is that large x values are associated with small y values and vice versa. From left to right, the least-squares line goes down.
29GUIDED EXERCISE Cont. Verify that Σx = 103, Σy = 47, Σx² = 1347, Σy² = 295, and Σxy = 343. Use a calculator. Compute r. Alternatively, find the value of r directly by using a calculator.
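As a rough check of the guided exercise, the sums can be plugged into the sums-based formula for r. The sketch below assumes n = 10 (the ten sampled days) and reads the values 1347 and 295 as Σx² and Σy². The variable names are illustrative only.

```python
import math

# Sums from the guided exercise; n = 10 sampled days.
n = 10
sum_x, sum_y = 103, 47
sum_x2, sum_y2 = 1347, 295   # assumed to be Σx² and Σy²
sum_xy = 343

numerator = n * sum_xy - sum_x * sum_y            # 3430 - 4841 = -1411
denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
               * math.sqrt(n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(round(r, 3))  # about -0.969
```

A value near −1 agrees with the scatter diagram: more officers on duty tend to go with fewer reported muggings.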
30Sample compared to Population Correlation Sample correlation coefficient = r. Population correlation coefficient = ρ. ρ is the Greek letter rho.
31A Caution The correlation coefficient measures the strength of the linear relationship between two variables. A strong correlation does not imply a cause and effect relationship. A correlation between two variables may be caused by other (either known or unknown) variables called lurking variables.
32Lurking Variable A lurking variable is neither an explanatory nor a response variable. A lurking variable may be responsible for changes in both x and y.
33Example Correlation does not equal Causation! You were given data on the weight of cars in pounds and their highway gas mileage. You found a linear regression equation and determined that your model was a good fit. Table columns: Car Weight in Pounds, Gas Mileage (MPG)
34Example cont. Correlation does not equal Causation! So, you now state for the whole world to hear that heavier cars get lower gas mileage. Right? Not necessarily. Your statement may be correct for this particular set of data, but it may not be a universal truth. It may also be true that the weight of the car has nothing to do with the gas mileage. Perhaps some other factor is affecting the gas mileage. Just because a correlation exists does not guarantee that the change in one of your variables is causing the change in the other variable.
35Example Cause-Effect Relationship During the months of March and April, the weekly weight increases of a puppy in New York were collected. For the same time frame, the retail price increases of snowshoes in Alaska were collected. Weekly Data Collection (weight of a growing puppy in New York; retail price of snowshoes in Alaska): 8 pounds, $32.45; $32.95; 9, $33.45; $34.00; $34.50; $35.10; $35.63
36Example Cause-Effect Relationship cont. The data was examined and was found to have a very strong linear correlation. So, this must mean that the weight increase of a puppy in New York is causing snowshoe prices in Alaska to increase. Of course this is not true! The moral of this example is: "be careful what you infer from your statistical analyses." Be sure your relationship makes sense. Also keep in mind that other factors may be involved in a cause-effect relationship
37Scatter Plots (calc) A scatter plot is a graph used to determine whether there is a relationship between paired data. In many real-life situations, scatter plots follow patterns that are approximately linear. If y tends to increase as x increases, then the paired data are said to have a positive correlation. If y tends to decrease as x increases, the paired data are said to have a negative correlation. If the points show no linear pattern, the paired data are said to have little or no correlation. To set up a scatter plot: Clear (or deactivate) any entries in Y= before you begin. 1. Enter the X data values in L1. Enter the Y data values in L2, being careful that each X data value and its matching Y data value are entered on the same horizontal line.
38Scatter Plots cont. (calc) 2. Activate the scatter plot. Press 2nd STATPLOT and choose #1 PLOT 1. Be sure the plot is ON, the scatter plot icon is highlighted, the list of X data values is next to Xlist, and the list of Y data values is next to Ylist. Choose any of the three marks. 3. To see the scatter plot, press ZOOM and #9 ZoomStat. Pressing TRACE and the right arrow will move along the data points. 4. To turn the scatter plot off when you are finished with this problem: Method 1: Go to the Y= screen. Arrow up onto the PLOT highlighted at the top of the screen. Press ENTER to turn it off. Method 2: Go to STAT PLOT (above Y=). Choose your PLOT location. Arrow to OFF. Press ENTER to turn it off.
39Scatter Plots cont. (calc) Follow-up: * At this point, the graph may be observed for the existence of a positive, negative or no correlation between the data. * A line of best fit can be calculated "manually". 1. Select two points that you feel would give a line that fits the data. 2. Using your knowledge of equations of lines and slope, write the equation of your line. 3. Enter this equation into Y1 and graph. 4. How well does the line "fit" the data? 5. Use your line to make predictions. * Or a line of best fit can be calculated "using the calculator". See Line of Best Fit.
40Line of Best Fit (calc) A line of best fit (or "trend" line) is a straight line that best represents the data on a scatter plot. This line may pass through some of the points, none of the points, or all of the points. You can examine lines of best fit with: 1. paper and pencil only 2. a combination of graphing calculator and paper and pencil 3. or solely with the graphing calculator
41Line of Best Fit cont. (calc) Example: Is there a relationship between the fat grams and the total calories in fast food? Table columns: Sandwich, Total Fat (g), Total Calories. Sandwiches: Hamburger, Cheeseburger, Quarter Pounder, Quarter Pounder with Cheese, Big Mac, Arch Sandwich Special, Arch Special with Bacon, Crispy Chicken, Fish Fillet, Grilled Chicken, Grilled Chicken Light
42Line of Best Fit cont. (calc) Paper and Pencil Solution: 1. Prepare a scatter plot of the data on graph paper. 2. Find two points that you think will be on the "best-fit" line. Perhaps you chose the points (9, 260) and (30, 530). Different people may choose different points. 3. Calculate the slope of the line through your two points (rounded to three decimal places): m = (530 − 260) / (30 − 9) = 270 / 21 ≈ 12.857.
43Line of Best Fit cont. (calc) 4. Write the equation of the line. This equation can now be used to predict information that was not plotted in the scatter plot. For example, you can use the equation to find the total calories based upon 22 grams of fat. Equation, using the points (9, 260) and (30, 530): y − 260 = 12.857(x − 9), or y = 12.857x + 144.287. Prediction based on 22 grams of fat: y = 12.857(22) + 144.287 = 427.141 calories. Different people may choose different points and arrive at different equations. All of them are "correct", but which one is actually the "best"? To determine the actual "best" fit, we will use a graphing calculator.
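The paper-and-pencil steps can be sketched in Python, using the two points mentioned in the example, (9, 260) and (30, 530). The helper name `line_through` is hypothetical. The slope is rounded to three decimal places before the intercept is computed, mirroring the hand calculation.

```python
def line_through(p1, p2, digits=3):
    """Slope and y-intercept of the line through two points.
    The slope is rounded first, as in the hand calculation."""
    (x1, y1), (x2, y2) = p1, p2
    slope = round((y2 - y1) / (x2 - x1), digits)
    intercept = y1 - slope * x1   # from y - y1 = slope * (x - x1)
    return slope, intercept

# The two points chosen in the example; other choices give other lines.
slope, intercept = line_through((9, 260), (30, 530))

# Predicted total calories for a sandwich with 22 grams of fat.
predicted = slope * 22 + intercept

print(f"y = {slope:.3f}x + {intercept:.3f}")
print(f"predicted calories at 22 g of fat: {predicted:.3f}")
```

With these two points the sketch gives y = 12.857x + 144.287 and a prediction of about 427.141 calories. Choosing a different pair of points would change both numbers.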
44Line of Best Fit cont. (calc) Graphing Calculator Solution: 1. Enter the data in the calculator lists. Place the data in L1 and L2. STAT, #1 Edit, type values into the lists. 2. Prepare a scatter plot of the data. Set up for the scatter plot. 2nd STAT PLOT, choose the first icon. Choose ZOOM #9 ZoomStat.
45Line of Best Fit cont. (calc) 3. Have the calculator determine the line of best fit. STAT → CALC #4 LinReg(ax+b). Include the parameters L1, L2, Y1. (Y1 comes from VARS → Y-VARS, Function, Y1.) You now have the values of a and b needed to write the equation of the actual line of best fit, y = ax + b. Graph the line of best fit. Simply hit GRAPH. To get a predicted value within the window, hit TRACE, up arrow, and type the desired value. The screen shows x = 22.
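For readers without a graphing calculator, the least-squares line y = ax + b can be approximated in Python. This is a sketch under stated assumptions, not the calculator's actual implementation: it uses the standard least-squares formulas a = (nΣxy − (Σx)(Σy)) / (nΣx² − (Σx)²) and b = (Σy − aΣx) / n. The function name `least_squares_line` and the data are hypothetical, because the fat and calorie values are not reproduced in this transcript.

```python
def least_squares_line(pairs):
    """Slope a and intercept b of the least-squares line y = ax + b."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_x2 = sum(x * x for x, _ in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - a * sum_x) / n   # same as mean(y) - a * mean(x)
    return a, b

# Hypothetical data, for illustration only (not the sandwich data).
data = [(1, 2), (2, 3), (3, 5), (4, 4)]
a, b = least_squares_line(data)

# Use the fitted line to predict y at a new x value.
x_new = 5
y_pred = a * x_new + b

print(f"y = {a:.3f}x + {b:.3f}; prediction at x = {x_new}: {y_pred:.3f}")
```

Unlike the two-point method, this line depends on every data pair, so two people given the same data should arrive at the same equation.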