# Scatter Diagrams and Linear Correlation

Section 4.1 Scatter Diagrams and Linear Correlation

Scatter Diagram
A graph in which data pairs (x, y) are plotted as individual points on a grid with horizontal axis x and vertical axis y. We call x the explanatory variable and y the response variable.

Paired data
x = phosphorus concentration at inlet
y = phosphorus concentration at outlet

Scatter Diagram Linear Correlation
The general trend of the points seems to follow a straight line segment.

Non-Linear Correlation

No Linear Correlation

High Linear Correlation
Points lie close to a straight line.

Moderate Linear Correlation

Low Linear Correlation

Perfect Linear Correlation

Positive Linear Correlation

Negative Linear Correlation

Little or No Linear Correlation

Questions Arising
Can we find a relationship between x and y? How strong is the relationship? The answer is that there is a mathematical measurement that describes the strength of the linear association between two variables. This measure is the sample correlation coefficient r.

The Correlation Coefficient (r)
A numerical measurement that assesses the strength of a linear relationship between two variables x and y

Properties of the Correlation Coefficient r
Also called the Pearson product-moment correlation coefficient, r is a unitless measurement between −1 and 1. That is, −1 ≤ r ≤ 1.

Properties of the Correlation Coefficient r
If r = 1, there is a perfect positive correlation.

Properties of the Correlation Coefficient r
If r = 1, there is a perfect negative correlation.

Properties of the Correlation Coefficient r
If r = 0, there is no linear correlation.

Properties of the Correlation Coefficient r
Positive values of r imply that as x increases, y tends to increase.

Properties of the Correlation Coefficient r
Negative values of r imply that as x increases, y tends to decrease.

Properties of the Correlation Coefficient r
The closer r is to −1 or +1, the better a line describes the relationship between the two variables x and y. The value of r does not change when either variable is converted to different units.

Properties of the Correlation Coefficient r
The value of r is the same regardless of which variable is the explanatory variable and which variable is the response variable. In other words, the value of r is the same for the pairs (x, y) as for the pairs (y, x).
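The two invariance properties above (units and the roles of x and y) can be checked numerically. The sketch below is illustrative only: the data are hypothetical, made up for this example, and `pearson_r` is an assumed helper name that uses the standard sums-based formula for r.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r from the sums of x, y, x^2, y^2, and xy."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / (
        math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    )

# Hypothetical data, invented purely for illustration (not from the slides).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r_xy = pearson_r(xs, ys)                          # x explanatory, y response
r_yx = pearson_r(ys, xs)                          # roles swapped
r_units = pearson_r([2.54 * x for x in xs], ys)   # x rescaled, e.g. inches to centimetres

print(r_xy, r_yx, r_units)  # all three agree up to floating-point rounding
```

Rescaling by a positive constant multiplies the numerator and the denominator of r by the same factor, which is why the value is unchanged.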

Computing the Correlation Coefficient r
Obtain a random sample of n data pairs (x, y). Using the data pairs, compute Σx, Σy, Σx², Σy², and Σxy. Use the following formula:
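The formula itself is not reproduced in this transcript. The standard computational form, which uses exactly the five sums listed above, is presumably the one intended:

$$
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^{2} - \left(\sum x\right)^{2}}\;\sqrt{n\sum y^{2} - \left(\sum y\right)^{2}}}
$$

Here n is the number of data pairs, and each sum runs over all n pairs.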

Example: Computing r

Computing r Interpretation of r: An r value of indicates a strong positive correlation between the variables x and y

GUIDED EXERCISE
In one of the Boston city parks, there has been a problem with muggings in the summer months. A police officer took a random sample of 10 days (out of the 90-day summer) and compiled the following data. For each day, x represents the number of police officers on duty in the park and y represents the number of reported muggings on that day.

GUIDED EXERCISE Cont.
Construct a scatter diagram of the x and y values by plotting the (x, y) pairs. From the scatter diagram, r will be negative: the general trend is that large x values are associated with small y values and vice versa. From left to right, the least-squares line goes down.

GUIDED EXERCISE Cont.
Verify that Σx = 103, Σy = 47, Σx² = 1347, Σy² = 295, and Σxy = 343 (use a calculator). Compute r. Alternatively, find the value of r directly by using a calculator.
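As a check on the hand computation, the sums for this exercise can be plugged into the standard computational formula for r. This is a minimal sketch: `r_from_sums` is a hypothetical helper name, and it takes n = 10, Σx = 103, Σy = 47, Σx² = 1347, Σy² = 295, and Σxy = 343.

```python
import math

def r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Correlation coefficient r computed from the five sums and the sample size."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (
        math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Sums from the guided exercise (10 sampled days).
r = r_from_sums(n=10, sum_x=103, sum_y=47, sum_x2=1347, sum_y2=295, sum_xy=343)
print(round(r, 3))  # -0.969
```

The negative sign agrees with the downward trend seen in the scatter diagram, and the magnitude close to 1 indicates a strong linear relationship.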

Sample compared to Population Correlation
Sample correlation coefficient = r. Population correlation coefficient = ρ (ρ is the Greek letter rho).

A Caution
The correlation coefficient measures the strength of the linear relationship between two variables. A strong correlation does not imply a cause-and-effect relationship. A correlation between two variables may be caused by other (either known or unknown) variables called lurking variables.

Lurking Variable
A lurking variable is neither an explanatory nor a response variable. A lurking variable may be responsible for changes in both x and y.

Example Correlation does not equal Causation!
You were given data pairing the weight of cars in pounds with their highway gas mileage. You found a linear regression equation and determined that your model was a good fit.
(Table: Car Weight in Pounds, Gas Mileage in MPG.)

Example cont. Correlation does not equal Causation!
So, you now state for the whole world to hear that heavier cars get lower gas mileage. Right? Not necessarily. Your statement may be correct for this particular set of data, but it may not be a universal truth. It may also be true that the weight of the car has nothing to do with the gas mileage; perhaps some other factor is affecting it. Just because a correlation exists does not guarantee that the change in one of your variables is causing the change in the other variable.

Example Cause-Effect Relationship
During the months of March and April, the weekly weight increases of a puppy in New York were collected. For the same time frame, the retail price increases of snowshoes in Alaska were collected.
(Table: Weekly Data Collection. Columns: the weight of a growing puppy in New York, with entries including 8 and 9 pounds; the retail price of snowshoes in Alaska, with entries \$32.45, \$32.95, \$33.45, \$34.00, \$34.50, \$35.10, \$35.63.)

Example Cause-Effect Relationship cont.
The data was examined and was found to have a very strong linear correlation. So, this must mean that the weight increase of a puppy in New York is causing snowshoe prices in Alaska to increase. Of course this is not true! The moral of this example is: "be careful what you infer from your statistical analyses." Be sure your relationship makes sense. Also keep in mind that other factors may be involved in a cause-effect relationship.

Scatter Plots (calc)
A scatter plot is a graph used to determine whether there is a relationship between paired data. In many real-life situations, scatter plots follow patterns that are approximately linear. If y tends to increase as x increases, the paired data are said to have a positive correlation. If y tends to decrease as x increases, the paired data are said to have a negative correlation. If the points show no linear pattern, the paired data are said to have relatively no correlation.

To set up a scatter plot, clear (or deactivate) any entries in Y= before you begin.

1. Enter the X data values in L1 and the Y data values in L2, being careful that each X data value and its matching Y data value are entered on the same horizontal line.

Scatter Plots cont. (calc)
2. Activate the scatter plot. Press 2nd STATPLOT and choose #1 PLOT 1. Be sure the plot is ON, the scatter plot icon is highlighted, the list of X data values is next to Xlist, and the list of Y data values is next to Ylist. Choose any of the three marks.
3. To see the scatter plot, press ZOOM and #9 ZoomStat. Pressing TRACE and the right arrow will move along the data points.
4. To turn the scatter plot off when you are finished with this problem:
    * Method 1: Go to the Y= screen. Arrow up onto the PLOT highlighted at the top of the screen. Press ENTER to turn it off.
    * Method 2: Go to STAT PLOT (above Y=). Choose your PLOT location. Arrow to OFF. Press ENTER to turn it off.

Scatter Plots cont. (calc)
Follow-up:
* At this point, the graph may be observed for the existence of a positive, negative or no correlation between the data.
* A line of best fit can be calculated "manually".
    1. Select two points that you feel would give a line that fits the data.
    2. Using your knowledge of equations of lines and slope, write the equation of your line.
    3. Enter this equation into Y1 and graph.
    4. How well does the line "fit" the data?
    5. Use your line to make predictions.
* Or a line of best fit can be calculated "using the calculator". See Line of Best Fit.

Line of Best Fit (calc)
A line of best fit (or "trend" line) is a straight line that best represents the data on a scatter plot. This line may pass through some of the points, none of the points, or all of the points. You can examine lines of best fit with:
1. paper and pencil only
2. a combination of graphing calculator and paper and pencil
3. or solely with the graphing calculator

Line of Best Fit cont. (calc)
Example: Is there a relationship between the fat grams and the total calories in fast food?
(Table columns: Sandwich, Total Fat (g), Total Calories.) Sandwiches listed:
* Hamburger
* Cheeseburger
* Quarter Pounder
* Quarter Pounder with Cheese
* Big Mac
* Arch Sandwich Special
* Arch Special with Bacon
* Crispy Chicken
* Fish Fillet
* Grilled Chicken
* Grilled Chicken Light

Line of Best Fit cont. (calc)
Paper and Pencil Solution:
1. Prepare a scatter plot of the data on graph paper.
2. Find two points that you think will be on the "best-fit" line. Perhaps you chose the points (9, 260) and (30, 530). Different people may choose different points.
3. Calculate the slope of the line through your two points (rounded to three decimal places). For the points (9, 260) and (30, 530), slope = (530 − 260) / (30 − 9) = 270 / 21 ≈ 12.857.

Line of Best Fit cont. (calc)
4. Write the equation of the line. This equation can now be used to predict information that was not plotted in the scatter plot. For example, you can use the equation to find the total calories based upon 22 grams of fat.

Equation, for the points (9, 260) and (30, 530) with the slope rounded to 12.857: y − 260 = 12.857(x − 9), so y = 12.857x + 144.287.
Prediction based on 22 grams of fat: y = 12.857(22) + 144.287 = 427.141 calories.

Different people may choose different points and arrive at different equations. All of them are "correct", but which one is actually the "best"? To determine the actual "best" fit, we will use a graphing calculator.
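The paper-and-pencil steps can also be sketched in code. The example below assumes the two chosen points are (9, 260) and (30, 530), as in step 2, and rounds the slope to three decimal places as in step 3; `line_through` is a hypothetical helper name, not something from the original slides.

```python
def line_through(p1, p2, digits=3):
    """Slope and y-intercept of the line through two points, rounded like the hand work."""
    (x1, y1), (x2, y2) = p1, p2
    slope = round((y2 - y1) / (x2 - x1), digits)
    # Point-slope form y - y1 = slope * (x - x1), rearranged to y = slope * x + intercept.
    intercept = round(y1 - slope * x1, digits)
    return slope, intercept

slope, intercept = line_through((9, 260), (30, 530))
predicted = slope * 22 + intercept  # estimated total calories for 22 grams of fat

print(f"y = {slope}x + {intercept}")   # y = 12.857x + 144.287
print(f"prediction: {predicted:.3f}")  # prediction: 427.141
```

Choosing a different pair of points changes both the slope and the intercept, which is why an objective criterion is needed to pick a single "best" line.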

Line of Best Fit cont. (calc)
Graphing Calculator Solution:
1. Enter the data in the calculator lists. Place the data in L1 and L2: STAT, #1 Edit, then type the values into the lists.
2. Prepare a scatter plot of the data. Set up for the scatter plot: 2nd STATPLOT, choose the first icon. Then choose ZOOM, #9 ZoomStat.

Line of Best Fit cont. (calc)
3. Have the calculator determine the line of best fit: STAT → CALC, #4 LinReg(ax+b). Include the parameters L1, L2, Y1. (Y1 comes from VARS → YVARS, #Function, Y1.) You now have the values of a and b needed to write the equation of the actual line of best fit, y = ax + b.

Graph the line of best fit. Simply hit GRAPH.

To get a predicted value within the window, hit TRACE, up arrow, and type the desired value. The screen shows x = 22.
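For readers without the calculator, the least-squares values of a and b can be sketched in a few lines. This assumes LinReg(ax+b) performs an ordinary least-squares fit. The function name `linreg` and the data are hypothetical: the points are placed exactly on y = 2x + 1 so the result is easy to verify by eye, and they are not the fast-food data.

```python
def linreg(xs, ys):
    """Least-squares slope a and intercept b for the line y = a*x + b."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    a = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b = (sy - a * sx) / n
    return a, b

# Hypothetical data lying exactly on y = 2x + 1 (illustration only).
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

a, b = linreg(xs, ys)
prediction = a * 22 + b  # same idea as tracing to x = 22 on the calculator

print(a, b, prediction)  # 2.0 1.0 45.0
```

With real, scattered data the fitted line will generally not pass through every point, but it minimizes the sum of squared vertical distances to the points.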