# Chapter 4 Scatterplots and Correlation

## Presentation transcript:

Slide 1 Chapter 4 Scatterplots and Correlation

Slide 2 Section 4.1 Scatter Diagrams and Correlation

- Scatterplots
- Linear Correlation Coefficient

Slide 3 Association between two variables: Size of diamond and price of ring

The source of the data is a full-page advertisement placed in the Straits Times newspaper issue of February 29, 1992, by a Singapore-based retailer of diamond jewelry. The variables are the size of the diamond in carats (1 carat = 0.2 gram) and the price of ladies’ rings (single diamond stone) in Singapore dollars.

| Carats | Singapore dollars |
| --- | --- |
| .20 | 495 |
| .16 | 328 |
| .17 | 350 |
| .19 | 385 |
| .25 | 642 |
| … | … |

How would you describe the association between the two variables?

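Slide 4 SCATTERPLOT: Diamond rings data

Plot: Diamond carats vs Price in US\$ (x-axis: Carat; y-axis: Price in US dollars)

N = 48

| Variable | Average | s.d. | Min | Max |
| --- | --- | --- | --- | --- |
| X: Carat | 0.20 | 0.056 | 0.12 | 0.35 |
| Y: Price in US \$ | 865.144 | 213.64 | 385 | 1879 |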

Slide 5 Terminology

- Response variable: measures the outcome of the study (dependent variable).
- Explanatory variable: explains or causes changes in the response variable (independent variable).

Example: Carat = explanatory variable; Price = response variable.


Slide 7 EXAMPLE Interpreting a Scatter Diagram

The data shown to the right are based on a study for drilling rock. The researchers wanted to determine whether the time it takes to dry drill a distance of 5 feet in rock increases with the depth at which the drilling begins. Depth, x, is the explanatory variable. Time, y (in minutes), to drill five feet is the response variable. Draw a scatter diagram of the data.

Source: Penner, R., and Watts, D.G. “Mining Information.” The American Statistician, Vol. 45, No. 1, Feb. 1991, p. 6.


Slide 9 Interpreting scatter plots

1. Look for the overall pattern and for striking deviations.
2. Define form, direction and strength of the relationship:
   a. Form: roughly linear if the points follow a straight line, or nonlinear…
   b. Direction: positive or negative?
   c. Strength: how closely the points follow a clear form.
3. Check for the presence of outliers, individual values that fall outside the overall pattern.
4. Two variables are positively (negatively) associated if an increase in one variable corresponds to an increase (decrease) in the other variable.

Slide 10 Various Types of Relations in a Scatter Diagram

Slide 11 Example: 2000 Presidential Elections

Did the butterfly ballots confuse voters? Did voters for Al Gore instead cast their votes for other candidates? Bush spokesman Ari Fleischer stated on Nov. 9, 2000 that "Palm Beach County is a Pat Buchanan stronghold and that's why Pat Buchanan received 3,407 votes there." What is the level of support that Pat Buchanan enjoys in Palm Beach County? The published election results show the association between the vote totals for Pat Buchanan and the total population for Florida counties.

Slide 12 Is the association positive or negative? Is the form of the relationship almost linear? Is an outlier present?

Slide 13 Another example: The statistics of poverty and inequality

Data from the U.N.E.S.C.O. 1990 Demographic Year Book. For 97 countries in the world, data are given for birth rates and for an index of the Gross National Product.

Slide 14 Note: More information can be added to a graph by putting a categorical variable on the scatter plot, either as a label on the points, as a symbol in place of the points, or by the use of color (a different color for each category), as in the previous graph.

Slide 15 Linearization using Mathematical Transformations

The previous plot shows a non-linear association! Sometimes we can make it linear by applying a transformation to the variables. Possible transformations are, for example, “ln”, “exp”, “sqrt”. Here we consider the natural log of GNP.

Plot: Birth rate vs Log G.N.P.
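The effect of a log transformation can be sketched in Python. The numbers below are made up for illustration and are not the U.N.E.S.C.O. data: the hypothetical `birth_rate` values are constructed to fall by a fixed amount for each unit increase in ln(G.N.P.), so the relationship is curved against raw G.N.P. but exactly linear against its natural log. The helper `pearson_r` is an illustrative name, not part of the original slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustration values (NOT the U.N.E.S.C.O. data).
gnp = [100, 300, 1000, 3000, 10000, 30000]
birth_rate = [60 - 5 * math.log(g) for g in gnp]  # linear in ln(GNP) by construction

log_gnp = [math.log(g) for g in gnp]  # the "ln" transformation

r_raw = pearson_r(gnp, birth_rate)      # curved relationship: |r| well below 1
r_log = pearson_r(log_gnp, birth_rate)  # straightened relationship: r close to -1

print(f"r using raw GNP: {r_raw:.3f}")
print(f"r using ln(GNP): {r_log:.3f}")
```

With real data the transformed plot will not be perfectly linear; the point is only that r against ln(G.N.P.) can be much closer to –1 than r against raw G.N.P.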

Slide 16 Measure of Linear Association

If there is a strong linear association between the variables, then the cloud of points on the scatter plot will be close to a line.

Plot axes: Birth rate (1,000 pop) vs Log G.N.P.

Slide 17 The Correlation Coefficient r

The correlation coefficient r measures the direction and the strength of the linear relationship between two variables.

- It is a value between –1 and 1.
- The closer r is to 1 or –1, the stronger the linear association is.
- Positive values of r imply a positive association; negative values imply a negative association.
- Values of r close to 0 imply weak linear association.

Sample r is defined as:

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$

where the X data have average $\bar{x}$ and standard deviation $s_x$, and the Y data have average $\bar{y}$ and standard deviation $s_y$.
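A minimal Python sketch of this definition, assuming the usual sample form: standardize each x and y with its average and sample standard deviation (divisor n – 1), multiply the standardized values pairwise, and divide the sum of the products by n – 1. The function names are illustrative; the data are the two lists used in the TI-83 example in this deck (L1 and L2).

```python
import math

def sample_sd(values):
    """Sample standard deviation (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def correlation_from_z_scores(xs, ys):
    """Sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = sample_sd(xs)
    s_y = sample_sd(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

# Lists from the TI-83 example in this deck.
x = [7, 2, 4, 2, 5]
y = [8, 4, 6, 2, 7]

r = correlation_from_z_scores(x, y)
print(f"r = {r:.4f}")
```

For these five points r comes out at roughly 0.93, a strong positive linear association.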

Slide 18 EXAMPLE Determining the Linear Correlation Coefficient

Determine the linear correlation coefficient of the drilling data.

Slide 19 Table columns: $(x_i - 126.25)/s_x$, $(y_i - 6.9858)/s_y$, and their product.


Slide 21 Properties of r

- The correlation coefficient r varies between –1 and 1. r = 0 means there is no linear association between X and Y. If r = 1 or –1, then the points in a scatter plot lie on a straight line.
- Positive r indicates positive association between X and Y. Negative r indicates negative association between X and Y.
- Both variables X and Y must be quantitative. The correlation coefficient between X and Y is the same as the correlation between Y and X.
- r does not change if we change the units of measurement for X and Y.
- The correlation measures only the linear relationship between two variables.
- r can be strongly affected by the presence of outliers.

Slide 22 Example of correlation

Plot: Birth rate (1,000 pop) vs Log G.N.P.

r = –0.74: negative association.

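Slide 23 Diamond rings data

Plot: Diamond carats vs Price in US\$ (x-axis: Carat; y-axis: Price in US dollars)

N = 48

| Variable | Average | s.d. | Min | Max |
| --- | --- | --- | --- | --- |
| X: Carat | 0.20 | 0.056 | 0.12 | 0.35 |
| Y: Price in US \$ | 865.144 | 213.64 | 385 | 1879 |

Strong positive association: r = 0.989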

Slide 24 Positive Correlation

In each plot there are 100 points. The correlation coefficient measures the amount of clustering around a line. If r is close to 1, then the points lie close to a straight line!

Slide 25 Negative Correlation

Negative correlation: as x increases, y tends to decrease. If r is close to –1, then the points lie close to a straight line!

Slide 26 Match the correlation with the plot!

Match the diagrams with the following correlations: –0.93, –0.75, –0.20, 0.27, 0.63, 1.0

Slide 27 Change of scale

These are the low and high temperatures in Boulder (CO) for the month of April 1996. The first scatter plot uses degrees Fahrenheit and the second plot uses degrees centigrade. Notice that $C = \frac{5}{9}(F - 32)$. Are the correlations between low and high temperatures in the two graphs different?

Fahrenheit plot: r = 0.74. Centigrade plot: r = ?
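A short Python check of the change-of-scale question. The temperatures below are hypothetical (they are not the Boulder readings), and `pearson_r` and `fahrenheit_to_celsius` are illustrative names. Because the conversion is linear with a positive slope, the correlation computed in centigrade should equal the one computed in Fahrenheit, up to floating-point rounding.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def fahrenheit_to_celsius(f):
    """C = 5/9 * (F - 32), as on the slide."""
    return 5 / 9 * (f - 32)

# Hypothetical daily low/high temperatures in degrees Fahrenheit (NOT the Boulder data).
low_f = [28, 31, 35, 40, 33, 45, 38]
high_f = [52, 60, 58, 71, 55, 75, 70]

low_c = [fahrenheit_to_celsius(t) for t in low_f]
high_c = [fahrenheit_to_celsius(t) for t in high_f]

r_f = pearson_r(low_f, high_f)
r_c = pearson_r(low_c, high_c)

print(f"r in Fahrenheit: {r_f:.6f}")
print(f"r in centigrade: {r_c:.6f}")
```

The same reasoning applies to the slide's data: the plot in centigrade has the same r as the plot in Fahrenheit.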

Slide 28 Different correlations?

In which diagram below is the correlation coefficient the largest? The smallest?

Slide 29 Outliers and nonlinear association

How are the data sets different?

Slide 30 For each of these: r = 0.82

Plot the data: the nature of the association between x and y is very different. The correlation coefficient can be misleading in the presence of outliers or non-linear association. Check the scatter plot of the data.

Plot annotations:

- Perfect association! Why is r not equal to 1?
- Outliers change the value of r. What would the value of r be without the outliers?
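Both warnings can be reproduced with tiny made-up data sets; the values below are hypothetical and are not the data plotted on the slide, and `pearson_r` is an illustrative helper. The first case is a perfect but curved relationship (y = x²), for which r measures no linear association at all. The second case starts from points lying exactly on a line and adds one outlier.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Case 1: perfect but nonlinear association (hypothetical data).
x_curve = [-3, -2, -1, 0, 1, 2, 3]
y_curve = [x ** 2 for x in x_curve]
r_curve = pearson_r(x_curve, y_curve)

# Case 2: points exactly on a line, then the same points plus one outlier.
x_line = [1, 2, 3, 4, 5]
y_line = [2, 4, 6, 8, 10]
r_line = pearson_r(x_line, y_line)
r_outlier = pearson_r(x_line + [6], y_line + [0])  # outlier at (6, 0)

print(f"curved, perfect association: r = {r_curve:.3f}")
print(f"points on a line:            r = {r_line:.3f}")
print(f"same line plus one outlier:  r = {r_outlier:.3f}")
```

In this sketch a single added point pulls r from 1 down to about 0.14, which is why the scatter plot should be checked before r is reported.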

Slide 31 Which of the following diagrams should be summarized by r?

Diagrams: (1), (2), (3)

Slide 32 Correlation does not mean Causation!

Slide 33 Example

Ice cream sales and crime rates have a very high correlation. Does this mean that local governments should shut down all ice cream shops?

Answer: There is another variable: temperature! As air temperatures rise, both ice cream sales and crime rates rise. Here, temperature is a lurking variable. Two variables can be related through a lurking variable even though there is no causal relation between them.

Slide 34 SCATTERPLOT and CORRELATION using Excel

Slide 35 To graph a Scatterplot

- Highlight the two data columns
- Use the Chart Wizard
- Choose: XY (Scatter)
- Follow the dialog window steps appropriately (label axes etc.)

Slide 36 Computing the Correlation coefficient

- The correlation coefficient is computed using the Correlation function in the Data Analysis ToolPak. Click on TOOLS > DATA ANALYSIS > Correlation
- Or you can use the function: = CORREL(data range X, data range Y)

Example: If the X values are in B2:B25 and the Y values are in C2:C25, the correlation between the X data and Y data is obtained as follows:

= CORREL(B2:B25, C2:C25)

Slide 37 SCATTERPLOT and CORRELATION using TI-83

Slide 38 Create the two Lists

To input data into the STAT list editor:

1. Enter STAT edit mode by pressing [STAT] [1].
2. Enter the data in the L1 and L2 lists, pressing [ENTER] after each entry.
3. Press [2nd] [MODE] to QUIT and return to the home screen.

Example: L1: {7,2,4,2,5} L2: {8,4,6,2,7}

Slide 39 Graph the ScatterPlot

1. Press [2nd] [Y=] to access the STAT PLOT editor.
2. Press [ENTER] to edit Plot1.
3. Press [ENTER] to turn ON Plot1.
4. Scroll down and highlight the scatter plot graph type (first option in the first row). Press [ENTER] to select it.
5. Scroll down and make sure Xlist: is set to L1 and Ylist: is set to L2. To input L1, press [2nd] [1]. To input L2, press [2nd] [2].
6. Press [GRAPH] to display the scatter plot. You may have to change the [WINDOW] settings to view your graph.

Slide 40 Get the Correlation Coefficient r

1. Turn on diagnostics with the DiagnosticOn command:
   - Press [2nd] [0] to open the CATALOG.
   - Scroll down to DiagnosticOn and press [ENTER] twice.
2. Press [STAT] [►] to open the CALC menu.
3. Scroll down to 4: LinReg(ax+b) and press [ENTER] twice. With diagnostics on, the output lists a, b, r², and r.
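To check the calculator output by hand, the same quantities can be computed in Python for the example lists L1 and L2 from the earlier slide. This is a minimal sketch using the standard least-squares formulas for a line y = ax + b; the function name `lin_reg` is illustrative and is not a calculator or library command.

```python
import math

def lin_reg(xs, ys):
    """Least-squares line y = a*x + b and correlation coefficient r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    a = sxy / sxx                   # slope
    b = y_bar - a * x_bar           # intercept
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient
    return a, b, r

# Example lists from the "Create the two Lists" slide.
L1 = [7, 2, 4, 2, 5]
L2 = [8, 4, 6, 2, 7]

a, b, r = lin_reg(L1, L2)
print(f"a = {a:.4f}, b = {b:.4f}, r^2 = {r * r:.4f}, r = {r:.4f}")
```

For these lists the sketch gives a ≈ 1.0556, b ≈ 1.1778, r² ≈ 0.8645 and r ≈ 0.9298, which should agree with the LinReg(ax+b) screen up to rounding.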
