Association between two variables Example: University fees for the Big Ten Universities Data were collected to study the association between the percentage.


1 Association between two variables
Example: University fees for the Big Ten Universities
Data were collected to study the association between the percentage of students that were from out of state and the tuition paid by nonresident students (in thousand dollars). Does the tuition money increase with the percentage of nonresident students? (Does the percentage of nonresident students increase with the tuition money?)

University        Tuition (1,000$) (Y)   Nonresidents (%) (X)
Northwestern      16.4                   72
Illinois           7.6                    8
Minnesota          8.7                   23
Ohio State         9.3                    9
Penn State        10.7                   18
Purdue             9.6                   27
Indiana           10.2                   29
Iowa               8.6                   31
Wisconsin          9.1                   35
Michigan          15.9                   30
Michigan State    10.5                    9

2 Example: Size of diamond and price of ring
The source of the data is a full-page advertisement placed in the Straits Times newspaper issue of February 29, 1992, by a Singapore-based retailer of diamond jewelry. The variables are the size of the diamond in carats (1 carat = 0.2 gram) and the price of ladies' rings (single diamond stone) in Singapore dollars.

Carats   Singapore dollars
0.17     355
0.16     328
0.17     350
0.18     325
0.25     642
…        …

How would you describe the association between the two variables?

3 Association between variables
Data are pairs (x_i, y_i) collected for two variables X and Y on each individual/unit.
Two variables are associated if changes in one variable correspond to changes in the second variable.
If there is a strong association, knowing one variable helps predict the other. Examples: diamond carat size and ring price; blood pressure level and number of cigarettes smoked per day.
If the association is weak, information about one variable is not very useful in studying the other.
In neither case is there any implied causality.

4 Useful terminology
The following terms are often used:
Response variable (dependent variable): measures the outcome of the study.
Explanatory variable (independent variable): explains or causes changes in the response variable.
Can you identify this distinction in the examples shown earlier?
1) Tuition = response variable; Nonresidents = explanatory variable
2) Carat = explanatory variable; Price = response variable
In this case, knowledge of the data may lead us to believe there is a causal relationship.

5 Scatter plots: displaying data about two variables
Scatter plots show the relationship between two quantitative variables. The independent variable appears on the x-axis (horizontal axis) and the dependent variable appears on the y-axis (vertical axis). Each observation is represented by a point in the plot.
[Scatter plot: tuition against nonresident students, with the points for Northwestern (NWU) and Michigan (UMich) labeled]

6 Interpreting scatter plots
1. Look for the overall pattern and for striking deviations.
2. Define the form, direction and strength of the relationship:
   a. Form: roughly linear if the points follow a straight line, or nonlinear
   b. Direction: positive or negative?
   c. Strength: how closely the points follow a clear form
3. Check for the presence of outliers, individual values that fall outside the overall pattern.
4. Two variables are positively (negatively) associated if an increase in one variable corresponds to an increase (decrease) in the other variable.

7 2000 Presidential Elections Did the butterfly ballots confuse voters? Did voters for Al Gore instead cast their votes for other candidates? Bush spokesman Ari Fleischer stated on Nov. 9 that "Palm Beach County is a Pat Buchanan stronghold and that's why Pat Buchanan received 3,407 votes there." What is the level of support that Pat Buchanan enjoys in Palm Beach County? The published election results show the association between the vote totals for Pat Buchanan and the total population for Florida counties.

8 Is the association positive or negative? Is the form of the relationship almost linear?

9 The Correlation Coefficient r
The correlation coefficient r measures the direction and the strength of the linear relationship between two variables. It is a value between –1 and 1.
If r is negative, Y tends to decrease linearly with X.
If r is positive, Y tends to increase linearly with X.
The closer r is to 1 or –1, the stronger the linear association is. Values of r close to 0 imply weak linear association.
r is defined as

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

where X has average \bar{x} and standard deviation s_x, Y has average \bar{y} and standard deviation s_y, and n is the number of pairs (x_i, y_i).
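The definition of r can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: it assumes the usual sample form of r (the average product of standardized values, divided by n - 1, with sample standard deviations), the function name `correlation` is hypothetical, and the Big Ten values are an assumed reading of the flattened table on slide 1.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation: sum of products of standardized values / (n - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations s_x and s_y (divide by n - 1).
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    # Standardize each value, multiply pairwise, and average over n - 1.
    return sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


# Big Ten data as read from slide 1 (assumed parse of the flattened table).
nonresident_pct = [72, 8, 23, 9, 18, 27, 29, 31, 35, 30, 9]   # X, in %
tuition_k = [16.4, 7.6, 8.7, 9.3, 10.7, 9.6, 10.2, 8.6, 9.1, 15.9, 10.5]  # Y, 1,000$

r = correlation(nonresident_pct, tuition_k)
print(f"r = {r:.3f}")
```

Swapping the two arguments gives the same value, since each term of the sum is a product of one standardized x and one standardized y.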

10 Examples of correlation
[Scatter plot: birth rate (per 1,000 population) and log G.N.P.; r = -0.74, negative association]
[Scatter plot: selling price (100$) and annual taxes ($); r = 0.65, positive association]

11 Diamond rings data
Carats vs. price, N = 48

           Average   s.d.    Min    Max
X Carat    0.20      0.056   0.12   0.35
Y Price    865.144213.643851879

Strong positive association: r = 0.989
[Scatter plot: carat and price]

12 Positive Correlation In each plot there are 100 points. The correlation coefficient measures the amount of clustering around a line. If r is close to 1, the points lie close to a straight line.

13 Negative Correlation Negative correlation: as x increases, y tends to decrease.

14 Guess the correlation Match the diagrams with the following correlations: -0.93, -0.75, -0.20, 0.27, 0.63, 1.0

15 Different correlations? In which diagram below is the correlation coefficient the largest? The smallest?

16 Summary
• The correlation coefficient r varies between –1 and 1. If r = 0, there is no linear association between X and Y.
• Positive r indicates positive association between X and Y. Negative r indicates negative association between X and Y.
• Both variables X and Y must be quantitative. The correlation coefficient between X and Y is the same as the correlation between Y and X.
• The correlation measures only the linear relationship between two variables.
• r can be strongly affected by the presence of outliers.
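Three of these properties can be checked numerically. The sketch below is illustrative only: the data are made up for the demonstration, the helper name `pearson_r` is hypothetical, and it uses the algebraically equivalent form r = S_xy / sqrt(S_xx * S_yy) rather than standardizing each value first.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation via sums of squares and cross-products."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Made-up, nearly linear data (not from the slides).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# 1) Symmetry: the correlation of X with Y equals that of Y with X.
r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)

# 2) Only linear association is measured: y = x^2 on a symmetric range
#    is a perfect nonlinear relationship, yet the cross-products cancel.
x_sym = [-2, -1, 0, 1, 2]
y_sq = [v ** 2 for v in x_sym]
r_curve = pearson_r(x_sym, y_sq)

# 3) Outliers: one extreme point added to the nearly linear data
#    is enough to change the sign of r.
r_with_outlier = pearson_r(x + [20], y + [0.0])

print(f"r(x, y) = {r_xy:.3f}, r(y, x) = {r_yx:.3f}")
print(f"r for y = x^2: {r_curve:.3f}")
print(f"r with one outlier: {r_with_outlier:.3f}")
```

The second case shows why r = 0 should be read as "no linear association" and not as "no association".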

17 Compute correlation in Excel The correlation coefficient can be computed with the Correlation tool in the Data Analysis ToolPak: click TOOLS > DATA ANALYSIS > Correlation. Alternatively, use the worksheet function "=CORREL(data range X, data range Y)". For instance, if the X values are in B2:B25 and the Y values are in C2:C25: =CORREL(B2:B25, C2:C25)

