Introduction to Statistics Introduction to Statistics Correlation Chapter 15 Apr 29-May 4, 2010 Classes #28-29.


1 Introduction to Statistics Introduction to Statistics Correlation Chapter 15 Apr 29-May 4, 2010 Classes #28-29

2 Correlation Chapter 15: –Correlation pp. 466-485 –Not responsible for remainder of the chapter

3 Correlation A statistical technique that is used to measure and describe a relationship between two variables –For example: GPA and TDs scored; statistics exam scores and amount of time spent studying

4 Notation A correlation requires two scores for each individual –One score from each of the two variables –They are normally identified as X and Y

5 Three characteristics of X and Y are being measured…
The direction of the relationship
–Positive or negative
The form of the relationship
–Usually linear form
The strength or consistency of the relationship
–Perfect correlation = 1.00 (either +1.00 or -1.00); no consistency would be 0.00
–Therefore, a correlation measures the degree of relationship between two variables on a scale from 0.00 to 1.00, and its sign gives the direction.

6 Assumptions There are 3 main assumptions…
–1. The dependent and independent variables are normally distributed. We can check this by looking at the histograms for the two variables.
–2. The relationship between X and Y is linear. We can check this by looking at the scattergram.
–3. The relationship is homoscedastic. We can check homoscedasticity by looking at the scattergram and observing that the data points form a “roughly symmetrical, cigar-shaped pattern” about the regression line.
If these 3 assumptions have been met, then we can use correlation and test r for significance.

7 Pearson r The most commonly used correlation Measures the degree of straight-line relationship Computation: $r = \dfrac{SP}{\sqrt{(SS_X)(SS_Y)}}$
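The Pearson computation can be sketched in a few lines of Python. This is only an illustration, assuming the usual computational formulas for SS and SP and the form r = SP / sqrt(SS_X · SS_Y); the function name `pearson_r` is made up for this example and is not from the slides.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from paired scores, via the computational formulas.

    Hypothetical helper for illustration. It divides by zero if either
    variable has no variability (SS = 0), where r is undefined anyway.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    # SS = sum of squared scores minus (sum of scores)^2 / n
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_y = sum(y * y for y in ys) - sum_y ** 2 / n
    # SP = sum of cross-products minus (sum X)(sum Y) / n
    sp = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return sp / sqrt(ss_x * ss_y)
```

Calling `pearson_r([1, 2, 3], [2, 4, 6])` should give a perfect positive correlation, and reversing the second list should flip the sign.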

8 Example 1 A researcher predicts that there is a high correlation between scores on the stats final exam (100 pts max) and scores on the university’s exit exam for graduating seniors (330 pts max)

9 Example 1

| X | Y | X² | Y² | XY |
|---|---|---|---|---|
| 30 | 160 | 900 | 25,600 | 4,800 |
| 38 | 180 | 1,444 | 32,400 | 6,840 |
| 52 | 180 | 2,704 | 32,400 | 9,360 |
| 90 | 210 | 8,100 | 44,100 | 18,900 |
| 95 | 240 | 9,025 | 57,600 | 22,800 |
| ΣX = 305 | ΣY = 970 | ΣX² = 22,173 | ΣY² = 192,100 | ΣXY = 62,700 |

10 Example 1
$$SS_X = \sum X^2 - \frac{(\sum X)^2}{n} = 22{,}173 - \frac{305^2}{5} = 22{,}173 - \frac{93{,}025}{5} = 22{,}173 - 18{,}605 = 3{,}568$$
$$SS_Y = \sum Y^2 - \frac{(\sum Y)^2}{n} = 192{,}100 - \frac{970^2}{5} = 192{,}100 - \frac{940{,}900}{5} = 192{,}100 - 188{,}180 = 3{,}920$$

11 Example 1
$$SP = \sum XY - \frac{(\sum X)(\sum Y)}{n} = 62{,}700 - \frac{(305)(970)}{5} = 62{,}700 - \frac{295{,}850}{5} = 62{,}700 - 59{,}170 = 3{,}530$$

12 Example 1
$$r = \frac{SP}{\sqrt{(SS_X)(SS_Y)}} = \frac{3{,}530}{\sqrt{(3{,}568)(3{,}920)}} = \frac{3{,}530}{\sqrt{13{,}986{,}560}} = \frac{3{,}530}{3{,}739.861} = .944$$
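The arithmetic for Example 1 can be re-checked from the five column totals alone. The sketch below assumes nothing beyond the sums reported on the slides (ΣX = 305, ΣY = 970, ΣX² = 22,173, ΣY² = 192,100, ΣXY = 62,700, n = 5); the variable names are chosen for this example.

```python
from math import sqrt

# Column totals reported for Example 1
n = 5
sum_x, sum_y = 305, 970
sum_x2, sum_y2, sum_xy = 22_173, 192_100, 62_700

ss_x = sum_x2 - sum_x ** 2 / n    # 22,173 - 18,605
ss_y = sum_y2 - sum_y ** 2 / n    # 192,100 - 188,180
sp = sum_xy - sum_x * sum_y / n   # 62,700 - 59,170

# The denominator is the square root of the product of the two SS values
r = sp / sqrt(ss_x * ss_y)

print(ss_x, ss_y, sp)   # 3568.0 3920.0 3530.0
print(round(r, 3))      # 0.944
```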

13 Pearson Correlation: “Rule of Thumb”
If r = 1.00: Perfect correlation
+.70 to +.99: Very strong positive relationship
+.40 to +.69: Strong positive relationship
+.30 to +.39: Moderate positive relationship
+.20 to +.29: Weak positive relationship
+.01 to +.19: No or negligible relationship
-.01 to -.19: No or negligible relationship
-.20 to -.29: Weak negative relationship
-.30 to -.39: Moderate negative relationship
-.40 to -.69: Strong negative relationship
-.70 to -.99: Very strong negative relationship
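As a rough illustration, the rule-of-thumb bands can be written as a small lookup function. The function name `describe_r` is hypothetical, and two choices here are assumptions rather than part of the slide: r is rounded to two decimals before it is classified, and the cases the slide does not list (r = .00 and r = -1.00) are given the nearest matching label.

```python
def describe_r(r):
    """Label a correlation using the rule-of-thumb bands (illustrative only)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and +1.00")
    size = round(abs(r), 2)   # the bands are stated to two decimals
    if size == 1.0:
        # Assumption: -1.00 is treated as perfect, like +1.00
        return "Perfect correlation"
    if size < 0.20:
        # Assumption: r = .00 is grouped with the negligible band
        return "No or negligible relationship"
    direction = "positive" if r > 0 else "negative"
    if size >= 0.70:
        strength = "Very strong"
    elif size >= 0.40:
        strength = "Strong"
    elif size >= 0.30:
        strength = "Moderate"
    else:
        strength = "Weak"
    return f"{strength} {direction} relationship"

print(describe_r(0.944))   # Very strong positive relationship
print(describe_r(-0.25))   # Weak negative relationship
```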

14 Example 1: Interpretation An r of 0.944 indicates an extremely strong relationship between scores on the stats final exam and scores on the exit exam. As scores on the stats final go up, so too do scores on the exit exam. –But we are not finished with the interpretation (see next slide)

15 Interpretation (Continued) Coefficient of Determination ($r^2$) The value $r^2$ is called the coefficient of determination because it measures the proportion of variability in one variable that can be determined from the relationship with the other variable –For example: A correlation of r = .944 means that $r^2$ = .891, so 89.1% of the variability in the Y scores can be predicted from the relationship with the X scores

16 Coefficient of Determination ($r^2$) and Interpret: The coefficient of determination is $r^2$ = .891. Scores on the stats final exam, by themselves, account for 89.1% of the variation in the exit exam scores.
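For readers who want to see where .891 comes from, here is a short worked check. It assumes the Example 1 values SP = 3,530, SS_X = 3,568 and SS_Y = 3,920 computed on the earlier slides, and simply squares the Pearson formula:

```latex
r^2 = \frac{SP^2}{(SS_X)(SS_Y)}
    = \frac{3{,}530^2}{(3{,}568)(3{,}920)}
    = \frac{12{,}460{,}900}{13{,}986{,}560}
    \approx .891
```

Squaring the rounded value r = .944 gives the same result to three decimals.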

17 Example 2 A researcher predicts that there is a high correlation between years of education and voter turnout –She chooses Alamosa, Boston, Chicago, Detroit, and NYC to test her theory

18 Example 2 The scores on each variable are displayed in table format:
–Y = % Turnout
–X = Years of Education

| City | X | Y |
|---|---|---|
| Alamosa | 11.9 | 55 |
| Boston | 12.1 | 60 |
| Chicago | 12.7 | 65 |
| Detroit | 12.8 | 68 |
| NYC | 13.0 | 70 |

19 Scatterplot The relationship between X and Y is linear.

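20 Make a Computational Table

| X | Y | X² | Y² | XY |
|---|---|---|---|---|
| 11.9 | 55 | 141.61 | 3025 | 654.5 |
| 12.1 | 60 | 146.41 | 3600 | 726 |
| 12.7 | 65 | 161.29 | 4225 | 825.5 |
| 12.8 | 68 | 163.84 | 4624 | 870.4 |
| 13.0 | 70 | 169 | 4900 | 910 |
| ∑X = 62.5 | ∑Y = 318 | ∑X² = 782.15 | ∑Y² = 20374 | ∑XY = 3986.4 |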

21 Example 2
$$SS_X = \sum X^2 - \frac{(\sum X)^2}{n} = 782.15 - \frac{62.5^2}{5} = 782.15 - \frac{3906.25}{5} = 782.15 - 781.25 = 0.90$$
$$SS_Y = \sum Y^2 - \frac{(\sum Y)^2}{n} = 20374 - \frac{318^2}{5} = 20374 - \frac{101124}{5} = 20374 - 20224.80 = 149.20$$

22 Example 2
$$SP = \sum XY - \frac{(\sum X)(\sum Y)}{n} = 3986.40 - \frac{(62.5)(318)}{5} = 3986.40 - \frac{19875}{5} = 3986.40 - 3975.00 = 11.40$$

23 Example 2: Find Pearson r
$$r = \frac{SP}{\sqrt{(SS_X)(SS_Y)}} = \frac{11.4}{\sqrt{(0.9)(149.2)}} = \frac{11.4}{\sqrt{134.28}} = \frac{11.4}{11.58} = .984$$
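One way to cross-check Example 2 is to recompute SS and SP from deviation scores (each score minus its mean) instead of the computational formulas used on the slides. The two forms are algebraically equivalent, so they should agree up to rounding. The sketch below is illustrative only; the variable names are chosen for this example.

```python
from math import sqrt

x = [11.9, 12.1, 12.7, 12.8, 13.0]   # years of education
y = [55, 60, 65, 68, 70]             # % turnout

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Deviation-score (definitional) forms of SS and SP
ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)
sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

r = sp / sqrt(ss_x * ss_y)
print(round(ss_x, 2), round(ss_y, 2), round(sp, 2), round(r, 3))
# 0.9 149.2 11.4 0.984
```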

24 Example 2: Interpretation An r of 0.984 indicates an extremely strong relationship between years of education and voter turnout for these five cities. As level of education increases, % turnout increases. –But we are not finished with the interpretation (see next slide)

25 Coefficient of Determination ($r^2$) and Interpret: The coefficient of determination is $r^2$ = .968. Education, by itself, accounts for 96.8% of the variation in voter turnout.

26 Pearson’s r Had the correlation between % college educated and turnout been r = .32: –This relationship would have been positive and weak to moderate. Had the correlation between % college educated and turnout been r = -.12: –This relationship would have been negative and weak.

27 Hypothesis Testing with Pearson
We can have a two-tailed hypothesis:
–$H_0: \rho = 0.0$
–$H_1: \rho \neq 0.0$
We can have a one-tailed hypothesis:
–$H_0: \rho = 0.0$
–$H_1: \rho < 0.0$ (or $\rho > 0.0$)
Note that ρ (rho) is the population parameter, while r is the sample statistic

28 Find r critical See Table B.6 (page 537) –You need to know the alpha level –You need to know the sample size –See that we always will use: df = n-2

29 Find r calculated See previous slides for formulas

30 Make your decision… If r calculated < r critical, then retain $H_0$. If r calculated > r critical, then reject $H_0$.
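The decision rule can be sketched in Python for a two-tailed test with n = 5 (df = 3) at alpha = .05. Rather than quoting the textbook's table, this sketch derives r critical from the standard t critical value for df = 3 (3.182) using the identity r = t / sqrt(t² + df). That route, the function names, and the choice to compare the magnitude of r are assumptions made for illustration.

```python
from math import sqrt

def r_critical(t_crit, df):
    """Convert a t critical value into the equivalent r critical value."""
    return t_crit / sqrt(t_crit ** 2 + df)

def decide(r_calc, r_crit):
    """Two-tailed decision: compare the magnitude of r with r critical."""
    return "Reject H0" if abs(r_calc) > r_crit else "Retain H0"

n = 5
df = n - 2        # df = n - 2 for a correlation
t_crit = 3.182    # standard two-tailed t critical value, alpha = .05, df = 3
r_crit = r_critical(t_crit, df)

print(round(r_crit, 3))        # 0.878
print(decide(0.944, r_crit))   # Reject H0
```

With only five pairs of scores, r has to exceed roughly .88 in magnitude to be significant at the .05 level, which both worked examples do.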

31 Always include a brief summary of your results: Was it positive or negative? Was it significant ? Explain the correlation Explain the variation –Coefficient of Determination (r 2 )

32 Credits http://campus.houghton.edu/orgs/psychology/stat15b.ppt#267,2,Review http://publish.uwo.ca/~pakvis/Interval.ppt#276,17,Practical Example using Healey P. 418 Problem 15.1

