Chapter 7 Calculation of Pearson Coefficient of Correlation, r and testing its significance.


From previous lecture:

$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} \qquad SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}$$

$$b = \frac{SS_{xy}}{SS_{xx}} \quad \text{and} \quad a = \bar{y} - b\,\bar{x}$$

Today's lecture: we are going to calculate the correlation coefficient of the two variables, x and y, called the Pearson Product Moment Correlation Coefficient, r.

The values of SSxy, SSxx and SSyy can also be obtained by using the following basic formulas:

$$SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) \qquad SS_{xx} = \sum (x - \bar{x})^2 \qquad SS_{yy} = \sum (y - \bar{y})^2$$

But these formulas take longer to work through, since you first have to calculate the means $\bar{x}$ and $\bar{y}$.

NOTE: $\bar{x}$ and $\bar{y}$ (the letters x and y with a line on top) denote the means of x and y.
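
The agreement between the shortcut formulas and the basic (deviation) formulas can be checked with a short Python sketch. The function names and the five data pairs are made up for illustration; only the formulas themselves come from the lecture.

```python
# Illustrative sketch: function names and data are hypothetical.

def ss_shortcut(x, y):
    """Return (SSxx, SSyy, SSxy) from the raw sums (shortcut formulas)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_xx = sum(v * v for v in x) - sum_x ** 2 / n
    ss_yy = sum(v * v for v in y) - sum_y ** 2 / n
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    return ss_xx, ss_yy, ss_xy


def ss_basic(x, y):
    """Return (SSxx, SSyy, SSxy) from deviations about the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    ss_xx = sum((v - mean_x) ** 2 for v in x)
    ss_yy = sum((v - mean_y) ** 2 for v in y)
    ss_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return ss_xx, ss_yy, ss_xy


# Made-up data, purely for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

shortcut = ss_shortcut(x, y)
basic = ss_basic(x, y)
print(shortcut)
print(basic)
```

Both functions should return the same three values up to floating-point rounding; the shortcut version simply avoids computing the means first.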

Pearson Product Moment Correlation Coefficient, r

r measures the strength of the relationship between two variables, x and y.

Examples of different strengths of relationships between variables x and y:
- Strong positive correlation
- Weak positive correlation
- Weak negative correlation
- Strong negative correlation

What is the correlation coefficient of the scatterplot below? The value of r ranges from -1 to +1.

Pearson Product Moment Coefficient of Correlation, r, is given by:

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}$$

Example: Calculate the Pearson Product Moment Coefficient of Correlation, r, to show the relationship between Maths and Science marks for Form 5A:

Maths    Science
35       9
49       15
21       7
39       11
15       5
28       8
25       9

STEP 1: Calculate Σx, Σy, Σxy, Σx² and Σy²

Maths, x    Science, y    xy      x²      y²
35          9             315     1225    81
49          15            735     2401    225
21          7             147     441     49
39          11            429     1521    121
15          5             75      225     25
28          8             224     784     64
25          9             225     625     81

Σx = 212    Σy = 64    Σxy = 2150    Σx² = 7222    Σy² = 646

STEP 2: Calculate SSxy, SSxx and SSyy

$$SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 2150 - \frac{(212)(64)}{7} = 211.7143$$

$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 7222 - \frac{(212)^2}{7} = 801.4286$$

$$SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 646 - \frac{(64)^2}{7} = 60.8571$$
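
Steps 1 and 2 can be reproduced with a few lines of Python. This is only a sketch: the variable names are chosen here for illustration, while the marks are the Form 5A data from the example.

```python
# Form 5A marks from the worked example.
maths = [35, 49, 21, 39, 15, 28, 25]    # x
science = [9, 15, 7, 11, 5, 8, 9]       # y
n = len(maths)

# STEP 1: the five sums.
sum_x = sum(maths)
sum_y = sum(science)
sum_xy = sum(x * y for x, y in zip(maths, science))
sum_x2 = sum(x * x for x in maths)
sum_y2 = sum(y * y for y in science)

# STEP 2: the sums of squares from the shortcut formulas.
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n
ss_yy = sum_y2 - sum_y ** 2 / n

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(round(ss_xy, 4), round(ss_xx, 4), round(ss_yy, 4))
```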

STEP 3: Substitute into the formula for r:

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}} = \frac{211.7143}{\sqrt{(801.4286)(60.8571)}} = .96$$

The linear correlation coefficient is .96 (rounded to 2 decimal places).

Interpretation: Maths and Science marks are strongly and positively correlated. The square of the correlation, called the coefficient of determination, r² = (.96)² = .92, indicates that Maths marks account for about 92% of the variance of the Science marks in this case.
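
The substitution in Step 3 can be checked numerically. The sketch below takes the three sums of squares from Step 2 as given; the variable names are illustrative only.

```python
import math

# Sums of squares from STEP 2 of the worked example.
ss_xy = 211.7143
ss_xx = 801.4286
ss_yy = 60.8571

# Pearson r and the coefficient of determination r squared.
r = ss_xy / math.sqrt(ss_xx * ss_yy)
r_squared = r ** 2

print(round(r, 2), round(r_squared, 2))
```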

STEP 4: Test the significance of the r obtained, starting from the null hypothesis that there is no significant relationship between Maths and Science scores. To test the significance of the r value, you first need to set the level of significance you wish to test at, say 1% (p < .01). You can test a hypothesis about the population correlation coefficient ρ using the sample correlation coefficient r. We can use the t distribution to make this test:

$$t = r\sqrt{\frac{n-2}{1-r^2}}$$

where n − 2 is the number of degrees of freedom.

The null hypothesis is that the linear correlation coefficient between the 2 variables is zero, that is, ρ = 0. The alternative hypothesis can be one of the following:
- the linear correlation coefficient between the 2 variables is less than zero, ρ < 0
- the linear correlation coefficient between the 2 variables is more than zero, ρ > 0
- the linear correlation coefficient between the 2 variables is not equal to zero, ρ ≠ 0

State the null and alternative hypotheses (ρ is the population correlation coefficient):
H0: ρ = 0 (The linear correlation coefficient is zero in the population)
H1: ρ > 0 (The linear correlation coefficient is positive in the population), which means a one-tailed test

(We test H1: ρ > 0 only when it is impossible for the correlation to be negative. Otherwise we have to test H1: ρ ≠ 0, a two-tailed test that covers both positive and negative correlations.)

STEP 5: Select the distribution to use
Both variables are assumed to be normally distributed in the population. Hence, we can use the t distribution to perform this test about the linear correlation coefficient.

STEP 6: Determine the rejection and nonrejection regions
The significance level you have chosen for this test is 1%. From the alternative hypothesis, we know that the test is right-tailed. Hence:
Area in the right tail of the t distribution = .01
df = n − 2 = 7 − 2 = 5
From the t distribution table, the critical value of t is 3.365.
The rejection and nonrejection regions for this test are: do not reject H0 for values of t up to the critical value 3.365; reject H0 for values of t above 3.365.

STEP 7: Calculate the value of the test statistic, t

$$t = r\sqrt{\frac{n-2}{1-r^2}} = .96\sqrt{\frac{7-2}{1-(.96)^2}} = 7.667$$

STEP 8: Make a decision
The value of the test statistic, t = 7.667, is greater than the critical value of t = 3.365, so it falls in the rejection region. Hence, we reject the null hypothesis and conclude that there is a significant, positive linear relationship between Maths and Science marks.
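
The last two steps can be sketched in Python as follows. The function name is hypothetical, and the critical value 3.365 is simply the table value quoted in the lecture (df = 5, right-tail area .01) rather than something the code computes.

```python
import math


def t_statistic(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r = 0.96            # sample correlation, rounded as in the lecture
n = 7               # number of students
t_critical = 3.365  # from the t table: df = 5, right-tail area .01

t_observed = t_statistic(r, n)
reject_h0 = t_observed > t_critical  # right-tailed test

print(round(t_observed, 3), reject_h0)
```

Using the unrounded r instead of .96 gives a slightly smaller t, but the decision is the same.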

Hypothesis
A hypothesis is a specific statement about an aspect of the population, e.g. its mean or its variance.
A null hypothesis is a specific statement that there is "no effect" or "no difference" between two situations.
E.g. There is no effect of the treatment on students' motivation.
Or: There are no gender differences in Mathematics scores.

Alternative Hypothesis
An alternative hypothesis is the opposite of the null hypothesis.
E.g. There is a relationship between academic achievement and motivation (a two-tailed hypothesis).
A one-tailed hypothesis tests in one direction only.
E.g. Boys are better in Mathematics than girls.

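A hypothesis is a statement about the POPULATION and not the sample. You cannot write a hypothesis in terms of a sample statistic, for example a null hypothesis about the sample mean $\bar{x}$. This is not correct, since the sample mean can be measured accurately. We need a hypothesis about the population mean μ, which can only be estimated.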

Hypothesis Testing
1. State the null and alternative hypotheses
2. Select the distribution to use
3. Determine the rejection and nonrejection regions
4. Calculate the value of the test statistic
5. Make a decision

Hypothesis Testing: Example using Correlation

STEP 1: State the null and alternative hypotheses
H0: ρ = 0 (The linear correlation coefficient is zero in the population)
H1: ρ > 0 (The linear correlation coefficient is positive in the population), which means a one-tailed test
Or
H1: ρ ≠ 0 (Two possibilities, ρ > 0 or ρ < 0, which means a two-tailed test)
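
How the choice of alternative hypothesis changes the rejection rule can be sketched as a small Python helper. The function name and the labels "greater", "less" and "two-sided" are made up for illustration. The helper assumes the caller supplies the positive table value: the value with area α in one tail for a one-tailed test, or area α/2 in each tail for a two-tailed test.

```python
def reject_null(t_observed, t_critical, alternative):
    """Return True if H0: rho = 0 is rejected (hypothetical helper)."""
    if alternative == "greater":       # H1: rho > 0, right-tailed
        return t_observed > t_critical
    if alternative == "less":          # H1: rho < 0, left-tailed
        return t_observed < -t_critical
    if alternative == "two-sided":     # H1: rho != 0, two-tailed
        return abs(t_observed) > t_critical
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


# Lecture example: t = 7.667, critical value 3.365 (df = 5, right-tail area .01).
decision = reject_null(7.667, 3.365, "greater")
print(decision)
```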

STEP 2: Select the distribution to use
Both variables are assumed to be normally distributed in the population. Hence, we can use the t distribution to perform this test about the linear correlation coefficient.

STEP 3: Determine the rejection and nonrejection regions
The significance level you have chosen for this test is 1%. From the alternative hypothesis, we know that the test is right-tailed. Hence:
Area in the right tail of the t distribution = .01
df = n − 2 = 7 − 2 = 5
From the t distribution table, the critical value of t is 3.365.
The rejection and nonrejection regions for this test are: do not reject H0 for values of t up to the critical value 3.365; reject H0 for values of t above 3.365.

STEP 4: Calculate the value of the test statistic, t

$$t = r\sqrt{\frac{n-2}{1-r^2}} = .96\sqrt{\frac{7-2}{1-(.96)^2}} = 7.667$$

STEP 5: Make a decision
The value of the test statistic, t = 7.667, is greater than the critical value of t = 3.365, so it falls in the rejection region. Hence, we reject the null hypothesis and conclude that there is a significant, positive linear relationship between Maths and Science marks.

Another way of calculating r – using the standard score method

School    Class Size         Achievement Test     Cross-Product
          x       Zx         y       Zy           Zx·Zy
A         25      0.15       80      0.00         0.00
B         14      -1.53      98      1.10         -1.68
C         33      1.38       50      -1.84        -2.54
D         28      0.61       82      0.12         0.07
E         20      -0.61      90      0.61         -0.37
Total     120                400                  -4.52
Mean      24                 80
s         6.54               16.30

The correlation is the mean of the cross-products: r = Σ(Zx·Zy) / n = -4.52 / 5 = -.90.
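
The standard score method can be sketched in Python as below. Two assumptions are made here: the standard deviation divides by n (this reproduces s = 6.54 and s = 16.30 in the table), and r is taken as the mean of the cross-products of the z-scores. The function name is illustrative.

```python
import math

# Data for schools A to E from the table.
class_size = [25, 14, 33, 28, 20]     # x
achievement = [80, 98, 50, 82, 90]    # y


def z_scores(values):
    """Standard scores using the standard deviation that divides by n."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


zx = z_scores(class_size)
zy = z_scores(achievement)

# r is the average cross-product of the paired standard scores.
cross_products = [a * b for a, b in zip(zx, zy)]
r = sum(cross_products) / len(cross_products)

print(round(r, 2))
```

Small differences from the hand calculation come from rounding each z-score to two decimals before multiplying.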

Rejection region: reject H0 if the obtained r lies beyond the critical r; otherwise do not reject H0.
r critical = -.878
r obtained = -.90
Decision: significant at p < .05. There is a significant negative relationship between Class Size and Achievement Test (r = -.90, p < .05).

Another method of calculating r – using the computational formula

Exercise

1) Explain the following concepts. You may use graphs to illustrate each concept.
a) Perfect positive linear correlation
b) Perfect negative linear correlation
c) Strong positive linear correlation
d) Strong negative linear correlation
e) Weak positive linear correlation
f) Weak negative linear correlation
g) No linear correlation

2) For a sample data set, the linear correlation coefficient r has a positive value. Which of the following is true about the slope b of the regression line estimated for the same sample data?
a) The value of b will be positive
b) The value of b will be negative
c) The value of b can be positive or negative

3) A population data set produced the following information.
N = 250, Σx = 9880, Σy = 1456, Σxy = 85,080, Σx² = 485,870 and Σy² = 135,675
Find the linear correlation coefficient ρ.
Ans: 0.25

4) A sample data set produced the following information.
N = 10, Σx = 100, Σy = 220, Σxy = 3680, Σx² = 1140 and Σy² = 25,272
a) Find the linear correlation coefficient r.
b) Using the 5% significance level, can you conclude that ρ is different from zero?

5) A sample data set produced the following information.
N = 12, Σx = 66, Σy = 588, Σxy = 2244, Σx² = 396 and Σy² = 58,734
a) Find the linear correlation coefficient r.
b) Using the 5% significance level, can you conclude that ρ is negative?

6) The data on ages (in years) and prices (in hundreds of dollars) for eight cars of a specific model are shown below:

Age      8     3     6     9     2     5     6     3
Price    18    94    50    21    145   42    36    99

a) Do you expect the ages and prices of cars to be positively or negatively related? Explain.
b) Calculate the linear correlation coefficient.
c) Test at the 5% significance level whether ρ is negative.

7) The following table lists the midterm and final exam scores for 7 students in a statistics class.

Midterm score       79    95    81    66    87    94    59
Final exam score    85    97    78    76    94    84    67

a) Do you expect the midterm and final exam scores to be positively or negatively correlated?
b) Plot a scatter diagram. By looking at the scatter diagram, do you expect the correlation coefficient between these 2 variables to be close to zero, 1, or -1?
c) Find the correlation coefficient. Is the value of r consistent with what you expected in parts a and b?
d) Using the 1% significance level, test whether the linear correlation coefficient is positive.