 # STATISTICS ELEMENTARY C.M. Pascual

Chapter 9 Correlation and Regression C.M. Pascual

Chapter 9 Correlation and Regression
9-1 Overview
9-2 Correlation
9-3 Regression
9-4 Variation and Prediction Intervals
9-5 Multiple Regression
9-6 Modeling

9-1 Overview
Paired data:
Is there a relationship?
If so, what is the equation?
Use the equation for prediction.
(page 506 of text)

9-2 Correlation

Definition Correlation
exists between two variables when one of them is related to the other in some way

Assumptions
1. The sample of paired data (x, y) is a random sample.
2. The pairs of (x, y) data have a bivariate normal distribution.
(page 507 of text)
Instructor note: Explain to students the difference between the "paired" data of this chapter and the investigation of two groups of data in Chapter 8.

Definition Scatterplot (or scatter diagram)
is a graph in which the paired (x, y) sample data are plotted with a horizontal x axis and a vertical y axis. Each individual (x, y) pair is plotted as a single point.
Instructor note: Relate a scatter plot to the algebraic plotting of number pairs (x, y).

Scatter Diagram of Paired Data
Page 507 of text

Positive Linear Correlation
Figure: Scatter plots showing (a) positive, (b) strong positive, and (c) perfect positive linear correlation (page 508 of text).

Negative Linear Correlation
Figure: Scatter plots showing (d) negative, (e) strong negative, and (f) perfect negative linear correlation.

No Linear Correlation
Figure: Scatter plots showing (g) no correlation and (h) nonlinear correlation.
Instructor note: Emphasize that graph (h) does have a correlation, just not a linear one. Other types of correlation, such as (h), will be briefly discussed in Section 9-6.

Linear Correlation Coefficient r
Definition: The linear correlation coefficient r measures the strength of the linear relationship between paired x and y values in a sample (page 509 of text).

$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\left(\sum x^2\right) - \left(\sum x\right)^2}\;\sqrt{n\left(\sum y^2\right) - \left(\sum y\right)^2}} \qquad \text{(Formula 9-1)}$$

Calculators can compute r.
ρ (rho) is the linear correlation coefficient for all paired data in the population.

Notation for the Linear Correlation Coefficient
n = number of pairs of data presented
Σ denotes the addition of the items indicated.
Σx denotes the sum of all x values.
Σx² indicates that each x score should be squared and then those squares added.
(Σx)² indicates that the x scores should be added and the total then squared.
Σxy indicates that each x score should be first multiplied by its corresponding y score. After obtaining all such products, find their sum.
r represents the linear correlation coefficient for a sample.
ρ represents the linear correlation coefficient for a population.
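The difference between Σx² and (Σx)² is easy to miss, so a short sketch may help. It uses the age and blood pressure pairs from Example 1 of this chapter; the variable names are illustrative choices, not notation from the text.

```python
# Paired data from Example 1 in this chapter (age x, pressure y).
x = [43, 48, 56, 61, 67, 70]
y = [128, 120, 135, 143, 141, 152]

n = len(x)                                  # number of pairs
sum_x = sum(x)                              # Σx: add the x values
sum_x_sq = sum(v * v for v in x)            # Σx²: square each x, then add
sq_sum_x = sum_x ** 2                       # (Σx)²: add the x values, then square
sum_xy = sum(a * b for a, b in zip(x, y))   # Σxy: multiply each pair, then add

print(n, sum_x, sum_x_sq, sq_sum_x, sum_xy)  # 6 345 20399 119025 47634
```

Here Σx² = 20399 while (Σx)² = 119025, which shows why the order of squaring and adding matters in Formula 9-1.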

Rounding the Linear Correlation Coefficient r
Round to three decimal places so that it can be compared to critical values in Table A-6.
Use a calculator or computer if possible.
(page 510 of text)

Interpreting the Linear Correlation Coefficient
If the absolute value of r exceeds the value in Table A-6, conclude that there is a significant linear correlation. Otherwise, there is not sufficient evidence to support the conclusion of significant linear correlation. (page 511 of text)
Instructor note: Discussion should be held regarding what value r needs to be in order to have a significant linear correlation.

TABLE A-6 Critical Values of the Pearson Correlation Coefficient r

| n | α = .05 | α = .01 |
|---|---|---|
| 4 | .950 | .999 |
| 5 | .878 | .959 |
| 6 | .811 | .917 |
| 7 | .754 | .875 |
| 8 | .707 | .834 |
| 9 | .666 | .798 |
| 10 | .632 | .765 |
| 11 | .602 | .735 |
| 12 | .576 | .708 |
| 13 | .553 | .684 |
| 14 | .532 | .661 |
| 15 | .514 | .641 |
| 16 | .497 | .623 |
| 17 | .482 | .606 |
| 18 | .468 | .590 |
| 19 | .456 | .575 |
| 20 | .444 | .561 |
| 25 | .396 | .505 |
| 30 | .361 | .463 |
| 35 | .335 | .430 |
| 40 | .312 | .402 |
| 45 | .294 | .378 |
| 50 | .279 | .361 |
| 60 | .254 | .330 |
| 70 | .236 | .305 |
| 80 | .220 | .286 |
| 90 | .207 | .269 |
| 100 | .196 | .256 |

Table A-6 is found on the Formula and Tables insert of the text as well as in the Appendix, page 778.

Example 1 Construct a scatter plot for the given data.

| Age, x | 43 | 48 | 56 | 61 | 67 | 70 |
|---|---|---|---|---|---|---|
| Pressure, y | 128 | 120 | 135 | 143 | 141 | 152 |

Example 1 Solution: Draw and label the x and y axes. Plot each (x, y) point on the graph.

Example 2 A statistics professor at a state university wants to see how strong the relationship is between a student's score on a test and his or her grade point average. The data obtained from the sample follow:

| Test score, x | 98 | 105 | 100 | 100 | 106 | 95 | 116 | 112 |
|---|---|---|---|---|---|---|---|---|
| GPA, y | 2.1 | 2.4 | 3.2 | 2.7 | 2.2 | 2.3 | 3.8 | 3.4 |

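| Subject | Test score, x | GPA, y | xy | x² | y² |
|---|---|---|---|---|---|
| 1 | 98 | 2.1 | 205.8 | 9604 | 4.41 |
| 2 | 105 | 2.4 | 252 | 11025 | 5.76 |
| 3 | 100 | 3.2 | 320 | 10000 | 10.24 |
| 4 | 100 | 2.7 | 270 | 10000 | 7.29 |
| 5 | 106 | 2.2 | 233.2 | 11236 | 4.84 |
| 6 | 95 | 2.3 | 218.5 | 9025 | 5.29 |
| 7 | 116 | 3.8 | 440.8 | 13456 | 14.44 |
| 8 | 112 | 3.4 | 380.8 | 12544 | 11.56 |
| SUM | 832 | 22.1 | 2321.1 | 86890 | 63.83 |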

Example 2 Solve for SSxy, SSxx, and SSyy:
SSxy = ∑xy – [(∑x)(∑y)]/n = 2321.1 – [(832)(22.1)]/8 = 22.7
SSxx = ∑x² – (∑x)²/n = 86890 – (832)²/8 = 362
SSyy = ∑y² – (∑y)²/n = 63.83 – (22.1)²/8 = 2.78

Example 2 Substitute in the formula and solve for r:
r = SSxy/(SSxx · SSyy)^0.5 = 22.7/[(362)(2.78)]^0.5 = 0.716
The correlation coefficient suggests a strong positive relationship between the test score and the grade point average.
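The arithmetic in Example 2 can be checked with a short sketch. The function name `sums_of_squares` is an illustrative choice, and the test score for subject 4 is taken as 100, an inference from that row's xy = 270 and y = 2.7 rather than a value printed in the data listing.

```python
from math import sqrt

# Example 2 data. Subject 4's test score (100) is inferred from
# xy = 270 and y = 2.7 in the worked table; treat it as an assumption.
test_score = [98, 105, 100, 100, 106, 95, 116, 112]
gpa = [2.1, 2.4, 3.2, 2.7, 2.2, 2.3, 3.8, 3.4]


def sums_of_squares(x, y):
    """Return (SSxy, SSxx, SSyy) using the computational formulas."""
    n = len(x)
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    ss_xx = sum(a * a for a in x) - sum(x) ** 2 / n
    ss_yy = sum(b * b for b in y) - sum(y) ** 2 / n
    return ss_xy, ss_xx, ss_yy


ss_xy, ss_xx, ss_yy = sums_of_squares(test_score, gpa)
r = ss_xy / sqrt(ss_xx * ss_yy)

# Rounded the same way as the worked example.
print(round(ss_xy, 1), round(ss_xx, 1), round(ss_yy, 2), round(r, 3))
```

With these inputs the sketch reproduces the slide's values of 22.7, 362, 2.78, and r = 0.716.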

Properties of the Linear Correlation Coefficient r
1. -1 ≤ r ≤ 1
2. The value of r does not change if all values of either variable are converted to a different scale.
3. The value of r is not affected by the choice of x and y. Interchange x and y and the value of r will not change.
4. r measures the strength of a linear relationship.
(page 512 of text)
Instructor note: If using a graphics calculator for demonstration, it will be an easy exercise to switch the x and y values to show that the value of r will not change.
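For instructors without a graphics calculator, the scale and interchange properties can be demonstrated with a small sketch. It applies Formula 9-1 to the Example 1 data; `pearson_r` is a hypothetical helper name, and converting age from years to months is just one possible change of scale.

```python
from math import sqrt, isclose


def pearson_r(x, y):
    """Linear correlation coefficient computed with Formula 9-1."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Example 1 data from this chapter.
age = [43, 48, 56, 61, 67, 70]
pressure = [128, 120, 135, 143, 141, 152]

r = pearson_r(age, pressure)

# Change of scale: age in months instead of years.
age_months = [12 * a for a in age]
r_rescaled = pearson_r(age_months, pressure)

# Interchange of x and y.
r_swapped = pearson_r(pressure, age)

print(isclose(r, r_rescaled), isclose(r, r_swapped))  # True True
```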

Common Errors Involving Correlation
1. Causation: It is wrong to conclude that correlation implies causality.
2. Averages: Averages suppress individual variation and may inflate the correlation coefficient.
3. Linearity: There may be some relationship between x and y even when there is no significant linear correlation.
(page 513 of text)

Common Errors Involving Correlation
FIGURE 9-2 Scatterplot of Distance above Ground and Time for Object Thrown Upward (horizontal axis: Time (seconds); vertical axis: Distance (feet)).
One example of data that does have a relationship but not a linear one.
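The linearity error can be made concrete with a sketch. The heights below are hypothetical values generated from the model 16t(8 - t); they are not the data plotted in Figure 9-2, and `pearson_r` is an illustrative helper applying Formula 9-1.

```python
from math import sqrt


def pearson_r(x, y):
    """Linear correlation coefficient computed with Formula 9-1."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Hypothetical data: time in seconds and height in feet for an object
# thrown upward, generated from height = 16 * t * (8 - t).
times = list(range(9))                        # 0, 1, ..., 8
heights = [16 * t * (8 - t) for t in times]   # 0, 112, 192, 240, 256, ...

r = pearson_r(times, heights)
print(r)  # 0.0: no linear correlation, although height is fully determined by time
```

Because the rise and the fall cancel each other in the numerator of Formula 9-1, r comes out as zero even though the relationship is exact.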

Formal Hypothesis Test
To determine whether there is a significant linear correlation between two variables, two methods are available. Both methods let
H0: ρ = 0 (no significant linear correlation)
H1: ρ ≠ 0 (significant linear correlation)
(page 514 of text)

Method 1: Test Statistic is t (follows format of earlier chapters)
Test statistic:

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

Critical values: use Table A-3 with degrees of freedom = n - 2.
Instructor note: This method is preferred by some instructors because it follows the format presented in Chapter 7 for hypothesis testing. This is the first example in the text where the degrees of freedom for Table A-3 is different from n. Special note should be made of this.
Figure 9-4: The drawing used to verify the position of the sample data t value in regard to the critical t values for the example in the text. The drawing is at the bottom of page 516.
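Method 1 can be sketched in a few lines. The inputs r = 0.842 and n = 8 are taken from the Garbage Project example later in this section. The critical value 2.447 (two-tailed, α = 0.05, 6 degrees of freedom) is an assumption from a standard t table, since Table A-3 is not reproduced in this transcript.

```python
from math import sqrt


def t_statistic(r, n):
    """Method 1 test statistic: t = r / sqrt((1 - r^2) / (n - 2))."""
    return r / sqrt((1 - r ** 2) / (n - 2))


r, n = 0.842, 8            # values from the Garbage Project example
t = t_statistic(r, n)

# Assumed critical value from a standard t table:
# two-tailed test, alpha = 0.05, degrees of freedom = n - 2 = 6.
t_critical = 2.447

reject_h0 = abs(t) > t_critical
print(round(t, 2), reject_h0)
```

The test statistic is roughly 3.8, well beyond the critical value, so Method 1 also leads to rejecting H0: ρ = 0 for those data.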

Method 2: Test Statistic is r (uses fewer calculations)
Test statistic: r
Critical values: Refer to Table A-6 (no degrees of freedom)
Instructor note: This method is preferred by some instructors because the calculations are easier.
Figure 9-5: Number line for r from -1 to 1, with "Reject ρ = 0" regions beyond the two critical r values and a "Fail to reject ρ = 0" region between them. Sample data: r = 0.828. This drawing is used to verify the position of the sample data r value in regard to the critical r values for the example in the text. The drawing is at the top of page 517.

FIGURE 9-3 Testing for a Linear Correlation (page 515 of text)
1. Start. Let H0: ρ = 0 and H1: ρ ≠ 0.
2. Select a significance level α.
3. Calculate r using Formula 9-1.
4. Choose a method.
METHOD 1: The test statistic is $t = \dfrac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$. Critical values of t are from Table A-3 with n - 2 degrees of freedom.
METHOD 2: The test statistic is r. Critical values of r are from Table A-6.
5. If the absolute value of the test statistic exceeds the critical values, reject H0: ρ = 0. Otherwise fail to reject H0.
6. If H0 is rejected, conclude that there is a significant linear correlation. If you fail to reject H0, then there is not sufficient evidence to conclude that there is a linear correlation.

Is there a significant linear correlation?
Data from the Garbage Project (this is exercise #7 on page 521):

| x Plastic (lb) | y Household |
|---|---|
| 0.27 | 2 |
| 1.41 | 3 |
| 2.19 | 3 |
| 2.83 | 6 |
| 2.19 | 4 |
| 1.81 | 2 |
| 0.85 | 1 |
| 3.05 | 5 |

Using Method 2 to solve this problem:
n = 8, α = 0.05
H0: ρ = 0
H1: ρ ≠ 0
Test statistic is r = 0.842.
Critical values are r = -0.707 and r = 0.707 (Table A-6 with n = 8 and α = 0.05).
Since 0.842 > 0.707, the test statistic does fall within the critical region. Therefore, we REJECT H0: ρ = 0 (no correlation) and conclude there is a significant linear correlation between the weights of discarded plastic and household size.
Figure: Number line for r from -1 to 1, with "Reject ρ = 0" regions beyond r = -0.707 and r = 0.707 and a "Fail to reject ρ = 0" region between them, showing the placement of the sample data value r = 0.842 in regard to the critical r values.
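The Method 2 decision amounts to a table lookup and one comparison, as the following sketch suggests. The dictionary holds only a few α = 0.05 rows copied from Table A-6, and the function name is an illustrative choice.

```python
# A few alpha = 0.05 critical values copied from Table A-6 (n -> critical r).
CRITICAL_R_05 = {
    4: 0.950, 5: 0.878, 6: 0.811, 7: 0.754,
    8: 0.707, 9: 0.666, 10: 0.632,
}


def significant_linear_correlation(r, n, table=CRITICAL_R_05):
    """Method 2: reject H0 (rho = 0) when |r| exceeds the critical value for n."""
    return abs(r) > table[n]


# Garbage Project example: r = 0.842 with n = 8 pairs.
print(significant_linear_correlation(0.842, 8))  # True, so H0 is rejected
```

Taking the absolute value covers both tails of the test, so a value such as r = -0.842 would lead to the same decision.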

Justification for r Formula (page 517 of text)
Formula 9-1 is developed from

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}$$

where (x̄, ȳ) is the centroid of the sample points. The mean of the x-values is x-bar, and the mean of the y-values is y-bar.

FIGURE 9-6: Scatterplot divided into four quadrants by the lines x̄ = 3 and ȳ = 11, which meet at the centroid (x̄, ȳ). For one point, (7, 23), the differences from the centroid are found: x - x̄ = 7 - 3 = 4 and y - ȳ = 23 - 11 = 12. These values would be used in the r formula.
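As a numerical check rather than a proof, the deviation form and Formula 9-1 can be compared on the same data. The sketch uses the Example 1 pairs and Python's `statistics.stdev` for the sample standard deviations; both function names are illustrative.

```python
from math import sqrt, isclose
from statistics import mean, stdev

# Example 1 data from this chapter.
x = [43, 48, 56, 61, 67, 70]
y = [128, 120, 135, 143, 141, 152]


def r_formula_9_1(x, y):
    """Computational form (Formula 9-1)."""
    n = len(x)
    numerator = n * sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y)
    denominator = (sqrt(n * sum(a * a for a in x) - sum(x) ** 2)
                   * sqrt(n * sum(b * b for b in y) - sum(y) ** 2))
    return numerator / denominator


def r_from_deviations(x, y):
    """Deviation form: sum of (x - xbar)(y - ybar) over (n - 1) * sx * sy."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)   # centroid of the sample points
    total = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return total / ((n - 1) * stdev(x) * stdev(y))


print(isclose(r_formula_9_1(x, y), r_from_deviations(x, y)))  # True
```

Points in Quadrants 1 and 3 around the centroid contribute positive products to the sum, and points in Quadrants 2 and 4 contribute negative ones, which is why the sign of r follows the direction of the pattern.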