# Simple Linear Regression and Correlation

## Presentation on theme: "Simple Linear Regression and Correlation"— Presentation transcript:

Simple Linear Regression and Correlation
Chapter 12 Simple Linear Regression and Correlation Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc.

The Simple Linear Regression Model
12.1 The Simple Linear Regression Model

Linear Relationship
The simplest deterministic mathematical relationship between two variables x and y is a linear relationship $y = \beta_0 + \beta_1 x$. The set of pairs (x, y) for which $y = \beta_0 + \beta_1 x$ determines a straight line with slope $\beta_1$ and y-intercept $\beta_0$.

Terminology
The variable whose value is fixed by the experimenter, denoted x, is the independent (predictor, explanatory) variable. For a fixed x, the second variable will be a random variable Y with observed value y, referred to as the dependent (response) variable.

The Simple Linear Regression Model
There exist parameters $\beta_0$, $\beta_1$, and $\sigma^2$ such that for any fixed value of x, the dependent variable is related to x through the model equation $Y = \beta_0 + \beta_1 x + \epsilon$. The quantity $\epsilon$ is a random variable (called the random deviation) with $E(\epsilon) = 0$ and $V(\epsilon) = \sigma^2$.

Linear Regression Model
[Figure: an observed point (x1, y1) shown relative to the true regression line at x = x1.]

Distribution of $\epsilon$
Normal, with mean 0 and standard deviation $\sigma$.

Distribution of Y for Different Values of x
[Figure: distributions of Y centered on the true regression line at x1, x2, and x3.]
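The model equation can be explored by simulation. The following is a minimal sketch, not part of the original slides: the parameter values `beta0 = -1.0`, `beta1 = 2.5`, `sigma = 1.0`, the helper name `simulate_y`, and the normality of the deviation are all assumptions chosen for illustration. For each fixed x, the average of many simulated Y values should sit close to the height of the true regression line at that x.

```python
import random


def simulate_y(x, beta0, beta1, sigma, rng):
    """Draw one Y from Y = beta0 + beta1*x + eps, with eps ~ Normal(0, sigma)."""
    eps = rng.gauss(0.0, sigma)  # random deviation with mean 0
    return beta0 + beta1 * x + eps


rng = random.Random(0)  # fixed seed so the run is repeatable
beta0, beta1, sigma = -1.0, 2.5, 1.0  # hypothetical parameter values

for x in (1.0, 2.0, 3.0):
    draws = [simulate_y(x, beta0, beta1, sigma, rng) for _ in range(20000)]
    mean_y = sum(draws) / len(draws)
    line_height = beta0 + beta1 * x
    print(f"x = {x}: mean of simulated Y = {mean_y:.3f}, line height = {line_height:.3f}")
```

The spread of the simulated values around the line is the same at every x, which is what the constant $\sigma$ in the model expresses.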

Estimating Model Parameters
12.2 Estimating Model Parameters

Principle of Least Squares
The vertical deviation of the point $(x_i, y_i)$ from the line $y = b_0 + b_1 x$ is $y_i - (b_0 + b_1 x_i)$. The sum of squared vertical deviations from the points to the line is: $$f(b_0, b_1) = \sum_{i=1}^{n} \left[ y_i - (b_0 + b_1 x_i) \right]^2$$
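The least-squares criterion can be checked numerically. In this sketch the three points (1, 2), (2, 3), (3, 7) are illustrative data (an assumption), and the function name `sum_squared_deviations` is hypothetical. The line with intercept -1 and slope 2.5 is the least-squares line for these points, so moving either coefficient slightly should increase the sum of squared deviations.

```python
def sum_squared_deviations(b0, b1, points):
    """Sum of squared vertical deviations from the points to y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in points)


points = [(1, 2), (2, 3), (3, 7)]  # illustrative data (an assumption)

# Criterion value at the least-squares coefficients for these points.
best = sum_squared_deviations(-1.0, 2.5, points)

# Criterion values on a small grid of perturbed coefficients.
nearby = [
    sum_squared_deviations(-1.0 + db0, 2.5 + db1, points)
    for db0 in (-0.1, 0.0, 0.1)
    for db1 in (-0.1, 0.0, 0.1)
    if (db0, db1) != (0.0, 0.0)
]

print(best)                # 1.5
print(min(nearby) > best)  # True
```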

Principle of Least Squares
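The least-squares (regression) line for the data is given by $y = \hat{\beta}_0 + \hat{\beta}_1 x$, where $$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{n \sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n \sum x_i^2 - \left(\sum x_i\right)^2}$$ and $$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = \frac{\sum y_i - \hat{\beta}_1 \sum x_i}{n},$$ with $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$ and $S_{xx} = \sum (x_i - \bar{x})^2$.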

Ex. Find the equation of the least-squares line for the data

| x | y | xy | x² |
|---|---|----|----|
| 1 | 2 | 2 | 1 |
| 2 | 3 | 6 | 4 |
| 3 | 7 | 21 | 9 |
| Sum: 6 | 12 | 29 | 14 |

$\hat{\beta}_1 = \dfrac{3(29) - (6)(12)}{3(14) - 6^2} = 2.5$ and $\hat{\beta}_0 = \dfrac{12 - 2.5(6)}{3} = -1$, so the least-squares line is $y = -1 + 2.5x$.
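The example can be reproduced in a few lines. This is a sketch under one assumption: the data are taken to be the three points (1, 2), (2, 3), (3, 7), which is the reading of the flattened table that agrees with the stated results 2.5 and -1. The function name `least_squares` is hypothetical.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line from column sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope


xs = [1, 2, 3]  # assumed x values
ys = [2, 3, 7]  # assumed y values
b0, b1 = least_squares(xs, ys)
print(b1)  # 2.5
print(b0)  # -1.0
```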

Fitted Values and Residuals
The fitted (predicted) values $\hat{y}_1, \ldots, \hat{y}_n$ are obtained by substituting $x_1, \ldots, x_n$ into the equation of the estimated regression line: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$. The residuals $y_i - \hat{y}_i$ are the vertical deviations from the estimated line.
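As a small illustration, fitted values and residuals can be computed directly from the estimated line. The data (1, 2), (2, 3), (3, 7) and the estimates -1 and 2.5 are assumptions carried over from the least-squares example.

```python
xs, ys = [1, 2, 3], [2, 3, 7]  # assumed example data
b0, b1 = -1.0, 2.5             # least-squares estimates for these points

# Fitted values: substitute each x into the estimated regression line.
fitted = [b0 + b1 * x for x in xs]

# Residuals: observed y minus fitted y.
residuals = [y - y_hat for y, y_hat in zip(ys, fitted)]

print(fitted)     # [1.5, 4.0, 6.5]
print(residuals)  # [0.5, -1.0, 0.5]
```

The residuals sum to zero here, as they do for any least-squares line that includes an intercept.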

Error Sum of Squares
The error sum of squares, denoted SSE, is $$\text{SSE} = \sum (y_i - \hat{y}_i)^2 = \sum \left[ y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i) \right]^2$$ and the estimate of $\sigma^2$ is $$\hat{\sigma}^2 = s^2 = \frac{\text{SSE}}{n - 2}.$$

Computational Formula
A computational formula for the SSE is $$\text{SSE} = \sum y_i^2 - \hat{\beta}_0 \sum y_i - \hat{\beta}_1 \sum x_i y_i.$$

Total Sum of Squares
The total sum of squares, denoted SST, is $$\text{SST} = S_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{\left(\sum y_i\right)^2}{n}.$$

Coefficient of Determination
The coefficient of determination, denoted by $r^2$, is given by $$r^2 = 1 - \frac{\text{SSE}}{\text{SST}}.$$ It is interpreted as the proportion of observed y variation that can be explained by the simple linear regression model.

Regression Sum of Squares
SSR = SST – SSE. The regression sum of squares is interpreted as the amount of variation that is explained by the model. We have $$r^2 = 1 - \frac{\text{SSE}}{\text{SST}} = \frac{\text{SST} - \text{SSE}}{\text{SST}} = \frac{\text{SSR}}{\text{SST}}.$$
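The sums of squares fit together as the following sketch shows. The data (1, 2), (2, 3), (3, 7) and the estimates -1 and 2.5 are assumptions taken from the least-squares example; variable names are illustrative.

```python
xs, ys = [1, 2, 3], [2, 3, 7]  # assumed example data
b0, b1 = -1.0, 2.5             # least-squares estimates for these points
n = len(xs)

# SSE from the definition: squared residuals about the fitted line.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# SSE from the computational formula; should agree with the definition.
sse_short = (sum(y * y for y in ys)
             - b0 * sum(ys)
             - b1 * sum(x * y for x, y in zip(xs, ys)))

# SST: squared deviations of y about its mean.
y_bar = sum(ys) / n
sst = sum((y - y_bar) ** 2 for y in ys)

ssr = sst - sse            # variation explained by the model
r_squared = 1 - sse / sst  # coefficient of determination
s_squared = sse / (n - 2)  # estimate of sigma^2

print(sse, sse_short, sst, ssr)  # 1.5 1.5 14.0 12.5
print(round(r_squared, 4))       # 0.8929
```

About 89% of the observed y variation is attributed to the linear relationship for these three points.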

12.3 Inferences About the Slope Parameter $\beta_1$

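1. The mean of $\hat{\beta}_1$ is $E(\hat{\beta}_1) = \beta_1$.
2. The variance and standard deviation of $\hat{\beta}_1$ are $V(\hat{\beta}_1) = \sigma^2 / S_{xx}$ and $\sigma_{\hat{\beta}_1} = \sigma / \sqrt{S_{xx}}$, where $S_{xx} = \sum (x_i - \bar{x})^2$. Replacing $\sigma$ by its estimate s gives the estimated standard deviation $s_{\hat{\beta}_1} = s / \sqrt{S_{xx}}$.
3. The estimator $\hat{\beta}_1$ has a normal distribution.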

T Variable
The assumptions of the simple linear regression model imply that the standardized variable $$T = \frac{\hat{\beta}_1 - \beta_1}{S / \sqrt{S_{xx}}} = \frac{\hat{\beta}_1 - \beta_1}{S_{\hat{\beta}_1}}$$ has a t distribution with n – 2 df.

Confidence Interval
A $100(1 - \alpha)\%$ CI for the slope $\beta_1$ of the true regression line is $$\hat{\beta}_1 \pm t_{\alpha/2,\, n-2} \cdot s_{\hat{\beta}_1}.$$
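A confidence interval for the slope can be sketched as follows. The data (1, 2), (2, 3), (3, 7) are assumed from the least-squares example, the function name `slope_confidence_interval` is hypothetical, and the critical value is passed in by the caller. Here 12.706 is the tabulated $t_{.025,1}$ value, since n - 2 = 1 for three points.

```python
import math


def slope_confidence_interval(xs, ys, t_crit):
    """Return (lower, upper, se_slope) for the slope of the true regression line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))  # estimate of sigma
    se_slope = s / math.sqrt(sxx)  # estimated standard deviation of the slope
    return b1 - t_crit * se_slope, b1 + t_crit * se_slope, se_slope


# t_crit = 12.706 is the table value for a 95% interval with 1 df.
lower, upper, se_slope = slope_confidence_interval([1, 2, 3], [2, 3, 7], t_crit=12.706)
print(round(se_slope, 4))
print(round(lower, 3), round(upper, 3))
```

With only one degree of freedom the interval is very wide, which is the expected consequence of fitting a line to three points.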

Hypothesis-Testing Procedures
Null hypothesis: $H_0: \beta_1 = \beta_{10}$. Test statistic value: $$t = \frac{\hat{\beta}_1 - \beta_{10}}{s_{\hat{\beta}_1}}$$

Hypothesis-Testing Procedures

| Alternative Hypothesis | Rejection Region for Approx. Level $\alpha$ Test |
|---|---|
| $H_a: \beta_1 > \beta_{10}$ | $t \ge t_{\alpha,\, n-2}$ |
| $H_a: \beta_1 < \beta_{10}$ | $t \le -t_{\alpha,\, n-2}$ |
| $H_a: \beta_1 \ne \beta_{10}$ | either $t \ge t_{\alpha/2,\, n-2}$ or $t \le -t_{\alpha/2,\, n-2}$ |

A P-value based on n – 2 df can be calculated as in Chapters 8 and 9.

Hypothesis-Testing
The model utility test is the test of $H_0: \beta_1 = 0$ versus $H_a: \beta_1 \ne 0$, in which case the test statistic value is the t ratio $$t = \frac{\hat{\beta}_1}{s_{\hat{\beta}_1}}.$$

ANOVA Table

| Source of Variation | df | Sum of Squares | Mean Square | f |
|---|---|---|---|---|
| Regression | 1 | SSR | SSR | SSR / [SSE / (n – 2)] |
| Error | n – 2 | SSE | s² = SSE / (n – 2) | |
| Total | n – 1 | SST | | |
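The ANOVA f ratio and the model utility t ratio are linked: in simple linear regression, f equals t squared. The sketch below checks this numerically on the assumed example data (1, 2), (2, 3), (3, 7); variable names are illustrative.

```python
import math

xs, ys = [1, 2, 3], [2, 3, 7]  # assumed example data
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Entries of the ANOVA table.
sst = sum((y - y_bar) ** 2 for y in ys)
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
ssr = sst - sse
mse = sse / (n - 2)  # error mean square, s^2

f_stat = ssr / mse  # regression mean square is SSR / 1
t_ratio = b1 / (math.sqrt(mse) / math.sqrt(sxx))  # model utility t ratio

print(round(f_stat, 4), round(t_ratio ** 2, 4))
```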

Inferences Concerning $\mu_{Y \cdot x^*}$ and the Prediction of Future Y Values
12.4 Inferences Concerning $\mu_{Y \cdot x^*}$ and the Prediction of Future Y Values

Let $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x^*$, where $x^*$ is some fixed value of x.
1. The mean of $\hat{Y}$ is $E(\hat{Y}) = E(\hat{\beta}_0 + \hat{\beta}_1 x^*) = \beta_0 + \beta_1 x^*$.
2. Variance and standard deviation: $$V(\hat{Y}) = \sigma^2 \left[ \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}} \right], \qquad \sigma_{\hat{Y}} = \sigma \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}.$$ Replacing $\sigma$ by s gives the estimated standard deviation $$s_{\hat{Y}} = s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}.$$
3. $\hat{Y}$ has a normal distribution.

T Variable
The variable $$T = \frac{\hat{\beta}_0 + \hat{\beta}_1 x^* - (\beta_0 + \beta_1 x^*)}{S_{\hat{\beta}_0 + \hat{\beta}_1 x^*}} = \frac{\hat{Y} - (\beta_0 + \beta_1 x^*)}{S_{\hat{Y}}}$$ has a t distribution with n – 2 df.

Confidence Interval
A $100(1 - \alpha)\%$ CI for $\mu_{Y \cdot x^*}$, the expected value of Y when x = x*, is $$\hat{\beta}_0 + \hat{\beta}_1 x^* \pm t_{\alpha/2,\, n-2} \cdot s_{\hat{Y}}.$$

Prediction Interval
A future value of Y is not a parameter but instead a random variable; its interval of plausible values is referred to as a prediction interval.

Prediction Interval
A $100(1 - \alpha)\%$ PI for a future Y observation to be made when x = x* is $$\hat{\beta}_0 + \hat{\beta}_1 x^* \pm t_{\alpha/2,\, n-2} \cdot s \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}.$$
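The difference between the two intervals is the extra 1 under the square root, which accounts for the variability of the future observation itself. The sketch below computes both at a chosen x*. The data (1, 2), (2, 3), (3, 7), the function name `mean_ci_and_pi`, and the choice x* = 2 are assumptions for illustration; 12.706 is the tabulated $t_{.025,1}$ value.

```python
import math


def mean_ci_and_pi(xs, ys, x_star, t_crit):
    """Return (y_hat, ci, pi) at x_star for a simple linear regression fit."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))

    y_hat = b0 + b1 * x_star
    leverage = 1 / n + (x_star - x_bar) ** 2 / sxx
    se_mean = s * math.sqrt(leverage)      # for the mean of Y at x_star
    se_pred = s * math.sqrt(1 + leverage)  # for a single future Y at x_star

    ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
    pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
    return y_hat, ci, pi


y_hat, ci, pi = mean_ci_and_pi([1, 2, 3], [2, 3, 7], x_star=2, t_crit=12.706)
print(y_hat)
print(ci)
print(pi)
```

The prediction interval always contains the confidence interval at the same x*, and both are narrowest when x* equals the mean of the x values.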

12.5 Correlation

Sample Correlation Coefficient
The sample correlation coefficient, denoted r, of n pairs $(x_1, y_1), \ldots, (x_n, y_n)$ is $$r = \frac{S_{xy}}{\sqrt{S_{xx}} \sqrt{S_{yy}}} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2} \sqrt{\sum (y_i - \bar{y})^2}}.$$

Ex. Find the correlation coefficient for the least-squares line from the points … $r =$ …

Properties of r
Important properties of r:
1. The value of r does not depend on which of the two variables under study is labeled x and which is labeled y.
2. The value of r is independent of the units in which x and y are measured.
3. $-1 \le r \le 1$.

Properties of r
4. r = 1 iff all $(x_i, y_i)$ pairs lie on a straight line with positive slope, and r = –1 iff all $(x_i, y_i)$ pairs lie on a straight line with negative slope.
5. The square of the sample correlation coefficient gives the value of the coefficient of determination that would result from fitting the simple linear regression model.
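Several of these properties can be checked numerically. The sketch below assumes the illustrative points (1, 2), (2, 3), (3, 7); the function name `sample_correlation` and the particular change of units are hypothetical choices.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation coefficient r = Sxy / (sqrt(Sxx) * sqrt(Syy))."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))


xs, ys = [1, 2, 3], [2, 3, 7]  # assumed illustrative data
r = sample_correlation(xs, ys)

# Swapping the roles of x and y leaves r unchanged.
r_swapped = sample_correlation(ys, xs)

# A change of units (positive rescaling plus a shift) leaves r unchanged.
r_rescaled = sample_correlation([100 * x for x in xs], [y / 1000 + 5 for y in ys])

# Points exactly on a line with positive slope give r = 1.
r_line = sample_correlation([1, 2, 3], [3, 5, 7])

print(round(r, 4))      # 0.9449
print(round(r * r, 4))  # 0.8929
```

The value of r squared matches the coefficient of determination 1 - SSE/SST = 1 - 1.5/14 for the same three points.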

Different Values of r
[Figure: four scatterplots, titled "r near –1", "r near 1", "r near 0, no relationship", and "r near 0, nonlinear relationship".]

The Population Correlation Coefficient
$$\rho = \rho(X, Y) = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}$$ where $$\text{Cov}(X, Y) = \sum_x \sum_y (x - \mu_X)(y - \mu_Y)\, p(x, y) \quad \text{or} \quad \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (x - \mu_X)(y - \mu_Y)\, f(x, y)\, dx\, dy,$$ depending on whether (X, Y) is discrete or continuous.

Estimator
The sample correlation coefficient r is a point estimate of $\rho$; the corresponding estimator is $$\hat{\rho} = R = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2} \sqrt{\sum (Y_i - \bar{Y})^2}}.$$

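Assumption
The joint probability distribution of (X, Y) is specified by $$f(x, y) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left(\frac{x - \mu_1}{\sigma_1}\right)^2 - 2\rho \frac{(x - \mu_1)(y - \mu_2)}{\sigma_1 \sigma_2} + \left(\frac{y - \mu_2}{\sigma_2}\right)^2 \right] \right\}$$ for $-\infty < x < \infty$ and $-\infty < y < \infty$; $f(x, y)$ is called the bivariate normal probability distribution.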

Testing for the Absence of Correlation
When $H_0: \rho = 0$ is true, the test statistic $$T = \frac{R \sqrt{n - 2}}{\sqrt{1 - R^2}}$$ has a t distribution with n – 2 df.

Hypothesis-Testing

| Alternative Hypothesis | Rejection Region for Approx. Level $\alpha$ Test |
|---|---|
| $H_a: \rho > 0$ | $t \ge t_{\alpha,\, n-2}$ |
| $H_a: \rho < 0$ | $t \le -t_{\alpha,\, n-2}$ |
| $H_a: \rho \ne 0$ | either $t \ge t_{\alpha/2,\, n-2}$ or $t \le -t_{\alpha/2,\, n-2}$ |

A P-value based on n – 2 df can be calculated as described previously.
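The statistic for testing $\rho = 0$ takes the same value as the model utility t ratio for the slope, so the two tests always agree. The sketch below checks this on the assumed illustrative points (1, 2), (2, 3), (3, 7); variable names are hypothetical.

```python
import math

xs, ys = [1, 2, 3], [2, 3, 7]  # assumed illustrative data
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Test statistic based on the sample correlation coefficient.
r = sxy / math.sqrt(sxx * syy)
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Model utility t ratio based on the estimated slope.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
s = math.sqrt(sse / (n - 2))
t_from_slope = b1 / (s / math.sqrt(sxx))

print(round(t_from_r, 4), round(t_from_slope, 4))
```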

Other Inferences Concerning $\rho$
When $(X_1, Y_1), \ldots, (X_n, Y_n)$ is a sample from a bivariate normal distribution, the rv $$V = \frac{1}{2} \ln\left(\frac{1 + R}{1 - R}\right)$$ has approximately a normal distribution with mean $$\mu_V = \frac{1}{2} \ln\left(\frac{1 + \rho}{1 - \rho}\right)$$ and variance $$\sigma_V^2 = \frac{1}{n - 3}.$$

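The test statistic for testing $H_0: \rho = \rho_0$ is $$Z = \frac{V - \frac{1}{2} \ln\left[(1 + \rho_0)/(1 - \rho_0)\right]}{1 / \sqrt{n - 3}}.$$

| Alternative Hypothesis | Rejection Region for Level $\alpha$ Test |
|---|---|
| $H_a: \rho > \rho_0$ | $z \ge z_{\alpha}$ |
| $H_a: \rho < \rho_0$ | $z \le -z_{\alpha}$ |
| $H_a: \rho \ne \rho_0$ | either $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$ |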

CI for $\rho$
A $100(1 - \alpha)\%$ interval for $\rho$ is $$\left( \frac{e^{2c_1} - 1}{e^{2c_1} + 1},\ \frac{e^{2c_2} - 1}{e^{2c_2} + 1} \right)$$ where $c_1$ and $c_2$ are the left and right endpoints of the CI for $\mu_V$, $$\left( v - \frac{z_{\alpha/2}}{\sqrt{n - 3}},\ v + \frac{z_{\alpha/2}}{\sqrt{n - 3}} \right).$$
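The transformation-based procedures can be sketched with the standard library alone. The inputs r = 0.6, n = 28, and rho_0 = 0.5 are hypothetical values chosen for illustration (the method needs n > 3), and the function names are assumptions. The code relies on two identities: ½ ln[(1 + r)/(1 − r)] is `math.atanh(r)`, and (e^{2c} − 1)/(e^{2c} + 1) is `math.tanh(c)`.

```python
import math
from statistics import NormalDist


def correlation_z_statistic(r, n, rho_0):
    """z statistic for H0: rho = rho_0 using the transformed variable V."""
    v = math.atanh(r)  # v = 0.5 * ln((1 + r) / (1 - r))
    return (v - math.atanh(rho_0)) * math.sqrt(n - 3)


def correlation_confidence_interval(r, n, confidence=0.95):
    """Approximate CI for rho obtained by back-transforming a CI for mu_V."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    v = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    c1, c2 = v - half_width, v + half_width  # CI endpoints for mu_V
    return math.tanh(c1), math.tanh(c2)      # back-transform to the rho scale


# Hypothetical inputs: r = 0.6 observed from n = 28 pairs.
z = correlation_z_statistic(0.6, 28, rho_0=0.5)
low, high = correlation_confidence_interval(0.6, 28)
print(round(z, 3))
print(round(low, 3), round(high, 3))
```

The interval is not symmetric about r, because the back-transformation compresses values near 1 more than values near 0.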