Correlation & Regression


1 Correlation & Regression
A correlation and regression analysis involves investigating the relationship between two (or more) quantitative variables of interest. The goal of such an investigation is typically to estimate (predict) the value of one variable based on the observed value of the other variable (or variables).

2 Quantitative Variables
Dependent Variable (Y): the variable being predicted; also called the response variable.
Independent Variable (X): the variable used to explain or predict Y; also called the explanatory or predictor variable.

3 Correlation & Regression
Correlation addresses the questions: "Is there a relationship between X and Y?" and "If so, how strong is it?"
Regression addresses the question: "What is the relationship between X and Y?"

4 Simple Linear Relationship
A linear (straight line) relationship between Y and a single X. The equation has the form Y = b0 + b1 X, where b0 is the y-intercept and b1 is the slope.
A scatter plot of X versus Y is useful for spotting linear relationships and obvious departures from linearity. Always start with a scatter plot!

5 Correlation
A correlation exists between two variables when they are related in some way.
The linear correlation coefficient (r) measures the strength of the linear relationship between X and Y.
Properties of r:
-1 ≤ r ≤ 1
r = 1 for a perfect positive linear relationship
r = -1 for a perfect negative linear relationship
r = 0 if there is no linear relationship

6 Sample Correlation Coefficient
The statistic used to estimate the linear correlation coefficient is r = Sxy / √(Sxx · Syy), where Sxy = Σ(xi − x̄)(yi − ȳ), Sxx = Σ(xi − x̄)², and Syy = Σ(yi − ȳ)².
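The formula above can be sketched in Python. This is a minimal illustration on made-up toy data, not part of the original slides:

```python
import math

def sample_correlation(x, y):
    """Sample correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Toy data (assumed for illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = sample_correlation(x, y)  # moderately strong positive linear relationship
```

Here Sxy = 6, Sxx = 10, and Syy = 6, so r = 6/√60 ≈ 0.775.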

7 Coefficient of Determination
The coefficient of determination is the proportion of variability in Y that can be explained by its linear relationship to X. It is computed by squaring the sample correlation coefficient: r².
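The "proportion of variability explained" reading can be checked numerically: 1 − SSE/SST equals r². A small sketch on assumed toy data:

```python
import math

# Toy data (assumed for illustration)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
sxx = sum((a - xbar) ** 2 for a in x)
syy = sum((b - ybar) ** 2 for b in y)  # SST: total sum of squares

r = sxy / math.sqrt(sxx * syy)
b1 = sxy / sxx               # least squares slope
b0 = ybar - b1 * xbar        # least squares intercept

# SSE: sum of squared residuals about the fitted line
sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
r_squared = 1 - sse / syy    # proportion of variability in Y explained by X
assert abs(r_squared - r ** 2) < 1e-9
```

For this data r² = 0.6, i.e. 60% of the variability in Y is explained by the line.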

8 Hypothesis Testing of the Linear Correlation Coefficient
Appropriate hypotheses: H0: ρ = 0 (no linear correlation) versus H1: ρ ≠ 0 (or ρ < 0, or ρ > 0).

9 Testing r
Test Statistic: t = r√(n − 2) / √(1 − r²), with n − 2 degrees of freedom.
Rejection Region (3 cases of H1):
Two-tailed: for H1: ρ ≠ 0, reject H0 if |t| ≥ tα/2
Left-tailed: for H1: ρ < 0, reject H0 if t ≤ -tα
Right-tailed: for H1: ρ > 0, reject H0 if t ≥ tα
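The two-tailed test can be sketched end to end on assumed toy data. The critical value is taken from a t table rather than computed, which is an assumption noted in the comment:

```python
import math

# Toy data (assumed for illustration)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
sxx = sum((a - xbar) ** 2 for a in x)
syy = sum((b - ybar) ** 2 for b in y)
r = sxy / math.sqrt(sxx * syy)

# Test statistic with n - 2 degrees of freedom
t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Two-tailed test at alpha = 0.05 with 3 df: t_{0.025, 3} = 3.182 (table value)
t_crit = 3.182
reject = abs(t) >= t_crit  # here t ≈ 2.12 < 3.182, so H0 is not rejected
```

With only n = 5 points, even r ≈ 0.775 is not significant at the 5% level.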

10 Simple Linear Regression
The Least Squares Regression line is our "best" line for explaining the relationship between Y and X. It minimizes the squared error (the sum of squared distances between the observed values and the values predicted by the line). The predicted value of Y for any X can be found by plugging X into the least squares regression line.

11 Simple Linear Regression Line
The equation is ŷ = b0 + b1 x, where b1 = Sxy / Sxx = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and b0 = ȳ − b1 x̄.
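These two formulas translate directly into code. A minimal sketch, using assumed toy data:

```python
def least_squares_line(x, y):
    """Return (b0, b1): b1 = Sxy / Sxx, b0 = ybar - b1 * xbar."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    b1 = sxy / sxx           # slope
    b0 = ybar - b1 * xbar    # y-intercept
    return b0, b1

# Toy data (assumed for illustration)
b0, b1 = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
yhat_at_4 = b0 + b1 * 4  # predicted Y at X = 4
```

For this data the fitted line is ŷ = 2.2 + 0.6x, so the prediction at x = 4 is 4.6.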

12 Proper Use of Correlation & Regression
Correlation does not imply causation.
Simple linear regression is appropriate only if the data cluster about a line.
Do not extrapolate beyond the range of the observed data.
Do not apply the model to other populations.
In multiple regression, the size of a parameter does not indicate its importance.

13 Effect of Extreme Values
Extreme values can have a very large effect on correlation and regression analyses; influential outliers can substantially change the fitted model. (See the Regression Applet by Webster West.)

14 Model Assumptions for Inference
The difference between an observed value and the model-predicted value is called the residual, denoted e: e = y − ŷ. The residuals are assumed to be independent and identically distributed, normal with mean 0 and standard deviation se. So for a particular X, the distribution of Y is normal with mean equal to the predicted value of Y at that X and standard deviation equal to se.
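Residuals and their standard deviation se can be computed directly. A sketch on assumed toy data; note that least squares residuals always sum to zero:

```python
import math

# Toy data and its least squares fit (assumed for illustration)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6  # fitted intercept and slope for this data

# Residuals e_i = y_i - yhat_i
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Residual standard deviation: s_e = sqrt(SSE / (n - 2))
sse = sum(e ** 2 for e in residuals)
se = math.sqrt(sse / (len(x) - 2))
```

Here SSE = 2.4 and se = √0.8 ≈ 0.894.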

15 Inference about the Simple Linear Regression Model Parameters
Is there a significant relationship between X and Y? H0: b1 = 0 versus H1: b1 ≠ 0.
Test Statistic: t = b1 / SE(b1), where SE(b1) = se / √Sxx, with n − 2 degrees of freedom.
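A sketch of the slope test statistic on assumed toy data. For simple linear regression this t equals the correlation test statistic from slide 9, which the example confirms:

```python
import math

# Toy data (assumed for illustration)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((a - xbar) ** 2 for a in x)
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
b1 = sxy / sxx
b0 = ybar - b1 * xbar

# Residual standard deviation and standard error of the slope
sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
se = math.sqrt(sse / (n - 2))
se_b1 = se / math.sqrt(sxx)

t = b1 / se_b1  # test statistic with n - 2 degrees of freedom
```

For this data t = 0.6 / 0.283 ≈ 2.12, the same value produced by the correlation test.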

16 Inference about the Simple Linear Regression Model Parameters
Rejection Region (3 cases of H1):
Two-tailed: for H1: b1 ≠ 0, reject H0 if |t| ≥ tα/2
Left-tailed: for H1: b1 < 0, reject H0 if t ≤ -tα
Right-tailed: for H1: b1 > 0, reject H0 if t ≥ tα

17 Inference about the Simple Linear Regression Model Parameters
Is there a non-zero y-intercept in the linear relationship between X and Y? H0: b0 = 0 versus H1: b0 ≠ 0.
Test Statistic: t = b0 / SE(b0), where SE(b0) = se √(1/n + x̄²/Sxx), with n − 2 degrees of freedom.

18 Inference about a Regression Line
E(Y) is the expected value of Y. For a given X, E(Y) is estimated by evaluating the simple linear regression equation at X. The t-distribution allows construction of a confidence interval for the true mean value of Y at a given X.

19 Inference about Y for a Given X
The expected observation of Y for a given X is equal to E(Y). The t-distribution allows the construction of a prediction interval for a single observation of Y at a particular value of X.

20 Residual Analysis
Can be useful for checking the model assumptions, which for the linear regression model are:
Independent observations
Residuals have a N(0, σ²) distribution
Plots can be useful for spotting model inadequacy.

21 Variable Selection in Multiple Regression
Compare all possible regressions
Backward elimination
Forward selection
Stepwise selection

