Simple Linear Regression
11.5 The Coefficients of Correlation and Determination
Two Quantitative Variables
x variable – independent variable or explanatory variable
y variable – dependent variable or response variable
Correlation – The relationship between two variables
Remember that we previously looked at correlation graphically with a scatter plot.
We wish to quantify the strength and direction of a linear relationship (Pearson correlation coefficient, r)
r = 1 – Perfect Positive Linear Correlation
r = -1 – Perfect Negative Linear Correlation
r = 0 – No linear relationship
r close to 1 or -1 – Indicates a strong linear relationship
Example: Dosage of a Drug and Reduction in Blood Pressure
y variable is Reduction in Blood Pressure
x variable is Dosage of Drug
Existence of correlation does not imply a cause and effect relationship

x    100   200   300   400   500
y     10    18    32    44    56
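As a rough sketch of how r could be computed for this example, the Python below uses the standard sum-of-squares form of the Pearson correlation coefficient. The names `pearson_r`, `dosage`, and `reduction` are illustrative choices, not something taken from the slides.

```python
from math import sqrt

# Data from the dosage / blood pressure example
dosage = [100, 200, 300, 400, 500]
reduction = [10, 18, 32, 44, 56]


def pearson_r(xs, ys):
    """Pearson correlation coefficient from sums of squared deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    return ss_xy / sqrt(ss_xx * ss_yy)


r = pearson_r(dosage, reduction)
print(round(r, 4))  # 0.9973
```

A value this close to 1 is consistent with a strong positive linear relationship between dosage and reduction in blood pressure.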
11.1 Probabilistic Models
The purpose of regression is to predict.
For simple linear regression: we use a linear model to predict the value of a difficult-to-measure variable, y, based on an easy-to-measure variable, x.
Before using linear regression, make sure a linear model is reasonable: look at the r value and the scatter plot.
Example: Back to the dosage of drug and reduction in blood pressure data
The linear regression model is: $\hat{y} = b_0 + b_1 x$
Where $b_0$ is the y-intercept and $b_1$ is the slope
In the dosage of drug and reduction in blood pressure example, the least-squares fit to the data gives $b_0 = -3.4$ and $b_1 = 0.118$
Predict the Reduction in Blood Pressure if 250 is the Dosage of Drug
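One way to work the prediction is sketched below, assuming the line is fitted by ordinary least squares to the five data points of the example. The helper `fit_line` and the names `b0` and `b1` for the intercept and slope are assumptions made for illustration.

```python
# Data from the dosage / blood pressure example
dosage = [100, 200, 300, 400, 500]
reduction = [10, 18, 32, 44, 56]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return intercept, slope


b0, b1 = fit_line(dosage, reduction)

# Predicted reduction in blood pressure for a dosage of 250
predicted = b0 + b1 * 250
print(round(b0, 3), round(b1, 3), round(predicted, 1))  # -3.4 0.118 26.1
```

A dosage of 250 lies inside the observed range of 100 to 500, so this is a prediction within the data rather than beyond it.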
The value $r^2$ (the coefficient of determination) is the proportion of the variation in y explained by the model, often stated as a percent
Example: For the dosage of drug and blood pressure find $r^2$
The higher $r^2$ is, the better the model fits the data.
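The "variation explained" reading can be checked numerically. The sketch below assumes a least-squares line fitted to the example data and computes the coefficient of determination as one minus the ratio of unexplained variation (`sse`) to total variation (`sst`). These variable names are illustrative only.

```python
# Data from the dosage / blood pressure example
dosage = [100, 200, 300, 400, 500]
reduction = [10, 18, 32, 44, 56]

n = len(dosage)
x_bar = sum(dosage) / n
y_bar = sum(reduction) / n

# Least-squares slope and intercept
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(dosage, reduction))
      / sum((x - x_bar) ** 2 for x in dosage))
b0 = y_bar - b1 * x_bar

# Unexplained variation: squared distances from the points to the line
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(dosage, reduction))
# Total variation: squared distances from the points to the mean of y
sst = sum((y - y_bar) ** 2 for y in reduction)

r_squared = 1 - sse / sst
print(round(r_squared, 4))  # 0.9946
```

Under these assumptions, about 99.5% of the variation in reduction in blood pressure is accounted for by the linear model in dosage.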
Interpolation – Predicting y values for x values that are within the range of the observations shown in the scatter plot. (This is what regression should be used for.)
Extrapolation – Predicting y values for x values beyond the range of the observations. (This should not be done using a basic regression model; it is a complex problem.)
11.2 Fitting the Model: The Least Squares Approach
The regression model expresses y as a function of x plus random error.
Random error reflects variation in y values among items or individuals having the same x value.
We need a line that is the “best” fit for our data. We will use the method of least squares: choose the line that minimizes the sum of the squares of the vertical distances from the points to the line.
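It can be shown that the slope $b_1$ and y-intercept $b_0$ of the least-squares line are
$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$
Note: The least-squares line can be affected greatly by extreme data points.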
Example: Find the regression equation for the dosage of drug and reduction in blood pressure
Residual – the difference between an actual value and the fitted value
Example: Find the residual for the point (400, 44) in the dosage of drug and reduction in blood pressure data
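Both examples can be worked together in a short sketch. It assumes the regression equation is the ordinary least-squares line and takes the residual as actual value minus fitted value. The function names `least_squares` and `residual` are hypothetical helpers, not part of the slides.

```python
# Data from the dosage / blood pressure example
dosage = [100, 200, 300, 400, 500]
reduction = [10, 18, 32, 44, 56]


def least_squares(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared vertical distances."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope


def residual(x, y, intercept, slope):
    """Actual y minus the fitted value on the line at x."""
    return y - (intercept + slope * x)


b0, b1 = least_squares(dosage, reduction)
print(f"y-hat = {b0:.1f} + {b1:.3f} x")  # y-hat = -3.4 + 0.118 x

res_400 = residual(400, 44, b0, b1)
print(round(res_400, 2))  # 0.2
```

The fitted value at a dosage of 400 is about 43.8, so the observed point (400, 44) sits roughly 0.2 above the line.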