1
Week 4 Lecture 2 Chapter 7. Linear Regression
2
Linear Regression Model
When the scatterplot suggests that a straight-line relationship is appropriate, we propose the linear regression model. A regression line is a straight line that describes how a response variable, $y$, changes as an explanatory variable, $x$, changes. It is the best-fitting line, the one closest to all the points in the scatterplot, and it relates the response variable $y$ to the explanatory variable $x$ through an equation of the form

$E(y) = \beta_0 + \beta_1 x$

where $\beta_0$ is the y-intercept and $\beta_1$ is the slope; these are the regression coefficients of the linear regression function. The regression function describes how the mean of the response variable changes with the value of the explanatory variable. The values of $\beta_0$ and $\beta_1$ are unknown, so we use data to estimate them. That is, we fit the linear regression model to our data.
3
Regression Equation
A regression line is often called the least-squares regression line of $y$ on $x$, because it makes the sum of squared errors (residual = observed − predicted) as small as possible (more about this later in this lecture). The equation of the least-squares regression line of $y$ on $x$ is

$\hat{y} = b_0 + b_1 x$, with slope $b_1 = r \frac{s_y}{s_x}$ and y-intercept $b_0 = \bar{y} - b_1 \bar{x}$,

where $r$ is the correlation between $x$ and $y$, $s_x$ and $s_y$ are the sample standard deviations, and $\bar{x}$ and $\bar{y}$ are the sample means. This estimated linear regression function is also called the fitted function, the least-squares prediction equation, or the regression equation. For a given $x$ value, $\hat{y} = b_0 + b_1 x$ estimates the mean of $y$ for all subjects (cases) in the population having that value of $x$. The predicted values $\hat{y}$ (for given $x$ values) always lie on the regression line. We predict $y$ only within the range of the observed $x$ values; we do not predict outside that range (we do not extrapolate), because such predictions are often inaccurate. Why? Because the regression equation is based only on the information in the data.
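The slope and intercept formulas above can be tried out with a short Python sketch. The function name `least_squares_line` and the five data points are made up for illustration and do not come from the lecture; the sketch simply applies $b_1 = r \frac{s_y}{s_x}$ and $b_0 = \bar{y} - b_1 \bar{x}$ with sample standard deviations (dividing by $n - 1$).

```python
from math import sqrt


def least_squares_line(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Sample correlation coefficient r.
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    r = cross / ((n - 1) * s_x * s_y)
    b1 = r * s_y / s_x        # slope
    b0 = y_bar - b1 * x_bar   # y-intercept
    return b0, b1


# Made-up illustrative data (NOT the survey data from the lecture).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = least_squares_line(xs, ys)
print(f"y-hat = {b0:.3f} + {b1:.3f} x")
```

For these made-up points the means are 3 and 4, so the fitted line passes through (3, 4), as every least-squares line passes through $(\bar{x}, \bar{y})$.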
4
Example of Regression Model
A survey conducted in the United States and 10 countries of Western Europe determined the percentage of teenagers who had used marijuana and other drugs. The results are summarized in the table. The correlation between Marijuana (%) and Other Drugs (%) is $r = 0.93$. There appears to be a strong positive linear association between the percentage of teenagers who use marijuana and the percentage who use other drugs.
5
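Regression Equation
What is the regression equation for predicting the percentage of teenagers who used other drugs from the percentage who used marijuana?

$\hat{y} = b_0 + b_1 x$, with slope $b_1 = r \frac{s_y}{s_x}$ and y-intercept $b_0 = \bar{y} - b_1 \bar{x}$.

Slope: $b_1 = r \frac{s_y}{s_x} = 0.615$
y-intercept: $b_0 = \bar{y} - b_1 \bar{x} = \bar{y} - (0.615 \times \bar{x}) \approx -3.07$ (consistent with the prediction of 17.84 at $x = 34$ used later in this lecture)
Fitted equation: $\hat{y} \approx -3.07 + 0.615x$

The correlation between Marijuana (%) and Other Drugs (%) is $r = 0.93$.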
6
Reading Regression Equation and Coefficients From StatCrunch
Let's check our work with StatCrunch (and practice reading the output).
StatCrunch command: Stat > regression > simple linear > X-variable: Marijuana%, Y-variable: Other drug%. Click Compute.
Note that the output gives us many statistics that we have not learned about yet.
$\hat{y} \approx -3.07 + 0.615x$, that is, $\widehat{\text{Other drug\%}} \approx -3.07 + 0.615 \times \text{Marijuana\%}$
7
Interpretation of the Slope and y-intercept
$\hat{y} \approx -3.07 + 0.615x$
Interpretation of the estimated slope, 0.615: when the percent of teenagers who use marijuana increases by 1, the mean percent using other drugs is estimated to increase by 0.615.
Interpretation of the estimated y-intercept, approximately −3.07: since there is no 0% of teens ($x = 0$) within the range of marijuana usage in the data, the y-intercept has no meaningful interpretation.
Aside: if we did have a marijuana percent of 0 ($x = 0$) in the data, then $\hat{y} \approx -3.07 + 0.615(0) = -3.07$. In that case, the estimated mean percent of teens who use other drugs would be negative, which is meaningless!
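The slope interpretation can be checked numerically: raising $x$ by 1 changes the predicted mean by the slope, whatever the intercept is. In the sketch below, the slope 0.615 is taken from the lecture; the function name `predicted_mean`, the intercepts tried, and the $x$ values 20 and 21 are arbitrary choices for illustration only.

```python
def predicted_mean(x, b0, b1):
    """Predicted mean response y-hat = b0 + b1 * x."""
    return b0 + b1 * x


B1 = 0.615  # slope reported in the lecture

# Arbitrary intercepts (illustration only): the one-unit change in the
# prediction does not depend on which intercept is used.
changes = []
for b0 in (-5.0, 0.0, 5.0):
    change = predicted_mean(21, b0, B1) - predicted_mean(20, b0, B1)
    changes.append(round(change, 3))

print(changes)
```

Every entry equals the slope, which is why the slope is read as "the estimated change in the mean of $y$ per one-unit increase in $x$".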
8
Prediction
After some diagnostic checks (later in this lecture), we can use $\hat{y} \approx -3.07 + 0.615x$ to predict the percent of teens who use other drugs. For example, for the USA, with 34% marijuana usage by its teens, the percent use of other drugs is predicted to be:

$\hat{y} \approx -3.07 + 0.615(34) = 17.84$

The value 17.84 means that, for all countries with 34 percent marijuana usage by their teens, the predicted percent (or the mean percent) use of other drugs is estimated to be 17.84.
StatCrunch command: Stat > regression > simple linear > X-variable: Marijuana%, Y-variable: Other drug%. In Prediction of Y, input 34 in X-value(s). Click Compute.
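A minimal Python sketch of this prediction, with a guard against extrapolation, might look as follows. The slope 0.615 and the prediction at 34% come from the lecture; the intercept −3.07 is an approximation inferred from them, and the bounds `X_MIN` and `X_MAX` are hypothetical placeholders, since the data table is not reproduced here and the real bounds should be the smallest and largest marijuana percentages in the data.

```python
B0 = -3.07   # assumed: approximate intercept inferred from 17.84 at x = 34
B1 = 0.615   # slope from the lecture
X_MIN = 5.0  # hypothetical lower bound of observed marijuana %
X_MAX = 50.0  # hypothetical upper bound of observed marijuana %


def predict_other_drugs(marijuana_pct, b0=B0, b1=B1, x_min=X_MIN, x_max=X_MAX):
    """Predict the mean percent using other drugs; refuse to extrapolate."""
    if not (x_min <= marijuana_pct <= x_max):
        raise ValueError(
            f"x = {marijuana_pct} is outside the observed range "
            f"[{x_min}, {x_max}]; the line should not be extrapolated"
        )
    return b0 + b1 * marijuana_pct


usa_prediction = round(predict_other_drugs(34), 2)
print(usa_prediction)
```

Calling `predict_other_drugs(90)` raises a `ValueError` under these placeholder bounds, mirroring the rule that we predict only within the range of the observed $x$ values.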
9
Prediction
The table below shows the predicted values for each percent of marijuana usage in the data, computed using StatCrunch.
StatCrunch command: Stat > regression > simple linear > X-variable: Marijuana%, Y-variable: Other drug%. Save "Predicted Values". Click Compute.