Week 4 Lecture 2 Chapter 7. Linear Regression.

Linear Regression Model
When the scatterplot suggests that a straight-line relationship is appropriate, we propose the linear regression model. A regression line is a straight line that describes how a response variable, $y$, changes as an explanatory variable, $x$, changes. It is the best-fitting line, the one closest to all the points in the scatterplot, and its equation has the form:

$E(y) = \beta_0 + \beta_1 x$

Here $\beta_0$ is the y-intercept and $\beta_1$ is the slope; these are the regression coefficients of the linear regression function. The regression function describes how the mean of the response variable changes with the value of the explanatory variable. The values of $\beta_0$ and $\beta_1$ are unknown, so we use data to estimate them; that is, we fit the linear regression model to our data.

Regression Equation
A regression line is often called the least-squares regression line of $y$ on $x$, because it makes the sum of squared errors (residuals = observed - predicted) as small as possible (more about this later in this lecture). The equation of the least-squares regression line of $y$ on $x$ is:

$\hat{y} = b_0 + b_1 x$, with slope $b_1 = r \frac{s_y}{s_x}$ and y-intercept $b_0 = \bar{y} - b_1 \bar{x}$

This estimated linear regression function is also called the fitted function, the least-squares prediction equation, or the regression equation (some texts write it as $\hat{y} = a + bx$). For a given $x$ value, $\hat{y} = b_0 + b_1 x$ estimates the mean of $y$ for all subjects (cases) in the population having that value of $x$. The $\hat{y}$ values (for given $x$ values) always lie on the regression line.

We predict $y$ only within the range of the observed $x$ values; we do not extrapolate outside that range. Such predictions are often inaccurate because the regression equation is based only on the information in the data, which says nothing about how $y$ behaves beyond the observed $x$ values.
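The slope and intercept formulas above can be sketched in a few lines of Python. This is only an illustration: the five data points are made up for the example (they are not the survey data), and the function names `least_squares_line` and `sse` are hypothetical. The sketch also checks the "least-squares" claim by nudging the fitted line and confirming that the sum of squared residuals gets larger.

```python
import math
import statistics


def least_squares_line(x, y):
    """Return (b0, b1) using b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_x, s_y = statistics.stdev(x), statistics.stdev(y)  # sample standard deviations
    # Sample correlation coefficient r
    r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sse(x, y, b0, b1):
    """Sum of squared residuals, where residual = observed - predicted."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Made-up illustrative data (NOT the marijuana / other drugs survey data)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = least_squares_line(x, y)
best = sse(x, y, b0, b1)

print(f"fitted line: y_hat = {b0:.3f} + {b1:.3f} x, SSE = {best:.3f}")
# Any other line has a larger sum of squared residuals
print(sse(x, y, b0 + 0.1, b1) > best, sse(x, y, b0, b1 + 0.1) > best)
```

Computing $r$ by hand rather than calling a library routine keeps the link to the formula $b_1 = r \frac{s_y}{s_x}$ visible.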

Example of Regression Model
A survey conducted in the United States and 10 countries of Western Europe determined the percentage of teenagers who had used marijuana and other drugs. The results are summarized in the table. The correlation between Marijuana (%) and Other Drugs (%) is r = 0.93, so there appears to be a strong positive linear association between the percentage of teenagers who use marijuana and the percentage who use other drugs.

Regression Equation
What is the regression equation for predicting the percentage of teenagers who used other drugs from the percentage who used marijuana? Recall that the correlation between Marijuana (%) and Other Drugs (%) is r = 0.93.

$\hat{y} = b_0 + b_1 x$, with slope $b_1 = r \frac{s_y}{s_x}$ and y-intercept $b_0 = \bar{y} - b_1 \bar{x}$

$b_1 = r \frac{s_y}{s_x} = r \left( \frac{10.2399}{15.5528} \right) = 0.615$

$b_0 = \bar{y} - b_1 \bar{x} = \bar{y} - (0.615 \times \bar{x}) = -3.068$

$\hat{y} = b_0 + b_1 x = -3.068 + 0.615x$
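As a quick arithmetic check of the slope, the summary statistics quoted in the lecture can be plugged into $b_1 = r \frac{s_y}{s_x}$. The sketch below uses only the numbers given above; the function name `slope_from_summary` is hypothetical. With the rounded correlation r = 0.93 the result comes out near 0.612 rather than 0.615, which presumably means the lecture used an unrounded value of r (an assumption, since the unrounded value is not shown).

```python
def slope_from_summary(r, s_y, s_x):
    """Least-squares slope from the correlation and the two standard deviations."""
    return r * s_y / s_x


r_rounded = 0.93   # correlation as reported in the lecture (rounded)
s_y = 10.2399      # standard deviation of Other Drugs (%)
s_x = 15.5528      # standard deviation of Marijuana (%)

b1_from_rounded_r = slope_from_summary(r_rounded, s_y, s_x)

# Correlation implied by the reported slope 0.615 (back-solved, an inference)
implied_r = 0.615 * s_x / s_y

print(f"slope using r = 0.93: {b1_from_rounded_r:.4f}")
print(f"r implied by slope 0.615: {implied_r:.4f}")
```

The intercept is not recomputed here because the sample means $\bar{x}$ and $\bar{y}$ are not given in the text.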

Reading Regression Equation and Coefficients From StatCrunch
Let's check our work with StatCrunch (and practice reading the output).

StatCrunch command: Stat > Regression > Simple Linear > X-variable: Marijuana%, Y-variable: Other drug%, then click Compute.

Note that the output gives us many statistics that we have not learned about yet. The fitted equation reads:

$\hat{y} = -3.068 + 0.615x$, that is, $\widehat{\text{other drugs}} = -3.068 + 0.615 \times \text{marijuana}$

Interpretation of the Slope and y-intercept
$\hat{y} = -3.068 + 0.615x$

Interpretation of the estimated slope, 0.615: When the percent of teenagers who use marijuana increases by 1 (one percentage point), the mean percent using other drugs is estimated to increase by 0.615.

Interpretation of the estimated y-intercept, -3.068: Since no country in the data has a marijuana usage of 0% ($x = 0$ is outside the range of the data), the y-intercept has no meaningful interpretation.

Aside note: If we did have a marijuana percent of 0 ($x = 0$) in the data, then $\hat{y} = -3.068 + 0.615(0) = -3.068$. In that case, the estimated mean percent of teens who use other drugs would be -3.068, which is meaningless because a percentage cannot be negative.

Prediction
After some diagnostic checks (later in this lecture), we can use $\hat{y} = -3.068 + 0.615x$ to predict the percent of teens who use other drugs. For example, for the USA, with 34% marijuana usage among its teens, the percent use of other drugs is predicted to be:

$\hat{y} = -3.068 + 0.615(34) = 17.84$

The value 17.84 means that, for all countries with 34 percent marijuana usage among their teens, the predicted percent (or the mean percent) use of other drugs is estimated to be 17.84.

StatCrunch command: Stat > Regression > Simple Linear > X-variable: Marijuana%, Y-variable: Other drug%. Under Prediction of Y, enter 34 in X-value(s), then click Compute.
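The prediction for the USA, and the slope interpretation, can be reproduced with a small Python sketch. The coefficients are the ones reported in the lecture; the function name `predict_other_drugs` and the optional `x_range` guard are hypothetical additions. The guard illustrates the no-extrapolation rule, and the range passed to it in the example is made up because the actual range of the marijuana percentages is not given in the text.

```python
B0 = -3.068  # estimated y-intercept from the lecture
B1 = 0.615   # estimated slope from the lecture


def predict_other_drugs(marijuana_pct, x_range=None):
    """Predicted mean percent using other drugs for a given marijuana percent.

    x_range is an optional (min, max) of the observed x values; when given,
    values outside it are rejected so that we do not extrapolate.
    """
    if x_range is not None:
        lo, hi = x_range
        if not lo <= marijuana_pct <= hi:
            raise ValueError(f"{marijuana_pct} is outside the observed range {x_range}")
    return B0 + B1 * marijuana_pct


usa = predict_other_drugs(34)
# A one-point increase in x changes the prediction by the slope
slope_check = predict_other_drugs(35) - predict_other_drugs(34)

print(f"predicted other-drug use at 34% marijuana: {usa:.2f}")
print(f"change per one-point increase in marijuana use: {slope_check:.3f}")

# Hypothetical observed range, for illustration only
try:
    predict_other_drugs(90, x_range=(5, 55))
except ValueError as err:
    print("refused to extrapolate:", err)
```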

Prediction
The table below shows the predicted values for each percent of marijuana usage in the data, obtained with StatCrunch.

StatCrunch command: Stat > Regression > Simple Linear > X-variable: Marijuana%, Y-variable: Other drug%. Under Save, select "Predicted Values", then click Compute.
