Week 4 Lecture 2 Chapter 7. Linear Regression.

1 Week 4 Lecture 2 Chapter 7. Linear Regression

2 Linear Regression Model
When the scatterplot suggests that a straight-line relationship is appropriate, we propose the linear regression model. A regression line is a straight line that describes how a response variable, $y$, changes as an explanatory variable, $x$, changes. It is the line that best fits the points in the scatterplot (in the least-squares sense described on the next slide), and it relates the response variable $y$ to the explanatory variable $x$ through an equation of this form:

$$E(y) = \beta_0 + \beta_1 x$$

Here $\beta_0$ is the y-intercept and $\beta_1$ is the slope; these are the regression coefficients of the linear regression function. The regression function describes how the mean of the response variable changes with the value of the explanatory variable. The values of $\beta_0$ and $\beta_1$ are unknown, so we use data to estimate them. We use the linear regression model to fit our data.

3 Regression Equation
A regression line is often called the least-squares regression line of $y$ on $x$, because it makes the sum of squared errors (residuals, where residual = observed - predicted) as small as possible (more about this later in this lecture). The equation of the least-squares regression line of $y$ on $x$ is

$$\hat{y} = b_0 + b_1 x, \qquad b_1 = r\,\frac{S_Y}{S_X}, \qquad b_0 = \bar{y} - b_1 \bar{x},$$

where $b_1$ is the slope, $b_0$ is the y-intercept, $r$ is the correlation between $x$ and $y$, and $S_X$ and $S_Y$ are the sample standard deviations of $x$ and $y$. The estimated linear regression function (also known as the fitted function) is sometimes written as $\hat{y} = a + bx$, with $a = b_0$ and $b = b_1$. The equation $\hat{y} = b_0 + b_1 x$ is also called the least-squares prediction equation, the fitted function, or the regression equation. For a given $x$ value, $\hat{y} = b_0 + b_1 x$ estimates the mean of $y$ for all subjects (cases) in the population having that value of $x$. The predicted values $\hat{y}$ (for given $x$ values) always lie on the regression line. We predict $y$ only within the range of the observed $x$ values; we do not predict outside that range (we do not extrapolate). Such predictions are often inaccurate. Why? Because the regression equation is based only on the information in the data, and the data cannot tell us whether the straight-line pattern continues beyond the observed $x$ values.

4 Example of Regression Model
A survey conducted in the United States and 10 countries of Western Europe determined the percentage of teenagers who had used marijuana and other drugs. The results are summarized in the table. The correlation between Marijuana (%) and Other Drugs (%) is r = 0.93, so there appears to be a strong positive linear association between the percentage of teenagers who use marijuana and the percentage who use other drugs.

5 Regression Equation
What is the regression equation for predicting the percentage of teenagers who used other drugs from the percentage who used marijuana? The correlation between Marijuana (%) and Other Drugs (%) is r = 0.93.

$$\hat{y} = b_0 + b_1 x, \qquad b_1 = r\,\frac{S_Y}{S_X}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$

$$b_1 = r\,\frac{S_Y}{S_X} = r\left(\frac{10.2399}{15.5528}\right) = 0.615$$

(The value 0.615 corresponds to the unrounded correlation, r ≈ 0.934; plugging in the rounded r = 0.93 gives about 0.612.)

$$b_0 = \bar{y} - b_1 \bar{x} = \bar{y} - 0.615\,\bar{x} = -3.068$$

$$\hat{y} = -3.068 + 0.615x$$
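A short arithmetic check of the slope, using only the summary statistics shown on this slide. The sketch assumes 10.2399 is $S_Y$ (Other Drugs %) and 15.5528 is $S_X$ (Marijuana %), as in the formula $b_1 = r\,S_Y/S_X$. It shows how much the two-decimal rounding of r moves the slope, and works backwards from the reported slope to the correlation it implies. The means $\bar{x}$ and $\bar{y}$ are not listed on the slide, so the intercept is not recomputed here.

```python
# Summary statistics reported on the slide.
S_Y = 10.2399      # standard deviation of Other Drugs (%)
S_X = 15.5528      # standard deviation of Marijuana (%)
B1_SLIDE = 0.615   # slope reported on the slide

ratio = S_Y / S_X

# Slope using the correlation rounded to two decimals.
b1_rounded_r = 0.93 * ratio
print(f"r = 0.93 -> b1 = {b1_rounded_r:.3f}")

# Working backwards from the reported slope recovers the unrounded correlation.
r_implied = B1_SLIDE / ratio
print(f"implied r = {r_implied:.3f} -> b1 = {r_implied * ratio:.3f}")
```

The lesson is general: carry unrounded values (or let software do it) when computing $b_1$, and round only the final answer.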

6 Reading Regression Equation and Coefficients From StatCrunch
Let's check our work with StatCrunch (and practice reading the output). StatCrunch command: Stat > Regression > Simple Linear > X-variable: Marijuana%, Y-variable: Other drug% > click Compute. Note that the output gives us many statistics that we have not learned about yet. The fitted equation reported in the output is

$$\hat{y} = -3.068 + 0.615x, \qquad \text{that is,} \qquad \widehat{\text{other drugs}} = -3.068 + 0.615\,\text{marijuana}$$

7 Interpretation of the Slope and y-intercept
๐’š =โˆ’๐Ÿ‘.๐ŸŽ๐Ÿ”๐Ÿ– + ๐ŸŽ.๐Ÿ”๐Ÿ๐Ÿ“๐’™ Interpretation of estimated slope, 0.615: When the percent of teenagers who use marijuana increase by 1, the mean percent for using other drugs is estimated to increase by Interpretation of estimated y-intercept, : Since there is no 0 % of teens (๐’™ = 0) within the range of marijuana usage, y-intercept has no meaningful interpretation. Aside note: If we did have marijuana percent of 0 (๐’™ = 0) in the data, then ๐’š = -๐Ÿ‘.๐ŸŽ๐Ÿ”๐Ÿ– + 0.๐Ÿ”๐Ÿ๐Ÿ“(๐’™ = 0) = In that cases, the estimated mean percent of teens who use other drugs would be , which is meaningless!

8 Prediction
After some diagnostic checks (later in this lecture), we can use $\hat{y} = -3.068 + 0.615x$ to predict the percentage of teens who use other drugs. For example, for the USA, with 34% marijuana use among teens, the predicted percentage using other drugs is

$$\hat{y} = -3.068 + 0.615(34) = 17.84$$

The value 17.84 means that for all countries with 34% marijuana use among teens, the predicted (mean) percentage using other drugs is estimated to be 17.84%. StatCrunch command: Stat > Regression > Simple Linear > X-variable: Marijuana%, Y-variable: Other drug%; under Prediction of Y, enter 34 in X-value(s) > click Compute.
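A prediction helper can also enforce the "do not extrapolate" rule from the Regression Equation slide. This is a minimal sketch: the function `predict` and its `x_min`/`x_max` parameters are hypothetical, and because the slides do not list the smallest and largest observed marijuana percentages, the bounds used below (10 and 50) are illustrative placeholders, not values from the data.

```python
# Fitted coefficients from the slides.
B0, B1 = -3.068, 0.615

def predict(x, x_min, x_max):
    """Return y-hat = B0 + B1 * x, refusing to extrapolate outside [x_min, x_max].

    x_min and x_max should be the smallest and largest observed x values.
    """
    if not x_min <= x <= x_max:
        raise ValueError(f"x = {x} is outside the observed range [{x_min}, {x_max}]")
    return B0 + B1 * x

# Placeholder bounds for illustration only; use the real data range in practice.
HYPOTHETICAL_RANGE = (10, 50)

print(f"USA (x = 34): y-hat = {predict(34, *HYPOTHETICAL_RANGE):.2f}")
try:
    predict(80, *HYPOTHETICAL_RANGE)
except ValueError as err:
    print("Refused:", err)
```

Raising an error, rather than silently returning a number, makes an extrapolation impossible to miss.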

9 Prediction
The table below shows the predicted values, computed with StatCrunch, for the marijuana percentages in the data. StatCrunch command: Stat > Regression > Simple Linear > X-variable: Marijuana%, Y-variable: Other drug% > Save “Predicted Values” > click Compute.

