Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan 11/6/12 Simple Linear Regression SECTION 2.6 Interpreting coefficients Prediction.

Presentation transcript:

1 Statistics: Unlocking the Power of Data Lock 5 STAT 101 Dr. Kari Lock Morgan 11/6/12 Simple Linear Regression SECTION 2.6: Interpreting coefficients, Prediction, Cautions, Least squares regression

2 Crickets and Temperature: Can you estimate the temperature on a summer evening just by listening to crickets chirp? We will fit a model to predict temperature based on cricket chirp rate.

3 Crickets and Temperature: Response variable, y: temperature. Explanatory variable, x: chirp rate.

4 Linear Model: A linear model predicts a response variable, y, using a linear function of explanatory variables. Simple linear regression predicts one response variable, y, as a linear function of one explanatory variable, x.

5 Regression Line: The goal is to find a straight line that best fits the data in a scatterplot.

6 Equation of the Line: The estimated regression line is $\hat{y} = a + bx$, where $a$ is the intercept and $b$ is the slope. Slope: increase in predicted y for every unit increase in x. Intercept: predicted y value when x = 0.

7 Regression in StatKey

8 Regression in RStudio

9 Regression Model: Which is a correct interpretation? (a) The average temperature is 37.68° (b) For every extra 0.23 chirps per minute, the predicted temperature increases by 1 degree (c) Predicted temperature increases by 0.23 degrees for each extra chirp per minute (d) For every extra 0.23 chirps per minute, the predicted temperature increases by 37.68°

10 Units: It is helpful to think about units when interpreting a regression equation. y units: degrees. x units: chirps per minute. Slope units: degrees per (chirps per minute).

11 Prediction: The regression equation can be used to predict y for a given value of x. If you listen and hear crickets chirping about 140 times per minute, your best guess at the outside temperature is the value the regression equation predicts at x = 140.
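
The prediction step can be sketched in a few lines of Python. This is only an illustration: the intercept 37.68 is the value quoted on slide 14, and the slope 0.23 degrees per chirp per minute is inferred from the answer choices on slide 9; the function name `predict_temperature` is made up for this sketch.

```python
# Coefficients as they appear in the slides (assumed, see lead-in):
# intercept 37.68 degrees, slope 0.23 degrees per chirp per minute.
INTERCEPT = 37.68
SLOPE = 0.23


def predict_temperature(chirps_per_minute: float) -> float:
    """Predicted temperature (degrees) for a given cricket chirp rate."""
    return INTERCEPT + SLOPE * chirps_per_minute


# Plug x = 140 into the regression equation.
print(f"{predict_temperature(140):.2f} degrees")
```

With these coefficients, 140 chirps per minute gives 37.68 + 0.23 × 140 = 69.88, so the best guess is about 70 degrees.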

12 Prediction

13 Prediction: If the crickets are chirping about 180 times per minute, your best guess at the temperature is (a) 60° (b) 70° (c) 80°

14 Prediction: The intercept tells us that the predicted temperature when the crickets are not chirping at all is 37.68°. Do you think this is a good prediction? (a) Yes (b) No. None of the observed chirp rates are anywhere close to 0, so we have no way of knowing whether the linear trend continues all the way down to 0.

15 Regression Caution 1: Do not use the regression equation or line to predict outside the range of x values available in your data (do not extrapolate!). If none of the x values are anywhere near 0, then the intercept is meaningless!
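
One way to make this caution concrete is to have the prediction code refuse to extrapolate. The sketch below is hypothetical: the function name `predict_within_range` is invented, and the observed range of 80 to 200 chirps per minute is a placeholder, since the slides do not state the actual range of the data.

```python
def predict_within_range(x, intercept, slope, x_min, x_max):
    """Return intercept + slope * x, but only for x inside the observed range."""
    if not (x_min <= x <= x_max):
        raise ValueError(
            f"x={x} is outside the observed range [{x_min}, {x_max}]; "
            "refusing to extrapolate"
        )
    return intercept + slope * x


# Placeholder range: NOT taken from the slides, chosen only for illustration.
OBSERVED_MIN, OBSERVED_MAX = 80, 200

# Inside the range: a normal prediction.
print(predict_within_range(140, 37.68, 0.23, OBSERVED_MIN, OBSERVED_MAX))

# Outside the range (x = 0, the intercept case): the call is rejected.
try:
    predict_within_range(0, 37.68, 0.23, OBSERVED_MIN, OBSERVED_MAX)
except ValueError as err:
    print(err)
```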

16 Duke Rank and Duke Shirts: Are the rank of Duke among schools applied to and the number of Duke shirts owned (a) positively associated (b) negatively associated (c) not associated (d) other

17 Duke Rank and Duke Shirts: Are the rank of Duke among schools applied to and the number of Duke shirts owned (a) positively associated (b) negatively associated (c) not associated (d) other

18 Regression Caution 2: Computers will calculate a regression line for any two quantitative variables, even if they are not associated or if the association is not linear. ALWAYS PLOT YOUR DATA! The regression line/equation should only be used if the association is approximately linear.

19 Regression Caution 3: Outliers (especially outliers in both variables) can be very influential on the regression line. ALWAYS PLOT YOUR DATA! http://illuminations.nctm.org/LessonDetail.aspx?ID=L455

20 Life Expectancy and Birth Rate: Coefficients: (Intercept) 83.4090, LifeExpectancy -0.8895. Which of the following interpretations is correct? (a) A decrease of 0.89 in the birth rate corresponds to a 1 year increase in predicted life expectancy (b) Increasing life expectancy by 1 year will cause birth rate to decrease by 0.89 (c) Both (d) Neither. Explanation: (a) The model is predicting birth rate based on life expectancy, not the other way around. (b) Can only make conclusions about causality from a randomized experiment.

21 Regression Caution 4: Higher values of x may lead to higher (or lower) predicted values of y, but this does NOT mean that changing x will cause y to increase or decrease. Causation can only be determined if the values of the explanatory variable were determined randomly (which is rarely the case for a continuous explanatory variable).

22 Explanatory and Response: Unlike correlation, for linear regression it does matter which is the explanatory variable and which is the response.

23 Regression Line: How do we find the best fitting line?

24 Predicted and Actual Values

25 Predicted and Actual Values

26 Residual: The residual is the difference between the observed and predicted values, $y - \hat{y}$. The residual is also the vertical distance from each point to the line.
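
As a rough sketch, a residual is the observed response minus the value the line predicts. The coefficients below are the cricket values that appear in the slides (intercept 37.68, slope 0.23); the three observations are hypothetical, invented only to show the arithmetic.

```python
# Cricket line as quoted in the slides (slope inferred from slide 9's options).
INTERCEPT, SLOPE = 37.68, 0.23


def residual(x: float, y_observed: float) -> float:
    """Observed y minus predicted y: the signed vertical gap to the line."""
    return y_observed - (INTERCEPT + SLOPE * x)


# Hypothetical (chirps per minute, temperature) observations.
observations = [(120, 66.0), (150, 71.0), (170, 78.0)]
for x, y in observations:
    print(f"x={x}, observed={y}, residual={residual(x, y):+.2f}")
```

A positive residual means the point sits above the line, and a negative residual means it sits below.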

27 Residual: We want to make all the residuals as small as possible. How would you measure this?

28 Least Squares Regression: Least squares regression chooses the regression line that minimizes the sum of squared residuals.
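
A minimal sketch of the idea, using invented data: compute the line from the standard closed-form least squares formulas, then check that nudging the intercept or slope in any direction increases the sum of squared residuals. The function names are made up for this example, and the closed-form formulas are the textbook ones rather than anything shown in the transcript.

```python
def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of (observed - predicted)^2 for the line intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Closed-form least squares fit; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 6]

a, b = least_squares_line(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)
print(f"intercept={a:.2f}, slope={b:.2f}, sum of squared residuals={best:.2f}")

# Any small change to the intercept or slope makes the fit worse.
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    worse = sum_squared_residuals(xs, ys, a + da, b + db)
    print(f"intercept{da:+.1f}, slope{db:+.1f}: {worse:.2f}")
```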

29 Least Squares Regression

30 Presidential Elections: Can we use the current approval rating (50%, according to a Rasmussen Poll yesterday) to predict whether Obama will win the election tonight? We can build a model using data from past elections to predict an incumbent’s margin of victory based on approval rating.

31 Presidential Elections: What is Obama’s predicted margin of victory, based on his approval rating?

32 Presidential Elections: For this number to be useful, we need an interval estimate for Obama’s margin of victory. We’ll learn how to compute these next class. For now: if Obama’s approval rating really is 50%, then we are 95% confident that Obama’s margin of victory will be between -8.8 and 19.4.

33 Project 2: Group project on regression (modeling) to be done in your lab groups. Need a data set with a quantitative response variable and multiple explanatory variables; must have at least one categorical and at least one quantitative explanatory variable.

34 To Do: Read Sections 2.6 and 9.1. Complete the anonymous midterm evaluation TODAY by 5pm. Do Homework 7 (due Tuesday, 11/13): NO LATE HOMEWORK ACCEPTED – SOLUTIONS WILL BE POSTED IMMEDIATELY AFTER CLASS. Find data for Project 2: presentations last lab (Monday, 12/3); report due last day of class (Thursday, 12/6).

