Published by Rigoberto Hubbart. Modified about 1 year ago.

1
Statistics: Unlocking the Power of Data, Lock 5
STAT 101, Dr. Kari Lock Morgan
Simple Linear Regression
Section 2.6, 9.1
Topics: least squares line; interpreting coefficients; prediction; cautions; inference for slope, correlation

2
Review
ANOVA is used to test for an association between
a) Two categorical variables
b) One categorical and one quantitative variable
c) Two quantitative variables

3
Review
A χ² test is used to test for an association between
a) Two categorical variables
b) One categorical and one quantitative variable
c) Two quantitative variables

4
MODELING

5
Crickets and Temperature
Can you estimate the temperature on a summer evening, just by listening to crickets chirp?
We will fit a model to predict temperature based on cricket chirp rate.

6
Crickets and Temperature
[Scatterplot: response variable, y (temperature), against explanatory variable, x (chirp rate)]

7
Linear Model
A linear model predicts a response variable, y, using a linear function of explanatory variables.
Simple linear regression predicts one response variable, y, as a linear function of one explanatory variable, x.

8
Regression Line
Goal: Find a straight line that best fits the data in a scatterplot.

9
Equation of the Line
The estimated regression line is: predicted y = intercept + slope × x
Slope: increase in predicted y for every unit increase in x
Intercept: predicted y value when x = 0

10
Regression in StatKey

11
Regression in RStudio

12
Regression Model
Which is a correct interpretation?
a) The average temperature is
b) For every extra 0.23 chirps per minute, the predicted temperature increases by 1 degree
c) Predicted temperature increases by 0.23 degrees for each extra chirp per minute
d) For every extra 0.23 chirps per minute, the predicted temperature increases by

13
Units
It is helpful to think about units when interpreting a regression equation.
y units: degrees
x units: chirps per minute
Slope units: degrees / (chirps per minute)

14
Prediction
The regression equation can be used to predict y for a given value of x.
If you listen and hear crickets chirping about 140 times per minute, your best guess at the outside temperature is
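A minimal sketch of prediction from a fitted line. It assumes the 0.23 in the answer choices on the Regression Model slide is the fitted slope; the intercept is not recoverable from this transcript, so it is left as a function argument. Because the intercept cancels when two predictions are subtracted, the sketch only reports how much the prediction changes between two chirp rates. The function names are illustrative.

```python
def predict(x, intercept, slope):
    """Predicted response on a fitted line: intercept + slope * x."""
    return intercept + slope * x


# Assumption: slope taken from the answer choices (degrees per chirp per minute).
SLOPE = 0.23


def predicted_change(x_from, x_to, slope=SLOPE):
    """Change in predicted y between two x values (the intercept cancels)."""
    return slope * (x_to - x_from)


# How much warmer is the prediction at 180 chirps/min than at 140 chirps/min?
change = predicted_change(140, 180)
print(round(change, 1))  # about 9.2 degrees
```

Plugging a chirp rate into predict() additionally needs the intercept from the fitted model.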

15
Prediction

16
Prediction
If the crickets are chirping about 180 times per minute, your best guess at the temperature is
(a) 60
(b) 70
(c) 80

17
Prediction
The intercept tells us that the predicted temperature when the crickets are not chirping at all is . Do you think this is a good prediction?
(a) Yes
(b) No

18
Regression Caution 1
Do not use the regression equation or line to predict outside the range of x values available in your data (do not extrapolate!).
If none of the x values are anywhere near 0, then the intercept is meaningless!
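One way to enforce this caution in code is to refuse predictions outside the observed x range. The sketch below is only an illustration: the function name, the coefficients, and the chirp rates are all made up, not taken from the cricket data.

```python
def predict_within_range(x, intercept, slope, x_observed):
    """Predict y at x, but only if x lies inside the observed range of x values."""
    lo, hi = min(x_observed), max(x_observed)
    if not lo <= x <= hi:
        raise ValueError(
            f"x = {x} is outside the observed range [{lo}, {hi}]; "
            "predicting here would be extrapolation"
        )
    return intercept + slope * x


# Hypothetical chirp rates (chirps per minute) used to fit some line.
observed_chirps = [80, 120, 150, 200]

# Inside the observed range: a prediction is returned.
print(predict_within_range(140, intercept=1.0, slope=2.0, x_observed=observed_chirps))

# x = 0 is far from every observed value, so the guard raises an error.
try:
    predict_within_range(0, intercept=1.0, slope=2.0, x_observed=observed_chirps)
except ValueError as err:
    print("refused:", err)
```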

19
Duke Rank and Duke Shirts
Are the rank of Duke among schools applied to and the number of Duke shirts owned
a) positively associated
b) negatively associated
c) not associated
d) other

20
Regression Caution 2
Computers will calculate a regression line for any two quantitative variables, even if they are not associated or if the association is not linear.
ALWAYS PLOT YOUR DATA!
The regression line/equation should only be used if the association is approximately linear.
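A small illustration of why the plot matters, using made-up data in which y is determined exactly by x but the relationship is a curve. The least_squares helper is a hand-written sketch of the usual slope and intercept formulas, not the output of StatKey or RStudio.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line for y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up data: y = x^2, a perfect but non-linear relationship.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

intercept, slope = least_squares(xs, ys)
print(intercept, slope)  # 2.0 0.0
```

The computation succeeds and returns a flat line at the mean of y, which says nothing about the strong curved pattern a scatterplot would show.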

21
Regression Caution 3
Outliers (especially outliers in both variables) can be very influential on the regression line.
ALWAYS PLOT YOUR DATA!
l.aspx?ID=L455
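To see how influential one point can be, the sketch below fits a line to made-up data with and without a single point that is unusual in both x and y. The data and the helper function are illustrative assumptions.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line for y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up data lying exactly on the line y = 2x.
clean_x = [1, 2, 3, 4, 5]
clean_y = [2, 4, 6, 8, 10]

# The same data plus one point that is extreme in both variables.
outlier_x = clean_x + [20]
outlier_y = clean_y + [0]

_, slope_clean = least_squares(clean_x, clean_y)
_, slope_outlier = least_squares(outlier_x, outlier_y)

print(slope_clean)               # 2.0
print(round(slope_outlier, 2))   # negative: one point flipped the sign
```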

22
Life Expectancy and Birth Rate
Coefficients: (Intercept) LifeExpectancy
Which of the following interpretations is correct?
(a) A decrease of 0.89 in the birth rate corresponds to a 1 year increase in predicted life expectancy
(b) Increasing life expectancy by 1 year will cause birth rate to decrease by 0.89
(c) Both
(d) Neither

23
Regression Caution 4
Higher values of x may lead to higher (or lower) predicted values of y, but this does NOT mean that changing x will cause y to increase or decrease.
Causation can only be determined if the values of the explanatory variable were determined randomly (which is rarely the case for a continuous explanatory variable).

24
Explanatory and Response
Unlike correlation, for linear regression it does matter which is the explanatory variable and which is the response.
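The asymmetry can be checked numerically. In this sketch (made-up data, hand-written helper), regressing y on x and regressing x on y give two different lines: the second slope is not the reciprocal of the first. The product of the two slopes equals r², which is the same whichever variable is called x.

```python
def slope_of(response, explanatory):
    """Least squares slope when `response` is regressed on `explanatory`."""
    n = len(response)
    e_bar = sum(explanatory) / n
    r_bar = sum(response) / n
    s_er = sum((e - e_bar) * (r - r_bar) for e, r in zip(explanatory, response))
    s_ee = sum((e - e_bar) ** 2 for e in explanatory)
    return s_er / s_ee


# Made-up data.
xs = [1, 2, 3, 4, 5]
ys = [20, 10, 40, 30, 50]

slope_y_on_x = slope_of(ys, xs)  # y is the response
slope_x_on_y = slope_of(xs, ys)  # x is the response

print(slope_y_on_x)                 # 8.0
print(slope_x_on_y)                 # 0.08, not 1/8 = 0.125
print(slope_y_on_x * slope_x_on_y)  # about 0.64, which is r squared
```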

25
Regression Line
How do we find the best fitting line?

26
Predicted and Actual Values

27
Predicted and Actual Values

28
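Residual
The residual for a data point is the observed y value minus the predicted y value.
The residual is also the vertical distance from each point to the line (positive when the point is above the line, negative when below).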
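As a small sketch, residuals can be computed as observed minus predicted for each point. The data and the line below are made up for illustration.

```python
def residuals(xs, ys, intercept, slope):
    """Observed y minus predicted y for each data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Made-up data and a hypothetical line: predicted y = 0.5 + 2x.
xs = [1, 2, 3]
ys = [2, 5, 6]

print(residuals(xs, ys, intercept=0.5, slope=2.0))  # [-0.5, 0.5, -0.5]
```

A negative residual means the point lies below the line; a positive residual means it lies above.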

29
Residual
We want to make all the residuals as small as possible. How would you measure this?

30
Least Squares Regression
Least squares regression chooses the regression line that minimizes the sum of squared residuals.
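A sketch of the least squares computation on made-up data, using the standard closed-form slope and intercept. It then checks that nudging the slope or intercept in any direction makes the sum of squared residuals larger. The function names are illustrative.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum over all points of (observed y - predicted y) squared."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Made-up data.
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 4]

intercept, slope = least_squares(xs, ys)
best = sum_squared_residuals(xs, ys, intercept, slope)
print(intercept, slope, best)  # roughly 1.5, 0.8, 1.8

# Any nearby line does worse than the least squares line.
nearby = [
    sum_squared_residuals(xs, ys, intercept + di, slope + ds)
    for di in (-0.1, 0.0, 0.1)
    for ds in (-0.1, 0.0, 0.1)
    if (di, ds) != (0.0, 0.0)
]
print(all(value > best for value in nearby))  # True
```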

31
Least Squares Regression

32
Sample to Population
Everything we have done so far is based solely on sample data.
Now, we will extend from the sample to the population.

33
Simple Linear Model
y = β0 + β1·x + ε, where β0 is the intercept, β1 is the slope, and ε is the random error

34
Inference for the Slope

35
Inference for Slope
Give a 95% confidence interval for the true slope.
Is the slope significantly different from 0?
(a) Yes
(b) No

36
Confidence Interval
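The slide's worked interval did not survive in this transcript, so what follows is only a sketch of the usual t-based form, slope ± t* × SE, on made-up data. It assumes the standard error formula SE = sqrt(SSE / (n − 2)) / sqrt(Σ(x − x̄)²) and a t* of 2.048 (the 97.5th percentile of a t-distribution with 28 degrees of freedom, matching the 30 made-up points). None of these numbers come from the cricket data.

```python
import math


def slope_confidence_interval(xs, ys, t_star):
    """Return (slope, lower, upper) for the interval slope +/- t_star * SE."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Sum of squared residuals around the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Standard error of the slope.
    se = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    return slope, slope - t_star * se, slope + t_star * se


# Made-up data: 30 points scattered around the line y = 10 + 0.5x.
noise = [1.0, -1.0, 0.5, -0.5, 0.0]
xs = list(range(1, 31))
ys = [10 + 0.5 * x + noise[i % 5] for i, x in enumerate(xs)]

T_STAR = 2.048  # assumption: 95% interval, n - 2 = 28 degrees of freedom

slope, lower, upper = slope_confidence_interval(xs, ys, T_STAR)
print(round(slope, 3), round(lower, 3), round(upper, 3))
```

If the interval excludes 0, the slope is significantly different from 0 at the 5% level.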

37
Hypothesis Test

38
Correlation Test
Test for a correlation between temperature and cricket chirps (r = ).

39
Two Quantitative Variables
The t-statistic (and p-value) for a test for a non-zero slope and a test for a non-zero correlation are identical!
They are equivalent ways of testing for an association between two quantitative variables.
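The equivalence can be verified numerically. The sketch below uses made-up data and the standard formulas t = slope / SE for the slope and t = r·sqrt(n − 2) / sqrt(1 − r²) for the correlation; the variable names are illustrative.

```python
import math

# Made-up data.
xs = [1, 2, 3, 4, 5]
ys = [20, 10, 40, 30, 50]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# t-statistic for the slope: slope divided by its standard error.
slope = sxy / sxx
intercept = y_bar - slope * x_bar
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
se_slope = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
t_slope = slope / se_slope

# t-statistic for the correlation.
r = sxy / math.sqrt(sxx * syy)
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(round(t_slope, 4), round(t_corr, 4))  # the two values agree
```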

40
Small Samples
The t-distribution is only appropriate for large samples (definitely not n = 7)!
We should have done inference for the slope using simulation methods...
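One simulation approach is a randomization test: shuffle the y values to break any association, refit the slope each time, and see how often a shuffled slope is at least as extreme as the observed one. The sketch below assumes this particular procedure and uses seven made-up points, not the cricket data; the function names, the number of shuffles, and the seed are arbitrary choices.

```python
import random


def slope(xs, ys):
    """Least squares slope of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def randomization_p_value(xs, ys, reps=2000, seed=0):
    """Two-sided p-value for H0: slope = 0, found by shuffling the y values."""
    rng = random.Random(seed)
    observed = abs(slope(xs, ys))
    shuffled = list(ys)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(shuffled)  # break any link between x and y
        if abs(slope(xs, shuffled)) >= observed:
            extreme += 1
    return extreme / reps


# Made-up small sample (n = 7) with a strong linear trend.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9]

p_value = randomization_p_value(xs, ys)
print(p_value)
```

A small p-value means shuffled data rarely produce a slope as steep as the observed one, which is evidence of an association without relying on the t-distribution.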

42
r = 0
Challenge: If the correlation between x and y is 0, what would the regression line be?
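One way to explore the challenge is through the standard identity slope = r × (s_y / s_x), with the line passing through the point of means (x̄, ȳ). The sketch below plugs in r = 0 with made-up summary statistics; the function name and all the numbers are illustrative.

```python
def line_from_summary(r, s_x, s_y, x_bar, y_bar):
    """Return (intercept, slope) of the least squares line from summary statistics."""
    slope = r * s_y / s_x
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    return intercept, slope


# Made-up summary statistics with correlation exactly 0.
intercept, slope = line_from_summary(r=0.0, s_x=2.0, s_y=5.0, x_bar=10.0, y_bar=70.0)
print(intercept, slope)  # 70.0 0.0
```

With r = 0 the slope is 0, so the line is horizontal at the mean of y: every x gets the same prediction.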

43
To Do
Read Section 2.6, 9.1
Do HW 7 (due Monday, 3/31)
NO LATE HOMEWORK ACCEPTED – SOLUTIONS WILL BE POSTED IMMEDIATELY AFTER CLASS TO HELP YOU PREPARE FOR EXAM 2
