Chapter 8 Linear Regression (Math2200)

Scatterplot: total fat versus protein for 30 items on the Burger King menu. One Double Whopper contains 53 grams of protein, 65 grams of fat, and 1020 calories, so two Double Whoppers would contain roughly enough calories for a day.

The linear model: the model has two parameters, the intercept b0 and the slope b1, and gives a predicted value ŷ = b0 + b1x. The model is NOT perfect; each data point has a residual, residual = observed − predicted, or e = y − ŷ. The model overestimates when the residual < 0 and underestimates when the residual > 0.

The linear model (cont.): we write our model as ŷ = b0 + b1x. This model says that our predictions follow a straight line. If the model is a good fit, the data values will scatter closely around that line.

How did I get the line? The "best fit" line minimizes the residuals overall: the line of best fit is the line for which the sum of the squared residuals is smallest. That is, the least squares line minimizes Σe² = Σ(y − ŷ)².
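To make the idea concrete, here is a minimal Python sketch (not part of the slides): the protein and fat values are made up rather than the actual Burger King measurements, and np.polyfit is used to find the slope and intercept that make the sum of squared residuals as small as possible.

```python
import numpy as np

# Hypothetical protein (x) and fat (y) values, standing in for the BK data
protein = np.array([31.0, 25.0, 20.0, 53.0, 39.0])
fat = np.array([34.0, 27.0, 24.0, 65.0, 43.0])

# polyfit with deg=1 returns [slope, intercept] of the least squares line
b1, b0 = np.polyfit(protein, fat, deg=1)

residuals = fat - (b0 + b1 * protein)
print("slope:", b1, "intercept:", b0)
print("sum of squared residuals:", np.sum(residuals**2))
```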

How do I get the line? (cont.): the regression line is ŷ = b0 + b1x, where the slope is b1 = r·(sy/sx), in units of y per unit of x, and the intercept is b0 = (mean of y) − b1·(mean of x), in units of y.

Interpreting the regression line: the slope says that increasing x by 1 unit changes the predicted y by b1 units. In particular, moving one standard deviation away from the mean in x moves us r standard deviations away from the mean in y. The intercept is the predicted value of y when x = 0. The predicted value at any x is ŷ = b0 + b1x.
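As a quick check of these formulas, the sketch below (same made-up data as the earlier snippet) computes the slope and intercept from r, the standard deviations, and the means, and compares them with what np.polyfit returns.

```python
import numpy as np

x = np.array([31.0, 25.0, 20.0, 53.0, 39.0])
y = np.array([34.0, 27.0, 24.0, 65.0, 43.0])

r = np.corrcoef(x, y)[0, 1]
b1 = r * y.std(ddof=1) / x.std(ddof=1)   # slope = r * s_y / s_x
b0 = y.mean() - b1 * x.mean()            # the line passes through (x-bar, y-bar)

print(b1, b0)
print(np.polyfit(x, y, deg=1))           # should agree: [slope, intercept]
```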

Fat Versus Protein: An Example. The regression line for the Burger King data fits the data well; the equation is predicted fat = 6.8 + 0.97·protein. The predicted fat content for a BK Broiler chicken sandwich, which has 30 grams of protein, is 6.8 + 0.97(30) = 35.9 grams of fat.
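The prediction step is just plugging a value of x into the fitted line; a tiny sketch:

```python
b0, b1 = 6.8, 0.97        # intercept and slope of the BK regression line
protein = 30              # grams of protein in the BK Broiler chicken sandwich
print(b0 + b1 * protein)  # predicted fat: 35.9 grams
```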

How Big Can Predicted Values Get? r cannot be bigger than 1 (in absolute value), so each predicted y tends to be closer to its mean (in standard deviations) than its corresponding x was. This property of the linear model is called regression to the mean; the line is called the regression line.
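In standardized units the prediction is simply z-hat of y = r times z of x, so predictions can never be more extreme (in standard deviations) than the x values they come from. A small sketch with the made-up data from above:

```python
import numpy as np

x = np.array([31.0, 25.0, 20.0, 53.0, 39.0])
y = np.array([34.0, 27.0, 24.0, 65.0, 43.0])

r = np.corrcoef(x, y)[0, 1]
zx = (x - x.mean()) / x.std(ddof=1)   # x in standard deviations from its mean
zy_hat = r * zx                       # predicted y, in standard deviations
print(np.abs(zy_hat) <= np.abs(zx))   # always True, since |r| <= 1
```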

Sir Francis Galton, who first described regression to the mean while studying the heights of parents and their children.

Residuals Revisited: the residuals are the part of the data that hasn't been modeled. Data = Model + Residual, or (equivalently) Residual = Data − Model. Or, in symbols, e = y − ŷ.

Residuals Revisited (cont.): when a regression model is appropriate, nothing interesting should be left behind. The residual plot (residuals against x, or residuals against the predicted values) should show no pattern, no bend, and no outliers, and the spread of the residuals should be the same throughout.

Residuals Revisited (cont.): if the residuals show no interesting pattern in the residual plot, we use the standard deviation of the residuals to measure how much the points spread around the regression line. The standard deviation of the residuals is given by se = sqrt(Σe² / (n − 2)). We need the Equal Variance Assumption for the standard deviation of the residuals; the associated condition to check is the Does the Plot Thicken? Condition.
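A sketch of that computation, again on the made-up data used in the earlier snippets:

```python
import numpy as np

x = np.array([31.0, 25.0, 20.0, 53.0, 39.0])
y = np.array([34.0, 27.0, 24.0, 65.0, 43.0])

b1, b0 = np.polyfit(x, y, deg=1)
e = y - (b0 + b1 * x)                       # residuals

s_e = np.sqrt(np.sum(e**2) / (len(x) - 2))  # s_e = sqrt(sum(e^2) / (n - 2))
print(s_e)
```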

Residual plot for the regression of stopping distance on car speed.

How well does the linear model fit? The variation in the residuals is the key to assessing how well the model fits. For the Burger King data, total fat (y) has a standard deviation of 16.4 g, while the residuals have a standard deviation of only 9.2 g, which is less variation. How much of the variation is accounted for by the model, and how much is left in the residuals?

Variation: the squared correlation, r², gives the fraction of the data's variation explained by the model. We can view 1 − r² as the fraction of the variability in y that is NOT explained by the regression line, that is, the variability left in the residuals. For the BK model, r² = 0.83² = 0.69, so 31% of the variability in total fat has been left in the residuals.
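The sketch below (same made-up data) computes r² both ways, as the squared correlation and as one minus the residual sum of squares over the total sum of squares; the two values agree.

```python
import numpy as np

x = np.array([31.0, 25.0, 20.0, 53.0, 39.0])
y = np.array([34.0, 27.0, 24.0, 65.0, 43.0])

b1, b0 = np.polyfit(x, y, deg=1)
e = y - (b0 + b1 * x)

r = np.corrcoef(x, y)[0, 1]
r2_from_correlation = r**2
r2_from_sums = 1 - np.sum(e**2) / np.sum((y - y.mean())**2)
print(r2_from_correlation, r2_from_sums)   # same number both ways
```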

R²: The Variation Accounted For. R² is always between 0% and 100%: 0 means the model explains none of the variance, and 1 means it explains all of it. How big should R² be to conclude that the model fits the data well? What makes a "good" R² value depends on the kind of data you are analyzing and on what you want to do with it.

Check the following conditions: the two variables are both quantitative; the relationship is linear (straight enough), checked with the scatterplot and the residual plot; there are no outliers (are there very large residuals?); and the variance is equal, meaning all residuals share the same spread, checked in the residual plot (Does the Plot Thicken?).

Summary: Is the linear model appropriate? Check the residual plot. How well does the model fit? Check R².

Reality Check: Is the Regression Reasonable? Statistics don’t come out of nowhere. They are based on data. The results of a statistical analysis should reinforce your common sense, not fly in its face. If the results are surprising, then either you’ve learned something new about the world or your analysis is wrong. When you perform a regression, think about the coefficients and ask yourself whether they make sense.

What Can Go Wrong? Don’t fit a straight line to a nonlinear relationship. Beware extraordinary points (y-values that stand off from the linear pattern or extreme x-values). Don’t extrapolate beyond the data—the linear model may no longer hold outside of the range of the data. Don’t infer that x causes y just because there is a good linear model for their relationship—association is not causation. Don’t choose a model based on R2 alone.

What have we learned? When the relationship between two quantitative variables is fairly straight, a linear model can help summarize that relationship. The regression line doesn’t pass through all the points, but it is the best compromise in the sense that it has the smallest sum of squared residuals.

What have we learned? (cont.) The correlation tells us several things about the regression: The slope of the line is based on the correlation, adjusted for the units of x and y. For each SD in x that we are away from the x mean, we expect to be r SDs in y away from the y mean. Since r is always between -1 and +1, each predicted y is fewer SDs away from its mean than the corresponding x was (regression to the mean). R2 gives us the fraction of the response accounted for by the regression model.

What have we learned? (cont.) The residuals also reveal how well the model works. If a plot of the residuals against predicted values shows a pattern, we should re-examine the data to see why. The standard deviation of the residuals quantifies the amount of scatter around the line.

What have we learned? (cont.) The linear model makes no sense unless the Linear Relationship Assumption is satisfied. Also, we need to check the Straight Enough Condition and Outlier Condition with a scatterplot. For the standard deviation of the residuals, we must make the Equal Variance Assumption. We check it by looking at both the original scatterplot and the residual plot for Does the Plot Thicken? Condition.

TI-83: enter the data as lists first. Press STAT, move the cursor to CALC, then press 4 (LinReg(ax+b)) or 8 (LinReg(a+bx)). Enter the names of the lists you want to regress, e.g., L1, L2, and press ENTER. To also see the correlation coefficient r and r², turn the diagnostics on: press 2ND 0 (CATALOG), move the cursor down to DiagnosticsOn, and press ENTER (you will see 'Done'). Now repeat the regression steps above, and the output will include r and r².

TI-83: how to make the residual plot? The same way as a scatterplot: set Xlist to the explanatory variable and Ylist to the list RESID (the calculator stores the residuals there after a regression is run).
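For comparison, here is a small Python sketch (matplotlib rather than the TI-83, using the made-up data from the earlier snippets) that draws the scatterplot with the fitted line and the residual plot side by side:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([31.0, 25.0, 20.0, 53.0, 39.0])
y = np.array([34.0, 27.0, 24.0, 65.0, 43.0])

b1, b0 = np.polyfit(x, y, deg=1)
resid = y - (b0 + b1 * x)

xs = np.sort(x)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.scatter(x, y)              # scatterplot of the data
ax1.plot(xs, b0 + b1 * xs)     # fitted regression line
ax2.scatter(x, resid)          # residuals against the explanatory variable
ax2.axhline(0)                 # reference line at residual = 0
plt.show()
```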

Summary for Chapters 7 and 8: how to read a scatterplot (direction, form, strength); the correlation coefficient (when you can use it, how to calculate it, how to interpret it); linear regression (how to make predictions, how to read the residual plot).