Inference for Regression


Course: AP Statistics. Chapter: 27, Inference for Regression. Book: Stats: Modeling the World, 2nd edition. Authors: Bock, Velleman, and De Veaux (BVD).

Inference for: Categorical variables — use Chi-Squared procedures. Quantitative variables — use Linear Regression.

Regression reminders Regression Line: ŷ = b0 + b1x, where ŷ is the predicted value of y.

Regression reminders Regression Line: in ŷ = b0 + b1x, x is the explanatory variable and y is the response variable.

Regression reminders Regression Line: residual = actual value of y − predicted value of y, that is, e = y − ŷ.

So…what’s new?? Now, in Chapter 27, the regression line we find represents a SAMPLE of some given data. It’s the best-fit line for that sample, so the slope and y-intercept we have found are the statistics for that line. BIG QUESTION: What are the slope and y-intercept of the POPULATION regression line?

Chapter 27 We are going to use our SAMPLE statistics to estimate the POPULATION parameters (or, at least, we’ll get as close as we can). What are these called??

Population Parameters b1 = sample slope, β1 = population slope; b0 = sample y-intercept, β0 = population y-intercept.

Population Regression Line Sample regression line: ŷ = b0 + b1x. Population regression line: μy = β0 + β1x.

Population Regression Line So… we have two parameters to estimate. What do we do? First find the slope, and then find the y-intercept. What model will we use? Student’s t-curve, with degrees of freedom df = n − 2 (we lose two degrees of freedom because the line requires two estimates: the slope and the y-intercept).

Confidence Interval (for slope) How do we find the Standard Error??

Confidence Interval (for slope) How do we find the Standard Error?? We don’t! We’ll let our calculator (or a computer printout) give it to us.

Confidence Interval (for slope) Really? We don’t care about the Standard Error for the slope? Well… actually, we care a little. It depends on three things: 1) The spread of the residuals (more about this later!) 2) The spread of the x-values 3) The sample size (n)
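Those three ingredients combine into one standard formula: SE(b1) = s_e / (s_x · √(n − 1)), where s_e is the standard deviation of the residuals and s_x the standard deviation of the x-values. A minimal sketch with made-up data (the numbers and variable names here are invented for illustration, not taken from the printout):

```python
import numpy as np

# Hypothetical sample of ages and incomes (illustrative numbers only)
rng = np.random.default_rng(1)
x = rng.uniform(20, 60, size=27)                    # n = 27, so df = n - 2 = 25
y = 20000 + 240 * x + rng.normal(0, 9000, size=27)

n = len(x)
b1, b0 = np.polyfit(x, y, 1)                        # sample slope and intercept
resid = y - (b0 + b1 * x)

s_e = np.sqrt(np.sum(resid**2) / (n - 2))           # 1) spread of the residuals
s_x = np.std(x, ddof=1)                             # 2) spread of the x-values
se_b1 = s_e / (s_x * np.sqrt(n - 1))                # 3) n enters here and in s_e
print(se_b1)
```

An equivalent form is SE(b1) = s_e / √Σ(x − x̄)², which makes it clear that more spread-out x-values pin the slope down more precisely.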

Confidence Interval (for slope) Let’s find the Standard Error. Ready to try?? Here is a sample computer printout. [Printout: regression output for Income vs. Age]

Confidence Interval (for slope) Help! That’s too confusing. What do I need?

Confidence Interval (for slope) The Constant you see is the value of b0 (the y-intercept).

Confidence Interval (for slope) Age is the name of x (the explanatory variable), and the coefficient beside it is the slope, b1.

Confidence Interval (for slope) Income is the name of y (the response variable)

Confidence Interval (for slope) The degrees of freedom are given: df = 25.

Confidence Interval (for slope) And so is the Standard Error for the slope: SE(b1) = 337.7.

Confidence Interval (for slope) The equation of the regression line would be: predicted Income = Constant + (Age coefficient) × Age, read straight off the printout.

Confidence Interval (for slope) Wait a second….that’s chapter 8. We’re in Ch. 27. We want to find the Confidence Interval for Slope!

Confidence Interval (for slope) The interval is b1 ± t* × SE(b1), using the Student’s t-curve with df = n − 2.

Confidence Interval (for slope) I am 95% confident that the true slope of the regression line is between -451.5 and 939.7. But….that’s not in context! We need to state this in context…..

Confidence Interval (for slope) I am 95% confident that the true change in mean income for each 1-year increase in age is between $451.50 lost and $939.70 gained.
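As a check, the interval can be recomputed from the printout’s numbers. The slope of 244.1 used below is the midpoint of the interval quoted above (i.e., the printout’s Age coefficient); scipy is assumed available for the t* lookup:

```python
from scipy import stats

b1 = 244.1       # Age coefficient: the midpoint of the interval above
se_b1 = 337.7    # standard error of the slope, from the printout
df = 25          # n - 2, from the printout

t_star = stats.t.ppf(0.975, df)      # critical value for 95% confidence, df = 25
margin = t_star * se_b1
print(b1 - margin, b1 + margin)      # matches (-451.5, 939.7) up to rounding
```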

Hypothesis Testing for Slope That’s great, but what about Hypothesis Testing? What would the Null Hypothesis be?

Hypothesis Testing for Slope Remember, Null means Nothing. Or…no change. Therefore, for each increase in x there must be no change in y. That means the slope must be zero. Or….

Hypothesis Testing for Slope Hypotheses (2-tailed): H0: β1 = 0 versus HA: β1 ≠ 0.

Hypothesis Testing for Slope What about the t-score? The P-Value? Easy! Everything is based on the student’s t-curve, so the mechanics are the same….

Hypothesis Testing for Slope For our line, it would be: t = (b1 − 0) / SE(b1) = 244.1 / 337.7 ≈ 0.72, with df = 25.

Hypothesis Testing for Slope Hey…is that value in the computer printout??

Hypothesis Testing for Slope The P-Value is found the same way: the two-tailed area beyond |t| under the Student’s t-curve with df = 25.

Hypothesis Testing for Slope With a P-value of .4763, which is large compared to an alpha level of .05, I fail to reject the null and conclude that there is no evidence that the true slope differs from zero.
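The same arithmetic, sketched in Python (the slope of 244.1 is again read as the printout’s Age coefficient; scipy assumed):

```python
from scipy import stats

b1 = 244.1       # slope from the printout
se_b1 = 337.7    # standard error of the slope
df = 25          # n - 2

t = (b1 - 0) / se_b1                  # H0: beta1 = 0, so measure how far b1 is from 0
p = 2 * stats.t.sf(abs(t), df)        # two-tailed P-value from the Student's t-curve
print(round(t, 2), round(p, 4))
```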

Conditions & Assumptions What about the conditions and assumptions??? We skipped them…. And …… THAT’S BAD!

Conditions & Assumptions There are 4 of them to satisfy. 1) Linearity Assumption The scatterplot of the data should be “roughly linear”. We show this in two ways, and we have done both before! a) Graph the scatterplot and look at it. Does it look straight? b) Graph the residuals against the x-variable. They should be randomly scattered. If this condition fails, then straighten the data (see Ch. 9).

Conditions & Assumptions Here is a scatterplot comparing waist size to body fat percentage.

Conditions & Assumptions Here is a scatterplot of the residuals plotted against the x-value (waist size).

Conditions & Assumptions 2) Independence Assumption The next three are a little tricky. That’s only because we need to understand what is happening with inference on regression lines. Here’s the situation: When you have a sample of data and you find the sample regression line for that data you are fitting the line that best fits (or passes through) the y-values that you have plotted at each x-value. Here is an example:

Conditions & Assumptions Here is the scatterplot again (comparing waist size to body fat percentage).

Conditions & Assumptions At each x-value there are multiple y-values that spread out around the line.

Conditions & Assumptions For the true regression line, the y-values at each x-value should each be nearly normal. In fact, notice that the true regression line passes through the mean of each set of y-values…..

Conditions & Assumptions That means the true regression line can be thought of as the model y = β0 + β1x + ε, and the residuals would be ε = y − μy. Notice that this is the same as ε = y − (β0 + β1x).
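A tiny simulation can make this concrete: under the population model y = β0 + β1x + ε, the deviations from the true line are exactly the errors ε. The parameter values below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma = 3.0, 0.6, 2.0      # hypothetical true intercept, slope, error SD

x = rng.uniform(30, 45, size=200)        # e.g. waist sizes
eps = rng.normal(0, sigma, size=200)     # errors: normal, mean 0, constant spread
y = beta0 + beta1 * x + eps              # population model: y = beta0 + beta1*x + eps

mu_y = beta0 + beta1 * x                 # true mean response at each x
resid_true = y - mu_y                    # deviations from the true line
print(np.allclose(resid_true, eps))      # True: residuals from the true line ARE eps
```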

Conditions & Assumptions So..the residuals are really what we care about here, not the y-values…. If the residuals for a regression model make sense then so will the y-values.

Conditions & Assumptions 2) Independence Assumption Okay, back to #2. We now know the residuals (errors) are what we care about here. For #2 we want these to be independent for a given sample. If the sample was collected randomly, we are fine. Just state that the data can be assumed to be independent because the sample was random. You have no reason to believe that any y-value (or residual) has any impact on another one. Easy!

Conditions & Assumptions 2) Independence Assumption Wait…didn’t you say this was hard? Well, it can be. If you are graphing a time plot (x represents time) the y-values might not be independent. Now you need to check the residuals. So…we graph them against the x-values (you already did this!) and see what we get. It should be a random scatter. Any pattern will show there is some sort of relationship which indicates a lack of independence. Moving on….

Conditions & Assumptions 3) Equal Variance Assumption Okay…this one is a little tricky. But, that’s only because you don’t know WHY we are checking for it. Let’s stop and figure that out first. The best thing to do is to go once more to that image of normal models along the line….

Conditions & Assumptions 3) Equal Variance Assumption What we want is for the spread of each set of y-values to be roughly the same. Remember, we care about residuals, so what this means is that we want the Standard Deviation of the residuals to be uniform: the spread of the residuals should be the same throughout.

Conditions & Assumptions 3) Equal Variance Assumption That means we want the spread of each set of y-values to be roughly the same. Remember, we care about residuals, so we want the Standard Deviation of the residuals to be uniform. Huh? Well, it means the residuals should not fan out or clump together; the spread about the line should be the same (constant) throughout. This is called the “DOES THE PLOT THICKEN?” condition. How do we check for this? Residuals again. If the plot does fan out, it will show up in the plot of the residuals against the predicted values. Here it is:

Conditions & Assumptions 3) Equal Variance Assumption [Residual plot: residuals vs. predicted body fat percentage — randomly scattered residuals!]
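Alongside eyeballing the residual plot, a rough numeric version of the “does the plot thicken?” check is to compare the residual spread on the left and right halves of the x-range. The data below are simulated for illustration (with constant spread by design):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(30, 45, size=100))       # e.g. waist sizes, sorted
y = 3 + 0.6 * x + rng.normal(0, 2, size=100)     # constant error spread by design

b1, b0 = np.polyfit(x, y, 1)
resid = y - (b0 + b1 * x)

lo_sd = resid[:50].std(ddof=1)    # residual spread among the smaller x-values
hi_sd = resid[50:].std(ddof=1)    # residual spread among the larger x-values
print(round(lo_sd / hi_sd, 2))    # a ratio near 1 means no fanning out
```

A ratio far from 1 (say, more than double in one half) would be the numeric footprint of a thickening residual plot.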

Conditions & Assumptions 4) Normal Population Assumption We’ve already looked at the residuals and seen that we want them to be nearly normal at each x-value. This is important so we can use the Student t-curve in the mechanics section. How do we check this? Group all the residuals together (they are sitting in the Resid List, waiting for you!) and graph them to see if they are nearly normal. Graph them? To check for nearly normal? HOW????

Conditions & Assumptions 4) Normal Population Assumption HOW? You know how! We’ve done this before!! Graph them as a histogram and check for unimodal and symmetric. What about the normal probability plot? Should we do that as well? Not this time! Phew!!

Practice Problem! Let’s try one that is done on the calculator instead of from the computer printout. The best part is that regression inference for the slope is never done entirely by hand: all of the calculations are either given by the calculator or the computer, or are really easy. And… one more thing… we rarely do inference for the y-intercept. It usually isn’t a value we care about!
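For readers without the calculator at hand, the same LinRegTTest-style output can be sketched in Python with scipy’s linregress. The (age, income) data below are invented for illustration, not the printout’s data:

```python
import numpy as np
from scipy import stats

# Hypothetical (age, income) data -- illustrative only
age = np.array([23, 27, 31, 35, 39, 43, 47, 51, 55, 59])
income = 1000.0 * np.array([31, 35, 33, 42, 40, 48, 44, 53, 50, 58])

res = stats.linregress(age, income)   # slope, intercept, SE, and P-value in one call
print(f"slope = {res.slope:.1f}, SE(slope) = {res.stderr:.1f}")
print(f"t = {res.slope / res.stderr:.2f}, two-tailed P = {res.pvalue:.4g}")
```

linregress reports the two-sided P-value for H0: slope = 0 using the Student’s t-curve with df = n − 2, exactly the test described above.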