Copyright © 2014, 2011 Pearson Education, Inc. 1 Chapter 21 The Simple Regression Model.

Presentation transcript:

Copyright © 2014, 2011 Pearson Education, Inc. 1 Chapter 21 The Simple Regression Model

Copyright © 2014, 2011 Pearson Education, Inc The Simple Regression Model What is the turnaround time for small orders of specialized parts? Use a simple regression model with response time as y and order size as x. Use inference related to regression: standard errors, confidence intervals, and hypothesis tests.

Copyright © 2014, 2011 Pearson Education, Inc The Simple Regression Model Estimated Production Time = Number of Units

Copyright © 2014, 2011 Pearson Education, Inc The Simple Regression Model  Simple Regression Model (SRM): model for the association in the population between an explanatory variable x and response y.  Consider the data to be a sample from a population.

Copyright © 2014, 2011 Pearson Education, Inc The Simple Regression Model Linear on Average The equation of the SRM describes how the conditional mean of Y depends on X. The SRM shows that these means lie on a line with intercept $\beta_0$ and slope $\beta_1$: $\mu_{y|x} = E(Y \mid X = x) = \beta_0 + \beta_1 x$.

Copyright © 2014, 2011 Pearson Education, Inc The Simple Regression Model Deviations from the Mean The deviations of responses around $\mu_{y|x}$ are called errors. The error is denoted by $\varepsilon$, and $E(\varepsilon) = 0$.

Copyright © 2014, 2011 Pearson Education, Inc The Simple Regression Model Deviations from the Mean The SRM makes three assumptions about $\varepsilon$: 1. Independent. Errors are independent of each other. 2. Equal variance. All errors have the same variance, $Var(\varepsilon) = \sigma_\varepsilon^2$. 3. Normal. The errors are normally distributed.

Copyright © 2014, 2011 Pearson Education, Inc The Simple Regression Model Data Generating Process  Let Y denote monthly sales of a company and let X denote its spending on advertising (both in thousands of dollars).  Assume the following population model:

Copyright © 2014, 2011 Pearson Education, Inc The Simple Regression Model Data Generating Process The SRM assumes a normal distribution at each x.

Copyright © 2014, 2011 Pearson Education, Inc The Simple Regression Model Data Generating Process Eventually the data shown below are observed.

Copyright © 2014, 2011 Pearson Education, Inc The Simple Regression Model Data Generating Process  The true regression line is a characteristic of the population, not the observed data.  The SRM is a model and offers a simplified view of reality.

Copyright © 2014, 2011 Pearson Education, Inc The Simple Regression Model Simple Regression Model (SRM) Observed values of the response Y are linearly related to the values of the explanatory variable X by the equation $y = \beta_0 + \beta_1 x + \varepsilon$, with $\varepsilon \sim N(0, \sigma_\varepsilon^2)$. The observations are independent of one another, have equal variance around the regression line, and are normally distributed around the regression line.
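The data generating process described by the SRM can be sketched in a few lines of Python. This is only an illustration: the function name `simulate_srm` and the population values (intercept 50, slope 3, error standard deviation 4) are assumptions made up for the sketch and do not come from the slides.

```python
import random
import statistics

def simulate_srm(xs, beta0, beta1, sigma, seed=0):
    """Draw one response per x from y = beta0 + beta1 * x + eps,
    where the errors eps are independent N(0, sigma^2)."""
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

# Hypothetical population values, chosen only for illustration.
BETA0, BETA1, SIGMA = 50.0, 3.0, 4.0
xs = [10 + 0.5 * i for i in range(200)]
ys = simulate_srm(xs, BETA0, BETA1, SIGMA)

# Errors are deviations from the true (population) line, not from a fitted line.
errors = [y - (BETA0 + BETA1 * x) for x, y in zip(xs, ys)]
print("mean of errors:", round(statistics.mean(errors), 2))
print("sd of errors:  ", round(statistics.stdev(errors), 2))
```

With a sample of this size the mean of the simulated errors should be close to 0 and their standard deviation close to the assumed value of 4, which is what the three SRM assumptions imply.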

Copyright © 2014, 2011 Pearson Education, Inc Conditions for the SRM Conditions for the SRM – Checklist  Is the association between Y and X linear?  Have we ruled out lurking variables?  Are the errors evidently independent?  Are the variances of the residuals similar?  Are the residuals nearly normal?

Copyright © 2014, 2011 Pearson Education, Inc Conditions for the SRM Conditions – Production Time Example Linearity satisfied; no pattern in the residuals. Similar variances satisfied; spread of residuals constant around horizontal line.

Copyright © 2014, 2011 Pearson Education, Inc Conditions for the SRM Conditions – Production Time Example No obvious lurking variable. Without knowing more about context, we can only guess at a lurking variable (e.g., complexity of parts ordered). Evidently independent. Is there any reason to believe that the time needed for one run influences those of others? If data are time series, plot residuals over time.

Copyright © 2014, 2011 Pearson Education, Inc Conditions for the SRM Conditions for the SRM – Production Time Example Nearly normal condition satisfied. If it were not, the sample size condition (also satisfied here) would be needed in order to rely on the central limit theorem (CLT).

Copyright © 2014, 2011 Pearson Education, Inc Conditions for the SRM Modeling Process Before looking at plots, ask two questions: 1. Does a linear relationship make sense? 2. Is the relationship free of lurking variables? Then begin working with data.

Copyright © 2014, 2011 Pearson Education, Inc Conditions for the SRM Modeling Process  Plot y versus x and verify a linear association.  Fit the least squares line and obtain residuals.  Plot the residuals versus x.  If time series data, construct a timeplot of residuals.  Inspect the histogram and quantile plot of the residuals.
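The fitting step of the modeling process can be sketched as follows. The helper names `fit_line` and `residuals` and the five data points are hypothetical, invented for the illustration; the formulas are the usual least squares estimates.

```python
import statistics

def fit_line(xs, ys):
    """Least squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    xbar, ybar = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

def residuals(xs, ys, b0, b1):
    """Observed response minus fitted value, one residual per observation."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_line(xs, ys)
res = residuals(xs, ys, b0, b1)
print("b0 =", round(b0, 3), "b1 =", round(b1, 3))
print("residuals:", [round(r, 3) for r in res])
```

The residuals returned here are what the later steps plot against x, against time, and in a histogram or quantile plot. By construction they sum to zero, so a check of the conditions looks at their pattern and spread rather than their average.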

Copyright © 2014, 2011 Pearson Education, Inc Inference in Regression Parameters and Estimates for SRM

Copyright © 2014, 2011 Pearson Education, Inc Inference in Regression Standard Errors Standard errors describe the sample-to-sample variability of $b_0$ and $b_1$. The estimated standard error of $b_1$ is $se(b_1) = \frac{s_e}{\sqrt{n-1}} \cdot \frac{1}{s_x}$.

Copyright © 2014, 2011 Pearson Education, Inc Inference in Regression Estimated Standard Error of $b_1$ Influenced by: Standard deviation of the residuals. As it increases, the standard error increases. Sample size. As it increases, the standard error decreases. Standard deviation of x. As it increases, the standard error decreases.

Copyright © 2014, 2011 Pearson Education, Inc Inference in Regression More variation in x leads to better estimate of slope.
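The effect of variation in x on the slope estimate can be checked numerically. The sketch below assumes the standard formula $se(b_1) = s_e / (\sqrt{n-1}\, s_x)$; the function name `se_slope`, the residual standard deviation of 5, and both sets of x values are hypothetical.

```python
import math
import statistics

def se_slope(s_e, xs):
    """Estimated standard error of the slope: s_e / (sqrt(n - 1) * s_x)."""
    n = len(xs)
    s_x = statistics.stdev(xs)
    return s_e / (math.sqrt(n - 1) * s_x)

# Same sample size and residual SD, different spread in x (made-up values).
narrow_x = [9, 10, 11, 9, 10, 11]
wide_x = [0, 10, 20, 0, 10, 20]

print("se(b1), narrow x:", se_slope(5.0, narrow_x))
print("se(b1), wide x:  ", se_slope(5.0, wide_x))
```

Spreading the x values ten times wider makes the standard error of the slope ten times smaller, holding the residual standard deviation and sample size fixed.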

Copyright © 2014, 2011 Pearson Education, Inc Inference in Regression Results for Production Time Example

Copyright © 2014, 2011 Pearson Education, Inc Inference in Regression Confidence Intervals The 95% confidence interval for $\beta_1$ is $b_1 \pm t_{0.025,\,n-2}\, se(b_1)$. The 95% confidence interval for $\beta_0$ is $b_0 \pm t_{0.025,\,n-2}\, se(b_0)$.

Copyright © 2014, 2011 Pearson Education, Inc Inference in Regression Confidence Intervals – Production Time Example The 95% confidence interval for β 1 is The 95% confidence interval for β 0 is

Copyright © 2014, 2011 Pearson Education, Inc Inference in Regression Hypothesis Tests To test $H_0: \beta_1 = 0$ use $t = b_1 / se(b_1)$. To test $H_0: \beta_0 = 0$ use $t = b_0 / se(b_0)$.

Copyright © 2014, 2011 Pearson Education, Inc Inference in Regression Hypothesis Tests – Production Time Example  The t-statistic of with p-value of < indicates that the slope is significantly different from zero.  The t-statistic of 4.2 with p-value of indicates that the intercept is significantly different from zero.

Copyright © 2014, 2011 Pearson Education, Inc Inference in Regression Equivalent Inferences for SRM We reject the claim that a parameter in the SRM equals zero with 95% confidence (or a 5% chance of Type I error) if  Zero lies outside the 95% confidence interval;  The absolute value of the associated t-statistic is larger than 2; or  The p-value reported with the t-statistic is less than 0.05.
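The equivalence between the confidence interval rule and the t-statistic rule can be illustrated with a short sketch. It uses the rule-of-thumb critical value of 2 from the slide in place of the exact t percentile, and the function names and the (estimate, standard error) pairs are hypothetical. The p-value rule is not shown because the Python standard library has no t distribution.

```python
def t_statistic(estimate, std_error):
    """t-statistic for testing that the parameter equals zero."""
    return estimate / std_error

def confidence_interval(estimate, std_error, t_crit=2.0):
    """Approximate 95% interval: estimate +/- t_crit * standard error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width

def rejects_by_interval(estimate, std_error, t_crit=2.0):
    """Reject H0 when zero lies outside the confidence interval."""
    low, high = confidence_interval(estimate, std_error, t_crit)
    return low > 0 or high < 0

def rejects_by_t(estimate, std_error, t_crit=2.0):
    """Reject H0 when |t| exceeds the critical value."""
    return abs(t_statistic(estimate, std_error)) > t_crit

# Made-up (estimate, standard error) pairs, for illustration only.
for est, se in [(2.4, 0.5), (0.6, 0.5), (-1.5, 0.5)]:
    print(est, se, rejects_by_interval(est, se), rejects_by_t(est, se))
```

Both rules compare the size of the estimate with the same multiple of its standard error, so they always agree.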

Copyright © 2014, 2011 Pearson Education, Inc. 29 4M Example 21.1: LOCATING A FRANCHISE OUTLET Motivation Does traffic volume affect gasoline sales? How much more gasoline can be expected to be sold at a franchise location with an average of 40,000 drive-bys compared to one with an average of 32,000 drive-bys?

Copyright © 2014, 2011 Pearson Education, Inc. 30 4M Example 21.1: LOCATING A FRANCHISE OUTLET Method Use sales data from a recent month obtained from 80 franchise outlets. The 95% confidence interval for 8,000 times the estimated slope will indicate how much more gas is expected to sell at the busier location.

Copyright © 2014, 2011 Pearson Education, Inc. 31 4M Example 21.1: LOCATING A FRANCHISE OUTLET Method Association is linear; no obvious lurking variable.

Copyright © 2014, 2011 Pearson Education, Inc. 32 4M Example 21.1: LOCATING A FRANCHISE OUTLET Mechanics

Copyright © 2014, 2011 Pearson Education, Inc. 33 4M Example 21.1: LOCATING A FRANCHISE OUTLET Mechanics Residual plot confirms similar variances.

Copyright © 2014, 2011 Pearson Education, Inc. 34 4M Example 21.1: LOCATING A FRANCHISE OUTLET Mechanics Residuals appear normally distributed.

Copyright © 2014, 2011 Pearson Education, Inc. 35 4M Example 21.1: LOCATING A FRANCHISE OUTLET Mechanics The 95% confidence interval for β 1 is approximately to gallons/car. Hence, a difference of 8,000 cars in daily traffic volume implies a difference in average daily sales of approximately 1,507 to 2,281 more gallons per day.

Copyright © 2014, 2011 Pearson Education, Inc. 36 4M Example 21.1: LOCATING A FRANCHISE OUTLET Message Based on a sample of 80 gas stations, we expect that a station located at a site with 40,000 drive-bys will sell, on average, from 1,507 to 2,281 more gallons of gas daily than a location with 32,000 drive-bys.
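The scaling step in this example, multiplying a confidence interval for the slope by the difference of 8,000 cars, can be sketched as below. The helper `scale_interval` is hypothetical. The per-car endpoints it prints are back-calculated from the reported 1,507 to 2,281 gallons and are not values read from the slides.

```python
def scale_interval(interval, k):
    """Scale a confidence interval by a positive constant k.

    If (low, high) is an interval for beta1, then (k * low, k * high)
    is the matching interval for k * beta1 when k > 0.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    low, high = interval
    return k * low, k * high

extra_cars = 40_000 - 32_000          # difference in drive-bys
reported_gallons = (1_507.0, 2_281.0) # interval reported in the example

# Back-calculate the per-car slope interval implied by the reported gallons.
implied_per_car = scale_interval(reported_gallons, 1 / extra_cars)
print("implied gallons/car:", tuple(round(v, 3) for v in implied_per_car))

# Scaling back up recovers the reported interval.
print("gallons/day:", scale_interval(implied_per_car, extra_cars))
```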

Copyright © 2014, 2011 Pearson Education, Inc Prediction Intervals Leveraging the SRM  Prediction interval: an interval designed to hold a fraction (usually 95%) of the values of the response for a given value of x.  A prediction interval differs from a confidence interval because it makes a statement about the location of a new observation rather than a parameter of a population.

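Copyright © 2014, 2011 Pearson Education, Inc Prediction Intervals Leveraging the SRM The 95% prediction interval for $y_{new}$ is $\hat{y}_{new} \pm t_{0.025,\,n-2}\, se(\hat{y}_{new})$, where $\hat{y}_{new} = b_0 + b_1 x_{new}$ and $se(\hat{y}_{new}) = s_e \sqrt{1 + \frac{1}{n} + \frac{(x_{new} - \bar{x})^2}{(n-1)\, s_x^2}}$.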

Copyright © 2014, 2011 Pearson Education, Inc Prediction Intervals Leveraging the SRM A simple approximation for a 95% prediction interval is $\hat{y}_{new} \pm 2 s_e$. Prediction intervals are reliable within the range of observed data. They are also sensitive to the assumptions of constant variance and normality.
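A sketch of the prediction interval calculation, comparing the standard formula with the simple approximation of two residual standard deviations on either side of the prediction. The function names, the fitted values (b0 = 1, b1 = 2, s_e = 3), and the x values are hypothetical, and the critical value defaults to 2 rather than an exact t percentile.

```python
import math
import statistics

def prediction_interval(x_new, xs, b0, b1, s_e, t_crit=2.0):
    """Prediction interval y_hat +/- t_crit * se(y_hat_new), with
    se(y_hat_new) = s_e * sqrt(1 + 1/n + (x_new - xbar)^2 / ((n - 1) * s_x^2))."""
    n = len(xs)
    xbar = statistics.mean(xs)
    s_x = statistics.stdev(xs)
    y_hat = b0 + b1 * x_new
    se_pred = s_e * math.sqrt(
        1 + 1 / n + (x_new - xbar) ** 2 / ((n - 1) * s_x ** 2)
    )
    return y_hat - t_crit * se_pred, y_hat + t_crit * se_pred

def approx_prediction_interval(x_new, b0, b1, s_e):
    """Simple approximation: y_hat +/- 2 * s_e."""
    y_hat = b0 + b1 * x_new
    return y_hat - 2 * s_e, y_hat + 2 * s_e

# Made-up fit and design points, for illustration only.
xs = list(range(1, 21))
b0, b1, s_e = 1.0, 2.0, 3.0

mid = prediction_interval(10.5, xs, b0, b1, s_e)   # at the mean of x
far = prediction_interval(40.0, xs, b0, b1, s_e)   # well outside the data
approx = approx_prediction_interval(10.5, b0, b1, s_e)
print("at mean of x:", mid)
print("extrapolated:", far)
print("approximation:", approx)
```

The interval from the full formula is slightly wider than the approximation even at the mean of x, and it widens as x_new moves away from the observed data. The widening only reflects uncertainty in the fitted line; it does not protect against the SRM failing outside the observed range.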

Copyright © 2014, 2011 Pearson Education, Inc Prediction Intervals Leveraging the SRM – Production Time Example At x = 300 units, = minutes. The resulting 95% prediction interval is [660.9 to 1,148.4] minutes.

Copyright © 2014, 2011 Pearson Education, Inc Prediction Intervals Leveraging the SRM – Production Time Example 95% prediction intervals hold about 95% of the data if the SRM holds.
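The coverage claim can be checked by simulation when the SRM really does hold. In the sketch below the population values, the sample size of 500, and the seed are assumptions made for the illustration, and the interval used is the simple approximation of the fitted value plus or minus two residual standard deviations.

```python
import math
import random
import statistics

def fit_line(xs, ys):
    """Least squares estimates (b0, b1)."""
    xbar, ybar = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1

# Simulate data that satisfy the SRM (hypothetical population values).
rng = random.Random(1)
xs = [rng.uniform(0, 100) for _ in range(500)]
ys = [20.0 + 1.5 * x + rng.gauss(0.0, 10.0) for x in xs]

b0, b1 = fit_line(xs, ys)
res = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Standard deviation of the residuals, with n - 2 degrees of freedom.
s_e = math.sqrt(sum(r * r for r in res) / (len(xs) - 2))

# Fraction of observations inside fitted value +/- 2 * s_e.
coverage = sum(1 for r in res if abs(r) <= 2 * s_e) / len(xs)
print("coverage:", round(coverage, 3))
```

Replacing the normal errors with skewed errors, or letting the error variance grow with x, is a quick way to see the coverage drift away from 95% at particular values of x.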

Copyright © 2014, 2011 Pearson Education, Inc Prediction Intervals Reliability of Prediction Intervals Prediction intervals fail when the SRM does not hold. This is the problem with extrapolation.

Copyright © 2014, 2011 Pearson Education, Inc. 43 4M Example 21.2: MANAGING NATURAL RESOURCES Motivation In managing commercial fishing fleets, the level of effort (number of boat-days) is assumed to influence the size of the catch. What is the predicted crab catch in a season with 7,500 days of effort?

Copyright © 2014, 2011 Pearson Education, Inc. 44 4M Example 21.2: MANAGING NATURAL RESOURCES Method Use regression with Y equal to the catch of Dungeness crabs near Vancouver Island from 1980 to 2007, measured in thousands of pounds, and X equal to the level of effort (total number of days by boats catching Dungeness crabs).

Copyright © 2014, 2011 Pearson Education, Inc. 45 4M Example 21.2: MANAGING NATURAL RESOURCES Method Linear association is evident.

Copyright © 2014, 2011 Pearson Education, Inc. 46 4M Example 21.2: MANAGING NATURAL RESOURCES Mechanics

Copyright © 2014, 2011 Pearson Education, Inc. 47 4M Example 21.2: MANAGING NATURAL RESOURCES Mechanics Evidently independent.

Copyright © 2014, 2011 Pearson Education, Inc. 48 4M Example 21.2: MANAGING NATURAL RESOURCES Mechanics Similar variances confirmed.

Copyright © 2014, 2011 Pearson Education, Inc. 49 4M Example 21.2: MANAGING NATURAL RESOURCES Mechanics Nearly normal condition could be satisfied.

Copyright © 2014, 2011 Pearson Education, Inc. 50 4M Example 21.2: MANAGING NATURAL RESOURCES Mechanics The t-statistic (and p-value) indicate that the slope is significantly different from zero. The predicted catch in a year with x = 7500 days of effort is 1, thousand pounds. The exact 95% prediction interval (from software) is from to 1, thousand pounds.

Copyright © 2014, 2011 Pearson Education, Inc. 51 4M Example 21.2: MANAGING NATURAL RESOURCES Message There is a statistically significant linear association between days of effort and total catch. On average, each additional day of effort (per boat) increases the harvest by about 160 pounds. In a season with 7,500 days of effort, there is an expected total harvest of about 1.2 million pounds. There is a 95% probability that the catch will be between 910,000 and 1.4 million pounds.

Copyright © 2014, 2011 Pearson Education, Inc. 52 Best Practices  Verify that your model makes sense, both visually and substantively.  Consider other possible explanatory variables.  Check the conditions, in the listed order.

Copyright © 2014, 2011 Pearson Education, Inc. 53 Best Practices (Continued)  Use confidence intervals to express what you know about the slope and intercept.  Check the assumptions of the SRM carefully before using prediction intervals.  Be careful when extrapolating.

Copyright © 2014, 2011 Pearson Education, Inc. 54 Pitfalls Don’t overreact to residual plots. Do not mistake varying amounts of data for unequal variances. Do not confuse confidence intervals with prediction intervals. Do not expect that $r^2$ and $s_e$ must improve with a larger sample.