COMPLETE BUSINESS STATISTICS, fourth edition, Aczel. Irwin/McGraw-Hill © The McGraw-Hill Companies, Inc., 1999. 10-1 Using Statistics The Simple.


Simple Linear Regression and Correlation (Chapter 10)
Using Statistics
The Simple Linear Regression Model
Estimation: The Method of Least Squares
Error Variance and the Standard Errors of Regression Estimators
Correlation
Hypothesis Tests about the Regression Relationship
How Good is the Regression?
Analysis of Variance Table and an F Test of the Regression Model
Residual Analysis and Checking for Model Inadequacies
Use of the Regression Model for Prediction
Using the Computer
Summary and Review of Terms

Using Statistics

Examples of Other Scatterplots
[Figure: six scatterplots of Y against X.]

Model Building
The inexact nature of the relationship between advertising and sales suggests that a statistical model might be useful in analyzing the relationship. A statistical model separates the systematic component of a relationship from the random component.
[Diagram: Data → Statistical model → Systematic component + Random errors]
In ANOVA, the systematic component is the variation of means between samples or treatments (SSTR) and the random component is the unexplained variation (SSE). In regression, the systematic component is the overall linear relationship, and the random component is the variation around the line.

10-2 The Simple Linear Regression Model
The population simple linear regression model:
$Y = \beta_0 + \beta_1 X + \varepsilon$
Here $\beta_0 + \beta_1 X$ is the nonrandom or systematic component and $\varepsilon$ is the random component, where Y is the dependent variable, the variable we wish to explain or predict; X is the independent variable, also called the predictor variable; and $\varepsilon$ is the error term, the only random component in the model and thus the only source of randomness in Y.
$\beta_0$ is the intercept of the systematic component of the regression relationship.
$\beta_1$ is the slope of the systematic component.
The conditional mean of Y: $E[Y \mid X] = \beta_0 + \beta_1 X$

Picturing the Simple Linear Regression Model
The simple linear regression model posits an exact linear relationship between the expected or average value of Y, the dependent variable, and X, the independent or predictor variable:
$E[Y_i] = \beta_0 + \beta_1 X_i$
Actual observed values of Y differ from the expected value by an unexplained or random error:
$Y_i = E[Y_i] + \varepsilon_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
[Regression plot: the line $E[Y] = \beta_0 + \beta_1 X$ in the X-Y plane, marking the intercept $\beta_0$, the slope $\beta_1$ (rise per unit increase in X), and the error $\varepsilon_i$ as the vertical distance between the observed $Y_i$ at $X_i$ and the line.]

Assumptions of the Simple Linear Regression Model
The relationship between X and Y is a straight-line relationship.
The values of the independent variable X are assumed fixed (not random); the only randomness in the values of Y comes from the error term $\varepsilon_i$.
The errors $\varepsilon_i$ are normally distributed with mean 0 and variance $\sigma^2$. The errors are uncorrelated (not related) in successive observations. That is: $\varepsilon \sim N(0, \sigma^2)$.
[Figure: identical normal distributions of errors, all centered on the regression line $E[Y] = \beta_0 + \beta_1 X$.]
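One way to make the assumptions concrete is to generate data that satisfy them. The sketch below is illustrative only: the function name `simulate_regression` and the parameter values are hypothetical. X is held fixed, and each error is drawn independently from $N(0, \sigma^2)$, so all randomness in Y comes from the error term.

```python
import random


def simulate_regression(xs, beta0, beta1, sigma, seed=None):
    """Draw one sample from Y_i = beta0 + beta1 * X_i + eps_i.

    The X values are treated as fixed (not random). Each error eps_i is drawn
    independently from N(0, sigma^2), so the errors have mean 0, a common
    variance, and are uncorrelated across observations.
    """
    rng = random.Random(seed)  # seeded generator for reproducible samples
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical parameter values, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = simulate_regression(xs, beta0=2.0, beta1=0.5, sigma=1.0, seed=42)
```

Setting `sigma=0.0` removes the random component, and every simulated point then falls exactly on the line $\beta_0 + \beta_1 X$.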

Estimation: The Method of Least Squares
Estimation of a simple linear regression relationship involves finding estimated or predicted values of the intercept and slope of the linear regression line.
The estimated regression equation:
$Y = b_0 + b_1 X + e$
where $b_0$ estimates the intercept of the population regression line, $\beta_0$; $b_1$ estimates the slope of the population regression line, $\beta_1$; and $e$ stands for the observed errors, the residuals from fitting the estimated regression line $b_0 + b_1 X$ to a set of n points.

Fitting a Regression Line
[Figure, three panels of Y against X: the data; three errors from a fitted line; three errors from the least squares regression line.]
The least squares regression line is the line that minimizes the sum of the squared errors.

Errors in Regression
[Figure: Y against X with the error for one observation marked between the observed Y and the fitted line.]

Least Squares Regression
[Figure: SSE plotted as a surface over $b_0$ and $b_1$; its minimum is at the least squares $b_0$ and the least squares $b_1$.]

Sums of Squares, Cross Products, and Least Squares Estimators
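The least squares estimators can be written in terms of the sums of squares and cross products: $SS_X = \sum (x_i - \bar{x})^2$, $SS_{XY} = \sum (x_i - \bar{x})(y_i - \bar{y})$, $b_1 = SS_{XY} / SS_X$, and $b_0 = \bar{y} - b_1 \bar{x}$. A minimal Python sketch of these formulas follows; the function name `least_squares` and the five data points are invented for illustration and are not the Example 10-1 data.

```python
def least_squares(xs, ys):
    """Return (b0, b1) computed from SS_X and SS_XY (deviation form)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)                        # SS_X
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # SS_XY
    b1 = ss_xy / ss_x           # slope estimate
    b0 = y_bar - b1 * x_bar     # intercept estimate: line passes through (x_bar, y_bar)
    return b0, b1


b0, b1 = least_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # b0 ≈ 2.2, b1 ≈ 0.6
```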

Example 10-1
[Table with columns Miles, Dollars, Miles², Miles*Dollars; the data values are not reproduced.]
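The Miles² and Miles*Dollars columns supply the totals needed by the computational (shortcut) form of the estimators: $SS_X = \sum x^2 - (\sum x)^2 / n$ and $SS_{XY} = \sum xy - (\sum x)(\sum y) / n$. The sketch below computes $b_0$ and $b_1$ from those column totals; the function name and the sample data are hypothetical, not the textbook's.

```python
def least_squares_from_totals(xs, ys):
    """Return (b0, b1) using the column-total (computational) formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)              # total of the X^2 column
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # total of the X*Y column
    ss_x = sum_x2 - sum_x ** 2 / n
    ss_xy = sum_xy - sum_x * sum_y / n
    b1 = ss_xy / ss_x
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = least_squares_from_totals([1, 2, 4], [2, 3, 7])  # b1 = 12/7, b0 ≈ 0
```

The shortcut form is convenient by hand, but with very large X values the subtractions can lose precision in floating point; the deviation form is generally the safer choice in code.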

Example 10-1: Using the Computer
MTB > Regress 'Dollars' 1 'Miles';
SUBC> Constant.
Regression Analysis
The regression equation is
Dollars = … + … Miles
Predictor   Coef   Stdev   t-ratio   p
Constant
Miles
s = …   R-sq = 96.5%   R-sq(adj) = 96.4%
Analysis of Variance
SOURCE       DF   SS   MS   F   p
Regression
Error
Total

Example 10-1: Using Computer-Excel
The regression results on this slide [Excel output, not reproduced] were created by selecting the REGRESSION option from the DATA ANALYSIS toolkit.

Example 10-1: Using Computer-Excel
Residual Analysis. The plot of the residuals against the X-values (miles) shows no apparent relationship between them.
[Figure: Residuals vs. Miles.]

Total Variance and Error Variance
[Figure, two panels of Y against X: what you see when looking at the total variation of Y; what you see when looking along the regression line at the error variance of Y.]

Error Variance and the Standard Errors of Regression Estimators
[Figure: regression errors in the X-Y plane.]
Square and sum all regression errors to find SSE: $SSE = \sum_{i=1}^{n} e_i^2$.

Standard Errors of Estimates in Regression
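The usual standard-error formulas are $s^2 = MSE = SSE / (n - 2)$, $s(b_1) = s / \sqrt{SS_X}$, and $s(b_0) = s \sqrt{\sum x^2 / (n \, SS_X)}$, where dividing by $n - 2$ reflects the two estimated parameters. A minimal sketch; the function name and the five data points are made up for illustration.

```python
import math


def regression_standard_errors(xs, ys):
    """Fit by least squares and return SSE, s, s(b0) and s(b1)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    mse = sse / (n - 2)             # estimate of the error variance sigma^2
    s = math.sqrt(mse)              # standard error of estimate
    se_b1 = s / math.sqrt(ss_x)
    se_b0 = s * math.sqrt(sum(x * x for x in xs) / (n * ss_x))
    return {"b0": b0, "b1": b1, "sse": sse, "s": s, "se_b0": se_b0, "se_b1": se_b1}


est = regression_standard_errors([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```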

Confidence Intervals for the Regression Parameters
[Figure: the slope drawn as the height of a step of length 1, showing the least-squares point estimate $b_1$, the upper 95% bound on the slope, and the lower 95% bound; 0 lies outside the interval, so it is not a possible value of the regression slope at 95% confidence.]
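A $(1 - \alpha)$ confidence interval for each parameter has the form $b_1 \pm t_{\alpha/2} \, s(b_1)$ and $b_0 \pm t_{\alpha/2} \, s(b_0)$, with $n - 2$ degrees of freedom. To stay within the standard library, this sketch takes the critical value `t_crit` as an input read from a t table rather than computing it; the function name and data are hypothetical.

```python
import math


def parameter_confidence_intervals(xs, ys, t_crit):
    """Return (lower, upper) intervals for beta0 and beta1.

    t_crit is the upper alpha/2 critical value of t with n - 2 degrees of
    freedom, looked up from a t table (e.g. 3.182 for 95% with 3 df).
    """
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    se_b1 = s / math.sqrt(ss_x)
    se_b0 = s * math.sqrt(sum(x * x for x in xs) / (n * ss_x))
    return {
        "beta0": (b0 - t_crit * se_b0, b0 + t_crit * se_b0),
        "beta1": (b1 - t_crit * se_b1, b1 + t_crit * se_b1),
    }


# Illustrative data (not from the textbook); n = 5, so df = 3.
ci = parameter_confidence_intervals([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], t_crit=3.182)
```

With these five points the 95% interval for $\beta_1$ is roughly (-0.30, 1.50), so unlike the slide's example, 0 falls inside it.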

Correlation
The correlation between two random variables, X and Y, is a measure of the degree of linear association between the two variables. The population correlation, denoted by $\rho$, can take on any value from -1 to 1.
$\rho = -1$ indicates a perfect negative linear relationship
$-1 < \rho < 0$ indicates a negative linear relationship
$\rho = 0$ indicates no linear relationship
$0 < \rho < 1$ indicates a positive linear relationship
$\rho = 1$ indicates a perfect positive linear relationship
The absolute value of $\rho$ indicates the strength or exactness of the relationship.

Illustrations of Correlation
[Figure: six scatterplots of Y against X, with $\rho = 0$, $\rho = -0.8$, $\rho = 0.8$, $\rho = 0$, $\rho = -1$, and $\rho = 1$.]

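Covariance and Correlation
The sample correlation coefficient: $r = \frac{SS_{XY}}{\sqrt{SS_X \, SS_Y}}$
Example 10-1: r is computed from this formula (numerical values not preserved).
*Note: r and $b_1$ always have the same sign, because both have $SS_{XY}$ in the numerator; for example, if $r > 0$, then $b_1 > 0$.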
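The formula $r = SS_{XY} / \sqrt{SS_X \, SS_Y}$ translates directly into code. The sketch below (hypothetical function name, invented data) also makes the sign relationship visible: $SS_{XY}$ is the numerator of both r and $b_1$.

```python
import math


def sample_correlation(xs, ys):
    """Sample correlation r = SS_XY / sqrt(SS_X * SS_Y)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return ss_xy / math.sqrt(ss_x * ss_y)


r = sample_correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # about 0.775
```

Points that lie exactly on an increasing line give r = 1, and points on a decreasing line give r = -1.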

Example 10-2: Using Computer-Excel

Example 10-2: Regression Plot
[Figure: regression plot with axes labeled United States and International, showing the fitted equation Y = … + … X and R-Sq = … (values not preserved).]

Hypothesis Tests for the Correlation Coefficient
$H_0: \rho = 0$ (no linear relationship)
$H_1: \rho \neq 0$ (some linear relationship)
Test statistic: $t_{(n-2)} = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}$
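Under $H_0: \rho = 0$, the statistic $t = r / \sqrt{(1 - r^2)/(n - 2)}$ has a t distribution with $n - 2$ degrees of freedom. In this sketch the two-tailed critical value `t_crit` comes from a t table supplied by the caller; the function name is hypothetical, and the formula is undefined when $r = \pm 1$.

```python
import math


def correlation_t_test(r, n, t_crit):
    """Test H0: rho = 0 against H1: rho != 0.

    Returns the t statistic (n - 2 df) and whether |t| exceeds t_crit.
    """
    t = r / math.sqrt((1 - r * r) / (n - 2))
    return t, abs(t) > t_crit


# Illustrative values: r from five invented points, 3 df, 5% level (t_crit = 3.182).
t, reject = correlation_t_test(6 / math.sqrt(60), 5, 3.182)  # t ≈ 2.12, reject is False
```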

Hypothesis Tests about the Regression Relationship
[Figure, three panels of Y against X showing situations with no linear relationship between X and Y: constant Y, unsystematic variation, and a nonlinear relationship.]
A hypothesis test for the existence of a linear relationship between X and Y:
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
Test statistic for the existence of a linear relationship between X and Y:
$t_{(n-2)} = \frac{b_1}{s(b_1)}$
where $b_1$ is the least-squares estimate of the regression slope and $s(b_1)$ is the standard error of $b_1$. When the null hypothesis is true, the statistic has a t distribution with n - 2 degrees of freedom.

Hypothesis Tests for the Regression Slope
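The slope test computes $t = (b_1 - \beta_{1,0}) / s(b_1)$, with $\beta_{1,0} = 0$ for the test of no linear relationship. A sketch under the same conventions as a hand calculation: `t_crit` is read from a t table with $n - 2$ df, and the function name and data are hypothetical.

```python
import math


def slope_t_test(xs, ys, t_crit, beta1_null=0.0):
    """Two-tailed t test of H0: beta1 = beta1_null (n - 2 df)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(ss_x)  # s(b1) = s / sqrt(SS_X)
    t = (b1 - beta1_null) / se_b1
    return b1, se_b1, t, abs(t) > t_crit


b1, se_b1, t, reject = slope_t_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], t_crit=3.182)
```

For $\beta_{1,0} = 0$ this t equals the correlation test statistic $r / \sqrt{(1 - r^2)/(n - 2)}$ computed from the same data (about 2.12 here), so the two tests reach the same conclusion.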

How Good is the Regression?
The coefficient of determination, $r^2$, is a descriptive measure of the strength of the regression relationship, a measure of how well the regression line fits the data.
[Figure: for one point, the total deviation $(Y - \bar{Y})$ splits into the explained deviation $(\hat{Y} - \bar{Y})$ and the unexplained deviation $(Y - \hat{Y})$.]
$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$, the percentage of total variation explained by the regression.

The Coefficient of Determination
[Figure, three panels of Y against X: $r^2 = 0$, where SSE = SST; $r^2 = 0.90$, where SSR is most of SST and SSE is small; $r^2 = 0.50$, where SST splits evenly into SSR and SSE.]
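The decomposition SST = SSR + SSE gives $r^2$ directly. The sketch also computes the adjusted value $1 - \frac{SSE/(n-2)}{SST/(n-1)}$, the quantity labeled R-sq(adj) in regression output. Function name and data are invented for illustration.

```python
def coefficient_of_determination(xs, ys):
    """Return (r^2, adjusted r^2) for a simple linear regression."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    sst = sum((y - y_bar) ** 2 for y in ys)                       # total variation
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))   # unexplained
    ssr = sst - sse                                               # explained
    r2 = ssr / sst
    r2_adj = 1 - (sse / (n - 2)) / (sst / (n - 1))
    return r2, r2_adj


r2, r2_adj = coefficient_of_determination([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # 0.6, ~0.467
```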

Analysis of Variance and an F Test of the Regression Model
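The ANOVA table for simple regression has a Regression row (SSR, 1 df), an Error row (SSE, $n - 2$ df), and a Total row (SST, $n - 1$ df); the test statistic is $F = MSR / MSE$ with $(1, n - 2)$ degrees of freedom. A minimal sketch with a hypothetical function name and invented data:

```python
def anova_table(xs, ys):
    """Return ANOVA rows as (SS, df, MS) tuples plus the F statistic."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    fits = [b0 + b1 * x for x in xs]
    ssr = sum((f - y_bar) ** 2 for f in fits)              # explained by the line
    sse = sum((y - f) ** 2 for y, f in zip(ys, fits))      # left unexplained
    sst = sum((y - y_bar) ** 2 for y in ys)                # total
    msr = ssr / 1
    mse = sse / (n - 2)
    return {
        "Regression": (ssr, 1, msr),
        "Error": (sse, n - 2, mse),
        "Total": (sst, n - 1, None),
        "F": msr / mse,
    }


table = anova_table([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

In simple regression F equals the square of the slope t statistic: here F = 4.5 and $t^2 = (\sqrt{4.5})^2$, and F is below the 5% critical value $F_{(1,3)} \approx 10.13$.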

Residual Analysis and Checking for Model Inadequacies
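Least squares residuals always sum to zero and are uncorrelated with X by construction, so a residual plot cannot show a linear trend; what it can reveal is curvature, a funnel shape (non-constant variance), or outliers. The sketch computes fits, residuals, and standardized residuals using one common definition, $e_i / (s\sqrt{1 - h_i})$ with leverage $h_i = 1/n + (x_i - \bar{x})^2 / SS_X$. Whether this matches a given package's StRes column should be checked in that package's documentation; function name and data are hypothetical.

```python
import math


def residual_diagnostics(xs, ys):
    """Return fitted values, residuals, and standardized residuals."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    fits = [b0 + b1 * x for x in xs]
    resids = [y - f for y, f in zip(ys, fits)]
    s = math.sqrt(sum(e * e for e in resids) / (n - 2))
    leverage = [1 / n + (x - x_bar) ** 2 / ss_x for x in xs]  # h_i
    st_res = [e / (s * math.sqrt(1 - h)) for e, h in zip(resids, leverage)]
    return fits, resids, st_res


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
fits, resids, st_res = residual_diagnostics(xs, ys)
```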

Use of the Regression Model for Prediction
Point Prediction: a single-valued estimate of Y for a given value of X, obtained by inserting the value of X in the estimated regression equation.
Prediction Interval:
For a value of Y given a value of X, the interval reflects the variation in the regression line estimate and the variation of points around the regression line.
For an average value of Y given a value of X, the interval reflects the variation in the regression line estimate only.

Errors in Predicting E[Y|X]
[Figure, two panels of Y against X: 1) uncertainty about the slope of the regression line, showing the regression line with upper and lower limits on the slope; 2) uncertainty about the intercept of the regression line, showing the regression line with upper and lower limits on the intercept.]

Prediction Interval for E[Y|X]
[Figure: regression line with the prediction band for E[Y|X].]
The prediction band for E[Y|X] is narrowest at the mean value of X. The prediction band widens as the distance from the mean of X increases. Predictions become very unreliable when we extrapolate beyond the range of the sample itself.

Additional Error in Predicting Individual Value of Y
3) Variation around the regression line
[Figure: regression line with the prediction band for E[Y|X] and the wider prediction band for Y.]

Prediction Interval for a Value of Y

Prediction Interval for the Average Value of Y
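Both intervals are centered on the point prediction $\hat{y} = b_0 + b_1 x_0$. The interval for the average value uses $t_{\alpha/2} \, s \sqrt{1/n + (x_0 - \bar{x})^2 / SS_X}$; the interval for an individual Y adds 1 under the square root to account for variation around the line. A sketch with a table-supplied `t_crit` and invented data (the function name is hypothetical):

```python
import math


def prediction_intervals(xs, ys, x0, t_crit):
    """Return (y_hat, interval for E[Y|x0], interval for a single Y at x0)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = ss_xy / ss_x
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x0                      # point prediction
    dist = (x0 - x_bar) ** 2 / ss_x           # grows as x0 moves from x_bar
    half_mean = t_crit * s * math.sqrt(1 / n + dist)
    half_indiv = t_crit * s * math.sqrt(1 + 1 / n + dist)
    return (y_hat,
            (y_hat - half_mean, y_hat + half_mean),
            (y_hat - half_indiv, y_hat + half_indiv))


xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
fit, ci, pi = prediction_intervals(xs, ys, x0=3, t_crit=3.182)
```

Because `dist` is zero at $x_0 = \bar{x}$, both intervals are narrowest there, which matches the shape of the prediction bands described above.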

Using the Computer
MTB > regress 'Dollars' 1 'Miles' tres in C3 fits in C4;
SUBC> predict 4000;
SUBC> residuals in C5.
Regression Analysis
The regression equation is
Dollars = … + … Miles
Predictor   Coef   Stdev   t-ratio   p
Constant
Miles
s = …   R-sq = 96.5%   R-sq(adj) = 96.4%
Analysis of Variance
SOURCE       DF   SS   MS   F   p
Regression
Error
Total
Fit   Stdev.Fit   95.0% C.I.   95.0% P.I.
                  ( , )        ( , )

Plotting on the Computer (1)
MTB > PLOT 'Resids' * 'Fits'
MTB > PLOT 'Resids' * 'Miles'
[Figure: residuals plotted against the fitted values and against miles.]

Plotting on the Computer (2)
MTB > HISTOGRAM 'StRes'
MTB > PLOT 'Dollars' * 'Miles'
[Figure: histogram of the standardized residuals (StRes against frequency) and a scatterplot of Dollars against Miles.]