Introduction to Probability and Statistics Thirteenth Edition Chapter 12 Linear Regression and Correlation.

Correlation & Regression

Univariate & bivariate statistics:
- Univariate: frequency distribution, mean, mode, range, standard deviation
- Bivariate: correlation between two variables

Correlation:
- a linear pattern of relationship between one variable (x) and another variable (y); an association between two variables
- a graphical representation of the relationship between two variables

Warning: correlation is no proof of causality; we cannot assume that x causes y.

1. Correlation Analysis

The correlation coefficient measures the strength of the relationship between x and y. The sample (Pearson's) correlation coefficient is

r = Sxy / sqrt(Sxx · Syy)

where Sxx = Σ(xi − x̄)², Syy = Σ(yi − ȳ)², and Sxy = Σ(xi − x̄)(yi − ȳ).

Pearson's Correlation Coefficient

The value of r indicates:
- the strength of the relationship (strong, weak, or none)
- the direction of the relationship:
  - positive (direct): the variables move in the same direction
  - negative (inverse): the variables move in opposite directions

r ranges in value from −1.0 (strong negative) through 0 (no relationship) to +1.0 (strong positive).
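As a concrete sketch of the formula above, Pearson's r can be computed directly from paired data using only the standard library; the function name and the small data set here are illustrative, not from the slides:

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation: r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x) - sx * sx / n             # Sxx
    syy = sum(v * v for v in y) - sy * sy / n             # Syy
    sxy = sum(a * b for a, b in zip(x, y)) - sx * sy / n  # Sxy
    return sxy / math.sqrt(sxx * syy)

# Illustrative data (not from the slides):
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # → 0.7746
```

A perfectly decreasing sequence gives r = −1.0, the strong-negative end of the scale.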

Limitations of Correlation

- Linearity: r cannot describe non-linear relationships (e.g., the relation between anxiety and performance).
- No proof of causation: we cannot assume that x causes y.

Some Correlation Patterns: [scatterplots of y versus x contrasting linear relationships with curvilinear relationships]

Some Correlation Patterns: [scatterplots of y versus x contrasting strong relationships with weak relationships]

Example

The table shows the heights (x) and weights (y) of n = 10 randomly selected college football players.

Example – scatter plot

r = .8261: a strong positive correlation. As a player's height increases, so does his weight.

Inference Using r

The population coefficient of correlation is called ρ ("rho"). We can test for a significant correlation between x and y, H0: ρ = 0, using a t test:

t = r · sqrt(n − 2) / sqrt(1 − r²),  with n − 2 degrees of freedom.

Example

Is there a significant positive correlation between weight and height in the population of all college football players? With r = .8261 and n = 10, t = .8261 · sqrt(8) / sqrt(1 − .8261²) ≈ 4.15. Use the t-table with n − 2 = 8 df to bound the p-value as p-value < .005. There is a significant positive correlation between weight and height in the population of all college football players.
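The t statistic above is easy to compute by hand; this sketch (the function name is mine) reproduces the football-player calculation with r = .8261 and n = 10:

```python
import math

def corr_t_stat(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), compared against a t-table with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Football-player example from the slides: r = 0.8261, n = 10
print(round(corr_t_stat(0.8261, 10), 2))  # → 4.15, well beyond t.005 = 3.355 with 8 df
```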

2. Linear Regression

Regression = correlation + prediction. Regression analysis is used to predict the value of one variable (the dependent variable) on the basis of other variables (the independent variables).
- Dependent variable: denoted y
- Independent variables: denoted x1, x2, …, xk

Example

Let y be the monthly sales revenue for a company. This might be a function of several variables:
- x1 = advertising expenditure
- x2 = time of year
- x3 = state of the economy
- x4 = size of inventory

We want to predict y using knowledge of x1, x2, x3, and x4.

Some Questions

- Which of the independent variables are useful and which are not?
- How could we create a prediction equation that allows us to predict y using knowledge of x1, x2, x3, etc.?
- How good is this prediction?

We start with the simplest case, in which the response y is a function of a single independent variable, x.

Model Building

A statistical model separates the systematic component of a relationship from the random component:

Data = systematic component + random errors

In regression, the systematic component is the overall linear relationship, and the random component is the variation around the line.

A Simple Linear Regression Model

The explanatory and response variables are numeric, and the relationship between the mean of the response variable and the level of the explanatory variable is assumed to be approximately linear (a straight line):

Model: E(y) = β0 + β1x

- β1 > 0: positive association
- β1 < 0: negative association
- β1 = 0: no association

X Y x  = Slope 1 y Error:  Regression Plot Picturing the Simple Linear Regression Model 0  = Intercept

Simple Linear Regression Analysis

Variables:
- x = independent variable
- y = dependent variable (actual value of a score)
- ŷ = predicted value

Parameters:
- α = y-intercept
- β = slope
- ε ~ normal distribution with mean 0 and variance σ²

Simple Linear Regression Model…

[Plot of the fitted line ŷ = a + bx: a = intercept; b = slope = Δy/Δx.]

The Method of Least Squares

The equation of the best-fitting line is calculated using a set of n pairs (xi, yi). We choose our estimates a and b of α and β so that the sum of the squared vertical distances of the points from the line is minimized.

Least Squares Estimators

b = Sxy / Sxx  and  a = ȳ − b · x̄

where Sxx = Σ(xi − x̄)² and Sxy = Σ(xi − x̄)(yi − ȳ).
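A minimal sketch of the least squares estimators; the function name and the small data set are illustrative assumptions, not the textbook's:

```python
def least_squares(x, y):
    """Least squares estimates: b = Sxy / Sxx, a = ybar - b * xbar."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b

# Illustrative data (the IQ/calculus data is not reproduced in the transcript):
a, b = least_squares([1, 2, 3, 4], [2, 3, 5, 6])
print(round(a, 2), round(b, 2))  # → 0.5 1.4
```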

Example

The table shows the IQ scores (x) and final calculus grades (y) for a random sample of n = 10 college freshmen. Use your calculator to find the sums and sums of squares.

Example

The Analysis of Variance

The total variation in the experiment is measured by the total sum of squares, Total SS. The Total SS is divided into two parts:
- SSR (sum of squares for regression): measures the variation explained by using x in the model.
- SSE (sum of squares for error): measures the leftover variation not explained by x.

The Analysis of Variance

We calculate

Total SS = Syy = Σ(yi − ȳ)²
SSR = (Sxy)² / Sxx
SSE = Total SS − SSR

The ANOVA Table

Regression df = 1; Error df = n − 1 − 1 = n − 2; Total df = n − 1.
Mean squares: MSR = SSR/1, MSE = SSE/(n − 2).

Source      df      SS        MS                F
Regression  1       SSR       MSR = SSR/1       MSR/MSE
Error       n − 2   SSE       MSE = SSE/(n−2)
Total       n − 1   Total SS
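The ANOVA decomposition above can be sketched in code; the function name and the data are illustrative assumptions, not the textbook's:

```python
def anova_table(x, y):
    """Total SS = Syy, SSR = Sxy^2 / Sxx, SSE = Total SS - SSR, F = MSR / MSE."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    ssr = sxy ** 2 / sxx   # regression df = 1, so MSR = SSR
    sse = syy - ssr        # error df = n - 2
    return ssr, sse, ssr / (sse / (n - 2))

ssr, sse, f = anova_table([1, 2, 3, 4], [2, 3, 5, 6])
print(round(ssr, 2), round(sse, 2), round(f, 1))  # → 9.8 0.2 98.0
```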

The Calculus Problem

Source      df   SS   MS   F
Regression
Error
Total

Testing the Usefulness of the Model (The F Test)

You can test the overall usefulness of the model using an F test. If the model is useful, MSR will be large compared to the unexplained variation, MSE. For simple linear regression this test is exactly equivalent to the t test for the slope, with t² = F.

Minitab Output

Regression Analysis: y versus x

The regression equation is y = … + … x   (the least squares regression line)

Predictor   Coef   SE Coef   T   P      (the Coef column holds the regression coefficients, a and b)
Constant    …
x           …

S = …   R-Sq = 70.5%   R-Sq(adj) = 66.8%

Analysis of Variance
Source           DF   SS   MS   F   P
Regression       …
Residual Error   …
Total            …

Testing the Usefulness of the Model

The first question to ask is whether the independent variable x is of any use in predicting y. If it is not, then the value of y does not change regardless of the value of x. This implies that the slope of the line, β, is zero.

Testing the Usefulness of the Model

The test statistic is a function of b, our best estimate of β. Using MSE as the best estimate of the random variation σ², we obtain the t statistic

t = b / sqrt(MSE / Sxx),  with n − 2 degrees of freedom.

The Calculus Problem

Is there a significant relationship between the calculus grades and the IQ scores at the 5% level of significance? Test H0: β = 0 versus Ha: β ≠ 0, and reject H0 when |t| > t.025 = 2.306 (8 df). Since t = 4.38 falls into the rejection region, H0 is rejected. There is a significant linear relationship between the calculus grades and the IQ scores for the population of college freshmen.
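The slope t test can be sketched the same way; with illustrative data (names and values are mine, not the calculus problem's) it also demonstrates the identity t² = F mentioned earlier:

```python
import math

def slope_t_stat(x, y):
    """t = b / sqrt(MSE / Sxx) for testing H0: beta = 0, with n - 2 df."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    mse = (syy - sxy ** 2 / sxx) / (n - 2)
    return b / math.sqrt(mse / sxx)

t = slope_t_stat([1, 2, 3, 4], [2, 3, 5, 6])
print(round(t, 2), round(t * t, 1))  # → 9.9 98.0  (t^2 equals the ANOVA F)
```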

Measuring the Strength of the Relationship

If the independent variable x is useful in predicting y, you will want to know how well the model fits. The strength of the relationship between x and y can be measured using the coefficient of determination:

r² = SSR / Total SS

Measuring the Strength of the Relationship

Since Total SS = SSR + SSE, r² measures:
- the proportion of the total variation in the responses that can be explained by using the independent variable x in the model;
- the percent reduction in the total variation obtained by using the regression equation rather than just using the sample mean ȳ to estimate y.

For the calculus problem, r² = .705, meaning that 70.5% of the variability in calculus scores can be explained by the model.
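A quick sketch of r² = SSR / Total SS, which is also the square of Pearson's r; the data and function name are illustrative:

```python
def r_squared(x, y):
    """Coefficient of determination: r^2 = Sxy^2 / (Sxx * Syy) = SSR / Total SS."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)

print(round(r_squared([1, 2, 3, 4], [2, 3, 5, 6]), 2))  # → 0.98
```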

Estimation and Prediction

Confidence interval for the average value of y when x = x0:

ŷ ± t(α/2) · sqrt(MSE · (1/n + (x0 − x̄)²/Sxx))

Prediction interval for a particular value of y when x = x0:

ŷ ± t(α/2) · sqrt(MSE · (1 + 1/n + (x0 − x̄)²/Sxx))

The Calculus Problem Estimate the average calculus grade for students whose IQ score is 50 with a 95% confidence interval.

The Calculus Problem

Predict the calculus grade for a particular student whose IQ score is 50 with a 95% prediction interval. Notice how much wider this interval is!

Minitab Output

Predicted Values for New Observations

New Obs   Fit   SE Fit   95.0% CI         95.0% PI
…         …     …        (72.51, 85.61)   (57.95, 100.17)

Values of Predictors for New Observations

New Obs   x
…         50

Confidence and prediction intervals when x = 50: prediction bands are always wider than confidence bands, and both intervals are narrowest when x = x̄.
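The confidence and prediction interval formulas can be sketched directly; tcrit must be looked up in a t-table for n − 2 df, and the function name and data here are illustrative assumptions:

```python
import math

def intervals(x, y, x0, tcrit):
    """CI for E(y) and PI for a new y at x = x0 (tcrit from a t-table, n - 2 df)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    mse = (syy - sxy ** 2 / sxx) / (n - 2)
    fit = a + b * x0
    se_ci = math.sqrt(mse * (1 / n + (x0 - xbar) ** 2 / sxx))
    se_pi = math.sqrt(mse * (1 + 1 / n + (x0 - xbar) ** 2 / sxx))  # extra "1" widens the PI
    return ((fit - tcrit * se_ci, fit + tcrit * se_ci),
            (fit - tcrit * se_pi, fit + tcrit * se_pi))

ci, pi = intervals([1, 2, 3, 4], [2, 3, 5, 6], x0=2.5, tcrit=4.303)  # t.025 with 2 df
print(pi[1] - pi[0] > ci[1] - ci[0])  # → True: the PI is always wider
```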

Estimation and Prediction

Once you have determined that the regression line is useful and have used the diagnostic plots to check for violations of the regression assumptions, you are ready to use the regression line to:
- estimate the average value of y for a given value of x;
- predict a particular value of y for a given value of x.

Estimation and Prediction

The best estimate of either E(y) or y for a given value x = x0 is

ŷ = a + b · x0

Particular values of y are more difficult to predict, requiring a wider range of values in the prediction interval.

Regression Assumptions

Remember that the results of a regression analysis are only valid when the necessary assumptions have been satisfied.

Assumptions:
1. The relationship between x and y is linear, given by y = α + βx + ε.
2. The random error terms ε are independent and, for any value of x, have a normal distribution with mean 0 and constant variance σ².

Diagnostic Tools

1. Normal probability plot or histogram of the residuals
2. Plot of the residuals versus the fitted values or versus the variables
3. Plot of the residuals versus order

Residuals

The residual error is the "leftover" variation in each data point after the variation explained by the regression model has been removed. If all assumptions have been met, these residuals should be normal, with mean 0 and variance σ².
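Residuals are simple to compute once a and b are known; a useful sanity check, sketched here with illustrative data, is that least squares residuals always sum to zero:

```python
def residuals(x, y):
    """Residuals y_i - yhat_i from the least squares fit."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

res = residuals([1, 2, 3, 4], [2, 3, 5, 6])
print(abs(sum(res)) < 1e-9)  # → True: residuals sum to zero
```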

If the normality assumption is valid, the plot should resemble a straight line, sloping upward to the right. If not, you will often see the pattern fail in the tails of the graph. If the normality assumption is valid, the plot should resemble a straight line, sloping upward to the right. If not, you will often see the pattern fail in the tails of the graph. Normal Probability Plot

If the equal variance assumption is valid, the plot should appear as a random scatter around the zero center line. If not, you will see a pattern in the residuals. If the equal variance assumption is valid, the plot should appear as a random scatter around the zero center line. If not, you will see a pattern in the residuals. Residuals versus Fits