Introduction to Regression Analysis, Chapter 13


Introduction to Regression Analysis
Regression analysis is used to:
Predict values of a dependent variable, Y, based on its relationship with values of at least one independent variable, X.
Explain the impact of changes in an independent variable on the dependent variable by estimating the numerical value of the relationship.
Dependent variable: the variable we wish to explain.
Independent variable: the variable used to explain the dependent variable.

Simple Linear Regression Model
Only one independent variable, X (hence "simple").
The relationship between X and Y is described by a linear function.
Changes in Y are assumed to be caused by changes in X; that is, a change in X causes a change in Y.

Important points before we start a regression analysis:
The most important thing in deciding whether or not there is a relationship between X and Y is to have a systematic model that is based on logical reasons.
Investigate the nature of the relationship between X and Y (use a scatter diagram, the covariance, and the coefficient of correlation; a small sketch follows this slide).
Remember that regression is not an exact or deterministic mathematical equation. It is a behavioral relationship that is subject to randomness.
Remember that X is not the only thing that explains the behavior of Y. There are other factors that you may not have information about. All you are trying to do is to estimate the relationship using the best linear fit possible.
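As a quick illustration (not part of the original slides), the sketch below plots a hypothetical lot-size/price sample and computes its covariance and coefficient of correlation with NumPy; the variable names and data values are made up for this example.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample: lot size (1000s of sq. ft.) vs. house price ($1000s)
x = np.array([7.2, 8.5, 9.1, 10.0, 11.3, 12.4, 13.0, 14.2])
y = np.array([198.0, 205.0, 214.0, 221.0, 238.0, 249.0, 255.0, 268.0])

cov_xy = np.cov(x, y)[0, 1]      # sample covariance (divides by n - 1)
r = np.corrcoef(x, y)[0, 1]      # coefficient of correlation

plt.scatter(x, y)                # scatter diagram of Y against X
plt.xlabel("Lot size (1000 sq. ft.)")
plt.ylabel("House price ($1000s)")
plt.title(f"cov = {cov_xy:.1f}, r = {r:.3f}")
plt.show()
```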

Types of Relationships
Linear relationships and curvilinear relationships (four scatter-plot panels of Y against X).

Types of Relationships (continued)
Strong relationships and weak relationships (four scatter-plot panels of Y against X).

Types of Relationships (continued)
No relationship (two scatter-plot panels of Y against X).

Simple Linear Regression Conceptual Model
The population regression model (a conceptual model, a hypothesis, or a postulation):
Yi = β0 + β1·Xi + εi
where Yi is the dependent variable, Xi the independent variable, β0 the population Y intercept, β1 the population slope coefficient, and εi the random error term. The linear component is β0 + β1·Xi; the random error component is εi.
(Speaker notes: show the conceptual graph and identify the slope and intercept; introduce the single-family housing example; go over the setup, the scatter diagram, and the variables.)

The model to be estimated from sample data is:
Ŷi = b0 + b1·Xi
where Ŷi is the estimated (or predicted) Y value for observation i, b0 the estimate of the regression intercept, b1 the estimate of the regression slope, and Xi the value of X for observation i. The residual (random error from the sample) is ei = Yi − Ŷi, the difference between the actual Y and the Y estimated from the sample.

The individual random error terms, ei, have a mean of zero, i.e., Σei = 0. Since the sum of the random errors is zero, we estimate the regression line so that the sum of squared differences is minimized; hence the name Ordinary Least Squares (OLS) method, i.e.,
min Σ(Yi − Ŷi)² = min Σ(Yi − (b0 + b1·Xi))²
The values of b0 and b1 estimated by OLS are the only possible values of b0 and b1 that minimize the sum of the squared differences between Yi and Ŷi.
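A minimal sketch of the OLS computation, using the same hypothetical lot-size/price data as the earlier snippet; the closed-form formulas b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² and b0 = Ȳ − b1·X̄ are standard, the data are illustrative only.

```python
import numpy as np

# Hypothetical data (same as the earlier scatter-diagram sketch)
x = np.array([7.2, 8.5, 9.1, 10.0, 11.3, 12.4, 13.0, 14.2])
y = np.array([198.0, 205.0, 214.0, 221.0, 238.0, 249.0, 255.0, 268.0])

# OLS estimates: b1 = Sxy / Sxx, b0 = y_bar - b1 * x_bar
x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0 = y_bar - b1 * x_bar

y_hat = b0 + b1 * x     # fitted (predicted) values
e = y - y_hat           # residuals; their sum is zero up to rounding
print(b0, b1, e.sum())
```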

Simple Linear Regression Model (graph)
Scatter plot of Y against X with the fitted line: intercept = b0, slope = b1. For a given Xi, the vertical distance between the actual observed value Yi and the predicted value of Y for Xi is the random error for that Xi value.

Interpretation of the slope and the intercept
b0 is the estimated average value of Y when the value of X is zero.
b1 is the estimated change in the average value of Y as a result of a one-unit change in X.
The units of measurement of X and Y are very important for the correct interpretation of the slope and the intercept.
Example: predict the value of a house with a 10,000 s.f. lot size (see the sketch below). How good is this prediction?
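Continuing the hypothetical running example (b0 and b1 from the OLS sketch above), prediction is just b0 + b1·X; with lot size measured in thousands of square feet, a 10,000 s.f. lot corresponds to X = 10.

```python
x_new = 10.0                  # 10,000 sq. ft. lot, in 1000 sq. ft. units
y_pred = b0 + b1 * x_new      # b0, b1 from the OLS sketch above
print(f"Predicted price: ${y_pred:.1f}K")
```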

How Good Is the Model's Prediction Power?
Total variation is made up of two parts: SST = SSR + SSE
Total sum of squares: SST = Σ(Yi − Ȳ)²
Regression sum of squares: SSR = Σ(Ŷi − Ȳ)²
Error sum of squares: SSE = Σ(Yi − Ŷi)²
where Ȳ = average value of the dependent variable, Yi = observed values of the dependent variable, and Ŷi = predicted value of Y for the given Xi value.

SST = total sum of squares: measures the total variation of the Yi values around their mean.
SSR = regression sum of squares (explained): the portion of the total variation attributed to Y's relationship with X.
SSE = error sum of squares (unexplained): the variation of the Y values attributable to factors other than the relationship with X.

(Graph: at a given Xi, the deviation Yi − Ȳ splits into Ŷi − Ȳ, the part explained by X, and Yi − Ŷi, the unexplained part; Ȳ is the value of Y without the effect of X. Summing the squared pieces over all observations gives SST = Σ(Yi − Ȳ)², SSR = Σ(Ŷi − Ȳ)², and SSE = Σ(Yi − Ŷi)².)

How Good Is the Model's Prediction Power?
The coefficient of determination is the portion of the total variation in the dependent variable, Y, that is explained by variation in the independent variable, X:
r² = SSR / SST
The coefficient of determination is also called r-squared and is denoted as r²; the computation is sketched below.
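Continuing the running example (y and y_hat from the OLS sketch), the decomposition and r² can be computed directly:

```python
import numpy as np

sst = np.sum((y - y.mean()) ** 2)        # total variation
ssr = np.sum((y_hat - y.mean()) ** 2)    # explained variation
sse = np.sum((y - y_hat) ** 2)           # unexplained variation

r_squared = ssr / sst                    # equivalently 1 - sse / sst
print(f"SST={sst:.1f}  SSR={ssr:.1f}  SSE={sse:.1f}  r^2={r_squared:.3f}")
```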

Standard Error of Estimate
The standard deviation of the variation of observations around the regression line is estimated by
S_YX = √( SSE / (n − 2) )
where SSE = error sum of squares and n = sample size. The concept is the same as the standard deviation (average deviation) around the mean of a univariate variable.
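Using SSE from the previous snippet (hypothetical running example), the standard error of estimate is:

```python
import numpy as np

n = len(y)
s_yx = np.sqrt(sse / (n - 2))    # standard error of estimate
print(f"S_YX = {s_yx:.2f} ($1000s)")
```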

Comparing Standard Errors
S_YX is a measure of the variation of the observed Y values around the regression line (two scatter-plot panels contrast small and large scatter about the fitted line). The magnitude of S_YX should always be judged relative to the size of the Y values in the sample data; e.g., S_YX = $36.34K is moderately small relative to house prices in the $200K–$300K range (average $215K).

Assumptions of Regression
Normality of error: error values (ε) are normally distributed for any given value of X.
Homoscedasticity: the probability distribution of the errors has constant variance.
Independence of errors: error values are statistically independent.

How to investigate the appropriateness of the fitted model
The residual for observation i, ei = Yi − Ŷi, is the difference between its observed and predicted value. Check the assumptions of regression by examining the residuals:
Examine the linearity assumption.
Examine for constant variance at all levels of X (homoscedasticity).
Evaluate the normal distribution assumption.
Evaluate the independence assumption.
Graphical analysis of residuals: plot the residuals vs. X (a sketch follows).
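A minimal residual-plot sketch, continuing the running example (x, y, y_hat from the OLS snippet); look for curvature, funnels, or time patterns:

```python
import matplotlib.pyplot as plt

residuals = y - y_hat            # residuals from the fitted line

plt.scatter(x, residuals)
plt.axhline(0, linewidth=1)      # reference line at zero
plt.xlabel("X (lot size, 1000 sq. ft.)")
plt.ylabel("Residual")
plt.title("Residuals vs. X")
plt.show()
```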

Residual Analysis for Linearity
(Panels of residuals vs. x: a curved pattern in the residuals indicates a relationship that is not linear; a random scatter around zero is consistent with a linear relationship.)

Residual Analysis for Homoscedasticity
(Panels of residuals vs. x: a fan or funnel shape indicates non-constant variance; an even band of scatter indicates constant variance.)

Residual Analysis for Independence
(Panels of residuals vs. X: cyclical or trending patterns indicate that the residuals are not independent; a random scatter indicates independence.)

Measuring Autocorrelation: the Durbin-Watson Statistic (DO NOT COVER)
Used when data are collected over time to detect whether autocorrelation is present; it can also be useful for cross-sectional data. Autocorrelation exists if residuals in one time period are related to residuals in another period. (In the accompanying plot, the residuals show a cyclic pattern rather than a random one.)

The Durbin-Watson Statistic
The Durbin-Watson statistic is used to test for autocorrelation:
H0: residuals are not correlated; H1: autocorrelation is present.
The possible range is 0 ≤ D ≤ 4. D should be close to 2 if H0 is true; D less than 2 may signal positive autocorrelation, D greater than 2 may signal negative autocorrelation.

Calculate the Durbin-Watson test statistic:
D = Σ(ei − e(i−1))² / Σ ei²
Find the critical values dL and dU from the Durbin-Watson table (for sample size n and number of independent variables k).
Test for positive autocorrelation (H0: positive autocorrelation does not exist; H1: positive autocorrelation is present): reject H0 if D < dL; the test is inconclusive if dL ≤ D ≤ dU; do not reject H0 if D > dU.
Test for negative autocorrelation (H0: negative autocorrelation does not exist; H1: negative autocorrelation is present): do not reject H0 if D < 4 − dU; the test is inconclusive if 4 − dU ≤ D ≤ 4 − dL; reject H0 if D > 4 − dL.
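A sketch of the Durbin-Watson calculation, again on the hypothetical running example (in practice the statistic is most meaningful when the observations have a natural time order):

```python
import numpy as np

e = y - y_hat                                    # residuals, in observation order
D = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)     # Durbin-Watson statistic
print(f"D = {D:.2f}  (values near 2 suggest no autocorrelation)")
```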

Inferences about Estimated Parameters: t Test for a Population Slope
Is there a linear relationship between X and Y? Null and alternative hypotheses:
H0: β1 = 0 (no linear relationship)
H1: β1 ≠ 0 (a linear relationship does exist)
Test statistic (with n − 2 degrees of freedom):
t = (b1 − β1) / S_b1
where b1 = regression slope coefficient, β1 = hypothesized slope (zero under H0), and S_b1 = estimate of the standard error of the slope.

The standard error of the regression slope coefficient (b1) is estimated by
S_b1 = S_YX / √( Σ(Xi − X̄)² )
S_b1 is a measure of the variation in the slope of regression lines from different possible samples (two scatter-plot panels contrast small and large slope variability).
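Continuing the running example, the standard error of the slope and the t statistic can be computed as below (SciPy's t distribution is used only for the two-tailed p-value):

```python
import numpy as np
from scipy import stats

s_b1 = s_yx / np.sqrt(np.sum((x - x.mean()) ** 2))   # standard error of the slope
t_stat = (b1 - 0) / s_b1                             # H0: beta1 = 0
df = len(y) - 2
p_value = 2 * stats.t.sf(abs(t_stat), df)            # two-tailed p-value
print(f"t = {t_stat:.2f}, df = {df}, p = {p_value:.4f}")
```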

F-Test for Significance
A second approach to testing the existence of a significant relationship is the ratio of explained to unexplained variance. For multiple regression, this is also a test of the entire model.
H0: β1 = 0; H1: β1 ≠ 0
The ratio is F = MSR / MSE, where F follows an F distribution with k numerator and (n − k − 1) denominator degrees of freedom (k = the number of independent variables in the regression model).

Source             df          Sum of Squares   Mean Square          F
Regression         k           SSR              MSR = SSR/k          F(k, n−k−1) = MSR/MSE
Residual (error)   n − k − 1   SSE              MSE = SSE/(n−k−1)
Total              n − 1       SST
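The corresponding ANOVA F test on the running example (SSR and SSE from the r² snippet; k = 1 for simple regression):

```python
from scipy import stats

k = 1
n = len(y)
msr = ssr / k                 # mean square due to regression
mse = sse / (n - k - 1)       # mean square error
F = msr / mse
p_value = stats.f.sf(F, k, n - k - 1)
print(f"F = {F:.2f}, p = {p_value:.4f}")   # for simple regression, F = t^2
```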

Confidence Interval Estimates and Prediction of Individual Values of Y
Confidence interval estimate for the slope:
b1 ± t(n−2) · S_b1
Confidence interval estimate for the mean value of Y given a particular Xi:
Ŷi ± t(n−2) · S_YX · √hi, where hi = 1/n + (Xi − X̄)² / Σ(Xi − X̄)²
Note that the size of the interval varies according to the distance of Xi from the mean, X̄.
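A sketch of the confidence interval for the mean of Y at a particular Xi (hypothetical running example; a 95% confidence level is assumed):

```python
import numpy as np
from scipy import stats

x_new = 10.0
y_hat_new = b0 + b1 * x_new
h = 1 / len(y) + (x_new - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)
t_crit = stats.t.ppf(0.975, len(y) - 2)           # 95% two-sided critical value

half_ci = t_crit * s_yx * np.sqrt(h)              # CI half-width for the mean of Y
print(f"Mean of Y at X={x_new}: {y_hat_new - half_ci:.1f} to {y_hat_new + half_ci:.1f}")
```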

Confidence (prediction) interval estimate for an individual value of Y given a particular Xi:
Ŷi ± t(n−2) · S_YX · √(1 + hi)
This extra term, 1, adds to the interval width to reflect the added uncertainty for an individual case.
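The corresponding prediction interval for an individual Y at the same Xi reuses the quantities just computed; the only change is the extra 1 under the square root:

```python
half_pi = t_crit * s_yx * np.sqrt(1 + h)          # prediction-interval half-width
print(f"Individual Y at X={x_new}: {y_hat_new - half_pi:.1f} to {y_hat_new + half_pi:.1f}")
```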

Graphical Presentation of Confidence Interval Estimation
(Graph of the fitted line Ŷ = b0 + b1·Xi with two bands around it: the narrower confidence interval for the mean of Y given Xi and the wider prediction interval for an individual Y given Xi; both bands widen as Xi moves away from X̄.)

Pitfalls of Regression Analysis
Lacking an awareness of the assumptions underlying least-squares regression.
Not knowing how to evaluate the assumptions.
Not knowing the alternatives to least-squares regression if a particular assumption is violated.
Using a regression model without knowledge of the subject matter.
Extrapolating outside the relevant range.
(Let's look at four different data sets, with their scatter diagrams and residual plots, on pages 509-510.)
Strategies: start with a scatter plot of X and Y to observe a possible relationship.

Perform residual analysis to check the assumptions:
Plot the residuals vs. X to check for violations of assumptions such as homoscedasticity.
Use a histogram, stem-and-leaf display, box-and-whisker plot, or normal probability plot of the residuals to uncover possible non-normality (a sketch follows).
If any assumption is violated, use alternative methods or models.
If there is no evidence of assumption violation, then test for the significance of the regression coefficients and construct confidence intervals and prediction intervals.
Avoid making predictions or forecasts outside the relevant range.
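A normal probability plot of the residuals (one of the checks listed above) can be produced with SciPy, continuing the running example:

```python
import matplotlib.pyplot as plt
from scipy import stats

stats.probplot(residuals, plot=plt)    # points close to the line suggest normal errors
plt.title("Normal probability plot of residuals")
plt.show()
```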