I271B Quantitative Methods Regression Part I

Administrative Merriments
Next Week: Reading and Evaluating Research (suggest readings using regression, bivariate statistics, etc.)
Course Review: May 5
Exam distributed May 7 in class (no lecture)

Regression versus Correlation
Correlation makes no assumption about whether one variable is dependent on the other; it is only a measure of general association.
Regression attempts to describe the dependence of a single dependent variable on one or more explanatory variables. It assumes a one-way causal link between X and Y.
Thus correlation is a measure of the strength of a relationship (from -1 to 1), while regression is a more precise description of a linear relationship (e.g., the specific slope, which is the change in Y given a change in X).
But correlation is still a part of regression: the square of the correlation coefficient (R²) expresses how much of Y's variance is explained by X.
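As a quick check of that last point, a minimal Stata sketch (the variable names y and x are hypothetical, not necessarily those in the course data) that compares the squared correlation with the R² reported by a bivariate regression:
* squared correlation equals R-squared in the bivariate case
correlate y x
display "r squared  = " r(rho)^2
regress y x
display "R squared  = " e(r2)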

Basic Linear Model
Yi = b0 + b1Xi + ei
X (and the X-axis) is our independent variable(s)
Y (and the Y-axis) is our dependent variable
b0 is a constant (the y-intercept)
b1 is the slope (the change in Y given a one-unit change in X)
e is the error term (the residuals)
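A minimal sketch of this model in Stata, assuming hypothetical variables y and x (the constant, slope, fitted values, and residuals correspond to b0, b1, y-hat, and e above):
* fit Yi = b0 + b1*Xi + ei
regress y x
display "b0 (constant) = " _b[_cons]
display "b1 (slope)    = " _b[x]
predict yhat             // predicted values (y-hat)
predict ehat, residuals  // estimated error term (residuals)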

Basic Linear Function

Slope So... what happens if b1 is negative? (The line slopes downward: Y decreases as X increases.)

The Least Squares Solution
The distance between each data point and the line of best fit is squared.
All of the squared distances are added together.
Adapted from Myers, Gamst and Guarino 2006

The Least Squares Solution (cont) For any Y and X, there is one and only one line of best fit. The least squares regression equation minimizes the possible error between our observed values of Y and our predicted values of Y (often called y-hat). Adapted from Myers, Gamst and Guarino 2006
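In symbols, using the notation from the Basic Linear Model slide, the least squares line is the pair (b0, b1) that minimizes the sum of squared prediction errors:
\min_{b_0,\, b_1} \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 = \min_{b_0,\, b_1} \sum_{i=1}^{n} \left(Y_i - b_0 - b_1 X_i\right)^2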

Important features of regression
There are two regression lines for any bivariate regression.
Regression to the mean (the regression effect) appears whenever there is spread around the SD line: a 1 SD increase in X predicts only an r × SDy increase in Y.
Regression fallacy: attempting to explain the regression effect through some other mechanism.
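For example (hypothetical numbers): if r = 0.5 and a case sits 1 SD above the mean on X, its predicted Y is only 0.5 SDy above the mean of Y, not a full SDy; that pull back toward the mean is the regression effect.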

Statistical Inference Using Least Squares
We obtain a sample statistic, b, which estimates the population parameter.
b is the coefficient of the X variable (i.e., how much the predicted Y changes for a one-unit change in X).
We also have the standard error of b, SE(b).
We can use a standard t-distribution with n - 2 degrees of freedom for hypothesis testing.
Yi = b0 + b1Xi + ei
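A minimal Stata sketch of that test, again with hypothetical variables y and x (the regression output already reports the same t and p-value for the slope):
regress y x
* t statistic for the slope: estimate divided by its standard error
display "t = " _b[x] / _se[x]
* two-sided p-value from the t distribution with n-2 degrees of freedom
display "p = " 2 * ttail(e(df_r), abs(_b[x] / _se[x]))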

Root Mean Square Error
Prediction error (residual) = actual - predicted.
The root-mean-square (r.m.s.) error measures how far typical points are above or below the regression line.
The average of the residuals is 0.
The S.D. of the residuals is the same as the r.m.s. error of the regression line.
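A minimal sketch of that calculation in Stata (hypothetical variables again); note that the "Root MSE" Stata prints after regress divides by n - 2 rather than n, so it will differ slightly from this definition:
regress y x
predict res, residuals
generate res2 = res^2
quietly summarize res2
display "r.m.s. error = " sqrt(r(mean))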

Interpretation: Predicted Y = constant + (coefficient × value of X)
For example, suppose we are examining Education (X) in years and Income (Y) in thousands of dollars.
Our constant is 10,000.
Our coefficient for X is 5.
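Plugging in the slide's numbers as given: for someone with 12 years of education, predicted Y = 10,000 + (5 × 12) = 10,060, and each additional year of education raises the predicted value by 5 (in the units of Y).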

Heteroskedasticity
OLS regression assumes that the variance of the error term is constant. If the error does not have a constant variance, then it is heteroskedastic (literally, "different scatter").
Where it comes from:
The error variance may really change as X increases
Measurement error
An underspecified model
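A small simulated illustration in Stata (entirely hypothetical data) in which the error spread grows with X:
clear
set obs 200
set seed 271
generate x = 10 * runiform()
* error standard deviation increases with x: heteroskedastic errors
generate y = 2 + 0.5*x + rnormal(0, 0.2 + 0.3*x)
scatter y x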

Heteroskedasticity (continued)
Consequences: We still get unbiased parameter estimates, but our line may not be the best fit. Why? Because OLS gives more 'weight' to the cases that may actually have the most error from the predicted line.
Detecting it: We have to look at the residuals (the differences between the observed and the predicted responses).
First, use a residual-versus-fitted-values plot (in STATA, rvfplot) or a residual-versus-predictor plot, which plots the residuals against one of the independent variables. We should see an even band across the 0 point (the line), indicating that our error is roughly equal.
If we are still concerned, we can run a test such as the Breusch-Pagan/Cook-Weisberg test for heteroskedasticity. It tests the null hypothesis that the error variances are all EQUAL against the alternative hypothesis that there is some difference. Thus, if the test is significant we reject the null hypothesis and we have a problem of heteroskedasticity.
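The corresponding Stata commands, sketched with hypothetical variables y and x:
regress y x
rvfplot         // residual-versus-fitted plot; look for an even band around 0
estat hettest   // Breusch-Pagan / Cook-Weisberg test (H0: constant error variance)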

What to do about Heteroskedasticity?
Perhaps other variables better predict Y?
If you are still interested in the current X, you can run a robust regression, which will adjust the model to account for heteroskedasticity.
Robust regression modifies the estimates of our standard errors and thus our t-tests for the coefficients.
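What the slide describes (adjusting the standard errors, not the coefficients) corresponds to heteroskedasticity-robust (Huber-White) standard errors; a minimal sketch, assuming hypothetical variables y and x:
regress y x, vce(robust)
* coefficients are identical to ordinary regress; only the standard errors
* (and therefore the t statistics and p-values) change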

Data points and Regression http://www.math.csusb.edu/faculty/stanton/m262/regress/

Program and Data for Today
Regress.do
GSS96_small.dta