
1 Regression Analysis
Regression is used to estimate the relationship between a dependent variable (Y) and one or more independent variables (X). Consider the variable Y, defined as total library expenditure in cities within Los Angeles County in 1999. Y is considered a variable because its values differ from one observation (city) to another. The distribution of Y assigns the probability that the variable equals a given value or falls within a given range of values.
– What is the probability that library expenditures are less than $2,320,400? That they are between $2,320,400 and $9,271,900?
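When the data are treated as the whole population, the probabilities asked about above are simply shares of observations. The sketch below computes them from a short list of expenditures; the values are hypothetical illustrations, not the actual 1999 LA County figures, and the helper names are made up for this example.

```python
# Hypothetical library expenditures (dollars); NOT the LA County 1999 data.
expenditures = [850_000, 1_200_000, 2_100_000, 3_400_000, 9_500_000]

def prob_less_than(values, c):
    """Empirical P(Y < c): share of observations strictly below c."""
    return sum(v < c for v in values) / len(values)

def prob_between(values, a, b):
    """Empirical P(a <= Y <= b): share of observations inside [a, b]."""
    return sum(a <= v <= b for v in values) / len(values)

print(prob_less_than(expenditures, 2_320_400))
print(prob_between(expenditures, 2_320_400, 9_271_900))
```

With these made-up values, three of five cities fall below $2,320,400 and one falls in the second range.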

2 Regression Analysis
What is meant by a data set being a population? A sample? If we wanted to study the library expenditure distribution specifically within LA County in 1999, the data could be considered a population. \(E(Y)\), the expected value of the variable Y, is calculated as the mean of the population, µ. \(E(Y)\) = $1,571,…; interpret.

3 Regression Analysis
Relationship between library expenditure and other variables. If library expenditure is related to other variables, the conditional expected value of Y will differ from the unconditional expected value. \(E(Y)\) is the unconditional expected value of Y; it is calculated by including all libraries within the defined population. \(E(Y \mid X)\) is the expected value of Y conditional on the variable X. \(E(Y \mid X) \neq E(Y)\) implies there is a relationship between Y and X.

4 Regression Analysis
Suppose X indicates whether the library is run by the individual city or is part of the county library system: X = 1 if city-run; X = 0 if county-run. \(E(Y \mid X=1)\) is the expected value of library expenditures conditional on the library being city-run. \(E(Y \mid X=0)\) is the expected value of library expenditures conditional on the library being county-run. \(E(Y \mid X=1)\) = $2,450,547.42; \(E(Y \mid X=0)\) = $951,…
– Libraries run by individual cities have greater mean expenditures than the average library.
– Libraries run by individual cities have greater mean expenditures than libraries in the county system.
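Conditional and unconditional expected values in a population are just group means and the overall mean. The sketch below computes \(E(Y)\), \(E(Y \mid X=1)\), and \(E(Y \mid X=0)\) for a tiny hypothetical set of libraries (the numbers are illustrative, not the LA County data) using Python's standard `statistics.mean`.

```python
from statistics import mean

# Hypothetical (expenditure, city_run) pairs: city_run = 1 if run by the city,
# 0 if part of the county system. Illustrative values only.
libraries = [
    (2_600_000, 1), (2_300_000, 1), (2_450_000, 1),
    (900_000, 0), (1_000_000, 0),
]

def conditional_mean(data, x_value):
    """E(Y | X = x_value): mean of Y over the observations with that X."""
    return mean(y for y, x in data if x == x_value)

unconditional = mean(y for y, _ in libraries)   # E(Y), all libraries
city = conditional_mean(libraries, 1)           # E(Y | X = 1)
county = conditional_mean(libraries, 0)         # E(Y | X = 0)
print(unconditional, city, county)
```

Because the two conditional means differ from each other and from the overall mean, these made-up data would also indicate a relationship between Y and X.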

5 Regression Analysis
Given that we've defined the data as the population, we can say definitively that the results indicate a relationship between Y and X. Our analysis, however, doesn't necessarily mean that a causal relationship exists. If the data represent a sample, we cannot determine with certainty whether the relationship that appears in the sample also exists within the population. Now define the data as sample data. We want to use regression analysis on the sample data to make predictions about the population, and to investigate the factors that determine library expenditures: what causes the differences in library expenditures across cities?

6 Regression Analysis
Population
We theorize that within the population there is a function that relates library expenditures to their determinants: \(Y = f(X) + e\), where X can be a number of variables that "cause" the dependent variable Y, and \(f(X)\) is the specific function that relates X to Y. The error term e is the difference between the actual value of Y and the value generated by \(f(X)\). Normally \(f(X)\) will not account for all the variation in Y; the best it can do is calculate the expected value of Y given specific values of X.

7 Regression Analysis
The function \(f(X)\) calculates the expected value of Y conditional on the independent variable(s) X: \(f(X) = E(Y \mid X)\). If we theorize that the relationship between X and Y is linear, the expected values of Y can be expressed as
\[ E(Y \mid X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots \]
This is the population regression equation.

8 Regression Analysis
Sample
In practice we won't have all the data that make up Y and X; we'll only have a sample. Therefore we won't be able to calculate the β parameters of the population equation. Instead we calculate the sample equation
\[ \hat{y} = b_0 + b_1 X_1 + b_2 X_2 + \dots \]
where \(\hat{y}\) is the estimate of \(E(Y \mid X)\), \(b_0\) is the estimate of \(\beta_0\), \(b_1\) is the estimate of \(\beta_1\), and so on.

9 Regression Analysis
Inferences from the sample equation are used to describe relationships within the larger population. Assume the simple regression model \(\hat{y} = b_0 + b_1 X_1\), where y is expenditure by the sampled libraries and \(X_1\) is the number of residents in the sampled cities.
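The slides do not show how \(b_0\) and \(b_1\) are computed. A minimal sketch, assuming ordinary least squares (the method PROC REG uses), applies the textbook formulas \(b_1 = S_{xy}/S_{xx}\) and \(b_0 = \bar{y} - b_1 \bar{x}\). The function name and the sample numbers are hypothetical.

```python
def fit_simple_ols(x, y):
    """Least-squares estimates (b0, b1) for y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squared deviations around the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx              # slope
    b0 = y_bar - b1 * x_bar     # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

# Hypothetical sample: residents (x) and library expenditure in dollars (y).
residents = [10_000, 20_000, 40_000, 80_000]
expend = [150_000, 260_000, 420_000, 900_000]
b0, b1 = fit_simple_ols(residents, expend)
print(b0, b1)
```

Here \(b_1\) is read as the change in predicted expenditure per additional resident, which is the interpretation asked for on the next slides.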


11 The SAS System   15:08 Sunday, March 21,
(note: Y and X are untransformed)

The REG Procedure
Model: MODEL1
Dependent Variable: expend

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1   …                …             …         <.0001
Error              …   …                …
Corrected Total    …   …

Root MSE          …      R-Square   0.6555
Dependent Mean    …      Adj R-Sq   …
Coeff Var         …

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1   …                    …                …         …
residents    1   …                    2.062            …         <.0001

12 Regression Analysis
Equation: \(\hat{y} = \ldots + \ldots X_1\). Interpret \(b_0\) and \(b_1\).
\(\Delta\hat{y} = b_1 \Delta X_1\); \(\Delta\hat{y} = \ldots \Delta X_1\).
An additional resident in a city is estimated to increase predicted library expenditure by $…. What is the relationship between \(b_1\) and \(\beta_1\)?

13 Regression Analysis
The sample equation doesn't account for all the variation in the dependent variable, y. Interpret the coefficient of determination, R²:
– R² measures the proportion of the variation in the dependent variable that is explained by the model.
– How much of the variation in library expenditures across cities is explained by differences in city size?

14 Regression Analysis
R² = 65.55%
– Our model accounts for 65.55% of the variation in library expenditures.
– Differences in city size explain 65.55% of the variation in library expenditures across cities.
The remaining 34.45% of the variation in library expenditures is unexplained.
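For a least-squares fit with an intercept, R² can be computed as one minus the share of variation left unexplained: \(R^2 = 1 - SSE/SST\). The sketch below does this for hypothetical actual and predicted values; the function name is illustrative only.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SSE/SST: share of the variation in y explained by the model.
    This reading assumes y_hat comes from a least-squares fit with an intercept."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    return 1 - sse / sst

# Hypothetical values: SSE = 1.0 and SST = 5.0, so R^2 = 0.8.
print(r_squared([1, 2, 3, 4], [1.5, 1.5, 3.5, 3.5]))
```

An R² of 0.6555 in this sense means SSE is 34.45% of SST.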

15 Regression Analysis
The residual term, \(\hat{e}\), is the difference between the actual and predicted values of the dependent variable:
\[ y_i = \hat{y}_i + \hat{e}_i \]
(actual value of the dependent variable = predicted value + residual). Interpret the residual terms \(\hat{e}_i = y_i - \hat{y}_i\) from the regression model. The non-zero residuals and the R² value below 100% both indicate that the model doesn't perfectly predict each y-value.
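The identity \(y_i = \hat{y}_i + \hat{e}_i\) can be checked directly. This sketch fits a least-squares line to hypothetical data, computes the residuals, and confirms two things: each actual value equals fitted plus residual, and (a standard property of OLS with an intercept) the residuals sum to zero.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def residuals(y, y_hat):
    """e_i = y_i - y_hat_i: actual minus predicted."""
    return [yi - fi for yi, fi in zip(y, y_hat)]

# Hypothetical data that do not lie exactly on a line.
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
b0, b1 = fit_simple_ols(x, y)
y_hat = [b0 + b1 * xi for xi in x]
e_hat = residuals(y, y_hat)
print(e_hat, sum(e_hat))
```

The individual residuals are non-zero (the line misses each point a little), while their sum is zero up to floating-point rounding.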

16 Regression Analysis
Stochastic relationship: there is a whole distribution of Y-values for each value of X. The predicted values, \(\hat{y}\), are estimates of the expected value of Y conditional on X. Here the \(\hat{y}\)'s are estimates of mean library expenditures conditional on city size.

17 Regression Analysis
The relationship between city size and expenditures found within the sample may not hold within the population. \(b_1\) is an estimate of \(\beta_1\): the slope of the sample regression equation is only an estimate of the "true" marginal relationship between Y and X within the population. \(b_1\) is a random variable; its value depends on the specific sample drawn.
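The claim that \(b_1\) varies from sample to sample can be illustrated by simulation. This sketch assumes a hypothetical population \(Y = 100 + 20X + e\) with normal errors (\(\sigma = 50\)) and 20 fixed X values, draws 2,000 samples, and records the least-squares slope from each. The setup is an illustration, not the library data.

```python
import random
from statistics import mean, stdev

def simulate_slopes(beta0, beta1, sigma, x, n_samples, seed=0):
    """Draw repeated samples from Y = beta0 + beta1*X + e, e ~ N(0, sigma^2),
    holding the X values fixed, and return the OLS slope b1 from each sample."""
    rng = random.Random(seed)
    x_bar = sum(x) / len(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slopes = []
    for _ in range(n_samples):
        y = [beta0 + beta1 * xi + rng.gauss(0, sigma) for xi in x]
        y_bar = sum(y) / len(y)
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        slopes.append(sxy / sxx)
    return slopes

# Hypothetical population slope beta1 = 20.
slopes = simulate_slopes(beta0=100, beta1=20, sigma=50,
                         x=list(range(1, 21)), n_samples=2000)
print(mean(slopes), stdev(slopes))
```

The average slope lands close to 20, consistent with \(E(b_1) = \beta_1\), while individual slopes scatter around it. Their spread is close to the theoretical \(\sigma/\sqrt{S_{xx}} = 50/\sqrt{665} \approx 1.94\).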

18 Regression Analysis
\(E(b_1) = \beta_1\)
Under the standard regression assumptions, the expected value of \(b_1\) is \(\beta_1\), but a particular calculated \(b_1\) may still differ from \(\beta_1\). This difference is called sampling error. The slope estimate \(b_1\) follows a sampling distribution whose standard deviation is estimated by the standard error \(S_{b_1}\) (= 2.062 in our regression output).
Population equation: \(E(Y \mid X) = \beta_0 + \beta_1 X_1\)
Interpret the hypotheses: \(H_0: \beta_1 = 0\), \(H_1: \beta_1 \neq 0\).

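19 Regression Analysis
Steps to perform a hypothesis test:
1. State the null and alternative hypotheses, \(H_0\) and \(H_1\).
2. Use the t-distribution.
3. Set the level of significance, α. This gives the size of the rejection region.
4. Find the critical values. For a two-tailed test, the critical values are \(\pm t_{\alpha/2,\gamma}\), where \(\gamma = n - k - 1\) is the degrees of freedom (n observations, k independent variables).
5. Calculate the test statistic \(t = (b_1 - \beta_1)/S_{b_1}\), where \(\beta_1\) is the value stated in \(H_0\).
6. Reject \(H_0\) if \(t < -t_{\alpha/2,\gamma}\) or \(t > t_{\alpha/2,\gamma}\).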
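Steps 4 to 6 can be sketched for a simple regression (k = 1) as follows. The standard error uses the usual formula \(S_{b_1} = \sqrt{MSE / S_{xx}}\) with \(MSE = SSE/(n-k-1)\). Python's standard library has no t-distribution, so the critical value is passed in from a t table. The data and function names are hypothetical.

```python
import math

def slope_t_test(x, y, beta1_null=0.0):
    """Simple-regression t-test for H0: beta1 = beta1_null.
    Returns (b1, s_b1, t, df) with df = n - k - 1 and k = 1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 1 - 1                       # n - k - 1 with k = 1 regressor
    s_b1 = math.sqrt(sse / df / sxx)     # standard error of the slope
    t = (b1 - beta1_null) / s_b1         # step 5
    return b1, s_b1, t, df

def reject_h0(t, t_crit):
    """Step 6, two-tailed: reject if t < -t_crit or t > t_crit."""
    return t < -t_crit or t > t_crit

# Hypothetical sample of n = 5, so df = 3; t_{0.025,3} = 3.182 from a t table.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b1, s_b1, t, df = slope_t_test(x, y)
print(df, round(b1, 2), round(t, 1), reject_h0(t, t_crit=3.182))
```

With these numbers the slope is about 33 standard errors from zero, so \(H_0: \beta_1 = 0\) is rejected at α = 0.05. Passing `beta1_null` lets the same function test other nulls, such as \(\beta_1 = 1\).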

20 Regression Analysis
Nonlinear Models
The linear model \(E(Y \mid X) = \beta_0 + \beta_1 X_1\) may not be appropriate for some relationships between variables, for example the relationship below:

21 Regression Analysis
Assume a theoretical relationship between X and Y within the population:
\[ E(Y \mid X) = \alpha X^{\beta} \quad (\text{assume } \alpha > 0 \text{ and } X > 0) \]
If β = 1, the relationship between X and Y is positive and linear, with slope α. If 0 < β < 1, the relationship is positive and nonlinear (concave); the slope is no longer constant (use calculus to solve for the slope). If β > 1, the relationship is positive and convex.
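The calculus step the slide refers to is short. Differentiating the model twice, and assuming α > 0 and X > 0 as above:

```latex
\frac{d}{dX}\,E(Y \mid X) = \alpha\beta X^{\beta-1},
\qquad
\frac{d^2}{dX^2}\,E(Y \mid X) = \alpha\beta(\beta-1) X^{\beta-2}
```

For β = 1 the slope is the constant α and the second derivative is 0 (linear). For 0 < β < 1 the slope is positive but the second derivative is negative (concave). For β > 1 both are positive (convex).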

22 Regression Analysis
Nonlinear models can be estimated by taking the natural log transformation of the data. The natural logarithm has base e ≈ 2.718. Example of the transformation: if X = 21,900, then ln(X) equals the number t such that \(e^t = 21{,}900\); ln(X) = 9.994.

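23 Regression Analysis
Model: \(Y = \alpha X^{\beta}\)
Take the log of both sides: \(\ln(Y) = \ln(\alpha) + \beta \ln(X)\)
Running ordinary least squares on the transformed data converts unit changes into percentage changes: the slope β measures the approximate percentage change in Y associated with a 1% change in X.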
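To see that the log transformation turns the power model into a straight line, the sketch below generates Y exactly from \(Y = \alpha X^{\beta}\) with hypothetical values α = 3 and β = 0.8. It then runs least squares on \((\ln X, \ln Y)\). The slope recovers β and \(e^{b_0}\) recovers α. It also reproduces the ln(21,900) example.

```python
import math

def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Hypothetical exact power law Y = alpha * X**beta (no error term).
alpha, beta = 3.0, 0.8
x = [1_000, 5_000, 21_900, 80_000]
y = [alpha * xi ** beta for xi in x]

# Regress ln(Y) on ln(X): ln(Y) = ln(alpha) + beta * ln(X).
b0, b1 = fit_simple_ols([math.log(v) for v in x], [math.log(v) for v in y])
print(b1, math.exp(b0))             # slope ~ beta, exp(intercept) ~ alpha
print(round(math.log(21_900), 3))   # 9.994
```

With real data the points would scatter around the line, and \(b_1\) would be an estimate of β rather than an exact recovery.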

24 Log/log model
The SAS System   21:22 Sunday, March 21,

The REG Procedure
Model: MODEL1
Dependent Variable: logexpend

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1   …                …             …         <.0001
Error              …   …                …
Corrected Total    …   …

Root MSE          …      R-Square   …
Dependent Mean    …      Adj R-Sq   …
Coeff Var         …

Parameter Estimates
Variable       DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept       1   …                    …                …         <.0001
logresidents    1   …                    …                …         <.0001

25 Regression Analysis
Log/log regression model: \(\hat{y} = b_0 + b_1 X_1 = \ldots + \ldots X_1\), where \(\hat{y}\) is predicted ln(expenditure) and \(X_1\) is ln(residents).
Interpret the \(b_1\) coefficient: a 10% increase in city size is associated with an approximately 8% increase in predicted library expenditures. \(b_1\) is an elasticity.
Interpret and compare R²: does the higher R² mean that this model is more appropriate than the linear model?
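The 10%-to-8% reading follows from the elasticity interpretation. As a worked check, assume \(b_1 \approx 0.8\), the value implied by that interpretation. The exact multiplicative change in predicted expenditure for a 10% increase in X is:

```latex
\frac{\partial \ln \hat{Y}}{\partial \ln X} = b_1 \approx \frac{\%\Delta \hat{Y}}{\%\Delta X},
\qquad
\frac{\hat{Y}_{\text{new}}}{\hat{Y}_{\text{old}}} = (1.10)^{b_1} = (1.10)^{0.8} \approx 1.079
```

That is an increase of about 7.9%, which the slide rounds to 8%. The approximation \(b_1 \times 10\% = 8\%\) works well for small percentage changes.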

26 Regression Analysis
Log/linear regression model
– A model in which the dependent variable is log-transformed but the right-hand-side variable(s) are not.
– Commonly used in growth time-series studies, for example where y is the log of GNP and X is an index of time (year). Also used in labor wage models.

27 Regression Analysis
Log/linear model results for our data, where y is the log of library expenditures and X is the number of residents by city:
– \(\hat{y} = b_0 + b_1 X_1 = \ldots + \ldots X_1\)
Interpret \(b_1\):
– Suppose \(\Delta X_1 = 1{,}000\); then \(\Delta\hat{y}\) would equal 0.01, or approximately 1%.
– A city-size increase of 1,000 residents is associated with an approximately 1% increase in predicted library expenditures.
Limitations of log models.
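The arithmetic behind this interpretation can be written out. The slide's numbers imply \(b_1 = 0.01/1000 = 0.00001\). Because \(\hat{y}\) is a log, a change of 0.01 corresponds to a multiplicative change of \(e^{0.01}\) in predicted expenditure:

```latex
\Delta \hat{y} = b_1 \, \Delta X_1 = 0.00001 \times 1000 = 0.01,
\qquad
100\,(e^{0.01} - 1) \approx 1.005\%
```

So "approximately 1%" is accurate here. The approximation \(100\,\Delta\hat{y}\) percent becomes less accurate as \(\Delta\hat{y}\) grows.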

28 Regression Analysis
Multiple Regression
– The "true" model would have on the right-hand side all the X's that have a systematic relationship with Y.
Example of a log model (output on next page):
\[ \hat{y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3 + b_4 X_4 \]
where \(\hat{y}\) is predicted (log) library expenditure; \(X_1\) is the number of residents in the city; \(X_2 = 1\) if the library is run by the city and 0 if run by the county; \(X_3\) is the percentage of city residents who are school-aged children; \(X_4\) is median household income by city. In the output, \(X_1\), \(X_3\), and \(X_4\) enter in logs (lresidents, lchildren, lincome), and \(X_2\) is citlib.
a. Interpret each of the b coefficients (be careful in interpreting \(b_2\), the coefficient on the dummy variable \(X_2\)).
b. Interpret R² (why is R² higher in the multiple regression than in the simple regressions?).
c. Perform and interpret the hypothesis test \(H_0: \beta_1 = 1\), \(H_1: \beta_1 \neq 1\).
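Two calculations support questions a and c. When the dependent variable is ln(Y), a commonly used conversion of a 0/1 dummy coefficient into a percentage difference in predicted Y is \(100\,(e^{b_2} - 1)\). Reading \(100\,b_2\) directly is only an approximation for small \(b_2\). For question c, the t statistic uses the hypothesized value 1 instead of 0. The coefficient and standard error values below are hypothetical placeholders, not the MODEL2 estimates.

```python
import math

def dummy_pct_effect(b2):
    """Percent difference in predicted Y (levels) between X2 = 1 and X2 = 0
    when the dependent variable is ln(Y): 100 * (exp(b2) - 1)."""
    return 100 * (math.exp(b2) - 1)

def t_stat(b, se, beta_null):
    """t = (b - beta_null) / se, e.g. beta_null = 1 for H0: beta1 = 1."""
    return (b - beta_null) / se

# Hypothetical values, NOT taken from the MODEL2 output.
print(round(dummy_pct_effect(0.5), 1))      # 64.9
print(t_stat(b=0.9, se=0.05, beta_null=1))  # about -2.0
```

The resulting t is compared with \(\pm t_{\alpha/2,\gamma}\) using \(\gamma = n - 4 - 1\), since this model has k = 4 independent variables.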

29 (note: log/log model)
Model: MODEL2
Dependent Variable: lexpend

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              4   …                …             …         <.0001
Error              …   …                …
Corrected Total    …   …

Root MSE          …      R-Square   …
Dependent Mean    …      Adj R-Sq   …
Coeff Var         …

Parameter Estimates
Variable     DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept     1   …                    …                …         …
lresidents    1   …                    …                …         <.0001
citlib        1   …                    …                …         <.0001
lchildren     1   …                    …                …         …
lincome       1   …                    …                …         <.0001