Statistics and Quantitative Analysis U4320 Lecture 13: Explaining Variation Prof. Sharyn O’Halloran

I. Explaining Variation: R2 A. Breaking Down the Distances Let's go back to the basics of regression analysis. How well does the predicted line explain the variation in the dependent variable?

I. Explaining Variation: R2 Total Variation

I. Explaining Variation: R2 Total Deviation Total Deviation = Explained Deviation + Unexplained Deviation:

$(Y - \bar{Y}) = (\hat{Y} - \bar{Y}) + (Y - \hat{Y})$

The total distance from any point Y to the mean $\bar{Y}$ is the sum of the distance from Y to the regression line (the unexplained deviation) plus the distance from the regression line to $\bar{Y}$ (the explained deviation).

I. Explaining Variation: R2 B. Sums of Squares Squaring both sides of this equation and summing across all the Y's (the cross-product term drops out) gives:

$\sum (Y - \bar{Y})^2 = \sum (\hat{Y} - \bar{Y})^2 + \sum (Y - \hat{Y})^2$

I. Explaining Variation: R2 1. Total Sum of Squares (SST) The term on the left-hand side of this equation, $\sum (Y - \bar{Y})^2$, is the sum of the squared distances from all points to the mean $\bar{Y}$. We call this the total variation in the Y's, or the Total Sum of Squares (SST). 2. Regression Sum of Squares The first term on the right-hand side, $\sum (\hat{Y} - \bar{Y})^2$, is the sum of the squared distances from the regression line to $\bar{Y}$. We call it the Regression Sum of Squares, or SSR.

I. Explaining Variation: R2 3. Error Sum of Squares Finally, the last term, $\sum (Y - \hat{Y})^2$, is the sum of the squared distances from the points to the regression line. Remember, this is the quantity that least squares minimizes. We call it the Error Sum of Squares, or SSE. We can rewrite the previous equation as: SST = SSR + SSE.
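To make the decomposition concrete, here is a minimal sketch (not from the lecture; the data are made up) that fits a least-squares line with numpy and checks that SST = SSR + SSE:

```python
import numpy as np

# Made-up data for illustration: y depends linearly on x, plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=50)

# Least-squares fit; polyfit returns the slope first, then the intercept.
b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares
sse = np.sum((y - y_hat) ** 2)         # error sum of squares

print(sst, ssr + sse)  # the two agree up to floating-point rounding
```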

I. Explaining Variation: R2 C. Definition of R2 We can use these new terms to determine how much variation is explained by the regression line. If the points lie exactly on a straight line, then the Error Sum of Squares is 0.

I. Explaining Variation: R2 Here, SSR = SST. The variance in the Y's is completely explained by the regression line.

I. Explaining Variation: R2 On the other hand, if there is no relation between X and Y, then SSR is 0 and SSE = SST. The regression line explains none of the variance in Y.

I. Explaining Variation: R2 3. Formula So we can construct a useful statistic. Take the ratio of the Regression Sum of Squares to the Total Sum of Squares:

$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$

We call this statistic R2. It represents the percent of the variation in Y explained by the regression.

I. Explaining Variation: R2 R2 is always between 0 and 1. For a perfectly straight line it's 1, which is perfect correlation. For data with little relation, it's near 0. R2 measures the explanatory power of your model. The more of the variance in Y you can explain, the more powerful your model.
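One way to see this interpretation at work: for a one-variable regression, R2 equals the squared correlation between X and Y. A small sketch with made-up data (not the lecture's):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=50)

b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)
sse = np.sum((y - y_hat) ** 2)
r2 = 1 - sse / sst  # equivalently SSR / SST

# For simple regression, R^2 is the squared correlation coefficient.
print(r2, np.corrcoef(x, y)[0, 1] ** 2)
```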

I. Explaining Variation: R2 D. Example I wanted to investigate why people have confidence in what they see on TV. 1. Dependent variable TRUSTTV = 1 if the individual has a lot of confidence; = 2 if some confidence; = 3 if no confidence. 2. Independent variables TUBETIME = number of hours of TV watched per week. SKOOL = years of education. LIKEJPAN = feelings toward Japan. YELOWSTN = attitude on whether the US should spend more on national parks. MYSIGN = the respondent's astrological sign.

I. Explaining Variation: R2 3. Calculating R2 a) Correlation matrix

I. Explaining Variation: R2 b) The first model is: TRUSTTV = 2.34 - 0.0539 (TUBETIME)
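To read this fitted line, here is a tiny sketch using the coefficients above (the function name is mine):

```python
def predict_trusttv(tubetime_hours: float) -> float:
    """Predicted TRUSTTV from the fitted line above."""
    return 2.34 - 0.0539 * tubetime_hours

# Ten hours of TV a week predicts TRUSTTV of about 1.80; since lower
# TRUSTTV means more confidence, heavier viewers trust TV more.
print(predict_trusttv(10))
```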

I. Explaining Variation: R2 How do we calculate the Total Sum of Squares? SST = SSR + SSE SST = 5.90 + 183.61 = 189.51 Now we can calculate R2:

$R^2 = \frac{SSR}{SST} = \frac{5.90}{189.51} \approx 0.031$

I. Explaining Variation: R2 c) Each of the 4 different models has an associated R2. Eq # 1: R2 = 0.031 Eq # 2: R2 = 0.035 Eq # 3: R2 = 0.039 Eq # 4: R2 = 0.040

I. Explaining Variation: R2 F. Using R2 in Practice 1. Useful Tool 2. Measure of Unexplained Variance 3. Not a Statistical Test 4. Don't Obsess about R2 5. You can always improve R2 by adding variables

I. Explaining Variation: R2 G. Example You'll notice that R2 increases every time. No matter what variables you add, you can always increase your R2.

II. Adjusted R2 A. Definition of Adjusted R2 So we'd like a measure like R2, but one that takes into account the fact that adding extra variables always increases your explanatory power. The statistic we use for this is called the Adjusted R2, and its formula is:

$\bar{R}^2 = 1 - \frac{n-1}{n-k}\left(1 - R^2\right)$

where n is the number of observations and k is the number of independent variables, including the constant. So the Adjusted R2 can actually fall if the variable you add doesn't explain much of the variance.
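As a sketch, the formula translates directly into a small helper, keeping the lecture's convention that k counts the independent variables plus the constant:

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (n - 1)/(n - k) * (1 - R^2),
    where k includes the constant."""
    return 1 - (n - 1) / (n - k) * (1 - r2)

# e.g. equation 2 above: R^2 = 0.035, n = 470, and (assuming two
# regressors plus the constant) k = 3
print(adjusted_r2(0.035, n=470, k=3))  # ~0.031
```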

II. Adjusted R2 B. Back to the Example 1. Adjusted R2 You can see that the adjusted R2 rises from equation 1 to equation 2, and from equation 2 to equation 3. But then it falls from equation 3 to 4, when we add in the variables for national parks and the zodiac.

II. Adjusted R2 2. Calculating Adjusted R2 Example: Equation 2

II. Adjusted R2 We calculate, using R2 = 0.035 and n = 470, and taking k = 3 on the assumption that equation 2 has two regressors plus the constant:

$\bar{R}^2 = 1 - \frac{469}{467}(1 - 0.035) \approx 0.031$

II. Adjusted R2 C. Stepwise Regression One strategy for model building is to add variables only if they increase your adjusted R2. This technique is called stepwise regression. However, I don't want to emphasize this approach too strongly. Just as people can fixate on R2, they can fixate on adjusted R2. ****IMPORTANT**** If you have a theory that suggests that certain variables are important for your analysis, then include them whether or not they increase the adjusted R2. Negative findings can be important!

III. F Tests A. When to use an F-Test? Say you add a number of variables to a regression model and you want to see whether, as a group, they are significant in explaining variation in your dependent variable Y. The F-test tells you whether a group of variables, or even an entire model, is jointly significant. This is in contrast to a t-test, which tells you whether an individual coefficient is significantly different from zero.

III. F Tests B. Equations To be precise, say our original equation is: EQ 1: Y = b0 + b1X1 + b2X2, and we add two more variables, so the new equation is: EQ 2: Y = b0 + b1X1 + b2X2 + b3X3 + b4X4. We want to test the hypothesis H0: b3 = b4 = 0. That is, we want to test the joint hypothesis that X3 and X4 together are not significant factors in determining Y.

III. F Tests C. Using Adjusted R2 First There's an easy way to tell if these two variables are not significant: run the regression without X3 and X4, then run it again with X3 and X4, and compare the adjusted R2's of the two regressions. If the adjusted R2 went down, then X3 and X4 are not jointly significant. So the adjusted R2 can serve as a quick test for insignificance.

III. F Tests D. Calculating an F-Test 1. Ratio If the adjusted R2 goes up, then you need to do a more complicated test, the F-test. Let regression 1 be the model without X3 and X4, and let regression 2 include X3 and X4. The basic idea of the F statistic is to compare the drop in the sum of squared errors from adding the variables to the error that remains:

$\frac{SSE_1 - SSE_2}{SSE_2}$

III. F Tests 2. Correction We have to correct for the number of independent variables we add. So the complete statistic is:

$F = \frac{(SSE_1 - SSE_2)/m}{SSE_2/(n-k)}$

where m is the number of variables added. Remember that k is the total number of independent variables, including the ones that you are testing and the constant.

III. F Tests 2. Correction (cont.) This equation defines an F statistic with m and n-k degrees of freedom. We write it like this: $F_{m,\,n-k}$. To get critical values for the F statistic, we use a set of tables, just like for the normal and t statistics.
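The whole recipe fits in a few lines. A hedged sketch (function and variable names are mine), with scipy supplying the upper-tail probability of the F distribution:

```python
from scipy import stats

def partial_f_test(sse1: float, sse2: float, m: int, n: int, k: int):
    """F test for m added variables: sse1 is the restricted model's SSE,
    sse2 the full model's; k counts the full model's regressors plus
    the constant."""
    f = ((sse1 - sse2) / m) / (sse2 / (n - k))
    p_value = stats.f.sf(f, m, n - k)  # upper-tail area of F(m, n-k)
    return f, p_value
```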

III. F Tests E. Example 1. Adding Extra Variables: Is a group of variables jointly significant? Specifically, are the variables YELOWSTN and MYSIGN jointly significant?

III. F Tests 1. Adding Extra Variables (cont.) a) State the null hypothesis: H0: the coefficients on YELOWSTN and MYSIGN are both zero. b) Calculate the F-statistic. Our formula for the F statistic is:

$F = \frac{(SSE_1 - SSE_2)/m}{SSE_2/(n-k)}$

III. F Tests SSE1 is the sum of squared errors in the first regression, the one without YELOWSTN and MYSIGN. SSE2 is the sum of squared errors in the second regression, which includes them; from the printout, SSE2 = 182.78. Here m = 2, N = 470, and k = 6, so the formula is:

$F = \frac{(SSE_1 - SSE_2)/2}{SSE_2/(470 - 6)}$

which works out to 0.319 for these regressions.

III. F Tests c) Reject or fail to reject the null hypothesis? The critical value at the 5% level, from the table, is 3.00. Is the F-statistic > 3.00? If yes, then we reject the null hypothesis that the added variables are not significantly different from zero; otherwise we fail to reject. Here 0.319 < 3.00, so we fail to reject the null hypothesis: YELOWSTN and MYSIGN are not jointly significant.
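Instead of a printed table, the critical value can also be looked up directly; a small sketch with scipy:

```python
from scipy import stats

# 5% critical value for F with (2, 464) degrees of freedom.
crit = stats.f.ppf(0.95, dfn=2, dfd=464)
print(crit)  # ~3.0, matching the table value used above
```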

III. F Tests 2. Testing All Variables: Is the Model Significant? Equation 2

III. F Tests a) Hypothesis: H0: b1 = b2 = b3 = b4 = b5 = 0, i.e., none of the explanatory variables helps determine Y. Again, we start with our formula:

$F = \frac{(SSE_1 - SSE_2)/m}{SSE_2/(n-k)}$

III. F Tests b) Calculate F-statistic SSE2 = 182.78. SSE1 is the sum of squared errors when there are no explanatory variables at all. If there are no explanatory variables, then SSR must be 0; in this case SSE = SST, so we can substitute SST for SSE1 in our formula. SST = SSR + SSE = 6.738 + 182.78 = 189.52. With m = 5 and n - k = 464:

$F = \frac{(189.52 - 182.78)/5}{182.78/464} \approx 3.42$

This is the number reported in your printout under the F statistic.

III. F Tests c) Reject or fail to reject the null hypothesis? The critical value at the 5% level, from the F table with (5, 464) degrees of freedom, is about 2.23. Since 3.42 > 2.23, this time we can reject the null hypothesis that b1 = b2 = b3 = b4 = b5 = 0: the model as a whole is significant.
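Putting the pieces together, the overall-model test can be reproduced from the printout's figures (a sketch; the numbers are the ones quoted above):

```python
from scipy import stats

ssr, sse = 6.738, 182.78
m, n, k = 5, 470, 6  # five regressors tested; k includes the constant

# With no regressors, SSE1 = SST, so SSE1 - SSE2 = SSR.
f = (ssr / m) / (sse / (n - k))
print(f)                                    # ~3.42
print(stats.f.ppf(0.95, dfn=m, dfd=n - k))  # 5% critical value, ~2.23
print(stats.f.sf(f, m, n - k))              # p-value, well below 0.05
```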