ANOVA, Regression and Multiple Regression March 22-23.


Why ANOVA
ANOVA allows us to compare the means for groups and ask whether they are sufficiently different from one another to say that the differences are statistically significant; in other words, that there is a low probability that a difference of such magnitude would occur between groups in the real world if the actual difference between the groups were zero.

Unlike t tests, ANOVA can be performed with more than two groups. The statistic associated with ANOVA is the F test.
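
As a minimal sketch of how this looks in practice (using Python's scipy with made-up scores for three groups; the slides themselves work in SPSS):

    import scipy.stats as stats

    # Hypothetical scores for three groups (made-up data)
    group_a = [4, 5, 6, 5, 7]
    group_b = [6, 7, 8, 7, 9]
    group_c = [5, 6, 5, 6, 7]

    # One-way ANOVA: tests H0 that all three population means are equal
    f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
    print(f"F = {f_stat:.2f}, p = {p_value:.3f}")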

As the book notes… “The details of ANOVA are a bit daunting (they appear in an optional section at the end of this chapter). The main idea of ANOVA is more accessible and much more important. Here it is: when we ask if a set of sample means gives evidence for differences among the population means, what matters is not how far apart the sample means are but how far apart they are relative to the variability of individual observations.”

In other words, let's look both at the means and at the overlap of the distributions!

Statistically speaking
The F statistic compares the variation among the sample means to the variation among individuals within the same sample. Like t, the F statistic is very robust, so you should not worry too much about deviations from normality if your sample is large.
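
To make that ratio concrete, here is a hand computation of F as the mean square for groups (MSG) over the mean square for error (MSE), on the same made-up data as above; it matches what f_oneway reports:

    import numpy as np

    groups = [np.array([4, 5, 6, 5, 7]),
              np.array([6, 7, 8, 7, 9]),
              np.array([5, 6, 5, 6, 7])]
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total observations
    grand_mean = np.concatenate(groups).mean()

    # MSG: variation of the sample means around the grand mean
    msg = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups) / (k - 1)
    # MSE: pooled variation of individuals within their own groups
    mse = sum(((g - g.mean()) ** 2).sum() for g in groups) / (n - k)

    print(f"F = MSG / MSE = {msg / mse:.2f}")  # same F as scipy's f_oneway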

One warning
ANOVA assumes that the variability of observations, measured by the standard deviation, is the same in all populations. In the real world, if you keep the sizes of the groups you are comparing roughly similar, few problems occur, but you must check.

The book gives this rule
Results of the F test are usually okay if the largest sample standard deviation is no more than twice as large as the smallest sample standard deviation. Another check is Levene's test of equality of variances: if it is significant (low p), the standard deviations of the groups being compared are unlikely to be equal, and you have a problem.
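
A quick way to run both checks outside SPSS (a sketch using scipy on the made-up groups from above; the twice-the-smallest-SD rule is just a comparison of sample standard deviations):

    import numpy as np
    import scipy.stats as stats

    groups = [np.array([4, 5, 6, 5, 7]),
              np.array([6, 7, 8, 7, 9]),
              np.array([5, 6, 5, 6, 7])]

    # Rule of thumb: largest sample SD no more than twice the smallest
    sds = [g.std(ddof=1) for g in groups]
    print("SD ratio:", max(sds) / min(sds))   # want this <= 2

    # Levene's test: a low p-value signals unequal variances (a problem)
    stat, p = stats.levene(*groups)
    print(f"Levene W = {stat:.2f}, p = {p:.3f}")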

From Table 24.2 (richness of trees by Group)

Why Regression
Regression is commonly used in the social sciences because it allows us to
– Explain
– Predict
which are two of the big goals of social science (along with describe).

Recall
Regression involves mathematically describing a linear relationship between a response (or dependent) variable and an explanatory (or independent) variable. That line is given in the form y = a + b(x), where:
– y is the response variable
– a is the y-axis intercept of the line
– b is the slope of the line
– x is the explanatory variable

Requirements for use of regression
Also recall that if the relationship between our response and explanatory variables is not linear, then regression will give misleading results. Therefore we always do a scatter plot before attempting regression. The mathematical notation for linearity is that the mean response is a straight-line function of x: μy = α + β(x). The fitted line is sometimes called the “least-squares regression line” because the procedure finds the line that minimizes the sum of the squared differences between the line and each data point.

Regression requirements continued
For any value of x, the values of y are normally distributed, and repeated responses of y are independent of each other. The standard deviation of y is the same for all values of x.

Regression Analysis
As well as estimating the regression line, we also estimate the goodness of fit between the line and the data using a statistic known as Rsq. Rsq (as the name implies) is the square of the correlation measure known as r. We also need to know the significance of the association between the explanatory and response variables (as well as of the coefficient a) for the line we have found; we use a variation of the t test for this.
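
A minimal simple-regression sketch (Python's scipy on made-up x and y values, rather than the SPSS/Excel printouts the slides show): linregress returns the slope b, the intercept a, r (whose square is Rsq), and the p-value from the t test of a flat slope.

    import scipy.stats as stats

    # Hypothetical (x, y) data
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.8]

    result = stats.linregress(x, y)
    print(f"line: y = {result.intercept:.2f} + {result.slope:.2f}x")
    print(f"Rsq = {result.rvalue**2:.3f}")          # goodness of fit
    print(f"p (slope = 0) = {result.pvalue:.4f}")   # t test on the slope
    print(f"SE of slope = {result.stderr:.3f}")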

A useful tool: Regression Standard Error
The regression standard error is a useful tool that can help us diagnose whether we have met the various conditions needed to perform a regression (don't worry, your software will do this).

So, looking at Example 23.1 in your book, here is the scatter plot

Here is the regression dialog showing how I have selected the standard errors (called “residuals” in SPSS)

Here is the dialog box in Excel using the plug-in

Here is a portion of the printout that was generated in SPSS

Here you can see the standardized residuals, or errors, that were calculated

In Excel it looks like this

A happy coincidence
As the book notes, Rsq is “closely related” to r. In fact, it is literally the square of r in a simple OLS regression with one explanatory variable. Therefore, when you test the null hypothesis about the regression line (that it is actually flat), you have pretty much tested the correlation too. However, most software also prints the correlation out in case you want to see it.
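
You can verify the coincidence directly (a sketch reusing the made-up data from above): the square of the plain correlation coefficient equals the Rsq the regression reports.

    import numpy as np
    import scipy.stats as stats

    x = [1, 2, 3, 4, 5, 6, 7, 8]
    y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.8]

    r = np.corrcoef(x, y)[0, 1]          # plain correlation coefficient
    rsq = stats.linregress(x, y).rvalue ** 2
    print(np.isclose(r ** 2, rsq))       # True: Rsq is literally r squared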

Does it matter that the estimate of the intercept is insignificant? In practice, no. What really matters is the estimate of the slope.

Calculating the confidence interval for your line
If you look back at our printout, you will see that the slope is given, as are the standard error of the slope and a t value. Put them together and you have the 95% confidence interval for the population slope.
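
A sketch of the arithmetic, with hypothetical printout values: the 95% interval is slope ± t* × SE, where t* comes from the t distribution with n − 2 degrees of freedom.

    import scipy.stats as stats

    b = 0.96          # hypothetical estimated slope from the printout
    se_b = 0.07       # hypothetical standard error of the slope
    n = 8             # number of observations

    t_star = stats.t.ppf(0.975, df=n - 2)   # critical value for 95% CI
    low, high = b - t_star * se_b, b + t_star * se_b
    print(f"95% CI for the population slope: ({low:.2f}, {high:.2f})")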

Or you could have had the computer calculate the confidence intervals for you

As noted before, we can use our standardized errors to check our assumptions:
– The y values vary normally for each x value: do a histogram of your residuals and check for relative normality of the distribution.
– Plot the residuals as the dependent variable with the x variable as independent to check for linearity and that the observations of y are independent of each other.
– Constant standard deviation of responses can be checked by looking for a roughly symmetrical distribution above and below the zero point.
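
A sketch of those checks using matplotlib on the residuals from the earlier made-up fit (SPSS produces the equivalent plots):

    import matplotlib.pyplot as plt
    import numpy as np
    import scipy.stats as stats

    x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
    y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.8])
    fit = stats.linregress(x, y)
    residuals = y - (fit.intercept + fit.slope * x)

    fig, (ax1, ax2) = plt.subplots(1, 2)
    ax1.hist(residuals)                  # check for rough normality
    ax1.set_title("Histogram of residuals")
    ax2.scatter(x, residuals)            # check linearity / equal spread
    ax2.axhline(0)                       # look for symmetry around zero
    ax2.set_title("Residuals vs. x")
    plt.show()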

Our previous example had too few cases to check residuals, so here is Example 23.9 from the book on climate change

Moving from OLS to Multiple OLS: three big changes
1. We have to use Beta instead of b.
2. We have to be aware of multicollinearity and other multiple-variable impacts (in short, that we are not just piling on independent variables but that each independent variable is demonstrating a unique explanatory power).
3. The book gives you a third: we have to be aware of interaction terms and other factors that lead us to pick one model over another.

The equation now changes to reflect the greater number of variables and the change from b to beta: y = a + β1(x1) + β2(x2) + … + βk(xk)

How to do it?
– Start from the beginning and look at each variable separately using our descriptive and exploratory techniques.
– Now look at our dependent variable paired with each independent variable, using correlations to see which ones might have a big impact.
– Fit different models; pay attention to changes in explanatory power and also the t statistics (a sketch follows this list).
– If using stats software, use stepwise procedures. Stepwise adds and removes variables in the order you input them based on a selection criterion (the change in the F statistic from the ANOVA test). In short, the computer tells you which model fits best with as few variables as possible.
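
As a minimal multiple-regression sketch (Python's statsmodels on made-up data with hypothetical variable names; note statsmodels has no one-call stepwise procedure like SPSS, so this just fits and compares two candidate models by hand):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "income": rng.normal(50, 10, 60),       # hypothetical predictors
        "visits": rng.integers(1, 20, 60),
    })
    df["purchase"] = (5 + 0.8 * df["income"] + 2 * df["visits"]
                      + rng.normal(0, 5, 60))   # made-up response

    # Fit two candidate models and compare explanatory power
    m1 = sm.OLS(df["purchase"], sm.add_constant(df[["income"]])).fit()
    m2 = sm.OLS(df["purchase"], sm.add_constant(df[["income", "visits"]])).fit()
    print(m1.rsquared, m2.rsquared)   # does adding "visits" help?
    print(m2.summary())               # coefficients, t statistics, p-values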

In SPSS, you can do more than one scatterplot at once

The data provided in Table 27.6 represent a random sample of 60 customers from a large clothing retailer. The manager of the store is interested in predicting how much a customer will spend on his or her next purchase. Our goal is to find a regression model for predicting the amount of a purchase from the available explanatory variables. A short description of each variable is provided below.

Here are the printouts for Ex using SPSS

Let's add a new variable
Purchase 12 shows the total purchases each customer made over the last 12 months divided by the frequency of their visits to the store.

As you will see, it changes things. Here is the OLS for it alone.

The last slide was basically an interaction of the two variables we previously identified as helpful. Let's go back to when they were separate for a second and test whether each has a separate impact or whether multicollinearity is at play. Look for tolerances of .1 or less as evidence of multicollinearity.
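
SPSS reports tolerance directly; elsewhere you can get it as 1/VIF. A sketch with statsmodels on made-up, deliberately collinear predictors (hypothetical names):

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(1)
    x1 = rng.normal(size=60)
    x2 = 0.9 * x1 + rng.normal(scale=0.2, size=60)   # deliberately collinear
    X = sm.add_constant(np.column_stack([x1, x2]))

    for i, name in [(1, "x1"), (2, "x2")]:           # skip the constant column
        tol = 1 / variance_inflation_factor(X, i)
        print(f"{name}: tolerance = {tol:.3f}")       # <= .1 flags a problem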

Finally, let's look at our residual plots
Often you might have the chance to use more elaborate residuals than standardized ones, such as studentized residuals. As there is no pattern, we assume the variance of y is the same for all values of x.

The sequence chart also tells us that the y values are independent of each other

The QQ plot tells us the residuals are roughly normal, meaning the condition that values of y vary normally for each value of x appears to be met
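
A sketch of that last check (scipy's probplot draws a normal QQ plot; made-up residuals stand in for the ones your fitted model produced):

    import numpy as np
    import matplotlib.pyplot as plt
    import scipy.stats as stats

    # Residuals from whatever model you fit (made-up values here)
    rng = np.random.default_rng(2)
    residuals = rng.normal(size=60)

    # Points close to the reference line suggest rough normality
    stats.probplot(residuals, dist="norm", plot=plt)
    plt.title("Normal QQ plot of residuals")
    plt.show()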