Multiple regression. Problem: to draw a straight line through the points that best explains the variance.

Presentation transcript:

Multiple regression

Regression
Problem: to draw a straight line through the points that best explains the variance.

Regression
Test with F, just like ANOVA:
F = (variance explained by x-variable / df) / (variance still unexplained / df)
Variance explained (change in line lengths²); variance unexplained (residual line lengths²).

Regression
Test with F, just like ANOVA:
F = (variance explained by x-variable / df) / (variance still unexplained / df)
In regression, each x-variable will normally have 1 df.

Regression
Test with F, just like ANOVA:
F = (variance explained by x-variable / df) / (variance still unexplained / df)
Essentially a cost-benefit analysis: is the benefit in variance explained worth the cost of using up degrees of freedom?

Regression example
The total variance for 32 data points is 300 units. An x-variable is then regressed against the data, accounting for 150 units of variance.
1. What is the R²?
2. What is the F ratio?

Regression example
The total variance for 32 data points is 300 units. An x-variable is then regressed against the data, accounting for 150 units of variance.
1. What is the R²? R² = 150/300 = 0.5
2. What is the F ratio? F(1,30) = (150/1) / (150/30) = 30
Why is df(error) = 30? The model uses up 2 df (one for the intercept, one for the slope), leaving 32 - 2 = 30.
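The arithmetic on this slide can be checked directly; the sketch below just restates the slide's numbers (32 points, total SS 300, 150 explained):

```python
# Worked example from the slide: 32 data points, total SS = 300,
# of which the x-variable explains 150 units.
n = 32
ss_total = 300.0
ss_explained = 150.0
ss_unexplained = ss_total - ss_explained

df_model = 1            # one x-variable
df_error = n - 2        # lose 1 df to the intercept, 1 to the slope

r_squared = ss_explained / ss_total                                # 0.5
f_ratio = (ss_explained / df_model) / (ss_unexplained / df_error)  # 30.0
print(r_squared, f_ratio)
```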

Multiple regression
[Scatterplot: herbivore damage vs. tree age, with higher-nutrient and lower-nutrient trees marked]
Damage = m1*age + b

[Left: herbivore damage vs. tree age. Right: residuals of herbivore damage vs. tree nutrient concentration]

[Left: herbivore damage vs. tree age. Right: residuals of herbivore damage vs. tree nutrient concentration]
Damage = m1*age + m2*nutrient + b
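A two-predictor model like this can be fitted by ordinary least squares; the sketch below uses made-up data (variable names and true coefficients are invented for illustration):

```python
import numpy as np

# Hypothetical data for the slide's model Damage = m1*age + m2*nutrient + b.
# All numbers here are made up for illustration.
rng = np.random.default_rng(0)
age = rng.uniform(1, 20, size=50)
nutrient = rng.uniform(0.1, 1.0, size=50)
damage = 2.0 * age + 5.0 * nutrient + 1.0 + rng.normal(0, 1, size=50)

# Design matrix: one column per x-variable plus a column of ones for b.
X = np.column_stack([age, nutrient, np.ones_like(age)])
(m1, m2, b), *_ = np.linalg.lstsq(X, damage, rcond=None)
print(m1, m2, b)   # estimates near the true 2.0, 5.0, 1.0
```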

Damage = m1*age + m2*nutrient + m3*age*nutrient + b
[Two panels: no interaction (additive) vs. interaction (non-additive)]
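The interaction term does not require any new machinery: the product age*nutrient is simply one more column in the design matrix. A sketch with invented data:

```python
import numpy as np

# The interaction model Damage = m1*age + m2*nutrient + m3*age*nutrient + b
# is still linear regression: the product age*nutrient is just another
# column in the design matrix. Data are invented for illustration.
rng = np.random.default_rng(2)
age = rng.uniform(1, 20, size=80)
nutrient = rng.uniform(0.1, 1.0, size=80)
damage = (2.0 * age + 5.0 * nutrient - 0.5 * age * nutrient
          + 1.0 + rng.normal(0, 0.5, size=80))

X = np.column_stack([age, nutrient, age * nutrient, np.ones_like(age)])
(m1, m2, m3, b), *_ = np.linalg.lstsq(X, damage, rcond=None)
print(m1, m2, m3, b)
```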

Non-linear regression? Just a special case of multiple regression!
Y = m1*x + m2*x² + b
[Plots of Y against X and X²: with x1 = x and x2 = x², this is Y = m1*x1 + m2*x2 + b]
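The "special case" claim can be demonstrated by treating x and x² as two separate x-variables in an ordinary least-squares fit (the data below are simulated for illustration):

```python
import numpy as np

# A quadratic fit treated as multiple regression: x and x**2 act as two
# "independent" variables x1 and x2 in Y = m1*x1 + m2*x2 + b.
# The data are simulated for illustration.
rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 40)
y = 1.5 * x + 0.8 * x**2 + 2.0 + rng.normal(0, 0.2, size=x.size)

X = np.column_stack([x, x**2, np.ones_like(x)])
(m1, m2, b), *_ = np.linalg.lstsq(X, y, rcond=None)
print(m1, m2, b)   # estimates near the true 1.5, 0.8, 2.0
```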

STEPWISE REGRESSION

Jump height (how high a ball can be raised off the ground), measured in feet off the ground.
Total SS = 11.11

X variable         parameter   SS   F(1,13)   p
Height of player                              <0.0001

X variable         parameter   SS   F(1,13)   p
Weight of player                              <0.0001

Why do you think weight is positively correlated with jump height?

An idea
Perhaps if we took two people of identical height, the lighter one might actually jump higher? Excess weight may reduce the ability to jump high…

How could we test this idea?

[Scatterplot distinguishing lighter and heavier players]
X variable   parameter   SS   F   p
Height                            <0.0001
Weight                            <0.0001

Questions: Why did the parameter estimates change? Why did the F tests change?

Heavy people are often tall (tall people are often heavy). Tall people can jump higher. People who are light for their height can jump a bit more.
[Path diagram: Height -> Jump (+), Weight <-> Height (+), Weight -> Jump (-)]

The problem: the parameter estimate and significance of an x-variable are affected by the x-variables already in the model! How do we know which variables are significant, and in which order to enter them into the model?

Solutions
1) Use a logical order. For example, in ANCOVA it makes sense to test the interaction first.
2) Stepwise regression: "tries out" various orders of entering and removing variables.

Stepwise regression
Enters or removes variables in order of significance, checking after each step whether the significance of the other variables has changed.
Enters them one by one: forward stepwise.
Enters all, then removes them one by one: backward stepwise.

Forward stepwise regression
Enter first the variable with the highest correlation with the y-variable (if p < p_enter).
Next, enter the variable that explains the most residual variation (if p < p_enter).
Remove variables that become insignificant (p > p_leave) because of other variables being added.
And so on…
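The procedure above can be sketched in code. This is a minimal toy implementation using partial F-tests (the function name, thresholds, and demo data are all invented; a statistics package would be used in practice):

```python
import numpy as np
from scipy import stats

def forward_stepwise(X, y, p_enter=0.05, p_leave=0.10):
    """Toy forward stepwise selection using partial F-tests.
    X: (n, k) candidate predictors; y: (n,) response.
    Returns the indices of the selected columns."""
    n, k = X.shape

    def sse(cols):
        # Residual sum of squares for a model with the given columns
        # plus an intercept.
        M = np.column_stack([X[:, cols], np.ones(n)]) if cols else np.ones((n, 1))
        beta, *_ = np.linalg.lstsq(M, y, rcond=None)
        resid = y - M @ beta
        return float(resid @ resid)

    def p_value(cols_without, cols_with):
        # Partial F-test: does the extra variable significantly
        # reduce the residual sum of squares?
        df_err = n - len(cols_with) - 1
        f = (sse(cols_without) - sse(cols_with)) / (sse(cols_with) / df_err)
        return stats.f.sf(f, 1, df_err)

    selected = []
    while True:
        # Enter the candidate with the smallest p-value, if below p_enter.
        candidates = [j for j in range(k) if j not in selected]
        pvals = {j: p_value(selected, selected + [j]) for j in candidates}
        if not pvals or min(pvals.values()) >= p_enter:
            break
        selected.append(min(pvals, key=pvals.get))
        # Remove any variable that other additions have made insignificant.
        for j in list(selected):
            rest = [c for c in selected if c != j]
            if p_value(rest, selected) > p_leave:
                selected.remove(j)
    return selected

# Hypothetical demo: y depends on columns 0 and 2; column 1 is pure noise.
rng = np.random.default_rng(42)
Xd = rng.normal(size=(100, 3))
yd = 3.0 * Xd[:, 0] + 2.0 * Xd[:, 2] + rng.normal(size=100)
sel = forward_stepwise(Xd, yd)
print(sorted(sel))
```

Note that p_leave is set above p_enter so a newly entered variable is not immediately removed again; this mirrors the usual convention in stepwise procedures.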