Multiple Linear Regression

Presentation transcript:

Multiple Linear Regression. "Nothing explains everything." Laurens Holmes, Jr., Nemours/A.I.duPont Hospital for Children

What is MLR? Multiple Regression is a statistical method for estimating the relationship between a dependent variable and two or more independent (or predictor) variables.

Multiple Linear Regression Simply, MLR is a method for studying the relationship between a dependent variable and two or more independent variables. Purposes: prediction, explanation, and theory building.

Operation? Uses the ordinary least squares solution (as does simple linear or bi-variable regression). Describes the line for which the sum of squared differences between the predicted and the actual values of the dependent variable is at a minimum; that is, it represents the function that minimizes the sum of the squared errors. Ypred = a + b1X1 + b2X2 + … + bnXn
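As an illustration of this least squares operation (not from the original slides), here is a minimal Python sketch that estimates the intercept a and the weights b; the data and variable names are hypothetical:

```python
import numpy as np

# Hypothetical data: 50 cases, 2 predictors (X1, X2) and a dependent variable Y.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                   # predictor matrix (X1, X2)
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=50)

# Add a column of ones so the first coefficient is the intercept a.
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: minimizes the sum of squared errors (Y - Ypred)^2.
coef, _, _, _ = np.linalg.lstsq(X_design, y, rcond=None)
a, b1, b2 = coef
y_pred = X_design @ coef                       # Ypred = a + b1*X1 + b2*X2
print(f"a = {a:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```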

Operation? MLR produces a model that identifies the best weighted combination of independent variables to predict the dependent (or criterion) variable. Ypred = a + b1X1 + b2X2 + … + bnXn. MLR estimates the relative importance of several hypothesized predictors. MLR assesses the contribution of the combined variables to change in the dependent variable.

Design Requirements One dependent variable (criterion) Two or more independent variables (predictor or explanatory variables). Sample size: >= 50 (at least 10 times as many cases as independent variables)

Variations The total variation in Y partitions into variation predictable by the combination of independent variables and unpredictable variation. Predicted (explained) variance (SS regression); unpredicted (residual) variance (SS residual). SSreg/SSy = proportion of variation in Y predictable from the X's (the coefficient of determination, R2). SSres/SSy = proportion of variation in Y unpredictable from the X's (1 - R2). In our example: R = .41 and R2 = .161, i.e., 16% of the variability in academic achievement is accounted for by the weighted composite of the independent variables general and academic self-concept (the formula for R2 is computed first; R is then its square root).
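A short sketch (not from the slides) of how this partition is computed, reusing the least squares fit from the earlier example; the names are hypothetical:

```python
import numpy as np

def r_squared(y, y_pred):
    """Coefficient of determination from the sums-of-squares partition."""
    ss_total = np.sum((y - y.mean()) ** 2)     # total variation in Y (SSy)
    ss_resid = np.sum((y - y_pred) ** 2)       # unpredicted variance (SS residual)
    ss_reg = ss_total - ss_resid               # predicted variance (SS regression)
    return ss_reg / ss_total                   # R^2 = SSreg / SSy

# With y and y_pred from the earlier least squares sketch:
# print(f"R^2 = {r_squared(y, y_pred):.3f}")
```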

MLR Model: Basic Assumptions
Independence: the data of any particular subject are independent of the data of all other subjects.
Normality: in the population, the data on the dependent variable are normally distributed for each of the possible combinations of the levels of the X variables; each of the variables is normally distributed.
Homoscedasticity: in the population, the variances of the dependent variable for each of the possible combinations of the levels of the X variables are equal.
Linearity: in the population, the relation between the dependent variable and each independent variable is linear when all the other independent variables are held constant.

Simple vs. Multiple Regression
Multiple regression: one dependent variable Y predicted from a set of independent variables (X1, X2, …, Xk); one regression coefficient for each independent variable; R2 = proportion of variation in Y predictable from the set of X's.
Simple regression: one dependent variable Y predicted from one independent variable X; one regression coefficient; r2 = proportion of variation in Y predictable from X.

MLR Equation Ypred = a + b1X1 + b2X2 + … + bnXn (pred = predicted; 1, 2, …, n are subscripts). Ypred = the dependent variable, or the variable to be predicted. X = the independent or predictor variables. a = the constant or Y intercept included in raw-score equations, representing the value of Y when all X's = 0. b = the b weights, or partial regression coefficients; the b's show the relative contribution of their independent variable to the dependent variable when controlling for the effects of the other predictors.

Variables in the model? One approach is to perform a literature review and examine theory to identify potential predictors, thus building a "theoretical" variate, which may reflect the biologic or clinical relevance of the variables. This is sometimes referred to as the "standard" (simultaneous) regression method. A second approach is to examine statistics that show the effects of each variable both in and out of the equation; the "statistical" variate is built from the variables showing the largest effects (e.g., significant at the 0.25 level). These methods are sometimes called forward and backward stepwise regression.

MLR Output The following notions are essential for understanding MLR output: R2, adjusted R2, the constant, the b coefficients, beta, the F-test, and the t-test. For MLR, "R2" (the coefficient of multiple determination) is used rather than "r" (Pearson's correlation coefficient) to assess the strength of this more complex relationship (as compared to a bivariate correlation).

Adjusted R square and b coefficient The adjusted R2 adjusts for the inflation in R2 caused by the number of variables in the equation. As the sample size increases above 20 cases per variable, the adjustment is needed less (and vice versa). The b coefficient measures the amount of increase or decrease in the dependent variable for a one-unit difference in the independent variable, controlling for the other independent variable(s) in the equation.
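The slide does not print the adjustment formula; a common form, with n cases and k predictors, is shown here as a small Python helper (assume R2 has already been computed):

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for n cases and k predictors: penalizes added variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Example using R^2 = .161 from the slides, with hypothetical n and k:
# print(round(adjusted_r_squared(0.161, 50, 2), 3))
```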

b coefficient Ideally, the independent variables are uncorrelated. Consequently, controlling for one of them will not affect the relationship between the other independent variable and the dependent variable.

Intercorrelation or collinearity If the two independent variables are uncorrelated, we can uniquely partition the amount of variance in Y due to X1 and X2, and bias is avoided. Small intercorrelations between the independent variables will not greatly bias the b coefficients. However, large intercorrelations will bias the b coefficients, and for this reason other mathematical procedures are needed.

MLR Model Building Each predictor is taken in turn: all other predictors are first placed in the equation, and then the predictor of interest is entered. This allows us to determine the unique (additional) contribution of that predictor. By repeating the procedure for each predictor we can determine the unique contribution of each independent variable.

Different Ways of Building Regression Models
Simultaneous: all independent variables entered together.
Stepwise: independent variables entered according to some order, e.g., by size of correlation with the dependent variable, or in order of significance (a minimal sketch follows below).
Hierarchical: independent variables entered in stages.
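As an illustration of the stepwise idea (not from the slides), a minimal forward-selection sketch in Python; it greedily adds whichever remaining predictor most improves R2, a simplified stand-in for significance-based entry:

```python
import numpy as np

def fit_r2(X, y, cols):
    """R^2 of an OLS fit using the predictor columns in `cols`."""
    Xd = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, _, _, _ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ coef
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

def forward_select(X, y, min_gain=0.01):
    """Greedy forward selection: add predictors while R^2 improves enough."""
    chosen, remaining, best_r2 = [], list(range(X.shape[1])), 0.0
    while remaining:
        r2, j = max((fit_r2(X, y, chosen + [j]), j) for j in remaining)
        if r2 - best_r2 < min_gain:
            break                      # no remaining predictor helps enough
        chosen.append(j)
        remaining.remove(j)
        best_r2 = r2
    return chosen, best_r2
```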

Various Significance Tests
Testing R2: test R2 through an F test; test competing models through an F test of the difference between their R2s.
Testing b: test each partial regression coefficient (b) by t-test; compare partial regression coefficients with each other by a t-test of the difference between standardized partial regression coefficients (β). The usual forms of these statistics are sketched below.
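A hedged reconstruction, since the slide gives only the names of the tests: with n cases and k predictors, and a nested comparison in which the full model adds m predictors to a reduced model, the standard statistics are

```latex
F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)},
\qquad
F_{\text{change}} = \frac{\left(R^2_{\text{full}} - R^2_{\text{reduced}}\right)/m}
                         {\left(1 - R^2_{\text{full}}\right)/(n - k_{\text{full}} - 1)},
\qquad
t_j = \frac{b_j}{SE(b_j)}
```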

F and t tests The F-test is used as a general indicator of the probability that any of the predictor variables contribute to the variance in the dependent variable within the population. The null hypothesis is that the predictors' weights are all effectively equal to zero, implying that none of the predictors contributes to the variance in the dependent variable in the population.

F and t tests t-tests are used to test the significance of each predictor in the equation. The null hypothesis is that a predictor’s weight is effectively equal to zero when the effects of the other predictors are taken into account. That is, it does not contribute to the variance in the dependent variable within the population.

R Square When comparing the R2 of an original set of variables to the R2 after additional variables have been included, the researcher can identify the unique variation explained by the additional set of variables. Any covariation between the original set of variables and the new variables will be attributed to the original variables. R2 (multiple correlation squared): variation in Y accounted for by the set of predictors. Adjusted R2: sampling variation can only inflate R2; the adjustment takes the sample size and the number of predictors into account to give a better estimate of the population value. R2 is similar to η2 but will be a little smaller, because R2 captures only the linear relationship while η2 also accounts for non-linear relationships.

Vignette Suppose we wish to examine the factors that predict the length of hospitalization following spinal surgery in children with CP (dependent continuous variable). The available variables in the dataset are hematocrit, estimated blood loss, cell saver, operating time, age at surgery, and packed red blood cells. If the dependent and independent variables are measured on a continuous scale, what will be an appropriate test statistic? Select appropriate variables (theory-based and statistical approaches), and determine the effect of estimated blood loss while controlling for hematocrit, packed red blood cells, age at surgery, cell saver, and operating time (duration of surgery).
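One way to set this analysis up outside SPSS (an illustration, not the slides' method) is the statsmodels formula interface in Python; the CSV file and column names below are hypothetical stand-ins for the vignette's variables:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical dataset: one row per child, columns named after the vignette.
df = pd.read_csv("spinal_surgery.csv")  # assumed file

# MLR: length of stay predicted from EBL, controlling for the other variables.
model = smf.ols(
    "length_of_stay ~ estimated_blood_loss + hematocrit + packed_rbc"
    " + age_at_surgery + cell_saver + operating_time",
    data=df,
).fit()
print(model.summary())   # b coefficients, t-tests, F-test, R^2, adjusted R^2
```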

SPSS: 1) Analyze, 2) Regression, 3) Linear

SPSS Screen

SPSS Output: interpret the coefficients.

SPSS Output: interpret the R square. What does the ANOVA result mean?

Repeated Measures Analysis of Variance (RM ANOVA), univariable (univariate). RM ANOVA removes variability in the baseline prognostic factor – an ideal model!

Repeated Measures ANOVA
Between-subjects design: an ANOVA in which each participant takes part in one of, for example, three treatment groups.
Within-subjects or repeated measures design: participants receive one treatment, and the outcome of the treatment is measured at several time points, for example three (before treatment, immediately after, and 6 months after treatment).

RM ANOVA vs. Paired T-Test Repeated measures ANOVA, also known as within-subjects ANOVA, is an extension of the paired t-test. Like the t-test, repeated measures ANOVA gives us the statistical tools to determine whether or not change has occurred over time. The paired t-test compares average scores at two different time periods for a single group of subjects; repeated measures ANOVA compares average scores at multiple time periods for a single group of subjects.

RM ANOVA: Understanding the terms & analysis interpretation The first step in a repeated measures ANOVA is to combine the data from the multiple time periods into a single time factor for analysis. The different time periods are analogous to the categories of the independent variable in a one-way analysis of variance. The time factor is then tested to see whether the mean of the dependent variable differs across categories of the time factor. If the time factor is statistically significant in the ANOVA test, Bonferroni pairwise comparisons are computed to identify specific differences between time periods.

RM ANOVA: Understanding the terms & analysis interpretation When the dependent variable is measured at three time periods, there are three paired comparisons: time 1 (preoperative or before-treatment measure) versus time 2 (immediately after surgery/treatment); time 2 versus time 3 (follow-up postoperative measure); time 1 versus time 3.
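A sketch of this analysis outside SPSS (an assumption, not the slides' procedure), using statsmodels' AnovaRM for the omnibus test and Bonferroni-corrected paired t-tests for the follow-up comparisons; the long-format DataFrame and its column names are hypothetical:

```python
import pandas as pd
from itertools import combinations
from scipy import stats
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format data: one row per subject per time point.
df = pd.read_csv("cobb_angles_long.csv")  # assumed columns: subject, time, cobb

# Omnibus RM ANOVA: does the mean Cobb angle differ across time points?
print(AnovaRM(df, depvar="cobb", subject="subject", within=["time"]).fit())

# Bonferroni pairwise comparisons between the three time points.
times = sorted(df["time"].unique())
pairs = list(combinations(times, 2))
for t1, t2 in pairs:
    a = df[df["time"] == t1].sort_values("subject")["cobb"].to_numpy()
    b = df[df["time"] == t2].sort_values("subject")["cobb"].to_numpy()
    t, p = stats.ttest_rel(a, b)               # paired t-test
    print(t1, "vs", t2, "p(Bonferroni) =", min(p * len(pairs), 1.0))
```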

Statistical Assumptions of RM ANOVA
Independence
Normality
Homogeneity of within-treatment variances
Sphericity
RM ANOVA is ideal for testing hypotheses about treatment effectiveness when ethical constraints restrict the use of control subjects.

Homogeneity of Variance In one-way ANOVA, we expect the variances to be equal. We also expect that the samples are not related to one another (so no covariance or correlation).

Sphericity and Compound Symmetry An extension of the homogeneity of variance assumption. Compound symmetry is stricter than sphericity (but may be easier to explain): all variances are equal to each other, and all covariances are equal to each other.

Sphericity and Compound Symmetry If we meet the assumption of compound symmetry, then we meet the assumption of sphericity. Sphericity is less strict and is the only assumption we need to meet for RM ANOVA. Sphericity requires that the variances of the difference scores be equal: the variance of the difference scores between time 1 and time 2 equals the variance of the difference scores between time 2 and time 3.
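A quick informal check of this condition (not from the slides): compute the variance of each pair of difference scores. The scores below are hypothetical, and a formal test (e.g., Mauchly's, which SPSS reports) is what one would rely on in practice:

```python
import numpy as np

# Hypothetical wide-format scores: one row per subject, one column per time point.
scores = np.array([[52.0, 20.0, 24.0],
                   [61.0, 25.0, 31.0],
                   [48.0, 18.0, 22.0],
                   [55.0, 22.0, 27.0]])

# Variance of the difference scores for each pair of time points.
for i, j in [(0, 1), (1, 2), (0, 2)]:
    diff = scores[:, i] - scores[:, j]
    print(f"var(time {i + 1} - time {j + 1}) = {diff.var(ddof=1):.2f}")
# Roughly equal variances are consistent with sphericity.
```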

Sphericity Assumption Violations A more conservative method of evaluating the significance of the obtained F is needed. Greenhouse-Geisser (1958) correction: gives the appropriate critical value for the worst situation, in which the assumptions are maximally violated. Huynh-Feldt correction: the Huynh-Feldt epsilon is an attempt to correct the Greenhouse-Geisser epsilon, which tends to be overly conservative, especially for small sample sizes.

Sample Table for RM ANOVA

RM ANOVA All participants take part in all treatment conditions, e.g., surgery for spinal deformity correction. The participant emerges as an independent source of variance, which in RM ANOVA can be partitioned out rather than left in the error term. The other sources of variance include the repeated measures treatment and the participant × treatment interaction.

RM ANOVA Equation
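The equation itself did not survive the transcript. Assuming the slide showed the standard one-way RM ANOVA partition, with n subjects and k repeated measurements, it would be:

```latex
SS_{\text{total}} = SS_{\text{subjects}} + SS_{\text{treatment}} + SS_{\text{error}},
\qquad
F = \frac{MS_{\text{treatment}}}{MS_{\text{error}}}
  = \frac{SS_{\text{treatment}} / (k - 1)}
         {SS_{\text{error}} / \big((k - 1)(n - 1)\big)}
```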

Vignette Suppose a spinal fusion was performed to correct spinal deformities in Adolescent Idiopathic Scoliosis (AIS). If the main Cobb angle was measured preoperatively, immediately after surgery (first erect), and during two years of follow-up, was the surgical procedure effective in correcting the curve deformity and maintaining correction after two years of follow-up? Hint: a correction loss > 10 degrees is indicative of a clinically significant loss of correction.

Sample variables at preoperative, immediate postoperative, and 2-year follow-up: check the normality assumption of the Cobb angle at the three measurement points.

In SPSS, select Analyze, then General Linear Model (GLM), then Repeated Measures (RM).

From the variables box, select the 1st, 2nd, and 3rd measurement points during the study period. Then click the Options box and select Descriptive statistics and the Bonferroni multiple comparison.

SPSS Output Observe the time means and their SDs. Observe the significance of the sphericity test for the variances.

SPSS Output: report the Greenhouse-Geisser result.

SPSS Output
