Topic 13: Multiple Linear Regression Example

Outline Description of example Descriptive summaries Investigation of various models Conclusions

Study of CS students Too many computer science majors at Purdue were dropping out of the program. Wanted to find predictors of success to be used in the admissions process. Predictors must be available at time of entry into the program.

Data available GPA after three semesters Overall high school math grade Overall high school science grade Overall high school English grade SAT Math SAT Verbal Gender (of interest for other reasons)

Data for CS Example Y is the student’s grade point average (GPA) after 3 semesters 3 HS grades and 2 SAT scores are the explanatory variables (p=6) Have n=224 students

Descriptive Statistics
data a1;
  infile 'C:\...\csdata.dat';
  input id gpa hsm hss hse satm satv genderm1;
run;
proc means data=a1 maxdec=2;
  var gpa hsm hss hse satm satv;
run;

Output from Proc Means
[Summary table: Variable, N, Mean, Std Dev, Minimum, Maximum for gpa, hsm, hss, hse, satm, satv]

Descriptive Statistics
proc univariate data=a1;
  var gpa hsm hss hse satm satv;
  histogram gpa hsm hss hse satm satv / normal;
run;

Correlations
proc corr data=a1;
  var hsm hss hse satm satv;
run;
proc corr data=a1;
  var hsm hss hse satm satv;
  with gpa;
run;

Output from Proc Corr
[Pearson correlation coefficients, N = 224, Prob > |r| under H0: Rho=0, for all pairs of gpa, hsm, hss, hse, satm, satv]

Output from Proc Corr
[Pearson correlations of gpa with hsm, hss, hse, satm, satv; N = 224, Prob > |r| under H0: Rho=0]
All but SATV significantly correlated with GPA
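The "Prob > |r| under H0: Rho=0" p-values that PROC CORR reports come from a t test on the correlation itself. A minimal Python sketch, assuming a hypothetical correlation of 0.4 (the function name and the r value are illustrative; only n = 224 comes from the study):

```python
import math

def corr_t_stat(r, n):
    """t statistic for testing H0: rho = 0, with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Hypothetical r = 0.4 with the n = 224 students in this study
t = corr_t_stat(0.4, 224)
print(round(t, 2))
```

With n this large, even modest correlations give large t statistics, which is consistent with nearly every pairwise correlation in the output being significant.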

Scatter Plot Matrix
proc corr data=a1 plots=matrix;
  var gpa hsm hss hse satm satv;
run;
Allows visual check of pairwise relationships

No “strong” linear relationships. Can see the discreteness of the high school scores.

Use high school grades to predict GPA (Model #1)
proc reg data=a1;
  model gpa=hsm hss hse;
run;

Results Model #1
[Parameter estimates: Variable, DF, Parameter Estimate, Standard Error, t Value, Pr > |t| for Intercept, hsm, hss, hse; hsm significant at <.0001. Fit statistics: Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var]
Meaningful??

ANOVA Table #1
[Analysis of variance: Source (Model, Error, Corrected Total), DF, Sum of Squares, Mean Square, F Value, Pr > F; model significant at <.0001]
Significant F test but not all variable t tests significant
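The F value in an ANOVA table like this is the model mean square divided by the error mean square. A minimal Python sketch, using hypothetical sums of squares for illustration (only p = 3 predictors and n = 224 come from the example):

```python
def anova_f(ssm, sse, p, n):
    """Overall F statistic: MSM / MSE, with model df = p
    and error df = n - p - 1."""
    msm = ssm / p                # model mean square
    mse = sse / (n - p - 1)     # error mean square
    return msm / mse

# Hypothetical SSM = 28, SSE = 108 with p = 3, n = 224
print(round(anova_f(28, 108, 3, 224), 2))
```

A large F says the three predictors jointly explain real variation in GPA, even though the individual t tests may disagree about which ones matter.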

Remove HSS (Model #2)
proc reg data=a1;
  model gpa=hsm hse;
run;

Results Model #2
[Parameter estimates: Variable, DF, Parameter Estimate, Standard Error, t Value, Pr > |t| for Intercept, hsm, hse; hsm significant at <.0001. Fit statistics: Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var]
Slightly better MSE and adjusted R-Sq
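The adjusted R-square used to compare Models #1–#3 penalizes R-square for the number of predictors. A minimal Python sketch with a hypothetical R-square of 0.20 (only n = 224 comes from the example):

```python
def adj_r2(r2, n, p):
    """Adjusted R-square: 1 - (1 - R^2)(n - 1)/(n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same hypothetical R-square of 0.20 achieved with 2 vs. 3 predictors
print(round(adj_r2(0.20, 224, 2), 4))  # two-predictor model
print(round(adj_r2(0.20, 224, 3), 4))  # three-predictor model
```

Dropping a predictor that contributes essentially nothing to R-square raises the adjusted value, which is why Model #2 can beat Model #1 on this criterion.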

ANOVA Table #2
[Analysis of variance: Source (Model, Error, Corrected Total), DF, Sum of Squares, Mean Square, F Value, Pr > F; model significant at <.0001]
Significant F test but not all variable t tests significant

Rerun with HSM only (Model #3)
proc reg data=a1;
  model gpa=hsm;
run;

Results Model #3
[Parameter estimates: Variable, DF, Parameter Estimate, Standard Error, t Value, Pr > |t| for Intercept, hsm; hsm significant at <.0001. Fit statistics: Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var]
Slightly worse MSE and adjusted R-Sq

ANOVA Table #3
[Analysis of variance: Source (Model, Error, Corrected Total), DF, Sum of Squares, Mean Square, F Value, Pr > F; model significant at <.0001]
Significant F test and all variable t tests significant

SATs (Model #4)
proc reg data=a1;
  model gpa=satm satv;
run;

Results Model #4
[Parameter estimates: Variable, DF, Parameter Estimate, Standard Error, t Value, Pr > |t| for Intercept, satm, satv. Fit statistics: Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var]
Much worse MSE and adjusted R-Sq

ANOVA Table #4
[Analysis of variance: Source (Model, Error, Corrected Total), DF, Sum of Squares, Mean Square, F Value, Pr > F]
Significant F test but not all variable t tests significant

HS and SATs (Model #5)
proc reg data=a1;
  model gpa=satm satv hsm hss hse;
  * Does general linear test;
  sat: test satm, satv;
  hs: test hsm, hss, hse;
run;

Results Model #5
[Parameter estimates: Variable, DF, Parameter Estimate, Standard Error, t Value, Pr > |t| for Intercept, hsm, hss, hse, satm, satv. Fit statistics: Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var]

Test sat
[Test sat results for dependent variable gpa: Source (Numerator, Denominator), DF, Mean Square, F Value, Pr > F]
Cannot reject the reduced model: no significant information is lost, so we don't need the SAT variables.

Test hs
[Test hs results for dependent variable gpa: Source (Numerator, Denominator), DF, Mean Square, F Value, Pr > F; numerator significant at <.0001]
Reject the reduced model: significant information would be lost, so we can't remove the HS variables from the model.
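The sat and hs TEST statements above are general linear (partial F) tests built on the extra sum of squares. A minimal Python sketch with hypothetical error sums of squares, shown for the sat test (only q = 2 dropped SAT variables and full-model error df = 224 - 5 - 1 = 218 come from the example):

```python
def partial_f(sse_reduced, sse_full, q, df_full):
    """General linear test F statistic: extra sum of squares per
    dropped predictor over the full model's error mean square."""
    numerator = (sse_reduced - sse_full) / q   # q = predictors dropped
    denominator = sse_full / df_full           # df_full = n - p - 1
    return numerator / denominator

# Hypothetical: SSE barely changes when the SAT variables are dropped
print(round(partial_f(108.0, 107.0, 2, 218), 2))
```

When dropping the variables barely increases SSE, the F statistic is near 1 and the reduced model cannot be rejected, matching the conclusion for the SAT test.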

Best Model? Likely the one with just HSM or the one with HSE and HSM. We’ll discuss comparison methods in Chapters 7 and 8

Key ideas from case study First, look at graphical and numerical summaries one variable at a time Then, look at relationships between pairs of variables with graphical and numerical summaries. Use plots and correlations to understand relationships

Key ideas from case study The relationship between a response variable and an explanatory variable depends on what other explanatory variables are in the model A variable can be a significant (P < 0.05) predictor alone and not significant (P > 0.5) when other X's are in the model

Key ideas from case study Regression coefficients, standard errors and the results of significance tests depend on what other explanatory variables are in the model

Key ideas from case study Significance tests (P values) do not tell the whole story Squared multiple correlations (the proportion of variation in the response variable explained by the explanatory variables) can give a different view We often express R 2 as a percent

Key ideas from case study You can fully understand the theory in terms of Y = Xβ + e However, to effectively use this methodology in practice you need to understand how the data were collected, the nature of the variables, and how they relate to each other
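The model Y = Xβ + e reduces, for a single predictor, to closed-form least-squares estimates. A minimal pure-Python sketch on toy data (the data and function name are illustrative, not from the CS example):

```python
def ols_simple(x, y):
    """Least-squares fit of y = b0 + b1*x, the one-predictor
    special case of Y = X beta + e."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx               # slope
    b0 = ybar - b1 * xbar        # intercept
    return b0, b1

# Toy data: a GPA-like response against a single grade predictor
b0, b1 = ols_simple([2, 4, 6, 8], [1.0, 2.0, 2.0, 3.0])
```

The same normal-equations idea, in matrix form, is what PROC REG solves with several predictors at once.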

Background Reading cs2.sas contains the SAS commands used in this topic