SADC Course in Statistics Multiple Linear Regression: Introduction (Session 06)



Slide 2: Learning Objectives
At the end of this session, you will be able to:
- interpret results from a regression model with more than one explanatory variable;
- understand the specific hypotheses being tested by the t-values associated with parameter estimates;
- appreciate what might be done with outliers identified via residual plots.

Slide 3: More than one explanatory variable
In real-life examples, the following types of question may be asked:
- What factors affect child mortality?
- Can household socio-economic characteristics be identified that relate closely to household poverty levels?
- Would provision of free fertiliser and packs of seed increase crop productivity and hence improve the livelihoods of farmers?
Addressing these leads to fitting multiple linear regression models.

Slide 4: Example with 3 explanatory variables
A random sample of 45 university students were asked to decide, personally, which of a set of 25 acts they would consider to be a crime. The number of acts selected was recorded as the variable named crimes. Data were also collected on each student's age, years in college and parents' income.
Question: Which of the three factors (if any) have an effect on students' views on what acts constitute a crime?
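A model of this kind can be fitted by ordinary least squares. The sketch below uses simulated data standing in for the crimes example (the variable names and all numbers are hypothetical, not the course dataset): three explanatory variables, one of which truly has no effect, fitted via a design matrix with an intercept column.

```python
import numpy as np

# Hypothetical illustration (not the course data): fit a multiple linear
# regression y = b0 + b1*x1 + b2*x2 + b3*x3 by ordinary least squares.
rng = np.random.default_rng(0)
n = 45
x1 = rng.uniform(18, 30, n)           # e.g. age
x2 = rng.uniform(1, 4, n)             # e.g. years in college
x3 = rng.uniform(10, 50, n)           # e.g. parents' income
# True relationship: x2 contributes nothing, as suspected for college years
y = 5 + 0.4 * x1 + 0.3 * x3 + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), x1, x2, x3])   # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)    # OLS estimates b0..b3
print(np.round(beta, 2))
```

With n = 45 observations the estimates land close to the true coefficients, and the estimate for x2 hovers near zero.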

Slide 5: Variables and scatter plots
y = number of acts regarded as being criminal
x1 = age
x2 = years in college
x3 = parents' income
Start with some scatter plots.
[Scatter plots of crimes against age, college years and parents' income]

Slide 6: Initial visual impression
Crimes appears most strongly associated with age and income, although for age this association is not linear – also one outlier? Crimes does not appear to be associated with years in college. We can test whether these observations are telling us something real about the relationships by using regression analysis procedures.

Slide 7: Aim
We aim for the simplest possible model, i.e. one with the fewest parameters that still adequately summarises the relationship between the response (here crimes) and one or more of the predictors (here age, college years and income), and gives information on which of the explanatory variables make a contribution to the variability in crimes.

Slide 8: Anova with all 3 variables

Source   | SS   df   MS   F   Prob
Model    |
Residual |
Total    |

Here the F-probability indicates there is strong evidence that at least one of the 3 explanatory variables contributes significantly to variability in crimes. The adjusted R² value is 77.6%.
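The quantities in an anova table of this form can be computed directly from a fitted model. A minimal sketch, again on simulated data rather than the slide's crimes dataset (so the SS, F and R² values below are illustrative only):

```python
import numpy as np

# Sketch of the anova decomposition for a multiple regression with an
# intercept and k = 3 explanatory variables, on made-up data.
rng = np.random.default_rng(1)
n, k = 45, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([2.0, 1.0, 0.0, 0.5]) + rng.normal(0, 1, n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
ss_total = np.sum((y - y.mean()) ** 2)            # Total SS
ss_resid = np.sum((y - fitted) ** 2)              # Residual SS
ss_model = np.sum((fitted - y.mean()) ** 2)       # Model SS

f_stat = (ss_model / k) / (ss_resid / (n - k - 1))    # overall F statistic
r2 = 1 - ss_resid / ss_total
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)         # adjusted R-squared
```

With an intercept in the model, Model SS and Residual SS add up to Total SS, and the adjusted R² is always a little below the raw R².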

Slide 9: Parameter estimates

crimes  | Coef.   Std. Err.   t   P>|t|
age     |
college |
income  |
const.  |

Hence the equation describing the fitted model is:
Crimes (y) = b0 + b1(age) + b2(college) + b3(income),
with the estimates b0, …, b3 taken from the Coef. column.
More generally, y_i = β0 + β1 x_1i + β2 x_2i + β3 x_3i + ε_i.
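The Std. Err. and t columns of such a table come from the usual OLS formulae: the residual variance estimate s² = RSS/(n − p), the coefficient covariance matrix s²(XᵀX)⁻¹, and t = coefficient / standard error. A sketch on simulated data (the design and coefficients are invented for illustration):

```python
import numpy as np

# Sketch: OLS coefficient standard errors and t-values, mirroring the
# layout of the parameter-estimates table. Data are simulated.
rng = np.random.default_rng(2)
n = 45
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 1.0, 0.0, -0.3]) + rng.normal(0, 1, n)

beta = np.linalg.solve(X.T @ X, X.T @ y)        # OLS estimates
resid = y - X @ beta
s2 = resid @ resid / (n - X.shape[1])           # residual variance estimate
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))  # standard errors
t_values = beta / se                             # one t-value per coefficient
```

Each t-value tests the hypothesis that the corresponding β is zero, given that the other variables remain in the model.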

Slide 10: Interpretation of t-probabilities

crimes  | Coef.   Std. Err.   t   P>|t|
age     |
college |
income  |
const.  |

Each t-probability indicates whether the corresponding variable contributes significantly to the model in the presence of the other two. Thus age, added to a model already including college and income, does not explain any additional amount of the variability in crimes.

Slide 11: Next steps… finding the best model
Since both age and college give non-significant p-values, should we drop both? Most definitely the answer is NO! At most, we drop one and look at the results. Dropping college gives the following:

crimes | Coef.   Std. Err.   t   P>|t|
age    |
income |
const. |

Slide 12: Meaning of regression coefficients

crimes | Coef.   Std. Err.   t   P>|t|
age    |
income |
const. |

Interpret the regression coefficient 0.43 for age (or 0.32 for income) as representing the change in crimes for a unit change in age (or income), provided the other variable in the model remains unchanged.
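This "unit change, other variables held fixed" reading can be checked numerically: increase one predictor by 1 while holding the other constant, and the prediction changes by exactly that coefficient. The constant 2.0 below is a made-up placeholder, but the age and income coefficients echo the slide's 0.43 and 0.32.

```python
import numpy as np

# Illustration of reading a regression coefficient. The intercept value
# is hypothetical; 0.43 and 0.32 are the slide's age and income estimates.
beta = np.array([2.0, 0.43, 0.32])               # const, age, income

def predict(age, income):
    return beta[0] + beta[1] * age + beta[2] * income

# +1 year of age with income held fixed shifts the prediction by the
# age coefficient, 0.43 (up to floating-point rounding)
change = predict(21, 30) - predict(20, 30)
```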

Slide 13: Final steps… residual plots
[A normal probability plot]
[Plot of residuals versus fitted values]
What do you conclude from these plots?

Slide 14: Conclusions
(a) The normality assumption is OK, but there is some doubt about variance homogeneity.
(b) If the assumptions are taken as OK, age and parents' income contribute significantly to explaining the variability in students' responses concerning the number of acts that constitute a crime.
(c) 78.0% of the variability in crimes was explained by age and income.
(d) The equation describing the relationship is:
Crimes = b0 + 0.43(age) + 0.32(income),
where b0 is the estimated constant from the age-and-income model.

Slide 15: Points to note
1. Although age appeared non-significant in the initial model with all 3 explanatory variables, dropping college gave a significant t-value for age. This emphasises that the interpretation of t-probabilities depends on which other variables are included in the model.
2. The graph of crimes versus age showed a quadratic relationship. Should we therefore consider including (age)² as an additional variable in the model?
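Point 1 is easy to reproduce with correlated predictors: when two explanatory variables carry much the same information, each t-value can look non-significant while both are in the model, yet become clearly significant once the other is dropped. A sketch on simulated data (none of these numbers come from the crimes example):

```python
import numpy as np

# Two nearly collinear predictors: x1 drives y, x2 is almost a copy of x1.
rng = np.random.default_rng(3)
n = 45
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(0, 0.05, n)      # highly correlated with x1
y = 1.0 + x1 + rng.normal(0, 1, n)

def t_values(X, y):
    """OLS t-values for each column of the design matrix X."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta / se

t_both = t_values(np.column_stack([np.ones(n), x1, x2]), y)  # x1 and x2 in
t_x1 = t_values(np.column_stack([np.ones(n), x1]), y)        # x2 dropped
```

The standard error of the x1 coefficient is inflated by the collinearity, so its t-value shrinks when x2 is present and recovers when x2 is removed.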

Slide 16: Results including age²

crimes | Coef.   Std. Err.   t   P>|t|
age    |
age²   |
income |
const. |

There is no evidence of an improvement from adding age-squared, so we return to the previous model, i.e. the model with age and income is still better.

Slide 17: A model with age + age²

crimes | Coef.   Std. Err.   t   P>|t|
age    |
age²   |
const. |

Without income, there is a significant quadratic relationship. Age alone explains only 4% of the variability in crimes, but including age² increases the adjusted R² to 26%. However, this is much lower than the R² for the model with age and income – our final choice!
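Comparing models on adjusted R², as done here, can be sketched as follows. The data are simulated with a genuinely curved relationship, so the quadratic model should beat the straight line; the specific coefficients are invented, not the crimes data.

```python
import numpy as np

# Simulated data with a quadratic relationship centred in the age range,
# so a straight line in age captures almost none of the signal.
rng = np.random.default_rng(4)
n = 45
age = rng.uniform(18, 30, n)
y = 0.1 * (age - 24) ** 2 + rng.normal(0, 1, n)

def adj_r2(X, y):
    """Adjusted R-squared of an OLS fit with design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1 - (1 - r2) * (len(y) - 1) / (len(y) - X.shape[1])

ones = np.ones(n)
linear = adj_r2(np.column_stack([ones, age]), y)             # age only
quadratic = adj_r2(np.column_stack([ones, age, age ** 2]), y)  # + age²
```

Because adjusted R² penalises each extra parameter, the quadratic model only wins when age² explains enough additional variability to pay for the extra term, which is the comparison the slide is making.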

Slide 18: Residual plots for the model with age + age²
Note the outlier in both plots! This is not the chosen final model, but if it were, we would need to consider what action to take with the outlier. With just one, it can be removed and reported separately.

Slide 19
Practical work follows to ensure the learning objectives are achieved.