Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan Multiple Regression SECTIONS 10.1, 10.3 Multiple explanatory variables (10.1,

Presentation transcript:

Statistics: Unlocking the Power of Data Lock 5 STAT 250 Dr. Kari Lock Morgan Multiple Regression SECTIONS 10.1, 10.3 Multiple explanatory variables (10.1, 10.3)

More than 2 variables!
Today we’ll finally learn a way to handle more than 2 variables!

Question of the Day
How can we predict body fat percentage from easy measurements?

Multiple Regression
Multiple regression extends simple linear regression to include multiple explanatory variables:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$
- Each $x_i$ is a different explanatory variable
- $k$ is the number of explanatory variables
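
As a rough illustration of what fitting a model with several explanatory variables involves, the sketch below estimates the coefficients by least squares using only the Python standard library. The helper name `fit_ols`, the toy data, and the generating coefficients (2, 3, -1) are assumptions made up for this example; none of them come from the lecture.

```python
def fit_ols(rows, y):
    """Return least-squares coefficients [b0, b1, ..., bk] (hypothetical helper)."""
    X = [[1.0] + list(r) for r in rows]  # prepend a column of 1s for the intercept
    p = len(X[0])
    # Augmented normal equations [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Toy data with k = 2 explanatory variables (made-up values)
rows = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8), (6, 4)]
# Response generated exactly from y = 2 + 3*x1 - 1*x2, with no noise
y = [2 + 3 * x1 - 1 * x2 for x1, x2 in rows]

b = fit_ols(rows, y)  # [intercept, slope for x1, slope for x2]
print([round(v, 6) for v in b])
```

Because the toy response has no noise, the fitted coefficients should match the generating ones up to rounding. In practice a statistics package would be used instead of a hand-written solver.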

Predicting Body Fat Percentage
- The percentage of a person’s weight that is made up of body fat is often used as an indicator of health and fitness
- Accurate measures of percent body fat are difficult to implement
- For example, you can immerse the body in water to estimate density, then apply a formula
- Another option: build a model predicting % body fat based on easy-to-obtain measurements

Body Fat Data
Measurements were collected on 100 men
Response variable: percent body fat
Explanatory variables:
- Age (in years)
- Weight (in pounds)
- Height (in inches)
- Neck circumference (in cm)
- Chest circumference (in cm)
- Abdomen circumference (in cm)
- Ankle circumference (in cm)
- Biceps circumference (in cm)
- Wrist circumference (in cm)
A sample taken from data provided by Johnson R., "Fitting Percentage of Body Fat to Simple Body Measurements," Journal of Statistics Education, 1996,

Predicting Percent Body Fat
We’ll start with just three explanatory variables and fit the model:
$\widehat{\text{Bodyfat}} = b_0 + b_1\,\text{Age} + b_2\,\text{Weight} + b_3\,\text{Height}$
What can we do with this?
- Make predictions
- Interpret coefficients
- Inference
- Interpret R²
- and more!

Making Predictions
$\widehat{\text{Bodyfat}} = b_0 + b_1\,\text{Age} + b_2\,\text{Weight} + b_3\,\text{Height}$
If you are male, you can use this to predict your percent body fat!
Age: years, weight: pounds, height: inches
(Females can try too, just for practice, but it won’t be accurate)
Any volunteers?

Percent Body Fat

Interpreting Coefficients
$\widehat{\text{Bodyfat}} = b_0 + b_1\,\text{Age} + b_2\,\text{Weight} + b_3\,\text{Height}$
Intercept ($b_0$): a man who is 0 years old, weighs 0 lbs, and is 0 inches tall would have a predicted 49.6% body fat
Slopes:
- Keeping weight and height constant, predicted percent body fat increases by $b_1$ for every additional year
- Keeping age and height constant, predicted percent body fat increases by $b_2$ for every additional pound
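
The "keeping the other variables constant" reading of a slope can be checked numerically. In the sketch below, the intercept 49.6 is the value quoted on this slide, but the three slopes are hypothetical placeholders chosen only so the code runs; they are not the fitted values from the lecture. The function name `predict_bodyfat` is also an assumption.

```python
# Intercept 49.6 is the value quoted on the slide; the three slopes are
# HYPOTHETICAL placeholders, not the fitted values from the lecture.
B0 = 49.6
B_AGE, B_WEIGHT, B_HEIGHT = 0.1, 0.2, -1.0

def predict_bodyfat(age, weight, height):
    """Predicted percent body fat from age (years), weight (lb), height (in)."""
    return B0 + B_AGE * age + B_WEIGHT * weight + B_HEIGHT * height

base = predict_bodyfat(40, 180, 70)
one_year_older = predict_bodyfat(41, 180, 70)   # weight and height held constant
one_inch_taller = predict_bodyfat(40, 180, 71)  # age and weight held constant

# Each difference equals the corresponding slope, whatever baseline is used
age_effect = one_year_older - base
height_effect = one_inch_taller - base
print(round(age_effect, 6), round(height_effect, 6))
```

Changing the baseline person (say, 25 years old instead of 40) leaves both differences unchanged, which is exactly what a constant slope in a linear model means.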

Interpreting Coefficients
$\widehat{\text{Bodyfat}} = b_0 + b_1\,\text{Age} + b_2\,\text{Weight} + b_3\,\text{Height}$
Which of the following is a correct interpretation?
a) Keeping age and weight constant, height decreases by $|b_3|$ for every additional percent of body fat
b) Keeping age and weight constant, percent body fat decreases by $|b_3|$ for every additional inch
c) Predicted body fat decreases by $|b_3|$ for every additional inch

Inference
Are our explanatory variables significant predictors?
- All of the p-values corresponding to the explanatory variables are very small
- Age, weight, and height are all significant predictors of percent body fat (given the other variables in the model)

Explaining Variability
How much of the variability in percent body fat is explained by this model? Which of the following would tell us this?
a) p-value
b) correlation
c) slope coefficients
d) R²
e) confidence interval
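
For reference, the proportion of variability in a response that a least-squares model explains is conventionally computed as R² = 1 - SSE/SST. The sketch below shows only the arithmetic; the function name `r_squared` and the observed and fitted values are made up for illustration and are not the body fat data.

```python
def r_squared(y, fitted):
    """R^2 = 1 - SSE/SST: share of the variability in y explained by the model."""
    mean_y = sum(y) / len(y)
    sst = sum((yi - mean_y) ** 2 for yi in y)                # total variability
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained variability
    return 1 - sse / sst

# Made-up observed responses and model predictions
observed = [10, 20, 30, 40]
fitted = [12, 18, 33, 37]

r2 = r_squared(observed, fitted)
print(round(r2, 3))  # 0.948, since SSE = 26 and SST = 500
```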

Comparing with BMI
- BMI is used more commonly than percent body fat because it is easy to calculate
- Currently, our predicted percent body fat is not using much more information than BMI (just age as an extra predictor)
- What’s wrong with body mass index (BMI) as an indicator of health and fitness?
- How might we improve our model to fix this problem?

New Model
$\widehat{\text{Bodyfat}} = a_0 + a_1\,\text{Age} + a_2\,\text{Weight} + a_3\,\text{Height} + a_4\,\text{Abdomen}$
Anything look odd about this equation?
Model without Abdomen:
$\widehat{\text{Bodyfat}} = b_0 + b_1\,\text{Age} + b_2\,\text{Weight} + b_3\,\text{Height}$

Significance
Which explanatory variable(s) are significant?
a) All of them: age, weight, height, abdomen
b) Weight and height
c) Weight, height, abdomen
d) Weight and abdomen
e) Abdomen only

Multiple Regression
- The coefficient for each explanatory variable is the predicted change in y for one unit change in x, given the other explanatory variables in the model!
- The p-value for each coefficient indicates whether it is a significant predictor of y, given the other explanatory variables in the model!
- If explanatory variables are associated with each other, coefficients and p-values will change depending on what else is included in the model
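
The last point, that coefficients change depending on what else is in the model, can be seen in a small constructed example. Everything below is an assumption for illustration: the helper `fit_ols`, the made-up `abdomen` and `weight` values, and a response that by construction depends on abdomen only. It is not the lecture's data.

```python
def fit_ols(rows, y):
    """Return least-squares coefficients [b0, b1, ..., bk] (hypothetical helper)."""
    X = [[1.0] + list(r) for r in rows]  # prepend a column of 1s for the intercept
    p = len(X[0])
    # Augmented normal equations [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Made-up data: weight is strongly associated with abdomen
abdomen = [80, 84, 88, 92, 96, 100, 104, 108]
wiggle = [3, -2, 4, -5, 1, -1, 2, -2]
weight = [2 * a + d for a, d in zip(abdomen, wiggle)]
# Response depends on abdomen only (by construction)
bodyfat = [0.5 * a - 20 for a in abdomen]

weight_only = fit_ols([(w,) for w in weight], bodyfat)
both = fit_ols(list(zip(weight, abdomen)), bodyfat)

print("weight slope, weight-only model:", round(weight_only[1], 3))
print("weight slope, abdomen included:", round(both[1], 3))
```

Alone, weight gets a clearly positive slope because it stands in for abdomen. Once abdomen is included, the weight slope drops to essentially zero, even though the data did not change.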

Full Model

Which explanatory variable(s) are significant?
a) All of them
b) Weight and abdomen
c) Neck only
d) Abdomen and wrist

Insignificant Terms
What should we do with the insignificant variables?
- Keep them in the model?
- Take them out of the model?
Deciding which variables to keep in the model (variable selection) is an entire subfield of statistics, and beyond the scope of this class
Want to learn more about it? Take STAT 462!

Electricity and Life Expectancy
Cases: countries of the world
Response variable: life expectancy
Explanatory variable: electricity use (kWh per capita)
Is a country’s electricity use helpful in predicting life expectancy?

Electricity and Life Expectancy
If we increased electricity use in a country, would life expectancy increase?
(a) Yes
(b) No
(c) Impossible to tell

Confounding Variables
- Wealth is an obvious confounding variable that could explain the relationship between electricity use and life expectancy
- Multiple regression is a powerful tool that allows us to account for confounding variables
- We can see whether an explanatory variable is still significant, even after including potential confounding variables in the model
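
One way to see what "accounting for" a confounding variable means: the adjusted slope from the multiple regression is the same slope you get after first removing the confounder's linear effect from both the explanatory variable and the response. The sketch below illustrates this on made-up numbers in which wealth drives both electricity use and life expectancy and electricity has no direct effect. The helpers `fit_ols` and `residuals` and all data values are assumptions for illustration, not the country data from the lecture.

```python
def fit_ols(rows, y):
    """Return least-squares coefficients [b0, b1, ..., bk] (hypothetical helper)."""
    X = [[1.0] + list(r) for r in rows]  # prepend a column of 1s for the intercept
    p = len(X[0])
    # Augmented normal equations [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def residuals(x, y):
    """Residuals from the simple regression of y on x (with intercept)."""
    b0, b1 = fit_ols([(xi,) for xi in x], y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Made-up data: wealth drives both variables; electricity has no direct effect
wealth = [1, 2, 3, 4, 5, 6, 7, 8]
elec_noise = [1, -1, 2, -2, 1, -1, -1, 1]
life_noise = [0.5, -0.5, -0.5, 0.5, 0.2, -0.2, 0.3, -0.3]
electricity = [2 * w + e for w, e in zip(wealth, elec_noise)]
life = [50 + 3 * w + n for w, n in zip(wealth, life_noise)]

unadjusted = fit_ols([(e,) for e in electricity], life)[1]
adjusted = fit_ols(list(zip(electricity, wealth)), life)[1]

# Same adjusted slope, obtained by removing wealth from both variables first
partial = fit_ols([(r,) for r in residuals(wealth, electricity)],
                  residuals(wealth, life))[1]

print(round(unadjusted, 3), round(adjusted, 3), round(partial, 3))
```

In this constructed example the unadjusted electricity slope is large, while the adjusted slope is close to zero and matches the residual-on-residual slope. Real data will rarely be this clean, and a near-zero adjusted slope still needs a p-value before calling the variable insignificant.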

Electricity and Life Expectancy
Is a country’s electricity use helpful in predicting life expectancy, even after including GDP in the model?
(a) Yes
(b) No

Cell Phones and Life Expectancy
Cases: countries of the world
Response variable: life expectancy
Explanatory variable: number of mobile cellular subscriptions per 100 people
Is a country’s cell phone subscription rate helpful in predicting life expectancy?

Cell Phones and Life Expectancy
If we gave everyone in a country a cell phone and a cell phone subscription, would life expectancy in that country increase?
(a) Yes
(b) No
(c) Impossible to tell

Cell Phones and Life Expectancy
Is a country’s cell phone subscription rate helpful in predicting life expectancy, even after including GDP in the model?
(a) Yes
(b) No

Cell Phones and Life Expectancy
- This says that wealth alone cannot explain the association between cell phone subscriptions and life expectancy
- This suggests that either cell phones actually do something to increase life expectancy (causal) OR there is another confounding variable besides the wealth of the country

Confounding Variables
- Multiple regression is one potential way to account for confounding variables
- It is the approach most commonly used in practice across a wide variety of fields, but it is quite sensitive to the conditions for the linear model (particularly linearity)
- You can only “rule out” confounding variables that you have data on, so it is still very hard to make true causal conclusions without a randomized experiment

To Do
- Read 10.1
- Do HW 10 (due Friday, 12/11)