Review for Final Exam: Some important themes from Chapters 9-11

The final exam covers these chapters, but it implicitly tests the entire course, because we use sampling distributions, confidence intervals, significance tests, etc. As usual, the exam is a mixture of true/false questions to test concepts and problems (like the examples in class and homework), with emphasis on interpreting software output.

Chap. 9: Linear Regression and Correlation
Data: y – a quantitative response variable; x – a quantitative explanatory variable
We consider:
Is there an association? (test of independence using the slope)
How strong is the association? (uses the correlation r and r²)
How can we predict y using x? (estimate a regression equation)
The linear regression equation E(y) = α + βx describes how the mean of the conditional distribution of y changes as x changes.
Least squares estimates this equation and provides a sample prediction equation.
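The slides refer generically to software output. As a rough illustration (not the course's own software), here is a minimal Python sketch using statsmodels with invented data, showing where the sample prediction equation and r² appear:

```python
# Minimal sketch, assuming statsmodels is available; the x and y values are invented.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6],               # hypothetical explanatory values
                   "y": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]})  # hypothetical responses

fit = smf.ols("y ~ x", data=df).fit()  # least squares fit of E(y) = alpha + beta*x
print(fit.params)                      # sample intercept a and slope b
print(fit.rsquared)                    # r^2, the proportional reduction in error
```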

The linear regression equation E(y) = α + βx is part of a model. The model has another parameter, σ, that describes the variability of the conditional distributions; that is, the variability of y-values for all subjects having the same x-value.
For an observation, the difference between the observed value of y and the predicted value of y is a residual (a vertical distance on the scatterplot).
The least squares method minimizes the sum of squared residuals (errors), which is the SSE used also in r² and in the estimate s of the conditional standard deviation of y.
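A small numpy sketch of the same ideas, again with invented data; it computes the residuals, SSE, and the estimate s = sqrt(SSE/(n – 2)) directly:

```python
# Sketch with made-up data: least squares fit, residuals, SSE, and the estimate s.
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

b, a = np.polyfit(x, y, 1)        # slope b and intercept a of the prediction equation
resid = y - (a + b * x)           # residuals: observed y minus predicted y
SSE = np.sum(resid**2)            # least squares minimizes this sum
s = np.sqrt(SSE / (len(y) - 2))   # estimate of sigma, with df = n - 2
print(a, b, SSE, s)
```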

Measuring association: the correlation and its square
The correlation is a standardized slope that does not depend on the units.
The correlation r relates to the slope b of the prediction equation by r = b(sx/sy).
-1 ≤ r ≤ +1, with r having the same sign as b, and r = 1 or -1 when all sample points fall exactly on the prediction line, so r describes the strength of the linear association.
The larger the absolute value, the stronger the association.
Correlation implies that predictions regress toward the mean.
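A quick numerical check of the standardized-slope relation, using invented data:

```python
# Sketch: verify r = b * (s_x / s_y) with hypothetical data.
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

b = np.polyfit(x, y, 1)[0]                          # sample slope
r_from_slope = b * x.std(ddof=1) / y.std(ddof=1)    # standardized slope
r_direct = np.corrcoef(x, y)[0, 1]                  # correlation computed directly
print(r_from_slope, r_direct)                       # the two values agree
```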

The proportional reduction in error from using x to predict y (via the prediction equation) instead of using the sample mean of y to predict y is r² = (TSS – SSE)/TSS.
Since -1 ≤ r ≤ +1, 0 ≤ r² ≤ 1, and r² = 1 when all sample points fall exactly on the prediction line.
r and r² do not depend on the units, or on the distinction between x and y.
The r and r² values tend to weaken when we observe x only over a restricted range, and they can also be highly influenced by outliers.
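The proportional-reduction-in-error form can be verified numerically; a sketch with invented data:

```python
# Sketch: r^2 as (TSS - SSE)/TSS, compared with the squared correlation.
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

b, a = np.polyfit(x, y, 1)
SSE = np.sum((y - (a + b * x))**2)   # error using the prediction equation
TSS = np.sum((y - y.mean())**2)      # error using the sample mean of y
print((TSS - SSE) / TSS, np.corrcoef(x, y)[0, 1]**2)   # same value
```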

Inference for the regression model
Parameter: the population slope β in the regression model.
H0: independence is H0: β = 0.
Test statistic: t = (b – 0)/se, with df = n – 2.
A CI for β has the form b ± t(se), where the t-score has df = n – 2 and comes from the t table with half the error probability in each tail. (Same se as in the test.)
In practice, a CI for a multiple of the slope (e.g., 10β) may be more relevant (find it by multiplying the endpoints of the CI for β by the relevant constant).
A CI not containing 0 is equivalent to rejecting H0 (when the error probability is the same for each).
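A hedged statsmodels sketch (invented data) showing where t, the P-value, and the CI for the slope appear in output:

```python
# Sketch: t test of H0: beta = 0 and a 95% CI for beta, with hypothetical data.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6],
                   "y": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]})
fit = smf.ols("y ~ x", data=df).fit()

b, se = fit.params["x"], fit.bse["x"]
print(b / se, fit.tvalues["x"])    # t = (b - 0)/se, df = n - 2
print(fit.pvalues["x"])            # two-sided P-value
print(fit.conf_int().loc["x"])     # 95% CI: b +/- t(se)
```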

Software reports SS values (SSE, regression SS, TSS = regression SS + SSE) and F test results in an ANOVA (analysis of variance) table.
The F statistic in the ANOVA table is the square of the t statistic for testing H0: β = 0. It has the same P-value as the two-sided t test.
We need to use F when we have several parameters in H0, such as in testing that all β parameters in a multiple regression model equal 0 (see Chapter 11).
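A sketch (invented data) confirming that F = t² in simple regression and that regression SS + SSE = TSS:

```python
# Sketch: overall F statistic vs. squared t statistic, and the ANOVA sums of squares.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6],
                   "y": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]})
fit = smf.ols("y ~ x", data=df).fit()

print(fit.fvalue, fit.tvalues["x"]**2)     # equal in simple regression
print(fit.f_pvalue, fit.pvalues["x"])      # same P-value as the two-sided t test
print(fit.ess, fit.ssr, fit.centered_tss)  # regression SS, SSE, and TSS (their sum)
```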

Chap. 10: Introduction to Multivariate Relationships
Bivariate analyses are informative, but we usually need to take into account many variables. Many explanatory variables have an influence on any particular response variable.
The effect of an explanatory variable on a response variable may change when we take into account other variables. (Recall the Berkeley admissions example.)
When each pair of variables is associated, a bivariate association for two variables may differ from its “partial” association, controlling for another variable.

Association does not imply causation! With observational data, the effect of X on Y may be partly due to the association of X and Y with other lurking variables.
Experimental studies have the advantage of being able to control potential lurking variables (the groups being compared should be roughly “balanced” on them).
When X1 and X2 both have effects on Y but are also associated with each other, there is confounding. It is difficult to determine whether either truly causes Y, because a variable's effect could be at least partially due to its association with the other variable.

Simpson's paradox: it is possible for the (bivariate) association between two variables to be positive, yet for the partial association to be negative at each fixed level of a third variable (or the reverse); see the numerical sketch below.
Spurious association: Y and X1 both depend on X2, and their association disappears after controlling for X2.
Multiple causes are more common, in which explanatory variables have associations among themselves as well as with the response variable. The effect of any one variable changes depending on which other variables are controlled (statistically), often because it has a direct effect and also indirect effects.
Statistical interaction – the effect of X1 on Y changes as the level of X2 changes (e.g., non-parallel lines in regression).
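As a purely numerical illustration of Simpson's paradox (the numbers below are invented, not from the course), the overall slope of y on x is positive even though the slope is negative within each level of the control variable:

```python
# Invented data: positive overall association, negative partial association.
import numpy as np

x = np.array([1, 2, 3, 7, 8, 9], dtype=float)
y = np.array([10, 9, 8, 20, 19, 18], dtype=float)
group = np.array([0, 0, 0, 1, 1, 1])                    # third (control) variable

print(np.polyfit(x, y, 1)[0])                           # overall slope: positive
print(np.polyfit(x[group == 0], y[group == 0], 1)[0])   # within group 0: negative
print(np.polyfit(x[group == 1], y[group == 1], 1)[0])   # within group 1: negative
```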

Chap. 11: Multiple Regression
y – response variable; x1, x2, …, xk – a set of explanatory variables
All variables are assumed to be quantitative (later chapters incorporate categorical variables in the model also).
Multiple regression equation (population): E(y) = α + β1x1 + β2x2 + … + βkxk
Controlling for the other predictors in the model, there is a linear relationship between E(y) and x1 with slope β1.
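A minimal statsmodels sketch of fitting a multiple regression model; the data and variable names are invented:

```python
# Sketch: least squares fit of E(y) = alpha + beta1*x1 + beta2*x2 with made-up data.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"x1": [1, 2, 3, 4, 5, 6, 7, 8],
                   "x2": [5, 3, 6, 4, 8, 7, 9, 10],
                   "y":  [7, 6, 10, 9, 14, 14, 17, 19]})
fit = smf.ols("y ~ x1 + x2", data=df).fit()
print(fit.params)      # a, b1, b2: partial slopes, each controlling for the other x
print(fit.summary())   # the kind of software output interpreted on the exam
```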

Partial effects in multiple regression refer to statistically controlling for the other variables in the model, so they differ from effects in bivariate models, which ignore all other variables.
The partial effect of a predictor in multiple regression is identical at all fixed values of the other predictors in the model (the assumption of “no interaction,” corresponding to parallel lines).
Again, this is a model. We fit it using least squares, minimizing SSE out of all equations of the assumed form. The model may not be appropriate (e.g., if there is severe interaction).
Graphics include the scatterplot matrix (corresponding to the correlation matrix) and partial regression plots, which study the effect of a predictor after controlling for (instead of ignoring) the other variables.

Multiple correlation and R²
The multiple correlation R is the correlation between the observed y-values and the predicted y-values.
R² is the proportional reduction in error from using the prediction equation (instead of the sample mean) to predict y.
0 ≤ R² ≤ 1 and 0 ≤ R ≤ 1.
R² cannot decrease (and SSE cannot increase) when predictors are added to a regression model.
The numerator of R² (namely, TSS – SSE) is the regression sum of squares: the variability in y “explained” by the regression model.
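A sketch (same invented data as above) verifying that R is the correlation between observed and predicted y and that R² = (TSS – SSE)/TSS:

```python
# Sketch: two equivalent ways to obtain R^2 from a fitted multiple regression model.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"x1": [1, 2, 3, 4, 5, 6, 7, 8],
                   "x2": [5, 3, 6, 4, 8, 7, 9, 10],
                   "y":  [7, 6, 10, 9, 14, 14, 17, 19]})
fit = smf.ols("y ~ x1 + x2", data=df).fit()

R = np.corrcoef(df["y"], fit.fittedvalues)[0, 1]          # multiple correlation
print(R**2, fit.rsquared)                                 # same value
print((fit.centered_tss - fit.ssr) / fit.centered_tss)    # PRE form: (TSS - SSE)/TSS
```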

Inference for the multiple regression model
To test whether the k explanatory variables collectively have an effect on y, we test H0: β1 = β2 = … = βk = 0.
Test statistic: F = (R²/k) / [(1 – R²)/(n – (k+1))]
When H0 is true, the F values follow the F distribution with
df1 = k (number of predictors in the model)
df2 = n – (k+1) (sample size – number of model parameters)
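A sketch computing the overall F statistic from R² and comparing it with the value software reports (invented data):

```python
# Sketch: F = (R^2/k) / ((1 - R^2)/(n - (k+1))) matches the reported overall F.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"x1": [1, 2, 3, 4, 5, 6, 7, 8],
                   "x2": [5, 3, 6, 4, 8, 7, 9, 10],
                   "y":  [7, 6, 10, 9, 14, 14, 17, 19]})
fit = smf.ols("y ~ x1 + x2", data=df).fit()

n, k = len(df), 2
R2 = fit.rsquared
F = (R2 / k) / ((1 - R2) / (n - (k + 1)))
print(F, fit.fvalue)        # df1 = k, df2 = n - (k + 1)
```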

Inferences for individual regression coefficients
To test the partial effect of xi, controlling for the other explanatory variables in the model, test H0: βi = 0 using the test statistic t = (bi – 0)/se, df = n – (k+1).
A CI for βi has the form bi ± t(se), with the t-score also having df = n – (k+1), for the desired confidence level.
Partial t test results can seem logically inconsistent with the result of the F test when the explanatory variables are highly correlated.
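A sketch showing where the individual t statistics, P-values, and CIs appear (invented data):

```python
# Sketch: t = b_i / se, P-values, and CIs for the partial slopes, df = n - (k + 1).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"x1": [1, 2, 3, 4, 5, 6, 7, 8],
                   "x2": [5, 3, 6, 4, 8, 7, 9, 10],
                   "y":  [7, 6, 10, 9, 14, 14, 17, 19]})
fit = smf.ols("y ~ x1 + x2", data=df).fit()

print(fit.params / fit.bse)   # t = (b_i - 0)/se for each coefficient
print(fit.pvalues)            # two-sided P-values
print(fit.conf_int())         # b_i +/- t(se) for each coefficient
```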

Modeling interaction
The multiple regression model E(y) = α + β1x1 + β2x2 + … + βkxk assumes the partial slope relating y to each xi is the same at all values of the other predictors.
A model allowing interaction (e.g., for 2 predictors) is
E(y) = α + β1x1 + β2x2 + β3(x1x2) = (α + β2x2) + (β1 + β3x2)x1,
which is a special case of the multiple regression model E(y) = α + β1x1 + β2x2 + β3x3 with x3 = x1x2.
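A sketch of fitting the interaction model by including the product term x1x2; the data are invented, and the x1:x2 formula syntax is specific to statsmodels/patsy, not necessarily the course software:

```python
# Sketch: interaction model with a third predictor x3 = x1*x2.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"x1": [1, 2, 3, 4, 5, 6, 7, 8],
                   "x2": [5, 3, 6, 4, 8, 7, 9, 10],
                   "y":  [7, 6, 10, 9, 14, 14, 17, 19]})

fit = smf.ols("y ~ x1 + x2 + x1:x2", data=df).fit()   # x1:x2 is the product term
print(fit.params)   # the effect of x1 is now (b1 + b3*x2), changing with x2
```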

Comparing two regression models
To test whether a complete model gives a better fit than a simpler model containing only a subset of the predictors, use the test statistic
F = [(SSE_reduced – SSE_complete)/df1] / [SSE_complete/df2]
df1 = number of extra parameters in the complete model
df2 = n – (k+1) = df2 for the F test that all β's in the complete model equal 0
Application: compare a model with many interaction terms to the simpler model without them.
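A sketch of the complete-versus-reduced F test using statsmodels' anova_lm, with invented data:

```python
# Sketch: compare a model with an interaction term to the simpler model without it.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.DataFrame({"x1": [1, 2, 3, 4, 5, 6, 7, 8],
                   "x2": [5, 3, 6, 4, 8, 7, 9, 10],
                   "y":  [7, 6, 10, 9, 14, 14, 17, 19]})

reduced = smf.ols("y ~ x1 + x2", data=df).fit()
complete = smf.ols("y ~ x1 + x2 + x1:x2", data=df).fit()
print(sm.stats.anova_lm(reduced, complete))   # F with df1 = number of extra parameters
```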

How do we include a categorical explanatory variable in the model (Ch. 12)? (Preview in the exercises.)
y = selling price of home (thousands of dollars)
x1 = size of home (in square feet)
x2 = whether the home is new (1 = yes, 0 = no)
x2 is a “dummy variable” (also called an “indicator variable”).
Predicted y-value = a + b1x1 + b2x2, with the estimated coefficient of x2 equal to 20.
The difference in predicted selling prices for new homes and older homes of a fixed size is 20, i.e., $20,000.
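A hedged sketch of the dummy-variable idea; the data and fitted coefficients below are invented and are not the textbook's house-price estimates:

```python
# Sketch: a 0/1 indicator ("new") as a predictor alongside a quantitative predictor.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"size":  [1200, 1500, 1800, 2100, 1300, 1600, 1900, 2200],
                   "new":   [0, 0, 0, 0, 1, 1, 1, 1],                  # dummy variable
                   "price": [95, 115, 130, 150, 120, 140, 155, 175]})  # $1000s (invented)
fit = smf.ols("price ~ size + new", data=df).fit()
print(fit.params)   # coefficient of "new" = predicted price difference,
                    # new vs. older home, at any fixed size
```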

How do we include a categorical response variable in the model (Ch. 15)?
Model the probability for a category of the response variable.
We need a mathematical formula more complex than a straight line to keep the predicted probabilities between 0 and 1.
Logistic regression uses an S-shaped curve that goes from 0 up to 1, or from 1 down to 0, as a predictor x changes.
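A minimal logistic regression sketch with an invented binary response, showing that the predicted probabilities stay between 0 and 1:

```python
# Sketch: logistic regression of a 0/1 response on a quantitative predictor.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   "y": [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]})   # hypothetical binary response
fit = smf.logit("y ~ x", data=df).fit()
print(fit.params)        # intercept and slope on the logit scale
print(fit.predict(df))   # predicted probabilities, an S-shaped curve in x
```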