Class 28 Get Ready…..


Height and Weight
Is CM or Inches the better predictor of KG?
  Whichever has the lower standard error.
  It will also have a variety of better stats.
  NOT whichever has the bigger coefficient.
A multiple regression lets you test
  H0: all b’s = 0 (nothing in the model matters)
  H0: b1 = 0 given all the other b’s
When using both CM and INCHES
  We reject H0: b1 = b2 = 0
  We fail to reject H0: b1 = 0 given b2
  We fail to reject H0: b2 = 0 given b1
  You need either CM or INCHES but not both, because they are highly correlated.
Regressions ALWAYS go through the sample averages.

Things I expect you will know
How to interpret a regression using p-1 dummy variables
  The p possible forecasts will equal the sample average Y for each of the p groups.
  The intercept is the average of the left-out group.
  The coefficients are differences in group averages (each group’s average minus the left-out group’s average).
  The p-value/Significance F will match that from ANOVA single factor.
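A minimal sketch of the dummy-variable claims above, using made-up numbers (the group labels, the Y values, and the small `ols` helper are all hypothetical, not from the class data). It fits Y on p-1 = 2 dummies by solving the normal equations and compares the results with the group averages.

```python
from statistics import mean

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * p for a, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical data: p = 3 groups, with group "A" as the left-out group.
groups = ["A", "A", "A", "B", "B", "C", "C", "C"]
y = [10.0, 12.0, 14.0, 20.0, 22.0, 5.0, 6.0, 7.0]

# Each row is [1 for the intercept, dummy for B, dummy for C].
X = [[1.0, float(g == "B"), float(g == "C")] for g in groups]
intercept, b_B, b_C = ols(X, y)

group_mean = {g: mean(v for v, h in zip(y, groups) if h == g) for g in "ABC"}
print(intercept, group_mean["A"])              # intercept vs. average of left-out group
print(b_B, group_mean["B"] - group_mean["A"])  # coefficient vs. difference in averages
print(b_C, group_mean["C"] - group_mean["A"])
```

Each printed pair should agree up to floating-point rounding, which is the sense in which the p forecasts equal the p group averages.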

Things I expect you will know
How to interpret a residual (error)
  It is $Y - \hat{Y}$.
  It is the distance each Y is from the line. Positive means above the line.
  They measure the difference between actual Y and expected Y (based on the X’s).
  The most over-weight girl (for her height) is the girl with the largest positive residual.
  Check the box to get residuals.
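As an illustration of residuals, here is a small sketch with invented heights and weights (the names and numbers are hypothetical, not the class data set). It fits a one-variable least-squares line, computes each residual as actual Y minus predicted Y, and picks out the largest positive one.

```python
from statistics import mean

# Hypothetical heights (cm) and weights (kg); not data from the class.
names = ["Ann", "Bea", "Cat", "Dee", "Eve"]
height = [150.0, 155.0, 160.0, 165.0, 170.0]
weight = [50.0, 56.0, 55.0, 68.0, 61.0]

x_bar, y_bar = mean(height), mean(weight)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(height, weight))
         / sum((x - x_bar) ** 2 for x in height))
intercept = y_bar - slope * x_bar  # the fitted line passes through (x_bar, y_bar)

# Residual = actual Y minus predicted Y; positive means above the line.
residuals = [y - (intercept + slope * x) for x, y in zip(height, weight)]
most_over = names[residuals.index(max(residuals))]

print(most_over, round(max(residuals), 2))
print(round(sum(residuals), 6))  # residuals sum to zero up to rounding
```

The residuals summing to zero is another way of seeing that the regression goes through the sample averages.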

Things I expect you will know
How to interpret a coefficient in a multiple regression.
  It measures the change in expected Y for a unit change in that X, keeping all other Xs constant.
  If I keep miles and stops constant and change from Williams to Spencer, expect 0.97 hours less.
  If I change from Williams to Spencer without holding miles and stops constant, expect 0.33 hours more.
  It is the easy way to answer some questions.
  If the previous rating goes from 17.5 to 20, how will the expected ratings change? (by 0.18571 per point)
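The ratings question above can be worked as simple arithmetic. This sketch assumes the 0.18571 figure is the coefficient on previous rating and that all other Xs stay fixed; the variable names are illustrative only.

```python
coef_previous_rating = 0.18571   # change in expected rating per point (from the slide)
change_in_x = 20 - 17.5          # previous rating moves from 17.5 to 20

expected_change = coef_previous_rating * change_in_x
print(round(expected_change, 3))
```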

Things I expect you will know
How to use a regression model to calculate a point forecast.
  Plug and chug. I use SUMPRODUCT.
  You must know what Xs to plug in.
  It is a package deal: you must know and plug in ALL the Xs.

Things I expect you will know
How to use a regression model to calculate a probability.
  The question gives you the Y.
  You plug and chug to get the $\hat{Y}$.
  You calculate $t = (Y - \hat{Y}) / \text{standard error}$.
  Use T.DIST.RT(t, dof).
  Dof is n minus the total number of regression terms.
  Requires the FOUR assumptions.
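The steps above can be sketched in Python. Every input here (`y`, `y_hat`, `std_error`, `n`, `terms`) is a hypothetical value chosen for illustration, and `t_right_tail` is a stand-in for the spreadsheet's right-tail t function: it integrates the t density numerically with Simpson's rule rather than calling any library routine.

```python
import math

def t_pdf(x, dof):
    """Density of Student's t distribution with dof degrees of freedom."""
    log_c = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
             - 0.5 * math.log(dof * math.pi))
    return math.exp(log_c - (dof + 1) / 2 * math.log(1 + x * x / dof))

def t_right_tail(t, dof, steps=2000):
    """P(T > t), using Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, dof) + t_pdf(a, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    area = total * h / 3  # integral of the density from 0 to |t|
    return 0.5 - area if t > 0 else 0.5 + area

# Hypothetical inputs, not taken from any regression in the class.
y = 35.0          # the Y given in the question
y_hat = 20.0      # point forecast from plugging in all the Xs
std_error = 10.0  # standard error reported by the regression
n, terms = 48, 3  # observations and regression terms (intercept plus two Xs)

t = (y - y_hat) / std_error
dof = n - terms
prob = t_right_tail(t, dof)  # probability that Y exceeds the given value
print(round(t, 3), dof, round(prob, 4))
```

For the probability that Y falls below the given value, take one minus the right-tail result.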

Things I expect you will know
If the coefficient of X1 changes when X2 is included in the model…
  You know X1 and X2 are correlated.
  You can use the two regression results to tell whether X1 and X2 are positively or negatively correlated.
    Ds was positively correlated with Miles.
    Fact was negatively correlated with Stars.
    Nobel was positively correlated with Yanks.
    Speed was positively correlated with Dcorporate.
    Exam 1 was negatively correlated with Exam 2.

UNDERSTANDING: Oh…Fact Movies had fewer Stars!

Regress Y on X1:
              Coefficient
  Constant    13.24615
  Fact         1.40107

Regress Y on X1 and X2:
              Coefficient
  Constant    12.568
  Fact         1.799
  Stars        1.259

Secret Formula
Let b be the coefficient of X1 when regressing Y on X1, let b_1 and b_2 be the coefficients of X1 and X2 when regressing Y on X1 and X2, and let c be the coefficient of X1 when regressing X2 on X1. Then

$$c = \frac{b - b_1}{b_2}$$

$$c = \frac{1.40 - 1.80}{1.26} = -0.32$$

Fact Movies averaged 0.32 fewer Stars!
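The secret-formula arithmetic can be checked directly from the two coefficient tables. This is a small sketch using the unrounded coefficients from the slides; the variable names follow the formula's symbols.

```python
b = 1.40107   # Fact coefficient when regressing Y on Fact alone
b1 = 1.799    # Fact coefficient when regressing Y on Fact and Stars
b2 = 1.259    # Stars coefficient when regressing Y on Fact and Stars

# c is the coefficient of Fact when regressing Stars on Fact.
c = (b - b1) / b2
print(round(c, 2))  # -0.32
```

The negative sign is what says Fact and Stars are negatively correlated; rearranged, the same relation reads b = b1 + b2*c.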

Regression is the line through a cloud of points
Scatter-plot the cloud.
It is up to YOU to interpret the results.
Don’t assume X causes Y.
  Y might be causing X.
  Both might be caused by Z.
Don’t assume better fitting lines are better at forecasting.
  They usually are not: too good a fit means too complicated a model, which means poorer performance.

Class 28 Assignment

Variable                Description
School                  The name of the University
Graduation Rate         Percentage of enrollees who graduate
% of Classes Under 20   Percentage of classes offered with <= 20 students
Student/Faculty Ratio   Number of students enrolled divided by total number of faculty
Alumni Giving Rate      Percentage of living alumni who gave to the University in 2000

                     Graduation Rate  % of Classes Under 20  Student/Faculty Ratio  Alumni Giving Rate
Mean                 83.042           55.729                 11.542                 29.271
Median               83.5             59.5                   10.5                   29
Standard Deviation   8.607            13.194                 4.851                  13.441
Skewness             -0.282           -0.501                 0.582                  0.370
Maximum              97               77                     23                     67

Mode: 92, 65, 13
Minimum: 66, 3, 7
Count: 48

1. Test the hypothesis that graduation rate and alumni giving rate are (linearly) independent. We expect universities with higher graduation rates to have higher mean giving rates. [15 points]

Regress Giving Rate on Grad Rate.
Check if the coefficient is positive.
Divide the reported p-value (found in two places) by 2. Reject if less than 0.05.

                   Coefficients  Standard Error  t Stat  P-value
Intercept          -68.76        12.58           -5.46   1.82E-06
Graduation Rate      1.18         0.15            7.83   5.24E-10

The coefficient is positive and 5.24E-10 / 2 is far below 0.05, so we reject independence.
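The one-sided test in question 1 can be sketched as follows, using the Graduation Rate row of the table. The `else` branch (for a coefficient with the wrong sign) is an assumption added for completeness; the slide itself only covers the case where the coefficient is positive.

```python
coef = 1.18             # Graduation Rate coefficient from the table
p_two_sided = 5.24e-10  # reported (two-sided) P-value
alpha = 0.05

# One-sided alternative: the slope is positive.
if coef > 0:
    p_one_sided = p_two_sided / 2
else:
    # Assumed handling when the sign contradicts the alternative.
    p_one_sided = 1 - p_two_sided / 2

reject = p_one_sided < alpha
print(p_one_sided, reject)
```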

2. If the graduation rate of school A is 5 percentage points higher than that of school B, how much higher do we expect school A’s giving rate to be? [10 points] Using the above regression (graduation rate is all we know), the expected giving rate will be 1.18*5 = 5.9 percentage points higher for school A.

3. If you learn that A and B above have identical student to faculty ratios, what is your revised answer to question 2? Be certain to explain why it went up (if it went up) or why it went down (if it went down) or why it stayed the same. Direct your response to a university administrator. [15 points]

                        Coefficients  Standard Error  t Stat    P-value
Intercept               -19.10631     15.55006        -1.22870  0.22557
Graduation Rate           0.75574      0.16023         4.71669  0.00002
Student/Faculty Ratio    -1.24595      0.28430        -4.38250  0.00007

If we keep SFR constant, expected Giving Rate goes up 0.76 points per point of graduation rate.
If we don’t keep SFR constant, expected Giving Rate goes up 1.18 points per point.
Schools with higher grad rates had LOWER SFR (that makes sense).
If we don’t hold SFR constant, increases in grad rate mean decreases in SFR, and the combined effect of the two is 1.18.
So: if grad rate is higher (but SFR is not), expect a 0.76 increase per point. If grad rate is higher (and SFR is lower, as in the data), expect a 1.18 increase per point.
For A and B, the revised answer is 0.76*5 = 3.8 percentage points, down from 5.9.
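A short sketch of the arithmetic behind question 3, under the assumption that the class's formula c = (b - b1)/b2 from the earlier slides applies here with X1 = Graduation Rate and X2 = Student/Faculty Ratio. Because 1.18 is a rounded coefficient, the implied slope is approximate.

```python
b_simple = 1.18    # Grad Rate coefficient, Giving Rate regressed on Grad Rate alone
b_grad = 0.75574   # Grad Rate coefficient with SFR also in the model
b_sfr = -1.24595   # SFR coefficient with Grad Rate also in the model

# Revised answer: a 5-point gap in grad rate with SFR held equal.
revised = 5 * b_grad

# Implied change in SFR per point of grad rate (regressing SFR on Grad Rate).
implied_slope = (b_simple - b_grad) / b_sfr

print(round(revised, 2), round(implied_slope, 2))
```

A negative implied slope is consistent with the slide's remark that schools with higher grad rates had lower SFR.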

4. Provide a point forecast of alumni giving rate for a university with graduation rate of 80, 65 percent of its classes with 20 or fewer students, and a student/faculty ratio of 20. [25 points]

                        Coefficients  Standard Error  t Stat   P-value
Intercept               -20.7201      17.5214         -1.1826  0.2433
Graduation Rate           0.7482       0.1660          4.5082  0.0000
% of Classes Under 20     0.0290       0.1393          0.2084  0.8358
Student/Faculty Ratio    -1.1920       0.3867         -3.0823  0.0035

Don’t use this variable: % of Classes Under 20.
The best model includes Grad Rate and SFR (% classes <20 not needed).
Use this model. Plug and chug.

                        Coefficients  X
Intercept               -19.10631      1
Graduation Rate           0.75574     80
Student/Faculty Ratio    -1.24595     20

POINT FORECAST 16.43
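The plug-and-chug step is a sum of products of coefficients and X values, which is what a spreadsheet SUMPRODUCT does. A minimal Python equivalent, using the two-predictor model's coefficients from the slide:

```python
coefficients = [-19.10631, 0.75574, -1.24595]  # intercept, Grad Rate, SFR
xs = [1, 80, 20]                               # the 1 multiplies the intercept

forecast = sum(b * x for b, x in zip(coefficients, xs))
print(round(forecast, 2))  # 16.43
```

The 65 percent figure from the question is deliberately left out, because % of Classes Under 20 is not in the chosen model.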

5. Of the 48 universities in the data set, which one has the most surprisingly low alumni giving rate? [10 points]

The university with the most negative residual.
Use the best model, ask for residuals, find the minimum.
MICHIGAN!

6. Bo notices that some of the 48 have “university” in their names, some have “college” and the rest have “institute”. Bo wonders whether these names are predictive of student/faculty ratio. (Formulate and test a relevant hypothesis.) [25 points]

Three groups (p=3): ANOVA or Regression of SFR on 2 dummies.

SUMMARY OUTPUT

ANOVA
             df   SS          MS        F        Significance F
Regression    2    103.7348   51.8674   2.3290   0.1090
Residual     45   1002.1818   22.2707
Total        47   1105.9167

             Coefficients  Standard Error  t Stat    P-value
Intercept    11.8636       0.7114          16.6754   0.0000
Dcollege     -0.3636       3.4120          -0.1066   0.9156
Dinstitute   -7.3636                       -2.1582   0.0363
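A sketch of how the numbers in this output fit together, using only values printed in the tables. It recomputes F from the sums of squares and recovers the three group averages from the dummy coefficients, assuming "university" is the left-out group (it is the one name without a dummy).

```python
ss_reg, df_reg = 103.7348, 2    # Regression row of the ANOVA table
ss_res, df_res = 1002.1818, 45  # Residual row of the ANOVA table

# F is the ratio of the two mean squares.
f_stat = (ss_reg / df_reg) / (ss_res / df_res)

intercept, d_college, d_institute = 11.8636, -0.3636, -7.3636
group_means = {
    "university": intercept,               # left-out group
    "college": intercept + d_college,
    "institute": intercept + d_institute,
}

significance_f = 0.1090
reject_at_05 = significance_f < 0.05

print(round(f_stat, 4))
print({k: round(v, 4) for k, v in group_means.items()})
print(reject_at_05)
```

With Significance F at 0.1090, the overall test at the 0.05 level would not reject the hypothesis that the three name groups share the same mean SFR, even though the Dinstitute coefficient on its own has a P-value of 0.0363.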

Get Ready…
More practice problems (answers) on website.
I’ll host Sunday night Office Hours.
I am available Monday and Tuesday until 2pm.
Email pfeiferp@virginia.edu
Check the website to see where I am; you are welcome to join us.