# Stat 217 – Day 25 Regression


Stat 217 – Day 25 Regression

Last Time - ANOVA. When? Comparing 2 or more means (one categorical and one quantitative variable). Research question. Null hypothesis: $\mu_1 = \mu_2 = \dots = \mu_I$ (no association between the two variables). Alternative hypothesis: at least one $\mu_i$ differs (there is an association between the two variables).

Example (with 3 groups…) Not significant Significant

How? Compare differences in means vs. the natural variability in the data (s). Compare test statistic to F distribution, p-value. Output: test statistic, p-value, ratio of variability between groups to variability within groups. Demo: Strong evidence (p-value = .03 < .05) that the type of disability affected the ratings, on average, of these 70 students.
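The "ratio of variability between groups to variability within groups" can be sketched in a few lines. This is a minimal illustration in plain Python, not the software used in the demo; the function name `one_way_anova_f` and the three small groups of ratings are made up for the example and are not the 70 students' data.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations

    # Between-group variability: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variability: how far each value is from its own group mean
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical ratings for three groups (illustration only)
groups = [[4, 5, 6], [5, 6, 7], [7, 8, 9]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(round(f_stat, 2), df_b, df_w)  # 7.0 2 6
```

A large F means the group means differ by more than the within-group spread would suggest; the p-value then comes from the F distribution with those two degrees of freedom.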

Technical Conditions Technical conditions:  Randomness: random sampling or random assignment  Sample sizes: Normal populations  Equal standard deviations: Check ratio of sample standard deviations  Kinda need same shape and spread for a comparison of just means to be reasonable

Technical Conditions: 1) Randomness: random assignment. 2) Each population follows a normal distribution. 3) Each population has the same standard deviation: 1.794/1.482 < 2.
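The equal-standard-deviations check is just the ratio of the largest to the smallest sample standard deviation compared against 2. A small sketch, using the two standard deviations quoted on the slide; the helper name `sd_ratio_ok` is an assumption for illustration.

```python
def sd_ratio_ok(sample_sds, max_ratio=2.0):
    """Return (ratio, passes) where ratio = largest SD / smallest SD."""
    ratio = max(sample_sds) / min(sample_sds)
    return ratio, ratio < max_ratio

# Largest and smallest sample standard deviations from the slide
ratio, ok = sd_ratio_ok([1.794, 1.482])
print(round(ratio, 2), ok)  # 1.21 True
```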

Summary: Comparing several groups

| | Categorical response | Quantitative response |
|---|---|---|
| $H_0$ (no association between variables) | $\pi_1 = \pi_2 = \dots = \pi_I$ | $\mu_1 = \mu_2 = \dots = \mu_I$ |
| $H_a$ (there is an association between variables) | at least one $\pi_i$ differs | at least one $\mu_i$ differs |
| Is the test statistic large? | Chi-square test | ANOVA |
| Expands | 2-sample z-test | 2-sample t-test |

Exam 2 comments: Pet owners and CPR. (a) Make sure to interpret the calculated interval: “55% of pet owners” – sample or population? (b) Technical conditions: use the ones for categorical data. (c) See whether .5 is inside the CI. (d) Interpretation of p-value: chance of data at least this extreme if the null hypothesis is true. (e) Why is sample size information important? Sampling variability.

Exam 2 comments: Anchoring. (a) Make sure it is clear which is which. (b) “TC met”, TOS applet with 2 means. (c) Chicago average estimate is 51K to 1.6 million higher than Green Bay average (direction!). (d) What does it mean to say it’s significant? What is the actual conclusion to the research question?

Exam 2 comments Multiple choice 1. B 2. C – either is possible 3. B – small p-value eliminates “random chance” as a plausible explanation 4. B – it’s only unusual if she’s guessing (7s and 11s are only unusual for fair dice) Extra Credit  More likely to get a value far from mean with smaller sample size (e.g., n =1)

Next Topic: Two quantitative variables Graphical summary Numerical summary Model to allow predictions Inference beyond sample data

Activity 26-1 (p. 532) Have a sample of 20 homes for sale in Arroyo Grande in 2007  Variable 1 = house price  Variable 2 = house size Is there a relationship between these 2 variables?  Does knowing the house size help us predict its price?

1) Graphical summary: scatterplot of price vs. size. 1. Direction: positive or negative? 2. Strength: how closely do the points follow the pattern? 3. Form: linear?

Describing Scatterplots: Activity 26-3 (p. 536). Direction: Positive, None, Negative. Strength: Strong, Weak, Strong. Form: Linear or not.

2) Numerical summary: Correlation coefficient (Act 27-1): .994, .889, .510, -.081, -.450, -.721, -.907

Temperatures vs. Month. Direction: positive then negative. Form: nonlinear. Strength: very strong. r = .257
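The temperature example shows that r measures only the strength of a linear relationship, so a very strong curved pattern can still produce a small r. Here is a minimal sketch of the correlation calculation; the `pearson_r` helper and the rise-then-fall "temperature" values are invented for illustration (they are not the slide's data, so r here comes out near 0 rather than .257).

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

months = list(range(1, 13))
# Hypothetical temperatures: rise to mid-year, then fall symmetrically
temps = [60 - (m - 6.5) ** 2 for m in months]
# Hypothetical perfectly linear response, for contrast
linear = [2 * m + 1 for m in months]

r_curved = pearson_r(months, temps)
r_linear = pearson_r(months, linear)
print(round(r_curved, 3), round(r_linear, 3))
```

The curved data follow their pattern perfectly, yet r is essentially zero, while the straight-line data give r of 1. Always look at the scatterplot before trusting r.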

Example 1: Price vs. Size, r = .780. What do you learn from these numerical and graphical summaries?

Turn in, with partner  Activity 26-6 parts b, c, and e For Thursday  Pre-lab for Lab 9 For Monday  Activity 26-7  HW 7

2) Guess the correlation Applet

3) Model: IF it is linear, what line best summarizes the relationship? Demo. Moral: The “least squares regression line” minimizes the sum of the squared residuals.
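To make the "minimizes the sum of the squared residuals" moral concrete, the sketch below fits a line with the standard least squares formulas and then checks that nudging the slope or intercept makes the sum of squared residuals larger. The function names and the five (size, price) pairs are hypothetical; they are not the Arroyo Grande sample.

```python
def least_squares_line(xs, ys):
    """Return (a, b): intercept and slope of the least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept: line passes through (mean_x, mean_y)
    return a, b

def sum_squared_residuals(a, b, xs, ys):
    """Sum of (observed - predicted)^2 for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: size in square feet, price in thousands of dollars
sizes = [1000, 1500, 2000, 2500, 3000]
prices = [420, 530, 590, 700, 760]

a, b = least_squares_line(sizes, prices)
best = sum_squared_residuals(a, b, sizes, prices)
# Any other line does worse, e.g. a steeper slope or a shifted intercept
worse_slope = sum_squared_residuals(a, b * 1.05, sizes, prices)
worse_intercept = sum_squared_residuals(a + 10, b, sizes, prices)
print(round(a, 2), round(b, 2), round(best, 2))  # 260.0 0.17 750.0
```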

Interpreting the equation (p. 577): predicted response = a + b × (explanatory variable), where a = intercept and b = slope. Slope = predicted change in the response associated with a one-unit increase in the explanatory variable. Intercept = predicted value of the response when the explanatory variable = 0.

3) Model? Price-hat = 265222 + 169 × size. Slope: each additional square foot in house size is associated with a \$169 increase in predicted price (price per foot). Be a little careful here: don’t sound too “causal” (I really do like the “predicted” in here). Intercept: a house of size zero (empty lot?) is predicted to cost \$265,222. Be a little careful here too: there aren’t any houses in the data set with size near 0…

Using the model: Price-hat = 265222 + 169 × size. Predicted price for a 1250 square foot house? Predicted price for a 3000 square foot house? Extrapolation: Very risky to use the regression equation to predict values far outside the range of x values used to derive the line!
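The two predictions are plain arithmetic with the fitted equation. The sketch below plugs both sizes into the slide's coefficients; the `is_extrapolation` helper and the list of observed sizes are hypothetical, since the transcript does not give the range of house sizes in the sample.

```python
INTERCEPT = 265222  # a, from the fitted equation on the slide
SLOPE = 169         # b, dollars per additional square foot

def predicted_price(size_sqft):
    """Predicted price (dollars) from Price-hat = 265222 + 169 * size."""
    return INTERCEPT + SLOPE * size_sqft

def is_extrapolation(size_sqft, observed_sizes):
    """True if size_sqft lies outside the range of sizes used to fit the line."""
    return size_sqft < min(observed_sizes) or size_sqft > max(observed_sizes)

price_1250 = predicted_price(1250)
price_3000 = predicted_price(3000)
print(price_1250)  # 476472
print(price_3000)  # 772222

# Hypothetical observed sizes, only to show how the range check would be used
example_sizes = [900, 1400, 2100, 2600]
print(is_extrapolation(3000, example_sizes))  # True
```

Whether the 3000 square foot prediction is actually an extrapolation depends on the sizes in the real sample of 20 homes.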

4) Is this relationship statistically significant? Is it possible there is no relationship between house price and size in the population of all homes for sale at that time, and we just happened to coincidentally obtain this relationship in our random sample? Or is this relationship strong enough to convince us it didn’t happen just by chance but reflects a genuine relationship in the population?

p. 605: Let $\beta$ represent the slope of the population regression line. $H_0: \beta = 0$ (no relationship between price and size in the population). $H_a: \beta \neq 0$ (there is a relationship; positive). Idea: We want to compare the observed sample slope to zero. Does it differ more than we would expect by chance?

Assume $\beta = 0$. How many standard deviations away from 0 is our sample slope (169)? The variation in sample slopes is measured by the standard error, SE(b).

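Minitab output: The regression equation is Price = 265222 + 169 Size (sq ft)

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 265222 | 42642 | 6.22 | 0.000 |
| Size (sq ft) | 168.59 | 31.88 | 5.29 | 0.000 |

Annotations: the regression equation should have a hat on Price; a = 265222 (intercept), b = 168.59 (slope), SE(b) = 31.88; the p-value is two-sided.

t = (observed slope - hypothesized slope) / (standard error of slope) = (b - 0)/SE(b) = (168.59 - 0)/31.88 = 5.29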
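The T column can be reproduced directly from the Coef and SE Coef columns. A small sketch, assuming the usual n - 2 degrees of freedom for simple linear regression with the sample of 20 homes; the function name `slope_t_statistic` is made up for the example.

```python
def slope_t_statistic(b, se_b, hypothesized_slope=0.0):
    """t = (observed slope - hypothesized slope) / standard error of slope."""
    return (b - hypothesized_slope) / se_b

# Values from the Minitab output for Size (sq ft)
t_slope = slope_t_statistic(168.59, 31.88)
# Assumed: n = 20 homes, so df = n - 2 for a one-predictor regression
df = 20 - 2
print(round(t_slope, 2), df)  # 5.29 18
```

A sample slope more than 5 standard errors from zero is very unlikely if the population slope is 0, which is why Minitab reports a p-value of 0.000.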

Turn in, with partner: Price vs. pages: interpret slope / evaluate p-value. For Tuesday: Activities 26-7, 28-5. Be working on Lab 9 and HW 7.

The regression equation is Price = - 3.4 + 0.147 Pages

| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | -3.42 | 10.46 | -0.33 | 0.746 |
| Pages | 0.14733 | 0.01925 | 7.65 | 0.000 |

Describing Scatterplots Activity 26-6 (p. 539) Positive, nonlinear, fairly strong Causation? Strength: How closely do the points follow the pattern? Direction Strength Form: Linear or not

For Monday Activities 26-7, 28-5 Be working on Lab 9 and HW 7
