
1 Basic Linear Regression. Remember the equation of a line: y = mx + b. As scientists, we find it an irresistible temptation to put a straight line through something that looks like it needs one… How do we do this in a principled, systematic way??

2 Basic Linear Regression. Simple univariate linear model: Y_i = β_0 + β_1 x_i + ε_i, where Y_i is the response variable, x_i is the explanatory (or predictor) variable, β_0 (the intercept) and β_1 are the regression coefficients, and ε_i is the error. In simple linear regression, the error is assumed to be normally distributed with mean 0 and sd σ. β_0, β_1 and σ are estimated from the data.
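
As a rough illustration of "estimated from the data", the sketch below computes the usual least-squares estimates of the intercept, the slope and σ. The slides work in R; this is a plain-Python sketch, and the function name `fit_line` and the data are made up for the example.

```python
import math

def fit_line(x, y):
    """Least-squares estimates for the model y_i = b0 + b1 * x_i + e_i."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products about the means
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx               # slope
    b0 = y_bar - b1 * x_bar      # intercept
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Residual standard deviation on n - 2 degrees of freedom
    sigma_hat = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return b0, b1, sigma_hat

# Made-up data, not taken from the slides
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, sigma_hat = fit_line(x, y)
print(b0, b1, sigma_hat)
```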

3 Basic Linear Regression. [Figure: plot of y against x]

4

5 is significant =>keep it!

6 Basic Linear Regression. Look Gaussian? Do most of these fall on the line?

7 Basic Linear Regression. Assumptions of simple univariate linear regression: The x_i (and, if measured, the y_i) are independent. The variance of the errors is constant (homoscedastic) and equal to σ^2. The errors have mean zero and are normally distributed.

8 Basic Linear Regression. Interval Estimates. Confidence intervals around a regression line: What simple linear regression is really trying to estimate is a conditional mean, E(Y_i | x_i). This is, on average, what you’d expect the response to be, given the explanatory x_i. CIs put a confidence interval around the estimated mean at x_i. See the hypothesis testing slides for the definition of confidence.
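
For reference, the standard textbook form of this interval at a point x_0 is given below. The transcript does not preserve the slide's own formula, so the notation is an assumption: the fitted value is written as \hat{y}_0, s is the residual standard deviation, and t is a Student-t quantile on n - 2 degrees of freedom.

```latex
\hat{y}_0 \;\pm\; t_{n-2,\,1-\alpha/2}\; s\,
\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}},
\qquad
s^2 = \frac{1}{n-2} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```

The interval is narrowest at x_0 = \bar{x} and widens as x_0 moves away from the center of the data.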

9 Basic Linear Regression. Confidence Intervals. [Figure: 95% confidence intervals around a regression line]

10 Basic Linear Regression. Interval Estimates. Prediction intervals for linear regression: Sometimes we are interested in predicting a specific future value of Y_i given x_i, instead of an average. PIs have a formula similar to that of CIs but incorporate more uncertainty. Operationally, PIs are derived assuming distributions on the underlying unknown mean and sd… So take them with a grain of salt!
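
To make "similar formula, more uncertainty" concrete, here is a minimal plain-Python sketch using the standard textbook half-widths: the PI differs from the CI only by an extra "1 +" under the square root, which accounts for the scatter of a single new observation. The function name `interval_half_widths`, the data, and the choice to pass the t critical value in as `t_crit` (2.306 is the two-sided 95% value for 8 degrees of freedom) are assumptions for the example, not something taken from the slides.

```python
import math

def interval_half_widths(x, y, x0, t_crit):
    """Fitted value at x0 plus CI and PI half-widths for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))             # residual sd, n - 2 df
    spread = 1.0 / n + (x0 - x_bar) ** 2 / sxx
    y0_hat = b0 + b1 * x0
    ci_half = t_crit * s * math.sqrt(spread)         # uncertainty in the mean
    pi_half = t_crit * s * math.sqrt(1.0 + spread)   # plus one new observation
    return y0_hat, ci_half, pi_half

# Made-up data with n = 10, so n - 2 = 8 degrees of freedom
x = [float(i) for i in range(1, 11)]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1, 6.8, 8.3, 8.9, 10.2]
T_CRIT_95_DF8 = 2.306  # two-sided 95% Student-t critical value for 8 df

center = interval_half_widths(x, y, 5.5, T_CRIT_95_DF8)
edge = interval_half_widths(x, y, 10.0, T_CRIT_95_DF8)
print("at x0 = 5.5 :", center)
print("at x0 = 10.0:", edge)
```

The PI half-width is always larger than the CI half-width at the same x0, and both grow toward the edges of the data.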

11 Basic Linear Regression. Prediction Intervals. 95% prediction intervals for y_i in red, 95% CIs around E(Y_i | x_i) in blue, regression line [E(Y_i | x_i)] in black.

12 Basic Linear Regression. Interval Estimates. Tolerance intervals: Another question we can ask is: “Given the population, where do 90% of the values for y_i fall, with 95% confidence?” A tolerance interval provides the limits within which we expect a specified proportion (p) of the population to lie, with a specified level of confidence (1-α). In general, these are tricky to compute. We will again use the R package tolerance if they are needed. See http://www.jstatsoft.org/v36/i05/paper and references therein to learn more.

13 Basic Linear Regression. Tolerance Intervals. 95% tolerance intervals for 90% of y_i values over the population in red (regression line [E(Y_i | x_i)] in green).

14 A bit of Non-Linear Regression. Data may contain non-linear behavior. There are LOTS of kinds of non-linearity. We will examine how to incorporate polynomial terms, which is usually all that is needed. [Equations: linear model vs. (polynomial) non-linear model, with the non-linear terms marked]
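
A polynomial model is still linear in its coefficients, so it can be fitted by ordinary least squares once the powers of x are added as extra columns. The sketch below shows this in plain Python by solving the normal equations directly. The helper names `solve` and `fit_polynomial` and the noise-free test data are made up for the example; a real analysis would use a statistics package instead.

```python
def solve(a, b):
    """Solve the square system a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - known) / m[r][r]
    return coef

def fit_polynomial(x, y, degree):
    """Least-squares coefficients [b0, b1, ..., b_degree] for y ~ sum of b_p * x**p."""
    k = degree + 1
    design = [[xi ** p for p in range(k)] for xi in x]  # columns 1, x, x^2, ...
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Made-up, noise-free cubic so the fit should recover the coefficients
x = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [1.0 + 2.0 * xi - 0.5 * xi ** 3 for xi in x]
coef = fit_polynomial(x, y, 3)
print(coef)
```

Solving the normal equations this way is fine for a low-degree illustration, but it becomes numerically fragile as the degree grows.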

15 A bit of Non-Linear Regression. What is a model for x vs. y??

16 A bit of Non-Linear Regression. What is a good model for x vs. y?? Occam's razor (paraphrased): the least complicated model that reasonably explains your observations is probably the best. Martin Tytell once said: “perfect is the enemy of the good” (P. Tytell). A perfect or near-perfect fit is probably “over-parameterized” and will not predict future observations well.

17 A bit of Non-Linear Regression. Try a linear model first. Points of interest:

18 A bit of Non-Linear Regression. What do the residuals look like for the linear model? Overall, not too bad, but can we do a lot better by adding a little bit more structure?

19 A bit of Non-Linear Regression. Next try a quadratic model. Points of interest:

20 A bit of Non-Linear Regression. Now try a cubic model. Points of interest:

21 A bit of Non-Linear Regression. Go bananas. Try a quartic model. Points of interest:

22 A bit of Non-Linear Regression. What is a good model for x vs. y?? Our findings so far:
Linear model: R^2 ~ 0.6, all terms significant, residuals look OK.
Quadratic model: R^2 ~ 0.65, all terms significant, residuals look ehhh.
Cubic model: R^2 ~ 0.7, x term NOT significant, residuals look good.
Quartic model: R^2 ~ 0.75, intercept term NOT significant, residuals look ehhh.
Try these for further exploration.

23 A bit of Non-Linear Regression. Drop the x term in the cubic model. R^2 went up a little. All terms significant. Residuals look good.

24 A bit of Non-Linear Regression. Drop the intercept term in the quartic model. R^2 about the same, but highest so far. All terms significant. Residuals still look ehhh.

25 A bit of Non-Linear Regression. Try dropping the x and intercept terms in the quartic model. R^2 a little less, but still high. All terms significant. Residuals look better.

26 A bit of Non-Linear Regression. Which is a better model for x vs. y?? Qualitatively, by the Tytell principle and Occam's razor, I’d go with the cubic fit. Quantitatively, we can be a little more anal-retentive using: Akaike Information Criterion (AIC): the lowest-scoring model is the “best”. Bayesian Information Criterion (BIC): an alternative to the AIC; again, the lowest-scoring model is the “best”. ΔBIC is related to Bayes factors (i.e. “likelihood ratios”) between models. Candidate models: Cubic Model 2, Quartic Model 3, Quartic Model 4.
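
As a rough sketch of how these scores are computed for a least-squares fit with Gaussian errors, the plain-Python function below uses the maximized log-likelihood and counts the error variance as one extra parameter. The function name `gaussian_aic_bic` and the residual sums of squares are hypothetical; they are not the values from the slides.

```python
import math

def gaussian_aic_bic(rss, n, n_coef):
    """AIC and BIC for a least-squares fit with i.i.d. Gaussian errors.

    rss    : residual sum of squares of the fitted model
    n      : number of observations
    n_coef : number of fitted regression coefficients
    """
    k = n_coef + 1  # coefficients plus the error variance
    # Maximized Gaussian log-likelihood, using the MLE sigma^2 = rss / n
    log_lik = -0.5 * n * (math.log(2.0 * math.pi) + math.log(rss / n) + 1.0)
    aic = 2.0 * k - 2.0 * log_lik
    bic = k * math.log(n) - 2.0 * log_lik
    return aic, bic

# Hypothetical fits to the same n = 50 observations: name -> (rss, n_coef)
n = 50
fits = {"linear": (40.0, 2), "cubic": (30.0, 4), "quartic": (29.5, 5)}
scores = {name: gaussian_aic_bic(rss, n, k) for name, (rss, k) in fits.items()}

# Differences relative to the lowest score; 0 marks the preferred model
best_aic = min(a for a, _ in scores.values())
best_bic = min(b for _, b in scores.values())
for name, (aic, bic) in scores.items():
    print(name, "dAIC =", aic - best_aic, "dBIC =", bic - best_bic)
```

BIC charges ln(n) per parameter instead of 2, so for anything beyond a handful of observations it penalizes extra terms more heavily than AIC does.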

27 A bit of Non-Linear Regression. Which is a better model for x vs. y?? Compute AIC and BIC for each model, then ΔAIC and ΔBIC for each model. Cubic Model 2 or Quartic Model 2 seem best by these criteria. The right answer: the true generating mechanism is:

