Basic Linear Regression


Basic Linear Regression. Remember the equation of a line: y = m·x + b. As scientists, we find it an irresistible temptation to put a straight line through something that looks like it needs one… How do we do this in a principled, systematic way?

Basic Linear Regression. Simple univariate linear model:
y_i = β_0 + β_1·x_i + ε_i
where y_i is the response variable, x_i is the explanatory or predictor variable, β_0 (the intercept) and β_1 are the regression coefficients, and ε_i is the error. In simple linear regression the error is assumed to be normally distributed with mean 0 and sd σ. The coefficients and σ are estimated from the data.
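As a concrete illustration (not part of the original slides), here is a minimal R sketch that simulates data from this model and fits it with lm(); the object names and simulation settings are assumptions made for the example.

```r
# Simulated example of the simple univariate linear model above
set.seed(1)                                  # reproducible example
n     <- 50
x     <- runif(n, min = 0, max = 10)         # explanatory / predictor variable
beta0 <- 2                                   # true intercept
beta1 <- 0.5                                 # true slope
sigma <- 1                                   # sd of the errors
y     <- beta0 + beta1 * x + rnorm(n, mean = 0, sd = sigma)   # response

fit <- lm(y ~ x)     # least-squares estimates of the coefficients
summary(fit)         # estimates, standard errors, t-tests, R^2
```

The summary() output reports each estimated coefficient with a standard error and a p-value for the test that it equals zero.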

Basic Linear Regression. [Figure: scatter plot of the data, y against x.]

The slope coefficient is significant => keep it!

Basic Linear Regression. Do the residuals look Gaussian? Do most of the points fall on the line?

Basic Linear Regression. Assumptions of simple univariate linear regression:
The x_i (and, if measured, the y_i) are independent.
The variance of the errors is constant (homoscedastic) and equal to σ².
The errors have mean zero and are normally distributed.
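A minimal sketch of how these assumptions are commonly checked in R, continuing with the hypothetical fit object from the earlier example:

```r
# Quick graphical checks of the regression assumptions for `fit`
res <- residuals(fit)

par(mfrow = c(1, 3))
plot(fitted(fit), res,                 # constant variance: look for no fanning
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
hist(res, main = "Residuals")          # roughly bell-shaped, centred on zero
qqnorm(res); qqline(res)               # points near the line => roughly normal
```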

Basic Linear Regression: Interval Estimates. Confidence Intervals around a regression line. What simple linear regression is really trying to estimate is a conditional mean:
E(Y_i | x_i) = β_0 + β_1·x_i
This is, on average, what you'd expect the response to be given the explanatory x_i. CIs put a confidence interval around the estimated mean at x_i. See the hypothesis testing slides for the definition of confidence.
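In R, confidence intervals for this conditional mean can be obtained from predict(); a small sketch, again using the hypothetical fit from the earlier example:

```r
# 95% confidence intervals for E(Y | x) on a grid of new x values
new_x <- data.frame(x = seq(0, 10, length.out = 100))
ci <- predict(fit, newdata = new_x, interval = "confidence", level = 0.95)
head(ci)   # columns: fit (estimated mean), lwr and upr (CI limits)
```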

Basic Linear Regression: Confidence Intervals. [Figure: 95% confidence intervals around the regression line.]

Basic Linear Regression: Interval Estimates. Prediction Intervals for linear regression. Sometimes we are interested in predicting a specific future value of Y_i given x_i, instead of an average. PIs have a similar formula to CIs but incorporate more uncertainty. Operationally, PIs are derived assuming distributions on the underlying unknown mean and sd… so take them with a grain of salt!
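Prediction intervals come from the same predict() call, just with interval = "prediction" (a sketch, same hypothetical fit and grid as above):

```r
# 95% prediction intervals for a future observation y at each new x
pi_y <- predict(fit, newdata = new_x, interval = "prediction", level = 0.95)
head(pi_y)   # lwr/upr are wider than the CI limits: they also include the error sd
```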

Basic Linear Regression: Prediction Intervals. [Figure: 95% prediction intervals for y_i in red, 95% CIs around E(Y_i | x_i) in blue, and the regression line E(Y_i | x_i) in black.]

Basic Linear Regression: Interval Estimates. Tolerance Intervals. Another question we can ask is: "Given the population, where do 90% of the values for y_i fall, with 95% confidence?" A tolerance interval provides the limits within which we expect a specified proportion (p) of the population to lie, with a specified level of confidence (1−α). In general, these are tricky to compute. We will again use the R package tolerance if they are needed; see the tolerance package documentation and references therein to learn more.
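A hedged sketch of regression tolerance intervals with the tolerance package; the regtol.int() arguments shown are an assumption about the current package interface, so check ?regtol.int before relying on them:

```r
# Regression tolerance intervals via the `tolerance` package
# install.packages("tolerance")
library(tolerance)

tol <- regtol.int(reg   = fit,    # fitted lm object from the earlier example
                  side  = 2,      # two-sided limits
                  alpha = 0.05,   # 95% confidence
                  P     = 0.90)   # capture 90% of the population
head(tol)
```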

Basic Linear Regression: Tolerance Intervals. [Figure: 95% tolerance intervals for 90% of y_i values over the population in red; regression line E(Y_i | x_i) in green.]

A bit of Non-Linear Regression. Data may contain non-linear behavior. There are LOTS of kinds of non-linearity; we will examine how to incorporate polynomial terms, which is usually all that is needed.
Linear model: y_i = β_0 + β_1·x_i + ε_i
(Polynomial) non-linear model: y_i = β_0 + β_1·x_i + β_2·x_i² + … + ε_i, where the higher-order powers of x_i are the non-linear terms.
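Polynomial terms can be added directly inside an lm() formula; a minimal sketch (the data x and y and the object names are hypothetical):

```r
# A polynomial in x is still linear in the coefficients, so lm() can fit it
fit_quad <- lm(y ~ x + I(x^2))        # quadratic: wrap raw powers in I()
fit_quad_orth <- lm(y ~ poly(x, 2))   # equivalent fit with orthogonal polynomials
summary(fit_quad)
```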

A bit of Non-Linear Regression. What is a model for x vs. y?

A bit of Non-Linear Regression. What is a good model for x vs. y? Occam's razor (paraphrased): the least complicated model that reasonably explains your observations is probably the best. Martin Tytell once said: "perfect is the enemy of the good" (P. Tytell). A perfect or near-perfect fit is probably "over-parameterized" and will not predict future observations well.

A bit of Non-Linear Regression Try a linear model first Points of interest:

A bit of Non-Linear Regression What do the residuals look like on the linear model? Overall, not too bad, but can we do a lot better by adding a little bit more structure?

A bit of Non-Linear Regression Next try a quadratic model Points of interest:

A bit of Non-Linear Regression Now try a cubic model Points of interest:

A bit of Non-Linear Regression Go bananas. Try a quartic model Points of interest:
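One way to reproduce this sequence of fits in R; a sketch under the assumption that x and y hold the plotted data, with hypothetical object names:

```r
# Polynomial fits of increasing order, using raw powers of x
fit1 <- lm(y ~ x)                               # linear
fit2 <- lm(y ~ x + I(x^2))                      # quadratic
fit3 <- lm(y ~ x + I(x^2) + I(x^3))             # cubic
fit4 <- lm(y ~ x + I(x^2) + I(x^3) + I(x^4))    # quartic

# For each fit, inspect R^2, term significance, and the residuals
lapply(list(fit1, fit2, fit3, fit4), summary)
```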

A bit of Non-Linear Regression. What is a good model for x vs. y? Our findings so far:
Linear model: R² ~ 0.6, all terms significant, residuals look OK.
Quadratic model: R² ~ 0.65, all terms significant, residuals look ehhh.
Cubic model: R² ~ 0.7, x term NOT significant, residuals look good.
Quartic model: R² ~ 0.75, intercept term NOT significant, residuals look ehhh.
Try these for further exploration.

A bit of Non-Linear Regression. Drop the x term in the cubic model. R² went up a little. All terms significant. Residuals look good.

A bit of Non-Linear Regression. Drop the intercept term in the quartic model. R² about the same, but highest so far. All terms significant. Residuals still look ehhh.

A bit of Non-Linear Regression. Try dropping the x and intercept terms in the quartic model. R² a little less, but still high. All terms significant. Residuals look better.
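The reduced models described above can be specified by listing only the terms to keep, with "- 1" removing the intercept; a sketch with hypothetical object names:

```r
# Reduced polynomial models: keep only the wanted terms; `- 1` drops the intercept
fit3_no_x   <- lm(y ~ I(x^2) + I(x^3))                   # cubic without the x term
fit4_no_int <- lm(y ~ x + I(x^2) + I(x^3) + I(x^4) - 1)  # quartic without the intercept
fit4_red    <- lm(y ~ I(x^2) + I(x^3) + I(x^4) - 1)      # quartic without x or intercept
```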

A bit of Non-Linear Regression. Which is a better model for x vs. y? Qualitatively, by the Tytell principle and Occam's razor, I'd go with the cubic fit. Quantitatively, we can be a little more anal-retentive using:
Akaike Information Criterion (AIC): the lowest-scoring model is the "best".
Bayes Information Criterion (BIC): same idea, and an alternative to the AIC. ΔBIC is related to Bayes factors (i.e. "likelihood ratios") between models.
Candidate models: Cubic Model 2, Quartic Model 3, Quartic Model 4.
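In R, both criteria are available directly for lm fits, and ΔAIC/ΔBIC are just each model's score minus the minimum across the candidate set. A sketch using the reduced fits defined above; mapping those objects onto the slides' "Cubic Model 2", "Quartic Model 3", and "Quartic Model 4" is an assumption.

```r
# Information criteria for the candidate fits (lower = preferred)
models <- list(cubic_red   = fit3_no_x,     # cubic without the x term
               quartic_noi = fit4_no_int,   # quartic without the intercept
               quartic_red = fit4_red)      # quartic without x or intercept

aic <- sapply(models, AIC)
bic <- sapply(models, BIC)

data.frame(dAIC = aic - min(aic),   # 0 marks the AIC-best model
           dBIC = bic - min(bic))   # 0 marks the BIC-best model
```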

A bit of Non-Linear Regression. Which is a better model for x vs. y? Compute AIC and BIC for each model, then ΔAIC and ΔBIC for each model. Cubic Model 2 or Quartic Model 2 seem best by these criteria. The right answer: the true generating mechanism is: