Regression. Eric Feigelson. Lecture and R tutorial, Arcetri Observatory, April 2014.


Regression vs. density estimation. Density estimation is nonparametric: no functional form is assumed for the shape of the distribution or for the relationship between the variables. It is usually applied to 1- or 2-dimensional problems. Regression differs in two respects: It addresses problems where one seeks to understand the dependency of a specified response variable Y on one (or more) independent variables X. And it addresses modeling problems where the functional form of the relationship between the variables is specified in advance; the function has parameters, and the goal of the regression is to find the "best" parameter values that fit the data. Astronomers perform regressions both with heuristic functions (e.g. power laws) and with functions from astrophysical theory.

Classical regression model: "The expectation (mean) of the dependent (response) variable Y for a given value of the independent variable X (or vector X) is equal to a specified function f, which depends on both X and a vector of parameters, plus a random error (scatter)." The error ε is commonly assumed to be a normal (Gaussian) i.i.d. random variable with zero mean, ε ~ N(0, σ²). Note that all of the randomness is in this error term; the functional relationship itself is deterministic, with a known mathematical form.
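The classical model above can be simulated directly. Below is a minimal Python sketch (the lecture's tutorial uses R) with hypothetical true parameters b0 = 1.0, b1 = 2.0 and Gaussian scatter, showing that ordinary least squares recovers the deterministic part of the relation:

```python
import numpy as np

rng = np.random.default_rng(42)

# Classical regression model: Y = f(X; theta) + eps, with eps ~ N(0, sigma^2) i.i.d.
# Here f is a straight line with hypothetical true parameters b0 = 1.0, b1 = 2.0.
b0_true, b1_true, sigma = 1.0, 2.0, 0.5
x = np.linspace(0.0, 5.0, 50)
y = b0_true + b1_true * x + rng.normal(0.0, sigma, size=x.size)

# All randomness lives in the error term, so OLS estimates the deterministic f.
b1_hat, b0_hat = np.polyfit(x, y, 1)  # polyfit returns highest degree first
print(b0_hat, b1_hat)  # close to (1.0, 2.0)
```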

Warning! Astronomers may be using classical regression too often, perhaps due to its familiarity compared to other (e.g. nonparametric) statistical methods. If there is no basis for choosing a functional form (e.g. an astrophysical theory), then nonparametric density estimation may be more appropriate than regression with a heuristic function. If there is no basis for choosing the dependency relationship (i.e. that Y depends on X, rather than X on Y, or both on some hidden variables), then a form of regression that treats the variables symmetrically should be used (e.g. the OLS bisector).
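The asymmetry warned about above is easy to demonstrate on synthetic data where neither variable is privileged: OLS(Y|X) and OLS(X|Y) give systematically different slopes, while the OLS bisector (the Isobe et al. 1990 formula built from the two OLS slopes) treats the variables symmetrically. The data below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(4)

# Scattered data with no privileged "dependent" variable: both x and y are
# equally noisy versions of a common latent variable t.
t = rng.normal(0.0, 1.0, 500)
x = t + rng.normal(0.0, 1.0, 500)
y = t + rng.normal(0.0, 1.0, 500)

# OLS(Y|X) and OLS(X|Y) give systematically different slopes...
b_yx = np.polyfit(x, y, 1)[0]        # regress Y on X
b_xy = 1.0 / np.polyfit(y, x, 1)[0]  # regress X on Y, inverted to Y-on-X form
print(b_yx, b_xy)  # roughly 0.5 and 2.0 here: neither is the symmetric answer

# ...while the OLS bisector slope (Isobe, Feigelson, Akritas & Babu 1990)
# treats the variables symmetrically.
bisector = (b_yx * b_xy - 1.0
            + np.sqrt((1.0 + b_yx**2) * (1.0 + b_xy**2))) / (b_yx + b_xy)
print(bisector)  # close to 1.0
```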

The error term. There may be different causes of the scatter: It could be intrinsic to the underlying population ("equation error"); this is called a structural regression model. Or it may arise from an imperfect measurement process ("measurement error"), where the true Y exactly satisfies Y = f(X); this is called a functional regression model. Or both intrinsic and measurement errors may be present. Astronomers encounter all of these situations, and we will investigate different error models soon.

Parameter estimation & model selection. Once a mathematical model is chosen and a dataset is provided, the best-fit parameters are estimated by one (or more) of the techniques discussed in MSMA Chpt. 3: method of moments, least squares (L2), least absolute deviation (L1), maximum likelihood regression, Bayesian inference. Choices must be made regarding model complexity and parsimony (Occam's Razor): Does the ΛCDM model have a w-dot term? Are three or four planets orbiting the star? Is the isothermal sphere truncated? Model selection methods are often needed: BIC, AIC, … The final model should be validated against the dataset (or other datasets) using goodness-of-fit tests (e.g. the Anderson-Darling test), with bootstrap resamples for significance levels.
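Penalized model selection of the kind listed above can be sketched as follows: for a Gaussian error model, AIC = 2k + n ln(RSS/n) and BIC = k ln(n) + n ln(RSS/n) (up to an additive constant), and the complexity penalty favors the simpler model when the extra parameters do not earn their keep. The dataset and candidate models here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Data generated from a straight line plus Gaussian scatter.
n = 200
x = np.linspace(0.0, 10.0, n)
y = 3.0 + 0.7 * x + rng.normal(0.0, 1.0, size=n)

def gaussian_ic(y, y_model, k, n):
    """AIC and BIC for a Gaussian error model, up to an additive constant."""
    rss = np.sum((y - y_model) ** 2)
    aic = 2 * k + n * np.log(rss / n)
    bic = k * np.log(n) + n * np.log(rss / n)
    return aic, bic

# Candidate models: 1st-order vs 3rd-order polynomial (k = degree + 1 parameters).
fit_lin = np.polyval(np.polyfit(x, y, 1), x)
fit_cub = np.polyval(np.polyfit(x, y, 3), x)
aic_lin, bic_lin = gaussian_ic(y, fit_lin, 2, n)
aic_cub, bic_cub = gaussian_ic(y, fit_cub, 4, n)

# Occam's Razor in action: the penalty term favors the simpler (true) model.
print(bic_lin < bic_cub)
```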

Important! In statistical parlance, "linear" means linear in the parameters θ_i, not linear in the variable X. Examples of linear regression functions: 1st-order polynomial, high-order polynomial, exponential decay, periodic sinusoid with fixed phase.
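The distinction above is worth seeing concretely: a model can be strongly nonlinear in X yet linear in the parameters, so the closed-form linear least-squares machinery applies. A minimal sketch with a hypothetical model y = θ0 + θ1 x² + θ2 e^(−x):

```python
import numpy as np

rng = np.random.default_rng(1)

# A model nonlinear in X but LINEAR in the parameters theta:
#   y = th0 + th1 * x**2 + th2 * exp(-x)
# so it is a "linear regression" and OLS applies via a design matrix.
th_true = np.array([1.0, 0.5, 3.0])  # hypothetical true parameters
x = np.linspace(0.0, 4.0, 80)
A = np.column_stack([np.ones_like(x), x**2, np.exp(-x)])
y = A @ th_true + rng.normal(0.0, 0.2, size=x.size)

# Linear least squares: solve min ||A @ theta - y||^2 in closed form.
theta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(theta_hat)  # close to [1.0, 0.5, 3.0]
```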

Examples of non-linear regression functions: power law (Pareto), isothermal sphere, sinusoid with arbitrary phase, segmented linear.

Assumptions of OLS linear regression:
- The model is correctly specified (i.e. the population truly follows the specified relationship).
- The errors have (conditional) mean zero: E[ε | X] = E[ε] = 0.
- The errors are homoscedastic, E[ε_i² | X] = σ², and uncorrelated, E[ε_i ε_j] = 0 for i ≠ j.
- For some purposes, assume the errors are normally distributed: ε | X ~ N(0, σ²).
- For some purposes, assume the data are i.i.d.: (x_i, y_i) is independent of (x_j, y_j) but shares the same distribution.
- For multivariate covariates X = (X_1, X_2, …, X_p), two additional assumptions: X_1, …, X_p are linearly independent, and the matrix E[X_i X_i′] is positive-definite.
Under the normality assumption, OLS gives the maximum likelihood estimator for linear regression.
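The last point in the list above (OLS = MLE under Gaussian errors) can be checked numerically: minimizing the Gaussian negative log-likelihood over the regression coefficients lands on the same solution as the OLS normal equations. The dataset is hypothetical:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 100)
y = 2.0 - 1.0 * x + rng.normal(0.0, 0.3, size=x.size)

# OLS solution via the normal equations (A'A) beta = A'y.
A = np.column_stack([np.ones_like(x), x])
beta_ols = np.linalg.solve(A.T @ A, A.T @ y)

# Gaussian MLE: minimize the negative log-likelihood over (b0, b1).
def neg_loglike(beta):
    resid = y - A @ beta
    return 0.5 * np.sum(resid ** 2)  # sigma^2 factors out of the argmin

beta_mle = minimize(neg_loglike, x0=[0.0, 0.0]).x
print(np.allclose(beta_ols, beta_mle, atol=1e-3))  # the two estimators agree
```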

A very common astronomical regression procedure. Dataset of the form (x_i, y_i, σ_i): bivariate data with heteroscedastic measurement errors with known variances. Linear (or other) regression model: y_i = f(x_i; θ) + ε_i. Best-fit parameters are obtained by minimizing the function χ²(θ) = Σ_i [y_i − f(x_i; θ)]² / σ_i². The distributions of the parameters are estimated from tables of the χ² distribution (e.g. Δχ² = 2.71 around the best-fit parameter values for a 90% confidence interval with 1 degree of freedom; Numerical Recipes). This procedure is often called "minimum chi-square regression" or "chi-square fitting" because a random variable that is the sum of squared standard normal random variables follows a χ² distribution. If all of the variance in Y is attributable to the known measurement errors and these errors are normally distributed, then the model is valid.
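The procedure above can be sketched end to end on hypothetical data: weighted least squares minimizes χ²(θ) for a linear model with known heteroscedastic errors, and stepping one parameter away from the minimum until Δχ² = 2.71 brackets a 90% interval (the other parameter is held fixed here rather than re-fit at each step, so the bracket is approximate):

```python
import numpy as np

rng = np.random.default_rng(3)

# Bivariate data (x_i, y_i) with known heteroscedastic errors sigma_i.
x = np.linspace(1.0, 10.0, 30)
sig = 0.2 + 0.1 * rng.random(x.size)        # known measurement errors
y = 0.5 + 1.5 * x + rng.normal(0.0, sig)    # hypothetical true line: b0=0.5, b1=1.5

def chi2(b0, b1):
    return np.sum(((y - (b0 + b1 * x)) / sig) ** 2)

# Weighted least squares = minimum chi-square regression for this error model.
w = 1.0 / sig ** 2
A = np.column_stack([np.ones_like(x), x])
b0_hat, b1_hat = np.linalg.solve(A.T @ (w[:, None] * A), A.T @ (w * y))

# Approximate 90% interval on the slope: chi2 rises by 2.71 (1 dof)
# as b1 is stepped away from the minimum with b0 held fixed.
chi2_min = chi2(b0_hat, b1_hat)
grid = np.linspace(b1_hat - 0.2, b1_hat + 0.2, 2001)
inside = [b for b in grid if chi2(b0_hat, b) - chi2_min <= 2.71]
print(min(inside), max(inside))  # brackets b1_hat
```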

However, from a statistical viewpoint … this is a non-standard procedure! Pearson's (1900) chi-square statistic was developed for a very specific problem: hypothesis testing for the multinomial experiment, a contingency table of counts in a pre-determined number of categories:
X² = Σ_i (O_i − M_i)² / M_i,
where O_i are the observed counts and M_i are the model counts dependent on the p model parameters. The weights (the denominators) are completely different from those in the astronomers' procedure. A better approach uses a complicated likelihood that includes both the measurement errors and the model error, and proceeds with MLE or Bayesian inference. See the important article by Brandon C. Kelly, ApJ 2007.
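Pearson's statistic for the multinomial experiment can be computed directly; note the model counts M_i in the denominator, not measurement variances as in the astronomers' procedure. The category counts and model probabilities below are hypothetical, and the hand-rolled statistic matches `scipy.stats.chisquare`:

```python
import numpy as np
from scipy import stats

# Pearson's chi-square statistic for counts in pre-determined categories:
#   X^2 = sum_i (O_i - M_i)^2 / M_i
# The weights are the MODEL counts M_i, not measurement-error variances.
observed = np.array([18, 55, 27])       # hypothetical multinomial counts
model_prob = np.array([0.2, 0.5, 0.3])  # model probabilities per category
expected = model_prob * observed.sum()  # model counts M_i

x2 = np.sum((observed - expected) ** 2 / expected)
stat, pval = stats.chisquare(observed, f_exp=expected)
print(x2, pval)  # 1.0 and ~0.607: no evidence against the model here
```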