Linear Regression Models. Based on Chapter 3 of Hastie, Tibshirani and Friedman. Slides by David Madigan.

Linear Regression Models
The model is f(X) = β_0 + Σ_{j=1}^p X_j β_j, where the X's might be:
- Raw predictor variables (continuous or coded-categorical)
- Transformed predictors (X_4 = log X_3)
- Basis expansions (X_4 = X_3^2, X_5 = X_3^3, etc.)
- Interactions (X_4 = X_2 X_3)
A popular choice for estimation is least squares, which minimizes the residual sum of squares
RSS(β) = Σ_{i=1}^N (y_i − β_0 − Σ_{j=1}^p x_{ij} β_j)^2.
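As a concrete illustration (not part of the original slides), here is a minimal NumPy sketch of fitting a linear model by least squares; the data and coefficient values are made up.

```python
# Minimal least-squares fit on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
N, p = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])  # intercept + raw predictors
beta_true = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.5, size=N)

# Least squares: minimize RSS(beta) = ||y - X beta||^2
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_hat)   # close to beta_true
```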

Least Squares
The least squares estimate is β̂ = (X^T X)^{-1} X^T y, giving fitted values ŷ = X β̂ = X (X^T X)^{-1} X^T y = H y, where H = X (X^T X)^{-1} X^T is the "hat matrix." We often assume that the Y's are independent and normally distributed, leading to various classical statistical tests and confidence intervals.
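A short illustrative sketch of the hat matrix on synthetic data: it checks that H y reproduces the least-squares fitted values and that tr(H) counts the fitted parameters.

```python
# The hat matrix H = X (X^T X)^{-1} X^T maps y to the fitted values.
import numpy as np

rng = np.random.default_rng(1)
N, p = 50, 2
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.3, size=N)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
H = X @ np.linalg.inv(X.T @ X) @ X.T          # the hat matrix
assert np.allclose(H @ y, X @ beta_hat)       # H "puts the hat on" y
print(np.trace(H))                            # equals p + 1 = 3 here
```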

Gauss-Markov Theorem
The least squares estimate of β is β̂ = (X^T X)^{-1} X^T y. If the linear model is correct, this estimate is unbiased (X fixed): E(β̂) = β. Consider any linear combination of the β's, θ = a^T β, with least squares estimate θ̂ = a^T β̂. Gauss-Markov states that for any other linear unbiased estimator θ̃ = c^T y (i.e. E(c^T y) = a^T β):
Var(a^T β̂) ≤ Var(c^T y).
Of course, there might be a biased estimator with lower MSE…

Bias-Variance
For any estimator θ̃ of θ:
MSE(θ̃) = E(θ̃ − θ)^2 = Var(θ̃) + [E(θ̃) − θ]^2 (variance plus squared bias).
Note MSE is closely related to prediction error: at a new input x_0 with Y_0 = x_0^T β + ε_0,
E(Y_0 − x_0^T β̃)^2 = σ^2 + MSE(x_0^T β̃).
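To make the previous point concrete, here is a small illustrative simulation (synthetic data; the shrinkage factor 0.5 is arbitrary) showing that a deliberately biased estimator can have lower MSE than least squares.

```python
# A biased estimator (shrinking OLS toward 0) can beat OLS on MSE.
import numpy as np

rng = np.random.default_rng(2)
N, p, sigma = 30, 10, 2.0
beta = np.full(p, 0.3)                     # small true coefficients
X = rng.normal(size=(N, p))

mse_ols, mse_shrunk = 0.0, 0.0
for _ in range(2000):
    y = X @ beta + rng.normal(scale=sigma, size=N)
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    b_shrunk = 0.5 * b_ols                 # biased shrinkage estimator
    mse_ols += np.sum((b_ols - beta) ** 2)
    mse_shrunk += np.sum((b_shrunk - beta) ** 2)

print(mse_ols / 2000, mse_shrunk / 2000)   # shrinkage wins under these settings
```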

Too Many Predictors?
When there are lots of X's, we get models with high variance and prediction suffers. Three "solutions":
1. Subset selection (all-subsets + leaps-and-bounds, stepwise methods; scored by AIC, BIC, etc.)
2. Shrinkage/ridge regression
3. Derived inputs

Subset Selection
Standard "all-subsets" selection finds, for each subset size k = 1, …, p, the subset of k predictors that minimizes RSS. The choice of subset size requires a tradeoff: AIC, BIC, marginal likelihood, cross-validation, etc. "Leaps and bounds" is an efficient algorithm for all-subsets selection.
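A brute-force sketch of all-subsets selection (feasible only for small p; leaps-and-bounds is the efficient alternative). The data and sizes are illustrative.

```python
# For each subset size k, keep the predictor subset with minimum RSS.
import itertools
import numpy as np

rng = np.random.default_rng(3)
N, p = 80, 5
X = rng.normal(size=(N, p))
y = 2 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=N)

def rss(cols):
    Xs = np.column_stack([np.ones(N), X[:, list(cols)]])
    beta = np.linalg.lstsq(Xs, y, rcond=None)[0]
    return float(np.sum((y - Xs @ beta) ** 2))

for k in range(1, p + 1):
    best = min(itertools.combinations(range(p), k), key=rss)
    print(k, best, round(rss(best), 2))
```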

Cross-Validation
e.g. 10-fold cross-validation:
- Randomly divide the data into ten parts
- Train the model on nine tenths and compute the prediction error on the remaining tenth
- Do this for each tenth of the data
- Average the 10 prediction-error estimates
"One standard error rule": pick the simplest model within one standard error of the minimum.
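A hand-rolled sketch of 10-fold cross-validation for a least-squares model, including the standard error across folds needed for the one-standard-error rule (synthetic data).

```python
# 10-fold cross-validation written out by hand with NumPy.
import numpy as np

rng = np.random.default_rng(4)
N, p = 200, 5
X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])
y = X @ rng.normal(size=p + 1) + rng.normal(size=N)

folds = rng.permutation(N) % 10            # assign each point to one of 10 folds
errors = []
for k in range(10):
    train, test = folds != k, folds == k
    beta = np.linalg.lstsq(X[train], y[train], rcond=None)[0]
    errors.append(np.mean((y[test] - X[test] @ beta) ** 2))

cv_error = np.mean(errors)
se = np.std(errors, ddof=1) / np.sqrt(10)  # for the one-standard-error rule
print(cv_error, se)
```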

Shrinkage Methods
Subset selection is a discrete process: individual variables are either in or out. This can have high variance; a different dataset from the same source can result in a totally different model. Shrinkage methods allow a variable to be partly included in the model. That is, the variable is included but with a shrunken coefficient.

Ridge Regression
β̂_ridge = argmin_β Σ_i (y_i − β_0 − Σ_j x_{ij} β_j)^2, subject to Σ_j β_j^2 ≤ s.
Equivalently: β̂_ridge = argmin_β { Σ_i (y_i − β_0 − Σ_j x_{ij} β_j)^2 + λ Σ_j β_j^2 }.
This leads to β̂_ridge = (X^T X + λI)^{-1} X^T y, which works even when X^T X is singular.
Choose λ by cross-validation. Predictors should be centered (the intercept is not penalized).
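A minimal sketch of ridge regression via the closed form, with an illustrative λ = 1; a duplicated column makes X^T X singular, yet the ridge system remains solvable.

```python
# Ridge regression via (X^T X + lambda I)^{-1} X^T y on centered data.
import numpy as np

rng = np.random.default_rng(5)
N, p = 60, 8
X = rng.normal(size=(N, p))
X[:, 7] = X[:, 0]                          # duplicate column: X^T X is singular
y = X[:, :3] @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=N)

Xc = X - X.mean(axis=0)                    # center predictors
yc = y - y.mean()
lam = 1.0
beta_ridge = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
intercept = y.mean()                       # unpenalized intercept after centering
print(intercept, beta_ridge)
```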

Effective Number of X's
For ridge regression, the effective degrees of freedom
df(λ) = tr[X (X^T X + λI)^{-1} X^T] = Σ_{j=1}^p d_j^2 / (d_j^2 + λ),
where the d_j are the singular values of X, measures the effective number of X's: it decreases from p (at λ = 0) toward 0 as λ grows.
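A short sketch computing df(λ) from the singular values of a synthetic, centered X.

```python
# Effective degrees of freedom of a ridge fit from the singular values of X.
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(60, 8))
Xc = X - X.mean(axis=0)
d = np.linalg.svd(Xc, compute_uv=False)    # singular values d_1 >= ... >= d_p

for lam in [0.0, 1.0, 10.0, 100.0]:
    df = np.sum(d**2 / (d**2 + lam))
    print(lam, round(float(df), 2))        # shrinks from p = 8 toward 0
```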

Ridge Regression = Bayesian Regression
If y_i ~ N(β_0 + x_i^T β, σ^2) and each β_j has an independent N(0, τ^2) prior, then minus the log posterior density of β is proportional to the ridge criterion with λ = σ^2/τ^2, so the ridge estimate is the posterior mode (and, for this Gaussian model, also the posterior mean).

The Lasso
β̂_lasso = argmin_β Σ_i (y_i − β_0 − Σ_j x_{ij} β_j)^2, subject to Σ_j |β_j| ≤ s.
A quadratic programming algorithm is needed to solve for the parameter estimates. Choose s via cross-validation.
More generally, one can constrain Σ_j |β_j|^q: q = 0 gives variable selection (best subsets), q = 1 the lasso, q = 2 ridge. Learn q?
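An illustrative lasso fit using scikit-learn, which solves the equivalent penalized form by coordinate descent rather than quadratic programming; the penalty alpha = 0.1 is arbitrary and would normally be chosen by cross-validation.

```python
# Lasso fit on synthetic data with only two relevant predictors.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(7)
N, p = 100, 10
X = rng.normal(size=(N, p))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=N)

lasso = Lasso(alpha=0.1).fit(X, y)
print(lasso.coef_)      # many coefficients are exactly zero (variable selection)
```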

[Figure: coefficient profiles plotted as a function of 1/λ]

Principal Component Regression
Consider an eigen-decomposition of X^T X (and hence of the covariance matrix of X):
X^T X = V D V^T,
where D is diagonal with entries d_1 ≥ d_2 ≥ … ≥ d_p and the eigenvectors v_j (the columns of V) are called the principal components (directions) of X. The first derived variable z_1 = X v_1 has the largest sample variance amongst all normalized linear combinations of the columns of X; each subsequent z_j = X v_j has the largest sample variance subject to being orthogonal to all the earlier ones.

Principal Component Regression
PC regression regresses y on the first M principal components, where M < p. It is similar to ridge regression in some respects; see HTF, p. 66.
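A minimal NumPy sketch of principal component regression on synthetic data, with M = 3 chosen only for illustration.

```python
# PCR: regress y on the first M principal components of the centered X.
import numpy as np

rng = np.random.default_rng(8)
N, p, M = 100, 10, 3
X = rng.normal(size=(N, p))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=N)

Xc = X - X.mean(axis=0)
U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:M].T                          # first M derived variables z_j = X v_j
theta = np.linalg.lstsq(Z, y - y.mean(), rcond=None)[0]
beta_pcr = Vt[:M].T @ theta                # coefficients in the original coordinates
intercept = y.mean()
print(intercept, beta_pcr)
```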