Prediction with Regression

Prediction with Regression: An Introduction to Linear Regression and Shrinkage Methods. Ehsan Khoddam Mohammadi.

Outline
- Prediction
- Estimation
- Bias-Variance Trade-Off
- Regression: Ordinary Least Squares, Ridge Regression, Lasso

Prediction: definition. Given a set of inputs X1, X2, …, Xp and an output Y, we want to analyze the relationship between these variables (interpretation) and to estimate the output from the inputs (prediction).

Prediction: the same concept in different literatures. Machine learning calls it supervised learning; finance calls it forecasting; politics calls it prediction; estimation theory calls it function approximation. The variables also go by different names:

                 Statistics    ML         Economics
X1, X2, …, Xn    Predictors    Features   Independent variables
Y                Response      Class      Dependent variables

Regression: why? It performs well and is accurate for both interpretation and prediction. It has strong foundations in mathematics, statistics, and computation. Many modern and advanced methods are based on regression, or are variants of it. New regression methods are still being invented, Nobel prizes are still awarded for work involving regression, and it remains a hot topic. Finally, it can be formulated as an optimization problem; that is why I chose it for this class, since it is more closely related to the subject of the class than any other prediction method I know.

A classification of regression methods:
- Linear regression: least squares; best-subset selection and other regression-with-feature-selection approaches; stepwise regression; shrinkage (regularization) for regression: ridge regression and lasso regression
- Non-linear regression: numerical data fitting, ANN
- Discrete regression: logistic regression

Before proceeding with regression, let's investigate some statistical properties of ESTIMATION.

Estimating a parameter. Assume we have i.i.d. (independent and identically distributed) samples X1, …, Xn from an unknown distribution. Estimating their p.d.f. is too hard in many situations; instead, we want to estimate a parameter θ. An estimator θ̂ of θ is a function of X1, …, Xn (for example, the sample mean is an estimator of the population mean).

Bias-Variance dilemma. Definition 1: the bias of an estimator is Bias(θ̂) = E[θ̂] − θ. If it is 0, the estimator is said to be unbiased. Definition 2: the mean squared error (MSE) of an estimator is MSE(θ̂) = E[(θ̂ − θ)²]. An interesting equation: MSE(θ̂) = Var(θ̂) + Bias(θ̂)². What does it really mean?
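
A minimal Monte Carlo sketch of this decomposition, using a deliberately biased estimator (0.8 times the sample mean) of a normal mean; the numbers and names here are illustrative assumptions, not from the slides:

    import numpy as np

    rng = np.random.default_rng(0)
    theta = 2.0                # true mean we want to estimate
    n, trials = 20, 100_000

    # A deliberately biased "shrinkage" estimator: 0.8 * sample mean.
    estimates = np.array([0.8 * rng.normal(theta, 1.0, n).mean() for _ in range(trials)])

    bias = estimates.mean() - theta
    var = estimates.var()
    mse = ((estimates - theta) ** 2).mean()

    print(f"bias^2 + var = {bias**2 + var:.4f}")
    print(f"MSE          = {mse:.4f}")   # the two printed values agree up to simulation noise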

[Image from “More on Regularization and (Generalized) Ridge Operators”, Takane (2007)]

Test and training error as a function of model complexity. [Image from “The Elements of Statistical Learning”, Second Edition, Hastie et al. (2008)]

Linear Regression Model. Set of training data: (x1, y1), …, (xN, yN), with each input xi = (xi1, …, xip). Linear regression model: f(X) = β0 + Σ_{j=1..p} Xj βj. The real-valued coefficients β need to be estimated.

Linear Regression: Least Squares. The most popular estimation method. Minimize the Residual Sum of Squares: RSS(β) = Σ_{i=1..N} (yi − β0 − Σ_{j=1..p} xij βj)². How do we minimize it?

Linear Regression: Least Squares. Let's rewrite the last formula in matrix form: RSS(β) = (y − Xβ)ᵀ(y − Xβ). This is a quadratic function of β (not the point here, but we shall use this property later). Differentiating with respect to β and setting it to zero gives Xᵀ(y − Xβ) = 0, with the unique solution β̂ = (XᵀX)⁻¹Xᵀy. Under which assumptions can we obtain a unique solution?
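
A minimal NumPy sketch of this closed-form solution, assuming some synthetic data (solving the normal equations rather than forming the inverse explicitly):

    import numpy as np

    rng = np.random.default_rng(1)
    N, p = 100, 3
    X = np.column_stack([np.ones(N), rng.normal(size=(N, p))])   # intercept column + p features
    beta_true = np.array([1.0, 2.0, -0.5, 0.3])
    y = X @ beta_true + rng.normal(scale=0.1, size=N)

    # Normal equations: solve (X^T X) beta = X^T y.
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    print(beta_hat)                                               # close to beta_true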

Linear Regression: Least Squares, Assumptions. X should have full column rank, so that XᵀX is positive definite and invertible and the unique solution can be obtained. In other words, the feature vectors should be linearly independent, not (perfectly) correlated. What happens to β̂ if X is not of full rank, or if some features are highly correlated?
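
A small numerical illustration of that question (a sketch, assuming one feature is a near copy of another): the normal-equations solution becomes extremely unstable.

    import numpy as np

    rng = np.random.default_rng(2)
    N = 100
    x1 = rng.normal(size=N)
    x2 = x1 + 1e-6 * rng.normal(size=N)        # almost an exact copy of x1
    X = np.column_stack([x1, x2])
    y = x1 + rng.normal(scale=0.1, size=N)

    print(np.linalg.cond(X.T @ X))             # enormous condition number
    print(np.linalg.solve(X.T @ X, X.T @ y))   # wildly large, unstable coefficients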

Linear Regression: Least Squares, flaws. Low bias but high variance: Var(β̂) = (XᵀX)⁻¹σ², and one can estimate the noise variance σ² by σ̂² = (1/(N − p − 1)) Σ_{i=1..N} (yi − ŷi)². Also, it is hard to find a meaningful relation if we have too many features. What would you recommend to solve these problems?
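
The variance estimates just mentioned, in a short self-contained sketch (the function name and setup are illustrative assumptions):

    import numpy as np

    def ols_fit_with_errors(X, y):
        # X is assumed to already contain an intercept column, so k = p + 1.
        n, k = X.shape
        beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
        resid = y - X @ beta_hat
        sigma2_hat = resid @ resid / (n - k)              # unbiased estimate of sigma^2
        var_beta = sigma2_hat * np.linalg.inv(X.T @ X)    # Var(beta_hat) = sigma^2 (X^T X)^{-1}
        return beta_hat, np.sqrt(np.diag(var_beta))       # coefficients and their standard errors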

Improvements to Linear Regression.
- Model selection (feature selection): best-subset selection (leaps and bounds, Furnival and Wilson (1974)); stepwise selection (a greedy approach, sub-optimal but often preferred); mRMR (using a mutual-information criterion for selection).
- Shrinkage methods, which impose a constraint on β: ridge regression and lasso regression.

Ridge Regression. When you have a problem that needs to be solved in statistics, there is always a Russian statistician waiting to solve it for you. (Be careful! I guarantee this only in statistics; they will betray you in any other situation.) Andrey Nikolayevich Tychonoff provided Tikhonov (!!!) regularization for ill-posed problems, also known as Ridge Regression in statistics.

Ridge Regression: first attempt. Remember this? β̂ = (XᵀX)⁻¹Xᵀy. Tychonoff added a term to avoid singularity and changed the above formula to β̂ridge = (XᵀX + λI)⁻¹Xᵀy. Now the inverse can be computed even if XᵀX is not of full rank, and β̂ is still a linear function of y. Everything starts from this formula, but now we have a better point of view than Tychonoff; let's take a look!
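
A minimal sketch of this formula (ridge_fit is a hypothetical helper name, and the data reuses the nearly collinear design from the earlier sketch): a small penalty already tames the unstable coefficients.

    import numpy as np

    def ridge_fit(X, y, lam):
        # Solve (X^T X + lam * I) beta = X^T y; invertible for any lam > 0.
        p = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

    rng = np.random.default_rng(2)
    x1 = rng.normal(size=100)
    X = np.column_stack([x1, x1 + 1e-6 * rng.normal(size=100)])
    y = x1 + rng.normal(scale=0.1, size=100)
    print(ridge_fit(X, y, lam=0.1))   # roughly 0.5 for each of the two duplicated features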

Ridge Regression: a better motivation. To avoid the high variance of β̂ we simply impose a constraint on it: minimize Σ_{i=1..N} (yi − β0 − Σ_{j=1..p} xij βj)² subject to Σ_{j=1..p} βj² ≤ t. Our problem is now an optimization problem with constraints.

An even better representation uses the Lagrangian form: β̂ridge = argmin_β { Σ_{i=1..N} (yi − β0 − Σ_{j=1..p} xij βj)² + λ Σ_{j=1..p} βj² }. Better still, in matrix form, RSS(λ) = (y − Xβ)ᵀ(y − Xβ) + λβᵀβ; we can differentiate this formula and set it to zero. Could you guess the solution? Could you find a relation between β̂ and β̂ridge when the inputs are orthonormal?
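
A quick numerical check of the orthonormal case (a sketch, assuming orthonormal columns obtained from a QR factorization): with XᵀX = I, ridge simply scales the least-squares coefficients by 1/(1 + λ).

    import numpy as np

    rng = np.random.default_rng(3)
    Q, _ = np.linalg.qr(rng.normal(size=(50, 4)))   # 50 x 4 matrix with orthonormal columns
    y = rng.normal(size=50)
    lam = 2.0

    beta_ls = Q.T @ y                               # (Q^T Q)^{-1} Q^T y reduces to Q^T y
    beta_ridge = np.linalg.solve(Q.T @ Q + lam * np.eye(4), Q.T @ y)
    print(np.allclose(beta_ridge, beta_ls / (1 + lam)))   # True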

LASSO: Least Absolute Shrinkage and Selection Operator.

LASSO. We impose an L1-norm constraint on our regression: minimize Σ_{i=1..N} (yi − β0 − Σ_{j=1..p} xij βj)² subject to Σ_{j=1..p} |βj| ≤ t. No closed form exists; the solution is a non-linear function of y. How could you solve the above problem? (Hint: ask Mr. Iranmehr!)
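
One common way to solve it numerically is proximal gradient descent (ISTA) with soft-thresholding; a minimal sketch, with soft_threshold and lasso_ista as hypothetical helper names:

    import numpy as np

    def soft_threshold(z, t):
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def lasso_ista(X, y, lam, n_iter=5000):
        # Minimize 0.5 * ||y - X b||^2 + lam * ||b||_1 by proximal gradient (ISTA).
        beta = np.zeros(X.shape[1])
        step = 1.0 / np.linalg.norm(X, 2) ** 2     # 1 / Lipschitz constant of the smooth part
        for _ in range(n_iter):
            grad = X.T @ (X @ beta - y)
            beta = soft_threshold(beta - step * grad, step * lam)
        return beta

    rng = np.random.default_rng(4)
    X = rng.normal(size=(100, 10))
    beta_true = np.zeros(10)
    beta_true[:3] = [2.0, -1.0, 0.5]
    y = X @ beta_true + rng.normal(scale=0.1, size=100)
    print(np.round(lasso_ista(X, y, lam=5.0), 2))  # sparse: most entries are exactly 0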

LASSO: why? One of the first uses of the L1 norm, it showed significant results in signal processing and denoising [Chen et al. (1998)]. It is the base method for LAR (a new method for regression, not covered here) [Efron et al. (2004)]. It is good for sparse model selection where p > N [Donoho (2006b)].

REFERENCES
- “The Elements of Statistical Learning”, Second Edition, Hastie et al., 2008
- “More on Regularization and (Generalized) Ridge Operators”, Takane, 2007
- “Bias, Variance and MSE of Estimators”, Guy Lebanon, 2004
- “Least Squares Optimization with L1-Norm Regularization”, Mark Schmidt, 2005
- “Regularization: Ridge Regression and the LASSO”, Tibshirani, 2006