Comparison of Regularization Penalties, Pt. 2. NCSU Statistical Learning Group, Will Burton, Oct. 3, 2014.

Review: The goal of regularization is to minimize some loss function (commonly the sum of squared errors) while preventing overfitting (high variance, low bias) of the model on the training data set, and being careful not to cause underfitting (low variance, high bias).

Underfitting vs. Overfitting: Bias is the error that comes from approximating a real-life problem with a simpler model; variance measures how much the fitted function would change if it were estimated on a different training data set. The aim is to find the optimal amount of bias and variance.

Review cont. Regularization resolves the overfitting problem by applying a penalty to the coefficients in the loss function, preventing them from matching the training data set too closely. There are many different regularization penalties that can be applied, depending on the structure of the data.

Past Penalties: Ridge, Lasso, and Elastic Net (covered in Part 1).
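For reference, a sketch of the three penalties covered previously, written here in their standard forms; the exact notation used on the original slides is not shown in this transcript:

\[
\begin{aligned}
\text{Ridge:} \quad & \min_{\beta}\ \|y - X\beta\|_2^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \\
\text{Lasso:} \quad & \min_{\beta}\ \|y - X\beta\|_2^2 + \lambda \sum_{j=1}^{p} |\beta_j| \\
\text{Elastic net:} \quad & \min_{\beta}\ \|y - X\beta\|_2^2 + \lambda\left(\alpha \sum_{j=1}^{p} |\beta_j| + (1 - \alpha) \sum_{j=1}^{p} \beta_j^2\right)
\end{aligned}
\]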

Additional Penalties: Grouped Lasso. Motivation: In some problems the predictors belong to pre-defined groups, and it may be desirable to shrink and select the members of a group together. The grouped lasso achieves this. Example: birth weight predicted by the mother's Age, Age^2, Age^3 (one group) and Weight, Weight^2, Weight^3 (another group).

Grouped Lasso: minimize \(\|y - \sum_{l=1}^{L} X_l \beta_l\|_2^2 + \lambda \sum_{l=1}^{L} \sqrt{p_l}\,\|\beta_l\|_2\), where \(\|\beta_l\|_2\) is the Euclidean norm of the coefficients in group \(l\), \(L\) is the number of groups, and \(p_l\) is the number of predictors in group \(l\).

Grouped Lasso: the group lasso uses a penalty similar to the lasso, but instead of penalizing each coefficient individually it penalizes a group of coefficients together, so an entire group enters or leaves the model at once.

Example (Group Lasso): Predict birth weight based on:
Mother's age (1st, 2nd, and 3rd degree polynomial terms)
Mother's weight (1st, 2nd, and 3rd degree polynomial terms)
Race: white or black indicator variables
Smoking status
Number of previous premature labors
History of hypertension
Presence of uterine irritability
Number of physician visits during the 1st trimester

Data Structure: Used the R package "grpreg": model <- grpreg(X, y, groups, penalty = "grLasso")
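A minimal sketch of such a fit, assuming the low birth weight data in the MASS package (birthwt) is the dataset behind this example; the variable choices and group labels below are illustrative, not taken from the slides:

library(MASS)     # birthwt: low birth weight data
library(grpreg)   # group lasso

data(birthwt)
y <- birthwt$bwt

# Design matrix: cubic polynomial terms for age and weight,
# race indicators, and the remaining predictors as-is.
X <- cbind(poly(birthwt$age, 3, raw = TRUE),
           poly(birthwt$lwt, 3, raw = TRUE),
           white = as.numeric(birthwt$race == 1),
           black = as.numeric(birthwt$race == 2),
           smoke = birthwt$smoke,
           ptl   = birthwt$ptl,
           ht    = birthwt$ht,
           ui    = birthwt$ui,
           ftv   = birthwt$ftv)

# Group labels: columns sharing a label are shrunk in and out together.
groups <- c(1, 1, 1,            # age, age^2, age^3
            2, 2, 2,            # lwt, lwt^2, lwt^3
            3, 3,               # race indicators
            4, 5, 6, 7, 8)      # remaining single-predictor groups

model <- grpreg(X, y, group = groups, penalty = "grLasso")
plot(model)   # coefficient paths as lambda varies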

Lasso Fit

Grouped Lasso Fit

Grouped Lasso vs. Lasso

Predictions Versus Actual Weights

Other Penalties: Adaptive Lasso. Motivation: In order for the lasso to select the correct model, relevant predictors cannot be too highly correlated with irrelevant predictors. When they are correlated, the lasso has a hard time determining which predictor to eliminate, and may eliminate the relevant predictor while keeping the irrelevant one.

Adaptive Lasso: minimize \(\|y - X\beta\|_2^2 + \lambda \sum_{j=1}^{p} w_j |\beta_j|\), where the weights are functions of the coefficients, \(w_j = 1 / |\hat{\beta}_j|^{\nu}\), \(\hat{\beta}\) is the OLS estimate, and \(\nu > 0\).

How it works: 1) Calculate the OLS B's. 2) Calculate the w_j's. 3) Apply the w_j's to the penalty to find the new B's. Idea: a high Beta from OLS gives a low weight, while a low Beta gives a high weight; a low weight means a lower penalty, and a high weight means a higher penalty.
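A minimal sketch of these three steps in R, assuming the glmnet package (its penalty.factor argument supplies per-coefficient penalty weights); the choice nu = 1 and the simulated data are illustrative, not from the slides:

library(glmnet)

set.seed(1)
n <- 100; p <- 8
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% c(3, 1.5, 0, 0, 2, 0, 0, 0)) + rnorm(n, sd = 3)

# Step 1: OLS estimates
beta_ols <- coef(lm(y ~ X))[-1]          # drop the intercept

# Step 2: weights with nu = 1; a large OLS coefficient gives a small weight
w <- 1 / abs(beta_ols)^1

# Step 3: lasso with per-coefficient penalty weights = adaptive lasso
fit <- cv.glmnet(X, y, penalty.factor = w)
coef(fit, s = "lambda.min")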

In appearance, the adaptive lasso looks similar to the lasso; the only difference is that better predictors now need a higher lambda to be eliminated, while poor predictors need only a lower lambda to be eliminated.

Simulation: To determine whether the LASSO or the adaptive LASSO is better at finding the "true" structure of the model, a Monte Carlo simulation was done. The true model was y = 3x1 + 1.5x2 + 0x3 + 0x4 + 2x5 + 0x6 + 0x7 + 0x8.

Correlation of X's: Cor(X_i, X_j) = 0.8^|i - j|, an autoregressive correlation structure with rho = 0.8.

Data was generated from this true model: the X's were drawn from a multivariate normal distribution, and random errors were added with mean 0 and sd = 3. Lasso, adaptive lasso (ADLASSO), and OLS were fit. The process was repeated 500 times for n = 20 and n = 100. The average and median prediction error were recorded, along with whether or not the correct structure (the oracle model) was selected.
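A minimal sketch of one replicate of this data-generating step, assuming the MASS package for multivariate normal draws; the bookkeeping of prediction error and oracle selection over the 500 replicates is omitted:

library(MASS)    # mvrnorm

set.seed(42)
p <- 8
beta_true <- c(3, 1.5, 0, 0, 2, 0, 0, 0)

# AR(1) correlation structure: Cor(X_i, X_j) = 0.8^|i - j|
Sigma <- 0.8 ^ abs(outer(1:p, 1:p, "-"))

simulate_once <- function(n) {
  X <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
  y <- drop(X %*% beta_true) + rnorm(n, mean = 0, sd = 3)
  list(X = X, y = y)
}

dat <- simulate_once(100)   # one training set with n = 100
# ... fit OLS, lasso, and adaptive lasso to dat$X and dat$y, record the
# prediction error and whether the zero/non-zero pattern matches beta_true,
# then repeat 500 times for n = 20 and n = 100.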

Simulation Results: mean prediction error, its standard error, median prediction error, and the oracle selection rate for OLS, LASSO, and ADLASSO, at n = 20 and n = 100 (numeric values were shown in a table on the slide).

Summary: Covered the basics of regularization as well as 5 different penalty choices: Lasso, Ridge, Elastic Net, Grouped Lasso, and Adaptive Lasso. We have finished the regularization section, and Neal will take over on October 17th with an overview of classification.