Overfitting and Regularization (Lectures 11 and 12 on amlbook.com)

Over-fitting easy to recognize in 1D: parabolic target function, 4th-order hypothesis, 5 data points -> E_in = 0.
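The following short sketch (assuming NumPy; the noise level and random seed are arbitrary choices, not from the slides) reproduces this situation: a 4th-order polynomial fit to 5 points from a parabolic target interpolates the data exactly, so E_in is zero up to round-off.

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=5)                       # 5 data points
y = x**2 + rng.normal(0, 0.1, size=5)                # parabolic target plus noise

coeffs = np.polyfit(x, y, deg=4)                     # 4th-order hypothesis: 5 coefficients, 5 points
E_in = np.mean((np.polyval(coeffs, x) - y) ** 2)     # in-sample error
print(E_in)                                          # ~0: the polynomial interpolates the data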

Origin of over-fitting can be analyzed in 1D: Bias/variance dilemma
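A rough sketch of the kind of experiment behind this slide (my construction, assuming NumPy and the same parabolic target as above): fit many independent 5-point datasets, then measure how far the average hypothesis is from the target (bias) and how much the individual fits scatter around that average (variance).

import numpy as np

rng = np.random.default_rng(1)
x_test = np.linspace(-1, 1, 200)
f_test = x_test**2                                   # target values on a test grid

fits = []
for _ in range(500):                                 # many independent 5-point datasets
    x = rng.uniform(-1, 1, size=5)
    y = x**2 + rng.normal(0, 0.1, size=5)
    fits.append(np.polyval(np.polyfit(x, y, deg=4), x_test))
fits = np.array(fits)

g_bar = fits.mean(axis=0)                            # average hypothesis over datasets
bias_sq = np.mean((g_bar - f_test) ** 2)             # (squared) bias
variance = np.mean(fits.var(axis=0))                 # variance of the individual fits
print(bias_sq, variance)                             # large variance is the over-fitting signature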

Over-fitting easy to avoid in 1D: results from HW2. [Plot: E_in and E_val (sum of squared deviations) vs. degree of polynomial.]
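A minimal sketch of the HW2-style sweep (assuming NumPy; dataset sizes and noise level are illustrative assumptions): compute the sum of squared deviations on the training set (E_in) and on a held-out validation set (E_val) as the polynomial degree grows; E_val turns back up where over-fitting sets in.

import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, size=30)
y = x**2 + rng.normal(0, 0.1, size=30)
x_tr, y_tr = x[:20], y[:20]                          # training set
x_val, y_val = x[20:], y[20:]                        # validation set

for deg in range(1, 10):
    c = np.polyfit(x_tr, y_tr, deg)
    E_in = np.sum((np.polyval(c, x_tr) - y_tr) ** 2)       # sum of squared deviations, training
    E_val = np.sum((np.polyval(c, x_val) - y_val) ** 2)    # sum of squared deviations, validation
    print(deg, E_in, E_val)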

Using E_val to avoid over-fitting works in all dimensions, but the computation grows rapidly for large d. [Plot: E_in, E_cv-1, and E_val for d = 2 as terms in Φ5(x) are added successively.] The validation set needs to be large. Does this compromise training?

What if we want to add higher-order terms to a linear model but don't have enough data for a validation set? Solution: augment the error function used to optimize the weights. Example: E_aug(w) = E_in(w) + (λ/N) wᵀw. This penalizes choices with large |w| and is called "weight decay".

Normal equations with weight decay essentially unchanged: (ZᵀZ + λI) w_reg = Zᵀy
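A minimal sketch of these regularized normal equations (assuming NumPy; ridge_weights is a hypothetical helper name, and the example data are illustrative): Z holds the nonlinear features of the inputs, and the weight-decay solution is w_reg = (ZᵀZ + λI)⁻¹ Zᵀy.

import numpy as np

def ridge_weights(Z, y, lam):
    """Solve (Z^T Z + lam*I) w_reg = Z^T y for the weight-decay solution."""
    d = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ y)

# Example: 4th-order polynomial features of five 1-D points.
rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, size=5)
y = x**2 + rng.normal(0, 0.1, size=5)
Z = np.vander(x, N=5, increasing=True)               # columns 1, x, x^2, x^3, x^4
print(ridge_weights(Z, y, lam=0.01))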

The best value of λ is subjective. In this case, λ is large enough to suppress the swings while the data remain important in determining the optimum weights.

Assignment 8 (due): Generate an in silico dataset y(x) = 1 + 9x² + N(0,1) with 5 randomly selected values of x between −1 and +1. Fit a 4th-degree polynomial to the data with and without regularization by choosing λ = 0, …, 0.001, 0.01, 1.0, and 10. Display the results as in slide 8 of the lecture on regularization.
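One possible way to carry out the assignment (a sketch, assuming NumPy; ridge_weights is the same hypothetical helper as above; plotting the fitted curves against the data, as the assignment asks, would need matplotlib and is omitted here, as is the λ value that is unreadable on the slide):

import numpy as np

def ridge_weights(Z, y, lam):
    """Weight-decay solution w_reg = (Z^T Z + lam*I)^(-1) Z^T y."""
    d = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ y)

rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, size=5)
y = 1 + 9 * x**2 + rng.normal(0, 1, size=5)          # in silico data from the slide

Z = np.vander(x, N=5, increasing=True)               # 4th-degree polynomial features

for lam in [0, 0.001, 0.01, 1.0, 10]:                # lambda = 0 means no regularization
    w = ridge_weights(Z, y, lam)
    print(lam, np.round(w, 3))                       # higher-order coefficients shrink as lambda grows

As λ increases, the fitted curve loses the wild swings of the unregularized 4th-degree fit, at the cost of eventually ignoring the data, which is why the choice of λ on the previous slide is described as subjective.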