The problem of overfitting

Presentation transcript:

Regularization: The problem of overfitting (Machine Learning)

Example: Linear regression (housing prices). [Plots: three fits of Price vs. Size, ranging from underfitting to overfitting.] Overfitting: if we have too many features, the learned hypothesis may fit the training set very well (J(θ) ≈ 0), but fail to generalize to new examples (predict prices on new examples).
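As a sketch of the three fits on this slide (the usual housing example; the exact polynomial degrees are assumed, not taken from the slide):

\[
\begin{aligned}
\text{underfit:}\quad & h_\theta(x) = \theta_0 + \theta_1 x \\
\text{good fit:}\quad & h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 \\
\text{overfit:}\quad & h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4
\end{aligned}
\]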

Example: Logistic regression (g = sigmoid function)
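For reference, the hypothesis used in this example is the standard logistic one (not spelled out in the transcript text itself):

\[
h_\theta(x) = g(\theta^{T}x), \qquad g(z) = \frac{1}{1 + e^{-z}}
\]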

Addressing overfitting. Features: size of house, no. of bedrooms, no. of floors, age of house, average income in neighborhood, kitchen size, and so on. [Plot: Price vs. Size]

Addressing overfitting. Options:
1. Reduce the number of features.
   Manually select which features to keep.
   Model selection algorithm (later in the course).
2. Regularization.
   Keep all the features, but reduce the magnitude/values of the parameters θj.
   Works well when we have a lot of features, each of which contributes a bit to predicting y.

Regularization: Cost function (Machine Learning)

Intuition. [Plots: Price vs. Size of house, a quadratic fit and a higher-order polynomial fit.] Suppose we penalize θ3 and θ4 and make them really small.
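A sketch of the penalized objective this intuition refers to (the constant 1000 simply stands for some large number):

\[
\min_{\theta}\ \frac{1}{2m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)^2 + 1000\,\theta_3^2 + 1000\,\theta_4^2
\]

The only way to make this small is to drive θ3 and θ4 close to zero, which effectively removes the cubic and quartic terms and leaves a nearly quadratic fit.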

Regularization. Small values for the parameters θ0, θ1, ..., θn give a "simpler" hypothesis that is less prone to overfitting. Housing example: features x1, ..., xn; parameters θ0, θ1, ..., θn.
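The regularized cost function this section builds toward has the standard form (by convention θ0 is not penalized):

\[
J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]
\]

where λ is the regularization parameter, controlling the trade-off between fitting the training data well and keeping the parameters small.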

Regularization. [Plot: Price vs. Size of house]

In regularized linear regression, we choose θ to minimize J(θ). What if λ is set to an extremely large value, perhaps far too large for our problem?
The algorithm works fine; setting λ to be very large can't hurt it.
The algorithm fails to eliminate overfitting.
The algorithm results in underfitting (fails to fit even the training data well).
Gradient descent will fail to converge.

In regularized linear regression, we choose θ to minimize J(θ). What if λ is set to an extremely large value, perhaps far too large for our problem? [Plot: Price vs. Size of house] With λ that large, the penalty drives θ1, ..., θn toward zero, so the hypothesis reduces to roughly h_θ(x) ≈ θ0, a flat line that underfits the data.

Regularization: Regularized linear regression (Machine Learning)

Regularized linear regression

Gradient descent. Repeat:
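A sketch of the update rules for regularized linear regression (standard form, with θ0 left unregularized; α is the learning rate):

\[
\begin{aligned}
\theta_0 &:= \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\,x_0^{(i)} \\
\theta_j &:= \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\,x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \qquad (j = 1, \dots, n)
\end{aligned}
\]

The θj update can be rewritten as θj := θj(1 − αλ/m) − α(1/m)Σ(h_θ(x^(i)) − y^(i))x_j^(i); since (1 − αλ/m) is slightly less than 1, each step shrinks θj a little before applying the usual gradient correction.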

Normal equation
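A sketch of the closed-form solution for regularized linear regression; here M is the (n+1)×(n+1) identity matrix with its top-left entry set to 0, so that θ0 is not regularized:

\[
\theta = \big(X^{T}X + \lambda M\big)^{-1}X^{T}y,
\qquad
M = \begin{pmatrix} 0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{pmatrix}
\]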

Non-invertibility (optional/advanced). Suppose m ≤ n (the number of examples is at most the number of features); then XᵀX is non-invertible (singular). If λ > 0, the matrix XᵀX + λM is invertible, so regularization also fixes this numerical issue.

Regularization: Regularized logistic regression (Machine Learning)

Regularized logistic regression. [Plot: decision boundary in the (x1, x2) plane; a high-order polynomial hypothesis can overfit.] Cost function:
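The regularized logistic regression cost function has the standard form (again θ0 is not penalized):

\[
J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[y^{(i)}\log h_\theta(x^{(i)}) + \big(1 - y^{(i)}\big)\log\big(1 - h_\theta(x^{(i)})\big)\Big] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2
\]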

Gradient descent. Repeat:
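The updates look identical to those for regularized linear regression, except that the hypothesis is now the sigmoid h_θ(x) = 1/(1 + e^(−θᵀx)):

\[
\begin{aligned}
\theta_0 &:= \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\,x_0^{(i)} \\
\theta_j &:= \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\,x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \qquad (j = 1, \dots, n)
\end{aligned}
\]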

Advanced optimization

function [jVal, gradient] = costFunction(theta)
  jVal = [code to compute J(theta)];
  gradient(1) = [code to compute the partial derivative of J(theta) with respect to theta_0];
  gradient(2) = [code to compute the partial derivative of J(theta) with respect to theta_1];
  gradient(3) = [code to compute the partial derivative of J(theta) with respect to theta_2];
  ...
  gradient(n+1) = [code to compute the partial derivative of J(theta) with respect to theta_n];
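A minimal usage sketch in Octave, assuming costFunction is defined as above and theta has n+1 entries; the variable names (initialTheta, optTheta) and the iteration count are illustrative, not taken from the slide:

% Tell fminunc that costFunction also returns the gradient,
% and cap the number of iterations (100 is an arbitrary choice).
options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(n+1, 1);          % start from all-zero parameters
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);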