CS 2750: Machine Learning The Bias-Variance Tradeoff Prof. Adriana Kovashka University of Pittsburgh January 13, 2016

Plan for Today
– More Matlab
– Measuring performance
– The bias-variance trade-off

Matlab Tutorial
– s/matlab-tutorial/
– 750/Tutorial/
– tlab_probs2.pdf

Matlab Exercise
p211/basicexercises.html
– Do Problems 1-8, 12
– Most also have solutions
– Ask the TA if you have any problems

Homework 1
w1.htm
If I hear about issues, I will mark clarifications and adjustments in the assignment in red, so check periodically.

ML in a Nutshell
y = f(x), where y is the output, f is the prediction function, and x is the feature representation.
– Training: given a training set of labeled examples {(x_1, y_1), …, (x_N, y_N)}, estimate the prediction function f by minimizing the prediction error on the training set.
– Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x).
Slide credit: L. Lazebnik
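
As a minimal sketch of this train/test pattern in Matlab (made-up 1-D data; polyfit/polyval stand in for a generic learner):

% Training set {(x_1,y_1), ..., (x_N,y_N)} -- made-up 1-D examples
x_train = [1 2 3 4 5]';
y_train = [2.1 3.9 6.2 8.1 9.8]';

% Training: estimate f by minimizing squared prediction error on the
% training set (here f is a line, fit by least squares)
w = polyfit(x_train, y_train, 1);

% Testing: apply f to a never-before-seen example x, output y = f(x)
x_test = 6;
y_pred = polyval(w, x_test);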

ML in a Nutshell
Apply a prediction function to a feature representation (in this example, of an image) to get the desired output:
f([image]) = “apple”   f([image]) = “tomato”   f([image]) = “cow”
Slide credit: L. Lazebnik

Data Representation Let’s brainstorm what our “X” should be for various “Y” prediction tasks…

Measuring Performance
If y is discrete:
– Accuracy: # correctly classified / # all test examples
– Loss: weighted misclassification via a confusion matrix
  - In the case of only two classes: True Positive, False Positive, True Negative, False Negative
  - Might want to “fine” our system differently for FPs and FNs
  - Can extend to k classes

Measuring Performance
If y is discrete:
– Precision/recall
  Precision = # predicted true pos / # predicted pos
  Recall = # predicted true pos / # true pos
– F-measure = 2PR / (P + R)

Precision / Recall / F-measure
Worked example: 10 test images, 4 of which contain people (the true positives to be found) and 6 of which do not. The classifier predicts 5 images as positive, and 2 of those predictions are correct.
– Precision = 2 / 5 = 0.4
– Recall = 2 / 4 = 0.5
– F-measure = 2 * 0.4 * 0.5 / (0.4 + 0.5) ≈ 0.44
– Accuracy = 5 / 10 = 0.5 (2 true positives + 3 true negatives correctly labeled)
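
A small Matlab sketch of how these metrics are computed, with made-up label vectors matching the example above:

y_true = [1 1 1 1 0 0 0 0 0 0];   % 4 of 10 images contain people
y_pred = [1 1 0 0 1 1 1 0 0 0];   % classifier predicts 5 positives, 2 correct

tp = sum(y_pred == 1 & y_true == 1);   % true positives  = 2
fp = sum(y_pred == 1 & y_true == 0);   % false positives = 3
fn = sum(y_pred == 0 & y_true == 1);   % false negatives = 2

precision = tp / (tp + fp);            % 2/5 = 0.4
recall    = tp / (tp + fn);            % 2/4 = 0.5
f_measure = 2 * precision * recall / (precision + recall);  % ~0.44
accuracy  = mean(y_pred == y_true);    % 5/10 = 0.5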

Measuring Performance
If y is continuous:
– Euclidean distance between the true y and predicted y’

Generalization
How well does a learned model generalize from the data it was trained on to a new test set?
Training set (labels known) vs. test set (labels unknown).
Slide credit: L. Lazebnik

Components of expected loss
– Noise in our observations: unavoidable
– Bias: how much the average model over all training sets differs from the true model; error due to inaccurate assumptions/simplifications made by the model
– Variance: how much models estimated from different training sets differ from each other
Underfitting: the model is too “simple” to represent all the relevant class characteristics
– High bias and low variance
– High training error and high test error
Overfitting: the model is too “complex” and fits irrelevant characteristics (noise) in the data
– Low bias and high variance
– Low training error and high test error
Adapted from L. Lazebnik
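
For squared-error loss these three components add up exactly. A standard way to write the decomposition (notation assumed here rather than taken from the slide: f is the true model, f̂_D the model learned from training set D, σ² the noise variance):

\mathbb{E}\big[(y - \hat{f}_D(x))^2\big]
  = \underbrace{\big(\mathbb{E}_D[\hat{f}_D(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\big[(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}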

Bias-Variance Trade-off
Models with too few parameters are inaccurate because of a large bias (not enough flexibility). Models with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample).
Slide credit: D. Hoiem

Polynomial Curve Fitting Slide credit: Chris Bishop
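
The model fit on the following slides (from Bishop's PRML, Ch. 1) is an M-th order polynomial in a single input x, linear in the coefficients w:

y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \dots + w_M x^M = \sum_{j=0}^{M} w_j x^j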

Sum-of-Squares Error Function Slide credit: Chris Bishop
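
The error function minimized during fitting (Bishop, Eq. 1.2) measures the misfit between the predictions y(x_n, w) and the targets t_n over the N training points:

E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \big( y(x_n, \mathbf{w}) - t_n \big)^2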

0th Order Polynomial Slide credit: Chris Bishop

1st Order Polynomial Slide credit: Chris Bishop

3rd Order Polynomial Slide credit: Chris Bishop

9th Order Polynomial Slide credit: Chris Bishop
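
A sketch of this experiment in Matlab, assuming (as in Bishop) that the data are noisy samples of sin(2πx); note that the 9th-order fit on 10 points triggers polyfit's ill-conditioning warning, itself a symptom of overfitting:

rng(0);                              % reproducible made-up data
N = 10;
x = linspace(0, 1, N)';
t = sin(2*pi*x) + 0.3*randn(N, 1);   % noisy samples of sin(2*pi*x)

xs = linspace(0, 1, 200)';           % dense grid for plotting each fit
hold on;
for M = [0 1 3 9]                    % the polynomial orders on the slides
    w = polyfit(x, t, M);            % least-squares fit of order M
    plot(xs, polyval(w, xs));
end
plot(x, t, 'o');                     % the training points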

Over-fitting
Root-Mean-Square (RMS) error: E_RMS = sqrt(2 E(w*) / N), where w* is the minimizer of the sum-of-squares error above.
Slide credit: Chris Bishop
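
And a sketch of the train/test RMS curves from this slide, under the same assumptions (fresh test points drawn from the same noisy sin(2πx) source):

rng(1);
x_train = linspace(0, 1, 10)';
t_train = sin(2*pi*x_train) + 0.3*randn(10, 1);
x_test = rand(100, 1);               % unseen points from the same source
t_test = sin(2*pi*x_test) + 0.3*randn(100, 1);

orders = 0:9;
rms_train = zeros(size(orders));
rms_test  = zeros(size(orders));
for i = 1:numel(orders)
    w = polyfit(x_train, t_train, orders(i));
    rms_train(i) = sqrt(mean((polyval(w, x_train) - t_train).^2));
    rms_test(i)  = sqrt(mean((polyval(w, x_test)  - t_test ).^2));
end
plot(orders, rms_train, 'o-', orders, rms_test, 's-');  % test RMS is U-shaped
legend('Training', 'Test');  xlabel('Order M');  ylabel('E_{RMS}');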

Data Set Size: 9th Order Polynomial Slide credit: Chris Bishop

Data Set Size: 9th Order Polynomial Slide credit: Chris Bishop

Question Who can give me an example of overfitting… involving the Steelers and what will happen on Sunday?

How to reduce over-fitting?
– Get more training data
Slide credit: D. Hoiem

Regularization
Penalize large coefficient values by adding a penalty term to the error function:
E~(w) = 1/2 * Σ_n (y(x_n, w) − t_n)^2 + (λ/2) * ||w||^2
(Remember: we want to minimize this expression.)
Adapted from Chris Bishop
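
A sketch of the regularized fit in Matlab: polyfit has no regularization option, so the design matrix is built by hand and the penalized least-squares problem is solved via its normal equations. Assumptions: same synthetic data as before, ln λ = −18 (one of the values Bishop uses), and w_0 is penalized along with the other coefficients for simplicity:

rng(0);
N = 10;  M = 9;  lambda = exp(-18);  % Bishop's ln(lambda) = -18 example
x = linspace(0, 1, N)';
t = sin(2*pi*x) + 0.3*randn(N, 1);

Phi = ones(N, M+1);                  % design matrix: columns 1, x, ..., x^M
for j = 1:M
    Phi(:, j+1) = x .^ j;
end

% Minimizer of 1/2*sum((Phi*w - t).^2) + lambda/2*||w||^2
w = (Phi' * Phi + lambda * eye(M+1)) \ (Phi' * t);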

Polynomial Coefficients Slide credit: Chris Bishop

Regularization: [figure: the 9th-order fit with moderate regularization] Slide credit: Chris Bishop

Regularization: [figure: the 9th-order fit with heavy regularization] Slide credit: Chris Bishop

Regularization: [figure: training and test error vs. amount of regularization] Slide credit: Chris Bishop

Polynomial Coefficients [table: fitted coefficients with no regularization vs. huge regularization] Adapted from Chris Bishop

How to reduce over-fitting?
– Get more training data
– Regularize the parameters
Slide credit: D. Hoiem

Bias-variance Figure from Chris Bishop

Bias-variance tradeoff
[Figure: error vs. model complexity. Training error decreases with complexity; test error first decreases, then increases. Low complexity = underfitting (high bias, low variance); high complexity = overfitting (low bias, high variance).]
Slide credit: D. Hoiem

Bias-variance tradeoff
[Figure: test error vs. model complexity, for many vs. few training examples. With few examples the test-error minimum sits at lower complexity than with many examples.]
Slide credit: D. Hoiem

Choosing the trade-off
Need a validation set (separate from the test set).
[Figure: training and held-out error vs. complexity, on the same low-bias/high-variance axis as before; choose the complexity that minimizes error on the validation set.]
Slide credit: D. Hoiem
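
A sketch of choosing model complexity with a validation split in Matlab (same synthetic-data assumptions as the earlier sketches; the test set is never touched during selection):

rng(2);
x = rand(30, 1);
t = sin(2*pi*x) + 0.3*randn(30, 1);
x_tr = x(1:20);   t_tr = t(1:20);    % training split
x_va = x(21:30);  t_va = t(21:30);   % validation split

best_M = 0;  best_err = inf;
for M = 0:9
    w = polyfit(x_tr, t_tr, M);
    err = sqrt(mean((polyval(w, x_va) - t_va).^2));   % validation RMS
    if err < best_err
        best_err = err;  best_M = M;  % keep the lowest-validation-error model
    end
end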

Effect of Training Size
[Figure: for a fixed prediction model, error vs. number of training examples. Training error rises and testing error falls as the training set grows; the two curves converge toward the generalization error.]
Adapted from D. Hoiem

How to reduce over-fitting?
– Get more training data
– Regularize the parameters
– Use fewer features
– Choose a simpler classifier
Slide credit: D. Hoiem

Remember…
Three kinds of error:
– Inherent: unavoidable
– Bias: due to over-simplifications
– Variance: due to inability to perfectly estimate parameters from limited data
Practical advice:
– Try simple classifiers first
– Use increasingly powerful classifiers with more training data (bias-variance trade-off)
Adapted from D. Hoiem