Classification
Derek Hoiem, CS 598, Spring 2009, Jan 27, 2009


Outline
– Principles of generalization
– Survey of classifiers
– Project discussion
– Discussion of Rosch

Pipeline for Prediction
Imagery → Representation → Classifier → Predictions
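A minimal sketch of this pipeline, assuming scikit-learn and simple flattened-pixel features (both illustrative choices, not from the slides):

```python
# Imagery -> representation -> classifier -> predictions, as one chained model.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)            # "imagery": 8x8 images, flattened
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Representation (feature scaling) chained with a linear classifier.
model = make_pipeline(StandardScaler(), LinearSVC(dual=False))
model.fit(X_tr, y_tr)                          # training
print("test accuracy:", model.score(X_te, y_te))  # predictions
```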

No Free Lunch Theorem: no single classifier is best across all possible problems; assumptions about the data are what make learning possible.

Bias and Variance
[Figure: error vs. model complexity; low complexity gives high bias and low variance, high complexity gives low bias and high variance]

Overfitting
– Need a validation set
– The validation set is not the same as the test set
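An illustrative sketch of this practice: select a hyperparameter on a held-out validation set, and touch the test set only once at the end (dataset and model are assumptions):

```python
# Train / validation / test split for model selection without overfitting the test set.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_trval, X_test, y_trval, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_trval, y_trval, test_size=0.25, random_state=0)

best_C, best_acc = None, -1.0
for C in [0.01, 0.1, 1.0, 10.0]:               # choose C on the validation set
    clf = LogisticRegression(C=C, max_iter=5000).fit(X_train, y_train)
    acc = clf.score(X_val, y_val)
    if acc > best_acc:
        best_C, best_acc = C, acc

final = LogisticRegression(C=best_C, max_iter=5000).fit(X_trval, y_trval)
print("test accuracy:", final.score(X_test, y_test))  # evaluated only once
```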

Bias-Variance View of Features
– More compact = lower variance, potentially higher bias
– More features = higher variance, lower bias
– More independence among features = simpler classifier → lower variance

How to reduce variance
– Parameterize model (e.g., linear vs. piecewise)

How to measure complexity?
VC dimension. With probability $1 - \eta$, the test error is bounded by the training error plus a complexity term:
$E_{\text{test}} \le E_{\text{train}} + \sqrt{\frac{h\left(\ln(2N/h) + 1\right) - \ln(\eta/4)}{N}}$
where $N$ is the size of the training set, $h$ is the VC dimension, and $\eta$ is one minus the probability with which the bound holds.
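A small numeric check of the bound above (the formula is the standard Vapnik bound, which appears to be what the slide intends):

```python
# Upper bound on test error; holds with probability 1 - eta.
import math

def vc_bound(train_error, n, h, eta=0.05):
    complexity = (h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n
    return train_error + math.sqrt(complexity)

# More data (larger N) tightens the bound; higher VC dimension h loosens it.
print(vc_bound(train_error=0.05, n=1000, h=10))    # ~0.31
print(vc_bound(train_error=0.05, n=100000, h=10))  # ~0.08
```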

How to reduce variance
– Parameterize model
– Regularize
– Increase number of training examples
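A sketch of regularization as variance reduction, using ridge regression on a small noisy polynomial fit (the regression setting and all specifics here are illustrative choices, not from the slides):

```python
# Larger alpha shrinks the weights of a high-variance (degree-15) model.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-1, 1, size=(30, 1)), axis=0)
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.3, size=30)

for alpha in [1e-8, 0.01, 1.0]:
    model = make_pipeline(PolynomialFeatures(degree=15), Ridge(alpha=alpha))
    model.fit(X, y)
    w = model.named_steps["ridge"].coef_
    print(f"alpha={alpha:<6} max |weight| = {np.abs(w).max():.1f}")
```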

Effect of Training Size
[Figure: error vs. number of training examples]
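A hedged sketch of this effect via scikit-learn's learning_curve utility (dataset and classifier are illustrative):

```python
# Validation accuracy typically improves as the training set grows.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = load_digits(return_X_y=True)
sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=5000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"n={n:4d}  train acc={tr:.3f}  val acc={va:.3f}")
```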

Risk Minimization, Margins
[Figure: two-class scatter plot (x vs. o) in the (x1, x2) plane, showing the margin between the classes]

Classifiers
– Generative methods: Naïve Bayes, Bayesian Networks
– Discriminative methods: Logistic Regression, Linear SVM, Kernelized SVM
– Ensemble methods: Randomized Forests, Boosted Decision Trees
– Instance based: K-nearest neighbor
– Unsupervised: K-means

Components of classification methods
– Objective function
– Parameterization
– Regularization
– Training
– Inference

Classifiers: Naïve Bayes (objective, parameterization, regularization, training, inference)
[Figure: graphical model with class node y and conditionally independent feature nodes x1, x2, x3]
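A minimal naive Bayes sketch; the Gaussian likelihood and dataset are assumptions, since the slide does not fix a particular model:

```python
# Training fits per-class feature means/variances (features independent given y);
# inference picks the class maximizing P(y) * prod_i P(x_i | y).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

nb = GaussianNB().fit(X_tr, y_tr)
print("test accuracy:", nb.score(X_te, y_te))
```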

Classifiers: Logistic Regression (objective, parameterization, regularization, training, inference)
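An illustrative sketch of logistic regression with L2 regularization (scikit-learn's default; the dataset is an arbitrary choice):

```python
# Objective: regularized log-likelihood; C is the inverse regularization strength.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(C=1.0, max_iter=5000).fit(X_tr, y_tr)
print("P(y=1|x) for first test point:", clf.predict_proba(X_te[:1])[0, 1])
print("test accuracy:", clf.score(X_te, y_te))
```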

Classifiers: Linear SVM (objective, parameterization, regularization, training, inference)
[Figure: two-class scatter (x vs. o) in the (x1, x2) plane with a maximum-margin separating line]

Classifiers: Linear SVM, non-separable case
[Figure: two-class scatter (x vs. o) with overlapping points; needs slack variables]
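A sketch of the soft-margin (slack) trade-off: C weighs slack penalties against margin width (synthetic data is an illustrative choice):

```python
# Small C tolerates more slack (wider margin); large C penalizes violations.
from sklearn.datasets import make_blobs
from sklearn.svm import LinearSVC

X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

for C in [0.01, 1.0, 100.0]:
    svm = LinearSVC(C=C, dual=False).fit(X, y)
    print(f"C={C:<6} train accuracy={svm.score(X, y):.3f}")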

Classifiers: Kernelized SVM (objective, parameterization, regularization, training, inference)
[Figure: 1D data (x vs. o) that is not linearly separable on the x1 axis becomes separable after mapping to (x1, x1²)]
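A sketch of the slide's example: the explicit map x → (x, x²) makes the 1D data separable, and a polynomial kernel does the same implicitly (the specific points are illustrative):

```python
import numpy as np
from sklearn.svm import SVC, LinearSVC

x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0])
y = np.array([1, 1, 0, 0, 0, 1, 1])           # outer points vs. inner points

# Raw 1D data: no single threshold separates the classes (accuracy < 1).
print(LinearSVC(dual=False).fit(x[:, None], y).score(x[:, None], y))

# Explicit feature map to (x1, x1^2): now linearly separable (accuracy 1.0).
phi = np.column_stack([x, x ** 2])
print(LinearSVC(dual=False).fit(phi, y).score(phi, y))

# Implicit version: a degree-2 polynomial kernel on the raw 1D data.
print(SVC(kernel="poly", degree=2).fit(x[:, None], y).score(x[:, None], y))
```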

Classifiers: Decision Trees (objective, parameterization, regularization, training, inference)
[Figure: two-class scatter (x vs. o) partitioned by axis-aligned splits in the (x1, x2) plane]
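A minimal decision-tree sketch, where maximum depth acts as the regularizer (shallower trees = higher bias, lower variance; dataset is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
for depth in [1, 3, None]:                    # None = grow until pure leaves
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores = cross_val_score(tree, X, y, cv=5)
    print(f"max_depth={depth}  cv accuracy={scores.mean():.3f}")
```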

Ensemble Methods: Boosting
[Figure from Friedman et al. 2000]

Boosted Decision Trees
[Figure: ensemble of small decision trees over image cues (Gray? High in image? Many long lines? Vanishing point? Smooth? Green? Blue?) combining into P(label | good segment, data) for Ground / Vertical / Sky labels; Collins et al. 2002]

Boosted Decision Trees: how to control the bias/variance trade-off
– Size of trees
– Number of trees
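A sketch of exactly those two knobs, using scikit-learn's gradient-boosted trees as a stand-in (the slides describe boosted trees generically, so this implementation is an assumption):

```python
# max_depth = size of trees, n_estimators = number of trees.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
for depth, n_trees in [(1, 50), (1, 500), (3, 200)]:
    gb = GradientBoostingClassifier(max_depth=depth, n_estimators=n_trees,
                                    random_state=0)
    acc = cross_val_score(gb, X, y, cv=5).mean()
    print(f"depth={depth} trees={n_trees}  cv accuracy={acc:.3f}")
```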

K-nearest neighbor (objective, parameterization, regularization, training, inference)
[Figure: two-class scatter (x vs. o) in the (x1, x2) plane; a query point is labeled by its nearest neighbors]
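A minimal k-NN sketch: training is just storing the data, and k controls the bias/variance trade-off (small k = low bias, high variance; dataset is illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in [1, 5, 25]:
    knn = KNeighborsClassifier(n_neighbors=k)
    print(f"k={k:<3} cv accuracy={cross_val_score(knn, X, y, cv=5).mean():.3f}")
```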

Clustering
[Figure: unlabeled 1D points on the x1 axis grouped into clusters, with '+' marking the cluster centers]
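A minimal K-means sketch (unsupervised: no labels are used); the 1D data loosely mirrors the slide's figure, and the numbers are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

x = np.array([[0.1], [0.3], [0.2], [0.4], [2.1], [2.3], [2.0], [2.4]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(x)
print("cluster assignments:", km.labels_)
print("cluster centers ('+' in the figure):", km.cluster_centers_.ravel())
```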

References
General
– Tom Mitchell, Machine Learning, McGraw Hill, 1997
– Christopher Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995
Adaboost
– Friedman, Hastie, and Tibshirani, "Additive logistic regression: a statistical view of boosting", Annals of Statistics, 2000
SVMs –

Project Idea
Investigate various classification methods on several standard vision problems
– At least five problems with pre-defined feature sets and training/test sets
– Effect of training size
– Effect of number of variables
– Is any method dominant?
– Any guidelines for choosing a method?

Project ideas?

Discussion of Rosch