Introduction to Machine Learning for Category Representation Jakob Verbeek November 27, 2009 Many slides adapted from S. Lazebnik.

Plan for this course
1) Introduction to machine learning
2) Clustering techniques: k-means, Gaussian mixture density
3) Gaussian mixture density continued: parameter estimation with EM, Fisher kernels
4) Classification techniques 1: introduction, generative methods, semi-supervised
5) Classification techniques 2: discriminative methods, kernels
6) Decomposition of images: topic models, …

What is machine learning? According to Wikipedia: –“Learning is acquiring new knowledge, behaviors, skills, values, preferences or understanding, and may involve synthesizing different types of information. The ability to learn is possessed by humans, animals and some machines. Progress over time tends to follow learning curves.” –“Machine learning is a scientific discipline that is concerned with the design and development of algorithms that allow computers to change behavior based on data, such as from sensor data or databases. A major focus of machine learning research is to automatically learn to recognize complex patterns and make intelligent decisions based on data. Hence, machine learning is closely related to fields such as statistics, probability theory, data mining, pattern recognition, artificial intelligence, adaptive control, and theoretical computer science.”

Why machine learning? Extract knowledge/information from past experience/data, and use this knowledge/information to analyze new experiences/data. Designing rules to deal with new data by hand can be difficult –How would you write a program to detect a cat in an image? Collecting data can be easier –Find images with cats, and images without them. Use machine learning to automatically find such rules. Goal of this course: an introduction to the machine learning techniques used in current object recognition systems.

Steps in machine learning Data collection –“training data”, optionally with “labels” provided by a “teacher”. Representation –how the data are encoded into “features” when presented to the learning algorithm. Modeling –choose the class of models from which the learning algorithm will select. Estimation –find the model that best explains the data: simple and fits well. Validation –evaluate the learned model and compare to solutions found using other model classes. Apply the learned model to new “test” data.
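
To make these steps concrete, here is a minimal sketch in Python (assuming scikit-learn and its toy digits dataset; the dataset and model choice are illustrative, not part of the course):

```python
# Minimal sketch of the learning pipeline: collect, represent, model,
# estimate, validate, apply to test data. The dataset and model here
# are illustrative assumptions, not prescribed by the lecture.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Data collection: labeled training data (features already extracted).
X, y = load_digits(return_X_y=True)

# Hold out "test" data that the learner never sees during estimation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Modeling + estimation: pick a model class, fit it to the training data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Validation: evaluate the learned model on the held-out data.
print("test accuracy:", model.score(X_test, y_test))
```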

Data Representation Important issue when using learning techniques Different types of representations –Vectorial, graphs, … –Homogeneous or heterogeneous, e.g. images + text Choice of representation may impact the choice of learning algorithm. Domain knowledge can help to design or select good features. –The ultimate feature would solve the learning problem… Automatic methods for finding good features are known as “feature selection” methods.

Probability & Statistics in Learning Many learning methods are formulated as a probabilistic model of the data –Can deal with uncertainty in the data –Missing values for some data can be handled –Provides a unified framework to combine many different models for different types of data Statistics are used to analyze the behavior of learning algorithms –Does the learning algorithm recover the underlying model given enough data: “consistency” –How fast it does so: the rate of convergence Common important assumptions –Training data is sampled from the true data distribution –Test data is sampled from the same distribution
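
As a small numerical sketch of consistency (my own illustration, not from the slides): the maximum-likelihood estimate of a Gaussian mean approaches the true mean as the sample grows, provided the data really are sampled from that distribution:

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean = 2.0

# The MLE of a Gaussian mean is the sample average; watch it converge
# to the true value as the training sample grows (consistency).
for n in [10, 100, 1000, 10000]:
    sample = rng.normal(loc=true_mean, scale=1.0, size=n)
    print(n, abs(sample.mean() - true_mean))
```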

Different forms of learning Supervised –Classification –Regression Unsupervised –Clustering –Dimension reduction –Topic models –Density estimation Semi-supervised –Combine labeled data with unlabeled data Active learning –Determine the most useful data to label next Many other forms…

Supervised learning Training data provided as pairs (x,y) The goal is to predict an “output” y from an “input” x Output y for each input x is the “supervision” that is given to the learning algorithm. –Often obtained by manual “annotation” of the inputs x –Can be costly to do Most common examples –Classification –Regression

Classification Training data consists of “inputs”, denoted x, and corresponding output “class labels”, denoted as y. Goal is to correctly predict for a test data input the corresponding class label. Learn a “classifier” f(x) from the input data that outputs the class label or a probability over the class labels. Example: –Input: image –Output: category label, e.g. “cat” vs. “no cat” Classification can be binary (two classes), or over a larger number of classes (multi-class). –In binary classification we often refer to one class as “positive”, and the other as “negative” A binary classifier creates a decision boundary in the input space between the areas assigned to each class.
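
For instance, a nearest-class-mean classifier is one simple choice of f(x) (an illustrative sketch in plain NumPy, not a method singled out by the slides): it assigns each input to the class whose training mean is closest, which induces a linear boundary between two classes:

```python
import numpy as np

def fit_nearest_mean(X, y):
    """Fit a nearest-class-mean classifier: one mean vector per class."""
    classes = np.unique(y)
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, means

def predict(classes, means, X):
    """Assign each input to the class with the closest mean."""
    d = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

# Toy binary problem: class 0 ("negative") vs class 1 ("positive").
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.2, 0.9]])
y = np.array([0, 0, 1, 1])
classes, means = fit_nearest_mean(X, y)
print(predict(classes, means, np.array([[0.1, 0.0], [1.1, 1.0]])))  # [0 1]
```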

Example of classification Given: training images and their categories. What are the categories of these test images?

Regression Similar to classification, but the output y has the form of one or more real numbers. Goal is to predict for input x an output f(x) that is close to the true y. Learn a continuous function f. A “loss” function, or “error” function, measures how well a certain function f is doing –In classification we want to minimize the number of errors, using a 0/1 loss: correct or not –In regression we minimize a graded loss function: the loss is bigger as f(x) is further from the correct y.
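
A minimal sketch of regression under a squared loss (illustrative NumPy; the toy data are made up): fit a line by least squares and measure the graded loss:

```python
import numpy as np

# Toy 1-D regression: fit f(x) = a*x + b by minimizing the squared loss.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 0.9, 2.2, 2.8])

A = np.stack([x, np.ones_like(x)], axis=1)      # design matrix [x, 1]
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares solution

pred = a * x + b
# The loss is graded: it grows as f(x) moves further from the correct y.
print("mean squared loss:", ((pred - y) ** 2).mean())
```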

Example of regression Suppose we want to predict the gas mileage of a car based on some of its characteristics: number of cylinders or doors, weight, horsepower, year, etc.

Regression: example 2 Training set: faces (represented as vectors of distances between keypoints) together with experimentally obtained attractiveness rankings Learn: a function to reproduce the attractiveness ranking based on training inputs and outputs (figure: vector of distances v → attractiveness score f(v)) T. Leyvand, D. Cohen-Or, G. Dror, and D. Lischinski, Data-driven enhancement of facial attractiveness, SIGGRAPH 2008

Other forms of supervised learning Structured prediction tasks: predict several interdependent output variables (figure: mapping an image to a word)

Structured Prediction Estimation of body poses Data association problem: assigning edges to body parts Source: D. Ramanan

Other supervised learning scenarios Learning similarity functions from relations between multiple input objects (figure: pairwise constraints) Source: X. Sui, K. Grauman

Learning face similarities Training data: pairs of faces labeled as same/different Similarity measure should ignore: pose, expression, … Face identification: are these faces of the same person? [Guillaumin, Verbeek, Schmid, ICCV 2009]

Unsupervised learning Input data x is given without desired output variables y. Goal is to learn something about the “structure” of the data Examples include –Clustering –Dimensionality reduction –Topic models –Density estimation Not always clear how to measure the success of unsupervised learning –Probabilistic models can be evaluated by computing the likelihood assigned to other data sampled from the same distribution –Clustering can be evaluated by learning on labeled data and measuring how clusters correspond to classes, but classes may not define the most apparent clusters –Dimensionality reduction can be evaluated by reconstruction errors

Clustering Finding a group structure in the data –Data in one cluster are similar to each other –Data in different clusters are dissimilar Map each data point to a discrete cluster index –“flat” methods find k groups (k known, or automatically set) –“hierarchical” methods define a tree structure over the data
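
A sketch of the k-means algorithm mentioned in the course plan (a “flat” method with k fixed in advance); this plain-NumPy version is illustrative and ignores corner cases such as empty clusters:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means: a "flat" clustering into k groups.
    (A robust version would also handle empty clusters.)"""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: map each point to its closest center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Update step: recompute each center as the mean of its points.
        new_centers = np.stack([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Two well-separated toy clusters in 2-D.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.1, size=(20, 2)) for m in (0.0, 1.0)])
labels, centers = kmeans(X, k=2)
print(centers)  # approximately (0, 0) and (1, 1)
```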

Clustering example Learn face similarity from training pairs labeled as same/different Cluster faces based on identity [Guillaumin, Verbeek, Schmid, ICCV 2009]

Dimension reduction Finding a lower dimensional representation of the data –Useful for compression, visualization, noise reduction Unlike regression: target values not given
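
A minimal sketch of one such method, PCA via the SVD (illustrative NumPy; the synthetic data are made up); note the reconstruction error, which is also the evaluation criterion mentioned earlier for dimension reduction:

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD of the centered data: a linear dimension reduction."""
    mean = X.mean(axis=0)
    Xc = X - mean
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components]          # principal directions
    Z = Xc @ W.T                   # low-dimensional representation
    X_rec = Z @ W + mean           # reconstruction from Z
    return Z, X_rec

# Synthetic data that is essentially 1-D, embedded in 3-D with noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1)) @ np.array([[2.0, 1.0, 0.5]])
X += 0.01 * rng.normal(size=X.shape)

Z, X_rec = pca(X, n_components=1)
print("reconstruction error:", ((X - X_rec) ** 2).mean())  # near zero here
```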

Topic models Decompose images or texts into groups of regions or words that often co-occur (topics)
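
A toy sketch of this decomposition for text (assuming scikit-learn's LatentDirichletAllocation, which is one topic model among several; the vocabulary and counts are made up):

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Tiny document-word count matrix; columns follow a made-up vocabulary.
vocab = ["goal", "match", "team", "vote", "party", "election"]
X = np.array([
    [3, 2, 4, 0, 0, 0],   # sports-like document
    [4, 1, 2, 0, 1, 0],
    [0, 0, 0, 3, 2, 4],   # politics-like document
    [0, 1, 0, 2, 4, 3],
])

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)        # per-document topic proportions
print(np.round(doc_topics, 2))

for topic in lda.components_:            # per-topic word weights
    print([vocab[i] for i in topic.argsort()[::-1][:3]])  # top-3 words
```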

Topic models for images Decompose each image into small set of visual topics Spatial coherence enforced by Markov Random Field Training images labeled with category (topic) names Learning algorithm assigns pixels to categories (topics) Test images do not have any labels [Verbeek & Triggs, CVPR’07]

Density estimation Fit a probability density on the training data –Can be a combination of discrete and continuous data –Good fit: high likelihood on the training data –Smooth function: generalizes to new data Can be used to detect anomalies Many forms of unsupervised learning can be understood as doing density estimation –The type of model differs, though
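
A minimal sketch of density estimation used for anomaly detection (illustrative: a 1-D Gaussian fit by maximum likelihood; the 1% threshold is an arbitrary choice):

```python
import numpy as np

# Fit a 1-D Gaussian density to training data by maximum likelihood,
# then flag test points with low log-density as anomalies.
rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=1.0, size=1000)

mu, sigma = train.mean(), train.std()

def log_density(x):
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

# Arbitrary cutoff: the 1% lowest-density region of the training data.
threshold = np.quantile(log_density(train), 0.01)
test = np.array([4.8, 5.3, 9.5])
print(log_density(test) < threshold)  # [False False True]: 9.5 is anomalous
```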

Different forms of learning Supervised –Classification –Regression Unsupervised –Clustering –Dimension reduction –Topic models –Density estimation Semi-supervised –Combine labeled data with unlabeled data Active learning –Determine the most useful data to label next Many other forms…

Semi-supervised learning Learn from supervised and unsupervised data –Labeled data is often expensive to obtain –Unlabeled data is often cheap to obtain Why should this work? –Unsupervised data is used to learn about the distribution of inputs x –Supervised data is used to learn about the input x given the output y

Example of semi-supervised learning Classification of newsgroup articles into 20 different classes: politics, sports, education,… Use EM to iteratively estimate class label of unlabeled data and update the model Helps when few labeled examples are available [Nigam et al., Machine Learning, Vol. 39, pp 103—134, 2000]
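
A rough sketch of the idea (a hard-assignment, self-training variant of the soft EM procedure used by Nigam et al.; the Gaussian naive Bayes model and toy data are assumptions for illustration):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def semi_supervised(X_lab, y_lab, X_unl, n_iter=5):
    """Hard-EM sketch: alternate between labeling the unlabeled data
    with the current model and refitting on all the data."""
    model = GaussianNB().fit(X_lab, y_lab)
    for _ in range(n_iter):
        y_guess = model.predict(X_unl)             # E-step (hard labels)
        X_all = np.vstack([X_lab, X_unl])
        y_all = np.concatenate([y_lab, y_guess])
        model = GaussianNB().fit(X_all, y_all)     # M-step (refit)
    return model

# Toy data: two labeled points per class, many unlabeled ones.
rng = np.random.default_rng(0)
X_unl = np.vstack([rng.normal(m, 0.5, size=(50, 2)) for m in (0.0, 3.0)])
X_lab = np.array([[0.0, 0.0], [0.2, -0.1], [3.0, 3.0], [2.9, 3.1]])
y_lab = np.array([0, 0, 1, 1])

model = semi_supervised(X_lab, y_lab, X_unl)
print(model.predict([[0.5, 0.5], [2.5, 2.5]]))  # expect [0 1]
```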

Active learning The learning algorithm can choose its own training examples, or ask a “teacher” for an answer on selected inputs –Labeling of the most uncertain images –Labeling of images that maximally reduce uncertainty in the model parameters S. Vijayanarasimhan and K. Grauman, “Cost-Sensitive Active Visual Category Learning,” 2009
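
A minimal sketch of the first strategy, uncertainty sampling (illustrative, assuming scikit-learn; the pool and model are made up): the learner asks for the label of the pool input on which the current classifier is least sure, measured by predictive entropy:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Uncertainty sampling: ask the "teacher" to label the unlabeled input
# the current classifier is least sure about (highest predictive entropy).
rng = np.random.default_rng(0)
X_lab = np.array([[0.0], [1.0]])
y_lab = np.array([0, 1])
X_pool = rng.uniform(0.0, 1.0, size=(100, 1))   # unlabeled pool

model = LogisticRegression().fit(X_lab, y_lab)
p = model.predict_proba(X_pool)
entropy = -(p * np.log(p + 1e-12)).sum(axis=1)
query = X_pool[entropy.argmax()]
print("next input to label:", query)  # near the decision boundary (~0.5)
```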

Generalization The ultimate goal is to do as well as possible on new, unseen data (a test set), but we only have access to labels (“ground truth”) for the training set What makes generalization possible? Inductive bias: the set of assumptions a learner uses to predict the target value for previously unseen inputs –This is the same as modeling, or choosing a target hypothesis class Types of inductive bias –Occam’s razor –Similarity/continuity bias: similar inputs should have similar outputs –…

Achieving good generalization Consideration 1: Bias –How well does your model fit the observed data? –It may be a good idea to accept some fitting error, because it may be due to noise or other “accidental” characteristics of one particular training set Consideration 2: Variance –How robust is the model to the selection of a particular training set? –To put it differently, if we learn models on two different training sets, how consistent will the models be?

Bias/variance tradeoff Models with too many parameters may fit the training data well (low bias), but are sensitive to the choice of training set (high variance): generalization error is due to overfitting. Models with too few parameters may not fit the data well (high bias), but are consistent across different training sets (low variance): generalization error is due to underfitting.

Underfitting and overfitting How to recognize underfitting? –High training error and high test error How to deal with underfitting? –Find a more complex model How to recognize overfitting? –Low training error, but high test error How to deal with overfitting? –Get more training data –Decrease the number of parameters in your model –Regularization: penalize certain parts of the parameter space or introduce additional constraints to deal with a potentially ill-posed problem
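
The diagnosis via training vs. test error can be illustrated with a small sketch (my own example with polynomial fits in NumPy; the degrees and noise level are arbitrary):

```python
import numpy as np

# Compare training vs. test error for polynomials of varying degree:
# underfitting shows both errors high; overfitting shows a large gap.
rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)
x_train = rng.uniform(0, 1, 15); y_train = f(x_train) + 0.1 * rng.normal(size=15)
x_test = rng.uniform(0, 1, 200); y_test = f(x_test) + 0.1 * rng.normal(size=200)

for degree in [1, 3, 12]:  # degree 12 may trigger a conditioning warning
    coeffs = np.polyfit(x_train, y_train, degree)
    err = lambda x, y: ((np.polyval(coeffs, x) - y) ** 2).mean()
    print(degree, round(err(x_train, y_train), 3), round(err(x_test, y_test), 3))
```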

Methodology Distinction between training and testing is crucial –Correct performance on the training set is just memorization! –Memorization is not enough to guarantee good performance on new test data Strictly speaking, the researcher should never look at the test data when designing the system –Generalization performance should be evaluated on a hold-out or validation set –Raises some troubling issues for learning “benchmarks” Source: R. Parr
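
A common way to respect this separation is to lock the test set away and use cross-validation on the remaining data for model selection; a minimal sketch (assuming scikit-learn; the hyperparameter sweep is illustrative):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_digits(return_X_y=True)

# Lock the test set away first; model selection must never touch it.
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.2,
                                                random_state=0)

# Use cross-validation on the remaining data to compare model choices...
for C in [0.01, 1.0, 100.0]:
    scores = cross_val_score(LogisticRegression(C=C, max_iter=1000),
                             X_dev, y_dev, cv=5)
    print(C, scores.mean())

# ...and evaluate on the test set only once, for the final chosen model.
```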

Plan for this course
1) Introduction to machine learning
2) Clustering techniques: k-means, Gaussian mixture density
3) Gaussian mixture density continued: parameter estimation with EM, Fisher kernels
4) Classification techniques 1: introduction, generative methods, semi-supervised
5) Classification techniques 2: discriminative methods, kernels
6) Decomposition of images: topic models, …