Slide 1: Online and Batch Learning of Pseudo-Metrics
Shai Shalev-Shwartz, Hebrew University, Jerusalem
Joint work with Yoram Singer (Google Inc.) and Andrew Y. Ng (Stanford University)

Slide 2: Motivating Example

Slide 3: Our Technique
Map instances into a space in which distances correspond to labels.

Slide 4: Outline
- Distance learning setting
- Large margin for distances
- An online learning algorithm
- Online loss analysis
- A dual version
- Experiments: online (document filtering) and batch (handwritten digit recognition)

Slide 5: Problem Setting
Training examples: two instances $x, x' \in \mathbb{R}^n$ together with a similarity label $y \in \{+1, -1\}$ ($+1$ if the pair is similar, $-1$ if dissimilar).
Hypothesis class: pseudo-metrics of the form $d_A(x, x') = \sqrt{(x - x')^\top A \, (x - x')}$, where $A$ is a symmetric positive semi-definite matrix.
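A minimal sketch of this hypothesis class in Python (NumPy assumed; the function name is mine, not from the slides):

```python
import numpy as np

def pseudo_metric(A, x, x_prime):
    """d_A(x, x') = sqrt((x - x')^T A (x - x')) for a symmetric PSD matrix A."""
    v = x - x_prime
    return np.sqrt(v @ A @ v)

# With A = I this reduces to plain Euclidean distance.
x, xp = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(pseudo_metric(np.eye(2), x, xp))   # -> 1.414...
```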

Slide 6: Large Margin for Pseudo-Metrics
A sample $S$ is $\gamma$-separated w.r.t. a metric $d_A$ and a threshold $b$ if every pair in $S$ satisfies $y \left( b - d_A^2(x, x') \right) \ge \gamma$: similar pairs are at squared distance at most $b - \gamma$ and dissimilar pairs at squared distance at least $b + \gamma$.

Slide 7: Batch Formulation
Maximize the margin $\gamma$ over $(A, b)$ subject to $y_\tau \left( b - d_A^2(x_\tau, x'_\tau) \right) \ge \gamma$ for all training pairs $\tau$, and $A \succeq 0$.

Slide 8: Pseudo-metric Online Learning Algorithm (POLA)
For $\tau = 1, 2, \ldots$:
- Get two instances $x_\tau, x'_\tau$
- Calculate the distance $d^2_{A_\tau}(x_\tau, x'_\tau)$
- Predict: the pair is similar iff $d^2_{A_\tau}(x_\tau, x'_\tau) \le b_\tau$
- Get the true label $y_\tau$ and suffer the hinge loss $\ell_\tau = \max\left\{0, \; y_\tau \left( d^2_{A_\tau}(x_\tau, x'_\tau) - b_\tau \right) + \gamma \right\}$
- Update the matrix and threshold (a sketch of one round follows):
  - If $y_\tau = +1$ we want $d^2(x_\tau, x'_\tau) \le b - \gamma$
  - If $y_\tau = -1$ we want $d^2(x_\tau, x'_\tau) \ge b + \gamma$
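A sketch of one POLA round under the conventions above, with the margin fixed at $\gamma = 1$ (an assumption; the slide leaves it symbolic) and names of my choosing:

```python
import numpy as np

def pola_step(A, b, x, x_prime, y, gamma=1.0):
    """One POLA round: predict, suffer hinge loss, update (A, b).
    Convention: y = +1 for similar pairs, y = -1 for dissimilar."""
    v = x - x_prime
    d2 = v @ A @ v                             # d_A^2(x, x')
    y_hat = 1 if d2 <= b else -1               # "similar" iff d^2 <= threshold
    loss = max(0.0, y * (d2 - b) + gamma)      # hinge loss
    if loss > 0.0:
        # Step 1: project (A, b) onto the half-space of zero-loss pairs.
        # The constraint y*(<A, vv^T> - b) + gamma <= 0 is linear in (A, b)
        # with normal (y vv^T, -y), whose squared norm is ||v||^4 + 1.
        alpha = loss / (np.dot(v, v) ** 2 + 1.0)
        A = A - alpha * y * np.outer(v, v)
        b = b + alpha * y
        # Step 2: project A back onto the PSD cone (see slides 9-10).
        w, U = np.linalg.eigh(A)
        A = (U * np.clip(w, 0.0, None)) @ U.T
    return A, b, y_hat, loss
```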

Slide 9: Core Update: Two Projections
The projection of a vector $v$ onto a closed convex set $C$ is $P_C(v) = \arg\min_{w \in C} \|v - w\|$.
Two-step update: 1) project onto a half-space; 2) project onto the PSD cone.

Slide 10: Core Update: Two Projections
Start with $(A_\tau, b_\tau)$. An example defines a half-space $C_\tau = \{(A, b) : \ell_\tau(A, b) = 0\}$ of zero-loss pairs. $(\hat{A}, \hat{b})$ is the projection of $(A_\tau, b_\tau)$ onto this half-space, and $A_{\tau+1}$ is the projection of $\hat{A}$ onto the PSD cone. [Figure: the PSD cone and the set of all zero-loss matrices.]
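The second projection is the standard Frobenius-norm projection onto the PSD cone: diagonalize the symmetric matrix and clip its negative eigenvalues at zero. A self-contained sketch:

```python
import numpy as np

def project_psd(M):
    """Frobenius projection of a symmetric matrix onto the PSD cone:
    keep the eigenvectors, zero out the negative eigenvalues."""
    w, U = np.linalg.eigh(M)
    return (U * np.clip(w, 0.0, None)) @ U.T

# Example: eigenvalues of [[1, 2], [2, 1]] are 3 and -1;
# the projection keeps only the component with eigenvalue 3.
M = np.array([[1.0, 2.0], [2.0, 1.0]])
print(np.linalg.eigvalsh(project_psd(M)))   # -> [0. 3.]
```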

Slide 11: Online Learning
Goal: minimize the cumulative loss.
Why online?
- Online processing tasks (e.g. text filtering)
- Simple to implement
- Memory- and run-time efficient
- Worst-case bounds on performance
- Online-to-batch conversions

Slide 12: Online Loss Bound
For any sequence of examples with bounded norms, and for any fixed matrix $A^*$ and threshold $b^*$, the cumulative loss suffered by POLA is bounded in terms of the loss suffered by $(A^*, b^*)$ and the "complexity" $\|A^*\|_F^2 + (b^*)^2$ of $(A^*, b^*)$. The loss bound does not depend on the dimension.
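The displayed inequality did not survive the transcript; based on the surviving annotations ("loss suffered by", "'complexity' of"), it has roughly the following shape, where $(A^*, b^*)$ is an arbitrary fixed competitor, $R$ bounds the instance norms, and the constant $c$ is not recoverable from the slide:

$$\sum_{\tau} \ell_\tau(A_\tau, b_\tau) \;\le\; \sum_{\tau} \ell_\tau(A^*, b^*) \;+\; c \, R^2 \left( \|A^*\|_F^2 + (b^*)^2 \right).$$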

Slide 13: Incorporating Kernels
The matrix $A$ maintained by POLA can be written as $A = \sum_{i,j} \alpha_{i,j} \, x_i x_j^\top$, where the $x_i$ are previously observed instances. Therefore the squared distance depends on the instances only through inner products and can be kernelized:
$d_A^2(x, x') = \sum_{i,j} \alpha_{i,j} \left( K(x, x_i) - K(x', x_i) \right) \left( K(x, x_j) - K(x', x_j) \right).$
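Given the expansion above, the squared distance can be computed from kernel evaluations alone. A sketch, where `alpha` is the coefficient matrix and `K_x[i]` stands for $K(x, x_i)$ (the names are mine):

```python
import numpy as np

def kernel_dist2(alpha, K_x, K_xp):
    """Squared pseudo-distance when A = sum_{i,j} alpha[i, j] x_i x_j^T.
    K_x[i] = K(x, x_i) and K_xp[i] = K(x', x_i) are precomputed kernel
    evaluations against the stored instances x_i."""
    g = K_x - K_xp          # g_i = K(x, x_i) - K(x', x_i)
    return g @ alpha @ g    # sum_{i,j} alpha[i, j] * g_i * g_j
```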

Slide 14: Online Experiments
Task: document filtering according to topics.
Dataset: Reuters-21578 (about 21,000 documents); documents labeled as relevant or irrelevant; few relevant documents (1%-10% of the entire set).
Algorithms:
- POLA
- 1-Nearest Neighbor (1-NN)
- Perceptron algorithm
- Perceptron Algorithm with Uneven Margins (PAUM) (Li, Zaragoza, Herbrich, Shawe-Taylor, Kandola)

Slide 15: POLA for Document Filtering
- Get a document
- Calculate the distance to the relevant documents observed so far, using the current matrix
- Predict: the document is relevant iff the distance to the closest relevant document is smaller than the current threshold
- Get the true label
- Update the matrix and threshold
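A sketch of this prediction rule, assuming documents are represented as vectors (e.g. TF-IDF; the slides do not specify the representation):

```python
import numpy as np

def is_relevant(A, b, doc, relevant_docs):
    """Predict relevance: a document is relevant iff its pseudo-distance
    to the closest relevant document seen so far is below the threshold."""
    if len(relevant_docs) == 0:
        return False   # no relevant documents observed yet
    d2_min = min((doc - r) @ A @ (doc - r) for r in relevant_docs)
    return d2_min < b
```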

Slide 16: Document Filtering Results
[Three scatter plots: 1-NN error vs. POLA error, Perceptron error vs. POLA error, and PAUM error vs. POLA error.] Each blue point corresponds to one topic; the y-axis designates the error of POLA; points beneath the black diagonal line mean that POLA wins.

Slide 17: Batch Experiments
Task: handwritten digit recognition.
Dataset: MNIST; 45 binary classification problems (all pairs of digits); 10,000 training examples; 10,000 test examples.
Algorithms: k-NN with various metrics:
- Pseudo-metric learned by POLA
- Euclidean distance
- Metric induced by Fisher Discriminant Analysis (FDA)
- Metric learned by Relevant Component Analysis (RCA) (Bar-Hillel, Hertz, Shental, and Weinshall)
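Since $d_A(x, x') = \|A^{1/2} x - A^{1/2} x'\|$ (see slide 22), k-NN under the learned pseudo-metric reduces to Euclidean k-NN after mapping every point through $A^{1/2}$. A sketch using scikit-learn, which is my choice of library, not the authors':

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def sqrt_psd(A):
    """PSD square root: A^{1/2} = U diag(sqrt(w)) U^T."""
    w, U = np.linalg.eigh(A)
    return (U * np.sqrt(np.clip(w, 0.0, None))) @ U.T

def knn_with_learned_metric(A, X_train, y_train, X_test, k=3):
    """Euclidean k-NN in the space mapped through A^{1/2},
    equivalent to k-NN under the pseudo-metric d_A."""
    B = sqrt_psd(A)
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(X_train @ B, y_train)     # B is symmetric, so B == B.T
    return clf.predict(X_test @ B)
```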

Slide 18: MNIST Results
[Three scatter plots: Euclidean-distance error, FDA error, and RCA error, each vs. POLA error.] RCA was applied after using PCA as a pre-processing step. Each blue point corresponds to one binary classification problem; the y-axis designates the error of POLA; points beneath the black diagonal line mean that POLA wins.

Slide 19: Experiments: Dimensionality Reduction
[Figure comparing the low-dimensional embeddings produced by PCA and by POLA.]

Slide 20: Toy Problem
[Figure: a color-coded matrix of Euclidean distances between pairs of images.]

Slide 21: Metric found by POLA

Slide 22: Mapping found by POLA
Our pseudo-metric factors through a linear map: since $A \succeq 0$, we can write $d_A(x, x') = \| A^{1/2} x - A^{1/2} x' \|_2$.
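The identity follows from the PSD square root of $A$:

$$d_A(x, x') = \sqrt{(x - x')^\top A^{1/2} A^{1/2} (x - x')} = \left\| A^{1/2} x - A^{1/2} x' \right\|_2,$$

so POLA can be read as learning a linear embedding $x \mapsto A^{1/2} x$ under which plain Euclidean distance respects the labels.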

Slide 23: Mapping found by POLA

Slide 24: Summary and Extensions
An online algorithm for learning pseudo-metrics, with formal properties and good experimental results.
Extensions:
- Alternative regularization schemes to the Frobenius norm
- "Learning to learn": learning a metric from one set of classes and applying it to another set of related classes
