
Slide 1: Online and Batch Learning of Pseudo-Metrics
Shai Shalev-Shwartz, Hebrew University, Jerusalem
Joint work with Yoram Singer (Google Inc.) and Andrew Y. Ng (Stanford University)

Slide 2: Motivating Example

Slide 3: Our Technique
Map instances into a space in which distances correspond to labels.

Slide 4: Outline
- Distance learning setting
- Large margin for distances
- An online learning algorithm
- Online loss analysis
- A dual version
- Experiments:
  - Online: document filtering
  - Batch: handwritten digit recognition

Slide 5: Problem Setting
- Training examples: two instances and a similarity label
- Hypothesis class: pseudo-metrics, each parameterized by a symmetric positive semi-definite matrix
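
As a minimal sketch of this hypothesis class, assume the pseudo-metric has the usual quadratic form d_A(x, x') = sqrt((x - x')^T A (x - x')) for a symmetric PSD matrix A. The slide's own formula did not survive in this transcript, so the form, the function name `pseudo_metric`, and the example matrix are all assumptions made for illustration.

```python
import numpy as np

def pseudo_metric(A, x, x_prime):
    """Distance sqrt((x - x')^T A (x - x')) for a symmetric PSD matrix A."""
    v = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    # Clamp tiny negative values caused by floating-point round-off.
    return float(np.sqrt(max(v @ A @ v, 0.0)))

# A singular PSD matrix that ignores the second coordinate (illustrative choice).
A = np.array([[1.0, 0.0],
              [0.0, 0.0]])

d_far = pseudo_metric(A, [0.0, 0.0], [3.0, 4.0])   # only the first coordinate counts
d_zero = pseudo_metric(A, [0.0, 0.0], [0.0, 5.0])  # distinct points, zero distance
```

With this A, two distinct points can be at distance zero, which is the property that separates a pseudo-metric from a metric.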

Slide 6: Large Margin for Pseudo-Metrics
Sample S is γ-separated w.r.t. a metric

Slide 7: Batch Formulation
s.t.

Slide 8: Pseudo-metric Online Learning Algorithm (POLA)
For each round:
- Get two instances
- Calculate the distance between them
- Predict
- Get the true label and suffer hinge loss
- Update the matrix and threshold
  - If: we want that
  - If: we want that
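
The predict-and-suffer-loss part of a round can be sketched as follows. This assumes labels y in {+1, -1} with +1 meaning "similar", a prediction of "similar" when the squared distance is at most the threshold b, and a hinge loss of the form max(0, y (d^2 - b) + 1). These formulas are assumptions, since the slide's expressions were lost in extraction. Under that form, zero loss means d^2 <= b - 1 for a similar pair and d^2 >= b + 1 for a dissimilar pair.

```python
def predict(dist_sq, b):
    """Predict +1 (similar) if the squared distance is within the threshold b."""
    return 1 if dist_sq <= b else -1

def hinge_loss(dist_sq, b, y):
    """Assumed hinge loss: zero only when the pair is on the correct side
    of the threshold b with a margin of at least 1."""
    return max(0.0, y * (dist_sq - b) + 1.0)

# Similar pair (y = +1) that is close, but not by the full margin: positive loss.
loss_similar = hinge_loss(dist_sq=1.5, b=2.0, y=1)
# Dissimilar pair (y = -1) that is far enough beyond the threshold: zero loss.
loss_dissimilar = hinge_loss(dist_sq=3.5, b=2.0, y=-1)
```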

Slide 9: Core Update: Two Projections
Projection of a vector v onto a closed convex set C
Two-step update:
1) Project onto a half-space
2) Project onto the PSD cone
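
The projection of v onto a closed convex set C is the point of C closest to v in Euclidean distance. For a half-space {u : w · u <= c} this projection has a closed form, sketched here with plain Python lists. The names `project_onto_halfspace`, `w`, and `c` are illustrative and not taken from the slides.

```python
def project_onto_halfspace(v, w, c):
    """Euclidean projection of v onto the half-space {u : w . u <= c}."""
    violation = sum(wi * vi for wi, vi in zip(w, v)) - c
    if violation <= 0:
        return list(v)  # v already satisfies the constraint
    # Move along -w just far enough to land on the boundary w . u = c.
    step = violation / sum(wi * wi for wi in w)
    return [vi - step * wi for vi, wi in zip(v, w)]

outside = project_onto_halfspace([2.0, 2.0], w=[1.0, 0.0], c=1.0)
inside = project_onto_halfspace([0.0, 3.0], w=[1.0, 0.0], c=1.0)
```

A point that violates the constraint is moved to the boundary along the normal direction w; a point that already satisfies it is returned unchanged.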

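Slide 10: Core Update: Two Projections
- Start with the current matrix
- An example defines a half-space (all zero-loss matrices)
- The intermediate matrix is the projection of the current matrix onto this half-space
- The new matrix is the projection of the intermediate matrix onto the PSD cone
[Figure: the PSD cone and the half-space of all zero-loss matrices]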
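
The two-step update can be sketched for the pair (A, b) as follows, under several assumptions that are not recoverable from the transcript: the loss is max(0, y (v^T A v - b) + 1) with v = x - x', distances between matrices are measured in the Frobenius norm, and the threshold is kept at b >= 1. With that loss, the half-space step has the closed form A - y α v v^T, b + y α with α = loss / (||v||^4 + 1), which is the generic half-space projection applied to (A, b). The function names are hypothetical.

```python
import numpy as np

def project_psd(M):
    """Frobenius-norm projection of a symmetric matrix onto the PSD cone."""
    M = (M + M.T) / 2.0  # guard against asymmetry from round-off
    eigvals, eigvecs = np.linalg.eigh(M)
    # Zero out negative eigenvalues and rebuild the matrix.
    return (eigvecs * np.clip(eigvals, 0.0, None)) @ eigvecs.T

def pola_update(A, b, x, x_prime, y):
    """One two-projection update of (A, b) on the example (x, x_prime, y)."""
    v = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    loss = max(0.0, y * (v @ A @ v - b) + 1.0)  # assumed hinge loss
    # Step 1: project (A, b) onto the half-space where this example has zero loss.
    alpha = loss / (np.dot(v, v) ** 2 + 1.0)
    A_half = A - y * alpha * np.outer(v, v)
    b_half = b + y * alpha
    # Step 2: project back onto the PSD cone (and, by assumption, onto b >= 1).
    return project_psd(A_half), max(b_half, 1.0)

# A dissimilar pair (y = -1) that the current metric places too close together.
A_new, b_new = pola_update(np.eye(2), 2.0, [1.0, 0.0], [0.0, 0.0], y=-1)
```

In this example the update stretches the metric along the direction x - x' and lowers the threshold, so the pair ends up exactly on the zero-loss boundary.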

Slide 11: Online Learning
Goal: minimize cumulative loss
Why online?
- Online processing tasks (e.g., text filtering)
- Simple to implement
- Memory and run-time efficient
- Worst-case bounds on the performance
- Online-to-batch conversions

Slide 12: Online Loss Bound
- sequence of examples s.t.
- any fixed matrix and threshold
- Then,
- Loss bound does not depend on dimension
Labels on the terms of the bound: Loss suffered by; “Complexity” of

Slide 13: Incorporating Kernels
Matrix A can be written as, where
Therefore:

Slide 14: Online Experiments
- Task: document filtering according to topics
- Dataset: Reuters-21578
  - 10,000 documents
  - Documents labeled as Relevant and Irrelevant
  - Few relevant documents (1%-10% of the entire set)
- Algorithms:
  - POLA
  - 1 Nearest Neighbor (1-NN)
  - Perceptron Algorithm
  - Perceptron Algorithm with Uneven Margins (PAUM) (Li, Zaragoza, Herbrich, Shawe-Taylor, Kandola)

Slide 15: POLA for Document Filtering
- Get a document
- Calculate its distance to the relevant documents observed so far, using the current matrix
- Predict: the document is relevant iff the distance to the closest relevant document is smaller than the current threshold
- Get the true label
- Update the matrix and threshold
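
The prediction rule in these steps can be sketched as below. To stay short, the sketch holds the distance function and threshold fixed, so the final "update the matrix and threshold" step is omitted. It also assumes that a document is predicted irrelevant while no relevant document has been observed yet, which the slide does not specify. The name `filter_stream` and the toy one-dimensional "documents" are hypothetical.

```python
def filter_stream(docs, labels, dist, threshold):
    """Predict relevance online from the closest relevant document seen so far."""
    relevant_seen = []
    predictions = []
    for doc, is_relevant in zip(docs, labels):
        # Distance to the closest relevant document (infinite if none seen yet).
        closest = min((dist(doc, r) for r in relevant_seen), default=float("inf"))
        predictions.append(closest < threshold)
        # The true label is revealed only after the prediction is made.
        if is_relevant:
            relevant_seen.append(doc)
    return predictions

# Toy stream: one-dimensional "documents" compared by absolute difference.
docs = [0.0, 0.1, 5.0, 0.2]
labels = [True, True, False, True]
preds = filter_stream(docs, labels, dist=lambda a, b: abs(a - b), threshold=1.0)
```

The first relevant document is necessarily missed under this assumption, because there is nothing to compare it against.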

Slide 16: Document Filtering Results
- Each blue point corresponds to one topic
- The y-axis shows the error of POLA
- Points beneath the black diagonal line mean that POLA wins
[Figure: three scatter plots of POLA error (y-axis) against 1-NN error, Perceptron error, and PAUM error (x-axes)]

Slide 17: Batch Experiments
- Task: handwritten digit recognition
- Dataset: MNIST
  - 45 binary classification problems (all pairs)
  - 10,000 training examples
  - 10,000 test examples
- Algorithms: k-NN with various metrics:
  - Pseudo-metric learned by POLA
  - Euclidean distance
  - Metric induced by Fisher Discriminant Analysis (FDA)
  - Metric learned by Relevant Component Analysis (RCA) (Bar-Hillel, Hertz, Shental, and Weinshall)
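
Swapping metrics inside k-NN only changes how distances to the training points are computed. A minimal sketch, assuming the metric is given by a PSD matrix A through the squared distance (x - x')^T A (x - x'); the function name `knn_predict` and the toy data are illustrative and unrelated to the MNIST experiments.

```python
from collections import Counter

import numpy as np

def knn_predict(A, X_train, y_train, x, k=3):
    """Majority vote among the k training points closest to x under matrix A."""
    diffs = np.asarray(X_train, dtype=float) - np.asarray(x, dtype=float)
    # Squared distance diffs[i]^T A diffs[i] for every training point i.
    dist_sq = np.einsum("ij,jk,ik->i", diffs, A, diffs)
    nearest = np.argsort(dist_sq, kind="stable")[:k]
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data: the label depends on the first coordinate; the second is noise.
X_train = [[0.0, 0.0], [1.0, 10.0]]
y_train = [0, 1]
query = [0.2, 10.0]

label_euclidean = knn_predict(np.eye(2), X_train, y_train, query, k=1)
label_weighted = knn_predict(np.diag([1.0, 0.0]), X_train, y_train, query, k=1)
```

Here the Euclidean metric is misled by the noisy second coordinate, while a matrix that ignores that coordinate picks the neighbor that matches on the informative one.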

Slide 18: MNIST Results
- Each blue point corresponds to one binary classification problem
- The y-axis shows the error of POLA
- Points beneath the black diagonal line mean that POLA wins
- RCA was applied after using PCA as a preprocessing step
[Figure: three scatter plots of POLA error (y-axis) against Euclidean distance error, FDA error, and RCA error (x-axes)]

Slide 19: Experiments: Dimensionality Reduction
[Figure: two panels, PCA and POLA]

Slide 20: Toy Problem
A color-coded matrix of Euclidean distances between pairs of images

Slide 21: Metric found by POLA

Slide 22: Mapping found by POLA
Our pseudo-metrics:
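
Any PSD matrix A can be factored as A = W^T W, so a quadratic-form pseudo-metric equals the Euclidean distance after mapping x to Wx. This is a standard linear-algebra fact, offered as a sketch of how a mapping can be read off a learned matrix; whether the slides compute their mapping this way cannot be recovered from the transcript. The name `mapping_from_metric` and the tolerance `tol` are assumptions.

```python
import numpy as np

def mapping_from_metric(A, tol=1e-12):
    """Return W with W.T @ W == A (up to round-off) for a symmetric PSD A.

    Directions with (near-)zero eigenvalue are dropped, so W may have fewer
    rows than A has columns, which reduces the dimension of the mapped space.
    """
    eigvals, eigvecs = np.linalg.eigh((A + A.T) / 2.0)
    keep = eigvals > tol
    # Each kept row is an eigenvector scaled by the square root of its eigenvalue.
    return np.sqrt(eigvals[keep])[:, None] * eigvecs[:, keep].T

# Rank-one PSD matrix: the mapped space is one-dimensional.
A = np.array([[1.0, 1.0],
              [1.0, 1.0]])
W = mapping_from_metric(A)

x, x_prime = np.array([1.0, 0.0]), np.array([0.0, 1.0])
mapped_distance = np.linalg.norm(W @ x - W @ x_prime)
```

The two points above differ only along the direction that A ignores, so they are mapped to the same location.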

Slide 23: Mapping found by POLA

Slide 24: Summary and Extensions
- An online algorithm for learning pseudo-metrics
- Formal properties, good experimental results
- Extensions:
  - Alternative regularization schemes to the Frobenius norm
  - “Learning to learn”: learning a metric from one set of classes and applying it to another set of related classes

Slide 25: Hello θ bye θ = w · x

