Published by Iris Raby. Modified over 3 years ago.

1
**Multiclass SVM and Applications in Object Classification**

Yuval Kaminka, Einat Granot. Advanced Topics in Computer Vision Seminar, Faculty of Mathematics and Computer Science, Weizmann Institute, May 2007.

2
**Outline**

- Motivation and Introduction
- Classification Algorithms
  - K-Nearest Neighbors (KNN)
  - SVM
  - Multiclass SVM
  - DAGSVM
  - SVM-KNN
- Results: a taste of the distance
  - Shape distance (shape context, tangent)
  - Texture (texton histograms)

3
**Object Classification**

?

4
**Motivation – Human Visual System**

- Large number of categories (~30,000)
- Discriminative process
- Small set of examples
- Invariance to transformation
- Similarity to prototype instead of features

5
**Similarity to Prototypes Vs Features**

- No need for a feature space
- Easy to enlarge the number of categories
- Includes spatial relations between features
- No need for feature definition, for example in the tangent distance

6
**Distance Function D(·,·)**

- Similarity is defined by a distance function D(·,·)
- Easy to adjust to different types (shape, texture)
- Can include invariance to intra-class transformations

7
**Distance Function – simple example**

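Each image is represented as a vector of values, and the distance is the norm of the difference:

D(image 1, image 2) = || (2.1, 27, 31, 15, 8, ...) - (13, 45, 22.5, 78, 91, ...) || = ?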

9
**A Classic Classification Problem**

- Training set S: (x1, ..., xn), with class labels (y1, ..., yn)
- Given a query image q, determine its label

[Figure: training points X1 to X7 and the query q]

10
**Nearest Neighbor (NN)**

?

11
**K-Nearest Neighbor (KNN)**

? K = 3

12
**K-NN Pros**

- Simple, yet can outperform other methods
- Low complexity: O(D·n), where D is the cost of one distance function calculation
- No need for a feature space definition
- No computational cost for adding new categories
- As n → ∞ (with K growing suitably), the error rate approaches the Bayes optimal rate
- Bayes optimal: a classifier that always chooses the classification with maximum probability, taking all possible hypotheses into account
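As an illustration of the K-NN rule with a pluggable distance function, here is a minimal Python sketch. The names `l2_distance` and `knn_classify` are made up for this example, and plain Euclidean distance only stands in for whichever distance function D is actually used.

```python
import math
from collections import Counter

def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def knn_classify(query, samples, labels, k=3, dist=l2_distance):
    """Label `query` by majority vote among its k nearest training samples.

    One distance evaluation is made per training sample (the O(D*n) term).
    """
    order = sorted(range(len(samples)), key=lambda i: dist(query, samples[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

Adding a new category only means appending samples and labels; nothing has to be retrained.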

13
**K-NN Cons**

[Figure panels: Complete Set, Missing Set; methods compared: NN, SVM]

P. Vincent et al., K-local hyperplane and convex distance nearest neighbor algorithms, NIPS 2001

15
**SVM**

- Two-class classification algorithm
- We're looking for a hyperplane that best separates the classes
- Hyperplane: a subset of vectors of dimension n-1 that defines a separation in n-dimensional space. Linear hyperplane: a hyperplane that passes through the origin.

Some of the slides on SVM are adapted with permission from Martin Law's presentation on SVM.

16
**SVM - Motivation**

The separating hyperplane should be as far away as possible from the data of both classes.

[Figure: Class 1 and Class 2 with separating hyperplanes]

17
**SVM – A learning algorithm**

- KNN: simple classification, no training
- SVM: a learning algorithm with two phases:
  - Training: find the hyperplane
  - Classification: label a new query

18
**SVM – Training Phase**

The separating hyperplane is w^T x + b = 0. We're looking for (w, b) that will:

1. Classify the classes correctly
2. Give maximum margins

19
**1. Correct classification**

- {x1, ..., xn} is our training set
- Correct classification: w^T xi + b > 0 for green points, and w^T xi + b < 0 for red points
- Assume the labels {y1, ..., yn} are from the set {-1, 1}; correct classification then means yi (w^T xi + b) > 0 for every i

20
**2. Margin maximization**

m = ?

[Figure: margin m between Class 1 and Class 2]

21
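**2. Margin maximization**

- The distance from a point z to the hyperplane is |w^T z + b| / ||w||
- We can scale (w, b) → (λw, λb), λ > 0
- This won't change the classification: w^T x + b > 0 ⟺ λw^T x + λb > 0
- Get a desired distance: if |w^T z + b| = a, choose λ = 1/a, so that |λw^T z + λb| = 1
- For the closest point z, the margin then becomes m = 1 / ||w|| (with the rescaled w)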
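The rescaling argument can be checked numerically. The sketch below is only an illustration with made-up values for w, b and z; the helper names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def distance_to_hyperplane(w, b, z):
    """Distance from point z to the hyperplane w^T x + b = 0."""
    return abs(dot(w, z) + b) / math.sqrt(dot(w, w))

def rescale_to_unit_margin(w, b, z):
    """Scale (w, b) by lambda = 1/a, a = |w^T z + b|, so |w^T z + b| becomes 1."""
    lam = 1.0 / abs(dot(w, z) + b)
    return [lam * wi for wi in w], lam * b

# Illustrative values only (not taken from the slides).
w, b, z = [3.0, 4.0], -5.0, [3.0, 4.0]
w2, b2 = rescale_to_unit_margin(w, b, z)
# The geometric distance is unchanged by the scaling,
# and after scaling it equals 1 / ||w2||.
```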

22
**SVM as an Optimization Problem**

- Maximize margins: minimize (1/2)||w||^2
- Correct classification: subject to yi (w^T xi + b) ≥ 1 for every i
- Solve the optimization problem with constraints: we can find a1, ..., an (Lagrange multipliers) such that w = Σ_i ai yi xi

C.J.C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition, 1998.

23
**SVM as an Optimization Problem**

- Maximize margins
- Correct classification
- A classic optimization problem with constraints (s.t. = subject to)

(Presenter note: change x to w, and correct it to xi below.)

24
**SVM as an Optimization Problem**

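- General form: minimize f(x) s.t. constraints on gi(x). There must exist positive a1, ..., an such that the Lagrangian conditions hold.
- In our case: there must exist positive a1, ..., an such that the same conditions hold for the SVM problem.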

25
**Support Vectors**

- The xi with ai > 0 are called support vectors (SV)
- w is determined only by the SV

[Figure: points of Class 1 and Class 2, each marked with a > 0 or a = 0]

26
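**Allowing errors**

[Figure: hyperplane w^T x + b = 0 with margin boundaries w^T x + b = 1 and w^T x + b = -1, between Class 1 and Class 2]

We would now like to minimize (1/2)||w||^2 + C Σ_i ξi, where ξi ≥ 0 is the error (slack) of xi and C weights the errors against the margin.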

27
Allowing errors As before we get: Class 2 Class 1

28
**SVM – Classification phase**

- Compute w^T q + b for the query q
- Classify as class 1 if positive, and class 2 otherwise

29
**Upgrade SVM**

We only need to calculate inner products:

- In order to find a1, ..., an we need to calculate xi^T xj for all i, j
- In order to classify a query q we need to calculate w^T q + b = Σ_i ai yi xi^T q + b

30
**Feature Expansion**

- φ(·) maps the input space to an extended space, for example (x, y) → (1, x, y, xy, x^2, y^2)
- Problem: too expensive!

31
**Solution: The Kernel Trick**

- We only need to calculate inner products of the form φ(xi)^T φ(xj)
- Find a kernel function K such that K(xi, xj) = φ(xi)^T φ(xj)
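To make the kernel idea concrete, the sketch below compares an explicit degree-2 expansion of a 2-D point with the polynomial kernel (1 + p^T q)^2. The √2 scaling of some coordinates is an assumption added here so that the two sides agree exactly; the function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def phi(p):
    """Explicit degree-2 expansion of a 2-D point (x, y)."""
    x, y = p
    s = math.sqrt(2.0)
    return (1.0, s * x, s * y, s * x * y, x * x, y * y)

def poly_kernel(p, q):
    """K(p, q) = (1 + p^T q)^2, computed without ever building phi."""
    return (1.0 + dot(p, q)) ** 2

u, v = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(u), phi(v))   # inner product in the extended space
implicit = poly_kernel(u, v)     # same value from the input space only
```

The kernel needs one inner product in the input space, while the explicit route grows with the size of the extended space.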

32
**The Kernel Trick**

We only need to calculate inner products:

- In order to find a1, ..., an we need to calculate xi^T xj for all i, j: build a kernel matrix M (n×n) with M[i,j] = φ(xi)^T φ(xj) = K(xi, xj)
- In order to classify a query q we need to calculate w^T q + b = Σ_i ai yi K(xi, q) + b

33
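**Inner product → Distance Function**

- We only need to calculate inner products
- In our case: convert to a distance function
- Parallelogram law: ||u+v||^2 + ||u-v||^2 = 2||u||^2 + 2||v||^2
- It follows that u^T v = 1/2·(||u||^2 + ||v||^2 - ||u-v||^2), where ||u||^2 and ||v||^2 are (squared) distances from the "origin" and ||u-v||^2 is the pairwise (squared) distance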

34
**Inner product → Distance Function**

Use the fact that we only need to calculate inner products:

- In order to find a1, ..., an we need to calculate xi^T xj for all i, j: build a matrix D (n×n) with D[i,j] = xi^T xj = 1/2·[d(xi,0) + d(xj,0) - d(xi,xj)], where d is the squared distance
- In order to classify a query q we need to calculate w^T q + b, which again involves only inner products
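A small sketch of building the inner-product matrix from distances alone follows. It assumes d is a squared distance and that some reference point plays the role of the "origin"; `gram_from_distances` and `squared_l2` are hypothetical names used only here.

```python
def squared_l2(u, v):
    """Squared Euclidean distance; stands in for any squared distance d."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def gram_from_distances(points, origin, d=squared_l2):
    """D[i][j] = 1/2 * (d(xi, 0) + d(xj, 0) - d(xi, xj))."""
    to_origin = [d(x, origin) for x in points]
    n = len(points)
    return [
        [0.5 * (to_origin[i] + to_origin[j] - d(points[i], points[j]))
         for j in range(n)]
        for i in range(n)
    ]

points = [(1.0, 2.0), (3.0, -1.0)]
gram = gram_from_distances(points, origin=(0.0, 0.0))
# With Euclidean d this reproduces the ordinary inner products xi^T xj.
```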

35
**SVM Pros and Cons**

Pros:

- Easy to integrate different distance functions
- Fast classification of new objects (depends on the number of SV)
- Good performance even with a small set of examples

Cons:

- Slow training (O(n^2), n = number of vectors in the training set)
- Separates only 2 classes

(Presenter note: mention that the first disadvantage "disappears" when dealing with a small set of examples.)

37
**Multiclass SVM**

- Extend SVM to multi-class separation
- Nc = number of classes

[Figure: Class 1 to Class 5]

38
**Two approaches**

1. Combine multiple binary classifiers: 1-vs-rest, 1-vs-1, DAGSVM
2. Generate one decision function based on a single optimization problem

39
**1-vs-rest**

[Figure: Class 1, Class 2, Class 3, Class 4]

40
**1-vs-rest**

Nc classifiers (w1, w2, w3, w4), one per class.

41
**1-vs-rest**

For a query q, the output of each classifier measures the similarity of q to the support vectors of its class: wi^T q + bi ~ Similarity(q, SVi), for i = 1, ..., 4.

42
**1-vs-rest**

Label(q) = argmax over 1 ≤ i ≤ Nc of {Sim(q, SVi)}

43
**1-vs-rest**

- After training we'll have Nc decision functions: fi(x) = wi^T x + bi
- The class of a query object q is determined by argmax over 1 ≤ i ≤ Nc of {wi^T q + bi}

Pros:

- Only Nc classifiers to be trained and tested

Cons:

- Every classifier uses all vectors for training
- No bound on generalization error
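The argmax rule can be sketched in a few lines of Python. The classifiers are assumed to be already trained and are passed in as (w_i, b_i) pairs; the function name is hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def one_vs_rest_predict(q, classifiers):
    """classifiers[i] = (w_i, b_i); returns the index i maximizing w_i^T q + b_i."""
    scores = [dot(w, q) + b for w, b in classifiers]
    return max(range(len(scores)), key=lambda i: scores[i])
```

Each of the Nc classifiers is evaluated exactly once per query.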

44
**1-vs-rest Complexity**

- For training: Nc classifiers, each using n vectors for finding the hyperplane: O(D·Nc·n^2)
- For classifying new objects: Nc classifiers, each tested once: O(D·M·Nc), where M = max number of SV

45
**1-vs-1**

[Figure: Class 1, Class 2, Class 3, Class 4]

46
**1-vs-1**

Nc(Nc-1)/2 classifiers, one per pair of classes (for example, w1,2 separates Class 1 from Class 2).

47
**1-vs-1 with Max Wins**

For a query q, each pairwise classifier casts one vote. For example, sign(w1,2^T q + b1,2) answers "1 or 2?". Likewise w1,3 answers "1 or 3?", w1,4 "1 or 4?", w2,3 "2 or 3?", w2,4 "2 or 4?" and w3,4 "3 or 4?". Each ☺ in the figure marks one vote.

48
**1-vs-1 with Max Wins**

[Figure: votes (☺) collected per class from w1,2, w1,3, w2,3, w2,4 and w3,4 for the query q]

49
**1-vs-1 with Max Wins**

- After training we'll have Nc(Nc-1)/2 decision functions: fij(x) = sign(wij^T x + bij)
- The class of a query object x is determined by max votes

Pros:

- Every classifier uses a small set of vectors for training

Cons:

- Nc(Nc-1)/2 classifiers to be trained and tested
- No bound on generalization error
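A minimal sketch of Max Wins voting, assuming the pairwise classifiers are already trained and stored in a dictionary keyed by the class pair (i, j). The convention that a positive score votes for i is an assumption of this example, as is the function name.

```python
from collections import Counter

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def max_wins_predict(q, pairwise):
    """pairwise[(i, j)] = (w_ij, b_ij); a positive score votes for i, otherwise for j.

    Ties between classes are broken arbitrarily (first class to reach the top count).
    """
    votes = Counter()
    for (i, j), (w, b) in pairwise.items():
        votes[i if dot(w, q) + b > 0 else j] += 1
    return votes.most_common(1)[0][0]
```

All Nc(Nc-1)/2 classifiers are evaluated for every query, which is the cost listed under Cons.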

50
**1-vs-1 Complexity**

- For training: assume that every class contains ~n/Nc instances; Nc(Nc-1)/2 classifiers, each using ~2n/Nc vectors: O(D·n^2)
- For classifying new objects: Nc(Nc-1)/2 classifiers, each tested once: O(D·M·Nc^2), with M as before

51
**What did we have so far?**

| | 1-vs-1 | 1-vs-rest |
|---|---|---|
| # of classifiers (each needs to be trained and tested) | Nc(Nc-1)/2 | Nc |
| # of vectors for training (per classifier) | ~2n/Nc | n (all vectors) |

No bound on generalization error for either method.

(Presenter note: mention that training on a small number of examples is an advantage in terms of complexity, but can be a disadvantage in terms of performance.)

52
**DAGSVM**

- Uses the 1-vs-1 classifiers w1,2, w1,3, w1,4, w2,3, w2,4, w3,4, arranged in a Decision DAG (DDAG)
- Each node eliminates one class ("not 1", "not 2", "not 3", "not 4")

J. C. Platt et al., Large margin DAGs for multiclass classification. NIPS, 1999.

53
**DDAG on Nc Classes**

- A DAG with a single root node
- Nc(Nc-1)/2 internal nodes; in every node, a binary decision function
- Nc leaves, one per class

[Figure: DDAG for 4 classes with nodes 1 vs 4; 2 vs 4, 1 vs 3; 3 vs 4, 2 vs 3, 1 vs 2; and leaves 4, 3, 2, 1]

54
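**Building the DDAG**

- Start with the list of classes 1 2 3 4; the root node tests the first class against the last (1 vs 4)
- "not 1" leaves the list 2 3 4 (next node: 2 vs 4); "not 4" leaves the list 1 2 3 (next node: 1 vs 3)
- Continue with 3 vs 4, 2 vs 3 and 1 vs 2 until a single class remains (leaves 4, 3, 2, 1)
- Changing the list order has no effect on the results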

55
**Classification using DDAG**

The query q is passed down from the root: w1,4 decides "1 or 4?"; then, for example after "not 4", w1,3 decides "1 or 3?", then w1,2 decides "1 or 2?", and so on until a leaf is reached.

(Presenter note: assuming the classes are separable and the resulting margins are indeed large, it makes sense to "discard", at each step, the class that was not chosen.)
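The list-based evaluation of a DDAG can be sketched as follows. This is an illustration under the same assumed conventions as a plain 1-vs-1 setup: `pairwise[(i, j)]` holds a trained (w_ij, b_ij) with i < j, and a positive score means "prefer i". The function name is hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ddag_predict(q, pairwise, classes):
    """Walk the DDAG: test first vs last class, drop the loser, repeat.

    Returns (predicted class, number of classifiers evaluated).
    """
    candidates = sorted(classes)
    evaluations = 0
    while len(candidates) > 1:
        first, last = candidates[0], candidates[-1]
        w, b = pairwise[(first, last)]
        evaluations += 1
        if dot(w, q) + b > 0:
            candidates.pop()      # "not last"
        else:
            candidates.pop(0)     # "not first"
    return candidates[0], evaluations
```

Every step removes exactly one class, so a query passes through Nc-1 classifiers.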

56
**DAGSVM**

Pros:

- Only Nc-1 classifiers to be tested
- Every classifier uses a small set of vectors for training
- Bound on generalization error (~ margin size)

Cons:

- Fewer vectors for training: worse classifier?
- Nc(Nc-1)/2 classifiers to be trained

57
**DAGSVM Complexity**

- For training: assume that every class contains ~n/Nc instances; Nc(Nc-1)/2 classifiers, each using ~2n/Nc vectors: O(D·n^2)
- For classifying new objects: Nc-1 classifiers, each tested once: O(D·M·Nc), where M = max number of SV

58
**Classification complexity**

| Multiclass SVM | DAGSVM | 1-vs-1 | 1-vs-rest |
|---|---|---|---|
| # of classifiers | Nc(Nc-1)/2 | Nc(Nc-1)/2 | Nc |
| Training complexity | O(D·n^2) | O(D·n^2) | O(D·Nc·n^2) |
| Classification complexity | O(M2·Nc) | O(M2·Nc^2) | O(M1·Nc) |

59
**Multiclass SVM comparison**

[Figure panels: Classification, Training]

60
**Multiclass SVM - Summary**

- Training: 1-vs-rest O(D·Nc·n^2); DAGSVM / 1-vs-1 O(D·n^2)
- Classification: 1-vs-1 O(D·M·Nc^2); DAGSVM / 1-vs-rest O(D·M·Nc)
- Error rates:
  - Bound on generalization error: only for DAGSVM
  - In practice: 1-vs-1 and DAGSVM
- The "one big optimization" methods:
  - Similar error rates
  - Very slow training, limited to small data sets

61
**So what do we have?**

Nearest Neighbor (KNN):

- Fast
- Suitable for multi-class
- Easy to integrate different distance functions
- Problematic with few samples

SVM:

- Good performance even with a small set of examples
- No natural extension to multi-class
- Slow to train

62
**SVM KNN - From coarse to fine**

Suggestion: a hybrid system, KNN followed by SVM.

Zhang et al., SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition, 2006

64
**SVM KNN – General Algorithm**

1. Calculate distance from query to training images

[Figure: training images of Class 1, Class 2, Class 3 and the query image]

65
**SVM KNN – General Algorithm**

1. Calculate distance from query to training images
2. Pick K nearest neighbors

66
**SVM KNN – General Algorithm**

1. Calculate distance from query to training images
2. Pick K nearest neighbors
3. Run SVM (SVM works well with few samples)

67
**SVM KNN – General Algorithm**

1. Calculate distance from query to training images
2. Pick K nearest neighbors
3. Run SVM
4. Label! (in the figure, the query image is labeled Class 2)

68
**Training + Classification**

Steps: calculate distance from query to training images and pick K nearest neighbors (KNN); run SVM and label (SVM).

- Classic process: Training → Classification
- SVM-KNN: Coarse classification → Final classification

69
**Details Details Details**

Step: calculate distance from query to training images (KNN stage).

- Calculating distance is a heavy task
- Compute a crude distance (L2), which is faster, to find K_potential images; ignore all other images
- Compute the accurate distance only relative to the K_potential images

70
**Details Details Details**

Complexity of the distance step: crude distance (L2) over all training images, then accurate distance over the K_potential images only.

71
**Details Details Details**

Step: pick K nearest neighbors.

- If all K neighbors are from the same class: done

72
**Details Details Details**

Step: run SVM.

- Construct the pairwise inner product matrix
- Improvement: cache distance calculations

73
**Details Details Details**

Step: run SVM.

- Selected SVM: DAGSVM (faster)
- Complexity: that of DAGSVM, applied to the K neighbors

[Figure: DDAG with nodes 1 vs 4, 2 vs 4, 1 vs 3, 3 vs 4, 2 vs 3, 1 vs 2]
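The steps described on the "Details" slides can be sketched as one function. This is only an outline under stated assumptions: `crude_dist`, `accurate_dist` and `train_and_classify` are hypothetical callables supplied by the caller (the last one stands for training a multiclass SVM such as DAGSVM on the neighbors and labeling the query), and no caching is shown.

```python
def svm_knn_classify(query, samples, labels, crude_dist, accurate_dist,
                     k, k_potential, train_and_classify):
    """Coarse-to-fine SVM-KNN sketch.

    1. Crude distance to all samples, keep the k_potential closest.
    2. Accurate distance on that shortlist only, keep the k nearest.
    3. If the k neighbors share one label, return it.
    4. Otherwise train a local classifier on the k neighbors and use it.
    """
    indices = range(len(samples))
    shortlist = sorted(indices, key=lambda i: crude_dist(query, samples[i]))
    shortlist = shortlist[:k_potential]
    nearest = sorted(shortlist, key=lambda i: accurate_dist(query, samples[i]))
    nearest = nearest[:k]

    neighbor_labels = [labels[i] for i in nearest]
    if len(set(neighbor_labels)) == 1:
        return neighbor_labels[0]          # unanimous neighbors: done

    neighbor_samples = [samples[i] for i in nearest]
    return train_and_classify(query, neighbor_samples, neighbor_labels)
```

The expensive distance and the SVM training both touch only a small subset of the training set, which is where the speed-up comes from.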

74
**Complexity**

[Formulas: total complexity; DAGSVM training complexity]

75
**SVM KNN – continuum**

- Defining an SVM-KNN continuum: NN → KNN → SVM-KNN → SVM, where K = n (the number of images) gives a plain SVM
- More than a majority vote (MAJ)
- Biological motivation: the human visual system

78
**SVM KNN Summary**

- Similarity to prototypes
- Combining advantages from both methods:
  - NN: fast, suitable for multiclass
  - SVM: performs well with few samples and classes
- Compatible with many types of distance functions
- Biological motivation: human visual system, discriminative process

80
**Distance functions**

D(query image, training image) = ??

- Shape
- Texture

[Figure: training images of Class 1, Class 2, Class 3 and the query image]

81
**Understanding the need - Shape**

Well, which is it??

Capturing the shape of the query:

- Distance 1: Shape context
- Distance 2: Tangent distance

82
**Distance 1: Shape context**

Between a prototype and a query:

1. Find point correspondences
2. Estimate the transformation
3. Distance: based on correspondence quality and transformation quality

Belongie et al., Shape matching and object recognition using shape contexts, IEEE Trans. (2002)

83
**Find correspondences**

- Detector: use edge points
- Descriptor: create a "landscape" for each point
  - Relationship to other edge points
  - Histogram of orientations and distances (for example, bins with Count = 5 and Count = 6)

84
**Find correspondences**

- Detector: use edge points
- Descriptor: create a "landscape" for each point
  - Relationship to other edge points
  - Histogram of orientations and distances
- Matching: compare the histograms of the prototype and the query

85
**Distance 1: Shape context**

Between a prototype and a query:

1. Find point correspondences
2. Estimate the transformation
3. Distance: based on correspondence quality and on the transformation (quality, magnitude)

86
**MNIST – Digit DB**

- 70,000 handwritten digits
- Each image 28x28

87
**MNIST results**

- Human error rate: 0.2%
- Better methods exist (< 1% error)

88
**Distance 2: Tangent distance**

- The distance between a prototype and a query includes invariance to small changes:
  - small rotations
  - translations
  - thickening
- Example: taking the original image and allowing small rotations

Simard et al., Transformation invariance in pattern recognition-tangent distance and tangent propagation. Neural Networks (1998)

89
**Space induced by rotation**

- Applying the rotation function with angle α (for example α = -2, -1, 0, 1) to an image traces a space of dimension 1 in pixel space
- But this space might be nonlinear, therefore we actually look at a linear approximation

90
**Tangent distance – Visual intuition**

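- P is the prototype image and Q is the query image, both points in pixel space; SP and SQ are the spaces of their transformed versions
- Euclidean distance (L2): the distance between P and Q themselves
- Desired distance: the distance between SP and SQ
- But calculating the distance between nonlinear curves can be difficult
- Solution: use a linear approximation, the tangent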

91
**Tangent Distance - General**

- For every image, create a surface allowing transformations: rotations, translations, thickness, etc. (7 dimensions)
- Find a linear approximation: the tangent plane
- Distance: calculate the distance between the linear planes; this has efficient solutions
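As a simplified illustration of the distance between two linear approximations, the sketch below uses a single tangent vector per image instead of the 7 dimensions mentioned on the slide. The tangent vectors are assumed to be given and nonzero; how they are computed from the images is not shown, and the function name is hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def tangent_distance(p, tp, q, tq, eps=1e-12):
    """Two-sided tangent distance with ONE tangent vector per image.

    Minimizes ||(p + a*tp) - (q + b*tq)|| over the scalars a and b
    by solving the 2x2 normal equations in closed form.
    """
    d = [pi - qi for pi, qi in zip(p, q)]
    pp, qq, pq = dot(tp, tp), dot(tq, tq), dot(tp, tq)
    pd, qd = dot(tp, d), dot(tq, d)
    det = pq * pq - pp * qq
    if abs(det) > eps:
        a = (pd * qq - pq * qd) / det
        b = (pq * pd - pp * qd) / det
    else:
        # Parallel tangents: moving along one line is enough.
        a, b = -pd / pp, 0.0
    residual = [di + a * ti - b * si for di, ti, si in zip(d, tp, tq)]
    return math.sqrt(dot(residual, residual))
```

With several tangent vectors per image the same idea becomes a small least-squares problem, which is why efficient solutions exist.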

92
**USPS – digit DB**

- 9298 handwritten digits taken from mail envelopes (US Postal Service)
- Each image 16x16

93
**USPS results**

- Human error rate: 2.5%
- For L2: not optimal; DAGSVM has similar results
- For tangent distance: NN has similar results
- DAGSVM is similar to SVM-KNN, but SVM-KNN is faster
- According to the paper on tangent distance, NN using tangent distance achieved a 2.5% error rate

94
**Understanding Texture**

Texture samples How to represent Texture??

95
**Texture representation**

- Represent a texture patch using its responses to a filter bank (48 filters)
- Each pixel (P1, P2, P3, ...) yields a vector of 48 filter responses, for example (0.1, 0.8, ..., 0.3)
- Motivation: V1

96
**Introducing Textons**

- Filter responses are points in 48-dimensional space; they correspond to the pixels (P1, P2, P3, ...) of one image
- A texture patch is spatially repeating, so the representation is redundant
- Select representative responses (K-means): textons!

T. Leung, J. Malik. Representing and recognizing the visual appearance of materials using three-dimensional textons (2001)

97
**Universal textons**

- "Building blocks" for all textures
- Prototype textures → filter bank → filter responses in 48-dim space → textons T1, T2, T3, T4

98
**Distance 3: Texton histograms**

For a query texture:

1. Create filter responses (query texture → filter bank → filter responses in 48-dim space)
2. Build the texton histogram over T1, T2, T3, T4 (using universal textons)

99
**Distance 3: Texton histograms**

For a query texture:

1. Create filter responses
2. Build the texton histogram (using universal textons)
3. Distance: compare the query texton histogram with the texton histograms of the prototype textures
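The histogram construction and comparison can be sketched as follows. The universal textons are assumed to be given (for example from K-means), the responses here are 2-dimensional only to keep the example short (48-dimensional on the slides), and the chi-square comparison is an assumption of this sketch, since the comparison function is not shown in the text.

```python
def squared_l2(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def texton_histogram(responses, textons):
    """Normalized histogram of nearest-texton assignments."""
    counts = [0] * len(textons)
    for r in responses:
        nearest = min(range(len(textons)),
                      key=lambda t: squared_l2(r, textons[t]))
        counts[nearest] += 1
    total = float(len(responses))
    return [c / total for c in counts]

def chi_square(h1, h2):
    """0.5 * sum((a - b)^2 / (a + b)) over bins, skipping empty bins."""
    return 0.5 * sum((a - b) ** 2 / (a + b)
                     for a, b in zip(h1, h2) if a + b > 0)
```

The resulting value can be used directly as the distance D in KNN, or converted to inner products for the SVM stage.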

100
**CUReT – texture DB**

- 61 textures
- Different viewpoints
- Different illuminations

101
**CUReT Results**

(comparing texton histograms over T1, T2, T3, T4)

102
**Caltech-101 DB**

- 102 categories
- Variations in color, pose, illumination
- Distance function: a combination of texture and shape
- 2 algorithms: Algo. A, Algo. B

[Figure: samples from the Caltech-101 DB]

103
**Caltech-101 Results**

- 66% correct (15 training images)
- Still a long way to go…

[Figure: correct rate (%); series: Algo. B, using only DAGSVM (no KNN)]

104
**Motivation – Human Visual System**

- Large number of categories (~30,000)
- Discriminative process
- Small set of examples
- Invariance to transformation
- Similarity to prototype instead of features

105
**Summary**

- Popular methods:
  - NN
  - SVM
  - DAGSVM: extension to multi-class SVM
- The hybrid method, SVM-KNN:
  - Motivated by human perception (??)
  - Improved complexity
  - Better methods exist?
- A taste of the distance: shape, texture
- Results: classification method, distance function

106
**References**

- H. Zhang, A. C. Berg, M. Maire and J. Malik. SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition. IEEE, Vol. 2, 2006.
- P. Vincent and Y. Bengio. K-local hyperplane and convex distance nearest neighbor algorithms. NIPS, 2001.
- J. C. Platt, N. Cristianini, and J. Shawe-Taylor. Large margin DAGs for multiclass classification. NIPS, 1999.
- C. Hsu and C. Lin. A comparison of methods for multiclass support vector machines. IEEE, Vol. 13, 2002.
- T. Leung and J. Malik. Representing and recognizing the visual appearance of materials using three-dimensional textons. Int. J. Computer Vision, 43(1):29-44, 2001.
- P. Simard, Y. LeCun, J. S. Denker, and B. Victorri. Transformation invariance in pattern recognition-tangent distance and tangent propagation. Neural Networks: Tricks of the Trade, 1998.
- S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. IEEE, Vol. 24, 2002.

107
Thank You!
