
1
Multiclass SVM and Applications in Object Classification Yuval Kaminka, Einat Granot Advanced Topics in Computer Vision Seminar Faculty of Mathematics and Computer Science Weizmann Institute May 2007

2
Outline Motivation and Introduction Classification Algorithms K-Nearest neighbors (KNN) SVM Multiclass SVM DAGSVM SVM-KNN Results - A taste of the distance Shape distance (shape context, tangent) Texture (texton histograms)

3
Object Classification ?

4
Motivation – Human Visual System
- Large number of categories (~30,000)
- A discriminative process
- Small set of examples
- Invariance to transformations
- Similarity to prototypes instead of features

5
Similarity to Prototypes vs. Features
- No need for a feature space
- Easy to enlarge the number of categories
- Includes the spatial relation between features

6
Similarity is Defined by a Distance Function
- Easy to adjust to different data types (shape, texture)
- Can include invariance to intra-class transformations
Distance function: D(·, ·) between two images

7
Distance Function – Simple Example
Represent each image as a vector of values, e.g. (2.1, 27, 31, 15, 8, …) for one image and (13, 45, 22.5, 78, 91, …) for another.
A simple distance is the norm of the difference: D(·, ·) = || (2.1, 27, 31, 15, 8, …) − (13, 45, 22.5, 78, 91, …) ||
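For instance, a minimal sketch of such a distance in Python, treating each image as a flat vector of values (purely illustrative):

```python
import numpy as np

def euclidean_distance(img_a, img_b):
    """D(A, B) = ||A - B||: the L2 norm of the difference of the value vectors."""
    a = np.asarray(img_a, dtype=float).ravel()
    b = np.asarray(img_b, dtype=float).ravel()
    return np.linalg.norm(a - b)

# The slide's toy vectors:
print(euclidean_distance([2.1, 27, 31, 15, 8], [13, 45, 22.5, 78, 91]))
```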

8
Outline Motivation and Introduction Classification Algorithms K-Nearest neighbors (KNN) SVM Multiclass SVM DAGSVM SVM-KNN Results - A taste of the distance Shape distance (shape context, tangent) Texture (texton histograms)

9
A Classic Classification Problem
Training set S: (x_1, …, x_n) with class labels (y_1, …, y_n).
Given a query image q, determine its label.

10
Nearest Neighbor (NN) ?

11
K-Nearest Neighbor (KNN) ? K = 3

12
K-NN Pros
- Simple, yet outperforms many other methods
- Low complexity: O(D·n), where D is the cost of one distance computation
- No need to define a feature space
- No computational cost for adding new categories
- As n → ∞, the error rate approaches the Bayes-optimal rate (for 1-NN, at most twice the Bayes error)
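A minimal K-NN sketch built directly on a distance function — no feature space needed, matching the pros above (illustrative, not the seminar's code):

```python
import numpy as np
from collections import Counter

def knn_classify(q, X, y, dist, k=3):
    """Label a query by majority vote among its k nearest training examples."""
    d = np.array([dist(q, x) for x in X])  # O(D*n): one distance per training example
    nearest = np.argsort(d)[:k]            # indices of the k nearest neighbors
    return Counter(y[i] for i in nearest).most_common(1)[0][0]
```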

13
K-NN Cons
With missing samples, KNN degrades: on a complete training set NN and SVM behave similarly, but when samples are missing the NN decision boundary deteriorates while SVM's holds up.
[Figure: NN vs. SVM decision boundaries on a complete set and on a set with missing samples]
P. Vincent and Y. Bengio, K-local hyperplane and convex distance nearest neighbor algorithms, NIPS 2001

14
Outline Motivation and Introduction Classification Algorithms K-Nearest neighbors (KNN) SVM Multiclass SVM DAGSVM SVM-KNN Results - A taste of the distance Shape distance (shape context, tangent) Texture (texton histograms)

15
SVM
A two-class classification algorithm: we're looking for the hyperplane that best separates the classes.
(Some of the slides on SVM are adapted with permission from Martin Law's presentation on SVM.)

16
SVM – Motivation
Of the many hyperplanes that separate the classes, choose the one that is as far away as possible from the data of both classes.

17
SVM – A Learning Algorithm
KNN: simple classification, no training.
SVM: a learning algorithm with two phases:
1. Training – find the hyperplane
2. Classification – label a new query

18
SVM – Training Phase
The hyperplane: w^T x + b = 0.
We're looking for (w, b) that will:
1. Classify the classes correctly
2. Give maximum margins

19
1. Correct Classification
{x_1, …, x_n} is our training set; assume the labels {y_1, …, y_n} are from the set {−1, 1}.
Correct classification: w^T x_i + b > 0 for one class and w^T x_i + b < 0 for the other, i.e. y_i (w^T x_i + b) > 0 for all i.

20
2. Margin Maximization
What is the margin m between the two classes?

21
2. Margin Maximization
The distance of a point z from the hyperplane is |w^T z + b| / ||w||.
We can scale (w, b) → (λw, λb) with λ > 0 without changing the classification: w^T x + b > 0 ⇔ λw^T x + λb > 0.
So we can fix a desired distance for the closest points: if |w^T z + b| = a, taking λ = 1/a gives |λw^T z + λb| = 1, and the margin becomes m = 2 / ||w||.

22
SVM as an Optimization Problem
Maximize the margin subject to correct classification: a constrained optimization problem, solved with Lagrange multipliers.
C.J.C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, 1998.

23
SVM as an Optimization Problem
A classic optimization problem with constraints:
minimize (1/2)||w||²  (maximize the margin 2/||w||)
s.t. y_i (w^T x_i + b) ≥ 1 for all i  (correct classification)

24
SVM as an Optimization Problem
In general, to minimize f(x) subject to constraints g_i(x) ≥ 0, at the optimum there must exist positive multipliers α_1, …, α_n such that ∇f(x) = Σ_i α_i ∇g_i(x).
In our case f(w) = (1/2)||w||² and g_i(w, b) = y_i (w^T x_i + b) − 1, which gives w = Σ_i α_i y_i x_i.
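For reference, the standard primal and dual formulations the slides sketch (the textbook form, following Burges' tutorial, not copied from the slides):

```latex
% Primal (hard margin):
\min_{w,b}\ \tfrac{1}{2}\|w\|^2
\quad\text{s.t.}\quad y_i(w^T x_i + b) \ge 1,\ i = 1,\dots,n

% Lagrangian, with multipliers \alpha_i \ge 0:
L(w,b,\alpha) = \tfrac{1}{2}\|w\|^2 - \sum_i \alpha_i \bigl[ y_i(w^T x_i + b) - 1 \bigr]

% Setting \partial L / \partial w = 0 and \partial L / \partial b = 0 gives
% w = \sum_i \alpha_i y_i x_i and \sum_i \alpha_i y_i = 0, hence the dual:
\max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i^T x_j
\quad\text{s.t.}\quad \alpha_i \ge 0,\ \sum_i \alpha_i y_i = 0
```

Note that both the dual (training) and the decision function f(q) = sign(Σ_i α_i y_i x_i^T q + b) touch the data only through inner products — the observation the following slides build on.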

25
Support Vectors
The x_i with α_i > 0 are called support vectors (SV); points with α_i = 0 do not contribute, so w = Σ_i α_i y_i x_i is determined only by the SVs.

26
Allowing Errors
Some data may not be separable by the margin hyperplanes w^T x + b = ±1, so we introduce slack variables ξ_i ≥ 0 and relax the constraints to y_i (w^T x_i + b) ≥ 1 − ξ_i.
We would now like to minimize (1/2)||w||² + C Σ_i ξ_i.

27
Allowing Errors
As before we get w = Σ_i α_i y_i x_i, now with the additional constraint 0 ≤ α_i ≤ C on the multipliers.

28
SVM – Classification Phase
For a query q, compute w^T q + b; classify as class 1 if positive, and class 2 otherwise.

29
Upgrade SVM
1. To find α_1, …, α_n we need to calculate x_i^T x_j for all i, j.
2. To classify a query q we need to calculate w^T q + b = Σ_i α_i y_i x_i^T q + b.
In both phases we only need to calculate inner products.

30
Feature Expansion
Map the input space to an extended space with φ(·), e.g. φ(x, y) = (1, x, y, xy, x², y²).
Problem: working in the extended space is too expensive!

31
Solution: The Kernel Trick
Find a kernel function K such that K(x_i, x_j) = φ(x_i)^T φ(x_j).
Since we only need to calculate inner products, we never have to compute φ explicitly.

32
The Kernel Trick
1. To find α_1, …, α_n, build an n×n kernel matrix M: M[i, j] = φ(x_i)^T φ(x_j) = K(x_i, x_j).
2. To classify a query q, calculate w^T φ(q) + b = Σ_i α_i y_i K(x_i, q) + b.
We only need to calculate inner products — via K.

33
Inner Product ↔ Distance Function
Inner products and distances are interchangeable: pairwise distances together with distances from an "origin" determine the inner products.
In our case we want to go from a distance function to the inner products the SVM needs.

34
Inner Product ↔ Distance Function
1. To find α_1, …, α_n, build an n×n matrix D of inner products from the distance function:
   D[i, j] = x_i^T x_j = ½ · [ d(x_i, 0) + d(x_j, 0) − d(x_i, x_j) ]
2. To classify a query q, compute the inner products x_i^T q the same way and use w^T q + b = Σ_i α_i y_i (x_i^T q) + b.
We use the fact that we only need to calculate inner products.
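A minimal sketch of this conversion, assuming d behaves like a squared Euclidean distance and taking the zero vector as the "origin" (both assumptions — the slides leave the origin abstract):

```python
import numpy as np

def kernel_from_distance(X, d):
    """Inner-product matrix from a distance function d, via
    K[i, j] = 0.5 * (d(x_i, 0) + d(x_j, 0) - d(x_i, x_j))."""
    n = len(X)
    origin = np.zeros_like(X[0])               # assumed "origin"
    d0 = np.array([d(x, origin) for x in X])   # distances from the origin
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = 0.5 * (d0[i] + d0[j] - d(X[i], X[j]))
    return K

# Sanity check: with the squared Euclidean distance this recovers exact inner products.
sq_euclid = lambda a, b: float(np.sum((a - b) ** 2))
X = np.random.randn(5, 3)
assert np.allclose(kernel_from_distance(X, sq_euclid), X @ X.T)
```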

35
SVM Pros and Cons
Pros:
- Easy to integrate different distance functions
- Fast classification of new objects (depends on the number of SVs)
- Good performance even with a small set of examples
Cons:
- Slow training: O(n²), n = number of vectors in the training set
- Separates only 2 classes

36
Outline Motivation and Introduction Classification Algorithms K-Nearest neighbors (KNN) SVM Multiclass SVM DAGSVM SVM-KNN Results - A taste of the distance Shape distance (shape context, tangent) Texture (texton histograms)

37
Multiclass SVM
Extend SVM to separate multiple classes.
N_c = number of classes.

38
Two Approaches
1. Combine multiple binary classifiers: 1-vs-rest, 1-vs-1, DAGSVM
2. Generate one decision function from a single optimization problem

39
1-vs-rest Class 1 Class 2 Class 3 Class 4

40
1-vs-rest
N_c classifiers: one hyperplane w_i separating class i from all the rest.

41
1-vs-rest
For a query q, each decision value behaves like a similarity score: w_i^T q + b_i ~ Similarity(q, SV_i).

42
1-vs-rest
Label(q) = argmax_{1 ≤ i ≤ N_c} { Sim(q, SV_i) }

43
1-vs-rest
After training we'll have N_c decision functions: f_i(x) = w_i^T x + b_i.
The class of a query object q is determined by argmax_{1 ≤ i ≤ N_c} { w_i^T q + b_i }.
Pros: only N_c classifiers to be trained and tested.
Cons: every classifier uses all the vectors for training; no bound on the generalization error.

44
1-vs-rest Complexity
For training: N_c classifiers, each using all n vectors to find its hyperplane → O(D·N_c·n²).
For classifying a new object: N_c classifiers, each tested once → O(D·M·N_c), where M = max number of SVs per classifier.
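A minimal 1-vs-rest sketch (illustrative; scikit-learn's LinearSVC stands in for the binary SVM, which is an assumption — any binary SVM would do):

```python
import numpy as np
from sklearn.svm import LinearSVC

class OneVsRest:
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # One binary SVM per class: class c vs. everything else
        self.models_ = [LinearSVC().fit(X, (y == c).astype(int))
                        for c in self.classes_]
        return self

    def predict(self, X):
        # decision_function(X) ~ the signed distance w_i^T x + b_i
        scores = np.stack([m.decision_function(X) for m in self.models_], axis=1)
        return self.classes_[np.argmax(scores, axis=1)]  # argmax over classes
```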

45
1-vs-1 Class 1 Class 2 Class 3 Class 4

46
1-vs-1
N_c(N_c−1)/2 classifiers: one hyperplane w_{i,j} per pair of classes.

47
1-vs-1 with Max Wins
For a query q, each pairwise classifier casts a vote: sign(w_{1,2}^T q + b_{1,2}) answers "1 or 2?", and likewise for every other pair.

48
1-vs-1 with Max Wins
Each of the N_c(N_c−1)/2 classifiers votes; the class collecting the most votes wins.

49
1-vs-1 with Max Wins
After training we'll have N_c(N_c−1)/2 decision functions: f_{ij}(x) = sign(w_{ij}^T x + b_{ij}).
The class of a query object x is determined by max votes.
Pros: every classifier uses only a small set of vectors for training.
Cons: N_c(N_c−1)/2 classifiers to be trained and tested; no bound on the generalization error.

50
1-vs-1 Complexity
For training: assume every class contains ~n/N_c instances; N_c(N_c−1)/2 classifiers, each using ~2n/N_c vectors → O(D·n²) overall.
For classifying a new object: N_c(N_c−1)/2 classifiers, each tested once → O(D·M·N_c²), M as before.
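A minimal 1-vs-1 max-wins sketch (same illustrative assumptions as the 1-vs-rest sketch above):

```python
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

class OneVsOneMaxWins:
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.models_ = {}
        # One binary SVM per pair, trained only on the two classes involved
        for a, b in combinations(self.classes_, 2):
            mask = (y == a) | (y == b)
            self.models_[(a, b)] = LinearSVC().fit(X[mask], (y[mask] == a).astype(int))
        return self

    def predict(self, X):
        idx = {c: i for i, c in enumerate(self.classes_)}
        votes = np.zeros((len(X), len(self.classes_)), dtype=int)
        for (a, b), m in self.models_.items():
            pred = m.predict(X)                 # 1 -> vote for a, 0 -> vote for b
            votes[pred == 1, idx[a]] += 1
            votes[pred == 0, idx[b]] += 1
        return self.classes_[np.argmax(votes, axis=1)]  # max wins
```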

51
What did we have so far?
- 1-vs-rest: N_c classifiers (each needs to be trained and tested), each trained on all n vectors
- 1-vs-1: N_c(N_c−1)/2 classifiers, each trained on only ~2n/N_c vectors
- Neither method has a bound on the generalization error

52
DAGSVM
Combine the 1-vs-1 classifiers in a Decision DAG (DDAG): start at the root node (e.g. "1 vs 4"); each decision eliminates one class ("not 1" or "not 4") and moves to the next node, until a single class remains.
J. C. Platt, N. Cristianini, and J. Shawe-Taylor, Large margin DAGs for multiclass classification. NIPS, 1999.

53
DDAG on N_c Classes
- A DAG with N_c leaves, one per class
- A single root node
- A binary decision function in every node
- N_c(N_c−1)/2 internal nodes

54
Building the DDAG
The order of the class list used to build the DAG can be changed — it has no effect on the results.

55
Classification Using the DDAG
For a query q, evaluate only the classifiers along one root-to-leaf path, e.g. "1 or 4?" → "1 or 3?" → "1 or 2?" — N_c−1 evaluations in total.

56
DAGSVM
Pros:
- Only N_c−1 classifiers to be tested
- Every classifier uses a small set of vectors for training
- A bound on the generalization error (related to the margin sizes)
Cons:
- Fewer vectors for training → a worse classifier?
- N_c(N_c−1)/2 classifiers to be trained

57
DAGSVM Complexity
For training: assume every class contains ~n/N_c instances; N_c(N_c−1)/2 classifiers, each using ~2n/N_c vectors → O(D·n²), as in 1-vs-1.
For classifying a new object: only N_c−1 classifiers are tested → O(D·M·N_c), M = max number of SVs.
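A sketch of the DDAG idea (training is identical to 1-vs-1; classification walks a first-vs-last elimination path as in Platt et al.; the LinearSVC choice is again an illustrative assumption):

```python
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

class DAGSVM:
    """Train like 1-vs-1; classify by eliminating one class per decision."""
    def fit(self, X, y):
        self.classes_ = list(np.unique(y))   # kept sorted
        self.models_ = {}
        for a, b in combinations(self.classes_, 2):
            mask = (y == a) | (y == b)
            self.models_[(a, b)] = LinearSVC().fit(X[mask], (y[mask] == a).astype(int))
        return self

    def predict_one(self, x):
        remaining = list(self.classes_)
        while len(remaining) > 1:
            a, b = remaining[0], remaining[-1]      # node: first vs. last class
            if self.models_[(a, b)].predict(x.reshape(1, -1))[0] == 1:
                remaining.pop()                     # vote for a => "not b"
            else:
                remaining.pop(0)                    # vote for b => "not a"
        return remaining[0]                         # after N_c - 1 decisions
```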

58
Multiclass SVM
- 1-vs-rest: N_c classifiers; training O(D·N_c·n²); classification O(D·M·N_c)
- 1-vs-1: N_c(N_c−1)/2 classifiers; training O(D·n²); classification O(D·M·N_c²)
- DAGSVM: N_c(N_c−1)/2 classifiers; training O(D·n²); classification O(D·M·N_c)

59
Multiclass SVM Comparison
[Figure: training and classification time comparison of the three methods]

60
Multiclass SVM – Summary
Training: 1-vs-rest O(D·N_c·n²); DAGSVM / 1-vs-1 O(D·n²).
Classification: 1-vs-1 O(D·M·N_c²); DAGSVM / 1-vs-rest O(D·M·N_c).
Error rates: all methods achieve similar error rates; a bound on the generalization error exists only for DAGSVM.
In practice, 1-vs-1 and DAGSVM are used; the "one big optimization" methods have very slow training and are limited to small data sets.

61
So what do we have?
Nearest Neighbor (KNN):
- Fast
- Suitable for multi-class
- Easy to integrate different distance functions
- Problematic with few samples
SVM:
- Good performance even with a small set of examples
- Easy to integrate different distance functions
- No natural extension to multi-class
- Slow to train

62
SVM-KNN – From Coarse to Fine
Suggestion: a hybrid system — KNN for a coarse pass, then SVM for a fine one.
Zhang et al., SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition, 2006.

63
Outline Motivation and Introduction Classification Algorithms K-Nearest neighbors (KNN) SVM Multiclass SVM DAGSVM SVM-KNN Results - A taste of the distance Shape distance (shape context, tangent) Texture (texton histograms)

64
SVM-KNN – General Algorithm
1. Calculate the distance from the query image to the training images

65
SVM-KNN – General Algorithm
1. Calculate the distance from the query image to the training images
2. Pick the K nearest neighbors

66
SVM-KNN – General Algorithm
1. Calculate the distance from the query image to the training images
2. Pick the K nearest neighbors
3. Run SVM on the K neighbors (SVM works well with few samples)

67
SVM-KNN – General Algorithm
1. Calculate the distance from the query image to the training images
2. Pick the K nearest neighbors
3. Run SVM on the K neighbors
4. Label! (here the query is labeled Class 2)

68
Training + Classification
Classic process: training, then classification.
SVM-KNN: coarse classification (KNN), then training (SVM on the neighbors), then final classification.

69
Details, Details, Details
Step 1 — calculating the distance is a heavy task:
- First compute a crude distance (e.g. plain L2) — faster — to find K_potential candidate images
- Ignore all other images
- Compute the accurate (expensive) distance only relative to the K_potential candidates

70
Details, Details, Details
Step 1 complexity: a crude-distance computation for all n training images, plus accurate-distance computations for the K_potential candidates only.

71
Details, Details, Details
Step 2 — if all K neighbors are from the same class: done, return that label (no SVM needed).

72
Details, Details, Details
Step 3 — construct the pairwise inner-product matrix from the accurate distances (as shown earlier).
Improvement: cache the distance calculations.

73
Details, Details, Details
Step 3 — the selected SVM: DAGSVM (faster classification).
Complexity: DAGSVM training on the K neighbors, i.e. O(D·K²).

74
Complexity
Total per query: crude distances over all n images, accurate distances for the K_potential candidates, plus the DAGSVM training on the K neighbors.
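Putting the steps together, a compact per-query sketch (assumptions: the crude distance is plain L2, scikit-learn's SVC with a precomputed kernel stands in for DAGSVM — SVC votes 1-vs-1 internally — and the distance-to-kernel conversion is the one from the earlier slides):

```python
import numpy as np
from sklearn.svm import SVC

def dists_to_kernel(d_i0, d_j0, D):
    # <x_i, x_j> = 0.5 * (d(x_i, 0) + d(x_j, 0) - d(x_i, x_j))
    return 0.5 * (d_i0[:, None] + d_j0[None, :] - D)

def svm_knn_classify(q, X, y, dist, K=10, K_potential=100):
    # 1. Crude L2 shortlist, then the accurate distance on the shortlist only
    shortlist = np.argsort(np.linalg.norm(X - q, axis=1))[:K_potential]
    acc = np.array([dist(q, X[i]) for i in shortlist])
    nn = shortlist[np.argsort(acc)[:K]]       # 2. K nearest by accurate distance
    if len(np.unique(y[nn])) == 1:            #    all neighbors agree: done
        return y[nn][0]
    # 3. Local multiclass SVM on the K neighbors, with a distance-derived kernel
    origin = np.zeros_like(X[0])
    d0 = np.array([dist(X[i], origin) for i in nn])
    Dnn = np.array([[dist(X[i], X[j]) for j in nn] for i in nn])
    dq = np.array([dist(q, origin)])
    dqn = np.array([dist(q, X[i]) for i in nn])[None, :]
    clf = SVC(kernel="precomputed").fit(dists_to_kernel(d0, d0, Dnn), y[nn])
    return clf.predict(dists_to_kernel(dq, d0, dqn))[0]   # 4. label
```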

75
SVM-KNN – Continuum
Defining an SVM-KNN continuum: K = 1 gives plain NN; K = n (all images) gives plain SVM; intermediate K gives the hybrid.
Biological motivation: the human visual system appears to do more than a simple majority vote.

78
SVM-KNN Summary
- Similarity to prototypes
- Combines advantages of both methods: NN is fast and suitable for multiclass; SVM performs well with few samples and classes
- Compatible with many types of distance functions
- Biological motivation: the human visual system and its discriminative process

79
Outline Motivation and Introduction Classification Algorithms K-Nearest neighbors (KNN) SVM Multiclass SVM DAGSVM SVM-KNN Results - A taste of the distance Shape distance (shape context, tangent) Texture (texton histograms)

80
Distance Functions
Shape and texture: what should D(·, ·) be?

81
Understanding the Need – Shape
Well, which is it? We need a distance that captures the shape.
Distance 1: shape context. Distance 2: tangent distance.

82
Distance 1: Shape Context
1. Find point correspondences between query and prototype
2. Estimate a transformation
3. Distance = correspondence quality + transformation quality
Belongie et al., Shape matching and object recognition using shape contexts, IEEE Trans. PAMI, 2002.

83
Find Correspondences
- Detector: use edge points
- Descriptor: create a "landscape" for each point — its relationship to all other edge points, as a histogram of orientations and distances

84
Find Correspondences
- Detector: use edge points
- Descriptor: a histogram of orientations and distances to the other edge points
- Matching: compare the query and prototype histograms (with the χ² statistic)
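The histogram comparison that the slide's missing formula presumably showed — Belongie et al. use the χ² test statistic (also reused later for texton histograms):

```latex
\chi^2(g, h) \;=\; \frac{1}{2} \sum_{k=1}^{K} \frac{\bigl[g(k) - h(k)\bigr]^2}{g(k) + h(k)}
```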

85
Distance 1: Shape Context
1. Find point correspondences
2. Estimate a transformation
3. Distance = correspondence quality + transformation quality and magnitude

86
MNIST – Digit DB 70,000 handwritten digits Each image 28x28

87
MNIST Results
[Figure: error rate (%) per method]
Human error rate: 0.2%. Better methods exist, with error rates below 1%.

88
Distance 2: Tangent Distance
The distance itself includes invariance to small changes: small rotations, translations, thickening.
Simard et al., Transformation invariance in pattern recognition – tangent distance and tangent propagation. Neural Networks, 1998.

89
Space Induced by Rotation
Rotating an image by an angle α traces a one-dimensional curve in pixel space (α = −2, −1, 0, 1, …); the rotation function parameterizes the curve.

90
Tangent Distance – Visual Intuition
The prototype image P and the query image Q each induce a surface (S_P, S_Q) in pixel space; the desired distance is between the surfaces, not between the raw points (the Euclidean L2 distance).
But calculating the distance between non-linear curves can be difficult.
Solution: use a linear approximation — the tangent planes.

91
Tangent Distance – General
- For every image, create a surface allowing transformations: rotations, translations, thickness, etc.
- Find a linear approximation: the tangent plane (7 dimensions, one per transformation)
- Distance: calculate the distance between the linear planes — this has efficient solutions
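In symbols, a standard formulation following Simard et al. (L_P and L_Q are matrices whose columns are the tangent vectors of the allowed transformations at P and Q):

```latex
D_T(P, Q) \;=\; \min_{a,\,b}\ \bigl\|\,(P + L_P\, a) - (Q + L_Q\, b)\,\bigr\|^2
```

Minimizing over the coefficient vectors a and b is a small linear least-squares problem, which is what makes the computation efficient.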

92
USPS – Digit DB
9,298 handwritten digits taken from mail envelopes; each image is 16×16.

93
USPS Results
[Figure: error rate per method]
Human error rate: 2.5%. With the plain L2 distance the results are not optimal, and DAGSVM performs similarly; with the tangent distance, NN alone already achieves similar results.

94
Understanding Texture
How should we represent texture?
[Figure: texture samples]

95
Texture Representation
Represent each pixel by its responses to a filter bank of 48 filters: every pixel P_i in a texture patch yields a 48-dimensional vector of filter responses.

96
Introducing Textons
- Filter responses are points in a 48-dimensional space (one point per pixel of an image)
- A texture patch is spatially repeating, so this representation is redundant
- Select representative responses with K-means — the cluster centers are the textons!
T. Leung and J. Malik, Representing and recognizing the visual appearance of materials using three-dimensional textons, 2001.

97
Universal Textons
Run the filter bank over a set of prototype textures and cluster the pooled responses; the resulting textons (T_1, T_2, T_3, T_4, …) are "building blocks" for all textures.

98
Distance 3: Texton Histograms
For a query texture:
1. Compute the filter responses (in the 48-dimensional space)
2. Build a texton histogram, assigning each pixel to its nearest universal texton

99
Distance 3: Texton Histograms
For a query texture:
1. Compute the filter responses
2. Build the texton histogram (using the universal textons)
3. Distance: compare the query and prototype histograms (χ², as before)
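A sketch of the texton pipeline under stated assumptions — a hypothetical filter_bank list stands in for the 48-filter bank of Leung & Malik, and the scipy/scikit-learn calls are illustrative:

```python
import numpy as np
from scipy.ndimage import convolve
from sklearn.cluster import KMeans

def filter_responses(image, filter_bank):
    """One response map per filter -> array of shape (H, W, n_filters)."""
    return np.stack([convolve(image, f) for f in filter_bank], axis=-1)

def learn_textons(prototype_images, filter_bank, n_textons=64):
    # Pool the per-pixel response vectors from all prototypes, then cluster
    resp = np.concatenate([filter_responses(im, filter_bank).reshape(-1, len(filter_bank))
                           for im in prototype_images])
    return KMeans(n_clusters=n_textons, n_init=10).fit(resp).cluster_centers_

def texton_histogram(image, filter_bank, textons):
    resp = filter_responses(image, filter_bank).reshape(-1, len(filter_bank))
    # Assign every pixel to its nearest texton, then count occurrences
    nearest = np.argmin(((resp[:, None, :] - textons[None]) ** 2).sum(-1), axis=1)
    hist = np.bincount(nearest, minlength=len(textons)).astype(float)
    return hist / hist.sum()

def chi2_distance(g, h, eps=1e-10):
    return 0.5 * np.sum((g - h) ** 2 / (g + h + eps))
```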

100
CUReT – Texture DB
61 textures, under different viewpoints and different illuminations.

101
CUReT Results (comparing texton histograms)
[Figure: error rate per method]

102
Caltech-101 DB
- 102 categories, with variations in color, pose, and illumination
- Distance function: a combination of texture and shape
- 2 algorithms tested: Algo. A and Algo. B
[Figure: samples from the Caltech-101 DB]

103
Caltech-101 Results (15 training images)
[Figure: correct rate (%) per method]
Best result: 66% correct. Algo. B uses only DAGSVM (no KNN). Still a long way to go…

104
Motivation – Human Visual System
- Large number of categories (~30,000)
- A discriminative process
- Small set of examples
- Invariance to transformations
- Similarity to prototypes instead of features

105
Summary
- Popular methods: NN, SVM, and DAGSVM (an extension of SVM to multi-class)
- The hybrid method SVM-KNN: motivated by human perception (??), improved complexity — do better methods exist?
- A taste of the distance: shape and texture distance functions, with results
- The recipe: a classification method + a distance function

106
References
- H. Zhang, A. C. Berg, M. Maire, and J. Malik. SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition. CVPR, Vol. 2, pages 2126–2136, 2006.
- P. Vincent and Y. Bengio. K-local hyperplane and convex distance nearest neighbor algorithms. NIPS, pages 985–992, 2001.
- J. C. Platt, N. Cristianini, and J. Shawe-Taylor. Large margin DAGs for multiclass classification. NIPS, pages 547–553, 1999.
- C. Hsu and C. Lin. A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks, Vol. 13, pages 415–425, 2002.
- T. Leung and J. Malik. Representing and recognizing the visual appearance of materials using three-dimensional textons. Int. J. Computer Vision, 43(1):29–44, 2001.
- P. Simard, Y. LeCun, J. S. Denker, and B. Victorri. Transformation invariance in pattern recognition – tangent distance and tangent propagation. Neural Networks: Tricks of the Trade, pages 239–274, 1998.
- S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. IEEE Trans. PAMI, Vol. 24, pages 509–522, 2002.

107
Thank You!
