
1 Multiclass SVM and Applications in Object Classification. Yuval Kaminka, Einat Granot. Advanced Topics in Computer Vision Seminar, Faculty of Mathematics and Computer Science, Weizmann Institute, May 2007.

2 Outline Motivation and Introduction Classification Algorithms K-Nearest neighbors (KNN) SVM Multiclass SVM DAGSVM SVM-KNN Results - A taste of the distance Shape distance (shape context, tangent) Texture (texton histograms)

3 Object Classification ?

4 Motivation – Human Visual System Large Number of Categories (~30,000) Discriminative Process Small Set of Examples Invariance to transformation Similarity to Prototype instead of Features

5 Similarity to Prototypes vs. Features: no need for a feature space; easy to enlarge the number of categories; includes the spatial relations between features.

6 Similarity is defined by a Distance Function D(·,·): easy to adjust to different types (shape, texture); can include invariance to intra-class transformations.

7 Distance Function – simple example: D(x, y) = || (2.1, 27, 31, 15, 8) − (13, 45, 22.5, 78, 91) ||. But how should D(·,·) be defined for a pair of images?

8 Outline Motivation and Introduction Classification Algorithms K-Nearest neighbors (KNN) SVM Multiclass SVM DAGSVM SVM-KNN Results - A taste of the distance Shape distance (shape context, tangent) Texture (texton histograms)

9 A Classic Classification Problem. Training set S = (x_1, ..., x_n) with class labels (y_1, ..., y_n). Given a query image q, determine its label.

10 Nearest Neighbor (NN) ?

11 K-Nearest Neighbor (KNN) ? K = 3

12 K-NN Pros: simple, yet often outperforms more elaborate methods; low complexity, O(D·n), where D is the cost of one distance-function evaluation; no need to define a feature space; no computational cost for adding new categories; as n → ∞ the error rate converges toward the Bayes-optimal rate.
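As a concrete illustration of the classification rule above, a minimal k-NN sketch with a pluggable distance function; the function and variable names (knn_classify, etc.) are ours, not from the slides.

```python
# A minimal k-NN classifier with a pluggable distance function.
# Illustrative sketch only; the names here are our own.
from collections import Counter

def knn_classify(query, train_set, train_labels, distance, k=3):
    """Label `query` by a majority vote over its k nearest training samples.

    distance: callable D(a, b) -> float, e.g. a shape or texture distance.
    Cost is O(D * n): one distance evaluation per training sample.
    """
    # Distance from the query to every training sample.
    dists = [(distance(query, x), y) for x, y in zip(train_set, train_labels)]
    # Keep the k closest samples and vote on their labels.
    dists.sort(key=lambda t: t[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Example with a plain Euclidean distance on feature vectors.
if __name__ == "__main__":
    euclid = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    X = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.1), (4.9, 5.3)]
    Y = ["red", "red", "green", "green"]
    print(knn_classify((4.8, 5.0), X, Y, euclid, k=3))  # -> "green"
```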

13 K-NN Cons: the decision boundary degrades when training samples near it are missing (figure from P. Vincent et al., K-local hyperplane and convex distance nearest neighbor algorithms, NIPS 2001, comparing NN and SVM boundaries on a complete vs. a missing training set).

14 Outline Motivation and Introduction Classification Algorithms K-Nearest neighbors (KNN) SVM Multiclass SVM DAGSVM SVM-KNN Results - A taste of the distance Shape distance (shape context, tangent) Texture (texton histograms)

15 SVM: a two-class classification algorithm. We are looking for a hyperplane that best separates the classes. (Some of the slides on SVM are adapted, with permission, from Martin Law's presentation on SVM.)

16 SVM – Motivation: among the separating hyperplanes, choose the one that stays as far away as possible from the data of both classes.

17 SVM – A learning algorithm. KNN does simple classification with no training; SVM is a learning algorithm with two phases: 1. Training – find the hyperplane. 2. Classification – label a new query.

18 SVM – Training Phase. We are looking for (w, b) defining the hyperplane w^T x + b = 0 that will: 1. classify the classes correctly; 2. give the maximum margin.

19 1. Correct classification. Let {x_1, ..., x_n} be our training set and assume the labels {y_1, ..., y_n} are from {−1, 1}. Correct classification means w^T x_i + b > 0 for one class and w^T x_i + b < 0 for the other, i.e. y_i(w^T x_i + b) > 0 for all i.

20 2. Margin maximization: how large is the margin m between the hyperplane and the closest points of each class?

21 2. Margin maximization. The distance of a point z from the hyperplane is |w^T z + b| / ||w||. We can scale (w, b) → (λw, λb) with λ > 0 without changing the classification: w^T x + b > 0 ⟺ λw^T x + λb > 0. So we may fix the scale: if the closest point z has |w^T z + b| = a, choosing λ = 1/a gives |λw^T z + λb| = 1, and the margin m is then proportional to 1/||w|| – maximizing the margin means minimizing ||w||.

22 SVM as an Optimization Problem. Maximize the margin subject to correct classification: a constrained optimization problem, solved with Lagrangian multipliers α_1, ..., α_n. C.J.C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, 1998.

23 SVM as an Optimization Problem – a classic constrained problem: minimize (1/2)||w||^2 (maximize the margin) s.t. y_i(w^T x_i + b) ≥ 1 for all i (correct classification).

24 SVM as an Optimization Problem. For a problem of the form minimize f(x) s.t. g_i(x) ≥ 0, at the optimum there must exist non-negative multipliers α_1, ..., α_n such that ∇f(x) = Σ_i α_i ∇g_i(x). In our case this gives w = Σ_i α_i y_i x_i and Σ_i α_i y_i = 0.
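For reference, the resulting dual problem, in which the data appear only through inner products (a standard form following the cited Burges tutorial, reconstructed here since the slide equations were lost in the transcript):

```latex
\max_{\alpha}\;\sum_{i=1}^{n}\alpha_i \;-\;\tfrac12\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j\,y_i y_j\,x_i^{\top}x_j
\qquad\text{s.t.}\qquad \alpha_i\ge 0,\quad \sum_{i=1}^{n}\alpha_i y_i=0 .
```

This is why slides 29–34 only ever need the products x_i^T x_j.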

25 Support Vectors. The x_i with α_i > 0 are called support vectors (SV); points with α_i = 0 do not contribute, so w is determined only by the SVs.

26 Allowing errors. With slack variables ξ_i for points that violate the margin (the hyperplanes w^T x + b = ±1 around w^T x + b = 0), we would now like to minimize (1/2)||w||^2 + C·Σ_i ξ_i s.t. y_i(w^T x_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0.

27 Allowing errors. As before we get w = Σ_i α_i y_i x_i, only now the multipliers are bounded: 0 ≤ α_i ≤ C.

28 SVM – Classification phase. For a query q, compute w^T q + b; classify as class 1 if positive, and as class 2 otherwise.

29 Upgrade SVM. 1. In order to find α_1, ..., α_n we need to calculate x_i^T x_j for all i, j. 2. In order to classify a query q we need to calculate w^T q + b = Σ_i α_i y_i x_i^T q + b. So we only need to calculate inner products.

30 Feature Expansion. Map the input space to an extended space with a function φ(·), e.g. φ(x, y) = (1, x, y, xy, x^2, y^2). Problem: computing in the extended space is too expensive!

31 Solution: The Kernel Trick. Since we only need to calculate inner products, find a kernel function K such that K(x_i, x_j) = φ(x_i)^T φ(x_j), and never compute φ explicitly.

32 The Kernel Trick. 1. To find α_1, ..., α_n we need x_i^T x_j for all i, j: build an n×n kernel matrix M with M[i, j] = φ(x_i)^T φ(x_j) = K(x_i, x_j). 2. To classify a query q we need w^T q + b = Σ_i α_i y_i K(x_i, q) + b. Again, only inner products are required.

33 Inner product ⇒ Distance Function. Since we only need inner products, they can be expressed through distances: the distance of each point from an "origin" plus the pairwise distances. In our case we go the other way and convert a distance function into inner products.

34 Inner product ⇒ Distance Function. 1. To find α_1, ..., α_n, build an n×n matrix D with D[i, j] = x_i^T x_j = 1/2·[d(x_i, 0) + d(x_j, 0) − d(x_i, x_j)] (exact when d is a squared Euclidean-type distance). 2. To classify a query q, obtain the inner products x_i^T q the same way and compute w^T q + b. We use the fact that we only need inner products.
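A small sketch of the conversion on this slide: building the inner-product matrix from a distance function and an "origin" point. Illustrative only; the helper names are ours, and d is assumed to be a squared (Euclidean-type) distance so that the identity holds exactly.

```python
# Building an inner-product (Gram/kernel) matrix from a distance function,
# as on the slide: x_i^T x_j = 1/2 * [d(x_i, 0) + d(x_j, 0) - d(x_i, x_j)].
import numpy as np

def gram_from_distance(X, dist_sq, origin):
    """Return the n x n matrix of inner products recovered from distances."""
    n = len(X)
    d0 = np.array([dist_sq(x, origin) for x in X])   # distances to the "origin"
    G = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            G[i, j] = 0.5 * (d0[i] + d0[j] - dist_sq(X[i], X[j]))
    return G

if __name__ == "__main__":
    sq_euclid = lambda a, b: float(np.sum((np.asarray(a) - np.asarray(b)) ** 2))
    X = [np.array([1.0, 2.0]), np.array([3.0, -1.0]), np.array([0.5, 0.5])]
    G = gram_from_distance(X, sq_euclid, origin=np.zeros(2))
    # For the squared Euclidean distance this recovers the ordinary dot products.
    print(np.allclose(G, np.array([[x @ y for y in X] for x in X])))  # True
```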

35 SVM Pros and Cons. Pros: easy to integrate different distance functions; fast classification of new objects (depends on the number of SVs); good performance even with a small set of examples. Cons: slow training (O(n^2), n = number of vectors in the training set); separates only 2 classes.

36 Outline Motivation and Introduction Classification Algorithms K-Nearest neighbors (KNN) SVM Multiclass SVM DAGSVM SVM-KNN Results - A taste of the distance Shape distance (shape context, tangent) Texture (texton histograms)

37 Multiclass SVM. Extend SVM to separate multiple classes; N_c = number of classes.

38 Two approaches: (a) combine multiple binary classifiers – 1-vs-rest, 1-vs-1, DAGSVM; (b) generate one decision function from a single optimization problem.

39 1-vs-rest Class 1 Class 2 Class 3 Class 4

40 1-vs-rest: train N_c classifiers, one hyperplane w_i separating class i from all the rest.

41 1-vs-rest: for a query q, each decision value w_i^T q + b_i behaves like a similarity Similarity(q, SV_i) between q and the support vectors of class i.

42 1-vs-rest: Label(q) = argmax_{1 ≤ i ≤ N_c} {Sim(q, SV_i)}.

43 1-vs-rest. After training we will have N_c decision functions f_i(x) = w_i^T x + b_i; the class of a query q is argmax_{1 ≤ i ≤ N_c} {w_i^T q + b_i}. Pros: only N_c classifiers to be trained and tested. Cons: every classifier uses all vectors for training; no bound on the generalization error.

44 1-vs-rest Complexity. Training: N_c classifiers, each using all n vectors to find its hyperplane – O(D·N_c·n^2). Classification: N_c classifiers, each tested once – O(D·M·N_c), with M = max number of SVs per classifier.
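A hedged sketch of the 1-vs-rest scheme just described; the wrapper functions are ours, and scikit-learn's SVC is used purely as a stand-in binary SVM with a decision value.

```python
# One-vs-rest multiclass classification built from binary SVMs (sketch).
import numpy as np
from sklearn.svm import SVC

def train_one_vs_rest(X, y, classes):
    """Train N_c binary classifiers: class c vs. all the rest."""
    classifiers = {}
    for c in classes:
        target = (y == c).astype(int)          # 1 for class c, 0 for everything else
        clf = SVC(kernel="linear")
        clf.fit(X, target)
        classifiers[c] = clf
    return classifiers

def predict_one_vs_rest(classifiers, q):
    """Label(q) = argmax over classes of the decision value w_c^T q + b_c."""
    q = np.atleast_2d(q)
    scores = {c: clf.decision_function(q)[0] for c, clf in classifiers.items()}
    return max(scores, key=scores.get)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(loc=i * 3, size=(20, 2)) for i in range(3)])
    y = np.repeat([0, 1, 2], 20)
    ovr = train_one_vs_rest(X, y, classes=[0, 1, 2])
    print(predict_one_vs_rest(ovr, X[25]))     # expected: 1
```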

45 1-vs-1 Class 1 Class 2 Class 3 Class 4

46 1-vs-1: train N_c(N_c − 1)/2 classifiers, one hyperplane w_{i,j} for every pair of classes (i, j).

47 1-vs-1 with Max Wins: for a query q, each pairwise classifier casts a vote – sign(w_{1,2}^T q + b_{1,2}) answers "1 or 2?", and similarly for 1 or 3, 1 or 4, 2 or 3, 2 or 4, 3 or 4.

48 1-vs-1 with Max Wins: the query is assigned to the class that collects the most votes.

49 1-vs-1 with Max Wins. After training we will have N_c(N_c − 1)/2 decision functions f_ij(x) = sign(w_ij^T x + b_ij); the class of a query is determined by max-votes. Pros: every classifier uses a small set of vectors for training. Cons: N_c(N_c − 1)/2 classifiers to be trained and tested; no bound on the generalization error.

50 1-vs-1 Complexity. Training: assume every class contains ~n/N_c instances; N_c(N_c − 1)/2 classifiers, each using ~2n/N_c vectors – O(D·n^2) in total. Classification: N_c(N_c − 1)/2 classifiers, each tested once – O(D·M·N_c^2), with M as before.
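A corresponding sketch of 1-vs-1 with max-wins voting (same caveats: the helper names are ours, labels are assumed to be integers, and scikit-learn's SVC stands in for any binary SVM).

```python
# 1-vs-1 "max wins": one binary classifier per pair of classes,
# each casting one vote per query (sketch).
from collections import Counter
from itertools import combinations
import numpy as np
from sklearn.svm import SVC

def train_one_vs_one(X, y, classes):
    """Train N_c(N_c - 1)/2 classifiers, each on the samples of two classes only."""
    classifiers = {}
    for a, b in combinations(classes, 2):
        mask = (y == a) | (y == b)             # keep only the two classes' samples
        clf = SVC(kernel="linear")
        clf.fit(X[mask], y[mask])
        classifiers[(a, b)] = clf
    return classifiers

def predict_max_wins(classifiers, q):
    """Each pairwise classifier votes for one of its two classes; most votes wins."""
    q = np.atleast_2d(q)
    votes = Counter(int(clf.predict(q)[0]) for clf in classifiers.values())
    return votes.most_common(1)[0][0]
```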

51 What did we have so far?
                                              1-vs-rest          1-vs-1
# of classifiers (each trained and tested)    N_c                N_c(N_c − 1)/2
# of vectors for training (per classifier)    n (all vectors)    ~2n/N_c
Neither method gives a bound on the generalization error.

52 DAGSVM: arrange the 1-vs-1 classifiers (w_{1,2}, ..., w_{3,4}) in a Decision DAG (DDAG) – the root tests 1 vs 4, its children test 1 vs 3 and 2 vs 4, and so on down to the leaves; each edge eliminates one class ("not 1", "not 4", ...). J. C. Platt et al., Large margin DAGs for multiclass classification, NIPS 1999.

53 DDAG on N_c Classes: a DAG with a single root node, N_c(N_c − 1)/2 internal nodes – each holding a binary decision function – and N_c leaves, one per class.

54 Building the DDAG: start from the ordered list of classes (1, 2, 3, 4); the root tests the first class against the last, and each answer removes one class from the list. Changing the list order does not affect the results.

55 Classification using DDAG: the query q walks a single path from the root to a leaf – e.g. 1 or 4? → 1 or 3? → 1 or 2? – so only N_c − 1 binary decisions are evaluated.
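A minimal sketch of DDAG evaluation using the list-elimination formulation of Platt et al.; the pairwise classifiers are assumed to be given as callables returning the winning class label.

```python
# Decision-DAG (DDAG) evaluation by list elimination: keep a list of candidate
# classes; each pairwise test removes one of them, so only N_c - 1 binary
# decisions are made per query. Sketch; the names are ours.

def classify_ddag(pairwise, q, classes):
    """pairwise[(a, b)] is a callable returning a or b for the query q."""
    remaining = list(classes)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]            # test first vs. last candidate
        decide = pairwise[(a, b)] if (a, b) in pairwise else pairwise[(b, a)]
        winner = decide(q)
        remaining.remove(b if winner == a else a)     # drop the losing class
    return remaining[0]

# Tiny usage example with hand-made pairwise rules on 1-D "images":
if __name__ == "__main__":
    def make_rule(a, b):
        return lambda x: a if abs(x - a) < abs(x - b) else b
    rules = {(a, b): make_rule(a, b) for a in range(1, 5) for b in range(a + 1, 5)}
    print(classify_ddag(rules, 2.2, classes=[1, 2, 3, 4]))   # -> 2
```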

56 DAGSVM. Pros: only N_c − 1 classifiers to be tested; every classifier uses a small set of vectors for training; a bound on the generalization error (related to the margin size). Cons: fewer vectors for training – possibly a weaker classifier? N_c(N_c − 1)/2 classifiers still have to be trained.

57 DAGSVM Complexity. Training: as in 1-vs-1, assume every class contains ~n/N_c instances; N_c(N_c − 1)/2 classifiers, each using ~2n/N_c vectors – O(D·n^2) in total. Classification: N_c − 1 classifiers, each tested once – O(D·M·N_c), with M = max number of SVs.

58 Multiclass SVM
                              1-vs-rest        1-vs-1             DAGSVM
# of classifiers              N_c              N_c(N_c − 1)/2     N_c(N_c − 1)/2 trained, N_c − 1 tested
Training complexity           O(D·N_c·n^2)     O(D·n^2)           O(D·n^2)
Classification complexity     O(D·M·N_c)       O(D·M·N_c^2)       O(D·M·N_c)

59 Multiclass SVM comparison (charts: measured training and classification times for the three methods).

60 Multiclass SVM – Summary.
Training: 1-vs-rest O(D·N_c·n^2); DAGSVM / 1-vs-1 O(D·n^2).
Classification: 1-vs-1 O(D·M·N_c^2); DAGSVM / 1-vs-rest O(D·M·N_c).
Error rates: similar for all; a bound on the generalization error only for DAGSVM. In practice, 1-vs-1 and DAGSVM are used. The "one big optimization" methods give similar error rates but very slow training – limited to small data sets.

61 So what do we have? Nearest Neighbor (KNN): fast; suitable for multi-class; easy to integrate different distance functions; problematic with few samples. SVM: good performance even with a small set of examples; easy to integrate different distance functions; no natural extension to multi-class; slow to train.

62 SVM-KNN – From coarse to fine. Suggestion: a hybrid system, KNN followed by SVM. Zhang et al., SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition, 2006.

63 Outline Motivation and Introduction Classification Algorithms K-Nearest neighbors (KNN) SVM Multiclass SVM DAGSVM SVM-KNN Results - A taste of the distance Shape distance (shape context, tangent) Texture (texton histograms)

64 SVM-KNN – General Algorithm. Step 1: calculate the distance from the query to all training images (KNN stage).

65 SVM-KNN – General Algorithm. Step 2: pick the K nearest neighbors (KNN stage).

66 SVM-KNN – General Algorithm. Step 3: run a multiclass SVM on the K neighbors only – SVM works well with few samples.

67 SVM-KNN – General Algorithm. Step 4: label the query with the SVM's answer (e.g. query image → Class 2).

68 Training + Classification. Classic process: training, then classification. SVM-KNN reverses the order per query: coarse classification (KNN), then training (SVM on the neighbors), then final classification.
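Putting the four steps together, a simplified SVM-KNN sketch. The helper names are ours; the multiclass SVM trained on the neighborhood is left generic rather than being the exact DAGSVM of Zhang et al., and the crude/accurate two-stage distance of the next slides is omitted.

```python
# SVM-KNN hybrid following the four steps on the slides (sketch).
import numpy as np

def svm_knn_classify(q, train_set, train_labels, distance, train_multiclass_svm, k=10):
    # 1. Distance from the query to every training image.
    dists = np.array([distance(q, x) for x in train_set])
    # 2. Pick the K nearest neighbors.
    idx = np.argsort(dists)[:k]
    neigh = [train_set[i] for i in idx]
    neigh_labels = [train_labels[i] for i in idx]
    # If all K neighbors agree there is nothing left to do (slide 71).
    if len(set(neigh_labels)) == 1:
        return neigh_labels[0]
    # 3. Train a multiclass SVM (e.g. DAGSVM) on the K neighbors only.
    svm = train_multiclass_svm(neigh, neigh_labels)
    # 4. Label the query with the SVM trained on the neighborhood.
    return svm(q)
```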

69 Details (step 1): calculating the accurate distance is a heavy task. First compute a crude, faster distance (plain L2) to find K_potential candidate images and ignore all the others; then compute the accurate distance only relative to those K_potential images.

70 Details (step 1, complexity): the crude distance is evaluated on all n training images, the accurate distance only on the K_potential candidates.

71 Details (step 2): if all K neighbors are from the same class, we are done – no SVM is needed.

72 Details (step 3): construct the pairwise inner-product matrix of the K neighbors (from their pairwise distances, as on slide 34). Improvement: cache the distance calculations.

73 Details (step 3): the selected multiclass SVM is DAGSVM (faster at classification); its training now runs on only K samples.

74 Complexity: the total cost per query is the distance computations (crude on all n images, accurate on the K_potential candidates) plus the DAGSVM training and classification on only K samples – small, since K ≪ n.

75 SVM-KNN – continuum. Defining an SVM-KNN continuum over K: K = 1 gives plain NN, K = n (all images) gives plain SVM, and intermediate K gives SVM-KNN – more than a simple majority vote. Biological motivation: the human visual system.

76

77

78 SVM-KNN Summary. Similarity to prototypes; combining the advantages of both methods – NN is fast and suitable for multiclass, SVM performs well with few samples and classes; compatible with many types of distance functions; biological motivation: the human visual system's discriminative process.

79 Outline Motivation and Introduction Classification Algorithms K-Nearest neighbors (KNN) SVM Multiclass SVM DAGSVM SVM-KNN Results - A taste of the distance Shape distance (shape context, tangent) Texture (texton histograms)

80 Distance functions: how should D(·,·) between a query image and the training images be defined? Two examples follow – shape and texture.

81 Understanding the need – Shape. Well, which is it? We need a distance that captures the shape. Distance 1: shape context. Distance 2: tangent distance.

82 Distance 1: Shape context. 1. Find point correspondences between query and prototype. 2. Estimate the aligning transformation. 3. Distance = correspondence quality + transformation quality. Belongie et al., Shape matching and object recognition using shape contexts, IEEE Trans. PAMI, 2002.

83 Find correspondences. Detector: use edge points. Descriptor: create a "landscape" for each point – its relationship to the other edge points, as a histogram of orientations and distances (bin counts).

84 Find correspondences. Detector: edge points; descriptor: the "landscape" histogram of orientations and distances. Matching: compare the histograms of query and prototype points (χ² test) and pair up the most similar ones.
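A simplified sketch of the descriptor and the histogram comparison: a log-polar histogram per edge point and a χ² cost between two histograms. The bin counts and normalization below are our own choices, not the exact settings of Belongie et al.

```python
# Shape-context-style log-polar histogram for one edge point, plus the chi^2
# cost used to compare two such histograms (simplified sketch).
import numpy as np

def shape_context(point, edge_points, n_r=5, n_theta=12):
    """Histogram of where the other edge points fall, in log-distance x angle bins."""
    diffs = np.asarray(edge_points, dtype=float) - np.asarray(point, dtype=float)
    diffs = diffs[np.any(diffs != 0, axis=1)]            # drop the point itself
    r = np.log1p(np.hypot(diffs[:, 0], diffs[:, 1]))     # log radial distance
    theta = np.arctan2(diffs[:, 1], diffs[:, 0]) % (2 * np.pi)
    hist, _, _ = np.histogram2d(r, theta,
                                bins=[n_r, n_theta],
                                range=[[0, r.max() + 1e-9], [0, 2 * np.pi]])
    return hist.ravel() / hist.sum()

def chi2_cost(h1, h2, eps=1e-12):
    """Chi-square distance between two normalised histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```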

85 Distance 1: Shape context. 1. Find point correspondences. 2. Estimate the transformation. 3. Distance = correspondence quality + transformation quality (and magnitude).

86 MNIST – Digit DB 70,000 handwritten digits Each image 28x28

87 MNIST results (chart: error rate in % for the compared methods). Human error rate ~0.2%; better methods exist, with error below 1%.

88 Distance 2: Tangent distance. The distance includes invariance to small changes – small rotations, translations, thickening. Simard et al., Transformation invariance in pattern recognition – tangent distance and tangent propagation, Neural Networks: Tricks of the Trade, 1998.

89 Space induced by rotation: rotating an image by an angle α (α = −2, −1, 0, 1, ...) traces a one-dimensional curve in pixel space – the rotation function.

90 Tangent distance – Visual intuition. The prototype image P and the query image Q each induce a surface of transformed versions (S_P, S_Q) in pixel space; the desired distance is between these surfaces. But calculating the distance between non-linear curves can be difficult. Solution: use a linear approximation – the tangent – and the Euclidean (L2) distance between the tangent planes.

91 Tangent Distance – General. For every image, create the surface of allowed transformations (rotations, translations, thickness, etc.); find a linear approximation – the tangent plane; the distance is the distance between the two linear planes, which has efficient solutions (~7 transformation dimensions in practice).
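A simplified one-sided tangent-distance sketch: the tangent vectors are generated numerically by finite differences of small transformations, and the closest point on the tangent plane is found by least squares. This is our own illustration (scipy is used only to generate the transformed images), not the exact two-sided formulation of Simard et al.

```python
# One-sided tangent distance: approximate the set of small transformations of
# a prototype P by the tangent plane P + T a, where the columns of T are
# finite-difference tangent vectors (tiny rotation / shifts), and take the
# closest point of that plane to the query Q.
import numpy as np
from scipy.ndimage import rotate, shift

def tangent_vectors(img2d, eps=1.0):
    """Finite-difference tangents for a small rotation and x/y translations."""
    base = img2d.ravel()
    t_rot = (rotate(img2d, eps, reshape=False, order=1).ravel() - base) / eps
    t_dx = (shift(img2d, (0, eps), order=1).ravel() - base) / eps
    t_dy = (shift(img2d, (eps, 0), order=1).ravel() - base) / eps
    return np.stack([t_rot, t_dx, t_dy], axis=1)       # shape (n_pixels, 3)

def one_sided_tangent_distance(prototype2d, query2d):
    """min_a || Q - (P + T a) ||, solved as a small least-squares problem."""
    P, Q = prototype2d.ravel(), query2d.ravel()
    T = tangent_vectors(prototype2d)
    a, *_ = np.linalg.lstsq(T, Q - P, rcond=None)      # best transformation coefficients
    return np.linalg.norm(Q - P - T @ a)
```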

92 USPS – digit DB: 9,298 handwritten digits taken from mail envelopes, each image 16x16.

93 USPS results (chart). Human error rate ~2.5%. For the L2 distance – not optimal; DAGSVM has similar results. For the tangent distance – NN already gives similar results.

94 Understanding Texture: how can texture be represented? (Figure: texture samples.)

95 Texture representation: represent each pixel of a texture patch by its responses to a filter bank of 48 filters – a 48-dimensional response vector per pixel (P_1, P_2, P_3, ...).

96 Introducing Textons. Filter responses are points in a 48-dimensional space; since a texture patch is spatially repeating, this representation is redundant. Select representative responses with K-means – these are the textons. T. Leung and J. Malik, Representing and recognizing the visual appearance of materials using three-dimensional textons, 2001.

97 Universal textons: cluster the filter responses of all prototype textures together to obtain textons T_1, T_2, T_3, T_4, ... – "building blocks" for all textures.

98 Distance 3: Texton histograms. For a query texture: 1. create the filter responses; 2. build the texton histogram (assign each pixel's response to its nearest universal texton and count).

99 Distance 3: Texton histograms. For a query texture: 1. create the filter responses; 2. build the texton histogram (using the universal textons); 3. distance = comparison of the query histogram with each prototype texture's histogram (χ² test).
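A short sketch of steps 2–3: assigning each pixel's filter response to its nearest universal texton, building the normalized histogram, and comparing two histograms with the same χ² form used for shape contexts. The array shapes and names are our own assumptions; the filter bank itself and the K-means clustering step are omitted.

```python
# Texton histogram from filter responses, and a chi^2 distance between two
# texture histograms (sketch; `universal_textons` are assumed precomputed).
import numpy as np

def texton_histogram(filter_responses, universal_textons):
    """filter_responses: (n_pixels, 48); universal_textons: (K, 48).
    Each pixel votes for its nearest texton; return the normalised histogram."""
    # Squared distances from every pixel response to every texton centre.
    d2 = ((filter_responses[:, None, :] - universal_textons[None, :, :]) ** 2).sum(-1)
    nearest = d2.argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(universal_textons)).astype(float)
    return hist / hist.sum()

def chi2_distance(h1, h2, eps=1e-12):
    """Chi-square distance between two normalised texton histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```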

100 CUReT – texture DB: 61 textures, under different viewpoints and different illuminations.

101 CUReT Results (chart: classification results, comparing texton histograms).

102 Caltech-101 DB: 102 categories, with variations in color, pose and illumination. Distance function: a combination of texture and shape. Two algorithms are compared: Algo. A and Algo. B. (Figure: samples from the Caltech-101 DB.)

103 Caltech-101 Results (15 training images; chart: correct rate in %, ~66% correct). Algo. B uses only DAGSVM (no KNN). Still a long way to go…

104 Motivation – Human Visual System Large Number of Categories (~30,000) Discriminative Process Small Set of Examples Invariance to transformation Similarity to Prototype instead of Features

105 Summary. Popular methods: NN, SVM, DAGSVM (an extension of SVM to multi-class). The hybrid method SVM-KNN: motivated by human perception (?), improved complexity; do better methods exist? A taste of the distance functions – shape and texture – with results on the digit and texture databases.

106 References
H. Zhang, A. C. Berg, M. Maire and J. Malik. SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition. CVPR (IEEE), Vol. 2, 2006.
P. Vincent and Y. Bengio. K-local hyperplane and convex distance nearest neighbor algorithms. NIPS, 2001.
J. C. Platt, N. Cristianini, and J. Shawe-Taylor. Large margin DAGs for multiclass classification. NIPS, 1999.
C. Hsu and C. Lin. A comparison of methods for multiclass support vector machines. IEEE Transactions on Neural Networks, Vol. 13, 2002.
T. Leung and J. Malik. Representing and recognizing the visual appearance of materials using three-dimensional textons. Int. J. Computer Vision, 43(1):29-44, 2001.
P. Simard, Y. LeCun, J. S. Denker, and B. Victorri. Transformation invariance in pattern recognition – tangent distance and tangent propagation. Neural Networks: Tricks of the Trade, 1998.
S. Belongie, J. Malik, and J. Puzicha. Shape matching and object recognition using shape contexts. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, 2002.

107 Thank You!

