1
Efficient Large-Scale Structured Learning
Steve Branson, Oscar Beijbom, Serge Belongie
CVPR 2013, Portland, Oregon
UC San Diego / Caltech

2
Overview
– Structured prediction
– Learning from larger datasets
– Deformable part models
– Object detection
– Cost-sensitive learning
[Figures: TINY IMAGES large-dataset example; taxonomy tree: Mammal → Primate (Gorilla, Orangutan) and Hoofed Mammal (Odd-toed, Even-toed)]

3
Overview
Available tools for structured learning are not as refined as tools for binary classification.
Two sources of speed improvement:
– Faster stochastic dual optimization algorithms
– Application-specific importance sampling routines

4
Summary
Usually, train time = 1-10 times test time.
Publicly available software package:
– Fast algorithms for multiclass SVMs, DPMs
– API to adapt to new applications
– Supports datasets too large to fit in memory
– Network interface for online & active learning

5
Summary
Cost-sensitive multiclass SVM:
– 10-50 times faster than SVM struct
– As fast as 1-vs-all binary SVM
Deformable part models: 50-1000 times faster than
– SVM struct
– Mining hard negatives
– SGD (PEGASOS)

6
Binary vs. Structured
[Diagram] BINARY DATASET → Binary Learner (SVM, Boosting, Logistic Regression, etc.) → BINARY OUTPUT
Structured Dataset → Object Detection, Pose Registration, Attribute Prediction, etc. → Structured Output

7
Binary vs. Structured
[Diagram] BINARY DATASET → Binary Learner (SVM, Boosting, Logistic Regression, etc.) → BINARY OUTPUT
Structured Dataset → Object Detection, Pose Registration, Attribute Prediction, etc. → Structured Output
Pros: binary classifier is application-independent
Cons: what is lost in terms of:
– Accuracy at convergence?
– Computational efficiency?

8
Binary vs. Structured
[Equations: structured prediction loss and its convex upper bound]

9
Binary vs. Structured
[Equations: convex upper bound on the structured prediction loss]

10
Binary vs. Structured
Application-specific optimization algorithms that:
– Converge to lower test error than binary solutions
– Achieve lower test error for all amounts of train time


12
Structured SVM
SVMs w/ structured output [Tsochantaridis et al. ICML'04]
Max-margin MRF [Taskar et al. NIPS'03]
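For reference, the margin-rescaled structured SVM objective can be written in its standard form (the notation here follows the usual convention: Δ is the task loss, ψ the joint feature map; the slide itself does not show the equation):

```latex
\min_{\mathbf{w}} \; \frac{\lambda}{2}\|\mathbf{w}\|^2
  + \frac{1}{n}\sum_{i=1}^{n} \max_{\bar{y} \in \mathcal{Y}}
    \Big[ \Delta(y_i,\bar{y})
      + \mathbf{w}^{\top}\psi(x_i,\bar{y})
      - \mathbf{w}^{\top}\psi(x_i,y_i) \Big]
```

The inner maximization is loss-augmented inference, which is exactly the step the importance sampling routines on the later slides approximate.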

13
Binary SVM Solvers
Quadratic to linear in trainset size

14
Binary SVM Solvers
Linear to independent in trainset size
Quadratic to linear in trainset size

15
Binary SVM Solvers
Linear to independent in trainset size
Quadratic to linear in trainset size
– Faster on multiple passes
– Detect convergence
– Less sensitive to regularization/learning rate
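The dual methods this slide refers to can be illustrated with a minimal dual coordinate descent sketch for a linear binary SVM (in the style of LIBLINEAR's solver; this is an illustrative sketch, not the paper's released code). Each coordinate update costs O(d) because w is maintained incrementally, which is what makes multiple passes cheap and convergence detectable via the dual variables:

```python
import numpy as np

def dcd_svm(X, y, C=1.0, epochs=10, seed=0):
    """Dual coordinate descent for a linear L1-loss SVM (illustrative sketch).

    X: (n, d) features; y: labels in {-1, +1}.
    Maintains w = sum_i alpha_i * y_i * x_i so each update is O(d),
    independent of how many examples or passes came before.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)
    sq = (X * X).sum(axis=1)  # precomputed ||x_i||^2
    for _ in range(epochs):
        for i in rng.permutation(n):
            g = y[i] * (w @ X[i]) - 1.0              # dual gradient for coordinate i
            a_new = np.clip(alpha[i] - g / sq[i], 0.0, C)
            w += (a_new - alpha[i]) * y[i] * X[i]    # incremental update of w
            alpha[i] = a_new
    return w
```

On separable toy data this converges to a correct separator within a few passes, and the box constraint 0 ≤ α_i ≤ C is what makes it robust to the regularization setting compared to a raw SGD learning rate.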

16
Structured SVM Solvers
Applied to SSVMs [Shalev-Shwartz et al. JMLR'13] [Ratliff et al. AIStats'07]

17
Structured SVM Solvers
Applied to SSVMs [Shalev-Shwartz et al. JMLR'13] [Ratliff et al. AIStats'07]
Notation: regularization λ, approximation factor ϵ, trainset size n, prediction time T

18
Our Approach
Use faster stochastic dual algorithms.
Incorporate an application-specific importance sampling routine:
– Reduce train times when prediction time T is large
– Incorporate tricks people use for binary methods
Loop: random example → importance sample → maximize dual SSVM objective w.r.t. samples

19
Loop: random example → importance sample → maximize dual SSVM objective w.r.t. samples
(Provably fast convergence for a simple approximate solver)
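The loop on this slide can be sketched in simplified form. The sketch below stands in for the paper's multi-sample dual solver with a single-candidate clipped dual-ascent step (passive-aggressive style), and instantiates the importance sampler as exact loss-augmented inference for a toy multiclass problem with 0/1 loss; the function names and the block feature map `psi` are assumptions for illustration, not the released API:

```python
import numpy as np

def psi(x, y, n_classes):
    """Toy joint feature map: x copied into the block for class y."""
    f = np.zeros(n_classes * x.size)
    f[y * x.size:(y + 1) * x.size] = x
    return f

def importance_sample(w, x, y_true, n_classes):
    """Loss-augmented inference; exact here (all classes scored), but in
    general an application-specific routine returning candidate outputs."""
    scores = [(y != y_true) + w @ psi(x, y, n_classes) for y in range(n_classes)]
    return int(np.argmax(scores))

def train(X, Y, n_classes, C=1.0, epochs=20, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(n_classes * X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(X)):       # random example
            y_hat = importance_sample(w, X[i], Y[i], n_classes)
            if y_hat == Y[i]:
                continue                        # margin already satisfied
            d = psi(X[i], Y[i], n_classes) - psi(X[i], y_hat, n_classes)
            viol = 1.0 - w @ d                  # Delta = 1 for 0/1 loss
            tau = min(C, max(0.0, viol / (d @ d)))  # clipped dual-ascent step
            w += tau * d                        # maximize dual w.r.t. the sample
    return w
```

The paper's solver maintains a cache of several sampled outputs per example and optimizes the dual over all of them; the single-sample step above is the simplest member of that family.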

20
Recent Papers w/ Similar Ideas
– Augmenting cutting-plane SSVM w/ m-best solutions: A. Guzman-Rivera, P. Kohli, D. Batra. "DivMCuts…" AISTATS'13.
– Applying stochastic dual methods to SSVMs: S. Lacoste-Julien et al. "Block-Coordinate Frank-Wolfe…" JMLR'13.

21
Applying to New Problems
1. Loss function
2. Features
3. Importance sampling routine

22
Applying to New Problems

23
Example: Object Detection
3. Importance sampling routine:
– Add sliding-window scores & loss into a dense score map
– Greedy non-maximum suppression (NMS)
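The greedy NMS step on this slide is the standard routine: repeatedly keep the highest-scoring window and suppress windows that overlap it too much. A self-contained sketch (standard algorithm, not the paper's implementation; boxes are `[x1, y1, x2, y2]`):

```python
import numpy as np

def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression over an (n, 4) array of boxes.

    Keeps the best-scoring box, drops boxes whose IoU with it exceeds
    the threshold, and repeats on the survivors."""
    order = np.argsort(scores)[::-1]           # best score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # intersection of box i with all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]        # suppress heavy overlaps
    return keep
```

Used as an importance sampler, the surviving high-scoring (loss-augmented) windows are exactly the diverse candidate outputs fed back into the dual solver.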

24
Example: Deformable Part Models
3. Importance sampling routine:
– Dynamic programming
– Modified NMS to return a diverse set of poses

25
Cost-Sensitive Multiclass SVM
2. Features: e.g., bag-of-words
3. Importance sampling routine:
– Return all classes
– Exact solution using 1 dot product per class
[Figure: class scores for cat, dog, ant, fly, car, bus]
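The "1 dot product per class" point can be made concrete: with per-class weight vectors, exact loss-augmented inference is just scoring every class and adding the corresponding row of the misclassification-cost matrix. A minimal sketch (the names `W` and `cost` are assumed here, not the package's API):

```python
import numpy as np

def loss_augmented_argmax(W, x, y_true, cost):
    """Exact importance sampling for a cost-sensitive multiclass SVM.

    W: (n_classes, d) per-class weights; cost[y_true, y] is the cost of
    predicting y when the truth is y_true. One dot product per class,
    then an argmax over the loss-augmented scores."""
    scores = W @ x + cost[y_true]
    return int(np.argmax(scores))
```

Because the output space is just the class set, the sampler can return all classes exactly; this is why the resulting solver can match the speed of 1-vs-all training.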

26
Results: CUB-200-2011
– Pose mixture model, 312 part/pose detectors
– Occlusion/visibility model
– Tree-structured DPM w/ exact inference

27
Results: CUB-200-2011
[Plots: 5794 training examples; 400 training examples]
– ~100X faster than mining hard negatives and SVM struct
– 10-50X faster than stochastic sub-gradient methods
– Close to convergence after 1 pass through the training set

28
Results: ImageNet
– Comparison to other fast linear SVM solvers: faster than LIBLINEAR, PEGASOS
– Comparison to other methods for cost-sensitive SVMs: 50X faster than SVM struct

29
Conclusion
Orders of magnitude faster than SVM struct.
Publicly available software package:
– Fast algorithms for multiclass SVMs, DPMs
– API to adapt to new applications
– Supports datasets too large to fit in memory
– Network interface for online & active learning

30
Thanks!

31
Weaknesses
– Less easily parallelizable than methods based on 1-vs-all (although we do offer a multithreaded version)
– Focused on SVM-based learning algorithms
