Max-Margin Training of Upstream Scene Understanding Models Jun Zhu Carnegie Mellon University Joint work with Li-Jia Li *, Li Fei-Fei *, and Eric P. Xing.


1 Max-Margin Training of Upstream Scene Understanding Models Jun Zhu, Carnegie Mellon University. Joint work with Li-Jia Li*, Li Fei-Fei*, and Eric P. Xing (* Stanford University)

2 How to Represent a Scene Image? Seeing the forest before the trees – Fast scene categorization with gist features – Oliva & Torralba, 2001: 7 global properties (e.g., openness, mean depth, expansion) for scene gist – Murphy et al., 2003: use the gist features to see the trees (i.e., recognizing objects) But the trees compose the forest … – Object recognition is critical for scene categorization – Sudderth et al., 2005; Fei-Fei et al., 2005; etc. [Image labels: badminton, bocce, croquet; "This is a forest scene."] @ CMU, March, 2010

3 Upstream Scene Understanding Models Erik Sudderth’s “Scene, Object, and Parts” model (CVPR 2005) – Using MLE to estimate model parameters

4 Upstream Scene Understanding Models Kevin Murphy’s “Forest & Tree” Model (NIPS 2003) – Using MLE to estimate model parameters

5 Upstream Scene Understanding Models Fei-Fei’s “Total Scene Understanding” Model (CVPR 2009) – Using MLE to estimate model parameters [Figure labels: Athlete, Horse, Grass, Trees, Sky, Saddle; class: Polo]

6 We want to answer … Are we satisfied with the MLE method? Can we learn scene understanding models better?

7 A Simple Working Example Joint scene categorization and object annotation model Global features: – Can be arbitrary! – Gist (Oliva & Torralba, 2001) – Sparse SIFT codes (Yang, Yu, Gong & Huang, 2009)

8 Problem with MLE Model Joint Distribution Prediction rules for scene classification

9 Problem with MLE Model Joint Distribution Maximum Likelihood Estimation (MLE) Decoupling! Scene Classification | Object Annotation

10 Problem with MLE Model Joint Distribution Weak Coupling Effects

11 Problem with MLE Model Joint Distribution Weak Coupling Effects

12 Max-margin Training to Achieve Strong Coupling Hint: although MLE decouples the scene model and the object model, the joint prediction rule couples them – Discriminant function & hinge loss

13 Max-margin Training to Achieve Strong Coupling Hint: although MLE decouples the scene model and the object model, the joint prediction rule couples them Regularized hinge loss minimization – The hinge loss couples both the scene and object models, while the log-loss is defined on the scene model only!
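
The slide's own objective did not survive in the transcript, so the following is only a minimal sketch of a generic regularized multiclass hinge loss, not the talk's formulation. It assumes a plain linear score per scene class; in the talk the discriminant function also involves the object model, which is omitted here. The names `multiclass_hinge`, `regularized_objective`, `W`, `C`, and `cost` are illustrative assumptions.

```python
import numpy as np

def multiclass_hinge(scores, label, cost=1.0):
    """Hinge loss for one example: max(0, max_{y != label}(cost + s_y) - s_label).

    `scores` holds one discriminant value per class (assumed linear here).
    """
    scores = np.asarray(scores, dtype=float)
    margins = scores + cost - scores[label]  # margin violation per class
    margins[label] = 0.0                     # the true class incurs no loss
    return max(0.0, float(margins.max()))

def regularized_objective(W, X, y, C=1.0):
    """0.5 * ||W||^2 + C * (mean hinge loss), with scores X @ W.T.

    W: (num_classes, dim) weights, X: (n, dim) features, y: (n,) int labels.
    """
    S = X @ W.T
    losses = [multiclass_hinge(S[i], y[i]) for i in range(len(y))]
    return 0.5 * float(np.sum(W ** 2)) + C * float(np.mean(losses))
```

The loss is zero only when the true class beats every other class by at least `cost`, which is what ties the prediction rule to training in a max-margin setup.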

14 Solving the Optimization Problem Approximation to the intractable log-likelihood The optimization problem

15 EM-style Algorithm – Posterior inference (inner-max problem) – Parameter estimation (outer-min problem): alternating minimization (next page)

16 Alternating Minimization Closed-form solutions – Gaussian parameters (cf. MLE for a Gaussian mixture) – Topic parameters Loss-augmented SVM
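
The slide compares the closed-form Gaussian update to MLE for a Gaussian mixture. As a reminder of what that standard update looks like, here is a minimal sketch of the weighted mean and diagonal-variance step. It is the textbook mixture update, not the talk's exact equations; the function name, the diagonal-covariance choice, and the `resp` (posterior weight) matrix are assumptions made for illustration.

```python
import numpy as np

def weighted_gaussian_update(X, resp):
    """Closed-form weighted MLE of per-component means and diagonal variances.

    X:    (n, d) data points.
    resp: (n, k) non-negative weights (e.g. posterior responsibilities);
          each column is assumed to have a positive sum.
    """
    X = np.asarray(X, dtype=float)
    resp = np.asarray(resp, dtype=float)
    Nk = resp.sum(axis=0)                          # effective count per component
    means = (resp.T @ X) / Nk[:, None]             # (k, d) weighted means
    second = (resp.T @ (X ** 2)) / Nk[:, None]     # weighted second moments
    variances = second - means ** 2                # (k, d) diagonal variances
    return means, variances
```

With hard 0/1 weights this reduces to the per-cluster sample mean and variance.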

17 Experiments 8-category sports data set (Li & Fei-Fei, 2007): – 1574 images (50/50 split): badminton, bocce, croquet, polo, rowing, snowboarding, sailing, rock climbing – Pre-segment each image into regions – Region features: color, texture, and location; patches with SIFT features – Global features: Gist (Oliva & Torralba, 2001); sparse SIFT codes (Yang, Yu, Gong, & Huang, 2009) 67-category MIT indoor scene data set (Quattoni & Torralba, 2009): – ~80 images per category for training; ~20 per category for testing – Same feature representation as above – Gist global features

18 Scene Classification Gist features – Fei-Fei’s theme model: 0.65 (different image representation) – SVM: 0.673

19 Scene Classification Loss function:

20 Scene Classification Confusion Matrix & Examples (blue for correct; red for wrong)

21 MIT Indoor Scene Classification results: Note: ROI+Gist(annotation) used human-annotated interest regions.

22 MIT Indoor Scene Details:

23 Object Annotation kNN classifier with features – Overall: – Example Objects:
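
The transcript keeps only the name of the kNN classifier, not its setup or results. For readers unfamiliar with it, a minimal sketch of majority-vote k-nearest-neighbor labeling of a region feature vector might look like the following. The Euclidean distance, the value of `k`, the function name, and the toy labels are all assumptions, not details from the talk.

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    train_X: (n, d) region feature vectors; train_y: n object labels.
    Distance is Euclidean (an assumption; the talk does not specify it).
    """
    train_X = np.asarray(train_X, dtype=float)
    query = np.asarray(query, dtype=float)
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dists, kind="stable")[:k]   # indices of the k closest
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

For example, with two hypothetical "grass" regions near the origin and two "sky" regions far away, a query close to the origin is labeled "grass" for `k=3`.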

24 Conclusions & Future Work Conclusions: – MLE can result in weak coupling in upstream scene understanding models – A max-margin approach can be applied to achieve a well-balanced prediction rule Future work: – Improve the performance of the object annotation model: incorporate global features with conditional models; “double direction” max-margin learning with supervision on object annotation for scene completion – Systematic comparison with downstream scene understanding models: multi-class sLDA (Wang et al., 2009); MedLDA (Zhu et al., 2009)
