A generic model to compose vision modules for holistic scene understanding Adarsh Kowdle *, Congcong Li *, Ashutosh Saxena, and Tsuhan Chen Cornell University,

1 A generic model to compose vision modules for holistic scene understanding Adarsh Kowdle *, Congcong Li *, Ashutosh Saxena, and Tsuhan Chen Cornell University, Ithaca, NY, USA *indicates equal contribution

2 Outline
- Motivation
- Model
- Algorithm
- Results and Discussions
- Conclusions

3 Motivation

4 Scene Understanding
Tasks: Scene Categorization, Event Categorization, Depth Estimation, Saliency Detection, Geometric Layout, Object Detection, …
Vision tasks are highly related. But how do we connect them?
[Diagram: task nodes S, O, E, L, D with unknown connections ("?")]

5 Motivation
Li et al., CVPR’09; Hoiem et al., CVPR’08; Sudderth et al., CVPR’06; Saxena et al., IJCV’07

6 Motivation
- A generic model that treats each classifier as a “black box” and composes them to incorporate the additional information automatically
[Diagram: task nodes S, O, E, L, D with unknown connections ("?")]

7 Motivation
Visual attributes: Farhadi et al., CVPR’09; Wang et al., ICCV’09; Ferrari et al., NIPS’07; Lampert et al., CVPR’09

8 Motivation
- Attributes for scene understanding?
- A model that can compose the “black-box” classifiers and automatically exploit attributes for scene understanding
Example image (Bocce): “opencountry-like scene” attribute, “salient region” attribute, “depth in the middle region” attribute

9 Motivation
Cascaded classifier model (CCM), Heitz, Gould, Saxena and Koller, NIPS’08
[Diagram]
Features (feed-forward): φ_S(X), φ_D(X), φ_E(X), φ_Sal(X)
First level of classifiers: Scene, Depth, Event, Saliency
Second level of classifier: Event
Final output ????
- A model where the first layer is not trained to achieve the best independent performance, but to achieve the best performance at the final output.
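The feed-forward pass in the diagram above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the `linear` helper, the `cascade_forward` function, the toy weights, and the scalar first-level outputs are all hypothetical, and the code assumes that the second-level classifier sees the target task's own features concatenated with every first-level output.

```python
from typing import Callable, Dict, List

Vector = List[float]
Scorer = Callable[[Vector], float]


def linear(weights: Vector, bias: float = 0.0) -> Scorer:
    """Toy linear scorer standing in for any black-box classifier (hypothetical)."""
    def score(x: Vector) -> float:
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return score


def cascade_forward(
    features: Dict[str, Vector],
    first_level: Dict[str, Scorer],
    second_level: Scorer,
    target: str,
) -> float:
    """Feed-forward pass of a two-level cascade for one target task."""
    # First level: each task scores its own features independently.
    first_outputs = [first_level[task](features[task]) for task in sorted(first_level)]
    # Second level (assumption): target-task features plus all first-level outputs.
    return second_level(features[target] + first_outputs)


# Toy data: one feature vector per task, scalar outputs for simplicity.
features = {"scene": [1.0, 0.0], "depth": [0.5], "event": [2.0], "saliency": [1.0]}
first_level = {
    "scene": linear([1.0, 1.0]),
    "depth": linear([2.0]),
    "event": linear([0.5]),
    "saliency": linear([1.0]),
}
second_level = linear([1.0, 0.5, 0.25, 0.125, 0.125])
final_score = cascade_forward(features, first_level, second_level, "event")
```

In the real tasks the first-level outputs are class likelihoods or pixel-level maps rather than single numbers; scalars are used here only to keep the sketch short.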

10 Model

11 Model
- The proposed generic model enables composing “black-box” classifiers
- Feedback results in the first layer learning “attributes” rather than labels
[Diagram]
Features (feed-forward): φ_S(X), φ_D(X), φ_E(X), φ_Sal(X)
First level of classifiers: Scene, Depth, Event, Saliency
Second level of classifier: Event
Final output
Other diagram labels: Feed-back, Attribute Learner

12 Algorithm

13 Optimization Goal
[Diagram]
Features (feed-forward): φ_S(X^(k)), φ_D(X^(k)), φ_E(X^(k)), φ_Sal(X^(k))
First level of classifiers: Scene (θ_S), Depth (θ_D), Event (θ_E), Saliency (θ_Sal)
First-level outputs: T_S, T_D, T_E, T_Sal
Second level of classifier: Event (ω_E)
Output: Y_E^(k), with feed-back to the first level

14 Algorithm
[Diagram: same two-level model as slide 13, with first-level outputs T_S, T_D, T_E, T_Sal and output Y_E^(k)]
Parameters highlighted on the slide: θ_S, θ_D, θ_E, θ_Sal, ω_E
- Our solution: motivated by the Expectation-Maximization (EM) algorithm
- Parameter learning: fix the required outputs and estimate the parameters
- Latent variable estimation: fix the model parameters and estimate the latent variables (first-level outputs)
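The two alternating steps above can be illustrated with a toy sketch. It is not the authors' algorithm: it assumes a single scalar first-level model with parameter `theta`, a single scalar second-level weight `omega`, and a squared-error objective, none of which come from the slides. All function names are hypothetical. The only point is to show "fix the required outputs, fit the parameters" followed by "fix the parameters, re-estimate the latent first-level outputs".

```python
from typing import List, Tuple


def objective(xs, ys, ts, theta, omega) -> float:
    """Assumed toy objective: second-level fit plus first-level fit (squared error)."""
    top = sum((y - omega * t) ** 2 for y, t in zip(ys, ts))
    bottom = sum((t - theta * x) ** 2 for t, x in zip(ts, xs))
    return top + bottom


def learn_parameters(xs, ys, ts) -> Tuple[float, float]:
    """Parameter learning: latent targets ts are fixed, solve least squares."""
    theta = sum(t * x for t, x in zip(ts, xs)) / sum(x * x for x in xs)
    omega = sum(y * t for y, t in zip(ys, ts)) / sum(t * t for t in ts)
    return theta, omega


def estimate_latents(xs, ys, theta, omega) -> List[float]:
    """Latent variable estimation: parameters are fixed, minimise over each t_k.

    Setting d/dt [(y - omega*t)^2 + (t - theta*x)^2] = 0 gives the closed form below.
    """
    return [(omega * y + theta * x) / (omega * omega + 1.0) for x, y in zip(xs, ys)]


def alternate(xs: List[float], ys: List[float], iterations: int = 10):
    ts = list(xs)  # assumed initialisation of the latent first-level outputs
    theta, omega = 1.0, 1.0
    history = []
    for _ in range(iterations):
        theta, omega = learn_parameters(xs, ys, ts)
        ts = estimate_latents(xs, ys, theta, omega)
        history.append(objective(xs, ys, ts, theta, omega))
    return theta, omega, history


theta, omega, history = alternate([1.0, 2.0, 3.0], [2.1, 3.9, 6.2], iterations=20)
```

Because each step exactly minimises the assumed objective over one block of variables, the recorded objective values never increase from one iteration to the next. The sketch says nothing about which solution the real model reaches.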

15 Results and Discussion

16 Experiments
- Scene Categorization: Oliva et al., IJCV’01
- Depth Estimation (Make3D): Saxena et al., NIPS’05
- Event Categorization: Li et al., ICCV’07
- Saliency Detection: Achanta et al., CVPR’09
[Diagrams: task nodes S, D, E, Sal]

17 Results
Improvement on every task with the same algorithm!

18 Results: Visual improvements
Tasks shown: Depth Estimation, Saliency Detection
Panels shown for each task: original image, ground truth, base model, CCM [Heitz et al.], our proposed model

19 Discussion: Attributes of the scene
Maps of weights given to depth maps for the scene categorization task

20 Discussion: Attributes of the scene
Weights given to event and scene attributes for event categorization

21 Conclusions

22
- Generic model to compose multiple vision tasks to aid holistic scene understanding
- “Black-box”
- Feedback results in learning meaningful “attributes” instead of just the “labels”
- Handles heterogeneous datasets
- Improved performance for each of the tasks over the state of the art using the same learning algorithm
- Joint optimization of all the tasks?
- Congcong Li, Adarsh Kowdle, Ashutosh Saxena, and Tsuhan Chen, Feedback Enabled Cascaded Classification Models for Scene Understanding, NIPS 2010

23 Thank you
Questions?

24

25 Implementation

| | Event Categorization | Depth Estimation | Scene Categorization | Saliency Detection |
| Image feature vector | 51-dim | 104-dim | 512-dim | 3-dim |
| 1st layer output | 8-dim class likelihood | Pixel-level depth map | 8-dim class likelihood | Pixel-level saliency score |
| Layer-1 classifier | Multi-class logistic | Linear regression | RBF-kernel SVM | Linear regression |
| Layer-2 classifier | Multi-class logistic | Linear regression | Multi-class logistic | Linear regression |

26 Discussion
Sparse model learnt by our method
Weights for the event categorization task

27 Discussion: Attributes of the scene
Maps of weights given to depth maps for the event categorization task

28 Results
Improvement on every task with the same model!

