A generic model to compose vision modules for holistic scene understanding
Adarsh Kowdle*, Congcong Li*, Ashutosh Saxena, and Tsuhan Chen
Cornell University, Ithaca, NY, USA
*indicates equal contribution

Outline
• Motivation
• Model
• Algorithm
• Results and Discussions
• Conclusions

Motivation

Scene Understanding
• Tasks: Scene Categorization, Event Categorization, Depth Estimation, Saliency Detection, Geometric Layout, Object Detection, …
• Vision tasks are highly related. But how do we connect them?
[Figure: task nodes S, O, E, L, D joined by a “?”]

Motivation
[Figure: examples from Li et al., CVPR’09; Hoiem et al., CVPR’08; Sudderth et al., CVPR’06; Saxena et al., IJCV’07]

Motivation
• A generic model which can treat each classifier as a “black-box” and compose them to incorporate the additional information automatically
[Figure: task nodes S, O, E, L, D joined by a “?”]

Motivation: visual attributes
[Figure: examples from Farhadi et al., CVPR’09; Wang et al., ICCV’09; Ferrari et al., NIPS’07; Lampert et al., CVPR’09]

Motivation
• Attributes for scene understanding?
• A model which can compose the “black-box” classifiers and automatically exploit attributes for scene understanding
[Figure: a Bocce image annotated with an “opencountry-like scene” attribute, a “salient region” attribute, and a “depth in the middle region” attribute]

Motivation
• Cascaded classifier model (CCM): Heitz, Gould, Saxena and Koller, NIPS’08
• A model where the first layer is not trained to achieve the best independent performance, but to achieve the best performance at the final output
[Figure: features φ_S(X), φ_D(X), φ_E(X), φ_Sal(X) feed forward into the first level of classifiers (Scene, Depth, Event, Saliency); their outputs feed the second-level Event classifier, which produces the final output. What the first level should output is marked “????”]
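The feed-forward cascade on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the second-level classifier sees the target task's own features concatenated with every first-level output, and the function name `cascade_predict` and the toy lambda classifiers are hypothetical stand-ins for real "black-box" modules.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Classifier = Callable[[Sequence[float]], List[float]]


def cascade_predict(
    phi: Dict[str, Sequence[float]],
    first_level: Dict[str, Classifier],
    second_level: Callable[[Sequence[float]], float],
    target: str,
) -> Tuple[float, Dict[str, List[float]]]:
    """Run every first-level classifier, then the second-level one for `target`."""
    # Each first-level "black box" only sees its own features phi_i(X).
    outputs = {task: clf(phi[task]) for task, clf in first_level.items()}
    # Assumed stacking: target features followed by all first-level outputs,
    # in a fixed (sorted) task order so the layout is deterministic.
    stacked = list(phi[target])
    for task in sorted(outputs):
        stacked.extend(outputs[task])
    return second_level(stacked), outputs


# Toy stand-ins for the Scene, Depth, Event, and Saliency modules (hypothetical).
phi = {"scene": [1.0, 2.0], "depth": [0.5], "event": [3.0], "saliency": [0.0, 1.0]}
first_level = {
    "scene": lambda x: [sum(x)],
    "depth": lambda x: [2.0 * x[0]],
    "event": lambda x: [x[0] - 1.0],
    "saliency": lambda x: [max(x)],
}
prediction, first_outputs = cascade_predict(phi, first_level, sum, target="event")
```

In a plain CCM the first-level modules are trained independently before this forward pass; the question raised on the slide is what they should output instead so that the final prediction is as good as possible.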

Model
• Proposed generic model enables composing “black-box” classifiers
• Feedback results in the first layer learning “attributes” rather than labels
[Figure: features φ_S(X), φ_D(X), φ_E(X), φ_Sal(X) feed forward into the first level of classifiers (Scene, Depth, Event, Saliency), which act as attribute learners; their outputs feed the second-level Event classifier, which produces the final output; a feed-back path runs from the second level to the first]

Algorithm

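Algorithm: optimization goal
[Figure: for training example k, features φ_S(X^(k)), φ_D(X^(k)), φ_E(X^(k)), φ_Sal(X^(k)) feed forward into the first level of classifiers: Scene (parameters θ_S), Depth (θ_D), Event (θ_E), Saliency (θ_Sal). Their outputs T_S, T_D, T_E, T_Sal feed the second-level Event classifier (parameters ω_E), which produces the output Y_E^(k). A feed-back path runs from the second level to the first.]
[Equation: optimization goal, shown as an image on the slide]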

Algorithm
• Our solution: motivated by the Expectation-Maximization (EM) algorithm
• Parameter learning: fix the required outputs and estimate the parameters (θ_S, θ_D, θ_E, θ_Sal, ω_E)
• Latent variable estimation: fix the model parameters and estimate the latent variables (the first-level outputs T_S, T_D, T_E, T_Sal)
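The two alternating steps can be illustrated on a deliberately tiny problem. The sketch below is an assumption-laden toy, not the method from the slides: it uses one scalar first-level model t = θ·x, one scalar second-level model y = ω·t, and a squared-error objective in place of the slide's optimization goal. The names `fit_slope`, `objective`, and `alternate` are hypothetical.

```python
from typing import List, Sequence, Tuple


def objective(x, y, t, theta: float, omega: float) -> float:
    """Assumed surrogate: second-level error plus first-level error."""
    return sum((yk - omega * tk) ** 2 + (tk - theta * xk) ** 2
               for xk, yk, tk in zip(x, y, t))


def fit_slope(inputs: Sequence[float], targets: Sequence[float]) -> float:
    """Least-squares slope through the origin (0.0 if inputs are all zero)."""
    denom = sum(a * a for a in inputs)
    return sum(a * b for a, b in zip(inputs, targets)) / denom if denom > 0 else 0.0


def alternate(x, y, t_init, iterations: int = 5) -> Tuple[float, float, List[float], List[float]]:
    t = list(t_init)
    theta = omega = 0.0
    history = []
    for _ in range(iterations):
        # Parameter learning: hold the required first-level outputs t fixed.
        theta = fit_slope(x, t)   # first level learns to produce t from x
        omega = fit_slope(t, y)   # second level learns to produce y from t
        # Latent variable estimation: hold theta and omega fixed and pick the
        # t that minimizes the objective (closed form for this toy case).
        t = [(omega * yk + theta * xk) / (1.0 + omega * omega)
             for xk, yk in zip(x, y)]
        history.append(objective(x, y, t, theta, omega))
    return theta, omega, t, history


x = [1.0, 2.0, 3.0]
y = [2.0, 4.1, 5.9]
t_init = [1.0, 2.0, 3.0]  # e.g. initial first-level labels
theta, omega, latent, history = alternate(x, y, t_init)
```

Because each step minimizes the same objective over one block of variables, the recorded objective values never increase from one iteration to the next. The latent outputs are free to drift away from their initial labels, which is the sense in which the first layer can end up representing "attributes" rather than labels.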

Results and Discussion

Experiments
• Scene Categorization: Oliva et al., IJCV’01
• Depth Estimation (Make3D): Saxena et al., NIPS’05
• Saliency Detection: Achanta et al., CVPR’09
• Event Categorization: Li et al., ICCV’07
[Figure: cascade diagrams over the S, D, E, and Sal classifiers, one per target task]

Results
• Improvement on every task with the same algorithm!

Results: visual improvements
[Figure: Depth Estimation and Saliency Detection examples, each with panels for Original image, Ground truth, Base-model, CCM [Heitz et al.], and Our proposed]

Discussion – Attributes of the scene
• Maps of weights given to depth maps for the scene categorization task

Discussion – Attributes of the scene
• Weights given to event and scene attributes for event categorization

Conclusions

• Generic model to compose multiple vision tasks to aid holistic scene understanding
• Treats each classifier as a “black-box”
• Feedback results in learning meaningful “attributes” instead of just the “labels”
• Handles heterogeneous datasets
• Improved performance for each of the tasks over the state of the art using the same learning algorithm
• Open question: joint optimization of all the tasks?
Reference: Congcong Li, Adarsh Kowdle, Ashutosh Saxena, and Tsuhan Chen, “Feedback Enabled Cascaded Classification Models for Scene Understanding,” NIPS 2010

Thank you
Questions?

Implementation
Task | Event Categorization | Depth Estimation | Scene Categorization | Saliency Detection
Image feature vector | 51-dim | 104-dim | 512-dim | 3-dim
1st-layer output | 8-dim class likelihood | Pixel-level depth map | 8-dim class likelihood | Pixel-level saliency score
Layer-1 classifier | Multi-class logistic | Linear regression | RBF-kernel SVM | Linear regression
Layer-2 classifier | Multi-class logistic | Linear regression | Multi-class logistic | Linear regression
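As a rough illustration of how a multi-class logistic classifier turns a 51-dim feature vector into an 8-dim class likelihood, here is a minimal pure-Python sketch. The weights below are made-up placeholders, and `class_likelihood` is a hypothetical helper, not code from the presented system.

```python
import math
from typing import List, Sequence

NUM_CLASSES = 8    # 8-dim class likelihood, as in the implementation slide
FEATURE_DIM = 51   # event categorization feature size, as in the slide


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_likelihood(weights, bias, features) -> List[float]:
    """Linear score per class followed by softmax (multi-class logistic)."""
    scores = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    return softmax(scores)


# Placeholder parameters: all zeros except one weight favouring class 2.
weights = [[0.0] * FEATURE_DIM for _ in range(NUM_CLASSES)]
weights[2][0] = 1.0
bias = [0.0] * NUM_CLASSES
features = [1.0] + [0.0] * (FEATURE_DIM - 1)

likelihood = class_likelihood(weights, bias, features)
```

A likelihood vector of this form is what the first layer would pass to the second-layer classifiers for the scene and event tasks, whereas the depth and saliency tasks pass pixel-level maps.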

Discussion
• Sparse model learnt by our model
[Figure: weights for the event categorization task]

Discussion – Attributes of the scene
• Maps of weights given to depth maps for the event categorization task

Results
• Improvement on every task with the same model!
