
1 Towards Total Scene Understanding: Classification, Annotation and Segmentation in an Automatic Framework. Li-Jia Li, Richard Socher and Li Fei-Fei, Computer Vision and Pattern Recognition (CVPR) 2009. Presented by 錢雅馨 (N96994134, Institute of Engineering Science), 2011/01/16.

2 Outline: 1. Introduction 2. Hierarchical Generative Model 3. Automatic Learning 4. Inference: Classification, Annotation and Segmentation 5. Experimental Results 6. Conclusions

3 Outline: 1. Introduction 2. Hierarchical Generative Model 3. Automatic Learning 4. Inference: Classification, Annotation and Segmentation 5. Experimental Results 6. Conclusions

4 1. Introduction. This paper proposes a novel generative model for simultaneously recognizing and segmenting object and scene classes. Its key properties: robust representation of noisy data, flexible and automatic learning, and total scene understanding.

5 1. Introduction. Classification, annotation and segmentation are mutually beneficial!

6 1. Introduction. [Example image: classification gives class: Polo; annotation gives the tags Athlete, Horse, Grass, Trees, Sky, Saddle; segmentation outlines the Horse.]

7 1. Introduction. [Example image continued: classification gives class: Polo; annotation gives Athlete, Horse, Grass, Trees, Sky, Saddle; segmentation labels the regions Horse, Sky, Tree, Grass, Athlete.]

8 1. Introduction. [Complete example: the scene is classified as Polo, annotated with Athlete, Horse, Grass, Trees, Sky, Saddle, and segmented into the corresponding object regions.]

9 Related work: Tu et al. 2003 address annotation and segmentation; Li & Fei-Fei 2007 address classification and annotation; Heitz et al. 2008 address classification and segmentation. Each prior approach covers only two of the three tasks.

10 Outline: 1. Introduction 2. Hierarchical Generative Model 3. Automatic Learning 4. Inference: Classification, Annotation and Segmentation 5. Experimental Results 6. Conclusions

11 2. Hierarchical Generative Model: Generative Model. A generative model models p(x, y), or equivalently p(x|y)p(y); a discriminative model models p(y|x) directly. [Figure: 1-D illustration of the two views on the same data; from Prof. Antonio Torralba's course slides.]

12 2. Hierarchical Generative Model: Generative Model. Naïve Bayes model (c: class, w: visual words): p(c, w_1, ..., w_n) = p(c) ∏_n p(w_n | c). Once we have learnt these distributions, a query image is classified by c* = argmax_c p(c) ∏_n p(w_n | c). [Bayesian network: class node c with the visual words w_1, ..., w_n as children.]
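
To make the naïve Bayes step concrete, here is a minimal sketch in Python (not part of the original slides): the class prior and the per-class word distributions are random placeholders standing in for learned parameters, and `classify` simply applies Bayes' rule to a bag of visual-word indices.

```python
import numpy as np

# Minimal naive-Bayes sketch (illustrative, not the paper's model):
# p(c | w_1..w_n) is proportional to p(c) * prod_n p(w_n | c).
rng = np.random.default_rng(0)

C, V = 3, 50                         # number of classes, visual-word vocabulary size
prior = np.full(C, 1.0 / C)          # p(c), assumed uniform here
word_given_class = rng.dirichlet(np.ones(V), size=C)   # p(w | c), one row per class

def classify(word_ids):
    """Return the most probable class for a bag of visual-word indices."""
    log_post = np.log(prior).copy()
    for c in range(C):
        log_post[c] += np.log(word_given_class[c, word_ids]).sum()
    return int(np.argmax(log_post))

query = rng.integers(0, V, size=20)  # stand-in for a quantized query image
print("predicted class:", classify(query))
```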

13 2. Hierarchical Generative Model: Generative Model, Another Example. Gaussian mixture model: how do we infer the parameters from unlabeled data, even if we know the underlying structure of the probability distribution?
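
As a concrete illustration (assumed tooling, not from the slides): the standard answer to that question is the EM algorithm, sketched here with scikit-learn's GaussianMixture on synthetic 1-D data.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Sketch: infer mixture components from unlabeled 1-D data with EM.
# The two "true" components below are synthetic; in practice the data are
# unlabeled image features and only the number of components is assumed.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 0.5, 300),
                       rng.normal( 3.0, 1.0, 700)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(data)
print("means:  ", gmm.means_.ravel())
print("weights:", gmm.weights_)
print("posterior of first point:", gmm.predict_proba(data[:1]))
```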

14 2. Hierarchical Generative Model: A Graphical Model. A directed graph: nodes represent variables, links show dependencies, and each node carries a conditional distribution. [Example: object class c with prior P(c); hidden mean μ with P(μ|c); hidden inverse variance γ with P(γ|c); observed data x with P(x|μ,γ).]
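
A tiny sketch of what this directed model means operationally, using ancestral sampling; every distribution and parameter value below is made up purely for illustration.

```python
import numpy as np

# Ancestral sampling from the small directed model on the slide:
# c ~ P(c),  mu ~ P(mu | c),  gamma ~ P(gamma | c),  x ~ P(x | mu, gamma).
rng = np.random.default_rng(1)

p_c        = np.array([0.4, 0.6])   # P(c) over two object classes
mu_given_c = np.array([0.0, 5.0])   # mean of P(mu | c)
gm_given_c = np.array([2.0, 0.5])   # shape of the Gamma prior on the precision

def sample_once():
    c     = rng.choice(len(p_c), p=p_c)              # class
    mu    = rng.normal(mu_given_c[c], 1.0)           # hidden mean
    gamma = rng.gamma(gm_given_c[c], 1.0)            # hidden inverse variance
    x     = rng.normal(mu, 1.0 / np.sqrt(gamma))     # observed data point
    return c, x

samples = [sample_once() for _ in range(5)]
print(samples)
```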

15 2. Hierarchical Generative Model: Spatial Latent Topic Model (Unsupervised). Learning maximizes the log-likelihood, with a Dirichlet prior over the multinomial topic distributions; the resulting optimization problem has no tractable closed-form solution.

16 2. Hierarchical Generative Model: Spatial Latent Topic Model (Supervised). For a query image I_d, find its most probable category: c* = argmax_c p(c | I_d). The topic-mixture parameter θ now becomes a C x K matrix, i.e. θ depends on the observed class c.
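
A hedged sketch of this query step: assume the class-specific topic mixtures θ (C x K) and the topic-word distributions β have already been learned (random placeholders here), and score a bag of visual words for each class.

```python
import numpy as np

# Sketch of the supervised query step: theta is a C x K matrix (one topic
# mixture per class), beta holds K topic-word distributions. Both are assumed
# to have been learned already; values here are random placeholders.
rng = np.random.default_rng(2)
C, K, V = 4, 10, 200
theta = rng.dirichlet(np.ones(K), size=C)   # p(topic | class), shape (C, K)
beta  = rng.dirichlet(np.ones(V), size=K)   # p(word  | topic), shape (K, V)
prior = np.full(C, 1.0 / C)                 # p(class)

def classify(word_ids):
    """c* = argmax_c p(c) * prod_n sum_k theta[c, k] * beta[k, w_n]."""
    word_probs = theta @ beta[:, word_ids]           # (C, n): p(w_n | c)
    scores = np.log(prior) + np.log(word_probs).sum(axis=1)
    return int(np.argmax(scores))

query = rng.integers(0, V, size=50)
print("most probable category:", classify(query))
```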

17 2. Hierarchical Generative Model. [Plate diagram of the full model: scene class C; for each of the N_r regions R, an object O, its appearance A_r and N_F patch features X; for each of the N_t tags T, a connector variable Z and a switch variable S; the outer plate runs over the D images. Example tags: Athlete, Horse, Grass, Trees, Sky, Saddle.]

18 2. Hierarchical Generative Model. The joint distribution of the random variables splits into a visual component and a text component, both rooted at the scene class C. [Example: class Polo with tags Athlete, Horse, Grass, Trees, Sky, Saddle; the plate runs over the D images.]

19 2. Hierarchical Generative Model. [Model build-up: the scene class C (e.g. Polo) generates the latent object variables O in the visual component.]

20 2. Hierarchical Generative Model. [Model build-up: each object O generates an image region R, described by color, location, texture and shape features.]

21 2. Hierarchical Generative Model. [Model build-up: each region is further described by its appearance A_r and by N_F patch-level features X.]

22 2. Hierarchical Generative Model. [Model build-up: the text component adds the N_t tags (e.g. Athlete, Horse, Grass, Trees, Sky, Saddle); each tag has a connector variable Z linking it to one of the N_r object regions.]

23 2. Hierarchical Generative Model. [Model build-up: each tag also has a switch variable S marking whether it is visually relevant (visible in the image, e.g. Horse, Athlete) or not visible.]

24 2. Hierarchical Generative Model. [Complete model: the observed tag T (e.g. Horse) is generated through its connector variable Z and switch variable S.]

25 2. Hierarchical Generative Model. The model jointly represents image features, object regions, and both visually relevant and irrelevant tags.

26 Outline: 1. Introduction 2. Hierarchical Generative Model 3. Automatic Learning 4. Inference: Classification, Annotation and Segmentation 5. Experimental Results 6. Conclusions

27 3. Automatic Learning. The framework learns automatically from Internet images and their tags (e.g. from flickr.com), offering a scalable approach that requires no additional human labor.

28 3. Automatic Learning. Exact inference in this model is intractable. [Diagram: the relationships among the random variables C, O, R, X, A_r, Z, S and T across the visual and text components.]

29 3. Automatic Learning. Learning therefore uses collapsed Gibbs sampling (R. Neal, 2000). [Diagram: the sampler combines a top-down force from the scene class with bottom-up forces from the visual information and from the text information.]
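
For intuition, here is a simplified collapsed Gibbs sampler for a plain LDA-style topic model; it is a stand-in only, since the paper's sampler also conditions on the class, connector and switch variables, and all hyperparameters below are illustrative.

```python
import numpy as np

# Simplified collapsed Gibbs sampler for an LDA-style topic model.
rng = np.random.default_rng(3)
K, V, alpha, eta = 5, 100, 0.5, 0.01
docs = [rng.integers(0, V, size=40) for _ in range(20)]   # toy corpus of word ids

z = [rng.integers(0, K, size=len(d)) for d in docs]       # initial topic assignments
ndk = np.zeros((len(docs), K)); nkv = np.zeros((K, V)); nk = np.zeros(K)
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        ndk[d, z[d][i]] += 1; nkv[z[d][i], w] += 1; nk[z[d][i]] += 1

for sweep in range(50):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]                                   # remove current assignment
            ndk[d, k] -= 1; nkv[k, w] -= 1; nk[k] -= 1
            # p(z_i = k | rest) proportional to (ndk + alpha) * (nkv + eta) / (nk + V*eta)
            p = (ndk[d] + alpha) * (nkv[:, w] + eta) / (nk + V * eta)
            k = rng.choice(K, p=p / p.sum())              # resample the topic
            z[d][i] = k
            ndk[d, k] += 1; nkv[k, w] += 1; nk[k] += 1

print("topic counts for the first documents:\n", ndk[:3])
```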

30 3. Automatic Learning. Step 1, obtain candidate tags: reduce the number of tags by keeping only words that belong to the 'physical entity' group. Step 2, initialize objects: select initialization images, obtain initial object models, and annotate the scene images. Step 3, automatic learning: add more Flickr images and their tags to jointly train the model.
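
A possible implementation of the Step 1 tag filter, assuming WordNet is used to test membership in the 'physical entity' group (the slide does not say which tool the authors used); sketched with NLTK.

```python
from nltk.corpus import wordnet as wn   # requires: nltk.download('wordnet')

PHYSICAL_ENTITY = wn.synset('physical_entity.n.01')

def is_physical_entity(tag):
    """Keep a tag if any noun sense has 'physical entity' among its hypernyms."""
    for syn in wn.synsets(tag, pos=wn.NOUN):
        hypernyms = set(syn.closure(lambda s: s.hypernyms()))
        if PHYSICAL_ENTITY in hypernyms:
            return True
    return False

raw_tags = ['horse', 'athlete', 'grass', 'happiness', '2009', 'saddle']
candidate_tags = [t for t in raw_tags if is_physical_entity(t)]
print(candidate_tags)   # e.g. ['horse', 'athlete', 'grass', 'saddle']
```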

31 Outline: 1. Introduction 2. Hierarchical Generative Model 3. Automatic Learning 4. Inference: Classification, Annotation and Segmentation 5. Experimental Results 6. Conclusions

32 4. Classification, Annotation and Segmentation. Classification: use the visual component of the model to compute the probability of each scene class, integrating out the latent objects. Annotation: given an unknown image, annotation tags are extracted from the segmentation results. Segmentation: infer the exact pixel locations of each of the objects in the scene.
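
A toy sketch of how the three outputs could be read off once inference has produced per-region object posteriors and a region map; every array below is a placeholder, not the paper's exact inference.

```python
import numpy as np

# Toy sketch: derive the three outputs from per-region object posteriors.
rng = np.random.default_rng(4)
objects = ['horse', 'athlete', 'grass', 'sky', 'tree']
classes = ['polo', 'sailing', 'rowing']

n_regions = 6
p_obj_given_region = rng.dirichlet(np.ones(len(objects)), size=n_regions)     # (R, O)
p_obj_given_class  = rng.dirichlet(np.ones(len(objects)), size=len(classes))  # (C, O)
region_of_pixel = rng.integers(0, n_regions, size=(8, 8))   # toy over-segmentation

# Classification: combine each region's object posterior with p(object | class).
class_region_score = p_obj_given_class @ p_obj_given_region.T     # (C, R)
scene = classes[int(np.argmax(np.log(class_region_score).sum(axis=1)))]

# Annotation: the object names assigned to regions become the tags.
region_obj = p_obj_given_region.argmax(axis=1)
tags = sorted({objects[o] for o in region_obj})

# Segmentation: every pixel inherits the object label of its region.
segmentation = np.array(objects)[region_obj][region_of_pixel]

print(scene, tags, segmentation[0, :4])
```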

33 4. Classification, Annotation and Segmentation. The comparison between the results in the first two columns underscores how the top-down classification contextually facilitates the annotation and segmentation tasks.

34 Outline: 1. Introduction 2. Hierarchical Generative Model 3. Automatic Learning 4. Inference: Classification, Annotation and Segmentation 5. Experimental Results 6. Conclusions

35 5. Experimental Results. Comparison of classification results.

36 5. Experimental Results. Comparison of precision and recall values for annotation.

37 5. Experimental Results. Results of segmentation on seven object categories and their mean values.

38 Outline: 1. Introduction 2. Hierarchical Generative Model 3. Automatic Learning 4. Inference: Classification, Annotation and Segmentation 5. Experimental Results 6. Conclusions

39 6. Conclusion. This paper develops a hierarchical model that unifies patch-level, object-level, and scene-level information. The work relates to several research areas: image understanding using contextual information; machine translation between words and images; simultaneous object recognition and segmentation; and learning semantic visual models from Internet data.

40 Thank You! Q & A

