Download presentation

Presentation is loading. Please wait.

Published byDestiney Fayne Modified about 1 year ago

1
Simultaneous Image Classification and Annotation Chong Wang, David Blei, Li Fei-Fei Computer Science Department Princeton University Published in CVPR 2009 Presented by Eric Wang (Final Version)

2
Outline Introduction Review of sLDA and Corr-LDA Model description Model Inference and Parameter Estimation Empirical Results Conclusion

3
Introduction Images Classification refers to assigning a class label to each image which globally describes the image. Image Annotation refers to assigning words which describe individual regions of the image. The images considered in this paper are both classified with a class label and annotated with free text. This paper will combine the basic framework of Corr-LDA, and a highly modified version of Supervised LDA (sLDA) to yield a model which simultaneously classifies images and annotates the individual regions.

4
Review of sLDA For each document In this model, the response variable y is a continuous random variable. are treated as unknown constants to be estimated, rather than as random variables. Source: D. M. Blei and J. D. McAuliffe. Supervised topic models. In NIPS, 2007.

5
Review of sLDA An application of sLDA considered by Blei et. al was regressing a corpus of textual movie reviews to number of stars given. Source: D. M. Blei and J. D. McAuliffe. Supervised topic models. In NIPS, 2007.

6
Review of Corr-LDA For each document Corr-LDA is a simple extension of LDA for images to annotate image regions. Source: D. M. Blei and M. I. Jordan. Modeling annotated data. In SIGIR, 2003.

7
Annotated sLDA Model This step is identical to LDA, the topic proportions are drawn once per document. In this paper, is not optimized.

8
Annotated sLDA Model A region is characterized by one of 240 codewords (quantized from 128 dimensional SIFT features). Regions are found by segmenting images using the N-cuts algorithm. parameterizes a particular multinomial distribution (topic) over the quantized codewords.

9
Annotated sLDA Model The class label c is completely determined by the topic indicators z_{1:N} using a modified sLDA framework. The total number of classes is known a priori and the class indicators are treated separately from the annotations. This is a simpler approach than the one taken by L.J. Li, R. Socher and L. Fei-Fei in that there is no “switch” variable which determines whether a word is an annotation or label. The softmax function is well studied and is also known as “multinomial logistic regression”

10
Annotated sLDA Model The annotations are assigned to specific regions in the same manner as in Corr-LDA. This will, for example, encourage words such as “blue” and “white” to be associated with regions (and thus, codewords) which capture sky. Though not explicitly shown in the graphical model, and have symmetric Dirichlet priors.

11
Inference of Latent Variables These updates are identical to those used in Corr-LDA. Let parameterize a multinomial over the K topics Let parameterize a Dirichlet over topic distributions. Let parameterize a multinomial over image regions These updates are local to each document (thus the omission of d).

12
Inference of Latent Variables This equation updates the posterior distribution over topics. Note that this update depends on both class label c and the annotation information.

13
Inference of Latent Variables Parameter Estimation has no closed form solution and is optimized via conjugate gradient Updates of codebook word f in codebook topic i (proportional to a constant).

14
Inference of Latent Variables Parameter Estimation has no closed form solution and is optimized via conjugate gradient Updates annotation word w in annotation topic i (proportional to a constant).

15
Empirical Results LableMe dataset – 8 classes: “highway,” “inside city,” “tall building,” “street,” “forest,” “coast,” “mountain,” and “open country.” – x256 training images per class. UIUC dataset – 8 types of sports: “badminton,” “bocce,” “croquet,” “polo,” “rockclimbing,” “rowing,” “sailing” and “snowboarding.” – x256 training images. 240 codeword dictionary. Annotations which appeared less than 3 times were removed.

16
Empirical Results: Classification The black line represents of the performance of Bosch et. al 2006, which employs a non-annotated LDA on the image regions and a KNN to classify the images. The blue line is the performance of Fei-Fei and Perona 2005, which uses unannotated labeled images The models presented in this paper are much more resistant to overfitting than the models of Bosch et. al and Fei-Fei and Perona.

17
Emperical Results: Classification Confusion matrices comparing the performance of multi-class sLDA with annotations and multi-class sLDA using 100 topic models Annotations seem to improve performance slightly, although, as the last slide shows, the main benefit is more consistent performance as a function of the number of topics.

18
Empirical Results: Annotation The F-Measure is used as a score. Results are given over all numbers of topics considered above. LabelMe: 38.2% (corr-LDA) 38.7% (multi-class sLDA with annotations) UIUC-Sport: 34.7% (corr-LDA) 35.0% (multiclass sLDA with annotations).

19
Conclusion Combining image annotation with classification provides state of the art image classification performance. However, the addition of the classification framework provides only a small improvement to the annotation performance. The authors’ primary contribution is showing that image classification and annotation are related and can be conducted simultaneously in the same framework. Inference was done in a Variational EM framework.

Similar presentations

© 2016 SlidePlayer.com Inc.

All rights reserved.

Ads by Google