
1 Towards Semantic Embedding in Visual Vocabulary
The Twenty-Third IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2010), Session: Object Recognition II: Using Language, Tuesday, June 15, 2010
Rongrong Ji 1, Hongxun Yao 1, Xiaoshuai Sun 1, Bineng Zhong 1, and Wen Gao 1,2
1 Visual Intelligence Laboratory, Harbin Institute of Technology
2 School of Electronic Engineering and Computer Science, Peking University

2 Overview ✴ Problem Statement ✴ Building patch-labeling correspondence ✴ Generative semantic embedding ✴ Experimental comparisons ✴ Conclusions

3 Overview ✴ Problem Statement ✴ Building patch-labeling correspondence ✴ Generative semantic embedding ✴ Experimental comparisons ✴ Conclusions

4 Introduction
Building visual codebooks
– Quantization-based approaches: K-Means, Vocabulary Tree, Approximate K-Means, etc.
– Feature-indexing-based approaches: K-D Tree, R Tree, Ball Tree, etc.
Refining visual codebooks
– Topic model decompositions: pLSA, LDA, GMM, etc.
– Spatial refinement: visual pattern mining, discriminative visual phrases, etc.

5 Introduction
With the prosperity of Web communities, user tags are abundantly available, which motivates our contribution.

6 Introduction
Related works
– Semi-supervised clustering: Dual Clustering, MRF Co-Clustering
– Data compression: learns a codebook that minimizes the quality loss between the original and compressed data
– Learning to build codebooks: includes category labels in quantization (Adaptive Vocabulary, ERC-Forest, MIL minimization)
– Learning to refine codebooks: supervised merging or splitting of the original codewords using category labels

7 Introduction
Problems in achieving this goal
– Supervision at the image level (e.g., an image tagged "A traditional San Francisco street view with cars and buildings": which patches does each tag describe?)
– Correlative semantic labels (e.g., "Flower" vs. "Rose")
– Model generality

8 Introduction
Our contribution
– Publish a ground-truth patch-labeling set: http://vilab.hit.edu.cn/~rrji/index_files/SemanticEmbedding.htm
– A generalized semantic embedding framework, easy to deploy into different codebook models
– Modeling label correlations: model correlative tagging in semantic embedding

9 Introduction
The proposed framework: (1) build patch-level correspondence; (2) supervised visual codebook construction.

10 Overview ✴ Problem Statement ✴ Building patch-labeling correspondence ✴ Generative semantic embedding ✴ Experimental comparisons ✴ Conclusions

11 Building Patch-Labeling Correspondence
Collecting over 60,000 Flickr photos
– For each photo: DoG detection + SIFT description
– Example: patches linked to the "Face" label

12 Building Patch-Labeling Correspondence
Purify the patch-labeling correspondences
– Density-Diversity Estimation (DDE)
– Formulation: for a semantic label s_i, its initial correspondence set contains
  – patches extracted from images with label s_i
  – a correspondence le_ij from label s_i to each local patch d_j
DDE purifies the links le_ij in this set.

13 Building Patch-Labeling Correspondence
Density
– For a given patch d_j, Density reveals its representability for label s_i: the average neighborhood distance among its m nearest neighbors.
Diversity
– For a given patch d_j, Diversity reveals its uniqueness score for s_i, based on n_m (the number of images in the neighborhood) and n_i (the total number of images with label s_i).

14 Building Patch-Labeling Correspondence
A high threshold T emphasizes precision over recall.
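The DDE purification step could be sketched as follows. This is an illustrative reconstruction, not the paper's exact formulation: the density and diversity formulas and their combination rule are assumptions; only the ingredients (the m-neighborhood, n_m, n_i, and the threshold T) come from the slides.

```python
import numpy as np

def dde_scores(patches, image_ids, n_total_images, m=10):
    """Score each patch of one label via Density-Diversity Estimation.

    Density: a patch is representative if its m nearest neighbors
    (within the label's patch set) are close on average.
    Diversity: a patch is unique if its neighborhood spans many
    distinct images (n_m) relative to all images with the label (n_i).
    """
    n = len(patches)
    # pairwise L2 distances between descriptors
    dists = np.linalg.norm(patches[:, None, :] - patches[None, :, :], axis=-1)
    density = np.empty(n)
    diversity = np.empty(n)
    for j in range(n):
        nbrs = np.argsort(dists[j])[1:m + 1]           # m nearest neighbors (skip self)
        density[j] = 1.0 / (dists[j, nbrs].mean() + 1e-9)
        n_m = len(set(image_ids[i] for i in nbrs))     # images in the neighborhood
        diversity[j] = n_m / n_total_images            # n_m relative to n_i
    return density, diversity

def purify(patches, image_ids, n_total_images, T=0.5, m=10):
    """Keep only links whose combined DDE score exceeds threshold T.
    A high T favors precision over recall."""
    density, diversity = dde_scores(patches, image_ids, n_total_images, m)
    score = density * diversity
    score /= score.max()                               # normalize to [0, 1]
    return np.flatnonzero(score > T)
```

Raising T shrinks the kept set monotonically, which matches the precision/recall trade-off the slide describes.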

15 Building Patch-Labeling Correspondence Case study: "Face" label (before DDE)

16 Building Patch-Labeling Correspondence Case study: “Face” label (after DDE)

17 Overview ✴ Problem Statement ✴ Building patch-labeling correspondence ✴ Generative semantic embedding ✴ Experimental comparisons ✴ Conclusions

18 Generative Semantic Embedding
✴ Generative Hidden Markov Random Field
✴ A Hidden Field for semantic modeling
✴ An Observed Field for local patch quantization

19 Generative Semantic Embedding
Hidden Field
– Each label s_i produces correspondence links (le) to a subset of patches in the Observed Field
– Links l_ij denote semantic correlations between labels s_i and s_j, measured with WordNet::Similarity (Pedersen et al., AAAI 2004)

20 Generative Semantic Embedding
Observed Field
– Any two nodes follow a visual metric (e.g., L2 distance)
– Once there is a link le_ij between label s_i and patch d_j, the quantization of d_j is constrained by s_i from the Hidden Field

21 Generative Semantic Embedding
In the ideal case
– Each patch d_j is conditionally independent given its neighbors in the Hidden Field
– The feature set D is regarded as (partially) generated from the Hidden Field

22 Generative Semantic Embedding
Formulating the clustering procedure
– Assign a unique cluster label c_j to each patch d_j; hence, D is quantized into a codebook C with corresponding features
– Cost for a codebook candidate C: P(C|D) = P(C) P(D|C) / P(D), where P(D) is a constant, P(C) is the semantic constraint, and P(D|C) is the visual distortion

23 Generative Semantic Embedding
Semantic Constraint P(C)
– Define an MRF on the Observed Field
– For a quantizer assignment C, its probability can be expressed as a Gibbs distribution induced by the Hidden Field
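The Gibbs form on this slide can be sketched as follows. This is the standard MRF prior; the particular clique potential V is an assumption, and the paper's exact energy may differ:

```latex
P(C) \;=\; \frac{1}{Z}\,\exp\!\Big(-\sum_{(d_i,\,d_j)\in\mathcal{N}} V\big(c_i,\, c_j\big)\Big)
```

where Z is the partition function, \(\mathcal{N}\) collects patch pairs whose labels are linked in the Hidden Field, and the potential \(V(c_i, c_j)\) rewards assigning semantically linked patches to the same codeword (\(c_i = c_j\)).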

24 Generative Semantic Embedding
That is
– Two data points d_i and d_j in the Observed Field contribute to the semantic constraint if and only if c_i = c_j and their linked labels s_x and s_y are connected by a correlation link l_xy in the Hidden Field

25 Generative Semantic Embedding
Visual Constraint P(D|C)
– Measures whether the codebook C is visually consistent with the current data D, i.e., the visual distortion in quantization

26 Generative Semantic Embedding
Overall Cost
– Finding the MAP estimate of P(C|D) amounts to jointly optimizing the visual constraint P(D|C) and the semantic constraint P(C)
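Taking logarithms of the Bayes decomposition on slide 22 and dropping the constant P(D) gives the two-term cost (a standard MAP derivation):

```latex
C^{*} \;=\; \arg\max_{C}\, P(C)\,P(D \mid C)
      \;=\; \arg\min_{C}\, \big[\, -\log P(D \mid C) \;-\; \log P(C) \,\big]
```

where \(-\log P(D \mid C)\) is the visual distortion term and \(-\log P(C)\) is the semantic constraint term.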

27 Generative Semantic Embedding
EM solution
– E step: assign local patches to the closest clusters (subject to the semantic constraint)
– M step: update the visual word centers
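The E/M alternation could be sketched as a semantically penalized k-means. This is an illustrative sketch, not the paper's exact algorithm: the penalty term stands in for the Gibbs energy, and all function and parameter names (`semantic_kmeans`, `lam`, etc.) are assumptions.

```python
import numpy as np

def semantic_kmeans(D, labels, k, lam=0.1, iters=20, seed=0):
    """Quantize patch descriptors D into k codewords, biasing patches
    that share a semantic label toward the same codeword.

    D      : (n, dim) descriptors
    labels : (n,) label id per patch, or -1 for unlabeled patches
    lam    : strength of the semantic penalty
    """
    rng = np.random.default_rng(seed)
    centers = D[rng.choice(len(D), size=k, replace=False)].astype(float)
    assign = np.zeros(len(D), dtype=int)
    for _ in range(iters):
        # E step: assign each patch to the cluster minimizing
        # visual distortion plus a semantic disagreement penalty
        for j in range(len(D)):
            dist = np.linalg.norm(centers - D[j], axis=1) ** 2
            if labels[j] >= 0:
                peers = (labels == labels[j]) & (np.arange(len(D)) != j)
                if peers.any():
                    # penalize clusters that contain few same-label peers
                    counts = np.bincount(assign[peers], minlength=k)
                    dist += lam * (1.0 - counts / counts.sum())
            assign[j] = int(np.argmin(dist))
        # M step: recompute each codeword center from its members
        for c in range(k):
            if (assign == c).any():
                centers[c] = D[assign == c].mean(axis=0)
    return centers, assign
```

Setting `lam=0` recovers plain k-means, which mirrors the model-generalization claim on slide 29 (unsupervised codebooks when all links are dropped).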

28

29 Generative Semantic Embedding
Model generalization
– Reduces to a supervised codebook under the label-independence assumption
– Reduces to unsupervised codebooks by setting all links l = 0

30 Overview ✴ Problem Statement ✴ Building patch-labeling correspondence ✴ Generative semantic embedding ✴ Experimental comparisons ✴ Conclusions

31 Experimental Comparisons Case study of ratios between inter-class distance and intra-class distance with and without semantic embedding in the Flickr dataset.

32 Experimental Comparisons ✴ Database Flickr database –60,000 images, over 3,600 unique labels PASCAL VOC 05 ✴ Evaluation criterion Mean Average Precision (MAP)

33 Experimental Comparisons
✴ Parameter tuning
✴ Varied label-noise threshold T (DDE purification strength)
✴ Varied embedding strength S
Configurations compared:
– GSE without Hidden Field correlations (Cr.), varied T, S
– GSE without Cr., varied T, S + GNP
– GSE with Cr., varied T, S
– GSE with Cr., varied T, S + GNP

34 Experimental Comparisons MAP@1 comparisons between our GSE model and Vocabulary Tree [1] and GNP [34] on the Flickr 60,000 database.

35 Experimental Comparisons Confusion table comparisons on the PASCAL VOC dataset with the method in [24].

36 Overview ✴ Problem Statement ✴ Building patch-labeling correspondence ✴ Generative semantic embedding ✴ Experimental comparisons ✴ Conclusions

37 Conclusion
Our contributions
– Propagating Web labeling from images to local patches to supervise codebook quantization
– A generalized semantic embedding framework for supervised codebook building
– Modeling correlative semantic labels in supervision
Future work
– Adapting one supervised codebook to different tasks
– Pushing supervision forward into local feature detection and extraction

38 Thank you!

