
1 Jury
President: Sophie Tison, Université Lille 1
Reviewers: Philippe Mulhem, Laboratoire d'Informatique de Grenoble; Zhongfei Zhang, State University of New York
Examiner: Bernard Merialdo, Eurecom Sophia-Antipolis
Advisor: Chabane Djeraba, Université Lille 1
Co-advisor: Jean Martinet, Université Lille 1

2 Introduction
Related works
Our approach
  Enhanced Bag of Visual Words (E-BOW)
  Multilayer Semantically Significant Analysis Model (MSSA)
  Semantically Significant Invariant Visual Glossary (SSIVG)
Experiments
  Image retrieval
  Image classification
  Object recognition
Conclusion and perspectives

3 Digital content grows rapidly: personal acquisition devices, broadcast TV, surveillance.
It is relatively easy to store, but of little use without automatic processing, classification, and retrieval.
The usual solution is to describe images with keywords, but this approach suffers from subjectivity, text ambiguity, and the lack of automatic annotation.

4 Visual representations fall into two families: image-based representations and part-based representations.
Image-based representations rely on global visual features extracted over the whole image, such as color, color moments, shape, or texture.

5 Main drawbacks of image-based representations:
High sensitivity to scale, pose, lighting-condition changes, and occlusions.
They cannot capture the local information of an image.
Part-based representations, in contrast, are based on the statistics of features extracted from segmented image regions.

6 The standard Bag of Visual Words (BOW) pipeline: local descriptors are computed on each image, clustered in feature space to build the visual word vocabulary (VW 1, VW 2, VW 3, VW 4, ...), and each image is then represented by its histogram of visual word frequencies. A minimal sketch of this pipeline follows.
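The sketch below assumes the local descriptors are already extracted and uses k-means for quantization; the vocabulary size and function names are illustrative, not taken from the thesis.

import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(descriptor_sets, n_words=1000, seed=0):
    """Cluster all local descriptors of the training set into visual words."""
    all_desc = np.vstack(descriptor_sets)                 # (total_keypoints, dim)
    return KMeans(n_clusters=n_words, random_state=seed).fit(all_desc)

def bow_histogram(descriptors, vocabulary):
    """Quantize one image's descriptors and count visual word occurrences."""
    words = vocabulary.predict(descriptors)               # nearest visual word per keypoint
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)                    # L1-normalized frequency vector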

7 Limitations of the standard BOW representation:
Spatial information loss: only the number of occurrences is recorded, positions are ignored.
Only keypoint-based intensity descriptors are used: neither shape nor color information is captured.
Feature quantization is noisy: unnecessary and insignificant visual words are generated.

8 Low discrimination power: different image semantics are represented by the same visual words.
Low invariance to visual diversity: one image semantic is represented by different visual words.

9 Our approach:
An enhanced BOW representation that uses different local information (intensity, color, shape, ...), models the spatial constitution of the image, and relies on an efficient visual word vocabulary structure.
A higher-level visual representation that is less noisy, more discriminative, and more invariant to visual diversity.

10 Overview of the proposed framework: a set of images is first encoded with the E-BOW representation (visual word vocabulary building), the MSSA model is then learned on top of it, and finally SSIVWs and SSIVPs are generated to form the SSIVG representation.

11 Related works:
Spatial Pyramid Matching Kernel (SPM) & sparse coding
Visual phrase & descriptive visual phrase
Visual phrase pattern & visual synset

12 Lazebnik et al. [CVPR06], Spatial Pyramid Matching Kernel (SPM): exploits the spatial layout of local regions.
Yang et al. [CVPR09], SPM + sparse coding: replaces the k-means quantization used in the SPM.

13 Zheng and Gao [TOMCCAP08], visual phrase: a pair of spatially adjacent local image patches.
Zhang et al. [ACM MM09], descriptive visual phrase: selected according to the frequencies of its constituent visual word pairs.

14 Yuan et al. [CVPR07], visual phrase pattern: a spatially co-occurring group of visual words.
Zheng et al. [CVPR08], visual synset: a relevance-consistent group of visual words or phrases, in the spirit of the text synset.

15 Comparison of SPM, SPM + sparse coding, visual phrase, descriptive visual phrase, visual phrase pattern, visual synset, and our approach, along six criteria: considering the spatial location, describing different local information, eliminating semantically ambiguous visual words, an efficient structure for storing the visual vocabulary, enhancing low discrimination power, and tackling low invariance to visual diversity.

16 Our approach:
Enhanced Bag of Visual Words (E-BOW)
Multilayer Semantically Significant Analysis Model (MSSA)
Semantically Significant Invariant Visual Glossary (SSIVG)

17 E-BOW step within the framework: from a set of images, SURF and Edge Context features are extracted, fused, and hierarchically quantized to produce the E-BOW representation, which in turn feeds the MSSA model and the SSIVG.

18 E-BOW feature extraction and quantization:
Interest point and edge point detection.
Color filtering using a vector median filter (VMF) and color feature extraction at each interest and edge point.
Color and position vector clustering using a Gaussian mixture model (components with means µ_k and priors P_ik).
SURF and Edge Context feature vector extraction at each interest point.
Fusion of the SURF and Edge Context feature vectors, collected over the whole image set.
HAC and divisive hierarchical k-means clustering to build the visual word (VW) vocabulary. A hedged sketch of the spatial-color GMM step follows.
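The sketch below fits a Gaussian mixture to joint color-and-position vectors as an illustration of the spatial-color modeling step; the 5-D feature layout and the number of components are assumptions, not the thesis's exact settings.

import numpy as np
from sklearn.mixture import GaussianMixture

def spatial_color_layout(pixels_rgb, pixels_xy, n_components=3, seed=0):
    """Model the spatial-color constitution of an image with a Gaussian mixture.

    Each sample is an assumed 5-D vector (R, G, B, x, y); the fitted component
    means and priors summarize where the dominant colors sit in the image.
    """
    X = np.hstack([pixels_rgb, pixels_xy]).astype(float)   # (n_pixels, 5)
    gmm = GaussianMixture(n_components=n_components, covariance_type="full",
                          random_state=seed).fit(X)
    return gmm.means_, gmm.weights_                         # per-component mean and prior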

19 SURF is a low-level feature descriptor that describes how pixel intensities are distributed within a scale-dependent neighborhood of each interest point.
It is good at handling serious blurring and image rotation, poor at handling illumination changes, and computationally efficient.

20 The Edge Context descriptor is computed at each interest point as a histogram of the vectors drawn from that point to the edge points: 6 bins for the magnitude and 4 bins for the orientation angle, as sketched below.
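A minimal sketch of such a descriptor, assuming a joint 6 x 4 (magnitude x angle) binning in the spirit of shape context; the bin edges and normalization details are illustrative assumptions rather than the thesis's exact definition.

import numpy as np

def edge_context(interest_pt, edge_pts, tangent_angle, n_mag=6, n_ang=4):
    """Histogram of the vectors drawn from one interest point to the edge points.

    Assumed joint (magnitude x angle) binning; angles are taken relative to the
    interest point's tangent (rotation invariance) and distances are normalized
    by their mean (scale invariance).
    """
    vecs = np.asarray(edge_pts, float) - np.asarray(interest_pt, float)
    dist = np.linalg.norm(vecs, axis=1)
    dist = dist / max(dist.mean(), 1e-12)                        # scale normalization
    ang = (np.arctan2(vecs[:, 1], vecs[:, 0]) - tangent_angle) % (2 * np.pi)

    mag_edges = np.linspace(0.0, dist.max() + 1e-9, n_mag + 1)   # uniform magnitude bins (assumed)
    mag_bin = np.clip(np.digitize(dist, mag_edges) - 1, 0, n_mag - 1)
    ang_bin = np.minimum((ang / (2 * np.pi) * n_ang).astype(int), n_ang - 1)

    hist = np.zeros((n_mag, n_ang))
    np.add.at(hist, (mag_bin, ang_bin), 1.0)                     # accumulate counts per bin
    return (hist / max(hist.sum(), 1.0)).ravel()                 # 24-D normalized descriptor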

21 The Edge Context descriptor is invariant to:
Translation: the distribution of the edge points is measured with respect to fixed points.
Scale: the radial distance is normalized by the mean distance over the whole set of points within the same Gaussian component.
Rotation: all angles are measured relative to the tangent angle of each interest point.

22 Hierarchical feature quantization: the visual word vocabulary is created by clustering the merged features (SURF + Edge Context, 88-D) in two steps:
Hierarchical Agglomerative Clustering (HAC): clustering is stopped at the desired level to obtain k initial clusters (e.g., k = 4 in the illustration).
Divisive hierarchical k-means clustering: starting from the k clusters produced by HAC, the tree is built level by level, down to some maximum number of levels L, each node being divided into k parts.
A compact sketch of this two-step quantization follows.
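The sketch below assumes Ward-linkage HAC for the first step and a recursive k-means split in the spirit of a vocabulary tree for the second; the linkage method, branching factor, and stopping rule are illustrative.

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans

def hac_seed_clusters(features, k):
    """Step 1: agglomerative clustering of the merged features into k coarse clusters."""
    Z = linkage(features, method="ward")                  # assumed linkage criterion
    labels = fcluster(Z, t=k, criterion="maxclust")
    return [features[labels == i] for i in range(1, k + 1)]

def divisive_tree(features, branch_k, max_levels):
    """Step 2: divisive hierarchical k-means, splitting each node into branch_k
    children down to max_levels; the leaves play the role of visual words."""
    if max_levels == 0 or len(features) < branch_k:
        return {"center": features.mean(axis=0), "children": []}
    km = KMeans(n_clusters=branch_k, n_init=4, random_state=0).fit(features)
    children = [divisive_tree(features[km.labels_ == i], branch_k, max_levels - 1)
                for i in range(branch_k)]
    return {"center": features.mean(axis=0), "children": children}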

23 MSSA step within the framework: on top of the E-BOW representation, the MSSA model covers the generative process, parameter estimation, estimation of the number of latent topics, and estimation of the semantic inference of the visual words.

24 An image contains different visual aspects that combine into a higher-level aspect (e.g., "People"). A topic model that takes this hierarchical structure into account is needed.

25 MSSA graphical model in plate notation (parameters φ, Θ, Ψ over the M images and N visual words). In the MSSA there are two kinds of latent (hidden) topics:
High latent topics, representing the high-level aspects.
Visual latent topics, representing the visual aspects.

26 Parameter estimation: the model defines a probability distribution over the word-image co-occurrences and the corresponding log-likelihood function. Following Gaussier et al. [ACM SIGIR05], maximizing this likelihood can be seen as a Nonnegative Matrix Factorization (NMF) problem under the generalized KL divergence, which gives the objective function to minimize. A hedged reconstruction of these formulas is given below.
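The slide equations are not reproduced in the transcript; the following is a plausible reconstruction of a two-layer pLSA-style formulation, with w a visual word, d an image, z a visual latent topic, and h a high latent topic (notation assumed):

P(w, d) = P(d) \sum_{h} \sum_{z} P(w \mid z)\, P(z \mid h)\, P(h \mid d),
\qquad
\mathcal{L} = \sum_{d} \sum_{w} n(d, w) \log P(w, d).

Under the NMF view, with N the word-image co-occurrence matrix and factors W, S, H standing for P(w|z), P(z|h), P(h|d), the objective is the generalized KL divergence:

\min_{W, S, H \ge 0} \; D_{KL}\!\left(N \,\middle\|\, W S H\right)
= \sum_{i,j} \left( N_{ij} \log \frac{N_{ij}}{(W S H)_{ij}} - N_{ij} + (W S H)_{ij} \right).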

27 The KKT conditions are used to derive multiplicative update rules for minimizing this objective function; each factor is updated in turn until convergence. A generic sketch of such updates follows.
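As an illustration of the general form KKT-derived multiplicative updates take, here is a generic sketch for a three-factor NMF under the generalized KL divergence; it stands in for, and is not claimed to equal, the thesis's actual update rules.

import numpy as np

def tri_nmf_kl(V, K, L, n_iter=200, eps=1e-12, seed=0):
    """Generic multiplicative updates for V ~ W @ S @ H under generalized KL.

    V: (n_words, n_images) co-occurrence counts; K visual latent topics and
    L high latent topics. A standard KKT-derived scheme, shown only as an
    illustration.
    """
    rng = np.random.default_rng(seed)
    n_w, n_d = V.shape
    W = rng.random((n_w, K)) + eps        # plays the role of P(w|z)
    S = rng.random((K, L)) + eps          # plays the role of P(z|h)
    H = rng.random((L, n_d)) + eps        # plays the role of P(h|d)
    ones = np.ones_like(V, dtype=float)
    for _ in range(n_iter):
        R = V / (W @ S @ H + eps)                          # elementwise ratio
        W *= (R @ (S @ H).T) / (ones @ (S @ H).T + eps)
        R = V / (W @ S @ H + eps)
        S *= (W.T @ R @ H.T) / (W.T @ ones @ H.T + eps)
        R = V / (W @ S @ H + eps)
        H *= ((W @ S).T @ R) / ((W @ S).T @ ones + eps)
    return W, S, H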

28 The Minimum Description Length (MDL) principle is used as the model selection criterion to choose the number of high latent topics (L) and the number of visual latent topics (K): it combines the log-likelihood of the data with a penalty on the number of free parameters, as recalled below.
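A hedged recollection of the standard MDL criterion the slide refers to, where \hat{\theta} is the maximum-likelihood estimate, m the number of free parameters, and n the number of observations (the exact parameter count for the MSSA is given on the original slide):

\mathrm{MDL}(K, L) = -\log \mathcal{L}(\hat{\theta}) + \frac{m}{2} \log n,
\qquad
(K^{*}, L^{*}) = \arg\min_{K, L} \mathrm{MDL}(K, L).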

29 SSIVG step within the framework: starting from the E-BOW representation and the learned MSSA model, SSVWs are selected, SSVPs are generated, and a divisive information-theoretic clustering turns them into SSIVWs and SSIVPs, which together form the SSIVG representation.

30 SSVW selection: for the set of visual words (VWs), the MSSA model is used to estimate the set of relevant visual latent topics, and the visual words that are semantically significant with respect to these topics are kept as the set of SSVWs.

31 SSVPs: a higher-level, more discriminative representation built from SSVWs and their inter-relationships. SSVPs are formed from sets of SSVWs that satisfy all of the following conditions (see the sketch after this list):
They occur in the same spatial context.
They are involved in strong association rules, with high support and confidence.
They have the same semantic meaning, i.e., a high probability of relating to at least one common visual latent topic.
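A toy sketch of mining SSVP candidates from SSVW pairs with association-rule thresholds and a shared-topic check; the data layout, thresholds, and the restriction to pairs are illustrative assumptions.

from itertools import combinations

def mine_ssvps(contexts, topic_of, min_support=0.01, min_confidence=0.3):
    """contexts: list of sets of SSVWs observed in the same spatial context.
    topic_of: maps an SSVW to its most probable visual latent topic (assumed
    precomputed with the MSSA model). Returns candidate SSVP pairs."""
    n = len(contexts)
    pair_count, item_count = {}, {}
    for words in contexts:
        for w in words:
            item_count[w] = item_count.get(w, 0) + 1
        for a, b in combinations(sorted(words), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1

    ssvps = []
    for (a, b), c in pair_count.items():
        support = c / n                                        # rule support
        confidence = max(c / item_count[a], c / item_count[b])  # best-direction confidence
        same_topic = topic_of[a] == topic_of[b]                 # shared visual latent topic
        if support >= min_support and confidence >= min_confidence and same_topic:
            ssvps.append((a, b))
    return ssvps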

32 Illustration: example images sharing the same SSIVPs (e.g., SSIVP 126, SSIVP 326, SSIVP 304) across visually different instances.

33 Studying co-occurrence and spatial scatter information makes the image representation more discriminative, but the invariance power of SSVWs and SSVPs is still low. In text documents, synonymous words can be clustered into one synonym set to improve document categorization performance; the same idea is applied here.

34 The SSIVG is a higher-level visual representation composed of two different layers:
Semantically Significant Invariant Visual Words (SSIVWs): SSVWs re-indexed after a distributional clustering.
Semantically Significant Invariant Visual Phrases (SSIVPs): SSVPs re-indexed after a distributional clustering.
Starting from the set of SSVWs and SSVPs, the MSSA model estimates the set of relevant visual topics, and a divisive information-theoretic clustering produces the sets of SSIVWs and SSIVPs that form the SSIVG representation.

35 Experiments:
Image retrieval
Image classification
Object recognition

36 Assessment of the SSIVG representation performance in image retrieval.
Dataset: NUS-WIDE, 269,648 images in total, 161,789 training images, 107,859 test images, 81 image categories.
Evaluation criterion: Mean Average Precision (MAP).
The traditional Vector Space Model of Information Retrieval is adapted, with a dedicated weighting for the SSIVPs, a spatial weighting for the SSIVWs, and an inverted file structure (a generic retrieval sketch follows).
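A generic sketch of the inverted-file retrieval machinery with plain tf-idf scoring; the thesis's SSIVP weighting and SSIVW spatial weighting are not modeled here.

import math
from collections import Counter, defaultdict

def build_inverted_index(docs):
    """docs: {image_id: list of SSIVG term ids}. Returns posting lists and idf."""
    index, df = defaultdict(dict), Counter()
    for img, terms in docs.items():
        for t, f in Counter(terms).items():
            index[t][img] = f                      # term frequency in this image
            df[t] += 1                             # document frequency
    n = len(docs)
    idf = {t: math.log(n / df[t]) for t in df}
    return index, idf

def retrieve(query_terms, index, idf, top_k=10):
    """Score images by accumulating tf-idf over the query's posting lists."""
    scores = Counter()
    for t, qf in Counter(query_terms).items():
        for img, f in index.get(t, {}).items():
            scores[img] += qf * f * idf.get(t, 0.0) ** 2
    return scores.most_common(top_k)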

37 Assessment of the SSIVG representation performance in image retrieval (results figure).

38 Assessment of the SSIVG representation performance in image retrieval (additional results figure).

39 Evaluation of the SSIVG representation in image classification.
Dataset: MIR Flickr, 11 image categories.
Evaluation criterion: classification Average Precision over each class.
Classifiers: SVM with a linear kernel, and the Multiclass Vote-Based Classifier (MVBC).

40 Multiclass Vote-Based Classifier (MVBC): for each SSIVG term occurring in an image, we detect the high latent topic that maximizes its associated probability under the MSSA model; the final voting score for a high latent topic accumulates these detections, and each image is categorized according to the dominant high latent topic. A minimal sketch of the voting scheme follows.
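The sketch assumes a precomputed term-to-high-topic probability matrix as a stand-in for the MSSA inference; weighting each vote by its probability is an assumption.

import numpy as np

def mvbc_classify(image_terms, p_h_given_term):
    """Multiclass Vote-Based Classifier sketch.

    image_terms: SSIVG term ids occurring in the image; p_h_given_term:
    (n_terms, n_high_topics) matrix of assumed term-to-high-topic probabilities.
    """
    votes = np.zeros(p_h_given_term.shape[1])
    for t in image_terms:
        h = int(np.argmax(p_h_given_term[t]))       # topic this term supports most
        votes[h] += p_h_given_term[t, h]             # weight the vote by its probability
    return int(np.argmax(votes))                     # dominant high latent topic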

41 Evaluation of the SSIVG representation performance in image classification (results figure).

42 Assessment of the SSIVG representation performance in object recognition.
Dataset: Caltech.
Each test image is recognized by predicting its object class using the SSIVG representation and the MVBC.
Evaluation criterion: classification Average Precision (AP) over each object class.

43 Assessment of the SSIVG representation performance in object recognition (results figure).

44 Conclusion and perspectives

45 Conclusion:
Enhanced BOW (E-BOW) representation: modeling the spatial-color image constitution with a GMM, a new local feature descriptor (Edge Context), and an efficient visual word vocabulary structure.
New Multilayer Semantically Significant Analysis (MSSA) model: semantic inference for the different layers of representation.
Semantically Significant Invariant Visual Glossary (SSIVG): more discriminative and more invariant to visual diversity.
Experimental validation: outperforms other state-of-the-art works.

46 Perspectives:
MSSA parameter updates: on-line algorithms to continuously (re-)learn the parameters.
Invariance issue: the context of large-scale databases where large intra-class variations can occur.
Cross-modality extension to video content: cross-modal data (visual content and textual closed captions), a new generic framework for video summarization, and studying the semantic coherence between visual content and textual captions.

47 Thank you for your attention!

48 Parameter settings.

