Cross-modal Hashing Through Ranking Subspace Learning


1 Cross-modal Hashing Through Ranking Subspace Learning
Kai Li, Guojun Qi, Jun Ye, Kien A. Hua
Department of Computer Science, University of Central Florida
ICME 2016. Presented by Kai Li.

2 Motivation and Background
The amount of multimedia data has exploded in the information age. A topic or event can be described by data from multiple sources. Exploring semantic correlations among multi-modal data is therefore meaningful.

3 Cross-modal Similarity Search
[Figure: example images with tag lists such as "beach, mountain, water, sky, trees", "buildings, sky, road", "tree, water, sky", "skyscraper, blue sky", grouped into categories like Coast and Building.]
Use a query from one modality to search for semantically relevant items from another modality, e.g. search for coast images using textual tags such as 'beach, water …'.

4 Cross-modal Search Challenges
3.8 trillion images by 2010! Huge databases of high-dimensional data mean high computational costs …

5 Cross-modal Search Challenges
[Figure: an image query represented by a real-valued feature vector [0.1, 0.3, −0.14, 0.01, …] vs. a text query with tags "sand beach, sky, water, sea, person standing, person walking" represented by a binary occurrence vector [1, 0, 0, …].]
Data from different modalities are not directly comparable:
Different dimensionalities
Distinct feature representations
Incomparable space structures
…

6 Cross-modal Hashing
[Figure: images and tag lists ("beach, water, sky", "island, trees, water", "tree, leaves, trunk", "grass, trees, leaves") mapped to binary codes 11011, 10011, 01100, 10001.]
Learning common binary representations that preserve the cross-modal similarities of multimodal data
Linear search based on Hamming distance is very fast
Binary codes support sub-linear search using hash indexing
Compact binary codes require far less storage
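As a small aside (not from the slides), a minimal Python sketch of why Hamming-distance comparison of binary codes is cheap: codes are packed into bytes, so one XOR plus a bit count compares many bits at once. All names here are illustrative.

```python
import numpy as np

def hamming_distances(query_code: np.ndarray, db_codes: np.ndarray) -> np.ndarray:
    """Hamming distance between one packed binary code and a database of codes.

    Codes are packed into uint8 arrays (e.g. via np.packbits), so a single
    XOR compares 8 bits at a time; the popcount is done by unpacking the
    XOR result back to bits and summing.
    """
    xor = np.bitwise_xor(query_code, db_codes)        # shape: (n_items, n_bytes)
    return np.unpackbits(xor, axis=1).sum(axis=1)     # bits that differ

# Usage: 32-bit codes for 5 database items and one query (random, illustrative).
rng = np.random.default_rng(0)
db = np.packbits(rng.integers(0, 2, size=(5, 32), dtype=np.uint8), axis=1)
q = np.packbits(rng.integers(0, 2, size=(1, 32), dtype=np.uint8), axis=1)
print(hamming_distances(q, db))   # smaller distance = more similar
```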

7 Existing Cross-modal Hashing
A significant amount of research has emerged recently:
CMSSH: Cross-modal Similarity Sensitive Hashing (Bronstein et al., 2010)
CVH: Cross-view Hashing (Kumar et al., 2011)
CRH: Co-regularized Hashing (Zhen et al., 2012)
IMH: Inter-media Hashing (Song et al., 2013)
LSSH: Latent Semantic Sparse Hashing (Zhou et al., 2014)
SCM: Semantic Correlation Maximization (Zhang et al., 2014)
CMFH: Collective Matrix Factorization Hashing (Ding et al., 2014)
and more …

8 Existing Cross-modal Hashing
In general, existing hashing algorithms follow two steps.
Step 1: Learn model coefficients W by minimizing some cross-correlation error w.r.t. ground-truth cross-modal similarity labels.
Step 2: Binary partitioning of linear/nonlinear feature projections, i.e. H(x) = sign(F(W, x)).
Different hashing algorithms usually differ in the first step, where different objective functions are used. The second step, i.e. the form of the hash function (sign()), is more or less the same.
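A minimal sketch (illustrative only, not any specific method from the list above) of the generic second step H(x) = sign(F(W, x)) with a linear projection F; the matrix W here is a random placeholder standing in for coefficients learned in step 1.

```python
import numpy as np

def linear_hash(x: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Generic step-2 hash: binarize a linear projection, H(x) = sign(Wx).

    Returns a {0, 1} code of length W.shape[0]; W is assumed to come from
    some step-1 training procedure.
    """
    return (W @ x >= 0).astype(np.uint8)

# Illustrative only: random W standing in for learned coefficients.
rng = np.random.default_rng(1)
W = rng.standard_normal((16, 500))   # 16-bit code over a 500-D feature (e.g. BOVW)
x = rng.standard_normal(500)
print(linear_hash(x, W))
```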

9 Motivation and Contribution
We explore a new family of hash functions based on features' relative ranking orders:
h(x; W) = argmax_{1 ≤ k ≤ K} w_k^T x
Here, W = [w_1 ⋯ w_K]^T defines a linear subspace for ranking projected features.
A special case of such ranking-based hashing schemes has been explored in Winner-Take-All Hash (WTA) (Yagnik et al., 2011).
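A minimal Python sketch of this ranking-based hash function; the matrix W below is a random placeholder for a learned ranking subspace.

```python
import numpy as np

def ranking_hash(x: np.ndarray, W: np.ndarray) -> int:
    """Ranking-based hash: h(x; W) = argmax_k w_k^T x.

    W has shape (K, d); the code is the index of the subspace direction
    with the largest projection, i.e. an integer in {0, ..., K-1}.
    """
    return int(np.argmax(W @ x))

# Illustrative: K = 4 ranking directions in a 500-D feature space.
rng = np.random.default_rng(2)
W = rng.standard_normal((4, 500))
x = rng.standard_normal(500)
print(ranking_hash(x, W))   # one of 0, 1, 2, 3
```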

10 Revisiting Winner-Take-All Hash
WTA is a special case of the ranking-based hash function in which the w_k are restricted to axis-aligned directions generated through random permutations.
WTA is an ordinal embedding of features based on partial order statistics:
Resilient to numeric perturbations, scaling, and constant offsets
Non-linear feature embedding
Limitations:
WTA is data-independent and requires long codes to achieve good performance
WTA cannot be applied to multimodal data
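A minimal sketch of the data-independent WTA scheme described above, assuming the common formulation from Yagnik et al. (random permutations plus a window size k); parameter names are illustrative.

```python
import numpy as np

def wta_hash(x: np.ndarray, perms: list, k: int) -> np.ndarray:
    """Winner-Take-All hash (Yagnik et al., 2011), data-independent version.

    For each random permutation, look at the first k permuted entries of x
    and output the index of the maximum. Each output symbol depends only on
    the relative order of feature values, so it is resilient to scaling and
    constant offsets.
    """
    return np.array([int(np.argmax(x[p[:k]])) for p in perms])

# Usage: 8 hash symbols over a 500-D feature, window size k = 4.
rng = np.random.default_rng(3)
x = rng.standard_normal(500)
perms = [rng.permutation(500) for _ in range(8)]
print(wta_hash(x, perms, k=4))
```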

11 Objective
The objective is to minimize the weighted sum of pair-wise errors over all cross-modal training pairs.
The pair-wise error is defined in terms of h_X^i and h_Y^j, the hash codes obtained by applying the ranking-based hash functions to a pair of data points (x_i, y_j).

12 Optimization
The argmax term makes direct optimization very hard, so we seek a linear upper bound of the objective. To do this, we first reformulate the hash function in matrix form:
h(x; W) = argmax_g g^T W x,  s.t. g ∈ {0, 1}^K, 1^T g = 1
The constraints on g enforce a 1-of-K coding scheme that selects the maximum projected entry, which is equivalent to the previous definition.
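A quick sketch checking this equivalence: with g constrained to one-hot vectors, maximizing g^T W x simply places the 1 at the largest entry of Wx, which recovers the argmax form of the hash function.

```python
import numpy as np

def hash_as_one_hot(x: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Matrix form of the ranking hash: g* = argmax_g g^T W x with g one-hot.

    Because g is constrained to {0,1}^K with 1^T g = 1, the maximizer places
    a 1 at the entry of Wx with the largest value, i.e. the argmax form.
    """
    scores = W @ x                    # K projected values
    g = np.zeros(len(scores))
    g[np.argmax(scores)] = 1.0        # 1-of-K selection of the max entry
    return g

rng = np.random.default_rng(4)
W = rng.standard_normal((4, 500))
x = rng.standard_normal(500)
print(hash_as_one_hot(x, W))          # e.g. [0. 0. 1. 0.]
```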

13 Optimization
The upper bound of the pairwise error follows directly from an inequality on the max term. The resulting bound is:
Convex-concave
Piece-wise linear
The max term can be solved exactly in O(K)

14 Hash function learning
Randomly initialize W_X and W_Y.
For each cross-modal pair (x_i, y_j):
Compute h_X^i and h_Y^j by their definitions.
Compute g_X^ij and g_Y^ij by solving the upper-bound maximization.
Update W_X and W_Y using perceptron-like training rules [Norouzi et al., 2011].
Use AdaBoost [Bishop, 2006] to learn the L hash codes sequentially:
Initially assign equal weights ω_ij to each training pair.
After each round, increase the weights of wrongly hashed pairs and decrease the weights of correctly hashed pairs.
[Figure: convergence study]
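A heavily simplified, illustrative sketch of one training step in this spirit. This is an assumption-laden stand-in, not the paper's exact update rule: since g^T W x is linear in W, a structured-perceptron-style step along (h - g) x^T is one plausible form of update, and g_x, g_y below are mere placeholders for the upper-bound maximizers from slides 12-13.

```python
import numpy as np

def one_hot(idx: int, K: int) -> np.ndarray:
    g = np.zeros(K)
    g[idx] = 1.0
    return g

def perceptron_step(W: np.ndarray, x: np.ndarray, h: np.ndarray,
                    g: np.ndarray, lr: float = 0.1) -> np.ndarray:
    """Illustrative structured-perceptron-style step (NOT the paper's exact rule):
    g^T W x has gradient g x^T in W, so stepping along (h - g) x^T lowers the
    term associated with g while reinforcing the current code h."""
    return W + lr * np.outer(h - g, x)

# One pass over a single cross-modal training pair (x_i, y_j); K = 4 buckets.
rng = np.random.default_rng(5)
K, dx, dy = 4, 500, 1000
W_X, W_Y = rng.standard_normal((K, dx)), rng.standard_normal((K, dy))
x_i, y_j = rng.standard_normal(dx), rng.standard_normal(dy)

h_x = one_hot(int(np.argmax(W_X @ x_i)), K)   # current code of x_i
h_y = one_hot(int(np.argmax(W_Y @ y_j)), K)   # current code of y_j
# Placeholders: in the actual algorithm g_x, g_y come from maximizing the
# linear upper bound of the pairwise error.
g_x, g_y = h_y, h_x
W_X = perceptron_step(W_X, x_i, h_x, g_x)
W_Y = perceptron_step(W_Y, y_j, h_y, g_y)
```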

15 Experiment: Datasets and Baselines
NUS-WIDE: 10 concepts, 186,577 image-tag pairs from Flickr; images represented as 500-D bag-of-visual-words (BOVW) vectors; tags represented as 1000-D tag-occurrence feature vectors.
Wiki: 2,866 image-document pairs annotated with semantic labels from 10 categories; images represented as 128-D bag-of-SIFT feature vectors; text documents represented as 1000-D TF-IDF features.
Baselines: CRH, CMSSH, CVH, IMH, CMFH (see the list of existing cross-modal hashing methods above).

16 Experiment: Performance Metrics
Top-k precision: the proportion of ground-truth neighbors among the k nearest neighbors by Hamming distance.
Precision-recall: precision at different recall levels.
Mean Average Precision (mAP): first compute the average precision of each query as the area under its precision-recall curve, then compute mAP as the average over the query set.
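A small sketch of the mAP computation described above, using the standard rank-based average precision; the relevance vectors are assumed to already be sorted by increasing Hamming distance to the query.

```python
import numpy as np

def average_precision(relevance: np.ndarray) -> float:
    """Rank-based average precision for one query; `relevance` is a 0/1 vector
    over retrieved items, sorted by increasing Hamming distance."""
    if relevance.sum() == 0:
        return 0.0
    hits = np.cumsum(relevance)                 # number of hits up to each rank
    ranks = np.arange(1, len(relevance) + 1)
    rel = relevance.astype(bool)
    return float((hits[rel] / ranks[rel]).mean())

def mean_average_precision(relevance_lists) -> float:
    """mAP = mean of per-query average precision over the query set."""
    return float(np.mean([average_precision(r) for r in relevance_lists]))

# Usage: two queries with binary relevance of their ranked retrieval lists.
print(mean_average_precision([np.array([1, 0, 1, 0]), np.array([0, 1, 1, 0])]))
```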

17-20 Experiment Results on NUS-WIDE [result figures]

21 Conclusion and Future Work
Key Contributions:
The first cross-modal hashing scheme to exploit ranking-based hash functions
An effective perceptron-like learning algorithm that solves the problem efficiently
Superior cross-modal retrieval performance on real-world datasets
Future Work:
Extend to kernel subspace ranking
Incorporate feature learning stages and develop a deep ranking framework

