Metric Inverted - An efficient inverted indexing method for metric spaces Benjamin Sznajder Jonathan Mamou Yosi Mass Michal Shmueli-Scheuer IBM Research.

1 Metric Inverted - An efficient inverted indexing method for metric spaces
Benjamin Sznajder, Jonathan Mamou, Yosi Mass, Michal Shmueli-Scheuer
IBM Research - Haifa
Presented by: Shai Erera

2 Outline
– Motivation
– Problem Definition
– Metric Inverted Index
– Retrieval
– Experiments
– Conclusions

4 Motivation
– Web 2.0 enables mass multimedia production
– Still, search is limited to manually added metadata
– State-of-the-art solutions for CBIR (Content-Based Image Retrieval) do not scale: their cost grows linearly with the collection size because of the large number of distance computations
– Can we use text IR methods to scale up CBIR?

6 Problem definition
– Low-level image features can be generalized to metric spaces
– Metric space: an ordered pair $(S, d)$, where $S$ is a domain and $d: S \times S \to \mathbb{R}$ is a distance function satisfying non-negativity, reflexivity, symmetry and the triangle inequality
– The best k results for a query in a metric space are the k objects with the smallest distance to the query
– Distances are converted to scores in $[0, 1]$ (small distance, high score)
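
To make the definition concrete, here is a minimal Python sketch (not from the slides) that spot-checks the metric axioms on a finite sample of points and converts distances to scores. Euclidean distance serves only as an example metric, and the mapping $s = 1/(1+d)$ is an assumed conversion: the slide only requires that small distances map to high scores in $[0, 1]$.

```python
import math

def euclidean(x, y):
    """L2 distance between two equal-length feature vectors (a valid metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def distance_to_score(d):
    """Map a distance in [0, inf) to a score in (0, 1]; distance 0 gives score 1.
    ASSUMED conversion for illustration; the slides do not specify one."""
    return 1.0 / (1.0 + d)

def check_metric_axioms(d, points, tol=1e-9):
    """Spot-check the metric axioms of d on a finite sample of points."""
    for x in points:
        if abs(d(x, x)) > tol:                         # reflexivity: d(x, x) = 0
            return False
        for y in points:
            if d(x, y) < -tol:                         # non-negativity
                return False
            if abs(d(x, y) - d(y, x)) > tol:           # symmetry
                return False
            for z in points:
                if d(x, z) > d(x, y) + d(y, z) + tol:  # triangle inequality
                    return False
    return True
```

Squared Euclidean distance, for example, fails the triangle-inequality check on the points 0, 1 and 2 on a line (4 > 1 + 1), so it would not qualify as the distance of a metric space.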

7 Problem definition
Top-k problem:
– Assume m metric spaces, a query $Q$, an aggregate function $f$ and a score function $sd_i(\cdot, \cdot)$ for each metric space $i$
– Retrieve the k objects $D$ with the highest $f(sd_1(Q,D), sd_2(Q,D), \ldots, sd_m(Q,D))$
[Figure: example query q with k = 5]
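
The exact answer to the top-k problem can be sketched as a linear scan that scores every object in all m spaces. This is a minimal Python sketch under stated assumptions: Euclidean distance in each space, the assumed score mapping $1/(1+d)$, and the mean as aggregate $f$; the function name `exhaustive_top_k` is hypothetical. It performs $m \cdot |collection|$ distance computations per query, which is the linear cost the motivation slide refers to.

```python
import heapq
import math
from statistics import mean

def euclidean(x, y):
    """L2 distance, used here as the metric of every space (assumption)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def score(d):
    """ASSUMED distance-to-score mapping into (0, 1]."""
    return 1.0 / (1.0 + d)

def exhaustive_top_k(query, collection, k, distances, f=mean):
    """Exact top-k by scoring every object in every metric space.

    query:      tuple of m feature values (one per metric space)
    collection: dict object_id -> tuple of m feature values
    distances:  list of m distance functions, one per metric space
    f:          aggregate over the m per-space scores (mean is an assumption)
    """
    def aggregate(obj):
        return f([score(d(q, v)) for d, q, v in zip(distances, query, obj)])

    # Keep the k best (aggregate score, object id) pairs.
    return heapq.nlargest(k, ((aggregate(v), oid) for oid, v in collection.items()))
```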

9 Metric Inverted Index
– Assume a collection of objects, each having m features, one per metric space: $D = \{F_1{:}v_1, F_2{:}v_2, \ldots, F_m{:}v_m\}$
– Indexing steps:
  – Lexicon creation (select candidate terms)
  – Invert objects (canonization to lexicon terms)

10 Metric inverted indexing: lexicon creation
– The number of distinct feature values is too large to use each one as a term, so candidate terms must be selected
– Naïve solution: a lexicon of fixed size l
  – Randomly select l/m documents and extract their features (m per document)
  – These l features form the lexicon
– Improvement:
  – Replace the random choice by clustering (e.g., k-means)
  – Keep the lexicon in an M-tree structure
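
The naïve lexicon-creation step can be sketched in a few lines of Python. This sketch assumes each object is a dict from feature name to feature value and uses the hypothetical name `build_lexicon`; the clustering improvement and the M-tree storage are not shown.

```python
import random

def build_lexicon(collection, l, seed=0):
    """Naive lexicon of roughly fixed size l: sample l/m objects at random
    and use all of their feature values as lexicon terms.

    collection: dict object_id -> dict feature_name -> feature value
    returns:    dict feature_name -> list of lexicon values for that feature
    """
    m = len(next(iter(collection.values())))               # features per object
    n_objects = min(len(collection), max(1, l // m))       # l/m sampled objects
    rng = random.Random(seed)                              # seeded for repeatability
    sampled = rng.sample(sorted(collection), n_objects)
    lexicon = {}
    for oid in sampled:
        for feature, value in collection[oid].items():
            lexicon.setdefault(feature, []).append(value)  # one term per feature
    return lexicon
```

Replacing the random sample with, for example, k-means centroids computed per feature would implement the suggested improvement without changing the returned structure.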

11 Metric inverted indexing: invert objects
– Given an object $D = \{F_1{:}v_1, F_2{:}v_2, \ldots, F_m{:}v_m\}$
– Canonization maps each feature $F_i{:}v_i$ to lexicon entries: for each feature, select the n nearest lexicon terms
– $D' = \{F_1{:}v_{11}, F_1{:}v_{12}, \ldots, F_1{:}v_{1n}, F_2{:}v_{21}, F_2{:}v_{22}, \ldots, F_2{:}v_{2n}, \ldots, F_m{:}v_{m1}, F_m{:}v_{m2}, \ldots, F_m{:}v_{mn}\}$
– Index $D'$ in the relevant posting lists
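
Canonization and inversion might look like this in Python. Assumptions: terms are represented as hypothetical `(feature_name, lexicon_index)` pairs, Euclidean distance is the metric for every feature, and a linear scan over the lexicon stands in for the M-tree nearest-neighbor lookup. The same `canonize` function applies unchanged to a query Q at retrieval time.

```python
import heapq
import math
from collections import defaultdict

def euclidean(x, y):
    """L2 distance, assumed as the metric for every feature."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def canonize(obj, lexicon, n, dist=euclidean):
    """Map each feature value F_i:v_i to its n nearest lexicon terms.

    Terms are (feature_name, lexicon_index) pairs; a linear scan over the
    lexicon replaces the M-tree lookup described on the slides.
    """
    terms = []
    for feature, value in obj.items():
        entries = lexicon[feature]
        nearest = heapq.nsmallest(n, range(len(entries)),
                                  key=lambda j: dist(value, entries[j]))
        terms.extend((feature, j) for j in nearest)
    return terms

def invert(collection, lexicon, n):
    """Build posting lists: term -> list of object ids, in id order."""
    postings = defaultdict(list)
    for oid in sorted(collection):
        for term in canonize(collection[oid], lexicon, n):
            postings[term].append(oid)
    return postings
```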

13 Retrieval stage: term selection
– Given $Q = \{F_1{:}qv_1, F_2{:}qv_2, \ldots, F_m{:}qv_m\}$
– Canonization: for each feature, select the n nearest lexicon terms
– $Q' = \{F_1{:}qv_{11}, F_1{:}qv_{12}, \ldots, F_1{:}qv_{1n}, F_2{:}qv_{21}, F_2{:}qv_{22}, \ldots, F_2{:}qv_{2n}, \ldots, F_m{:}qv_{m1}, F_m{:}qv_{m2}, \ldots, F_m{:}qv_{mn}\}$

14 Retrieval stage: Boolean filtering
– These $m \cdot n$ posting lists are queried via a Boolean query
– Two possible modes:
  – Strict-query-mode:
  – Fuzzy-query-mode:
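
The slide leaves the exact Boolean form of the two modes unspecified. Purely as an illustration, this Python sketch assumes that a fuzzy-style query is a disjunction (OR) over all $m \cdot n$ posting lists, and that a strict-style query is a conjunction (AND) across features of the disjunction of each feature's n terms. Both interpretations and the function names are hypothetical; terms are assumed to be `(feature_name, lexicon_index)` pairs.

```python
def fuzzy_filter(query_terms, postings):
    """ASSUMED fuzzy mode: OR over all m*n posting lists (their union)."""
    result = set()
    for term in query_terms:
        result.update(postings.get(term, ()))
    return result

def strict_filter(query_terms, postings):
    """ASSUMED strict mode: OR the n terms of each feature, then AND across features."""
    by_feature = {}
    for feature, idx in query_terms:
        # Union of the posting lists belonging to this feature.
        by_feature.setdefault(feature, set()).update(postings.get((feature, idx), ()))
    sets = list(by_feature.values())
    return set.intersection(*sets) if sets else set()
```

Under these assumptions the strict form returns a subset of the fuzzy form, so it trades recall for fewer candidates to score.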

15 Retrieval stage: scoring
– Documents retrieved by the Boolean query are fully scored
– Return the k objects with the highest aggregate score $f(sd_1(Q,D), sd_2(Q,D), \ldots, sd_m(Q,D))$
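
The scoring stage can be sketched as re-ranking only the Boolean-filter survivors. This Python sketch assumes Euclidean distance per feature, the score mapping $1/(1+d)$ and the mean as aggregate $f$; `rerank` is a hypothetical name. Only $m \cdot |candidates|$ distances are computed here (the cost of canonizing the query is not counted).

```python
import heapq
import math
from statistics import mean

def euclidean(x, y):
    """L2 distance, assumed as the metric for every feature."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def rerank(query, candidates, collection, k, f=mean):
    """Fully score only the candidate objects and keep the best k.

    query, collection[oid]: dict feature_name -> feature vector
    candidates:             ids returned by the Boolean filter
    """
    def full_score(oid):
        obj = collection[oid]
        # ASSUMED per-feature score 1 / (1 + d), aggregated with f.
        return f([1.0 / (1.0 + euclidean(q, obj[name])) for name, q in query.items()])

    return heapq.nlargest(k, ((full_score(oid), oid) for oid in candidates))
```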

17 Experiments
– Focus on efficiency and effectiveness
– Collection of 160,000 images from Flickr
– 3 features extracted from each image: EdgeHistogram, ScalableColor and ColorLayout
– 180 queries, sampled from the image collection, run in fuzzy-query-mode
– Compared to the M-tree data structure

18 Experiments: measures used
– Effectiveness: MAP is a natural candidate measure
  – Problem: in image retrieval, no document is irrelevant (every image has some similarity to the query)
  – Solution: we defined as relevant the k highest-scored documents in the collection (as computed by the M-tree)
  – MAP@K: MAP computed on relevant and retrieved lists of size k
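
MAP@K as described on the slide can be computed as follows. This Python sketch assumes the standard average-precision formula (the sum of precision at each rank where a relevant item is retrieved, divided by the number of relevant items), with the exact top-k list as ground truth; the function names are hypothetical.

```python
def average_precision_at_k(retrieved, relevant, k):
    """AP over the first k retrieved ids; relevant is the exact top-k list."""
    relevant = set(relevant[:k])
    hits, total = 0, 0.0
    for rank, oid in enumerate(retrieved[:k], start=1):
        if oid in relevant:
            hits += 1
            total += hits / rank          # precision at this rank
    return total / len(relevant) if relevant else 0.0

def map_at_k(runs, k):
    """Mean AP@k over queries; runs is a list of (retrieved, relevant) pairs."""
    return sum(average_precision_at_k(r, g, k) for r, g in runs) / len(runs)
```

For instance, retrieving `["a", "b", "x", "c"]` when the exact top-4 is `["a", "b", "c", "d"]` gives AP = (1 + 1 + 3/4) / 4 = 0.6875.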

19 Experiments: measures used (contd.)
– Efficiency: we count the number of computations per query
– A computation unit (cu) is one call to a distance computation between two feature values
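
One way to count computation units is to wrap the distance function with a call counter. This is a hypothetical Python helper, not part of the original system; Euclidean distance is again only an example metric.

```python
import math

class CountingDistance:
    """Wrap a distance function; each call counts as one computation unit (cu)."""

    def __init__(self, dist):
        self.dist = dist
        self.calls = 0

    def __call__(self, x, y):
        self.calls += 1               # one cu per distance computation
        return self.dist(x, y)

    def reset(self):
        self.calls = 0

def euclidean(x, y):
    """L2 distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```

With this wrapper, an exhaustive scan over N objects with m features costs exactly N·m cu, which gives a baseline against which an index's per-query cost can be compared.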

20 Effectiveness
[Figure: MAP vs. number of nearest terms; lexicon size = 12,000]

21 Effectiveness
[Figure: MAP vs. lexicon size; number of nearest terms = 30]

22 Effectiveness vs. efficiency
[Figure: MAP vs. number of comparisons; number of nearest terms = 30]

23 M-tree vs. Metric Inverted
[Figure: number of comparisons vs. top-k; number of nearest terms = 30]

25 Conclusions
– We reduce the gap between text IR and multimedia retrieval
– Our method achieves a very good approximation (MAP = 98%)
– Our method drastically improves efficiency (by 90%) over state-of-the-art methods

