Scalable and Distributed Similarity Search in Metric Spaces
Michal Batko, Claudio Gennaro, Pavel Zezula


1 Scalable and Distributed Similarity Search in Metric Spaces
Michal Batko, Claudio Gennaro, Pavel Zezula

2 Presentation contents
- Motivation
- Metric spaces and similarity searching
- GHT*
  - Concepts
  - Generalized Hyperplane Tree
  - Distributed architecture
- Experimental results
- Conclusions and future work

3 Motivation
- Searching is a fundamental problem
- Traditional search
  - Numbers or strings
  - Based on total linear order of keys
- New approach
  - Free text, images, audio, video, etc.
  - Impossible to structure in keys and records

4 Alternative
- Metric spaces
- Similarity searching

5 Metric space
- Set of objects (A)
  - any class of objects that allows distance computation
  - for example text, audio, or video files
- Metric function (d)
  - non-negative: d(x, y) >= 0
  - reflexive: d(x, x) = 0
  - symmetric: d(x, y) = d(y, x)
  - triangle inequality: d(x, z) <= d(x, y) + d(y, z)
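The four postulates on this slide can be checked mechanically on a finite sample of objects. The following is a minimal sketch, not code from the presentation: the function names `euclidean` and `satisfies_metric_postulates`, the tolerance `tol`, and the sample points are all assumptions made for illustration.

```python
import itertools
import math


def euclidean(x, y):
    """Euclidean (L2) distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def satisfies_metric_postulates(objects, d, tol=1e-9):
    """Check the four postulates on every pair and triple of a finite sample.

    Passing this check does not prove that d is a metric in general;
    it only shows that no violation occurs on the given sample.
    """
    for x in objects:
        if abs(d(x, x)) > tol:                    # reflexivity: d(x, x) = 0
            return False
    for x, y in itertools.product(objects, repeat=2):
        if d(x, y) < 0:                           # non-negativity
            return False
        if abs(d(x, y) - d(y, x)) > tol:          # symmetry
            return False
    for x, y, z in itertools.product(objects, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:     # triangle inequality
            return False
    return True


# Hypothetical sample of 2-D points.
sample = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (1.0, 1.0)]
print(satisfies_metric_postulates(sample, euclidean))
```

Squared Euclidean distance is a useful counterexample: it is non-negative, reflexive, and symmetric, but it violates the triangle inequality on this sample, so it is not a metric.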

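6 Similarity searching
- Range search
  - objects at max distance r from object Q
- k-nearest neighbor search
  - k nearest neighbor objects of object Q
[Figure: a range query of radius r around Q, and a nearest-neighbor query around Q with objects numbered 1 to 4]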
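Both query types can be stated as a linear scan over the data set, which serves as a baseline that any index has to agree with. This is a minimal sketch under assumed names (`range_search`, `knn_search`) and hypothetical sample points; it is not the search algorithm of GHT*.

```python
import heapq
import math


def euclidean(x, y):
    """Euclidean (L2) distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def range_search(objects, d, q, r):
    """Return every object whose distance from the query q is at most r."""
    return [o for o in objects if d(q, o) <= r]


def knn_search(objects, d, q, k):
    """Return the k objects closest to q, nearest first."""
    return heapq.nsmallest(k, objects, key=lambda o: d(q, o))


# Hypothetical data set and query object.
points = [(0, 0), (1, 0), (0, 2), (5, 5)]
query = (0, 0)
print(range_search(points, euclidean, query, 1))
print(knn_search(points, euclidean, query, 3))
```

Each call computes the distance from the query to every object, which is the cost that a metric index tries to avoid.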

7 GHT* – concepts
- Data distributed among servers
  - Multiple buckets with limited capacity
- Clients perform updates and search
- Bucket location algorithm
  - Based on DDH and DST algorithms
  - Exploits Generalized Hyperplane Tree

8 Generalized Hyperplane Tree
- Single-site metric space indexing structure
- Allows similarity searching and is scalable
- Binary search tree
  - Data stored in leaf nodes
  - Inner nodes for routing
  - Two “pivots” per node
[Figure: example tree over objects p1 to p14]
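The structure described on this slide can be sketched as a small in-memory binary tree: leaves hold the data, inner nodes hold two pivots, and an object is routed toward the closer pivot. Everything below is an illustrative assumption rather than the authors' implementation, in particular the class names, the `capacity` parameter, and the choice of the first two objects of an overflowing leaf as pivots.

```python
import math


def euclidean(x, y):
    """Euclidean (L2) distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


class Leaf:
    """Leaf node: stores data objects."""
    def __init__(self):
        self.objects = []


class Inner:
    """Inner node: stores two pivots and is used only for routing."""
    def __init__(self, p1, p2):
        self.p1, self.p2 = p1, p2
        self.left, self.right = Leaf(), Leaf()


class GHT:
    def __init__(self, d, capacity=4):
        self.d, self.capacity, self.root = d, capacity, Leaf()

    def insert(self, obj):
        parent, side, node = None, None, self.root
        while isinstance(node, Inner):
            # Objects at least as close to p1 as to p2 go left, others right.
            side = "left" if self.d(obj, node.p1) <= self.d(obj, node.p2) else "right"
            parent, node = node, getattr(node, side)
        node.objects.append(obj)
        if len(node.objects) > self.capacity:
            self._split(parent, side, node)

    def _split(self, parent, side, leaf):
        # Assumption: the first two stored objects become the pivots.
        inner = Inner(leaf.objects[0], leaf.objects[1])
        for o in leaf.objects:
            closer_to_p1 = self.d(o, inner.p1) <= self.d(o, inner.p2)
            (inner.left if closer_to_p1 else inner.right).objects.append(o)
        if parent is None:
            self.root = inner
        else:
            setattr(parent, side, inner)

    def range_search(self, q, r):
        result, stack = [], [self.root]
        while stack:
            node = stack.pop()
            if isinstance(node, Leaf):
                result.extend(o for o in node.objects if self.d(q, o) <= r)
                continue
            d1, d2 = self.d(q, node.p1), self.d(q, node.p2)
            if d1 - r <= d2 + r:   # left subtree may hold qualifying objects
                stack.append(node.left)
            if d1 + r >= d2 - r:   # right subtree may hold qualifying objects
                stack.append(node.right)
        return result


# Hypothetical data: 40 deterministic 2-D points.
points = [(i * 7 % 13, i * 5 % 11) for i in range(40)]
tree = GHT(euclidean, capacity=4)
for p in points:
    tree.insert(p)
print(sorted(tree.range_search((6, 5), 2.5)))
```

The two pruning tests follow from the triangle inequality: a subtree is skipped only when no object in it can lie within distance r of the query, so the result always matches a linear scan. If the two chosen pivots happen to be identical objects, a split sends everything to the left leaf, which this sketch tolerates by letting that leaf exceed its capacity.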

9 GHT* – distributed architecture
- GHT is used as search structure
- Leaf node represents a server
  - unique server identifier
  - servers extend the tree with leaf nodes for their local buckets
- Inner nodes store routing information
- GHT is replicated
  - GHT can be inaccurate
  - Update (image adjustment) messages
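The interplay between a replicated, possibly inaccurate tree and image adjustment messages can be illustrated with a toy simulation. This is a heavily simplified sketch: the tuple encoding of the tree, the functions `locate`, `subtree_at`, `graft`, and `client_insert`, and the server names are all hypothetical, and a single `server_tree` stands in for the contacted server's more accurate view.

```python
import math


def euclidean(x, y):
    """Euclidean (L2) distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Assumed encoding: an inner node is (p1, p2, left, right);
# a leaf is a server identifier given as a string.

def locate(tree, obj, d):
    """Follow the closer pivot at each inner node; return (leaf, path taken)."""
    node, path = tree, []
    while not isinstance(node, str):
        p1, p2, left, right = node
        step = 0 if d(obj, p1) <= d(obj, p2) else 1
        path.append(step)
        node = left if step == 0 else right
    return node, path


def subtree_at(tree, path):
    """Return the subtree reached by following path (0 = left, 1 = right)."""
    for step in path:
        tree = tree[2 + step]
    return tree


def graft(tree, path, subtree):
    """Return a copy of tree with the node at path replaced by subtree."""
    if not path:
        return subtree
    p1, p2, left, right = tree
    if path[0] == 0:
        return (p1, p2, graft(left, path[1:], subtree), right)
    return (p1, p2, left, graft(right, path[1:], subtree))


def client_insert(client_image, server_tree, obj, d):
    """Address obj with the client's image, then apply the image adjustment."""
    guess, path = locate(client_image, obj, d)    # server the client contacts
    adjustment = subtree_at(server_tree, path)    # contacted server's finer view
    actual, _ = locate(adjustment, obj, d)        # server that stores obj
    return guess, actual, graft(client_image, path, adjustment)


# Hypothetical state: the right branch has split, but the client has not noticed.
server_tree = ((0, 0), (10, 0), "S1", ((10, 0), (10, 10), "S2", "S3"))
client_image = ((0, 0), (10, 0), "S1", "S2")
guess, actual, client_image = client_insert(client_image, server_tree, (9, 9), euclidean)
print(guess, actual)
```

In this run the client first addresses the wrong server, the request is resolved with the more accurate view, and the client's image is updated so that the same object is addressed directly the next time.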

10 GHT* – distributed architecture

11 Experimental results – inserting
- Preliminary phase
- Tests for vector space with Euclidean distance function

10000 objects          min     max     avg
Occupied buckets       56      68      62.4
Occupied servers       7       9       8.07
Overall bucket load    58.8    71.4    64.3
Maximal tree depth     16      26      20.4
Replication            3.9%    5.9%    5%

12 Experimental results – searching
- 20 range queries with radius 50 points (match approx. 3 objects)

13 Conclusions
- First structure for scalable distributed similarity search
- Satisfies properties of SDDS
  - Scalability – can expand to new servers through autonomous splits
  - No hot-spot – all clients use as precise addressing as possible and learn from misaddressing
  - Updates are local and never require updates to multiple clients
- Client performs only a few distance computations to locate servers

14 Future work
- More experiments
  - Different metric spaces
  - More complex evaluation
  - Additional evaluated properties
- Nearest neighbor search
- Algorithm for parallel processing to better utilize distributed structure
- Experimental evaluation

15 Questions?

