Publish/Subscribe Systems with Distributed Hash Tables and Languages from IR Christos Tryfonopoulos & Manolis Koubarakis Intelligent Systems Lab Dept.


1 Publish/Subscribe Systems with Distributed Hash Tables and Languages from IR Christos Tryfonopoulos & Manolis Koubarakis Intelligent Systems Lab Dept. of Electronic & Computer Engineering Technical University of Crete, Greece http://www.intelligence.tuc.gr

2 Overview Motivation Distributed resource sharing The DHTrie protocols Local filtering algorithms Conclusions

3 Motivation Resource sharing is at the core of today's computing (Web, P2P, Grid). One-time as well as continuous querying functionality is needed. Data models and languages based on Information Retrieval are useful for annotating and querying resources. Many nice technologies to build on (e.g., overlay networks, agents, etc.).

4 Related work Distributed information retrieval: p-Search, PlanetP, [Li et al. 2003], [Cohen et al. 2003], [Reynolds et al. 2002], … Publish/subscribe, non-DHT-based: SIFT, SIENA, Le Subscribe, Gryphon, P2P-DIET, … Publish/subscribe, DHT-based: Scribe, Bayeux (topic-based); [Tam et al. 2003], [Pietzuch et al. 2003], [Terpstra et al. 2003], [Triantafillou et al. 2004] (content-based).

5 Distributed resource (file) sharing Two kinds of basic functionality are expected. One-time querying: a user poses a query such as "I want photos of Euro 2004 champions", and the system returns a list of pointers to matching resources. Publish/subscribe: a user posts a continuous query to receive a notification when a photo of Euro 2004 champions is published, and the system notifies the subscriber with a pointer to the peer that published the photo.

6 Distributed resource sharing One-time query scenario

7 Distributed resource sharing Publish/subscribe scenario

8 Achievements in the context of DIET Languages and data models from IR (emphasis on textual information). Efficient filtering algorithms. The system P2P-DIET: a super-peer based P2P system, implemented on top of the lightweight mobile agent platform DIET Agents. DIET project: www.dfki.uni-kl.de/IVSWEB/DIET DIET Agents: http://diet-agents.sourceforge.net/ P2P-DIET: http://www.intelligence.tuc.gr/p2pdiet Current work: solve the pub/sub problem using ideas from DHTs.

9 Distributed Hash Tables (DHTs) Created to solve the object location problem in a distributed (dynamic) network of nodes. Support only one operation: given a key, map the key onto a node. Many existing systems (Chord, CAN, Pastry, Tapestry, P-Grid, DKS, Viceroy, …). Locating a node needs a logarithmic number of messages.
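A minimal sketch of the single DHT operation (map a key onto a node), assuming a Chord-style identifier ring where a key belongs to the first node clockwise from its hashed ID. The 6-bit ring, the SHA-1 hash, and the names `ring_id`, `successor` and `lookup` are illustrative assumptions; the node IDs are borrowed from the ring drawn on slide 17.

```python
import hashlib
from bisect import bisect_left

BITS = 6  # assumed identifier length: a ring of 2**6 = 64 positions
NODES = [1, 8, 14, 21, 32, 38, 42, 48, 51]  # node IDs as drawn on slide 17


def ring_id(key: str, bits: int = BITS) -> int:
    """Hash a string key onto the identifier ring [0, 2**bits)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


def successor(node_ids, ident):
    """Return the first node ID at or clockwise after ident, wrapping around."""
    ids = sorted(node_ids)
    pos = bisect_left(ids, ident)
    return ids[pos % len(ids)]


def lookup(key: str) -> int:
    """Map a key onto the node responsible for it."""
    return successor(NODES, ring_id(key))


print(successor(NODES, 11))  # ID 11 is owned by node 14
print(successor(NODES, 52))  # wraps around the ring to node 1
```

This sketch resolves the owner from a global view of the ring. A real DHT reaches the same answer by routing through per-node tables, which is where the logarithmic message count comes from.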

10 Data model… Publications are sets of attribute-value pairs (A, s), where A is a named attribute and s is a text value. An example of a publication in model AWP: {(AUTHOR, John Smith), (TITLE, Information dissemination in P2P systems), (ABSTRACT, In this paper we show …)}

11 …and query language Examples of continuous queries in model AWP

12 Distributed resource sharing revisited Publish/subscribe scenario

13 The DHTrie protocols Subscribing with a continuous query. Assume a query q of the form: [formula not shown]. Then for a random attribute A_i and a random word w_j contained in either s_i or wp_i, we create the string A_i w_j and use it as the key to forward the query to the peer with ID = H(A_i w_j).
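The subscription step can be sketched as follows. This is only an illustration under stated assumptions: the query is represented as a hypothetical dict from attribute names to the set of words it mentions, the string A_i w_j is built by plain concatenation, and H is taken to be SHA-1 reduced to 32 bits. None of these choices is given on the slide.

```python
import hashlib
import random


def H(key: str, bits: int = 32) -> int:
    """Assumed hash function: SHA-1 reduced to a bits-wide identifier."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


def subscription_key(query, rng=random):
    """Pick a random attribute A_i and a random word w_j, return the string A_i w_j."""
    attribute = rng.choice(sorted(query))
    word = rng.choice(sorted(query[attribute]))
    return attribute + word


# Hypothetical continuous query: attribute -> words it requires.
query = {"AUTHOR": {"smith"}, "TITLE": {"peer", "dissemination"}}

key = subscription_key(query, random.Random(0))
target_id = H(key)  # the query is forwarded to the peer responsible for this ID
print(key, target_id)
```

The query is stored at a single peer, so subscribing costs one DHT lookup regardless of how many words the query contains.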

14 The DHTrie protocols (contd) Publishing a resource. Assume a publication p of the form: [formula not shown]. Obtain a list of peer IDs by hashing the string A_i w_j for all words and all attributes in p (necessary to ensure correctness). Use indirect message passing and the DHT infrastructure to forward the message. The receiver node contacts neighbors included in the recipient list, removes them from it, and forwards the message.
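A sketch of how the recipient list for a publication might be built, assuming the publication is a dict from attribute names to text, words are obtained by lowercasing and splitting on whitespace, and H is SHA-1 reduced to 32 bits. All three are assumptions for illustration. Hashing every attribute-word pair is what guarantees that whichever pair a matching query was indexed under, the peer holding that query is contacted.

```python
import hashlib


def H(key: str, bits: int = 32) -> int:
    """Assumed hash function: SHA-1 reduced to a bits-wide identifier."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


def publication_keys(publication):
    """All strings A_i w_j for every attribute and every word of its text value."""
    return {
        attribute + word
        for attribute, text in publication.items()
        for word in text.lower().split()
    }


def recipient_ids(publication):
    """Sorted, duplicate-free list of IDs the publication must be sent to."""
    return sorted({H(key) for key in publication_keys(publication)})


publication = {
    "AUTHOR": "John Smith",
    "TITLE": "Information dissemination in P2P systems",
}
print(sorted(publication_keys(publication)))
print(recipient_ids(publication))
```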

15 Direct message passing The traditional way to forward a message to more than one recipient: send a lookup() message for each recipient. For k recipients we need O(k log(N)) lookup messages. Multicast techniques are not applicable, since the group of peers to be contacted is not known a priori.

16 Indirect message passing Incorporate the recipient list into the message. Avoid asking the same routing question more than once. Opportunistic forwarding. Message size increases due to: publication size (process the publication by removing stopwords and stemming; use an inverted, compressed index) and recipient list size (use gap compression; avoid peer IDs).
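The slide names gap compression for the recipient list without giving an encoding. A minimal sketch, assuming the common scheme of sorting the IDs and storing each one as the difference from its predecessor (the function names are hypothetical):

```python
def gap_encode(ids):
    """Sort the IDs and replace each one by its distance from the previous ID."""
    gaps, previous = [], 0
    for ident in sorted(set(ids)):
        gaps.append(ident - previous)
        previous = ident
    return gaps


def gap_decode(gaps):
    """Rebuild the sorted ID list by accumulating the gaps."""
    ids, total = [], 0
    for gap in gaps:
        total += gap
        ids.append(total)
    return ids


recipients = [40, 11, 48]
encoded = gap_encode(recipients)
print(encoded)              # [11, 29, 8]
print(gap_decode(encoded))  # [11, 40, 48]
```

The gaps are smaller numbers than the full identifiers, so a variable-length integer code can store them in fewer bytes. That second step is omitted here.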

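17 The DHTrie protocols Publishing a resource [Figure: identifier ring with nodes N1, N8, N14, N21, N32, N38, N42, N48, N51. A message with recipient list {N11, N40} is forwarded along the ring; the list shrinks from {N11, N40} to {N40} to {} as recipients are reached. Finger table of N1: N1+1 → N8, N1+2 → N8, N1+4 → N8, N1+8 → N14, N1+16 → N21, N1+32 → N38, … Finger table of N14: N14+1 → N21, N14+2 → N21, N14+4 → N21, N14+8 → N32, N14+16 → N42, N14+32 → N48, …]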
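The difference between direct and indirect message passing can be explored with a small simulation. This is a simplified sketch, not the DHTrie forwarding rule itself: it assumes a 64-position Chord-style ring with the node IDs from the figure, computes finger tables from the standard definition (successor of n + 2^i) rather than copying them from the figure, and lets the message travel clockwise from one recipient to the next instead of starting every lookup at the publisher. All function names are hypothetical.

```python
from bisect import bisect_left

BITS = 6
RING = 2 ** BITS
NODES = [1, 8, 14, 21, 32, 38, 42, 48, 51]


def successor(ident):
    """First node at or clockwise after ident."""
    pos = bisect_left(NODES, ident % RING)
    return NODES[pos % len(NODES)]


def distance(a, b):
    """Clockwise distance from a to b on the ring."""
    return (b - a) % RING


def fingers(node):
    """Finger table entries: successor(node + 2**i) for i = 0 .. BITS-1."""
    return [successor(node + 2 ** i) for i in range(BITS)]


def next_hop(node, target):
    """Finger that gets closest to the target node without overshooting it."""
    limit = distance(node, target)
    candidates = [f for f in fingers(node) if 0 < distance(node, f) <= limit]
    return max(candidates, key=lambda f: distance(node, f))


def direct_send(origin, key_ids):
    """One independent lookup per recipient, each starting at the origin."""
    hops = 0
    for target in {successor(k) for k in key_ids}:
        node = origin
        while node != target:
            node = next_hop(node, target)
            hops += 1
    return hops


def indirect_send(origin, key_ids):
    """One message carrying the recipient list, handed on from node to node."""
    pending = sorted({successor(k) for k in key_ids},
                     key=lambda n: distance(origin, n))
    node, hops, delivered = origin, 0, []
    while pending:
        if node in pending:            # this node is a recipient: deliver here
            pending.remove(node)
            delivered.append(node)
            continue
        node = next_hop(node, pending[0])
        hops += 1
    return delivered, hops


print(direct_send(1, [11, 40, 45]))    # 5 hops in total
print(indirect_send(1, [11, 40, 45]))  # ([14, 42, 48], 4)
```

In this small example the indirect variant saves one hop because the message to node 48 continues from node 42 instead of being routed again from node 1. The saving depends on how the recipients are spread over the ring.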

18 The DHTrie protocols Notifying interested subscribers. To find all matching queries in a peer, we use the filtering algorithm BestFitTrie [Tryfonopoulos, Koubarakis, Drougas, SIGIR 2004]. Once all matching queries are found, a notification message is created and forwarded to peers using indirect message passing.

19 Some (preliminary) results

20 Filtering algorithms at each super-peer Query clustering algorithm BestFitTrie. The data structure is a hash table of tries; the hash table is used for fast access to trie roots. We search for the best place to store query q in two phases: (1) best position trie-wise, (2) best position forest-wise. The matching procedure examines only tries with roots contained in the incoming document.

21 Filtering algorithms at each super-peer PrefixTrie: Prefix-based clustering (handle a query as a sequence of words) BestFitTrie: Set-based clustering (handle a query as a set of words)
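To make the "hash table of tries" idea concrete, here is a deliberately simplified sketch in the spirit of prefix-based clustering. It is not the authors' PrefixTrie or BestFitTrie algorithm: it assumes every query is a non-empty conjunction of words, stores the words in sorted order as one path, and skips the search for a best insertion position entirely. The class and method names are hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # word -> TrieNode
        self.queries = []   # IDs of queries whose word sequence ends here


class QueryForest:
    """Hash table of tries: the dict `roots` gives fast access to trie roots."""

    def __init__(self):
        self.roots = {}

    def insert(self, query_id, words):
        """Store a non-empty conjunctive query as a path of its sorted words."""
        sequence = sorted(set(words))
        node = self.roots.setdefault(sequence[0], TrieNode())
        for word in sequence[1:]:
            node = node.children.setdefault(word, TrieNode())
        node.queries.append(query_id)

    def match(self, document_words):
        """Return IDs of queries whose words all occur in the document."""
        document = set(document_words)
        matched = []
        # Only tries whose root word occurs in the document are examined.
        stack = [self.roots[w] for w in document if w in self.roots]
        while stack:
            node = stack.pop()
            matched.extend(node.queries)
            stack.extend(child for word, child in node.children.items()
                         if word in document)
        return sorted(matched)


forest = QueryForest()
forest.insert("q1", ["p2p", "systems"])
forest.insert("q2", ["p2p", "systems", "filtering"])
forest.insert("q3", ["grid"])

print(forest.match("information dissemination in p2p systems".split()))  # ['q1']
```

Queries that share a prefix of words share trie nodes, so a common prefix is tested against the document only once. Treating a query as a set rather than a sequence, as BestFitTrie does, gives more freedom in choosing which existing path to reuse.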

22 Filtering algorithms at each super-peer

23 [Figure: results plot comparing BestFitTrie and PrefixTrie; series labelled BestFitTrie 1M, PrefixTrie 1M, BestFitTrie 3M, PrefixTrie 3M]

24 Other interesting issues Load balancing: frequency of occurrence of words may overload certain peers; index queries under infrequent words; use controlled replication. Word frequency computation: also useful in other types of queries (VSM); global vs. local ranking schemes; propose a hybrid ranking scheme with updating and estimation mechanisms.
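One way to read "index queries under infrequent words" is to replace the random choice of attribute and word with the rarest word according to some frequency estimate. The sketch below assumes a hypothetical dict of word frequencies is available at the subscriber; how such statistics are computed and kept up to date is exactly the open issue listed on the slide.

```python
def rarest_key(query, frequency):
    """Return the string A_i w_j for the least frequent word in the query.

    Words missing from `frequency` are treated as having frequency 0.
    Ties are broken alphabetically so the choice is deterministic.
    """
    pairs = [(attribute, word)
             for attribute, words in query.items()
             for word in words]
    attribute, word = min(pairs, key=lambda p: (frequency.get(p[1], 0), p))
    return attribute + word


# Hypothetical query and word-frequency estimates.
query = {"AUTHOR": {"smith"}, "TITLE": {"peer", "dissemination"}}
frequency = {"smith": 500, "peer": 900, "dissemination": 12}

print(rarest_key(query, frequency))  # TITLEdissemination
```

Peers responsible for very common words then store fewer queries, at the price of needing frequency information when subscribing.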

25 Thank you Funding sources: IST/FET project DIET (www.dfki.uni-kl.de/IVSWEB/DIET), IST/FET project Evergrow (http://www.evergrow.org), Heraclitus Ph.D. Fellowship Program (Greece).

