Publish/Subscribe Systems with Distributed Hash Tables and Languages from IR Christos Tryfonopoulos & Manolis Koubarakis Intelligent Systems Lab Dept.

Presentation transcript:

Publish/Subscribe Systems with Distributed Hash Tables and Languages from IR Christos Tryfonopoulos & Manolis Koubarakis Intelligent Systems Lab Dept. of Electronic & Computer Engineering Technical University of Crete, Greece

Overview
Motivation
Distributed resource sharing
The DHTrie protocols
Local filtering algorithms
Conclusions

Motivation
Resource sharing is at the core of today's computing (Web, P2P, Grid).
One-time as well as continuous querying functionality is needed.
Data models and languages based on Information Retrieval are useful for annotating and querying resources.
Many existing technologies to build on (e.g., overlay networks, agents, etc.).

Related work
Distributed information retrieval: p-Search, PlanetP, [Li et al. 2003], [Cohen et al. 2003], [Reynolds et al. 2002], …
Publish/subscribe, non-DHT-based: SIFT, SIENA, Le Subscribe, Gryphon, P2P-DIET, …
Publish/subscribe, DHT-based: Scribe, Bayeux (topic-based); [Tam et al. 2003], [Pietzuch et al. 2003], [Terpstra et al. 2003], [Triantafillou et al. 2004] (content-based)

Distributed resource (file) sharing
Two kinds of basic functionality are expected:
One-time querying: a user poses a query such as "I want photos of Euro 2004 champions". The system returns a list of pointers to matching resources.
Publish/subscribe: a user posts a continuous query to receive a notification when a photo of Euro 2004 champions is published. The system notifies the subscriber with a pointer to the peer that published the photo.

Distributed resource sharing One-time query scenario

Distributed resource sharing Publish/subscribe scenario

Achievements in the context of DIET
Languages and data models from IR (emphasis on textual information).
Efficient filtering algorithms.
The system P2P-DIET: a super-peer based P2P system, implemented on top of the lightweight mobile agent platform DIET Agents.
DIET project: DIET Agents: P2P-DIET:
Current work: solve the pub/sub problem using ideas from DHTs.

Distributed Hash Tables (DHTs)
Created to solve the object location problem in a distributed (dynamic) network of nodes.
Support only one operation: given a key, map the key onto a node.
Many existing systems (Chord, CAN, Pastry, Tapestry, P-Grid, DKS, Viceroy, …).
A lookup needs a number of messages that is logarithmic in the number of nodes.
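The single DHT operation (map a key onto a node) can be illustrated with a minimal sketch. It assumes a Chord-style identifier ring where a key belongs to the first node clockwise from the key's hash; the 16-bit ring size, the `peer-N` names, and the helper names `ident` and `successor` are illustrative choices, not part of the slides. The sketch uses global knowledge of all node IDs, so it shows only the mapping, not the logarithmic routing.

```python
import hashlib
from bisect import bisect_left

BITS = 16  # assumption: small ring for readability (Chord commonly uses 160-bit SHA-1 IDs)

def ident(key: str, bits: int = BITS) -> int:
    """Hash a string onto the identifier ring [0, 2**bits)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)

def successor(node_ids: list[int], key_id: int) -> int:
    """Return the first node ID clockwise from key_id; node_ids must be sorted."""
    i = bisect_left(node_ids, key_id)
    return node_ids[i % len(node_ids)]  # wrap around past the largest ID

# Hypothetical network of eight peers.
nodes = sorted(ident(f"peer-{n}") for n in range(8))
owner = successor(nodes, ident("TITLE" + "dissemination"))
```

Here `owner` is the ID of the peer responsible for the key; a real DHT reaches the same node by routing through finger tables instead of consulting a global list.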

Data model…
Publications are sets of attribute-value pairs (A, s), where A is a named attribute and s is a text value.
An example of a publication in model AWP:
{(AUTHOR, John Smith), (TITLE, Information dissemination in P2P systems), (ABSTRACT, In this paper we show …)}

…and query language Examples of continuous queries in model AWP

Distributed resource sharing revisited Publish/subscribe scenario

The DHTrie protocols: subscribing with a continuous query
Assume query q of the form:
Then, for a random attribute A_i and a random word w_j contained in either s_i or wp_i, we create the string A_i w_j and use it as the key to forward the query to the peer with ID = H(A_i w_j).
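The subscription step can be sketched as follows. This is a simplified illustration, not the authors' implementation: the query is reduced to a mapping from attribute names to word lists, the string A_i w_j is assumed to be a plain concatenation, and the hash function `H`, the 16-bit ID space, and the name `subscription_key` are assumptions.

```python
import hashlib
import random

def H(key: str, bits: int = 16) -> int:
    """Assumed hash function: SHA-1 reduced to a small identifier space."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)

def subscription_key(query: dict[str, list[str]], rng: random.Random) -> str:
    """Pick a random attribute A_i and a random word w_j from it; return A_i + w_j."""
    attribute = rng.choice(sorted(query))
    word = rng.choice(query[attribute])
    return attribute + word

# Hypothetical continuous query, already reduced to words per attribute.
query = {"AUTHOR": ["smith"], "TITLE": ["information", "dissemination"]}
key = subscription_key(query, random.Random(0))
peer_id = H(key)  # the query is forwarded to the peer responsible for this ID
```

Because the query is stored under only one attribute-word pair, the publishing side has to contact the peers for every pair in a publication to be sure of reaching it.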

The DHTrie protocols (cont'd): publishing a resource
Assume a publication p of the form:
Obtain a list of peer IDs by hashing the string A_i w_j for all words and all attributes in p (necessary to ensure correctness).
Use indirect message passing and the DHT infrastructure to forward the message.
The receiver node contacts the neighbors included in the recipient list, removes them from it, and forwards the message.
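Building the recipient list for a publication can be sketched as below. The tokenization (lowercasing and whitespace splitting), the hash function `H`, and the name `recipient_ids` are assumptions made for illustration; stopword removal and stemming are left out.

```python
import hashlib

def H(key: str, bits: int = 16) -> int:
    """Assumed hash function: SHA-1 reduced to a small identifier space."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)

def recipient_ids(publication: dict[str, str]) -> list[int]:
    """Hash every attribute-word pair of the publication into a sorted ID list."""
    ids = set()
    for attribute, text in publication.items():
        for word in set(text.lower().split()):  # assumed tokenization
            ids.add(H(attribute + word))
    return sorted(ids)

publication = {
    "AUTHOR": "John Smith",
    "TITLE": "Information dissemination in P2P systems",
}
recipients = recipient_ids(publication)
```

Any query indexed under a pair such as `TITLE` + `dissemination` lives at an ID that appears in `recipients`, which is why all pairs must be hashed.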

Direct message passing
The traditional way to forward a message to more than one recipient: send a lookup() message for each recipient.
For k recipients we need O(k log N) lookup messages.
Multicast techniques are not applicable, since the group of peers to be contacted is not known a priori.

Indirect message passing
Incorporate the recipient list into the message.
Avoid asking the same routing question more than once.
Opportunistic forwarding.
Increase in message size due to:
publication size: process the publication (remove stopwords, stemming) and use an inverted (and compressed) index;
recipient list size: use gap compression (avoid peer IDs).
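The slide does not spell out the gap compression scheme. One common reading, borrowed from inverted-index compression, is to sort the recipient IDs and store the differences between consecutive IDs, which are smaller numbers than the IDs themselves. A minimal sketch under that assumption (the function names are illustrative):

```python
def gap_encode(sorted_ids: list[int]) -> list[int]:
    """Replace each ID by its distance from the previous one (first gap is from 0)."""
    gaps = []
    previous = 0
    for identifier in sorted_ids:
        gaps.append(identifier - previous)
        previous = identifier
    return gaps

def gap_decode(gaps: list[int]) -> list[int]:
    """Rebuild the sorted ID list by accumulating the gaps."""
    ids = []
    total = 0
    for gap in gaps:
        total += gap
        ids.append(total)
    return ids

encoded = gap_encode([8, 14, 21, 40])
decoded = gap_decode(encoded)
```

The gaps would then be written with a variable-length integer code so that small values take fewer bytes; that step is omitted here.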

The DHTrie protocols: publishing a resource
[Figure: identifier ring with nodes N1, N8, N14, N21, N32, N38, N42, N48, N51. The messages in the figure carry the recipient lists {N11, N40}, {N40} and {}.]
Finger table of N1: N1+1 → N8, N1+2 → N8, N1+4 → N8, N1+8 → N14, N1+16 → N21, N1+32 → N38, …
Finger table of N14: N14+1 → N21, N14+2 → N21, N14+4 → N21, N14+8 → N32, N14+16 → N42, N14+32 → N48, …
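Indirect message passing over finger tables can be simulated with a small sketch. It is one possible reading of the protocol, not the authors' code: it assumes a 6-bit Chord-style ring with the node IDs from the figure, maps each recipient ID to the node responsible for it, delivers directly to recipients found in the current node's finger table, and forwards the remainder to the finger that most closely precedes the nearest remaining recipient. All function names are illustrative.

```python
from bisect import bisect_left

BITS = 6                      # assumption: 6-bit identifier ring
RING = 2 ** BITS
NODES = [1, 8, 14, 21, 32, 38, 42, 48, 51]  # node IDs taken from the figure

def successor(ident: int) -> int:
    """First node clockwise from ident on the ring."""
    i = bisect_left(NODES, ident % RING)
    return NODES[i % len(NODES)]

def finger_table(node: int) -> list[int]:
    """Chord-style fingers: successor(node + 2**i) for i = 0..BITS-1."""
    return [successor(node + 2 ** i) for i in range(BITS)]

def distance(a: int, b: int) -> int:
    """Clockwise distance from a to b."""
    return (b - a) % RING

def publish(start: int, recipients: set[int]) -> tuple[set[int], int]:
    """Return (nodes reached, messages sent) for one publication."""
    remaining = {successor(r) for r in recipients}  # assumed: IDs map to responsible nodes
    delivered: set[int] = set()
    messages = 0
    node = start
    while True:
        if node in remaining:            # the current node is itself a recipient
            remaining.discard(node)
            delivered.add(node)
        fingers = set(finger_table(node))
        direct = remaining & fingers     # neighbors that appear in the recipient list
        delivered |= direct
        messages += len(direct)
        remaining -= direct
        if not remaining:
            return delivered, messages
        target = min(remaining, key=lambda r: distance(node, r))
        node = max(
            (f for f in fingers if 0 < distance(node, f) < distance(node, target)),
            key=lambda f: distance(node, f),
        )
        messages += 1                    # forward message with the shrunken list

reached, sent = publish(1, {11, 40})
```

In this run, N1 contacts N14 directly, forwards the rest of the list to N38, and N38 contacts N42, so the same routing question is never asked twice.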

The DHTrie protocols: notifying interested subscribers
To find all matching queries in a peer, we use the filtering algorithm BestFitTrie [Tryfonopoulos, Koubarakis, Drougas, SIGIR 2004].
Once all matching queries are found, a notification message is created and forwarded to the subscriber peers using indirect message passing.

Some (preliminary) results

Filtering algorithms at each super-peer
Query clustering algorithm BestFitTrie.
The data structure is a hash table of tries; the hash table is used for fast access to trie roots.
We search for the best place to store query q in two phases: 1. best position trie-wise; 2. best position forest-wise.
The matching procedure examines only tries with roots contained in the incoming document.

Filtering algorithms at each super-peer
PrefixTrie: prefix-based clustering (handles a query as a sequence of words).
BestFitTrie: set-based clustering (handles a query as a set of words).
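The hash-table-of-tries idea can be sketched for the simpler, prefix-based variant. This is a simplified illustration, not the published PrefixTrie or BestFitTrie: queries are reduced to non-empty conjunctions of words, attributes and proximity operators are ignored, and the class and method names are assumptions.

```python
class TrieNode:
    def __init__(self, word: str):
        self.word = word
        self.children: dict[str, "TrieNode"] = {}
        self.queries: list[str] = []   # IDs of queries that end at this node

class PrefixTrieIndex:
    def __init__(self):
        self.roots: dict[str, TrieNode] = {}  # hash table: first word -> trie root

    def insert(self, query_id: str, words: list[str]) -> None:
        """Store a non-empty query as a word sequence; shared prefixes share nodes."""
        sequence = list(dict.fromkeys(words))  # drop duplicates, keep order
        node = self.roots.setdefault(sequence[0], TrieNode(sequence[0]))
        for word in sequence[1:]:
            node = node.children.setdefault(word, TrieNode(word))
        node.queries.append(query_id)

    def match(self, document_words: list[str]) -> list[str]:
        """Return IDs of queries whose words all occur in the document."""
        doc = set(document_words)
        matched = []
        # Only tries whose root word occurs in the document are examined.
        stack = [self.roots[w] for w in doc if w in self.roots]
        while stack:
            node = stack.pop()
            matched.extend(node.queries)
            stack.extend(c for w, c in node.children.items() if w in doc)
        return sorted(matched)

index = PrefixTrieIndex()
index.insert("q1", ["peer", "network"])
index.insert("q2", ["peer", "network", "hash"])
index.insert("q3", ["hash", "table"])
hits = index.match(["a", "peer", "network", "hash"])
```

Queries `q1` and `q2` share the path peer → network, so that path is traversed once for both. A set-based scheme such as BestFitTrie is free to reorder a query's words to find a better shared path, which this sketch does not attempt.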

Filtering algorithms at each super-peer

[Figure legend: BestFitTrie 1M, PrefixTrie 1M, BestFitTrie 3M, PrefixTrie 3M]

Other interesting issues
Load balancing: the frequency of occurrence of words may overload certain peers. Index queries under infrequent words. Use controlled replication.
Word frequency computation: also useful in other types of queries (VSM). Global vs. local ranking schemes. Propose a hybrid ranking scheme, with updating and estimation mechanisms.
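Indexing queries under infrequent words can be sketched as a change to the subscription step: instead of a random attribute-word pair, pick the pair whose word is rarest according to some frequency estimate. The frequency table, the treatment of unknown words as rarest, and the name `rarest_pair` are assumptions for illustration.

```python
def rarest_pair(query: dict[str, list[str]], frequency: dict[str, int]) -> tuple[str, str]:
    """Return the (attribute, word) pair whose word has the lowest estimated frequency."""
    pairs = [(attribute, word) for attribute, words in query.items() for word in words]
    # Unknown words count as frequency 0; ties are broken alphabetically.
    return min(pairs, key=lambda pair: (frequency.get(pair[1], 0), pair))

# Hypothetical frequency estimates and query.
frequency = {"information": 900, "dissemination": 12, "smith": 40}
query = {"AUTHOR": ["smith"], "TITLE": ["information", "dissemination"]}
attribute, word = rarest_pair(query, frequency)
```

Peers responsible for very common words then store fewer queries, at the cost of needing word frequency estimates at subscription time.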

Thank you
Funding sources: IST/FET project DIET, IST/FET project Evergrow, Heraclitus Ph.D. Fellowship Program (Greece)