
ICDE 2004: A Peer-to-peer Framework for Caching Range Queries. Ozgur D. Sahin, Abhishek Gupta, Divyakant Agrawal, Amr El Abbadi, Department of Computer Science.


1. A Peer-to-peer Framework for Caching Range Queries (ICDE 2004)
Ozgur D. Sahin, Abhishek Gupta, Divyakant Agrawal, Amr El Abbadi
Department of Computer Science, University of California at Santa Barbara

2. Outline
- Motivation
- Range mapping
- System overview
- Experimental results
- Conclusion and future work

3. Motivation
- All queries are answered by the central data server.
- The server becomes overloaded, which limits scalability and availability.
- The same or similar queries are evaluated multiple times.
[Figure: clients sending every query to the central data server]

4. Motivation (continued)
- Users share their cached answers.
- The server is contacted only if the P2P layer cannot find an answer.
[Figure: clients, a P2P cache layer, and the central data server]

5. P2P Systems
- File sharing: Napster, Gnutella, KaZaA, … (central index or flooding)
- Structured P2P systems: CAN, Chord, Pastry, Tapestry, … (DHT/DOLR), with efficient (logarithmic or sublinear) routing

6. CAN
- Uses a d-dimensional virtual space for routing and object location.
- The virtual space is partitioned into zones, and each zone is maintained by a peer.
- Every peer is responsible for the objects that are hashed into its zone.
[Figure: a 2-dimensional CAN]
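To make the zone-ownership rule concrete, here is a minimal sketch of object placement in a 2-dimensional CAN. The hash scheme (two 48-bit slices of SHA-1 scaled into the unit square), the quadrant layout, and the names `hash_to_point` and `owner` are illustrative assumptions, not CAN's specified mechanics.

```python
import hashlib

# Hypothetical 2-d CAN over the unit square [0, 1) x [0, 1).
# A zone is (x0, x1, y0, y1), half-open on its upper bounds, and is owned by one peer.
Zone = tuple[float, float, float, float]


def hash_to_point(key: str) -> tuple[float, float]:
    """Hash a key to a point in the unit square (assumed scheme: two 48-bit slices of SHA-1)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    x = int.from_bytes(digest[0:6], "big") / 2**48
    y = int.from_bytes(digest[6:12], "big") / 2**48
    return x, y


def owner(zones: dict[str, Zone], point: tuple[float, float]) -> str:
    """Return the peer whose zone contains the point; that peer stores the object."""
    px, py = point
    for peer, (x0, x1, y0, y1) in zones.items():
        if x0 <= px < x1 and y0 <= py < y1:
            return peer
    raise ValueError("zones do not cover the point")


# Four hypothetical peers, one quadrant each.
zones = {
    "p1": (0.0, 0.5, 0.0, 0.5),
    "p2": (0.5, 1.0, 0.0, 0.5),
    "p3": (0.0, 0.5, 0.5, 1.0),
    "p4": (0.5, 1.0, 0.5, 1.0),
}
print(owner(zones, hash_to_point("some-object")))
```

Because the key is hashed, any peer can compute the same point and therefore find the same owner without a central index.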

7. Extending DHT Functionality
- DHTs are designed for exact-match queries.
- Related work: Piazza [Univ. of Washington], Hyperion [Univ. of Toronto], PIER [UC Berkeley].
- Extend DHTs to support range queries: selecting ranges is a primary operation for any kind of data analysis.
- Main goal: use a DHT to materialize and locate the cached answers of range queries.

8. Range Queries
- Given a range query, find the cached answers that can be used to compute the query answer.
- Example: if the result of a range is already cached in the system, then a query whose range is subsumed by the cached range can be answered using the cached result, because the cached result is a superset of the answer.
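The subsumption idea can be sketched in a few lines, assuming closed integer ranges and a cached result stored as a plain list of attribute values. The ranges and values in the example are hypothetical, chosen only to show the filtering step.

```python
def subsumes(cached: tuple[int, int], query: tuple[int, int]) -> bool:
    """True if the cached range contains the query range (closed intervals)."""
    return cached[0] <= query[0] and query[1] <= cached[1]


def answer_from_cache(cached_range, cached_rows, query):
    """Filter a cached superset down to the rows the query asks for.

    Returns None when the cached range does not subsume the query,
    i.e. the cached result cannot be used on its own.
    """
    if not subsumes(cached_range, query):
        return None
    lo, hi = query
    return [v for v in cached_rows if lo <= v <= hi]


# Hypothetical cached answer for the range [30, 70].
cached_range = (30, 70)
cached_rows = [31, 42, 48, 55, 63, 69]
print(answer_from_cache(cached_range, cached_rows, (40, 60)))  # [42, 48, 55]
```

The extra work at the requesting peer is a local filter, which is usually far cheaper than contacting the server.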

9. DHTs for Locating Ranges
- Can we use original DHTs?
- Use the range string as the key: write the query range as a string and hash that string.
- This finds exact answers but not the similar ones!
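A quick sketch shows why hashing range strings only supports exact matches. The string format `"[start,end]"` and the use of SHA-1 are assumptions for illustration; any uniform hash has the same effect.

```python
import hashlib


def range_key(start: int, end: int) -> str:
    """Plain-DHT approach: hash the textual form of the range."""
    return hashlib.sha1(f"[{start},{end}]".encode("utf-8")).hexdigest()


# An exact repeat of a query hits the same key...
assert range_key(40, 60) == range_key(40, 60)
# ...but a cached super-range such as [30,70] lands at an unrelated key,
# so a lookup for [40,60] never reaches the peer that holds it.
print(range_key(40, 60)[:8], range_key(30, 70)[:8])
```

Hashing deliberately destroys locality, which is exactly the property needed to find enclosing ranges.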

10. Extending CAN
- For a single attribute, the virtual space is a 2-dimensional CAN.
- The boundaries are determined by the domain of the range attribute.
[Figure: virtual space when the attribute domain is [20,80]; both the x and y axes run from 20 to 80]

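11. Mapping Scheme
- A range [x,y] is mapped to the point (x,y): the x-axis is the start value and the y-axis is the end value.
- Super-ranges lie only in the upper-left region: a range [a,b] contains [x,y] exactly when a ≤ x and b ≥ y.
[Figure: domain [20,80] with corners (20,20), (80,20), (80,80), (20,80); the range [40,60] maps to (40,60); example points (30,70) and (30,50)]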
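The mapping and the upper-left property can be checked directly. This sketch uses the example points from the slide; the function names are hypothetical.

```python
def to_point(start: float, end: float) -> tuple[float, float]:
    """Map a range [start, end] to the point (start, end): x = start, y = end."""
    if start > end:
        raise ValueError("start must not exceed end")
    return (start, end)


def is_upper_left(p: tuple[float, float], q: tuple[float, float]) -> bool:
    """True if p is left of and above q (inclusive), i.e. p's range contains q's."""
    return p[0] <= q[0] and p[1] >= q[1]


q = to_point(40, 60)
print(is_upper_left(to_point(30, 70), q))  # True: [30,70] contains [40,60]
print(is_upper_left(to_point(30, 50), q))  # False: [30,50] ends before 60
```

Since start ≤ end, every mapped point lies on or above the diagonal y = x, so only the triangle above the diagonal is ever populated.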

12. Space Partitioning
- The virtual space is partitioned into rectangular zones.
- Each zone is assigned to an active peer.
- With this mapping, the data source is responsible for the top-left zone, whose corner point (domain start, domain end) represents the entire domain.

13. Space Partitioning: Active/Passive Peers
[Figure: the data source S and the active peers own zones of the virtual space; passive peers own no zone]

16. Space Partitioning: Active/Passive Peers
- Each active peer keeps a list of passive peers.
- Passive peers register with active peers.

17. Zone Split
- An active peer splits its zone when it is overloaded; the load can be due to storage, bandwidth, etc.
- The split line is selected by the owner of the zone so that the zone and its cached results are partitioned evenly.
- The new zone is assigned to a passive peer.
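The slides do not fix a split policy, so the following is only one hypothetical way to pick a split line that evenly divides both the zone and its cached results. It cuts the longer side at the median coordinate of the cached points and hands the far half to the first registered passive peer. The `Zone` class, `split`, and the median rule are assumptions.

```python
import statistics
from dataclasses import dataclass, field


@dataclass
class Zone:
    x0: float
    x1: float
    y0: float
    y1: float
    owner: str
    points: list = field(default_factory=list)  # (start, end) points of cached ranges


def split(zone: Zone, passive_peers: list) -> Zone:
    """Split an overloaded zone in two and give the far half to a passive peer.

    Hypothetical policy: cut the longer side at the median coordinate of the
    cached points so that each half keeps roughly half of the cached results.
    """
    vertical = (zone.x1 - zone.x0) >= (zone.y1 - zone.y0)
    axis = 0 if vertical else 1
    lo, hi = (zone.x0, zone.x1) if vertical else (zone.y0, zone.y1)
    coords = [p[axis] for p in zone.points]
    cut = statistics.median(coords) if coords else (lo + hi) / 2
    if not lo < cut < hi:
        cut = (lo + hi) / 2  # median on the boundary: fall back to the midpoint
    new_owner = passive_peers.pop(0)  # a registered passive peer becomes active
    if vertical:
        new = Zone(cut, zone.x1, zone.y0, zone.y1, new_owner)
        zone.x1 = cut
    else:
        new = Zone(zone.x0, zone.x1, cut, zone.y1, new_owner)
        zone.y1 = cut
    # Index entries move with the half of the zone their point falls in.
    new.points = [p for p in zone.points if p[axis] >= cut]
    zone.points = [p for p in zone.points if p[axis] < cut]
    return new


zone = Zone(20, 80, 20, 80, "A", [(22, 70), (30, 50), (45, 60), (60, 75), (70, 78)])
passive = ["P1", "P2"]
new_zone = split(zone, passive)
print(zone.x0, zone.x1, new_zone.x0, new_zone.x1, new_zone.owner)  # 20 45 45 80 P1
```

Splitting on cached points rather than on the geometric midpoint is what keeps the stored load balanced when queries cluster in part of the domain.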

18. Routing
- Same as in CAN (greedy routing): each zone passes the message to the neighbor closest to the destination.
[Figure: a query for the range [50,55] routed to the point (50,55)]
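Greedy routing can be sketched over rectangular zones. This assumes neighbors are zones sharing an edge of positive length and measures "closest" as the Euclidean distance from the destination point to a zone. The 4 x 4 grid over [0,100] is a hypothetical layout, and the function names are illustrative.

```python
import math

Rect = tuple[float, float, float, float]  # (x0, x1, y0, y1)


def dist(zone: Rect, p: tuple[float, float]) -> float:
    """Euclidean distance from point p to the zone (0 if p is inside or on its edge)."""
    x0, x1, y0, y1 = zone
    dx = max(x0 - p[0], 0.0, p[0] - x1)
    dy = max(y0 - p[1], 0.0, p[1] - y1)
    return math.hypot(dx, dy)


def adjacent(a: Rect, b: Rect) -> bool:
    """Zones are neighbors if they share an edge segment of positive length."""
    ax0, ax1, ay0, ay1 = a
    bx0, bx1, by0, by1 = b
    overlap_x = min(ax1, bx1) - max(ax0, bx0)
    overlap_y = min(ay1, by1) - max(ay0, by0)
    touch_x = ax1 == bx0 or bx1 == ax0
    touch_y = ay1 == by0 or by1 == ay0
    return (touch_x and overlap_y > 0) or (touch_y and overlap_x > 0)


def greedy_route(zones: list[Rect], start: int, target: tuple[float, float]) -> list[int]:
    """Forward to the neighbor closest to target until a zone contains it."""
    path = [start]
    cur = start
    while dist(zones[cur], target) > 0:
        nbrs = [j for j in range(len(zones)) if j != cur and adjacent(zones[cur], zones[j])]
        nxt = min(nbrs, key=lambda j: dist(zones[j], target))
        if dist(zones[nxt], target) >= dist(zones[cur], target):
            break  # no neighbor makes progress; this simple sketch gives up
        cur = nxt
        path.append(cur)
    return path


# Hypothetical 4 x 4 grid of equal zones over [0, 100] x [0, 100].
grid = [(i * 25.0, (i + 1) * 25.0, j * 25.0, (j + 1) * 25.0) for i in range(4) for j in range(4)]
route = greedy_route(grid, 0, (60.0, 55.0))
print(route)  # [0, 4, 5, 9, 10]
```

Each peer only needs its neighbors' zone boundaries to make a forwarding decision, which is what lets CAN route without global state.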

19. Sharing Cached Answers
- Map the range to a point and send a notification message towards that point.
- The destination peer keeps the index information.
[Figure: 1) peer P caches the answer for [50,55]; 2) P notifies A, the owner of the zone containing (50,55); 3) A inserts the entry [50,55] : P into its local index]

20. Querying
- Map the range to a point and send a query message towards that point.
- The destination peer searches its local index.
[Figure: 1) peer C requires [50,55] and sends a query; 2) A finds [50,55] : P in its local index and returns P; 3) P transfers the cached answer to C]
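The sharing and querying steps meet at the zone owner's local index. A minimal sketch of that index follows, assuming it maps a range to the set of peers caching its answer. `ZoneIndex`, `notify`, and `lookup` are hypothetical names, and the network messages are reduced to method calls.

```python
from collections import defaultdict


class ZoneIndex:
    """Local index kept by the peer that owns the zone a range maps to.

    Maps a range (start, end) to the peers that cache its answer.
    """

    def __init__(self) -> None:
        self.entries: dict[tuple[int, int], set[str]] = defaultdict(set)

    def notify(self, rng: tuple[int, int], peer: str) -> None:
        """Sharing: record that `peer` caches the answer for `rng`."""
        self.entries[rng].add(peer)

    def lookup(self, rng: tuple[int, int]) -> set[str]:
        """Querying: return the peers holding an exact answer (empty if none)."""
        return set(self.entries.get(rng, ()))


# Peer A owns the zone containing the point (50, 55).
index_at_a = ZoneIndex()
index_at_a.notify((50, 55), "P")    # P cached [50,55] and notified A
print(index_at_a.lookup((50, 55)))  # {'P'}: A returns P, and C fetches the answer from P
```

Only index entries travel to A; the cached tuples themselves stay at P until a querying peer requests them.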

21. Forwarding
- If no result is found at the destination, the request is forwarded.
- The zones in the upper-left region may hold super-ranges, so the destination zone forwards the request to those upper-left zones.
[Figure: forwarding from the destination point (50,55) to the zones in its upper-left region]

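22. Acceptable Fit
- How far should the request be forwarded? Forwarding is controlled by a parameter, AcceptableFit, a real value in [0,1]:
  offset = AcceptableFit × |domain|
- The acceptable region for a range query [x,y] is then the square of side offset to the upper-left of the query point: cached ranges [a,b] with x − offset ≤ a ≤ x and y ≤ b ≤ y + offset.
[Figure: the acceptable region of side offset next to the query point (50,55)]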
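A sketch of the AcceptableFit computation follows. It assumes the acceptable region is the offset-sized square to the upper-left of the query point, clipped to the domain, which is how the slide is read here. The function names and the AcceptableFit value of 0.25 are hypothetical.

```python
def acceptable_region(query, domain, acceptable_fit):
    """Square of side `offset` extending left (earlier start) and up (later end)
    from the query point, clipped to the domain.
    Returns (x_min, x_max, y_min, y_max)."""
    if not 0.0 <= acceptable_fit <= 1.0:
        raise ValueError("AcceptableFit must lie in [0, 1]")
    lo, hi = domain
    start, end = query
    offset = acceptable_fit * (hi - lo)  # offset = AcceptableFit x |domain|
    return (max(lo, start - offset), start, end, min(hi, end + offset))


def is_acceptable(cached, region):
    """True if the cached range's point falls inside the acceptable region."""
    x_min, x_max, y_min, y_max = region
    return x_min <= cached[0] <= x_max and y_min <= cached[1] <= y_max


region = acceptable_region((50, 55), (20, 80), 0.25)  # offset = 0.25 * 60 = 15
print(region)                           # (35.0, 50, 55, 70.0)
print(is_acceptable((45, 60), region))  # True
print(is_acceptable((30, 70), region))  # False: contains the query but is too loose
```

A larger AcceptableFit finds more reusable results but may return supersets much bigger than the answer, which the requesting peer must then transfer and filter.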

23. Forwarding Schemes
- Flooding: flood the request to all candidate zones.
- Directed forwarding: iteratively forward the request to the single neighbor that has the largest overlap with the acceptable region. Stop when a result is found or a certain number of peers (DirectedLimit) have been contacted.
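Directed forwarding can be sketched as a walk that always picks the neighbor overlapping the acceptable region most. In this sketch DirectedLimit is read as the maximum number of forwards after the destination zone, which is an assumption. The zone layout, neighbor lists, and `has_result` callback are hypothetical.

```python
Rect = tuple[float, float, float, float]  # (x0, x1, y0, y1)


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two axis-aligned rectangles (0 if disjoint)."""
    w = min(a[1], b[1]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[2], b[2])
    return max(w, 0) * max(h, 0)


def directed_forward(start, zones, neighbors, region, has_result, directed_limit):
    """From the destination zone `start`, repeatedly move to the unvisited
    neighbor with the largest overlap with the acceptable region.
    Stops on a hit, when no neighbor overlaps, or after `directed_limit`
    forwards (assumed meaning of DirectedLimit). Returns (hit_or_None, visited)."""
    visited = [start]
    cur = start
    while not has_result(cur) and len(visited) <= directed_limit:
        candidates = [n for n in neighbors[cur]
                      if n not in visited and overlap_area(zones[n], region) > 0]
        if not candidates:
            break
        cur = max(candidates, key=lambda n: overlap_area(zones[n], region))
        visited.append(cur)
    return (cur if has_result(cur) else None), visited


# Hypothetical zones; the query point (50, 55) lies in zone "D".
zones = {"D": (40, 60, 40, 60), "L": (20, 40, 40, 60), "R": (60, 80, 40, 60),
         "U": (40, 60, 60, 80), "UL": (20, 40, 60, 80), "UR": (60, 80, 60, 80)}
neighbors = {"D": ["L", "R", "U"], "L": ["D", "UL"], "R": ["D", "UR"],
             "U": ["D", "UL", "UR"], "UL": ["L", "U"], "UR": ["R", "U"]}
region = (35, 50, 55, 70)  # acceptable region for [50,55] with offset 15
print(directed_forward("D", zones, neighbors, region, lambda z: z == "UL", 2))
# ('UL', ['D', 'U', 'UL'])
```

Compared with flooding, the walk contacts at most DirectedLimit extra peers, at the cost of possibly missing a result held by a zone it did not choose.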

24. Flooding vs. Directed Forwarding
[Figure: flooding vs. directed forwarding with DirectedLimit = 2]

25. Updates
- Example: the tuple with value 40 is updated. Every cached range containing 40 maps to a point (a,b) with a ≤ 40 ≤ b, i.e., to the upper-left of (40,40).
- So go to the corresponding point, (40,40), and flood to the upper-left region.
- This is costly, so better solutions are needed, e.g., batching updates.
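The set of cached ranges invalidated by an update follows directly from the mapping. This sketch checks a local list of cached ranges against a batch of updated values in one pass, as a hypothetical illustration of batching. It does not model the flooding messages, and `affected_ranges` is an illustrative name.

```python
def affected_ranges(cached_ranges, updated_values):
    """Ranges whose cached answers are stale after a batch of updates:
    (start, end) is affected iff start <= v <= end for some updated v,
    i.e. its point lies upper-left of (v, v)."""
    return [r for r in cached_ranges if any(r[0] <= v <= r[1] for v in updated_values)]


cached = [(30, 50), (45, 60), (10, 35), (40, 40)]
print(affected_ranges(cached, [40]))      # [(30, 50), (40, 40)]
print(affected_ranges(cached, [40, 58]))  # [(30, 50), (45, 60), (40, 40)]
```

With a batch, a zone needs to be reached once per batch rather than once per updated tuple, which is the intuition behind batching.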

26. Multiple Range Attributes
- Each attribute maps to two dimensions, so a range query over k attributes is mapped to a point in a 2k-dimensional CAN.
- Example: (20<A<40, 50<B<60) maps to (20,40,50,60), and its super-range (10<A<50, 40<B<70) maps to (10,50,40,70).
- Forwarding: decreasing coordinates along odd dimensions and increasing coordinates along even dimensions, i.e., direction (-, +, -, +) for k = 2.
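The multi-attribute mapping and the super-range test generalize the single-attribute case, as this sketch shows using the slide's example. Dimensions are counted from 1 as on the slide. `to_point_k` and `is_super_range` are hypothetical names.

```python
def to_point_k(ranges):
    """Map a query over k attributes, given as [(lo1, hi1), ..., (lok, hik)],
    to a point in 2k dimensions: (lo1, hi1, lo2, hi2, ...)."""
    return tuple(c for lo, hi in ranges for c in (lo, hi))


def is_super_range(p, q):
    """p's query contains q's iff p <= q on odd dimensions (range starts)
    and p >= q on even dimensions (range ends), counting dimensions from 1."""
    return all(a <= b if i % 2 == 0 else a >= b for i, (a, b) in enumerate(zip(p, q)))


q = to_point_k([(20, 40), (50, 60)])  # (20, 40, 50, 60)
p = to_point_k([(10, 50), (40, 70)])  # (10, 50, 40, 70)
print(is_super_range(p, q), is_super_range(q, p))  # True False
```

Index 0 in Python is dimension 1 on the slide, so the even-index test `a <= b` checks the odd (start) dimensions.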

27. Experiment Settings
- Single attribute with domain [0,500]
- The system is initially empty.
- Range queries are selected uniformly at random.
- For every zone: Split Point = 5, Routing Threshold = 3
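The slide does not say exactly how "uniformly at random" is applied. One plausible reading, used here only as an assumption, is to draw both endpoints uniformly from the integer domain [0,500] and order them. `random_range_query` and the seed are hypothetical.

```python
import random

DOMAIN = (0, 500)


def random_range_query(rng: random.Random, domain=DOMAIN) -> tuple[int, int]:
    """Assumed reading of 'uniformly at random': draw two integer endpoints
    uniformly from the domain (inclusive) and order them."""
    a = rng.randint(*domain)
    b = rng.randint(*domain)
    return (min(a, b), max(a, b))


rng = random.Random(42)  # fixed seed so a workload can be replayed
queries = [random_range_query(rng) for _ in range(5)]
print(queries)
```

Other readings, such as drawing a uniform start and a uniform length, give different range-length distributions and therefore different cache hit rates.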

28. Flooding vs. Directed Forwarding
[Figure: performance with flood forwarding; performance with directed forwarding]

29. Routing Is Scalable
[Figure: visited zones with flood forwarding; visited zones with directed forwarding]

30. Load Distribution
[Figure: load distribution with 1000 peers and 10000 queries]

31. Conclusion and Future Work
- We presented a simple yet powerful mapping for ranges that allows us to leverage DHT infrastructure for range queries.
- Limitations/future work:
  - The number of attributes must be fixed.
  - The scheme does not work with other DHTs.
  - It assumes the existence of passive peers for load balancing.

32. Questions?
odsahin@cs.ucsb.edu
http://www.cs.ucsb.edu/~dsl/gaia.html

