COOPIS 2001, Trento, Italy ©2001, Karl Aberer, EPFL-DSC, Laboratoire de systèmes d'informations répartis P-Grid: A Self-organizing Access Structure for.


1 COOPIS 2001, Trento, Italy ©2001, Karl Aberer, EPFL-DSC, Laboratoire de systèmes d'informations répartis P-Grid: A Self-organizing Access Structure for P2P Information Systems Karl Aberer EPFL-DSC Distributed Information Systems Laboratory karl.aberer@epfl.ch

2 Overview
1. Peer-to-Peer Information Systems
2. Data Access in a P2P Information System
3. P-Grid
   1. Structure
   2. Construction algorithm
   3. Simulation
4. P-Grid Search and Update
   1. Algorithms
   2. Simulation
5. Application to Gnutella
6. Conclusions

3 1. P2P Information Systems
P2P systems currently draw a lot of attention
– File-sharing systems: Napster, Gnutella, FreeNet, etc.
– Conferences:
  O'Reilly P2P conference 2001 (conferences.oreilly.com/p2p/)
  2001 International Conference on Peer-to-Peer Computing (P2P2001) (www.ida.liu.se/conferences/p2p/p2p2001/)
  …

4 Napster [www.napster.com]
[Diagram labels: A, Napster, C, XXX.mp3, YYY.mp3, Internet]
1. A asks Napster: "I am searching for XXX.mp3"
2. Napster tells A: "C should have XXX.mp3"
3. A asks C: "I am requesting XXX.mp3"
4. C delivers XXX.mp3 to A

5 Gnutella [www.gnutella.com]
[Diagram labels: A, B, C, XXX.mp3, YYY.mp3, ZZZ.mp3, Internet]
1. A asks B: "I am searching for XXX.mp3"
2. B tells A: "C should have XXX.mp3"
3. A asks C: "I am requesting XXX.mp3"
4. C delivers XXX.mp3 to A

6 Properties of P2P Information Systems
No central coordination
No central database
No peer has a global view of the system
Global behavior emerges from local interactions
Peers are autonomous
Peers and connections are unreliable
Despite these limitations: all existing information should be accessible

7 2. Data Access in a P2P System
B2B servers, Napster, eBay etc.
– Central database (efficient) !
Gnutella
– Search requests are broadcast (inefficient)
– Anecdote: the founder of Napster computed that a single search request (18 Bytes) on a Gnutella community would generate 90 Mbytes of data transfers. [http://www.darkridge.com/~jpr5/doc/gnutella.html]

8 Problem
Can a set of peers provide
– efficient search on a data set
– whose storage space substantially exceeds the resources of each peer, e.g. s_local = O(log(s_global))
Answer
– In principle, yes !
– Requires a scalable data access structure

9 Scalable Data Access Structures
Work in the following way
– Every peer maintains a small fragment of the database and a routing table
– The routing tables are organized such that requests can be forwarded at different levels of granularity
– Replication is used to increase robustness
[Diagram labels: route R0, route R1, route R00, route R01, data D01]

10 Approaches
Scalable data access structures
– [Plaxton 97] (distributed object addressing)
– CHORD [Dabek 01] (distributed object addressing)
– CAN (distributed object addressing)
– FreeNet [Clarke 00] (file sharing systems)
– [Litwin 97] (distributed databases)
– [Yokota 99] (parallel databases)
– P-Grid [Aberer 01] (decentralized databases)
– etc.
Question
– Are they decentralized ?

11 Comparison Criteria
Scalable data access structure:
Routing criteria
– trees, key similarity, hashing, multidim. keys, …
Search criteria
– equality, prefix, range, similarity
Performance
– search, update, join and leave the network
Robustness
– use of replication
Decentralization:
Global knowledge (except nature of search keys)
– number of ex. addresses
Global control
– coordinator, central repository
Local autonomy
– fixed association of roles with address

12 Comparison – Data Access Structure
System    Routing                Search     Search perform.   Replication
Plaxton   Binary tree            equality   O(log n)          yes
CHORD     Implicit binary tree   equality   O(log n)          no
CAN       Multi-dim. grid        equality   O(n^(1/d))        yes
FreeNet   Key similarity         equality   O(log n) ?        yes
Yokota    B-Tree                 range      O(log n)          no
P-Grid    Binary tree            prefix     O(log n)          yes

13 Comparison - Decentralization
System    Global knowledge     Global control   Local autonomy
Plaxton   Max # participants   (blank)          no
CHORD     IP address space     (blank)          no
CAN       none                 no               yes
FreeNet   none                 (blank)          no
Yokota    all                  yes              no
P-Grid    none                 no               yes

14 3. The P-Grid Search Structure

15 Data Structure of a Peer
[Figure: binary tree with nodes R0, R1, R00, R01, R010, R011, R0100, R0101, showing the path of peer a, its peer references, and its data references for R0101]

16 P-Grid Construction
Bootstrap problem: how to build the P-Grid ?
– without a fixed association of addresses with keys, i.e. without a global schema for assigning roles, which would violate local autonomy
– efficiently

17 P-Grid Construction Algorithm (Bootstrap)
When peers meet (randomly)
– Compare the current search paths p and q
Case 1: p and q are the same
– If the maximal path length is not reached, extend the paths and split the search space, i.e. to p0 and q1
Case 2: p is a subpath of q, i.e. q = p0…
– Extend p by the complement of q, i.e. p1
Case 3: only a common prefix exists
– Forward to one of the referenced peers
– Limit forwarding by recmax
The peers remember each other and, in addition, exchange references at all levels
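
The three cases above can be sketched in Python as follows. This is a minimal illustration and not the author's Mathematica implementation: the names Peer, exchange, add_ref and common_prefix_len are hypothetical, references are stored per level as a plain list capped at refmax, and the "exchange references at all levels" step is left out for brevity.

```python
import random
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare peers by identity; they reference each other
class Peer:
    path: str = ""                            # binary search path, e.g. "010"
    refs: dict = field(default_factory=dict)  # level -> list of referenced peers


def common_prefix_len(p, q):
    n = 0
    while n < min(len(p), len(q)) and p[n] == q[n]:
        n += 1
    return n


def add_ref(peer, level, other, refmax):
    # Keep at most refmax distinct references per level.
    bucket = peer.refs.setdefault(level, [])
    if other is not peer and other not in bucket and len(bucket) < refmax:
        bucket.append(other)


def exchange(a, b, k, recmax=2, refmax=1, depth=0):
    """One meeting of peers a and b; k is the maximal path length."""
    lc = common_prefix_len(a.path, b.path)
    if len(a.path) == len(b.path) == lc:
        # Case 1: identical paths. Split the search space if possible.
        if lc == k:
            return  # maximal length reached; a and b stay replicas
        a.path += "0"
        b.path += "1"
    elif lc == min(len(a.path), len(b.path)):
        # Case 2: one path is a proper prefix of the other.
        short, long_ = (a, b) if len(a.path) < len(b.path) else (b, a)
        short.path += "1" if long_.path[lc] == "0" else "0"
    elif depth < recmax:
        # Case 3: paths diverge at bit lc. Forward each peer to a peer
        # referenced by the other one on its own side of the tree.
        for x, y in ((a, b), (b, a)):
            candidates = [r for r in y.refs.get(lc, []) if r is not x]
            if candidates:
                exchange(x, random.choice(candidates), k,
                         recmax, refmax, depth + 1)
    # The peers now differ at bit lc, so they remember each other there.
    add_ref(a, lc, b, refmax)
    add_ref(b, lc, a, refmax)


if __name__ == "__main__":
    peers = [Peer() for _ in range(200)]
    for _ in range(20000):
        exchange(*random.sample(peers, 2), k=6)
    print(sum(len(p.path) for p in peers) / len(peers))  # average path length
```

Storing references by level keeps the invariant that a reference at level i points to a peer that shares the first i bits and differs at bit i, which is what search relies on.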

18 Simulations
Implementation in Mathematica
Simulation parameters (n, k, recmax, refmax)
– Peer population size n
– Key length k
– Recursion depth recmax
– Multiple references refmax
Determine number of meetings required
– by each peer
– to reach on average 99% of maximal path length

19 Dependency on Peer Population Size (n = 200..1000, k = 6, recmax = 2, refmax = 1) None !?

20 Dependency on Key Length (n = 500, k = 2..7, recmax = 2, refmax = 1) exponential

21 Dependency on Recursion Depth (n = 500, k = 6, recmax = 0..6, refmax = 1) There exists an optimal value

22 Replica Distribution (n = 20000, k = 10, recmax = 2, refmax = 20)

23 Properties of P-Grid Bootstrap Algorithm
Convergence ?
– Does not depend on population size
– Depends on key length exponentially
– Depends on recursion depth
Distribution of replicas ?
– Simulations indicate a reasonable distribution
– Access paths to replicas are non-uniformly distributed
Balanced trees ?
– A simple argument (and simulations) shows that this is very likely

24 4. Search and Update
Search is straightforward
– Follow own path or references
– At most k steps
– If multiple references are online, select randomly
Updates
– All replicas need to be found
– Repeated searches:
  Breadth first (limited recursion breadth)
  Depth first
  Depth first and contact buddies with same key
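
The search rule on this slide can be sketched as below. The four-peer grid is hand-built for illustration, and the names Peer, search and the online flag are assumptions, not part of the original slides. The sketch also assumes the query key is at least as long as every peer path.

```python
import random
from dataclasses import dataclass, field


@dataclass(eq=False)
class Peer:
    path: str                                 # binary search path of this peer
    online: bool = True
    refs: dict = field(default_factory=dict)  # level -> list of referenced peers


def search(peer, key, hops=0):
    """Return (responsible peer or None, number of forwarding steps)."""
    lc = 0
    while lc < min(len(peer.path), len(key)) and peer.path[lc] == key[lc]:
        lc += 1
    if lc == len(peer.path):
        return peer, hops  # own path is a prefix of the key: answer locally
    # Otherwise follow a reference at the first level where path and key differ.
    candidates = [r for r in peer.refs.get(lc, []) if r.online]
    if not candidates:
        return None, hops  # every reference at this level is offline
    return search(random.choice(candidates), key, hops + 1)


# Hypothetical grid with key length k = 2 and one reference per level.
p00, p01, p10, p11 = (Peer(p) for p in ("00", "01", "10", "11"))
p00.refs = {0: [p11], 1: [p01]}
p01.refs = {0: [p10], 1: [p00]}
p10.refs = {0: [p00], 1: [p11]}
p11.refs = {0: [p01], 1: [p10]}

if __name__ == "__main__":
    found, hops = search(p00, "10")
    print(found.path, hops)
```

Each forwarding step lengthens the prefix shared with the key by at least one bit, which is why a search needs at most k steps.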

25 Simulation Result (n = 20000, k = 10, recmax = 2, refmax = 20), online probability 30%
[Plot: curves for breadth first search, depth first search, and depth first search with buddies; x-axis ticks 1000 to 5000, y-axis ticks 0.2 to 1]

26 Update vs. Search Cost
Trade lower update quality for higher search cost
– Use repeated searches to confirm results

27 P-Grid Variations
To be further explored
– No global, maximal key length
– Growing and shrinking of keys (problem: integrity of referenced peers)
– Joining and leaving P-Grids

28 P-Grid Flexibility
The algorithm is a framework rather than a single solution
– options are left open and leave room for optimization
– e.g. taking into account:
  access probability
  existing data distribution
  reachability and access cost

29 5. Application to Gnutella
Currently under implementation
Uses Gnutella protocol and software
Controls routing of search requests using P-Grid
Problem: non-uniform distribution of search keys
– Build statistics
– Compute a global, prefix-preserving hash function

30 Computing the Required Resources
Assume
– 10^7 searchable keys (substrings of filenames)
– 10 Bytes for storing a peer address
– 10^5 Bytes per peer provided for indexing
– 30 % online probability
– 99 % answer reliability
Then
– Approx. 20,000 peers can be supported
– refmax = 20 is sufficient
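
The slide gives only the inputs and the results. The following sketch shows one way to reproduce those numbers, under assumptions that are not stated on the slide: each peer stores its share of the key index plus k * refmax routing references, a search succeeds if at least one of refmax references is online at each of the k levels, and each leaf of the tree is held by about refmax peers. All function and variable names are hypothetical.

```python
keys = 10**7          # searchable keys
ref_bytes = 10        # bytes per stored peer address
index_bytes = 10**5   # bytes each peer provides for indexing
p_online = 0.3        # probability that a peer is online
reliability = 0.99    # required probability that a search succeeds

refs_per_peer = index_bytes // ref_bytes  # 10**4 references fit per peer


def success_probability(k, refmax):
    # Assumed model: at each of k levels, at least one of refmax
    # independently chosen references must be online.
    return (1 - (1 - p_online) ** refmax) ** k


def min_refmax(k):
    refmax = 1
    while success_probability(k, refmax) < reliability:
        refmax += 1
    return refmax


def fits(k, refmax):
    # Assumed storage model: keys per leaf plus routing references.
    return keys / 2**k + k * refmax <= refs_per_peer


# Smallest key length whose index share and routing table fit into a peer.
k = next(k for k in range(1, 64) if fits(k, min_refmax(k)))
refmax = min_refmax(k)
peers = 2**k * refmax  # assumes about refmax peers per leaf

if __name__ == "__main__":
    print(k, refmax, peers)
```

Under these assumptions the sketch arrives at refmax = 20 and a population on the order of 20,000 peers, in line with the slide; other storage or replication models would give different figures.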

31 6. Conclusions
Scalable distributed and decentralized access structures are possible
P-Grids offer a lot of flexibility to be further exploited
Powerful tools for analysis required
Foundation for many fully decentralized P2P applications
Application in mobile ad-hoc networks (www.terminode.org), Swiss national research centre at EPFL

32 References
[Aberer01] Karl Aberer, Zoran Despotovic. Managing Trust in a Peer-2-Peer Information System. To appear in the Proceedings of the Ninth International Conference on Information and Knowledge Management (CIKM 2001), 2001.
[Vingralek 98] Radek Vingralek, Yuri Breitbart, Gerhard Weikum. Snowball: Scalable Storage on Networks of Workstations with Balanced Load. Distributed and Parallel Databases 6(2): 117-156 (1998).
[Stonebraker 96] Michael Stonebraker, Paul M. Aoki, Witold Litwin, Avi Pfeffer, Adam Sah, Jeff Sidell, Carl Staelin, Andrew Yu. Mariposa: A Wide-Area Distributed Database System. VLDB Journal 5(1): 48-63 (1996).
[Plaxton 97] C. Greg Plaxton, Rajmohan Rajaraman, Andréa W. Richa. Accessing Nearby Copies of Replicated Objects in a Distributed Environment. SPAA 1997: 311-320.
[Yokota 99] Haruo Yokota, Yasuhiko Kanemasa, Jun Miyazaki. Fat-Btree: An Update-Conscious Parallel Directory Structure. ICDE 1999: 448-457.
[Litwin 97] Witold Litwin, Marie-Anne Neimat. LH*s: A High-Availability and High-Security Scalable Distributed Data Structure. RIDE 1997.
[Stoica 00] Ion Stoica, Robert Morris, David Karger, Frans Kaashoek, Hari Balakrishnan. Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications. Proceedings of the ACM SIGCOMM, 2001.
[Clarke 00] Ian Clarke, Oskar Sandberg, Brandon Wiley, Theodore W. Hong. Freenet: A Distributed Anonymous Information Storage and Retrieval System. Designing Privacy Enhancing Technologies: International Workshop on Design Issues in Anonymity and Unobservability. LNCS 2009. Springer Verlag, 2001.
[Ratnasamy01] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker. A Scalable Content-Addressable Network. Proceedings of the ACM SIGCOMM, 2001.

