1 ICS-FORTH & Univ. of Crete SeLene November 2002 Zacharioudakis Giorgos P2P Systems & technologies Zacharioudakis Giorgos.


1 1 ICS-FORTH & Univ. of Crete SeLene November 2002 Zacharioudakis Giorgos P2P Systems & technologies Zacharioudakis Giorgos

2 Presentation overview
• P2P architectures & typical systems
• Technical issues
• Popular P2P Systems
• Research areas
• Project JXTA technology
• Vision about SeLene project

3 What is Peer-to-Peer?
• Definition: nodes of equal roles exchanging information and services directly
• Scale: millions (billions?) of peers
• Nature of peers: PCs
• Application: lightweight semantics (e.g., file-sharing)
• Is this a new idea?
  • IP routing
  • DNS, NTP
  • Distributed databases

4 P2P vs. Distributed DBMS
• Traditional DDBMS issues:
  • Transactions
  • Network partitions
  • Distributed query optimization
  • Interoperation of heterogeneous data sources
  • Reliability/failure of nodes
  Complex features do not scale
• Example P2P application: file-sharing
  • Simple data model & query language
  • No complex query optimization
  • Easy interoperation
  • No guarantee on quality of results
  • Individual site availability unimportant
  • Local updates
  • No transactions
  • Network partitions OK
  Simple: amenable to a large-scale network of PCs

5 P2P Applications
• File sharing
  • Napster, Gnutella
• Instant messaging
  • Jabber
• Distributed computation
  • SETI@home
• Web services
  • Akamai
• Distributed storage
  • Freenet
• Anonymity, censorship resistance
  • Mixmaster remailers
  • Red Rover, Publius
• Cooperative work
  • Groove
• Other...

6 Technical issues
• scalability
• fault tolerance
• speed
• bandwidth consumption
• processing cost
• security
• anonymity
• publishing/retrieval
• metadata
• semantic querying
• availability of results
• interoperability
...

7 Metadata and Interoperability
• Metadata
  • System metadata (e.g., filename, bitrate, file size)
  • Resource metadata (e.g., relations, hierarchies)
• Currently, queries take the form of keyword matching
• We would like to pose queries in more expressive languages that take advantage of the semantic knowledge in the metadata
• Technologies:
  • Programming interfaces: XML-RPC, SOAP, HTTP, JXTA
  • Data and metadata representation (common ontologies and formats): XML, RDF

8 Different Approaches to Distributed Search
• Network topology based architectures
  • Rely on the organization of peers within the network to route requests
  • Focus on reducing the diameter of the graph that represents the distributed network
• Content based approaches
  • Message content is used in the organization of the network, in the routing of messages, or in both
  • Focus on reducing the query path-length of the access structure they use

9 Spectrum of “Purity”
• Hybrid
  • Centralized index, P2P file storage and transfer
  • Napster, SETI@home
• Super-peer
  • A “pure” network of “hybrid” clusters
  • Morpheus, e-donkey
• Pure
  • Functionality completely distributed
  • Freenet, Gnutella

10 Publishing/Requesting/Responding
• Hybrid
  • Central indexing
  • Each node registers with a central index
  • Queries are sent to the central index
  • Retrieval is done from other ‘peer’ nodes
• Pure
  • Each ‘peer’ manages its own index about local (remote) resources
  • Queries are typically performed with broadcasts
  • Retrieval is done from responding ‘peers’ that hold the requested resource
• Super-peers
  • Some nodes act as coordinators and manage indices for a subset of nodes
  • Each node registers with its local coordinator
  • Queries are sent to the coordinators, which in turn communicate with other super-peers as in a distributed P2P system
  • Retrieval is done from other ‘peers’ that hold the requested resource

11 Representative P2P Systems
• Network topology based architectures
  • Napster
  • Gnutella
  • Morpheus
• Content based architectures
  • Chord
  • P-Grid

12 Napster (hybrid)
• Membership: each client joins a server and registers its local files in the central index
• Query: a client sends its queries to the central server, which returns references to the clients that actually hold the resources
• Retrieval: the client connects to other ‘peer’ clients and retrieves the resource. The selection is made by the user, but it could be automated based on bandwidth, load or other criteria
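The membership, query and retrieval steps above can be sketched in a few lines of Python. This is only an illustration of the hybrid idea, not Napster's actual protocol: the class `CentralIndex`, its methods, and the peer and file names are hypothetical, and matching is assumed to be a case-insensitive substring test.

```python
from collections import defaultdict


class CentralIndex:
    """Toy central index: maps a file name to the set of peers sharing it."""

    def __init__(self):
        self._holders = defaultdict(set)

    def register(self, peer, filenames):
        # Membership: a joining client registers its local files.
        for name in filenames:
            self._holders[name].add(peer)

    def unregister(self, peer):
        # A departing client is removed from every entry.
        for peers in self._holders.values():
            peers.discard(peer)

    def query(self, keyword):
        # Query: the server returns references to peers, never the file itself.
        hits = {}
        for name, peers in self._holders.items():
            if keyword.lower() in name.lower() and peers:
                hits[name] = set(peers)
        return hits


index = CentralIndex()
index.register("peer1", ["blue_moon.mp3"])
index.register("peer4", ["blue_moon.mp3", "notes.txt"])
result = index.query("moon")  # the requester then fetches from peer1 or peer4
```

The index stores only references, so the file transfer itself happens directly between the requesting client and one of the returned peers.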

13 Napster (hybrid)
[Figure: a central server and clients 1 to 4. Labels: membership / register resources; query; response {1,4}; get file.]

14 Gnutella (pure)
• Gnutella is not a system: it is a protocol, implemented by various existing Gnutella clients.
• Membership: a peer connects to a set of Gnutella clients through a predefined static list of addresses or through “host caches”. Once connected, a client expands its list of known addresses with the lists obtained from other peers.
• Query: a peer broadcasts a query to its known peers; these forward the query to their known peers, and so on, until the maximum TTL (the packet’s Time To Live) is reached, which is the depth limit of the query.
• Retrieval: peers that hold the requested resource respond to the peer that issued the query. The responses travel back along the reverse path of the query, so the originating peer finally obtains a list of peers that have the resource and downloads it from one of them.
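The query and retrieval steps can be illustrated with a small simulation. It is a sketch under stated assumptions, not the Gnutella wire protocol: the overlay is a hypothetical in-memory dictionary, `flood_query` is an invented helper, matching is a substring test, and duplicate suppression is modelled with a single shared `parent` map rather than per-peer message IDs.

```python
from collections import deque


def flood_query(graph, files, source, keyword, ttl):
    """Breadth-first flood from `source`, limited to `ttl` hops.

    graph: dict peer -> list of neighbour peers (hypothetical overlay)
    files: dict peer -> set of file names held locally
    Returns {responding peer: reverse path from that peer back to the source}.
    """
    parent = {source: None}            # who first forwarded the query to whom
    queue = deque([(source, ttl)])
    hits = {}
    while queue:
        peer, remaining = queue.popleft()
        if peer != source and any(keyword in f for f in files.get(peer, ())):
            # The response travels back along the reverse query path.
            path, hop = [], peer
            while hop is not None:
                path.append(hop)
                hop = parent[hop]
            hits[peer] = path
        if remaining == 0:
            continue                   # depth limit reached, do not forward
        for neighbour in graph.get(peer, ()):
            if neighbour not in parent:    # ignore a query already seen
                parent[neighbour] = peer
                queue.append((neighbour, remaining - 1))
    return hits


graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
         "D": ["B", "C", "E"], "E": ["D"]}
files = {"D": {"song.mp3"}, "E": {"song_live.mp3"}}
hits = flood_query(graph, files, "A", "song", ttl=2)
```

With `ttl=2` the query never reaches peer E, which shows how the TTL bounds the part of the network a peer can see.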

15 Gnutella (pure)
[Figure: Breadth-First Search (BFS) over the peer graph. Legend: source; forward query; processed query; found result; forward response.]

16 Gnutella (pure)
• Each peer maintains a small minimum number of simultaneous active connections
• These peers are selected from a locally maintained host catcher list containing the addresses of all known peers
• Peer discovery
  • watching PING-PONG messages
  • noting the addresses of peers initiating queries
  • receiving connections from previously unknown hosts
  • out-of-band channels (IRC, Web)
  • host caches
• Query propagation: upon receiving a query, a peer broadcasts it to all peers it is currently connected to, and so on, like a chain letter
• If a peer has a file that matches the query, it sends an answer back (though it still forwards the query). This process continues to a maximum depth (“search horizon”)

17 Morpheus (Super-Peer)
• Self-organizing network
• Neither search requests nor actual downloads pass through any central server
• The network is multi-layered, so that more powerful computers get to become search hubs ("SuperNodes")
• Any client may become a SuperNode if it meets the criteria of processing power, bandwidth and latency
• Network management is automatic: SuperNodes appear and disappear according to demand

18 Morpheus (Super-Peer)
[Figure labels: SN1, SN2, SN3, SN4, 12.34.56.78.]

19 Morpheus (Super-Peer)
• Intelligent downloads
  • Morpheus implements a type of fail-over system that attempts to locate another peer sharing the same file and automatically resumes the download where it left off at the failed host
  • When the Morpheus search engine finds that more than one active peer is serving a particular file, it associates the list of peers with the file for later reference
  • If the user instructs Morpheus to download the file, it can distribute the download task over this list of peers
• SuperNodes act like local search hubs and proxy search requests on behalf of their connected peers
[Figure: a Supernode with Peer 1, Peer 2 and Peer 3, each sharing File 1 to File n. Labels: search query; Peer 2: file 1; get file 1.]
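The fail-over and multi-peer download behaviour described above might look roughly like the following. This is a guess at the general technique, not Morpheus code: the function `download`, the `fetch` callback, the chunk size and the peer names are all assumptions made for the sketch.

```python
def download(file_size, peers, fetch, chunk=4):
    """Fetch `file_size` bytes in chunks, rotating over the list of peers.

    fetch(peer, offset, length) returns bytes or raises ConnectionError.
    When a peer fails it is dropped and the same offset is requested from
    another peer, so the download resumes where it left off.
    """
    alive = list(peers)
    data = bytearray()
    turn = 0
    while len(data) < file_size:
        if not alive:
            raise ConnectionError("no peer left serving the file")
        peer = alive[turn % len(alive)]
        length = min(chunk, file_size - len(data))
        try:
            data += fetch(peer, len(data), length)
            turn += 1                  # next chunk goes to the next peer
        except ConnectionError:
            alive.remove(peer)         # fail over to another peer
    return bytes(data)


content = b"0123456789abcdef"


def fetch(peer, offset, length):
    # Simulated transport: peer2 goes offline before serving anything useful.
    if peer == "peer2" and offset >= 4:
        raise ConnectionError(peer)
    return content[offset:offset + length]


received = download(len(content), ["peer1", "peer2", "peer3"], fetch)
```

Chunks are requested sequentially here for simplicity; a real client would presumably fetch from several peers in parallel.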

20 Chord (content based search)
• Chord is a lookup service, not a search service
• Based on binary search trees
• Provides just one operation:
  • A peer-to-peer hash lookup: Lookup(key) → IP address
  • Chord does not store the data
• Uses a hash function:
  • Key identifier = SHA-1(key)
  • Node identifier = SHA-1(IP address)
  • Both are uniformly distributed
  • Both exist in the same ID space
• How to map key IDs to node IDs?
  • A key is stored at its successor: the node with the equal or next higher ID (modulo N)
[Figure: circular ID space from 0 to M. Legend: an item; a node. Labels include N1, N10, K0, K4, K7, K11.]
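The mapping of keys to nodes can be sketched as follows. The helper names `chord_id` and `successor` are made up for this illustration, and the 16-bit identifier length is an assumption chosen to keep numbers small (SHA-1 itself produces 160 bits).

```python
import hashlib
from bisect import bisect_left

M = 16  # identifier bits; an assumption for this sketch


def chord_id(text, m=M):
    """Truncate SHA-1(text) to an m-bit identifier."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


def successor(node_ids, ident):
    """First node ID equal to or following `ident` on the circle."""
    ring = sorted(node_ids)
    i = bisect_left(ring, ident)
    return ring[i % len(ring)]         # wrap around past the highest ID


# Keys and nodes are hashed into the same ID space.
key_id = chord_id("song.mp3")
node_ids = [chord_id(ip) for ip in ("10.0.0.1", "10.0.0.2", "10.0.0.3")]
home = successor(node_ids, key_id)     # node that stores the key

# A tiny 3-bit ring with nodes 0, 1 and 3.
small_ring = [0, 1, 3]
owner_of_2 = successor(small_ring, 2)  # 3
owner_of_6 = successor(small_ring, 6)  # wraps around to 0
```

Because both hashes land in one circular space, adding or removing a node only changes the owner of the keys between that node and its predecessor.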

21 Chord (content based search)
• The goal of Chord is to provide the performance of a binary search, which means a query path-length of O(log N)
• To keep the maximum path-length at O(log N), each node maintains a routing table (called “finger table”) with at most m entries (where m = log N)
• The i-th entry in the table at node n contains the identity of the first node s that succeeds n by at least $2^{i-1}$ on the identifier circle (all arithmetic modulo $2^m$)
  • i.e., $s = \mathrm{successor}(n + 2^{i-1})$, $1 \le i \le m$
• Note that the first finger of n is its immediate successor on the circle
[Figure: identifier circle with IDs 0 to 7. Legend: existing node; not an existing node, but a possible value in the ID space.]
Finger table shown in the figure:
Start ($n + 2^{i-1}$) | Interval of responsibility | Successor
1 | [1,2) | 1
2 | [2,4) | 3
4 | [4,0) | 0
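The finger-table rule can be checked with a short script. This is a sketch assuming a 3-bit identifier space with existing nodes 0, 1 and 3, which is consistent with the table values on this slide; the function names are invented for the example.

```python
from bisect import bisect_left


def successor(ring, ident):
    """First node in the sorted list `ring` at or after `ident`, wrapping."""
    i = bisect_left(ring, ident)
    return ring[i % len(ring)]


def finger_table(n, node_ids, m):
    """Rows (start, (interval start, interval end), successor) for node n."""
    ring = sorted(node_ids)
    size = 2 ** m
    rows = []
    for i in range(1, m + 1):
        start = (n + 2 ** (i - 1)) % size
        end = (n + 2 ** i) % size      # the next finger's start closes the interval
        rows.append((start, (start, end), successor(ring, start)))
    return rows


table_0 = finger_table(0, [0, 1, 3], m=3)
# table_0 == [(1, (1, 2), 1), (2, (2, 4), 3), (4, (4, 0), 0)]
```

For node 0 the script yields the same three rows as the table on this slide, including the wrapped interval [4,0).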

22 Chord (content based search)
Important characteristics
• Each node stores info only about a small number of possible IDs (at most log N)
• It knows more about nodes closely following it on the identifier circle
• A node’s table does not generally contain enough info to locate the successor of an arbitrary key k
[Figure: identifier circle with IDs 0 to 7 and one finger table per existing node.]
Node 0:
Start | Int. | Succ.
1 | [1,2) | 1
2 | [2,4) | 3
4 | [4,0) | 0
Node 1:
Start | Int. | Succ.
2 | [2,3) | 3
3 | [3,5) | 3
5 | [5,1) | 0
Node 3:
Start | Int. | Succ.
4 | [4,5) | 0
5 | [5,7) | 0
7 | [7,3) | 0

23 Chord (content based search)
“Finger Table” Allows Log(n)-time Lookups
• How do we locate the successor of a key k?
  • If n can find a node whose ID is closer than its own to k, that node will know more about the identifier circle in the region of k than n does
  • Thus n searches its finger table for the node j whose ID most immediately precedes k, and asks j for the node it knows whose ID is closest to k
• By repeating this process, n learns about nodes with IDs closer and closer to k
• Gradually we will find the immediate predecessor of k
[Figure: Lookup(K19) on a ring with nodes N5, N10, N20, N32, N60, N80, N99 and N110.]
Finger tables shown in the figure:
Start | Interval | Succ.
100 | [100,101) | 110
101 | [101,103) | 5
103 | [103,107) | 5
107 | [107,115) | 5
115 | [115,3) | 5
3 | [3,35) | 5
35 | [35,100) | 60

Start | Interval | Succ.
… | … | …
9 | [9,13) | 10
13 | [13,21) | 20
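The iterative lookup can be simulated on the ring from the figure. The sketch assumes a 7-bit identifier space (IDs 0 to 127) and computes each finger directly from the successor rule; `Node`, `in_open` and `lookup` are illustrative names, and the code models only a static ring with no joins or failures.

```python
from bisect import bisect_left


def in_open(x, a, b, size):
    """True if x lies in the circular open interval (a, b)."""
    a, b, x = a % size, b % size, x % size
    if a < b:
        return a < x < b
    return x > a or x < b              # the interval wraps past zero


class Node:
    def __init__(self, ident, ring, m):
        """`ring` is the sorted list of all node IDs (global view, sketch only)."""
        self.id, self.m = ident, m
        size = 2 ** m

        def succ(k):
            return ring[bisect_left(ring, k % size) % len(ring)]

        # fingers[i] = successor(n + 2**i); fingers[0] is the immediate successor
        self.fingers = [succ(ident + 2 ** i) for i in range(m)]

    def closest_preceding_finger(self, key):
        for f in reversed(self.fingers):
            if in_open(f, self.id, key, 2 ** self.m):
                return f
        return self.id


def lookup(nodes, start, key):
    """Return (node responsible for key, list of nodes visited)."""
    n, path = nodes[start], [start]
    size = 2 ** n.m
    # Walk until key falls in (n, successor(n)].
    while not (key == n.fingers[0] or in_open(key, n.id, n.fingers[0], size)):
        nxt = n.closest_preceding_finger(key)
        if nxt == n.id:
            break
        n = nodes[nxt]
        path.append(nxt)
    return n.fingers[0], path


ring = [5, 10, 20, 32, 60, 80, 99, 110]
nodes = {i: Node(i, ring, m=7) for i in ring}
owner, path = lookup(nodes, start=99, key=19)
```

Starting at N99, the walk visits N5 and then N10, the immediate predecessor of K19, whose successor N20 is responsible for the key.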

24 Chord Autonomy
• When new keys are inserted the system is not affected: it just finds the appropriate node and stores the key there
• When nodes join or leave, the finger tables must be correctly maintained and some keys must be transferred to other nodes
• Also, every key is stored at only one node, which means that if that node becomes unavailable the key is unavailable too
• Maintaining the finger tables and assuring the correctness of the system while nodes join/leave incurs an $O(\log^2 N)$ cost
• This implies a restricted autonomy of the system
• The only replicated information is (implicitly) the finger tables, because each node has to maintain its own

25 P-Grid
• Basic characteristics
  • Based on building distributed, binary prefix trees
  • Use of randomized algorithms for constructing the access structure, updating the data and performing the search
  • Scales gracefully, equally for all nodes
• Access structure
  • We assume that the index terms are binary strings, built from 0’s & 1’s
  • The search space is partitioned into intervals
  • Every peer takes over responsibility for one interval
  • As each key corresponds to a path in the binary prefix tree, the peer is also responsible for one path of the search tree
  • Each peer stores the peers responsible for the other branches of the path, for routing
  • Search requests are either processed locally or forwarded to the peers on the alternative branches
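Prefix routing over such an access structure can be sketched as below. The data layout is an assumption made for the example: each peer is a dictionary with a `path` and one reference per level, whereas a real peer would typically keep several references per level. The peer names and the `route` function are hypothetical.

```python
def route(peers, start, key):
    """Route `key` (a bit string) from peer `start`; return (owner, hops).

    peers: dict name -> {"path": bit string, "refs": {level: peer name}}
    refs[level] names a peer that shares the first `level` bits of the path
    and differs on the next bit (the alternative branch at that level).
    """
    current, hops = start, [start]
    while True:
        path = peers[current]["path"]
        # Length of the common prefix of the key and this peer's path.
        level = 0
        while level < min(len(path), len(key)) and path[level] == key[level]:
            level += 1
        if level == len(path) or level == len(key):
            return current, hops       # processed locally
        current = peers[current]["refs"][level]   # forward to the other branch
        hops.append(current)


peers = {
    "p1": {"path": "00", "refs": {0: "p3", 1: "p2"}},
    "p2": {"path": "01", "refs": {0: "p4", 1: "p1"}},
    "p3": {"path": "10", "refs": {0: "p1", 1: "p4"}},
    "p4": {"path": "11", "refs": {0: "p2", 1: "p3"}},
}
owner, hops = route(peers, "p1", "1101")
```

Every forward extends the matched prefix by at least one bit, so the number of hops is bounded by the length of the paths.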

26 P-Grid
• P-Grid construction
  • Initially, all peers are responsible for the whole search space
  • Whenever peers meet, they try to refine the access structure: they split the search space into two parts and each takes responsibility for one half
  • They also store a reference to the other peer in order to cover the other part of the search space
  • The same happens whenever two peers meet that are responsible for the same interval at the same level
  • To avoid overspecialization of peers, we restrict the maximal length of the paths that can be constructed to a defined maxlength
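A single meeting between two peers could be modelled as follows. This covers only the case described above, where both peers are responsible for the same interval; the full construction algorithm also handles peers whose paths differ, which is omitted here. The dictionary layout and the function `meet` are assumptions for the sketch.

```python
def new_peer(name):
    """A fresh peer is responsible for the whole search space (empty path)."""
    return {"name": name, "path": "", "refs": {}}


def meet(a, b, maxlength):
    """Split the shared interval if both peers hold the same path.

    Returns True when a refinement took place.
    """
    if a["path"] != b["path"] or len(a["path"]) >= maxlength:
        return False
    level = len(a["path"])
    # Each peer remembers the other to cover the half it gives up.
    a["refs"][level] = b["name"]
    b["refs"][level] = a["name"]
    a["path"] += "0"
    b["path"] += "1"
    return True


p1, p2, p3 = new_peer("p1"), new_peer("p2"), new_peer("p3")
meet(p1, p2, maxlength=2)   # p1 -> "0", p2 -> "1"
meet(p3, p1, maxlength=2)   # paths differ ("" vs "0"): no split in this sketch
```

The `maxlength` check is what keeps peers from specializing on ever smaller intervals.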

27 P-Grid
[Figure: peers 1 to 6 and their key intervals at Level 0, Level 1 (0, 1) and Level 2 (00, 01, 10, 11).]

28 P-Grid
[Figure: queries routed through the key intervals at Level 0, Level 1 (0, 1) and Level 2 (00, 01, 10, 11) for peers 1 to 6.]

29 P-Grid Autonomy
• The system assumes that peers eventually meet, but does not examine how this occurs, i.e., it is possible that they never meet
• As many peers can be responsible for the same key, the general problem is how to find all those peers in case of an update
• Proposed solutions
  • Multiple BFS or DFS searches for a key, propagating the update to the peers found
  • Creating lists of “buddies” for each peer (i.e., other peers that share the same key) and propagating the update to all buddies
• These imply that, although the system is decentralized and peers do not rely on central authorities, the construction and update of the access structure may cause performance problems, especially when updating a key

30 P-Grid Autonomy
• When a new node enters the system, it assumes that it is responsible for the whole prefix namespace interval
• When it meets other nodes, they split the interval and each maintains a reference to the other node
• When a node leaves abruptly, the other nodes are left with incorrect references; as soon as they become aware of it, they ‘resume’ responsibility for that prefix interval
• The replicated information in this system consists of the multiple references to the same keys and the “buddies” lists (when used) that address the update problem

31 P2P comparison
System | Paradigm | Search type | Search cost (messages) | Autonomy
Napster | Centralized indexing | String comparison | O(1) | Low
Gnutella | Breadth-first search on graph | String comparison | (missing) | Very high
Morpheus | Super-peers | Metadata comparison | O(log N)? | High
Chord | Implicit binary search trees | Equality | O(log N) | Restricted
P-Grid | Binary prefix trees | Prefix | O(log N) | High

32 P2P performance metrics
• Bandwidth
• Storage (replication)
• Processing cost
• Path-length (required hops)
• Quality of results
  • Number of results
  • Satisfaction (true if # results >= X, false otherwise)
  • Time to satisfaction
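The quality-of-results metrics lend themselves to a tiny worked example. Satisfaction follows the definition on the slide; time to satisfaction is read here as the arrival time of the X-th result, which is an assumption, since the slide does not define it. Function names and sample values are illustrative.

```python
def satisfied(num_results, x):
    """Satisfaction: true if the number of results is at least X."""
    return num_results >= x


def time_to_satisfaction(arrival_times, x):
    """Arrival time of the x-th result, or None if the query is never satisfied.

    arrival_times: seconds after the query was issued, one entry per result.
    Assumes x >= 1.
    """
    ordered = sorted(arrival_times)
    return ordered[x - 1] if len(ordered) >= x else None


arrivals = [0.8, 0.3, 2.1, 1.4]        # hypothetical measurements
ok = satisfied(len(arrivals), x=3)
t = time_to_satisfaction(arrivals, x=3)
```

With four results and X = 3, the query is satisfied once the third-earliest result has arrived.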

33 Hybrid p2p
Advantages
• Simple to manage, with good availability of results, due to central indexing
• Less (aggregate) bandwidth consumption
• Small processing cost for peers
• Idle nodes that do not offer resources do not degrade the system’s performance
Disadvantages
• Does not scale
• Single point of failure
• Great processing cost for the server
• Vulnerable to censorship

34 Pure p2p
Advantages
• Efficiency: harnessing unused resources
• Self-organizing
• Robustness and availability through replication
• Anonymity/legal protection/censorship resistance
Disadvantages
• Difficult to manage, with poor results, due to the lack of central indexing
• Bandwidth consuming
• Idle nodes degrade the overall performance
• Higher processing cost for peers

35 Super peers
Advantages
• Scalable
• Fault tolerant
• Adaptable and self-organizing
• Efficient
• Low path-length
Disadvantages
• Hard to manage/maintain
• Complex topology, difficult to evaluate its metrics (through simulation or trace-driven analysis)

36 Content-based searching architectures
Advantages
• Low search cost (O(log N))
• Harnessing the content information in queries
• Good approach for content that can be described with simple attributes
• Fewer messages per query than in a random graph
• Load balancing
Disadvantages
• More restrictions than topology-based architectures: when nodes join/leave, rehashing and content migration need to be performed
• A peer needs to know what it is looking for in order to map it to an address
• Not practical for content described by multiple attributes
• Storage and routing are closely connected

37 Conclusions about p2p systems
• Benefits
  • Efficiency: harnessing unused resources
  • Self-organizing
  • Sharing cost of ownership
  • Robustness and availability through replication
  • Anonymity/legal protection
• Challenges
  • No authority to enforce behavior
  • Cooperation
  • Unreliability of individual peers
  • Efficiency of distributed operations (absolute resources)
Imposed research issues: Resource Management, Security, Efficient Search

38 Resource Management
• Resource:
  • Storage/information
  • CPU processing
  • Bandwidth
• Issues:
  • fairness
  • load balancing

39 Security
• Issues:
  • Reputation
  • Trust
  • Accountability
  • Information Preservation & Quality
  • Denial of service attacks
• Problem: detecting and punishing bad behavior

40 Efficiency of Search
• Problem: finding a needle in a haystack
• Efficiency measured in terms of absolute resources consumed
  • Bandwidth
  • Processing cost
• Several factors:
  • “Purity”
  • “Control”
  • Query expressiveness

41 Project JXTA
• JXTA is a set of protocols which allow peers to discover and communicate with each other
• Protocols are defined in terms of XML messages exchanged between peers
• JXTA is platform (e.g., Windows), language (e.g., Java) and transport (e.g., TCP/IP) independent

42 JXTA Concepts
• Concepts:
  • Peer: a node that speaks the JXTA protocols
  • Peer Group: a collection of cooperating peers
  • Message: a datagram containing an envelope, protocol headers and bodies
  • Pipe: an async communication channel for sending/receiving messages
  • Advertisement: an XML document that publishes the existence of a resource (peer, peer group, pipe, service)
[Figure labels: peer, peer group, pipe, advertisement.]

43 JXTA Model

44 JXTA Protocols
• Peer Discovery Protocol: used between any peers to find other peers, peer groups, or advertisements
• Peer Information Protocol: used to learn about another peer's properties
• Peer Resolver Protocol: the 'foundation protocol' for the Peer Discovery Protocol and the Peer Information Protocol. It can be used to build other protocols as well. It defines how 'generic queries' and their responses are sent from one peer to another
• Peer Membership Protocol: used to find out about, join and leave groups
• Pipe Binding Protocol: used to bind a pipe to an actual endpoint
• Peer Endpoint Protocol: used to provide routing information for paths between peers (if a direct connection is not possible)

45 JXTA Search
• JXTASearch is a framework for searching in distributed networks
  • A protocol for registration, query and response
  • A series of services for interacting via this protocol
[Figures: Gnutella style peer search; JXTA style peer search.]

46 JXTA Search
• Advantages
  • Supports very dynamic networks
  • Reduces publishing and query response latency
  • Centralized control (centralized implementation of security, accounting, membership, …)
• Disadvantages
  • Single point of failure
  • Scalability
  • Centralized control
  …

47 Towards a Super-Peer Architecture for SeLene
[Figure labels: Birkbeck, Orsay, Uoc, UoCyprus.]

48 References
• http://www.internet2.edu/presentations/20020131-P2P-Kan.htm
• http://softwaredev.earthweb.com/java/article/0,,12082_783281,00.html
• http://www.cs.vu.nl/pub/globe/cp2pc/notes/allnotes/jxta.overview
• http://wiki.cs.uiuc.edu/cs427/P2P+Architecture
• http://www.stanford.edu/class/cs347/handouts/p2p.ppt
• http://cv.uoc.es/~grc0_000228_web/Marques/Tesi_JM.htm
• http://iew3.technion.ac.il/~spektory/098223/presentations/fastTrack.ppt

