Distributed Systems: Concepts and Design, Chapter 10: Peer-to-Peer Systems Bruce Hammer, Steve Wallis, Raymond Ho

10.1: Introduction Peer-to-Peer Systems Systems in which data and computational resources are contributed by many hosts. The objective is to balance network traffic and reduce the load on the primary host. Management requires knowledge of all hosts: their accessibility (distance in number of hops), availability and performance. Peer-to-peer systems exploit existing naming, routing, data replication and security techniques in new ways. Client-server relationships are transaction-based rather than defined as permanent roles; peer-to-peer elevates clients to servers on an as-needed, asynchronous basis. Participating hosts need to be aware that they are part of the peer-to-peer system. Objective: to build a reliable resource-sharing layer over an unreliable and untrusted collection of computers and networks.

10.1: Introduction Goal of Peer-to-Peer Systems Sharing data and resources on a very large scale: 'applications that exploit resources available at the edges of the Internet – storage, cycles, content, human presence' (Shirky 2000). They use the data and computing resources available in personal computers and workstations, eliminating any requirement for separately managed servers and their associated infrastructure. Peer-to-peer can also be used within one company, but this relies on each participating host to maintain its own infrastructure. The professor has mentioned the SETI@home project (Search for Extra-Terrestrial Intelligence), which partitions a stream of digitized radio telescope data into 107-second work units and parcels them out to personal computers on the network; it is controlled by one single server. SETI does not involve any communication or coordination between computers while they are processing the work units: each sends a single message to the central server. The design aims to deliver a service that is fully decentralized and self-organizing, dynamically balancing the storage and processing loads between all the participating computers as computers join and leave the service. Increasingly attractive as the performance difference narrows between host servers and personal computers.

10.1: Introduction Characteristics of Peer-to-Peer Systems Each computer contributes resources. All the nodes have the same functional capabilities and responsibilities. There is no centrally administered system. A limited degree of anonymity is offered. The key design problem is the algorithm for placing and accessing the data: it must balance workload and ensure availability without adding undue overhead. Napster's legal problems showed the need to add anonymity; this is not needed in corporate peer-to-peer systems.

10.1: Introduction Evolution of Peer-to-Peer Systems Napster: download music, return address. Then Freenet, Gnutella, Kazaa and BitTorrent: more sophisticated, with greater scalability, anonymity and fault tolerance. Then Pastry, Tapestry, CAN, Chord, Kademlia: peer-to-peer middleware. Steve will demonstrate BitTorrent; Napster and peer-to-peer middleware are the next two presentations.

10.1: Introduction Evolution (Continued) Immutable files (music, video). GUIDs (Globally Unique Identifiers). Middleware to provide better routing algorithms and to react to outages. Evolution toward mutable files and application within one company's intranet. GUIDs are usually derived as a secure hash from some or all of the resource's state (mentioned in Chapter 7); the use of a secure hash makes a resource 'self-certifying'. Peer-to-peer applications that demand a high level of availability for the stored objects require careful application design to avoid situations in which all of the replicas are unavailable. The use of randomly distributed GUIDs assists by distributing the object replicas to randomly located nodes in the underlying network; spanning many organizations across the globe minimizes the risk.
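To make 'self-certifying' concrete, here is a minimal Python sketch (our own illustration, not code from the chapter; the 128-bit truncation mirrors Pastry-style GUIDs): the GUID is a secure hash of the object's content, so any node that retrieves the object can recompute the hash and verify that it received untampered data.

```python
import hashlib

def make_guid(content: bytes) -> int:
    # Derive a GUID from the object's state using a secure hash.
    # SHA-1 yields 160 bits; we keep 128, as Pastry-style overlays do.
    return int.from_bytes(hashlib.sha1(content).digest()[:16], "big")

def verify(content: bytes, guid: int) -> bool:
    # Self-certification: recompute the hash of the retrieved bytes and
    # compare it with the GUID under which the object was requested.
    return make_guid(content) == guid

data = b"immutable music file bytes"
guid = make_guid(data)
assert verify(data, guid)               # a genuine copy passes
assert not verify(data + b"x", guid)    # a tampered copy fails
```

Note that this only works for immutable objects: changing the content changes the GUID, which is why mutable files required further evolution of the middleware.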

10.2: Napster and its Legacy Provided a means for users to share music files, primarily MP3s. Launched in 1999; attracted several million users. Not fully peer-to-peer, since it used central servers to maintain lists of connected systems and the files they provided, while actual transfers were conducted directly between machines. Proved the feasibility of a service using hardware and data owned by ordinary Internet users.

10.2: Napster and its Legacy (figure slide)

10.2: Napster and its Legacy BitTorrent Designed and implemented in 2001. The next generation after Napster: true peer-to-peer (P2P). Can handle large files, e.g., WAV, DVD images, FLAC (one CD ripped to FLAC is roughly 500 MB). After the initial pieces transfer from the seed, pieces are individually transferred from client to client; the original seeder only needs to send out one copy of the file for all the clients to receive a copy. The tracker URL is hosted at a BitTorrent site, e.g., The Traders' Den.

10.2: Napster and its Legacy BitTorrent (contd) There are many BitTorrent clients, e.g., Vuze; clients keep track of seeders and leechers. Torrent: contains metadata about the files to be shared and about the tracker. Tracker: coordinates the file distribution and tells peers which other peers to download pieces of the file from.
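To illustrate what that metadata holds, here is a hedged Python sketch of a decoded metainfo ('torrent') structure; the field names follow the public BitTorrent metainfo specification, but the values and the helper function are ours:

```python
import hashlib

# Sketch of a metainfo dictionary after bencode decoding (values invented).
metainfo = {
    "announce": "http://tracker.example.org/announce",   # tracker URL
    "info": {
        "name": "concert.flac",
        "piece length": 262144,        # bytes per piece
        "length": 524288000,           # total file size in bytes
        "pieces": b"",                 # concatenated 20-byte SHA-1 piece hashes
    },
}

def piece_ok(index: int, data: bytes, pieces: bytes) -> bool:
    # A client verifies each downloaded piece against its SHA-1 hash before
    # sharing it onward, so corrupt pieces are never propagated.
    expected = pieces[20 * index : 20 * (index + 1)]
    return hashlib.sha1(data).digest() == expected
```

The per-piece hashes are what allow pieces to arrive from many untrusted peers in any order while the assembled file remains trustworthy.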

10.3: Peer-to-Peer Middleware The purpose is to provide a mechanism for accessing data resources anywhere in the network. Functional requirements: simplify the construction of services across many hosts in a wide network; allow resources to be added and removed at will; allow new hosts to be added and removed at will; keep the interface to application programmers simple and independent of the types of distributed resources.

10.3: Peer-to-Peer Middleware Peer-to-Peer Middleware (contd) Non-functional requirements: global scalability; load balancing; optimization for local interactions between neighboring peers; accommodation of highly dynamic host availability; security of data in an environment with heterogeneous trust; anonymity, deniability and resistance to censorship.

10.3: Peer-to-Peer Middleware Peer-to-Peer Middleware (contd) Global scalability, dynamic host availability, and load sharing and balancing across large numbers of computers pose major design challenges. In the design of the middleware layer, knowledge of the locations of objects must be distributed throughout the network; replication is used to achieve this.

10.4: Routing Overlays Routing Overlays Sub-systems (APIs) within the peer-to-peer middleware, responsible for locating nodes and objects. A routing overlay implements a routing mechanism in the application layer, separate from any other routing mechanism such as IP routing. It ensures that any node can access any object by routing each request through a sequence of nodes, exploiting knowledge at each node to locate the destination. A routing overlay is a distributed algorithm running within the middleware of the host systems.

10.4: Routing Overlays GUIDs 'Pure' names or opaque identifiers: they reveal nothing about the locations of the objects. They are the building blocks for routing overlays. A GUID is computed from all or part of the state of the object using a function that delivers a value that is very likely to be unique; uniqueness is then checked against all other GUIDs. GUIDs are not human-readable. From Section 9.1.1: pure names are simply uninterpreted bit patterns, whereas non-pure names contain information about the object that they name, e.g., its location.

10.4: Routing Overlays Tasks of a routing overlay A client submits a request including the object's GUID; the routing overlay routes the request to a node at which a replica of the object resides. A node introduces a new object by computing its GUID and announcing it to the routing overlay. Clients can remove an object. Nodes may join and leave the service. Routing overlays can check the availability of nodes on a timely basis and route requests through various paths. When a new BitTorrent user signs on to retrieve a concert, the routing overlay adds the node and finds the best path; the receiving node then becomes part of the network and must share its file with other requestors. The client can leave, but should have shared before leaving.

10.4: Routing Overlays Types of Routing Overlays DHT (Distributed Hash Table): the GUID is computed from all or part of the state of the object, and objects are stored at nodes chosen by the hash value. DOLR (Distributed Object Location and Routing): a layer over the DHT that maps GUIDs to the addresses of nodes; the address of a host holding an object is announced using the publish() operation. In the DHT model, GUIDs are placed numerically by hash value, which is effectively random; the DOLR provides a more useful mapping. Next, Raymond will detail two routing overlay case studies: Pastry and Tapestry.
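A minimal sketch of the two interface styles, following the put/get/remove and publish/unpublish/sendToObj operations described in the Coulouris text (the Python rendering and type choices are ours):

```python
from abc import ABC, abstractmethod

class DHT(ABC):
    # DHT style: the overlay decides where data lives, placing it at the
    # node whose GUID is numerically closest to the data's GUID.
    @abstractmethod
    def put(self, guid: int, data: bytes) -> None: ...
    @abstractmethod
    def get(self, guid: int) -> bytes: ...
    @abstractmethod
    def remove(self, guid: int) -> None: ...

class DOLR(ABC):
    # DOLR style: objects stay wherever their hosts keep them; the overlay
    # merely maps a GUID to the addresses of the hosts that published it.
    @abstractmethod
    def publish(self, guid: int) -> None: ...
    @abstractmethod
    def unpublish(self, guid: int) -> None: ...
    @abstractmethod
    def send_to_obj(self, msg: bytes, guid: int, n: int = 1) -> None: ...
```

The practical difference: with a DHT the overlay itself is responsible for storing the data; with a DOLR the publisher remains responsible for storage, which is what lets applications place replicas close to their users.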

10.5: Overlay Case Studies: Pastry, Tapestry Both Pastry and Tapestry adopt the prefix routing approach. Pastry has a straightforward but effective design; it is the message routing infrastructure deployed in applications such as PAST, an archival file system. Tapestry is the basis for the OceanStore storage system; it has a more complex architecture than Pastry because it aims to support a wider range of locality approaches.

10.5: Overlay Case Studies: Pastry, Tapestry Let's talk about Pastry first.

10.5: Overlay Case Studies: Pastry, Tapestry Pastry is a routing overlay network. All nodes and objects are assigned 128-bit GUIDs. A node's GUID is computed by applying a secure hash function such as SHA-1 to the public key with which each node is provided.

10.5: Overlay Case Studies: Pastry, Tapestry For objects such as files, the GUID is computed by applying a secure hash function to the object's name or to some part of the object's stored state. The resulting GUID has the usual properties of secure hash values: it is randomly distributed in the range 0 to 2^128 - 1. In a network with N participating nodes, the Pastry routing algorithm will correctly route a message addressed to any GUID in O(log N) steps.

10.5: Overlay Case Studies: Pastry, Tapestry If the GUID identifies a currently active node, the message is delivered to that node; otherwise it is delivered to the active node whose GUID is numerically closest to it. Active nodes take responsibility for processing requests addressed to all objects in their numerical neighborhood. Routing steps involve the use of an underlying transport protocol (normally UDP) to transfer the message to a Pastry node that is 'closer' to its destination.

10.5: Overlay Case Studies: Pastry, Tapestry The real transport of a message across the Internet between two Pastry nodes may require a substantial number of IP hops. Pastry therefore uses a locality metric based on network distance in the underlying network to select appropriate neighbors when setting up the routing tables used at each node. Thousands of hosts located at widely dispersed sites can participate in a Pastry overlay.

10.5: Overlay Case Studies: Pastry, Tapestry Participating hosts are fully self-organizing: a new node obtains the data needed to construct its routing table and other required state from existing members in O(log N) messages, where N is the number of hosts participating in the overlay. When a node fails, the remaining nodes can detect its absence and cooperatively reconfigure to reflect the required changes in the routing structure.

10.5: Overlay Case Studies: Pastry, Tapestry Pastry Routing Algorithm The algorithm involves the use of a routing table at each node to route messages efficiently. It is described in two stages. Stage 1: a simplified form that routes messages correctly but inefficiently, without a routing table. Stage 2: the full routing algorithm, which uses the routing table to route a request to any node in O(log N) messages.

10.5: Overlay Case Studies: Pastry, Tapestry Stage 1: Each active node stores a leaf set: a vector L of size 2l containing the GUIDs and IP addresses of the nodes whose GUIDs are numerically closest on either side of its own (l above and l below). Leaf sets are maintained by Pastry as nodes join and leave.

10.5: Overlay Case Studies: Pastry, Tapestry Even after a node failure, leaf sets will be corrected within a short time, provided failures remain within the defined maximum rate. The GUID space is treated as circular: GUID 0's lower neighbor is 2^128 - 1, as the next slide shows.
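A small Python sketch of this circular numeric closeness (our own illustration; real Pastry GUIDs are 128-bit values usually written in hexadecimal):

```python
SPACE = 2 ** 128  # size of the circular GUID space

def circular_distance(a: int, b: int) -> int:
    # Distance between two GUIDs on the ring, taking the shorter way
    # around; GUID 0 and GUID 2**128 - 1 are immediate neighbors.
    d = abs(a - b)
    return min(d, SPACE - d)

def responsible_node(guid: int, active_nodes):
    # The node responsible for an object is the active node whose own
    # GUID is numerically closest to the object's GUID.
    return min(active_nodes, key=lambda n: circular_distance(guid, n))
```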

10.5: Overlay Case Studies: Pastry, Tapestry Stage 1: Circular routing alone is correct but inefficient. Node A (65A1FC) receives message M with destination address D (D46A1C). The diagram shows a view of the active nodes distributed in the circular address space; each dot represents a node. As mentioned before, each node stores a leaf set. Since every leaf set includes the GUIDs and IP addresses of the current node's immediate neighbours, a Pastry system with correct leaf sets of size at least two can route messages to any GUID. In this case, for a Pastry system with l = 4, node A routes the message by comparing D with its own GUID and with each of the GUIDs in its leaf set, and forwarding M to the node amongst them that is numerically closest to D. This process eventually delivers M to the active node closest to D. The scheme is clearly very inefficient, because it requires ~N/(2l) hops to deliver a message in a network with N nodes.
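Here is a hedged sketch of that Stage 1 rule (illustrative Python; `leaf_set` maps each leaf GUID to its node address, a representation we invent for the example):

```python
def route_stage1(self_guid: int, leaf_set: dict, dest: int, space=2**128):
    # Compare D with our own GUID and with every GUID in the leaf set,
    # then hand the message to whichever is numerically closest to D.
    dist = lambda a, b: min(abs(a - b), space - abs(a - b))
    best = min([self_guid, *leaf_set], key=lambda g: dist(g, dest))
    if best == self_guid:
        return "deliver locally"        # we are the closest active node
    return ("forward", leaf_set[best])  # ~N/(2l) hops in an N-node overlay
```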

10.5: Overlay Case Studies: Pastry, Tapestry Stage 2: Full Pastry algorithm Efficient routing is achieved with the aid of routing tables. Each Pastry node maintains a tree-structured routing table giving GUIDs and IP addresses for a set of nodes spread throughout the entire range of 2^128 possible GUID values.

10.5: Overlay Case Studies: Pastry, Tapestry Structure of a routing table This figure shows the structure of the routing table for a specific node; the first four rows of a Pastry routing table are shown. GUIDs are viewed as hexadecimal values, and the table classifies GUIDs based on their hexadecimal prefixes; this Pastry system's table therefore has 128/4 = 32 rows. Any row n contains 15 entries: one for each possible value of the nth hexadecimal digit, excluding the value in the local node's GUID. Each entry in the table points to one of the potentially many nodes whose GUIDs have the relevant prefix.
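To make the row/column indexing concrete, a small sketch (our own; GUIDs represented as 32-digit hex strings):

```python
def common_prefix_len(a_hex: str, d_hex: str) -> int:
    # Number of leading hex digits shared by the two GUIDs: this is the
    # routing-table row p used when choosing the next hop.
    p = 0
    while p < len(a_hex) and a_hex[p] == d_hex[p]:
        p += 1
    return p

def next_hop_entry(table, a_hex: str, d_hex: str):
    # Row = length p of the longest common prefix; column = first
    # non-matching hex digit of D. The entry (if not empty) names a node
    # whose GUID shares p + 1 prefix digits with D, i.e. one digit closer.
    p = common_prefix_len(a_hex, d_hex)
    return table[p][int(d_hex[p], 16)]
```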

10.5: Overlay Case Studies: Pastry, Tapestry With the aid of the routing table, a message can be delivered in ~log16(N) hops. The routing process at any node A uses the information in its routing table R and leaf set L to handle each request from an application and each incoming message from another node. The diagram shows the difference in how a message is routed between Stage 1 and Stage 2; the routing algorithm appears in more detail on the next slide.

10.5: Overlay Case Studies: Pastry, Tapestry Pastry's routing algorithm In Pastry's routing algorithm, steps 1, 2 and 7 perform the actions described in Stage 1: a complete but inefficient routing algorithm. The other steps use the routing table to improve performance by reducing the number of hops required. Steps 4 and 5 come into play whenever D does not fall within the numeric range of the current node's leaf set and relevant routing table entries are available. To locate the next hop, the process compares the hexadecimal digits of D with those of A (the GUID of the current node) from left to right to discover the length p of their longest common prefix. This length is then used as a row offset, together with the first non-matching digit of D as a column offset, to access the required element of the routing table. The construction of the table ensures that this element (if not empty) contains the IP address of a node whose GUID has p + 1 prefix digits in common with D. Step 7 is used when D falls outside the numeric range of the leaf set and there is no relevant routing table entry; this arises only when nodes have recently failed and the table has not yet been updated. In essence, the algorithm picks the shortest effective next hop at each step.
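A compact sketch combining both structures (our paraphrase of the steps above, not the book's pseudocode; `node` is assumed to carry its leaf-set bounds, routing table, and the `route_stage1` helper from the earlier sketch):

```python
def route_full(node, dest: int, dest_hex: str):
    # Steps 1-2: if D lies within the leaf set's numeric range, deliver
    # directly to the numerically closest member (possibly ourselves).
    if node.leaf_min <= dest <= node.leaf_max:
        return route_stage1(node.guid, node.leaf_set, dest)
    # Steps 4-5: row p = longest common hex prefix of D and our GUID,
    # column = first non-matching digit of D; the entry (if present) is a
    # node sharing p + 1 prefix digits with D, i.e. one digit closer.
    p = 0
    while p < len(dest_hex) and node.guid_hex[p] == dest_hex[p]:
        p += 1
    entry = node.table[p][int(dest_hex[p], 16)]
    if entry is not None:
        return ("forward", entry)
    # Step 7: no usable table entry (e.g. recent failures): fall back to
    # any known node that is numerically closer to D than we are.
    return route_stage1(node.guid, node.all_known_nodes, dest)
```

`all_known_nodes` is a hypothetical stand-in for the union of the leaf set, routing table and neighborhood set that the real algorithm consults in its fallback step.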

10.5: Overlay Case Studies: Pastry, Tapestry Host integration New nodes use a joining protocol to acquire their routing table and leaf set contents, and they notify other nodes of the changes they must make to their tables. First, the new node computes a suitable GUID; then it makes contact with a nearby Pastry node.

10.5: Overlay Case Studies: Pastry, Tapestry Host failure or departure Nodes in the Pastry infrastructure may fail or depart without warning. A node is considered failed when its immediate neighbours can no longer communicate with it. The leaf sets that contain the failed node's GUID must then be repaired.

10.5: Overlay Case Studies: Pastry, Tapestry Locality The Pastry routing structure is highly redundant, and this redundancy can be exploited to reduce actual message transmission times by taking advantage of the locality properties of nodes in the underlying transport network. The locality metric (number of IP hops or measured latency) is used to compare candidates, and the closest available node is chosen. This mechanism cannot produce globally optimal routings, because the available information is not comprehensive.

10.5: Overlay Case Studies: Pastry, Tapestry Fault tolerance Nodes send 'heartbeat' messages (messages sent at fixed time intervals to indicate that the sender is alive) to neighboring nodes in their leaf sets, but this alone may not reliably detect failed nodes, nor defend against malicious nodes that attempt to interfere with correct routing. To overcome these problems, clients use an 'at-least-once' delivery mechanism and repeat requests several times in the absence of a response, allowing Pastry a longer time window to detect and repair node failures. A small degree of randomness is also introduced into the routing algorithm; with this random variation, client retransmissions should eventually succeed even in the presence of a small number of malicious nodes.

10.5: Overlay Case Studies: Pastry, Tapestry Dependability Additional dependability measures and some performance optimizations in the host management algorithms were included by the authors in an updated version called MSPastry.

10.5: Overlay Case Studies: Pastry, Tapestry Evaluation Work MSPastry was used to evaluate the impact on performance and dependability of the host join/leave rate and of the associated dependability mechanisms. Castro and his colleagues carried out an exhaustive performance evaluation of MSPastry, running the system on a single machine that simulates a large network of hosts, and compared the results with a real application running over MSPastry. Dependability: with an assumed IP message loss rate of 0%, MSPastry failed to deliver only 1.5 in 100,000 requests; with a loss rate of 5%, it failed to deliver 3.3 in 100,000 requests, and 1.6 in 100,000 requests were delivered to the wrong node. Performance: the Relative Delay Penalty (RDP) metric, a measure of the extra cost incurred in employing an overlay routing layer, ranged from ~1.8 with zero network message loss to ~2.2 with 5% network message loss. Overheads: the extra network load generated by control traffic (messages involved in maintaining leaf sets and routing tables) was less than two messages per minute per node. Overall, these results show that the overlay can achieve good performance and high dependability with thousands of nodes operating in realistic environments.

10.5: Overlay Case Studies: Pastry, Tapestry Tapestry implements a distributed hash table and routes messages to nodes based on GUIDs associated with resources, using prefix routing in a manner similar to Pastry. Its API conceals the distributed hash table from applications behind a Distributed Object Location and Routing (DOLR) interface. Nodes that hold resources use the publish(GUID) primitive to make them known to Tapestry; the holders of resources remain responsible for storing them. Replicated resources are published with the same GUID by each node that holds a replica, resulting in multiple entries in the Tapestry routing structure. Tapestry applications can therefore place replicas close to frequent users of resources, in order to reduce latencies, minimize network loads, and ensure tolerance of network and host failures.

10.6: Application Case Studies: Squirrel, OceanStore, Ivy This section studies three applications built over routing overlays: the Squirrel web caching service, based on Pastry, and the OceanStore and Ivy file stores.

10.6: Application Case Studies: Squirrel, OceanStore, Ivy Squirrel The SHA-1 secure hash function is applied to the URL of each cached object to produce a 128-bit Pastry GUID. Based on the end-to-end argument, the authors note that the HTTPS protocol should be used to achieve a much better guarantee for those interactions that require it.
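A one-line illustration of that URL-to-GUID mapping (our sketch; Squirrel's actual implementation details may differ):

```python
import hashlib

def squirrel_guid(url: str) -> int:
    # Hash the URL with SHA-1 and keep 128 bits to form a Pastry GUID;
    # the object's home node is the one with the numerically closest GUID.
    return int.from_bytes(hashlib.sha1(url.encode()).digest()[:16], "big")

home_key = squirrel_guid("http://example.org/index.html")
```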

10.6: Application Case Studies: Squirrel, OceanStore, Ivy Squirrel In the Squirrel implementation, the node whose GUID is numerically closest to the GUID of an object becomes that object's home node, responsible for holding any cached copy of the object. Client nodes cache local and remote web objects; a client requests a fresh copy of an object from the home node if there is no copy in its local cache.

10.6: Application Case Studies: Squirrel, OceanStore, Ivy Squirrel Evaluation of Squirrel measured: the reduction in total external bandwidth used; the latency perceived by users for access to web objects; and the computational and storage load imposed on client nodes. Based on these measurements, the authors of Squirrel concluded that its performance is comparable to that of a centralized cache: Squirrel achieves a reduction in the observed latency for web page access close to that achievable by a centralized cache server with a similarly sized dedicated cache.

10.6: Application Case Studies: Squirrel, OceanStore, Ivy OceanStore file store Aims to provide a very large scale, incrementally scalable, persistent storage facility for mutable data objects: long-term persistence and reliability over constantly changing network and computing resources. Intended uses include an NFS-like file service, electronic mail hosting, database sharing, and persistent storage of large numbers of data objects.

10.6: Application Case Studies: Squirrel, OceanStore, Ivy A prototype called Pond was built to validate the OceanStore design and compare its performance with traditional approaches. Pond uses the Tapestry routing overlay mechanism to place blocks of data at nodes distributed throughout the Internet and to dispatch requests to them.

10.6: Application Case Studies: Squirrel, OceanStore, Ivy OceanStore/Pond Storage organization Data objects are analogous to files, with their data stored in a set of blocks. Each object represents an ordered sequence of immutable versions. Three types of identifier are used in the store: BGUID, the secure hash of a data block; VGUID, the BGUID of the root block of a version; and AGUID, which uniquely identifies all the versions of an object.
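A hedged sketch of how the three identifiers relate (our own simplification; OceanStore's real block tree and version metadata are richer, and the AGUID derivation shown here is an assumption for illustration):

```python
import hashlib

def bguid(block: bytes) -> bytes:
    # BGUID: secure hash of one data block's contents.
    return hashlib.sha1(block).digest()

def vguid(root_block: bytes) -> bytes:
    # VGUID: the BGUID of the root block of one immutable version; any
    # change to the version's data changes its root block, hence its VGUID.
    return bguid(root_block)

def aguid(owner_key: bytes, name: str) -> bytes:
    # AGUID: names the object across all versions. Sketched here as a hash
    # of the owner's key and the object name (hypothetical derivation).
    return hashlib.sha1(owner_key + name.encode()).digest()
```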

10.6: Application Case Studies: Squirrel, OceanStore, Ivy Performance of OceanStore/Pond Pond is a prototype built to prove the feasibility of a scalable peer-to-peer file service. It was evaluated against several purpose-designed benchmarks, including the Andrew benchmark, using a simple emulation of an NFS client and server.

10.6: Application Case Studies: Squirrel, OceanStore, Ivy Conclusions for OceanStore/Pond When operating over a wide-area network, read performance substantially exceeds NFS, and update performance for files and directories is within a factor of three of NFS. LAN results are substantially worse. Overall, the results suggest that an Internet-scale peer-to-peer file service would be an effective solution for the distribution of files that do not change very rapidly.

10.6: Application Case Studies: Squirrel, OceanStore, Ivy Ivy file system A read/write file system that supports multiple readers and writers, implemented over an overlay routing layer and a distributed hash-addressed data store. It emulates a Sun NFS server, stores the state of files as logs, and scans the logs to reconstruct the files.
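A toy sketch of that log-scan idea (our own simplification; real Ivy keeps one log per participant in a DHash store and orders records with version vectors rather than a single sequence number):

```python
from dataclasses import dataclass

@dataclass
class LogRecord:
    seq: int       # global ordering stand-in for Ivy's version vectors
    offset: int    # file offset written
    data: bytes    # bytes written at that offset

def reconstruct(logs: list[list[LogRecord]], size: int) -> bytearray:
    # Replay every participant's write records in order to rebuild the
    # current file contents; later writes overwrite earlier ones.
    content = bytearray(size)
    for rec in sorted((r for log in logs for r in log), key=lambda r: r.seq):
        content[rec.offset : rec.offset + len(rec.data)] = rec.data
    return content
```

Storing appends rather than in-place updates is part of what lets Ivy keep file state on partially trusted machines: a log record, once written, is immutable and attributable to its writer.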

10.6: Application Case Studies: Squirrel, OceanStore, Ivy Ivy file system Issues resolved in order to host files on partially trusted or unreliable machines: maintenance of consistent file metadata; partial trust between participants and vulnerability to misbehavior; and continued operation during network partitions.

10.7: Summary Napster: immutable data, unsophisticated routing. Current systems: mutable data, routing overlays, sophisticated algorithms. They can run over the Internet or a company intranet, and support distributed computing (SETI).

10.7: Summary Benefits of Peer-to-Peer Systems The ability to exploit unused resources (storage, processing) in the host computers. Scalability to support large numbers of clients and hosts, with load balancing of network links and host computer resources. The self-organizing properties of the middleware platforms reduce costs.

10.7: Summary Weaknesses of Peer-to-Peer Systems Costly for the storage of mutable data compared to a trusted, centralized service. Cannot yet guarantee anonymity to hosts.

10: Peer-to-Peer Systems Questions? Comments? For the professor: this field seems to be evolving rapidly; do you have interesting updates for us?