
INF5071 – Performance in Distributed Systems: Peer-to-Peer Systems (27/10 & 10/11 2006)

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Client-Server [Figure: clients on local distribution networks reach a central server across the backbone network.]  Traditional distributed computing  Successful architecture, and will continue to be so (adding proxy servers)  Tremendous engineering necessary to make server farms scalable and robust

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Distribution with proxies  Hierarchical distribution system  E.g. proxy caches that consider popularity  Popular videos replicated and kept close to clients  Unpopular ones close to the root servers  Popular videos are replicated more frequently [Figure: hierarchy of end-systems, local servers, regional servers and root servers; completeness of available content increases toward the root.]

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Peer-to-Peer (P2P) [Figure: peers on local distribution networks exchange data directly across the backbone network.]  Really an old idea - a distributed system architecture  No centralized control  Nodes are symmetric in function  Typically, many nodes, but unreliable and heterogeneous

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Overlay networks [Figure: overlay nodes and overlay links form a logical network on top of the physical one; each overlay link maps onto an IP path composed of IP links and IP routing across LANs and backbone networks.]

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems P2P  Many aspects similar to proxy caches  Nodes act as clients and servers  Distributed storage  Bring content closer to clients  Storage limitation of each node  Number of copies often related to content popularity  Necessary to make replication and de-replication decisions  Redirection  But  No distinguished roles  No generic hierarchical relationship  At most hierarchy per data item  Clients do not know where the content is  May need a discovery protocol  All clients may act as roots (origin servers)  Members of the P2P network come and go

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems P2P Systems  Peer-to-peer systems  New considerations for distribution systems  Considered here  Scalability, fairness, load balancing  Content location  Failure resilience  Routing  Application layer routing  Content routing  Request routing  Not considered here  Copyright  Privacy  Trading

Examples: Napster

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Napster  Program for sharing (music) files over the Internet  Approach taken  Central index  Distributed storage and download  All downloads are shared  P2P aspects  Client nodes act also as file servers

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Napster  Client connects to Napster with login and password  Transmits current listing of shared files  Napster registers username, maps username to IP address and records song list [Figure: clients joining the central index.]

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Napster [Figure: a client sends a query to the central index and receives an answer.]  Client sends song request to Napster server  Napster checks song database  Returns matched songs with usernames and IP addresses (plus extra stats)

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Napster [Figure: the requesting client fetches the file directly from the sharing peer; the central index is not involved in the transfer.]  User selects a song; the download request is sent directly to the peer that shares it  That machine is contacted if it is available
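The join/query/download flow on the slides above can be condensed into a toy, in-memory sketch of the central index; class and method names here are illustrative, not Napster's actual protocol, and the peer-to-peer file transfer itself happens outside the index.

```python
# Toy sketch of a Napster-style central index (illustrative names, not the real protocol).
class CentralIndex:
    def __init__(self):
        self.peers = {}   # username -> IP address
        self.songs = {}   # song title -> set of usernames sharing it

    def join(self, username, ip, shared_files):
        """Peer logs in and uploads its current listing of shared files."""
        self.peers[username] = ip
        for title in shared_files:
            self.songs.setdefault(title, set()).add(username)

    def query(self, title):
        """Return (username, IP) pairs for peers sharing the requested song."""
        return [(u, self.peers[u]) for u in self.songs.get(title, set())]

index = CentralIndex()
index.join("alice", "10.0.0.5", ["LetItBe.mp3"])
print(index.query("LetItBe.mp3"))   # the download then happens peer-to-peer, outside the index
```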

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Napster: Assessment  Scalability, fairness, load balancing  Replication to querying nodes  Number of copies increases with popularity  Large distributed storage  Unavailability of files with low popularity  Network topology is not accounted for at all  Latency may be increased  Content location  Simple, centralized search/location mechanism  Can only query by index terms  Failure resilience  No dependencies among normal peers  Index server as single point of failure

Examples: Gnutella

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Gnutella  Program for sharing files over the Internet  Approach taken  Purely P2P, nothing centralized  Dynamically built overlay network  Query for content by overlay broadcast  No index maintenance  P2P aspects  Peer-to-peer file sharing  Peer-to-peer querying  Entirely decentralized architecture  Many iterations to fix poor initial design (lack of scalability)

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Gnutella: Joining  Connect to one known host and send a broadcast ping  Can be any host; host addresses are passed on through word-of-mouth or host-caches  Use overlay broadcast ping through the network with a TTL of 7 [Figure: the ping fans out ring by ring through the overlay (TTL 1, 2, 3, 4 hops).]

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Gnutella: Joining  Hosts that are not overwhelmed respond with a routed pong  Gnutella caches the IP addresses of replying nodes as neighbors  In the example the grey nodes do not respond within a certain amount of time (they are overloaded)

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Gnutella: Query  Query by broadcasting in the overlay  Send query to all overlay neighbors  Overlay neighbors forward query to all their neighbors  Up to 7 layers deep (TTL 7) [Figure: the query is re-broadcast hop by hop, the TTL decreasing from 7.]

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Gnutella: Query  Send routed responses  To the overlay node that was the source of the broadcast query  Querying client receives several responses  User receives a list of files that matched the query and a corresponding IP address [Figure: responses are routed back along the reverse path of the query to the querying node.]
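A minimal sketch of this flooded query, assuming a simple in-memory node object with an id, a file list and a neighbor list; real Gnutella also tracks message GUIDs so that query-hits can be routed back along the reverse path rather than returned by a function call.

```python
# Sketch of Gnutella-style overlay flooding with a TTL (default 7).
class Node:
    def __init__(self, node_id, files):
        self.id = node_id
        self.files = files        # filenames shared by this peer
        self.neighbors = []       # overlay neighbors learned from pongs

def flood_query(node, keyword, ttl=7, seen=None):
    """Broadcast the query to all overlay neighbors until the TTL runs out;
    return the ids of nodes whose shared files match the keyword."""
    seen = set() if seen is None else seen
    if node.id in seen:
        return set()
    seen.add(node.id)
    hits = {node.id} if any(keyword in f for f in node.files) else set()
    if ttl > 1:                            # forward only while the TTL lasts
        for neighbor in node.neighbors:
            hits |= flood_query(neighbor, keyword, ttl - 1, seen)
    return hits
```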

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Gnutella: Transfer  File transfer  Using direct communication  File transfer protocol not part of the Gnutella specification [Figure: the requesting peer sends a download request directly to the peer holding the file and receives the requested file.]

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Gnutella: Assessment  Scalability, fairness, load balancing  Replication to querying nodes  Number of copies increases with popularity  Large distributed storage  Unavailability of files with low popularity  Bad scalability, uses flooding approach  Network topology is not accounted for at all, latency may be increased  Content location  No limits to query formulation  Less popular files may be outside TTL  Failure resilience  No single point of failure  Many known neighbors  Assumes quite stable relationships

Examples: Freenet

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Freenet  Program for sharing files over the Internet  Focus on anonymity  Approach taken  Purely P2P, nothing centralized  Dynamically built overlay network  Query for content by hashed query and best-first-search  Caching of hash values and content  Content forwarding in the overlay  P2P aspects  Peer-to-peer file sharing  Peer-to-peer querying  Entirely decentralized architecture  Anonymity

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Freenet: Nodes and Data  Nodes  Routing tables  Contain IP addresses of other nodes and the hash values they hold (resp. held)  Data is indexed with a hash value  “Identifiers” are hashed  Identifiers may be keywords, author ids, or the content itself  Secure Hash Algorithm (SHA-1) produces a “one-way” 160-bit key  Content-hash key (CHK) = SHA-1(content)  Typically stores blocks

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Freenet: Storing and Retrieving Data  Storing data  Data is moved to a server with arithmetically close keys: 1. The key and data are sent to the local node 2. The key and data are forwarded to the node with the nearest key; repeat step 2 until the maximum number of hops is reached  Retrieving data  Best-first search: 1. An identifier is hashed into a key 2. The key is sent to the local node 3. If the data is not in the local store, the request is forwarded to the best neighbor; repeat step 3 with the next best neighbor until the data is found or the request times out 4. If the data is found, or the hop count reaches zero, return the data or an error along the chain of nodes (if the data is found, intermediary nodes create entries in their routing tables)
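A rough sketch of the retrieval side of this procedure, with an assumed node structure (local store, set of keys a neighbor is known to hold, neighbor list); the depth-first recursion and the numeric key-distance metric are simplifications of Freenet's actual routing.

```python
import hashlib

# Sketch of Freenet-style best-first retrieval (simplified data structures).
class FreenetNode:
    def __init__(self, node_id):
        self.id = node_id
        self.store = {}           # key -> data held locally
        self.known_keys = set()   # keys this node is believed to hold (routing hint)
        self.neighbors = []

def chk(content: bytes) -> int:
    """Content-hash key: SHA-1 of the content, as an integer."""
    return int(hashlib.sha1(content).hexdigest(), 16)

def retrieve(node, key, hops_to_live=10, visited=None):
    visited = visited if visited is not None else set()
    visited.add(node.id)
    if key in node.store:                       # found in the local store
        return node.store[key]
    if hops_to_live == 0:
        return None
    # Try neighbors in order of numerically closest advertised key to the target.
    ranked = sorted((n for n in node.neighbors if n.id not in visited),
                    key=lambda n: min((abs(k - key) for k in n.known_keys),
                                      default=float("inf")))
    for neighbor in ranked:
        data = retrieve(neighbor, key, hops_to_live - 1, visited)
        if data is not None:
            node.store[key] = data              # intermediaries cache the data
            node.known_keys.add(key)            # and record the key in their tables
            return data
    return None
```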

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Freenet: Best First Search  Heuristics for selecting direction  >RES: Returned most results  <TIME: Shortest satisfaction time  <HOPS: Min hops for results  >MSG: Sent us most messages (all types)  <QLEN: Shortest queue  <LAT: Shortest latency  >DEG: Highest degree

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Freenet: Routing Algorithm

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Freenet: Assessment  Scalability, fairness, load balancing  Caching in the overlay network  Access latency decreases with popularity  Large distributed storage  Fast removal of files with low popularity  A lot of storage wasted on highly popular files  Network topology is not accounted for  Content location  Search by hash key: limited ways to formulate queries  Content placement changes to fit search pattern  Less popular files may be outside TTL  Failure resilience  No single point of failure

Examples: FastTrack, Morpheus, OpenFT

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems FastTrack, Morpheus, OpenFT  Peer-to-peer file sharing protocol  Three different nodes  USER  Normal nodes  SEARCH  Keep an index of “their” normal nodes  Answer search requests  INDEX  Keep an index of search nodes  Redistribute search requests

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems FastTrack, Morpheus, OpenFT [Figure: the three node roles (INDEX, SEARCH, USER) arranged hierarchically.]

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems FastTrack, Morpheus, OpenFT [Figure: a query (?) from a USER node is forwarded through SEARCH and INDEX nodes; answers (!) flow back.]
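A toy sketch of the two-tier index implied by these roles: USER nodes register their files with a SEARCH node, and an INDEX node redistributes a query over the search nodes it knows. The class names and the exact division of work are assumptions for illustration, not the FastTrack protocol.

```python
# Toy two-tier (supernode) index: illustrative, not the actual FastTrack protocol.
class SearchNode:
    def __init__(self):
        self.index = {}                      # filename -> set of user addresses

    def register(self, user_addr, filenames):
        """A USER node uploads its file list to 'its' search node."""
        for name in filenames:
            self.index.setdefault(name, set()).add(user_addr)

    def search(self, name):
        return self.index.get(name, set())

class IndexNode:
    def __init__(self, search_nodes):
        self.search_nodes = search_nodes     # keeps an index of search nodes

    def search(self, name):
        """Redistribute the request over search nodes instead of flooding users."""
        hits = set()
        for s in self.search_nodes:
            hits |= s.search(name)
        return hits

s1, s2 = SearchNode(), SearchNode()
s1.register("10.0.0.7", ["song.mp3"])
print(IndexNode([s1, s2]).search("song.mp3"))   # {'10.0.0.7'}
```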

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems FastTrack, Morpheus, OpenFT: Assessment  Scalability, fairness, load balancing  Large distributed storage  Avoids broadcasts  Load concentrated on super nodes (index and search)  Network topology is partially accounted for  Efficient structure development  Content location  Search by hash key: limited ways to formulate queries  All indexed files are reachable  Can only query by index terms  Failure resilience  No single point of failure but overlay networks of index servers (and search servers) reduces resilience  Relies on very stable relationship / Content is registered at search nodes  Relies on a partially static infrastructure

Examples: BitTorrent

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems BitTorrent  Distributed download system  Content is distributed in segments  Tracker  One central download server per content  Approach to fairness (tit-for-tat) per content  No approach for finding the tracker  No content transfer protocol included

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems BitTorrent  Segment download operation  The tracker tells the peer the source and number of the segment to get  The peer retrieves the content in pull mode  The peer reports the availability of the new segment to the tracker

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems BitTorrent  Rarest-first strategy [Figure annotation: no second input stream for a peer that has not contributed enough.]
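The rarest-first policy can be sketched as below: count, over the peers we know, how many hold each piece we still need, and pick one of the least-replicated pieces. The data structures are illustrative; a real client also layers strict priority, endgame mode and tit-for-tat choking on top of this.

```python
import random
from collections import Counter

# Sketch of rarest-first piece selection (illustrative, not the full BitTorrent client logic).
def rarest_first(my_pieces, peer_bitfields):
    """my_pieces: set of piece indices we already have.
    peer_bitfields: dict peer_id -> set of piece indices that peer advertises."""
    availability = Counter()
    for pieces in peer_bitfields.values():
        availability.update(pieces)
    needed = [p for p in availability if p not in my_pieces]
    if not needed:
        return None                                   # nothing new to fetch from these peers
    rarest_count = min(availability[p] for p in needed)
    return random.choice([p for p in needed if availability[p] == rarest_count])

print(rarest_first({0, 1}, {"a": {0, 1, 2}, "b": {1, 2, 3}, "c": {2}}))  # 3 (held by one peer)
```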

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems BitTorrent [Figure annotation: all nodes allow at most 2 concurrent streams in and out; a peer that has not contributed enough gets no second input stream.]


2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems BitTorrent Assessment  Scalability, fairness, load balancing  Large distributed storage  Avoids broadcasts  Transfer content segments rather than complete content  Does not rely on clients staying online after download completion  Contributors are allowed to download more  Content location  Central server approach  Failure resilience  Tracker is single point of failure  Content holders can lie

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Comparison
  Scalability:          Gnutella: limited by flooding | FreeNet: uses caching | BitTorrent: separate overlays per file
  Routing information:  Napster: one central server | Gnutella: neighbour list | FastTrack: index server | BitTorrent: one tracker per file
  Lookup cost:          Napster: O(1) | Gnutella: O(log #nodes) | FreeNet: O(#nodes) | FastTrack: O(1) | BitTorrent: O(#blocks)
  Physical locality:    FastTrack: by search server assignment

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Comparison (continued)
  Load balancing:       Napster, Gnutella: many replicas of popular content | FreeNet: content placement changes to fit search | FastTrack: load concentrated on supernodes | BitTorrent: rarest-first copying
  Content location:     Napster: all files reachable, uses index server, search by index term | Gnutella: uses flooding, unpopular files may be outside TTL | FreeNet: search by hash | FastTrack: all files reachable, search by hash | BitTorrent: external issue
  Failure resilience:   Napster: index server as single point of failure | Gnutella, FreeNet: no single point of failure | FastTrack: overlay network of index servers | BitTorrent: tracker as single point of failure

Peer-to-Peer Systems Distributed directories

Examples: Chord

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Chord  Approach taken  Only concerned with efficient indexing  Distributed index - decentralized lookup service  Inspired by consistent hashing: SHA-1 hash  Content handling is an external problem entirely  No relation to content  No included replication or caching  P2P aspects  Every node must maintain keys  Adaptive to membership changes  Client nodes act also as file servers

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Lookup Based on Hash Tables [Figure: a hash function maps a key to a position (bucket) in a hash table; insert(key, data) stores the data in that bucket and lookup(key) returns it.]

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Distributed Hash Tables (DHTs)  Nodes are the hash buckets  Key identifies data uniquely  DHT balances keys and data across nodes [Figure: a distributed application calls insert(key, data) and lookup(key) on the DHT layer, which spreads the key/data pairs over the nodes.]  Define a useful key nearness metric  Keep the hop count small  Keep the routing tables the “right size”  Stay robust despite rapid changes in membership

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Chord IDs & Consistent Hashing  m-bit identifier space for both keys and nodes  Key identifier = SHA-1(key)  Node identifier = SHA-1(IP address)  Both are uniformly distributed  Identifiers are ordered on a circle modulo 2^m  A key is mapped to the first node whose id is equal to or follows the key id  Example: SHA-1(“LetItBe”) gives key ID 54; SHA-1 of a node's IP address gives node ID 123
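A small sketch of this mapping, assuming SHA-1 identifiers and a sorted list of node ids; the successor() helper finds the first node id equal to or following the key id on the circle.

```python
import hashlib
from bisect import bisect_left

M = 160                                    # SHA-1 gives 160-bit identifiers

def chord_id(value: str) -> int:
    """SHA-1 hash interpreted as an identifier on the circle modulo 2^M."""
    return int(hashlib.sha1(value.encode()).hexdigest(), 16) % (2 ** M)

def successor(node_ids, key_id):
    """First node id equal to or following key_id; node_ids must be sorted."""
    i = bisect_left(node_ids, key_id)
    return node_ids[i % len(node_ids)]     # wrap around the circle

nodes = sorted(chord_id(ip) for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(successor(nodes, chord_id("LetItBe")))   # the node responsible for key "LetItBe"
```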

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Routing: Everyone-Knows-Everyone  Every node knows of every other node - requires global information  Routing tables are large: N entries  Lookups require O(1) hops [Figure: Hash(“LetItBe”) = K54; the querying node asks “Where is LetItBe?” and is answered directly by the node holding the key.]

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Routing: All Know Their Successor  Every node only knows its successor in the ring  Small routing table: 1 entry  Lookups require O(N) hops [Figure: the query for Hash(“LetItBe”) = K54 is passed from successor to successor around the ring until the responsible node answers.]

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Routing: “Finger Tables”  Every node knows m other nodes in the ring  Distances increase exponentially  Finger i points to successor(n + 2^i)
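Building on the chord_id()/successor() helpers from the sketch above, finger construction and the lookup step it enables could look as follows; the interval test is simplified, and real Chord maintains fingers incrementally rather than rebuilding them.

```python
# Sketch of finger-table construction and the "closest preceding finger" lookup step.
# Reuses chord_id()/successor() from the consistent-hashing sketch above.
def build_fingers(n, node_ids, m=160):
    """Finger i of node n points to successor(n + 2^i) on the circle."""
    return [successor(node_ids, (n + 2 ** i) % (2 ** m)) for i in range(m)]

def in_interval(x, a, b, modulus):
    """True if x lies on the circle strictly after a and before b."""
    return 0 < (x - a) % modulus < (b - a) % modulus

def closest_preceding_finger(n, fingers, key_id, m=160):
    """Jump to the farthest finger that still precedes the key; O(log N) such hops
    reach the node responsible for key_id."""
    for f in reversed(fingers):
        if in_interval(f, n, key_id, 2 ** m):
            return f
    return n
```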

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Joining the Ring  Three step process:  Initialize all fingers of new node - by asking another node for help  Update fingers of existing nodes  Transfer keys from successor to new node

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Handling Failures  Failure of nodes might cause incorrect lookups  N80 doesn’t know its correct successor, so the lookup fails  One approach: successor lists  Each node knows its r immediate successors  After a failure, find the first known live successor  Increased routing table size [Figure: Lookup(90) at N80, with nearby nodes N85, N102, N113, N120 and N10 on the ring.]

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Chord Assessment  Scalability, fairness, load balancing  Large distributed index  Logarithmic search effort  Network topology is not accounted for  Routing tables contain log(#nodes)  Quick lookup in large systems, low variation in lookup costs  Content location  Search by hash key: limited ways to formulate queries  All indexed files are reachable  Log(#nodes) lookup steps  Not restricted to file location  Failure resilience  No single point of failure  Not in basic approach  Successor lists allow use of neighbors to failed nodes  Salted hashes allow multiple indexes  Relies on well-known relationships, but fast awareness of disruption and rebuilding

Examples: Pastry

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Pastry  Approach taken  Only concerned with efficient indexing  Distributed index - decentralized lookup service  Uses DHTs  Content handling is an external problem entirely  No relation to content  No included replication or caching  P2P aspects  Every node must maintain keys  Adaptive to membership changes  Leaf nodes are special  Client nodes act also as file servers

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Pastry  DHT approach  Each node has a unique 128-bit nodeId  Assigned when the node joins  Used for routing  Each message has a key  NodeIds and keys are written as base-2^b digits  b is a configuration parameter with typical value 4 (base 16, hexadecimal digits)  A Pastry node routes the message to the node with the nodeId closest to the key  Number of routing steps is O(log N)  Pastry takes network locality into account  Each node maintains:  a routing table organized into ceil(log_{2^b} N) rows with 2^b - 1 entries each  a neighborhood set M: nodeIds and IP addresses of the |M| closest nodes, useful to maintain locality properties  a leaf set L: the |L| nodes with the closest nodeIds

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Pastry Routing State [Figure: example node state with b=2, so nodeIds are base-4 digit strings (16 bits), showing the leaf set (SMALLER and LARGER halves), the routing table and the neighborhood set.]  Leaf set: the nodes that are numerically closest to the local node  Neighborhood set: the nodes that are closest to the local node according to the proximity metric  Routing table: ceil(log_{2^b} N) rows with 2^b - 1 entries per row  Entries in the n-th row share the first n-1 digits with the current node (common prefix, next digit, rest)  Entries in the m-th column have m as their n-th digit  Entries with no suitable nodeId are left empty

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems 1331 X 1 : | | | X 2 : | | | L: 1232 | 1221 | 1300 | X 0 : | | | source 1221 des t Pastry Routing 1. Search leaf set for exact match 2. Search route table for entry with at one more digit common in the prefix 3. Forward message to node with equally number of digits in prefix, but numerically closer in leaf set

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Pastry Assessment  Scalability, fairness, load balancing  Distributed index of arbitrary size  Support for physical locality and locality by hash value  Stochastically logarithmic search effort  Network topology is partially accounted for, given an additional metric for physical locality  Stochastically logarithmic lookup in large systems, variable lookup costs  Content location  Search by hash key: limited ways to formulate queries  All indexed files are reachable  Not restricted to file location  Failure resilience  No single point of failure  Several possibilities for backup routes

Examples: Tapestry

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Tapestry  Approach taken  Only concerned with self-organizing indexing  Distributed index - decentralized lookup service  Uses DHTs  Content handling is an external problem entirely  No relation to content  No included replication or caching  P2P aspects  Every node must maintain keys  Adaptive to changes in membership and value change

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Routing and Location  Namespace (nodes and objects)  SHA-1 hash: 160 bits long  Each object has its own hierarchy rooted at RootID = hash(ObjectID)  Prefix routing [JSAC 2004]  The router at the h-th hop shares a prefix of length ≥ h digits with the destination  Local tables at each node (neighbor maps)  Route digit by digit: 4***  42**  42A*  42AD  Neighbor links organized in levels
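A sketch of the digit-by-digit prefix routing (4*** to 42** to 42A* to 42AD), assuming each node's neighbor map is a list of levels, each mapping the next destination digit to a neighbor id; surrogate routing for holes in the map is only hinted at.

```python
# Sketch of Plaxton/Tapestry prefix routing over hexadecimal ids.
# neighbor_maps: dict node_id -> list of levels, each a dict next-digit -> neighbor id.
def route(source_id, dest_id, neighbor_maps):
    path = [source_id]
    level = 0
    while path[-1] != dest_id and level < len(dest_id):
        hop = neighbor_maps[path[-1]][level].get(dest_id[level])
        if hop is None:
            break            # hole in the neighbor map: Tapestry would pick a surrogate here
        path.append(hop)     # one more matching prefix digit per hop: 4*** -> 42** -> 42A* -> 42AD
        level += 1
    return path
```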

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Routing and Location  Suffix routing [tech report 2001]  The router at the h-th hop shares a suffix of length ≥ h digits with the destination  Example: 5324 routes to 0629 via 5324  2349  1429  7629  0629  Tapestry routing  Cache pointers to all copies  Caches are soft-state  UDP heartbeat and TCP timeout to verify route availability  Each node has 2 backup neighbors  Failing primary neighbors are kept for some time (days)  Multiple root nodes possible, identified via hash functions  Search for a value in a root if its hash is that of the root  Choosing a root node  Choose a random address  Route towards that address  If no route exists, deterministically choose a surrogate

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Routing and Location  Object location  Root responsible for storing object’s location (but not the object)  Publish / search both routes incrementally to root  Locates objects  Object : key/value pair  E.g. filename/file  Automatic replication of keys  No automatic replication of values  Values  May be replicated  May be stored in erasure-coded fragments

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Tapestry  Location service for mobile objects  Location of and access to mobile objects  Self-organizing, scalable, robust, wide-area infrastructure  Self-administering, fault-tolerant, resilient under load  Point-to-point communication, no centralized resources  Locates objects  Object : key/value pair  E.g. filename/file  Automatic replication of keys  No automatic replication of values  Values  May be replicated  May be stored in erasure-coded fragments

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Tapestry  Routing and directory service  Goal  Find a route to the closest copy  Routing  Forwarding of messages in the overlay network of nodes  Nodes  Act as servers, routers and clients  Routing  Each object has a unique root, identified by its key  The hash value of the key is the source-route prefix to the object’s root  The root answers with the address of the value’s location  Routers cache the response

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Tapestry [Figure: Insert(key K, value V): V is stored at a node; the pointer (#K, node address) is routed toward K's root.]

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Tapestry [Figure: a query ?K travels toward the root; the (#K, address) result is returned and cached by routers along the way.]

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Tapestry [Figure: the cached (#K, address) pointer resolves subsequent lookups for K.]

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Tapestry [Figure: Move(key K, value V): V is moved to a different node.]

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Tapestry [Figure: stale (#K, address) pointers on the old path stay wrong until they time out.]

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Mobile Tapestry  Host mobility  Map the key to a logical node name first  Map the logical name to an address second [Figure: a query ?K resolves to (#K, node), and a second lookup ?node resolves to (#node, address).]

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Tapestry Assessment  Scalability, fairness, load balancing  Distributed index(es) of arbitrary size  Limited physical locality of key access by caching and nodeId selection  Variable lookup costs  Independent of content scalability  Content location  Search by hash key: limited ways to formulate queries  All indexed files are reachable  Not restricted to file location  Failure resilience  No single point of failure  Several possibilities for backup routes  Caching of key resolutions  Use of hash values with several salt values

Comparison

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Comparison
  Routing information:  Chord: log(#nodes) routing table size | Pastry: log(#nodes) x (2^b - 1) routing table size | Tapestry: at least log(#nodes) routing table size
  Lookup cost:          Chord: log(#nodes) | Pastry: approx. log(#nodes) | Tapestry: variable
  Physical locality:    Pastry: by neighbor list | Tapestry: in mobile Tapestry
  Failure resilience:   Chord: no resilience in the basic version, additional successor lists provide resilience | Pastry: no single point of failure, several backup routes | Tapestry: no single point of failure, several backup routes, alternative hierarchies

Applications

CFS – Cooperative File System

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems CFS – Cooperative/Chord File System  Design challenges  Distribute files in a peer-to-peer manner  Load balancing: spread the storage burden evenly  Fault tolerance: tolerate unreliable participants  Speed: comparable to whole-file FTP  Scalability: avoid O(#participants) algorithms  CFS approach based on Chord  Blocks of files are indexed using Chord  The root block is found by file name and refers to the first directory block  Directory blocks contain hashes of blocks [Figure: a signed root block (with the publisher's public key) points via H(D) to a directory block D, which points via H(F) to an inode block F, which points via H(B1) and H(B2) to data blocks B1 and B2.]
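The block layout can be sketched as content-hash storage on top of a key/value store (a plain dict stands in for the Chord-indexed DHT here); the block size and flat inode format are illustrative choices, and the signed root block is omitted.

```python
import hashlib

# Sketch of CFS-style content-hash blocks over a key/value store (a dict stands in for the DHT).
def store_file(dht: dict, data: bytes, block_size: int = 8192) -> str:
    """Split data into blocks, store each under its content hash, and store an
    inode block listing the block hashes; return the inode block's hash."""
    block_hashes = []
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        h = hashlib.sha1(block).hexdigest()
        dht[h] = block
        block_hashes.append(h)
    inode = "\n".join(block_hashes).encode()
    inode_hash = hashlib.sha1(inode).hexdigest()
    dht[inode_hash] = inode          # a directory/root block would reference this hash
    return inode_hash

def fetch_file(dht: dict, inode_hash: str) -> bytes:
    hashes = dht[inode_hash].decode().splitlines()
    return b"".join(dht[h] for h in hashes)

dht = {}
h = store_file(dht, b"x" * 20000)
assert fetch_file(dht, h) == b"x" * 20000
```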

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems CFS Replication Mechanism  Replicate blocks at the r successors of the block's home node  Increases availability (fault tolerance)  Increases efficiency (server selection)  The predecessor of the data block returns its successors and their latencies  The client can choose the successor with the smallest latency  Ensures independent replica failure [Figure: Block 17 replicated at the successors of its home node on the Chord ring (N5, N10, N20, N40, N50, N60, N80, N99, N110).]

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems CFS Cache Mechanism  CFS copies blocks to caches along the lookup path [Figure: Lookup(BlockID=45) on the Chord ring; RPCs: 1. Chord lookup, 2. Chord lookup, 3. block fetch, 4. send to cache.]

Streaming in Peer-to-peer networks

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Peer-to-peer network

Promise

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Promise  Video streaming in Peer-to-Peer systems  Video segmentation into many small segments  Pull operation  Pull from several sources at once  Based on Pastry and CollectCast  CollectCast  Adds rate/data assignment  Evaluates  Node capabilities  Overlay route capabilities  Uses topology inference  Detects shared path segments - using ICMP similar to traceroute  Tries to avoid shared path segments  Labels segments with quality (or goodness)

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Promise [Hefeeda et al. 03] [Figure: one receiver served by several active senders, with further standby senders available.]  The receiver: sends a lookup request using the DHT; selects some active senders and sends them a control packet; receives data as long as no errors/changes occur; if a change/error is detected, new active senders may be selected  Each active sender: receives a control packet specifying which data segments, data rate, etc.; pushes data to the receiver as long as no new control packet is received  Thus, Promise is a multiple-sender to one-receiver P2P media streaming system which 1) accounts for different capabilities, 2) matches senders to achieve the best quality, and 3) dynamically adapts to network fluctuations and peer failure

SplitStream

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems SplitStream  Video streaming in Peer-to-Peer systems  Uses layered video  Uses overlay multicast  Push operation  Build disjoint overlay multicast trees  Based on Pastry

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems SplitStream [Castro et al. 03] [Figure: the source holds the full-quality movie; stripe 1 and stripe 2 are distributed over separate multicast trees.]  Each movie is split into K stripes and each stripe is multicast over a separate tree  Each node: joins as many multicast trees as there are stripes (K); may specify the number of stripes it is willing to act as a router for, i.e., according to the amount of resources available  Thus, SplitStream is a multiple-sender to multiple-receiver P2P system which distributes the forwarding load while respecting each node’s resource limitations, but some effort is required to build the forest of multicast trees
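Striping itself is simple to sketch; below, packets are assigned to K stripes round-robin, and a peer that joins only some of the trees receives a proportionally reduced stream. Tree construction over Pastry/Scribe, which is where the real work lies, is not shown.

```python
# Sketch of SplitStream-style striping (tree construction over Pastry/Scribe omitted).
K = 4                                   # number of stripes, one multicast tree per stripe

def stripe_of(packet_seq: int) -> int:
    """Round-robin assignment of packets to stripes."""
    return packet_seq % K

def received(packets, joined_stripes):
    """A peer only gets the packets pushed down the trees it has joined."""
    return [p for p in packets if stripe_of(p) in joined_stripes]

print(received(range(12), {0, 1, 2, 3}))   # full quality: every packet
print(received(range(12), {0, 1}))         # half of the stripes: reduced quality
```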

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Some References
  1. M. Castro, P. Druschel, A.-M. Kermarrec, A. Nandi, A. Rowstron and A. Singh, "SplitStream: High-bandwidth multicast in a cooperative environment", SOSP'03, Bolton Landing, New York, October 2003
  2. Mohamed Hefeeda, Ahsan Habib, Boyan Botev, Dongyan Xu, Bharat Bhargava, "Promise: Peer-to-Peer Media Streaming Using CollectCast", ACM MM'03, Berkeley, CA, November 2003
  3. Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek and Hari Balakrishnan, "Chord: A scalable peer-to-peer lookup service for internet applications", ACM SIGCOMM'01, 2001
  4. Ben Y. Zhao, John Kubiatowicz and Anthony Joseph, "Tapestry: An Infrastructure for Fault-tolerant Wide-area Location and Routing", UCB Technical Report CSD, 2001
  5. John Kubiatowicz, "Extracting Guarantees from Chaos", Comm. ACM, 46(2), February 2003
  6. Antony Rowstron and Peter Druschel, "Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems", Middleware'01, November 2001

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Vikash Agarwal, Reza Rejaie Computer and Information Science Department University of Oregon January 19, 2005 Adaptive Multi-Source Streaming in Heterogeneous Peer-to-Peer Networks

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Introduction  P2P streaming becomes increasingly popular  Participating peers form an overlay to cooperatively stream content among themselves  The overlay-based approach is the only way to efficiently support multi-party streaming apps without multicast  Two components:  Overlay construction  Content delivery  Each peer desires to receive the max. quality that can be streamed through its access link  Peers have asymmetric & heterogeneous BW connectivity  Each peer should receive content from multiple parent peers => multi-source streaming  Multi-parent overlay structure rather than a tree

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Benefits of Multi-source Streaming  Higher bandwidth to each peer  higher delivered quality  Better load balancing among peers  Less congestion across the network  More robust to dynamics of peer participation  Multi-source streaming introduces new challenges …

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Multi-source streaming: Challenges  Congestion controlled connections from different parent peers exhibit  independent variations in BW  different RTT, BW, loss rate  Aggregate bandwidth changes over time  Streaming mechanism should be quality adaptive  Static “one-layer-per-sender” approach is inefficient  There must be a coordination mechanism among senders in order to  Efficiently utilize aggregate bandwidth  Gracefully adapt delivered quality with BW variations This paper presents a receiver-driven coordination mechanism for multi-source streaming called PALS

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Previous Studies  Congestion control was often ignored  Server/content placement for streaming MD content [Apostolopoulos et al.]  Resource management for P2P streaming [Cui et al.]  Multi-sender streaming [Nguyen et al.], but they assumed that the aggregate BW is more than the stream BW  RLM is receiver-driven, but RLM tightly couples coarse quality adaptation with CC, whereas PALS only determines how the aggregate BW is used  P2P content distribution mechanisms cannot accommodate “streaming” apps, e.g. BitTorrent, Bullet

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Overall Architecture  Overall architecture for P2P streaming  PRO: Bandwidth-aware overlay construction  Identifying good parents in the overlay  PALS: Multi-source adaptive streaming  Streaming content from selected parents  Distributed multimedia caching  Decoupling overlay construction from delivery provides great deal of flexibility  PALS is a generic multi-source streaming protocol for non-interactive applications

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Assumptions & Goals Assumptions:  All peers/flows are cong. controlled  Content is layered encoded  All layers are CBR with the same cons. rate*  All senders have all layers (relax this later)*  Limited window of future packets are available at each sender  Live but non-interactive * Not requirements Goals:  To fully utilize aggregate bandwidth to dynamically maximize delivered quality  Deliver max no of layers  Minimize variations in quality

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems P2P Adaptive Layered Streaming (PALS)  Receiver: periodically requests an ordered list of packets/segments from each sender.  Sender: simply delivers requested packets with the given order at the CC rate  Benefits of ordering the requested list:  Provide flexibility for the receiver to closely control delivered packets  Graceful degradation in quality when bandwidth suddenly drops  Periodic requests => stability & less overhead

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Basic Framework [Figure: sender peers 0, 1 and 2 each deliver over a congestion-controlled connection with rate bw_i(t) into receiver-side buffers; a demultiplexer feeds the decoder.]  The receiver passively monitors the EWMA BW from each sender, computes the EWMA aggregate BW, and estimates the total number of packets (K) to be delivered during the next window  Allocate the K packets among active layers (quality adaptation): controlling bw0(t), bw1(t), ...; controlling the evolution of the buffer state  Assign a subset of the packets to each sender (packet assignment): allocating each sender’s BW among active layers
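The first step of that loop, estimating a packet budget for the next window, might look like the sketch below; the smoothing factor, packet size and window length are assumed parameters, not values from the paper.

```python
# Sketch of the receiver-side rate estimate in PALS-style multi-source streaming.
ALPHA = 0.25          # EWMA smoothing factor (assumed value)
PKT_SIZE = 1000       # payload bytes per packet (assumed value)

def update_ewma(smoothed_bw, measured_bw, alpha=ALPHA):
    """Exponentially weighted moving average of one sender's delivery rate (bytes/s)."""
    return (1 - alpha) * smoothed_bw + alpha * measured_bw

def packet_budget(per_sender_bw, window_sec):
    """per_sender_bw: dict sender -> smoothed rate in bytes/s.
    Returns K, the number of packets that can be requested for the next window."""
    aggregate = sum(per_sender_bw.values())
    return int(aggregate * window_sec / PKT_SIZE)

bw = {"peer0": 40_000.0, "peer1": 25_000.0, "peer2": 15_000.0}
bw["peer0"] = update_ewma(bw["peer0"], 50_000.0)
print(packet_budget(bw, window_sec=2.0))   # packets to spread over layers and senders
```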

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Key Components of PALS  Sliding Window (SW): to keep all senders busy & loosely synchronized with receiver playout time  Quality adaptation (QA): to determine quality of delivered stream, i.e. required packets for all layers during one window  Packet Assignment (PA): to properly distribute required packets among senders

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Sliding Window  Buffering window: range of timestamps for packets that must be requested in one window  The window is slid forward in a step-like fashion  Requested packets per window can be from 1) the playing window (loss recovery), 2) the buffering window (main group), 3) future windows (buffering)

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Sliding Window (cont’d)  Window size determines the tradeoff between smoothness and signaling overhead on one side and responsiveness on the other  Should be a function of RTT since RTT specifies the timescale of variations in BW  A multiple of the max smoothed RTT among senders  The receiver might receive duplicates  Re-requesting a packet that is still in flight!  The ratio of duplicates is very low and can be reduced by increasing the window

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Coping with BW variations  Sliding window is insufficient  Coping with sudden drop in BW by  Overwriting request at senders  Ordering requested packets  Coping with sudden increase in BW by  Requesting extra packets

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Quality Adaptation  Determining the required packets from future windows  Coarse-grained adaptation: add/drop a layer  Fine-grained adaptation: controlling bw0(t), bw1(t), ...; loosely controlling the evolution of the receiver buffer state/distribution  What is a proper buffer distribution?  The buffer distribution determines what degree of BW variations can be smoothed

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Buffer Distribution  Impact on delivered quality  Conservative buf. distribution achieves long-term smoothing  Aggressive buf. distribution achieves short-term improvement  PALS leverages this tradeoff in a balanced fashion  Window size affects buffering:  Amount of future buffering  Slope of buffer distribution  Multiple opportunities to request a packet (see paper)  Implicit loss recovery

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Packet Assignment  How to assign an ordered list of selected pkts from diff. layers to individual senders?  Number of assigned pkts to each sender must be proportional to its BW contribution  More important pkts should be delivered  Weighted round robin pkt assignment strategy  Extended this strategy to support partially available content at each peer  Please see paper for further details
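A sketch of a weighted round-robin assignment in that spirit: walk the ordered request list (most important packets first) and hand each packet to the sender currently most "owed" one according to its bandwidth share. This is an illustration of the idea, not the exact PALS algorithm, and the extension to partially available content is omitted.

```python
# Sketch of weighted round-robin packet assignment across senders.
def assign_packets(ordered_packets, sender_bw):
    """ordered_packets: request list sorted by importance (base layer and earliest
    deadlines first). sender_bw: dict sender -> smoothed bandwidth estimate."""
    total = sum(sender_bw.values())
    assignment = {s: [] for s in sender_bw}
    credit = {s: 0.0 for s in sender_bw}       # how many packets each sender is "owed"
    for pkt in ordered_packets:
        for s in sender_bw:
            credit[s] += sender_bw[s] / total  # accrue credit in proportion to bandwidth
        target = max(credit, key=credit.get)
        assignment[target].append(pkt)
        credit[target] -= 1.0
    return assignment

print(assign_packets(list(range(6)), {"peer0": 2.0, "peer1": 1.0}))
# peer0 receives roughly two thirds of the ordered list, peer1 one third
```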

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Performance Evaluation  Using ns simulation to control BW dynamics  Focused on three key dynamics in P2P systems: BW variations, Peer participation, Content availability  Senders with heterogeneous RTT & BW  Decouple underlying CC mechanism from PALS  Performance Metrics: BW Utilization, Delivered Quality  Two strawman mechanisms with static layer assignment to each sender:  Single Layer per Sender (SLS): Sender i delivers layer i  Multiple Layer per Sender (MLS): Sender i delivers layer j<i

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Necessity of Coordination  SLS & MLS exhibit high variations in quality  No explicit loss recovery  No coordination  Inter-layer dependency magnifies the problem  PALS effectively utilizes aggregate BW & delivers stable quality in all cases

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Delay-Window Tradeoff  Avg. delivered quality only depends on agg. BW Heterogeneous senders  Higher Delay => smoother quality  Duplicates exponentially decrease with window size  Avg. per-layer buffering linearly increases with Delay  Increasing window leads to even buffer dist.  See paper for more results.

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Conclusion & Future Work  PALS is a receiver-driven coordination mechanism for streaming from multiple cong. controlled senders.  Simulation results are very promising  Future work:  Further simulation to examine further details  Prototype implementation for real experiments  Integration with other components of our architecture for P2P streaming

2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Partially available content  Effect of segment size and redundancy


2006 Carsten Griwodz & Pål Halvorsen INF5071 – performance in distributed systems Packet Dynamics