CPE 401/601 Computer Network Systems
Peer-to-Peer (P2P)
Slides are modified from Jennifer Rexford

Outline
- Scalability in distributing a large file: single server and N clients; peer-to-peer system with N peers
- Searching for the right peer: central directory (Napster), query flooding (Gnutella), hierarchical overlay (Kazaa), optimized architecture (Chord)
- BitTorrent: transferring large files, preventing free-riding

Clients and Servers
- Client program: runs on an end host, requests service; e.g., a Web browser
- Server program: runs on an end host, provides service; e.g., a Web server
(Figure: the browser sends GET /index.html; the server replies with a “Site under construction” page)

Client-Server Communication
- Client is “sometimes on”: initiates a request to the server when interested (e.g., Web browser on your laptop or cell phone); doesn’t communicate directly with other clients; needs to know the server’s address
- Server is “always on”: services requests from many client hosts (e.g., Web server for the www.unr.edu Web site); doesn’t initiate contact with the clients; needs a fixed, well-known address

Server Distributing a Large File
(Figure: a server with upload rate u_s pushes a file of F bits across the Internet to receivers with download rates d_1 … d_4)

Server Distributing a Large File
- Server sending a large file to N receivers: file of F bits, single server with upload rate u_s, download rate d_i for receiver i
- Server transmission to N receivers: the server must transmit NF bits, taking at least NF/u_s time
- Receiving the data: the slowest receiver receives at rate d_min = min_i{d_i}, taking at least F/d_min time
- Download time: max{NF/u_s, F/d_min}

Speeding Up the File Distribution
- Increase the upload rate from the server: higher link bandwidth at the one server, or multiple servers, each with their own link; requires deploying more infrastructure
- Alternative: have the receivers help; receivers get a copy of the data and then redistribute it to other receivers, reducing the burden on the server

Peers Help Distribute a Large File
(Figure: the server with upload rate u_s and peers with upload rates u_i and download rates d_i exchange the F-bit file across the Internet)

Peers Help Distribute a Large File
- Start with a single copy of the large file: file of F bits, server upload rate u_s, peer i with download rate d_i and upload rate u_i
- Two components of distribution latency: the server must send each bit at least once (min time F/u_s); the slowest peer must receive each bit (min time F/d_min)
- Total upload time using all upload resources: NF bits in total over aggregate upload bandwidth u_s + sum_i(u_i)
- Total: max{F/u_s, F/d_min, NF/(u_s + sum_i(u_i))}

Comparing the Two Models
- Download time: client-server max{NF/u_s, F/d_min}; peer-to-peer max{F/u_s, F/d_min, NF/(u_s + sum_i(u_i))}
- Peer-to-peer is self-scaling: much lower demands on server bandwidth, and distribution time grows only slowly with N
- But… peers may come and go, peers need to find each other, and peers need to be willing to help each other
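To make the two bounds concrete, here is a minimal Python sketch that evaluates both formulas; all rates and sizes are hypothetical, chosen only for illustration.

```python
# A minimal sketch comparing the two lower bounds on distribution time.
# All rates and sizes below are hypothetical, chosen only for illustration.

def client_server_time(N, F, u_s, d):
    """max{N*F/u_s, F/d_min}: the server uploads N full copies itself."""
    return max(N * F / u_s, F / min(d))

def peer_to_peer_time(N, F, u_s, d, u):
    """max{F/u_s, F/d_min, N*F/(u_s + sum_i u_i)}: peers re-upload data."""
    return max(F / u_s, F / min(d), N * F / (u_s + sum(u)))

N = 1000              # number of peers
F = 8e9               # 1 GB file, in bits
u_s = 100e6           # server upload rate: 100 Mbit/s
d = [20e6] * N        # each peer downloads at 20 Mbit/s
u = [5e6] * N         # each peer uploads at 5 Mbit/s

print(client_server_time(N, F, u_s, d))    # 80000.0 s: grows linearly with N
print(peer_to_peer_time(N, F, u_s, d, u))  # ~1569 s: aggregate upload capacity helps
```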

Challenges of Peer-to-Peer
- Peers come and go: peers are intermittently connected, may come and go at any time, or come back with a different IP address
- How to locate the relevant peers? Peers that are online right now, and that have the content you want
- How to motivate peers to stay in the system? Why not leave as soon as the download ends? Why bother uploading content to anyone else?

Locating the Relevant Peers
- Three main approaches: central directory (Napster), query flooding (Gnutella), hierarchical overlay (Kazaa, modern Gnutella)
- Design goals: scalability, simplicity, robustness, plausible deniability

Peer-to-Peer Networks: Napster
- Napster history, the rise: January 1999, Napster version 1.0; May 1999, company founded; December 1999, first lawsuits; 2000, 80 million users
- Napster history, the fall: mid 2001, out of business due to lawsuits; mid 2001, dozens of P2P alternatives that were harder to touch, though these have gradually been constrained; 2003, growth of pay services like iTunes
- Napster history, the resurrection: 2003, Napster name/logo reconstituted as a pay service

Napster Technology: Directory Service
- User installs the software: downloads the client program; registers name, password, local directory, etc.
- Client contacts Napster (via TCP): provides a list of music files it will share, and Napster’s central server updates the directory
- Client searches on a title or performer: Napster identifies online clients with the file and provides their IP addresses
- Client requests the file from the chosen supplier: the supplier transmits the file to the client; both client and supplier report status to Napster
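To illustrate the register/search flow above, here is a minimal sketch of a Napster-style central directory; the class and method names are hypothetical, since the real protocol was proprietary.

```python
# A minimal sketch of a central directory (hypothetical names): peers register
# the titles they share, and a search returns the online suppliers of a title.

class Directory:
    def __init__(self):
        self.files = {}                      # title -> set of (ip, port) suppliers

    def register(self, peer, titles):        # client logs in and lists its files
        for t in titles:
            self.files.setdefault(t, set()).add(peer)

    def unregister(self, peer):              # client leaves: drop all its entries
        for suppliers in self.files.values():
            suppliers.discard(peer)

    def search(self, title):                 # who is sharing this title right now?
        return list(self.files.get(title, ()))

d = Directory()
d.register(("10.0.0.5", 6699), ["song.mp3"])
print(d.search("song.mp3"))   # [('10.0.0.5', 6699)]; the client then fetches directly
```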

Napster Technology: Properties
- Server’s directory is continually updated: it always knows what music is currently available, but it is a point of vulnerability for legal action
- Peer-to-peer file transfer: no load on the server; plausible deniability for legal action, but not enough
- Proprietary protocol: login, search, upload, download, and status operations; no security, with cleartext passwords and other vulnerabilities
- Bandwidth issues: suppliers ranked by apparent bandwidth and response time

Napster: Limitations of Central Directory
- File transfer is decentralized, but locating content is highly centralized
- Single point of failure, performance bottleneck, and an easy target for copyright-infringement action
- So, later P2P systems were more distributed; Gnutella went to the other extreme…

Peer-to-Peer Networks: Gnutella
- Gnutella history: 2000, J. Frankel & T. Pepper released Gnutella; soon after, many other clients (e.g., Morpheus, Limewire, Bearshare); 2001, protocol enhancements, e.g., “ultrapeers”
- Query flooding: Join, contact a few nodes to become neighbors; Publish, no need!; Search, ask neighbors, who ask their neighbors; Fetch, get the file directly from another node

Gnutella: Query Flooding
- Fully distributed: no central server
- Public-domain protocol, with many Gnutella clients implementing it
- Overlay network as a graph: an edge exists between peers X and Y if they have a TCP connection; all active peers and edges form the overlay network
- A given peer is typically connected to fewer than 10 overlay neighbors

Gnutella: Protocol
- Query messages are sent over existing TCP connections; peers forward Query messages
- A QueryHit is sent back over the reverse path; the file transfer itself uses HTTP
- Scalability: limited-scope flooding
(Figure: Query messages fan out across the overlay; QueryHits travel back along the reverse path)

Gnutella: Peer Joining
- Joining peer X must find some other peers: start with a list of candidate peers
- X sequentially attempts TCP connections with peers on the list until a connection is set up with some peer Y
- X sends a Ping message to Y; Y forwards the Ping; all peers receiving the Ping respond with a Pong message
- X receives many Pong messages and can then set up additional TCP connections
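As an illustration of flooding search, here is a minimal in-memory simulation; the overlay graph, filenames, and TTL are hypothetical, and real Gnutella exchanges Query/QueryHit messages over TCP rather than making function calls.

```python
# A minimal sketch of TTL-limited query flooding over an overlay graph.
# Simulated in memory; real Gnutella sends Query messages over TCP links.

def flood_query(overlay, files, start, keyword, ttl=4):
    """overlay: node -> list of neighbors; files: node -> set of filenames.
    Returns the nodes whose files match, mimicking QueryHit replies."""
    hits, seen = [], {start}
    frontier = [start]
    while frontier and ttl > 0:
        nxt = []
        for node in frontier:
            for nbr in overlay[node]:
                if nbr in seen:            # duplicate suppression
                    continue
                seen.add(nbr)
                if any(keyword in f for f in files.get(nbr, ())):
                    hits.append(nbr)       # QueryHit travels back on reverse path
                nxt.append(nbr)
        frontier = nxt
        ttl -= 1                           # limited-scope flooding
    return hits

overlay = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
files = {"D": {"cat_video.avi"}}
print(flood_query(overlay, files, "A", "cat"))   # ['D']
```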

Gnutella: Pros and Cons
- Advantages: fully decentralized, no server; search cost is distributed (queries flooded out, TTL-restricted)
- Per-node processing permits powerful search semantics
- Periodic Ping/Pong traffic continuously refreshes neighbor lists

Gnutella: Pros and Cons
- Disadvantages: search scope may be quite large, and search time may be quite long
- Flooding causes excessive traffic: Ping/Pong constituted 50% of traffic, and the same keywords are searched repeatedly
- High overhead, while nodes come and go often
- 70% of users in 2000 were freeloaders

Peer-to-Peer Networks: KaZaA
- KaZaA history: 2001, created by a Dutch company (Kazaa BV); its single network, FastTrack, is used by other clients as well
- Smart query flooding
- Join: on start, the client contacts a super-node (and may later become one)
- Publish: the client sends its list of files to its super-node
- Search: send the query to the super-node; super-nodes flood queries among themselves
- Fetch: get the file directly from peer(s); can fetch from multiple peers at once

KaZaA: Exploiting Heterogeneity
- Each peer is either a group leader or assigned to a group leader
- TCP connection between a peer and its group leader; TCP connections between some pairs of group leaders
- A group leader tracks the content of all its children

KaZaA: Motivation for Super-Nodes
- Query consolidation: many connected nodes may have only a few files, and propagating a query to a sub-node may take more time than for the super-node to answer it itself
- Stability: super-node selection favors nodes with high up-time; how long you’ve been on is a good predictor of how long you’ll be around in the future

Chord
- Developers: I. Stoica, D. Karger, F. Kaashoek, H. Balakrishnan, R. Morris (MIT and Berkeley)
- Intelligent choice of neighbors to reduce the latency and message cost of routing (lookups/inserts)
- Uses consistent hashing on a node’s (peer’s) address: SHA-1(ip_address, port) gives a 160-bit string, truncated to m bits
- The result is called the peer id (a number between 0 and 2^m − 1); ids are not guaranteed unique, but conflicts are very unlikely
- Peers can then be mapped to one of 2^m logical points on a circle
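A minimal sketch of this id assignment, with m = 7 to match the ring examples that follow; the address string is hypothetical.

```python
# A minimal sketch of Chord-style id assignment: SHA-1 of the peer's address,
# truncated to the low-order m bits by reducing mod 2**m.

import hashlib

M = 7                                                # id space is [0, 2**M)

def chord_id(address, m=M):
    digest = hashlib.sha1(address.encode()).digest() # 160-bit string
    return int.from_bytes(digest, "big") % (2 ** m)  # keep m bits: the peer id

print(chord_id("10.0.0.1:8000"))   # an id in 0..127; conflicts unlikely for small N
```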

Ring of peers
(Figure: m = 7; six nodes, N16, N32, N45, N80, N96, N112, placed on the id circle)

Peer pointers: successors
(Figure: m = 7; each node points to the next node on the ring; similarly for predecessors)

Peer pointers: finger tables
(Figure: the i-th finger table entry at peer n points to the first peer whose id is ≥ n + 2^i (mod 2^m), for i = 0, …, m − 1)

What about the files?
- Filenames are mapped using the same consistent hash function: SHA-1(filename) gives a 160-bit string, the key
- A file is stored at the first peer with id greater than or equal to its key (mod 2^m)
- The file cnn.com/index.html that maps to key K42 is stored at the first peer with id greater than or equal to 42
- Note that we are considering a different file-sharing application here, cooperative web caching; the same discussion applies to any other file-sharing application, including that of mp3 files
- Consistent hashing => with K keys and N peers, each peer stores O(K/N) keys (i.e., < c·K/N, for some constant c)
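A minimal sketch of this placement rule on the example ring used in the figures; the helper simply finds the first peer id at or after the key, wrapping around (an illustration, not Chord's actual node code).

```python
# A minimal sketch of key placement: a key is stored at its successor, the
# first peer id >= key on the ring, wrapping around past the largest id.

import bisect

def successor(peer_ids, key):
    """peer_ids: sorted list of ids on the ring; returns the peer storing key."""
    i = bisect.bisect_left(peer_ids, key)
    return peer_ids[i % len(peer_ids)]   # wrap past the largest id to the smallest

peers = sorted([16, 32, 45, 80, 96, 112])   # the N16..N112 ring from the figures
print(successor(peers, 42))    # 45: key K42 is stored at N45
print(successor(peers, 120))   # 16: wraps around the ring
```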

Mapping Files
(Figure: m = 7; the file with key K42 is stored at N45, the first node with id ≥ 42)

Search
(Figure: m = 7; a node asks “Who has cnn.com/index.html?”, which hashes to K42; the file is stored at N45)

Search
- At node n, send the query for key k to the largest successor/finger entry ≤ k, i.e., at or anti-clockwise of k (it wraps around the ring)
- If no such entry exists, send the query to successor(n)
(Figure: m = 7; the query “Who has cnn.com/index.html?” (hashes to K42) is routed around the ring to N45, where the file is stored)

Search
- Same rule: at node n, send the query for key k to the largest successor/finger entry ≤ k; if none exists, send the query to successor(n)
- All “arrows” in the figure are RPCs (remote procedure calls)
(Figure: m = 7; the query for K42 follows finger-table hops around the ring to N45)
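A minimal simulation of this routing rule on the same example ring; finger tables are derived on the fly from the global peer list, whereas real Chord nodes keep them as local state and issue RPCs, so treat this as an illustration rather than the paper's algorithm.

```python
# A minimal sketch of the routing rule above (in-memory, not RPCs):
# at each hop, jump to the largest finger that does not overshoot key k.

import bisect

M = 7
PEERS = sorted([16, 32, 45, 80, 96, 112])

def successor(key):
    """First peer with id >= key, wrapping around the ring."""
    i = bisect.bisect_left(PEERS, key % 2**M)
    return PEERS[i % len(PEERS)]

def in_interval(x, lo, hi):
    """x lies in the half-open ring interval (lo, hi], mod 2**M."""
    return x != lo and (x - lo) % 2**M <= (hi - lo) % 2**M

def lookup(n, k):
    """Route a query for key k starting at node n; return the hop path."""
    path = [n]
    while not in_interval(k, n, successor(n + 1)):   # successor not yet responsible
        fingers = {successor(n + 2**i) for i in range(M)}
        # largest finger that stays within (n, k], i.e. does not overshoot k
        hops = [f for f in fingers if f != n and in_interval(f, n, k)]
        n = max(hops, key=lambda f: (f - n) % 2**M) if hops else successor(n + 1)
        path.append(n)
    path.append(successor(n + 1))                    # this node stores k
    return path

print(lookup(80, 42))   # [80, 16, 32, 45]: the query reaches N45 in a few hops
```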

Comparative Performance (memory / lookup latency / #messages for a lookup)
- Napster: O(1) at a client, O(N) at the central server
- Gnutella: O(N)
- Kazaa: O(N'), where a super-node tracks N' children
- Chord: O(log(N))

Need to deal with dynamic changes
- Peers fail, new peers join, peers leave
- P2P systems have a high rate of churn (node join, leave, and failure): 25% per hour in Overnet (eDonkey), 100% per hour in Gnutella; lower in managed clusters
- Churn is a common feature of all distributed systems, including wide-area systems (e.g., PlanetLab), clusters (e.g., Emulab), clouds (e.g., AWS), etc.
- So, all the time, the system needs to update successors and fingers, and copy keys

New peers joining/leaving
- A new peer affects O(log(N)) other finger entries in the system, on average, so the number of messages per peer join is O(log(N) * log(N))
- A similar set of operations handles peers leaving
- For dealing with failures, we also need failure detectors (we’ll see these later in the course!)

Churn
- When nodes are constantly joining, leaving, and failing, the effect is significant: traces from the Overnet system show hourly peer turnover rates (churn) of 25-100% of the total number of nodes in the system
- Churn leads to excessive (unnecessary) key copying (remember that keys are replicated), and the stabilization algorithm may need to consume more bandwidth to keep up
- The main issue is that files are replicated, while it might be sufficient to replicate only meta-information about files
- Alternatives: introduce a level of indirection (any p2p system), or replicate metadata more, e.g., Kelips

Peer-to-Peer Networks: BitTorrent
- BitTorrent history and motivation: 2002, B. Cohen debuted BitTorrent
- Key motivation, popular content: popularity exhibits temporal locality (flash crowds), e.g., the Slashdot/Digg effect or the release of a new movie or game
- Focused on efficient fetching, not searching: distribute the same file to many peers; single publisher, many downloaders
- Prevents free-loading

BitTorrent: Simultaneous Downloading
- Divide the large file into many pieces: replicate different pieces on different peers; a peer with a complete piece can trade with other peers; a peer can (hopefully) assemble the entire file
- Allows simultaneous downloading: retrieving different parts of the file from different peers at the same time, while uploading parts of the file to other peers; important for very large files

BitTorrent: Tracker
- Infrastructure node that keeps track of the peers participating in the torrent
- Peers register with the tracker: a peer registers when it arrives and periodically informs the tracker it is still there
- The tracker selects peers for downloading: it returns a random set of peers, including their IP addresses, so the new peer knows who to contact for data
- Can have a “trackerless” system using a DHT
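A minimal sketch of the tracker's bookkeeping; the class shape, timeout, and requested peer-set size are hypothetical and do not follow BitTorrent's actual wire protocol.

```python
# A minimal sketch of a tracker: peers announce periodically (heartbeats),
# stale peers are dropped, and each announce returns a random set of peers.

import random, time

class Tracker:
    TIMEOUT = 30 * 60                    # drop peers silent for 30 minutes

    def __init__(self):
        self.peers = {}                  # (ip, port) -> time of last announce

    def announce(self, peer, want=50):
        now = time.time()
        self.peers[peer] = now           # register on arrival, refresh on heartbeat
        alive = [p for p, t in self.peers.items()
                 if now - t < self.TIMEOUT and p != peer]
        return random.sample(alive, min(want, len(alive)))   # random peer subset

t = Tracker()
t.announce(("10.0.0.1", 6881))
print(t.announce(("10.0.0.2", 6881)))    # [('10.0.0.1', 6881)]
```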

BitTorrent: Chunks
- Large file divided into smaller, fixed-size chunks, with a typical chunk size of 256 Kbytes
- Allows simultaneous transfers: downloading chunks from different neighbors while uploading chunks to other neighbors
- Learning what chunks your neighbors have: periodically ask them for a list
- The file is done when all chunks are downloaded

BitTorrent
(Figure: a website links to the .torrent file; the per-file tracker keeps track of some peers and receives heartbeats, joins, and leaves from them; a new leecher (1) gets the tracker, (2) gets peers from it, and (3) gets file blocks from seeds, which have the full file, and from leechers, which have some blocks)

BitTorrent: Overall Architecture
(Figure, animation: a web page links to the .torrent file on a web server; the downloader “US” (leech) sends get-announce to the tracker, receives the response-peer list, shakes hands with peers A, B, C (including a seed), and then exchanges pieces with them)

BitTorrent: Chunk Request Order
- Which chunks to request? Could download in order, like an HTTP client does
- Problem: many peers have the early chunks, so peers have little to share with each other, limiting the scalability of the system
- Problem: eventually nobody has the rare chunks, e.g., the chunks near the end of the file, limiting the ability to complete a download
- Solutions: random selection and rarest-first

BitTorrent: Rarest Chunk First
- Which chunks to request first? The chunk with the fewest available copies, i.e., the rarest chunk first
- Benefits to the peer: avoid starvation when some peers depart
- Benefits to the system: avoid starvation across all peers wanting a file, and balance load by equalizing the number of copies of each chunk
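A minimal sketch of rarest-first selection with random tie-breaking among equally rare chunks; the data structures are hypothetical.

```python
# A minimal sketch of rarest-first: count copies of each still-needed chunk
# among our neighbors and request one of the scarcest, breaking ties randomly.

from collections import Counter
import random

def pick_chunk(needed, neighbor_chunks):
    """needed: set of chunk ids we lack; neighbor_chunks: peer -> set of chunk ids."""
    counts = Counter(c for chunks in neighbor_chunks.values()
                     for c in chunks if c in needed)
    if not counts:
        return None                      # nobody we know has a chunk we need
    rarest = min(counts.values())
    return random.choice([c for c, n in counts.items() if n == rarest])

neighbors = {"A": {0, 1}, "B": {0, 1, 2}, "C": {1}}
print(pick_chunk({0, 1, 2}, neighbors))  # 2: only one copy in the neighborhood
```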

Free-Riding Problem in P2P Networks
- The vast majority of users are free-riders: most share no files and answer no queries, and others limit their number of connections or upload speed
- A few “peers” essentially act as servers: a few individuals contribute to the public good, making them hubs
- BitTorrent prevents free-riding: allow the fastest peers to download from you, and occasionally let some free-loaders download

BitTorrent: Preventing Free-Riding
- A peer has limited upload bandwidth and must share it among multiple peers
- Prioritizing the upload bandwidth, tit for tat: favor the neighbors that are uploading at the highest rate
- Rewarding the top four neighbors: measure the download bit rate from each neighbor, reciprocate by sending to the top four peers, and recompute and reallocate every 10 seconds
- Optimistic unchoking: randomly try a new neighbor every 30 seconds, so a new neighbor has a chance to become a better partner
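A minimal sketch of this unchoking policy; the rate values reuse the example on the next slide, and the 10-second and 30-second timers are assumed to be driven by the caller.

```python
# A minimal sketch of tit-for-tat unchoking: every ~10 s unchoke the top four
# uploaders to us; every ~30 s optimistically unchoke one extra random neighbor.

import random

def choose_unchoked(download_rate, optimistic=None):
    """download_rate: neighbor -> measured download bit rate from that neighbor."""
    top4 = sorted(download_rate, key=download_rate.get, reverse=True)[:4]
    rest = [p for p in download_rate if p not in top4]
    if optimistic is None and rest:
        optimistic = random.choice(rest)         # rotated every 30 seconds
    return set(top4) | ({optimistic} if optimistic else set())

rates = {"A": 15, "B": 12, "C": 10, "D": 9, "E": 8, "F": 3}
print(choose_unchoked(rates))   # {'A', 'B', 'C', 'D'} plus one of 'E' or 'F'
```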

BitTyrant: Gaming BitTorrent BitTorrent can be gamed, too Peer uploads to top N peers at rate 1/N E.g., if N=4 and peers upload at 15, 12, 10, 9, 8, 3 … then peer uploading at rate 9 gets treated quite well Best to be the Nth peer in the list, rather than 1st Offer just a bit more bandwidth than the low-rate peers But not as much as the higher-rate peers And you’ll still be treated well by others BitTyrant software Uploads at higher rates to higher-bandwidth peers

BitTorrent Issues
- Problem of incomplete downloads: peers leave the system when done, so many file downloads never complete, especially for less popular content
- Lots of legal questions still remain
- Further need for incentives

Summary
- P2P systems were the first distributed systems that seriously focused on scalability with respect to the number of nodes
- P2P techniques abound in cloud computing systems

Conclusions
- Peer-to-peer networks: nodes are end hosts; primarily for file sharing, and recently telephony
- Finding the appropriate peers: centralized directory (Napster), query flooding (Gnutella), super-nodes (KaZaA)
- BitTorrent: distributed download of large files, with anti-free-riding techniques
- A great example of how quickly change can happen in application-level protocols