Peer-to-Peer Systems

- Quickly grown in popularity:
  - Dozens or hundreds of file-sharing applications
  - In 2004, 35 million adults used P2P networks, 29% of all Internet users in the USA
  - BitTorrent: a few million users at any given point; roughly 35% of Internet traffic is BitTorrent
- Upset the music industry; drew college students, web developers, recording artists, and universities into court
- But P2P is not new and is probably here to stay
- P2P is simply the next iteration of scalable distributed systems

Client-Server Communication
- Client is "sometimes on"
  - Initiates a request to the server when interested
  - E.g., web browser on your laptop or cell phone
  - Doesn't communicate directly with other clients
  - Needs to know the server's address
- Server is "always on"
  - Services requests from many client hosts
  - E.g., the Web server for a Web site
  - Doesn't initiate contact with the clients
  - Needs a fixed, well-known address

Server Distributing a Large File
[Figure: one server with upload rate u_s sends a file of F bits over the Internet to receivers with download rates d_1, d_2, d_3, d_4]

Server Distributing a Large File
- Server sending a large file to N receivers
  - Large file with F bits
  - Single server with upload rate u_s
  - Download rate d_i for receiver i
- Server transmission to N receivers
  - Server needs to transmit NF bits
  - Takes at least NF / u_s time
- Receiving the data
  - Slowest receiver receives at rate d_min = min_i {d_i}
  - Takes at least F / d_min time
- Download time: max{ NF / u_s, F / d_min }

Speeding Up the File Distribution
- Increase the upload rate from the server
  - Higher link bandwidth at the one server
  - Multiple servers, each with their own link
  - Requires deploying more infrastructure
- Alternative: have the receivers help
  - Receivers get a copy of the data
  - And then redistribute the data to other receivers
  - To reduce the burden on the server

Peers Help Distributing a Large File
[Figure: the same setup, but each peer i also has an upload rate u_i alongside its download rate d_i]

Peers Help Distributing a Large File
- Start with a single copy of a large file
  - Large file with F bits and server upload rate u_s
  - Peer i with download rate d_i and upload rate u_i
- Two components of distribution latency
  - Server must send each bit: min time F / u_s
  - Slowest peer receives each bit: min time F / d_min
- Total upload time using all upload resources
  - Total number of bits: NF
  - Total upload bandwidth: u_s + sum_i(u_i)
- Total: max{ F / u_s, F / d_min, NF / (u_s + sum_i(u_i)) }

Comparing the Two Models
- Download time
  - Client-server: max{ NF / u_s, F / d_min }
  - Peer-to-peer: max{ F / u_s, F / d_min, NF / (u_s + sum_i(u_i)) }
- Peer-to-peer is self-scaling
  - Much lower demands on server bandwidth
  - Distribution time grows only slowly with N
- But...
  - Peers may come and go
  - Peers need to find each other
  - Peers need to be willing to help each other
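The two bounds are easy to evaluate numerically. A minimal Python sketch; the 1000-peer, 1 GB example and all the rates below are made-up illustration values, not from the slides:

```python
# Compare the client-server and P2P distribution-time lower bounds
# (units: bits and bits/second).
def cs_time(N, F, u_s, d):
    """Client-server: max{ N*F/u_s, F/d_min }."""
    return max(N * F / u_s, F / min(d))

def p2p_time(N, F, u_s, d, u):
    """Peer-to-peer: max{ F/u_s, F/d_min, N*F/(u_s + sum(u)) }."""
    return max(F / u_s, F / min(d), N * F / (u_s + sum(u)))

# Illustrative numbers: 1000 peers, a 1 GB file, a 30 Mb/s server,
# 2 Mb/s peer downlinks, 1 Mb/s peer uplinks.
N, F = 1000, 8e9
u_s, d, u = 30e6, [2e6] * N, [1e6] * N
print(cs_time(N, F, u_s, d) / 3600)      # ~74 hours (server-bound)
print(p2p_time(N, F, u_s, d, u) / 3600)  # ~2.2 hours (self-scaling)
```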

P2P vs. YouTube
- Let's compare BitTorrent vs. YouTube
- Capacity to accept and store content:
  - YouTube currently accepts about 200K videos per day (roughly 1 TB)
  - 1000 TV channels producing 1 Mb/s translates to about 10 TB per day

P2P vs. YouTube
- BitTorrent's capacity to serve content
  - Piratebay has 5M users at any given point in time
  - Assume an average session of 6 hours and a 0.5 GB download: total data served = 10,000 TB per day
  - Add a factor of 2 for other P2P systems: total = 20,000 TB per day
- YouTube served 100M videos per day about a year back
  - Assume the number is now 200M videos at an average size of 5 MB: total data served = 1,000 TB per day

P2P vs. YouTube
- Capacity to serve content, based on bandwidth
  - Piratebay: 5M leechers, 5M seeders
  - Assume an average of 400 kb/s per user
  - Translates to about 4 Tb/s
- YouTube: assume a 10 Gb/s connection from a data center
  - Then about 400 such data centers are needed to match BitTorrent's serving capacity
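The arithmetic on these two slides can be checked in a few lines of Python; the inputs are the slides' own assumptions, not measured data:

```python
# Reproducing the slides' back-of-envelope numbers.
TB = 1e12  # bytes

# BitTorrent: 5M concurrent users, 6-hour sessions, 0.5 GB each.
sessions_per_day = 5e6 * (24 / 6)        # 20M sessions per day
print(sessions_per_day * 0.5e9 / TB)     # 10000.0 TB/day (x2 => 20,000)

# YouTube: 200M videos/day at 5 MB each.
print(200e6 * 5e6 / TB)                  # 1000.0 TB/day

# Bandwidth view: 10M peers (leechers + seeders) at 400 kb/s each.
p2p_bps = 10e6 * 400e3                   # = 4e12 b/s, i.e., ~4 Tb/s
print(p2p_bps / 10e9)                    # ~400 ten-gigabit data centers
```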

Challenges of Peer-to-Peer
- Peers come and go
  - Peers are intermittently connected
  - May come and go at any time
  - Or come back with a different IP address
- How to locate the relevant peers?
  - Peers that are online right now
  - Peers that have the content you want
- How to motivate peers to stay in the system?
  - Why not leave as soon as the download ends?
  - Why bother uploading content to anyone else?

Locating the Relevant Peers
- Three main approaches
  - Central directory (Napster)
  - Query flooding (Gnutella)
  - Hierarchical overlay (KaZaA, modern Gnutella)
- Design goals
  - Scalability
  - Simplicity
  - Robustness
  - Plausible deniability

Peer-to-Peer Networks: Napster
- Napster history: the rise
  - January 1999: Napster version 1.0 (written by Shawn Fanning, a Northeastern freshman)
  - May 1999: company founded
  - September 1999: first lawsuits
  - 2000: 80 million users
- Napster history: the fall
  - Mid 2001: out of business due to lawsuits
  - Mid 2001: dozens of P2P alternatives that were harder to touch, though these have gradually been constrained
  - 2003: growth of pay services like iTunes
- Napster history: the resurrection
  - 2003: Napster reconstituted as a pay service
  - 2007: still lots of file sharing going on

Napster Technology: Directory Service
- User installs the software
  - Downloads the client program
  - Registers name, password, local directory, etc.
- Client contacts Napster (via TCP)
  - Provides a list of music files it will share
  - ... and Napster's central server updates the directory
- Client searches on a title or performer
  - Napster identifies online clients with the file
  - ... and provides their IP addresses
- Client requests the file from the chosen supplier
  - Supplier transmits the file to the client
  - Both client and supplier report status to Napster
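A minimal sketch of the central-directory idea, not Napster's actual protocol; the class and method names are invented for illustration:

```python
# Central directory: peers register what they share; searches return
# peer addresses, and the file transfer itself happens peer-to-peer.
from collections import defaultdict

class Directory:
    def __init__(self):
        self.files = defaultdict(set)   # title -> set of peer addresses

    def register(self, peer_addr, titles):
        """Peer connects and reports the files it is willing to share."""
        for t in titles:
            self.files[t].add(peer_addr)

    def unregister(self, peer_addr):
        """Peer disconnects; remove it from every entry."""
        for peers in self.files.values():
            peers.discard(peer_addr)

    def search(self, title):
        """Return addresses of online peers holding the title."""
        return list(self.files.get(title, ()))

d = Directory()
d.register(("10.0.0.5", 6699), ["songA.mp3", "songB.mp3"])
print(d.search("songA.mp3"))            # [('10.0.0.5', 6699)]
```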

Napster Technology: Properties
- Server's directory continually updated
  - Always knows what music is currently available
  - Point of vulnerability for legal action
- Peer-to-peer file transfer
  - No load on the server
  - Plausible deniability for legal action (but not enough)
- Proprietary protocol
  - Login, search, upload, download, and status operations
  - No security: cleartext passwords and other vulnerabilities
- Bandwidth issues
  - Suppliers ranked by apparent bandwidth and response time

Napster: Limitations of Central Directory
- File transfer is decentralized, but locating content is highly centralized
  - Single point of failure
  - Performance bottleneck
  - Copyright infringement
- So, later P2P systems were more distributed
  - Gnutella went to the other extreme...

Peer-to-Peer Networks: Gnutella
- Gnutella history
  - 2000: J. Frankel and T. Pepper released Gnutella
  - Soon after: many other clients (e.g., Morpheus, LimeWire, BearShare)
  - 2001: protocol enhancements, e.g., "ultrapeers"
- Query flooding
  - Join: contact a few nodes to become neighbors
  - Publish: no need!
  - Search: ask neighbors, who ask their neighbors
  - Fetch: get the file directly from another node

Gnutella: Query Flooding
- Fully distributed
  - No central server
  - Public-domain protocol
  - Many Gnutella clients implementing the protocol
- Overlay network: a graph
  - Edge between peers X and Y if there's a TCP connection between them
  - All active peers and edges form the overlay network
  - A given peer is typically connected to fewer than 10 overlay neighbors

Gnutella: Protocol
- Query messages are sent over existing TCP connections
- Peers forward Query messages
- QueryHits are sent back over the reverse path
- File transfer: HTTP
- Scalability: limited-scope flooding (sketched below)
[Figure: a Query propagating hop by hop through the overlay, with QueryHits returning along the reverse path]
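A toy sketch of TTL-limited query flooding over an overlay graph. Real Gnutella suppresses duplicates via message IDs and routes QueryHits back along the reverse path; this simulation only models the flooding and hit-detection:

```python
# Flood a query outward from `start`, one overlay hop per TTL tick.
def flood_query(overlay, files, start, target, ttl=7):
    """overlay: adjacency dict; files: dict peer -> set of file names."""
    hits, visited = [], {start}
    frontier = [start]
    while frontier and ttl > 0:
        nxt = []
        for node in frontier:
            for nbr in overlay[node]:
                if nbr in visited:
                    continue              # duplicate Query: drop it
                visited.add(nbr)
                if target in files.get(nbr, ()):
                    hits.append(nbr)      # nbr would send a QueryHit back
                nxt.append(nbr)
        frontier, ttl = nxt, ttl - 1      # TTL decremented each hop
    return hits

overlay = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
files = {"D": {"song.mp3"}}
print(flood_query(overlay, files, "A", "song.mp3"))  # ['D']
```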

Gnutella: Peer Joining
- Joining peer X must find some other peers
  - Starts with a list of candidate peers
- X sequentially attempts TCP connections with peers on the list until one is set up with some peer Y
- X sends a Ping message to Y
  - Y forwards the Ping message
  - All peers receiving the Ping respond with a Pong message
- X receives many Pong messages
  - X can then set up additional TCP connections

Gnutella: Pros and Cons
- Advantages
  - Fully decentralized
  - Search cost distributed
  - Processing per node permits powerful search semantics
- Disadvantages
  - Search scope may be quite large
  - Search time may be quite long
  - High overhead, and nodes come and go often

Aside: Search Time?

Aside: All Peers Equal?
[Figure: an overlay mixing heterogeneous peers: 56 kbps modems, 1.5 Mbps DSL links, and a 10 Mbps LAN]

Aside: Network Resilience
[Figure: a partial overlay topology; the effect of 30% of nodes dying at random vs. a targeted 4% dying; from Saroiu et al., MMCN 2002]

Peer-to-Peer Networks: KaZaA
- KaZaA history
  - 2001: created by a Dutch company (Kazaa BV)
  - Single network called FastTrack, used by other clients as well
  - Eventually the protocol changed so other clients could no longer talk to it
- Smart query flooding
  - Join: on start, the client contacts a super-node (and may later become one)
  - Publish: client sends the list of its files to its super-node
  - Search: send the query to the super-node; super-nodes flood queries among themselves
  - Fetch: get the file directly from peer(s); can fetch from multiple peers at once

KaZaA: Exploiting Heterogeneity
- Each peer is either a group leader or assigned to a group leader
  - TCP connection between a peer and its group leader
  - TCP connections between some pairs of group leaders
- Group leader tracks the content in all its children

KaZaA: Motivation for Super-Nodes
- Query consolidation
  - Many connected nodes may have only a few files
  - Propagating a query to a sub-node may take more time than for the super-node to answer itself
- Stability
  - Super-node selection favors nodes with high uptime
  - How long you've been on is a good predictor of how long you'll be around in the future

Peer-to-Peer Networks: BitTorrent
- BitTorrent history and motivation
  - 2002: B. Cohen debuted BitTorrent
  - Key motivation: popular content
    - Popularity exhibits temporal locality (flash crowds)
    - E.g., the Slashdot effect, the CNN Web site on 9/11, the release of a new movie or game
  - Focused on efficient fetching, not searching
    - Distribute the same file to many peers
    - Single publisher, many downloaders
  - Preventing free-loading

BitTorrent: Simultaneous Downloading
- Divide a large file into many pieces
  - Replicate different pieces on different peers
  - A peer with a complete piece can trade with other peers
  - A peer can (hopefully) assemble the entire file
- Allows simultaneous downloading
  - Retrieving different parts of the file from different peers at the same time
  - And uploading parts of the file to peers
  - Important for very large files

BitTorrent: Tracker
- Infrastructure node
  - Keeps track of the peers participating in the torrent
- Peers register with the tracker
  - A peer registers when it arrives
  - A peer periodically informs the tracker it is still there
- Tracker selects peers for downloading
  - Returns a random set of peers
  - Including their IP addresses
  - So the new peer knows whom to contact for data
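A sketch of the tracker's bookkeeping under the assumptions above: a random peer subset per announce, and timeout-based expiry of silent peers. The names and the 30-minute timeout are illustrative, not from the BitTorrent specification:

```python
# Track swarm membership; hand each announcing peer a random subset.
import random, time

class Tracker:
    def __init__(self, peer_timeout=1800):
        self.peers = {}                  # addr -> time of last announce
        self.timeout = peer_timeout

    def announce(self, addr, want=50):
        now = time.time()
        self.peers[addr] = now           # register / refresh this peer
        # Drop peers that have stopped announcing.
        self.peers = {a: t for a, t in self.peers.items()
                      if now - t < self.timeout}
        others = [a for a in self.peers if a != addr]
        return random.sample(others, min(want, len(others)))

t = Tracker()
t.announce(("10.0.0.5", 6881))           # first peer gets an empty list
print(t.announce(("10.0.0.9", 6881)))    # [('10.0.0.5', 6881)]
```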

BitTorrent: Chunks
- Large file divided into smaller pieces
  - Fixed-size chunks, typically 256 KB
- Allows simultaneous transfers
  - Downloading chunks from different neighbors
  - Uploading chunks to other neighbors
- Learning what chunks your neighbors have
  - Broadcast to neighbors when you have a chunk
- File is done when all chunks are downloaded

BitTorrent: Overall Architecture
[Figure, shown as an animation in the original slides: a new downloader ("US"), leecher peers A and C, a seed peer B, a tracker, and a web server hosting the .torrent file]
- Downloader fetches a web page with a link to the .torrent file from the web server
- Downloader sends a get-announce to the tracker
- Tracker returns a response with a peer list
- Downloader shakes hands with the peers on the list
- Peers exchange pieces with one another
- Downloader periodically re-announces to the tracker and receives fresh peer lists

BitTorrent: Chunk Request Order
- Which chunks to request?
  - Could download in order, like an HTTP client does
- Problem: many peers have the early chunks
  - Peers have little to share with each other
  - Limiting the scalability of the system
- Problem: eventually nobody has the rare chunks
  - E.g., the chunks near the end of the file
  - Limiting the ability to complete a download
- Solutions: random selection and rarest-first (see the sketch below)
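A sketch of rarest-first selection, assuming each neighbor's set of chunks is known (neighbors broadcast what they have, per the earlier slide); the function names are illustrative:

```python
# Pick the piece we still need that the fewest neighbors hold.
import random
from collections import Counter

def next_piece(needed, neighbor_pieces):
    """needed: set of piece indices we lack;
    neighbor_pieces: dict peer -> set of pieces that peer has."""
    counts = Counter()
    for pieces in neighbor_pieces.values():
        for p in pieces & needed:
            counts[p] += 1
    if not counts:
        return None                     # no neighbor has anything we need
    rarest = min(counts.values())
    # Break ties randomly so peers don't all chase the same piece.
    return random.choice([p for p, c in counts.items() if c == rarest])

neighbors = {"A": {0, 1}, "B": {1, 2}, "C": {1}}
print(next_piece({0, 1, 2}, neighbors))  # 0 or 2 (each held by one peer)
```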

Free-Riding Problem in P2P Networks
- The vast majority of users are free-riders
  - Most share no files and answer no queries
  - Others limit their number of connections or their upload speed
- A few "peers" essentially act as servers
  - A few individuals contributing to the public good
  - Making them hubs that basically act as servers
- BitTorrent prevents free-riding
  - Allow the fastest peers to download from you
  - Occasionally let some free-loaders download

BitTorrent: Preventing Free-Riding
- A peer has limited upload bandwidth
  - And must share it among multiple peers
- Prioritizing the upload bandwidth
  - Favor neighbors that are uploading at the highest rate
- Rewarding the top four neighbors
  - Measure the download bit rate from each neighbor
  - Reciprocate by sending to the top four peers
  - Recompute and reallocate every 10 seconds
- Optimistic unchoking
  - Randomly try a new neighbor every 30 seconds
  - So a new neighbor has a chance to be a better partner
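A sketch of the unchoking policy just described; the data structures are invented for illustration, and real clients track more state (snubbing, seeding mode, etc.):

```python
# Every 10 s: unchoke the four neighbors uploading to us fastest.
# Every 30 s: optimistically unchoke one random choked neighbor.
import random

def recompute_unchoked(download_rates, optimistic=None):
    """download_rates: dict peer -> measured bits/s received from peer."""
    by_rate = sorted(download_rates, key=download_rates.get, reverse=True)
    unchoked = set(by_rate[:4])          # reciprocate to the top four
    if optimistic is None:
        choked = [p for p in download_rates if p not in unchoked]
        if choked:
            optimistic = random.choice(choked)  # give a newcomer a chance
    if optimistic is not None:
        unchoked.add(optimistic)
    return unchoked, optimistic

rates = {"A": 5e5, "B": 3e5, "C": 2e5, "D": 1e5, "E": 0}
print(recompute_unchoked(rates))  # top four A-D, plus 'E' optimistically
```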

Study BitTorrent's Incentives
- First, construct a model to predict unreciprocated altruism
- Measure a large number of popular swarms
- Estimate fairness, altruism, and reciprocation behavior

[Measurement figures from the study: fairness, end-host capacities, per-peer send rates, altruism, and reciprocation probability]

Methodology
- First, construct a model to predict unreciprocated altruism
  - Measure a large number of popular swarms
  - Estimate fairness, altruism, and reciprocation behavior
- Second, develop a strategic client: BitTyrant

BitTyrant: Strategic Peer Selection
- Select peers and rates to maximize "return on investment"

BitTyrant Performance
[Figure: CDF (cumulative fraction of torrents) of the ratio of BitTyrant's download time to the original client's download time]

BitTorrent Today
- Well-designed system with some incentives
- Significant fraction of Internet traffic
  - Estimated at 30%, though this is hard to measure
- Problem of incomplete downloads
  - Peers leave the system when done
  - Many file downloads never complete
  - Especially a problem for less popular content
- Lots of legal questions still remain
- Further need for incentives

Distributed Hash Tables (DHT): History
- Around 2000-2001, academic researchers jumped onto the P2P bandwagon
- Motivation:
  - Guaranteed lookup success for files in the system (the search problem that BitTorrent doesn't address)
  - Provable bounds on search time
  - Provable scalability to millions of nodes
- A hot topic in networking ever since

DHT: Overview
- Abstraction: a distributed "hash table" (DHT) data structure:
  - put(id, item);
  - item = get(id);
- Implementation: nodes in the system form an interconnection network
  - Can be a ring, tree, hypercube, butterfly network, ...
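In code, the abstraction looks like an ordinary hash table whose buckets live on different nodes. A toy stand-in: real DHTs route requests through the overlay rather than keeping a global node list, as Chord shows next; all names here are illustrative:

```python
# Toy DHT: hash each id to pick the node that stores it.
import hashlib

class ToyDHT:
    def __init__(self, node_ids):
        self.stores = {n: {} for n in node_ids}   # per-node storage

    def _home(self, id):
        """Hash the id and map it onto one of the nodes."""
        h = int(hashlib.sha1(str(id).encode()).hexdigest(), 16)
        return sorted(self.stores)[h % len(self.stores)]

    def put(self, id, item):
        self.stores[self._home(id)][id] = item

    def get(self, id):
        return self.stores[self._home(id)].get(id)

dht = ToyDHT(["n0", "n1", "n2"])
dht.put("movie.mp4", {"peers": ["10.0.0.7"]})
print(dht.get("movie.mp4"))              # {'peers': ['10.0.0.7']}
```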

DHT: Example - Chord
- Chord (from MIT, 2001)
- Associate with each node and file a unique id in a one-dimensional space (a ring)
  - E.g., pick from the range [0 ... 2^m]
  - Usually the hash of the file or of the node's IP address
- Properties:
  - Routing table size is O(log N), where N is the total number of nodes
  - Guarantees that a file is found in O(log N) hops

DHT: Consistent Hashing
[Figure: circular ID space with nodes N32, N90, N105 and keys K5, K20, K80]
- A key is stored at its successor: the node with the next-higher ID
  - E.g., K80 is stored at N90; K20 and K5 are stored at N32
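The successor rule in a few lines of Python, using the node ids from the figure on a 2^7 ring:

```python
# A key lives on the first node clockwise from it (its successor).
def successor(key, node_ids, m=7):
    """node_ids: sorted node identifiers on a 2^m ring."""
    key %= 2**m
    for n in node_ids:
        if n >= key:
            return n                # first node at or after the key
    return node_ids[0]              # wrap around the ring

nodes = [32, 90, 105]               # N32, N90, N105 from the figure
print(successor(80, nodes))         # 90  -> K80 lives on N90
print(successor(20, nodes))         # 32  -> K20 lives on N32
print(successor(5, nodes))          # 32  -> K5 lives on N32
```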

DHT: Chord Basic Lookup
[Figure: ring with nodes N10, N32, N60, N90, N105, N120; the query "Where is key 80?" is forwarded around the ring until N90 answers "N90 has K80"]

DHT: Chord "Finger Table"
[Figure: node N80 with fingers spanning 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the ring]
- Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i
- In other words, the i-th finger points 1/2^(m-i) of the way around the ring
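A sketch of finger-table construction under this definition; the node ids reuse the earlier lookup example and are illustrative:

```python
# Finger i of node n = successor(n + 2^i) on a 2^m ring.
def finger_table(n, node_ids, m=7):
    def succ(k):
        k %= 2**m
        later = [x for x in sorted(node_ids) if x >= k]
        return later[0] if later else min(node_ids)   # wrap around
    return [succ(n + 2**i) for i in range(m)]

nodes = [10, 32, 60, 80, 90, 105, 120]
print(finger_table(80, nodes, m=7))
# targets 81, 82, 84, 88, 96, 112, 16 -> [90, 90, 90, 90, 105, 120, 32]
```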

DHT: Chord Join
- Assume an identifier space [0..8]
- Node n1 joins
[Figure: n1's successor table, with entries (i, id + 2^i, succ)]

DHT: Chord Join
- Node n2 joins
[Figure: the successor tables of n1 and n2]

DHT: Chord Join
- Nodes n0 and n6 join
[Figure: the successor tables of n0, n1, n2, and n6]

DHT: Chord Join
- Nodes: n1, n2, n0, n6
- Items: f7, f...
[Figure: the successor tables of all four nodes, with each item stored at its successor node]

DHT: Chord Routing
- Upon receiving a query for item id, a node:
  - Checks whether it stores the item locally
  - If not, forwards the query to the largest node in its successor table that does not exceed id
[Figure: query(7) routed around the ring using the nodes' successor tables until it reaches the node storing item 7]
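A sketch of this forwarding rule on the small [0..8] ring from the join slides. Here `nodes` maps each node id to its items and fingers (fingers follow the successor(n + 2^i) rule, with fingers[0] being the immediate successor), and distances are measured clockwise; the structure is invented for illustration:

```python
# Forward each query to the farthest known node that does not pass the key.
def dist(a, b, m=3):
    return (b - a) % 2**m                  # clockwise distance a -> b

def lookup(node, key, nodes, m=3):
    """nodes: id -> {'items': set, 'fingers': [ids]}."""
    while key not in nodes[node]["items"]:
        cands = [f for f in nodes[node]["fingers"]
                 if 0 < dist(node, f, m) <= dist(node, key, m)]
        # Farthest finger that does not overshoot, else the successor.
        node = (max(cands, key=lambda f: dist(node, f, m)) if cands
                else nodes[node]["fingers"][0])
    return node

nodes = {0: {"items": {7}, "fingers": [1, 2, 6]},
         1: {"items": set(), "fingers": [2, 6, 6]},
         2: {"items": set(), "fingers": [6, 6, 6]},
         6: {"items": set(), "fingers": [0, 0, 2]}}
print(lookup(1, 7, nodes))                 # routes 1 -> 6 -> 0; returns 0
```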

DHT: Chord Summary
- Routing table size?
  - log N fingers
- Routing time?
  - Each hop is expected to halve the distance to the desired id => expect O(log N) hops
- What is good/bad about Chord?