1 RES 203 Internet applications
Peer-to-peer, Dario Rossi (dario.rossi), RES203 V01/2017

2 Agenda
Introduction on P2P: recap on client-server vs P2P paradigms; interest of the P2P paradigm; P2P networks and overlay graphs; a (vanilla) taxonomy of P2P applications
Finding content (today): Napster, Gnutella, DHTs (Chord)
Diffusing content: BitTorrent, P2P-TV
Transmission strategies: Skype and BitTorrent congestion control

3 Client-server paradigm
Client: runs on end-hosts; on/off behavior; service consumer; issues requests and receives services; clients do not communicate directly among themselves; needs to know the server address. Server: runs on end-hosts; always on; service provider; satisfies requests from many clients; needs a fixed address (or DNS name)

4 Client-server paradigm
Server S distributes content to clients 1..N

5 Client-server paradigm
The server has to provide all the necessary upload bandwidth

6 Peer-to-peer paradigm
Peers: run on end-hosts; on/off behavior (must handle churn); need to join and to discover other peers; are both service providers and consumers; communicate directly among themselves; need well-defined communication rules to prevent free riding and to incentivize participation and reciprocation. Servers are used only for bootstrap.

7 Peer-to-peer paradigm
Peers 1..N may assist S using their upload bandwidth

8 Peer-to-peer paradigm
Notice that: servers are typically needed for bootstrap; servers are not needed for resource sharing; a peer-to-peer network does not necessarily need a server.

9 Client-server vs Peer-2-peer
Interest of P2P: what is the minimum download time under either paradigm?

10 Client-server
F-bit long file; 1 server; Us = server upload rate; N clients; Di = download rate of the i-th client; Dmin = min(Di), the slowest client; Ti = download time of the i-th client; T = max(Ti), the system completion time. Assuming a simple fluid model in which the server gives each of the N clients an equal share Us/N of its rate, what is T equal to?
(1) T >= F / (Us/N) = NF / Us, i.e., the download cannot be faster than the share of the server upload capacity allows
(2) T >= F / Dmin, i.e., the download cannot be faster than the downlink capacity of the slowest client allows
Hence T >= F / min(Us/N, Dmin)

11 Peer-2-peer
F-bit long file; 1 source peer (having the content at time t=0); Us = source peer upload rate; N sink peers; Ui, Di = upload and download rate of the i-th peer; Dmin = min(Di), the slowest peer; Ti = download time of the i-th peer; T = max(Ti), the system completion time. Assuming a simple fluid model, the source sends bits to peer i at rate Si, and each peer replicates the received data toward the other N-1 peers at a rate <= Si.
(1) T >= F / Us, i.e., no peer can receive faster than the source can send
(2) T >= NF / (Us + ΣUi), i.e., the overall data cannot be downloaded faster than the aggregate system capacity (source plus peers) allows
(3) T >= F / Dmin, i.e., the download cannot be faster than the downlink capacity of the slowest peer allows
Hence the minimum completion time is T = F / min(Us, (Us + ΣUi)/N, Dmin); a small numeric check follows.
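A minimal sketch (Python) evaluating the two fluid-model bounds above, using the numbers of the performance example on the next slide (F = 10 MB, Us = 10 Mb/s, Ui = 500 kb/s, Di >> Ui); the peer set size N = 50 is just one sample point.

    def t_client_server(F, Us, D):
        # T = F / min(Us/N, Dmin)
        return F / min(Us / len(D), min(D))

    def t_peer_to_peer(F, Us, U, D):
        # T = F / min(Us, (Us + sum(Ui))/N, Dmin)
        N = len(D)
        return F / min(Us, (Us + sum(U)) / N, min(D))

    F  = 10e6 * 8          # 10 MB file, in bits
    Us = 10e6              # server/source upload rate: 10 Mb/s
    N  = 50
    U  = [500e3] * N       # peer upload rate: 500 kb/s each
    D  = [1e9] * N         # peer download rate assumed much larger
    print(t_client_server(F, Us, D))    # 400 s, grows linearly with N
    print(t_peer_to_peer(F, Us, U, D))  # ~114 s, roughly flat as N grows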

12 Client-server vs Peer-2-peer
Interest of P2P (in theory): P2P protocols can offload server capacity, allowing sub-linear (e.g., logarithmic) scaling in principle. Performance example with file diffusion; the conclusion holds for many services (today: DHT lookup, BitTorrent diffusion).
Best-case performance: TCS = F / (Us/N) = NF/Us vs TP2P = F / [(Us + ΣUi)/N] = NF / (Us + ΣUi)
[Figure: file diffusion time Tmin (sec) vs number of peers/clients N, for F = 10 MB, peer upload Ui = 500 kb/s, source upload Us = 10 Mb/s, peer download Di >> Ui]

13 Client-server vs Peer-2-peer
Performance of P2P (in practice): affected by protocol decisions and implementations (e.g., the ability to saturate capacity at L4, to make useful decisions at L7, etc.); affected by peer behavior (e.g., free riders: peers not contributing, or contributing less than their fair share).
More realistic performance, with 0 < η < 1 the protocol efficiency and 0 < α < 1 the fraction of free riders: TP2P = F / [(Us + η(1-α)ΣUi)/N] = NF / (Us + η(1-α)ΣUi)
[Figure: file diffusion time Tmin (sec) vs number of peers/clients N; as η → 0 or α → 1, the peer-2-peer curve degenerates toward the client-server one]

14 P2P Overlays A P2P network is a graph, where edges represent logical connections between peers. Logical connection: ongoing communication via TCP or UDP sockets; reachability via application-layer routing tables.

15 P2P Overlays P2P networks are commonly called overlays as they are laid over the IP infrastructure. Notice that: not all logical links are also physical; not all physical links are used. [Figure: application overlay with peers A, B, C laid over the IP network with addresses IPa, IPb, IPc]

16 P2P Overlays P2P overlay graphs differ widely and depend on implementation choices. [Figure: overlay graphs of two different applications, App1 and App2]

17 P2P Overlays
Peers: come and go, at will or due to failures (λ); may or may not have the resources you are looking for (η); may or may not be willing to give the resource (α).
Challenges: be resilient in the face of churn and failures (λ); effectively locate and share resources (η); scale to possibly several million users (η); incentivize peers' participation in the system (α).

18 P2P Taxonomy
P2P software spans a fair range of services: file-sharing and content distribution, voice/video calls, television/VoD, CPU sharing, indexing and search.
Structure: structured, unstructured, hierarchical; the structure is not tied to a particular service.
Examples by service: file-sharing (BitTorrent, eDonkey); live TV / VoD (SopCast, TVAnts, PPLive, Joost); VoIP/chat (Skype, Gtalk); search (Kademlia); SW updates (RapidUpdate, DebTorrent, Apt-p2p, ...).

19 P2P Taxonomy
Unstructured: arbitrarily connected (akin to random graphs); e.g., Gnutella (up to v0.4), BitTorrent, P2P-TV.
Hierarchical: one level of hierarchy (super-peers); e.g., Gnutella (v0.6 onward), Skype and eDonkey (initially).
Structured: routing exploits regularity; Distributed Hash Tables, used by Skype, BitTorrent, eDonkey, ...

20 P2P in this course
Lookup and routing: find the peers having the resource of interest (e.g., a file, or a contact for messaging/calls).
Diffusion and scheduling: ensure timely and effective diffusion of a resource (e.g., a file, or a TV stream).
Transport and congestion control: interactive, near real-time applications (Skype); background applications (e.g., BitTorrent).

21 P2P Lookup (DHTs)

22 P2P Lookup
Lookup strategies: centralized indexing (Napster); query via random walk; query via flooding (Gnutella); Distributed Hash Tables (Chord).
Note: for the time being, we do not deal with the problem of what to do after the resource is located on the overlay.

23 Lookup via Centralized Indexing
A centralized server keeps a list L of the peers' resources. Peers keep the resources R and notify them to the server at startup and anytime a resource is added or deleted.

24 Lookup via Centralized Indexing
Example (1/2): peer A searches for resource R, stored at peer B. (1) A asks server S for the location of R; (2) S replies to A with the address of B; (3) A contacts B and (4) fetches R; (5) A notifies S that it now owns R, and S updates the resource list.

25 Lookup via Centralized Indexing
Example (2/2): peer C searches for resource R, now stored at peers A and B. C asks server S for the location of R; S replies to C with the addresses of A and B; C selects the best peer among A and B (probing delay, bandwidth, ...); C gets R, then notifies S, which updates list L.

26 Lookup via Centralized Indexing
Pros: simple architecture; only limited traffic to the server (queries); peer-centric selection of the best candidates. Cons: the central database is not scalable; single point of failure; single legal entity (Napster was actually shut down).

27 Flooding on Unstructured Overlay
Peers: each peer only stores its own content. Lookup: queries are forwarded to all neighbors. Flooding can be dangerous: it can generate a traffic storm on the overlay. Need to limit flooding: drop duplicate queries for the same element; constrain the flooding depth (application-level Time-To-Live field). A minimal flooding sketch follows.
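A minimal sketch (Python) of TTL-limited flooding with duplicate suppression; the overlay, peer names and holder set are made up and only roughly mirror the example on the next slide.

    def flood_lookup(overlay, start, holders, ttl):
        # overlay: {peer: [neighbors]}; holders: peers owning the resource
        seen = {start}                       # duplicate suppression
        frontier, found = [start], []
        while frontier and ttl > 0:
            nxt = []
            for peer in frontier:
                for neigh in overlay[peer]:
                    if neigh in seen:
                        continue             # drop duplicated query
                    seen.add(neigh)
                    if neigh in holders:
                        found.append(neigh)  # hit: answer routed back on the overlay
                    else:
                        nxt.append(neigh)    # keep flooding from this peer
            frontier, ttl = nxt, ttl - 1     # one overlay hop consumes one TTL unit
        return found

    overlay = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"], "D": ["B"],
               "E": ["C", "G", "H"], "G": ["E"], "H": ["E", "F"], "F": ["H"]}
    print(flood_lookup(overlay, "A", holders={"D", "F"}, ttl=2))  # -> ['D'], F not reached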

28 Flooding on Unstructured Overlay
Example: A looks for R, stored at D and F. A sets TTL=4 and sends the query to B and C. B and C do not have resource R and forward the query to D and E respectively. D has the resource and routes the answer back on the overlay. E does not have the resource and forwards to G and H, where the query is dropped (TTL exceeded), so it never reaches F.

29 Flooding on Unstructured Overlay
Example (continued): same lookup as before; D answers and A fetches R from it, while F is never discovered. Hint toward a heuristic solution: expanding TTL rings (retry the query with an increasing TTL).

30 Flooding on Unstructured Overlay
Pros: query lookup is greedy; simple maintenance; resilience to faults thanks to flooding. Cons: scalability/accuracy tradeoff; either flooding sends (too) many messages, or the lookup may fail (due to the TTL).

31 Flooding on Hierarchical Overlay
Peers: each peer only stores its own content. Super-peers: index the content of the peers attached to them. Lookup: peers contact their super-peers, and flooding is restricted to the super-peers.

32 Flooding on Hierarchical Overlay
Pros: lookup is still greedy and simple; efficient resource consumption; the hierarchy increases scalability. Cons: increased application complexity; less resilient to super-peer churn; need to carefully select super-peers.

33 Lookup on Structured Overlay
Peers: arranged on very specific topologies; implement a topology-dependent lookup to exploit the overlay regularity. Indexing: peers and resources are indexed; structured overlays implement a hash-table semantic (insertion and retrieval of keys) and are for this reason called Distributed Hash Tables.

34 Lookup on Structured Overlay
Indexing idea: take an arbitrary key space, e.g., the reals in [0,1]; take an arbitrary mapping function, e.g., colorof(x); map peers to colors, l_peer = colorof(peer); map resources to colors, l_resource = colorof(resource); assign each resource to the closest peer (in terms of l).

35 Lookup on Structured Overlay
Indexing in practice (e.g., the Chord DHT) uses consistent hashing with a secure hash, SHA1 (B bits): ID = SHA1(x) in [0, 2^B - 1]. Peers are assigned peerID = SHA1(peer); resources are assigned fileID = SHA1(file). A file is stored at the first peer whose peerID is equal to or follows its fileID on the ring (its successor), so each peer is responsible for a portion of the ring. [Figure: identifier ring with peers P1, P8, P14, P21, P32, P38, P42, P48, P51, P56 and files F10, F15, F38, F50 mapped to their successor peers] A minimal consistent-hashing sketch follows.
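A minimal sketch (Python) of the consistent-hashing assignment described above; the B = 16 truncation and the peer/file names are made-up illustration choices.

    import hashlib

    B = 16                                    # identifier space [0, 2^B - 1]

    def chord_id(name):
        return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** B)

    peers = sorted(chord_id("peer%d" % i) for i in range(8))

    def responsible_peer(file_name):
        # successor rule: first peerID equal to or following the fileID on the ring
        fid = chord_id(file_name)
        for pid in peers:
            if pid >= fid:
                return fid, pid
        return fid, peers[0]                  # wrap around the ring

    print(peers)
    print(responsible_peer("movie.mkv"))      # (fileID, peerID of its successor)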

36 Lookup on Structured Overlay
Lookup for a fileID, simplest version: every node only uses its successor pointer (which is needed for lookup correctness anyway). Highly inefficient: a linear scan of the ring. Longer-reach pointers can be used to improve lookup efficiency. [Figure: P8.Lookup(F56) following successor pointers around the ring]

37 Lookup on Structured Overlay
Keep a list of log(N) "fingers", pointers to peers at exponentially increasing ID-space distances: the 1st finger is at peerID + 2^0, the 2nd at peerID + 2^1, ..., the k-th at peerID + 2^(k-1); if no peer sits exactly at peerID + 2^(k-1), take the closest following peer.

38 Lookup on Structured Overlay
Fingers, example: finger table of P8 (the k-th finger points to the closest peer following peerID + 2^(k-1)):
k=1: +1 -> P14; k=2: +2 -> P14; k=3: +4 -> P14; k=4: +8 -> P21; k=5: +16 -> P32; k=6: +32 -> P42
[Figure: the ring with P8's fingers drawn toward P14, P21, P32 and P42]

39 Lookup on Structured Overlay
Lookup for a fileID, idea: make as much progress as possible at each hop. Greedy algorithm: choose the finger whose ID is closest to (but strictly before) the fileID; the next hops do the same. Intuitively, the distance to the fileID halves at each step (dichotomic search); consequently, the mean lookup length is logarithmic in the number of nodes. The lookup can be recursive (as in the picture) or iterative (recall DNS). [Figure: P8.Lookup(F56) jumping across the ring via fingers] A small sketch of finger construction and greedy lookup follows.
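A minimal sketch (Python) of finger tables and greedy finger-based lookup on a small ring, reusing the peer IDs of the figures; joins, failures and real hashing are omitted, and the helper names are made up.

    B = 6                                            # identifier space [0, 2^B - 1]
    RING = 2 ** B
    peers = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])

    def successor(ident):
        # first peer whose ID is equal to or follows ident on the ring
        for p in peers:
            if p >= ident % RING:
                return p
        return peers[0]                              # wrap around

    def finger_table(peer_id):
        # k-th finger = successor(peer_id + 2^(k-1))
        return [successor(peer_id + 2 ** (k - 1)) for k in range(1, B + 1)]

    def closest_preceding_finger(node, file_id):
        best = None
        for f in finger_table(node):
            # keep f only if it lies strictly between node and file_id clockwise
            if 0 < (f - node) % RING < (file_id - node) % RING:
                if best is None or (f - node) % RING > (best - node) % RING:
                    best = f
        return best

    def lookup(start, file_id):
        hops, node = [start], start
        while True:
            nxt = closest_preceding_finger(node, file_id)
            if nxt is None:                          # no finger makes further progress
                break
            node = nxt
            hops.append(node)
        owner = successor(file_id)
        if owner != node:
            hops.append(owner)                       # final hop to the responsible peer
        return hops

    print(finger_table(8))   # [14, 14, 14, 21, 32, 42], as in the slide
    print(lookup(8, 54))     # [8, 42, 51, 56]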

40 Lookup on Structured Overlay
Pros: "flat" key semantic; highly scalable; tunable performance (state vs lookup-length tradeoff). Cons: much more complex to implement; the structure needs maintenance (churn, join, refresh); difficult to make complex queries (e.g., wildcards, or substrings: sub/ubs/str/tri/rin/ing/ngs). [Figure: state vs lookup-length plane, with the optimal line, the unfeasible region and the region of bad protocols]

41 References
Optional readings:
[1] Lua, E.K., Crowcroft, J., Pias, M., Sharma, R. and Lim, S., "A survey and comparison of peer-to-peer overlay network schemes". IEEE Communications Surveys and Tutorials, 7(2):72-93, 2005.
[2] Stoica, I., Morris, R., Karger, D., Kaashoek, M.F. and Balakrishnan, H., "Chord: A scalable peer-to-peer lookup service for internet applications". ACM SIGCOMM, 31(4), 2001.
For your thirst of knowledge:
[3] Zave, Pamela, "Using lightweight modeling to understand Chord". ACM SIGCOMM Comput. Commun. Rev., 42(2).
[4] Rowstron, Antony, and Peter Druschel, "Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems". Middleware, Springer Berlin/Heidelberg, 2001.
[5] Ratnasamy, S., Francis, P., Handley, M., Karp, R., and Shenker, S., "A scalable content-addressable network". ACM SIGCOMM, 31(4), 2001.
[6] Malkhi, Dahlia, Moni Naor, and David Ratajczak, "Viceroy: A scalable and dynamic emulation of the butterfly". Proc. of the twenty-first annual symposium on Principles of distributed computing, ACM, 2002.
[7] Maymounkov, Petar, and David Mazieres, "Kademlia: A peer-to-peer information system based on the XOR metric". Usenix IPTPS, 2002.
[8] D. Loguinov, A. Kumar, V. Rai, S. Ganesh, "Graph-theoretic analysis of structured peer-to-peer systems: routing distances and fault resilience". ACM SIGCOMM, 2003.
[9] F. Kaashoek and D.R. Karger, "Koorde: A simple degree-optimal hash table". Usenix IPTPS, Feb.

42 P2P Diffusion (BitTorrent)

43 Plan
BitTorrent ecosystem overview. Swarming algorithm(s): interest of multipart download; piece selection (local rarest first & co.); peer selection (tit-for-tat & co.); performance overview. References.

44 BitTorrent: Overview
[Figure: a tracker, a seed and leechers exchanging file pieces; piece transmission messages are carried over TCP (old) or UDP/LEDBAT (new)]

45 Ecosystem: Overview Source: [3]

46 Ecosystem: Overview
Resource localisation: torrent files, torrent discovery sites. Peer localisation: trackers, DHTs, gossiping. Resource diffusion: the BitTorrent swarming protocol (our focus).

47 Swarming Multipart download Benefits of parallel downloads
Benefits of splitting content in chunks

48 Swarming: Chunk-based transfer
Service capacity: the number of peers that can serve a content; it is 1, constant with time, for client-server. Flash crowd of n clients/peers: simultaneous requests by n peers (e.g., soccer match, patch, etc.). Piece (a.k.a. chunk, block): a content is split into m pieces, and each piece can be downloaded independently.

49 Swarming: Chunk-based transfer
Content-based transfer, P2P: the service capacity grows exponentially with time, so the mean P2P download time is O(log(n)) vs O(n) for client-server (see the previous model, or [4]). Chunk-based transfer: the content is divided into m chunks with k parallel downloads; the mean download time decreases by 1/m, to O(log(n)/m) (see reading [4]); the overhead is negligible, with efficiency reduced by O((log(m)/m)k) (see reading [5]). A toy round-based illustration follows.
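A toy round-based illustration (Python) of why the service capacity grows: with one round defined as the time to upload one full copy, the single client-server uploader serves one peer per round, whereas in P2P every peer already holding the file serves another one, doubling the holders each round. This is only an intuition sketch, not the fluid models of [4,5]; chunking further shortens the effective round to roughly 1/m of the whole-file transfer time.

    import math

    def rounds_client_server(n):
        return n                      # the server uploads one full copy per round

    def rounds_p2p(n):
        holders, rounds = 1, 0        # the initial seed
        while holders < n + 1:        # until the n peers (plus the seed) hold the file
            holders *= 2              # every holder uploads one full copy per round
            rounds += 1
        return rounds                 # ~ log2(n)

    for n in (10, 100, 1000):
        print(n, rounds_client_server(n), rounds_p2p(n), math.ceil(math.log2(n + 1)))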

50 Swarming: Chunk-based transfer
Limits of the models [4,5], which assume an ideal scenario: homogeneous capacity (η=1); no transport-layer effects (η=1); no network bottleneck (η=1); no free riding (α=0); no churn (λ=0); global knowledge (η=1); ideal peer selection, i.e., each peer always knows to which peer to send the content at a given time (η=1); ideal piece selection, i.e., a peer is never blocked because it does not have the right piece to upload to others (η=1).
In practice: these are simple (but very telling) models of ideal swarming performance; P2P is very efficient when there is always a peer to send useful pieces to; they are optimistic benchmarks, and significant algorithmic tuning is needed to approach the bound (η→1). Careful interpretation: real-world P2P performance can easily fall far from the ideal one (η<<1).

51 Swarming: Parallel downloads
Dynamic parallel download: introduced to speed up Web caches in the early 2000s [6]. Idea: download different pieces from k servers in parallel; start by requesting pieces {P1, ..., Pk} from all available servers {S1, ..., Sk}; each time a piece Pj is received from Si, request from Si a new piece (one not previously requested from any other server). Two performance issues: inter-block idle time, solved by pipelining; termination idle time, solved by the "endgame" mode (BitTorrent terminology).

52 Swarming: Parallel downloads
Inter-block idle time: the time for a new request to reach the server after it has sent the last byte of a piece. Problem: 1 RTT of idle time, hence server underutilization (η<1). Solution: pipelining (recall HTTP/1.1), in two flavors: bufferless and buffered.

53 Swarming: Parallel downloads
(Bufferless) pipelining: ensure the next request hits the server before the current request is completely served; wait for the start of chunk reception before sending a new request. Advantages: avoids buffering of user requests at the server; the complexity lies at the requester (request scheduling); more flexible scheduling at the client (requests scheduled just in time). Disadvantages: assumes knowledge of the RTT; assumes chunk transmission time > RTT; underutilization is still possible in case of misestimation.

54 Swarming: Parallel downloads
(Buffered) pipelining: ensure the server always has a buffer of requests; send a stack of requests, then a new request at each chunk reception (see the sketch below). Advantages: efficient use of resources; no need for a precise RTT estimate provided the target number of outstanding requests is large. Disadvantages: the complexity lies at the sender (buffering of requests); lack of flexibility* (when an already-requested resource becomes available elsewhere).
* Flexibility comes at the price of complexity (e.g., cancelling pending requests) and potential problems (e.g., imagine bidirectional exchanges in band with request signaling).
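A minimal sketch (Python) of buffered pipelining across several servers; the function and variable names are made up, and delivery is simulated rather than performed over the network.

    from collections import deque

    def parallel_download(pieces, servers, window=3):
        # keep up to `window` outstanding requests per server
        todo = deque(pieces)                        # pieces not yet requested
        outstanding = {s: [] for s in servers}
        received = []

        def refill(server):
            while len(outstanding[server]) < window and todo:
                outstanding[server].append(todo.popleft())

        for s in servers:                           # initial stack of requests
            refill(s)

        # simulated delivery loop: each server returns its oldest pending piece
        while any(outstanding.values()):
            for s in servers:
                if outstanding[s]:
                    piece = outstanding[s].pop(0)   # piece arrives from s
                    received.append((s, piece))
                    refill(s)                       # immediately top up the window
        return received

    print(parallel_download(["P%d" % i for i in range(1, 9)], ["S1", "S2"]))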

55 Parallel download in P2P
Parallel download in CS Clients and servers are decoupled Small number of parallel downloads Parallel download in P2P Every peer is client and server (coupling) Large peer set (degree of parallelism)

56 Parallel download in P2P
Naive solution: every peer connects to every other peer. Unfeasible: not scalable, too large a number of connections. Inefficient: per-peer throughput scales as Ui/N. Impractical: some peers are not altruistic (and a global Ui/N share actually incentivizes free-riding). Practical solution: articulated as piece/peer selection; needs to deal with free-riding.

57 Parallel download in P2P
Ideal solution (forgetting free-riding for the time being, i.e., assuming a nice, ideal, altruistic world): every peer connects to a limited number d of peers; generally studied in terms of parallel upload (mirror view, equivalent problem). Tradeoff: number of peers served (d) versus per-peer service rate (1/d); increasing d adversely impacts the system for a static neighborhood [4]; even in dynamic swarms, the marginal gain for d>4 suggests d<10 [4]. BitTorrent mainline: if the upload rate is > 42 kB/s, uploads = int(math.sqrt(rate * .6)) (see the snippet below). Practical solution: articulated as piece/peer selection; needs to deal with free-riding.
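The mainline rule quoted above, wrapped in a tiny runnable snippet; the fixed fallback of 4 slots below 42 kB/s is an assumption added for illustration, not taken from the slide.

    import math

    def upload_slots(rate_kBps):
        if rate_kBps > 42:
            return int(math.sqrt(rate_kBps * .6))   # mainline square-root rule
        return 4                                    # assumed default number of slots

    for r in (20, 50, 100, 500, 1000):
        print(r, upload_slots(r))                   # 4, 5, 7, 17, 24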

58 Parallel download in P2P
Swarm: 10s-10,000s of peers (stats in [3]). Peer set: ~40-80 peers (L7 routing table info). Active peer set: ~4-10 peers (L4 ongoing transfers).

59 Parallel download in P2P
Gnutella: aim = content lookup; resource lookup = flooding on an unstructured (v0.4) or hierarchical (v0.6) overlay; piece selection = file splitting not in v0.6 (later addition); peer selection = order of answer; diffusion = poor performance, not thoroughly studied; free riding = not dealt with.
eMule/eDonkey: aim = content diffusion; resource lookup = flooding on a hierarchical network or DHTs; piece selection = rarest piece first (+ other criteria); peer selection = age in priority queue * credit modifier (U/D ratio); diffusion = slow reactivity; free riding = still possible.
BitTorrent: aim = content diffusion; resource lookup = out-of-band (DHTs, trackers, peer exchange, ...); piece selection = local rarest first; peer selection = choke algorithm on short-term bandwidth; diffusion = good performance, thoroughly studied; free riding = mostly unlikely.
In what follows we focus on BitTorrent.

60 Importance of peer/piece selection
The scalability results [4,5] assume the following: a peer can always find a peer to which to upload a useful piece (never idle, 100% efficient use of uplink capacity); peers never refuse to upload a piece; no free riders. If any of the above assumptions is relaxed, the system models no longer apply: what is the impact (worsening) on BitTorrent performance? Parallel download no longer applies as such (selfishness)!
Peer selection goals: always find peers to upload to; prevent free riding. Piece selection goals: always have useful data to send to others; piece diversity is called entropy.
BitTorrent peer selection aim: converge to the best upload-download match. BitTorrent piece selection aim: maximize the piece entropy of the swarm. Peer/piece selection order: peer selection first; idea: capacity is more important for diffusion performance.

61 Piece selection policies
Random piece selection: naive, but simple to implement; poor entropy [1]. Global rarest first: select the globally rarest piece; each peer needs to know the number of copies of every piece in the swarm; good entropy [1]. Local rarest first: approximates global rarest first; select the piece that is the rarest among the immediate neighbors; trades accuracy for communication cost and complexity; good entropy on a random graph (BitTorrent swarm) with a large degree (peer set) [1]. BitTorrent adds policies for corner cases: strict priority, random first piece, endgame mode, super-seeding. Peer selection is performed first; the selected piece will be the rarest within the active peer set.

62 Piece selection: Local Rarest First
Aim: minimize the overlap between buffermaps. Interest: equalize resource availability (maximize entropy); avoid all peers being interested in the same content (no piece hotspot); let peer selection efficiently exploit the available capacity; all peers always have something of interest to exchange (avoid idle time). A minimal selection sketch follows.
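A minimal sketch (Python) of local-rarest-first selection over hypothetical bitfields represented as sets of piece indices: count each piece's availability among neighbors, keep only the pieces we still miss, and pick the rarest (ties broken at random).

    import random
    from collections import Counter

    def local_rarest_first(my_pieces, neighbor_bitfields):
        availability = Counter()
        for bf in neighbor_bitfields:            # one set of piece indices per neighbor
            availability.update(bf)
        candidates = [p for p in availability if p not in my_pieces]
        if not candidates:
            return None                          # nothing useful to request
        rarest_count = min(availability[p] for p in candidates)
        rarest = [p for p in candidates if availability[p] == rarest_count]
        return random.choice(rarest)             # random tie-break

    print(local_rarest_first({0}, [{0, 1, 2}, {1, 2}, {2, 3}]))  # -> 3 (only one copy)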

63 Piece selection: Strict priority
Aim: pieces (256 kB - 10 MB) are divided into sub-pieces (16 kB); strict priority is useful to rapidly obtain complete pieces. Policy: if requests for a piece's sub-pieces have been sent, requests for the remaining sub-pieces are sent out before any request for a new piece is started. Interest: sub-pieces are useful for pipelining (efficient transfers), but only pieces have a signature for data verification, and only pieces can be shared, sub-pieces cannot (more details later).

64 Piece selection: Random First Piece
Aim: assist new leechers in obtaining their first piece. Policy: while no complete piece has been obtained yet, send requests for random pieces (an exception to Local Rarest First). Interest: when new leechers have no piece they cannot upload to other peers and must wait for an optimistic unchoke (OU); this lets new leechers exploit OU and regular unchokes (RU).

65 Piece selection: Endgame mode
Aim: assist leechers in becoming seeds. Policy: toward the end of the download, allow an exception to Rarest First, as the needed chunks may not be rare; send requests for all missing pieces to all active neighbors. Interest: having more seeds in the swarm ensures the longevity of the swarm.

66 Peer selection policies
BitTorrent Tit-for-tat (TFT) terminology. Choke/unchoke: A chokes B if it stops uploading to B; A unchokes B if it uploads to B. Interested/not interested: A is interested in B if B has ≥1 piece that A does not have; A is not interested in B if B's piece set is a subset of A's. TFT algorithm family: choke/unchoke, with a seed state and a leecher state, plus additional algorithms (anti-snubbing, optimistic unchoking). Choke metrics: capacity is the primary metric for a diffusion service; the choice could be based on several peer properties (capacity, delay, AS, ...) or a combination of the above.

67 Peer selection: Choke LS vs SS
Choke, leecher state (LS) (a leecher holds a partial copy). Regular unchoke (RU): every 10 seconds, sort peers according to their upload rate to the local peer and keep the 4 fastest uploaders. Optimistic unchoking (OU): every 30 seconds, choke 1 active peer and unchoke another at random; for the local peer, OU may let it discover new fast peers with whom to reciprocate; for the remote peer, OU may provide some piece to new peers having just joined the swarm; for the swarm, OU avoids peers exchanging only within small groups. No more than 4 peers are unchoked at the same time.
Choke, seed state (SS) (a seed holds a complete copy). Favor Upload (FU): same as in LS, but the sorting criterion is the fastest downloader; best for the swarm, since the seed already has a complete copy. Round Robin (RR): every 10 seconds the interested peers are ordered according to the time they were last unchoked; for two consecutive 10 s periods, the oldest 3 peers are unchoked plus an OU as 4th peer; for the third 10 s period, the oldest 4 peers are unchoked. FU may favor fast free-riders; RR favors all peers equally. A sketch of the leecher-state choke round follows.
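A minimal sketch (Python) of the leecher-state choke round described above, with hypothetical peer objects exposing an upload_rate_to_me attribute; using 3 regular slots plus 1 optimistic slot so that at most 4 peers are unchoked is an assumption, since the exact slot accounting varies across client versions.

    import random

    REGULAR_SLOTS = 3       # 3 regular + 1 optimistic = at most 4 unchoked peers

    def choke_round(interested_peers, round_index, current_optimistic=None):
        # one call = one 10-second round; returns (unchoked set, optimistic peer)
        by_rate = sorted(interested_peers,
                         key=lambda p: p.upload_rate_to_me, reverse=True)
        regular = by_rate[:REGULAR_SLOTS]            # regular unchoke: fastest uploaders
        if current_optimistic is None or round_index % 3 == 0:
            # rotate the optimistic unchoke every third round (i.e., every 30 s)
            others = [p for p in interested_peers if p not in regular]
            current_optimistic = random.choice(others) if others else None
        unchoked = set(regular)
        if current_optimistic is not None:
            unchoked.add(current_optimistic)
        return unchoked, current_optimistic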

68 Peer selection: Choke LS vs SS
[Figure: timelines over 10-second slots of regular (RU) and optimistic (OU) unchokes per peer ID, in leecher state (left) and in seed state with peers ordered oldest first (right), over a 60-second window]

69 Peer selection: Anti-snubbing
Aim: avoid snubbing problems (the inverse of free-riding). Policy: if we upload to a peer A that has choked us for more than 60 s, choke it; wait for an OU from A before unchoking A again (resuming the upload). Interest: a snub is an upload without a download; it may happen when our upload rate toward A is lower than that of the 4 best uploaders for A; to avoid being snubbed, we force the conditions for an OU from A by stopping the upload.

70 BitTorrent performance
Very popular and successful, empirically: what exactly makes it perform so well? Which parameter is crucial? Performance depends on many parameters, which are hard to model all analytically [4,5]; answers come via simulation [1] and experiments [2]. Some performance issues: are download rates optimal, and under which conditions? [1]; is the Local Rarest First policy good enough? [1]; does rate-based Tit-for-tat (TFT) work? [2]; how effective are the sharing incentives? [2]; protocol packetization and signaling overhead (η); resilience to free riding [7,8,9,10] (α).

71 Experiments [2]
Scenario: BitTorrent deployed on PlanetLab with an instrumented client (mainline 4.0.2, the 2nd most downloaded BitTorrent client at SourceForge, 51M downloads in 2009); the instrumented client logs control messages, state machine transitions, algorithm internals, bandwidth estimates, ... A single initial seed stays connected for the duration of the experiment; 40 leechers join at the same time (flash crowd) and leave as soon as they become seeds; content: a ~100 MB file, 400 pieces of 250 kB each; three classes of peers (slow/medium/fast). Aim: TFT performance with a slow vs a fast seed; main metric: clustering of exchanges with peers of the same class (TFT reciprocation).

72 Experiments [2]: Clustering index
World class W = Cfast ∪ Cmedium ∪ Cslow. Clustering index (CI) of peer P in class C:
CI(P,C) = Σ_{X∈C} RU(P,X) / Σ_{Y∈W} RU(P,Y)
Interpretation: a ratio of regular unchoke (RU) durations; CI(P,C)=1 if P unchoked only peers of class C; CI(P,C)=0 if P unchoked only peers outside C; CI(P,C)≈1/3 if P unchoked peers uniformly at random (over 3 classes). Scenario and aim as in the previous slide. A small computation sketch follows.
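A minimal sketch (Python) of the clustering index as defined above; the input format (a dict mapping (unchoker, unchoked) pairs to RU durations in seconds) and the toy numbers are made up.

    def clustering_index(p, cls, world, ru):
        in_class = sum(ru.get((p, x), 0.0) for x in cls)
        total = sum(ru.get((p, y), 0.0) for y in world)
        return in_class / total if total else 0.0

    # toy example: peer A spent 80 s unchoking fast peers and 20 s unchoking a slow one
    ru = {("A", "F1"): 50.0, ("A", "F2"): 30.0, ("A", "S1"): 20.0}
    fast, world = {"F1", "F2"}, {"F1", "F2", "S1"}
    print(clustering_index("A", fast, world, ru))   # -> 0.8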

73 TFT, fast seed [2]
We see clusters per class, with two artifacts: the slow-class squares are darker since their download takes longer, and peer 27 is slower than the other peers in its class (problem with a PlanetLab node). [Figure: unchoke matrix with the seed and the slow, medium and fast classes; peers 19, 20 and 27 annotated]

74 Sharing incentive, fast seed [2]
Fast peers complete close to the optimal completion time (vertical black line in the figure). In order to be unchoked by fast peers, peers need to reciprocate: the more you contribute, the faster you complete, i.e., an effective sharing incentive. With a well-provisioned initial seed: cluster formation, effective sharing incentives, good upload utilization.

75 TFT, slow seed [2]
We no longer see clusters. Reasons: with a slow source, the capacity bottleneck becomes a content bottleneck; fast peers reciprocate with slower peers when they are not interested in any other fast peer.

76 Sharing incentive, slow seed [2]
The earliest completion time (vertical black line) is longer than with a fast seed, and most peers complete close to it: the choking algorithm does not provide an effective sharing incentive when the seed is underprovisioned. With an underprovisioned initial seed: no clustering, ineffective sharing incentives. Seed provisioning is critical: the seed should be at least as fast as the fastest peer, for TFT efficiency as well.

77 Free riding (α)
Tit-for-tat / choke: enforces reciprocation and prevents free-riding, but is still open to some exploits [7,8,9,10]. Free-riding definitions: in economics, "free riders" are those who consume more than their fair share of a resource, or contribute less than a fair share of the costs of its production; in BitTorrent, free riding means upload < download [7,8,10] or no upload at all [9]. Overall results, single-peer perspective: free riding is possible [9,10]. BitThief [9]: opportunistically connects to a very large number of peers, exploits the seeds' RU and the leechers' OU, and uploads nothing. BitTyrant [10]: similar techniques, also exploits the leechers' RU, and provides some (non-zero) upload. Whole-system perspective: a few free riders have limited impact on the whole system [7]; however, performance is hurt when all peers use BitTyrant [10].

78 Protocol overhead
Overhead is due to signaling (control messages) and packetization (40 B of TCP/IP headers). Useful payload (P): bytes received/transmitted in PIECE messages (without L7 headers). Overhead: the ratio of non-payload bytes over the total amount of bytes (overhead + payload). BitTorrent protocol overhead: ~3% byte-wise overhead for most of the experiments; the signaling overhead means many packets but very few bytes; the packetization overhead is 40 B / 1500 B MSS ≈ 2.6%; the control messages that account for most of the signaling overhead are HAVE, REQUEST and BITFIELD.

79 References
Optional readings:
[1] A. R. Bharambe, C. Herley, and V. N. Padmanabhan, "Analyzing and Improving a BitTorrent Network's Performance Mechanisms". IEEE INFOCOM'06, Barcelona, Spain, April 2006.
[2] A. Legout, N. Liogkas, E. Kohler, and L. Zhang, "Clustering and Sharing Incentives in BitTorrent Systems". ACM SIGMETRICS'07.
For your thirst of knowledge:
[3] C. Zhang, P. Dunghel, D. Wu, K.W. Ross, "Unraveling the BitTorrent Ecosystem". IEEE Transactions on Parallel and Distributed Systems, 2011.
[4] X. Yang and G. de Veciana, "Service Capacity of Peer to Peer Networks". IEEE INFOCOM'04.
[5] D. Qiu and R. Srikant, "Modeling and Performance Analysis of BitTorrent-Like Peer-to-Peer Networks". ACM SIGCOMM'04.
[6] P. Rodriguez, E. W. Biersack, "Dynamic Parallel Access to Replicated Content in the Internet". IEEE INFOCOM'00.
[7] N. Liogkas, R. Nelson, E. Kohler, and L. Zhang, "Exploiting BitTorrent For Fun (But Not Profit)". In Proc. of USENIX IPTPS'06, Santa Barbara, CA, February 2006.
[8] S. Jun and M. Ahamad, "Incentives in BitTorrent Induce Free Riding". In Proc. of the Workshop on Economics of Peer-to-Peer Systems (P2PEcon'05), Philadelphia, PA, August 2005.
[9] T. Locher, P. Moor, S. Schmid, and R. Wattenhofer, "Free Riding in BitTorrent is Cheap". In Proc. of ACM HotNets-V, Irvine, CA, November 2006.
[10] M. Piatek, T. Isdal, T. Anderson, A. Krishnamurthy, A. Venkataramani, "Do Incentives Build Robustness in BitTorrent?" In Proc. of USENIX NSDI 2007.

80 ??


Download ppt "RES 203 Internet applications"

Similar presentations


Ads by Google