BitTorrent. BitTorrent network. On the itinerary: Introduction to BitTorrent; Basics & properties; 3 interesting analysis results.


1 BitTorrent

2 BitTorrent network
• On the itinerary:
• Introduction to BitTorrent
• Basics & properties
• 3 interesting analysis results

3 Publishing
• How to publish a (usually large) file?
• Dedicated server:
  - Easy to manage, easy to find, persistent service
  - Nevertheless…
• BitTorrent:
  - Organizes multiple clients that share the same file
  - Leverages the upload bandwidth of the participants
  - Self-scaling, resilient, operates well during “flash crowd” periods
  - Accounts for 50%-60% of all p2p traffic

4 The Basics of BitTorrent
• A content provider (everyday people) wants to publish a file and acts as the initial seed:
  - Creates a meta file (*.torrent)
  - Publishes this (light) *.torrent file on a web server
  - The file is broken into small blocks (32-256 KB)
  - Uploads blocks to other peers
• The goal: to publish the file to many nodes with the help of other peers, with minimum load on the seeder
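The block-splitting step above can be sketched in a few lines of Python. This is only an illustration: the function names and the 256 KB block size (the upper end of the range quoted on the slide) are assumptions, and the per-block SHA-1 digest reflects how the original BitTorrent metainfo format lets downloaders verify each piece, not anything stated on the slide.

```python
import hashlib

# Assumed block size: the upper end of the 32-256 KB range quoted on the slide.
BLOCK_SIZE = 256 * 1024


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut the file content into fixed-size blocks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_digests(blocks: list[bytes]) -> list[str]:
    """One SHA-1 hex digest per block, so a downloader can verify each block."""
    return [hashlib.sha1(block).hexdigest() for block in blocks]


if __name__ == "__main__":
    payload = bytes(600 * 1024)  # stand-in for a real file: 600 KB of zeros
    blocks = split_into_blocks(payload)
    print(len(blocks), [len(b) for b in blocks])
    print(block_digests(blocks)[0])
```

A real client would read the file in a streaming fashion rather than hold it in memory; the in-memory version keeps the sketch short.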

5 The Basics of BitTorrent
• Third party (the tracker):
  - A tracker site keeps track of the active participants, plus extra statistics
  - Upon requests from nodes, supplies a random subset of active nodes
  - Receives updates from the active nodes
  - Keeps track of new nodes joining the ‘torrent’ (or ‘swarm’) and of nodes that have left it
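The tracker's role can be illustrated with a toy in-memory version. This is a sketch under assumptions, not the real tracker protocol: the class `ToyTracker`, its methods, and the string peer identifiers are hypothetical, and the default of 40 neighbors is borrowed from the "~40" figure quoted elsewhere in the deck.

```python
import random


class ToyTracker:
    """In-memory stand-in for a tracker: remembers who is active in one swarm."""

    def __init__(self, max_peers: int = 40, seed=None):
        self.max_peers = max_peers          # assumed cap on the returned subset
        self.active: set[str] = set()       # peers currently in the swarm
        self._rng = random.Random(seed)     # seedable for reproducible runs

    def announce(self, peer: str) -> list[str]:
        """Register `peer` and return a random subset of the other active peers."""
        others = sorted(self.active - {peer})
        self.active.add(peer)
        k = min(self.max_peers, len(others))
        return self._rng.sample(others, k)

    def leave(self, peer: str) -> None:
        """Forget a peer that has left the swarm."""
        self.active.discard(peer)


if __name__ == "__main__":
    tracker = ToyTracker(max_peers=2, seed=1)
    for name in ["seed", "p1", "p2", "p3"]:
        print(name, "->", tracker.announce(name))
```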

6 The Basics of BitTorrent
• An interested peer (leecher):
  - Obtains the public *.torrent file
  - Is directed to the tracker
  - Obtains a list of random neighbors (~40)
  - Downloads blocks from, and uploads blocks to, its ‘best’ neighbors (choking and unchoking)
  - Upon download completion, becomes a seed

7 BitTorrent demo (Wikipedia)


17 BitTorrent basic schemes
• (Immediate) problems arise:
  - Last block problem: assuming nodes depart upon completion, how would leechers obtain the last block?
  - Free riding problem: peers willing to download but unwilling to serve others
• Simple and effective solutions:
  - Last block problem: nodes employ a Local Rarest First policy
  - Free riding problem: nodes employ a “Tit For Tat” policy, i.e. give more to those from whom you receive more
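Both policies reduce to small selection rules, sketched below. The function names, the tie-breaking rules, and the default of four upload slots are assumptions made for illustration; real clients add details (such as optimistic unchoking) that are not covered here.

```python
from collections import Counter


def pick_rarest(missing: set[int], neighbor_have: dict[str, set[int]]):
    """Local Rarest First: return the missing block held by the fewest neighbors.

    Returns None when no neighbor holds any block we still need.
    """
    counts = Counter(
        block
        for have in neighbor_have.values()
        for block in have
        if block in missing
    )
    if not counts:
        return None
    # Fewest holders first; ties broken by lowest block index for determinism.
    return min(counts, key=lambda block: (counts[block], block))


def pick_unchoked(received_bytes: dict[str, int], slots: int = 4) -> list[str]:
    """Tit For Tat: upload to the neighbors that recently sent us the most data."""
    ranked = sorted(received_bytes, key=lambda peer: (-received_bytes[peer], peer))
    return ranked[:slots]


if __name__ == "__main__":
    have = {"a": {0, 1}, "b": {0, 2}, "c": {0, 1}}
    print(pick_rarest({0, 1, 2}, have))                       # block 2 has one holder
    print(pick_unchoked({"a": 10, "b": 50, "c": 30}, slots=2))
```

Requesting the rarest block first keeps every block replicated in the neighborhood, which is why it mitigates the last block problem when seeds depart.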

18 On the itinerary
• One interesting limitation of BitTorrent networks
  - BitTorrent provides poor service availability
  - Shown via analysis of tracker logs over a long period
• A proposed peer selection approach
  - Lowers the costs and resource usage of ISPs
  - Does not require cooperation between ISPs and peers
  - Implemented as part of the Azureus client, codename ‘Ono’
• Tradeoffs
  - Performance (avg. download time) vs. fairness (avg. share ratio)

19 BitTorrent limitations
• Taken from the paper “Measurements, Analysis and Modeling of BitTorrent-like systems” [1]
  - Inspects overall performance over the lifetime of a torrent
  - Analysis is based on traces
  - A model is derived and used to draw conclusions
  - The derived conclusions are verified against the observed behavior
• Limitations were found:
  - Poor service availability (coming next)
  - Fluctuating download performance
  - Unfair service to peers

20 Service availability - Analysis
• Analysis is based on traces
  - Tracker logs (~1500 torrents, sampled every 30 sec)
  - Traces from servers that publish *.torrent files
• Extracted data
  - Peer identities
  - Birth time and size of the torrent
  - For each peer in the same torrent: arrival time, download & upload bandwidth, cumulative downloaded & uploaded bytes

21 Service availability - Analysis
• Y-axis at time t: the total number of requests for all torrents in the trace, minus the cumulative number of requests for all torrents by time t, measured from each torrent's birth
• Similar observations are seen in the *.torrent metadata traces

22 Service availability - Analysis
• This suggests that the rate of requests decreases exponentially after a torrent is born
• Notice that this is a cumulative measure
• Does a specific torrent behave the same way?
  - Use the least squares method to measure how much a specific torrent deviates from this logarithmic fit
  - Analyze the distribution of the average relative deviation (which is mostly small, 6% on average)

23 Service availability – A model
• Based on the observations, define the torrent popularity at time t as the peer arrival rate, which is the derivative of the peer arrival time distribution for that torrent
• Arrival rate: $\lambda(t) = \lambda_0 e^{-t/\tau}$
  - where $\lambda_0$ is the initial arrival rate when the torrent starts
  - and $\tau$ is the attenuation parameter
  - Both are evaluated from the observations

24 Service availability – A model
• Define the torrent lifespan: the duration from the torrent's birth to the time when no complete copy remains (so new leechers cannot complete the download)
  - The inter-arrival time between two successively arriving peers can be approximated as $\delta t = 1/\lambda(t)$, where $\lambda(t)$ is the peer arrival rate
  - Assume seeds leave the system at rate $\gamma$; then the average service time of a seed is approximately $1/\gamma$

25 Service availability – A model
• Look at consecutive peers that join the torrent
  - Peers n and n+1 join the torrent at times t(n) and t(n+1), respectively
  - The inter-arrival time between them is approximately $1/\lambda(t(n))$, the inverse of the peer arrival rate
  - Peer n downloads the file with speed u(n) and stays in the torrent for about $1/\gamma$, the average seed service time
  - When the peer arrival rate is small enough (n is large), peer n+1, with speed u(n+1) <= u(n), can only be served by peer n

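26 Service availability – A model
• Thus, when the inter-arrival time exceeds the average seed service time, $1/\lambda(t) > 1/\gamma$, peer n+1 cannot complete the download and the torrent is dead
• Using the definition of the arrival rate, $\lambda(t) = \lambda_0 e^{-t/\tau}$, we get the torrent lifespan: $T_{life} = \tau \ln(\lambda_0 / \gamma)$
• Both $\lambda_0$ and $\tau$ are extracted from the trace (using linear regression), as well as $\gamma$
• Compare the results from the model to what the trace holds
  - The results match very well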
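The fitting and lifespan steps can be sketched numerically. The sketch assumes the exponential arrival-rate model $\lambda(t) = \lambda_0 e^{-t/\tau}$ and a seed departure rate $\gamma$, so that the torrent dies when $1/\lambda(t)$ reaches $1/\gamma$; the function names and the synthetic data are illustrative and are not taken from the paper's traces.

```python
import math


def fit_arrival_rate(times, rates):
    """Least-squares line through (t, ln rate); returns (lambda0, tau).

    Under lambda(t) = lambda0 * exp(-t / tau), ln lambda(t) is linear in t
    with intercept ln(lambda0) and slope -1 / tau.
    """
    n = len(times)
    logs = [math.log(r) for r in rates]
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -1.0 / slope


def torrent_lifespan(lambda0, tau, gamma):
    """Time at which lambda(t) drops to gamma: tau * ln(lambda0 / gamma)."""
    return tau * math.log(lambda0 / gamma)


if __name__ == "__main__":
    # Synthetic, noise-free observations generated from lambda0 = 100, tau = 2.
    ts = list(range(10))
    rs = [100.0 * math.exp(-t / 2.0) for t in ts]
    lam0, tau = fit_arrival_rate(ts, rs)
    print(lam0, tau, torrent_lifespan(lam0, tau, gamma=1.0))
```

The lifespan grows only logarithmically with the initial arrival rate, which is one way to see why even very popular torrents die quickly under this model.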

27 Model vs. Observations
• Comparison of torrent lifespans: the average lifespan is 8.89 according to the trace and 8.34 based on the model

28 Service availability - Summary
• Conclusions were obtained from extensive trace analysis and modeling
  - Existing BitTorrent systems provide poor service availability
  - This is due to the exponentially decreasing peer arrival rate
  - This provides strong motivation for inter-torrent collaboration

29 Next on the itinerary
• One interesting limitation of BitTorrent networks
  - BitTorrent provides poor service availability
  - Shown via analysis of tracker logs over a long period
• A proposed peer selection approach
  - Lowers the costs and resource usage of ISPs
  - Does not require cooperation between ISPs and peers
  - Implemented as part of the Azureus client, codename ‘Ono’
• Tradeoffs
  - Performance (avg. download time) vs. fairness (avg. share ratio)

30 Reducing cross-ISP traffic
• Taken from the paper “Taming the torrent” [2]
• Motivation: the overwhelming popularity of p2p (70% of internet traffic worldwide) has yielded significant revenues for ISPs
• However, p2p traffic has significantly increased ISPs' costs, particularly in terms of cross-ISP traffic
• This has driven ISPs to try to forcefully reduce p2p traffic
  - Blocking specific ports, tricking clients into closing connections
  - Deep packet inspection
  - Caching

31 Reducing cross-ISP traffic
• One approach to alleviate this pain is to use an oracle that provides knowledge about which peers are in the same ISP
• This would benefit both the ISPs and the p2p community
• But it requires p2p users and ISPs to collaborate and to trust each other
• Not likely to be adopted

32 Reducing cross-ISP traffic
• Another approach is to recycle data that is already being collected by Content Distribution Networks (CDNs)
• CDNs attempt to improve web performance by redirecting requests to replica servers
• The goal is to help content providers (e.g. CNN) distribute content by redirecting requests to replica servers that:
  - Are topologically proximate
  - Provide lower latency

33 CDNs as oracles
• Hypothesis: when peers exhibit similar redirection behavior, they are likely to be close to the same replica server, and thus to each other
• Represent redirection behavior using ratio-maps
• Each ratio represents the frequency of redirection to a specific replica
• The number of replicas is usually small (max 31)
• Keep a time window (~ a day)

34 CDN redirections as ratio-maps
• The ratio-map of a peer a is a set of (replica server, ratio) pairs
• Specifically, if peer a is redirected toward replica server r1 for 75% of the time window, and toward replica server r2 for 25% of the time window, then the corresponding ratio-map is {(r1, 0.75), (r2, 0.25)}
• The sum of all ratios in a given ratio-map equals one
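A ratio-map can be built from a log of observed redirections. In this sketch the function name `ratio_map` is hypothetical, and each logged redirection is assumed to represent an equal share of the time window (for example, lookups taken at a fixed interval).

```python
from collections import Counter


def ratio_map(redirections: list[str]) -> dict[str, float]:
    """Map each replica server to the fraction of observations that pointed to it.

    Assumes every entry in `redirections` covers an equal slice of the window.
    """
    if not redirections:
        return {}
    counts = Counter(redirections)
    total = sum(counts.values())
    return {replica: count / total for replica, count in counts.items()}


if __name__ == "__main__":
    # Four evenly spaced lookups: three answered with r1, one with r2.
    print(ratio_map(["r1", "r1", "r2", "r1"]))
```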

35 Similarity via ratio-maps
• Define a metric that, given two peers, produces a value describing the similarity between the peers’ redirection behavior
• We are looking for overlap between the redirection frequency maps of each pair of peers
• Use the cosine similarity between the ratio-maps of two peers a and b

36 Cosine-similarity
• Distance(a,b): $\mathrm{cos\_sim}(a,b) = \dfrac{\sum_{i \in I_a} \mu_{a,i}\,\mu_{b,i}}{\sqrt{\sum_{i \in I_a} \mu_{a,i}^2 \cdot \sum_{i \in I_b} \mu_{b,i}^2}}$
  - Each sum is over the set $I_a$ ($I_b$) of replica servers to which peer a (b) has been redirected over the time window
  - $\mu_{a,i}$ is the ratio of time that peer a has been redirected to replica server i
• Cosine similarity is analogous to a dot product
  - When the maps are identical, it equals 1
  - When the maps are orthogonal (no common replica), it equals 0
  - Values lie in [0,1]
  - Determine a threshold (0.15)
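A minimal sketch of the similarity test follows, with ratio-maps represented as plain dictionaries from replica name to ratio. The names `cosine_similarity` and `is_recommended` are hypothetical, and treating "above the 0.15 threshold" as the recommendation rule is an assumption based on the threshold quoted on the slide.

```python
import math

THRESHOLD = 0.15  # threshold value quoted on the slide


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine of the angle between two ratio-maps viewed as sparse vectors."""
    # Only replicas present in both maps contribute to the dot product.
    dot = sum(a[r] * b[r] for r in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_recommended(a: dict[str, float], b: dict[str, float],
                   threshold: float = THRESHOLD) -> bool:
    """Assumed rule: prefer a peer whose similarity exceeds the threshold."""
    return cosine_similarity(a, b) > threshold


if __name__ == "__main__":
    peer_a = {"r1": 0.75, "r2": 0.25}
    peer_b = {"r1": 0.25, "r2": 0.75}
    peer_c = {"r3": 1.0}
    print(cosine_similarity(peer_a, peer_b), is_recommended(peer_a, peer_b))
    print(cosine_similarity(peer_a, peer_c), is_recommended(peer_a, peer_c))
```

Because ratios are non-negative, the result always falls in [0, 1], matching the range given on the slide.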

37 CDNs implementation
• Ono, an extension to the Azureus client
• Upon handshake, two peers exchange ratio-maps
• This enables Ono to perform biased peer selection
  - Performs a DNS lookup for each CDN name to determine redirection behavior and encodes it in ratio-maps
  - Periodically updates the ratio-maps
• Overhead is extremely small
  - 18KB upstream, 36KB downstream per day
  - Computing the cosine similarity is easy

38 Ono-recommended empirical results
• Over 120,000 peers use Ono
• Ono collects extra network data
• Ping and traceroute to replica servers and peers
• Obtain feedback on the biased peer selection
  - Cross-ISP hops are not easy to determine
  - IP hops are easy to count and give some measure
• Compare Ono-recommended peer selection to random peer selection

39 Ono-recommended empirical results
• Cumulative distribution function of the number of IP hops along paths between an Ono client and its peers. Each value represents the average number of hops for all peers seen by a particular Ono client during a 6-hour interval
• Ono finds shorter paths
  - The median is less than half that of random selection
  - More than 20% are only one hop away, vs. less than 2% for random selection

40 Ono-recommended empirical results
• Each IP address was mapped to the corresponding Autonomous System (AS) id
• Similar to the previous graph
  - Over 33% of the paths found by Ono do not leave the origin AS
  - Median number of AS hops is one, vs. less than 10% in the random case

41 CDNs as oracles - Summary
• Recycles network views collected by CDNs
• Good internet citizenship in terms of reducing cross-ISP traffic
• Performance of peers is not affected
• Scalable (the more clients adopt it, the more accurate the bias becomes)
• Easily and freely available

42 Last on the itinerary
• One interesting limitation of BitTorrent networks
  - BitTorrent provides poor service availability
  - Shown via analysis of tracker logs over a long period
• A proposed peer selection approach
  - Lowers the costs and resource usage of ISPs
  - Does not require cooperation between ISPs and peers
  - Implemented as part of the Azureus client, codename ‘Ono’
• Tradeoffs
  - Performance (avg. download time) vs. fairness (avg. share ratio)

43 BitTorrent Tradeoffs
• Taken from the paper “The delicate tradeoffs in BitTorrent-like file sharing protocol design” [3]
• Peers that participate in BT are heterogeneous with regard to download and upload capacities
• Taking a system approach
  - The system throughput depends critically on the “fat” peers
  - However, this might result in unfairness towards those who contribute more
  - This in turn would encourage peers to supply a low upload rate to others

44 BitTorrent Tradeoffs
• A user expects the download rate it gets to be proportional to the upload rate it supplies
• Assume peers take the system overview
• Long-lasting, steady state, rational
• Two parameters and their interrelations are explored
• Performance: minimum average download time
• Fairness: the ratio between give and take

45 Model
• W.L.O.G. the file is of size 1
• Assume an average peer arrival rate of $\lambda$
• Assume peers do not abort
• Upon completion, a peer leaves the torrent (BT provides no incentive for seeding)
• Assume n classes (types) of peers
• Each newly arriving peer belongs to type i with probability $\alpha_i$
• Thus, the average arrival rate for class i is $\lambda_i = \alpha_i \lambda$
• Class i has Ui and Di as upload and download capacities
• Assume U1>U2>…>Un (type 1 are the “fat” ones…)

46 Model
• Visualization of the model for n=2

47 Model – measuring performance
• Assume (quite naturally) that the bottleneck is the upload capacity
• i.e. no network bottleneck such as server saturation
• The file uploading capacity of the entire system is
• Consider the steady state:
• Define as the average number of type i peers
• Approximation:
• Substitute the latter into the former, in steady state:

48 Model – measuring performance
• In steady state, the capacity should be equal to the arrival rate (as the file size is 1)
• Obtain
• Define “share-ratio” and rewrite as
  - In a steady and balanced system, the share-ratio should be 1
• Average system download time
• Those two equations define the solution space, as well as the resulting performance
• Feasible solutions are the set

49 Model – measuring fairness
• Consider the share-ratio as a good and natural measure of fairness
• Define fairness index:
• Measures how equal the ratios are (if all are the same, it equals 1)
• (after some work) Obtain:
• Also expressed in terms of the upload and download of each class
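The transcript does not preserve the paper's fairness index formula. As a stand-in with the same stated property (it equals 1 exactly when all share-ratios are equal), the sketch below uses Jain's fairness index, a widely used measure; whether the paper uses this particular index is an assumption, and the function name and sample ratios are illustrative.

```python
def jain_index(share_ratios: list[float]) -> float:
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Equals 1.0 when all values are equal and approaches 1/n as one value
    dominates. Used here only as a stand-in for the paper's own index.
    """
    n = len(share_ratios)
    squares = sum(x * x for x in share_ratios)
    if n == 0 or squares == 0:
        raise ValueError("need at least one non-zero share ratio")
    total = sum(share_ratios)
    return (total * total) / (n * squares)


if __name__ == "__main__":
    print(jain_index([1.0, 1.0, 1.0]))  # every class gives exactly what it takes
    print(jain_index([2.0, 0.5]))       # one class over-contributes
```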

50 Rate strategies
• General assumption: all peers maximize their upload capacity
• Based on experiments
• We want the optimal average download time T
• Solve the constrained optimization problem
  - Use the Lagrange multiplier method

51 Rate strategies
• The optimal average download time T that is obtained is:
• We get an assignment for
• The system gives the “thin” peers (other than type-1) maximum upload capacity
• “Thin” peers get more than they contribute
• Calculate the fairness index under this solution (shown to be quite low)

52 Rate strategies
• Now, apply the same strategy to achieve optimal fairness
• Then check the resulting performance measure
• Optimal fairness is achieved when
• We get different assignments for
• Compare the two:
• In terms of system performance, we have:
• In terms of system fairness, we have:

53 Entire design space
• Actually, those are only two of infinitely many solutions for assigning
• The space lies on a curve
• Example: a system of two types with specific capacities

54 Simulation results
• Experiments with a two-type system (with the same characteristics as the last system)
• Average downloading time:
• Fairness:

55 Simulation results
• Fundamental tradeoffs:
• Taken for the extreme values of the two strategies:
• Summary:
• Cannot enjoy both heavens
• Current BitTorrent implementations lie somewhere on the curve

56 Think about …
• With regard to the first paper (service availability), how was the ~8.3 average torrent lifespan deduced?
• Thank you (those who are still awake…)

57 Papers
[1] Lei Guo, Songqing Chen, Zhen Xiao, Enhua Tan, Xiaoning Ding, and Xiaodong Zhang. Measurements, Analysis, and Modeling of BitTorrent-like Systems.
[2] David R. Choffnes and Fabián E. Bustamante. Taming the Torrent - A Practical Approach to Reducing Cross-ISP Traffic in Peer-to-Peer Systems.
[3] Bin Fan, Dah-Ming Chiu, and John C.S. Lui. The Delicate Tradeoffs in BitTorrent-like File Sharing Protocol Design.

