THE BITTORRENT PROTOCOL OVERVIEW BY ANATOLY RABINOVICH AND VLADIMIR OSTROVSKY Peer-to-Peer File Sharing.

1 THE BITTORRENT PROTOCOL OVERVIEW BY ANATOLY RABINOVICH AND VLADIMIR OSTROVSKY Peer-to-Peer File Sharing

2 Outline  PEER-TO-PEER CONCEPT  OVERVIEW OF P2P GENERATIONS  OVERVIEW OF BITTORRENT PROTOCOL  BitTorrent is an Auction: Analyzing and Improving BitTorrent’s Incentives  CONCLUSION

3 What is Peer-to-Peer?  Every node is designed to provide some service that helps other nodes in the network obtain service (although a user may choose not to provide it).  Each node potentially has the same responsibility.  Resources can be shared in different ways:  CPU cycles: SETI@Home  Storage space: Napster, Gnutella, Freenet…

4 P2P: Why so attractive?  Peer-to-peer applications have grown explosively in recent years.  Low cost and high availability of large numbers of computing and storage resources.  Increased network connectivity.  As long as these factors remain important, peer-to-peer applications will continue to gain importance.

5 P2P: Why so attractive? Cont.  An important goal in P2P networks is that all clients provide resources, including:  Bandwidth  Storage Space  Computing Power  As nodes arrive and demand on the system increases, the total capacity of the system also increases.  The distributed nature of P2P networks also increases robustness in case of failures by replicating data over multiple peers.  Pure P2P systems go further by enabling peers to find the data without relying on a centralized index server, so there is no single point of failure in the system.

6 Overview of P2P Generations First Generation - Server-client  Centralized server system.  The server controls traffic among the users.  The servers store directories of the users' shared files, which are updated when a user logs on.  The server-client system is quick and efficient because the central directory is constantly updated and all users have to be registered to use the program.

7 Overview of P2P Generations First Generation – Cont. Disadvantages:  There is only a single point of entry, so a failure of the central server could bring down the whole network.  Information can be out of date, or links broken, if the server is not refreshed. Examples:  Napster, eDonkey2000…

8 Overview of P2P Generations Second Generation - Decentralization  After Napster encountered legal troubles, Justin Frankel of Nullsoft set out to create a network without a central index server; Gnutella was the result.  Unfortunately, the model of all nodes being equal quickly broke down because of bottlenecks.  The problem was solved by making some nodes 'more equal than others'.  Electing some higher-capacity nodes as indexing nodes, with lower-capacity nodes branching off from them, allowed the network to scale to a much larger size.  The second generation also includes distributed hash tables (DHTs), which help solve the scalability problem by electing various nodes to index certain hashes.
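
The DHT idea of "electing various nodes to index certain hashes" can be illustrated with a minimal sketch. It assumes one possible rule (the node whose identifier has the smallest XOR distance to the key's hash is responsible for that key); the function names `node_id` and `responsible_node` and the node names are hypothetical and not taken from the slides.

```python
import hashlib

def node_id(name: str) -> int:
    """Derive a 160-bit identifier from a name (illustrative choice: SHA-1)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

def responsible_node(key: str, nodes: list[str]) -> str:
    """Return the node whose ID is closest, by XOR distance, to the key's hash."""
    k = node_id(key)
    return min(nodes, key=lambda n: node_id(n) ^ k)

# Hypothetical node names; any peer can compute the same answer locally.
nodes = ["node-a", "node-b", "node-c", "node-d"]
owner = responsible_node("some-file-hash", nodes)
```

Because every peer evaluates the same function, no central index is needed to find out which node indexes a given hash, and nodes that join or leave only affect the keys closest to them.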

9 Overview of P2P Generations Third Generation - Indirect and Encrypted  The third generation of peer-to-peer networks consists of those that have anonymity features built in.  A degree of anonymity is achieved by routing traffic through other users' clients, which act as network nodes.  Friend-to-friend networks only allow already-known users (also known as "friends") to connect to the user's computer; each node can then forward requests and files anonymously between its own friends' nodes. Disadvantages:  Most current implementations incur too much overhead in their anonymity features, making them slow or hard to use.

10 Overview of P2P Generations Third Generation - Streams over P2P  Apart from traditional file sharing, there are services that send streams instead of files over a P2P network.  Thus one can listen to radio and watch television without any server involved: the streaming media is distributed over a P2P network.  Importantly, these services use the swarming technology known from BitTorrent instead of a tree-like network structure.

11 Overview of the BitTorrent Protocol  BitTorrent is a peer-to-peer file sharing protocol used to distribute large amounts of data.  BitTorrent is one of the most common protocols for transferring large files, and by some estimates it accounts for about 35% of all traffic on the entire Internet.

12 Overview of the BitTorrent Protocol – Cont.  The protocol starts when a file provider makes a file (or group of files) available as the first seed, which allows others, called peers, to download the data.  Each peer that downloads the data also uploads it to other peers and is encouraged to keep the data available after its download has completed, becoming an additional seed.  Because of this, BitTorrent is extremely efficient: a single seed is enough to begin spreading files among many users (peers).  Adding more seeds increases the likelihood of a successful connection. Relative to standard Internet hosting, this provides a significant reduction in the original distributor's hardware and bandwidth resource costs.  It also provides redundancy against system problems and reduces dependence on the original distributor.

13 How does it Work? Cont.  To share a file or group of files, a peer first creates a small file called a "torrent" (e.g. MyFile.torrent).  This file contains metadata about the files to be shared and about the tracker, the computer that coordinates the file distribution.  Peers that want to download the file must first obtain a torrent file for it, and connect to the specified tracker, which tells them from which other peers to download the pieces of the file.

14 How does it Work? Creating and Publishing Torrents  The peer distributing a data file treats the file as a number of identically sized pieces, typically between 64 KB and 4 MB each.  Piece sizes greater than 512 KB reduce the size of the torrent file for a very large payload, but are claimed to reduce the efficiency of the protocol.  The peer creates a checksum for each piece, using the SHA-1 hashing algorithm, and records it in the torrent file.  When another peer later receives a particular piece, it compares the checksum of the piece with the recorded checksum to test that the piece is error-free.  Peers that provide a complete file are called seeders, and the peer providing the initial copy is called the initial seeder.  Torrent files have an "announce" section, which specifies the URL of the tracker, and an "info" section, which contains (suggested) names for the files, their lengths, the piece length used, and a SHA-1 hash code for each piece; clients use all of this to verify the integrity of the data they receive.  The tracker maintains lists of the clients currently participating in the torrent.  Alternatively, in a trackerless system (decentralized tracking) every peer acts as a tracker.  E.g. the BitTorrent and µTorrent clients…
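
The piece checksums described above can be sketched in a few lines. This is a minimal illustration, not a torrent-file writer: the piece size of 256 KB is an assumed value within the stated range, and the helper names `piece_hashes` and `verify_piece` are hypothetical.

```python
import hashlib

PIECE_LENGTH = 256 * 1024  # assumed piece size, within the 64 KB to 4 MB range

def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> list[bytes]:
    """Split data into fixed-size pieces and return the SHA-1 digest of each."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]

def verify_piece(piece: bytes, index: int, hashes: list[bytes]) -> bool:
    """Check a received piece against the checksum recorded for that index."""
    return hashlib.sha1(piece).digest() == hashes[index]

# Sample payload: 1 MiB plus a short tail, so the last piece is shorter.
payload = bytes(range(256)) * 4096 + b"tail"
hashes = piece_hashes(payload)

first_piece = payload[:PIECE_LENGTH]
corrupted = b"\xff" + first_piece[1:]  # flip the first byte to simulate an error
```

Here `verify_piece(first_piece, 0, hashes)` succeeds while the corrupted copy is rejected, which is how a downloader can detect a damaged piece and request it again from another peer.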

15 How does it Work? Downloading Torrents and Sharing Files  The client connects to the tracker(s) specified in the torrent file, from which it receives a list of peers currently transferring pieces of the file(s) specified in the torrent.  The client connects to those peers to obtain the various pieces.  Such a group of peers connected to each other to share a torrent is called a swarm.  If the swarm contains only the initial seeder, the client connects directly to it and begins to request pieces.  As peers enter the swarm, they begin to trade pieces with one another, instead of downloading directly from the seeder.
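
The tracker's role of handing out peer lists can be illustrated with a toy in-memory stand-in. Real trackers are network services; the class `ToyTracker`, its `announce` method, and the peer addresses below are hypothetical names used only for this sketch.

```python
from collections import defaultdict

class ToyTracker:
    """In-memory stand-in for a tracker: one set of peers (a swarm) per torrent."""

    def __init__(self) -> None:
        self.swarms: dict[str, set[str]] = defaultdict(set)

    def announce(self, torrent_id: str, peer: str) -> list[str]:
        """Register the peer and return the other peers already in the swarm."""
        others = sorted(self.swarms[torrent_id] - {peer})
        self.swarms[torrent_id].add(peer)
        return others

tracker = ToyTracker()
first = tracker.announce("MyFile.torrent", "seeder:6881")   # swarm is empty so far
second = tracker.announce("MyFile.torrent", "peer-1:6881")  # sees only the seeder
third = tracker.announce("MyFile.torrent", "peer-2:6881")   # sees seeder and peer-1
```

The second client learns only about the initial seeder, while the third also learns about `peer-1` and can trade pieces with it instead of loading the seeder alone.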

16 How does it Work? Downloading Torrents and Sharing Files – Cont.  Clients incorporate mechanisms to optimize their download and upload rates; for example, they download pieces in a random order.  The effectiveness of this data exchange depends largely on the policies that clients use to determine to whom to send data.  Clients may prefer to send data to peers who send data back to them (a tit-for-tat scheme), which encourages fair trading.  But strict policies often result in suboptimal situations, such as:  newly joined peers being unable to receive any data because they don't have any pieces yet;  two peers with a good connection between them not exchanging data simply because neither of them wants to take the initiative.  To counter these effects, the official BitTorrent client program uses a mechanism called “optimistic unchoking”, in which the client reserves a portion of its available bandwidth for sending pieces to random peers, in hopes of discovering even better partners and to ensure that newcomers get a chance to join the swarm.

17 Article’s Abstract  BitTorrent is widely believed to use tit-for-tat. This is not so.  Its model is actually auction-based. Why?  Today BitTorrent doesn’t really provide incentives to follow the protocol.  We will show a strategy that provides such incentives.

18 BitTorrent as an Auction For each client:  Divide time into rounds.  Measure the bandwidth received from peers during each round.  Divide upload bandwidth into S equal slots.  At each round, give S-1 slots to the S-1 peers that provided the most bandwidth during the previous round.  Give 1 slot to a random peer (optimistic unchoking).
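
The round-based procedure above can be sketched as follows. This is a minimal illustration under assumptions: the function names `choose_unchoked` and `per_slot_bandwidth`, the peer labels, and the bandwidth figures are hypothetical, and a real client tracks rates over time rather than receiving them as a ready-made dictionary.

```python
import random

def choose_unchoked(received: dict[str, float], slots: int,
                    rng: random.Random) -> list[str]:
    """Pick the peers to upload to in the next round.

    received maps peer id -> bandwidth that peer gave us in the previous round.
    The top (slots - 1) contributors win one slot each; one further slot goes
    to a randomly chosen remaining peer (the optimistic unchoke).
    """
    ranked = sorted(received, key=received.get, reverse=True)
    winners = ranked[:slots - 1]
    losers = ranked[slots - 1:]
    if losers:
        winners.append(rng.choice(losers))
    return winners

def per_slot_bandwidth(upload_capacity: float, slots: int) -> float:
    """Every slot gets an equal share of the upload capacity."""
    return upload_capacity / slots

received = {"a": 90.0, "b": 50.0, "c": 45.0, "d": 10.0, "e": 0.0}
unchoked = choose_unchoked(received, slots=4, rng=random.Random(0))
```

With S = 4, peers `a`, `b` and `c` win the three regular slots and either `d` or `e` receives the optimistic unchoke. Each winner gets the same share of upload capacity regardless of how much it contributed.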

19 BitTorrent as an Auction

20 BitTorrent as an Auction – Is it Good?  The scheme is not fair: peers with different upload rates receive equal download rates.  The scheme doesn’t provide an incentive to provide high bandwidth, only bandwidth high enough to win an auction.  It’s better to win many auctions with small unequal “bids” than to honestly divide bandwidth into equal slots.
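
The "only high enough to win" observation can be made concrete with a small sketch. It assumes a strategic peer somehow knows the competing bids at a target client, which is an idealization; the function name `minimal_winning_bid` and the `epsilon` margin are hypothetical.

```python
def minimal_winning_bid(competing_bids: list[float], slots: int,
                        epsilon: float = 1.0) -> float:
    """Smallest bid that still lands among the top (slots - 1) bids.

    competing_bids are the bandwidths other peers gave the target client in
    the previous round; slots is the target's S (at least 2).
    """
    if slots < 2:
        raise ValueError("need at least one regular slot besides the optimistic one")
    regular = slots - 1
    ranked = sorted(competing_bids, reverse=True)
    if len(ranked) < regular:
        return epsilon  # fewer bidders than regular slots: any positive bid wins
    # Outbid the weakest current winner by a small margin.
    return ranked[regular - 1] + epsilon

bid = minimal_winning_bid([90.0, 50.0, 45.0, 10.0, 0.0], slots=4)
```

In this example a bid of 46 displaces the peer that offered 45 and earns exactly the same slot as the peer that offered 90, so any bandwidth above 46 is better spent winning auctions at other clients.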

21 BitTyrant approach: last place is good enough

22 Other possible exploits  Collusion: nodes can form a coalition to force other clients to accept lower bids.  Large-view exploit: try to find and use as many “optimistic unchokes” as possible.  Sybil attack: create N clones to find more (up to N times as many) “optimistic unchokes”.  Sybil attack: create N clones to obtain more slots of a single peer.

23 Collusion: Dropping Prices at the Market

24 Another exploit: Under-Reporting of Blocks  A normal BitTorrent client truthfully reports to its peers which blocks it has.  Why would someone want to under-report his blocks (in other words, conceal some of them)?  Consider the following example: nodes j and k have common blocks, and node i has blocks that they don’t have. [Figure: nodes j, i, k]

25 Under-Reporting of Blocks – Cont.  If i honestly reported all of its blocks to j and k, they would be able to download different blocks from i.  Then j and k might be able to exchange blocks between themselves, without needing i (they lose interest).  Then i would have no blocks left to trade with j and k. [Figure: nodes j, i, k]

26 Under-Reporting of Blocks – Cont.  So i has an incentive to report only a single block, which both j and k lack, and to conceal the others (for a while).  The algorithm: offer peer j one block at a time, choosing a block that j doesn’t have and that is most common among the other peers.  i doesn’t want to provide j with the rarest block, so as not to increase the incentive of other peers to trade blocks with j instead of i.
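
The selection rule described above (reveal one block the peer lacks, preferring the block most common among other peers) can be sketched as follows. The function name `block_to_reveal`, the block numbering, and the tie-breaking rule are assumptions made for this illustration.

```python
from collections import Counter
from typing import Optional

def block_to_reveal(mine: set[int], target_has: set[int],
                    others_have: list[set[int]]) -> Optional[int]:
    """Choose the single block to advertise to the target peer.

    Candidates are blocks we hold that the target lacks. Among them, pick the
    one held by the most other peers, so the rarest blocks stay concealed.
    Ties are broken by the lowest block index (an arbitrary choice).
    """
    candidates = mine - target_has
    if not candidates:
        return None
    popularity = Counter(b for blocks in others_have for b in blocks)
    return min(candidates, key=lambda b: (-popularity[b], b))

mine = {1, 2, 3, 4}
j_has = {1}
others = [{2, 5}, {2, 3}, {2, 3}]
choice = block_to_reveal(mine, j_has, others)
```

Here block 2 is advertised because three other peers already hold it, while block 4, which no one else has, stays hidden and keeps its trading value.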

27 Under-Reporting of Blocks – Cont.  What impact does the under-reporting strategy have?  The number of exchanges for the under-reporting node increases, so its total download time decreases.  But if many nodes use this strategy, the overall download time grows, because nodes don’t know which blocks to report to each other.  So under-reporting is a parasitic activity. The authors of the paper don’t have a ready solution for it.

28 Proportional Share (PropShare)  Idea: instead of supplying equal bandwidth to all auction winners, give each peer bandwidth proportional to the bandwidth it gave us in the previous round: [equation not preserved in the transcript]
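
One way to write the proportional rule, using assumed symbols: if peer j gave us bandwidth b_j in the previous round and our upload capacity is U, then j receives U · b_j / Σ_k b_k in the current round. A minimal sketch of that allocation follows; the function name `propshare_allocation` and the sample numbers are hypothetical.

```python
def propshare_allocation(received: dict[str, float],
                         upload_capacity: float) -> dict[str, float]:
    """Split upload capacity among peers in proportion to what each gave us.

    received maps peer id -> bandwidth that peer gave us in the previous round.
    """
    total = sum(received.values())
    if total == 0:
        # Nobody contributed, so the proportional rule allocates nothing.
        return {peer: 0.0 for peer in received}
    return {peer: upload_capacity * amount / total
            for peer, amount in received.items()}

received = {"a": 90.0, "b": 50.0, "c": 45.0, "d": 15.0}
allocation = propshare_allocation(received, upload_capacity=100.0)
```

With these numbers peer `a` receives 45 units and peer `d` receives 7.5, so contributing more always yields more, unlike the equal-slot scheme where every winner gets the same share.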

29 PropShare – The Best Response  Let’s say we want to achieve the maximal download rate theoretically possible for a single node i in a PropShare network.  To do so, we have to solve the following optimization problem at each round t for node i: [equation not preserved in the transcript]  All other nodes continue running the PropShare algorithm.

30 PropShare – The Best Response  Of course, this can’t be done in practice, since a node cannot know the bandwidths and allocations of all other nodes.  Simulations were run in which a single node i did know all these data and solved the problem at each round.  The experiment showed that its download speed improved by less than 1% relative to the other nodes.  This suggests that the best strategy against nodes running the PropShare algorithm is to run the PropShare algorithm.

31 PropShare is Sybil-proof  The PropShare algorithm also protects against Sybil attacks.  Let’s say that some node i creates N clones to attack victim v. Then the total bandwidth from v will be: [equation not preserved in the transcript]  In other words, it’s the same as just “selling” bandwidth C to v, without dividing it between clones.
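
The Sybil-proofness argument can be worked through with assumed symbols. Suppose the attacker's total upload to v is C, split evenly as C/N per clone; U_v is v's upload capacity and B is the total bandwidth v received from all other peers in the previous round (U_v and B are introduced here for illustration and do not appear in the slides). Under the proportional rule, the clones together receive:

```latex
\sum_{n=1}^{N} U_v \cdot \frac{C/N}{B + N \cdot (C/N)}
  \;=\; N \cdot U_v \cdot \frac{C/N}{B + C}
  \;=\; U_v \cdot \frac{C}{B + C}
```

The result does not depend on N: it equals what a single identity uploading C to v would receive, so splitting into clones brings no gain.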

32 PropShare is (more) collusion-resistant  The proportionality principle also protects against coalitions.  Consider a situation in which a coalition of nodes “attacks” victim v, offering it low “prices” for its bandwidth.  If all offers are low enough, the victim will deliver its bandwidth to the members of the coalition.  But if even a single peer with a high offer appears, it will receive a much higher share than the members of the coalition.  In this case nodes will no longer have a motivation to remain in the coalition.

33 Bootstrapping new nodes  What happens when a new node joins the swarm, not having any blocks to exchange in the beginning?  In today’s BitTorrent, it will look for some “optimistically unchoked” connections to obtain its first blocks.  This makes the “large-view exploit” possible, in which a node tries to receive blocks without giving anything back.  PropShare doesn’t allow that. So how will the new nodes obtain their starting blocks?

34 Bootstrapping new nodes – The Idea  Suppose that nodes X and Y are already exchanging blocks. New node N connects to X and asks for some block to start.  X picks a block B that Y still doesn’t have, encrypts it with a symmetric key and computes a hash of the encrypted block.  X sends the encrypted block to N, requesting that N deliver it to Y.  At the same time, X sends the hash directly to Y, informing it about the block coming from N. [Figure: nodes X, N, Y]

35 Bootstrapping new nodes – The Idea  When Y receives the block from N, it computes the hash of the block and compares it with the hash received from X.  If the hashes match, Y sends another block in exchange directly to X, encrypting it with a symmetric key.  When X receives the block from Y, it reveals its key to N and to Y, and Y reveals its key to X. [Figure: nodes X, N, Y; steps 1 and 2]

36 Bootstrapping new nodes - Conclusion  In this way, N has to send as much as it receives, without any errors or tricks.  Y has to report truthfully about the received block from N, otherwise it will not receive the key from X.  So this scheme allows new nodes to obtain blocks to begin their trade with.
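
The exchange between X, N and Y described above can be walked through in a small single-process simulation. Several parts are assumptions made for illustration: the slides do not name a cipher or hash function, so the sketch uses a toy XOR keystream (not secure, standing in for a real symmetric cipher) and SHA-1; the function name `keystream_xor` and all variable names are hypothetical.

```python
import hashlib
import os

def keystream_xor(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher (NOT secure): XOR with a SHA-256 counter keystream.

    Applying it twice with the same key returns the original data.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# X encrypts block B (which Y lacks) and hands the ciphertext to newcomer N.
block_b = b"block that Y is missing"
key_x = os.urandom(16)
encrypted_b = keystream_xor(block_b, key_x)
hash_sent_to_y = hashlib.sha1(encrypted_b).digest()  # sent by X directly to Y

# N forwards the ciphertext to Y; Y checks it against the hash from X.
forwarded_by_n = encrypted_b
delivery_ok = hashlib.sha1(forwarded_by_n).digest() == hash_sent_to_y

# Y pays X with an encrypted block of its own.
block_from_y = b"block that X is missing"
key_y = os.urandom(16)
encrypted_from_y = keystream_xor(block_from_y, key_y)

# Keys are revealed last: X's key goes to N and Y, Y's key goes to X.
n_gets = keystream_xor(forwarded_by_n, key_x)
y_gets = keystream_xor(forwarded_by_n, key_x)
x_gets = keystream_xor(encrypted_from_y, key_y)
```

N cannot read block B until it has done the forwarding work, and Y cannot read it until X has been paid, which is what ties the newcomer's first block to an actual contribution.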

37 Conclusions  BitTorrent does not use tit-for-tat.  An auction-based model is more accurate.  It sheds light on new classes of strategic manipulation:  Under-reporting of pieces.  Revealing only enough to keep neighbors interested can result in prolonged interest and faster download times.  PropShare (the more you give the more you get) achieves fairness and robustness.  We have seen a bootstrapping mechanism that can replace BitTorrent’s optimistic unchoking with an approach that encourages peers to contribute to the system as soon as they join.

