
Content Distribution March 6, 2012 2: Application Layer1.


1 Content Distribution March 6, 2012 2: Application Layer1

2 Contents
• P2P architecture and benefits
• P2P content distribution
• Content distribution network (CDN)

3 Pure P2P architecture
• No always-on server
• Arbitrary end systems communicate directly
• Peers are intermittently connected and change IP addresses
• Three topics:
  ◦ File distribution
  ◦ Searching for information
  ◦ Case study: Skype
[Figure: peer-to-peer connections between end systems]

4 File Distribution: Server-Client vs P2P
Question: How much time does it take to distribute a file from one server to $N$ peers?
• $F$: file size
• $u_s$: server upload bandwidth
• $u_i$: peer $i$ upload bandwidth
• $d_i$: peer $i$ download bandwidth
[Figure: a server (upload rate $u_s$, file of size $F$) and peers $1, \dots, N$ (rates $u_i$, $d_i$) attached to a network with abundant bandwidth]

5 File distribution time: server-client
• Server sequentially sends $N$ copies: $NF/u_s$ time
• Client $i$ takes $F/d_i$ time to download
Time to distribute $F$ to $N$ clients using the client/server approach:
$d_{cs} = \max\{\, NF/u_s,\ F/\min_i d_i \,\}$
This increases linearly with $N$ (for large $N$).
[Figure: server and peers attached to a network with abundant bandwidth]

6 File distribution time: P2P
• Server must send one copy: $F/u_s$ time
• Client $i$ takes $F/d_i$ time to download
• $NF$ bits must be downloaded (aggregate)
  ◦ Fastest possible upload rate: $u_s + \sum_i u_i$
$d_{P2P} = \max\{\, F/u_s,\ F/\min_i d_i,\ NF/(u_s + \sum_i u_i) \,\}$
[Figure: server and peers attached to a network with abundant bandwidth]

7 Server-client vs. P2P: example
• Client upload rate $= u$, $F/u = 1$ hour, $u_s = 10u$, $d_{min} \ge u_s$
• Client-server: $\sim NF/u_s$ vs. P2P: $\sim NF/(u_s + \sum_i u_i)$
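A minimal sketch of the two bounds, $d_{cs} = \max\{NF/u_s, F/\min_i d_i\}$ and $d_{P2P} = \max\{F/u_s, F/\min_i d_i, NF/(u_s + \sum_i u_i)\}$, evaluated with the example's parameters. The function names and the choice of units ($u = 1$, $F = 1$, so times are in hours) are assumptions made for illustration, and every peer is given download rate $d_i = u_s$, the smallest value allowed by $d_{min} \ge u_s$.

```python
def client_server_time(F, u_s, d):
    """Lower bound on distribution time when the server uploads every copy."""
    N = len(d)
    return max(N * F / u_s, F / min(d))


def p2p_time(F, u_s, u, d):
    """Lower bound on distribution time when peers also upload."""
    N = len(d)
    return max(F / u_s, F / min(d), N * F / (u_s + sum(u)))


if __name__ == "__main__":
    # Assumed units: u = 1 and F = 1, so F/u = 1 hour; u_s = 10u; d_i = u_s.
    F, u, u_s = 1.0, 1.0, 10.0
    for N in (10, 20, 30):
        uploads = [u] * N
        downloads = [u_s] * N
        print(N, client_server_time(F, u_s, downloads),
              p2p_time(F, u_s, uploads, downloads))
```

With these numbers the client-server bound grows as $N/10$ hours, while the P2P bound $N/(10 + N)$ stays below one hour for any $N$.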

8 Contents
• P2P architecture and benefits
• P2P content distribution
• Content distribution network (CDN)

9 P2P content distribution issues
• Issues
  ◦ Group management and data search
  ◦ Reliable and efficient file exchange
  ◦ Security/privacy/anonymity/trust
• Approaches for group management and data search (i.e., who has what?)
  ◦ Centralized (e.g., BitTorrent tracker)
  ◦ Unstructured (e.g., Gnutella)
  ◦ Structured (Distributed Hash Tables [DHT])

10 Centralized model (Napster)
Original “Napster” design:
1) When a peer connects, it informs the central server of its:
  ◦ IP address
  ◦ content
2) Alice queries for “Hey Jude”; the server notifies her that Bob has the file.
3) Alice requests the file from Bob.
[Figure: peers (including Alice and Bob) and a centralized directory server, with steps 1 to 3 marked. Q: “Hey Jude” A: Bob has it]

11 Centralized model
• File transfer is decentralized, but locating content is highly centralized
[Figure: peers Bob, Alice, Jane, and Judy]

12 Centralized model
• Benefits:
  ◦ Low per-node state
  ◦ Limited bandwidth usage
  ◦ Short search time
  ◦ High success rate
  ◦ Fault tolerant
• Drawbacks:
  ◦ Single point of failure
  ◦ Limited scale
  ◦ Possibly unbalanced load
• Copyright infringement (?)
[Figure: peers Bob, Alice, Jane, and Judy]

13 File distribution: BitTorrent
• P2P file distribution
• tracker: tracks peers participating in torrent
• torrent: group of peers exchanging chunks of a file
[Figure: a peer obtains a list of peers from the tracker and trades chunks with them]

14 BitTorrent (1)
• File divided into 256 KB chunks.
• Peer joining torrent:
  ◦ has no chunks, but will accumulate them over time
  ◦ registers with tracker to get list of peers, connects to subset of peers (“neighbors”)
• While downloading, peer uploads chunks to other peers.
• Peers may come and go.
• Once a peer has the entire file, it may (selfishly) leave or (altruistically) remain.

15 BitTorrent (2)
Pulling chunks
• At any given time, different peers have different subsets of file chunks.
• Periodically, a peer (Alice) asks each neighbor for a list of the chunks that it has.
• Alice sends requests for her missing chunks
  ◦ rarest first
Sending chunks: tit-for-tat
• Alice sends chunks to the four neighbors currently sending her chunks at the highest rate
  ◦ re-evaluate top 4 every 10 secs
• Every 30 secs: randomly select another peer and start sending it chunks
  ◦ newly chosen peer may join top 4
  ◦ “optimistically unchoke”
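The two policies on this slide (rarest-first requests and tit-for-tat unchoking) can be sketched as follows. This is an illustrative model only, not the BitTorrent wire protocol: the function names, the dictionary-based inputs, and the tie-breaking by chunk index are assumptions.

```python
import random
from collections import Counter


def rarest_first(my_chunks, neighbor_chunks):
    """Order Alice's missing chunks so the least-replicated ones come first.

    my_chunks: set of chunk indices Alice already holds.
    neighbor_chunks: dict mapping neighbor name -> set of chunk indices.
    Ties are broken by chunk index (an assumption, for determinism).
    """
    counts = Counter()
    for chunks in neighbor_chunks.values():
        counts.update(chunks - my_chunks)  # count only chunks Alice lacks
    return sorted(counts, key=lambda c: (counts[c], c))


def choose_unchoked(download_rates, top_n=4, rng=random):
    """Pick the top_n neighbors by rate plus one random 'optimistic' peer.

    download_rates: dict mapping neighbor -> rate at which it sends to Alice.
    Returns (top, optimistic); optimistic is None if no other peer exists.
    """
    top = sorted(download_rates, key=download_rates.get, reverse=True)[:top_n]
    others = [p for p in download_rates if p not in top]
    optimistic = rng.choice(others) if others else None
    return top, optimistic


if __name__ == "__main__":
    neighbors = {"bob": {0, 1, 2}, "carol": {1, 3}, "dave": {1, 2}}
    print(rarest_first({0}, neighbors))
    rates = {"a": 50, "b": 40, "c": 30, "d": 20, "e": 10, "f": 5}
    print(choose_unchoked(rates))
```

In a real client, `choose_unchoked` would be re-run on the timers given on the slide (top four every 10 seconds, optimistic peer every 30 seconds); the sketch leaves scheduling out.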

16 BitTorrent: Tit-for-tat
(1) Alice “optimistically unchokes” Bob
(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates
(3) Bob becomes one of Alice’s top-four providers
With a higher upload rate, a peer can find better trading partners and get the file faster!

17 P2P Case study: Skype
• Inherently P2P: pairs of users communicate.
• Proprietary application-layer protocol (inferred via reverse engineering)
• Hierarchical overlay with super nodes (SNs)
• Index maps usernames to IP addresses; distributed over SNs
[Figure: Skype clients (SC), supernodes (SN), and the Skype login server]

18 Peers as relays
• Problem when both Alice and Bob are behind “NATs”.
  ◦ NAT prevents an outside peer from initiating a call to an inside peer
• Solution:
  ◦ Using Alice’s and Bob’s SNs, a relay is chosen
  ◦ Each peer initiates a session with the relay.
  ◦ Peers can now communicate through NATs via the relay

19 Contents
• P2P architecture and benefits
• P2P content distribution
• Content distribution network (CDN)

20 Why Content Networks?
• More hops between client and Web server: more congestion!
• Same data flowing repeatedly over links between clients and Web server
[Figure: server S and clients C1 to C4 connected through IP routers]
Slides from http://www.cis.udel.edu/~iyengar/courses/Overlays.ppt

21 Why Content Networks?
• Origin server is the bottleneck as the number of users grows
• Flash crowds (for instance, Sept. 11)
• The content distribution problem: arrange a rendezvous between a content source at the origin server (www.cnn.com) and a content sink (us, as users)
Slides from http://www.cis.udel.edu/~iyengar/courses/Overlays.ppt

22 Example: Web Server Farm
• Simple solution to the content distribution problem: deploy a large group of servers
• Arbitrate client requests to servers using an “intelligent” L4-L7 switch
• Pretty widely used today
[Figure: requests from grad.umd.edu and ren.cis.udel.edu arrive at an L4-L7 switch, which spreads them across www.cnn.com (Copy 1), www.cnn.com (Copy 2), and www.cnn.com (Copy 3)]

23 Example: Caching Proxy
• Mainly motivated by ISP business interests: reduces the bandwidth the ISP consumes from the Internet
• Reduced network traffic
• Reduced user-perceived latency
[Figure: within the ISP, clients ren.cis.udel.edu and merlot.cis.udel.edu reach the Internet through interceptors; TCP port 80 traffic is diverted to the proxy on its way to www.cnn.com, while other traffic passes straight through]

24 But on Sept. 11, 2001
[Figure: user mslab.kaist.ac.kr and two groups of 1,000,000 other hosts send requests through ISP caching proxies toward the Web server www.cnn.com. Labels: “New Content: WTC News!”, “old content”, “request”. Legend: caching proxy; congestion / bottleneck]

25 Problems with discussed approaches: Server farms and Caching proxies
• Server farms do nothing about problems due to network congestion
• Caching proxies serve only their clients, not all users on the Internet
• Content providers (say, Web servers) cannot rely on the existence and correct implementation of caching proxies
• Accounting issues with caching proxies.
  ◦ For instance, www.cnn.com needs to know the number of hits to the webpage for the advertisements displayed on it

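26 Again on Sept. 11, 2001 with CDN
[Figure: user mslab.kaist.ac.kr and two groups of 1,000,000 other users request content and receive new content from CDN surrogates, which are fed by the Web server www.cnn.com (“New Content: WTC News!”) over the distribution infrastructure. Surrogate locations shown: FL, IL, DE, NY, MA, MI, CA, WA. Legend: surrogate; distribution infrastructure]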

27 Web replication - CDNs
• Overlay network to distribute content from origin servers to users
• Avoids large amounts of the same data repeatedly traversing potentially congested links on the Internet
• Reduces Web server load
• Reduces user-perceived latency
• Tries to route around congested networks

28 CDN vs. Caching Proxies
• Caches are used by ISPs to reduce bandwidth consumption; CDNs are used by content providers to improve quality of service to end users
• Caches are reactive; CDNs are proactive
• Caching proxies cater to their users (web clients) and not to content providers (web servers); CDNs cater to both the content providers (web servers) and clients
• CDNs give content providers control over the content; caching proxies do not

29 CDN Architecture
[Figure: a CDN made up of surrogates, a request routing infrastructure, and a distribution and accounting infrastructure, sitting between the origin server and the client]

30 CDN Components
• Distribution infrastructure:
  ◦ Moving or replicating content from the content source (origin server, content provider) to surrogates
• Request routing infrastructure:
  ◦ Steering or directing a content request from a client to a suitable surrogate
• Content delivery infrastructure:
  ◦ Delivering content to clients from surrogates
• Accounting infrastructure:
  ◦ Logging and reporting of distribution and delivery activities

31 Server Interaction with CDN
1. Distribution infrastructure: origin server pushes new content to the CDN, OR the CDN pulls content from the origin server
2. Accounting infrastructure: origin server requests logs and other accounting info from the CDN, OR the CDN provides logs and other accounting info to the origin server
[Figure: origin server www.cnn.com connected to the CDN's distribution infrastructure (1) and accounting infrastructure (2)]

32 Client Interaction with CDN
1. Client to request routing infrastructure: “Hi! I need www.cnn.com/sept11”
2. Request routing infrastructure to client: “Go to surrogate newyork.cnn.akamai.com”
3. Client to surrogate (NY): “Hi! I need content /sept11”
Q: How did the CDN choose the New York surrogate over the California surrogate?
[Figure: client, request routing infrastructure, and CDN surrogates newyork.cnn.akamai.com (NY) and california.cnn.akamai.com (CA)]

33 Request Routing Techniques
• Request routing techniques use a set of metrics to direct users to the “best” surrogate
• Proprietary, but underlying techniques known:
  ◦ DNS based request routing
  ◦ Content modification (URL rewriting)
  ◦ Anycast based (how common is anycast?)
  ◦ URL based request routing
  ◦ Transport layer request routing
  ◦ Combination of multiple mechanisms

34 DNS based Request-Routing
• Common due to the ubiquity of DNS as a directory service
• Specialized DNS server inserted in the DNS resolution process
• DNS server is capable of returning a different set of A, NS or CNAME records based on policies/metrics

35 DNS based Request-Routing
[Figure: client test.nyu.edu (128.4.30.15), local DNS server dns.nyu.edu (128.4.4.12), Akamai DNS, origin www.cnn.com, and Akamai CDN surrogates 145.155.10.15 and 58.15.100.152 (names shown: newyork.cnn.akamai.com, california.cnn.akamai.com). Message labels: “DNS query: www.cnn.com” and “DNS response: A 145.155.10.15” (each shown twice), then “Session”]
Q: How does the Akamai DNS know which surrogate is closest?

36 DNS based Request-Routing
[Figure: client test.nyu.edu (128.4.30.15) and local DNS server dns.nyu.edu (128.4.4.12) issue DNS queries that reach the Akamai DNS; surrogates in the Akamai CDN measure to the client DNS and report the measurement results to the Akamai DNS; origin www.cnn.com]

37 DNS based Request-Routing
[Figure: client 76.43.35.53, client DNS 76.43.32.4, Akamai DNS, origin www.cnn.com, and Akamai CDN surrogates 145.155.10.15 and 58.15.100.152]
• Measurement reports sent to the Akamai DNS:
  ◦ Requesting DNS 76.43.32.4: available bandwidth = 10 kbps, RTT = 10 ms
  ◦ Requesting DNS 76.43.32.4: available bandwidth = 5 kbps, RTT = 100 ms
• Selection: requesting DNS 76.43.32.4, surrogate 145.155.10.15
• DNS response: www.cnn.com A 145.155.10.15, TTL = 10 s
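A minimal sketch of how a request-routing DNS server could turn measurement reports like these into an answer. Actual CDN policies are proprietary, so the selection rule here (lowest RTT, then highest available bandwidth) is an assumption, as are the names `Measurement`, `pick_surrogate`, and `dns_answer`. The sketch also assumes that the 10 kbps / 10 ms report belongs to surrogate 145.155.10.15 and the 5 kbps / 100 ms report to 58.15.100.152.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One surrogate's measurement toward the requesting (client) DNS server."""
    surrogate_ip: str
    bandwidth_kbps: float
    rtt_ms: float


def pick_surrogate(measurements):
    """Hypothetical policy: prefer low RTT, break ties on higher bandwidth."""
    best = min(measurements, key=lambda m: (m.rtt_ms, -m.bandwidth_kbps))
    return best.surrogate_ip


def dns_answer(name, measurements, ttl_s=10):
    """Build a simplified A-record answer pointing at the chosen surrogate."""
    return {
        "name": name,
        "type": "A",
        "address": pick_surrogate(measurements),
        "ttl": ttl_s,
    }


if __name__ == "__main__":
    reports = [
        Measurement("145.155.10.15", bandwidth_kbps=10, rtt_ms=10),   # assumed pairing
        Measurement("58.15.100.152", bandwidth_kbps=5, rtt_ms=100),   # assumed pairing
    ]
    print(dns_answer("www.cnn.com", reports))
```

The short TTL (10 s on the slide) is what lets the CDN revise its choice quickly as measurements change.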

38 DNS based Request Routing: Discussion
• Originator problem: the client may be far removed from the client DNS
• Client DNS masking problem: virtually all DNS servers, except for root DNS servers, honor requests for recursion
  ◦ Q: Which DNS server resolves a request for test.nyu.edu?
  ◦ Q: Which DNS server performs the last recursion of the DNS request?
• Hidden load factor: a DNS resolution may result in drastically different load on the selected surrogate; this is an issue in load balancing requests and in predicting load on surrogates

39 Summary
• P2P architecture and its benefits
• P2P content distribution
  ◦ BitTorrent, Skype
• Content distribution network (CDN)
  ◦ DNS-based request routing

40 Distributed Hash Table (DHT)
• DHT = distributed P2P database
• Database has (key, value) pairs;
  ◦ key: social security number; value: human name
  ◦ key: content type; value: IP address
• Peers query the DB with a key
  ◦ DB returns values that match the key
• Peers can also insert (key, value) pairs

41 DHT Identifiers
• Assign an integer identifier to each peer in the range $[0, 2^n - 1]$.
  ◦ Each identifier can be represented by $n$ bits.
• Require each key to be an integer in the same range.
• To get integer keys, hash the original key.
  ◦ e.g., key = h(“Led Zeppelin IV”)
  ◦ This is why it is called a distributed “hash” table
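A minimal sketch of mapping an original key into the $n$-bit identifier space. The slide does not name a hash function; SHA-1 reduced modulo $2^n$ is an assumption chosen for illustration, and `key_to_id` is a hypothetical helper name.

```python
import hashlib


def key_to_id(key: str, n_bits: int) -> int:
    """Hash an arbitrary string key to an integer in [0, 2**n_bits - 1].

    SHA-1 is an assumed choice; any well-mixed hash would serve the sketch.
    """
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** n_bits)


if __name__ == "__main__":
    print(key_to_id("Led Zeppelin IV", n_bits=4))
```

The same function applied to the same string always yields the same identifier, which is what lets every peer agree on where a key lives.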

42 How to assign keys to peers?
• Central issue:
  ◦ Assigning (key, value) pairs to peers.
• Rule: assign the key to the peer that has the closest ID.
• Convention in lecture: closest is the immediate successor of the key.
• Ex: $n = 4$; peers: 1, 3, 4, 5, 8, 10, 12, 14;
  ◦ key = 13, then successor peer = 14
  ◦ key = 15, then successor peer = 1
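The successor rule can be sketched in a few lines. The helper name `successor` is an assumption, and the sketch treats a key equal to a peer ID as belonging to that peer, which the slide does not state explicitly.

```python
from bisect import bisect_left


def successor(peers, key):
    """Return the first peer ID at or after key, wrapping around the ring."""
    ring = sorted(peers)
    i = bisect_left(ring, key)      # index of first peer ID >= key
    return ring[i % len(ring)]      # past the largest ID, wrap to the smallest


if __name__ == "__main__":
    peers = [1, 3, 4, 5, 8, 10, 12, 14]
    for key in (13, 15):
        print(key, "->", successor(peers, key))
```

For the slide's example this gives peer 14 for key 13 and peer 1 for key 15.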

43 Chord (a circular DHT) (1)
• Each peer is only aware of its immediate successor and predecessor.
• “Overlay network”
[Figure: ring of peers 1, 3, 4, 5, 8, 10, 12, 15]

44 Chord (a circular DHT) (2)
• Define closest as closest successor
• $O(N)$ messages on average to resolve a query, when there are $N$ peers
[Figure: ring of peers 0001, 0011, 0100, 0101, 1000, 1010, 1100, 1111. The query “Who’s resp for key 1110?” is passed around the ring until a peer answers “I am”]
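A small simulation of resolving a query when each peer knows only its immediate successor. It is a sketch under assumptions: the starting peer (1) is not given on the slide, and the function names and the convention of counting peer-to-peer forwards are illustrative.

```python
def in_half_open(x, a, b, size):
    """True if x lies in the clockwise ring interval (a, b]."""
    if a == b:  # single-peer ring: that peer owns every key
        return True
    return 0 < (x - a) % size <= (b - a) % size


def lookup(peers, start, key, n_bits):
    """Resolve key using successor pointers only; return (owner, forwards)."""
    size = 2 ** n_bits
    ring = sorted(peers)
    succ = {p: ring[(i + 1) % len(ring)] for i, p in enumerate(ring)}
    current, forwards = start, 0
    # Keep forwarding until the key falls between this peer and its successor.
    while key != current and not in_half_open(key, current, succ[current], size):
        current = succ[current]  # pass the query clockwise
        forwards += 1
    owner = current if key == current else succ[current]
    return owner, forwards


if __name__ == "__main__":
    ring = [1, 3, 4, 5, 8, 10, 12, 15]
    print(lookup(ring, start=1, key=0b1110, n_bits=4))
```

Starting at peer 1, the query for key 1110 is forwarded six times before peer 12 identifies its successor, peer 15 (1111), as responsible; in the worst case the query visits nearly every peer, which is the $O(N)$ behavior noted on the slide.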

45 Chord (a circular DHT) with Shortcuts
• Each peer keeps track of the IP addresses of its predecessor, successor, and shortcuts.
• Reduced from 6 to 2 messages.
• Possible to design shortcuts so that there are $O(\log N)$ neighbors and $O(\log N)$ messages per query
[Figure: ring of peers 1, 3, 4, 5, 8, 10, 12, 15 with shortcut links; query “Who’s resp for key 1110?”]
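One way to design shortcuts with $O(\log N)$ neighbors is the finger-table scheme from the Chord protocol, sketched below. The slide does not say which shortcuts its figure uses, so the finger placement (peer $p$ points at the successors of $p + 2^i$), the starting peer, and all function names are assumptions.

```python
from bisect import bisect_left


def successor(ring, ident):
    """First peer ID at or after ident on the sorted ring, with wraparound."""
    return ring[bisect_left(ring, ident) % len(ring)]


def in_half_open(x, a, b, size):
    """True if x lies in the clockwise interval (a, b]."""
    return True if a == b else 0 < (x - a) % size <= (b - a) % size


def strictly_between(x, a, b, size):
    """True if x lies in the clockwise interval (a, b)."""
    return x != a if a == b else 0 < (x - a) % size < (b - a) % size


def fingers(ring, p, n_bits):
    """Shortcuts of peer p: successors of p + 1, p + 2, p + 4, ..."""
    size = 2 ** n_bits
    return [successor(ring, (p + 2 ** i) % size) for i in range(n_bits)]


def lookup_with_shortcuts(peers, start, key, n_bits):
    """Return (owner, forwards) using the farthest shortcut that precedes key."""
    size = 2 ** n_bits
    ring = sorted(peers)
    current, forwards = start, 0
    while True:
        if key == current:
            return current, forwards
        succ = successor(ring, (current + 1) % size)
        if in_half_open(key, current, succ, size):
            return succ, forwards
        nxt = succ  # fall back to the plain successor pointer
        for f in reversed(fingers(ring, current, n_bits)):
            if strictly_between(f, current, key, size):
                nxt = f  # farthest shortcut that does not overshoot the key
                break
        current, forwards = nxt, forwards + 1


if __name__ == "__main__":
    ring = [1, 3, 4, 5, 8, 10, 12, 15]
    print(lookup_with_shortcuts(ring, start=1, key=0b1110, n_bits=4))
```

With this particular finger placement and a query starting at peer 1, key 1110 is resolved after two forwards (1 to 10, then 10 to 12), compared with six when only successor pointers are used.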

46 Peer Churn
• To handle peer churn, require each peer to know the IP addresses of its two successors.
• Each peer periodically pings its two successors to see if they are still alive.
• Peer 5 abruptly leaves
• Peer 4 detects this; makes 8 its immediate successor; asks 8 who its immediate successor is; makes 8’s immediate successor its second successor.
• What if peer 13 wants to join?
[Figure: ring of peers 1, 3, 4, 5, 8, 10, 12, 15]
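The repair step can be sketched with a table that holds each peer's two successors. This is an illustrative model: the `alive` set stands in for ping results, the function names are assumptions, and the sketch handles only one departure at a time (it does not cover both successors failing together). Applying the same repair to peer 3, whose second successor was peer 5, is an extension the slide does not spell out.

```python
def build_successors(peers):
    """Map each peer to [first successor, second successor] on the ring."""
    ring = sorted(peers)
    n = len(ring)
    return {p: [ring[(i + 1) % n], ring[(i + 2) % n]] for i, p in enumerate(ring)}


def repair(succ, peer, alive):
    """Fix peer's successor list after its pings reveal a departed successor."""
    first, second = succ[peer]
    if first not in alive:
        # Promote the second successor, then ask it who its successor is.
        first = second
        second = succ[first][0]
    elif second not in alive:
        # Ask the (live) first successor for its current first successor.
        second = succ[first][0]
    succ[peer] = [first, second]


if __name__ == "__main__":
    peers = [1, 3, 4, 5, 8, 10, 12, 15]
    succ = build_successors(peers)
    alive = set(peers) - {5}   # peer 5 abruptly leaves
    repair(succ, 4, alive)     # peer 4: successors become 8 and 10
    repair(succ, 3, alive)     # peer 3: successors become 4 and 8
    print(succ[4], succ[3])
```

Peer 3 is repaired after peer 4 here so that it reads peer 4's already-updated first successor.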

