The Web and Content Distribution Networks Nick Feamster CS 6250 Fall 2011 (some notes from David Andersen and Christian Kauffman)

1 The Web and Content Distribution Networks Nick Feamster CS 6250 Fall 2011 (some notes from David Andersen and Christian Kauffman)

2 Outline: HTTP review; persistent HTTP review; HTTP caching; content distribution networks

3 HTTP Basics HTTP layered over bidirectional byte stream –Almost always TCP Interaction –Client sends request to server, followed by response from server to client –Requests/responses are encoded in text Stateless –Server maintains no information about past client requests 3

4 How to Mark End of Message? Size of message: Content-Length –Must know size of transfer in advance. Delimiter: MIME-style Content-Type –Server must escape delimiter in content. Close connection –Only server can do this
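A minimal sketch of the first and third options, assuming responses arrive back to back on one byte stream. The function name `split_by_content_length` is hypothetical and the parsing is deliberately simplified (no chunked encoding, no header folding): when a Content-Length header is present the receiver reads exactly that many bytes, otherwise the body runs until the server closes the connection.

```python
def split_by_content_length(stream: bytes) -> list:
    """Split a byte stream of back-to-back HTTP responses into their bodies."""
    bodies = []
    pos = 0
    while pos < len(stream):
        # Headers end at the first blank line (CRLF CRLF).
        head_end = stream.index(b"\r\n\r\n", pos)
        header_lines = stream[pos:head_end].split(b"\r\n")[1:]  # skip status line
        length = None
        for line in header_lines:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        body_start = head_end + 4
        if length is None:
            # No length given: the body lasts until the connection closes,
            # so nothing else can follow on this stream.
            bodies.append(stream[body_start:])
            break
        bodies.append(stream[body_start:body_start + length])
        pos = body_start + length
    return bodies
```

The fallback branch shows why close-delimited responses rule out reusing the connection: once the length is unknown, the receiver cannot tell where the next message would begin.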

5 HTTP Request Request line –Method GET – return URI HEAD – return headers only of GET response POST – send data to the server (forms, etc.) –URL (relative) E.g., /index.html –HTTP version 5

6 HTTP Request (cont.) Request headers –Authorization – authentication info –Accept, Accept-Encoding – acceptable document types/encodings –From – user's e-mail address –If-Modified-Since –Referer (spelled this way in HTTP) – what caused this page to be requested –User-Agent – client software Blank-line Body

7 HTTP Request (review) 7

8 HTTP Request Example
GET / HTTP/1.1
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/5.0
Host:
Connection: Keep-Alive
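The request format (request line, headers, blank line, optional body) can be sketched as a small serializer. The helper name `build_request` and the host `example.com` are assumptions for illustration only; the host in the slide's example is not shown.

```python
def build_request(method, path, headers, version="HTTP/1.1", body=b""):
    """Serialize an HTTP request: request line, headers, blank line, body."""
    lines = [f"{method} {path} {version}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Each line ends with CRLF; an empty line separates headers from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


# Hypothetical usage: the host name is a placeholder.
request = build_request("GET", "/", {"Host": "example.com",
                                     "Connection": "Keep-Alive"})
```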

9 HTTP Response Status-line –HTTP version –3 digit response code: 1XX – informational; 2XX – success –200 OK; 3XX – redirection –301 Moved Permanently –302 Moved Temporarily –304 Not Modified; 4XX – client error –404 Not Found; 5XX – server error –505 HTTP Version Not Supported –Reason phrase
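As a rough illustration of the status line layout (version, three-digit code, reason phrase), the sketch below splits a status line and maps the first digit of the code to its class. The names `STATUS_CLASSES` and `parse_status_line` are hypothetical.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line):
    """Return (version, code, reason, class) for an HTTP status line."""
    parts = line.rstrip("\r\n").split(" ", 2)
    version, code = parts[0], int(parts[1])
    # The reason phrase is free text and may be empty.
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason, STATUS_CLASSES.get(code // 100, "unknown")
```

For example, `parse_status_line("HTTP/1.1 304 Not Modified")` classifies the response as a redirection, which is where the 3XX range places cache validation replies.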

10 HTTP Response Headers –Location – for redirection –Server – server software –WWW-Authenticate – request for authentication –Allow – list of methods supported (GET, HEAD, etc.) –Content-Encoding – e.g., x-gzip –Content-Length –Content-Type –Expires –Last-Modified Blank-line Body

11 HTTP Response Example
% curl -D - -o /dev/null
HTTP/ OK
Date: Tue, 25 Oct :50:28 GMT
Server: Apache/ (Debian)
X-Powered-By: PHP/ squeeze1
Set-Cookie: a1c6441c5610e cceee6bfd0ef=q3a5oq1e47ncpkn97mqbo47qm0; path=/
P3P: CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM"
Set-Cookie: ja_purity_tpl=ja_purity; expires=Sun, 14-Oct :50:28 GMT; path=/
Expires: Mon, 1 Jan :00:00 GMT
Last-Modified: Tue, 25 Oct :50:28 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Vary: Accept-Encoding
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8

12 Outline: HTTP intro and details; persistent HTTP; HTTP caching; content distribution networks

13 HTTP 0.9/1.0 One request/response per TCP connection –Simple to implement Disadvantages –Multiple connection setups three-way handshake each time Several extra round trips added to transfer –Multiple slow starts 13

14 Single Transfer Example [Figure: client/server timeline from 0 RTT to 4 RTT showing SYN, ACK, DAT, and FIN segments.] Events, in order: client opens TCP connection; client sends HTTP request for HTML; server reads from disk; client parses HTML; client opens TCP connection; client sends HTTP request for image; server reads from disk; image begins to arrive

15 More Problems Short transfers are hard on TCP –Stuck in slow start –Loss recovery is poor when windows are small Lots of extra connections –Increases server state/processing Server also forced to keep TIME_WAIT connection state -- Things to think about -- –Why must server keep these? –Tends to be an order of magnitude greater than # of active connections, why? 15

16 Persistent Connection Solution Multiplex multiple transfers onto one TCP connection How to identify requests/responses –Delimiter Server must examine response for delimiter string –Content-length and delimiter Must know size of transfer in advance –Block-based transmission send in multiple length delimited blocks –Store-and-forward wait for entire response and then use content-length –Solution use existing methods and close connection otherwise 16
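The block-based option (multiple length-delimited blocks) is what HTTP/1.1 chunked transfer encoding provides. The sketch below is a simplified encoder and decoder, assuming no trailers; the function names are hypothetical.

```python
def encode_chunked(blocks):
    """Encode an iterable of byte blocks as a chunked message body."""
    out = b""
    for block in blocks:
        if block:  # a zero-length chunk would signal end of body, so skip empties
            out += f"{len(block):x}\r\n".encode("ascii") + block + b"\r\n"
    # A zero-size chunk followed by a blank line terminates the body.
    return out + b"0\r\n\r\n"


def decode_chunked(data):
    """Decode a chunked body back into the original bytes."""
    body = b""
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # Chunk size is hexadecimal; ignore any ";extension" after it.
        size = int(data[pos:line_end].split(b";")[0], 16)
        if size == 0:
            return body
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data
```

Because each block carries its own length, the server can start sending before it knows the total size, and the connection can still be reused afterwards.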

17 Persistent Connection Example [Figure: client/server timeline from 0 RTT to 2 RTT showing ACK and DAT segments.] Events, in order: client sends HTTP request for HTML; server reads from disk; client parses HTML; client sends HTTP request for image; server reads from disk; image begins to arrive

18 Persistent HTTP Nonpersistent HTTP issues: Requires 2 RTTs per object. OS must work and allocate host resources for each TCP connection. But browsers often open parallel TCP connections to fetch referenced objects. Persistent HTTP: Server leaves connection open after sending response. Subsequent HTTP messages between same client/server are sent over connection. Persistent without pipelining: Client issues new request only when previous response has been received. One RTT for each referenced object. Persistent with pipelining: Supported in HTTP/1.1, where persistent connections are the default. Client sends requests as soon as it encounters a referenced object. As little as one RTT for all the referenced objects.
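The RTT counts above can be turned into a back-of-the-envelope model. This sketch assumes a single connection at a time, ignores transmission time and server processing, and counts one RTT for the TCP handshake plus one for each request/response exchange; `page_rtts` is a hypothetical helper.

```python
def page_rtts(n_objects, mode):
    """Round trips to fetch a base page plus n_objects referenced objects."""
    if mode == "nonpersistent":
        # Handshake + request for the base page and for every object.
        return 2 * (1 + n_objects)
    base = 2  # one handshake, then one RTT for the base HTML
    if n_objects == 0:
        return base
    if mode == "persistent":
        return base + n_objects  # one RTT per referenced object
    if mode == "pipelined":
        return base + 1  # all requests sent back to back
    raise ValueError(f"unknown mode: {mode}")
```

For a page with 10 referenced objects this gives 22, 12, and 3 round trips respectively, which is the gap the slide describes.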

19 Outline: HTTP intro and details; persistent HTTP; HTTP caching; content distribution networks

20 HTTP Caching Clients often cache documents –Challenge: update of documents –If-Modified-Since requests to check HTTP 0.9/1.0 used just date HTTP 1.1 has an opaque entity tag (could be a file signature, etc.) as well When/how often should the original be checked for changes? –Check every time? –Check each session? Day? Etc? –Use Expires header If no Expires, often use Last-Modified as estimate 20
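One way to picture the "how often to check" question is as a freshness lifetime: the cache serves its copy without contacting the origin until the lifetime runs out. The sketch below uses Expires when present and otherwise estimates from Last-Modified. The 10% fraction is an assumption here, a commonly used heuristic rather than something the slides specify, and `freshness_lifetime` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def freshness_lifetime(date, expires=None, last_modified=None, fraction=0.1):
    """How long a response fetched at `date` may be served without revalidation."""
    if expires is not None:
        # Explicit expiry wins; an Expires in the past means "already stale".
        return max(expires - date, timedelta(0))
    if last_modified is not None:
        # Heuristic: a document unchanged for a long time probably stays unchanged.
        return max((date - last_modified) * fraction, timedelta(0))
    return timedelta(0)  # no information: revalidate every time
```

Under this assumption, a page last modified 10 days before it was fetched would be treated as fresh for about one day.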

21 Example Cache Check Request
GET / HTTP/1.1
Accept: */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
If-Modified-Since: Mon, Oct :54:18 GMT
If-None-Match: "7a11f-10ed-3a75ae4a"
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv: )
Host:
Connection: Keep-Alive

22 Example Cache Check Response
HTTP/ Not Modified
Date: Mon, 24 Oct :50:51 GMT
Server: Apache/ (Debian)
Connection: Keep-Alive
Keep-Alive: timeout=15, max=100
ETag: "7a11f-10ed-3a75ae4a"
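The server side of this exchange can be sketched as a small decision function: compare the validators in the request against the current version of the resource and answer 304 when the cached copy is still good. This is a simplified illustration (strong comparison only, entity tag checked before the date); `validate` and the sample values used with it are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def validate(request_headers, current_etag, last_modified):
    """Return 304 if the client's cached copy is still valid, else 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # The entity tag takes precedence over the date when both are sent.
        tags = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if current_etag in tags or "*" in tags else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        return 304 if last_modified <= since else 200
    return 200  # unconditional request: send the full object
```

A 304 reply carries no body, so a successful check costs one round trip but almost no bandwidth.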

23 Ways to cache Client-directed caching: Web Proxies Server-directed caching: Content Delivery Networks (CDNs) 23

24 Web Proxy Caches User configures browser: Web accesses via cache. Browser sends all HTTP requests to cache –Object in cache: cache returns object –Else cache requests object from origin server, then returns object to client [Figure: two clients exchange HTTP requests and responses with a proxy server, which in turn exchanges HTTP requests and responses with two origin servers.]

25 Caching Example (1) Assumptions: Average object size = 100,000 bits. Avg. request rate from institution's browsers to origin servers = 15/sec. Delay from institutional router to any origin server and back to router = 2 sec. Consequences: Utilization on LAN = 15%. Utilization on access link = 100%. Total delay = Internet delay + access delay + LAN delay = 2 sec + minutes + milliseconds. [Figure: institutional network with a 10 Mbps LAN, connected over a 1.5 Mbps access link to the public Internet and the origin servers.]

26 Caching Example (2) Install cache. Suppose hit rate is 0.4. Consequence: 40% of requests will be satisfied almost immediately (say 10 msec). 60% of requests satisfied by origin server. Utilization of access link reduced to 60%, resulting in negligible delays. Weighted average of delays = 0.6 * 2 sec + 0.4 * 10 msec < 1.3 sec. [Figure: the same institutional network (10 Mbps LAN, 1.5 Mbps access link to the public Internet and origin servers) with an institutional cache added.]
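The numbers in the two caching examples follow from simple arithmetic, reproduced here as a sketch. The constants are taken from the slides; the function names are hypothetical, and the model ignores queueing on the access link once utilization drops below 100%.

```python
OBJECT_BITS = 100_000        # average object size
REQUEST_RATE = 15            # requests per second
LAN_BPS = 10_000_000         # 10 Mbps LAN
ACCESS_BPS = 1_500_000       # 1.5 Mbps access link
INTERNET_DELAY_S = 2.0       # router to origin server and back
HIT_DELAY_S = 0.010          # delay for a request served from the cache


def utilization(request_rate, object_bits, capacity_bps):
    """Fraction of link capacity consumed by the offered load."""
    return request_rate * object_bits / capacity_bps


def average_delay(hit_rate):
    """Weighted average delay with a cache of the given hit rate."""
    return (1 - hit_rate) * INTERNET_DELAY_S + hit_rate * HIT_DELAY_S


lan_util = utilization(REQUEST_RATE, OBJECT_BITS, LAN_BPS)           # 0.15
access_util = utilization(REQUEST_RATE, OBJECT_BITS, ACCESS_BPS)     # 1.0
# With a 0.4 hit rate only 60% of requests cross the access link.
cached_access_util = utilization(REQUEST_RATE * 0.6, OBJECT_BITS, ACCESS_BPS)
```

With a hit rate of 0.4, `average_delay(0.4)` comes to roughly 1.2 seconds, consistent with the bound of 1.3 seconds on the slide.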

27 Problems Over 50% of all HTTP objects are uncacheable – why? Not easily solvable –Dynamic data: stock prices, scores, web cams –CGI scripts: results based on passed parameters. Obvious fixes –SSL: encrypted data is not cacheable; most web clients don't handle mixed pages well, so many generic objects are transferred with SSL –Cookies: results may be based on past data –Hit metering: owner wants to measure # of hits for revenue, etc. What will be the end result?

28 Outline: HTTP intro and details; persistent HTTP; HTTP caching; content distribution networks

29 Content Distribution Networks (CDNs) The content providers are the CDN customers. Content replication: CDN company installs hundreds of CDN servers throughout the Internet –Close to users. CDN replicates its customers' content in CDN servers. When provider updates content, CDN updates servers. [Figure: origin server in North America feeds a CDN distribution node, which feeds CDN servers in S. America, Europe, and Asia.]

30 Content Distribution Networks & Server Selection Replicate content on many servers. Challenges –How to replicate content –Where to replicate content –How to find replicated content –How to choose among known replicas –How to direct clients towards replica

31 Server Selection Which server? –Lowest load to balance load on servers –Best performance to improve client performance Based on Geography? RTT? Throughput? Load? –Any alive node to provide fault tolerance How to direct clients to a particular server? –As part of routing anycast, cluster load balancing Not covered –As part of application HTTP redirect –As part of naming DNS 31

32 Application-Based Content Routing HTTP supports simple way to indicate that Web page has moved (30X responses) Server receives GET request from client –Decides which server is best suited for particular client and object –Returns HTTP redirect to that server Can make informed application specific decision May introduce additional overhead multiple connection setup, name lookups, etc. OK solution in general, but… –HTTP Redirect has some flaws – especially with current browsers –Incurs many delays, which operators may really care about 32

33 Naming-Based Content Routing Client does DNS name lookup for service Name server chooses appropriate server address –A-record returned is best one for the client What information can name server base decision on? –Server load/location must be collected –Information in the name lookup request Name service client typically the local name server for client 33
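A minimal sketch of the decision a CDN name server might make before returning an A record, assuming it already holds a per-client RTT estimate plus load and liveness reports for each replica. The `Replica` fields, the load threshold, and the selection policy are all assumptions for illustration, not a description of any particular CDN; the addresses come from documentation-only ranges.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    address: str        # IP address to return in the A record
    rtt_ms: float       # estimated RTT from the client's local name server
    load: float         # reported utilization, 0.0 to 1.0
    alive: bool = True


def choose_replica(replicas, max_load=0.9):
    """Pick the lowest-RTT live replica that is not overloaded."""
    candidates = [r for r in replicas if r.alive and r.load < max_load]
    if not candidates:
        # Every live replica is overloaded: fall back to any live one.
        candidates = [r for r in replicas if r.alive]
    if not candidates:
        raise LookupError("no live replica available")
    return min(candidates, key=lambda r: r.rtt_ms)


replicas = [
    Replica("192.0.2.10", rtt_ms=5.0, load=0.95),
    Replica("198.51.100.20", rtt_ms=20.0, load=0.30),
    Replica("203.0.113.30", rtt_ms=12.0, load=0.50, alive=False),
]
best = choose_replica(replicas)
```

Note that the RTT here is measured to the client's local name server, not to the client itself, which is the limitation the slide points out.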

34 Other Findings Each CDN performed best for at least one (NIMI) client –Why? Because of proximity? The best origin sites were better than the worst CDNs. CDNs with more servers don't necessarily perform better –Note that they don't know load on servers… HTTP 1.1 improvements (parallel download, pipelined download) help a lot –Even more so for origin (non-CDN) cases –Note not all origin sites implement pipelining

35 What is a Content Distribution Network? The RFCs and Internet Drafts define a Content Distribution Network, CDN, as: Content Delivery Network or Content Distribution Network. A type of CONTENT NETWORK in which the CONTENT NETWORK ELEMENTS are arranged for more effective delivery of CONTENT to CLIENTS.

36 What is a Content Distribution Network? A CDN is an overlay network, designed to deliver content from the optimal location –In many cases, optimal does not mean geographically closest CDNs are made of distinct, geographically disparate groups of servers, with each group able to serve all content on the CDN –Servers may be separated by type –E.g. One group may serve Windows Streaming Media, another group may serve HTTP –Servers are not typically shared between media types

37 What is a Content Distribution Network Some CDNs are network-owned (Level 3, Limelight, AT&T), some are not (Akamai, Mirror Image, CacheFly, Panther Express) Network-owned CDNs have all / most of their servers in their own ASN Non-network CDNs can place servers directly in other ASNs –This means things like NetFlow will not be useful for determining traffic to/from non-network CDNs

38 The Akamai System Resulting in traffic of: 5.4 petabytes / day 790+ billion hits / day 436+ million unique clients IPs / day The Akamai EdgePlatform: 85,000+ Servers POPs 72+ Countries 950+ Networks 660+ Cities (circa April 2011)

39 How CDNs Work When content is requested from a CDN, the user is directed to the optimal server –This is usually done through the DNS, especially for non-network CDNs –It can be done through anycasting for network-owned CDNs. Users who query DNS-based CDNs may be returned different A records for the same hostname. This is called mapping. The better the mapping, the better the CDN

40 How CDNs Work Example of CDN mapping –Notice the different A records for different locations: [NYC]% host CNAME A A [Boston]% host CNAME A A

41 CDN Mapping: Another Example From Atlanta –% ping PING ( ) 56(84) bytes of data. –64 bytes from ( ): icmp_req=1 ttl=54 time=1.81 ms From Boston –% ping PING ( ) 56(84) bytes of data. 64 bytes from ( ): icmp_seq=1 ttl=52 time=22.8 ms 41

42 How CDNs Work CDNs use multiple criteria to choose the optimal server –These include standard network metrics: Latency Throughput Packet loss –These also include things like CPU load on the server, HD space, network utilization, etc. Geography still counts –That whole speed-of-light thing –Should be able to solve that with the next version of ethernet…

43 Why CDNs Peer with ISPs The first and foremost reason to peer is improved performance –Since a CDN tries to serve content as close to the end user as possible, peering directly with networks (over non-congested links) obviously helps Peering gives better throughput –Removing intermediate AS hops seems to give higher peak traffic for same demand profile –Might be due to lower latency opening TCP windows faster –Might be due to lower packet loss

44 Why CDNs Peer with ISPs Redundancy –Having more possible vectors to deliver content increases reliability Burstability –During large events, having direct connectivity to multiple networks allows for higher burstability than a single connection to a transit provider Burstability is important to CDNs –One of the reasons customers use CDNs is for burstability

45 Why CDNs Peer with ISPs Peering reduces costs –Reduces transit bill (duh) Network Intelligence –Receiving BGP directly from multiple ASes helps CDNs map the Internet Backup for on-net servers –If there are servers on-net, the IX can act as a backup during downtime and overflow –Allows serving different content types

46 Why ISPs peer with CDNs Performance –CDNs and ISPs are in the same business, just on different sides - we both want to serve end users as quickly and reliably as possible –You know more about your network than any CDN ever will, so working with the CDN directly can help them deliver the content more quickly and reliably Cost Reduction –Transit savings –Possible backbone savings

47 How Non-Network CDNs use IXes Non-network CDNs do not have a backbone, so each IX instance is independent. The CDN uses transit to pull content into the servers. Content is then served to peers over the IX. [Figure: content flows from the origin server over transit to the CDN servers, then across the IX to the peer network.]

48 How CDNs use IXes Non-network CDNs usually do not announce large blocks of address space because no one location has a large number of servers –It is not uncommon to see a single /24 from a CDN at an IX This does not mean you will not see a lot of traffic –How many web servers does it take to fill a gigabit these days?

49 How well do CDNs work? [Figure: network diagram with labels ISP, Backbone ISP, IX, Site, Sites, and Hosting Center; nodes marked S, C, CS, and OS.]

50 Reduced latency can improve TCP performance DNS round trip TCP handshake (2 round trips) Slow-start –~8 round trips to fill DSL pipe –total 128K bytes Compare to 56 Kbytes for home page Download finished before slow-start completes Total 11 round trips Coast-to-coast propagation delay is about 15 ms –Measured RTT last night was 50ms No difference between west coast and Cornell! 30 ms improvement in RTT means 330 ms total improvement –Certainly noticeable
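The slow-start figures can be checked with a short calculation. This sketch assumes an initial window of one segment that doubles every round trip and a 512-byte segment size; the segment size is an assumption chosen because it reproduces the slide's figure of roughly 128 Kbytes after 8 rounds. Function names are hypothetical.

```python
def bytes_after_rounds(rounds, mss=512, initial_window=1):
    """Total bytes delivered after the given number of slow-start rounds."""
    segments = sum(initial_window * 2 ** i for i in range(rounds))
    return segments * mss


def rounds_to_send(total_bytes, mss=512, initial_window=1):
    """Number of slow-start rounds needed to deliver total_bytes."""
    rounds = 0
    while bytes_after_rounds(rounds, mss, initial_window) < total_bytes:
        rounds += 1
    return rounds


def total_saving_ms(round_trips, rtt_saving_ms):
    """Latency saved when every round trip gets shorter by rtt_saving_ms."""
    return round_trips * rtt_saving_ms
```

Under these assumptions, 8 rounds deliver 130,560 bytes (about 128 Kbytes), a 56 Kbyte home page needs only 7 rounds, and shaving 30 ms off each of 11 round trips saves 330 ms in total.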

51 Let's look at a study Zhang, Krishnamurthy and Wills –AT&T Labs. Traces taken in Sept and Jan. Compared CDNs with each other. Compared CDNs against non-CDN

52 Methodology Selected a bunch of CDNs –Akamai, Speedera, Digital Island Note, most of these gone now! Selected a number of non-CDN sites for which good performance could be expected –U.S. and international origin –U.S.: Amazon, Bloomberg, CNN, ESPN, MTV, NASA, Playboy, Sony, Yahoo Selected a set of images of comparable size for each CDN and non-CDN site –Compare apples to apples Downloaded images from 24 NIMI machines

53 Response Time Results (II): Including DNS Lookup Time [Figure: cumulative probability plot of response time; annotation: "About one second".] Author conclusion: CDNs generally provide much shorter download time.

54 HTTP (Summary) Simple text-based file exchange protocol –Support for status/error responses, authentication, client-side state maintenance, cache maintenance. Workloads –Typical document structure, popularity –Server workload. Interactions with TCP –Connection setup, reliability, state maintenance –Persistent connections. How to improve performance –Persistent connections –Caching –Replication
