Presentation on theme: "Nick Feamster CS 3251: Computer Networking I Spring 2013"— Presentation transcript:
1 Nick Feamster CS 3251: Computer Networking I Spring 2013 HTTP and the WebNick Feamster CS 3251: Computer Networking I Spring 2013
2 HTTP Caching Clients often cache documents Challenge: update of documentsIf-Modified-Since requests to checkHTTP 0.9/1.0 used just dateHTTP 1.1 has an opaque “entity tag” (could be a file signature, etc.) as wellWhen/how often should the original be checked for changes?Check every time?Check each session? Day? Etc?Use Expires headerIf no Expires, often use Last-Modified as estimateAsk: What’s the problem with just using date for IMS queries?- Web servers may generate content per-client Examples: CGI scripts, things like google customized home,vary with cookies, client IP, referrer, accept-encoding, etc.
3 Example Cache Check Request GET / HTTP/1.1Accept: */*Accept-Language: en-usAccept-Encoding: gzip, deflateIf-Modified-Since: Mon, 29 Jan :54:18 GMTIf-None-Match: "7a11f-10ed-3a75ae4a"User-Agent: Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)Host:Connection: Keep-Alive
4 Example Cache Check Response HTTP/ Not ModifiedDate: Tue, 27 Mar :50:51 GMTServer: Apache/ (Unix) (Red-Hat/Linux) mod_ssl/2.7.1 OpenSSL/0.9.5a DAV/1.0.2 PHP/4.0.1pl2 mod_perl/1.24Connection: Keep-AliveKeep-Alive: timeout=15, max=100ETag: "7a11f-10ed-3a75ae4a”
5 Ways to cache Client-directed caching: Web Proxies Server-directed caching: Content Delivery Networks (CDNs)
6 Web Proxy Caches User configures browser: Web accesses via cache Browser sends all HTTP requests to cacheObject in cache: cache returns objectElse cache requests object from origin server, then returns object to clientoriginserverProxyserverHTTP requestHTTP requestclientHTTP responseHTTP responseHTTP requestHTTP responseclientoriginserver
7 Caching Example (1) Assumptions Average object size = 100,000 bits Avg. request rate from institution’s browser to origin servers = 15/secDelay from institutional router to any origin server and back to router = 2 secConsequencesUtilization on LAN = 15%Utilization on access link = 100%Total delay = Internet delay + access delay + LAN delay= 2 sec + minutes + millisecondsoriginserverspublicInternet1.5 Mbpsaccess linkinstitutionalnetwork10 Mbps LAN
8 Caching Example (2) Possible solution Increase bandwidth of access link to, say, 10 MbpsOften a costly upgradeConsequencesUtilization on LAN = 15%Utilization on access link = 15%Total delay = Internet delay + access delay + LAN delay= 2 sec + msecs + msecsoriginserverspublicInternet10 Mbpsaccess linkinstitutionalnetwork10 Mbps LAN
9 Caching Example (3) Install cache origin Consequence servers Suppose hit rate is .4Consequence40% requests will be satisfied almost immediately (say 10 msec)60% requests satisfied by origin serverUtilization of access link reduced to 60%, resulting in negligible delaysWeighted average of delays= .6*2 sec + .4*10msecs < 1.3 secsoriginserverspublicInternet1.5 Mbpsaccess linkinstitutionalnetwork10 Mbps LANinstitutionalcache
10 Problems Over 50% of all HTTP objects are uncacheable – why? Not easily solvableDynamic data stock prices, scores, web camsCGI scripts results based on passed parametersObvious fixesSSL encrypted data is not cacheableMost web clients don’t handle mixed pages well many generic objects transferred with SSLCookies results may be based on passed dataHit metering owner wants to measure # of hits for revenue, etc.What will be the end result?
11 Content Distribution Networks (CDNs) The content providers are the CDN customers.Content replicationCDN company installs hundreds of CDN servers throughout InternetClose to usersCDN replicates its customers’ content in CDN servers. When provider updates content, CDN updates serversorigin serverin North AmericaCDN distribution nodeCDN serverin S. AmericaCDN serverin AsiaCDN serverin Europe
12 Outline HTTP intro and details Persistent HTTP HTTP caching Content distribution networks
13 Content Distribution Networks & Server Selection Replicate content on many serversChallengesHow to replicate contentWhere to replicate contentHow to find replicated contentHow to choose among know replicasHow to direct clients towards replica
14 Server Selection Which server? Lowest load to balance load on serversBest performance to improve client performanceBased on Geography? RTT? Throughput? Load?Any alive node to provide fault toleranceHow to direct clients to a particular server?As part of routing anycast, cluster load balancingNot covered As part of application HTTP redirectAs part of naming DNSAsk: What things might you want to pick the server using?How do you direct clients to a server? Anyone know how Akamai works?
15 Application BasedHTTP supports simple way to indicate that Web page has moved (30X responses)Server receives Get request from clientDecides which server is best suited for particular client and objectReturns HTTP redirect to that serverCan make informed application specific decisionMay introduce additional overhead multiple connection setup, name lookups, etc.OK solution in general, but…HTTP Redirect has some flaws – especially with current browsersIncurs many delays, which operators may really care aboutDraw diagram: Client ---> Server<---redir----> Server 2<----answer----
16 Naming Based Client does DNS name lookup for service Name server chooses appropriate server addressA-record returned is “best” one for the clientWhat information can name server base decision on?Server load/location must be collectedInformation in the name lookup requestName service client typically the local name server for client
17 How Akamai Works Clients fetch html document from primary server E.g. fetch index.html from cnn.comURLs for replicated content are replaced in htmlE.g. <img src=“http://cnn.com/af/x.gif”> replaced with <img src=“http://a73.g.akamaitech.net/7/23/cnn.com/af/x.gif”>Client is forced to resolve aXYZ.g.akamaitech.net hostname
18 How Akamai Works How is content replicated? Akamai only replicates static content (*)Modified name contains original file nameAkamai server is asked for contentFirst checks local cacheIf not in cache, requests file from primary server and caches file* (At least, the version we’re talking about today. Akamai actually lets sites write code that can run on Akamai’s servers, but that’s a pretty different beast)
19 How Akamai Works Root server gives NS record for akamai.net Akamai.net name server returns NS record for g.akamaitech.netName server chosen to be in region of client’s name serverTTL is largeG.akamaitech.net nameserver chooses server in regionShould try to chose server that has file in cache - How to choose?Uses aXYZ name and hashTTL is small why?
20 Simple Hashing Given document XYZ, we need to choose a server to use Suppose we use moduloNumber servers from 1…nPlace document XYZ on server (XYZ mod n)What happens when a servers fails? n n-1Same if different people have different measures of nWhy might this be bad?
21 Consistent Hash “view” = subset of all hash buckets that are visible Desired featuresBalanced – in any one view, load is equal across bucketsSmoothness – little impact on hash bucket contents when buckets are added/removedSpread – small set of hash buckets that may hold an object regardless of viewsLoad – across all views # of objects assigned to hash bucket is small
22 Consistent Hash – Example ConstructionAssign each of C hash buckets to random points on mod 2n circle, where, hash key size = n.Map object to random position on circleHash of object = closest clockwise bucket14Bucket1248Smoothness addition of bucket does not cause movement between existing bucketsSpread & Load small set of buckets that lie near objectBalance no bucket is responsible for large number of objects
23 How Akamai Works cnn.com (content provider) DNS root server Akamai serverGet foo.jpg1211Get index.html5123Akamai high-level DNS server647Akamai low-level DNS server8Nearby matching Akamai server910End-userGet /cnn.com/foo.jpg
25 Impact on DNS Usage DNS is used for server selection more and more What are reasonable DNS TTLs for this type of useTypically want to adapt to load changesLow TTL for A-records what about NS records?How does this affect caching?What do the first and subsequent lookup do?