CSCI-1680 Web Performance and Content Distribution Based partly on lecture notes by Scott Shenker and John Jannotti Rodrigo Fonseca.

Slides:



Advertisements
Similar presentations
1 Server Selection & Content Distribution Networks (slides by Srini Seshan, CS CMU)
Advertisements

On the Effectiveness of Measurement Reuse for Performance-Based Detouring David Choffnes Fabian Bustamante Fabian Bustamante Northwestern University INFOCOM.
Content Distribution Networks Costin Raiciu Advanced Topics in Distributed Systems Fall 2012.
1 Content Delivery Networks iBAND2 May 24, 1999 Dave Farber CTO Sandpiper Networks, Inc.
Spring 2003CS 4611 Content Distribution Networks Outline Implementation Techniques Hashing Schemes Redirection Strategies.
1 Caching in HTTP Representation and Management of Data on the Internet.
Web Caching Schemes1 A Survey of Web Caching Schemes for the Internet Jia Wang.
Internet Networking Spring 2006 Tutorial 12 Web Caching Protocols ICP, CARP.
CS162 Operating Systems and Systems Programming Lecture 23 HTTP and Peer-to-Peer Networks April 20, 2011 Ion Stoica
Cis e-commerce -- lecture #6: Content Distribution Networks and P2P (based on notes from Dr Peter McBurney © )
An Analysis of Internet Content Delivery Systems Stefan Saroiu, Krishna P. Gommadi, Richard J. Dunn, Steven D. Gribble, and Henry M. Levy Proceedings of.
The Internet Useful Definitions and Concepts About the Internet.
What’s a Web Cache? Why do people use them? Web cache location Web cache purpose There are two main reasons that Web cache are used:  to reduce latency.
1 Spring Semester 2007, Dept. of Computer Science, Technion Internet Networking recitation #13 Web Caching Protocols ICP, CARP.
CDNs & Replication Prof. Vern Paxson EE122 Fall 2007 TAs: Lisa Fowler, Daniel Killebrew, Jorge Ortiz.
1 Drafting Behind Akamai (Travelocity-Based Detouring) AoJan Su, David R. Choffnes, Aleksandar Kuzmanovic, and Fabian E. Bustamante Department of Electrical.
Internet Networking Spring 2002 Tutorial 13 Web Caching Protocols ICP, CARP.
1 Web Content Delivery Reading: Section and COS 461: Computer Networks Spring 2007 (MW 1:30-2:50 in Friend 004) Ioannis Avramopoulos Instructor:
Web Caching Schemes For The Internet – cont. By Jia Wang.
Web Caching and CDNs March 3, Content Distribution Motivation –Network path from server to client is slow/congested –Web server is overloaded Web.
The Medusa Proxy A Tool For Exploring User- Perceived Web Performance Mimika Koletsou and Geoffrey M. Voelker University of California, San Diego Proceeding.
Caching and Content Distribution Networks. Web Caching r As an example, we use the web to illustrate caching and other related issues browser Web Proxy.
Content Distribution Networks (CDNs) Mike Freedman COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101
Web Cache. Introduction what is web cache?  Introducing proxy servers at certain points in the network that serve in caching Web documents for faster.
Content Distribution Network (CDN) Performance Punit Shah CSE581 Internet Technologies OGI, OHSU 2002, Jan 16th.
Information-Centric Networks05a-1 Week 5 / Paper 1 On the use and performance of content distribution networks –Balachander Krishnamurthy, Craig Wills,
Web Caching and Content Delivery. Caching for a Better Web Performance is a major concern in the Web Proxy caching is the most widely used method to improve.
1 Content Distribution Networks. 2 Replication Issues Request distribution: how to transparently distribute requests for content among replication servers.
DNS and CDNs (Content Distribution Networks) Paul Francis Cornell Computer Science.
1 Caching  Temporary storage of frequently accessed data (duplicating original data stored somewhere else)  Reduces access time/latency for clients 
Content Distribution March 8, : Application Layer1.
Consistent Hashing: Load Balancing in a Changing World
1 3 Web Proxies Web Protocols and Practice. 2 Topics Web Protocols and Practice WEB PROXIES  Web Proxy Definition  Three of the Most Common Intermediaries.
1. 1.Charting the CDNs(locating all their content and DNS servers). 2.Assessing their server availability. 3.Quantifying their world-wide delay performance.
{ Content Distribution Networks ECE544 Dhananjay Makwana Principal Software Engineer, Semandex Networks 5/2/14ECE544.
Web Caching: Replication on the World Wide Web Jonathan Bulava CSC8530 – Distributed Systems Dr. Paul Schragger.
Ao-Jan Su, David R. Choffnes, Fabián E. Bustamante and Aleksandar Kuzmanovic Department of EECS Northwestern University Relative Network Positioning via.
27.1 Chapter 27 WWW and HTTP Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Infrastructure for Better Quality Internet Access & Web Publishing without Increasing Bandwidth Prof. Chi Chi Hung School of Computing, National University.
October 8, 2015 University of Tulsa - Center for Information Security Microsoft Windows 2000 DNS October 8, 2015.
CSCI-1680 Web Performance, Content Distribution P2P Based partly on lecture notes by Scott Shenker and John Jannotti Rodrigo Fonseca.
2: Application Layer1 Chapter 2 outline r 2.1 Principles of app layer protocols r 2.2 Web and HTTP r 2.3 FTP r 2.4 Electronic Mail r 2.5 DNS r 2.6 Socket.
Bellwether: Surrogate Services for Popular Content Duane Wessels & Ted Hardie NANOG 19 June 12, 2000.
Web Caching and Content Distribution: A View From the Interior Syam Gadde Jeff Chase Duke University Michael Rabinovich AT&T Labs - Research.
World Wide Web Caching CS457 Seminar Yutao Zhong 11/13/2001.
HTTP support for caching & replication. Conditional requests Server executes conditional request. Responds with a message body only if the condition is.
Web Cache Consistency. “Requirements of performance, availability, and disconnected operation require us to relax the goal of semantic transparency.”
HTTP evolution - TCP/IP issues Lecture 4 CM David De Roure
Implementing ISA Server Caching
Information-Centric Networks Section # 5.3: Content Distribution Instructor: George Xylomenos Department: Informatics.
EE 122: Lecture 21 (HyperText Transfer Protocol - HTTP) Ion Stoica Nov 20, 2001 (*)
The Internet. Important Terms Network Network Internet Internet WWW (World Wide Web) WWW (World Wide Web) Web page Web page Web site Web site Browser.
1 COMP 431 Internet Services & Protocols HTTP Persistence & Web Caching Jasleen Kaur February 11, 2016.
Hiearchial Caching in Traffic Server. Hiearchial Caching  A set of techniques and mechanisms to increase the size and performance of network caches.
09/13/04 CDA 6506 Network Architecture and Client/Server Computing Peer-to-Peer Computing and Content Distribution Networks by Zornitza Genova Prodanoff.
Content Distribution Networks (CDNs)
John S. Otto Mario A. Sánchez John P. Rula Fabián E. Bustamante Northwestern, EECS.
Multicast in Information-Centric Networking March 2012.
Squid HTTP Proxy Henrik Nordström Open Source Consultant Squid developer.
Content Distribution Networks
WWW and HTTP King Fahd University of Petroleum & Minerals
Caching Temporary storage of frequently accessed data (duplicating original data stored somewhere else) Reduces access time/latency for clients Reduces.
Ad-blocker circumvention System
Internet Networking recitation #12
ECE 671 – Lecture 16 Content Distribution Networks
COS 518: Advanced Computer Systems Lecture 9 Michael Freedman
Edge computing (1) Content Distribution Networks
CSCI-1680 Web Performance, Content Distribution P2P
EE 122: Lecture 22 (Overlay Networks)
Caching 50.5* + Apache Kafka
Presentation transcript:

CSCI-1680 Web Performance and Content Distribution Based partly on lecture notes by Scott Shenker and John Jannotti Rodrigo Fonseca

Last time HTTP and the WWW Some performance issues –Persistent Connections, Pipeline, Multiple Connections –Caching Today –More on Caching –Content Distribution Networks

Caching Why cache content? –Client (browser): avoid extra network transfers –Server: reduce load on the server –Service Provider: reduce external traffic Server Clients Backbone ISP ISP-1 ISP-2

Caching Why caching works? –Locality of reference: Users tend to request the same object in succession Some objects are popular: requested by many users Server Clients Backbone ISP ISP-1 ISP-2

How well does caching work? Very well, up to a point –Large overlap in requested objects –Objects with one access place upper bound on hit ratio Example: Wikipedia –About 400 servers, 100 are HTTP Caches (Squid) –85% Hit ratio for text, 98% for media

HTTP Cache Control Cache-Control = "Cache-Control" ":" 1#cache-directive cache-directive = cache-request-directive | cache-response-directive cache-request-directive = "no-cache" ; Section | "no-store" ; Section | "max-age" "=" delta-seconds ; Section , | "max-stale" [ "=" delta-seconds ] ; Section | "min-fresh" "=" delta-seconds ; Section | "no-transform" ; Section | "only-if-cached" ; Section | cache-extension ; Section cache-response-directive = "public" ; Section | "private" [ "=" 1#field-name ] ; Section | "no-cache" [ "=" 1#field-name ]; Section | "no-store" ; Section | "no-transform" ; Section | "must-revalidate" ; Section | "proxy-revalidate" ; Section | "max-age" "=" delta-seconds ; Section | "s-maxage" "=" delta-seconds ; Section | cache-extension ; Section cache-extension = token [ "=" ( token | quoted-string ) ]

Reverse Proxies Close to the server –Also called Accelerators –Only work for static content Clients Backbone ISP ISP-1 ISP-2 Server Reverse proxies

Forward Proxies Typically done by ISPs or Enterprises –Reduce network traffic and decrease latency –May be transparent or configured Clients Backbone ISP ISP-1 ISP-2 Server Reverse proxies Forward proxies

Content Distribution Networks Integrate forward and reverse caching –One network generally administered by one entity –E.g. Akamai Provide document caching –Pull: result from client requests –Push: expectation of high access rates to some objects Can also do some processing –Deploy code to handle some dynamic requests –Can do other things, such as transcoding

Example CDN Clients ISP-1 Server Forward proxies Backbone ISP ISP-2 CDN

How Akamai works Akamai has cache servers deployed close to clients –Co-located with many ISPs Challenge: make same domain name resolve to a proxy close to the client Lots of DNS tricks. BestBuy is a customer –Delegate name resolution to Akamai (via a CNAME) From Brown: dig ;; ANSWER SECTION: INCNAMEwww.bestbuy.com.edgesuite.net INCNAMEa1105.b.akamai.net. a1105.b.akamai.net.20INA a1105.b.akamai.net.20INA –Ping time: 2.53ms From Berkeley, CA: a1105.b.akamai.net.20INA a1105.b.akamai.net.20INA –Pint time: 3.20ms

DNS Resolution dig ;; ANSWER SECTION: INCNAMEwww.bestbuy.com.edgesuite.net INCNAMEa1105.b.akamai.net. a1105.b.akamai.net.20INA a1105.b.akamai.net.20INA ;; AUTHORITY SECTION: b.akamai.net.1101INNSn1b.akamai.net. b.akamai.net.1101INNSn0b.akamai.net. ;; ADDITIONAL SECTION: n0b.akamai.net.1267INA n1b.akamai.net.2196INA n1b.akamai.net finds an edge server close to the client’s local resolver Uses knowledge of network: BGP feeds, traceroutes. Their secret sauce…

What about the content? Say you are Akamai –Clusters of machines close to clients –Caching data from many customers –Proxy fetches data from origin server first time it sees a URL Choose cluster based on client network location How to choose server within a cluster? If you choose based on client –Low hit rate: N servers in cluster means N cache misses per URL

Straw man: modulo hashing Say you have N servers Map requests to proxies as follows: –Number servers 0 to N-1 –Compute hash of URL: h = hash (URL) –Redirect client to server #p = h mod N Keep track of load in each proxy –If load on proxy #p is too high, try again with a different hash function (or “salt”) Problem: most caches will be useless if you add or remove proxies, change value of N

Consistent Hashing [Karger et al., 99]Karger et al., 99 URLs and Caches are mapped to points on a circle using a hash function A URL is assigned to the closest cache clockwise Minimizes data movement on change! –When a cache is added, only the items in the preceding segment are moved –When a cache is removed, only the next cache is affected A B C ObjectCache 1B 2C 3C 4A

Consistent Hashing [Karger et al., 99]Karger et al., 99 Minimizes data movement –If 100 caches, add/remove a proxy invalidates ~1% of objects –When proxy overloaded, spill to successor Can also handle servers with different capacities. How? –Give bigger proxies more random points on the ring A B C ObjectCache 1B 2C 3C 4A

CoralCDN What if a content provider can’t pay a CDN? –Slashdotted servers CoralCDN is a clever response to that Say you want to access Instead, try to access What does this accomplish?

CoralCDN –Resolution controlled by the owner of nyud.net –CoralCDN runs a set of DNS servers and a set of HTTP proxies –DNS servers return an HTTP proxy close to the client The HTTP proxies form a Distributed Hash Table, mapping (url -> {proxies}) –The mapping for a URL is stored in the server found by a technique similar to consistent hashing The HTTP proxy can: 1.Return the object if stored locally 2.Fetch it from another CoralCDN proxy if stored there 3.Fetch it from the origin server 4.In case of 3 or 4, store the object locally

Summary HTTP Caching can greatly help performance –Client, ISP, and Server-side caching CDNs make it more effective –Incentives, push/pull, well provisioned –DNS and Anycast tricks for finding close servers –Consistent Hashing for smartly distributing load

Next time Justin DeBrabant will talk about Application Layer Data, or how to write your own application layer protocol…