Content Distribution Networks Costin Raiciu Advanced Topics in Distributed Systems Fall 2012.



Problem: making the web go faster Use case: type ebay.com in your browser – The site takes a while to load – How long do you wait before you give up and try another site? 40% of customers will wait no more than 3 seconds for a webpage to load [Forrester Consulting]

Why would a site take long to load? Let’s assume the content is generated quickly by the servers A webpage will have many small objects and perhaps a few larger ones What network conditions will affect download time?

The Internet [Figure: autonomous systems (AS), origin server, authoritative DNS; steps shown: lookup, download webpage]

How do we fix the web? Change the Internet architecture – rather difficult to deploy More pragmatic choice: target static web content – Basic idea: bring content closer to the clients Design principles – Design for reliability and scalability (Akamai has 120K servers today) – Limit the need for human management

CDNs are really popular Quick check

The Basic Idea of CDNs: Cache static content near the user [Figure: autonomous systems (AS), origin server]

CDN Operation Summary 1. Lookup: the authoritative DNS delegates the zone to the CDN nameserver 2. The CDN nameserver uses the resolver’s IP to find the edge server E that is nearest to the customer (geo-location) 3. Edge server E will serve the file from local storage if it has it 4. Otherwise it will fetch the file from the origin server [Figure: autonomous systems (AS), origin server, authoritative DNS, CDN nameserver, DNS resolver, edge server E]
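Steps 3 and 4 of the summary can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the EdgeServer class, the fake_origin function, and the origin_fetches counter are hypothetical names, and a real edge server would also handle expiry, invalidation, and failures.

```python
from typing import Callable, Dict


class EdgeServer:
    """Toy edge server: serve from local storage, else fetch from the origin."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch_from_origin = fetch_from_origin  # hypothetical origin fetcher
        self._store: Dict[str, bytes] = {}           # local storage: path -> body
        self.origin_fetches = 0                      # number of cache misses so far

    def serve(self, path: str) -> bytes:
        # Step 3: serve the file from local storage if it is there.
        if path in self._store:
            return self._store[path]
        # Step 4: otherwise fetch it from the origin server and keep a copy.
        body = self._fetch_from_origin(path)
        self.origin_fetches += 1
        self._store[path] = body
        return body


def fake_origin(path: str) -> bytes:
    """Stand-in for the origin server (an assumption for this sketch)."""
    return ("content of " + path).encode()


edge = EdgeServer(fake_origin)
first = edge.serve("/index.html")   # miss: goes to the origin
second = edge.serve("/index.html")  # hit: served from local storage
```

Both calls return the same bytes, but only the first one reaches the origin.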

Overview of the Akamai CDN [Akamai]

Edge server functionality Located at ISPs worldwide - close to customers Maintain metadata for each served file Metadata specified by: – XML files delivered using Akamai infrastructure – Akamai specific HTTP Response Headers

Metadata stored for each file Origin server location Content path Cache control – how long should we store the replica, how is it invalidated? Cache indexing – how should a URL be used to create a key for that object? Access control – who is allowed to view this file? Response to origin server failure HTTP Header alteration – rewrite headers including cookies to deal with different browsers etc.
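To make the cache indexing question concrete, here is a minimal sketch of one possible rule for turning a URL into a cache key. The rule itself (lowercase the host, drop some query parameters, sort the rest) and the parameter name sessionid are assumptions for illustration; the slides do not say which rules Akamai applies.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(url: str, ignored_params: frozenset = frozenset({"sessionid"})) -> str:
    """Build a cache key so that equivalent URLs map to the same object."""
    parts = urlsplit(url)
    # Drop parameters that do not change the content, then sort the rest
    # so that parameter order does not create distinct keys.
    kept = sorted(
        (name, value)
        for name, value in parse_qsl(parts.query)
        if name not in ignored_params
    )
    query = urlencode(kept)
    host = parts.netloc.lower()  # host names are case-insensitive
    return host + parts.path + ("?" + query if query else "")


key_a = cache_key("http://Example.com/a.png?sessionid=1&size=2")
key_b = cache_key("http://example.com/a.png?size=2&sessionid=9")
```

Both URLs yield the same key, so the edge server stores one replica instead of two.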

Mapping System A scoring system creates an up-to-date topological map of the Internet – Divides IP addresses into equivalence classes – Computes connectivity between these classes Implementation: – Run and parse pings and traceroutes in real time – Parse BGP data and logs

Mapping System The Real Time Mapping system creates the actual maps used to direct users to the best Akamai edge servers Runs in two steps: a) Map to cluster: select a preferred edge server cluster for each equivalence class of users b) Map to server: a low level map sends the user to a specific server within the cluster (goal: maintain locality within clusters)
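The two steps can be sketched as two small lookups. Everything concrete here is an assumption: the address prefixes, cluster names, and server names are made up, and hashing the hostname is just one simple way to keep requests for the same content on the same server within a cluster, not necessarily what Akamai does.

```python
import hashlib
import ipaddress

# Hypothetical equivalence classes (address prefixes) and their preferred clusters.
CLUSTER_FOR_CLASS = {
    ipaddress.ip_network("192.0.2.0/24"): "cluster-a",
    ipaddress.ip_network("198.51.100.0/24"): "cluster-b",
}

# Hypothetical servers inside each cluster.
SERVERS_IN_CLUSTER = {
    "cluster-a": ["a1", "a2", "a3"],
    "cluster-b": ["b1", "b2"],
}


def map_to_cluster(resolver_ip: str) -> str:
    """Step a: pick the preferred cluster for the resolver's equivalence class."""
    address = ipaddress.ip_address(resolver_ip)
    for network, cluster in CLUSTER_FOR_CLASS.items():
        if address in network:
            return cluster
    raise LookupError("no equivalence class for " + resolver_ip)


def map_to_server(cluster: str, hostname: str) -> str:
    """Step b: pick one server in the cluster, always the same one per hostname."""
    servers = SERVERS_IN_CLUSTER[cluster]
    digest = hashlib.sha256(hostname.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


cluster = map_to_cluster("192.0.2.7")
server = map_to_server(cluster, "www.example.com")
```

Because the server choice depends only on the hostname, repeated requests for the same site land on the same cache within the cluster.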

Implementing the mapping system using DNS 1. The first request goes to generic TLD servers, which return Akamai Top Level Name Servers (TLNS) as authorities, generally with long DNS TTLs. The Akamai TLNS are globally distributed, using a mixture of IP Anycast and large clusters. 2. The next query, to an Akamai TLNS, returns delegations with shorter DNS TTLs to a number of Akamai Low Level Name Servers (LLNS). The Akamai LLNS are typically located in close network proximity to the resolving name server. 3. The final query, to an Akamai LLNS, returns edge server IP addresses based on both the cluster assignment and the low level map described above. These answers have very short TTLs so that changes to the mapping assignments (such as in response to failures or shifts in demand) can be rapidly distributed to end users.
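The effect of the three TTL tiers can be sketched with a toy resolver cache. The TTL values (one day, one hour, 20 seconds) and the names are assumptions chosen only to show the pattern of long, shorter, and very short TTLs; the slides do not give actual values.

```python
from typing import Dict, Optional, Tuple


class TtlCache:
    """Toy resolver cache: an answer is usable until its TTL runs out."""

    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, name: str, answer: str, ttl_seconds: float, now: float) -> None:
        self._entries[name] = (answer, now + ttl_seconds)

    def get(self, name: str, now: float) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        answer, expires_at = entry
        if now >= expires_at:
            del self._entries[name]  # expired: the resolver must ask again
            return None
        return answer


cache = TtlCache()
# Illustrative TTLs only: long, shorter, very short.
cache.put("tlns-delegation", "tlns.example", ttl_seconds=86400, now=0)
cache.put("llns-delegation", "llns.example", ttl_seconds=3600, now=0)
cache.put("edge-address", "192.0.2.10", ttl_seconds=20, now=0)

names = ("tlns-delegation", "llns-delegation", "edge-address")
after_one_minute = {name: cache.get(name, now=60) for name in names}
```

After one minute only the edge address has expired, so the resolver goes back to the LLNS alone and picks up any new mapping without repeating the earlier steps.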

Akamai’s Transport protocol The communications between any two Akamai servers can be optimized to overcome the inefficiencies of BGP routing Goals: – Accelerate non-cacheable content – Accelerate apps that check origin server for freshness Techniques: – Path optimizations – Protocol enhancements

Path Optimization Build an Internet overlay – Use end-to-end path quality between servers maintained by the mapping system – Move traffic onto the best performing path according to measurements or use multiple paths % performance improvements in Asia

Akamai Transport Protocol Proprietary (modified TCP) Use pools of persistent connections to avoid the three-way handshake (3WHS) Play with TCP Window size based on path conditions – E.g. increase initial cwnd when path is known to be good Set aggressive timeouts based on known path information
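Why a larger initial cwnd helps can be seen with a small calculation. The sketch below assumes idealized TCP slow start (no loss, window doubles every round trip, no receiver limit); it says nothing about the actual proprietary protocol, and the window sizes 3 and 10 are illustrative.

```python
def slow_start_round_trips(segments: int, initial_cwnd: int) -> int:
    """Round trips needed to send `segments` under idealized slow start."""
    if segments <= 0:
        return 0
    if initial_cwnd <= 0:
        raise ValueError("initial_cwnd must be positive")
    sent, cwnd, rounds = 0, initial_cwnd, 0
    while sent < segments:
        sent += cwnd   # send one full window in this round trip
        cwnd *= 2      # window doubles after each loss-free round trip
        rounds += 1
    return rounds


small_window = slow_start_round_trips(100, initial_cwnd=3)
large_window = slow_start_round_trips(100, initial_cwnd=10)
```

For a 100-segment object, starting at 3 segments takes 6 round trips and starting at 10 takes 4, which matters most on long, high-latency paths.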

Coral CDN (slides adapted from Mike Freedman)

A problem… Feb 3: Google linked banner to “julia fractals” Users clicking directed to Australian University web site …University’s network link overloaded, web server taken down temporarily…

The problem strikes again! Feb 4: Slashdot ran the story about Google …Site taken down temporarily…again

The response from down under… Feb 4, later…Paul Bourke asks: “They have hundreds (thousands?) of servers worldwide that distribute their traffic load. If even a small percentage of that traffic is directed to a single server … what chance does it have?” → Help the little guy ←

Coral’s solution… Pool resources to dissipate flash crowds Implement an open CDN Allow anybody to contribute Works with unmodified clients CDN only fetches once from origin server [Figure: browser, origin server, and several Coral nodes, each running httpprx and dnssrv]

Using CoralCDN Rewrite URLs into “Coralized” URLs → – Directs clients to Coral, which absorbs load Who might “Coralize” URLs? – Web server operators Coralize URLs – Coralized URLs posted to portals, mailing lists – Users explicitly Coralize URLs

CoralCDN components DNS Redirection: return proxy, preferably one near client Cooperative Web Caching: fetch data from a nearby proxy [Figure: browser, resolver, origin server, and Coral nodes running httpprx and dnssrv]

Functionality needed DNS: Given network location of resolver, return a proxy near the client put (network info, self) get (resolver info) → {proxies} HTTP: Given URL, find proxy caching object, preferably one nearby put (URL, self) get (URL) → {proxies}
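The put/get interface above can be sketched as a tiny in-memory index. This is a single-process stand-in, not Coral's distributed index: the class name CoralIndex and the proxy names are hypothetical, and the same structure is reused for both the DNS case (key = network info) and the HTTP case (key = URL).

```python
from collections import defaultdict
from typing import Dict, List


class CoralIndex:
    """Toy index: each key maps to the proxies that registered under it."""

    def __init__(self) -> None:
        self._values: Dict[str, List[str]] = defaultdict(list)

    def put(self, key: str, proxy: str) -> None:
        # A proxy announces itself ("self" in the slide) under the key.
        if proxy not in self._values[key]:
            self._values[key].append(proxy)

    def get(self, key: str) -> List[str]:
        # Return a copy so callers cannot modify the index by accident.
        return list(self._values.get(key, []))


index = CoralIndex()
url = "http://example.com/fractal.jpg"
index.put(url, "proxy-1")
index.put(url, "proxy-2")
index.put(url, "proxy-1")  # repeated announcement, ignored
holders = index.get(url)
```

A proxy that misses in its own cache would call get(URL) and fetch from one of the returned proxies before falling back to the origin server.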

Key Idea Use a distributed hash table – but locality is poor – So use multiple DHTs (called clusters)! – Each peer takes part in 3 clusters based on network proximity (<20ms, <60ms, all others) – Insertions are done in all DHTs – Lookups prefer “nearest” DHT A lot more details in the paper.
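A minimal sketch of the insert-everywhere, look-up-nearest-first rule follows. Plain dictionaries stand in for the DHTs, and the region names and node names are hypothetical; the real system is described in the Coral paper.

```python
from typing import Dict, List, Sequence

Table = Dict[str, List[str]]  # stand-in for one DHT: key -> proxy names


class CoralNode:
    """Toy node that takes part in three clusters, listed nearest first."""

    def __init__(self, name: str, clusters: Sequence[Table]) -> None:
        self.name = name
        self.clusters = list(clusters)  # [<20ms, <60ms, all others]

    def put(self, url: str) -> None:
        # Insertions are done in all DHTs the node belongs to.
        for table in self.clusters:
            holders = table.setdefault(url, [])
            if self.name not in holders:
                holders.append(self.name)

    def get(self, url: str) -> List[str]:
        # Lookups prefer the nearest DHT and stop at the first hit.
        for table in self.clusters:
            if url in table:
                return list(table[url])
        return []


global_dht: Table = {}
europe_20ms: Table = {}
europe_60ms: Table = {}
asia_20ms: Table = {}
asia_60ms: Table = {}

a = CoralNode("a", [europe_20ms, europe_60ms, global_dht])
b = CoralNode("b", [europe_20ms, europe_60ms, global_dht])
c = CoralNode("c", [asia_20ms, asia_60ms, global_dht])
d = CoralNode("d", [{}, {}, global_dht])  # no nearby peers yet

url = "http://example.com/big.jpg"
a.put(url)
c.put(url)
near = b.get(url)  # found in the <20ms cluster shared with a
far = d.get(url)   # only the global DHT has an entry
```

Node b learns only about its close neighbour a, while d has to fall back to the global DHT and sees both holders.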

Challenges for DNS Redirection Coral lacks… – Central management – A priori knowledge of network topology (anybody can join system) – Any special tools (e.g., BGP feeds) Coral has… – Large # of vantage points to probe topology – Distributed index in which to store network hints – Each Coral node maps nearby networks to self

Coral’s DNS Redirection Coral DNS server probes resolver Once local, stay local When serving requests from nearby DNS resolver – Respond with nearby Coral proxies – Respond with nearby Coral DNS servers → Ensures future requests remain local Else, help resolver find local Coral DNS server

DNS measurement mechanism Server probes client (2 RTTs) Return servers within appropriate cluster – e.g., for resolver RTT = 19 ms, return from cluster < 20 ms Use network hints to find nearby servers – i.e., client and server on same subnet Otherwise, take random walk within cluster [Figure: browser, resolver, and Coral node running httpprx and dnssrv]
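Choosing the appropriate cluster from a measured RTT can be sketched as a threshold lookup. The 20 ms and 60 ms limits come from the slides; the function name and the level numbering (0 = nearest) are assumptions for illustration.

```python
def cluster_level(rtt_ms: float) -> int:
    """Return 0 for the <20 ms cluster, 1 for <60 ms, 2 for all others."""
    thresholds = (20.0, 60.0)
    for level, limit in enumerate(thresholds):
        if rtt_ms < limit:
            return level
    return len(thresholds)


level_for_example = cluster_level(19.0)  # the slide's example: RTT = 19 ms
```

With a resolver RTT of 19 ms the DNS server answers from the nearest cluster; a resolver at 150 ms would get servers from the global cluster instead.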

References
Forrester Consulting. eCommerce Web Site Performance Today: An Updated Look At Consumer Reaction To A Poor Online Shopping Experience. Aug. 17.
Erik Nygren et al. The Akamai Network: A Platform for High-Performance Internet Applications. work_overview_osr.pdf
Michael J. Freedman, Eric Freudenthal, and David Mazières. Democratizing Content Publication with Coral. In Proc. 1st USENIX/ACM Symposium on Networked Systems Design and Implementation