RON: Resilient Overlay Networks David Andersen, Hari Balakrishnan, Frans Kaashoek, and Robert Morris MIT Laboratory for Computer Science


Fault-tolerant networking
Packet switching and route around failures
[Figure labels: Network; A, B, C, D]

Internet: network of networks
ISPs peer to forward packets
ISPs exchange route info using BGP
[Figure labels: ISP1, ISP2, ISP3; Site 1, Site 2, Site 3, Site 4, Site 5]

The Internet is ill suited to mission-critical applications
Commercial peer architecture
–Performance bottlenecks at peering points
–Ignores many existing alternate paths
–Directly conflicts with robustness
Internet’s global scale:
–Prevents sophisticated algorithms
–Route selection uses fixed, simple metrics
–Routing isn’t sensitive to path quality

How robust is Internet routing?
Paxson
–% of all routes had serious problems
Labovitz
–% of routes available < 95% of the time
–65% of routes available < 99.9% of the time
–3-min minimum detection+recovery time; often 15 mins
–40% of outages took 30+ mins to repair
Chandra 01
–5% of faults last more than 2.75 hours

Our goal
To improve communication availability for small groups by at least a factor of 10
Many applications
–Collaboration and conferencing
–Virtual Private Networks (VPNs) across public Internet
–Overlay Internet Service

Overlay routes around Internet failures
[Figure labels: Utah Utah Company MIT Cable Modem]
Failures:
–Outages: Configuration/operational errors, backhoes, etc.
–Performance failures: Severe congestion, denial-of-service attacks, etc.

Scalability versus recovery
Internet scalability pays a price:
–Slow recovery
RON recovers fast by
–Limiting size of overlay
–Exploiting redundancy in underlying Internet

Redundant links
Multiple paths between all sites
[Figure labels: Utah, Company, Cable Modem, Utah, MIT, Internet 2]

Redundant links
But many of them are hidden
[Figure labels: Utah, Company, Cable Modem, Utah, MIT]

Resilient overlay networks
Measure all links between nodes
Compute path properties
Determine best route
Forward traffic over that path
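The four steps on this slide can be sketched as a small route chooser. This is an illustrative sketch only, not the RON implementation: the function name `best_route`, the latency table keyed by (from, to) node pairs, and the use of latency as the only path property are all assumptions. The search is limited to the direct path or one intermediate overlay node.

```python
from typing import Dict, List, Tuple

# Hypothetical measurement table: latency in ms for each overlay link (from, to).
Latency = Dict[Tuple[str, str], float]


def best_route(src: str, dst: str, nodes: List[str],
               latency: Latency) -> Tuple[List[str], float]:
    """Pick the lowest-latency path: the direct link or one intermediate node."""
    inf = float("inf")
    # Start from the direct path; a link with no measurement counts as unusable.
    best_path, best_cost = [src, dst], latency.get((src, dst), inf)
    for hop in nodes:
        if hop in (src, dst):
            continue
        # Path property of an indirect route: sum of its two overlay links.
        cost = latency.get((src, hop), inf) + latency.get((hop, dst), inf)
        if cost < best_cost:
            best_path, best_cost = [src, hop, dst], cost
    return best_path, best_cost
```

With a slow direct link the chooser returns the indirect path through the best intermediate node; when the direct link is fastest it returns the direct path unchanged.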

RON: routing using overlays
Types of failures
–Outages: Configuration/operational errors, backhoes, etc.
–Performance failures: Severe congestion, denial-of-service attacks, etc.
Scalable BGP-based IP routing substrate
Reliability via path monitoring and re-routing

RON design
Nodes in different routing domains (ASes)
[Figure labels: Prober, Router, Forwarder, Conduit, Performance Database, Application-specific routing tables, Policy routing module, RON library]

Routing and path selection
Path selection at the entry node
–Specialized for routing through one intermediate node
Router computes the forwarding tables
–Link-state dissemination through RON
Path evaluation and selection
–Latency minimizer: EWMA of round-trip samples
–Loss-rate minimizer: average of the last k samples
–Throughput optimizer: TCP throughput equation
Select when estimated throughput improves by 2x
5% hysteresis to avoid flapping
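The latency and loss-rate estimators named on this slide, and the hysteresis rule, might look roughly like the sketch below. The class and function names, the smoothing weight `alpha`, and the window size `k` are assumptions for illustration. The slide does not say how the 5% hysteresis is applied, so the rule here (switch only if the candidate is at least 5% better) is one plausible reading. The throughput optimizer is not covered.

```python
from collections import deque
from typing import Optional


class LatencyEstimator:
    """Exponentially weighted moving average (EWMA) of round-trip samples."""

    def __init__(self, alpha: float = 0.9) -> None:
        self.alpha = alpha  # weight kept by the old estimate (assumed value)
        self.value: Optional[float] = None

    def update(self, rtt_ms: float) -> float:
        if self.value is None:
            self.value = rtt_ms  # the first sample seeds the estimate
        else:
            self.value = self.alpha * self.value + (1 - self.alpha) * rtt_ms
        return self.value


class LossEstimator:
    """Average loss over the last k probes."""

    def __init__(self, k: int = 100) -> None:
        self.samples = deque(maxlen=k)  # the oldest sample drops out automatically

    def update(self, lost: bool) -> float:
        self.samples.append(1.0 if lost else 0.0)
        return sum(self.samples) / len(self.samples)


def should_switch(current_cost: float, candidate_cost: float,
                  hysteresis: float = 0.05) -> bool:
    """Switch paths only if the candidate beats the current path by the margin."""
    return candidate_cost < current_cost * (1 - hysteresis)
```

The margin keeps two paths with nearly equal estimates from trading places on every probe, which is the flapping the slide refers to.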

Policy routing
Router computes a forwarding table for each policy
Two ways of describing policies:
–Exclusive cliques (e.g., educational only)
–General policies
BPF-like packet matcher, which returns a policy
Links that are denied by a policy
Entry node classifies packet with a policy tag
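A rough sketch of the classification step, under stated assumptions: packets are plain dictionaries, the matcher is an ordered list of (predicate, tag) rules standing in for the BPF-like matcher, and each policy tag maps to a set of denied links. None of these names or data shapes come from RON itself.

```python
from typing import Callable, Dict, List, Set, Tuple

Packet = Dict[str, str]
Link = Tuple[str, str]
Rule = Tuple[Callable[[Packet], bool], str]  # (predicate, policy tag)


def classify(packet: Packet, rules: List[Rule], default: str = "default") -> str:
    """Return the policy tag of the first matching rule (done once, at the entry node)."""
    for predicate, tag in rules:
        if predicate(packet):
            return tag
    return default


def allowed_links(links: List[Link], denied_by_policy: Dict[str, Set[Link]],
                  tag: str) -> List[Link]:
    """Drop the links a policy denies; the router would build one table per tag from the rest."""
    denied = denied_by_policy.get(tag, set())
    return [link for link in links if link not in denied]
```

For example, a rule that tags traffic to `.edu` destinations as "educational" lets those packets use every link, while the default policy can deny a link such as an Internet2 path.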

Responding to failure
Probe interval: 12 seconds
Probe timeout: 3 seconds
Routing update interval: 14 seconds

RON overhead
Probe overhead: 69 bytes
RON routing overhead: (N-1) 50: allows recovery times between 12 and 25 s
Nodes      10         20         30        40        50
Overhead   1.8 Kbps   5.9 Kbps   12 Kbps   21 Kbps   32 Kbps

Many research questions
Does the RON approach work at all?
Each RON is small in size, no more than 50 or 100 nodes
–How fast can failure detection & recovery happen?
Policy routing
–Doesn’t RON violate AUPs and other policies?
Routing behavior
–Can stable routing be achieved?
–Implementing efficient multi-criteria routing
Is it safe to deploy a large number of (small) interacting RONs on the Internet?

IP forwarder
A RON application
Transparently forwards IP traffic over RON
Allows comparisons of IP traffic over RON versus over direct Internet

RON deployment (19 sites)
.com (ca), .com (ca), dsl (or), cci (ut), aros (ut), utah.edu, .com (tx), cmu (pa), dsl (nc), nyu, cornell, cable (ma), cisco (ma), mit, vu.nl, lulea.se, ucl.uk, kaist.kr, univ-in-venezuela
[Figure labels: To vu.nl, lulea.se, ucl.uk; To kaist.kr, .ve]

AS view

Experiments
Measure loss, latency, and throughput with and without RON
RON1: 12 hosts in the US and Europe
–64 hours of measurements in March 2001
RON2: 16 hosts
–85 hours of measurements in May
30-minute average loss rates
–A 30-minute outage is very serious!
Note: Experiments done with “No-Internet2-for-commercial-use” policy

Take home messages
1. RON reduced outages by a factor of 5 to 10, and routed around all major outages
2. RON takes 18 s (average) to route around a failure, and can do so in the face of flooding attacks
3. Single route indirection delivers the majority of RON’s benefits

RON improves loss-rate
[Figure axes: 30-min average loss rate with RON; 30-min average loss rate on Internet]
13,000 samples
RON loss rate never more than 30%

An order-of-magnitude fewer failures
Loss Rate   RON Better   No Change   RON Worse
10%         526 [517]    58 [51]     47 [45]
20%         142 [140]    4 [3]       15 [15]
30%         32 [32]      0           0
50%         20 [20]      0           0
80%         14 [14]      0           0
100%
30-minute average loss rates
6,825 “path hours” represented here
12 “path hours” of essentially complete outage
72 “path hours” of TCP outage
RON routed around all of these!
One indirection hop provides almost all the benefit!

Why does one hop work?
In RON testbed:
–P(direct path is good) is 48.8%
–P(intermediate path is good) is 51%
[Figure labels: source, target, R RON nodes; Good (p), Bad (1-p)]
P(good path) = (1 – (1-p)^2)^(R+1)

Resilience Against DoS Attacks

Latency using RON

What’s next for RON?
Data mining of collected samples
Applications
Routing policies (e.g., rate control)

Other progress: Chord
Chord: a peer-to-peer lookup system
CFS: a peer-to-peer file sharing application

Conclusion
Improved availability of Internet communication paths using small overlays
–Layered above scalable IP substrate
–RON provides a set of libraries and programs to facilitate this application-specific routing
Experimental data suggest that approach works
–Over 10X availability
–Outage detection and recovery in about 15 seconds
–Able to route around certain denial-of-service attacks
Many interesting questions remain…

Policy Routing
Today, wide-area policy expression is a sledgehammer
Policy control is important
–From talking to some providers
–E.g., rate control policy; Internet2, etc.
True, RONs could violate AUPs
But, the RON approach enables more flexible policies
–More complex routing decisions; rate-based too
–Multiple routing tables
–Deeper packet inspection, etc.

Example

Throughput Improvement