RON: Resilient Overlay Networks David Andersen, Hari Balakrishnan, Frans Kaashoek, Robert Morris MIT Laboratory for Computer Science


1 RON: Resilient Overlay Networks
David Andersen, Hari Balakrishnan, Frans Kaashoek, Robert Morris
MIT Laboratory for Computer Science
http://nms.lcs.mit.edu/ron/

2 Fault-tolerant Networking
Any-to-any communication, routing around failures
[Figure: a network connecting nodes A, B, C, and D]

3 The Internet
[Figure: autonomous systems (ASes) speaking BGP4, from a mom-and-pop ISP through a big ISP to a "really-big ISP everyone's afraid of", connected by transit and peering links]
– Scalability via aggressive aggregation and information hiding
– Commercial reality via peering & transit relationships

4 How Robust is Internet Routing?
1. Slow outage detection and recovery
2. Inability to detect badly performing paths
3. Inability to efficiently leverage redundant paths
4. Inability to perform application-specific routing
5. Inability to express sophisticated routing policy
Paxson 95-97: 3.3% of all routes had serious problems
Labovitz 97-00:
– 10% of routes available < 95% of the time
– 65% of routes available < 99.9% of the time
– 3-min minimum detection+recovery time; often 15 mins
– 40% of outages took 30+ mins to repair
Chandra 01: 5% of faults last more than 2.75 hours
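The availability thresholds above are easier to compare as downtime. This small sketch does that arithmetic, assuming a 365-day year and treating availability as the fraction of time a route works; the name `downtime_hours` is illustrative only.

```python
# Convert an availability fraction into the expected downtime over a period.
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years ignored (assumption)


def downtime_hours(availability: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the hours of unavailability implied by `availability` over `period_hours`."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_hours


if __name__ == "__main__":
    for a in (0.95, 0.999):
        print(f"{a:.1%} available -> {downtime_hours(a):.1f} hours down per year")
```

By this arithmetic, a route available less than 95% of the time is down for more than about 438 hours a year, and one available less than 99.9% of the time is down for more than about 8.8 hours a year.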

5 Our Goal
To improve communication availability for small groups by at least a factor of 10
Many applications:
– Collaboration and conferencing
– Virtual Private Networks (VPNs) across the public Internet
– Overlay Internet service

6 RON: Routing Using Overlays
Cooperating end-systems in different routing domains can conspire to do better than scalable wide-area protocols
Types of failures:
– Outages: configuration/operational errors, backhoes, etc.
– Performance failures: severe congestion, denial-of-service attacks, etc.
[Figure: RON overlay above the scalable BGP-based IP routing substrate; reliability via path monitoring and re-routing]
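Path monitoring and re-routing can be sketched in a few lines. This is a minimal illustration, not RON's implementation: each directed overlay link gets a hypothetical `LinkMonitor` that declares the link down after a fixed number of consecutive lost probes (the threshold of 4 is an assumed value), and a hypothetical `pick_path` falls back to a one-hop detour through another overlay node when the direct link is down.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class LinkMonitor:
    """Tracks probe results for one directed overlay link (hypothetical sketch)."""
    outage_threshold: int = 4  # consecutive lost probes before declaring an outage (assumed)
    consecutive_losses: int = 0

    def record_probe(self, replied: bool) -> None:
        # A reply resets the count; each lost probe extends the current run of losses.
        self.consecutive_losses = 0 if replied else self.consecutive_losses + 1

    def is_down(self) -> bool:
        return self.consecutive_losses >= self.outage_threshold


def pick_path(src: str, dst: str, nodes: Sequence[str],
              monitors: Dict[Tuple[str, str], LinkMonitor]) -> Optional[List[str]]:
    """Use the direct link if it is up, else the first one-hop detour whose legs are both up."""
    def up(a: str, b: str) -> bool:
        return not monitors[(a, b)].is_down()

    if up(src, dst):
        return [src, dst]
    for mid in nodes:
        if mid not in (src, dst) and up(src, mid) and up(mid, dst):
            return [src, mid, dst]
    return None  # no working path is currently known


if __name__ == "__main__":
    nodes = ["A", "B", "C"]
    monitors = {(a, b): LinkMonitor() for a in nodes for b in nodes if a != b}
    for _ in range(4):
        monitors[("A", "B")].record_probe(replied=False)  # A->B stops answering
    print(pick_path("A", "B", nodes, monitors))  # ['A', 'C', 'B']
```

The threshold trades detection speed against false alarms: a smaller value reacts sooner but may re-route on transient loss.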

7 RON Design
[Figure: RON nodes in different routing domains (ASes), each running the RON library]
Components: prober, router, forwarder, conduit, performance database, application-specific routing tables, policy routing module
Link-state routing protocol; disseminates info using RON!
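The idea of application-specific routing tables built from one shared performance database can be illustrated with a small sketch. Everything here is assumed for illustration: the dictionaries standing in for the performance database, the metric functions, the made-up numbers, and the restriction to direct or one-hop paths. Latency adds along a path, while loss is combined as 1 - prod(1 - p_i), which assumes losses on different hops are independent.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Path = List[str]
LinkTable = Dict[Tuple[str, str], float]


def path_latency(path: Path, latency: LinkTable) -> float:
    # Latency is additive along the hops of a path.
    return sum(latency[(a, b)] for a, b in zip(path, path[1:]))


def path_loss(path: Path, loss: LinkTable) -> float:
    # Assumes independent loss per hop: a packet survives only if every hop delivers it.
    delivered = 1.0
    for a, b in zip(path, path[1:]):
        delivered *= 1.0 - loss[(a, b)]
    return 1.0 - delivered


def build_table(src: str, nodes: Sequence[str],
                cost: Callable[[Path], float]) -> Dict[str, Path]:
    """For each destination, pick the cheapest direct or one-hop path under `cost`."""
    table: Dict[str, Path] = {}
    for dst in nodes:
        if dst == src:
            continue
        candidates = [[src, dst]] + [[src, mid, dst] for mid in nodes if mid not in (src, dst)]
        table[dst] = min(candidates, key=cost)
    return table


def symmetric(links: Dict[Tuple[str, str], float]) -> LinkTable:
    # Hypothetical helper: assume each measurement applies in both directions.
    return {**links, **{(b, a): v for (a, b), v in links.items()}}


if __name__ == "__main__":
    nodes = ["A", "B", "C"]
    latency = symmetric({("A", "B"): 100.0, ("A", "C"): 30.0, ("C", "B"): 30.0})  # ms, made up
    loss = symmetric({("A", "B"): 0.0, ("A", "C"): 0.1, ("C", "B"): 0.1})  # fractions, made up
    print(build_table("A", nodes, lambda p: path_latency(p, latency)))
    print(build_table("A", nodes, lambda p: path_loss(p, loss)))
```

With these invented numbers, a latency-sensitive application would detour through C to reach B, while a loss-sensitive one would keep the direct path: the same measurements yield two different routing tables.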

8 Many Research Questions
Does the RON approach work at all?
– Each RON is small, no more than 50 or 100 nodes
– How fast can failure detection & recovery happen?
Policy routing
– Doesn't RON violate acceptable use policies (AUPs) and other policies?
Routing behavior
– Can stable routing be achieved?
– Implementing efficient multi-criteria routing
Is it safe to deploy a large number of (small) interacting RONs on the Internet?
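One generic way to approach the stability question is hysteresis: abandon the current path only when an alternative is better by some margin, so that measurement noise does not cause route flapping. The sketch below illustrates that general technique under stated assumptions; the slide does not say which mechanism RON uses, and the class name and the 10% margin are hypothetical.

```python
from typing import Dict, Optional


class HysteresisSelector:
    """Keeps the current path unless another is cheaper by more than `margin` (hypothetical)."""

    def __init__(self, margin: float = 0.10) -> None:
        self.margin = margin  # required relative improvement before switching (assumed value)
        self.current: Optional[str] = None

    def update(self, costs: Dict[str, float]) -> str:
        best = min(costs, key=costs.__getitem__)
        if self.current not in costs:
            # No current path yet, or it vanished from the candidates: take the best one.
            self.current = best
        elif costs[best] < costs[self.current] * (1.0 - self.margin):
            self.current = best
        return self.current


if __name__ == "__main__":
    sel = HysteresisSelector(margin=0.10)
    print(sel.update({"direct": 100.0, "via_C": 120.0}))  # direct
    print(sel.update({"direct": 100.0, "via_C": 95.0}))   # direct: 95 is not below 90
    print(sel.update({"direct": 100.0, "via_C": 80.0}))   # via_C
```

A larger margin gives steadier routes at the cost of staying on a slightly worse path for longer.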

9 RON Deployment (19 sites)
Sites: .com (ca), .com (ca), dsl (or), cci (ut), aros (ut), utah.edu, .com (tx), cmu (pa), dsl (nc), nyu, cornell, cable (ma), cisco (ma), mit, vu.nl, lulea.se, ucl.uk, kaist.kr, univ-in-venezuela
[Figure: map of the US sites, with links out to vu.nl, lulea.se, ucl.uk and to kaist.kr, .ve]

10 RON Experiments
Measure loss, latency, and throughput with and without RON
– 13 hosts in the US and Europe
– 3 days of measurements from data collected in March 2001
– 30-minute average loss rates: a 30-minute outage is very serious!
Note: experiments done with a "No-Internet2-for-commercial-use" policy
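The 30-minute average loss rates can be computed by grouping probe outcomes into fixed windows. A minimal sketch, assuming each probe is logged as a (timestamp in seconds, lost?) pair; the function name and record format are hypothetical, not RON's measurement format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

WINDOW_SECONDS = 30 * 60  # 30-minute averaging window, as on the slide


def windowed_loss(samples: Iterable[Tuple[float, bool]],
                  window: int = WINDOW_SECONDS) -> Dict[int, float]:
    """Return {window index: fraction of probes lost in that window}."""
    sent: Dict[int, int] = defaultdict(int)
    lost: Dict[int, int] = defaultdict(int)
    for timestamp, was_lost in samples:
        idx = int(timestamp // window)
        sent[idx] += 1
        if was_lost:
            lost[idx] += 1
    # Windows with no probes are absent rather than reported as 0% loss.
    return {idx: lost[idx] / sent[idx] for idx in sorted(sent)}


if __name__ == "__main__":
    probes = [(0, False), (600, True), (1200, False), (1500, False),
              (1800, True), (2400, True)]
    print(windowed_loss(probes))  # {0: 0.25, 1: 1.0}
```

Running this separately over direct-path probes and over RON-path probes would give paired samples of the kind compared on the following slides.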

11 RON greatly improves loss rate
[Figure: 30-min average loss rate with RON vs. 30-min average loss rate on the Internet, 13,000 samples]
RON loss rate never more than 30%

12 An order-of-magnitude fewer failures
30-minute average loss rates:
Loss Rate | RON Better | No Change | RON Worse
10%       | 479        | 57        | 47
20%       | 127        | 4         | 15
30%       | 32         | 0         | 0
50%       | 20         | 0         | 0
80%       | 14         | 0         | 0
100%      | 10         | 0         | 0
6,825 "path hours" represented here
12 "path hours" of essentially complete outage
76 "path hours" of TCP outage
RON routed around all of these!
One indirection hop provides almost all the benefit!
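The table can be read as a per-threshold classification of paired 30-minute samples. The slide does not spell out the rule, so this sketch assumes one: at threshold p, RON is "better" when the Internet path's loss is at or above p but the RON path's is below it, "worse" in the opposite case, and "no change" when both are at or above p; samples where both stay below p are not counted. The function name and the sample data are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def classify(pairs: Iterable[Tuple[float, float]], threshold: float) -> Dict[str, int]:
    """Count (internet_loss, ron_loss) samples by outcome at one threshold (assumed rule)."""
    counts = {"ron_better": 0, "no_change": 0, "ron_worse": 0}
    for internet_loss, ron_loss in pairs:
        internet_bad = internet_loss >= threshold
        ron_bad = ron_loss >= threshold
        if internet_bad and not ron_bad:
            counts["ron_better"] += 1
        elif ron_bad and not internet_bad:
            counts["ron_worse"] += 1
        elif internet_bad and ron_bad:
            counts["no_change"] += 1
        # Both below the threshold: not counted at this threshold.
    return counts


if __name__ == "__main__":
    samples = [(0.50, 0.00), (0.20, 0.30), (0.00, 0.15), (0.02, 0.01)]  # made-up pairs
    for p in (0.10, 0.20, 0.30):
        print(f"{p:.0%}", classify(samples, p))
```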

13 Resilience Against DoS Attacks

14 Policy Routing
Today, wide-area policy expression is a sledgehammer
Policy control is important
– From talking to some providers
– E.g., rate control policy; Internet2, etc.
True, RONs could violate AUPs
But the RON approach enables more flexible policies
– More complex routing decisions; rate-based too
– Multiple routing tables
– Deeper packet inspection, etc.
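Policy routing with multiple routing tables can be sketched as a filter applied before path selection. The example encodes a rule in the spirit of the "No-Internet2-for-commercial-use" policy from the experiments slide; the link tags, traffic classes, and function names are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

Path = List[str]
LinkTags = Dict[Tuple[str, str], Set[str]]


def permitted(path: Path, tags: LinkTags, traffic_class: str) -> bool:
    """Hypothetical policy: commercial traffic may not use links tagged 'internet2'."""
    if traffic_class != "commercial":
        return True
    return all("internet2" not in tags.get((a, b), set()) for a, b in zip(path, path[1:]))


def select(candidates: Sequence[Path], cost: Callable[[Path], float],
           tags: LinkTags, traffic_class: str) -> Optional[Path]:
    """Cheapest candidate that satisfies the policy, or None if none does."""
    allowed = [p for p in candidates if permitted(p, tags, traffic_class)]
    return min(allowed, key=cost) if allowed else None


if __name__ == "__main__":
    candidates = [["A", "B"], ["A", "C", "B"]]
    tags: LinkTags = {("A", "B"): {"internet2"}}
    hops = len  # illustrative cost: number of nodes on the path
    print(select(candidates, hops, tags, "research"))    # ['A', 'B']
    print(select(candidates, hops, tags, "commercial"))  # ['A', 'C', 'B']
```

Running `select` once per traffic class over the same candidates is one way to realize the slide's "multiple routing tables": each class ends up with its own table.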

15 Example

16 Throughput Improvement

17 Conclusion
Improved availability of Internet communication paths using small overlays
– Layered above scalable IP substrate
– RON provides a set of libraries and programs to facilitate this application-specific routing
Experimental data suggest that this approach works
– Over 10X availability
– Outage detection and recovery in about 15 seconds
– Able to route around certain denial-of-service attacks
Many interesting questions remain…
http://nms.lcs.mit.edu/ron/

