Improving the Reliability of Internet Paths with One-hop Source Routing Krishna Gummadi, Harsha Madhyastha Steve Gribble, Hank Levy, David Wetherall Department.

Presentation transcript:

Improving the Reliability of Internet Paths with One-hop Source Routing
Krishna Gummadi, Harsha Madhyastha, Steve Gribble, Hank Levy, David Wetherall
Department of Computer Science and Engineering, University of Washington, Seattle, WA

Reliability of Internet paths
Enormous interest in understanding Internet path reliability
Proposals to improve reliability using indirection routing
  – RON, Detour
Current implementations maintain complex overlays that do not scale

This talk
What are the failure characteristics of Internet paths?
What do they imply about reliability benefits of indirection routing?
Can a simple, stateless, scalable scheme realize these benefits?
What benefits would end-users see in practice?
  – for a real application, such as Web browsing

Outline
Introduction
Measurement study of Internet path failures
One-hop source routing
An implementation study of SOSR
Conclusions

Measurement study of path failures
We conducted a week-long measurement study
  – probed 3,153 destinations from 67 PlanetLab sites
  – each destination is probed from exactly one node
Our goal is to answer the following:
  – How often do paths fail?
  – Where do failures occur?
  – How long do failures last?

Choosing destinations
We want to understand how the network paths to servers and broadband hosts differ
  – it has implications for different workloads/apps
      Web transfers between servers and broadband hosts
      VoIP apps between broadband hosts
We chose 3,153 destinations:
  – 378 popular web servers
  – 1,139 broadband hosts
  – 1,636 randomly selected IPs

Detecting path failures
Each probe (response) is a TCP ACK (RST) packet
  – default probe frequency: one every 15 seconds
Upon a single probe response loss, we:
  – increase probe frequency to one every 5 seconds until 10 consecutive probe responses are received
  – perform traceroute to detect failure location
A path fails when 3 consecutive probes and traceroute fail
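The detection rule on this slide can be sketched as a small state machine. This is an illustration under stated assumptions, not the authors' prober: the class name `PathMonitor`, the method `on_probe`, and the idea of feeding it pre-computed probe and traceroute outcomes are all hypothetical. Only the 15 s and 5 s intervals, the 10-response threshold, and the 3-loss rule come from the slide.

```python
from dataclasses import dataclass

NORMAL_INTERVAL_S = 15   # default probe spacing (from the slide)
FAST_INTERVAL_S = 5      # probe spacing after a lost response (from the slide)
LOSSES_FOR_FAILURE = 3   # consecutive lost probes needed to declare a failure
OK_TO_RELAX = 10         # consecutive responses needed to return to 15 s


@dataclass
class PathMonitor:
    """Hypothetical per-path failure detector; all names are illustrative."""
    interval_s: int = NORMAL_INTERVAL_S
    consecutive_losses: int = 0
    consecutive_ok: int = 0
    failed: bool = False

    def on_probe(self, got_response: bool, traceroute_ok: bool = True) -> None:
        """Record one probe outcome; traceroute_ok is only consulted on a loss."""
        if got_response:
            self.consecutive_losses = 0
            self.consecutive_ok += 1
            self.failed = False
            # Slow down again only after 10 responses in a row.
            if self.interval_s == FAST_INTERVAL_S and self.consecutive_ok >= OK_TO_RELAX:
                self.interval_s = NORMAL_INTERVAL_S
        else:
            self.consecutive_ok = 0
            self.consecutive_losses += 1
            self.interval_s = FAST_INTERVAL_S
            # Declare a failure only if the traceroute also fails.
            if self.consecutive_losses >= LOSSES_FOR_FAILURE and not traceroute_ok:
                self.failed = True
```

A caller would invoke `on_probe` once per probe and sleep for `interval_s` seconds between probes. A single lost response only speeds up probing; it takes three in a row plus a failed traceroute to mark the path as failed.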

How often do paths fail?
Failures do happen, but not frequently
  – on average each path sees 6 failures/week
  – server paths see 4 failures/week
  – broadband paths see 7 failures/week
Most paths see at least one failure in a week
  – 85% of all paths
  – 78% of server paths
  – 88% of broadband paths

Categories of failure locations
Categories help distinguish between core and edge failures
[Figure labels: Source, Destination, Local ISP, Tier1 ISP; failure categories: source_side, core, dst_side, last_hop]

Where do paths fail?
Server path failures occur throughout the network
  – very few (16%) last_hop failures
  – suggests network is the dominating cause for server unavailability

Where do paths fail?
Most of the broadband failures happen on last_hop
Excluding last_hop, server and broadband paths see a similar number of failures

How long do failures last?
Failure durations are highly skewed
Majority of failures are short
  – median failure duration: 1-2 min for all paths
  – median path availability: 99.9% for all paths
A non-negligible fraction of paths see long failures
  – tend to occur on last_hop
  – mean path availability: 99.6% for servers and 94.4% for broadband
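To make the gap between the median and mean figures concrete, the availability percentages can be converted into implied downtime over the week-long measurement period. The helper name `downtime_minutes` is an assumption made for this example; the percentages are the ones on the slide.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080 minutes in the week-long study


def downtime_minutes(availability_pct: float,
                     period_minutes: int = MINUTES_PER_WEEK) -> float:
    """Minutes of unavailability implied by an availability percentage."""
    return (100.0 - availability_pct) / 100.0 * period_minutes


# Figures from the slide; the labels are just for display.
for label, pct in [("median, all paths", 99.9),
                   ("mean, servers", 99.6),
                   ("mean, broadband", 94.4)]:
    print(f"{label}: about {downtime_minutes(pct):.0f} min/week")
```

Under this reading, the median path is down for roughly 10 minutes a week, while the broadband mean corresponds to more than 9 hours. That gap is consistent with the slide's point that a minority of paths see long failures.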

Implications for indirection routing
Failures happen often enough that they are worth fixing
But, they are rare enough that recovery schemes should be inexpensive under normal conditions
Failures near the end-nodes limit the performance of indirection routing
  – good news: servers see very few failures near end hosts
  – bad news: broadband hosts see many last_hop failures

Outline
Introduction
Measurement study of Internet path failures
One-hop source routing
An implementation study of SOSR
Conclusions

One-hop source routing
Use default path under normal conditions
When default path fails, source attempts to recover by routing through an intermediary
[Figure labels: src, dst, X, intermediate]

Our goals
Understand the potential reliability benefits of one-hop source routing
Design a simple, stateless, scalable scheme to realize this potential

Evaluating one-hop source routing
For each path failure during the week-long trace
  – we sent probes via intermediaries at 39 PlanetLab sites
Compared the success of probes along default and intermediate paths
  – estimate the maximum potential of any one-hop scheme
  – estimate success rate of a specific one-hop scheme

Potential of any one-hop routing scheme
A failure is recoverable if any of the 39 intermediaries help
Server failures more recoverable than broadband
Almost all Internet core failures can be avoided through one-hop routing
Percent of failures that are recoverable:
            servers   broadband
  src_side  54%       55%
  core      92%       90%
  dst_side  79%       66%
  last_hop  41%       12%
  all       66%       39%
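The "recoverable" definition above reduces to an any-over-intermediaries test per failure. The sketch below shows that computation on made-up data. The function name `recoverable_fraction` and the sample outcomes are assumptions for illustration and do not reproduce the study's trace.

```python
from typing import List, Sequence


def recoverable_fraction(probe_results: Sequence[Sequence[bool]]) -> float:
    """Fraction of failures for which at least one intermediary's probe succeeded.

    probe_results holds one entry per observed failure; each entry lists,
    per intermediary, whether the probe routed through it reached the destination.
    """
    if not probe_results:
        return 0.0
    recovered = sum(1 for outcomes in probe_results if any(outcomes))
    return recovered / len(probe_results)


# Hypothetical outcomes for four failures and three intermediaries.
sample: List[List[bool]] = [
    [False, True, True],    # recoverable: two intermediaries worked
    [False, False, False],  # not recoverable
    [True, False, False],   # recoverable: one is enough
    [False, False, False],  # not recoverable
]
print(recoverable_fraction(sample))
```

Because a single working intermediary is enough, this measures the upper bound for any one-hop scheme rather than the success rate of a particular selection policy.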

What fraction of intermediaries help in recovery?
For most failures, > half of the intermediaries avoid the failure
All we need to do is find one of them!
Suggests that a randomly selected intermediary might work

How effective is a random policy?
Random-k: pick k intermediaries at random
Random-4 delivers near-optimal success rate
  – requires no a priori probing or state
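A short sketch of why a small k can be enough: if `helpful` of `n` intermediaries avoid a failure, the chance that at least one of k distinct random picks works is 1 - C(n - helpful, k) / C(n, k). The function names and the choice of 20 helpful intermediaries out of 39 (just over half, as a stand-in for "> half") are assumptions for illustration. The talk's own evaluation is empirical.

```python
import math
import random
from typing import List, Optional, Sequence


def pick_intermediaries(candidates: Sequence[str], k: int = 4,
                        rng: Optional[random.Random] = None) -> List[str]:
    """Random-k selection: k distinct intermediaries, no probing or stored state."""
    rng = rng or random.Random()
    return rng.sample(list(candidates), min(k, len(candidates)))


def p_at_least_one_helps(n: int, helpful: int, k: int) -> float:
    """Probability that k distinct random picks include a helpful intermediary."""
    k = min(k, n)
    # math.comb returns 0 when k exceeds the number of unhelpful nodes.
    return 1.0 - math.comb(n - helpful, k) / math.comb(n, k)


nodes = [f"node{i}" for i in range(39)]   # hypothetical intermediary names
print(pick_intermediaries(nodes, k=4))
for k in (1, 2, 4, 8):
    print(k, round(p_at_least_one_helps(n=39, helpful=20, k=k), 3))
```

Under that assumption, one random pick succeeds a little over half the time, while four picks succeed roughly 95% of the time. This illustrates why random-4 can approach the best achievable rate without any prior measurement.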

Recovery latency with random-4
Random-4 either helps early or not at all
  – nearly 60% of failures recovered in 5-10 seconds
After that, we have to wait for paths to self-repair
So, initiate and abandon recovery early
[Figure panel: Server failures]
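The "initiate and abandon recovery early" advice can be sketched as a single bounded recovery round: try the default path, then at most k random intermediaries, then give up instead of retrying. The function `send_with_one_hop_recovery` and the callables `try_direct` and `try_via` are hypothetical stand-ins for real network sends. The sketch tries intermediaries one after another for simplicity, which is an assumption rather than something the slides specify.

```python
import random
from typing import Callable, Optional, Sequence


def send_with_one_hop_recovery(
    try_direct: Callable[[], bool],
    try_via: Callable[[str], bool],
    candidates: Sequence[str],
    k: int = 4,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Return "direct", the intermediary that worked, or None if recovery is abandoned."""
    rng = rng or random.Random()
    if try_direct():
        return "direct"                  # common case: no extra cost
    picks = rng.sample(list(candidates), min(k, len(candidates)))
    for hop in picks:                    # one bounded round of random-k
        if try_via(hop):
            return hop
    return None                          # abandon; wait for the path to self-repair


# Hypothetical scenario: the default path is down and only "relay-b" can reach dst.
relays = ["relay-a", "relay-b", "relay-c", "relay-d"]
print(send_with_one_hop_recovery(lambda: False, lambda h: h == "relay-b", relays))
```

Keeping the round bounded matches the observation that intermediaries which do not help early are unlikely to help later.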

Outline
Introduction
Measurement study of Internet path failures
One-hop source routing
An implementation study of SOSR
Conclusions

SOSR implementation
Validate random-4 policy in practice using a real application, Web browsing
SOSR: Scalable One-hop Source Routing
Implemented in Linux
  – transparent to destinations (NAT on intermediate nodes)
  – transparent to applications on source node (netfilter)
  – extensible (can plug in policies)

Evaluating SOSR implementation
Ran two clients, one with and another without SOSR
  – repeatedly fetched Web pages from 982 popular servers
  – both machines located at UW
  – one request per second over 3 days
Client 1: default wget command-line web browser
Client 2: default wget + SOSR with random-4 policy
  – deployed intermediaries on 39 PlanetLab nodes

User perceived benefits of SOSR
wget succeeds 99.8% of time
  – Web seems pretty reliable
A SOSR user sees only 20% fewer failures!
  – not clear whether SOSR matters for Web
               requests   failures
  wget         273,
  wget + SOSR  273,978    383

User perceived benefits of SOSR
SOSR recovers from 56% of network failures
But, can't recover from application failures
62% of wget + SOSR failures are application related
[Figure labels: network level failures, application level failures; HTTP error codes, TCP refused, HTTP refused, HTTP timeout; series: wget, wget + SOSR]

Conclusions
What are the failure characteristics of Internet paths?
  – failures do happen, but they are short and infrequent
  – many occur on last_hop for broadband paths
What do they imply about reliability benefits of indirection routing?
  – recovery must be cheap in the common case
  – one-hop source routing recovers from 66% of server and 39% of broadband path failures

Conclusions
Can a simple, stateless, scalable scheme realize these benefits?
  – random-4 realizes the potential of any one-hop scheme
  – no cost in common case
  – no a priori probing or state needed
What benefits would end-users see in practice for real applications?
  – Web users see only 20% fewer failures
  – many application-level failures

Conclusions
Is indirection routing useful or not?
  – pessimistic view: not for the Web
  – optimistic view: perhaps for other applications, like VoIP

For more information Visit our research group website: