Failure Resilient Routing Simple Failure Recovery with Load Balancing Martin Suchara in collaboration with: D. Xu, R. Doverspike, D. Johnson and J. Rexford.

Slides:



Advertisements
Similar presentations
EE384y: Packet Switch Architectures
Advertisements

Advanced Piloting Cruise Plot.
Feichter_DPG-SYKL03_Bild-01. Feichter_DPG-SYKL03_Bild-02.
1 Vorlesung Informatik 2 Algorithmen und Datenstrukturen (Parallel Algorithms) Robin Pomplun.
Rethinking Internet Traffic Management From Multiple Decompositions to a Practical Protocol Martin Suchara in collaboration with: J. He, M. Bresler, J.
Greening Backbone Networks Shutting Off Cables in Bundled Links Will Fisher, Martin Suchara, and Jennifer Rexford Princeton University.
© 2008 Pearson Addison Wesley. All rights reserved Chapter Seven Costs.
Copyright © 2003 Pearson Education, Inc. Slide 1 Computer Systems Organization & Architecture Chapters 8-12 John D. Carpinelli.
Chapter 1 The Study of Body Function Image PowerPoint
Cognitive Radio Communications and Networks: Principles and Practice By A. M. Wyglinski, M. Nekovee, Y. T. Hou (Elsevier, December 2009) 1 Chapter 12 Cross-Layer.
Copyright © 2011, Elsevier Inc. All rights reserved. Chapter 6 Author: Julia Richards and R. Scott Hawley.
Author: Julia Richards and R. Scott Hawley
1 Copyright © 2013 Elsevier Inc. All rights reserved. Appendix 01.
Properties Use, share, or modify this drill on mathematic properties. There is too much material for a single class, so you’ll have to select for your.
OSPF 1.
Path Splicing Nick Feamster, Murtaza Motiwala, Megan Elmore, Santosh Vempala.
UNITED NATIONS Shipment Details Report – January 2006.
and 6.855J Cycle Canceling Algorithm. 2 A minimum cost flow problem , $4 20, $1 20, $2 25, $2 25, $5 20, $6 30, $
Ramin Khalili (T-Labs/TUB) Nicolas Gast (LCA2-EPFL)
Scalable Routing In Delay Tolerant Networks
1 RA I Sub-Regional Training Seminar on CLIMAT&CLIMAT TEMP Reporting Casablanca, Morocco, 20 – 22 December 2005 Status of observing programmes in RA I.
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Properties of Real Numbers CommutativeAssociativeDistributive Identity + × Inverse + ×
FACTORING ax2 + bx + c Think “unfoil” Work down, Show all steps.
Year 6 mental test 5 second questions
Year 6 mental test 10 second questions
Solve Multi-step Equations
REVIEW: Arthropod ID. 1. Name the subphylum. 2. Name the subphylum. 3. Name the order.
MIMO Broadcast Scheduling with Limited Feedback Student: ( ) Director: 2008/10/2 1 Communication Signal Processing Lab.
Minimum Weight Plastic Design For Steel-Frame Structures EN 131 Project By James Mahoney.
1 Praveen K. Muthuswamy Electrical Computer and Systems Engineering Rensselaer Polytechnic Institute In collaboration with Koushik Kar, Aparna Gupta (RPI)
Chapter 1: Introduction to Scaling Networks
EU market situation for eggs and poultry Management Committee 20 October 2011.
1 COPYRIGHT © 2011 ALCATEL-LUCENT. ALL RIGHTS RESERVED. On the Capacity of Wireless CSMA/CA Multihop Networks Rafael Laufer and Leonard Kleinrock Bell.
2 |SharePoint Saturday New York City
IP Multicast Information management 2 Groep T Leuven – Information department 2/14 Agenda •Why IP Multicast ? •Multicast fundamentals •Intradomain.
VOORBLAD.
15. Oktober Oktober Oktober 2012.
Constant, Linear and Non-Linear Constant, Linear and Non-Linear
1 RA III - Regional Training Seminar on CLIMAT&CLIMAT TEMP Reporting Buenos Aires, Argentina, 25 – 27 October 2006 Status of observing programmes in RA.
Factor P 16 8(8-5ab) 4(d² + 4) 3rs(2r – s) 15cd(1 + 2cd) 8(4a² + 3b²)
Basel-ICU-Journal Challenge18/20/ Basel-ICU-Journal Challenge8/20/2014.
1..
Routing and Congestion Problems in General Networks Presented by Jun Zou CAS 744.
© 2012 National Heart Foundation of Australia. Slide 2.
Maciej Stasiak, Mariusz Głąbowski Arkadiusz Wiśniewski, Piotr Zwierzykowski Models of Links Carrying Multi-Service Traffic Chapter 7 Modeling and Dimensioning.
Understanding Generalist Practice, 5e, Kirst-Ashman/Hull
Model and Relationships 6 M 1 M M M M M M M M M M M M M M M M
25 seconds left…...
Januar MDMDFSSMDMDFSSS
Abhigyan, Aditya Mishra, Vikas Kumar, Arun Venkataramani University of Massachusetts Amherst 1.
Analyzing Genes and Genomes
Abdollah Khodkar Department of Mathematics University of West Georgia Joint work with Arezoo N. Ghameshlou, University of Tehran.
©Brooks/Cole, 2001 Chapter 12 Derived Types-- Enumerated, Structure and Union.
Essential Cell Biology
Intracellular Compartments and Transport
PSSA Preparation.
Essential Cell Biology
Energy Generation in Mitochondria and Chlorplasts
Announcements Be reading Chapters 9 and 10 HW 8 is due now.
Scalable Rule Management for Data Centers Masoud Moshref, Minlan Yu, Abhishek Sharma, Ramesh Govindan 4/3/2013.
Network Architecture for Joint Failure Recovery and Traffic Engineering Martin Suchara in collaboration with: D. Xu, R. Doverspike, D. Johnson and J. Rexford.
Ashish Gupta Under Guidance of Prof. B.N. Jain Department of Computer Science and Engineering Advanced Networking Laboratory.
Rethinking Internet Traffic Management: From Multiple Decompositions to a Practical Protocol Jiayue He Princeton University Joint work with Martin Suchara,
Multipath Protocol for Delay-Sensitive Traffic Jennifer Rexford Princeton University Joint work with Umar Javed, Martin Suchara, and Jiayue He
1 Traffic Engineering By Kavitha Ganapa. 2 Introduction Traffic engineering is concerned with the issue of performance evaluation and optimization of.
Presentation transcript:

Failure Resilient Routing Simple Failure Recovery with Load Balancing Martin Suchara in collaboration with: D. Xu, R. Doverspike, D. Johnson and J. Rexford

2 Acknowledgement Special thanks to Olivier Bonaventure, Quynh Nguyen, Kostas Oikonomou, Rakesh Sinha, Robert Tarjan, Kobus van der Merwe, and Jennifer Yates. We gratefully acknowledge the support of the DARPA CORONET Program, Contract N C The views, opinions, and/or findings contained in this article/presentation are those of the author/presenter and should not be interpreted as representing the official views or policies, either expressed or implied, of the Defense Advanced Research Projects Agency or the Department of Defense.

3 Failure Recovery and Traffic Engineering in IP Networks Uninterrupted data delivery when links or routers fail Major goal: re-balance the network load after failure Failure recovery essential for Backbone network operators Large datacenters Local enterprise networks

4 Overview I.Failure recovery: the challenges II.Architecture: goals and proposed design III.Optimizations: of routing and load balancing IV.Evaluation: using synthetic and realistic topologies V.Conclusion

5 Challenges of Failure Recovery Existing solutions reroute traffic to avoid failures Can use, e.g., MPLS local or global protection Prompt failure detection Global path protection is slow Balance the traffic after rerouting Challenging with local path protection primary tunnel backup tunnel primary tunnel backup tunnel local global

6 Overview I.Failure recovery: the challenges II.Architecture: goals and proposed design III.Optimizations: of routing and load balancing IV.Evaluation: using synthetic and realistic topologies V.Conclusion

7 Architectural Goals 3.Detect and respond to failures quickly 1.Simplify the network Allow use of minimalist cheap routers Simplify network management 2.Balance the load Before, during, and after each failure

8 The Architecture – Components Management system Knows topology, approximate traffic demands, potential failures Sets up multiple paths and calculates load splitting ratios Minimal functionality in routers Path-level failure notification Static configuration No coordination with other routers

9 The Architecture topology design list of shared risks traffic demands t s fixed paths splitting ratios

10 The Architecture t s link cut fixed paths splitting ratios

11 The Architecture t s link cut fixed paths splitting ratios path probing

12 The Architecture t s link cut path probing fixed paths splitting ratios 0.5 0

13 The Architecture: Summary 1.Offline optimizations 2.Load balancing on end-to-end paths 3.Path-level failure detection How to calculate the paths and splitting ratios?

14 Overview I.Failure recovery: the challenges II.Architecture: goals and proposed design III.Optimizations: of routing and load balancing IV.Evaluation: using synthetic and realistic topologies V.Conclusion

15 Goal I: Find Paths Resilient to Failures A working path needed for each allowed failure state (shared risk link group) Example of failure states: S = {e 1 }, { e 2 }, { e 3 }, { e 4 }, { e 5 }, {e 1, e 2 }, {e 1, e 5 } e1e1 e3e3 e2e2 e4e4 e5e5 R1 R2

16 Goal II: Minimize Link Loads minimize s w s e Φ(u e s ) while routing all traffic link utilization u e s cost Φ(u e s ) aggregate congestion cost weighted for all failures: links indexed by e u e s =1 Cost function is a penalty for approaching capacity failure state weight failure states indexed by s

17 Our Three Solutions capabilities of routers congestion Suboptimal solution Solution not scalable Good performance and practical? Too simple solutions do not do well Diminishing returns when adding functionality

18 1. Optimal Solution: State Per Network Failure Edge router must learn which links failed Custom splitting ratios for each failure FailureSplitting Ratios -0.4, 0.4, 0.2 e4e4 0.7, 0, 0.3 e 1 & e 2 0, 0.6, 0.4 …… configuration: e4e4 e3e3 e1e1 e2e2 e5e5 e6e6 one entry per failure

19 1. Optimal Solution: State Per Network Failure Solve a classical multicommodity flow for each failure case s: min load balancing objective s.t. flow conservation demand satisfaction edge flow non-negativity Decompose edge flow into paths and splitting ratios Does not scale with number of potential failure states

20 2. State-Dependent Splitting: Per Observable Failure Edge router observes which paths failed Custom splitting ratios for each observed combination of failed paths FailureSplitting Ratios -0.4, 0.4, 0.2 p20.6, 0, 0.4 …… configuration: p1p1 p2p2 p3p3 NP-hard unless paths are fixed at most 2 #paths entries

21 2. State-Dependent Splitting: Per Observable Failure If paths fixed, can find optimal splitting ratios: Heuristic: use the same paths as the optimal solution min load balancing objective s.t. flow conservation demand satisfaction path flow non-negativity

22 3. State-Independent Splitting: Across All Failure Scenarios Edge router observes which paths failed Fixed splitting ratios for all observable failures p1, p2, p3: 0.4, 0.4, 0.2 configuration: Non-convex optimization even with fixed paths p1p1 p2p2 p3p3

23 3. State-Independent Splitting: Across All Failure Scenarios Heuristic to compute splitting ratios Use averages of the optimal solution weighted by all failure case weights Heuristic to compute paths The same paths as the optimal solution r i = s w s r i s fraction of traffic on the i-th path

24 Our Three Solutions 1.Optimal solution 2.State-dependent splitting 3.State-independent splitting How well do they work in practice?

25 Overview I.Failure recovery: the challenges II.Architecture: goals and proposed design III.Optimizations: of routing and load balancing IV.Evaluation: using synthetic and realistic topologies V.Conclusion

26 Simulations on a Range of Topologies Single link failures Shared risk failures for the tier-1 topology 954 failures, up to 20 links simultaneously

27 Congestion Cost – Tier-1 IP Backbone with SRLG Failures increasing load Additional router capabilities improve performance up to a point objective value network traffic State-dependent splitting indistinguishable from optimum State-independent splitting not optimal but simple How do we compare to OSPF? Use optimized OSPF link weights [Fortz, Thorup 02].

28 Congestion Cost – Tier-1 IP Backbone with SRLG Failures increasing load OSPF uses equal splitting on shortest paths. This restriction makes the performance worse. objective value network traffic OSPF with optimized link weights can be suboptimal

29 Average Traffic Propagation Delay in Tier-1 Backbone Service Level Agreements guarantee 37 ms mean traffic propagation delay Need to ensure mean delay doesnt increase much

30 Number of Paths– Tier-1 IP Backbone with SRLG Failures Number of paths almost independent of the load number of paths cdfnumber of paths For higher traffic load slightly more paths

31 Number of Paths – Various Topologies More paths for larger and more diverse topologies number of paths cdf Greatest number of paths in the tier-1 backbone

32 Overview I.Failure recovery: the challenges II.Architecture: goals and proposed design III.Optimizations: of routing and load balancing IV.Evaluation: using synthetic and realistic topologies V.Conclusion

33 Conclusion Simple mechanism combining path protection and traffic engineering Favorable properties of state-dependent splitting algorithm: Path-level failure information is just as good as complete failure information

34 Thank You!