Ethernet Data Center Routing Challenges and 802.1aq/SPB new work PETER ASHWOOD-SMITH



802.1aq's 16 ECT algorithms can give a perfect spread over 2 hops with 16 uplinks. However:
A) The bridge priorities of the second-layer switches need tweaking to guarantee that all 16 are used.
B) At least 16 subnets (C-VLANs/S-VLANs) are needed, so that one can be assigned per 802.1aq B-VID.
[Figure: servers S_1 … S_16, with callout A ("Tweak bridge priorities here") and callout B.]

Can we eliminate 'tweaking'*?
David Allan et al. have a presentation on this, so I won't spend much time on it.
In general, a network with N equal-cost paths from some source to some destination requires a number of ECT algorithms (#ECT) about 25-40% greater than N to statistically capture them all.
Therefore, when #ECT == N, some tweaking is usually required (for a DC it is trivial to do, however).
Dave et al. suggest non-independence between ECT algorithms as a way to address this (maximize diversity) …
*Tweaking = adjusting bridge priorities up/down from their defaults.

"Example" 802.1aq switching cluster (assume 100GE NNI links/groups)
48-switch, non-blocking, 2-layer L2 fabric:
16 switches at the "upper" layer, A_1 .. A_16, with 32 downlinks per A_n
32 switches at the "lower" layer, B_1 .. B_32, with 16 uplinks and 160 UNI links per B_n
NNI: (16 x 100GE per B_n) x 32 = 512 x 100GE = 51.2T
UNI: 160 x 10GE server links per B_n; (32 x 160)/2 = 2560 x 10GE per …
uFIB = 16 x 48 B-MAC = 768 entries
mFIB = 16 subnets x 48 sources = 768 entries
Total: 1536 FIB entries per node
16 x 32 x 100GE = 51.2T using 48 x 2T switches
Good numbers: "16" and "2" levels.
[Figure: servers S_1,1 … S_1,160 through S_32,1 … S_32,160 attached to the B switches; 160 x 10GE and 16 x 100GE per B_n, 32 x 100GE per A_n.]
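The arithmetic on this slide can be rechecked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the values are the ones stated on the slide (16 upper switches, 32 lower switches, 16 x 100GE uplinks per B_n, 16 ECT algorithms).

```python
# Fabric dimensions as stated on the slide; names are illustrative only.
UPPER_SWITCHES = 16       # A_1 .. A_16
LOWER_SWITCHES = 32       # B_1 .. B_32
UPLINKS_PER_LOWER = 16    # 100GE NNI uplinks per B_n
NNI_GBPS = 100
ECT_COUNT = 16            # one B-VID per ECT algorithm

total_switches = UPPER_SWITCHES + LOWER_SWITCHES
nni_links = UPLINKS_PER_LOWER * LOWER_SWITCHES
nni_capacity_tbps = nni_links * NNI_GBPS / 1000

ufib_entries = ECT_COUNT * total_switches   # 16 B-VIDs x 48 B-MACs
mfib_entries = ECT_COUNT * total_switches   # 16 subnets x 48 sources
fib_per_node = ufib_entries + mfib_entries

print(total_switches, nni_links, nni_capacity_tbps, fib_per_node)
# prints: 48 512 51.2 1536
```

The FIB size scales with the number of ECT algorithms times the number of switches, which is why later slides worry about growing #ECT.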

For a given ECT-ALG k, A_j is a member of every SPF-TREE(B_*, ECT-ALG k).
When properly tuned, no two ECT-ALGorithms will use the same A_j as a fork point.
[Figure: S_1 … S_16; the tree for ECT-ALG #12 from source node (1).]

[Figure: two-layer fabric, A_1 … A_16 over B_1 … B_32, with I-SID i and I-SID j highlighted.]
Subnet N_i maps to I-SID j and then to a unique A_(j mod 16).
So load spreading allows each A_i to transit a complete subnet.
Problem #1: we are unable to spread further, such that A_i and A_j (i != j) each handle a subset of the flows in I-SID j.
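A minimal sketch of the mapping described here, assuming the A switches are indexed 0..15 so that `j mod 16` can be used directly as an index (the slide labels them 1..16; the indexing and the function name are assumptions of this sketch):

```python
NUM_TRANSIT = 16  # upper-layer A switches, indexed 0..15 in this sketch


def transit_for_isid(isid: int) -> int:
    """Return the index of the A switch that carries all traffic of this I-SID."""
    return isid % NUM_TRANSIT


# Every flow inside one subnet (I-SID) lands on the same A switch,
# because the flow itself is not an input to the mapping.
flows_in_isid_21 = [("srv-a", "srv-b"), ("srv-c", "srv-d"), ("srv-e", "srv-f")]
used = {transit_for_isid(21) for _flow in flows_in_isid_21}
print(used)  # a single A index for the whole subnet
```

This is the granularity limit the slide calls Problem #1: spreading works per subnet, not per flow.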

[Figure: two-layer fabric, A_1 … A_16 over B_1 … B_32, with I-SID i and I-SID j highlighted.]
This is an issue under failure of A_j.
Recovery will move the entire subnet's traffic to another node A_i.
A preferable solution is to spread the affected load over the remaining A_*.
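To make the difference concrete, the sketch below compares the two recovery behaviours for a fabric where each of 16 A switches initially carries one unit of load. The equal starting loads, the helper name, and the choice of which survivor receives the moved subnet are all assumptions for illustration.

```python
def load_after_failure(loads, failed, spread):
    """Return per-node load after `failed` goes down.

    If spread is False, the whole load of the failed node moves to one
    survivor (the first in sorted order, an arbitrary choice here).
    If spread is True, it is divided evenly over all survivors.
    """
    survivors = sorted(n for n in loads if n != failed)
    result = {n: loads[n] for n in survivors}
    moved = loads[failed]
    if spread:
        share = moved / len(survivors)
        for n in survivors:
            result[n] += share
    else:
        result[survivors[0]] += moved
    return result


loads = {f"A{i}": 1.0 for i in range(1, 17)}
whole = load_after_failure(loads, "A3", spread=False)
even = load_after_failure(loads, "A3", spread=True)
print(max(whole.values()), round(max(even.values()), 3))
# prints: 2.0 1.067
```

Under these assumptions, moving the whole subnet doubles the load on one survivor, while spreading raises every survivor by only 1/15.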

[Figure: two-layer fabric, A_1 … A_16 over B_1 … B_32, showing unicast and multicast traffic for I-SID i and I-SID j.]
Possible solution: head-end hashing (unicast only).
Allow unicast I-SID i and I-SID j traffic to be hashed, based on smaller flows, to different B-VIDs (ECT-ALGorithms).
This breaks the symmetry and congruence rules, but allows edge balancing at a smaller granularity.
No changes to multicast.
Requires learning that is independent of B-VID.
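A rough sketch of head-end hashing, using the minimal field set that the requirements slide of this deck lists (seed, C.SA, C.DA, C-VID). The B-VID values, the CRC32 hash, and the function name are assumptions chosen for illustration, not anything the deck or the standard specifies.

```python
import zlib

# 16 hypothetical B-VID values, one per ECT algorithm.
B_VIDS = list(range(4001, 4017))


def pick_bvid(seed: int, c_sa: str, c_da: str, c_vid: int) -> int:
    """Hash one unicast customer flow onto one of the available B-VIDs."""
    key = f"{seed}|{c_sa}|{c_da}|{c_vid}".encode()
    return B_VIDS[zlib.crc32(key) % len(B_VIDS)]


# The same flow always maps to the same B-VID, so frame order is preserved.
a = pick_bvid(7, "00:00:00:00:00:01", "00:00:00:00:00:02", 10)
b = pick_bvid(7, "00:00:00:00:00:01", "00:00:00:00:00:02", 10)
print(a == b)
```

Because source and destination enter the key in order, the forward and reverse directions of a conversation may land on different B-VIDs, which is the loss of symmetry the slide mentions.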

[Figure: two copies of the fabric (A_1 … A_16 over B_1 … B_32) and switches C_1, C_2.]
Interconnecting fabrics creates more than 16 paths (exponential growth).
The number of paths can grow exponentially with increasing levels: O(16), O(16x2), O(16x2x16).
A constant number of paths will always be << the number of paths in many networks.
Growing 802.1aq ECT to, say, 32 or even 100 ECMP causes larger unicast FIBs.

[Figure: the fabric extended with A_17, B_33 and B_34.]
Horizontal growth: not too bad, but it needs more ECT-ALGORITHMs.
Horizontal growth by 1 just increases the number of ECTs by 1.
Not too big a problem, but we would need to define new ECT algorithms (via Opaque).

General Issue
With fan-out O(degree) at each hop and path length O(diameter):
#paths ~= O(degree^diameter)
So head-end ECT in the worst case requires an exponential number of B-VIDs.
[Figure: S to D; the head end chooses a path from N x B-VID.]
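The growth can be illustrated by multiplying the fan-out available at each stage. This is a sketch that assumes every stage's choices are independent; the fan-out lists reuse the O(16), O(16x2), O(16x2x16) figures from the interconnected-fabric slide, and the function name is illustrative.

```python
from math import prod


def path_count(fanouts):
    """Number of distinct paths when each stage offers the given fan-out."""
    return prod(fanouts)


print(path_count([16]), path_count([16, 2]), path_count([16, 2, 16]))
# prints: 16 32 512

# With a uniform fan-out the count is degree ** diameter.
print(path_count([4] * 3) == 4 ** 3)
# prints: True
```

A head end that wants to name every path with its own B-VID therefore needs a number of B-VIDs that grows with this product.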

A feasible solution …
Re-assign traffic to a path at each hop: tandem "ECMP", just like IP.
Need to keep only O(degree) next hops.
Only need one B-VID, which removes O(diameter) from the state cost.
The flip side is that you have no control; you just hope for a fine-scale statistical distribution.
[Figure: S to D over a single B-VID; each hop chooses a path from N next hops.]
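A small sketch of per-hop (tandem) selection on a made-up topology: each node stores only its own list of equal-cost next hops per destination and hashes the flow onto one of them. The topology, the CRC32 hash, and mixing the node name into the hash key are assumptions of this sketch.

```python
import zlib

# Hypothetical state: NEXT_HOPS[node][dest] lists that node's equal-cost next hops.
NEXT_HOPS = {
    "B1": {"B2": ["A1", "A2", "A3", "A4"]},
    "A1": {"B2": ["B2"]},
    "A2": {"B2": ["B2"]},
    "A3": {"B2": ["B2"]},
    "A4": {"B2": ["B2"]},
}


def forward(src, dst, flow_key, seed=0):
    """Walk hop by hop; every node hashes the flow onto one of its own next hops."""
    path, node = [src], src
    while node != dst:
        choices = NEXT_HOPS[node][dst]
        digest = zlib.crc32(f"{seed}|{node}|{flow_key}".encode())
        node = choices[digest % len(choices)]
        path.append(node)
    return path


p1 = forward("B1", "B2", "c-sa>c-da/vid10")
p2 = forward("B1", "B2", "c-sa>c-da/vid10")
print(p1 == p2, len(p1))
# prints: True 3
```

No node needs to know the whole path, so state stays proportional to the local fan-out; the cost is that nobody controls which end-to-end path a flow takes.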

What about loops in this mode?
The 802.1aq ingress check is very strong in the case of a single next hop, and hence a single possible ingress for an SA.
The 802.1aq ingress check is weakened in the case of multiple next hops, and hence multiple possible ingresses for an SA.
However, the 802.1aq Agreement Protocol functions correctly in the context of multiple possible next hops for the same B-VID (refer to Mick's proof).
But …

Agreement Protocol Concerns
Is it too complex? It is clearly non-trivial; we need implementation/emulation experience.
Is it overly Draconian? For example, the bounds on movement are what is required for a mathematical proof by induction. However, there are probably many cases where further movement would not loop. What is the degree of 'overkill'?
Is it marketable? This is unfortunately a legitimate concern!
802.1aq can be deployed without AP until we introduce hash-based forwarding, at which point we require a symmetric AP and/or an on-data-path loop detection/drop mechanism.
I believe that an on-data-path loop detection mechanism is required for hash-based ECMP until we have more experience with AP.
I recommend we standardize a TTL TAG, either stand-alone or as a new form of I-TAG.
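The idea behind an on-data-path TTL can be sketched as follows. The deck does not define the tag format, so the decrement-per-hop and drop-at-zero behaviour, the function name, and the toy forwarding tables are assumptions modelled on the familiar IP TTL.

```python
def deliver(frame_ttl, next_hop, start, dst):
    """Forward along next_hop until dst is reached or the TTL expires.

    Returns (delivered, hops_taken).
    """
    node, ttl, hops = start, frame_ttl, 0
    while node != dst:
        if ttl == 0:
            return False, hops   # loop or over-long path: drop the frame
        ttl -= 1
        node = next_hop[node]
        hops += 1
    return True, hops


looping = {"X": "Y", "Y": "Z", "Z": "X"}   # transient loop; "D" is never reached
healthy = {"X": "Y", "Y": "D"}

print(deliver(8, looping, "X", "D"), deliver(8, healthy, "X", "D"))
# prints: (False, 8) (True, 2)
```

A frame caught in a loop is bounded to TTL hops instead of circulating indefinitely, independent of whether the control plane has converged.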

View of New Work Requirements
R1) New ECT-ALGorithms with improved spreading properties.
R2) Allow optional head-end hash assignment of 802.1aq SPBM UNI known unicast traffic to one of multiple next-hop interfaces/B-VIDs. Very similar to link aggregation. Minimally HASH(seed, C.SA, C.DA, C-VID, [IP.SA, IP.DA, IP.PROTO]).
R3) Allow optional tandem hash assignment of 802.1aq SPBM B-VID NNI unicast traffic to one of multiple next-hop interfaces. Essentially a new SPBM ECT-ALG with its own B-VID (i.e. new ECT-ALGorithms, all usable at the same time). Minimally HASH(seed, B-VID, C.SA, C.DA, C-VID, [IP.SA, IP.DA, IP.PROTO]).
R4) Minor OA&M changes in support of R2 and R3, because symmetry/congruence is broken.
R5) More experience with AP (emulations, simulations, etc.), plus the addition of a TTL to a new I-TAG or a TTL-TAG.