Real-world Requirements: Pitfalls of Segment Routing Traffic Engineering
Timmy Schüller, Osnabrück University
RIPE 78, Reykjavik, May 21, 2019
Last year at RIPE, Florian Hibler (from Arista) gave a talk about SR and why it is cool. I agree, but I found that it also breaks things if you use it for TE and are not careful.
Traffic Engineering using Segment Routing
Segment Routing (SR) ..
.. is a source routing architecture
.. uses Node, Adjacency and Service Segment IDs (SIDs) to define “checkpoints”
.. encodes a path as a stack of SIDs
.. consults the IGP to reach a SID
.. can be implemented with MPLS or IPv6 and an IGP extension
SR tunnel ≔ concatenation of shortest paths
First, let me remind you of what SR is.
Copyright © 2019 Timmy Schüller, Osnabrück University
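The definition "SR tunnel ≔ concatenation of shortest paths" can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the four-node topology, the IGP costs, and the function names. It handles node SIDs only and ignores ECMP, adjacency SIDs and service SIDs.

```python
import heapq

# Hypothetical toy topology: GRAPH[node] = {neighbor: IGP cost}.
GRAPH = {
    "A": {"B": 1, "C": 2},
    "B": {"A": 1, "D": 1},
    "C": {"A": 2, "D": 2},
    "D": {"B": 1, "C": 2},
}

def shortest_path(graph, src, dst):
    """Plain Dijkstra; returns the list of nodes from src to dst."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        raise ValueError(f"{dst} is unreachable from {src}")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def sr_tunnel_path(graph, src, sid_stack):
    """Follow the IGP shortest path to each node SID on the stack in turn."""
    path = [src]
    for sid in sid_stack:
        path += shortest_path(graph, path[-1], sid)[1:]
    return path

print(sr_tunnel_path(GRAPH, "A", ["D"]))       # plain shortest path
print(sr_tunnel_path(GRAPH, "A", ["C", "D"]))  # detour via checkpoint C
```

With a single SID the packet simply follows the IGP shortest path; adding the checkpoint C steers it onto a path the IGP alone would not choose.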
2-Segment Routing (2SR)
2SR ..
.. minimizes max. link utilization
.. uses at most 2 concatenated shortest paths per SR tunnel
[Figure: example SR paths between 𝑖 and 𝑗 via intermediate nodes 𝑘 and ℓ, marked ✔ or ✘, including a 50%/50% traffic split.]
The first practical issue is the size of the label stack. But in practice, 2 segments suffice [Google Maps analogy].
Real-World Requirements I
Minimize max. utilization
Image by Ian Barbour @ Flickr
This is the main goal of TE (and the most common objective in the literature).
2SR by Bhatia et al.
Input: topology and traffic matrix (columns Source, Destination, Traffic): RKV FRA 12 OS 7 42 etc.
Output: Max. util. = 64%. Use SR Paths: RKV OS FRA, FRA OS RKV, etc.
Image from http://www.deutschetelekom-icss.com/ournetwork (Accessed June 2015)
To minimize the max. utilization with 2SR, we can just take the linear program presented by Bhatia et al. at INFOCOM 2015. Input: topology and traffic matrix. Output: objective value and how to get there (SR configuration).
2SR formulation from Bhatia et al.: Optimized Network Traffic Engineering using Segment Routing. IEEE INFOCOM, 2015
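For orientation, a min-max-utilization linear program of this kind can be sketched as follows. The symbols are assumptions chosen for illustration and may differ from the notation in the cited paper: t_{ij} is the demand from i to j, x_{ij}^{k} is the fraction of that demand routed via intermediate node k, g_{ij}^{k}(e) is the load placed on link e per unit of traffic sent from i to k and then from k to j along shortest paths, c(e) is the capacity of link e, and \theta is the maximum link utilization.

```latex
\begin{align}
\min\ & \theta \\
\text{s.t.}\ & \sum_{k} x_{ij}^{k} = 1 && \forall\, i, j \\
& \sum_{i,j} t_{ij} \sum_{k} g_{ij}^{k}(e)\, x_{ij}^{k} \le \theta\, c(e) && \forall\, e \\
& x_{ij}^{k} \ge 0 && \forall\, i, j, k
\end{align}
```

The topology enters through g and c, the traffic matrix through t. The optimal \theta is the objective value, and the nonzero x_{ij}^{k} describe the SR configuration.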
Verifying 2SR
10 Snapshots of a Tier 1 ISP Backbone from 2018
[Figure: simulation results; labels: lower bound, state of the art.]
Some trace-based simulation results that verify that 2SR works well.
Real-World Requirements II & III
Minimize max. utilization
Keep # of SR tunnels low
Limit split factors
[Figure: traffic between 𝑖 and 𝑗 split 1%/99% over intermediate nodes 𝑘 and ℓ.]
The Tunnel Limit Extension (TLE)
Now we take our result, go to operations and tell them to deploy this specific SR configuration. Unfortunately, they are not happy, because it contains thousands of SR paths. Also, a router apparently cannot split traffic in arbitrary fractions, which the presented model assumes.
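Both operational complaints can be checked on a given solution before it is handed over. The sketch below is only an illustration under stated assumptions: the data layout (a dict keyed by source, destination and intermediate node), the function name, and the 5% threshold are all hypothetical, and it assumes that a path whose intermediate node equals the source or the destination is the plain shortest path and needs no SR tunnel.

```python
def summarize_config(split, min_split=0.05):
    """Count SR tunnels and list split fractions below min_split.

    split maps (src, dst, mid) -> fraction of the src->dst demand that is
    routed via intermediate node mid. min_split is a hypothetical threshold.
    """
    tunnels = 0
    tiny_splits = []
    for (src, dst, mid), fraction in sorted(split.items()):
        if fraction <= 0:
            continue  # unused path
        if mid not in (src, dst):
            tunnels += 1  # a real detour needs an explicit SR tunnel
        if fraction < min_split:
            tiny_splits.append((src, dst, mid, fraction))
    return tunnels, tiny_splits

# The 1% / 99% split from the slide, with made-up node names.
config = {("i", "j", "k"): 0.01, ("i", "j", "l"): 0.99}
tunnels, tiny_splits = summarize_config(config)
print(tunnels, tiny_splits)
```

A report like this only diagnoses the problem; keeping the tunnel count and the split factors in check is what the optimization itself has to do.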
Real-World Requirements IV
Minimize max. utilization
Keep # of SR tunnels low
Limit split factors
Avoid transit on edge nodes
We showed the resulting SR configuration to the customer, but they identified illegal paths that used virtual edge nodes as intermediate segments.
Defining a Blacklist
Step 1: Define nodes to be blacklisted (e.g. ‘*Edge’)
Step 2: Automatically reorder list of nodes:
Before: Osnabrück A, Osnabrück B, Osnabrück Edge, Reykjavik A, Reykjavik B, Reykjavik Edge, Frankfurt A, Frankfurt B, Frankfurt Edge
After: Osnabrück A, Osnabrück B, Reykjavik A, Reykjavik B, Frankfurt A, Frankfurt B (nodes that may be used as checkpoints), followed by Osnabrück Edge, Reykjavik Edge, Frankfurt Edge (nodes that must only be source and/or destination)
Step 3: Add constraint to optimization (equation shown on the slide)
With this modular design, any node may be blacklisted!
Disclaimer: Examples are purely fictional
To avoid undesired transit traffic, we defined a blacklist. The blacklist is freely configurable; any node may be blacklisted.
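The three steps can be sketched in a few lines. This is an illustration under assumptions, not the implementation behind the slides: the function names are hypothetical, the pattern is matched with shell-style wildcards, and the constraint of step 3 is represented only by an index check that an optimization model could use to forbid blacklisted intermediate nodes.

```python
from fnmatch import fnmatchcase

# Fictional node list from the slide, in its original order.
NODES = [
    "Osnabrück A", "Osnabrück B", "Osnabrück Edge",
    "Reykjavik A", "Reykjavik B", "Reykjavik Edge",
    "Frankfurt A", "Frankfurt B", "Frankfurt Edge",
]
BLACKLIST_PATTERNS = ["*Edge"]  # step 1: define what is blacklisted

def is_blacklisted(node, patterns):
    """True if the node name matches any shell-style pattern."""
    return any(fnmatchcase(node, pattern) for pattern in patterns)

def reorder(nodes, patterns):
    """Step 2: checkpoints first, source/destination-only nodes last."""
    checkpoints = [n for n in nodes if not is_blacklisted(n, patterns)]
    endpoints_only = [n for n in nodes if is_blacklisted(n, patterns)]
    return checkpoints + endpoints_only, len(checkpoints)

ordered, num_checkpoints = reorder(NODES, BLACKLIST_PATTERNS)

def may_be_intermediate(index):
    """Step 3 (sketch): only the first num_checkpoints nodes may be checkpoints."""
    return index < num_checkpoints

print(ordered)
print(num_checkpoints)
```

After the reordering, restricting intermediate nodes becomes a simple bound on the node index, which is presumably why the list is reordered in the first place.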
Real-World Requirements V
Minimize max. utilization
Keep # of SR tunnels low
Limit split factors
Avoid transit on edge nodes
Avoid bypassing latency requirements
While experimenting, weird flows appeared that bypass latency requirements, since intermediate segments can be placed freely!
Delay Constraints
Three constraints are defined (equations shown on the slide): the Intracontinental, Intranational and Intra-Site Delay Constraint.
We cannot use our blacklist for this, because nodes need to be blacklisted depending on the context of the traffic flow. This is not a perfect solution, but it approximates the issue well enough. (Actually readable math: if the continent of the ingress node is the same as the continent of the egress node AND the continent of the intermediate router is NOT the same as the continent of the ingress node, set the traffic variable to zero.)
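The "readable math" translates almost word for word into a predicate. The sketch below is an illustration with assumed names: the function, the region mapping and the extra node "New York" are hypothetical. Passing a country or site mapping instead of a continent mapping would give the intranational and intra-site variants.

```python
def violates_delay_scope(ingress, egress, intermediate, region_of):
    """True if the flow stays within one region but the detour leaves it.

    region_of maps a node to its continent (or country, or site).
    In the optimization, the traffic variable of such a triple is set to zero.
    """
    same_region = region_of[ingress] == region_of[egress]
    return same_region and region_of[intermediate] != region_of[ingress]

# Fictional example mapping; "New York" is added purely for illustration.
CONTINENT = {
    "Osnabrück": "Europe",
    "Frankfurt": "Europe",
    "Reykjavik": "Europe",
    "New York": "North America",
}

# European flow detouring via North America: forbidden.
print(violates_delay_scope("Osnabrück", "Frankfurt", "New York", CONTINENT))
# European flow with a European checkpoint: allowed.
print(violates_delay_scope("Osnabrück", "Frankfurt", "Reykjavik", CONTINENT))
# Intercontinental flow: this rule does not apply.
print(violates_delay_scope("Osnabrück", "New York", "Frankfurt", CONTINENT))
```

Unlike the blacklist, the decision depends on all three nodes of a flow, which is why a static list of forbidden checkpoints cannot express it.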
The Tunnel Limit Extension
Good compromise possible
All requirements seen so far are forged into math on the left. The results are shown on the right: the trade-off between the main and the secondary objective (in varying degrees).
Even More Real-World Requirements
Reconfigure rarely & little
Be resilient against failures
Do we need to re-optimize?
Hardware outage
True changes of the topology (e.g. adding a new link)
Time of optimization or deployment of SR tunnels
Irregularities in traffic matrices (e.g. Super Bowl)
These requirements can be seen as two sides of the same coin. If the optimization is resilient, it will naturally remain stable for longer periods of time. Yet this is a very hard challenge that I am working on right now.
The Takeaway
SR is a powerful Traffic Engineering tool
It has to be carefully fine-tuned towards the specific use case:
Number of SR tunnels
Traffic splitting
Special nodes
Latency
etc.
Etc. = resiliency, add QoS classes
Any Questions or Comments?
Time's up? Find me at https://sys.cs.uos.de/schueller or contact me directly: tschueller@uos.de
Details on this work can be found in: Schüller et al., “Traffic Engineering using Segment Routing and Considering Requirements of a Carrier IP Network”, IEEE/ACM Transactions on Networking, vol. 26, no. 4, pp. 1851-1864, 2018