Reducing Transient Disconnectivity using Anomaly-Cognizant Forwarding
Andrey Ermolinskiy, Scott Shenker
University of California – Berkeley and ICSI

What’s the problem?
- One of the central goals of the Internet: continuous end-to-end connectivity
- BGP convergence is a major cause of connectivity disruption
  - Routers operate upon potentially inconsistent local views
  - Temporary inconsistencies give rise to anomalies such as loops and black holes that disrupt end-to-end packet delivery

Example: transient routing loop with BGP
[Figure: AS-level topology of domains A, B, C, D, E, F, G. Route preference lists toward A shown next to the nodes: (1. BA, 2. CBA); (1. BA, 2. DBA); (1. CBA, 2. DBA); (1. ECBA, 2. GA). B withdraws route BA.]
- The routing loop between C and D causes a temporary loss of connectivity between {B, C, D, E, F} and A.
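
The loop in this example can be reproduced with a small next-hop trace. This is a minimal sketch, not the authors' simulator. It assumes the preference lists in the figure belong to D (BA, CBA), C (BA, DBA), E (CBA, DBA) and F (ECBA, GA), and that after the withdrawal each domain simply falls back to its first remaining route; the `next_hop` table and the `trace` helper are hypothetical names introduced for illustration.

```python
# Assumed forwarding state toward A right after B withdraws route BA
# (the node-to-list assignment is inferred from the slide, not stated on it).
next_hop = {
    "B": None,  # B has lost its route to A
    "C": "D",   # C falls back to DBA
    "D": "C",   # D falls back to CBA
    "E": "C",   # E still prefers CBA
    "F": "E",   # F still prefers ECBA
    "G": "A",   # G reaches A directly
}

def trace(next_hop, src, dst):
    """Follow next hops from src; report delivery, a loop, or a black hole."""
    path, seen, node = [src], {src}, src
    while node != dst:
        node = next_hop.get(node)
        if node is None:
            return path, "black hole"
        path.append(node)
        if node in seen:
            return path, "loop"
        seen.add(node)
    return path, "delivered"

for src in ["B", "C", "D", "E", "F", "G"]:
    print(src, trace(next_hop, src, "A"))
```

Under these assumptions, packets from C, D, E and F end up in the C-D loop, B drops its own packets, and only G still delivers to A, which matches the disconnected set {B, C, D, E, F} on the slide.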

Related Work
- Shrinking the convergence time window through BGP protocol extensions
  - Ghost flushing
  - Consistency assertions
- Protecting end-to-end packet delivery from adverse effects of convergence
  - R-BGP: forward packets on pre-computed failover paths; propagate root-cause information to prevent loops
  - Consensus Routing: enforce a globally consistent view via distributed snapshots and strategically delay adoption of incoming BGP updates
  - Anomaly-Cognizant Forwarding

Anomaly-Cognizant Forwarding (ACF)
- Approach
  - Accept routing anomalies as an unavoidable fact
  - Protect end-to-end packet delivery by detecting and recovering from anomalies on the forwarding path
- Main hypothesis
  - Several simple and lightweight extensions to conventional IP forwarding enable us to sustain packet delivery during periods of BGP instability
    - without the use of pre-computed backup paths
    - without modifying the core routing protocol or altering its timing dynamics

ACF Overview
- Anomalous forwarding state: domain S has anomalous forwarding state for destination D if S’s outgoing packets destined for D arrive back at S as a result of a routing loop.
- Main idea of ACF:
  - Detect occurrences of anomalous state
  - Avoid forwarding packets via domains that are known to have anomalous state
- Packet header:
  - Each packet carries a list of prior AS-level hops (pathTrace)
  - Each packet carries a blackList of domains with anomalous state
[Figure: source S and destination D, illustrating anomalous forwarding state; packet header layout with pathTrace and blackList fields.]

ACF Overview

Forward(packet p) {
    if (localASNum in p.pathTrace)
        Move loop elements from p.pathTrace to p.blackList
    nextHop = lookupNextHop(p.destAddr)
    if (nextHop in p.blackList)
        Invoke the control plane, look for alternate non-blacklisted routes in the RIB
        (nextHop = NONE if no such route exists)
    if (nextHop != NONE) {
        Append localASNum to p.pathTrace
        SendPacket(p, nextHop)
    } else
        Initiate recovery-mode forwarding for p
}
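
The forwarding routine above can be sketched in runnable form. This is an illustrative sketch rather than the authors' implementation. The `Packet` and `Router` classes, the `fib`/`rib` dictionaries, and the convention of returning an action tuple instead of transmitting are all assumptions made for the example. "Loop elements" is interpreted here as every pathTrace entry from the first occurrence of the local AS onward.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

@dataclass
class Packet:
    dest_addr: str
    path_trace: List[str] = field(default_factory=list)  # prior AS-level hops
    black_list: Set[str] = field(default_factory=set)    # domains with anomalous state

@dataclass
class Router:
    local_as: str
    fib: Dict[str, str]        # destination -> preferred next-hop AS (assumed layout)
    rib: Dict[str, List[str]]  # destination -> candidate next hops, best first

    def forward(self, p: Packet) -> Tuple[str, Optional[str]]:
        # Loop detected: the packet has already visited this AS.
        if self.local_as in p.path_trace:
            start = p.path_trace.index(self.local_as)
            p.black_list.update(p.path_trace[start:])  # move loop elements
            del p.path_trace[start:]
        next_hop = self.fib.get(p.dest_addr)
        if next_hop in p.black_list:
            # "Control plane": first non-blacklisted alternate in the RIB, if any.
            next_hop = next(
                (h for h in self.rib.get(p.dest_addr, []) if h not in p.black_list),
                None,
            )
        if next_hop is not None:
            p.path_trace.append(self.local_as)
            return ("send", next_hop)
        return ("recover", None)  # caller initiates recovery-mode forwarding

# Domain C after the withdrawal: its only remaining route to A goes via D.
c = Router("C", fib={"A": "D"}, rib={"A": ["D"]})
p = Packet("A", path_trace=["C", "D"])  # packet has looped C -> D -> C
print(c.forward(p), p.path_trace, sorted(p.black_list))
```

In this usage, C detects the loop, moves C and D to the blackList, finds no non-blacklisted alternate, and signals recovery, which is consistent with the header state shown in the walk-through slides.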

ACF Recovery-mode forwarding
- If a router is unable to forward a packet because it does not have a valid non-blacklisted route (nextHop = NONE), it initiates recovery forwarding:
  - Chooses a recovery destination R from a static and well-known set of highly connected Tier-1 domains
  - Detours the packet through R
- Intuition: R, or some router along the path to R, may know a working alternate route to the original destination.
[Figure: normal-mode and recovery-mode forwarding paths; recovery destinations R1 and R2; a router with nextHop = NONE.]
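
The header rewrite that recovery mode implies can be sketched as follows. The `dst` and `origDst` fields appear in the example slides; everything else is an assumption made for illustration: the helper names, picking the first non-blacklisted entry of the recovery set, and clearing pathTrace and restoring the original destination when the packet reaches R. The slides do not specify the selection policy.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class Packet:
    dst: str                        # address the packet is currently routed toward
    orig_dst: Optional[str] = None  # set only while in recovery mode
    path_trace: List[str] = field(default_factory=list)
    black_list: Set[str] = field(default_factory=set)

def initiate_recovery(p: Packet, recovery_destinations: List[str]) -> bool:
    """Redirect p toward a recovery destination; False if none is usable."""
    for r in recovery_destinations:
        if r not in p.black_list:   # assumed policy: first non-blacklisted entry
            if p.orig_dst is None:  # remember the real destination once
                p.orig_dst = p.dst
            p.dst = r
            return True
    return False

def resume_normal_mode(p: Packet, local_as: str) -> str:
    """At the recovery destination, return the destination to look up next."""
    if p.orig_dst is not None and p.dst == local_as:
        p.path_trace.clear()        # the example slides show an empty pathTrace at F
        return p.orig_dst
    return p.dst

p = Packet(dst="A", black_list={"C", "D"})
initiate_recovery(p, ["F"])         # C detours the packet through F
print(p.dst, p.orig_dst)
```

Keeping the original destination in a separate field lets every router on the detour use its ordinary forwarding table for R, while R itself can fall back to a normal lookup for the original destination.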

Anomaly-Cognizant Forwarding
[Figure: animated walk-through on the topology of domains A, B, C, D, E, F, G, with the same route preference lists as in the routing-loop example: (1. BA, 2. CBA); (1. BA, 2. DBA); (1. CBA, 2. DBA); (1. ECBA, 2. GA). Packet p is shown moving through the network; its header (p.Header) changes frame by frame as listed below.]

Frame  pathTrace  blackList  dst  origDst  Caption
1      [C]        {}         A    (empty)
2      [C D]      {}         A    (empty)
3      [C D]      {D}        A    (empty)  C initiates recovery forwarding through domain F
4      []         {C D}      F    A        C initiates recovery forwarding through domain F
5      [C]        {C D}      F    A
6      [C]        {C D E}    F    A
7      [C E]      {C D E}    F    A
8      []         {C D E}    F    A        F resumes normal-mode forwarding
9      [F]        {C D E}    F    A
10     [F G]      {C D E}    F    A

ACF: Observations
- ACF does not use pre-computed failover paths
  - Discovers alternate routes dynamically using state in the packet header
- The two forwarding modes make use of the same forwarding table
- Paths to recovery destinations are not assumed to be stable and anomaly-free
  - We protect recovery-mode forwarding using the same mechanism (pathTrace and blackList)

ACF: Preliminary Evaluation
Evaluation metrics
- Effectiveness in eliminating transient disconnectivity
- Efficiency of alternate paths
- Packet header overhead

ACF: Preliminary Evaluation
Simulation methodology
- CAIDA AS-level topology (27969 nodes) annotated with inferred inter-AS relationships
  - multihomed edge domains, adjacent provider links
- Provider link failure experiment
  - For each multihomed domain D, and each provider link L:
    - Fail L and simulate packet delivery from every other domain to D during convergence
- Recovery destinations = 10 highly connected Tier-1 ISPs
- Packet TTL = 32 hops
[Figure: multihomed domain D and source domains S1, S2, S3, S4.]

ACF: Preliminary Evaluation
Transient disconnection after a link failure
- BGP with conventional forwarding
  - 51% of failure cases produce unwarranted disconnection
  - Widespread disconnection (>50% of ASes) in 17% of cases
- BGP with ACF
  - No disconnection in 92% of failure cases
  - <1% of ASes see disconnection in 98% of failure cases

ACF: Preliminary Evaluation
Transient path efficiency
- Causes of path dilation in ACF
  - Transient loops
  - Detouring via a recovery destination
- F: failure cases that produce transient disconnection with conventional forwarding
- In 65% of failure cases that produce disconnectivity, ACF recovers packets using ≤ 2 extra hops
- 9% of cases require 7 hops or more

ACF: Preliminary Evaluation
Packet header overhead
[Figure: maximum number of pathTrace and blackList entries in a representative sample of failure cases, plotted against % of ASes disconnected (axis ticks: 0%, 0.09%, 0.9%, 9%, 90%).]
- Worst-case pathTrace: 20 entries
  - 40 bytes of overhead assuming 16-bit AS numbers
- Worst-case blackList: 16 entries
  - 10 bytes of overhead for a Bloom filter with 1% error rate
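
The overhead figures can be cross-checked with a few lines of arithmetic. This is a rough sketch under stated assumptions: the pathTrace is taken to be a plain array of AS numbers, and the Bloom filter is sized with the textbook optimum m = -n ln(p) / (ln 2)^2, which the slides do not confirm as the sizing actually used. The function names are hypothetical.

```python
import math

def path_trace_bytes(entries: int, as_number_bits: int = 16) -> int:
    """Bytes needed to carry `entries` AS numbers of the given width."""
    return entries * as_number_bits // 8

def bloom_filter_bits(entries: int, false_positive_rate: float) -> int:
    """Textbook optimal Bloom filter size in bits for a target error rate."""
    return math.ceil(-entries * math.log(false_positive_rate) / (math.log(2) ** 2))

print(path_trace_bytes(20))        # 40
bits = bloom_filter_bits(16, 0.01)
print(bits, math.ceil(bits / 8))   # 154 20
```

The pathTrace figure matches the slide (20 entries of 2 bytes each give 40 bytes). For the blackList, the textbook formula yields roughly 20 bytes for 16 entries at a 1% error rate, so the 10-byte figure on the slide presumably rests on a different sizing or error-rate assumption that is not stated here.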

Challenges / Concerns
- Feasibility of deployment
  - ACF adds fields to the packet header and modifies core IP forwarding logic
- Packet processing overhead
  - Control plane is invoked only during periods of instability
  - Common case: check pathTrace and blackList; both operations admit efficient implementation in hardware and parallelization
- ACF and routing policies

Thank you. Questions?