Seyed K. Fayaz, Tushar Sharma, Ari Fogel

1 Efficient Network Reachability Analysis using a Succinct Control Plane Representation
Seyed K. Fayaz, Tushar Sharma, Ari Fogel, Ratul Mahajan, Todd Millstein, Vyas Sekar, George Varghese

2 Network configuration is hard
Reachability policy (set by the network operator): A can talk to B
Reality: what the network does
[Figure: a network of routers R1, R2, R3, and R4 between hosts A and B.]
Does the network do what we want it to do?

3 State of the art in network verification
Data plane verification
Prior work: HSA (NSDI'12), ATPG (CoNEXT'12), NOD (NSDI'15)
Reachability policy: A can talk to B
[Figure: routers R1 to R4 between A and B, each with its own data plane (forwarding table) DP1 to DP4. The network operator asks of the traffic: "Can A talk to B?"]
Are we done?

4 The data plane keeps changing!
Reachability policy: A can talk to B
[Figure: a timeline with Time = t1, t2, and t3. At each time the data planes DP1 to DP4 of routers R1 to R4 are different, and the network operator asks again, for traffic from A to B: "Can A talk to B?"]

5 Motivating example: Reachability bug triggered by a BGP announcement
[Figure: routers A and B, data centers DCA and DCB, and W, shown before and after the incident. Labels: "services in /16", "DCB/16", "New service /28", "DCB/28", "culprit".]
Root cause: Router B's configuration had an aggregate route /16 pointing to DCB.
The /28 advertisement activated the aggregate route!
How can we proactively find such latent reachability bugs?

6 A data plane is just the current incarnation of the control plane!
[Figure: a router whose control plane (implementation of BGP, OSPF, etc.) is driven by the router configuration. It receives route advertisements (e.g., advertisement 1 via BGP, advertisement 2 via OSPF, advertisement 3 via RIP) and sends advertisements to neighbors. Each advertisement yields a different data plane: Data Plane 1 <prefix P, port1>, Data Plane 2 <prefix P, port2>, Data Plane 3 <prefix P, port3>.]
Prior work on control plane verification: rcc (NSDI'05), Batfish (NSDI'15), Bagpipe (OOPSLA'16), ARC (SIGCOMM'16)
Limitations:
Incomplete: focus on just one routing protocol
Unscalable: detailed modeling of message passing
To find latent reachability bugs, we should focus on the control plane!

7 Our contributions
ERA: A tool for finding latent router configuration bugs in seconds, based on control plane analysis
Expressive-yet-tractable control plane model
Scalable exploration of the control plane model
Implementation as an open source tool

8 ERA: System overview
[Figure: the operator gives ERA router configurations and reachability policies; ERA answers Pass or Fail.]
ERA: A tool to find latent reachability bugs due to router misconfiguration.
Scope: Reachability bugs occurring in the steady state

9 Outline
Background and motivation
Design of ERA
Implementation and evaluation

10 Challenges in control plane analysis
[Figure: the operator gives ERA router configurations and reachability policies. Inside ERA, a control plane model is built and then explored, and the result is Pass or Fail.]
Challenge 1: Expressive and tractable model?
Challenge 2: Scalable exploration

11 Challenge 1: Expressive and tractable control plane model

12 A route as a succinct bit-vector
Control plane I/O model | Expressive | Tractable
Actual protocol's messages (e.g., Batfish, NSDI'15) | ✔ | ✗
Protocol agnostic I/O model (e.g., ARC, SIGCOMM'16) | ✗ | ✔
Route as a compact bit vector | ✔ | ✔
Route as a compact bit vector: Dst IP (32 bits), Dst mask (5 bits), Administrative distance (4 bits), Protocol attributes (87 bits)
A route as a succinct and unifying I/O unit of the router control plane
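
The field widths on this slide add up to 128 bits, so one route fits in a single fixed-width vector. A minimal Python sketch of such a packing follows. The field order, the names `pack_route` and `unpack_route`, and the use of a Python integer as the vector are assumptions made for illustration; the slide gives only the widths.

```python
import ipaddress

# Field widths from the slide, most significant field first.
# The ordering of the fields inside the vector is an assumption.
FIELDS = (
    ("dst_ip", 32),
    ("dst_mask", 5),
    ("admin_distance", 4),
    ("proto_attrs", 87),
)
TOTAL_BITS = sum(width for _, width in FIELDS)  # 32 + 5 + 4 + 87 = 128


def pack_route(dst_ip, dst_mask, admin_distance, proto_attrs=0):
    """Pack one route into a 128-bit integer (hypothetical helper)."""
    values = {
        "dst_ip": int(ipaddress.IPv4Address(dst_ip)),
        "dst_mask": dst_mask,
        "admin_distance": admin_distance,
        "proto_attrs": proto_attrs,
    }
    packed = 0
    for name, width in FIELDS:
        value = values[name]
        # Each value is only checked against the width stated on the slide.
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        packed = (packed << width) | value
    return packed


def unpack_route(packed):
    """Inverse of pack_route: split the integer back into named fields."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = packed & ((1 << width) - 1)
        packed >>= width
    fields["dst_ip"] = str(ipaddress.IPv4Address(fields["dst_ip"]))
    return fields
```

For example, `unpack_route(pack_route("10.1.0.0", 16, 1))` returns the same four fields that were packed. Because every route has the same shape regardless of the protocol that produced it, a set of routes can be described by a boolean function over these 128 bits.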

13 Control plane as a fast pipeline of boolean operators: Intuition
Why not the actual router's code? → Hard to explore
Router as a fast route processing pipeline
An example: a route is four bits X3 X2 X1 X0, covering the prefix, the administrative distance (RIP=1, BGP=0), and a protocol attribute.
[Figure: the router control plane, as set by the router config. ("set RIP attr. to 1", "RIP", "static 10"), turns an input route set into an output route set. Both are boolean formulas over X3 to X0, built from terms such as X1X0 and X3X1X0 ∨ X2X1X0; the negation bars over individual variables did not survive in this transcript.]

14 Control plane as a fast pipeline of boolean operators: Complete pipeline
BDD of input routes
1. AND with supported protocols
2. Apply input filters
3. OR with routes originated by router
4. OR with redistributed routes
5. OR with aggregate routes
6. AND with NEG. of static routes
7. Select best route per dst prefix
8. Apply output filters
BDD of output routes
Compact representation of a collection of routes using Binary Decision Diagrams (BDDs)
The pipeline captures key control plane behaviors that are the source of many bugs.
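
The eight stages named on this slide can be sketched with ordinary Python sets standing in for BDDs: AND becomes intersection or filtering, OR becomes union, and AND-with-negation becomes set difference. This is an illustration only, not ERA's implementation. The `Route` and `RouterConfig` types, the stage order as read from the slide numbering, the treatment of redistributed and aggregate routes as fixed sets, and the exact-prefix match for static routes are all assumptions.

```python
from typing import Callable, FrozenSet, NamedTuple


class Route(NamedTuple):
    prefix: str          # destination prefix, e.g. "10.0.0.0/8"
    protocol: str        # e.g. "bgp", "rip"
    admin_distance: int  # lower value is preferred


RouteSet = FrozenSet[Route]


class RouterConfig(NamedTuple):
    """Hypothetical per-router settings consumed by the pipeline."""
    supported_protocols: FrozenSet[str]
    input_filter: Callable[[Route], bool]
    originated: RouteSet
    redistributed: RouteSet   # simplification: a fixed set
    aggregates: RouteSet      # simplification: always active
    static_prefixes: FrozenSet[str]
    output_filter: Callable[[Route], bool]


def select_best(routes: RouteSet) -> RouteSet:
    """Keep one route per prefix, lowest administrative distance first.
    Ties are broken arbitrarily in this sketch."""
    best = {}
    for route in routes:
        current = best.get(route.prefix)
        if current is None or route.admin_distance < current.admin_distance:
            best[route.prefix] = route
    return frozenset(best.values())


def process(cfg: RouterConfig, inputs: RouteSet) -> RouteSet:
    # 1. AND with supported protocols
    routes = frozenset(r for r in inputs if r.protocol in cfg.supported_protocols)
    # 2. Apply input filters
    routes = frozenset(r for r in routes if cfg.input_filter(r))
    # 3. OR with routes originated by the router
    routes = routes | cfg.originated
    # 4. OR with redistributed routes
    routes = routes | cfg.redistributed
    # 5. OR with aggregate routes
    routes = routes | cfg.aggregates
    # 6. AND with the negation of static routes (exact prefix match assumed)
    routes = frozenset(r for r in routes if r.prefix not in cfg.static_prefixes)
    # 7. Select best route per destination prefix
    routes = select_best(routes)
    # 8. Apply output filters
    return frozenset(r for r in routes if cfg.output_filter(r))
```

With BDDs, each stage operates on a whole set of routes through a single boolean operation on its symbolic representation, rather than iterating over routes one by one as this sketch does.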

15 Challenge 2: Scalable control plane exploration

16 Reachability analysis by exploring the control plane model
Intuition: To see what traffic can get from A to B, just find out which route prefixes advertised by B can reach A!
[Figure: route advertisements (represented as BDDs) propagate from RB through routers R1 to R5 toward RA, while traffic flows from A to B. Advertisements coming from the network environment are set to True.]
Prepare for the worst!
Optimizations to scale control plane exploration:
Equivalence classes of routes
Fast AVX2 instructions to implement conjunction/disjunction
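
The intuition on this slide, pushing B's advertisements through the network and checking what arrives at A, can be sketched as a worklist propagation. The sketch below uses plain sets of prefix strings in place of BDDs. The function name `propagate`, the per-router `transfer` functions, and the topology in the usage note are hypothetical, and each transfer function is assumed to return only routes drawn from a finite universe so that the loop terminates.

```python
from collections import deque


def propagate(neighbors, transfer, origin, announced):
    """Flood route advertisements from `origin` until nothing changes.

    neighbors: dict mapping each router to a list of adjacent routers
    transfer:  dict mapping each router to a function that turns the set of
               routes it holds into the set it advertises to its neighbors
    Returns a dict mapping each router to the set of routes it ends up with.
    """
    received = {router: frozenset() for router in neighbors}
    received[origin] = frozenset(announced)
    work = deque([origin])
    while work:
        router = work.popleft()
        exported = transfer[router](received[router])
        for nbr in neighbors[router]:
            merged = received[nbr] | exported
            if merged != received[nbr]:
                # The neighbor learned something new, so revisit it.
                received[nbr] = merged
                work.append(nbr)
    return received
```

As a usage example, take a line topology RB, R2, R1, RA in which R2 filters out one prefix. If RB announces `10.1.0.0/16` and `10.2.0.0/16` and R2 drops the second, then `propagate(...)["RA"]` contains only `10.1.0.0/16`, so traffic from A to destinations in `10.2.0.0/16` has no route to B.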

17 Outline
Background and motivation
Design of ERA
Implementation and evaluation

18 Implementation
https://github.com/Network-verification/ERA
[Figure: the operator supplies the inputs below to ERA, which answers Pass or Fail.]
Router config. parser from Batfish (NSDI'15), parsing Cisco and Juniper
Network topology (custom format)
Reachability policies (e.g., A→B, valley-free, blackhole)
Environment assumptions (default: "all routes")
Control plane model (custom Java code and BDD library)
Model exploration (Java and Intel AVX2 optimizations)

19 Evaluation
ERA is effective in finding latent reachability bugs
Found known and new bugs in synthetic scenarios
Found known and new bugs in real scenarios
These bugs were caused by router misconfigurations involving:
Incorrect route redistribution
Incorrect route aggregation
Unintended cross-protocol effects
Interaction between SDN and traditional routing protocols
ERA is fast and scalable
ERA analyzes networks with over 1,600 routers in < 7 seconds
Finding a latent bug using state-of-the-art data plane analysis techniques in a 2-router network would take up to 1022 days!

20 Conclusions
Problem: How to find latent network reachability bugs?
Data plane verification is fundamentally limited
Current control plane analysis tools are incomplete or unscalable
ERA: A fast control plane analysis tool
Modeling the control plane's I/O as compact BDDs
Modeling control plane processing logic using fast boolean arithmetic
ERA can help find latent bugs and is scalable

