1 1 Rethinking Network Control & Management The Case for a New 4D Architecture David A. Maltz Carnegie Mellon University Joint work with Albert Greenberg, Gisli Hjalmtysson, Andy Myers, Jennifer Rexford, Geoffrey Xie, Hong Yan, Jibin Zhan, Hui Zhang

2 2 Is the Network Down Again? You sit at your home computer, trying to access a computer at work… …But no data is getting through Minutes or hours later, data flows again… …You never find out why Network operators aren’t much better at predicting outages …

3 3 Outline What do networks look like today? New approach to predicting network behavior A new architecture for controlling networks

4 4 Many Kinds of Networks Each has different Size – generally 10-1000 routers each Owner – company, university, organization Topology – mesh, tree, ring Examples: Enterprise/Campus networks Access networks: DSL, cable modems Metro networks: connect up biz in cities Data center networks: disk arrays & servers Transit/Backbone networks

5 5 A Conventional View of a Network Physical topology is a graph of nodes and links Run Dijkstra to find route to each node A G D B J F I H E C

6 6 A Conventional View of a Network A G D B J F I H E C Physical topology is a graph of nodes and links Run Dijkstra to find route to each node Knowing how the routers are connected says almost nothing about whether or not two hosts can communicate
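A minimal sketch of the "run Dijkstra to find route to each node" step in the conventional view. The links and costs below are hypothetical: the slide names the nodes but the transcript does not preserve how they are connected. As the slide warns, the result describes only the topology and says almost nothing about whether two hosts can actually communicate.

```python
import heapq

def dijkstra(graph, source):
    """Return (dist, next_hop) for every node reachable from source.

    graph maps node -> {neighbor: link cost}; costs must be non-negative.
    next_hop[n] is the first neighbor of source on a shortest path to n.
    """
    dist = {source: 0}
    next_hop = {}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > dist[node]:
            continue  # stale heap entry; a cheaper path was already found
        for neighbor, weight in graph[node].items():
            new_cost = cost + weight
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                # Inherit the first hop from the node we came through.
                next_hop[neighbor] = neighbor if node == source else next_hop[node]
                heapq.heappush(heap, (new_cost, neighbor))
    return dist, next_hop

# Hypothetical links (node, node, cost); not taken from the slide's figure.
links = [("A", "B", 1), ("B", "C", 1), ("A", "D", 1), ("D", "C", 3)]
graph = {}
for u, v, w in links:
    graph.setdefault(u, {})[v] = w
    graph.setdefault(v, {})[u] = w

dist, next_hop = dijkstra(graph, "A")
```

Here `next_hop` plays the role of the table a router would install: for each destination, which neighbor to hand the packet to.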

7 7 Network Equipment Boxes: router, switch Links: Ethernet, SONET, T1, … Picture from Internet2 Abilene Network

8 8 The Data Plane of a Network Hosts/servers Router/Switch Interfaces

9 9 Packets For this talk, networks traffic in packets A sequence of bytes processed as a unit Meta-data Source Address Destination Addr Port numbers …. User data Packet

10 10 The Data Plane of a Network Forwarding Information Base (FIB) Basically a look-up table, each entry is a route Tests fields of packet and determines which interface to send packet out
Destination  NextHop
A            left
B            right
C            left
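The FIB on this slide can be sketched as a plain look-up table. This is an illustration only: the packet layout and the `forward` helper are assumptions, and the exact-match lookup is a simplification (IP routers match on address prefixes and choose the longest matching one).

```python
# FIB entries as shown on the slide: destination -> outgoing interface.
fib = {"A": "left", "B": "right", "C": "left"}

def forward(packet, fib):
    """Return the interface to send the packet out of, or None if no route matches."""
    return fib.get(packet["dst"])

packet = {"src": "X", "dst": "B", "payload": b"hello"}  # hypothetical packet
out_interface = forward(packet, fib)
```

A packet whose destination has no entry gets `None`, which a router would treat as "no route" and drop.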

11 11 The Data Plane of a Network Packet Filter Specific to a single interface Tests fields of packet and determines whether to permit or drop packet Finer granularity than FIB – can test more fields, even target specific applications Permit A->B Drop C->B
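A packet filter can be sketched as an ordered rule list in which the first matching rule decides the packet's fate. The rule tuples, the `*` wildcard handling, and the default action are assumptions made for this sketch; the two rules are the ones shown on the slide.

```python
def matches(pattern, value):
    """A pattern matches if it is the wildcard '*' or equals the value."""
    return pattern == "*" or pattern == value

def filter_packet(packet, rules, default="drop"):
    """Apply rules in order; the first rule whose src and dst match wins."""
    for action, src, dst in rules:
        if matches(src, packet["src"]) and matches(dst, packet["dst"]):
            return action
    return default  # assumption: unmatched packets are dropped

# Rules from the slide: Permit A->B, Drop C->B.
rules = [("permit", "A", "B"), ("drop", "C", "B")]
verdict = filter_packet({"src": "A", "dst": "B"}, rules)
```

Real filters can test more fields than this (ports, protocol), which is what gives them finer granularity than the FIB.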

12 12 The Data Plane of a Network Many other mechanisms… Queueing discipline Packet transformers (e.g., address translation)

13 13 The Control Plane of a Network Where do FIB entries come from? A distributed system called the Control Plane Control plane failures responsible for many of the longest, hardest to debug outages!
Destination  NextHop
A            left
B            right
C            left

14 14 The Control Plane of a Network Routers run routing processes FIB Routing Process

15 15 The Control Plane of a Network Adjacent processes exchange routing information Information format defined by routing protocol Many routing protocols: BGP, OSPF, RIP, EIGRP Adjacent processes must use the same protocol FIB Routing Process FIB Routing Process FIB Routing Process A,B C,D

16 16 The Control Plane of a Network Routing protocols define logic for computing routes Combine all available information Pick best route for each destination FIB Routing Process FIB Routing Process FIB Routing Process D D
Destination  NextHop
D            left

17 17 Control Plane Creates Resiliency D D left Routing Process D left Routing Process D left Routing Process D D D

18 18 Control Plane Creates Resiliency D right Routing Process D left Routing Process D left Routing Process D D D

19 19 A Study of Operational Production Networks How complicated/simple are real control planes? What is the structure of the distributed system? Use reverse-engineering methodology There are few or no documents The ones that exist are out-of-date Anonymized configuration files for 31 active networks (>8,000 configuration files) 6 Tier-1 and Tier-2 Internet backbone networks 25 enterprise networks Sizes between 10 and 1,200 routers 4 enterprise networks significantly larger than the backbone networks

20 20 Excerpts from a Router Configuration File
interface Ethernet0
 ip address 6.2.5.14 255.255.255.128
interface Serial1/0.5 point-to-point
 ip address 6.2.2.85 255.255.255.252
 ip access-group 143 in
 frame-relay interface-dlci 28
router ospf 64
 redistribute connected subnets
 redistribute bgp 64780 metric 1 subnets
 network 66.251.75.128 0.0.0.127 area 0
router bgp 64780
 redistribute ospf 64 match route-map 8aTzlvBrbaW
 neighbor 66.253.160.68 remote-as 12762
 neighbor 66.253.160.68 distribute-list 4 in
access-list 143 deny 1.1.0.0/16
access-list 143 permit any
route-map 8aTzlvBrbaW deny 10
 match ip address 4
route-map 8aTzlvBrbaW permit 20
 match ip address 7
ip route 10.2.2.1/16 10.2.1.7

21 21 Size of Configuration Files in One Network [Figure: lines in config file (y-axis, 0 to 2000) against router ID sorted by file size (x-axis)]

22 22 Routing Processes Implement Policy Extensive use of policy commands to filter routes Prevent some hosts from communicating: security policy Limit access to short-cut links: resource policy FIB Routing Process FIB Routing Process FIB Routing Process A,B A R1R2R3

23 23 Packet Filters Implement Policy Packet filters used extensively throughout networks Protect routers from attack Implement reachability matrix –Define which hosts can communicate –Localize traffic, particularly multicast

24 24 Mechanisms for Action at a Distance Policy often implemented by tagging routes on one router … … And testing for tag at another router FIB Routing Process FIB Routing Process FIB Routing Process A:tag=12 A R1R2R3 A:tag=12 Tag?

25 25 Multiple Interacting Routing Processes OSPFBGPOSPF FIB OSPF FIB OSPF FIB OSPF FIB OSPF EBGP Policy1Policy2 Internet Client Server

26 26 The Routing Instance Graph of an 881-Router Network

27 27 Take Away Points Networks deal with both creating connectivity and preventing it Networks controlled by complex distributed systems Must understand system to understand behavior Focusing on individual protocols is not enough Composition of protocols is important and complex Developed abstractions to model routing design Routing Process Graph – accurately model design Routing Instance – abstracts away details Reverse-engineer routing design from configs

28 28 Outline What do networks look like today? New approach to predicting network behavior Frame the problem of reachability analysis Sketch algebra for predicting reachability A new architecture for controlling networks

29 29 Reachability Can A send a packet to B? Depends on routing protocols, advertised routes, policies, packet filters,... Predicting reachability is key to network survivability and security i j AB

30 30 Reachability We focus on two types of policy: –Survivability: Certain packets should always be permitted, under all possible network states –Security: Certain packets should never be permitted, under all possible network states i j AB

31 31 Reachability Example Two locations, each with data center & front office All routers exchange routes over all links R1R2 R5 R4R3 Chicago (chi) New York (nyc) Data CenterFront Office

32 32 Reachability Example R1R2 R5 R4R3 Chicago (chi) New York (nyc) Data Center chi-DC chi-FO nyc-DC nyc-FO chi-DCchi-FOnyc-DCnyc-FO Front Office

33 33 Reachability Example R1R2 R5 R4R3 Data Center chi-DC chi-FO nyc-DC nyc-FO chi-DCchi-FOnyc-DCnyc-FO Packet filter: Drop nyc-FO -> * Permit * Packet filter: Drop chi-FO -> * Permit * Front Office chi nyc

34 34 Reachability Example A new short-cut link added between data centers Intended for backup traffic between centers R1R2 R5 R4R3 Data Center Packet filter: Drop nyc-FO -> * Permit * Packet filter: Drop chi-FO -> * Permit * Front Office chi nyc

35 35 Reachability Example Oops – new link lets packets violate security policy! Routing changed, but Packet filters don’t update automatically R1R2 R5 R4R3 Data Center Packet filter: Drop nyc-FO -> * Permit * Packet filter: Drop chi-FO -> * Permit * Front Office chi nyc

36 36 Reachability Example Typical response – add more packet filters to plug the holes in security policy R1 R2 R5 R4R3 Data CenterFront Office chi nyc Packet filter: Drop nyc-FO -> * Permit * Packet filter: Drop chi-FO -> * Permit *

37 37 Reachability Example Packet filters have surprising consequences Consider a link failure chi-FO and nyc-FO still connected R1 R2 R5 R4R3 Data Center Drop nyc-FO -> * Front Office chi nyc Drop chi-FO -> *

38 38 Reachability Example Network has less survivability than topology suggests chi-FO and nyc-FO still connected But packet filter means no data can flow! Probing the network won’t predict this problem R1 R2 R5 R4R3 Data Center Drop nyc-FO -> * Front Office chi nyc Drop chi-FO -> *

39 39 State of the Art in Reachability Analysis Build the network, try sending packets ping, traceroute, monitoring tools Only checks paths currently selected by routing protocols Cannot be used for “ what if ” analysis Our goal: Static Reachability Analysis Predict reachability over multiple scenarios through analysis of router configuration files

40 40 Predicting Reachability How can we formalize the reachability provided by a network? The set of packets the network will carry from router i to router j A function of the forwarding state s s represents the contents of each FIB R i,j (s) is the instantaneous reachability i j R i,j (s)

41 41 Computing Reachability F i,j (s): Set of packets permitted along link from node i to node j in network state s Packets allowed along path π The set of all paths from i to j [Figure: routers R1, R2, R3, R4 with links labeled F 1,2 (s), F 2,1 (s), F 2,3 (s), F 3,2 (s), F 3,4 (s), F 4,3 (s)]
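The formulas on this slide did not survive in the transcript. One reading that is consistent with the surrounding definitions is given below; it is a reconstruction, not a transcription. The path symbol \pi and the path set \mathcal{P}(i,j) are notation assumed here. A packet can traverse a path only if every link on the path permits it, and it can get from i to j if at least one path carries it.

```latex
% Packets allowed along a path \pi: permitted on every link (u,v) of the path
\bigcap_{(u,v) \in \pi} F_{u,v}(s)

% Instantaneous reachability: union over the set \mathcal{P}(i,j) of all paths from i to j
R_{i,j}(s) \;=\; \bigcup_{\pi \in \mathcal{P}(i,j)} \; \bigcap_{(u,v) \in \pi} F_{u,v}(s)
```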

42 42 Jointly Modeling the Effects of Packet Filters and Routing Key Problem: F i,j (s) affected by routing and packet filters Key Insight: Treat routes as dynamic packet filters [Figure: routers R1, R3, R2]
Dest  NextHop
A     R3
B     R1
C     R3
Filter: Permit *->A; Permit *->C; Drop *->*
Filter: Permit *->B; Drop *->*
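The key insight can be sketched by converting a FIB into one packet filter per outgoing link: a route to a destination acts as a permit rule on the link toward its next hop, and everything else is dropped on that link. The function name and rule tuples are assumptions for this sketch, and attributing the FIB to router R2 and the two filters to the links toward R3 and R1 is one reading of the slide's figure.

```python
def fib_to_filters(fib):
    """Turn a FIB {dest: next_hop} into per-link first-match filter rules.

    Each route becomes a 'permit *->dest' rule on the link to its next hop;
    a final 'drop *->*' rule blocks every other packet on that link.
    """
    filters = {}
    for dest, next_hop in fib.items():
        filters.setdefault(next_hop, []).append(("permit", "*", dest))
    for rules in filters.values():
        rules.append(("drop", "*", "*"))
    return filters

# FIB from the slide, assumed here to belong to R2.
fib_r2 = {"A": "R3", "B": "R1", "C": "R3"}
filters = fib_to_filters(fib_r2)
```

Because the filters are derived from the FIB, they change whenever routing changes, which is what makes them "dynamic" and lets routing and static packet filters be analyzed in one framework.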

43 43 Bounding the Instantaneous Reachability Knowing the exact forwarding state s is impractical Knowing R i,j (s) doesn’t help much, anyway Want to predict behavior over a range of states Luckily, predicting behavior over set of all possible states is easier than predicting reachability for a single state

44 44 Reachability Bounds Lower bound on Reachability Packets in this set never prohibited by network Upper bound on Reachability Packets not in this set always prohibited by network
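The two bounds can be written in terms of the instantaneous reachability R i,j (s). This is a sketch under assumed notation: S stands for the set of all forwarding states the network can enter, and the superscripts L and U are labels chosen here. A packet is never prohibited exactly when it is carried in every state, and it is always prohibited exactly when no state carries it.

```latex
% Lower bound: packets carried in every possible state
R^{L}_{i,j} \;=\; \bigcap_{s \in S} R_{i,j}(s)

% Upper bound: packets carried in at least one possible state
R^{U}_{i,j} \;=\; \bigcup_{s \in S} R_{i,j}(s)
```

Read this way, a survivability policy is checked against the lower bound (the required packets must lie inside it) and a security policy against the upper bound (the forbidden packets must lie outside it).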

45 45 Example Upper Bound Analysis Before short-cut link added: After short-cut link added: R1R2 R5 R4R3 Packet filter: Drop nyc-FO -> * Permit * Packet filter: Drop chi-FO -> * Permit * chi nyc

46 46 Example Lower Bound Analysis Before extra packet filters added: After extra packet filters added: R1 R2 R5 R4R3 Packet filter: Drop nyc-FO -> * Permit * Packet filter: Drop chi-FO -> * Permit * chi nyc

47 47 Take Away Points We have defined an algebra for modeling reachability Packet filters, routing protocols, NAT Griffin&Bush validated RFC 2547 VPNs Status Algebra works on test cases Currently experimenting with production networks Algebra’s strength and weakness is static analysis Can validate that network meets static objectives Can have false positives Cannot design the network to meet objectives Cannot control network to obey dynamic objectives

48 48 Outline What do networks look like today? New approach to predicting network behavior A new architecture for controlling networks New principles for network control New architecture embodying those principles Experimental validation

49 49 Does Network Control Actually Matter? YES! Microsoft: All services fell off the network for 23 hours due to misconfiguration of routers in their network (2001) Major ISP: 50% of outages occur during planned maintenance (2005) IP networks have 2-3x as many outages as circuit-switched networks (2005)

50 50 Three Principles for Network Control & Management Network-level Objectives: Express goals explicitly Security policies, QoS, egress point selection Do not bury goals in box-specific configuration Management Logic Reachability matrix Traffic engineering rules

51 51 Three Principles for Network Control & Management Network-wide Views: Design network to provide timely, accurate info Topology, traffic, resource limitations Give logic the inputs it needs Management Logic Reachability matrix Traffic engineering rules Read state info

52 52 Three Principles for Network Control & Management Direct Control: Allow logic to directly set forwarding state FIB entries, packet filters, queuing parameters Logic computes desired network state, let it implement it Management Logic Reachability matrix Traffic engineering rules Read state info Write state

53 53 Overview of the 4D Architecture Decision Plane: All management logic implemented on centralized servers making all decisions Decision Elements use views to compute data plane state that meets objectives, then directly write this state to routers Decision Dissemination Discovery Data Network-level objectives Direct control Network-wide views

54 54 Overview of the 4D Architecture Dissemination Plane: Provides a robust communication channel to each router May run over same links as user data, but logically separate and independently controlled Decision Dissemination Discovery Data Network-level objectives Direct control Network-wide views

55 55 Overview of the 4D Architecture Discovery Plane: Each router discovers its own resources and its local environment E.g., the identity of its immediate neighbors Decision Dissemination Discovery Data Network-level objectives Direct control Network-wide views

56 56 Overview of the 4D Architecture Data Plane: Spatially distributed routers/switches No need to change today’s technology Decision Dissemination Discovery Data Network-level objectives Direct control Network-wide views

57 57 Control & Management Today Data Plane Distributed routers Forwarding, filtering, queueing Based on FIB or labels Management Plane Figure out what is happening in network Decide how to change it Shell scripts Traffic Eng Databases Planning tools OSPF SNMPnetflow Config files OSPF BGP Link metrics OSPF BGP OSPF BGP Control Plane Multiple routing processes on each router Each router with different configuration program Huge number of control knobs: metrics, ACLs, policy FIB Routing policies Packet filters

58 58 Good Abstractions Reduce Complexity All decision making logic lifted out of control plane Eliminates duplicate logic in management plane Dissemination plane provides robust communication to/from data plane routers Management Plane Control Plane Data Plane Decision Plane Dissemination Data Plane Configs FIBs, ACLs

59 59 Three Key Questions Could the 4D architecture ever be deployed? Is the 4D architecture feasible? Can the 4D architecture actually simplify network control and management?

60 60 Deployment of the 4D Architecture Pre-existing industry trend towards separating router hardware from software IETF: FORCES, GSMP, GMPLS SoftRouter [Lakshman, HotNets’04] Incremental deployment path exists Individual networks can upgrade to 4D and gain benefits Small enterprise networks have most to gain

61 61 The Feasibility of the 4D Architecture We designed and built a prototype of the 4D Decision plane Contains logic to simultaneously compute routes and enforce reachability matrix Multiple Decision Elements per network, using simple election protocol to pick master Dissemination plane Uses source routes to direct control messages Extremely simple, but can route around failed data links

62 62 Performance of the 4D Prototype Evaluated using Emulab (www.emulab.net) Linux PCs used as routers (650 – 800MHz) Tested on 9 enterprise network topologies (10-100 routers each) Recovers from single link failure in < 300 ms < 1 s response considered “excellent” Survives failure of master Decision Element New DE takes control within 1 s No disruption unless second fault occurs Gracefully handles complete network partitions Less than 1.5 s of outage

63 63 4D Makes Network Management & Control Error-proof R1R2 R5 R4R3 Packet filter: Drop nyc-FO -> * Permit * Packet filter: Drop chi-FO -> * Permit * chi nyc Data CenterFront Office chi-DC chi-FO nyc-DC nyc-FO chi-DCchi-FOnyc-DCnyc-FO

64 64 Prohibiting Packets from chi-FO to nyc-DC

65 65 4D Makes Network Management & Control Error-proof R1 R2 R5 R4R3 Data Center Drop nyc-FO -> * Front Office chi nyc Drop chi-FO -> *

66 66 Allowing Packets from chi-FO to nyc-FO

67 67 Related Work Driving network operation from network-wide views –Traffic Engineering –Traffic Matrix computation Centralization of decision making logic –Routing Control Point [Feamster] –Path Computation Element [Farrel] –Signaling System 7 [Ma Bell]

68 68 Take Aways No need for complicated distributed system in control plane – do away with it! 4D Architecture a promising approach Power of solution comes from: Colocating all decision making in one plane Providing that plane with network-wide views Directly express solution by writing forwarding state Benefits Coordinated state updates → better reliability Separates network issues from distributed systems issues

69 69 Summary Networks must meet many different types of objectives Security, traffic engineering, robustness Today, objectives met using control plane mechanisms Results in complicated distributed system Ripe with opportunities to set time-bombs Predicting static properties is possible, but difficult Refactoring into a 4D Architecture very promising Separates network issues from reliability issues Eliminates duplicate logic and simplifies network Enables new capabilities, like joint control

70 70 Questions?

71 71 Backup Slides

72 72 Computing Reachability Bounds Problem reduced to estimating all routes potentially in routing table (FIB) of each router Much easier than predicting exactly which routes will be in FIB

73 73 How to Organize the Decision Plane? We have exposed the network control logic --- now what? Need a way to structure that logic Mutual optimization of multiple objectives –Potentially mutually exclusive Each objective has different time constants Multiple objectives may affect the same bit of data-plane state

74 74 Future Directions 4D in different network contexts Ethernet networks Mixed networks: circuit- and packet- switched Include services in the 4D Domain Name Service HTTP Proxies and load balancers

75 75 Reverse-Engineering Overview Configuration files Find links Find adjacent routing processes Construct Routing Process Graph Condense adjacent routing processes Construct Routing Instance Graph Construct Layer 3 Topology OSPF #1OSPF #2BGP AS1 AS2

76 76 Reconstruct the Layer 3 Topology interface Serial1/0.5 ip address 1.1.1.1 255.255.255.252 …. Router 1 Config interface Serial2/1.5 ip address 1.1.1.2 255.255.255.252 …. Router 2 Config Internet
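The link-finding step can be sketched as grouping interfaces by IP subnet: two interfaces whose addresses fall in the same subnet are inferred to sit on the same link. The tuple layout and the `find_links` helper are assumptions for this sketch. The first two interfaces come from the slide; the third is hypothetical and stands in for an interface whose peer is outside the configuration set.

```python
import ipaddress
from collections import defaultdict

def find_links(interfaces):
    """Group (router, interface) pairs by subnet; keep subnets with 2+ members."""
    by_subnet = defaultdict(list)
    for router, name, address, mask in interfaces:
        # ip_interface accepts "address/netmask" and exposes the enclosing network.
        network = ipaddress.ip_interface(f"{address}/{mask}").network
        by_subnet[network].append((router, name))
    return {net: ends for net, ends in by_subnet.items() if len(ends) > 1}

interfaces = [
    ("Router 1", "Serial1/0.5", "1.1.1.1", "255.255.255.252"),
    ("Router 2", "Serial2/1.5", "1.1.1.2", "255.255.255.252"),
    ("Router 2", "Serial3/0", "2.2.2.1", "255.255.255.252"),  # hypothetical, no peer in the data set
]
links = find_links(interfaces)
```

Interfaces left without a partner, like the hypothetical third one, are candidates for links that leave the network, which may be what the "Internet" label in the slide's figure refers to.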

77 77 Abstract to a Routing Instance Graph Pick an unassigned Routing Process Flood fill along process adjacencies, labeling processes Repeat until all processes assigned to an Instance OSPF #1OSPF #2BGP AS1 EBGP AS2 Policy1Policy2 OSPFBGPOSPF Route Table RT OSPF RT OSPF RT OSPF RT OSPF
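The three steps on this slide amount to labeling connected components of the routing process graph. A minimal sketch follows, assuming processes are identified by (router, protocol) pairs and adjacencies are given as pairs of processes; both the representation and the example data are hypothetical.

```python
from collections import deque

def routing_instances(processes, adjacencies):
    """Assign each routing process an instance id by flood fill over adjacencies."""
    neighbors = {p: set() for p in processes}
    for a, b in adjacencies:
        neighbors[a].add(b)
        neighbors[b].add(a)

    instance_of = {}
    next_id = 0
    for start in processes:              # pick an unassigned routing process
        if start in instance_of:
            continue
        instance_of[start] = next_id
        queue = deque([start])
        while queue:                     # flood fill along process adjacencies
            current = queue.popleft()
            for peer in neighbors[current]:
                if peer not in instance_of:
                    instance_of[peer] = next_id
                    queue.append(peer)
        next_id += 1                     # repeat until all processes are assigned
    return instance_of

# Hypothetical processes and adjacencies.
processes = [("R1", "ospf"), ("R2", "ospf"), ("R2", "bgp"), ("R3", "bgp"), ("R4", "ospf")]
adjacencies = [(("R1", "ospf"), ("R2", "ospf")), (("R2", "bgp"), ("R3", "bgp"))]
instances = routing_instances(processes, adjacencies)
```

In this example the two OSPF processes that talk to each other collapse into one instance, the BGP pair into another, and the isolated OSPF process on R4 becomes an instance of its own.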

78 78 Textbook Routing Design for Enterprise Networks Border routers speak eBGP to external peers BGP selects a few key external routes to redistribute into OSPF 7 of 25 enterprise networks follow this pattern OSPF BGP AS #1 EBGP AS2 AS3

79 79 Reality: A Diversity of Unusual Routing Designs Network broken up into compartments, each with only 1 to 4 routers Each compartment has its own AS number Hub and spoke logical topology Why? Lots of control over how spokes communicate BGP AS #1 BGP AS #4 BGP AS #2 BGP AS #3 BGP AS #5 EBGP Rest of the World

80 80 Reality: A Diversity of Unusual Routing Designs Network broken up into many compartments, each running EIGRP, some with 400+ routers BGP used to filter routes passed between compartments Compartments themselves pass information between BGP speakers Why? Little need for IBGP; few routers speak BGP; Lots of control over how packets move between compartments BGP AS #1 EBGP Rest of the World EIGRP BGP AS #2 EIGRP BGP AS #3 BGP AS #4 Rest of the World EBGP

81 81 Link Down

82 82 Reconvergence Time Under Single Link Failure

83 83 Reconvergence Time When Master DE Crashes

84 84 Reconvergence Time When Network Partitions

85 85 Reconvergence Time When Network Partitions

86 86 Slides in Progress or Looking for a Place to go

87 87 Separation of Issues The 4D Architecture separates issues Networking logic goes into decision plane

88 88 Dissemination Plane Make clear that dissem paths can use same physical links, but different routing Discovery and dissem packets can be independent of data-plane (e.g. IP) IP is very configuration intensive (addresses, etc) so we avoid it whenever possible

89 89 Questions What if I want to take a bunch of hosts and stick them together into a small network? Haven’t you made this common case terrifically hard? Today, I’d use static routes – it’s neither common nor easy In the 4D model, what do I do? –DE co-located on the host –Doesn’t talk to any other DEs or routers

90 90 Problems with State of the Art Today: Network behavior determined by multiple interacting distributed programs, written in assembly language No way to visualize or describe routing design Impossible to establish linkage between configurations and network objectives Only a few “textbook” routing designs are widely known

