David A. Maltz Carnegie Mellon University/Microsoft Research


Rethinking Network Control & Management: The Case for a New 4D Architecture
David A. Maltz, Carnegie Mellon University/Microsoft Research
Joint work with Albert Greenberg, Gisli Hjalmtysson, Andy Myers, Jennifer Rexford, Geoffrey Xie, Hong Yan, Jibin Zhan, Hui Zhang

The Role of Network Control and Management
- Many different network environments
  - Access and backbone networks
  - Data-center networks, enterprise/campus networks
  - Sizes: 10-10,000 routers/switches
- Many different technologies
  - Longest-prefix routing (IP), fixed-width routing (Ethernet), label switching (MPLS, ATM), circuit switching (optical, TDM)
- Many different policies
  - Routing, reachability, transit, traffic engineering, robustness
- The control plane software binds these elements together and defines the network

We Can Change the Control Plane!
- Pre-existing industry trend toward separating router hardware from software
  - IETF: ForCES, GSMP, GMPLS
  - SoftRouter [Lakshman, HotNets'04]
- An incremental deployment path exists
  - Individual networks can upgrade their control planes and gain benefits
  - Small enterprise networks have the most to gain
  - No changes to end-systems required

A Clean-slate Design
- What are the fundamental causes of network problems?
- How to secure the network and protect the infrastructure?
- How to provide flexibility in defining management logic?
- What functionality needs to be distributed, and what can be centralized?
- How to reduce/simplify the software in networks?
  - What would a "RISC" router look like?
- How to leverage technology trends?
  - CPU and link speed growing faster than the number of switches

Three Principles for Network Control & Management: Network-level Objectives
- Express goals explicitly
  - Security policies, QoS, egress point selection
- Do not bury goals in box-specific configuration
[Figure: the reachability matrix and traffic engineering rules given as inputs to the management logic]

Three Principles for Network Control & Management: Network-wide Views
- Design the network to provide timely, accurate information
  - Topology, traffic, resource limitations
- Give the logic the inputs it needs
[Figure: the management logic reads state information from the network, alongside the reachability matrix and traffic engineering rules]

Three Principles for Network Control & Management: Direct Control
- Allow the logic to directly set forwarding state
  - FIB entries, packet filters, queuing parameters
- The logic computes the desired network state; let it implement that state
[Figure: the management logic reads state information from the network and writes state back to it, using the reachability matrix and traffic engineering rules]

Overview of the 4D Architecture: Decision Plane
[Figure: the four planes (Decision, Dissemination, Discovery, Data), annotated with network-level objectives, network-wide views, and direct control]
- All management logic is implemented on centralized servers that make all decisions
- Decision Elements (DEs) use network-wide views to compute data plane state that meets the objectives, then write this state directly to routers
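As a rough illustration of the decision plane's job, the sketch below takes a network-wide view (a hypothetical dict mapping each router to its neighbors and link costs) and computes a next-hop table for every router with Dijkstra's algorithm. It assumes symmetric links and plain shortest-path routing; it is not the prototype's decision logic, which also enforces a reachability matrix. In the 4D design, a DE would then write each router's table directly to that router.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical network-wide view: router -> [(neighbor, link cost), ...].
# Links are assumed symmetric (listed in both directions with equal cost).
Topology = Dict[str, List[Tuple[str, int]]]


def distances_to(topo: Topology, dest: str) -> Dict[str, int]:
    """Dijkstra rooted at dest; with symmetric links this is each router's cost to dest."""
    dist = {dest: 0}
    heap = [(0, dest)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in topo[u]:
            if d + cost < dist.get(v, float("inf")):
                dist[v] = d + cost
                heapq.heappush(heap, (d + cost, v))
    return dist


def compute_fibs(topo: Topology) -> Dict[str, Dict[str, str]]:
    """Return fibs[router][destination] = next-hop neighbor on a shortest path."""
    fibs: Dict[str, Dict[str, str]] = {r: {} for r in topo}
    for dest in topo:
        dist = distances_to(topo, dest)
        for r in topo:
            if r == dest or r not in dist:
                continue  # no entry for itself or for unreachable destinations
            # Neighbor minimizing (link cost + neighbor's distance); ties broken by name.
            _, hop = min((cost + dist[n], n) for n, cost in topo[r] if n in dist)
            fibs[r][dest] = hop
    return fibs


topo = {
    "A": [("B", 1), ("C", 4)],
    "B": [("A", 1), ("C", 1)],
    "C": [("A", 4), ("B", 1)],
}
fibs = compute_fibs(topo)
print(fibs["A"])  # {'B': 'B', 'C': 'B'}
```

Router A reaches C through B because the two-hop path (cost 2) is cheaper than the direct link (cost 4).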

Overview of the 4D Architecture: Dissemination Plane
- Provides a robust communication channel to each router, and robustness is the only goal!
- May run over the same links as user data, but is logically separate and independently controlled

Overview of the 4D Architecture: Discovery Plane
- Each router discovers its own resources and its local environment
  - E.g., the identity of its immediate neighbors
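One way to picture how the discovery plane's local observations could become a network-wide view: each router reports the neighbors it has discovered, and a Decision Element keeps only the adjacencies that both endpoints confirm. The report format and the two-sided confirmation rule are assumptions made for this sketch, not details taken from the 4D prototype.

```python
from typing import Dict, Set, Tuple


def build_view(reports: Dict[str, Set[str]]) -> Set[Tuple[str, str]]:
    """Merge per-router neighbor reports into a set of undirected links.

    A link is accepted only when both endpoints list each other, so a
    one-sided report (e.g., a half-failed link or a stale entry) is ignored.
    """
    links: Set[Tuple[str, str]] = set()
    for router, neighbors in reports.items():
        for n in neighbors:
            if router in reports.get(n, set()):
                a, b = sorted((router, n))
                links.add((a, b))
    return links


# R3 has not (yet) reported R1, so the R1-R3 adjacency is left out of the view.
reports = {"R1": {"R2", "R3"}, "R2": {"R1"}, "R3": set()}
print(build_view(reports))  # {('R1', 'R2')}
```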

Overview of the 4D Architecture: Data Plane
- Spatially distributed routers/switches
- Can deploy with today's technology
- Looking at ways to unify forwarding paradigms across technologies

Concerns and Challenges
- Distributed systems issues
  - How will communication between routers and DEs survive failures in the network?
  - Latency means the DE's view of the network lags behind reality. Will the control loop be stable?
  - What is the overhead to/from the DEs?
  - What happens in a network partition?
- Networking issues
  - Does the 4D simplify control and management?
  - Can we create logic to meet multiple objectives?

The Feasibility of the 4D Architecture
- We designed and built a prototype of the 4D Architecture
- The 4D Architecture permits many designs; the prototype is a single, simple design point
- Decision plane
  - Contains logic to simultaneously compute routes and enforce the reachability matrix
  - Multiple Decision Elements per network, using a simple election protocol to pick the master
- Dissemination plane
  - Uses source routes to direct control messages
  - Extremely simple, but can route around failed data links
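A minimal sketch of source-routed control messages, assuming a hypothetical adjacency-list view of live links: the DE picks a fewest-hop path and places the full hop list in the message, and each router only checks that the next listed hop is a live neighbor. When a listed link fails, the DE can compute a new source route around it. The helper names (`bfs_path`, `forward_ok`) are illustrative, not the prototype's code.

```python
from collections import deque
from typing import Dict, List, Optional

Links = Dict[str, List[str]]  # router -> currently live neighbors


def bfs_path(links: Links, src: str, dst: str) -> Optional[List[str]]:
    """Fewest-hop path from src to dst over live links, or None if partitioned."""
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path: List[str] = []
            node: Optional[str] = u
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for v in links.get(u, []):
            if v not in prev:
                prev[v] = u
                queue.append(v)
    return None


def forward_ok(links: Links, route: List[str]) -> bool:
    """Simulate hop-by-hop forwarding of a source-routed message."""
    for here, nxt in zip(route, route[1:]):
        if nxt not in links.get(here, []):
            return False  # a link named in the source route is down
    return True


links = {"DE": ["R1", "R3"], "R1": ["DE", "R2"], "R2": ["R1", "R3"], "R3": ["DE", "R2"]}
route = bfs_path(links, "DE", "R2")   # ['DE', 'R1', 'R2']
links["R1"].remove("R2")              # the R1-R2 link fails
links["R2"].remove("R1")
print(forward_ok(links, route))       # False
print(bfs_path(links, "DE", "R2"))    # ['DE', 'R3', 'R2']
```

Intermediate routers need no routing state of their own for control traffic, which is what keeps this style of dissemination simple.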

Evaluation of the 4D Prototype
- Evaluated using Emulab (www.emulab.net)
- Linux PCs used as routers (650-800 MHz)
- Tested on 9 enterprise network topologies (10-100 routers each)
[Figure: example network with 49 switches and 5 DEs]

Performance of the 4D Prototype
- The trivial prototype has performance comparable to well-tuned production networks
- Recovers from a single link failure in < 300 ms
  - < 1 s response is considered "excellent"
  - Faster forwarding reconvergence is possible
- Survives failure of the master Decision Element
  - New DE takes control within 1 s
  - No disruption unless a second fault occurs
- Gracefully handles complete network partitions
  - Less than 1.5 s of outage

Fundamental Problem: Wrong Abstractions
- Management plane
  - Figure out what is happening in the network
  - Decide how to change it
  - Tools: shell scripts, traffic engineering and planning tools, databases; inputs from SNMP, NetFlow, modems; outputs are configs
- Control plane
  - Multiple routing processes on each router
  - Each router with a different configuration program
  - Huge number of control knobs: metrics, ACLs, policy
- Data plane
  - Distributed routers
  - Forwarding, filtering, queueing
  - Based on FIB or labels
[Figure: OSPF and BGP processes on each router, driven by link metrics and routing policies, each populating a FIB and packet filters]

Good Abstractions Reduce Complexity
[Figure: today's stack (management plane -> configs -> control plane -> data plane) compared with the 4D stack (decision plane -> dissemination -> data plane), with FIBs and ACLs written directly to the data plane]
- All decision-making logic lifted out of the control plane
- Eliminates duplicate logic in the management plane
- Dissemination plane provides robust communication to/from data plane switches

Today: Simple Things are Hard to Do
[Figure: network diagram showing inter-POP links and access networks]

Fundamental Problem: Configurations Allow Too Many Degrees of Freedom
- Computing configuration files that cause the control plane to compute the desired forwarding states is intractable
  - NP-hard in many cases
  - Requires a predictive model of control plane behavior
- Configuration files form a program that defines a set of forwarding states
  - Very hard to create a program that permits only the desired states and doesn't transit through bad ones
[Figure: the set of forwarding states allowed by configs; auto-adaptation leads to or through bad states, while direct control avoids them]

Fundamental Problem: Conflation of Issues
- Ideal case: all routing information flooded to all routers inside the network
  - Robustness achieved via flooding
- Reality: routing information filtered and aggregated extensively
  - Route filtering used to implement security and resource policies
  - Route aggregation used to achieve scalability
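Both mechanisms can be tried with Python's standard `ipaddress` module: `collapse_addresses` summarizes contiguous prefixes (aggregation for scalability), and a `subnet_of` check stands in for a route filter (policy). The prefixes and the restricted range are made up for illustration. Either step removes routing information that other routers could otherwise have used, which is the tension this slide describes.

```python
import ipaddress

# Hypothetical routes learned inside one part of the network.
routes = [
    ipaddress.ip_network(p)
    for p in ["10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24", "10.9.0.0/24"]
]

# Aggregation (scalability): contiguous prefixes collapse into one summary route.
summary = list(ipaddress.collapse_addresses(routes))
print(summary)  # [IPv4Network('10.1.0.0/22'), IPv4Network('10.9.0.0/24')]

# Filtering (policy): hide everything inside a hypothetical restricted range.
restricted = ipaddress.ip_network("10.9.0.0/16")
advertised = [n for n in summary if not n.subnet_of(restricted)]
print(advertised)  # [IPv4Network('10.1.0.0/22')]
```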

4D Separates Distributed Computing Issues from Networking Issues
- Distributed computing issues -> protocols and network architecture
  - Overhead, resiliency, scalability
- Networking issues -> management logic
  - Traffic engineering and service provisioning
  - Egress point selection
  - Reachability control (VPNs)
  - Precomputation of backup paths

Future Work
- Scalability
  - Evaluate over 1-10K switches, 10-100K routes
  - Networks with backbone-like propagation delays
- Structuring decision logic
  - Arbitrate among multiple, potentially competing objectives
  - Unify control when some logic takes longer than others
- Protocol improvements
  - Better dissemination and discovery planes
- Deployment in today's networks
  - Data center, enterprise, campus, backbone (RCP)

Future Work
- Experiment with network appliances
  - Traffic shapers, traffic scrubbers
- Expand relationships with security
  - Using 4D as a mechanism for monitoring/quarantine
- Formulate models that establish the bounds of 4D
  - Scale, latency, stability, failure models, objectives
- Generate evidence to support/refute the principles

Questions?

Direct Control Provides Complete Control
- Zero device-specific configuration
- Supports many models for "pushing" routes
  - Trivial push: convergence requires time for all updates to be received and applied, the same as today
  - Synchronized update: updates are propagated but not applied until an agreed time in the future; clock skew defines convergence time
  - Controlled state trajectory: the DE serializes updates to avoid all incorrect transient states
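To make the first two push models concrete, assume (hypothetically) that the DE knows each router's update delivery delay and its clock offset from true time. With trivial push, routers switch as updates arrive, so convergence after sending takes the largest delivery delay. With synchronized update, the DE picks an apply time after all updates should have arrived, and routers then switch within a window equal to the spread of their clock offsets. The delay and offset values are illustrative, not measurements.

```python
from typing import Dict


def trivial_push_convergence(delays: Dict[str, float]) -> float:
    """Seconds after sending until every router has received and applied its update."""
    return max(delays.values())


def pick_apply_time(send_time: float, delays: Dict[str, float], margin: float) -> float:
    """Synchronized update: choose an apply time after the slowest update arrives."""
    return send_time + max(delays.values()) + margin


def sync_switch_window(offsets: Dict[str, float]) -> float:
    """Spread of true times at which routers switch, when each applies the update
    as its local clock (true time + offset) reaches the agreed apply time."""
    return max(offsets.values()) - min(offsets.values())


delays = {"R1": 0.010, "R2": 0.045, "R3": 0.020}    # seconds, illustrative
offsets = {"R1": 0.002, "R2": -0.003, "R3": 0.000}  # clock skew, illustrative

print(trivial_push_convergence(delays))     # 0.045
print(pick_apply_time(100.0, delays, 0.1))  # about 100.145
print(sync_switch_window(offsets))          # about 0.005
```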

Fundamental Problem: Wrong Abstractions
    interface Ethernet0
     ip address 6.2.5.14 255.255.255.128
    interface Serial1/0.5 point-to-point
     ip address 6.2.2.85 255.255.255.252
     ip access-group 143 in
     frame-relay interface-dlci 28
    router ospf 64
     redistribute connected subnets
     redistribute bgp 64780 metric 1 subnets
     network 66.251.75.128 0.0.0.127 area 0
    router bgp 64780
     redistribute ospf 64 match route-map 8aTzlvBrbaW
     neighbor 66.253.160.68 remote-as 12762
     neighbor 66.253.160.68 distribute-list 4 in
    access-list 143 deny 1.1.0.0/16
    access-list 143 permit any
    route-map 8aTzlvBrbaW deny 10
     match ip address 4
    route-map 8aTzlvBrbaW permit 20
     match ip address 7
    ip route 10.2.2.1/16 10.2.1.7

Fundamental Problem: Wrong Abstractions
[Figure: size of configuration files in a single enterprise network (881 routers); y-axis: lines in config file; x-axis: router ID, sorted by file size]

Fundamental Problem: Conflating Distributed Systems Issues with Networking Issues
[Figure: three routing processes exchanging routes to destination D; routes to D are learned via the left-hand path]
- Distributed systems concern: resiliency to link failures
- Solution: multiple paths through the routing process graph

Fundamental Problem: Conflating Distributed Systems Issues with Networking Issues
[Figure: the same routing processes; one process now learns the route to D via the right-hand path]
- Distributed systems concern: resiliency to link failures
- Solution: multiple paths through the routing process graph

Fundamental Problem: Conflating Distributed Systems Issues with Networking Issues
[Figure: the same routing processes, with routes to D filtered on one link]
- Networking concern: implement resource or security policy
- Solution: restrict the flow of routing information, filter routes, summarize/aggregate routes

4D Supports Network Evolution & Expansion
- Decision logic can be upgraded as needed
  - No need to update distributed protocols implemented in software on every switch
- Decision Elements can be upgraded as needed
  - Network expansion requires upgrades only to DEs, not every switch

Reachability Example
[Figure: routers R1-R5 connecting two locations, Chicago (chi) and New York (nyc), each with a data center and a front office]
- Two locations, each with a data center & front office
- All routers exchange routes over all links

Reachability Example
[Figure: the same network with a reachability matrix whose rows and columns are chi-DC, chi-FO, nyc-DC, nyc-FO]

Reachability Example
[Figure: the same network with two packet filters installed, "Drop nyc-FO -> *; Permit *" and "Drop chi-FO -> *; Permit *", alongside the reachability matrix over chi-DC, chi-FO, nyc-DC, nyc-FO]

Reachability Example
[Figure: the same network with both packet filters and a new shortcut link between the data centers]
- A new shortcut link added between data centers
- Intended for backup traffic between centers

Reachability Example
[Figure: the same network with both packet filters; traffic now uses the new shortcut link]
- Oops: the new link lets packets violate the security policy!
- Routing changed, but packet filters don't update automatically
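The failure mode on these slides can be reproduced with a toy model. The wiring is an assumption, since the slides do not spell it out: R1 serves chi-DC, R2 chi-FO, R3 nyc-DC, R4 nyc-FO, and R5 joins the two cities; each filter drops packets from one source subnet on one directed link. Packets follow the fewest-hop path, and the check replays that path against the filters. Adding the shortcut changes the path but not the filters, so chi-FO reaches nyc-DC.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Link = Tuple[str, str]
# Hypothetical placement of each subnet behind one router.
SUBNET_AT = {"chi-DC": "R1", "chi-FO": "R2", "nyc-DC": "R3", "nyc-FO": "R4"}


def shortest_path(links: Set[Link], src: str, dst: str) -> Optional[List[str]]:
    """Fewest-hop path over undirected links (neighbors visited in name order)."""
    adj: Dict[str, List[str]] = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path: List[str] = []
            node: Optional[str] = u
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for v in sorted(adj.get(u, [])):
            if v not in prev:
                prev[v] = u
                queue.append(v)
    return None


def delivered(links: Set[Link], filters: Dict[Link, Set[str]], src: str, dst: str) -> bool:
    """Does a packet from subnet src reach subnet dst along the routed path?
    filters[(a, b)] holds the source subnets dropped on the link from a to b."""
    path = shortest_path(links, SUBNET_AT[src], SUBNET_AT[dst])
    if path is None:
        return False
    return all(src not in filters.get(hop, set()) for hop in zip(path, path[1:]))


links = {("R1", "R2"), ("R3", "R4"), ("R2", "R5"), ("R4", "R5")}
filters = {("R2", "R1"): {"nyc-FO"}, ("R4", "R3"): {"chi-FO"}}
print(delivered(links, filters, "chi-FO", "nyc-DC"))  # False: blocked as intended

links.add(("R1", "R3"))  # new shortcut between the data centers
print(delivered(links, filters, "chi-FO", "nyc-DC"))  # True: policy violated
```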

Prohibiting Packets from chi-FO to nyc-DC

Reachability Example
[Figure: the same network with the original packet filters, "Drop nyc-FO -> *; Permit *" and "Drop chi-FO -> *; Permit *", plus additional filters]
- Typical response: add more packet filters to plug the holes in the security policy

Reachability Example
[Figure: the same network with packet filters "Drop nyc-FO -> *" and "Drop chi-FO -> *", and a failed link]
- Packet filters have surprising consequences
- Consider a link failure: chi-FO and nyc-FO are still connected

Reachability Example
[Figure: the same network after the link failure, with packet filters "Drop nyc-FO -> *" and "Drop chi-FO -> *"]
- The network has less survivability than the topology suggests
  - chi-FO and nyc-FO are still connected
  - But the packet filters mean no data can flow!
- Probing the network won't predict this problem
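A companion sketch for the link-failure case, using the same hypothetical wiring (R1 chi-DC, R2 chi-FO, R3 nyc-DC, R4 nyc-FO, R5 between the cities) with the shortcut in place and extra filters added on it to plug the hole. It separates two questions: is there a physical path, and does the routed path survive the filters? After the assumed failure of the R2-R5 link the answers differ, which is why probing connectivity alone would not predict the outage.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Link = Tuple[str, str]
# Hypothetical placement of each subnet behind one router.
SUBNET_AT = {"chi-DC": "R1", "chi-FO": "R2", "nyc-DC": "R3", "nyc-FO": "R4"}


def shortest_path(links: Set[Link], src: str, dst: str) -> Optional[List[str]]:
    """Fewest-hop path over undirected links (neighbors visited in name order)."""
    adj: Dict[str, List[str]] = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path: List[str] = []
            node: Optional[str] = u
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for v in sorted(adj.get(u, [])):
            if v not in prev:
                prev[v] = u
                queue.append(v)
    return None


def delivered(links: Set[Link], filters: Dict[Link, Set[str]], src: str, dst: str) -> bool:
    """Replay the routed path; filters[(a, b)] = source subnets dropped on link a->b."""
    path = shortest_path(links, SUBNET_AT[src], SUBNET_AT[dst])
    if path is None:
        return False
    return all(src not in filters.get(hop, set()) for hop in zip(path, path[1:]))


links = {("R1", "R2"), ("R3", "R4"), ("R2", "R5"), ("R4", "R5"), ("R1", "R3")}
filters = {
    ("R2", "R1"): {"nyc-FO"}, ("R4", "R3"): {"chi-FO"},  # original filters
    ("R1", "R3"): {"chi-FO"}, ("R3", "R1"): {"nyc-FO"},  # added to plug the shortcut
}
print(delivered(links, filters, "chi-FO", "nyc-DC"))  # False: hole plugged
print(delivered(links, filters, "chi-FO", "nyc-FO"))  # True

links.discard(("R2", "R5"))  # assumed link failure
print(shortest_path(links, "R2", "R4") is not None)   # True: still physically connected
print(delivered(links, filters, "chi-FO", "nyc-FO"))  # False: filter on R1->R3 drops it
```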

Allowing Packets from chi-FO to nyc-FO

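Multiple Interacting Routing Processes
[Figure: client and server networks, OSPF, BGP, and EBGP processes feeding a FIB and connecting to the Internet, with Policy1 and Policy2 on the links between processes]
- Routes flow like water through the graph, gated by policy on the links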
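The "routes flow like water" picture can be modeled as reachability in a directed graph of routing processes, where each edge carries a policy that decides whether a given prefix may pass. The process names follow the slide, but the edges and both policies below are hypothetical examples rather than the configuration behind the figure.

```python
import ipaddress
from collections import deque
from typing import Callable, Dict, List, Set, Tuple

Policy = Callable[[str], bool]  # True if the prefix may cross this edge
PRIVATE = ipaddress.ip_network("10.0.0.0/8")


def policy1(prefix: str) -> bool:
    """Hypothetical import policy: reject 10/8 routes learned from the Internet."""
    return not ipaddress.ip_network(prefix).subnet_of(PRIVATE)


def policy2(prefix: str) -> bool:
    """Hypothetical redistribution policy: only the default route enters OSPF."""
    return prefix == "0.0.0.0/0"


def allow_all(prefix: str) -> bool:
    return True


def propagate(edges: Dict[str, List[Tuple[str, Policy]]], origin: str, prefix: str) -> Set[str]:
    """Return every routing process (or FIB) that ends up holding `prefix`."""
    reached = {origin}
    queue = deque([origin])
    while queue:
        u = queue.popleft()
        for v, policy in edges.get(u, []):
            if v not in reached and policy(prefix):
                reached.add(v)
                queue.append(v)
    return reached


edges = {
    "EBGP": [("BGP", policy1)],
    "BGP": [("OSPF", policy2), ("FIB", allow_all)],
    "OSPF": [("FIB", allow_all)],
}
print(sorted(propagate(edges, "EBGP", "0.0.0.0/0")))     # ['BGP', 'EBGP', 'FIB', 'OSPF']
print(sorted(propagate(edges, "EBGP", "192.0.2.0/24")))  # ['BGP', 'EBGP', 'FIB']
print(sorted(propagate(edges, "EBGP", "10.1.0.0/16")))   # ['EBGP']
```

Where a route ends up depends on every policy along every path, which is why the global behavior is hard to predict from any single router's configuration.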

The Routing Instance Graph of an 881-Router Network

Reconvergence Time Under Single Link Failure

Reconvergence Time When Master DE Crashes

Reconvergence Time When Network Partitions


Many Implementations Possible
- Single redundant decision engine
- Multiple decision engines
  - Hot stand-by
  - Divide network & load share
- Distributed decision engines
  - Up to one per router
- Choice can be based on reliability requirements
- Dissemination plane can be in-band, or leverage out-of-band (OOB) links
- Less need for distributed solutions (harder to reason about)
- More focus on network issues, less on distributed protocols
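For the hot stand-by option, the prototype is described as using a simple election protocol to pick the master DE. The rule below (lowest-numbered DE among those heard from recently) is an assumed stand-in for illustration, not the prototype's protocol. One property worth noting: during a partition, routers on each side hear only their own side's DEs, so each side independently elects a master.

```python
from typing import Dict, Optional


def elect_master(last_heard: Dict[int, float], now: float, timeout: float) -> Optional[int]:
    """Return the lowest ID among DEs whose last announcement is at most `timeout` seconds old."""
    live = [de for de, t in last_heard.items() if now - t <= timeout]
    return min(live) if live else None


# Announcement times (seconds) as seen by one router; values are illustrative.
last_heard = {1: 10.0, 2: 10.3, 3: 10.4}
print(elect_master(last_heard, now=10.45, timeout=0.5))  # 1
print(elect_master(last_heard, now=10.6, timeout=0.5))   # 2: DE 1 has gone silent

# In a partition, a router on the far side hears only DE 3.
print(elect_master({3: 10.4}, now=10.6, timeout=0.5))    # 3
```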

Direct Expression Enables New Algorithms
- OSPF normally calculates a single path to each destination D
  - OSPF allows load-balancing only for equal-cost paths, to avoid loops
  - Using ECMP requires careful engineering of link weights
- A decision plane with a network-wide view can compute multiple paths
  - "Backup paths" installed for free!
  - Bounded stretch, bounded fan-in
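A sketch of how a decision plane with the full topology might install more than one next hop per destination, including unequal-cost ones that OSPF's ECMP would not use. The rule is an assumption for illustration, not the 4D algorithm: a neighbor qualifies if it is strictly closer to the destination (every hop makes progress, so forwarding cannot loop) and if going through it costs at most `stretch` times the router's shortest distance. That bounds the detour at each hop rather than end to end, and the fan-in bound from the slide is not modeled.

```python
import heapq
from typing import Dict, List

Graph = Dict[str, Dict[str, int]]  # router -> {neighbor: link cost}, symmetric


def distances_to(g: Graph, dest: str) -> Dict[str, int]:
    """Dijkstra rooted at dest (valid for symmetric link costs)."""
    dist = {dest: 0}
    heap = [(0, dest)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, c in g[u].items():
            if d + c < dist.get(v, float("inf")):
                dist[v] = d + c
                heapq.heappush(heap, (d + c, v))
    return dist


def next_hops(g: Graph, dest: str, stretch: float) -> Dict[str, List[str]]:
    """For each router, every neighbor that is strictly closer to dest and whose
    route costs at most stretch x the router's shortest distance."""
    dist = distances_to(g, dest)
    table: Dict[str, List[str]] = {}
    for u in g:
        if u == dest or u not in dist:
            continue
        table[u] = sorted(
            n for n, c in g[u].items()
            if n in dist and dist[n] < dist[u] and c + dist[n] <= stretch * dist[u]
        )
    return table


g = {
    "A": {"B": 1, "C": 2},
    "B": {"A": 1, "D": 2},
    "C": {"A": 2, "D": 2},
    "D": {"B": 2, "C": 2},
}
print(next_hops(g, "D", stretch=1.5))       # {'A': ['B', 'C'], 'B': ['D'], 'C': ['D']}
print(next_hops(g, "D", stretch=1.0)["A"])  # ['B']
```

With `stretch=1.5`, router A keeps the cost-4 path through C as a ready backup next to its cost-3 shortest path.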

Systems of Systems
- Systems are designed as components to be used in larger systems, in different contexts, for different purposes, interacting with different components
  - Example: OSPF and BGP are complex systems in their own right; they are also components in a network's routing system, interacting with each other, with packet filters, and with management tools ...
- Complex configuration to enable flexibility
  - The glue has tremendous impact on network performance
- State of the art: multiple interacting distributed programs written in assembly language
- Lack of an intellectual framework to understand global behavior

Supporting Network Evolution
- Logic for controlling the network needs to change over time
  - Traffic engineering rules
  - Interactions with other networks
  - Service characteristics
- Upgrades to field-deployed network equipment must be avoided
  - Very high cost
  - Software upgrades often require hardware upgrades (more CPU or memory)

Supporting Network Evolution Today
- Today's "solution"
  - Vendors stuff their routers with software implementing all possible "features"
    - Multiple routing protocols
    - Multiple signaling protocols (RSVP, CR-LDP)
  - Each feature is controlled by parameters set at configuration time to achieve late binding
- Feature creep creates a configuration nightmare
  - Tremendous complexity in syntax & semantics
  - Mis-interactions between features are common
- Our goal: separate decision-making logic from the field-deployed devices

Supporting Network Expansion
- Networks are constantly growing
  - New routers/switches/links added
  - Old equipment rarely removed
- Adding a new switch can cause old equipment to become overloaded
- CPU/memory demands on each device should not scale up with network size

Supporting Network Expansion Today
- Routers run a link-state routing protocol
  - Size of the link-state database scales with the number of routers
  - Expanding the network can exceed the memory limits of old routers
- Today's "solution"
  - Monitor resources on all routers
  - Predict the approach of exhaustion, and then either:
    - Global upgrade
    - Rearchitecture of the routing design to add summarization, route aggregation, information hiding
- Our goal: make demands scale with hardware (e.g., number of interfaces)

Supporting Remote Devices
- Maintaining communication with all network devices is critical for network management
  - Diagnosis of problems
  - Monitoring status and network health
  - Updating configuration or software
- A chicken-or-egg problem
  - Cannot send device configuration/management information until the device can communicate
  - Device cannot communicate until it is correctly configured

Supporting Remote Devices Today
- Today's "solution"
  - Use the PSTN as the management network of last resort
  - Connect the consoles of remote routers to phone modems
- Can't be used for customer premises equipment (CPE): DSL/cable modems, integrated access devices (IADs)
- In a converged network, the PSTN is decommissioned
- Our goal: preserve management communication to any device that is not physically partitioned, regardless of configuration state
