Scalable Management of Enterprise and Data Center Networks
Minlan Yu, Princeton University

Presentation transcript:

Scalable Management of Enterprise and Data Center Networks. Minlan Yu, Princeton University.

Edge Networks
– Data centers (cloud)
– Enterprise networks (corporate and campus)
– Home networks
All connected through the Internet.

Redesign Networks for Management
Management is important, yet underexplored
– Takes 80% of the IT budget
– Responsible for 62% of outages
Making management easier
– The network should be truly transparent
Goal: redesign the networks to make them easier and cheaper to manage.

Main Challenges
– Simple switches (cost, energy)
– Flexible policies (routing, security, measurement)
– Large networks (hosts, switches, apps)

Large Enterprise Networks
– Hosts (10K - 100K)
– Switches (1K - 5K)
– Applications ( K)

Large Data Center Networks
– Switches (1K - 10K)
– Servers and virtual machines (100K – 1M)
– Applications ( K)

Flexible Policies
– Customized routing, access control, measurement, diagnosis, …
– Considerations: performance, security, mobility, energy-saving, cost reduction, debugging, maintenance, …

Switch Constraints
Small, on-chip memory (expensive, power-hungry) and increasing link speed (10 Gbps and more), yet switches must store lots of state:
– Forwarding rules for many hosts/switches
– Access control and QoS for many apps/users
– Monitoring counters for specific flows

Edge Network Management
Management system: specify policies, configure devices, collect measurements.
– On switches: BUFFALO [CONEXT’09], scaling packet forwarding; DIFANE [SIGCOMM’10], scaling flexible policy
– On hosts: SNAP [NSDI’11], scaling diagnosis

Research Approach
– New algorithms & data structures: effective use of switch memory (BUFFALO, DIFANE); efficient data collection/analysis (SNAP)
– Systems prototyping: prototype on Click (BUFFALO); prototype on OpenFlow (DIFANE); prototype on Win/Linux OS (SNAP)
– Evaluation & deployment: evaluation on real topologies/traces (BUFFALO); evaluation on AT&T data (DIFANE); deployment in Microsoft (SNAP)

BUFFALO [CONEXT’09]: Scaling Packet Forwarding on Switches

Packet Forwarding in Edge Networks
Hash table in SRAM to store the forwarding table (e.g., 00:11:22:33:44:55, 00:11:22:33:44:66, aa:11:22:33:44:77, …)
– Maps MAC addresses to next hops
– Hash collisions: must overprovision to avoid running out of memory
– Performs poorly when out of memory
– Difficult and expensive to upgrade memory
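
As a point of reference, a minimal sketch (in Python) of the exact-match table the slide describes; the MAC-to-port entries and port names are made up for illustration:

```python
# Exact-match forwarding sketch: destination MAC address -> next hop,
# as a hash table kept in switch SRAM. Entries and ports are made up.

forwarding_table = {
    "00:11:22:33:44:55": "port1",
    "00:11:22:33:44:66": "port2",
    "aa:11:22:33:44:77": "port1",
}

def lookup(dst_mac):
    # A real switch must overprovision this table: once it fills up,
    # lookups miss and packets take a slow path or are flooded.
    return forwarding_table.get(dst_mac)  # None signals a table miss
```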

Bloom Filters
Bloom filters in SRAM
– A compact data structure for a set of elements: an m-bit array V[0..m-1]
– To store element x, compute s hash functions h1(x), …, hs(x) and set those bits
– Easy to check membership
– Reduces memory at the expense of false positives
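
To make the data structure concrete, here is a minimal Bloom-filter sketch in Python; the salted SHA-1 hashing and the default sizes are illustrative choices, not what a switch would implement in hardware:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: an m-bit array and s hash functions."""

    def __init__(self, m=1024, s=3):
        self.m, self.s = m, s
        self.bits = [0] * m

    def _indices(self, x):
        # Derive s bit positions h1(x), ..., hs(x) from salted digests.
        for i in range(self.s):
            digest = hashlib.sha1(f"{i}:{x}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, x):
        for idx in self._indices(x):
            self.bits[idx] = 1

    def __contains__(self, x):
        # No false negatives; false positives occur with small probability.
        return all(self.bits[idx] for idx in self._indices(x))
```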

BUFFALO: Bloom Filter Forwarding
One Bloom filter (BF) per next hop
– Stores all addresses forwarded to that next hop
– To forward a packet, query the destination address against the Bloom filters of next hops 1 … T and follow the hit
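
A hedged sketch of the per-next-hop layout described above, reusing the BloomFilter class from the previous sketch; the class and method names are invented for illustration:

```python
class BuffaloFIB:
    """One Bloom filter per next hop; a lookup returns all hits."""

    def __init__(self, next_hops, m=4096, s=3):
        self.filters = {nh: BloomFilter(m, s) for nh in next_hops}

    def add_route(self, dst_mac, next_hop):
        self.filters[next_hop].add(dst_mac)

    def candidates(self, dst_mac):
        # Exactly one hit in the common case; more than one hit means
        # at least one false positive (handled on the next slides).
        return [nh for nh, bf in self.filters.items() if dst_mac in bf]

fib = BuffaloFIB(["port1", "port2", "port3"])
fib.add_route("00:11:22:33:44:55", "port1")
print(fib.candidates("00:11:22:33:44:55"))  # usually just ['port1']
```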

Comparing with Hash Table
– Saves 65% memory with 0.1% false positives
– More benefits over a hash table: performance degrades gracefully as tables grow; handles worst-case workloads well

False Positive Detection
A destination query may hit multiple Bloom filters (multiple matching next hops)
– One of the matches is correct
– The others are caused by false positives

Handle False Positives
Design goals
– Should not modify the packet
– Never go to slow memory
– Ensure timely packet delivery
When a packet has multiple matches
– Exclude the incoming interface: avoids loops in the “one false positive” case
– Random selection from the matching next hops: guarantees reachability with multiple false positives
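
The selection logic on this slide can be sketched in a few lines; this is an illustrative rendering of the stated rules (exclude the incoming interface, otherwise pick randomly), not the paper's exact code:

```python
import random

def choose_next_hop(candidates, in_port):
    # Stay in fast memory and never modify the packet: pick only from
    # the Bloom-filter hits returned by the FIB lookup.
    if len(candidates) == 1:
        return candidates[0]
    # Multiple hits: excluding the incoming interface avoids a loop when
    # there is a single false positive; a random choice among the rest
    # still guarantees reachability when there are several.
    options = [nh for nh in candidates if nh != in_port]
    return random.choice(options or candidates)
```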

One False Positive
Most common case: one false positive
– When there are multiple matching next hops
– Avoid sending to the incoming interface
Provably at most a two-hop loop
– Stretch <= Latency(A→B) + Latency(B→A)

Stretch Bound
Provable expected stretch bound
– With k false positives, the expected stretch is provably bounded
– Proved using random-walk theory
In practice the stretch is not bad
– False positives are independent
– The probability of k false positives drops exponentially with k
Tighter bounds hold in special topologies, e.g., trees (for k > 1)

BUFFALO Switch Architecture

Prototype Evaluation
Environment
– Prototype implemented in kernel-level Click
– 3.0 GHz 64-bit Intel Xeon
– 2 MB L2 data cache, used as the SRAM of size M
Forwarding table
– 10 next hops, 200K entries
Peak forwarding rate
– 365 Kpps, 1.9 μs per packet
– 10% faster than hash-based EtherSwitch

BUFFALO Conclusion
Indirection for scalability
– Send false-positive packets to a random port
– Stretch increases gracefully as the forwarding table grows
Bloom filter forwarding architecture
– Small, bounded memory requirement
– One Bloom filter per next hop
– Optimization of Bloom filter sizes
– Dynamic updates using counting Bloom filters
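
Since the conclusion mentions counting Bloom filters for dynamic updates, here is a minimal, self-contained sketch of that variant (again with an illustrative hash choice); a counter structure like this can live outside the scarce SRAM, with the plain bit-vector filters rebuilt from it:

```python
import hashlib

class CountingBloomFilter:
    """Counters instead of bits, so addresses can be removed as well as
    added; plain Bloom filters can be rebuilt from the counter array."""

    def __init__(self, m=1024, s=3):
        self.m, self.s = m, s
        self.counts = [0] * m

    def _indices(self, x):
        for i in range(self.s):
            digest = hashlib.sha1(f"{i}:{x}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, x):
        for idx in self._indices(x):
            self.counts[idx] += 1

    def remove(self, x):
        for idx in self._indices(x):
            if self.counts[idx] > 0:
                self.counts[idx] -= 1

    def __contains__(self, x):
        return all(self.counts[idx] > 0 for idx in self._indices(x))
```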

DIFANE [SIGCOMM’10]: Scaling Flexible Policies on Switches

Traditional Network
– Data plane: limited policies
– Control plane: hard to manage
– Management plane: offline, sometimes manual
New trends: flow-based switches & logically centralized control

Data Plane: Flow-based Switches
Perform simple actions based on rules
– Rules: match on bits in the packet header (flow space over source X and destination Y)
– Actions: drop, forward, count
– Rules are stored in high-speed memory (TCAM, Ternary Content Addressable Memory)
Example rules:
1. X:* Y:1 → drop
2. X:5 Y:3 → drop
3. X:1 Y:* → count
4. X:* Y:* → forward
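
A software model of the example flow table above (Python, purely illustrative; a TCAM evaluates all rules in parallel rather than scanning):

```python
# Each rule matches wildcarded source (X) and destination (Y) fields and
# carries an action, mirroring the four example rules on the slide.
RULES = [  # (priority, X pattern, Y pattern, action)
    (4, "*", "1", "drop"),
    (3, "5", "3", "drop"),
    (2, "1", "*", "count"),
    (1, "*", "*", "forward"),
]

def matches(pattern, value):
    return pattern == "*" or pattern == value

def apply_rules(x, y):
    # A TCAM compares against all rules in parallel and reports the
    # highest-priority hit; this model simply scans in priority order.
    for _prio, xp, yp, action in sorted(RULES, reverse=True):
        if matches(xp, x) and matches(yp, y):
            return action

print(apply_rules("1", "1"))  # -> drop  (rule 1 outranks rule 3)
print(apply_rules("1", "2"))  # -> count (rule 3)
```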

Control Plane: Logically Centralized
RCP [NSDI’05], 4D [CCR’05], Ethane [SIGCOMM’07], NOX [CCR’08], Onix [OSDI’10], software-defined networking
DIFANE: a scalable way to apply fine-grained policies

Pre-install Rules in Switches
The controller pre-installs the rules; packets hit the rules and are forwarded.
Problems: limited TCAM space in switches
– No host mobility support
– Switches do not have enough memory

Install Rules on Demand (Ethane)
The first packet misses the rules; the switch buffers it and sends the packet header to the controller, which installs rules; then the packet is forwarded.
Problems: limited resources in the controller
– Delay of going through the controller
– Switch complexity
– Misbehaving hosts

Design Goals of DIFANE
Scale with network growth
– Limited TCAM at switches
– Limited resources at the controller
Improve per-packet performance
– Always keep packets in the data plane
Minimal modifications in switches
– No changes to data-plane hardware
Approach: combine proactive and reactive approaches for better scalability.

DIFANE: Doing it Fast and Easy (two stages)

Stage 1: The controller proactively generates the rules and distributes them to authority switches.

Partition and Distribute the Flow Rules
The controller partitions the flow space (e.g., accept/reject regions) among authority switches A, B, and C, and distributes the partition information to the ingress and egress switches.

Stage 2: The authority switches keep packets always in the data plane and reactively cache rules.

Packet Redirection and Rule Caching
The first packet is redirected from the ingress switch to the authority switch, which forwards it to the egress switch and sends feedback so the ingress switch caches the rules; following packets hit the cached rules and are forwarded directly.
A slightly longer path in the data plane is faster than going through the control plane.

Locate Authority Switches
Partition information in ingress switches
– A small set of coarse-grained wildcard rules to locate the authority switch for each packet, e.g., X:0-1 Y:0-3 → A; X:2-5 Y:0-1 → B; X:2-5 Y:2-3 → C
– A distributed directory service of rules: hashing does not work for wildcards
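
The partition lookup amounts to a handful of range rules; a sketch using the example regions from this slide (switch names and the helper function are illustrative):

```python
# Partition rules: each maps a rectangle of flow space (inclusive X and
# Y ranges) to the authority switch that stores the rules for it.
PARTITIONS = [
    ((0, 1), (0, 3), "Authority Switch A"),
    ((2, 5), (0, 1), "Authority Switch B"),
    ((2, 5), (2, 3), "Authority Switch C"),
]

def authority_for(x, y):
    # Wildcard/range rules cannot be located by hashing a flow key,
    # which is why each switch keeps this small directory instead.
    for (x_lo, x_hi), (y_lo, y_hi), switch in PARTITIONS:
        if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
            return switch
    return None

print(authority_for(3, 2))  # -> Authority Switch C
```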

Three Sets of Rules in TCAM
Each rule has a type, priority, match fields, action, and timeout:
– Cache rules (in ingress switches, reactively installed by authority switches): e.g., match 00** / 111* → forward to Switch B, 10-second timeout; or drop, 10-second timeout
– Authority rules (in authority switches, proactively installed by the controller): e.g., match 00** / 001* → forward and trigger cache manager, no timeout; or drop and trigger cache manager, no timeout
– Partition rules (in every switch, proactively installed by the controller): e.g., match 0*** / 000* → redirect to authority switch, no timeout

DIFANE Switch Prototype
Built with the OpenFlow switch
– Data plane holds the cache rules, authority rules, and partition rules
– Control plane runs a cache manager (only in authority switches) that receives notifications and sends cache updates to install cache rules
– Just a software modification for authority switches

Caching Wildcard Rules
Overlapping wildcard rules (priority: R1 > R2 > R3 > R4 over the source/destination flow space)
– Cannot simply cache matching rules
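
A small illustration of why caching the matching rule verbatim breaks correctness; the rule fields and actions below are hypothetical, only the priority ordering (R1 above R3) follows the slide:

```python
# R1 (high priority) and R3 (low priority) overlap; fields and actions
# are hypothetical, only the priority ordering follows the slide.
R1 = {"prio": 4, "src": "*", "dst": "1", "action": "drop"}
R3 = {"prio": 2, "src": "1", "dst": "*", "action": "forward"}

def matches(rule, src, dst):
    return all(rule[f] in ("*", v) for f, v in (("src", src), ("dst", dst)))

def lookup(rules, src, dst):
    hits = [r for r in rules if matches(r, src, dst)]
    return max(hits, key=lambda r: r["prio"])["action"] if hits else None

# Packet (src=1, dst=2) correctly hits R3 at the authority switch.
assert lookup([R1, R3], "1", "2") == "forward"
# If the ingress switch caches R3 verbatim, a later packet (src=1, dst=1)
# hits the cached rule and is forwarded, although R1 says drop:
assert lookup([R1, R3], "1", "1") == "drop"      # full rule set
assert lookup([R3], "1", "1") == "forward"       # naive cache: wrong
# DIFANE therefore caches narrower rules covering only the region where
# R3 really is the highest-priority match.
```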

Caching Wildcard Rules
Multiple authority switches
– Contain independent sets of rules
– Avoid cache conflicts in the ingress switch

Partition Wildcard Rules
Partition rules
– Minimize the TCAM entries in switches
– Decision-tree-based rule partition algorithm (e.g., Cut B is better than Cut A)
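
A toy illustration of the cost metric behind choosing cuts: rules that straddle a cut must appear in both partitions, so a cut that splits fewer rules needs fewer TCAM entries overall. The 1-D ranges and the two candidate cuts are invented for the example:

```python
# Rules are hypothetical 1-D source ranges (lo, hi), inclusive. A rule
# that straddles the cut must be installed on both sides of it.
RULES = [(0, 3), (4, 7), (2, 5), (3, 6)]

def tcam_entries(rules, cut):
    """Total entries after cutting the space into [.. cut] and (cut ..]."""
    left = sum(1 for lo, _ in rules if lo <= cut)
    right = sum(1 for _, hi in rules if hi > cut)
    return left + right  # straddling rules are counted twice

# A cut after 1 splits only (0, 3); a cut after 3 splits (2, 5) and
# (3, 6), so the first cut needs fewer TCAM entries overall.
print(tcam_entries(RULES, 1), tcam_entries(RULES, 3))  # -> 5 6
```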

Testbed for Throughput Comparison
Testbed with around 40 computers: traffic generators, ingress switches, an authority switch, and a controller, comparing Ethane (ingress switches send first packets to the controller) with DIFANE (ingress switches redirect to the authority switch).

Peak Throughput (one authority switch; first packet of each flow)
– Ethane: limited by the controller bottleneck (50K) and the ingress-switch bottleneck (20K)
– DIFANE: 800K
DIFANE is self-scaling: higher throughput with more authority switches.

Scaling with Many Rules
Analyze rules from campus and AT&T networks
– Collect configuration data on switches
– Retrieve network-wide rules
– E.g., 5M rules, 3K switches in an IPTV network
Distribute rules among authority switches
– Only need 0.3% - 3% of switches as authority switches
– Depending on network size, TCAM size, and the number of rules

Summary: DIFANE in the Sweet Spot
Between logically centralized and distributed:
– Traditional networks (distributed): hard to manage
– OpenFlow/Ethane (logically centralized): not scalable
– DIFANE: scalable management; the controller is still in charge, and switches host a distributed directory of the rules

SNAP [NSDI’11]: Scaling Performance Diagnosis for Data Centers (Scalable Net-App Profiler)

Applications inside Data Centers
– Front-end servers, aggregators, workers, …

Challenges of Datacenter Diagnosis
Large, complex applications
– Hundreds of application components
– Tens of thousands of servers
New performance problems
– Updating code to add features or fix bugs
– Changing components while the app is still in operation
Old performance problems (human factors)
– Developers may not understand the network well
– Nagle’s algorithm, delayed ACK, etc.

Diagnosis in Today’s Data Center
– App logs (#reqs/sec, response time, e.g., 1% of requests see >200 ms delay): application-specific
– Packet sniffers / packet traces (filtered for long-delay requests): too expensive
– Switch logs (#bytes/pkts per minute): too coarse-grained
SNAP diagnoses net-app interactions: generic, fine-grained, and lightweight.

SNAP: A Scalable Net-App Profiler that runs everywhere, all the time

SNAP Architecture
At each host, for every connection (online, lightweight processing & diagnosis):
– Collect data: adaptively poll per-socket statistics in the OS (snapshots such as #bytes in the send buffer; cumulative counters such as #FastRetrans)
– Performance classifier: classify based on the stages of data transfer (sender app → send buffer → network → receiver)
At the management system (offline, cross-connection diagnosis):
– Cross-connection correlation using topology/routing and connection-to-process/app mappings to find the offending app, host, link, or switch
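
A hedged sketch of the classification step, with hypothetical per-connection statistic names standing in for the per-socket counters SNAP polls from the OS; the thresholds and ordering are illustrative, not the paper's exact rules:

```python
def classify(stats):
    """Attribute a connection's limitation to one stage of the transfer
    pipeline: sender app -> send buffer -> network -> receiver."""
    if stats["send_buffer_bytes"] == 0:
        return "sender app (no data to send)"
    if stats["send_buffer_bytes"] >= stats["send_buffer_limit"]:
        return "send buffer (not large enough)"
    if stats["fast_retrans"] > 0 or stats["timeouts"] > 0:
        return "network (fast retransmission / timeout)"
    if stats["rwin_limited_secs"] > 0.5 * stats["poll_interval_secs"]:
        return "receiver (not reading fast enough)"
    if stats["delayed_ack_stalls"] > 0:
        return "receiver (not ACKing fast enough, delayed ACK)"
    return "not limited"

sample = {"send_buffer_bytes": 4096, "send_buffer_limit": 65536,
          "fast_retrans": 0, "timeouts": 2, "rwin_limited_secs": 0.0,
          "poll_interval_secs": 1.0, "delayed_ack_stalls": 0}
print(classify(sample))  # -> network (fast retransmission / timeout)
```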

SNAP in the Real World
Deployed in a production data center
– 8K machines, 700 applications
– Ran SNAP for a week, collected terabytes of data
Diagnosis results
– Identified 15 major performance problems
– 21% of applications have network performance problems

Characterizing Performance Limitations
Number of apps limited for >50% of the time:
– Send buffer (1 app): send buffer not large enough
– Network: fast retransmission (6 apps), timeout (8 apps)
– Receiver (144 apps): not reading fast enough (CPU, disk, etc.); not ACKing fast enough (delayed ACK)

Delayed ACK Problem
Delayed ACK affected many delay-sensitive apps
– Even #pkts per record → 1,000 records/sec; odd #pkts per record → 5 records/sec
– Delayed ACK was used to reduce bandwidth usage and server interrupts: the receiver ACKs every other packet and waits up to 200 ms before ACKing a lone packet
Proposed solution: delayed ACK should be disabled in data centers.
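
The slide's throughput numbers follow from simple arithmetic once the delayed-ACK timer stalls every record; a back-of-the-envelope sketch (the 1 ms fast-path latency is an assumed value chosen to match the slide's 1,000 records/sec):

```python
# Assumed model: the sender does not start the next record until the
# previous record is fully ACKed, and the receiver ACKs every other
# packet, waiting up to the delayed-ACK timer for a lone last packet.
DELAYED_ACK_TIMER = 0.200     # seconds (a typical default, assumed)
FAST_PATH_PER_RECORD = 0.001  # ~1 ms, assumed to match 1,000 records/sec

even_rate = 1 / FAST_PATH_PER_RECORD                       # even #pkts
odd_rate = 1 / (FAST_PATH_PER_RECORD + DELAYED_ACK_TIMER)  # odd #pkts
print(round(even_rate), round(odd_rate))  # -> 1000 5
```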

Diagnosing Delayed ACK with SNAP
Monitor at the right place
– Scalable, lightweight data collection at all hosts
Algorithms to identify performance problems
– Identify delayed ACK with OS information
Correlate problems across connections
– Identify the apps with significant delayed ACK issues
Fix the problem with operators and developers
– Disable delayed ACK in data centers

Edge Network Management
Management system: specify policies, configure devices, collect measurements.
– On switches: BUFFALO [CONEXT’09], scaling packet forwarding; DIFANE [SIGCOMM’10], scaling flexible policy
– On hosts: SNAP [NSDI’11], scaling diagnosis

Thanks!