SDN Controller Challenges
The Story Thus Far SDN: centralize the network's control plane – The controller is effectively the brain of the network – The controller determines what to do and tells the switches how to do it.
The Story Thus Far
The Story Thus Far Let's Ask the Brain!
The Story Thus Far Think about what happens… Maybe come up with a solution
The Story Thus Far The controller runs a control function – The control function computes switch state: switch state = F(global network state) – The global network state can be a graph of the network – The controller then tells the network what to do (sketch below)
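To make the F(global network state) idea concrete, here is a minimal Python sketch. Representing the state as a networkx graph, the shortest-path policy, and the rule format are illustrative assumptions, not any particular controller's API.

```python
# Minimal sketch: the control function maps a global view of the
# network (a graph) to per-switch forwarding state.
import networkx as nx  # assumption: topology represented with networkx

def control_function(topology: nx.Graph, flows):
    """F(global network state) -> switch state.

    `flows` is a list of (src, dst) pairs; the output maps each switch
    to the forwarding rules it should install.
    """
    switch_state = {sw: [] for sw in topology.nodes}
    for src, dst in flows:
        path = nx.shortest_path(topology, src, dst)
        # Install a rule on every switch along the path pointing to the
        # next hop (rule format is purely illustrative).
        for hop, next_hop in zip(path[:-1], path[1:]):
            switch_state[hop].append({"match_dst": dst, "out_port_to": next_hop})
    return switch_state

# Example: a 3-node line topology s1 - s2 - s3 with one flow.
g = nx.Graph([("s1", "s2"), ("s2", "s3")])
print(control_function(g, [("s1", "s3")]))
```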
Challenges with Centralization Single point of failure – Fault tolerance Performance bottleneck – Scalability – Efficiency (switch-controller latency) Single point for security violations
Motivation for Distributed Controllers Wide-area networks – Wide distribution of switches: from the USA to Australia – High latency between one controller and all switches Application + network growth – Higher CPU load on the controller – More memory for storing FIB entries and calculations High availability
Class Outline Fault Tolerance – Google’s B4 paper Controller Scalability – Ways to scale the controller – Distributed controllers: Mesh Versus Hierarchy – Implications of controller placement
Google’s B4 Network Provides connectivity between DC sites Uses SDN to control edge switches Goal: high utilization of links Insight: fine-grained control over edge and network can lead to higher utilization Distributed Controllers – One set of controllers for each Data center (site)
Fault Tolerance in B4 Each site runs a set of controllers Paxos is run among the controllers within a site to elect a master
Quick Overview of Paxos Given N controllers – 1 acts as the leader, the other N-1 as workers – All N controllers maintain the same state Switches interact with the leader A change does not take effect until the whole group agrees On failure of the leader, the remaining N-1 controllers work together to elect a new leader (sketch below) [Diagram: network events arrive at the leader and state changes propagate to the group]
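To make "the change doesn't happen until the group agrees" concrete, here is a highly simplified majority-quorum sketch in Python. It is not a real Paxos implementation (no ballots, no log, no recovery protocol); the class names and string-valued state are purely illustrative.

```python
# Highly simplified sketch of the core idea used here: a state change is
# committed only after a majority of the controller group acknowledges it.

class Replica:
    def __init__(self, name):
        self.name = name
        self.state = {}       # replicated controller state
        self.alive = True

    def acknowledge(self, key, value):
        return self.alive     # a live replica promises to accept the change

    def apply(self, key, value):
        if self.alive:
            self.state[key] = value

class Leader:
    def __init__(self, replicas):
        self.replicas = replicas

    def propose(self, key, value):
        acks = sum(r.acknowledge(key, value) for r in self.replicas)
        if acks <= len(self.replicas) // 2:
            return False                  # no majority: the change does not happen
        for r in self.replicas:
            r.apply(key, value)           # majority agreed: propagate the change
        return True

group = [Replica(f"ctrl{i}") for i in range(3)]
leader = Leader(group)
print(leader.propose("s7/flow_table", "via port 3"))   # True: all 3 acknowledge
group[1].alive = group[2].alive = False
print(leader.propose("s7/flow_table", "via port 5"))   # False: only 1 of 3 alive
```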
Pros-Cons of Paxos Pros – Well understood and studied; gives good fault tolerance – Many implementations in the wild, e.g., ZooKeeper Cons – Time to recover after a failure – Impacts throughput of the entire system
What limits a controller's scalability? Number of control messages from switches – Depends on the application logic E.g., MicroTE/Hedera periodically query all switches for stats A reactive controller, as evaluated on NOX, requires each switch to send a message to the controller for every new flow – Packet-in (for reactive apps) – Flow stats, flow timeouts (sketch below)
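The toy loop below shows where that reactive load comes from: every flow whose first packet misses the switch's flow table turns into a packet-in the controller must handle. The Switch and Controller classes are illustrative stand-ins, not the NOX or OpenFlow API.

```python
# Illustrative reactive controller loop (not a real OpenFlow library):
# each table miss costs one round trip to the controller.

class Switch:
    def __init__(self, name):
        self.name = name
        self.flow_table = {}            # match -> action

    def packet_arrives(self, flow, controller):
        if flow in self.flow_table:
            return self.flow_table[flow]          # fast path: no controller work
        # Table miss: send a packet-in; the controller decides and installs
        # a flow entry (this is the per-flow overhead).
        action = controller.handle_packet_in(self.name, flow)
        self.flow_table[flow] = action
        return action

class Controller:
    def __init__(self):
        self.packet_in_count = 0

    def handle_packet_in(self, switch_name, flow):
        self.packet_in_count += 1
        return f"forward({flow[1]})"    # trivial 'routing' decision

ctrl = Controller()
sw = Switch("s1")
for i in range(1000):
    sw.packet_arrives((f"h{i}", "server"), ctrl)   # 1000 new flows
print(ctrl.packet_in_count)   # 1000 packet-ins: load grows with flow arrivals
```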
What limits a controller's scalability? Application processing overhead The controller runs a set of applications – Similar to a server running a set of programs – CPU/memory constraints limit how the apps run
What limits a controller's scalability? Distance between the controller and the switches [Diagram: a single controller running Hedera, L3, and FW apps, far from some of its switches]
How to Scale the Controller Obvious answer: add more controllers. But what about the applications? – Synchronization/concurrency problems Who controls which switch? Who reacts to which events? [Diagram: Controllers 1..N, each running Hedera, L3, and FW, collecting stats and installing OF entries]
Medium Sized Networks Assumption: – A controller can't store all forwarding table entries in memory – But it can process all events and run all apps Each controller – Gets the same network events + runs the same apps, so it computes the same output – But stores the output for only a fraction of the switches and configures only that fraction (sketch below) [Diagram: Controllers 1..N, each running Hedera, L3, and FW, collecting stats and installing OF entries]
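A minimal sketch of that replicate-the-computation, partition-the-installation idea; the modulo switch-to-controller assignment and the toy control function are illustrative assumptions.

```python
# Every controller computes the full output from the same global events,
# but each one installs (and stores) entries only for the switches it owns.

def partition_switches(switches, num_controllers):
    """Illustrative static partition: switch i belongs to controller i % N."""
    return {c: [sw for i, sw in enumerate(switches) if i % num_controllers == c]
            for c in range(num_controllers)}

def run_controller(ctrl_id, owned_switches, global_events, control_function):
    full_output = control_function(global_events)        # same on every controller
    my_rules = {sw: rules for sw, rules in full_output.items()
                if sw in owned_switches}                  # keep only my fraction
    return my_rules                                       # only these get installed

# Example with a toy control function.
switches = ["s1", "s2", "s3", "s4"]
toy_cf = lambda events: {sw: [f"rule-for-{sw}"] for sw in switches}
ownership = partition_switches(switches, num_controllers=2)
print(run_controller(0, ownership[0], [], toy_cf))   # {'s1': [...], 's3': [...]}
print(run_controller(1, ownership[1], [], toy_cf))   # {'s2': [...], 's4': [...]}
```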
Medium Sized Networks: HyperFlow Each controller – Pushes the events it observes to every other controller over a publish-subscribe system – Each controller thinks it's the only one in the network (sketch below) [Diagram: Controllers 1..N, each running Hedera, L3, and FW, collecting stats and installing OF entries]
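The sketch below illustrates the publish-subscribe pattern HyperFlow relies on: each controller publishes the events it observes locally, and every controller replays all published events so their network views converge. The channel and event format are illustrative, not HyperFlow's actual implementation.

```python
# Illustrative publish-subscribe event replication between controllers.

class Channel:
    """Toy in-process pub-sub channel standing in for a real messaging system."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, event):
        for cb in self.subscribers:
            cb(event)

class HyperFlowLikeController:
    def __init__(self, name, channel):
        self.name = name
        self.network_view = []            # events applied so far
        self.channel = channel
        channel.subscribe(self.on_remote_event)

    def on_local_event(self, event):
        """An event observed from one of this controller's own switches."""
        self.channel.publish(event)       # replicate it to everyone (including self)

    def on_remote_event(self, event):
        self.network_view.append(event)   # replaying events keeps views identical

bus = Channel()
c1, c2 = HyperFlowLikeController("c1", bus), HyperFlowLikeController("c2", bus)
c1.on_local_event("link s1-s2 down")
c2.on_local_event("new host on s5 port 3")
print(c1.network_view == c2.network_view)   # True: both see the same global view
```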
Large Sized Networks Assumptions – A controller can't store all the FIB entries – A controller can't run the entire application or handle all events Need to partition the application – But how?
Application partition 1 Approach 1: each controller runs a specific application – How do you resolve conflicts in the installed entries? – Apps can conflict in the rules they install (sketch below) [Diagram: Controller 1 runs Hedera, Controller 2 runs L3, Controller N runs the FW]
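A small illustration of the conflict problem: two apps on different controllers can emit rules with the same match but different actions, and something has to detect or arbitrate that. The rule format and the apps' outputs are assumptions for illustration.

```python
# Two applications independently produce rules; with the same match and
# different actions, installing both is ambiguous at the switch.

def find_conflicts(rules_a, rules_b):
    """Return matches for which the two rule sets disagree on the action."""
    conflicts = []
    for match, action in rules_a.items():
        if match in rules_b and rules_b[match] != action:
            conflicts.append((match, action, rules_b[match]))
    return conflicts

# Hypothetical outputs from two apps running on different controllers.
l3_routing_rules = {("10.0.0.5", "any"): "forward port 2"}
firewall_rules   = {("10.0.0.5", "any"): "drop"}

for match, a, b in find_conflicts(l3_routing_rules, firewall_rules):
    # One option is a fixed app priority (e.g., the firewall wins); the slide's
    # point is that *some* coordination mechanism is required.
    print(f"conflict on {match}: {a!r} vs {b!r}")
```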
Application partition 2 Approach 2: all controllers run the same applications, but each for a subset of devices – Results in a distributed mesh control plane [Diagram: Controllers 1..N, each running Hedera, L3, and FW, exchanging abstract network views]
Application Partition 2 Controllers exchange abstract views with each other – The abstract view reduces the amount of network information each controller has to work with (sketch below) [Diagram: Controller 2's view of the network = the real topology of its own domain plus the abstractions provided by Controller 1 and Controller N]
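One simple way a domain could be abstracted is sketched below: collapse all of a controller's internal switches into a single logical node that exposes only its border links. This is an illustrative reduction, not Onix's actual abstraction.

```python
# Illustrative domain abstraction: a controller summarizes its domain as one
# logical node, exposing only the links that cross the domain boundary.
import networkx as nx  # assumption: topology represented with networkx

def abstract_domain(topology: nx.Graph, my_switches, domain_name):
    """Collapse `my_switches` into one node; keep only border links."""
    view = nx.Graph()
    view.add_node(domain_name)
    for u, v in topology.edges:
        inside_u, inside_v = u in my_switches, v in my_switches
        if inside_u and inside_v:
            continue                                # internal detail: hidden
        if inside_u or inside_v:
            outside = v if inside_u else u
            view.add_edge(domain_name, outside)     # border link survives
    return view

full = nx.Graph([("a1", "a2"), ("a2", "a3"), ("a3", "b1"), ("a1", "c1")])
summary = abstract_domain(full, {"a1", "a2", "a3"}, "domainA")
print(list(summary.edges))   # [('domainA', 'b1'), ('domainA', 'c1')]
```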
Onix to the SDN Programmer Controllers synchronize through a DB or DHT – So each app needs synchronization code – How do you deal with concurrency? How do you synchronize between domains? How many domains? How many controllers? How many switches in a domain? (sketch below)
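To show what "synchronization code in each app" can look like, here is a toy key-value store with version-checked writes (optimistic concurrency), standing in for the shared DB/DHT. It is not Onix's NIB API; the retry loop is the kind of code each app would need.

```python
# Toy shared store with versioned, compare-and-swap style writes. Apps that
# share state through such a store must handle write conflicts themselves.

class SharedStore:
    def __init__(self):
        self.data = {}        # key -> (version, value)

    def read(self, key):
        return self.data.get(key, (0, None))

    def write(self, key, expected_version, value):
        version, _ = self.data.get(key, (0, None))
        if version != expected_version:
            return False                      # someone else wrote first
        self.data[key] = (version + 1, value)
        return True

store = SharedStore()

def update_link_load(store, link, delta):
    """App-side retry loop: the concurrency handling lives in the app."""
    while True:
        version, load = store.read(link)
        if store.write(link, version, (load or 0) + delta):
            return

update_link_load(store, "s1-s2", 10)   # e.g., controller A's traffic estimate
update_link_load(store, "s1-s2", 5)    # e.g., controller B's traffic estimate
print(store.read("s1-s2"))             # (2, 15)
```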
Application partition 3 Approach 3: divide applications into local and global – Results in a hierarchical control plane A global controller plus local controllers – Applications that do not need network-wide state can run locally without communicating with other controllers
Are Hierarchical Controllers Feasible? Examples of local applications: – Link discovery, learning switch, local policies Examples of local portions of a global algorithm: – Data center traffic engineering Elephant flow detection (Hedera) Predictability detection (MicroTE) Local apps/controllers have other benefits – High parallelism – Can run closer to the devices
Kandoo: Hierarchical controllers Two levels of controllers: global and local – Local applications are embarrassingly parallel – The local level shields the global controller from network events [Diagram: local Controllers 1..N beneath a Global Controller running Hedera]
Kandoo: Hierarchical controllers Local controllers: run the local apps – Return an abstract view to the global controller – Reduce the number of events sent to the global controller and the size of the network it sees [Diagram: local Controllers 1..N beneath a Global Controller running Hedera]
Kandoo: Hierarchical controllers Global controller – Runs the global apps, i.e., apps that need network-wide state [Diagram: local Controllers 1..N beneath a Global Controller running Hedera]
Hedera Reminder Goal: reduce network contention Insight: contention happens when elephant flows share paths Solution: – Detect elephant flows – Place elephant flows on different paths (sketch below)
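A toy version of the two Hedera steps, under illustrative assumptions: a flow counts as an elephant once its measured rate exceeds a fixed threshold, and placement greedily picks the least-loaded of a flow's candidate paths. Hedera's real demand estimation and placement algorithms are more sophisticated.

```python
# Toy Hedera-style pipeline: detect elephant flows, then spread them
# across candidate paths by picking the least-loaded path for each.

ELEPHANT_THRESHOLD_MBPS = 100   # illustrative cutoff

def detect_elephants(flow_rates_mbps):
    """Step 1: flows whose measured rate exceeds the threshold."""
    return [f for f, rate in flow_rates_mbps.items() if rate > ELEPHANT_THRESHOLD_MBPS]

def place_elephants(elephants, candidate_paths, flow_rates_mbps):
    """Step 2: greedily put each elephant on its least-loaded candidate path."""
    link_load = {}
    placement = {}
    for flow in elephants:
        def path_load(path):
            return sum(link_load.get(link, 0) for link in path)
        best = min(candidate_paths[flow], key=path_load)
        placement[flow] = best
        for link in best:
            link_load[link] = link_load.get(link, 0) + flow_rates_mbps[flow]
    return placement

rates = {"f1": 400, "f2": 350, "f3": 5}      # f3 is a mouse flow
paths = {"f1": [["s1-a1", "a1-s3"], ["s1-a2", "a2-s3"]],
         "f2": [["s1-a1", "a1-s3"], ["s1-a2", "a2-s3"]]}
elephants = detect_elephants(rates)
print(place_elephants(elephants, paths, rates))  # f1 and f2 land on different paths
```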
Implementing Hedera in Onix Each controller runs the full Hedera app (detection + placement) for its own switches – Collects stats from and installs flow table entries on its own switches – Controllers exchange their traffic matrices + detection results with each other [Diagram: Controllers 1 and 2, each running Hedera detection + placement, exchanging TM + detection while pulling stats and installing flow table entries]
Implementing Hedera in Kandoo Local controllers: collect stats from their switches + run elephant detection Global controller: decides flow placement + installs the new flow table entries (sketch below) [Diagram: local Controllers 1..N running elephant detection inform the Global Controller (Hedera placement) of elephant flows; stats come up from the switches and new flow table entries go back down]
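The sketch below mirrors that division of labor: local controllers forward only elephant-flow notifications upward, so the global controller never sees raw per-flow stats from mice. The class names, threshold, and message format are illustrative, not Kandoo's API.

```python
# Illustrative Kandoo-style split: local controllers detect elephant flows;
# only those notifications reach the global controller, which places them.

class GlobalController:
    def __init__(self):
        self.link_load = {}    # link -> Mbps already placed on it

    def on_elephant(self, flow, rate_mbps, candidate_paths):
        # Pick the least-loaded candidate path and "install" it.
        best = min(candidate_paths,
                   key=lambda path: sum(self.link_load.get(l, 0) for l in path))
        for link in best:
            self.link_load[link] = self.link_load.get(link, 0) + rate_mbps
        print(f"install {flow} on {best}")        # stand-in for pushing OF entries

class LocalController:
    ELEPHANT_THRESHOLD_MBPS = 100                 # illustrative cutoff

    def __init__(self, name, global_ctrl):
        self.name = name
        self.global_ctrl = global_ctrl

    def on_stats(self, flow_rates_mbps, candidate_paths):
        # Runs on every stats report; mice never leave the local controller.
        for flow, rate in flow_rates_mbps.items():
            if rate > self.ELEPHANT_THRESHOLD_MBPS:
                self.global_ctrl.on_elephant(flow, rate, candidate_paths[flow])

g = GlobalController()
site1 = LocalController("site1", g)
site1.on_stats({"f1": 400, "f3": 5},              # f3 is a mouse: stays local
               {"f1": [["s1-a1", "a1-s3"], ["s1-a2", "a2-s3"]],
                "f3": [["s1-a1", "a1-s3"]]})
```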
Implementing B4 in a Kandoo-like architecture Site controllers: collect stats from their networks + determine demands Global controller: calculates paths and allocates bandwidth for the traffic (sketch below) [Diagram: Site Controllers 1..N inform the Global Controller (TE + BW allocator, backed by a TE DB) of flow demands; TE ops and OF entries are installed back down]
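As a stand-in for the global allocator's job, here is a toy progressive-filling (max-min fair) allocation of one link's capacity among the demands reported by site controllers. B4's actual TE algorithm, with priority classes, tunnel groups, and multiple paths, is considerably richer.

```python
# Toy max-min fair allocation of one link's capacity among reported demands.
# This only illustrates the kind of decision the global allocator makes.

def max_min_allocate(demands_mbps, capacity_mbps):
    remaining = dict(demands_mbps)
    allocation = {f: 0.0 for f in demands_mbps}
    capacity = float(capacity_mbps)
    while remaining and capacity > 1e-9:
        fair_share = capacity / len(remaining)
        for flow in list(remaining):
            grant = min(fair_share, remaining[flow])
            allocation[flow] += grant
            remaining[flow] -= grant
            capacity -= grant
            if remaining[flow] <= 1e-9:
                del remaining[flow]      # satisfied flows free up capacity
    return allocation

# Demands reported by three site controllers for one inter-DC link.
print(max_min_allocate({"siteA": 2000, "siteB": 500, "siteC": 3000},
                       capacity_mbps=4000))
# siteB gets its full 500; siteA and siteC split the rest evenly (1750 each)
```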
Kandoo to the SDN Programmer Think about what is local and what is global – When apps are written, annotate them with a local flag Kandoo automatically places local apps on local controllers – And global apps on the global controller Kandoo restricts messages between the global and local controllers – You can't send OpenFlow-style messages – You must send Kandoo-style messages (sketch below)
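The annotation idea might look roughly like the following; the decorator name, registry, and placement function are hypothetical illustrations of "flag the app, let the framework place it", not Kandoo's actual API.

```python
# Hypothetical annotation scheme: the programmer only tags each app as
# local or global; the framework decides where it runs.

APP_REGISTRY = {}   # app name -> {"scope": "local"|"global", "handler": fn}

def sdn_app(scope):
    """Decorator standing in for Kandoo's local/global annotation."""
    def register(fn):
        APP_REGISTRY[fn.__name__] = {"scope": scope, "handler": fn}
        return fn
    return register

@sdn_app(scope="local")
def learning_switch(event):
    """Needs only per-switch state, so it runs on a local controller."""
    return f"learned {event}"

@sdn_app(scope="global")
def traffic_engineering(event):
    """Needs network-wide state, so it runs on the global controller."""
    return f"rerouted because of {event}"

def place_apps(registry):
    placement = {"local controllers": [], "global controller": []}
    for name, meta in registry.items():
        target = "local controllers" if meta["scope"] == "local" else "global controller"
        placement[target].append(name)
    return placement

print(place_apps(APP_REGISTRY))
# {'local controllers': ['learning_switch'], 'global controller': ['traffic_engineering']}
```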
Summary Centralization provides simplicity at the cost of reliability and scalability Replication can improve reliability and scalability For reliability, Paxos is an option For scalability, divide and conquer – Partition the applications Kandoo: local apps and global apps – Partition the network Onix: each controller controls a subset of switches (a domain)