Clustering in OpenDaylight


Clustering in OpenDaylight
Colin Dixon
Technical Steering Committee Chair, OpenDaylight
Distinguished Engineer, Brocade
Borrowed ideas and content from Jan Medved, Moiz Raja, and Tom Pantelis

Multi-Protocol SDN Controller Architecture
[Diagram: applications and OSS/BSS/external apps on top, reaching the controller via REST/RESTCONF and a Netconf server; the controller core built around the Service Adaptation Layer (SAL); protocol plugins/adapters (OpenFlow, OVSDB, Netconf client, other protocol plugins) below, connecting to network devices.]

Software Architecture
[Diagram: apps/services and OSS/BSS/external apps access the controller via REST/RESTCONF and a Netconf server; the Model-Driven SAL (MD-SAL) "kernel" provides the data store, RPCs, data change notifications, notifications, and namespace; protocol plugins (e.g., Netconf client) connect down to the network devices.]

Data Store Sharding
[Diagram: the data tree root split into shards Shard1.1, Shard1.2, Shard1.3, Shard2.1, ..., ShardN.1, placed across nodes Node1, Node2, ..., NodeM. ShardX.Y = shard Y within service X.]
Shard layout algorithm: place shards on M nodes
- Select data subtrees
  - Currently, a subtree can only be picked directly under the root
  - Working on subtrees at arbitrary levels
- Map subtrees onto shards
- Map shards onto nodes (a small placement sketch follows this slide)
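To make the last step concrete, here is a small, purely illustrative Java sketch of placing shard replicas on M member nodes round-robin; the names (Shard, placeShards, member-1, etc.) are hypothetical and are not OpenDaylight APIs.

```java
// Hypothetical sketch of the layout step: map already-selected subtrees onto
// shards, then place each shard's replicas on the M cluster members.
import java.util.ArrayList;
import java.util.List;

public class ShardLayoutSketch {
    record Shard(String service, String name, List<String> replicaNodes) {}

    // Place each shard's replicas on `replicationFactor` of the M members,
    // rotating the starting node so leaders (and load) spread across the cluster.
    static List<Shard> placeShards(List<String> subtrees, List<String> members, int replicationFactor) {
        List<Shard> shards = new ArrayList<>();
        int next = 0;
        for (String subtree : subtrees) {
            List<String> replicas = new ArrayList<>();
            for (int i = 0; i < replicationFactor; i++) {
                replicas.add(members.get((next + i) % members.size()));
            }
            next++; // rotate the starting member for the next shard
            shards.add(new Shard(subtree, subtree + "-shard", replicas));
        }
        return shards;
    }

    public static void main(String[] args) {
        System.out.println(placeShards(
                List.of("inventory", "topology", "toaster"),
                List.of("member-1", "member-2", "member-3"),
                3));
    }
}
```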

Shard Replication
- Replication using RAFT [1] provides strong consistency
- Transactional data access: snapshot reads, snapshot writes, read/write transactions
- Tolerates f failures with 2f+1 nodes (3 nodes => 1 failure, 5 => 2, etc.); the arithmetic is sketched below
- Leaders handle all writes and send them to followers before committing
- Leaders are distributed across nodes to spread load
[Diagram: three nodes, each holding a mix of leader (L) and follower (F) shard replicas.]
[1] https://raftconsensus.github.io/
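A minimal sketch of the fault-tolerance arithmetic above, assuming the usual Raft majority-quorum rule (n = 2f + 1 replicas, quorum of f + 1); this is illustrative only, not OpenDaylight code.

```java
// With n = 2f + 1 replicas, a majority quorum of f + 1 survives f failures.
public class RaftQuorumSketch {
    static int toleratedFailures(int replicas) {
        return (replicas - 1) / 2;   // f in n = 2f + 1
    }

    static int quorumSize(int replicas) {
        return replicas / 2 + 1;     // majority needed to commit a write
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 5, 7}) {
            System.out.printf("%d replicas: quorum %d, tolerates %d failure(s)%n",
                    n, quorumSize(n), toleratedFailures(n));
        }
    }
}
```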

Strong Consistency (Serializability)
- Everyone always reads the most recent write. Loosely, "everyone is at the same point on the same timeline."
Causal Consistency
- Loosely, "you won't see the result of any event without seeing everything that could have caused that event." [Typically in the confines of reads/writes to a single datastore.]
Eventual Consistency
- Loosely, "everyone will eventually see the same events in some order, but not necessarily the same order." [Eventual in the same way that your kid will "eventually" clean their room.]

Why strong consistency matters
- A flapping port generates events: port up, port down, port up, port down, port up, port down, ...
  - Is the port up or down? If the events aren't ordered, you have no idea.
  - ...sure, but these are events from a single device, so you can keep them ordered easily.
- More complex example: switches (or ports on those switches) attached to different nodes go up and down.
  - If you don't have ordering, different nodes will now come to different conclusions about reachability, paths, etc. (see the sketch below)
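As a toy illustration of why ordering settles the question, the sketch below folds the same flap events on two nodes; because both agree on the sequence numbers (the kind of total order a strongly consistent store assigns), they agree on the final port state. All names here are hypothetical, not OpenDaylight code.

```java
import java.util.Comparator;
import java.util.List;

public class PortStateSketch {
    record PortEvent(long sequence, boolean up) {}   // sequence = agreed total order

    static boolean finalState(List<PortEvent> eventsAsReceived) {
        // Sort by the agreed sequence number, then take the last event's state.
        return eventsAsReceived.stream()
                .sorted(Comparator.comparingLong(PortEvent::sequence))
                .reduce((first, second) -> second)
                .map(PortEvent::up)
                .orElse(false);
    }

    public static void main(String[] args) {
        // Two nodes receive the same flap events in different network orders...
        List<PortEvent> nodeA = List.of(new PortEvent(1, true), new PortEvent(2, false), new PortEvent(3, true));
        List<PortEvent> nodeB = List.of(new PortEvent(3, true), new PortEvent(1, true), new PortEvent(2, false));
        // ...but because they agree on the ordering, both conclude the port is up.
        System.out.println(finalState(nodeA) + " " + finalState(nodeB));
    }
}
```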

Why everyone doesn't use it
- Strong consistency can't work in the face of partitions. If your network splits in two, either:
  - one side stops, or
  - you lose strong consistency
- Strong consistency requires coordination
  - Effectively, you need some single entity to order everything
  - This has performance costs, i.e., a single strongly consistent data store is limited by the performance of a single node
- Question is: do we start with strong consistency and relax it, or start weak and strengthen it? OpenDaylight has started strong.

Service/Application Model
- Logically, each service or application (code) has a primary subtree (YANG model) and the shard it is associated with (data)
- One instance of the code is co-located with each replica of the data
- All instances are stateless; all state is stored in the data store
- The instance that is co-located with the shard leader handles writes
[Diagram: Node1, Node2, and Node3 each run Service1, Service2, and Service3 above the Data Store API; shards S1.1, S1.2, S2.9, S3.1, and S3.7 are replicated across the nodes, each with one leader replica (SX.Y = shard Y of service X).]

Service/Application Model (cont'd)
- The Entity Ownership Service allows related tasks to be co-located (see the sketch below)
  - e.g., the tasks related to a given OpenFlow switch should happen where it's connected
  - also handles HA/failover and automatic election of a new entity owner
- RPCs and Notifications are directed to the entity owner
- New cluster-aware data change listeners provide integration into the data store
[Diagram: same service and shard layout as the previous slide.]
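A minimal sketch of how a service might use the Entity Ownership Service to keep per-switch work on the owning node, assuming the Beryllium-era org.opendaylight.controller.md.sal.common.api.clustering API; exact packages and signatures vary across releases, and the entity type string here is illustrative.

```java
import org.opendaylight.controller.md.sal.common.api.clustering.CandidateAlreadyRegisteredException;
import org.opendaylight.controller.md.sal.common.api.clustering.Entity;
import org.opendaylight.controller.md.sal.common.api.clustering.EntityOwnershipChange;
import org.opendaylight.controller.md.sal.common.api.clustering.EntityOwnershipListener;
import org.opendaylight.controller.md.sal.common.api.clustering.EntityOwnershipService;

public class SwitchOwnershipHandler implements EntityOwnershipListener {
    private static final String ENTITY_TYPE = "openflow-switch"; // illustrative type name

    private final EntityOwnershipService ownershipService;

    public SwitchOwnershipHandler(EntityOwnershipService ownershipService) {
        this.ownershipService = ownershipService;
        // Get notified whenever ownership of any entity of this type changes.
        ownershipService.registerListener(ENTITY_TYPE, this);
    }

    public void onSwitchConnected(String switchId) throws CandidateAlreadyRegisteredException {
        // Every node with a connection registers as a candidate; the service
        // elects exactly one owner and automatically re-elects on failover.
        ownershipService.registerCandidate(new Entity(ENTITY_TYPE, switchId));
    }

    @Override
    public void ownershipChanged(EntityOwnershipChange change) {
        if (change.isOwner()) {
            // This node just became the owner: take over the switch-specific work.
        } else if (change.wasOwner() && !change.isOwner()) {
            // Ownership moved elsewhere: stop work that must run only on the owner.
        }
    }
}
```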

Handling RPCs and Notifications in a Cluster
- Data change notifications: flexibly delivered to the shard leader, or to any subset of nodes
- YANG-modeled notifications: delivered on the node where they were generated; typically guided to the entity owner
- Global RPCs: delivered on the node where they are called
- Routed RPCs: delivered to the node which registered to handle them
[Diagram: three nodes holding leader (L) and follower (F) shard replicas.]

Service/Shard Interactions
- Service-x "resolves" a read or write to a subtree/shard
- Reads are sent to the shard leader (working on allowing local reads); see the transaction sketch below
- Writes are sent to the leader to be ordered
- Notifications of changed data are sent to the shard leader and to anyone registered for remote notification
[Diagram: numbered example across Node1, Node2, and Node3 of services resolving reads and writes to shards S1.1, S1.2, S2.9, S3.1, and S3.7, with writes going to the leader replica and notifications flowing back to registered services.]
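A minimal sketch of the read/write path from a service's point of view, assuming the Beryllium-era binding DataBroker API and the generated opendaylight-inventory bindings; packages and return types differ in later releases, and error handling is elided.

```java
import com.google.common.base.Optional;
import com.google.common.util.concurrent.CheckedFuture;
import org.opendaylight.controller.md.sal.binding.api.DataBroker;
import org.opendaylight.controller.md.sal.binding.api.ReadOnlyTransaction;
import org.opendaylight.controller.md.sal.binding.api.WriteTransaction;
import org.opendaylight.controller.md.sal.common.api.data.LogicalDatastoreType;
import org.opendaylight.controller.md.sal.common.api.data.ReadFailedException;
import org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes;
import org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.NodesBuilder;
import org.opendaylight.yangtools.yang.binding.InstanceIdentifier;

public class InventoryAccessSketch {
    private static final InstanceIdentifier<Nodes> NODES_IID = InstanceIdentifier.create(Nodes.class);

    private final DataBroker dataBroker;

    public InventoryAccessSketch(DataBroker dataBroker) {
        this.dataBroker = dataBroker;
    }

    // A read is resolved to the shard holding the inventory subtree (today: its leader).
    public Optional<Nodes> readOperationalNodes() throws ReadFailedException {
        ReadOnlyTransaction tx = dataBroker.newReadOnlyTransaction();
        CheckedFuture<Optional<Nodes>, ReadFailedException> future =
                tx.read(LogicalDatastoreType.OPERATIONAL, NODES_IID);
        Optional<Nodes> result = future.checkedGet();
        tx.close();
        return result;
    }

    // A write is sent to the shard leader, which orders and replicates it via Raft.
    public void writeEmptyNodesContainer() {
        WriteTransaction tx = dataBroker.newWriteOnlyTransaction();
        tx.put(LogicalDatastoreType.CONFIGURATION, NODES_IID, new NodesBuilder().build());
        tx.submit(); // returns a CheckedFuture; a real service would handle commit failure
    }
}
```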

Major additions in Beryllium
- Entity Ownership Service (EntityOwnershipService)
- Clustered data change listeners (ClusteredDataChangeListener and ClusteredDataTreeChangeListener); a sketch follows this slide
- Significant application/plugin adoption: OVSDB, OpenFlow, NETCONF, Neutron, etc.
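A minimal sketch of a cluster-aware listener, assuming the Beryllium-era ClusteredDataTreeChangeListener and DataTreeIdentifier from the controller binding API (newer releases relocate these classes and use factory methods rather than the constructor); the inventory paths are just an example.

```java
import java.util.Collection;
import org.opendaylight.controller.md.sal.binding.api.ClusteredDataTreeChangeListener;
import org.opendaylight.controller.md.sal.binding.api.DataBroker;
import org.opendaylight.controller.md.sal.binding.api.DataTreeIdentifier;
import org.opendaylight.controller.md.sal.binding.api.DataTreeModification;
import org.opendaylight.controller.md.sal.common.api.data.LogicalDatastoreType;
import org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.Nodes;
import org.opendaylight.yang.gen.v1.urn.opendaylight.inventory.rev130819.nodes.Node;
import org.opendaylight.yangtools.concepts.ListenerRegistration;
import org.opendaylight.yangtools.yang.binding.InstanceIdentifier;

// Implementing the ClusteredDataTreeChangeListener marker interface tells
// MD-SAL to deliver changes on every cluster member, not only where the shard
// leader happens to live.
public class ClusteredNodeListener implements ClusteredDataTreeChangeListener<Node> {
    private final ListenerRegistration<ClusteredNodeListener> registration;

    public ClusteredNodeListener(DataBroker dataBroker) {
        // Listen for any inventory node change in the operational datastore.
        InstanceIdentifier<Node> nodePath = InstanceIdentifier.create(Nodes.class).child(Node.class);
        registration = dataBroker.registerDataTreeChangeListener(
                new DataTreeIdentifier<>(LogicalDatastoreType.OPERATIONAL, nodePath), this);
    }

    @Override
    public void onDataTreeChanged(Collection<DataTreeModification<Node>> changes) {
        // Runs on every cluster member; react to node additions/removals here.
    }

    public void close() {
        registration.close();
    }
}
```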

Work in progress
- Dynamic, multi-level sharding
  - Multi-level, e.g., OpenFlow should be able to say its subtrees start at the "switch" node
  - Dynamic, e.g., an OpenFlow subtree should be moved if the connection moves
- Improve performance, scale, stability, etc., as always
- Faster, but "stale", reads from the local replica vs. always reading from the leader
- Pipelined transactions for better cluster write throughput
- Whatever else you're interested in helping with

Longer term things
- Helper code for building common app patterns
  - run once in the cluster and fail over if that node goes down
  - run everywhere and handle things
- Different consistency models
  - Giving people the knob is easy; dealing with the ramifications is hard
- Federated/hierarchical clustering