Active-Standby Deployment
2-Node Clustering Active-Standby Deployment
2-Node Deployment Topology
Active-Standby Requirements
- Configuration of the Primary controller in the cluster (Must)
- The Primary controller services the Northbound IP address; a Secondary takes over the NB IP upon failover (Must)
- Configuration of whether, on failover & recovery, the configured Primary controller reasserts leadership (Must)
- Configuration of the merge strategy on failover & recovery (Want)
- The Primary controller is master of all devices and is leader of all shards (Must) – initial config (design to allow for alternatives – multi-shard / multiple device masters)
- Single-node operation allowed (access to the datastore without quorum) (Want)
Scenario 1: Master Stays Offline
Failure of Primary
Failover Sequence:
- The Secondary controller becomes master of all devices and leader of all shards.
Scenario 2: Primary Comes Back Online
Failure of Primary
Recovery Sequence:
- Controller A comes back online and its data is replaced by all of Controller B's data.
- For the Re-assert Leadership configuration:
  - (ON) Controller A becomes master of all devices and leader of all shards.
  - (OFF) Controller B stays master of all devices and maintains leadership of all shards.
Scenario 1: During Network Partition
Failover Sequence:
- Controller A becomes master of the devices in its network segment and leader of all shards.
- Controller B becomes master of the devices in its network segment and leader of all shards.
Scenario 2: Network Partition Recovers
Recovery Sequence:
- Merge data according to the pluggable merge strategy (default: the Secondary's data is replaced with the Primary's data).
- For the Re-assert Leadership configuration:
  - (ON) Controller A becomes master of all devices and leader of all shards again.
  - (OFF) Controller B becomes master of all devices and leader of all shards again.
Failures That Do Not Result in Any Role Changes
No-Op Failures
Scenarios:
- Secondary controller failure.
- Any single link failure.
- The Secondary controller loses network connectivity (but device connections to the Primary are maintained).
Cluster Configuration Options
Global & Granular Configuration
Global:
- Cluster Leader (aka "Primary")
  - Allow this to be changed on a live system, e.g. for maintenance.
  - Assigned (2-node case) or Elected (larger-cluster case).
- Cluster Leader Northbound IP
- Reassert Leadership on Failover and Recovery
- Network Partition Detection Algorithm (pluggable)
- Global overrides of the Per Device/Group and Per Shard items (below)
Per Device / Group:
- Master / Slave
Per Shard:
- Shard Leader (Shard Placement Strategy – pluggable)
- Shard Data Merge (Shard Merge Strategy – pluggable)
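As a rough illustration of how these global and granular options might hang together, here is a minimal sketch; ClusterHaConfig and all of its field names are hypothetical, not existing ODL code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: this class simply mirrors the option set listed above.
public final class ClusterHaConfig {
    public enum LeaderSelection { ASSIGNED, ELECTED }

    private String clusterLeader;               // member name of the "Primary"
    private LeaderSelection leaderSelection;    // ASSIGNED for 2-node, ELECTED for larger clusters
    private String northboundIp;                // virtual IP serviced by the cluster leader
    private boolean reassertLeadershipOnFailoverAndRecovery;
    private String partitionDetectionAlgorithm; // pluggable detector, referenced by name

    // Per-device/group and per-shard settings that the global config may override.
    private final Map<String, String> deviceGroupRole = new HashMap<>();      // group -> MASTER/SLAVE
    private final Map<String, String> shardLeaderPlacement = new HashMap<>(); // shard -> member (placement strategy)
    private final Map<String, String> shardMergeStrategy = new HashMap<>();   // shard -> merge strategy

    // Getters/setters omitted for brevity.
}
```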
HA Deployment Scenarios
Simplified Global HA Settings
- Can we abstract configurations into admin-defined deployment scenarios?
- E.g. the admin configures 2-Node (Active-Standby): this means the Primary controller is master of all devices and leader of all shards.
- Conflicting configurations are overridden by the deployment scenario.
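A minimal sketch of the scenario-abstraction idea, assuming a hypothetical DeploymentScenario enum (none of these names exist in ODL): the admin picks a scenario and it expands into the concrete settings it implies, overriding conflicting configuration.

```java
// Illustrative sketch only. ScenarioSettings is a hypothetical holder, shown inline for self-containment.
public enum DeploymentScenario {
    TWO_NODE_ACTIVE_STANDBY;

    public static final class ScenarioSettings {
        public final String allDevicesMaster;  // member that is master of all devices
        public final String allShardsLeader;   // member that leads all shards

        ScenarioSettings(String allDevicesMaster, String allShardsLeader) {
            this.allDevicesMaster = allDevicesMaster;
            this.allShardsLeader = allShardsLeader;
        }
    }

    /** Expand the scenario: for 2-node active-standby the Primary takes every role. */
    public ScenarioSettings expand(String primaryMember) {
        switch (this) {
            case TWO_NODE_ACTIVE_STANDBY:
                return new ScenarioSettings(primaryMember, primaryMember);
            default:
                throw new IllegalStateException("Unknown scenario: " + this);
        }
    }
}
```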
Implementation Dependencies
Potential Changes to Other ODL Projects
- Clustering:
  - Refactoring of the Raft Actor vs. 2-Node Raft Actor code.
  - Define the Cluster Leader.
  - Define the Northbound Cluster Leader IP alias.
- OpenFlow Plugin:
  - OpenFlow Master/Slave roles.
  - Grouping of Master/Slave roles (aka "Regions").
- System:
  - Be able to SUSPEND the Secondary controller to support Standby mode.
Follow-up Design Discussion Topics
Open Issues (TBD)
- Is the Master/Slave definition too tied to OpenFlow? (Generalize?) Should device ownership/mastership be implemented by the OF Plugin?
- How to define the Northbound Cluster Leader IP in a platform-independent way? (Linux/Mac OS X: IP alias; Windows: possible.) Gratuitous ARP on leader change.
- When both controllers are active in the network partition scenario, which controller "owns" the Northbound Cluster Leader IP?
- Define controller-wide SUSPEND behavior (how?).
- On failure, a new Primary controller should be elected (in the 2-node case the Secondary is the only candidate).
- How to detect management-plane failure, and is it needed? (Heartbeat timeout >> worst-case GC pause?)
Implementation (DRAFT)
Change Summary: Cluster Primary (OF Master & Shard Leader)
Northbound IP Address
- (Config) Define the Northbound IP alias address.
- (Logic) <Pluggable> Northbound IP alias implementation (platform dependent).
Behavior
- (Config / Logic) <Pluggable> Define the default Primary controller:
  - Assigned (configuration) – default for 2-node.
  - Calculated (election algorithm).
- (Logic) Redefine the default Primary controller on a running cluster.
- (Logic) Control the OF Master role.
- (Logic) Control the datastore shards:
  - Global config (overridden).
  - Shard placement (on the Primary).
  - <Pluggable> Leadership determination: match OF Master (default for 2-node) or election based (with influence).
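The "<Pluggable> Northbound IP alias implementation" item might look roughly like the following; the interface name and methods are illustrative assumptions, not an existing ODL API.

```java
import java.io.IOException;
import java.net.InetAddress;

// Hypothetical sketch of a pluggable, platform-dependent Northbound IP alias provider.
public interface NorthboundIpAliasProvider {
    /** Bring up the cluster's Northbound IP alias on this node (e.g. via an OS-specific IP alias). */
    void assertAlias(InetAddress northboundIp) throws IOException;

    /** Remove the alias, e.g. when this node stops being the cluster Primary. */
    void releaseAlias(InetAddress northboundIp) throws IOException;
}
```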
Change Summary (Continued): Cluster Primary (OF Master & Shard Leader)
Behavior (Continued)
- Network Partition & Failure Detection:
  - (Config / Logic) <Pluggable> Detection algorithm – default: Akka clustering algorithm.
- Failover:
  - (Config / Logic) <Pluggable> Secondary controller behavior:
    - (Logic) Suspend (dependent apps, datastore, etc.).
    - (Logic) Resume (become Primary): OF mastership, shard leadership, non-quorum datastore access.
- Failback:
  - (Logic) <Pluggable> Data merge strategy – default: the current Primary overrides the Secondary.
  - (Config) Primary re-asserts leadership on failback (OF Master & shard leader roles – after merge).
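A minimal sketch of the pluggable data merge strategy and its default ("current Primary overrides Secondary"); the names and the generic type are assumptions for illustration only.

```java
// Hypothetical sketch: the generic type T stands in for whatever datastore
// representation the real code would use.
public interface DataMergeStrategy<T> {
    /** Merge the former Secondary's data with the recovering Primary's data on failback. */
    T merge(T primaryData, T secondaryData);
}

// Default from the slide: the Primary's data wins wholesale.
final class PrimaryOverridesSecondary<T> implements DataMergeStrategy<T> {
    @Override
    public T merge(T primaryData, T secondaryData) {
        return primaryData; // the Secondary's data is discarded in favor of the Primary's
    }
}
```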
Dependencies
- Southbound: device ownership & roles.
- System Suspend Behavior: how to enforce a system-wide suspend when desired? (Config Subsystem? OSGi?)
- Config Subsystem: resolving app data notifications?
- Measure failover times:
  - No data exchange.
  - Various data-exchange cases (sizes).
RAFT/Sharding Changes (DRAFT)
(Current) Shard Design
ShardManager is an actor that does the following:
- Creates all local shard replicas on a given cluster node and maintains the shard information.
- Monitors the cluster members and their status, and stores their addresses.
- Finds local shards.
Shard is an actor (an instance of RaftActor) that represents a sub-tree within the data store:
- Uses an in-memory data store.
- Handles requests from three-phase-commit cohorts.
- Handles data change listener requests and notifies the listeners upon state change.
- Is responsible for data replication among the shard (data sub-tree) replicas.
Shard uses RaftActorBehavior for two tasks:
- Leader election for a given shard.
- Data replication.
RaftActorBehavior can be in any of the following roles at any given point in time:
- Leader
- Follower
- Candidate
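For reference, a simplified sketch of the structure just described, not the verbatim ODL code: one behavior object currently covers both leader election and data replication, and the Shard actor swaps behavior instances as its Raft role changes.

```java
// Simplified, illustrative sketch of the current single-behavior design.
public interface RaftActorBehavior {
    enum RaftState { LEADER, FOLLOWER, CANDIDATE }

    /** Handle an incoming Raft message and return the behavior to use next (possibly a new role). */
    RaftActorBehavior handleMessage(Object sender, Object message);

    /** The role this behavior instance represents. */
    RaftState state();
}
```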
(Current) Shard Class Diagram
(Proposed) Shard Design
Intent
- Support a two-node cluster by separating shard data replication from leader election.
- Elect one of the ODL nodes as "master" and mark it as "Leader" for all the shards.
- Make leader election pluggable; the current Raft leader election logic should still work for a 3-node deployment.
Design Idea
- Minimize the impact on ShardManager and Shard.
- Separate the 'leader election' and 'data replication' logic currently combined in the RaftActorBehavior classes: create two separate abstract classes and interfaces for 'leader election' and 'data replication'.
- The Shard actor will hold a reference to a RaftReplicationActorBehavior instance (currentBehavior), and the RaftReplicationActorBehavior will hold a reference to an ElectionActorBehavior instance.
- Both the RaftReplicationActorBehavior and ElectionActorBehavior instances will be in one of the following roles at any given point in time: Leader, Follower, Candidate.
- The RaftReplicationActorBehavior will update its ElectionActorBehavior instance based on the messages it receives. Such a message could be sent either by one of the ElectionActorBehavior instances or by a module that implements the "2-node cluster" logic.
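A minimal sketch of the proposed split, using the interface names from the slide; the method signatures are assumptions, not existing ODL code. Each interface would live in its own source file; they are shown together here for brevity.

```java
// Hypothetical sketch: election and replication each track their own Raft role, and the
// replication behavior holds a reference to its election behavior.
public interface ElectionActorBehavior {
    enum Role { LEADER, FOLLOWER, CANDIDATE }

    /** Handle an election-related message (e.g. a vote request) and return the behavior to use next. */
    ElectionActorBehavior handleElectionMessage(Object sender, Object message);

    Role role();
}

interface RaftReplicationActorBehavior {
    /** Handle a replication-related message (e.g. an append-entries request). */
    RaftReplicationActorBehavior handleReplicationMessage(Object sender, Object message);

    /** The election behavior this replication behavior holds a reference to. */
    ElectionActorBehavior electionBehavior();

    /** Switch roles, e.g. when told to by an ElectionActorBehavior or by a 2-node cluster module. */
    RaftReplicationActorBehavior switchRole(ElectionActorBehavior.Role newRole);
}
```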
(Proposed) Shard Class Diagram
2-node cluster workflow (Method-1)
Method-1: Run the 2-node cluster protocol outside of ODL
- An external cluster protocol decides which node is 'master' and which node is 'standby'.
- Once the master election is complete, the master sends the node roles and node membership information to all the ODL instances.
- A 'cluster module' within ODL defines a 'cluster node' model and provides REST APIs to configure the cluster information by modifying the *.conf files.
- The 'cluster module' will send RAFT messages to all the other cluster members about the cluster information – membership & shard RAFT state.
- The ShardActors in both cluster nodes will handle these messages, instantiate the corresponding "replication behavior" & "election behavior" role instances, and switch to the new roles. (A minimal message sketch follows below.)
- The Northbound virtual IP is OS dependent and out of scope here.
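A minimal sketch of the kind of message the 'cluster module' might send to the ShardActors in Method-1; the class and field names are hypothetical.

```java
import java.io.Serializable;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: carries the externally decided role, membership, and shard RAFT state.
public final class ClusterInfoChanged implements Serializable {
    public enum NodeRole { MASTER, STANDBY }

    private final NodeRole localRole;                  // role decided by the external cluster protocol
    private final List<String> memberAddresses;        // current cluster membership
    private final Map<String, String> shardRaftState;  // shard name -> Leader/Follower/Candidate

    public ClusterInfoChanged(NodeRole localRole, List<String> memberAddresses,
                              Map<String, String> shardRaftState) {
        this.localRole = localRole;
        this.memberAddresses = memberAddresses;
        this.shardRaftState = shardRaftState;
    }

    public NodeRole getLocalRole() { return localRole; }
    public List<String> getMemberAddresses() { return memberAddresses; }
    public Map<String, String> getShardRaftState() { return shardRaftState; }
}
```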
Reference diagram for Method-2
Diagram labels:
- 1a. Switch-to-controller connectivity state polling
- Cluster protocol – primary path
- 1b. Cluster protocol – secondary path
2-node cluster workflow (Method-2)
Method-2: Run the cluster protocol within ODL
- A 'Cluster Module' within each ODL instance talks to the other ODL instance and elects the 'master' and 'standby' nodes.
- If the cluster protocol times out, a node will check other factors before a new master election (probably cross-checking with the connected OpenFlow switches for 'primary' controller information, or using an alternative path). (A hedged sketch of this fallback check follows below.)
- The 'Cluster module' will send RAFT messages to all the other cluster members about the cluster information – membership & shard RAFT state.
- The ShardActors in both cluster nodes will handle these messages, instantiate the corresponding "replication behavior" & "election behavior" role instances, and switch to the new roles.
- The Northbound virtual IP is OS dependent and out of scope here.
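A hedged sketch of the fallback check described above, assuming a hypothetical helper that asks the connected OpenFlow switches whether they still see the old primary; none of these names exist in ODL.

```java
// Illustrative sketch of the Method-2 fallback: after a cluster-protocol timeout,
// cross-check via the switches before deciding to take over as master.
public final class TwoNodeFailoverDecision {

    public interface SwitchView {
        /** True if any connected OpenFlow switch still reports the peer as its active controller. */
        boolean peerStillSeenByConnectedSwitches();
    }

    /** Decide whether this node should promote itself to master after a peer timeout. */
    public boolean shouldBecomeMaster(boolean peerHeartbeatTimedOut, SwitchView switchView) {
        if (!peerHeartbeatTimedOut) {
            return false; // peer is healthy; keep current roles
        }
        // Only take over if the switches no longer see the old primary, which helps
        // distinguish a peer failure from a management-plane partition.
        return !switchView.peerStillSeenByConnectedSwitches();
    }
}
```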
3-node cluster workflow
- The ShardManager will create the local shards based on the shard configuration.
- Each shard will start off as a 'Candidate' both for role election and for 'data replication' messages, by instantiating the ElectionBehavior and ReplicationBehavior classes in the 'Candidate' role.
- A candidate node will start sending 'requestForVote' messages to the other members.
- A leader is elected based on the Raft leader election algorithm, and the elected replica of each shard will set its state to 'Leader' by switching its ElectionBehavior & ReplicationBehavior instances to Leader.
- The remaining candidates, on receiving the leader-assertion messages, will move to the 'Follower' state by switching their ElectionBehavior & ReplicationBehavior instances to 'Follower'.
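Building on the interfaces sketched under "(Proposed) Shard Design", one possible shape for the role switch when the election resolves; again, illustrative only.

```java
// Hypothetical sketch: a shard replica in the Candidate role switches both of its
// behavior instances together once the 3-node election completes.
public final class CandidateShardState {
    private ElectionActorBehavior election;           // starts in the Candidate role
    private RaftReplicationActorBehavior replication; // starts in the Candidate role

    CandidateShardState(ElectionActorBehavior election, RaftReplicationActorBehavior replication) {
        this.election = election;
        this.replication = replication;
    }

    /** Called when the Raft election for this shard resolves. */
    void onElectionResult(boolean wonElection) {
        ElectionActorBehavior.Role newRole =
                wonElection ? ElectionActorBehavior.Role.LEADER : ElectionActorBehavior.Role.FOLLOWER;
        // Switch the replication behavior and pick up its (also switched) election behavior.
        replication = replication.switchRole(newRole);
        election = replication.electionBehavior();
    }
}
```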
(Working Proposal) ConsensusStrategy
Provide Hooks to Influence Key RAFT Decisions (Shard Leader Election / Data Replication)
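One possible shape for such a hook interface, as an illustrative sketch only; the method names are assumptions, not an existing ODL API.

```java
// Hypothetical sketch of a ConsensusStrategy: hooks that let deployment-specific logic
// (e.g. the 2-node active-standby module) influence key RAFT decisions.
public interface ConsensusStrategy {
    /** Hook into shard leader election: may a candidate on this member start/win an election? */
    boolean canBecomeLeader(String shardName, String memberName);

    /** Hook into vote handling: should this member grant its vote to the given candidate? */
    boolean shouldGrantVote(String shardName, String candidateMember);

    /** Hook into data replication: is committing without a full quorum acceptable
        (e.g. single-node operation in the 2-node case)? */
    boolean allowCommitWithoutQuorum(String shardName, int replicasReachable, int replicasTotal);
}
```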
Config Changes (DRAFT)
(Current) Config: Config Files (Karaf: /config/initial)
- Read once on startup (default settings for new modules).
- sal-clustering-commons hosts the Akka & Config Subsystem reader/resolver/validator; currently no Config Subsystem config properties are defined?
- Akka/Cluster Config (akka.conf):
  - Akka-specific settings (data/rpc actor systems, mailbox, logging, serializers, etc.).
  - Cluster config (IPs, names, network parameters).
- Shard Config (modules.conf, modules-shards.conf):
  - Shard name / namespace.
  - Sharding strategies.
  - Replication (count and location).
  - Default config.
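As an example of reading these startup files, a sketch using the Typesafe Config library that Akka already uses; the file path and key names ("odl-cluster-data", "akka.remote.netty.tcp.port", "akka.cluster.seed-nodes") are assumptions about the akka.conf layout, shown for illustration only.

```java
import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;

import java.io.File;
import java.util.List;

// Illustrative sketch: parse the initial akka.conf once at startup and read a few settings.
public final class InitialClusterConfigReader {
    public static void main(String[] args) {
        // Path per the slide (Karaf: /config/initial); adjust for the actual distribution layout.
        Config root = ConfigFactory.parseFile(new File("config/initial/akka.conf"));
        Config dataSystem = root.getConfig("odl-cluster-data");

        int tcpPort = dataSystem.getInt("akka.remote.netty.tcp.port");
        List<String> seedNodes = dataSystem.getStringList("akka.cluster.seed-nodes");

        System.out.println("Data actor system port: " + tcpPort);
        System.out.println("Seed nodes: " + seedNodes);
    }
}
```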
(Proposal) Config
Intent
- Continue to keep config outside of the Shard/RAFT/DistributedDatastore code.
- Provide sensible defaults and validate settings when possible; error/warn on any changes that are not allowed on a running system.
- Provide REST config access (where appropriate).
Design Idea
- Host configuration settings in the Config Subsystem; investigate using Karaf Cellar to distribute common cluster-wide config.
- Move the current config processing (org.opendaylight.controller.cluster.common.actor) to the existing sal-clustering-config?
- Akka-specific config: use most of the existing akka.conf file as the default settings; separate the cluster member config (see Cluster Config). Options:
  - Provide specific named APIs, e.g. setTCPPort().
  - Allow Akka <type,value> config to be set directly.
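A sketch of what the two options above could look like as an API, assuming a hypothetical ClusterAkkaConfigService; neither the interface nor its methods exist in ODL today.

```java
import java.util.List;

// Illustrative sketch: named setters alongside a generic <type, value> pass-through to Akka.
public interface ClusterAkkaConfigService {
    /** Named, validated setter for a well-known setting. */
    void setTCPPort(int port);

    /** Named setter for this member's cluster seed nodes. */
    void setSeedNodes(List<String> seedNodeAddresses);

    /** Generic escape hatch: set an arbitrary Akka config path to a typed value directly. */
    <T> void setAkkaSetting(String configPath, T value);
}
```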
(Proposal) Config: Design Idea (Continued)
Cluster Config:
- Provide a single point for configuring a cluster; feeds back into the Akka-specific settings, etc.
- Define the Northbound Cluster IP config (alias).
Shard Config:
- Define the shard config (name / namespace / sharding strategy).
- Will NOT support changing a running shard for now.
'Other' Config:
- 2-Node: designate the cluster's Primary node or an election algorithm (dynamic).
- Failback to the Primary node (dynamic).
- Strategies (influence these in RAFT) – separate bundles?
  - Election
  - Consensus
Northbound IP Alias (DRAFT)