1
Introduction to OpenSAF
David Fick, Senior Software Architect, GoAhead Software
2
Introduction to OpenSAF
Service availability and high availability (HA) systems and concepts have been around for decades. However, HA terminology tends to vary from industry to industry and company to company.
Goals of this session:
- High-level technical overview of the Service Availability™ Forum standards
- Overview of the support of those standards within OpenSAF
- Allow you to:
  - Familiarize yourself with SA Forum and OpenSAF concepts and terminology, OR
  - Map the HA concepts and terminology with which you are familiar to the SA Forum and OpenSAF versions
- Resources for getting started with OpenSAF
3
SA Forum Interfaces: AIS & HPI
[Diagram: layered SA Forum architecture, top to bottom]
- System Management Applications
- Application Interface Specifications (AIS)
- Service Availability Middleware (SAF standards implemented by OpenSAF):
  - Software Mgmt Framework (SMF)
  - Availability Management Framework (AMF)
  - Lock (LCK)
  - Checkpoint (CKPT)
  - Information Model Mgmt (IMM)
  - Cluster Membership (CLM)
  - Event (EVT)
  - Notification (NTF)
  - Platform Mgmt (PLM)
  - Message (MSG)
  - Log (LOG)
- Operating System / Virtualization
- Hardware Platform Interface (HPI)
- Hardware Platforms A, B, C, D
4
But how to make sense of the SA Forum “acronym soup”?
IMM LCK EE SI SMF SU HPI SG NTF OM AMF CLM PLM CSI CKPT HE LOG MSG EVT
5
AIS Service Groupings
First, understand that the AIS services fall into three logical groupings*:
System Management Services
- Information Model Mgmt (IMM), Software Mgmt Framework (SMF), Notification (NTF), Log (LOG)
- Services that manage central system capabilities commonly used by both:
  - AIS services
  - Applications
Resource Availability Management Services
- Availability Management Framework (AMF), Cluster Membership (CLM), Platform Mgmt (PLM)
- Services that manage and monitor the state of key system resources that affect availability:
  - Hardware / Operating system
  - Cluster nodes
  - Applications
Application Services
- Checkpoint (CKPT), Event (EVT), Message (MSG), Lock (LCK)
- Optional services to support application operations such as:
  - Inter-process communication
  - State replication
  - Shared resource access control
* - Not official SA Forum AIS service groupings
6
Fault Management Cycle
Second, AIS services that manage availability are designed around a standard fault management cycle:
- Detection: e.g. component healthchecks
- Isolation: e.g. blade power off
- Recovery: e.g. failover of workload assignments to associated standby resources
- Repair: e.g. automatic restart of failed resource
- Notification: e.g. state change notifications sent by the service managing the resource
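The cycle above can be sketched as an ordered set of stages. This is only an illustrative model, not an OpenSAF API: the `Stage` enum, the `EXAMPLE_ACTIONS` table, and `run_cycle` are hypothetical names, and the action strings simply restate the slide's examples.

```python
from enum import Enum


class Stage(Enum):
    """Fault management stages, in the order the slide lists them."""
    DETECTION = "detection"
    ISOLATION = "isolation"
    RECOVERY = "recovery"
    REPAIR = "repair"
    NOTIFICATION = "notification"


# Hypothetical example actions, paraphrasing the slide's "e.g." entries.
EXAMPLE_ACTIONS = {
    Stage.DETECTION: "component healthcheck fails",
    Stage.ISOLATION: "blade powered off",
    Stage.RECOVERY: "workload failed over to standby",
    Stage.REPAIR: "failed resource restarted",
    Stage.NOTIFICATION: "state change notification sent",
}


def run_cycle(resource):
    """Walk one fault on `resource` through every stage, in order."""
    return [f"{stage.value}: {resource}: {EXAMPLE_ACTIONS[stage]}"
            for stage in Stage]


for line in run_cycle("blade-3"):
    print(line)
```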
7
Resource Dependencies
Third, availability management in the AIS world is driven by a detailed understanding of the availability management dependencies across all resource types:
- Managed Applications
  - Simple to complex dependencies and relationships can be modeled between the various software elements
  - Dependency on a particular node also modeled
- AMF Node
  - Represents a node where AMF services are provided
  - Depends on a CLM node
- CLM Node
  - Represents a cluster node where AIS services are provided
  - Depends on an Execution Environment (optional)
- Platform Resource
  - Containment and logical dependencies represented between platform resources
  - Execution Environment (EE): represents an operating system instance (standalone or virtual)
  - Hardware Element (HE): represents a physical hardware resource in the system
8
Common Design Patterns
Fourth, the AIS services follow common design patterns:
- API
  - Common library lifecycle
  - Naming conventions
- Resource managed by service
  - Managed object
  - Typically with associated state model
  - Managed objects stored in common information model
- Administrative operations
  - X.731 style administrative operations for resources which affect availability
- Notifications
  - Automatically generated by AIS services for significant system events (alarms, state changes, etc.)
9
Resource Availability Management Services
Availability Management Framework (AMF)
- Manages the lifecycle and monitors the state of the managed applications within the system
- More detail in upcoming slides
Cluster Membership (CLM)
- Provides cluster membership change notifications to AIS services and interested applications
- OpenSAF CLM implements a cluster management protocol dealing with:
  - Cluster formation
  - Active controller selection & failover
  - Node failure detection
Platform Management (PLM)
- Manages state of modeled hardware elements and execution environments (operating system instances)
- Hardware element states and events accessed through Hardware Platform Interface (HPI)
- Manages graceful blade extraction / de-activation cases
- Supports hardware element controls (power on/off and reset)
- Optional service within OpenSAF
10
Availability Management Framework (AMF) AMF Logical Entities
Structural Entities
- AMF Application
  - Represents the highest-level service(s) provided by the system
  - Contains 1..* Service Groups
- Service Group (SG)
  - Represents a group of like logical resources that provide the same service(s)
  - Associated redundancy model (e.g. 1+1)
  - Contains 1..* Service Units
- Service Unit (SU)
  - Aggregates a set of resources which when combined provide a higher-level service
  - Contains 1..* Components
- Component
  - Represents one or more resources that perform a function within the system
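The containment hierarchy on this slide (application, service groups, service units, components, each level holding one or more of the next) can be pictured with plain data classes. This is a rough sketch under stated assumptions: the class names, fields, and the `validate` helper are hypothetical and are not the AMF information model classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    name: str


@dataclass
class ServiceUnit:
    name: str
    components: List[Component] = field(default_factory=list)


@dataclass
class ServiceGroup:
    name: str
    redundancy_model: str  # free-form label here, e.g. "1+1"
    service_units: List[ServiceUnit] = field(default_factory=list)


@dataclass
class AmfApplication:
    name: str
    service_groups: List[ServiceGroup] = field(default_factory=list)


def validate(app):
    """Return a list of violations of the 1..* containment rules."""
    problems = []
    if not app.service_groups:
        problems.append(f"application {app.name} has no service group")
    for sg in app.service_groups:
        if not sg.service_units:
            problems.append(f"service group {sg.name} has no service unit")
        for su in sg.service_units:
            if not su.components:
                problems.append(f"service unit {su.name} has no component")
    return problems
```

For example, an application with one service group whose only service unit is empty would yield a single "has no component" problem.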
11
Availability Management Framework (AMF) AMF Logical Entities
Workload Entities
- Service Instance (SI)
  - Represents a workload to be supported by the system
  - Has associated redundancy requirements (1+1, N+M, etc.)
  - Protected by an identified SG
  - Assigned to one or more SUs with an HA state of active, standby, quiescing or quiesced
- Component Service Instance (CSI)
  - Represents a more granular workload that needs to be supported by the system
  - Assigned to one or more components
[Diagram: within an AMF Application, each Service Group protects 1..* Service Instances; Service Instances are assigned to Service Units and Component Service Instances are assigned to Components]
12
Availability Management Framework (AMF) AMF Logical Entities
Common Characteristics
- Well-defined state model for each logical entity type
- X.731 style administrative operations
Common AMF Component Types
- SA-aware
  - Applications modified to interact with AMF through AMF API
- Non-proxied, non-SA-aware
  - Legacy or 3rd party applications that typically cannot be modified
  - Interact with AMF through command line scripts to manage application lifecycle
  - Always assigned active HA state if running
- Proxied, non-SA-aware
  - Applications that have knowledge of HA concepts but do not directly communicate with AMF
  - Proxy application receives HA “commands” from AMF and forwards them to proxied application through a custom interface
[Diagrams: an SA-aware component process receives lifecycle mgmt and HA state assignment from AMF through the AMF library; a non-proxied component process receives lifecycle mgmt only; a proxy component receives its own HA state assignment plus the proxied component's lifecycle mgmt and HA state assignment requests, and applies them to the proxied component process]
13
Availability Management Framework (AMF) Service Group Redundancy Models
2N
- Most common redundancy model
- Preferred assignment model per SI:
  - 1 active resource
  - 1 standby resource
- SUs can have either all active or all standby SI assignments
- A.k.a. 1+1, active-standby, active-backup
N+M
- Both N and M are configurable
- Common variation: N+1
[Diagrams: example SI assignments (A = active, S = standby), first with SI1 across SU1 and SU2 on Node1 and Node2, then with SI1 and SI2 across SU1, SU2 and SU3 on Node1, Node2 and Node3]
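As a minimal sketch of the active-standby model described above, the following assigns every SI an active role on one SU and a standby role on another, then promotes the standby when the active SU fails. The function names and the dictionary layout are assumptions made for illustration; real AMF assignment logic takes many more inputs into account.

```python
def assign_active_standby(service_instances, service_units):
    """One SU takes all active assignments, another takes all standby ones."""
    if len(service_units) < 2:
        raise ValueError("active-standby needs at least two service units")
    active_su, standby_su = service_units[0], service_units[1]
    return {si: {active_su: "active", standby_su: "standby"}
            for si in service_instances}


def fail_over(assignments, failed_su):
    """Drop the failed SU; promote the survivor where the active was lost."""
    result = {}
    for si, per_su in assignments.items():
        survivors = {su: state for su, state in per_su.items()
                     if su != failed_su}
        if per_su.get(failed_su) == "active":
            survivors = {su: "active" for su in survivors}
        result[si] = survivors
    return result


before = assign_active_standby(["SI1"], ["SU1", "SU2"])
after = fail_over(before, "SU1")
print(before)  # SI1 is active on SU1 and standby on SU2
print(after)   # SI1 is active on SU2 only
```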
14
Availability Management Framework (AMF) Service Group Redundancy Models
No redundancy
- Preferred assignment model per SI:
  - 1 active resource
- Similar to an N+0 redundancy scheme where N is the number of protected SIs
N-way
- Y standby resources (where Y is configurable)
- SUs can concurrently have both active and standby assignments
N-way Active
- X active resources (where X is configurable)
- No standby resource
[Diagrams: example SI assignments (A = active, S = standby) for each model, with SI1 and SI2 across SU1 and SU2 on Node1 and Node2, and SI1 across SU1, SU2 and SU3 on Node1, Node2 and Node3]
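To make the N-way Active idea concrete, here is a small sketch that gives each SI X active assignments and no standby. The rotation used to spread SIs over SUs is an assumption chosen for the example, and `assign_n_way_active` is a hypothetical helper, not part of any AMF interface.

```python
def assign_n_way_active(service_instances, service_units, x):
    """Give each SI `x` active assignments and no standby assignment."""
    if not 1 <= x <= len(service_units):
        raise ValueError("x must be between 1 and the number of SUs")
    assignments = {}
    for i, si in enumerate(service_instances):
        # Assumed placement rule: start at a different SU for each SI.
        chosen = [service_units[(i + k) % len(service_units)]
                  for k in range(x)]
        assignments[si] = {su: "active" for su in chosen}
    return assignments
```

With `x = 1` and two SUs, SI1 lands on SU1 and SI2 on SU2, each with a single active assignment.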
15
Availability Management Framework (AMF) Error Recovery Policies
Pre-defined AMF component error recovery policies
- Configurable
- Can be overridden at runtime
Recovery policy scopes
- Component
- Service Unit
- Node
Recovery policy types
- Restart
- Failover
- Failfast
Up to 3 actions per policy
- Isolation
- Recovery
- Repair
Error escalation policies
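Error escalation can be pictured as widening the recovery scope when errors keep recurring. The sketch below is a deliberately simplified assumption: it escalates after a fixed number of errors per scope, whereas a real deployment would rely on AMF's configured escalation settings. `EscalationTracker` and its threshold are hypothetical.

```python
SCOPES = ["component", "service unit", "node"]


class EscalationTracker:
    """Widen the recovery scope once a scope has seen too many errors."""

    def __init__(self, max_errors_per_scope=2):
        self.max_errors = max_errors_per_scope  # hypothetical threshold
        self.level = 0
        self.errors = 0

    def report_error(self):
        """Record one error and return the scope recovery should act on."""
        self.errors += 1
        if self.errors > self.max_errors and self.level < len(SCOPES) - 1:
            self.level += 1
            self.errors = 1  # this error counts against the new scope
        return SCOPES[self.level]


tracker = EscalationTracker(max_errors_per_scope=2)
history = [tracker.report_error() for _ in range(5)]
print(history)
```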
16
System Management Services Information Model Management (IMM)
Information Model Highlights
- Based on pre-defined object classes (including AIS classes)
- Holds both configuration and runtime objects
- Used by AIS services to store current configuration and runtime state info
- Can be used by applications as well
Object Management API
- Object class management
- Access object attribute values
- Search information model
- Configuration change requests
- Administrative operation invocation
Object Implementer API
- Runtime object management
- CCB validation and application
- Administrative operation handling
OpenSAF Implementation
- Persistence of information model managed through Persistence BackEnd (PBE) feature
- Replicated to multiple cluster nodes
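The "CCB validation and application" step listed above follows a validate-then-apply pattern: every change in a bundle is checked by the responsible implementer before any of them takes effect. The following toy model illustrates only that pattern; `InfoModel`, `apply_ccb`, and the validator callbacks are hypothetical and do not mirror the IMM API.

```python
class InfoModel:
    """Toy information model: object name -> attribute dictionary."""

    def __init__(self):
        self.objects = {}
        self.validators = []  # hypothetical object-implementer callbacks

    def apply_ccb(self, changes):
        """Validate every change first, then apply all of them or none."""
        for name, attrs in changes.items():
            for validate in self.validators:
                if not validate(name, attrs):
                    return False  # reject the whole bundle
        for name, attrs in changes.items():
            self.objects.setdefault(name, {}).update(attrs)
        return True


model = InfoModel()
# Example implementer rule (assumption): "size" must not be negative.
model.validators.append(lambda name, attrs: attrs.get("size", 0) >= 0)

rejected = model.apply_ccb({"a": {"size": 1}, "b": {"size": -1}})
accepted = model.apply_ccb({"a": {"size": 1}, "b": {"size": 2}})
```

The first bundle is rejected as a whole, so object `a` is not created until the second, fully valid bundle is applied.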
17
System Management Services Software Management Framework (SMF)
SMF controls migration from one deployment configuration to another
Upgrade methods
- Rolling upgrade
- Single step upgrade
[De-]Activation Unit Scope
- AMF Node
- Service Unit
During the migration, SMF
- Maintains the campaign state change model
- Takes measures to enable error recovery
- Monitors for potential errors caused by the migration
- Deploys error recovery procedures
[Diagram labels: Upgrade Campaign Definition (“Upgrade Instructions”); Software Management Framework; Adaptation commands (SMF config object); Install / remove software bundles on target nodes; Admin operations; Read/Create/Delete/Update objects; Software Repository; Information Model]
18
System Management Services
Notification (NTF)
- Publish-and-subscribe semantics for system-level notifications
- Reader interface for reading historical alarm info as well
- Formal syntax and semantics for ITU X.73x notifications:
  - Alarm / security alarm / state change / object create / delete / attribute change
- Used by AIS services to publish service-specific notifications
- Alarm and security alarm notifications automatically logged through LOG service
Log (LOG)
- Flexible, centralized, system-wide logging mechanism
- Pre-defined log streams: alarm, notification, system
- Supports multiple, custom application log streams
- Log streams are configurable on a per log stream basis
  - Including log file full action: halt, wrap, and rotate
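The three log file full actions named above differ in what happens to a record that arrives when the file is full. The sketch below models that difference with an in-memory list standing in for a log file; `LogStream`, its record-count capacity, and the exact wrap behavior (dropping the oldest record) are simplifying assumptions rather than the LOG service's actual behavior.

```python
class LogStream:
    """In-memory stand-in for a log stream with a bounded log file."""

    def __init__(self, capacity, full_action):
        if full_action not in ("halt", "wrap", "rotate"):
            raise ValueError(f"unknown full action: {full_action}")
        self.capacity = capacity        # max records per file (assumption)
        self.full_action = full_action
        self.current = []               # records in the current file
        self.archived = []              # closed files (rotate only)

    def write(self, record):
        """Store a record; return False if the stream refused it."""
        if len(self.current) >= self.capacity:
            if self.full_action == "halt":
                return False            # stop logging when full
            if self.full_action == "wrap":
                self.current.pop(0)     # overwrite the oldest record
            else:                       # rotate: start a fresh file
                self.archived.append(self.current)
                self.current = []
        self.current.append(record)
        return True
```

Writing records `a`, `b`, `c` to a stream of capacity 2 refuses `c` under halt, keeps `b` and `c` under wrap, and under rotate archives the file holding `a` and `b` before storing `c` in a new one.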
19
Application Services Checkpoint (CKPT)
Intended as a state replication mechanism for distributed applications
Can be used for all standby “temperature levels”
- Cold
- Warm
- Hot (through OpenSAF CKPT service API extension)
Semantics of a checkpoint
- Arbitrary set of sections containing opaque data
- Stored in one or more replicas distributed across cluster
- Reads and writes occur against the active replica
- Both synchronous and asynchronous replication options available
- Collocated checkpoint option provided for highest performance
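The checkpoint semantics above (sections of opaque data, several replicas, reads and writes against the active replica, synchronous or asynchronous replication) can be modeled in a few lines. This is a conceptual sketch only: the `Checkpoint` class, its `flush` method, and the choice of the first node as active replica are assumptions, not the CKPT API.

```python
class Checkpoint:
    """Toy checkpoint: named sections of opaque bytes, kept in replicas."""

    def __init__(self, replica_nodes, synchronous=True):
        self.replicas = {node: {} for node in replica_nodes}
        self.active = replica_nodes[0]   # assumption: first node is active
        self.synchronous = synchronous
        self.pending = []                # updates not yet replicated

    def write(self, section_id, data):
        """Write to the active replica, then replicate now or later."""
        self.replicas[self.active][section_id] = data
        if self.synchronous:
            self._propagate(section_id, data)
        else:
            self.pending.append((section_id, data))

    def read(self, section_id):
        """Reads are always served by the active replica."""
        return self.replicas[self.active][section_id]

    def flush(self):
        """Push queued asynchronous updates to the other replicas."""
        for section_id, data in self.pending:
            self._propagate(section_id, data)
        self.pending.clear()

    def _propagate(self, section_id, data):
        for node, sections in self.replicas.items():
            if node != self.active:
                sections[section_id] = data
```

In asynchronous mode a standby replica may briefly lag the active one, which is the trade-off against the lower write latency.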
20
Application Services Event (EVT) Message (MSG) Lock (LCK)
Event (EVT)
- Publish-and-subscribe communication paradigm
- Flexible event channel, pattern, and filtering definition
- Subscriber event queue maintained within app process
Message (MSG)
- Messages sent to and read from message queues
- Single message queue owner at a time
- Message queue maintained outside app process
- Message queues can be logically grouped
  - Messages can be sent to a message queue group
  - Associated distribution policy (round-robin, broadcast, etc.)
Lock (LCK)
- Cluster-wide, distributed lock service
- Can be used to control access to cluster-level shared resources
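The message queue group distribution policies mentioned above can be illustrated with a small model in which a group either hands each message to the next queue in turn (round-robin) or copies it to every queue (broadcast). `QueueGroup` and its methods are hypothetical names for this sketch and do not correspond to MSG service calls.

```python
from collections import deque
from itertools import cycle


class QueueGroup:
    """Toy message queue group with two distribution policies."""

    def __init__(self, queue_names):
        self.queues = {name: deque() for name in queue_names}
        self._next = cycle(list(queue_names))  # round-robin order

    def send_round_robin(self, message):
        """Deliver to the next queue in turn; return the queue chosen."""
        target = next(self._next)
        self.queues[target].append(message)
        return target

    def send_broadcast(self, message):
        """Deliver a copy of the message to every queue in the group."""
        for queue in self.queues.values():
            queue.append(message)


group = QueueGroup(["q1", "q2"])
targets = [group.send_round_robin(m) for m in ("a", "b", "c")]
group.send_broadcast("all")
```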
21
Getting Started with OpenSAF
OpenSAF Technical Educational Resources
- Developer Wiki
- OpenSAF Developers blog
- OpenSAF mailing lists (subscribe; archives for Users, Announce, and Development)
- Latest documentation
- FAQ
- README files in source code repository
22
Questions