FailSafe SGI’s High Availability Solution Mayank Vasa MTS, Linux FailSafe Gatekeeper


1 FailSafe: SGI’s High Availability Solution
Mayank Vasa, MTS, Linux FailSafe Gatekeeper
vasa@sgi.com

2 FailSafe - What is it?
High availability for business-critical applications at a low cost
User-level software running in a clustered environment, providing
– single point of failure recovery
– cluster administration services GUI
– a simple way to make applications HA aware

3 FailSafe - What it looks like

4 FailSafe - Terminology
Node: a single Linux image
Cluster: one or more nodes connected via some interconnect
Pool: entire set of nodes involved with a group of clusters
Node Membership: list of nodes in a cluster on which FailSafe can allocate resource groups

5 FailSafe - Terminology (contd.)
Process Membership: list of process instances in a cluster which form a process group
Resource: a single physical or logical entity
Resource Group: collection of interdependent resources
– cannot overlap
– behaves like an atomic unit of failover
– must have a unique name throughout the cluster
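
Sketch (not from the slides): the two resource group rules above, no overlap and unique names, can be checked mechanically. The `ResourceGroup` class, the `validate_groups` function and the resource strings are hypothetical names chosen for illustration; FailSafe's own data model is not shown in this talk.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceGroup:
    # Hypothetical model: a named collection of resource identifiers.
    name: str
    resources: set = field(default_factory=set)


def validate_groups(groups):
    """Return a list of problems: duplicate names or overlapping resources."""
    problems = []
    seen_names = set()
    owner = {}  # resource -> name of the group that already holds it
    for group in groups:
        if group.name in seen_names:
            problems.append(f"duplicate group name: {group.name}")
        seen_names.add(group.name)
        for res in sorted(group.resources):
            if res in owner and owner[res] != group.name:
                problems.append(f"{res} is in both {owner[res]} and {group.name}")
            else:
                owner[res] = group.name
    return problems
```

An empty result means the groups satisfy both rules; each string in the result names one violation.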

6 FailSafe - Terminology (contd.)
Failover: process of moving a resource group from one node to another
Failover Policy: method used by FailSafe to determine the destination node of a failover
Failover Domain: ordered list of nodes on which a given resource group can be allocated

7 FailSafe - Terminology (contd.)
Failover Attributes: Auto Failback, Controlled Failback, InPlace Recovery
Failover policy script: shell script which generates an ordered set of node names on which the resource group can be placed
Action scripts: scripts which determine how a resource is started, stopped and monitored
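
Sketch (not from the slides): one simple failover policy is to keep the failover domain's order and skip nodes that are not current members. Real FailSafe policy scripts are shell scripts with their own interface; the Python functions `ordered_policy` and `pick_destination` below are hypothetical and only illustrate the idea of an ordered candidate list.

```python
def ordered_policy(failover_domain, members):
    """Keep the domain order, dropping nodes that are not current members."""
    return [node for node in failover_domain if node in members]


def pick_destination(failover_domain, members, failed_node=None):
    """Return the first usable node, or None if no node can take the group."""
    candidates = [
        node for node in ordered_policy(failover_domain, members)
        if node != failed_node
    ]
    return candidates[0] if candidates else None
```

With a domain of n1, n2, n3 and all three nodes up, a failure on n1 would send the resource group to n2 under this policy.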

8 FailSafe - Architecture
Cluster Administration Services (CAS) {CAD, CDBD, CDB}
Cluster Infrastructure (CI) {CMS, GCS, SRM, CRS}
FailSafe Cluster Manager GUI and CLI

9 FailSafe - Acronyms (so many!)
CMS = Cluster Membership Service
GCS = Group Communication Service
SRM = System Resource Manager
CRS = Cluster Reset Service
CAD = Cluster Administration Daemon
CDB = Cluster Database
CDBD = Cluster Database Daemon

10 FailSafe - Cluster Database
Repository for all cluster configuration
Dynamic changes supported
Consistency is automatically supported
Replicated in all nodes of the pool
Provides read and write transactional semantics

11 FailSafe - Cluster Database Daemon
Controls read and write accesses to the CDB
Notifies clients of dynamic changes to the CDB
Keeps global portions of the CDB consistent across the pool

12 FailSafe - Cluster Administration Daemon
Daemon responsible for dynamically updating the GUI
CAD is a client of CDBD
CDBD notifies CAD of any changes
Provides notification (default = email) of status changes in node, cluster or resource groups
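
Sketch (not from the slides): the CDBD-to-CAD relationship resembles a publish/subscribe store, where all reads and writes go through one component and registered clients are told about changes. The `ConfigStore` class is a hypothetical single-process stand-in; it does not model replication across the pool or transactions.

```python
class ConfigStore:
    """Toy configuration store that notifies subscribers when a value changes."""

    def __init__(self):
        self._data = {}
        self._subscribers = []

    def subscribe(self, callback):
        # callback(key, old_value, new_value) is called on every real change.
        self._subscribers.append(callback)

    def read(self, key):
        return self._data.get(key)

    def write(self, key, value):
        old = self._data.get(key)
        self._data[key] = value
        if old != value:  # writing the same value again triggers no notification
            for callback in self._subscribers:
                callback(key, old, value)
```

A GUI-updating client in the role of CAD would register a callback with `subscribe` and refresh its view whenever the callback fires.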

13 FailSafe - Cluster Membership Service
Provides cluster node membership information to its clients
Node membership information includes
– nodes that are currently part of the cluster
– node status, i.e. up, down or unknown
– node name
– IP address currently being used for inter-CMSD communication
Inactive cluster node membership information is also provided

14 FailSafe - Cluster Membership Service (contd.)
Any change in cluster status results in a node membership message issued by CMSD to its clients on all nodes of the cluster
CMSD implements failstop and quorum policy
CMSDs monitor each other by exchanging heartbeat messages directly with each other
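
Sketch (not from the slides): heartbeat monitoring and a quorum check can be illustrated with a small tracker. The timeout value and the strict-majority quorum rule are assumptions made for this example; the slides do not say which quorum rule CMSD applies. `HeartbeatTracker` is a hypothetical name.

```python
class HeartbeatTracker:
    """Tracks the last heartbeat time per node and derives up/down/unknown."""

    def __init__(self, nodes, timeout):
        self.timeout = timeout
        # None means no heartbeat has been received from the node yet.
        self.last_seen = {node: None for node in nodes}

    def record(self, node, now):
        self.last_seen[node] = now

    def status(self, node, now):
        seen = self.last_seen[node]
        if seen is None:
            return "unknown"
        return "up" if now - seen <= self.timeout else "down"

    def has_quorum(self, now):
        # Assumed rule: strictly more than half of the configured nodes are up.
        up = sum(1 for node in self.last_seen if self.status(node, now) == "up")
        return up > len(self.last_seen) / 2
```

Times are passed in explicitly so the logic can be exercised without a real clock or network.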

15 FailSafe - Group Communication Service
Provides a consistent view of process group memberships in the presence of process failures, new processes joining, and changing node memberships
Provides a reliable, ordered, atomic messaging service to members of the process group under changing node and group memberships
GCS operates in the context of a cluster as defined by CMS

16 FailSafe - System Resource Manager
Manages the resources and resource groups in a cluster
Coordinates access to physically shared resources
Monitors availability of resources
Performs local failover of resources
Maps a set of resources into a resource group
Atomically allocates resource groups

17 FailSafe - Failsafe Daemon
A policy implementor for Resource Groups (RG)
Provides the ability to enable or disable monitoring of an application dynamically
Provides the ability to fail over an application if monitoring fails
Failover can be either local (restart) or remote

18 FailSafe - Failsafe Daemon (contd.)
Failover Policy Module (PM)
PM’s components
– Failover script
– Initial Failover Domain
– Attributes

19 FailSafe - Cluster Reset Service
Provides a reset facility in a cluster upon request from one of its clients
Provides a facility to monitor each reset line that connects to a machine that it is expected to reset
Uses a special reset network to ensure connectivity for resetting remote machines

20 FailSafe - Agents
Glue between a resource type and the Failsafe daemon
Collection of action scripts and the binaries that those scripts may call
Goal: make a resource a highly available service
Examples: a file server agent, a web server agent, an agent for making an IP address, a filesystem or a volume highly available

21 FailSafe - Action Scripts
Determine how a resource is started, stopped and monitored
Action scripts are per resource type
Types: start, stop, monitor, exclusive, restart
Return a status for each resource acted on
Called by SRM
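
Sketch (not from the slides): the action script contract, one handler per action type returning a status for each resource, can be modelled as a small dispatcher. Real FailSafe action scripts are shell scripts invoked by SRM; the `InMemoryAgent` class, the `run_action` function and the "ok"/"failed" status strings are hypothetical. The meaning given to `exclusive` here (succeed only if the resource is not already running) is an assumption.

```python
ACTIONS = ("start", "stop", "monitor", "exclusive", "restart")


class InMemoryAgent:
    """Toy agent for one resource type; tracks which resources are running."""

    def __init__(self):
        self.running = set()

    def start(self, res):
        self.running.add(res)
        return "ok"

    def stop(self, res):
        self.running.discard(res)
        return "ok"

    def monitor(self, res):
        return "ok" if res in self.running else "failed"

    def exclusive(self, res):
        # Assumed meaning: succeed only if the resource is not running here.
        return "ok" if res not in self.running else "failed"

    def restart(self, res):
        self.stop(res)
        return self.start(res)


def run_action(agent, action, resources):
    """Run one action type over several resources; return a status per resource."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    handler = getattr(agent, action)
    return {res: handler(res) for res in resources}
```

Returning a per-resource status map mirrors the slide's point that a script reports on each resource it acted on, so a caller can react to a single failed resource.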

22 FailSafe - Related HA Technologies
A journaled file system for fast recovery
FailSafe can support multiple journaled filesystems such as XFS, GFS, ext3fs
Volume manager for disk failures (lvm)
Network mirroring
Monitoring tool (mon)

23 FailSafe - Docs, Contacts
Documentation: http://oss.sgi.com/projects/failsafe/
Contact: failsafe@oss.sgi.com

24 FailSafe - Q & A
Questions - Sure!
Answers … Well, maybe :)

