High Availability Linux (HA Linux)

1 High Availability Linux (HA Linux)
Reliable services: other techniques
Other techniques for reliable services:
- High Availability Linux (HA Linux)
- Hot standby machines

2 Reliable services: other techniques
Goal: if a machine providing a service becomes unavailable, another one takes over, transparently for the users.
Unavailability may be caused by:
- scheduled system maintenance:
  - OS upgrades
  - intrusive security fixes
  - preventive hardware interventions
  - network interventions
- unscheduled system failures:
  - hardware failures
  - power failures
  - network failures (e.g. a switch)

3 Reliable services: other techniques
Distinguish two different service types:
- Stateless services (no local state data):
  - state data is, e.g., stored in an external database
  - with no local data, another machine can simply take over
  - load balancing may be the better choice
- Other services:
  - local data represents the state of the service
  - this data needs to be shared somehow
  - worst case: the state information is kept in memory

4 How to share state data:
Reliable services: other techniques
How to share state data:
- keep up-to-date data on external storage
- keep a copy of the state data on external storage
- use an appropriate external database
- share this external storage between different machines
- keep static data in an external configuration database
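The following is a minimal Python sketch of keeping state on external storage. The primary writes its state to a file on shared storage, and a takeover node reads that file back. The file name, the JSON format, and the functions `save_state`/`load_state` are hypothetical.

The write-to-temporary-file-then-rename pattern prevents a reader from ever seeing a half-written file on a local POSIX file system. Whether rename is equally safe on a particular NFS setup is an assumption to verify.

```python
import json
import os
import tempfile


def save_state(state: dict, path: str) -> None:
    """Write the service state so that readers never see a partial file."""
    directory = os.path.dirname(path) or "."
    # Write into a temporary file in the same directory first ...
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".state-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # force the data to disk before the rename
        # ... then rename it over the target (atomic on POSIX file systems).
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # do not leave temporary files behind
        raise


def load_state(path: str) -> dict:
    """Return the last saved state, or an empty state if none exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```

On failover, the node taking over would call `load_state` on the same shared path before starting the service. This is what makes a stateful service behave like a stateless one for takeover purposes.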

5 External storage solutions:
Reliable services: other techniques
External storage solutions:
- Network Attached Storage (NAS):
  - a separate machine hosts the disk and exports it via the Network File System (NFS)
  - introduces a new SPOF (single point of failure)
- Storage Area Network (SAN):
  - uses Fibre Channel (or InfiniBand) based external storage
  - may require the use of cluster file systems
  - can be designed in a fully redundant way
  - expensive, and may become complex

6 Reliable services: Linux-HA
What is Linux-HA?
- It stands for High Availability Linux
- Open source product
- Works on many OS flavours (Gentoo, Debian, SuSE, Mac OS X, ...)
- About 100 downloads of the core component per day
- Typical use cases:
  - mail servers
  - firewalls
  - file servers
  - DNS servers
  - DHCP servers
  - proxy caching servers
See

7 Reliable services: Linux-HA
How does Linux-HA work?
- Main component: Heartbeat
  - Heartbeat needs one or more media paths
  - the death of a node is detected via the heartbeat
- Master: runs the service
- Slave: checks whether the master is alive
- External storage for state data
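Death-of-node detection over several media paths can be sketched as follows. The peer counts as alive as long as any path has delivered a heartbeat recently, so losing one cable does not trigger a failover.

This is a simplified illustration, not Heartbeat's actual implementation. The class and method names and the path names (`eth1`, `serial0`) are hypothetical. The parameter name `deadtime` mirrors the setting of that name in Heartbeat's configuration, but the semantics here are only an approximation.

```python
import time
from typing import Callable, Dict


class HeartbeatMonitor:
    """Declare the peer dead when no path delivered a heartbeat within deadtime."""

    def __init__(self, deadtime: float, clock: Callable[[], float] = time.monotonic):
        self.deadtime = deadtime
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_seen: Dict[str, float] = {}  # media path -> time of last heartbeat

    def heartbeat_received(self, path: str) -> None:
        # Called whenever a heartbeat arrives on one media path
        # (e.g. a dedicated network interface or a serial line).
        self.last_seen[path] = self.clock()

    def peer_alive(self) -> bool:
        # Alive if ANY path saw a heartbeat within the last `deadtime` seconds.
        # Before the first heartbeat arrives, the peer is reported as not alive.
        now = self.clock()
        return any(now - t <= self.deadtime for t in self.last_seen.values())
```

The slave would poll `peer_alive()` periodically and start the takeover once it returns `False`. A real deployment also needs a grace period at startup, which this sketch omits.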

8 Reliable services: Linux-HA
How does Linux-HA work? Simple modes of operation:
Active–Passive:
- the master runs all services
- the slave only checks whether the master is alive
- the slave does nothing else (no load balancing)
- no service degradation if the master goes down
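The active–passive placement rule can be written as a small function. The whole service set lives on exactly one node, so after a failover the slave runs the same load the master did. That is why there is no degradation, assuming identical hardware.

The function name and the node labels "master"/"slave" are hypothetical.

```python
from typing import Dict, List


def active_passive(services: List[str], master_alive: bool) -> Dict[str, List[str]]:
    """Return the services each node runs in an active-passive pair."""
    if master_alive:
        # Normal operation: the slave only monitors the master and stays idle.
        return {"master": list(services), "slave": []}
    # Failover: the slave takes over the complete service set.
    return {"master": [], "slave": list(services)}
```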

9 Reliable services: Linux-HA
How does Linux-HA work? Simple modes of operation:
Active–Active:
- both master and slave run services
- in case of a failure, the surviving node takes over all services
- the service becomes degraded after a failover
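In active–active mode every node has its own "home" services, and the survivors absorb the services of a failed node. The following sketch makes the resulting degradation visible as extra services per surviving node.

The function name and the rule "move each orphaned service to the least-loaded survivor" are illustrative assumptions, not Linux-HA's placement policy. With the two-node setup described here, the single survivor simply receives everything.

```python
from typing import Dict, List, Set


def active_active(home: Dict[str, List[str]], alive: Set[str]) -> Dict[str, List[str]]:
    """Place services in an active-active cluster after some nodes failed.

    `home` maps each node to the services it runs in normal operation.
    """
    if not alive:
        raise RuntimeError("no node alive: all services are down")
    # Surviving nodes keep their own services.
    placement = {node: list(svcs) for node, svcs in home.items() if node in alive}
    for node in alive:
        placement.setdefault(node, [])
    # Services whose home node is down are "orphans" that must move.
    orphans = [s for node, svcs in home.items() if node not in alive for s in svcs]
    for service in orphans:
        # Pick the survivor with the fewest services (ties broken by name).
        target = min(placement, key=lambda n: (len(placement[n]), n))
        placement[target].append(service)
    return placement
```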

10 Reliable services: Linux-HA
Linux-HA remarks:
- Box switching:
  - initiated by Heartbeat, if the service goes down
  - or by the administrator, in case of scheduled interventions
- The user sees only one IP address, so the switch is completely transparent
- Monitoring constraints can imply that the boxes need to be close together
  - usually on the same switch (which introduces a SPOF)
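The following sketch models the "one IP address" idea and the two triggers for box switching. Exactly one node owns the service address at a time, and ownership moves either automatically on a heartbeat failure or manually by the administrator.

The class, its methods, and the example address are hypothetical. 192.0.2.10 is a documentation-only address. The sketch only records the moves; it does not configure any network interface.

```python
from typing import List


class ServiceAddress:
    """The single IP address clients use; exactly one node holds it at a time."""

    def __init__(self, ip: str, nodes: List[str]):
        self.ip = ip
        self.nodes = nodes
        self.owner = nodes[0]
        self.log: List[str] = []

    def _move_to(self, node: str, reason: str) -> None:
        # A real cluster would release the address on the old node and bring
        # it up (and announce it, e.g. via gratuitous ARP) on the new one;
        # here the move is only recorded.
        self.log.append(f"{self.ip}: {self.owner} -> {node} ({reason})")
        self.owner = node

    def on_heartbeat_failure(self, dead_node: str) -> None:
        """Automatic switch, initiated by the failure detector (two-node pair)."""
        if dead_node == self.owner:
            survivor = next(n for n in self.nodes if n != dead_node)
            self._move_to(survivor, "heartbeat")

    def admin_switch(self, target: str) -> None:
        """Manual switch, e.g. before scheduled maintenance on the current owner."""
        if target != self.owner:
            self._move_to(target, "administrator")
```

Because clients only ever connect to `ip`, neither kind of switch changes anything on the client side.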

11 Reliable services: Linux-HA
Examples of Linux-HA in FIO:
- VOMRS:
  - data is kept in an Oracle RAC database (external storage)
- MyProxy:
  - static data in the Quattor configuration database
  - other data synchronized via rsync
- No cluster file systems are used for sharing data.

12 Reliable services: Hot Standby
Hot standby: built-in failover for some services
Example: LSF master
- In Platform LSF, any "server" machine is a possible master candidate.
- There are no network restrictions, and unlike HA Linux no paths other than the standard network connection are needed.
- Failover is initiated by the application itself.
- Note: each possible master needs access to the most recent event data files (!)
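Why must every candidate see the most recent event data? One way to see it is to model how a new master rebuilds its state by replaying an event log.

The event format (event type, job id) and the three event types below are hypothetical simplifications. They are not LSF's real event file format.

```python
from typing import Dict, Iterable, Tuple

Event = Tuple[str, str]  # (event type, job id): a hypothetical, simplified format


def replay_events(events: Iterable[Event]) -> Dict[str, str]:
    """Rebuild the batch system state (job id -> status) from an event log.

    A candidate taking over as master can only reconstruct the current
    state if it reads the complete log; missing trailing events would
    silently give it a stale view of the jobs.
    """
    status_after = {"submit": "PENDING", "start": "RUNNING", "finish": "DONE"}
    jobs: Dict[str, str] = {}
    for kind, job_id in events:
        if kind not in status_after:
            raise ValueError(f"unknown event type: {kind}")
        jobs[job_id] = status_after[kind]
    return jobs
```

Replaying a truncated log still "works" but yields the wrong state. For example, a job that already finished would be seen as running.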

13 Reliable services: Hot Standby
Hot standby for LSF master: definitions
- LSF server: any machine which runs LSF daemons
- LSF master: the box which runs the master batch and scheduling daemons
  - there can be only one such machine in each LSF instance
- LSF master candidate: any LSF server which knows the current LSF state at any time
- Note: it is possible to name the master candidate nodes explicitly. At CERN we have 2.
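With a named list of master candidates, "only one master" can be kept with a deterministic rule: the first live host in the configured order becomes master.

This rule and the host names are assumptions for illustration. The exact election logic LSF uses is not described in these slides.

```python
from typing import List, Optional, Set


def elect_master(candidates: List[str], alive: Set[str]) -> Optional[str]:
    """Pick the master as the first live host in the configured candidate order.

    Hosts that evaluate this rule with the same view of `alive` all reach the
    same answer, which preserves the one-master-per-instance invariant.
    """
    for host in candidates:
        if host in alive:
            return host
    return None  # no candidate is alive: the instance has no master
```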

14 Reliable services: hot standby

15 Reliable services: hot standby
Remarks about the current LSF installation:
- True redundancy would require (at least) 3 nodes, which is too expensive.
- The two master nodes have identical hardware.
- If the NFS server fails, the redundancy is gone, but the system still works because the master keeps a local copy of the event data.
- The hot standby machine is always empty: no load balancing.
- A failover takes up to 10 minutes: the hot standby has to fire up additional daemons and read in the current status files.
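The NFS remark can be illustrated with a master that appends each event both to a local file and to the shared copy. If the shared write fails, the master keeps running from its local copy, but the standby's copy now has a gap, so the redundancy is lost.

The class, its attribute names, and the file paths are hypothetical. This is not how LSF writes its event files.

```python
import os


class EventLog:
    """Append events to a local copy on the master and to a shared (NFS) copy."""

    def __init__(self, shared_path: str, local_path: str):
        self.shared_path = shared_path
        self.local_path = local_path
        self.redundant = True  # False once the shared copy missed an event

    @staticmethod
    def _append_line(path: str, line: str) -> None:
        with open(path, "a") as f:
            f.write(line + "\n")
            f.flush()
            os.fsync(f.fileno())  # make sure the event is on disk before returning

    def append(self, line: str) -> None:
        # The local copy is written first; if this fails, the error propagates.
        self._append_line(self.local_path, line)
        try:
            self._append_line(self.shared_path, line)
        except OSError:
            # Shared storage unreachable: the service continues, but a standby
            # reading the shared copy can no longer take over with current data.
            self.redundant = False
```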

16 Reliable services: summary
HA Linux:
- fast failure detection
- independent paths to check the master
- automatic failover
- independent of the application
- machines need to be close together (risk of SPOF)
  - possibly on the same switch (for cabling and networking)
- requires an extra IP address
Hot standby mechanism:
- no restrictions on cabling (no SPOF)
- depending on the application, a DNS alias is useful
- needs to be supported by the application; not a general solution
- no independent failure detection possible
- requires a hot standby machine which is not used for anything else
- no load balancing possible

17 Reliable services: Questions
?

