High Availability Linux (HA Linux)


Reliable services: other techniques
Other techniques for reliable services:
  - High Availability Linux (HA Linux)
  - Hot standby machines

Reliable services: other techniques
Goal: if a machine providing a service becomes unavailable, another one takes over, transparently for the users.
Unavailability may be caused by:
  - scheduled system maintenance
      - OS upgrades
      - intrusive security fixes
      - preventive hardware interventions
      - network interventions
  - unscheduled system failures
      - hardware failures
      - power failures
      - network (e.g. switch) failures

Reliable services: other techniques
Distinguish two different service types:
  - Stateless services (no local state data)
      - state data is stored elsewhere, e.g. in an external database
      - no local data, so another machine can simply take over
      - load balancing may be the better choice
  - Other services
      - local data represents the state of the service
      - this data needs to be shared somehow
      - worst case: state information is kept in memory

Reliable services: other techniques
How to share state data:
  - keep up-to-date data on external storage
  - keep a copy of the state data on external storage
  - use an appropriate external database
  - share this external storage between different machines
  - keep static data in an external configuration database
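Keeping a copy of the state data on shared storage could look roughly like the sketch below. It is only an illustration: the file name `service_state.json`, the functions `save_state` and `load_state`, and the example state are all hypothetical, and a temporary directory stands in for the shared external storage.

```python
import json
import os
import tempfile


def save_state(shared_dir, state):
    """Write the service state to the shared directory (hypothetical helper)."""
    final_path = os.path.join(shared_dir, "service_state.json")
    # Write to a temporary file first so a reader never sees a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=shared_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    # Rename over the old copy; both files live in the same directory.
    os.replace(tmp_path, final_path)
    return final_path


def load_state(shared_dir):
    """Read the last saved state; a machine taking over would call this."""
    final_path = os.path.join(shared_dir, "service_state.json")
    with open(final_path) as handle:
        return json.load(handle)


if __name__ == "__main__":
    # A temporary directory stands in for the shared external storage.
    with tempfile.TemporaryDirectory() as shared:
        save_state(shared, {"jobs_pending": 3, "last_event_id": 42})
        print(load_state(shared))
```

Writing to a temporary file and renaming it is one common way to avoid exposing a partially written state file; how well this holds on a network file system depends on that file system.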

Reliable services: other techniques
External storage solutions:
  - Network Attached Storage (NAS) solution
      - a separate machine hosts the disk
      - export via Network File System (NFS)
      - introduces a new SPOF (single point of failure)
  - Storage Area Network (SAN) solution
      - use Fibre Channel (or InfiniBand) based external storage
      - may require use of cluster file systems
      - can be designed in a fully redundant way
      - expensive! May become complex

Reliable services: Linux-HA
What is Linux-HA?
  - It means High Availability Linux
  - Open source product
  - Works on many OS flavours (Gentoo, Debian, SuSE, Mac OS X, ...)
  - About 100 downloads of the core component per day
Typical use cases:
  - Mail servers
  - Firewalls
  - File servers
  - DNS servers
  - DHCP servers
  - Proxy caching servers
See http://www.linux-ha.org/running

Reliable services: Linux-HA
How does Linux-HA work?
  - Main component: Heartbeat
  - Heartbeat needs one or more media paths
  - The death of a node is detected via the heartbeat
  - Master: runs the service
  - Slave: checks whether the master is alive
  - External storage for state data
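The idea of detecting a dead node over several media paths can be sketched as follows. This is not Heartbeat's implementation: the class `HeartbeatMonitor`, the parameter `dead_after` and the path names are hypothetical, and time is passed in explicitly to keep the example deterministic. The assumption made here is that the master counts as dead only when every path has been silent for longer than the timeout.

```python
class HeartbeatMonitor:
    """Tracks the last heartbeat seen on each media path (hypothetical helper)."""

    def __init__(self, paths, dead_after, start):
        self.dead_after = dead_after
        # Every path starts out as "seen" at the start time.
        self.last_seen = {path: start for path in paths}

    def record(self, path, now):
        """Note that a heartbeat from the master arrived on this path."""
        self.last_seen[path] = now

    def master_is_dead(self, now):
        """True only if all paths have been silent for longer than dead_after."""
        return all(now - seen > self.dead_after
                   for seen in self.last_seen.values())


if __name__ == "__main__":
    monitor = HeartbeatMonitor(["ethernet", "serial"], dead_after=10.0, start=0.0)
    monitor.record("ethernet", 5.0)
    print(monitor.master_is_dead(12.0))  # False: ethernet was heard 7 s ago
    print(monitor.master_is_dead(16.0))  # True: both paths silent for more than 10 s
```

Requiring silence on all paths is the reason for having more than one media path: a single broken cable then does not trigger a takeover.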

Reliable services: Linux-HA
How does Linux-HA work?
Simple modes of operation: Active-Passive
  - The master runs all services
  - The slave only checks whether the master is alive
  - The slave does nothing else (no load balancing)
  - No service degradation if the master goes down

Reliable services: Linux-HA
How does Linux-HA work?
Simple modes of operation: Active-Active
  - Both master and slave run services
  - In case of a failure the surviving node takes over all services
  - The service becomes degraded after a failover
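The difference between the two modes can be illustrated with a small sketch of what happens to the service assignment when a node fails. The function `fail_over`, the node names and the service names are hypothetical and chosen only for this example.

```python
def fail_over(assignments, failed_node):
    """Return a new node -> services mapping after failed_node has gone down.

    The services of the failed node are handed to the surviving nodes in turn.
    """
    survivors = [node for node in assignments if node != failed_node]
    if not survivors:
        raise RuntimeError("no surviving node left to take over")
    result = {node: list(assignments[node]) for node in survivors}
    for index, service in enumerate(assignments[failed_node]):
        target = survivors[index % len(survivors)]
        result[target].append(service)
    return result


if __name__ == "__main__":
    # Active-Passive: the slave is idle, so after the takeover it carries
    # exactly the load the master carried before.
    active_passive = {"master": ["mail", "dns"], "slave": []}
    print(fail_over(active_passive, "master"))

    # Active-Active: the survivor runs its own services plus the ones it
    # takes over, which is why the service becomes degraded.
    active_active = {"node_a": ["mail"], "node_b": ["dns"]}
    print(fail_over(active_active, "node_a"))
```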

Reliable services: Linux-HA
Linux-HA remarks:
  - Box switching
      - initiated by Heartbeat if the service goes down
      - or by the administrator, in case of scheduled interventions
  - The user sees only one IP address; the switch is completely transparent
  - Monitoring constraints can imply that the boxes need to be close together
      - usually on the same switch (which introduces a SPOF)

Reliable services: Linux-HA
Examples of Linux-HA in FIO:
  - VOMRS
      - data is kept in Oracle RAC DDB -> external storage
  - MyProxy
      - static data in the Quattor configuration database
      - other data synchronized via rsync
  - No cluster file systems are used for sharing data.

Reliable services: Hot Standby
Hot standby: built-in failover for some services
Example: LSF master
  - In Platform LSF any “server” machine is a possible master candidate.
  - There are no network restrictions and, unlike in HA Linux, no paths other than the standard network connection.
  - Failover is initiated by the application itself.
  - Note: each possible master needs access to the most recent event data files (!)

Reliable services: Hot Standby
Hot standby for LSF master: definitions
  - LSF server: any machine which runs LSF daemons
  - LSF master: the box which runs the master batch and scheduling daemons
      - there can be only one such machine in each LSF instance
  - LSF master candidate: any LSF server which knows the current LSF state at any time
  - Note: it is possible to name possible master candidate nodes. At CERN we have 2.
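The relation between master candidates and the single master can be sketched as below. This is not LSF's actual election logic: it assumes an ordered list of named candidates in which the first reachable host becomes master, and the function `elect_master` and the host names are hypothetical.

```python
def elect_master(candidates, is_up):
    """Return the first reachable master candidate, or None if all are down.

    candidates: ordered list of host names (assumed priority order)
    is_up:      callable taking a host name and returning True if it is reachable
    """
    for host in candidates:
        if is_up(host):
            return host
    return None


if __name__ == "__main__":
    candidates = ["lsfmaster01", "lsfmaster02"]  # hypothetical host names

    # Normal operation: the first candidate is the master.
    print(elect_master(candidates, lambda host: True))

    # The first candidate has failed: the hot standby takes over.
    print(elect_master(candidates, lambda host: host != "lsfmaster01"))
```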

Reliable services: hot standby

Reliable services: hot standby
Remarks about the current LSF installation:
  - True redundancy would require (at least) 3 nodes. Too expensive ...
  - The two master nodes have identical hardware.
  - If the NFS server fails, the redundancy is gone, but the system will still work because the master keeps a local copy of the event data.
  - The hot standby machine is always empty. No load balancing.
  - A failover requires up to 10 minutes: the hot standby has to start additional daemons and read in the current status files.

Reliable services: summary
HA Linux:
  - Fast failure detection
  - Independent paths to check the master
  - Automatic failover
  - Independent of the application
  - Machines need to be close together (risk of SPOF)
      - possibly on the same switch (for cabling and networking)
  - Requires an extra IP address
Hot standby mechanism:
  - No restrictions on cabling (no SPOF)
  - Depending on the application, a DNS alias is useful
  - Needs to be supported by the application, so it is not a general solution
  - No independent failure detection possible
  - Requires a hot standby which is not used by other machines
  - No load balancing possible

Reliable services: Questions ?