Server Upgrade HA/DR Integration

Presented by: Tony Barnes & Arash K. Ardestani
April-May 2014

Introductions
Add intro information here

About this session
The goal of this session is to give LIS administrators an overview of the LIS systems that run under HA cluster support. In the following slides we will review the general elements of each cluster and how they interact with each other. We will also look at some of the risks and limitations of the cluster and how to minimize their impact.

Resources of Cluster - Dedicated
A typical SCC cluster consists of two AIX hosts, AKA nodes. By default each node has a few resources for itself:
- Its own network port/adapter (dedicated ports)
- Its own permanent IP address (AKA persistent address)
- Its own AIX operating system
- Its own dedicated disks and devices such as tape, CD, etc.

Resources of Cluster - Shared
Based on the purpose of the cluster, the two nodes share sets of resources, meaning both nodes can see them simultaneously:
- Multiple sets of disk drives (shared storage)
  Note: A special disk is used as a communication interface among the nodes
- Multiple sets of IP addresses (service IP addresses)
- A set of policies that define and govern the roles and responsibilities of the nodes within the cluster

Resource Groups & Policies
Resource group: A collection of resources of a node (IP, disk, devices) and their responsibilities
Example: Data volumes, filesystems, scripts, printing, etc.
Policy: A set of rules and regulations that exactly define the behavior of a resource group
Policies are broken down into two major sets:
- Events: An event is a set of conditions that results in a particular set of actions
- Actions: A set of one or more procedures that are performed based on a particular set of events

Resource Groups & Policies
Resource Group: Computer Repair Department (software tech, hardware tech, manager)
Policy: Find and fix problems with computers
Events: If a problem is found on a computer, determine whether it is a HW or SW problem
Actions: The manager creates a ticket and sends it to the proper tech for evaluation and repair

Resource Group: MAIN Application Functions, AUX Application Functions
Policy:
- MAIN node starts, runs, and controls the MAIN application functions
- AUX node starts, runs, and controls the AUX application functions
- MAIN asks for the health of AUX and AUX asks for the health of MAIN
Events: If MAIN does not get a response from AUX within 60 seconds
Actions: Then MAIN takes over AUX: it starts, runs, and controls the functions of the AUX node

Policy:
- MAIN node starts, runs, and controls the MAIN application functions
- AUX node starts, runs, and controls the AUX application functions
- MAIN asks for the health of AUX and AUX asks for the health of MAIN
Events: If AUX does not get a response from MAIN within 60 seconds
Actions: Then AUX takes over MAIN: it starts, runs, and controls the functions of the MAIN node
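The policy, event, and action above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the cluster software is implemented: the `Node` class, the function names, and the timestamps are all hypothetical, and only the 60-second timeout comes from the slides.

```python
from dataclasses import dataclass, field

HEARTBEAT_TIMEOUT_S = 60  # from the slides: no response within 60 seconds


@dataclass
class Node:
    """Hypothetical model of one cluster node (names are illustrative)."""
    name: str
    functions: set = field(default_factory=set)  # functions this node controls
    last_reply_from_peer: float = 0.0            # time (s) of the peer's last health reply


def peer_timed_out(node, now):
    """Event: the peer has not answered a health check within the timeout."""
    return now - node.last_reply_from_peer > HEARTBEAT_TIMEOUT_S


def take_over(survivor, failed):
    """Action: the survivor starts, runs, and controls the failed node's functions."""
    survivor.functions |= failed.functions
    failed.functions = set()  # the cluster's view: the failed node controls nothing


def evaluate(node, peer, now):
    """Apply the policy once; return True if a takeover happened."""
    if peer.functions and peer_timed_out(node, now):
        take_over(node, peer)
        return True
    return False


main = Node("MAIN", {"MAIN application functions"}, last_reply_from_peer=100.0)
aux = Node("AUX", {"AUX application functions"}, last_reply_from_peer=100.0)

evaluate(main, aux, now=130.0)  # 30 s of silence: no event, no action
evaluate(main, aux, now=161.0)  # 61 s of silence: MAIN takes over AUX
print(sorted(main.functions))
```

The same `evaluate` call with the arguments swapped models the opposite direction, where AUX takes over MAIN.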

Important Considerations
- A cluster solution does not translate to 100% uptime for users
- The goal is to minimize downtime and the manual effort needed to recover from single points of failure
- As with any technology, a cluster needs to be maintained and tested
- The initial design and implementation of a solution sets the tone for later
- Understand the purpose of the cluster and adjust your expectations
- Have redundancy for everything you possibly can; have a backup plan
- A cluster is only as robust as the infrastructure it is running on!

Limitations of Cluster
- A cluster cannot undo or correct a human mistake
- The proper resources must exist to build the cluster on
- There are conditions that cause the cluster to break
- A broken system with a cluster is much harder to fix
- There is still a short downtime when a failover event occurs
- Some applications do not play well with a cluster

Beyond HA Cluster
More and more often we have a big question on the table: What happens if all nodes of the cluster are unavailable or unable to work? This simply translates to complete downtime, and the cluster cannot do anything to help you! So what can you do to protect yourself against such a disaster? The answer is simply "Have a disaster recovery solution".

Types of Disaster Recovery
Cold DR: A set of servers is secured for the critical applications. The cold DR solution is not ready for immediate use. It is an empty shell waiting to be installed when a disaster is declared.
Method of recovery: Usually from backup tapes or a backup server
Warm DR: A set of servers is secured, installed, and maintained, ready to be used when a disaster is declared.
Method of recovery: Usually direct data replication in real time

Warm or Cold?
With the cost of hardware steadily and rapidly declining, organizations are more interested than ever in a disaster recovery solution. Ultimately, the decision about which DR solution is right for you comes down to two important factors:
RTO: Recovery Time Objective
In simple terms, RTO determines how long it will take to have a working system again. Simply: DOWNTIME!
RPO: Recovery Point Objective
How close to 100% the data is that you will recover and have to work with. Simply: DATA LOSS!
Here is a chart from an expert….

How to find RTO/RPO?
- Figure out what it will cost when you lose the LIS system for 1 hour
- Figure out how long you can be without the LAB system and still survive
- Figure out how much data you can lose and still survive
- Know the limitations of the applications that you are using
- Know the limitations of the infrastructure that you have in place
- Know the limitations of the staff and other dependent departments
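The questions above can be turned into a rough comparison. The sketch below is only an illustration: every number in it (hourly cost, recovery times, data loss windows, RTO and RPO targets) is a made-up placeholder, and the `DrOption` class and function names are hypothetical. Replace them with figures from your own site.

```python
from dataclasses import dataclass


@dataclass
class DrOption:
    """Hypothetical description of one DR option."""
    name: str
    recovery_hours: float   # time until a working system (compare with RTO)
    data_loss_hours: float  # window of recent data that is lost (compare with RPO)


def downtime_cost(cost_per_hour, hours_down):
    """Cost of losing the LIS system for the given number of hours."""
    return cost_per_hour * hours_down


def meets_objectives(option, rto_hours, rpo_hours):
    """True if the option recovers fast enough and loses little enough data."""
    return (option.recovery_hours <= rto_hours
            and option.data_loss_hours <= rpo_hours)


# Placeholder figures, not taken from the slides.
COST_PER_HOUR = 5000.0
RTO_HOURS = 4.0
RPO_HOURS = 1.0

options = [
    DrOption("Cold DR", recovery_hours=48.0, data_loss_hours=24.0),
    DrOption("Warm DR", recovery_hours=2.0, data_loss_hours=0.25),
]

for option in options:
    ok = meets_objectives(option, RTO_HOURS, RPO_HOURS)
    cost = downtime_cost(COST_PER_HOUR, option.recovery_hours)
    print(f"{option.name}: meets RTO/RPO = {ok}, downtime cost = {cost:.0f}")
```

With these placeholder figures only the warm option meets both targets; with looser targets the cold option may be enough, which is why the RTO and RPO have to be settled first.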

What we offer…
Over the past few years we have designed and tested quite a few different DR solutions that fit the needs of most of our clients while staying within an affordable rate. The design is based on these elements:
- Add a third system in the DR data center
- Repeat the system profiles from MAIN/AUX on DR
- SAN-2-SAN replication from MAIN/AUX to DR
- Use DNS to connect all the LAB devices to the LIS systems
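One way to read the DNS element above: LAB devices are configured with a host name instead of an IP address, so the name can be repointed at the DR system when a disaster is declared. The sketch below models that with a plain dictionary standing in for the DNS zone. The host name, the addresses (taken from the reserved documentation ranges), and the function names are all assumptions for illustration, not details of the actual design.

```python
# A dictionary stands in for the DNS zone; all values are hypothetical.
dns_records = {"lis.example.org": "192.0.2.10"}  # name currently points at MAIN

DR_ADDRESS = "198.51.100.10"  # assumed address of the third system in the DR data center


def resolve(name, records):
    """Look up the address a device would get for the given name."""
    return records[name]


def declare_disaster(records, name, dr_address):
    """Repoint the name at the DR system; device configuration is untouched."""
    records[name] = dr_address


device_target = "lis.example.org"  # what a LAB device is configured with

before = resolve(device_target, dns_records)
declare_disaster(dns_records, device_target, DR_ADDRESS)
after = resolve(device_target, dns_records)

print(before, "->", after)
```

In a real deployment the record's time-to-live and any caching on the devices would determine how quickly they follow the change.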
