Presentation transcript: "Failover and High Availability"

1 Failover and High Availability
Stefan Zier, Sr. Software Engineer, Server Team
May 2006
© 2006 ArcSight Confidential

2 Agenda
- Introduction
- High availability set-up explained
  - Software
  - Systems
  - IP addresses
  - Heartbeat networks
- Installation overview

3 Introduction

4 High Availability vs. Disaster Recovery?
High availability: cut down on unscheduled downtime
- Hardware failures
- Operating system failures
- Application errors
Disaster recovery: stay operational during catastrophes
- Natural disasters
- Major infrastructure issues

5 Technical Approaches
High availability:
- Second system connected through multiple communication paths
- Shared storage between the two systems
- Software that monitors both systems and moves services between them as needed
- Quick, automatic recovery
Disaster recovery:
- Mostly single points of failure in communication paths
- Replicated storage between the two sites
- Often a manual switch between the two sites (may be sufficient and more cost effective)

6 Removing Single Points of Failure
A single point of failure is a component whose failure brings down the entire system.
Not as simple as it first seems:
- Many details make implementation complex
- The system is only as good as the weakest link in the chain
Examples:
- A single switch that connects all the NICs
- A single power circuit powering the systems…
- A file server used to host system files

7 Current HA/DR Support in ArcSight
High availability configurations are supported for:
- ArcSight Manager
- ArcSight Database
- ArcSight SmartConnectors
Disaster recovery configurations deployed at customers:
- Oracle DataGuard: can only be used without partition archiving at the moment; Oracle is working on the issue
- SAN block-level replication: requires more bandwidth between sites, but is less complex

8 Connector Redirection
Connectors can redirect events to another Manager:
- Offers access to real-time events to cover shorter periods of downtime
- Useful for development Manager set-ups (a second Manager to develop content on before moving it to production)
Not a full failover solution:
- Resource replication scripts provide incomplete replication
- Managers will get out of sync (Active Lists, Assets, Rule Engine state, Event IDs, M1s, etc.)
- Requires duplicate databases

9 Technologies that Don’t Fit ArcSight Profile
- Load balancing: would compromise correlation capabilities, since they require all events to go through one system
- Hot-hot standby: would consume substantial network and CPU resources to keep the rich Manager state in sync
- Seamless Console failover: would cause a similar delay during failover and add some CPU/network cost on the Manager

10 HA Setup Explained

11 Failover Architecture
[Architecture diagram: Manager 1 and Manager 2 on the public network, connected to each other by heartbeat networks and attached to shared storage and a shared database]

12 Failover Management Software (FMS)
Software that manages services in a failover cluster:
- Starts and stops the Manager/Connector/Oracle
- Monitors whether the Manager/Connector/Oracle is running
- Migrates a service IP (virtual IP) between systems
- Needs to run on both hosts
Tested products:
- EMC AutoStart (Legato AAM)
- Veritas Cluster Server (VCS)

13 Software
[Architecture diagram: the failover management software runs on both Manager 1 and Manager 2, which share the public network, heartbeat networks, shared storage, and shared database]

14 Why Didn’t ArcSight Invent a Proprietary FMS?
- ArcSight wants to provide a best-of-breed solution
- Existing FMS products are very mature and well tested
- Many OS/platform combinations are supported
- FMS need to solve many low-level problems that are not ArcSight's core competency

15 Why Do We Need Identical Systems?
Any difference between the two systems presents a risk: failover may fail, leaving the component nonfunctional.
- The standby system may not be powerful enough to handle the load
- A different OS/patch version on the standby system could prevent the component from starting up
It is easier to keep systems updated if they are identical.
Using the standby system for double duty is not recommended!
Good practice: when you restart the Manager for a configuration change, use the FMS and bring it up on the standby node.

16 Shared Storage
- A shared volume hosts $ARCSIGHT_HOME or $ORACLE_HOME
- Needed so all files (rule checkpoints, archived reports, etc.) are reliably in sync between components
- Needs to be highly available itself (use RAID, multiple I/O channels)
- A shared SCSI bus cannot be used
- A simple NFS server or similar would be a single point of failure
- A SAN is used for the shared storage (see the sketch below)
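A minimal sketch of mounting that shared volume on whichever node is currently active; the device name and mount point are assumptions for illustration, and in practice the FMS controls when the volume is mounted:

    # Hypothetical device and mount point -- in a real cluster the FMS does this.
    mkdir -p /opt/arcsight
    mount -t ext3 /dev/sdb1 /opt/arcsight    # shared SAN volume holding $ARCSIGHT_HOME
    # Do not list this volume in /etc/fstab on both nodes; only one node may mount it at a time.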

17 IP Address Transparency
A typical set-up has three IP addresses:
- One IP address for each system (system IP)
- One IP address for the Manager/Connector/Database, also called the virtual IP or service IP
Clients always talk to the service IP: point DNS and/or hosts files there! (See the example below.)
Some FMS use IP-based networks for the heartbeat networks; these may require additional IPs.
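For example, hosts-file entries might look like the following; all addresses are made up, and only arcsight.customer.com is taken from the diagrams later in this deck:

    # /etc/hosts on Consoles/Connectors -- hypothetical addresses
    192.168.10.11   manager1.customer.com   manager1    # system IP, node 1
    192.168.10.12   manager2.customer.com   manager2    # system IP, node 2
    192.168.10.10   arcsight.customer.com   arcsight    # service/virtual IP used by all clients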

18 IP Address Transparency
[Diagram: a Console or Connector resolves arcsight.customer.com and talks to the service IP over the public network; Manager 1 and Manager 2 are linked by heartbeat networks, shared storage, and a shared database]

19 IP Address Transparency
[Diagram: the same set-up after a failover; arcsight.customer.com still resolves to the service IP, which now lives on the other Manager]

20 Multiple Communication Paths
- Multiple discrete communication channels between the systems
- Use a mix of technologies (serial, disk, Ethernet)
- At least two channels
- Needed to avoid split brain syndrome

21 Split Brain Syndrome
[Diagram: Manager 1 and Manager 2 on the public network, sharing storage and a database]

22 Split Brain Syndrome
[Diagram: the same set-up after the communication paths between the two Managers fail]

23 Split Brain Syndrome
[Diagram: with communication down, each Manager wonders: "What happened to the other server?"]

24 Split Brain Syndrome
[Diagram: one Manager decides "Let's just keep going." while the other concludes "Probably broke. Let's start up." Both now run against the shared storage and database]

25 Split Brain Syndrome
[Diagram: with both Managers writing at once, the shared storage and database look pretty inconsistent…]

26 Installation Overview

27 Set-Up Procedure for ArcSight Manager
1. From one system, install the Manager on the shared disk.
2. In managersetup, install the startup scripts, but disable them so that the Manager doesn't auto-start during boot (option #3).
3. In managersetup, set the Cluster ID to the host name.
4. On the other host, run managersetup from the shared disk and install the startup scripts there as well.
   Note that the startup scripts do not reside on the shared disk, but on the local disk of each system.
5. Test the Manager on both systems (see the sketch below).
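A rough command-line sketch of these steps, assuming the shared volume is mounted at /opt/arcsight (a hypothetical path) and that managersetup is launched through the arcsight wrapper script mentioned later in this deck; the exact prompts and option numbers should be taken from the managersetup dialog itself:

    # On host 1 -- shared volume mounted (hypothetical path)
    cd /opt/arcsight/manager/bin
    ./arcsight managersetup        # install startup scripts but keep auto-start disabled (option #3);
                                   # set the Cluster ID to this host's name

    # On host 2 -- mount the same shared volume, then run setup from the shared disk
    cd /opt/arcsight/manager/bin
    ./arcsight managersetup        # install the startup scripts here too (they live on local disk)

    # Test the Manager on each host in turn
    /etc/init.d/arcsight_manager start
    /etc/init.d/arcsight_manager stop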

28 FMS Set-Up
1. Install the FMS.
2. Set up IP addresses and heartbeat networks.
3. Set up Manager startup, shutdown, and monitoring (scripts under utilities/failover).
4. Set up the service IP/virtual IP.
5. Test whether the Manager can be started/stopped/migrated.
6. Group the Manager process and the service IP so they move between hosts together (see the sketch below).
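As one concrete illustration, a rough Veritas Cluster Server sketch of that grouping; the group and resource names, network device, addresses, and monitor script path are made up, and the attribute names should be double-checked against the VCS documentation:

    # Hypothetical VCS configuration: one group containing the service IP and the Manager
    haconf -makerw
    hagrp  -add    arcsight_grp
    hagrp  -modify arcsight_grp SystemList manager1 0 manager2 1

    # Service/virtual IP resource (device, address, and netmask are examples)
    hares -add    arcsight_ip IP arcsight_grp
    hares -modify arcsight_ip Device eth0
    hares -modify arcsight_ip Address 192.168.10.10
    hares -modify arcsight_ip NetMask 255.255.255.0

    # Manager resource, wired to the startup/monitor scripts described on the next slides
    hares -add    arcsight_mgr Application arcsight_grp
    hares -modify arcsight_mgr StartProgram   "/etc/init.d/arcsight_manager start"
    hares -modify arcsight_mgr StopProgram    "/etc/init.d/arcsight_manager stop"
    hares -modify arcsight_mgr MonitorProgram "/opt/arcsight/manager/utilities/failover/monitor.sh"

    # The Manager depends on the IP, so they always come up and move together
    hares -link arcsight_mgr arcsight_ip
    haconf -dump -makero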

29 FMS Options
EMC AutoStart (formerly Legato AAM)
- Communication over IP
- UNIX, Windows
Veritas Cluster Server
- Communication over a proprietary protocol
- UNIX, Windows

30 ARCSIGHT_CID
- ARCSIGHT_CID is the ArcSight Cluster ID
- Set in /etc/arcsight/arcsight.conf
- Manually set in .profile for convenience
- Must be different on each of the hosts
- Usually, the host name is used
- Affects where log files get written: logs/$ARCSIGHT_CID/ (the default value is "default")
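For example (the exact syntax of arcsight.conf is not shown in these slides, so the key=value form below is an assumption, as is the host name):

    # /etc/arcsight/arcsight.conf on node 1 -- assumed key=value syntax
    ARCSIGHT_CID=manager1

    # Optional convenience entry in the arcsight user's ~/.profile
    export ARCSIGHT_CID=`hostname`    # log files then land in logs/$ARCSIGHT_CID/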

31 Scripts
Startup and shutdown
- Use /etc/init.d/arcsight_manager start|stop
- arcsight_manager needs to be executed as root
Monitoring
- Use $ARCSIGHT_HOME/bin/arcsight managerup
- arcsight managerup needs to run as the arcsight user
- If needed, alter the monitor script to pipe the output of managerup into a file so you can watch the output (see the sketch below)
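A minimal sketch of such a monitor wrapper, assuming it is invoked by the FMS as root, that the Manager lives under /opt/arcsight/manager, and that /var/log/arcsight_managerup.log is an acceptable place for the output (all three are assumptions, not taken from the slides):

    #!/bin/sh
    # Hypothetical monitor wrapper: run managerup as the arcsight user and keep its output.
    ARCSIGHT_HOME=/opt/arcsight/manager            # assumed install path on the shared volume
    LOG=/var/log/arcsight_managerup.log            # assumed log location
    su - arcsight -c "$ARCSIGHT_HOME/bin/arcsight managerup" >> "$LOG" 2>&1
    exit $?                                        # the FMS reads this exit code as up/down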

32 Summary
- ArcSight now supports HA for Connectors
- The same HA solution can be used for the Manager and the database
- EMC AutoStart and Veritas Cluster Server are supported and tested
- Tech Notes on configuration are available for both products

33 Questions and Answers
- Download Slides
- More ArcSight Events
- Join the User Forum


