-- Linux-HA Release 2 Preview February/March, 2005 Linux-HA Release 2 Alan Robertson IBM Linux Technology Center

1 -- Linux-HA Release 2 Preview February/March, 2005 Linux-HA Release 2 Alan Robertson IBM Linux Technology Center alanr@unix.sh

2 Linux-HA Release 2
- What is High-Availability (HA) Clustering?
- What can HA do for me?
- What is the Linux-HA project?
- Linux-HA applications
- Linux-HA customers
- Linux-HA release 1 capabilities
- Linux-HA release 2 capabilities
- Comparative Architectures
- Release 2 Details
- Futures

3 What Is HA Clustering?
- Putting together a group of computers which trust each other to provide a service even when system components fail
- When one machine goes down, others take over its work
- This involves IP address takeover, service takeover, etc.
- New work comes to the “takeover” machine
- Not primarily designed for high performance

4 What Can HA Clustering Do For You?
- It cannot achieve 100% availability – nothing can.
- HA clustering is designed to recover from single faults
- It can make your outages very short
  - From about a second to a few minutes
- It is like a Magician's (Illusionist's) trick:
  - When it goes well, the hand is faster than the eye
  - When it goes not-so-well, it can be reasonably visible
- A good HA clustering system adds a “9” to your base availability
  - 99->99.9, 99.9->99.99, 99.99->99.999, etc.
- Complexity is the enemy of reliability!
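To make the "adds a 9" rule concrete, here is a small Python sketch that converts an availability percentage into expected downtime per year. The function name and the 365-day year are assumptions made for illustration; they are not from the talk.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year


def downtime_hours_per_year(availability_percent: float) -> float:
    """Expected unavailable hours per year at the given availability."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


# Each added "9" cuts the expected downtime by a factor of ten.
for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% -> {downtime_hours_per_year(pct):.2f} hours/year")
```

At 99% this works out to roughly 87.6 hours of downtime per year; at 99.9% it is roughly 8.76 hours.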

5 Single Points of Failure (SPOFs)
- A single point of failure is a component whose failure will cause near-immediate failure of an entire system or service
- Good HA design eliminates single points of failure

6 How Does HA work?
- Manage redundancy to improve service availability
- Like a cluster-wide-super-init on steroids
- Even complex services are now “respawn”
  - on node (computer) death
  - on “impairment” of nodes
  - on loss of connectivity
  - for services that aren't working (not necessarily stopped)
  - managing very complex dependency relationships

7 Redundant Communications
- Intra-cluster communication is critical to HA system operation
  - Most HA clustering systems provide mechanisms for redundant internal communication for heartbeats, etc.
- External communication is usually essential to provision of service
  - External communication redundancy is usually accomplished through routing tricks
  - Having an expert in BGP or OSPF is a help

8 Redundant Data Access
- Replicated
  - Copies of data are kept updated on more than one computer in the cluster
- Shared
  - Typically Fiber Channel Disk (SAN)
  - Sometimes shared SCSI
- Back-end Storage (“Somebody Else's Problem”)
  - NFS, SMB
  - Back-end database

9 The Desire for HA systems
- Who wants low-availability systems?
- Why are so few systems High-Availability?

10 Why isn't everything HA?
- Cost
- Complexity

12 The Linux-HA Project
- Linux-HA is the oldest high-availability project for Linux, with the largest associated community
- The core piece of Linux-HA is called “heartbeat” (though it does much more than heartbeat)
- Linux-HA has been in production since 1999, and is currently in use on about ten thousand sites
- Linux-HA also runs on FreeBSD and Solaris, and is being ported to OpenBSD and others
- Linux-HA is shipped with every major Linux distribution except one.

13 Linux-HA Release 1 Applications
- Load Balancers
- Web Servers
- Database Servers
- Custom Applications
- Firewalls
- Retail Point of Sale Solutions
- Authentication
- File Servers
- Proxy Servers
- Medical Imaging
- Almost any type of server application you can think of – except SAP

14 Linux-HA customers
- Emageon – medical imaging services
- Contraloria General de la Republica (Colombian government)
- Incredimail bases their mail service on Linux-HA on IBM hardware
- Karstadts' uses Linux-HA in each of several hundred stores
- Bavarian Radio Station (Munich) coverage of 2002 Olympics in Salt Lake City
- Circuit City, Autozone, others use Linux-HA in each of several hundred stores
- Citysavings Bank in Munich (infrastructure)
- University of Toledo (US) – 20k student Computer Aided Instruction system
- Autostrada – 230 clusters across country
- The Weather Channel (weather.com)
- Sony (manufacturing)
- ISO New England manages power grid using 25 Linux-HA clusters

15 Linux-HA Release 1 capabilities
- Supports 2-node clusters
- Can use serial, UDP bcast, mcast, ucast comm.
- Fails over on node failure
- Fails over on loss of IP connectivity
- Capability for failing over on loss of SAN connectivity
- Limited command line administrative tools to fail over, query current status, etc.
- Active/Active or Active/Passive
- Simple resource group dependency model
- Requires external tool for resource monitoring
- SNMP monitoring

16 Linux-HA Release 2 capabilities
- Built-in resource monitoring
- Support for the OCF resource standard
- Much larger clusters supported (>= 8 nodes)
- Sophisticated dependency model with rich constraint support (resources, groups, incarnations, master/slave) (needed for SAP)
- XML-based resource configuration
- Configuration and monitoring GUI
- Support for GFS cluster filesystem
- Multi-state (master/slave) resource support
- Initially - no IP, SAN monitoring

17 Release 2 Credits
- Andrew Beekhof – CRM, CIB
- Gouchun Shi – significant infrastructure improvements
- Sun, Jiang Dong and Huang, Zhen – LRM, Stonithd and testing
- Lars Marowsky-Bree – architecture, PHB :-)
- Alan Robertson – architecture, project leadership, original heartbeat code and testing

18 Linux-HA Release 1 Architecture

19 Linux-HA Release 2 Architecture (add TE and PE)

20 Resource Objects in Release 2
- Release 2 supports “resource objects” which can be any of the following:
  - Primitive Resources
    - OCF, heartbeat-style, or LSB resource agent scripts
  - Resource Incarnations – need “n” resource objects - somewhere
  - Resource groups – a group of resources with implied co-location and linear ordering constraints
  - Multi-state resources (master/slave)
    - Designed to model master/slave (replication) resources (DRBD, et al)

21 Basic Dependencies in Release 2
- Ordering Dependencies
  - start before (implies stop after)
  - start after (implies stop before)
- Mandatory Co-location Dependencies
  - must be co-located with
  - cannot be co-located with
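The rule that "start after" implies "stop before" can be sketched as a topological sort whose reverse gives the stop order. This is only an illustration using Python's standard `graphlib`; the resource names are hypothetical, and nothing here reflects how the Release 2 code actually computes orderings.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical resources: each key must start after everything in its set.
start_after = {
    "filesystem": {"ip_address"},
    "database": {"filesystem"},
    "web_server": {"database"},
}

# Start order: dependencies come first.
start_order = list(TopologicalSorter(start_after).static_order())

# Stop order: the exact reverse, so dependents stop before what they need.
stop_order = list(reversed(start_order))

print("start:", start_order)
print("stop: ", stop_order)
```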

22 Resource Location Constraints
- Mandatory Constraints:
  - Resource Objects can be constrained to run on any selected subset of nodes. Default is none.
- Preferential Constraints:
  - Resource Objects can also be preferentially constrained to run on specified nodes by providing weightings for arbitrary logical conditions
  - The resource object is run on the node which has the highest weight (score)

23 Resource Incarnations
- Resource Incarnations allow one to have a resource which runs multiple (“n”) times on the cluster
- This is useful for managing
  - load balancing clusters where you want “n” of them to be slave servers
  - Cluster filesystems
  - Cluster Alias IP addresses

24 Resource Groups
- Resource Groups provide a shorthand for creating ordering and co-location dependencies
  - Each resource object in the group is declared to have linear start-after ordering relationships
  - Each resource object in the group is declared to have co-location dependencies on each other
- This is an easy way of converting release 1 resource groups to release 2
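As a rough illustration of the shorthand, the sketch below expands an ordered group into the pairwise ordering and co-location dependencies it implies. The function name, the tuple representation, and the resource names are all assumptions made for this example, not the Release 2 configuration format.

```python
def expand_group(members):
    """Expand an ordered group into the constraints it implies.

    Returns (ordering, colocation):
      ordering   - (later, earlier) pairs meaning "later starts after earlier"
      colocation - pairs of resources that must run on the same node
    """
    # Linear ordering: each member starts after the one listed before it.
    ordering = [(later, earlier) for earlier, later in zip(members, members[1:])]
    # Co-location: every member is co-located with every other member.
    colocation = [(a, b) for i, a in enumerate(members) for b in members[i + 1:]]
    return ordering, colocation


ordering, colocation = expand_group(["ip_address", "filesystem", "database"])
print(ordering)
print(colocation)
```

A group of n resources expands to n-1 ordering pairs and n(n-1)/2 co-location pairs, which is why the group form is so much shorter to write.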

25 Multi-State (master/slave) Resources
- Normal resources can be in one of two stable states:
  - running
  - stopped
- Multi-state resources can have more than two stable states. For example:
  - running-as-master
  - running-as-slave
  - stopped
- This is ideal for modeling replication resources like DRBD
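One way to picture a multi-state resource is as a small state machine over the three stable states named above. In the sketch below, the action names (`start`, `promote`, `demote`, `stop`) and the set of allowed transitions are assumptions for illustration; the slides only name the states.

```python
from enum import Enum


class State(Enum):
    STOPPED = "stopped"
    SLAVE = "running-as-slave"
    MASTER = "running-as-master"


# Assumed transitions: a resource starts as a slave and is then promoted.
TRANSITIONS = {
    (State.STOPPED, "start"): State.SLAVE,
    (State.SLAVE, "promote"): State.MASTER,
    (State.MASTER, "demote"): State.SLAVE,
    (State.SLAVE, "stop"): State.STOPPED,
}


def apply(state: State, action: str) -> State:
    """Return the next stable state, or raise if the action is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action} from {state.value}") from None


state = State.STOPPED
for action in ("start", "promote", "demote", "stop"):
    state = apply(state, action)
    print(action, "->", state.value)
```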

26 Advanced Constraints
- Nodes can have arbitrary attributes associated with them in name=value form
- Attributes have types: int, string, version
- Constraint expressions can use these attributes as well as node names, etc. in largely arbitrary ways
- Operators:
  - =, !=, <, >, <=, >=
  - defined(attrname), undefined(attrname)
  - colocated(resource id), not colocated(resource id)
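The sketch below shows why attribute types matter when evaluating a constraint expression against a node's name=value attributes. It covers only `=`, `!=`, `defined` and `undefined`. The function names, the dotted-integer reading of the `version` type, and the rule that comparing a missing attribute yields false are all assumptions for illustration, not the Release 2 implementation.

```python
def coerce(value: str, type_name: str):
    """Convert a raw attribute string according to its declared type."""
    if type_name == "int":
        return int(value)
    if type_name == "version":
        # Assumption: versions are dotted integers such as "2.6.10".
        return tuple(int(part) for part in value.split("."))
    return value  # "string": compare the text as-is


def evaluate(node_attrs, attr, op, value="", type_name="string"):
    """Evaluate one expression against a node's attribute dictionary."""
    if op == "defined":
        return attr in node_attrs
    if op == "undefined":
        return attr not in node_attrs
    if attr not in node_attrs:
        return False  # assumption: comparing a missing attribute is false
    left = coerce(node_attrs[attr], type_name)
    right = coerce(value, type_name)
    if op == "=":
        return left == right
    if op == "!=":
        return left != right
    raise ValueError(f"unsupported operator: {op}")


node = {"cpus": "08", "kernel": "2.6.10"}  # hypothetical node attributes
print(evaluate(node, "cpus", "=", "8", "int"))     # compared as integers
print(evaluate(node, "cpus", "=", "8", "string"))  # compared as text
print(evaluate(node, "san", "undefined"))
```

The same attribute value `"08"` matches `8` when compared as an int but not when compared as a string.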

27 Advanced Constraints (cont'd)
- Each constraint is associated with a particular resource, and is evaluated in the context of a particular node.
- A given constraint has a boolean predicate associated with it, built from the expressions described before, and is associated with a weight and a condition.
- If the predicate is true, then the condition is used to compute the weight associated with locating the given resource on the given node.
- Supported conditions are: (these distinctions may be unneeded ?)
  - can: same as prefer with MAXINT weight
  - cannot: same as prefer with -MAXINT weight
  - prefer: positive weight
  - prefer not: same as prefer with negative weight
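A minimal sketch of how the four conditions could map to weights, and how a node could then be chosen by highest score. Summing the weights of all matching constraints is an assumption (the slides do not say how several constraints combine), and `sys.maxsize` merely stands in for MAXINT. The function names and node names are hypothetical.

```python
import sys

MAXINT = sys.maxsize  # stand-in for the slide's MAXINT


def weight_for(condition: str, weight: int = 0) -> int:
    """Map a condition to the weight applied when the predicate is true."""
    if condition == "can":
        return MAXINT
    if condition == "cannot":
        return -MAXINT
    if condition == "prefer":
        return abs(weight)
    if condition == "prefer not":
        return -abs(weight)
    raise ValueError(f"unknown condition: {condition}")


def place(nodes, constraints):
    """Score each node and return (best_node, scores).

    constraints is a list of (predicate, condition, weight) tuples, where
    predicate is a function taking a node name and returning a bool.
    Assumption: weights of all constraints whose predicate holds are summed.
    """
    scores = {
        node: sum(weight_for(cond, w) for pred, cond, w in constraints if pred(node))
        for node in nodes
    }
    return max(scores, key=scores.get), scores


constraints = [
    (lambda n: n == "node1", "prefer", 100),
    (lambda n: n == "node2", "prefer", 50),
    (lambda n: n == "node3", "cannot", 0),
]
chosen, scores = place(["node1", "node2", "node3"], constraints)
print(chosen, scores)
```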

28 Security Considerations
- Cluster: A computer whose backplane is the Internet
  - If this isn't frightening, you don't understand...
- You may think you have a secure cluster network
  - You're probably mistaken now
  - You will be in the future

29 Secure Networks are Difficult Because...
- Security is not often well-understood by admins
- Security is well-understood by “black hats”
- Network security is easy to breach accidentally
  - Users bypass it
  - Hardware installers don't fully understand it
- Most security breaches come from “trusted” staff
  - Staff turnover is often a big issue
- Virus/Worm/P2P technologies will create new holes
  - especially for Windows machines

30 Security Advice
- Good HA software should be designed to assume insecure networks
- Not all HA software assumes insecure networks
- Good HA installation architects use dedicated (secure?) networks for intra-cluster HA communication
- Crossover cables are reasonably secure – all else is suspect ;-)

31 References
- http://linux-ha.org/
- http://linux-ha.org/download/
- http://wiki.linux-ha.org/NewHeartbeatDesign
- New Web site content (in progress)
  - http://linux-ha.trick.ca/ (pretty - offline!)
  - http://wiki.linux-ha.org/ (editable)
- www.linux-mag.com/2003-11/availability_01.html

32 Legal Statements
- IBM is a trademark of International Business Machines Corporation.
- Linux is a registered trademark of Linus Torvalds.
- Other company, product, and service names may be trademarks or service marks of others.
- This work represents the views of the author and does not necessarily reflect the views of the IBM Corporation.

