Linux in High-Availability Environments

Presentation transcript:

Linux in High-Availability Environments Alan Robertson IBM Linux Technology Center alanr@unix.sh 18 October 2005

OSS in HA Environments
- Why OSS for High Availability Environments?
- What is High-Availability (HA) Clustering?
- What can HA do for me?
- DRBD Data Replication
- The Linux Virtual Server Load Balancer
- The Linux-HA project
- Linux-HA applications and customers
- Thoughts about cluster security

Why OSS In High-Availability Environments?
- Openness
- Broad Range Of Environments
- Stability
- Breadth of Support Options
- Lack of Vendor Lock-In

Openness
- Extensive Peer Review System
  - Source code freely available
  - Source code reviewed by outside parties
  - Changes discussed openly – often in great detail
- Ability to obtain uncensored product information
  - Mailing list archives contain uncensored comments from:
    - Users with deep expertise
    - Users with little expertise
    - Users who are very happy
    - Users with problems

Broad Range of Environments
- OSS typically runs on many platforms, often on different OSes too
- Users often find very creative uses for the software
- Freedom to try something at low cost decreases perceived risks and encourages this behavior
- Creative uses find their way into mailing list (archives) and sometimes into the OSS product

Thoughts about Stability
- OSS is commonly perceived as being more stable than much commercial software
- Close contact between customers and developers encourages fixes to be developed quickly
- OSS users help with testing – providing more breadth in the test environment and lowering testing costs
- Typical commercial support is optimized to keep fixes out of the product – testing costs are a big factor

Support for OSS Systems
- Mailing lists consist of hundreds to thousands of users who are very knowledgeable and helpful – usually regarded as very responsive – typically located in most time zones across the world
- Can choose support vendor freely:
  - Hardware, OS or OSS supplier
  - Independent consulting/support organizations
  - In-house expertise (most motivated)
  - OSS mailing lists
  - Any combination of the above

No Vendor Lock-In
- Does not rely on a vendor's future plans being compatible with yours (risk mitigation)
- Obsolescence more readily manageable
- Does not rely on a single vendor in another company or country
- Contributing to the product (or paying someone else to) provides you a voice in future direction
- Compatibility with other systems typically better than with proprietary products

What Is HA Clustering?
- A group of computers which cooperate and trust each other to provide a service even when cluster components fail
- When one machine goes down, others take over its work
- This involves IP address takeover, service takeover, etc.
- New work comes to the “takeover” machine
- Not primarily designed for high-performance

What Can HA Clustering Do For You?
- It cannot achieve 100% availability – nothing can.
- HA clustering is designed to recover from single faults
- It can make your outages very short (improving MTTR)
  - From about a second to a few minutes
- It is like a Magician's (Illusionist's) trick:
  - When it goes well, the hand is faster than the eye
  - When it goes not-so-well, it can be reasonably visible
- A good HA clustering system adds a “9” or two to your availability
  - 99->99.9, 99.9->99.99, 99.99->99.999, etc.
- Complexity is the enemy of reliability!
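
To put numbers on the “9”s, here is a minimal sketch that converts an availability percentage into expected downtime per year, and shows how a shorter MTTR raises availability. It uses the standard steady-state formula availability = MTBF / (MTBF + MTTR); the MTBF and MTTR figures are hypothetical, chosen only for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Expected downtime per year for a given availability percentage."""
    return (100.0 - availability_percent) / 100.0 * MINUTES_PER_YEAR


def availability_from_mtbf(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(pct)
        print(f"{pct}% available -> {minutes:.1f} minutes of downtime per year")

    # Hypothetical system: one failure every 1000 hours on average.
    manual = availability_from_mtbf(1000, 1.0)          # one-hour manual recovery
    automated = availability_from_mtbf(1000, 1.0 / 60)  # one-minute failover
    print(f"MTTR 1 hour:   {manual:.5%}")
    print(f"MTTR 1 minute: {automated:.5%}")
```

The failure rate is the same in both cases; only the recovery time changes, which is the lever HA clustering pulls.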

The Desire for HA systems
- Who wants low-availability systems?
- Why are so few systems High-Availability?

Why isn't everything HA?
- Cost
- Complexity

Single Points of Failure (SPOFs)
- A single point of failure is a component whose failure will cause near-immediate failure of an entire system or service
- Good HA design eliminates single points of failure
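
A small worked example of why SPOFs matter, under two simplifying assumptions that are not from the slides: component failures are independent, and failover between redundant copies is perfect. Components that are all required multiply their availabilities; redundant copies fail only when every copy fails. The component names and the 99% figures are hypothetical.

```python
from math import prod


def series(*availabilities: float) -> float:
    """Availability of a chain where every component is required (each is a SPOF)."""
    return prod(availabilities)


def redundant(availability: float, copies: int = 2) -> float:
    """Availability of identical redundant copies; down only if all copies are down."""
    return 1.0 - (1.0 - availability) ** copies


if __name__ == "__main__":
    server = switch = storage = 0.99  # hypothetical per-component availability

    single = series(server, switch, storage)
    paired = series(redundant(server), redundant(switch), redundant(storage))

    print(f"three SPOFs in a row:      {single:.4%}")
    print(f"each component duplicated: {paired:.4%}")
```

Real clusters do worse than the duplicated figure suggests, because failover takes time and failures are sometimes correlated.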

How Does HA work?
- Manage redundancy to improve service availability
- Like a cluster-wide-super-init on steroids
- Even complex services are now “respawn”
  - on node (computer) death
  - on “impairment” of nodes
  - on loss of connectivity
  - for services that aren't working (not necessarily stopped)
  - managing very complex dependency relationships
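
The “respawn on node death” idea can be sketched as a heartbeat timeout plus a takeover plan. This is an illustrative toy, not Linux-HA's actual implementation: the class, the `deadtime` parameter name, and the node and resource names are all hypothetical, and timestamps are passed in explicitly so the logic can be followed without real clocks or networking.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each peer node."""
    deadtime: float  # seconds of silence before a peer is declared dead
    last_seen: dict[str, float] = field(default_factory=dict)

    def record_heartbeat(self, node: str, now: float) -> None:
        self.last_seen[node] = now

    def dead_nodes(self, now: float) -> list[str]:
        """Nodes whose last heartbeat is older than deadtime."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.deadtime)


def plan_takeover(resources: dict[str, str], dead: list[str],
                  survivor: str) -> dict[str, str]:
    """Return a resource -> node map with dead nodes' resources moved to survivor."""
    return {res: (survivor if node in dead else node)
            for res, node in resources.items()}


if __name__ == "__main__":
    monitor = HeartbeatMonitor(deadtime=10.0)
    monitor.record_heartbeat("node-a", now=0.0)
    monitor.record_heartbeat("node-b", now=0.0)
    monitor.record_heartbeat("node-a", now=8.0)  # node-b has gone silent

    dead = monitor.dead_nodes(now=12.0)
    placement = {"service-ip": "node-b", "database": "node-b", "web": "node-a"}
    print("dead:", dead)
    print("new placement:", plan_takeover(placement, dead, survivor="node-a"))
```

A production cluster manager also has to make sure the “dead” node has really stopped using shared resources before taking them over; that step is omitted here.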

DRBD – RAID over the LAN
- Block-device (filesystem) level replication
- Clever synchronization methods make resyncs faster, decrease latency, preserve integrity
- Useful for both HA and Disaster Recovery
- NO single point of failure
- Extremely cost-effective
  - $200 (max) instead of $20,000 (min) ($USD)
- Probably not suitable for some high-end write-intensive applications
- Supportable by IBM Support Line

LVS – The Linux Virtual Server Project
- LVS is the standard Linux Load Balancer
- Called "ipvs" in the standard Linux kernel
- Stable, fast, flexible
- Especially suitable for large "server farms"

LVS in Action

“Plays Well With Others”
- Each of these independent services can work together to scale to large systems
- All single points of failure can be eliminated
- High-Availability, Load Balancing work together nicely

Linux Virtual Server, Linux-HA and DRBD

The Linux-HA Project
- Linux-HA is the oldest high-availability project for Linux, with the largest associated community
- The core piece of Linux-HA is called “heartbeat” (though it does much more than heartbeat)
- Linux-HA has been in production since 1999, and is currently in use on about ten thousand sites
- Linux-HA also runs on FreeBSD and Solaris, and is being ported to OpenBSD and others
- Linux-HA is shipped with every major Linux distribution except one.

Linux-HA Release 1 Applications
- Database Servers
- Load Balancers
- Web Servers
- Custom Applications
- Firewalls, routers, DNS, DHCP
- Retail Point of Sale Solutions
- Authentication
- File Servers
- Proxy Servers
- Medical Imaging
- Almost any type of server application you can think of – except SAP

Selected Linux-HA customers
- Los Alamos (US) National Labs – linear accelerator badge reader
- Emageon – medical imaging for hospitals and clinics
- ISO New England manages power grid using ≈ 20 Linux-HA clusters
- Various Firewall, DNS, DHCP products use Linux-HA basically embedded
- Karstadt, Circuit City, Autozone use Linux-HA in each of several hundred stores
- MAN Nutzfahrzeuge AG – truck manufacturing division of MAN AG
- Autostrada – 230 clusters across Italy
- BBC – Internet Infrastructure
- Citysavings Bank in Munich (infrastructure)
- Bavarian Radio Station (Munich) coverage of 2002 Olympics in Salt Lake City
- The Weather Channel (weather.com)
- Sony (manufacturing)
- Incredimail bases their mail service on Linux-HA on IBM hardware
- University of Toledo (US) – 20k student Computer Aided Instruction system

Linux-HA Release 1 capabilities
- Supports 2-node clusters
- Can use serial, UDP bcast, mcast, ucast comm.
- Fails over on node failure
- Fails over on loss of IP connectivity
- Capability for failing over on loss of SAN connectivity
- Limited command line administrative tools to fail over, query current status, etc.
- Active/Active or Active/Passive
- Simple resource group dependency model
- Requires external tool for resource monitoring
- SNMP monitoring

Linux-HA Release 2 capabilities
- Built-in resource monitoring
- Support for the OCF resource standard
- Much larger clusters supported (>= 8 nodes)
- Sophisticated dependency model with rich constraint support (resources, groups, incarnations, master/slave) (needed for SAP)
- XML-based resource configuration
- Configuration and monitoring GUI
- Support for cluster filesystems (GFS, OCFS, etc.)
- Multi-state (master/slave) resource support
- Initially - no IP, SAN monitoring

Resource Objects in Release 2
- Release 2 supports “resource objects” which can be any of the following:
  - Primitive Resources
    - OCF, heartbeat-style, or LSB resource agent scripts
  - Resource Incarnations – need “n” resource objects - somewhere
  - Resource groups – a group of resources with implied co-location and linear ordering constraints
  - Multi-state resources (master/slave)
    - Designed to model master/slave (replication) resources (DRBD, et al)

Basic Dependencies in Release 2
- Ordering Dependencies
  - start before (implies stop after)
  - start after (implies stop before)
- Mandatory Co-location Dependencies
  - must be co-located with
  - cannot be co-located with
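
One way to picture these two kinds of dependency is as a graph problem. The sketch below is an illustration only, not Release 2's actual constraint engine or its XML configuration format: “start after” constraints are resolved with a topological sort, the stop order is simply the reverse, and co-location rules are checked against a proposed placement. All resource and node names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# "X starts after Y": each resource maps to the resources it must start after.
START_AFTER = {
    "drbd": set(),
    "ip_address": set(),
    "filesystem": {"drbd"},
    "database": {"filesystem", "ip_address"},
}


def start_order(start_after: dict[str, set[str]]) -> list[str]:
    """An order in which every resource starts after its prerequisites."""
    return list(TopologicalSorter(start_after).static_order())


def stop_order(start_after: dict[str, set[str]]) -> list[str]:
    """Start-after implies stop-before, so stopping is the reverse of starting."""
    return list(reversed(start_order(start_after)))


def colocation_violations(placement: dict[str, str],
                          must_be_with: list[tuple[str, str]],
                          cannot_be_with: list[tuple[str, str]]) -> list[str]:
    """Describe every co-location rule broken by a resource -> node placement."""
    problems = []
    for a, b in must_be_with:
        if placement[a] != placement[b]:
            problems.append(f"{a} must be co-located with {b}")
    for a, b in cannot_be_with:
        if placement[a] == placement[b]:
            problems.append(f"{a} cannot be co-located with {b}")
    return problems


if __name__ == "__main__":
    print("start:", start_order(START_AFTER))
    print("stop: ", stop_order(START_AFTER))
    placement = {"drbd": "node-a", "filesystem": "node-a",
                 "ip_address": "node-a", "database": "node-b"}
    print(colocation_violations(placement,
                                must_be_with=[("database", "filesystem")],
                                cannot_be_with=[]))
```

A cycle in the ordering constraints makes `static_order()` raise `graphlib.CycleError`, which is a reasonable signal that the configuration is contradictory.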

Resource Incarnations
- Resource Incarnations allow one to have a resource which runs multiple (“n”) times on the cluster
- This is useful for managing
  - load balancing clusters where you want “n” of them to be slave servers
  - Cluster filesystems
  - Cluster Alias IP addresses
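
A toy illustration of “run n copies somewhere on the cluster”: the function below spreads n incarnations round-robin over whichever nodes are currently alive, and is simply called again when membership changes. The round-robin policy and the `name:index` naming are assumptions made for this sketch; the slides do not describe how Release 2 actually places incarnations.

```python
def place_incarnations(name: str, n: int, live_nodes: list[str]) -> dict[str, str]:
    """Assign n incarnations of a resource to live nodes, round-robin.

    Hypothetical policy for illustration; returns incarnation id -> node.
    """
    if not live_nodes:
        raise ValueError("no live nodes to place incarnations on")
    nodes = sorted(live_nodes)  # deterministic order
    return {f"{name}:{i}": nodes[i % len(nodes)] for i in range(n)}


if __name__ == "__main__":
    cluster = ["node-a", "node-b", "node-c"]
    print(place_incarnations("web", 4, cluster))

    # node-b fails: the same four incarnations are re-placed on the survivors.
    survivors = [node for node in cluster if node != "node-b"]
    print(place_incarnations("web", 4, survivors))
```

The point is that the count “n” is the stable requirement, while the mapping to nodes is recomputed as the cluster changes.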

Security Considerations
- Cluster: A computer whose backplane is the Internet
  - If this isn't scary, you don't understand...
- You may think you have a secure cluster network
  - You're probably mistaken now
  - You will be in the future

Secure Networks are Difficult Because...
- Security is not often well-understood by admins
- Security is well-understood by “black hats”
- Network security is easy to breach accidentally
  - Users bypass it
  - Hardware installers don't fully understand it
- Most security breaches come from “trusted” staff
- Staff turnover is often a big issue
- Virus/Worm/P2P technologies will create new holes, especially for Windows machines

Security Advice
- Good HA software should be designed to assume insecure networks
- Not all HA software assumes insecure networks
- Good HA installation architects use dedicated (secure?) networks for intra-cluster HA communication
- Crossover cables are reasonably secure – all else is suspect ;-)
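
What “assume insecure networks” can mean in practice is that every cluster message is authenticated with a shared secret and checked for replay before it is trusted. The sketch below shows that general technique using Python's standard `hmac` module. It is not Linux-HA's wire format: the packet layout, function names, and key are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Optional


def sign_heartbeat(key: bytes, node: str, seq: int) -> bytes:
    """Build a heartbeat packet: JSON payload, '|', hex HMAC-SHA256 of the payload."""
    payload = json.dumps({"node": node, "seq": seq}, sort_keys=True).encode()
    digest = hmac.new(key, payload, hashlib.sha256).hexdigest().encode()
    return payload + b"|" + digest


def verify_heartbeat(key: bytes, packet: bytes, last_seq: int) -> Optional[dict]:
    """Return the decoded message, or None if it is forged, corrupted, or replayed."""
    payload, sep, digest = packet.rpartition(b"|")
    if not sep:
        return None  # malformed packet
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest().encode()
    if not hmac.compare_digest(digest, expected):
        return None  # wrong key or tampered payload
    message = json.loads(payload)
    if message["seq"] <= last_seq:
        return None  # old sequence number: treat as a replay
    return message


if __name__ == "__main__":
    key = b"example-shared-secret"  # hypothetical; real keys must be kept private
    packet = sign_heartbeat(key, "node-a", seq=42)
    print(verify_heartbeat(key, packet, last_seq=41))              # accepted
    print(verify_heartbeat(key, packet, last_seq=42))              # replay, rejected
    print(verify_heartbeat(b"wrong-key", packet, last_seq=41))     # rejected
```

Authentication does not make the messages confidential; it only ensures that a node acts on heartbeats from peers that hold the key.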

References
- http://linux-ha.org/
- http://linux-ha.org/download/
- http://linux-ha.org/NewHeartbeatDesign
- http://wwnew.linux-ha.org/SuccessStories
- www.linux-mag.com/2003-11/availability_01.html
- http://www.linuxvirtualserver.org/
- http://drbd.org/

Legal Statements IBM is a trademark of International Business Machines Corporation. Linux is a registered trademark of Linus Torvalds. Other company, product, and service names may be trademarks or service marks of others. This work represents the views of the author and does not necessarily reflect the views of the IBM Corporation.