Copyright © 2008 SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks.

Presentation transcript:

Copyright © 2008 SAS Institute Inc. All rights reserved. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Achieving High Availability in a SAS® Grid Environment
Daniel Wong, Platform Computing Inc.

Agenda
  - Background
  - Introduction to SAS Grid and HA
  - Techniques for achieving HA
  - SAS Metadata Server example
  - Future work

Achieving High Availability in a SAS Grid Environment
Targeted Audience
  - IT managers and administrators who want to improve overall availability in their SAS Grid environment
  - Consultants who help customers implement SAS Grid
  - Anyone evaluating Grid who wants to gain more general knowledge about HA in the SAS Grid environment
  - …

Platform Computing
  - 4,000,000 managed CPUs
  - 2,000 customers worldwide
  - 500 employees in 15 offices
  - 16 years of profitable growth
  - 8 years of partnership with SAS
  - 1 Leader in HPC

SAS-Platform OEM Partnership
  - Started with Job Scheduler for SAS Warehouse Administrator v8 in 2001
  - Current OEM product: Platform Suite for SAS
  - Platform provides the Grid middleware technologies to SAS for:
      - Scheduling SAS workflows
      - Effective management of resources and SAS workload
      - Scaling out SAS (parallelized) applications (DI, EM, Risk Dim…)
  - SAS-Platform R&D teams constantly drive new development initiatives to add value for SAS customers

Status of this work
Goal of joint development initiative
  - To provide failover capabilities for critical SAS system services on all operating systems supported by SAS Grid Manager
Current status
  - Working prototype that can fail over SAS Metadata Server in both Windows and Linux SAS Grid environments
Key advantages of this solution
  - Eliminates operating system dependencies
  - Eliminates the requirement of a hot-standby for failover
  - Eliminates the expense of purchasing a third-party tool to provide high availability


SAS Grid 101
Key Value Prop / Use Cases:
  - Enterprise Scheduling
  - Workload Balancing
  - Parallelized Workload Balancing

SAS Enterprise Scheduling

SAS Workload Balancing

Parallelized Workload Balancing

SAS Grid Architecture Topology

4 key steps to implement HA
1. Monitor: discover service failures
   Capabilities required:
     - Detect failure of individual services
     - Monitor services on all hosts in Grid
2. Recover: restart service when failures are detected
   Capabilities required:
     - Capability to restart services on any host or designated hosts in Grid
3. Synchronize: ensure client applications and Grid components are aware of the new configuration
   Capabilities required:
     - Support for services running on a host with a different IP address
     - Client apps and Grid components must know about the new arrangement
4. Reconnect: enable client applications to access the service again
   Capabilities required:
     - Client apps must be able to access the service after failover without having to reconfigure
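The four steps can be read as one control loop. The sketch below is only an illustration of that reading, not EGO's implementation: the `Service` and `Grid` classes, the `is_host_up` probe, and the `tick` method are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Service:
    name: str
    candidates: List[str]          # hosts allowed to run the service
    host: Optional[str] = None     # host currently running the service


@dataclass
class Grid:
    is_host_up: Callable[[str], bool]                     # hypothetical health probe
    names: Dict[str, str] = field(default_factory=dict)   # service name -> host

    def monitor(self, svc: Service) -> bool:
        """Step 1: report whether the service's current host is healthy."""
        return svc.host is not None and self.is_host_up(svc.host)

    def recover(self, svc: Service) -> str:
        """Step 2: restart the service on the first healthy candidate host."""
        for host in svc.candidates:
            if self.is_host_up(host):
                svc.host = host
                return host
        raise RuntimeError(f"no healthy host for {svc.name}")

    def synchronize(self, svc: Service) -> None:
        """Step 3: publish the new location under the stable service name."""
        self.names[svc.name] = svc.host

    def reconnect(self, name: str) -> str:
        """Step 4: clients look up the unchanged name to find the service."""
        return self.names[name]

    def tick(self, svc: Service) -> None:
        """One pass of the loop: recover and synchronize only on failure."""
        if not self.monitor(svc):
            self.recover(svc)
            self.synchronize(svc)
```

Calling `tick` periodically keeps the name-to-host mapping current, so a client that always goes through `reconnect` never needs to be reconfigured after a failover.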


Platform Enterprise Grid Orchestrator (EGO)
  - EGO is included in Platform Suite for SAS v4.1 / SAS Grid Manager 9.2
  - Available on all Unix, Linux, and Windows platforms supported by Grid Manager
  - EGO provides a full suite of services that:
      - Allows users to treat distributed software/hardware resources as components of a virtual computer
      - Enables all applications, services, and workloads to access a shared infrastructure
      - Allocates resources according to policy, simplifies management, and improves availability of the entire environment
  - We will focus on two EGO functions:
      - EGO Service Controller
      - EGO Service Director

Monitoring and Restarting Services with EGOSC
EGO Service Controller
  - EGOSC starts on Grid Control Machine
  - Responsible for starting services on remote grid nodes
  - Ensures services are running by detecting failures and restarting service instances based on configuration
Registering your service with EGO
  - Create EGO Service Definition File
      - Service Name: defines the name of the service, a handle for EGO
      - EGOCommand: exact command to start your service
      - HostFailoverInterval: threshold for EGO to trigger failover
      - ResourceRequirement: host candidates for service
      - ExecutionUser: user ID that runs the service
  - EGO will start, monitor, and restart your service according to these definitions

Location independence for failover
Consider the following scenario:
1. Metadata Server (SASMeta) is configured to be started by EGO
2. Client App can look up Metadata Server by:
     $nslookup $server SASMeta.ego.sas.com
     Server: (Corporate DNS server for client)
     Address: #53
     Name: SASMeta.ego.sas.com
     Address:
3. Metadata Server goes down unexpectedly as hostA crashes
4. EGO detects Metadata Server is gone and restarts it on a failover host
5. Metadata Server is successfully restarted:
     $nslookup $server SASMeta.ego.sas.com
     Server: (Corporate DNS server for client)
     Address: #53
     Name: SASMeta.ego.sas.com
     Address:
6. Recovery is completed. Client App can still access Metadata Server by connecting to SASMeta.ego.sas.com
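On the client side, location independence only works if the client resolves the service name again on each connection attempt instead of holding on to an old IP address. The following is a minimal sketch of that idea using Python's standard `socket` module; the function name `connect_with_retry` and its `attempts`, `delay`, `resolve`, `connect`, and `sleep` parameters are assumptions made for illustration and are not part of any SAS or Platform client.

```python
import socket
import time
from typing import Any, Callable, Tuple


def connect_with_retry(
    name: str,
    port: int,
    attempts: int = 3,
    delay: float = 1.0,
    resolve: Callable[[str], str] = socket.gethostbyname,
    connect: Callable[[Tuple[str, int]], Any] = socket.create_connection,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Connect to a service by name, re-resolving the name on every attempt."""
    last_error = None
    for _ in range(attempts):
        try:
            address = resolve(name)            # fresh lookup, never a cached IP
            return connect((address, port))
        except OSError as exc:                 # covers DNS and connection errors
            last_error = exc
            sleep(delay)                       # give the failover time to finish
    raise ConnectionError(f"could not reach {name}:{port}") from last_error
```

The resolver and connector are injectable so the retry logic can be exercised without a network; with the defaults it performs a real DNS lookup and TCP connection.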

EGO Service Director
  - Provides location mechanism for other system services
  - Contains a stand-alone Domain Name Server for the EGO DNS sub-domain
  - Provides server name to IP address resolution within the sub-domain
  - Relies on Service Controller to provide location info for service instances
  - Service Director updates Corporate DNS when the location of the Service Director DNS is changed

Normal Case
[Diagram labels: Client; Corporate DNS Server; Grid Control Machine; EGO Kernel; Service Controller; Service Director; Service Director DNS Server; Grid Nodes (Host B, Host C, … Host n); Service A. Arrows: Launch/Control/Monitor; Update location of service instances; Update location of SD DNS Server.]
1. Client makes a DNS query to determine the location of Service A
2. Corporate DNS forwards the query to SD DNS
3. SD DNS responds with the latest location of the service
4. Corporate DNS passes the response back to Client
5. Client uses the response to locate the service
Callout: "Don't know anything about the EGO sub-domain, let's pass it on"

Failover Case
[Diagram labels: Client; Corporate DNS Server; Grid Control Machine; EGO Kernel; Service Controller; Service Director; Service Director DNS Server; Grid Nodes (Host B, Host C, … Host n); Service A shown before and after failover. Arrows: Launch/Control/Monitor; Update location of service instances; Update location of SD DNS Server. Numbered markers 1 to 4 match the steps below.]
1. EGO discovers Service A has failed
2. Service Controller restarts Service A on failover Host C
3. Service Controller updates Service Director DNS Server
4. Client obtains the new name-to-IP address mapping to access Service A on Host C
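The normal and failover cases can be modeled together with two small in-memory classes: a corporate DNS that forwards one sub-domain, and a Service Director DNS whose records are updated when a service moves. This is a toy model of the query flow only, not a DNS implementation; the class names, method names, and the documentation-range IP addresses used with it are all hypothetical.

```python
from typing import Dict, Optional


class ServiceDirectorDNS:
    """Toy stand-in for the Service Director's DNS for one sub-domain."""

    def __init__(self, domain: str) -> None:
        self.domain = domain.lower()
        self.records: Dict[str, str] = {}

    def update(self, service: str, address: str) -> None:
        """Called when the Service Controller reports a new service location."""
        self.records[f"{service}.{self.domain}".lower()] = address

    def query(self, name: str) -> Optional[str]:
        return self.records.get(name.lower())


class CorporateDNS:
    """Toy corporate DNS that forwards queries for delegated sub-domains."""

    def __init__(self) -> None:
        self.records: Dict[str, str] = {}
        self.forwarders: Dict[str, ServiceDirectorDNS] = {}

    def set_forwarder(self, domain: str, server: ServiceDirectorDNS) -> None:
        self.forwarders[domain.lower()] = server

    def query(self, name: str) -> Optional[str]:
        lowered = name.lower()
        if lowered in self.records:
            return self.records[lowered]
        for domain, server in self.forwarders.items():
            if lowered.endswith("." + domain):
                # Not our zone: pass the query on and relay the answer.
                return server.query(name)
        return None
```

Because the client only ever asks the corporate DNS for the same name, a call to `update` on the Service Director DNS is enough to redirect later lookups to the failover host.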

What's going to fail over EGO Service Controller?
  - Answer: Load Index Manager (LIM), the first process responsible for starting all Grid services
What's going to fail over LIM?
  - LIM runs on each node under a master-slave arrangement
  - Master and slave LIMs communicate host load index info regularly
  - If the master LIM cannot be reached, the slave LIMs elect a new master LIM among themselves
  - The new master can restart EGO Service Controller
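The slide does not say how the slave LIMs choose a new master. As one possible illustration, the sketch below assumes an ordered list of master candidates and a heartbeat timeout: the first candidate heard from recently becomes master. The function `elect_master` and its parameters are hypothetical and do not describe LIM's actual protocol.

```python
from typing import Dict, List, Optional


def elect_master(
    candidates: List[str],
    last_seen: Dict[str, float],
    now: float,
    timeout: float,
) -> Optional[str]:
    """Return the first candidate whose last heartbeat is within the timeout.

    candidates: hosts in priority order (assumed configuration, not from the slides)
    last_seen:  host -> timestamp of the most recent heartbeat received
    """
    for host in candidates:
        seen = last_seen.get(host)
        if seen is not None and now - seen <= timeout:
            return host
    return None  # no candidate is reachable
```

With a fixed priority order every surviving node reaches the same answer from the same heartbeat data, which is why this kind of rule is a common choice for simple master failover.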


Example: HA for SAS Metadata Server
Steps: Install SAS Metadata Server, Install LSF, Configure EGO Services, Failover Tests, Eliminating Hot-Standby

Install LSF
Assumptions:
  - hostA = default host for Metadata Server
  - hostB = failover host for Metadata Server
  - \\hostF\LSFShare = shared config dir accessible by all hosts in Grid
  - \\hostF\SAS = shared config dir for SAS Metadata Server, accessible by hostA and hostB
1. Install Platform Suite for SAS 4.1 provided with SAS Grid Manager
2. hostA should be the Grid Control Machine
3. hostB should be the failover host for the Grid Control Machine (same as for Metadata Server)

Example: HA for SAS Metadata Server
Install SAS Metadata Server
1. The user account running Metadata Server must have full access to \\hostF\SAS
2. Install Metadata Server on hostA
3. Set configuration directory to \\hostF\SAS
4. Repeat #3 for hostB
Note: SAS and 9.2 install procedures are different

Example: HA for SAS Metadata Server
Configure EGO Services
1. Create EGO service definition file for Metadata Server
2. Specify hostA and hostB as host candidates
3. Enable Service Director to start automatically on hostA and fail over to hostB
4. Configure EGO Name Service (DNS): set domain names, encryption key to CorpDNS
5. Configure Corporate DNS to forward EGO sub-domain queries to EGO DNS
6. Restart all EGO services
7. Disable DNS cache on client machine
8. Test with nslookup

Example: HA for SAS Metadata Server
Failover Tests
1. Connect with SAS Management Console to check that Metadata Server is functioning on hostA
2. Shut down hostA
3. Check with "egosh service list" that Metadata Server restarts on hostB
4. Restart SAS MC; it should connect to Metadata Server again on hostB

Example: HA for SAS Metadata Server
Eliminating Hot-Standby
  - This step is optional
  - When Metadata Server is running on hostA, hostB can be a Grid node running SAS jobs
  - During failover, LSF can be reconfigured not to send jobs to hostB
  - Use a wrapper script to close the host to new jobs before starting Metadata Server
  - Jobs already running on hostB will continue to run until they finish, but no more jobs will be dispatched to hostB
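The wrapper script mentioned above might look like the following sketch. It assumes the LSF administrative command `badmin hclose <host>` is used to stop new dispatch to the host; the function name `close_host_then_start`, the injectable `run` parameter, and whatever command is passed as `start_command` are placeholders chosen for illustration, since the slides do not show the actual script.

```python
import subprocess
from typing import Any, Callable, Sequence


def close_host_then_start(
    host: str,
    start_command: Sequence[str],
    run: Callable[..., Any] = subprocess.run,
) -> Any:
    """Close an LSF host to new jobs, then launch the service command.

    Jobs already running on the host are left alone; closing the host only
    stops further jobs from being dispatched to it.
    """
    # Assumption: the account running this wrapper may run LSF admin commands.
    run(["badmin", "hclose", host], check=True)
    # start_command is a placeholder for the site's Metadata Server start command.
    return run(list(start_command), check=True)
```

If the service later moves back to its default host, the failover host can be reopened for jobs with `badmin hopen`.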


Failover for Grid Nodes?
[Diagram: LSF and Grid Nodes Host 1, Host 2, … Host n; SAS JobA shown on two hosts. Caption: Host 1 has crashed, requeue if JobA is rerunnable.]
  - LSF can requeue a job when it detects that the host running the job has failed and the job is defined as rerunnable
  - When the job is requeued, LSF will try to dispatch it to another suitable host
  - A rerun job starts from the beginning, meaning that all previous computation and data results must be redone
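The requeue behavior described above can be sketched as a small simulation. This is not LSF code: the `Job` class, its `progress` counter, and the `handle_host_failure` function are hypothetical, and choosing the first healthy host stands in for LSF's real scheduling decision.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Job:
    name: str
    rerunnable: bool
    host: Optional[str] = None
    progress: int = 0              # units of work completed so far


def handle_host_failure(
    jobs: List[Job], failed_host: str, healthy_hosts: List[str]
) -> Tuple[List[Job], List[Job]]:
    """Requeue rerunnable jobs from a failed host; report the rest as lost."""
    requeued, lost = [], []
    for job in jobs:
        if job.host != failed_host:
            continue                        # job was not affected
        if job.rerunnable and healthy_hosts:
            job.progress = 0                # rerun starts from the beginning
            job.host = healthy_hosts[0]     # stand-in for the scheduler's choice
            requeued.append(job)
        else:
            job.host = None
            lost.append(job)
    return requeued, lost
```

Resetting `progress` to zero is the point of the example: without checkpoints, everything the job computed before the crash is repeated.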

SAS 9.2 Checkpoint with Requeuing
[Diagram: LSF and Grid Nodes Host 1, Host 2, … Host n; SAS JobA shown on two hosts. Caption: Host 1 has crashed, requeue if JobA is rerunnable.]
  - Checkpoint images written to shared directory when JobA was running on Host 1: JobA chkpnt#1, JobA chkpnt#2, … JobA chkpnt#17, …
  - Host 1 crashed!
  - SAS JobA on Host 2 resumes from chkpnt#17
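The general checkpoint-and-resume pattern can be illustrated as follows. This is a generic sketch and not SAS 9.2's checkpoint mechanism: the JSON file layout, the file naming, and the functions `save_checkpoint`, `latest_checkpoint`, and `run_job` (including its `fail_at` crash switch) are all invented for the example.

```python
import json
from pathlib import Path
from typing import Optional


def save_checkpoint(shared_dir: Path, job: str, step: int, state: dict) -> Path:
    """Write one checkpoint file to the shared directory."""
    path = shared_dir / f"{job}.chkpnt{step:06d}.json"
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"step": step, "state": state}))
    tmp.replace(path)              # rename so readers never see a partial file
    return path


def latest_checkpoint(shared_dir: Path, job: str) -> Optional[dict]:
    """Return the most recent checkpoint for the job, or None if there is none."""
    paths = sorted(shared_dir.glob(f"{job}.chkpnt*.json"))
    return json.loads(paths[-1].read_text()) if paths else None


def run_job(shared_dir: Path, job: str, total_steps: int,
            fail_at: Optional[int] = None) -> int:
    """Sum 1..total_steps, checkpointing after each step and resuming if possible."""
    saved = latest_checkpoint(shared_dir, job)
    step = saved["step"] if saved else 0
    total = saved["state"]["total"] if saved else 0
    while step < total_steps:
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated host crash")
        step += 1
        total += step
        save_checkpoint(shared_dir, job, step, {"total": total})
    return total
```

Because the checkpoints live in a directory every host can reach, the rerun on another host picks up at the last completed step instead of starting over.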

Conclusion
  - Working prototype that can fail over SAS Metadata Server in both Windows and Linux SAS Grid environments
Key advantages of this solution
  - Eliminates operating system dependencies
  - Eliminates the requirement of a hot-standby for failover
  - Eliminates the expense of purchasing a third-party tool to provide high availability
Future Work
  - Provide HA configuration templates for the rest of SAS services and platforms
  - Simplify configurations
  - Provide support for SAS 9.2 checkpoint and restart mode
