BELT & SUSPENDERS: HA & DR in one solution
Ray English, Sr. Systems Administrator, Indianapolis Power & Light Company



About Ray English
• Sr. Systems Administrator, Indianapolis Power & Light Company
  - Focusing on UNIX (primarily Solaris) systems
• UNIX geek since 1994
• Sun Certified System Administrator
• VCS administrator since 2000
• Experience in Indianapolis: IPL, Lilly, General Motors Allison Transmission (EDS)

OMS Overview
• OMS = Outage Management System
• Records outage calls from IPL customers
  - IVR ( & )
  - Phone center agents
  - “Last gasp” from meters
    - “I’m meter and I just lost power.”
• ~500,000 customers in Indianapolis area
• Predictive analysis of root cause based on call grouping (transformer, pole, etc.)
• Industry-specific software from CGI/M3i
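The "predictive analysis of root cause based on call grouping" idea can be illustrated with a minimal sketch. This is not the CGI/M3i algorithm; the `UPSTREAM` topology, the meter and device names, and the `predict_common_device` function are all hypothetical. The sketch assumes each meter knows its upstream devices, ordered from nearest to farthest, and it predicts the nearest device shared by every meter that reported an outage.

```python
# Hypothetical network model (not from the presentation): each meter maps to
# its upstream devices, ordered from nearest (transformer) to farthest (feeder).
UPSTREAM = {
    "m1": ["T1", "F1"],
    "m2": ["T1", "F1"],
    "m3": ["T2", "F1"],
    "m4": ["T2", "F1"],
    "m5": ["T3", "F2"],
}


def predict_common_device(calling_meters):
    """Return the nearest upstream device shared by all calling meters.

    Returns None when there are no calls or the calls share no device,
    which would suggest more than one independent outage.
    """
    if not calling_meters:
        return None
    first, *rest = calling_meters
    # Walk outward from the first meter; the first device that every other
    # calling meter also sits behind is the most specific common suspect.
    for device in UPSTREAM[first]:
        if all(device in UPSTREAM[meter] for meter in rest):
            return device
    return None


if __name__ == "__main__":
    print(predict_common_device(["m1", "m2"]))  # both behind transformer T1
    print(predict_common_device(["m1", "m3"]))  # different transformers, same feeder
```

More calls narrow or widen the suspect device, which matches the point made later in the deck that more data gives a better idea of what is wrong.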

OMS Map View

Zoom-in on outage

Outage summaries

OMS business challenges
• Critical system
  - Customers expect 100% reliability
  - Utilized to dispatch trucks to restore outages
• Data is constantly churning
  - Information from minutes ago could be useless
  - The more data, the better idea we have of what’s wrong
• Utilized most during high stress
  - Evenings (end-of-day for day shift)
    - Storms
    - Customers arriving home from work
  - Poor weather (storms, ice storms, snow)
  - High customer expectations
• Keep it simple

OMS Technical Architecture
• Oracle databases
  - Sun Solaris SPARC systems
• Application Tier
  - Windows systems
• Client Tier
  - Windows workstations
    - Dispatchers
    - Trucks
  - IVR systems
  - Web front-end call center agents (iCall)

High Availability (the belt)
• VERITAS Cluster Server
• Failover within datacenter
  - Human error
  - Power feeds
  - Networking
  - SAN
  - Isolated environmental
  - Application failure
  - Server failure
• Rolling upgrades

Overview of HA setup

VCS service group configuration

DR Challenges
• Loss of a site
• Need up-to-the-minute data
  - Information from minutes ago could be useless
  - No data is better than incorrect data
    - “Know that you don’t know anything.”
• Seamless to users
  - Dispatchers
  - Crews
  - Call center representatives
  - Customers

Disaster Recovery (the suspenders)
• Moderately close proximity
  - ~10 miles +/-
• Robust fiber
  - Public IP subnet & VCS heartbeats span data centers
  - Redundant loop around city
  - Lots of bandwidth
• EMC SRDF
  - Symmetrix Remote Data Facility
  - Other technologies available (VVR, etc.)
• VERITAS Cluster Server (Global Cluster Option)

Cluster Terminology
• Stretch cluster
• Stretched cluster
• Campus cluster
• Extended cluster
• Data replication cluster
• Metro cluster
• Metro stretched cluster

Overview of DR setup

VCS service group with SRDF

Overview of DR setup

Production node crashes (Time for HA!)

Loss of production site (Time for DR!)

Running at the DR site

Failback to the production site
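The sequence on the preceding slides (node crash handled by HA, site loss handled by DR) can be summarized in a minimal sketch. This is not VCS code or configuration. The `choose_failover_target` function, the node names, and the `(site, healthy)` tuples are hypothetical and only illustrate the "belt first, then suspenders" ordering: prefer a healthy node in the same site, and go to the other site only when none is left.

```python
def choose_failover_target(nodes, failed_node):
    """Pick where a service should restart after `failed_node` goes down.

    `nodes` maps a node name to a (site, healthy) tuple. Returns a
    (target, mode) pair, where mode is "HA" for a same-site failover,
    "DR" for a cross-site failover, or "none" if nothing is available.
    """
    home_site = nodes[failed_node][0]
    candidates = [
        name
        for name, (site, healthy) in nodes.items()
        if healthy and name != failed_node
    ]
    # The belt: a healthy node in the same data center.
    local = [name for name in candidates if nodes[name][0] == home_site]
    if local:
        return local[0], "HA"
    # The suspenders: a healthy node at the other site.
    remote = [name for name in candidates if nodes[name][0] != home_site]
    if remote:
        return remote[0], "DR"
    return None, "none"


if __name__ == "__main__":
    cluster = {
        "prod1": ("production", False),  # crashed node
        "prod2": ("production", True),
        "dr1": ("dr", True),
    }
    print(choose_failover_target(cluster, "prod1"))
```

Failback to the production site is the same decision run in reverse once the production nodes are healthy and the replicated data has been resynchronized.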

The data is there: now what?

Gotchas
• Mounts should be the same on both sides
• SRDF needs to be “synchronous”
  - Diskgroup, volumes, and filesystem need to be consistent
  - “Adaptive copy” doesn’t cut it: individual devices in the disk group fall behind
• Networking between sites needs to be robust
  - Redundant: Prevent split-brain
  - Fast: VCS heartbeats, data replication
  - Big: Data replication, public network traffic
• Freeze/disable failover to the DR servers
  - Risk vs. Reward
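Why individual devices falling behind breaks consistency can be shown with a deliberately simplified model. This is not SRDF code and does not use any EMC API. The `remote_image` and `is_write_order_consistent` functions, the device names, and the per-device lag counts are assumptions made for illustration. The model treats the remote copy as consistent only if it holds an unbroken prefix of the writes in the order the host issued them.

```python
def remote_image(writes, lag_per_device):
    """Return the indexes of host writes that have reached the remote site.

    `writes` is the ordered list of (device, payload) writes issued by the
    host. `lag_per_device` gives, per device, how many of its most recent
    writes have not been copied yet (0 everywhere models synchronous mode).
    """
    per_device = {}
    for index, (device, _payload) in enumerate(writes):
        per_device.setdefault(device, []).append(index)

    applied = set()
    for device, indexes in per_device.items():
        lag = lag_per_device.get(device, 0)
        copied = max(0, len(indexes) - lag)
        applied.update(indexes[:copied])
    return applied


def is_write_order_consistent(applied):
    """True if the remote holds writes 0..k-1 with no gaps."""
    return applied == set(range(len(applied)))


if __name__ == "__main__":
    writes = [("log", "begin"), ("data", "row"), ("log", "commit")]

    synchronous = remote_image(writes, {})
    print(is_write_order_consistent(synchronous))  # every write is remote

    # The data device is one write behind: the commit record arrived,
    # but the row it refers to did not.
    lagging = remote_image(writes, {"data": 1})
    print(is_write_order_consistent(lagging))
```

In the lagging case the remote disk group holds a commit without the data it covers, which is the kind of image a filesystem or database cannot safely start from.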

Why have idle DR hardware?
• Run Test, Development, Sandbox, Training, etc. environments on DR equipment when it’s not needed.
• Load on these environments will probably be minimal if you’re in “DR Mode”
• Also add these environments to VCS
  - Easily offline if horsepower is needed for DR
  - Service group dependencies
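The idea of taking non-production environments offline when DR needs the horsepower can be sketched as a small selection routine. In practice VCS would express this through service group dependencies rather than custom code. The `Group` class, the CPU counts, the priorities, and the `groups_to_offline` function below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    name: str
    cpus: int      # CPUs the environment currently occupies (assumed metric)
    priority: int  # lower value = taken offline first


def groups_to_offline(running, total_cpus, needed_cpus):
    """Return names of groups to stop so `needed_cpus` become free.

    Groups are stopped in ascending priority order until enough capacity
    is available. Raises RuntimeError if stopping everything is not enough.
    """
    free = total_cpus - sum(group.cpus for group in running)
    to_stop = []
    for group in sorted(running, key=lambda g: g.priority):
        if free >= needed_cpus:
            break
        to_stop.append(group.name)
        free += group.cpus
    if free < needed_cpus:
        raise RuntimeError("DR host cannot free enough capacity")
    return to_stop


if __name__ == "__main__":
    running = [
        Group("test", cpus=6, priority=2),
        Group("sandbox", cpus=4, priority=0),
        Group("training", cpus=4, priority=1),
    ]
    # 16 CPUs total, 14 in use, production needs 8.
    print(groups_to_offline(running, total_cpus=16, needed_cpus=8))
```

With these numbers the sandbox and training environments are stopped and the test environment keeps running alongside production.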

Global cluster service groups?
• Adds complexity that may not be needed
  - Networking (DNS, etc.)
  - Management in VCS (Cluster of clusters)
  - GCO Proxy
• Instead, use parts of Global Cluster
  - Replication agents

Oracle RAC (parallel service groups)
• Oracle RAC between metro sites using data replication requires use of Global Cluster service groups because you’re failing over between clusters, not machines.
  - All-or-nothing at each site (because only 1 site can have valid data access at a time) is enforced by GCO
• Machine-based failover for Oracle RAC within each site is primarily handled by Oracle RAC itself.

Remember…
• Don’t underestimate the power of network and storage magic.
• Call to report IPL power outages
• VCS makes “belt & suspenders” easy for metro failover clusters with robust infrastructure.
• A “fall back to an hour ago / yesterday / last week” situation requires other planning besides this (backups).
• Your mileage may vary.

Obligatory slide of logos

Questions?
• Any questions?
Ray English