Five 9s for SANs w/o Breaking the Bank Presented by Marc Staimer President & CDS (Chief Dragon Slayer) Dragon Slayer Consulting.

Presentation transcript:

Agenda What is Five 9s? How this relates to SANs Reality Check What you should do

What is Five 9s & What does it Really Mean?

Five 9s Generally Defined 99.999% is another term for “High Availability”

What does “Availability” mean? Availability is the proportion of time that a system can be used for productive work

Then what does “five 9s” mean? Scheduled & unscheduled downtime does not exceed ~5 minutes per year. Perspective: annual downtime = less time than it takes to drink a cup of coffee; 1/6th the time of the average daily commute

What about Four 9s or less? Four 9s = ~ an hour of downtime/yr Three 9s = ~ 9 hours of downtime/yr Two 9s = ~ 4 days (88 Hours) of downtime/yr
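
The downtime figures on these two slides follow from simple arithmetic. A minimal sketch, assuming a 365-day year; the function name is illustrative and not part of the presentation:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def annual_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Print the downtime budget for each "number of 9s" level named on the slides.
for label, pct in [("two 9s", 99.0), ("three 9s", 99.9),
                   ("four 9s", 99.99), ("five 9s", 99.999)]:
    minutes = annual_downtime_minutes(pct)
    print(f"{label:8s} {pct}%  {minutes:9.1f} min/yr  ({minutes / 60:6.2f} h/yr)")
```

Two 9s works out to about 87.6 hours a year, which the slide rounds to 88 hours, and five 9s to a little over 5 minutes.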

Can you live with two, three, or four 9s? …It depends on: the application; the types of outages you can live with; the cost of downtime for those applications; the cost of high availability such as five 9s.

Application Availability Dependencies Mission criticalness Productivity loss from downtime Alternatives

Outage dependencies You may be able to live w/two 9s if there are 88 separate outages of 1 hour each through the year. It is a different story if it is 1 outage of nearly 4 days. This could put a business out of business

Cost of downtime The cost of app downtime can be prohibitive

Direct costs of downtime per Gartner Group
Industry: Average Loss/Hr.
Brokerage Operations: $6,450,000
Credit Card Authorizations: $2,600,000
E-commerce: $240,000
Package Shipping Services: $150,250
Home Shopping Channels: $113,750
Catalog Sales Center: $90,000
Airline Reservation Center: $89,500
Cellular Service Activation: $41,000
ATM Service Fees: $14,500
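
To put the hourly figures in annual terms, one can multiply them by the downtime hours that a given availability level allows. This is a rough sketch only: it assumes losses scale linearly with downtime and ignores collateral damage, and the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years

# A few of the average hourly losses quoted on the slide (Gartner Group).
HOURLY_LOSS_USD = {
    "Brokerage Operations": 6_450_000,
    "Credit Card Authorizations": 2_600_000,
    "E-commerce": 240_000,
    "ATM Service Fees": 14_500,
}


def annual_downtime_cost(hourly_loss: float, availability_percent: float) -> float:
    """Direct loss per year if the full downtime budget is consumed.

    ASSUMPTION: loss is linear in downtime; real losses may not be.
    """
    downtime_hours = HOURS_PER_YEAR * (1 - availability_percent / 100)
    return hourly_loss * downtime_hours


for industry, loss in HOURLY_LOSS_USD.items():
    three = annual_downtime_cost(loss, 99.9)
    five = annual_downtime_cost(loss, 99.999)
    print(f"{industry:28s} three 9s: ${three:>13,.0f}   five 9s: ${five:>11,.0f}")
```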

Collateral damage of downtime is more per Gartner Group
Company: Direct Cost; Collateral Damage
eBay: > $5,000,000; dramatic mkt cap reduction
ATT: > $10,000,000; ~$40 million in rebates + SLAs
Collateral damage is more serious than temporary loss of business
Collateral damage severity increases as business moves online

Making “availability” five 9s has a cost too. Old rule of thumb: 1st 80% = 20% of cost; last 20% = 80% of cost. Per IMEX Research

There must be tradeoffs Per IMEX Research

Finding the crossover point is key [Chart: System Cost and Annual Business Downtime Cost plotted against Percent Available (90%, 99%, 99.90%, 99.99%, 99.999%, 100%), with regions marked Excessive Downtime Costs and Excessive System Costs, and System Uptime Requirements indicated]
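
One way to approximate the crossover is to add system cost and downtime cost for each availability tier and pick the tier with the lowest total. The sketch below assumes linear downtime losses, and every system cost figure in it is HYPOTHETICAL, chosen only to illustrate the shape of the tradeoff; none of them come from the presentation.

```python
HOURS_PER_YEAR = 365 * 24

# HYPOTHETICAL annual system cost per availability tier (illustration only).
SYSTEM_COST_USD = {
    90.0: 50_000,
    99.0: 100_000,
    99.9: 250_000,
    99.99: 600_000,
    99.999: 1_500_000,
}


def total_annual_cost(availability_percent: float, hourly_loss: float) -> float:
    """System cost plus the downtime cost implied by that availability tier."""
    downtime_hours = HOURS_PER_YEAR * (1 - availability_percent / 100)
    return SYSTEM_COST_USD[availability_percent] + hourly_loss * downtime_hours


def cheapest_tier(hourly_loss: float) -> float:
    """Return the availability tier with the lowest combined annual cost."""
    return min(SYSTEM_COST_USD, key=lambda pct: total_annual_cost(pct, hourly_loss))


# With these made-up system costs, an application losing $14,500/hr lands on
# three 9s, while one losing $6,450,000/hr justifies five 9s.
print(cheapest_tier(14_500), cheapest_tier(6_450_000))
```

The point is the shape of the result, not the numbers: the higher the hourly loss, the further right the lowest-cost tier moves.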

How: Thorough Environment Knowledge Systems Hardware Software Data Productivity Direct cost of downtime and collateral damage

What about disasters & downtime Not if, when There will eventually be a major interruption of your business environment

Test, test, test Whatever your business continuity plans, make sure you can recover your business in the event of a failure. Test, test, test. One end-user claims to back up to tape every month, except he backs up onto the same tape every time, even when the system asks for a new tape

Reasons cited by European Enterprises for invocation of Business Continuity Plans
Hardware Failure: 60%
Software: 16%
Power Outage: 7%
Bomb: 3%
Fire: 3%
Flooding: 3%
Environmental: 2%
Telecom Failure: 1%
Denied access: 1%
Miscellaneous: 4%

Reasons cited by USA Enterprises for invocation of Business Continuity Plans
Regional Event: 40%
Hardware Failure: 36%
Software: 10%
Power Outage: 4%
Bomb: 2%
Fire: 2%
Flooding: 2%
Environmental: 1%
Telecom Failure: 1%
Denied access: 1%
Miscellaneous: 1%

How does all this relate to SANs?

SANs have become the critical path of “high availability” or five 9s. When an application server fails Only the users using that app are affected When shared storage goes down Users of the applications using that storage are affected When the SAN goes down All users are affected
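
The critical-path argument can be made numerically. If an application needs its server, the SAN, and the storage all working at once, the end-to-end availability is roughly the product of the three. The sketch below assumes component failures are independent, which real environments only approximate; the example availabilities are hypothetical, not taken from the slides.

```python
from math import prod


def series_availability(*components: float) -> float:
    """Availability of a chain in which every component must be up.

    ASSUMPTION: failures are independent.
    """
    return prod(components)


def redundant_availability(component: float, copies: int = 2) -> float:
    """Availability when any one of `copies` identical, independent paths suffices."""
    return 1 - (1 - component) ** copies


# HYPOTHETICAL figures: a four-9s server behind a three-9s single fabric
# in front of five-9s storage.
single_fabric = series_availability(0.9999, 0.999, 0.99999)

# Same chain, but with two independent three-9s fabrics in parallel.
dual_fabric = series_availability(0.9999, redundant_availability(0.999), 0.99999)

print(f"single fabric: {single_fabric:.6f}   dual fabric: {dual_fabric:.6f}")
```

Under these assumptions the chain is never more available than its weakest link, which is why the shared fabric matters more than any single server.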

Complete availability vs. high availability w/reduced capabilities Five 9s w/no loss of capabilities Full Bandwidth all the time w/no pr Five 9s w/reduced capabilities Reduced Bandwidth Higher probability of path congestion Similar to differences between RAID 0,1 & RAID 5

Five 9s SANs with full capabilities Director class switches Full bandwidth between Initiators & target storage Even with a failure in the Director or fabric

Five 9s SANs with reduced capabilities Core/edge networking Oversubscribed B/W Path failures mean Auto failover Reduced B/W Increased possibilities of congestion
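
The reduced-bandwidth failure mode can be expressed as an oversubscription ratio: device-facing ports divided by inter-switch link (ISL) ports on an edge switch. A minimal sketch, assuming all ports run at the same speed; the split of 12 device ports and 4 ISLs on a 16-port edge switch is a hypothetical example, not a figure from the presentation.

```python
def oversubscription_ratio(device_ports: int, isl_ports: int) -> float:
    """Ratio of device-facing bandwidth to ISL bandwidth on an edge switch.

    ASSUMPTION: every port runs at the same line rate.
    """
    if isl_ports <= 0:
        raise ValueError("edge switch has no remaining path to the core")
    return device_ports / isl_ports


# HYPOTHETICAL 16-port edge switch: 12 device ports, 4 ISLs to the core.
healthy = oversubscription_ratio(device_ports=12, isl_ports=4)

# After one ISL fails, traffic fails over to the remaining 3 links.
degraded = oversubscription_ratio(device_ports=12, isl_ports=3)

print(f"healthy {healthy:.1f}:1, after one ISL failure {degraded:.1f}:1")
```

The fabric stays up after the failure, but the same device traffic now shares fewer links, which is where the increased chance of congestion comes from.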

Fabric Comparison or Red Herring? [Diagram: a 96-port resilient core/edge fabric built from 16-port core and edge switches, compared with a 128-port fault tolerant director fabric or 128 ports from dual 64-port directors]

Directors vs. Core/Edge Switches
Directors (five 9s, fully capable): cost ~$2,500/port; mask failures (apps never know it fails); full B/W even with failures; simple to set up & manage; fault tolerant; network of up to 239 switches/directors; up to 256 ports/director; can be Core or Edge switch
Switches (five 9s, w/reduced failure mode capabilities): cost ~$1,000/port; oversubscribed B/W; congestion statistically unlikely; failures mean loss of B/W; more difficult to set up/manage; fault resilient; network of up to 239 switches; up to 64 ports/switch; can be Core or Edge switch

Reality Check Core/edge & Directors are not mutually exclusive. Models can & should be mixed. Some apps cannot handle fabric disruptions of any kind. Some fabrics can never ever have reduced capacity. Some apps do not have to have full B/W all the time

Fabric Design “five 9s” Factors The larger the switch/director nodes The less likely there will be inter-switch/director traffic The more oversubscribed your fabric can be w/o increased risk The more important “HA” becomes in the node itself FSPF has limited failover capabilities The loss of a path in the fabric (ISL failure) will cause failover Failover may not be fast enough to avoid SCSI device timeout Edge device retransmissions or failover must be designed in

The Key is determining where to implement with what & when Use the same ROE as before Thorough knowledge of the data & environment Hardware, software, systems, etc. Match the type of SAN to the application

What you should do Educate yourself about your data & environment Design your SANs to meet the needs of the business Provide five 9s with full capability for those apps that need it Provide five 9s with less than full capability for those apps that don’t need it Making your entire SAN environment completely five 9s w/no loss of capabilities could be cost prohibitive

SAN Design Methodology [Diagram of lifecycle phases Design, Implementation, Maint.; steps shown: Data Collection, Data Analysis, Arch Develop, Prototype and Test, Transition, Release to Production, Add / Change / Remove / Mgt / Troubleshoot, Upgrade / Architectural change]

Other tools you can use Interactive online high availability interrogator Helps determine the cost of your downtime White papers

Marc Staimer