Five 9s for SANs w/o Breaking the Bank Presented by Marc Staimer President & CDS (Chief Dragon Slayer) Dragon Slayer Consulting.

Presentation transcript:

2 Five 9s for SANs w/o Breaking the Bank Presented by Marc Staimer President & CDS (Chief Dragon Slayer) Dragon Slayer Consulting

3 Agenda What is Five 9s? How this relates to SANs Reality Check What you should do

4 What is Five 9s & What does it Really Mean?

5 Five 9s Generally Defined: 99.999% is another term for “High Availability”

6 What does “Availability” mean? Availability is the proportion of time that a system can be used for productive work

7 Then what does “five 9s” mean? Scheduled & Unscheduled downtime does not exceed ~ 5 minutes per year. Perspective: Annual downtime = Less time than it takes to drink a cup of coffee; 1/6th the time of the average daily commute

8 What about Four 9s or less? Four 9s = ~ an hour of downtime/yr Three 9s = ~ 9 hours of downtime/yr Two 9s = ~ 4 days (88 Hours) of downtime/yr
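
The downtime figures on slides 7 and 8 follow from simple arithmetic. A minimal sketch, assuming a 365-day year and counting scheduled and unscheduled downtime together (the function names are illustrative, not from the presentation):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def availability_for_nines(nines: int) -> float:
    """Availability as a fraction, e.g. 5 nines -> 0.99999."""
    return 1 - 10 ** -nines


def annual_downtime_minutes(nines: int) -> float:
    """Maximum downtime per year allowed by the given number of nines."""
    return MINUTES_PER_YEAR * 10 ** -nines


for n in range(2, 6):
    minutes = annual_downtime_minutes(n)
    print(
        f"{n} nines ({availability_for_nines(n):.3%}): "
        f"{minutes:,.1f} min/yr = {minutes / 60:,.2f} h/yr"
    )
```

Under these assumptions the results are about 5.3 minutes for five 9s, 52.6 minutes for four 9s, 8.8 hours for three 9s, and 87.6 hours for two 9s, which matches the rounded values on the slides.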

9 Can you live with two, three, or four 9s? …it Depends: On the Application; The types of outages you can live with; The cost of downtime for those applications; The cost of high availability such as five 9s.

10 Application Availability Dependencies Mission criticalness Productivity loss from downtime Alternatives

11 Outage dependencies You may be able to live w/two 9s if: There are 88 separate outages of 1 hour each through the year. It is a different story if it is 1 outage of nearly 4 days: This could put a business out of business

12 Cost of downtime The cost of app downtime can be prohibitive

13 Direct costs of downtime per Gartner Group
Industry                       Average Loss/Hr.
Brokerage Operations           $6,450,000
Credit Card Authorizations     $2,600,000
E-commerce                     $240,000
Package Shipping Services      $150,250
Home Shopping Channels         $113,750
Catalog Sales Center           $90,000
Airline Reservation Center     $89,500
Cellular Service Activation    $41,000
ATM Service Fees               $14,500
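
To put the hourly figures on slide 13 next to the availability levels from slides 7 and 8, the sketch below multiplies each loss per hour by the annual downtime each level allows. It assumes that loss scales linearly with hours of downtime, which is a simplification the presentation does not state, and it covers direct costs only (no collateral damage). The function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored

# Average loss per hour of downtime, as listed on slide 13.
LOSS_PER_HOUR = {
    "Brokerage Operations": 6_450_000,
    "Credit Card Authorizations": 2_600_000,
    "E-commerce": 240_000,
    "Package Shipping Services": 150_250,
    "Home Shopping Channels": 113_750,
    "Catalog Sales Center": 90_000,
    "Airline Reservation Center": 89_500,
    "Cellular Service Activation": 41_000,
    "ATM Service Fees": 14_500,
}


def annual_direct_loss(loss_per_hour: float, nines: int) -> float:
    """Direct loss per year if the full downtime allowance is used.

    Assumption: loss is linear in downtime hours.
    """
    downtime_hours = HOURS_PER_YEAR * 10 ** -nines
    return loss_per_hour * downtime_hours


for industry, loss in LOSS_PER_HOUR.items():
    row = "  ".join(
        f"{n} nines: ${annual_direct_loss(loss, n):>14,.0f}" for n in range(2, 6)
    )
    print(f"{industry:<28} {row}")
```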

14 Collateral damage of downtime is more per Gartner Group
Company    Direct Cost      Collateral Damage
eBay       > $5,000,000     Dramatic Mkt cap reduction
ATT        > $10,000,000    ~$40 million in rebates + SLAs
Collateral damage is more serious than temporary loss of business
Collateral damage severity increases as business moves online

15 Making “availability” five 9s has a cost too. Old rule of thumb: 1st 80% = 20% of Cost; Last 20% = 80% of Cost. Per IMEX Research

16 There must be tradeoffs Per IMEX Research

17 Finding the crossover point is key [Chart: System Cost and Annual Business Downtime Cost plotted against Percent Available (90%, 99%, 99.90%, 99.99%, 99.999%, 100%). Labels: Excessive Downtime Costs, Excessive System Costs, System Uptime Requirements]
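
One way to read the crossover chart on slide 17 is as a search for the availability level with the lowest combined cost. The sketch below is only an illustration: the system costs in EXAMPLE_SYSTEM_COST are made-up placeholders, not figures from the presentation, and downtime cost is assumed to be linear in downtime hours. The loss-per-hour inputs come from slide 13.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored

# HYPOTHETICAL annualized system cost per availability level (number of nines).
# Replace with figures from your own environment.
EXAMPLE_SYSTEM_COST = {2: 100_000, 3: 250_000, 4: 600_000, 5: 1_500_000}


def total_annual_cost(nines: int, loss_per_hour: float, system_cost: dict) -> float:
    """System cost plus downtime cost for one availability level."""
    downtime_cost = loss_per_hour * HOURS_PER_YEAR * 10 ** -nines
    return system_cost[nines] + downtime_cost


def cheapest_level(loss_per_hour: float, system_cost: dict) -> int:
    """Number of nines with the lowest total annual cost."""
    return min(
        system_cost,
        key=lambda n: total_annual_cost(n, loss_per_hour, system_cost),
    )


for name, loss in [
    ("ATM Service Fees", 14_500),
    ("Catalog Sales Center", 90_000),
    ("Brokerage Operations", 6_450_000),
]:
    print(f"{name}: {cheapest_level(loss, EXAMPLE_SYSTEM_COST)} nines")
```

With these placeholder system costs, the cheapest level comes out at three 9s for ATM service fees, four 9s for the catalog sales center, and five 9s for brokerage operations. The point is the shape of the tradeoff, not the specific numbers.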

18 How: Thorough Environment Knowledge Systems Hardware Software Data Productivity Direct cost of downtime and collateral damage

19 What about disasters & downtime Not if, when There will eventually be a major interruption of your business environment

20 Test, test, test Whatever your business continuity plans, make sure you can recover your business in the event of a failure. Test, test, test. One end-user claims to back up to tape every month, except he backs up onto the same tape every time, even when the system asks for a new tape

21 Reasons cited by European Enterprises for invocation of Business Continuity Plans From
Hardware Failure    60%
Software            16%
Power Outage        7%
Bomb                3%
Fire                3%
Flooding            3%
Environmental       2%
Telecom Failure     1%
Denied access       1%
Miscellaneous       4%

22 Reasons cited by USA Enterprises for invocation of Business Continuity Plans From
Regional Event      40%
Hardware Failure    36%
Software            10%
Power Outage        4%
Bomb                2%
Fire                2%
Flooding            2%
Environmental       1%
Telecom Failure     1%
Denied access       1%
Miscellaneous       1%

23 How does all this relate to SANs?

24 SANs have become the critical path of “high availability” or five 9s. When an application server fails, only the users using that app are affected. When shared storage goes down, users of the applications using that storage are affected. When the SAN goes down, all users are affected
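
Because the SAN sits in the path between every server and its storage, its availability multiplies into the availability each application actually sees. The sketch below uses the standard series and parallel availability formulas. It assumes component failures are independent, which real environments only approximate, and the component values are illustrative rather than taken from the presentation.

```python
from math import prod

HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def series_availability(components):
    """Every component in the chain must be up (server -> SAN -> storage)."""
    return prod(components)


def parallel_availability(single: float, copies: int) -> float:
    """At least one of `copies` independent, identical paths must be up."""
    return 1 - (1 - single) ** copies


def downtime_hours(availability: float) -> float:
    """Expected downtime per year for a given availability fraction."""
    return (1 - availability) * HOURS_PER_YEAR


# Illustrative values: three five-9s components in series.
chain = series_availability([0.99999, 0.99999, 0.99999])
print(f"server + SAN + storage: {chain:.5%}, {downtime_hours(chain) * 60:.1f} min/yr")

# Illustrative values: two independent three-9s fabrics used as redundant paths.
dual = parallel_availability(0.999, 2)
print(f"dual fabric: {dual:.6%}, {downtime_hours(dual) * 60:.1f} min/yr")
```

Three five-9s components in series deliver roughly 99.997% end to end, so each link in the chain has to be better than the target for the application to reach it.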

25 Complete availability vs. high availability w/reduced capabilities Five 9s w/no loss of capabilities Full Bandwidth all the time w/no pr Five 9s w/reduced capabilities Reduced Bandwidth Higher probability of path congestion Similar to differences between RAID 0,1 & RAID 5

26 Five 9s SANs with full capabilities Director class switches Full bandwidth between Initiators & target storage Even with a failure in the Director or fabric

27 Five 9s SANs with reduced capabilities Core/edge networking; Oversubscribed B/W. Path failures mean: Auto failover; Reduced B/W; Increased possibilities of congestion

28 Fabric Comparison or Red Herring? [Diagram: 96 Port Resilient Core/Edge Fabric (Core Switch, Edge Switch, using 16-port switches) vs. 128 Port Fault Tolerant Director Fabric, or 128 Port Dual 64 Port Directors]

29 Directors vs. Core/Edge Switches
Directors - five 9s fully capable
  Cost ~ $2,500/port
  Mask failures
  Apps never know it fails
  Full B/W even with failures
  Simple to set up & manage
  Fault tolerant
  Network: up to 239 switches/directors
  Up to 256 ports/director
  Can be Core or Edge switch
Switches - five 9s, w/reduced failure mode capabilities
  Cost ~ $1,000/port
  Oversubscribed B/W
  Congestion statistically unlikely
  Failures mean loss of B/W
  More difficult to set up/manage
  Fault resilient
  Network: up to 239 switches
  Up to 64 ports/switch
  Can be Core or Edge Switch
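
The per-port prices on slide 29 compare raw ports, but a core/edge fabric spends some of its ports on inter-switch links (ISLs). The sketch below estimates cost per usable port and the oversubscription ratio. The topology is an assumption chosen to reach the 96 usable ports named on slide 28 with 16-port switches (8 edge switches, 2 core switches, 4 ISLs per edge switch); the presentation does not give the actual layout, and the class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreEdgeFabric:
    ports_per_switch: int
    edge_switches: int
    core_switches: int
    isls_per_edge: int   # uplinks from each edge switch, spread across the cores
    cost_per_port: float

    def __post_init__(self):
        # Every edge uplink needs a port on some core switch.
        core_ports_needed = self.edge_switches * self.isls_per_edge
        if core_ports_needed > self.core_switches * self.ports_per_switch:
            raise ValueError("not enough core ports for the requested ISLs")

    @property
    def usable_ports(self) -> int:
        """Ports left on edge switches for servers and storage."""
        return self.edge_switches * (self.ports_per_switch - self.isls_per_edge)

    @property
    def raw_ports(self) -> int:
        return (self.edge_switches + self.core_switches) * self.ports_per_switch

    @property
    def oversubscription(self) -> float:
        """Device ports per ISL on each edge switch (3.0 means 3:1)."""
        return (self.ports_per_switch - self.isls_per_edge) / self.isls_per_edge

    @property
    def cost(self) -> float:
        return self.raw_ports * self.cost_per_port

    @property
    def cost_per_usable_port(self) -> float:
        return self.cost / self.usable_ports


# ASSUMED topology; only the port size, the 96-port total, and the
# ~$1,000/port and ~$2,500/port prices come from the slides.
fabric = CoreEdgeFabric(16, 8, 2, 4, 1_000.0)
director_cost = 128 * 2_500.0  # 128-port director fabric at ~$2,500/port

print(f"core/edge: {fabric.usable_ports} usable of {fabric.raw_ports} raw ports")
print(f"core/edge: ${fabric.cost:,.0f} total, ${fabric.cost_per_usable_port:,.0f}/usable port")
print(f"core/edge: {fabric.oversubscription:.0f}:1 oversubscription")
print(f"director:  128 ports, ${director_cost:,.0f} total, $2,500/port")
```

Under this assumed layout the core/edge fabric costs about $1,667 per usable port at 3:1 oversubscription, so the price gap to the director is narrower than the raw per-port figures suggest.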

30 Reality Check Core/edge & Directors are not mutually exclusive. Models can & should be mixed. Some apps cannot handle fabric disruptions of any kind. Some fabrics can never ever have reduced capacity. Some apps do not have to have full B/W all the time

31 Fabric Design “five 9s” Factors The larger the switch/director nodes: The less likely there will be inter-switch/director traffic; The more oversubscribed your fabric can be w/o increased risk; The more important “HA” becomes in the node itself. FSPF has limited failover capabilities: The loss of a path in the fabric (ISL failure) will cause failover; Failover may not be fast enough to avoid SCSI device timeout; Edge device retransmissions or failover must be designed in

32 The Key is determining where to implement with what & when Use the same ROE as before Thorough knowledge of the data & environment Hardware, software, systems, etc. Match the type of SAN to the application

33 What you should do Educate yourself about your data & environment Design your SANs to meet the needs of the business Provide five 9s with full capability for those apps that need it Provide five 9s with less than full capability for those apps that don’t need it Making your entire SAN environment completely five 9s w/no loss of capabilities could be cost prohibitive

34 SAN Design Methodology [Diagram. Phases: Design, Implementation, Maint. Steps: Transition, Data Collection, Data Analysis, Arch Develop, Prototype and Test, Release to Production, Add / Change / Remove / Mgt / Troubleshoot, Upgrade / Architectural change]

35 Other tools you can use Interactive online high availability interrogator Helps determine the cost of your downtime White papers

36 Marc Staimer

