Uptime All the Time: Datacenter Availability Strategies

Use this slide while people are walking into the room, but before you start talking, advance to the blank slide. It will draw the audience’s attention to you.

INTRODUCTION
Presenter’s name
Content of discussion – The common misperceptions of high availability, disaster recovery and future high availability directions
Goal of discussion – Dispel some of the misperceptions of high availability while highlighting the competitive differentiators of VERITAS solutions
FIND OUT
Do you have availability needs?
Have you determined your mission and business critical applications?
Do you have a disaster recovery need?
Are you currently using replication?
What platforms are in your environment?
Where are your data centers located?
DISCUSSION
Discuss availability needs
Recommend technologies based on needs
Clustering for application availability
Replication for data availability
TRANSITION
There are many threats to availability that impact the environment.

Threats to availability
DATA CORRUPTION, COMPONENT FAILURE, APPLICATION FAILURE, HUMAN ERROR, MAINTENANCE, SITE OUTAGE
INTRODUCTION: Availability is fundamentally the most important aspect of a datacenter. There are many threats to availability that organizations are trying to protect against.
PROBLEM
Data corruption: The biggest threat to data availability. The easiest way to protect against this threat is to take frequent snapshots of the data so that you can roll back to a known good copy.
Component failure: Many hardware vendors are doing a good job of ensuring their components are available, but this is still a threat in the data center.
Application failure: Quickly becoming the most central part of availability in the organization. Application failures mean direct lost revenue because transactions cannot be processed.
Human Error: Not automating events can lead to large amounts of human error.
Maintenance: Maintenance is by far the largest contributor to downtime in the environment. According to Gartner, 80% of downtime is planned.
Site Outage: Even if an organization does everything to protect against local failures, that effort will not ensure availability if an entire site is out. Therefore, site availability must also be part of the availability plan.
TRANSITION: In order to achieve availability and protect against these threats, different technologies need to be implemented. The easiest way to decide which technologies to use is the availability curve.
COLOR SPOTS
Data corruption: How many times have you permanently deleted a file and wished you could get it back? If you are only doing backups, the last good copy would be from yesterday instead of earlier today.
Human Error: I was at a customer site during a DR test and everyone was there, including the CTO, IT Directors and Managers as well as the sys admins. It was over the weekend and the purpose of the meeting was to test the DR plan. At that time the customer was replicating their data but not doing any clustering. The sys admin started the DR test and it failed. Why? Because he had worked the previous week to completely script the failover and mistyped one command. As a result of this simple human error the DR failover failed in front of his entire management team. Now what would happen if this was a true disaster?
Maintenance: I went on to my mobile carrier’s site the other day to pay my bill and could not log on because it said “closed for 2 hours due to site maintenance”. Or I went on to my bank’s site to pay my bills and it was down for 3 hours on Friday night (the night I pay my bills) because of a site outage.

Achieving appropriate levels of availability
availability, backup, data availability, local clustering, remote replication, remote clustering
INTRODUCTION: The basic level of availability is backup. All data needs to be protected using an effective backup solution.
Local Mirroring: Provides real-time data availability within the local data center (protects against data corruption). This is what our Storage Foundation products provide.
Local Clustering: Once the data is protected, the next step is application and server availability via local clustering.
After local availability has been achieved, the next step to a comprehensive availability plan is ensuring site availability. The way to do this is to ensure the data and applications are available at the remote site.
Remote replication: Provides the ability to move the data in real time to another location.
Remote clustering: The data is only as good as the application accessing it.
By implementing these levels of availability, the availability objectives are met while providing immediate ROI.
ADVANCE SLIDE
PROOF: Achieving availability not only protects the environment from downtime but can also provide immediate and measurable ROI.
El Camino Hospital: Located down the street, implemented our entire suite of HA products to achieve 24x7 availability while consolidating servers and bringing everything under cluster control. This automation process has allowed them to achieve an ROI of $3.4 million over 24 months by saving on hardware, software and maintenance costs.
TRANSITION: Typically, customers want to achieve availability but they have many misconceptions about the process of achieving high availability.
NOTES
Local Mirroring: Talk about snapshots, point-in-time copies, to provide real-time data protection from corruption.
Remote replication: Talk about Volume Manager for replication over Fibre or VVR for replication over IP.
Remote clustering: Highlight that VERITAS provides a complete solution to tie your replication (whether from VERITAS or someone else) and clustering together in a single solution.

Common misperceptions of high availability
It’s expensive
It’s complex
It’s difficult to measure
It’s a different problem than disaster recovery
It’s hard to test
Note: This slide has hyperlinks to the different areas so that you can jump around and focus on the areas that the customer wants to hear about. To get back to this slide simply click on the VERITAS logo on the bottom right hand side. This can be very useful if you only have a few minutes to give the presentation because it allows you to skip directly to the one or two issues the customer is most interested in.
INTRODUCTION: As availability is talked about within many organizations, there are several thoughts that seem to go through many of our customers’ minds. Most misconceptions can be boiled down to these four areas.
PROBLEM
It’s Expensive: You have to have idle hardware, double the amount of server capacity per application, duplicate systems, duplicate sites, the same operating system, etc. This means that achieving high availability is expensive.
It’s Complex: This means that achieving HA could cause downtime. Setting up HA is difficult and complex. And every time you add a new app, OS, server, etc. it is like starting from scratch. Not to mention most competitors require weeks’ worth of consulting dollars to get a single 2-node cluster up and running.
It’s Difficult to Measure: Once you have HA up and running you have no idea whether it is really working or really meeting the SLAs you have established for the business.
It’s Hard to Test: There is no way to test the environment without stopping production completely.
Now we will address these common misconceptions one by one and then talk about how HA relates to an overall DR strategy.
TRANSITION: The first misconception that we should address is It’s Expensive.

High availability doesn’t have to be expensive
Myth: Hardware redundancy; OS restrictions; Poor utilization
Recommendation: Add Clustering, Not Hardware; Any platform; Optimal server utilization
INTRODUCTION: HA doesn’t have to be expensive.
PROBLEM
Typically HA requires:
Hardware redundancy: The same server (ex. E15k clustered with another E15k) to create a 2-node cluster = EXPENSIVE
OS restrictions: For example, Solaris has to cluster Solaris 8 with Solaris 8, and Microsoft requires Advanced Server to run clustering, which means you have to purchase all applications at the Advanced Server level as well = EXPENSIVE
Poor Utilization: Since most clustering implementations are running active/passive environments, there is a lot of compute power out there sitting idle, which results in poor server utilization = EXPENSIVE
ADVANCE SLIDE
RECOMMENDATION
Add clustering, Not Hardware: VCS was built from the core to maximize utilization without requiring additional server purchases. VERITAS is not a hardware company and doesn't really care if you buy more hardware from your server vendor.
Any platform: With VERITAS you can cluster different servers (E15k with a v880) or different OS levels (MS Server with Advanced Server); the only requirement is that the VCS version be the same.
Optimal server utilization: You can build bigger clusters and reduce idle hardware to maximize the compute power within your datacenter.
BENEFIT
Cost savings: You can now use what you have within your environment or add clustering to the current infrastructure without having to purchase additional hardware. Use the compute power you have paid for.
TRANSITION: Now let’s take a look at the traditional way to do clustering today.

Increase value of current hardware investment
Recommended Approach; Traditional Approach
INTRODUCTION: You should utilize what you currently have in your environment to increase the value of your current hardware investment.
PROBLEM
The traditional approach is how most customers doing clustering today have configured their environment.
If this is your chosen approach, we do have tools to help ease the management headache of this configuration = Management option of VCS = CC Availability
However, there are issues with this approach.
Typically customers are running an active/active (the grey/grey) cluster or an active/passive (grey/shaded out) cluster.
The active/passive environment provides the highest levels of availability but it is also the most expensive because you have a server sitting completely idle waiting for a failure to occur.
The active/active environment is much more cost effective but it provides lower levels of availability because if one server fails, the remaining server has to do the work of both servers = low performance
ADVANCE SLIDE
You can see the exact same configuration with the same number of servers.
RECOMMENDATION
Take the current infrastructure that you have with the 2-node clusters.
Move the spare/idle servers to a resource pool to be used for other purposes.
Move the exact same configuration (using zones) to a single 8-node cluster.
You get the same benefit as the 2-node clusters without having tons of idle hardware.
Now you just have 1 roaming spare vs 5 spares.
Overall utilization rate goes up!
BENEFIT
Reduce the need for tons of idle hardware
Increase utilization
Cluster different servers/OS levels in a single cluster
No impact to the users since they don’t know which physical server they are accessing
TRANSITION: By looking at this configuration it may seem complex to implement a cluster configuration.
Utilization: “With VERITAS, we're up all the time. We've seen instant results and tangible cost savings, for a 24 month ROI of about $3.4 million.” Joe Wagner, CTO, El Camino Hospital/Eclipsys Outsourcing
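The "1 roaming spare vs 5 spares" point can be put into rough numbers. The sketch below is illustrative only: the slide diagram is not available here, so the starting layout (five active/passive pairs plus one active/active pair, 12 servers in total) is an assumption chosen to match the 5 spares and the 8-node target mentioned in the notes.

```python
# Illustrative server counts only. The 12-server starting layout is an
# assumption, not a figure taken from the slide.

def utilization(active_servers: int, total_servers: int) -> float:
    """Fraction of servers doing production work."""
    return active_servers / total_servers

# Before (assumed): five active/passive pairs plus one active/active pair.
before_total = 5 * 2 + 2                      # 12 servers
before_idle = 5                               # one passive spare per active/passive pair
before_active = before_total - before_idle    # 7 servers doing work

# After: one 8-node cluster carrying the same workloads with 1 roaming spare.
after_total = 8
after_idle = 1
after_active = after_total - after_idle       # still 7 servers doing work

# Servers that can move to the resource pool for other purposes.
freed_for_pool = before_total - after_total

before_util = utilization(before_active, before_total)
after_util = utilization(after_active, after_total)

print(f"before: {before_util:.1%} of servers active")
print(f"after:  {after_util:.1%} of clustered servers active")
print(f"servers freed for the resource pool: {freed_for_pool}")
```

Under these assumed counts the same seven workloads run on eight clustered servers instead of twelve, which is where the higher utilization rate comes from.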

High availability doesn’t have to be complex
Myth: Costly installation; Uncertainty of configuration; Labor intensive management; 5 platforms = 5 Solutions
Recommendation: VERITAS Cluster Server; Cost effective installation; Cluster simulator; Simple management; 5 Platforms = 1 Solution
INTRODUCTION: High Availability doesn’t have to be complex.
PROBLEM
Costly installation: Most vendors charge a lot of money to do the installation of a cluster. Every time a new application comes online you have to buy more professional services.
Uncertainty of Configuration: You can’t be certain a configuration is going to work until you have put it into production, thereby potentially causing downtime.
Labor Intensive Management: Difficult to manage across the data center, across different OSes, servers, apps, etc.
5 platforms = 5 Solutions: Every OS requires a different clustering solution.
ADVANCE SLIDE
RECOMMENDATION
Use VERITAS Cluster Server.
BENEFIT
VCS is:
Cost effective installation: Professional services can be used but it is cost effective and less expensive than the competitor. Once you get an initial cluster up and running, any admin can add to or build a new cluster on their own = No more professional services
VCS propagates configuration: a config change on one node goes to all the nodes
CCA can propagate policies within and across clusters, locally and globally
Cluster simulator: Test the configuration from spare systems or your laptop before putting it into production = No downtime with certainty
Simple management: Easily manage all your cluster service groups and different operating systems from an easy-to-use GUI
Failover easily across nodes within a cluster or across distance
Failover is the same no matter what distance
5 Platforms = 1 Solution: No matter what OS you are running you use the same solution = simple management and data center standardization. Free to choose different OSes, servers, apps based on the project.
PROOF
This customer was amazed at how easy it was to get VCS up and running. The competitive product (HACMP) took 3 days to get up and running, and they got a new VCS cluster running in 90 minutes.
These are just a few of the VCS screen shots across Linux, Solaris and Windows. As you can see, they look exactly the same no matter what OS is running.
TRANSITION: So now you have your clustering up and running, but the next issue is determining if you are meeting the business requirements set forth before the project was initiated. Typical solutions have no way of measuring how much availability the organization has achieved.
NOTES
Transition: Talk about meeting 5 9’s, or SLAs. How do you know you are meeting them?
“We were amazed. We had VERITAS Cluster Server out of the box and failing over applications in less than 90 minutes.” IT Director, Financial Company
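When talking about 5 9’s, it can help to translate an availability target into the downtime it actually allows. The following is a minimal sketch of that arithmetic; the function name and the choice of a 365-day year are assumptions for illustration and are not part of any VERITAS tool.

```python
# Convert an availability target (in percent) into a yearly downtime budget.
# Assumes a 365-day year; the helper name is hypothetical.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def downtime_budget_minutes(availability_pct: float) -> float:
    """Minutes of downtime per year permitted by the given availability target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for target in (99.9, 99.99, 99.999):
    budget = downtime_budget_minutes(target)
    print(f"{target}% availability allows {budget:.2f} minutes of downtime per year")
```

Five nines leaves a little over five minutes of downtime per year, planned maintenance included, which is why the measurement question in the transition matters.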

Availability can be measured
Myth: Unmeasurable SLAs; No historical reports; Unclear of problems
Recommendation: Use integrated reporting tool; Track availability; Report results; Identify problems
INTRODUCTION: Availability can be measured.
PROBLEM
Business units set goals to meet certain SLAs but there is no way to measure the progress.
No historical reports of past availability problems.
If there is downtime it is nearly impossible to determine what the true problem is.
Not sure if it is app failure, component failure, human error, etc.
ADVANCE SLIDE
RECOMMENDATION
Use the integrated reporting tool (now called CCA). The integrated tool provides the ability to:
Track availability = You can now know if you are meeting your SLA requirements
Report results = Look at past and current availability to report it to the different business units
Identify problems = Analyze the reports to determine what the availability problem is
BENEFIT
You can now know you are achieving the availability results you set out to achieve.
There are reports the tool puts out to show you the availability you are achieving.
PROOF
The screen shot shows an Exchange group and how long it was down, online, etc.
TRANSITION: Another issue with HA is the fact that you can’t test the environment.
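The track / report / identify idea can be illustrated with a small calculation over an outage log. This is a generic sketch, not the CCA reporting tool or its API: the function name, the record layout, the sample outages and the 99.9% target are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def availability_report(period_start, period_end, outages):
    """Return (availability in percent, downtime minutes grouped by cause).

    `outages` is a list of (start, end, cause) tuples. Outages are assumed
    not to overlap and to fall inside the reporting period.
    """
    total_minutes = (period_end - period_start).total_seconds() / 60
    downtime_by_cause = defaultdict(float)
    for start, end, cause in outages:
        downtime_by_cause[cause] += (end - start).total_seconds() / 60
    down_minutes = sum(downtime_by_cause.values())
    availability_pct = 100 * (1 - down_minutes / total_minutes)
    return availability_pct, dict(downtime_by_cause)

# Hypothetical 30-day reporting period with two made-up outages.
outages = [
    (datetime(2024, 1, 5, 22, 0), datetime(2024, 1, 5, 23, 0), "maintenance"),
    (datetime(2024, 1, 17, 3, 0), datetime(2024, 1, 17, 3, 30), "application failure"),
]
pct, by_cause = availability_report(datetime(2024, 1, 1), datetime(2024, 1, 31), outages)

sla_target = 99.9  # assumed SLA target for the example
meets_sla = pct >= sla_target

print(f"availability: {pct:.3f}% (target {sla_target}%, met: {meets_sla})")
for cause, minutes in by_cause.items():
    print(f"  {cause}: {minutes:.0f} minutes down")
```

Grouping downtime by cause is what makes the "identify problems" step possible: in this made-up log, planned maintenance accounts for most of the missed target.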

Disaster recovery can be one click away
Myth: It’s too expensive. What about the application? Unachievable.
Recommendation: Protect the data and application; Integrated technology; Automate the recovery steps
INTRODUCTION: Disaster Recovery can be one click away.
PROBLEM
Typically customers do not deploy higher levels of disaster recovery because they believe it is too expensive.
When thinking of DR they only talk about the data. What about the app?
The data is only as good as the app.
If you didn’t have Microsoft Word installed on your laptop, it wouldn’t matter that you wrote a doc in Word format because you wouldn’t be able to read it.
Customers think that no one is doing it = it is unachievable
ADVANCE SLIDE
RECOMMENDATION
Make sure you are protecting the data and the application.
DR is not just backup, especially for critical applications.
Backup is the needed safety net, but most business/mission critical apps need more than a backup approach.
BENEFIT
VERITAS provides a completely integrated solution.
Clustering and replication integrated into one solution.
Replication can be from VERITAS (Storage Foundation/VVR) or EMC SRDF, HDS TrueCopy, NetApp SnapMirror.
Eliminate human error by automating the recovery steps. With a single button click you can fail over an entire data center, so everything is automated.
PROOF
There are many customers that are deploying VERITAS DR solutions today.
For customer examples, read the case studies for more details.
TRANSITION: You do not have to rebuild your entire architecture to implement DR. You can protect locally and go remote with the same solution set.

Datacenter Availability – Solutions
HA/DR ARCHITECTURE, APPLICATION/DATABASE SOLUTIONS, HA STANDARDIZATION, REPLICATION, VERITAS, EMC, HDS/HP, Oracle, IBM, NetApp
HA/DR Architecture
AdvancePCS (Caremark): Relies on VERITAS for wide area failover in heterogeneous environments. With over 20 million plan participants, AdvancePCS (now Caremark) relies on VERITAS to ensure availability of key applications for prescription services for more than 1,200 plan sponsors. VERITAS Storage Foundation, VERITAS Volume Replicator, and VERITAS Cluster Server provide local and wide-area failover capability of databases and applications in a heterogeneous environment.
BT: “By delivering 100 percent availability, VERITAS Cluster Server ensures BT Broadcast Services delivers uninterrupted, high quality broadcasting solutions. It's a remarkable product.” “VERITAS Cluster Server allowed us to achieve the necessary distance between our two sites and support all major third-party server and storage platforms.” “VERITAS Cluster Server is also a mature, proven product, designed and engineered by the global leader in high availability solutions: VERITAS Software.” “The final seal of approval arose from our highly successful experience of VERITAS NetBackup and VERITAS Foundation Suite, which is used widely within our business.”
Application/Database Solutions
KPN: Needed to ensure high availability of its website, kpn.com, for its customers, business partners, and subscribers, with access to online billing, customer service, sales, and product information. Any disruption in supporting the needs of KPN’s 7.5 million fixed-line subscribers and 16.4 million mobile customers would mean loss of business and revenue.
Solution: KPN decided to deploy a high availability solution using a clustered architecture and data mirroring to a remote site. KPN selected a portfolio of software products which includes VERITAS Database Edition for Oracle, VERITAS Cluster Manager, VERITAS File System, VERITAS Volume Manager, and VERITAS Volume Replicator.
Benefits: With the deployment of the solution powered by VERITAS, kpn.com is providing continuous high availability access for 7M daily site hits and 100K unique users. The company is also significantly reducing business risks and potential revenue loss through the automated failover capability of the cluster and real-time mirroring of its database to a remote site. The solution is also lowering TCO by increasing the capacity utilization of servers and automating operations such as failover and backup support.
HA Standardization
ATA: Standardize across UNIX and Windows. As much as 97% reduction in backup administration time. 93% increase in availability of business-critical data. 96% reduction in planned server downtime.
Bluestar Solutions: Standardize across Solaris, HPUX, AIX, Windows and Linux. Single solution across local and remote datacenters (Phoenix and Dallas). Single tool no matter which OS, servers or storage they chose.
Orange: Orange Denmark uses Volume Manager, File System, Database Edition for Oracle, and VCS with Solaris, HPUX, and Windows. "...We live in a heterogeneous world," Olesen explains. "VERITAS gives us the ability to take different platforms on the system side, different platforms on the disk side, and be able to integrate everything and have it work to keep the PeopleSoft application secure and accessible for our internal and external customers..."

Replication can be bulletproof and cost effective
Traditional Approach: It’s Expensive; It’s Proprietary; Can’t go the distance
Recommendation: Replicate the Volume; Bulletproof Replication Over… Any hardware; Any network; Any distance
“Cost was another key consideration because unlike its competition, VERITAS Volume Replicator does not require additional hardware to operate, dramatically lowering the overall cost of ownership” IT Director, Banco Santander International
Hardware Investment $35K VERITAS $50K $150K Traditional Approach
Note: If the customer likes SRDF, go over this slide quickly. Offer the VERITAS approach as another alternative and highlight the differences.
INTRODUCTION: Replication can be bulletproof and cost effective because there are other solutions available on the market.
PROBLEM
Traditional approaches are expensive because…
It’s proprietary because you have to use the same storage at all locations.
Limited distance: You need dual dedicated Fibre connectivity for short distances or have to pay for expensive Fibre over IP converter devices.
ADVANCE SLIDE
RECOMMENDATION
Why don’t you replicate the volume? You have VM installed, so you can use that same tool for replication.
Volume Manager (found in Storage Foundation): Can mirror data over Fibre Channel connectivity (just another mirror).
The IP option to Volume Manager, called Volume Replicator, can replicate data over an IP connection.
BENEFIT
You can bulletproof replication (certified on Oracle to guarantee data consistency) over…
Any hardware: Replicate between any storage devices.
Any network: VM = Fibre, VVR = IP, no specialized network devices required.
Any distance: Your DR site can be anywhere using the same tools.
PROOF
Cost is a concern when looking at a complete clustering and replication solution. VERITAS can save you money when choosing a replication solution.
As you can see, Banco Santander chose VERITAS because of the cost savings. They did not need additional hardware to operate the solution.
The VERITAS approach can cost around $35K, whereas the traditional approach is more expensive from a hardware standpoint and requires expensive networking devices.
TRANSITION: So we have talked about high availability, the common misperceptions, and disaster recovery. Are there any other questions you may have on our current approach?

Availability can be easily tested
Myth: Risk of downtime; Inconvenient/time consuming; Blind faith
Recommendation: Run Firedrills; No production impact; Anytime; Sure Knowledge
INTRODUCTION: Availability can be easily tested.
PROBLEM
Testing HA goes directly against what you are trying to achieve because it actually causes downtime.
Not to mention it is time consuming and typically done over the weekend or in the middle of the night.
There are many steps to a DR test, it causes lots of downtime, and you have to set up all the BCVs in SRDF. VERITAS simplified the process by automating the creation of a cloned environment and testing that cloned environment. Now DR testing can occur without a huge time investment through automated tools.
And if you don’t test, then you are running on blind faith that it will really work.
ADVANCE SLIDE
RECOMMENDATION
Run Firedrills.
Provides the ability to test your DR plan, including clustering and replication, without taking the production system down.
Creates a clone of the environment and tests the clone to see if it works.
BENEFIT
You can fully test the environment without production impact.
You can do it anytime. You can test the DR plan, or you can fail over a test app from one server to another to see if it comes up.
You now have sure knowledge that it will work, rather than running on blind faith.
PROOF
Disaster recovery testing has been a huge issue when it comes to preparing for a disaster. As you can see from the research, DR testing is a huge barrier in DR plans.
TRANSITION: So can you tell me a little about your Disaster Recovery needs?
