Business Continuity & Disaster Recovery



1 Business Continuity & Disaster Recovery
Lauren Farese – Oracle Corporation
Paul Christman – VERITAS Software
Walter Callahan – State of Ohio

2 What happened on August 14th, 2003?

3 Disasters happen every day... it's a fact!
Disasters cost money, so why suffer by being unprepared?
Organizations that survive typically have:
  management foresight
  tested procedures
  processes
  back-up facilities
Business Continuity Planning (BCP)
Presenter notes: Business Continuity Planning is considered by most to be a form of insurance for a company or organization against catastrophic disasters: power outages, terrorism, virus attacks, etc. Will BCP protect you from a disaster? No. What it will do is give your company the ability to quickly resume critical operations and mitigate the overall impact on your organization, its employees, shareholders and customers. Over the years, there has been a distinct lack of focus by executives and managers on protecting their organizations from disasters and serious business interruptions. Since the advent of client-server, LAN and Internet applications, this attitude has grown more entrenched. A typical response is "Why should we spend good money to develop a business continuity plan (or even a disaster recovery plan) when we back up our data on a regular basis and can get equipment from a vendor at a moment's notice?"

4 Downtime Costs Money
Numbers assume $5B yearly revenue run rate.

Percentage Availability   Downtime Per Year (7x24x365)   Cost
95%                       18 days 6 hours                $250M
99%                       3 days 15 hours 36 minutes     $51M
99.9%                     8 hours 46 minutes             $5,003,312
99.99%                    53 minutes                     $504,136
99.999%                   5 minutes                      $47,560

Cost: $9,512 (this figure equals the cost of one minute of downtime at the assumed run rate)
* Costs were calculated by Oracle and are not associated with the Standish Group Report.

Presenter notes: To put this in perspective, 53 minutes of downtime is roughly the amount of time it would take to reboot your machine only 3 times in a year. 99.9% availability allows about 8 hours of downtime, which is roughly the amount of time you need to have the system down in order to do the most basic offline maintenance (i.e., table reorgs, patching of the system, etc.). At this level of uptime you are assuming no major outages, catastrophes or system failures, and it is still costing your enterprise over $5 million per year.
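The arithmetic behind the table can be sketched as follows. This is an illustrative calculation, not Oracle's actual method: it assumes revenue accrues evenly over every minute of a 365-day year, and the function names are made up for this example. The results will differ slightly from the slide's figures, which appear to be rounded to whole minutes and whole dollars per minute.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, the slide's 7x24x365 basis


def downtime_minutes(availability_pct: float) -> float:
    """Minutes of downtime per year implied by an availability percentage."""
    return (100.0 - availability_pct) / 100.0 * MINUTES_PER_YEAR


def downtime_cost(availability_pct: float, yearly_revenue: float) -> float:
    """Revenue lost per year, assuming revenue is spread evenly over the year."""
    cost_per_minute = yearly_revenue / MINUTES_PER_YEAR
    return downtime_minutes(availability_pct) * cost_per_minute


if __name__ == "__main__":
    # Availability levels from the slide, with its assumed $5B run rate.
    for pct in (95.0, 99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes(pct)
        cost = downtime_cost(pct, 5_000_000_000)
        print(f"{pct}%: {minutes:,.0f} min/year, about ${cost:,.0f}")
```

Substituting your own yearly revenue for the $5B figure gives a first estimate of what each additional "nine" of availability is worth.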

5 Business Continuity Planning vs. Disaster Recovery Planning
Both are directed at recovery of operations
Business Continuity Planning is directed at the recovery and resumption of business activities across the entire enterprise
Disaster Recovery Planning is usually directed at the recovery of information technology systems and business applications, including corporate data
BCP addresses Processes, People and Property

6 Business Continuity Planning Phases
Typically three phases
  Pre-Planning
  Planning
  Post-Planning
Critical success factor
  Cost is always an issue
  Executive ownership is critical
  Must be a business priority

7 Phase One: Pre-Planning
Project initiation and management
  Establish a need
  Executive management ownership
  Time and budget allocation
Risk evaluation and control
  Events and environment issues
  Facilities and process evaluation
  Cost benefit analysis
Impact analysis
  Disruption and disaster scenarios
  Critical business functions
  Recovery time analysis

8 Phase Two: Planning
Develop continuity strategies
  Alternative organizational recovery
  Operations and information systems
  Adhere to recovery time objectives
Emergency response and operations
  Procedures for response and stabilization
  Establish operations center
  Emergency command and control
Developing and implementing the plan
  Plan provides recovery within time objective

9 Oracle BCM Business Flow

10 Phase Three: Post-Planning
Awareness and training
  Create organizational awareness
  Enhance skills
Maintaining and exercising
  Coordinate plan exercises
  Evaluate and document exercise results
  Develop process to maintain the plan
  Report results clearly and concisely
Coordination and communication
  Communication with media, families, suppliers
  Crisis coordination with first responders, local authorities

11 What about the technology?

12 Match the Tools to the Business Needs
[Diagram: Recovery Point and Recovery Time axes scaled in Secs, Mins, Hrs, Days, Wks. Techniques plotted: Tape or Disk Backup, Async. Replication, Sync. Replication, Clustering, Remote Replication, Online Restore, Tape Restore.]
Presenter notes: When you have a site outage, there are two key factors to consider: Recovery Point (data loss) and Recovery Time (downtime). Organizations should have a Recovery Point Objective (per application) that must be satisfied as well as a Recovery Time Objective that must be met. Industry analysts call these RPO and RTO respectively. Most people tend to focus on the RTO, or how much downtime is acceptable. However, it is just as important to look at how much data loss an organization can tolerate. Make it a point to look at both. Data is important, and data loss (even if just a few minutes, hours or days) can have far-reaching negative business impacts. Many customers in the banking and financial arenas are actually legally bound to ensure that ZERO data is lost. They therefore opt for synchronous replication for many applications. Most companies even today rely primarily on tape backup and restore as the center of their DR plan. This usually means at least a day of lost data and a few days of downtime after a disaster. This is fine if it meets the business needs, but most organizations will have at least some applications that require a more aggressive RPO and RTO. Again, the cloud represents business impact or business damage (direct and indirect costs). So let's look at how different applications may have different needs within the organization (next slide). The closer you get to the disaster on each side, the more money you're going to spend.
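One way to make the RPO/RTO matching concrete is a small lookup that returns the cheapest technique meeting both objectives for an application. This is only a sketch under stated assumptions: the hour figures and the cost ranking below are illustrative placeholders, except that tape is modeled on the notes' "at least a day of lost data and a few days of downtime" and synchronous replication on zero data loss. The names `Technique`, `TECHNIQUES` and `cheapest_fit` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Technique:
    name: str
    rpo_hours: float    # worst-case data loss this technique can deliver
    rto_hours: float    # worst-case downtime this technique can deliver
    relative_cost: int  # 1 = cheapest; the ranking itself is an assumption


# Illustrative values only; replace with figures from your own vendors.
TECHNIQUES = [
    Technique("tape backup and restore", 24.0, 72.0, 1),
    Technique("disk backup and online restore", 24.0, 8.0, 2),
    Technique("asynchronous replication", 0.25, 4.0, 3),
    Technique("synchronous replication with clustering", 0.0, 0.1, 4),
]


def cheapest_fit(rpo_hours: float, rto_hours: float) -> Optional[Technique]:
    """Return the cheapest technique meeting both objectives, or None."""
    candidates = [
        t for t in TECHNIQUES
        if t.rpo_hours <= rpo_hours and t.rto_hours <= rto_hours
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t.relative_cost)


if __name__ == "__main__":
    # A zero-data-loss application versus one that tolerates days of outage.
    print(cheapest_fit(rpo_hours=0.0, rto_hours=1.0))
    print(cheapest_fit(rpo_hours=48.0, rto_hours=96.0))
```

Running the lookup per application, rather than once for the whole organization, reflects the point in the notes that different applications can justify different levels of spending.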

13 Only as Good as the Weakest Link
[Diagram labels: Clients, Load Balancer, Web Cache, Application Server Tier, Java Clusters, Database Tier]

14 BC/DR Must Address Every Component
Network Infrastructure
Data Storage – online, near-line and off-line
Application servers and their offspring
Any component down = the entire system is unusable
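The "any component down" rule can be quantified with the standard series-availability formula: when every component is required, the system's availability is the product of the component availabilities. The sketch below assumes components fail independently, which real systems only approximate; the function names are illustrative.

```python
from math import prod
from typing import Iterable


def chain_availability(availabilities: Iterable[float]) -> float:
    """Availability of a system that needs every component up (series)."""
    return prod(availabilities)


def redundant_availability(availabilities: Iterable[float]) -> float:
    """Availability of a redundant group that needs only one member up."""
    return 1.0 - prod(1.0 - a for a in availabilities)


if __name__ == "__main__":
    # Five tiers (for example network, load balancer, web cache,
    # application server, database), each assumed to be 99.9% available.
    tiers = [0.999] * 5
    print(f"whole chain: {chain_availability(tiers):.5f}")

    # Doubling up one 99% component, assuming independent failures.
    print(f"redundant pair: {redundant_availability([0.99, 0.99]):.5f}")
```

Five tiers at 99.9% each give a chain that is only about 99.5% available, which is why the chain can never be better than its weakest link and why redundancy is usually applied to each tier in turn.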

15 Network Infrastructure
Wide Area Traffic Manager to direct client traffic to proper site
Network load balancer to distribute incoming requests
Dedicated, fast link between sites
  Influences production database performance
Redundant components and paths
  Network paths to the site and within the site
Presenter notes: Wide Area Traffic Manager, such as F5’s 3-DNS Traffic Controller. Load balancer, such as F5’s BIG-IP Application Switch.

16 BC/DR Techniques for Data Storage
Snapshots – frequent, within an array, FC, temporary
Mirrors – frequent, in a different array, FC, temporary
Replicas – synchronous or async, remote or local, FC or IP, temporary or semi-permanent
Near-Line Disk – infrequent, x-platform, FC or IP, BI copy, DLM, or staging for backup
Tape Backup – infrequent, FC or IP, required best practice for DR

17 Application Availability with Local Clustering
[Diagram: Server 1 running Instance ‘A’ and Server 2 running Instance ‘B’, both attached to one Database]
Protects from local server failures
Depends on shared available storage
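The local clustering idea can be sketched as a toy model: when a server fails, its instances restart on a surviving server, and no data has to move because both servers reach the same shared storage. This is not how any particular cluster product works; `Server` and `fail_over` are hypothetical names, and choosing the least-loaded survivor is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Server:
    name: str
    healthy: bool = True
    instances: List[str] = field(default_factory=list)


def fail_over(servers: List[Server], failed_name: str) -> Server:
    """Mark one server as failed and move its instances to a survivor."""
    failed = next(s for s in servers if s.name == failed_name)
    failed.healthy = False

    survivors = [s for s in servers if s.healthy]
    if not survivors:
        # With no survivor, local clustering cannot help; this is the
        # case that a second site is meant to cover.
        raise RuntimeError("no healthy server left in the local cluster")

    # Assumed policy: restart the instances on the least-loaded survivor.
    target = min(survivors, key=lambda s: len(s.instances))
    target.instances.extend(failed.instances)
    failed.instances = []
    return target


if __name__ == "__main__":
    cluster = [Server("Server 1", instances=["A"]),
               Server("Server 2", instances=["B"])]
    target = fail_over(cluster, "Server 1")
    print(target.name, target.instances)
```

The model also shows the dependency named on the slide: if the shared storage itself is lost, moving instances between servers does not restore service.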

18 Wide Area Clustering
Extends local clustering model to several sites
Requires data mirroring or replication
[Diagram: sites at Cleveland, Columbus, Cincinnati and Sandusky]
Presenter notes: We've looked at service groups and server clustering; now let's extend that model to multiple sites and multiple clusters with global clustering. Global clustering looks at multiple clusters (in this case four 2-node clusters) and, with replication, allows service groups to fail over between clusters on different subnets. GCM's primary purpose is DR: it manages failover between sites and lets you manage multiple clusters from one console.

19 Wide Area Clustering
[Diagram labels: Site Migration, Failover, Replication]
Extends local HA model to many sites
Requires data replication (by definition)
Single point of monitoring and administration
Presenter notes: All this can take place with a single command or mouse click. Other misc. info:
  Wide Area Availability (0-10,000's KM) over a WAN/MAN
  Global Application Objects
  Horizontal Application Scaling & Data Sharing
  Strategic Platform Coverage
  Wide Area Failover
  Site-Wide Configuration
  Global Availability Management
  Replication Integration, Management and Monitoring
  Tied to local HA platform (service groups)
  Replication

20 Key Steps to Success
Conduct a Business Impact Analysis
  Identify which processes are truly critical and cost of BC
Prioritize investments in people and technology
Plan and Implement
Test, test, test!!!
Review the business continuity plan when the business process changes

21 Real Life Example: Ohio Dept. of Public Safety
State Highway Patrol
Bureau of Motor Vehicles
Emergency Management Agency
Emergency Medical Services
Investigative Unit
Homeland Security
Administration

22 Data Center Facilities
State of Ohio Computer Center – West campus of Ohio State University
  Primary site
  Full data center facilities, i.e., UPS, generator, environmental
  Operates lights-out
Charles D. Shipley Building – Public Safety Headquarters, 1970 W. Broad Street
  Approximately 4 miles apart
  Secondary site
  Remote operations

23 Features
OC48 SONET ring between the buildings
  Moving to Gigabit Ethernet
Mainframe environment has mirrored disks at primary site, 3rd mirrored leg at secondary site
Robotic tape silos at primary site, remote tape drives at secondary site
Redundant server with failover for law enforcement
Servers at either site, mirror to other site

24

25 Decision Factors
Prioritize business functions
Work with business units for business continuity to determine IT disaster planning levels
Determine level of acceptable risks
Distance for secondary site
Hot versus cold site
Mirror data versus backups
Redundant servers with failover versus build new server at time of disaster

26 Best Practices
Configuration
  Detailed recommendations from your vendor
  Features to use, parameters to set
  Guidelines for hardware and other software
Operational
  Technical – e.g. switchover and failover procedures
  Logistical – e.g. change management considerations
Emphasis on outages
  Outages to monitor
  Detailed steps to resolve outages
  How to restore fault tolerance
Presenter notes: Once the physical architecture is built, it can only succeed with sound configuration and operational best practices. Configuration best practices include specific direction for Oracle software. MAA covers which features to use, which parameters to set, and points to consider when multiple HA-valid options are available. Configuration also covers guidelines for operating system software and hardware, particularly as they directly affect Oracle software. For example, storage layout should be done using the Stripe and Mirror Everything, or SAME, methodology. Another example is using HARD for the highest level of data protection. This is accomplished by embedding Oracle's data validation algorithms directly into a storage array to prevent corruption. The second part of best practices is operational. Operational best practices include technical components, such as tested, automated backup and recovery or switchover and failover procedures. They also include logistical components, which focus on establishing processes, policy and management. An example would be a set of rules and procedures for change management to ensure that any change to the system is authorized, scheduled, and tested. Operational best practices focus heavily on outages.

27 Information Sources
Peter G. Neumann, PhD. Architectural Frameworks for Composable Survivability and Security.
Comp.risks
President's Commission on Critical Infrastructure Protection (PCCIP)
Robert Buchmann. Disaster Proofing Information Systems. McGraw-Hill Networking Professional, 2003.

28 Information Sources
Manhoi Choy, Hong Va Leong, and Man Hon Wong. Disaster Recovery Techniques for Database Systems. In Communications of the ACM, 2000.
Renate Rohde, Jim Haskett. Disaster Recovery Planning For Academic Computing Centers. In Communications of the ACM, June 1990.
Bridget Eklund. Business Unusual. In Communications of the ACM, December 2001.
Martin Nemzow. Business Continuity Planning. In International Journal for Network Management, vol. 7 (1997).

29 “The pessimist sees difficulty in every opportunity. The optimist sees opportunity in every difficulty.” - Winston Churchill

