Disaster Recovery (DR)
George F. Claffey Jr., Chief Information Officer, Charter Oak State College & CT Distance Learning Consortium

"Every dollar spent on disaster mitigation can save seven dollars in economic losses from a disaster. The multimillion dollar price tag on the levee improvements in New Orleans that were left undone has been dwarfed by the multibillion dollar cost of rebuilding flooded neighborhoods in the wake of the hurricane and the ensuing storm surge." (Worldwatch Institute, 2006)

Jenzabar Inc., an EDUCAUSE Platinum Partner, and Charter Oak State College: Disaster Recovery for Small Schools. Delivered 3/20/2007 at Nercomp 2007. Copyright 2007, George F. Claffey Jr.

Basic Terms
- Disaster Recovery (DR): a plan to recover business-critical data in the event of a disaster.
- Business Continuity (BC or BCP): a management process to ensure that business operations continue.
- Continuity of Operations Plan (COOP): a plan to ensure continuity of operations after a disaster has already occurred.
- Recovery Time Objective (RTO): the acceptable amount of disruption, or downtime, between the disaster event and the recovery of operations.

What Elements Comprise DR/BCP
- Security (both physical and technological)
- Equipment (servers, storage, and network)
- Information (customer data, log files, archives)
- Business processes or business rules
- Communications

Old Disaster Recovery Plan
- The only mission-critical application was the SIS (uptime undefined)
- Reciprocal site (sister college)
- Data tapes kept at home or at an alternate, non-secured location
- Just-in-time delivery of replacement gear (drives, NICs, servers)
- Disaster recovery planning was done only by IT
- It has been difficult for higher ed to measure downtime in relation to monetary loss, which has handicapped organizations' ability to create a DR/BCP budget

Recent Changes to Our Understanding of Disaster Recovery Are a Result of…
- Hurricane Andrew
- Hurricane Katrina
- 9/11
- Regulatory compliance
- A potential pandemic

Old Disaster Recovery Model: Why Didn't It Work?
- Only mission-critical app was the SIS: ERPs got bigger and web-enabled, and single sign-on made them the core of campus activities (registration, grades, etc.).
- Reciprocal site (sister college): a great idea, but who do you call at 2 AM when something on your site is broken? This became more like 9-5, M-F friendly help with no SLA, and the arrangement fails entirely if the disaster is regional (Katrina).
- Data tapes at home / alternate non-secured location: tapes were stolen, lost, or left at a terminated employee's house, and FERPA and GLB restrictions apply.
- Just-in-time delivery of replacement gear (drives, NICs, servers): Katrina shut down shipping, and the UPS strike had a similar effect.
- Disaster recovery planning was done only by IT: Katrina made everyone aware that an alternate web presence, etc., wasn't just a nicety but a requirement in the event of catastrophic failure.
- It has been difficult for higher ed to measure downtime in relation to monetary loss, which has handicapped organizations' ability to create a DR/BCP budget.

New Disaster Recovery / New Definition of What's Mission Critical
- Protection from internal threats
- Protection from natural disasters
- Protection with the ability to persist (locally/remotely)
What's mission critical:
- ERP/SIS, learning management system
- E-mail, domain
- Web presence
- Phones / 911

Begin by Taking a Good Look Around
Take an honest look… My current Disaster Recovery plan doesn't work!

Common Problems in Existing/Static DR Plans
The plan does not encompass all of a campus's current technological services, and/or it becomes outdated because:
- Servers or storage have changed
- Software versions have changed or patches have been applied
- Personnel have changed
- Networks or network addresses have changed
- Security or network devices have changed
- Storage has changed again

Understand Where You Are
Most schools and colleges have small or limited IT staffs. As a result, disaster recovery often sits on the back burner.

Goal: "A Living DR/BCP Document"
- The Disaster Recovery Plan is dynamic; it needs to be changed as networks, applications, and business processes change.
- At minimum, the plan must be updated once a year.
- The DR/BCP plan is a very public document.

We Need a Strong Foundation
Ingredients for a strong DR foundation:
- Executive support
- Cabinet / stakeholder support
- Information Technology staff
- Faculty support
- Facilities support
- Budget

Engage People… Tell a Story
We found that a story helped our end users and executives get into the proper frame of mind:

A pipe burst and the basement (data center) took on two feet of water. The water damage voided server warranties, and multiple pieces of critical data center equipment were damaged, including the UPS. We have contacted our vendor and new equipment is on the way, but we only have three people in IT to begin the restoration. Activities must be done in "serial order." What should we prioritize, and what can we live without for 4 business days, 10 business days, or 30 business days? Who will notify our students that we are having problems, and how will they notify them if e-mail and the SIS are down?

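The "serial order" constraint in this exercise can be made concrete with a little arithmetic. The sketch below is a minimal illustration, not part of the original exercise: the `RestoreTask` and `serial_schedule` names, the restore estimates, and the priority order are all hypothetical. It restores one system at a time and reports which systems finish inside their 4-, 10-, or 30-business-day tier.

```python
from dataclasses import dataclass


@dataclass
class RestoreTask:
    name: str
    effort_days: float   # hypothetical estimate of business days to restore
    deadline_days: int   # tier from the exercise: 4, 10, or 30 business days


def serial_schedule(tasks):
    """Restore tasks strictly one after another, in the order given.

    Returns a list of (name, finish_day, meets_deadline) tuples.
    """
    elapsed = 0.0
    results = []
    for task in tasks:
        elapsed += task.effort_days  # serial: each task starts when the previous ends
        results.append((task.name, elapsed, elapsed <= task.deadline_days))
    return results


# Hypothetical priorities and estimates for the burst-pipe scenario.
plan = [
    RestoreTask("Network core and UPS", 1, 4),
    RestoreTask("E-mail", 1.5, 4),
    RestoreTask("SIS database", 3, 4),
    RestoreTask("Website", 2, 10),
    RestoreTask("Learning management system", 4, 30),
]

for name, finish, ok in serial_schedule(plan):
    status = "meets" if ok else "MISSES"
    print(f"{name}: restored by business day {finish:g} ({status} its deadline)")
```

With this ordering the SIS finishes on day 5.5 and misses its 4-day target. Moving it ahead of e-mail lets the SIS finish on day 4 but pushes e-mail to day 5.5, which is exactly the kind of trade-off the exercise is meant to surface.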
Recovery Time Objective (RTO) Is the Principal Factor in DR Planning and Budgets
How long can we be without:
- Student Information System
- E-mail
- Website
- Anti-virus protection
- Internet connectivity
Scale: months, weeks, days, hours, seconds

Recovery Time Objective (RTO) Is the Principal Factor in DR Planning and Budgets (Contd)
Start by defining all your systems and key components:
- Servers (model, type, specs, HD configurations)
- Personnel (skills, location, emergency contact info)
- Applications (versions, patches, custom tweaks)
- Network information (domain, trusts, IP schema, firewall config)
Seek business and user input as to what is important.
Seek executive input.

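One way to keep the system inventory described above in a checkable form is a small record per system that carries its agreed RTO. This is a minimal sketch under stated assumptions: the `SystemRecord` fields, the choice of required fields, and the sample entries are all hypothetical. It orders systems shortest-RTO-first and flags records whose key details are still blank.

```python
from dataclasses import dataclass, field


@dataclass
class SystemRecord:
    name: str
    rto_hours: float                                  # agreed with users and executives
    servers: list[str] = field(default_factory=list)  # model, type, specs, disk layout
    app_version: str = ""                             # version, patch level, custom tweaks
    network_notes: str = ""                           # domain, trusts, IP schema, firewall
    emergency_contact: str = ""                       # responsible staff and alternate phone


# Hypothetical choice of fields that must be filled in before the plan is "done".
REQUIRED_FIELDS = ("servers", "app_version", "emergency_contact")


def missing_fields(record):
    """Names of required inventory fields that are still empty for this system."""
    return [name for name in REQUIRED_FIELDS if not getattr(record, name)]


def recovery_order(records):
    """Shortest RTO first: the systems we can least afford to lose come first."""
    return sorted(records, key=lambda r: r.rto_hours)


# Hypothetical entries; real values come from business, user, and executive input.
inventory = [
    SystemRecord("Website", 24, ["web01"], "2.1", emergency_contact="Web admin, alt phone"),
    SystemRecord("SIS", 8, ["db01"], emergency_contact="DBA, alt phone"),
    SystemRecord("E-mail", 4, ["mail01"], "3.0"),
]

for rec in recovery_order(inventory):
    gaps = missing_fields(rec)
    status = ("missing " + ", ".join(gaps)) if gaps else "complete"
    print(f"{rec.name} (RTO {rec.rto_hours:g} h): {status}")
```

Keeping the RTO on the same record as the technical details means the restore order and the documentation gaps come from one source rather than two documents that drift apart.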
What Impacts RTO
- Number of systems
- Amount of storage required
- Number of restoration devices (HD, tape, etc.)
- Personnel available (skills required/shared)
- Equipment availability (cold site, warm site, hot site, 1-800-IBM)

More Storage, More Servers = Greater Time to Restore
[Chart: recovery time by disaster-recovery mechanism. Tape vaulting (transport tapes, recover data, replay logs): multiple days. Replication-based solution: minutes.]

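The factors listed for RTO (storage volume, number of restoration devices) and the tape-vaulting steps in the chart combine into a rough back-of-the-envelope estimate. The sketch below is an assumption-laden illustration, not a sizing method from the presentation: the function name, the even split of data across devices, and the sample throughput, transport, and log-replay figures are all hypothetical.

```python
def estimate_restore_hours(data_tb, device_mb_per_s, devices,
                           transport_hours=0.0, log_replay_hours=0.0):
    """Rough restore-time estimate: transport + bulk data transfer + log replay.

    Assumes the data splits evenly across `devices` restore devices that each
    sustain `device_mb_per_s`; real restores are often slower (catalog
    rebuilds, many small files, tape changes).
    """
    if devices < 1 or device_mb_per_s <= 0:
        raise ValueError("need at least one device with a positive throughput")
    data_mb = data_tb * 1_000_000            # decimal units: 1 TB = 1,000,000 MB
    transfer_hours = data_mb / (device_mb_per_s * devices) / 3600
    return transport_hours + transfer_hours + log_replay_hours


# Hypothetical figures: 2 TB from vaulted tape at 80 MB/s, 4 h to retrieve the
# tapes, and 2 h to replay database logs.
for n in (1, 2):
    hours = estimate_restore_hours(2, 80, n, transport_hours=4, log_replay_hours=2)
    print(f"{n} device(s): about {hours:.1f} hours")
# 1 device(s): about 12.9 hours
# 2 device(s): about 9.5 hours
```

Doubling the data doubles the transfer term, which is the point of the slide. A replication-based approach largely removes the transport and bulk-transfer terms, which is why it sits at the "minutes" end of the chart.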
N-Tier Systems Cannot Be Reconstructed Quickly
- Load balancers
- WebLogic domains
- Database servers
- Storage area networks
- Firewalls (VPN tunnels)
- IP and VIP addressing

What Determines the Big-Ticket Items
- Infrastructure required (fiber lines to campus buildings, routers, UPS, electrical, HVAC, Internet/demarc)
- Complexity of the systems (N+1 WebCT Vista architecture)
- The RTO an application can have

Put DR and BCP Planning on the Permanent Radar
- Disaster Recovery
- Active Directory Project
- WLAN AP Deployment
- Security Audit

The DR Plan Needs to Be a LIVING Document
- Integrate DR with existing change control procedures
- Connect with Project Management Offices or key application stakeholders
- Document current business processes and changes to them
- Document changes in network infrastructure and security infrastructure
- Create the plan online (DR software or something simpler, such as MS SharePoint)

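Tying the plan to change control, together with the once-a-year minimum review, can be expressed as a simple check over two lists you probably already keep. This is a minimal sketch, assuming plan sections are named after the components they cover; `review_queue`, the 365-day threshold, and the sample dates are hypothetical. It reports sections that are more than a year old or have changed since their last review, plus changed components that have no plan section at all.

```python
from datetime import date, timedelta

MAX_AGE = timedelta(days=365)   # assumed reading of "updated once a year"


def review_queue(last_reviewed, change_log, today):
    """Decide which DR plan sections need attention.

    last_reviewed: {section name: date the section was last reviewed}
    change_log:    [(change date, section name), ...] from change control
    Returns (sections to review, changed components the plan does not cover).
    """
    stale = {s for s, reviewed in last_reviewed.items() if today - reviewed > MAX_AGE}
    changed = {s for when, s in change_log
               if s in last_reviewed and when > last_reviewed[s]}
    uncovered = {s for _, s in change_log if s not in last_reviewed}
    return sorted(stale | changed), sorted(uncovered)


# Hypothetical plan sections and change-control entries.
last_reviewed = {
    "Firewall": date(2006, 9, 1),
    "SIS": date(2006, 1, 15),
    "E-mail": date(2007, 1, 10),
}
change_log = [
    (date(2006, 11, 2), "Firewall"),   # rule change after the last review
    (date(2006, 12, 5), "E-mail"),     # already reflected in the January review
    (date(2007, 2, 1), "WLAN"),        # new service with no plan section yet
]
print(review_queue(last_reviewed, change_log, today=date(2007, 3, 20)))
# (['Firewall', 'SIS'], ['WLAN'])
```

The "uncovered" list targets the most common failure noted earlier: the plan silently stops encompassing all of the campus's current services.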
DR Must Plan for Catastrophic Failure but Have Its Roots in Small Recoveries
Most likely you will perform:
- Entire mailbox restoration
- Jenzabar/SIS database restoration
- Domain controller or FSMO role change
How will you handle a larger challenge?
- E-mail server restoration
- Domain restoration

Short Term Preparedness
- Make sure you are backing up the right systems and the right data (no cost)
- Leverage virtual server and imaging technology to "image" computers and equipment ($2-5K)
- Look into bare metal restore (BMR) backup options for your existing products (Symantec Backup Exec for Windows, $1-10K)
- Outsource tape/disk vaulting for storage needs (basic 1-year contract = $3K)
- Prepare and test a plan to shut down your server room/data center and restart it
- Move a domain controller to an MDF closet in another building, along with a secondary AV server, a tertiary DNS server, and DHCP (disabled)

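"Backing up the right systems" can be checked mechanically by comparing the list of critical systems against what the backup jobs actually cover. How you export job selections and last-success dates depends on your backup product; this sketch simply takes them as a plain dictionary. The `backup_gaps` function, the one-day freshness threshold, and the job names are all hypothetical.

```python
from datetime import date, timedelta


def backup_gaps(critical, jobs, today, max_age=timedelta(days=1)):
    """Compare the critical-system list against backup job selections.

    jobs: {job name: (set of systems the job covers, date of last successful run)}
    Returns (systems no job covers, systems whose newest good backup is too old).
    """
    newest = {}  # system -> most recent successful backup across all jobs
    for systems, last_ok in jobs.values():
        for system in systems:
            if system not in newest or last_ok > newest[system]:
                newest[system] = last_ok
    missing = sorted(s for s in critical if s not in newest)
    stale = sorted(s for s in critical if s in newest and today - newest[s] > max_age)
    return missing, stale


# Hypothetical inventory and job list.
critical = ["SIS database", "E-mail", "Domain controller", "Website"]
jobs = {
    "nightly-db": ({"SIS database"}, date(2007, 3, 19)),
    "weekly-files": ({"Website", "File shares"}, date(2007, 3, 11)),
}
print(backup_gaps(critical, jobs, today=date(2007, 3, 20)))
# (['Domain controller', 'E-mail'], ['Website'])
```

The per-system `max_age` is a simplification; in practice the acceptable backup age would follow from each system's RTO and how much data loss the institution can tolerate.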
Aggregate and Copy Institutional Recovery Information
- Equipment warranties and policies
- License information (media, license keys)
- Vendor support contracts
- Staff lists and alternate contact information
- Utility vendors / account numbers / emergency contacts
- Insurance information (coverage amounts, riders)
What is my coverage, and how do I engage services at 2:00 AM on a Saturday?

Short Term Preparedness (Contd)
Begin sharing information:
- These slides
- Your current plan (good or bad)
Take a critical look at your current plan: can you perform minor updates, or do you need wholesale replacement?
Determine what support you need:
- Executive
- IT or staffing
- Academic support

Test the Plan and Test It Again
- Perform plan and routine validations
- Recall tapes
- Recall staff
- Perform tabletop exercises with executive and cabinet staff
- Perform actual physical exercises (shutdowns/restarts, alternate center)
- Provide cross-training to IT staff

Identify Non-IT Tasks That Impact DR
- Purchasing processes (including approvals)
- Hotels
- Food
- Transportation
- Movers
- Tradesmen (electrical/mechanical/HVAC)
- Security (guards/police)

How Can We Purchase Supplies/Equipment Without an Electronic System?
- Can we?
- Can we cut POs?
- Can we use corporate credit cards or raise limits?
- Are there existing lines of credit we can use?

Create Procedures
- How do we notify our constituents of a disaster?
- How do we recall the Emergency Response Team?
- Notify students
- Notify faculty/staff
- Notify parents
All of this must be done without the use of electronic information systems.

Never Assume Anything
- Never assume your staff will be at 100% capacity
- Never assume your emergency systems will work
- Never assume you are protected because you built redundant systems
- Never assume normal communication mediums will be available

Long Term Preparedness
- Start building disaster recovery awareness among staff (start with IT)
- Become knowledgeable about other emergency plans on campus (police, facilities, medical, etc.)
- Build disaster recovery into project budgets, and build availability questions into decision-making criteria
- Identify on-site and off-site locations for possible DR
- Connect with national higher ed initiatives (EDUCAUSE, Nercomp, Sloan, higher ed pandemic planning)
- Seek advice from DR consultants (SunGard, IBM) or extend relationships with existing vendors (Jenzabar)

Long Term Preparedness (Contd)
Make the institutional and policy changes now:
- Command and control assignments
- Purchasing guidelines
- Notification systems
- Authority for declaring a disaster