IT Business Continuity Briefing March 3, 2011

Agenda
• Incident Overview
• Improving the power posture of the Primary Data Center
• STAGEnet Redundancy
• Telephone Redundancy
• Secondary Data Center and Recovery Point Objectives (RPO)
• Secondary Data Center and Recovery Time Objectives (RTO)
• Customer communications during outage incidents

IT Business Continuity Dependencies
[Diagram: Systems & Data; Network Services; Power & Environmentals; Facilities & Staff]

Incident Impact
[Diagram: Systems & Data; Network Services; Power & Environmentals; Facilities & Staff]

January 18th Incident Response
• ITD powered down servers and equipment in the Primary Data Center to minimize data loss.
• ITD began provisioning equipment so that the Secondary Data Center could assume the role of the primary data center.
• Initial estimates projected that power would be restored to the Primary Data Center by 6:00 pm.
• Power was restored at 5:50 pm; email and core network services were restored at 6:30 pm; the remaining systems and applications were restored by 11:30 pm.
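As a quick check of the timeline above, the Python sketch below computes how long recovery took once power returned. It uses only the clock times given on this slide; the outage start time is not stated here, so total downtime is not computed. The variable names and the helper `elapsed` are illustrative, not part of any ITD tooling.

```python
from datetime import datetime, timedelta

# Clock times taken from the January 18th incident response slide.
# All milestones fall on the same day, so strptime's default date is harmless.
FMT = "%I:%M %p"
initial_estimate = datetime.strptime("6:00 PM", FMT)
power_restored = datetime.strptime("5:50 PM", FMT)
core_services_restored = datetime.strptime("6:30 PM", FMT)
all_systems_restored = datetime.strptime("11:30 PM", FMT)


def elapsed(start: datetime, end: datetime) -> timedelta:
    """Time between two same-day milestones."""
    return end - start


ahead_of_estimate = elapsed(power_restored, initial_estimate)
email_and_network_lag = elapsed(power_restored, core_services_restored)
full_recovery_lag = elapsed(power_restored, all_systems_restored)

print("Power restored ahead of estimate by:", ahead_of_estimate)        # 0:10:00
print("Power to email/core network services:", email_and_network_lag)  # 0:40:00
print("Power to all systems and applications:", full_recovery_lag)     # 5:40:00
```

In other words, core services came back 40 minutes after power, and full restoration took another five hours, which is the gap the RTO improvements later in the briefing aim to close.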

Power Posture Improvements
• The Primary Data Center and the Secondary Data Center both have generators to provide backup power.
• ITD is working with Facilities Management and Sirius Computer Solutions to identify and implement a second, redundant power source for the Primary Data Center.
• Target completion is the end of 2011.

STAGEnet Redundancy
• A four-quadrant Resilient Packet Ring (RPR) provides redundancy on the statewide ring: if a core node fails, traffic automatically fails over.
• The network Point of Presence in each quadrant has equipment architected for high availability, plus backup power generation.
• The Internet gateways in Bismarck and Fargo are load balanced, so if one gateway fails, traffic fails over to the other.
• Agencies that require redundancy (network diversity) at individual endpoint locations should coordinate with ITD.
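The slide does not say how STAGEnet balances traffic between the Bismarck and Fargo gateways. Purely to illustrate "load balanced with failover", the sketch below assumes simple round-robin with a health flag; the classes `Gateway` and `GatewayBalancer` are hypothetical and do not describe the State's actual equipment.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Iterable


@dataclass
class Gateway:
    """A hypothetical Internet gateway with a health flag set by monitoring."""
    name: str
    healthy: bool = True


class GatewayBalancer:
    """Round-robin load balancing that skips gateways marked unhealthy."""

    def __init__(self, gateways: Iterable[Gateway]):
        self.gateways = list(gateways)
        self._rotation = cycle(self.gateways)

    def next_gateway(self) -> Gateway:
        # Check each gateway at most once per call, so a total outage
        # raises an error instead of looping forever.
        for _ in range(len(self.gateways)):
            gateway = next(self._rotation)
            if gateway.healthy:
                return gateway
        raise RuntimeError("no Internet gateway is available")
```

With both gateways healthy, successive calls alternate between Bismarck and Fargo; once Fargo is marked unhealthy, every call returns Bismarck. That is the failover behavior the slide describes, though real gateways would rely on routing protocols or dedicated load balancers rather than application code.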

Telephone Redundancy - Current
• The current design is a standard digital design.
  - Service depends on the PBX serving the endpoint.
  - The PBX has high-availability components.
  - The design does not provide redundant service if the PBX fails.
• Agencies can purchase a service that reroutes critical numbers (e.g., crisis hotlines) in the event of a disaster.

Telephone Redundancy - VoIP
• A new Voice over IP (VoIP) design will be rolled out over the next two years.
• The standard VoIP design will include four redundant Call Managers on STAGEnet, which provide failover if the primary Call Manager serving a site fails.
• Telephone numbers can be relocated to other sites that have network connectivity.
• Core services (dial tone, call center, and automatic call distribution (ACD)) will be redundant.
• Voice mail, mobility, and Interactive Voice Response (IVR) will not initially be redundant.
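The slide says a site fails over when its primary Call Manager fails, but not how the backup is chosen. A common pattern, assumed here, is an ordered priority list per site: the site uses the first Call Manager on its list that is reachable. The function `select_call_manager` and the names `CM-1` through `CM-4` are hypothetical.

```python
from typing import Optional, Sequence, Set


def select_call_manager(priority: Sequence[str], reachable: Set[str]) -> Optional[str]:
    """Return the highest-priority Call Manager the site can reach.

    `priority` lists the site's Call Managers from primary to last backup;
    `reachable` holds those currently answering. Returns None if none are up.
    """
    for call_manager in priority:
        if call_manager in reachable:
            return call_manager
    return None


# Hypothetical site configuration: four Call Managers, CM-1 is primary.
SITE_PRIORITY = ["CM-1", "CM-2", "CM-3", "CM-4"]

print(select_call_manager(SITE_PRIORITY, {"CM-1", "CM-2", "CM-3", "CM-4"}))  # CM-1
print(select_call_manager(SITE_PRIORITY, {"CM-2", "CM-3", "CM-4"}))          # CM-2
print(select_call_manager(SITE_PRIORITY, set()))                             # None
```

An ordered list keeps behavior predictable: every site knows in advance which Call Manager it will use after a failure, which simplifies capacity planning for the backups.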

Recovery Point Objective (RPO) and Recovery Time Objective (RTO)

Recovery Point Objective (RPO)
• The Recovery Point Objective (RPO) is the point in time to which you must go back to recover data when a loss incident occurs.
• RPO concerns data, and is independent of the time it takes to bring a non-functional system back online (the Recovery Time Objective, or RTO).
• In practice, the RPO defines what an agency considers an “acceptable loss” in a disaster.
• The value of the data in the “acceptable loss” window can then be weighed against the cost of the additional loss-prevention measures needed to narrow that window.

Recovery Point Objective (RPO)
• In general, backups are performed nightly to tape at our Secondary Data Center.
• Databases have full weekly backups and nightly incremental backups.
• For other data, only items that changed during the day are backed up.
• In general, the RPO (the potential loss window) for most data is one day: a disaster at 4 pm on Tuesday would require restoring Monday night's backup, and Tuesday's activity would be lost.
• Agencies whose business requirements cannot tolerate this potential data loss implement data replication.
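The Tuesday 4 pm example can be worked through in code. The Python sketch below finds the last nightly backup before a failure, the resulting data-loss window, and, for databases, the chain of backups to restore (the last weekly full, then each nightly incremental since). It assumes, hypothetically, that nightly backups capture data as of 11:00 pm and that the weekly full backup runs on Sunday night; the briefing states only "nightly" and "weekly".

```python
from datetime import datetime, time, timedelta

# ASSUMPTION: the nightly backup captures data as of 11:00 pm and a weekly
# full database backup runs on Sunday night. The briefing states only
# "nightly" and "weekly"; both times here are hypothetical.
BACKUP_TIME = time(23, 0)
FULL_BACKUP_WEEKDAY = 6  # Monday == 0 ... Sunday == 6


def last_backup_before(failure: datetime, backup_time: time = BACKUP_TIME) -> datetime:
    """Most recent nightly backup point at or before the failure."""
    candidate = datetime.combine(failure.date(), backup_time)
    if candidate > failure:
        candidate -= timedelta(days=1)
    return candidate


def data_loss_window(failure: datetime) -> timedelta:
    """Activity between the last backup and the failure is lost."""
    return failure - last_backup_before(failure)


def database_restore_chain(failure: datetime) -> list:
    """Last weekly full backup followed by every nightly incremental since it."""
    chain = [last_backup_before(failure)]
    while chain[0].weekday() != FULL_BACKUP_WEEKDAY:
        chain.insert(0, chain[0] - timedelta(days=1))
    return chain


# Tuesday, March 1, 2011 at 4 pm (the slide's Tuesday 4 pm example).
failure = datetime(2011, 3, 1, 16, 0)
print(last_backup_before(failure))      # 2011-02-28 23:00:00 (Monday night)
print(data_loss_window(failure))        # 17:00:00
print(database_restore_chain(failure))  # Sunday full, then Monday incremental
```

Under these assumptions the loss window ranges from zero (failure right at backup time) to just under 24 hours (failure just before the next backup), which is why the slide describes the RPO for most data as one day.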

Recovery Time Objective (RTO)
• The Recovery Time Objective (RTO) is a measure of how quickly a system must resume normal operations to avoid unacceptable business impact.
• Prior to 2006, ITD contracted for an out-of-state disaster recovery hot site with a best-case mainframe RTO of 72 hours.
• With the deployment of online applications and multiple platforms, a contracted hot site with adequate network bandwidth and processing capacity became unaffordable.
• ITD invested in a second data center to improve the State's RPO and moved to a four-hour RTO for core network services.

Recovery Time Objective (RTO)
• ITD is now looking to improve the second data center's RTO for core network services from four hours to a matter of minutes.
• Base services that will be up within the first hour:
  - E-mail
  - File and print services
  - AS/400 platform and applications
  - Current replicated hardware
  - Disaster Recovery website (basic information)

Recovery Time Objective (RTO)
• Base services that will be up within four to twelve hours:
  - Mainframe (requires an IPL, i.e., initial program load) / DELA
  - ConnectND
• Selected shared services and some agencies have development and/or test environments at the second data center. In a disaster, these environments will be converted to serve as production servers.

Recovery Time Objective (RTO)
• Agencies that invest in neither replicated data solutions nor backup processing capacity will need to wait for additional storage and servers to be shipped and provisioned. Estimated RTO for production systems: 3 to 8 weeks, depending on hardware availability, staffing priorities, and the amount of data to restore.
• Agencies that invest in replicated data solutions but not in backup processing capacity will need to wait for servers to be shipped and provisioned. Estimated RTO: 2 to 4 weeks, depending on hardware availability and staffing priorities.
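To compare these estimates against an agency's own requirement, the Python sketch below encodes them as (best case, worst case) ranges and checks whether the worst case fits. The grouping into three postures and the names `DrPosture`, `ESTIMATED_RTO`, and `meets_requirement` are hypothetical; the "within the first hour" tier is read from the list of services covered by current replicated hardware, and all figures are the briefing's estimates, not guarantees.

```python
from datetime import timedelta
from enum import Enum


class DrPosture(Enum):
    """Hypothetical grouping of an agency's disaster recovery investment."""
    REPLICATED_HARDWARE = "replicated hardware already at the secondary data center"
    REPLICATED_DATA_ONLY = "replicated data, no backup processing capacity"
    NO_INVESTMENT = "no replicated data and no backup processing capacity"


# (best case, worst case) RTO estimates taken from the briefing's RTO slides.
ESTIMATED_RTO = {
    DrPosture.REPLICATED_HARDWARE: (timedelta(0), timedelta(hours=1)),
    DrPosture.REPLICATED_DATA_ONLY: (timedelta(weeks=2), timedelta(weeks=4)),
    DrPosture.NO_INVESTMENT: (timedelta(weeks=3), timedelta(weeks=8)),
}


def meets_requirement(posture: DrPosture, required_rto: timedelta) -> bool:
    """True only if the worst-case estimate fits within the required RTO."""
    _, worst_case = ESTIMATED_RTO[posture]
    return worst_case <= required_rto


# Hypothetical agency that must be back in production within 24 hours.
required = timedelta(hours=24)
for posture in DrPosture:
    print(f"{posture.name}: {meets_requirement(posture, required)}")
```

Checking the worst case rather than the best case is deliberate: an agency planning around a hard deadline cannot count on hardware arriving early or on staff being free to restore its systems first.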

Disaster Recovery Communications
• We believe we can improve our communications process during future disaster events.
• Planned communication avenues:
  - DR website
  - E-mail
  - Customer Service Desk
  - Notifind (currently used to communicate with our staff)
• We may ask agencies for emergency contacts for critical applications.

Questions?
ITD Contingency Planning Contact: Larry Lee, lalee@nd.gov, 328-2721
