Systems Availability and Business Continuity. Chapter Four. Prepared by: Raval, Fichadia. John Wiley & Sons, Inc., 2007.
Chapter Four Objectives 1. Understand system availability and business continuity, and recognize differences between the two. 2. Comprehend incident response systems and their role in achieving the system availability objective. 3. Explain disaster recovery planning objectives and its design, implementation, and testing requirements. 4. Comprehend the link between business continuity and disaster recovery. 5. Understand the role of backup and recovery in disaster recovery plans.
Power outage at Northwest Airlines A thunderstorm and lightning at the datacenter location caused the problem. Systems, down initially, operated in a degraded manner the next morning. It took very long to check passengers in for flights. NWA triggered manual processes. Lines became longer and so did the delays in departure. Arrivals were late, and delayed departures from gates at the destination airports made arriving flights wait before they could get to a gate. NWA announced an embargo, limiting itself to what it could handle under the circumstances.
System Availability and Business Continuity System availability assures you that business will continue to operate. Business continuity is necessary for systems to add value on an ongoing basis. The issues of business continuity and systems availability are related and even overlap to a degree.
Incident Response Incident: A level of interruption in the system availability that appears to be temporary. An incident can be triggered by an accidental action by an authorized user, or it may result from a threat. Incidents may be detected by: End-users, who may describe the symptom but not the cause. Those monitoring systems and processes, who may detect anomalies that point to an incident that has occurred. Attack: A series of steps taken by an attacker to achieve an unauthorized result. Event: An action directed at a target that is intended to result in a change of state, or status, of the target. An event consists of an action and a target.
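The definitions of event and attack above can be pictured as a small data model. The following is a minimal sketch, not something taken from the chapter: the class names, the field names, and the sample actions and targets are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Event:
    """An action directed at a target (the slide's definition of an event)."""
    action: str
    target: str


@dataclass
class Attack:
    """A series of steps taken by an attacker to achieve an unauthorized result."""
    attacker: str
    unauthorized_result: str
    steps: List[Event] = field(default_factory=list)

    def targets(self) -> List[str]:
        # Distinct targets, in the order the attack first touched them.
        seen: List[str] = []
        for step in self.steps:
            if step.target not in seen:
                seen.append(step.target)
        return seen


# Hypothetical attack, invented for illustration only.
attack = Attack(
    attacker="outsider",
    unauthorized_result="disclosure of customer data",
    steps=[
        Event(action="probe", target="web server"),
        Event(action="authenticate", target="admin account"),
        Event(action="read", target="customer data"),
    ],
)

print(attack.targets())
```

Modeling an attack as an ordered list of events keeps the two definitions separate: each step still has exactly one action and one target, and the attack adds the attacker and the intended result.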
Nature of Response to an Incident Assess the business significance of the incident’s impact. Identify critical business processes that might have been compromised. Determine the root causes of the incident. This might present a challenge, for every incident could be of a different variety. The team may need to consult experts from outside the team. Training in forensics could help the team collect and evaluate evidence systematically. Standard procedures must be followed for restoring the affected systems and processes, instead of ad hoc, one-off attempts to restore what is compromised or lost.
Preventive Measures Prevention is better, and could be more cost-effective, than a cure. Preventive measures require an anticipation or prediction of what might happen in terms of incidents and consequent compromises. Lessons learned from the organization’s and from others’ experiences can help design and implement effective preventive measures.
Incident Response Team A multi-skilled group, since the incident may be of any variety and may impact almost any information asset. May include representation from human resources, legal, information systems, networks and communications, physical security, information security, and public relations. A top management team member may be designated as a direct contact for counseling and support.
CERT CERT stands for Computer Emergency Response Team. Also called the CERT Coordination Center (CERT/CC), it is the Internet’s official emergency team. Provides alerts and offers incident handling and avoidance guidelines. Is located at Carnegie Mellon University. www.cert.org
Disaster Recovery Disaster: An event that causes a significant and perhaps prolonged disruption in system availability. Disasters can be man-made or natural. Man-made disasters can be malicious or unintentional. Disaster recovery is a systematic effort to recover from the impact of a disaster. The best way to understand recovery is by focusing on post-disaster phases. Post-disaster phases: (1) Immediate response. (2) Near-term resumption. (3) Recovery toward normalization. (4) Restoration to pre-disaster state.
Example event: A logic bomb destroyed the operating system and customer data.
Phase: Immediate response. Objective: Address emergency situation only. Example: Call customers whose orders are yet to be filled. Determine the current state of the system and data. Call in backup tapes and equipment to a warm site.
Phase: Near-term resumption. Objective: Resume operations at any level possible. Example: Begin manual processing of critical orders. Install equipment, load operating system and applications, restore data, and test outputs.
Phase: Recovery toward normalization. Objective: Expand operations and extend capabilities and functionalities. Example: Switch to automated processing. Expand the order processing cycle. Increase the functionality (e.g. report generation).
Phase: Restoration to pre-disaster state. Objective: Return as close to the original (pre-disaster) state as possible. Example: Load operating system, data, and applications at the original site. Pre-test. Resume processing in a parallel run with the warm site. Cut over to the original site. Fold operations at the warm site and return the equipment.
Timeliness of Action and Value of Recovery Timeliness of action: The timeline of planned actions should reflect the value of each action at the time. Certain steps can wait while others must be taken without delay to minimize losses. Value of recovery: The timeliness of an action reflects the value of its recovery target. Considering this, recovery tasks should be systematically assigned to each post-disaster phase.
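One way to picture assigning recovery tasks to post-disaster phases by value is the sketch below. It is an illustration under stated assumptions, not the chapter's method: the `loss_per_hour_of_delay` measure and every number in it are invented, the class and function names are hypothetical, and the task names are borrowed from the logic-bomb example.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, Iterable, List


class Phase(IntEnum):
    """The four post-disaster phases, in the order they occur."""
    IMMEDIATE_RESPONSE = 1
    NEAR_TERM_RESUMPTION = 2
    RECOVERY_TOWARD_NORMALIZATION = 3
    RESTORATION = 4


@dataclass(frozen=True)
class RecoveryTask:
    name: str
    phase: Phase
    # Hypothetical value measure: estimated loss for each hour the task waits.
    loss_per_hour_of_delay: float


def plan(tasks: Iterable[RecoveryTask]) -> Dict[Phase, List[RecoveryTask]]:
    """Group tasks by phase; within a phase, the costliest delay goes first."""
    schedule: Dict[Phase, List[RecoveryTask]] = {phase: [] for phase in Phase}
    for task in tasks:
        schedule[task.phase].append(task)
    for phase_tasks in schedule.values():
        phase_tasks.sort(key=lambda t: t.loss_per_hour_of_delay, reverse=True)
    return schedule


# Illustrative figures only; none of these numbers come from the chapter.
tasks = [
    RecoveryTask("Increase the functionality", Phase.RECOVERY_TOWARD_NORMALIZATION, 100.0),
    RecoveryTask("Cut over to the original site", Phase.RESTORATION, 50.0),
    RecoveryTask("Expand the order processing cycle", Phase.RECOVERY_TOWARD_NORMALIZATION, 500.0),
    RecoveryTask("Call customers whose orders are yet to be filled", Phase.IMMEDIATE_RESPONSE, 2000.0),
    RecoveryTask("Begin manual processing of critical orders", Phase.NEAR_TERM_RESUMPTION, 1500.0),
]

schedule = plan(tasks)
for phase in Phase:
    for task in schedule[phase]:
        print(phase.name, "-", task.name)
```

Keeping the phase as an explicit field means the order of phases is fixed, while the value measure only ranks tasks inside a phase. A real plan would likely use richer criteria than a single per-hour figure.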