TPL Business Continuity Management System (BCMS)
How vulnerable are we?
The Tonga Trench is 10,800 m deep and lies 200 km east of the Kingdom of Tonga. It has the fastest plate velocity (24 cm/year) of any trench in the world.
The History of Business Continuity
Disaster Recovery Planning → Business Continuity Planning → Business Continuity Management System
Timeline markers: 11/9/2001, mid 2007
Alternative Planning / Plan B
Fallback Plans, Contingency Plans
IT or Technical Contingency Plans
Organization-wide Contingency Plans
Holistic Contingency Plans
BCP: The process of developing advance arrangements and procedures that enable an organization to respond to an event in such a manner that critical business functions continue with planned levels of interruption or essential change.
DRP: The advance planning and preparations that are necessary to minimize loss and ensure continuity of the critical business functions of an organization in the event of disaster. It is the technological aspect of business continuity planning.
What is the System?
What is a BCM System?
A Business Continuity Management System is:
A living system, managed every day and updated whenever the situation changes
Holistic
Based on a strategy/policy
Focused on end-to-end restoration of critical business processes (not asset recovery only)
Supported by communication with clients, employees and emergency organisations
Integrated with the business processes of other Business Units (BUs)
Concerned with employee safety & wellbeing
Describing BCM Risk
Risk has four key components:
1. Threats: fire, earthquake, power failure, loss of key staff etc.
2. Assets: people, mission-critical systems and infrastructure, suppliers, clients, buildings, information and records
3. Vulnerabilities: weaknesses in assets, such as a single point of failure, inadequate fire protection, poor staffing levels, unreliable IT security, inefficient data backup etc.
4. Impact: financial or non-financial (e.g. reputational, health & safety impacts)
Risk description (example): A significant financial and reputational impact on the Company because the IT Manager is unable to restore, on time, the Billing Server that was damaged by a power surge.
BCM Risks
BCM plans are mitigation controls (i.e. they minimise consequences) rather than prevention controls.
BCM risks are managed through Quantate as part of TPL's Risk Management Program.
BCM Assumptions
A secondary or alternative site is always available to the company to restore business-critical processes if the primary site is not accessible because of damage sustained in a disaster.
Minimum resources (e.g. staff, IT equipment etc.) are still available to restore business-critical functions following a disaster.
Popua power station is unharmed and the distribution network assets have sustained only minor damage.
National communication methods (i.e. TCC or Digicel) are available to communicate with the public.
BCM plans are not applicable to nationwide disasters (e.g. a tsunami) that may have damaged all of the company's primary and secondary sites and its generation and distribution assets.
BS 25999 Standard
Requires a policy statement
Requires focusing on restoring end-to-end processes (not just equipment or machinery)
Requires RTO/RPO analysis for all critical processes
Requires a risk assessment and mitigation
Requires a hierarchy of plans (IMP, BURP, ERP etc.)
Requires a Command Centre and DR sites
Requires a Call Tree
Requires periodic testing, maintenance & auditing of BCM plans
Overview of BS 25999:2007
PLAN: Planning
–BCM Policy
–Steering Committee
–Structure, roles & responsibilities
–BCM scope & objectives
–Top management commitment
DO: Implementation & Operation
–Identify BC threats
–Business impact analysis
–Process criticality analysis
–Gap analysis
–Risk assessment
–Recovery strategies & scenarios
–Generating BCM Plans/BCM Manual
CHECK: Testing & Auditing BCM plans
–Periodic tests: table-top exercises, simulations, live exercises
–Periodic audits
–Management review
ACT: Maintaining and Improving
–Ongoing training
–Update plans whenever there is a major change to the company processes/structure
–Embedding BCM in the culture
BCM Plan Portfolio (TPL)
BCM Plan Portfolio
BURP vs. DR Plan
A BURP is a holistic plan used to restore an entire business unit: its key processes, which employees move to the DR site, activation of the call tree, coordination with other business units, and the resumption of business as normal.
A DR Plan is a technical plan designed to restore equipment and machinery to normal operation (e.g. the IT DR Plan).
Companies use both names interchangeably.
Managing Incidents Within the Capability of TPL Plans
Identify the incident
Assess damage and identify lost critical processes
Declare a disaster if critical operations cannot be restored within the primary premises
Shut down the power supply for the safety of the public
Activate BURPs/DRPs
Move to the DR site to restore operations to a minimum acceptable level
Restore the damaged distribution assets and network
Restore power generation
Resume business as usual at the DR site
Monitoring, review & documentation
Conducting Business Impact Analysis (BIA)
RTO & RPO
The Recovery Time Objective (RTO) is the goal for how quickly TPL needs to recover the interrupted processes.
The Recovery Point Objective (RPO) is the point in time to which data must be restored to successfully resume the interrupted processes (often thought of as the time between the last backup and the moment the interruption occurred).
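The two objectives can be checked against an actual or simulated interruption. The following is a minimal sketch, assuming three timestamps are recorded (last backup, interruption, recovery); the function names and the sample values are hypothetical and are not taken from TPL's plans.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, interruption: datetime) -> timedelta:
    """Time between the last backup and the interruption (data at risk)."""
    return interruption - last_backup


def downtime(interruption: datetime, recovered: datetime) -> timedelta:
    """Time the process was unavailable."""
    return recovered - interruption


def objectives_met(last_backup, interruption, recovered, rpo, rto):
    """Compare the observed data-loss window and downtime with RPO and RTO."""
    return {
        "rpo_met": data_loss_window(last_backup, interruption) <= rpo,
        "rto_met": downtime(interruption, recovered) <= rto,
    }


# Illustrative values only: RPO of 1 day and RTO of 2 days.
last_backup = datetime(2024, 1, 10, 22, 0)
interruption = datetime(2024, 1, 11, 9, 0)
recovered = datetime(2024, 1, 12, 15, 0)

result = objectives_met(
    last_backup, interruption, recovered,
    rpo=timedelta(days=1), rto=timedelta(days=2),
)
print(result)
```

Here 11 hours of data are at risk and the process is down for 30 hours, so both objectives are met under the assumed values.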
Business Impact Analysis
BIA identifies the financial and non-financial impacts of a disaster-related failure of the company's critical processes.
The goal of BIA is to identify, categorise and prioritise the mission-critical processes and the resources (e.g. technology, infrastructure, vital records, personnel, suppliers) required to run these processes within the company. Examples of such processes are customer service & support, order & data processing, payroll, IT & communication, and purchasing and production.
BIA also identifies interdependencies (telephones, IT facilities, office space etc.) between different business units within the company.
The priority of each critical process for subsequent resumption is based on its RTO and RPO.
BIA – Identifying Critical Processes
Administer the BIA Questionnaire with each BU Manager (template supplied); collect, review and analyse the data.
Consider the worst-case scenario: a total loss of personnel, facilities and property.
Does the disaster affect the critical processes?
Determine the impacts if a process is lost due to the disaster:
Identify and quantify financial impacts (recovery costs, production losses & revenue losses)
Identify non-financial impacts
–Impact on staff or public wellbeing
–Impact of damage to premises/assets/records
–Impact of breaches of statutory regulations
–Damage to reputation
–Deterioration of product/service quality
–Environmental damage
BIA – Identifying Critical Processes
Determine the minimum resources required for a minimum acceptable recovery of each process, manually or using alternative processes.
–Staff: skills and knowledge
–Secondary premises
–Plant, equipment, software, data/records
–External service providers
Determine RTOs and RPOs for each process. Note that the shorter the RTO, the greater the cost of recovery. Also, longer RTOs increase the chance that recovery will not be achieved within the MTO.
Determine whether the company can currently achieve the RTO/RPO objectives. If not, flag them as risks to be analysed at a later stage.
Rank the processes by impact to determine the recovery priority of critical processes.
Example: results of the RTO/RPO analysis are supplied.
Conduct a process dependency analysis with other business units.
BIA – Process Dependency Analysis
Consider the entire chain of processes to be recovered together
Adjust RTOs if necessary
Define the Maximum Tolerable Outage (MTO)
BCM plans are designed to recover company-critical processes within the MTO
[Figure: processes A, B, C, D and E across BU 1, BU 2 and BU 3, labelled with RTOs of 2 hrs, 5 hrs, 3 hrs, 7 hrs and 1 hr]
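One way to read "adjust RTOs if necessary" is that a process cannot be recovered before the processes it depends on, so an upstream process should not have a longer RTO than any process that relies on it. The sketch below illustrates that reading only. The dependency links, the assignment of RTO values to processes A to E, and the 48-hour MTO are all assumptions made for illustration, because the original diagram is not recoverable from the transcript.

```python
def adjust_rtos(rto_hours, depends_on):
    """Tighten each prerequisite's RTO so it is no longer than the RTO
    of any process that depends on it (directly or indirectly)."""
    adjusted = dict(rto_hours)
    changed = True
    while changed:  # repeat until no RTO needs tightening
        changed = False
        for process, prerequisites in depends_on.items():
            for prerequisite in prerequisites:
                if adjusted[prerequisite] > adjusted[process]:
                    adjusted[prerequisite] = adjusted[process]
                    changed = True
    return adjusted


# Hypothetical RTOs (hours) and dependencies: B needs A, C needs B, E needs D.
rto_hours = {"A": 2, "B": 5, "C": 3, "D": 7, "E": 1}
depends_on = {"B": ["A"], "C": ["B"], "E": ["D"]}

adjusted = adjust_rtos(rto_hours, depends_on)
print(adjusted)

MTO_HOURS = 48  # assumed Maximum Tolerable Outage
within_mto = all(hours <= MTO_HOURS for hours in adjusted.values())
print(within_mto)
```

With these assumed links, B is tightened from 5 to 3 hours (because C needs it within 3) and D from 7 hours to 1 (because E needs it within 1); every adjusted RTO remains within the assumed MTO.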
BIA – Disaster Declaration
Scenario 1 – If Estimated Recovery Time < MTO
–Primary site is intact (example: malfunction of IT servers)
–Execute the IT DR Plan for all IT issues
Scenario 2 – If Estimated Recovery Time <= MTO
–Primary site is inaccessible but DR sites are operational, key staff are available and national communication systems are functional (example: cyclone)
–Minimal and acceptable reputational/financial losses
–Execute all BCM plans, including the IMP, BURPs & IT DR Plan
–Resume company-critical processes from the DR sites at a minimum acceptable level
Scenario 3 – If Estimated Recovery Time > MTO
–No primary or DR site is operational; key staff are unavailable; the national communication system is out of service (example: tsunami)
–Requires a good communication plan (e.g. radio communication)
–Reputational/financial losses are inevitable
–Execute BCM plans with a delay
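The three scenarios amount to a small decision rule. The following is a minimal sketch, assuming the decision depends only on the estimated recovery time, the MTO and the state of the sites; the function name and parameters are hypothetical, and the slide does not say how a recovery time exactly equal to the MTO is treated when the primary site is intact (this sketch treats it as Scenario 1).

```python
def declare_scenario(estimated_recovery_hours, mto_hours,
                     primary_site_intact, dr_sites_operational):
    """Return the scenario number (1, 2 or 3) described on the slide."""
    no_site_available = not (primary_site_intact or dr_sites_operational)
    if estimated_recovery_hours > mto_hours or no_site_available:
        return 3  # outside plan capability: execute BCM plans with a delay
    if primary_site_intact:
        return 1  # e.g. IT server malfunction: execute the IT DR Plan
    return 2      # primary site lost, DR sites usable: execute all BCM plans


MTO_HOURS = 48  # assumed value for illustration

print(declare_scenario(4, MTO_HOURS, True, True))
print(declare_scenario(36, MTO_HOURS, False, True))
print(declare_scenario(72, MTO_HOURS, False, False))
```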
Incident Management Plan
Disaster Declaring Criteria
TPL MTO = 2 days
Level 1: Minor Incident – Critical business operations are affected, but distribution network recovery or IT problems are expected to be resolved within 3 hours. In this case, the incident is resolved on a business-as-usual basis and no disaster is declared.
Level 2: Major Incident – Major disruption to critical business operations, with the system outage expected to last more than 1 day but not more than two business days. In this case, the incident is escalated and a semi-disaster is declared by the members of the IMT. At this incident level, only the Generation and Distribution DR Plans are invoked, as applicable.
Level 3: Critical Incident – Critical disruption to business operations, with the system outage expected to last more than two business days. In this case, the incident is escalated and a disaster is declared by the members of the IMT. At this incident level, the BURPs/DRPs of all critical Business Units (IT, Finance, Generation and Distribution) are invoked.
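The declaring criteria can be sketched as a classification on the expected outage. This is an illustration under stated assumptions, not TPL's procedure: the slide leaves outages between 3 hours and 1 day unassigned, and this sketch treats them as Level 2; business days are approximated as 24-hour days; all names are hypothetical.

```python
from enum import Enum


class IncidentLevel(Enum):
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


MINOR_LIMIT_HOURS = 3  # "resolved within 3 hours"
MTO_HOURS = 48         # TPL MTO = 2 days, approximated as 48 hours

# Plans invoked at each level, as listed on the slide.
PLANS_TO_INVOKE = {
    IncidentLevel.MINOR: [],  # business as usual, no disaster declared
    IncidentLevel.MAJOR: ["Generation DR Plan", "Distribution DR Plan"],
    IncidentLevel.CRITICAL: [
        "IT BURP/DRP", "Finance BURP/DRP",
        "Generation BURP/DRP", "Distribution BURP/DRP",
    ],
}


def classify_incident(expected_outage_hours):
    """Map an expected outage duration to an incident level."""
    if expected_outage_hours <= MINOR_LIMIT_HOURS:
        return IncidentLevel.MINOR
    if expected_outage_hours <= MTO_HOURS:
        # Assumption: the 3-hour to 1-day gap is escalated to Level 2.
        return IncidentLevel.MAJOR
    return IncidentLevel.CRITICAL


level = classify_incident(30)
print(level.name, PLANS_TO_INVOKE[level])
```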
Incident Management Team (IMT)
IMP Role | Primary | Deputy
Incident Controller | Rod Lowe (Distribution Manager) | Michael Lani 'Ahokava (Generation Manager)
Communication – Board, Media | John van Brink (Chief Executive Officer) | Steven Esau (Finance Manager)
Financial Support | Steven Esau (Finance Manager) | Epoki Veituna (Financial Accountant)
Distribution Division | Ian Skelton (Network Planning Manager) | Setitaia Chen (Network Asset Design Manager)
Generation Division | Michael Lani 'Ahokava (Generation Manager) | Murray Sheerin (Power Station Superintendent)
Risk & Regulatory | Ajith Fernando (Risk & Compliance Manager) | –
IT Support | Lualala Tapueluelu (IT Supervisor) | Peifaga Fuiono (IT Technician)
Legal Support | William Edwards (Company Secretary) | –
Human Resources | 'Alisi Tu'inukuafe (HR & Administration Manager) | Nau Lavemai (Senior Administration Officer)
IMT Assistance | Nau Lavemai (CEO Secretary) | Jane Guttenbeil (Communication Advisor)
Media Relations | Jane Guttenbeil (Communication Advisor) | Nau Lavemai (Senior Administration Officer)
Command Centre Options
Command Centre Activities:
–Declaring the disaster based on the damage assessment
–Establishing 24-hour communication channels
–Activating, coordinating and monitoring BURPs/DRPs
–Monitoring and acting on staff health & safety
–Media relations
–Making all key decisions for successful restoration
Possible Locations for the Command Centre:
–Head Office (if available)
–Distribution Office (if available)
–Scenic Hotel, Digicel Network Operating Centre
DR Sites
DR sites shown: Distribution Office, Tongatapu; Popua Generation Plant, Tongatapu; Distribution Office, Ha'apai; Distribution Office, Vava'u
Incident Site or Network | DR Site or Network
Head Office, Tongatapu | DR Site
Distribution Network, Tongatapu | There is no DR network for TPL in Tongatapu. If the network is severely damaged by a disaster, TPL technical staff (with assistance from external emergency teams) have to restore the network as quickly as possible.
Popua Generation Plant, Tongatapu | There is no DR generation plant for TPL in Tongatapu. If the existing generation facility is severely damaged by a disaster, TPL technical staff (with assistance from external emergency teams) have to restore the generation facility as quickly as possible.
Office, Ha'apai | DR Site
Distribution Network/Generation Plant, Ha'apai | There is neither a DR generation plant nor a DR network for TPL in Ha'apai. If the existing generation facility/distribution network is severely damaged by a disaster, TPL technical staff have to restore the generation/network facility as quickly as possible, working day and night.
Office, Vava'u | DR Site
Distribution Network/Generation Plant, Vava'u | There is neither a DR generation plant nor a DR network for TPL in Vava'u. If the existing generation facility/distribution network is severely damaged by a disaster, TPL technical staff have to restore the generation/network facility as quickly as possible, working day and night.
Distribution Network/Generation Plant, 'Eua | There is neither a DR generation plant nor a DR network for TPL in 'Eua. If the existing generation facility/distribution network is severely damaged by a disaster, TPL technical staff have to restore the generation/network facility as quickly as possible, working day and night.
Internal Communication – Call Tree
Call tree (top to bottom): Incident Management Team; Business Unit Managers; Field Supervisors & Foremen; Office Supervisors; Linesmen; Staff
Each call covers:
–Brief description of the disaster, loss of life or injuries, damage summary, response and recovery details
–Location of the DR or Network Site to report to, or an instruction to remain at home on standby
–Phone number of the DR Site/Network Supervisor
–Immediate actions to be taken
–Location and time the team should meet at the DR or Network Site
–Instruction to everyone notified not to make any statements to the media or on social media
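A call tree is a cascade: each level notifies the level below it. The following is a minimal sketch of how the calling order can be derived from such a tree. The branching shown (which group calls which) is an assumption, since the slide's diagram is flattened in this transcript, and the function name is hypothetical.

```python
from collections import deque

# Assumed branching; the slide lists these groups but not the exact links.
CALL_TREE = {
    "Incident Management Team": ["Business Unit Managers"],
    "Business Unit Managers": ["Field Supervisors & Foremen", "Office Supervisors"],
    "Field Supervisors & Foremen": ["Linesmen"],
    "Office Supervisors": ["Staff"],
}


def call_sequence(tree, root):
    """Return (caller, callee) pairs in the order calls are made,
    working down the tree one level at a time (breadth-first)."""
    calls = []
    queue = deque([root])
    while queue:
        caller = queue.popleft()
        for callee in tree.get(caller, []):
            calls.append((caller, callee))
            queue.append(callee)
    return calls


calls = call_sequence(CALL_TREE, "Incident Management Team")
for caller, callee in calls:
    print(f"{caller} -> {callee}")
```

During a call-tree activation trial, comparing this expected sequence with the calls actually logged is one way to locate where communication broke down.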
External Communication
IMT
Managers: Distribution Manager; Finance Manager; Generation Manager; Planning & Design Manager; IT Supervisor
Plans: Business Unit Restoration Plans; Business Unit DR Plans
External parties:
–Vendors & Suppliers
–Police, Fire & Hospital
–National Emergency Management Office (Ministry of Works)
–Tonga Met Service
–Radio Tonga
–Tonga Defense Service
–Government Organisations
–Embassies & High Commissions
Media Relations
Only the authorised spokespeople shown below are to comment to the media.
–John van Brink (CEO): English media
–Steven Esau (Finance Manager): Tongan media
All other staff should know that, if they are contacted by the media, they are not to comment and that their standard reply should be: "I'm sorry, I can't help you; I am not the appropriate person to speak to. If you give me your name and contact number, I'll get the right person to get in touch with you shortly."
Jane Guttenbeil and Nau Lavemai coordinate media relations.
Business Unit Recovery Plan
Head Office BURP
Minimum Systems Required at DR Site
Priority Rank | Process to Recover | Minimum Systems Required | Can TPL Provide Min. Systems Required? | RPO | RTO | Minimum Staff Required
1 | Customer call management | Voice calls & call redirection to mobile phones | Yes (refer IT DR Plan) | NA | 1 Day | Dedicated staff
2 | Meter reading & Billing | Computers with Orien, printer, mobile phones | Yes (refer IT DR Plan) | 1 Day | 2 Days | Billing Supervisor, meter readers
3 | Revenue management | Computers with Orien, mobile phones | Yes (refer IT DR Plan) | 1 Day | 2 Days | Credit supervisor and 2 cashiers
Minimum Staff Required at DR Site
Division | Manager/Supervisor | Team Members
IT | Lualala | Pei, Sonia
Billing | Ofa | Makueta
Revenue | Heta | Ovava, Grace
Distribution Office BURP
Minimum Systems Required at DR Site
Priority Rank | Process to Recover | Minimum Systems Required | Can TPL Provide Min. Systems Required? | RPO | RTO | Minimum Staff Required
1 | Customer faults call management | Call redirection to mobile phones, computer with File Maker software | Yes (refer IT DR Plan) | NA | 1 Day | 2 Call Centre staff
2 | GIS Database Management | Computers with GIS database, printer, mobile phones | Yes (refer IT DR Plan) | NA | 2 Days | 1 Planning staff
Minimum Staff Required at DR Site
Division | Manager/Supervisor | Team Members
Faults | Malia Hehea | –
GIS Database | Ian | Semi
Generation DR Plan
Priority Rank | Process to Recover | Minimum Systems Required | Can TPL Provide Min. Systems Required? | RPO | RTO | Minimum Staff Required
Option 1 | Power generation to critical organizations | 3 MW backup generator (TPU) and 500 kW generators for outer islands, 2 weeks' fuel supply | No, but damaged generators will have to be restored as quickly as possible | NA | 4 Days | Generation Manager and all generation technical staff
Option 2 | Power generation to critical organizations | 3 MW backup generator (TPU) and 500 kW generators for outer islands, 2 weeks' fuel supply | If the power station is unrecoverable, generators will be hired from Aggreko NZ | NA | 1 Week | CEO, Power Generation Manager
Distribution Network DR Plan
Priority Rank | Process to Recover | Minimum Systems Required | Can TPL Provide Min. Systems Required? | RPO | RTO | Minimum Staff Required
1 | Distribution network | Network equipment (poles, transformers, insulators etc.) | Yes, Transnet stores has an adequate standby supply for disasters | NA | Urban areas: 2 Days; Villages: 1 Week | Distribution Manager, supervisors and 100% of field staff
2 | Field vehicles, PPE, VHF comms | Assume 50% damaged | Yes, there are VHF systems for 50% of field vehicles | NA | – | 100% of field staff
Note: H/O staff are expected to support distribution staff with cooking etc. Meter readers are expected to support linesmen in the field.
Emergency Response Plans
ERPs contain specific emergency procedures that must be followed during a disaster in order to protect people and assets, and to mitigate further damage. For example:
–Building Evacuation
–Damage Assessment
–Spillage (Oil/Chemical/Diesel Fuel) at Power Station
–Fuel Supply Cut Off
–Civil Disturbance
–Hurricane/Storm
–Records Recovery
–Bomb Threat
–Earthquakes
Testing, Maintenance & Auditing the BCM Plans
Testing Business Continuity Plans
Developing BCM plans is not the end of the BCM process; the emphasis of BCM is on management.
Without regular maintenance and testing, the plans' usefulness in a real crisis may be severely limited.
Testing exercises the company's ability to recover from an incident.
Test the scenarios identified under the Scenario Planning section (refer above).
Tests validate the effectiveness and timeliness (against RTO and RPO) of the restoration of critical activities.
Tests determine the adequacy of SLAs (service level agreements) with third-party suppliers.
Testing identifies communication breakdowns during call-tree activation trials.
Tests are essential to developing the teamwork, competence, confidence and knowledge that are vital at the time of an incident.
Testing can be annual, bi-annual etc., and announced or unannounced.
Types of Tests
1. Table-top checks: the simplest and most frequent form of test. The author of the plan simply checks the contents of the plan to ensure that the information (e.g. employee names, contact numbers) is up to date.
2. Walk-through: similar to a table-top exercise, but involves all named participants in testing a specific disaster scenario. The participants are brought together to role-play their defined resumption procedures alongside those of others. This is the most common method of testing BCM plans because it is relatively inexpensive.
3. Simulation: widens participation to all those involved in business recovery, with prior notice. A simulation may include an interruption such as a building fire, in which people do not have access to normal facilities and must relocate to an alternative location (i.e. the DR site).
4. Full or live exercises: the most extensive and expensive form of test, so they are normally undertaken on a yearly or bi-yearly basis. This is the largest scale of test and involves the invocation of all BCM plans (i.e. IMP, BURPs & ERPs) to deal with a scenario that normally involves a move to an alternative site where operations are to be resumed. The dependencies and links between BURPs are the focal point of this type of testing.
Post-Test Evaluations
The post-test evaluation should consider the following issues:
–Did the plans help or hinder recovery efforts?
–Did people deviate from the plan and, if so, what was the effect?
–Were the RTO & RPO objectives achieved?
–Where and when did delays occur?
–What did staff do well?
–What did staff do badly?
–How did expectations differ from what actually happened?
–Were all BURPs integrated sufficiently to achieve recovery?
–What are the priorities for change?
–Is there a paper or audit trail?
–How should changes be implemented?
–Could the observation process be improved?
Identify and document all deficiencies, lessons learned etc.
Update BCM plans, if required, based on the test outcomes.
BCM Maintenance & Auditing
Ongoing BCMS audits should be conducted to evaluate the entire portfolio of BCM plans and identify gaps and inconsistencies.
If testing has shown that plans have failed to meet the recovery objectives, a fundamental review of the plans may be required.
In addition, audits should be conducted after a company restructure resulting from a merger or acquisition, after the installation of new systems & facilities (e.g. IT), etc.
The auditor will issue corrective and preventive actions focused on continuous improvement.
Ensure the BCMS is current and up to date at all times (i.e. BCM plans are living documents).
Provide ongoing training on the entire BCM process (e.g. BIA, Risk Management etc.).
Communicate the BCM initiatives to all employees through newsletters, induction programmes etc.
Cultivate a BCM culture.
Interruptions Outside the Capabilities of BCM Plans
Managing Disasters Outside the Capability of BCM Plans
BCM plans are usually prepared to mitigate impacts and resume business operations after manageable interruptions such as fire, flood, supply chain interruptions, IT failure etc. However, there are some disasters (e.g. a tsunami) in which the company's current BCM plans may simply be ineffective, for the following reasons:
–Business may be interrupted for a while
–RTO and RPO objectives may not be achievable (Estimated Recovery Time > predefined MTO)
–All DR sites may have been damaged
–Possible loss of key staff
–Severe damage to the company's mission-critical assets (e.g. distribution network)
–All communication services (e.g. TCC, Digicel) out of service
–Severe damage to the power station
Managing Disasters Outside the Capability of BCM Plans
In the case of such a disaster, a possible action plan would be:
–Shut down power generation for the safety of the public
–Monitor and support staff health, safety and wellbeing
–Coordinate with NEMO on possible recovery efforts (e.g. assistance from TDS to clear fallen trees etc.)
–Broadcast to the public that power is interrupted for a while
–Restore the distribution network, with some delays (network equipment might have to be imported from overseas)
–Establish an alternative fuel supply (if the fuel tank has been damaged by the tsunami)
–If the power station cannot be recovered, import generators from Aggreko Ltd., NZ
–Establish a new DR site to resume key processes (e.g. billing/revenue) at a minimum acceptable level
–Implement current BCM plans with a delay