Disaster Management at the Tier-1

Disaster Management at the Tier-1 Andrew Sansum 2nd April 2009 RAL

Do You Recognise This?
- Burnt out UPS battery at ASGC
- Clearly a Disaster
22 March 2017 Tier-1 Status

Do You Recognise This?

Challenger Disaster

Cause of Challenger Disaster
- It was the “O” rings, wasn’t it?
  - “[The Rogers commission] found that the Challenger accident was caused by a failure in the O-rings … The failure of the O-rings was attributed to a design flaw, as their performance could be too easily compromised by factors including the low temperature on the day of launch”
- Yes, but there were underlying cause(s):
  - Communication Problems: “..failures in communication... resulted in a decision to launch 51-L based on incomplete and sometimes misleading information, a conflict between engineering data and management judgments, and a NASA management structure that permitted internal flight safety problems to bypass key Shuttle managers.”
  - Management Errors: “The Commission found that as early as 1977, NASA managers had not only known about the flawed O-ring, but that it had the potential for catastrophe.”

Why considered a disaster?
- People died: “Challenger disintegrated about seventy-three seconds after launch, killing the seven astronauts aboard”
- NASA’s reputation was badly damaged: “It also represented a serious blow to NASA's reputation, colouring the public perception of piloted spaceflight ..”
- Financial losses and reduced funding opportunity: “…and affecting the agency's ability to gain continued funding from Congress.”
- Couldn’t meet operational commitments: “Following the Challenger disaster, NASA grounded the remainder of the shuttle fleet while the risks were assessed more thoroughly, design flaws were identified, and modifications were developed and implemented.”

Identify Potential Disasters
- We do not (usually) mean the same thing when we say disaster as is meant by the “Challenger Disaster”
- Nevertheless there are many outcomes we wish to avoid
- Tier-1 Disaster Management plan seeks to identify circumstances that have a potential to significantly impact:
  - Safety
  - Services
  - Commitments
  - Reputation
  - Financial

Some Disasters
- Can construct a list of obvious disasters, e.g.:
  - Fire/Flood etc
  - Loss of network
  - Security incident
- We did this in the form of a risk analysis: DPv0.8.mht
- Also have previous experience:
  - CASTOR 2.1.7 upgrade
  - Disk firmware problems made it impossible to run delivered H/W
  - R89 delays (unable to manage deliveries)
  - Backplane burnout (not a disaster but very close)
- Common themes:
  - The ones we generated tended to be operational and start suddenly
  - The ones we suffered were slow-moving project management problems
- Also need to be able to manage un-thought-of disasters

Evolution of a Disaster
- Sometimes fast
- Sometimes slow but similar result

A Strategy
- Create a Disaster Management System which handles all potential disasters in a similar way.
- Identify common features and trigger levels to allow us to spot events before they blossom into disaster
- Mess with existing processes as little as possible
- Build specific contingency plans which add to the general response in specific circumstances.
- Trigger early, trigger often, respond ahead of curve
- Make use of the system routinely
  - Stops the system decaying
  - Gives operational and project management benefits

Don’t Confuse Disaster with Routine OPS
- Loss of power not a disaster ….. but ….
- Failure of routine restart may lead to disaster

Routine Operations
- We already have:
  - Production Team (Gareth, John Kelly and Tiju)
  - Admin on Duty (daytime)
  - On-call (nighttime)
- Routine operations should be:
  - Looking for problems
  - Fixing things
  - Calling experts
  - Notifying users
  - Setting downtimes
  - Assessing seriousness
  - Reviewing events – improving future response
- Not part of Disaster Management System
  - But prevents many things moving into the system

Need Escalating Response
- Start lightweight (Stage 1: Disaster Potential)
  - Informally assess/triage
  - Monitor/compare against standard contingencies
  - Set deadlines
  - Watch for things leaving expected script but avoid interfering
- Add some internal management (Stage 2: Disaster Possible)
  - Add internal (group) oversight
  - Formally assess
  - Interfere more, divert resources
- Escalate response to imminent disaster (Stage 3: Disaster Likely)
  - Broaden oversight and expertise (include GRIDPP + department)
  - Regular meetings with experiments
  - Prepare contingencies
- Manage actual disaster (Stage 4: Disaster)
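
The escalation ladder on this slide can be read as a small state machine. The following is a minimal sketch only, not the Tier-1's actual tooling: the numbering of the two middle stages is inferred from their order on the slide, and the deadline-based trigger (`should_escalate`) is an assumed rule built from the "Set deadlines" item. All function names are hypothetical.

```python
from datetime import datetime
from enum import IntEnum


class Stage(IntEnum):
    """Stage names from the slide; numbers 2 and 3 are inferred from their order."""
    DISASTER_POTENTIAL = 1
    DISASTER_POSSIBLE = 2
    DISASTER_LIKELY = 3
    DISASTER = 4


def should_escalate(deadline: datetime, now: datetime, resolved: bool) -> bool:
    """Assumed trigger: escalate if the incident is unresolved when its deadline passes."""
    return not resolved and now >= deadline


def next_stage(current: Stage, escalate: bool) -> Stage:
    """Move up one stage at a time; Stage 4 is the top of the ladder."""
    if escalate and current < Stage.DISASTER:
        return Stage(current + 1)
    return current


# Hypothetical incident: the Stage 1 deadline passed an hour ago without a fix.
deadline = datetime(2009, 4, 2, 12, 0)
now = datetime(2009, 4, 2, 13, 0)
stage = next_stage(Stage.DISASTER_POTENTIAL, should_escalate(deadline, now, resolved=False))
print(stage.name)  # DISASTER_POSSIBLE
```

Stepping one stage at a time matches the "trigger early, trigger often" idea: each step is cheap, so there is little cost in taking it before the situation is certain.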

At each stage
- Formal list of pre-defined communications
  - Notify team of deadline to escalation
  - Notify PMB incident is moving onto disaster track
  - Notify esc senior staff
  - Advise Press & PR (as disaster approaches)
  - ….
- Formal list of actions that should be carried out – eg:
  - Define Roles
  - Hold Incident Review Meeting
  - Start process to obtain financial approval
  - Arrange exceptional experiment liaison meeting
  - Review policy documents
- Formal list of criteria that get you to next stage
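
One way to picture the three formal lists kept for each stage is as a single record per stage. This is a sketch under stated assumptions: the `StagePlan` type and its field names are hypothetical, the slide does not say which communications and actions belong to which stage, so the assignment to "Disaster Possible" below is purely illustrative, and the escalation criterion is a made-up placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class StagePlan:
    """Hypothetical record holding the three formal lists for one stage."""
    name: str
    communications: list[str] = field(default_factory=list)
    actions: list[str] = field(default_factory=list)
    escalation_criteria: list[str] = field(default_factory=list)

    def checklist(self) -> list[str]:
        """Flatten communications and actions into one ordered to-do list."""
        items = [f"COMMUNICATE: {c}" for c in self.communications]
        items += [f"ACTION: {a}" for a in self.actions]
        return items


# Illustrative assignment only; the slide does not map items to stages.
plan = StagePlan(
    name="Disaster Possible",
    communications=[
        "Notify team of deadline to escalation",
        "Notify PMB incident is moving onto disaster track",
    ],
    actions=["Define roles", "Hold Incident Review Meeting"],
    escalation_criteria=["Deadline passed without resolution (assumed example)"],
)

for item in plan.checklist():
    print(item)
```

Keeping the criteria in the same record as the communications and actions means whoever is running the incident can see both what to do now and what would move the incident to the next stage.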

Contingency Plans
- Contingency plans supplement the general disaster management system.
- For each stage in the general system, supplement with:
  - Criteria to reach (or avoid reaching) this stage
  - Actions to take at this stage
  - Communications to make at this stage
- Example Contingency Plan: Contingency_Plan_Major_Security_Incident.mht
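
The "supplement" relationship described here can be sketched as a per-stage merge, with the general response kept intact and the contingency plan's items appended. This is an assumption about how the two could be combined, not a description of the Tier-1's documents: the `supplement` function is hypothetical, and the security-incident entries are invented for illustration because the contents of the example plan are not shown in the slides.

```python
def supplement(
    general: dict[str, list[str]],
    contingency: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Return per-stage actions: general items first, then contingency additions."""
    # Copy the lists so the general plan itself is never modified.
    merged = {stage: list(items) for stage, items in general.items()}
    for stage, extra in contingency.items():
        merged.setdefault(stage, []).extend(extra)
    return merged


# General actions taken from the escalation slide.
general_actions = {
    "Disaster Potential": ["Assess/triage", "Set deadlines"],
    "Disaster Possible": ["Formally assess"],
}

# Hypothetical contingency entries, invented for this example.
security_incident = {
    "Disaster Potential": ["Isolate affected hosts"],
    "Disaster Likely": ["Brief security officer"],
}

combined = supplement(general_actions, security_incident)
for stage, actions in combined.items():
    print(stage, "->", actions)
```

Because the general plan is copied rather than edited, several contingency plans can each be layered onto the same general response without interfering with one another.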

Conclusions
- Disaster Management System is working. Already managed:
  - Site DNS failure (reached Stage 1)
  - Power failure (reached Stage 2)
- Doesn’t replace our existing processes
  - But does make sure they are responding correctly
- Expect it to manage equally well:
  - Operations failures (network down and out)
  - Project management failures (building delivered late)
  - Unexpected problems (eg man from Mars at door)
- Working well and giving immediate benefit
- Doesn’t avoid planning for aftermath of building fire (but will help manage situation)
- Still working on contingency planning and experiment requirements