
e-Health 2013 Challenges & Opportunities TEC Talk


Presentation transcript:

1 e-Health 2013 Challenges & Opportunities TEC Talk
Service Interrupted… AHS Experience with IT Major Incidents & Clinical Involvement. e-Health Challenges & Opportunities TEC Talk.

Jill speaking... Welcome to our TEC Talk presentation. My name is Jill Robert, and Wendy Tegart and I are AHS employees in the IT department.

Biographies

Wendy Tegart BSc, BA - Provincial Director, Service Management. Wendy Tegart is a member of the management team within Alberta Health Services with responsibility for improving IT service delivery by leveraging ITSM. Wendy's career has focused on service management over the past ten years in various organizations, including Alberta Health Services, Calgary Health Region and University Health Services in Toronto. As Director, IT Service Management, Wendy is building a provincial ITSM solution centred on delivering results by aligning IT and business goals, with a focus on operational and service excellence, governance and best practices.

Jill Robert BScN, RN - Provincial Director, IT Strategic Partnerships. Jill has approximately 10 years of bedside nursing experience in critical care and another decade in clinical informatics. She has played a vital role in building, implementing and supporting clinical information systems in both Halifax and Calgary. With her knowledge and experience in clinical education and organizational change management, Jill supports Alberta Health Services in ensuring that the IT investment strategy in major projects and operations brings value to clinicians and the organization as a whole.

Wendy Tegart, Provincial Director Service Management; Jill Robert, IT Strategic Partner

2 Faculty/Presenter Disclosure
Faculty: Wendy Tegart & Jill Robert Relationships with commercial interests: Grants/Research Support: Not applicable Speakers Bureau/Honoraria: Not applicable Consulting Fees: Not applicable Other: Employees of Alberta Health Services Jill speaking... This slide must be visually presented to the audience AND verbalized by the speaker. As per the disclosure requirement, we have “Nothing to Disclose”. Nothing to Disclose

3 Agenda Alberta Health Services Overview Major Incident Process
1 Alberta Health Services Overview 2 Major Incident Process 3 Major Incident Roles 4 Communication Approach 5 Clinical Involvement 6 Next Steps 7 Questions

Jill speaking... We're excited to share with you our evolving work at Alberta Health Services around supporting our organization when there are major IT service interruptions. Unfortunately, we have more experience in this area than we'd like. We'll share the processes, the roles involved and the learnings we've garnered. But first, I'd like to provide a little background on where we come from and who we are representing...

4 Annual Service Volumes (2011-12)
Alberta Health Service Overview

Alberta Health Services (AHS) is responsible for delivering health services to the 3.8 million people living in Alberta, across a service area of 661,848 square kilometres.

Annual Service Volumes (2011-12)
Acute Care: 2,029,191 Emergency Department Visits; 376,115 Hospital Discharges; 2,602,384 Total Hospital Days; 50,099 Births
99 Acute care hospitals and 5 stand-alone psychiatric facilities
Primary Care: 104,704 Home Care Clients; 766,146 Health Link calls; 393,964 EMS Calls/Events

Jill speaking... Alberta Health Services was formed in 2008. In essence, Alberta dissolved nine regional health authorities, the Cancer Board, the Mental Health Board and the Alcohol and Drug Abuse Commission, and rolled them into one entity called Alberta Health Services: a province-wide health delivery system charged with delivering equitable services to 3.8 million Albertans no matter where they live in the province.

5 AHS Scale of Effort Largest Employer in Alberta, 5th largest in Canada
Alberta Health Service Overview

AHS Scale of Effort: Largest employer in Alberta, 5th largest in Canada; 100,000 employees; 7,000 physicians; 120,000 network IDs.

Scope of AHS-IT: 1,514 production apps (163 critical); 34 data centres; 4,721 servers (physical and virtual); 75,000 workstations; 48,000 tickets generated monthly; 550 concurrent users in the ITSM tool; 1,300 IT staff (plus outsourced partners).

Jill speaking... The Alberta Health Services merger is the largest in Canadian history. It involves an $8 billion operation with approximately 100,000 employees, along with physicians and other providers. From an information technology perspective, the AHS IT department has approximately 1,300 staff supporting about 1,500 server-based applications and 34 data centres, and troubleshooting 48,000 IT service incidents per month.

6 Context to Current Realities
Major Incident Process: Context to Current Realities

Complexities of the Electronic Health Record in Alberta; local vs provincial IT service delivery. Given the complexities of the AHS IT landscape, aging and varied technical infrastructure, and critical service requirements to support patient care... "Downtimes happen..." How do we minimize organizational and clinical impact and provide robust support when the technology fails?

Jill speaking... While healthcare organizations increasingly adopt information systems for point-of-care support, these Electronic Health Record environments have become progressively complex from a system-integration and data-integrity perspective. The building blocks of the Alberta Health Services EHR are extremely complicated in both their upstream and downstream components, including registries, repositories, patient care systems, interfaces, and how information is accessed and viewed across the enterprise through the Alberta Netcare Portal. When AHS formed in 2008, twelve IT departments needed to be consolidated. We have spent a lot of effort forming, storming and norming as we aim to move away from local IT solutions toward provincial service delivery models. There is added complexity from aging and varied technical infrastructure, limited funds and increased project demands, while at the same time we must ensure that the critical IT services supporting patient care are supportable, sustainable and scalable. While we have made some headway, there is still a lot of work ahead of us. All of us in this room know that "Downtimes happen..." and while an outage may not last long, in the eyes of our customers the measure of the IT department can be based on their most recent downtime experience.

7 Super Bowl 2013 – infamous power outage
Major Incident Process: Super Bowl 2013 - infamous power outage

Wendy speaking... How many of you watched Super Bowl 2013? Over 100 million people worldwide tuned in to watch this year's Super Bowl XLVII. It could be argued that it was the most viewed, and most infamous, power outage. There are varied accounts of why it happened, from a power line issue, to a software glitch, to a hardware issue. It felt like forever, but the actual electricity outage lasted 33 minutes. When outages happen in a healthcare environment they are also critical, as they can impact patient safety, and these types of outages put us in the news as well.

SHAW Communications Outage: In July 2012, Calgary experienced a huge outage with the Shaw Communications building electrical fire, which impacted radio stations, internet service, 911 service in the core, cable TV service, much of city hall's phone system and all AHS applications hosted at that site (including email and patient care systems, mainly for the Calgary zone). It was caused by an electrical fire from a transformer explosion; there was power redundancy through dual generators, but unfortunately the sprinkler system caused those to shut down as well. It took 2.5 days to bring the AHS systems back up, as water in the building and data centre was an issue. It then took AHS an additional 3 days to enter the paper information used during the outage and to manage data reconciliation efforts. In January 2013, Shaw disclosed that it received $5 million from its insurance company to help repair the damage. Moreover, the company said it was facing additional costs of $6 million for ongoing restoration, totalling $11 million.

8 What is a Major Incident (MI)?
Major Incident Process What is a Major Incident (MI)? IT has a provincial Incident Management Process to manage all Incidents. When an Incident is of a certain scale, scope, or impact, a “Major” Incident is launched. The goal of the Incident process is to return an IT Service to operational status. Throughout AHS-IT, we employ this common process to ensure that major IT service issues are quickly identified and appropriately responded to. The purpose of the MI process is to supplement the Incident process with additional resources, escalation, communication and record keeping. Wendy speaking... As most systems were and are still structured geographically, this multi-merger added complexity to the management of both scheduled and unscheduled downtimes. Arriving at a common definition for an IT Major Incident and creating a single process for escalation, communication and documentation in a distributed support environment was critical. A key aspect has been developing close partnerships with all levels of clinical operations to assess and minimize impact to patient care. Leveraging ITIL best practices of Incident, Change, Configuration, Knowledge and Problem Management processes was vital for success.

9 Is this a “Critical” Incident?
Major Incident Process: Is this a "Critical" Incident?

Urgency and Impact must both be High to create a critical incident. Critical incidents must be escalated to the IROC immediately. Critical incidents are: a major outage affecting a large number of customers; an essential service and/or a business unit where there is no available resolution or workaround to provide a return to business operations. Must also consider: patient safety may be at risk, or effectiveness of patient care reduced; the safety of AHS staff and personnel; impact to the confidentiality or reliability of data; degradation of a service, including data, applications or infrastructure; a senior administrator from the business is requesting that a Major Incident be declared (requires immediate escalation to the IROC).

Wendy speaking... How do we identify IT Major Incidents? All critical priority incidents are Major Incidents (MI) - and no, this is not a Myocardial Infarction (heart attack), though I am surprised we have not given our CIO one yet. The business, a senior leader, an Incident Response On Call (a Director) or a Service Owner can request that the MI process be invoked. In any case, the incident should immediately be escalated to the IROC to initiate the Major Incident process. Incident Management activities that drive resolution of the incident should begin and continue while the MI process is initiated. Critical incidents are: a major outage affecting a large number of customers - like the Shaw Court fire; an essential service and/or a business unit where there is no available resolution or workaround to provide a return to business operations - like a network outage affecting DI PACS, which would impact patient care. Our IROCs are at the Director level, as it takes operational experience and being well seasoned to make the judgement call of when to launch an MI.
Additionally, each Service/Application should have a clear understanding of business expectations that identifies when a Service or Application qualifies for an MI, and clearly identifies which support groups need to be contacted in the event that an MI occurs. Like medicine, determining whether an incident meets the criteria for an MI is more of an art form than a science.

10 Is this a “Critical” Incident? Urgency
Major Incident Process Is this a “Critical” Incident? Urgency Wendy speaking... How do we establish the priority of an incident – is this a critical incident? Priority is a combination of impact and urgency. The combination of a High Impact and a High Urgency create a Critical Priority Incident. High Urgency is described as … Read description

11 Is this a “Critical” Incident? Impact
Major Incident Process Is this a “Critical” Incident? Impact Wendy speaking... Read description Who is affected etc.

12 Priority Major Incident Process Wendy speaking...
Once the assessment is completed and determines that high impact plus high urgency equals a critical priority, the Major Incident process must be initiated. IT staff must launch the MI process as soon as the critical incident is logged.
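The impact/urgency combination described above can be sketched as a simple lookup. This is an illustrative sketch only, not the logic of the AHS ITSM tool: the slides state only that High impact plus High urgency equals Critical priority, so the other matrix entries and the level names here are assumptions.

```python
# Illustrative sketch of the priority assessment from slides 10-12.
# Only the (High, High) -> Critical rule comes from the presentation;
# the remaining matrix entries are assumed for illustration.

PRIORITY_MATRIX = {
    ("High", "High"): "Critical",    # must trigger the Major Incident process
    ("High", "Medium"): "High",
    ("Medium", "High"): "High",
    ("Medium", "Medium"): "Medium",
    ("Low", "Low"): "Low",
}

def incident_priority(impact: str, urgency: str) -> str:
    """Return the priority for an incident given its impact and urgency."""
    return PRIORITY_MATRIX.get((impact, urgency), "Medium")

def requires_major_incident(impact: str, urgency: str) -> bool:
    """Critical incidents must be escalated to the IROC immediately."""
    return incident_priority(impact, urgency) == "Critical"

print(incident_priority("High", "High"))        # -> Critical
print(requires_major_incident("High", "High"))  # -> True
```

Encoding the matrix as data rather than nested conditionals mirrors how most ITSM tools configure priority: the mapping can be changed without touching the escalation logic.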

13 Major Incidents by Month
Major Incident Process: Major Incidents by Month

Wendy speaking... How many of these do we manage? The data is not encouraging. The large spike at the end of 2011 coincides with the CIO directive to ensure all Major Incidents are reported; the numbers since then reflect this change in the reporting of MIs. In 2013, all incidents with a critical priority must be immediately escalated as an MI. The MI process is more of an organizational change management effort than an ITIL process, as we are trying to create a culture where it is better to declare an MI and stand it down than to not declare one and delay the communication of the issue or its resolution.

14 IT Major Incident Roles
An IT service Incident is typically managed by the IT Service Desk and/or a specific IT Service team. When an MI is initiated, some additional resources brought in include: IT Incident Response On Call (IROC) This is a group of IT Directors who share an On Call responsibility for MI’s. Once contacted, the IROC is responsible for managing the MI Process so the Service Desk and Service team can concentrate on resolving the Incident. IT Security & Compliance On Call On Call IT Security staff to respond to MI’s with a security component. IT Senior Leader On Call This group of IT senior leaders is available to provide additional guidance and authority if/as required by the particular MI. Problem Manager Chair and facilitate communication bridge meetings. Notify IT staff of updates. Wendy speaking... Now we’re going to expand on some of the additional roles and processes that occur in Major Incidents.

15 Major Incident Roles

Clinical Roles: Not all MIs require the engagement of clinical experts, but when required these roles provide context on clinical impact and urgency. Clinical Informatics: a group of physicians and non-physicians. Clinical Operations Administrator On-call: on-call AHS leaders, including Executive Directors and Site Administrators; may provide front-line resources to support downtime and reconciliation efforts. Senior Leadership On-call: AHS senior leaders, including Facility Medical Directors and VPs. Health Information Management: health record management experts with data and record integrity expertise. Zonal Emergency Operations Centres (ZEOCs): tied into Emergency Preparedness.

Jill speaking... In order to understand what an incident or technical failure means to front-line clinicians, the potential risk to patient care, and what messages need to be delivered ASAP, we have a number of clinical people actively participating in the MI process. This ranges from physicians to nurses to health information management experts. We also link into the clinical administrative leadership, accessing their existing on-call processes to notify them of MIs and to determine if and when staffing augmentation is required. And finally, when all hell breaks loose, we call on the disaster preparedness processes and leverage the Zone Emergency Operations Centres.

16 Bridge Types (conference calls)

Communication Approach: Bridge Types (conference calls)

Technical Bridge: part of the Incident Management process and initiated independently of the MI process; opened when collaboration by several parties is required during incident resolution activities. Communications Bridge: launched by the IROC to bring the right stakeholders together to identify the problem and direct its resolution; the Problem Manager assists by recording chronology, participants, decisions and results; directs communications within IT and to the user community. Clinical Bridge: usually chaired by a Clinical Informatics physician.

Wendy speaking... The Technical Bridge is important to the resolution of Major Incidents. It is part of the Incident Management process and is initiated independently of the Major Incident process. The Technical Bridge should be opened as soon as it is required. The Communications Bridge is a call led by the IROC and supported by the Problem Manager, between technical specialists, vendors, IT leads, business representatives and other people integral to identifying the problem and directing its resolution. The purpose is to identify the impact, the action items to resolution, the key players, the communication strategy and timelines. The Major Incident Communications Bridge will also direct communications within IT, and to the user community as necessary, during the resolution process. Finally, it supplements the Incident process with additional resources, escalation, communication and record keeping. Note: Depending on the complexity and impact of the MI, separate bridge calls are kicked off to address the technical, clinical and communication aspects of the MI. The challenge is to ensure it is clear who the liaison between the bridges is, so that updates flow between them.

17 MI Heads Up Notification
Communication Approach: MI Heads Up Notification

Wendy speaking... This is a sample notification sent to the Major Incident notification mailing list. We ask that all teams involved in some aspect of supporting the service in question call into the bridge at the designated time to get, and provide, more information. The first bridge call is considered an all-hands-on-deck situation; subsequent bridge calls should involve only those who have an interest in the resolution of the incident. The notification will also include Technical and Clinical bridge numbers if they are available or required.

18 Communications to Customers
Communication Approach: Communications to Customers

An "IT - Service Issue Information" message may be sent to users of IT services in relation to unexpected/unplanned service issues. Say who this information is intended for / pertains to. Speak in terms the customer will understand. Briefly and directly tell users what is happening and what impact they will experience. Note that IT teams are working to resolve the issue and restore services. Acknowledge the issue/inconvenience and provide contact information for the relevant zone/FHE Service Desk. If appropriate, state that an update will be provided within a specific timeframe. (Template instructions: Replace all text in this section with pertinent information. Review the notifications guide if there are questions on when to use this format.) Impact Summary: clearly state what, from the user's perspective, is not working, and set out the specific locations affected by this service issue. NOTE: include any exclusions or caveats to what you've stated above regarding this service issue.

Wendy speaking... This is the format designed for communication to customers in the event of a service outage. During the MI call there should be a discussion of whether an announcement to customers is needed, who should draft the text, and who should send it. In the event of a large-scale outage, Corporate Communications will take on the role of communicating with customers, and they will use their own templates.
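The message format described above can be sketched as a small template builder. This is a hypothetical illustration of assembling the stated fields (audience, impact, locations, contact, update window); the field names, wording and sample values are assumptions, not the actual AHS template from the notifications guide.

```python
# Hypothetical sketch of assembling an "IT - Service Issue Information"
# message with the fields described on slide 18. Field names, wording
# and the sample values are assumptions for illustration only.

def service_issue_message(audience, impact, locations, contact, update_in=None):
    """Build a plain-text customer notification in the described format."""
    lines = [
        "IT - Service Issue Information",
        f"Intended for: {audience}",
        "",
        "Impact Summary:",
        f"  What is not working: {impact}",
        f"  Locations affected: {locations}",
        "",
        "IT teams are working to resolve the issue and restore services.",
        f"We apologize for the inconvenience. Contact: {contact}",
    ]
    if update_in:  # only promise an update when a timeframe was agreed on the MI call
        lines.append(f"An update will be provided within {update_in}.")
    return "\n".join(lines)

print(service_issue_message(
    audience="Calgary zone clinical staff",       # hypothetical values
    impact="Lab results are not viewable in the clinical portal",
    locations="Calgary zone acute care sites",
    contact="Calgary Service Desk",
    update_in="60 minutes",
))
```

Making the update window optional reflects the guidance above: state an update timeframe only "if appropriate".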

19 MI Root Cause Code Definitions
Communication Approach: MI Root Cause Code Definitions

Cause Code - Summary
Application/Software Bug - The failure is caused by a problem within the packaged software itself.
Communication - The failure is caused by a missed communication.
Data - Unexpected or corrupted data elements caused the failure.
Environment - The failure is caused by an uncontrolled element of the physical world where redundancy would not have reasonably mitigated the effect.
Equipment - Failure due to age, malfunction or fault in the physical equipment where redundancy would not have reasonably mitigated the effect.
IT Third Party Vendor - Root cause lies with the vendor providing a service.
Process - A missing or undeveloped process caused the failure: there was an oversight in the process, or a branch of the process was not properly developed or was missed entirely.
Security - An IT Security failure caused the issue.
Training - The failure was caused by lack of understanding, incorrect qualification or insufficient training.
Other - A mistake was made where the existing process, if followed correctly, should have avoided the failure.
Unknown - Root cause undetermined.

Wendy speaking... Starting in January 2013, we tightened up the root cause definitions to better pinpoint areas for improvement and see trending issues. After every MI, there is a meeting with the core stakeholders to finalize the details of the MI, including the root cause and the action items to be completed so that the MI does not recur.
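The trending use case described above can be sketched as a simple tally over closed MIs coded with the cause codes from the table. The cause code names come from the table; the incident records below are invented sample data, and the record layout is an assumption for illustration.

```python
from collections import Counter

# Cause codes from the MI root cause table above.
CAUSE_CODES = {
    "Application/Software Bug", "Communication", "Data", "Environment",
    "Equipment", "IT Third Party Vendor", "Process", "Security",
    "Training", "Other", "Unknown",
}

def tally_root_causes(incidents):
    """Count closed MIs by root cause code to spot trending issues.

    Rejects unrecognized codes so the trend report stays consistent
    with the agreed definitions.
    """
    for mi in incidents:
        if mi["cause"] not in CAUSE_CODES:
            raise ValueError(f"Unrecognized cause code: {mi['cause']}")
    return Counter(mi["cause"] for mi in incidents)

# Invented sample records; real MI data would come from the ITSM tool.
sample = [
    {"id": "MI-101", "cause": "Equipment"},
    {"id": "MI-102", "cause": "Application/Software Bug"},
    {"id": "MI-103", "cause": "Equipment"},
]
print(tally_root_causes(sample).most_common(1))  # -> [('Equipment', 2)]
```

Validating codes against a fixed set is the point of "tightening up" the definitions: free-text causes cannot be trended, a closed list can.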

20 Clinical Support During a MI
Clinical Involvement: Clinical Support During a MI

Transparency. Communication: understanding and translating the clinical impact; timely and frequent "clinical speak" communication about the incident and immediate risk mitigation measures. Support: robust downtime procedures owned by clinical operations; bedside-to-boardroom engagement and support.

In his book The Quantum Age of IT, Charles Araujo (pronounced "A-row-zhu") talks about working for a health care organization that was having many MIs due to old infrastructure. The organization openly acknowledged that the problem could not be fixed overnight, which speaks to their honesty. But what they did do was assign people to call the senior executives and give them a heads-up when issues were brewing: one person, one exec. Instant communication and improved transparency. Ironically, despite the MIs, they found that their IT satisfaction scores improved.

One of our challenges has been translating an incident into what it means to clinicians or AHS staff, in language that makes sense. The point is succinct communication that focuses on the perceived area of greatest clinical risk and what staff need to do in the meantime - and don't speak techie! We have a core group of nurses and physicians who help us craft meaningful communications during an MI. I heard the other day that the rule of 7 (communicate your message 7 times in 7 different ways) has now been replaced by the rule of 14. I think that speaks for itself - we so often rely on email to get our messages out. We've provided some of the templates that we use, but we also use broadcast faxes, humans (staff rounding to units), phone calls, overhead pages, splash screens, our intranet site and so forth.

The last comment I'd like to share is about the ironic challenge we face: many organizations fight the uphill battle of CIS adoption, but then the pendulum swings.
We are now in a situation where we have clinicians who have never worked with a paper record, and those who did forget what it was like and how to revert to paper processes in emergency situations. Downtime procedures, with clinical staff owning them and embedding them into their annual 'refreshers' or 're-certification' processes, are key to minimizing disruption during a major incident. Continual effort is required to train staff on how to operate when a system is not available.

21 Clinical Involvement: More than clinical involvement... it's about relationships, partnerships and supporting safe patient care!

Jill speaking... When Wendy and I were preparing for this talk, we realized how patronizing the use of "involvement" was. Reflect on the biggest major incident we've experienced, the Shaw Communications outage: the complete shutdown of the entire AHS Calgary IT network and, along with it, the use of its applications, both corporate and clinical, and on top of that, the entire AHS system (from High Level in the north to Lethbridge in the south). The success factors in supporting that event went far beyond "clinical involvement": it was entire AHS organizational teamwork, which was possible because of relationships, partnerships and a sole focus on providing and supporting safe patient care. It was about leveraging the Zone Emergency Operations Centre - corporate, clinical and IT teams organizing and operating out of one central location, working 24/7 and dispatching people, information and support wherever they were needed. For example, our IT administrative assistants put on their running shoes and delivered lab and DI reports to patient care units. It was about hourly updates distributed to the front lines, conference calls with the entire Calgary AHS clinical administration, and 3 to 4 daily meetings at each facility with patient care managers. It wasn't about blame; it was about the organization (clinical, corporate and IT) pulling together to mitigate the patient care risk of a major, major problem.

22 Next Steps Continuous improvement per incident review
Develop service improvement plans, driven overall by business requirements. Examine different scales of MIs and their support requirements. Leverage the successes of the MI process in other risk areas. Continually examine clinical business risk tolerance/value and the architecture of information systems. Simplify: application consolidation, migration to a provincial patient care platform, and large-scale reliability/redundancy.

Wendy speaking... Work is ongoing to improve communications between IT and the clinical business, to better understand business continuity needs, and to better align with the corporate Emerging Incidents and Disaster Preparedness processes. It's like that old saying that you need to expect the unexpected. When the unexpected does arrive, you have to be prepared to come back from that downtime swiftly and with as little disruption to your business as possible. With the right technology and the right best practices in place, you can minimize the damage and decrease the chance of downtime seriously hampering your ability to do business. There is nothing like a Major Incident to get the circulation flowing... With every MI we've learned, we've absorbed, and we are committed to continually finding ways to improve... because, unfortunately, no matter how redundant the infrastructure or applications are, downtimes happen and IT will fail. What matters is what we learn from it and how we use that information to improve.

Higdon's Law: 1. Good judgement comes from bad experience. 2. Experience comes from bad judgement.

23 Comments / Questions Questions?
Insanity: Doing the same thing over and over again and expecting different results. ~ Albert Einstein
