Emergency Database Failover: Impacts & Recovery Plan

Presentation transcript:

Emergency Database Failover: Impacts & Recovery Plan
Aaron Smallwood, ERCOT IT
Joel Mickey, ERCOT Market Operations

Emergency Database Failover

Summary:
- ERCOT conducted an emergency database failover on April 21st, 2008 following a hardware failure.
- While ERCOT does perform controlled database failovers monthly, this one was different because of the nature of the hardware failure.
  - Normally, the database is ‘stopped’ at one site and then ‘started’ at the other in a controlled manner.
  - In this case, the database ‘hung’: it became unresponsive, and data could not be written to or read from the database.

The impacts:
- Transactions were prevented from updating downstream databases.
- The lack of transaction updates in downstream databases left a gap in transactional records (out of sync).
- The affected extracts for April 21st through April 30th are listed in market notices for the incident.

ERCOT considers this to be an isolated incident and not a systemic problem.

Recovery Plan

Goal:
- Recover the transactions needed to perform price adjustment calculations, which are missing from downstream databases, from a restored copy of the production database.

Plan:
- Build an environment identical to the production environment.
  - Servers, storage, applications
- Restore data to its pre-crash state (4/21).
  - Over 20TB of data to restore from tape (in progress)
- Using the restored environment and data, extract the transactions missing from downstream databases and then roll forward all subsequent transactions.
- ERCOT Market Operations will then review the data for reasonableness and approve the data for reporting and settlement.
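
The extract-and-roll-forward step in the plan can be illustrated with a minimal sketch. It is not ERCOT's actual tooling. It assumes each transaction carries a unique ID and a commit timestamp, and that both the restored copy and a downstream database can be viewed as a mapping from ID to transaction. The names Txn, find_missing and roll_forward, and the sample records, are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Txn:
    txn_id: int             # hypothetical unique transaction key
    committed_at: datetime  # hypothetical commit timestamp
    payload: str


def find_missing(restored, downstream):
    """Return transactions in the restored copy that are absent downstream, oldest first."""
    missing = [t for t in restored.values() if t.txn_id not in downstream]
    return sorted(missing, key=lambda t: (t.committed_at, t.txn_id))


def roll_forward(downstream, txns):
    """Apply transactions in commit order; IDs already present are skipped."""
    applied = []
    for t in sorted(txns, key=lambda t: (t.committed_at, t.txn_id)):
        if t.txn_id not in downstream:
            downstream[t.txn_id] = t
            applied.append(t.txn_id)
    return applied


# Made-up sample data: three transactions in the restored copy,
# only the first of which reached the downstream database.
restored = {
    1: Txn(1, datetime(2008, 4, 21, 9, 0), "a"),
    2: Txn(2, datetime(2008, 4, 21, 9, 5), "b"),
    3: Txn(3, datetime(2008, 4, 21, 9, 10), "c"),
}
downstream = {1: restored[1]}

gap = find_missing(restored, downstream)
applied = roll_forward(downstream, gap)
```

Skipping IDs that are already present makes the roll forward safe to repeat, which matters if a restore from tape has to be re-run. The review and approval by Market Operations remains a separate, manual step after this.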

Questions

Actions to prevent future occurrences:
- Nodal market databases will be on newer hardware with more fault tolerance and redundancy.
- Potential re-architecture of system integration between the databases.
- Lessons learned are being documented, but there is no plan yet.
- Resources are focused on the data recovery efforts.

Questions:
- When will non-spinning reserve price adjustments for PRR 650 be completed?
  - When the transactional data has been restored, reviewed, and approved.
- What is the timeline?
  - The environment build is complete; we anticipate the data restore from tape to be the task that takes the longest.
  - We are estimating weeks, not months, to complete the plan.
  - Unknowns include the amount of time needed to restore from tape and the quality of the data once it’s been restored.
  - Market notices will continue to be sent to indicate status.