Emergency Database Failover: Impacts & Recovery Plan


1 Emergency Database Failover: Impacts & Recovery Plan
Aaron Smallwood – ERCOT IT
Joel Mickey – ERCOT Market Operations

2 Emergency Database Failover
Summary:
  ERCOT conducted an emergency database failover on April 21, 2008, following a hardware failure.
  ERCOT performs controlled database failovers monthly, but this one was different because of the nature of the hardware failure.
    Normally, the database is ‘stopped’ at one site and then ‘started’ at the other in a controlled manner.
    In this case, the database ‘hung’: it became unresponsive, and data could not be written to or read from it.
The impacts:
  Transactions were prevented from updating downstream databases.
  The missing updates left a gap in the transactional records of the downstream databases (out of sync).
  The affected extracts for April 21 through April 30 are listed in the market notices for the incident.
  ERCOT considers this an isolated incident, not a systemic problem.

3 Recovery Plan
Goal:
  Using a restored copy of the production database, recover the transactions that are missing from downstream databases and are needed to perform price adjustment calculations.
Plan:
  Build an environment identical to the production environment.
    Servers, storage, applications
  Restore data to its pre-crash state (4/21).
    Over 20 TB of data to restore from tape (in progress)
  Using the restored environment and data, extract the transactions missing from downstream databases, then roll forward all subsequent transactions.
  ERCOT Market Operations will then review the data for reasonableness and approve it for reporting and settlement.
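As a rough illustration of the extract-and-roll-forward step in the plan above, the following Python sketch finds transactions that exist in a restored copy but not downstream, then applies them in commit order. The `Transaction` record, its fields, and both helper functions are hypothetical and chosen only for illustration; the slides do not describe ERCOT's actual schema or tooling.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Transaction:
    # Hypothetical record layout; the real schema is not given in the slides.
    txn_id: int
    committed_at: datetime
    payload: str


def find_missing(restored, downstream_ids):
    """Return transactions present in the restored copy but absent downstream,
    ordered oldest first so they can be replayed in commit order."""
    known = set(downstream_ids)
    missing = [t for t in restored if t.txn_id not in known]
    return sorted(missing, key=lambda t: (t.committed_at, t.txn_id))


def roll_forward(downstream, missing):
    """Apply the missing transactions to the downstream store in order."""
    for t in missing:
        downstream[t.txn_id] = t
    return downstream


# Example data: transactions 3 and 4 never reached the downstream store.
restored = [
    Transaction(1, datetime(2008, 4, 21, 8, 0), "a"),
    Transaction(4, datetime(2008, 4, 21, 8, 30), "d"),
    Transaction(2, datetime(2008, 4, 21, 8, 10), "b"),
    Transaction(3, datetime(2008, 4, 21, 8, 20), "c"),
]
downstream = {1: restored[0], 2: restored[2]}

missing = find_missing(restored, downstream.keys())
roll_forward(downstream, missing)
```

In this sketch the downstream store is a plain dictionary keyed by transaction ID; a real recovery would work against the databases themselves and would still need the reasonableness review described in the plan before the data is used for settlement.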

4 Questions
Actions to prevent future occurrences:
  Nodal market databases will be on newer hardware with more fault tolerance and redundancy.
  Potential re-architecture of the system integration between the databases.
  Lessons learned are being documented, but there is no plan yet.
    Resources are focused on the data recovery efforts.
Questions:
  When will non-spinning reserve price adjustments for PRR 650 be completed?
    When the transactional data has been restored, reviewed, and approved.
  What is the timeline?
    The environment build is complete; we anticipate that the data restore from tape will be the longest task.
    We are estimating weeks, not months, to complete the plan.
    Unknowns include the time needed to restore from tape and the quality of the data once it has been restored.
    Market notices will continue to be sent to indicate status.

