ERCOT SCR745 Update ERCOT Outage Evaluation Phase 1 and Phase 2

ERCOT SCR745 Update ERCOT Outage Evaluation Phase 1 and Phase 2 RMS April 9, 2008

PR60006_01 ERCOT Update

Background:
  SCR 745: To achieve improved Market performance and reliability through a reduction of ERCOT Retail Systems unplanned outages.
  - Achieve 99.99% availability within the Paperfree application.

This effort was planned to be implemented in two subprojects:
  PR60006_01: ERCOT Outage Evaluation Phase I and Phase II
  - Phase I: NAESB and Proxy Clustered (delivered 02/2007)
  - Phase II: Paperfree Clustered environment with File Server Redundancy
  PR60006_02: Phase III, Database Clustered environment (below PPL cut line for 2008)

Phase II Current Status:
  - 02/27/2008 – Integration, Performance/Volume and Failover Testing
  - 03/08/2008 – Production Implementation
  - 03/22/2008 – Rollback to previous Paperfree infrastructure due to performance issues
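To put the 99.99% availability goal from the background above in concrete terms, the short sketch below converts an availability percentage into an allowed-downtime budget. The function name and the 24x7 measurement window are assumptions made for illustration; the slides do not state how ERCOT defines the measurement window.

```python
def downtime_budget_minutes(availability_pct: float, period_hours: float) -> float:
    """Return the minutes of downtime permitted by an availability target.

    availability_pct: target such as 99.99 (percent of the period in service)
    period_hours: length of the measurement period in hours
    """
    unavailable_fraction = 1 - availability_pct / 100.0
    return unavailable_fraction * period_hours * 60


if __name__ == "__main__":
    # Assumption: a 24x7 service window; a 30-day month and a 365-day year.
    monthly = downtime_budget_minutes(99.99, 30 * 24)
    yearly = downtime_budget_minutes(99.99, 365 * 24)
    print(f"99.99% allows about {monthly:.2f} min/month or {yearly:.2f} min/year")
```

Under these assumptions a 99.99% target leaves only a few minutes of unplanned outage per month, which is why a single file server failure is enough to miss it.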

PR60006_01 ERCOT Update - Continued

Testing Results:
  - 11 High Availability / Fault Tolerance tests – complete.
    - 1 related open defect, to be addressed in future release(s).
    - Description: Node Fencing on shutdown from RSA results in application failure.
  - Steady transaction flow volume test – completed.

Despite the open defect with the PolyServe software, the advantages provided would include:
  - File Server Redundancy
    - Addresses the identified single point of failure for loss of Mapping for users and application processes.
    - Allows for maintenance capabilities without affecting all nodes in the cluster.
  - High Availability / Fault Tolerance
  - Clustered Load Balancing

PR60006_01 ERCOT Update - Continued

Description: 03/12/2008 Retail Application Outage
  Resolution: Restart processes in order
  Root Cause: Human Error (See SLA Update)

Description: 867 files not loading into L*
  Resolution: Permissions were granted
  Root Cause: Permissions issue (See SLA Update)

Description: 03/19/2008 Hard Crash of PolyServe Cluster due to SAN Switch Failure
  Resolution: Moved PolyServe cluster to different switch
  Root Cause: SAN Switch Failure caused Node Fencing: if PolyServe loses connectivity to the SAN, the cluster will lock. HP Ticket logged 12/11/2007 (see slide 3).

Description: 03/12/2008 – 03/22/2008 Paperfree Performance degradation
  Resolution:
    - 03/19/2008: Implemented SIR 11823 to add additional transaction processing enhancements.
    - 03/22/2008: Rollback to old infrastructure until performance tuning recommendations from HP can be implemented / tested.
  Root Cause: Unknown

PR60006_01 ERCOT Update - Next Steps

  - Roll iTEST back to old infrastructure of Paperfree Fan Out (Blades). Required to mitigate impact to PR60008: Ts&Cs and PUCT 33049 Performance Measures – Complete.
  - TDTWG Meeting to discuss issues – Complete.
  - Analyze performance tuning options provided by HP for feasibility.
  - Discuss plans to move forward with effort on SCR745 and re-implementation of PolyServe at ERCOT with TDTWG, May 2008.

Things to consider for future discussion:
  - PaperFree Availability Metrics (Prior to March 2008 Incidents)
    - Previous logged incident for PaperFree file server – 02/2007.
    - 02/2008 – 100% availability (meeting SCR Goal).
  - 2007 Intermediate Resolutions
    - Code Changes
      - File Management (Copy / Move / Delete) Retry
      - Re-Map drives before processing vs. application startup
    - Hardware Replacement
      - Implementation of 3950 (4-Way) server for file server
    - Increased Training
    - Increased Monitoring
  - Future discussion at TDTWG: Do the 2007 Intermediate Resolutions meet the objective of the SCR745 Phase II Goals?
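The "File Management (Copy / Move / Delete) Retry" code change listed above is not described in any detail in the slides. As a rough illustration only, the sketch below shows one common way to retry a file operation that fails transiently, for example when a mapped drive is briefly unavailable. The helper name with_retry, the attempt count, and the delay are all hypothetical and are not taken from ERCOT's implementation.

```python
import shutil
import time


def with_retry(operation, *args, attempts=3, delay_seconds=0.5):
    """Call operation(*args), retrying on OSError up to `attempts` times.

    Hypothetical helper for illustration; the last error is re-raised
    if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation(*args)
        except OSError as error:
            last_error = error
            # Wait before the next try, but not after the final failure.
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise last_error


def copy_with_retry(source_path, destination_path):
    """Copy a file (with metadata), retrying transient filesystem errors."""
    return with_retry(shutil.copy2, source_path, destination_path)
```

The same wrapper could be applied to shutil.move or os.remove. A fixed delay keeps the sketch simple; a production version would likely log each failure and cap the total wait.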

PR60006_02 ERCOT Update

PR60006_02: Phase III, Database Clustered environment
  - Recommendation from ERCOT to TDTWG to cancel this project – resolved with AIX deployment
  - Last incident logged – 01/05/2008
  - 02/2008 – 100% availability