1 TDTWG Report to RMS SCR 745 ERCOT Unplanned System Outages Wednesday, July 13th.


1 1 TDTWG Report to RMS SCR 745 ERCOT Unplanned System Outages Wednesday, July 13th

2 2 Motion SCR745 includes: (1.) a system evaluation and (2.) a recommended solution based on a review of the evaluation. SCR745 will be sent to the TAC and Board for consideration and possible approval.

3 3 SCR 745 Analysis Approach
SCR 745 requested that ERCOT perform an in-depth analysis to determine the root causes of unplanned system outages. ERCOT's in-depth analysis indicates that the current architecture supporting the Retail Market contains multiple single points of failure. While it is not possible to eliminate every possibility of an ERCOT system outage, it is possible to implement solutions that drastically reduce unplanned system outages by removing these single points of failure. This presentation includes the solutions identified.
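To illustrate why removing single points of failure matters, the sketch below compares a chain of components that must all be up against a redundant pair. The 0.99 availability figures are hypothetical placeholders, not ERCOT measurements, and the formulas assume component failures are independent, which real systems may not satisfy.

```python
from math import prod


def series_availability(component_availabilities):
    """Availability of a chain in which every component must be up.

    Each component is a single point of failure, so availabilities multiply.
    """
    return prod(component_availabilities)


def redundant_availability(availability, copies):
    """Availability of `copies` redundant components where any one suffices.

    Assumes failures are independent (an assumption, not a source claim).
    """
    return 1.0 - (1.0 - availability) ** copies


if __name__ == "__main__":
    # Hypothetical: four retail components in a chain, each 99% available.
    chain = [0.99, 0.99, 0.99, 0.99]
    print(f"Chain of four:  {series_availability(chain):.4f}")
    # Hypothetical: one of those components duplicated.
    print(f"Redundant pair: {redundant_availability(0.99, 2):.4f}")
```

Under these assumed numbers the chain is noticeably less available than any one of its parts, while duplicating a component raises that component's availability well above its standalone figure.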

4 4 Retail Systems (diagram components): NAESB; PaperFree; TCH-EAI (Transaction Clearing House); All Retail (Database Server); Market Participant

5 5 Options Included
The following options are being presented to assist RMS in reviewing and eventually approving the best solutions for resolving unplanned ERCOT system outages. The 4 options include:
–1 of 2 options for NAESB Proxy Server improvements
–1 of 3 options for NAESB Application (dependent on NAESB Proxy Server option)
–1 of 2 options for PaperFree improvements
–1 of 3 options for Database Server for All Retail System

6 6 Current NAESB Architecture
The Retail Transaction communication system uses the North American Energy Standards Board Electronic Delivery Mechanism (NAESB EDM) V 1.6. This is an Internet-based protocol. The current NAESB architecture includes 2 NAESB Proxy servers in Taylor and 2 NAESB Proxy servers in Austin (to be used for disaster recovery only). Given the large quantity of data and the critical timing of that data, the current NAESB architecture is insufficient to support the Texas Retail Market.

7 7 NAESB Proxy Server Options
Option 1 – Fully Clustered* V880 Solution – 4 V880 NAESB Proxy Servers
Summary – Maximum reliability solution.
–This option will provide a fully clustered and fault-tolerant solution, with the opportunity to consolidate the current 18 production proxy servers, including the servers identified in Option 2.
–This option virtually eliminates the potential for NAESB proxy outages, unplanned or planned.
–This option will provide 99.99% availability for the NAESB proxy servers.
*Cluster: A group of servers that are typically on different physical machines and have the same applications configured within them, but operate as a single logical server.
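As a rough illustration of what the 99.99% availability figure implies, the following sketch converts an availability percentage into a yearly downtime budget. It assumes a 365-day year and that availability is measured over all hours, including planned maintenance; the presentation does not state how the figure is measured.

```python
# Convert an availability percentage into the maximum downtime it permits.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the downtime budget, in minutes per year, for a given availability."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    # 99.99% is the figure quoted for Option 1.
    print(f"{allowed_downtime_minutes(99.99):.2f} minutes per year")
```

Under these assumptions, 99.99% availability leaves a budget of a little under one hour of total proxy downtime per year.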

8 8 NAESB Proxy Server Options
Option 2 – 4 V120 NAESB Proxy Servers
Summary – Minimum reliability solution.
–This option will provide redundancy to address the single point of failure. Two servers will be located in Taylor and two servers will be located in Austin.
–This will not be a clustered solution; it will be a load-balanced solution, because V120 servers cannot cluster.
–This solution will reduce the frequency and duration of proxy outages. It is less costly than Option 1 but also less robust.

9 9 NAESB Application Options
Option 3 – Separate Application Server Cluster
This option moves peripheral NAESB processes (data encryption, decryption) to the PaperFree cluster and separates inbound and outbound transmissions to disconnected clusters.

10 10 NAESB Application Options
Option 4 – Hybrid Application Cluster
This option creates an application cluster for inbound transactions and moves outbound transaction processing to the PaperFree system in order to utilize PaperFree’s load balancing and high availability capabilities.

11 11 NAESB Application Options
Option 5 – Combined Application Cluster
This option combines inbound and outbound transaction processing into a single application cluster.

12 12 Summary of NAESB Application Cost
Option 1 – V880 Server Cluster: $370,000
Option 2 – V120 Server Redundancy: $97,000
Option 3 – Separate Application Server Cluster: $175,000
Option 4 – Hybrid Application Cluster: $165,000
Option 5 – Combined Application Cluster: $235,000
Must choose one selection of Option 1 or Option 2 and one selection of Option 3, Option 4 or Option 5.
An additional cost of $66,105 is identified for Training, Business Process and Monitoring.
Blue highlighting identifies recommended solution
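The selection rule above (one proxy option plus one application option) can be tabulated with a short script. This is a sketch under two assumptions that the slides do not confirm: that any proxy option may be paired with any application option (slide 5 notes the application choice depends on the proxy choice, so some pairings may not be valid), and that the $66,105 additional cost applies once to every combination.

```python
from itertools import product

# Costs as listed on the slide, in US dollars.
PROXY_OPTIONS = {
    "Option 1 (V880 Server Cluster)": 370_000,
    "Option 2 (V120 Server Redundancy)": 97_000,
}
APPLICATION_OPTIONS = {
    "Option 3 (Separate Application Server Cluster)": 175_000,
    "Option 4 (Hybrid Application Cluster)": 165_000,
    "Option 5 (Combined Application Cluster)": 235_000,
}
# Training, Business Process and Monitoring.
# Assumption: charged once per combination.
ADDITIONAL_COST = 66_105


def combination_totals(include_additional: bool = True) -> dict:
    """Return the total cost for every proxy/application pairing."""
    extra = ADDITIONAL_COST if include_additional else 0
    totals = {}
    for (proxy, proxy_cost), (app, app_cost) in product(
        PROXY_OPTIONS.items(), APPLICATION_OPTIONS.items()
    ):
        totals[(proxy, app)] = proxy_cost + app_cost + extra
    return totals


if __name__ == "__main__":
    # Print pairings from cheapest to most expensive.
    for (proxy, app), total in sorted(
        combination_totals().items(), key=lambda item: item[1]
    ):
        print(f"${total:>9,}  {proxy} + {app}")
```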

13 13 PaperFree
PaperFree includes the data validation and transformation system. The current architecture contains a single disk share for multiple load-balanced application servers. This disk is the single point of failure for this system.

14 14 PaperFree Options Option 1 – Clustered File System Server solution This option represents the maximum availability solution.

15 15 PaperFree Options
Option 2 – Local File System Solution
–This option supports the load balancing applications.
–The system will still be active with a single server failure; however, server interruptions may result in delays in processing persistent data for the server experiencing an interruption.

16 16 Summary of PaperFree Costs
Option 1 – Clustered File System Server solution: $75,000
Option 2 – Local File System Solution: $105,000
Blue highlighting identifies recommended solution

17 17 All Retail System

18 18 All Retail System
The All Retail System is the database server that houses each system’s database (NAESB, PaperFree, Siebel and TCH-EAI). This database server is a single point of failure for multiple Retail Systems.
All Retail System Goal: Provide high availability for all databases that support the Retail Applications, including NAESB, PaperFree, Siebel and TCH-EAI. This will allow processing of data to continue in the event of a database server failure.

19 19 Database Server High Availability Options
Option 1 – All HP-UX Oracle Real Application Cluster (RAC)
Option 2 – All Linux Oracle Real Application Cluster (RAC)
For Options 1 and 2:
–Provides active redundancy for database connectivity for all retail databases
–Complex to implement
–Removes single point of failure at the database server level

20 20 Database Server High Availability Options
Option 3:
–NAESB Linux Oracle RAC and a different standby/cluster solution for the rest of the Retail databases
  Provides active redundancy for database connectivity for the NAESB database
  Less complex to implement, as the NAESB database is small and easier to migrate
  Provides the option to migrate PaperFree and Siebel into this RAC
  Removes single point of failure at the database server level
–Veritas cluster, Oracle Standby, or Oracle RAC for the other databases on HP-UX or Linux, according to their availability requirements
  Phased implementation: NAESB first and other databases next
  Removes single point of failure at the database server level

21 21 Database Server High Availability Options
Summary
–All three options provide the highest availability architecture for the NAESB database.
–Options 1 and 2 provide the highest availability architecture for all databases; however, they are the most expensive and the most complex to implement and manage.
–Option 3 provides the highest availability option for the NAESB database and will provide appropriate high availability solutions for the rest of the retail databases in subsequent phases. It is easier to implement in a phased manner, addressing acute availability needs first.

22 22 Summary of Database Server High Availability Costs
Cost
–Options 1 & 2: Oracle RAC
  Hardware: $450,000
  Cluster SW: $400,000
  Oracle RAC SW: $400,000
  Cluster Ext Service: $100,000
  Oracle RAC Ext Service: $100,000
  Internal project cost (FTE): $180,000
  Total: $1,630,000
–Option 3: Partial Oracle RAC + Alternate Solution for remaining
  Hardware: $400,000 - $600,000
  Cluster SW: $100,000 - $400,000
  Oracle RAC SW: $0 - $400,000
  Cluster Ext Service: $0 - $120,000
  Oracle RAC Ext Service: $120,000 - $180,000
  Internal project cost (FTE): $120,000 - $180,000
  Total: $890,000 - $1,650,000
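The line items above can be cross-checked with a few lines of arithmetic. The sketch below sums the Options 1 & 2 items and computes the simple low and high bounds for Option 3 by adding the extremes of each range. The dictionary keys are shorthand for the slide's labels.

```python
# Line items for Options 1 & 2 (full Oracle RAC), in US dollars.
OPTIONS_1_AND_2 = {
    "Hardware": 450_000,
    "Cluster SW": 400_000,
    "Oracle RAC SW": 400_000,
    "Cluster Ext Service": 100_000,
    "Oracle RAC Ext Service": 100_000,
    "Internal project cost (FTE)": 180_000,
}

# Line items for Option 3, each given as a (low, high) range.
OPTION_3_RANGES = {
    "Hardware": (400_000, 600_000),
    "Cluster SW": (100_000, 400_000),
    "Oracle RAC SW": (0, 400_000),
    "Cluster Ext Service": (0, 120_000),
    "Oracle RAC Ext Service": (120_000, 180_000),
    "Internal project cost (FTE)": (120_000, 180_000),
}


def total(items: dict) -> int:
    """Sum fixed line items."""
    return sum(items.values())


def range_bounds(ranges: dict) -> tuple:
    """Sum the low ends and the high ends of each range separately."""
    low = sum(lo for lo, _ in ranges.values())
    high = sum(hi for _, hi in ranges.values())
    return low, high


if __name__ == "__main__":
    print(f"Options 1 & 2 total: ${total(OPTIONS_1_AND_2):,}")
    low, high = range_bounds(OPTION_3_RANGES)
    print(f"Option 3 simple bounds: ${low:,} - ${high:,}")
```

The Options 1 & 2 items sum to the stated $1,630,000. For Option 3, adding the extremes gives $740,000 - $1,880,000, a wider span than the stated total of $890,000 - $1,650,000. The slide does not explain the difference; one possible reading is that the line items are not independent (for example, a configuration with no Oracle RAC software cost may need more cluster software), so the extremes cannot all occur together.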

23 23 Next Steps
If recommended by RMS today, TDTWG will facilitate a technical workshop to be held before the next RMS meeting. This workshop is intended to help RMS members and interested Market Participants review the in-depth system evaluation in order to select recommended solution(s) for approval at the August RMS meeting.

24 24 Questions

