GOCDB failover status and plans COD-19, 01/04/2009 G.Mathieu, A.Cavalli, C.Peter, P.Sologna.


1 GOCDB failover status and plans COD-19, 01/04/2009 G.Mathieu, A.Cavalli, C.Peter, P.Sologna

2 Assessment and progress
Last week's outage at RAL
– a good (!) use case for testing our procedures and listing improvements
DNS aspect
– new DNS machine at CNAF

3 Last RAL outage
Timeline
– 5:20 UTC - power glitch at RAL
– 8:00 - start of failover process
– 9:20 - DNS switch complete
– 10:00 - failover working properly
– 13:25 - reverse DNS switch
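The intervals implied by this timeline can be worked out with a short calculation. The Python sketch below assumes that all five times fall on the same day and are in UTC (the slide marks only the first one as UTC). The function names and shortened labels are illustrative and not part of the original material.

```python
from datetime import datetime

# Times as given on the slide; assumed to be same-day UTC.
TIMELINE = [
    ("5:20", "power glitch at RAL"),
    ("8:00", "start of failover process"),
    ("9:20", "DNS switch complete"),
    ("10:00", "failover working properly"),
    ("13:25", "reverse DNS switch"),
]


def minutes_between(start: str, end: str) -> int:
    """Return the whole minutes elapsed from start to end (both 'H:MM')."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def elapsed_table(timeline):
    """For each event, compute minutes since the previous event and since the first."""
    first = timeline[0][0]
    previous = first
    rows = []
    for time, label in timeline:
        rows.append((time, label,
                     minutes_between(previous, time),
                     minutes_between(first, time)))
        previous = time
    return rows


if __name__ == "__main__":
    for time, label, step, total in elapsed_table(TIMELINE):
        print(f"{time:>5}  +{step:>3} min  (t+{total:>3} min)  {label}")
```

Under these assumptions, the failover started 160 minutes after the power glitch, and 120 minutes passed between the start of the failover process and the point where it was working properly.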

4 Post mortem
Good things
– Failover worked
– DNS swap quick, efficient and transparent
– Good synchronisation
– CNAF IRC channel was useful
Encountered problems
– Problems with CNAF DB schema
– DB connection from ITWM to RAL
– SSL issues
– The complete swap took a rather long time overall (2h)

5 Proposed improvements (1)
Improve manual process
– Reduce the number of people needed. We need to allow different people to carry out the whole chain alone.
– Create scripts to reduce the number of actions
Sort out CNAF schema issue
– Improve current synchronisation mechanism
Contacts and documentation
– Keep a list of phone contacts or alternative mail addresses somewhere, to use in case the main mail system does not work
– Document all processes

6 Proposed improvements (2)
Regular tests
– Test CNAF replica DB
– ITWM web interface
– All possible scenarios
Configuration improvements
– Simplify configuration file
– Have the service itself publish the fact that it is in read-only mode
Automation
– Work with OAT monitoring group
– Automate DB switch
– Automate portal switch the same way
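One way to read the "publish read-only mode" item is that the service exposes a small status record which clients and monitoring can check before attempting writes. The Python sketch below illustrates that idea only. The field names, the `build_status` and `is_writable` helpers and the JSON format are assumptions and do not describe the actual GOCDB configuration or interface.

```python
import json


def build_status(site: str, read_only: bool) -> dict:
    """Build a hypothetical status record that the running instance would publish."""
    return {
        "service": "gocdb",
        "site": site,
        "mode": "read-only" if read_only else "read-write",
    }


def is_writable(status: dict) -> bool:
    """Treat anything other than an explicit 'read-write' mode as not writable."""
    return status.get("mode") == "read-write"


if __name__ == "__main__":
    # Example: a failover instance announcing that it only serves reads.
    status = build_status("ITWM", read_only=True)
    print(json.dumps(status, indent=2))
    print("writes allowed:", is_writable(status))
```

Treating a missing or unknown mode as not writable is a deliberately cautious choice: a client that cannot tell which mode the service is in should not attempt updates.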

7 Actions list (1)
Doc and processes
– Gilles to draft process + test documentation
– Christian to add goc@itwm tests to ITWM procedures
– All: provide contacts (phone, alternate mail, etc.)
Access to machines
– Christian to give failover team access to gocdb@itwm
– Gilles to give failover team access to gocdb@ral
– Gilles to write goc portal
Scripting
– Gilles to write scripts to change GOC portal conf
– Peter/Ale to write DNS configuration scripts

8 Actions list (2)
Improvements on CNAF-RAL DB sync
– Gilles to provide a dump to CNAF whenever the schema changes
– Peter/Ale/Gilles to study encryption solution to secure the dump
– Gilles to check the dump solution is valid
– Peter/Ale to implement new procedures
– Ale to do speed tests in different scenarios

9 Actions list (3)
Test
Test again
Re-test
– Test Test (if there is some time left)

