ATLAS Software Installation redundancy, Alessandro De Salvo

1 ATLAS Software Installation redundancy
Alessandro De Salvo
Outline
- Services hosted in Rome
- Redundancy of the Installation System and the other services
- Current situation and plans
A. De Salvo – Oct

2 The power of nature
- On Oct 20, 2011, Rome was flooded by an unexpected amount of rain: 127 mm in about 3 hours
- The INFN-ROMA1 site had to be switched off after the water reached the servers
  - As you might know, computers cannot swim easily!
- Tons of water in the computing room, pumped out in about 12 hours

3 Services hosted in Rome
- Installation System
  - Two databases (rw, ro) and the installation agents (EGEE, OSG, CVMFS)
  - Redundant services, but all hosted at the same site
- Global KitValidation (GKV) portal and main KV cache
  - KV cache mirrored at CERN
- Installation tools cache
  - Hosted in the KV cache
- Release validation portal
- All of these services stopped working on Oct 20 and were restored 5 days later
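The key point of this slide is that service-level redundancy did not help, because every copy lived at INFN-ROMA1. A minimal sketch of that check, assuming a hypothetical inventory that maps each service to the sites holding a working copy (the service names are illustrative labels, not the real ones): a service is lost when all of its copies sit at the failed site.

```python
# Hypothetical inventory: service -> sites holding a working copy.
# Pre-flood layout as described on this slide: everything at INFN-ROMA1,
# only the KV cache also mirrored at CERN (the installation tools were not).
SERVICES = {
    "installation-db-rw": ["INFN-ROMA1"],
    "installation-db-ro": ["INFN-ROMA1"],
    "installation-agents": ["INFN-ROMA1"],
    "gkv-portal": ["INFN-ROMA1"],
    "kv-cache": ["INFN-ROMA1", "CERN"],
    "installation-tools-cache": ["INFN-ROMA1"],
    "release-validation-portal": ["INFN-ROMA1"],
}


def lost_services(services, failed_site):
    """Return, sorted, the services that have no copy outside failed_site."""
    return sorted(
        name for name, sites in services.items()
        if all(site == failed_site for site in sites)
    )


if __name__ == "__main__":
    print(lost_services(SERVICES, "INFN-ROMA1"))
```

With this inventory, losing INFN-ROMA1 takes out everything except the KV cache, which survives through its CERN mirror.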

4 Temporary solutions
- A toy installation system (LJSFlite), rewritten from scratch in ~8 hours
  - 3 analysis caches, 1 base release and 1 patch deployed with LJSFlite while the main system was down
  - > 500 validations
  - Compatible with the main system
  - Using KV from the CERN mirror (no GKV)
- Missing services
  - GKV
  - Release DB
  - Main installation system
  - Installation tools (compilers, global patches)
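While the main system was down, KV was taken from the CERN mirror. A minimal sketch of that kind of mirror fallback, assuming the caches are reachable over HTTP; the base URLs and the `fetch_with_fallback` helper are hypothetical, and the fetch function is passed in so the logic can be exercised without a network.

```python
import urllib.request
from typing import Callable, Sequence, Tuple

# Hypothetical base URLs: main KV cache first, CERN mirror second.
KV_SOURCES = [
    "https://kv-main.example.org/kv",
    "https://kv-mirror.example.org/kv",
]


class AllSourcesFailed(Exception):
    """Raised when none of the sources could serve the file."""


def http_fetch(url: str) -> bytes:
    """Download url; network failures surface as OSError subclasses (URLError, timeouts)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def fetch_with_fallback(path: str, sources: Sequence[str],
                        fetch: Callable[[str], bytes] = http_fetch) -> Tuple[str, bytes]:
    """Try each source in order; return (source, payload) from the first that answers."""
    errors = []
    for base in sources:
        url = f"{base.rstrip('/')}/{path.lstrip('/')}"
        try:
            return base, fetch(url)
        except OSError as exc:
            # Remember why this source failed, then move on to the next one.
            errors.append(f"{url}: {exc}")
    raise AllSourcesFailed("; ".join(errors))
```

Keeping the source order explicit means the main cache is always preferred as soon as it comes back, with no extra state to reset.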

5 Full redundancy solutions (in progress)
- The installation system already supports native redundancy
  - Multiple agents, which can be located at different sites
  - > 500 validations
  - DB replicas: 1 rw replica, multiple ro replicas
  - Logfile access facility: GlusterFS geo-replication
- Experimenting with a WAN automatic failover system
  - Ring replication between N DB replicas (multi-master)
  - 1 rw replica, 3 ro replicas
  - Main rw replica and 1 ro replica in Roma, 1 ro replica in Napoli
  - Ready to test the automatic ro -> rw switching of the active replica, driven by a watchdog
  - Testing the global failover domain, pointing to the active replicas, using the INFN HA DNS
[Diagram: rw DB, ro DB, ro DB]
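The failover logic on this slide can be modelled in a few lines. This is a sketch under stated assumptions: the slide does not say which database engine, watchdog or DNS mechanism is used, so the code only simulates the decisions (who replicates from whom in the ring, and which replica the watchdog promotes from ro to rw). Replica names are hypothetical; the returned rw name is what a failover domain such as the INFN HA DNS would be pointed at.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Replica:
    name: str
    site: str
    role: str = "ro"      # "rw" or "ro"
    alive: bool = True


def ring_topology(replicas: List[Replica]) -> Dict[str, str]:
    """Multi-master ring over the live replicas: each one pulls from its predecessor."""
    live = [r for r in replicas if r.alive]
    if len(live) < 2:
        return {}  # a single replica has no one to replicate from
    # live[i - 1] wraps around to the last replica when i == 0, closing the ring.
    return {live[i].name: live[i - 1].name for i in range(len(live))}


def watchdog_step(replicas: List[Replica]) -> Optional[str]:
    """One watchdog pass; returns the name of the active rw replica, or None."""
    current = next((r for r in replicas if r.role == "rw"), None)
    if current is not None and current.alive:
        return current.name          # rw replica is healthy, nothing to do
    if current is not None:
        current.role = "ro"          # demote the dead master
    for r in replicas:
        if r.alive:
            r.role = "rw"            # promote the first live replica: ro -> rw
            return r.name
    return None                      # no replica left alive
```

Promoting the first live replica in list order is the simplest possible policy. A real watchdog would also have to make sure the candidate has applied every change it received through the ring before accepting writes, otherwise updates made on the failed master could be lost.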

6 Full redundancy solutions [2]
- GKV and release databases can be hosted in the same Installation System replicas
  - Release DB already hosted in the main Installation System DB space
  - GKV can be added
- Installation tools will be mirrored at CERN
  - Simple synchronization
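For the "simple synchronization" of the installation tools to the CERN mirror, a one-way copy of new or changed files is enough. A minimal sketch, assuming both trees are visible as local paths (the directory names and the `sync_tree` helper are hypothetical); in practice a tool such as rsync would usually do this job.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def _sha256(path: Path) -> str:
    """Content hash of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def sync_tree(src: Path, dst: Path) -> List[str]:
    """One-way sync: copy new or changed files from src to dst.

    Returns the relative paths that were copied. Files that exist only
    in dst are left untouched.
    """
    copied = []
    for f in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = f.relative_to(src)
        target = dst / rel
        if target.is_file() and _sha256(target) == _sha256(f):
            continue  # already up to date on the mirror
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(f, target)  # copy2 also preserves timestamps
        copied.append(rel.as_posix())
    return copied
```

Comparing content hashes rather than timestamps keeps the check correct even if clocks differ between sites, at the cost of reading every file on each pass.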

7 Current situation and plans
- Installation System replicas
  - Main DB instance in Roma, working backup in Napoli
  - At least a third replica will be added at CERN
  - Every replica is fully functional: it uses the local replica to show the ro status and the current rw replica for actions
  - The installation system can now be accessed via the HA domain (experimental)
- GKV replica
  - Can easily be added to the Installation System replicas once the filesystem geo-replication is in place
  - Geo-replication is being tested now; it requires an upgrade of the main DB machine, to be done by the end of this month
- KV & Installation Tools
  - Partially done, to be completed in the next few days
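The access pattern described on this slide (status pages from the local replica, actions on the current rw replica) amounts to a read/write split. A minimal sketch, assuming a hypothetical `Router` object per front-end whose `active_rw` field is kept up to date, for example from the HA domain; replica names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Router:
    local: str       # replica at the same site as this front-end (hypothetical name)
    active_rw: str   # current rw replica, refreshed after each failover

    def route(self, request_kind: str) -> str:
        """Status reads go to the local replica; actions go to the rw replica."""
        if request_kind == "status":
            return self.local
        if request_kind == "action":
            return self.active_rw
        raise ValueError(f"unknown request kind: {request_kind!r}")
```

If replication to the ro replicas is asynchronous, a status page served locally may briefly lag behind an action that was just executed on the rw replica.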
