ATLAS Software Installation redundancy - Alessandro De Salvo

Presentation transcript:

ATLAS Software Installation redundancy
Alessandro De Salvo <Alessandro.DeSalvo@roma1.infn.it>
08-11-2011

Outline
- System hosted in Rome
- Redundancy of the Installation system and the other services
- Current situation and plans

The power of nature
- On Oct 20, 2011 Rome was flooded by an unexpected amount of rain: 127 mm of rain in about 3 hours
- The site INFN-ROMA1 had to be switched off after the water reached the servers
  - As you might know, computers cannot swim easily!
- 100000 tons of water in the computing room, pumped out in about 12 hours

Services hosted in Rome
- Installation System
  - Two databases (rw, ro), installation agents (EGEE, OSG, CVMFS)
  - Redundant services, but hosted by the same site
- Global KitValidation (GKV) Portal and main KV cache
  - KV cache mirrored at CERN
- Installation tools cache
  - Hosted in the KV cache
- Release validation portal
All the named services stopped working on Oct 20 and were resumed 5 days later.

Temporary solutions
- A toy installation system (LJSFlite), re-written from scratch in ~8 hours (a minimal sketch of such an agent is given below)
  - 3 analysis caches, 1 base release and 1 patch deployed with LJSFlite while the main system was down
  - > 500 validations
  - Compatible with the main system
  - Using KV from the CERN mirror (no GKV)
- Missing services
  - GKV
  - Release DB
  - Main installation system
  - Installation tools (compilers, global patches)
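The slides do not show how LJSFlite works internally; as a rough illustration only, the sketch below shows the shape a minimal install-and-validate agent of this kind could take. The release list, installer command and KV invocation are hypothetical placeholders, not the real LJSFlite code.

```python
import json
import subprocess
from pathlib import Path

# Hypothetical inputs: the real LJSFlite release list and commands are not given in the slides.
RELEASES = ["17.0.4", "17.0.4.1"]            # placeholder: 1 base release + 1 patch
INSTALL_AREA = Path("/opt/atlas/software")   # placeholder installation area
RESULTS = Path("validations.json")           # minimal local bookkeeping instead of the main DB

def install(release: str) -> bool:
    """Install one release into the local software area (placeholder command)."""
    cmd = ["echo", f"installing {release} into {INSTALL_AREA}"]  # replace with the real installer
    return subprocess.run(cmd, check=False).returncode == 0

def validate(release: str) -> bool:
    """Run a KitValidation-style test of the installed release (placeholder command)."""
    cmd = ["echo", f"validating {release}"]  # replace with the KV script from the CERN mirror
    return subprocess.run(cmd, check=False).returncode == 0

def main() -> None:
    results = {r: ("validated" if install(r) and validate(r) else "failed") for r in RELEASES}
    RESULTS.write_text(json.dumps(results, indent=2))

if __name__ == "__main__":
    main()
```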

Full redundancy solutions (in progress)
- The installation system already supports native redundancy
  - Multiple agents, which can be located in different sites
  - DB replicas: 1 rw replica, multiple ro replicas
  - Logfile access facility: GlusterFS geo-replication
- Experimenting with a WAN automatic failover system
  - Ring replication between N DB replicas (multi-master)
  - 1 rw replica, 3 ro replicas: main rw replica and 1 ro replica in Roma, 1 ro replica in Napoli (https://atlas-install.na.infn.it/atlas_install)
  - Ready to test the automatic switching ro -> rw for the active replica, via watchdog (a minimal watchdog sketch is given below)
  - Testing the global failover domain, pointing to the active replicas, using the INFN HA DNS: https://atlas-install.ha.infn.it
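A watchdog of the kind mentioned above could, in its simplest form, periodically check the current rw replica and promote one of the ro replicas when the rw replica stops responding. The sketch below is a minimal illustration of this idea using PyMySQL; the host names, credentials, polling interval and promotion policy are assumptions, not the production setup.

```python
import time
import pymysql  # assumption: PyMySQL is available on the watchdog host

# Hypothetical replica layout, loosely following the slides (rw in Roma, ro in Roma and Napoli).
RW_HOST = "atlas-install-db.roma1.infn.it"                                    # placeholder host
RO_HOSTS = ["atlas-install-db2.roma1.infn.it", "atlas-install-db.na.infn.it"] # placeholder hosts
DB_USER, DB_PASS = "watchdog", "secret"                                       # placeholder credentials

def is_alive(host: str) -> bool:
    """Return True if the MySQL server on the given host answers a trivial query."""
    try:
        conn = pymysql.connect(host=host, user=DB_USER, password=DB_PASS, connect_timeout=5)
        with conn.cursor() as cur:
            cur.execute("SELECT 1")
        conn.close()
        return True
    except pymysql.MySQLError:
        return False

def promote(host: str) -> None:
    """Make a ro replica writable; in a multi-master ring it already receives all updates."""
    conn = pymysql.connect(host=host, user=DB_USER, password=DB_PASS, connect_timeout=5)
    with conn.cursor() as cur:
        cur.execute("SET GLOBAL read_only = 0")
    conn.close()
    # At this point the HA DNS entry (atlas-install.ha.infn.it) would be repointed to `host`.

def main() -> None:
    while True:
        if not is_alive(RW_HOST):
            for candidate in RO_HOSTS:
                if is_alive(candidate):
                    promote(candidate)
                    break
        time.sleep(60)  # polling interval, arbitrary choice for the sketch

if __name__ == "__main__":
    main()
```

The multi-master ring is what keeps the promotion step this simple: every ro replica already holds the full data set, so failover reduces to making one replica writable and updating the HA DNS record.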

Full redundancy solutions [2]
- GKV and release databases can be hosted in the same Installation System replicas
  - The Release DB is already hosted in the main Installation System DB space
  - GKV can be added
- Installation tools will be mirrored at CERN
  - Simple synchronization (a possible sync sketch is shown below)
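As an illustration of the "simple synchronization" mentioned above, the installation tools area could be mirrored to the remote site with a plain rsync pass driven by a small script. The source and destination paths below are hypothetical; only rsync over ssh is assumed.

```python
import subprocess
import sys

# Hypothetical paths: the real installation-tools area and the CERN mirror are not named in the slides.
SOURCE = "/data/atlas/install-tools/"                 # local tools cache (compilers, global patches)
MIRROR = "mirror.cern.ch:/data/atlas/install-tools/"  # placeholder remote mirror

def sync() -> int:
    """Mirror the installation tools to the remote site; --delete keeps the two copies identical."""
    cmd = ["rsync", "-a", "--delete", SOURCE, MIRROR]
    return subprocess.run(cmd, check=False).returncode

if __name__ == "__main__":
    sys.exit(sync())
```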

Current situation and plans
- Installation System replica
  - Main DB instance in Roma, working backup in Napoli
  - Will add at least a third replica at CERN
  - Every replica is fully functional: it will use the local replica to show the ro status and the current rw replica for the actions
  - You can now access the installation system via the HA domain (experimental): https://atlas-install.ha.infn.it (a simple probe sketch follows below)
- GKV replica
  - Can be added easily to the Installation System replicas, once the fs geo-replication is in place
  - Testing the geo-replication now; it needs the upgrade of the main DB machine, to be done by the end of this month
- KV & Installation Tools
  - Partially done, to be completed in the next few days
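A quick way to follow the status of the experimental HA setup is to probe the HA domain and the per-site replica over HTTPS. The sketch below only assumes the two URLs quoted in the slides plus the Python standard library.

```python
import urllib.error
import urllib.request

# URLs quoted in the slides: the HA failover domain and the Napoli ro replica.
ENDPOINTS = [
    "https://atlas-install.ha.infn.it",
    "https://atlas-install.na.infn.it/atlas_install",
]

def probe(url: str) -> str:
    """Return a short status string for one endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return f"{url}: HTTP {response.status}"
    except (urllib.error.URLError, OSError) as exc:
        return f"{url}: unreachable ({exc})"

if __name__ == "__main__":
    for endpoint in ENDPOINTS:
        print(probe(endpoint))
```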