CERN Database Services for the LHC Computing Grid Maria Girone, CERN.

Outline

- CERN database services for physics: goals
  - How we address scalability, performance and reliability needs
- Service set-up and operations
  - Highly available clusters
  - Development, test and production levels
  - Backup and update policies
- Replication set-up to the Tier1 sites
- Service evolution for the LHC start-up
- Conclusions

See also the workshop sessions on Robust & Reliable Services.

Introduction

Physics metadata stored in relational databases plays a crucial role in the LHC experiments and in the operation of the Worldwide LHC Computing Grid (WLCG) services:
- Detector conditions, calibration, geometry, production bookkeeping
- Core grid services for cataloguing, monitoring and distributing LHC data

Key features:
- High availability
- Performance and scalability
- Cost reduction with commodity hardware
- Consolidation
- Solid backup and recovery
- Security
- Databases distributed among 10 Tier1 sites
- 24x7 operations and monitoring

Some key numbers

The service is based on Oracle 10g Real Application Clusters (RAC) on Linux.

Service size:
- 110 mid-range servers and 110 disk arrays (~1100 disks)
- In other words: 220 CPUs, 440 GB of RAM, 300 TB of raw disk space

Several production clusters:
- One production cluster per LHC experiment for offline applications, up to 8 nodes per cluster
- A 6-node ATLAS online test cluster
- A COMPASS cluster

Several validation and test clusters:
- 1 or 2 two-node clusters per LHC experiment
- Some hardware allocated for internal use and tests

Service responsibilities:
- 5 DBAs in the team
- 24x7 "best effort" coverage for the production service
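The headline figures above are easy to cross-check against each other; a minimal sketch, using only the numbers quoted on the slide:

```python
# Sanity-check of the quoted service-size figures (all inputs from the slide).
servers = 110
cpus_per_server = 2       # dual-CPU machines
ram_per_server_gb = 4     # 4 GB per server
disks = 1100
raw_capacity_tb = 300

total_cpus = servers * cpus_per_server          # 220 CPUs
total_ram_gb = servers * ram_per_server_gb      # 440 GB of RAM
gb_per_disk = raw_capacity_tb * 1000 / disks    # ~273 GB per disk

print(total_cpus, total_ram_gb, round(gb_per_disk))
```

The per-disk figure (~273 GB) is consistent with the low-cost SATA drives described on the hardware slide.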

Current set-up

(Diagram of the current cluster set-up.)

High Availability & Scalability

- Clustering of redundant hardware
- Eliminate single points of failure
- Clusters are expanded to meet growth

(Diagram: servers connected to SAN storage.)

Economies of Scale

Homogeneous hardware configuration:
- A pool of servers, storage arrays and network devices is used as "standard" building blocks
- Hardware provisioning and set-up are simplified

Homogeneous software configuration:
- The same OS and database software on all nodes: Red Hat Enterprise Linux 4 (RHEL4) and Oracle 10g Release 2
- Simplifies installation, administration and troubleshooting

Current Set-Up

RAC on commodity hardware, with full redundancy:
- Linux RHEL4 32-bit as the OS platform
- Oracle ASM as the volume manager
- Dual-CPU P4 3 GHz servers with 4 GB of DDR2-400 memory each
- SAN at low cost:
  - Infortrend FC disk arrays with SATA disks and FC controllers
  - QLogic SANbox FC switches (4 Gbps)
  - Dual-ported QLogic HBAs (4 Gbps)

Most likely evolution — scale-up and scale-out, combined:
- Leverage multi-core CPUs and 64-bit Linux
  - Good for services that don't scale over multiple nodes
- Tests on quad-core machines look promising

Quad-core performance testing

A single quad-core server is able to handle a PhEDEx-like workload (a transaction-oriented application) even more efficiently than a 6-node RAC.

(Chart comparing the 6-node RAC and the quad-core server.)

Server Sizing

How many CPUs?
- Size CPU power to match the number of concurrent active sessions
- Look at the workload on the current production service
- Leave an "extra node" for contingency

How much RAM?
- A rule of thumb: 2 to 4 GB per core
- DB sessions are mostly idle in our workloads; DB sessions are measured per server
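The RAM rule of thumb above can be sketched as a quick back-of-the-envelope calculation; the cluster parameters in the example are illustrative assumptions, not the actual production figures:

```python
# Back-of-the-envelope RAM sizing from the "2 to 4 GB per core" rule of thumb.
def ram_range_gb(cores: int, low_gb: int = 2, high_gb: int = 4):
    """Return the (low, high) RAM estimate in GB for a given core count."""
    return cores * low_gb, cores * high_gb

# Illustrative example: a 4-node cluster of quad-core servers.
nodes, cores_per_node = 4, 4
low, high = ram_range_gb(nodes * cores_per_node)
print(f"{nodes * cores_per_node} cores -> {low}-{high} GB of RAM")  # 16 cores -> 32-64 GB
```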

Storage Sizing

How much storage?
- Metrics: the data volume needed, and performance in terms of IOPS and throughput
- Requirements are gathered from the experiments and from stress tests

For our storage set-up:
- IOPS determines the number of disks
  - Consider random I/O (index range scans)
  - 64 disks -> ~7000 IOPS (measured)
- For data, 25% of the raw storage capacity is used
  - We implement on-disk backups on the free disk space
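The measured figure above (64 disks -> ~7000 random IOPS, i.e. roughly 109 IOPS per disk) directly gives the disk count needed for a target workload; a minimal sketch, where the 10k-IOPS target is the planning figure quoted later for 2008:

```python
import math

# Per-disk random-I/O capability derived from the measured figure on the slide.
MEASURED_IOPS, MEASURED_DISKS = 7000, 64
iops_per_disk = MEASURED_IOPS / MEASURED_DISKS   # ~109 IOPS per SATA disk

def disks_for(target_iops: int) -> int:
    """Minimum number of disks needed to sustain target_iops of random I/O."""
    return math.ceil(target_iops / iops_per_disk)

print(disks_for(10_000))  # planning for >10k IOPS -> 92 disks
```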

Service Levels & Release Cycles

The applications' release cycle and the database software release cycle both move through three service levels: development service -> validation service -> production service.

While version n runs on the production service, version n+1 is deployed on the validation service before being promoted to production.

Backup & Recovery

Reliable backup and recovery infrastructure:
- Oracle Recovery Manager (RMAN) is proven technology
- The backup policy is agreed with the experiments and also accepted by the Tier1 sites
  - Backup to tape using IBM technology (TSM), with 31-day retention
  - Backup to disk, with 2-day retention
- Automatic test recoveries are in place

Security and Software Updates

- A policy is in place for Oracle and OS security patches: they are typically applied within two weeks, after validation
- Oracle software upgrades are typically performed once or twice per year, with one month of validation
- Oracle patches are only produced for recent versions, so it is essential to keep the software up to date

Monitoring & Operations

- Procedures in line with the WLCG services
- Transparent operations:
  - Disk replacement (ASM)
  - Node reboots (Oracle Clusterware)
  - Security and OS upgrades
- Hardware is deployed in the IT computer centre
  - Production is on critical power (UPS and diesel generators)
- 24x7 reactive monitoring
  - Sys-admins, net-admins, operators
  - DBAs
- Overall availability above 99.98% over the last 18 months
- Pro-active monitoring with Oracle Enterprise Manager (OEM) and Lemon
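To put the 99.98% figure in perspective, the implied downtime budget is easy to work out (assuming 30-day months for simplicity):

```python
# What 99.98% availability over 18 months means in allowed downtime.
availability = 0.9998
hours_in_period = 18 * 30 * 24            # 12,960 hours in 18 months (30-day months)
downtime_h = hours_in_period * (1 - availability)
print(f"{downtime_h:.1f} hours of downtime")  # about 2.6 hours over 18 months
```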

Replication Set-up for 3D

ATLAS conditions replication:
- Tier0 online -> Tier0 offline -> Tier1s
- All 10 Tier1s are in production

LHCb replication:
- Tier0 online (at the pit) -> Tier0 offline -> Tier1s, for conditions
- LFC replication: Tier0 offline -> Tier1s
- All 6 Tier1s are in production

Currently, 8x5 intervention coverage:
- Archive log retention on disk covers weekends

For more, see [171]: "Production experience with distributed deployment of databases for the LCG"
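The weekend-coverage point has a concrete sizing consequence: if replication to a Tier1 breaks on Friday evening, archive logs must survive on disk until staff return on Monday. A minimal sketch, assuming illustrative 17:00/09:00 working hours (not stated on the slide):

```python
from datetime import datetime, timedelta

# Gap that archive-log retention must bridge under 8x5 coverage.
# Working hours (Fri 17:00 -> Mon 09:00) are an illustrative assumption.
friday_end = datetime(2008, 6, 13, 17, 0)    # a Friday, end of working day
monday_start = datetime(2008, 6, 16, 9, 0)   # the following Monday morning
gap = monday_start - friday_end
print(gap.total_seconds() / 3600)            # 64.0 hours of logs to retain
```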

Hardware allocation in 2008

Production databases for the LHC:
- 3- or 4-node clusters built with quad-core CPU machines ( cores per cluster)
- GB of RAM per cluster
- Planning for >10k IOPS
- TBs of mirrored space

Integration and test systems:
- Single-core CPU hardware
- Usually 2 nodes per cluster
- Usually disks

64-bit versions of Linux and the Oracle software.

Migration tools have been prepared and tested to minimize the downtime of the production RACs.

Conclusions

Database Services for physics at CERN run production and integration Oracle 10g services:
- Designed to address the reliability, performance and scalability needs of the WLCG user community
- One of the biggest Oracle database cluster installations
- Approached by the LHC experiments for service provision for the online databases
- Recently connected to the 10 Tier1 sites for synchronized databases
  - Sharing policies and procedures
- Well sized to match the needs of the experiments in 2008
- Planning now the service growth for

Sharing cluster resources

- Applications are consolidated on large clusters, one per experiment
- We use the Oracle Service concept: a partition of a larger cluster made available to an application
- Resources (CPU, number of connections) can be allocated per service
- Cluster resources are distributed among applications using Oracle 10g services:
  - Each big application is assigned a dedicated service
  - Smaller applications share services

(Diagram: user applications connecting to Oracle Services 1-3 on cluster nodes over shared storage.)
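The allocation policy above (dedicated services for big applications, shared services for small ones) can be sketched as a toy model; the application names, session counts and the "big" threshold are all illustrative assumptions, not the actual CERN configuration:

```python
# Toy model of the service-allocation policy: big applications get a dedicated
# Oracle Service, smaller ones are grouped onto a shared service.
def assign_services(apps: dict, big_threshold: int = 100) -> dict:
    """apps maps application name -> typical concurrent sessions."""
    services = {}
    shared = []
    for name, sessions in sorted(apps.items()):
        if sessions >= big_threshold:
            services[f"svc_{name}"] = [name]   # dedicated service
        else:
            shared.append(name)
    if shared:
        services["svc_shared"] = shared        # small applications share one service
    return services

# Hypothetical workload mix for illustration only.
apps = {"phedex": 300, "conditions": 150, "bookkeeping": 40, "monitoring": 20}
print(assign_services(apps))
```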