
1 Database Services at CERN Status Update
Maria Girone, CERN IT-PSS

2 Database Service Evolution
Until summer 2005:
- Solaris-based shared Physics DB cluster (2 nodes for HA)
  - Low CPU power, hard to extend, shared by all experiments
- (Many) Linux disk servers as DB servers
  - High maintenance load, no resource sharing, no redundancy
Now: consolidation on extensible database clusters
- No sharing across experiments
- Higher-quality building blocks
  - Midrange PCs (RedHat ES)
  - FibreChannel-attached disk arrays
- As of last month, all LHC services have moved
LCG Database Workshop Maria Girone

3 Service Architecture - Oracle Database Clusters
The Physics Database Production and Validation services are deployed on 2-node RAC/Linux, in failover mode

4 Experience with RAC availability
- Managed to apply Oracle security patches in a rolling fashion
  - Big step towards decreasing planned downtime
  - Need timely patch information from Oracle
- Most RAC-based services stayed up during the last power cut; the service is now on critical power
  - Investigating some glitches on ATLAS RAC nodes
- Startup after a service problem is significantly faster than with the old disk-server-based services
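The idea behind rolling patching can be sketched as follows. This is a toy simulation, not the procedure used at CERN: the `Node` class, the node names, and the `rolling_patch` helper are hypothetical, and "patching" is reduced to flipping a flag. It only illustrates why the service as a whole stays available: one node at a time leaves the cluster, so a 2-node RAC never drops below one node online.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for one RAC node."""
    name: str
    patched: bool = False
    online: bool = True


def rolling_patch(nodes):
    """Patch nodes one at a time.

    Returns the lowest number of nodes that were online at any point,
    which is what determines whether the service stayed available.
    """
    min_online = sum(n.online for n in nodes)
    for node in nodes:
        node.online = False   # take only this node out of service
        min_online = min(min_online, sum(n.online for n in nodes))
        node.patched = True   # placeholder for the real patch step
        node.online = True    # rejoin the cluster before touching the next node
    return min_online


# Hypothetical 2-node cluster, matching the 2-node setups described in the talk.
cluster = [Node("rac-node-1"), Node("rac-node-2")]
lowest = rolling_patch(cluster)
print(lowest, all(n.patched for n in cluster))
```

With two nodes the lowest online count is 1, so clients can keep connecting through the surviving node while the other is patched.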

5 DB Storage Configuration (in production)
[Diagram: Oracle RAC nodes (DB N.1, DB N.2) attached to storage arrays; ASM disk groups Data DG-1, Data DG-2, Recovery DG-1, Recovery DG-2]
- Disk groups created with 'horizontal' slicing
- Benefits:
  - More effective use of available storage
  - High availability
  - Allows backups to be kept on disk
  - Higher performance (30%-50%)
  - Allows clusterware mirroring

6 Service Throttling - Resource Usage Reports
- Ran into degraded service after a single remote user submitted many (idle) jobs
  - Defined account profiles for larger apps
  - DB accounts are shared among many users
  - Switched on idle session "sniping" (default = 3 h idle time)
- Proposing a (e.g. weekly) resource overview for the experiment database coordinators
  - Allows experiments to prioritize resources and identify unexpected usage patterns
  - Which jobs/users were affected by which limit?
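The throttling and reporting idea can be illustrated with a small sketch. It is not how the database enforces the limit (the service does that through account profiles); it only models the selection rule and the kind of summary a resource overview might contain. The session records, account names, job names, and both helper functions are hypothetical; only the 3-hour idle limit comes from the slide.

```python
from collections import Counter
from datetime import datetime, timedelta

IDLE_LIMIT = timedelta(hours=3)  # default idle time quoted on the slide


def sessions_to_snipe(sessions, now, idle_limit=IDLE_LIMIT):
    """Return the sessions that have been idle for longer than the limit."""
    return [s for s in sessions if now - s["last_active"] > idle_limit]


def sniped_per_account(sniped):
    """Count affected sessions per (shared) database account for a report."""
    return dict(Counter(s["account"] for s in sniped))


# Hypothetical snapshot of open sessions on shared accounts.
now = datetime(2006, 2, 1, 12, 0)
sessions = [
    {"account": "app_reader", "job": "job-1", "last_active": now - timedelta(hours=5)},
    {"account": "app_reader", "job": "job-2", "last_active": now - timedelta(minutes=20)},
    {"account": "app_writer", "job": "job-3", "last_active": now - timedelta(hours=3, minutes=1)},
]

sniped = sessions_to_snipe(sessions, now)
print([s["job"] for s in sniped])   # jobs affected by the idle limit
print(sniped_per_account(sniped))   # per-account summary for the overview
```

Because accounts are shared among many users, a per-account count alone does not identify the user; the report would also need a job or client identifier, as in the `job` field assumed here.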

7 RAC Hardware evolution for 2006
- Linear ramp-up budgeted for hardware resources in
- Planning next major service extension for Q3 this year
[Table: ALICE, ATLAS, CMS, LHCb, Grid, 3D, Non-LHC, Validation]
Current state: - 2-node offline 2-node 2x2-node 2-node online test Pilot on disk server
Proposed structure in Q2 2006: 4-node 4-node 2-node (PDB replacement) 2-node valid/test 2-node pilot Compass?? Online?

8 RAC Expansion for Q2
- New mid-range servers received and installed
  - Passed acceptance tests by IT-FIO
- Waiting for additional disk arrays and fibre channel switches
  - Delivery expected at the end of February
- Planning the setup in collaboration with IT-FIO
- Proceeding in two steps
  - February: extension of existing RACs with additional CPUs
    - Cabling work for fibre channel and IP networks has started
  - March: creation of new RACs, e.g. dedicated experiment validation servers, after the disk arrays and switches have arrived

9 Moving to 10gR2
- Proceed with the move to 10gR2 as main production platform for 2006
- Planning with IT-DES to migrate the development service for experiments to 10gR2 this month
- Plan to set up new RAC servers with 10gR2
  - Will start with validation setups
- Plan to migrate the production service to the new release as soon as experiments have validated their apps on the dev or validation service
- Target: complete the move by end of March

10 Backups Strategy - Review with Experiments
- Default backup retention policy and frequency need review by the experiments
  - Backup schedule: is the default of two full backups sufficient?
  - Is the latency of a partial or full recovery acceptable?
- Can we reduce the fraction of active writable data, and thereby backup volume and latency?
  - Impact on physical data organisation and applications
- Database backup/recovery at Tier 1s
  - Any experiment requirements on recovery latency?
  - Impact on Tier 0 services for replicated data
- Propose to set up meetings with the experiment database coordinators, document an agreed strategy, and present it at the next workshop (summer)
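The question about reducing the writable fraction can be made concrete with a back-of-the-envelope estimate. The sketch below assumes a deliberately simple linear model: only writable data needs to be backed up repeatedly, and backup time scales with volume at a fixed throughput. The function name, the database size, and the throughput figure are hypothetical and not taken from the talk.

```python
def backup_estimate(total_gb, writable_fraction, throughput_gb_per_hour):
    """Estimate recurring backup volume (GB) and duration (hours).

    Assumes read-only data is backed up once and then excluded, so only
    the writable fraction contributes to each recurring backup.
    """
    if not 0.0 <= writable_fraction <= 1.0:
        raise ValueError("writable_fraction must be between 0 and 1")
    if throughput_gb_per_hour <= 0:
        raise ValueError("throughput_gb_per_hour must be positive")
    volume_gb = total_gb * writable_fraction
    hours = volume_gb / throughput_gb_per_hour
    return volume_gb, hours


# Hypothetical numbers: a 1000 GB database backed up at 100 GB/hour.
all_writable = backup_estimate(1000, 1.0, 100)
quarter_writable = backup_estimate(1000, 0.25, 100)
print(all_writable)      # everything writable
print(quarter_writable)  # three quarters of the data made read-only
```

Under these assumptions, making three quarters of the data read-only cuts both the recurring backup volume and its duration to a quarter. The same reasoning would apply to recovery latency, although real recovery times also depend on factors this model ignores.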

11 Summary
- LCG database services are now fully based on RAC
  - Benefits of consolidation and additional flexibility obtained
- Q2 database extension proceeding as planned
  - Dedicated experiment database clusters will double in CPU power
  - Dedicated validation resources will simplify planning
- Second h/w extension (Q3) will need to go out soon
- Need to regularly plan evolution with those responsible for the experiment databases
  - Regular resource usage reports could be a good basis
- Get started with backup and recovery strategy discussions

