CERN Data Services Update
HEPiX 2004 / NeSC Edinburgh
Data Services team: Vladimír Bahyl, Hugo Caçote, Charles Curran, Jan van Eldik, David Hughes, Gordon Lee, Tony Osborne, Tim Smith
2004/05/26, CERN Data Services

Outline
- Data Services Drivers
- Disk Service
  - Migration to Quattor / LEMON
  - Future directions
- Tape Service
  - Media migration
  - Future directions
- Grid Data Services
Data Flows
- Tier-0 / Tier-1 for the LHC
  - Data Challenges:
    - CMS: DC04 (finished); PCP05 (Autumn): +80 TB; +170 TB
    - ALICE: ongoing, +137 TB
    - LHCb: ramping up, +40 TB
    - ATLAS: ramping up, +60 TB
- Fixed Target Programme:
  - NA48 at 80 MB/s: +200 TB
  - COMPASS at 70 MB/s (peak 120): +625 TB
  - nToF at 45 MB/s: +180 TB
  - NA60 at 15 MB/s: +60 TB
  - Testbeams at 1 to 5 MB/s (x 5)
- Analysis…
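To put the fixed-target rates in perspective, the short calculation below converts each quoted rate into a daily volume and into the number of days of uninterrupted running needed to accumulate the quoted increment. It assumes decimal units (1 TB = 10^6 MB) and a constant rate around the clock; real running includes pauses, so treat the day counts as idealised, not as the actual run schedule.

```python
# Fixed-target rates (MB/s) and expected volume increments (TB), as quoted on the slide.
FIXED_TARGET = {
    "NA48": (80, 200),
    "COMPASS": (70, 625),
    "nToF": (45, 180),
    "NA60": (15, 60),
}

SECONDS_PER_DAY = 86_400
MB_PER_TB = 1_000_000  # decimal units, as used for disk and tape capacities


def tb_per_day(rate_mb_s: float) -> float:
    """Volume written in one day at a constant rate."""
    return rate_mb_s * SECONDS_PER_DAY / MB_PER_TB


def days_to_fill(volume_tb: float, rate_mb_s: float) -> float:
    """Days of uninterrupted running needed to accumulate volume_tb."""
    return volume_tb / tb_per_day(rate_mb_s)


for name, (rate, volume) in FIXED_TARGET.items():
    print(f"{name:8s} {tb_per_day(rate):5.2f} TB/day, "
          f"{days_to_fill(volume, rate):6.1f} days for +{volume} TB")
```

NA48 at 80 MB/s works out to about 6.9 TB per day, so its +200 TB corresponds to roughly 29 days of continuous writing; COMPASS's +625 TB needs about 103 days at its 70 MB/s sustained rate.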
Disk Server Functions
Generations
- 0th: Jumbos
- 1st & 2nd: 4U
- 3rd & 4th: 8U
Warranties
Disk Servers: Jan
- EIDE disk servers
  - Commodity storage in a box
  - 544 TB of disk capacity
  - 6700 spinning disks
- Storage configuration
  - HW RAID-1, mirrored for maximum reliability
  - ext2 file systems
- Operating systems
  - RH 6.1, 6.2, 7.2, 7.3, RHES
  - 13 different kernels
- Application uniformity: CASTOR SW
Quattor-ising
- Motivation: scale
  - Uniformity; manageability; automation
- Configuration description (into CDB)
  - HW and SW; nodes and services
- Reinstallation
  - Production machines: minimal service interruption!
- Eliminate peculiarities from CASTOR nodes
  - MySQL, web servers
  - Refocus root control
- Quiescing a disk server ≠ draining a batch node!
- Gigabit card gymnastics; ext2 -> ext3
- Complete (except 10 RH6 boxes for Objectivity)
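The slide stresses that reinstalling production machines must cause minimal service interruption, and that quiescing a disk server is not the same as draining a batch node. The sketch below models only the quiescing step as a small state machine: a hypothetical `DiskServer` stops accepting new transfers, lets in-flight ones complete, and may be reinstalled only once no transfers remain. The class, its states and the node name are illustrative assumptions, not Quattor or CASTOR interfaces.

```python
from enum import Enum


class State(Enum):
    PRODUCTION = "production"
    QUIESCING = "quiescing"
    DRAINED = "drained"


class DiskServer:
    """Hypothetical model of a disk server being taken out of production."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = State.PRODUCTION
        self.active_transfers = 0

    def start_transfer(self) -> bool:
        # Only a server in production accepts new transfers.
        if self.state is not State.PRODUCTION:
            return False
        self.active_transfers += 1
        return True

    def finish_transfer(self) -> None:
        if self.active_transfers == 0:
            raise ValueError("no transfer in progress")
        self.active_transfers -= 1
        self._check_drained()

    def quiesce(self) -> None:
        # Refuse new work; transfers already running are allowed to finish.
        self.state = State.QUIESCING
        self._check_drained()

    def _check_drained(self) -> None:
        if self.state is State.QUIESCING and self.active_transfers == 0:
            self.state = State.DRAINED

    def reinstall(self) -> None:
        # Reinstallation is only allowed once the server is drained.
        if self.state is not State.DRAINED:
            raise RuntimeError(
                f"{self.name}: {self.active_transfers} transfer(s) still active")
        self.state = State.PRODUCTION


server = DiskServer("disk01")  # hypothetical node name
server.start_transfer()
server.quiesce()
print(server.state, server.start_transfer())  # still quiescing; new transfer refused
server.finish_transfer()
print(server.state)
server.reinstall()
print(server.state)
```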
LEMON-ising
- MSA everywhere
  - Linux box monitoring and alarms
  - Automatic HW static checks
- Adding CASTOR-server-specific service monitoring
- HW monitoring
  - lm_sensors (see tape section)
  - smartmontools
    - smartd deployment
    - Kernel issues; firmware bugs; access through the 3ware controller
    - smartctl auto checks; predictive monitoring
  - IPMI investigations, especially remote access
    - Remote reset / power-on / power-off
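The smartctl auto checks can be pictured as parsing the attribute table that `smartctl -A` prints (for disks behind a 3ware controller, `smartctl -A -d 3ware,N /dev/twe0` addresses port N) and flagging disks that look likely to fail. The sketch below assumes the default ten-column attribute table of current smartmontools releases; the sample text and the raw-count limits in `RAW_LIMITS` are hypothetical, and only the "normalized value at or below threshold" rule is SMART's own failure criterion.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SmartAttribute:
    attr_id: int
    name: str
    value: int   # normalized value (higher is better)
    thresh: int  # vendor failure threshold for the normalized value
    raw: int     # first token of RAW_VALUE, or -1 if not numeric


def parse_attributes(text: str) -> Dict[str, SmartAttribute]:
    """Parse the attribute table printed by `smartctl -A`."""
    attrs: Dict[str, SmartAttribute] = {}
    in_table = False
    for line in text.splitlines():
        if line.startswith("ID#"):
            in_table = True
            continue
        parts = line.split(None, 9)  # RAW_VALUE may itself contain spaces
        if not in_table or len(parts) < 10 or not parts[0].isdigit():
            continue
        raw_token = parts[9].split()[0]
        attrs[parts[1]] = SmartAttribute(
            attr_id=int(parts[0]),
            name=parts[1],
            value=int(parts[3]),
            thresh=int(parts[5]),
            raw=int(raw_token) if raw_token.isdigit() else -1,
        )
    return attrs


# Hypothetical site policy: raw counts above these limits trigger a warning.
RAW_LIMITS = {
    "Reallocated_Sector_Ct": 50,
    "Current_Pending_Sector": 0,
    "Offline_Uncorrectable": 0,
}


def smart_warnings(attrs: Dict[str, SmartAttribute]) -> List[str]:
    out = []
    for a in attrs.values():
        # SMART's own criterion: failing when VALUE <= THRESH (a THRESH of 0 never fails).
        if a.thresh > 0 and a.value <= a.thresh:
            out.append(f"{a.name}: value {a.value} <= threshold {a.thresh}")
    for name, limit in RAW_LIMITS.items():
        a = attrs.get(name)
        if a is not None and a.raw > limit:
            out.append(f"{name}: raw count {a.raw} > {limit}")
    return out


SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   180   180   140    Pre-fail  Always       -       162
194 Temperature_Celsius     0x0022   112   095   000    Old_age   Always       -       38
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       3
"""

for warning in smart_warnings(parse_attributes(SAMPLE)):
    print(warning)
```

The raw-count rules are what make this predictive: a growing reallocated or pending sector count is reported well before the normalized value crosses the vendor threshold.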
Disk Replacement
- Failure rate unacceptably high
  - 10 months to be believed
  - 4 weeks to execute
- 1224 disks exchanged (out of 6700)
  - And the cages
- Western Digital, type DUA
  - Head instabilities
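The scale of the exchange is easier to judge as rates. The arithmetic below uses only the slide's figures, plus one assumption: that the four weeks were five-working-day weeks.

```python
EXCHANGED = 1224           # disks replaced
INSTALLED = 6700           # disks in service
WEEKS = 4                  # duration of the exchange campaign
WORKING_DAYS_PER_WEEK = 5  # assumption: not stated on the slide

fraction = EXCHANGED / INSTALLED
per_week = EXCHANGED / WEEKS
per_working_day = per_week / WORKING_DAYS_PER_WEEK

print(f"{fraction:.1%} of installed disks replaced")
print(f"{per_week:.0f} disks per week, about {per_working_day:.0f} per working day")
```

This gives about 18.3% of the installed disks, or 306 per week (roughly 61 per working day under the five-day assumption).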
Disk Storage Futures
- EIDE commodity storage in a box
  - Production systems: HW RAID-1 / ext3
  - Pilots (15 production systems): HW RAID-5 + SW RAID-0 / XFS (see Jan Iven's talk, next)
- New tenders out…
  - 30 TB SATA in a box
  - 30 TB external SATA disk arrays
- New CASTOR stager (see Olof's talk)
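The move from HW RAID-1 to HW RAID-5 with SW RAID-0 on top trades redundancy for usable capacity. The sketch below computes usable space for each layout; the 24-disk server with 250 GB disks is a hypothetical example, not a description of the tendered systems.

```python
def raid1_usable(n_disks: int, disk_gb: float) -> float:
    """Mirrored pairs: half the raw capacity is usable."""
    if n_disks % 2:
        raise ValueError("RAID-1 needs an even number of disks")
    return n_disks // 2 * disk_gb


def raid5_usable(n_disks: int, disk_gb: float) -> float:
    """One disk's worth of capacity per array goes to parity."""
    if n_disks < 3:
        raise ValueError("RAID-5 needs at least 3 disks")
    return (n_disks - 1) * disk_gb


def raid50_usable(n_arrays: int, disks_per_array: int, disk_gb: float) -> float:
    """HW RAID-5 arrays striped together with SW RAID-0: parity is paid once per array."""
    return n_arrays * raid5_usable(disks_per_array, disk_gb)


# Hypothetical server: 24 disks of 250 GB, either as 12 mirrored pairs
# or as 3 hardware RAID-5 arrays of 8 disks striped together in software.
DISKS, DISK_GB = 24, 250
print(raid1_usable(DISKS, DISK_GB))
print(raid50_usable(3, 8, DISK_GB))
```

For this example the striped layout gives 5250 GB against 3000 GB mirrored. Fault tolerance is the other half of the trade: a mirrored pair survives the loss of one of its two disks, whereas each RAID-5 array survives one failed disk, but a second failure in the same array loses the whole software RAID-0 volume.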
Tape Service
- 70 tape servers (Linux)
  - (mostly) single FibreChannel-attached drives
- 2 symmetric robotic installations
  - 5 x STK 9310 silos in each
- Drives
- Media
Tape Server Temperatures
- lm_sensors package
  - General SMBus access and hardware monitoring
  - Used to access:
    - LM87 chip: fan speeds, voltages, internal/external temperatures
    - ADM1023 chip: internal/external temperatures
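On a current Linux system, readings from chips such as the LM87 and ADM1023 are exposed through the kernel's hwmon sysfs interface, which is newer than this talk (the 2004 lm_sensors setup presented them differently). The sketch below assumes that interface (`/sys/class/hwmon/hwmon*/temp*_input`, values in millidegrees Celsius) and adds a simple alarm with hysteresis, so that a sensor hovering around the limit does not raise and clear repeatedly. The 45 °C limit and 3 °C hysteresis are hypothetical.

```python
from pathlib import Path
from typing import Dict, Set, Tuple


def read_temperatures(hwmon_root: Path = Path("/sys/class/hwmon")) -> Dict[str, float]:
    """Map "hwmonN/tempM" to degrees Celsius; sysfs reports millidegrees."""
    temps: Dict[str, float] = {}
    if not hwmon_root.is_dir():
        return temps
    for f in sorted(hwmon_root.glob("hwmon*/temp*_input")):
        try:
            millideg = int(f.read_text().strip())
        except (OSError, ValueError):
            continue  # sensor disappeared or returned something unparsable
        sensor = f.name[: -len("_input")]
        temps[f"{f.parent.name}/{sensor}"] = millideg / 1000
    return temps


class TemperatureAlarm:
    """Raise above `high`; clear only below `high - hysteresis` to avoid flapping."""

    def __init__(self, high: float = 45.0, hysteresis: float = 3.0) -> None:
        self.high = high  # hypothetical limit, not a CERN setting
        self.clear_below = high - hysteresis
        self.active: Set[str] = set()

    def update(self, temps: Dict[str, float]) -> Tuple[Set[str], Set[str]]:
        raised = {s for s, t in temps.items() if t > self.high and s not in self.active}
        cleared = {s for s in self.active if s in temps and temps[s] < self.clear_below}
        self.active = (self.active | raised) - cleared
        return raised, cleared


if __name__ == "__main__":
    alarm = TemperatureAlarm()
    raised, cleared = alarm.update(read_temperatures())
    print("raised:", sorted(raised), "cleared:", sorted(cleared))
```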
Tape Server Temperatures
Media Migration
- To 9940B (mainly from 9940A)
  - 200 GB: extra capacity avoids unnecessary acquisitions
  - Better performance, though hard to benefit from in normal chaotic mode
  - Reduced errors; fewer interventions
- 1-2% of A tapes cannot be read on B drives (reading is extremely slow)
  - Have not been able to return all A drives
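The capacity argument and the stranded A drives can both be sketched with simple arithmetic. The nominal 60 GB native capacity of a 9940A cartridge is not on the slide and is taken here as an assumption; the pool of 10,000 full A cartridges is hypothetical. The 1-2% unreadable fraction is the slide's figure: those cartridges still need an A drive, which is consistent with not being able to return all of them.

```python
import math

A_NATIVE_GB = 60   # assumption: nominal 9940A native capacity
B_NATIVE_GB = 200  # 9940B native capacity, from the slide


def migration_plan(n_a_tapes: int, fill: float = 1.0) -> dict:
    """Cartridges needed after copying n_a_tapes (each `fill` full) onto 9940B media."""
    data_gb = n_a_tapes * A_NATIVE_GB * fill
    b_tapes = math.ceil(data_gb / B_NATIVE_GB)
    return {
        "data_tb": data_gb / 1000,
        "b_tapes": b_tapes,
        "cartridges_freed": n_a_tapes - b_tapes,
        # 1-2% of A tapes cannot be read on B drives and still need an A drive
        "need_a_drive": (math.ceil(n_a_tapes * 1 / 100), math.ceil(n_a_tapes * 2 / 100)),
    }


print(migration_plan(10_000))  # hypothetical pool of 10,000 full 9940A cartridges
```

For the hypothetical pool this gives 600 TB on 3000 B cartridges, freeing 7000 cartridge slots, while 100 to 200 cartridges would still need an A drive.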
Tape Service Developments
- Removing tails…
  - Tracking of all tape errors (18 months)
  - Retiring of problematic media
  - Proactive retiring of heavily used media (>5000 mounts)
    - Repack onto new media
- Checksums
  - Populated when writing to tape
  - Verified when loading back to disk
  - 22% already populated after a few weeks
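The checksum scheme on the slide has two halves: compute a checksum when a file is written to tape, and verify it when the file is loaded back to disk. The sketch below uses Adler-32 from Python's `zlib` purely for illustration (the slide does not name the algorithm) and a plain dictionary standing in for the file catalogue; files written before checksums were populated are reported as unverified rather than failed. `should_retire` encodes the >5000-mount rule from the slide, plus a hypothetical error limit for problematic media.

```python
import io
import zlib
from typing import BinaryIO, Dict

CHUNK = 1 << 20    # read 1 MiB at a time so large files never sit in memory whole
MAX_MOUNTS = 5000  # from the slide: heavily used media are retired proactively


def adler32_stream(stream: BinaryIO) -> int:
    value = 1  # Adler-32 starting value
    while chunk := stream.read(CHUNK):
        value = zlib.adler32(chunk, value)
    return value


def record_on_write(catalogue: Dict[str, int], name: str, stream: BinaryIO) -> None:
    # Populate the checksum while the file is being written to tape.
    catalogue[name] = adler32_stream(stream)


def verify_on_recall(catalogue: Dict[str, int], name: str, stream: BinaryIO) -> str:
    expected = catalogue.get(name)
    if expected is None:
        return "unverified"  # written before checksums were populated
    return "ok" if adler32_stream(stream) == expected else "mismatch"


def should_retire(mounts: int, recent_errors: int, error_limit: int = 3) -> bool:
    # error_limit is a hypothetical policy value; the slide gives no number.
    return mounts > MAX_MOUNTS or recent_errors >= error_limit


catalogue: Dict[str, int] = {}  # hypothetical stand-in for the file catalogue
record_on_write(catalogue, "/castor/example/file1", io.BytesIO(b"event data"))
print(verify_on_recall(catalogue, "/castor/example/file1", io.BytesIO(b"event data")))
print(verify_on_recall(catalogue, "/castor/example/file1", io.BytesIO(b"event dat4")))
print(should_retire(mounts=5200, recent_errors=0))
```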
Water Cooled Tapes!
- Plumbing error!
- 5000 tapes disabled for a few days
  - 550 superficially wet
  - 152 seriously wet, visually inspected
Tape Storage Futures
- Commodity drive studies
  - LTO-2 (in collaboration with CASPUR and Valencia)
- Test and evaluate high-end drives
  - IBM 3592
  - STK NGD
- Other STK offerings
  - SL8500 robotics and silos
  - Indigo: managed storage, tape virtualisation
GRID Data Management
- GridFTP + SRM servers (former)
  - Standalone / experiment-dedicated
  - Hard to intervene on; not scalable
- New load-balanced 6-node service: castorgrid.cern.ch
  - SRM modifications to support operating behind a load balancer
  - GridFTP standalone client
- Retire ftp and bbftp access to CASTOR
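The slide does not describe how castorgrid.cern.ch spreads clients over its six nodes. One common scheme is DNS-based load balancing, in which the alias is periodically re-pointed at the currently least-loaded healthy members. The sketch below shows only that selection step; the node names, load figures and the choice of two published members are hypothetical.

```python
from typing import Dict, List, Optional


def pick_members(loads: Dict[str, Optional[float]], k: int = 2) -> List[str]:
    """Return the k least-loaded healthy nodes to publish under the alias.

    A load of None marks a node that failed its health check and must not be published.
    Ties are broken by node name so the choice is deterministic.
    """
    healthy = sorted((load, node) for node, load in loads.items() if load is not None)
    return [node for _, node in healthy[:k]]


loads = {  # hypothetical snapshot of the six-node service
    "node1": 0.7, "node2": 0.2, "node3": None,
    "node4": 0.5, "node5": 0.2, "node6": 1.3,
}
print(pick_members(loads))
```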