Managing managed storage CERN Disk Server operations HEPiX 2004 / BNL Data Services team: Vladimír Bahyl, Hugo Caçote, Charles Curran, Jan van Eldik, David.


1 Managing managed storage: CERN Disk Server operations
HEPiX 2004 / BNL
Data Services team: Vladimír Bahyl, Hugo Caçote, Charles Curran, Jan van Eldik, David Hughes, Gordon Lee, Tony Osborne, Tim Smith

2 Outline (19-Oct-04, Jan.van.Eldik@cern.ch, FIO/DS)
Which are our Data Services?
Disk server hardware @ CERN
Management tools
What's next?

3 A lot of hardware
Disk storage:
  350 "storage in a box" Linux disk servers
  6700 disks
  550 TB of raw disk space
Tape storage:
  2 robotic installations, each with 5 STK 9310 silos
  50 9940B drives, 14000 tapes, 2.8 PB
  20 9840 drives, 8000 tapes, 160 TB
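
As a rough cross-check of the figures on this slide, the average capacity per disk and per tape can be derived from the quoted totals. This is only a sketch: it assumes decimal units (1 TB = 1000 GB) and that each total is spread evenly over its units, neither of which the slide states. The names `average_gb` and `TAPE_POOLS` are illustrative.

```python
# Figures quoted on the slide; everything derived below is an estimate.
DISKS = 6700
RAW_DISK_TB = 550
TAPE_POOLS = {
    "9940B": {"tapes": 14000, "total_tb": 2800},  # 2.8 PB
    "9840": {"tapes": 8000, "total_tb": 160},
}


def average_gb(total_tb: float, units: int) -> float:
    """Average capacity per unit in GB, assuming 1 TB = 1000 GB."""
    return total_tb * 1000 / units


disk_gb = average_gb(RAW_DISK_TB, DISKS)
tape_gb = {
    name: average_gb(pool["total_tb"], pool["tapes"])
    for name, pool in TAPE_POOLS.items()
}

print(f"average disk: {disk_gb:.0f} GB")
for name, gb in tape_gb.items():
    print(f"average {name} tape: {gb:.0f} GB")
```

Under these assumptions the disks average roughly 82 GB each, and the tape totals correspond to 200 GB per 9940B tape and 20 GB per 9840 tape.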

4 Many applications
200 CASTOR!
40 Oracle
20 CDR
10 AFS scratch
dCache, LHC@home, …
LCG, OpenLab, EGEE, data challenges
40 in repair/spare
A very heterogeneous environment! And very dynamic too.

5 Players
Many teams involved:
  Application responsibles / Users
  Service managers
  System administrators team
  Suppliers
Need to minimize downtime!
Software is often not redundant… so the hardware should be!

6 Storage in a box
13 different hardware configurations:
  8 – 26 IDE disks, hot-swappable trays
  2 – 4 3-Ware RAID controllers
  2 CPUs
  2 – 3 power supplies
  GigE network card
Should be redundant…

7 Hardware interventions
55 interventions since Sep 1:
  disk replacements (70%)
  trays, cables, fans, PSU
33% involve (un)scheduled downtime
Older hardware harder to maintain
One supplier out of business
Incidents to spice up life…

8 Disk replacement
10 months before case agreed: head instabilities
4 weeks to execute
1224 disks exchanged (= 18%), and the cages as well
Christmas
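
The 18% quoted here is consistent with the 6700 disks mentioned on the hardware overview slide. A small sketch of that arithmetic follows; it assumes the percentage is relative to that total and that the exchange was spread evenly over the 4 weeks, which the slide does not say.

```python
TOTAL_DISKS = 6700  # from the hardware overview slide (assumed denominator)
EXCHANGED = 1224    # disks exchanged in the replacement campaign
WEEKS = 4           # time taken to execute

fraction = EXCHANGED / TOTAL_DISKS
per_week = EXCHANGED / WEEKS  # assumes an even pace, for illustration only

print(f"fraction exchanged: {fraction:.1%}")
print(f"average pace: {per_week:.0f} disks per week")
```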

9 65 Jumbos
1 – 1.5 TB raw disk space
3-Ware 6800 controllers
600 MHz PIII
No PXE
Becoming hard to maintain
Many still under warranty
Make good mini-bars!

10 175 4U servers
4U (5U) rack mounted
1 – 1.5 TB
2 * 3-Ware 7000 series, currently upgrading firmware
2 * 1 GHz PIIIs
No PXE (yet)
Various maintenance issues

11 115 8U servers
8U rack mounted
2 – 2.5 TB
3 – 4 * 3-Ware 7500(6)-8
2 * 2.4 GHz Xeon
Well controlled, well maintained, well behaved, after disk replacements

12 Diskserver evolution

13 That was then…
HW RAID1
Ext2 filesystems, many of them
13 different kernels!
RedHat 6.1/6.2, 7.2/7.3, 2.1ES
Need for automation + standardization: ELFms toolsuite
  Quattor – installation + configuration
  LEMON – performance + exception monitoring
  LEAF – hardware and state management

14 …this is now
RedHat 7.3, preparing for SLC3
Oracle: RHEL 2.1, preparing RHEL 3
  kernel has old 3-Ware driver
HW RAID5 + hot spare disk
  Up to 50% more usable space
  On 3-Ware 7000 controller with up-to-date firmware
SW RAID0 + XFS
  Improved performance expected
  iozone benchmark
  Old XFS version
Improved kernel / elevator tuning
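
The "up to 50% more usable space" claim can be illustrated with a simple capacity model. This is a sketch under stated assumptions, not the team's actual layout: it counts capacity in whole disks, ignores filesystem and controller overhead, and assumes one array of 8 equal disks (matching an 8-port controller). The function names are illustrative.

```python
def usable_raid1(disks: int) -> int:
    """Usable capacity, in disks, when all disks are paired as RAID1 mirrors."""
    if disks < 2 or disks % 2:
        raise ValueError("RAID1 pairs need an even number of disks (>= 2)")
    return disks // 2


def usable_raid5_with_spare(disks: int, spares: int = 1) -> int:
    """Usable capacity, in disks, for one RAID5 array plus hot spares."""
    members = disks - spares
    if members < 3:
        raise ValueError("RAID5 needs at least 3 member disks")
    return members - 1  # one disk's worth of capacity holds parity


def gain(disks: int) -> float:
    """Relative gain in usable space when moving from RAID1 to RAID5 + spare."""
    return usable_raid5_with_spare(disks) / usable_raid1(disks) - 1


print(f"8 disks, RAID1: {usable_raid1(8)} usable")
print(f"8 disks, RAID5 + hot spare: {usable_raid5_with_spare(8)} usable")
print(f"gain: {gain(8):.0%}")
```

With 8 disks this model gives 4 usable disks under RAID1 and 6 under RAID5 with one hot spare, a 50% gain. Other array sizes give different ratios, so the figure depends on how disks are grouped per controller.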

15 Updating the toolbox
SMART – to predict disk failure
  daily and weekly self-tests, on every disk
IPMI v1.5
  HW monitoring and event control
  Power control, resets
Lm_sensors – temperature monitoring
  Hardware and software specific
All data flows into Lemon repository
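
One way a SMART health check could be turned into a value for a monitoring repository is sketched below. This is not the team's Lemon sensor, which the slides do not describe. It assumes the smartmontools `smartctl` CLI is installed and parses only the overall-health line printed for ATA disks by `smartctl -H`. The helper names `parse_health` and `check_device` are hypothetical.

```python
import re
import subprocess
from typing import Optional

# Overall-health line as printed by `smartctl -H` for ATA disks.
HEALTH_RE = re.compile(
    r"^SMART overall-health self-assessment test result:\s*(\S+)",
    re.MULTILINE,
)


def parse_health(output: str) -> Optional[bool]:
    """Return True if the disk reports PASSED, False for any other result,
    or None if no overall-health line is present in the output."""
    match = HEALTH_RE.search(output)
    if match is None:
        return None
    return match.group(1) == "PASSED"


def check_device(device: str) -> Optional[bool]:
    """Run `smartctl -H` on one device and parse its verdict.

    smartctl encodes several conditions in its exit status, so a non-zero
    status is not treated as an error here; only the text is inspected.
    """
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_health(result.stdout)


# Illustrative, abbreviated output (not captured from a real server).
sample = "SMART overall-health self-assessment test result: PASSED\n"
print(parse_health(sample))
```

Disks behind a RAID controller usually need an extra device-type option to `smartctl`; that is left out here because the exact invocation depends on the controller and driver.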

16 Wintertime?

17 This is now
Quattorized + Lemonized
Rely on Operator and SysAdmin teams
  Operated in same way as PC farms
Getting more out of suppliers
  BIOS upgrade necessary for PXE enabling
BTW: most applies to tapeservers as well

18 What's next?
New hardware
  360 TB SATA in a box, 2 different suppliers
  140 TB FC attached external SATA disk arrays
New software
  SLC3, RHEL 3
  New CASTOR stager
New challenges
  Oracle SAN setup
  Alice data challenge

19 Conclusions
A lot of work has been done to:
  Stabilize hardware and software
  Automate + hand over basic operations
  Integrate into standard work flows
  Get more out of available hardware
Achieved pro-active data management

20 Useful links
Standing on the shoulders of giants:
  Tim Smith, CHEP 2004: http://indico.cern.ch/contributionDisplay.py?contribId=374&sessionId=10&confId=0
  Helge Meinhard, CHEP 2004: http://indico.cern.ch/contributionDisplay.py?contribId=325&sessionId=10&confId=0
  Peter Kelemen, CERN IT After C5: http://cern.ch/Peter.Kelemen/talk/2004/C5/diskserver
  Jan Iven, HEPiX 2004 Edinburgh: http://hepwww.rl.ac.uk/hepix/nesc/iven.pdf

