CERN Disk Storage Technology Choices, LCG-France Meeting, April 8th 2005, CERN.ch


1 CERN Disk Storage Technology Choices
LCG-France Meeting, April 8th 2005
Tony.Cass@CERN.ch

2 History
• 99/00 – EIDE disk server evaluation
• 2001 – Problem with IBM disks
  – But no serious worries about model at that stage
• 2002 – Continued expansion
• 2003 – Major problem with new servers
  – Significant impact on servers (and support staff!)
  – Entire low-cost disk server model questioned.
• 2004 – Western Digital admit disk problem
  – 1224 disks replaced.
  – Confidence in low-cost model restored.
    » Added 75 servers with 1800 SATA disks for 175TB usable capacity
    » Installed base exceeds 500TB across ~350 servers
• 2005 – Plan to buy 500TB @ <3CHF/GB (usable)
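The 2005 target on slide 2 implies a spending ceiling that is easy to work out. The sketch below is only that arithmetic; the function name is illustrative and it assumes decimal units (1TB = 1000GB), which the slide does not state.

```python
def purchase_ceiling_chf(capacity_tb: float, chf_per_gb: float, gb_per_tb: int = 1000) -> float:
    """Upper bound on spend for a given usable capacity and price ceiling.

    Assumes decimal units (1 TB = 1000 GB); this is an assumption, not from the slides.
    """
    return capacity_tb * gb_per_tb * chf_per_gb


# Figures from slide 2: 500 TB usable at less than 3 CHF per usable GB.
ceiling = purchase_ceiling_chf(capacity_tb=500, chf_per_gb=3.0)
print(f"Spend stays below {ceiling:,.0f} CHF")

# Average usable capacity per server in the installed base (500 TB, ~350 servers).
print(f"Installed base averages {500 / 350:.2f} TB per server")
```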

3 Cost Evolution
[Chart: cost evolution; series: usable (RAID 5), gross; labels: Jumbos, 4U…8U rackmount, FC attached disk array]

4 (Some) Options
• SAN vs NAS
• SCSI vs FC vs SATA
• “In a box” vs server and trays
• “White box” vs major vendor

5 (Some) Options
• SAN vs NAS
  – SAN-style solutions are not obviously advantageous for the HEP use pattern, and they require expensive infrastructure.
  – iSCSI is maturing, but not there yet.
  – Could be great with a global file system, but that technology is not mature either.
  – CERN choice: large scale storage as NAS
    » But exploring SAN for some special uses, e.g. tape->disk transfer
• SCSI vs FC vs SATA
• “In a box” vs server and trays
• “White box” vs major vendor

6 (Some) Options
• SAN vs NAS
• SCSI vs FC vs EIDE/SATA
  – Common view is that EIDE/SATA disks are less reliable.
    » Most reliable platters integrated with higher value electronics
  – No evidence for lower reliability of EIDE vs SCSI at CERN.
    » MTBF for both ~200,000-250,000hrs
    » Historically, bad batch of disks (SCSI or EIDE) every 2-3 years
  – But, note some SATA disks are rated for intermittent operation, others for 24x7 operation.
  – CERN choice: SATA disks rated for 24x7 operation
    » (high capacity 7,200rpm, not lower capacity 10,000rpm)
• “In a box” vs server and trays
• “White box” vs major vendor
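To put the MTBF range on slide 6 into operational terms, it can be converted to an annual failure rate. The sketch below assumes a constant failure rate (exponential lifetimes), which is a textbook simplification and not something the slides claim; it ignores the "bad batch" effect the slide mentions. The function names are illustrative, and the disk count of 1800 is the 2004 SATA purchase from slide 2.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8760


def annual_failure_rate(mtbf_hours: float) -> float:
    """Probability that one disk fails within a year, assuming a constant failure rate."""
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def expected_failures_per_year(n_disks: int, mtbf_hours: float) -> float:
    """Expected number of disk failures per year, with failed disks replaced promptly."""
    return n_disks * HOURS_PER_YEAR / mtbf_hours


# MTBF range from slide 6; 1800 disks is the 2004 SATA purchase from slide 2.
for mtbf in (200_000, 250_000):
    afr = annual_failure_rate(mtbf)
    failures = expected_failures_per_year(1800, mtbf)
    print(f"MTBF {mtbf:,} h: AFR {afr:.1%}, about {failures:.0f} failures/year in 1800 disks")
```

Under this assumption the quoted MTBF range corresponds to roughly 3 to 4 percent of disks failing per year, so disk replacement is a routine event at this scale.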

7 (Some) Options
• SAN vs NAS
• SCSI vs FC vs SATA
• “In a box” vs server and trays
  – Specialist disk trays seen by some as better quality than trays in PC server chassis.
  – Possibly true, but who is responsible if there are communication problems between the disk tray and the server?
  – CERN choice: integrated system from one vendor
    » “storage in a box” has won every tender to date.
    » work specifications to ensure high quality chassis & trays.
• “White box” vs major vendor

8 (Some) Options
• SAN vs NAS
• SCSI vs FC vs SATA
• “In a box” vs server and trays
• “White box” vs major vendor
  – Major vendors claim better reliability…
  – … but are unable to explain how they achieve this
    » the underlying components are generally identical
  – CERN “choice”: free competition and white boxes win
    » but some white boxes are more equal than others; unfortunately CERN rules make prior selection of companies based on proven past performance rather difficult
    » Long term relationship with at least 3 suppliers would be good.

9 RAID and filesystems
• Originally mirrored the disks; redundancy with maximum performance (#independent spindles)
  – mirrored EIDE still cheaper than SCSI per usable byte!
• Gradually became less worried about disk performance
  – required I/O bandwidth per TB falls with each tender;
    » current systems can saturate GigE interface
    » disk sizes continue to increase
    » observed performance still below server capability
• Current CERN choice: hardware RAID5 with XFS
  – Hardware RAID5 performance has greatly improved
  – ReiserFS still immature
  – Some tests of hardware RAID5 with software RAID0; performance poor.
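Two of the trade-offs on slide 9 reduce to simple ratios: how much of the gross capacity survives as usable space under mirroring versus RAID5, and how much network bandwidth a GigE-limited server can offer per TB as its capacity grows. The sketch below is illustrative only; the array width and server capacities are hypothetical, and the GigE figure is the raw line rate with no protocol overhead.

```python
GIGE_MB_PER_S = 1000 / 8  # 1 Gbit/s line rate = 125 MB/s, ignoring protocol overhead


def usable_fraction(scheme: str, disks_per_array: int) -> float:
    """Fraction of gross capacity left after redundancy."""
    if scheme == "mirror":
        return 0.5  # every block is stored twice
    if scheme == "raid5":
        return (disks_per_array - 1) / disks_per_array  # one disk's worth of parity
    raise ValueError(f"unknown scheme: {scheme}")


def bandwidth_per_tb(usable_tb: float, link_mb_per_s: float = GIGE_MB_PER_S) -> float:
    """Network bandwidth available per usable TB when the link is the bottleneck."""
    return link_mb_per_s / usable_tb


# Hypothetical 8-disk array: RAID5 keeps 7/8 of the gross capacity, mirroring keeps half.
for scheme in ("mirror", "raid5"):
    print(f"{scheme}: {usable_fraction(scheme, 8):.1%} usable")

# Hypothetical server capacities: the same GigE link is shared by more and more data.
for tb in (2, 5, 10):
    print(f"{tb} TB server: {bandwidth_per_tb(tb):.1f} MB/s per TB")
```

This shows why RAID5 became attractive once raw disk performance stopped being the concern: it recovers most of the capacity that mirroring gives up, while the single network interface caps throughput either way.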

10 Hardware will fail

11 Hardware will fail
• On delivery or due to systematic h/w problem
  – CERN choice: dual source major procurements
• In service
  – RAIDx
  – Hot spares
    » Probability of 2nd disk failure during RAID array rebuild is
      · a concern for 250GB disks
      · likely a significant problem for 400GB disks
      · a certainty for 1TB disks in large scale installations?
• However, this is a concern for any architecture with an equivalent number of disks.
  – Remember: CERN sees equivalent MTBF figures for SCSI and EIDE disks.
    » Although SCSI disks are lower capacity and higher bandwidth so reducing window for 2nd failure.
  – Be prepared…
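The concern on slide 11 about a second failure during rebuild can be illustrated with a simple model: the rebuild window grows with disk capacity, and the chance that one of the surviving disks fails inside that window grows with it. The sketch below assumes independent disks with a constant failure rate, a hypothetical 12-disk array and a hypothetical sustained rebuild rate of 10 MB/s; the MTBF is the low end of the range quoted on slide 6. It does not model correlated failures (bad batches) or unreadable sectors found during the rebuild, both of which would increase the real risk.

```python
import math


def rebuild_hours(disk_gb: float, rebuild_mb_per_s: float) -> float:
    """Time to rewrite one whole disk at a sustained rebuild rate (1 GB = 1000 MB)."""
    return disk_gb * 1000 / rebuild_mb_per_s / 3600


def p_second_failure(disk_gb: float, surviving_disks: int,
                     mtbf_hours: float, rebuild_mb_per_s: float) -> float:
    """Probability that at least one surviving disk fails before the rebuild ends."""
    window = rebuild_hours(disk_gb, rebuild_mb_per_s)
    return 1 - math.exp(-surviving_disks * window / mtbf_hours)


# Hypothetical array: 12 disks (11 survivors), rebuild at 10 MB/s, MTBF 200,000 h.
for size_gb in (250, 400, 1000):
    window = rebuild_hours(size_gb, 10.0)
    p = p_second_failure(size_gb, surviving_disks=11,
                         mtbf_hours=200_000, rebuild_mb_per_s=10.0)
    print(f"{size_gb:>5} GB: rebuild window {window:5.1f} h, P(2nd failure) = {p:.2e}")
```

At a fixed rebuild rate the per-rebuild probability scales almost linearly with disk capacity. The risk for a large installation is this figure multiplied by the number of rebuilds it performs, which is why it matters more at scale than for a single array.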

12 Summary
• CERN
  – will have a (Gigabit-)Ethernet based NAS configuration for bulk disk storage for LHC
  – is not convinced TCO concerns justify a higher initial purchase cost for SCSI/FC disk
  – buys (and will buy) SATA disk from the lowest bidder, but
    » with as much pre-selection of bidders as we are allowed, and
    » dual sourcing all purchases to minimise risk of major problems due to systematic failures.
    » with warranty (3 years) to encourage initial quality
  – is focussing strongly on
    » redundancy,
    » rigour and organisation in operational procedures, and on
    » anonymity for disk servers, just as for CPU servers.

13 Summary
• CERN
  – will have a (Gigabit-)Ethernet based NAS configuration for bulk disk storage for LHC
  – is not convinced TCO concerns justify a higher initial purchase cost for SCSI/FC disk
  – buys (and will buy) SATA disk from the lowest bidder, but
    » with as much pre-selection of bidders as we are allowed, and
    » dual sourcing all purchases to minimise risk of major problems due to systematic failures.
  – is focussing strongly on
    » redundancy,
    » rigour and organisation in operational procedures, and on
    » anonymity for disk servers, just as for CPU servers.
These points are valid whatever the disk technology!

