
1 Advanced Lustre® Infrastructure Monitoring (Resolving the Storage I/O Bottleneck and Managing the Beast). Torben Kling Petersen, PhD, Principal Architect, High Performance Computing

2 The Challenge

3 The REAL challenge
File system: up/down, slow, fragmented, capacity planning, HA (fail-overs etc.)
Hardware: nodes crashing, components breaking, FRUs, disk rebuilds, cables ??
Software: upgrades/patches ??, bugs, clients, quotas, workload optimization
Other: documentation, scalability, power consumption, maintenance windows, back-ups

4 The Answer ??
Tightly integrated solutions: hardware, software, support
Extensive testing
Clear roadmaps
In-depth training
Even more extensive testing …

5 ClusterStor Software Stack Overview
ClusterStor 6000 embedded application server: Intel Sandy Bridge CPU, up to 4 DIMM slots, FDR & 40GbE front-end (F/E), SAS-2 (6G) back-end (B/E), SBB v2 form factor, PCIe Gen-3, embedded RAID & Lustre support
CS 6000 SSU embedded server modules run: ClusterStor Manager, Lustre File System (2.x), Data Protection Layer (RAID 6 / PD-RAID), Linux OS, Unified System Management (GEM-USM)

6 ClusterStor dashboard: problems found

7 Hardware inventory …

8

9 Finding problems ???

10 But things break … especially disk drives … What then ???

11 Let's do some math …
Large systems use many HDDs to deliver both performance and capacity:
– NCSA Blue Waters uses 17,000+ HDDs for the main scratch file system
– At 3% AFR this means 531 HDD failures annually
– That's ~1.5 drives per day!
– RAID 6 rebuild time under load is 24–36 hours
Bottom line: the scratch system would NEVER be fully operational, and there would constantly be a risk of losing additional drives, leading to data loss!
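A minimal sketch of the arithmetic behind these figures. The exact drive count of 17,700 is an assumption chosen so the result lines up with the slide's 531; the slide itself only says "17,000+":

```python
def annual_failures(drive_count: int, afr: float) -> float:
    """Expected number of drive failures per year at a given annual failure rate."""
    return drive_count * afr

drives = 17_700   # assumed count; the slide only says "17,000+" HDDs
afr = 0.03        # the 3% AFR quoted on the slide

per_year = annual_failures(drives, afr)
per_day = per_year / 365

print(f"~{per_year:.0f} failures/year, ~{per_day:.1f} drives/day")
# roughly 531 failures/year and ~1.5 drives/day, matching the slide
```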

12 Drive Technology / Reliability
Xyratex pre-tests all drives used in ClusterStor solutions:
– Each drive is subjected to hours of intense I/O
– Reads and writes are performed to all sectors
– Ambient temperature cycles between 40 °C and 5 °C
– Any drive that survives goes on to additional testing
As a result, Xyratex disk drives deliver proven reliability with less than a 0.3% annual failure rate.
Real-life impact:
– On a large system such as NCSA Blue Waters with 17,000+ disk drives, this means a predicted failure of 50 drives per year
– Other vendors publicly state a failure rate of 3%*, which (given an equivalent number of disk drives) means 500+ drive failures per year
– With a fairly even distribution, the file system will ALWAYS be in a state of rebuild
– In addition, since a file system with wide stripes performs according to its slowest OST, the entire system will always run in degraded mode …
* DDN, Keith Miller, LUG 2012
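To make the "always in a state of rebuild" point concrete, a back-of-the-envelope estimate of the average number of rebuilds running at once. Even failure arrivals, the 17,700-drive count and the 30-hour rebuild window are assumptions for illustration, not Xyratex figures:

```python
# Rough estimate of how many rebuilds are in flight at any given moment,
# assuming failures arrive evenly through the year (an idealisation).

def concurrent_rebuilds(drive_count: int, afr: float, rebuild_hours: float) -> float:
    failures_per_hour = drive_count * afr / (365 * 24)
    return failures_per_hour * rebuild_hours

drives, rebuild_h = 17_700, 30          # mid-point of the 24-36 h window from slide 11
for afr in (0.03, 0.003):               # quoted 3% vs Xyratex 0.3%
    n = concurrent_rebuilds(drives, afr, rebuild_h)
    print(f"AFR {afr:.1%}: ~{n:.2f} rebuilds in progress on average")
# At 3% the pool averages ~1.8 simultaneous rebuilds (never fully clean);
# at 0.3% it is rebuilding less than a fifth of the time.
```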

13 Annual Failure Rate of Xyratex Disks
[Chart: actual AFR data (2012/13) experienced by Xyratex-sourced SAS drives]
Xyratex drive failure rate is less than half of the industry standard!
At 0.3%, the annual failure count would be 53 HDDs.

14 Evolution of HDD Technology: Impact on System Rebuild Time
– As growth in areal density slows (<25% per generation), disk drive manufacturers have to increase the number of heads/platters per drive to keep increasing maximum capacity per drive year over year
– 2TB drives today typically include just 5 heads and 3 platters
– 6TB drives in 2014 will include a minimum of 12 heads and 6 platters
– More components will inevitably result in an increase in disk drive failures in the field
– Therefore, systems using 6TB drives must be able to handle the increase in the number of array rebuild events

15 Why Does HDD Reliability Matter?
– The three key factors to consider are drive reliability, drive size and the rebuild rate of your system
– The scary fact is that new-generation, bigger drives will fail more often
– Drive failures have an even greater impact on file system performance and the risk of data loss with bigger drives such as 6TB or larger: the rebuild window is longer and the risk of data loss is greater
– Traditional RAID technology can take days to rebuild a single failed 6TB drive
– Therefore parity-declustered RAID rebuild technology is essential for any HPC system
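As an illustration of the rebuild-window argument, a simplified probability sketch. The 8+2 layout, the 3% AFR and the independence assumption are chosen for illustration; none of these numbers come from this slide:

```python
# Simplified illustration of why a longer rebuild window increases risk:
# the chance that another drive in the same RAID set fails before the
# rebuild completes. Independent failures are assumed; correlated failures
# and unrecoverable read errors make the real-world picture worse.

def p_further_failure(surviving_drives: int, afr: float, rebuild_hours: float) -> float:
    p_one = afr * rebuild_hours / (365 * 24)        # per-drive failure probability in the window
    return 1 - (1 - p_one) ** surviving_drives

surviving = 9                                       # an 8+2 RAID 6 set with one drive already down
for hours in (10, 36, 72):
    p = p_further_failure(surviving, 0.03, hours)
    print(f"{hours:>3} h rebuild: {p:.2%} chance of a further failure in the set")
# The risk scales roughly linearly with the rebuild window, which is why
# faster (declustered) rebuilds matter as drives grow.
```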

16

17 Parity Declustered RAID - Geometry
PD-RAID geometry for an array is defined as P drives (N+K+A), for example 41 (8+2+2), where:
– P is the total number of disks in the array
– N is the number of data blocks per stripe
– K is the number of parity blocks per stripe
– A is the number of distributed spare disk drives
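A minimal sketch of the P (N+K+A) notation as a data structure, using the 41 (8+2+2) example above. The derived properties are straightforward consequences of the definition, added for illustration rather than ClusterStor-specific figures:

```python
from dataclasses import dataclass

@dataclass
class PDRaidGeometry:
    p: int   # total number of disks in the array
    n: int   # data blocks per stripe
    k: int   # parity blocks per stripe
    a: int   # distributed spare disk drives

    @property
    def stripe_width(self) -> int:
        return self.n + self.k

    @property
    def usable_fraction(self) -> float:
        # share of raw capacity left for data after spares and parity
        return (self.p - self.a) / self.p * self.n / (self.n + self.k)

geom = PDRaidGeometry(p=41, n=8, k=2, a=2)
print(geom.stripe_width, f"{geom.usable_fraction:.0%}")   # 10 and roughly 76%
```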

18 Grid RAID advantage
– Rebuild speed increased by more than 3.5x
– No SSDs, no NV-RAM, no accelerators …
– PD-RAID as it was meant to be …
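Putting the claimed speed-up next to the earlier rebuild window. The 24-36 h figure is from slide 11; combining the two is simple arithmetic of my own, not a number from this slide:

```python
# Traditional RAID 6 rebuild window (slide 11) divided by the claimed >3.5x
# Grid RAID speed-up; purely illustrative arithmetic.
for hours in (24, 36):
    print(f"{hours} h / 3.5 ≈ {hours / 3.5:.0f} h")
# A 24-36 h rebuild would drop to roughly 7-10 h.
```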

19 Thank you ….

