Advanced Lustre® Infrastructure Monitoring (Resolving the Storage I/O Bottleneck and Managing the Beast)
Torben Kling Petersen, PhD, Principal Architect, High Performance Computing
Slide 14: Evolution of HDD Technology and Its Impact on System Rebuild Time
As growth in areal density slows (<25% per generation), disk drive manufacturers must add more heads and platters per drive to keep increasing maximum capacity year over year. Today's 2TB drives typically have just 5 heads and 3 platters; 6TB drives in 2014 will have at least 12 heads and 6 platters. More components per drive will inevitably mean more drive failures in the field, so systems using 6TB drives must be able to handle a larger number of array rebuild events.
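The argument that more components mean more failures can be sketched with a simple series-reliability model. This is an illustrative assumption, not a model from the presentation: heads are treated as independent components, the drive fails if any one of them fails, and the per-head probability `P_HEAD` is a made-up placeholder rather than vendor data. The function name `drive_failure_probability` is hypothetical.

```python
"""Illustrative series-reliability model for a multi-head disk drive.

Assumption (not from the slides): every head fails independently, and the
drive fails if any single head fails. P_HEAD is a made-up placeholder.
"""


def drive_failure_probability(num_components: int, p_component: float) -> float:
    """Probability that at least one of num_components independent parts fails."""
    if not 0.0 <= p_component <= 1.0:
        raise ValueError("p_component must be a probability in [0, 1]")
    # The drive survives only if every component survives: (1 - p) ** n.
    return 1.0 - (1.0 - p_component) ** num_components


if __name__ == "__main__":
    P_HEAD = 0.002  # hypothetical annual failure probability per head
    for label, heads in [("2TB, 5 heads", 5), ("6TB, 12 heads", 12)]:
        print(f"{label}: {drive_failure_probability(heads, P_HEAD):.4%}")
```

Under this model the failure probability grows roughly in proportion to the component count when the per-component probability is small, which is the trend the slide describes.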
Slide 15: Why Does HDD Reliability Matter?
The three key factors to consider are drive reliability, drive size and the rebuild rate of your system. The scary fact is that new-generation, larger drives will fail more often. With drives of 6TB or larger, each failure also has a greater impact on file system performance and on the risk of data loss, because the rebuild window is longer: traditional RAID can take days to rebuild a single failed 6TB drive. Parity declustered RAID rebuild technology is therefore essential for any HPC system.
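The interplay of the three factors can be illustrated with back-of-the-envelope arithmetic: the rebuild window grows with capacity divided by rebuild rate, and the chance of a further failure while the array is degraded grows with that window. This is a minimal sketch, assuming independent drives, a constant failure rate (exponential model) and decimal units; the 50 MB/s rebuild rate, the 2% annualized failure rate and the function names are hypothetical placeholders, not figures from the slides.

```python
import math


def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    """Hours to rebuild one drive at a sustained rebuild rate (decimal TB and MB)."""
    capacity_bytes = capacity_tb * 1e12
    rate_bytes_per_s = rebuild_mb_per_s * 1e6
    return capacity_bytes / rate_bytes_per_s / 3600.0


def p_another_failure(hours: float, surviving_drives: int, afr: float) -> float:
    """Probability that at least one more drive fails during the rebuild window.

    Assumes independent drives with a constant failure rate of `afr`
    failures per drive-year (exponential model, 8760 hours per year).
    """
    rate_per_hour = afr / 8760.0
    return 1.0 - math.exp(-rate_per_hour * hours * surviving_drives)


if __name__ == "__main__":
    AFR = 0.02      # hypothetical 2% annualized failure rate
    SURVIVORS = 9   # e.g. the other drives in an 8+2 RAID 6 group
    for tb in (2, 6):
        window = rebuild_hours(tb, 50)  # hypothetical 50 MB/s rebuild rate
        risk = p_another_failure(window, SURVIVORS, AFR)
        print(f"{tb}TB: {window:.1f} h window, P(another failure) = {risk:.4%}")
```

For a 6TB drive at 50 MB/s the window is 6 × 10^12 bytes / 5 × 10^7 bytes/s = 120,000 s, about 33 hours, three times the window of a 2TB drive rebuilt at the same rate.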
Slide 17: Parity Declustered RAID Geometry
The geometry of a PD-RAID array is written as P drives (N+K+A), for example 41 (8+2+2), where:
P is the total number of drives in the array
N is the number of data blocks per stripe
K is the number of parity blocks per stripe
A is the number of distributed spare drives (spare capacity equivalent to A drives, spread across all P drives)
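The geometry notation maps naturally onto a small data structure. The sketch below is a hypothetical helper (`PDRaidGeometry` is not a real library class) that validates a P (N+K+A) geometry and estimates usable capacity under two stated assumptions: spare capacity equal to A drives is reserved before data is laid out, and the constraint N+K+A ≤ P is used so that stripes still fit on distinct drives after A failures. Metadata and formatting overhead are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PDRaidGeometry:
    """P drives (N+K+A): hypothetical helper, not a vendor API."""

    p: int  # total number of drives in the array
    n: int  # data blocks per stripe
    k: int  # parity blocks per stripe
    a: int  # distributed spares (capacity of A drives spread over all P)

    def __post_init__(self) -> None:
        if min(self.p, self.n, self.k) <= 0 or self.a < 0:
            raise ValueError("P, N, K must be positive and A non-negative")
        # Assumed constraint: after A failures, a full stripe still fits
        # on distinct surviving drives.
        if self.n + self.k + self.a > self.p:
            raise ValueError("N + K + A must not exceed P")

    @property
    def stripe_width(self) -> int:
        return self.n + self.k

    def usable_drive_equivalents(self) -> float:
        """Capacity left for user data, in units of one drive."""
        # Reserve the spare space first; N of every N+K blocks hold data.
        return (self.p - self.a) * self.n / self.stripe_width


if __name__ == "__main__":
    g = PDRaidGeometry(p=41, n=8, k=2, a=2)
    print(g.stripe_width, g.usable_drive_equivalents())
```

For the 41 (8+2+2) example this gives a stripe width of 10 and (41 − 2) × 8/10 = 31.2 drive-equivalents of usable space, roughly 187 TB with 6TB drives under these assumptions.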
Slide 18: Grid RAID Advantage
Rebuild speed increased by more than 3.5x, with no SSDs, no NVRAM and no accelerators: PD-RAID as it was meant to be.
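One way to see why declustering can speed up rebuilds without SSDs or NVRAM is to compare how much I/O the busiest drive must do. The sketch below is an idealized, disk-bandwidth-only model (an assumption, not the vendor's analysis): in a traditional 8+2 group, the N reads per lost block are assumed to be spread evenly over the group's survivors and the whole lost drive is written to one dedicated spare; in a declustered array, both the reads and the writes to distributed spare space are spread over all P − 1 survivors. Controller, network and layout effects are ignored, so the ratio is not a prediction of the measured speedup quoted on the slide. Function names are hypothetical.

```python
def traditional_rebuild_load(capacity_tb: float, n: int, k: int) -> float:
    """Busiest-drive I/O (TB) when rebuilding onto one dedicated spare."""
    survivors = n + k - 1
    # N blocks are read per lost block, assumed spread evenly over survivors.
    read_per_survivor = n * capacity_tb / survivors
    # The entire lost drive is written to a single spare drive.
    write_on_spare = capacity_tb
    # Reads and the spare write hit different drives, so take the larger.
    return max(read_per_survivor, write_on_spare)


def declustered_rebuild_load(capacity_tb: float, n: int, p: int) -> float:
    """Busiest-drive I/O (TB) when reads and spare writes use all P - 1 survivors."""
    survivors = p - 1
    read_per_survivor = n * capacity_tb / survivors
    write_per_survivor = capacity_tb / survivors
    # Every survivor both reads and writes, so the loads add up.
    return read_per_survivor + write_per_survivor


if __name__ == "__main__":
    trad = traditional_rebuild_load(6, n=8, k=2)
    decl = declustered_rebuild_load(6, n=8, p=41)
    print(f"traditional: {trad:.2f} TB, declustered: {decl:.2f} TB, ratio {trad / decl:.2f}")
```

With 6TB drives, the traditional case is bounded by the 6 TB written to the single spare, while each of the 40 survivors in a 41-drive declustered array handles 1.2 TB of reads plus 0.15 TB of writes, a ratio of 40/9 ≈ 4.4 in this simplified model.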