1 Experience of Lustre at a Tier-2 site
Alex Martin and Christopher J. Walker, Queen Mary, University of London

2 Why Lustre?
POSIX compliant, high performance, scalable, free (GPL)
Used on a large fraction of the top supercomputers; able to stripe files if needed (see the striping sketch below)
Scalable: performance should scale with the number of OSTs; tested with 25,000 clients and 450 OSSs (1,000 OSTs); maximum file size 2^64 bytes
Free (GPL): source available (paid support available)
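
Lustre stripes a file across OSTs in fixed-size chunks assigned round-robin, which is why aggregate bandwidth should scale with the number of OSTs. A minimal sketch of that mapping, with illustrative stripe settings rather than the values used at QMUL:

```python
# Sketch: which OST serves a given byte offset of a striped Lustre file.
# Round-robin striping: chunk i of the file lives on OST (first + i) % stripe_count.
# The values below are illustrative defaults, not the QMUL configuration.

STRIPE_SIZE = 1 << 20        # 1 MiB stripe size
STRIPE_COUNT = 4             # file striped over 4 OSTs
FIRST_OST = 0                # index of the first OST in the layout

def ost_for_offset(offset: int) -> int:
    """Return the OST index holding the byte at `offset`."""
    chunk = offset // STRIPE_SIZE
    return (FIRST_OST + chunk) % STRIPE_COUNT

if __name__ == "__main__":
    for off in (0, 512 * 1024, 1 << 20, 5 << 20):
        print(f"offset {off:>9d} -> OST {ost_for_offset(off)}")
```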

3 QMUL 2008 Lustre Setup
12 OSSs (290 TiB), 10GigE
MDS: failover pair
Rack switches: 10GigE uplink
Worker nodes: E5420 (2 × GigE), Opteron (GigE), Xeon (GigE)

4 Throughput vs Number of Machines
2 threads, 1 MB block size; 3.5 GB/s maximum transfer
Probably limited by the network to the racks used
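
The "limited by the network to the racks used" reading can be sanity-checked with simple arithmetic; the rack count below is an assumption for illustration, not a figure from the slide (the 10GigE uplink per rack switch is from slide 3):

```python
# Sketch: compare the measured aggregate transfer rate with a plausible
# network ceiling from the rack uplinks (assumed rack count, for illustration).

GBIT = 1e9 / 8                      # bytes per second in one gigabit per second

racks_used = 4                      # assumption: clients spread over 4 racks
uplink_gbps = 10                    # each rack switch has a 10GigE uplink (slide 3)
network_ceiling = racks_used * uplink_gbps * GBIT

measured = 3.5e9                    # ~3.5 GB/s maximum observed transfer

print(f"network ceiling ~{network_ceiling/1e9:.1f} GB/s, measured {measured/1e9:.1f} GB/s")
print(f"utilisation of rack uplinks: {measured/network_ceiling:.0%}")
```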

5 StoRM Architecture
StoRM compared with a traditional SE
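
The point of StoRM over a traditional SE is that, once the SRM negotiation is done, a job can read the file straight off the POSIX-mounted Lustre filesystem instead of copying it through a transfer door. A hedged sketch of that access pattern; the mount point and file path are hypothetical:

```python
# Sketch: with StoRM + Lustre the worker node sees the storage as a normal
# POSIX filesystem, so a job can open the file in place rather than staging
# a local copy first. The path below is hypothetical.

import os

LUSTRE_MOUNT = "/mnt/lustre"                                        # hypothetical mount point
data_file = os.path.join(LUSTRE_MOUNT, "atlas/data/example.root")   # hypothetical file

def read_in_place(path: str, chunk: int = 1 << 20) -> int:
    """Stream the file directly from Lustre; returns bytes read."""
    total = 0
    with open(path, "rb") as f:
        while block := f.read(chunk):
            total += len(block)
    return total

# A traditional SE would instead require an SRM/gridftp copy to local
# scratch space before the job could open the file.
```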

6 HammerCloud 718
WMS: scales well to about 600 jobs
Events (24 h); 155/4490 job failures (3.4%)

7 2011 Upgrade: Design Criteria
Maximise storage provided: needed ~1 PB
Sufficient performance: we also upgraded from 1500 to ~3000 cores
- Goal: run ~3000 ATLAS analysis jobs with high efficiency
- Storage bandwidth matches compute bandwidth (see the sizing sketch below)
Cost!!!
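
One way to read "storage bandwidth matches compute bandwidth" is as a back-of-the-envelope sizing exercise; the per-job I/O rate below is an assumed figure for illustration, not a number from the slides:

```python
# Sketch: required aggregate storage bandwidth for N concurrent analysis jobs.
# The per-job read rate is an assumption for illustration.

jobs = 3000                      # target concurrent ATLAS analysis jobs
mb_per_s_per_job = 2.0           # assumed average read rate per job (MB/s)

required_gb_per_s = jobs * mb_per_s_per_job / 1000
print(f"~{required_gb_per_s:.0f} GB/s of aggregate storage bandwidth needed")

# With ~60 storage servers each on 4 x 1 GbE (~0.5 GB/s), the farm could in
# principle deliver ~30 GB/s, so the network attachment of the servers is
# the quantity to match against the job mix.
```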

8 Upgrade Design Criteria
Considered both "fat" servers (36 × 2 TB drives) and "thin" servers (12 × 2 TB drives); similar total cost (including networking)
Chose the "thin" solution (see the comparison sketch below):
- more bandwidth
- more flexibility
- one OST per node (although there is currently a 16 TB ext4 limit)
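
The bandwidth argument for "thin" servers can be made concrete: for the same total disk count, more boxes means more network ports per TB. A rough comparison, assuming both options use the same 4 × 1 GbE attachment per server (an assumption for illustration):

```python
# Sketch: network bandwidth per TB for "fat" (36 x 2 TB) vs "thin" (12 x 2 TB)
# servers, assuming each server has the same 4 x 1 GbE network attachment.

def bandwidth_per_tb(drives: int, drive_tb: int = 2, nic_gbps: float = 4.0) -> float:
    """Gb/s of network bandwidth per TB of raw disk in one server."""
    return nic_gbps / (drives * drive_tb)

fat = bandwidth_per_tb(36)   # one big server
thin = bandwidth_per_tb(12)  # one small server

print(f"fat : {fat:.3f} Gb/s per TB")
print(f"thin: {thin:.3f} Gb/s per TB  ({thin/fat:.0f}x more per TB)")
```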

9 New Hardware
60 × Dell R510: 12 × 2 TB SATA disks, H700 RAID controller, 12 GB RAM, 4 × 1 GbE (4 with 10 GbE)
Total ~1.1 PB formatted (integrated with legacy kit to give ~1.4 PB)
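
The quoted ~1.1 PB formatted capacity is consistent with RAID6 over 12 × 2 TB drives per server across 60 servers; a quick check (the filesystem overhead factor is an assumed figure):

```python
# Sketch: formatted capacity of the 2011 purchase.
# RAID6 loses two drives' worth of capacity per array; the filesystem
# overhead factor is an assumption for illustration.

servers = 60
drives_per_server = 12
drive_tb = 2.0
fs_overhead = 0.95            # assumed usable fraction after formatting

raid6_tb = (drives_per_server - 2) * drive_tb          # 20 TB raw per server
usable_pb = servers * raid6_tb * fs_overhead / 1000

print(f"per server: ~{raid6_tb * fs_overhead:.0f} TB usable")   # ~19 TB, matching slide 10
print(f"total     : ~{usable_pb:.2f} PB formatted")
```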

10 Lustre "Brick" (half rack)
HP 2900 switch (legacy): 48 ports (24 storage, 24 compute), 10 Gig uplink (could go to 2)
6 × storage nodes: Dell R510, 4 × GigE, 12 × 2 TB disks (~19 TB RAID6)
12 × compute nodes: 3 × Dell C6100 (each containing 4 motherboards), 2 × GigE
Total of 144 (288) cores and ~110 TB (storage is better coupled to the local CNs)
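
The "brick" is balanced by construction: the storage side and the compute side of each switch present roughly the same aggregate bandwidth. A small sketch of that bookkeeping, using the numbers from the slide:

```python
# Sketch: bandwidth balance inside one Lustre "brick" (half rack).

storage_nodes = 6
storage_nics_gbps = 4          # 4 x GigE per R510

compute_nodes = 12
compute_nics_gbps = 2          # 2 x GigE per compute node

uplink_gbps = 10               # 10 Gig uplink (could go to 2 x 10 Gig)

storage_bw = storage_nodes * storage_nics_gbps   # 24 Gb/s
compute_bw = compute_nodes * compute_nics_gbps   # 24 Gb/s

print(f"storage side: {storage_bw} Gb/s, compute side: {compute_bw} Gb/s")
print(f"uplink: {uplink_gbps} Gb/s -> traffic is best kept inside the brick")
```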

11 Old QMUL Network

12 New QMUL Network
48 × 1 Gig ports per switch: 24 storage, 24 CPU

13 The Real Thing
HEPSPEC06: 2 machines low (power-saving mode)
RAL disk thrashing scripts
1 backplane failure, 2 disk failures
10 Gig cards in x8 slots

14 RAID6 Storage Performance (R510)
Disk: ~600 MB/s
Performance well matched to the 4 × 1 Gb/s network (see the check below)
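
"Well matched" here is simple arithmetic: four bonded gigabit links carry roughly what one RAID6 array can stream. A quick check; the TCP efficiency factor is an assumed figure:

```python
# Sketch: per-server disk bandwidth vs network bandwidth on an R510 OSS.

disk_mb_s = 600                        # measured RAID6 streaming rate (~600 MB/s)
nics = 4                               # 4 x 1 GbE
tcp_efficiency = 0.9                   # assumed usable fraction of line rate

net_mb_s = nics * 1000 / 8 * tcp_efficiency   # ~450 MB/s

print(f"disk ~{disk_mb_s} MB/s vs network ~{net_mb_s:.0f} MB/s")
# The network is the slightly tighter constraint, so the array is never the
# bottleneck when serving clients, which is what "well matched" means here.
```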

15 Lustre "Brick" (+ rack) performance
Preliminary tests using iozone: 1-24 clients, 8 threads/node
Network limit: 6 GB/s
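
The 6 GB/s network limit is consistent with 24 clients each on 2 × GigE (48 Gb/s ≈ 6 GB/s). A sketch of how the aggregate curve saturates at that ceiling; the per-client rate and shape are placeholders for illustration, not the measured data:

```python
# Sketch: aggregate iozone throughput vs client count, saturating at the
# network limit. Per-client rate is a placeholder, not the measured value.

NETWORK_LIMIT_GB_S = 6.0          # quoted network limit for the test
PER_CLIENT_GB_S = 0.25            # assumed per-client rate (2 x GigE ~ 0.25 GB/s)

def aggregate(clients: int) -> float:
    """Linear scaling until the network ceiling is reached."""
    return min(clients * PER_CLIENT_GB_S, NETWORK_LIMIT_GB_S)

for n in (1, 4, 8, 16, 24):
    print(f"{n:>2d} clients -> ~{aggregate(n):.2f} GB/s")
```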

16 Ongoing and Future Work
Need to tune performance
Integrate legacy storage into the new Lustre filestore
Starting to investigate other filesystems, particularly Hadoop

17 Conclusions
Have successfully deployed a ~1 PB Lustre filesystem, using low-cost hardware, with the required performance
Would scale further with more "Bricks", but it would be better if Grid jobs could be localised to a specific "Brick"
Would be better if the storage and CPU could be more closely integrated

18 Conclusions 2
The storage nodes contain 18% of the CPU cores in the cluster, and we spend a lot of effort networking them to the compute nodes
It would be better (and cheaper) if these cores could be used directly for processing the data
This could be achieved using Lustre pools (or another filesystem such as Hadoop); see the sketch below
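
Lustre pools group OSTs by name so that files can be directed at a subset of servers, for example the OSTs inside one "brick"; a job scheduled onto that brick then reads mostly local data. A minimal sketch of the idea, assuming the standard lctl/lfs pool commands and using hypothetical filesystem, pool, and OST names:

```python
# Sketch: create a per-brick OST pool and stripe new files into it, so data
# for a brick's jobs stays on that brick's storage nodes. Filesystem, pool
# and OST names are hypothetical; the commands assume a standard Lustre
# installation with lctl and lfs available on the node.

import subprocess

FSNAME = "lustre"                          # hypothetical filesystem name
POOL = f"{FSNAME}.brick01"                 # hypothetical pool for brick 01
BRICK_OSTS = f"{FSNAME}-OST[0000-0005]"    # the 6 OSTs in brick 01 (hypothetical)

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def create_brick_pool():
    run(["lctl", "pool_new", POOL])              # define the pool on the MGS
    run(["lctl", "pool_add", POOL, BRICK_OSTS])  # populate it with the brick's OSTs

def stripe_dir_into_brick(directory: str):
    # New files created under `directory` are placed on brick 01's OSTs only.
    run(["lfs", "setstripe", "-p", "brick01", directory])
```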


21 WMS Throughput (HC 582)
Scales well to about 600 jobs

22 Overview
Design, Network, Hardware, Performance, StoRM, Conclusions

