NWfs: A ubiquitous, scalable content management system with grid-enabled, cross-site data replication and active storage. R. Scott Studham.


1 NWfs: A ubiquitous, scalable content management system with grid-enabled, cross-site data replication and active storage. R. Scott Studham

2 Science Drivers
Three different domains with different requirements:
- High Performance Computing (Chemistry)
  - Low storage volumes (10 TB)
  - High-performance storage (>500 MB/s per client, GB/s aggregate)
  - POSIX access
- High Throughput Proteomics (Biology)
  - Large storage volumes (PBs), and exploding
  - Write once, read rarely if used as an archive
  - Modest latency okay (<10 s to data)
  - If analysis could be done in place, it would require faster storage
- Atmospheric Radiation Measurement (Climate)
  - Modest-sized storage requirements (100s of TB)
  - Shared with the community and replicated to ORNL

3 Overview
The proteomics-driven storage explosion is causing us to:
- Develop filesystems that enable lower-cost hardware
  - Continued writes on fileserver failure (route around it)
  - Mirrored fileservers so we can use direct-attached disk
- Increase filesystem technology to meet the scalability and performance metrics needed by the science
  - 10,000+ clients accessing a POSIX 10+ PB filesystem
  - >500 MB/s single-client rate
- Add advanced technologies into the filesystem to increase performance and make it “smarter”
  - Scalable content management
  - Move the computation into the storage
It must work in production (not a research project).

4 EMSL’s Current Storage Strategy
EMSL’s storage strategy has focused on capacity.
- We want to be here; our storage sales reps want us there.
- We use tools like Lustre to help us bridge this gap.
[Chart: estimated $/TB as a function of time and technology]

5 EMSL’s Current Storage Strategy
Developing filesystems that enable lower-cost hardware.
Our experience has shown that expensive disks fail about as often as cheap disks. We have a large sampling of disks:
- 1,000 FC-SAN drives making a 53 TB filesystem
  - 20% duty cycle. The drives don’t fail much (1-3 disks per month).
  - Entire filesystem (all 1,000 drives) down once every two months, mostly due to vendor-required firmware updates to SAN switches or hardware failures.
- 7,500 SCSI drives providing 1/2 PB of scratch space
  - 100% duty cycle. Average ~3 disk failures per day (should be 0.5 per day).
  - Experiencing bugs in the Seagate disks.
- 1,000 ATA/SAN drives providing a 200 TB archive
  - 10% duty cycle. Average 1-3 disk failures per month.
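The expected-versus-observed failure rates above follow from simple MTBF arithmetic. A minimal sketch, assuming failures are uniform over time (the exact MTBF figures the slide assumed are not given, so the numbers below are merely what its rates imply):

```python
# Sketch of the failure-rate arithmetic behind the SCSI-drive numbers.
# Assumption: uniform failures, so failures/day = drives * 24 / MTBF_hours.

def failures_per_day(n_drives: int, mtbf_hours: float) -> float:
    """Expected daily failures for a drive population with a given MTBF."""
    return n_drives * 24 / mtbf_hours

def implied_mtbf_hours(n_drives: int, failures_day: float) -> float:
    """Invert the relation: what MTBF does an observed failure rate imply?"""
    return n_drives * 24 / failures_day

# The expectation of ~0.5 failures/day over 7,500 drives implies:
print(implied_mtbf_hours(7500, 0.5))  # → 360000.0 hours MTBF
# The observed ~3 failures/day implies a much lower effective MTBF:
print(implied_mtbf_hours(7500, 3.0))  # → 60000.0 hours
```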

6 NWfs Hardware: Low-Cost, High-Performance Storage
We have replaced all our tapes with low-cost ATA storage.
The NWfs project includes Lustre, cluster management tools, minor metadata-capturing tools, and a custom client-side GUI to support GridFTP, striped, and parallel data transfers.
Linux-based OSTs containing:
- 2 CPUs & RAM
- Multiple 3ware ATA RAID adapters
- 16 hot-swap SATA disk drives
- RAID5 with multiple hot spares per node
- $3.5K/TB after RAID5
- Infiniband 4X backbone
- New SATA drives include rotational vibration safeguard
400 TB ≈ $1.5M
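The two cost figures on the slide are consistent with each other, as a quick back-of-envelope check shows (both inputs are taken from the slide):

```python
# Sanity check: does $3.5K/TB after RAID5 match "400 TB ≈ $1.5M"?
cost_per_tb = 3500   # $/TB of usable capacity after RAID5 (from the slide)
capacity_tb = 400    # target filesystem size in TB (from the slide)
total = cost_per_tb * capacity_tb
print(f"${total:,}")  # → $1,400,000, i.e. roughly the slide's $1.5M
```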

7 Increasing filesystem technology to meet the scalability and performance metrics needed by the science
- Lustre has been in full production since last August and is used for aggressive I/O from our supercomputer.
  - Highly stable, but still hard to manage.
- We are expanding our use of Lustre to act as the filesystem for our archival storage, deploying a ~400 TB filesystem.
- 660 MB/s from a single client with a simple “dd” is faster than any local or global filesystem we have tested.
- We are finally in the era where global filesystems provide faster access.
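A minimal sketch of the kind of single-client streaming-read measurement the “simple dd” refers to; the function name, path, and block size here are illustrative assumptions, not part of the original benchmark:

```python
# Rough Python equivalent of: dd if=<file on the filesystem> of=/dev/null bs=1M
# Reads a file sequentially and reports throughput in MB/s.
import time

def stream_read_mb_per_s(path: str, block_size: int = 1 << 20) -> float:
    """Sequentially read `path` in block_size chunks; return MB/s."""
    total = 0
    start = time.monotonic()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.monotonic() - start
    return total / (1 << 20) / elapsed
```

Such a streaming read exercises only sequential bandwidth, which is why it is a reasonable stand-in for “dd” when comparing local and global filesystems.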

8 EMSL’s Current Storage Strategy: Scalable Content Management
[Architecture diagram: a client talks to a metadata-server cluster (MetaData) and to storage pools (Storage Pool 1, Remote Storage Pool 2) holding indexes (Index 2, Index 3, Remote Index).]

9 EMSL’s Current Storage Strategy
This looks a lot like Lustre.
[Diagram: client, MDS, and OSTs holding Index 2 and Index 3.]

10 EMSL’s Current Storage Strategy
Add replication to support DAS & collaboration.
[Diagram: client, MDS, and an OST holding Index 2 and Index 3, plus a replicated Remote Index.]

11 Active Storage
Moving the computation into the storage rather than moving the data to the compute power.
- Classical storage: data stream → parallel file system → reassemble & post-process. Classic parallel filesystems stripe at the block level, so the distributed data must be reassembled in order to post-process it.
- Active storage: data stream → post-process in an object-based parallel file system. PNNL is developing code that allows post-processing to be performed on objects inside the filesystem, making use of the computational power on the file servers.
Demonstrated a 1.3 GB/s FT stream.
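The contrast between the two models can be sketched in a few lines. This is an illustrative toy, not the actual NWfs/Lustre server-side mechanism: each worker thread stands in for a file server holding one object, and only small results cross the “network” in the active case.

```python
# Toy contrast between classical and active-storage post-processing.
# All names here are illustrative assumptions, not NWfs APIs.
from concurrent.futures import ThreadPoolExecutor

def classic(objects, process):
    # Classical model: reassemble all the striped data on the client,
    # then post-process the whole thing there.
    reassembled = b"".join(objects)
    return process(reassembled)

def active(objects, process, combine):
    # Active-storage model: run `process` where each object lives
    # (one worker per object stands in for one file server), then
    # combine the small per-object results on the client.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(process, objects))
    return combine(partials)

# Example: counting bytes. The classical path moves 600 bytes to the
# client; the active path moves back only three integers.
data = [b"a" * 100, b"b" * 200, b"c" * 300]
assert classic(data, len) == active(data, len, sum) == 600
```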

12 EMSL’s Current Storage Strategy
NWfs v3.0: Lustre with replication, content management, and active storage.
[Diagram: client and API, MDS, and OSTs holding Index 2 and Index 3, plus a Remote Index.]

