
1 Fermilab Mass Storage System
Gene Oleynik, Integrated Administration, Fermilab
4/12/2005

2 Introduction
Fermi National Accelerator Laboratory (FNAL): the premier, highest-energy particle accelerator in the world (for about two more years).
Accelerates protons and antiprotons around a 4-mile ring and collides them in collision halls to study elementary particle physics.
Digitized data from the experiments is written into the FNAL Mass Storage System (MSS).
This talk: an overview of the FNAL Mass Storage System, its users and usage patterns, and details on the MSS with emphasis on features that enable scalability and performance.

3 Introduction
Multi-petabyte tertiary tape store for worldwide HEP and other scientific endeavors: THE site store.
High availability (24x7).
On-site and off-site (off-continent) access.
Scalable hardware and software architecture.
Front-end disk caching.
Evolves to meet evolving requirements.

4 Users
Local HEP experiments: CDF, D0, MINOS, MiniBooNE (> 1 PB/yr, 25 TB/day peak transfers).
Remote HEP experiments: Tier 1 site for CMS via Grid from CERN, Switzerland (3.5 PB/yr from 2007 on).
Other remote scientific endeavors: Sloan Digital Sky Survey (ships DLT tapes written at Apache Point Observatory, New Mexico), Auger (Argentina).
Quite a few others: Lattice QCD theory, for example.

5 The DZero experiment, the CDF experiment, the CMS experiment, the Sloan Digital Sky Survey, and many others (KTeV, MINOS, MiniBooNE, …).

6 Users write RAW data to the MSS, analyze/reanalyze it in real time on PC farms, then write results back into the MSS.
~3 bytes read for every byte written.
Distribution of cartridge mounts is skewed toward large numbers of mounts.
Lifetime of the data is on the order of 5-10 years or more.

7 Plot of User Activity

8 Software
enstore: in-house software that manages files, tape volumes, and tape libraries; the end user's direct interface to files on tape.
dCache: joint DESY and Fermilab disk-caching front end; the end-user interface for reading cached files and for writing files to enstore indirectly via dCache.
pnfs: DESY file namespace server (NFS interface) used by both enstore and dCache.

9 Current Facilities
6 StorageTek 9310 libraries (9940B, 9940A).
1 three-quadratower AML/2 library (LTO-1, LTO-2, DLT; 2 quads implemented).
> 100 tape drives and associated mover computers.
25 enstore server computers.
~100 dCache nodes with ~225 terabytes of RAID disk.
Allocated among 3 instantiations of enstore:
cdfen (CDF experiment): 2 9310s + 40 drives + plenty of dCache.
d0en (D0 experiment): 2 9310s + 40 drives, plus 1 AML/2 quad + 20 drives.
stken (general purpose + CMS): 2 9310s + 23 drives, plus 1 AML/2 quad (DLTs) + 8 drives.
Over 25,000 tapes with 2.6 PB of data.

10 Architecture
File based: users are presented with an NFS namespace (ls, rm, etc.), but the files are stubs.
A copy program, encp, moves files in and out of mass storage directly.
Files can also be moved in and out via dCache, using dccp, ftp, gridftp, or SRM.
Details of the storage are hidden behind the file system and copy interfaces (a usage sketch follows below).
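To make the copy-in/copy-out model concrete, here is a minimal Python sketch that wraps encp with subprocess. It assumes only what the slide states: encp takes a source and a destination, like cp, and the stored file appears under an ordinary-looking /pnfs path. The paths, file names, and helper functions are hypothetical.

```python
#!/usr/bin/env python
# Minimal sketch of the copy-in/copy-out model described above.
# Assumes encp behaves like cp between a local path and the /pnfs
# namespace; the paths below are hypothetical.
import subprocess
import sys

def store(local_path, pnfs_path):
    """Copy a local file into the MSS through the pnfs namespace."""
    # encp moves the data to tape; the pnfs entry that appears
    # afterwards is only a metadata stub, not the data itself.
    return subprocess.call(["encp", local_path, pnfs_path])

def fetch(pnfs_path, local_path):
    """Stage a file back out of the MSS to local disk."""
    return subprocess.call(["encp", pnfs_path, local_path])

if __name__ == "__main__":
    # e.g. store raw data, then read it back for analysis
    rc = store("run12345.raw", "/pnfs/myexpt/raw/run12345.raw")
    sys.exit(rc or fetch("/pnfs/myexpt/raw/run12345.raw", "scratch/run12345.raw"))
```

Reads through dCache (dccp, gridftp, SRM) follow the same pattern, with the cache deciding whether a tape mount is needed at all.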

11 Architecture
Based on commodity PCs running Linux, typically dual 2.4 GHz Xeons.
Mover nodes for data transfer.
Server nodes for managing metadata: library, volume, file, pnfs namespace, logs, media changer; external RAID arrays; PostgreSQL.
Administrative, monitor, door, and pool nodes for dCache.

12 Architecture
Technology independent:
Media changers isolate library technology.
Movers isolate tape drive technology.
Flexible policies for managing requests and accommodating rates (a scheduling sketch follows below):
Fair share for owners of drives.
Request priority.
Family width.
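As a rough illustration of how these policies could interact, the following Python sketch picks the next request to schedule while honoring fair share and family width. It is a conceptual sketch under assumed quota numbers, not the actual enstore scheduler.

```python
# Conceptual sketch of fair share, request priority, and family width.
# Quota numbers, class names, and data structures are hypothetical.

FAMILY_WIDTH = 2                                  # assumed: max drives serving one file family at once
FAIR_SHARE = {"cdf": 0.5, "d0": 0.3, "gpd": 0.2}  # hypothetical drive-share quotas per owner

class Request(object):
    def __init__(self, owner, family, priority):
        self.owner, self.family, self.priority = owner, family, priority

def pick_next(queue, busy_by_owner, busy_by_family, total_drives):
    """Return the highest-priority request that honors fair share and family width."""
    for req in sorted(queue, key=lambda r: r.priority, reverse=True):
        if busy_by_owner.get(req.owner, 0) >= FAIR_SHARE.get(req.owner, 0.0) * total_drives:
            continue                              # owner already at its fair share of drives
        if busy_by_family.get(req.family, 0) >= FAMILY_WIDTH:
            continue                              # family width limit reached for this file family
        return req
    return None                                   # nothing schedulable right now
```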

13 Architecture
Number of nodes of any type is unrestricted.
Number of active users is unrestricted.
Scale storage by building out libraries.
Scale transfer rates by building out movers and tape drives.
Scale request load by building out servers.
Clients' needs are somewhat unpredictable, so the system typically must scale on demand.

14 Managing File & Metadata Integrity
File-protection procedural policies:
Files removed by users are not deleted.
Separation of roles: the owner is in the loop on recycles.
Physical protection against modification of filled tapes.
Cloning of problem tapes or tapes with large mount counts.
File-protection software policies (a spot-check sketch follows below):
Extensive CRC checking throughout all transactions.
Automated periodic CRC checking of randomly selected files on tape.
Flexible read-after-write policy.
Automated CRC checking of dCache files.
Automated disabling of access to tapes or tape drives that exhibit read or write problems.
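The periodic CRC spot check can be pictured with the following Python sketch: it re-reads a random sample of files and compares a streamed adler32-style checksum against the value stored with the file's metadata. This is a conceptual sketch; the catalog structure and staged paths are hypothetical, not the enstore implementation.

```python
# Conceptual sketch of periodic CRC spot-checking of files already on tape.
# Assumes files can be staged back to disk and that the expected checksum
# is recorded in the file metadata; catalog entries are hypothetical dicts.
import random
import zlib

def adler32_of(path, chunk=1 << 20):
    """Compute an adler32 checksum by streaming the file in 1 MB chunks."""
    crc = 1
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk)
            if not block:
                break
            crc = zlib.adler32(block, crc)
    return crc & 0xffffffff

def spot_check(catalog, sample_size=10):
    """Re-read a random sample of files and compare against stored CRCs."""
    bad = []
    for entry in random.sample(catalog, min(sample_size, len(catalog))):
        if adler32_of(entry["staged_path"]) != entry["stored_crc"]:
            bad.append(entry)      # would raise an alarm / open a ticket
    return bad
```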

15 Managing File & Metadata Integrity
Metadata protection (a backup sketch follows below):
Redundancy in database information (file, volume, pnfs).
RAID.
Backups/journaling: 1-4 hour cycles, archived to tape.
Automated database checks.
Replicated databases (soon).
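A minimal sketch of the cyclic backup idea, assuming the PostgreSQL metadata databases mentioned on slide 11 and the standard pg_dump tool. The database names, output directory, and 4-hour cycle are illustrative; the real procedure also archives the dumps to tape.

```python
# Minimal sketch of cyclic metadata backups with pg_dump.
# Database names, paths, and cycle length are illustrative assumptions.
import subprocess
import time
from datetime import datetime

DATABASES = ["enstoredb", "accounting"]        # hypothetical database names
BACKUP_DIR = "/srv/db-backups"                 # hypothetical dump location
CYCLE_SECONDS = 4 * 3600                       # "1-4 hr cyclic" per the slide

def dump_all():
    stamp = datetime.now().strftime("%Y%m%d-%H%M")
    for db in DATABASES:
        out = "%s/%s-%s.dump" % (BACKUP_DIR, db, stamp)
        # pg_dump -Fc produces a compressed, restorable archive of one database
        subprocess.check_call(["pg_dump", "-Fc", "-f", out, db])

if __name__ == "__main__":
    while True:
        dump_all()
        time.sleep(CYCLE_SECONDS)
```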

16 Migration & technology
Migrate to denser media to free storage space.
Evolve to newer tape technologies.
A normal part of the workflow: users never lose access to files.
Built-in tools perform the migration, but staff are always in the loop (a migration sketch follows below).
Recent efforts:
2004, CDF: 4457 60 GB 9940A cartridges migrated to 200 GB 9940B cartridges; 3000 slots and tapes freed for reuse; only 42 file problems encountered, all fixed.
2004-2005, stken: 1240 20 GB Eagle tapes migrated to 200 GB 9940B tapes on the general-purpose stken system, freeing 1000 slots.
2005: migration of 9940A to 9940B on stken about to start.
And on it goes: LTO-1 to LTO-2 still to migrate.
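The migrate-and-verify loop can be sketched as below. The catalog, read, write, and CRC helpers are hypothetical stand-ins for the built-in enstore migration tools; only the flow comes from the slide: copy each file to denser media, verify it, repoint the metadata so the user-visible path never changes, and leave problem files for staff.

```python
# Conceptual sketch of media migration with verification.
# All helpers (catalog, read_file, write_file, crc_of) are hypothetical.
def migrate_volume(old_volume, new_volume, catalog, read_file, write_file, crc_of):
    problems = []
    for meta in catalog.files_on(old_volume):
        data = read_file(old_volume, meta)           # stage file from the old tape
        if crc_of(data) != meta.crc:
            problems.append(meta)                    # flag for staff follow-up
            continue
        new_location = write_file(new_volume, data)  # rewrite onto denser media
        catalog.repoint(meta, new_location)          # users keep the same pnfs path
    if not problems:
        catalog.mark_recyclable(old_volume)          # slot and tape freed for reuse
    return problems
```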

17 Administration & Maintenance
The most difficult aspect to scale.
Real-time, 24 hours/day operation requires diligent monitoring.
4 administrators, 3 enstore developers, 3.5 dCache developers.
24x7 vendor support on tape libraries and tape drives.

18 Administrators
Rotate 24x7 on-call primary and secondary.
Monitor and maintain 7 tape libraries with > 100 tape drives, 150 PCs, and 100 file servers.
Recycle volumes.
Monitor capacity vs. use.
Clone overused tapes.
Troubleshoot problems.
Install new hardware and software.

19 Administrative support
Automatic generation of alarms by the Enstore and dCache software; alarms generate tickets and page administrators (a dispatch sketch follows below).
In-house helpdesk during working hours and call centers off hours generate pages and tickets.
Extensive web-based plots, logs, status, statistics and metrics, alarms, and system information.
Animated graphical view of Enstore resources and data flows.
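A minimal sketch of the alarm-to-ticket/page flow, assuming hypothetical open_ticket and send_page helpers; only the flow itself (alarm raised, ticket generated, on-call administrator paged) comes from the slide.

```python
# Conceptual sketch of alarm dispatch: alarm -> ticket -> page on-call admin.
# The ticket system, paging gateway, and severity names are assumptions.
SEVERITY_PAGE = {"critical", "error"}    # assumed severities that trigger a page

def handle_alarm(alarm, open_ticket, send_page, on_call):
    """Turn a software alarm into a ticket and, if severe, a page."""
    ticket_id = open_ticket(
        summary="%s: %s" % (alarm["source"], alarm["message"]),
        details=alarm,
    )
    if alarm.get("severity") in SEVERITY_PAGE:
        send_page(on_call["primary"], "ticket %s: %s" % (ticket_id, alarm["message"]))
    return ticket_id
```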

20 Administration Monitoring
States of the Enstore servers.
Amount of Enstore resources, such as tape quotas and number of working movers.
User request queues.
Plots of data movement, throughput, and tape mounts.
Volume information.
Generated alarms.
Completed transfers.
Computer uptime and accessibility.
Resource usage (memory, CPU, disk space).
Status and space utilization of the dCache systems.

21 Admin metrics, random week
1 9940B, 1 9940A, and 1 LTO-2 drive replaced.
3 mover interventions.
4 server interventions.
2 tape drive interventions.
2 file server (dCache) interventions.
3 tape interventions.
4 file interventions.
40 tapes labeled/entered.
2 tapes cloned.
3 Enstore service requests.
1 data integrity issue.

22 Administration Observations
We find that for every drive we own, we replace about one per year.
With our usage patterns, we haven't discerned cartridge/drive reliability differences between LTO and 9940.
Lots of tape and drive interventions.
A large distributed system requires complex correlation of information to diagnose problems.

23 Performance

24 Conclusion
Constructed a multi-petabyte tertiary store that can easily scale to meet storage, data integrity, transaction, and transfer rate demands.
Can support different and new underlying tape and library technologies.
The current store is 2.6 PB, growing at 1 PB/yr and 20 TB/day; we expect a fourfold increase in 2007 when the CMS Tier 1 goes online.

