Fermilab Mass Storage System. Gene Oleynik, Integrated Administration, Fermilab.

Introduction
- Fermi National Accelerator Laboratory (FNAL)
  - The premier, highest-energy particle accelerator in the world (for about two more years)
  - Accelerates protons and antiprotons in a 4-mile ring and collides them in collision halls to study elementary particle physics
  - Digitized data from the experiments is written into the FNAL Mass Storage System
- This talk covers:
  - Overview of the FNAL Mass Storage System
  - Users and use patterns of the system
  - Details of the MSS, with emphasis on the features that enable scalability and performance

Introduction
- Multi-petabyte tertiary tape store for worldwide HEP and other scientific endeavors: THE site store
- High availability (24x7)
- On-site and off-site (even off-continent) access
- Scalable hardware and software architecture
- Front-end disk caching
- Evolves to meet evolving requirements

Users
- Local HEP experiments: CDF, D0, MINOS, MiniBooNE (> 1 PB/yr, 25 TB/day peak transfers)
- Remote HEP experiments: Tier 1 site for CMS, via Grid from CERN, Switzerland (3.5 PB/yr in 2007+)
- Other remote scientific endeavors: Sloan Digital Sky Survey (DLT tapes written at Apache Point Observatory, New Mexico, are shipped in), Auger (Argentina)
- Quite a few others: lattice QCD theory, for example

Experiments served (slide of logos): the DZero experiment, the CDF experiment, the CMS experiment, the Sloan Digital Sky Survey, and many others (KTeV, MINOS, MiniBooNE, ...)

- Users write RAW data to the MSS, analyze/reanalyze it in real time on PC farms, then write the results back into the MSS
- ~3 bytes read for every byte written
- Distribution of cartridge mounts is skewed toward large numbers of mounts
- Lifetime of data is on the order of 5-10 years or more
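The ~3:1 read-to-write ratio and the ~1 PB/yr ingest quoted on these slides imply the sustained rates the system has to absorb. A minimal back-of-envelope sketch in Python, using only figures from the slides (the seconds-per-year constant is the only thing added):

```python
# Back-of-envelope sketch of the load implied by the slides:
# ~1 PB/yr written and ~3 bytes read for every byte written.
TB = 1e12
PB = 1e15
SECONDS_PER_YEAR = 365 * 24 * 3600

written_per_year = 1 * PB             # from the slides (~1 PB/yr ingest)
read_per_year = 3 * written_per_year  # ~3 bytes read per byte written

avg_write_rate = written_per_year / SECONDS_PER_YEAR / 1e6  # MB/s
avg_read_rate = read_per_year / SECONDS_PER_YEAR / 1e6      # MB/s

print(f"average write rate ~{avg_write_rate:.0f} MB/s")
print(f"average read rate  ~{avg_read_rate:.0f} MB/s")
# Peak days (25 TB/day quoted on the Users slide) sit far above the average:
print(f"25 TB/day ~ {25 * TB / 86400 / 1e6:.0f} MB/s sustained")
```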

Plot of User Activity

Software
- Enstore
  - In-house; manages files, tape volumes, and tape libraries
  - End users' direct interface to files on tape
- dCache
  - Joint DESY and Fermilab project; disk-caching front end
  - End-user interface to read cached files and to write files to Enstore indirectly via dCache
- pnfs
  - DESY file namespace server (NFS interface) used by both Enstore and dCache
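The slide names the two user-facing copy paths: encp straight into Enstore, and dccp (or ftp/GridFTP/SRM) through the dCache front end. Below is a minimal sketch of how a client script might drive them; the /pnfs paths and file names are hypothetical, and only the basic source-to-destination form of each command is assumed, since options differ by site and version:

```python
# Minimal sketch of the two user-facing copy paths described on the slide.
# The /pnfs paths and local filenames here are hypothetical examples;
# command-line options vary by Enstore/dCache version, so only the basic
# "copy source to destination" form is shown.
import subprocess

local_file = "run12345.raw"                               # hypothetical local data file
pnfs_path = "/pnfs/fnal.gov/usr/myexpt/raw/run12345.raw"  # hypothetical namespace path

# Direct path: encp talks to Enstore and writes the file to tape.
subprocess.run(["encp", local_file, pnfs_path], check=True)

# Cached path: dccp goes through dCache, which stages to/from Enstore
# behind the scenes; reads of recently used files come from disk pools.
subprocess.run(["dccp", pnfs_path, "copy_of_run12345.raw"], check=True)
```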

Current Facilities
- 6 StorageTek 9310 libraries (9940B, 9940A drives)
- 1 three-quadratower AML/2 library (LTO-1, LTO-2, DLT; 2 quads implemented)
- > 100 tape drives and associated mover computers
- 25 Enstore server computers
- ~100 dCache nodes with ~225 terabytes of RAID disk
- Allocated among 3 instantiations of Enstore:
  - cdfen (CDF experiment): drives plus plenty of dCache
  - d0en (D0 experiment): drives, 1 AML/2 quad + 20 drives
  - stken (general purpose + CMS): drives, 1 AML/2 quad (DLTs) + 8 drives
- 2.6 PB of data on tape

Architecture
- File based; users are presented with an NFS namespace: ls, rm, etc. work, but the files are stubs
- Copy program to move files in and out of mass storage directly: encp
- Copy programs to move files in and out of mass storage via dCache: dccp, ftp, GridFTP, SRM
- Details of the storage are all hidden behind the file system and copy interfaces
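Because the namespace is an ordinary NFS mount, metadata operations use standard filesystem calls even though the files themselves are stubs. A small sketch of that idea, with a hypothetical mount point:

```python
# Sketch: the namespace behaves like an ordinary filesystem for metadata
# operations, but the "files" are stubs -- data only moves via encp/dccp.
# The mount point below is a hypothetical example.
import os

namespace = "/pnfs/fnal.gov/usr/myexpt/raw"   # hypothetical pnfs mount

for name in sorted(os.listdir(namespace)):    # a plain ls works
    st = os.stat(os.path.join(namespace, name))
    # st.st_size reports the file's recorded size even though the bytes
    # live on tape (or in dCache), not in the stub itself.
    print(f"{name:30s} {st.st_size:>15d} bytes")

# Reading the stub directly would not return the data; a transfer request
# (encp or dccp) is what actually recalls the file from tape or cache.
```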

Architecture
- Based on commodity PCs running Linux, typically dual 2.4 GHz Xeon
- Mover nodes for data transfer
- Server nodes for managing metadata: library, volume, file, pnfs namespace, logs, media changer; external RAID arrays; PostgreSQL
- Administrative, monitor, door, and pool nodes for dCache
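As an illustration of what the metadata servers keep track of, the sketch below models the kind of per-file record one would expect (file id, size, checksum, tape volume, position on the volume). This is not the actual Enstore schema, just a hypothetical stand-in with made-up values:

```python
# Illustrative sketch (not the actual Enstore schema) of the kind of
# per-file metadata the server nodes keep in PostgreSQL: a file id, the
# CRC recorded at write time, the size, and where the copy lives on tape.
from dataclasses import dataclass

@dataclass
class FileRecord:
    bfid: str             # unique file id assigned at write time
    pnfs_path: str        # namespace path the user sees
    size_bytes: int
    crc: int              # checksum recorded when the file was written
    volume: str           # tape volume label holding the copy
    location_cookie: str  # position of the file on that volume

rec = FileRecord(                                # hypothetical values
    bfid="CDMS123456789000000",
    pnfs_path="/pnfs/fnal.gov/usr/myexpt/raw/run12345.raw",
    size_bytes=2_147_483_648,
    crc=0x1A2B3C4D,
    volume="VO1234",
    location_cookie="0000_000000000_0000001",
)
print(rec)
```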

Architecture
- Technology independent
  - Media changers isolate library technology
  - Movers isolate tape drive technology
- Flexible policies for managing requests and accommodating rates
  - Fair share for owners of drives
  - Request priority
  - Family width
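The fair-share and priority policies can be illustrated with a toy request queue: requests carry a priority, but an owner only gets a drive while it is under its fair-share quota. This is purely illustrative and not Enstore's scheduler:

```python
# Toy illustration (not Enstore's code) of the request-scheduling ideas on
# the slide: requests carry a priority, and drive owners get the fair share
# of drives they own before anything else is handed out.
import heapq
from collections import defaultdict

class RequestQueue:
    def __init__(self, fair_share):
        self.fair_share = fair_share          # e.g. {"cdf": 4, "d0": 4}
        self.active = defaultdict(int)        # drives in use per owner
        self._heap = []                       # (-priority, seq, owner, req)
        self._seq = 0

    def submit(self, owner, priority, request):
        heapq.heappush(self._heap, (-priority, self._seq, owner, request))
        self._seq += 1

    def next_request(self):
        """Pick the highest-priority request whose owner is under quota."""
        deferred, chosen = [], None
        while self._heap:
            item = heapq.heappop(self._heap)
            _, _, owner, _ = item
            if self.active[owner] < self.fair_share.get(owner, 0):
                chosen = item
                break
            deferred.append(item)             # over quota: hold for later
        for item in deferred:
            heapq.heappush(self._heap, item)
        if chosen:
            self.active[chosen[2]] += 1
            return chosen[2], chosen[3]
        return None

q = RequestQueue({"cdf": 1, "d0": 1})
q.submit("cdf", priority=10, request="mount VO1234")
q.submit("cdf", priority=20, request="mount VO5678")
q.submit("d0", priority=5, request="mount VOD001")
print(q.next_request())   # cdf's higher-priority request goes first
print(q.next_request())   # d0 gets a drive from its own share
```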

Architecture
- Number of nodes of any type is unrestricted
- Number of active users is unrestricted
- Scale:
  - storage by building out libraries
  - transfer rates by building out movers and tape drives
  - request load by building out servers
- Clients' needs are somewhat unpredictable; the system typically must scale on demand as needed
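A worked example of "scale transfer rates by building out movers and tape drives": starting from the 25 TB/day peak quoted on the Users slide, and assuming a ~30 MB/s per-drive streaming rate and a 50% duty cycle (both assumptions, not figures from the talk), one can estimate how many drives must be streaming concurrently:

```python
# Worked example of "scale transfer rate by adding movers and drives".
# Assumes ~30 MB/s native streaming per 9940B-class drive and a 50% duty
# cycle for mounts, seeks and other overhead -- both are assumptions.
import math

peak_load_tb_per_day = 25            # peak quoted on the Users slide
drive_rate_mb_s = 30                 # assumed per-drive streaming rate
duty_cycle = 0.5                     # assumed fraction of time streaming

required_mb_s = peak_load_tb_per_day * 1e6 / 86400
drives_needed = math.ceil(required_mb_s / (drive_rate_mb_s * duty_cycle))

print(f"peak load      ~{required_mb_s:.0f} MB/s sustained")
print(f"drives needed  ~{drives_needed} concurrently active drives")
# Adding movers/drives raises this ceiling linearly, which is the point of
# the build-out architecture described on the slide.
```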

Managing File & Metadata Integrity
- File protection: procedural policies
  - Files removed by users are not deleted
  - Separation of roles: the owner is in the loop on recycles
  - Physical protection against modification of filled tapes
  - Cloning of problem tapes or tapes with large mount counts
- File protection: software policies
  - Extensive CRC checking throughout all transactions
  - Automated periodic CRC checking of randomly selected files on tapes
  - Flexible read-after-write policy
  - Automated CRC checking of dCache files
  - Automated disabling of access to tapes or tape drives that exhibit read or write problems
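A sketch of the kind of CRC verification the slide describes: checksum a file in chunks and compare against the value recorded at write time. zlib.adler32 is used here as an illustrative choice of checksum; the slide does not say which CRC flavour Enstore records:

```python
# Sketch of the check described on the slide: compute a checksum over a
# file in chunks and compare it with the value recorded at write time.
# zlib.adler32 is an illustrative choice, not necessarily Enstore's CRC.
import zlib

def file_checksum(path, chunk_size=1024 * 1024):
    """Chunked checksum so multi-GB files never need to fit in memory."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.adler32(chunk, crc)
    return crc & 0xFFFFFFFF

def verify(path, recorded_crc):
    ok = file_checksum(path) == recorded_crc
    if not ok:
        # In the production system this is where an alarm would be raised
        # and the tape or file flagged for follow-up by the admins.
        print(f"CRC mismatch for {path}")
    return ok
```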

Managing File & Metadata Integrity
- Metadata protection
  - Redundancy in database information (file, volume, pnfs)
  - RAID
  - Backups/journaling: 1-4 hour cycles, archived to tape
  - Automated database checks
  - Replicated databases (soon)
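A sketch of a cyclic metadata backup in the spirit of the slide (dumps every few hours, staged and then archived to tape). The database names, paths, and interval are assumptions for illustration; the slide does not describe the actual backup tooling:

```python
# Sketch of a cyclic metadata backup along the lines the slide describes
# (databases dumped every few hours, then the dumps archived to tape).
# Database names, paths and the interval are assumptions for illustration.
import subprocess
import time
from datetime import datetime

DATABASES = ["enstoredb", "dcache_billing"]   # hypothetical database names
BACKUP_DIR = "/srv/db-backups"                # hypothetical staging area
INTERVAL_SECONDS = 4 * 3600                   # "1-4 hr cyclic" upper bound

def dump_once():
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    for db in DATABASES:
        outfile = f"{BACKUP_DIR}/{db}-{stamp}.dump"
        # pg_dump custom format (-Fc) produces a compressed, restorable dump.
        subprocess.run(["pg_dump", "-Fc", "-f", outfile, db], check=True)
        # The staged dump would then be written into the MSS itself, e.g.
        # copied to a /pnfs backup area with encp (not shown here).

if __name__ == "__main__":
    while True:
        dump_once()
        time.sleep(INTERVAL_SECONDS)
```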

Migration & technology
- Migrate to denser media to free storage space
- Evolve to newer tape technologies
- A normal part of the workflow; users never lose access to files
- Built-in tools to perform migration, but staff are always in the loop
- Recent efforts:
  - 2004, CDF: 9940A to 200 GB 9940B cartridges; slots and tapes freed for reuse. Only 42 file problems encountered, all fixed
  - stken: Eagle tapes to 200 GB 9940B tapes on the general-purpose stken system, freeing 1000 slots
  - Migration of 9940A to 9940B on stken about to start (2005)
  - And on it goes: LTO-1 to LTO-2 still to migrate
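The migration workflow the slide describes can be sketched as a loop over the files on an old volume: copy, verify, repoint the metadata, and leave recycling of the old tape to the staff. All the helper functions below are stand-ins, not Enstore's migration tools:

```python
# Sketch of the migration workflow the slide describes: copy every file on
# an old (lower-density) volume to a new volume, verify its checksum, swap
# the metadata to point at the new copy, and only then hand the old tape
# back for recycling with a human in the loop.  All helpers are stand-ins.

def files_on_volume(volume):
    """Stand-in: return the file records stored on a tape volume."""
    return []

def copy_to_new_volume(record, destination_volume):
    """Stand-in: stream the file tape-to-tape and return its new location."""
    return {"volume": destination_volume, "location_cookie": "..."}

def checksums_match(record, new_location):
    """Stand-in: re-read the new copy and compare CRCs."""
    return True

def repoint_metadata(record, new_location):
    """Stand-in: atomically update the file record to the new copy."""

def migrate_volume(old_volume, new_volume):
    problems = []
    for record in files_on_volume(old_volume):
        new_location = copy_to_new_volume(record, new_volume)
        if checksums_match(record, new_location):
            repoint_metadata(record, new_location)  # users never lose access
        else:
            problems.append(record)                 # e.g. the 42 CDF cases
    # Staff review the problem list and approve recycling of the old tape;
    # nothing is recycled automatically.
    return problems
```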

Administration & Maintenance
- The most difficult aspect to scale
- Real-time, 24-hours-a-day operation requires diligent monitoring
- 4 administrators, 3 Enstore and 3.5 dCache developers
- 24x7 vendor support on tape libraries and tape drives

Administrators
- Rotate 24x7 on-call primary and secondary
- Monitor and maintain 7 tape libraries with > 100 tape drives, 150 PCs, 100 file servers
- Recycle volumes
- Monitor capacity vs. use
- Clone overused tapes
- Troubleshoot problems
- Install new hardware and software

Administrative support
- Automatic generation of alarms by the Enstore and dCache software; these generate tickets and page administrators
- In-house helpdesk during working hours and call centers off hours generate pages and tickets
- Extensive web-based plots, logs, status, statistics and metrics, alarms, and system information
- Animated graphical view of Enstore resources and data flows

Administration Monitoring
- States of the Enstore servers
- Amount of Enstore resources, such as tape quotas and number of working movers
- User request queues
- Plots of data movement, throughput, and tape mounts
- Volume information
- Generated alarms
- Completed transfers
- Computer uptime and accessibility
- Resource usage (memory, CPU, disk space)
- Status and space utilization on dCache systems
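Two of the routine checks in this list, host accessibility and disk-space usage, can be sketched with the Python standard library alone. Hosts, paths, and thresholds below are illustrative assumptions; the real monitoring feeds the web pages and alarm system rather than a terminal:

```python
# Sketch of two routine checks from the slide: host accessibility and
# disk-space usage.  The host list, paths and thresholds are illustrative
# assumptions only.
import shutil
import socket

HOSTS = [("stkensrv1.fnal.gov", 22)]   # hypothetical server and port
DISKS = {"/srv/dcache-pool": 0.90}     # path -> alarm threshold (fraction full)

def host_reachable(host, port, timeout=5):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_disks():
    for path, threshold in DISKS.items():
        usage = shutil.disk_usage(path)
        frac = usage.used / usage.total
        if frac > threshold:
            print(f"ALARM: {path} is {frac:.0%} full")

for host, port in HOSTS:
    if not host_reachable(host, port):
        print(f"ALARM: {host} unreachable")
check_disks()
```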

Admin metrics, random week
- 9940B, 9940A, 1 LTO-2 drive replaced
- 3 mover interventions
- 4 server interventions
- 2 tape drive interventions
- 2 file server (dCache) interventions
- 3 tape interventions
- 4 file interventions
- 40 tapes labeled/entered
- 2 tapes cloned
- 3 Enstore service requests
- 1 data integrity issue

Administration Observations
- We find that for every drive we own, we replace one per year
- With our usage patterns, we haven't discerned cartridge/drive reliability differences between LTO and 9940
- Lots of tape and drive interventions
- A large distributed system requires complex correlation of information to diagnose problems

Performance (plots)

Conclusion
- Constructed a multi-petabyte tertiary store that can easily scale to meet storage, data-integrity, transaction, and transfer-rate demands
- Can support different and new underlying tape and library technologies
- Current store rate is 1 PB/yr, 20 TB/day; expect it to increase fourfold in 2007 (when the CMS Tier 1 goes online)