Experiences and Challenges running CERN's High-Capacity Tape Archive
CHEP 2015, Okinawa, 14/4/2015
Germán Cancio, Vladimír Bahyl

Experiences and Challenges running CERN's High-Capacity Tape Archive
Germán Cancio, Vladimír Bahyl, Daniele Kruse, Julien Leduc, Eric Cano, Steven Murray
Presented by Eric Cano
CHEP 2015, Okinawa, 14/4/2015

Outline
- Tape at CERN
- Large-scale media migration
- Archive reliability
- Software and management tools
- LHC Run-2 and beyond
- Conclusion

Tape at CERN: Overview
- Data: ~100 PB physics data (CASTOR), ~7 PB backup (TSM)
- Tape libraries: IBM TS3500 (3+2), Oracle SL8500 (4)
- Tape drives: 100 archive, 50 backup
- Capacity: ~ slots, ~ tapes
(Chart: CASTOR read and write volumes per year, 15 PB / 23 PB / 27 PB)

Large scale media migration
Challenge: ~100 PB of data
- 2013: ~ tapes; 2015: ~ tapes
- Verify all data after write
- 3x (300 PB!) pumped through the infrastructure (read -> write -> read)
- Liberate library slots for new cartridges
- Decommission ~ obsolete tape cartridges
Constraints:
- Be transparent for experiment activities
- Exploit the high speeds of the new tape drives
- Preserve temporal collocation
- Finish before LHC Run 2 starts (see the throughput sketch below)
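To put the 300 PB of repack traffic in perspective, here is a back-of-envelope throughput estimate. It is a minimal sketch: the ~15-month repack window and the ~250 MB/s per-drive streaming rate are assumptions for illustration, not figures from the slides.

```python
# Back-of-envelope estimate of the sustained throughput needed for the repack.
# Assumptions (not from the slides): ~15-month window before LHC Run 2 and an
# average per-drive streaming rate of ~250 MB/s.

total_traffic_pb = 300                  # 3x ~100 PB pumped through the infrastructure
window_seconds = 15 * 30 * 24 * 3600    # assumed ~15-month repack window
drive_rate_mb_s = 250                   # assumed average per-drive rate

total_mb = total_traffic_pb * 1e9       # 1 PB = 1e9 MB (decimal units)
sustained_mb_s = total_mb / window_seconds
drives_needed = sustained_mb_s / drive_rate_mb_s

print(f"Sustained throughput: {sustained_mb_s / 1e3:.1f} GB/s")   # roughly 7-8 GB/s
print(f"Equivalent busy drives: {drives_needed:.0f}")             # a few dozen drives
```

Under these assumptions, several dozen of the ~100 archive drives are continuously busy with repack, which is why exploiting the speed of the new drives while staying transparent to experiment activity were explicit constraints.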

Large scale media migration (timeline)
(Chart: repack timeline alongside LHC Run 1; Part 1: Oracle T10000D, Part 2: IBM TS1150. Almost done! Deadline: LHC Run 2 starts soon!)

Dust incident
- Identified 13 tapes in one library affected by concrete or foam particles
- Isolated the incident by verifying all other tapes in the building
- Recovered 94% of the files with custom low-level tools and vendor recovery; 113 files lost
- Fruitful exchanges with other tape sites on computer centre protective measures (access and activity restrictions, special clothing, air filters etc.)
- Library cleaning by a specialist company envisaged
- Prototyped a dust sensor to be installed inside the libraries (RPi + Arduino + sensor), using cheap commodity components, achieving industrial precision and reaction time; a monitoring sketch follows below
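The slides do not describe the prototype's software, so the following is purely illustrative. It assumes the Arduino streams one particle-concentration reading per line over USB serial and that the RPi polls it with the pyserial package; the device node, line format and alert threshold are all assumptions.

```python
# Illustrative only: poll an Arduino-attached dust sensor over USB serial and
# warn when the particle concentration exceeds a threshold. The serial device
# name, line format and threshold are assumptions, not from the presentation.
import serial  # pyserial

PORT = "/dev/ttyACM0"       # hypothetical device node of the Arduino
THRESHOLD_UG_M3 = 50.0      # hypothetical alert threshold (micrograms per m^3)

def monitor():
    with serial.Serial(PORT, 9600, timeout=5) as link:
        while True:
            line = link.readline().decode(errors="ignore").strip()
            if not line:
                continue                  # read timeout, keep polling
            try:
                dust = float(line)        # assumed format: one value per line
            except ValueError:
                continue                  # ignore malformed readings
            if dust > THRESHOLD_UG_M3:
                print(f"ALERT: dust level {dust:.1f} ug/m3 inside library")

if __name__ == "__main__":
    monitor()
```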

Archive Reliability
Bit-preservation techniques to improve archive reliability:
- Annual bit loss rate: O( )
- Systematic verification of freshly written + "cold" tapes
- Less physical strain on tapes (HSM access, buffered tape marks)
- With new hardware/media, differences between vendors are getting small
- For smaller experiments, creating dual copies on separate libraries / buildings
- No losses
Working on support for SCSI-4 Logical Block Protection:
- Protects against link-level errors, e.g. bit flips
- Data blocks are shipped to the tape drive with a pre-calculated CRC
- The CRC is re-calculated by the drive (read-after-write) and stored on media; the CRC is checked again on reading
- Minimal overhead (<1%)
- Supported by LTO and enterprise drives
(An illustrative end-to-end CRC check is sketched below.)
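The end-to-end principle behind Logical Block Protection can be illustrated with a few lines of code. This is only a sketch: real LBP uses a CRC-32C defined by the SCSI standards and the re-check is done inside the drive; zlib.crc32 stands in for it here.

```python
# Sketch of the Logical Block Protection idea: the host attaches a CRC to each
# block before it is shipped to the drive; the drive re-computes it after
# writing and again on reading, so a bit flip on the link or on media is caught.
# zlib.crc32 is a stand-in for the CRC-32C used by the actual standard.
import zlib

def protect(block: bytes) -> tuple[bytes, int]:
    """Host side: compute the protection CRC before sending the block."""
    return block, zlib.crc32(block)

def drive_check(block: bytes, crc: int) -> bool:
    """Drive side (conceptually): re-compute and compare the CRC."""
    return zlib.crc32(block) == crc

block, crc = protect(b"physics event data ...")
assert drive_check(block, crc)              # clean transfer passes

corrupted = b"\x00" + block[1:]             # simulate a single-byte corruption
assert not drive_check(corrupted, crc)      # the mismatch is detected
```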

Software and management tools
New CASTOR tape software developed and deployed in production:
- Completely redesigned architecture, moved from C to C++
- Improved error detection / handling, full support for SCSI tape alerts, soon LBP
- Support for multiple access protocols (RFIO, XROOT), soon Ceph
- More details: cf. poster by E. Cano in Session B
- Investigating a direct-to-tape backend for EOS (avoiding the double disk layer)
Re-engineered Tape Incident System:
- Taking advantage of full SCSI tape alerts
- Automated problem identification: tape vs. drive vs. library
- Better detection of the root cause -> catch problems and disable faulty elements earlier
- Comprehensive media repair workflow
(A classification sketch based on tape alerts follows below.)
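A simplified sketch of the tape-vs-drive-vs-library classification the re-engineered incident system performs. The flag-to-category mapping below is illustrative only; the authoritative flag numbers and meanings come from the TapeAlert specification and operational experience, not from this table.

```python
# Illustrative classification of SCSI TapeAlert flags into incident categories.
# The sets below are placeholders; real flag numbers and semantics are defined
# by the TapeAlert specification.
MEDIA_FLAGS = {3, 4, 5, 6, 7}     # e.g. hard errors, media read/write failures
DRIVE_FLAGS = {30, 31, 32}        # e.g. drive hardware or interface errors
LIBRARY_FLAGS = {40, 41, 42}      # e.g. loader / changer problems

def classify(active_flags: set[int]) -> str:
    """Return which faulty element should be disabled and ticketed first."""
    if active_flags & LIBRARY_FLAGS:
        return "library"
    if active_flags & DRIVE_FLAGS:
        return "drive"
    if active_flags & MEDIA_FLAGS:
        return "tape"
    return "unknown"

print(classify({4, 20}))   # -> "tape": disable the cartridge, open a media ticket
print(classify({31}))      # -> "drive": take the drive out of production
```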

LHC Run-2 and beyond (1)
- Run-2: expecting ~50 PB/year of new data (LHC + non-LHC), i.e. +7K tapes/year; CERN now has ~35'000 free library slots
- Run-3 (to 2022): ~150 PB/year
- Run-4 (2023 onwards): 600 PB/year! Peak rates of ~80 GB/s
(A quick tape-count check is sketched below.)
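The per-year cartridge count follows directly from the deck's own numbers; a quick cross-check, assuming the ~8.5 TB Oracle T10000D cartridge capacity quoted on the roadmap slide:

```python
# Cross-check of the "+7K tapes / year" figure using numbers from the slides.
# Cartridge capacity of 8.5 TB (Oracle T10000D) is taken from the roadmap slide.
new_data_pb_per_year = 50
cartridge_tb = 8.5
free_slots = 35_000

tapes_per_year = new_data_pb_per_year * 1000 / cartridge_tb
years_of_headroom = free_slots / tapes_per_year

print(f"~{tapes_per_year:.0f} new cartridges per year")    # ~5900, consistent with +7K
print(f"~{years_of_headroom:.1f} years of free library slot headroom")
```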

LHC Run-2 and beyond (2)
Technology/market forecast (…risky for 15 years!)
INSIC Roadmap (source: INSIC):
- +30%/yr tape capacity per $ (+20%/yr I/O increase)
- +20%/yr disk capacity per $
(Chart annotations: 86 Gbit/in² demo (~154 TB), 125 Gbit/in² demo (~185 TB), Oracle T10KD (8.5 TB), IBM TS1150 (10 TB))
(A compound-growth projection is sketched below.)
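The roadmap figures translate into the following compound-growth projection; a sketch using only the growth rates quoted on the slide, with capacities normalised to 1 at the start.

```python
# Compound-growth projection of the INSIC roadmap figures quoted on the slide:
# +30%/yr tape capacity per $ versus +20%/yr disk capacity per $.
tape_growth, disk_growth = 1.30, 1.20

for years in (5, 10, 15):
    tape = tape_growth ** years
    disk = disk_growth ** years
    print(f"{years:2d} years: tape x{tape:5.1f}, disk x{disk:5.1f}, "
          f"tape/disk advantage x{tape / disk:.1f}")
```

Over the 15-year horizon the slide warns about, the quoted rates would widen tape's capacity-per-dollar advantage over disk by roughly a factor of three, which is the argument for keeping tape at the core of the archive.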

Conclusion
- CERN's Tape Archive is at the core of physics data storage and archiving
- Successfully dealt with LHC Run-1 and a large media migration during the Long Shutdown
- Improving reliability and bit-level data preservation has become a key and long-term activity
- Focus on having the archive infrastructure, software and tools ready and scalable for LHC Run-2 and beyond

Reserve slides

Repack tape drive usage, one week
(Chart: drive usage by media generation (1 TB drives with "old" pre-2011 data, 4 TB IBM drives, 5 TB Oracle drives, 8 TB Oracle drives) and by activity (repack read, repack write, verification, VO write))

Integrity: Media problem workflow
- Elaborate and well-established workflow for dealing with media problems
- Several repair levels (from local to vendor)
- Different latencies (hours to months)
- Users are kept informed about status and recovery progress

Tape incidents, by type

Integrity: Addressing Media Defects and Wear
- Cartridges with read/write errors are automatically identified and then decommissioned
- A tape cartridge mounted too often (containing "popular" data) can wear out and suffer mechanical damage
- Introduced automated decommissioning of media mounted >= 5000 times (enterprise-class media sustains ~15K full reads according to specs)
- Monitoring disables tapes mounted "too frequently" and notifies operators
- The tape is automatically disabled and a ticket is generated for media migration
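A minimal sketch of the mount-count check described above. Only the 5000-mount threshold comes from the slide; list_tapes(), disable_tape() and open_repack_ticket() are hypothetical stand-ins for the real catalogue and operations interfaces.

```python
# Sketch of the automated decommissioning of heavily mounted cartridges:
# tapes mounted >= 5000 times are disabled and queued for media migration.
# The three callables are hypothetical placeholders, not real CASTOR APIs.
MOUNT_LIMIT = 5000   # threshold from the slide; specs quote ~15K full reads for enterprise media

def decommission_worn_media(list_tapes, disable_tape, open_repack_ticket):
    for tape in list_tapes():                  # e.g. [{"vid": "T12345", "mounts": 5120}, ...]
        if tape["mounts"] >= MOUNT_LIMIT:
            disable_tape(tape["vid"])          # no further user mounts allowed
            open_repack_ticket(tape["vid"])    # operators migrate the data to fresh media
```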