Tape Operations Vladimír Bahyl on behalf of IT-DSS-TAB


1 Tape Operations Vladimír Bahyl on behalf of IT-DSS-TAB
CASTOR Review, CERN, 22 September 2010
Focus of the talk:
- Marketing: we are doing well, show how well we are doing things
- Explain that we receive requests and deploy modification requests via change management
- Infrastructure: media and drive problems; we act as a backend
- Everything is controlled: we have monitoring, we have media verification
- The service is well understood and managed
- Database lock contention; better decoupling, better interfaces, more modularity
- Better handling of the dependency between user access and tape activity; move user access out of CASTOR
- Media verification of new and old media
- Optimizing / tuning CASTOR as an archive, not an HSM
- Reduce tape mounts – read efficiency, work with users, improve access patterns
- Software improvements, write efficiency – buffered tape marks
- Report columns: service class, number of repeated mounts, username:group (experiment)

2 Agenda
Overview – hardware, software, people
Success stories – change management, monitoring, data unavailability vs. proactive checking, documentation, problem resolution automation
Challenges – media migration, recalls
Outlook
Conclusion

3 Overview
Hardware
- 4 x Oracle SL8500 libraries, tapes, 70 x T10000B drives
- 3 x IBM TS3500 libraries (+ 2 for backup), tapes, 60 x TS1130 drives
- ~150 tape servers (running SLC5, Quattor managed)
Software
- CASTOR
- ACSLS for the Oracle libraries; SCSI media changer for IBM
- TSM 5
People
- 3 FTE CASTOR Tape Operations Service Managers + 2 FTE for TSM
- 1 external tape operator
- Vendor engineers (2)
CASTOR Tape Service is shared
- CASTOR Tape Development: Stager-Tape interface, Volume Database, Tape Request Queue manager, Tape Server daemons
- 2 FTE – part of the same section, often spending up to 50% of their time on 3rd level support
Additional activities
- Capacity planning / procurement – new call for tender in 2011

4 Success Stories

5 Change management Tape infrastructure autonomous from stagers
Keep it stable – only upgrade when tape related changes (tape) vs (stagers) Always test new version/configuration Validate new setup on few servers in each tape library Wider deployment Announced beforehand Transparent – never disable whole library Changes tracked in Savannah Risk assessment Pre-Approval Notifications

6 Monitoring & Visualisation
LEMON
- Reactive monitoring – tape servers act on simple errors
- Request queue monitored
TapeLog = our central log database
- Data collected from the various tape daemons
- Correlation engine handles complex incidents, e.g. a tape showing errors on several tape drives (sketched below)
- Additional actions recorded by humans: a record of what was done to tapes sent for recovery
- Interface for experts for data mining
SLS for users
- Huge amount of detailed counters with plots
- Structured per stager/experiment
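A minimal sketch of the kind of correlation rule mentioned above: if the same tape volume reports errors on several different drives, the medium rather than a single drive is the likely culprit. The table name, columns and threshold are illustrative assumptions, not the actual TapeLog schema.

```python
# Illustrative correlation rule: flag tapes that produced errors on several
# distinct drives within the last week.  Schema and threshold are assumptions.
import sqlite3
from collections import defaultdict

def suspect_tapes(db_path, min_distinct_drives=3):
    """Return tape VIDs that reported errors on >= min_distinct_drives drives."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT vid, drive_id FROM tape_errors "
        "WHERE error_time > datetime('now', '-7 days')"
    )
    drives_per_tape = defaultdict(set)
    for vid, drive_id in rows:
        drives_per_tape[vid].add(drive_id)
    conn.close()
    return [vid for vid, drives in drives_per_tape.items()
            if len(drives) >= min_distinct_drives]

if __name__ == "__main__":
    for vid in suspect_tapes("tapelog.db"):
        print(f"Tape {vid}: errors on multiple drives -> send for media checking")
```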

7 SLS examples (plots for CMS and ATLAS)

8 Data unavailability
Tapes grow in capacity – currently at 1 TB
More data on tape = more reasons to access it = higher risk of an error (rough illustration below)
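The numbers below are assumptions purely for illustration (a hypothetical per-tape annual fault probability, not a CERN measurement); they only show how quickly the chance of at least one bad tape grows with the size of the archive.

```python
# Illustration only: the per-tape fault probability is an assumed value,
# not a measured CERN figure.
def p_any_fault(per_tape_prob, n_tapes):
    """Probability that at least one of n_tapes develops a fault: 1 - (1 - p)^n."""
    return 1.0 - (1.0 - per_tape_prob) ** n_tapes

for n in (1000, 10000, 45000):
    print(f"{n:>6} tapes -> P(at least one fault) ~ {p_any_fault(1e-4, n):.2f}")
# With an assumed p = 0.0001 per tape and year, an archive of 45,000 tapes is
# almost certain to contain at least one problem - hence proactive checking.
```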

9 Media health-check / tape scrubbing
When users tell you about unavailable data, it is too late = need to be proactive, not reactive
An active archive manager needs to know the state of the data in the archive
Perform periodic checks = read the data back
- All tapes written FULL
- Tapes not accessed for a very long time
The process runs in the background
- Uses resources if available, without overloading the system
- 10 drives in parallel at ~120 MB/s
Daily, weekly, monthly reports
Over time all data in the archive gets read (rough estimate below)
An easy way to detect failures early
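A back-of-the-envelope estimate of how long one full verification pass takes, combining the drive figures above with the ~25 PB archive size quoted on the media-migration slide:

```python
# Rough estimate of a full read-back pass using the figures quoted in the talk:
# ~25 PB archived (slide 13), 10 drives in parallel at ~120 MB/s each.
archive_bytes  = 25e15      # ~25 PB
drives         = 10
rate_per_drive = 120e6      # ~120 MB/s

seconds = archive_bytes / (drives * rate_per_drive)
print(f"~{seconds / 86400:.0f} days of continuous reading")   # roughly 240 days
# So the background scrubbing can sweep the whole archive in well under a year
# while tying up only 10 drives.
```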

10 Documentation
Working Instructions
- Regular periodic tasks: Tape Operations, vendors, administrators
- How to: raise a vendor call, announce interventions
- Media management: physical insertion into and removal from the libraries, pool management (creation, deletion, etc.)
- Drives: installation, configuration, testing and operations
- Libraries and controllers: evaluation, installation, configuration and operations
Problem handling procedures
- Media, drive, library, tape server

11 Problem resolution automation
Failures occur with drives, media and libraries
Goal: no involvement from our side (ideal case)!
Workflow shown on the slide (sketched in code below):
- Tape server reports a tape drive I/O ERROR, triggering the Remedy Problem Management workflow
- Vendor engineer FIXes the tape drive
- Tape operator TESTs the tape drive, puts the tape server back into production and CLOSEs the ticket
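A minimal runnable sketch of that workflow; the helper steps are hypothetical stand-ins, not the real Remedy or Tape Operations tooling.

```python
# Hypothetical sketch of the automated drive-failure workflow: I/O error ->
# vendor ticket -> repair -> test -> server back in production -> ticket closed.
# None of these helpers correspond to real Remedy/CASTOR interfaces.

def open_vendor_ticket(drive_id):
    print(f"[Remedy] ticket opened for drive {drive_id}")
    return {"drive": drive_id, "status": "open"}

def vendor_fixes_drive(ticket):
    print(f"[Vendor] drive {ticket['drive']} repaired")

def operator_tests_drive(drive_id):
    print(f"[Operator] test mount on drive {drive_id} OK")
    return True

def handle_io_error(drive_id):
    """Drive the incident through the workflow with no service-manager involvement."""
    ticket = open_vendor_ticket(drive_id)
    vendor_fixes_drive(ticket)
    if operator_tests_drive(drive_id):
        print(f"[Operator] tape server for drive {drive_id} back in production")
        ticket["status"] = "closed"
    else:
        print(f"[Operator] test failed, ticket stays open for drive {drive_id}")
    return ticket

if __name__ == "__main__":
    handle_io_error("drive-0042")
```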

12 Challenges
Reducing HSM usage -> moving to Archive mode …
Paradigm shift?

13 Media migration
Media migration or “repacking” is required for:
- Recovery from faulty media
- Defragmentation (file deletions)
- Migration to higher-density media generations and/or higher-density tape drives
A costly but necessary operation in order to save library slots and new media spending
Completed migration from 500 GB to 1 TB tapes
- 45,000 tapes – took around a year using 1.5 FTE and up to 40 tape drives running at ~50 MB/s
- ... but that was done during a period when the LHC was not running ...
- ... but that was ~25 PB; next time it will be ~50 PB ...
Improvements needed (see the estimate below):
- Identify network/disk server contention bottlenecks
- Move to 10 Gb/s – at least for the repack infrastructure
- Remove the small file overhead
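A quick sanity check of the quoted figures, counting pure streaming time only; mounts, positioning and contention explain the gap up to “around a year”.

```python
# Sanity check of the repack numbers above: ~25 PB moved with up to 40 drives
# at ~50 MB/s.  Pure streaming time only.
data_bytes     = 25e15      # ~25 PB migrated
drives         = 40
rate_per_drive = 50e6       # ~50 MB/s achieved per drive

days = data_bytes / (drives * rate_per_drive) / 86400
print(f"~{days:.0f} days of pure streaming")          # roughly 145 days

# The next round is ~50 PB; at the same effective rate it takes twice as long,
# hence the push for 10 Gb/s networking and removing the small-file overhead.
```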

14 Small file write overhead
Writing is well understood and managed
- Postpone writing until there is enough data to fill a (substantial fraction of a) tape
- High aggregate transfer rates
- System designed to split the write stream onto several tapes
However, per-drive performance is low
- Disk cache I/O and network contention
- Per-file based tape format – high overhead for writing small files -> “shoe-shining” (see the model below)
- Not fast enough to migrate all data in the archive on a reasonable timescale
Improvements
- Looking at bulk data transfers using “buffered” tape marks
- “Buffered” means no sync and no tape stop – increased throughput and reduced tape wear
- Buffered tape marks have been part of the SCSI standard since SCSI-2 and are available on standard tape drives, but there was no support via the Linux kernel
- Worked with the Linux tape driver maintainer and now have a test driver version; 1 synchronising TM already available in the upcoming release
(Plot on slide: drive performance in MB/s vs. file size in MB for different ethernet speeds.)
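A rough model of the shoe-shining effect described above. The per-sync stop/reposition penalty is an assumed value for illustration only; the point is that with per-file synchronous tape marks the penalty dominates for small files, while buffered tape marks remove it.

```python
# Rough shoe-shining model: every per-file synchronous tape mark stops the
# drive and forces a reposition.  The 2 s penalty per sync is an assumed
# illustrative value, not a measured figure.
def effective_rate(file_size_mb, native_rate_mbs=120.0, sync_penalty_s=2.0):
    """Effective MB/s when every file ends with a synchronous tape mark."""
    transfer_s = file_size_mb / native_rate_mbs
    return file_size_mb / (transfer_s + sync_penalty_s)

for size_mb in (10, 100, 1000, 10000):
    print(f"{size_mb:>6} MB files -> ~{effective_rate(size_mb):6.1f} MB/s")
# Small files spend most of their time stopping and restarting the drive; with
# buffered tape marks the sync penalty disappears for most files and the drive
# streams close to its native rate regardless of file size.
```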

15 Recalls at random – identification
Random READ access on tape
- Current system is file based
- With experiment data sets containing 1000s of files, these are spread across many tapes
- Users asking for files not on disk cause (almost) random file recalls
Many tapes get mounted but the average number of files read per mount is very low
- Few files per mount -> drives busy only for a short time -> up to 9K mounts/day
- Low effective transfer rates
Until recently it was complicated to trace tape mounts back to users
Now – a twice-daily report on what is going on:
- Short incident report to the service managers
- Long report archived for future comparison (a sketch of such a report follows below)
(Table on slide: report with columns STAGER, USER:GROUP, #MOUNTS, #TAPES, RATIO, Avg#FILES, AvgFILESIZE(MB); example rows for PUBLIC, ALICE, LHCb and CMS users.)
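A minimal sketch of how such a per-user mount report could be produced. The input format (one CSV record per tape mount with stager, user:group, tape VID and files read) is a hypothetical stand-in for the real accounting data.

```python
# Hypothetical mount-report generator: group tape mounts per stager and
# user:group, count mounts and distinct tapes, and compute files per mount.
import csv
from collections import defaultdict

def mount_report(path):
    stats = defaultdict(lambda: {"mounts": 0, "tapes": set(), "files": 0})
    with open(path, newline="") as f:
        for rec in csv.DictReader(f):            # columns: stager,user_group,vid,files_read
            key = (rec["stager"], rec["user_group"])
            stats[key]["mounts"] += 1
            stats[key]["tapes"].add(rec["vid"])
            stats[key]["files"] += int(rec["files_read"])

    print(f"{'STAGER':8} {'USER:GROUP':20} {'#MOUNTS':>8} {'#TAPES':>7} {'RATIO':>6} {'AvgFILES':>9}")
    for (stager, ug), s in sorted(stats.items(), key=lambda kv: -kv[1]["mounts"]):
        ratio = s["mounts"] / len(s["tapes"])    # mounts per distinct tape
        avg_files = s["files"] / s["mounts"]     # few files per mount = inefficient
        print(f"{stager:8} {ug:20} {s['mounts']:>8} {len(s['tapes']):>7} {ratio:>6.1f} {avg_files:>9.1f}")

mount_report("tape_mounts.csv")   # e.g. the last 12 hours of mount records
```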

16 Recalls at random – actions
Follow up with VOs and users in case of irregular or inefficient tape usage
- Potentially a never-ending activity
Re-initiated software developments for better control of read efficiency via policies
- Grouping of requests (see the sketch below)
- Ceilings for concurrent tape usage
Investigate larger disk caches to reduce the load on tape; move to tape for archive only, not per-user / per-file HSM
- A mismatch between disk pool size and actual activity can cause high load on the tape infrastructure, affecting everybody
- Periodically review the disk pool setup with the experiments
Increase tape storage granularity from files to data sets, or use co-location
Make CASTOR into a more efficient archive: move away from HSM mode to Archive mode
- Bring down the number of mounts
- Optimise read access using policies / restrictions
- Will increase reliability
Disk buffer sizes should be correct so that live data is kept on disk … if not, this impacts the tape infrastructure …
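A minimal sketch of the “grouping of requests” idea, assuming a hypothetical list of pending recall requests (this is not the actual stager policy code): queue requests per tape so that one mount serves many files, and cap how many tapes are in use at once.

```python
# Hypothetical recall-grouping policy: serve the most-requested tapes first,
# read each tape's files in on-tape order, and limit concurrent tape usage.
from collections import defaultdict

def group_recalls(requests, max_concurrent_tapes=5):
    """requests: iterable of (file_path, tape_vid, position_on_tape)."""
    by_tape = defaultdict(list)
    for path, vid, pos in requests:
        by_tape[vid].append((pos, path))

    # Mount the tapes that satisfy the most requests first, and read each
    # tape's files in on-tape order to avoid back-and-forth positioning.
    ordered = sorted(by_tape.items(), key=lambda kv: -len(kv[1]))
    for vid, files in ordered[:max_concurrent_tapes]:
        yield vid, [path for pos, path in sorted(files)]

requests = [("/castor/a", "VID001", 42), ("/castor/b", "VID002", 7),
            ("/castor/c", "VID001", 3),  ("/castor/d", "VID001", 99)]
for vid, files in group_recalls(requests):
    print(vid, files)   # VID001 is served with one mount for three files
```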

17 Outlook
No replacement for tape on the horizon for large long-term archives
- Neither at CERN nor outside
We expect to grow at ~20 PB/year with the LHC running
- Around ~10 PB in 2012 when the LHC is stopped
Need to shift from the HSM model towards an archive model
- Use tape for bulk transfers only, not for random file access
- “Housekeeping” traffic in an archive is proportional to its size and will exceed the LHC-generated traffic at some point
Actively looking at potential alternatives, such as GPFS/TSM

18 Conclusion
The Tape Service is well understood and managed
- It is successfully coping with the LHC data
Change management system
- Test changes before deploying them widely
Detailed monitoring
- Provides information from various viewpoints
- Good data simplifies incident follow-up
Problem handling procedures
- Minimize the load on the service managers whenever possible
Plans exist for upgrades and expansions

