CERN IT Department, CH-1211 Genève 23, Switzerland, www.cern.ch/it
Tape-dev update, Castor F2F meeting, 14/10/09
Nicola Bessone, German Cancio, Steven Murray, Giulia Taurelli


Slide 1: Tape-dev update
Castor F2F meeting, 14/10/09
Nicola Bessone, German Cancio, Steven Murray, Giulia Taurelli

Slide 2: Current Architecture (Reminder – last F2F)
Components: Stager, Drive Scheduler, Tape Server, Disk Servers.
 1. Stager requests a drive
 2. Drive is allocated
 3. Data is transferred to/from disk/tape based on the file list given by the stager
Legend: data, control messages, host. 1 data file = 1 tape file.

Slide 3: New Architecture (Reminder – last F2F)
Components: Stager (with Tape Gateway), Drive Scheduler, Tape Server (with Tape Aggregator), Disk Servers.
 The tape gateway will replace RTCPClientD and will be stateless
 The tape aggregator will wrap RTCPD
 n data files = 1 tape file
Legend: data to be stored, control messages, host, server process(es).

Slide 4: New software
Goals: code refresh (the current code is unmaintained and poorly understood), component reuse (Castor C++ / DB framework), improved (DB) consistency, and enhanced stability and hence performance, as well as ground work for a future new tape format (block-based metadata).
Two new daemons have been developed:
 tapegatewayd (on the stager): replaces rtcpclientd / recaller / migrator
 aggregatord (on the tape server): acts as a proxy or bridge between rtcpd and tapegatewayd (no new tape format yet)
Rewritten migHunter: transactional handling (at stager DB level) of new migration candidates.
German.Cancio@cern.ch

Slide 5: Status
 The software was installed on CERN's stress-test instance (ITDC) about four weeks ago; end-to-end tests and stress tests have started (~20 tape servers, ~25 disk servers).
 So far, significant improvements in terms of stability (no software-related tape unmounts during migrations and recalls).
 However, testing is not completed yet; many issues unveiled by the new software were found along the way (see next slides).
 The new migHunter is to be released ASAP (2.1.9-2, if tests with rtcpclientd are OK).
 The tape gateway + aggregator are to be released in 2.1.9-x as an optional component: not part of the default deployment, and with no dependencies on it from the rest of the CASTOR software.

Slide 6: Test findings (1)
Performance degradations during migrations:
 Already observed in production, but difficult to trace down, as long-lived migration streams rarely occur (cf. Savannah).
 Found to be a misconfiguration in the rtcpd / syslog configuration, causing the volume of generated log messages to grow as O(n²), where n is the number of migrated files.
 Another problem still to be understood is the stager DB time for disk server / file system selection, which tends to grow over a migration's lifetime. We are currently not limited by this, but it could become a bottleneck.
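The quadratic log growth can be illustrated with a toy model (an illustrative assumption; the slides do not give the exact rtcpd logging pattern): if processing the i-th migrated file re-emits a number of log lines proportional to i, the total log volume after n files is n(n+1)/2, i.e. O(n²).

```python
def log_lines_for_file(i):
    """Hypothetical misconfigured logger: migrating file i emits i lines
    (e.g. one line per file migrated so far)."""
    return i

def total_log_lines(n_files):
    """Total log lines after migrating n_files files: 1 + 2 + ... + n."""
    return sum(log_lines_for_file(i) for i in range(1, n_files + 1))

if __name__ == "__main__":
    for n in (10, 100, 1000):
        # Grows quadratically: 55, 5050, 500500
        print(n, total_log_lines(n))
```

With a correctly configured logger the total would instead be O(n), which is why a long-lived migration stream is needed to notice the difference.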

Slide 7: Test findings (2)
Migration slowdown on IBM drives: is Castor at fault? Does it occur towards the end of the tape, or at the end of a mount?

Slide 8: Throughput plots for tape servers Tpsrv150, Tpsrv151, Tpsrv001, Tpsrv235 and Tpsrv204 (23/9/09), and Tpsrv203 and Tpsrv204 (24/9/09).

Slide 9: Test findings (2), continued
Migration slowdown on IBM drives:
 There is a correlation between where on the tape data is being written and the write performance. Confirmed by a Castor-independent test writing Castor-like AUL files.
 Traced down to an IBM hardware-specific issue. After analysis, TapeOps confirmed this to be part of an optimisation on IBM drives called "virtual back hitch" (NVC). This optimisation allows small files to be written at higher speeds by reserving a special cache area on tape, while the tape is not getting full.
 NVC can be switched off, but performance then drops to ~15 MB/s.

Slide 10: Test findings (3)
 Under as-yet unknown circumstances, IBM tapes hit end-of-tape at 10-30% less than their nominal capacity. Read performance on these tapes is also suboptimal.
 This seems related to suboptimal behaviour of NVC / virtual back hitch; it does not occur when NVC is switched off.
 To be reported to IBM.
Figure: reading a tape containing 8222 urandom-generated 100 MB AUL files to /dev/null using dd (X axis: seconds; Y axis: throughput in MB/s).

Slide 11: Test findings (4)
Suboptimal file placement strategy on recalls, which apparently causes interference:
 Recall using the default Castor file placement: 3 tape servers recalling onto 7 disk servers, with all files distributed over all disk servers / file systems.
 Same recall using 2 dedicated disk servers per tape server: 3 tape servers and 6 disk servers (all file systems), which yields ~310-320 MB/s.
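The two placement strategies compared above can be sketched as follows (function names and data shapes are hypothetical; this is not the actual Castor placement code): the default-like strategy spreads every recall stream over all disk servers, while the dedicated strategy confines each tape server's stream to its own subset, avoiding interference between streams.

```python
def spread_placement(files, disk_servers):
    """Default-like strategy: every file round-robins over all disk servers,
    so all recall streams interleave on the same targets."""
    return {f: disk_servers[i % len(disk_servers)]
            for i, f in enumerate(files)}

def dedicated_placement(streams, disk_servers, per_tape_server=2):
    """Dedicated strategy: each tape server's files go only to its own
    fixed subset of disk servers (e.g. 2 per tape server, as tested)."""
    placement = {}
    for t, (tape_server, files) in enumerate(sorted(streams.items())):
        subset = disk_servers[t * per_tape_server:(t + 1) * per_tape_server]
        for i, f in enumerate(files):
            placement[f] = subset[i % len(subset)]
    return placement
```

In the dedicated layout no two tape servers ever write to the same disk server, which matches the observed reduction in interference.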

Slide 12: Test findings (5)
Recall performance limited by a central element (gateway/stager/...?):
 There is a central limitation which prevents performance from going higher than a threshold, even if distinct disk pools are being used.
 Shortly after 21:30, the tape recall on pool 1 finished; the recall performance of the second pool goes up from then on, while the total recall performance (both disk pools) stays at ~255 MB/s.
 No DB / network contention.
Figures: c2itdc total throughput; c2itdc pool 1; c2itdc pool 2.

Slide 13: Test findings (7)
Performance degradation on recalls on new tape server hardware:
 New-generation tape servers (Dell, 4-core) are capable of reading data from tape at a higher rate than rtcpd can process it. This eventually causes the attached drives to stall, and it happens equally whether an IBM or an STK drive is attached.
 The stalling problem does not happen on the older servers (Elonex 2-core, Clustervision), as there the drives read out at lower speeds.
 Traced down (yesterday) to overly verbose logging by the tape positioning executable (posovl) when using the new syslog-based DLF.
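The drive-stall scenario is essentially a producer (the drive) outrunning a consumer (rtcpd). A generic sketch (not the rtcpd implementation; all names are illustrative) shows the standard remedy: a bounded buffer applies backpressure, so the fast producer blocks on put() when the buffer is full rather than overrunning the slow consumer.

```python
import queue
import threading

def run_transfer(n_blocks=100, buffer_blocks=8):
    """Move n_blocks from a fast producer to a slower consumer through a
    bounded queue; put() blocks while the queue is full (backpressure)."""
    buf = queue.Queue(maxsize=buffer_blocks)
    consumed = []

    def drive():                      # fast producer (tape drive)
        for i in range(n_blocks):
            buf.put(i)                # blocks when the buffer is full
        buf.put(None)                 # end-of-stream sentinel

    def consumer():                   # slower consumer (rtcpd-like)
        while True:
            block = buf.get()
            if block is None:
                break
            consumed.append(block)

    threads = [threading.Thread(target=drive),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed
```

In the actual incident the imbalance turned out to be artificial (excess logging slowing the consumer), so fixing the posovl log verbosity removed the stall without any buffering change.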

Slide 14: "tape" bug fixes in 2.1.9
"tape" = repack, VDQM, VMGR, rtcpclientd, rtcpd, taped, and the new components.
 2.1.9-0: https://twiki.cern.ch/twiki/bin/view/DataManagement/CastorReleasePlan21900
 2.1.9-2 (planned): https://twiki.cern.ch/twiki/bin/view/DataManagement/CastorReleasePlan21902
 2.1.9-X: https://twiki.cern.ch/twiki/bin/view/DataManagement/CastorTapeReleasePlan219X

