
CASTOR at RAL in 2016 Rob Appleyard. Contents Current Status Staffing Upgrade plans Questions Conclusion.


1 CASTOR at RAL in 2016 Rob Appleyard

2 Contents Current Status Staffing Upgrade plans Questions Conclusion

3 Current Status

4 RAL:
– Tier 1: four instances, 13 disk pools, 12 PB disk, 14 PB tape
  ATLAS > LHCb > CMS > ALICE/‘Gen’
– Local facilities: small D0T1 disk instance with large (8 PB) tape backend.
– Condor batch farm.
– Running pretty well. Good availability for the last year.

5 Changes: Staffing
– Shaun & Juan have left
– Meet George, the new CASTOR person
– Andrey now sole DBA

6 Changes: Local Facilities
Facilities setup
– Disk cache is many small nodes (11 × 8 TiB)
  Old hardware, but good performance
– User wanted to stage large quantities of data…
  …but it was getting garbage-collected (GC-ed) before the user got around to retrieving it. Sad user.
– Too expensive to scale up with many small nodes
– Mixing old 8 TiB nodes with new big nodes seems like asking for trouble

7 Changes: Local Facilities
Small, high-performance disk cache is great for migration…
– …but not for users who want to stage large amounts of data.
We don’t want to throw away our migration-optimised cache, so we need to find a way to accommodate recalls.

8 Changes: Local Facilities
The solution: dedicated recall cache.
– Few large nodes, total capacity bigger than the maximum anticipated user recall.
– Conventional D0T1 GC
Now have a generic migration cache and 2 recall caches for specific users. Possibly not necessary to have 2. Works OK for now.
– Is there anything we should be aware of?
User wants to run D1T0 and manage his own deletion.
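The sizing argument on the local-facilities slides can be illustrated with a small simulation. This is only a sketch under stated assumptions: the oldest-first eviction rule, the 1 TiB file size, the 120-file recall and the 150 TiB recall cache are all hypothetical values chosen for illustration, and the eviction rule is not a description of CASTOR's actual GC policy. Only the 11 × 8 TiB cache size comes from the slides.

```python
from collections import OrderedDict

TIB = 2**40  # bytes in one tebibyte


def stage(capacity_bytes, files):
    """Stage (name, size) pairs in order; evict oldest-first when full.

    Oldest-first eviction is an assumption made for illustration, not a
    description of CASTOR's actual GC policy.
    """
    cache = OrderedDict()
    used = 0
    for name, size in files:
        if size > capacity_bytes:
            continue  # a single file larger than the cache can never be staged
        while used + size > capacity_bytes:
            _, evicted = cache.popitem(last=False)  # drop the oldest file
            used -= evicted
        cache[name] = size
        used += size
    return list(cache)


# Hypothetical recall: 120 files of 1 TiB each.
recall = [(f"file{i:03d}", TIB) for i in range(120)]

small_cache = 11 * 8 * TIB   # 11 nodes of 8 TiB, as on the slide
recall_cache = 150 * TIB     # hypothetical size, larger than the recall

survivors_small = stage(small_cache, recall)
survivors_recall = stage(recall_cache, recall)

print(len(survivors_small), "of", len(recall), "files survive in the small cache")
print(len(survivors_recall), "of", len(recall), "files survive in the recall cache")
```

With these numbers the small cache has already dropped the earliest files by the time the recall finishes, while a cache sized above the whole recall keeps everything until the user reads it.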

9 Changes: Tier 1
…not a lot, actually
– Run 2 well underway
– Availability generally good
– 2.1.15 Real Soon Now™

10 2.1.15
DB problem turned out to be due to a missing DB link.
– …and now the test instance (mostly) works!
  xroot & rfio access all OK
  SRM access not working… but lcg-del does.
  Have not investigated in any detail due to lack of time this week…
[rvv47345@lcgui04 ~]$ /usr/bin/lcg-cp --vo dteam --defaultsetype srmv2 --nobdii -S PreprodDiskPool srm://lcgsrm08.gridpp.rl.ac.uk:8443/srm/managerv2?SFN=/castor/preprod.ral/preprodDisk /rob/junk151457108769 file:/home/tier1/rvv47345/recall
[SE][StatusOfGetRequest][ETIMEDOUT] httpg://lcgsrm08.gridpp.rl.ac.uk:8443/srm/managerv2: User timeout over
lcg_cp: Connection timed out
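Since the SRM failure above has not yet been investigated, one possible first step is to separate a network-level problem from an SRM-level one. The sketch below is an assumption about how that could be done, not something from the slides: it splits the SURL from the failing command into endpoint and SFN using the Python standard library, and defines a hypothetical helper `port_is_open` that attempts a plain TCP connection. The SURL is copied from the transcript only up to the first space.

```python
import socket
from urllib.parse import urlsplit, parse_qs

# SURL from the slide, taken up to the first space in the transcript.
SURL = "srm://lcgsrm08.gridpp.rl.ac.uk:8443/srm/managerv2?SFN=/castor/preprod.ral/preprodDisk"

parts = urlsplit(SURL)
host, port = parts.hostname, parts.port
sfn = parse_qs(parts.query)["SFN"][0]

print(host, port, parts.path, sfn)


def port_is_open(host, port, timeout=5.0):
    """Return True if a plain TCP connection to host:port succeeds.

    Hypothetical helper for illustration; it says nothing about whether
    the SRM service behind the port is answering requests correctly.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `port_is_open(host, port)` from the same UI node would show whether the endpoint accepts connections at all. If it does and lcg-cp still reports ETIMEDOUT, that would suggest looking at SRM request handling rather than the network, though this is only a heuristic.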

11 Tape Servers
2.1.16-0 on all production tape servers
Some issues, Tim to report by email.
Major hardware issues with one library
Software issues with ACSLS
Roadmap: RH7-based tape servers

12 Hardware
Most new hardware allocated to Echo project
– 2011-generation nodes still in tape-backed service classes feeling a bit creaky
– New hardware acquired to fill the gaps
– Help us keep up with LHC production

13 Tape Robot Problems
Two periods of difficult running: early May & early June. Consult Tim for the full story.
Both libraries (Tier 1 & ‘Facilities’) offline at some point, Tier 1 for longer.
Early May: both elevators in the Tier 1 robot failed
– Moved drives into the Facilities robot to ensure migration
Early June: engineer addressing the previous problem received an electric shock from the robot; robot turned off until confirmed safe

14 Future Plans
SL7 tape servers
…and Ceph gateways
Echo migration…
– More on this later.
– Outline:
  1) Progressively migrate disk-only CASTOR storage to Echo in co-ordination with VOs
  2) Keep D0T1 CASTOR going.
  3) See talk from last time for further detail (‘CASTOR 2017’)

15 Assorted questions from RAL
1. Understood that rfio is being removed. Any estimate of when this will happen?
2. Is there any possibility of running a non-Ceph object store (DDN/Panasas) beneath CASTOR?
   – Question from a curious RAL user, motivation unclear.
3. What access protocols will CASTOR support when running on top of Ceph storage?

