1 Data management for ATLAS, ALICE and VOCE in the Czech Republic
L. Fiala, J. Chudoba, J. Kosina, J. Krasova, M. Lokajicek, J. Svec, J. Kmunicek, D. Kouril, L. Matyska, M. Ruda, Z. Salvet, M. Mulac

2 Overview
– Supported VOs (VOCE, ATLAS, ALICE)
– DPM as a choice of SRM-based Storage Element
– Issues encountered with DPM
– Results of transfers
– Conclusion

3 VOCE
– Virtual Organization for Central Europe
– in the scope of the EGEE project
– provision of distributed Grid facilities to non-HEP scientists
– Austrian, Czech, Hungarian, Polish, Slovak and Slovenian resources involved
– the design and implementation of the VOCE infrastructure done solely on Czech resources
ALICE, ATLAS
– Virtual Organizations for the LHC experiments

4 Storage Elements
– Classical disk-based SEs
– Participating in Service Challenge 4, hence the need for an SRM-enabled SE
– No tape storage available for the Grid at the moment – DPM chosen as the SRM-enabled SE
– 1 head node and 1 disk server on the same machine
– Separate nodes with disk servers planned
– 5 TB on 4 filesystems (3 local, 1 NBD)
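For illustration only, a minimal sketch of how a pool layout of this kind might be declared, assuming the standard DPM administration commands of that release; the pool name, server name and mount points here are hypothetical, not the site's actual configuration:

  # create a pool and attach the four filesystems to it
  dpm-addpool --poolname atlaspool
  dpm-addfs --poolname atlaspool --server dpm-se.example.org --fs /storage1
  dpm-addfs --poolname atlaspool --server dpm-se.example.org --fs /storage2
  dpm-addfs --poolname atlaspool --server dpm-se.example.org --fs /storage3
  dpm-addfs --poolname atlaspool --server dpm-se.example.org --fs /nbdpool
  # list the resulting pool/filesystem configuration and free space
  dpm-qryconf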

5 DPM issues – srmCopy()
– DPM does not currently support the srmCopy() method (work in progress)
– When copying from a non-DPM SRM SE to a DPM SE using srmcp, the pushmode=true flag must be used
– Local temporary storage or globus-url-copy can be used to avoid a direct SRM-to-SRM 3rd-party transfer using srmCopy()
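A hedged example of the pushmode workaround described above, assuming the dCache srmcp client syntax of that era; the endpoints, ports and paths are placeholders:

  srmcp -pushmode=true \
    srm://source-se.example.org:8443/pnfs/example.org/data/atlas/file.root \
    srm://dpm-se.example.org:8443/dpm/example.org/home/atlas/file.root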

6 DPM issues – pools on NFS (1)
– Our original setup: disk array attached to an NFS server (64-bit Opteron, Fedora Core with a 2.6 kernel)
– Disk array NFS-mounted on the DPM disk server (no need to install the disk server on Fedora)
– Silent file truncation when copying files from pools located on NFS
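A minimal sketch of this original layout, with hypothetical host names, export path and mount point; the NFS-mounted directory is then declared as an ordinary pool filesystem:

  # on the DPM disk server: mount the array exported by the Fedora NFS server
  mount -t nfs nfs-server.example.org:/export/array /nfspool
  dpm-addfs --poolname atlaspool --server dpm-se.example.org --fs /nfspool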

7 DPM issues – pools on NFS (2)
– Using strace we found that at some point during the copy the process receives an EACCES error from read()
– Unable to reproduce using standard utilities (cp, dd, simple read()/write() programs)
– Problem occurs only with a 2.4 kernel NFS client against a 2.6 kernel server (verified on various versions)
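For context, a hedged sketch of the kind of tracing involved; the process ID and output path are placeholders, and the quoted line only shows the general shape of strace output:

  # attach to the running copy process and log its read()/write() calls
  strace -f -p <pid> -e trace=read,write -o /tmp/copy.trace
  # the failing call then appears in the log roughly as:
  #   read(12, ..., 65536) = -1 EACCES (Permission denied)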

8 DPM issues – pools on NFS (3)
– Problem reported to the DPM developers
– Verified to be an issue also with the new VDT 1.3 (Globus 4, GridFTP 2)
– Our workaround – NBD used instead of NFS
  – Important: DPM requires every filesystem in a pool to be a separate partition (free-space calculation)
  – NBD is a suitable solution for the case of a shared filesystem
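A rough sketch of the NBD-based workaround, assuming the nbd tools of that era (the exact nbd-server/nbd-client syntax varies between releases); the port, device names and mount point are hypothetical:

  # on the machine holding the disk array: export a block device
  nbd-server 2000 /dev/sdb1
  # on the DPM disk server: attach the device, create a filesystem, mount it, add it to the pool
  nbd-client nfs-server.example.org 2000 /dev/nbd0
  mkfs.ext3 /dev/nbd0
  mount /dev/nbd0 /nbdpool
  dpm-addfs --poolname atlaspool --server dpm-se.example.org --fs /nbdpool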

9 DPM issues – rate limiting
– The SRM implementation in DPM currently does not support rate limiting of concurrent new SRM requests (unlike dCache or CASTOR2)
– On the DPM TODO list
– Despite these issues we have quite good results using DPM as an SE for the ATLAS, ALICE and VOCE VOs …

10 ATLAS CSC
– Golias100 receives data from the ATLAS CSC production
– Defined in some lexor (ATLAS LCG executor) instances as a reliable storage element

11 Data transfers via FTS
– CERN – FZU, tested in April using the FTS server at CERN
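As a hedged sketch, an individual FTS transfer of that era could be submitted and polled with the gLite transfer CLI roughly as below; the service endpoint URL, SURLs and job ID are placeholders:

  glite-transfer-submit -s https://fts.example.org:8443/webservices/FileTransfer \
    srm://castor-se.example.org:8443/castor/example.org/atlas/file.root \
    srm://dpm-se.example.org:8443/dpm/example.org/home/atlas/file.root
  # the submit command prints a job identifier, which can then be polled
  glite-transfer-status -s https://fts.example.org:8443/webservices/FileTransfer <jobID>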

12 Data transfers via srmcp
– FTS channel available only to the associated Tier1 (FZK)
– Tests to another Tier1 possible only via transfers issued “by hand”
– Tests SARA – FZU:
  – bulk copy from SARA to FZU, now with only one srmcp command
  – 10 files: max speed 200 Mbps, average 130 Mbps
  – 200 files: only 66 finished, the rest failed due to a “Too many transfers” error
  – speed OK
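A hedged sketch of how such a bulk copy might be expressed as a single srmcp invocation using a copy-job file, assuming the dCache srmcp client of that era; the hosts and paths are hypothetical:

  # each line of the job file lists a source SURL and a destination SURL
  cat > srmcp.jobs <<EOF
  srm://srm.tier1.example.org:8443/pnfs/tier1.example.org/atlas/f001.root srm://dpm-se.example.org:8443/dpm/example.org/home/atlas/f001.root
  srm://srm.tier1.example.org:8443/pnfs/tier1.example.org/atlas/f002.root srm://dpm-se.example.org:8443/dpm/example.org/home/atlas/f002.root
  EOF
  srmcp -pushmode=true -copyjobfile=srmcp.jobs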

13 Tests Tier1 – Tier2 via FTS
– FZU (Prague) is a Tier2 associated to the Tier1 FZK (GridKa, Karlsruhe, Germany)
– FTS (File Transfer Service) operated by the Tier1; channels FZK-FZU and FZU-FZK managed by FZK and FZU
– Tunable parameters:
  – number of files transferred simultaneously
  – number of streams
  – priorities between different VOs (ATLAS, ALICE, DTEAM)

14 Results
– Not stable: transfer of 50 files, each file 1 GB
– Starts fast, then timeouts occur: transfer of 100 files, each file 1 GB, started when load on the Tier1 disk servers was low

15 ATLAS Tier0 test – part of SC4
– Transfers of RAW and AOD data from Tier0 (CERN) to 10 ATLAS Tier1’s and to associated Tier2’s
– Managed by the ATLAS DQ2 system; it uses FTS at Tier0 for Tier0 – Tier1 transfers and the Tier1’s FTS for Tier1 – Tier2 transfers
– First data copied to FZU this Monday
– ALICE plans an FTS transfer test in July

16 Conclusion
– DPM is the only “light-weight” Storage Element available with an SRM frontend
– It has issues, but none of them are “show stoppers” and the code is under active development
– Using DPM, we were able to reach significant and non-trivial transfer results in the scope of LCG SC4

