Data management for ATLAS, ALICE and VOCE in the Czech Republic
L. Fiala, J. Chudoba, J. Kosina, J. Krasova, M. Lokajicek, J. Svec, J. Kmunicek, D. Kouril, L. Matyska, M. Ruda, Z. Salvet, M. Mulac

Overview
– Supported VOs (VOCE, ATLAS, ALICE)
– DPM as the choice of SRM-based Storage Element
– Issues encountered with DPM
– Results of transfers
– Conclusion

VOCE
– Virtual Organization for Central Europe, in the scope of the EGEE project
– provides distributed Grid facilities to non-HEP scientists
– Austrian, Czech, Hungarian, Polish, Slovak and Slovenian resources involved
– the design and implementation of the VOCE infrastructure was done solely on Czech resources
ALICE, ATLAS
– Virtual Organizations for the LHC experiments

Storage Elements
– classical disk-based SEs
– participation in Service Challenge 4 creates the need for an SRM-enabled SE
– no tape storage available for the Grid at the moment – DPM chosen as the SRM-enabled SE
– 1 head node and 1 disk server, currently on the same machine; separate disk-server nodes planned
– 5 TB on 4 filesystems (3 local, 1 NBD)
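For context (not shown on the slides), a layout like this is normally declared with the DPM administration commands. The sketch below is illustrative only: the pool name, server hostname and filesystem paths are placeholders, and the option names are assumptions to verify against the installed dpm-addpool/dpm-addfs.

```python
import subprocess

POOL_NAME = "Permanent"                      # illustrative pool name
DISK_SERVER = "<dpm-disk-server-hostname>"   # placeholder hostname
FILESYSTEMS = ["/storage1", "/storage2", "/storage3", "/nbd0"]  # 3 local + 1 NBD, as above

# Create the pool, then register each filesystem with it.  The option names
# (--poolname, --server, --fs) are assumptions to check against the DPM admin guide.
subprocess.run(["dpm-addpool", "--poolname", POOL_NAME], check=True)
for fs in FILESYSTEMS:
    subprocess.run(["dpm-addfs", "--poolname", POOL_NAME,
                    "--server", DISK_SERVER, "--fs", fs], check=True)
```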

DPM issues – srmCopy()
– DPM does not currently support the srmCopy() method (work in progress)
– when copying from a non-DPM SRM SE to a DPM SE using srmcp, the pushmode=true flag must be used
– local temporary storage or globus-url-copy can be used to avoid a direct SRM-to-SRM third-party transfer via srmCopy()
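To make the workaround concrete (our illustration, with placeholder URLs): because the DPM end cannot perform srmCopy(), the client either pushes from the source SRM or stages the file through local storage with plain GridFTP.

```python
import subprocess

SRC_SRM = "srm://<source-se>/<path>"       # non-DPM SRM SE (placeholder)
DST_SRM = "srm://<dpm-head-node>/<path>"   # DPM SE (placeholder)

# Option 1: client-driven srmcp in push mode, as required when the destination is DPM.
subprocess.run(["srmcp", "-pushmode=true", SRC_SRM, DST_SRM], check=True)

# Option 2: stage through local temporary storage with globus-url-copy,
# avoiding any SRM-to-SRM third-party copy.
subprocess.run(["globus-url-copy",
                "gsiftp://<source-gridftp-door>/<path>", "file:///tmp/stage.tmp"], check=True)
subprocess.run(["globus-url-copy",
                "file:///tmp/stage.tmp", "gsiftp://<dpm-disk-server>/<path>"], check=True)
```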

DPM issues – pools on NFS (1)
– our original setup: a disk array attached to an NFS server (64-bit Opteron, Fedora Core with a 2.6 kernel)
– disk array NFS-mounted on the DPM disk server (no need to install the disk server software on Fedora)
– silent file truncation when copying files from pools located on NFS

DPM issues – pools on NFS (2)
– using strace we found that, at some point during the copy, the copying process receives an EACCES error from read()
– unable to reproduce with standard utilities (cp, dd, simple read()/write() programs)
– the problem appears only with a 2.4 kernel on the client and a 2.6 kernel on the server (verified on various versions)
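A "simple read() program" of the kind mentioned can look like the sketch below (our reconstruction, not the original test code): it reads a file from the NFS-mounted pool in fixed-size chunks and reports the offset and errno if read() ever fails.

```python
import errno
import os
import sys

def read_all(path, chunk=1 << 20):
    """Read the whole file in 1 MB chunks and report exactly where read() fails."""
    fd = os.open(path, os.O_RDONLY)
    total = 0
    try:
        while True:
            try:
                data = os.read(fd, chunk)
            except OSError as e:
                print(f"read() failed at offset {total}: errno {e.errno} "
                      f"({errno.errorcode.get(e.errno, '?')})")
                raise
            if not data:
                break
            total += len(data)
    finally:
        os.close(fd)
    print(f"read {total} bytes without error")

if __name__ == "__main__":
    read_all(sys.argv[1])   # path to a file on the NFS-mounted pool
```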

DPM issues – pools on NFS (3)
– problem reported to the DPM developers
– verified to be an issue also with the new VDT 1.3 (Globus 4, GridFTP 2)
– our workaround: NBD used instead of NFS
  – important: DPM requires every filesystem in a pool to be a separate partition (for the free-space calculation)
  – NBD is a suitable solution for the case of a shared filesystem
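The separate-partition requirement can be verified with a few lines of stdlib Python (our illustration; the filesystem paths are placeholders): DPM accounts free space per filesystem, so two pool filesystems on the same partition would report the same free space twice.

```python
import os

# Illustrative pool filesystem paths; on a real head node the list would come
# from the DPM configuration (e.g. dpm-qryconf output).
POOL_FILESYSTEMS = ["/storage1", "/storage2", "/storage3", "/nbd0"]

def check_separate_partitions(paths):
    """Warn if two pool filesystems share an underlying device, which would
    make DPM's per-filesystem free-space accounting double-count space."""
    seen = {}
    for path in paths:
        dev = os.stat(path).st_dev            # device ID of the mounted filesystem
        st = os.statvfs(path)                 # the free-space figures DPM relies on
        free_gb = st.f_bavail * st.f_frsize / 1e9
        print(f"{path}: device {dev}, {free_gb:.1f} GB free")
        if dev in seen:
            print(f"  WARNING: {path} shares a partition with {seen[dev]}")
        seen.setdefault(dev, path)

if __name__ == "__main__":
    check_separate_partitions(POOL_FILESYSTEMS)
```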

DPM issues – rate limiting
– the SRM implementation in DPM does not currently support rate limiting of concurrent new SRM requests (unlike dCache or CASTOR2)
– on the DPM TODO list
– besides these issues, we have quite good results using DPM as an SE for the ATLAS, ALICE and VOCE VOs…
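Until the server can throttle requests itself, the client side can cap its own concurrency. This is our sketch, not something from the slides: srmcp is assumed to be on the PATH, a valid Grid proxy is assumed to exist, and the SURL pairs passed to copy_all() are placeholders.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 10  # keep well below the load at which the SE starts refusing requests

def copy_one(src_surl, dst_surl):
    """Run a single srmcp transfer and return its exit code."""
    return subprocess.run(["srmcp", src_surl, dst_surl]).returncode

def copy_all(pairs):
    """Copy (source, destination) SURL pairs with a client-side cap on concurrency."""
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
        return list(pool.map(lambda p: copy_one(*p), pairs))

# Hypothetical usage:
# copy_all([("srm://<source-se>/<path>", "srm://<dpm-head-node>/<path>")])
```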

ATLAS CSC
– Golias100 receives data from the ATLAS CSC production
– defined in some lexor (ATLAS LCG executor) instances as a reliable storage element

Data transfers via FTS
– CERN – FZU, tested in April using the FTS server at CERN
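For illustration (not from the slides), such a transfer is typically submitted with the gLite transfer client; the endpoint URL and SURLs below are placeholders, and the service path should be checked against the site's FTS installation.

```python
import subprocess

# Placeholder endpoint and SURLs; real values depend on the site installation.
FTS_ENDPOINT = "https://<fts-server>:8443/glite-data-transfer-fts/services/FileTransfer"
SRC = "srm://<source-se>/<path>"
DST = "srm://<dpm-head-node>/<path>"

# glite-transfer-submit prints a job identifier that can later be polled.
job_id = subprocess.run(
    ["glite-transfer-submit", "-s", FTS_ENDPOINT, SRC, DST],
    capture_output=True, text=True, check=True,
).stdout.strip()

status = subprocess.run(
    ["glite-transfer-status", "-s", FTS_ENDPOINT, job_id],
    capture_output=True, text=True, check=True,
).stdout.strip()
print(job_id, status)
```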

Data transfers via srmcp
– an FTS channel is available only to the associated Tier1 (FZK)
– tests to another Tier1 are possible only via transfers issued "by hand"
– tests SARA – FZU:
  – bulk copy from SARA to FZU, now with only one srmcp command
  – 10 files: maximum speed 200 Mbps, average 130 Mbps
  – 200 files: only 66 finished, the rest failed with a "Too many transfers" error
  – speed OK
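A single srmcp invocation can carry many transfers through a copy-job file (one "source destination" pair per line). The sketch below is ours, not from the slides: it also splits a large list into moderate batches so the destination is not hit with too many concurrent requests. The SURLs are placeholders and the -copyjobfile option name is an assumption to verify against the installed srmcp.

```python
import subprocess

BATCH_SIZE = 50  # stay below the point where "Too many transfers" errors appeared

def bulk_copy(pairs, jobfile="copyjob.txt"):
    """pairs: list of (source SURL, destination SURL) tuples."""
    for i in range(0, len(pairs), BATCH_SIZE):
        with open(jobfile, "w") as f:
            for src, dst in pairs[i:i + BATCH_SIZE]:
                f.write(f"{src} {dst}\n")      # one "source destination" pair per line
        # One srmcp command per batch instead of one command per file.
        subprocess.run(["srmcp", f"-copyjobfile={jobfile}"], check=True)

# Hypothetical usage:
# bulk_copy([("srm://<sara-se>/<path>", "srm://<dpm-head-node>/<path>")])
```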

Tests Tier1 – Tier2 via FTS
– FZU (Prague) is a Tier2 associated with the Tier1 FZK (GridKa, Karlsruhe, Germany)
– FTS (File Transfer Service) operated by the Tier1; the FZK-FZU and FZU-FZK channels are managed by FZK and FZU
– tunable parameters (sketched below):
  – number of files transferred simultaneously
  – number of streams
  – priorities between different VOs (ATLAS, ALICE, DTEAM)
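These knobs live on the FTS server and are normally adjusted with the gLite channel-management client. The sketch below is illustrative only: the endpoint is a placeholder and the option letters (-f for concurrent files, -T for streams) are assumptions to verify against the installed glite-transfer-channel-set.

```python
import subprocess

CHANNEL = "FZK-FZU"  # channel name as used above
CHANNEL_MGMT = "https://<fts-server-at-fzk>:8443/glite-data-transfer-fts/services/ChannelManagement"

# Assumed option letters: -f = number of concurrent files, -T = streams per file.
subprocess.run(
    ["glite-transfer-channel-set", "-s", CHANNEL_MGMT, "-f", "10", "-T", "5", CHANNEL],
    check=True,
)
```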

Results (transfer-rate plots)
– transfer of 50 files of 1 GB each: not stable
– transfer of 100 files of 1 GB each, started when the load on the Tier1 disk servers was low: starts fast, then timeouts occur

ATLAS Tier0 test – part of SC4
– transfers of RAW and AOD data from Tier0 (CERN) to the 10 ATLAS Tier1s and their associated Tier2s
– managed by the ATLAS DQ2 system, which uses the FTS at Tier0 for Tier0 – Tier1 transfers and each Tier1's FTS for Tier1 – Tier2 transfers
– first data copied to FZU this Monday
– ALICE plans an FTS transfer test in July

Conclusion
– DPM is the only available "lightweight" Storage Element with an SRM frontend
– it has issues, but none of them are show-stoppers and the code is under active development
– using DPM, we were able to reach significant, non-trivial transfer results in the scope of LCG SC4