HEPiX 2004-05-23 GFAL and LCG data management Jean-Philippe Baud CERN/IT/GD.

Presentation transcript:

HEPiX GFAL and LCG data management Jean-Philippe Baud CERN/IT/GD

HEPiX Agenda
LCG Data Management goals
Common interface
Current status
Current developments
Medium term developments
Conclusion

HEPiX LCG Data Management goals
Meet requirements of Data Challenges
Common interface
Reliability
Performance

HEPiX Common interfaces
Why?
Different grids: LCG, Grid3, NorduGrid
Different Storage Elements
Possibly different File Catalogs
Solutions
Storage Resource Manager (SRM)
Grid File Access Library (GFAL)
Replication and Registration Service (RRS)

HEPiX Storage Resource Manager
Goal: agree on a single API for multiple storage systems
Collaboration between CERN, FNAL, JLAB, LBNL and EDG
SRM is a Web Service offering storage resource allocation & scheduling
SRMs DO NOT perform file transfer
SRMs DO invoke a file transfer service if needed (GridFTP)
Types of storage resource managers
Disk Resource Manager (DRM)
Hierarchical Resource Manager (HRM)
SRM is being discussed at GGF and proposed as a standard
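
The request/status/done pattern behind these bullets can be sketched as follows. This is only an illustration of the flow: the srm_* functions, the example SURL and the TURL are invented placeholders, not the actual SRM 1.1 SOAP operations or real endpoints.

```c
/* Sketch of the SRM interaction pattern: ask the SRM for a file, poll
 * until it hands back a transfer URL (TURL), copy the data with a
 * separate service (GridFTP), then tell the SRM we are done.
 * The srm_* functions are hypothetical placeholders, not the real
 * SRM 1.1 operation names; the URLs are invented examples. */
#include <stdio.h>
#include <string.h>

typedef struct {
    int  id;
    char turl[1024];   /* transfer URL returned by the SRM, e.g. gsiftp://... */
    int  ready;        /* non-zero once the file is staged and pinned */
} srm_request;

/* Hypothetical stubs: a real client would issue SOAP calls here. */
static int srm_get(const char *surl, srm_request *r)
{
    printf("SRM get request for %s\n", surl);
    r->id = 1;
    r->ready = 0;
    return 0;
}

static int srm_status(srm_request *r)
{
    /* Pretend the file was staged on the first poll. */
    strcpy(r->turl, "gsiftp://se.example.org/data/file1");
    r->ready = 1;
    return 0;
}

static int srm_done(srm_request *r)
{
    printf("release pin for request %d\n", r->id);
    return 0;
}

int main(void)
{
    srm_request req;

    if (srm_get("srm://se.example.org/dteam/file1", &req) < 0)
        return 1;

    /* The SRM never moves data itself: poll until a TURL is available,
     * then invoke the transfer service (e.g. GridFTP) on that TURL. */
    while (srm_status(&req) == 0 && !req.ready)
        ;
    printf("copy %s with GridFTP, then notify the SRM\n", req.turl);

    return srm_done(&req);
}
```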

HEPiX Grid File Access Library (1)
Goals
Provide a POSIX I/O interface to heterogeneous Mass Storage Systems in a Grid environment
A job using GFAL should be able to run anywhere on the Grid without knowing about the services accessed or the data access protocols supported
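
A minimal sketch of what this looks like from a job, assuming the gfal_open/gfal_read/gfal_close entry points and the gfal_api.h header of the GFAL C library; the lfn: name and the error handling are purely illustrative.

```c
/* POSIX-style access through GFAL: only the gfal_ prefix distinguishes
 * this from local file I/O. The logical file name is an invented example. */
#include <stdio.h>
#include <fcntl.h>
#include "gfal_api.h"

int main(void)
{
    char buf[4096];
    int fd, n;

    /* GFAL resolves the logical name through the replica catalog, talks
     * to the SRM of the selected SE and picks a supported access protocol
     * (rfio, dcap, file, ...) behind the scenes. */
    fd = gfal_open("lfn:gfal-test-file-001", O_RDONLY, 0);
    if (fd < 0) {
        perror("gfal_open");
        return 1;
    }

    while ((n = gfal_read(fd, buf, sizeof(buf))) > 0) {
        /* process n bytes exactly as with POSIX read() */
    }

    gfal_close(fd);
    return 0;
}
```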

HEPiX Grid File Access Library (2)
Services contacted
Replica Catalogs
Storage Resource Managers
Mass Storage Systems, through diverse File Access protocols like FILE, RFIO, DCAP, (ROOT I/O)
Information Services: MDS

HEPiX Grid File Access Library (3) Wide Area Access
[Architecture diagram: a physics application (or POOL) uses the Grid File Access Library through its POSIX I/O interface, or through the VFS via the GFAL file system; GFAL in turn drives a Replica Catalog client, an SRM client, an MDS Information Services client and the local file, rfio, dCap and root I/O drivers (open(), read(), etc.), which talk to the corresponding RC, SRM, rfio, dCap and Root I/O services.]

HEPiX GFAL File System
GFALFS is now based on FUSE (Filesystem in USErspace), a file system developed by Miklos Szeredi
Uses:
VFS interface
Communication with a daemon in user space (via a character device)
The metadata operations are handled by the daemon, while the I/O (read/write/seek) is done directly in the kernel to avoid context switches and buffer copies
Requires installation of a kernel module fuse.o and of the daemon gfalfs
The file system mount can be done by the user
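
Once such a mount is in place, a job needs no grid-specific library at all: ordinary POSIX calls go through the VFS and FUSE into GFAL. A minimal sketch, assuming a hypothetical /grid mount point and file name (the real path layout depends on how the user mounted the file system):

```c
/* Plain POSIX I/O through a gfalfs mount: no GFAL calls in the program.
 * The /grid mount point and the file name are hypothetical examples. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];
    ssize_t n;

    /* open() and read() go through the VFS; FUSE forwards the metadata
     * operations to the gfalfs daemon, while the data path stays in the
     * kernel as described on the previous slide. */
    int fd = open("/grid/dteam/test/file1", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        ;   /* process the data as with any local file */
    close(fd);
    return 0;
}
```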

HEPiX GFAL support
The GFAL library is very modular and small (~2500 lines of C): the support effort would be minimal unless new protocols or new catalogs have to be supported
Test suite available
GFAL file system:
Kernel module: 2000 lines (FUSE original) plus GFAL-specific lines for I/O optimization
Daemon: 1600 lines (FUSE unmodified) plus GFAL-specific lines (separate file)
Utilities like mount: 600 lines (FUSE + 5 lines modified)

HEPiX Replication and Registration Service
Copy and register files
Multiple SEs and multiple Catalogs
Different types of SE
Different types of RC
Different transfer protocols
Optimization, handling of failures
Meeting at LBNL in September 2003 with participants from CERN, FNAL, Globus, JLAB and LBNL
Refined proposal by LBNL being discussed

HEPiX Current status (1) SRM
SRM 1.1 interfaced to CASTOR (CERN), dCache (DESY/FNAL), HPSS (HRM at LBNL)
SRM 1.1 interface to EDG-SE being developed (RAL)
SRM 2.1 being implemented at LBNL, FNAL, JLAB
SRM basic being discussed at GGF
SRM is currently seen by LCG as the best way to do load balancing between GridFTP servers; this is used at FNAL

HEPiX Current status (2)
EDG Replica Catalog (improvements for POOL) being tested
Server works with Oracle (being tested with MySQL)
EDG Replica Manager in production (works with classical SE and SRM)
On LCG certification testbed (support for EDG-SE)
Stability and error reporting being improved

HEPiX Current status (3) Disk Pool Manager
CASTOR, dCache and HRM were considered for deployment at sites without an MSS
dCache is the product that we are going to ship with LCG2, but this does not prevent sites having another DPM or MSS from using it
dCache is still being tested in the LCG certification testbed

HEPiX CASTOR
This solution was tried first because of local expertise
Functionality OK
Solution dropped by CERN IT management for lack of manpower to do the support worldwide

HEPiX HRM/DRM (Berkeley)
This system has been used in production for more than a year to transfer data between Berkeley and Brookhaven for the STAR experiment
The licensing and support were unclear; however, VDT will probably distribute this software
IN2P3 (Lyon) is investigating whether they could use this solution to provide an SRM interface to their HPSS system

HEPiX dCache (DESY/FNAL)
Joint project between DESY and FNAL
DESY developed the core part of dCache while FNAL developed the Grid interfaces (GridFTP and SRM) and monitoring tools
dCache is used in production at DESY and FNAL, but also at some Tier centers for CMS
IN2P3 is also investigating if dCache could be used as a frontend to their HPSS system

HEPiX Current status (4) Grid File Access Library
Offers a POSIX I/O API and generic routines to interface to the EDG RC, SRM 1.1 and MDS
A library lcg_util built on top of GFAL offers a C API and a CLI for Replica Management functions; they are callable from C++ physics programs and are faster than the current Java implementation (a sketch of the combined copy-and-register operation follows below)
A File System based on FUSE and GFAL is being tested (both at CERN and FNAL)
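
The slides do not show the lcg_util function names or signatures, so the sketch below does not reproduce them; it only illustrates the two steps that a copy-and-register replica-management operation bundles together, with copy_to_se() and register_replica() as hypothetical placeholders and invented URLs.

```c
/* What a copy-and-register operation bundles together, shown with
 * hypothetical helpers; the real lcg_util C API and CLI are not
 * reproduced here. All names and URLs are invented examples. */
#include <stdio.h>

static int copy_to_se(const char *local, const char *surl)
{
    /* In reality: negotiate a TURL with the SRM, then transfer with GridFTP. */
    printf("copy %s -> %s\n", local, surl);
    return 0;
}

static int register_replica(const char *surl, const char *lfn)
{
    /* In reality: record the new replica (and its GUID) in the replica catalog. */
    printf("register %s under %s\n", surl, lfn);
    return 0;
}

int main(void)
{
    const char *local = "file:/tmp/run1234.dat";
    const char *surl  = "srm://se.example.org/dteam/run1234.dat";
    const char *lfn   = "lfn:run1234.dat";

    /* From the user's point of view, a single call or command does both steps. */
    if (copy_to_se(local, surl) == 0 && register_replica(surl, lfn) == 0)
        printf("copy-and-register succeeded\n");
    return 0;
}
```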

HEPiX LCG-2 SE (April release)
Mass Storage access to tape
SRM interfaces exist for CASTOR, Enstore/dCache, HPSS
SRM SEs available at CERN, FNAL, INFN, PIC
Classic SEs (GridFTP, no SRM) deployed everywhere else
GFAL included in LCG-2; it has been tested against CASTOR SRM and rfio as well as against Enstore/dCache SRM and Classic SEs

HEPiX Test suites
Test suites have been written and run against classic SE, CASTOR and dCache for:
SRM
GFAL library and lcg_util
The new version (better performance) of the GFAL File System is being extensively tested against CASTOR, and the tests against dCache have started
Latest versions (> 1.6.2) of the Replica Manager support both the classical SEs and the SRM SEs

HEPiX File Catalogs in LCG-2
Problems were seen during Data Challenges:
performance of the Java CLI tools
performance problems due to lack of bulk operations
no major stability problems
JOINs between the Replica Catalog and the Metadata Catalog are expensive
worked with users and other middleware to reduce these JOINs (often unnecessary)

HEPiX Proposal for next Catalogs
Build on current catalogs, and satisfy medium-term needs from the DCs
Replica Catalog
like the current LRC, but not "local": we never had "local" ones anyway, since the RLI was not deployed
no user-defined attributes in the catalog -> no JOINs
File Catalog
store Logical File Names
impose a hierarchical structure, and provide directory-level operations
user-defined metadata on GUID (like in the current RMC)
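
A rough illustration of the proposed split as C data structures; the names and fields are examples only, not the actual catalog schema. User-defined attributes live on the GUID in the file catalog, while the replica catalog carries none, so replica lookups no longer need a JOIN.

```c
/* Illustrative data model for the proposed catalogs; names and fields
 * are invented examples, not the real schema. */

/* File Catalog: hierarchical logical namespace plus metadata on the GUID. */
struct fc_entry {
    char lfn[1024];           /* logical file name, hierarchical ("/grid/cms/...") */
    char guid[40];            /* globally unique identifier of the file */
    long long size;
    char user_metadata[256];  /* user-defined attributes attached to the GUID */
};

/* Replica Catalog: GUID -> physical replicas, no user-defined attributes,
 * so queries never need a JOIN with the file catalog. */
struct rc_entry {
    char guid[40];
    char surl[1024];          /* storage URL of one replica (srm://...) */
};
```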

HEPiX Replication of Catalogs
need to remove single point of failure and load: during one Saturday of the CMS DC, the catalogs accounted for 9% of all external traffic at CERN
RLIs (distributed indexes) were never tested or deployed
the RLI does not solve the distributed metadata query problem (it only indexes GUIDs)
IT/DB tested Oracle-based replication with CMS during the Data Challenge
Proposed to build on this work, and use replicated, not distributed, catalogs
small number of sites (~4 - 10)
New design (Replica Catalog and File Catalog) should reduce replication conflicts
need to design the conflict resolution policy - "last updated" might be good enough
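
If the "last updated" policy mentioned above were adopted, conflict resolution between replicated copies of a catalog entry could be as simple as comparing update timestamps. The sketch below is only an illustration of that policy, with invented types; it does not describe the Oracle replication tested by IT/DB.

```c
/* "Last updated wins" merge between two replicated copies of the same
 * catalog row; purely illustrative, types and fields are invented. */
#include <string.h>
#include <time.h>

struct catalog_row {
    char   guid[40];
    char   value[256];     /* whatever replicated column(s) the row holds */
    time_t last_updated;   /* set by the site that performed the write */
};

/* Keep the version with the most recent update; on a tie keep the local copy. */
static void resolve_conflict(struct catalog_row *local,
                             const struct catalog_row *remote)
{
    if (remote->last_updated > local->last_updated)
        memcpy(local, remote, sizeof(*local));
}
```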

HEPiX Questions (1)
Is this a good time to introduce security?
authenticated transactions would help with problem analysis
How many sites should have replicated catalogs?
Sites require Oracle (not a large problem: most Tier1s have it and the license is not a problem)
replication conflicts rise with more sites
It depends on outbound TCP issues from worker nodes (but a proxy could be used)

HEPiX Questions (2)
What about MySQL as a backend?
Oracle/MySQL interaction being investigated by IT/DB and others under a "Distributed Database Architecture" proposal
replication between the two is possible
Likely to use MySQL at Tier-2s and at Tier-1s without Oracle
Need to investigate which minimum version of MySQL we require
probably MySQL 5.x, when it is stable

HEPiX Current developments
Bulk operations in EDG RC (LCG certification testbed)
Integration of GFAL with ROOT: classes TGfal and TGfalFile
Support of ROOT I/O in GFAL
Interface GFAL and lcg_util to EDG-SE

HEPiX Medium term developments
Reshuffling of Replica Catalogs for performance
Replicated Catalogs instead of Distributed Catalogs
File Collections?
SRM 2.1
Replication/Registration Service (Arie Shoshani)
Integration of POOL with GFAL to reduce dependencies (using the TGfal class)

HEPiX Important features of SRM 2.1 for LCG (compared to SRM 1.1)
Global space reservation
Directory operations
Better definition of statuses and error codes
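
To show what the first two additions mean for a client, here is a hedged illustration with hypothetical wrapper functions; they are placeholders, not the actual SRM 2.1 operation names, and the endpoint is invented.

```c
/* Hypothetical client-side view of the SRM 2.1 additions listed above:
 * global space reservation and directory operations. Function names are
 * placeholders, not the real SRM 2.1 operations; the endpoint is invented. */
#include <stdio.h>

static int reserve_space(const char *se, long long bytes, int lifetime_s)
{
    printf("reserve %lld bytes on %s for %d s\n", bytes, se, lifetime_s);
    return 0;   /* a real client would get back a space token */
}

static int make_directory(const char *se, const char *path)
{
    printf("mkdir %s on %s\n", path, se);
    return 0;
}

int main(void)
{
    const char *se = "srm://se.example.org";

    /* Neither operation was available through SRM 1.1. */
    if (reserve_space(se, 10LL * 1024 * 1024 * 1024, 86400) == 0 &&
        make_directory(se, "/dteam/run2004") == 0)
        printf("space reserved and directory created\n");
    return 0;
}
```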

HEPiX Conclusion
In the past 12 months:
Common interfaces have been designed, implemented and deployed (SRM and GFAL)
The reliability of the Data Management tools has been improved quite considerably
We are still improving the performance of those tools