Enabling Grids for E-sciencE Introduction Data Management Jan Just Keijser Nikhef Grid Tutorial, 13-14 November 2008.



Outline
– Introduction
– SRM
– Storage Elements in gLite
– LCG File Catalog (LFC)
– Information System

Introduction
Grid infrastructures are usually used for analysing and manipulating large amounts of data coming from scientific instruments and other sources
Example: LOFAR, MAGIC, …

Introduction
Example: Large Hadron Collider
– Produces ~15 PByte/year
– Grid computing for data storage and processing
– Depends on EGEE and OSG infrastructure

Introduction
– Data is stored at CERN and 11 other (tier1) sites
– Data is processed at CERN, the 11 tier1 sites and ~100 tier2 sites

Introduction
Data management tools enable the use and sharing of data in a grid environment

Introduction
Storage Infrastructures
– Disk
– Hierarchical Storage Management (HSM)
  - The hierarchy consists of different types of storage media, such as disk systems or tape, each type representing a different level of cost and speed of retrieval
  - Policy-based management of file backup and archiving, without the user needing to be aware of when files are being retrieved from or stored on backup storage media. Example: files that have not been used for some time are automatically migrated from disk to tape
  - HSM Software: TSM, DMF, CASTOR, Enstore, HPSS, …
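The migration example above can be sketched in a few lines of Python. This is only a toy model of an idle-time policy: the names StoredFile, select_for_migration and migrate, and the idea of a single idle-time threshold, are assumptions made for illustration and do not describe how TSM, CASTOR or any other HSM product is implemented.

```python
from dataclasses import dataclass


@dataclass
class StoredFile:
    """Hypothetical record for one file managed by a toy HSM."""
    name: str
    tier: str           # "disk" or "tape"
    last_access: float  # seconds since some reference time


def select_for_migration(files, now, max_idle_seconds):
    """Return the disk-resident files that have been idle longer than the threshold."""
    return [
        f for f in files
        if f.tier == "disk" and now - f.last_access > max_idle_seconds
    ]


def migrate(files, now, max_idle_seconds):
    """Move idle files to the tape tier and return the names that were moved."""
    moved = []
    for f in select_for_migration(files, now, max_idle_seconds):
        f.tier = "tape"  # in a real HSM this would trigger a copy to tape
        moved.append(f.name)
    return moved
```

The user keeps referring to the same file name throughout; only the tier recorded by the storage system changes, which is the transparency the slide describes.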

Introduction
How do we link users, user programs and the data, given the fact that data is distributed over different storage systems?

Introduction
Data management in the Grid environment needs:
– A system which keeps track of the location of all files and copies of those files
– A uniform interface for all storage systems

SRM
Uniform access to heterogeneous storage resources on the Grid: SRM
Storage Resource Managers
– SRM is a control protocol for:
  - Space reservation
  - File management
  - Replication
  - Protocol negotiation

SRM
SRM implementation
– The SRM interface (I/F) is implemented as a web service
– Implementations for dCache, DPM, SRB, …
SRM Examples
– srmLs
– srmPrepareToPut
– srmBringOnline
– srmCopy
– srmGetTransferProtocols
The user never gets to see this, since SRM is hidden by the gLite client software

Storage Elements in gLite
DPM
– SRM
– Data Transfer protocols: gridftp, secure rfio
– Storage type: disk
dCache
– SRM
– Data Transfer protocols: gridftp, gsidcap, xrootd
– Storage type: disk, HSM
StoRM
– SRM
– Data Transfer protocols: gridftp, rfio
– Storage type: disk

LFC
– Keeps track of the location of copies (replicas) of files on the Grid

LFC
Name conventions
Logical File Name (LFN)
– An alias created by a user to refer to some item of data, e.g. “lfn:/grid/tutor/mydir/myfile”
– Unix-like namespace
Globally Unique Identifier (GUID)
– A non-human-readable unique identifier for an item of data, e.g. “guid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6”
Site URL (SURL)
– The location of an actual piece of data on a storage system, e.g. “srm://pcrd24.cern.ch/flatfiles/cms/output10_1”
Transport URL (TURL)
– Locator of a replica + access protocol: understood by a SE, e.g. “rfio://lxshare0209.cern.ch//data/alice/ntuples.dat”
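The four name types above can be told apart by their prefix. The following Python sketch does only that; the function name classify_grid_name and the set TRANSPORT_SCHEMES are assumptions for illustration (the set lists just the schemes that appear in this tutorial, not every protocol a storage element may offer), and this is not part of any gLite API.

```python
# Assumption: only the transfer schemes mentioned in this tutorial are listed.
TRANSPORT_SCHEMES = {"rfio", "gsiftp", "gsidcap"}


def classify_grid_name(name):
    """Classify a grid file name as LFN, GUID, SURL or TURL by its scheme prefix."""
    scheme, sep, rest = name.partition(":")
    if not sep or not rest:
        raise ValueError("expected '<scheme>:<rest>', got %r" % name)
    scheme = scheme.lower()
    if scheme == "lfn":
        return "LFN"
    if scheme == "guid":
        return "GUID"
    if scheme == "srm":
        return "SURL"
    if scheme in TRANSPORT_SCHEMES:
        return "TURL"
    return "unknown"
```

Applied to the examples on this slide, the function labels the "lfn:" name as an LFN, the "guid:" name as a GUID, the "srm://" name as a SURL and the "rfio://" name as a TURL.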

Naming conventions
How do they fit together?
– LFC holds the mapping LFN-GUID-SURL
[Diagram: inside the LFC, LFN 1 … LFN i point to one GUID, which points to SURL 1 … SURL j; outside the LFC, SURL 1 resolves to TURL 11 … TURL 1k and SURL j to TURL j1 … TURL jl]
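The LFN-GUID-SURL mapping can be modelled with two dictionaries. The sketch below is an in-memory toy, assuming a hypothetical class ToyCatalog with made-up method names; it illustrates the many-LFNs-to-one-GUID-to-many-SURLs shape of the mapping and is not the LFC client API or its database schema.

```python
class ToyCatalog:
    """In-memory illustration of the LFN -> GUID -> SURL mapping."""

    def __init__(self):
        self._guid_by_lfn = {}    # many LFNs may share one GUID
        self._surls_by_guid = {}  # one GUID may have many replicas (SURLs)

    def register(self, lfn, guid, surl):
        """Register a new file with its first replica."""
        if lfn in self._guid_by_lfn:
            raise KeyError("LFN already registered: " + lfn)
        self._guid_by_lfn[lfn] = guid
        self._surls_by_guid.setdefault(guid, []).append(surl)

    def add_alias(self, new_lfn, existing_lfn):
        """Add another LFN that refers to the same GUID."""
        self._guid_by_lfn[new_lfn] = self._guid_by_lfn[existing_lfn]

    def add_replica(self, lfn, surl):
        """Record an additional copy of the file on another storage element."""
        self._surls_by_guid[self._guid_by_lfn[lfn]].append(surl)

    def replicas(self, lfn):
        """Return all SURLs known for the file behind this LFN."""
        return list(self._surls_by_guid[self._guid_by_lfn[lfn]])
```

TURLs are deliberately left out of the toy: in the diagram they sit outside the catalog and are obtained from the storage element for a given SURL.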


LFC
LFN acts as main key in the database. It has:
– Symbolic links to it (additional LFNs)
– Unique Identifier (GUID)
– System metadata
– Information on replicas
– One field of user metadata

LFC
Two kinds of LFC
– Central LFC
  - For each VO, one site on the grid will publish a global catalog. This will record entries (file replicas or dataset entities) across the whole of the grid.
– Local LFC
  - Local catalogs record the file replicas stored at that site's SEs only.

LFC
– Integrated GSI Authentication + Authorization
– Access Control Lists (Unix Permissions and POSIX ACLs)
– Sessions (multiple operations inside a single transaction)
– Bulk operations (inside transactions)

LFC
LFC interfaces
Interaction with the WMS (RB)
– The InputSandbox and OutputSandbox should only be used for small amounts of data. Large files should be on SEs
– The RB can locate Grid files: this allows for data-based match-making
– JDL file:
  - InputData = "lfn:/grid/tutor/MyFile";
    o The LFNs/GUIDs needed by the job as input to the process
    o Tells the RB to schedule the job on a CE close to the SE holding the file
    o glite-brokerinfo getInputData returns the list of files in the InputData attribute
  - OutputSE = "srm.grid.sara.nl";
    o Location of an SE where the output data will be stored
  - DataAccessProtocol = "gsiftp";
    o The list of protocols that the application is able to "speak" for accessing files listed in InputData
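The three data-related JDL attributes above can be generated from a script when many jobs are submitted. The Python sketch below only formats text; the function name jdl_data_attributes is hypothetical, and writing several values as a brace-enclosed list is an assumption about JDL list syntax that should be checked against the gLite User Guide before use.

```python
def jdl_data_attributes(input_lfns, output_se, protocols):
    """Return the InputData, OutputSE and DataAccessProtocol lines of a JDL file."""

    def quoted(value):
        return '"' + value + '"'

    def jdl_value(items):
        items = list(items)
        if len(items) == 1:
            # A single value is written as on the slide.
            return quoted(items[0])
        # Assumption: several values are written as a brace-enclosed list.
        return "{" + ", ".join(quoted(i) for i in items) + "}"

    lines = [
        "InputData = " + jdl_value(input_lfns) + ";",
        "OutputSE = " + quoted(output_se) + ";",
        "DataAccessProtocol = " + jdl_value(protocols) + ";",
    ]
    return "\n".join(lines)
```

Called with ["lfn:/grid/tutor/MyFile"], "srm.grid.sara.nl" and ["gsiftp"], the function reproduces the three attribute lines shown on the slide; the remaining job attributes (executable, sandboxes and so on) still have to be added by hand.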

LFC
LFC interfaces
– Command-line interface and C/C++/Python API
– lcg_utils command-line tools and API
  - Combined operations on LFC and data
– GFAL
  - Provides a POSIX-like interface for file I/O operations
More to come in the next talk!

Information system
Finding out where to put your data: BDII
– BDII collects information about all nodes running grid services in the EGEE infrastructure.
– Based on LDAP
Need to set the environment variable LCG_GFAL_INFOSYS
– Needs to be set to a BDII. Example: bdii.grid.sara.nl:2170
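A script that wraps the data management tools can check the BDII setting before calling them. The Python sketch below assumes the value is a single host:port pair as in the example above; the helper names parse_bdii_endpoint and configure_infosys are made up for illustration and are not part of the gLite client software.

```python
import os


def parse_bdii_endpoint(value):
    """Split 'host:port' into (host, port) and reject malformed values."""
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("expected 'host:port', got %r" % value)
    return host, int(port)


def configure_infosys(endpoint, env=None):
    """Validate the endpoint, then store it in LCG_GFAL_INFOSYS.

    By default the current process environment is modified, so child
    processes started afterwards inherit the setting.
    """
    env = os.environ if env is None else env
    parse_bdii_endpoint(endpoint)  # raises ValueError if malformed
    env["LCG_GFAL_INFOSYS"] = endpoint
    return env
```

In an interactive shell the same effect is normally achieved by exporting the variable before running the client commands.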

Information system
lcg-infosites
– Example: finding an SE:
  > lcg-infosites --vo tutor se
  Avail Space(Kb)  Used Space(Kb)  Type  SEs
  n.a  n.a  gb-se-ams.els.sara.nl
  n.a  n.a  gb-se-wur.els.sara.nl
  n.a  se.grid.rug.nl
  n.a  srm.grid.sara.nl
– Example: finding an LFC:
  > lcg-infosites --vo tutor lfc
  lfc.grid.sara.nl

Information system
lcg-info
For more advanced searches, for example finding out where to put your files:
  > lcg-info --vo tutor --list-se --query='SE=srm.grid.sara.nl' --attrs=Path
  - SE: srm.grid.sara.nl
    - Path    /pnfs/grid.sara.nl/data/tutor
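The SE name and Path returned above are the two pieces needed to decide where a file will live. As a rough sketch, the Python function below joins them into a SURL of the same "srm://host/path" form used earlier in this tutorial; the function name build_surl and the file name in the usage note are hypothetical, and a real SURL may carry additional parts (such as a port) that are not modelled here.

```python
def build_surl(se_host, storage_path, filename):
    """Join an SE host, the VO storage path and a file name into a SURL string."""
    if not se_host:
        raise ValueError("SE host must not be empty")
    # Normalise slashes so the pieces join with exactly one '/' between them.
    path = "/" + storage_path.strip("/") + "/" + filename.lstrip("/")
    return "srm://" + se_host + path
```

For instance, build_surl("srm.grid.sara.nl", "/pnfs/grid.sara.nl/data/tutor", "myfile") yields "srm://srm.grid.sara.nl/pnfs/grid.sara.nl/data/tutor/myfile".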

Links
gLite User Guide: UserGuide.html

Questions?