

Managing Data DIRAC Project

Outline
- Data management components
  - Storage Elements
  - File Catalogs
- DIRAC conventions for user data
- Data operation commands
- Data bookkeeping with the File Catalog CLI
  - Replica Catalog
  - Metadata Catalog

KEK 10/2012, DIRAC Tutorial

Data Management components
- Storage Elements
  - gLite/EGI Storage Elements
    - Standard SRM interface
    - GridFTP protocol
      - Needs Globus libraries, available on a limited number of platforms
      - Allows third-party transfers between them
    - Managed by the site managers within EGI SLAs
  - DIRAC Storage Elements
    - DISET-based components
    - DIPS (DIRAC Secure Protocol)
      - Does not allow third-party transfers
      - Replication through a local cache
    - Third-party transfers will be available in the future

Data Management components
- File Catalogs
  - LCG File Catalog (LFC)
    - Part of the EGI middleware
    - Service provided by the NGI
    - Oracle backend
    - Client tools: command line, Python API
      - Need Globus libraries
    - No user metadata support
  - DIRAC File Catalog
    - DISET-based components
    - Part of the DIRAC set of services
    - Community service
    - MySQL backend
    - Client tools: command line, CLI, Python API
    - Supports user metadata

Data Management components
- For DIRAC users, the use of any Storage Element or File Catalog is transparent
  - Each community chooses which components to use
  - Different SE types can be mixed together
  - Several File Catalogs can be used in parallel
    - Complementary functionality
    - Redundancy
- What users see depends on the DIRAC Configuration
  - Logical Storage Elements
    - e.g. CERN-SRM, KEK2-SRM, DESY-SRM
  - Logical File Catalog

DIRAC data naming conventions
- Each file is identified by its Logical File Name (LFN)
  - Primary unique identifier
  - GUIDs are supported, but their uniqueness is the responsibility of user applications
    - This is different from LFC
    - Mostly for support of some applications, e.g. ROOT I/O
- LFN construction
  - Always starts with the VO name
    - /ilc/…
  - User data
    - /ilc/user/a/amiyamot/…
- PFN (Physical File Name) construction
  - Always contains the LFN as its trailing part
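The naming rules above can be illustrated with a minimal Python sketch. It is not DIRAC code: the helper names `user_lfn` and `pfn_for`, the storage prefix `srm://se.example.org/dirac`, and the file path `tutorial/data.root` are all hypothetical, and it assumes that the single letter in the user path is the first letter of the user name, as in the /ilc/user/a/amiyamot/… example.

```python
from posixpath import join, normpath


def user_lfn(vo: str, username: str, relative_path: str) -> str:
    """Build a user LFN of the form /<vo>/user/<initial>/<username>/<path>."""
    if not vo or not username:
        raise ValueError("vo and username must be non-empty")
    # Assumption: the one-letter directory is the first letter of the user name.
    return normpath(
        join("/", vo, "user", username[0], username, relative_path.lstrip("/"))
    )


def pfn_for(se_prefix: str, lfn: str) -> str:
    """Append the LFN to a storage-specific prefix, so the PFN ends with the LFN."""
    return se_prefix.rstrip("/") + lfn


# Hypothetical file name and storage prefix, for illustration only.
lfn = user_lfn("ilc", "amiyamot", "tutorial/data.root")
pfn = pfn_for("srm://se.example.org/dirac", lfn)

print(lfn)
print(pfn)
```

Because the PFN is the storage prefix followed by the LFN, the LFN can be recovered from any replica's PFN by stripping the prefix.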

Data operation commands
- dirac-dms-add-file
  - Upload a file to a grid SE (lcg-cr)
- dirac-dms-get-file
  - Download a file from a grid SE (lcg-cp)
- dirac-dms-replicate-lfn
  - Make another replica of a file (lcg-rep)
- dirac-dms-lfn-replicas
  - List the replicas of a given file (lcg-lr)
- dirac-dms-user-lfns
  - Get a list of all the user's files
- Plus others …
  - See tutorial materials
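As a rough illustration of how these commands fit together, the Python sketch below only assembles and prints the command lines; it does not execute them. The argument order (LFN, then local path, then SE for an upload; LFN, then destination SE for a replication) is an assumption to check against each command's `--help` output. The LFN and the local file name are hypothetical, and the SE names are the examples from the slides.

```python
import shlex


def add_file(lfn, local_path, se):
    """Upload a local file and register it under the given LFN (assumed order)."""
    return ["dirac-dms-add-file", lfn, local_path, se]


def get_file(lfn):
    """Download the file identified by the LFN to the current directory."""
    return ["dirac-dms-get-file", lfn]


def replicate(lfn, dest_se):
    """Create another replica of the file at the destination SE (assumed order)."""
    return ["dirac-dms-replicate-lfn", lfn, dest_se]


def list_replicas(lfn):
    """List the replicas registered for the LFN."""
    return ["dirac-dms-lfn-replicas", lfn]


# Hypothetical LFN and local file name.
lfn = "/ilc/user/a/amiyamot/tutorial/data.root"
commands = [
    add_file(lfn, "data.root", "KEK2-SRM"),
    get_file(lfn),
    replicate(lfn, "CERN-SRM"),
    list_replicas(lfn),
]

for argv in commands:
    print(shlex.join(argv))
```

To actually run a command, each list could be passed to `subprocess.run(argv, check=True)` in an environment with a DIRAC client installed and a valid proxy.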

File Catalog CLI
- Specialized shell that collects common commands together with a "file system" look and feel
  - Namespace browsing: cd, ls
  - Finding info: size, meta get
  - Data operations: add, get, replicate, rm
  - Metadata operations: meta (set, get, show), find

Asynchronous operations
- File Catalog operations are generally synchronous
  - Quick, can wait for the prompt
- Physical data operations can take a very long time
  - And may even fail in the end
- For example, consider removing data:
  - Delete replicas on all the SEs
  - Delete files (LFNs)
  - Delete directories (recursively)
- Long operations are performed asynchronously
  - The user does not wait for completion
  - The system makes sure the operation is accomplished despite possible problems
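The idea of submitting a long operation and letting the system retry it can be sketched as a toy request queue. This is not DIRAC's implementation: the class `RequestQueue`, its methods, the retry limit, and the simulated storage failure are all hypothetical and only illustrate the "do not wait, but make sure it gets done" pattern.

```python
from collections import deque


class RequestQueue:
    """Toy model of asynchronous operations with retries (hypothetical)."""

    def __init__(self, max_attempts=3):
        self.pending = deque()
        self.done = []
        self.failed = []
        self.max_attempts = max_attempts

    def submit(self, name, action):
        """Queue an operation and return immediately, without running it."""
        self.pending.append((name, action, 0))

    def run_once(self):
        """One pass of a background agent: try each pending operation once."""
        for _ in range(len(self.pending)):
            name, action, attempts = self.pending.popleft()
            try:
                action()
                self.done.append(name)
            except OSError:
                if attempts + 1 < self.max_attempts:
                    # Keep the request for a later pass.
                    self.pending.append((name, action, attempts + 1))
                else:
                    self.failed.append(name)


def flaky_removal():
    """Return a removal action that fails on its first call only."""
    calls = {"n": 0}

    def remove():
        calls["n"] += 1
        if calls["n"] == 1:
            raise OSError("storage element temporarily unavailable")

    return remove


queue = RequestQueue()
queue.submit("remove replica at CERN-SRM", flaky_removal())
queue.submit("remove replica at KEK2-SRM", lambda: None)

queue.run_once()  # first pass: one removal succeeds, the other is re-queued
queue.run_once()  # second pass: the retry succeeds
print(queue.done)
```

The caller returns right after `submit`; whether and when the operation completes is found out later by inspecting the request state, here the `done` and `failed` lists.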

Tutorial
- Tutorial page
- With DIRAC command-line tools:
  - Getting data files to the grid
  - Downloading data files from the grid
  - Replicating files
  - Exploring the File Catalog console

File Catalog Metadata
- Metadata can be associated with each directory as key:value pairs to describe its contents
  - Int, Float, String, DateTime value types
- Some metadata variables can be declared as indices
  - Those can be used for data selections
- Subdirectories inherit the metadata of their parents
- Data selection with metadata queries
  - Example:
    - find /ilc/user Meta1=Value1 Meta2>3 Meta2<5 Meta3=2,3,4
- File metadata can also be defined
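To make the query semantics concrete, here is a small Python model of directory metadata with inheritance and a parser for conditions such as the ones in the example query. It is a sketch under assumptions, not the File Catalog's implementation: the function names, the directory paths, and the reading of `Meta3=2,3,4` as "Meta3 is one of 2, 3, 4" are all assumptions made for illustration.

```python
import operator
import re

CONDITION = re.compile(r"^(\w+)(>=|<=|=|>|<)(.+)$")
COMPARE = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def _convert(text):
    """Interpret a value as int, then float, else keep it as a string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def parse_query(query):
    """Turn 'Meta2>3 Meta3=2,3,4' into a list of (key, operator, values)."""
    conditions = []
    for token in query.split():
        match = CONDITION.match(token)
        if match is None:
            raise ValueError(f"cannot parse condition: {token!r}")
        key, op, raw = match.groups()
        conditions.append((key, op, [_convert(v) for v in raw.split(",")]))
    return conditions


def effective_metadata(path, directory_meta):
    """Merge metadata from the top-level directory down to `path` (inheritance)."""
    merged = {}
    parts = path.strip("/").split("/")
    for depth in range(1, len(parts) + 1):
        merged.update(directory_meta.get("/" + "/".join(parts[:depth]), {}))
    return merged


def matches(metadata, conditions):
    """Return True if the metadata satisfies every condition."""
    for key, op, values in conditions:
        if key not in metadata:
            return False
        if op == "=":
            if metadata[key] not in values:
                return False
        elif not COMPARE[op](metadata[key], values[0]):
            return False
    return True


# Hypothetical directories and metadata values.
directory_meta = {
    "/ilc/user/a/amiyamot/run1": {"Meta1": "Value1", "Meta3": 3},
    "/ilc/user/a/amiyamot/run1/sub": {"Meta2": 4},
    "/ilc/user/a/amiyamot/run2": {"Meta1": "Value1", "Meta2": 7, "Meta3": 2},
}

query = parse_query("Meta1=Value1 Meta2>3 Meta2<5 Meta3=2,3,4")
selected = [
    path
    for path in directory_meta
    if matches(effective_metadata(path, directory_meta), query)
]
print(selected)
```

In this model only the subdirectory of run1 is selected: it defines Meta2=4 itself and inherits Meta1 and Meta3 from its parent, while run1 lacks Meta2 and run2 fails the Meta2<5 condition.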

File Catalog Metadata (2)
- The functionality is similar to the AMGA gLite service
  - The internal structure is very different
  - Different scalability properties
- The BES Collaboration (IHEP, Beijing) performed an extensive comparison of the DIRAC File Catalog (DFC) vs AMGA
  - Similar performance
  - DFC was chosen for their Computing Model
- Some features of DFC
  - Support for data provenance information
    - Ancestor-descendant relationships
  - Support for efficient storage usage reports
    - Real time
    - Necessary for the storage quota policies

Tutorial
- With the File Catalog CLI:
  - Upload several files in several directories
  - Define directory metatags with values
  - Define file metatags
  - Find files by metadata