Grid Update
Henry Nebrensky
Brunel University
MICE Collaboration Meeting CM23

The Awesome Power of Grid Computing
The Grid provides seamless interconnection between thousands of computers. It therefore generates new acronyms and jargon at superhuman speed.

MICE and the Grid (1)
● Computing: 10 UK sites & Sofia supporting MICE
● Storage: 9 sites & the RAL Tier 1 supporting MICE
● Job submission: 2 each at Glasgow and RAL
● UK-centric! Even though we don't yet need to ask for more resources, does everyone know whom they should ask and how long the bureaucracy takes?

MICE and the Grid (2)
● g4Beamline was tested on the Grid over a year ago
● G4MICE is in regular use – ~1000 simulation jobs/day
● Job submission: recent change from the Resource Broker (RB) to the Workload Management System (WMS) – see the sketch below
● I will make a bundle of config files for the gLite CLI available, for people to install their own UI – gLite User Guide:
● The Ganga GUI (as used by ATLAS) also works for MICE; see the forthcoming MICE Note by D. Forrest.
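For reference, a minimal sketch of submitting a job through the gLite WMS from a configured UI. The file names (mice_sim.sh, mice_sim.jdl) and the JDL contents are illustrative placeholders, not the actual G4MICE wrapper.

A minimal job description, mice_sim.jdl (illustrative):

    Executable    = "mice_sim.sh";
    StdOutput     = "sim.out";
    StdError      = "sim.err";
    InputSandbox  = {"mice_sim.sh"};
    OutputSandbox = {"sim.out", "sim.err"};
    VirtualOrganisation = "mice";

Submission from the UI:

    voms-proxy-init --voms mice                         # get a VOMS proxy as a MICE VO member
    glite-wms-job-submit -a -o jobids.txt mice_sim.jdl  # -a delegates the proxy automatically
    glite-wms-job-status -i jobids.txt                  # check progress (Waiting/Running/Done)
    glite-wms-job-output -i jobids.txt                  # retrieve the output sandbox once Done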

MICE and the Grid (3)
● The Grid interface to CASTOR storage at the RAL Tier 1 has been deployed (not sure what the MICE quota is)
● The space is unused – we need to test it (both the interface and the storage). We should ensure that the Tier 1 is aware of our activities; check: – and route requests via Paul Kyberd.
● Storage elsewhere has been used (e.g. staging of simulation output), but we are not yet formally storing data on the Grid.
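A sketch of how the CASTOR interface could be exercised with the lcg-utils tools. The SE and LFC hostnames and the LFN path are assumptions and should be confirmed with the Tier 1 before use:

    export LFC_HOST=lfc.gridpp.rl.ac.uk                 # assumed LFC host - confirm with RAL
    voms-proxy-init --voms mice
    lcg-cr -v --vo mice -d srm-mice.gridpp.rl.ac.uk \
           -l lfn:/grid/mice/Users/nebrensky/se-test.dat \
           file:///tmp/se-test.dat                      # upload and register a small test file
    lcg-cp -v --vo mice lfn:/grid/mice/Users/nebrensky/se-test.dat \
           file:///tmp/se-test-back.dat                 # read it back out again
    lcg-del -a --vo mice lfn:/grid/mice/Users/nebrensky/se-test.dat   # clean up all replicas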

MICE and the Grid (4)
● We are currently using EGEE/WLCG middleware and resources, as they are receiving significant development effort and are a reasonable match for our needs (shared with various minor experiments such as the LHC)
● Outside Europe other software may be expected – e.g. the OSG stack in the US. Interoperability is, from our perspective, a “known unknown”...

MICE and Grid Data Storage
● The Grid can provide MICE not only with computing (number-crunching) power, but also with a secure global framework allowing users access to data
● Good news: storing development data on the Grid keeps it available to the collaboration – not stuck on an old PC in the corner of the lab
● Bad news: loss of ownership – who picks up the data curation responsibilities?

Grid File Management
● Each file is given a unique, machine-generated GUID when stored on the Grid
● The file is physically uploaded to one (or more) SEs (Storage Elements), where it is given a machine-generated SURL (Storage URL)
● Machine-generated names are not (meant to be) human-usable
● A “replica catalogue” tracks the multiple SURLs of a GUID
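For illustration, the same machinery from the lcg-utils command line; the SE names and the GUID shown here are placeholders:

    lcg-cr --vo mice -d se01.example.ac.uk file:///tmp/run0123.dat
    # prints something like guid:a1b2c3d4-...  - the machine-generated GUID
    lcg-lr --vo mice guid:a1b2c3d4-...                  # list replicas: one SURL per SE holding a copy
    lcg-rep --vo mice -d se02.example.ac.uk guid:a1b2c3d4-...   # make a second replica elsewhere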

File Catalogue
● For sanity's sake we would like to associate nice, sensible filenames with each file (LFN, Logical File Name)
● A “file catalogue” is basically a database that translates between something that looks like a Unix filesystem and the GUIDs and SURLs needed to actually access the data on the Grid
● MICE has an instance of LFC (the LCG File Catalogue) run by the Tier 1 at RAL
● The LFC service can do both the replica and LFN cataloguing
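A sketch of registering a file under a human-readable LFN and looking it up through the LFC; the hostname and LFN path are illustrative assumptions:

    export LFC_HOST=lfc.gridpp.rl.ac.uk                 # assumed MICE LFC at RAL - confirm the hostname
    lcg-cr --vo mice -d se01.example.ac.uk \
           -l lfn:/grid/mice/MICE/Step1/run0123.dat file:///tmp/run0123.dat
    lfc-ls -l /grid/mice/MICE/Step1                     # browse the catalogue like a filesystem
    lcg-lg --vo mice lfn:/grid/mice/MICE/Step1/run0123.dat    # look up the GUID behind the LFN
    lcg-cp --vo mice lfn:/grid/mice/MICE/Step1/run0123.dat file:///tmp/copy.dat   # fetch by LFN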

File Catalogue Namespace
● We need to agree on a consistent namespace for the file catalogue
● The aim is NOT to replicate the experiment structure; instead we want an easy-to-understand structure that is compact but without too many entries in each low-level directory (aim for 20-50)

Concepts (1)
1) At present it is hard or impossible to delete a directory from the LFC namespace. Avoid excess complexity – this prevents creating dead branches
2) Ideally a directory should contain either only subdirectories or only files

Concepts (2)
1) Don't assume it will be possible to browse this from a graphical interface with thumbnails – if you have to search through the LFC by hand, it will be painful even with the ideal structure
2) Moving things will cause confusion (though LFC allows multiple “soft” links)
3) Keep MC simulations close to the data they model

Namespace (half-baked proposal)
● We are given /grid/mice/ by the server
● Four upper-level directories (a setup sketch follows below):
– Construction/ : historical data from detector development and QA
– Calibration/ : needed during analysis (large datasets, c.f. DB)
– TestBeam/ : test beam data
– MICE/ : DAQ output and corresponding MC simulation
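If this layout is agreed, creating the top-level directories would be a one-off step along these lines (LFC hostname assumed, directory names as proposed above):

    export LFC_HOST=lfc.gridpp.rl.ac.uk
    lfc-mkdir /grid/mice/Construction
    lfc-mkdir /grid/mice/Calibration
    lfc-mkdir /grid/mice/TestBeam
    lfc-mkdir /grid/mice/MICE
    lfc-ls /grid/mice                                   # confirm the layout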

Namespace proposal (1)
/grid/mice/MICE/StepN/RefMomentum?/MC or data
● Split by MICE Step.
● Do we need more subdivision, e.g. by momentum?
● Separate directory for MC results, or in the filename?
/TestBeam/place/...
● KEK, Fermi, RAL (Tracker Cosmics)?
● The subdivisions to be place-specific

Namespace proposal (2)
/grid/mice
  /Construction/module/data
  /Construction/Tracker/TrackerQA
  /Construction/Tracker/OpticalQA
  /Construction/Tracker/FibreDamage
● What about the other modules?
● Should we bother with raw files, or tarball/zip each subset up?

Namespace proposal (3)
/grid/mice
  /Calibration/Tracker/VLPC
  /Calibration/BeamLine/FieldMaps
● How should this be split up, i.e. what else will be in here?
– e.g. are the solenoid field maps part of the spectrometer:
    /grid/mice/Calibration/Spectrometer/FieldMaps
  or part of the beamline:
    /grid/mice/Calibration/BeamLine/FieldMaps/SpectrometerSolenoid
  or do we put the field maps together:
    /grid/mice/Calibration/FieldMaps/SpectrometerSolenoid
    /grid/mice/Calibration/FieldMaps/Quads

Namespace proposal (3½)
/grid/mice
  /Users/name
● For people to use as scratch space for their own purposes (a sketch follows below)
● Encourage people to do this through LFC – it helps avoid “dark data”
● LFC allows Unix-style access permissions
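A sketch of setting up a per-user scratch area with Unix-style permissions; the user name is illustrative:

    lfc-mkdir /grid/mice/Users
    lfc-mkdir /grid/mice/Users/jbloggs
    lfc-chmod 755 /grid/mice/Users/jbloggs              # owner can write, others in the VO can read
    lfc-ls -l /grid/mice/Users                          # permissions are reported as in 'ls -l'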

Namespace
We are starting to deploy this now, so we need a (crude) idea of:
– what calibration data will be needed
– how it will be provided: O(number of files), format, size
I have no (= zero) idea of what to expect from most of MICE for this!!!
What info should be represented by the directory structure and what within the filename?
We need to be consistent in the use of upper and lower case in LFNs, or we will create bogus entries.

Metadata Catalogue (1)
● For many applications – such as analysis – you will want to identify the list of files containing the data that matches some parameters
● This is done by a “metadata catalogue”. For MICE this doesn't yet exist
● A metadata catalogue can in principle return either the GUID or an LFN

Metadata Catalogue (2)
● We need to select a technology to use for this:
– use the conditions database
– gLite AMGA (who else uses it – will it remain supported?)
● We need to implement it – i.e. register metadata against files
– What metadata will be needed for analysis?
● Should the catalogue include the file format and compression scheme (gzip ≠ PKzip)?

Metadata Catalogue (2) for Humans
or, in non-Gridspeak:
● we have several databases (conditions DB, EPICS, e-Logbook) where we should be able to find all sorts of information about a run/timestamp
● but how do we know which runs to be interested in for our analysis?
● we need an “index” to the MICE data, and for this we need to define the set of “index terms” that will be used to search for relevant datasets.

Grid Data Access Protocols
Our current tools are based around the transfer of whole files to/from a local disk on the processing machine. The Grid also allows “POSIX I/O” (i.e. random access) directly to files on the local SE, using a secure version of the RFIO protocol. This would require compiling libshift into G4MICE, and would be useful where we need to access only a small part of the data in a file. We currently don't see any need for this in MICE.
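Should such access ever be wanted, the transport URL for a given protocol can be requested from the SE; a sketch, with an illustrative SURL:

    # Ask the SE for an rfio transport URL for a replica; the returned rfio:// TURL
    # could then be opened via the RFIO/libshift calls instead of copying the whole file.
    lcg-gt srm://se01.example.ac.uk/castor/prod/mice/run0123.dat rfio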

Last Slide
● “My laptop battery is flat” is no longer an excuse for not getting some simulation/analysis done!