1 LHCb view on Baseline Services. A. Tsaregorodtsev, CPPM, Marseille; Ph. Charpentier, CERN. Baseline Services WG, 4 March 2005, CERN.

2 Outline
- DIRAC in a nutshell
- Design goals
- Architecture and components
- Implementation technologies
- Agent based deployment
- Conclusion

3 DIRAC in a nutshell
DIRAC – Distributed Infrastructure with Remote Agent Control
- LHCb grid system for Monte-Carlo simulation data production and analysis
- Integrates the computing resources available at LHCb production sites as well as on the LCG grid
- Composed of a set of lightweight services and a network of distributed agents that deliver workload to computing resources
- Runs autonomously once installed and configured on production sites
- Implemented in Python, using XML-RPC as the service access protocol

4 DIRAC scale of resource usage
- Deployed on 20 “DIRAC” and 40 “LCG” sites
- Effectively saturated LCG and all available computing resources during the 2004 Data Challenge
- Supported up to 4500 simultaneous jobs across 60 sites
- Produced, transferred and replicated 80 TB of data, plus meta-data
- Consumed over 425 CPU years during the last 4 months

5 DIRAC design goals
- Light implementation
  - Must be easy to deploy on various platforms
  - Non-intrusive: no root privileges, no dedicated machines on sites
  - Must be easy to configure, maintain and operate
- Use standard components and third-party developments as much as possible
- High level of adaptability
  - There will always be resources outside the LCG domain (sites that cannot afford LCG, desktops, …)
  - We have to use them all in a consistent way
- Modular design at each level
  - Easily adding new functionality

6 DIRAC architecture

7 DIRAC Services and Resources
[Architecture diagram. DIRAC services: Job Management Service, JobMonitorSvc, JobAccountingSvc (AccountingDB), ConfigurationSvc, FileCatalogSvc, BookkeepingSvc; clients: Production manager, GANGA UI, user CLI, Job monitor, BK query webpage, FileCatalog browser. DIRAC resources: DIRAC sites with DIRAC CEs and Agents, DIRAC Storage (disk files, gridftp), and LCG (Resource Broker, CE 1–3) accessed via an Agent.]

8 DIRAC workload management
- Realizes the PULL scheduling paradigm (sketched below)
  - Agents request jobs whenever the corresponding resource is free
  - The Condor ClassAd and Matchmaker are used to find jobs suitable for the resource profile
- Agents steer job execution on the site
  - Jobs report their state and environment to the central Job Monitoring service
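To make the pull paradigm concrete, here is a minimal sketch of an agent's job-request loop, assuming a hypothetical XML-RPC job-matching endpoint and method names (requestJob, reportStatus); the real DIRAC service interface and the Condor ClassAd matching are not reproduced.

```python
import time
import xmlrpc.client

# Hypothetical service URL and resource description; the real DIRAC
# Job Management Service interface is not reproduced here.
MATCHMAKER_URL = "http://dirac.example.org:9135/JobManagement"

RESOURCE_PROFILE = {"Site": "DIRAC.CPPM.fr", "Platform": "slc3_ia32",
                    "MaxCPUTime": 100000}

def run_payload(job):
    """Placeholder for fetching the input sandbox and running the application."""
    print("running job", job["JobID"])

def pull_loop():
    """Ask for work whenever the local resource is free (PULL scheduling)."""
    matchmaker = xmlrpc.client.ServerProxy(MATCHMAKER_URL, allow_none=True)
    while True:
        job = matchmaker.requestJob(RESOURCE_PROFILE)      # hypothetical method
        if not job:
            time.sleep(60)    # nothing matches our profile, try again later
            continue
        matchmaker.reportStatus(job["JobID"], "Running")   # hypothetical method
        run_payload(job)
        matchmaker.reportStatus(job["JobID"], "Done")
```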

9 DIRAC: Agent modular design
[Diagram: Agent Container holding pluggable modules – JobAgent, PendingJobAgent, BookkeepingAgent, TransferAgent, MonitorAgent, CustomAgent.]
- An agent is a container of pluggable modules (sketched below)
  - Modules can be added dynamically
- Several agents can run on the same site
  - Each equipped with a different set of modules, as defined in its configuration
- Data management is based on specialized agents running on the DIRAC sites
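As an illustration of the pluggable-module idea only (not the actual DIRAC agent code), a container could load the module classes named in its configuration and cycle over them:

```python
import importlib
import time

class AgentContainer:
    """Runs a configurable set of agent modules in one polling loop."""

    def __init__(self, module_names, poll_interval=120):
        # Each name, e.g. "agents.TransferAgent", is assumed to resolve to a
        # class with an execute() method; the package layout is hypothetical.
        self.poll_interval = poll_interval
        self.modules = []
        for name in module_names:
            package, cls_name = name.rsplit(".", 1)
            cls = getattr(importlib.import_module(package), cls_name)
            self.modules.append(cls())

    def run_forever(self):
        while True:
            for module in self.modules:
                module.execute()       # one unit of work per module per cycle
            time.sleep(self.poll_interval)

# A site would start one or more containers with different module sets, e.g.
# AgentContainer(["agents.JobAgent", "agents.TransferAgent"]).run_forever()
```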

10 File Catalogs
- DIRAC incorporates two different File Catalogs
  - Replica tables in the LHCb Bookkeeping Database
  - The File Catalog borrowed from the AliEn project
- Both catalogs have identical client APIs (sketched below)
  - They can be used interchangeably
  - This was done for redundancy and for gaining experience
- Other catalogs will be interfaced in the same way
  - LFC
  - FiReMan
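The interchangeability relies on a common client interface. A hedged sketch of what such an interface could look like (the method names are illustrative, not the actual catalogue APIs):

```python
from abc import ABC, abstractmethod

class FileCatalogClient(ABC):
    """Common client API implemented by every catalogue back-end."""

    @abstractmethod
    def addReplica(self, lfn, pfn, storage_element):
        """Register a physical replica (pfn) of a logical file name (lfn)."""

    @abstractmethod
    def getReplicas(self, lfn):
        """Return a {storage_element: pfn} mapping for the given lfn."""

def register_replica_everywhere(catalogs, lfn, pfn, storage_element):
    # Redundancy: the same registration is sent to every configured catalogue
    # (Bookkeeping replica tables, AliEn catalogue, later LFC or FiReMan),
    # since they all expose the same client API.
    for catalog in catalogs:
        catalog.addReplica(lfn, pfn, storage_element)
```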

11 Data management tools
- A DIRAC Storage Element is the combination of a standard server and a description of its access in the Configuration Service
  - Pluggable transport modules: gridftp, bbftp, sftp, ftp, http, …
- DIRAC ReplicaManager interface (API and CLI)
  - get(), put(), replicate(), register(), etc.
- Reliable file transfer (sketched below)
  - A Request DB keeps outstanding transfer requests
  - A dedicated agent takes a file transfer request and retries it until it is successfully completed
- The WMS is used for data transfer monitoring
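A sketch of the "retry until success" behaviour of the transfer agent, assuming a hypothetical Request DB and ReplicaManager interface; transport plug-in selection and error classification are simplified away:

```python
def transfer_agent_cycle(request_db, replica_manager):
    """Process outstanding transfer requests, leaving failures for a retry."""
    for request in request_db.getPendingRequests():        # hypothetical API
        try:
            # replicate() would pick a transport plug-in (gridftp, bbftp, ...)
            # from the SE descriptions held in the Configuration Service.
            replica_manager.replicate(request["lfn"], request["targetSE"])
            replica_manager.register(request["lfn"], request["targetSE"])
            request_db.markDone(request["id"])              # hypothetical API
        except Exception:
            request_db.incrementRetryCount(request["id"])   # stays pending

# An agent module (see the container sketch above) would call
# transfer_agent_cycle() on every cycle until the Request DB is drained.
```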

12 Other Services
- Configuration Service (client sketched below)
  - Provides configuration information to the various system components (services, agents, jobs)
- Bookkeeping (Metadata + Provenance) database
  - Stores data provenance information
- Monitoring and Accounting service
  - A set of services to monitor job states and progress and to accumulate statistics of resource usage
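Since services, agents and jobs all read their settings from the Configuration Service, the client side is essentially a remote key lookup; a minimal sketch over XML-RPC, with a hypothetical URL, method name and option path:

```python
import xmlrpc.client

# Hypothetical endpoint and method name; the real Configuration Service
# schema is not reproduced here.
CONFIG_URL = "http://dirac.example.org:9130/Configuration"

def get_option(path, default=None):
    """Fetch a single configuration option, e.g. an SE access description."""
    config = xmlrpc.client.ServerProxy(CONFIG_URL, allow_none=True)
    value = config.getOption(path)          # hypothetical method
    return value if value is not None else default

# host = get_option("/Resources/StorageElements/CERN-disk/Host", "localhost")
```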

13 Implementation details

14 XML-RPC protocol
- Standard, simple, available out of the box in the standard Python library (a minimal sketch follows below)
  - Both server and client
  - Uses the expat XML parser
- Server
  - Very simple, socket based
  - Multithreaded
  - Supports request rates of up to 40 Hz
- Client
  - Dynamically built service proxy
- We did not feel the need for anything more complex (SOAP, WSDL, …)
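The server and client described above map almost directly onto the Python standard library; a minimal sketch using the modern xmlrpc module names (in 2005 the corresponding Python 2 modules were xmlrpclib and SimpleXMLRPCServer):

```python
from socketserver import ThreadingMixIn
from xmlrpc.server import SimpleXMLRPCServer
import xmlrpc.client

class ThreadedXMLRPCServer(ThreadingMixIn, SimpleXMLRPCServer):
    """Multithreaded variant of the simple socket-based XML-RPC server."""

def ping(payload):
    # A trivial service method registered with the server.
    return {"ok": True, "echo": payload}

server = ThreadedXMLRPCServer(("localhost", 9135), allow_none=True)
server.register_function(ping)
# server.serve_forever()   # blocking; run in its own process in practice

# Client side: the service proxy is built dynamically, so remote methods
# are called as if they were local functions.
proxy = xmlrpc.client.ServerProxy("http://localhost:9135", allow_none=True)
# print(proxy.ping("hello"))
```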

15 Instant Messaging in DIRAC
- Jabber/XMPP IM
  - Asynchronous, buffered, reliable messaging framework
  - Connection based: authenticate once, then “tunnel” back to the client – a bi-directional connection with only outbound connectivity (no firewall problems, works with NAT)
- Used in DIRAC to
  - Send control instructions to components (services, agents, jobs): XML-RPC over Jabber, broadcast or to a specific destination (sketched below)
  - Monitor the state of components
  - Provide an interactivity channel with running jobs
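The "XML-RPC over Jabber" combination can be illustrated with the standard library's marshalling helpers; the XMPP connection itself is stubbed out below and the control method name is hypothetical:

```python
import xmlrpc.client

def send_control_instruction(xmpp_send, target_jid, method, *params):
    """Marshal an XML-RPC call and push it through an IM channel."""
    payload = xmlrpc.client.dumps(params, methodname=method)
    xmpp_send(target_jid, payload)     # stand-in for the Jabber connection

def handle_incoming(payload):
    """Agent/job side: unmarshal the call and dispatch it locally."""
    params, method = xmlrpc.client.loads(payload)
    print("dispatching", method, "with", params)   # would call a local handler

# Round trip with a dummy transport, just to show the marshalling:
# send_control_instruction(lambda jid, msg: handle_incoming(msg),
#                          "job1234@dirac", "pauseJob", "soft")
```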

16 Dynamically deployed agents
How to involve resources where the DIRAC agents are not yet installed or cannot be installed?
- Workload management with resource reservation
  - Send the agent as a regular job (sketched below)
  - Turn a WN into a virtual LHCb production site
- This strategy was applied for the DC04 production on LCG
  - Effectively using LCG services to deploy the DIRAC infrastructure on the LCG resources
- Efficiency
  - >90% success rate for DIRAC jobs on LCG, while the success rate of LCG jobs was 60%
  - No harm to the DIRAC production system
  - One person ran the whole LHCb DC04 production on LCG
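In this scheme the "regular job" sent to LCG is only a thin bootstrap; a hedged sketch of what such a pilot wrapper could do (the download URL, install layout and command-line flag are hypothetical, and the original implementation is not reproduced):

```python
import os
import subprocess
import tempfile
import urllib.request

# Hypothetical location of a self-contained agent distribution.
AGENT_TARBALL = "http://dirac.example.org/install/dirac-agent.tar.gz"

def bootstrap_agent():
    """Turn the worker node this job landed on into a temporary DIRAC site."""
    workdir = tempfile.mkdtemp(prefix="dirac-pilot-")
    tarball = os.path.join(workdir, "agent.tar.gz")
    urllib.request.urlretrieve(AGENT_TARBALL, tarball)
    subprocess.run(["tar", "xzf", tarball, "-C", workdir], check=True)
    # The freshly installed agent then pulls real LHCb jobs from the central
    # Job Management Service, exactly as a pre-installed site agent would.
    subprocess.run([os.path.join(workdir, "dirac-agent"), "--once"], check=True)

if __name__ == "__main__":
    bootstrap_agent()
```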

17 Agent based deployment
- Most of the DIRAC subsystems are a combination of central services and distributed agents
- Mobile agents, as easily deployable software components, can form the basis of a new paradigm for grid middleware deployment
  - The application software is deployed by a special kind of job / agent
  - The distributed grid middleware components (agents) can be deployed in the same way

18 Deployment architecture
[Diagram. Site services: CE, Hosting CE, SRM, WMS agent, DMS agent, Mon agent. Local services: WMS agent, Monitoring, VO disk space. Central VO services: WMS, DMS, Catalogs, Config, Accounting, VOMS.]

19 Hosting CE
- A standard CE with a specific configuration
  - Restricted access – for VO admins only
  - Outbound/inbound connectivity
  - Visible both to the external WAN and to the intranet
- Provides VO-dedicated disk space and database access
  - VO software
  - Requests database
- No wall clock time limit for jobs
- No root privileges for jobs

20 Natural distribution of responsibilities
- The site is responsible for providing resources
  - Site administrators should be responsible only for deploying the software providing access to resources: computing resources – CE (Condor-C ?); storage resources – SE (SRM ?)
- Managing the load on the grid is the responsibility of the VO
  - VO administrators should be responsible for deploying the VO service components – even those running on sites

21 Services breakdown
- Inspired by the DIRAC experience
- Deployment by RCs of basic services
  - Fabric
  - Basic CE for WM and SRM for DM (on top of MSS and/or DMP)
  - File access services (rootd / xrootd / rfio …)
- Deployment by VOs of higher-level services
  - Using deployment facilities made available on site
- Additional baseline services
  - File catalogue(s): must be robust and performant
  - Other services: à la carte

22 [Diagram of the services breakdown at a Local RC: fabric layer (tape, disk, MSS, storage system, SRM, WNs, LSF batch system, rfiod, rootd, xxftpd, GFAL for application file access), baseline services, VO services on the Host Service / CE (WMS agent, Mon. agent, Transfer agent), local services (monitoring & accounting), and deployed central services (WMS, File Placement System, FRC).]

23 Conclusions
- DIRAC provides an efficient MC production system
- Data reprocessing with DIRAC has started
- Using DIRAC for analysis is still to be demonstrated
- DIRAC needs further development and the integration of third-party components
- Relations with other middleware developments are still to be defined, following thorough assessments
- We believe that the DIRAC experience can be useful for LCG and other grid middleware developments and deployment