
K. Harrison
CERN, 23rd October 2002

HOW TO COMMISSION A NEW CENTRE FOR LHCb PRODUCTION

- Overview of LHCb distributed production system
- Configuration of access machine
- Job handling
- Setting up Cambridge as a (small-scale) production centre:
  • Configuration for summer 2002
  • Problems encountered
  • Future plans

23rd October 2002 - LHCb distributed production system

- Production manager stores details of participating sites in two places:
  • in a Java servlet that produces job scripts
  • in the PVSS system used for job management
- Each production site must define and configure an access machine (see the sketch after this slide)
  • Access machine deals with requests from PVSS, and distributes jobs between all machines available at the site
  • In EDG terms, the access machine acts as a Computing Element, and the machines where jobs are run act as Worker Nodes
- Job scripts are produced by a Servlet Runner, which must have write access to the area where a site's job scripts are created
  • May be able to use the CERN Servlet Runner (afs access), or may need a Servlet Runner installed at the remote site
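As a purely illustrative aside, the sketch below shows the kind of per-site record the production manager might keep, and the hand-off of a servlet-produced job script to a site's access machine. The field names, hostname and dispatch function are assumptions made for illustration; they are not the actual servlet or PVSS schema.

    # Hypothetical sketch of a per-site record (field names are assumptions,
    # not the real servlet/PVSS schema).
    SITE_RECORD = {
        "site_name": "Cambridge",
        "access_machine": "access.example-site.org",  # acts as the Computing Element
        "submit_command": "qsub",                      # site-specific job submission
        "output_policy": "copy_to_castor",             # what to do with job output
    }

    def dispatch(job_script, site):
        """Hand a servlet-produced job script to the site's access machine,
        which then spreads the work over its Worker Nodes."""
        print("Would send %s to %s using '%s'"
              % (job_script, site["access_machine"], site["submit_command"]))

    dispatch("Brunel_job_0001.sh", SITE_RECORD)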

23rd October 2002 - Configuration of access machine

- Main steps for configuring the access machine are as follows (a rough sketch follows this slide):
  • Install PVSS tools
  • Define environment variable LHCBPRODROOT to point to the root directory of the production area
  • Download and run the mcsetup installation script
  • Customise site-specific scripts
  • Customisation basically defines the site identity, the command for job submission, and what to do with output
  • Set up a Servlet Runner if not using the CERN Servlet Runner
  • More details available at: /datachallenges/slice.doc
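As a rough sketch only of the steps listed above: LHCBPRODROOT and mcsetup are named on the slide, but the production-area path, how mcsetup is obtained, and its exact invocation are assumptions.

    # Rough sketch of the configuration steps (not the actual installation tooling).
    import os
    import subprocess

    prod_root = "/opt/lhcb/production"        # assumed root of the production area
    os.environ["LHCBPRODROOT"] = prod_root    # define LHCBPRODROOT
    os.makedirs(prod_root, exist_ok=True)

    # Download mcsetup by whatever means the site normally uses, then run it
    # from the production area (invocation assumed).
    subprocess.check_call(["./mcsetup"], cwd=prod_root)

    # Customisation of the site-specific scripts (site identity, submit command,
    # what to do with output) is done by hand afterwards.
    print("Now customise the site-specific scripts under", prod_root)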

23rd October 2002 - Job handling

- Basic job handling is as follows, using the CERN Servlet Runner (a sketch of the local route follows this slide):
  • Specify the job request by filling in the web form at: /mcbrunel.htm
  • Parameters are passed to the Servlet Runner, which produces job scripts
  • Submit jobs either through PVSS or locally using the script submit-all-scripts installed by mcsetup
  • When jobs are completed, update the central database and transfer data to CASTOR using the script transfer-all installed by mcsetup
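For the local (non-PVSS) route, a minimal sketch of the cycle might look as follows; submit-all-scripts and transfer-all are the script names from the slide, while the directory layout is an assumption.

    # Sketch of the local job-handling cycle (directory path is an assumption).
    import subprocess

    script_dir = "/opt/lhcb/production/scripts"   # assumed: where job scripts are created

    # Submit all generated job scripts on the local batch system
    subprocess.check_call(["./submit-all-scripts"], cwd=script_dir)

    # ... once the jobs have finished ...
    # Update the central database and transfer the data to CASTOR
    subprocess.check_call(["./transfer-all"], cwd=script_dir)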

23rd October 2002 - Cambridge: Summer 2002 (1)

- Jobs for summer production were run on 10 desktop machines with Redhat Linux 7.1 installed:
  • 5 x P3 ( GHz, Mb)
  • 5 x P4 ( GHz, Mb)
- Desktop machines are used by people who work interactively, and may submit other jobs; production jobs were run on low-priority batch queues
  • Made use of otherwise-idle CPU cycles
- Each machine used has Gb local scratch space; additionally had 20 Gb for LHCb production on the central file server
- LHCb production tools and software were installed only on the access machine
- Access machine submitted jobs to an NQS pipe queue, for distribution among all production nodes (see the sketch after this slide)
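The NQS hand-off might be sketched as below, assuming an NQS-style qsub interface; the queue name and options are assumptions, not the actual Cambridge configuration.

    # Sketch of submission from the access machine into an NQS pipe queue.
    import subprocess

    def submit_to_nqs(job_script, queue="lhcb_prod"):   # queue name assumed
        # The pipe queue then routes the job to one of the desktop worker
        # nodes via their low-priority batch queues.
        subprocess.check_call(["qsub", "-q", queue, job_script])

    submit_to_nqs("Brunel_job_0001.sh")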

23rd October 2002 - Cambridge: Summer 2002 (2)

- A script executed at job startup determined where to run the applications (see the sketch after this slide):
  • If the local scratch area had at least 5 Gb free, the LHCb software was copied to a new directory in this area, and run there
  • If there was insufficient free space locally, the LHCb software was copied to a new directory in the LHCb area of the central file server, and run there
- When a job completed, its output was stored on the file server, then the directory where the job was run was deleted
- Log files and DSTs were copied to CERN, using bbftp and locally written tools
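A minimal Python sketch of that startup decision, assuming placeholder paths and the 5 Gb threshold from the slide; the real script and its copy/run steps are not reproduced here.

    # Sketch of the job-startup logic: pick a run directory with enough room,
    # run there, and clean up afterwards.  Paths are placeholders.
    import shutil
    import tempfile

    LOCAL_SCRATCH = "/scratch"          # assumed local scratch area
    SERVER_AREA = "/data/lhcb"          # assumed LHCb area on the central file server
    MIN_FREE_BYTES = 5 * 1024**3        # the 5 Gb threshold from the slide

    def choose_work_area():
        free = shutil.disk_usage(LOCAL_SCRATCH).free
        base = LOCAL_SCRATCH if free >= MIN_FREE_BYTES else SERVER_AREA
        return tempfile.mkdtemp(prefix="lhcb_job_", dir=base)

    work_dir = choose_work_area()
    try:
        pass   # copy the LHCb software here and run the applications (omitted)
    finally:
        # output would first be stored on the file server, then the run
        # directory is deleted
        shutil.rmtree(work_dir, ignore_errors=True)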

23rd October 2002 - Cambridge: Problems encountered (1)

- Configuration process was very drawn out, as all changes had to be made centrally
  • With the new installation tools, site configuration is simpler and almost everything is done locally
- Information concerning production was not always communicated quickly to sites outside CERN
  • Situation has improved now that the lhcb-production mailing list has been set up

23rd October 2002 - Cambridge: Problems encountered (2)

- Had problems during production when afs was unavailable, with the following sequence (an illustrative pre-flight check follows this slide):
  • Job fails to retrieve parameter files needed by SICBMC
  • SICBMC complains, but runs anyway
  • Job fails to retrieve options files needed by Brunel
  • Brunel core dumps
  • Large amounts of CPU time wasted (SICBMC producing unusable events); human intervention needed after job crash
  • Problem solved with the new system, where reliance on afs is removed
- Brunel v13r1 used a lot of memory (around 200 Mb)
  • Some jobs had to be killed as they prevented other users from working
  • Improvements with newer versions of Brunel?
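The real fix in the new system was to remove the afs dependence altogether; purely as an illustration of how the wasted CPU could have been avoided, a pre-flight check along these lines would abort before SICBMC starts. The file paths below are placeholders, not the actual LHCb parameter and options files.

    # Illustrative pre-flight check (paths are placeholders, not real LHCb files).
    import os
    import sys

    REQUIRED_INPUTS = [
        "/afs/cern.ch/lhcb/params/sicbmc_params.dat",     # SICBMC parameter files
        "/afs/cern.ch/lhcb/options/brunel_options.opts",  # Brunel options files
    ]

    missing = [p for p in REQUIRED_INPUTS if not os.access(p, os.R_OK)]
    if missing:
        sys.exit("Aborting before SICBMC/Brunel: unreadable inputs: " + ", ".join(missing))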

23rd October 2002 - Cambridge: Future plans

- Participation in the summer 2002 production has been a positive experience
  • Gained experience with the production tools, and with running simulation and reconstruction jobs using the latest versions of the software
  • Produced 37k events that have been copied to CASTOR, and are being used locally in physics studies
- Aim to maintain participation in data challenges at least at the current (low) level
- An additional 20 x P3 (1.1 GHz, 256 Mb) are available in the Cambridge HEP group if we are able to use Grid tools (Globus or EDG)
  • Will be exploring possibilities in the coming months