1 HOW TO COMMISSION A NEW CENTRE FOR LHCb PRODUCTION
K.Harrison, CERN, 23rd October 2002
- Overview of LHCb distributed production system
- Configuration of access machine
- Job handling
- Setting up Cambridge as a (small-scale) production centre:
   Configuration for summer 2002
   Problems encountered
   Future plans

2 LHCb distributed production system
- Production manager stores details of participating sites in two places:
   in a Java servlet that produces job scripts
   in the PVSS system used for job management
- Each production site must define and configure an access machine
   Access machine deals with requests from PVSS, and distributes jobs among all machines available at the site
   In EDG terms, the access machine acts as a Computing Element, and the machines where jobs are run act as Worker Nodes
- Job scripts are produced by a Servlet Runner, which must have write access to the area where a site's job scripts are created
   May be able to use the CERN Servlet Runner (afs access), or may need a Servlet Runner installed at the remote site

3 Configuration of access machine
- Main steps for configuring the access machine are as follows (a hedged sketch follows this slide):
   Install PVSS tools
   Define environment variable LHCBPRODROOT to point to the root directory of the production area
   Download and run the mcsetup installation script
   Customise site-specific scripts
   Customisation defines the site identity, the command for job submission, and what to do with output
   Set up a Servlet Runner if not using the CERN Servlet Runner
- More details available at:
  http://lhcb-wdqa.web.cern.ch/lhcb-wdqa/distribution
  http://lhcb-comp.web.cern.ch/lhcb-comp/ComputingModel/datachallenges/slice.doc
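A minimal shell sketch of these steps, assuming an example installation path and an already-downloaded mcsetup script; the layout under LHCBPRODROOT and the path chosen here are assumptions, not the actual LHCb recipe.

    # Hedged sketch only: example path and layout are assumptions.

    # 1. Point LHCBPRODROOT at the root directory of the production area
    export LHCBPRODROOT=/opt/lhcb/prod     # example path, choose to suit the site
    mkdir -p "$LHCBPRODROOT"

    # 2. Fetch the mcsetup installation script (from the distribution page
    #    linked above) into the production area, then run it
    cd "$LHCBPRODROOT"
    chmod +x mcsetup
    ./mcsetup

    # 3. Customise the site-specific scripts that the installation provides:
    #    site identity, the job-submission command, and the handling of output
    #    are edited by hand before the first jobs are submitted.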

4 Job handling
- Basic job handling is as follows (using the CERN Servlet Runner):
   Specify the job request by filling in the web form at:
    http://lhcb-comp.web.cern.ch/lhcb-comp/SICB/pcsf/html/mcbrunel.htm
   Parameters are passed to the Servlet Runner, which produces the job scripts
   Submit jobs either through PVSS or locally using the script submit-all-scripts installed by mcsetup
   When jobs are completed, update the central database and transfer data to CASTOR using the script transfer-all installed by mcsetup
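A minimal sketch of the local (non-PVSS) path, assuming submit-all-scripts and transfer-all are on the PATH of the production account; their locations and any arguments are assumptions, since the slide names only the scripts.

    # Job request has already been entered via the mcbrunel.htm web form, and
    # the Servlet Runner has written the job scripts into the site's script area.

    submit-all-scripts    # submit all newly created job scripts locally

    # ... wait for the jobs to finish ...

    transfer-all          # update the central database and copy data to CASTOR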

5 Cambridge: Summer 2002 (1)
- Jobs for summer production were run on 10 desktop machines with Red Hat Linux 7.1 installed:
   5 x P3 (0.9-1.0 GHz, 256-512 MB)
   5 x P4 (1.8-2.0 GHz, 256-512 MB)
- Desktop machines are used by people who work interactively, and may submit other jobs; production jobs were run on low-priority batch queues
   Made use of otherwise-idle CPU cycles
- Each machine used has 10-20 GB of local scratch space; in addition, 20 GB on the central file server was available for LHCb production
- LHCb production tools and software were installed only on the access machine
- Access machine submitted jobs to an NQS pipe queue, for distribution among all production nodes
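Roughly, submission from the access machine to the pipe queue looks like the line below; the queue and script names are invented for illustration, and the real command is whatever was configured during mcsetup customisation.

    qsub -q lhcb_prod job_script.sh   # NQS pipe queue routes the job to an available node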

6 Cambridge: Summer 2002 (2)
- A script executed at job startup determined where to run the applications (see the sketch after this slide):
   If the local scratch area had at least 5 GB free, the LHCb software was copied to a new directory in this area, and run there
   If there was insufficient free space locally, the LHCb software was copied to a new directory in the LHCb area of the central file server, and run there
- When a job completed, its output was stored on the file server, then the directory where the job was run was deleted
- Log files and DSTs were copied to CERN, using bbftp and locally written tools
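A hedged reconstruction of the startup-time decision described above: run in local scratch if at least 5 GB is free, otherwise fall back to the LHCb area on the central file server. The paths are examples, not the real Cambridge configuration.

    #!/bin/sh
    SCRATCH=/scratch                     # local scratch area (example path)
    FALLBACK=/servdata/lhcb              # LHCb area on the file server (example path)
    NEED_KB=$((5 * 1024 * 1024))         # 5 GB expressed in kilobytes

    # Available space (KB) on the scratch filesystem; -P keeps df output on one line
    free_kb=$(df -Pk "$SCRATCH" | awk 'NR==2 {print $4}')

    if [ "$free_kb" -ge "$NEED_KB" ]; then
        RUNAREA="$SCRATCH"
    else
        RUNAREA="$FALLBACK"
    fi

    # Copy the LHCb software into a fresh directory under the chosen area and
    # run the job there; the directory is deleted once the output is stored.
    WORKDIR=$(mktemp -d "$RUNAREA/lhcbjob.XXXXXX")
    echo "Job will run in $WORKDIR"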

7 Cambridge: Problems encountered (1)
- Configuration process was very drawn out, as all changes had to be made centrally
   With the new installation tools, site configuration is simpler and almost everything is done locally
- Information concerning production was not always communicated quickly to sites outside CERN
   Situation improved now that the lhcb-production mailing list has been set up

8 Cambridge: Problems encountered (2)
- Had problems during production when afs was unavailable, with the sequence as follows:
   Job fails to retrieve parameter files needed by SICBMC
   SICBMC complains, but runs anyway
   Job fails to retrieve options files needed by Brunel
   Brunel core dumps
   Large amounts of CPU time wasted (SICBMC producing unusable events); human intervention needed after job crash
   Problem solved with the new system, where reliance on afs is removed
- Brunel v13r1 used a lot of memory (around 200 MB)
   Some jobs had to be killed as they prevented other users from working
   Improvements with newer versions of Brunel?
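Not part of the 2002 system, just an illustration: a pre-flight check of this kind would have turned the afs failure into a quick, clean job abort instead of hours of SICBMC running on missing inputs. The file names are placeholders.

    for f in sicbmc_params.dat brunel_options.opts; do
        if [ ! -s "$f" ]; then
            echo "Required input '$f' missing or empty (afs problem?); aborting job" >&2
            exit 1
        fi
    done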

9 Cambridge: Future plans
- Participation in the summer 2002 production has been a positive experience
   Gained experience with the production tools, and with running simulation and reconstruction jobs using the latest versions of the software
   Produced 37k events that have been copied to CASTOR, and are being used locally in physics studies
- Aim to maintain participation in data challenges at least at the current (low) level
- An additional 20 x P3 (1.1 GHz, 256 MB) are available in the Cambridge HEP Group if we are able to use Grid tools (Globus or EDG)
   Will be exploring possibilities in the coming months

