
1 Data production using CernVM and lxCloud
Dag Toppe Larsen
Belgrade, 2013-05-28

2 Outline (28.05.2013, NA61/NA49 meeting, Belgrade)
- New data production scripts
- Virtualised data production
- Data production manager
- Next steps

3 Data production sequence

4 New data production scripts
- New set of scripts (details on the next slides):
  - prodna61-produce-reaction.sh
  - prodna61-produce-chunk.sh
  - prodna61-find-chunk-errors.sh
- Exclusively use the xRootd interface to Castor
- Initially the scripts focused mainly on CernVM; recent involvement in “normal” data production provided an opportunity to cover lxBatch as well
- That involvement also gave a much better overview and understanding of the requirements for “normal” data production
- The scripts “work”, but there are some issues that need to be addressed before fully automated usage
- They will save significant work and reduce the chance of mistakes in data productions, even when executed by hand from the command line
- To be executed from the web data production manager

5 New data production scripts: prodna61-produce-reaction.sh
- Example: prodna61-produce-reaction.sh BeBe160
- Initiates production of a reaction
- Gets lists of:
  - chunks from bookkeeping/Castor
  - software from the file system
  - global keys from the KEY DB
- Takes the latest global key and software by default (other versions via additional parameters)
- Submits jobs to the batch system (either CernVM or lxBatch)
  - Jobs run the prodna61-produce-chunk.sh script (next slide)
- Small differences between the lxBatch and CernVM versions (related to the different batch systems)

6 New data production scripts: prodna61-produce-chunk.sh
- Several parameters defining paths, global key, software versions, etc.
  - Designed to be flexible, also with regard to processing outside CERN
  - Configuration parameters such as the magnetic field are not passed; the job determines these itself
- Modifies a template set-up file (prodna61-setup) to fit the requirements
- Steps:
  1. Get the raw file from Castor and unpack it
  2. Run the legacy software
  3. Run ROOT61
  4. Convert legacy to SHOE
  5. Run the native Shine reconstruction (PSD)
  6. Merge the converted legacy data and the native Shine data
  7. Create mini-SHOE
  8. Run Anar's QA (on the chunk, intended to be merged later)
  9. Upload files to Castor and/or local disk
  10. Compress the log file and store it on Castor
- The process will become easier after the switch to Shine native reconstruction, since most complications are related to the legacy chain
- Typically not run directly by the user (although this is possible), but submitted to the batch system by prodna61-produce-reaction.sh
- Same version for CernVM and lxBatch
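The chunk-processing steps listed above run strictly one after another. A minimal sketch of such a sequential runner follows, assuming (this is not stated on the slide) that the job stops at the first failing step. The function name `run_chunk_steps` and the placeholder step actions are hypothetical; the real script is a shell script whose internals are not shown here.

```python
from typing import Callable, List, Tuple

# A step is a (name, action) pair; the action returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_chunk_steps(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run the steps in order and stop at the first failure (an assumption).

    Returns (overall_success, per-step log lines).
    """
    log: List[str] = []
    for name, action in steps:
        ok = action()
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            return False, log
    return True, log


# Placeholder actions: every step "succeeds" without doing any real work.
example_steps: List[Step] = [
    ("get raw file from Castor", lambda: True),
    ("run legacy software", lambda: True),
    ("convert legacy to SHOE", lambda: True),
    ("upload files", lambda: True),
]
success, step_log = run_chunk_steps(example_steps)
```

Keeping the step list as data makes it straightforward to drop the legacy steps once the native Shine reconstruction takes over.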

7 New data production scripts: prodna61-find-chunk-errors.sh
- Searches for errors for a given chunk
- Errors searched for:
  - Check Castor for too small, empty or non-existent DSPACK, ROOT, SHOE, MINI-SHOE, LOG or QA files
  - Scan the log file for failed events (above a given threshold) and for a job that exited, was killed or was terminated
- Intended to be run as an acrontab job for the production manager web page, for all finished jobs
- Same version for CernVM and lxBatch
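The two kinds of checks described above can be sketched as follows. This is an illustration only: the log-line patterns, the size threshold and the function names are assumptions, since the actual log format and limits used by prodna61-find-chunk-errors.sh are not given on the slide. File sizes are passed in as a dictionary so that the sketch does not depend on any Castor access.

```python
import re
from typing import Dict, List, Optional

# Output types named on the slide.
EXPECTED_OUTPUTS = ["DSPACK", "ROOT", "SHOE", "MINI-SHOE", "LOG", "QA"]

# Hypothetical log patterns; the real log format is not known here.
FAILED_EVENT = re.compile(r"event .* failed", re.IGNORECASE)
JOB_ABORT = re.compile(r"\b(exited|killed|terminated)\b", re.IGNORECASE)


def check_output_sizes(sizes: Dict[str, Optional[int]], min_bytes: int) -> List[str]:
    """Report missing, empty or too-small output files.

    `sizes` maps output type to size in bytes (None or absent = no such file).
    """
    errors = []
    for kind in EXPECTED_OUTPUTS:
        size = sizes.get(kind)
        if size is None:
            errors.append(f"{kind}: file does not exist")
        elif size == 0:
            errors.append(f"{kind}: file is empty")
        elif size < min_bytes:
            errors.append(f"{kind}: file too small ({size} bytes)")
    return errors


def scan_log(log_text: str, max_failed_events: int) -> List[str]:
    """Report too many failed events and an aborted job."""
    errors = []
    lines = log_text.splitlines()
    failed = sum(1 for line in lines if FAILED_EVENT.search(line))
    if failed > max_failed_events:
        errors.append(f"{failed} failed events (threshold {max_failed_events})")
    for line in lines:
        match = JOB_ABORT.search(line)
        if match:
            errors.append(f"job {match.group(1).lower()}")
            break
    return errors
```

A chunk would be considered good only if both functions return an empty list.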

8 Remaining issues for DP scripts
- The reconstruction can be run with either “-pA” or “-pp” for fitting of the primary vertex
- Which one is preferable for a given reaction typically depends on the target length
- Run-script:
    run_keys="-d all -256 -pp -keep -minipoint -points -f $setup"
    run_keys="-d all -256 -pA -keep -minipoint -points -f $setup"
- Setup-file:
    Exec $v0find -s $DSPACK_SERVER -d all -pp
    Exec $v0find -s $DSPACK_SERVER -d all -pA
    Exec $xi_fit -s $DSPACK_SERVER -d all -f 13 -pp
    Exec $xi_fit -s $DSPACK_SERVER -d all -f 13 -pA
- Question: Would it be possible to have a KEY for this?
  - Otherwise it needs to be stored in a separate “database”, increasing complexity
- Why is -pp/-pA given both as a parameter to the run-script and inside the set-up file?
  - Does it ever happen that the two are not used together?
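If no KEY becomes available, the separate “database” mentioned above could be as small as a lookup table from which both the run-script parameters and the set-up file lines are generated, so the two can never disagree. The sketch below assumes such a table; the reaction names in it are placeholders and do not claim which option any real reaction uses.

```python
from typing import Dict, List

# Hypothetical lookup table: reaction name -> primary-vertex fit option.
# The entries are placeholders, not actual NA61 assignments.
VERTEX_FIT_OPTION: Dict[str, str] = {
    "example-reaction-1": "-pp",
    "example-reaction-2": "-pA",
}


def build_run_keys(reaction: str, setup: str) -> str:
    """Build the run-script parameter string for a reaction."""
    option = VERTEX_FIT_OPTION[reaction]
    return f"-d all -256 {option} -keep -minipoint -points -f {setup}"


def build_setup_lines(reaction: str) -> List[str]:
    """Build the matching set-up file lines from the same table entry."""
    option = VERTEX_FIT_OPTION[reaction]
    return [
        f"Exec $v0find -s $DSPACK_SERVER -d all {option}",
        f"Exec $xi_fit -s $DSPACK_SERVER -d all -f 13 {option}",
    ]
```

Because both outputs read the same table entry, the option is set in exactly one place per reaction.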

9 Remaining issues for DP scripts
- KEY5, e.g. KEY5=CALC/STD+
  - Question: Is there any reason why this is set explicitly in the set-up file, and not from the global key?
- Residual corrections, e.g.
    Exec res_corr -s $DSPACK_SERVER -vt1_chris $CORR_DIR/vt1_2009_pp158.corr -vt2_chris $CORR_DIR/vt2_2009_pp158.corr -mtl_chris $CORR_DIR/mtl_2009_pp158.corr -mtr_chris $CORR_DIR/mtr_2009_pp158.corr -p $CORR_DIR_OLD/vdrift_2009.txt
  - Question: Can we have a KEY for this as well?
  - Currently, we only have one set of residual correction files. Are more envisioned?

10 Remaining issues for DP scripts
- The xRootd replacement for “nsls” is “xrd castorpublic dirlist”
  - Very slow for directories with many files and for deep directory trees
  - Takes several minutes to return data
  - Not very practical for user interaction
  - Used for obtaining the list of chunks for a reaction from Castor
  - Possible solutions:
    - Ask IT if it can be improved
    - Obtain the data from the bookkeeping database instead
- PSD reconstruction for Shine needs different files for different run conditions
  - PSDReconstructor.xml, PSDCalibXMLConfig.xml
  - Question: Can this be done more automatically on the Shine side?

11 Any other parameters?
- The set-up file is currently modified from a template for the magnetic field, residual corrections and -pp/-pA
  - Either by hand, for the “old” data production scripts
  - Or automatically, for the “new” scripts
- Question: Are there any other parameters that need to be taken into account for the data production?

12 Create/destroy virtual clusters
- Scripts have been created for creating/destroying virtual clusters of virtual machines on lxCloud (or other clouds)
- It will be possible to launch older virtual machines for data preservation (running older versions of the data production software)
- The scripts need some tuning for the latest iteration of the test lxCloud
- On the final lxCloud, processing is charged per hour a virtual machine is running (whether or not it is doing any processing)
  - Important to be able to create/destroy virtual clusters on demand
- The creation/destruction of virtual machines must be controlled by the web production manager
  - Some control logic needs to be developed on the web production manager side

13 Virtualised data production
- So far, legacy software v12j has been used for testing on the virtual machine
- Now installing v13b (or c?) to be able to use the latest versions (also for the global key) for a test of a whole reaction
- Can use the modified version of Anar's QA (ratio, difference) to compare the outputs
- However, a large part of the differences may be due to “random” missing events in either production?
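The ratio and difference comparison mentioned above can be illustrated on plain lists of histogram bin contents. This is a sketch of the general idea only and not Anar's QA code; the function name and the choice to report an undefined ratio as `None` are assumptions.

```python
from typing import List, Optional, Tuple


def compare_histograms(
    a: List[float], b: List[float]
) -> Tuple[List[Optional[float]], List[float]]:
    """Return the bin-by-bin ratio a/b and difference a-b.

    The ratio is None where the bin in `b` is empty (division undefined).
    """
    if len(a) != len(b):
        raise ValueError("histograms must have the same number of bins")
    ratio = [x / y if y != 0 else None for x, y in zip(a, b)]
    difference = [x - y for x, y in zip(a, b)]
    return ratio, difference


# Example: bin contents from two productions of the same chunk (made-up numbers).
ratio, difference = compare_histograms([10.0, 20.0, 0.0], [10.0, 16.0, 0.0])
```

Identical productions give a ratio of 1 and a difference of 0 in every filled bin, so events missing from one production show up as deviations from those values.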

14 Virtualised data production: resource estimate
- Processing time of a chunk depends on the reaction:
  - BeBe160: ~1.5 h
  - pp: ~45 min
  - Consistent with experience from lxBatch
- Cost estimate based on currently processed data
- Whole run 15252 (BeBe40, 170 files) produced on virtual machines
  - 10 virtual machines
  - Made sure the chunks were staged on Castor first
  - Processing time on the test lxCloud: ~1 h/chunk (slightly less)
- Assuming 1 h/chunk, 10 000 chunks per reaction, and 2 days as a “reasonable” processing time for a reaction:
  - 10 000 / 24 / 2 ≈ 208 virtual machines for the cluster
  - A quota of 200 VMs has been allocated by IT for testing
- The production of a full reaction for data validation will give a better estimate
  - Installing the latest legacy software 13b (c?) for this
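The cluster-size arithmetic above can be written out as a small calculation. The sketch assumes perfect load balancing and no start-up or staging overhead, which the slide does not discuss; the function names are illustrative.

```python
import math


def vms_needed(n_chunks: int, hours_per_chunk: float, days: float) -> float:
    """Number of VMs needed to finish within `days`, running 24 h per day."""
    total_vm_hours = n_chunks * hours_per_chunk
    return total_vm_hours / (24 * days)


def wall_time_days(n_chunks: int, hours_per_chunk: float, n_vms: int) -> float:
    """Days needed to process all chunks on a cluster of `n_vms` machines."""
    return n_chunks * hours_per_chunk / n_vms / 24


# Numbers from the slide: 10 000 chunks, 1 h/chunk, 2 days.
estimate = vms_needed(10_000, 1.0, 2)        # about 208.3
whole_vms = math.ceil(estimate)              # round up to whole machines
days_with_quota = wall_time_days(10_000, 1.0, 200)  # using the 200-VM quota
```

With the allocated quota of 200 VMs, the same reaction would take 50 hours, slightly more than the 2-day target. The total of 10 000 VM-hours is independent of the cluster size, which matters when billing is per VM-hour.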

15 Web data production manager
- Dimitije created the current production manager
- Since he left NA61, I have started looking into how it works
- Two “parts”:
  - Web page displaying information
  - Background acrontab jobs updating files with information to be accessed by the web page
- Missing/incomplete parts:
  - Interface to the new production scripts
  - Authentication (to verify that the user has the rights to start a production)
  - Production database (stores information about the status of running/finished chunk jobs, initiates the search for errors for chunks, and resubmits chunks as needed)
  - Interface to the bookkeeping database (upload of finished reactions)
  - Interface to create/kill virtual clusters

16 Data production manager database
- Needed to keep track of the status of ongoing/finished jobs (chunks) for productions
- Some initial scripts created
  - Initiated from an acrontab job
  - Search for finished jobs, update database
  - Check if jobs were successful, update database
  - Resubmit failed jobs, update database
- Based on SQLite
- Scripts will be the back-end for the web production manager
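A minimal sketch of what such an SQLite-backed job table could look like, using Python's standard `sqlite3` module. The table layout, the status values (`running`, `ok`, `failed`) and the retry limit are assumptions made for illustration; the actual schema of the production manager database is not given on the slide.

```python
import sqlite3
from typing import List, Tuple


def create_db(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) job table: one row per chunk of a reaction."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               reaction TEXT NOT NULL,
               chunk    TEXT NOT NULL,
               status   TEXT NOT NULL,
               attempts INTEGER NOT NULL DEFAULT 1,
               PRIMARY KEY (reaction, chunk))"""
    )


def add_job(conn: sqlite3.Connection, reaction: str, chunk: str) -> None:
    """Register a newly submitted chunk job."""
    conn.execute(
        "INSERT INTO jobs (reaction, chunk, status) VALUES (?, ?, 'running')",
        (reaction, chunk),
    )


def set_status(conn: sqlite3.Connection, reaction: str, chunk: str, status: str) -> None:
    """Record the outcome of the error check for a finished job."""
    conn.execute(
        "UPDATE jobs SET status = ? WHERE reaction = ? AND chunk = ?",
        (status, reaction, chunk),
    )


def resubmit_failed(conn: sqlite3.Connection, max_attempts: int = 3) -> List[Tuple[str, str]]:
    """Mark failed jobs as running again and count the attempt.

    The actual batch submission would happen where indicated below.
    """
    rows = conn.execute(
        "SELECT reaction, chunk FROM jobs WHERE status = 'failed' AND attempts < ?",
        (max_attempts,),
    ).fetchall()
    for reaction, chunk in rows:
        # (submit the chunk job to the batch system here)
        conn.execute(
            "UPDATE jobs SET status = 'running', attempts = attempts + 1 "
            "WHERE reaction = ? AND chunk = ?",
            (reaction, chunk),
        )
    return rows
```

An acrontab-driven script would call `set_status` after each error check and then `resubmit_failed`; the web page only needs read access to the same table.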

17 Automatic update of bookkeeping database
- The bookkeeping database (Alexander's) needs to be updated when a production has finished
- Should be done automatically by the web production manager (database)
- The interface between the production manager and the bookkeeping database must:
  - Work for both CernVM and lxBatch
  - Not depend on AFS
  - Also work for CernVM processing outside CERN
    - HTTP-based
- First step: create scripts to do the update by hand, e.g. prodna61-update-bookkeeping.sh

18 Next steps
- Short-term:
  - Address remaining issues for the unified CernVM/lxBatch production scripts
  - Install legacy software 13b (c?) on CvmFS for further validation
  - Further investigation of missing events
    - Sometimes an event can fail, but succeed next time?
  - By-hand way to upload data to the bookkeeping database
- Long-term:
  - Web production manager
    - Interface to production scripts
    - Database for status of ongoing productions
    - Automatic upload of data to the bookkeeping database

19 Roadmap (slide footer: 13.02.2013, NA61/NA49 meeting, CERN)

Task | Status/done | Remaining | Expected
NA61 software installation | OK (12j) | Install 13b (c?) | June
NA49 software installation | OK | Data validation |
Scripts for production, resubmit failed jobs, create/kill clusters, etc. | Scripts created | Some issues related to “smooth” operation | Work in progress
Production of full reaction | Data can be produced with software v12j | v13b/c needed | June
Estimate resources | | Production of full production | June
Integrate components | Stand-alone scripts exist | Further integration needed | June/July
Automatic tracking of job status | Some stand-alone scripts | DB to track status of ongoing/finished jobs | Summer
Web production manager | Web interface for displaying information | Allow for initiation of data production | Summer

20 Volunteers
- For participating in the “normal” data production team?
- If anybody is interested, we can have a “mini-workshop” later this week

