

1 Evolution of BOSS, a tool for job submission and tracking
Stuart Wakefield, Imperial College London
W. Bacchi, G. Codispoti, C. Grandi (INFN Bologna)
D. Colling, B. MacEvoy, S. Wakefield, Y. Zhang (Imperial College London)

2 Outline
Introduction to BOSS.
Previous features and usage.
New functionality.
Reengineering of the design.
Current status and plans.

3 Introduction
BOSS: Batch Object Submission System. See the previous talk at CHEP03, monitoring track, THET001.
A tool for batch job submission, real-time monitoring and bookkeeping.
Interfaced to many schedulers, both local and grid.
Uses a relational database for persistency.
Full logging and bookkeeping information is stored.
Job commands: submit, kill, query and output retrieval.
Users can define custom job types, which specify monitoring unique to the submitted application.

4 BOSS in CMS computing
Used in CMS MC production for 4 years.
The prototype CMS distributed analysis system (GROSS) was based on BOSS, and the later new analysis system also uses BOSS.
Last year it was decided that the BOSS architecture needed to be redesigned to meet the changing requirements of CMS computing.
[Diagram labels: BOSS; logging & bookkeeping; monitoring; production / analysis tool.]

5 V3.x workflow I
[Diagram labels: boss submit, boss query, boss kill; BOSS DB; BOSS; Scheduler; farm node; Wrapper.]
The user specifies job parameters, including:
–Executable name.
–Executable type, which turns on customized monitoring.
–Output files to retrieve (for sites without a shared file system, and for grid).
The user tells BOSS to submit jobs, specifying the scheduler, e.g. PBS, LSF, SGE, Condor, LCG, gLite.
A job consists of the job wrapper, the real-time monitoring service and the user's executable.

6 V3.x workflow II
Once running, the wrapper starts the real-time monitoring services and the user's executable.
It writes all logging information (start time, finish time, exit code etc.) to a local journal file.
Monitoring services parse the job output, looking for regular expressions specified by the job type.
Monitoring information is saved to the journal file and returned to the user via a database connection to the BOSS DB or via R-GMA (if possible).
[Diagram labels: User job; output; Filter; journal; BOSS jobExecutor; BOSS dbUpdator; BOSS DB.]
User job (type "test"):
    #!/usr/bin/perl
    $i = 0;
    while($i<3){
      sleep(1); $i++;
      print "counter $i\n";
    }
Job output:
    counter 1
    counter 2
    counter 3
Filter:
    #!/usr/bin/perl
    while(<STDIN>){
      if($_=~/.*counter\s+(\d+).*/){
        print "COUNTER=$1\n";
      }
    }
Filter output:
    COUNTER=1
    COUNTER=2
    COUNTER=3
Journal:
    1234 test counter 1
    1234 test counter 2
    1234 test counter 3
BOSS DB, table "test":
    JOBID COUNTER
    12345 0
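
The filtering step on this slide can be sketched in Python. This is a minimal illustration, not BOSS code: the function name `filter_output` is hypothetical, and the journal line layout (job id, job type, matched text) is an assumption modelled on the sample journal shown on the slide.

```python
import re

# Same idea as the Perl filter on the slide: capture the number after "counter".
COUNTER_RE = re.compile(r"counter\s+(\d+)")

def filter_output(lines, job_id, job_type):
    """Return (updates, journal) for the given job output lines.

    updates: "COUNTER=<n>" strings, as printed by the slide's filter.
    journal: "<job_id> <job_type> counter <n>" lines (assumed layout).
    """
    updates, journal = [], []
    for line in lines:
        match = COUNTER_RE.search(line)
        if match is None:
            continue  # lines that do not match the job type's pattern are ignored
        value = match.group(1)
        updates.append(f"COUNTER={value}")
        journal.append(f"{job_id} {job_type} counter {value}")
    return updates, journal

if __name__ == "__main__":
    output = ["counter 1", "some other text", "counter 2", "counter 3"]
    updates, journal = filter_output(output, job_id=1234, job_type="test")
    print(updates)
    print(journal)
```

Keeping the pattern separate from the loop mirrors the slide's point that the regular expressions belong to the job type rather than to the wrapper.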

7 V3.x workflow III
Using BOSS, the user can get the status of jobs, pulling in information from the BOSS DB, the scheduler and the real-time monitoring DB.
When a job finishes, its output is automatically stored at the final destination if possible (e.g. a shared file system on a local cluster); if not (e.g. LCG), the output must be fetched with a separate BOSS command.
If real-time monitoring is not available (e.g. because of a firewall), the BOSS DB can be updated from the journal file.
% boss q -all -specific -type test
ID S_USR  EXECUTABLE    ST EXE_HOST   START TIME     STOP TIME      counter
1  grandi test.pl    15 E  pccms10.bo 14:30:00 06/06 14:30:16 06/06 3
2  grandi test.pl    15 R  pccms10.bo 14:30:02 06/06 -------------- 2
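
The fallback mentioned on this slide, updating the BOSS DB from the journal file when real-time monitoring is unavailable, might look like the following sketch. The table name `test`, its columns and the journal line layout are assumptions taken from the toy example in this talk, and the function name is hypothetical. SQLite is used only to keep the sketch self-contained.

```python
import sqlite3

def update_db_from_journal(conn, journal_lines):
    """Replay journal lines of the form "<job_id> <job_type> counter <n>" into the DB.

    The latest value seen for each job wins, which is what a real-time
    updater would have left in the table.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS test (jobid INTEGER PRIMARY KEY, counter INTEGER)"
    )
    for line in journal_lines:
        fields = line.split()
        if len(fields) != 4 or fields[2] != "counter":
            continue  # skip lines this sketch does not understand
        job_id, value = int(fields[0]), int(fields[3])
        conn.execute(
            "INSERT OR REPLACE INTO test (jobid, counter) VALUES (?, ?)",
            (job_id, value),
        )
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    journal = ["1234 test counter 1", "1234 test counter 2", "1234 test counter 3"]
    update_db_from_journal(conn, journal)
    print(conn.execute("SELECT jobid, counter FROM test").fetchall())
```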

8 Proposed changes
Following experience with the CMS MC and distributed analysis systems, it was decided to re-engineer BOSS.
Provide a C++ and Python API (via SWIG) to allow higher-level tools to steer BOSS.
Introduce task, chain and program.
–Program is the user's executable.
–Chain is an arbitrarily complex set of different programs run on the same worker node.
–Task is a group of homogeneous jobs that may be executed in parallel.
To describe the new task hierarchy, move to XML task descriptions.
Separate bookkeeping from real-time monitoring.
Improve real-time monitoring but leave it optional.
Allow multiple real-time monitoring mechanisms.
Allow pluggable chaining tools, e.g. ShReek (CHEP06 id 276).
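
The task / chain / program hierarchy introduced on this slide can be pictured with a few plain data classes. This is an illustrative sketch only: the class and field names are hypothetical and are not the BOSS C++ or Python API, and treating each job of a task as one chain is an assumption made here for simplicity.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Program:
    """A single user executable."""
    exec_name: str
    args: str = ""

@dataclass
class Chain:
    """A set of programs run on the same worker node (kept as a simple list here)."""
    programs: List[Program] = field(default_factory=list)

@dataclass
class Task:
    """A group of homogeneous jobs that may run in parallel (assumed: one chain per job)."""
    name: str
    chains: List[Chain] = field(default_factory=list)

    def program_count(self) -> int:
        """Total number of programs across all chains of the task."""
        return sum(len(chain.programs) for chain in self.chains)

if __name__ == "__main__":
    task = Task("demo", [Chain([Program("test.pl", str(i))]) for i in range(100)])
    print(len(task.chains), task.program_count())
```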

9 Logging and Monitoring
Separate the user's logging DB and the (optional) monitoring DB.
Only allow access to the logging DB via BOSS tools, i.e. remove all server requirements (this allows a personal DB implemented in SQLite on local disk).
Fill the logging database with BOSS tools, using information from the monitoring DB and the journal file retrieved at the end of the job.
The real-time server is updated by an updater on the worker node.
The transport mechanism may use a proxy server.
Possible implementations of the real-time update mechanism: R-GMA, MonALISA etc.
Allow different RT mechanisms for each job.
Information in the monitoring database expires.
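
Two ideas from this slide, monitoring information that expires and a logging record filled from both the monitoring DB and the journal, can be sketched as follows. Everything here is hypothetical: the class and function names, the in-memory store standing in for the monitoring DB, and the rule that journal values override monitoring values are assumptions of the sketch, not BOSS behaviour.

```python
import time

class MonitoringStore:
    """In-memory stand-in for the monitoring DB: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the example does not need to sleep
        self._entries = {}  # job_id -> (timestamp, info dict)

    def update(self, job_id, info):
        """Store the latest monitoring info for a job, stamped with the current time."""
        self._entries[job_id] = (self.clock(), dict(info))

    def get(self, job_id):
        """Return the info for job_id, or None if it is missing or has expired."""
        entry = self._entries.get(job_id)
        if entry is None:
            return None
        stamp, info = entry
        if self.clock() - stamp > self.ttl:
            del self._entries[job_id]
            return None
        return info

def fill_logging_record(job_id, store, journal_info):
    """Merge monitoring info (if still available) with the journal retrieved at job end.

    Assumption of this sketch: journal values take precedence.
    """
    record = dict(store.get(job_id) or {})
    record.update(journal_info)
    return record

if __name__ == "__main__":
    now = [0.0]
    store = MonitoringStore(ttl_seconds=60, clock=lambda: now[0])
    store.update(1234, {"counter": 2})
    print(fill_logging_record(1234, store, {"exit_code": 0}))
    now[0] = 120.0  # the monitoring entry is now older than the TTL
    print(fill_logging_record(1234, store, {"exit_code": 0, "counter": 3}))
```

Because the logging record can always be completed from the journal, losing an expired monitoring entry costs nothing in this sketch, which is consistent with monitoring being optional.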

10 New data flow

11 New job wrapper
The job wrapper will start the chainer and monitoring modules.
The job chainer will launch each executable separately within its own environment.
The job wrapper will provide 2 levels of monitoring: job level and executable level.
–Job-level monitoring includes overall variables such as total time, total memory usage etc.
–Executable-level monitoring will monitor the executable's progress and journal.
Future plans include allowing action to be taken when certain conditions are met, e.g. running out of memory or detecting infinite loops.
[Diagram labels: JobExecuter (wrapper); JobMonitor (real-time updater); JobChaining; Chain; Journal; TaskExecutor and ProgramExecuter blocks, each with stdin, user exec, pre-filter, runtime-filter, post-filter, stdout, stderr.]
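
A linear chainer of the kind described on this slide could be sketched as below: each executable is launched separately with its own copy of the environment, and both per-executable and whole-job figures are recorded. The function name `run_chain`, the record fields and the stop-on-first-failure rule are assumptions of this sketch, not the BOSS wrapper.

```python
import os
import subprocess
import sys
import time

def run_chain(programs):
    """Run programs one after another, each with its own environment.

    programs: list of (argv, extra_env) pairs.
    Returns (job_summary, per_program_records).
    """
    records = []
    job_start = time.time()
    for argv, extra_env in programs:
        env = dict(os.environ)  # fresh copy, so changes never leak between programs
        env.update(extra_env)
        start = time.time()
        result = subprocess.run(argv, env=env, capture_output=True, text=True)
        records.append({
            "argv": argv,
            "exit_code": result.returncode,
            "stdout": result.stdout,
            "seconds": time.time() - start,
        })
        if result.returncode != 0:
            break  # stop at the first failure (a choice made for this sketch)
    job_summary = {
        "total_seconds": time.time() - job_start,
        "programs_run": len(records),
    }
    return job_summary, records

if __name__ == "__main__":
    show = "import os; print(os.environ.get('STEP'))"
    chain = [
        ([sys.executable, "-c", show], {"STEP": "one"}),
        ([sys.executable, "-c", show], {"STEP": "two"}),
    ]
    summary, records = run_chain(chain)
    print(summary["programs_run"], [r["stdout"].strip() for r in records])
```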

12 Sample Task specification
<program exec="test.pl"
         args="ITR"
         stderr="err_ITR"
         program_type="test"
         stdin="in"
         stdout="out_ITR"
         infiles="Examples/test.pl,Examples/in"
         outfiles="out_ITR,err_ITR"
         outtopdir="" />
Example of a task containing 100 chains, each consisting of 1 program.
Program-specific monitoring is activated; results are returned via a MySQL connection.
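
To make the role of `ITR` concrete, the sketch below parses the `<program>` element and expands it into 100 attribute sets. It assumes that `ITR` is a placeholder replaced by an iteration index starting at 0. The slide does not show how the iteration is declared, so the index range and the helper name `expand_program` are hypothetical.

```python
import xml.etree.ElementTree as ET

PROGRAM_XML = (
    '<program exec="test.pl" args="ITR" stderr="err_ITR" program_type="test" '
    'stdin="in" stdout="out_ITR" infiles="Examples/test.pl,Examples/in" '
    'outfiles="out_ITR,err_ITR" outtopdir="" />'
)

def expand_program(xml_text, count, placeholder="ITR"):
    """Return one attribute dict per iteration, with the placeholder replaced by the index."""
    template = ET.fromstring(xml_text).attrib
    return [
        {key: value.replace(placeholder, str(index)) for key, value in template.items()}
        for index in range(count)
    ]

if __name__ == "__main__":
    programs = expand_program(PROGRAM_XML, 100)
    print(len(programs))
    print(programs[7]["stdout"], programs[7]["outfiles"])
```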

13 Status and plans
Significant new functionality has been identified and is being actively integrated into BOSS.
The latest release, v3.6, includes much of the new functionality:
–Tasks, jobs and executables.
–XML task description.
–C++ and Python APIs.
–Basic executable chaining: currently only the default chainer, with linear chaining.
–Separate logging and monitoring DBs.
–DBs implemented in either MySQL or SQLite (more to come).
–Optional RT monitoring with multiple implementations, currently only MonALISA and direct MySQL connections (to be deprecated).
Still to be done:
–Allow chainer plugins.
–Implement more RT monitoring solutions, e.g. R-GMA.
–Finalize the API.
–Look at writing the wrapper in a scripting language, e.g. Perl/Python.

