
James Cunha Enabling Grid Computer for HEP Babar Team at University of Manchester Resources:




1 James Cunha Enabling Grid Computer for HEP Babar Team at University of Manchester Resources:

2 James Cunha Human resource strategy
Physicists: Roger, George, John, Jenny, Mark, Marta, Christina, Ming, Nick, Mitch, Andy (11 worker loads)
Goals: HEP, frontiers of physics, …
They don't care about computers, the grid, or the popcorn machine: if available, they use them.
Guinea pig: James (2* worker loads)
Goal: integration and support
Computeers: Andrews, Alessandra, Mike, Chris, Sabah (3 worker loads)
Goals: new technologies, new technologies, new technologies, …
Total demand: 16 worker loads
* Jobs with 5 events instead of millions.

3 James Cunha Resources Strategy
Columns: Before June | September 2004
PCs: general interactive use, SLAC terminal (Babar software) | general interactive use, SLAC terminal (Babar software), Babar software CM2/Monte Carlo production
40 machines, 80 CPUs:
– Test bed: 10 CPUs, LCG2 (Babar software CM2, Monte Carlo, grid application development)
– Production: 70 CPUs, LCG2 (CE/WN only, exclusive non-Babar use)
Know-how: Workbook (physics) | Workbook (physics), A to Z Babar Computing

4 James Cunha Grid Test Bed

5 James Cunha

6 James Cunha Software: 850 packages. Tau datasets: range between 60 files (1 GB) and 150 files (1 GB). Total: 4,000 GB, ~10,000 files.

7 James Cunha Analysis Submission to Grid
Single command: ./easygrid dataset_name
Performs handler management and submission. The software is based on a state machine:
– Verify that skimdata is available. If not, run BbkDatasetTCL to generate the skimData. Each file will be a job.
– Verify whether there are handlers pending.
If not: generate the script (gera.c) with edg-job-submit and ClassAds, then execute it. Nest for submission policy and optimisation.
If yes: verify the job status. When all jobs have ended, recover the results into the user folder.
(Prototype)
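The flow on this slide can be sketched as a small state machine. This is only an illustration under assumptions: the state names, the three boolean inputs and the `trace` helper are hypothetical, and the real easygrid code (gera.c and friends) is not shown in the slides.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical state names, chosen to mirror the bullets on the slide.
    CHECK_SKIMDATA = auto()
    GENERATE_SKIMDATA = auto()
    CHECK_HANDLERS = auto()
    SUBMIT = auto()
    CHECK_STATUS = auto()
    RECOVER_RESULTS = auto()
    STOP = auto()


def next_state(state, skimdata_ready, handlers_pending, all_jobs_ended):
    """Return the state that follows `state` for one invocation of the tool."""
    if state is State.CHECK_SKIMDATA:
        return State.CHECK_HANDLERS if skimdata_ready else State.GENERATE_SKIMDATA
    if state is State.GENERATE_SKIMDATA:
        return State.CHECK_HANDLERS
    if state is State.CHECK_HANDLERS:
        return State.CHECK_STATUS if handlers_pending else State.SUBMIT
    if state is State.CHECK_STATUS:
        return State.RECOVER_RESULTS if all_jobs_ended else State.STOP
    # SUBMIT and RECOVER_RESULTS both end the invocation.
    return State.STOP


def trace(skimdata_ready, handlers_pending, all_jobs_ended):
    """List the states visited by a single run, in order."""
    state = State.CHECK_SKIMDATA
    visited = []
    while state is not State.STOP:
        visited.append(state.name)
        state = next_state(state, skimdata_ready, handlers_pending, all_jobs_ended)
    return visited
```

Calling `trace(False, False, False)` walks the first-run path (generate skimdata, then submit); calling it again with `handlers_pending=True` walks the status-check path that the following slides show.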

8 James Cunha Generation and submission
babar]$ ./easygrid SP-1005-Tau11-R14
Invalid configuration filename: /opt/edg/etc/vomses
Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
Enter GRID pass phrase for this identity:
Creating temporary proxy Done
Creating proxy Done
Searching pre selected skimdata.
Searching previous handlers.
Handlers not found. Submiting to GRID.
Wait end of process...

9 James Cunha Job Status
babar]$ ./easygrid SP-1005-Tau11-R14
Invalid configuration filename: /opt/edg/etc/vomses
Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
Enter GRID pass phrase for this identity:
Creating temporary proxy Done
Creating proxy Done
Searching pre selected skimdata.
Searching previous handlers.
Checking if jobs finished.
### Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/foRHhWyeDBnbqA9JkDADLg
Current Status: Scheduled
https://lcgrb01.gridpp.rl.ac.uk:9000/foRHhWyeDBnbqA9JkDADLg still pendent.
### Handle -> https://lxn1188.cern.ch:9000/8DdK3xruxtevNpei3zZbaA
Current Status: Scheduled
https://lxn1188.cern.ch:9000/8DdK3xruxtevNpei3zZbaA still pendent.
4 jobs did not finished ! Try again later.
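A status listing in this shape is easy to post-process. The sketch below assumes only the layout visible on the slide (a `### Handle ->` marker followed by `Current Status:`); the function names are hypothetical and are not part of easygrid.

```python
import re

# Matches one "### Handle -> <url> ... Current Status: <word>" pair.
HANDLE_RE = re.compile(r"### Handle -> (\S+)\s+Current Status:\s*(\w+)")

# Excerpt copied from the slide.
SAMPLE = """\
### Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/foRHhWyeDBnbqA9JkDADLg
Current Status: Scheduled
https://lcgrb01.gridpp.rl.ac.uk:9000/foRHhWyeDBnbqA9JkDADLg still pendent.
### Handle -> https://lxn1188.cern.ch:9000/8DdK3xruxtevNpei3zZbaA
Current Status: Scheduled
https://lxn1188.cern.ch:9000/8DdK3xruxtevNpei3zZbaA still pendent.
"""


def parse_status(output):
    """Map each job handle found in `output` to its reported status."""
    return dict(HANDLE_RE.findall(output))


def pending_handles(statuses):
    """Return the handles whose status is anything other than Done."""
    return [handle for handle, status in statuses.items() if status != "Done"]
```

Treating every status other than `Done` as pending is an assumption made for the sketch; the slides show `Scheduled` and `Ready` as pending states but do not list the others.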

10 James Cunha Job Status and recovery
babar]$ ./easygrid SP-1005-Tau11-R14
Invalid configuration filename: /opt/edg/etc/vomses
Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
Enter GRID pass phrase for this identity:
Creating temporary proxy Done
Creating proxy Done
Searching pre selected skimdata.
Searching previous handlers.
Checking if jobs finished.
### Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/foRHhWyeDBnbqA9JkDADLg
Current Status: Done
Exit code: 0
### Handle -> https://lxn1188.cern.ch:9000/8DdK3xruxtevNpei3zZbaA
Current Status: Done
Exit code: 0
0 jobs did not finished ! Try again later.
All jobs done. Recovering results in your folder.
Results in the following folders:
/home/jamwer/grid_sub/babar/jamwer_foRHhWyeDBnbqA9JkDADLg
/home/jamwer/grid_sub/babar/jamwer_8DdK3xruxtevNpei3zZbaA
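The result folders listed here appear to be named after the user and the last component of each job handle. A minimal sketch of that naming, inferred only from the paths on the slide (the helper name `result_folder` is hypothetical, and the real tool may build the path differently):

```python
import posixpath
from urllib.parse import urlparse


def result_folder(base_dir, user, handle):
    """Build '<base_dir>/<user>_<job id>' from a job handle URL."""
    # The job id is taken to be the final path component of the handle.
    job_id = posixpath.basename(urlparse(handle).path)
    return posixpath.join(base_dir, f"{user}_{job_id}")
```

For example, `result_folder("/home/jamwer/grid_sub/babar", "jamwer", "https://lxn1188.cern.ch:9000/8DdK3xruxtevNpei3zZbaA")` reproduces the second folder in the listing.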

11 James Cunha Monte Carlo Submission to Grid
Single command: ./mcgrid JobName num_copies
Performs handler management and submission. The software is based on a state machine:
– Verify whether there are handlers pending.
If not: generate the script (geramc.c) with edg-job-submit and ClassAds for each copy, then execute it. Nest for submission policy and optimisation.
If yes: verify the job status. When all jobs have ended, recover the results into the user folder.
(Prototype)
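Generating one job description per copy might look roughly like the sketch below. It is written in Python rather than C and is not the geramc.c code: the executable name `run_mc.sh`, the file naming and both helper functions are assumptions made for illustration. The attribute names follow the ClassAd-style job description language used with edg-job-submit, and the command is only built here, not executed.

```python
def make_jdl(job_name, copy_index):
    """Return a ClassAd-style job description for one Monte Carlo copy."""
    tag = f"{job_name}_{copy_index}"
    lines = [
        'Executable = "run_mc.sh";',  # hypothetical wrapper script
        f'Arguments = "{job_name} {copy_index}";',
        f'StdOutput = "{tag}.out";',
        f'StdError = "{tag}.err";',
        'InputSandbox = {"run_mc.sh"};',
        f'OutputSandbox = {{"{tag}.out", "{tag}.err"}};',
    ]
    return "\n".join(lines) + "\n"


def plan_submission(job_name, num_copies):
    """Return (jdl_path, jdl_text, command) for each requested copy."""
    jobs = []
    for index in range(1, num_copies + 1):
        jdl_path = f"{job_name}_{index}.jdl"
        command = ["edg-job-submit", jdl_path]
        jobs.append((jdl_path, make_jdl(job_name, index), command))
    return jobs
```

With `plan_submission("MCteste", 3)` this yields three descriptions, matching the three handles that appear in the status output for `./mcgrid MCteste 3`.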

12 James Cunha MC Submission
mcgrid1]$ ./mcgrid MCteste 3
Invalid configuration filename: /opt/edg/etc/vomses
Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
Enter GRID pass phrase for this identity:
Creating temporary proxy Done
Creating proxy Done
Searching previous handlers.
Handlers not found. Submiting to GRID.
Wait end of process...

13 James Cunha Job Status
mcgrid1]$ ./mcgrid MCteste 3
Invalid configuration filename: /opt/edg/etc/vomses
Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
Enter GRID pass phrase for this identity:
Creating temporary proxy Done
Creating proxy Done
Searching previous handlers.
Checking if jobs finished.
### Handle -> https://lxn1188.cern.ch:9000/9WzceoIMEQoTK24a-UvOmw
Current Status: Scheduled
https://lxn1188.cern.ch:9000/9WzceoIMEQoTK24a-UvOmw still pendent.
### Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/c4iCB8vioozaGteI9hybIg
Current Status: Ready
https://lcgrb01.gridpp.rl.ac.uk:9000/c4iCB8vioozaGteI9hybIg still pendent.
### Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/L5BD1OE--eckTm5RXkp2nA
Current Status: Ready
https://lcgrb01.gridpp.rl.ac.uk:9000/L5BD1OE--eckTm5RXkp2nA still pendent.
3 jobs did not finished ! Try again later.

14 James Cunha Job status and recovery
mcgrid1]$ ./mcgrid MCteste 3
Invalid configuration filename: /opt/edg/etc/vomses
Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
Enter GRID pass phrase for this identity:
Creating temporary proxy Done
Creating proxy Done
Searching previous handlers.
Checking if jobs finished.
### Handle -> https://lxn1188.cern.ch:9000/9WzceoIMEQoTK24a-UvOmw
Current Status: Done
Exit code: 0
### Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/c4iCB8vioozaGteI9hybIg
Current Status: Done
Exit code: 0
0 jobs did not finished ! Try again later.
All jobs done. Recovering results in your folder.
Results in the following folders:
/home/jamwer/grid_sub/mcgrid1/jamwer_9WzceoIMEQoTK24a-UvOmw
/home/jamwer/grid_sub/mcgrid1/jamwer_c4iCB8vioozaGteI9hybIg
/home/jamwer/grid_sub/mcgrid1/jamwer_L5BD1OE--eckTm5RXkp2nA

15 James Cunha Testing Submission Script
Load range: worker load x number of files
– 16 x 60 files = 960 jobs pending
– 16 x 150 files = 2400 jobs pending
Test with submission script
Columns: 100 jobs (submission, result recovery) | 1000 jobs (submission, result recovery)
Rows, with the values that survive (column positions lost): Done; Aborted **; Scheduled 79; Fail 1 **, 630 *, 630 *
* sslv3 alert handshake failure
** "Please wait job enter the Done status." This never happens!
The Resource Broker is neither reliable nor robust. It sometimes fails 3 days a week, or takes hours to submit/dispatch to a CE (empty!).
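The load range on this slide is a simple product of worker loads and files per dataset. A small sketch of that arithmetic, using the figures from the slides (16 worker loads, 60 to 150 files per dataset); the function name is hypothetical:

```python
WORKER_LOADS = 16          # total demand from the human resource slide
FILES_PER_DATASET = (60, 150)  # smallest and largest Tau dataset


def pending_jobs(worker_loads, files_per_dataset):
    """Each file becomes one job, so the load is a plain product."""
    return worker_loads * files_per_dataset


# Lower and upper bound of the number of jobs pending at once.
LOAD_RANGE = tuple(pending_jobs(WORKER_LOADS, n) for n in FILES_PER_DATASET)
```

`LOAD_RANGE` evaluates to `(960, 2400)`, which brackets the 100-job and 1000-job test runs.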

16 James Cunha Pending Infrastructure => Course of action
– Babar software know-how is not available at Manchester => web page & network skills.
– Quality assurance => We are OK! From benchmark (E x P).
– A real application to perform the complete cycle, acquire know-how, and serve as grid proof-of-concept is missing => partnership with physicists.
– CERN does NOT recognise the Babar community => Let's reduce their priority!
– RB at Manchester => 60 MB binaries and policy freedom.
– SE/RC at Manchester => policy and job submission freedom.
– Mass storage (10 TB) for Babar purposes => CAP!
– UI in the AFS => wide access to Manchester farms.
– Apprenticeship at RAL and later at SLAC (production and experiment) => improve where others fail.
– Configuration for optimal job performance/submission at Tier 2 (1 CE x 50 WN? Performance of dCache with Babar software? Why 10 TB if Liverpool bought 80 TB? Electricity bill?) => analyse procedures to improve QoS and achieve a better site configuration.
– Update (software and data) and operational policies => operational standards to achieve high QoS.

17 James Cunha Aimed Hardware Architecture (Redundant RB with alternate access)

18 James Cunha Aimed Software Architecture

19 James Cunha Production Job Submission Package
– Operational policies/integration with RB (application level).
– Recovery of aborted status.
– Resource optimisation.
– Integration with RC (application level) for development of replica policies.
– Interactive data visualisation (useful?).
– Integration with GridSite (data visualisation, analysis, performance monitor, and submission).
– Professional version.

20 James Cunha Summary
Integrate LCG2 and job submission with Babar/CM2 at the University of Manchester for Tau physics modelling, analysis and MC generation.
We aim to be, soon:
– The largest site in the UK.
– A leader in grid computing and HEP.

21 James Cunha Conclusion
Babar CM2 is running at Manchester!
The LCG2 grid is running with a real-world experiment!
The Babar submission prototype to the grid is running!
LCG is not LHC software only! It is Babar's.
We are doing today what will take you years to achieve. Let's work together!

