EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks WMS tricks & tips – further scripting Giuseppe.


1 EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks WMS tricks & tips – further scripting Giuseppe La Rocca INFN – Catania giuseppe.larocca@ct.infn.it Training the Trainers – in conjunction with ISGC 19-23 March 2007 - TAIPEI

2 Preliminaries
LCG middleware
– The workload is managed by the Resource Broker (RB)
– Supports neither parametric jobs nor DAGs
– Works fine
gLite
– Supports both parametric and DAG jobs
– Under development
– Uses WMProxy to manage the workload
– Will be available in a few months
Tips and tricks
– Some ideas for using the LCG RB to support parametric jobs and DAGs while waiting for a stable WMProxy release

3 Exercise 1: Parametric jobs

4 About Parametric jobs
A Parametric job is a job with one or more parametric attributes in its JDL. The parametric attributes change their values according to another attribute (named Parameters).
Submitting a Parametric job results in the submission of a set of jobs whose descriptions are identical apart from the values of the parametric attributes.
The WMS assigns an identifier to each job resulting from the submission, so the jobs can be monitored and controlled both separately and as a single entity (through the parametric job handle).

5 An example

    [
      JobType = "Parametric";
      Executable = "cms_sim.exe";
      StdOutput = "myoutput_PARAM_.txt";
      StdError = "myerror_PARAM_.txt";
      Parameters = 3;
      ParameterStart = 1;
      ParameterStep = 1;
      InputSandbox = {"cms_sim.exe","input_PARAM_.txt"};
      OutputSandbox = {"myoutput_PARAM_.txt", "myerror_PARAM_.txt"};
      Requirements = other.GlueCEInfoTotalCPUs > 2;
      Rank = other.GlueCEStateFreeCPUs;
    ]
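To make the role of the _PARAM_ placeholder concrete, the sketch below substitutes a value into two attributes of the example by hand. It only illustrates the textual substitution: the helper name expand_param and the values 1 and 2 are chosen for illustration, and which values are actually generated from Parameters, ParameterStart and ParameterStep is decided by the WMS, not by this script.

```shell
# Hypothetical helper: replace every _PARAM_ in the text read from stdin
# with the value given as "$1" (assumed to be a plain number).
expand_param() {
  sed "s/_PARAM_/$1/g"
}

template='StdOutput = "myoutput_PARAM_.txt";
InputSandbox = {"cms_sim.exe","input_PARAM_.txt"};'

# Illustrative values only; the real set of values is chosen by the WMS.
for value in 1 2; do
  echo "--- job with _PARAM_ = $value"
  printf '%s\n' "$template" | expand_param "$value"
done
```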

6 The present: using scripts
Submitter script (submitter.sh)
– Creates multiple JDL files and submits them
– Inspects the status of each submitted job
– Retrieves the output files produced by each job

7 Exercise 1: The bash script (1/2)
A set of jobs that differ only in their input files
– The bash script (submitter.sh) looks like this

    #!/bin/bash
    if [ "$2" = "" ]; then
      echo "Usage: $0 begin end [step]"
      echo "  begin  The first value of the sequence"
      echo "  end    The last value of the sequence"
      echo "  step   The step between two submissions"
      exit 0
    fi
    joblist="jobs.list"
    begin_index=$1   # the first parameter of the script
    end_index=$2     # the second parameter of the script
    if [ "$3" = "" ]; then
      step=1
    else
      step=$3        # the third parameter of the script
    fi
    ...

8 Exercise 1: The bash script (2/2)

    # starts iterations
    for ((index=$begin_index; index<=$end_index; index=$index+$step))
    do
      # we generate the input file automatically. Obviously it can be made by hand
      inputfile="input$index.txt"
      echo "creating input file $inputfile"
      echo "The name of this input file is $inputfile" > $inputfile
      # create the corresponding jdl file depending on the index
      jdlfile="job$index.jdl"   # name of the jdl
      echo "creating JDL file $jdlfile"
      (
        echo 'Type="Job";'
        echo 'JobType="Normal";'
        echo 'Executable="/bin/cat";'
        echo "Arguments=\"$inputfile\";"
        echo "StdOutput=\"stdout$index.txt\";"
        echo "StdError=\"stderr$index.txt\";"
        echo "InputSandbox={\"$inputfile\"};"
        echo "OutputSandbox={\"stdout$index.txt\", \"stderr$index.txt\"};"
      ) > $jdlfile
      glite-job-submit -o jobs.id $jdlfile   # actual job submission
    done   # end of iterations
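As an alternative to the chain of echo commands, the same JDL text could be produced with a here-document, which avoids most of the quote escaping. This is only a sketch: the helper name print_jdl is hypothetical, and the attributes are the ones used in the script above.

```shell
# Hypothetical helper: print the JDL for job number "$1" on stdout.
# The here-document is unquoted, so $index is expanded inside it.
print_jdl() {
  index=$1
  cat <<EOF
Type="Job";
JobType="Normal";
Executable="/bin/cat";
Arguments="input$index.txt";
StdOutput="stdout$index.txt";
StdError="stderr$index.txt";
InputSandbox={"input$index.txt"};
OutputSandbox={"stdout$index.txt", "stderr$index.txt"};
EOF
}

# Show the JDL for job 1; in the submitter loop one would redirect it,
# for example: print_jdl "$index" > "job$index.jdl"
print_jdl 1
```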

9
    $ ./submitter.sh 1 2
    creating input file input1.txt; creating JDL file job1.jdl
    Selected Virtual Organisation name (from proxy certificate extension): gilda
    Connecting to host glite-rb2.ct.infn.it, port 7772
    Logging to host glite-rb2.ct.infn.it, port 9002
    =============== glite-job-submit Success ===============
    The job has been successfully submitted to the Network Server.
    Use glite-job-status command to check job current status.
    Your job identifier is:
    - https://glite-rb2.ct.infn.it:9000/xD7PgJBfdbyE-x5O9LhBWA
    The job identifier has been saved in the following file:
    /home/larocca/tips_and_tricks/parametric/jobs.id
    ========================================================
    creating input file input2.txt; creating JDL file job2.jdl
    Selected Virtual Organisation name (from proxy certificate extension): gilda
    Connecting to host glite-rb2.ct.infn.it, port 7772
    Logging to host glite-rb2.ct.infn.it, port 9002
    =============== glite-job-submit Success ===============
    The job has been successfully submitted to the Network Server.
    Use glite-job-status command to check job current status.
    Your job identifier is:
    - https://glite-rb2.ct.infn.it:9000/jF7Dz4nzzDrTywIgUijCYQ
    The job identifier has been saved in the following file:
    /home/larocca/tips_and_tricks/parametric/jobs.id
    ========================================================

10
    $ glite-job-status --noint -i jobs.id
    *************************************************************
    BOOKKEEPING INFORMATION:
    Status info for the Job : https://glite-rb2.ct.infn.it:9000/xD7PgJBfdbyE-x5O9LhBWA
    Current Status:  Done (Success)
    Exit Code:       0
    Status Reason:   Job terminated successfully
    Destination:     grid011f.cnaf.infn.it:2119/jobmanager-lcgpbs-long
    Submitted:       Thu Mar 15 16:15:31 2007 CET
    *************************************************************
    BOOKKEEPING INFORMATION:
    Status info for the Job : https://glite-rb2.ct.infn.it:9000/jF7Dz4nzzDrTywIgUijCYQ
    Current Status:  Done (Success)
    Exit Code:       0
    Status Reason:   Job terminated successfully
    Destination:     ce-nano-37.to.infn.it:2119/jobmanager-lcgpbs-short
    Submitted:       Thu Mar 15 16:15:35 2007 CET
    *************************************************************
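When many jobs are listed in jobs.id, the full bookkeeping report becomes long. The sketch below condenses it to one line per job. It assumes that the "Status info for the Job" and "Current Status" fields each sit on their own line, as in the output on this slide; the helper name summarize_status is hypothetical.

```shell
# Hypothetical helper: read a glite-job-status report on stdin and print
# "<job id> <status>" for every job found in it.
summarize_status() {
  awk '
    /Status info for the Job/ { id = $NF }        # job id is the last field
    /Current Status:/         { print id, $3 }    # status is the third field
  '
}

# Sample input copied from the slide. In practice one would run:
#   glite-job-status --noint -i jobs.id | summarize_status
summarize_status <<'EOF'
Status info for the Job : https://glite-rb2.ct.infn.it:9000/xD7PgJBfdbyE-x5O9LhBWA
Current Status:     Done (Success)
Status info for the Job : https://glite-rb2.ct.infn.it:9000/jF7Dz4nzzDrTywIgUijCYQ
Current Status:     Done (Success)
EOF
```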

11
    $ glite-job-output --dir . --noint -i jobs.id
    Retrieving files from host: glite-rb2.ct.infn.it ( for https://glite-rb2.ct.infn.it:9000/xD7PgJBfdbyE-x5O9LhBWA )
    ************************************************************
    JOB GET OUTPUT OUTCOME
    Output sandbox files for the job:
    - https://glite-rb2.ct.infn.it:9000/jF7Dz4nzzDrTywIgUijCYQ
    have been successfully retrieved and stored in the directory:
    /home/larocca/tips_and_tricks/parametric/larocca_jF7Dz4nzzDrTywIgUijCYQ
    ************************************************************
    Retrieving files from host: glite-rb2.ct.infn.it ( for https://glite-rb2.ct.infn.it:9000/jF7Dz4nzzDrTywIgUijCYQ )
    ************************************************************
    JOB GET OUTPUT OUTCOME
    Output sandbox files for the job:
    - https://glite-rb2.ct.infn.it:9000/jF7Dz4nzzDrTywIgUijCYQ
    have been successfully retrieved and stored in the directory:
    /home/larocca/tips_and_tricks/parametric/larocca_jF7Dz4nzzDrTywIgUijCYQ
    ************************************************************

12 Exercise 2: DAGs

13 Exercise 2: DAG modelling
DAGs can be emulated with a simplified Petri net
– A job is submitted only when its activating jobs have terminated
– Each transition bar corresponds to a bash script that
  • waits for termination of the activating job(s) by polling every minute
  • collects the output
  • submits the next job(s)
[Figure: Petri net with jobs 1 to 6]
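The waiting part of a transition can be sketched as a small function. This is a minimal illustration under stated assumptions, not the scripts shown in the following slides: wait_for_jobs, get_status and POLL_INTERVAL are hypothetical names, get_status is a stub that would in practice wrap glite-job-status, and jobs that never reach Done (for example aborted ones) are not handled.

```shell
# Seconds between two polls; the slide polls every minute.
POLL_INTERVAL=${POLL_INTERVAL:-60}

# Hypothetical transition: returns once every job id passed as an argument
# reports "Done". The caller must define get_status so that it prints the
# current status of the job id given as "$1".
wait_for_jobs() {
  for job in "$@"; do
    until [ "$(get_status "$job")" = "Done" ]; do
      sleep "$POLL_INTERVAL"
    done
    echo "job $job finished"
  done
}

# Stub for demonstration only: pretend that every job is already Done.
get_status() { echo "Done"; }
POLL_INTERVAL=0

wait_for_jobs job1 job2
echo "all activating jobs finished: the next job(s) can be submitted"
```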

14 Exercise 2: An example
We emulate a simple split-and-merge DAG
– Two-state machine
– This example can be extended to any possible DAG
The scripts
– ./submitter.sh: generates input[1..n].txt and submits the jobs
– ./polling.sh: waits for completion of jobs [1..n], collects the output and creates the final input file
– ./last_job.sh: submits the last job, waits for its completion and downloads the output
[Figure: split-and-merge diagram. Jobs 1, 2, ..., n with input1.txt, input2.txt, ..., input(n).txt; labels: stdout, final_input, last, final_output]

15 Exercise 2: ./submitter.sh

    #!/bin/bash
    if [ "$1" = "" ]; then
      echo "Usage: $0 num-splits"; exit 0
    fi
    for ((index=1; index<=$1; index++)); do   # for each job
      echo "this is the content of input$index.txt" >> input$index.txt
      (   ## creates the jdl for this job
        echo "Type=\"Job\";"
        echo "JobType=\"Normal\";"
        echo "Executable=\"/bin/cat\";"
        echo "Arguments=\"input$index.txt\";"
        echo "InputSandbox={\"input$index.txt\"};"
        echo "StdOutput=\"stdout.txt\";"
        echo "StdError=\"stderr.txt\";"
        echo "OutputSandbox={\"stdout.txt\", \"stderr.txt\"};"
      ) > job$index.jdl
      glite-job-submit -o jobs.id job$index.jdl
    done

16 Exercise 2: ./submitter.sh

    [fscibi@glite-tutor dag]$ ./submitter.sh 2
    ..
    The job has been successfully submitted to the Network Server.
    Use glite-job-status command to check job current status.
    Your job identifier is:
    - https://glite-rb2.ct.infn.it:9000/Od68j9IBOuJHGlUq-EfWTg
    The job identifier has been saved in the following file:
    /home/fscibi/tips_and_tricks/dag/jobs.id
    ..
    The job has been successfully submitted to the Network Server.
    Use glite-job-status command to check job current status.
    Your job identifier is:
    - https://glite-rb2.ct.infn.it:9000/-suh1wmmo1VvYJd_4AiLhA
    The job identifier has been saved in the following file:
    /home/fscibi/tips_and_tricks/dag/jobs.id
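The jobs.id file written by glite-job-submit -o starts with a header line before the identifiers. A compact way to list only the identifiers is sketched below. It assumes that every identifier starts with https://, as the ones on these slides do; the helper name read_job_ids is hypothetical.

```shell
# Hypothetical helper: print only the job identifiers stored in file "$1",
# skipping the header and anything else that is not an https:// URL.
read_job_ids() {
  grep '^https://' "$1"
}

# Demonstration on a temporary file that mimics jobs.id.
tmp=$(mktemp)
printf '%s\n' \
  '###Submitted Job Ids###' \
  'https://glite-rb2.ct.infn.it:9000/Od68j9IBOuJHGlUq-EfWTg' \
  'https://glite-rb2.ct.infn.it:9000/-suh1wmmo1VvYJd_4AiLhA' > "$tmp"
read_job_ids "$tmp"
rm -f "$tmp"
```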

17 Exercise 2: ./polling.sh (1/4)

    #!/bin/sh
    while read line; do
      if [ "$line" != "###Submitted Job Ids###" ]; then
        joblist="$joblist $line"
      fi
    done < jobs.id
    for job in $joblist; do
      status="unknown"
      finished="false"
      while [ "$finished" = "false" ]; do   # loops waiting for job completion
        ## Gets the status of the job
        echo
        echo "getting status of job $job"
        output=`glite-job-status $job`
        status=`echo "$output" | grep "Current Status" | awk '{print $3 }'`
        echo "status = $status"

18 Exercise 2: ./polling.sh (2/4)

        ## depending on the status, decides what to do
        case $status in
          "Aborted" )
            echo "The job has been aborted on the CE"
            finished="true"
            ;;
          "Cleared" )
            echo "The job output sandbox has been already retrieved. I don't know where!"
            finished="true"
            ;;
          "Done" )
            echo "Job $job Done!!! Downloading the output"
            ## executes and parses the output of glite-job-output
            ## to understand where the output has been stored

19 Exercise 2: ./polling.sh (3/4)

            glite-job-output --dir . $job | (
              ## reads the output of glite-job-output to find the output directory
              found="false"
              while read line; do
                if [ "$found" = "true" ]; then   ## this line contains the dir path
                  dirpath=$line
                  echo "output sandbox stored at $dirpath"
                  break
                fi
                if echo "$line" | grep -q "have been successfully retrieved and stored"; then
                  found="true"   ## next line contains the dir path
                fi
              done
              if [ "$found" = "true" ]; then
                filename=$dirpath/stdout.txt
                echo "appending $filename to final_input"
                cat $filename >> final_input
              fi
            )
            finished="true"
            ;;

20 Exercise 2: ./polling.sh (4/4)

          *)
            echo "sleeping 1 minute"
            sleep 1m
            ;;
        esac
      done   # while
    done   # for

    [fscibi@glite-tutor dag]$ ./polling.sh
    ... (after a while)
    getting status of job https://glite-rb2.ct.infn... Od68j9IBOuJHGlUq-EfWTg
    status = Done
    Job https://glite-rb2.ct.infn... Done!!! Downloading the output
    output sandbox stored at... /dag/fscibi_Od68j9IBOuJHGlUq-EfWTg
    appending ...dag/fscibi_Od68j9IBOuJHGlUq-EfWTg/stdout.txt to final_input
    getting status of job https://glite-rb2.ct.infn... _-suh1wmmo1VvYJd_4AiLhA
    status = Done
    Job https://glite-rb2.ct.infn... Done!!! Downloading the output
    output sandbox stored at... /dag/fscibi_-suh1wmmo1VvYJd_4AiLhA
    appending ... dag/fscibi_-suh1wmmo1VvYJd_4AiLhA/stdout.txt to final_input
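The step that locates the output directory can also be written as a single awk filter. The sketch below is an alternative formulation, not the script from the slides. It relies on the same assumption as polling.sh, namely that the directory path is on the line right after the "have been successfully retrieved and stored" message, and additionally assumes the path contains no spaces. The helper name extract_output_dir is hypothetical.

```shell
# Hypothetical helper: read glite-job-output text on stdin and print the
# directory path, taken from the line after the "successfully retrieved"
# message.
extract_output_dir() {
  awk '
    found { print $1; exit }   # line after the message: first field is the path
    /have been successfully retrieved and stored/ { found = 1 }
  '
}

# Sample input in the layout shown on the slides. In practice:
#   dirpath=$(glite-job-output --dir . "$job" | extract_output_dir)
extract_output_dir <<'EOF'
Output sandbox files for the job:
- https://glite-rb2.ct.infn.it:9000/q_rqVQpNFt0GshDn5MZHEw
have been successfully retrieved and stored in the directory:
/home/fscibi/tips_and_tricks/dag/fscibi_q_rqVQpNFt0GshDn5MZHEw
EOF
```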

21 Exercise 2: Submitting last job

    [fscibi@... ] cat last_job.sh
    #!/bin/sh
    ## submits the last job
    glite-job-submit -o last_job.id last_job.jdl
    status=unknown
    while [ "$status" != "Done" ]; do
      echo "sleeping 30 seconds"
      sleep 30s
      output=`glite-job-status -i last_job.id`
      status=`echo "$output" | grep "Current Status" | awk '{print $3 }'`
      echo "status = $status"
    done
    glite-job-output -i last_job.id --dir .
    echo "Everything is Done !!! "

    [fscibi@... ] cat last_job.jdl
    Type="Job";
    JobType="Normal";
    Executable="/bin/cat";
    Arguments="-n final_input";
    StdOutput="final_output";
    StdError="stderr.txt";
    InputSandbox={"final_input"};
    OutputSandbox={"stderr.txt", "final_output"};
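The loop in last_job.sh stops only when the status is Done, so it would keep polling if the job were aborted. One way to guard against that is to test for any final state, as sketched below. Done, Aborted and Cleared are the states handled in polling.sh; Cancelled is added on the assumption that it is also a final state. The helper name is_terminal is hypothetical.

```shell
# Hypothetical helper: succeed (exit status 0) if the status "$1" is one
# after which polling is pointless. "Cancelled" is an assumption; the
# slides only mention Done, Aborted and Cleared.
is_terminal() {
  case "$1" in
    Done|Aborted|Cleared|Cancelled) return 0 ;;
    *) return 1 ;;
  esac
}

# Demonstration with a few status values.
for status in Scheduled Running Done Aborted; do
  if is_terminal "$status"; then
    echo "$status: stop polling"
  else
    echo "$status: keep polling"
  fi
done
```

In last_job.sh the loop condition could then become until is_terminal "$status", followed by a check that the final status really is Done before calling glite-job-output.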

22 Exercise 2: ./last_job.sh output

    [fscibi@glite-tutor dag]$ ./last_job.sh
    ...
    The job has been successfully submitted to the Network Server.
    ...
    - https://glite-rb2.ct.infn.it:9000/q_rqVQpNFt0GshDn5MZHEw
    ...
    sleeping 30 seconds (many times)
    status = Scheduled (waiting for status Done)
    ...
    sleeping 30 seconds
    status = Done
    Retrieving files from host:...
    Output sandbox files for the job:
    - https://glite-rb2.ct.infn.it:9000/q_rqVQpNFt0GshDn5MZHEw
    have been successfully retrieved and stored in the directory:
    /home/fscibi/tips_and_tricks/dag/fscibi_q_rqVQpNFt0GshDn5MZHEw
    Everything is Done !!!

    [fscibi@glite-tutor dag]$ cat fscibi_q_rqVQpNFt0GshDn5MZHEw/final_output
    1 this is the content of input1.txt
    2 this is the content of input2.txt

23 Jobs + Data

24 Overview
Users can stage input files from the UI to the WN using the InputSandbox attribute

    InputSandbox = {"codesa.i686", "start_root.sh", "./Korba/atmbc.const",
                    "./Korba/bctran-window.3", "./Korba/codesa3d.fnames",
                    ".rootrc", "convert.C", "GraphCODESA3D.C"};

The upper limit for the InputSandbox is 10 MB!
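Before submitting, the total size of the sandbox files can be compared against this limit. The sketch below is only an illustration: the helper names are hypothetical, and reading "10 MB" as 10 * 1024 * 1024 bytes is an assumption, since the slide does not give an exact byte count.

```shell
# Assumed interpretation of the 10 MB limit, in bytes.
SANDBOX_LIMIT=$((10 * 1024 * 1024))

# Hypothetical helper: print the total size in bytes of the files given
# as arguments.
sandbox_size() {
  total=0
  for f in "$@"; do
    size=$(wc -c < "$f")
    total=$((total + size))
  done
  echo "$total"
}

# Hypothetical helper: report whether the files fit into the sandbox.
check_sandbox() {
  size=$(sandbox_size "$@")
  if [ "$size" -gt "$SANDBOX_LIMIT" ]; then
    echo "too large: $size bytes"
    return 1
  fi
  echo "ok: $size bytes"
}

# Demonstration with a small temporary file.
tmp=$(mktemp)
printf 'hello' > "$tmp"
check_sandbox "$tmp"
rm -f "$tmp"
```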

25 What can I do if my job needs to process a huge amount of data?
..the question

26 InputData
InputData (optional)
This is a string or a list of strings giving the Logical File Names (LFN) or Grid Unique Identifiers (GUID) of the files the job needs as input. The RB uses the list to find a CE close to the data (one from which the specified files can be accessed most easily) and schedules the job to run there.

    InputData = {"lfn:cmstestfile", "guid:135b7b23-4a6a-11d7-87e7-9d101f8c8b70"};

..the answer /1

27 DataAccessProtocol
DataAccessProtocol (mandatory if InputData has been specified)
The protocol or list of protocols that the application is able to "speak" in order to access the files listed in InputData on a given SE.
The protocols currently supported in gLite are gsiftp and file.

    DataAccessProtocol = {"file", "gsiftp"};

..the answer /2
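Because DataAccessProtocol is mandatory whenever InputData is present, a script that generates JDL files can emit the two attributes together. The following sketch shows one way to do that. The helper name print_data_attrs is hypothetical, and the protocol list is simply the one from this slide.

```shell
# Hypothetical helper: print an InputData attribute listing all arguments,
# followed by the DataAccessProtocol attribute that must accompany it.
print_data_attrs() {
  list=""
  for name in "$@"; do
    if [ -n "$list" ]; then
      list="$list, "
    fi
    list="$list\"$name\""
  done
  echo "InputData = {$list};"
  echo 'DataAccessProtocol = {"file", "gsiftp"};'
}

# Demonstration with the identifiers used on the previous slide.
print_data_attrs "lfn:cmstestfile" "guid:135b7b23-4a6a-11d7-87e7-9d101f8c8b70"
```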

28 pds2jpg-MERIS-Etna.jdl

    [
      JobType = "normal";
      Type = "Job";
      Executable = "/bin/bash";
      Arguments = "pds2jpg_install.sh \
        MER_FR__2PNUPA20030504_092534_000000502016_00079_06145_0033";
      StdOutput = "pds2jpg.out";
      StdError = "pds2jpg.err";
      InputSandbox = {"./pds2jpg_install.sh","./beam20.tar.gz"};
      InputData = {"lfn:/grid/gilda/MER_FR__2PNUPA20030504_092534_000000502016_00079_06145_0033.N1"};
      DataAccessProtocol = {"gridftp","rfio","gsiftp"};
      OutputSandbox = {
        "MER_FR__2PNUPA20030504_092534_000000502016_00079_06145_0033.jpg",
        "ENVISAT_Product_courtesy_of_European_Space_Agency",
        "pds2jpg.out",
        "pds2jpg.err"
      };
      RetryCount = 3;
    ]

29 pds2jpg_install.sh

    #!/bin/sh
    echo Staging Input Data \(Courtesy of European Space Agency\);
    # Skip the "/" from the argument.
    file=`echo $1 | awk -F '/' '{print $2}'`
    echo lcg-cp --vo gilda lfn:/grid/gilda/MER_FR__2PNUPA20030504_092534_000000502016_00079_06145_0033.N1 file:`pwd`/${file}.N1
    lcg-cp --vo gilda lfn:/grid/gilda/MER_FR__2PNUPA20030504_092534_000000502016_00079_06145_0033.N1 file:`pwd`/${file}.N1
    echo Staging Application;
    ls -al
    gunzip beam20.tar.gz;
    tar xvf beam20.tar;
    cd beam-2.0/bin;
    echo Starting Application;
    echo "./pds2jpg-run.sh $file;"
    ./pds2jpg-run.sh $file;
    echo "mv $file.jpg ../.."
    mv $file.jpg ../..
    touch ../../ENVISAT_Product_courtesy_of_European_Space_Agency
    echo "Input ENVISAT Product courtesy of European Space Agency" > ../../ENVISAT_Product_courtesy_of_European_Space_Agency
    echo No Output Packaging;
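The script above echoes several commands before running them and does not stop if one of them fails. Both habits can be combined in a small wrapper, sketched here under the assumption that a failed step should be reported and signalled to the caller. The helper name run_step is hypothetical and not part of the original script.

```shell
# Hypothetical helper: print the command line, run it, and report failure
# on stderr. Returns 1 if the command did not succeed.
run_step() {
  echo "$@"
  "$@" || { echo "step failed: $*" >&2; return 1; }
}

# Demonstration with harmless commands.
run_step echo "Staging Application"
if ! run_step false; then
  echo "aborting because a step failed"
fi
```

In pds2jpg_install.sh the staging call could then be written as run_step lcg-cp --vo gilda ... || exit 1, with the same arguments as in the script.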

30 the output..
Input ENVISAT Product courtesy of European Space Agency

