
USING THE HUBZERO PLATFORM TO ENABLE REMOTE COMPUTING ON DIAGRID, HUBbub 2015, September 15th 2015, Christopher Thompson, Rosen Center for Advanced Computing.


1 USING THE HUBZERO PLATFORM TO ENABLE REMOTE COMPUTING ON DIAGRID
HUBbub 2015, September 15th 2015
Christopher Thompson, Rosen Center for Advanced Computing, Purdue University

2 OUTLINE
- Why compute remotely?
- Example tools: SubmitR, BLASTer
- Review

3 HOW TO LOCAL COMPUTE I'VE GOT AN OLD MACHINE AROUND HERE SOMEWHERE FOR THAT…
Workflow: Gather Input → Set Params → Run Code → Review Output
- Gather input: "I swear it's in the right directory!" ~ The Users
- Set params: What did I use yesterday? Did X=4 or X=7?
- Run code: Where? Under a desk? Shared with other users? Slow! "I think it will be done next Tuesday…"
- Review output: Now what?
Some users are savvy and know about terminals and clusters. For most, though, this was the standard user experience before the HUB.

4 WHY REMOTE COMPUTE? DIAGRID REMOTE COMPUTES BECAUSE…
- DiaGrid purpose: low-barrier access to (idle) cycles through a HUB
  - Started as a frontend to the "DiaGrid" HTCondor pool: ~50,000 cores distributed across partner campuses
  - Now handles tools with many job types and destinations
- Scalability
  - Computation: we always want more resolution, complexity, etc.
  - Storage: "Big Data"
  - Not mutually exclusive! More computation → bigger data; bigger data → more computation
- Users bring their own remote resource needs, e.g. XSEDE projects (awarded X CPU hours on cluster Y)
DiaGrid partner institutions: Purdue University, University of Notre Dame, University of Wisconsin, Indiana State University, University of Nebraska-Lincoln, Indiana U.-Purdue U. Fort Wayne, University of Louisville, Purdue U. North Central, Indiana University, Purdue U. Calumet

5 HOW TO REMOTE COMPUTE GET THOSE TOOLS ONLINE!
Luckily, HUBzero gives us tool sessions and 'submit'!
Workflow: Gather Input → Set Params → Run Code → Review Output
- Gather input: upload input to DiaGrid (or generate it), instead of "I swear it's in the right directory!" ~ The Users
- Set params: save settings for future sessions, instead of "What did I use yesterday? Did X=4 or X=7?"
- Run code: 'submit' runs the code. Instead of "Where? Under desk? Other users? Slow!", it's "Who cares where!?" and "Let the tool figure out where it goes!" ~ The Users. Fast!
- Review output: instead of "Now what?", download… and share… and publish… and analyze…
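"Save settings for future sessions!" is what answers "Did X=4 or X=7?". One simple way a Python tool could do this is to write its parameters to a JSON file and merge them over defaults at startup. The function names and any file location are hypothetical, not taken from a DiaGrid tool.

```python
import json
import os
import tempfile


def save_settings(path, settings):
    """Write the current job parameters to a JSON file atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # Write to a temporary file first, then rename it over the target,
    # so a crash mid-write never leaves a half-written settings file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(settings, f, indent=2, sort_keys=True)
    os.replace(tmp_path, path)


def load_settings(path, defaults):
    """Return saved settings merged over defaults (defaults if nothing saved yet)."""
    settings = dict(defaults)
    try:
        with open(path) as f:
            settings.update(json.load(f))
    except FileNotFoundError:
        pass  # first session: nothing saved yet
    return settings
```

A tool might call `load_settings(os.path.expanduser("~/.mytool/settings.json"), {"cores": 1})` when it starts (hypothetical path) and `save_settings(...)` each time the user submits a job.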

6 SUBMITR R SCRIPT EXECUTION

7 SUBMITR WHAT IS SUBMITR?
- GUI to upload and run R scripts
- Submits to an HPC cluster at Purdue
- Builds argument lists
- Monitors running jobs
- Written in Python
- R is used by statisticians… and everyone!

8 SUBMITR WORKFLOW
User writes R script → Upload script → Configure job params → Submit & run job → Download output

9 SUBMITR SUBMIT JOB TYPES
- Serial: "traditional" single-core scripts
- Parallel: multiple cores and nodes, using R parallel libraries (snow)
- Parameter sweeps: define sweep variables in the script's args and list their ranges in the SubmitR UI; 'submit' creates all the jobs. Example: Monte Carlo simulations.

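10 SUBMITR ARCHITECTURE
[Diagram: the user reaches SubmitR in a DiaGrid.org tool session through the web (http://www.); SubmitR calls 'submit', which reaches the cluster over SSH.]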

11 SUBMITR SUBMIT COMMAND

12 SUBMITR SUBMIT COMMAND
R CMD BATCH -q "--args [1-100][a,b,c]" test1.R
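The exact sweep syntax SubmitR accepts is not spelled out on the slide. Assuming each bracket group is one sweep variable, with `lo-hi` an inclusive integer range and `a,b,c` a list of literal values, `[1-100][a,b,c]` describes the Cartesian product of the groups, i.e. 300 jobs. A minimal Python sketch of that expansion (function names hypothetical):

```python
import itertools
import re


def parse_group(group):
    """Expand one bracket group (assumed syntax): 'lo-hi' is an inclusive
    integer range; anything else is a comma-separated list of values."""
    match = re.fullmatch(r"(-?\d+)-(-?\d+)", group)
    if match:
        lo, hi = int(match.group(1)), int(match.group(2))
        return [str(n) for n in range(lo, hi + 1)]
    return [value.strip() for value in group.split(",")]


def expand_sweep(spec):
    """Turn '[1-100][a,b,c]' into one argument list per job: the Cartesian
    product of all groups, with the first group varying slowest."""
    groups = re.findall(r"\[([^\]]*)\]", spec)
    values = [parse_group(g) for g in groups]
    return [list(combo) for combo in itertools.product(*values)]
```

Each combination then becomes one job's trailing arguments, e.g. `R CMD BATCH -q "--args 1 a" test1.R`; inside the R script, `commandArgs(trailingOnly = TRUE)` returns them.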

13 SUBMITR SUBMIT COMMAND
Parts of the full command: submit --debug | venue flags | list of job files (-i) | R command and args
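The `build_submit()` helper used by SubmitR's thread code is not shown. One way to assemble these pieces into the single string that gets passed to the shell is sketched below, quoting each token with `shlex.quote` so file names with spaces survive. The signature is hypothetical, and venue flags are passed through as opaque tokens; the `-v` used in the check is an assumption, so confirm the real venue options with `submit --help` on your hub.

```python
import shlex


def build_submit(venue_flags, input_files, r_script, r_args, debug=True):
    """Assemble a 'submit' command line as one shell-safe string.

    venue_flags: tokens choosing where the job runs (hypothetical values).
    input_files: job files shipped with the run, one '-i' per file.
    """
    parts = ["submit"]
    if debug:
        parts.append("--debug")
    parts.extend(venue_flags)
    for path in input_files:
        parts.extend(["-i", path])
    # The R command and its arguments go last, as on the slide.
    parts.extend(["R", "CMD", "BATCH", "-q", "--args " + " ".join(r_args), r_script])
    return " ".join(shlex.quote(p) for p in parts)
```

Quoting at this single point means the rest of the tool can treat the command as an opaque string.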

14 SUBMITR HOW TO CALL SUBMIT IN PYTHON

    import subprocess
    import sys
    import threading

    # build_submit() and ui are defined elsewhere in the tool.

    def submit_thread():
        cmd = build_submit()    # Assembles the "submit ..." command string
        cmd = 'exec ' + cmd     # Shell replaces itself with submit: no extra process (enables cleanup)

        # Execute submit, updating the UI with its output while it runs
        ui.set_status('Submitting job...')
        ui.log('Job run started')
        line = '(initial)'      # Holds the latest status update from submit
        lines = []              # Accumulates status updates
        try:
            sub = subprocess.Popen(
                cmd,
                shell=True,
                bufsize=1,                 # "line buffered" (requires text mode)
                universal_newlines=True,   # read str lines, not bytes
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                close_fds=True)
            while line:
                line = sub.stdout.readline()   # returns '' once submit closes stdout
                lines.append(line)
                ui.set_status('Running...' + line)
            returncode = sub.wait()            # Store the process exit code
        except Exception:
            for obj in sys.exc_info():         # Explanatory text available?
                lines.append(str(obj))
        ui.log('Job run ended')
        ui.log('Output from submit:\n' + ''.join(lines) + '(end of output)')

    thread = threading.Thread(target=submit_thread)   # Run submit_thread() as the thread body
    thread.start()

Run the submit command in a function that is used as the body of a thread. That function creates the 'submit' command as a string, sends an update to the UI telling the user that job submission has started, and sets up a list to hold status lines. Then:
1. Use subprocess.Popen to create a process for the 'submit' command.
2. Capture stdout and stderr from the process and use them to keep the user updated, reading line by line in a loop.
3. When 'submit' is done, its stdout stream closes and the loop exits. Store the process exit code.
A try-except block captures any run errors. Finally, update the UI again to show the job is done.

15 BLASTER GENOME SEQUENCE SEARCHING

16 BLASTER WHAT IS BLASTER?
- BLAST: industry-standard bioinformatics tools developed by NCBI
  - Query gene sequence databases for similar sequences
  - Suite of command-line tools
- BLASTer: Java-based GUI for running BLAST
  - Manages multiple searches and a history of searches
  - Handles different BLAST variants
  - Submits to Purdue HPC resources, hiding all execution details from the user
  - Maintains an up-to-date copy of the standard NCBI databases, or accepts a custom DB uploaded by the user

17 BLASTER WORKFLOW
User uploads FASTA input file (and optionally a custom DB) → Select search parameters → Submit search job → Monitor search job progress → View/download output

18 BLASTER SUBMIT JOB TYPES
- HTCondor (Pegasus): the first mode BLASTer supported
  - Input files are broken into small pieces of fewer than 100 sequences, which run in parallel
  - Each piece is a separate job sent to the HTCondor pool via 'submit' with Pegasus
  - BLASTer merges the results from all pieces at the end
  - However… the standard DBs grew over time, some by as much as two orders of magnitude! Jobs that ran in 5-10 min now run for hours…
- PBS: BLASTer now submits to an HPC cluster at Purdue with PBS job handling
  - Supports longer runtimes: defined walltimes, and no eviction from nodes as with HTCondor
  - 'submit' allowed this change with very little effort: most details are hidden behind 'submit' and away from the tool code, so we only changed some arguments and tweaked some output/status parsing
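In the HTCondor mode the input is split into pieces of fewer than 100 sequences. BLASTer's own splitter (`blast-split-contigs.py`, called from the shell script on slide 25) is not shown, so the following is only a minimal Python sketch of the same idea, assuming plain FASTA where every record starts with a `>` header line. The function name and the 99-record default are illustrative choices.

```python
def split_fasta(text, max_per_chunk=99):
    """Split FASTA text into chunks of at most max_per_chunk records."""
    records, current = [], []
    for line in text.splitlines():
        # A '>' header starts a new record; flush the previous one first.
        if line.startswith(">") and current:
            records.append("\n".join(current) + "\n")
            current = []
        if line.strip():  # drop blank lines
            current.append(line)
    if current:
        records.append("\n".join(current) + "\n")
    # Group whole records into chunks, preserving their original order.
    return ["".join(records[i:i + max_per_chunk])
            for i in range(0, len(records), max_per_chunk)]
```

Keeping records whole and in order means per-chunk results can later be merged back in the original sequence order.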

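19 BLASTER INTERNAL ARCHITECTURE
[Diagram components: BLASTer GUI classes (Main Window, Job History Viewer, Job Config Panel, Job Control Panel); Job Data Manager (History, Current jobs); Engine Manager with a Condor Engine, a PBS Engine, and a placeholder "???" Engine for future back ends; submit; Pegasus; web access (http://www.); SSH. Numbered steps 1-7 trace a job through these components.]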

20 BLASTER OVERALL ARCHITECTURE
[Diagram components: DiaGrid.org tool session running BLASTer (GUI, Job Manager, Engine1, Engine2, Engine3); submit; Pegasus; back ends Condor, PBS, and a future "??"; web access (http://www.); SSH.]
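Both diagrams show an Engine Manager dispatching to one engine per back end, with a "???" slot for back ends still to come. A Python sketch of that routing, assuming jobs are dicts with `id` and `type` keys; the class names mirror the diagram labels, but the code is illustrative and not BLASTer's Java implementation.

```python
from abc import ABC, abstractmethod


class Engine(ABC):
    """One back end (Condor, PBS, ...) that knows how to submit its job type."""

    @abstractmethod
    def submit_job(self, job):
        ...


class CondorEngine(Engine):
    def submit_job(self, job):
        return f"condor: queued {job['id']}"  # real code would call 'submit' here


class PbsEngine(Engine):
    def submit_job(self, job):
        return f"pbs: queued {job['id']}"  # real code would call 'submit' here


class EngineManager:
    """Routes each job to the engine registered for its type."""

    def __init__(self):
        self._engines = {}

    def register(self, job_type, engine):
        self._engines[job_type] = engine

    def submit_job(self, job):
        try:
            engine = self._engines[job["type"]]
        except KeyError:
            raise ValueError(f"no engine registered for job type {job['type']!r}") from None
        return engine.submit_job(job)
```

With this shape, filling the "???" slot means writing one new `Engine` subclass and one `register` call; the GUI and job data code would not need to change.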

21 BLASTER SUBMIT COMMAND
- Determines the executable used by submit
- The input file is processed and sent by submit with the job
- Arguments of the BLAST executable are appended to the submit command and sent with the job

22 BLASTER HOW TO CALL SUBMIT IN JAVA – STARTING THE SUBMISSION THREAD
(Engine Manager → Condor Engine | PBS Engine | ??? Engine)

    public synchronized JobStatus submitJob(Job job) {
        String jobId = job.getId();  // assumed accessor: jobId is not declared on the slide
        // create status item for job submission & add to list of active jobs
        JobStatus queuedStatus = new JobStatus(QUEUED);
        job.setStatus(queuedStatus);
        this.jobs.put(jobId, job);
        // create a thread to handle the submission and run it
        SubmitPbsThread submitThread = new SubmitPbsThread(job);
        this.jobThreads.put(jobId, submitThread);
        submitThread.start();
        return queuedStatus;
    }

New job: when the user clicks Submit, a new job data object is created and sent to the engine manager, which routes it to the correct "engine" class for the job type. A thread is started to actually run the submit command.

23 BLASTER HOW TO CALL SUBMIT IN JAVA – THE SUBMISSION & MONITORING THREADS
- Submit thread: Parse parameters → Set up job data → Start 'submit' command → Start monitoring thread → Wait until 'submit' is done → Merge & prepare output → End
- Monitoring thread: Is 'submit' done? YES → End. NO → Scan job status → Update data structures → Trigger UI update → Sleep 10 s → check again
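The monitoring loop in the flowchart can be sketched in Python as follows. The callbacks `is_done`, `scan_status` and `update_ui` are hypothetical stand-ins for BLASTer's internals. A `threading.Event` plays the role of Java's `interrupt()`: waiting on it instead of calling `time.sleep` lets the submit thread stop the monitor without sitting out the full 10 s sleep.

```python
import threading


def monitor(is_done, scan_status, update_ui, stop_event, interval=10.0):
    """Poll a running job until 'submit' finishes or stop_event is set.

    Each pass: scan the job status, update the UI, then sleep for
    `interval` seconds (waking early if stop_event is set).
    """
    while not is_done() and not stop_event.is_set():
        status = scan_status()
        update_ui(status)
        stop_event.wait(interval)


if __name__ == "__main__":
    # Tiny demo with a fake job that reports "done" after three scans.
    state = {"scans": 0}

    def fake_scan():
        state["scans"] += 1
        return f"scan {state['scans']}"

    stop = threading.Event()
    t = threading.Thread(target=monitor,
                         args=(lambda: state["scans"] >= 3, fake_scan, print, stop, 0.1))
    t.start()
    t.join()
```

In a real tool the submit thread would call `stop.set()` once `submit` exits, mirroring `monitorThread.interrupt()` on slide 24.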

24 BLASTER HOW TO CALL SUBMIT IN JAVA – RUNNING THE SUBMIT SHELL SCRIPT

    // gather all the user options & system settings needed
    String[] jobArgs = parseParameters(job);
    …
    // run the shell script to call submit
    Process p = Runtime.getRuntime().exec(jobArgs);
    InputStream stdout = p.getInputStream();
    InputStream stderr = p.getErrorStream();
    …
    // start a thread that will keep an eye on the process & update the UI
    SubmitMonitorThread monitorThread = new SubmitMonitorThread(stdout, stderr);
    monitorThread.start();
    …
    // block until the submit script is done
    int exitCode = p.waitFor();
    // stop the monitoring thread
    monitorThread.interrupt();

Here jobArgs = /bin/bash blast_submit.sh input.fasta blastx nr 10 3 3 …

25 BLASTER THE SUBMIT COMMAND SHELL SCRIPT

    #!/bin/bash
    FASTA=$1         # input file of sequences
    PROGRAM=$2       # specific BLAST executable to run
    DB=$3            # name of the database the user wants to search
    shift 3
    ARGUMENTS="$@"   # BLAST executable arguments from the user's options

    # get full path to latest copy of database
    DB=`tail -1 /data/tools/blastgui/versions`'/'${DB}

    # split the input file into the appropriate number of chunks
    ./blast-split-contigs.py ${FASTA}

    submit --metrics -p @@seq=globnat:seq* \
        ${PROGRAM} -query @@seq -db ${DB} ${ARGUMENTS} \
        -out splitfile.output 1>>stdout 2>>stderr

The script is called as jobArgs = /bin/bash blast_submit.sh input.fasta blastx nr 10 3 3 …, so $0 = blast_submit.sh, $1 = input.fasta, $2 = blastx, $3 = nr, and $4, $5, … are the BLAST arguments.
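The line that sets `DB` from `tail -1 /data/tools/blastgui/versions` picks the newest database directory and appends the database name. For a tool written in Python rather than shell, an equivalent might look like the sketch below. Unlike `tail -1`, it skips blank lines, so a trailing empty line does not produce an empty directory. The function name is hypothetical.

```python
import os


def latest_db_path(versions_file, db_name):
    """Python counterpart of: DB=`tail -1 versions`'/'${DB}

    The last non-empty line of versions_file is taken as the directory
    holding the newest copy of the databases.
    """
    with open(versions_file) as f:
        lines = [line.strip() for line in f if line.strip()]
    if not lines:
        raise ValueError(f"{versions_file} lists no database versions")
    return os.path.join(lines[-1], db_name)
```

Updating the databases then means appending one line to the versions file; running tools pick up the new copy on their next submission.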

26 IN REVIEW

27 REMOTE COMPUTING TIPS A PATTERN FOR TOOL DEVELOPMENT
Same general process, regardless of language or job type:
1. Build the 'submit' command arguments from:
   - User-selected options in the UI
   - Automatically generated values (ex: scanning uploaded files)
   - Fixed values (ex: executable names, queue names, file paths, etc.)
2. Create a new thread to execute the 'submit' command
   A. Exec the 'submit' command
   B. Wait for the 'submit' process to finish
3. Monitor the status of the 'submit' thread
   - Collect stdout/stderr while it runs
   - Update the UI periodically
   - React to any problems that arise
4. Collect / process / analyze output
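Steps 2 and 3 of this pattern look the same in every tool above, so they can be factored out. The following is a minimal Python sketch, assuming the caller supplies the argument list from step 1 plus two callbacks (hypothetical names `on_line` and `on_done`) that forward output to the UI. Passing an argument list instead of a shell string avoids the quoting pitfalls of `shell=True`. For a runnable check, a short Python one-liner stands in for `submit`.

```python
import subprocess
import sys
import threading


def run_in_background(argv, on_line, on_done):
    """Run a command in its own thread, stream each line of combined
    stdout/stderr to on_line, then report (exit_code, full_output) to on_done."""

    def worker():
        lines = []
        try:
            with subprocess.Popen(argv, stdout=subprocess.PIPE,
                                  stderr=subprocess.STDOUT,
                                  text=True, bufsize=1) as proc:
                for line in proc.stdout:   # ends when the process closes stdout
                    lines.append(line)
                    on_line(line.rstrip("\n"))
                code = proc.wait()
        except OSError as exc:             # e.g. the command was not found
            lines.append(str(exc))
            code = -1
        on_done(code, "".join(lines))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


if __name__ == "__main__":
    stand_in = [sys.executable, "-c", "print('job queued'); print('job done')"]
    t = run_in_background(stand_in, print,
                          lambda code, out: print("exit code:", code))
    t.join()
```

Most GUI toolkits expect widget updates on the main thread, so in a real tool `on_line` would typically post to the UI's event queue rather than touch widgets directly.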

28 MORE DIAGRID TOOLS SO MANY TOOLS...
Tool | Description | Language | Job Types | Destination
CryoEM | Electron microscope image analysis | Python / C++ | HTCondor & PBS | DiaGrid Condor Pool / Hansen (Purdue)
NAMDD | GUI for NAMD, popular molecular dynamics tool | Python | PBS (SSH) | Hansen (Purdue)
GROMAC-SIMUM | GUI for GROMACS, popular molecular dynamics tool | Java | PBS (SSH) | Hansen (Purdue)
Spyder | Python IDE with built-in job submission | Python | HTCondor (Pegasus) | DiaGrid Condor Pool
NLACE | Biomechanical image analysis | Python | PBS (GSISSH) | Gordon (SDSC)
And many more…

29 ACKNOWLEDGEMENTS THANKS!
- HUBzero team! None of this would be possible without the 'submit' system.
- Support from the HZ admins was invaluable in tool development.

30 QUESTIONS? ANSWERS!
Also… explore, interact, and contribute at: http://diagrid.org

