1 Introduction to research computing using Condor
Ian C. Smith*
*Advanced Research Computing, University of Liverpool

2 What’s special about research computing ?
Often researchers need to tackle problems which are far too demanding for a typical PC or laptop computer
Programs may take too long to run or …
require too much memory or …
too much storage (disk space) or …
all of these!
Special computer systems and programming methods can help overcome these barriers

3 Speeding things up
Key to reducing run times is parallelism - splitting large problems into smaller tasks which can be tackled at the same time (i.e. “in parallel” or “concurrently”)
Two main types of parallelism: data parallelism and functional parallelism (pipelining)
Tasks may be independent or inter-dependent (this eventually limits the speed up which can be achieved)
Fortunately many statistical problems exhibit data parallelism and tasks can be performed independently … this can lead to very significant speed ups!
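As a small illustration (not part of the original slides), the R sketch below shows data parallelism with independent tasks: each bootstrap resample depends only on the shared data, so the tasks can run at the same time. The data set and task function here are invented for the example.

# Illustrative sketch only: 1000 independent bootstrap tasks run in parallel.
library(parallel)

x <- rnorm(1000)                          # example data set (invented)

one_task <- function(i) {
  resample <- sample(x, replace = TRUE)   # each task resamples independently
  mean(resample)                          # statistic computed by this task
}

# mclapply() forks worker processes on Linux/macOS; on Windows use
# parLapply() with a cluster created by makeCluster().
boot_means <- unlist(mclapply(1:1000, one_task, mc.cores = 4))
quantile(boot_means, c(0.025, 0.975))     # combine the independent results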

4 High Performance Computing (HPC)
Uses powerful special purpose systems called HPC clusters
Contain large numbers of processors acting in parallel
Each processor may contain multiple processing elements (cores) which can also work in parallel
Provide lots of memory and large amounts of fast (parallel) disk storage – ideal for data-intensive applications
Typically run parallel programs containing inter-dependent tasks (e.g. finite element analysis codes) but also suitable for statistics applications

5 High Throughput Computing (HTC) using Condor
No dedicated hardware - uses ordinary classroom PCs to run jobs when they would otherwise be idle (usually evenings and weekends)
Jobs may be interrupted by users logging into Condor PCs – works best for short running jobs (10-20 minutes ideally, ~ 8 hours max)
Only suitable for applications which use independent tasks (inter-dependent tasks need HPC)
No shared storage – all data files must be transferred to/from the Condor PCs
Limited memory and disk space available since Condor uses only commodity PCs
However… Condor is well suited to many statistical applications
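For context (this is not shown in the slides, and it is not the Liverpool r_submit format described later), a plain HTCondor submit description file for a batch of independent jobs with explicit file transfer might look roughly like the sketch below; the executable name and file names are illustrative.

# Sketch of a standard HTCondor submit file (illustrative names).
universe                = vanilla
executable              = run_task.bat          # hypothetical wrapper that runs one task
arguments               = $(Process)            # job index: 0, 1, 2, ...
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = samples.dat, seed$(Process).dat
output                  = task$(Process).out
error                   = task$(Process).err
log                     = tasks.log
queue 1000                                      # 1000 independent jobs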

6 Condor pool operation
[Diagram: the user logs in to the Condor server from a desktop PC and uploads the input data.]

7 Condor pool operation
[Diagram: the Condor server distributes the jobs to the execute hosts.]

8 Condor pool operation
[Diagram: the execute hosts return their results to the Condor server.]

9 Condor pool operation
[Diagram: the user downloads the results from the Condor server to the desktop PC.]

10 University of Liverpool Condor Pool
Contains over 1000 classroom PCs running the Managed Windows 10 Service
Each PC can run a maximum of 4 jobs concurrently, giving a theoretical capacity of over 4000 parallel jobs
Typical spec: 3.3 GHz Intel i5 processor, 8 GB memory, 128 GB disk space
Tools are available to help in running large numbers of R and MATLAB jobs (other software may work, but not commercial packages such as SAS and Stata); there is also some Python support
Single job submission point for Condor jobs provided by a powerful UNIX server
The service can also be accessed from a Windows PC/laptop using Desktop Condor (even from off-campus)

11 Bootstrap example
[Diagram: bootstrap.R is run once per job; every job reads the shared samples.dat plus its own seed file (seed0.dat, seed1.dat, …, seed999.dat) and writes a matching results file (results0.dat, …, results999.dat); a final combine step merges the results files (together with samples.dat) into stats.dat.]
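The slides do not show the contents of bootstrap.R. A minimal sketch of what such a script might do is given below; it assumes samples.dat holds one numeric value per line, that each seed file holds a single integer, and that the submission tool presents the job's own seedN.dat/resultsN.dat pair to the script as seed.dat and results.dat (an assumption, not confirmed by the slides).

# Hypothetical sketch of bootstrap.R (not the actual script).
x    <- scan("samples.dat")          # shared sample data, identical for every job
seed <- scan("seed.dat")             # per-job seed file (assumed naming, see above)

set.seed(seed)
resample <- sample(x, replace = TRUE)        # one bootstrap resample
write(mean(resample), file = "results.dat")  # per-job result, collected later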

12 Bootstrap example
$ ls bootstrap.R samples.dat seed*.dat
bootstrap.R   seed27.dat    seed460.dat   seed641.dat   seed822.dat
samples.dat   seed280.dat   seed461.dat   seed642.dat   seed823.dat
seed0.dat     seed281.dat   seed462.dat   seed643.dat   seed824.dat
...

13 Bootstrap example
$ ls bootstrap.R samples.dat seed*.dat
bootstrap.R   seed27.dat    seed460.dat   seed641.dat   seed822.dat
samples.dat   seed280.dat   seed461.dat   seed642.dat   seed823.dat
seed0.dat     seed281.dat   seed462.dat   seed643.dat   seed824.dat
...
$ cat run_bootstrap
R_script = bootstrap.R
indexed_input_files = seed.dat
indexed_output_files = results.dat
total_jobs = 1000

14 Bootstrap example
$ ls bootstrap.R samples.dat seed*.dat
bootstrap.R   seed27.dat    seed460.dat   seed641.dat   seed822.dat
samples.dat   seed280.dat   seed461.dat   seed642.dat   seed823.dat
seed0.dat     seed281.dat   seed462.dat   seed643.dat   seed824.dat
...
$ cat run_bootstrap
R_script = bootstrap.R
indexed_input_files = seed.dat
indexed_output_files = results.dat
total_jobs = 1000
$ r_submit run_bootstrap
Submitting job(s)...
1000 job(s) submitted to cluster 952.

15 Bootstrap example
$ ls bootstrap.R samples.dat seed*.dat
bootstrap.R   seed27.dat    seed460.dat   seed641.dat   seed822.dat
samples.dat   seed280.dat   seed461.dat   seed642.dat   seed823.dat
seed0.dat     seed281.dat   seed462.dat   seed643.dat   seed824.dat
...
$ cat run_bootstrap
R_script = bootstrap.R
indexed_input_files = seed.dat
indexed_output_files = results.dat
total_jobs = 1000
$ r_submit run_bootstrap
Submitting job(s)...
1000 job(s) submitted to cluster 952.
$ condor_q
-- Schedd: : 05/21/19 11:26:56
OWNER    BATCH_NAME              SUBMITTED   DONE   RUN   IDLE   TOTAL  JOB_IDS
smithic  CMD: run_bootstrap.bat  5/21 10:

16 Bootstrap example
$ ls bootstrap.R samples.dat seed*.dat
bootstrap.R   seed27.dat    seed460.dat   seed641.dat   seed822.dat
samples.dat   seed280.dat   seed461.dat   seed642.dat   seed823.dat
seed0.dat     seed281.dat   seed462.dat   seed643.dat   seed824.dat
...
$ cat run_bootstrap
R_script = bootstrap.R
indexed_input_files = seed.dat
indexed_output_files = results.dat
total_jobs = 1000
$ r_submit run_bootstrap
Submitting job(s)...
1000 job(s) submitted to cluster 952.
$ condor_q
-- Schedd: : 05/21/19 11:26:56
OWNER    BATCH_NAME              SUBMITTED   DONE   RUN   IDLE   TOTAL  JOB_IDS
smithic  CMD: run_bootstrap.bat  5/21 10:
$ ls results*.dat
results0.dat     results281.dat   results462.dat   results643.dat   results824.dat
results100.dat   results282.dat   results463.dat   results644.dat   results825.dat
results101.dat   results283.dat   results464.dat   results645.dat   results826.dat
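The final combine step from the earlier diagram is also not shown in the slides. Once all results*.dat files have been downloaded, a simple sketch of merging them into stats.dat (assuming each results file holds a single numeric value, as in the earlier sketch) could be:

# Hypothetical combine step: gather every per-job result into stats.dat.
files <- Sys.glob("results*.dat")
boot  <- unlist(lapply(files, scan, quiet = TRUE))   # one value per job here

stats <- data.frame(mean  = mean(boot),
                    lower = unname(quantile(boot, 0.025)),   # 95% bootstrap interval
                    upper = unname(quantile(boot, 0.975)))
write.table(stats, file = "stats.dat", row.names = FALSE)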

17 Example application: Bayesian Models using MCMC
Fitting multivariate mixed models to longitudinal data
Two main uses for Condor:
compare various similar models to select the best model
run simulations – simulate 100 datasets and fit an MCMC model to each one
A single simulation takes ~ 1 day, but Condor can run 100 simulations in parallel
The full set of simulations therefore takes ~ 1 day instead of around 3 months

18 Recent usage

19 Annual Usage

20 Summary
Parallelism can help speed up the solution of many research computing problems by dividing large problems into many smaller ones which can be tackled at the same time
Condor High Throughput Computing Service:
Typically used for large/very large numbers of short running jobs
Limited memory and storage available on Condor PCs
Support available for applications using R, MATLAB and Python
No UNIX knowledge needed with Desktop Condor
High Performance Computing clusters:
Typically used for small numbers of long running jobs
Ideal for applications requiring lots of memory and disk storage space
Almost all systems are UNIX-based

21 Further information
Condor Service: http://condor.liv.ac.uk
To request an account on Condor: go to ServiceNow, then click Make a request > Accounts > Application to access high performance/throughput computing facilities
Background information on HPC clusters:
Information on the Advanced Research Computing (ARC) facilities:
To contact the ARC team, or contact me:

