Introduction to research computing using Condor

Introduction to research computing using Condor
Ian C. Smith, Advanced Research Computing, University of Liverpool

What’s special about research computing?
- Researchers often need to tackle problems which are far too demanding for a typical PC or laptop
- Programs may take too long to run, or require too much memory, or too much storage (disk space), or all of these!
- Special computer systems and programming methods can help overcome these barriers

Speeding things up
- The key to reducing run times is parallelism: splitting large problems into smaller tasks which can be tackled at the same time (i.e. “in parallel” or “concurrently”)
- Two main types of parallelism:
  - data parallelism
  - functional parallelism (pipelining)
- Tasks may be independent or inter-dependent (inter-dependence eventually limits the speed-up which can be achieved)
- Fortunately, many statistical problems exhibit data parallelism and their tasks can be performed independently; this can lead to very significant speed-ups!
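The limit on speed-up mentioned in this slide is commonly modelled with Amdahl's law. The slide does not name it, so the sketch below is offered only as a standard illustration. It assumes that a fraction of the serial run time can be split perfectly across a number of workers; the function name `amdahl_speedup` and its parameters are illustrative.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speed-up when only part of a program can run in parallel.

    parallel_fraction: share of the serial run time that can be split
                       into independent tasks (0.0 to 1.0).
    workers:           number of tasks that run at the same time.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part is not shortened; the parallel part is divided
    # evenly among the workers.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    for p in (0.5, 0.95, 1.0):
        for n in (10, 100, 1000):
            print(f"p={p:.2f} workers={n:5d} speed-up={amdahl_speedup(p, n):8.1f}")
```

With fully independent tasks (a parallel fraction of 1.0) the ideal speed-up grows in step with the number of workers. If even 5% of the work must run serially, the speed-up stays below 20 however many workers are added.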

High Performance Computing (HPC)
- Uses powerful special-purpose systems called HPC clusters
- Clusters contain large numbers of processors acting in parallel
- Each processor may contain multiple processing elements (cores) which can also work in parallel
- Clusters provide lots of memory and large amounts of fast (parallel) disk storage, which is ideal for data-intensive applications
- Typically run parallel programs containing inter-dependent tasks (e.g. finite element analysis codes) but are also suitable for statistics applications

High Throughput Computing (HTC) using Condor
- No dedicated hardware: uses ordinary classroom PCs to run jobs when they would otherwise be idle (usually evenings and weekends)
- Jobs may be interrupted by users logging into Condor PCs, so it works best for short-running jobs (10-20 minutes ideally, ~8 hours max)
- Only suitable for applications which use independent tasks (HPC is needed for inter-dependent tasks)
- No shared storage: all data files must be transferred to/from the Condor PCs
- Limited memory and disk space available since Condor uses only commodity PCs
- However, Condor is well suited to many statistical applications
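One practical consequence of the job-length guidance above is that a long calculation made of many independent iterations needs to be cut into jobs of a suitable size. The sketch below is only an illustration under stated assumptions: the helper `plan_jobs` and its parameters are hypothetical and not part of Condor or of the Liverpool tools. It aims for roughly 15 minutes per job and rejects work whose single iteration would exceed the 8 hour ceiling.

```python
def plan_jobs(total_iterations: int,
              seconds_per_iteration: float,
              target_minutes: float = 15.0,
              max_hours: float = 8.0) -> tuple[int, int]:
    """Return (number_of_jobs, iterations_per_job) for independent work.

    Hypothetical helper: sizes each job to run for about target_minutes,
    assuming every iteration takes about the same time.
    """
    if total_iterations < 1 or seconds_per_iteration <= 0:
        raise ValueError("need at least one iteration with a positive duration")
    if seconds_per_iteration > max_hours * 3600:
        raise ValueError("a single iteration exceeds the maximum job length")

    target_seconds = target_minutes * 60
    # At least one iteration per job, and never more than the total.
    per_job = max(1, int(target_seconds // seconds_per_iteration))
    per_job = min(per_job, total_iterations)
    # Ceiling division: the last job may be shorter than the others.
    jobs = -(-total_iterations // per_job)
    return jobs, per_job


if __name__ == "__main__":
    jobs, per_job = plan_jobs(total_iterations=100_000, seconds_per_iteration=3)
    print(f"{jobs} jobs of up to {per_job} iterations each")
```

Shorter jobs lose less work when a user logs in and interrupts them, at the cost of more files to transfer and more jobs to track.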

Condor pool operation
[Diagram, built up in four steps, showing a desktop PC, the Condor server and the execute hosts]
1. Log in to the Condor server from the desktop PC and upload input data
2. The Condor server sends jobs to the execute hosts
3. The execute hosts return results to the Condor server
4. Download the results to the desktop PC

University of Liverpool Condor Pool
- Contains over 1000 classroom PCs running the Managed Windows 10 Service
- Each PC can run a maximum of 4 jobs concurrently, giving a theoretical capacity of over 4000 parallel jobs
- Typical spec: 3.3 GHz Intel i5 processor, 8 GB memory, 128 GB disk space
- Tools are available to help in running large numbers of R and MATLAB jobs (other software may work, but not commercial packages such as SAS and Stata); there is also some Python support
- A single job submission point for Condor jobs is provided by a powerful UNIX server
- The service can also be accessed from a Windows PC/laptop using Desktop Condor (even from off-campus)

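Bootstrap example
[Diagram: samples.dat and one seed file per job (seed0.dat, seed1.dat, seed2.dat, ..., seed999.dat) are inputs to bootstrap.R, which writes one results file per job (results0.dat, results1.dat, results2.dat, ..., results999.dat); a combine step then produces stats.dat]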
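The contents of bootstrap.R and of the combine step are not shown in the slides, so the following is only a sketch of the general pattern, written in Python rather than R. The functions `bootstrap_task` and `combine`, the choice of the mean as the statistic and the percentile interval are all assumptions made for illustration. Each task depends only on the shared samples and its own seed, which is what makes the tasks independent.

```python
import random
import statistics


def bootstrap_task(samples: list[float], seed: int, resamples: int = 200) -> list[float]:
    """One independent task: resample with replacement and record the means.

    Hypothetical stand-in for a single job; the seed plays the role of
    that job's seed file.
    """
    rng = random.Random(seed)  # private generator, so tasks do not interact
    n = len(samples)
    return [statistics.fmean(rng.choices(samples, k=n)) for _ in range(resamples)]


def combine(results: list[list[float]]) -> dict[str, float]:
    """Pool the per-task results and summarise them (assumed combine step)."""
    pooled = sorted(value for task in results for value in task)
    last = len(pooled) - 1
    low = pooled[min(last, int(0.025 * len(pooled)))]
    high = pooled[max(0, min(last, int(0.975 * len(pooled)) - 1))]
    return {"estimate": statistics.fmean(pooled), "ci_low": low, "ci_high": high}


if __name__ == "__main__":
    data = [random.Random(0).gauss(10.0, 2.0) for _ in range(50)]
    # Run serially here; on a pool each call would be a separate job.
    per_task = [bootstrap_task(data, seed) for seed in range(10)]
    print(combine(per_task))
```

Because every task uses its own seed, the tasks can run in any order, on any machine, and be rerun after an interruption without changing the combined result.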

Bootstrap example
    $ ls bootstrap.R samples.dat seed*.dat
    bootstrap.R  seed27.dat   seed460.dat  seed641.dat  seed822.dat
    samples.dat  seed280.dat  seed461.dat  seed642.dat  seed823.dat
    seed0.dat    seed281.dat  seed462.dat  seed643.dat  seed824.dat
    ...
    $ cat run_bootstrap
    R_script = bootstrap.R
    indexed_input_files = seed.dat
    indexed_output_files = results.dat
    total_jobs = 1000
    $ r_submit run_bootstrap
    Submitting job(s)...
    1000 job(s) submitted to cluster 952.
    $ condor_q
    -- Schedd: Q1@condor1 : <10.102.32.11:37851?... @ 05/21/19 11:26:56
    OWNER    BATCH_NAME              SUBMITTED   DONE  RUN  IDLE  TOTAL  JOB_IDS
    smithic  CMD: run_bootstrap.bat  5/21 10:57    68  130   802   1000  952.4-999
    $ ls results*.dat
    results0.dat    results281.dat  results462.dat  results643.dat  results824.dat
    results100.dat  results282.dat  results463.dat  results644.dat  results825.dat
    results101.dat  results283.dat  results464.dat  results645.dat  results826.dat
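The listings above suggest that an indexed file name such as seed.dat stands for the series seed0.dat, seed1.dat, ..., with the job index placed before the extension. That convention is inferred from the file names shown, not from documentation of r_submit. Assuming it holds, a small sketch can check which results have not yet come back; the helpers `indexed_name` and `missing_results` are hypothetical.

```python
from pathlib import Path


def indexed_name(template: str, index: int) -> str:
    """Insert a job index before the extension: seed.dat, 27 -> seed27.dat.

    Assumed naming convention, inferred from the directory listings.
    """
    path = Path(template)
    return f"{path.stem}{index}{path.suffix}"


def missing_results(directory: str,
                    template: str = "results.dat",
                    total_jobs: int = 1000) -> list[int]:
    """Return the job indices whose output file is not yet in directory."""
    present = {entry.name for entry in Path(directory).iterdir()}
    return [i for i in range(total_jobs)
            if indexed_name(template, i) not in present]


if __name__ == "__main__":
    todo = missing_results(".", "results.dat", 1000)
    print(f"{len(todo)} of 1000 result files still missing")
```

A check like this is useful before any combine step, since jobs on a pool of shared PCs may finish at very different times.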

Example application: Bayesian Models using MCMC
- Fitting multivariate mixed models to longitudinal data
- Two main uses for Condor:
  - comparing various similar models to select the best model
  - running simulations: simulate 100 datasets and fit an MCMC model to each one
- A single simulation takes ~1 day, but Condor can run 100 simulations in parallel
- The simulations take ~1 day in total instead of ~3 months
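The arithmetic behind the figures in this slide can be sketched as follows. The helper `wall_clock_days` is hypothetical and makes simplifying assumptions: every job takes the same time, no job is interrupted, and scheduling and file transfer overheads are ignored.

```python
import math


def wall_clock_days(n_jobs: int, days_per_job: float, slots: int) -> float:
    """Estimate elapsed time when equal-length jobs share a number of slots.

    Jobs run in "waves": each wave fills the available slots, and the
    next wave starts when the previous one has finished.
    """
    if n_jobs < 1 or slots < 1:
        raise ValueError("n_jobs and slots must be at least 1")
    waves = math.ceil(n_jobs / slots)
    return waves * days_per_job


if __name__ == "__main__":
    serial = wall_clock_days(100, 1.0, slots=1)      # one simulation after another
    parallel = wall_clock_days(100, 1.0, slots=100)  # all simulations at once
    print(f"serial: {serial:.0f} days (about {serial / 30:.1f} months)")
    print(f"parallel: {parallel:.0f} day(s)")
```

Running the 100 one-day simulations one after another takes 100 days, roughly three months, whereas 100 free slots bring the estimate down to a single day.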

Recent usage

Annual Usage

Summary
- Parallelism can help speed up the solution of many research computing problems by dividing large problems into many smaller ones which can be tackled at the same time
- Condor High Throughput Computing Service
  - Typically used for large/very large numbers of short-running jobs
  - Limited memory and storage available on Condor PCs
  - Support available for applications using R, MATLAB and Python
  - No UNIX knowledge needed with Desktop Condor
- High Performance Computing clusters
  - Typically used for small numbers of long-running jobs
  - Ideal for applications requiring lots of memory and disk storage space
  - Almost all systems are UNIX-based

Further information
- Condor Service: http://condor.liv.ac.uk
- To request an account on Condor: go to ServiceNow, then click: Make a request > Accounts > Application to access high performance/throughput computing facilities
- Background information on HPC clusters: http://clusterinfo.liv.ac.uk
- Information on the Advanced Research Computing (ARC) facilities: http://www.liv.ac.uk/csd/advanced-research-computing, http://arc.liv.ac.uk
- To contact the ARC team, email arc-support@liverpool.ac.uk or contact me at i.c.smith@liverpool.ac.uk