Ian C. Smith* Introduction to research computing using Condor *Advanced Research Computing University of Liverpool.


Overview
 what is Condor and what can it be used for?
 typical Condor pool operation
 the University of Liverpool Condor Pool
 support for MATLAB and R applications
 some research computing examples
 a quick introduction to UNIX with a walk-through example

What is Condor?
 a specialized system for delivering High Throughput Computing
 a harvester of unused computing resources
 developed by the Computer Sciences Department at the University of Wisconsin-Madison in the late ’80s
 free and (now) open source software
 widely used in academia and increasingly in industry
 available for many platforms: Linux, Solaris, AIX, Windows XP/Vista/7, Mac OS

Types of Condor application
 typically large numbers of independent calculations (“pleasantly parallel”)
 data parallel applications – split large datasets into smaller parts and process them in parallel
 examples:
 biological sequence analysis (e.g. BLAST)
 processing of field trial data
 optimisation problems
 microprocessor design and testing
 applications based on Monte Carlo methods
 radiotherapy treatment analysis
 epidemiological studies
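The data-parallel pattern above can be sketched in shell: carve one large dataset into equal chunks, each of which an independent Condor job could then process. This is only an illustration, not the Liverpool tooling; the file names (data.txt, chunk_*) are hypothetical.

```shell
# Hypothetical data-parallel split: one chunk per Condor job.
seq 1 100 > data.txt           # stand-in for a large dataset
split -l 25 data.txt chunk_    # four 25-line chunks: chunk_aa .. chunk_ad
ls chunk_* | wc -l             # count the chunks produced
```

Each chunk would then be shipped to a separate job, and the partial results combined afterwards.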

A “typical” Condor pool
[diagram: desktop PC ↔ Condor server ↔ execute hosts]
1. login to the Condor server and upload input data
2. the server sends jobs to the execute hosts
3. the execute hosts return results to the server
4. download results to the desktop PC
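The upload/submit/execute/download cycle shown in the pool diagrams is driven by a submit description file on the server. A minimal sketch, assuming a vanilla-universe job with hypothetical file names (the Liverpool MATLAB/R tools generate files like this automatically):

```
universe                = vanilla
executable              = product.exe
transfer_input_files    = input.mat
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
output                  = job.out
error                   = job.err
log                     = job.log
queue
```

condor_submit sends this description to the server, which matches the job to an idle execute host and transfers the listed files in each direction.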

University of Liverpool Condor Pool
 contains around 700 classroom PCs running the CSD Managed Windows 7 Service (mostly 64-bit from next year)
 most have 2.33 GHz Intel Core 2 processors with 2 GB RAM and 80 GB disk, configured with two job slots per PC (total of 1400 job slots)
 single job submission point for Condor jobs provided by a powerful UNIX server
 jobs continue to run while classroom PCs are unused but...
 if load (or memory use) becomes significant, the job will be killed and usually any results will be lost (the job will start again from scratch)
 tools provided for running large numbers of MATLAB and R jobs

Condor caveats
 only suitable for non-interactive applications
 no communication between jobs is possible
 all files needed by the application must be present on the local disk
 shorter jobs are more likely to run to completion (10-20 min seems to work best)
 long-running jobs can be run if a save/restore mechanism (checkpointing) is built into them
 tricky to begin with but usually worth the initial effort
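The save/restore idea can be sketched in shell: on start-up the job reads a state file (if an earlier eviction left one behind) and resumes from that point, checkpointing after every step. This is a minimal illustration, not the Liverpool mechanism; the file names (state.txt, work.log) are hypothetical.

```shell
#!/bin/sh
# Hypothetical checkpointing loop: resume from state.txt if present.
total=100
start=0
[ -f state.txt ] && start=$(cat state.txt)   # restore after an eviction
i=$((start + 1))
while [ "$i" -le "$total" ]; do
    echo "step $i" >> work.log               # stand-in for real work
    echo "$i" > state.txt                    # checkpoint after each step
    i=$((i + 1))
done
```

If the job is killed and restarted, it repeats at most one step rather than starting from scratch.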

Running MATLAB jobs under Condor
 need to create a standalone application from M-file(s) using the MATLAB compiler
 the standalone application can run without a MATLAB license
 run-time libraries still need to be accessible to MATLAB jobs
 nearly all toolbox functions are available to standalone applications
 simple (but powerful) file input/output makes checkpointing easier
 tools available to simplify job submission - see the Liverpool Condor website for more information

Running R jobs under Condor
 limited support at present
 R is installed on-the-fly as part of the job
 currently only a single R version is available, with the standard packages
 tools available to simplify job submission
 checkpointing may be possible for long-running jobs

Personalised Medicine example
 project is a Genome-Wide Association Study which aims to identify genetic predictors of response to anti-epileptic drugs
 tries to identify regions of the human genome that differ between individuals (referred to as SNPs)
 800 patients genotyped at SNPs along the entire genome
 the association between SNPs and outcomes (e.g. time to withdrawal of a drug due to adverse effects) is tested statistically
 very large data-parallel problem using R – ideal for Condor
 datasets divided into small partitions so that individual jobs run for minutes
 a batch of 26 chromosomes (2,600 jobs) required ~5 hours wallclock time on Condor but ~5 weeks on a single PC

Radiotherapy example
 large 3rd-party application code which simulates photon beam radiotherapy treatment using Monte Carlo methods
 tried running the simulation on 56 cores of a high performance computing cluster but no progress after 5 weeks
 divided the problem into batches of Condor jobs (initially 250)
 required ~ days of CPU time (equivalent to ~3.5 years on a dual-core PC)
 the Condor simulation completed in less than one week
 average run time was ~70 min
 only ~10% of compute time was wasted due to evictions

Condor service prerequisites
 will need a Sun UNIX service account (contact CSD) and a Condor account
 to log in to the Condor server:
 on MWS use PuTTY: Install University Applications | Internet | PuTTY 0.60
 Mac/Linux: open a terminal window and use ssh
 off campus: use Apps Anywhere (PuTTY is in the Utilities group)
 to upload/download files to/from the Condor server:
 on MWS use CoreFTP Lite: Install University Applications | Internet | CoreFTP LE 2.1
 Mac/Linux: open a terminal window and use sftp/scp
 off campus: need to use a virtual private network (VPN), then FTP

PuTTY login

CoreFTP Lite

CoreFTP Lite – download files

Condor server directory tree
 / (‘root’) contains: /usr /bin /sbin /tmp /home /condor_data
 /home contains the login ‘home’ directories, e.g. /home/fred /home/smithic /home/jim
 /condor_data contains the ‘home’ directories for Condor, e.g. /condor_data/smithic /condor_data/jim

MATLAB Condor example
 calculate the sum of p matrix-matrix products: each product calculation is independent and can be performed in parallel
 MATLAB M-file (product.m):

function product
load input.mat;
C = A*B;
save('output.mat', 'C');
quit;
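Note that product.m always reads input.mat and writes output.mat, yet the directory listings that follow contain input0.mat, input1.mat, and so on. The per-job renaming can be sketched in shell (an illustration of the indexed-file convention, not the actual Condor wrapper; the "job 3" contents are hypothetical):

```shell
# Hypothetical sketch: job N is given inputN.mat under the fixed name
# input.mat, and its output.mat is returned as outputN.mat.
# Plain text files stand in for the binary .mat files.
echo "matrices for job 3" > input3.mat
N=3
cp "input$N.mat" input.mat                 # staged in for the job
echo "C = A*B result for job $N" > output.mat   # stand-in for running product
mv output.mat "output$N.mat"               # renamed back on return
```

This lets the same M-file serve every job in the batch without modification.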

Job submission example

multiple]$ cd /condor_data/smithic                     #change directory
smithic]$ tar xf /opt1/condor/examples/handson.tar     #get examples
smithic]$ cd matlab                                    #now in /condor_data/smithic/matlab
matlab]$ ls                                            #list files
input0.mat  input2.mat  input4.mat  product
input1.mat  input3.mat  product.m
matlab]$ matlab_build product.m                        #create standalone executable
Submitting job(s).
1 job(s) submitted to cluster 503.
matlab]$ condor_q                                      #get Condor queue status
-- Schedd: :
 ID  OWNER    SUBMITTED  RUN_TIME  ST PRI SIZE CMD
     smithic  6/7 15:    :00:10    R           runscript.bat wrap
1 jobs; 0 idle, 1 running, 0 held
matlab]$ condor_q                                      #job has finished when gone from queue
-- Schedd: :
 ID  OWNER    SUBMITTED  RUN_TIME  ST PRI SIZE CMD
0 jobs; 0 idle, 0 running, 0 held

Job submission example

matlab]$ ls
input0.mat  input2.mat  input4.mat  product.bat  product.exe.manifest  product.sub
input1.mat  input3.mat  product     product.exe  product.m
matlab]$ cat product                                   #display file contents
executable=product.exe
indexed_input_files=input.mat
indexed_output_files=output.mat
total_jobs=5
matlab]$ matlab_submit product                         #submit multiple MATLAB jobs
Submitting job(s)
job(s) submitted to cluster 511.
matlab]$ condor_q                                      #get status of jobs
-- Schedd: :
 ID  OWNER    SUBMITTED  RUN_TIME  ST PRI SIZE CMD
     smithic  6/7 16:    :00:02    R           product.bat produc
     smithic  6/7 16:    :00:02    R           product.bat produc
     smithic  6/7 16:    :00:02    R           product.bat produc
     smithic  6/7 16:    :00:02    R           product.bat produc
     smithic  6/7 16:    :00:02    R           product.bat produc
5 jobs; 0 idle, 5 running, 0 held
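In the job description above, total_jobs=5 together with indexed_input_files=input.mat means the tool expects the indexed inputs input0.mat through input4.mat. A batch of such inputs might be generated beforehand like this (a sketch only; the file contents are hypothetical placeholders for real .mat data):

```shell
# Generate one indexed input file per job; total_jobs=5 -> indices 0..4.
total_jobs=5
i=0
while [ "$i" -lt "$total_jobs" ]; do
    echo "matrices for job $i" > "input$i.mat"
    i=$((i + 1))
done
ls input*.mat                        # list the generated inputs
```

matlab_submit then queues one Condor job per index.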

Job submission example

matlab]$ condor_q                                      #some jobs completed, one still running
-- Schedd: :
 ID  OWNER    SUBMITTED  RUN_TIME  ST PRI SIZE CMD
     smithic  6/7 16:    :00:25    R           product.bat produc
1 jobs; 0 idle, 1 running, 0 held
matlab]$ condor_q                                      #all jobs complete
-- Schedd: :
 ID  OWNER    SUBMITTED  RUN_TIME  ST PRI SIZE CMD
0 jobs; 0 idle, 0 running, 0 held
matlab]$ ls                                            #check output files
input0.mat  input3.mat   output1.mat  output4.mat  product.exe           product.sub
input1.mat  input4.mat   output2.mat  product      product.exe.manifest
input2.mat  output0.mat  output3.mat  product.bat  product.m
matlab]$ zip output.zip output*.mat                    #bundle output files

Summary
 Condor can speed up processing by running large numbers of jobs in parallel
 shorter jobs work best, but Condor can deal with jobs of arbitrary length
 user-written codes are easiest to run (MATLAB, R, C/C++, FORTRAN etc.)
 commercial 3rd-party software may work, but it needs to run on a standard MWS PC without user interaction
 all Condor jobs are submitted via a central UNIX server

Further Information
 Condor
 other research computing services