High Throughput Parallel Computing (HTPC)
Dan Fraser, UChicago
Greg Thain, UWisc

The two familiar HPC models

High Throughput Computing
- Run ensembles of single-core jobs

Capability Computing
- A few jobs parallelized over the whole system
- Use whatever parallel s/w is on the system

HTPC – an emerging model
- Ensembles of small-way parallel jobs (10's – 1000's)
- Use whatever parallel s/w you want (it ships with the job)

Current HTPC mechanism
- Submit jobs that consume an entire processor
  - Typically 8 cores (today)
- Package jobs with a parallel library (schedd)
  - HTPC jobs are as portable as any other job
  - MPI, OpenMP, your own scripts, …
- Parallel libraries can be optimized for on-board memory access
  - All memory is available for efficient utilization
  - Some jobs just need more memory
- Submit jobs via WMS glidein on the OSG
  - A separate list of resources for HTPC (by VO)

Where we are heading
Condor 7.6.x now supports:
- Node draining
- Partial node usage (e.g. 4 out of 8 cores)
- A fix for a job priority issue with HTPC jobs
Now being tested at Wisconsin (CHTC)
- Plan is to create a "best practice" template to help with Condor configuration
- Will be added to the RPM CE doc (currently in the Pacman OSG CE doc)

Configuring Condor for HTPC
Two strategies:
- Suspend/drain jobs to open HTPC slots
- Hold empty cores until an HTPC slot is open
See the HTPC twiki page (ParallelComputing) for the straightforward PBS setup
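For concreteness, below is a minimal sketch of a startd configuration for the overlapping whole-machine-slot approach. It is not taken from the slides: the 8-core layout, the slot numbering, and the START policy are illustrative assumptions, and the full recipe (including suspending or draining the single-core slots while the whole-machine slot is busy) is the one documented in the HTPC twiki.

  # condor_config.local -- sketch only; assumes an 8-core worker node
  # Eight ordinary single-core slots plus one overlapping 8-core slot
  NUM_CPUS = 16                 # double-count cores so both slot types fit
  NUM_SLOTS_TYPE_1 = 8
  SLOT_TYPE_1 = cpus=1
  NUM_SLOTS_TYPE_2 = 1
  SLOT_TYPE_2 = cpus=8

  # Advertise which slot may run whole-machine jobs, so submit files can
  # require (CAN_RUN_WHOLE_MACHINE =?= TRUE) as on the following slides
  CAN_RUN_WHOLE_MACHINE = (SlotID == 9)
  STARTD_ATTRS = $(STARTD_ATTRS) CAN_RUN_WHOLE_MACHINE

  # Matchmaking only: whole-machine jobs go to the 8-core slot, all other
  # jobs stay on the single-core slots (merge with any existing START policy)
  START = ( $(CAN_RUN_WHOLE_MACHINE) == (TARGET.RequiresWholeMachine =?= True) )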

How to submit

universe = vanilla
requirements = (CAN_RUN_WHOLE_MACHINE =?= TRUE)
+RequiresWholeMachine = true
executable = some_job
arguments = arguments
should_transfer_files = yes
when_to_transfer_output = on_exit
transfer_input_files = inputs
queue
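As a usage note (not from the slides): assuming the description above is saved as, say, htpc.sub, it is submitted and monitored like any other vanilla-universe job; the constraint below simply filters the queue to whole-machine jobs.

  condor_submit htpc.sub
  condor_q -constraint 'RequiresWholeMachine =?= True'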

MPI on whole-machine jobs

Whole-machine MPI submit file:

universe = vanilla
requirements = (CAN_RUN_WHOLE_MACHINE =?= TRUE)
+RequiresWholeMachine = true
executable = mpiexec
arguments = -np 8 real_exe
should_transfer_files = yes
when_to_transfer_output = on_exit
transfer_input_files = real_exe
queue
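The same whole-machine pattern covers threaded codes as well. Below is a sketch for an OpenMP binary; omp_exe and the fixed thread count of 8 are illustrative assumptions, not from the slides.

  universe = vanilla
  requirements = (CAN_RUN_WHOLE_MACHINE =?= TRUE)
  +RequiresWholeMachine = true
  executable = omp_exe
  environment = "OMP_NUM_THREADS=8"    # match the core count of the whole-machine slot
  should_transfer_files = yes
  when_to_transfer_output = on_exit
  queue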

How to submit to OSG

universe = grid
GridResource = some_grid_host
GlobusRSL = MagicRSL
executable = wrapper.sh
arguments = arguments
should_transfer_files = yes
when_to_transfer_output = on_exit
transfer_input_files = inputs
transfer_output_files = output
queue

Chemistry Usage Example of HTPC

What's the magic RSL?
Site-specific documentation is in the OSG CE docs.

PBS:    (host_xcount=1)(xcount=8)(queue=?)
LSF:    (queue=?)(exclusive=1)
Condor: (condorsubmit=('+WholeMachine' true))
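To make this concrete, here is how the PBS variant might be plugged into the grid-universe submit file from the earlier slide. This is a sketch, not from the slides: the gatekeeper host, the GT2 jobmanager, and the queue name are placeholder assumptions.

  universe = grid
  grid_resource = gt2 gatekeeper.example.edu/jobmanager-pbs
  globus_rsl = (host_xcount=1)(xcount=8)(queue=htpc)
  executable = wrapper.sh
  arguments = arguments
  should_transfer_files = yes
  when_to_transfer_output = on_exit
  transfer_input_files = inputs
  transfer_output_files = output
  queue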

What's with the wrapper?
- chmod the executable
- Create output files

#!/bin/sh
chmod 0755 real.ex
touch output
./mpiexec -np 8 real.ex

Next steps …?