Using hpc. Instructor: Seung Hun An, DCS Lab, School of EECSE, Seoul National University.



What is hpc

- System
  - IBM RS/6000 SP running AIX, with 16 processors per node
  - 144 GByte memory, 3 TByte disk
  - LoadLeveler & POE; LoadLeveler is recommended
- hpc.snu.ac.kr
  - Connect by telnet, ssh, or rsh (example below)
  - TeraTerm is available
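For example, connecting with ssh (the account name here is only a placeholder; use your own):

$ ssh dcslab@hpc.snu.ac.kr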

System Setting & Usage

- Shells: Bourne family only; ksh (default) and bash
  - Use export instead of setenv to set environment variables (see the example below)
- General steps of use
  - Edit the command (.cmd) file
  - Compile the source file
  - Submit the compiled job to the machine
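For example, an environment variable is set in ksh/bash as follows (the variable name and path are only illustrative):

export TMPDIR=/u/dcslab/tmp      # ksh/bash use "export";
                                 # csh-style would be: setenv TMPDIR /u/dcslab/tmp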

Command file (LoadLeveler keywords go on "# @" directive lines inside a shell script):

#!/bin/ksh
# @ job_type     = parallel
# @ executable   = ~/KISA/LLL/execution
# @ input        = /dev/null
# @ output       = $(Executable).$(Cluster).$(Process).out
# @ error        = $(Executable).$(Cluster).$(Process).err
# @ initialdir   = /u/dcslab
# @ notify_user  =
# @ class        = gold
# @ step_name    = LLL
# @ notification = complete
# @ checkpoint   = no
# @ restart      = no
# @ requirements = (Arch == "R6000") && (OpSys == "AIX43")
# @ node         = 4
# @ total_tasks  = 15
# @ network.MPI  = css0,shared,US,high
# @ queue
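For reference, a minimal MPI program in C of the kind such a command file launches. This is only a sketch; the actual parallel_allswap.c compiled in the next slide is not shown in the transcript:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, ntasks;

    MPI_Init(&argc, &argv);                    /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);      /* this task's id */
    MPI_Comm_size(MPI_COMM_WORLD, &ntasks);    /* total_tasks from the cmd file */
    printf("task %d of %d\n", rank, ntasks);
    MPI_Finalize();
    return 0;
}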

Running example

[sp01: ~/KISA/LLL] $ mpcc parallel_allswap.c
[sp01: ~/KISA/LLL] $ mv a.out execution
[sp01: ~/KISA/LLL] $ llsubmit lll.cmd
llsubmit: The job "sp " has been submitted.
[sp01: ~/KISA/LLL] $ llstatus
Name   Schedd  InQ  Act  Startd  Run  LdAvg  Idle  Arch   OpSys
sp01   Avail             Idle                      R6000  AIX43
sp02   Avail   1    1    Run                       R6000  AIX43
sp03   Avail   0    0    Run                       R6000  AIX43
sp04   Avail   0    0    Run                       R6000  AIX43
sp05   Avail   0    0    Run                       R6000  AIX43
sp06   Avail   0    0    Run                       R6000  AIX43
sp07   Avail   0    0    Run                       R6000  AIX43
sp08   Avail   0    0    Run                       R6000  AIX43
sp09   Avail   0    0    Run                       R6000  AIX43

R6000/AIX43     9 machines  13 jobs  123 running
Total Machines  9 machines  13 jobs  123 running

The Central Manager is defined on sp02
All machines on the machine_list are present.
[sp01: ~/KISA/LLL] $

llq

[sp01: ~/KISA/LLL] $ llq
Id   Owner     Submitted   ST  PRI  Class   Running On
sp   mrdlab1   9/14 03:04  R   50   long    sp02
sp   spscs     9/15 16:16  R   50   silver  sp05
sp   spscs     9/15 16:16  R   50   silver  sp07
sp   flowsys1  9/15 17:00  R   50   silver  sp04
sp   seongkim  9/15 22:37  R   50   gold    sp06
sp   shinkj    9/16 12:11  R   50   gold    sp04
sp   janggrp   9/16 12:28  R   50   gold    sp09
sp   janggrp   9/16 12:28  R   50   gold    sp03
sp   biosys    9/16 15:26  R   50   silver  sp03
sp   hpcb0011  9/16 16:53  R   50   silver  sp03
sp   microsys  9/16 17:25  R   50   silver  sp08
sp   microsys  9/16 17:25  R   50   silver  sp08
sp   dcslab    9/16 19:06  ST  50   gold    sp08

13 job steps in queue, 0 waiting, 1 pending, 12 running, 0 held
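In the ST column, R means Running; the dcslab job shows ST (Starting), which the summary line counts as pending. A single job can also be inspected by passing its id to llq; a sketch with a placeholder id, since the real ids are elided above:

[sp01: ~/KISA/LLL] $ llq sp01.123.0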

llclass & llcancel

- llclass: show the available job classes

[sp01: ~/KISA/LLL] $ llclass
Name     MaxJobCPU   MaxProcCPU  Free   Max   Description
         d+hh:mm:ss  d+hh:mm:ss  Slots  Slots
gold                                          Serial & parallel batch job
silver                                        Serial & parallel batch job
long                                          Long time job
general                                       Test or Interactive job

- llcancel: cancel one or more jobs in the LoadLeveler queue
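A job is removed by giving its id to llcancel; a minimal sketch, the id again being a placeholder:

[sp01: ~/KISA/LLL] $ llcancel sp01.123.0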