Wouter Verkerke, NIKHEF 1 Using ‘stoomboot’ for NIKHEF-ATLAS batch computing What is ‘stoomboot’ – Hardware –16 machines, each 2x quad-core Pentium = 128.

Slides:



Advertisements
Similar presentations
Lab III – Linux at UMBC.
Advertisements

Cluster Computing at IQSS Alex Storer, Research Technology Consultant.
© 2007 IBM Corporation IBM Global Engineering Solutions IBM Blue Gene/P Job Submission.
Using the Argo Cluster Paul Sexton CS 566 February 6, 2006.
Setting up of condor scheduler on computing cluster Raman Sehgal NPD-BARC.
Southgreen HPC system Concepts Cluster : compute farm i.e. a collection of compute servers that can be shared and accessed through a single “portal”
Batch Queuing Systems The Portable Batch System (PBS) and the Load Sharing Facility (LSF) queuing systems share much common functionality in running batch.
Running Jobs on Jacquard An overview of interactive and batch computing, with comparsions to Seaborg David Turner NUG Meeting 3 Oct 2005.
ISG We build general capability Job Submission on the Olympus Cluster J. DePasse; S. Brown, PhD; T. Maiden Pittsburgh Supercomputing Center Public Health.
CS Lecture 03 Outline Sed and awk from previous lecture Writing simple bash script Assignment 1 discussion 1CS 311 Operating SystemsLecture 03.
Processes CSCI 444/544 Operating Systems Fall 2008.
Sun Grid Engine Grid Computing Assignment – Fall 2005 James Ruff Senior Department of Mathematics and Computer Science Western Carolina University.
Quick Tutorial on MPICH for NIC-Cluster CS 387 Class Notes.
Guide To UNIX Using Linux Third Edition
A crash course in njit’s Afs
ISG We build general capability Purpose After this tutorial, you should: Be comfortable submitting work to the batch queuing system of olympus and be familiar.
Introduction to UNIX/Linux Exercises Dan Stanzione.
Introduction to HP LoadRunner Getting Familiar with LoadRunner >>>>>>>>>>>>>>>>>>>>>>
Research Computing with Newton Gerald Ragghianti Newton HPC workshop Sept. 3, 2010.
Introduction to Shell Script Programming
ISG We build general capability Introduction to Olympus Shawn T. Brown, PhD ISG MISSION 2.0 Lead Director of Public Health Applications Pittsburgh Supercomputing.
VIPBG LINUX CLUSTER By Helen Wang March 29th, 2013.
Bigben Pittsburgh Supercomputing Center J. Ray Scott
03/27/2003CHEP20031 Remote Operation of a Monte Carlo Production Farm Using Globus Dirk Hufnagel, Teela Pulliam, Thomas Allmendinger, Klaus Honscheid (Ohio.
How to get started on cees Mandy SEP Style. Resources Cees-clusters SEP-reserved disk20TB SEP reserved node35 (currently 25) Default max node149 (8 cores.
Introduction to Using SLURM on Discover Chongxun (Doris) Pan September 24, 2013.
UNIX Commands. Why UNIX Commands Are Noninteractive Command may take input from the output of another command (filters). May be scheduled to run at specific.
BSP on the Origin2000 Lab for the course: Seminar in Scientific Computing with BSP Dr. Anne Weill –
Network Queuing System (NQS). Controls batch queues Only on Cray SV1 Presently 8 queues available for general use and one queue for the Cray analyst.
Parallel Programming on the SGI Origin2000 With thanks to Igor Zacharov / Benoit Marchand, SGI Taub Computer Center Technion Moshe Goldberg,
Introduction Advantages/ disadvantages Code examples Speed Summary Running on the AOD Analysis Platforms 1/11/2007 Andrew Mehta.
HPC for Statistics Grad Students. A Cluster Not just a bunch of computers Linked CPUs managed by queuing software – Cluster – Node – CPU.
M. Schott (CERN) Page 1 CERN Group Tutorials CAT Tier-3 Tutorial October 2009.
Globus Toolkit Installation Report. What is Globus Toolkit? The Globus Toolkit is an open source software toolkit used for building Grid systems.
Architecture and ATLAS Western Tier 2 Wei Yang ATLAS Western Tier 2 User Forum meeting SLAC April
APST Internals Sathish Vadhiyar. apstd daemon should be started on the local resource Opens a port to listen for apst client requests Runs on the host.
Software Tools Using PBS. Software tools Portland compilers pgf77 pgf90 pghpf pgcc pgCC Portland debugger GNU compilers g77 gcc Intel ifort icc.
Running Parallel Jobs Cray XE6 Workshop February 7, 2011 David Turner NERSC User Services Group.
ISG We build general capability Introduction to Olympus Shawn T. Brown, PhD ISG MISSION 2.0 Lead Director of Public Health Applications Pittsburgh Supercomputing.
Introduction to HPC Workshop October Introduction Rob Lane & The HPC Support Team Research Computing Services CUIT.
1 Running MPI on “Gridfarm” Bryan Carpenter February, 2005.
Portable Batch System – Definition and 3 Primary Roles Definition: PBS is a distributed workload management system. It handles the management and monitoring.
T3g software services Outline of the T3g Components R. Yoshida (ANL)
Data Analysis w ith PROOF, PQ2, Condor Data Analysis w ith PROOF, PQ2, Condor Neng Xu, Wen Guan, Sau Lan Wu University of Wisconsin-Madison 30-October-09.
CCJ introduction RIKEN Nishina Center Kohei Shoji.
Debugging Lab Antonio Gómez-Iglesias Texas Advanced Computing Center.
Active-HDL Server Farm Course 11. All materials updated on: September 30, 2004 Outline 1.Introduction 2.Advantages 3.Requirements 4.Installation 5.Architecture.
NREL is a national laboratory of the U.S. Department of Energy, Office of Energy Efficiency and Renewable Energy, operated by the Alliance for Sustainable.
Wouter Verkerke, NIKHEF Preparation for La Mainaz or how to run Unix apps and ROOT on your Windows Laptop without installing Linux Wouter Verkerke (NIKHEF)
UNIX U.Y: 1435/1436 H Operating System Concept. What is an Operating System?  The operating system (OS) is the program which starts up when you turn.
Starting Analysis with Athena (Esteban Fullana Torregrosa) Rik Yoshida High Energy Physics Division Argonne National Laboratory.
A Web Based Job Submission System for a Physics Computing Cluster David Jones IOP Particle Physics 2004 Birmingham 1.
Chap 1 ~ Introducing LINUX LINUX is a free-stable multi-user operating system that derives from UNIX operating system Benefits: 1) Linux is released under.
Advanced Computing Facility Introduction
Outline Installing Gem5 SPEC2006 for Gem5 Configuring Gem5.
GRID COMPUTING.
Welcome to Indiana University Clusters
Unix Scripts and PBS on BioU
HPC usage and software packages
Welcome to Indiana University Clusters
Machine Learning Workshop
The Linux Operating System
Writing Shell Scripts ─ part 3
Paul Sexton CS 566 February 6, 2006
What is Bash Shell Scripting?
CCR Advanced Seminar: Running CPLEX Computations on the ISE Cluster
CSCI The UNIX System Shell Startup and Variables
Chapter 3 The UNIX Shells
Quick Tutorial on MPICH for NIC-Cluster
Working in The IITJ HPC System
Presentation transcript:

Wouter Verkerke, NIKHEF 1

Using 'stoomboot' for NIKHEF-ATLAS batch computing

What is 'stoomboot' – Hardware
– 16 machines, each with 2x quad-core CPUs = 128 cores
– Processor = 'Intel(R) Xeon(R) CPU 2.00GHz'
– Each core is roughly 50% faster than my desktop (yonne = Intel(R) Pentium(R) D CPU 2.80GHz). This metric is not precise: the performance difference varies between integer and floating-point processing, so the effective speed ratio depends on the application.

What is 'stoomboot' – Software
– OS: 'Scientific Linux CERN SLC release 4.6 (Beryllium)'
– Can access the same NFS disks as the desktops (/data/atlas, /project/atlas, etc.)
– Can run and compile all ATLAS software

What is 'stoomboot' – ATLAS computing model
– Tier-3 computing facility, not organized or controlled by ATLAS
– Can access Tier-1 AOD data (in progress; multiple protocols possible, e.g. dcap, xrootd), but this is also not organized by ATLAS
– Note that stoomboot is shared by ATLAS, LHCb and ALICE

Wouter Verkerke, NIKHEF 2

Running jobs on stoomboot
– No direct login on nodes stbc-01 through stbc-16
– Access to stoomboot goes through the Torque/Maui batch system (formerly known as PBS)
– Batch commands are available on all desktops

Submitting batch jobs – qsub
– Simplest example: submit a script for batch execution

    unix> qsub test.sh
    9714.allier.nikhef.nl

– The returned string is the job identifier

Checking the status of batch jobs – qstat
– Simplest example:

    unix> qstat
    Job id          Name      User      Time Use  S  Queue
    9714.allier     test.sh   verkerke  00:00:00  C  test

– Status codes: Q = queued, R = running, C = completed
– Only jobs that completed in the last 10 minutes are listed
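– For illustration, a minimal test.sh could look as follows (a hypothetical example, not taken from the original slides):

    #! /bin/sh
    # Print the execution host and the date; anything written to stdout
    # ends up in the test.sh.o<jobid> file described on the next slide.
    hostname
    date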

Wouter Verkerke, NIKHEF 3

Running jobs – Default settings

Examining output
– Output appears in a file named <script>.o<jobid>, e.g. test.sh.o9714 in the example on the previous page

Default settings for jobs
– The job runs in your home directory ($HOME)
– The job starts with a clean shell (environment variables from the shell from which you submit are not transferred to the batch job). E.g. if you need the ATLAS software setup, it should be done in the submitted script.
– Job output (stdout) is sent to a file in the directory from which the job was submitted; stderr is sent to a separate file. E.g. for the example on the previous slide, the file 'test.sh.o9714' contains stdout and the file 'test.sh.e9714' contains stderr. If there is no stdout or stderr, an empty file is created.
– A mail is sent to you if the output files cannot be created
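– Because the job starts in $HOME with a clean shell, a common pattern (a sketch relying on the standard Torque/PBS environment variable PBS_O_WORKDIR, not something specific to stoomboot) is to return to the submission directory and do all environment setup at the top of the submitted script:

    #! /bin/sh
    # PBS_O_WORKDIR is set by Torque to the directory from which the job
    # was submitted; fall back to $HOME when the script is run by hand.
    cd ${PBS_O_WORKDIR:-$HOME}
    # Any software setup (e.g. the ATLAS environment shown two slides
    # further on) also goes here, since nothing is inherited from the
    # submitting shell.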

Wouter Verkerke, NIKHEF 4

Running jobs – Some useful qsub options

Merge stdout and stderr into a single file
– Add option '-j oe' to the qsub command (a single *.o* file is written)

Choose the batch queue
– Right now there are two queues: test (30 min) and qlong (48 h)
– Add option '-q <queue name>' to the qsub command

Choose a different output file for stdout
– Add option '-o <file name>' to the qsub command

Pass all environment variables of the submitting shell to the batch job (with the exception of $PATH)
– Add option '-V' to the qsub command
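– For illustration, a single command line combining these options could look like this (queue and file names are just examples):

    # Run test.sh in the qlong queue, merge stderr into stdout, write the
    # combined output to mytest.log and export the current environment
    # (except $PATH) to the job
    unix> qsub -q qlong -j oe -o mytest.log -V test.sh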

Wouter Verkerke, NIKHEF 5

Running ATLAS software in batch

Set up the environment in the submitted script
– Following Manuel's wiki instructions
– Note that the SLC4 hosts of stoomboot can run both SLC3- and SLC4-compiled executables
– Example script (angle brackets mark placeholders to be filled in with your own release, user name, work area and job options):

    #! /bin/sh

    # 1 -- setup Athena
    source $HOME/cmthome/<release>_slc3/setup.sh -tag=setup,<tags>

    # 2 -- setup working area
    export USERPATH=/project/atlas/users/<user>/<workarea>
    export CMTPATH=${USERPATH}:${CMTPATH}

    # 3 -- run the job
    cd /project/atlas/users/<user>/<workarea>
    athena.py <jobOptions>.py
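– The script can then be submitted like any other batch job, e.g. assuming it was saved as run_athena.sh (a hypothetical name) and should go to the long queue:

    unix> qsub -q qlong run_athena.sh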

Wouter Verkerke, NIKHEF 6

Compiling ATLAS software in batch / for use in batch

Compiling ATLAS software for use in batch
– If your project area is on /project/atlas or /data/atlas, it is visible to jobs running in batch
– There is no need to compile your executables in the batch job itself (as is often required for GRID jobs)
– Compile interactively on your desktop (SLC3 or SLC4) and set up your batch job to use the compiled executables and libraries from your project area

Compiling ATLAS software in batch
– You can create a script to drive the compilation
– But it is easier to submit an interactive batch job
– Command: 'qsub -X -I -q qlong'. This submits a batch job connected to your terminal (it feels much like an interactive login via ssh)
– Compile the software (SLC4 is the only option there) and exit the shell when done (this terminates the interactive batch job)
– Then submit separate job(s) to run the executables
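– A sketch of such an interactive compile session (the cmt commands are generic CMT usage, not copied from the slides; adapt them to your own work area):

    unix> qsub -X -I -q qlong         # opens a shell on a batch node
    stbc> cd /project/atlas/users/<user>/<workarea>/MyPackage/cmt
    stbc> source setup.sh             # package environment
    stbc> cmt broadcast gmake         # compile the package and its dependencies
    stbc> exit                        # terminates the interactive batch job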

Wouter Verkerke, NIKHEF 7

Are the queues full?

The 'qstat' command only lists your own jobs
– The lower-level command 'showq --host=allier' can show the jobs of all users
– For now this seems to be available only on the login node

The 'qstat -Q' command summarizes the number of currently pending/running/completed jobs per queue for all users:

    Queue    Max  Tot  Ena  Str  Que  Run  Hld  Wat  Trn  Ext  T
    qlong     ..   ..  yes  yes   ..   ..   ..   ..   ..   ..  E
    test       0    0  yes  yes   ..   ..   ..   ..   ..   ..  E

    (".." marks job counts that were not preserved in the transcript)

– A web page with a load graph vs. time is also available

Current configuration of the scheduler
– Max 96 jobs for ATLAS, max 64 jobs per user

Wouter Verkerke, NIKHEF 8

A wrapper script for simple interactive use

I have written a small utility script 'bsub' that wraps the qsub command and allows you to submit a command line directly
– E.g. you have a ROOT analysis macro that you can run in your current directory as

    unix> root -l -b -q analyzeNtuples.C

– It is a hassle to write a small script that initializes a batch shell with the same ROOT version, moves to the present working directory and executes the same command. The bsub wrapper does all of this for you:

    unix> bsub root -l -b -q analyzeNtuples.C

  This submits a batch job that does exactly the same as the interactive command.
– Caveat: command line arguments cannot contain quotes, parentheses etc., as these get mangled by the scripts
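– For illustration, a minimal sketch of such a wrapper (this is not the actual bsub script; it only shows the idea of generating a small job script around an arbitrary command line and handing it to qsub):

    #! /bin/sh
    # bsub (sketch): submit an arbitrary command line as a batch job by
    # generating a job script that returns to the current directory,
    # restores the current ROOT environment and runs the given command.
    tmpscript=$(mktemp /tmp/bsub_XXXXXX)
    {
      echo '#! /bin/sh'
      echo "cd $PWD"
      echo "export ROOTSYS=$ROOTSYS"
      echo 'export PATH=$ROOTSYS/bin:$PATH'
      echo 'export LD_LIBRARY_PATH=$ROOTSYS/lib:$LD_LIBRARY_PATH'
      echo "$@"                        # the command line passed to bsub
    } > "$tmpscript"
    qsub -j oe "$tmpscript"

  The caveat about quotes and parentheses stems from exactly this kind of expansion: the arguments are pasted literally into the generated script.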

Wouter Verkerke, NIKHEF 9

Next time: AOD data access from stoomboot

You can access files on /data/atlas
– However, performance will not scale: NFS performance and network bandwidth will quickly saturate once more than O(few) jobs on stoomboot are reading from /data/atlas
– This is OK for debugging and testing, but a better solution is needed to use the full stoomboot capacity. Several better solutions (e.g. dcap, xrootd) exist in principle
– They come with different performance trade-offs
– Work in progress (Folkert is trying dcap). More in the next meeting.