Zeus Users’ Quickstart Training January 27/30, 2012.


Zeus Users’ Quickstart Training January 27/30, 2012

Agenda
- Overview
  - Documentation
  - Zeus Architecture
  - Policies/Conventions
- Accessing Zeus
- Moving Data to/from Zeus
- Modules
- Compiling codes
- Submitting a Job
- Monitoring a Job
- Resource utilization inquiries
- Getting Help

Documentation
- The basic NESCC web site is rdhpcs.noaa.gov/nescc
- On the left side is a link to the FAQ.
- This presentation is essentially a structured walk through the FAQ; the web reference at the bottom of each slide is the FAQ entry for that subject.

Architecture (diagram): service nodes (Front End 1-N plus a data transfer node, DTN), big-memory nodes (Big Mem 1-N), and the compute system (Rack 1-N), reached through a bastion host; high-performance storage (HPS Filesystems 1 and 2) and the home filesystem are connected over Ethernet and InfiniBand networks.

Architecture

Policies/Conventions
- Node usage
  - Service Nodes (front-ends): shared (serial)
  - Compute: Bigmem and Compute Nodes
- Disk space
  - Home: backed up; user quota (initially 5GB/user)
  - Scratch: not backed up, not purged; group quota (based on the portfolio manager's direction)

Policies/Conventions
- Front-ends/Service Nodes
  - File transfers
  - Compiles
  - Editing/organization
  - NO compute jobs
- Compute Nodes
  - Obviously, compute jobs
  - Do not have access to the data movement or HPSS
  - Do not compile


Accessing Zeus
- Requesting an account on Zeus
  - This will change over time; we hope eventually to have a unified, web-based account request form, but we aren't there yet.
  - For now, if you have access to Vapor, Jet, Gaea, etc., send in a help request asking for the account. Please specify the project or projects you are associated with.
  - Use:

Accessing Zeus
- Your RSA token needs to be set up; we won't discuss this here. We assume you already have a valid PIN.
- Use an ssh client to log in to Zeus (Linux, Mac: ssh; Windows: PuTTY or similar)
  - Linux/Mac: ssh -C -X
- If this is your first time, you will be asked to enter a three- (or more) word pass phrase.
- After you enter your pass phrase, a login will be attempted. This will fail because you don't have a password yet. Don't worry; the process has been initiated correctly.

Accessing Zeus (continued)
- After your certificate has been signed, try logging in to Zeus again. (Use the same method as before.)
- You will be prompted for your pass phrase again. (This will be the last time, for a while.)
- The interaction is shown on the following web page:
- Let's take a quick look.

Selecting Specific Login Nodes
- Sometimes you may need to return to a specific login node.
- The login nodes are fe1-fe8. (tfe1 is for Herc, the test and development system (TDS); only specific users will be enabled there after we get part way through acceptance.)
- Near the end of the login process you will see:
    Proxy certificate retrieved.
    You will now be connected to OneNOAA RDHPCS: Zeus system
    Hit ^C within 5 seconds to select another host.
- After hitting Control-C you will see:
    Select a host. Enter the hostname, or a unique portion of a hostname:
  followed by a table of the login-node hostnames (fe1-fe8, tfe1) and their IP addresses.
- Enter hostname here: fe2


Transferring Data To/From Zeus
- The port tunnel is documented in the FAQ but is not recommended.
- Using the Data Transfer Host (token authentication; only works from .noaa.gov):

    # scp bigfile
    password:            (this is the point where you enter your PIN + RSA token response)
    bigfile    100%  128MB  32.0MB/s  00:04
    #

Transferring Data To/From Zeus
- Unattended data transfers
  - Will be set up upon special request
  - From known, specific hosts within .noaa.gov
  - To known, specific user names on Zeus
  - Send a request to the help system explaining your requirements and the list of IP addresses from which the transfers will occur
- Outbound transfers
  - Done from the front-end nodes
  - Use the "service" nodes
- See:


Modules
- Modules allow PATH and other environment variables to be manipulated so that multiple versions of software can be supported.
- As a result, some things are not in your path by default, including the compilers and MPI.
- A good set of defaults including the Intel compiler:
    module load bbcp intel mpt
- A good set of defaults including the PGI compiler:
    module load bbcp pgi mpt
- These commands can be placed in your .cshrc or .profile.
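For example, the tail of a bash user's ~/.profile might look like the sketch below (the module names are the ones from this slide; the guard is only there so the file can also be sourced on machines without environment modules):

```shell
# ~/.profile (sh/bash): load a default toolchain at every login.
# csh/tcsh users would put the equivalent line in ~/.cshrc instead.
if type module >/dev/null 2>&1; then
    module load bbcp intel mpt   # or: module load bbcp pgi mpt
fi
```

Pick one compiler module (intel or pgi) per login; loading both can leave conflicting wrappers in your PATH.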

Modules - Documentation
- See:
- For sh, ksh, or bash, see:
- Under "Questions", select: "Why can't I use module commands inside of my batch script"


Compiling
- We currently have two compilers installed:
  - Intel (we have more licenses and can provide more extensive support for the Intel compiler)
  - PGI
- We expect to add the Lahey compiler in the future.
- Many systems provide MPI compiler wrapper scripts. These are provided, but they are not supported, nor are they guaranteed to work.

Compiling - Intel
- Compiler names:
  - Fortran: ifort
  - C, C++: icc, icpc
- Recommended options:
  - Optimize: -O3 -xSSE4.2 -ip
  - Debug: -O0 -g -traceback -check all -fpe0 -ftrapuv
- Linking MPI: -lmpi

Compiling - PGI
- Compiler names:
  - Fortran 90, Fortran 77: pgf90, pgf77
  - C, C++: pgcc, pgCC
- Recommended options:
  - Optimize: -O3 -fastsse -tp=nehalem-64
  - Debug: -O0 -g -traceback -Mbounds -Mchkfpstk -Mchkptr -Mchkstk
- Compiling/Linking MPI:
  - Compile: -I$(MPI_ROOT)/include
  - Link: -L$(MPI_ROOT)/lib -lmpi

Compiling & Make
- Intel:
    CC  = icc
    F77 = ifort
    CXX = icpc
    F90 = ifort
- PGI:
    CC  = pgcc -I$(MPI_ROOT)/include
    F77 = pgf77 -I$(MPI_ROOT)/include
    CXX = pgCC -I$(MPI_ROOT)/include
    F90 = pgf90 -I$(MPI_ROOT)/include
- Documentation: _Compilers
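Putting the two slides together, a minimal Makefile for an MPI Fortran code with the Intel settings might look like this sketch (main.f90, solver.f90, and the program name "model" are hypothetical; swap in the PGI variables above as needed):

```make
# Sketch of a Makefile using the Intel variables from this slide.
F90     = ifort
FFLAGS  = -O3 -xSSE4.2 -ip        # use the debug flag set while developing
LDFLAGS = -lmpi                   # link against MPT's MPI library

model: main.o solver.o
	$(F90) -o $@ main.o solver.o $(LDFLAGS)

%.o: %.f90
	$(F90) $(FFLAGS) -c $<

clean:
	rm -f model *.o
```

For PGI, note that -I$(MPI_ROOT)/include belongs on the compile lines and -L$(MPI_ROOT)/lib on the link line, as shown on the previous slide.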


Submitting a job
- msub is the preferred command for Zeus.
  - qsub is available, but Gaea and Zeus will both have msub available, and msub also allows the batch system underneath to be changed transparently.
- msub options:
  - -A project (required)
  - -l nodes=<nodes>:ppn=<tasks-per-node>, or -l procs=<processes>
  - -l walltime=<hh:mm:ss> or walltime=<seconds>
  - -l mem=<size> (default is 1.4GB/process)
  - -q <queue> (default is batch)

Useful Options and Variables
- To have a job start in your current directory: -d .
- Some environment variables:
  - $PBS_JOBID - the job ID of the currently running job
  - $PBS_O_WORKDIR - the directory from which the batch script was submitted
  - $PBS_QUEUE - the assigned queue for this job
  - $PBS_NP - the number of tasks assigned to this job
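A minimal sketch of how a batch script might use these variables (the ${VAR:-default} fall-backs are only there so the lines also run outside a job; inside a real job the batch system sets the actual values):

```shell
# Inside a batch job, PBS sets these; the fall-backs are for interactive testing.
jobid="${PBS_JOBID:-interactive}"
workdir="${PBS_O_WORKDIR:-$PWD}"
queue="${PBS_QUEUE:-batch}"
np="${PBS_NP:-1}"

cd "$workdir"    # start in the directory the job was submitted from
echo "job $jobid: $np task(s) in queue $queue, workdir $workdir"
```

Changing to $PBS_O_WORKDIR at the top of a script is a common alternative to the -d option on the previous line.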

Examples
- Submit a job with a 4-hour limit, 32 cores, project nesccmgmt, and the urgent queue:
    msub -A nesccmgmt -l walltime=4:00:00,procs=32 -q urgent job.csh
- Submit a job with a 2-hour limit, 16 processes each with 6 threads (an OpenMP/MPI hybrid), project fim, 10GByte/process, and the batch (default) queue, embedding the options in the script:
    msub fimjob.sh
  fimjob.sh:
    #!/bin/sh --login
    # Options use the PBS sentinel
    #PBS -A fim
    # Indicate 8 nodes with 2 MPI tasks per node (use 6 OMP threads/task)
    #PBS -l nodes=8:ppn=2
    #PBS -l mem=10G
    #PBS -l walltime=3600
    export OMP_NUM_THREADS=6
- See: ng_Jobs

Submitting an Interactive Job
- To submit an interactive job, use the -I and -X options:
    msub -A fim -I -X -l procs=36,walltime=1:00:00
- After the job works through the queue, an interactive prompt will appear.
- Starting TotalView: at the interactive prompt, enter the following (make sure you used -X on your ssh and msub):
    module load totalview
    totalview
  (Give it a moment to start up.)

More debugging


Monitoring a job
- Show all jobs in the queue:
    showq
- Show only your own jobs:
    showq -u Christopher.W.Harrop

    active jobs ------------------------------------------------
    JOBID  USERNAME  STATE  PROCS  REMAINING  STARTTIME

    0 active jobs
    0 of processors in use by local jobs (0.00%)
    0 of 2310 nodes active (0.00%)

    eligible jobs ----------------------------------------------
    JOBID  USERNAME  STATE  PROCS  WCLIMIT   QUEUETIME

    5646   Christop  Idle   1      00:05:00  Thu Jan 26 16:58:52

    1 eligible job

    blocked jobs -----------------------------------------------
    JOBID  USERNAME  STATE  PROCS  WCLIMIT   QUEUETIME

    0 blocked jobs

    Total job: 1

Deleting a job
- mjobctl -c <jobid>


Resource Utilization Queries
- Disk quota:
    squota (lists all quotas associated with the user)
- Project utilization:
    Gold-information-utility???


Getting Help
- Use OTRS
  - Zeus Issues
  - HPSS Issues

Thank You! Now… Open Forum