SAN DIEGO SUPERCOMPUTER CENTER Overview of UCSD's Triton Resource A cost-effective, high performance shared resource for research computing

SAN DIEGO SUPERCOMPUTER CENTER What is the Triton Resource?
- A medium-scale high performance computing (HPC) and data storage system
- Designed to serve the needs of UC researchers:
  - Turn-key, cost-competitive access to a robust computing resource
  - Supports computing research, scientific & engineering computing, and large-scale data analysis
  - Lengthy proposals and long waits for access are not required
  - Supports short- or long-term projects
  - Flexible usage models are accommodated
  - Free of the equipment headaches and staffing costs associated with maintaining a dedicated cluster

SAN DIEGO SUPERCOMPUTER CENTER Triton Resource Components
- High performance network
- Data Oasis: 2,000 – 4,000 terabytes of disk storage for research data
- Petascale Data Analysis Facility (PDAF): unique SMP system for analyzing very large datasets. 28 nodes, 256/512 GB of memory, 8 quad-core AMD Shanghai processors/node (32 cores/node).
- Triton Compute Cluster (TCC): medium-scale cluster system for general purpose HPC. 256 nodes, 24 GB of memory, 2 quad-core Nehalem processors/node (8 cores/node).
- High performance file system
- Connection to high bandwidth research networks & the Internet

SAN DIEGO SUPERCOMPUTER CENTER Flexible Usage Models
- Shared-queue access
  - Compute nodes are shared with other users
  - Jobs are submitted to the queue and wait to run
  - Batch and interactive jobs are supported
  - User accounts are debited by the actual service units (SUs) consumed by the job
- Dedicated compute nodes
  - User can reserve a fixed number of compute nodes for exclusive access
  - User is charged for 24x7 use of the nodes at 70% utilization; any utilization over 70% is a "bonus"
  - Nodes may be reserved on a monthly basis (a worked example of the charging model follows below)
- Hybrid
  - Dedicated nodes for core computing tasks, plus shared-queue access for overflow, jobs that are not time-critical, or jobs requiring higher core counts
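As a rough illustration of the dedicated-node charging model (assuming, hypothetically, that one SU corresponds to one core-hour, which the slides do not state): reserving a single 8-core TCC node for a 30-day month would be billed as 8 cores x 24 hours x 30 days x 0.70 = 4,032 SUs, and any actual usage above that 70% level would come at no additional charge.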

SAN DIEGO SUPERCOMPUTER CENTER Triton Resource Benefits
- Short lead time for project start-up
- Short waits in the queue
- No lengthy proposal process
- Flexible usage models
- Access to HPC experts for setup, software optimization, and troubleshooting
- Avoid using research staff for sysadmin tasks
- Avoid headaches with maintenance, aging equipment, and project wind-down
- Access to a parallel high-performance, high-capacity storage system
- Access to high bandwidth research networks

SAN DIEGO SUPERCOMPUTER CENTER Triton Affiliates & Partners Program (TAPP)
- TAPP is SDSC's program for accessing the Triton Resource
- Two components: a central campus purchase and individual / department purchases
  - Central campus purchase: block purchase made by the central campus, then allocated out to individual faculty / researchers
  - Individual purchase: faculty / researchers / departments purchase cycles from grants or other funding
- Startup accounts: 1,000-SU evaluation accounts are granted upon request

SAN DIEGO SUPERCOMPUTER CENTER Contact for access/allocations: Ron Hawkins, TAPP Manager, (858)

SAN DIEGO SUPERCOMPUTER CENTER Numerical Libraries on Triton Mahidhar Tatineni 04/22/2010

SAN DIEGO SUPERCOMPUTER CENTER AMD Core Math Library (ACML)
- Installed on Triton under the PGI compiler installation directory
- Covers BLAS, LAPACK, and FFT routines
- The ACML user guide is at: /opt/pgi/linux86-64/8.0-6/doc/acml.pdf
- Example BLAS, LAPACK, and FFT codes are in: /home/diag/examples/ACML

SAN DIEGO SUPERCOMPUTER CENTER BLAS Example Using ACML
Compile and link as follows:
pgcc -L/opt/pgi/linux86-64/8.0-6/lib blas_cdotu.c -lacml -lm -lpgftnrtl -lrt
Output:
-bash-3.2$ ./a.out
ACML example: dot product of two complex vectors using cdotu
Vector x: ( , ) ( , ) ( , )
Vector y: ( , ) ( , ) ( , )
r = x.y = ( , )
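The source of blas_cdotu.c is not reproduced in the slides. As a rough, hypothetical sketch of what a BLAS call from C against ACML looks like, the following uses the simpler real-valued ddot routine through the standard Fortran BLAS symbol (which ACML exports); compile it with the same link line shown above, e.g. pgcc acml_ddot.c -L/opt/pgi/linux86-64/8.0-6/lib -lacml -lm -lpgftnrtl -lrt.

/* acml_ddot.c - minimal sketch of calling a BLAS routine from C against ACML.
 * This is not the blas_cdotu.c shipped with the examples; it uses the real
 * dot product ddot via the standard Fortran BLAS symbol. */
#include <stdio.h>

/* Fortran BLAS interface: arguments passed by reference, trailing underscore. */
extern double ddot_(const int *n, const double *x, const int *incx,
                    const double *y, const int *incy);

int main(void)
{
    double x[3] = {1.0, 2.0, 3.0};
    double y[3] = {4.0, 5.0, 6.0};
    int n = 3, inc = 1;

    double r = ddot_(&n, x, &inc, y, &inc);   /* 1*4 + 2*5 + 3*6 = 32 */
    printf("ddot(x, y) = %f\n", r);
    return 0;
}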

SAN DIEGO SUPERCOMPUTER CENTER LAPACK Example Using ACML
Compile and link as follows:
pgcc -L/opt/pgi/linux86-64/8.0-6/lib lapack_dgesdd.c -lacml -lm -lpgftnrtl -lrt
Output:
-bash-3.2$ ./a.out
ACML example: SVD of a matrix A using dgesdd
Matrix A:
Singular values of matrix A:
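The lapack_dgesdd.c source is likewise not shown. As a simpler hypothetical stand-in for calling LAPACK from C against ACML, the sketch below solves a small linear system with dgesv through the standard Fortran LAPACK symbol; it compiles with the same pgcc link line as above.

/* acml_dgesv.c - hypothetical sketch of calling LAPACK from C against ACML.
 * Not the lapack_dgesdd.c from the examples directory; it solves a small
 * linear system with dgesv via the standard Fortran LAPACK symbol. */
#include <stdio.h>

extern void dgesv_(const int *n, const int *nrhs, double *a, const int *lda,
                   int *ipiv, double *b, const int *ldb, int *info);

int main(void)
{
    /* A is stored column-major, as LAPACK expects: A = [3 1; 1 2] */
    double a[4] = {3.0, 1.0, 1.0, 2.0};
    double b[2] = {9.0, 8.0};          /* right-hand side */
    int n = 2, nrhs = 1, ipiv[2], info;

    dgesv_(&n, &nrhs, a, &n, ipiv, b, &n, &info);
    if (info == 0)
        printf("solution: x = %f, y = %f\n", b[0], b[1]);  /* expect 2, 3 */
    else
        printf("dgesv failed, info = %d\n", info);
    return 0;
}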

SAN DIEGO SUPERCOMPUTER CENTER FFT Example Using ACML
Compile and link as follows:
pgf90 dzfft_example.f -L/opt/pgi/linux86-64/8.0-6/lib -lacml
Output:
-bash-3.2$ ./a.out
ACML example: FFT of a real sequence using ZFFT1D
Components of discrete Fourier transform:
Original sequence as restored by inverse transform: Original / Restored

SAN DIEGO SUPERCOMPUTER CENTER Intel Math Kernel Library (MKL)
- Installed on Triton under the Intel compiler installation directory
- Covers the BLAS, LAPACK, FFT, BLACS, and ScaLAPACK libraries
- Most useful resource: the Intel MKL Link Line Advisor
- Examples are in the following directory: /home/diag/examples/MKL

SAN DIEGO SUPERCOMPUTER CENTER CBLAS Example Using MKL
Compile as follows:
> export MKLPATH=/opt/intel/Compiler/11.1/046/mkl
> icc cblas_cdotu_subx.c common_func.c -I$MKLPATH/include $MKLPATH/lib/em64t/libmkl_solver_lp64_sequential.a -Wl,--start-group $MKLPATH/lib/em64t/libmkl_intel_lp64.a $MKLPATH/lib/em64t/libmkl_sequential.a $MKLPATH/lib/em64t/libmkl_core.a -Wl,--end-group -lpthread
Run as follows:
MKL]$ ./a.out cblas_cdotu_subx.d
C B L A S _ C D O T U _ S U B EXAMPLE PROGRAM
INPUT DATA
N=4
VECTOR X INCX=1 ( 1.00, 1.00) ( 2.00, -1.00) ( 3.00, 1.00) ( 4.00, -1.00)
VECTOR Y INCY=1 ( 3.50, 0.00) ( 7.10, 0.00) ( 1.20, 0.00) ( 4.70, 0.00)
OUTPUT DATA
CDOTU_SUB = ( , )
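The cblas_cdotu_subx.c used above is an MKL example program that reads its data from an input file; its source is not reproduced here. A minimal, self-contained sketch of the same CBLAS call (hypothetical file mkl_cdotu.c) could look like the following, compiled with the icc line above minus common_func.c.

/* mkl_cdotu.c - minimal sketch of the cblas_cdotu_sub call exercised by the
 * MKL example above (not the original cblas_cdotu_subx.c). */
#include <stdio.h>
#include <mkl.h>

int main(void)
{
    /* Single-precision complex vectors, matching the INPUT DATA shown above. */
    MKL_Complex8 x[4] = {{1.0f, 1.0f}, {2.0f, -1.0f}, {3.0f, 1.0f}, {4.0f, -1.0f}};
    MKL_Complex8 y[4] = {{3.5f, 0.0f}, {7.1f, 0.0f}, {1.2f, 0.0f}, {4.7f, 0.0f}};
    MKL_Complex8 dotu;
    MKL_INT n = 4, incx = 1, incy = 1;

    /* Unconjugated complex dot product: dotu = sum(x[i] * y[i]) */
    cblas_cdotu_sub(n, x, incx, y, incy, &dotu);
    printf("CDOTU_SUB = (%.2f, %.2f)\n", dotu.real, dotu.imag);  /* expect (40.10, -7.10) */
    return 0;
}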

SAN DIEGO SUPERCOMPUTER CENTER LAPACK Example Using MKL
Compile as follows:
ifort dgebrdx.f -I$MKLPATH/include $MKLPATH/lib/em64t/libmkl_solver_lp64_sequential.a -Wl,--start-group $MKLPATH/lib/em64t/libmkl_intel_lp64.a $MKLPATH/lib/em64t/libmkl_sequential.a $MKLPATH/lib/em64t/libmkl_core.a -Wl,--end-group libaux_em64t_intel.a -lpthread
Output:
MKL]$ ./a.out < dgebrdx.d
DGEBRD Example Program Results
Diagonal
Super-diagonal

SAN DIEGO SUPERCOMPUTER CENTER ScaLAPACK Example Using MKL
A sample test case (from the MKL examples) is in: /home/diag/examples/scalapack
The makefile is set up to compile all the tests. Procedure:
module purge
module load intel
module load openmpi_mx
make libem64t compiler=intel mpi=openmpi LIBdir=/opt/intel/Compiler/11.1/046/mkl/lib/em64t
Sample link line (to illustrate how to link against ScaLAPACK):
mpif77 -o ../xsdtlu_libem64t_openmpi_intel_noopt_lp64 psdtdriver.o psdtinfo.o psdtlaschk.o psdbmv1.o psbmatgen.o psmatgen.o pmatgeninc.o -L/opt/intel/Compiler/11.1/046/mkl/lib/em64t /opt/intel/Compiler/11.1/046/mkl/lib/em64t/libmkl_scalapack_lp64.a /opt/intel/Compiler/11.1/046/mkl/lib/em64t/libmkl_blacs_openmpi_lp64.a -L/opt/intel/Compiler/11.1/046/mkl/lib/em64t /opt/intel/Compiler/11.1/046/mkl/lib/em64t/libmkl_intel_lp64.a -Wl,--start-group /opt/intel/Compiler/11.1/046/mkl/lib/em64t/libmkl_sequential.a /opt/intel/Compiler/11.1/046/mkl/lib/em64t/libmkl_core.a -Wl,--end-group -lpthread
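The test drivers above are Fortran. As a rough sketch of what the libmkl_blacs_openmpi library provides, the hypothetical C program below (blacs_hello.c) only sets up and reports a BLACS process grid, which is the first step of any ScaLAPACK program; it assumes MKL's Cblacs_* C interface to BLACS is available, and would be compiled with mpicc plus the same MKL library list as the link line above.

/* blacs_hello.c - hypothetical sketch: initialize a BLACS process grid.
 * Assumes MKL's Cblacs_* C wrappers for BLACS are available. */
#include <stdio.h>

/* BLACS C interface (provided by libmkl_blacs_openmpi_lp64) */
extern void Cblacs_pinfo(int *mypnum, int *nprocs);
extern void Cblacs_get(int context, int request, int *value);
extern void Cblacs_gridinit(int *context, char *order, int np_row, int np_col);
extern void Cblacs_gridinfo(int context, int *np_row, int *np_col, int *my_row, int *my_col);
extern void Cblacs_gridexit(int context);
extern void Cblacs_exit(int error_code);

int main(void)
{
    int iam, nprocs, ctxt, nprow, npcol, myrow, mycol;

    Cblacs_pinfo(&iam, &nprocs);               /* who am I, how many processes */
    Cblacs_get(-1, 0, &ctxt);                  /* default system context */
    Cblacs_gridinit(&ctxt, "Row", 1, nprocs);  /* 1 x nprocs process grid */
    Cblacs_gridinfo(ctxt, &nprow, &npcol, &myrow, &mycol);

    printf("process %d of %d is at grid position (%d,%d) in a %dx%d grid\n",
           iam, nprocs, myrow, mycol, nprow, npcol);

    Cblacs_gridexit(ctxt);
    Cblacs_exit(0);                            /* also finalizes MPI */
    return 0;
}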

SAN DIEGO SUPERCOMPUTER CENTER Profiling Tools on Triton
- FPMPI: MPI profiling library, in /home/beta/fpmpi/fpmpi-2 (PGI + MPICH MX)
- TAU: profiling and tracing toolkit for performance analysis of parallel programs written in Fortran, C, C++, Java, and Python. Available on Triton, compiled with the PGI compilers:
  /home/beta/tau/2.19-pgi
  /home/beta/pdt/3.15-pgi

SAN DIEGO SUPERCOMPUTER CENTER Using FPMPI on Triton
The library is located in: /home/beta/fpmpi/fpmpi-2/lib
It needs PGI and MPICH MX:
> module purge
> module load pgi
> module load mpich_mx
Just relink with the library. For example:
/opt/pgi/mpichmx_pgi/bin/mpicc -o cpi cpi.o -L/home/beta/fpmpi/fpmpi-2/lib -lfpmpi
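cpi is the classic MPI pi-estimation example; its source is not part of the slides, but a minimal stand-in that exercises the same MPI_Bcast and MPI_Reduce calls profiled on the next slides could look like this (compile with "mpicc -c cpi.c" to get cpi.o, then relink as shown above).

/* cpi.c - minimal MPI pi estimation, a stand-in for the cpi example relinked
 * with -lfpmpi above. */
#include <stdio.h>
#include <math.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int n = 1000000, rank, size, i;
    double h, sum = 0.0, x, mypi, pi;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);   /* share the interval count */

    h = 1.0 / (double)n;
    for (i = rank; i < n; i += size) {              /* midpoint rule on my slice */
        x = h * ((double)i + 0.5);
        sum += 4.0 / (1.0 + x * x);
    }
    mypi = h * sum;

    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("pi is approximately %.16f, error is %.16f\n", pi, fabs(pi - M_PI));

    MPI_Finalize();
    return 0;
}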

SAN DIEGO SUPERCOMPUTER CENTER Using FPMPI on Triton
Run the code normally:
> mpirun -machinefile $PBS_NODEFILE -np 2 ./cpi
Process 1 on tcc-2-25.local
pi is approximately , Error is
wall clock time =
Process 0 on tcc-2-25.local
This creates an output file (fpmpi_profile.txt) with the profile data. Check the /home/diag/FPMPI directory for more examples.

SAN DIEGO SUPERCOMPUTER CENTER Sample FPMPI Output
Command: /mirage/mtatineni/TESTS/FPMPI/./cpi
Date: Wed Apr 21 16:44:
Processes: 2
Execute time: 0
Timing Stats: [seconds] [min/max] [min rank/max rank]
  wall-clock: 0 sec / / 0
Memory Usage Stats (RSS) [min/max KB]: 825/926
Average of sums over all processes
Routine       Calls   Time   Msg Length   %Time by message length (scale ... 1K ... 1M)
MPI_Bcast  : *
MPI_Reduce : *

SAN DIEGO SUPERCOMPUTER CENTER Sample FPMPI Output
Details for each MPI routine (average of sums over all processes; % by message length is the max over processes [rank]):
MPI_Bcast:
  Calls     : 2 2 [ 0]  0*
  Time      : [ 1]  0*
  Data Sent : 4 8 [ 0]
  By bin    : 1-4 [2,2] [ 5.96e-06, ]
MPI_Reduce:
  Calls     : 1 1 [ 0]  00*
  Time      : [ 0]  00*
  Data Sent : 8 8 [ 0]
  By bin    : 5-8 [1,1] [ , 0.027]
Summary of target processes for point-to-point communication:
  1-norm distance of point-to-point with an assumed 2-d topology
  (maximum distance for point-to-point communication from each process): 0 0
Detailed partner data (source: dest1 dest2 ...), size of COMM_WORLD = 2:
  0:
  1:

SAN DIEGO SUPERCOMPUTER CENTER About TAU
TAU is a suite of Tuning and Analysis Utilities, a long-running project involving:
- the University of Oregon Performance Research Lab
- the LANL Advanced Computing Laboratory
- Research Centre Jülich (ZAM), Germany
It is an integrated toolkit covering:
- Performance instrumentation
- Measurement
- Analysis
- Visualization

SAN DIEGO SUPERCOMPUTER CENTER Using TAU
- Load the papi and tau modules
- Gather information for the profile run:
  - Type of run (profiling/tracing, hardware counters, etc.)
  - Programming paradigm (MPI/OMP)
  - Compiler (Intel/PGI/GCC...)
- Select the appropriate TAU_MAKEFILE based on your choices ($TAU/Makefile.*)
- Set up the selected PAPI counters in your submission script
- Run as usual & analyze using paraprof
- You can transfer the database to your own PC to do the analysis

SAN DIEGO SUPERCOMPUTER CENTER TAU Performance System Architecture

SAN DIEGO SUPERCOMPUTER CENTER TAU: Example
Set up the TAU environment (this will be handled by modules in the next software stack on Triton):
export PATH=/home/beta/tau/2.19-pgi/x86_64/bin:$PATH
export LD_LIBRARY_PATH=/home/beta/tau/2.19-pgi/x86_64/lib:$LD_LIBRARY_PATH
Choose the TAU_MAKEFILE to use for your code. For example:
/home/beta/tau/2.19-pgi/x86_64/lib/Makefile.tau-mpi-pdt-pgi
So we set it up:
% export TAU_MAKEFILE=/home/beta/tau/2.19-pgi/x86_64/lib/Makefile.tau-mpi-pdt-pgi
And we compile using the wrapper provided by TAU:
% tau_cc.sh matmult.c
Run the job through the queue normally and analyze the output using paraprof. (More detail in the Ranger part of the presentation.)
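The matmult.c compiled above is a TAU example program whose source is not included in the slides; any small C code can be instrumented the same way. A hypothetical stand-in (an MPI program, since the chosen makefile is the tau-mpi-pdt-pgi variant) might look like this:

/* matmult.c - hypothetical stand-in for the matrix-multiply example that
 * tau_cc.sh instruments above. */
#include <stdio.h>
#include <mpi.h>

#define N 256

static double a[N][N], b[N][N], c[N][N];

/* Naive triple-loop matrix multiply; TAU will time this routine. */
static void multiply(void)
{
    int i, j, k;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            c[i][j] = 0.0;
            for (k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];
        }
}

int main(int argc, char *argv[])
{
    int rank, i, j;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            a[i][j] = (double)(i + j);
            b[i][j] = (double)(i - j);
        }
    multiply();

    if (rank == 0)
        printf("c[%d][%d] = %f\n", N / 2, N / 2, c[N / 2][N / 2]);

    MPI_Finalize();
    return 0;
}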

SAN DIEGO SUPERCOMPUTER CENTER Coming Soon on Triton
- Data Oasis version 0! We have the hardware on site and are working to set up the Lustre filesystem (~350 TB).
- Upgrade of the entire software stack. Many of the packages in /home/beta will become a permanent part of the stack (we have Rocks rolls for them). This will happen within a month.
- mpiP will be installed soon on Triton.
- PAPI/IPM needs the perfctr kernel patch. We need to integrate this into our stack (not in the current upgrade).