Debugging and Profiling GMAO Models with Allinea’s DDT/MAP Georgios Britzolakis April 30, 2015.

Today’s Objectives
– Outline the features of Allinea’s DDT and MAP
– Cover the preliminary SSH settings that Allinea needs in order to connect to Discover
    – Configure the SSH settings (review the NCCS primer)
    – Set up the remote installation directory for Allinea
– Demonstrate how to use DDT to debug GMAO models using the remote client
– Give some guidance on how the *.qtf Allinea job files are generated
    – Describe the qtf file structure
    – Show how the *.qtf file differs from the original job file
– Demonstrate how to use MAP to profile GMAO models using the remote client
    – Present some of the features that MAP provides
– Give a real-time example for DDT and MAP
– Questions

Allinea Features: DDT
– Allinea DDT is a straightforward, scalable, graphical debugger capable of debugging a wide variety of scenarios found in today’s development environments. With Allinea DDT it is possible to debug:
    – Single-process and multithreaded software
    – OpenMP
    – Heterogeneous software, such as that written to use GPUs
    – Hybrid codes mixing paradigms, such as MPI + OpenMP or MPI + CUDA
    – Multi-process software of any form, including client-server applications
– Allinea DDT supports many compiled languages found in mainstream and high-performance computing, including:
    – C, C++, and all derivatives of Fortran, including Fortran 90
    – Parallel languages/models, including MPI, UPC, and Fortran 2008 Co-arrays
    – GPU languages such as HMPP, OpenMP accelerators, CUDA, and CUDA Fortran

Allinea Features: MAP
– Allinea MAP is a parallel profiler that aims to be powerful but easy to use, and scalable to hundreds of thousands of processes while maintaining low overhead. Allinea MAP features:
    – A sampling profiler with adaptive sampling rates to keep the volume of collected data under control. Samples are aggregated at all levels to preserve the key features of a run without drowning in data.
    – A folding code and stack viewer that lets you drill down to the time spent on individual lines and draw back to see the big picture across bodies of routines
    – Both interactive and batch modes for gathering profile data
    – A unified job control interface shared with Allinea DDT: configure it once and both tools just work, with a familiar, usable interface
    – Metrics that show memory usage, floating-point calculations, and MPI usage throughout a program run and across processes
    – All of the above with just 5% application slowdown, even with thousands of MPI processes

Preliminary Settings to Configure SSH for Allinea
This assumes you have installed Allinea on your local machine.
– Set up your ~/.ssh/config (for Linux/Mac systems) so that it contains the following:
    host discover.nccs.nasa.gov dirac.nccs.nasa.gov dali.nccs.nasa.gov discover
    User <user_id>
    LogLevel Quiet
    ProxyCommand ssh -l <user_id> login.nccs.nasa.gov direct %h
    Protocol 2
    ConnectTimeout 30    (optional)
– Add a “Remote Launch” connection from the Allinea menus:
    – Go to “Remote Launch” -> “Configure” -> “Add”
    – Connection Name: discover
    – Host Name:
    – Remote Installation Directory: /usr/local/allinea/5.0.1/tools (for Allinea v5.0.1)
    – Press OK to add the new connection
– Back on the main menu, select the discover connection so that the connection is established
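As a rough illustration of the SSH settings on this slide, the following Python sketch renders the host block for a given user ID. It assumes that the two garbled placeholders in the transcript stand for the same NCCS user ID; the ID "jdoe" and the function name render_ssh_stanza are hypothetical and are not part of the original slides.

```python
# Sketch: render the ~/.ssh/config host block from the slide for one user ID.
# Assumption: both placeholders in the slide are the same NCCS user ID.
# "jdoe" and render_ssh_stanza are hypothetical names used for illustration.

HOSTS = [
    "discover.nccs.nasa.gov",
    "dirac.nccs.nasa.gov",
    "dali.nccs.nasa.gov",
    "discover",
]


def render_ssh_stanza(user_id: str, connect_timeout: int = 30) -> str:
    """Return the host block as text; the caller decides where to save it."""
    lines = [
        "host " + " ".join(HOSTS),
        f"    User {user_id}",
        "    LogLevel Quiet",
        f"    ProxyCommand ssh -l {user_id} login.nccs.nasa.gov direct %h",
        "    Protocol 2",
        f"    ConnectTimeout {connect_timeout}",
    ]
    return "\n".join(lines) + "\n"


stanza = render_ssh_stanza("jdoe")
print(stanza)
```

The sketch only builds the text and prints it, so an existing ~/.ssh/config is never overwritten; the output can be reviewed and pasted in by hand.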

Set Up DDT with GMAO’s Models
– Build your code using the option BOPT=Og
– Install your experiment test case by running the ldsetup.py (or ldsetup) script
– In addition to the job script, a *.qtf file is generated under the run directory
– Start Allinea and select “discover” under the Remote Launch section (as configured earlier)

Set Up DDT with GMAO’s Models (Cont’d)
On the main window, select “Options”:
– “System”:
    – MPI/UPC Implementation: “Intel MPI”
– “Job Submission”:
    – Submission File: the location of your qtf file (under the run directory)
    – Edit Queue Parameters: set the length of time for which the resources should be requested

Set Up DDT with GMAO’s Models (Cont’d)
On the main screen, go to “Run”:
– Identify the application that is going to be debugged
– Set the Number of Processes, Number of Nodes, and Processes per Node (specify the same number of cores that the original job file uses)
– Check “Submit to Queue”
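The three values entered on the “Run” screen have to be consistent with one another and with the original job file. A minimal Python sketch of that sanity check follows; the function names and the figures (112 processes, 28 processes per node) are hypothetical and do not come from the slides.

```python
# Sketch: check that the Run-screen values are mutually consistent.
# All names and numbers here are hypothetical illustrations.


def nodes_needed(total_processes: int, processes_per_node: int) -> int:
    """Smallest node count that can hold all processes (ceiling division)."""
    if total_processes < 1 or processes_per_node < 1:
        raise ValueError("process counts must be positive")
    return (total_processes + processes_per_node - 1) // processes_per_node


def check_run_settings(processes, nodes, processes_per_node, cores_in_job_file):
    """Return a list of problems; an empty list means the values agree."""
    problems = []
    if processes != cores_in_job_file:
        problems.append(
            f"{processes} processes requested, but the job file uses "
            f"{cores_in_job_file} cores"
        )
    if nodes * processes_per_node < processes:
        problems.append(
            f"{nodes} nodes x {processes_per_node} per node cannot hold "
            f"{processes} processes"
        )
    return problems


print(nodes_needed(112, 28))
print(check_run_settings(112, 4, 28, 112))
print(check_run_settings(112, 3, 28, 96))
```

The first call to check_run_settings returns an empty list because the values agree; the second reports both a core-count mismatch and too few nodes.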

Compare *.qtf Allinea Job File with Regular Job File

Translate Allinea’s Tags to Real Parameters
Header of the qtf:
    # submit: /usr/slurm/bin/sbatch
    # display: /usr/slurm/bin/squeue
    # job regexp: (\d+)
    # cancel: /usr/slurm/bin/scancel JOB_ID_TAG
    # show num_nodes: no
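To make the role of these header lines more concrete, the Python sketch below parses them and shows how the “job regexp” pattern and JOB_ID_TAG could fit together: the pattern pulls the job number out of the submit command’s reply, and that number replaces JOB_ID_TAG in the cancel command. This is an illustration of the idea under that assumption, not Allinea’s actual implementation; the function name parse_header and the job number 123456 are hypothetical.

```python
# Sketch: read the qtf header lines and resolve JOB_ID_TAG.
# This imitates the idea only; it is not Allinea's own code.
import re

QTF_HEADER = r"""# submit: /usr/slurm/bin/sbatch
# display: /usr/slurm/bin/squeue
# job regexp: (\d+)
# cancel: /usr/slurm/bin/scancel JOB_ID_TAG
# show num_nodes: no
"""


def parse_header(text: str) -> dict:
    """Collect '# key: value' comment lines into a dictionary."""
    settings = {}
    for line in text.splitlines():
        match = re.match(r"#\s*([^:]+):\s*(.*)$", line)
        if match:
            settings[match.group(1).strip()] = match.group(2).strip()
    return settings


settings = parse_header(QTF_HEADER)

# sbatch replies in this form; the job number is made up.
sbatch_output = "Submitted batch job 123456"

# The "job regexp" pattern captures the job ID from the reply.
job_id = re.search(settings["job regexp"], sbatch_output).group(1)

# The captured ID takes the place of JOB_ID_TAG in the cancel command.
cancel_command = settings["cancel"].replace("JOB_ID_TAG", job_id)
print(cancel_command)  # prints: /usr/slurm/bin/scancel 123456
```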

Set Up MAP with GMAO’s Models
– The Allinea setup for MAP is similar to the DDT procedure, except that the code needs to be built with the options BOPT=Og APROF=y
– Install your experiment test case by running the ldsetup.py (or ldsetup) script, if this was not already done during the debug process
– The qtf file is the same as the one used in the debug step
– The remaining settings are the same as in the debug procedure

MAP Profile Using LDAS

Example – High Memory Usage Detection

Example – High Floating Point Activity

Real-time Demo with DDT/MAP for LDAS Model
– Demonstrate how to use the debugger (i.e., submit a job, step through the code, view variables)
– Some hints on how to use the profiler:
    – Submit a job
    – Detect hot spots in the code
    – View the appropriate code section for each hot spot