Support for Debugging Automatically Parallelized Programs
Robert Hood, Gabriele Jost
CSC/MRJ Technology Solutions, NASA Ames Research Center

Background
- Computationally intensive applications: Fortran, C/C++.
- Migration of codes to parallel computers.
- Shared memory parallelization:
  - Multithreading.
  - Compiler support via directives.
- Distributed memory parallelization:
  - Requires explicit message passing, e.g. MPI.
- Desire to generate message passing codes from existing sequential programs.

The CAPTools Parallelization Support Tool
- Developed at the University of Greenwich.
- Transforms existing sequential Fortran code into parallel message passing code, with user guidance:
  - Extensive dependence analysis across statements, loop iterations, and subroutine calls.
  - Partitioning of array data.
  - Generation of necessary calls to communication routines.

Sequential input:

    program Laplace
    real u(100), v(100)
    ...
    do i = 2, 99
       u(i) = 0.5 * (v(i-1) + v(i+1))
    end do
    ...

Generated message passing code:

    program PARALLELlaplace
    real u(100), v(100)
    ...
    CALL CAP_EXCHANGE(v, CAP_RIGHT, ...)
    CALL CAP_EXCHANGE(v, CAP_LEFT, ...)
    do i = CAP_LOW, CAP_HIGH
       u(i) = 0.5 * (v(i-1) + v(i+1))
    end do
    ...

Possible sources of errors:
- Wrong user information.
- The tool makes a mistake.
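In the generated code, CAP_EXCHANGE performs a halo exchange: each process sends its boundary elements of v to its neighbors before executing the stencil loop over its owned index range. As a rough illustration of what such an exchange reduces to, here is a hedged sketch in plain MPI; the routine name, argument list, and halo layout are assumptions, not the actual CAPTools runtime interface.

    ! Illustrative sketch only, not the CAPTools runtime.
    ! v(0:n+1) includes one halo cell at each end; low:high is the
    ! locally owned index range; left/right are neighbor ranks
    ! (MPI_PROC_NULL at the domain boundary, making the transfer a no-op).
    subroutine halo_exchange(v, n, low, high, left, right)
      include 'mpif.h'
      integer n, low, high, left, right, ierr
      integer status(MPI_STATUS_SIZE)
      real v(0:n+1)
      ! Send the rightmost owned element right; fill the left halo cell.
      call MPI_Sendrecv(v(high), 1, MPI_REAL, right, 0, &
                        v(low-1), 1, MPI_REAL, left, 0, &
                        MPI_COMM_WORLD, status, ierr)
      ! Send the leftmost owned element left; fill the right halo cell.
      call MPI_Sendrecv(v(low), 1, MPI_REAL, left, 1, &
                        v(high+1), 1, MPI_REAL, right, 1, &
                        MPI_COMM_WORLD, status, ierr)
    end subroutine halo_exchange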

Relative Debugging
- P1: a version of a program that produces correct results.
- P2: a version of the same program that produces incorrect results.
- Relative debugging: compare data between P1 and P2 to locate the error.
  - P1 and P2 can run on different machines, e.g., a sequential and a parallel architecture.
- Applies directly to our situation.

(Diagram: data from P1 is compared against the parallel processes P2_1 through P2_4.)
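Comparison between the two versions can be done cheaply with a checksum before falling back to element-by-element comparison; both strategies are used by the prototype's comparison routines, described below. A minimal, purely illustrative sketch (the function name is hypothetical):

    ! Illustrative only: a sum-based checksum over the section
    ! low:high of a real array. If the checksums from P1 and P2
    ! agree, the expensive element-by-element comparison can be
    ! skipped. A plain sum can mask compensating differences,
    ! which is why elementwise comparison remains the fallback.
    real function section_checksum(v, low, high)
      integer low, high, i
      real v(*)
      section_checksum = 0.0
      do i = low, high
         section_checksum = section_checksum + v(i)
      end do
    end function section_checksum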

Questions
- What data values should be compared?
- When during execution should they be compared?
- Where should data residing in multiple address spaces be compared?
- How do we decide whether the values are correct?
- How do we handle distributed data?

Main Players in the Prototype: The CAPTools Database
- Provides variable definition information across subroutines, to determine which variables should be checked.
- Provides array distribution information, to determine how distributed data should be compared against undistributed data.

(Diagram: an undistributed array versus replicated-memory and reduced-memory block-wise distributions.)

Example CAPTools information:

    sub1: var1: CAP1_LOW:CAP1_HIGH,1:N
    sub2: var2: 1:M,CAP1_LOW:CAP2_HIGH

Main Players in the Prototype: The Comparison Routines
- The comparison routines are inserted at entry and exit of suspicious routines to bracket the error location.
- compit1, inserted in the sequential program S:
  - Receives data from each processor of the parallel program (P1, P2, ...).
  - Compares the data to its own: checksums, partial checksums, or element-by-element comparison.
  - Calls a special routine if a discrepancy is detected.
- compit2, inserted in the parallel program:
  - Sends local data to the sequential process.

In S:

    subroutine sub1(var1)
    call compit1(var1)
    ...
    call compit1(var1)
    end

In each parallel process P1, P2:

    subroutine sub1(var1)
    call compit2(var1)
    ...
    call compit2(var1)
    end
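A hedged sketch of what this pair of routines might look like. The transport helpers (send_to_serial, recv_from_parallel) and the reporting routine are hypothetical stand-ins for the prototype's actual communication layer, and in practice the owned index range would come from the CAPTools database.

    ! Illustrative sketch only; all helper names are assumptions.

    ! In a parallel process: ship the locally owned section to S.
    subroutine compit2(var, n, low, high)
      integer n, low, high
      real var(n)
      ! send the owned section together with its global offset
      call send_to_serial(var(low), high - low + 1, low)
    end subroutine compit2

    ! In the serial process S: gather each process's section and
    ! compare it against the reference values.
    subroutine compit1(var, n, nprocs)
      integer n, nprocs, p, i, len, low
      real var(n), buf(n)
      do p = 1, nprocs
         call recv_from_parallel(p, buf, len, low)
         do i = 1, len
            ! exact test; a tolerance could allow for roundoff
            if (buf(i) .ne. var(low + i - 1)) then
               ! discrepancy: call the special routine, where the
               ! debugger has planted a breakpoint
               call report_difference(p, low + i - 1)
            end if
         end do
      end do
    end subroutine compit1

Because the calls are placed at both entry and exit of a suspicious routine, an agreeing comparison at entry and a disagreeing one at exit bracket the error to that routine.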

Main Players in the Prototype: Instrumentation Server
- The Instrumentation Server (IS) is based on dyninstAPI, developed at the University of Maryland: a C++ library that provides an API for runtime code patching.
- Permits insertion of calls to the comparison routines into a running program: no need to edit source code, recompile, or relink.
- Provides commands like:
  - attach: attach to a running process.
  - createPoint: create a point where instrumentation can be inserted.
  - insertCall: insert a subroutine call at a given instrumentation point.

Main Players in the Prototype: p2d2
- The p2d2 debugger was developed at NASA Ames Research Center.
- A portable, scalable parallel debugger.
- Client-server architecture based on gdb.
- p2d2 coordinates the actions of the other players and provides the user interface.

A Relative Debugging Session (1)

A Relative Debugging Session (2)

Behind the Scenes
(Diagram sequence: the p2d2 client drives gdb servers attached to the serial process S and the parallel processes P1 through P4. The CAPTools database supplies the variables to compare as routine:variable pairs, e.g. copyphi:oldphi5, update:phi4, setupgrid:phi6. A contact information file records each process's executable name, process id, machine name, and port number. The Instrumentation Server inserts compit1 into S and compit2 into the parallel processes. The processes then run outside of debugger control, communicating their data to S; a breakpoint trap fires if an error is detected, and p2d2 indicates to the user that a difference was detected.)

gdb vs. Dyninst
- gdb-based approach: invoke the compit routines at breakpoints.
- Timings were compared while varying:
  - The number of comparisons.
  - The size of the data to be compared.
  - The amount of computation between comparisons.
  - The number of processes in the parallel execution.
- Result summary:
  - The Dyninst-based approach is advantageous for small amounts of data to be compared and larger numbers of processes.
  - The gdb-based approach is more stable and provides more flexibility.

Example: insert a call to sub1(arg2, arg3) at entry to subroutine sub2(arg1, arg2, arg3) in the running process with pid 9999.

gdb approach:

    attach a.out 9999
    break sub2
    commands
    silent
    call sub1(arg2, arg3)
    continue
    end

IS approach:

    attach a.out 9999
    createPoint 9999 sub2
    insertCall 9999 sub1 2 3

The insertCall arguments 2 and 3 name the parameters of sub2 (arg2 and arg3) to pass to sub1. With gdb, the inserted call runs inside a breakpoint handler every time sub2 is reached; the IS patches the call directly into the running code, so no trap is taken.

Timing results:
- Case 1: data size small, number of comparisons low, amount of computation between comparisons small.
- Case 2: data size small, number of comparisons high, amount of computation between comparisons small.

Timing results (continued):
- Case 4: data size large, number of comparisons low, amount of computation between comparisons low.
- Case 5: data size large, number of comparisons low, amount of computation between comparisons large.

Related Work: GUARD
- A relative debugger for parallel programs.
- Developed at Griffith University in Brisbane, Australia.
- Does not aim specifically at automatically parallelized programs.
- Provides user commands such as assert and compare for comparisons.
- Provides means for the user to describe array distributions.
- The debugger collects data from both executables and performs the comparison.

Project Status and Future Work
- We have built a prototype of a relative debugging system for comparing serial codes against their tool-produced counterparts.
  - Runs on SGI Origin under IRIX 6.5.
- We used dynamic instrumentation to minimize comparison overhead.
- Future work:
  - Extend the prototype to handle OpenMP programs.
  - Allow for more types of data distributions.