The P-GRADE Visual Parallel Programming Environment
Péter Kacsuk
Laboratory of Parallel and Distributed Systems, MTA SZTAKI Research Institute
Computer and Automation Research Institute, Hungarian Academy of Sciences

Problems of Developing Parallel Programs
(Figure: a workstation cluster connected by a high-speed switch, annotated with the questions "Programming?" and "Observing?")

Our Solution: P-GRADE
- P-GRADE is a parallel programming environment which supports the whole life-cycle of parallel program development
- For non-specialist programmers it provides a complete solution for efficient and easy parallel program development
- Fast reengineering of sequential programs for parallel computers
- Unified graphical support in program design, debugging and performance analysis
- Portability on supercomputers and heterogeneous workstation/PC clusters based on PVM and MPI

Tools of P-GRADE
- GRAPNEL: hybrid parallel programming language
  – graphics to express parallelism
  – C/C++ to describe sequential parts
- GRED: graphical editor
- GRP2C: pre-compiler to (C/C++) + (PVM/MPI)
- DIWIDE: integrated distributed debugger and animation system
- GRM: distributed monitoring system
- PROVE: integrated visualisation tool

Life-cycle of Parallel Program Development and its support in P-GRADE
(Figure: parallel program design with GRAPNEL/GRED produces a GRP file; mapping is given by a user mapping; pre-compilation with GRP2C generates C source code, a cross-reference file and a make file; executables are built with the C compiler and linker against the GRM library and either the PVM library (GRP-PVM) or the MPI library (GRP-MPI); execution is monitored by GRM, which produces a trace file visualised by PROVE; debugging of the GRP file is done with DIWIDE.)

Design Goals of GRAPNEL
- Graphical interface
  – to define all parallel activities
  – strong support for hierarchical design
  – visual abstractions to hide the low-level details of message passing
- C/C++ (or Fortran) to describe the sequential parts
  – strong support for parallelizing sequential applications
  – support for programming in the large
  – no steep learning curve
- GRAPNEL = (C/C++) + graphics

GRAPNEL: GRaphical Process NEt Language
- Programming paradigm: message passing
  – component processes run in parallel and can interact only by means of sending and receiving messages
- Communication model:
  – point-to-point, synchronous/asynchronous
  – collective (e.g. multicast, scatter, reduce)
- Process model:
  – single processes
  – process groups
  – predefined process communication templates
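The transcript contains no source code; purely as a hedged illustration of the point-to-point, synchronous/asynchronous message passing that GRAPNEL expresses graphically (and that would otherwise have to be written by hand), here is a minimal hand-written MPI sketch, with the ranks, tag and payload chosen arbitrarily:

```c
/* Minimal hand-written MPI sketch (not GRP2C output) of point-to-point
 * communication: a synchronous send on rank 0, an asynchronous receive
 * on rank 1.  Run with at least two MPI processes. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, value = 0;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        /* synchronous (blocking) send to process 1 */
        MPI_Ssend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* asynchronous (non-blocking) receive, completed by MPI_Wait */
        MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        /* ...computation could overlap with communication here... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        printf("process 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}
```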

Three layers of GRAPNEL

GRAPNEL
- Hierarchical design levels: graphics used at the application level
  – defines the interprocess communication topology
  – port protocols
  – graphics hides PVM/MPI function calls
- Support for the SPMD programming style
  – predefined communication patterns
  – automatic scaling of parallel programs

Communication Templates
- Pre-defined regular process topologies:
  – process farm
  – pipeline
  – 2D mesh
  – tree
- The user defines:
  – the representative processes
  – the actual size
- Automatic scaling

Mesh Template
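To give a concrete feel for what the 2D mesh template spares the user from writing, the following is a hedged, hand-written MPI sketch of a nearest-neighbour exchange on a 2D process mesh; it is not P-GRADE output, and the grid dimensions and exchanged values are arbitrary assumptions:

```c
/* Hand-written MPI sketch of the neighbour exchange a 2D mesh template
 * abstracts away (illustrative only; not generated by P-GRADE). */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, nprocs, dims[2] = {0, 0}, periods[2] = {0, 0};
    int up, down, left, right;
    double mine, from_up = 0.0, from_left = 0.0;
    MPI_Comm mesh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* factorise the processes into a 2D grid: the "actual size" a user
       would otherwise give to the template */
    MPI_Dims_create(nprocs, 2, dims);
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &mesh);

    MPI_Cart_shift(mesh, 0, 1, &up, &down);     /* vertical neighbours   */
    MPI_Cart_shift(mesh, 1, 1, &left, &right);  /* horizontal neighbours */

    mine = (double)rank;
    MPI_Sendrecv(&mine, 1, MPI_DOUBLE, down, 0,
                 &from_up, 1, MPI_DOUBLE, up, 0, mesh, MPI_STATUS_IGNORE);
    MPI_Sendrecv(&mine, 1, MPI_DOUBLE, right, 1,
                 &from_left, 1, MPI_DOUBLE, left, 1, mesh, MPI_STATUS_IGNORE);

    printf("rank %d: from_up=%g from_left=%g\n", rank, from_up, from_left);

    MPI_Comm_free(&mesh);
    MPI_Finalize();
    return 0;
}
```

At the mesh boundary the shifted neighbour ranks are MPI_PROC_NULL, so the exchanges there are harmless no-ops; with the template the same structure is obtained just by setting the mesh size.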

Tree Template

The process farm parallelisation approach
(Figure: a Master process spawns N slaves with spawn(N), sends work packages with send() and collects results with recv(); Slave1, Slave2, ..., SlaveN all run the same code.)
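As a hedged sketch of what the master side of such a farm looks like in hand-written PVM (the slave task name "slave", the number of slaves and the work packets are assumptions; each slave would simply receive a packet, compute, and send back a result):

```c
/* Hand-written PVM sketch of a process-farm master (illustrative only).
 * Assumes a separately built slave binary installed as "slave". */
#include <stdio.h>
#include <pvm3.h>

#define N_SLAVES 4
#define TAG_WORK 1
#define TAG_RES  2

int main(void)
{
    int tids[N_SLAVES], result;

    pvm_mytid();                                  /* enrol in PVM */
    pvm_spawn("slave", NULL, PvmTaskDefault, "", N_SLAVES, tids);

    for (int i = 0; i < N_SLAVES; i++) {          /* send work packages */
        int work = i;
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&work, 1, 1);
        pvm_send(tids[i], TAG_WORK);
    }
    for (int i = 0; i < N_SLAVES; i++) {          /* collect results */
        pvm_recv(tids[i], TAG_RES);
        pvm_upkint(&result, 1, 1);
        printf("slave %d returned %d\n", i, result);
    }
    pvm_exit();
    return 0;
}
```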

Parallelising the Mandelbrot set computation
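The slide shows the application graphically; as a hedged reminder of the sequential part that a Compute process would carry in its C text blocks, here is a plain-C Mandelbrot iteration kernel (image size, region and iteration limit are arbitrary assumptions):

```c
/* Plain-C Mandelbrot kernel: the kind of sequential code a Compute
 * process contains, independent of how the rows are farmed out. */
#include <stdio.h>

static int mandel_iter(double cr, double ci, int max_iter)
{
    double zr = 0.0, zi = 0.0;
    int k = 0;
    while (k < max_iter && zr * zr + zi * zi <= 4.0) {
        double t = zr * zr - zi * zi + cr;   /* z = z*z + c */
        zi = 2.0 * zr * zi + ci;
        zr = t;
        k++;
    }
    return k;   /* the Draw process would map this count to a colour */
}

int main(void)
{
    /* one scan line over the real axis as a tiny stand-alone demo */
    for (int x = 0; x < 64; x++) {
        double cr = -2.0 + 3.0 * x / 63.0;
        printf("%d ", mandel_iter(cr, 0.0, 256));
    }
    printf("\n");
    return 0;
}
```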

Draw process output

Compute process input

Compute process output

Draw process input

Process Groups
- Hierarchical design (subgraph abstraction)
- Collective communication (group ports):
  – multicast
  – scatter
  – gather
  – reduce
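For readers unfamiliar with these operations, the following hedged MPI sketch shows hand-written equivalents of the four group-port operations (broadcast as multicast, scatter, gather, reduce); the data values and root rank are assumptions, and in P-GRADE this code would be expressed through group ports rather than written by hand:

```c
/* Hand-written MPI sketch of collective operations on a process group
 * (illustrative only; values and root rank are arbitrary). */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int param = (rank == 0) ? 7 : 0;
    MPI_Bcast(&param, 1, MPI_INT, 0, MPI_COMM_WORLD);        /* multicast */

    int *all = NULL, mine = 0, partial, total;
    if (rank == 0) {
        all = malloc(size * sizeof(int));
        for (int i = 0; i < size; i++) all[i] = i + 1;
    }
    MPI_Scatter(all, 1, MPI_INT, &mine, 1, MPI_INT, 0, MPI_COMM_WORLD);

    partial = mine * param;                                   /* local work */

    MPI_Gather(&partial, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Reduce(&partial, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) printf("sum of partial results: %d\n", total);
    free(all);
    MPI_Finalize();
    return 0;
}
```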

GRAPNEL
- Hierarchical design levels:
  – graphics used at the process internal level
  – C/C++ used at the text level
- Synchronous/asynchronous communication
- Programming in the large:
  – any C/C++ library call can be included in text blocks
  – graphical support for object-based programming

GRAPNEL
- Structuring facility by macro graphs

(Figure: GRAPNEL group port communication types, including point-to-point, multicast, scatter, gather, reduce and user-defined (grp_in / grp_out) ports.)

Parallelising the Mandelbrot set computation

GRED Editor
- Supports the creation of all the elements of GRAPNEL
- Drag-and-drop style of drawing
- Cut/copy/paste/move on graphical objects
- Automatic port positioning with minimal lengths and crossings of communication channels

GRED Editor
- Extremely easy and fast construction of the process graph
- Automatic arrangement of the process graph
- Automatic resizing of process windows
- Cut/copy/paste on graphical objects
- Macro graph construction at arbitrarily nested levels
- C/C++ code can be edited with any standard text editor

GRP2C Pre-compiler
- Automatic generation of PVM and MPI calls based on the GRAPNEL graphics
- Automatic code instrumentation for debugging and performance monitoring
(Figure: GRAPNEL graphics + C/C++ are translated by GRP2C into generated C/C++ code containing the PVM/MPI calls.)

Debugging Parallel Programs
(Figure: a workstation cluster connected by a high-speed switch, annotated with the question "Observing?")

Principle of sequential program debugging
- Reproducibility (determinism)
  – for the same input set, the sequential program always delivers the same output set (even if the program is incorrect)
- Technique used: cyclic debugging
  – breakpoints
  – step-by-step execution

Problem of parallel program debugging
- Non-reproducibility (non-determinism)
  – for the same input set, the incorrect parallel program can deliver different output sets
- Cyclic debugging cannot be used:
  – breakpoints
  – step-by-step execution
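A hedged PVM sketch of where such non-determinism typically comes from: a wildcard receive that accepts a message from any sender, so the branch taken can differ between runs with identical input (the task name "sender" and the tag are assumptions):

```c
/* Why the same input can give different runs: a wildcard receive accepts
 * whichever message arrives first.  Illustrative hand-written PVM code;
 * assumes the sender binary is installed as "sender". */
#include <stdio.h>
#include <pvm3.h>

#define TAG_MSG 1

int main(void)
{
    int me = pvm_mytid();                     /* enrol in PVM */
    int parent = pvm_parent();

    if (parent == PvmNoParent) {              /* receiver / master */
        int tids[2], who;
        pvm_spawn("sender", NULL, PvmTaskDefault, "", 2, tids);
        for (int i = 0; i < 2; i++) {
            pvm_recv(-1, TAG_MSG);            /* -1 = accept from ANY task */
            pvm_upkint(&who, 1, 1);
            printf("got message from task %x\n", who);  /* order may vary */
        }
    } else {                                  /* sender */
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&me, 1, 1);
        pvm_send(parent, TAG_MSG);
    }
    pvm_exit();
    return 0;
}
```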

Classification of parallel debuggers
- Parallel running sequential debuggers
- Replayable debuggers:
  – monitor & replay
  – control & replay

DIWIDE Debugger
- Graphical and C/C++ level debug support (breakpoints, variable inspection, etc.)
- Three kinds of step-by-step execution, according to the programmer's demand:
  – instruction by instruction
  – graphical item by graphical item
  – macrostep by macrostep
- Visualisation and animation support

Hierarchical Debugging by DIWIDE

Classification of parallel debuggers
- Parallel running sequential debuggers
- Replayable debuggers:
  – monitor & replay
  – control & replay

Classification of parallel breakpoints
- Local breakpoints
- Global breakpoints
- Individual breakpoints
- Collective breakpoints

Principle of Macrostep Debugging
- Parallel debugging is as easy as debugging traditional sequential programs.
(Figure: execution paths of processes P1 (S1, A1, B1, E1), P2 (S2, A2, B2, C2, D2, E2) and P3 (S3, A3, B3, E3), where Si = Start_i and Ei = End_i.)
Macrosteps and their collective breakpoints:
M0 = {S1 -> A1, S2 -> A2, S3 -> A3}   collective breakpoint: A1 A2 A3
M1 = {A1 -> B1, A2 -> B2, A3 -> B3}   collective breakpoint: B1 B2 B3
M2 = {B1 -> B1, B2 -> C2, B3 -> B3}   collective breakpoint: B1 C2 B3
M3 = {B1 -> B1, C2 -> D2, B3 -> E3}   collective breakpoint: B1 D2 E3
M4 = {B1 -> E1, D2 -> E2}             collective breakpoint: E1 E2

Macrostep Debugging
- Support for systematic debugging to handle the non-deterministic behaviour of parallel applications
- Systematic and automatic generation of execution trees
- Testing parallel programs for all timing conditions
- Replay technique with collective breakpoints

Automatic Deadlock Detection by Macrostep Debugging

Integration of Macrostep Debugging and PROVE

Performance Monitoring and Analysis of Parallel Programs
(Figure: a workstation cluster connected by a high-speed switch, annotated with the question "Observing?")

Visualisation Systems
(Figure: taxonomy. What to visualise? Scientific (data-oriented) visualisation, program visualisation, problem visualisation (algorithm animation). Goal of visualisation? Correctness debugging, performance (debugging) visualisation, combined visualisation.)

Program Visualisation
(Figure: goal of visualisation? Correctness debugging, performance visualisation, combined visualisation. When to visualise? Off-line, on-line, semi-on-line.)

Phases of Performance Visualisation
1. Source code instrumentation (GRAPNEL/GRED)
2. Runtime monitoring (GRM)
3. Visualisation (PROVE)
4. Data analysis (PROVE)

Performance Visualisation: Evaluation Criteria
- Source code instrumentation
- Scalability (data handling)
- Versatility (visualisation)

Source Code Instrumentation
(Figure: criteria: manual or automatic; monitoring modes: individual events, statistics; filtering: selectable program units, on/off facility; click-back facility.)

Scalability
(Figure: data acquisition: turning tracing on/off, filtering; data analysis & display: zooming, filtering, interactive vs. non-interactive; example tools: VISTOP, Nupshot.)

Versatility
(Figure: criteria: interoperation with other tools; different views: event views, statistics views.)

Standalone Performance Analysis Tools VAMPIR Pablo ParaGraph AIMS Paradyn

VAMPIR

Integrated Performance Analysis Tools VISTOP (TOPSYS) PVMVis (EDPEPPS) PROVE (GRADE)

Source Code Instrumentation
(Figure: automatic instrumentation; monitoring modes: individual events, statistics; filtering: selectable program units, on/off facility; click-back facility.)

Source code click-back facility and click-forward facility

Scalability
(Figure: data acquisition: turning tracing on/off, filtering; data analysis & display: zooming, filtering, interactive vs. non-interactive.)

Behaviour Window of PROVE
- Scrolling visualisation windows forwards and backwards
- User-controlled focus on processors, processes and messages
- Zooming and event-filtering facilities

Scalability
(Figure: data acquisition: turning tracing on/off, filtering; data analysis & display: zooming, filtering, interactive vs. non-interactive.)

Filtering in PROVE

Versatility
(Figure: criteria: interoperation with other tools; different views: event views, statistics views.)

PROVE Performance Analyser
- Various views for displaying performance information
- Synchronised multi-window visualisation

PROVE Summary Windows
- Various views for displaying summary information
- Synchronised multi-window visualisation

PROVE Statistics Windows
- Profiling based on counters
- Enables the analysis of very long-running programs
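A hedged sketch of why counter-based profiling scales to very long runs: instead of storing one record per event, the instrumentation only increments per-peer counters, so memory use stays constant regardless of run length (the names count_send and MAX_PEERS and the values are illustrative assumptions, not the GRM API):

```c
/* Counter-based profiling sketch: accumulate statistics instead of
 * writing a trace record per event (illustrative names and values). */
#include <stdio.h>

#define MAX_PEERS 64

static long msg_count[MAX_PEERS];
static long msg_bytes[MAX_PEERS];

/* called from a send wrapper instead of emitting a trace event */
static void count_send(int peer, long bytes)
{
    msg_count[peer]++;
    msg_bytes[peer] += bytes;
}

int main(void)
{
    count_send(1, 1024);          /* simulate a few instrumented sends */
    count_send(1, 2048);
    count_send(2, 512);
    for (int p = 0; p < MAX_PEERS; p++)
        if (msg_count[p])
            printf("peer %d: %ld messages, %ld bytes\n",
                   p, msg_count[p], msg_bytes[p]);
    return 0;
}
```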

Versatility
(Figure: criteria: interoperation with other tools; different views: event views, statistics views; P-GRADE is highlighted.)

The GRM Monitor
- Off-line monitoring (GRADE):
  – stores trace events in a (local or global) storage and
  – makes them available after execution for post-mortem processing
- Semi-on-line monitoring (P-GRADE):
  – stores trace events in a storage but
  – makes them available to the visualisation tool at any time during execution, if the user asks for it
  – interactive usage of PROVE
  – the user can remove already inspected parts of the trace
  – evaluation of long-running programs
  – macrostep debugging in P-GRADE with execution visualisation

Semi-on-line GRM monitor
- Application-level monitor
- Tracing + statistics collection

Trace collection
(Figure: main monitor MM, local monitors LM, Processes 1-3 and the trace file.)
- The buffer of a process fills up (to a certain threshold) and the process notifies its LM.
- The LM notifies the MM.
- The MM asks all LMs to stop the application.
- For each LM, the MM asks the LM to send its trace, receives the trace from the LM, sets the timestamps to a global time and writes the trace into the trace file.
- The MM asks the LMs to continue the application.

Portability: Supported Hardware/Software Platforms
- Workstation clusters:
  – SGI MIPS / IRIX 5.x/6.x (MTA SZTAKI, Univ. of Vienna)
  – Sun UltraSPARC / Solaris 2.x (Univ. of Athens)
  – Intel x86 / Linux (MTA SZTAKI)
- Supercomputers:
  – Hitachi SR2201 / HI-UX/MPP (Polish-Japanese School, Warsaw)
  – Cray T3E / UNICOS (Jülich, Germany)

International Installations
- Current: UK, Austria, Spain, Portugal, Poland, Germany, Slovakia, Greece, Japan, Mexico, USA
- Planned: Australia, Korea

Further Developments
Family of parallel programming environments: P-GRADE, VisualMP, VisualGrid
- checkpointing
- dynamic load balancing
- fault tolerance
- grid resource management
- grid monitoring
- mobile processes

Conclusion
- Current applications in physics
  – efficiency lost due to high-level graphical programming is less than 2%
- Weather forecast application under development
- Download version:
- P-GRADE (Professional GRADE)
  – project with Silicon Graphics Hungary
  – current developments to support SPMD-style programming
  – object-based programming

Thank You... ?