Data Reorganization Interface (DRI)
Kenneth Cain Jr., Mercury Computer Systems, Inc.
High Performance Embedded Computing (HPEC) Conference, September 2003
On behalf of the Data Reorganization Forum
© 2003 Mercury Computer Systems, Inc.

Outline, Acknowledgements
Status update for the DRI-1.0 standard since its September publication.
- DRI Overview
- Highlights of First DRI Demonstration
  - Common Imagery Processor (Brian Sroka, MITRE)
- Vendor Status
  - Mercury Computer Systems, Inc.
  - MPI Software Technology, Inc. (Anthony Skjellum)
  - SKY Computers, Inc. (Stephen Paavola)

What is DRI?
- Partition for data-parallel processing (partitioning arithmetic sketched after this slide)
  - Divide a multi-dimensional dataset across processes
  - Whole, block, and block-cyclic partitioning
  - Overlapped data elements in the partitioning
  - Process group topology specification
- Redistribute data to the next processing stage
  - Multi-point data transfer with a single function call
  - Multi-buffered to enable communication/computation overlap
  - Planned transfers for higher performance
A standard API that complements existing communication middleware.
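The partitioning bullets above describe arithmetic that is easy to picture in code. Below is a minimal, self-contained C sketch, not DRI code: the helper name and its parameters are hypothetical, and it shows only how one extent of a dataset could be block-partitioned across processes and then widened by an overlap, as in the APG-73 reorganizations later in this deck.

#include <stdio.h>

/* Hypothetical helper, for illustration only (not part of DRI-1.0):
 * compute the [first, last] global indices held by process `rank` when
 * `global_len` elements are block-partitioned over `nprocs` processes,
 * then widened by `overlap` elements on each side (clamped at the edges). */
static void block_with_overlap(int global_len, int nprocs, int rank,
                               int overlap, int *first, int *last)
{
    int base = global_len / nprocs;      /* minimum block size */
    int rem  = global_len % nprocs;      /* first `rem` ranks get one extra element */
    int lo   = rank * base + (rank < rem ? rank : rem);
    int hi   = lo + base + (rank < rem ? 1 : 0) - 1;

    lo -= overlap;                       /* widen for overlapped data elements */
    hi += overlap;
    if (lo < 0) lo = 0;
    if (hi > global_len - 1) hi = global_len - 1;
    *first = lo;
    *last  = hi;
}

int main(void)
{
    /* Example (assumed sizes): 1024 range cells over 4 processes, 7-point overlap. */
    for (int rank = 0; rank < 4; ++rank) {
        int first, last;
        block_with_overlap(1024, 4, rank, 7, &first, &last);
        printf("rank %d holds global indices %d..%d\n", rank, first, last);
    }
    return 0;
}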

First DRI-Based Demonstration: Common Imagery Processor (CIP)
Conducted by Brian Sroka of The MITRE Corporation

CIP and APG-73 Background
CIP:
- The primary sensor processing element of the Common Imagery Ground/Surface System (CIGSS)
- Processes imagery data into exploitable images; outputs to other CIGSS elements
- A hardware-independent software architecture supporting multi-sensor processing capabilities
- Prime contractor: Northrop Grumman, Electronic Sensor Systems Sector
- Enhancements directed by the CIP Cross-Service IPT, Wright-Patterson AFB
APG-73:
- SAR component of the F/A-18 Advanced Tactical Airborne Reconnaissance System (ATARS)
- Imagery from airborne platforms is sent to the TEG via Common Data Link
Source: "HPEC-SI Demonstration: Common Imagery Processor (CIP) Demonstration -- APG73 Image Formation", Brian Sroka, The MITRE Corporation, HPEC 2002

APG-73 Data Reorganization (1)
[Diagram: block data distribution across processes P0 .. PN on each side of the reorganization; axes are range and cross-range; destination blocks are mod-16 in range]
Source partitioning:
- No overlap
- Blocks of cross-range
- Full range
Destination partitioning:
- No overlap
- Full cross-range
- Mod-16 length blocks of range
Source: "CIP APG-73 Demonstration: Lessons Learned", Brian Sroka, The MITRE Corporation, March 2003 HPEC-SI meeting
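For readers who know MPI, the sketch below shows what this reorganization (a cornerturn) amounts to when written by hand: each process packs a sub-block for every other process and a single collective moves the pieces. This is plain MPI, not DRI, and the dataset size and even divisibility by the process count are assumptions; DRI expresses the same movement as one planned transfer between two distribution descriptions.

#include <mpi.h>
#include <stdlib.h>

/* Hand-written cornerturn with MPI_Alltoall, for comparison with DRI.
 * Source: each rank owns a block of cross-range rows with the full range.
 * Destination: each rank owns the full cross-range with a block of range columns. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int M = 512, N = 512;      /* assumed global cross-range x range size */
    const int mloc = M / nprocs;     /* source rows per rank (assumes divisibility) */
    const int nloc = N / nprocs;     /* destination columns per rank */

    float *src  = malloc((size_t)mloc * N * sizeof *src);   /* mloc x N, row-major */
    float *dst  = malloc((size_t)M * nloc * sizeof *dst);   /* M x nloc, row-major */
    float *sbuf = malloc((size_t)mloc * N * sizeof *sbuf);  /* packed send buffer */

    for (int i = 0; i < mloc * N; ++i)
        src[i] = (float)(rank * mloc * N + i);               /* arbitrary test data */

    /* Pack: chunk p holds my rows restricted to rank p's destination column block. */
    for (int p = 0; p < nprocs; ++p)
        for (int i = 0; i < mloc; ++i)
            for (int j = 0; j < nloc; ++j)
                sbuf[(p * mloc + i) * nloc + j] = src[i * N + p * nloc + j];

    /* One collective moves a piece from every rank to every rank.  With this
     * row-major layout, the chunk received from rank p lands exactly where
     * global rows p*mloc .. (p+1)*mloc-1 belong in dst, so no unpack loop is needed. */
    MPI_Alltoall(sbuf, mloc * nloc, MPI_FLOAT,
                 dst,  mloc * nloc, MPI_FLOAT, MPI_COMM_WORLD);

    free(src); free(dst); free(sbuf);
    MPI_Finalize();
    return 0;
}

The packing loops and count bookkeeping above are exactly the kind of hand-written communication code that the conclusions slide later estimates at roughly 6x more SLOCs than the DRI equivalent.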

APG-73 Data Reorganization (2)
[Diagram: block data distribution (source) and block-with-overlap distribution (destination) across processes P0 .. PN; axes are range and cross-range; source blocks are mod-16 in range]
Source partitioning:
- No overlap
- Full cross-range
- Mod-16 length blocks of range cells
Destination partitioning:
- 7 points right overlap
- Full cross-range
- Blocked portion of range cells
Source: "CIP APG-73 Demonstration: Lessons Learned", Brian Sroka, The MITRE Corporation, March 2003 HPEC-SI meeting
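At the communication level, the overlap exchange in this reorganization is a halo exchange. The fragment below is plain MPI, not DRI, and the block size and single-row simplification are assumptions; it shows the pattern behind the 7-point right overlap, where each rank receives the first 7 range cells of its right neighbour into the tail of its own block.

#include <mpi.h>
#include <string.h>

#define NLOC    256     /* range cells owned per process (assumed) */
#define OVERLAP 7       /* right overlap in range cells */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    float cells[NLOC + OVERLAP];                 /* owned cells plus right halo */
    memset(cells, 0, sizeof cells);

    int left  = (rank == 0)          ? MPI_PROC_NULL : rank - 1;
    int right = (rank == nprocs - 1) ? MPI_PROC_NULL : rank + 1;

    /* Send my first OVERLAP cells to my left neighbour (they form its right
     * overlap); receive my right neighbour's first OVERLAP cells into the
     * halo region just past my owned block. */
    MPI_Sendrecv(cells,        OVERLAP, MPI_FLOAT, left,  0,
                 cells + NLOC, OVERLAP, MPI_FLOAT, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}

In DRI, the same effect is obtained by declaring the overlapped destination partitioning; the library performs the neighbour exchange as part of the planned transfer.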

DRI Use in CIP APG-73 SAR
Simple transition to DRI:
- Shared memory: a #pragma splits the loop over global data among threads
- DRI: loop over local data (sketched after this slide)
Processing chain: range compression, inverse weighting for azimuth compression, inverse weighting for side-lobe clutter removal, amplitude detection, image data compression
Data reorganizations: DRI-1 (cornerturn), DRI-2 (overlap exchange)
DRI implementations used:
- MITRE DRI over MPI* + application (demonstration completed)
- Mercury PAS/DRI + application (demonstration underway)
- SKY MPICH/DRI + application (demonstration underway)
* MPI/Pro (MSTI) and MPICH demonstrated
Source: "HPEC-SI Demonstration: Common Imagery Processor (CIP) Demonstration -- APG73 Image Formation", Brian Sroka, The MITRE Corporation, HPEC 2002 workshop
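A minimal sketch of the "simple transition" point above, assuming the processing step is a per-element operation: the shared-memory version splits a loop over the global data with a pragma, while the DRI version runs the same loop body over the local extent only. The local_first/local_last parameters stand in for whatever the implementation's block description (e.g. a DRI_Blockinfo query) reports; they are placeholders, not the actual DRI-1.0 API.

/* Shared-memory version: a thread team splits the loop over the
 * global dataset (the #pragma the slide refers to). */
void process_shared(float *global_data, int n_global)
{
    #pragma omp parallel for
    for (int i = 0; i < n_global; ++i)
        global_data[i] *= 2.0f;          /* stand-in for the real processing step */
}

/* DRI-style version: the same loop body, but over the local extent only.
 * local_first/local_last are hypothetical values a block description
 * (e.g. DRI_Blockinfo) would report for this process's partition. */
void process_dri(float *local_data, int local_first, int local_last)
{
    for (int i = 0; i <= local_last - local_first; ++i)
        local_data[i] *= 2.0f;           /* identical processing, local indices */
}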

Portability: SLOC Comparison
Relative to the sequential VSIPL baseline:
- Shared-memory VSIPL: ~1% SLOC increase
- DRI VSIPL: ~5% SLOC increase
The 5% SLOC increase for DRI includes code for:
- 2 scatter/gather reorgs
- 3 cornerturn data reorg cases*
- 3 overlap exchange data reorg cases*
- Managing interoperation between the DRI and VSIPL libraries
* 1 for the interleaved-complex + 2 for the split-complex data format
Using DRI requires much less source code than a manual distributed-memory implementation.
Source: "HPEC-SI Demonstration: Common Imagery Processor (CIP) Demonstration -- APG73 Image Formation", Brian Sroka, The MITRE Corporation, HPEC 2002 workshop

CIP APG-73 DRI Conclusions
- Applying DRI to operational software does not greatly affect software lines of code
- DRI greatly reduces the complexity of developing portable distributed-memory software (and makes the shared-memory transition easy)
- Communication code written with DRI is estimated to be 6x smaller in SLOCs than a manual MPI implementation
- No code changed to retarget the application (MITRE DRI on MPI)
- Features missing from DRI (future needs):
  - Split complex
  - Dynamic (changing) distributions
  - Round-robin distributions
  - Piecemeal data production/consumption
  - Non-CPU endpoints
Sources: "HPEC-SI Demonstration: Common Imagery Processor (CIP) Demonstration -- APG73 Image Formation", Brian Sroka, The MITRE Corporation, HPEC 2002; "High Performance Embedded Computing Software Initiative (HPEC-SI)", Dr. Jeremy Kepner, MIT Lincoln Laboratory, June 2003 HPEC-SI meeting

Vendor DRI Status
- Mercury Computer Systems, Inc.
- MPI Software Technology, Inc.
- SKY Computers, Inc.

Mercury Computer Systems (1/2)
- DRI commercially available in PAS (July 2003)
  - Parallel Acceleration System (PAS) middleware product
  - DRI interface to existing PAS features
  - The vast majority of DRI-1.0 is supported; not yet supported: block-cyclic, toroidal, some replication
- Additional PAS features are compatible with DRI
  - Optional: applications can use the PAS and DRI APIs together
- Applications can use MPI and PAS/DRI
  - Example: independent use of the PAS/DRI and MPI libraries by the same application is possible (the libraries are not integrated)

Mercury Computer Systems (2/2): Built on Existing PAS Performance
- DRI adds no significant overhead: DRI achieves PAS performance
- PAS communication features:
  - User-driven buffering and synchronization
  - Dynamically changing transfer attributes
  - Dynamic process sets
  - I/O or memory device integration
  - Transfer of only a region of interest (ROI)
[Diagram: hybrid use of the PAS and DRI APIs through the standard DRI_Distribution and DRI_Blockinfo objects, built on existing PAS performance]

MPI Software Technology, Inc.
- MPI Software Technology released its ChaMPIon/Pro (MPI-2.1) product this spring
- Work is now going on to provide DRI "in MPI clothing" as an add-on to ChaMPIon/Pro
- Confirmed targets:
  - Linux clusters with TCP/IP, Myrinet, InfiniBand
  - Mercury RACE/RapidIO multicomputers
- Access to early adopters: 1Q04
- More info available from Tony Skjellum

SKY Computers, Inc. (1/2): Initial Implementation
- Experimental version implemented for SKYchannel
- Integrated with MPI
- Achieving excellent performance for system sizes at least through 128 processors

SKY Computers, Inc. (2/2): SKY's Plans
- Fully supported implementation with SMARTpac
- Part of SKY's plans for standards compliance
- Included with the MPI library
- Optimized InfiniBand performance