Runtime Support for Irregular Computations in MPI-Based Applications - CCGrid 2015 Doctoral Symposium - Xin Zhao *, Pavan Balaji † (Co-advisor), William Gropp * (Advisor)

Presentation transcript:

Runtime Support for Irregular Computations in MPI-Based Applications
CCGrid 2015 Doctoral Symposium
Xin Zhao * , Pavan Balaji † (Co-advisor), William Gropp * (Advisor)
* University of Illinois at Urbana-Champaign
† Argonne National Laboratory

Irregular Applications
- "Traditional" applications
  - Organized around regular data structures: dense vectors or matrices
  - Regular data movement patterns; use MPI SEND/RECV or collectives
- Irregular applications
  - Organized around graphs and sparse vectors; more "data driven" in nature
  - Data movement pattern is irregular and data-dependent
- Research goal
  - Answer the question: where does MPI lie on the "spectrum of suitability", ranging from completely suitable to not suitable at all?
  - Propose what, if anything, needs to change to efficiently support irregular applications
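To make the contrast concrete, here is a minimal C sketch (not taken from the slides; the array size, neighbor choice, and random "edge" targets are illustrative placeholders) that puts a regular fixed-partner exchange next to a data-dependent access pattern expressed with MPI-3 one-sided operations:

/* Illustrative sketch: regular vs. data-dependent MPI communication. */
#include <mpi.h>
#include <stdlib.h>

#define N 1024

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Regular pattern: fixed-size exchange with fixed neighbors. */
    double sendbuf[N], recvbuf[N];
    for (int i = 0; i < N; i++) sendbuf[i] = (double)rank;
    int right = (rank + 1) % nprocs;
    int left  = (rank + nprocs - 1) % nprocs;
    MPI_Sendrecv(sendbuf, N, MPI_DOUBLE, right, 0,
                 recvbuf, N, MPI_DOUBLE, left, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* Irregular pattern: which element is read from which process is only
     * known at run time (e.g., while traversing graph edges), so targets
     * cannot pre-post matching receives; one-sided MPI_Get avoids that
     * coordination, but each access is a small, fine-grained operation. */
    double *base;
    MPI_Win win;
    MPI_Win_allocate(N * sizeof(double), sizeof(double), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &base, &win);
    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, rank, 0, win);
    for (int i = 0; i < N; i++) base[i] = (double)rank;  /* init own window */
    MPI_Win_unlock(rank, win);
    MPI_Barrier(MPI_COMM_WORLD);        /* all windows initialized */

    MPI_Win_lock_all(0, win);
    for (int e = 0; e < 16; e++) {      /* 16 "edges" as a stand-in */
        int target    = rand() % nprocs;    /* data-dependent target */
        MPI_Aint disp = rand() % N;         /* data-dependent offset */
        double val;
        MPI_Get(&val, 1, MPI_DOUBLE, target, disp, 1, MPI_DOUBLE, win);
        MPI_Win_flush(target, win);         /* complete this single get */
    }
    MPI_Win_unlock_all(win);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}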

Main Concerns of MPI with Irregular Applications
- Scalability
  - Can the MPI runtime remain scalable when running irregular applications with large problem sizes and at large scale?
- Performance of fine-grained operations
  - Can the MPI runtime be lightweight enough to handle the massive numbers of fine-grained data movements commonly used in irregular applications?
- MPI communication semantics
  - Can the MPI library absorb a mechanism for integrating data movement with computation?
[Figure: two-sided communication (rank 0 / rank 1, SEND / RECEIVE) contrasted with integrating data and computation across nodes]
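As a concrete illustration of the last two concerns, the following C sketch (names and the clamp operation are illustrative, not from the presentation) applies a user-defined update to remote data with plain MPI-3 RMA. Because MPI_Accumulate supports only predefined reduction operations, the update costs a get, origin-side compute, and put per element, i.e., several fine-grained messages; an active-message style mechanism would instead run the computation at the target.

/* Illustrative sketch: get/compute/put round trips for a user-defined
 * remote update that MPI_Accumulate cannot express directly. */
#include <mpi.h>
#include <stdio.h>

/* A user-defined update (clamped addition). */
static double clamp_add(double old, double delta)
{
    double v = old + delta;
    return v > 1.0 ? 1.0 : v;
}

/* Get-modify-put under an exclusive lock: atomic with respect to other
 * updaters, but two network round trips and all computation at the origin. */
static void remote_clamp_add(MPI_Win win, int target, MPI_Aint disp, double delta)
{
    double old, updated;

    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, target, 0, win);
    MPI_Get(&old, 1, MPI_DOUBLE, target, disp, 1, MPI_DOUBLE, win);
    MPI_Win_flush(target, win);                 /* round trip 1: fetch */
    updated = clamp_add(old, delta);            /* compute at the origin */
    MPI_Put(&updated, 1, MPI_DOUBLE, target, disp, 1, MPI_DOUBLE, win);
    MPI_Win_unlock(target, win);                /* round trip 2: write back */
}

int main(int argc, char **argv)
{
    int rank;
    double *base;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Win_allocate(sizeof(double), sizeof(double), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &base, &win);

    /* Initialize our own window cell and make it publicly visible. */
    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, rank, 0, win);
    *base = 0.0;
    MPI_Win_unlock(rank, win);
    MPI_Barrier(MPI_COMM_WORLD);

    /* Every rank applies one clamped update to rank 0's cell. */
    remote_clamp_add(win, 0, 0, 0.4);

    MPI_Barrier(MPI_COMM_WORLD);
    if (rank == 0) {
        MPI_Win_lock(MPI_LOCK_SHARED, 0, 0, win);
        printf("clamped sum at rank 0: %f\n", *base);
        MPI_Win_unlock(0, win);
    }
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}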

Plan of Study
[Figure: MPI-AM workflow - AM input/output data, origin and target input/output buffers, target persistent buffer, RMA window, private memory, AM handler]
- Integrated data and computation management
  - Generalized MPI-interoperable Active Messages framework (MPI-AM), compatible with MPI-3
  - Optimizing MPI-AM for different application scenarios: correctness semantics, streaming AMs, buffer management
  - Asynchronous processing in MPI-AM
- Addressing scalability and performance limitations in massive asynchronous communication
  - Tackling scalability challenges in the MPI runtime: scalable resource management, scalable and sustainable resource supply, tradeoff between scalability and performance (MPICH ran out of memory even at small scale)
  - Optimizing the MPI runtime for fine-grained operations: support for hardware-based RMA operations, algorithmic choices for RMA synchronization
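To show how the buffers in the workflow figure fit together, here is a purely hypothetical C sketch of what a handler-based MPI-AM style interface might look like at the application level. MPIX_Am_register and MPIX_Am are placeholder names invented for this illustration only; they are not the interface proposed in the cited papers nor provided by any MPI implementation.

/* Hypothetical sketch of a handler-based active-message interface. */
#include <mpi.h>
#include <stddef.h>

/* User-defined AM handler: runs at the target, consumes the shipped
 * origin input data, reads/updates the target buffer exposed through the
 * RMA window, and optionally produces output data returned to the origin
 * (mirroring the origin/target input and output buffers in the figure). */
typedef void (*am_handler_t)(const void *origin_in, size_t in_len,
                             void *target_buf, size_t target_len,
                             void *target_out, size_t out_len);

/* Placeholder prototypes for the hypothetical interface. */
int MPIX_Am_register(am_handler_t handler, int *handler_id);
int MPIX_Am(const void *origin_in, size_t in_len,
            void *origin_out, size_t out_len,
            int target_rank, MPI_Aint target_disp,
            int handler_id, MPI_Win win);

/* Example handler: add a shipped value into the target's window cell and
 * return the updated value, replacing the get/compute/put round trips of
 * plain RMA with a single message plus computation at the target. */
void add_and_return(const void *origin_in, size_t in_len,
                    void *target_buf, size_t target_len,
                    void *target_out, size_t out_len)
{
    double delta = *(const double *)origin_in;
    double *cell = (double *)target_buf;
    *cell += delta;
    if (target_out != NULL && out_len >= sizeof(double))
        *(double *)target_out = *cell;      /* ship the result back */
    (void)in_len; (void)target_len;
}

/* Origin-side usage (hypothetical):
 *   int hid;
 *   MPIX_Am_register(add_and_return, &hid);
 *   double delta = 0.5, result;
 *   MPIX_Am(&delta, sizeof delta, &result, sizeof result,
 *           target_rank, target_disp, hid, win);
 */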

Thanks!
- [In progress, PPoPP'16] Addressing Scalability Limitations in MPI RMA Infrastructure. Xin Zhao, Pavan Balaji, William Gropp.
- [SC'14] Nonblocking Epochs in MPI One-Sided Communication. Judicael Zounmevo, Xin Zhao, Pavan Balaji, William Gropp, Ahmad Afsahi. Best Paper Finalist.
- [EuroMPI'12] Adaptive Strategy for One-sided Communication in MPICH2. Xin Zhao, Gopalakrishnan Santhanaraman, William Gropp.
- [EuroMPI'11] Scalable Memory Use in MPI: A Case Study with MPICH2. David Goodell, William Gropp, Xin Zhao, Rajeev Thakur.
- [ICPADS'13] MPI-Interoperable Generalized Active Messages. Xin Zhao, Pavan Balaji, William Gropp, Rajeev Thakur.
- [ScalCom'13] Optimization Strategies for MPI-Interoperable Active Messages. Xin Zhao, Pavan Balaji, William Gropp, Rajeev Thakur. Best Paper Award.
- [CCGrid'13] Towards Asynchronous and MPI-Interoperable Active Messages. Xin Zhao, Darius Buntinas, Judicael Zounmevo, James Dinan, David Goodell, Pavan Balaji, Rajeev Thakur, Ahmad Afsahi, William Gropp.