MPI3 RMA
William Gropp, Rajeev Thakur

2 MPI-3 RMA
- Presented an overview of some of the issues and constraints at the last meeting
- Homework: read Bonachea's "Problems with using MPI 1.1 and 2.0 as compilation targets for parallel language implementations"
- Called for alternative proposals; several volunteers, but no activity on the mailing lists
  - Note that we're waiting on proposals
- Breakout will be used to sign up members and to identify subgroups to develop proposals
- Possible topics:
  - Address issues in Bonachea's paper (e.g., memory model, address-space accessibility)
  - Exploit different memory models (e.g., cache coherence or other consistency models)
  - Minor updates (e.g., need for put-ack)
  - Extending accumulate to some kind of user-defined operations
  - Extending accumulate to support read-modify-write operations
  - Active messages (and why not with threads/point-to-point?)

3 MPI RMA Breakout

4 Agenda
- Sign up for the MPI3 RMA mailing list!
- Status of efforts launched at the last meeting
- Call for new topics
- Discussion of possible topics

5 Issues Raised in Bonachea's Paper
- Address-space accessibility
- Collective creation
- Potential limitations on passive-target window memory
- Memory model
- Erroneous vs. undefined behavior
- Local updates vs. remote updates
- Passive target allows only one target process; you cannot easily atomically update memory owned by several processes (see the sketch below)
- Note that MPI's choices were made to reflect hardware realities at the time (some of which still apply, some of which may return)
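To make the single-target constraint concrete, here is a minimal sketch in MPI-2 style; the helper update_two_targets and its arguments are hypothetical. Each lock/unlock epoch names exactly one target rank, so touching two processes' windows takes two separate epochs:

```c
/* Sketch of the MPI-2 passive-target limitation: each lock/unlock epoch
 * names exactly one target rank.  Assumes `win` was created collectively
 * and `buf` points to a local int. */
#include <mpi.h>

void update_two_targets(MPI_Win win, int *buf)
{
    /* Epoch 1: update rank 0's window memory. */
    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 0, 0, win);
    MPI_Put(buf, 1, MPI_INT, 0, 0, 1, MPI_INT, win);
    MPI_Win_unlock(0, win);

    /* Epoch 2: update rank 1's window memory.  A third process can
     * observe the state between the two epochs, so the pair of updates
     * is not atomic. */
    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, win);
    MPI_Put(buf, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
    MPI_Win_unlock(1, win);
}
```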

6 Exploit different memory models
- Specialization for cache coherence
- Specialization for release consistency, etc.
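For context, this is roughly where MPI-3 eventually landed: windows carry a memory-model attribute that distinguishes a separate (non-coherent) model from a unified (cache-coherent) one. A forward-looking sketch using that MPI-3 attribute, which postdates these slides:

```c
/* Sketch: querying a window's memory model (an MPI-3 facility that
 * postdates these slides).  MPI_WIN_SEPARATE means distinct public and
 * private window copies; MPI_WIN_UNIFIED means one copy, as on
 * cache-coherent hardware. */
#include <mpi.h>
#include <stdio.h>

void report_model(MPI_Win win)
{
    int *model, flag;
    MPI_Win_get_attr(win, MPI_WIN_MODEL, &model, &flag);
    if (flag && *model == MPI_WIN_UNIFIED)
        printf("unified (cache-coherent) memory model\n");
    else
        printf("separate memory model\n");
}
```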

7 Minor updates
- Option to turn off the required ack for put
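For background, MPI-2's existing knob for shaving synchronization cost is the fence assert; the option proposed above would go further and let an implementation skip the remote acknowledgment a put otherwise needs before it can be reported complete. A minimal sketch of the existing mechanism (the proposed option itself did not exist yet):

```c
/* Sketch: MPI-2 fence asserts, the existing way to reduce
 * synchronization overhead.  The slide proposes an additional option to
 * drop the put acknowledgment itself. */
#include <mpi.h>

void fenced_put(MPI_Win win, int *buf, int target)
{
    /* Assert that no RMA calls precede this fence. */
    MPI_Win_fence(MPI_MODE_NOPRECEDE, win);

    MPI_Put(buf, 1, MPI_INT, target, 0, 1, MPI_INT, win);

    /* Assert that the local window was not updated by local stores in
     * this epoch and that no RMA calls follow this fence. */
    MPI_Win_fence(MPI_MODE_NOSTORE | MPI_MODE_NOSUCCEED, win);
}
```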

8 Extending accumulate to some kind of user-defined operations
- Homogeneous platforms only?
- Limit to some datatypes (e.g., integers)?
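For reference, MPI-2's MPI_Accumulate accepts only the predefined reduction operations; the open question above is whether an MPI_Op created with MPI_Op_create could be allowed as well, and under what restrictions. A sketch of the status quo:

```c
/* Sketch of the MPI-2 status quo: MPI_Accumulate takes only predefined
 * operations such as MPI_SUM.  Passing a user-defined MPI_Op is
 * erroneous; the slide asks whether that restriction could be relaxed. */
#include <mpi.h>

void add_to_target(MPI_Win win, int *contribution, int target)
{
    MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);
    MPI_Accumulate(contribution, 1, MPI_INT,
                   target, 0, 1, MPI_INT, MPI_SUM, win);
    MPI_Win_unlock(target, win);
}
```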

9 Extending accumulate to support read-modify-write operations
- Fetch and increment
- Compare and swap
- Transactional memory (e.g., update or not, atomically)
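The first two primitives are what MPI-3 ultimately standardized as MPI_Fetch_and_op and MPI_Compare_and_swap; this forward-looking sketch uses those later calls, which did not exist when these slides were written, to illustrate what was being asked for:

```c
/* Sketch of the two read-modify-write primitives named above, using the
 * calls MPI-3 later standardized. */
#include <mpi.h>

/* Atomically add 1 to an int at displacement 0 on `target`, returning
 * the old value: fetch-and-increment. */
int fetch_and_increment(MPI_Win win, int target)
{
    int one = 1, old;
    MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);
    MPI_Fetch_and_op(&one, &old, MPI_INT, target, 0, MPI_SUM, win);
    MPI_Win_unlock(target, win);
    return old;
}

/* Atomically replace the target value with `new_val` if it equals
 * `expected`, returning what was there: compare-and-swap. */
int compare_and_swap(MPI_Win win, int target, int expected, int new_val)
{
    int old;
    MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);
    MPI_Compare_and_swap(&new_val, &expected, &old, MPI_INT,
                         target, 0, win);
    MPI_Win_unlock(target, win);
    return old;
}
```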

10 Active messages
- Why not with threads and point-to-point?
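The question is whether a dedicated thread doing ordinary point-to-point receives already provides active-message behavior. A minimal sketch of that emulation; the handler table, tag, and message layout are all hypothetical, and MPI must be initialized with MPI_THREAD_MULTIPLE:

```c
/* Sketch: emulating active messages with a progress thread and ordinary
 * point-to-point, i.e., the alternative the slide asks about.  The
 * handler_table, HANDLER_TAG, and one-byte handler-id message layout are
 * hypothetical.  Requires MPI_Init_thread with MPI_THREAD_MULTIPLE. */
#include <mpi.h>
#include <pthread.h>

#define HANDLER_TAG 77
#define MAX_MSG     256

typedef void (*am_handler)(const char *payload, int len);
extern am_handler handler_table[];   /* hypothetical: indexed by payload byte 0 */

static void *am_progress_thread(void *arg)
{
    MPI_Comm comm = *(MPI_Comm *)arg;
    char buf[MAX_MSG];
    MPI_Status status;
    for (;;) {
        /* Block for the next incoming "active message". */
        MPI_Recv(buf, MAX_MSG, MPI_CHAR, MPI_ANY_SOURCE,
                 HANDLER_TAG, comm, &status);
        int len;
        MPI_Get_count(&status, MPI_CHAR, &len);
        if (len == 0) break;                    /* empty message = shutdown */
        handler_table[(int)buf[0]](buf + 1, len - 1);
    }
    return NULL;
}
```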