2Minor wording correction In our semantics and correctness section, for the SEPARATE memory model, we state that a PUT/ACCUMULATE must not access a target window once a local update or PUT/GET to an overlapping window has started.Pg. 455, lines 32-39Recommendation: we change this to "... to the same or an overlapping window has started", since it obviously applies to the same window as well, and not just overlapping windows.
3Minor wording correction (contd.) The standard currently states, "A put or accumulate must not access a target window once a load/store update or a put ... ".The term "load/store update" has caused a major confusion even to the RMA working group, for example on the slide that we put together for compatibility between load, store, put, get, accumulate, etc.Recommendation: It should really say "load/store accesses" -- that is, simultaneous loads and PUTs are not allowed in the SEPARATE model.
4MPI_MODE_NOLOAD We currently have an assertion for MPI_MODE_NOSTORE Recommendation: Add MPI_MODE_NOLOADOptimization for the SEPARATE memory modelLogically, when the epoch starts, the public window is copied into the private window; when the epoch ends, the private window is copied into the public window (will be the actual implementation for software managed caches)MPI_MODE_NOSTORE allows the MPI implementation to ignore the copy at the end of the epochMPI_MODE_NOLOAD + MPI_MODE_NOSTORE will allow the MPI implementation to ignore the copy at the start of the epochAlternatives: we actually only need an MPI_MODE_NOLOADSTORE, but MPI_MODE_NOLOAD is cleaner and symmetric
5Info argument for req_mem_model Currently, the user does not request for a specific memory model – the MPI implementation has to decide this on its ownIf the hardware does not provide cache-coherence, the implementation is expected to return MPI_WIN_SEPARATEIf the hardware does provide cache-coherence, the memory model provided is not straight-forward:MPI_WIN_UNIFIED will make load/store inter-leaving with PUT/GET more straightforward for non-overlapping memory regionsMPI_WIN_SEPARATE will allow the MPI implementation to maintain a separate public window region (e.g., it can be allocated in symmetric address space or shared memory space)MPI_WIN_SEPARATE might be sufficient for the application if it doesn’t plan to do load/store operationsRecommendation: add an info argument req_mem_model to window creation and define UNIFIED > SEPARATE (provided value can already be queried as an attribute)
6MPI_WIN_SYNC clarification The current definition of MPI_WIN_SYNC states that it synchronizes the public and private memories of a windowThis leads us to believe that this is not required for the UNIFIED memory modelConsider this example:P P1MPI_Win_lockall MPI_Win_lockallMPI_Put(P0, X, 1);MPI_Win_flush();MPI_Barrier MPI_BarrierY = X;P0 cannot do a load operation from “X”, unless all MPI operations (MPI_Barrier in this example) do a memory barrierRecommendation: Clarify WIN_SYNC to say that it is still required in the UNIFIED memory model as well, for overlapping memory accesses
7Type-safe MPI WindowsThe disp_unit is a weird semantic which is really meant to demonstrate what datatype the application will be usingCausing confusion with the “same_size” and “same_disp_unit” hints which make little sense on heterogeneous systemsProvides MPI with no type-checking capability when the same memory location is accessed as “int” or “float” when they are of the same sizeRecommendation: provide type-safe MPI window creation routinesMPI_Win_create_type, MPI_Win_allocate_type, MPI_Win_allocate_shared_typeMPI operations restricted to the window base type or datatypes created out of this type (base type can be derived as well)Still backward compatible with the current model, if MPI_BYTE is used
8Load vs. Store distinction in the SEPARATE model The MPI standard does not distinguish load and store operations with respect to window semanticsIn the SEPARATE model, when the epoch is closed, the private memory is copied into the public memoryWhen the user passes the MPI_MODE_NOSTORE assertion, the MPI implementation is not required to do the copy, but might still copyMPI-3 states that if you start doing load/store operations on a window, you cannot do PUT/GET operations within the same epochThis is valid in general, but too restrictive when the MPI implementation is not actually doing the copyRecommendation: allow the application to use more relaxed semantics when it is not doing STORE operationsPossible solution: if we update the window creation prototypes for type-safe windows, we can consider adding a new parameter to define this