Revoke / Incarnation #s / Matching
Discussion around how to reclaim context IDs (resources that are part of message matching) after an MPI_Comm_revoke
Basic problem: revoke is one-sided and can be called by multiple processes in the communicator
– There is a race between the call to revoke and the point at which all correct processes have updated their local state to revoked
– All processes must have revoked the communicator before its context ID can be reused
Scenario:
– A communicator with correct processes A, B, and C is revoked
– A and B free the revoked communicator and create a new communicator that reuses the old context ID
– C calls revoke on the old communicator: what happens at A and B?
– OR: C sends a message to A or B, which has posted an ANY_SOURCE receive: does it match?
Several solutions were discussed:
1. Incarnation number: an additional number attached to each context ID that becomes part of the matching
2. Group guards: check incoming messages to ensure that the sender is in the group of the communicator
3. Fault-tolerant MPI_Comm_free/create: enhance the create/free algorithms to quiesce context IDs before they are reused
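To make the ANY_SOURCE race concrete, here is a minimal Python sketch of solutions 1 and 2 applied to the A/B/C scenario. Everything in it is a hypothetical model, not MPI API and not any MPI library's internals: the envelope fields, the `incarnation` counter, ranks as global integers, and `None` standing in for MPI_ANY_SOURCE are all assumptions made for illustration. Solution 3 (quiescing in free/create) is not modeled.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

ANY_SOURCE = None  # hypothetical stand-in for MPI_ANY_SOURCE in this sketch


@dataclass(frozen=True)
class Envelope:
    """Hypothetical message envelope (not a real MPI library layout)."""
    context_id: int
    incarnation: int  # assumption: stamped by the sender from its local communicator
    source: int       # global rank of the sender
    tag: int


@dataclass(frozen=True)
class PostedRecv:
    """Hypothetical posted receive on a (possibly reused) context ID."""
    context_id: int
    incarnation: int       # assumption: bumped each time the context ID is reused
    group: FrozenSet[int]  # global ranks in the receiving communicator
    source: Optional[int]  # ANY_SOURCE or a specific global rank
    tag: int


def base_match(r: PostedRecv, e: Envelope) -> bool:
    """Ordinary matching on context ID, source, and tag only."""
    return (e.context_id == r.context_id
            and (r.source is ANY_SOURCE or r.source == e.source)
            and e.tag == r.tag)


def match_with_incarnation(r: PostedRecv, e: Envelope) -> bool:
    """Solution 1: the incarnation number is part of the match."""
    return base_match(r, e) and e.incarnation == r.incarnation


def match_with_group_guard(r: PostedRecv, e: Envelope) -> bool:
    """Solution 2: reject messages whose sender is not in the receiving group."""
    return base_match(r, e) and e.source in r.group


A, B, C = 0, 1, 2
# A and B freed the revoked communicator (incarnation 0) and reused
# context ID 5 for a new communicator that contains only A and B.
recv_at_a = PostedRecv(context_id=5, incarnation=1, group=frozenset({A, B}),
                       source=ANY_SOURCE, tag=7)
stale_from_c = Envelope(context_id=5, incarnation=0, source=C, tag=7)
fresh_from_b = Envelope(context_id=5, incarnation=1, source=B, tag=7)

print(match_with_incarnation(recv_at_a, stale_from_c))  # False
print(match_with_group_guard(recv_at_a, stale_from_c))  # False
print(match_with_incarnation(recv_at_a, fresh_from_b))  # True

# If the new communicator also contained C, the group guard alone
# would accept C's stale message; the incarnation check still rejects it.
recv_with_c = PostedRecv(context_id=5, incarnation=1, group=frozenset({A, B, C}),
                         source=ANY_SOURCE, tag=7)
print(match_with_group_guard(recv_with_c, stale_from_c))  # True
print(match_with_incarnation(recv_with_c, stale_from_c))  # False
```

The last two lines show the trade-off in this model: a group guard only helps when the stale sender is outside the new group, while an incarnation number distinguishes old and new uses of the same context ID regardless of membership, at the cost of all members agreeing on the incarnation value when the context ID is reused.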
RMA Semantics
Pavan raised a concern about the definition of RMA window memory in the context of shared memory windows
It may be impossible to guarantee that only the locations updated in the window are invalid
Suggested weakening the semantics so that the entire window is undefined
Requires further discussion
Shared Memory
What happens if a process with shared memory goes down and another process has posted messages using its shared memory?
– Yes, this is an implementation issue, but is it possible to do anything?