1 Fault Tolerance in MPI
Minaashi Kalyanaraman, Pragya Upreti
CSS 534 Parallel Programming

2 OVERVIEW
Fault tolerance in MPI
Levels of survival in MPI
Approaches to fault tolerance in MPI
Advantages and disadvantages of implementing fault tolerance in MPI
Extending MPI to HARNESS
Why FT-MPI
Implementation
Comparison of MPI and FT-MPI
Performance considerations
Conclusion
Future scope

3 MPI is not fault tolerant! Is that true?
This is a common misconception about MPI. MPI provides considerable flexibility in the handling of errors. FAULT TOLERANCE IS A PROPERTY OF THE MPI PROGRAM!
[Diagram: four processes P1-P4 in MPI_COMM_WORLD. Sends complete with MPI_SUCCESS until process P2 dies; under the default MPI_ERRORS_ARE_FATAL handler, the other processes then detect the error and abort.]

4 Levels of Survival of an MPI Implementation
Level 1 - The MPI implementation automatically recovers from the failure and continues without significant change to its behavior. This is the highest level of survival and the most difficult to implement: the program state of the failed process is retained so that the overall computation can proceed.
Level 2 - The MPI implementation is notified of the problem and is prepared to take corrective action. Example: using intercommunicators.
Level 3 - In case of failure, certain MPI operations, although not all, become invalid. Examples: modifying MPI semantics, extending MPI.
Level 4 - In case of failure, the MPI program can abort and be restarted from a checkpoint. Example: checkpointing.

5 The MPI Standard and Fault Tolerance
Reliable communication: the MPI implementation is responsible for detecting and handling network faults. It can retransmit the message, or inform the application that an error has occurred and let it take its own corrective action.
Error handlers: error handlers are set on communicators with MPI_Comm_set_errhandler. The default is MPI_ERRORS_ARE_FATAL, and it can be changed to MPI_ERRORS_RETURN. Users can also define their own error handlers and attach them to communicators (see the sketches below).
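A minimal C sketch of switching a communicator to MPI_ERRORS_RETURN (standard MPI calls; the payload, destination, and tag are illustrative placeholders):

    /* MPI calls on MPI_COMM_WORLD now return error codes instead of aborting. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    int data = 42;   /* placeholder payload */
    int rc = MPI_Send(&data, 1, MPI_INT, /*dest=*/1, /*tag=*/0, MPI_COMM_WORLD);
    if (rc != MPI_SUCCESS) {
        char msg[MPI_MAX_ERROR_STRING];
        int len;
        MPI_Error_string(rc, msg, &len);   /* translate the code into text */
        fprintf(stderr, "send failed: %s\n", msg);
    }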

6 ERROR HANDLING - CONTINUED
In C++, MPI::ERRORS_THROW_EXCEPTIONS is defined to handle errors by throwing exceptions.
If an error is returned, the standard requires neither that subsequent operations succeed nor that they fail. The standard thus allows implementations to take various approaches to the fault tolerance issue.
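A sketch of the user-defined-handler route mentioned above, using the standard MPI_Comm_create_errhandler API (the handler body is an illustrative assumption):

    #include <mpi.h>
    #include <stdio.h>

    /* Invoked by the MPI library when an error occurs on the communicator. */
    static void my_errhandler(MPI_Comm *comm, int *errcode, ...)
    {
        char msg[MPI_MAX_ERROR_STRING];
        int len;
        MPI_Error_string(*errcode, msg, &len);
        fprintf(stderr, "MPI error intercepted: %s\n", msg);
        /* application-specific recovery could be triggered here */
    }

    /* At startup, after MPI_Init: */
    MPI_Errhandler eh;
    MPI_Comm_create_errhandler(my_errhandler, &eh);
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, eh);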

7 Approach to Fault Tolerance in MPI Programs
1. Checkpointing: a common technique that periodically saves the state of a computation, allowing the computation to be restarted from that point in the event of a failure. The cost of checkpointing is determined by:
the cost to create and write a checkpoint;
the cost to read and restore a checkpoint;
the probability of failure;
the time between checkpoints;
the total running time without checkpoints.
Types of checkpointing: user-directed and system-directed.
Advantage and disadvantage: it is easy to implement, but the cost of saving and restoring checkpoints must be kept relatively small (see the sketch below).
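A minimal sketch of user-directed checkpointing; the file naming, state layout, and checkpoint interval K are assumptions, not from the slides:

    #include <stdio.h>

    /* Hypothetical per-rank checkpoint: write this rank's state to disk. */
    static void write_checkpoint(int rank, const double *state, int n)
    {
        char path[64];
        snprintf(path, sizeof path, "ckpt_rank%d.dat", rank);  /* assumed naming */
        FILE *f = fopen(path, "wb");
        if (f) {
            fwrite(state, sizeof(double), (size_t)n, f);
            fclose(f);
        }
    }

    /* In the main iteration loop: checkpoint every K iterations. */
    if (iter % K == 0)
        write_checkpoint(rank, state, n);

On restart, each rank reads its last checkpoint instead of recomputing from the beginning.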

8 Approach to Fault Tolerance in MPI Programs
2. Using intercommunicators: an intercommunicator contains two groups of processes, and all communication occurs between processes in one group and processes in the other group.
Example: manager-worker. The manager process keeps track of a pool of tasks and dispatches them to worker processes for completion. Workers return results to the manager, simultaneously requesting a new task (see the sketch below).
Advantages and disadvantage: the manager can easily recognize that a particular worker has failed and communicate this to the other processes, and each group can keep track of the state held by the other group; however, the pattern is difficult to implement in complex systems.
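A sketch of the manager side of this pattern. With MPI_ERRORS_RETURN set, a failed send to a worker yields an error code; the bookkeeping (alive, task, requeue) and the tag are hypothetical:

    /* Manager: hand out tasks; mark a worker dead if communication fails. */
    for (int w = 0; w < nworkers; w++) {
        if (!alive[w])
            continue;
        int rc = MPI_Send(&task[w], 1, MPI_INT, w, TAG_WORK, intercomm);
        if (rc != MPI_SUCCESS) {
            alive[w] = 0;        /* worker presumed failed */
            requeue(task[w]);    /* hypothetical helper: give its task to another worker */
        }
    }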

9 Approach to Fault Tolerance in MPI Programs
3. Modifying MPI semantics: takes advantage of existing MPI objects that contain extra state, and of MPI functions defined in the standard.
Example: MPI guarantees that the number of processes in a communicator, and each process's rank within it, remain constant. A program can use this property to decompose data according to a communicator's size and to calculate the data assigned to each process from its rank (see the sketch below).
Advantage and disadvantage: fault-tolerant programs can be written for a wider set of algorithms, but because this approach relies only on the existing semantics, it provides fewer fault tolerance features than the other approaches.
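For example, the constant size/rank guarantee lets each process compute its own block of a global array (a standard block decomposition; the problem size N is an assumed parameter):

    int size, rank;
    MPI_Comm_size(comm, &size);
    MPI_Comm_rank(comm, &rank);

    int chunk = (N + size - 1) / size;                     /* elements per process */
    int start = rank * chunk;                              /* this process's block */
    int end   = (start + chunk < N) ? start + chunk : N;   /* clamp the last block */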

10 Approach to Fault Tolerance in MPI Programs
4. Extending MPI: this approach was developed to address the difficulty of using MPI communicators when processes may fail. It is difficult to construct a communicator consisting of just two individual processes, and if the manager group has failed it is even more difficult, because of the collective semantics of communicator construction in MPI.

11 Advantages of Using MPI Fault Tolerance Features
It is simple and easy to use the existing error-handling features in MPI.
Users can extend MPI_ERRORS_RETURN to define errors specific to their needs.
Error handling is purely local: every process can have a different handler.
The ability to attach error handlers to a communicator increases the modularity of MPI.
MPI provides the ability to define one's own application-specific error handler, which is an important approach to fault tolerance.

12 Limitations of Fault Tolerance in MPI
The specification makes no demands on MPI to survive failures.
The defined MPI error classes serve only to clarify the source of an error to the user.
It is difficult for MPI to notify users of the failure of a given function when the failure happens after the function has already returned.
There is no description of when error notification will happen relative to the occurrence of the error.
It is not possible for one application process to ask to be informed of errors on other processes, or for the application to be informed of specific classes of errors.

13 HARNESS / Fault-Tolerant MPI: an Extension to MPI
HARNESS (Heterogeneous Adaptive Reconfigurable Networked SyStem) is an experimental system that provides a highly dynamic, fault-tolerant computing environment for high-performance computing applications.
HARNESS is a joint DOE-funded project involving Oak Ridge National Laboratory (ORNL), the University of Tennessee at Knoxville (UTK/ICL), and Emory University in Atlanta, GA.

14 HARNESS: an Extension to MPI
Current MPI implementations either abort on failure or rely on checkpointing.
Communication occurs only via communicators.
The MPI communicator is based on a static process model.

15 Implementation: FT-MPI (HARNESS) Extends MPI
FT-MPI allows applications to decide what happens when errors occur: restart the failed node, or continue with fewer nodes.
When a member of a communicator fails:
the communicator's state changes to indicate the problem;
message transfers continue if safe, or are stopped or ignored;
the user application can fix the communicator, or abort, in order to continue.

16 Comparison of FT-MPI and MPI: Communicator and Process States
Communicator states: FT_OK, FT_DETECTED, FT_RECOVER, FT_RECOVERED, FT_FAILED (the communicator itself is either VALID or INVALID).
Process states: OK, UNAVAILABLE, JOINING, FAILED.

17 Implementation: Extending MPI
When running an FT-MPI application, two parameters specify the modes in which the application runs. The first parameter, the 'communicator mode', indicates the status of an MPI object after recovery and can be specified when starting the application:
ABORT - like standard MPI, FT-MPI can abort on error.
BLANK - failed processes are not replaced.
REBUILD - failed processes are respawned and surviving processes keep the same rank. This is the default mode.
SHRINK - failed processes are not replaced, and ranks are reordered so that there are no gaps in the list of processes.

18 FT-MPI: the Second Parameter, the Communication Mode
There are two communication modes:
CONT/CONTINUE - all operations that returned an MPI_SUCCESS code will finish properly.
NOOP/RESET - all ongoing messages are dropped; on error, the application is returned to its last consistent state.

19 FT-MPI: Communicator (COMM) Failure Handling
A communicator is invalidated when a failure is detected. The underlying system then sends a state update to all processes for that communicator, and subsequent behavior depends on the communicator mode chosen. Communicators are not invalidated by ordinary communication errors, only by process exit.

20 FT-MPI Usage
FT-MPI failures surface as error checks, followed by some corrective action such as a communicator rebuild. For example (simple FT-MPI send usage; the send arguments elided on the original slide are filled in with placeholder names):

    rc = MPI_Send(buf, count, datatype, dest, tag, com);   /* placeholder arguments */
    if (rc == MPI_ERR_OTHER) {
        MPI_Comm_dup(com, &newcom);   /* FT-MPI: dup the invalidated communicator to obtain a recovered one */
        com = newcom;
    }

In an SPMD manager-worker code, only the manager code needs to check for errors if the user treats the manager as the only point of failure.

21 Example: MPI Error Handling
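A minimal self-contained example of standard MPI error handling with MPI_ERRORS_RETURN (a sketch; the invalid destination rank is chosen deliberately to provoke an error):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Sending to a rank outside the communicator returns an error
           code instead of aborting the job. */
        int data = 0;
        int rc = MPI_Send(&data, 1, MPI_INT, /*dest=*/9999, 0, MPI_COMM_WORLD);
        if (rc != MPI_SUCCESS)
            fprintf(stderr, "rank %d: send returned error %d\n", rank, rc);

        MPI_Finalize();
        return 0;
    }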

22 Example of Error Handling Using FT-MPI
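A sketch in the FT-MPI style of slide 20, assuming REBUILD mode: on a detected failure the communicator is duplicated to obtain a recovered communicator, and the lost work is reassigned (the receive arguments and the resend_task helper are assumptions):

    MPI_Status status;
    int rc = MPI_Recv(&result, 1, MPI_INT, worker, TAG_RES, com, &status);
    if (rc == MPI_ERR_OTHER) {
        MPI_Comm newcom;
        MPI_Comm_dup(com, &newcom);   /* in REBUILD mode, failed ranks are respawned */
        com = newcom;
        resend_task(worker, com);     /* hypothetical helper: reassign the lost task */
    }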

23 Performance Considerations
The fault-free overhead of point-to-point communication in MPI/FT is negligible in long-running applications.
Checkpointing increases communication overhead considerably, so the user must choose a suitably low checkpoint frequency.
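A first-order way to choose that frequency, not given in the slides, is Young's approximation, which balances the per-checkpoint cost C against the mean time between failures M:

    T_opt ≈ sqrt(2 × C × M)

For example, with C = 2 minutes and M = 24 hours (1440 minutes), a checkpoint roughly every sqrt(2 × 2 × 1440) ≈ 76 minutes minimizes the expected overhead.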

24 Conclusions
FT-MPI is a tool that provides methods of dealing with failures within MPI applications.
FT-MPI is also useful for experimenting with:
self-tuning collective communications;
distributed control algorithms;
dynamic library download methods.

25 Future Scope
Developing further implementations that support more restrictive environments (e.g., embedded clusters).
Creating a number of drop-in library templates to simplify the construction of fault-tolerant applications.
Combining high performance with survivability.

26 References
Fault Tolerance in MPI Programs:
LEGION:
HARNESS:
MPI 3.0 Fault Tolerance Working Group:
Graham E. Fagg, George Bosilca, Thara Angskun, Zizhong Chen, Jelena Pjesivac-Grbovic, Kevin London, and Jack J. Dongarra, "Extending the MPI Specification for Process Fault Tolerance on High Performance Computing Systems."
HARNESS manual.
Graham E. Fagg, Antonin Bukovsky, and Jack J. Dongarra, "HARNESS and fault tolerant MPI," Parallel Computing 27.
Graham E. Fagg and Jack J. Dongarra, "Building and Using a Fault-Tolerant MPI Implementation," The International Journal of High Performance Computing Applications, Vol. 18, No. 3, Fall 2004, pp. 353-361.
FT-MPI presentation (conference proceedings), Graham E. Fagg and Jack J. Dongarra.

27 Q & A ?

28 FAQs
1. MPI vs. TCP sockets: Arguably, one of the biggest weaknesses of MPI is its lack of resilience: most (if not all) MPI implementations will kill an entire MPI job if any individual process dies. This contrasts with the reliability of TCP sockets, for example: if a process on one side of a socket suddenly goes away, the peer just gets a stale socket.
2. Does MPI guarantee that a user-defined handler is invoked where MPI_ERRORS_RETURN would return an error? The specification does not state whether an error that would cause MPI functions to return an error code under the MPI_ERRORS_RETURN error handler would cause a user-defined error handler to be called during the same MPI function, or at some earlier or later point in time.
3. Relation between checkpointing and I/O: The practicality of checkpointing is tied to the performance of parallel I/O, since checkpoint data is saved to a parallel file system.

29 FAQs
4. Usability of HARNESS FT-MPI: The fault tolerance that HARNESS provides depends on its implementation. The HARNESS team actively works on reported bugs and releases new versions.
5. Data recovery in MPI: The MPI standard does not provide a way to recover data; it depends on the implementation of the MPI program.
6. Can fault tolerance in MPI be made transparent? It is very difficult to make fault tolerance in MPI transparent, because of the complexity involved in communication between processes.

30 Reference Slides

31 Reference: Structure of FT-MPI

33 Derived Datatype Handling
Reduces memory copies while allowing overlapping.
Three stages of data handling: gather/scatter, encoding/decoding, send/receive package.
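For reference, a standard derived-datatype definition that such handling must cover: a strided column of a 2-D array built with MPI_Type_vector (the array shape ROWS x COLS and the dest/tag values are assumed examples):

    /* One column of a ROWS x COLS row-major double array:
       ROWS blocks of 1 element each, separated by a stride of COLS. */
    MPI_Datatype column;
    MPI_Type_vector(ROWS, 1, COLS, MPI_DOUBLE, &column);
    MPI_Type_commit(&column);

    MPI_Send(&a[0][0], 1, column, dest, tag, MPI_COMM_WORLD);
    MPI_Type_free(&column);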

34 Handling of compacted datatypes: only MPI_Send and MPI_Recv were used

35 Performance Considerations
Tests show that compacted data handling gives a 10% to 19% improvement.
The benefit of buffer reuse and reordering of data elements leads to considerable improvements on heterogeneous networks.

