
1 Fault Tolerant Parallel Data-Intensive Algorithms
Mucahid Kutlu, Gagan Agrawal, Oguz Kurt†
Department of Computer Science and Engineering, The Ohio State University
†Department of Mathematics, The Ohio State University
HIPC'12, Pune, India

2 Outline
- Motivation
- Related Work
- Our Goal
- Data-Intensive Algorithms
- Our Approach
  - Data Distribution
  - Fault Tolerant Algorithms
  - Recovery
- Experiments
- Conclusion

3 Why Is Fault Tolerance So Important?
Typical first year for a new cluster*:
- ~1,000 individual machine failures
- 1 PDU failure (~500-1,000 machines suddenly disappear)
- ~20 rack failures (40-80 machines disappear, 1-6 hours to get them back)
- Other failures due to overheating, maintenance, ...
If a job runs on 1,000 machines for more than a day, it is very likely to hit at least one failure before it finishes.
*Taken from Jeff Dean's talk at Google I/O (http://perspectives.mvdirona.com/2008/06/11/JeffDeanOnGoogleInfrastructure.aspx)
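A rough back-of-envelope check of that claim (our own illustrative numbers, derived only from the failure counts above, not stated on the slides): roughly 1,000 machine failures per year across the cluster means about 2.7 failures per day, so a one-day run over 1,000 machines almost certainly overlaps with at least one failure.

```latex
% Illustrative estimate only, assuming cluster-wide failures form a Poisson
% process with about 1000 machine failures per year:
\lambda \approx \frac{1000}{365} \approx 2.7 \text{ failures per day}
\qquad
P(\text{at least one failure during a 1-day run}) = 1 - e^{-\lambda} \approx 1 - e^{-2.7} \approx 0.93
```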

4 Fault Tolerance So Far
- MPI fault tolerance, which focuses on checkpointing [1], [2], [3]
  - High overhead
- MapReduce [4]
  - Good fault tolerance for data-intensive applications
- Algorithm-based fault tolerance
  - Mostly aimed at scientific computations such as linear algebra routines [5], [6] and iterative computations [7], including conjugate gradient [8]

5 Our Goal
Our main goal is to develop an algorithm-based fault-tolerance solution for data-intensive algorithms.
Target failures:
- We focus on hardware failures.
- Everything on the failed node is lost.
- The failed node is not recovered; the computation continues without it.
In an iterative algorithm, when a failure occurs:
- The system should not restart the failed iteration from the beginning.
- The amount of lost data should be as small as possible.

6 Data-Intensive Algorithms
We focus on two algorithms: K-Means and Apriori.
The approach generalizes to any algorithm with the following reduction processing structure (a rough sketch follows).
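The structure diagram itself is not in the transcript; the code below is a minimal sketch of the generalized reduction loop we take it to mean (all identifiers are ours, not from the paper). Both K-Means and Apriori process each element independently and fold it into a reduction object via an associative, commutative update, so per-slave results can be combined in any order.

```c
#include <stddef.h>

/* Sketch of the generalized reduction structure (identifiers are illustrative).
 * Each data element picks a "slot" of the reduction object (nearest centroid
 * for K-Means, a candidate itemset for Apriori) and is accumulated into it. */
typedef struct {
    double *acc;       /* e.g. per-centroid coordinate sums or itemset counts */
    long   *count;     /* per-slot element counts */
    int     num_slots;
} reduction_object;

typedef int  (*select_fn)(const double *elem, int dim, const void *params);
typedef void (*update_fn)(reduction_object *r, int slot, const double *elem, int dim);

/* Local reduction over one block of data; the global result is obtained by
 * element-wise combination of the reduction objects from all slaves. */
void local_reduction(const double *elems, int n, int dim,
                     select_fn select, update_fn update, const void *params,
                     reduction_object *r)
{
    for (int i = 0; i < n; i++) {
        const double *e = elems + (size_t)i * dim;
        update(r, select(e, dim, params), e, dim);
    }
}
```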

7 Our Approach
We use a master-slave approach.
- Replication: divide the data to be replicated into parts and distribute those parts among different processors, so the amount of data lost in a failure is smaller.
- Summarization: slaves send results for parts of their data before they have processed all of it, so data whose results the master already holds never needs to be re-processed (a slave-side sketch follows).
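A minimal slave-side sketch of the summarization idea, assuming MPI point-to-point messages to the master after every data block; the message tag, the fixed-size summary buffer, and compute_block_summary are our own assumptions, not details from the paper.

```c
#include <mpi.h>
#include <stdlib.h>

#define TAG_SUMMARY 100   /* illustrative message tag, not from the paper */

/* Application-specific hook: fold one data block into a fixed-size summary
 * (e.g. partial centroid sums and counts for K-Means). */
void compute_block_summary(const double *block, int block_elems, int dim,
                           double *summary, int summary_len);

/* Instead of sending one result after processing all local data, the slave
 * sends a partial result ("summary") after each data block, so the master
 * never has to re-request blocks that were already summarized before a failure. */
void slave_process_iteration(const double *data, int num_blocks, int block_elems,
                             int dim, int summary_len, int master_rank)
{
    double *summary = malloc((size_t)summary_len * sizeof *summary);
    for (int b = 0; b < num_blocks; b++) {
        const double *block = data + (size_t)b * block_elems * dim;
        compute_block_summary(block, block_elems, dim, summary, summary_len);
        MPI_Send(summary, summary_len, MPI_DOUBLE,
                 master_rank, TAG_SUMMARY, MPI_COMM_WORLD);
    }
    free(summary);
}
```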

8 Data Distribution and Replication
[Figure: the master distributes data blocks D1-D8 across slaves P1-P4, once as primary copies and once again as replicas.]
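The transcript preserves only the figure above, so the exact placement policy is not recoverable; the sketch below merely illustrates the intent under our own shifted round-robin assumption: each slave's replicated data is split into parts that land on different processors, so any single failure touches only a small part of each replica set.

```c
/* Illustrative placement only; the paper's actual policy may differ.
 * Primary copies are dealt round-robin, and the k-th replica part of a
 * block is shifted away from its primary owner, so no two slaves overlap
 * in more than a few blocks ("minimum intersection"). */
int primary_owner(int block, int num_slaves)
{
    return block % num_slaves;
}

int replica_owner(int block, int k, int num_slaves)
{
    return (primary_owner(block, num_slaves) + k + 1) % num_slaves;
}
```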

9 K-Means with No Failure
[Figure: data blocks 1-14 are spread over slaves P1-P7 as primary data and replicas; each slave sends the summary of a processed data portion to the master, and the master broadcasts the new centroids.]
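For K-Means specifically, here is a hedged sketch of what the per-block "summary" could contain and how the master could turn the accumulated summaries into new centroids; the summary layout and the master rank of 0 are our own conventions. The summary for a block is the per-centroid coordinate sums plus point counts, which simply add up across blocks and slaves.

```c
#include <float.h>
#include <stddef.h>
#include <mpi.h>

/* Per-block K-Means summary (our own layout):
 * summary[c*(dim+1) .. c*(dim+1)+dim-1] = coordinate sums for centroid c,
 * summary[c*(dim+1)+dim]                = number of points assigned to c. */
void kmeans_block_summary(const double *block, int n, int dim,
                          const double *centroids, int k, double *summary)
{
    for (int i = 0; i < k * (dim + 1); i++) summary[i] = 0.0;
    for (int p = 0; p < n; p++) {
        const double *x = block + (size_t)p * dim;
        int best = 0; double best_d = DBL_MAX;
        for (int c = 0; c < k; c++) {
            double d = 0.0;
            for (int j = 0; j < dim; j++) {
                double diff = x[j] - centroids[c * dim + j];
                d += diff * diff;
            }
            if (d < best_d) { best_d = d; best = c; }
        }
        for (int j = 0; j < dim; j++) summary[best * (dim + 1) + j] += x[j];
        summary[best * (dim + 1) + dim] += 1.0;
    }
}

/* Master side: summaries from all blocks and slaves add element-wise; the new
 * centroids are the accumulated sums divided by the counts, then broadcast
 * (slaves call the matching MPI_Bcast; rank 0 is assumed to be the master). */
void master_update_centroids(const double *total_summary, int k, int dim,
                             double *centroids)
{
    for (int c = 0; c < k; c++) {
        double cnt = total_summary[c * (dim + 1) + dim];
        for (int j = 0; j < dim; j++)
            if (cnt > 0.0)
                centroids[c * dim + j] = total_summary[c * (dim + 1) + j] / cnt;
    }
    MPI_Bcast(centroids, k * dim, MPI_DOUBLE, 0, MPI_COMM_WORLD);
}
```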

10 Single-Node Failure Recovery
Case 1: P1 fails after sending the result of D1.
- The master notifies P3 to process D2 in this iteration, and to process D1 starting from the next iteration.
[Figure: primary data and replica layout across slaves P1-P7.]
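A sketch of the bookkeeping the master might use for this case (arrays and names are our own assumptions): each of the failed slave's blocks is handed to a live slave that holds its replica, and only blocks whose summaries have not yet arrived are re-done in the current iteration.

```c
#define MAX_REPLICAS 4   /* illustrative bound, not from the paper */

/* owner[b]: slave currently responsible for block b.
 * replica[b][r]: slaves holding a copy of block b.
 * done[b]: nonzero if block b's summary already arrived this iteration.
 * alive[p]: nonzero if slave p is still up. */
void handle_single_failure(int failed, int num_blocks, int replicas_per_block,
                           int owner[], int replica[][MAX_REPLICAS],
                           const int done[], const int alive[],
                           int redo_this_iteration[])
{
    for (int b = 0; b < num_blocks; b++) {
        if (owner[b] != failed) continue;
        for (int r = 0; r < replicas_per_block; r++) {
            int cand = replica[b][r];
            if (cand != failed && alive[cand]) {
                owner[b] = cand;                    /* replica holder takes over  */
                redo_this_iteration[b] = !done[b];  /* already-summarized blocks  */
                break;                              /* wait until next iteration  */
            }
        }
    }
}
```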

11 Multiple-Node Failure Recovery
Case 2: P1, P2, and P3 fail; D1 and D2 are lost entirely.
- The master notifies P7 to read the first data blocks (D1 and D2) from the storage cluster.
- The master notifies P6 to process D5 and D6, P4 to process D3, and P5 to process D4.
- P7 reads D1 and D2 from the storage cluster.
[Figure: primary data and replica layout across slaves P1-P7, plus the storage cluster holding copies of the blocks.]
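Continuing the same hypothetical bookkeeping as in the previous sketch, this shows how the master could detect blocks that were lost entirely (primary owner and every replica holder failed) and therefore must be re-read from the storage cluster by a surviving slave.

```c
#define MAX_REPLICAS 4   /* same illustrative bound as in the previous sketch */

/* Returns the number of blocks with no surviving copy; their ids are written
 * to lost[].  The master then assigns each such block to a live slave, which
 * re-reads it from the storage cluster (bookkeeping arrays as above). */
int find_lost_blocks(int num_blocks, int replicas_per_block,
                     const int owner[], int replica[][MAX_REPLICAS],
                     const int alive[], int lost[])
{
    int n = 0;
    for (int b = 0; b < num_blocks; b++) {
        int survives = alive[owner[b]];
        for (int r = 0; r < replicas_per_block && !survives; r++)
            survives = alive[replica[b][r]];
        if (!survives)
            lost[n++] = b;
    }
    return n;
}
```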

12 Experiments
We used the Glenn cluster at OSC for our experiments. Each node has:
- Dual-socket, quad-core 2.5 GHz Opterons
- 24 GB RAM
We allocated 16 slave nodes, using 1 core each.
Implemented in C using the MPI library.
Generated datasets:
- K-Means: size 4.87 GB, 20 coordinates, maximum of 50 iterations
- Apriori: size 4.79 GB, 10 items, support threshold 1%, maximum rule size 6 (847 rules)

13 Effect of Summarization
[Charts: changing the number of messages with different failure percentages; changing the number of data blocks with different numbers of failures.]

14 Effect of Replication & Scalability
[Charts: Effect of Replication for Apriori; Scalability Test for Apriori.]

15 Fault Tolerance in MapReduce
- Data is replicated in the file system.
  - In contrast, we replicate the data on the processors, not in the file system.
- If a task fails, it is re-executed. Completed map tasks also need to be re-executed, since their results are stored on local disks.
- Because of dynamic scheduling, MapReduce can achieve better parallelism after a failure.

16 Experiments with Hadoop
Experimental setup:
- Allocated 17 nodes, 1 core each.
- Used one node as the master and the rest as slaves.
- Used the default chunk size.
- No backup nodes were used.
- Replication factor set to 3.
Each test was executed 5 times; we report the average of the 3 remaining runs after discarding the maximum and minimum.
For our system, we set R = 3, S = 4, and M = 2.
We measured total time, including I/O operations.

17 Experiments with Hadoop (Cont'd)
[Charts: Single Failure Occurring at Different Percentages; Multiple Failure Test for Apriori.]

18 Conclusion
- Summarization is effective for fault tolerance: we recover faster when the data is divided into more parts.
- Dividing the data into small parts and distributing them with minimal intersection decreases the amount of data that must be recovered after multiple failures.
- Our system performs much better than Hadoop.

19 References
1. J. Hursey, J. M. Squyres, T. I. Mattox, and A. Lumsdaine. The design and implementation of checkpoint/restart process fault tolerance for Open MPI. In IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007), pages 1-8, March 2007.
2. G. Bosilca, A. Bouteiller, F. Cappello, S. Djilali, G. Fedak, C. Germain, T. Herault, P. Lemarinier, O. Lodygensky, F. Magniette, V. Neri, and A. Selikhov. MPICH-V: Toward a scalable fault tolerant MPI for volatile nodes. In ACM/IEEE Supercomputing Conference (SC 2002), page 29, November 2002.
3. Camille Coti, Thomas Herault, Pierre Lemarinier, Laurence Pilard, Ala Rezmerita, Eric Rodriguez, and Franck Cappello. Blocking vs. non-blocking coordinated checkpointing for large-scale fault tolerant MPI. In Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (SC '06). ACM, 2006.
4. Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, pages 137-150, 2004.
5. J. S. Plank, Youngbae Kim, and J. J. Dongarra. Algorithm-based diskless checkpointing for fault tolerant matrix operations. In Twenty-Fifth International Symposium on Fault-Tolerant Computing (FTCS-25), pages 351-360, June 1995.
6. Teresa Davies, Christer Karlsson, Hui Liu, Chong Ding, and Zizhong Chen. High performance Linpack benchmark: A fault tolerant implementation without checkpointing. In Proceedings of the International Conference on Supercomputing (ICS '11), pages 162-171. ACM, 2011.
7. Zizhong Chen. Algorithm-based recovery for iterative methods without checkpointing. In Proceedings of the 20th ACM International Symposium on High Performance Distributed Computing (HPDC 2011), San Jose, CA, USA, June 2011, pages 73-84.
8. Zizhong Chen and J. Dongarra. A scalable checkpoint encoding algorithm for diskless checkpointing. In 11th IEEE High Assurance Systems Engineering Symposium (HASE 2008), pages 71-79, December 2008.

20 Thank you for listening. Any questions?

