
1 / 35 HAMA: An Efficient Matrix Computation with the MapReduce Framework  Sangwon Seo, Edward J. Yoon, Jaehong Kim, Seongwook Jin, Jin-Soo Kim, Seungryoul Maeng (IEEE CloudCom 2010)  Presented by Kyung-Bin Lim, Dec 3, 2014

2 / 35 Outline  Introduction  Methodology  Experiments  Conclusion

3 / 35 Apache HAMA  Easy-to-use tool for data-intensive scientific computation  Massive matrix/graph computations are often its primary workloads  Fundamental design has since changed from MapReduce-based matrix computation to BSP-based graph processing  Mimics Pregel, running on HDFS – Uses ZooKeeper as a synchronization barrier

4 / 35 Our Focus  This paper describes the earlier version 0.1 of HAMA – Latest version: 0.7.0, released Mar. 2014  Focuses only on matrix computation with MapReduce  Shows simple case studies

5 / 35 The HAMA Architecture  We propose a distributed scientific framework called HAMA (based on HPMR) – Provides transparent matrix/graph primitives

6 / 35 The HAMA Architecture  HAMA API: easy-to-use interface  HAMA Core: provides matrix/graph primitives  HAMA Shell: interactive user console

7 / 35 Contributions of HAMA  Compatibility – Takes advantage of all Hadoop features  Scalability – Scalable due to that compatibility  Flexibility – Multiple compute engines are configurable  Applicability – HAMA’s primitives can be applied to various applications

8 / 35 Outline  Introduction  Methodology  Experiments  Conclusion

9 / 35 Case Study  Using a case-study approach, we introduce two basic primitives with the MapReduce model running on HAMA – Matrix multiplication and finding a linear solution  We then compare them with MPI versions of these primitives

10 / 35 Case Study  Representing matrices – By default, HAMA uses HBase (a NoSQL database)  HBase is modeled after Google’s Bigtable  A column-oriented, semi-structured distributed database with high scalability
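The slides do not show the storage layout in detail, but the idea of a row-keyed, column-oriented matrix store can be sketched with a toy stand-in (plain Python, not the real HBase API — `MatrixTable` is my own illustrative class):

```python
# Toy stand-in for an HBase-style table holding a sparse matrix:
# the row index is the row key, each matrix column is a column
# qualifier, and only nonzero entries are stored.

class MatrixTable:
    def __init__(self):
        self.rows = {}                           # row key -> {column: value}

    def put(self, i, j, value):
        self.rows.setdefault(i, {})[j] = value   # store one nonzero cell

    def get_row(self, i):
        return self.rows.get(i, {})              # fetch a sparse row in one get

m = MatrixTable()
m.put(0, 0, 1.0)
m.put(0, 2, 3.0)
print(m.get_row(0))   # {0: 1.0, 2: 3.0}
```

Keying by row means a map task can read one matrix row with a single lookup, which suits the row-at-a-time access pattern of the multiplication algorithms below.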

11 / 35 Case Study – Multiplication: Iterative Way  Iterative approach (algorithm)

12 / 35 Case Study – Multiplication: Iterative Way  Simple, naïve strategy  Works well with sparse matrices  Sparse matrix: most entries are 0
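The algorithm and walkthrough slides are figures only; a minimal sketch of the iterative map/reduce multiplication (my own plain-Python illustration, not HAMA’s actual MapReduce code; sparse matrices are dicts keyed by `(row, col)` with zeros omitted):

```python
from collections import defaultdict

def map_phase(A, B):
    """For each nonzero a[i][k], join with row k of B and emit
    partial products keyed by the output cell (i, j)."""
    for (i, k), a in A.items():
        for (k2, j), b in B.items():
            if k2 == k:
                yield (i, j), a * b

def reduce_phase(pairs):
    """Sum the partial products for each output cell."""
    C = defaultdict(float)
    for key, val in pairs:
        C[key] += val
    return dict(C)

A = {(0, 0): 1.0, (0, 1): 2.0, (1, 1): 3.0}   # 2x2 sparse matrices
B = {(0, 0): 4.0, (1, 0): 5.0, (1, 1): 6.0}
C = reduce_phase(map_phase(A, B))
print(C)   # {(0, 0): 14.0, (0, 1): 12.0, (1, 0): 15.0, (1, 1): 18.0}
```

Because only nonzero entries are stored and emitted, the work scales with the number of nonzeros, which is why this naïve strategy suits sparse matrices.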

13-18 / 35 Multiplication: Iterative Way (figure-only slides stepping through the iterative multiplication example)

19 / 35 Case Study – Multiplication: Block Way  Multiplication can be done using sub-matrices  Works well with dense matrices

20 / 35 Case Study – Multiplication: Block Way  Block approach – Minimizes data movement (network cost)

21 / 35 Case Study – Multiplication: Block Way  Block approach (algorithm)
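The block algorithm slide is a figure; the essence, sketched in plain Python rather than HAMA’s MapReduce code (assuming square matrices whose size is divisible by the block size `s`), is to multiply s-by-s sub-matrices and accumulate the block products, so each task moves a few large blocks instead of many individual cells:

```python
def block(M, r0, c0, s):
    """Extract the s x s sub-matrix of M starting at (r0, c0)."""
    return [row[c0:c0 + s] for row in M[r0:r0 + s]]

def matmul(X, Y):
    """Plain dense multiply of two equally sized square blocks."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]

def block_multiply(A, B, s):
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for bi in range(0, n, s):           # block row of C
        for bj in range(0, n, s):       # block column of C
            for bk in range(0, n, s):   # shared block dimension
                P = matmul(block(A, bi, bk, s), block(B, bk, bj, s))
                for i in range(s):      # accumulate this block product
                    for j in range(s):
                        C[bi + i][bj + j] += P[i][j]
    return C

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
print(block_multiply(A, B, 1))   # [[19.0, 22.0], [43.0, 50.0]]
```

In a distributed run, each (bi, bj, bk) block product would be one task, so a block is fetched once and reused across a whole sub-matrix multiply — the source of the reduced network cost.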

22 / 35 Case Study – Finding Linear Solution  Ax = b – solve for x  A: known square, symmetric, positive-definite matrix  b: known vector  Use the Conjugate Gradient approach

23 / 35 Case Study – Finding Linear Solution  Finding a linear solution – Cramer’s rule – Conjugate Gradient method

24 / 35 Case Study – Finding Linear Solution  Cramer’s rule
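The Cramer’s rule slide is a figure; as a worked 2x2 example (my own illustration): each unknown is x_i = det(A_i) / det(A), where A_i is A with column i replaced by b. Solving a system of n unknowns needs n + 1 determinants, which is why it is impractical at scale and the slides turn to Conjugate Gradient instead:

```python
def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def cramer2(A, b):
    """Solve a 2x2 system Ax = b by Cramer's rule."""
    d = det2(A)
    x0 = det2([[b[0], A[0][1]], [b[1], A[1][1]]]) / d   # replace column 0 with b
    x1 = det2([[A[0][0], b[0]], [A[1][0], b[1]]]) / d   # replace column 1 with b
    return [x0, x1]

# 2x + y = 3, x + 3y = 5  ->  x = 0.8, y = 1.4
print(cramer2([[2.0, 1.0], [1.0, 3.0]], [3.0, 5.0]))   # [0.8, 1.4]
```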

25 / 35 Case Study – Finding Linear Solution  Conjugate Gradient method – Find a direction (conjugate direction) – Find a step size (line search)

26 / 35 Case Study – Finding Linear Solution  Conjugate Gradient method (algorithm)
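The algorithm slide is a figure; the standard textbook Conjugate Gradient loop, sketched in plain Python (my own illustration — HAMA would run each dot product and matrix-vector multiply as a distributed MapReduce job), makes the two steps from the previous slide explicit: a line-search step size alpha and a new conjugate search direction p:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Solve Ax = b for symmetric positive-definite A."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                             # residual r = b - Ax (x starts at 0)
    p = r[:]                             # first search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)          # step size (line search)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < tol:
            break
        beta = rs_new / rs               # make next direction A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
print(conjugate_gradient(A, b))   # close to [1/11, 7/11]
```

In exact arithmetic CG converges in at most n iterations, so for large sparse systems it needs far less work than the n + 1 determinants of Cramer’s rule.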

27 / 35 Outline  Introduction  Methodology  Experiments  Conclusion

28 / 35 Evaluations  TUSCI (TU Berlin SCI) cluster – 16 nodes, each with two Intel P4 Xeon processors and 1 GB memory – Connected by an SCI (Scalable Coherent Interface) network interface in a 2D torus topology – Running OpenCCS (an environment similar to HOD)  Test sets

29 / 35 HPMR’s Enhancements  Prefetching – Increases data locality  Pre-shuffling – Reduces the amount of intermediate output to shuffle

30 / 35 Evaluations  Comparison of average execution time and scale-up with matrix multiplication

31 / 35 Evaluations  Comparison of average execution time and scale-up with CG

32 / 35 Evaluations  Comparison of average execution time with CG when a single node is overloaded

33 / 35 Outline  Introduction  Methodology  Experiments  Conclusion

34 / 35 Conclusion  HAMA provides an easy-to-use tool for data-intensive computations – Matrix computation with MapReduce – Graph computation with BSP

