Tile Reduction: the first step towards tile aware parallelization in OpenMP
Ge Gan, Department of Electrical and Computer Engineering, Univ. of Delaware


1 Tile Reduction: the first step towards tile aware parallelization in OpenMP
Ge Gan, Department of Electrical and Computer Engineering, Univ. of Delaware

2 Overview
- Background
- Motivation
- A new idea: Tile Reduction
- Experimental Results
- Conclusion
- Related Work
- Future Work

3 Tile/Tiling
- A tile is a natural representation of the data objects that are heavily used in scientific algorithms
- Tiling improves data locality
- Tiling can increase parallelism and reduce synchronization in parallel programs
- It is an effective compiler optimization technique
- It is essentially a program design paradigm
- Supported in many parallel programming languages: ZPL, CAF, HTA, etc.

4 OpenMP
- OpenMP is the de facto standard for shared-memory parallel programming
- Provides a simple and flexible interface for developing portable and scalable parallel applications
- Supports incremental parallelization
- Maintains sequential consistency
- It is "tile oblivious": no directive or clause can be used to annotate a data tile and carry that information to the compiler

5 A Motivating Example

6 Parallelizing: the traditional way (1)

7 Parallelizing: the traditional way (2)
- Can only leverage the traditional scalar reduction in OpenMP
- Parallelism is trivial
- Data locality is not bad
- Not natural or intuitive
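The code shown on these slides is not part of the transcript. As a minimal sketch of what relying on scalar reduction can look like, assume a reduction clause that accepts only scalar variables, as the slide describes. The function name `sum_tiles_scalar` and the 2x2 tile shape are assumptions made for illustration; every tile element has to be spelled out as its own scalar.

```c
#include <assert.h>

/* Hypothetical helper (not from the slides): sums n tiles of size 2x2,
 * element by element, using only scalar reduction variables. */
void sum_tiles_scalar(int n, double tiles[][2][2], double out[2][2])
{
    assert(n >= 0);

    /* One scalar per tile element, because the clause takes scalars only. */
    double s00 = 0.0, s01 = 0.0, s10 = 0.0, s11 = 0.0;

    #pragma omp parallel for reduction(+ : s00, s01, s10, s11)
    for (int k = 0; k < n; k++) {
        s00 += tiles[k][0][0];
        s01 += tiles[k][0][1];
        s10 += tiles[k][1][0];
        s11 += tiles[k][1][1];
    }

    /* Copy the reduced scalars back into the result tile. */
    out[0][0] = s00;
    out[0][1] = s01;
    out[1][0] = s10;
    out[1][1] = s11;
}
```

The manual unpacking grows with the tile size, which is one way to read the slide's remark that this style is not natural or intuitive. Compiled without OpenMP support, the pragma is ignored and the loop runs sequentially with the same result.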

8 The Expected Parallelization
- View the innermost two loops as a macro operation performed on the 2x2 data tiles
- Aggregate the data tiles in parallel
- More parallelism
- Better data locality

9 Tile Reduction Interface

10 Terms
- Reduction tile: the data tile under reduction
- Tile descriptor: the "multi-dimensional array" in the list construct
- Reduction kernel loops: the loops involved in performing "one" recursive calculation
- Tile name
- Dimension descriptor: the tuples following the tile name

11 A Use Case
- Tiled Matrix Multiplication
- Tile Reduction Applied on the Tiled Matrix Multiplication Code
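The two code listings on this slide are not included in the transcript. For orientation, here is a minimal sketch of a conventional tiled matrix multiplication in C, parallelized with standard OpenMP over the output tiles only. The names `N`, `TS`, and `tiled_matmul` are assumptions for illustration, and the tile reduction directive from the slide is deliberately not reproduced because its syntax is not available here.

```c
#include <assert.h>

#define N  4   /* matrix order (assumed for illustration) */
#define TS 2   /* tile edge length (assumed); must divide N */

/* Hypothetical sketch: C = A * B computed tile by tile. */
void tiled_matmul(double A[N][N], double B[N][N], double C[N][N])
{
    assert(N % TS == 0);

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            C[i][j] = 0.0;

    /* Each (ii, jj) pair owns one output tile of C, so threads never
     * write to the same element. The kk loop stays sequential. */
    #pragma omp parallel for collapse(2)
    for (int ii = 0; ii < N; ii += TS)
        for (int jj = 0; jj < N; jj += TS)
            for (int kk = 0; kk < N; kk += TS)
                /* Macro operation on tiles:
                 * C[ii.., jj..] += A[ii.., kk..] * B[kk.., jj..] */
                for (int i = ii; i < ii + TS; i++)
                    for (int j = jj; j < jj + TS; j++)
                        for (int k = kk; k < kk + TS; k++)
                            C[i][j] += A[i][k] * B[k][j];
}
```

In this form the `kk` loop accumulates into a single tile of `C`, so it cannot be parallelized with a scalar reduction. That accumulation into a whole tile is the kind of pattern the slides propose to express as a tile reduction.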

12 Code Generation (1)
- Distribute the iterations of the parallelized loop among the threads
- Allocate memory for the private copy of the tile used in the local recursive calculation
- Perform the local recursive calculation, which is specified by the reduction kernel loops
- Update the global copy of the reduction tile
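The four steps above can be written out by hand with standard OpenMP constructs. The following is a sketch under stated assumptions, not the compiler output from the slides: the function name `tile_sum`, the tile edge `TS`, and the choice of a `critical` section for the final update are all illustrative.

```c
#include <assert.h>

#define TS 2   /* tile edge length (assumed for illustration) */

/* Hypothetical sketch: reduce n tiles of size TS x TS into `result`
 * with element-wise addition, following the four steps by hand. */
void tile_sum(int n, double tiles[][TS][TS], double result[TS][TS])
{
    assert(n >= 0);

    /* The global copy of the reduction tile starts at the identity of +. */
    for (int i = 0; i < TS; i++)
        for (int j = 0; j < TS; j++)
            result[i][j] = 0.0;

    #pragma omp parallel
    {
        /* Step 2: a private copy of the tile for each thread. */
        double local[TS][TS] = {{0.0}};

        /* Step 1: distribute the iterations among the threads. */
        #pragma omp for nowait
        for (int k = 0; k < n; k++)
            /* Step 3: the reduction kernel loops, applied locally. */
            for (int i = 0; i < TS; i++)
                for (int j = 0; j < TS; j++)
                    local[i][j] += tiles[k][i][j];

        /* Step 4: merge the private tile into the global copy,
         * one thread at a time. */
        #pragma omp critical
        {
            for (int i = 0; i < TS; i++)
                for (int j = 0; j < TS; j++)
                    result[i][j] += local[i][j];
        }
    }
}
```

The structure mirrors what a compiler typically does for a scalar reduction, with the scalar replaced by a small private array. The `nowait` clause is safe here because each thread only merges its own private tile after finishing its share of the loop.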

13 Code Generation (2)

14 Experimental Results (1)
- 2D Histogram Reduction

15 Experimental Results (2)
- Matrix-Matrix Multiplication

16 Experimental Results (3)
- Matrix-Vector Multiplication

17 Conclusions
- As one of the building blocks of the tile aware parallelization theory, tile reduction brings more opportunities to parallelize dense matrix applications
- For some benchmarks, tile reduction is a more natural and intuitive way to reason about the best parallelization decision
- For some benchmarks, tile reduction not only improves data locality but also exposes more parallelism
- Friendly to programmers
- Code generation is as simple as for the scalar reduction in the current OpenMP
- Runtime overhead is trivial

18 Similar Works
Parallel reduction is supported in:
- C**: Viswanathan, G., Larus, J.R.: User-defined reductions for efficient communication in data-parallel languages. Technical Report 1293, University of Wisconsin-Madison (Jan 1996)
- SAC: Scholz, S.B.: On defining application-specific high-level array operations by means of shape invariant programming facilities. In: APL '98: Proceedings of the APL98 conference on Array processing language, New York, NY, USA, ACM (1998) 32–38
- ZPL: Deitz, S.J., Chamberlain, B.L., Snyder, L.: High-level language support for user-defined reductions. J. Supercomput. 23(1) (2002) 23–37
- UPC Consortium: UPC Collective Operations Specifications V1.0. A publication of the UPC Consortium (2003)
- Message Passing Interface Forum: MPI: A message-passing interface standard (version 1.0). Technical report (May 1994). URL http://www.mcs.anl.gov/mpi/mpi-report.ps
- Kambadur, P., Gregor, D., Lumsdaine, A.: OpenMP extensions for generic libraries. In: Lecture Notes in Computer Science: OpenMP in a New Era of Parallelism, IWOMP'08, International Workshop on OpenMP. Volume 5004/2008, Springer Berlin / Heidelberg (2008) 123–133

19 Future Works
- Design and develop OpenMP pragma directives that help the compiler generate efficient data movement code for parallel applications running on many-core platforms with highly non-uniform memory systems, such as the Cyclops-64 processor

