Transforming Linear Algebra Libraries: From Abstraction to Parallelism
Ernie Chan
HIPS 2010, April 19, 2010

1 Transforming Linear Algebra Libraries: From Abstraction to Parallelism
Ernie Chan

2 Motivation
Statically

3 Outline
  - Inversion of a Triangular Matrix
  - Requisite Semantic Information
  - Static Generation of a Directed Acyclic Graph
  - Performance
  - Conclusion

4 Inversion of a Triangular Matrix
Formal Linear Algebra Methods Environment (FLAME)
  - High-level abstractions for expressing linear algebra algorithms
Triangular Inversion (Trinv)
  - R := U^{-1}

5 Inversion of a Triangular Matrix

6 Inversion of a Triangular Matrix
LAPACK-style Implementation

      DO J = 1, N, NB
         JB = MIN( NB, N-J+1 )
         CALL DTRSM( 'Left', 'Upper', 'No transpose', 'Non-unit',
     $               JB, N-J-JB+1, -ONE, A( J, J ), LDA,
     $               A( J, J+JB ), LDA )
         CALL DGEMM( 'No transpose', 'No transpose',
     $               J-1, N-J-JB+1, JB, ONE, A( 1, J ), LDA,
     $               A( J, J+JB ), LDA, ONE, A( 1, J+JB ), LDA )
         CALL DTRSM( 'Right', 'Upper', 'No transpose', 'Non-unit',
     $               J-1, JB, ONE, A( J, J ), LDA,
     $               A( 1, J ), LDA )
         CALL DTRTI2( 'Upper', 'Non-unit',
     $                JB, A( J, J ), LDA, INFO )
      ENDDO
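The four updates in the loop body are easier to follow with block size 1, where each BLAS call reduces to a scalar loop. The following is a minimal sketch under that assumption, not the LAPACK or FLAME code itself. The function name `trinv_upper_unblocked` and the row-major layout are hypothetical choices made for illustration.

```c
#include <stddef.h>

/* In-place inversion of an n x n upper triangular matrix stored row-major
 * with leading dimension lda. Hypothetical helper (not a LAPACK or FLAME
 * routine): it follows the Trsm / Gemm / Trsm / Trinv order of the blocked
 * loop with a block size of 1.
 * Returns 0 on success, or j + 1 if diagonal entry j is exactly zero. */
int trinv_upper_unblocked(size_t n, double *a, size_t lda)
{
    for (size_t j = 0; j < n; ++j) {
        double ajj = a[j * lda + j];
        if (ajj == 0.0)
            return (int)(j + 1);

        /* A12 := -inv(A11) * A12   (left Trsm with alpha = -1) */
        for (size_t c = j + 1; c < n; ++c)
            a[j * lda + c] = -a[j * lda + c] / ajj;

        /* A02 := A01 * A12 + A02   (Gemm, uses A01 before it is scaled) */
        for (size_t r = 0; r < j; ++r)
            for (size_t c = j + 1; c < n; ++c)
                a[r * lda + c] += a[r * lda + j] * a[j * lda + c];

        /* A01 := A01 * inv(A11)    (right Trsm) */
        for (size_t r = 0; r < j; ++r)
            a[r * lda + j] /= ajj;

        /* A11 := inv(A11)          (Trinv on a 1 x 1 block) */
        a[j * lda + j] = 1.0 / ajj;
    }
    return 0;
}
```

The order of the middle two steps matters: the Gemm update reads A01 before the right Trsm overwrites it. This is one of the anti-dependencies a task scheduler has to respect.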

7 Inversion of a Triangular Matrix
FLASH
  - Matrix of matrices

8 Inversion of a Triangular Matrix

    FLA_Part_2x2( A,    &ATL, &ATR,
                        &ABL, &ABR,     0, 0, FLA_TL );

    while ( FLA_Obj_length( ATL ) < FLA_Obj_length( A ) )
    {
      FLA_Repart_2x2_to_3x3( ATL, /**/ ATR,       &A00, /**/ &A01, &A02,
                          /* ******** */        /* **************** */
                                                  &A10, /**/ &A11, &A12,
                             ABL, /**/ ABR,       &A20, /**/ &A21, &A22,
                             1, 1, FLA_BR );
      /*-------------------------------------------------------*/
      /* A12 := -inv( A11 ) * A12 */
      FLASH_Trsm( FLA_LEFT, FLA_UPPER_TRIANGULAR,
                  FLA_NO_TRANSPOSE, FLA_NONUNIT_DIAG,
                  FLA_MINUS_ONE, A11, A12 );
      /* A02 := A01 * A12 + A02 */
      FLASH_Gemm( FLA_NO_TRANSPOSE, FLA_NO_TRANSPOSE,
                  FLA_ONE, A01, A12, FLA_ONE, A02 );
      /* A01 := A01 * inv( A11 ) */
      FLASH_Trsm( FLA_RIGHT, FLA_UPPER_TRIANGULAR,
                  FLA_NO_TRANSPOSE, FLA_NONUNIT_DIAG,
                  FLA_ONE, A11, A01 );
      /* A11 := inv( A11 ) */
      FLASH_Trinv( FLA_UPPER_TRIANGULAR, FLA_NONUNIT_DIAG, A11 );
      /*-------------------------------------------------------*/
      FLA_Cont_with_3x3_to_2x2( &ATL, /**/ &ATR,       A00, A01, /**/ A02,
                                                       A10, A11, /**/ A12,
                              /* ********** */      /* ************* */
                                &ABL, /**/ &ABR,       A20, A21, /**/ A22,
                                FLA_TL );
    }

9 Inversion of a Triangular Matrix
Extensible Markup Language (XML)
[XML listing; element tags lost in extraction. Surviving fragments: FLA_UPPER_TRIANGULAR BR" inout="both">A A FLA_LEFT FLA_UPPER_TRIANGULAR FLA_NO_TRANSPOSE FLA_NONUNIT_DIAG FLA_MINUS_ONE A FLA_NO_TRANSPOSE FLA_ONE]

10 Inversion of a Triangular Matrix
Extensible Markup Language (XML) Cont.
[XML listing; element tags lost in extraction. Surviving fragments: A FLA_ONE A FLA_RIGHT FLA_UPPER_TRIANGULAR FLA_NO_TRANSPOSE FLA_NONUNIT_DIAG FLA_ONE A FLA_UPPER_TRIANGULAR FLA_NONUNIT_DIAG A]

11 Outline
  - Inversion of a Triangular Matrix
  - Requisite Semantic Information
  - Static Generation of a Directed Acyclic Graph
  - Performance
  - Conclusion

12 Requisite Semantic Information
Partitioning Scheme
[XML listing; element tags lost in extraction. Surviving fragments: FLA_UPPER_TRIANGULAR BR" inout="both">A A]

13 Requisite Semantic Information
Problem Size*
[XML listing; element tags lost in extraction. Surviving fragments: FLA_UPPER_TRIANGULAR BR" inout="both">A A]

14 Requisite Semantic Information
Updates
[XML listing; element tags lost in extraction. Surviving fragments: FLA_UPPER_TRIANGULAR BR" inout="both">A A]

15 Requisite Semantic Information
Input and Output Parameters
[XML listing; element tags lost in extraction. Surviving fragments: alpha A B alpha A B beta C A]

16 Outline
  - Inversion of a Triangular Matrix
  - Requisite Semantic Information
  - Static Generation of a Directed Acyclic Graph
  - Performance
  - Conclusion

17 Static Generation of a DAG
Code Generation
  - Convert XML representation to FLASH code generation intermediary
      - Annotated with input and output information
  - Create directed acyclic graph (DAG) by statically unrolling the loop
      - Operations on submatrix blocks (tasks) are vertices
      - Data dependencies between tasks are edges

18 Static Generation of a DAG
Data Dependencies
  - Flow (read-after-write)
      S1: A = B + C;
      S2: D = A + E;
  - Anti (write-after-read)
      S3: F = A + G;
      S4: A = H + I;
  - Output (write-after-write)
      S5: A = J + K;
      S6: A = L + M;
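The three dependence kinds can be checked mechanically once each statement is summarized by what it reads and what it writes. The sketch below does this for statements of the S1..S6 form (one written variable, two read variables). `struct stmt`, `classify`, and the `DEP_*` flags are hypothetical names introduced for illustration.

```c
/* Dependence kinds as bit flags, so a pair of statements can carry
 * more than one kind at once. */
enum { DEP_NONE = 0, DEP_FLOW = 1, DEP_ANTI = 2, DEP_OUTPUT = 4 };

/* Hypothetical statement summary: one written variable and two read
 * variables, each named by a single character as in S1..S6. */
struct stmt {
    char write;
    char read[2];
};

/* Returns nonzero if statement s reads variable v. */
static int reads(const struct stmt *s, char v)
{
    return s->read[0] == v || s->read[1] == v;
}

/* Classify how `later` depends on `earlier` (program order). */
int classify(const struct stmt *earlier, const struct stmt *later)
{
    int dep = DEP_NONE;
    if (reads(later, earlier->write))
        dep |= DEP_FLOW;     /* read-after-write  */
    if (reads(earlier, later->write))
        dep |= DEP_ANTI;     /* write-after-read  */
    if (earlier->write == later->write)
        dep |= DEP_OUTPUT;   /* write-after-write */
    return dep;
}
```

Applied to the pairs above, `classify` reports a flow dependence for (S1, S2), an anti-dependence for (S3, S4), and an output dependence for (S5, S6).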

19 Static Generation of a DAG

20 Static Generation of a DAG
Problem Size
  - Problem size cannot be determined a priori
  - Fix the block size or loop unrolling factor
      - Balance between instruction footprint and data granularity of tasks
Example
  - Trinv on 3x3 matrix of blocks

21 Static Generation of a DAG
Trinv
  - Iteration 1: Trsm 0, Trsm 1, Trinv 2

22 Static Generation of a DAG
Trinv
  - Iteration 2: Trsm 3, Gemm 4, Trsm 5, Trinv 6

23 Static Generation of a DAG
Trinv
  - Iteration 3: Trsm 7, Trsm 8, Trinv 9

24 Static Generation of a DAG
[Figure: DAG of the ten tasks Trsm 0, Trsm 1, Trinv 2, Trsm 3, Gemm 4, Trsm 5, Trinv 6, Trsm 7, Trsm 8, Trinv 9]
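The ten tasks of the 3x3 example can be reproduced by unrolling the Trinv loop over a matrix of blocks and recording which blocks each task reads and writes. The following is a minimal sketch of that idea, not the generator described in the talk. `struct task`, `unroll_trinv`, `depends`, and the encoding of block (r, c) as `r * nb + c` are all assumptions. The sequence of operation kinds matches the slides, but which block each numbered task touches is inferred from the loop body.

```c
enum kind { TRSM_LEFT, GEMM, TRSM_RIGHT, TRINV };

/* Hypothetical task record. Block (r, c) of an nb x nb matrix of blocks is
 * encoded as r * nb + c. Each task reads up to two blocks and updates
 * exactly one block in place. */
struct task {
    enum kind kind;
    int in[2];   /* blocks that are only read; -1 if unused */
    int out;     /* block that is read and written */
};

/* Store task number n if it fits in the array; always return n + 1. */
static int emit(struct task *t, int cap, int n, enum kind k,
                int in0, int in1, int out)
{
    if (n < cap) {
        t[n].kind = k;
        t[n].in[0] = in0;
        t[n].in[1] = in1;
        t[n].out = out;
    }
    return n + 1;
}

/* Unroll the Trinv loop over an nb x nb matrix of blocks. Writes at most
 * cap tasks to t and returns the total number of tasks generated. */
int unroll_trinv(int nb, struct task *t, int cap)
{
    int n = 0;
    for (int j = 0; j < nb; ++j) {
        for (int c = j + 1; c < nb; ++c)       /* A12 := -inv(A11) * A12 */
            n = emit(t, cap, n, TRSM_LEFT, j * nb + j, -1, j * nb + c);
        for (int r = 0; r < j; ++r)            /* A02 := A01 * A12 + A02 */
            for (int c = j + 1; c < nb; ++c)
                n = emit(t, cap, n, GEMM, r * nb + j, j * nb + c, r * nb + c);
        for (int r = 0; r < j; ++r)            /* A01 := A01 * inv(A11) */
            n = emit(t, cap, n, TRSM_RIGHT, j * nb + j, -1, r * nb + j);
        n = emit(t, cap, n, TRINV, -1, -1, j * nb + j); /* A11 := inv(A11) */
    }
    return n;
}

/* Nonzero if later task b must wait for earlier task a: they share a block
 * and at least one of them writes it (flow, anti, or output dependence). */
int depends(const struct task *a, const struct task *b)
{
    int flow_or_output = b->out == a->out
                      || b->in[0] == a->out || b->in[1] == a->out;
    int anti = a->in[0] == b->out || a->in[1] == b->out;
    return flow_or_output || anti;
}
```

Calling `unroll_trinv(3, tasks, 16)` yields ten tasks in the order Trsm, Trsm, Trinv, Trsm, Gemm, Trsm, Trinv, Trsm, Trsm, Trinv. Testing `depends` on every earlier/later pair gives the edge set, including edges that are implied transitively; a real generator would likely prune those.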

25 Outline
  - Inversion of a Triangular Matrix
  - Requisite Semantic Information
  - Static Generation of a Directed Acyclic Graph
  - Performance
  - Conclusion

26 Performance
LabVIEW
  - Graphical, data flow programming language (G)
      - Anti-dependencies cannot exist in G
      - Copies are made when a wire is split

27 Performance

28 Performance
Target Architecture
  - 16-core AMD processor
      - 4-socket quad-core Opteron, 1.9 GHz
      - 4 GB of RAM per socket
  - LabVIEW 8.6
      - Windows XP
  - Basic Linear Algebra Subprograms (BLAS)
      - MKL 7.2

29 Performance

30 Performance
Results
  - Parallelism
      - Exploit parallelism inherent within DAG
  - Hierarchical matrix storage
      - Spatial locality
  - Overhead
      - Copy matrix from flat row-major storage to hierarchical matrix and back
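The copy overhead mentioned above comes from rearranging a flat row-major matrix so that each block is contiguous in memory. The sketch below illustrates one such layout. It is not the FLASH storage format: the function names, the row-major order inside each block, and the requirement that the block size `b` divides `n` are all assumptions.

```c
#include <stddef.h>

/* Offset of element (i, j) in block-contiguous storage. Block (I, J) of
 * size b x b starts at (I * (n / b) + J) * b * b and is row-major inside.
 * Hypothetical layout; assumes b divides n. */
static size_t blocked_index(size_t n, size_t b, size_t i, size_t j)
{
    size_t blk = (i / b) * (n / b) + (j / b);
    return blk * b * b + (i % b) * b + (j % b);
}

/* Copy an n x n flat row-major matrix into block-contiguous storage. */
void flat_to_blocked(size_t n, size_t b, const double *flat, double *blocked)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            blocked[blocked_index(n, b, i, j)] = flat[i * n + j];
}

/* Copy block-contiguous storage back to a flat row-major matrix. */
void blocked_to_flat(size_t n, size_t b, const double *blocked, double *flat)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            flat[i * n + j] = blocked[blocked_index(n, b, i, j)];
}
```

Both copies touch every element once, so the cost is proportional to n^2. A task that works on one block then reads b*b consecutive values, which is the spatial locality the slide refers to.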

31 Performance

32 Outline
  - Inversion of a Triangular Matrix
  - Requisite Semantic Information
  - Static Generation of a Directed Acyclic Graph
  - Performance
  - Conclusion

33 Conclusion
  - Instantiate linear algebra algorithm using a code generation intermediary
  - Statically produce a directed acyclic graph by fixing block size or loop unrolling factor
  - XML → FLASH → DAG

34 Acknowledgments
Jim Nagle, Robert van de Geijn
  - We thank the other members of the FLAME team for their support
Funding
  - National Instruments
  - NSF Grants
      - CCF-0540926
      - CCF-0702714

35 Conclusion
More Information
  - http://www.cs.utexas.edu/~flame
Questions?
  - echan@cs.utexas.edu

