Published by Antony Sennett. Modified over 3 years ago.

1
A Proposal of Operation History Management System for Source-to-Source Optimization of HPC Programs
Yasushi Negishi, Hiroki Murata and Takao Moriyama
Deep Computing, Tokyo Research Laboratory, IBM Research
19-20 July, 2009 | PADTAD 2009 @ Chicago, Illinois
© 2009 IBM Corporation

2
Outline of this Presentation
1. Proposal of an algorithm for managing the operation history of source-to-source optimization.
2. Prototype system with a new user interface for managing the operation history explicitly.

4
Background
Improvement in single-processor performance is stalling, and supercomputer architectures are becoming more complex.
– Architecture-specific optimizations are needed to exploit the various kinds of network and processor architectures and achieve reasonable performance.
Application areas for numerical simulation continue to expand.
– We need to solve performance issues more effectively and more easily.
Source-to-source optimization tools are becoming important.
– Automatic conversion (a.k.a. refactoring) for optimization.
– Support for typical architecture-specific and application-specific performance optimization patterns.
– Reduce programmer time and human error by supporting routine but troublesome optimizations.

5
Typical Source-to-Source Optimization Steps
Strength reduction
– Replace a costly operation with an equivalent but less expensive operation.
  E.g.
      x = r ** (-1)
  →
      x = 1 / r
– Steps
  1. Modify the code to use the less expensive operation by manual editing.
Loop unrolling & SIMDization
– Use SIMD instructions if the compiler does not generate optimal SIMD instructions in a loop.
  E.g.
      x(i) = a(i) + b(i) * c(i)
      x(i+1) = a(i+1) + b(i+1) * c(i+1)
  →
      x(i) = FPMADD(a(i), b(i), c(i))
– Steps
  1. Unroll the loop by automatic conversion, specifying the range and the unroll factor.
  2. Replace the unrolled loop body with inline assembly code for SIMD by manual editing.
Loop tiling (a.k.a. loop blocking, strip-mine and interchange)
– Change the loop structure to increase memory access locality and the cache hit ratio.
  E.g.
      for (i=0; i<N; i++)
        for (j=0; j<N; j++)
          c[i] = c[i] + a[i][j]*b[j];
  →
      for (i=0; i<N; i+=Bi)
        for (j=0; j<N; j+=Bj)
          for (ii=i; ii<min(i+Bi,N); ii++)
            for (jj=j; jj<min(j+Bj,N); jj++)
              c[ii] = c[ii] + a[ii][jj]*b[jj];
– Steps
  1. Modify the loop by automatic conversion, specifying the range and the blocking factors.
Optimization steps are combinations of automatic conversion and manual editing.
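The loop tiling example above can be checked with a small sketch. This is an illustration only, written in Python rather than the C of the slide; the function names `matvec_naive` and `matvec_tiled` and the test sizes are assumptions made here, not part of the presentation.

```python
def matvec_naive(a, b):
    """c[i] = sum over j of a[i][j] * b[j], using the untiled loop nest."""
    n = len(b)
    c = [0.0] * n
    for i in range(n):
        for j in range(n):
            c[i] += a[i][j] * b[j]
    return c


def matvec_tiled(a, b, bi, bj):
    """Same computation with the loop nest tiled by blocking factors bi and bj."""
    n = len(b)
    c = [0.0] * n
    for i in range(0, n, bi):
        for j in range(0, n, bj):
            # min() clips the last tile when n is not a multiple of the factor.
            for ii in range(i, min(i + bi, n)):
                for jj in range(j, min(j + bj, n)):
                    c[ii] += a[ii][jj] * b[jj]
    return c


n = 7  # deliberately not a multiple of the blocking factors used below
a = [[i * n + j for j in range(n)] for i in range(n)]
b = list(range(1, n + 1))
same = matvec_naive(a, b) == matvec_tiled(a, b, 3, 2)
```

For each row the tiled version still visits the columns in ascending order, so both functions accumulate in the same order and should return identical results; only the memory access pattern changes.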

6
“Reapplication Conflict”
Because optimization work is trial and error by nature, it is sometimes necessary to undo a past operation, or to insert or change a past operation, even when a single user manages the code.
We call this conflict, caused by a single user, a “Reapplication Conflict”.
A system for supporting source-to-source optimization should handle this conflict correctly.

7
Issues of Existing Version Management Systems Handling “Reapplication Conflict”
Because optimization work is trial and error by nature, it is sometimes necessary to undo a past operation, or to insert or change a past operation, even when a single user manages the code.
– We call this conflict, caused by a single user, a “Reapplication Conflict”.
The system should handle this conflict correctly.
Existing version management systems use the algorithm of the “patch” command, or a similar one, to handle conflicts. But the patch algorithm has an issue.
– For modifications made by manual editing, the patch algorithm works fine. It applies the difference produced by an operation to a different base code, adjusting the target range to which the difference is applied.
– For modifications made by automatic conversion, the patch algorithm may generate unexpected results.
A scenario in which an existing system does not work as expected is shown next.

8
Example Scenario of “Reapplication Conflict” (original)
The original code is checked out.
Operation history: Original

      program sample
      implicit none
      integer i, n
      parameter(n=10000000)
      real*8 a, b, pi, x(n), sin, s, t1, t2, t3, rtc
      a = 0
      b = 0
      pi = 3.14159265d0
      s = rtc()
      do i = 1, n
        x(i) = i * sin(i / (pi * 4.0d0))
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
        a = a + x(i) ** (-1)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n
        b = b + ((x(i) + a) / (pi * 4.0d0) + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end

9
Example Scenario of “Reapplication Conflict” (Step 1)
Step 1: Perform loop-invariant code motion by manual editing, and check it in.
Operation history: Original → A (Step 1)

      program sample
      implicit none
      integer i, n
      parameter(n=10000000)
      real*8 a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc
      a = 0
      b = 0
      pi = 3.14159265d0
      s = rtc()
      fourpi = pi * 4.0d0
      do i = 1, n
        x(i) = i * sin(i / fourpi)
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
        a = a + x(i) ** (-1)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n
        b = b + ((x(i) + a) / fourpi + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end

10
Example Scenario of “Reapplication Conflict” (Step 2)
Step 2: Perform strength reduction by manual editing, and check it in.
Operation history: Original → A (Step 1) → B (Step 2)

      program sample
      implicit none
      integer i, n
      parameter(n=10000000)
      real*8 a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc
      a = 0
      b = 0
      pi = 3.14159265d0
      s = rtc()
      fourpi = pi * 4.0d0
      do i = 1, n
        x(i) = i * sin(i / fourpi)
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
        a = a + 1.0d0 / x(i)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n
        b = b + ((x(i) + a) / fourpi + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end

11
Example Scenario of “Reapplication Conflict” (Step 3)
Step 3: Perform loop unrolling by automatic conversion, and check it in.
Operation history: Original → A (Step 1) → B (Step 2) → C (Step 3)

      program sample
      implicit none
      integer i, n
      parameter(n=10000000)
      real*8 a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc
      a = 0
      b = 0
      pi = 3.14159265d0
      s = rtc()
      fourpi = pi * 4.0d0
      do i = 1, n
        x(i) = i * sin(i / fourpi)
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
        a = a + 1.0d0 / x(i)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n, 4
        b = b + ((x(i) + a) / fourpi + 1.0d0)
        b = b + ((x(i+1) + a) / fourpi + 1.0d0)
        b = b + ((x(i+2) + a) / fourpi + 1.0d0)
        b = b + ((x(i+3) + a) / fourpi + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end

12
Example Scenario of “Reapplication Conflict” (Step 4)
Step 4: Compile and execute the code, and analyze the effects of the optimizations. (The code is unchanged from Step 3.)
Operation history: Original → A (Step 1) → B (Step 2) → C (Step 3)
The following results are found:
  Optimization A: not effective (N.G.)
  Optimization B: effective (O.K.)
  Optimization C: effective (O.K.)

13
Example Scenario of “Reapplication Conflict” (Step 5)
Step 5: Undo optimization A with the “patch” command.
Operation history so far: Original → A (Step 1) → B (Step 2) → C (Step 3)

      program sample
      implicit none
      integer i, n
      parameter(n=10000000)
      real*8 a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc
      a = 0
      b = 0
      pi = 3.14159265d0
      s = rtc()
      fourpi = pi * 4.0d0
      do i = 1, n
        x(i) = i * sin(i / fourpi)
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
        a = a + 1.0d0 / x(i)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n, 4
        b = b + ((x(i) + a) / fourpi + 1.0d0)
        b = b + ((x(i+1) + a) / fourpi + 1.0d0)
        b = b + ((x(i+2) + a) / fourpi + 1.0d0)
        b = b + ((x(i+3) + a) / fourpi + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end

Legend (highlighted lines in the code): target of optimization A; not a target of optimization A, but influenced by it.

14
Example Scenario of “Reapplication Conflict” (Final Results)

      program sample
      implicit none
      integer i, n
      parameter(n=10000000)
      real*8 a, b, pi, x(n), sin, s, t1, t2, t3, rtc
      a = 0
      b = 0
      pi = 3.14159265d0
      s = rtc()
      do i = 1, n
        x(i) = i * sin(i / (pi * 4.0d0))
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
        a = a + 1 / x(i)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n, 4
        b = b + ((x(i) + a) / (pi * 4.0d0) + 1.0d0)
        b = b + ((x(i+1) + a) / fourpi + 1.0d0)
        b = b + ((x(i+2) + a) / fourpi + 1.0d0)
        b = b + ((x(i+3) + a) / fourpi + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end

Problem: the wrong line is unrolled! “patch” does not actually apply the automatic conversion operation again; it only applies the difference between the code before and after the automatic conversion.
A system for managing automatic conversion operations is needed. It must:
(1) Adjust the target range.
(2) Actually apply the automatic operation again.
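The failure above can be reproduced with a deliberately simplified sketch. It is not the real “patch” algorithm: exact-line matching stands in for hunk matching, and the names `A_INSERTED`, `A_CHANGED` and `undo_textually` are hypothetical. It only illustrates why undoing operation A as a textual difference leaves the unrolled copies untouched.

```python
# Line-level effect of operation A (loop-invariant code motion),
# roughly as a textual difference would record it. Assumed for illustration.
A_INSERTED = ["fourpi = pi * 4.0d0"]
A_CHANGED = {  # line before A -> line after A
    "x(i) = i * sin(i / (pi * 4.0d0))": "x(i) = i * sin(i / fourpi)",
    "b = b + ((x(i) + a) / (pi * 4.0d0) + 1.0d0)":
        "b = b + ((x(i) + a) / fourpi + 1.0d0)",
}


def undo_textually(lines):
    """Reverse operation A by exact-line matching, as a line-based diff would."""
    reverse = {new: old for old, new in A_CHANGED.items()}
    out = []
    for line in lines:
        if line in A_INSERTED:
            continue  # drop the line that A inserted
        out.append(reverse.get(line, line))  # restore lines that A changed
    return out


# Excerpt of the code after Step 3 (loop already unrolled by operation C).
step3_excerpt = [
    "fourpi = pi * 4.0d0",
    "do i = 2, n, 4",
    "b = b + ((x(i) + a) / fourpi + 1.0d0)",
    "b = b + ((x(i+1) + a) / fourpi + 1.0d0)",
    "b = b + ((x(i+2) + a) / fourpi + 1.0d0)",
    "b = b + ((x(i+3) + a) / fourpi + 1.0d0)",
    "enddo",
]

result = undo_textually(step3_excerpt)
dangling = [line for line in result if "fourpi" in line]
```

In this sketch `dangling` holds the three unrolled copies that still refer to `fourpi` even though its definition has been removed, which matches the final result shown on the slide.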

15
Proposed Algorithm for saving/applying automatic operations
Manual editing (handled by the patch algorithm)
– Saving an operation: Original code → manual editing → Optimization results. The context difference file is saved as the operation log.
– Applying a saved operation: Modified code + operation log (context difference file) → patch algorithm → Optimized results on modified code.
Automatic conversion (handled by our proposed algorithm)
– Saving an operation: Original code → specify range → Pseudo change file → specify conversion ID and arguments → Optimization results. The operation log holds the context difference file, the conversion ID, and the arguments.
– Applying a saved operation: Modified code + context difference file → patch algorithm → Pseudo change file → apply automatic conversion (conversion ID and arguments from the operation log) → Optimization results.

16
Scenario of Proposed Algorithm to Save Automatic Operations
Algorithm for saving operation history

File before editing:

      program sample
      implicit none
      integer i, n
      parameter(n=10000000)
      real*8 a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc
      a = 0
      b = 0
      pi = 3.14159265d0
      s = rtc()
      fourpi = pi * 4.0d0
      do i = 1, n
        x(i) = i * sin(i / fourpi)
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
        a = a + 1.0d0 / x(i)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n
        b = b + ((x(i) + a) / fourpi + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end

Step 1: Generate a pseudo change file by inserting special lines that specify the range for the automatic operation.

Pseudo change file:

      program sample
      implicit none
      integer i, n
      parameter(n=10000000)
      real*8 a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc
      a = 0
      b = 0
      pi = 3.14159265d0
      s = rtc()
      fourpi = pi * 4.0d0
      do i = 1, n
        x(i) = i * sin(i / fourpi)
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
        a = a + 1.0d0 / x(i)
      enddo
      t2 = rtc() - s
      s = rtc()
      $BEGIN
      do i = 2, n
        b = b + ((x(i) + a) / fourpi + 1.0d0)
      enddo
      $END
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end

Step 2: Create a context difference file between the file before editing and the pseudo change file.

Step 3: Save the identifier of the automatic conversion operation (e.g. “loop unrolling”), its parameter (e.g. “4”), and the context difference file as its operation log.

Operation log
Identifier of automatic conversion: “loop unrolling”
Parameter: 4
Context difference file:

*** opeB.F Sat Jul 11 11:36:34 2009
--- opeC2.F Sun Jul 12 13:36:10 2009
***************
*** 19,27 ****
--- 19,29 ----
        enddo
        t2 = rtc() - s
        s = rtc()
+ $BEGIN
        do i = 2, n
          b = b + ((x(i) + a) / fourpi + 1.0d0)
        enddo
+ $END
        t3 = rtc() - s
        write(*,*) 'a=', a, 'b=', b
        write(*,*) 'time=', t1, t2, t3

By saving this context difference file, the range-adjustment algorithm of the “patch” command can be used to identify the target range of the automatic conversion.
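The three saving steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `OperationLog` class and the `save_operation` function are hypothetical names, and Python's standard `difflib.context_diff` stands in for whatever tool produced the context difference file on the slide.

```python
import difflib
from dataclasses import dataclass


@dataclass
class OperationLog:
    """Hypothetical record of one automatic conversion."""
    conversion_id: str  # e.g. "loop unrolling"
    arguments: tuple    # e.g. (4,)
    context_diff: str   # difference between the file and its pseudo change file


def save_operation(source_lines, first, last, conversion_id, arguments):
    """Record a conversion over source_lines[first..last] (0-based, inclusive)."""
    # Step 1: build the pseudo change file by inserting the range markers.
    pseudo = (source_lines[:first] + ["$BEGIN"]
              + source_lines[first:last + 1] + ["$END"]
              + source_lines[last + 1:])
    # Step 2: context difference between the original and the pseudo change file.
    diff = "\n".join(difflib.context_diff(
        source_lines, pseudo, fromfile="before", tofile="pseudo", lineterm=""))
    # Step 3: save identifier, parameters and difference together.
    return OperationLog(conversion_id, tuple(arguments), diff)


source = [
    "s = rtc()",
    "do i = 2, n",
    "  b = b + ((x(i) + a) / fourpi + 1.0d0)",
    "enddo",
    "t3 = rtc() - s",
]
log = save_operation(source, 1, 3, "loop unrolling", [4])
```

The point of the design is that the log stores no conversion output at all, only where the conversion was applied and with which parameters, so the conversion can later be rerun on different code.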

17
Scenario of Proposed Algorithm to Apply Automatic Operation (Step 1)
Algorithm for applying operation history on modified target code
Step 1: Apply the context difference file to the target program, using the algorithm used by the “patch” command.
  Trial 1: Apply the history at the same position. (Not Match)
  Trial 2: Ignore the starting and ending line numbers. (Match)
  Trial 3: Ignore the outermost one line before/after the modification.
  Trial 4: Ignore the outermost two lines before/after the modification.

Target program:

      program sample
      implicit none
      integer i, n
      parameter(n=10000000)
      real*8 a, b, pi, x(n), sin, s, t1, t2, t3, rtc
      a = 0
      b = 0
      pi = 3.14159265d0
      s = rtc()
      do i = 1, n
        x(i) = i * sin(i / (pi * 4.0d0))
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
        a = a + x(i) ** (-1)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n
        b = b + ((x(i) + a) / (pi * 4.0d0) + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end

Operation log
Identifier of automatic conversion: “loop unrolling”
Parameter: 4
Context difference file:

*** opeB.F Sat Jul 11 11:36:34 2009
--- opeC2.F Sun Jul 12 13:36:10 2009
***************
*** 19,27 ****
--- 19,29 ----
        enddo
        t2 = rtc() - s
        s = rtc()
+ $BEGIN
        do i = 2, n
          b = b + ((x(i) + a) / fourpi + 1.0d0)
        enddo
+ $END
        t3 = rtc() - s
        write(*,*) 'a=', a, 'b=', b
        write(*,*) 'time=', t1, t2, t3

Pseudo change file:

      program sample
      implicit none
      integer i, n
      parameter(n=10000000)
      real*8 a, b, pi, fourpi, x(n), sin, s, t1, t2, t3, rtc
      a = 0
      b = 0
      pi = 3.14159265d0
      s = rtc()
      fourpi = pi * 4.0d0
      do i = 1, n
        x(i) = i * sin(i / fourpi)
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
        a = a + 1.0d0 / x(i)
      enddo
      t2 = rtc() - s
      s = rtc()
      $BEGIN
      do i = 2, n
        b = b + ((x(i) + a) / fourpi + 1.0d0)
      enddo
      $END
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end
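The sequence of trials can be sketched as a search that progressively relaxes the match. This is a simplified approximation written for illustration, not the code of the “patch” command or of the prototype; the function `find_hunk` is a hypothetical name, and the sketch assumes that every line of the saved hunk, including the lines inside the marked range, is compared literally.

```python
def find_hunk(target, old_lines, recorded_start, max_fuzz=2):
    """Locate old_lines in target; return (start_index, trial_number) or None.

    Trial 1 tries the recorded position with the full context.
    Trial 2 ignores the position, trials 3 and 4 additionally ignore the
    outermost one and two lines on each side of the hunk.
    """
    def matches_at(pos, lines):
        return pos >= 0 and target[pos:pos + len(lines)] == lines

    if matches_at(recorded_start, old_lines):
        return recorded_start, 1
    for fuzz in range(max_fuzz + 1):
        core = old_lines[fuzz:len(old_lines) - fuzz]
        if not core:
            break  # nothing left to match against
        for pos in range(len(target) - len(core) + 1):
            if matches_at(pos, core):
                # Report where the full hunk would start.
                return pos - fuzz, fuzz + 2
    return None


hunk = [  # two context lines, the marked loop, two context lines
    "t2 = rtc() - s",
    "s = rtc()",
    "do i = 2, n",
    "b = b + x(i)",
    "enddo",
    "t3 = rtc() - s",
    "write(*,*) b",
]
# Same code, but one line earlier than when the operation was saved.
target_shifted = ["program sample", "a = 0"] + hunk + ["end"]
# Same as above, with the last context line edited by hand.
target_edited = target_shifted[:8] + ["write(*,*) 'b=', b", "end"]

found_shifted = find_hunk(target_shifted, hunk, recorded_start=3)
found_edited = find_hunk(target_edited, hunk, recorded_start=3)
```

Once the hunk is located, the marked range follows from the known number of context lines before the `$BEGIN` marker, so the markers can be reinserted into the target to form its pseudo change file.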

18
Scenario of Proposed Algorithm to Apply Automatic Operation (Step 2)
Algorithm for applying operation history on modified target code
Step 2: Redo the automatic conversion with its parameter saved in the operation log.
The operation log (identifier of automatic conversion “loop unrolling”, parameter 4, and the context difference file between opeB.F and opeC2.F) and the pseudo change file are the same as in Step 1.
Redo “loop unrolling” with parameter “4” on “the loop” enclosed by $BEGIN and $END.
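Redoing the conversion on the marked range can be sketched with a toy unroller. It is an assumption-heavy illustration, not the prototype's transformation: `unroll_marked_loop` is a hypothetical name, it handles only a single `do` loop without an explicit step, it rewrites only index expressions of the literal form `(i)`, and like the slides it emits no remainder loop.

```python
import re


def unroll_marked_loop(lines, factor):
    """Unroll the single do-loop enclosed by $BEGIN/$END markers by `factor`."""
    begin = lines.index("$BEGIN")
    end = lines.index("$END")
    header, *body, footer = [line.strip() for line in lines[begin + 1:end]]
    match = re.fullmatch(r"do (\w+) = ([^,]+), ([^,]+)", header)
    if match is None or footer != "enddo":
        raise ValueError("marked range is not a simple do-loop")
    var = match.group(1)
    unrolled = [f"{header}, {factor}"]  # add the unroll factor as the loop step
    for offset in range(factor):
        for stmt in body:
            if offset:
                # x(i) becomes x(i+1), x(i+2), ... in the later copies.
                stmt = stmt.replace(f"({var})", f"({var}+{offset})")
            unrolled.append("  " + stmt)
    unrolled.append("enddo")
    # The markers are dropped; everything outside the range is kept as is.
    return lines[:begin] + unrolled + lines[end + 1:]


pseudo_target = [
    "s = rtc()",
    "$BEGIN",
    "do i = 2, n",
    "  b = b + ((x(i) + a) / (pi * 4.0d0) + 1.0d0)",
    "enddo",
    "$END",
    "t3 = rtc() - s",
]
unrolled_target = unroll_marked_loop(pseudo_target, 4)
```

Because the conversion runs on the target's own loop body, every unrolled copy uses `(pi * 4.0d0)` rather than the `fourpi` of the code on which the operation was originally recorded.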

19
Proposed Algorithm to Apply Automatic Operation (Final Results)

      program sample
      implicit none
      integer i, n
      parameter(n=10000000)
      real*8 a, b, pi, x(n), sin, s, t1, t2, t3, rtc
      a = 0
      b = 0
      pi = 3.14159265d0
      s = rtc()
      do i = 1, n
        x(i) = i * sin(i / (pi * 4.0d0))
      enddo
      t1 = rtc() - s
      s = rtc()
      do i = 1, n
        a = a + 1 / x(i)
      enddo
      t2 = rtc() - s
      s = rtc()
      do i = 2, n, 4
        b = b + ((x(i) + a) / (pi * 4.0d0) + 1.0d0)
        b = b + ((x(i+1) + a) / (pi * 4.0d0) + 1.0d0)
        b = b + ((x(i+2) + a) / (pi * 4.0d0) + 1.0d0)
        b = b + ((x(i+3) + a) / (pi * 4.0d0) + 1.0d0)
      enddo
      t3 = rtc() - s
      write(*,*) 'a=', a, 'b=', b
      write(*,*) 'time=', t1, t2, t3
      end

Problem solved. The correct line is unrolled!
The proposed system can reapply automatic conversion operations correctly.

20
Outline of this Presentation
1. Proposal of an algorithm for managing the operation history of source-to-source optimization.
2. Prototype system with a new user interface for managing the operation history explicitly.

21
Prototype Implementation of the Proposed System
Implemented as an Eclipse plug-in module
– Works with the open source CDT/Photran modules
– Uses CDT/Photran’s C/Fortran parser
Components (architecture diagram): Eclipse; Photran module (Fortran, open source); CDT module (C, open source); HPC refactoring module; pre-defined transformation rules; user-defined transformation rules.

22
Proposal of user interface for operation history management system
Views (screenshot): source code tree view, source code view, operation history view, information and console output view.
1. The operation history is displayed as a sequence, and the user can select and modify any point of the source code.
2. The succeeding operations are automatically reapplied as needed to produce a new version according to the user’s instructions.
3. Operations are classified into the following three categories according to the status and necessity of reapplication, and are displayed in three colors.
   Green: applied
   Yellow: application not yet attempted
   Red: application attempted, but failed
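The three status categories and the automatic reapplication of succeeding operations can be modelled as below. This is only a sketch of one possible behavior: the names `Status` and `replay` are hypothetical, operations are modelled as plain functions that raise `ValueError` when they cannot be applied, and stopping at the first failure is an assumption made here rather than something the slides state.

```python
from enum import Enum


class Status(Enum):
    APPLIED = "green"     # applied
    NOT_TRIED = "yellow"  # application not yet attempted
    FAILED = "red"        # application attempted, but failed


def replay(original, operations):
    """Reapply operations in order; stop at the first one that fails."""
    lines = list(original)
    statuses = [Status.NOT_TRIED] * len(operations)
    for k, operation in enumerate(operations):
        try:
            lines = operation(lines)
        except ValueError:
            statuses[k] = Status.FAILED
            break  # later operations stay NOT_TRIED
        statuses[k] = Status.APPLIED
    return lines, statuses


def strength_reduction(lines):
    """Toy stand-in for operation B."""
    if not any("x(i) ** (-1)" in line for line in lines):
        raise ValueError("nothing to reduce")
    return [line.replace("x(i) ** (-1)", "1.0d0 / x(i)") for line in lines]


def requires_fourpi(lines):
    """Toy operation that only applies to code containing fourpi."""
    if not any("fourpi" in line for line in lines):
        raise ValueError("fourpi not found")
    return lines


original = ["a = a + x(i) ** (-1)"]
new_version, statuses = replay(
    original, [strength_reduction, requires_fourpi, strength_reduction])
```

In this model the history view would show the first operation in green, the second in red, and the third in yellow, and the user could then edit or remove the failing operation and replay again.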

23
Conclusion
1. We explained our proposal of an algorithm for managing the operation history of source-to-source optimization.
2. We explained a prototype system with a new user interface for managing the operation history explicitly.

24
Questions?
