An Evaluation of Auto-Scoping in OpenMP


1 An Evaluation of Auto-Scoping in OpenMP
Michael Voss, Eric Chiu, Patrick Chow, Catherine Wong and Kevin Yuen
ECE Department
University of Toronto

2 An Overview of Auto-scoping
Dieter an Mey proposed auto-scoping as an extension to OpenMP
Relieve users from the burden of explicit scoping
    error prone
    tedious
A compromise: explicit and automatic parallelization
    analysis is similar to automatic parallelization
    successful in 1 of 2 scientific programs
WOMPAT 2004

3 Using DEFAULT(AUTO)
With explicit scoping:
C$OMP PARALLEL DO SHARED(A,B)
C$OMP&PRIVATE(I,J)
      DO I = 1,100
        DO J = 1,100
          A(I,J) = A(J,I) + B(I,J)
        ENDDO
      ENDDO
C$OMP END PARALLEL DO
With DEFAULT(AUTO):
C$OMP PARALLEL DO
C$OMP&DEFAULT(AUTO)
      DO I = 1,100
        DO J = 1,100
          A(I,J) = A(J,I) + B(I,J)
        ENDDO
      ENDDO
C$OMP END PARALLEL DO

4 Outline of Talk
Introduction
Implementing DEFAULT(AUTO) in Polaris
An evaluation of DEFAULT(AUTO) in Polaris
    comparison with EA Sun Studio 9 F95 compiler
A Discussion of runtime support
Related Work
Conclusion

5 Implementing DEFAULT(AUTO) in Polaris
Polaris is an auto-parallelizer for Fortran 77
Supports a range of advanced techniques
    The Range Test
    The Omega Test
    Array and Scalar Privatization
    Array and Scalar Reduction Recognition
    Induction Variable Substitution
    Interprocedural Constant Propagation
Most Interprocedural Optimization by Inlining

6 Polaris as an OMP to OMP Translator
[Diagram: translation paths through Polaris]
Passes shown: Parser, DDtest pass, Reduction pass, Privatization pass
Backends shown: OpenMP Backend, Moerae Backend
Inputs and outputs shown: Fortran 77, Fortran 77 + OpenMP, Fortran 77 + Moerae calls
Paths shown:
    Original automatic parallelization path
    OpenMP to explicitly threaded code path
    New OpenMP to OpenMP path

7 Supporting DEFAULT(AUTO)
Parse DEFAULT(AUTO)
React appropriately to user directives
    selective loop parallelization
    no changes without AUTO directive
    user scoping overrides Polaris scoping
    can parallelize loops that cannot be fully auto-scoped
Limitations
    only regions with PARALLEL DO semantics
    bails out on general parallel regions
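The precedence rules on this slide can be sketched as a small merge step. The Python below is an illustration only, not Polaris code: the function name merge_scoping, the dictionary representation and the scope strings (including "UNKNOWN") are assumptions made for the example.

```python
def merge_scoping(auto_scopes, user_scopes, default_clause):
    """Combine analysis results with user clauses for one parallel region.

    auto_scopes:    {variable: scope} as an analysis pass might report it
                    (hypothetical representation)
    user_scopes:    {variable: scope} taken from explicit user clauses
    default_clause: the region's DEFAULT(...) value, e.g. "AUTO" or "SHARED"
    """
    if default_clause.upper() != "AUTO":
        # No AUTO directive: leave the user's scoping exactly as written.
        return dict(user_scopes)
    merged = dict(auto_scopes)
    # Explicit user scoping overrides whatever the analysis decided.
    merged.update(user_scopes)
    return merged


# Hypothetical region: the analysis cannot scope D, but the user supplies it.
auto = {"T": "PRIVATE", "I": "PRIVATE", "D": "UNKNOWN"}
user = {"D": "SHARED"}
print(merge_scoping(auto, user, "AUTO"))
```

In this sketch a variable the analysis could not scope becomes usable once the user scopes it, which is one way to read "can parallelize loops that cannot be fully auto-scoped".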

8 Example 1: No explicit scoping
!$OMP PARALLEL DEFAULT(AUTO)
      DO N = 1,7
        DO M = 1,7
!$OMP DO
          DO L = LSS(itsub),LEE(itsub)
            I = IG(L)
            J = JG(L)
            K = KG(L)
            LIJK = L2IJK(L)
            RHS(L,M) = RHS(L,M)
     +        - FJAC(LIJK,LM00,M,N)*DQCO(i-1,j,k,n,NB)*FM00(L)
     +        - FJAC(LIJK,LP00,M,N)*DQCO(i+1,j,k,n,NB)*FP00(L)
     +        - FJAC(LIJK,L0M0,M,N)*DQCO(i,j-1,k,n,NB)*F0M0(L)
     +        - FJAC(LIJK,L0P0,M,N)*DQCO(i,j+1,k,n,NB)*F0P0(L)
          ENDDO
!$OMP END DO NOWAIT
        ENDDO
      ENDDO
!$OMP END PARALLEL

9 Example 1: No explicit scoping
!$OMP PARALLEL
!$OMP+DEFAULT(SHARED)
!$OMP+PRIVATE(M,L,N)
      DO n = 1, 7, 1
      DO m = 1,7, 1
!$OMP DO
      DO l = lss(itsub), lee(itsub), 1
      rhs(l, m) = rhs(l, m)+(-dqco(ig(l), (-1)+jg(l), kg(l), n, nb))*
     *f0m0(l)*fjac(l2ijk(l), l0m0, m, n)+(-dqco(ig(l), 1+jg(l), kg(l), n
     *, nb))*f0p0(l)*fjac(l2ijk(l), l0p0, m, n)+(-dqco((-1)+ig(l), jg(l)
     *, kg(l), n, nb))*fjac(l2ijk(l), lm00, m, n)*fm00(l)+(-dqco(1+ig(l)
     *, jg(l), kg(l), n, nb))*fjac(l2ijk(l), lp00, m, n)*fp00(l)
      ENDDO
!$OMP END DO NOWAIT
      ENDDO
      ENDDO
!$OMP END PARALLEL

10 Example 2: Explicit scoping
      SUBROUTINE RECURSION(n,k,a,b,c,d,e,f,g,h,s)
      REAL*8 A(*),B(*),C(*),D(*),E(*),F(*),G(*),H(*)
      REAL*8 T,S
      INTEGER N,K,I
      S = 0.0D0
C$OMP PARALLEL SHARED(D)
C$OMP+DEFAULT(AUTO)
C$OMP DO
      DO I = 1,N
        T = F(I) + G(I)
        A(I) = B(I) + C(I)
        D(I+K) = D(I) + E(I)
        H(I) = H(I) * T
        S = S + H(I)
      END DO
C$OMP END DO
C$OMP END PARALLEL
      END

11 Example 2: Explicit scoping
      SUBROUTINE recursion(n, k, a, b, c, d, e, f, g, h, s)
      DOUBLE PRECISION a, b, c, d, e, f, g, h, s, t
      INTEGER*4 i, k, n
      DIMENSION a(*), b(*), c(*), d(*), e(*), f(*), g(*), h(*)
      s = 0.0D0
!$OMP PARALLEL
!$OMP+DEFAULT(SHARED)
!$OMP+PRIVATE(T,I)
!$OMP DO
!$OMP+REDUCTION(+:s)
      DO i = 1, n, 1
        t = f(i)+g(i)
        a(i) = b(i)+c(i)
        d(i+k) = d(i)+e(i)
        h(i) = h(i)*t
        s = h(i)+s
      ENDDO
!$OMP END DO
!$OMP END PARALLEL
      RETURN
      END
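The scalar scoping shown on this slide (T and I private, S a sum reduction, everything else shared) can be reproduced with a much simpler rule set than Polaris actually uses. The Python below is a toy sketch under stated assumptions: the Assign record and the classify function are hypothetical, array subscripts are ignored, and arrays are simply assumed to have passed dependence testing (in the slide, D is scoped SHARED by the user).

```python
from collections import namedtuple

# One assignment in the loop body. `reads` lists the names on the right-hand
# side; `op` is the operator combining the target with the rest of the
# expression when the statement has the form `x = x op expr`, else None.
Assign = namedtuple("Assign", ["target", "reads", "op"])


def classify(loop_index, arrays, body):
    """Toy scoping of one loop; returns {name: scope}."""
    scopes = {loop_index: "PRIVATE"}
    # Assumption: dependence testing on arrays happens elsewhere and passed.
    for name in arrays:
        scopes[name] = "SHARED"

    written = set()      # scalars assigned so far in the iteration
    exposed = set()      # scalars read before any write in the iteration
    update_ops = {}      # scalar -> operators seen in its writes
    other_reads = set()  # scalars read outside their own update statement

    for stmt in body:
        for name in stmt.reads:
            if name in arrays or name == loop_index:
                continue
            if name not in written:
                exposed.add(name)
            if name != stmt.target:
                other_reads.add(name)
        if stmt.target not in arrays:
            written.add(stmt.target)
            op = stmt.op if stmt.target in stmt.reads else None
            update_ops.setdefault(stmt.target, set()).add(op)

    for name in exposed - written:
        scopes[name] = "SHARED"       # read-only scalar
    for name in written:
        ops = update_ops[name]
        if name not in exposed:
            scopes[name] = "PRIVATE"  # always defined before use
        elif len(ops) == 1 and ops <= {"+", "*"} and name not in other_reads:
            scopes[name] = "REDUCTION(%s)" % next(iter(ops))
        else:
            scopes[name] = "UNKNOWN"  # would need deeper analysis
    return scopes


# The loop body of Example 2, written by hand in the toy representation.
body = [
    Assign("t", ["f", "g"], None),
    Assign("a", ["b", "c"], None),
    Assign("d", ["d", "e"], None),
    Assign("h", ["h", "t"], None),
    Assign("s", ["s", "h"], "+"),
]
scopes = classify("i", {"a", "b", "c", "d", "e", "f", "g", "h"}, body)
print(scopes["t"], scopes["i"], scopes["s"])
```

The sketch deliberately returns "UNKNOWN" rather than guessing when a scalar is read before it is written and is not a recognizable sum or product update.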

12 Evaluation of DEFAULT(AUTO)
Fortran 77 Benchmarks from SPEC OpenMP
    removed all explicit scoping
    added DEFAULT(AUTO) to all regions
    used Omni OpenMP compiler as backend (-O2)
Explicit speedup vs. auto-scope speedup
    four-processor Xeon server
    1.8 GHz processors, 16 GBytes main memory
    Hyperthreaded, but only used 1 thread per CPU
Also used EA Sun Studio 9 Fortran 95 compiler
    supports DEFAULT(__AUTO)
    report number of regions auto-scoped

13 Performance of Auto-scoping
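Benchmark  Regions  % Scoped (Polaris)  % Scoped (Sun)  Explicit Speedup  Auto-Scoped Speedup
Applu      33       94%                 72%             5.8               1.1
Apsi       32       47%                 25%             2.8               1
Wupwise    10       40%                 20%             3.0               1.25
Mgrid      12       surviving values in order: 100%, 83%, 3.6 (only one speedup value survives)
Swim       8        surviving values in order: 63%, 1.9 (only one percentage and one speedup value survive)
Sun results are for the Early Access Version of the Sun Microsystems Studio 9 Fortran 95 compiler.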
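For the three benchmarks where both speedup values survive in this transcript, the loss can be expressed as the fraction of the explicit-scoping speedup that the auto-scoped version retains. The short Python sketch below only does that arithmetic; the helper name retained_fraction is an assumption, and the column order (explicit first, auto-scoped second) is read off the table header.

```python
# (explicit speedup, auto-scoped speedup) for rows with both values present.
speedups = {
    "Applu": (5.8, 1.1),
    "Apsi": (2.8, 1.0),
    "Wupwise": (3.0, 1.25),
}


def retained_fraction(explicit, auto_scoped):
    """Fraction of the explicit-scoping speedup kept after auto-scoping."""
    return auto_scoped / explicit


for name, (explicit, auto_scoped) in speedups.items():
    pct = 100 * retained_fraction(explicit, auto_scoped)
    print("%-8s %.0f%%" % (name, pct))
```

This works out to roughly 19% for Applu, 36% for Apsi and 42% for Wupwise.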

14 Discussion
Many regions were not fully analyzable
    Polaris could not fully inline the regions
    several regions were general parallel regions
Early Access Sun Studio 9 compiler
    auto-scoped fewer regions in general
    missed important regions in Swim and Mgrid
    regions could be parallelized but not auto-scoped
Sun compiler could auto-scope some regions that Polaris could not
    can analyze general parallel regions

15 A general parallel region from Wupwise
Polaris fails but the Sun compiler succeeds
C$OMP PARALLEL DEFAULT(AUTO)
      LSCALE = ZERO
      LSSQ = ONE
C$OMP DO
      DO IX = 1, 1 + (N - 1) *INCX, INCX
        IF (DBLE (X(IX)) .NE. ZERO) THEN
          ...
          LSSQ = ONE + LSSQ* (LSCALE / TEMP) ** 2
          LSCALE = TEMP
        END IF
      END DO
C$OMP END DO
C$OMP CRITICAL
      IF (SCALE .LT. LSCALE) THEN
        SSQ = ((SCALE / LSCALE) ** 2) * SSQ + LSSQ
        SCALE = LSCALE
      ELSE
        SSQ = SSQ + ((LSCALE / SCALE) ** 2) * LSSQ
      END IF
C$OMP END CRITICAL
C$OMP END PARALLEL

16 Runtime Support for Auto-scoping
Add speculate directive for regions that cannot be auto-scoped
    applies to very few regions in SPEC OpenMP
    requires interprocedural marking of reads/writes
    only 2 regions not auto-scoped can be fully analyzed
!$OMP PARALLEL
!$OMP+DEFAULT(SHARED)
!$OMP+PRIVATE(U51K,U41K,U31K,Q,U21K,M,K,I,U41,U31KM1,U51KM1,U21KM1)
!$OMP+PRIVATE(U41KM1,TMP,J)
!$OMP+SPECULATE(UTMP,RTMP)
!$OMP DO
!$OMP+LASTPRIVATE(FLUX2)
      DO j = jst, jend, 1
        ...
      ENDDO
!$OMP END DO
!$OMP END PARALLEL
(a region from the RHS subroutine of Applu)
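The slide does not say how reads and writes would be marked, so the following Python is only a sketch in the spirit of runtime dependence testing, not the authors' design. The function names and the access-list representation are assumptions. It records which iterations read and write each element of a speculated array and reports failure if any written element is touched by more than one iteration.

```python
def speculation_succeeds(iterations):
    """iterations[i] holds the (kind, index) accesses that loop iteration i
    makes to one speculated array; kind is "r" (read) or "w" (write)."""
    writers = {}  # element index -> set of iterations that wrote it
    readers = {}  # element index -> set of iterations that read it
    for it, accesses in enumerate(iterations):
        for kind, index in accesses:
            marks = writers if kind == "w" else readers
            marks.setdefault(index, set()).add(it)
    for index, wrote in writers.items():
        touched = wrote | readers.get(index, set())
        if len(touched) > 1:
            # A written element is also used by another iteration.
            return False
    return True


def recurrence_accesses(n, k):
    """Access pattern of d(i+k) = d(i) + e(i) for i = 1..n, array d only."""
    return [[("r", i), ("w", i + k)] for i in range(1, n + 1)]


print(speculation_succeeds(recurrence_accesses(4, 4)))  # disjoint reads/writes
print(speculation_succeeds(recurrence_accesses(4, 1)))  # loop-carried use
```

The access pattern is borrowed from the D(I+K) = D(I) + E(I) statement in Example 2, where the outcome depends on the run-time value of K. A real implementation would also need to undo or re-execute the loop when the test fails, which this sketch leaves out.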

17 Related Work
DEFAULT(AUTO) proposed by Dieter an Mey
Many commercial and research auto-parallelizers
    Polaris, SUIF, CAPO, …
    Perform parallelization and scoping
The EA Sun Studio 9 Fortran 95 Compiler
    paper also here at WOMPAT
    thanks to Yuan Lin for pointing me to it
Runtime dependence testing
    Saltz, Rauchwerger, …

18 Conclusion
Implemented DEFAULT(AUTO) in Polaris
    created full OpenMP to OpenMP translator
    added facilities for auto-scoping
Evaluated implementation
    2 of 5 benchmarks fully auto-scoped
    remainder showed significant loss of speedup
    results different from EA Sun compiler
    performance not portable across compilers
Discussed speculative parallelization support

19 Conclusion cont…
Combination of loop and region analyzer
    Polaris auto-scoped more regions
    Sun compiler can handle general regions
Performance may not be portable across compilers
    never is but…
Sacrifice performance for convenience
    perhaps a useful tool during manual parallelization
Future work
    general region support in Polaris

