UPC Dynamic Removal of Redundant Computations Carlos Molina, Antonio González and Jordi Tubella Universitat Politècnica de Catalunya - Barcelona


1 UPC Dynamic Removal of Redundant Computations
Carlos Molina, Antonio González and Jordi Tubella
Universitat Politècnica de Catalunya - Barcelona
{cmolina,antonio,jordit}@ac.upc.es
ICS'99, Rhodes (Greece) - June 20-25, 1999

2 UPC Motivation
for (i=0; i<N; i++) A[i] = B[i]+C[i];
.....
R = S / T ;
.....
X = S / U ;
.....
Quasi-invariant
Quasi-common subexpression
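The two labels on this slide can be illustrated with a small trace classifier. This is a sketch under an assumed reading of the terms, not a definition taken from the slides: a dynamic instance counts as a quasi-invariant when the same static instruction (same PC) repeats the operands of its previous instance, and as a quasi-common subexpression when its operands match a computation already performed by some other instruction. The function name `classify`, the trace format and the PC values 10 and 20 are hypothetical.

```python
def classify(trace):
    """Count redundancy kinds in a trace of (pc, opcode, opnd1, opnd2) tuples.

    Assumed definitions (not from the slides):
    - quasi_invariant: same PC, same operands as that PC's previous instance
    - quasi_common: operands already seen, but not at this PC's previous instance
    - new: a computation not seen before
    """
    last_by_pc = {}   # pc -> (opcode, opnd1, opnd2) of the previous instance
    seen = set()      # every (opcode, opnd1, opnd2) executed so far
    counts = {"quasi_invariant": 0, "quasi_common": 0, "new": 0}
    for pc, opcode, a, b in trace:
        key = (opcode, a, b)
        if last_by_pc.get(pc) == key:
            counts["quasi_invariant"] += 1
        elif key in seen:
            counts["quasi_common"] += 1
        else:
            counts["new"] += 1
        last_by_pc[pc] = key
        seen.add(key)
    return counts


# Three iterations of a loop containing r = s / t (PC 10) and x = s / u (PC 20),
# with s=8, t=u=2 in the first two iterations and s=9, t=u=3 in the third.
example_trace = [
    (10, "div", 8, 2), (20, "div", 8, 2),
    (10, "div", 8, 2), (20, "div", 8, 2),
    (10, "div", 9, 3), (20, "div", 9, 3),
]
print(classify(example_trace))
```

In this toy trace only two of the six divisions compute something new; the rest repeat an earlier computation in one of the two assumed ways.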

3 UPC Outline
- Instruction Reuse
- Related Work
- Redundant Computation Buffer
- Performance Results
- Conclusions

4 UPC Instruction Reuse
[Figure: pipeline stages Fetch, Decode & Rename, OOO Execution, Commit; a Reuse Mechanism accessed through an index]

5 UPC Related Work
Instruction Reuse
- Value Cache for the Tree Machine (Harbison 82)
- Result Cache (Richardson 92, Oberman et al. 95)
- Reuse Buffer (Sodani and Sohi 97)
- Physical Register Reuse (Jourdan et al. 98)
Trace Reuse
- Basic blocks (Huang and Lilja 99)
- General traces (González et al. 99)

6 UPC Related Work
Result Cache (Richardson 92, Oberman & Flynn 95)
- Special purpose (long latency operations)
- Indexed by operand values
- No reuse chaining
- Can reuse dynamic instances of other static instructions
Reuse Buffer (Sodani & Sohi 97)
- General purpose
- Indexed by PC
- Reuse chaining
- Only reuses dynamic instances of the same static instruction
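The indexing difference between the two schemes can be sketched in software. The following is a minimal Python model, not the hardware from the cited work: `ResultCache`, `ReuseBuffer`, `execute` and `count_hits` are hypothetical names, the tables are unbounded dictionaries with no replacement policy, only `div` is modelled, and reuse chaining is left out.

```python
def execute(opcode, a, b):
    """Stand-in for the functional unit; only integer 'div' is modelled."""
    if opcode != "div":
        raise ValueError("only 'div' is modelled in this sketch")
    return a // b


class ResultCache:
    """Indexed by operand values: any static instruction may hit an entry."""

    def __init__(self):
        self.entries = {}  # (opcode, opnd1, opnd2) -> result

    def lookup(self, pc, opcode, a, b):
        return self.entries.get((opcode, a, b))

    def update(self, pc, opcode, a, b, result):
        self.entries[(opcode, a, b)] = result


class ReuseBuffer:
    """Indexed by PC: only the same static instruction may hit its entry."""

    def __init__(self):
        self.entries = {}  # pc -> (opcode, opnd1, opnd2, result)

    def lookup(self, pc, opcode, a, b):
        entry = self.entries.get(pc)
        # Reuse test: the stored operands must match the current ones.
        if entry is not None and entry[:3] == (opcode, a, b):
            return entry[3]
        return None

    def update(self, pc, opcode, a, b, result):
        self.entries[pc] = (opcode, a, b, result)


def count_hits(buffer, trace):
    """Run a trace of (pc, opcode, opnd1, opnd2) and count reused instances."""
    hits = 0
    for pc, opcode, a, b in trace:
        if buffer.lookup(pc, opcode, a, b) is not None:
            hits += 1
        else:
            buffer.update(pc, opcode, a, b, execute(opcode, a, b))
    return hits


# Two static divisions (PCs 10 and 20) with equal operands, executed twice.
trace = [(10, "div", 8, 2), (20, "div", 8, 2),
         (10, "div", 8, 2), (20, "div", 8, 2)]
print(count_hits(ResultCache(), trace), count_hits(ReuseBuffer(), trace))
```

On this trace the operand-indexed table reuses the first result for the instruction at PC 20 straight away, while the PC-indexed table has to see each static instruction once before it can hit.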

7 UPC Redundant Computation Buffer
[Figure: RCB organization]
- Vtable: Atable pointer
- Atable: opcode, result/address, opnd1, opnd2, pointer
- Mtable: address tag, result
- Reuse Test, producing Reused Value and Reused Memory Value

8 UPC RCB (Working Example)
while (cond) { r = s / t ;...... x = s / u ; }
I1: 8 / 2 = 4
[Figure: Vtable entry 10; Atable entry for div with operands 8 and 2, result 4]

9 UPC RCB (Working Example)
while (cond) { r = s / t ;...... x = s / u ; }
I2: 8 / 2 = 4
[Figure: Vtable entries 10 and 20; Atable entry for div with operands 8 and 2, result 4]

10 UPC RCB (Working Example)
while (cond) { r = s / t ;...... x = s / u ; }
I2: 8 / 2 = 4
[Figure: Vtable entries 10 and 20; Atable entry for div with operands 8 and 2, result 4]

11 UPC RCB (Working Example)
while (cond) { r = s / t ;...... x = s / u ; }
I1: 9 / 3 = 3
I2: 9 / 3 = 3
[Figure: Vtable entries 10 and 20; Atable entries for div with operands 8 and 2 (result 4) and with operands 9 and 3 (result 3)]
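The working example can be replayed with a small software model. This is one reading of the slides rather than the authors' design: it assumes the Vtable is indexed by PC and holds a pointer into the Atable, and that the Atable can also be searched by operand values when the pointed-to entry fails the reuse test. The names `RCB`, `lookup`, `insert` and `run`, the unbounded tables, and the omission of the Mtable and of the Atable's own pointer field are all assumptions of this sketch.

```python
class RCB:
    """Toy Redundant Computation Buffer: Vtable (by PC) plus Atable (by operands)."""

    def __init__(self):
        self.atable = []       # slots of (opcode, opnd1, opnd2, result)
        self.by_operands = {}  # (opcode, opnd1, opnd2) -> slot; stands in for operand indexing
        self.vtable = {}       # pc -> slot (the "Atable pointer")

    def lookup(self, pc, opcode, a, b):
        """Return (result, how); how is 'vtable', 'atable', or None on a miss."""
        key = (opcode, a, b)
        # First Atable lookup: follow the Vtable pointer, then run the reuse test.
        slot = self.vtable.get(pc)
        if slot is not None and self.atable[slot][:3] == key:
            return self.atable[slot][3], "vtable"
        # Second Atable lookup: search by operand values.
        slot = self.by_operands.get(key)
        if slot is not None:
            self.vtable[pc] = slot  # remember the entry for this static instruction
            return self.atable[slot][3], "atable"
        return None, None

    def insert(self, pc, opcode, a, b, result):
        self.atable.append((opcode, a, b, result))
        slot = len(self.atable) - 1
        self.by_operands[(opcode, a, b)] = slot
        self.vtable[pc] = slot


def run(trace):
    """Execute a trace of (pc, opcode, opnd1, opnd2); only 'div' is modelled."""
    rcb = RCB()
    outcomes = []
    for pc, opcode, a, b in trace:
        result, how = rcb.lookup(pc, opcode, a, b)
        if how is None:
            result = a // b  # stand-in for executing the division
            rcb.insert(pc, opcode, a, b, result)
            how = "miss"
        outcomes.append((pc, result, how))
    return outcomes


# I1 at PC 10 (r = s / t) and I2 at PC 20 (x = s / u), as in the slides,
# with one extra iteration on unchanged operands before they change to 9 / 3.
loop_trace = [(10, "div", 8, 2), (20, "div", 8, 2),
              (10, "div", 8, 2), (20, "div", 8, 2),
              (10, "div", 9, 3), (20, "div", 9, 3)]
for outcome in run(loop_trace):
    print(outcome)
```

In this model I2 reuses the result of I1 through the operand-indexed lookup the first time it runs, and both instructions hit through their Vtable pointers while the operands stay the same.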

12 UPC Enhancements to Other Schemes
Enhanced Result Cache
- Atable (opcode, result/address, opnd1, opnd2), indexed by Operands
- Mtable (address tag, result)
Enhanced Reuse Buffer
- Atable (opcode, result/address, opnd1, opnd2), indexed by PC
- Mtable (address tag, result)

13 UPC Timing Considerations
Pipeline Stages: fetch, decode & rename, opnd read & dispatch, issue, execute, write back, commit
Latency of the Result Cache: Atable lookup, reuse test
Latency of the Reuse Buffer: Atable lookup, reuse test
Latency of the RCB: 1st Atable lookup, reuse test, 2nd Atable lookup

14 UPC Experimental Framework
Simulator
- Alpha version of the SimpleScalar Toolset
Benchmarks
- Spec95
Maximum Optimization Level
- DEC C & F77 compilers with -non_shared -O5
Statistics
- Collected for 125 million instructions
- Skipping initializations

15 UPC Basic Reuse Statistics
We evaluate different schemes
- Enhanced Result Cache (ERC)
- Enhanced Reuse Buffer (ERB)
- Redundant Computation Buffer (RCB)
We find the best configuration for each scheme
- Number of entries
- History depth
The best configurations will be evaluated
- Percentage of reuse
- Speedup

16 UPC Quasi-Common Subexpressions 32 KB

17 UPC Study of Reuse (ERB)
[Chart: x-axis is size in Kbytes, from 8 to 4096]

18 UPC Study of Reuse (RCB)
[Chart: x-axis is size in Kbytes, from 8 to 4096]

19 UPC Study of Reuse (Comparative)
[Chart: x-axis is size in Kbytes, from 8 to 4096]

20 UPC Performance Evaluation
Two different capacities are evaluated
- 32 KB
- 200 KB
The best configuration has been chosen for each reuse scheme
We present a performance evaluation for a superscalar processor
- Speedup
- Percentage of reuse

21 UPC Base Microarchitecture

22 UPC Speedup (32 KB)
[Chart: speedup axis from 1.00 to 1.20]

23 UPC Speedup (200 KB)
[Chart: speedup axis from 1.00 to 1.25]

24 UPC Reuse (32 KB) Ops ready

25 UPC Reuse (200 KB) Ops ready

26 UPC Reuse by Instruction Category
- Load Value
- Memory Address
- Arithmetic
- Cond Branch

27 UPC Hybrid Scheme
[Figure: Atable organizations with fields opcode, result/address, opnd1, opnd2 and, in some, a pointer; the index labels shown are PC and Opnds]

28 UPC Speedup (Hybrid Scheme)
[Chart: speedup axis from 1.00 to 1.20]

29 UPC Reuse (Hybrid Scheme)

30 UPC Speedup (Perfect Reuse Engine)
[Chart: speedup axis from 1.00 to 2.20]

31 UPC Conclusions
Redundant Computation Buffer
- Quasi-invariants
- Quasi-common subexpressions
High reuse coverage and low latency
- 30% reuse
- 10% speedup
Outperforms previous schemes

