UPC Trace-Level Reuse A. González, J. Tubella and C. Molina Dpt. d´Arquitectura de Computadors Universitat Politècnica de Catalunya 1999 International.


1 UPC Trace-Level Reuse A. González, J. Tubella and C. Molina Dpt. d´Arquitectura de Computadors Universitat Politècnica de Catalunya 1999 International Conference on Parallel Processing ICPP´99

2 UPC, September 21, 1999, ICPP´99. Motivation: Increase performance by overcoming the dataflow limitation. DATA SPECULATION: exploits the predictability of values. DATA REUSE: exploits the redundancy of computations.

3 Motivation: Redundant computations are rather frequent. Code: loops, recursive subroutines. Data: finite domain of values. The results could be reused instead of recomputed: OUT = f(IN). Figure labels: dynamic execution stream, redundant computations.
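A minimal software sketch of the OUT = f(IN) idea on this slide, assuming a hypothetical function `f` and a plain dictionary as the reuse table (neither is from the talk): a result is computed once per distinct input and reused afterwards.

```python
# Hypothetical example: f stands for any computation whose output
# depends only on its input (OUT = f(IN)).
calls = 0          # counts how many times f is really executed
table = {}         # reuse table: IN -> OUT


def f(x):
    global calls
    calls += 1
    return x * x + 1


def reuse_f(x):
    # Recompute only when this input has not been seen before.
    if x not in table:
        table[x] = f(x)
    return table[x]


# A dynamic stream with a finite domain of input values.
outs = [reuse_f(v) for v in [3, 5, 3, 3, 5]]
```

In this stream of five evaluations only two distinct inputs occur, so `f` runs twice and the other three results come from the table.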

4 Motivation: Reuse granularity: an instruction, or a sequence of instructions (TRACE-LEVEL REUSE). Performance potential of data reuse: at instruction level, at trace level.

5 Outline: Trace-level reuse; Performance potential; A first approach; Related work; Conclusions.

6 Trace-Level Reuse. Trace: any dynamic sequence of instructions. Goal: avoid the execution of a trace by reusing its results, provided that the same trace with the same inputs has already been executed. Advantages: reduces the utilization of other machine resources; reduces the time to compute results; allows the processor to exceed the dataflow limit.

7 Trace-Level Reuse: Hardware scheme. Main issues: Reuse Trace Memory (RTM); dynamic trace collection; reuse test; state update.

8 Reuse Trace Memory (RTM): The RTM stores candidate traces to be reused. Each entry holds: Initial Address; trace input (input register identifiers and contents, input memory addresses and contents); trace output (output register identifiers and contents, output memory addresses and contents); Next Address.
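The entry layout on this slide can be pictured as a record. The following is only a sketch: the class name `RTMEntry`, the field names, and the use of dictionaries are illustrative assumptions, not the hardware organization described by the authors.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RTMEntry:
    """One candidate trace, with the fields listed on the slide (names are hypothetical)."""
    initial_address: int   # address of the first instruction of the trace
    next_address: int      # address at which execution continues after the trace
    input_regs: Dict[str, int] = field(default_factory=dict)    # register id -> content
    input_mem: Dict[int, int] = field(default_factory=dict)     # memory address -> content
    output_regs: Dict[str, int] = field(default_factory=dict)   # register id -> content
    output_mem: Dict[int, int] = field(default_factory=dict)    # memory address -> content


# Made-up example values, for illustration only.
entry = RTMEntry(
    initial_address=0x1000,
    next_address=0x1010,
    input_regs={"r1": 5},
    input_mem={0x8000: 7},
    output_regs={"r2": 12},
)
```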

9 Dynamic trace collection: Chooses candidate traces: initial address, next address. Input and output trace locations are computed at execution time and stored along with their values in the RTM.

10 Reuse Test & State Update. Reuse test: at some points of the execution the reuse test is performed; it checks whether a trace input stored in the RTM matches the current execution state. State update: writes the output trace values to the output trace locations. REUSE LATENCY: reuse test plus state update.
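The two steps on this slide can be sketched in software as follows. This is a simplified model, assuming an RTM entry represented as a dictionary and the machine state as two dictionaries (`regs`, `mem`); all names and values are hypothetical.

```python
def reuse_test(entry, pc, regs, mem):
    """Return True if the stored trace input matches the current execution state."""
    if pc != entry["initial_address"]:
        return False
    regs_match = all(regs.get(r) == v for r, v in entry["input_regs"].items())
    mem_match = all(mem.get(a) == v for a, v in entry["input_mem"].items())
    return regs_match and mem_match


def state_update(entry, regs, mem):
    """Write the stored trace outputs to their locations and return the next address."""
    regs.update(entry["output_regs"])
    mem.update(entry["output_mem"])
    return entry["next_address"]


# Made-up entry and machine state, for illustration only.
entry = {
    "initial_address": 0x1000,
    "next_address": 0x1010,
    "input_regs": {"r1": 5},
    "input_mem": {0x8000: 7},
    "output_regs": {"r2": 12},
    "output_mem": {0x8008: 1},
}
regs = {"r1": 5, "r2": 0}
mem = {0x8000: 7}

hit = reuse_test(entry, 0x1000, regs, mem)
new_pc = state_update(entry, regs, mem) if hit else None
miss = reuse_test(entry, 0x1000, {"r1": 6}, mem)   # a different input value: no reuse
```

On a hit the whole trace is skipped: the outputs are written and fetch continues at the stored next address.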

11 Outline: Trace-level reuse; Performance potential; A first approach; Related work; Conclusions.

12 Performance Potential. Base-line machine: ISA: Alpha; only constrained by: data dependences, or data dependences plus a finite instruction window. Reuse engine: perfect trace reuse; maximum-length traces; minimum number of traces.

13 Performance Potential: Instruction-level reuse (ILR). Perfect instruction reuse engine: all previously executed instances of each instruction are checked for a possible reuse. Maximum reusability: almost 90%.
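One way to picture the perfect instruction reuse engine is as an unbounded history per static instruction. The sketch below assumes a dynamic stream given as `(pc, input_values)` pairs; the stream contents and function name are made up for illustration and are not the authors' simulator.

```python
from collections import defaultdict


def reusability(stream):
    """Flag each dynamic instruction that repeats an earlier instance
    of the same static instruction (same pc) with the same input values."""
    seen = defaultdict(set)   # pc -> set of input tuples observed so far
    flags = []
    for pc, inputs in stream:
        flags.append(inputs in seen[pc])
        seen[pc].add(inputs)
    return flags


# Hypothetical dynamic stream: (pc, tuple of source operand values).
stream = [
    (0x10, (1, 2)),
    (0x14, (3,)),
    (0x10, (1, 2)),   # same pc and inputs as before: reusable
    (0x14, (4,)),     # same pc, new input: not reusable
    (0x10, (1, 2)),   # reusable again
]
flags = reusability(stream)
fraction = sum(flags) / len(flags)
```

Here `fraction` is the share of dynamic instructions that could be reused in this toy stream, the quantity the slide reports as reusability.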

14 ILR Performance limits. Base-line machine constrained by data dependences. Reuse engine: 1-cycle latency.

15 ILR Performance limits. Base-line machine constrained by: data dependences; data dependences and instruction window. Reuse latency: 1 to 4 cycles.

16 ILR Performance limits: Moderate potential with a perfect reuse engine. Instruction latency is reduced. The reuse of a chain of dependent instructions is still a sequential process: source operands must be ready.

17 Performance Potential: Trace-level reuse (TLR). Perfect reuse engine. Traces consist of maximum-length dynamic sequences of reusable instructions: an upper bound of the maximum reusability; a lower bound of the minimum number of traces. Figure labels: I1 I2 I3 I4 I5 I6, TRACE.
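Given a per-instruction reusability flag, maximum-length traces are simply the maximal runs of consecutive reusable instructions. A small sketch, assuming the flags are already available as a list of booleans (the example flags are made up):

```python
from itertools import groupby


def maximal_traces(flags):
    """Return (start_index, length) for every maximal run of reusable instructions."""
    traces = []
    index = 0
    for reusable, run in groupby(flags):
        length = len(list(run))
        if reusable:
            traces.append((index, length))
        index += length
    return traces


# Hypothetical reusability flags for eight dynamic instructions.
flags = [False, True, True, True, False, True, True, False]
traces = maximal_traces(flags)
average_size = sum(length for _, length in traces) / len(traces)
```

Because each run is as long as possible, the number of traces is the smallest that covers all reusable instructions, which matches the two bounds named on the slide.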

18 TLR: Average trace size: 15.0 instructions (FP: 11.7, INT: 20.3). Chart labels: 203, 116.

19 TLR Performance limits. Base-line machine constrained by data dependences and instruction window (256-entry). Reuse engine latency: constant, or linear: f(#INPUTS + #OUTPUTS). Chart panels: CONSTANT, LINEAR.
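The two latency models can be contrasted with a toy calculation. The slide does not give the form of f, so the linear model below (a fixed number of values checked or written per cycle, parameter `values_per_cycle`) is purely an assumption for illustration.

```python
import math


def constant_latency(cycles=1):
    """Constant model: every reuse costs the same number of cycles."""
    return cycles


def linear_latency(num_inputs, num_outputs, values_per_cycle=4):
    """Assumed linear model: latency grows with #INPUTS + #OUTPUTS.
    values_per_cycle is a hypothetical bandwidth, not a figure from the talk."""
    return max(1, math.ceil((num_inputs + num_outputs) / values_per_cycle))


small_trace = linear_latency(2, 1)   # 3 values
large_trace = linear_latency(6, 3)   # 9 values
fixed = constant_latency()
```

Under the linear model a trace with more inputs and outputs pays a longer reuse latency, whereas the constant model charges every trace the same.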

20 Outline: Trace-level reuse; Performance potential; A first approach; Related work; Conclusions.

21 A First Approach: Reuse Trace Memory (RTM). Indexed by trace initial address (4-way and 8-way). Maximum number of input and output values: 8 register values, 4 memory values. Sizes: 512 entries (4 different entries per initial address); 4K entries (8 entries per initial address); 32K entries (16 entries per initial address); 256K entries (16 entries per initial address).
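For the four RTM sizes on this slide, the number of distinct index positions follows from dividing total entries by entries per initial address. The sketch below also shows one possible index function; the modulo-on-word-address mapping is an assumption (the slide only says the RTM is indexed by the trace initial address), relying on Alpha instructions being 4 bytes wide.

```python
# (total entries, entries per initial address), as listed on the slide.
CONFIGS = {
    "512": (512, 4),
    "4K": (4 * 1024, 8),
    "32K": (32 * 1024, 16),
    "256K": (256 * 1024, 16),
}


def num_sets(total_entries, entries_per_address):
    """Number of groups of entries that share one index."""
    return total_entries // entries_per_address


def set_index(initial_address, total_entries, entries_per_address):
    """Hypothetical mapping: drop the 2 byte-offset bits of a 4-byte
    instruction address, then take the remainder by the number of sets."""
    return (initial_address >> 2) % num_sets(total_entries, entries_per_address)


sets = {name: num_sets(*cfg) for name, cfg in CONFIGS.items()}
index_4k = set_index(0x1004, *CONFIGS["4K"])
```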

22 A First Approach: In-order execution. Reuse test performed for every fetch operation. Diagram labels: PC, Instruction Cache, RTM, RTM entry, Reuse Test, Execute, Commit, Fetch, Decode.

23 A First Approach: Dynamic trace collection. Built traces have all instructions reusable: an additional memory to check instruction reusability is needed. Fixed-length traces starting at any address. Trace expansion on reuse hit.

24 Reusable Instructions: 25% reusability for a 4K-entry RTM.

25 Trace Size: 6 instructions for a 4K-entry RTM.

26 Related work: Data Reuse. Software implementation: Memoization [Richardson,92]. Hardware implementation: Tree Machine [Harbison,82]. At instruction-level: Reuse Buffer [Sodani and Sohi,97]; Register renaming [Jourdan et al.,98]; Redundant Computation Buffer [Molina, González and Tubella,99]. At “trace”-level: Result cache [Richardson,93] [Oberman and Flynn,95]; Basic block reuse [Huang and Lilja,99].

27 Conclusions: Increasing the granularity of reuse from instructions to traces: less reusability; more effective. Fetch bandwidth is reduced. Effective instruction window size is increased. Number of operations per reused instruction is reduced. DATA DEPENDENCES ARE BROKEN.

28 Conclusions: Concentrate effort on: devising strategies to choose reusable traces (high-level structures, compiler assistance); reducing the reuse test overhead (Boolean test, invalidate/validate RTM entries).

