UPC Trace-Level Speculative Multithreaded Architecture Carlos Molina Universitat Rovira i Virgili – Tarragona, Spain Antonio González.


1 UPC Trace-Level Speculative Multithreaded Architecture Carlos Molina Universitat Rovira i Virgili – Tarragona, Spain cmolina@etse.urv.es Antonio González and Jordi Tubella Universitat Politècnica de Catalunya – Barcelona, Spain {antonio,jordit}@ac.upc.es ICCD´02, Freiburg (Germany) - September 16-18, 2002

2 Outline Motivation Related Work TSMA Performance Results Conclusions

3 Motivation Two techniques to avoid serialization caused by data dependences: Data Value Speculation and Data Value Reuse. Speculation predicts values based on the past. Reuse is possible if the computation has been done in the past. Both may be considered at two levels: Instruction Level and Trace Level

4 Trace Level Reuse Set of instructions can be skipped in a row These instructions do not need to be fetched Live input test is not easy to handle Dynamic Trace Level Reuse Static

5 Trace Level Speculation Solves live input test Introduces penalties due to misspeculations Two orthogonal issues microarchitecture support for trace speculation control and data speculation techniques –prediction of initial and final points –prediction of live output values With Live Output Test Trace Level Speculation With Live Input Test

6 Trace Level Speculation with Live Input Test Live Output Actualization & Trace Speculation NST ST Miss Trace Speculation Detection & Recovery Actions INSTRUCTION EXECUTION NOT EXECUTED LIVE INPUT VALIDATION & INSTRUCTION EXECUTION

7 BUFFER Trace Level Speculation with Live Output Test Live Output Actualization & Trace Speculation NST ST Miss Trace Speculation Detection & Recovery Actions INSTRUCTION EXECUTION NOT EXECUTED LIVE OUTPUT VALIDATION

8 Related Work Trace Level Reuse Basic blocks (Huang and Lilja, 99) General traces (González et al, 99) Traces with compiler support (Connors and Hwu, 99) Trace Level Speculation DIVA (Austin, 99) Slipstream processors (Rotenberg et al, 99) Pre-execution (Sohi et al, 01) Precomputation (Shen et al, 01) Nearby and distant ILP (Balasubramonian et al, 01)

9 TSMA [Block diagram: I Cache, Fetch Engine, Decode & Rename, Functional Units, Branch Predictor, Trace Speculation Engine, NST Reorder Buffer, ST Reorder Buffer, NST Ld/St Queue, ST Ld/St Queue, NST I Window, ST I Window, Look Ahead Buffer, Verification Engine, Data Cache (L1NSDC, L2NSDC, L1SDC), NST Arch. Register File, ST Arch. Register File]

10 Trace Speculation Engine Two issues must be handled: implementing a trace-level predictor, and communicating a trace speculation opportunity. Trace-level predictor: PC-indexed table with N entries. Each entry contains –live output values –final program counter of trace. Trace speculation communication: INI_TRACE instruction, additional MOVE instructions
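The trace-level predictor described on this slide could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `TraceEntry` and `TracePredictor`, the tag check, the index function, and the default table size are hypothetical and do not come from the presentation, which specifies only a PC-indexed table of N entries holding live output values and the final program counter.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TraceEntry:
    tag: int        # full starting PC; the tag check is an assumption of this sketch
    final_pc: int   # PC at which execution resumes after the trace
    live_outputs: Dict[int, int] = field(default_factory=dict)  # register id -> value


class TracePredictor:
    """Hypothetical PC-indexed, direct-mapped table with N entries."""

    def __init__(self, n_entries: int = 1024):  # table size is an assumption
        self.n = n_entries
        self.table: List[Optional[TraceEntry]] = [None] * n_entries

    def _index(self, pc: int) -> int:
        # Assumes 4-byte instructions, so the two low PC bits carry no information.
        return (pc >> 2) % self.n

    def update(self, pc: int, final_pc: int, live_outputs: Dict[int, int]) -> None:
        self.table[self._index(pc)] = TraceEntry(pc, final_pc, dict(live_outputs))

    def lookup(self, pc: int) -> Optional[TraceEntry]:
        entry = self.table[self._index(pc)]
        return entry if entry is not None and entry.tag == pc else None
```

A hit in `lookup` would correspond to a trace speculation opportunity: the live output values are written and fetch continues at `final_pc`.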

11 Look Ahead Buffer First-input first-output queue Stores instructions executed by ST The fields of each entry are: Program Counter Operation Type: indicates memory operation Source register Id 1 & source value 1 Source register Id 2 & source value 2 Destination register Id & destination value Memory address
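As a rough sketch, the Look Ahead Buffer can be modelled as a bounded FIFO whose entries carry the fields listed on this slide. The class names, the capacity, the `flush` method, and the behaviour when the buffer is full are assumptions made for illustration only.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LabEntry:
    pc: int                          # program counter
    is_mem_op: bool                  # operation type: indicates memory operation
    src1_id: Optional[int] = None    # source register id 1
    src1_val: Optional[int] = None   # source value 1
    src2_id: Optional[int] = None    # source register id 2
    src2_val: Optional[int] = None   # source value 2
    dst_id: Optional[int] = None     # destination register id
    dst_val: Optional[int] = None    # destination value
    mem_addr: Optional[int] = None   # memory address (memory operations only)


class LookAheadBuffer:
    """FIFO queue filled by ST and drained by the verification engine."""

    def __init__(self, capacity: int = 128):  # capacity is a hypothetical value
        self.capacity = capacity
        self._queue = deque()

    def push(self, entry: LabEntry) -> bool:
        # Assumption: a full buffer rejects the entry, so the producer must wait.
        if len(self._queue) >= self.capacity:
            return False
        self._queue.append(entry)
        return True

    def pop(self) -> Optional[LabEntry]:
        return self._queue.popleft() if self._queue else None

    def flush(self) -> None:
        self._queue.clear()

    def __len__(self) -> int:
        return len(self._queue)
```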

12 Verification Engine Validates speculated instructions Maintains the non-speculative state Consumes instructions from LAB Test is performed as follows: source values of instruction I are tested against the non-speculative state if they match, the destination value of I may be updated memory operations check the effective address store instructions update memory, the rest update registers Hardware required is minimal
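The test performed by the verification engine might look roughly like the sketch below. Several details are assumptions, since the slide does not spell them out: the `Entry` layout, modelling registers and memory as dictionaries, checking a load by comparing non-speculative memory at the recorded address with the recorded destination value, and treating a store's recorded destination value as the data written.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    op: str                             # "alu", "load" or "store" (hypothetical encoding)
    srcs: Tuple[Tuple[int, int], ...]   # (register id, value seen by ST) pairs
    dst_id: Optional[int]               # destination register id, None for stores
    dst_val: int                        # value produced by ST
    addr: Optional[int] = None          # effective address for memory operations


def verify(entry: Entry, regs: Dict[int, int], mem: Dict[int, int]) -> bool:
    """Return True and update the non-speculative state if the entry validates."""
    # Source values must match the non-speculative register state.
    for reg_id, value in entry.srcs:
        if regs.get(reg_id, 0) != value:
            return False  # misspeculation detected, state left untouched
    # Assumed load check: non-speculative memory must hold the value ST loaded.
    if entry.op == "load" and mem.get(entry.addr, 0) != entry.dst_val:
        return False
    # Stores update memory; every other instruction updates a register.
    if entry.op == "store":
        mem[entry.addr] = entry.dst_val
    else:
        regs[entry.dst_id] = entry.dst_val
    return True
```

A `False` result would trigger the recovery actions of the thread synchronization mechanism.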

13 Thread Synchronization Handles trace mispredictions Recovery actions involved are: Instruction execution is stopped ST structures are emptied (IW, LSQ, ROB, LAB) Speculative cache and ST register file are invalidated Two types of synchronization Total (occurs when NST is not executing instructions) –Penalty due to refilling the pipeline Partial (occurs when NST is executing instructions) –No penalty –NST takes the role of ST

14 Memory Subsystem Maintains memory state: speculative and non-speculative Traditional memory subsystem is supported An additional, small first-level cache is added to maintain the speculative memory state Rules: 1 ST stores update values in L1SDC only 2 ST loads get values from L1SDC; if not present, from the NS caches 3 NST stores update values and allocate space in the NS caches 4 NST loads get values and allocate space in the NS caches 5 A line replaced in L1NSDC is copied back to L2NSDC [Diagram: L1SDC, L1NSDC, L2NSDC]
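The five rules can be illustrated with a toy model. This sketch simplifies heavily and rests on assumptions not found in the slides: caches are word-granular dictionaries, L1NSDC uses LRU replacement with a tiny capacity, L2NSDC and L1SDC are unbounded, and missing locations read as zero. The class and method names are hypothetical.

```python
from collections import OrderedDict


class MemorySubsystem:
    def __init__(self, l1ns_lines: int = 2):  # tiny capacity, chosen for illustration
        self.l1sdc = {}               # speculative first-level data cache
        self.l1nsdc = OrderedDict()   # non-speculative L1, kept in LRU order
        self.l2nsdc = {}              # non-speculative L2
        self.l1ns_lines = l1ns_lines

    def _alloc_l1ns(self, addr: int, value: int) -> None:
        self.l1nsdc[addr] = value
        self.l1nsdc.move_to_end(addr)
        if len(self.l1nsdc) > self.l1ns_lines:
            old_addr, old_value = self.l1nsdc.popitem(last=False)
            self.l2nsdc[old_addr] = old_value   # rule 5: copy back to L2NSDC

    def _ns_lookup(self, addr: int) -> int:
        if addr in self.l1nsdc:
            return self.l1nsdc[addr]
        return self.l2nsdc.get(addr, 0)

    def st_store(self, addr: int, value: int) -> None:
        self.l1sdc[addr] = value                # rule 1: L1SDC only

    def st_load(self, addr: int) -> int:
        if addr in self.l1sdc:                  # rule 2: L1SDC first,
            return self.l1sdc[addr]
        return self._ns_lookup(addr)            # then the NS caches

    def nst_store(self, addr: int, value: int) -> None:
        self._alloc_l1ns(addr, value)           # rule 3: update and allocate

    def nst_load(self, addr: int) -> int:
        value = self._ns_lookup(addr)
        self._alloc_l1ns(addr, value)           # rule 4: allocate on load
        return value

    def invalidate_speculative(self) -> None:
        self.l1sdc.clear()                      # recovery after a misspeculation
```

Because ST stores never reach the non-speculative caches, discarding the speculative state only requires clearing L1SDC.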

15 Register File Slight modification to permit prompt execution Register map table contains for each entry: Committed Value ROB Tag Counter Counter field is maintained as follows: A new ST instruction increases the counter of its destination register The counter is decreased when an ST instruction is committed After trace speculation, counters are no longer increased, but they keep being decreased until they reach zero.
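The counter maintenance rules might be modelled as in the sketch below. The class and method names are hypothetical, and one point is an assumption: the slide does not say which commits decrement the counter, so this sketch decrements on every ST commit and saturates at zero.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MapEntry:
    committed_value: int = 0
    rob_tag: Optional[int] = None
    counter: int = 0   # ST instructions counted as pending writers of this register


class RegisterMapTable:
    def __init__(self, n_regs: int = 32):  # register count is an assumption
        self.entries = [MapEntry() for _ in range(n_regs)]
        self.after_trace_speculation = False

    def dispatch_st(self, dst: int, rob_tag: int) -> None:
        entry = self.entries[dst]
        entry.rob_tag = rob_tag
        # Counters stop increasing once a trace has been speculated.
        if not self.after_trace_speculation:
            entry.counter += 1

    def commit_st(self, dst: int, value: int) -> None:
        entry = self.entries[dst]
        entry.committed_value = value
        # Counters keep decreasing until they reach zero.
        if entry.counter > 0:
            entry.counter -= 1

    def trace_speculation(self) -> None:
        self.after_trace_speculation = True
```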

16 Working Example 1 ST Begins Execution 2 Live Output Actualization & Trace Speculation 3 NST Begins Execution 4 VE Validates Instructions 5 NST Executes Speculated Trace 6 NST Executes Some Additional Instructions 7 VE Begins Verification 8 VE Finishes Verification 9 Live Output Actualization & Trace Speculation NST Execution 10 [Timeline figure with rows ST, NST and VE; legend: instruction execution, not executed, live output validation]

17 Experimental Framework Simulator Alpha version of the SimpleScalar Toolset Benchmarks Spec95 Maximum Optimization Level DEC C & F77 compilers with -non_shared -O5 Statistics Collected for 125 million instructions Skipping initializations

18 Base Microarchitecture

19 TSMA Additional Structures

20 Performance Evaluation Main objective: trace misspeculations cause minor penalties Traces are built following a simple rule from backward branch to backward branch minimum and maximum size of 8 and 64 respectively Simple Trace Predictor is evaluated Stride + Context Value (history of 9) Results provided Percentage of misspeculations Percentage of predicted instructions Speedup
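The trace-building rule can be sketched as follows. The slide gives only the boundaries (backward branch to backward branch) and the size limits of 8 and 64; how the limits are enforced is an assumption here: a backward branch closes a trace only once it has at least `min_size` instructions, and a trace is cut unconditionally at `max_size`. The function name and the input format are hypothetical.

```python
from typing import Iterable, List, Tuple


def build_traces(stream: Iterable[Tuple[int, bool]],
                 min_size: int = 8,
                 max_size: int = 64) -> List[List[int]]:
    """Split a dynamic stream of (pc, is_backward_branch) pairs into traces."""
    traces: List[List[int]] = []
    current: List[int] = []
    for pc, is_backward_branch in stream:
        current.append(pc)
        at_boundary = is_backward_branch and len(current) >= min_size
        if at_boundary or len(current) == max_size:
            traces.append(current)
            current = []
    # A trailing partial trace is dropped in this sketch.
    return traces
```

For a loop body of four instructions ending in a backward branch, this rule would merge two consecutive iterations into one trace of eight instructions.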

21 Misspeculations [Chart: y-axis from 0 to 100]

22 Predicted Instructions [Chart: y-axis from 0 to 50]

23 Speedup [Chart: y-axis from 1.00 to 1.35]

24 Conclusions TSMA designed to exploit trace-level speculation Special emphasis on minimizing misspeculation penalties Results show: architecture is tolerant to misspeculations speedup of 16% with a predictor that misses 70%

25 Future Work Aggressive trace-level predictors bigger traces better value predictors Generalization to multiple threads cascade execution Mixing prediction & execution speculated traces do not need to be fully speculated

