Out-of-Order Commit Processor


1 Out-of-Order Commit Processor
Adrian Cristal, Daniel Ortega, Josep Llosa and Mateo Valero

2 Performance Limiting factors
Widening gap between memory and processor performance
Increasing wire delays

3 High Memory Latency
Current solution:
Multiple cache hierarchy
Large number of in-flight instructions
Structures shown: ROB, register file, load/store queue, instruction queue
Figure: ROB holding LD R1, 0(R3) (cache miss) and DADDI R2, R4 #2

4 Motivation

5 Motivation

6 Goal of the paper
To support a large number of in-flight instructions without up-sizing the ROB and instruction queue
Out-of-order commit
Slow Lane Instruction Queuing

7 Re-Order Buffer
In-order commit, to handle precise interrupts
Controls exactly when stores can write to memory
Frees physical registers
Enables the processor to recover from branch misprediction
Keeps track of all in-flight instructions
A large number of in-flight instructions needs a huge ROB structure, which limits cycle time

8 Checkpointing instead of ROB

9 Implementation
CAM (content-addressable memory) register mapping
Inclusion of a future free bit, for freeing physical registers
Free list, used for choosing a free register
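The three pieces above can be sketched together as a toy rename table. This is a minimal illustration, not the authors' design: the class name `CamRenamer`, its methods, and the identity mapping at reset are all assumptions made for the example, and the sketch releases future-free registers whenever `release_future_free` is called, leaving the question of when that is safe to the checkpoint logic.

```python
from collections import deque


class CamRenamer:
    """Toy CAM-style rename table with one entry per physical register.

    All names here are hypothetical; the slides only name the CAM mapping,
    the future free bit and the free list.
    """

    def __init__(self, num_logical, num_physical):
        # logical[p] is the logical register held by physical register p.
        self.logical = [None] * num_physical
        self.valid = [False] * num_physical        # p holds the current mapping
        self.future_free = [False] * num_physical  # p may be freed later
        self.free_list = deque(range(num_logical, num_physical))
        for r in range(num_logical):               # assumed reset state
            self.logical[r] = r
            self.valid[r] = True

    def lookup(self, logical_reg):
        """Associative search for the valid entry of a logical register."""
        for p in range(len(self.logical)):
            if self.valid[p] and self.logical[p] == logical_reg:
                return p
        raise KeyError(logical_reg)

    def rename_dest(self, logical_reg):
        """Map a destination register to a fresh physical register."""
        old = self.lookup(logical_reg)
        self.valid[old] = False
        self.future_free[old] = True   # old mapping is dead once it is safe
        new = self.free_list.popleft()
        self.logical[new] = logical_reg
        self.valid[new] = True
        return new

    def release_future_free(self):
        """Return every register marked future-free to the free list."""
        for p in range(len(self.future_free)):
            if self.future_free[p]:
                self.future_free[p] = False
                self.logical[p] = None
                self.free_list.append(p)
```

With 4 logical and 8 physical registers, `rename_dest(1)` takes physical register 4 from the free list and marks the old register 1 as future-free.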

10 Checkpointing

11 Operation

12 Operation

13 Checkpoint
Valid bits
Future free bits
Number of (active) instructions in that checkpoint
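As a rough sketch, the three fields listed on this slide could be held in a record like the one below. The field and method names are assumptions for illustration, and the commit rule (no active instructions left, and the checkpoint is the oldest one) is a simplification of whatever the real hardware checks.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    """Hypothetical checkpoint record holding the fields named on the slide."""
    valid_bits: list        # snapshot of the rename table's valid bits
    future_free_bits: list  # registers to free when this checkpoint commits
    active: int = 0         # instructions in this checkpoint not yet finished

    def instruction_dispatched(self):
        self.active += 1

    def instruction_finished(self):
        self.active -= 1

    def can_commit(self, is_oldest):
        # Assumption: a checkpoint commits as a whole once all of its
        # instructions have finished and no older checkpoint is pending.
        return is_oldest and self.active == 0
```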

14 Heuristic for taking checkpoints
First branch after 64 instructions
Every 512 instructions
After 64 stores
After flushing the pipeline
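The four rules above can be read as a single predicate evaluated for each instruction. The sketch below assumes the counters are reset whenever a checkpoint is taken and that the thresholds are inclusive (for example, "after 64 instructions" means a count of at least 64); the slide does not settle either point, and the function and parameter names are invented for the example.

```python
BRANCH_THRESHOLD = 64   # instructions before a branch triggers a checkpoint
FALLBACK_INTERVAL = 512  # instructions before a checkpoint is forced
STORE_THRESHOLD = 64    # stores before a checkpoint is forced


def should_take_checkpoint(instrs_since_last, stores_since_last,
                           is_branch, just_flushed):
    """Return True if a new checkpoint should be taken at this instruction."""
    if just_flushed:
        return True
    if is_branch and instrs_since_last >= BRANCH_THRESHOLD:
        return True
    if instrs_since_last >= FALLBACK_INTERVAL:
        return True
    if stores_since_last >= STORE_THRESHOLD:
        return True
    return False
```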

15 Slow Lane Instruction Queuing

16 Slow Lane Instruction Queuing
Identify instructions that will take a long time
Put them in a secondary buffer until they are ready
An alternative paper treats these as critical instructions and puts them in the fast queue

17 SLIQ
Pseudo-ROB for finding long-latency instructions
Slow queue to store the long-latency instructions
32-bit register, one bit per logical register (32 logical registers), to keep track of dependences
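One plausible reading of the 32-bit dependence register is sketched below: bit r is set when logical register r depends, directly or transitively, on a long-latency load. The class and method names are assumptions, and so is the rule that a non-dependent instruction clears the bit of its destination.

```python
NUM_LOGICAL = 32  # one bit per logical register, as on the slide


class DependencyMask:
    """Hypothetical tracker for registers that depend on a long-latency load."""

    def __init__(self):
        self.mask = 0  # bit r set: logical register r depends on a slow load

    def mark_long_latency_load(self, dest):
        """Record that `dest` is written by a load that missed in the cache."""
        self.mask |= 1 << dest

    def classify(self, srcs, dest=None):
        """Return True if the instruction belongs in the slow queue."""
        slow = any((self.mask >> s) & 1 for s in srcs)
        if dest is not None:
            if slow:
                self.mask |= 1 << dest      # dependence propagates to dest
            else:
                self.mask &= ~(1 << dest)   # dest is overwritten by a fast value
        return slow
```

For example, after `mark_long_latency_load(1)` for LD R1, 0(R3), the instruction DADDI R2, R4 #2 reads only R4 and is classified as fast, while a hypothetical instruction reading R1 would be classified as slow.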

18 Wake-up of instructions in SLIQ
Every long-latency load is stored in the SLIQ along with its destination register
Wake-up proceeds at a pace of four instructions per cycle
Figure: LD R1, 0(R3); DADDI R2, R4 #2; New Load
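The pace of four instructions per cycle can be illustrated with a small queue model. This is only a sketch: it assumes the SLIQ is drained in order from its head into the instruction queue, and it ignores any condition that would stop the scan early (such as reaching another pending load). The names `wake_cycle`, `sliq` and `instruction_queue` are made up for the example.

```python
from collections import deque

WAKE_WIDTH = 4  # instructions re-inserted per cycle, from the slide


def wake_cycle(sliq, instruction_queue, width=WAKE_WIDTH):
    """Move up to `width` instructions from the SLIQ head to the IQ.

    Returns the number of instructions moved in this cycle.
    """
    moved = 0
    while sliq and moved < width:
        instruction_queue.append(sliq.popleft())
        moved += 1
    return moved
```

Draining a SLIQ that holds six instructions therefore takes two cycles: four instructions in the first and two in the second.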

19 Baseline Processor Configuration

20 RESULTS

21 Effect of delay in re-insertion
Clearly shows that the program is highly parallel
What about integer programs?

22 Number of In-flight instructions

23 Results

24 Ephemeral Registers
Conventional scheme
Virtual physical registers
Early release
Ephemeral registers

25 Early Release
Early release of registers
Needs a pending counter for each register
When an instruction is decoded, the pending counter of each of its source registers is incremented; when the instruction is issued, the counter is decremented
Instructions on a wrong path are nullified but still issued, so that the pending counters stay correct
Coupled with the renaming logic: CAM map table scheme
A register can be freed if it is not referenced in any map table and its pending counter is zero
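The pending-counter rule can be sketched as follows. The class is hypothetical: `referenced` stands in for "referenced in some map table", which the real design would derive from the CAM map tables rather than store as a separate flag.

```python
class EarlyRelease:
    """Hypothetical pending-counter bookkeeping for early register release."""

    def __init__(self, num_physical):
        self.pending = [0] * num_physical         # consumers not yet issued
        self.referenced = [False] * num_physical  # still named by a map table

    def on_decode(self, src_regs):
        """A decoded instruction will read each of its source registers."""
        for p in src_regs:
            self.pending[p] += 1

    def on_issue(self, src_regs):
        """An issued (or nullified-and-issued) instruction has read them."""
        for p in src_regs:
            self.pending[p] -= 1

    def can_free(self, p):
        """Free only when no map table names p and no reader is pending."""
        return not self.referenced[p] and self.pending[p] == 0
```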

26 Virtual Registers
Decouple renaming from physical register allocation
Requires two map tables: GMT (General Map Table) and PMT (Physical Map Table)
PMT: new table which maps virtual registers to physical registers
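The two-table lookup can be illustrated with a pair of dictionaries. The sketch assumes that the GMT maps logical registers to virtual tags at rename time and that the PMT entry is filled in at some later pipeline stage; the slide does not say when, and the names used here are invented for the example.

```python
class VirtualRenamer:
    """Hypothetical two-level mapping: logical -> virtual -> physical."""

    def __init__(self):
        self.gmt = {}          # General Map Table: logical -> virtual tag
        self.pmt = {}          # Physical Map Table: virtual tag -> physical
        self.next_virtual = 0

    def rename_dest(self, logical_reg):
        """Give the destination a virtual tag; no physical register yet."""
        tag = self.next_virtual
        self.next_virtual += 1
        self.gmt[logical_reg] = tag
        return tag

    def bind_physical(self, tag, physical_reg):
        """Late allocation: attach a physical register to a virtual tag."""
        self.pmt[tag] = physical_reg

    def resolve(self, logical_reg):
        """Return the physical register, or None if not yet allocated."""
        return self.pmt.get(self.gmt[logical_reg])
```

Until `bind_physical` is called, `resolve` returns `None`, which is the point of the scheme: the instruction is renamed without occupying a physical register.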

27 Putting it together

28 Analysis
How efficient are these methods for integer programs, which have very little parallelism, very poor branch prediction accuracy, and a lengthy critical path?
How scalable is the CAM scheme they have used for future processors with hundreds of physical registers running at very high clock speeds?
What is the impact of these techniques on power?

