Register Renaming & Value Prediction. Overview ► Need for Post-RISC ► Register Renaming vs. Allocation Strategies ► How to compile for Post-RISC machines.

1 Register Renaming & Value Prediction

2 Overview ► Need for Post-RISC ► Register Renaming vs. Allocation Strategies ► How to compile for Post-RISC machines ► Dynamic Register Renaming through Virtual-Physical Registers

3 Software Outlives Hardware ► How to make old software run faster? Faster CPU clock and memory hierarchy Adapt CPU’s to actual software (profiling/tuning) More instructions per cycle ► Today’s software will run on tomorrow’s CPU’s Need to keep software interface stable More functional units and registers

4 Compile-time vs. Run-time ► Little is known about software at compile-time ► Space/time trade-offs Memory speeds cannot keep up with CPU speeds When to apply optimizations that increase code-size

5 Solutions ► New scalable architecture (IA-64) Decouple physical/virtual registers using register windows More explicit parallelism allows for more functional units Explicit speculative instructions ► Post-RISC architecture Remove limits in super-scalar implementation of existing architectures Extract even more parallelism out of existing software

6 Data Dependencies ► A true data dependency is also called a read-after-write (RAW) hazard; anti- and output dependencies are write-after-read (WAR) and write-after-write (WAW) hazards ► An instruction may use a result produced by the previous instruction The two instructions cannot execute simultaneously in multiple pipelines. The second instruction must typically be stalled.
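The hazard classes named on this slide can be told apart mechanically from the registers each instruction reads and writes. The following is a minimal sketch, not taken from the slides; the `Instr` record and the `hazards` function are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple, Optional

class Instr(NamedTuple):
    dest: Optional[str]   # register written, or None (e.g. a store)
    srcs: frozenset       # registers read

def hazards(first: Instr, second: Instr) -> set:
    """Hazards that a later instruction `second` has on an earlier `first`."""
    found = set()
    if first.dest is not None and first.dest in second.srcs:
        found.add("RAW")  # true dependency: second reads what first writes
    if second.dest is not None and second.dest in first.srcs:
        found.add("WAR")  # anti-dependency: second overwrites what first reads
    if first.dest is not None and first.dest == second.dest:
        found.add("WAW")  # output dependency: both write the same register
    return found

load = Instr("rA", frozenset())          # rA := Mem1
mul = Instr("rA", frozenset({"rA"}))     # rA := rA * rA
store = Instr(None, frozenset({"rA"}))   # Mem2 := rA
print(hazards(load, mul))    # RAW and WAW
print(hazards(store, load))  # WAR
```

Only the RAW case forces the second instruction to wait for a value; the WAR and WAW cases are conflicts over a register name.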

7 Structural Dependencies ► Stalls result in less than optimal performance We may have single-issue cycles, which process only a single instruction. Worse, we may have zero-issue cycles, which initiate no new instructions. ► Data dependencies can also limit performance for a scalar machine Two-cycle memory load/write Intra-instruction dependencies

8 Scheduling ► Scheduling can remove stalls ► Intra-instruction dependencies cannot be removed by scheduling (CISC)

9 Need for Post-RISC ► Super-scalar has diminishing returns in IPC (Instructions Per Cycle) 2-Way => 1.6 - 1.8 (85%) 4-Way => 2.6 (65%) 8-Way => ??? ► More parallelism needed ► Look beyond set of 4 instructions
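The percentages on this slide are consistent with dividing the sustained instructions per cycle by the issue width. A small sketch of that arithmetic follows; `issue_efficiency` is a hypothetical helper, and 1.7 is assumed here as the midpoint of the quoted 1.6 - 1.8 range.

```python
def issue_efficiency(instructions_per_cycle: float, width: int) -> float:
    """Fraction of the available issue slots that do useful work."""
    return instructions_per_cycle / width

print(round(issue_efficiency(1.7, 2), 2))  # 0.85
print(round(issue_efficiency(2.6, 4), 2))  # 0.65
```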

10 Post-RISC characteristics ► Out-of-order execution (Existed 20 years ago on IBM and CDC) Innovative for single-chip Branch history bits ► Precise interrupts ► Fetch/Flow Prediction ► More caching Instruction cache becomes CPU scratch space ► Register renaming First in IBM 360/91 FPU

11 Specint92 Trends ► Specint92 numbers are increasing DEC has historically been the champ ► Specint92/Clock rates DEC low (21164@300 => 1.14 10/95) IBM strong early (580H@55 => 1.76 9/93) HP (PA-8000@133 => 2.7 10/95)

12 The Post-RISC Architecture

13 Post-RISC CPU’s ► Traditional RISC DEC Alpha 21164 Sun UltraSPARC-1 ► (partially) Post-RISC PowerPC 604 MIPS R10000 HP PA-8000 Intel Pentium Pro DEC Alpha 21264 HAL SPARC64

14 Automatic Register Renaming ► Every R-write allocates new R ► The register name A is an alias for the last R allocated by a write to A ► An instruction reading and writing a register allocates a new R too
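The renaming rule on this slide can be sketched in a few lines. This is an illustrative model only: it assumes instructions written as `dest := expr` strings with register names of the form `rA`, `rB`, and so on, and the `rename` function is a hypothetical name, not something defined in the slides.

```python
import re
from collections import defaultdict

REG = re.compile(r"\br[A-Z]\b")  # assumed register naming: rA, rB, ...

def rename(program):
    """Give every register write a fresh instance (rA -> rA1, rA2, ...)."""
    version = defaultdict(int)   # register name -> number of writes so far
    renamed = []
    for line in program:
        dest, expr = [part.strip() for part in line.split(":=")]
        # Sources read the instance allocated by the most recent write
        # (instance 0 stands for a value that predates the program).
        expr = REG.sub(lambda m: f"{m.group(0)}{version[m.group(0)]}", expr)
        # A register destination allocates a new instance, even when the
        # same instruction also reads that register. Memory is not renamed.
        if REG.fullmatch(dest):
            version[dest] += 1
            dest = f"{dest}{version[dest]}"
        renamed.append(f"{dest} := {expr}")
    return renamed

serial = ["rA := Mem1", "rA := rA * rA", "Mem2 := rA",
          "rA := Mem3", "rA := rA + 1", "Mem4 := rA"]
for line in rename(serial):
    print(line)
```

Applied to a program that reuses one register name throughout, the output uses four instances (`rA1` to `rA4`), so the second load no longer conflicts with the first multiply.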

15 Advantages over More ISA Registers ► Smaller instructions ► Allow same software to run on range of implementations Compare the same program running on Pentium or AMD Ath ► Less state to save Faster function calls Faster context switches Life-times can be optimized

16 Renaming Implementation ► Rename Storage Locations Reorder Buffer Physical Register File ► Similarities: Allocate at decode Release at commit

17 Renaming using Reorder buffer ► Results are kept in reorder buffer ► Source operands are read either from the register file, or a reorder buffer entry ► Not-yet-ready results are forwarded to instruction queue ► Used by Intel Pentium III, PowerPC 604, SPARC64
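A minimal sketch of reorder-buffer renaming, assuming one destination per instruction and ignoring rollback. The class and method names (`ReorderBufferRenamer`, `decode`, `complete`, `commit`) are hypothetical and do not describe any particular CPU.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

@dataclass
class RobEntry:
    tag: int
    dest: str
    value: Optional[int] = None   # None until the instruction completes

class ReorderBufferRenamer:
    def __init__(self, regs):
        self.regfile = dict(regs)  # committed architectural state
        self.rob = deque()         # program order, oldest entry on the left
        self.latest = {}           # register -> newest in-flight entry
        self.next_tag = 0

    def read(self, reg):
        """('value', v) if the operand is available, else ('tag', t)."""
        entry = self.latest.get(reg)
        if entry is None:
            return ("value", self.regfile[reg])   # from the register file
        if entry.value is not None:
            return ("value", entry.value)         # from a reorder buffer entry
        return ("tag", entry.tag)                 # wait for a forwarded result

    def decode(self, dest):
        """Allocate a reorder buffer entry at decode."""
        entry = RobEntry(self.next_tag, dest)
        self.next_tag += 1
        self.rob.append(entry)
        self.latest[dest] = entry
        return entry

    def complete(self, entry, value):
        entry.value = value

    def commit(self):
        """Release the oldest entry, in order, once it has completed."""
        if not self.rob or self.rob[0].value is None:
            return None
        entry = self.rob.popleft()
        self.regfile[entry.dest] = entry.value
        if self.latest.get(entry.dest) is entry:
            del self.latest[entry.dest]
        return entry
```

In this model an instruction that finds a `('tag', t)` operand would sit in the instruction queue until the entry with that tag completes.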

18 Renaming on Pentium III ► Integer, floating-point, and status (flags) registers can be renamed ► Renaming uses a 40-entry reorder buffer FPU control/status cannot be renamed Max 2 renamings per instruction

19 Register Allocation Example ► Minimal number of named registers ► Scheduling is limited ► Strictly serial execution Source: Mem2 := Mem1 * Mem1; Mem4 := Mem3 + 1; Compiled code: rA := Mem1; rA := rA * rA; Mem2 := rA; rA := Mem3; rA := rA + 1; Mem4 := rA;

20 Renaming using Physical Register File ► Register file contains more registers than defined in ISA (logical registers) ► Map logical registers to physical registers during decode ► Operands are always read from the physical register file ► Used by MIPS R10000 and DEC 21264
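A minimal sketch of renaming with a merged physical register file, assuming a simple map table plus a free list. `PhysicalRegisterRenamer` and its methods are hypothetical names; real designs such as the R10000 differ in detail.

```python
class PhysicalRegisterRenamer:
    def __init__(self, logical_regs, num_physical):
        assert num_physical > len(logical_regs)
        # Initially logical register i lives in physical register i.
        self.map = {reg: i for i, reg in enumerate(logical_regs)}
        self.free = list(range(len(logical_regs), num_physical))

    def rename(self, dest, srcs):
        """Decode step: returns (new_phys, src_phys, old_phys)."""
        src_phys = [self.map[s] for s in srcs]   # sources use current mapping
        if not self.free:
            raise RuntimeError("stall: no free physical register")
        new_phys = self.free.pop(0)              # allocate at decode
        old_phys = self.map[dest]                # previous mapping of dest
        self.map[dest] = new_phys
        return new_phys, src_phys, old_phys

    def commit(self, old_phys):
        """Commit step: the previous mapping of the destination is released."""
        self.free.append(old_phys)

renamer = PhysicalRegisterRenamer(["f2", "f10"], num_physical=4)
print(renamer.rename("f2", ["f2", "f10"]))  # (2, [0, 1], 0)
print(renamer.rename("f2", ["f2"]))         # (3, [2], 2)
```

After these two instructions the free list is empty, so a third decoded instruction would stall until a commit releases a register. That pressure on the free list is what the virtual-physical scheme targets.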

21 Virtual-Physical Registers ► Motivation: better utilization of physical registers Important in presence of long latency instructions ► Conventional scheme “wastes” register for each: Decoded instruction that has not finished execution Committed instruction whose result is dead Can be eliminated by maintaining reference counter Example: load f2,0(r6); fdiv f2,f2,f10; fmul f2,f2,f12; fadd f2,f2,1

22 Virtual-Physical Register Renaming ► General Map Table Indexed by logical register L VP register: last virtual-physical register that L has been mapped to P register: Last physical register that L and VP have been mapped to V-bit: indicates whether P is valid ► Physical Map Table Has entry for each VP Contains last physical register that VP has been mapped to

23 Functional Description ► For each logical source register S do a GMT lookup If V-bit is set, rename S to P Otherwise, rename S to VP ► Rename the logical destination register to a new VP ► Update GMT: set VP to new mapping and reset V ► Save previous VP in reorder buffer to be able to roll back

24 Functional Description ► Instruction Queue Fields: Operation code Destination VP Source operands Ready-bits for source operands (when ready, the source operand contains a physical register number) ► Reorder Buffer Entry Destination logical register Completion bit VP mapping of last instruction with same logical destination

25 Functional Description ► When source operands are ready, instruction is issued ► When instruction completes: A new physical register R is allocated for the result; The PMT is updated to reflect the new mapping; The VP number of the destination is broadcast, together with the physical register identifier, to all entries in the instruction queue; The GMT is updated: the entry for the logical destination is checked for a match with the VP, and if it matches, the physical register number is copied to the P register field and the V-bit is set; As a result, a new instruction using the same logical register will find the corresponding physical register in the GMT; Lastly, the completion bit of the reorder buffer entry is set
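The decode and completion steps of the functional description can be sketched as follows. This is a simplified model under stated assumptions: the instruction queue broadcast, the reorder buffer, rollback, and the freeing of physical registers are all omitted, and `VirtualPhysicalRenamer` with its methods is a hypothetical name.

```python
from itertools import count

class VirtualPhysicalRenamer:
    def __init__(self, logical_regs, num_physical):
        self.free_phys = list(range(num_physical))
        self.new_vp = count()   # source of fresh virtual-physical tags
        self.pmt = {}           # Physical Map Table: VP -> physical register
        self.gmt = {}           # General Map Table: logical -> VP, P, V-bit
        for reg in logical_regs:
            vp, p = next(self.new_vp), self.free_phys.pop(0)
            self.pmt[vp] = p
            self.gmt[reg] = {"vp": vp, "p": p, "v": True}

    def rename(self, dest, srcs):
        """Decode: no physical register is allocated here, only a VP tag."""
        operands = []
        for s in srcs:
            entry = self.gmt[s]
            operands.append(("P", entry["p"]) if entry["v"]
                            else ("VP", entry["vp"]))
        prev_vp = self.gmt[dest]["vp"]   # kept so a rollback could restore it
        vp = next(self.new_vp)
        self.gmt[dest].update(vp=vp, v=False)
        return vp, operands, prev_vp

    def complete(self, dest, vp):
        """Completion: only now is a physical register taken."""
        p = self.free_phys.pop(0)
        self.pmt[vp] = p
        entry = self.gmt[dest]
        if entry["vp"] == vp:            # still the latest mapping of dest?
            entry["p"], entry["v"] = p, True
        return p

r = VirtualPhysicalRenamer(["f2", "f10", "f12"], num_physical=5)
load_vp, _, _ = r.rename("f2", [])                   # load f2,0(r6)
fdiv_vp, ops, _ = r.rename("f2", ["f2", "f10"])      # fdiv f2,f2,f10
print(ops)           # [('VP', 3), ('P', 1)]
print(r.free_phys)   # [3, 4]: nothing allocated while both are in flight
```

The point of the sketch is the timing: two decoded instructions hold only VP tags, and the free list shrinks only when `complete` is called.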

26 Register Allocation Example ► Uses more named registers ► Scheduling more effective ► 2-way super-scalar execution Source: Mem2 := Mem1 * Mem1; Mem4 := Mem3 + 1; Compiled code: rA := Mem1; rB := Mem3; rA := rA * rA; rB := rB + 1; Mem2 := rA; Mem4 := rB;

27 Effect of Register Renaming ► Schedule uses 4 hardware registers ► 2-way super-scalar execution rA1 := Mem1; rB1 := Mem3; rA2 := rA1 * rA1; rB2 := rB1 + 1; Mem2 := rA2; Mem4 := rB2;

28 Effect of Register Renaming ► Schedule uses 4 hardware registers ► Can hide memory-write latency ► Still no full use of multiple pipelines rA1 := Mem1; rA2 := rA1 * rA1; Mem2 := rA2; rA3 := Mem3; rA4 := rA3 + 1; Mem4 := rA4;

29 Renaming and O-O-O execution ► Instructions wait for: Availability of execution unit Input dependencies Older instructions have priority Load instructions have priority ► Instructions do NOT wait for: Program order Branch resolution Output dependencies (use “rename register”)

30 Renaming and O-O-O execution ► Schedule uses 4 hardware registers ► Can hide memory-write latency ► “Bad” schedule uses both pipelines ► Only one register name used rA1 := Mem1; rA2 := rA1 * rA1; Mem2 := rA2; rA3 := Mem3; rA4 := rA3 + 1; Mem4 := rA4;
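How out-of-order issue recovers parallelism from this serially ordered, renamed code can be shown with a toy scheduler. The sketch assumes every instruction takes one cycle, every name is written exactly once (so only true dependencies remain), and issue is greedy with older instructions first; `schedule` is a hypothetical helper, not a model of any real issue logic.

```python
def schedule(program, ready, width=2):
    """Return, per cycle, the destinations of the instructions issued."""
    available = set(ready)       # values that can be read this cycle
    waiting = list(program)      # program order, oldest first
    cycles = []
    while waiting:
        issued = []
        for dest, srcs in waiting:
            # Wait only for a free issue slot and for input values,
            # not for program order.
            if len(issued) < width and all(s in available for s in srcs):
                issued.append((dest, srcs))
        if not issued:
            raise ValueError("an instruction has an operand nobody produces")
        waiting = [instr for instr in waiting if instr not in issued]
        available.update(dest for dest, _ in issued)  # visible next cycle
        cycles.append([dest for dest, _ in issued])
    return cycles

renamed = [("rA1", ["Mem1"]), ("rA2", ["rA1"]), ("Mem2", ["rA2"]),
           ("rA3", ["Mem3"]), ("rA4", ["rA3"]), ("Mem4", ["rA4"])]
print(schedule(renamed, ready={"Mem1", "Mem3"}))
# [['rA1', 'rA3'], ['rA2', 'rA4'], ['Mem2', 'Mem4']]
```

Under these assumptions the six instructions issue in three cycles on a 2-way machine, even though the compiler emitted them in a strictly serial order using one register name.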

31 Renaming aware scheduling? ► Use Register Renaming in allocator minimal number of named registers maximal number of register instances ► Do not do scheduling that CPU can do over-scheduling can be worse than no scheduling at all
