Stamatis Vassiliadis Symposium Sept. 28, 2007 J. E. Smith Future Superscalar Processors Based on Instruction Compounding.


1 Stamatis Vassiliadis Symposium Sept. 28, 2007 J. E. Smith Future Superscalar Processors Based on Instruction Compounding

2 Future Microprocessors 2 Instruction Compounding (Fusing)
- Instruction compounding, or “fusing”, has become a key idea in high-performance microprocessors
- “A compound instruction reflects the parallel issue of instructions; it comprises some number of independent instructions or interlocked instructions”
- “Instructions composing a compound instruction need not be consecutive.”
  -- S. Vassiliadis et al., IBM Journal of R and D, Jan. 1994

3 The Future Processor: Three Key Aspects
- Instruction compounding or fusing
  - Based on S. Vassiliadis's work
  - Employs compounding and a 3-input ALU
- Co-designed VM for dynamic translation/fusing
  - Concealed from all software
  - Optimized (fused) instructions held in a code cache
- Dual-decoder front end for fast startup
  - Hardware front-end decoder for fast startup
  - Software translator for sustained high performance
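
The interplay between the hardware decoder, the software translator, and the code cache can be illustrated with a toy dispatcher. This is only a sketch under stated assumptions: the class `CoDesignedVM`, the `HOT_THRESHOLD` value, and the execution-count hotspot test are hypothetical and do not come from the slides, which give no detection policy.

```python
from collections import Counter

HOT_THRESHOLD = 3  # hypothetical value; the slides do not state a threshold


class CoDesignedVM:
    """Toy model: cold code uses the hardware decoder, hot code the code cache."""

    def __init__(self, translate):
        self.translate = translate    # software translator (slow, optimizing)
        self.code_cache = {}          # block address -> fused translation
        self.exec_counts = Counter()  # per-block execution counts

    def run_block(self, addr):
        """Return the name of the path that executes the block at addr."""
        if addr in self.code_cache:
            return "code-cache"
        self.exec_counts[addr] += 1
        if self.exec_counts[addr] >= HOT_THRESHOLD:
            # Block became hot: translate once, then run the fused version.
            self.code_cache[addr] = self.translate(addr)
            return "code-cache"
        # Cold code: the hardware front-end decoder gives fast startup.
        return "hw-decoder"


translated = []
vm = CoDesignedVM(lambda addr: translated.append(addr) or f"fused@{addr:#x}")
paths = [vm.run_block(0x400) for _ in range(4)]
print(paths)
```

In this model the translator runs once per hot block, so its cost is paid only for code that is executed repeatedly.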

4 Processor Micro-architecture

5 Fusible Instruction Set
- RISC-ops with unique features:
  - A fusible bit per instruction fuses two dependent instructions
  - Dense instruction encoding, 16/32-bit ISA design
- Special features to support the x86 ISA:
  - Condition codes
  - Addressing modes
  - Aware of long immediate & displacement values

6 Microarchitecture: Macro-op Execution
- Enhanced OOO superscalar microarchitecture
  - Process & execute fused macro-ops as single instructions throughout the entire pipeline

7 Macro-op Fusing Algorithm
- Objectives:
  - Maximize fused dependent pairs
  - Simple & fast
- Heuristics:
  - Pipelined scheduler: only single-cycle ALU ops can be a head. Minimize non-fused single-cycle ALU ops.
  - Criticality: fuse instructions that are “close” in the original sequence. ALU-op criticality is easier to estimate.
  - Simplicity: 2 or fewer distinct register operands per fused pair.
- Solution: two-pass fusing algorithm
  - The 1st pass, a forward scan, prioritizes ALU ops, i.e. for each ALU-op tail candidate, look backward in the scan for its head.
  - The 2nd pass considers all kinds of RISC-ops as tail candidates.

8 Fusing Algorithm: Example

x86 asm:
-----------------------------------------------------------
1. lea   eax, DS:[edi + 01]
2. mov   [DS:080b8658], eax
3. movzx ebx, SS:[ebp + ecx << 1]
4. and   eax, 0000007f
5. mov   edx, DS:[eax + esi << 0 + 0x7c]

RISC-ops:
-----------------------------------------------------
1. ADD   Reax, Redi, 1
2. ST    Reax, mem[R22]
3. LD.zx Rebx, mem[Rebp + Recx << 1]
4. AND   Reax, 0000007f
5. ADD   R17, Reax, Resi
6. LD    Redx, mem[R17 + 0x7c]

After fusing: Macro-ops
-----------------------------------------------------
1. ADD   R18, Redi, 1 :: AND Reax, R18, 007f
2. ST    R18, mem[R22]
3. LD.zx Rebx, mem[Rebp + Recx << 1]
4. ADD   R17, Reax, Resi :: LD Redx, mem[R17 + 0x7c]
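
The two-pass pairing can be sketched on the six RISC-ops of this example. This is a simplified illustration, not the translator's actual implementation: the `RiscOp` record and the helper names are hypothetical, the "2 or fewer distinct register operands" rule is read here as a limit on source registers coming from outside the pair (an assumption), and the sketch omits register renaming (such as the R18 temporary), any distance limit for "close" instructions, and the checks needed to reorder instructions safely.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class RiscOp:
    name: str
    dest: Optional[str]        # destination register, None for stores
    srcs: Tuple[str, ...]      # source registers
    single_cycle_alu: bool = False


def external_sources(head: RiscOp, tail: RiscOp) -> Set[str]:
    """Registers the fused pair reads from outside the pair."""
    return set(head.srcs) | (set(tail.srcs) - {head.dest})


def find_head(ops: List[RiscOp], t: int, fused: Set[int]) -> Optional[int]:
    """Look backward from tail candidate t for a usable head."""
    for src in ops[t].srcs:
        for h in range(t - 1, -1, -1):
            if ops[h].dest == src:  # nearest earlier producer of src
                head = ops[h]
                if (head.single_cycle_alu and h not in fused
                        and len(external_sources(head, ops[t])) <= 2):
                    return h
                break  # an older writer of src is not the true producer
    return None


def fuse(ops: List[RiscOp]) -> List[Tuple[int, int]]:
    """Return (head, tail) index pairs chosen by the two passes."""
    fused: Set[int] = set()
    pairs: List[Tuple[int, int]] = []
    # Pass 1 considers only single-cycle ALU ops as tails; pass 2 any op.
    for alu_tails_only in (True, False):
        for t, op in enumerate(ops):
            if t in fused or (alu_tails_only and not op.single_cycle_alu):
                continue
            h = find_head(ops, t, fused)
            if h is not None:
                fused.update((h, t))
                pairs.append((h, t))
    return pairs


ops = [
    RiscOp("ADD", "Reax", ("Redi",), single_cycle_alu=True),
    RiscOp("ST", None, ("Reax", "R22")),
    RiscOp("LD.zx", "Rebx", ("Rebp", "Recx")),
    RiscOp("AND", "Reax", ("Reax",), single_cycle_alu=True),
    RiscOp("ADD", "R17", ("Reax", "Resi"), single_cycle_alu=True),
    RiscOp("LD", "Redx", ("R17",)),
]
print(fuse(ops))  # [(0, 3), (4, 5)]
```

With zero-based indices, the result pairs RISC-ops 1 and 4 (ADD :: AND) in the first pass and RISC-ops 5 and 6 (ADD :: LD) in the second, matching the macro-ops listed above; the store and the zero-extending load stay unfused.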

9 Instruction Fusing Profile
- 55+% of RISC-ops are fused, which increases effective ILP by a factor of 1.4
- Only 6% single-cycle ALU ops left un-fused
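
One way to read the 1.4 figure is as the ratio of RISC-ops to pipeline entities: every fused pair occupies a single slot, so fusing a fraction f of the RISC-ops leaves 1 - f/2 entities per original RISC-op. The short calculation below is an interpretation under that assumption, not a formula given in the slides, and the function name is hypothetical.

```python
def entity_reduction(fused_fraction: float) -> float:
    """RISC-ops per pipeline entity when fused_fraction of ops are fused in pairs."""
    if not 0.0 <= fused_fraction <= 1.0:
        raise ValueError("fused_fraction must be between 0 and 1")
    # Each fused pair turns two RISC-ops into one entity.
    return 1.0 / (1.0 - fused_fraction / 2.0)


print(round(entity_reduction(0.55), 2))  # 1.38
```

With 55% of RISC-ops fused, the ratio is about 1.38, consistent with the stated 1.4.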

10 Other DBT Software Profile
- Of all fused macro-ops:
  - 50%: ALU-ALU pairs.
  - 30%: fused condition test & conditional branch pairs.
  - Others: mostly ALU-MEM op pairs.
- Of all fused macro-ops:
  - 70+% are inter-x86-instruction fusions.
  - 46% access two distinct source registers.
  - Only 15% (6% of all instruction entities) write two distinct destination registers.
- Translation overhead profile:
  - About 1000 instructions per translated hotspot instruction.

11 Co-designed x86 Processor Performance

12 Dual Decoder Front-End

13 Evaluation: Startup Performance

14 Activity of HW Assists

15 Important Research Issues
- Profiling
  - Probe insertion via software translator not feasible
- Multi-core
  - Shared code cache
  - SMT designs
- Memory consistency
  - Stores can be done in-order
  - Re-scheduled loads may be important for performance
- Precise traps
  - Potential HW assist?

