Published byΒαρ-ιησούς Κοντολέων Modified over 6 years ago
1
Future Superscalar Processors Based on Instruction Compounding
J. E. Smith, Stamatis Vassiliadis Symposium, Sept. 28, 2007
2
Future Microprocessors
Instruction Compounding (Fusing)
Instruction compounding, or “fusing”, has become a key idea in high-performance microprocessors.
“A compound instruction reflects the parallel issue of instructions; it comprises some number of independent instructions or interlocked instructions”
“Instructions composing a compound instruction need not be consecutive.”
-- S. Vassiliadis et al., IBM Journal of Research and Development, Jan. 1994
Macro-op Fusing: DBT software
3
The Future Processor: Three Key Aspects
Instruction compounding or fusing
  Based on S. Vassiliadis's work
  Employs compounding and a 3-input ALU
Co-designed VM for dynamic translation/fusing
  Concealed from all software
  Optimized (fused) instructions held in a code cache
Dual decoder front-end for fast startup
  Hardware front-end decoder for fast startup
  Software translator for sustained high performance
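A rough way to picture how the code cache and the two decode paths interact is a dispatch loop: cold code runs through the hardware decoder, and a block that becomes hot is translated once and then served from the code cache. The sketch below is only an illustration under stated assumptions. The per-block counter, the `HOT_THRESHOLD` value, and the `hw_decode` / `sw_translate` stand-ins are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field

HOT_THRESHOLD = 3  # assumed value; the slides do not give a hotspot threshold


@dataclass
class CoDesignedVM:
    """Toy model: cold code uses the hardware decoder, hot code is fused."""
    code_cache: dict = field(default_factory=dict)   # block address -> fused code
    exec_counts: dict = field(default_factory=dict)  # block address -> times seen
    log: list = field(default_factory=list)          # which path served each call

    def hw_decode(self, addr):
        # Stand-in for the hardware x86 decoder path (no fusing).
        return f"unfused@{addr:#x}"

    def sw_translate(self, addr):
        # Stand-in for the software translator that produces fused macro-ops.
        return f"fused@{addr:#x}"

    def run_block(self, addr):
        if addr in self.code_cache:            # hit: reuse the optimized code
            self.log.append("code-cache")
            return self.code_cache[addr]
        count = self.exec_counts.get(addr, 0) + 1
        self.exec_counts[addr] = count
        if count >= HOT_THRESHOLD:             # block has become a hotspot
            self.code_cache[addr] = self.sw_translate(addr)
            self.log.append("translate")
            return self.code_cache[addr]
        self.log.append("hw-decode")           # fast startup path
        return self.hw_decode(addr)


vm = CoDesignedVM()
for _ in range(5):
    vm.run_block(0x400)
print(vm.log)
# ['hw-decode', 'hw-decode', 'translate', 'code-cache', 'code-cache']
```

The point of the arrangement is that startup never waits for the translator, while steady-state execution comes from fused code.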
4
Processor Micro-architecture
5
Fusible Instruction Set
RISC-ops with unique features:
  A fusible bit per instruction fuses two dependent instructions
  Dense instruction encoding, 16/32-bit ISA design
Special features to support the x86 ISA:
  Condition codes
  Addressing modes
  Aware of long immediate and displacement values
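To make the fusible bit and the mixed 16/32-bit encoding concrete, here is a small decoding sketch. The bit positions are invented for illustration (`FUSIBLE_BIT` and `LONG_BIT` are hypothetical, not the real encoding), and it assumes that a head marked fusible pairs with the RISC-op that immediately follows it in the code cache.

```python
FUSIBLE_BIT = 0x8000  # assumed: top bit of the first halfword marks a fused head
LONG_BIT = 0x4000     # assumed: next bit selects the 32-bit format


def group_macro_ops(halfwords):
    """Split 16-bit halfwords into RISC-ops, then pair fused heads with tails.

    Each op is (first_halfword, length_in_bytes). The result is a list of
    tuples: a 2-tuple is a fused macro-op (head, tail), a 1-tuple is a
    single un-fused RISC-op.
    """
    ops = []
    i = 0
    while i < len(halfwords):
        first = halfwords[i]
        length = 4 if first & LONG_BIT else 2
        ops.append((first, length))
        i += length // 2          # skip the second halfword of a 32-bit op

    groups = []
    j = 0
    while j < len(ops):
        head = ops[j]
        if head[0] & FUSIBLE_BIT and j + 1 < len(ops):
            groups.append((head, ops[j + 1]))  # head fuses with the next op
            j += 2
        else:
            groups.append((head,))
            j += 1
    return groups


stream = [0x8001, 0x0002, 0x4003, 0xBEEF, 0xC004, 0x1234, 0x0005]
groups = group_macro_ops(stream)
print([len(g) for g in groups])  # [2, 1, 2]
```

A single bit per instruction is enough here because the pairing is positional: the decoder never has to search for the tail.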
6
Microarchitecture: Macro-op Execution
Enhanced out-of-order (OOO) superscalar microarchitecture
Process and execute fused macro-ops as single instructions throughout the entire pipeline
7
Macro-op Fusing Algorithm
Objectives:
  Maximize fused dependent pairs
  Simple and fast
Heuristics:
  Pipelined scheduler: only single-cycle ALU ops can be a head; minimize non-fused single-cycle ALU ops
  Criticality: fuse instructions that are “close” in the original sequence; ALU-op criticality is easier to estimate
  Simplicity: 2 or fewer distinct register operands per fused pair
Solution: two-pass fusing algorithm
  The 1st pass, a forward scan, prioritizes ALU ops: for each ALU-op tail candidate, look backward in the scan for its head
  The 2nd pass considers all kinds of RISC-ops as tail candidates
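The two-pass scan can be sketched in a few lines. This is a simplified illustration, not the translator's actual code: the `RiscOp` record, the `MAX_DISTANCE` window, and the reading of the simplicity rule as "at most two distinct source registers" are assumptions, and legality checks a real translator needs (register renaming, memory ordering, crossing pairs) are left out. The input is the six RISC-op sequence from the example slide.

```python
from dataclasses import dataclass
from typing import Optional

MAX_DISTANCE = 8  # assumed search window for "close" instructions


@dataclass
class RiscOp:
    name: str
    dst: Optional[str]   # destination register, or None (e.g. a store)
    srcs: tuple          # source registers
    alu1: bool           # True for single-cycle ALU ops
    fused_with: Optional[int] = None


def can_fuse(ops, h, t):
    head, tail = ops[h], ops[t]
    if head.fused_with is not None or tail.fused_with is not None:
        return False
    # Only a single-cycle ALU op that feeds the tail can be a head.
    if not head.alu1 or head.dst is None or head.dst not in tail.srcs:
        return False
    # Ops between head and tail must not redefine a register the tail reads.
    for mid in ops[h + 1:t]:
        if mid.dst is not None and mid.dst in tail.srcs:
            return False
    # Simplicity heuristic (assumed to mean source registers).
    pair_srcs = set(head.srcs) | (set(tail.srcs) - {head.dst})
    return len(pair_srcs) <= 2


def fuse(ops):
    for tails_must_be_alu in (True, False):     # pass 1, then pass 2
        for t, tail in enumerate(ops):          # forward scan over tails
            if tails_must_be_alu and not tail.alu1:
                continue
            # Look backward, nearest candidate head first.
            for h in range(t - 1, max(t - 1 - MAX_DISTANCE, -1), -1):
                if can_fuse(ops, h, t):
                    ops[h].fused_with, ops[t].fused_with = t, h
                    break
    return [(i, op.fused_with) for i, op in enumerate(ops)
            if op.fused_with is not None and i < op.fused_with]


ops = [
    RiscOp("ADD", "Reax", ("Redi",), True),
    RiscOp("ST", None, ("Reax", "R22"), False),
    RiscOp("LD.zx", "Rebx", ("Rebp", "Recx"), False),
    RiscOp("AND", "Reax", ("Reax",), True),
    RiscOp("ADD", "R17", ("Reax", "Resi"), True),
    RiscOp("LD", "Redx", ("R17",), False),
]
pairs = fuse(ops)
print(pairs)  # [(0, 3), (4, 5)]
```

Pass 1 pairs the first ADD with the AND; pass 2 then picks up the ADD/LD pair, which is the same grouping the example slide shows.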
8
Fusing Algorithm: Example
x86 asm:
  1. lea eax, DS:[edi + 01]
  2. mov [DS:080b8658], eax
  3. movzx ebx, SS:[ebp + ecx << 1]
  4. and eax, 007f
  5. mov edx, DS:[eax + esi << 0 + 0x7c]
RISC-ops:
  1. ADD Reax, Redi, 1
  2. ST Reax, mem[R22]
  3. LD.zx Rebx, mem[Rebp + Recx << 1]
  4. AND Reax, 007f
  5. ADD R17, Reax, Resi
  6. LD Redx, mem[R17 + 0x7c]
After fusing (macro-ops):
  1. ADD R18, Redi, 1 :: AND Reax, R18, 007f
  2. ST R18, mem[R22]
  3. LD.zx Rebx, mem[Rebp + Recx << 1]
  4. ADD R17, Reax, Resi :: LD Redx, mem[R17+0x7c]
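One detail worth checking in this example is the temporary R18: the AND is hoisted above the store, so the store must read the pre-AND value from a renamed register. The toy interpreter below runs the original and fused sequences and compares the x86-visible state. It is a sketch under assumptions: the register and memory contents are made up, R22 is assumed to hold the address 080b8658, the AND immediate is taken as 007f (as in the fused form), and the zero-extension of LD.zx is not modeled.

```python
def run(program, regs, mem):
    """Run (opcode, reg, a, b) tuples; return final registers and memory."""
    r, m = dict(regs), dict(mem)

    def val(x):
        if isinstance(x, tuple):                  # (register, shift) scaled index
            return r[x[0]] << x[1]
        return r[x] if isinstance(x, str) else x  # register name or immediate

    for op, reg, a, b in program:
        if op == "ADD":
            r[reg] = (val(a) + val(b)) & 0xFFFFFFFF
        elif op == "AND":
            r[reg] = val(a) & val(b)
        elif op == "LD":                          # reg <- mem[a + b]
            r[reg] = m.get(val(a) + val(b), 0)
        elif op == "ST":                          # mem[a + b] <- reg
            m[val(a) + val(b)] = r[reg]
    return r, m


load_ebx = ("LD", "Rebx", "Rebp", ("Recx", 1))
tail_pair = [("ADD", "R17", "Reax", "Resi"), ("LD", "Redx", "R17", 0x7C)]

original = [("ADD", "Reax", "Redi", 1), ("ST", "Reax", "R22", 0), load_ebx,
            ("AND", "Reax", "Reax", 0x7F)] + tail_pair
fused = [("ADD", "R18", "Redi", 1), ("AND", "Reax", "R18", 0x7F),
         ("ST", "R18", "R22", 0), load_ebx] + tail_pair
# Hoisting the AND without renaming: the store now sees the masked value.
naive = [("ADD", "Reax", "Redi", 1), ("AND", "Reax", "Reax", 0x7F),
         ("ST", "Reax", "R22", 0), load_ebx] + tail_pair

regs = {"Redi": 0xFF, "R22": 0x080B8658, "Rebp": 0x1000, "Recx": 4, "Resi": 0x20}
mem = {0x1008: 0xABCD, 0x9C: 0x55}
ARCH = ("Reax", "Rebx", "Redx")   # x86-visible registers touched by the example


def arch_state(program):
    r, m = run(program, regs, mem)
    return {k: r[k] for k in ARCH}, m


print(arch_state(original) == arch_state(fused))  # True
print(arch_state(original) == arch_state(naive))  # False
```

With these inputs the store writes 0x100 in the original and fused sequences but 0 in the un-renamed variant, which is why the fused form introduces R18.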
9
Instruction Fusing Profile
55+% of RISC-ops are fused, which increases effective ILP by a factor of 1.4
Only 6% of single-cycle ALU ops are left un-fused
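The 1.4 figure follows from simple counting if one assumes that every fused RISC-op travels as half of a two-op macro-op and that the pipeline handles the same number of entities per cycle as before. That model is an assumption made here for illustration, not something the slide spells out.

```python
def effective_ilp_gain(fused_fraction):
    """RISC-ops per pipeline entity when fused ops travel as pairs."""
    # Un-fused ops cost one entity each; fused ops cost half an entity each.
    entities_per_op = (1.0 - fused_fraction) + fused_fraction / 2.0
    return 1.0 / entities_per_op


gain = effective_ilp_gain(0.55)
print(round(gain, 2))  # 1.38
```

With 55% of RISC-ops fused, 100 RISC-ops become about 72.5 entities, so the same issue width carries roughly 1.38 times as many RISC-ops, in line with the quoted 1.4.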
10
Other DBT Software Profile
Of all fused macro-ops:
  50% are ALU-ALU pairs
  30% are fused condition-test and conditional-branch pairs
  The others are mostly ALU-MEM op pairs
  70+% are inter-x86-instruction fusions
  46% access two distinct source registers
  Only 15% (6% of all instruction entities) write two distinct destination registers
Translation overhead profile:
  About 1000 instructions per translated hotspot instruction
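The parenthetical "6% of all instruction entities" can be cross-checked against the fusing profile. The sketch assumes the 55% fused fraction from the previous slide and counts each fused pair as one entity; both the function name and this counting model are illustrative assumptions.

```python
def macro_op_share(fused_fraction):
    """Fraction of instruction entities that are fused macro-ops."""
    macro_ops = fused_fraction / 2.0   # two fused RISC-ops form one macro-op
    singles = 1.0 - fused_fraction     # un-fused RISC-ops stay single entities
    return macro_ops / (macro_ops + singles)


share = macro_op_share(0.55)
two_dest_share = 0.15 * share          # 15% of macro-ops write two destinations
print(round(share, 2), round(two_dest_share, 3))  # 0.38 0.057
```

Roughly 38% of entities are macro-ops, and 15% of those is about 5.7% of all entities, which rounds to the 6% on the slide.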
11
Co-designed x86 Processor Performance
12
Dual Decoder Front-End
13
Evaluation: Startup Performance
14
Activity of HW x86 Decoder
15
Important Research Issues
Profiling
  Probe insertion via the software translator is not feasible
Multi-core
  Shared code cache
  SMT designs
Memory consistency
  Stores can be done in order
  Re-scheduled loads may be important for performance
Precise traps
  Potential HW assist?