Outline
- Simple pipeline – hazards and solutions
- Out-of-order execution
- Register renaming
- In-order commit
Quick recap – Pipelining
Control Hazard
Branch delay slot:
  bnz r1, label
  add r1, r2, r3     ; delay-slot instruction, executes either way
label:
  sub r1, r2, r3
Saves one cycle of stall. Fetching on the negative clock edge saves another.
[Pipeline diagram: the branch (bez r1, label) goes through IF/ID/EX/MEM/WB; a bubble follows its IF, then the IF of the target.]
Branch Prediction
With deeper pipelines, such static compiler techniques no longer work. Instead, dynamically remember the last targets and outcomes of each branch and decide on the basis of that history.
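The history-based scheme above can be sketched as a table of 2-bit saturating counters indexed by branch PC; this is a classic illustrative design, and the names and table size here are assumptions, not any real processor's parameters.

```python
# Sketch of dynamic branch prediction with 2-bit saturating counters.
# BHT_SIZE and the indexing scheme are illustrative assumptions.

BHT_SIZE = 16  # entries; real branch history tables are much larger

class TwoBitPredictor:
    def __init__(self):
        # counter values 0,1 -> predict not-taken; 2,3 -> predict taken
        self.table = [1] * BHT_SIZE

    def predict(self, pc):
        return self.table[pc % BHT_SIZE] >= 2

    def update(self, pc, taken):
        i = pc % BHT_SIZE
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)

# A loop-closing branch at pc=0x40: taken 9 times, then falls through.
p = TwoBitPredictor()
outcomes = [True] * 9 + [False]
hits = 0
for taken in outcomes:
    if p.predict(0x40) == taken:
        hits += 1
    p.update(0x40, taken)
print(hits)  # prints 8: one mispredict warming up, one at loop exit
```

The 2-bit counter is the point: a single mispredict (e.g. the loop exit) does not flip the prediction, so the predictor stays correct when the loop runs again.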
Data Hazards
RAW hazard – Read After Write:
  add   r1, r2, r3
  store r1, 0(r4)
WAW hazard – Write After Write:
  div r1, r3, r4
  …
  add r1, r10, r5
WAR hazard – Write After Read: generally not relevant in simple pipelines.
Remedies
Bypass values (data forwarding): RAW hazards are tackled this way.
Not all RAW hazards can be solved by forwarding, e.g. the load delay:
  load r1, 0(r2)
  add  r3, r1, r4
Solutions:
- Software – compiler techniques
- Hardware – out-of-order execution
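The load-delay case above is the one RAW hazard forwarding cannot hide, so the pipeline must insert a bubble. A minimal detection sketch, assuming a made-up instruction tuple format `(opcode, dest, src1, src2)`:

```python
# Hypothetical hazard-detection sketch: stall one cycle when a load is
# immediately followed by an instruction that reads the loaded register.

def needs_load_use_stall(prev, curr):
    """True if `curr` reads the register that the load `prev` writes."""
    return prev[0] == "load" and prev[1] in curr[2:]

prog = [
    ("load", "r1", "r2", None),  # load r1, 0(r2)
    ("add",  "r3", "r1", "r4"),  # add  r3, r1, r4  <- load-use hazard
]
stalls = sum(needs_load_use_stall(a, b) for a, b in zip(prog, prog[1:]))
print(stalls)  # prints 1: one bubble needed
```

A compiler applies the software remedy by scheduling an independent instruction between the load and the add, making this check come up empty.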
Out of Order Execution
source: EV8 DEC Alpha processor, © Intel
Register Renaming
  lw   r4, 0(r1)     →  lw   p2, 0(p7)
  addi r2, r4, 0x20  →  addi p1, p2, 0x20
  and  r3, r4, r1    →  and  p3, p2, p7
  xor  r4, r2, r4    →  xor  p5, p1, p2
  sub  r2, r4, r3    →  sub  p6, p5, p3
WAW hazards eliminated.
Useful for new processors, which have a larger number of physical registers than logical ones.
Register Map (final state):
  Logical Register → Physical Register
  R1 → P7    R2 → P6    R3 → P3    R4 → P5
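The renaming rule behind the table is simple: read sources through the current map, then allocate a fresh physical register for each destination. A sketch reproducing the slide's example; the initial map entries and free-list order are assumptions chosen to match it:

```python
# Minimal register-renaming sketch. Sources are looked up *before* the
# destination is remapped, so each write breaks WAW/WAR dependences.

def rename(instrs, init_map, free_list):
    rmap = dict(init_map)
    out = []
    for op, dst, srcs in instrs:
        new_srcs = [rmap[s] for s in srcs]  # read old mappings first
        rmap[dst] = free_list.pop(0)        # then allocate the dest
        out.append((op, rmap[dst], new_srcs))
    return out, rmap

instrs = [
    ("lw",   "r4", ["r1"]),
    ("addi", "r2", ["r4"]),
    ("and",  "r3", ["r4", "r1"]),
    ("xor",  "r4", ["r2", "r4"]),
    ("sub",  "r2", ["r4", "r3"]),
]
# r2/r3/r4 are written before being read, so their initial mapping
# ("p0" here) is a never-read placeholder.
renamed, final_map = rename(
    instrs,
    {"r1": "p7", "r2": "p0", "r3": "p0", "r4": "p0"},
    ["p2", "p1", "p3", "p5", "p6"],
)
print(final_map)  # {'r1': 'p7', 'r2': 'p6', 'r3': 'p3', 'r4': 'p5'}
```

Note how the two writes to r4 land in different physical registers (p2, then p5), which is exactly why the WAW hazard disappears.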
In-order Retirement
After execution, each instruction gets queued up in a table (the re-order buffer). This table ensures that the initial program order is maintained: instructions are allowed to become permanent (commit) only when they reach the top of the re-order table.
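The queue discipline described above can be sketched in a few lines; the function names and instruction tags are hypothetical. Instructions may finish out of order, but they commit strictly from the head:

```python
# Sketch of in-order retirement with a re-order buffer (ROB).
from collections import deque

rob = deque()   # head = oldest instruction in program order
done = set()    # instructions whose execution has completed

def dispatch(tag):
    rob.append(tag)

def finish(tag):  # execution completes, possibly out of order
    done.add(tag)

def retire():
    """Commit from the head only while the head has finished."""
    committed = []
    while rob and rob[0] in done:
        committed.append(rob.popleft())
    return committed

for t in ["i1", "i2", "i3"]:
    dispatch(t)
finish("i3")      # i3 finishes first...
print(retire())   # ...but nothing commits: i1 still pending -> []
finish("i1"); finish("i2")
print(retire())   # now all three commit, in program order
```

This is what makes out-of-order execution safe: results become architecturally permanent only in the original program order.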
Remedies to Structural Hazards
Simplest solution: increase resources and functional units (silicon budgets allow us to do this).
Another solution: pipeline the functional units. Pipelining is not always possible or feasible.
Superscalar Execution!
- Execute more than one instruction every cycle
- Make better use of the functional units
- Fetch and commit more instructions every cycle
Memory Organization in Processors
Caches inside the chip:
- Faster – ‘closer’ SRAM cells
- They contain recently-used data
- They contain data in ‘blocks’
Rationale behind Caches
- Principle of spatial locality
- Principle of temporal locality
- Replacement policy (LRU, LFU, etc.)
- Principle of inclusivity
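Temporal locality and LRU replacement can be demonstrated together with a toy fully-associative cache; the 2-block capacity and the access trace are made-up numbers for illustration:

```python
# Toy LRU cache: an OrderedDict whose key order tracks recency of use.
from collections import OrderedDict

CAPACITY = 2           # blocks the cache can hold (illustrative)
cache = OrderedDict()  # block address -> data

def access(block):
    """Return True on hit, False on miss; apply LRU replacement."""
    if block in cache:
        cache.move_to_end(block)   # mark as most recently used
        return True
    if len(cache) >= CAPACITY:
        cache.popitem(last=False)  # evict the least recently used
    cache[block] = object()        # "fetch" the block from memory
    return False

trace = [0, 1, 0, 2, 0, 1]
hits = [access(b) for b in trace]
print(hits)  # [False, False, True, False, True, False]
```

Block 0 is re-referenced often (temporal locality), so LRU keeps it resident; block 1 is evicted while cold and misses on its return.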