Instruction Set Issues MIPS easy –Instructions are only committed at MEM → WB transition Other architectures are more difficult –Instructions may update.


1

2 Instruction Set Issues
MIPS easy
  – Instructions are only committed at the MEM → WB transition
Other architectures are more difficult
  – Instructions may update state early
  – FP more difficult
  – Memory updating ops (e.g. string moves)

3 Instruction Set Issues (cont.)
Difficult architectural features
  – “Odd” bits of state (e.g. condition codes)
      May need saving/restoring on exceptions
  – Implicitly set condition codes
      Complicate branch resolution
      Explicit setting helps here (still a RAW hazard)
  – Multicycle operations
      Widely differing execution times, lots of potential data hazards, etc.

4 Instruction Set Issues
VAX suffers from many of these problems
Solution: pipeline the microcode
Intel 32-bit 80x86 processors since 1995 use a similar approach

5 A.5. Handling Multicycle Operations
MIPS: FP operations
  – Long latency (EX repeated)
  – Several functional units
  – Structural hazards
  – Data hazards

6 DLX: FP Design
Four functional units:
  – Integer ALU: as before
  – FP multiplier: also used for integer multiplication
  – FP adder: addition, subtraction and conversion
  – FP divider: also used for integer division

7 MIPS Design with FP Units

8 MIPS Multicycle Operations
Unit             Latency   Initiation Interval
Integer ALU      0         1
Memory (loads)   1         1
FP add           3         1
FP multiply      6         1
FP divide        24        25
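The table can be read with a small Python sketch. It assumes the usual textbook meaning of the two columns, which the slide itself does not spell out: latency is the number of intervening cycles between a producer and a dependent consumer, and initiation interval is the number of cycles that must elapse before the same unit accepts another operation. The dictionary keys and function names are made up for illustration.

```python
# (latency, initiation interval) per unit, copied from the table above.
UNITS = {
    "int_alu": (0, 1),
    "load": (1, 1),
    "fp_add": (3, 1),
    "fp_mul": (6, 1),
    "fp_div": (24, 25),
}


def earliest_dependent_issue(unit, issue_cycle):
    """Earliest cycle at which an instruction that uses the result can start.

    Assumes full forwarding: 'latency' intervening cycles, then the consumer.
    """
    latency, _ = UNITS[unit]
    return issue_cycle + latency + 1


def earliest_same_unit_issue(unit, issue_cycle):
    """Earliest cycle at which the same unit can accept another operation."""
    _, interval = UNITS[unit]
    return issue_cycle + interval
```

Under these assumptions, a consumer of an FP multiply started in cycle 0 can start in cycle 7, and a second divide cannot start before cycle 25, which is why the divider is the unit that causes structural hazards.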

9 Hazards
Divides
  – Structural hazard
Multiple register writes possible in a cycle
Out-of-order completion
  – WAW hazards
  – Exception-handling complications
RAW hazards increase

10 Potential RAW Hazards
Example (SPARC syntax):
    ldd   [%fp-8], %f4
    fmuld %f4, %f6, %f0
    faddd %f0, %f8, %f2
    std   %f2, [%fp-16]

Instr   Stages
ld      FDXMW
mul     FDXXXXXXXMW
add     FDXXXXMW
st      FDXM
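To see how many stall cycles the dependence chain in this example can cost, here is a minimal Python sketch. It is a simplified model, not the pipeline from the lecture: it assumes in-order issue, full forwarding, operands needed on entry to EX, and the latencies 1 (load), 6 (FP multiply) and 3 (FP add). The names `LATENCY`, `PROGRAM` and `raw_stalls` are hypothetical.

```python
# Assumed latencies: intervening cycles between producer and consumer.
LATENCY = {"ldd": 1, "fmuld": 6, "faddd": 3, "std": 0}

# (opcode, FP source registers, destination register or None).
# The integer address register %fp is assumed to be ready and is left out.
PROGRAM = [
    ("ldd", [], "f4"),
    ("fmuld", ["f4", "f6"], "f0"),
    ("faddd", ["f0", "f8"], "f2"),
    ("std", ["f2"], None),
]


def raw_stalls(program):
    """Return (opcode, stall cycles) pairs caused by RAW dependences."""
    ready = {}      # register -> first cycle a consumer may enter EX
    prev_ex = -1    # cycle in which the previous instruction entered EX
    stalls = []
    for op, srcs, dst in program:
        no_hazard = prev_ex + 1  # EX cycle if there were no dependences
        ex = max([no_hazard] + [ready.get(r, 0) for r in srcs])
        stalls.append((op, ex - no_hazard))
        if dst is not None:
            ready[dst] = ex + LATENCY[op] + 1
        prev_ex = ex
    return stalls
```

With these assumptions `raw_stalls(PROGRAM)` reports 1 stall for `fmuld`, 6 for `faddd` and 3 for `std`. A real pipeline that forwards store data directly to MEM could need fewer stall cycles for the store than this model charges.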

11 Multiple Writes
Up to four instructions may need to write in the same cycle
Solution
  – Track writes in ID
  – Stall at instruction issue
  – Simpler: all stalls occur at one point
Alternatively:
  – Stall at MEM or WB
      Stall the instruction with the shorter latency (may free RAW hazards)
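The "track writes in ID" option can be sketched as a reservation of the register write port. The Python below is an illustration under stated assumptions: a single write port, in-order issue of one instruction per cycle, and a fixed distance from ID to WB for each instruction (3 cycles for integer ops and loads, 6 for FP add, 9 for FP multiply, counted from the stage strings on the RAW hazards slide). The function name is hypothetical.

```python
def issue_with_write_port_check(instrs):
    """Schedule instructions so that no two reach WB in the same cycle.

    instrs: list of (name, cycles_from_ID_to_WB).
    Returns a list of (name, id_cycle, wb_cycle).
    """
    reserved = set()  # WB cycles already claimed by issued instructions
    cycle = 0
    schedule = []
    for name, to_wb in instrs:
        # Stall in ID until the write-back cycle is free.
        while cycle + to_wb in reserved:
            cycle += 1
        reserved.add(cycle + to_wb)
        schedule.append((name, cycle, cycle + to_wb))
        cycle += 1  # the next instruction can be in ID one cycle later
    return schedule
```

For example, an `fmuld` in ID at cycle 0 claims WB in cycle 9; an `faddd` that would reach ID at cycle 3 would also write in cycle 9, so it is held for one cycle and writes in cycle 10.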

12 WAW Hazards
Example:
    faddd %f4, %f6, %f2
    …                      ! Integer op
    ldd   [%fp-8], %f2

Instr   Stages
faddd   FDXXXXMW
…       FDXMW
ldd     FDXMW

13 WAW Hazards (cont.)
Rare
  – Compiler scheduling may result in unlikely instruction sequences, so must be caught
Solutions:
  – Stall issue of ldd
  – Prevent write by faddd
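The first solution, stalling the issue of the later write, can be sketched in Python as follows. This is an illustrative check, not the lecture's hardware: `pending_writes` is a hypothetical map from destination register to the cycle in which an already-issued instruction will write it, and cycle numbers count from the fetch of `faddd` in the example (so `faddd` writes in cycle 7 and `ldd`, in ID at cycle 3, would write in cycle 6).

```python
def waw_stall(pending_writes, dest, id_cycle, to_wb):
    """Cycles to stall in ID so this write lands after any pending write to dest.

    pending_writes: dict mapping register name -> WB cycle of an earlier,
                    still-executing instruction (hypothetical bookkeeping).
    id_cycle:       cycle in which the new instruction is in ID.
    to_wb:          cycles from ID to WB for the new instruction.
    """
    earlier_wb = pending_writes.get(dest)
    if earlier_wb is None:
        return 0  # nobody else is about to write this register
    return max(0, earlier_wb + 1 - (id_cycle + to_wb))
```

Here `waw_stall({"f2": 7}, "f2", 3, 3)` gives 2: delaying `ldd` by two cycles moves its write to cycle 8, after `faddd`. The second solution on the slide avoids the stall by cancelling the `faddd` write instead.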

14 Maintaining Precise Exceptions
Out-of-order completion:
    fdivd %f2, %f4, %f0
    faddd %f10, %f8, %f10
    fsubd %f12, %f14, %f12
faddd and fsubd complete long before fdivd
fsubd may cause an exception after faddd is complete, but before fdivd is
No longer precise

15 Maintaining Precise Exceptions
It may be very difficult to handle exceptions precisely
  – E.g. the add has destroyed one of its operands!
Four solutions:
  – Accept imprecise exceptions
      Precise exceptions are needed for VM & IEEE FP
      Allow switching between precise and imprecise modes

16 Maintaining Precise Exceptions
Solutions (cont.)
  – Buffer results until earlier instructions complete
      Buffers may grow very large, and extensive forwarding required
      History files: restore original register values
      Future files: store new register values
  – Software executes intervening instructions to get “up to date” before returning from exception
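The history-file idea can be illustrated with a short Python sketch: every register write logs the value it overwrites, and when an older instruction faults, the writes of younger instructions are undone in reverse order. The class, its method names and the sequence numbers are assumptions made for this example; real hardware also bounds the size of the log.

```python
class HistoryFile:
    """Register file that remembers overwritten values for rollback."""

    def __init__(self, regs):
        self.regs = dict(regs)
        self.log = []  # (seq, reg, old_value) in completion order

    def write(self, seq, reg, value):
        """Record the old value, then perform the write for instruction seq."""
        self.log.append((seq, reg, self.regs[reg]))
        self.regs[reg] = value

    def rollback_after(self, faulting_seq):
        """Undo writes made by instructions younger than faulting_seq."""
        for seq, reg, old in reversed(self.log):
            if seq > faulting_seq:
                self.regs[reg] = old
        self.log = [e for e in self.log if e[0] <= faulting_seq]


# Program order: 0 = fdivd, 1 = faddd, 2 = fsubd (as in the earlier example).
rf = HistoryFile({"f8": 2.0, "f10": 1.0, "f12": 5.0, "f14": 3.0})
rf.write(1, "f10", 3.0)   # faddd completes early and overwrites %f10
rf.write(2, "f12", 2.0)   # fsubd completes early and overwrites %f12
rf.rollback_after(0)      # fdivd raises an exception: restore %f10, %f12
```

After the rollback the register file again holds the operands that `faddd` and `fsubd` had destroyed, so the exception can be treated as precise. A future file takes the opposite approach and keeps the new values aside until older instructions complete.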

17 Maintaining Precise Exceptions
Solutions (cont.)
  – Hybrid scheme
      Instructions are only issued when it is certain that preceding instructions will not cause an exception
      May require stalling the pipeline

18 Performance of the MIPS FP Pipeline
Structural Hazards (divide unit)
  – Very low: 0-2 cycles per FP operation
RAW hazards
  – Divide: cycles, average 14.2
  – Add: cycles, average 1.7
  – In general, about 0.5 × latency
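The "about 0.5 × latency" rule of thumb can be checked against the two averages quoted on this slide. The Python sketch below assumes that the latencies from the earlier multicycle-operations table (FP add 3, FP multiply 6, FP divide 24) apply to these measurements; the slide does not say so explicitly, and the function name is hypothetical.

```python
# Latencies assumed from the multicycle-operations table earlier in the deck.
FP_LATENCY = {"add": 3, "multiply": 6, "divide": 24}

# Average RAW stall cycles quoted on this slide.
MEASURED_AVG_STALL = {"add": 1.7, "divide": 14.2}


def estimated_raw_stall(op, factor=0.5):
    """Rule-of-thumb RAW stall estimate: a fixed fraction of the latency."""
    return factor * FP_LATENCY[op]


for op, measured in MEASURED_AVG_STALL.items():
    ratio = measured / FP_LATENCY[op]
    print(f"{op}: estimate {estimated_raw_stall(op)}, measured {measured}, "
          f"ratio {ratio:.2f}")
```

Under that assumption the measured averages come out slightly above half the latency: roughly 0.57 of the latency for add and 0.59 for divide.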

19 Overall MIPS FP Performance
Stalls per instruction – cycles
  – Average: 0.87
  – 82% from FP RAW hazards
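As a rough worked example of what these figures imply, the Python sketch below converts the average stall count into a CPI and splits out the share caused by FP RAW hazards. It assumes an ideal CPI of 1.0 for the unstalled pipeline, which the slide does not state; the function names are made up for illustration.

```python
def cpi_with_stalls(stalls_per_instruction, ideal_cpi=1.0):
    """CPI of a pipeline whose only loss is stall cycles (assumed ideal CPI)."""
    return ideal_cpi + stalls_per_instruction


def stalls_from_share(stalls_per_instruction, share):
    """Stall cycles per instruction attributable to one cause."""
    return stalls_per_instruction * share


avg_stalls = 0.87        # average stalls per instruction, from the slide
fp_raw_share = 0.82      # fraction caused by FP RAW hazards, from the slide

print(round(cpi_with_stalls(avg_stalls), 2))
print(round(stalls_from_share(avg_stalls, fp_raw_share), 2))
```

With an ideal CPI of 1.0 this gives a CPI of about 1.87, of which roughly 0.71 cycles per instruction are FP RAW stalls.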

20 A.6. Putting It All Together
MIPS R4000 Pipeline
64-bit instruction set
Eight stage pipeline
  – superpipelining
  – IF + IS: instruction fetch
  – RF: decode/register fetch
  – EX: execution
  – DF + DS + TC: data cache access
  – WB: write back

21 MIPS R4000 Pipeline
Performance
  – Load delay: two cycles
  – Branch delay: three cycles
      Delayed branch (one cycle)
      Predict-not-taken strategy, with annulling
Increased forwarding requirements
  – Three stages between EX and WB now

22 MIPS R4000 Pipeline
Floating Point
  – Three functional units
      Divider, multiplier, adder
      Shared components (8 sub-units)
  – Latency: 2–112 cycles
  – Initiation rate: 1–111 cycles
  – Complicated stall handling

23 MIPS R4000 Pipeline
Performance:
  – CPI between 1.2 and 2.8 for SPEC92 benchmarks
  – Average: 2.0
      Integer: 1.54
      FP: 2.48
  – Integer apps: mainly branch delays
  – FP apps: mainly FP data hazard stalls (RAW)

24

