ENGS 116 Lecture 6: Pipelining Difficulties and MIPS R4000. Vincent H. Berk, October 6, 2008. Reading for today: A.3 – A.4, article: Yeager.

1 ENGS 116 Lecture 6: Pipelining Difficulties and MIPS R4000
Vincent H. Berk, October 6, 2008
Reading for today: A.3 – A.4, article: Yeager
Reading for Wednesday: A.5 – A.6, article: Smith & Pleszkun
FRIDAY: NO CLASS

2 Exception Characterization
Synchronous vs. asynchronous
– Synchronous: event occurs at the same place every time
– Asynchronous: caused by devices external to the CPU and memory, also hardware malfunctions
User requested vs. user coerced
– Requested: user task asks for it
– Coerced: hardware event not under control of the user program
User maskable vs. user nonmaskable
– Maskable: event that can be disabled by the user task
Within vs. between instructions
– Within: occurs during execution of an instruction; hard to handle; usually synchronous, since the instruction is the trigger
Resume vs. terminate
– Terminating: execution always stops after the interrupt

3 Exception Handling
Table of interrupt vector addresses
– Base register of this table is stored in the CPU by the OS
– Addresses of the interrupt handling routines are stored in the table
– On interrupt, the CPU jumps to the handler whose address is stored at: base + 4 * int_num
Usually 16 or 32 interrupts
Physical pins on the CPU, as well as software calls
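
The table lookup above can be sketched in a few lines of Python. This is only an illustration, assuming 4-byte table entries (taken from the "4 * int_num" on the slide) and modelling memory as a dictionary; the names vector_entry_address, dispatch, and memory are hypothetical and not part of the lecture.

```python
ENTRY_SIZE = 4  # bytes per table entry, from "base + 4 * int_num" on the slide


def vector_entry_address(base, int_num, num_vectors=32):
    """Return the address of the table entry for interrupt int_num."""
    if not 0 <= int_num < num_vectors:
        raise ValueError("interrupt number out of range")
    return base + ENTRY_SIZE * int_num


def dispatch(memory, base, int_num):
    """Look up the handler address stored in the table entry.

    memory is a hypothetical dict mapping an address to the word stored there.
    """
    return memory[vector_entry_address(base, int_num)]


# Hypothetical table at 0x1000 whose entry 3 points at a handler at 0x8000.
memory = {0x1000 + ENTRY_SIZE * 3: 0x8000}
handler = dispatch(memory, base=0x1000, int_num=3)
```

The range check reflects the "usually 16 or 32 interrupts" remark: an interrupt number outside the table would index past it.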

4 Exception Examples (see also: Figure A.27)
I/O request: device requests attention from CPU
System call or supervisor call from software
Breakpoint or instruction tracing: software debugging, single-step
Arithmetic: integer or FP, overflow, underflow, division by zero
Page fault: requested virtual address was not present in main memory
Misaligned address: bus error
Memory protection: read/write/execute forbidden on requested address
Invalid opcode: CPU was given a wrongly formatted instruction
Hardware malfunction: CRC errors, component failure
Power failure

5 Pipelining Complications
Exceptions: 5 instructions executing in 5-stage pipeline
– How to stop the pipeline?
– How to restart the pipeline?
– Who caused the exception?
Stage   Problem exceptions occurring
IF      Page fault on instruction fetch; misaligned memory access; memory-protection violation
ID      Undefined or illegal opcode
EX      Arithmetic interrupt
MEM     Page fault on data fetch; misaligned memory access; memory-protection violation

6 Pipelining Complications
Simultaneous exceptions in more than one pipeline stage, e.g.,
– Load with data page fault in MEM stage
– Add with instruction page fault in IF stage
– Add fault will happen BEFORE load fault
Solution #1
– Interrupt status vector per instruction
– Defer check till last stage, kill state update if exception
Solution #2
– Interrupt ASAP
– Restart everything that is incomplete
Another advantage for state update late in pipeline!
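
A small Python sketch of Solution #1 may help show why deferring the check restores program order. It assumes an ideal 5-stage pipeline with no stalls, so instruction i enters IF in cycle i + 1; the function name and the data layout are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def detect_and_handle(program):
    """program: list of (name, {stage: exception}) in program order.

    Returns (detection_order, first_handled):
    - detection_order: instruction names in the order the hardware
      detects their faults (by clock cycle),
    - first_handled: the instruction whose exception is taken first when
      the per-instruction status vector is checked only in the last stage.
    """
    raised = []
    for i, (name, faults) in enumerate(program):
        for s, stage in enumerate(STAGES):
            if stage in faults:
                # Instruction i is in stage s during cycle i + 1 + s.
                raised.append((i + 1 + s, name))
    detection_order = [name for _, name in sorted(raised)]

    # Instructions reach the last stage in program order, so the first
    # one carrying a non-empty status vector is handled first.
    first_handled = next((name for name, faults in program if faults), None)
    return detection_order, first_handled


program = [
    ("LD", {"MEM": "data page fault"}),
    ("ADD", {"IF": "instruction page fault"}),
]
detection_order, first_handled = detect_and_handle(program)
```

In this example the ADD fault is detected in cycle 2 and the LD fault in cycle 4, yet the LD exception is the one handled first, matching program order.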

7 Pipelining Complications
Complex addressing modes and instructions
Address modes: autoincrement causes register change during instruction execution
– Interrupts? Need to restore register state
– Adds WAR and WAW hazards since writes are no longer in the last stage
Memory-memory move instructions
– Must be able to handle multiple page faults
– Long-lived instructions: partial state save on interrupt
Floating point: long execution time; out-of-order completion

8 Stopping and Starting Execution
The most difficult exception occurrences have 2 properties
– They occur within instructions
– They must be restartable
The pipeline must be shut down safely and the state must be saved for correct restarting
Restarting is usually done by saving the PC of the instruction at which to start
Branches and delayed branches require special treatment
Precise exceptions allow instructions just before the exception to be completed, while restarting instructions after the exception

9 Figure A.29 The MIPS pipeline with three additional unpipelined, floating-point, functional units.
[Diagram: IF, ID, then one of four EX units (integer unit, FP/integer multiply, FP adder, FP/integer divider), then MEM, WB]

10 Figure A.31 A pipeline that supports multiple outstanding FP operations.
[Diagram: IF, ID, then one of: integer unit (EX); FP/integer multiply (M1 to M7); FP adder (A1 to A4); FP/integer divider (DIV), then MEM, WB]

11 Figure A.33 A typical FP code sequence showing the stalls arising from RAW hazards.

12 Case Study: MIPS R4000 (100 MHz to 200 MHz)
8-stage pipeline:
– IF: first half of instruction fetch; PC selection happens here, as well as initiation of the instruction cache access.
– IS: second half of the instruction cache access.
– RF: instruction decode and register fetch, hazard checking, and instruction cache hit detection.
– EX: execution, which includes effective address calculation, ALU operation, and branch target computation and condition evaluation.
– DF: data fetch, first half of the data cache access.
– DS: second half of the data cache access.
– TC: tag check, determines whether the data cache access hit.
– WB: write back for loads and register-register operations.
8 stages: What is the impact on load delay? Branch delay? Why?

13 Figure A.37 The eight-stage pipeline structure of the R4000 uses pipelined instruction and data cache accesses.
[Diagram labels: Instruction memory, Reg, ALU, Data memory, Reg; stages IF, IS, RF, EX, DF, DS, TC, WB]

14 Case Study: MIPS R4000
TWO Cycle Load Latency
[Pipeline diagram: successive instructions, each starting one cycle after the previous one, passing through IF IS RF EX DF DS TC WB]
THREE Cycle Branch Latency (conditions evaluated during EX phase)
[Pipeline diagram: successive instructions passing through IF IS RF EX DF DS TC]
– Delay slot plus two stalls
– Branch likely cancels delay slot if not taken
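
The two latencies can be read off the stage list by counting stages, as the following Python sketch shows. It assumes full forwarding, that load data becomes available at the end of DS, and that a dependent instruction needs its operand in EX; those assumptions and the helper names are illustrative and not stated in the transcript.

```python
R4000_STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]
CLASSIC_STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_distance(stages, earlier, later):
    """Number of pipeline stages from `earlier` to `later`."""
    return stages.index(later) - stages.index(earlier)


def load_delay(stages, data_ready_stage, use_stage="EX"):
    """Cycles a dependent instruction must wait behind a load (assumes forwarding)."""
    return stage_distance(stages, use_stage, data_ready_stage)


def branch_delay(stages, resolve_stage, fetch_stage="IF"):
    """Cycles between fetching a branch and knowing its outcome."""
    return stage_distance(stages, fetch_stage, resolve_stage)


r4000_load = load_delay(R4000_STAGES, "DS")        # EX -> DS
r4000_branch = branch_delay(R4000_STAGES, "EX")    # IF -> EX
classic_load = load_delay(CLASSIC_STAGES, "MEM")   # EX -> MEM
classic_branch = branch_delay(CLASSIC_STAGES, "ID")
```

Under these assumptions the deeper pipeline gives a load delay of 2 and a branch delay of 3, against 1 and 1 for a classic 5-stage pipeline that resolves branches in ID: splitting the cache accesses over several stages pushes both results later.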

15 MIPS R4000 Floating Point
FP adder, FP multiplier, FP divider
Last step of FP multiplier/divider uses FP adder hardware
8 kinds of stages in FP units:
Stage   Functional unit   Description
A       FP adder          Mantissa ADD stage
D       FP divider        Divide pipeline stage
E       FP multiplier     Exception test stage
M       FP multiplier     First stage of multiplier
N       FP multiplier     Second stage of multiplier
R       FP adder          Rounding stage
S       FP adder          Operand shift stage
U                         Unpack FP numbers

16 R4000 Performance
Not the ideal CPI of 1:
– Load stalls (1 or 2 clock cycles)
– Branch stalls (2 cycles + unfilled slots)
– FP result stalls: RAW data hazard (latency)
– FP structural stalls: not enough FP hardware (parallelism)
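
The four stall sources add up on top of the ideal CPI. The Python sketch below shows the arithmetic only; the per-instruction stall values are made-up placeholders, not R4000 measurements, and effective_cpi is a hypothetical helper name.

```python
def effective_cpi(ideal_cpi, stalls_per_instruction):
    """CPI = ideal CPI + average stall cycles per instruction from each source."""
    return ideal_cpi + sum(stalls_per_instruction.values())


# Placeholder values chosen only to illustrate the sum (NOT measured data).
example_stalls = {
    "load": 0.25,
    "branch": 0.5,
    "fp_result": 0.125,
    "fp_structural": 0.125,
}
cpi = effective_cpi(1.0, example_stalls)
```

With these placeholder numbers the stalls contribute one extra cycle per instruction, doubling the ideal CPI.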

17 Instruction Level Parallelism
Want to exploit parallelism among instruction sequences
Branches interfere with parallelism
– gcc has a branch every 5 or 6 instructions (on average)
Need to find sequences of unrelated instructions that can be overlapped
Often see loop-level parallelism
    for (i = 0; i < 100; i = i + 1)
        x[i] = x[i] + y[i];
Want to convert loop-level parallelism to instruction-level parallelism
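
One common way to make that conversion is loop unrolling, which the slide does not name; the Python sketch below is offered only as an illustration of the idea. Each iteration touches a different element, so several iterations can be written out side by side as independent statements with fewer branches between them. The function names and the unroll factor of 4 are assumptions.

```python
def add_vectors_rolled(x, y):
    """The loop from the slide: one element and one loop branch per iteration."""
    for i in range(len(x)):
        x[i] = x[i] + y[i]
    return x


def add_vectors_unrolled4(x, y):
    """Same computation, unrolled by 4: four independent adds per loop branch."""
    n = len(x)
    limit = n - n % 4
    for i in range(0, limit, 4):
        # These four statements do not depend on one another,
        # so hardware or a compiler is free to overlap them.
        x[i] = x[i] + y[i]
        x[i + 1] = x[i + 1] + y[i + 1]
        x[i + 2] = x[i + 2] + y[i + 2]
        x[i + 3] = x[i + 3] + y[i + 3]
    for i in range(limit, n):
        # Cleanup for lengths that are not a multiple of 4.
        x[i] = x[i] + y[i]
    return x


a = add_vectors_rolled(list(range(100)), list(range(100, 200)))
b = add_vectors_unrolled4(list(range(100)), list(range(100, 200)))
```

Python itself gains nothing from this; the point is only the shape of the code a compiler would hand to a pipelined machine.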

18 FP Loop: Where are the Hazards?
Loop:   LD      F0, 0(R1)       ; F0 = vector element
        ADDD    F4, F0, F2      ; add scalar in F2
        SD      0(R1), F4       ; store result
        SUBI    R1, R1, #8      ; decrement pointer 8 bytes (DW)
        BNEZ    R1, Loop        ; branch R1 != zero
        NOP                     ; delayed branch slot

19 FP Loop Hazards
Where are the stalls?
Loop:   LD      F0, 0(R1)       ; F0 = vector element
        ADDD    F4, F0, F2      ; add scalar in F2
        SD      0(R1), F4       ; store result
        SUBI    R1, R1, #8      ; decrement pointer 8 bytes (DW)
        BNEZ    R1, Loop        ; branch R1 != zero
        NOP                     ; delayed branch slot

20 FP Loop Showing Stalls
Rewrite code to minimize stalls?
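
The stalls in the FP loop can be counted with a small in-order issue model, sketched below in Python. The stall counts between producer and consumer (1 between a load and an FP ALU op, 2 between an FP ALU op and a store, 3 between two FP ALU ops) are assumptions of the kind used in textbook treatments; they do not appear in this transcript, and all names in the code are hypothetical.

```python
from collections import namedtuple

Instr = namedtuple("Instr", "text kind dest srcs")

# ASSUMED stall cycles between a producer kind and a dependent consumer kind.
ASSUMED_STALLS = {
    ("load", "fpalu"): 1,
    ("fpalu", "store"): 2,
    ("fpalu", "fpalu"): 3,
}


def schedule(instrs, stalls=ASSUMED_STALLS):
    """Return the issue cycle of each instruction (in-order, one issue per cycle)."""
    issue = []
    last_writer = {}  # register name -> index of the latest instruction writing it
    for i, ins in enumerate(instrs):
        cycle = issue[-1] + 1 if issue else 1
        for reg in ins.srcs:
            if reg in last_writer:
                p = last_writer[reg]
                gap = stalls.get((instrs[p].kind, ins.kind), 0)
                cycle = max(cycle, issue[p] + 1 + gap)
        issue.append(cycle)
        if ins.dest is not None:
            last_writer[ins.dest] = i
    return issue


LOOP = [
    Instr("LD   F0, 0(R1)", "load", "F0", ("R1",)),
    Instr("ADDD F4, F0, F2", "fpalu", "F4", ("F0", "F2")),
    Instr("SD   0(R1), F4", "store", None, ("R1", "F4")),
    Instr("SUBI R1, R1, #8", "int", "R1", ("R1",)),
    Instr("BNEZ R1, Loop", "branch", None, ("R1",)),
    Instr("NOP", "nop", None, ()),
]

issue_cycles = schedule(LOOP)
total_cycles = issue_cycles[-1]
stall_cycles = total_cycles - len(LOOP)
```

Under these assumed latencies the six instructions take 9 cycles per iteration: one stall after LD and two after ADDD, plus the NOP in the delay slot doing no useful work. Reordering the instructions so that independent work fills those gaps is the rewrite the slide asks about.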

