Chapter 4 The Processor CprE 381 Computer Organization and Assembly Level Programming, Fall 2012 Revised from original slides provided by MKP.

Chapter 4 — The Processor — 2 Single-Cycle Processor (SCP) With Jumps Added

Chapter 4 — The Processor — 3 Performance Issues: The longest delay determines the clock period. The critical path is the load instruction: instruction memory → register file → ALU → data memory → register file. It is not feasible to vary the period for different instructions, so the single-cycle design violates the design principle of making the common case fast. We will improve performance by pipelining.

Chapter 4 — The Processor — 4 Pipelining Analogy (§4.5 An Overview of Pipelining): Pipelined laundry overlaps execution; parallelism improves performance. Four loads: speedup = 8/3.5 ≈ 2.3. Non-stop: speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages, for large n.
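The laundry numbers can be checked with a few lines of Python. This is only a sketch: the function name `laundry_speedup` is hypothetical, and the four stages of 0.5 hours each are an assumption inferred from the slide's formula (2 hours per load, 4 stages).

```python
def laundry_speedup(n_loads, stage_hours=0.5, n_stages=4):
    """Speedup of pipelined laundry over doing each load start to finish.

    Assumes every stage takes the same time (stage_hours), which is what
    the slide's formula 2n / (0.5n + 1.5) implies.
    """
    # Sequential: each load goes through all stages before the next starts.
    sequential = n_loads * n_stages * stage_hours
    # Pipelined: the first load fills the pipeline, then one load finishes
    # every stage_hours.
    pipelined = (n_stages + n_loads - 1) * stage_hours
    return sequential / pipelined


if __name__ == "__main__":
    print(round(laundry_speedup(4), 2))       # four loads
    print(round(laundry_speedup(10**6), 2))   # "non-stop" case
```

With many loads the fill time of the pipeline becomes negligible, which is why the speedup approaches the number of stages.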

Chapter 4 — The Processor — 5 MIPS Pipeline: Five stages, one step per stage
1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register

Chapter 4 — The Processor — 6 Pipeline Performance: Assume the time for stages is 100 ps for register read or write and 200 ps for other stages. Compare the pipelined datapath with the single-cycle datapath.

Instr     Instr fetch  Register read  ALU op  Memory access  Register write  Total time
lw        200 ps       100 ps         200 ps  200 ps         100 ps          800 ps
sw        200 ps       100 ps         200 ps  200 ps         -               700 ps
R-format  200 ps       100 ps         200 ps  -              100 ps          600 ps
beq       200 ps       100 ps         200 ps  -              -               500 ps

Chapter 4 — The Processor — 7 Pipeline Performance: Single-cycle (Tc = 800 ps) versus pipelined (Tc = 200 ps)
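The two clock periods follow directly from the stage times assumed on the preceding slide (100 ps for register read or write, 200 ps for the other stages). The sketch below recomputes them; the dictionary and function names are illustrative, not part of the course material.

```python
# Stage delays in picoseconds, as assumed on the Pipeline Performance slide.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Stages each instruction class actually needs in the single-cycle datapath.
USES = {
    "lw": ["IF", "ID", "EX", "MEM", "WB"],
    "sw": ["IF", "ID", "EX", "MEM"],
    "R-format": ["IF", "ID", "EX", "WB"],
    "beq": ["IF", "ID", "EX"],
}


def total_time_ps(instr):
    """Sum of the stage delays an instruction class uses."""
    return sum(STAGE_PS[stage] for stage in USES[instr])


def single_cycle_period_ps():
    """Single-cycle clock must fit the slowest instruction (lw)."""
    return max(total_time_ps(instr) for instr in USES)


def pipelined_period_ps():
    """Pipelined clock must fit the slowest single stage."""
    return max(STAGE_PS.values())


if __name__ == "__main__":
    for instr in USES:
        print(instr, total_time_ps(instr), "ps")
    print("single-cycle Tc:", single_cycle_period_ps(), "ps")
    print("pipelined Tc:", pipelined_period_ps(), "ps")
```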

Chapter 4 — The Processor — 8 Pipeline Speedup: If all stages are balanced (i.e., all take the same time), then Time between instructions (pipelined) = Time between instructions (nonpipelined) / Number of stages. If the stages are not balanced, the speedup is less. The speedup is due to increased throughput; latency (the time for each instruction) does not decrease.
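As a rough check of the speedup claim, the sketch below compares total execution time for n instructions, using the 800 ps single-cycle period and the 200 ps pipelined period from the earlier Pipeline Performance slides. The function names are hypothetical, and hazards are ignored.

```python
def single_cycle_time_ps(n, period_ps=800):
    """Each instruction takes one full (long) clock cycle."""
    return n * period_ps


def pipelined_time_ps(n, stages=5, period_ps=200):
    """The first instruction needs `stages` cycles; each later one adds a cycle."""
    return (n + stages - 1) * period_ps


def speedup(n):
    return single_cycle_time_ps(n) / pipelined_time_ps(n)


if __name__ == "__main__":
    for n in (3, 1000, 1_000_000):
        print(n, round(speedup(n), 3))
```

For large n the ratio approaches 800/200 = 4 rather than the stage count of 5, because the stages are not balanced. The latency of a single instruction is 5 × 200 ps = 1000 ps in the pipeline, which is not shorter than the 800 ps single-cycle latency.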

Chapter 4 — The Processor — 9 Pipelining and ISA Design: The MIPS ISA is designed for pipelining. All instructions are 32 bits: easier to fetch and decode in one cycle (cf. x86: 1- to 17-byte instructions). Few and regular instruction formats: can decode and read registers in one step. Load/store addressing: can calculate the address in the 3rd stage and access memory in the 4th stage. Alignment of memory operands: memory access takes only one cycle.

Chapter 4 — The Processor — 10 Hazards: Situations that prevent starting the next instruction in the next cycle. Structure hazard: a required resource is busy. Data hazard: need to wait for a previous instruction to complete its data read/write. Control hazard: deciding on a control action depends on a previous instruction. There are ways to handle those hazards; let us ignore them for now.

Chapter 4 — The Processor — 11 MIPS Pipelined Datapath (§4.6 Pipelined Datapath and Control): [Figure: pipelined datapath.] The right-to-left flows (from the WB and MEM stages) lead to hazards.

Chapter 4 — The Processor — 12 Pipeline Registers: Registers are needed between stages to hold information produced in the previous cycle.

Chapter 4 — The Processor — 13 Pipeline Operation: Cycle-by-cycle flow of instructions through the pipelined datapath. A “single-clock-cycle” pipeline diagram shows pipeline usage in a single cycle and highlights the resources used; cf. a “multi-clock-cycle” diagram, which graphs operation over time. We will look at “single-clock-cycle” diagrams for load and store.

Chapter 4 — The Processor — 14 IF for Load, Store, …

Chapter 4 — The Processor — 15 ID for Load, Store, …

Chapter 4 — The Processor — 16 EX for Load

Chapter 4 — The Processor — 17 MEM for Load

Chapter 4 — The Processor — 18 WB for Load (note: wrong register number)

Chapter 4 — The Processor — 19 Corrected Datapath for Load

Chapter 4 — The Processor — 20 EX for Store

Chapter 4 — The Processor — 21 MEM for Store

Chapter 4 — The Processor — 22 WB for Store

Chapter 4 — The Processor — 23 Multi-Cycle Pipeline Diagram: Form showing resource usage

Chapter 4 — The Processor — 24 Multi-Cycle Pipeline Diagram: Traditional form

Chapter 4 — The Processor — 25 Single-Cycle Pipeline Diagram: State of the pipeline in a given cycle

Chapter 4 — The Processor — 26 Pipelined Control (Simplified)

Chapter 4 — The Processor — 27 Pipelined Control: Control signals are derived from the instruction, as in the single-cycle implementation.

Chapter 4 — The Processor — 28 Pipelined Control

Chapter 4 — The Processor — 29 Pipeline Summary (The BIG Picture): Pipelining improves performance by increasing instruction throughput. It executes multiple instructions in parallel; each instruction has the same latency. It is subject to hazards: structure, data, control (will be studied). Instruction set design affects the complexity of the pipeline implementation.

Chapter 4 — The Processor — 30 Hazards: Situations that prevent starting the next instruction in the next cycle. Structure hazard: a required resource is busy. Data hazard: need to wait for a previous instruction to complete its data read/write. Control hazard: deciding on a control action depends on a previous instruction.

Chapter 4 — The Processor — 31 Structure Hazards: Conflict for use of a resource. In a MIPS pipeline with a single memory, a load/store requires data access, so instruction fetch would have to stall for that cycle, causing a pipeline “bubble”. Hence, pipelined datapaths require separate instruction/data memories, or separate instruction/data caches.
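To make the single-memory conflict concrete, here is a minimal sketch that computes the cycle in which each instruction is fetched when instruction fetch and data access share one memory. It is a simplified model under stated assumptions: the function name `fetch_cycles` is hypothetical, MEM is taken to occur three cycles after IF, and no other hazards are modeled.

```python
def fetch_cycles(program):
    """Return the cycle in which each instruction is fetched.

    `program` is a list of opcode strings. Only lw and sw access data
    memory; with a single shared memory, fetch must wait in any cycle
    where an earlier lw/sw is in its MEM stage.
    """
    mem_busy = set()   # cycles in which the memory is used for data access
    cycle = 0
    fetches = []
    for op in program:
        cycle += 1
        while cycle in mem_busy:   # structure hazard: insert a bubble
            cycle += 1
        fetches.append(cycle)
        if op in ("lw", "sw"):
            mem_busy.add(cycle + 3)  # IF, ID, EX, then MEM three cycles later
    return fetches


if __name__ == "__main__":
    print(fetch_cycles(["lw", "add", "sub", "and"]))
    print(fetch_cycles(["add", "sub", "and", "or"]))
```

In the first program the fourth instruction cannot be fetched in cycle 4 because the lw is using the memory, so its fetch slips by one cycle. With separate instruction and data memories the conflict disappears.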

Chapter 4 — The Processor — 32 Structure Hazards: How about the Registers? In a given cycle, an lw/ALU instruction may write to the Registers while a new instruction is reading from the Registers. This is NOT a structure hazard, because the two instructions use different ports of the Registers. It is a data hazard (to be discussed).

Chapter 4 — The Processor — 33 Data Hazards: An instruction depends on completion of data access by a previous instruction.

    add $s0, $t0, $t1
    sub $t2, $s0, $t3

How about the following code?

    lw  $t0, 100($gp)
    lw  $t1, 104($gp)
    add $t2, $t0, $t1
    sub $t3, $t2, $s0
    sw  $t3, 108($gp)
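One way to answer the question for the five-instruction sequence is to list its register dependences mechanically. The sketch below is illustrative only: the tuple encoding of instructions and the function name `raw_dependences` are assumptions, not part of the slides.

```python
# Each instruction: (opcode, destination register or None, source registers).
PROGRAM = [
    ("lw",  "$t0", ["$gp"]),          # lw  $t0, 100($gp)
    ("lw",  "$t1", ["$gp"]),          # lw  $t1, 104($gp)
    ("add", "$t2", ["$t0", "$t1"]),   # add $t2, $t0, $t1
    ("sub", "$t3", ["$t2", "$s0"]),   # sub $t3, $t2, $s0
    ("sw",  None,  ["$t3", "$gp"]),   # sw  $t3, 108($gp)
]


def raw_dependences(instrs):
    """Return (producer_index, consumer_index, register) triples.

    A consumer depends on the most recent earlier instruction that
    writes one of its source registers (read-after-write).
    """
    last_writer = {}
    deps = []
    for i, (_op, dest, srcs) in enumerate(instrs):
        for reg in srcs:
            if reg in last_writer:
                deps.append((last_writer[reg], i, reg))
        if dest is not None and dest != "$zero":
            last_writer[dest] = i
    return deps


if __name__ == "__main__":
    for producer, consumer, reg in raw_dependences(PROGRAM):
        print(f"instruction {consumer} reads {reg} written by instruction {producer}")
```

Every instruction after the two loads depends on its immediate predecessor, so each of those pairs is a potential data hazard in the pipeline.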

Chapter 4 — The Processor — 34 Data Hazards: A naïve approach is to stall the 2nd instruction in the dependence.

    add $s0, $t0, $t1
    sub $t2, $s0, $t3

Chapter 4 — The Processor — 35 Data Hazards in ALU Instructions

Chapter 4 — The Processor — 36 Dependencies & Forwarding

Chapter 4 — The Processor — 37 Detecting the Need to Forward: Pass register numbers along the pipeline; e.g., ID/EX.RegisterRs is the register number for Rs sitting in the ID/EX pipeline register. The ALU operand register numbers in the EX stage are given by ID/EX.RegisterRs and ID/EX.RegisterRt. Data hazards occur when:
1a. EX/MEM.RegisterRd = ID/EX.RegisterRs (forward from EX/MEM pipeline register)
1b. EX/MEM.RegisterRd = ID/EX.RegisterRt (forward from EX/MEM pipeline register)
2a. MEM/WB.RegisterRd = ID/EX.RegisterRs (forward from MEM/WB pipeline register)
2b. MEM/WB.RegisterRd = ID/EX.RegisterRt (forward from MEM/WB pipeline register)

Chapter 4 — The Processor — 38 Detecting the Need to Forward: But only if the forwarding instruction will write to a register (EX/MEM.RegWrite, MEM/WB.RegWrite), and only if Rd for that instruction is not $zero (EX/MEM.RegisterRd ≠ 0, MEM/WB.RegisterRd ≠ 0).

Chapter 4 — The Processor — 39 Forwarding Paths

Chapter 4 — The Processor — 40 Forwarding Conditions

EX hazard:
    if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
      ForwardA = 10
    if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
      ForwardB = 10

MEM hazard:
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
      ForwardA = 01
    if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
        and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
      ForwardB = 01
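The forwarding conditions translate almost line for line into code. The sketch below is a software model, not the hardware design: the function name `forwarding_unit` and its argument names are hypothetical. One assumption goes beyond the conditions as written on this slide: when both the EX and MEM conditions match the same operand, the EX/MEM value is chosen, since it is the more recent result.

```python
def forwarding_unit(id_ex_rs, id_ex_rt,
                    ex_mem_regwrite, ex_mem_rd,
                    mem_wb_regwrite, mem_wb_rd):
    """Return (ForwardA, ForwardB) as 2-bit values.

    0b00: operand comes from the register file
    0b10: forward from the EX/MEM pipeline register
    0b01: forward from the MEM/WB pipeline register
    """
    forward_a = 0b00
    forward_b = 0b00

    # MEM hazard: the instruction two ahead writes a register we read.
    if mem_wb_regwrite and mem_wb_rd != 0:
        if mem_wb_rd == id_ex_rs:
            forward_a = 0b01
        if mem_wb_rd == id_ex_rt:
            forward_b = 0b01

    # EX hazard: checked last so it overrides the MEM hazard
    # (assumption: the most recent result must win).
    if ex_mem_regwrite and ex_mem_rd != 0:
        if ex_mem_rd == id_ex_rs:
            forward_a = 0b10
        if ex_mem_rd == id_ex_rt:
            forward_b = 0b10

    return forward_a, forward_b


if __name__ == "__main__":
    # add $s0, $t0, $t1 followed by sub $t2, $s0, $t3:
    # $s0 is register 16, $t3 is register 11.
    print(forwarding_unit(16, 11, True, 16, False, 0))
```

For the add/sub pair the model selects the EX/MEM value for the first ALU operand and the register file for the second.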

Chapter 4 — The Processor — 41 More Thoughts on Data Hazards: When are we concerned with data hazards? When instruction B uses the output of another instruction A and, at the time B reads the Registers, A has not yet written to the Registers. In a single-cycle processor this never happens: all operations of A complete before B starts. In a pipelined processor, multiple instructions are pending in the pipeline.

Chapter 4 — The Processor — 42 More Thoughts on Data Hazards: Two types of data dependences. Register: the value is passed through a register (producers: ALU instructions, lw; consumers: ALU instructions, lw, sw, beq). Memory: the value is passed through memory (producer: sw; consumer: lw).

Chapter 4 — The Processor — 43 More Thoughts on Data Hazards: No pipeline bubble with data forwarding. An ALU instruction produces its register output value at the end of its EX stage, and other instructions consume their register inputs at the beginning of their EX stage, so the datum is already in the pipeline when needed. How about the lw instruction? An lw instruction produces its register output value at the end of its MEM stage.
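The timing argument can be turned into a small calculation. This is a sketch under stated assumptions: stages are numbered IF=1 through WB=5, full forwarding is available, the consumer needs its operand at the start of its EX stage, and the names `READY_AT_END_OF` and `stall_cycles` are hypothetical.

```python
# Stage number at whose end the producer's result exists.
READY_AT_END_OF = {"alu": 3, "lw": 4}   # EX is stage 3, MEM is stage 4
EX_STAGE = 3


def stall_cycles(producer, distance):
    """Bubbles needed between a producer and a dependent consumer.

    `distance` is how many instructions later the consumer is issued
    (1 means it immediately follows the producer). If the producer is
    fetched in cycle 1, it occupies stage s in cycle s.
    """
    # First cycle in which the result can be forwarded to an ALU input.
    ready_cycle = READY_AT_END_OF[producer] + 1
    # Cycle in which the consumer would enter EX without any stall.
    needed_cycle = EX_STAGE + distance
    return max(0, ready_cycle - needed_cycle)


if __name__ == "__main__":
    print("ALU then dependent instruction:", stall_cycles("alu", 1))
    print("lw then dependent instruction:", stall_cycles("lw", 1))
    print("lw, one unrelated instruction, then use:", stall_cycles("lw", 2))
```

Under these assumptions an ALU result never forces a bubble, while a value loaded by lw arrives one cycle too late for the instruction immediately following it.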
