Presentation is loading. Please wait.

Presentation is loading. Please wait.

Spring 2005331 W11.1 14:332:331 Computer Architecture and Assembly Language Spring 2005 Week 11 Introduction to Pipelined Datapath [Adapted from Dave Patterson’s.

Similar presentations


Presentation on theme: "Spring 2005331 W11.1 14:332:331 Computer Architecture and Assembly Language Spring 2005 Week 11 Introduction to Pipelined Datapath [Adapted from Dave Patterson’s."— Presentation transcript:

1 Spring 2005331 W11.1 14:332:331 Computer Architecture and Assembly Language Spring 2005 Week 11 Introduction to Pipelined Datapath [Adapted from Dave Patterson’s UCB CS152 slides and Mary Jane Irwin’s PSU CSE331 slides]

2 Spring 2005331 W11.2 Head’s Up  Reminders l Pipelined datapath and control l HW#6 will be handed out soon

3 Spring 2005331 W11.3 Review: Multicycle Data and Control Path Address Read Data (Instr. or Data) Memory PC Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Read Data 2 ALU Write Data IR MDR A B ALUout Sign Extend Shift left 2 ALU control Shift left 2 ALUOp Control FSM IRWrite MemtoReg MemWrite MemRead IorD PCWrite PCWriteCond RegDst RegWrite ALUSrcA ALUSrcB zero PCSource 1 1 1 1 1 1 0 0 0 0 0 0 2 2 3 4 Instr[5-0] Instr[25-0] PC[31-28] Instr[15-0] Instr[31-26] 32 28

4 Spring 2005331 W11.4 Review: RTL Summary StepR-typeMem RefBranchJump Instr fetch IR = Memory[PC]; PC = PC + 4; Decode A = Reg[IR[25-21]]; B = Reg[IR[20-16]]; ALUOut = PC +(sign-extend(IR[15-0])<< 2); Execute ALUOut = A op B; ALUOut = A + sign-extend (IR[15-0]); if (A==B) PC = ALUOut; PC = PC[31-28] ||(IR[25-0] << 2); Memory access Reg[IR[15- 11]] = ALUOut; MDR = Memory[ALUOut]; or Memory[ALUOut] = B; Write- back Reg[IR[20-16]] = MDR;

5 Spring 2005331 W11.5 Review: Multicycle Datapath FSM Start Instr Fetch Decode Write Back Memory Access Execute (Op = R-type) (Op = beq) (Op = lw or sw) (Op = j) (Op = lw) (Op = sw) 01 2 3 4 5 6 7 8 9 Unless otherwise assigned PCWrite,IRWrite, MemWrite,RegWrite=0 others=X IorD=0 MemRead;IRWrite ALUSrcA=0 ALUsrcB=01 PCSource,ALUOp=00 PCWrite ALUSrcA=0 ALUSrcB=11 ALUOp=00 PCWriteCond=0 ALUSrcA=1 ALUSrcB=10 ALUOp=00 PCWriteCond=0 ALUSrcA=1 ALUSrcB=00 ALUOp=10 PCWriteCond=0 ALUSrcA=1 ALUSrcB=00 ALUOp=01 PCSource=01 PCWriteCond PCSource=10 PCWrite MemRead IorD=1 PCWriteCond=0 MemWrite IorD=1 PCWriteCond=0 RegDst=1 RegWrite MemtoReg=0 PCWriteCond=0 RegDst=0 RegWrite MemtoReg=1 PCWriteCond=0

6 Spring 2005331 W11.6 Review: FSM Implementation Combinational control logic State Reg Inst[31-26] Next State Inputs Outputs Op0Op1Op2Op3Op4Op5 PCWrite PCWriteCond IorD MemRead MemWrite IRWrite MemtoReg PCSource ALUOp ALUSourceB ALUSourceA RegWrite RegDst System Clock

7 Spring 2005331 W11.7 Single Cycle Disadvantages & Advantages  Uses the clock cycle inefficiently – the clock cycle must be timed to accommodate the slowest instruction  Is wasteful of area since some functional units must (e.g., adders) be duplicated since they can not be shared during a clock cycle but  Is simple and easy to understand Clk Single Cycle Implementation: lwswWaste Cycle 1Cycle 2

8 Spring 2005331 W11.8 Multicycle Advantages & Disadvantages  Uses the clock cycle efficiently – the clock cycle is timed to accommodate the slowest instruction step l balance the amount of work to be done in each step l restrict each step to use only one major functional unit  Multicycle implementations allow l functional units to be used more than once per instruction as long as they are used on different clock cycles l faster clock rates l different instructions to take a different number of clock cycles but  Requires additional internal state registers, muxes, and more complicated (FSM) control

9 Spring 2005331 W11.9 The Five Stages of Load Instruction  IFetch: Instruction Fetch and Update PC  Dec: Registers Fetch and Instruction Decode  Exec: Execute R-type; calculate memory address  Mem: Read/write the data from/to the Data Memory  WB: Write the data back to the register file Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5 IFetchDecExecMemWB lw

10 Spring 2005331 W11.10 Single Cycle vs. Multiple Cycle Timing Clk Cycle 1 Multiple Cycle Implementation: IFetchDecExecMemWB Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8Cycle 9Cycle 10 IFetchDecExecMem lwsw Clk Single Cycle Implementation: lwswWaste IFetch R-type Cycle 1Cycle 2 multicycle clock slower than 1/5 th of single cycle clock due to stage flipflop overhead

11 Spring 2005331 W11.11 Pipelined MIPS Processor  Start the next instruction while still working on the current one l improves throughput - total amount of work done in a given time l instruction latency (execution time, delay time, response time) is not reduced - time from the start of an instruction to its completion Cycle 1Cycle 2Cycle 3Cycle 4Cycle 5 IFetchDecExecMemWB lw Cycle 7Cycle 6Cycle 8 sw IFetchDecExecMemWB R-type IFetchDecExecMemWB

12 Spring 2005331 W11.12 Single Cycle, Multiple Cycle, vs. Pipeline Clk Cycle 1 Multiple Cycle Implementation: IFetchDecExecMemWB Cycle 2Cycle 3Cycle 4Cycle 5Cycle 6Cycle 7Cycle 8Cycle 9Cycle 10 lw IFetchDecExecMemWB IFetchDecExecMem lwsw Pipeline Implementation: IFetchDecExecMemWB sw Clk Single Cycle Implementation: LoadStoreWaste IFetch R-type IFetchDecExecMemWB R-type Cycle 1Cycle 2 wasted cycle

13 Spring 2005331 W11.13 Pipelining the MIPS ISA  What makes it easy l all instructions are the same length (32 bits) l few instruction formats (three) with symmetry across formats l memory operations can occur only in loads and stores l operands must be aligned in memory so a single data transfer requires only one memory access  What makes it hard l structural hazards: what if we had only one memory l control hazards: what about branches l data hazards: what if an instruction’s input operands depend on the output of a previous instruction

14 Spring 2005331 W11.14 MIPS Pipeline Datapath Modifications Read Address Instruction Memory Add PC 4 0 1 Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Read Data 2 1632 ALU 1 0 Shift left 2 Add Data Memory Address Write Data Read Data 1 0  What do we need to add/modify in our MIPS datapath? l State registers between pipeline stages to isolate them IFetch/Dec Dec/Exec Exec/Mem Mem/WB IFetch DecExecMemWB System Clock Sign Extend

15 Spring 2005331 W11.15 MIPS Pipeline Control Path Modifications Read Address Instruction Memory Add PC 4 0 1 Write Data Read Addr 1 Read Addr 2 Write Addr Register File Read Data 1 Read Data 2 1632 ALU 1 0 Shift left 2 Add Data Memory Address Write Data Read Data 1 0  All control signals are determined during Decode l and held in the state registers between pipeline stages IFetch/Dec Dec/Exec Exec/Mem Mem/WB IFetchDecExecMemWB System Clock Control Sign Extend

16 Spring 2005331 W11.16 Graphically Representing MIPS Pipeline  Can help with answering questions like: l how many cycles does it take to execute this code? l what is the ALU doing during cycle 4? l is there a hazard, why does it occur, and how can it be fixed? ALU IM Reg DMReg

17 Spring 2005331 W11.17 Why Pipeline? For Throughput! I n s t r. O r d e r Time (clock cycles) Inst 0 Inst 1 Inst 2 Inst 4 Inst 3 ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg Once the pipeline is full, one instruction is completed every cycle Time to fill the pipeline

18 Spring 2005331 W11.18 Can pipelining get us into trouble?  Yes: Pipeline Hazards l structural hazards: attempt to use the same resource by two different instructions at the same time l data hazards: attempt to use item before it is ready -instruction depends on result of prior instruction still in the pipeline l control hazards: attempt to make a decision before condition is evaulated -branch instructions  Can always resolve hazards by waiting l pipeline control must detect the hazard l take action (or delay action) to resolve hazards

19 Spring 2005331 W11.19 I n s t r. O r d e r Time (clock cycles) lw Inst 1 Inst 2 Inst 4 Inst 3 ALU Mem Reg MemReg ALU Mem Reg MemReg ALU Mem Reg MemReg ALU Mem Reg MemReg ALU Mem Reg MemReg A Unified Memory Would Be a Structural Hazard Reading data from memory Reading instruction from memory

20 Spring 2005331 W11.20 How About Register File Access? I n s t r. O r d e r Time (clock cycles) add Inst 1 Inst 2 Inst 4 add ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg Can fix register file access hazard by doing reads in the second half of the cycle and writes in the first half.

21 Spring 2005331 W11.21 Branch Instructions Cause Control Hazards I n s t r. O r d e r add beq lw Inst 4 Inst 3 ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg  Dependencies backward in time cause hazards

22 Spring 2005331 W11.22 One Way to “Fix” a Control Hazard I n s t r. O r d e r add beq ALU IM Reg DMReg ALU IM Reg DMReg Inst 3 lw ALU IM Reg DMReg ALU IM Reg DMReg Can fix branch hazard by waiting – stall – but affects throughput stall

23 Spring 2005331 W11.23 Register Usage Can Cause Data Hazards I n s t r. O r d e r add r1,r2,r3 sub r4,r1,r5 and r6,r1,r7 xor r4,r1,r5 or r8, r1, r9 ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg  Dependencies backward in time cause hazards

24 Spring 2005331 W11.24 One Way to “Fix” a Data Hazard I n s t r. O r d e r add r1,r2,r3 ALU IM Reg DMReg sub r4,r1,r5 and r6,r1,r7 ALU IM Reg DMReg ALU IM Reg DMReg stall Can fix data hazard by waiting – stall – but affects throughput

25 Spring 2005331 W11.25 Loads Can Cause Data Hazards I n s t r. O r d e r lw r1,100(r2) sub r4,r1,r5 and r6,r1,r7 xor r4,r1,r5 or r8, r1, r9 ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg  Dependencies backward in time cause hazards

26 Spring 2005331 W11.26 Stores Can Cause Data Hazards I n s t r. O r d e r add r1,r2,r3 sw r1,100(r5) and r6,r1,r7 xor r4,r1,r5 or r8, r1, r9 ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg ALU IM Reg DMReg  Dependencies backward in time cause hazards

27 Spring 2005331 W11.27 Other Pipeline Structures Are Possible  What about (slow) multiply operation? l let it take two cycles  What if the data memory access is twice as slow as the instruction memory? l make the clock twice as slow or … l let data memory access take two cycles (and keep the same clock rate) ALU IM Reg DMReg MUL ALU IM Reg DM1Reg DM2

28 Spring 2005331 W11.28 Sample Pipeline Alternatives  ARM7  StrongARM-1  XScale ALU IM1 IM2 DM1 Reg DM2 IM Reg EX PC update IM access decode reg access ALU op DM access shift/rotate commit result (write back) ALU IM Reg DMReg SHFT PC update BTB access start IM access IM access decode reg 1 access shift/rotate reg 2 access ALU op start DM access exception DM write reg write

29 Spring 2005331 W11.29 Summary  All modern day processors use pipelining  Pipelining doesn’t help latency of single task, it helps throughput of entire workload l Multiple tasks operating simultaneously using different resources  Potential speedup = Number of pipe stages  Pipeline rate limited by slowest pipeline stage l Unbalanced lengths of pipe stages reduces speedup l Time to “fill” pipeline and time to “drain” it reduces speedup  Must detect and resolve hazards l Stalling negatively affects throughput


Download ppt "Spring 2005331 W11.1 14:332:331 Computer Architecture and Assembly Language Spring 2005 Week 11 Introduction to Pipelined Datapath [Adapted from Dave Patterson’s."

Similar presentations


Ads by Google