Modeling Processors
Arvind
Computer Science & Artificial Intelligence Lab
Massachusetts Institute of Technology
February 22, 2011 L07-1

Instruction set

    typedef enum {R0, R1, R2, …, R31} RName;

    typedef union tagged {
      struct {RName dst; RName src1; RName src2;} Add;
      struct {RName condR; RName addrR;} Bz;
      struct {RName dst; RName addrR;} Load;
      struct {RName valueR; RName addrR;} Store;
    } Instr deriving(Bits, Eq);

    typedef Bit#(32) Iaddress;
    typedef Bit#(32) Daddress;
    typedef Bit#(32) Value;

An instruction set can be implemented using many different microarchitectures.

February 22, 2011 L07-2 http://csg.csail.mit.edu/6.375

Deriving Bits

To store datatypes in a register, FIFO, etc., we need to know how to represent them as bits (pack) and how to interpret their bit representation (unpack).

The deriving annotation automatically generates the "pack" and "unpack" operations on the type (simple concatenation of the bit representations of the components).

It is possible to customize the pack/unpack operations to any specific desired representation.

    typedef struct { … } Foo deriving (Bits);

Tagged Unions: Bit Representation

    tag  fields
    00   dst    src1   src2
    01   condR  addrR
    10   dst    addrR
    11   dst    imm

    typedef union tagged {
      struct {RName dst; RName src1; RName src2;} Add;
      struct {RName condR; RName addrR;} Bz;
      struct {RName dst; RName addrR;} Load;
      struct {RName dst; Immediate imm;} AddImm;
    } Instr deriving(Bits, Eq);

The representation is derived automatically; it can be customized with user-written pack and unpack functions.
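The tag-plus-fields layout in the table above can be mimicked outside Bluespec. The following Python sketch is only a model of the idea, not the compiler's algorithm: it assumes 5-bit register names (32 registers), a 2-bit tag in the top bits, fields placed directly after the tag, and zero padding in the unused low bits. The padding that the Bluespec compiler actually chooses may differ, and the AddImm variant is left out because the slide does not give the width of Immediate.

```python
# Assumed layout: 2-bit tag on top, then 5-bit register fields, zero padding below.
REG_BITS = 5
TAG_BITS = 2
WIDTH = TAG_BITS + 3 * REG_BITS  # the widest variant (Add) has three register fields

# Tag value and field order per variant, following the table on the slide.
VARIANTS = {
    "Add": (0b00, ("dst", "src1", "src2")),
    "Bz": (0b01, ("condR", "addrR")),
    "Load": (0b10, ("dst", "addrR")),
}


def pack(name, **fields):
    """Concatenate the tag and the fields, most significant first."""
    tag, names = VARIANTS[name]
    bits = tag
    for f in names:
        value = fields[f]
        assert 0 <= value < (1 << REG_BITS)
        bits = (bits << REG_BITS) | value
    # Shift left so the tag lands in the top bits of the fixed-width word.
    used = TAG_BITS + len(names) * REG_BITS
    return bits << (WIDTH - used)


def unpack(word):
    """Inverse of pack: read the tag, then slice out that variant's fields."""
    tag = word >> (WIDTH - TAG_BITS)
    for name, (t, names) in VARIANTS.items():
        if t == tag:
            fields = {}
            shift = WIDTH - TAG_BITS
            for f in names:
                shift -= REG_BITS
                fields[f] = (word >> shift) & ((1 << REG_BITS) - 1)
            return name, fields
    raise ValueError("unknown tag")
```

Under these assumptions, unpack(pack(...)) returns the original variant and fields, which is the property that a register or FIFO holding an Instr relies on.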

The Plan

Non-pipelined processor
Two-stage Inelastic pipeline
Two-stage Elastic pipeline (next lecture)

Some understanding of simple processor pipelines is needed to follow this lecture.

Non-pipelined Processor

[Figure: CPU containing pc, rf, and a single fetch & execute stage, connected to iMem and dMem]

    module mkCPU#(Mem iMem, Mem dMem)();
      Reg#(Iaddress) pc <- mkReg(0);
      RegFile#(RName, Bit#(32)) rf <- mkRegFileFull();
      Instr instr = iMem.read(pc);
      Iaddress predIa = pc + 1;
      rule fetch_Execute ...
    endmodule

Non-pipelined processor rule

    rule fetch_Execute (True);
      case (instr) matches   // pattern matching
        tagged Add {dst:.rd,src1:.ra,src2:.rb}: begin
          rf.upd(rd, rf[ra]+rf[rb]);
          pc <= predIa;
        end
        tagged Bz {condR:.rc,addrR:.ra}: begin
          pc <= (rf[rc]==0) ? rf[ra] : predIa;
        end
        tagged Load {dst:.rd,addrR:.ra}: begin
          rf.upd(rd, dMem.read(rf[ra]));
          pc <= predIa;
        end
        tagged Store {valueR:.rv,addrR:.ra}: begin
          dMem.write(rf[ra],rf[rv]);
          pc <= predIa;
        end
      endcase
    endrule

Syntax note: rf[r] is shorthand for rf.sub(r).

Assume "magic memory", i.e., it responds to a read request in the same cycle, and a write updates the memory at the end of the cycle.
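The behavior of this rule can be checked with a small software model. The Python sketch below is an illustration rather than a translation of the Bluespec: instructions are plain tuples, the register file is a list, dMem is a dictionary, and the run loop simply stops when pc leaves the program. Those representation choices, and the names step and run, are assumptions made for the example.

```python
def step(instr, pc, rf, dmem):
    """Execute one instruction and return the next pc; rf and dmem change in place."""
    op = instr[0]
    pred_ia = pc + 1
    if op == "Add":
        _, rd, ra, rb = instr
        rf[rd] = (rf[ra] + rf[rb]) & 0xFFFFFFFF  # 32-bit Value
        return pred_ia
    if op == "Bz":
        _, rc, ra = instr
        # Branch to the address held in ra when the condition register is zero.
        return rf[ra] if rf[rc] == 0 else pred_ia
    if op == "Load":
        _, rd, ra = instr
        rf[rd] = dmem.get(rf[ra], 0)
        return pred_ia
    if op == "Store":
        _, rv, ra = instr
        dmem[rf[ra]] = rf[rv]
        return pred_ia
    raise ValueError(op)


def run(imem, rf, dmem, max_steps=100):
    """Run from pc = 0 until pc leaves the program (an assumed halting rule)."""
    pc = 0
    for _ in range(max_steps):
        if pc >= len(imem):
            break
        pc = step(imem[pc], pc, rf, dmem)
    return pc
```

For example, a program that loads a value, doubles it, branches over one instruction, and stores the result can be run by building imem as a list of tuples such as ("Load", 1, 0) and ("Bz", 0, 5) and calling run(imem, rf, dmem).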

Register File

How many read ports? Two
How many write ports? One
Concurrency properties?
  Must be able to do two reads and a write concurrently.
  The values produced must be as if the reads precede the write (if any).
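The "reads precede the write" property can be modeled by deferring the write to the end of the cycle. The Python class below is a hypothetical model written for this note (the method name end_cycle is not part of any Bluespec library); it enforces the single write port but does not count read ports.

```python
class RegFile:
    """Register file model in which a write becomes visible only after the cycle ends."""

    def __init__(self, size=32):
        self.regs = [0] * size
        self.pending = None  # at most one (register, value) write per cycle

    def sub(self, r):
        # Reads always see the start-of-cycle contents.
        return self.regs[r]

    def upd(self, r, value):
        assert self.pending is None, "only one write port"
        self.pending = (r, value)

    def end_cycle(self):
        # Commit the pending write, if any, at the clock edge.
        if self.pending is not None:
            r, value = self.pending
            self.regs[r] = value
            self.pending = None
```

A read of a register in the same cycle as a write to it returns the old value; the new value appears only after end_cycle is called.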

The Plan

Non-pipelined processor
Two-stage Inelastic pipeline
Two-stage Elastic pipeline

Two-stage Inelastic Pipeline

[Figure: fetch & decode stage, buReg, execute stage; processor state pc, rf, dMem]

    time      t0   t1   t2   t3   t4   t5   t6   t7 ...
    FD stage  FD1  FD2  FD3  FD4  FD5
    EX stage       EX1  EX2  EX3  EX4  EX5

Actions to be performed in parallel every cycle:

Fetch Action: decodes the instruction at the current pc, fetches operands from the register file, and stores the result in buReg.
Execute Action: performs the action specified in buReg and updates the processor state (pc, rf, dMem).

    rule InelasticPipeline2(True);
      fetchAction; executeAction;
    endrule

Instructions & Templates

    typedef union tagged {
      struct {RName dst; Value op1; Value op2;} EAdd;
      struct {Value cond; Iaddress tAddr;} EBz;
      struct {RName dst; Daddress addr;} ELoad;
      struct {Value val; Daddress addr;} EStore;
    } InstTemplate deriving(Eq, Bits);

    typedef union tagged {
      struct {RName dst; RName src1; RName src2;} Add;
      struct {RName condR; RName addrR;} Bz;
      struct {RName dst; RName addrR;} Load;
      struct {RName valueR; RName addrR;} Store;
    } Instr deriving(Bits, Eq);

buReg contains instruction templates, i.e., decoded instructions.

Fetch & Decode Action

Fills buReg with a decoded instruction.

    function InstTemplate newIt(Instr instr);
      case (instr) matches
        tagged Add {dst:.rd,src1:.ra,src2:.rb}:
          return EAdd{dst:rd,op1:rf[ra],op2:rf[rb]};
        tagged Bz {condR:.rc,addrR:.addr}:
          return EBz{cond:rf[rc],tAddr:rf[addr]};
        tagged Load {dst:.rd,addrR:.addr}:
          return ELoad{dst:rd,addr:rf[addr]};
        tagged Store{valueR:.v,addrR:.addr}:
          return EStore{val:rf[v],addr:rf[addr]};
      endcase
    endfunction

    buReg <= newIt(instr);

Execute Action

Reads buReg and modifies state (rf, dMem, pc).

    case (buReg) matches
      tagged EAdd{dst:.rd,op1:.va,op2:.vb}: begin
        rf.upd(rd, va+vb);
        pc <= predIa;
      end
      tagged ELoad{dst:.rd,addr:.av}: begin
        rf.upd(rd, dMem.read(av));
        pc <= predIa;
      end
      tagged EStore{val:.vv,addr:.av}: begin
        dMem.write(av, vv);
        pc <= predIa;
      end
      tagged EBz {cond:.cv,tAddr:.av}:
        if (cv != 0) then pc <= predIa;
        else begin
          pc <= av;
          Invalidate buReg   // What does this mean?
        end
    endcase

Issues with buReg

buReg may not always contain an instruction. Why?
  In the start cycle.
  The execute stage may kill the fetched instruction because of a branch misprediction.
  Maybe type to the rescue …

Can't update buReg in two concurrent actions (fetchAction; executeAction).
  Fold them together.

Inelastic Pipeline: first attempt

    rule SyncTwoStage (True);
      let instr = iMem.read(pc);
      let predIa = pc+1;
      Action fetchAction =
        action
          buReg <= Valid newIt(instr);
          pc <= predIa;
        endaction;
      case (buReg) matches
        … each instruction execution calls fetchAction or puts Invalid in buReg …
        endcase
      endcase
    endrule

Execute

    case (buReg) matches
      tagged Valid .it:
        case (it) matches
          tagged EAdd{dst:.rd,op1:.va,op2:.vb}: begin
            rf.upd(rd, va+vb);
            fetchAction;
          end
          tagged ELoad{dst:.rd,addr:.av}: begin
            rf.upd(rd, dMem.read(av));
            fetchAction;
          end
          tagged EStore{val:.vv,addr:.av}: begin
            dMem.write(av, vv);
            fetchAction;
          end
          tagged EBz {cond:.cv,tAddr:.av}:
            if (cv != 0) then fetchAction;
            else begin
              pc <= av;
              buReg <= Invalid;
            end
        endcase
      tagged Invalid: fetchAction;
    endcase

Not quite correct!

Pipeline Hazards

Without stalling:

    time      t0   t1   t2   t3   t4   t5   t6   t7 ...
    FD stage  FD1  FD2  FD3  FD4  FD5
    EX stage       EX1  EX2  EX3  EX4  EX5

    I1: Add(R1,R2,R3)
    I2: Add(R4,R1,R2)

I2 must be stalled until I1 updates the register file.

With the stall:

    time      t0   t1   t2   t3   t4   t5   t6   t7 ...
    FD stage  FD1  FD2  FD2  FD3  FD4  FD5
    EX stage       EX1       EX2  EX3  EX4  EX5

Stall condition

Suppose the fetched instruction needs to read register r and the instruction in buReg is going to write r; then the fetch unit must stall.

A function to find register r in an instruction template it:

    function Bool findf (RName r, InstTemplate it);
      case (it) matches
        tagged EAdd{dst:.rd,op1:.v1,op2:.v2}:
          return (r == rd);
        tagged EBz {cond:.c,tAddr:.a}:
          return (False);
        tagged ELoad{dst:.rd,addr:.a}:
          return (r == rd);
        tagged EStore{val:.v,addr:.a}:
          return (False);
      endcase
    endfunction

The Stall Function

Decides whether instruction instr should stall, given the state of buReg.

    function Bool stallFunc (Instr instr, Maybe#(InstTemplate) mit);
      case (mit) matches
        tagged Invalid:
          return False;
        tagged Valid .it:
          case (instr) matches
            tagged Add {dst:.rd,src1:.ra,src2:.rb}:
              return (findf(ra,it) || findf(rb,it));
            tagged Bz {condR:.rc,addrR:.addr}:
              return (findf(rc,it) || findf(addr,it));
            tagged Load {dst:.rd,addrR:.addr}:
              return (findf(addr,it));
            tagged Store {valueR:.v,addrR:.addr}:
              return (findf(v,it) || findf(addr,it));
          endcase
      endcase
    endfunction
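The same read-after-write check can be written compactly in a software model. In the Python sketch below, instructions and templates are tuples and an Invalid buReg is represented by None; these encodings and the helper name source_regs are assumptions made for illustration.

```python
def findf(r, it):
    """True if the template in buReg will write register r (only EAdd and ELoad write)."""
    return it[0] in ("EAdd", "ELoad") and it[1] == r


def source_regs(instr):
    """Registers that the fetched instruction reads during decode."""
    op = instr[0]
    if op == "Add":  # ("Add", dst, src1, src2)
        return (instr[2], instr[3])
    if op == "Bz":  # ("Bz", condR, addrR)
        return (instr[1], instr[2])
    if op == "Load":  # ("Load", dst, addrR)
        return (instr[2],)
    if op == "Store":  # ("Store", valueR, addrR)
        return (instr[1], instr[2])
    raise ValueError(op)


def stall_func(instr, mit):
    """Stall when buReg is valid and writes a register that instr needs to read."""
    if mit is None:  # Invalid
        return False
    return any(findf(r, mit) for r in source_regs(instr))
```

With I1 = Add(R1,R2,R3) in buReg and I2 = Add(R4,R1,R2) being fetched, stall_func returns True because I2 reads R1, which I1 has not yet written.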

Inelastic Pipeline: corrected

    rule SyncTwoStage (True);
      let instr = iMem.read(pc);
      let predIa = pc+1;
      Action fetchAction =
        action
          if stallFunc(instr, buReg) then
            buReg <= Invalid;
          else begin
            buReg <= Valid newIt(instr);
            pc <= predIa;
          end
        endaction;
      case (buReg) matches
        … the execute rule (no change) …
        endcase
      endcase
    endrule
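To see the stall at work, the corrected rule can be approximated by a cycle-level model in which every read sees the state at the start of the cycle and every write lands at the end. The Python sketch below is such a model under several assumptions that are not in the slides: tuple encodings for instructions and templates, None for an Invalid buReg, and fetch producing a bubble once pc runs past the end of the program.

```python
def decode(instr, rf):
    """newIt: replace register names by the values currently in the register file."""
    op = instr[0]
    if op == "Add":
        return ("EAdd", instr[1], rf[instr[2]], rf[instr[3]])
    if op == "Bz":
        return ("EBz", rf[instr[1]], rf[instr[2]])
    if op == "Load":
        return ("ELoad", instr[1], rf[instr[2]])
    if op == "Store":
        return ("EStore", rf[instr[1]], rf[instr[2]])
    raise ValueError(op)


def sources(instr):
    # Bz and Store read both of their registers; Add and Load skip the destination.
    return instr[1:] if instr[0] in ("Bz", "Store") else instr[2:]


def stalls(instr, it):
    return it is not None and it[0] in ("EAdd", "ELoad") and it[1] in sources(instr)


def cycle(state, imem):
    """One clock: execute the template in buReg and fetch/decode the next instruction."""
    pc, rf, dmem, bu = state["pc"], state["rf"], state["dmem"], state["buReg"]
    new_rf, new_dmem = list(rf), dict(dmem)
    redirect = None
    if bu is not None:
        if bu[0] == "EAdd":
            new_rf[bu[1]] = (bu[2] + bu[3]) & 0xFFFFFFFF
        elif bu[0] == "ELoad":
            new_rf[bu[1]] = dmem.get(bu[2], 0)
        elif bu[0] == "EStore":
            new_dmem[bu[2]] = bu[1]
        elif bu[0] == "EBz" and bu[1] == 0:
            redirect = bu[2]  # taken branch
    if redirect is not None:
        new_pc, new_bu = redirect, None  # kill the instruction fetched this cycle
    elif pc >= len(imem):
        new_pc, new_bu = pc, None  # nothing left to fetch (assumption)
    elif stalls(imem[pc], bu):
        new_pc, new_bu = pc, None  # stall: insert a bubble, keep pc
    else:
        new_pc, new_bu = pc + 1, decode(imem[pc], rf)  # reads see old rf
    state.update(pc=new_pc, rf=new_rf, dmem=new_dmem, buReg=new_bu)


def run(imem, rf, dmem, max_cycles=100):
    state = {"pc": 0, "rf": list(rf), "dmem": dict(dmem), "buReg": None}
    cycles = 0
    while cycles < max_cycles and (state["pc"] < len(imem) or state["buReg"] is not None):
        cycle(state, imem)
        cycles += 1
    return state, cycles
```

Running the hazard example I1 = Add(R1,R2,R3), I2 = Add(R4,R1,R2) through this model takes one cycle more than the ideal schedule, and I2 picks up the value of R1 written by I1 rather than the stale one.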

Bypassing

After decoding, the newIt function must read the new register values if they are available (i.e., the values that are still to be committed to the register file).

We pass the value being written to the decoding action (the newIt function).

Generation of bypass register value

    rule inelasticProcessor2 (True);
      case (buReg) matches
        tagged Valid .it:
          case (it) matches
            tagged EAdd{dst:.rd,op1:.va,op2:.vb}: begin
              rf.upd(rd, va+vb);               // va+vb is a bypassable value
              fetchAction;
            end
            tagged ELoad{dst:.rd,addr:.av}: begin
              rf.upd(rd, dMem.read(av));       // dMem.read(av) is a bypassable value
              fetchAction;
            end
            tagged EStore{val:.vv,addr:.av}: begin
              dMem.write(av, vv);
              fetchAction;
            end
            tagged EBz {cond:.cv,tAddr:.av}:
              if (cv != 0) then fetchAction;
              else begin
                pc <= av;
                buReg <= Invalid;
              end
          endcase
        tagged Invalid: fetchAction;
      endcase
    endrule

Bypassing values to Fetch

    case (buReg) matches
      tagged Valid .it:
        case (it) matches
          tagged EAdd{dst:.rd,op1:.va,op2:.vb}: begin
            rf.upd(rd, va+vb);
            fetchAction(Valid rd, va+vb);
          end
          tagged ELoad{dst:.rd,addr:.av}: begin
            rf.upd(rd, dMem.read(av));
            fetchAction(Valid rd, dMem.read(av));
          end
          tagged EStore{val:.vv,addr:.av}: begin
            dMem.write(av, vv);
            fetchAction(Invalid, ?);
          end
          tagged EBz {cond:.cv,tAddr:.av}:
            if (cv != 0) then fetchAction(Invalid, ?);
            else begin
              pc <= av;
              buReg <= Invalid;
            end
        endcase
      tagged Invalid: fetchAction(Invalid, ?);
    endcase

New fetchAction

    function Action fetchAction(Maybe#(RName) mrd, Value val);
      action
        if stallFunc(instr, buReg) then
          buReg <= Invalid;
        else begin
          buReg <= Valid newIt(mrd, val, instr);
          pc <= predIa;
        end
      endaction
    endfunction

Updated newIt

    function InstTemplate newIt(Maybe#(RName) mrd, Value val, Instr instr);
      let nrf(a) = (Valid a == mrd) ? val : rf.sub(a);
      case (instr) matches
        tagged Add {dst:.rd,src1:.ra,src2:.rb}:
          return EAdd{dst:rd,op1:nrf(ra),op2:nrf(rb)};
        tagged Bz {condR:.rc,addrR:.addr}:
          return EBz{cond:nrf(rc),tAddr:nrf(addr)};
        tagged Load {dst:.rd,addrR:.addr}:
          return ELoad{dst:rd,addr:nrf(addr)};
        tagged Store{valueR:.v,addrR:.addr}:
          return EStore{val:nrf(v),addr:nrf(addr)};
      endcase
    endfunction
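The helper nrf is the heart of the bypass: it prefers the value being written this cycle over the stale register file contents. A minimal Python model of the same selection, assuming tuple-encoded instructions and None for an Invalid mrd (both encodings are choices made for this example), might look like this:

```python
def nrf(a, mrd, val, rf):
    """Bypass mux: use the in-flight value when it targets register a, else read rf."""
    if mrd is not None and mrd == a:
        return val
    return rf[a]


def new_it(mrd, val, instr, rf):
    """Decode instr, reading operands through the bypass instead of directly from rf."""
    def read(a):
        return nrf(a, mrd, val, rf)

    op = instr[0]
    if op == "Add":  # ("Add", dst, src1, src2)
        return ("EAdd", instr[1], read(instr[2]), read(instr[3]))
    if op == "Bz":  # ("Bz", condR, addrR)
        return ("EBz", read(instr[1]), read(instr[2]))
    if op == "Load":  # ("Load", dst, addrR)
        return ("ELoad", instr[1], read(instr[2]))
    if op == "Store":  # ("Store", valueR, addrR)
        return ("EStore", read(instr[1]), read(instr[2]))
    raise ValueError(op)
```

If the execute stage is writing 12 to R1 in the current cycle, decoding Add(R4,R1,R2) through new_it picks up 12 for the first operand even though the register file still holds the old value of R1.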

Bypassing

Now that we have correctly bypassed data, we do not have to stall as often.

The current stall function is correct but inefficient: it should not stall if the value is now being bypassed.

The stall function for the Inelastic pipeline

    function Bool newStallFunc (Instr instr, Reg#(Maybe#(InstTemplate)) buReg);
      case (buReg) matches
        tagged Invalid: return False;
        tagged Valid .it:
          case (instr) matches
            tagged Add {dst:.rd,src1:.ra,src2:.rb}:
              return (False);   // previously: return (findf(ra,it) || findf(rb,it));
            …

Previously we stalled when ra matched the destination register of the instruction in the execute stage. Now we bypass that information when we read, so no stall is necessary.

Inelastic Pipelines

Notoriously difficult to get right.
  Imagine the cases to be analyzed if it were a five-stage pipeline.
Difficult to refine for better clock timing.

=> elastic pipelines