Concurrency and Modularity Issues in Processor Pipelines. Arvind, Computer Science & Artificial Intelligence Lab. October 20, 2009, L14-1, http://csg.csail.mit.edu/korea

Presentation transcript:

L14-1 (October 20, 2009, http://csg.csail.mit.edu/korea) Concurrency and Modularity Issues in Processor Pipelines. Arvind, Computer Science & Artificial Intelligence Lab, Massachusetts Institute of Technology

L14-2 Concurrency analysis: Two-stage Pipeline

[Figure: CPU containing the fetch & decode and execute stages, connected by the FIFO bu, with state elements pc and rf]

    rule fetch_and_decode (!stallfunc(instr, bu));
       bu.enq(newIt(instr, rf));
       pc <= predIa;
    endrule

    rule execAdd (it matches tagged EAdd {dst:.rd, src1:.va, src2:.vb});
       rf.upd(rd, va + vb);
       bu.deq();
    endrule

    rule bzTaken (it matches tagged Bz {cond:.cv, addr:.av} &&& (cv == 0));
       pc <= av;
       bu.clear();
    endrule

    rule bzNotTaken (it matches tagged Bz {cond:.cv, addr:.av} &&& !(cv == 0));
       bu.deq();
    endrule

    rule execLoad (it matches tagged ELoad {dst:.rd, addr:.av});
       rf.upd(rd, dMem.read(av));
       bu.deq();
    endrule

    rule execStore (it matches tagged EStore {value:.vv, addr:.av});
       dMem.write(av, vv);
       bu.deq();
    endrule

For pipelining, we want the behavior within a clock cycle to be execute < fetch_and_decode.

L14-3 Properties Required of Register File and FIFO for Instruction Pipelining

Register File: rf.upd(r1, v) < rf.sub(r2), which calls for a Bypass RF.

FIFO bu: {first, deq} < {find, enq}, that is:
    bu.first < bu.find
    bu.first < bu.enq
    bu.deq < bu.find
    bu.deq < bu.enq
which calls for a Pipeline SFIFO.

L14-4 One Element Searchable Pipeline SFIFO

    module mkSFIFO1#(function Bool findf(tr r, t x)) (SFIFO#(t,tr))
       provisos (Bits#(t, tSz));   // needed to store a value of type t in a register
       Reg#(t)      data  <- mkRegU();
       Reg#(Bool)   full  <- mkReg(False);
       RWire#(void) deqEN <- mkRWire();
       Bool deqp = isValid(deqEN.wget());

       method Action enq(t x) if (!full || deqp);
          full <= True;
          data <= x;
       endmethod

       method Action deq() if (full);
          full <= False;
          deqEN.wset(?);
       endmethod

       method t first() if (full);
          return (data);
       endmethod

       method Action clear();
          full <= False;
       endmethod

       // find must ignore an element that is being dequeued in this cycle
       method Bool find(tr r);
          return (findf(r, data) && (full && !deqp));
       endmethod
    endmodule

Concurrency properties:
    bu.first < bu.enq
    bu.deq < bu.enq
    bu.find < bu.enq
    bu.deq < bu.find
    bu.enq < bu.clear
    bu.deq < bu.clear
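The module on this slide implements an SFIFO#(t,tr) interface that the transcript never spells out. The following is a minimal sketch of what that interface presumably looks like, inferred only from the five methods defined in mkSFIFO1; the declaration itself is an assumption, not taken from the slides.

```bsv
// Assumed declaration, reconstructed from the methods of mkSFIFO1.
// t  is the type of the stored elements,
// tr is the type of the key passed to find.
interface SFIFO#(type t, type tr);
   method Action enq(t x);     // insert an element
   method Action deq();        // remove the oldest element
   method t      first();      // oldest element, without removing it
   method Action clear();      // empty the FIFO
   method Bool   find(tr r);   // does a stored element match r under findf?
endinterface
```

Keeping the search key type tr separate from the element type t is what lets the stall function search a FIFO of instruction templates by register name.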

L14-5 Register File concurrency properties

A normal Register File implementation guarantees rf.sub < rf.upd; that is, reads happen before writes in concurrent execution.

But concurrent rf.sub(r1) and rf.upd(r2,v), where r1 ≠ r2, behave like both
    rf.sub(r1) < rf.upd(r2,v)
    rf.sub(r1) > rf.upd(r2,v)

To guarantee rf.upd < rf.sub:
    Either bypass the input value to the output when the register names match,
    or make sure that on concurrent calls rf.upd and rf.sub do not operate on the same register. This is true for our rules because of stalls, but it is too difficult for the compiler to detect.

L14-6 Bypass Register File

    module mkBypassRFFull (RegFile#(RName,Value));
       RegFile#(RName,Value)        rf <- mkRegFileFull();
       RWire#(Tuple2#(RName,Value)) rw <- mkRWire();

       method Action upd (RName r, Value d);
          rf.upd(r, d);
          rw.wset(tuple2(r, d));
       endmethod

       // Return the value being written in this cycle if the names match
       method Value sub (RName r);
          case (rw.wget()) matches
             tagged Valid {.wr, .d}: return (wr == r) ? d : rf.sub(r);
             tagged Invalid:         return rf.sub(r);
          endcase
       endmethod
    endmodule

This will work only if the compiler lets us ignore conflicts on the rf made by mkRegFileFull (slide annotation: “Config reg file”).

L14-7 Since our rules do not really require a Bypass Register File, the overhead of bypassing can be avoided by simply using the “Config Reg file”.
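The slides name the “Config Reg file” but do not show one. One way such a register file could be sketched is as a vector of ConfigReg registers, whose reads and writes the compiler treats as conflict-free. The module name mkConfigRFFull and the widths of RName and Value below are assumptions made for illustration only.

```bsv
import RegFile::*;     // for the RegFile interface (upd, sub)
import ConfigReg::*;   // mkConfigReg: register with conflict-free read/write
import Vector::*;

typedef Bit#(5)  RName;   // assumed: 32 architectural registers
typedef Bit#(32) Value;   // assumed: 32-bit data

// Hypothetical "config" register file. Because each element is a
// ConfigReg, the scheduler places no ordering constraint between
// sub and upd; a sub in the same cycle as an upd reads the old value.
module mkConfigRFFull (RegFile#(RName, Value));
   Vector#(32, Reg#(Value)) regs <- replicateM(mkConfigReg(0));

   method Action upd (RName r, Value d);
      regs[r] <= d;
   endmethod

   method Value sub (RName r);
      return regs[r];
   endmethod
endmodule
```

This is safe only under the condition stated on the previous slides: concurrent upd and sub calls never name the same register, which the stall logic guarantees but the compiler cannot check.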

L14-8 An aside: Unsafe modules

Bluespec allows you to import Verilog modules by identifying the wires that correspond to methods.

Such modules can be made safe either by asserting the correct scheduling properties of the methods or by wrapping the unsafe modules in appropriate Bluespec code.
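As an illustration of asserting scheduling properties on an imported module, here is a rough sketch of an import "BVI" wrapper. The Verilog module name unsafe_reg, its port names, and the RawReg interface are all hypothetical; only the general shape of the import (method-to-wire mapping plus schedule annotations) is the point.

```bsv
// Hypothetical interface for the imported hardware.
interface RawReg;
   method Bit#(32) read();
   method Action   write(Bit#(32) d);
endinterface

// "unsafe_reg" is an assumed Verilog module with ports
// CLK, RST_N, D_IN, EN and Q_OUT.
import "BVI" unsafe_reg =
module mkImportedReg (RawReg);
   default_clock clk (CLK);
   default_reset rst (RST_N);

   // Identify the wires that correspond to each method.
   method Q_OUT read();
   method write(D_IN) enable(EN);

   // Scheduling properties asserted by the programmer; the compiler
   // trusts them and does not check them against the Verilog.
   schedule read  CF  read;    // reads do not conflict with each other
   schedule read  SB  write;   // a read is ordered before a write in a cycle
   schedule write SBR write;   // writes may be sequenced across rules,
                               // but not called twice within one rule
endmodule
```

If the asserted schedule does not match what the Verilog actually does, the generated hardware can misbehave silently, which is why the slide calls such modules unsafe.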

L14-9 Concurrency analysis: Two-stage Pipeline

[Figure: CPU containing the fetch & decode and execute stages, connected by the FIFO bu, with state elements pc and rf]

    rule fetch_and_decode (!stallfunc(instr, bu));
       bu.enq(newIt(instr, rf));
       pc <= predIa;
    endrule

    rule execAdd (it matches tagged EAdd {dst:.rd, src1:.va, src2:.vb});
       rf.upd(rd, va + vb);
       bu.deq();
    endrule

    rule bzTaken (it matches tagged Bz {cond:.cv, addr:.av} &&& (cv == 0));
       pc <= av;
       bu.clear();
    endrule

    rule bzNotTaken (it matches tagged Bz {cond:.cv, addr:.av} &&& !(cv == 0));
       bu.deq();
    endrule

    rule execLoad (it matches tagged ELoad {dst:.rd, addr:.av});
       rf.upd(rd, dMem.read(av));
       bu.deq();
    endrule

    rule execStore (it matches tagged EStore {value:.vv, addr:.av});
       dMem.write(av, vv);
       bu.deq();
    endrule

All concurrent cases work.

L14-10 A lot of nontrivial analysis, but no change in the processor code! We needed FIFOs and register files with the appropriate concurrency properties.

L14-11 Bypassing

After decoding, the newIt function must read the new register values if available (i.e., the values that are still to be committed in the register file). This will happen automatically if we use bypassRF.

The instruction fetch must not stall if the new value of the register to be read exists. The old stall function is correct but unable to take advantage of bypassing, so it stalls unnecessarily.

L14-12 The stall function for the asynchronous pipeline

    function Bool newStallFunc (Instr instr, SFIFO#(InstTemplate, RName) bu);
       case (instr) matches
          tagged Add {dst:.rd, src1:.ra, src2:.rb}:
             return (bu.find(ra) || bu.find(rb));
          tagged Bz {cond:.rc, addr:.addr}:
             return (bu.find(rc) || bu.find(addr));
          …

bu.find in our Pipeline SFIFO happens after deq. This means that if bu can hold at most one instruction, as in the synchronous case, we do not have to stall. Otherwise, we will still need to check for hazards and stall.

No change in the stall function.
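The stall function relies on bu.find, which applies the findf predicate handed to the SFIFO constructor. The transcript does not show findf or the InstTemplate type, so the following is only a sketch under assumed definitions: the field types and bit widths are guesses, and the tag names follow the execute rules (EAdd, EBz, ELoad, EStore).

```bsv
// Assumed types; widths are illustrative only.
typedef Bit#(5)  RName;
typedef Bit#(32) Value;
typedef Bit#(32) Iaddress;

// Assumed shape of a decoded instruction, following the tags
// used in the execute rules.
typedef union tagged {
   struct { RName dst;   Value src1; Value src2; } EAdd;
   struct { Value cond;  Iaddress addr; }          EBz;
   struct { RName dst;   Iaddress addr; }          ELoad;
   struct { Value value; Iaddress addr; }          EStore;
} InstTemplate deriving (Bits, Eq);

// Hypothetical findf: does the in-flight instruction "it" still have
// to write register r? Only EAdd and ELoad have a destination register.
function Bool findf (RName r, InstTemplate it);
   case (it) matches
      tagged EAdd  {dst: .rd, src1: .*, src2: .*}: return (r == rd);
      tagged ELoad {dst: .rd, addr: .*}:           return (r == rd);
      default:                                     return False;
   endcase
endfunction
```

With a predicate of this kind, bu.find(ra) is true exactly when an instruction still in bu will write ra, which is the read-after-write hazard the stall function checks for.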

L14-13 Modular Refinement

L14-14 Successive refinement & Modular Structure

[Figure: five-stage CPU with fetch, decode, execute, memory and writeback stages, plus pc, rf, iMem and dMem]

Can we derive the 5-stage pipeline by successive refinement of a 2-stage pipeline?

[Figure: two-stage CPU with fetch & decode and execute stages connected by bu, plus pc and rf]

L14-15 CPU as one module

[Figure: CPU with fetch & decode, pc, execute, RFile rf and FIFO bu, together with iMem and dMem; the legend distinguishes read method calls from Action method calls]

Method calls embody both data and control (i.e., protocol).

L14-16 CPU as one module

    module mkCPU#(Mem iMem, Mem dMem)();
       // Instantiating state elements
       Reg#(Iaddress)              pc <- mkReg(0);
       RegFile#(RName, Value)      rf <- mkBypassRF();
       SFIFO#(InstTemplate, RName) bu <- mkPipelineSFifo(findf);

       // Some definitions
       Instr    instr  = iMem.read(pc);
       Iaddress predIa = pc + 1;

       // Rules
       rule fetch_decode ...
       rule execute ...
    endmodule

You have seen this before.

L14-17 A Modular organization

[Figure: CPU with fetch & decode, pc, RFile rf, execute and FIFO bu, together with iMem and dMem; labels between the modules: enqIt, stall, setPc, WB]

Suppose we include rf and pc in Fetch and bu in Execute.
Fetch delivers decoded instructions to Execute and needs to consult Execute for the stall condition.
Execute writes back data in rf and supplies the pc value in case of a branch misprediction.
The modules call each other (recursive).

L14-18 Recursive modular organization

    module mkCPU2#(Mem iMem, Mem dMem)();
       Execute execute <- mkExecute(dMem, fetch);
       Fetch   fetch   <- mkFetch(iMem, execute);
    endmodule

    interface Fetch;
       method Action setPC (Iaddress cpc);
       method Action writeback (RName dst, Value v);
    endinterface

    interface Execute;
       method Action enqIt (InstTemplate it);
       method Bool stall (Instr instr);
    endinterface

The two instantiations in mkCPU2 are recursive calls: each module takes the other as a parameter. Unfortunately, the recursive module syntax is not as simple.

L14-19 Fetch Module

    module mkFetch#(IMem iMem, Execute execute) (Fetch);
       // State elements are declared before the definitions that use them
       Reg#(Iaddress)            pc <- mkReg(0);
       RegFile#(RName, Bit#(32)) rf <- mkBypassRegFile();

       Instr    instr  = iMem.read(pc);
       Iaddress predIa = pc + 1;

       rule fetch_and_decode (!execute.stall(instr));
          execute.enqIt(newIt(instr, rf));
          pc <= predIa;
       endrule

       method Action writeback(RName rd, Value v);
          rf.upd(rd, v);
       endmethod

       method Action setPC(Iaddress newPC);
          pc <= newPC;
       endmethod
    endmodule

Slide annotation: no change.

L14-20 Execute Module

    module mkExecute#(DMem dMem, Fetch fetch) (Execute);
       SFIFO#(InstTemplate, RName) bu <- mkSLoopyFifo(findf);
       InstTemplate it = bu.first;

       rule execute …

       method Action enqIt(InstTemplate it);
          bu.enq(it);
       endmethod

       method Bool stall(Instr instr);
          return (stallFunc(instr, bu));
       endmethod
    endmodule

Slide annotation: no change.

L14-21 Execute Module Rule

    rule execute (True);
       case (it) matches
          tagged EAdd {dst:.rd, src1:.va, src2:.vb}: begin
             fetch.writeback(rd, va + vb);
             bu.deq();
          end
          tagged EBz {cond:.cv, addr:.av}:
             if (cv == 0) begin
                fetch.setPC(av);
                bu.clear();
             end
             else
                bu.deq();
          tagged ELoad {dst:.rd, addr:.av}: begin
             fetch.writeback(rd, dMem.read(av));
             bu.deq();
          end
          tagged EStore {value:.vv, addr:.av}: begin
             dMem.write(av, vv);
             bu.deq();
          end
       endcase
    endrule

L14-22 Issue

A recursive call structure can be wrong in the sense of “circular calls”; fortunately, the compiler can perform this check.

Unfortunately, a recursive call structure amongst modules is supported by the compiler only in a limited way:
    The syntax is complicated.
    Recursive modules cannot be synthesized separately.

L14-23 Syntax for Recursive Modules

moduleFix is like the Y combinator: F = Y F

    module mkFix#(Tuple2#(Fetch, Execute) fe) (Tuple2#(Fetch, Execute));
       match {.f, .e} = fe;
       Fetch   fetch   <- mkFetch(e);
       Execute execute <- mkExecute(f);
       return (tuple2(fetch, execute));
    endmodule

    (* synthesize *)
    module mkCPU (Empty);
       match {.fetch, .execute} <- moduleFix(mkFix);
    endmodule

L14-24 Passing parameters

    module mkCPU#(IMem iMem, DMem dMem) (Empty);

       module mkFix#(Tuple2#(Fetch, Execute) fe) (Tuple2#(Fetch, Execute));
          match {.f, .e} = fe;
          Fetch   fetch   <- mkFetch(iMem, e);
          Execute execute <- mkExecute(dMem, f);
          return (tuple2(fetch, execute));
       endmodule

       match {.fetch, .execute} <- moduleFix(mkFix);
    endmodule

L14-25 Modular refinements

    Separating Fetch and Decode
    Replace magic memory by multicycle memory
    Multicycle execution module
    …

L14-26 Subtle Architecture Issues

    interface Fetch;
       method Action setPC (Iaddress cpc);
       method Action writeback (RName dst, Value v);
    endinterface

    interface Execute;
       method Action enqIt (InstTemplate it);
       method Bool stall (Instr instr);
    endinterface

After setPC is called, the next instruction enqueued via enqIt must correspond to iMem(cpc).

The stall and writeback methods are closely related:
    writeback affects the results of subsequent stalls;
    the effect of writeback must be reflected immediately in the decoded instructions.

Any modular refinement must preserve these extra-linguistic semantic properties.

L14-27 Fetch Module Refinement: Separating Fetch and Decode

    module mkFetch#(IMem iMem, Execute execute) (Fetch);
       FIFO#(Instr) fetchDecodeQ <- mkLoopyFIFO();
       Instr    instr  = iMem.read(pc);
       Iaddress predIa = pc + 1;
       …

       rule fetch (True);
          pc <= predIa;
          fetchDecodeQ.enq(instr);
       endrule

       rule decode (!execute.stall(fetchDecodeQ.first()));
          execute.enqIt(newIt(fetchDecodeQ.first(), rf));
          fetchDecodeQ.deq();
       endrule

       method Action setPC …
       method Action writeback …
    endmodule

Are any changes needed in the methods?

L14-28 Fetch Module Refinement

    module mkFetch#(IMem iMem, Execute execute) (Fetch);
       FIFO#(Instr) fetchDecodeQ <- mkFIFO();
       …
       Instr    instr  = iMem.read(pc);
       Iaddress predIa = pc + 1;

       method Action writeback(RName rd, Value v);
          rf.upd(rd, v);
       endmethod

       method Action setPC(Iaddress newPC);
          pc <= newPC;
          fetchDecodeQ.clear();
       endmethod
    endmodule

Slide annotation: no change.