Modular Refinement Arvind

Slides:

Advertisements

Similar presentations

Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology October 13, 2009http://csg.csail.mit.edu/koreaL12-1.

Advertisements

Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L07-1

Computer Architecture: A Constructive Approach Branch Direction Prediction – Six Stage Pipeline Joel Emer Computer Science & Artificial Intelligence Lab.

March, 2007http://csg.csail.mit.edu/arvindIPlookup-1 IP Lookup Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.

December 10, 2009 L29-1 The Semantics of Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute.

December 12, 2006http://csg.csail.mit.edu/6.827/L24-1 Scheduling Primitives for Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts.

Realistic Memories and Caches Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology March 21, 2012L13-1

September 3, 2009L02-1http://csg.csail.mit.edu/korea Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial.

Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.

1 Tutorial: Lab 4 Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.

Constructive Computer Architecture: Guards Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology September 24, 2014.

September 22, 2009http://csg.csail.mit.edu/koreaL07-1 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab.

Elastic Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 28, 2011L08-1http://csg.csail.mit.edu/6.375.

Computer Architecture: A Constructive Approach Next Address Prediction – Six Stage Pipeline Joel Emer Computer Science & Artificial Intelligence Lab. Massachusetts.

Constructive Computer Architecture: Control Hazards Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology October.

February 20, 2009http://csg.csail.mit.edu/6.375L08-1 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts.

1 Tutorial: Lab 4 Again Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.

Modular Refinement Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 8,

October 22, 2009http://csg.csail.mit.edu/korea Modular Refinement Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.

Realistic Memories and Caches – Part III Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology April 4, 2012L15-1.

October 20, 2009L14-1http://csg.csail.mit.edu/korea Concurrency and Modularity Issues in Processor pipelines Arvind Computer Science & Artificial Intelligence.

Modeling Processors Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 1, 2010

Elastic Pipelines: Concurrency Issues

February 28, 2005http://csg.csail.mit.edu/6.884/L09-1 Bluespec-3: Modules & Interfaces Arvind Computer Science & Artificial Intelligence Lab Massachusetts.

Bluespec-3: A non-pipelined processor Arvind

Concurrency properties of BSV methods and rules

Control Hazards Constructive Computer Architecture: Arvind

Bluespec-6: Modeling Processors

Bluespec-6: Modules and Interfaces

Scheduling Constraints on Interface methods

Blusepc-5: Dead cycles, bubbles and Forwarding in Pipelines Arvind

Sequential Circuits Constructive Computer Architecture Arvind

Constructive Computer Architecture Tutorial 6: Discussion for lab6

Multistage Pipelined Processors and modular refinement

in Pipelined Processors

Stmt FSM Arvind (with the help of Nirav Dave)

Performance Specifications

Multirule Systems and Concurrent Execution of Rules

Constructive Computer Architecture: Guards

Constructive Computer Architecture Tutorial 7 Final Project Overview

Modular Refinement Arvind

Lab 4 Overview: 6-stage SMIPS Pipeline

Blusepc-5: Dead cycles, bubbles and Forwarding in Pipelines Arvind

Bluespec-7: Scheduling & Rule Composition

Control Hazards Constructive Computer Architecture: Arvind

Modeling Processors: Concurrency Issues

Modules with Guarded Interfaces

Bypassing Computer Architecture: A Constructive Approach Joel Emer

Multistage Pipelined Processors and modular refinement

Modular Refinement Arvind

Elastic Pipelines: Concurrency Issues

Bluespec-3: A non-pipelined processor Arvind

Modular Refinement - 2 Arvind

Multistage Pipelined Processors and modular refinement

Elastic Pipelines: Concurrency Issues

Bluespec-5: Modeling Processors

Constructive Computer Architecture: Guards

Bluespec-7: Scheduling & Rule Composition

Control Hazards Constructive Computer Architecture: Arvind

Pipelined Processors Constructive Computer Architecture: Arvind

Multirule systems and Concurrent Execution of Rules

Tutorial 4: RISCV modules Constructive Computer Architecture

Modeling Processors Arvind

Modeling Processors Arvind

Modular Refinement Arvind

Control Hazards Constructive Computer Architecture: Arvind

Modular Refinement Arvind

Tutorial 7: SMIPS Labs and Epochs Constructive Computer Architecture

Bluespec-8: Modules and Interfaces

Presentation transcript:

Modular Refinement Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology January 20, 2011 http://csg.csail.mit.edu/SNU

Successive refinement & Modular Structure fetch execute iMem rf CPU decode memory pc writeback dMem fetch & decode execute pc rf CPU bu Can we derive the 5-stage pipeline by successive refinement of a 2-stage pipeline? January 20, 2011 http://csg.csail.mit.edu/SNU

Architectural refinements Separating Fetch and Decode Replace magic memory by multicycle memory Multicycle functional units … Nirav Dave, M.C. Ng, M. Pellauer, Arvind [Memocode 2010] A design flow based on modular refinement January 20, 2011 http://csg.csail.mit.edu/SNU

CPU as one module iMem dMem CPU fetch & decode RFile rf SFIFO bu execute pc Method calls embody both data and control (i.e., protocol) Read method call Action method call January 20, 2011 http://csg.csail.mit.edu/SNU

A Modular organization CPU iMem RFile rf FIFO bu dMem modules call each other (recursive) execute WB fetch & decode setPc pc stall enqIt Suppose we include rf and pc in Fetch and bu in Execute Fetch delivers decoded instructions to Execute and needs to consult Execute for the stall condition Execute writes back data in rf and supplies the pc value in case of a branch misprediction January 20, 2011 http://csg.csail.mit.edu/SNU

Interface definitions: Fetch and Execute interface Fetch; method Action setPC (Iaddress cpc); method Action writeback (RName dst, Value v); endinterface interface Execute; method Action enqIt(InstTemplate it); method Bool stall(Instr instr) This first method describe what happens in the fetch of this pipeline. This methods says that given any processor term, we can always rewrite it to a new term wehre, and we enqueue the current instuction into bf. And at the same time the pc field is incremende by one It may look like this method can fire forever, but because bf is bounded, when a method enqueue into bf, there is an implied predicate that say bf must not be full. January 20, 2011 http://csg.csail.mit.edu/SNU 8 6 8 8

Recursive modular organization module mkCPU2#(Mem iMem, Mem dMem)(); Execute execute <- mkExecute(dMem, fetch); Fetch fetch <- mkFetch(iMem, execute); endmodule recursive calls Unfortunately, the recursive module syntax is not so simple This first method describe what happens in the fetch of this pipeline. This methods says that given any processor term, we can always rewrite it to a new term wehre, and we enqueue the current instuction into bf. And at the same time the pc field is incremende by one It may look like this method can fire forever, but because bf is bounded, when a method enqueue into bf, there is an implied predicate that say bf must not be full. January 20, 2011 http://csg.csail.mit.edu/SNU 6 8 8 8

Issue A recursive call structure can be wrong in the sense of “circular calls”; fortunately the compiler can perform this check Unfortunately recursive call structure amongst modules is supported by the compiler in a limited way. The syntax is complicated Recursive modules cannot be synthesized separately January 20, 2011 http://csg.csail.mit.edu/SNU

Syntax for Recursive Modules module mkFix#(Tuple2#(Fetch, Execute) fe) (Tuple2#(Fetch, Execute)); match{.f, .e} = fe; Fetch fetch <- mkFetch(e); Execute execute <- mkExecute(f); return(tuple2(fetch,execute)); endmodule (* synthesize *) module mkCPU(Empty); match {.fetch, .execute} <- moduleFix(mkFix); endmodule moduleFix is like the Y combinator F = Y F January 20, 2011 http://csg.csail.mit.edu/SNU

Passing parameters module mkCPU#(IMem iMem, DMem dMem)(Empty); module mkFix#(Tuple2#(Fetch, Execute) fe) (Tuple2#(Fetch, Execute)); match{.f, .e} = fe; Fetch fetch <- mkFetch(iMem,e); Execute execute <- mkExecute(dMem,f); return(tuple2(fetch,execute); endmodule match {.fetch, .execute} <- moduleFix(mkFix); endmodule January 20, 2011 http://csg.csail.mit.edu/SNU

Fetch Module module mkFetch#(IMem iMem, Execute execute) (Fetch); Instr instr = iMem.read(pc); Iaddress predIa = pc + 1; Reg#(Iaddress) pc <- mkReg(0); RegFile#(RName, Bit#(32)) rf <- mkBypassRegFile(); rule fetch_and_decode (!execute.stall(instr)); execute.enqIt(newIt(instr,rf)); pc <= predIa; endrule method Action writeback(RName rd, Value v); rf.upd(rd,v); endmethod method Action setPC(Iaddress newPC); pc <= newPC; endmodule January 20, 2011 http://csg.csail.mit.edu/SNU

Execute Module module mkExecute#(DMem dMem, Fetch fetch) (Execute); SFIFO#(InstTemplate) bu <- mkSPipelineFifo(findf); InstTemplate it = bu.first; rule execute … method Action enqIt(InstTemplate it); bu.enq(it); endmethod method Bool stall(Instr instr); return (stallFunc(instr, bu)); endmodule no change January 20, 2011 http://csg.csail.mit.edu/SNU

Execute Module Rule rule execute (True); case (it) matches tagged EAdd{dst:.rd,src1:.va,src2:.vb}: begin fetch.writeback(rd, va+vb); bu.deq(); end tagged EBz {cond:.cv,addr:.av}: if (cv == 0) then begin fetch.setPC(av); bu.clear(); end else bu.deq(); tagged ELoad{dst:.rd,addr:.av}: begin fetch.writeback(rd,dMem.read(av)); bu.deq(); tagged EStore{value:.vv,addr:.av}: begin dMem.write(av, vv); bu.deq(); endcase endrule January 20, 2011 http://csg.csail.mit.edu/SNU

Subtle Architecture Issues interface Fetch; method Action setPC (Iaddress cpc); method Action writeback (RName dst, Value v); endinterface interface Execute; method Action enqIt(InstTemplate it); method Bool stall(Instr instr) After setPC is called the next instruction enqueued via enqIt must correspond to iMem(cpc) stall and writeback methods are closely related; writeback affects the results of subsequent stalls the effect of writeback must be reflected immediately in the decoded instructions This first method describe what happens in the fetch of this pipeline. This methods says that given any processor term, we can always rewrite it to a new term wehre, and we enqueue the current instuction into bf. And at the same time the pc field is incremende by one It may look like this method can fire forever, but because bf is bounded, when a method enqueue into bf, there is an implied predicate that say bf must not be full. Any modular refinement must preserve these extra-linguistic semantic properties January 20, 2011 http://csg.csail.mit.edu/SNU 6 8 8 8

Modular refinement: Separating Fetch and Decode January 20, 2011 http://csg.csail.mit.edu/SNU

Fetch Module Refinement Separating Fetch and Decode module mkFetch#(IMem iMem, Execute execute) (Fetch); FIFO#(Instr) fetch2DecodeQ <- mkPipelineFIFO(); Instr instr = iMem.read(pc); Iaddress predIa = pc + 1; … rule fetch(True); pc <= predIa; fetch2DecodeQ.enq(instr); endrule rule decode (!execute.stall(fetch2DecodeQ.first())); execute.enqIt(newIt(fetch2DecodeQ.first(),rf)); fetch2DecodeQ.deq(); method Action setPC … method Action writeback … endmodule Are any changes needed in the methods? January 20, 2011 http://csg.csail.mit.edu/SNU

Fetch Module Refinement module mkFetch#(IMem iMem, Execute execute) (Fetch); FIFO#(Instr) fetchDecodeQ <- mkFIFO(); … rule fetch … rule decode … method Action writeback(RName rd, Value v); rf.upd(rd,v); endmethod method Action setPC(Iaddress newPC); pc <= newPC; fetch2DecodeQ.clear(); Endmodule no change The stall signal definition guarantees that any order for the execution of the decode rule and writeback method is valid. January 20, 2011 http://csg.csail.mit.edu/SNU

Modular refinement: Replace magic memory by multicycle memory January 20, 2011 http://csg.csail.mit.edu/SNU

The desired behavior iMem.read(pc) needs to be split into a pair of request-response actions: method Action iMem.req(Addr pc) method Actionvalue#(Inst) iMem.res() rule fetch_and_decode(True); instr <- actionvalue imem.req(pc) <$> i <- imem.resp(); return i; endactionvalue execute.enqIt(newIt(instr,rf)) when (!execute.stall(instr)); pc <= predIa; endrule imem req and res change state Unfortunately, BSV does not have multicycle rules <$> a guard is needed to avoid repeatedly requesting the stalled instr January 20, 2011 http://csg.csail.mit.edu/SNU

Action Connectives: Par vs. Seq Parallel compositions (a1;a2) Neither action observes others’ updates Writes are disjoint Natural in hardware Sequential Connective (a1 <$> a2) a2 observes a1’s updates Still atomic Not present in BSV because of implementation complications (r1 <= r2 ; r2 <= r1) swaps r1 & r2 We need to split the rule to get rid of <$> January 20, 2011 http://csg.csail.mit.edu/SNU

Predicating Actions: Guards vs. Ifs Guards affect their surroundings (a1 when p1) ; a2 ==> (a1 ; a2) when p1 The effect of an “if” is local (if p1 then a1) ; a2 ==> if p1 then (a1 ; a2) else a2 p1 has no effect on a2 January 20, 2011 http://csg.csail.mit.edu/SNU

Splitting the rule rule fetch(True); imem.req(pc) pc <= predIa; endrule rule decode(True); let instr <- imem.resp(); execute.enqIt(newIt(instr,rf)) when (!execute.stall(instr)); Suppose the PC was also needed to decode the instruction January 20, 2011 http://csg.csail.mit.edu/SNU

Passing data from Fetch to Decode All data between actions passed through state (imem + fet2decQ) rule fetch(True); imem.req(pc); fet2decQ.enq(pc); pc <= predIa; endrule rule decode(True); let instr <- imem.resp(); let pc = fet2decQ.first(); fet2decQ.deq(); execute.enqIt(newIt(instr,rf,pc)) when (!execute.stall(instr)); January 20, 2011 http://csg.csail.mit.edu/SNU

Methods of Fetch module To finish we need to change method setPC so that the next instruction sent to the Execute module is the correct one method setPC(Iaddress npc); pc <= npc; fet2decQ.clear(); endmethod January 20, 2011 http://csg.csail.mit.edu/SNU

Multicycle memory: Refined Execute Module Rule rule execute (True); case (it) matches tagged EAdd{dst:.rd,src1:.va,src2:.vb}: begin fetch.writeback(rd, va+vb); bu.deq(); end tagged EBz {cond:.cv,addr:.av}: if (cv == 0) then begin fetch.setPC(av); bu.clear(); end else bu.deq(); tagged ELoad{dst:.rd,addr:.av}: begin let val <- actionvalue dMem.req(Read {av}) <$> v <- dMem.resp(); return v; endactionvalue; fetch.writeback(rd, val); bu.deq(); end tagged EStore{value:.vv,addr:.av}: begin dMem.req(Write {av, vv}); <$> let val <- dMem.resp(); bu.deq(); end endcase endrule January 20, 2011 http://csg.csail.mit.edu/SNU

Splitting the Backend Rules: The execute rule rule execute (True); bu.deq(); case (it) matches tagged EAdd{dst:.rd,src1:.va,src2:.vb}: begin exec2wbQ.enq ( WWB {dst: rd, val: va+vb}); end tagged EBz {cond:.cv,addr:.av}: if (cv == 0) then begin fetch.setPC(av); bu.clear(); end; tagged ELoad{dst:.rd,addr:.av}: begin dMem.req(Read {addr:av}); exec2wbQ.enq(WLd {dst:rd}); end tagged EStore{value:.vv,addr:.av}: begin dMem.req(Write {addr:av, val:vv}); exec2wbQ.enq(WSt {}); end endcase endrule rule writeback(True); …. Should exec2wbQ be cleared? No because all instructions in exec2wbQ are non speculative and must commit January 20, 2011 http://csg.csail.mit.edu/SNU

Splitting the Backend Rules The writeback rule rule execute (True); … endrule rule writeback(True); exec2wbQ.deq(); case exec2wbQ.first() matches tagged WWB {dst: .rd, val: .v}: fetch.writeback(rd,v); tagged WLd {dst: .rd}: begin let v <- dMem.resp(); fetch.writeback(rd,v); end tagged WSt {} : begin let ack <- dMem.resp(); end default: noAction; endcase endrule January 20, 2011 http://csg.csail.mit.edu/SNU

Correctness Stepwise refinement makes the verification task easier but does not eliminate it We still need to prove that each step is correct January 20, 2011 http://csg.csail.mit.edu/SNU