Folded “Combinational” circuits

Slides:



Advertisements
Similar presentations
Constructive Computer Architecture: Multirule systems and Concurrent Execution of Rules Arvind Computer Science & Artificial Intelligence Lab. Massachusetts.
Advertisements

Folding and Pipelining complex combinational circuits Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February.
Stmt FSM Richard S. Uhler Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology (based on a lecture prepared by Arvind)
February 21, 2007http://csg.csail.mit.edu/6.375/L07-1 Bluespec-4: Architectural exploration using IP lookup Arvind Computer Science & Artificial Intelligence.
March, 2007http://csg.csail.mit.edu/arvindIPlookup-1 IP Lookup Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
September 24, L08-1 IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab.
IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L06-1.
IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 4, 2013
Computer Architecture: A Constructive Approach Sequential Circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
December 12, 2006http://csg.csail.mit.edu/6.827/L24-1 Scheduling Primitives for Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts.
Pipelining combinational circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology February 20, 2013http://csg.csail.mit.edu/6.375L05-1.
September 3, 2009L02-1http://csg.csail.mit.edu/korea Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial.
Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Constructive Computer Architecture Combinational circuits-2 Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Folded Combinational Circuits as an example of Sequential Circuits Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Oct 2, 2014T01-1http://csg.csail.mit.edu/6.175 Constructive Computer Architecture Tutorial 1 BSV Sizhuo Zhang TA.
Constructive Computer Architecture: Guards Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology September 24, 2014.
Constructive Computer Architecture Tutorial 2 Advanced BSV Sizhuo Zhang TA Oct 9, 2015T02-1http://csg.csail.mit.edu/6.175.
September 22, 2009http://csg.csail.mit.edu/koreaL07-1 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab.
Constructive Computer Architecture Sequential Circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology
Folding complex combinational circuits to save area Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February.
Constructive Computer Architecture Sequential Circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology September.
Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Constructive Computer Architecture Sequential Circuits - 2 Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
October 22, 2009http://csg.csail.mit.edu/korea Modular Refinement Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Non-blocking Caches Arvind (with Asif Khan) Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology May 14, 2012L25-1
Realistic Memories and Caches – Part III Li-Shiuan Peh Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology April 4, 2012L15-1.
Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
October 20, 2009L14-1http://csg.csail.mit.edu/korea Concurrency and Modularity Issues in Processor pipelines Arvind Computer Science & Artificial Intelligence.
Introduction to Bluespec: A new methodology for designing Hardware
Introduction to Bluespec: A new methodology for designing Hardware
Bluespec-3: A non-pipelined processor Arvind
Folded Combinational Circuits as an example of Sequential Circuits
Bluespec-6: Modeling Processors
Scheduling Constraints on Interface methods
Sequential Circuits - 2 Constructive Computer Architecture Arvind
Sequential Circuits Constructive Computer Architecture Arvind
Sequential Circuits: Constructive Computer Architecture
Combinational ALU Constructive Computer Architecture Arvind
Combinational circuits-2
Stmt FSM Arvind (with the help of Nirav Dave)
Pipelining combinational circuits
Multirule Systems and Concurrent Execution of Rules
Constructive Computer Architecture: Guards
BSV Types Constructive Computer Architecture Tutorial 1 Andy Wright
Sequential Circuits Constructive Computer Architecture Arvind
Pipelining combinational circuits
BSV objects and a tour of known BSV problems
Bluespec-4: Architectural exploration using IP lookup Arvind
Constructive Computer Architecture: Well formed BSV programs
Modules with Guarded Interfaces
Pipelining combinational circuits
Sequential Circuits - 2 Constructive Computer Architecture Arvind
Introduction to Bluespec: A new methodology for designing Hardware
Bluespec-3: A non-pipelined processor Arvind
Multirule systems and Concurrent Execution of Rules
Stmt FSM Arvind (with the help of Nirav Dave)
IP Lookup: Some subtle concurrency issues
Elastic Pipelines and Basics of Multi-rule Systems
Constructive Computer Architecture: Guards
Elastic Pipelines and Basics of Multi-rule Systems
Simple Synchronous Pipelines
Multirule systems and Concurrent Execution of Rules
Introduction to Bluespec: A new methodology for designing Hardware
IP Lookup: Some subtle concurrency issues
Constructive Computer Architecture: Well formed BSV programs
Simple Synchronous Pipelines
Combinational ALU 6.S078 - Computer Architecture:
Caches-2 Constructive Computer Architecture Arvind
Constructive Computer Architecture Tutorial 1 Andy Wright 6.S195 TA
Presentation transcript:

Folded “Combinational” circuits Constructive Computer Architecture Folded “Combinational” circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology (Andy) Changed course website to correct website September 25, 2017 http://csg.csail.mit.edu/6.175

Design Alternatives Combinational (C) f1 f2 f3 Pipeline (P) f1 f2 f3 Folded (F) Reuse a block, multicycle fi Clock: C < P  F Clock? Area? Throughput? Area: F < C < P Throughput: F < C < P September 25, 2017 http://csg.csail.mit.edu/6.175

Content How to implement loop computations? Need registers to hold the state from one iteration to the next Request-Response Latency-Insensitive modules A common way to implement large combinational circuits is by folding or as loops Multiplication Polymorphic Multiply September 25, 2017 http://csg.csail.mit.edu/6.175

Expressing a loop using registers int s = s0; while (p(s)) {     s = f(s);        } return s; C-code Such a loop cannot be implemented by unfolding because the number of iterations is input-data dependent A register is needed to hold s from one iteration to the next s has to be initialized when the computation starts, and updated every cycle until the computation terminates p notDone s0 f sel en sel = start en = start | notDone s September 25, 2017 http://csg.csail.mit.edu/6.175

Expressing a loop in BSV When a rule executes: the register s is read at the beginning of a clock cycle computations to evaluate the next value of the register and the sen are performed If sen is True then s is updated at the end of the clock cycle A mux is needed to initialize the register Reg#(t) s <- mkRegU(); rule step; if (p(s)) begin      s <= f(s); end endrule p notDone s0 f sel s en How should this circuit be packaged for proper use? sel = start en = start | notDone September 25, 2017 http://csg.csail.mit.edu/6.175

Packaging a computation as a Latency-Insensitive Module startF F getResultF ready busy en Interface with guards interface F#(t); method Action start (t a); method ActionValue#(t) getResult; endinterface September 25, 2017 http://csg.csail.mit.edu/6.175

Request-Response Module module mkF (F#(t)); Reg#(t) s <-mkRegU(); Reg#(Bool) busy <- mkReg(False); rule step; if (p(s)) begin s <= f(s); end endrule method Action start(t a) if (!busy); s <= a; busy <= True; endmethod method ActionValue t getResult if (!p(s)&& busy); busy <= False; return s; endmodule September 25, 2017 http://csg.csail.mit.edu/6.175

Using F inQ outQ F#(t) f <- mkF(…) rule invokeF; getResult get result invoke F start F outQ F#(t) f <- mkF(…) rule invokeF; f.start(inQ.first); inQ.deq; endrule This system is insensitive to the latency of F rule getResult; let x <- f.getResult; outQ.enq(x); endrule A rule can be executed only if guards of all of its actions are true September 25, 2017 http://csg.csail.mit.edu/6.175

Combinational 32-bit multiply function Bit#(64) mul32(Bit#(32) a, Bit#(32) b); Bit#(32) tp = 0; Bit#(32) prod = 0; for(Integer i = 0; i < 32; i = i+1) begin Bit#(32) m = (a[i]==0)? 0 : b; Bit#(33) sum = add32(m,tp,0); prod[i] = sum[0]; tp = sum[32:1]; end return {tp,prod}; endfunction Combinational circuit uses 31 add32 circuits We can reuse the same add32 circuit if we store the partial results in a register September 25, 2017 http://csg.csail.mit.edu/6.175

Multiply using registers function Bit#(64) mul32(Bit#(32) a, Bit#(32) b); Bit#(32) prod = 0; Bit#(32) tp = 0; for(Integer i = 0; i < 32; i = i+1) begin Bit#(32) m = (a[i]==0)? 0 : b; Bit#(33) sum = add32(m,tp,0); prod[i:i] = sum[0]; tp = sum[32:1]; end return {tp,prod}; endfunction Combinational version Need registers to hold a, b, tp, prod and i Update the registers every cycle until we are done September 25, 2017 http://csg.csail.mit.edu/6.175

Sequential Circuit for Multiply Reg#(Bit#(32)) a <- mkRegU(); Reg#(Bit#(32)) b <- mkRegU(); Reg#(Bit#(32)) prod <-mkRegU(); Reg#(Bit#(32)) tp <- mkReg(0); Reg#(Bit#(6)) i <- mkReg(32); rule mulStep; if (i < 32) begin Bit#(32) m = (a[i]==0)? 0 : b; Bit#(33) sum = add32(m,tp,0); prod[i] <= sum[0]; tp <= sum[32:1]; i <= i+1; end endrule state elements a rule to describe the dynamic behavior similar to the loop body in the combinational version So that the rule has no effect until i is set to some other value September 25, 2017 http://csg.csail.mit.edu/6.175

Dynamic selection requires a mux a[i] a i when the selection indices are regular then it is better to use a shift operator (no gates!) a >> a[0],a[1],a[2],… September 25, 2017 http://csg.csail.mit.edu/6.175

Replacing repeated selections by shifts rule mulStep if (i < 32); Bit#(32) m = (a[0]==0)? 0 : b; a <= a >> 1; Bit#(33) sum = add32(m,tp,0); prod <= {sum[0], prod[31:1]}; tp <= sum[32:1]; i <= i+1; endrule September 25, 2017 http://csg.csail.mit.edu/6.175

Circuit for Sequential Multiply bIn aIn << 31:0 s1 b result (high) 31 add 32:1 == 32 done +1 s1 result (low) [30:0] << s1 s1 i a tp prod s2 s2 s2 s2 s1 = start_en s2 = start_en | !done September 25, 2017 http://csg.csail.mit.edu/6.175

Circuit analysis Number of add32 circuits has been reduced from 31 to one, though some registers and muxes have been added The longest combinational path has been reduced from 62 FAs to one add32 plus a few muxes The sequential circuit will take 31 clock cycles to compute an answer (Andy) longest combinational path was effectively 63 1-bit adders. September 25, 2017 http://csg.csail.mit.edu/6.175

Packaging Multiply as a Latency-Insensitive Module startMul Multiply getMulResult ready busy en 64 32 Interface with guards interface Multiply; method Action startMul (Bit#(32) a, Bit#(32) b); method ActionValue#(Bit#(64)) getResultMul; endinterface September 25, 2017 http://csg.csail.mit.edu/6.175

Multiply Module Module mkMultiply (Multiply); Reg#(Bit#(32)) a<-mkRegU(); Reg#(Bit#(32)) b<-mkRegU(); Reg#(Bit#(32)) prod <-mkRegU(); Reg#(Bit#(32)) tp <- mkReg(0); Reg#(Bit#(6)) i <- mkReg(32); Reg#(Bool) busy <- mkReg(False); rule mulStep if (i < 32); Bit#(32) m = (a[0]==0)? 0 : b; a <= a >> 1; Bit#(33) sum = add32(m,tp,0); prod <= {sum[0], prod[31:1]}; tp <= sum[32:1]; i <= i+1; endrule method Action startMul(Bit#(32) x, Bit#(32) y) if (!busy); a <= x; b <= y; busy <= True; i <= 0; endmethod method ActionValue Bit#(64) getMulRes if ((i==32) && busy); busy <= False; return {tp,prod}; endmethod September 25, 2017 http://csg.csail.mit.edu/6.175

Circuit for Sequential Multiply b en rdy startMul getMulRes bIn b a i == 32 done +1 prod result (low) [30:0] aIn << 31:0 tp s1 s2 result (high) 31 add 32:1 s1 = start_en s2 = start_en | !done busy September 25, 2017 http://csg.csail.mit.edu/6.175

Polymorphic Multiply Module startMul Multiply getMulResult ready busy en 64 32 Tmul#(t,2) Polymorphic Interface t interface Multiply#(32); method Action startMul (Bit#(32) a, Bit#(32) b); method ActionValue#(Bit#(64)) getResultMul; endinterface t t Tmul#(t,2) September 25, 2017 http://csg.csail.mit.edu/6.175

Polymorphic Multiply Module mkMultiply (Multiply#(t)); Reg#(Bit#(t)) a<-mkRegU(); Reg#(Bit#(t)) b<-mkRegU(); Reg#(Bit#(t)) prod <-mkRegU(); Reg#(Bit#(t)) tp <- mkReg(0); vt = valueOf(t); Reg#(Bit#(TAdd#(1, TLog#(t)))) i <- mkReg(vt); Reg#(Bool) busy <- mkReg(False); rule mulStep if (i < vt); Bit#(t) m = (a[0]==0)? 0 : b; a <= a >> 1; Bit#(Tadd#(t)) sum = addN(m,tp,0); prod <= {sum[0], prod[(vt-1):1]}; tp <= sum[vt:1]; i <= i+1; endrule method Action startMul(Bit#(t) x, Bit#(t) y) if (!busy); a <= x; b <= y; busy <= True; i<=0; endmethod method ActionValue#(Bit#(TMul#(t,2)) getMulRes if ((i==vt)&& busy); busy <= False; return {tp,prod}; endmethod September 25, 2017 http://csg.csail.mit.edu/6.175