Elastic Pipelines and Basics of Multi-rule Systems

Slides:



Advertisements
Similar presentations
BSV execution model and concurrent rule scheduling Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology February.
Advertisements

Elastic Pipelines and Basics of Multi-rule Systems Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February.
An EHR based methodology for Concurrency management Arvind (with Asif Khan) Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Constructive Computer Architecture: Multirule systems and Concurrent Execution of Rules Arvind Computer Science & Artificial Intelligence Lab. Massachusetts.
Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology October 13, 2009http://csg.csail.mit.edu/koreaL12-1.
March, 2007http://csg.csail.mit.edu/arvindIPlookup-1 IP Lookup Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
September 24, L08-1 IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab.
December 10, 2009 L29-1 The Semantics of Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute.
December 12, 2006http://csg.csail.mit.edu/6.827/L24-1 Scheduling Primitives for Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts.
Pipelining combinational circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology February 20, 2013http://csg.csail.mit.edu/6.375L05-1.
September 3, 2009L02-1http://csg.csail.mit.edu/korea Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial.
Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Constructive Computer Architecture: Guards Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology September 24, 2014.
September 22, 2009http://csg.csail.mit.edu/koreaL07-1 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab.
Constructive Computer Architecture Sequential Circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology
Elastic Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 28, 2011L08-1http://csg.csail.mit.edu/6.375.
Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
February 20, 2009http://csg.csail.mit.edu/6.375L08-1 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts.
Modular Refinement Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 8,
Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Simple Inelastic and Folded Pipelines Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 14, 2011L04-1.
Computer Architecture: A Constructive Approach Bluespec execution model and concurrent rule scheduling Teacher: Yoav Etsion Taken (with permission) from.
October 20, 2009L14-1http://csg.csail.mit.edu/korea Concurrency and Modularity Issues in Processor pipelines Arvind Computer Science & Artificial Intelligence.
Elastic Pipelines: Concurrency Issues
EHRs: Designing modules with concurrent methods
EHRs: Designing modules with concurrent methods
Introduction to Bluespec: A new methodology for designing Hardware
Introduction to Bluespec: A new methodology for designing Hardware
Concurrency properties of BSV methods and rules
Bluespec-6: Modeling Processors
Folded “Combinational” circuits
Scheduling Constraints on Interface methods
Blusepc-5: Dead cycles, bubbles and Forwarding in Pipelines Arvind
Sequential Circuits Constructive Computer Architecture Arvind
Sequential Circuits: Constructive Computer Architecture
Introduction to Bluespec: A new methodology for designing Hardware
Performance Specifications
Pipelining combinational circuits
Multirule Systems and Concurrent Execution of Rules
Constructive Computer Architecture: Guards
Sequential Circuits Constructive Computer Architecture Arvind
Pipelining combinational circuits
EHR: Ephemeral History Register
Constructive Computer Architecture: Well formed BSV programs
Blusepc-5: Dead cycles, bubbles and Forwarding in Pipelines Arvind
Bluespec-7: Scheduling & Rule Composition
Modeling Processors: Concurrency Issues
Modules with Guarded Interfaces
Pipelining combinational circuits
Sequential Circuits - 2 Constructive Computer Architecture Arvind
Introduction to Bluespec: A new methodology for designing Hardware
Elastic Pipelines: Concurrency Issues
Bluespec-3: A non-pipelined processor Arvind
Multirule systems and Concurrent Execution of Rules
Bluespec-4: Rule Scheduling and Synthesis
IP Lookup: Some subtle concurrency issues
Computer Science & Artificial Intelligence Lab.
Elastic Pipelines: Concurrency Issues
Constructive Computer Architecture: Guards
GCD: A simple example to introduce Bluespec
Elastic Pipelines and Basics of Multi-rule Systems
Bluespec-7: Scheduling & Rule Composition
Control Hazards Constructive Computer Architecture: Arvind
Multirule systems and Concurrent Execution of Rules
Introduction to Bluespec: A new methodology for designing Hardware
IP Lookup: Some subtle concurrency issues
Bluespec-5: Scheduling & Rule Composition
Pipelining combinational circuits
Implementing for Correct Concurrency
Constructive Computer Architecture: Well formed BSV programs
Presentation transcript:

Elastic Pipelines and Basics of Multi-rule Systems Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 17, 2010 http://csg.csail.mit.edu/6.375

Inelastic Pipeline x sReg1 inQ f1 f2 f3 sReg2 outQ rule sync-pipeline (True); if (inQ.notEmpty()) begin sReg1 <= Valid f1(inQ.first()); inQ.deq(); end else sReg1 <= Invalid; case (sReg1) matches tagged Valid .sx1: sReg2 <= Valid f2(sx1); tagged Invalid: sReg2 <= Invalid; case (sReg2) matches tagged Valid .sx2: outQ.enq(f3(sx2)); endrule February 17, 2010 http://csg.csail.mit.edu/6.375

Elastic pipeline Use FIFOs instead of pipeline registers x inQ fifo1 fifo2 outQ rule stage1 (True); fifo1.enq(f1(inQ.first()); inQ.deq(); endrule rule stage2 (True); fifo2.enq(f2(fifo1.first()); fifo1.deq(); endrule rule stage3 (True); outQ.enq(f3(fifo2.first()); fifo2.deq(); endrule Firing conditions? Can tokens be left inside the pipeline? No Maybe types? Easier to write? Can all three rules fire concurrently? February 17, 2010 http://csg.csail.mit.edu/6.375

Inelastic vs Elastic Pipelines In an Inelastic pipeline: typically only one rule; the designer controls precisely which activities go on in parallel downside: The rule can get too complicated -- easy to make a mistake; difficult to make changes In an Elastic pipeline: several smaller rules, each easy to write, easier to make changes downside: sometimes rules do not fire concurrently when they should February 17, 2010 http://csg.csail.mit.edu/6.375

What behavior do we want? x fifo1 inQ f1 f2 f3 fifo2 outQ If inQ, fifo1 and fifo2 are not empty and fifo1, fifo2 and outQ are not full then we want all the three rules to fire If inQ is empty, fifo1 and fifo2 are not empty and fifo2 and outQ are not full then we want rules stage2 and stage3 to fire … Maximize concurrency - Fire maximum number of rules February 17, 2010 http://csg.csail.mit.edu/6.375

The tension If multiple rules never fire in the same cycle then the machine can hardly be called a pipelined machine If all rules fire in parallel every cycle when they are enabled, then, in general, wrong results can be produced February 17, 2010 http://csg.csail.mit.edu/6.375

Concurrency analysis and rule scheduling February 17, 2010 http://csg.csail.mit.edu/6.375

Guarded Atomic Actions (GAA): Execution model Repeatedly: Select a rule to execute Compute the state updates Make the state updates Highly non-deterministic User annotations can help in rule selection Implementation concern: Schedule multiple rules concurrently without violating one-rule-at-a-time semantics February 17, 2010 http://csg.csail.mit.edu/6.375

some insight into Concurrent rule firing Rules Ri Rj Rk rule steps Rj HW Rk clocks Ri There are more intermediate states in the rule semantics (a state after each rule step) In the HW, states change only at clock edges February 17, 2010 http://csg.csail.mit.edu/6.375

Parallel execution reorders reads and writes Rules rule steps reads writes reads writes reads writes reads writes reads writes reads writes reads writes clocks HW In the rule semantics, each rule sees (reads) the effects (writes) of previous rules In the HW, rules only see the effects from previous clocks, and only affect subsequent clocks February 17, 2010 http://csg.csail.mit.edu/6.375

Correctness Rules Ri Rj Rk rule steps Rj HW Rk clocks Ri Rules are allowed to fire in parallel only if the net state change is equivalent to sequential rule execution Consequence: the HW can never reach a state unexpected in the rule semantics February 17, 2010 http://csg.csail.mit.edu/6.375

A compiler can determine if two rules can be executed in parallel without violating the one-rule-at-a-time semantics James Hoe, Ph.D., 2000 February 17, 2010 http://csg.csail.mit.edu/6.375

Rule: As a State Transformer A rule may be decomposed into two parts p(s) and d(s) such that snext = if p(s) then d(s) else s p(s) is the condition (predicate) of the rule, a.k.a. the “CAN_FIRE” signal of the rule. p is a conjunction of explicit and implicit conditions d(s) is the “state transformation” function, i.e., computes the next-state values from the current state values Abstractly, we can think of a rule having to parts, a pi function and a delta function. The pi function tells us whetherule can be applied to a term s If the the pievaluates to true, then the delta function tells us what is the new term. And if pi is false, the rule cannot change s February 17, 2010 http://csg.csail.mit.edu/6.375

Executing Multiple Rules Per Cycle: Conflict-free rules rule ra (z > 10); x <= x + 1; endrule rule rb (z > 20); y <= y + 2; Parallel execution behaves like ra < rb or equivalently rb < ra Rulea and Ruleb are conflict-free if s . pa(s)  pb(s)  1. pa(db(s))  pb(da(s)) 2. da(db(s)) == db(da(s)) The single rewrite per cycle implementation is correct but not very good. Parallel Execution can also be understood in terms of a composite rule rule ra_rb; if (z>10) then x <= x+1; if (z>20) then y <= y+2; endrule February 17, 2010 http://csg.csail.mit.edu/6.375 8 8 8 6

Mutually Exclusive Rules Rulea and Ruleb are mutually exclusive if they can never be enabled simultaneously s . pa(s)  ~ pb(s) Mutually-exclusive rules are Conflict-free by definition February 17, 2010 http://csg.csail.mit.edu/6.375

Executing Multiple Rules Per Cycle: Sequentially Composable rules rule ra (z > 10); x <= y + 1; endrule rule rb (z > 20); y <= y + 2; Parallel execution behaves like ra < rb - R(rb) is the range of rule rb - Prjst is the projection selecting st from the total state Rulea and Ruleb are sequentially composable if s . pa(s)  pb(s)  1. pb(da(s)) 2. PrjR(rb)(db(s)) == PrjR(rb)(db(da(s))) Parallel Execution can also be understood in terms of a composite rule rule ra_rb; if (z>10) then x <= y+1; if (z>20) then y <= y+2; endrule February 17, 2010 http://csg.csail.mit.edu/6.375 8 8 8 6

Compiler determines if two rules can be executed in parallel Rulea and Ruleb are conflict-free if s . pa(s)  pb(s)  1. pa(db(s))  pb(da(s)) 2. da(db(s)) == db(da(s)) D(Ra)  R(Rb) =  D(Rb)  R(Ra) =  R(Ra)  R(Rb) =  Rulea and Ruleb are sequentially composable if s . pa(s)  pb(s)  1. pb(da(s)) 2. PrjR(Rb)(db(s)) == PrjR(Rb)(db(da(s))) D(Rb)  R(Ra) =  These conditions are sufficient but not necessary These properties can be determined by examining the domains and ranges of the rules in a pairwise manner. Parallel execution of CF and SC rules does not increase the critical path delay February 17, 2010 http://csg.csail.mit.edu/6.375 8 8 8 6

Conflicting rules rule ra (True); x <= y + 1; endrule rule rb (True); y <= x + 2; Assume x and y are initially zero Concurrent execution of these can produce x=1 and y=2 but these values cannot be produced by any sequential execution ra followed by rb would produce x=1 and y=3 rb followed by ra would produce x=3 and y=2 Such rules must be executed one-by-one and not concurrently February 17, 2010 http://csg.csail.mit.edu/6.375

The compiler issue Can the compiler detect all the conflicting conditions? Important for correctness Does the compiler detect conflicts that do not exist in reality? False positives lower the performance The main reason is that sometimes the compiler cannot detect under what conditions the two rules are mutually exclusive or conflict free What can the user specify easily? Rule priorities to resolve nondeterministic choice yes yes In many situations the correctness of the design is not enough; the design is not done unless the performance goals are met February 17, 2010 http://csg.csail.mit.edu/6.375

Concurrency in Elastic pipeline f1 f2 f3 x inQ fifo1 fifo2 outQ Can all three rules fire concurrently? rule stage1 (True); fifo1.enq(f1(inQ.first()); inQ.deq(); endrule rule stage2 (True); fifo2.enq(f2(fifo1.first()); fifo1.deq(); endrule rule stage3 (True); outQ.enq(f3(fifo2.first()); fifo2.deq(); endrule Consider rules stage1 and stage2: No conflict around inQ or fifo2. What can we assume about enq, deq and first methods of fifo1? we want the FIFO to behave as if first < deq < enq February 17, 2010 http://csg.csail.mit.edu/6.375

Concurrency in FIFOs February 17, 2010 http://csg.csail.mit.edu/6.375

One-Element FIFO enq and deq cannot even be enabled together much less fire concurrently! module mkFIFO1 (FIFO#(t)); Reg#(t) data <- mkRegU(); Reg#(Bool) full <- mkReg(False); method Action enq(t x) if (!full); full <= True; data <= x; endmethod method Action deq() if (full); full <= False; method t first() if (full); return (data); method Action clear(); endmodule n not empty not full rdy enab enq deq module FIFO Can we build a FIFO such that functionally deq “happens” before enq and if deq does not happen then enq behaves normally February 17, 2010 http://csg.csail.mit.edu/6.375

Two-Element FIFO d1 d0 module mkFIFO (FIFO#(t)); Reg#(t) d0 <- mkRegU(); Reg#(Bool) v0 <- mkReg(False); Reg#(t) d1 <- mkRegU(); Reg#(Bool) v1 <- mkReg(False); method Action enq(t x) if (!v1); if v0 then begin d1 <= x; v1 <= True; end else begin d0 <= x; v0 <= True; end endmethod method Action deq() if (v0); if v1 then begin d0 <= d1; v1 <= False; end else begin v0 <= False; end endmethod method t first() if (v0); return d0; endmethod method Action clear(); v0<= False; v1 <= False; endmethod endmodule Assume, if there is only one element in the FIFO it resides in d0 enq and deq can be enabled together and they do not conflict A compiler may not succeed in doing such analysis February 17, 2010 http://csg.csail.mit.edu/6.375

Two-Element FIFO another version d1 d0 module mkFIFO (FIFO#(t)); Reg#(t) d0 <- mkRegU(); Reg#(Bool) v0 <- mkReg(False); Reg#(t) d1 <- mkRegU(); Reg#(Bool) v1 <- mkReg(False); method Action enq(t x) if (!v1); v0 <= True; v1 <= v0; if v0 then d1 <= x; else d0 <= x; endmethod method Action deq() if (v0); v1 <= False; v0 <= v1; d0 <= d1; endmethod method t first() if (v0); return d0; endmethod method Action clear(); v0<= False; v1 <= False; endmethod endmodule Assume, if there is only one element in the FIFO it resides in d0 enq and deq can be enabled together but apparently conflict Compiler has no chance to be able to deduce the concurrency of enq and deq February 17, 2010 http://csg.csail.mit.edu/6.375

RWire to rescue interface RWire#(type t); method Action wset(t x); method Maybe#(t) wget(); endinterface Like a register in that you can read and write it but unlike a register - read happens after write - data disappears in the next cycle RWires can break the atomicity of a rule if not used properly February 17, 2010 http://csg.csail.mit.edu/6.375

One-Element Pipeline FIFO not empty not full rdy enab enq deq module FIFO or !full module mkLFIFO1 (FIFO#(t)); Reg#(t) data <- mkRegU(); Reg#(Bool) full <- mkReg(False); RWire#(void) deqEN <- mkRWire(); Bool deqp = isValid (deqEN.wget())); method Action enq(t x) if (!full || deqp); full <= True; data <= x; endmethod method Action deq() if (full); full <= False; deqEN.wset(?); method t first() if (full); return (data); method Action clear(); full <= False; endmethod endmodule This works correctly in both cases (fifo full and fifo empty) first < enq deq < enq enq < clear deq < clear February 17, 2010 http://csg.csail.mit.edu/6.375

FIFOs Ordinary one element FIFO Pipeline FIFO Bypass FIFO deq & enq conflict – won’t do Pipeline FIFO first < deq < enq < clear Bypass FIFO enq < first < deq < clear February 17, 2010 http://csg.csail.mit.edu/6.375

Takeaway FIFOs with concurrent operations are quite difficult to design, though the amount of hardware involved is small FIFOs with appropriate properties are in the BSV library Various FIFOs affect performance but not correctness For performance, concentrate on high-level design and then search for modules with appropriate properties February 17, 2010 http://csg.csail.mit.edu/6.375

Extras Scheduler synthesis February 17, 2010 http://csg.csail.mit.edu/6.375

Scheduling and control logic Modules (Current state) “CAN_FIRE” “WILL_FIRE” Modules (Next state) Rules p1 Scheduler f1 d1 p1 pn fn d1 After mapping all the rules, we have to combine their logic some how. For a particular state elemetn like the PC register, the latch enable is the or the enable signals from all the rules that updates PC. The actual next state value of PC has to be selected through a multiplexer. Notice, this circuit only works if only one of these pi signal is asserted at a time Muxing dn pn cond action dn Compiler synthesizes a scheduler such that at any given time f’s for only non-conflicting rules are true February 17, 2010 http://csg.csail.mit.edu/6.375

Multiple-Rules-per-Cycle Scheduler pn f1 f2 fn Divide the rules into smallest conflicting groups; provide a scheduler for each group 1. fi  pi 2. p1  p2  ....  pn  f1  f2  ....  fn 3. Multiple operations such that fi  fj  Ri and Rj are conflict-free or sequentially composable February 17, 2010 http://csg.csail.mit.edu/6.375

Muxing structure Muxing logic requires determining for each register (action method) the rules that update it and under what conditions Conflict Free/Mutually Exclusive) and or d1 p1 d2 p2 If two CF rules update the same element then they must be mutually exclusive (p1  ~p2) Sequentially Composable and or d1 p1 and ~p2 d2 p2 February 17, 2010 http://csg.csail.mit.edu/6.375