Bluespec-5: Scheduling & Rule Composition

Presentation transcript:

Bluespec-5: Scheduling & Rule Composition Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 8, 2006 http://csg.csail.mit.edu/6.375/

Executing Multiple Rules Per Cycle: Conflict-free rules
    rule ra (z > 10);
        x <= x + 1;
    endrule
    rule rb (z > 20);
        y <= y + 2;
    endrule
Parallel execution behaves like ra < rb = rb < ra.
Rulea and Ruleb are conflict-free if
    ∀s . pa(s) ∧ pb(s) ⇒
        1. pa(db(s)) ∧ pb(da(s))
        2. da(db(s)) == db(da(s))
where pa is the predicate (guard) of Rulea and da is its state-transition function.
Parallel execution can also be understood in terms of a composite rule:
    rule ra_rb ((z > 10) && (z > 20));
        x <= x + 1;
        y <= y + 2;
    endrule
Note: The single-rewrite-per-cycle implementation is correct but not very good. Recall that the fetch stage and the execute stage were described in separate rules. If we fire only one rule per clock cycle, then we are not going to get a pipelined processor.
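
The two conflict-free conditions can be checked by brute force on a small state space. The following is a minimal Python sketch, not BSV and not how the compiler works: each rule is modeled as a guard function and a state-transition function over a dict, and the names guard_a, delta_a, conflict_free, and the sampled STATES are all assumptions made for illustration.

```python
from itertools import product

# Rules ra and rb from the slide, modeled as (guard, delta) pairs over a state dict.
def guard_a(s): return s["z"] > 10
def delta_a(s): return {**s, "x": s["x"] + 1}
def guard_b(s): return s["z"] > 20
def delta_b(s): return {**s, "y": s["y"] + 2}

def conflict_free(ga, da, gb, db, states):
    """Check both CF conditions on every sampled state where both guards hold."""
    for s in states:
        if ga(s) and gb(s):
            if not (ga(db(s)) and gb(da(s))):   # condition 1: neither rule disables the other
                return False
            if da(db(s)) != db(da(s)):          # condition 2: the updates commute
                return False
    return True

# A small finite sample of states (an assumption; the real state space is much larger).
STATES = [{"x": x, "y": y, "z": z}
          for x, y, z in product(range(3), range(3), (5, 15, 25))]

is_cf = conflict_free(guard_a, delta_a, guard_b, delta_b, STATES)
```

On this sample the check passes for ra and rb, because they write different registers and neither writes z. Replacing both actions with writes to x (for example x <= 1 and x <= 2) makes condition 2 fail.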

Executing Multiple Rules Per Cycle: Sequentially Composable rules
    rule ra (z > 10);
        x <= y + 1;
    endrule
    rule rb (z > 20);
        y <= y + 2;
    endrule
Parallel execution behaves like ra < rb.
Rulea and Ruleb are sequentially composable if
    ∀s . pa(s) ∧ pb(s) ⇒ pb(da(s))
Parallel execution can also be understood in terms of a composite rule:
    rule ra_rb ((z > 10) && (z > 20));
        x <= y + 1;
        y <= y + 2;
    endrule
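
To see why parallel execution of these two rules matches the order ra < rb but not rb < ra, here is a small Python model (a sketch, not BSV). Each rule returns the register writes it wants to make; fire_parallel lets every rule read the state from the start of the cycle, while fire_sequential applies the rules one at a time. The function names and the starting state are assumptions for illustration.

```python
def ra(s):
    """rule ra (z > 10): x <= y + 1. Returns its writes, or None if the guard is false."""
    return {"x": s["y"] + 1} if s["z"] > 10 else None

def rb(s):
    """rule rb (z > 20): y <= y + 2."""
    return {"y": s["y"] + 2} if s["z"] > 20 else None

def fire_sequential(rules, s):
    """One-rule-at-a-time semantics: each rule sees the writes of the rules before it."""
    for rule in rules:
        writes = rule(s)
        if writes is not None:
            s = {**s, **writes}
    return s

def fire_parallel(rules, s):
    """Single clock cycle: every rule reads the old state, all writes land at the end."""
    all_writes = {}
    for rule in rules:
        writes = rule(s)
        if writes is not None:
            all_writes.update(writes)
    return {**s, **all_writes}

s0 = {"x": 0, "y": 5, "z": 25}
parallel = fire_parallel([ra, rb], s0)
ra_then_rb = fire_sequential([ra, rb], s0)
rb_then_ra = fire_sequential([rb, ra], s0)
```

With this starting state, parallel and ra_then_rb both give x = 6 and y = 7, while rb_then_ra gives x = 8, since ra then reads the already updated y.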

Sequentially Composable rules ...
    rule ra (z > 10);
        x <= 1;
    endrule
    rule rb (z > 20);
        x <= 2;
    endrule
Parallel execution can behave either like ra < rb or like rb < ra, but the two behaviors are not the same.
Composite rules: one for behavior ra < rb, one for behavior rb < ra.

A property of rule-based systems
Adding a new rule to a system can only introduce new behaviors.
If the new rule is a derived rule, then it does not add new behaviors.
Example of a derived rule. Given rules:
    Ra: when pa(s) => s := da(s);
    Rb: when pb(s) => s := db(s);
the following rule is a derived rule:
    Ra,b: when pa(s) & pb(da(s)) => s := db(da(s));
For CF rules: pb(da(s)) = pb(s) and s := db(da(s)) = da(db(s));
For SC rules: pb(da(s)) = pb(s) and s := db(da(s));

Rule composition
[Figure: rule_1 takes state S1 to S2, rule_2 takes S2 to S3, and rule_1_2 takes S1 directly to S3.]
    rule rule_1 (p1(s));
        s <= f1(s);
    endrule
    rule rule_2 (p2(s));
        s <= f2(s);
    endrule
    rule rule_1_2 (p1(s) && p2(s'));
        s <= f2(s');
    endrule
    where s' = f1(s);
The semantics of rule-based systems guarantee that rule_1_2, which takes S1 to S3, is correct.
Such composed rules are called derived rules because they are mechanically derivable.
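
The mechanical derivation can be written down directly. The Python sketch below (not BSV) models a rule as a (guard, update) pair and builds the derived rule with guard p1(s) && p2(f1(s)) and update f2(f1(s)). The helper name compose and the two integer rules are hypothetical, chosen only to make the example concrete.

```python
def compose(rule_1, rule_2):
    """Derive rule_1_2: guard p1(s) && p2(s'), update f2(s'), where s' = f1(s)."""
    p1, f1 = rule_1
    p2, f2 = rule_2

    def guard(s):
        return p1(s) and p2(f1(s))

    def update(s):
        return f2(f1(s))

    return guard, update

# Hypothetical rules on an integer state.
rule_1 = (lambda s: s < 10, lambda s: s + 3)       # when s < 10  => s := s + 3
rule_2 = (lambda s: s % 2 == 0, lambda s: s * 2)   # when s even  => s := s * 2

guard_1_2, update_1_2 = compose(rule_1, rule_2)
```

For s = 1 the derived rule is enabled (1 < 10, and 1 + 3 = 4 is even) and takes the state to 8, which is the same state reached by firing rule_1 and then rule_2. For s = 2 it is not enabled, because rule_2 would not be enabled after rule_1.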

Implementation oriented view of concurrency
A. When executing a set of rules in a clock cycle, each rule reads state from the leading clock edge and sets state at the trailing clock edge ⇒ none of the rules in the set can see the effects of any of the other rules in the set.
B. However, in one-rule-at-a-time semantics, each rule sees the effects of all previous rule executions.
Thus, a set of rules can be safely executed together in a clock cycle only if A and B produce the same net state change.

Pictorially
[Figure: timeline of rule steps (Ri, Rj, Rk) in the rule semantics, shown against clocks in the HW.]
There are more intermediate states in the rule semantics (a state after each rule step).
In the HW, states change only at clock edges.

Parallel execution reorders reads and writes
[Figure: in the rule semantics, each rule step is a reads/writes pair; in the HW, the reads and writes are grouped by clock.]
In the rule semantics, each rule sees (reads) the effects (writes) of previous rules.
In the HW, rules only see the effects from previous clocks, and only affect subsequent clocks.

Correctness
[Figure: timeline of rule steps (Ri, Rj, Rk) in the rule semantics, shown against clocks in the HW.]
Rules are allowed to fire in parallel only if the net state change is equivalent to sequential rule execution (i.e., CF or SC).
Consequence: the HW can never reach a state unexpected in the rule semantics.

Compiler determines if two rules can be executed in parallel
Rulea and Ruleb are conflict-free if
    ∀s . pa(s) ∧ pb(s) ⇒
        1. pa(db(s)) ∧ pb(da(s))
        2. da(db(s)) == db(da(s))
Rulea and Ruleb are sequentially composable if
    ∀s . pa(s) ∧ pb(s) ⇒ pb(da(s))
These properties can be determined by examining the domains and ranges of the rules in a pairwise manner.

Mutually Exclusive Rules
Rulea and Ruleb are mutually exclusive if they can never be enabled simultaneously:
    ∀s . pa(s) ⇒ ~pb(s)
Mutually-exclusive rules are conflict-free even if they write the same state.
Mutual-exclusion analysis brings down the cost of conflict-free analysis.
Note: As an implementation consideration, we require that the result of a sequential application can be reconstructed by combining the effects of applying the two rules directly on s. Otherwise, we would have to cascade their combinational logic, which may lead to a longer cycle time. We say two rules are conflict-free if they satisfy all these conditions.

Conflict-Free Scheduler
Partition rules into the maximum number of disjoint sets such that
    a rule in one set may conflict with one or more rules in the same set
    a rule in one set is conflict-free with respect to all the rules in all other sets
(Best case: all sets are of size 1!)
Schedule each set independently: Priority Encoder, Round-Robin Priority Encoder, Enumerated Encoder.
Note: To build a scheduler for this, we first have to determine the conflict-free relationship between all pairs of rules. It is very expensive to compute this relation exactly, so a compiler can only get an approximation by making several different conservative tests. Once we know the pairwise relationships, we can partition the rules into multiple non-overlapping sets such that rules in different sets are conflict-free. Each set of rules can then be scheduled independently of the other sets, because whatever one set selects will be conflict-free with the selections made by the other sets.
The state update logic doesn’t have to be changed to use this scheduler.
The state update logic depends upon whether the scheduler chooses “sequential composition” or not.
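
The partitioning step amounts to finding the connected components of the pairwise conflict graph. Below is a minimal Python sketch using union-find. It assumes the (conservative) pairwise conflict relation has already been computed; the rule names and the function partition_rules are hypothetical.

```python
def partition_rules(rules, conflicts):
    """Group rules into the connected components of the conflict graph.

    rules: list of rule names; conflicts: iterable of (rule, rule) pairs that conflict.
    Rules in different groups are conflict-free with respect to each other.
    """
    parent = {r: r for r in rules}

    def find(r):
        # Follow parent links to the group representative, compressing the path.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in conflicts:
        parent[find(a)] = find(b)   # merge the two groups

    groups = {}
    for r in rules:
        groups.setdefault(find(r), []).append(r)
    return sorted(groups.values())

# Hypothetical example: r1 conflicts with r2, and r2 with r3; r4 and r5 conflict with nothing.
groups = partition_rules(["r1", "r2", "r3", "r4", "r5"],
                         [("r1", "r2"), ("r2", "r3")])
```

Here r1 and r3 land in the same group even though they do not conflict directly, because both conflict with r2. Each resulting group would then get its own scheduler.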

Multiple-Rules-per-Cycle Scheduler
[Figure: a scheduler per group, with signals p1 ... pn and f1 ... fn.]
Divide the rules into smallest conflicting groups; provide a scheduler for each group.
    1. fi ⇒ pi
    2. p1 ∨ p2 ∨ .... ∨ pn ⇒ f1 ∨ f2 ∨ .... ∨ fn
    3. Multiple operations such that fi ∧ fj ⇒ Ri and Rj are conflict-free or sequentially composable
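
One simple scheduler that satisfies all three properties is a fixed-priority one. The Python sketch below (not generated hardware, and not necessarily what the Bluespec compiler emits) takes the CAN_FIRE bits in priority order and lets a rule fire unless an already selected higher-priority rule conflicts with it. The function name schedule and the CONFLICTS set are assumptions for illustration.

```python
def schedule(can_fire, conflicts):
    """Compute WILL_FIRE bits from CAN_FIRE bits with fixed priority.

    can_fire: list of bools, index 0 has the highest priority.
    conflicts: set of frozenset({i, j}) for rule pairs that must not fire together.
    """
    will_fire = [False] * len(can_fire)
    for i, enabled in enumerate(can_fire):
        # A rule is blocked if a conflicting higher-priority rule has already been chosen.
        blocked = any(will_fire[j] and frozenset((i, j)) in conflicts
                      for j in range(i))
        will_fire[i] = enabled and not blocked
    return will_fire

# Hypothetical group of three rules where only rules 0 and 1 conflict.
CONFLICTS = {frozenset((0, 1))}

all_enabled = schedule([True, True, True], CONFLICTS)
top_disabled = schedule([False, True, True], CONFLICTS)
```

With all three rules enabled, rule 1 is suppressed by rule 0 while rule 2 still fires. When rule 0 is not enabled, rules 1 and 2 fire together.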

Muxing structure
Muxing logic requires determining, for each register (action method), the rules that update it and under what conditions.
Conflict-free (mutually exclusive): and-or mux with inputs (d1, p1) and (d2, p2).
    CF rules either do not update the same element or are ME: p1 ⇒ ~p2
Sequentially composable: and-or mux with inputs (d1, p1 and ~p2) and (d2, p2).
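
The two and-or structures can be modeled bit-for-bit. In this Python sketch (a model, not generated hardware) each data input is ANDed with its enable and the results are ORed. The 8-bit width and the helper names gate, mux_me, and mux_sc are assumptions for illustration.

```python
MASK = 0xFF  # assumed register width of 8 bits

def gate(data, enable):
    """AND every data bit with the enable signal."""
    return data & (MASK if enable else 0)

def mux_me(d1, p1, d2, p2):
    """And-or mux for mutually exclusive rules; only valid if p1 and p2 are never both true."""
    return gate(d1, p1) | gate(d2, p2)

def mux_sc(d1, p1, d2, p2):
    """And-or mux for sequentially composable rules: d1 is gated by p1 and ~p2, so d2 wins."""
    return gate(d1, p1 and not p2) | gate(d2, p2)

only_rule_1 = mux_sc(0x0F, True, 0xF0, False)
both_sc = mux_sc(0x0F, True, 0xF0, True)
both_me = mux_me(0x0F, True, 0xF0, True)   # violates the ME assumption on purpose
```

When both enables are asserted, mux_sc selects d2 (0xF0), while mux_me ORs the two values together into 0xFF, which is neither input. That is why the simpler structure is only safe for rules known to be mutually exclusive.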

Scheduling and control logic
[Figure: Modules (current state) feed the cond and action logic of the rules; the signals p1 ... pn (“CAN_FIRE”) enter the Scheduler, which produces f1 ... fn (“WILL_FIRE”); the values d1 ... dn pass through Muxing into Modules (next state).]
Note: After mapping all the rules, we have to combine their logic somehow. For a particular state element like the PC register, the latch enable is the OR of the enable signals from all the rules that update PC. The actual next-state value of PC has to be selected through a multiplexer. Notice that this circuit works only if just one of these pi signals is asserted at a time.

Synthesis Summary
Bluespec generates a combinational hardware scheduler allowing multiple enabled rules to execute in the same clock cycle.
The hardware makes a rule-execution decision on every clock (i.e., it is not a static schedule).
Among those rules that CAN_FIRE, only a subset WILL_FIRE that is consistent with a rule order.
Since multiple rules can write to a common piece of state, the compiler introduces appropriate muxing logic.

Scheduling conflicting rules
When two rules conflict on a shared resource, they cannot both execute in the same clock.
The compiler produces logic that ensures that, when both rules are applicable, only one will fire.
Which one? Source annotations.

Circular Pipeline Code
[Figure: circular pipeline with in, cbuf, RAM, and active; decision points enter? and done?]
    rule enter (True);
        Token t <- cbuf.getToken();
        IP ip = in.first();
        ram.req(ip[31:16]);
        active.enq(tuple2(ip[15:0], t));
        in.deq();
    endrule
    rule done (True);
        TableEntry p <- ram.resp();
        match {.rip, .t} = active.first();
        if (isLeaf(p)) cbuf.done(t, p);
        else begin
            active.enq(tuple2(rip << 8, t));
            ram.req(p + signExtend(rip[15:7]));
        end
        active.deq();
    endrule
Can rules enter and done be applicable simultaneously? Which one should go?

Concurrency Expectations
[Tables: Register, relating read1 and write1 to read2 and write2; FIFO, relating enq1, first1, deq1, and clear1 to enq2, first2, deq2, and clear2. The table entries are not recoverable.]

One Element FIFO
Concurrency? enq and deq?
    module mkFIFO1 (FIFO#(t));
        Reg#(t) data <- mkRegU();
        Reg#(Bool) full <- mkReg(False);
        method Action enq(t x) if (!full);
            full <= True;
            data <= x;
        endmethod
        method Action deq() if (full);
            full <= False;
        endmethod
        method t first() if (full);
            return (data);
        endmethod
        method Action clear();
        endmethod
    endmodule

Two-Element FIFO
Shift register implementation
    module mkFIFO2 (FIFO#(t));
        Reg#(t) data0 <- mkRegU;
        Reg#(Bool) full0 <- mkReg(False);
        Reg#(t) data1 <- mkRegU;
        Reg#(Bool) full1 <- mkReg(False);
        method Action enq(t x) if (!(full0 && full1));
            data1 <= x;
            full1 <= True;
            if (full1) begin
                data0 <= data1;
                full0 <= True;
            end
        endmethod
        method Action deq() if (full0 || full1);
            if (full0) full0 <= False;
            else full1 <= False;
        endmethod
        method t first() if (full0 || full1);
            return ((full0) ? data0 : data1);
        endmethod
        method Action clear();
            full0 <= False;
            full1 <= False;
        endmethod
    endmodule

The good news ... It is always possible to transform your design to meet desired concurrency and functionality

Register Interfaces
read < write
write < read ?
[Figure: D/Q register with read, write.x, and write.en ports, drawn for both orderings.]

Ephemeral History Register (EHR) [MEMOCODE’04]
read0 < write0 < read1 < write1 < ....
[Figure: D/Q register with ports read0, write0.x, write0.en, read1, write1.x, and write1.en.]
write_(i+1) takes precedence over write_i.
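
The ordering read0 < write0 < read1 < write1 can be captured in a small behavioral model. This Python sketch is only an illustration of the semantics, not the hardware and not a Bluespec library API: the class name EHR and its methods read, write, and clock are assumptions.

```python
class EHR:
    """Behavioral model of an EHR for one clock cycle at a time."""

    def __init__(self, init, ports=2):
        self.value = init     # value held in the D/Q register
        self.ports = ports    # number of read/write port pairs
        self.pending = {}     # writes issued in the current cycle, keyed by port index

    def read(self, i):
        """read_i sees the register value overlaid with writes from ports below i."""
        v = self.value
        for j in range(i):
            if j in self.pending:
                v = self.pending[j]   # a higher-numbered write overrides lower ones
        return v

    def write(self, i, x):
        self.pending[i] = x

    def clock(self):
        """Clock edge: latch the value left by the highest-numbered write, if any."""
        self.value = self.read(self.ports)
        self.pending = {}

e = EHR(0)
e.write(0, 5)
seen_by_read0 = e.read(0)   # 0: read0 precedes write0
seen_by_read1 = e.read(1)   # 5: read1 follows write0
e.write(1, 9)
e.clock()                   # write1 takes precedence over write0
```

After the clock edge the register holds 9, because the port-1 write overrides the port-0 write made in the same cycle.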

One Element FIFO using EHRs
first0 < deq0 < enq1
    module mkFIFO1 (FIFO#(t));
        EHReg2#(t) data <- mkEHReg2U();
        EHReg2#(Bool) full <- mkEHReg2(False);
        method Action enq0(t x) if (!full.read0);
            full.write0 <= True;
            data.write0 <= x;
        endmethod
        method Action deq0() if (full.read0);
            full.write0 <= False;
        endmethod
        method t first0() if (full.read0);
            return (data.read0);
        endmethod
        method Action clear0();
        endmethod
    endmodule

EHR as the base case?
[Figure: chain of EHR ports, from read0, write0.x, and write0.en through writen.x and writen.en, with read ports read0, read1, read2, read3, ..., readn+1.]

The bad news ...
EHR cannot be written in Bluespec as defined so far.
Even though this transformation to meet the performance “specification” is mechanical, the Bluespec compiler currently does not do this transformation.
Choices:
    do it manually and use a library of EHRs
    rely on a low-level (dangerous) programming mechanism: wires

RWires
    interface RWire#(type t);
        method Action wset(t data);
        method Maybe#(t) wget();
    endinterface
    module mkRWire (RWire#(t));
The mkRWire module contains no state and no logic: it’s just wires!
By testing the valid bit of wget() we know whether some rule containing wset() is executing concurrently (enab is True).
[Figure: mkRWire with wset inputs data (n bits) and enab, no rdy wire (always True), and a wget output: a Maybe value containing a data value and a valid bit.]

Intra-clock communication
Suppose Rj uses rw.wset() on an RWire.
Suppose Rk uses rw.wget() on the same RWire.
If Rj and Rk execute in the same cycle, then Rj always precedes Rk in the rule-step semantics.
Testing isValid(rw.wget()) allows Rk to test whether Rj is executing in the same cycle.
wset/wget allows Rj to communicate a value to Rk.
Intra-clock rule-to-rule communication, provided both rules actually execute concurrently (same cycle).
Forward communication only (in the rule-step ordering).
[Figure: rule steps Ri, Rj, Rk against clocks; Rj performs wset(x) and Rk performs mx = wget() in the same clock.]

One Element FIFO w/ RWires: Pipeline FIFO
first < deq < enq
    module mkFIFO1 (FIFO#(t));
        Reg#(t) data <- mkRegU();
        Reg#(Bool) full <- mkReg(False);
        PulseWire deqW <- mkPulseWire();
        method Action enq(t x) if (deqW || !full);
            full <= True;
            data <= x;
        endmethod
        method Action deq() if (full);
            full <= False;
            deqW.send();
        endmethod
        method t first() if (full);
            return (data);
        endmethod
        method Action clear();
            full <= False;
        endmethod
    endmodule
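
The intended cycle-level behavior of the pipeline FIFO (first < deq < enq) can be sketched as a Python model: an enq is accepted when the FIFO is empty or when a deq is happening in the same cycle. This models the behavior only, not the BSV scheduling of the module above; the class name PipelineFIFO1 and the cycle method are assumptions, and enq=None is used to mean "no enqueue this cycle".

```python
class PipelineFIFO1:
    """Cycle-level model of a one-element pipeline FIFO."""

    def __init__(self):
        self.full = False
        self.data = None

    def cycle(self, enq=None, deq=False):
        """Simulate one clock. Returns (dequeued value or None, whether enq was accepted)."""
        out = None
        did_deq = deq and self.full            # deq guard: full
        if did_deq:
            out = self.data                    # first/deq see the old contents
        can_enq = did_deq or not self.full     # enq guard: deqW || !full
        did_enq = enq is not None and can_enq
        if did_enq:
            self.data, self.full = enq, True   # the enq effect comes last, so the FIFO stays full
        elif did_deq:
            self.full = False
        return out, did_enq

f = PipelineFIFO1()
r1 = f.cycle(enq=1)             # accepted: the FIFO was empty
r2 = f.cycle(enq=2)             # rejected: full and no deq in this cycle
r3 = f.cycle(enq=3, deq=True)   # both happen: 1 leaves, 3 enters
r4 = f.cycle(deq=True)          # 3 leaves, the FIFO becomes empty
```

The third cycle is the interesting one: the FIFO is full, yet the enqueue goes through because a dequeue is executing in the same cycle.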

One Element FIFO w/ RWires: Bypass FIFO
enq < first < deq
    module mkFIFO1 (FIFO#(t));
        Reg#(t) data <- mkRegU();
        Reg#(Bool) full <- mkReg(False);
        RWire#(t) enqW <- mkRWire();
        PulseWire deqW <- mkPulseWire();
        rule finishMethods (isJust(enqW.wget()) || deqW);
            full <= !deqW;
        endrule
        method Action enq(t x) if (!full);
            enqW.wset(x);
            data <= x;
        endmethod
        method Action deq() if (full || isJust(enqW.wget()));
            deqW.send();
        endmethod
        method t first() if (full || isJust(enqW.wget()));
            return (full ? data : unJust(enqW.wget()));
        endmethod
        method Action clear();
            full <= False;
        endmethod
    endmodule

A HW implication of mkPipelineFIFO
There is now a combinational path from enab_deq to rdy_enq (a consequence of the RWire).
This is how a rule using enq() “knows” that it can go even if the FIFO is full, i.e., enab_deq is a signal that a rule using deq() is executing concurrently.
[Figure: mkPipelineFIFO ports. enq: n-bit data, enab, rdy_enq = not full || enab_deq. first: n-bit data, rdy = not empty. deq: enab_deq, rdy = not empty. clear: enab, rdy = always true.]

Viewing the schedule
The command-line flag -show-schedule can be used to dump the schedule.
Three groups of information:
    method scheduling information
    rule scheduling information
    the static execution order of rules and methods