GCD: A simple example to introduce Bluespec

Presentation transcript:

GCD: A simple example to introduce Bluespec
Arvind, Computer Science & Artificial Intelligence Lab, Massachusetts Institute of Technology
February 15, 2008
http://csg.csail.mit.edu/6.375

Bluespec: State and Rules organized into modules
[Figure labels: interface, module]
All state (e.g., Registers, FIFOs, RAMs, ...) is explicit.
Behavior is expressed in terms of atomic actions on the state:
    Rule: guard → action
Rules can manipulate state in other modules only via their interfaces.

Programming with rules: A simple example
Euclid's algorithm for computing the Greatest Common Divisor (GCD):
    15   6
     9   6   subtract
     3   6   subtract
     6   3   swap
     3   3   subtract
     0   3   subtract
    answer: 3

GCD in BSV

    module mkGCD (I_GCD);
       // State
       Reg#(int) x <- mkRegU;
       Reg#(int) y <- mkReg(0);

       // Internal behavior
       rule swap ((x > y) && (y != 0));
          x <= y; y <= x;
       endrule
       rule subtract ((x <= y) && (y != 0));
          y <= y - x;
       endrule

       // External interface
       method Action start(int a, int b) if (y == 0);
          x <= a; y <= b;
       endmethod
       method int result() if (y == 0);
          return x;
       endmethod
    endmodule

[Figure labels: x, y, swap, sub]
Callouts: int is a synonym for Int#(32). Assume a /= 0. If (a==0) then 0 else b.
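To make the behavior of the two rules concrete, here is a minimal Python model of the module above. It is a sketch of the rule semantics, not BSV and not generated hardware: the names `step` and `gcd` are illustrative, the registers become plain integers, and the model assumes positive inputs (the slide only states a /= 0).

```python
# Python model (not BSV) of mkGCD: registers x and y plus two guarded rules.

def step(x, y):
    """Fire at most one enabled rule; return (x, y, name of the rule fired)."""
    if x > y and y != 0:        # rule swap
        return y, x, "swap"
    if x <= y and y != 0:       # rule subtract
        return x, y - x, "subtract"
    return x, y, None           # no rule enabled: y == 0, result() is ready


def gcd(a, b):
    """Model start(a, b), then fire rules until result() would be ready."""
    assert a > 0 and b >= 0, "assumption of this sketch: positive a, non-negative b"
    x, y = a, b
    trace = [(x, y, "start")]
    while True:
        x, y, fired = step(x, y)
        if fired is None:
            return x, trace
        trace.append((x, y, fired))


if __name__ == "__main__":
    value, trace = gcd(15, 6)
    for entry in trace:
        print(entry)
    print("result:", value)
```

Each loop iteration corresponds to one rule firing, so the length of the trace gives a rough cycle count for a given input pair.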

GCD Hardware Module

    interface I_GCD;
       method Action start (int a, int b);
       method int result();
    endinterface

[Figure: module GCD with a start port (int inputs, enab, rdy) and a result port (int output, rdy). The rdy signals are the implicit conditions, here y == 0.]
The module can easily be made polymorphic: with interface I_GCD#(type t), t replaces int, and in a GCD call t could be Int#(32), UInt#(16), Int#(13), ...
Many different implementations can provide the same interface: module mkGCD (I_GCD)

GCD: Another implementation

    module mkGCD (I_GCD);
       Reg#(int) x <- mkRegU;
       Reg#(int) y <- mkReg(0);

       // Combines the swap and subtract rules
       rule swapANDsub ((x > y) && (y != 0));
          x <= y; y <= x - y;
       endrule
       rule subtract ((x <= y) && (y != 0));
          y <= y - x;
       endrule

       method Action start(int a, int b) if (y == 0);
          x <= a; y <= b;
       endmethod
       method int result() if (y == 0);
          return x;
       endmethod
    endmodule

Combine swap and subtract rule. Does it compute faster?
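One way to explore the question "Does it compute faster?" is to count rule firings in a small Python model of both rule sets. This is a sketch under the assumption that exactly one rule fires per cycle and that inputs are positive; the names `run`, `ORIGINAL` and `COMBINED` are illustrative, and the count says nothing about clock period or area.

```python
# Python model (not BSV): count rule firings for the two GCD rule sets.

SUBTRACT = (lambda x, y: x <= y and y != 0, lambda x, y: (x, y - x))

# Rule set of the first implementation: swap, subtract.
ORIGINAL = [
    (lambda x, y: x > y and y != 0, lambda x, y: (y, x)),      # swap
    SUBTRACT,
]

# Rule set of the second implementation: swapANDsub, subtract.
COMBINED = [
    (lambda x, y: x > y and y != 0, lambda x, y: (y, x - y)),  # swapANDsub
    SUBTRACT,
]


def run(rules, a, b):
    """Load x=a, y=b, fire one enabled rule per cycle until y == 0."""
    assert a > 0 and b >= 0, "assumption of this sketch"
    x, y, cycles = a, b, 0
    while y != 0:
        for guard, action in rules:
            if guard(x, y):
                x, y = action(x, y)   # both registers update together
                cycles += 1
                break
    return x, cycles


if __name__ == "__main__":
    for a, b in [(15, 6), (423, 142)]:
        print((a, b), "original:", run(ORIGINAL, a, b),
              "combined:", run(COMBINED, a, b))
```

In this model a swapANDsub firing stands for a swap followed immediately by a subtract, so the combined rule set never needs more firings than the original one.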

Bluespec Tool flow
[Figure: Bluespec SystemVerilog source feeds the Bluespec Compiler, which emits either C for Bluesim (cycle accurate) or Verilog 95 RTL for Verilog simulation and RTL synthesis (gates, Place & Route, Physical, Tapeout). Other labels: Blueview, VCD output files, Debussy Visualization. Legend: Bluespec tools, 3rd party tools.]

Generated Verilog RTL: GCD

    module mkGCD(CLK, RST_N, start_a, start_b, EN_start, RDY_start,
                 result, RDY_result);
      input CLK;
      input RST_N;
      // action method start
      input [31 : 0] start_a;
      input [31 : 0] start_b;
      input EN_start;
      output RDY_start;
      // value method result
      output [31 : 0] result;
      output RDY_result;
      // register x and y
      reg [31 : 0] x;
      wire [31 : 0] x$D_IN;
      wire x$EN;
      reg [31 : 0] y;
      wire [31 : 0] y$D_IN;
      wire y$EN;
      ...
      // rule RL_subtract
      assign WILL_FIRE_RL_subtract = x_SLE_y___d3 && !y_EQ_0___d10 ;
      // rule RL_swap
      assign WILL_FIRE_RL_swap = !x_SLE_y___d3 && !y_EQ_0___d10 ;

Generated Hardware
[Figure: registers x and y with enables x_en and y_en; a comparator (>), a !(=0) test and a subtractor (sub) compute the predicates swap? and subtract? and the next state values; ports start, result, rdy.]
    x_en = swap?
    y_en = swap? OR subtract?

Generated Hardware Module
[Figure: the same datapath (registers x and y, comparator >, !(=0) test, subtractor sub, predicates swap? and subtract?) with the start method added; ports start, en (start_en), rdy, result.]
    x_en = swap? OR start_en
    y_en = swap? OR subtract? OR start_en
    rdy  = (y==0)

GCD: A Simple Test Bench

    module mkTest ();
       Reg#(int) state <- mkReg(0);
       I_GCD gcd <- mkGCD();

       rule go (state == 0);
          gcd.start (423, 142);
          state <= 1;
       endrule
       rule finish (state == 1);
          $display ("GCD of 423 & 142 =%d", gcd.result());
          state <= 2;
       endrule
    endmodule

Why do we need the state variable?

GCD: Test Bench

    module mkTest ();
       Reg#(int) state <- mkReg(0);
       Reg#(Int#(4)) c1 <- mkReg(1);
       Reg#(Int#(7)) c2 <- mkReg(1);
       I_GCD gcd <- mkGCD();

       rule req (state == 0);
          gcd.start(signExtend(c1), signExtend(c2));
          state <= 1;
       endrule
       rule resp (state == 1);
          $display ("GCD of %d & %d =%d", c1, c2, gcd.result());
          if (c1 == 7) begin c1 <= 1; c2 <= c2 + 1; end
          else c1 <= c1 + 1;
          if (c1 == 7 && c2 == 63) state <= 2;
          else state <= 0;
       endrule
    endmodule

Feeds all pairs (c1,c2), 1 ≤ c1 ≤ 7, 1 ≤ c2 ≤ 63, to GCD

GCD: Synthesis results
    Original (16 bits):  Clock Period: 1.6 ns    Area: 4240 µm²
    Unrolled (16 bits):  Clock Period: 1.65 ns   Area: 5944 µm²
Unrolled takes 31% fewer cycles on the testbench

Rule scheduling and the synthesis of a scheduler

GAA Execution model
Repeatedly:
    Select a rule to execute
    Compute the state updates
    Make the state updates
Highly non-deterministic. User annotations can help in rule selection.
Implementation concern: Schedule multiple rules concurrently without violating one-rule-at-a-time semantics.

Rule: As a State Transformer
A rule may be decomposed into two parts p(s) and d(s) such that
    $s_{next} = \text{if } p(s) \text{ then } d(s) \text{ else } s$
p(s) is the condition (predicate) of the rule, a.k.a. the "CAN_FIRE" signal of the rule (the conjunction of explicit and implicit conditions).
d(s) is the "state transformation" function, i.e., it computes the next-state value in terms of the current state values.
Abstractly, we can think of a rule as having two parts, a pi function (p) and a delta function (d). The pi function tells us whether the rule can be applied to a term s. If pi evaluates to true, then the delta function tells us what the new term is. If pi is false, the rule cannot change s.
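The decomposition into p and d, together with the "repeatedly select a rule" execution model, can be sketched in a few lines of Python. This is a hedged illustration only: the state is a dictionary of register values, the helper names `apply_rule` and `run` are made up for this sketch, and random selection stands in for the unspecified, non-deterministic choice of rule.

```python
import random


def apply_rule(p, d, s):
    """s_next = if p(s) then d(s) else s."""
    return d(s) if p(s) else s


def run(rules, s, steps, rng):
    """Repeatedly select one enabled rule and make its state update."""
    for _ in range(steps):
        enabled = [(p, d) for p, d in rules if p(s)]
        if not enabled:
            break                      # no rule can fire
        p, d = rng.choice(enabled)     # non-deterministic selection
        s = apply_rule(p, d, s)
    return s


# Illustrative rules over registers x, y, z, each given as a (p, d) pair.
ra = (lambda s: s["z"] > 10, lambda s: {**s, "x": s["x"] + 1})
rb = (lambda s: s["z"] > 20, lambda s: {**s, "y": s["y"] + 2})

final = run([ra, rb], {"x": 0, "y": 0, "z": 25}, steps=10, rng=random.Random(0))

if __name__ == "__main__":
    print(final)
```

Because each step applies exactly one rule to the whole state, every run of this model is by construction a one-rule-at-a-time execution.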

Compiling a Rule

    rule r (f.first() > 0);
       x <= x + 1;
       f.deq();
    endrule

[Figure: the current state (f, x) supplies rdy signals and read methods to p and d; p produces the enable signal, and d produces the enable signals and action parameters that set the next state values.]
    p = enabling condition
    d = action signals & values
In a circuit, pi (p) maps to combinational logic that looks at the current state and generates a boolean enable signal for this rule. The delta function (d) is another block of combinational logic that computes the next state value from the current state value. Actually, delta has to compute the control signals to set the state element to the new value.

Combining State Updates: strawman
[Figure: the p's from the rules that update R (p1 ... pn) are ORed to form the latch enable of R; the d's from the rules that update R (d1,R ... dn,R) are ORed to form the next state value of R.]
After mapping all the rules, we have to combine their logic somehow. For a particular state element like the PC register, the latch enable is the OR of the enable signals from all the rules that update PC. The actual next state value of PC has to be selected through a multiplexer. Notice that this circuit works only if at most one of these pi signals is asserted at a time.
What if more than one rule is enabled?

Combining State Updates
[Figure: as in the strawman, but the p's from all the rules (p1 ... pn) first pass through a Scheduler (Priority Encoder) producing f1 ... fn, which drive the latch enable of R and select among the d's from the rules that update R (d1,R ... dn,R) to form the next state value.]
Scheduler ensures that at most one fi is true

One-rule-at-a-time Scheduler
[Figure: Scheduler (Priority Encoder) with inputs p1, p2, ..., pn and outputs f1, f2, ..., fn]
    1. $f_i \Rightarrow p_i$
    2. $p_1 \lor p_2 \lor \dots \lor p_n \Rightarrow f_1 \lor f_2 \lor \dots \lor f_n$
    3. One rewrite at a time, i.e. at most one $f_i$ is true
To resolve this, in the simplest implementation, instead of using the pi signals (p) to enable state elements directly, we can use arbitrated phi signals (f). The phi signals are generated by a priority encoder that asserts only one phi signal, chosen among those whose corresponding pi signal is asserted. With this scheduler we are, in effect, creating an implementation that executes one rule per clock cycle.
Very conservative way of guaranteeing correctness
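The three scheduler properties can be checked on a small Python model of a priority encoder. This is an illustrative sketch rather than the compiler's generated logic: the function name `priority_encoder` is made up here, and giving the lowest index the highest priority is an arbitrary assumption.

```python
from itertools import product


def priority_encoder(p):
    """Map CAN_FIRE bits p to WILL_FIRE bits f with at most one bit set.

    Assumption of this sketch: the lowest-numbered enabled rule wins.
    """
    f = [False] * len(p)
    for i, can_fire in enumerate(p):
        if can_fire:
            f[i] = True
            break
    return f


def check_properties(n):
    """Exhaustively check the three scheduler properties for n rules."""
    for p in product([False, True], repeat=n):
        f = priority_encoder(p)
        assert all(pi for fi, pi in zip(f, p) if fi)   # 1. f_i implies p_i
        assert any(f) == any(p)                        # 2. some p implies some f
        assert sum(f) <= 1                             # 3. at most one f_i
    return True


if __name__ == "__main__":
    print(priority_encoder([False, True, True]))
    print(check_properties(4))
```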

Executing Multiple Rules Per Cycle: Conflict-free rules

    rule ra (z > 10);
       x <= x + 1;
    endrule
    rule rb (z > 20);
       y <= y + 2;
    endrule

Parallel execution behaves like ra < rb = rb < ra
Rule_a and Rule_b are conflict-free if
    $\forall s.\ p_a(s) \land p_b(s) \Rightarrow$
        1. $p_a(d_b(s)) \land p_b(d_a(s))$
        2. $d_a(d_b(s)) == d_b(d_a(s))$
The single-rewrite-per-cycle implementation is correct but not very good. Remember how we described the fetch stage and execute stage in separate rules. If we fire only one rule per clock cycle, then we are not going to get a pipelined processor.
Parallel execution can also be understood in terms of a composite rule:

    rule ra_rb ((z > 10) && (z > 20));
       x <= x + 1; y <= y + 2;
    endrule
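The conflict-free definition can be tested by brute force on a small state space. The Python sketch below models a state as a tuple (x, y, z) and checks both conditions for the rules ra and rb above; the function name `conflict_free`, the state ranges and the contrasting rule `da_reads_y` are assumptions chosen for illustration, and an exhaustive check over a finite range is only evidence, not a proof, for wider registers.

```python
from itertools import product


def conflict_free(pa, da, pb, db, states):
    """Check: for all s, pa(s) and pb(s) implies
    (1) pa(db(s)) and pb(da(s)), and (2) da(db(s)) == db(da(s))."""
    for s in states:
        if pa(s) and pb(s):
            if not (pa(db(s)) and pb(da(s))):
                return False
            if da(db(s)) != db(da(s)):
                return False
    return True


# A state is a tuple (x, y, z).
pa = lambda s: s[2] > 10                      # guard of ra
da = lambda s: (s[0] + 1, s[1], s[2])         # ra: x <= x + 1
pb = lambda s: s[2] > 20                      # guard of rb
db = lambda s: (s[0], s[1] + 2, s[2])         # rb: y <= y + 2

# Contrasting rule (illustrative): x <= y + 1 reads the register rb writes.
da_reads_y = lambda s: (s[1] + 1, s[1], s[2])

STATES = list(product(range(4), range(4), range(8, 24)))

if __name__ == "__main__":
    print(conflict_free(pa, da, pb, db, STATES))
    print(conflict_free(pa, da_reads_y, pb, db, STATES))
```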

Executing Multiple Rules Per Cycle: Sequentially Composable rules

    rule ra (z > 10);
       x <= y + 1;
    endrule
    rule rb (z > 20);
       y <= y + 2;
    endrule

Parallel execution behaves like ra < rb
Rule_a and Rule_b are sequentially composable if
    $\forall s.\ p_a(s) \land p_b(s) \Rightarrow p_b(d_a(s))$
Parallel execution can also be understood in terms of a composite rule:

    rule ra_rb ((z > 10) && (z > 20));
       x <= y + 1; y <= y + 2;
    endrule
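The claim that parallel execution of these two rules behaves like ra < rb, and not like rb < ra, can be seen in a small Python model. This is a sketch with made-up helper names (`parallel`, `sequential`): in parallel execution every rule reads the same current state and all writes land together, while sequential execution lets the second rule see the first rule's writes.

```python
def parallel(s, rules):
    """All rules read the same current state; their writes land together."""
    new = dict(s)
    for rule in rules:
        new.update(rule(s))
    return new


def sequential(s, rules):
    """Rules fire one after another; each sees the previous rule's writes."""
    for rule in rules:
        s = {**s, **rule(s)}
    return s


# Each rule returns the register writes it makes (empty if its guard is false).
ra = lambda s: {"x": s["y"] + 1} if s["z"] > 10 else {}   # x <= y + 1
rb = lambda s: {"y": s["y"] + 2} if s["z"] > 20 else {}   # y <= y + 2

s0 = {"x": 0, "y": 5, "z": 25}

if __name__ == "__main__":
    print("parallel:", parallel(s0, [ra, rb]))
    print("ra < rb :", sequential(s0, [ra, rb]))
    print("rb < ra :", sequential(s0, [rb, ra]))
```

In the rb < ra order, ra reads the value of y that rb has already updated, which is why that order cannot be reproduced by letting both rules read the old state.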

Multiple-Rules-per-Cycle Scheduler
[Figure: scheduler with inputs p1 ... pn and outputs f1, f2, ..., fn]
Divide the rules into smallest conflicting groups; provide a scheduler for each group.
    1. $f_i \Rightarrow p_i$
    2. $p_1 \lor p_2 \lor \dots \lor p_n \Rightarrow f_1 \lor f_2 \lor \dots \lor f_n$
    3. Multiple operations such that $f_i \land f_j \Rightarrow R_i$ and $R_j$ are conflict-free or sequentially composable

Muxing structure
Muxing logic requires determining for each register (action method) the rules that update it and under what conditions.
Conflict Free / Mutually Exclusive:
    next value = (d1 and p1) or (d2 and p2)
    CF rules either do not update the same element or are ME: $p_1 \Rightarrow \lnot p_2$
Sequentially Composable:
    next value = (d1 and p1 and ~p2) or (d2 and p2)
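The two AND-OR structures can be modeled with bitwise operations on integers. The Python sketch below is illustrative only: `mask`, `mux_cf` and `mux_sc` are made-up names, data values are assumed to be non-negative integers, and an all-ones mask stands in for the replicated enable bit of a hardware AND gate.

```python
def mask(bit):
    """All ones when bit is true, all zeros otherwise."""
    return -1 if bit else 0


def mux_cf(d1, p1, d2, p2):
    """AND-OR mux for conflict-free / mutually exclusive rules.

    Only meaningful when p1 and p2 are never both true."""
    return (d1 & mask(p1)) | (d2 & mask(p2))


def mux_sc(d1, p1, d2, p2):
    """AND-OR mux for sequentially composable rules: rule 2's value
    overrides rule 1's value when both rules fire."""
    return (d1 & mask(p1 and not p2)) | (d2 & mask(p2))


if __name__ == "__main__":
    print(mux_cf(5, True, 9, False))   # only rule 1 fires
    print(mux_sc(5, True, 9, True))    # both fire, rule 2 wins
    print(mux_cf(5, True, 9, True))    # both fire: the plain OR mixes the values
```

The last call shows why the plain AND-OR form relies on mutual exclusion: with both enables asserted the two data values are ORed together.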

Scheduling and control logic
[Figure: Modules (Current state) feed the Rules, each with a cond (p1 ... pn) and an action (d1 ... dn). The Scheduler turns the "CAN_FIRE" signals p1 ... pn into the "WILL_FIRE" signals f1 ... fn, and the Muxing logic combines d1 ... dn into the inputs of Modules (Next state).]

Extras

Sequentially Composable rules ...

    rule ra (z > 10);
       x <= 1;
    endrule
    rule rb (z > 20);
       x <= 2;
    endrule

Parallel execution can behave either like ra < rb or rb < ra, but the two behaviors are not the same.
Composite rules:

    // Behavior ra < rb
    rule ra_rb (z > 10 && z > 20);
       x <= 2;
    endrule
    // Behavior rb < ra
    rule rb_ra (z > 10 && z > 20);
       x <= 1;
    endrule

Mutually Exclusive Rules
Rule_a and Rule_b are mutually exclusive if they can never be enabled simultaneously:
    $\forall s.\ p_a(s) \Rightarrow \lnot p_b(s)$
Mutually exclusive rules are conflict-free even if they write the same state.
As an implementation consideration, we are going to require that the result of a sequential application can be reconstructed by combining the effects of applying the two rules directly on s. Otherwise, we will have to cascade their combinational logic, which may lead to a longer cycle time. We say two rules are conflict-free if they satisfy all these conditions.
Mutual-exclusion analysis brings down the cost of conflict-free analysis.
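For small state spaces, mutual exclusion can be checked by enumerating states. The Python sketch below applies the definition to the swap and subtract guards of the GCD example; the function name `mutually_exclusive` and the chosen value range are assumptions of this illustration, and a real compiler would reason symbolically about the guards rather than enumerate states.

```python
from itertools import product


def mutually_exclusive(pa, pb, states):
    """True if no state in `states` enables both guards at once."""
    return all(not (pa(s) and pb(s)) for s in states)


# Guards of the GCD rules; a state is the pair (x, y).
swap = lambda s: s[0] > s[1] and s[1] != 0
subtract = lambda s: s[0] <= s[1] and s[1] != 0

# Contrasting guards (illustrative) that can both hold in the same state.
x_positive = lambda s: s[0] > 0
y_positive = lambda s: s[1] > 0

STATES = list(product(range(-8, 8), repeat=2))

if __name__ == "__main__":
    print(mutually_exclusive(swap, subtract, STATES))
    print(mutually_exclusive(x_positive, y_positive, STATES))
```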

Compiler determines if two rules can be executed in parallel
Rule_a and Rule_b are conflict-free if
    $\forall s.\ p_a(s) \land p_b(s) \Rightarrow$
        1. $p_a(d_b(s)) \land p_b(d_a(s))$
        2. $d_a(d_b(s)) == d_b(d_a(s))$
    Sufficient conditions:
        $D(R_a) \cap R(R_b) = \emptyset$
        $D(R_b) \cap R(R_a) = \emptyset$
        $R(R_a) \cap R(R_b) = \emptyset$
Rule_a and Rule_b are sequentially composable if
    $\forall s.\ p_a(s) \land p_b(s) \Rightarrow p_b(d_a(s))$
    Sufficient condition:
        $D(p_b) \cap R(R_a) = \emptyset$
These properties can be determined by examining the domains and ranges of the rules in a pairwise manner. These conditions are sufficient but not necessary.
Parallel execution of CF and SC rules does not increase the critical path delay.
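The pairwise domain/range test amounts to a few set intersections. The Python sketch below encodes each rule by the registers its guard reads, its action reads, and its action writes, and applies the conditions exactly as stated on the slide; the `Rule` class and its field names are invented for this illustration and do not reflect how the Bluespec compiler represents rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    guard_reads: frozenset    # D(p): registers read by the guard
    action_reads: frozenset   # registers read by the action
    writes: frozenset         # R(R): registers written by the action

    @property
    def domain(self):
        """D(R): everything the rule reads."""
        return self.guard_reads | self.action_reads


def conflict_free(a, b):
    """D(Ra)∩R(Rb), D(Rb)∩R(Ra) and R(Ra)∩R(Rb) are all empty."""
    return (not (a.domain & b.writes)
            and not (b.domain & a.writes)
            and not (a.writes & b.writes))


def sequentially_composable(a, b):
    """Slide's condition for a < b: D(p_b) ∩ R(R_a) is empty."""
    return not (b.guard_reads & a.writes)


f = frozenset
ra_cf = Rule(f({"z"}), f({"x"}), f({"x"}))   # rule ra (z > 10); x <= x + 1
ra_sc = Rule(f({"z"}), f({"y"}), f({"x"}))   # rule ra (z > 10); x <= y + 1
rb = Rule(f({"z"}), f({"y"}), f({"y"}))      # rule rb (z > 20); y <= y + 2

if __name__ == "__main__":
    print(conflict_free(ra_cf, rb))
    print(conflict_free(ra_sc, rb), sequentially_composable(ra_sc, rb))
```

Because the test is syntactic, it can reject pairs of rules that would in fact satisfy the semantic definitions, which matches the slide's remark that the conditions are sufficient but not necessary.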

Homework problem: Binary Multiplication

Exercise: Binary Multiplier
Simple binary multiplication:

            1001    // d = 4'd9
     x      0101    // r = 4'd5
            1001    // d << 0 (since r[0] == 1)
           0000     // 0 << 1 (since r[1] == 0)
          1001      // d << 2 (since r[2] == 1)
         0000       // 0 << 3 (since r[3] == 0)
         0101101    // product (sum of above) = 45

[Figure: registers d, r and product; one step of multiplication]
What does it look like in Bluespec?
A module consists of:
    Rules: the fundamental means of expressing behavior in BSV.
    Module interface: consists of Methods and Subinterfaces, which describe micro-protocols for the interaction of the module with the outside world.

Multiplier in Bluespec

    module mkMult (I_mult);
       Reg#(Int#(32)) product <- mkReg(0);
       Reg#(Int#(32)) d       <- mkReg(0);
       Reg#(Int#(16)) r       <- mkReg(0);

       rule cycle (r != 0);
          if (r[0] == 1) product <= product + d;
          d <= d << 1;
          r <= r >> 1;
       endrule

       method Action start (Int#(16) x, Int#(16) y) if (r == 0);
          d <= signExtend(x); r <= y;
       endmethod

       method Int#(32) result () if (r == 0);
          return product;
       endmethod
    endmodule

What is the interface I_mult?
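The shift-and-add behavior of the cycle rule can be traced in a short Python model. This is a sketch, not BSV: the function name `multiply` is illustrative, it assumes a non-negative multiplier y and ignores the fixed register widths, and it clears product when the operands are loaded, which the start method on the slide does not show.

```python
def multiply(x, y):
    """Model start(x, y) followed by rule cycle firing until r == 0."""
    assert y >= 0, "assumption of this sketch: non-negative multiplier"
    d, r, product = x, y, 0      # product cleared here (sketch assumption)
    steps = []
    while r != 0:                # guard of rule cycle
        if r & 1:                # r[0] == 1
            product += d
        d <<= 1                  # d <= d << 1
        r >>= 1                  # r <= r >> 1
        steps.append((product, d, r))
    return product, steps


if __name__ == "__main__":
    value, steps = multiply(9, 5)
    for step in steps:
        print(step)
    print("product:", value)
```

Each entry in `steps` is the register contents (product, d, r) after one firing of the rule, so the number of entries equals the position of the highest set bit of the multiplier plus one.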