Bluespec-1: Design Affects Everything

Slides:



Advertisements
Similar presentations
NetFPGA Project: 4-Port Layer 2/3 Switch Ankur Singla Gene Juknevicius
Advertisements

March 2007http://csg.csail.mit.edu/arvindSemantics-1 Scheduling Primitives for Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts.
Behavioral Design Outline –Design Specification –Behavioral Design –Behavioral Specification –Hardware Description Languages –Behavioral Simulation –Behavioral.
Constructive Computer Architecture Tutorial 4: SMIPS on FPGA Andy Wright 6.S195 TA October 7, 2013http://csg.csail.mit.edu/6.s195T04-1.
Stmt FSM Richard S. Uhler Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology (based on a lecture prepared by Arvind)
February 21, 2007http://csg.csail.mit.edu/6.375/L07-1 Bluespec-4: Architectural exploration using IP lookup Arvind Computer Science & Artificial Intelligence.
March, 2007http://csg.csail.mit.edu/arvindIPlookup-1 IP Lookup Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
September 24, L08-1 IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab.
IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 22, 2011L06-1.
IP Lookup: Some subtle concurrency issues Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 4, 2013
December 10, 2009 L29-1 The Semantics of Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute.
Computer Architecture: A Constructive Approach Sequential Circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
February 22, 2005http://csg.csail.mit.edu/6.884/L07-1 Bluespec-1: Design Affects Everything Arvind Computer Science & Artificial Intelligence Lab Massachusetts.
Lecture 9. MIPS Processor Design – Instruction Fetch Prof. Taeweon Suh Computer Science Education Korea University 2010 R&E Computer System Education &
Copyright © Bluespec Inc Bluespec SystemVerilog™ Design Example A DMA Controller with a Socket Interface.
February 14, 2007L04-1http://csg.csail.mit.edu/6.375/ Bluespec-1: Design methods to facilitate rapid growth of SoCs Arvind Computer Science & Artificial.
September 3, 2009L02-1http://csg.csail.mit.edu/korea Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial.
Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
March, 2007Intro-1http://csg.csail.mit.edu/arvind Design methods to facilitate rapid growth of SoCs Arvind Computer Science & Artificial Intelligence Lab.
March 6, 2006http://csg.csail.mit.edu/6.375/L10-1 Bluespec-4: Rule Scheduling and Synthesis Arvind Computer Science & Artificial Intelligence Lab Massachusetts.
Constructive Computer Architecture: Guards Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology September 24, 2014.
September 22, 2009http://csg.csail.mit.edu/koreaL07-1 Asynchronous Pipelines: Concurrency Issues Arvind Computer Science & Artificial Intelligence Lab.
Constructive Computer Architecture Sequential Circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology
Constructive Computer Architecture Sequential Circuits Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology September.
Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Constructive Computer Architecture Sequential Circuits - 2 Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Modular Refinement Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology March 8,
Introduction to Bluespec: A new methodology for designing Hardware Arvind Computer Science & Artificial Intelligence Lab. Massachusetts Institute of Technology.
Multiple Clock Domains (MCD) Arvind with Nirav Dave Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology.
Problem: design complexity advances in a pace that far exceeds the pace in which verification technology advances. More accurately: (verification complexity)
October 20, 2009L14-1http://csg.csail.mit.edu/korea Concurrency and Modularity Issues in Processor pipelines Arvind Computer Science & Artificial Intelligence.
October 6, 2009http://csg.csail.mit.edu/koreaL10-1 IP Lookup-2: The Completion Buffer Arvind Computer Science & Artificial Intelligence Lab Massachusetts.
February 28, 2005http://csg.csail.mit.edu/6.884/L09-1 Bluespec-3: Modules & Interfaces Arvind Computer Science & Artificial Intelligence Lab Massachusetts.
Introduction to Bluespec: A new methodology for designing Hardware
Introduction to Bluespec: A new methodology for designing Hardware
ASIC Design Methodology
Bluespec-6: Modeling Processors
Digital System Verification
Folded “Combinational” circuits
Scheduling Constraints on Interface methods
Sequential Circuits - 2 Constructive Computer Architecture Arvind
Sequential Circuits Constructive Computer Architecture Arvind
Sequential Circuits: Constructive Computer Architecture
Introduction to Bluespec: A new methodology for designing Hardware
Processor (I).
IP Lookup: Some subtle concurrency issues
Stmt FSM Arvind (with the help of Nirav Dave)
Performance Specifications
Multirule Systems and Concurrent Execution of Rules
Constructive Computer Architecture: Guards
Sequential Circuits Constructive Computer Architecture Arvind
Modular Refinement Arvind
Modular Refinement Arvind
Bluespec-4: Architectural exploration using IP lookup Arvind
Lecture 1.3 Hardware Description Languages (HDLs)
Modules with Guarded Interfaces
Sequential Circuits - 2 Constructive Computer Architecture Arvind
Introduction to Bluespec: A new methodology for designing Hardware
Stmt FSM Arvind (with the help of Nirav Dave)
IP Lookup Arvind Computer Science & Artificial Intelligence Lab
IP Lookup: Some subtle concurrency issues
Constructive Computer Architecture: Guards
GCD: A simple example to introduce Bluespec
Win with HDL Slide 4 System Level Design
Design methods to facilitate rapid growth of SoCs
Introduction to Bluespec: A new methodology for designing Hardware
IP Lookup: Some subtle concurrency issues
Bluespec-5: Scheduling & Rule Composition
Modular Refinement Arvind
Bluespec SystemVerilog™
Presentation transcript:

Bluespec-1: Design Affects Everything Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology February 24, 2006 http://csg.csail.mit.edu/6.375/

Chip costs are exploding because of design complexity SoC failures costing time/spins Source: IBM/IBS, Inc. Architecture Verification Physical Validation Prototype Source: Aart de Geus, CEO of Synopsys Based on a survey of 2000 users by Synopsys While it’s getting harder to get “first time right”, design and NRE costs are rapidly increasing. Respins are getting prohibitively expensive to afford. In particular, the costs to architect/design and verify ICs is the largest cost – and the largest target to attack. Design and verification dominate escalating project costs February 24, 2006 http://csg.csail.mit.edu/6.375/

Common quotes “Design is not a problem; design is easy” “Verification is a problem” “Timing closure is a problem” “Physical design is a problem” Almost complete reliance on post-design verification for quality Mind set February 24, 2006 http://csg.csail.mit.edu/6.375/

Through the early 1980s: and U.S. quality was… The U.S. auto industry Sought quality solely through post-build inspection Planned for defects and rework and U.S. quality was… Defect Make Inspect Rework February 24, 2006 http://csg.csail.mit.edu/6.375/

… less than world class Adding quality inspectors (“verification engineers”) and giving them better tools, was not the solution The Japanese auto industry showed the way “Zero defect” manufacturing February 24, 2006 http://csg.csail.mit.edu/6.375/

New mind set: Design affects everything! A good design methodology Can keep up with changing specs Permits architectural exploration Facilitates verification and debugging Eases changes for timing closure Eases changes for physical design Promotes reuse  It is essential to Design for Correctness February 24, 2006 http://csg.csail.mit.edu/6.375/

New semantics for expressing behavior to reduce design complexity Decentralize complexity: Rule-based specifications (Guarded Atomic Actions) Let us think about one rule at a time Formalize composition: Modules with guarded interfaces Automatically manage and ensure the correctness of connectivity, i.e., correct-by-construction methodology Retain resilience to changes in design or layout, e.g. compute latency ’s Promote regularity of layout at macro level Bluespec February 24, 2006 http://csg.csail.mit.edu/6.375/

RTL has poor semantics for composition data_in push_req_n pop_req_n clk rstn data_out full empty Example: Commercially available FIFO IP block No machine verification of such informal constraints is feasible These constraints are spread over many pages of the documentation... February 24, 2006 http://csg.csail.mit.edu/6.375/

Bluespec promotes composition through guarded interfaces Self-documenting interfaces; Automatic generation of logic to eliminate conflicts in use. theModuleA theFifo.enq(value1); theFifo.deq(); value2 = theFifo.first(); theFifo.enq(value3); value4 = theFifo.first(); Enqueue arbitration control theFifo Dequeue arbitration control n rdy enab enq not full not empty theModuleB deq FIFO n first February 24, 2006 http://csg.csail.mit.edu/6.375/

In Bluespec SystemVerilog (BSV) … Power to express complex static structures and constraints Checked by the compiler “Micro-protocols” are managed by the compiler The compiler generates the necessary hardware (muxing and control) Micro-protocols need less or no verification Easier to make changes while preserving correctness Smaller, simpler, clearer, more correct code February 24, 2006 http://csg.csail.mit.edu/6.375/

Bluespec: State and Rules organized into modules interface module All state (e.g., Registers, FIFOs, RAMs, ...) is explicit. Behavior is expressed in terms of atomic actions on the state: Rule: condition  action Rules can manipulate state in other modules only via their interfaces. February 24, 2006 http://csg.csail.mit.edu/6.375/

Examples GCD Multiplication IP Lookup February 24, 2006 http://csg.csail.mit.edu/6.375/

Programming with rules: A simple example Euclid’s algorithm for computing the Greatest Common Divisor (GCD): 15 6 9 6 subtract February 24, 2006 http://csg.csail.mit.edu/6.375/

GCD in BSV module mkGCD (I_GCD); Reg#(int) x <- mkRegU; y swap sub module mkGCD (I_GCD); Reg#(int) x <- mkRegU; Reg#(int) y <- mkReg(0); rule swap ((x > y) && (y != 0)); x <= y; y <= x; endrule rule subtract ((x <= y) && (y != 0)); y <= y – x; method Action start(int a, int b) if (y==0); x <= a; y <= b; endmethod method int result() if (y==0); return x; endmodule Assumes x /= 0 and y /= 0 February 24, 2006 http://csg.csail.mit.edu/6.375/

GCD Hardware Module implicit conditions interface I_GCD; rdy enab int start result module GCD y == 0 implicit conditions interface I_GCD; method Action start (int a, int b); method int result(); endinterface The module can easily be made polymorphic Many different implementations can provide the same interface: module mkGCD (I_GCD) February 24, 2006 http://csg.csail.mit.edu/6.375/

GCD: Another implementation module mkGCD (I_GCD); Reg#(int) x <- mkRegU; Reg#(int) y <- mkReg(0); rule swapANDsub ((x > y) && (y != 0)); x <= y; y <= x - y; endrule rule subtract ((x<=y) && (y!=0)); y <= y – x; method Action start(int a, int b) if (y==0); x <= a; y <= b; endmethod method int result() if (y==0); return x; endmodule Combine swap and subtract rule Does it compute faster ? February 24, 2006 http://csg.csail.mit.edu/6.375/

Bluespec SystemVerilog source Bluespec Tool flow Blueview Bluespec SystemVerilog source Bluespec Compiler C Bluespec C sim Cycle Accurate Verilog 95 RTL RTL synthesis gates Verilog sim VCD output files Bluespec tools 3rd party tools Legend Debussy Visualization Place & Route Physical February 24, 2006 http://csg.csail.mit.edu/6.375/ Tapeout

Generated Verilog RTL: GCD module mkGCD(CLK,RST_N,start_a,start_b,EN_start,RDY_start, result,RDY_result); input CLK; input RST_N; // action method start input [31 : 0] start_a; input [31 : 0] start_b; input EN_start; output RDY_start; // value method result output [31 : 0] result; output RDY_result; // register x and y reg [31 : 0] x; wire [31 : 0] x$D_IN; wire x$EN; reg [31 : 0] y; wire [31 : 0] y$D_IN; wire y$EN; ... // rule RL_subtract assign WILL_FIRE_RL_subtract = x_SLE_y___d3 && !y_EQ_0___d10 ; // rule RL_swap assign WILL_FIRE_RL_swap = !x_SLE_y___d3 && !y_EQ_0___d10 ; February 24, 2006 http://csg.csail.mit.edu/6.375/

Generated Hardware swap? subtract? February 24, 2006 x_en y_en x y > !(=0) swap? subtract? February 24, 2006 http://csg.csail.mit.edu/6.375/

Generated Hardware Module x y en rdy start result start_en start_en x_en y_en x y > !(=0) swap? subtract? sub February 24, 2006 http://csg.csail.mit.edu/6.375/

GCD: A Simple Test Bench module mkTest (); Reg#(int) state <- mkReg(0); I_GCD gcd <- mkGCD(); rule go (state == 0); gcd.start (423, 142); state <= 1; endrule rule finish (state == 1); $display (“GCD of 423 & 142 =%d”,gcd.result()); state <= 2; endmodule Why do we need the state variable? February 24, 2006 http://csg.csail.mit.edu/6.375/

GCD: Test Bench module mkTest (); Reg#(int) state <- mkReg(0); Reg#(Int#(4)) c1 <- mkReg(1); Reg#(Int#(7)) c2 <- mkReg(1); I_GCD gcd <- mkGCD(); rule req (state==0); gcd.start(signExtend(c1), signExtend(c2)); state <= 1; endrule rule resp (state==1); $display (“GCD of %d & %d =%d”, c1, c2, gcd.result()); if (c1==7) begin c1 <= 1; c2 <= c2+1; state <= 0; end else c1 <= c1+1; if (c2 == 63) state <= 2; endmodule February 24, 2006 http://csg.csail.mit.edu/6.375/

GCD: Synthesis results Original (16 bits) Clock Period: 1.6 ns Area: 4240.10 mm2 Unrolled (16 bits) Clock Period: 1.65ns Area: 5944.29 mm2 Unrolled takes 31% fewer cycles on testbench February 24, 2006 http://csg.csail.mit.edu/6.375/

One step of multiplication Multiplier Example Simple binary multiplication: 1001 0101 0000 0101101 // d = 4’d9 // r = 4’d5 // d << 0 (since r[0] == 1) // 0 << 1 (since r[1] == 0) // d << 2 (since r[2] == 1) // 0 << 3 (since r[3] == 0) // product (sum of above) = 45 x What does it look like in Bluespec? A module consists of: Rules – Are the fundumental means of expressing behavior in BSV Module interface – consists of Methods and Subinterfaces which describe micro protocols for interaction of the mentioned module with the outside world. d r product One step of multiplication February 24, 2006 http://csg.csail.mit.edu/6.375/

Multiplier in Bluespec module mkMult (I_mult); Reg#(Int#(32)) product <- mkReg(0); Reg#(Int#(32)) d <- mkReg(0); Reg#(Int#(16)) r <- mkReg(0); rule cycle endrule method Action start endmethod method Int#(32) result () endmodule What is the interface I_mult ? February 24, 2006 http://csg.csail.mit.edu/6.375/

Exploring microarchitectures IP Lookup Module February 24, 2006 http://csg.csail.mit.edu/6.375/

IP Lookup block in a router Queue Manager Packet Processor Exit functions Control Processor Line Card (LC) IP Lookup SRAM (lookup table) Arbitration Switch LC The design example you’ll be seeing is from an IP switch. We’ll be looking at the IP lookup function that you typically find on the ingress port of a line card. A packet is routed based on the “Longest Prefix Match” (LPM) of it’s IP address with entries in a routing table Line rate and the order of arrival must be maintained line rate  15Mpps for 10GE February 24, 2006 http://csg.csail.mit.edu/6.375/

Sparse tree representation F … A … 7 A … B F … F * E 5.*.*.* D 10.18.200.5 C 10.18.200.* B 7.14.7.3 A 7.14.*.* 14 3 5 E F … F 7 F … 10 F … F … 18 255 F … F … IP address Result M Ref 7.13.7.3 F 10.18.201.5 7.14.7.2 5.13.7.2 E 10.18.200.7 C C … 5 D 200 2 F … 3 4 A Real-world lookup algorithms are more complex but all make a sequence of dependent memory references. 1 4 February 24, 2006 http://csg.csail.mit.edu/6.375/

SW (“C”) version of LPM How to implement LPM in HW? int lpm (IPA ipa) /* 3 memory lookups */ { int p; p = RAM [ipa[31:16]]; /* Level 1: 16 bits */ if (isLeaf(p)) return p; p = RAM [p + ipa [15:8]]; /* Level 2: 8 bits */ p = RAM [p + ipa [7:0]]; /* Level 3: 8 bits */ return p; /* must be a leaf */ } Here’s a longest prefix match algorithm in software. It’s pretty simple, up to three lookups of different parts of the IP address. (Click now) Can you visualize how to implement this in HW? It’s not obvious. How to implement LPM in HW? Not obvious from C code! February 24, 2006 http://csg.csail.mit.edu/6.375/

Longest Prefix Match for IP lookup: 3 possible implementation architectures Rigid pipeline Inefficient memory usage but simple design Linear pipeline Efficient memory usage through memory port replicator Circular pipeline Efficient memory with most complex control So, let’s look at three different approaches: Rigid pipeline Linear pipeline Circular pipeline They all appear at a high-level to have different advantages, but… 1 2 3 Designer’s Ranking: Which is “best”? February 24, 2006 http://csg.csail.mit.edu/6.375/

Static Pipeline RAM IP addr MUX req resp February 24, 2006 http://csg.csail.mit.edu/6.375/

Static code rule static (True); if (canInsert(c5)) begin c1 <= 0; r1 <= in.first(); in.deq(); end else r1 <= r5; c1 <= c5; if (notEmpty(r1)) makeMemReq(r1); r2 <= r1; c2 <= c1; r3 <= r2; c3 <= c2; r4 <= r3; c4 <= c3; r5 <= getMemResp(); c5 <= (c4 == n-1) ? 0 : n; if (c5 == n) out.enq(r5); endrule February 24, 2006 http://csg.csail.mit.edu/6.375/

Circular pipeline cbuf RAM enter? done? yes getToken in active no luResp luReq February 24, 2006 http://csg.csail.mit.edu/6.375/

Circular Pipeline code rule enter (True); t <- cbuf.newToken(); IP ip = in.first(); ram.req(ip[31:16]); active.enq(tuple2(ip[15:0], t)); in.deq(); endrule rule done (True); p <- ram.resp(); match {.rip, .t} = active.first(); if (isLeaf(p)) cbuf.complete(t, p); else begin match {.newreq, .newrip} = remainder(p, rip); active.enq(rip << 8, t); ram.req(p+signExtend(rip[15:7])); end active.deq(); February 24, 2006 http://csg.csail.mit.edu/6.375/

Synthesis results LPM versions Code size (lines) Best Area (gates) Best Speed (ns) Mem. util. (random workload) Static V 220 2271 3.56 63.5% Static BSV 179 2391 (5% larger) 3.32 (7% faster) Linear V 410 14759 4.7 99.9% Linear BSV 168 15910 (8% larger) 4.7 (same) Circular V 364 8103 3.62 Circular BSV 257 8170 (1% larger) 3.67 (2% slower) (Just read it….) V = Verilog BSV = Bluespec System Verilog Synthesized to TSMC 0.18 µm library Bluespec and Verilog synthesis results are nearly identical February 24, 2006 http://csg.csail.mit.edu/6.375/ Arvind, Nikhil, Rosenband & Dave ICCAD 2004

Next Time Combinational Circuits and Types February 24, 2006 http://csg.csail.mit.edu/6.375/