Combinational Circuits in Bluespec (L03). Arvind, Computer Science & Artificial Intelligence Lab, Massachusetts Institute of Technology. September 8, 2009. http://csg.csail.mit.edu/korea

1 Combinational Circuits in Bluespec
Arvind, Computer Science & Artificial Intelligence Lab, Massachusetts Institute of Technology
September 8, 2009, L03, http://csg.csail.mit.edu/korea

2 Bluespec: Two-Level Compilation
Bluespec (Objects, Types, Higher-order functions)
  Level 1 compilation (Lennart Augustsson @Sandburst, 2000-2002):
    - Type checking
    - Massive partial evaluation and static elaboration
Rules and Actions (Term Rewriting System); now we call this Guarded Atomic Actions
  Level 2 synthesis (James Hoe & Arvind @MIT, 1997-2000):
    - Rule conflict analysis
    - Rule scheduling
Object code (Verilog/C)

3 Static Elaboration
At compile time:
- Inline function calls and unroll loops
- Instantiate modules with specific parameters
- Resolve polymorphism/overloading, perform most data structure operations
[Figure: Software toolflow: source is compiled into an .exe, which is then run with parameters. Hardware toolflow: source is elaborated with parameters into separate designs (design1, design2, design3), each of which is then run.]

4 Combinational IFFT
All numbers are complex and represented as two sixteen-bit quantities. Fixed-point arithmetic is used to reduce area, power, ...
[Figure: inputs in0 … in63 pass through stages of 16 Bfly4 nodes, each stage followed by a Permute block, to produce outputs out0 … out63. Inset: a Bfly4 node with inputs t0 … t3, four multipliers, two layers of adders/subtractors and a *j rotation.]

5 4-way Butterfly Node
    function Vector#(4,Complex) bfly4 (Vector#(4,Complex) t, Vector#(4,Complex) k);
BSV has a very strong notion of types:
- Every expression has a type, either declared by the user or automatically deduced by the compiler
- The compiler verifies that the type declarations are compatible
[Figure: Bfly4 node with inputs t0 … t3 and k0 … k3, four multipliers, two layers of adders/subtractors and a *i rotation.]

6 BSV code: 4-way Butterfly
    function Vector#(4,Complex) bfly4 (Vector#(4,Complex) t, Vector#(4,Complex) k);
       Vector#(4,Complex) m, y, z;
       m[0] = k[0] * t[0];
       m[1] = k[1] * t[1];
       m[2] = k[2] * t[2];
       m[3] = k[3] * t[3];
       y[0] = m[0] + m[2];
       y[1] = m[0] - m[2];
       y[2] = m[1] + m[3];
       y[3] = i*(m[1] - m[3]);
       z[0] = y[0] + y[2];
       z[1] = y[1] + y[3];
       z[2] = y[0] - y[2];
       z[3] = y[1] - y[3];
       return(z);
    endfunction
Polymorphic code: works on any type of numbers for which *, + and - have been defined.
Note: Vector does not mean storage.
[Figure: the Bfly4 node with its layers labeled m, y and z.]
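The dataflow of bfly4 can be tried out in software. The following is a minimal Python sketch, not BSV: it assumes Python's built-in complex type as a stand-in for Complex and writes the slide's constant i as 1j. Like the BSV version, it relies only on *, + and - being defined for the element type.

```python
def bfly4(t, k):
    """Python model of the slide's 4-way butterfly (t and k each hold four numbers)."""
    # Layer m: four multipliers
    m = [k[n] * t[n] for n in range(4)]
    # Layer y: first add/subtract layer; the last term is rotated by i (1j in Python)
    y = [m[0] + m[2], m[0] - m[2], m[1] + m[3], 1j * (m[1] - m[3])]
    # Layer z: second add/subtract layer
    z = [y[0] + y[2], y[1] + y[3], y[0] - y[2], y[1] - y[3]]
    return z


ones = [1, 1, 1, 1]
print(bfly4(ones, ones))
print(bfly4(ones, [1, 2, 3, 4]))
```

The lists m, y and z are just names for intermediate values, which mirrors the slide's remark that Vector does not mean storage.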

7 Complex Arithmetic
Addition:
    z_R = x_R + y_R
    z_I = x_I + y_I
Multiplication:
    z_R = x_R * y_R - x_I * y_I
    z_I = x_R * y_I + x_I * y_R
The actual arithmetic for FFT is different because we use a non-standard fixed-point representation.
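The addition and multiplication rules can be checked on small integer pairs. This is a Python sketch under simplifying assumptions: the names Complex, cadd and cmul are illustrative, and Python's unbounded integers are used, so none of the fixed-point scaling or overflow behavior of the real design is modeled.

```python
from typing import NamedTuple


class Complex(NamedTuple):
    r: int  # real part
    i: int  # imaginary part


def cadd(x, y):
    # Component-wise addition
    return Complex(x.r + y.r, x.i + y.i)


def cmul(x, y):
    # (x.r + x.i*j) * (y.r + y.i*j), using j*j = -1
    return Complex(x.r * y.r - x.i * y.i, x.r * y.i + x.i * y.r)


x, y = Complex(1, 2), Complex(3, 4)
print(cadd(x, y))  # Complex(r=4, i=6)
print(cmul(x, y))  # Complex(r=-5, i=10)
```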

8 BSV code for Addition
    typedef struct{
       Int#(t) r;
       Int#(t) i;
    } Complex#(numeric type t) deriving (Eq,Bits);

    function Complex#(t) \+ (Complex#(t) x, Complex#(t) y);
       Int#(t) real = x.r + y.r;
       Int#(t) imag = x.i + y.i;
       return(Complex{r:real, i:imag});
    endfunction

9 Combinational IFFT
The stage_f function describes one stage; ifft repeats stage_f three times.
    function Vector#(64, Complex) stage_f (Bit#(2) stage, Vector#(64, Complex) stage_in);
    function Vector#(64, Complex) ifft (Vector#(64, Complex) in_data);
[Figure: inputs in0 … in63 pass through stages of 16 Bfly4 nodes, each stage followed by a Permute block, to produce outputs out0 … out63; one Bfly4-plus-Permute stage is marked as stage_f.]

10 BSV Code: Combinational IFFT
    function Vector#(64, Complex) ifft (Vector#(64, Complex) in_data);
       //Declare vectors
       Vector#(4,Vector#(64, Complex)) stage_data;
       stage_data[0] = in_data;
       for (Integer stage = 0; stage < 3; stage = stage + 1)
          // fromInteger converts the elaboration-time Integer to the Bit#(2) argument
          stage_data[stage+1] = stage_f(fromInteger(stage),stage_data[stage]);
       return(stage_data[3]);
    endfunction
The for loop is unfolded and stage_f is inlined during static elaboration.
Note: no notion of loops or procedures during execution.

11 BSV Code: Combinational IFFT - Unfolded
    function Vector#(64, Complex) ifft (Vector#(64, Complex) in_data);
       //Declare vectors
       Vector#(4,Vector#(64, Complex)) stage_data;
       stage_data[0] = in_data;
       // The loop
       //    for (Integer stage = 0; stage < 3; stage = stage + 1)
       //       stage_data[stage+1] = stage_f(fromInteger(stage),stage_data[stage]);
       // unfolds into:
       stage_data[1] = stage_f(0,stage_data[0]);
       stage_data[2] = stage_f(1,stage_data[1]);
       stage_data[3] = stage_f(2,stage_data[2]);
       return(stage_data[3]);
    endfunction
stage_f can be inlined now; it could also have been inlined before loop unfolding. Does the order matter?
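The equivalence between the loop and its unfolded form can be checked with a small Python model. The stage_f below is a hypothetical stand-in (it only tags each element with the stage number), chosen so that the order of stages is visible in the result. Python runs the loop at run time, whereas the BSV compiler removes it during static elaboration; the sketch only shows that both forms describe the same computation.

```python
def stage_f(stage, data):
    # Hypothetical stand-in for the real stage: appends the stage digit to each element
    return [x * 10 + stage for x in data]


def ifft_loop(in_data):
    # Mirrors the slide's loop over stage_data[0..3]
    stage_data = [None] * 4
    stage_data[0] = in_data
    for stage in range(3):
        stage_data[stage + 1] = stage_f(stage, stage_data[stage])
    return stage_data[3]


def ifft_unfolded(in_data):
    # The same computation with the loop written out by hand
    s1 = stage_f(0, in_data)
    s2 = stage_f(1, s1)
    s3 = stage_f(2, s2)
    return s3


data = [1, 2]
print(ifft_loop(data))      # [1012, 2012]
print(ifft_unfolded(data))  # [1012, 2012]
```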

12 Bluespec Code for stage_f
    function Vector#(64, Complex) stage_f (Bit#(2) stage, Vector#(64, Complex) stage_in);
       Vector#(64, Complex) stage_temp, stage_out;   // declarations not shown on the slide
       for (Integer i = 0; i < 16; i = i + 1)
       begin
          Integer idx = i * 4;
          let twid = getTwiddle(stage, fromInteger(i));
          let y = bfly4(twid, stage_in[idx:idx+3]);   // elements idx .. idx+3
          stage_temp[idx]   = y[0];
          stage_temp[idx+1] = y[1];
          stage_temp[idx+2] = y[2];
          stage_temp[idx+3] = y[3];
       end
       //Permutation
       for (Integer i = 0; i < 64; i = i + 1)
          stage_out[i] = stage_temp[permute[i]];
       return(stage_out);
    endfunction
The twiddle factors (twid) are mathematically derivable constants.
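The structure of stage_f (16 butterflies followed by a permutation) can be modeled in Python. This is a sketch under loud assumptions: get_twiddle returns unit twiddles and PERMUTE is a simple stride shuffle, both placeholders, since the slides do not give the real constants. Only the shape of the computation is meant to match.

```python
def bfly4(t, k):
    # Same dataflow as the 4-way butterfly slide; 1j plays the role of i
    m = [k[n] * t[n] for n in range(4)]
    y = [m[0] + m[2], m[0] - m[2], m[1] + m[3], 1j * (m[1] - m[3])]
    return [y[0] + y[2], y[1] + y[3], y[0] - y[2], y[1] - y[3]]


def get_twiddle(stage, i):
    # Placeholder (assumption): unit twiddles instead of the real stage-dependent constants
    return [1, 1, 1, 1]


# Placeholder permutation (assumption): a stride shuffle, not necessarily the lecture's
PERMUTE = [(i % 16) * 4 + i // 16 for i in range(64)]


def stage_f(stage, stage_in):
    stage_temp = [0] * 64
    for i in range(16):
        idx = i * 4
        # Python slices exclude the end index, so idx:idx+4 covers elements idx .. idx+3
        stage_temp[idx:idx + 4] = bfly4(get_twiddle(stage, i), stage_in[idx:idx + 4])
    # Permutation: output i reads from position PERMUTE[i] of the butterfly outputs
    return [stage_temp[PERMUTE[i]] for i in range(64)]


out = stage_f(0, [1] * 64)
print(out[:4], out[16:20])
```

With all-ones input and unit twiddles every butterfly produces a single non-zero output, so the effect of the permutation is easy to see: the 16 non-zero values are gathered at the front.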

13 Higher-order functions: Stage functions f1, f2 and f3
    function f1(x);
       return (stage_f(1,x));
    endfunction

    function f2(x);
       return (stage_f(2,x));
    endfunction

    function f3(x);
       return (stage_f(3,x));
    endfunction
What is the type of f1(x)?
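Fixing the first argument of stage_f is partial application. A Python sketch of the idea, using functools.partial and a hypothetical stand-in for stage_f, shows that each of f1, f2 and f3 is itself a function from a vector to a vector, so f1(x) is a vector (in BSV terms, given the signature of stage_f, a Vector#(64, Complex)).

```python
from functools import partial


def stage_f(stage, x):
    # Hypothetical stand-in for the real stage function
    return [v + stage for v in x]


# Each stage function is stage_f with its stage argument fixed
f1 = partial(stage_f, 1)
f2 = partial(stage_f, 2)
f3 = partial(stage_f, 3)

x = [0, 0]
print(f1(x))          # [1, 1]
print(f3(f2(f1(x))))  # [6, 6]
```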

14 Suppose we want to reuse some part of the circuit...
Reuse the same circuit three times to reduce area. But why?
[Figure: the combinational IFFT, with inputs in0 … in63, stages of 16 Bfly4 nodes each followed by Permute, and outputs out0 … out63.]

15 Architectural Exploration: Area-Performance tradeoff in 802.11a Transmitter

16 802.11a Transmitter Overview
[Figure: block diagram. Headers and data enter the chain Controller, Scrambler, Encoder, Interleaver, Mapper, IFFT, Cyclic Extend. Labels: "24 Uncoded bits" and "One OFDM symbol (64 Complex Numbers)".]
- Must produce one OFDM symbol every 4 μsec
- Depending upon the transmission rate, consumes 1, 2 or 4 tokens to produce one OFDM symbol
- IFFT transforms 64 (frequency domain) complex numbers into 64 (time domain) complex numbers and accounts for 85% of the area

17 Preliminary results [MEMOCODE 2006] Dave, Gerding, Pellauer, Arvind
    Design Block     Lines of Code (BSV)   Relative Area
    Controller        49                    0%
    Scrambler         40                    0%
    Conv. Encoder    113                    0%
    Interleaver       76                    1%
    Mapper           112                   11%
    IFFT              95                   85%
    Cyc. Extender     23                    3%
Complex arithmetic libraries constitute another 200 lines of code.

18 Combinational IFFT
Reuse the same circuit three times to reduce area.
[Figure: the combinational IFFT, with inputs in0 … in63, stages of 16 Bfly4 nodes each followed by Permute, and outputs out0 … out63.]

19 Design Alternatives
Reusing a block over multiple cycles, we expect:
- Throughput to decrease: less parallelism
- Area to decrease: reusing a block
The clock needs to run faster for the same throughput ⇒ hyper-linear increase in energy.
[Figure: design alternatives built from blocks f and g.]

20 Circular pipeline: Reusing the Pipeline Stage
[Figure: inputs in0 … in63 feed a single stage of Bfly4 nodes followed by Permute, reused under the control of a Stage Counter; outputs out0 … out63.]
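The behavior of the circular pipeline can be sketched as a cycle-by-cycle Python model. Everything here is illustrative: FoldedIFFT, start and tick are hypothetical names, and stage_f is a stand-in that only tags data with the stage number. The model has one data register and one stage counter, applies a single stage per clock tick, and delivers the result on the third tick.

```python
def stage_f(stage, data):
    # Hypothetical stand-in for one stage (Bfly4 nodes followed by Permute)
    return [x * 10 + stage for x in data]


class FoldedIFFT:
    def __init__(self):
        self.data = None  # models the data register
        self.stage = 0    # models the stage counter

    def start(self, in_data):
        self.data = list(in_data)
        self.stage = 0

    def tick(self):
        """One clock cycle: returns the result on the final stage, otherwise None."""
        self.data = stage_f(self.stage, self.data)
        if self.stage == 2:
            self.stage = 0
            return self.data
        self.stage += 1
        return None


def combinational(in_data):
    # All three stages in a single step, as in the combinational design
    return stage_f(2, stage_f(1, stage_f(0, in_data)))


unit = FoldedIFFT()
unit.start([1, 2])
results = [unit.tick() for _ in range(3)]
print(results)                # [None, None, [1012, 2012]]
print(combinational([1, 2]))  # [1012, 2012]
```

The folded model produces the same values as the combinational one but needs three ticks per input, which is the throughput cost of reusing the stage.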

21 Superfolded circular pipeline: Just one Bfly-4 node!
[Figure: inputs in0 … in63 and outputs out0 … out63 around a single Bfly4 node and a Permute block. Labels: 64 2-way muxes; 4 16-way muxes; 4 16-way demuxes; Index: 0 to 15, with the test "Index == 15?"; Stage: 0 to 2.]
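A rough estimate of the clock rate each design needs can be worked out from the figures in the slides. This Python sketch takes the symbol period as 4 microseconds (the 802.11a OFDM symbol time) and assumes one stage per cycle for the folded design and one Bfly4 per cycle for the superfolded design, with no extra cycles for input, output or control. Those cycle counts are assumptions for illustration, not measured results.

```python
SYMBOL_PERIOD_NS = 4_000  # one OFDM symbol every 4 microseconds

# Cycles needed per symbol (assumption: no overhead cycles)
CYCLES = {
    "combinational": 1,     # all three stages in one cycle
    "folded": 3,            # one stage per cycle, stages 0 to 2
    "superfolded": 3 * 16,  # one Bfly4 per cycle, index 0 to 15 in each stage
}


def min_clock_hz(cycles_per_symbol):
    # Lowest clock frequency that still finishes one symbol per period
    return cycles_per_symbol * 1_000_000_000 // SYMBOL_PERIOD_NS


for name, cycles in CYCLES.items():
    print(f"{name}: {cycles} cycles/symbol, at least {min_clock_hz(cycles) / 1e6} MHz")
```

Under these assumptions the superfolded design must be clocked 48 times faster than the combinational one to sustain the same symbol rate.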

22 Pipelining a block
[Figure: three designs between queues inQ and outQ. Combinational C: f1, f2 and f3 in one block. Pipeline P: f1, f2 and f3 as separate pipeline stages. Folded Pipeline FP: a single block f that is reused.]
Clock? Area? Throughput?
- Clock: C < P ≈ FP
- Area: FP < C < P
- Throughput: FP < C < P

23 Syntax: Vector of Registers
Register: suppose x and y are both of type Reg. Then
    x <= y means x._write(y._read())
Vector of Int:
    x[i] means sel(x,i)
    x[i] = y[j] means x = update(x,i, sel(y,j))
Vector of Registers:
    x[i] <= y[j] does not work. The parser thinks it means (sel(x,i)._read)._write(sel(y,j)._read), which will not type check.
    (x[i]) <= y[j] parses as sel(x,i)._write(sel(y,j)._read), and works correctly.
Don't ask me why.
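The difference between updating a vector value and writing a register inside a vector can be modeled in Python. In this sketch sel, update and Reg are illustrative Python stand-ins named after the slide's terms, not the BSV library itself, and the register write takes effect immediately, which simplifies away BSV's rule and clock semantics.

```python
def sel(vec, i):
    # Select element i of a vector
    return vec[i]


def update(vec, i, value):
    """Return a new vector with element i replaced; the input is left untouched."""
    return vec[:i] + [value] + vec[i + 1:]


class Reg:
    """Minimal register model with the _read/_write methods named on the slide."""

    def __init__(self, value):
        self._value = value

    def _read(self):
        return self._value

    def _write(self, value):
        self._value = value


# Vector of Int: x[i] = y[j] means x = update(x, i, sel(y, j))
x, y = [1, 2, 3], [7, 8, 9]
old_x = x
x = update(x, 0, sel(y, 2))
print(old_x, x)  # [1, 2, 3] [9, 2, 3]

# Vector of Reg: (x[i]) <= y[j] means sel(x, i)._write(sel(y, j)._read())
xr = [Reg(0), Reg(0)]
yr = [Reg(5), Reg(6)]
sel(xr, 1)._write(sel(yr, 0)._read())
print([r._read() for r in xr])  # [0, 5]
```

In the first case the name x is rebound to a new vector value; in the second the vector itself is unchanged and only the state inside one of its registers is written.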

24 Static vs dynamic expressions
- Expressions that can be evaluated at compile time will be evaluated at compile time, e.g. 3+4 → 7
- Some expressions do not have run-time representations and must be evaluated away at compile time; an error will occur if the compile-time evaluation does not succeed. Examples: Integers, reals, loops, lists, functions, …
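Compile-time evaluation of static expressions can be illustrated with a toy constant folder in Python. This is only a sketch of the general idea, not how the Bluespec compiler is implemented: expressions are nested tuples, strings stand for values known only at run time, and only + and * are supported.

```python
def fold(expr):
    """Evaluate constant subexpressions; keep anything that involves a variable."""
    if isinstance(expr, tuple):
        op, lhs, rhs = expr
        lhs, rhs = fold(lhs), fold(rhs)
        if isinstance(lhs, int) and isinstance(rhs, int):
            # Both operands are known now, so compute the result immediately
            return lhs + rhs if op == "+" else lhs * rhs
        # At least one operand is only known at run time: keep the operation
        return (op, lhs, rhs)
    return expr


print(fold(("+", 3, 4)))              # 7
print(fold(("+", "x", ("*", 2, 5))))  # ('+', 'x', 10)
```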

