Architectural Exploration: Area-Performance tradeoff in 802.11a Transmitter. Arvind, Computer Science. March, 2007, http://csg.csail.mit.edu/arvind

1 Architectural Exploration: Area-Performance tradeoff in 802.11a Transmitter
Arvind, Computer Science & Artificial Intelligence Lab, Massachusetts Institute of Technology
March, 2007, http://csg.csail.mit.edu/arvind

2 802.11a Transmitter Overview
Pipeline: Controller (headers, data) -> Scrambler -> Encoder -> Interleaver -> Mapper -> IFFT -> Cyclic Extend
Datapath labels: 24 uncoded bits; one OFDM symbol (64 complex numbers).
IFFT: transforms 64 (frequency domain) complex numbers into 64 (time domain) complex numbers; accounts for 85% of the area.
Must produce one OFDM symbol every 4 µsec.
Depending upon the transmission rate, 1, 2 or 4 tokens are consumed to produce one OFDM symbol.

3 Preliminary results [MEMOCODE 2006] Dave, Gerding, Pellauer, Arvind
Design Block      Lines of Code (BSV)   Relative Area
Controller        49                    0%
Scrambler         40                    0%
Conv. Encoder     113                   0%
Interleaver       76                    1%
Mapper                                  %
IFFT              95                    85%
Cyc. Extender     23                    3%
Complex arithmetic libraries constitute another 200 lines of code.

4 Combinational IFFT
[Diagram: inputs in0 … in63 pass through stages of 16 Bfly4 blocks and Permute blocks to outputs out0 … out63]
Reuse the same circuit three times to reduce area.
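As a behavioral cross-check of the structure on this slide, here is a minimal Python sketch (not the BSV from the talk) of a 64-point IFFT built only from 4-point butterflies. The names `bfly4`, `ifft_radix4` and `ifft_naive`, the recursive decimation-in-time ordering, and the 1/N scaling are assumptions made for illustration; the hardware on the slide uses explicit Permute blocks rather than recursion. The returned count shows three levels of 16 butterflies each, 48 in total.

```python
import cmath

def bfly4(t0, t1, t2, t3):
    """4-point inverse-DFT butterfly on inputs that are already twiddled."""
    j = 1j
    return (t0 + t1 + t2 + t3,
            t0 + j * t1 - t2 - j * t3,
            t0 - t1 + t2 - t3,
            t0 - j * t1 - t2 + j * t3)

def _ifft_unscaled(x, stats):
    """Radix-4 decimation-in-time inverse DFT without the 1/N factor."""
    n = len(x)
    if n == 1:
        return list(x)
    q = n // 4
    # Four sub-transforms over the samples x[r], x[r+4], x[r+8], ...
    subs = [_ifft_unscaled(x[r::4], stats) for r in range(4)]
    out = [0j] * n
    for k in range(q):
        # Twiddle factors W^(r*k) with W = exp(+2*pi*i/n) for the inverse transform.
        tw = [cmath.exp(2j * cmath.pi * r * k / n) for r in range(4)]
        y = bfly4(*(tw[r] * subs[r][k] for r in range(4)))
        stats["bfly4"] += 1
        for m in range(4):
            out[k + m * q] = y[m]
    return out

def ifft_radix4(x):
    """Return (time-domain samples, number of Bfly4 evaluations). len(x) must be a power of 4."""
    stats = {"bfly4": 0}
    n = len(x)
    y = [v / n for v in _ifft_unscaled(x, stats)]
    return y, stats["bfly4"]

def ifft_naive(x):
    """Direct O(n^2) inverse DFT, used only as a reference."""
    n = len(x)
    return [sum(x[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]
```

Calling `ifft_radix4` on a 64-element list performs 48 butterfly evaluations, which is the work that the folded designs later in the deck spread over more clock cycles.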

5 Design Alternatives
[Diagram: two blocks f and g in sequence versus one block reused for both f and g]
Reuse a block over multiple cycles. We expect:
Throughput to decrease (less parallelism).
Area to decrease (reusing a block).
The clock needs to run faster for the same throughput, which means a hyper-linear increase in energy.

6 Circular pipeline: Reusing the Pipeline Stage
[Diagram: in0 … in63 enter a single stage of Bfly4 blocks and a Permute block; a Stage Counter controls the reuse of that stage before the results leave as out0 … out63]

7 Superfolded circular pipeline: Just one Bfly-4 node!
[Diagram: in0 … in63 and out0 … out63 around a single Bfly4 and a Permute block; Index runs 0 to 15 with an "Index == 15?" check; Stage runs 0 to 2]
Steering logic: 64, 2-way Muxes; 4, 16-way Muxes; 4, 16-way DeMuxes.

8 Pipelining a block
[Diagram: three ways to place f1, f2, f3 between inQ and outQ: Combinational (C), Pipeline (P), and Folded Pipeline (FP) with a single block f]
Clock? Area? Throughput?
Clock: C < P ≈ FP
Area: FP < C < P
Throughput: FP < C < P

9 Synchronous pipeline
[Diagram: x -> inQ -> f1 -> sReg1 -> f2 -> sReg2 -> f3 -> outQ]
   rule sync_pipeline (True);
      inQ.deq();
      sReg1 <= f1(inQ.first());
      sReg2 <= f2(sReg1);
      outQ.enq(f3(sReg2));
   endrule
This is real IFFT code; just replace f1, f2 and f3 with stage_f code.
This rule can fire only if
   - inQ has an element
   - outQ has space
Atomicity: Either all or none of the state elements inQ, outQ, sReg1 and sReg2 will be updated.
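The firing condition and the all-or-nothing update can be mimicked in a few lines of Python. This is a behavioral sketch only, not Bluespec semantics in full: the class name `SyncPipeline`, the `out_capacity` parameter, and the initial register value `init` are assumptions introduced here.

```python
from collections import deque

class SyncPipeline:
    """Behavioral model of the three-stage rule: fires atomically or not at all."""

    def __init__(self, f1, f2, f3, out_capacity=2, init=0):
        self.f1, self.f2, self.f3 = f1, f2, f3
        self.inQ = deque()
        self.outQ = deque()
        self.out_capacity = out_capacity  # hypothetical bound on outQ
        self.sReg1 = init                 # registers start with an arbitrary value
        self.sReg2 = init

    def can_fire(self):
        # The rule is enabled only if inQ has an element and outQ has space.
        return len(self.inQ) > 0 and len(self.outQ) < self.out_capacity

    def step(self):
        """Try to fire the rule once; return True if it fired."""
        if not self.can_fire():
            return False  # nothing changes: no partial update
        # Compute every new value from the OLD state first ...
        new1 = self.f1(self.inQ[0])
        new2 = self.f2(self.sReg1)
        out = self.f3(self.sReg2)
        # ... then commit all updates together.
        self.inQ.popleft()
        self.sReg1, self.sReg2 = new1, new2
        self.outQ.append(out)
        return True
```

With this model, once `inQ` runs empty the rule stops firing and the last two values stay in `sReg1` and `sReg2`; the first two outputs are also derived from the arbitrary initial register contents.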

10 Stage functions f1, f2 and f3
   function f1(x); return (stage_f(1,x)); endfunction
   function f2(x); return (stage_f(2,x)); endfunction
   function f3(x); return (stage_f(3,x)); endfunction
The stage_f function was given earlier.

11 Problem: What about pipeline bubbles?
[Diagram: x -> inQ -> f1 -> sReg1 -> f2 -> sReg2 -> f3 -> outQ, with a red and a green token sitting in the registers]
   rule sync_pipeline (True);
      inQ.deq();
      sReg1 <= f1(inQ.first());
      sReg2 <= f2(sReg1);
      outQ.enq(f3(sReg2));
   endrule
Red and Green tokens must move even if there is nothing in the inQ!
Also, if there is no token in sReg2 then nothing should be enqueued in the outQ.
Modify the rule to deal with these conditions: use valid bits or the Maybe type.

12 The Maybe type data in the pipeline
   typedef union tagged {
      void Invalid;
      data_T Valid;
   } Maybe#(type data_T);
Registers contain Maybe type values (data plus a valid/invalid tag).
   rule sync_pipeline (True);
      if (inQ.notEmpty()) begin
         sReg1 <= tagged Valid (f1(inQ.first()));
         inQ.deq();
      end
      else
         sReg1 <= tagged Invalid;
      case (sReg1) matches
         tagged Valid .sx1: sReg2 <= tagged Valid (f2(sx1));
         tagged Invalid:    sReg2 <= tagged Invalid;
      endcase
      case (sReg2) matches
         tagged Valid .sx2: outQ.enq(f3(sx2));
         tagged Invalid:    noAction;
      endcase
   endrule
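A small Python model makes the effect of the valid tags visible: the pipeline keeps advancing when `inQ` is empty and drains the values already in flight. This is a sketch under assumptions: `None` stands in for `Invalid` (so `None` cannot be a data value), `outQ` is treated as unbounded, and the class name `MaybePipeline` is invented for this example.

```python
from collections import deque

class MaybePipeline:
    """Three-stage pipeline whose registers hold either a value or None (Invalid)."""

    def __init__(self, f1, f2, f3):
        self.f1, self.f2, self.f3 = f1, f2, f3
        self.inQ = deque()
        self.outQ = deque()   # assumed unbounded in this sketch
        self.sReg1 = None     # None models the Invalid tag
        self.sReg2 = None

    def step(self):
        """One clock cycle: every stage moves, inserting a bubble when inQ is empty."""
        old1, old2 = self.sReg1, self.sReg2
        # Stage 1: take an input if there is one, otherwise insert a bubble.
        new1 = self.f1(self.inQ.popleft()) if self.inQ else None
        # Stage 2: propagate a valid value or the bubble.
        new2 = self.f2(old1) if old1 is not None else None
        # Stage 3: enqueue only when sReg2 held a valid value.
        if old2 is not None:
            self.outQ.append(self.f3(old2))
        self.sReg1, self.sReg2 = new1, new2
```

Feeding three inputs and stepping five times yields exactly three outputs, each equal to `f3(f2(f1(x)))`, with nothing left behind in the registers.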

13 Folded pipeline
[Diagram: x -> inQ -> f -> outQ, with sReg feeding back into f and a stage register]
   rule folded_pipeline (True);
      if (stage==0) begin
         sxIn = inQ.first();
         inQ.deq();
      end
      else
         sxIn = sReg;
      sxOut = f(stage, sxIn);
      if (stage==n-1)
         outQ.enq(sxOut);
      else
         sReg <= sxOut;
      stage <= (stage==n-1) ? 0 : stage+1;
   endrule
Notice stage is a dynamic parameter now, and there is no for-loop.
Type declarations are needed for sxIn and sxOut.
The same code will work for superfolded pipelines by changing n and the stage function f.
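The folded rule can be traced with a short Python model in which one function `f(stage, x)` is applied once per cycle. This is an illustrative sketch: the class name `FoldedPipeline` is invented, `outQ` is assumed unbounded, and the rule is assumed to stall only at stage 0 when `inQ` is empty (a real BSV compilation may guard the whole rule more conservatively).

```python
from collections import deque

class FoldedPipeline:
    """One shared block f, reused for n consecutive cycles per input."""

    def __init__(self, f, n):
        self.f, self.n = f, n
        self.inQ = deque()
        self.outQ = deque()   # assumed unbounded in this sketch
        self.sReg = None      # intermediate result between cycles
        self.stage = 0        # dynamic stage counter, 0 .. n-1

    def step(self):
        """One clock cycle; returns False if stage 0 has no input to consume."""
        if self.stage == 0:
            if not self.inQ:
                return False
            sx_in = self.inQ.popleft()
        else:
            sx_in = self.sReg
        sx_out = self.f(self.stage, sx_in)
        if self.stage == self.n - 1:
            self.outQ.append(sx_out)
        else:
            self.sReg = sx_out
        self.stage = 0 if self.stage == self.n - 1 else self.stage + 1
        return True
```

Each input occupies the block for `n` cycles, so throughput drops by a factor of `n` compared with the synchronous pipeline while only one copy of `f` is needed.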

14 802.11a Transmitter Synthesis results (Only the IFFT block is changing)
[Table; numeric cells are missing from this transcript]
Columns: IFFT Design | Area (mm²) | Throughput Latency (CLKs/sym) | Min. Freq Required (MHz)
Rows: Pipelined; Combinational; Folded (16 Bfly-4s); Super-Folded (8 Bfly-4s); SF (4 Bfly-4s); SF (2 Bfly-4s); SF (1 Bfly-4)
TSMC 0.18 micron; numbers reported are before place and route.
All rows use the same source code. All these designs were done in less than 24 hours!
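The "Min. Freq Required" column follows from the requirement, stated in the overview, of one OFDM symbol every 4 µsec: a design that needs some number of clocks per symbol must run at least that many clocks per 4 µsec. The sketch below shows the arithmetic; the function name `min_clock_mhz` and the cycle counts used in the example are hypothetical inputs, not values taken from the table.

```python
def min_clock_mhz(clks_per_symbol, symbol_period_us=4.0):
    """Minimum clock frequency in MHz to finish one symbol per symbol period.

    Cycles per microsecond is numerically the same as MHz.
    """
    return clks_per_symbol / symbol_period_us

# Hypothetical cycle counts, for illustration only.
for clks in (4, 48):
    print(clks, "clocks/symbol ->", min_clock_mhz(clks), "MHz")
```

For instance, a design that hypothetically needed 48 clocks per symbol would have to run at 12 MHz or faster, which is why deeper folding trades area for a higher required clock.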

15 Why are the areas so similar?
Folding should have given a 3x improvement in IFFT area, BUT a constant twiddle allows low-level optimization on a Bfly-4 block: a 2.5x area reduction!

16 Parameterize the synchronous pipeline
[Diagram: x -> inQ -> f1 -> sReg[1] -> ... -> sReg[n-1] -> fn -> outQ]
   Vector#(n, Reg#(t)) sReg <- replicateM(mkReg(tagged Invalid));
   rule sync_pipeline (True);
      if (inQ.notEmpty()) begin
         (sReg[1]) <= tagged Valid (f(0, inQ.first()));
         inQ.deq();
      end
      else
         (sReg[1]) <= tagged Invalid;
      for (Integer stage = 1; stage < n-1; stage = stage+1)
         case (sReg[stage]) matches
            tagged Valid .sx: (sReg[stage+1]) <= tagged Valid (f(stage, sx));
            tagged Invalid:   (sReg[stage+1]) <= tagged Invalid;
         endcase
      case (sReg[n-1]) matches
         tagged Valid .sx: outQ.enq(f(n-1, sx));
         tagged Invalid:   noAction;
      endcase
   endrule
n and stage are static parameters.
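The indexing in the parameterized rule is easy to get wrong by one, so a behavioral Python sketch of the same loop structure may help. It mirrors the slide's convention that registers `sReg[1]` to `sReg[n-1]` are used (index 0 stays unused). The class name `PipelineN` is invented, `None` models `Invalid`, `outQ` is assumed unbounded, and `n >= 2` is assumed.

```python
from collections import deque

class PipelineN:
    """n-stage synchronous pipeline with valid tags; sReg[0] is unused, as on the slide."""

    def __init__(self, f, n):
        assert n >= 2
        self.f, self.n = f, n
        self.inQ = deque()
        self.outQ = deque()        # assumed unbounded in this sketch
        self.sReg = [None] * n     # None models the Invalid tag

    def step(self):
        """One clock cycle: all registers are computed from the old values, then committed."""
        old = self.sReg
        new = [None] * self.n
        new[1] = self.f(0, self.inQ.popleft()) if self.inQ else None
        for stage in range(1, self.n - 1):
            new[stage + 1] = self.f(stage, old[stage]) if old[stage] is not None else None
        if old[self.n - 1] is not None:
            self.outQ.append(self.f(self.n - 1, old[self.n - 1]))
        self.sReg = new
```

In the BSV version the loop is unrolled at compile time because `n` and `stage` are static; in this Python model the loop simply runs every cycle, which gives the same per-cycle behavior.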

17 Syntax: Vector of Registers
Register: suppose x and y are both of type Reg. Then x <= y means x._write(y._read()).
Vector of (say) Int: x[i] means sel(x,i); x[i] = y[j] means x = update(x,i, sel(y,j)).
Vector of Registers: x[i] <= y[j] does not work. The parser thinks it means (sel(x,i)._read)._write(sel(y,j)._read), which will not type check.
(x[i]) <= y[j] does work!
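The two parses can be imitated in Python to show why the parenthesized form is the one that works. This is only an analogy: `Reg`, `sel` and `update` are modeled here with invented Python definitions, writes take effect immediately rather than at the end of a clock cycle, and a Python `AttributeError` at run time stands in for what would be a static type error in BSV.

```python
class Reg:
    """Minimal register model with the _read/_write interface named on the slide."""

    def __init__(self, value):
        self._value = value

    def _read(self):
        return self._value

    def _write(self, value):
        self._value = value

def sel(vec, i):
    """x[i] on a vector: select element i."""
    return vec[i]

def update(vec, i, value):
    """x[i] = v on a vector of plain values: return a new vector with element i replaced."""
    new = list(vec)
    new[i] = value
    return new

def write_reg_elem(x, i, y, j):
    """Intended meaning of (x[i]) <= y[j]: write into the selected register."""
    sel(x, i)._write(sel(y, j)._read())

def misparsed_write(x, i, y, j):
    """The unparenthesized parse: _write is applied to the value read from x[i]."""
    sel(x, i)._read()._write(sel(y, j)._read())
```

Calling `misparsed_write` on vectors of registers holding integers fails, because the integer read out of `x[i]` has no `_write` method, whereas `write_reg_elem` updates the selected register.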

