Exploiting Fast Carry Chains of FPGAs for Designing Compressor Trees

1 Exploiting Fast Carry Chains of FPGAs for Designing Compressor Trees
Hadi P. Afshar Philip Brisk Paolo Ienne

2 Multi-input Additions are Fundamental
DSP and multimedia applications: FIR filters, motion estimation, …
Parallel multipliers
Flow graph transformation
[Figure: FIR filter built from delay elements (D) feeding a multi-input sum (Σ)]

3 Flow Graph Transformation
[Figure: flow graph of the ADPCM vpdiff computation BEFORE and AFTER transformation (steps 1 to 3, with shift, AND, compare, and add nodes); the AFTER graph contains a compressor tree]

4 Compressor vs. Adder Tree
[Figure: a compressor tree (CSA levels followed by a CPA) beside an adder tree (CPAs)]
Slow intra-LUT routing
Poor LUT utilization
Low logic density
Compressors are better than adder trees in VLSI
But adder trees are better than compressors on FPGAs!

5 But Compressor Trees can be faster and smaller if Properly Designed

6 Better Compressors on FPGA
The Generalized Parallel Counter (GPC) is the basic block
  More logic density
  Fewer logic levels
  Less pressure on the routing
[Figure: GPC levels followed by a CPA]

7 Overview
Arithmetic Concepts
Hybrid Design Approach
  Bottom-up
  Top-down
Experiments
Conclusion

8 Parallel Counters
Parallel Counter
  Counts the number of input bits set to 1
  Output is a binary value
  m:n counter: m inputs, n outputs, n = ⌈log2(m+1)⌉
  3:2 − Full Adder
  2:2 − Half Adder
Generalized Parallel Counter (GPC)
  Input bits can have different bit positions
  E.g. (3, 3; 4) GPC
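
The counter definitions above can be sketched in a few lines of Python. This is only an illustration: the names `gpc_output_width` and `gpc_value` are hypothetical, and the sketch assumes the shape tuple lists input counts most significant rank first, which is the reading under which (3, 3; 4) and (0, 6; 3) both give the stated output widths.

```python
def gpc_output_width(shape):
    """Output bits of a GPC whose per-rank input counts are given most
    significant rank first, e.g. (3, 3) for a (3, 3; 4) GPC."""
    max_value = sum(count << rank for rank, count in enumerate(reversed(shape)))
    return max_value.bit_length()  # same as ceil(log2(max_value + 1))


def gpc_value(shape, bits_by_rank):
    """Weighted count of the set input bits; bits_by_rank follows the order of shape."""
    total = 0
    ranked = zip(reversed(shape), reversed(bits_by_rank))
    for rank, (count, bits) in enumerate(ranked):
        assert len(bits) == count, "each rank must receive exactly `count` bits"
        total += sum(bits) << rank  # a bit at rank r contributes 2**r
    return total
```

A plain m:n counter is the single-rank case, so `gpc_output_width((3,))` gives the two outputs of a full adder and `gpc_output_width((3, 3))` gives the four outputs of the (3, 3; 4) GPC.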

9 Compressor Trees on FPGAs
We propose GPCs as the basic blocks for compressor trees
Why?
  GPCs map well onto FPGA logic cells
  GPCs are flexible

10 GPC Mapping Example
[Figure: dot diagram covered by 5 counters versus 3 GPCs: (0,5;3), (3,4;4), (3,5;4)]

11 Overview
Arithmetic Concepts
Hybrid Design Approach
  Bottom-up
  Top-down
Experiments
Conclusion

12 Hybrid Design Approach
[Figure: design flow]
Bottom-Up: FPGA Architectural Characteristics, Atom Level GPC HDL Library
Top-Down: Compressor Tree Specification, GPC Mapped HDL Netlist, Place and Route, Result

13 FPGA Logic Cell
Altera Stratix-II/III/V
Logic Array Block (LAB)
Adaptive Logic Module (ALM)
[Figure: a LAB with ALMs numbered 1 to 8; each ALM shows combinational logic, adders (+), and registers]

14 FPGA Logic Cell
ALM Configuration Modes
  Normal
  Extended
  Arithmetic
  Shared Arithmetic
[Figure: 4-LUTs feeding the dedicated adders (+)]

15 Bottom-up Design
[Figure: 6:3 GPC with outputs F2, F1, F0; labels LAB0 and LAB1]
What if we have bigger GPCs, like a 7:3 GPC?
Can we exploit the carry chain and dedicated adders for building GPCs?

16 GPC Design Example
(0, 6; 3) GPC
Inputs a0 to a5, outputs z0, z1, z2
FA: s0 = S(a1,a2,a3), c0 = C(a1,a2,a3)
HA: s1 = S(a4,a5), c1 = C(a4,a5)
[Figure: the adders (+) of ALM0 and ALM1 combine s0, s1, c0, c1, and a0 into z0, z1, z2]
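
A small Python model can check that a full adder, a half adder, and a two-cell carry chain are enough to implement the (0, 6; 3) GPC. The exact wiring is an assumption read off the slide labels (a0 taken as the carry-in of the first chain cell); the function names are illustrative only.

```python
from itertools import product


def full_adder(x, y, z):
    total = x + y + z
    return total & 1, total >> 1  # (sum, carry)


def half_adder(x, y):
    total = x + y
    return total & 1, total >> 1  # (sum, carry)


def gpc_0_6_3(a0, a1, a2, a3, a4, a5):
    """(0, 6; 3) GPC: returns (z0, z1, z2) with z0 + 2*z1 + 4*z2 = number of set inputs."""
    s0, c0 = full_adder(a1, a2, a3)   # LUT stage: S(a1,a2,a3), C(a1,a2,a3)
    s1, c1 = half_adder(a4, a5)       # LUT stage: S(a4,a5), C(a4,a5)
    z0, k = full_adder(s0, s1, a0)    # first chain cell, a0 assumed on carry-in
    z1, z2 = full_adder(c0, c1, k)    # second chain cell adds the weight-2 bits
    return z0, z1, z2


def matches_popcount():
    """Exhaustively compare the model with a plain population count."""
    for bits in product((0, 1), repeat=6):
        z0, z1, z2 = gpc_0_6_3(*bits)
        if z0 + 2 * z1 + 4 * z2 != sum(bits):
            return False
    return True
```

The second chain cell never overflows because the total is at most 6, which fits in three bits.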

17 GPC Placement
Logic separation between carry and sum
Zero value on the carry
At a GPC boundary: {cout, s} = cin + a + a = cin + 2a, so cout = a and s = cin
[Figure: carry-chain adders (+) with inputs a and cin, outputs cout and s, and the GPC boundaries marked]
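
The boundary identity can be checked exhaustively. The sketch below models one carry-chain adder cell and then drives both operands with the same bit a; `chain_cell` and `boundary_cell` are hypothetical names, and the model ignores everything about the ALM except the adder arithmetic.

```python
def chain_cell(x, y, cin):
    """One carry-chain adder cell: returns (cout, s) for x + y + cin."""
    total = x + y + cin
    return total >> 1, total & 1


def boundary_cell(a, cin):
    """Boundary cell: the same bit a drives both adder operands."""
    return chain_cell(a, a, cin)


# cout depends only on a, and s depends only on cin
for a in (0, 1):
    for cin in (0, 1):
        assert boundary_cell(a, cin) == (a, cin)
```

With a = 0 the cell passes the incoming carry out on s and sends a zero carry onward, so the chain is logically split at that cell.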

18 [Figure: LUT and adder (+) cells of adjacent GPCs, GPCi and GPCi+1, placed along the carry chain]

19 Top-down Heuristic
Mapping_algorithm(Integer : M, Integer : W, Array of Integers : columns)
{
    Build_GPC_library();
    repeat {
        while (col_indx < max_col_indx) {
            if (columns[col_indx] > H) Map_by_GPC();
            else col_indx++;
        }
        lsb_to_msb_covering();
        Connect_GPCs_IOs();
        Propagate_comb_delay();
        Generate_next_stage_dots();
    } until three rows of dots remain;
}
Slide annotations: Step1, Step2, Step3; (0, H; log2H)

20 Major Step of Heuristic
Process columns from LSB to MSB
[Figure: dot diagram with some columns labelled "Mapped to (0, H; log2H) GPCs" and others labelled "Height < H"]
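
The stage-by-stage reduction can be illustrated with a much simplified Python sketch. It is not the authors' heuristic: it uses only single-column counters (six inputs by default) plus full adders for leftover dots, it has no GPC library, and it ignores delay propagation. All names (`emit`, `compress`, `value`) are hypothetical. It stops when at most three rows of dots remain, as in the pseudocode.

```python
def emit(columns, base, count, width):
    """Append the `width` binary digits of `count` as dots starting at column `base`."""
    for k in range(width):
        columns[base + k].append((count >> k) & 1)


def compress(columns, m=6, target=3):
    """columns[i] holds the 0/1 dots of weight 2**i. Returns (columns, stages)."""
    width = m.bit_length()  # output bits of an m-input single-column counter
    stages = 0
    while max(len(col) for col in columns) > target:
        nxt = [[] for _ in range(len(columns) + width)]
        for i, col in enumerate(columns):        # process columns LSB to MSB
            bits = list(col)
            while len(bits) >= m:                # cover m dots with one counter
                emit(nxt, i, sum(bits[:m]), width)
                bits = bits[m:]
            if len(bits) >= 3:                   # leftover dots: one full adder (3:2)
                emit(nxt, i, sum(bits[:3]), 2)
                bits = bits[3:]
            nxt[i].extend(bits)                  # uncovered dots pass through
        while len(nxt) > 1 and not nxt[-1]:      # drop unused high columns
            nxt.pop()
        columns = nxt
        stages += 1
    return columns, stages


def value(columns):
    """Integer represented by a dot diagram."""
    return sum(sum(col) << i for i, col in enumerate(columns))
```

For example, `compress([[1] * 8 for _ in range(4)])` reduces eight 4-bit operands to at most three rows, and `value` is the same before and after, because every counter replaces its input dots with the binary encoding of their count.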

21 Delay Balancing
CP1 = z1d + a0d
CP2 = max(z1d + a5d, z4d + a2d, z6d + a0d)
z1d > z4d > z6d
a0d > a2d > a5d
[Figure: outputs z0 to z8 of one stage connected to GPC inputs a0, a2, a5 of the next]
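
The pairing behind CP2 (latest-arriving output on the fastest input pin) can be sketched as follows. The delay values used in the usage note are made up for illustration, and `critical_path` and `balance` are hypothetical names, not part of the authors' tool.

```python
def critical_path(pairs):
    """Worst output-arrival plus input-pin delay over all connections."""
    return max(z + a for z, a in pairs)


def balance(output_delays, pin_delays):
    """Pair the latest-arriving signals with the fastest input pins."""
    late_first = sorted(output_delays, reverse=True)
    fast_first = sorted(pin_delays)
    return list(zip(late_first, fast_first))
```

With hypothetical delays z1d, z4d, z6d = 3, 2, 1 and a0d, a2d, a5d = 3, 2, 1, connecting in the given order puts z1 on a0 and gives a path of 3 + 3 = 6, while `balance` pairs them as (3, 1), (2, 2), (1, 3) for a critical path of 4.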

22 Overview
Arithmetic Constructs
Hybrid Design Approach
  Bottom-up
  Top-down
Experiments
Conclusion

23 Experiments
Bottom-up design
  Atom-level design in Verilog Quartus Module (VQM) format
Top-down
  Heuristic: C++
  Output: structural VHDL
Quartus-II Altera tool
Benchmarks
  DCT, FIR, ME, G721
  Multiplier
  Horner Polynomial
  Video Mixer

24 Experiments
Mapping methods
  Ternary
  LUT Only
  Arith1: Arithmetic mode, without delay balancing
  Arith2: Arithmetic mode, with delay balancing

25 [Chart: Delay (ns); annotations: -27%, +2%]

26 [Chart: Area (ALM); annotations: +47%, +18%]

27 [Chart: Area (LAB); annotation: -4.5%]

28 Overview
Arithmetic Concepts
Hybrid Design Approach
  Bottom-up
  Top-down
Experiments
Conclusion

29 Conclusion
Conventional wisdom has held that adder trees outperform compressor trees on FPGAs
  Ternary adder trees were a major selling point
Conventional wisdom is wrong!
  GPCs map nicely onto FPGA logic cells and carry chains
  Compressor trees on FPGAs are faster than adder trees when built from GPCs

