1 ECE 697F Reconfigurable Computing Lecture 5 Technology Mapping: Packing Logic into LUTs
Give qualifications of instructors: DAP has been teaching computer architecture at Berkeley since 1977; co-author of the textbook used in class; best known for being one of the pioneers of RISC; currently author of an article on the future of microprocessors in SciAm, Sept. 1995. RY took 152 as a student, TAed 152, and was an instructor in 152; did undergrad and grad work at Berkeley; joined NextGen to design fast 80x86 microprocessors; one of the architects of UltraSPARC, the fastest SPARC microprocessor, shipping this Fall.

2 Overview
Logic synthesis. LUT clustering. LUT capacity. Chortle: an example technology mapper. Architecture-specific optimization.

3 Boolean network
A Boolean network is the main representation of the logic functions for technology-independent optimizations. Each node can be represented as a sum of products (or a product of sums). The network provides multi-level structure, but functions in the network need not correspond to logic gates.

4 Boolean network example
Primary inputs: x1, x2, x3, x4. Primary outputs: out1, out2.
k1 = x2 + x3
k2 = x1' x2 x4 + k1
k3 = k1 x4'
out1 = k2 + x2'
out2 = k3 + x1
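To make the network concrete, here is a minimal Python sketch that evaluates the example node by node. It assumes x' means NOT x and that nodes are evaluated in a topological order (every node after the nodes it reads); the names NETWORK, ORDER and evaluate are illustrative, not from the lecture.

```python
from itertools import product

# Each node of the example network as a function of already-computed values.
NETWORK = {
    "k1": lambda v: v["x2"] or v["x3"],
    "k2": lambda v: (not v["x1"] and v["x2"] and v["x4"]) or v["k1"],
    "k3": lambda v: v["k1"] and not v["x4"],
    "out1": lambda v: v["k2"] or not v["x2"],
    "out2": lambda v: v["k3"] or v["x1"],
}
# Topological order: every node appears after the nodes it reads.
ORDER = ["k1", "k2", "k3", "out1", "out2"]


def evaluate(x1, x2, x3, x4):
    """Return (out1, out2) for one assignment of the primary inputs."""
    values = {"x1": x1, "x2": x2, "x3": x3, "x4": x4}
    for node in ORDER:
        values[node] = bool(NETWORK[node](values))
    return values["out1"], values["out2"]


if __name__ == "__main__":
    for inputs in product((False, True), repeat=4):
        print(inputs, evaluate(*inputs))
```

Running the truth table shows that out1 is 1 for every input combination: if x2 = 0 then x2' = 1, and if x2 = 1 then k1 = 1, so k2 = 1. This is the kind of redundancy that technology-independent optimization would remove.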

5 Terms
Support: the set of variables used by a function.
Transitive fanin: all the primary inputs and intermediate variables used by a function, directly or indirectly. The transitive fanin determines a cone of logic.
Transitive fanout: all the primary outputs and intermediate variables that use a function, directly or indirectly.
[Figure: a cone of logic spanning from the primary inputs to an output.]
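The definitions can be checked on the slide 4 network. This Python sketch stores only each node's fanins (primary inputs have no entry) and computes the sets by graph traversal; FANINS and the function names are illustrative.

```python
# Fanins of each internal node in the slide 4 network; primary inputs have no entry.
FANINS = {
    "k1": ["x2", "x3"],
    "k2": ["x1", "x2", "x4", "k1"],
    "k3": ["k1", "x4"],
    "out1": ["k2", "x2"],
    "out2": ["k3", "x1"],
}


def support(node):
    """Variables used directly by the node's function."""
    return set(FANINS.get(node, []))


def transitive_fanin(node):
    """All primary inputs and intermediate variables the node depends on."""
    seen, stack = set(), [node]
    while stack:
        for u in FANINS.get(stack.pop(), []):
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen


def transitive_fanout(node):
    """All intermediate variables and primary outputs that depend on the node."""
    return {v for v in FANINS if node in transitive_fanin(v)}


def cone(node):
    """The cone of logic: the node plus its transitive fanin."""
    return transitive_fanin(node) | {node}
```

For example, transitive_fanin("out2") is {k3, k1, x1, x2, x3, x4}, and transitive_fanout("k1") is {k2, k3, out1, out2}.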

6 Partially-specified function
[Figure: a function of x1, x2, x3 with some outputs specified as 1 and others as don't care.]

7 Optimizations
Simplification: changing the way a function is represented.
Network restructuring: adding and removing nodes.
Delay restructuring: optimizations that reduce the height of critical paths.

8 Partial collapsing
[Figure: network before (nodes f1, f2, f3, f4) and after partial collapsing (nodes F, f3, f4).]

9 Technology mapping Cover the function:

10 FPGA tech mapping Cost (number of inputs) doesn’t always increase with added functions:

11 FPGAs vs. custom logic
The cost metric for static gates is the literal: ax + bx' has four literals and requires 8 transistors. The cost metric for FPGAs is the logic element (LE): all functions that fit in an LE have the same cost.

12 LUT-based logic synthesis
Find the largest logic cone that will fit into the LUT:
r = q + s'
s = d'
q = g' + h
d = a + b
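A minimal Python sketch of growing a cone for this example: start with the LUT at r and repeatedly absorb an internal fanin node as long as the LUT still has at most k inputs. It assumes a fanout-free network (true here), so absorbing a node never requires duplicating it, and the greedy loop finds a maximal cone, not necessarily the largest one in general. FANINS and grow_cone are hypothetical names.

```python
# Fanins of the example network r = q + s', s = d', q = g' + h, d = a + b.
FANINS = {"r": ["q", "s"], "s": ["d"], "q": ["g", "h"], "d": ["a", "b"]}


def grow_cone(root, k):
    """Greedily absorb fanin nodes into the LUT rooted at `root` while it has at most k inputs.

    Returns (nodes inside the LUT, LUT inputs).
    """
    nodes, inputs = {root}, set(FANINS[root])
    grew = True
    while grew:
        grew = False
        for leaf in sorted(inputs):
            if leaf not in FANINS:
                continue  # primary input: nothing to absorb
            trial = (inputs - {leaf}) | set(FANINS[leaf])
            if len(trial) <= k:  # the LUT still fits after absorbing `leaf`
                nodes.add(leaf)
                inputs = trial
                grew = True
                break
    return nodes, inputs
```

With k = 4 the whole cone {r, q, s, d} fits, using inputs {a, b, g, h}; with k = 3 absorbing d would need four inputs, so the cone stops at {r, q, s} with inputs {d, g, h}.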

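13 How much fits in a LUT?
The 2-input NAND gate is frequently used as the unit of comparison: a four-input LUT holds approximately 12 to 15 such gates. A four-input LUT can implement any of 2^16 = 65,536 functions. For three inputs, the 2^8 = 256 functions reduce to 80 classes after input swapping and to 14 after input/output inversion. Four inputs were determined to be optimal [Rose 1990].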
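The equivalence-class counts can be reproduced by brute force. This Python sketch (count_classes is a hypothetical name) encodes each n-input function as a truth-table bit vector and marks every function reachable from it by permuting inputs and, optionally, inverting inputs and the output (NPN equivalence); each unmarked function starts a new class.

```python
from itertools import permutations, product


def count_classes(n, with_inversion):
    """Count n-input Boolean functions up to input permutation, optionally also
    up to input and output inversion (NPN equivalence), by brute force."""
    rows = list(product((0, 1), repeat=n))          # truth-table row i is rows[i]
    row_index = {row: i for i, row in enumerate(rows)}
    input_negations = rows if with_inversion else [(0,) * n]
    output_negations = (0, 1) if with_inversion else (0,)
    transforms = [(perm, neg, out)
                  for perm in permutations(range(n))
                  for neg in input_negations
                  for out in output_negations]
    seen = set()
    classes = 0
    for f in range(2 ** (2 ** n)):                  # f encodes a truth table as bits
        if f in seen:
            continue
        classes += 1                                # f starts a new equivalence class
        for perm, neg, out in transforms:           # mark every member of f's class
            g = 0
            for i, row in enumerate(rows):
                source = tuple(row[perm[j]] ^ neg[j] for j in range(n))
                g |= (((f >> row_index[source]) & 1) ^ out) << i
            seen.add(g)
    return classes
```

For n = 3, count_classes(3, False) gives 80 and count_classes(3, True) gives 14.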

14 Technology-Independent Logic Optimization
Improve the circuit based on cost while keeping the same functionality: Boolean evaluation/decomposition, and simple factoring to minimize literals. Example:
f = ac + ad + bc + bd
g = a + b + c
Extracting e = a + b gives:
e = a + b
g = e + c
f = e(c + d)
This reduces the literal count from 11 to 7.
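A quick Python check of the factoring example: exhaustively compare the original and factored forms of f and g over all 16 input combinations, and count literals by counting variable occurrences. It assumes single-letter variable names; literal_count and factoring_is_equivalent are illustrative names.

```python
from itertools import product


def literal_count(expr):
    """Count literals: every occurrence of a (single-letter) variable is one literal."""
    return sum(ch.isalpha() for ch in expr)


BEFORE = ["ac + ad + bc + bd", "a + b + c"]   # f, g
AFTER = ["a + b", "e + c", "e(c + d)"]        # e, g, f


def factoring_is_equivalent():
    """Exhaustively check that the factored network computes the same f and g."""
    for a, b, c, d in product((False, True), repeat=4):
        e = a or b
        f_before = (a and c) or (a and d) or (b and c) or (b and d)
        g_before = a or b or c
        if f_before != (e and (c or d)) or g_before != (e or c):
            return False
    return True
```

The literal totals are 8 + 3 = 11 before and 2 + 2 + 3 = 7 after, because the shared divisor e is implemented once and reused by both f and g.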

15 Factorization
Based on division: formulate a candidate divisor and test how it divides into the function; if g = f/c is nonzero, we can use c as an intermediate function for f.
Algebraic division does not take Boolean simplification into account (the expressions are treated as polynomials). It is less expensive than Boolean division.
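A minimal Python sketch of algebraic (weak) division on sum-of-products expressions. It assumes each cube is a set of single-letter literals and that a complemented literal is written as a separate letter, so, as in algebraic division, x and x' are treated as unrelated variables. The helpers sop and algebraic_divide are hypothetical.

```python
def sop(*cubes):
    """Build an SOP as a set of cubes; each cube is a frozenset of single-letter literals."""
    return {frozenset(cube) for cube in cubes}


def algebraic_divide(f, d):
    """Weak (algebraic) division: return (quotient, remainder) with f = quotient*d + remainder."""
    quotient = None
    for d_cube in d:
        # Cubes of f that contain this divisor cube, with the divisor cube removed.
        partial = {cube - d_cube for cube in f if d_cube <= cube}
        quotient = partial if quotient is None else quotient & partial
    quotient = quotient or set()
    # Cubes of f explained by quotient * divisor; everything else is remainder.
    covered = {q_cube | d_cube for q_cube in quotient for d_cube in d}
    return quotient, f - covered
```

Dividing the slide 14 function f = ac + ad + bc + bd by a + b gives quotient c + d and an empty remainder. Dividing ac + ad + bc + e by a + b gives quotient c and remainder ad + e.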

16 Library-based Technology Mapping – MIS II
Three steps: decomposition, matching, covering. The circuit is first decomposed into a NAND representation. Different collections of NANDs can be implemented differently in VLSI. Example library: INV, cost 2; NAND2, cost 3; AOI21, cost 4.

17 MIS II
Decompose into NAND2 gates using Boolean techniques. Use dynamic programming to match subtrees against the library, and choose the lowest-cost implementation that covers all primitives.
[Figure: cost calculation for the example cover.]
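A minimal Python sketch of dynamic-programming tree covering with the slide 16 library (INV cost 2, NAND2 cost 3, AOI21 cost 4). It assumes the subject tree is already decomposed into NAND2/INV form and writes AOI21 as the pattern INV(NAND(NAND(a, b), INV(c))), since (ab + c)' has that NAND/INV form. The tuple encoding and the names match and cover are illustrative, not MIS II's actual data structures.

```python
# Subject and pattern trees use tuples: ("IN", name), ("INV", x), ("NAND", x, y).
# In a pattern, VAR matches any subtree; matched subtrees become the gate's inputs.
VAR = ("VAR",)

LIBRARY = {
    "INV": (2, ("INV", VAR)),
    "NAND2": (3, ("NAND", VAR, VAR)),
    # AOI21 = (ab + c)' = INV(NAND(NAND(a, b), INV(c)))
    "AOI21": (4, ("INV", ("NAND", ("NAND", VAR, VAR), ("INV", VAR)))),
}


def match(pattern, node):
    """Return the subtrees bound to VARs if `pattern` matches at `node`, else None."""
    if pattern == VAR:
        return [node]
    if pattern[0] != node[0]:
        return None
    if pattern[0] == "INV":
        return match(pattern[1], node[1])
    for left, right in ((node[1], node[2]), (node[2], node[1])):  # NAND is commutative
        bound_left = match(pattern[1], left)
        if bound_left is None:
            continue
        bound_right = match(pattern[2], right)
        if bound_right is not None:
            return bound_left + bound_right
    return None


def cover(node, memo=None):
    """Minimum-cost cover of the tree rooted at `node`: (cost, gate names used)."""
    if memo is None:
        memo = {}
    if node[0] == "IN":
        return 0, []  # primary inputs cost nothing
    if node not in memo:
        best = None
        for name, (cost, pattern) in LIBRARY.items():
            leaves = match(pattern, node)
            if leaves is None:
                continue
            sub = [cover(leaf, memo) for leaf in leaves]  # optimal covers of the leaves
            total = cost + sum(c for c, _ in sub)
            if best is None or total < best[0]:
                best = (total, [name] + [g for _, gates in sub for g in gates])
        memo[node] = best
    return memo[node]
```

For the tree INV(NAND(NAND(a, b), INV(c))), covering with INV + NAND2 + NAND2 + INV costs 10, while a single AOI21 costs 4, so cover returns (4, ["AOI21"]).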

18 Tech Mapping for LUTs
Goals: minimize the total number of LUTs, and minimize the number of levels of LUTs. Many different approaches exist: partitioning (Flowmap), BDDs (XMAP), covering (Chortle). Basic Xilinx technology mapping follows Chortle, with modifications to handle registers.

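19 Chortle-crf
A dynamic programming approach that operates on AND-OR circuits. Primary goal: minimize the number of LUTs. Secondary goal: minimize the number of inputs used by the LUT at the root of the circuit.
[Figure: an AND-OR network with inputs A through M and internal nodes w, x, y, z; the first step is to locate the boundaries.]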

20 Chortle-crf
The major innovation is bin packing, which simultaneously addresses decomposition and matching. Goal: find the decomposition of every node in the network that minimizes the number of LUTs in the final circuit.
[Figure: without decomposition the example needs 4 LUTs; with decomposition it needs 2 LUTs.]

21 Mapping Each Tree
Dynamic programming visits each node in the tree; the fanin nodes that drive the node under evaluation are mapped first.
Boxes -> fanin LUTs; the size of a box is its number of inputs.
Bins -> N-input LUTs (in this case N = 5).
First Fit Decreasing:
/* construct a two-level decomposition */
box list <- fanin LUTs sorted by decreasing size
bin list <- empty
while (box list is not empty) {
    box <- remove the largest LUT from box list
    bin <- first bin in bin list with room for box
    if (no such bin exists)
        add a new bin containing box to bin list   /* create new bin */
    else
        bin <- bin + box                           /* pack into existing bin */
}
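The First Fit Decreasing loop can be sketched in Python as follows. As on the slide, a box's size is its number of inputs and inputs shared between boxes are not detected; first_fit_decreasing is a hypothetical name.

```python
def first_fit_decreasing(box_sizes, k):
    """Pack fanin LUTs (boxes, sized by input count) into k-input LUTs (bins).

    Returns a list of bins, each a list of the box sizes packed into it.
    """
    bins = []
    for box in sorted(box_sizes, reverse=True):  # largest box first
        for packed in bins:
            if sum(packed) + box <= k:           # first bin with room
                packed.append(box)
                break
        else:                                    # no bin has room: open a new one
            bins.append([box])
    return bins
```

For five fanin LUTs with 2, 1, 3, 1 and 2 inputs and k = 5, the result is [[3, 2], [2, 1, 1]]: two second-level LUTs.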

22 Multi-Level Decomposition
Chain LUTs together: the output of the largest second-level LUT is connected to a LUT with an unused input, and a new LUT may need to be added. This leads to the minimum number of LUTs and a fanout LUT with the smallest number of inputs. This fanout LUT is used as an input to the next stage.
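A minimal Python sketch of the chaining step, under an assumed rule that follows the slide: repeatedly take the fullest remaining LUT and feed its output into the fullest other LUT that still has a free input, adding a new LUT when none does, until a single root LUT remains. The input is the number of inputs used by each second-level LUT; chain_luts is a hypothetical name, and this is not Chortle-crf's exact code.

```python
def chain_luts(used_inputs, k):
    """Chain second-level k-input LUTs into one tree.

    used_inputs: inputs used by each second-level LUT.
    Returns (total number of LUTs, inputs used by the root LUT).
    """
    remaining = sorted(used_inputs, reverse=True)
    total = len(remaining)
    while len(remaining) > 1:
        src = remaining.pop(0)             # fullest LUT: its output must go somewhere
        for i, used in enumerate(remaining):
            if used < k:
                remaining[i] += 1          # a free input now takes src's output
                break
        else:
            remaining.append(1)            # no free input anywhere: add a new LUT
            total += 1
        remaining.sort(reverse=True)
    return total, remaining[0]
```

With k = 5, two second-level LUTs using 5 and 4 inputs chain into 2 LUTs (the second becomes the root with 5 inputs), whereas two full LUTs need a third, 2-input root LUT.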

23 Examples
[Figure: (a) fanin LUTs u, v, w, x, y; (b) two-level decomposition into LUTs z.1 and z.2; (c) multi-level decomposition.]

24 Optimality
For LUTs with fewer than 6 inputs, Chortle creates an optimal result for each subtree. The combination of subtrees, however, is not optimized, so optimizations across subtree boundaries are needed to approach global optimality:
Reconvergent paths: a net drives multiple gates.
Replicating logic: creating additional fanout.

25 Translating a Design to an FPGA
Improve the two-level decomposition to take fanout into account: replace FFD with an exhaustive search that repeatedly invokes FFD. Try the mapping both with and without each reconvergent path merged, and select the best mapping (forced merging). The inputs must reconverge at the node being decomposed.

26 Reconvergent Paths
Frequently, more than one pair of fanin LUTs shares inputs. For each combination of pairs that share inputs, perform FFD; the two-level decomposition with the fewest bins, and among those the smallest least-filled bin, is retained.
pair list <- all pairs of fanin LUTs with shared inputs
best LUTs <- FFD(fanin LUTs)   /* no forced merge */
for (each possible combination of pairs from pair list) {
    merged LUTs <- copy of fanin LUTs, with the chosen pairs force-merged
    candidate <- FFD(merged LUTs)
    if (candidate is better than best LUTs)   /* best combination so far */
        best LUTs <- candidate
}
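A simplified Python sketch of forced merging: boxes are sets of input names, plain FFD sizes each box by its own input count (so sharing is invisible to it), and a force-merged pair becomes a single box whose size is the size of the union, which is where reconvergence pays off. For brevity it tries one merged pair at a time rather than every combination of pairs; the function names are hypothetical.

```python
from itertools import combinations


def ffd(boxes, k):
    """First Fit Decreasing; box size = its own input count (sharing is NOT detected)."""
    bins = []
    for box in sorted(boxes, key=len, reverse=True):
        for packed in bins:
            if sum(map(len, packed)) + len(box) <= k:
                packed.append(box)
                break
        else:
            bins.append([box])
    return bins


def score(bins):
    """Fewer bins is better; ties go to the smallest least-filled bin."""
    return len(bins), min(sum(map(len, b)) for b in bins)


def ffd_with_forced_merging(boxes, k):
    """Try plain FFD, then FFD with each input-sharing pair forced into one box."""
    best = ffd(boxes, k)
    for i, j in combinations(range(len(boxes)), 2):
        merged = boxes[i] | boxes[j]
        if not (boxes[i] & boxes[j]) or len(merged) > k:
            continue  # only pairs that share inputs and still fit in one LUT
        rest = [b for n, b in enumerate(boxes) if n not in (i, j)]
        candidate = ffd(rest + [merged], k)
        if score(candidate) < score(best):
            best = candidate
    return best
```

With k = 5 and fanin LUTs {a, b, c}, {a, b, d} and {e}, plain FFD needs two bins, but force-merging the first two (union {a, b, c, d}, 4 inputs) lets everything fit in one 5-input LUT.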

27 Maximum Share Decreasing
Exhaustive search is prohibitive. Instead, select the next box using the following criteria, in order:
1. Greatest number of inputs.
2. Shares the greatest number of inputs with any existing bin.
3. Shares the greatest number of inputs with the remaining boxes.
This reduces to FFD when there is no input sharing; criteria 2 and 3 optimize for input sharing.
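The selection criteria can be sketched in Python as a lexicographic priority. Here boxes and bins are sets of input names, a bin's size is its number of distinct inputs, and, as an assumption not stated on the slide, a selected box goes into the fitting bin with which it shares the most inputs (ties go to the first such bin). max_share_decreasing is a hypothetical name.

```python
def max_share_decreasing(boxes, k):
    """Pack boxes (sets of input names) into bins of at most k distinct inputs.

    Returns a list of bins, each the set of distinct inputs it uses.
    """
    remaining = [frozenset(b) for b in boxes]
    bins = []

    def priority(i):
        box = remaining[i]
        with_bins = max((len(box & b) for b in bins), default=0)
        with_boxes = max((len(box & o) for j, o in enumerate(remaining) if j != i), default=0)
        return len(box), with_bins, with_boxes  # criteria 1, 2, 3 in order

    while remaining:
        box = remaining.pop(max(range(len(remaining)), key=priority))
        fitting = [b for b in bins if len(b | box) <= k]
        if fitting:
            target = max(fitting, key=lambda b: len(b & box))  # assumption: most-shared bin
            target |= box  # in-place union updates the bin inside `bins`
        else:
            bins.append(set(box))
    return bins
```

Without shared inputs every priority ties on criteria 2 and 3 and every bin shares nothing, so the loop picks the largest box and the first bin with room, which is FFD. With {a, b, c}, {a, b, d}, {e} and k = 5, the shared inputs are counted once and all three boxes fit in one bin.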

28 Node Replication
Apply replication to fanout nodes. Map without replication first, then locally decompose the fanout nodes to determine the savings. Ordering is important.
[Figure: mapping without replication vs. with replication.]

29 Results – Chortle-crf
20 netlists were mapped to 5-input LUTs. Reconvergence reduced the number of LUTs by 2.7%, and replication reduced it by 3.7%; combined, a 14% reduction was achieved. Replication exposes reconvergent paths, creating additional opportunities for optimization.

30 Chortle-d
Minimize delay through the circuit. This generally increases the hardware required: logic levels were reduced by 38%, but the number of LUTs increased by 79%. Note that most delay in an FPGA is in the interconnect.

31 Other Approaches
MIS-PGA: groups inputs into LUTs and decomposes into 4-LUTs (Roth-Karp); 47 times slower than Chortle, but uses 14% fewer LUTs.
XMAP: represents the circuit as BDDs; effective for multiplexer-based devices.
Also: BDS-PGA.

32 Flowmap
1. Use network flow to partition the circuit.
2. Find the minimum cut by computing the maximum flow (max-flow/min-cut).
3. Continue cutting until LUTs of size N are achieved.
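A minimal Python sketch of the max-flow/min-cut step only: the minimum number of nodes that separate the primary inputs from a target node, computed with Edmonds-Karp on a node-split graph (each node becomes an in/out pair joined by a capacity-1 edge, so cutting a node costs one LUT input). FlowMap's depth labelling and height constraint are deliberately omitted; min_lut_inputs and the fanin-dictionary format are assumptions.

```python
from collections import defaultdict, deque

INF = float("inf")


def min_lut_inputs(fanins, target):
    """Minimum node cut between the primary inputs and `target` (Edmonds-Karp).

    fanins: dict node -> list of fanin nodes; primary inputs have no entry.
    """
    cone, stack = set(), [target]  # transitive fanin of target
    while stack:
        for u in fanins.get(stack.pop(), []):
            if u not in cone:
                cone.add(u)
                stack.append(u)
    cap = defaultdict(lambda: defaultdict(float))

    def add_edge(u, v, c):
        cap[u][v] += c
        cap[v][u] += 0  # make the reverse residual edge visible to the BFS

    for v in cone:
        add_edge((v, "in"), (v, "out"), 1)  # cutting v costs one LUT input
        if not fanins.get(v):
            add_edge("S", (v, "in"), INF)   # super source feeds the primary inputs
    for v in cone | {target}:
        for u in fanins.get(v, []):
            add_edge((u, "out"), "T" if v == target else (v, "in"), INF)
    flow = 0
    while True:
        parent, queue = {"S": None}, deque(["S"])  # BFS for a shortest augmenting path
        while queue and "T" not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if "T" not in parent:
            return int(flow)  # max flow = size of the minimum node cut
        path, v = [], "T"
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
        flow += push


# Slide 12 network: r = q + s', s = d', q = g' + h, d = a + b.
EXAMPLE = {"r": ["q", "s"], "s": ["d"], "q": ["g", "h"], "d": ["a", "b"]}
```

On the slide 12 network, the minimum cut below r is {q, s}, so min_lut_inputs(EXAMPLE, "r") returns 2.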

33 Taking Flip-Flops into Account
FPGA devices contain fixed resources such as flip-flops (FFs), and technology mapping should take these into account. Consider fanout nodes.

34 LUT Packing - VPACK
Seed BLE: choose the BLE with the most inputs.
Select the next BLE: the BLE that shares the most inputs and outputs with the cluster.
Continue until the cluster is full or adding any BLE would overflow the input limit I (the number of cluster inputs).
Hill climbing: exceed the I limit temporarily to find a better minimum.
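A greedy Python sketch of the clustering loop, without hill climbing. Each BLE is described by its set of input nets and its output net, a cluster holds at most n BLEs, and its input count is the number of distinct nets used inside it but not generated inside it. The data layout and the name vpack are assumptions, not the VPack tool's interface.

```python
def vpack(bles, n, i_limit):
    """Greedy VPack-style clustering sketch (no hill climbing).

    bles: dict mapping BLE name -> (set of input nets, output net).
    n: BLEs per cluster; i_limit: distinct cluster inputs allowed.
    """
    unclustered = dict(bles)
    clusters = []

    def cluster_inputs(members):
        # Nets used by the cluster that are not generated inside it.
        produced = {bles[m][1] for m in members}
        used = {net for m in members for net in bles[m][0]}
        return used - produced

    def nets(name):
        inputs, output = bles[name]
        return set(inputs) | {output}

    while unclustered:
        seed = max(unclustered, key=lambda b: len(bles[b][0]))  # most inputs
        members = [seed]
        del unclustered[seed]
        while len(members) < n:
            cluster_nets = set().union(*(nets(m) for m in members))
            legal = [b for b in unclustered
                     if len(cluster_inputs(members + [b])) <= i_limit]
            if not legal:
                break  # adding any BLE would overflow I
            best = max(legal, key=lambda b: len(nets(b) & cluster_nets))  # most shared nets
            members.append(best)
            del unclustered[best]
        clusters.append(members)
    return clusters
```

With BLEs A (inputs a, b, c; output x), B (x, d; y), C (e, f; z) and D (z, g; w), n = 2 and I = 5, the seed A attracts B through the shared net x, giving clusters [A, B] and [C, D].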

35 Summary
Many technology mapping algorithms exist to minimize delay and area. Chortle uses a dynamic programming heuristic to perform the mapping. Technology mapping is largely a solved problem, although more sophisticated techniques have been evaluated recently.

