1
Mapping into LUT Structures
Sayak Ray, Alan Mishchenko, Niklas Een, Robert Brayton
Department of EECS, UC Berkeley
Stephen Jang, Chao Chen
Agate Logic Inc.
2
Contributions (in a nutshell)
New mapping algorithm for FPGAs, which maps into LUT structures instead of LUTs
It has two applications:
(1) Improving the quality of mapping into LUTs
  Area improves by 7.4% on average
  Delay improves by 11.3% on average
(2) Improving delay for specialized hardware, which supports non-routable connections
  Delay improves by 40% on average
  With some area penalty
This is a dedicated cut-based mapping algorithm, which considers larger cuts and maps each of them into a LUT structure, as opposed to traditional mapping, which first maps into LUTs and then packs them into LUT structures.
In contribution (1), the hardware remains the same (single LUTs and routable connections between them).
In contribution (2), specialized hardware is used, which can implement the connections between LUTs inside the same LUT structure as non-routable connections.
3
LUT Structure
LUT structure: a group of LUTs connected by direct, non-routable wires
[Figure: 7-input LUT structure “44” and 10-input LUT structure “444”, with the non-routable wires between the LUTs labeled]
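The input counts quoted for "44" and "444" follow from simple arithmetic, which the sketch below spells out. It assumes that every LUT except the last one drives exactly one non-routable wire, and that each such wire occupies one input of the LUT it feeds. The function name `structure_inputs` is hypothetical and is not taken from the presentation.

```python
def structure_inputs(spec: str) -> int:
    """Count the external inputs of a LUT structure such as "44" or "444".

    Assumption: each digit is the size of one LUT, and every LUT except the
    last one feeds a single non-routable wire into another LUT of the
    structure, using up one input of that LUT.
    """
    sizes = [int(ch) for ch in spec]
    internal_wires = len(sizes) - 1
    return sum(sizes) - internal_wires


if __name__ == "__main__":
    for name in ("44", "444"):
        print(name, "->", structure_inputs(name), "inputs")
```

Under these assumptions the two structures named on the slide come out at 7 and 10 inputs, matching the figure labels.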
4
Some Terminology
Let f(X) be a Boolean function
Let X1 ⊆ X be a subset of its support
Suppose {q1(X), q2(X), …, qμ(X)} is the set of distinct cofactors of f w.r.t. X1
μ is called the column multiplicity of f w.r.t. X1
Given a partition of X into two disjoint subsets X1 and X2, we say that Ashenhurst-Curtis decomposition of f(X) exists if f(X) can be expressed as
f(X) = h(g1(X1), g2(X1), …, gk(X1), X2)
X1 : bound set
X2 : free set
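A small sketch may make column multiplicity concrete. It counts the distinct cofactors obtained by fixing the bound-set variables to every possible assignment. The truth-table layout (a list of 0/1 values indexed by minterm, with variable i stored in bit i of the index) and the name `column_multiplicity` are assumptions made for this illustration, not the authors' implementation.

```python
from itertools import product


def column_multiplicity(tt, n, bound):
    """Count distinct cofactors of an n-variable function w.r.t. a bound set.

    tt    -- list of 0/1 of length 2**n; tt[m] is the function value at
             minterm m, where variable i is bit i of m (assumed layout)
    bound -- list of variable indices forming the bound set
    """
    free = [v for v in range(n) if v not in bound]
    columns = set()
    for b_vals in product((0, 1), repeat=len(bound)):
        # Minterm bits contributed by this bound-set assignment.
        base = sum(val << var for val, var in zip(b_vals, bound))
        # The cofactor is a function of the free variables only.
        column = tuple(
            tt[base | sum(val << var for val, var in zip(f_vals, free))]
            for f_vals in product((0, 1), repeat=len(free))
        )
        columns.add(column)
    return len(columns)


if __name__ == "__main__":
    # Example function: (x0 AND x1) XOR x2.
    tt = [((m & 1) & ((m >> 1) & 1)) ^ ((m >> 2) & 1) for m in range(8)]
    print(column_multiplicity(tt, 3, [0, 1]))
    print(column_multiplicity(tt, 3, [0, 2]))
```

For the example function, the bound set {x0, x1} yields only two distinct cofactors (x2 and its complement), so a single function g1(x0, x1) suffices. The bound set {x0, x2} yields four distinct cofactors and would need two bound-set functions in a disjoint decomposition.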
5
Flow of performLutMatchingXY
1 SupportMinimize
  Removes vacuous variables
2 findOutputDecomposition
  Checks for f = x G
  Variable reordering in truth table
  Allows cases μ = 2, 3, 4
  For μ = 3, 4, consider special decomposition with one shared variable only
3 findGoodBoundSet
4 checkSpecialNonDisjoint
5 reverseVariableOrder
  A heuristic to find suitable decomposition
6 findGoodBoundSet
7 checkSpecialNonDisjoint
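The first step of the flow, removing vacuous variables, can be sketched on an explicit truth table. A variable is vacuous when flipping it never changes the function value. The code below is an illustration only: the list-based truth-table layout (variable i in bit i of the minterm index) and the names `depends_on` and `support_minimize` are assumptions, not the routine used in the actual mapper.

```python
def depends_on(tt, n, var):
    """Return True if flipping variable `var` changes the function somewhere."""
    return any(tt[m] != tt[m ^ (1 << var)] for m in range(1 << n))


def support_minimize(tt, n):
    """Drop vacuous variables from a truth table.

    Returns (new_tt, support): new_tt is the truth table over the remaining
    variables, and support lists their original indices in ascending order.
    """
    support = [v for v in range(n) if depends_on(tt, n, v)]
    new_tt = []
    for m in range(1 << len(support)):
        # Expand the compact minterm to the original variable positions;
        # vacuous variables are left at 0, which cannot affect the value.
        full = sum(((m >> j) & 1) << v for j, v in enumerate(support))
        new_tt.append(tt[full])
    return new_tt, support


if __name__ == "__main__":
    # x0 XOR x2 written as a 3-variable function; x1 is vacuous.
    tt = [(m & 1) ^ ((m >> 2) & 1) for m in range(8)]
    print(support_minimize(tt, 3))
```

Shrinking the support first means the later bound-set search works on a smaller truth table and never spends a LUT input on a variable the function does not use.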
6
Checking for XYZ decomposition
X, Y, and Z are sizes of the main/fanin LUTs
Two-step process
  Checking for XW where W = Y + Z – 2
  If it exists, then check the remainder function G for YZ
Priority cut-based technology mapper is modified to accommodate the algorithm for XY and XYZ
The results of decomposition checking are cached
  This substantially reduces runtime on large designs
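The caching of decomposition results can be pictured as a table keyed by the function being checked. The sketch below is a minimal illustration under stated assumptions: the class `DecompCache`, its key format (variable count plus truth table), and the stand-in check `toy_check` are all hypothetical, and the real decomposition check is not reproduced here.

```python
class DecompCache:
    """Memoize the outcome of a decomposition check per truth table."""

    def __init__(self, check):
        self._check = check      # callable (tt, n) -> result; supplied by the caller
        self._results = {}       # (n, truth table as tuple) -> cached result
        self.hits = 0            # number of lookups answered from the cache

    def lookup(self, tt, n):
        key = (n, tuple(tt))
        if key in self._results:
            self.hits += 1
        else:
            self._results[key] = self._check(tt, n)
        return self._results[key]


if __name__ == "__main__":
    # Stand-in for the expensive check; it only inspects the table's parity.
    def toy_check(tt, n):
        return sum(tt) % 2 == 0

    cache = DecompCache(toy_check)
    cache.lookup([0, 1, 1, 0], 2)
    cache.lookup([0, 1, 1, 0], 2)   # second call is served from the cache
    print("cache hits:", cache.hits)
```

Because a cut-based mapper encounters the same cut function many times across a large design, repeated lookups of this kind are where a cache would save work.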
7
Experiment 1
8
Experiment 2
9
Experiment 3
10
Experiment 4 – Delay Optimization
11
Experiment 5 – Delay Optimization
12
Experiment 6 – Delay Optimization
13
Experiment 7 : industrial design
14
Experiment 8 : industrial design
15
Future Work
Improving implementation
  Handling delay-driven decomposition
    Currently we ignore arrival time, and just care about detecting any decomposition
  Using semi-canonical form to increase the number of hits in the hash table of computed results
  Making truth-table based decomposition even faster
Combining Boolean decomposition into LUT structures with structural mapping of LUTs into clusters
Evaluating results after place and route
  This will be especially interesting when specialized hardware is available
16
Questions