Presentation is loading. Please wait.

Presentation is loading. Please wait.

ESE535: Electronic Design Automation

Similar presentations


Presentation on theme: "ESE535: Electronic Design Automation"— Presentation transcript:

1 ESE535: Electronic Design Automation
Day 7: February 4, 2013 Clustering (LUT Mapping, Delay) Penn ESE535 Spring DeHon

2 Today How do we map to LUTs? What happens when Lessons… IO dominates
Behavioral (C, MATLAB, …) RTL Gate Netlist Layout Masks Arch. Select Schedule FSM assign Two-level, Multilevel opt. Covering Retiming Placement Routing Today How do we map to LUTs? What happens when IO dominates Delay dominates? Lessons… for non-LUTs for delay-oriented partitioning Penn ESE535 Spring DeHon

3 LUT Mapping Problem: Map logic netlist to LUTs Old problem?
minimizing area minimizing delay Old problem? Technology mapping? (Day 2) How big is the library for K-input LUT? 22K gates in library Penn ESE535 Spring DeHon

4 Simplifying Structure
K-LUT can implement any K-input function Penn ESE535 Spring DeHon

5 Preclass: Cover in 4-LUT?
Penn ESE535 Spring DeHon

6 Preclass: Cover in 4-LUT?
Penn ESE535 Spring DeHon

7 Preclass: Cover in 4-LUT?
Penn ESE535 Spring DeHon

8 Preclass: Cover in 4-LUT?
Penn ESE535 Spring DeHon

9 Preclass: Cover in 4-LUT?
Penn ESE535 Spring DeHon

10 Cost Function Delay: number of LUTs in critical path
doesn’t say delay in LUTs or in wires does assume uniform interconnect delay Area: number of LUTs Assumes adequate interconnect to use LUTs Penn ESE535 Spring DeHon

11 LUT Mapping NP-Hard in general
Fanout-free -- can solve optimally given decomposition (but which one?) Delay optimal mapping achievable in Polynomial time Area w/ fanout NP-complete Penn ESE535 Spring DeHon

12 Preliminaries What matters/makes this interesting? Area / Delay target
Decomposition Fanout replication reconvergent Penn ESE535 Spring DeHon

13 Costs: Area vs. Delay Penn ESE535 Spring DeHon

14 Decomposition Penn ESE535 Spring DeHon

15 Decomposition Penn ESE535 Spring DeHon

16 Fanout: Replication Penn ESE535 Spring DeHon

17 Fanout: Replication Penn ESE535 Spring DeHon

18 Fanout: Reconvergence
Penn ESE535 Spring DeHon

19 Fanout: Reconvergence
Penn ESE535 Spring DeHon

20 What makes it hard? Cost does not monotonically increase as cover more of graph. Not clear when to stop? We say cost does not have a monotone property Penn ESE535 Spring DeHon

21 Preclass Revisited Penn ESE535 Spring DeHon

22 Definition Cone: set of nodes in the recursive fanin of a node
Penn ESE535 Spring DeHon

23 Example Cones Penn ESE535 Spring DeHon

24 Delay Penn ESE535 Spring DeHon

25 Delay of Preclass Circuit?
Poll: Delay of preclass circuit Penn ESE535 Spring DeHon

26 Dynamic Programming Optimal covering of a logic cone is:
Minimum cost (all possible coverings) Evaluate costs of each node based on: cover node cones covering each fanin to node cover Evaluate node costs in topological order Key: are calculating optimal solutions to subproblems only have to evaluate covering options at each node Penn ESE535 Spring DeHon

27 Flowmap Key Idea: LUT holds anything with K inputs
Use network flow to find cuts  logic can pack into LUT including reconvergence …allows replication Optimal depth arise from optimal depth solution to subproblems Penn ESE535 Spring DeHon

28 Max-Flow / Min-Cut The maximum flow in a network is equal to the minimum cut …the bottleneck We can find the mincut by computing the maxflow. Conceptually, how would we determine the maximum flow? Penn ESE535 Spring DeHon

29 MaxFlow Set all edge flows to zero While there is a path from s,t
F[u,v]=0 While there is a path from s,t (breadth-first-search) for each edge in path f[u,v]=f[u,v]+1 f[v,u]=-f[u,v] When c[v,u]=f[v,u] remove edge from search O(|E|*cutsize) Penn ESE535 Spring DeHon

30 Flowmap Delay objective: Height of node will be: Check shorter first
Examples are K=4 Delay objective: minimum height, K-feasible cut I.e. cut no more than K edges start by bounding fanin  K Height of node will be: height of predecessors or one greater than height of predecessors Check shorter first 1 1 1 1 2 Penn ESE535 Spring DeHon

31 Flowmap Construct flow problem sink  target node being mapped
source  start set (primary inputs) flow infinite into start set flow of one on each link to see if height same as predecessors collapse all predecessors of maximum height into sink (single node, cut must be above) height +1 case is trivially true Penn ESE535 Spring DeHon

32 Example Subgraph Target: K=4 1 1 1 1 2 2
Penn ESE535 Spring DeHon

33 Trivial: Height +1 1 1 1 1 2 2 3 Penn ESE535 Spring DeHon

34 Collapse at max height 1 1 1 1 2 2 Penn ESE535 Spring DeHon

35 Collapse at max height 1 1 1 1 2 2 Collapsed Node
Penn ESE535 Spring DeHon

36 Augmenting Flows Collapsed Node Penn ESE535 Spring DeHon

37 Augmenting Flows Collapsed Node Penn ESE535 Spring DeHon

38 Augmenting Flows Collapsed Node Penn ESE535 Spring DeHon

39 Augmenting Flows Collapsed Node Penn ESE535 Spring DeHon

40 Augmenting Flows Collapsed Node Collapsed Node
Penn ESE535 Spring DeHon

41 Collapse at max height: works for K=4
2 Collapsed Node 1 1 1 1 2 2 Penn ESE535 Spring DeHon

42 Collapse not work (K still 4) (different/larger graph)
1 1 1 1 2 2 Forced to label height+1 2 Penn ESE535 Spring DeHon

43 Reconvergent fanout (yet different graph) Can label at height 1 1 1 1
2 Can label at height 2 Penn ESE535 Spring DeHon

44 Flowmap Max-flow Min-cut algorithm to find cut
Use augmenting paths until discover max flow > K O(K|e|) time to discover K-feasible cut (or that does not exist) Depth identification: O(KN|e|) Penn ESE535 Spring DeHon

45 Mincut may not be unique
2 Collapsed Node 1 1 1 1 2 2 Penn ESE535 Spring DeHon

46 Flowmap Min-cut may not be unique
To minimize area achieving delay optimum find max volume min-cut Compute max flow  find min cut remove edges consumed by max flow DFS from source Compliment set is max volume set Penn ESE535 Spring DeHon

47 Collapse at max height: works for K=4
2 Collapsed Node 1 1 1 1 2 2 Penn ESE535 Spring DeHon

48 BFS from Source 2 Collapsed Node 1 1 1 1 2 2
Penn ESE535 Spring DeHon

49 BFS from Source 1 1 1 1 2 2 Collapsed Node 2
Penn ESE535 Spring DeHon

50 BFS from Source 1 1 1 1 2 2 Collapsed Node 2
Penn ESE535 Spring DeHon

51 BFS from Source 1 1 1 1 2 2 Collapsed Node 2 Does not find rest.
Penn ESE535 Spring DeHon

52 Max-Volume Mincut 1 1 1 1 2 2 Collapsed Node 2
Penn ESE535 Spring DeHon

53 Flowmap Covering from labeling is straightforward Notes:
process in reverse topological order allocate identified K-feasible cut to LUT remove node postprocess to minimize LUT count Notes: replication implicit (covered multiple places) nodes purely internal to one or more covers may not get their own LUTs Penn ESE535 Spring DeHon

54 Flowmap Roundup Label Cover Work from inputs to outputs
Find max label of predecessors Collapse new node with all predecessors at this label Can find flow cut  K? Yes: mark with label (find max-volume cut extent) No: mark with label+1 Cover Work from outputs to inputs Allocate LUT for identified cluster/cover Recurse covering selection on inputs to identified LUT Penn ESE535 Spring DeHon

55 Changing Cost Functions Now (previous was delay)
Area Changing Cost Functions Now (previous was delay) Penn ESE535 Spring DeHon

56 DF-Map Duplication Free Mapping
can find optimal area under this constraint (but optimal area may not be duplication free) [Cong+Ding, IEEE TR VLSI Sys. V2n2p137] Penn ESE535 Spring DeHon

57 Maximum Fanout Free Cones
MFFC: bit more general than trees Penn ESE535 Spring DeHon

58 MFFC Follow cone backward
end at node that fans out (has output) outside the code Penn ESE535 Spring DeHon

59 MFFC example Identify FFC I F G H C D E A B
Penn ESE535 Spring DeHon

60 MFFC example I F G H C D E A B Penn ESE535 Spring DeHon

61 DF-Map Partition into graph into MFFCs Optimally map each MFFC
In dynamic programming for each node examine each K-feasible cut note: this is very different than flowmap where only had to examine a single cut Example to follow pick cut to minimize cost 1 +  cones for fanins Penn ESE535 Spring DeHon

62 DF-Map Example Cones? Penn ESE535 Spring DeHon

63 DF-Map Example Penn ESE535 Spring DeHon

64 DF-Map Example Penn ESE535 Spring DeHon

65 DF-Map Example Penn ESE535 Spring DeHon

66 DF-Map Example Penn ESE535 Spring DeHon

67 DF-Map Example Penn ESE535 Spring DeHon

68 DF-Map Example Start mapping cone Penn ESE535 Spring DeHon

69 DF-Map Example 1 1 1 1 Penn ESE535 Spring DeHon

70 DF-Map Example 1 1 1 ? 1 Penn ESE535 Spring DeHon

71 DF-Map Example 1 1 1 ? 1 Penn ESE535 Spring DeHon

72 DF-Map Example 1 1 1 ? 1 Penn ESE535 Spring DeHon

73 DF-Map Example 1 1 1 ? 1 Penn ESE535 Spring DeHon

74 DF-Map Example 1 1 1 1 1 Penn ESE535 Spring DeHon

75 DF-Map Example Similar to previous 1 1 1 1 1 1 1
Penn ESE535 Spring DeHon

76 DF-Map Example 1 1 1 1 1 1 1 ? Penn ESE535 Spring DeHon

77 DF-Map Example 1 1 1 1 1 1 1 ? Penn ESE535 Spring DeHon

78 DF-Map Example 1 1 1 1 1 1 1 3 ? Penn ESE535 Spring DeHon

79 DF-Map Example 1 1 1 1 1 1 1 ? Penn ESE535 Spring DeHon

80 DF-Map Example 2 1 1 1 1 1 1 1 ? Penn ESE535 Spring DeHon

81 DF-Map Example 1 1 1 1 1 1 1 ? Penn ESE535 Spring DeHon

82 DF-Map Example 3 1 1 1 1 1 1 1 ? Penn ESE535 Spring DeHon

83 DF-Map Example 1 1 1 1 1 1 1 2 Penn ESE535 Spring DeHon

84 DF-Map Example 1 1 1 1 1 1 1 2 ? Penn ESE535 Spring DeHon

85 DF-Map Example 1 1 1 1 1 1 1 2 ? Penn ESE535 Spring DeHon

86 DF-Map Example 1 1 1 1 1 1 1 3 2 ? Penn ESE535 Spring DeHon

87 DF-Map Example 1 1 1 1 1 1 1 3 2 ? Penn ESE535 Spring DeHon

88 DF-Map Example 1 1 3 1 1 1 1 1 3 2 ? Penn ESE535 Spring DeHon

89 DF-Map Example 1 1 3 1 1 1 1 1 3 2 ? Penn ESE535 Spring DeHon

90 DF-Map Example 3 1 1 3 1 1 1 1 1 3 2 ? Penn ESE535 Spring DeHon

91 DF-Map Example 1 1 1 1 1 1 1 2 3 Penn ESE535 Spring DeHon

92 DF-Map Example 1 1 1 1 1 1 1 2 3 ? Penn ESE535 Spring DeHon

93 DF-Map Example 1 1 1 1 1 1 1 2 3 ? Penn ESE535 Spring DeHon

94 DF-Map Example 1 1 1 1 1 1 1 4 2 3 ? Penn ESE535 Spring DeHon

95 DF-Map Example 1 1 1 1 1 1 1 4 2 3 ? Penn ESE535 Spring DeHon

96 DF-Map Example 1 1 3 1 1 1 1 1 4 2 3 ? Penn ESE535 Spring DeHon

97 DF-Map Example 1 1 3 1 1 1 1 1 4 2 3 ? Penn ESE535 Spring DeHon

98 DF-Map Example 1 1 3 3 1 1 1 1 1 4 2 3 ? Penn ESE535 Spring DeHon

99 DF-Map Example 1 1 3 3 1 1 1 1 1 4 2 3 ? Penn ESE535 Spring DeHon

100 DF-Map Example 1 1 3 3 1 1 1 1 1 4 2 3 ? 4 Penn ESE535 Spring DeHon

101 DF-Map Example 1 1 3 3 1 1 1 1 1 4 2 3 ? 4 Penn ESE535 Spring DeHon

102 DF-Map Example 1 1 3 3 1 1 1 1 1 4 2 3 ? 3 4 Penn ESE535 Spring DeHon

103 DF-Map Example 1 1 1 1 1 1 1 2 3 3 Penn ESE535 Spring DeHon

104 DF-Map Example 1 1 1 1 1 1 1 2 3 3 ? Penn ESE535 Spring DeHon

105 DF-Map Example 1 1 1 1 1 1 1 2 3 3 ? Penn ESE535 Spring DeHon

106 DF-Map Example 1 1 1 1 1 1 1 2 3 3 9 ? Penn ESE535 Spring DeHon

107 DF-Map Example 1 1 1 1 1 1 1 2 3 3 9 ? Penn ESE535 Spring DeHon

108 DF-Map Example 1 1 1 1 1 1 1 7 2 3 3 9 ? Penn ESE535 Spring DeHon

109 DF-Map Example 1 1 1 1 1 1 1 7 2 3 3 9 ? Penn ESE535 Spring DeHon

110 DF-Map Example 1 1 1 1 1 1 1 7 5 2 3 3 9 ? Penn ESE535 Spring DeHon

111 DF-Map Example 1 1 1 1 1 1 1 2 3 3 5 Penn ESE535 Spring DeHon

112 DF-Map Example 1 1 1 1 1 1 1 2 3 3 5 Penn ESE535 Spring DeHon

113 Composing Don’t need minimum delay off the critical path
Don’t always want/need minimum delay Composite: map with flowmap Greedy decomposition of “most promising” non-critical nodes DF-map these nodes Penn ESE535 Spring DeHon

114 Variations on a Theme Penn ESE535 Spring DeHon

115 Applicability to Non-LUTs?
E.g. LUT Cascade can handle some functions of K inputs How apply? Penn ESE535 Spring DeHon

116 Adaptable to Non-LUTs Sketch:
Initial decomposition to nodes that will fit Find max volume, min-height K-feasible cut ask if logic block will cover yes  done no  exclude one (or more) nodes from block and repeat exclude == collapse into start set nodes this makes heuristic Penn ESE535 Spring DeHon

117 Partitioning? Effectively partitioning logic into clusters LUT cluster
unlimited internal “gate” capacity limited I/O (K) simple delay cost model 1 cross between clusters 0 inside cluster Penn ESE535 Spring DeHon

118 Partitioning Clustering
if strongly I/O limited, same basic idea works for partitioning to components typically: partitioning onto multiple FPGAs assumption: inter-FPGA delay >> intra-FPGA delay w/ area constraints similar to non-LUT case make min-cut will it fit? Exclude some LUTs and repeat Penn ESE535 Spring DeHon

119 Clustering for Delay W/ no IO constraint area is monotone property
DP-label forward with delays grab up largest labels (greatest delays) until fill cluster size Work backward from outputs creating clusters as needed Penn ESE535 Spring DeHon

120 Area and IO? Real problem: Doing both optimally is NP-hard
FPGA/chip partitioning Doing both optimally is NP-hard Heuristic around IO cut first should do well (e.g. non-LUT slide) [Yang and Wong, FPGA’94] Penn ESE535 Spring DeHon

121 Partitioning To date: Open/promising
primarily used for 2-level hierarchy I.e. intra-FPGA, inter-FPGA Open/promising adapt to multi-level for delay-optimized partitioning/placement on fixed-wire schedule localize critical paths to smallest subtree possible? Penn ESE535 Spring DeHon

122 Summary Optimal LUT mapping NP-hard in general
fanout, replication, …. K-LUTs makes delay optimal feasible single constraint: IO capacity technique: max-flow/min-cut Heuristic adaptations of basic idea to capacity constrained problem promising area for interconnect delay optimization Penn ESE535 Spring DeHon

123 Today’s Big Ideas: IO may be a dominant cost Exploit structure: K-LUTs
limiting capacity, delay Exploit structure: K-LUTs Mixing dominant modes multiple objectives Define optimally solvable subproblem duplication free mapping Penn ESE535 Spring DeHon

124 Admin Reading Wednesday on web
Assignment 2a was due at beginning of class Assignment 2b due next Monday Penn ESE535 Spring DeHon


Download ppt "ESE535: Electronic Design Automation"

Similar presentations


Ads by Google