Penn ESE535 Spring 2015 -- DeHon 1 ESE535: Electronic Design Automation Day 5: February 2, 2015 Clustering (LUT Mapping, Delay)

1 Penn ESE535 Spring 2015 -- DeHon 1 ESE535: Electronic Design Automation Day 5: February 2, 2015 Clustering (LUT Mapping, Delay)

2 Today
How do we map to LUTs?
What happens when
–IO dominates
–Delay dominates?
Lessons…
–for non-LUTs
–for delay-oriented partitioning
(Design flow figure: Behavioral (C, MATLAB, …), RTL, Gate Netlist, Layout, Masks; steps: Arch. Select, Schedule, FSM assign, Two-level and Multilevel opt., Covering, Retiming, Placement, Routing)

3 LUT Mapping
Problem: Map logic netlist to LUTs
–minimizing area
–minimizing delay
Old problem?
–Technology mapping? (Day 3)
–How big is the library for K-input LUT?
  2^(2^K) gates in library
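
To make the library-size point concrete, the short sketch below counts the distinct Boolean functions of K inputs. The function name `lut_library_size` is made up for this illustration; the formula 2^(2^K) is the one on the slide.

```python
def lut_library_size(k: int) -> int:
    """Number of distinct K-input Boolean functions: 2^(2^K).

    A K-input truth table has 2^K rows, and each row can independently
    hold 0 or 1, so there are 2^(2^K) possible tables.
    """
    return 2 ** (2 ** k)


if __name__ == "__main__":
    for k in range(1, 7):
        print(f"K={k}: {lut_library_size(k)} functions")
```

Even at K=4 the count is 65536, which suggests why enumerating the library as a conventional technology mapper would is unattractive and why the structure of the LUT is exploited instead.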

4 Simplifying Structure
K-LUT can implement any K-input function
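
One way to see why a K-LUT covers any K-input function is to model it as a 2^K-entry truth table indexed by the input bits. The sketch below is only an illustration; the class `LUT` and the helper `lut_from_function` are hypothetical names, and treating the first input as the most significant index bit is an arbitrary choice.

```python
from itertools import product


class LUT:
    """A K-input lookup table: one stored output bit per input combination."""

    def __init__(self, k, table):
        if len(table) != 2 ** k:
            raise ValueError("truth table must have 2^K entries")
        self.k = k
        self.table = list(table)

    def eval(self, inputs):
        # Build the table index from the input bits (first input = MSB).
        index = 0
        for bit in inputs:
            index = (index << 1) | (1 if bit else 0)
        return self.table[index]


def lut_from_function(k, fn):
    """Program a LUT by evaluating fn on every K-bit input combination."""
    rows = product((0, 1), repeat=k)  # last input varies fastest
    return LUT(k, [int(bool(fn(*bits))) for bits in rows])


if __name__ == "__main__":
    and_or = lut_from_function(4, lambda a, b, c, d: (a and b) or (c and d))
    print(and_or.eval([1, 1, 0, 0]))
```

The internal gate structure of the function never matters to the LUT, only the number of inputs, which is the property the rest of the lecture relies on.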

5-9 Preclass: Cover in 4-LUT? (figure sequence)

10 Cost Function
Delay: number of LUTs in critical path
–doesn’t say delay in LUTs or in wires
–does assume uniform interconnect delay
Area: number of LUTs
–Assumes adequate interconnect to use LUTs
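
The two cost measures above can be computed directly from a mapped netlist. The sketch below assumes a made-up representation: `lut_inputs` maps each LUT's output signal to the list of signals it reads, and any signal that is not a key is treated as a primary input.

```python
def mapping_cost(lut_inputs, outputs):
    """Return (area, delay) for a LUT-mapped netlist.

    area  = number of LUTs
    delay = number of LUTs on the longest input-to-output path
    """
    depth = {}

    def lut_depth(sig):
        if sig not in lut_inputs:  # primary input: no LUT delay
            return 0
        if sig not in depth:
            depth[sig] = 1 + max(
                (lut_depth(u) for u in lut_inputs[sig]), default=0
            )
        return depth[sig]

    area = len(lut_inputs)
    delay = max((lut_depth(o) for o in outputs), default=0)
    return area, delay


if __name__ == "__main__":
    luts = {"x": ["a", "b", "c", "d"], "y": ["x", "e"], "z": ["e", "f"]}
    print(mapping_cost(luts, ["y", "z"]))
```

Both numbers ignore wiring, which is exactly the simplification the slide states: uniform interconnect delay and adequate interconnect to use every LUT.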

11 LUT Mapping
NP-Hard in general
Fanout-free: can solve optimally given decomposition
–(but which one?)
Delay-optimal mapping achievable in polynomial time
Area w/ fanout NP-complete

12 Preliminaries
What matters/makes this interesting?
–Area / Delay target
–Decomposition
–Fanout
  replication
  reconvergent

13 Costs: Area vs. Delay (figure)

14-15 Decomposition (figure)

16-17 Fanout: Replication (figure)

18-19 Fanout: Reconvergence (figure)

20 What makes it hard?
Cost does not monotonically increase as we cover more of the graph, so it is not clear when to stop.
We say the cost does not have a monotone property.

21 Preclass Revisited (figure)

22 Definition
Cone: set of nodes in the recursive fanin of a node
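
The cone definition translates into a simple backward traversal. In the sketch below, `fanin` is an assumed dictionary from each node to the nodes feeding it, and including the root itself in the returned set is a choice made here, not something the slide specifies.

```python
def cone(fanin, root):
    """Return the set of nodes in the recursive fanin of root (root included)."""
    seen = {root}
    stack = [root]
    while stack:
        node = stack.pop()
        for pred in fanin.get(node, ()):
            if pred not in seen:
                seen.add(pred)
                stack.append(pred)
    return seen


if __name__ == "__main__":
    fanin = {"x": ["a", "b"], "y": ["x", "c"], "z": ["c", "d"]}
    print(sorted(cone(fanin, "y")))
```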

23 Example Cones (figure)

24 Delay (figure)

25 Delay of Preclass Circuit?
Poll: Delay of preclass circuit

26 Dynamic Programming
Optimal covering of a logic cone is:
–Minimum cost (all possible coverings)
Evaluate costs of each node based on:
–cover node
–cones covering each fanin to node cover
Evaluate node costs in topological order
Key: we are calculating optimal solutions to subproblems
–only have to evaluate covering options at each node

27 Flowmap
Key Idea:
–LUT holds anything with K inputs
–Use network flow to find cuts → logic can pack into LUT, including reconvergence
  …allows replication
–Optimal depth arises from optimal-depth solutions to subproblems

28 Max-Flow / Min-Cut
The maximum flow in a network is equal to the minimum cut
–…the bottleneck
We can find the mincut by computing the maxflow.
Conceptually, how would we determine the maximum flow?

29-30 Example Flow cut (figure: network from s to t)
Visually, what is min-cut?

31 MaxFlow
Set all edge flows to zero
–f[u,v]=0
While there is a path from s to t
–(breadth-first search)
–for each edge in path: f[u,v]=f[u,v]+1
–f[v,u]=-f[u,v]
–when c[v,u]=f[v,u], remove edge from search
O(|E|*cutsize)
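
The loop on this slide can be sketched as a BFS augmenting-path max flow over unit-capacity edges. This is a minimal illustration rather than the course's code: the function name `max_flow`, the edge-list input, and the optional `limit` argument (stop early once that much flow is found) are assumptions introduced here.

```python
from collections import defaultdict, deque


def max_flow(edges, s, t, limit=None):
    """Unit-capacity max flow from s to t using BFS augmenting paths."""
    cap = defaultdict(int)
    adj = defaultdict(set)
    for u, v in edges:
        cap[(u, v)] += 1          # parallel edges add capacity
        adj[u].add(v)
        adj[v].add(u)             # reverse direction for residual search
    flow = defaultdict(int)       # all edge flows start at zero
    total = 0
    while limit is None or total < limit:
        # Breadth-first search for a path with remaining capacity.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] - flow[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break                 # no augmenting path: flow is maximum
        # Push one unit along the path; f[v,u] = -f[u,v].
        v = t
        while parent[v] is not None:
            u = parent[v]
            flow[(u, v)] += 1
            flow[(v, u)] -= 1
            v = u
        total += 1
    return total


if __name__ == "__main__":
    edges = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]
    print(max_flow(edges, "s", "t"))
```

Each augmentation costs one BFS, O(|E|), and adds one unit of flow, which is where the O(|E|*cutsize) bound comes from. The `limit` argument reflects how the later slides use the routine: only the question "is the cut at most K?" matters, so the search can stop after K+1 paths.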

32-36 Example Flow cut (figure sequence: network with nodes A through J between s and t; repeatedly "Find a path?" and augment)

37 Flowmap
Delay objective:
–minimum height, K-feasible cut
–i.e. cut no more than K edges
–start by bounding fanin ≤ K
Height of node will be:
–height of predecessors, or
–one greater than height of predecessors
Check shorter first
(Figure: example with node heights labeled; examples are K=4)

38 Flowmap
Construct flow problem
–sink: target node being mapped
–source: start set (primary inputs)
–flow infinite into start set
–flow of one on each link
–to see if height same as predecessors
  collapse all predecessors of maximum height into sink (single node, cut must be above)
  height+1 case is trivially true

39 Example Subgraph (figure; target K=4)

40 Trivial: Height+1 (figure)

41-42 Collapse at max height (figure; collapsed node marked)

43-47 Augmenting Flows (figure sequence; collapsed node marked)

48 Collapse at max height: works for K=4 (figure)

49 Collapse does not work (K still 4; different, larger graph): forced to label height+1 (figure)

50 Reconvergent fanout (yet another graph): can label at height 1 (figure)

51 Flowmap
Max-flow Min-cut algorithm to find cut
Use augmenting paths until discover max flow > K
O(K|E|) time to discover K-feasible cut
–(or that one does not exist)
Depth identification: O(KN|E|)

52 Mincut may not be unique (figure; collapsed node marked)

53 Flowmap
Min-cut may not be unique
To minimize area achieving delay optimum
–find max volume min-cut
Compute max flow → find min cut
–remove edges consumed by max flow
–DFS from source
–Complement set is max volume set
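
The max-volume step can be sketched as follows: run max flow, then search from the source through edges that still have residual capacity; everything not reached belongs to the sink side. The function name `max_volume_sink_side` and the unit-capacity edge-list input are assumptions for this illustration, and the search here is breadth-first (the choice of DFS or BFS does not change the reachable set).

```python
from collections import defaultdict, deque


def max_volume_sink_side(edges, s, t):
    """Return (max flow, sink-side node set of the min cut nearest the source)."""
    res = defaultdict(int)        # residual capacities
    adj = defaultdict(set)
    nodes = {s, t}
    for u, v in edges:
        res[(u, v)] += 1
        adj[u].add(v)
        adj[v].add(u)
        nodes.update((u, v))

    def search():
        """Nodes reachable from s through edges with residual capacity."""
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and res[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = search()
        if t not in parent:
            break                 # parent now holds the source side
        v = t
        while parent[v] is not None:
            u = parent[v]
            res[(u, v)] -= 1
            res[(v, u)] += 1
            v = u
        flow += 1
    return flow, nodes - set(parent)


if __name__ == "__main__":
    chain = [("s", "a"), ("a", "b"), ("b", "t")]
    print(max_volume_sink_side(chain, "s", "t"))
```

On the three-edge chain every edge is a minimum cut; the search from the source stops immediately, so the reported sink side is the largest one, which corresponds to packing as much logic as possible into the LUT being formed.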

54 Collapse at max height: works for K=4 (figure)

55-58 BFS from Source (figure sequence; the search does not find the rest of the graph)

59 Max-Volume Mincut (figure)

60 Flowmap
Covering from labeling is straightforward
–process in reverse topological order
–allocate identified K-feasible cut to LUT
–remove node
–postprocess to minimize LUT count
Notes:
–replication implicit (covered multiple places)
–nodes purely internal to one or more covers may not get their own LUTs

61 Flowmap Roundup
Label
–Work from inputs to outputs
–Find max label of predecessors
–Collapse new node with all predecessors at this label
–Can find flow cut ≤ K?
  Yes: mark with label (find max-volume cut extent)
  No: mark with label+1
Cover
–Work from outputs to inputs
–Allocate LUT for identified cluster/cover
–Recurse covering selection on inputs to identified LUT
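
The labeling half of the roundup might be sketched as below. Several things here are assumptions rather than the slides' method: the names `flowmap_labels` and `bounded_flow`, the `fanin` dictionary input, the requirement that `order` is a topological order with primary inputs first, and the use of node splitting (each uncollapsed node gets a capacity-1 internal edge) so that the cut counts LUT input signals, where the slides simplify to unit capacity on each link. Edges that must not be cut get capacity K+1 as a stand-in for infinity.

```python
from collections import defaultdict, deque


def cone(fanin, root):
    seen, stack = {root}, [root]
    while stack:
        for u in fanin.get(stack.pop(), ()):
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen


def bounded_flow(cap, s, t, limit):
    """Augment one unit at a time; stop once the flow exceeds limit."""
    adj = defaultdict(set)
    for u, v in list(cap):
        adj[u].add(v)
        adj[v].add(u)
    res = defaultdict(int, cap)
    total = 0
    while total <= limit:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and res[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        v = t
        while parent[v] is not None:
            u = parent[v]
            res[(u, v)] -= 1
            res[(v, u)] += 1
            v = u
        total += 1
    return total


def flowmap_labels(fanin, order, k):
    """Label each node with its minimum LUT depth (primary inputs = 0)."""
    label = {}
    for t in order:
        if not fanin.get(t):
            label[t] = 0
            continue
        p = max(label[u] for u in fanin[t])
        c = cone(fanin, t)
        # Collapse t and every cone node at the max label into the sink.
        sink = {u for u in c if label.get(u) == p} | {t}
        big, cap = k + 1, defaultdict(int)

        def inn(u):
            return "T" if u in sink else (u, "in")

        def out(u):
            return "T" if u in sink else (u, "out")

        for v in c:
            if v not in sink:
                cap[((v, "in"), (v, "out"))] += 1   # cutting v costs 1
            if not fanin.get(v):
                cap[("S", inn(v))] += big           # source feeds inputs
            for u in fanin.get(v, ()):
                if out(u) != inn(v):
                    cap[(out(u), inn(v))] += big
        feasible = bounded_flow(cap, "S", "T", k) <= k
        label[t] = p if feasible else p + 1
    return label


if __name__ == "__main__":
    fanin = {"x": ["a", "b"], "y": ["x", "c"], "z": ["y", "d"], "w": ["z", "e"]}
    order = ["a", "b", "c", "d", "e", "x", "y", "z", "w"]
    print(flowmap_labels(fanin, order, 4))
```

In the usage example, x, y and z together read only four primary inputs, so all three can share label 1 with K=4, while w needs a fifth input and is pushed to label 2. The covering pass (working back from the outputs and allocating one LUT per chosen cut) is not shown.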

62 Area
Changing cost functions now (previous was delay)

63 DF-Map
Duplication Free Mapping
–can find optimal area under this constraint
–(but optimal area may not be duplication free)
[Cong+Ding, IEEE TR VLSI Sys. V2n2p137]

64 Maximum Fanout Free Cones
MFFC: bit more general than trees

65 MFFC
Follow cone backward; end at a node that fans out (has an output) outside the cone
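
The backward-following rule can be sketched as a fixed-point computation: a predecessor joins the cone only when all of its fanouts are already inside. The names `mffc` and `build_fanout` are made up here, primary inputs (nodes with no fanin) are left out of the cone by assumption, and the sketch assumes only the root is observed from outside, so other primary outputs are not modeled.

```python
from collections import defaultdict


def build_fanout(fanin):
    """Invert a fanin dictionary into a fanout dictionary."""
    fanout = defaultdict(list)
    for node, preds in fanin.items():
        for p in preds:
            fanout[p].append(node)
    return fanout


def mffc(fanin, root):
    """Maximum fanout-free cone rooted at root (logic nodes only)."""
    fanout = build_fanout(fanin)
    members = {root}
    changed = True
    while changed:
        changed = False
        for v in list(members):
            for u in fanin.get(v, ()):
                if u in members or not fanin.get(u):
                    continue  # already in, or a primary input
                # u may join only if every fanout of u stays in the cone.
                if all(w in members for w in fanout[u]):
                    members.add(u)
                    changed = True
    return members


if __name__ == "__main__":
    fanin = {"x": ["a", "b"], "y": ["x", "c"], "z": ["x", "d"], "r": ["y", "z"]}
    print(sorted(mffc(fanin, "r")), sorted(mffc(fanin, "y")))
```

In the example, x fans out to both y and z. It belongs to the MFFC of r, where both fanouts reconverge, but not to the MFFC of y, because z lies outside that cone.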

66-67 MFFC example (figure: graph with nodes A through I; identify FFC)

68 DF-Map
Partition graph into MFFCs
Optimally map each MFFC
In dynamic programming
–for each node examine each K-feasible cut
–note: this is very different from flowmap, where we only had to examine a single cut
–example to follow
Pick cut to minimize cost
–1 + Σ cones for fanins
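
The per-node recurrence (cost = 1 + sum of the costs of the cones feeding the chosen cut) can be illustrated on the simplest case, a fanout-free tree. This is a sketch under that restriction, not DF-Map itself: the name `tree_area_map` is hypothetical, every node is assumed to have at most K fanins, and reconvergence inside an MFFC, where summing cone costs could double-count shared logic, is not handled.

```python
from itertools import product


def tree_area_map(fanin, root, k):
    """Minimum LUT count to cover the tree rooted at root with K-LUTs."""
    cost = {}   # best LUT count for the cone rooted at each node
    cuts = {}   # all K-feasible cuts found for each node

    def visit(v):
        if v in cost:
            return
        preds = fanin.get(v, ())
        if not preds:
            cost[v], cuts[v] = 0, []     # primary input: needs no LUT
            return
        options = []
        for u in preds:
            visit(u)
            # Either stop the cut at u, or push through u to one of its cuts.
            options.append([frozenset([u])] + cuts[u])
        candidates = set()
        for combo in product(*options):
            cut = frozenset().union(*combo)
            if len(cut) <= k:
                candidates.add(cut)
        cuts[v] = list(candidates)
        # One LUT for v plus the cones feeding each cut signal.
        cost[v] = min(1 + sum(cost[u] for u in cut) for cut in cuts[v])

    visit(root)
    return cost[root]


if __name__ == "__main__":
    fanin = {"x": ["a", "b"], "y": ["c", "d"], "z": ["x", "y"]}
    for k in (2, 3, 4):
        print(k, tree_area_map(fanin, "z", k))
```

Unlike the depth labeling, where one flow computation per node was enough, every K-feasible cut is examined here because area cost is not monotone in how much of the graph a LUT covers.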

69 DF-Map Example: Cones? (figure)

70-74 DF-Map Example (figure sequence)

75 DF-Map Example: Start mapping cone (figure)

76-119 DF-Map Example (figure sequence: node costs are filled in one node at a time; at each node the candidate cuts are costed in turn and the minimum is kept; the candidate costs shown at the last node are 9, 7 and 5, and it is labeled 5)

120 Composing
Don’t need minimum delay off the critical path
Don’t always want/need minimum delay
Composite:
–map with flowmap
–Greedy decomposition of “most promising” non-critical nodes
–DF-map these nodes

121 Penn ESE535 Spring 2015 -- DeHon 121 Variations on a Theme

122 Applicability to Non-LUTs?
E.g. LUT Cascade
–can handle some functions of K inputs
How apply?

123 Adaptable to Non-LUTs
Sketch:
–Initial decomposition to nodes that will fit
–Find max volume, min-height K-feasible cut
–ask if logic block will cover
  yes → done
  no → exclude one (or more) nodes from block and repeat
  –exclude == collapse into start set nodes
  –this makes it a heuristic

124 Partitioning?
Effectively partitioning logic into clusters
–LUT cluster
  unlimited internal “gate” capacity
  limited I/O (K)
  simple delay cost model
  –1 to cross between clusters
  –0 inside cluster

125 Partitioning
Clustering
–if strongly I/O limited, same basic idea works for partitioning to components
  typically: partitioning onto multiple FPGAs
  assumption: inter-FPGA delay >> intra-FPGA delay
–w/ area constraints
  similar to non-LUT case
  –make min-cut
  –will it fit?
  –exclude some LUTs and repeat

126 Clustering for Delay
With no IO constraint, area has the monotone property
DP-label forward with delays
–grab up largest labels (greatest delays) until cluster size is filled
Work backward from outputs creating clusters as needed

127 Area and IO?
Real problem:
–FPGA/chip partitioning
Doing both optimally is NP-hard
Heuristic around IO cut first should do well
–(e.g. non-LUT slide)
–[Yang and Wong, FPGA’94]

128 Partitioning
To date:
–primarily used for 2-level hierarchy
  i.e. intra-FPGA, inter-FPGA
Open/promising
–adapt to multi-level for delay-optimized partitioning/placement on fixed-wire schedule
  localize critical paths to smallest subtree possible?

129 Summary
Optimal LUT mapping NP-hard in general
–fanout, replication, …
K-LUTs make delay-optimal mapping feasible
–single constraint: IO capacity
–technique: max-flow/min-cut
Heuristic adaptations of basic idea to capacity-constrained problem
–promising area for interconnect delay optimization

130 Today’s Big Ideas:
IO may be a dominant cost
–limiting capacity, delay
Exploit structure: K-LUTs
Mixing dominant modes
–multiple objectives
Define optimally solvable subproblem
–duplication free mapping

131 Admin
Reading Wednesday on web
Assignment 3 due Thursday

