Presentation is loading. Please wait.

Presentation is loading. Please wait.

Penn ESE535 Spring 2008 -- DeHon 1 ESE535: Electronic Design Automation Day 3: January 27, 2008 Clustering (LUT Mapping, Delay) Please work preclass example.

Similar presentations


Presentation on theme: "Penn ESE535 Spring 2008 -- DeHon 1 ESE535: Electronic Design Automation Day 3: January 27, 2008 Clustering (LUT Mapping, Delay) Please work preclass example."— Presentation transcript:

1 Penn ESE535 Spring 2008 -- DeHon 1 ESE535: Electronic Design Automation Day 3: January 27, 2008 Clustering (LUT Mapping, Delay) Please work preclass example before we start lecture.

2 Penn ESE535 Spring 2008 -- DeHon 2 Mapping Results What delay were you able to achieve? –What area? How did you approach mapping?

3 Penn ESE535 Spring 2008 -- DeHon 3 Background Question Experience with FPGAs? What provides programmable logic in an FPGA?

4 Penn ESE535 Spring 2008 -- DeHon 4 Today How do we map to LUTs? What happens when delay dominates? Lessons… –for non-LUTs –for delay-oriented partitioning

5 Penn ESE535 Spring 2008 -- DeHon 5 LUT Mapping Problem: Map logic netlist to LUTs –minimizing area –minimizing delay Old problem? –Technology mapping? (Last Wednesday) –How big is the library? 2 2 K gates in library

6 Penn ESE535 Spring 2008 -- DeHon 6 Simplifying Structure K-LUT can implement any K-input function

7 Penn ESE535 Spring 2008 -- DeHon 7 Cost Function Delay: number of LUTs in critical path –doesn’t say delay in LUTs or in wires –does assume uniform interconnect delay Area: number of LUTs –Assumes adequate interconnect to use LUTs

8 Penn ESE535 Spring 2008 -- DeHon 8 LUT Mapping NP-Hard in general Fanout-free -- can solve optimally given decomposition –(but which one?) Delay optimal mapping achievable in Polynomial time Area w/ fanout NP-complete

9 Penn ESE535 Spring 2008 -- DeHon 9 Preliminaries What matters/makes this interesting? –Area / Delay target –Decomposition –Fanout replication reconvergent

10 Penn ESE535 Spring 2008 -- DeHon 10 Area vs. Delay

11 Penn ESE535 Spring 2008 -- DeHon 11 Decomposition

12 Penn ESE535 Spring 2008 -- DeHon 12 Decomposition

13 Penn ESE535 Spring 2008 -- DeHon 13 Fanout: Replication

14 Penn ESE535 Spring 2008 -- DeHon 14 Fanout: Replication

15 Penn ESE535 Spring 2008 -- DeHon 15 Fanout: Reconvergence

16 Penn ESE535 Spring 2008 -- DeHon 16 Fanout: Reconvergence

17 Penn ESE535 Spring 2008 -- DeHon 17 Monotone Property Does cost function increase monotonicly as more of the graph is included? (do all subsets have property) –gate count? –I/o? Important? –How far back do we need to search?

18 Penn ESE535 Spring 2008 -- DeHon 18 Definition Cone: set of nodes in the recursive fanin of a node

19 Penn ESE535 Spring 2008 -- DeHon 19 Example Cones

20 Penn ESE535 Spring 2008 -- DeHon 20 Delay

21 Penn ESE535 Spring 2008 -- DeHon 21 Dynamic Programming Optimal covering of a logic cone is: –Minimum cost (all possible coverings) Evaluate costs of each node based on: –cover node –cones covering each fanin to node cover Evaluate node costs in topological order Key: are calculating optimal solutions to subproblems –only have to evaluate covering options at each node

22 Penn ESE535 Spring 2008 -- DeHon 22 Flowmap Key Idea: –LUT holds anything with K inputs –Use network flow to find cuts  logic can pack into LUT including reconvergence …allows replication –Optimal depth arise from optimal depth solution to subproblems

23 Penn ESE535 Spring 2008 -- DeHon 23 MaxFlow Set all edge flows to zero –F[u,v]=0 While there is a path from s,t –(breadth-first-search) –for each edge in path f[u,v]=f[u,v]+1 –f[v,u]=-f[u,v] –When c[v,u]=f[v,u] remove edge from search O(|E|*cutsize) [Our problem simpler than general case CLRS]

24 Penn ESE535 Spring 2008 -- DeHon 24 Delay objective: –minimum height, K-feasible cut –I.e. cut no more than K edges –start by bounding fanin  K Height of node will be: –height of predecessors or –one greater than height of predecessors Check shorter first Flowmap 11 11 2

25 Penn ESE535 Spring 2008 -- DeHon 25 Flowmap Construct flow problem –sink  target node being mapped –source  start set (primary inputs) –flow infinite into start set –flow of one on each link –to see if height same as predecessors collapse all predecessors of maximum height into sink (single node, cut must be above) height +1 case is trivially true

26 Penn ESE535 Spring 2008 -- DeHon 26 11 2 2 1 1 Example Subgraph Target: K=4

27 Penn ESE535 Spring 2008 -- DeHon 27 3 11 2 2 1 1 Trivial: Height +1

28 Penn ESE535 Spring 2008 -- DeHon 28 11 2 2 1 1 Collapse at max height

29 Penn ESE535 Spring 2008 -- DeHon 29 2 11 2 2 1 1 Collapse not work (different/larger graph) Forced to label height+1

30 Penn ESE535 Spring 2008 -- DeHon 30 2 1 1 1 2 1 Reconvergent fanout (different/larger graph) Can label at height 1

31 Penn ESE535 Spring 2008 -- DeHon 31 Flowmap Max-flow Min-cut algorithm to find cut Use augmenting paths to until discover max flow > K O(K|e|) time to discover K-feasible cut –(or that does not exist) Depth identification: O(KN|e|)

32 Penn ESE535 Spring 2008 -- DeHon 32 Flowmap Min-cut may not be unique

33 Penn ESE535 Spring 2008 -- DeHon 33 Two 3-cuts

34 Penn ESE535 Spring 2008 -- DeHon 34 Flowmap Min-cut may not be unique To minimize area achieving delay optimum –find max volume min-cut Compute max flow  find min cut remove edges consumed by max flow DFS from source Compliment set is max volume set

35 Penn ESE535 Spring 2008 -- DeHon 35 Graph

36 Penn ESE535 Spring 2008 -- DeHon 36 Graph: maxflow (K=3)

37 Penn ESE535 Spring 2008 -- DeHon 37 Graph: BFS source Reachable

38 Penn ESE535 Spring 2008 -- DeHon 38 Max Volume min-cut

39 Penn ESE535 Spring 2008 -- DeHon 39 Flowmap Covering from labeling is straightforward –process in reverse topological order –allocate identified K-feasible cut to LUT –remove node –postprocess to minimize LUT count Notes: –replication implicit (covered multiple places) –nodes purely internal to one or more covers may not get their own LUTs

40 Penn ESE535 Spring 2008 -- DeHon 40 Flowmap Roundup Label –Work from inputs to outputs –Find max label of predecessors –Collapse new node with all predecessors at this label –Can find flow cut  K? Yes: mark with label (find max-volume cut extent) No: mark with label+1 Cover –Work from outputs to inputs –Allocate LUT for identified cluster/cover –Recurse covering selection on inputs to identified LUT

41 Penn ESE535 Spring 2008 -- DeHon 41 Area

42 Penn ESE535 Spring 2008 -- DeHon 42 DF-Map Duplication Free Mapping –can find optimal area under this constraint –(but optimal area may not be duplication free) [Cong+Ding, IEEE TR VLSI Sys. V2n2p137]

43 Penn ESE535 Spring 2008 -- DeHon 43 Maximum Fanout Free Cones MFFC: bit more general than trees

44 Penn ESE535 Spring 2008 -- DeHon 44 MFFC Follow cone backward end at node that fans out (has output) outside the code

45 Penn ESE535 Spring 2008 -- DeHon 45 MFFC example

46 Penn ESE535 Spring 2008 -- DeHon 46 MFFC example

47 Penn ESE535 Spring 2008 -- DeHon 47 DF-Map Partition into graph into MFFCs Optimally map each MFFC In dynamic programming –for each node examine each K-feasible cut –note: this is very different than flowmap where only had to examine a single cut pick cut to minimize cost –1 +  MFFCs for fanins

48 Penn ESE535 Spring 2008 -- DeHon 48 DF-Map Example Cones?

49 Penn ESE535 Spring 2008 -- DeHon 49 DF-Map Example

50 Penn ESE535 Spring 2008 -- DeHon 50 DF-Map Example

51 Penn ESE535 Spring 2008 -- DeHon 51 DF-Map Example

52 Penn ESE535 Spring 2008 -- DeHon 52 DF-Map Example

53 Penn ESE535 Spring 2008 -- DeHon 53 DF-Map Example

54 Penn ESE535 Spring 2008 -- DeHon 54 DF-Map Example Start mapping cone

55 Penn ESE535 Spring 2008 -- DeHon 55 DF-Map Example 11 11

56 Penn ESE535 Spring 2008 -- DeHon 56 DF-Map Example ? 11 11

57 Penn ESE535 Spring 2008 -- DeHon 57 DF-Map Example ? 11 11

58 Penn ESE535 Spring 2008 -- DeHon 58 DF-Map Example ? 11 11

59 Penn ESE535 Spring 2008 -- DeHon 59 DF-Map Example 1 11 11

60 Penn ESE535 Spring 2008 -- DeHon 60 DF-Map Example 11 1 1 1 11 Similar to previous

61 Penn ESE535 Spring 2008 -- DeHon 61 DF-Map Example ? 11 1 1 1 11

62 Penn ESE535 Spring 2008 -- DeHon 62 DF-Map Example ? 11 1 1 1 11

63 Penn ESE535 Spring 2008 -- DeHon 63 DF-Map Example ? 11 1 1 1 11 3

64 Penn ESE535 Spring 2008 -- DeHon 64 DF-Map Example ? 11 1 1 1 11

65 Penn ESE535 Spring 2008 -- DeHon 65 DF-Map Example ? 11 1 1 1 11 2

66 Penn ESE535 Spring 2008 -- DeHon 66 DF-Map Example ? 11 1 1 1 11

67 Penn ESE535 Spring 2008 -- DeHon 67 DF-Map Example ? 11 1 1 1 11 3

68 Penn ESE535 Spring 2008 -- DeHon 68 DF-Map Example 2 11 1 1 1 11

69 Penn ESE535 Spring 2008 -- DeHon 69 DF-Map Example 2? 11 1 1 1 11

70 Penn ESE535 Spring 2008 -- DeHon 70 DF-Map Example 2? 11 1 1 1 11

71 Penn ESE535 Spring 2008 -- DeHon 71 DF-Map Example 2? 11 1 1 1 11 3

72 Penn ESE535 Spring 2008 -- DeHon 72 DF-Map Example 2? 11 1 1 1 11 3

73 Penn ESE535 Spring 2008 -- DeHon 73 DF-Map Example 2? 11 1 1 1 11 3 3

74 Penn ESE535 Spring 2008 -- DeHon 74 DF-Map Example 2? 11 1 1 1 11 3 3

75 Penn ESE535 Spring 2008 -- DeHon 75 DF-Map Example 2? 11 1 1 1 11 3 3 3

76 Penn ESE535 Spring 2008 -- DeHon 76 DF-Map Example 23 11 1 1 1 11

77 Penn ESE535 Spring 2008 -- DeHon 77 DF-Map Example 23 11 1 1 ? 1 11

78 Penn ESE535 Spring 2008 -- DeHon 78 DF-Map Example 23 11 1 1 ? 1 11

79 Penn ESE535 Spring 2008 -- DeHon 79 DF-Map Example 23 11 1 1 ? 1 11 4

80 Penn ESE535 Spring 2008 -- DeHon 80 DF-Map Example 23 11 1 1 ? 1 11 4

81 Penn ESE535 Spring 2008 -- DeHon 81 DF-Map Example 23 11 1 1 ? 1 11 4 3

82 Penn ESE535 Spring 2008 -- DeHon 82 DF-Map Example 23 11 1 1 ? 1 11 4 3

83 Penn ESE535 Spring 2008 -- DeHon 83 DF-Map Example 23 11 1 1 ? 1 11 4 3 3

84 Penn ESE535 Spring 2008 -- DeHon 84 DF-Map Example 23 11 1 1 ? 1 11 4 3 3

85 Penn ESE535 Spring 2008 -- DeHon 85 DF-Map Example 23 11 1 1 ? 1 11 4 3 3 4

86 Penn ESE535 Spring 2008 -- DeHon 86 DF-Map Example 23 11 1 1 ? 1 11 4 3 3 4

87 Penn ESE535 Spring 2008 -- DeHon 87 DF-Map Example 23 11 1 1 ? 1 11 4 3 3 4 3

88 Penn ESE535 Spring 2008 -- DeHon 88 DF-Map Example 23 11 1 1 3 1 11

89 Penn ESE535 Spring 2008 -- DeHon 89 DF-Map Example ? 23 11 1 1 3 1 11

90 Penn ESE535 Spring 2008 -- DeHon 90 DF-Map Example ? 23 11 1 1 3 1 11

91 Penn ESE535 Spring 2008 -- DeHon 91 DF-Map Example ? 23 11 1 1 3 1 11 9

92 Penn ESE535 Spring 2008 -- DeHon 92 DF-Map Example ? 23 11 1 1 3 1 11 9 7

93 Penn ESE535 Spring 2008 -- DeHon 93 DF-Map Example ? 23 11 1 1 3 1 11 9 7

94 Penn ESE535 Spring 2008 -- DeHon 94 DF-Map Example ? 23 11 1 1 3 1 11 9 7 5

95 Penn ESE535 Spring 2008 -- DeHon 95 DF-Map Example 5 23 11 1 1 3 1 11

96 Penn ESE535 Spring 2008 -- DeHon 96 DF-Map Example 5 23 11 1 1 3 1 11

97 Penn ESE535 Spring 2008 -- DeHon 97 Composing Don’t need minimum delay off the critical path Don’t always want/need minimum delay Composite: –map with flowmap –Greedy decomposition of “most promising” non-critical nodes –DF-map these nodes

98 Penn ESE535 Spring 2008 -- DeHon 98 Variations on a Theme

99 Penn ESE535 Spring 2008 -- DeHon 99 Applicability to Non-LUTs? E.g. LUT Cascade –can handle some functions of K inputs How apply?

100 Penn ESE535 Spring 2008 -- DeHon 100 Adaptable to Non-LUTs Sketch: –Initial decomposition to nodes that will fit –Find max volume, min-height K-feasible cut –ask if logic block will cover yes  done no  exclude one (or more) nodes from block and repeat –exclude == collapse into start set nodes –this makes heuristic

101 Penn ESE535 Spring 2008 -- DeHon 101 Partitioning? Effectively partitioning logic into clusters –LUT cluster unlimited internal “gate” capacity limited I/O (K) simple delay cost model –1 cross between clusters –0 inside cluster

102 Penn ESE535 Spring 2008 -- DeHon 102 Partitioning Clustering –if strongly I/O limited, same basic idea works for partitioning to components typically: partitioning onto multiple FPGAs assumption: inter-FPGA delay >> intra-FPGA delay –w/ area constraints similar to non-LUT case –make min-cut –will it fit? –Exclude some LUTs and repeat

103 Penn ESE535 Spring 2008 -- DeHon 103 Clustering for Delay W/ no IO constraint area is monotone property DP-label forward with delays –grab up largest labels (greatest delays) until fill cluster size Work backward from outputs creating clusters as needed

104 Penn ESE535 Spring 2008 -- DeHon 104 Area and IO? Real problem: –FPGA/chip partitioning Doing both optimally is NP-hard Heuristic around IO cut first should do well –(e.g. non-LUT slide) –[Yang and Wong, FPGA’94]

105 Penn ESE535 Spring 2008 -- DeHon 105 Partitioning To date: –primarily used for 2-level hierarchy I.e. intra-FPGA, inter-FPGA Open/promising –adapt to multi-level for delay-optimized partitioning/placement on fixed-wire schedule localize critical paths to smallest subtree possible?

106 Penn ESE535 Spring 2008 -- DeHon 106 Summary Optimal LUT mapping NP-hard in general –fanout, replication, …. K-LUTs makes delay optimal feasible –single constraint: IO capacity –technique: max-flow/min-cut Heuristic adaptations of basic idea to capacity constrained problem –promising area for interconnect delay optimization

107 Penn ESE535 Spring 2008 -- DeHon 107 Today’s Big Ideas: IO may be a dominant cost –limiting capacity, delay Exploit structure: K-LUTs Mixing dominant modes –multiple objectives Define optimally solvable subproblem –duplication free mapping

108 Penn ESE535 Spring 2008 -- DeHon 108


Download ppt "Penn ESE535 Spring 2008 -- DeHon 1 ESE535: Electronic Design Automation Day 3: January 27, 2008 Clustering (LUT Mapping, Delay) Please work preclass example."

Similar presentations


Ads by Google