Resource Sharing and Binding


1 Resource Sharing and Binding
A SoC Design Automation School of EECS Seoul National University

2 Data-Dominated Circuits
Resource sharing in non-hierarchical CDFG
Compatibility graph G+(V, E):
E = {(vi, vj) | t(vi) = t(vj) and ((ti + di ≤ tj) or (tj + dj ≤ ti)), i, j = 1, ..., nops}
t(vi) = t(vj): same type; (ti + di ≤ tj) or (tj + dj ≤ ti): no concurrency
Transitive orientation property --> G+(V, E) is a comparability graph --> minimum clique partitioning in polynomial time
[Figure: scheduled sequencing graph (NOP v0, operations v1 to v11, NOP vn; C-steps 1 to 4) and its compatibility graph over the Mult and ALU operations 1 to 11]
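The edge rule of the compatibility graph can be sketched in a few lines of Python. This is a minimal illustration only: the schedule in `ops` is a hypothetical one (it is not read off the slide's figure), and the names `ops`, `compatible`, and `compatibility_edges` are assumptions made for this sketch.

```python
from itertools import combinations

# Hypothetical schedule (not the slide's figure): name -> (type, start step t, delay d).
ops = {
    "v1": ("mult", 1, 1),
    "v2": ("mult", 1, 1),
    "v3": ("mult", 2, 1),
    "v4": ("alu", 3, 1),
    "v5": ("alu", 4, 1),
    "v10": ("alu", 1, 1),
}

def compatible(a, b):
    """Same type, and one operation finishes before the other starts."""
    (type_a, t_a, d_a), (type_b, t_b, d_b) = a, b
    return type_a == type_b and (t_a + d_a <= t_b or t_b + d_b <= t_a)

def compatibility_edges(ops):
    """Return the edge set E of the compatibility graph as name pairs."""
    return {
        (u, v)
        for u, v in combinations(sorted(ops), 2)
        if compatible(ops[u], ops[v])
    }

edges = compatibility_edges(ops)
```

In this toy schedule v1 and v2 start in the same step, so they get no edge, while v3 is compatible with both of them.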

3 Data-Dominated Circuits
Conflict graph G-(V, E): complement of G+(V, E)
Vertex coloring: same color --> no conflict --> can share one resource
Chromatic number of G-(V, E) = clique cover number of G+(V, E)
[Figure: scheduled sequencing graph (C-steps 1 to 4) and the conflict graph of operations 1 to 11]

4 Data-Dominated Circuits
Conflict graph G-(V, E) as an interval graph
Execution interval [ti, ti + di - 1]
Intersection between two intervals --> edge
Minimum vertex coloring in polynomial time (left edge algorithm)
[Figure: scheduled sequencing graph (C-steps 1 to 4) and the execution intervals of operations 1 to 11]
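A minimal sketch of the left edge algorithm on execution intervals follows. It assumes all operations are of one resource type; the interval data and the function name `left_edge` are hypothetical and chosen only for illustration.

```python
def left_edge(intervals):
    """Bind intervals to resources, filling one resource at a time.

    intervals: dict name -> (start, end), both inclusive,
               i.e. the execution interval [t, t + d - 1].
    Returns a dict name -> resource index.
    """
    # Sort once by left edge (start step); ties are broken by the end step.
    pending = sorted(intervals, key=lambda name: intervals[name])
    binding = {}
    resource = 0
    while pending:
        last_end = None      # end step of the last interval placed on this resource
        remaining = []
        for name in pending:
            start, end = intervals[name]
            if last_end is None or start > last_end:
                binding[name] = resource
                last_end = end
            else:
                remaining.append(name)
        pending = remaining
        resource += 1
    return binding

# Hypothetical execution intervals of six same-type operations.
intervals = {"a": (1, 1), "b": (1, 1), "c": (2, 2), "d": (2, 3), "e": (3, 3), "f": (4, 4)}
binding = left_edge(intervals)
```

Here two resources suffice, which matches the largest number of intervals that overlap in any single step.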

5 Data-Dominated Circuits
[Figure: scheduled sequencing graph (NOP v0, operations v1 to v11 of types *, +, -, <, NOP vn; C-steps 1 to 4) shown with operations 1 to 11]

6 Data-Dominated Circuits
Resource sharing in hierarchical CDFG
Model call: we can flatten the hierarchy to compute the compatibility of operations across different levels of hierarchy
[Figure: hierarchical sequencing graph with model calls a and b and their + and * operations]

7 Data-Dominated Circuits
Single call --> interval graph can be used
Multiple calls --> not an interval graph when the hierarchy is to be preserved --> coloring a general graph is NP-hard
[Figure: sequencing graph with repeated calls to model a and its conflict graph; not chordal --> not an interval graph]

8 Data-Dominated Circuits
Iteration: unroll; similar to the case of model call
Branching: same type --> compatible
[Figure: sequencing graph with a branch (BR) over operations a, b, c, d and its conflict graph; not chordal --> not an interval graph]

9 General Circuits
Register sharing
Variables alive in non-overlapping intervals or under alternative conditions are compatible
Compatibility graph --> min. clique partitioning
Conflict graph --> min. vertex coloring
Non-hierarchical --> intervals --> left edge algorithm
[Figure: scheduled graph (v1 to v7) with the lifetimes of variables z1 to z6 and their conflict graph (interval graph)]

10 General Circuits
Hierarchical (iteration)
[Figure: scheduled loop body (v1 to v11) with the lifetimes of variables x, y, u, dx and z1 to z7 across iterations]

11 General Circuits
Circular-arc conflict graph: not a chordal graph --> intractable
[Figure: circular-arc conflict graph of variables x, y, u and z1 to z7]

12 General Circuits
Multi-port memory binding
Given a scheduled graph, minimize the number of ports of the memory:
  minimum number of ports = max over l of (sum over i of xil)
where xil is 1 if the i-th variable is accessed at step l.
Given a, the number of ports of the multi-port memory, maximize the number of variables to be stored in the memory. That is,
  maximize 1^T b = b1 + b2 + ... + bnvar
  subject to (sum over i of bi xil) ≤ a for every step l
where b^T = [b1, b2, ..., bnvar], bi = 1 if the i-th variable is stored in the memory.
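As a small illustration of the first problem, the port count needed to hold every variable in one memory is the largest number of variables accessed in any single step. The access pattern below is hypothetical, and `min_ports` is a name assumed for this sketch.

```python
def min_ports(accesses):
    """accesses: dict variable -> set of steps in which it is read or written.

    Returns the largest number of variables accessed in one step,
    i.e. the maximum over steps l of the column sum of xil.
    """
    per_step = {}
    for steps in accesses.values():
        for step in steps:
            per_step[step] = per_step.get(step, 0) + 1
    return max(per_step.values(), default=0)

# Hypothetical access pattern: p, q and r are all accessed in step 2.
accesses = {"p": {1, 2}, "q": {2}, "r": {2, 3}, "s": {3}}
ports = min_ports(accesses)
```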

13 General Circuits
Example (assume all ports are read/write ports)
time-step 1: z3 = z1 + z2; z12 = z1
time-step 2: z5 = z3 + z4; z7 = z3 * z6; z13 = z3
time-step 3: z8 = z3 + z5; z9 = z1 + z7; z11 = z10 / z5
time-step 4: z14 = z11 ∧ z8; z15 = z12 ∨ z9
time-step 5: z1 = z14; z2 = z15
maximize b1 + b2 + ... + b15
subject to
b1 + b2 + b3 + b12 ≤ a
b3 + b4 + b5 + b6 + b7 + b13 ≤ a
b1 + b3 + b5 + b7 + b8 + b9 + b10 + b11 ≤ a
b8 + b9 + b11 + b12 + b14 + b15 ≤ a
b1 + b2 + b14 + b15 ≤ a
a = 1 --> b2 = b4 = b8 = 1 --> only z2, z4, and z8 can be stored
a = 2 --> z2, z4, z5, z10, z12, z14 can be stored
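The example is small enough to check by exhaustive search. The sketch below encodes the five per-step constraints as sets of variable indices and tries subsets from largest to smallest; it is a brute-force check rather than an ILP solver, and the names `steps`, `feasible`, and `max_stored` are assumptions of this sketch.

```python
from itertools import combinations

# Indices of the variables accessed in each of the five time steps.
steps = [
    {1, 2, 3, 12},
    {3, 4, 5, 6, 7, 13},
    {1, 3, 5, 7, 8, 9, 10, 11},
    {8, 9, 11, 12, 14, 15},
    {1, 2, 14, 15},
]
variables = sorted(set().union(*steps))

def feasible(stored, ports):
    """At most `ports` stored variables may be accessed in any one step."""
    return all(len(step & stored) <= ports for step in steps)

def max_stored(ports):
    """Return (count, one optimal set) found by exhaustive search."""
    for size in range(len(variables), -1, -1):
        for subset in combinations(variables, size):
            if feasible(set(subset), ports):
                return size, set(subset)
    return 0, set()

count_one_port, _ = max_stored(1)
count_two_ports, _ = max_stored(2)
```

The search may return a different optimal set than the one listed on the slide, since several sets of the same size can be feasible; only the counts are unique.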

14 General Circuits
Bus sharing and binding
Analogous to the multi-port memory binding problem: minimize the number of buses; maximize the number of data transfers
Example 1: number of write buses = aw, number of read buses = ar
w1 + w2 ≤ aw; w3 + w4 ≤ aw; w5 + w6 ≤ aw
r1 + r2 ≤ ar; r3 + r4 ≤ ar; r5 + r6 ≤ ar
Example 2: number of read/write buses = a
w1 + w2 ≤ a
r1 + r2 + w3 + w4 ≤ a
r3 + r4 + w5 + w6 ≤ a
r5 + r6 ≤ a
[Figure: scheduled graph (v1 to v7) with variables z1 to z6]

15 General Circuits
Multiplexers
Unconstrained minimum-area binding
Example: n add operations, a adders
[...] > 0: area increases as a increases
[...] < 0: area decreases as a increases
May omit the 2: 1. mux area accounts for two muxes; 2. consider operand sharing --> approximated average

16 General Circuits
Weighted compatibility graph
Spread the mux cost over the operations
Share --> overhead (mux + wiring) --> assign weights to the graph --> the problem becomes a weighted clique partitioning problem --> how to weight and how to solve?

17 General Circuits
Example
Each vertex has the triple [...]
Dedicated: [...]
v1, v2, v3 share a resource: [...]
[Figure: weighted compatibility graphs on vertices 1 to 4]

18 General Circuits
Performance-constrained: add a performance constraint and minimize area
area = c^T a + mux_area(B) + wire_area(B)
where c^T a = [area1, area2, ..., areanres] [a1, a2, ..., anres]^T
di: propagation delay of functional resource i
B: binding
f: cycle time
mux_delay(B), wire_delay(B), mux_area(B), wire_area(B): non-linear functions of B
Performance-directed binding: minimize path delay
More functional resources: fewer muxes --> less mux delay; more area --> more wire delay
path_delay = (sum of di over i ∈ path) + mux_delay(B) + wire_delay(B) < f, for all paths
Chaining is considered

19 Module Selection Problem
Same operation with different resource types
Ripple-carry adder, carry look-ahead adder --> different area, propagation delay
Serial, parallel --> different area, cycle time, execution delay in cycles
Example: 32-bit x 32-bit multiplier
fully serial multiplier: (area, delay in cycles) = (1, 1024)
serial-parallel multiplier: (area, delay in cycles) = (32, 32)
fully parallel multiplier: (area, delay in cycles) = (1024, 1)
Module selection and scheduling: module selection --> execution delay --> scheduling
Module selection and binding: the same module must be selected for operations sharing a resource
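The area/delay trade-off in the multiplier example can be illustrated with a tiny selection routine that picks the smallest module meeting a delay bound. This is only a sketch of one simple selection rule, not the method of the presentation; `MULTIPLIERS` and `smallest_module` are names assumed here.

```python
# (area, delay in cycles) for the three 32-bit x 32-bit multipliers in the example.
MULTIPLIERS = {
    "fully serial": (1, 1024),
    "serial-parallel": (32, 32),
    "fully parallel": (1024, 1),
}

def smallest_module(modules, max_delay):
    """Return the name of the smallest-area module with delay <= max_delay, or None."""
    candidates = [
        (area, name)
        for name, (area, delay) in modules.items()
        if delay <= max_delay
    ]
    return min(candidates)[1] if candidates else None

choice = smallest_module(MULTIPLIERS, 64)
```

With a bound of 64 cycles the serial-parallel multiplier is chosen; tightening the bound to 1 cycle forces the fully parallel one.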

20 Module Selection Problem
Minimize latency using the fastest resource types, then replace them with slower and smaller resource types for non-critical operations
Example: mult (area, delay) = (5, 1), (2, 2); ALU (1, 1)
latency = 4
v1, v2, v3: two fast mult
v8, v6 or v7: non-critical --> small mult --> use just two fast mult (area 10); sharing is impossible
[Figure: scheduled sequencing graph (NOP v0, v1 to v11, NOP vn) over C-steps 1 to 4]

21 Module Selection Problem
latency = 5
v1, v2, v3, v7: one fast mult
v6, v8: one small mult
area = 7
[Figure: scheduled sequencing graph (NOP v0, v1 to v11, NOP vn) over C-steps 1 to 5]

22 Module Selection Problem
Module selection and resource sharing
Example: adder vs. ALU
dedicated resource: area = 3 area_add + area_ALU
{v1, v2, v3}, {v4}: area = area_add + area_ALU + 2 area_Dmux
{v2, v3}, {v1, v4}:
[Figure: the two sharing options over operations v1 to v4 (three + and one <)]

23 Resource Sharing and Binding for Pipelined Circuits
Operations with start time l + p δ0 conflict with each other, for p ∈ Z
Example: δ0 = 2
[Figure: two-stage pipelined schedule of v1 to v11 (C-steps 1 and 2 in each stage), the same operations folded onto C-steps 1 and 2, and the compatibility graph of operations 1 to 11]
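The conflict rule for pipelined circuits amounts to comparing start times modulo the interval d0 (written δ0 in the usual notation, and read here as the pipeline's data introduction interval). The sketch below covers same-type operations only; the extension to multi-cycle delays and the name `conflicts` are assumptions of this sketch, not taken from the slide.

```python
def conflicts(t_u, d_u, t_v, d_v, delta0):
    """True if two same-type operations occupy a common C-step modulo delta0.

    t_*: start step, d_*: execution delay in steps, delta0: interval between
    successive data sets entering the pipeline.
    """
    steps_u = {(t_u + k) % delta0 for k in range(d_u)}
    steps_v = {(t_v + k) % delta0 for k in range(d_v)}
    return bool(steps_u & steps_v)

# With delta0 = 2, unit-delay operations starting at steps 1 and 3 collide,
# while operations starting at steps 1 and 2 do not.
same_slot = conflicts(1, 1, 3, 1, 2)
different_slot = conflicts(1, 1, 2, 1, 2)
```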

24 Resource Sharing and Binding for Pipelined Circuits
Pipelining with branching
K. Hwang, A. Casavant, M. Dragomirecky, and M. d'Abreu, "Constrained conditional resource sharing in pipeline synthesis," Proc. ICCAD, Nov
Alternative path operations may not be compatible
Twisted pair: only one pair can share a resource
if (cond == 1) {
    d = a + b;
    y = c * d;
} else {
    e = a * b;
    y = c + e;
}
[Figure: data-flow graphs of the true block (d = a + b, y = c * d) and the false block (e = a * b, y = c + e)]

25 Resource Sharing and Binding for Pipelined Circuits
[Figure: pipelined datapaths for the true block and the false block, with registers (reg) and MUXes controlled by cond_i and cond_i-1]

