ELEC692 VLSI Signal Processing Architecture Lecture 3


1 ELEC692 VLSI Signal Processing Architecture Lecture 3
Retiming

2 Retiming - Introduction
Retiming is a transformation technique that changes the locations of delay elements in a circuit without affecting its input/output characteristics or its latency. It can be used to:
- reduce the critical path of the system
- reduce the number of registers
- reduce the power consumption
[Figure: a loop through nodes A and B with computation times (2) and (4) and two delays in total. Before retiming: critical path = 6, loop bound = 6/2 = 3. After retiming: critical path = 4, loop bound = 6/2 = 3.]

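3 An IIR Example
[Figure: an IIR filter with input x(n), output y(n), and coefficients a and b, before and after retiming. Adders take (1) time unit and multipliers take (2).
Before retiming (internal signal w(n)): critical path delay = 3 units, # of registers = 4.
After retiming (internal signals w1(n) and w2(n)): critical path delay = 2 units, # of registers = 5.]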

4 Retiming for power consumption
Placing registers at the inputs of nodes with large capacitances can reduce the switching activity at these nodes, which can lead to a low-power solution.

5 Quantitative Description of retiming
Retiming maps a circuit (graph) G to a retimed circuit (graph) Gr.
A retiming solution is characterized by a value r(V) for each node V in the graph G.
r(V): the number of delays moved from each output edge of the node V to each of its input edges.
Let w(e) denote the weight of the edge e in the original graph G, and let wr(e) denote the weight of the edge e in the retimed graph Gr.
The weight of the edge e: U->V in the retimed graph is computed by wr(e) = w(e) + r(V) - r(U).
A retiming solution is feasible if wr(e) >= 0 holds for all edges.
The IIR example: r(1)=0, r(2)=1, r(3)=0, r(4)=0 is a feasible retiming solution. If r(1)=0, r(2)=-1, r(3)=0 and r(4)=0, the retiming solution is not feasible, as wr(3->2) will be equal to -1.
[Figure: the four-node DFG of the IIR example (nodes 1 to 4, computation times (1) and (2), edge weights D, D and 2D), before and after retiming.]
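The retiming equation and the feasibility test can be tried out in a few lines of Python. This is only a sketch: the edge list below is an assumption read off the flattened figure of the IIR example, and the names `EDGES`, `retime` and `is_feasible` are made up for illustration.

```python
# Edges of the IIR example as (U, V): w(e), the number of delays on U->V.
# ASSUMPTION: this edge list is reconstructed from the flattened figure.
EDGES = {(1, 3): 1, (1, 4): 2, (3, 2): 0, (4, 2): 0, (2, 1): 1}


def retime(edges, r):
    """Return the retimed weights wr(e) = w(e) + r(V) - r(U) for every edge U->V."""
    return {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}


def is_feasible(retimed):
    """A retiming is feasible when no edge ends up with a negative delay count."""
    return all(w >= 0 for w in retimed.values())


# The feasible choice from the slide.
good = retime(EDGES, {1: 0, 2: 1, 3: 0, 4: 0})
# The infeasible choice from the slide: edge 3->2 gets weight -1.
bad = retime(EDGES, {1: 0, 2: -1, 3: 0, 4: 0})

print(good, is_feasible(good))
print(bad, is_feasible(bad))
```

Under the assumed edge list, the feasible choice leaves five delays in total, which agrees with the register count quoted for the retimed IIR filter.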

6 Properties of Retiming
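Path: a sequence of nodes and edges, p = V0 -> V1 -> ... -> Vk, where edge e_i goes from V_i to V_(i+1).
Weight of the path p: $w(p) = \sum_{i=0}^{k-1} w(e_i)$
Delay of the path p: $t(p) = \sum_{i=0}^{k} t(V_i)$
Cycle: also a sequence of nodes and edges, but closed, i.e. it starts and ends at the same node. The weight and the delay of a cycle are defined in the same way, with each edge and each node of the cycle counted once.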

7 Properties of Retiming
Property 1: The weight of the retimed path p = V0->V1->...->Vk is given by wr(p) = w(p) + r(Vk) - r(V0).
Property 2: Retiming does not change the number of delays in a cycle.
Property 3: Retiming does not alter the iteration bound in a DFG.
Property 4: Adding the constant value j to the retiming value of each node does not change the mapping from G to Gr.
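Property 1 follows from a telescoping sum, and Property 2 is its special case for a closed path. The short derivation below is a sketch that assumes the edge labelling e_i : V_i -> V_{i+1}, which the slides do not spell out.

```latex
\begin{aligned}
w_r(p) &= \sum_{i=0}^{k-1} w_r(e_i)
        = \sum_{i=0}^{k-1} \bigl( w(e_i) + r(V_{i+1}) - r(V_i) \bigr) \\
       &= \sum_{i=0}^{k-1} w(e_i) + r(V_k) - r(V_0)
        = w(p) + r(V_k) - r(V_0).
\end{aligned}
```

For a cycle, V_k = V_0, so the two retiming terms cancel and w_r(c) = w(c). Since node computation times are untouched as well, the ratio t(c)/w(c) of every cycle is unchanged, which is why the iteration bound stays the same (Property 3).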

8 Retiming Techniques
- Cutset retiming and pipelining
- Retiming to minimize the clock period
- Retiming to minimize the number of registers

9 Cutset Retiming and Pipelining
A cutset is a set of edges that can be removed from the graph to create 2 disconnected subgraphs.
Cutset retiming only affects the weights of the edges in the cutset.
If the 2 disconnected subgraphs are G1 and G2, cutset retiming consists of adding k delays to each edge from G1 to G2 and removing k delays from each edge from G2 to G1.
[Figure: a four-node DFG split by a cutset into G1 and G2; +kD is added on the edges from G1 to G2 and -kD on the edges from G2 to G1.]

10 Feasibility of cutset retiming
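Feasibility: for the retimed graph, wr(e) >= 0 must hold for all edges e in Gr.
Cutset retiming adds k delays to each edge from G1 to G2, so for these edges we have $w_r(e) = w(e) + k \ge 0$.
For the edges from G2 to G1, we have $w_r(e) = w(e) - k \ge 0$.
Combining the two, we have $-\min_{G_1 \to G_2}\{w(e)\} \le k \le \min_{G_2 \to G_1}\{w(e)\}$.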
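The feasibility condition for cutset retiming amounts to a range of allowed k values. The sketch below computes that range and applies a cutset retiming. The edge list is an assumption read off the IIR example figure, and `cutset_k_range` and `cutset_retime` are hypothetical helper names.

```python
# ASSUMPTION: edge list of the IIR example, reconstructed from the flattened figure.
EDGES = {(1, 3): 1, (1, 4): 2, (3, 2): 0, (4, 2): 0, (2, 1): 1}


def cutset_k_range(edges, g1, g2):
    """Return (k_min, k_max) such that every cutset edge keeps a weight >= 0."""
    forward = [w for (u, v), w in edges.items() if u in g1 and v in g2]
    backward = [w for (u, v), w in edges.items() if u in g2 and v in g1]
    # Edges G1->G2 gain k delays: w + k >= 0, so k >= -min(w).
    k_min = -min(forward) if forward else float("-inf")
    # Edges G2->G1 lose k delays: w - k >= 0, so k <= min(w).
    k_max = min(backward) if backward else float("inf")
    return k_min, k_max


def cutset_retime(edges, g1, g2, k):
    """Add k delays to each edge G1->G2 and remove k from each edge G2->G1."""
    out = {}
    for (u, v), w in edges.items():
        if u in g1 and v in g2:
            w += k
        elif u in g2 and v in g1:
            w -= k
        out[(u, v)] = w
    return out


G1, G2 = {1, 3, 4}, {2}
k_range = cutset_k_range(EDGES, G1, G2)
retimed = cutset_retime(EDGES, G1, G2, 1)
print(k_range, retimed)
```

When the cutset has no edges from G2 to G1, the upper limit is unbounded, so any non-negative k is allowed.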

11 Node retiming
Node retiming is a special case of cutset retiming in which one subgraph, G2, is a single node and the other subgraph, G1, is the rest of the graph minus the edges going into and out of the chosen node.
Node retiming by one delay means subtracting one delay from each edge outgoing from the node and adding one delay to each edge incident into the node.
[Figure: the IIR example DFG with a single node chosen as G2 and the remaining nodes as G1, before and after node retiming.]

12 Node Retiming
To recap the feasibility:
Feasibility constraints: for each edge e: U->V of the retimed graph, the number of delay elements, i.e. the weight of the edge, must be non-negative, so we have
$w_r(e) = w(e) + r(V) - r(U) \ge 0$
where wr(e) and w(e) are the number of delay elements on the edge U->V after and before retiming, respectively.
For a circuit graph, there is a retiming design space defined by a system of inequalities (each edge forms one inequality). Finding a retiming solution therefore requires solving a system of inequalities.

13 Example again
Retimed weights are computed from the original weights: wr(e) = w(e) + r(V) - r(U).
Choosing retiming values: r(1)=0, r(2)=1, r(3)=0, r(4)=0, i.e. r = 1 for the node in G2 and r = 0 for the nodes in G1.
[Figure: the IIR example DFG with its original weights, split into G1 and G2, and the retimed DFG.]

14 Pipelining – Feedforward Cutset
Pipelining is a special case of cutset retiming in which there are no edges in the cutset from the subgraph G2 to the subgraph G1. Such a cutset is termed a feedforward cutset.
[Figure: a feedforward structure with input In, delays D and multipliers a, split by a cutset into G1 and G2. Retiming with k=2 adds 2D to each edge of the cutset.]

15 Example-Lattice Filter
Before retiming: Tcritical = 2Tmult + (N+1)Tadd, where N is the number of stages.
After cutset retiming: Tcritical = 4Tmult + 4Tadd, a constant independent of the number of stages N.
[Figure: the lattice filter with its critical path marked, before and after cutset retiming.]

16 N slow-down with retiming
Cutset retiming is used in combination with slow-down:
- Replace each delay in the graph with N delays to create an N-slow version of the graph.
- Perform cutset retiming on the N-slow graph.
N-1 null operations must be interleaved after each useful signal sample to preserve the functionality of the algorithm.
[Figure: a loop of two nodes, 1 and 2, each with computation time (1).
Original graph (D in the loop): Tclk=2, Titer=2.
2-slow transformation (2D in the loop): Tclk=2, Titer=2*2=4.
Retiming of the 2-slow graph (one D on each edge): Tclk=1, Titer=2.]
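The two-node example can be reproduced numerically. This is a minimal sketch that assumes the loop is 1->2 with no delay and 2->1 with one delay (the flattened figure does not make the edge directions explicit). `slow_down` and `retime` are illustrative names.

```python
def slow_down(edges, n):
    """N-slow transformation: replace every delay with n delays."""
    return {e: n * w for e, w in edges.items()}


def retime(edges, r):
    """Retimed weights wr(e) = w(e) + r(V) - r(U) for every edge U->V."""
    return {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}


# ASSUMPTION: loop 1 -> 2 (no delay) and 2 -> 1 (one delay), both nodes take 1 time unit.
loop = {(1, 2): 0, (2, 1): 1}

two_slow = slow_down(loop, 2)              # 2D on the edge 2 -> 1
retimed = retime(two_slow, {1: 0, 2: 1})   # move one delay across node 2

print(two_slow)
print(retimed)
```

With one delay on each edge, no zero-delay path contains more than one node, so Tclk drops from 2 to 1 while each useful sample still takes 2 clock cycles.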

17 Another example- A 100-stage lattice filter
Original filter: Tcritical = 2Tmult + (N+1)Tadd, where N is the number of stages. In this case it is 2Tmult + 101Tadd.
2-slow transformation: each delay D is replaced with 2D.
[Figure: the 100-stage lattice filter (Stage 1, Stage 2, ..., Stage 100) with its critical path marked, before and after the 2-slow transformation.]

18 Another example- A 100-stage lattice filter
After the 2-slow transformation, null operations are inserted to preserve the behavior, and cutset retiming is applied.
After cutset retiming: Tcritical = 2Tmult + 2Tadd.
Since the circuit is 2-slow, the sample period is 2*Tcritical.
[Figure: the 2-slow 100-stage lattice filter (Stage 1, Stage 2, ..., Stage 100) with its critical path marked, before and after cutset retiming.]
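To get a feel for the numbers, the expressions can be evaluated for assumed unit times. The values Tmult = 2 and Tadd = 1 below are illustrative assumptions and are not given on the slides.

```latex
\begin{aligned}
T_{\text{critical}}^{\text{original}} &= 2T_{\text{mult}} + 101\,T_{\text{add}} = 2(2) + 101(1) = 105 \\
T_{\text{critical}}^{\text{2-slow, retimed}} &= 2T_{\text{mult}} + 2T_{\text{add}} = 2(2) + 2(1) = 6 \\
T_{\text{sample}} &= 2 \times T_{\text{critical}}^{\text{2-slow, retimed}} = 12
\end{aligned}
```

Under these assumed values the sample period falls from 105 to 12 time units, even though the 2-slow circuit needs two clock cycles per useful sample.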

19 Retiming for clock period minimization
Minimum feasible clock period F(G): the computation time of the critical path, which is the path with the longest computation time among all paths with no delays:
F(G) = max{t(p) : w(p) = 0}, where w(p) is the number of delays on the path p.
The problem: given a circuit graph G, find the best retimed circuit graph Gr among all feasible retiming choices, such that F(Gr) is minimum, i.e. find a retiming solution r0 such that F(Gr0) <= F(Gr) for any other retiming solution r.
Definitions:
W(U,V): the minimum number of registers on any path from node U to node V.
D(U,V): the maximum computation time among all paths from U to V with weight W(U,V).
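F(G) can be computed as the longest path through the subgraph that keeps only the zero-delay edges. The sketch below does this with a memoized depth-first search. It assumes that subgraph is acyclic (every cycle carries at least one delay). The edge lists are reconstructed from the IIR example figure and are an assumption, as is the function name `clock_period`.

```python
from functools import lru_cache


def clock_period(edges, t):
    """F(G) = max{t(p) : w(p) = 0}: longest computation time over zero-delay paths."""
    # Keep only the edges without delays; paths through them are combinational.
    zero_succ = {u: [] for u in t}
    for (u, v), w in edges.items():
        if w == 0:
            zero_succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(u):
        # Assumes the zero-delay subgraph has no cycles, otherwise this recurses forever.
        return t[u] + max((longest_from(v) for v in zero_succ[u]), default=0)

    return max(longest_from(u) for u in t)


# ASSUMPTION: node times and edge lists read off the flattened IIR example figure.
T = {1: 1, 2: 1, 3: 2, 4: 2}
ORIGINAL = {(1, 3): 1, (1, 4): 2, (3, 2): 0, (4, 2): 0, (2, 1): 1}
RETIMED = {(1, 3): 1, (1, 4): 2, (3, 2): 1, (4, 2): 1, (2, 1): 0}

print(clock_period(ORIGINAL, T), clock_period(RETIMED, T))
```

For the assumed graphs this gives 3 before retiming and 2 after, in line with the critical path delays quoted for the IIR example.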

20 Retiming for clock period minimization (Cont.)
Overall algorithm:
1. Compute W(U,V) and D(U,V).
2. From W(U,V) and D(U,V), determine whether there is a retiming solution that can achieve a desired clock period.
Given a clock period c, a retiming solution r is feasible and gives F(Gr) <= c if the following constraints hold:
- Feasibility constraint: r(U)-r(V) <= w(e) for every edge e: U->V of G (this makes sure the number of delays on each edge in the retimed graph is nonnegative).
- Critical path constraint: r(U)-r(V) <= W(U,V)-1 for all vertices U, V in G such that D(U,V) > c.
If there is a retiming solution, we can try a smaller c until there is no retiming solution.

21 How to compute W(U,V) and D(U,V)
1. Let M = tmax*n, where tmax is the maximum computation time of the nodes in G and n is the number of nodes in G.
2. Form a new graph G', which is the same as G except that the edge weights are replaced by w'(e) = M*w(e) - t(U) for all edges e: U->V.
3. Solve the all-pairs shortest path problem on G'. Let S'UV be the length of the shortest path from U to V.
4. If U != V, then $W(U,V) = \lceil S'_{UV} / M \rceil$ and D(U,V) = M*W(U,V) - S'UV + t(V). If U = V, then W(U,V) = 0 and D(U,V) = t(U).
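The four steps map directly onto a Floyd-Warshall all-pairs shortest path computation. The following is a sketch under stated assumptions: the edge list and node times are read off the flattened IIR example figure, and `compute_w_d` is an illustrative name.

```python
import math


def compute_w_d(edges, t):
    """Return (S, W, D) following the four steps: M, G', shortest paths, W and D."""
    nodes = sorted(t)
    M = max(t.values()) * len(nodes)                 # Step 1: M = tmax * n
    S = {(u, v): math.inf for u in nodes for v in nodes}
    for (u, v), w in edges.items():                  # Step 2: w'(e) = M*w(e) - t(U)
        S[(u, v)] = min(S[(u, v)], M * w - t[u])
    for k in nodes:                                  # Step 3: Floyd-Warshall on G'
        for u in nodes:
            for v in nodes:
                if S[(u, k)] + S[(k, v)] < S[(u, v)]:
                    S[(u, v)] = S[(u, k)] + S[(k, v)]
    W, D = {}, {}
    for u in nodes:                                  # Step 4: derive W and D
        for v in nodes:
            if u == v:
                W[(u, v)], D[(u, v)] = 0, t[u]
            elif S[(u, v)] < math.inf:
                W[(u, v)] = -(-S[(u, v)] // M)       # integer ceiling of S'/M
                D[(u, v)] = M * W[(u, v)] - S[(u, v)] + t[v]
    return S, W, D


# ASSUMPTION: IIR example graph reconstructed from the flattened figure.
T = {1: 1, 2: 1, 3: 2, 4: 2}
EDGES = {(1, 3): 1, (1, 4): 2, (3, 2): 0, (4, 2): 0, (2, 1): 1}

S, W, D = compute_w_d(EDGES, T)
for u in sorted(T):
    print([W[(u, v)] for v in sorted(T)], [D[(u, v)] for v in sorted(T)])
```

The diagonal of S is initialized to infinity so that S'UU ends up as the shortest cycle through U. Negative edge weights are harmless here because every cycle contains at least one delay, which keeps all cycle lengths in G' non-negative.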

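22 Example
Step 1: tmax = 2, n = 4, hence M = 8.
Step 2: The new graph G' is formed with edge weights w'(e) = M*w(e) - t(U).
[Figure: the IIR example DFG G (edge weights D, D, 2D) and the new graph G' (edge weights 7, 15 and -2).]
Step 3: The shortest paths can be found by a standard shortest path algorithm, and the values of S'UV are as follows:
S'UV    1    2    3    4
1      12    5    7   15
2       7   12   14   22
3       5   -2   12   20
4       5   -2   12   20
Step 4: W(U,V) and D(U,V) can be found as follows:
W(U,V)  1    2    3    4
1       0    1    1    2
2       1    0    2    3
3       1    0    0    3
4       1    0    2    0
D(U,V)  1    2    3    4
1       1    4    3    3
2       2    1    4    4
3       4    3    2    6
4       4    3    6    2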

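23 Example
Given the values of W(U,V) and D(U,V), determine whether there is a retiming solution that can achieve a desired clock period c.
Let c = 3 in the previous example.
Critical path constraints, r(U)-r(V) <= W(U,V)-1 for all node pairs with D(U,V) > 3:
r(1)-r(2) <= 0, r(2)-r(3) <= 1, r(2)-r(4) <= 2, r(3)-r(1) <= 0, r(3)-r(4) <= 2, r(4)-r(1) <= 0, r(4)-r(3) <= 1
Feasibility constraints, r(U)-r(V) <= w(e) for all edges:
r(1)-r(3) <= 1, r(1)-r(4) <= 2, r(2)-r(1) <= 1, r(3)-r(2) <= 0, r(4)-r(2) <= 0
Feasible solution: r(1)=r(2)=r(3)=r(4)=0, i.e. no retiming is needed.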

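24 Example (cont.)
Now we try c = 2.
Critical path constraints, r(U)-r(V) <= W(U,V)-1 for all node pairs with D(U,V) > 2:
r(1)-r(2) <= 0, r(1)-r(3) <= 0, r(1)-r(4) <= 1, r(2)-r(3) <= 1, r(2)-r(4) <= 2, r(3)-r(1) <= 0, r(3)-r(2) <= -1, r(3)-r(4) <= 2, r(4)-r(1) <= 0, r(4)-r(2) <= -1, r(4)-r(3) <= 1
Feasibility constraints, r(U)-r(V) <= w(e) for all edges:
r(1)-r(3) <= 1, r(1)-r(4) <= 2, r(2)-r(1) <= 1, r(3)-r(2) <= 0, r(4)-r(2) <= 0
Feasible solution: r(1)=-1, r(2)=0, r(3)=-1, r(4)=-1.
[Figure: the retimed solution, i.e. the IIR example DFG retimed with these values.]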
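Systems of difference constraints like these can be solved with the Bellman-Ford algorithm. Each constraint r(U) - r(V) <= b becomes an edge V -> U of length b in a constraint graph, and a negative cycle means there is no solution. The sketch below uses this standard technique, which the slides do not name. The edge list is an assumption read off the figure, `W_ROWS` and `D_ROWS` are the W(U,V) and D(U,V) values obtained from the formulas for that assumed graph, and `retime_for_clock` is a hypothetical name.

```python
def retime_for_clock(edges, W, D, c):
    """Return retiming values r achieving clock period c, or None if none exist."""
    nodes = sorted({n for e in edges for n in e})
    # Constraint r(U) - r(V) <= b is stored as (V, U, b): an edge V -> U of length b.
    cons = [(v, u, w) for (u, v), w in edges.items()]          # feasibility
    cons += [(v, u, W[(u, v)] - 1)                             # critical path
             for u in nodes for v in nodes if D[(u, v)] > c]
    # Bellman-Ford from a virtual source joined to every node by a 0-length edge.
    r = {n: 0 for n in nodes}
    for _ in range(len(nodes)):
        changed = False
        for v, u, b in cons:
            if r[v] + b < r[u]:
                r[u] = r[v] + b
                changed = True
        if not changed:
            return r
    return None  # still relaxing after n passes: negative cycle, no solution


# ASSUMPTION: IIR example edge list reconstructed from the flattened figure.
EDGES = {(1, 3): 1, (1, 4): 2, (3, 2): 0, (4, 2): 0, (2, 1): 1}
# ASSUMPTION: W and D tables computed from the formulas for that edge list.
W_ROWS = [[0, 1, 1, 2], [1, 0, 2, 3], [1, 0, 0, 3], [1, 0, 2, 0]]
D_ROWS = [[1, 4, 3, 3], [2, 1, 4, 4], [4, 3, 2, 6], [4, 3, 6, 2]]
W = {(u, v): W_ROWS[u - 1][v - 1] for u in range(1, 5) for v in range(1, 5)}
D = {(u, v): D_ROWS[u - 1][v - 1] for u in range(1, 5) for v in range(1, 5)}

for c in (3, 2, 1):
    print(c, retime_for_clock(EDGES, W, D, c))
```

Any solution can be shifted by a constant without changing the retimed graph (Property 4), so a solver may return values that differ from a hand-derived solution by a fixed offset.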

25 Retiming for register minimization
Find a retiming solution that uses the minimum number of registers while satisfying the clock period constraint, i.e. find a feasible solution with the minimum number of registers in the design space constrained by the feasibility and critical path constraints.
Maximum fanout observation: if a node has several output edges carrying the same signal, the number of registers needed to implement these edges is the maximum number of registers on any one of the edges.
[Figure: node U drives V1, V2 and V3 through D, 3D and 7D; the same fanout is implemented with a shared chain of D, 2D and 4D.]

26 Retiming for register minimization (cont.)
The number of registers required to implement the output edges of the node V in the retimed graph is $R_V = \max_{e:\, V \to ?} \{ w_r(e) \}$, the maximum taken over all edges e leaving V.
The total register cost in the retimed circuit is $COST = \sum_V R_V$.
The formulation of retiming to minimize the number of registers under the constraint that the clock period is not greater than c is:
Minimize COST subject to
- (fanout constraint) RV >= wr(e) for all V and all edges e coming from V
- (feasibility constraint) r(U)-r(V) <= w(e) for every edge e: U->V
- (clock period constraint) r(U)-r(V) <= W(U,V)-1 for all nodes U, V such that D(U,V) > c
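The cost function itself is easy to evaluate for a given retiming. The sketch below computes it under the maximum fanout observation, assuming all output edges of a node carry the same signal. It does not solve the minimization problem. The edge lists are assumptions read off the figures, and `register_cost` is an illustrative name.

```python
def register_cost(edges, r):
    """COST = sum over nodes V of R_V, where R_V = max wr(e) over edges leaving V."""
    R = {}
    for (u, v), w in edges.items():
        wr = w + r[v] - r[u]              # retimed weight of the edge U->V
        R[u] = max(R.get(u, 0), wr)       # registers shared along the fanout of U
    return sum(R.values())


# ASSUMPTION: IIR example edge list reconstructed from the flattened figure.
EDGES = {(1, 3): 1, (1, 4): 2, (3, 2): 0, (4, 2): 0, (2, 1): 1}
# ASSUMPTION: fanout example with 1, 3 and 7 delays, as suggested by the figure residue.
FANOUT = {("U", "V1"): 1, ("U", "V2"): 3, ("U", "V3"): 7}

cost_original = register_cost(EDGES, {1: 0, 2: 0, 3: 0, 4: 0})
cost_retimed = register_cost(EDGES, {1: 0, 2: 1, 3: 0, 4: 0})
cost_fanout = register_cost(FANOUT, {"U": 0, "V1": 0, "V2": 0, "V3": 0})
print(cost_original, cost_retimed, cost_fanout)
```

Because registers are shared along a fanout, these costs can be lower than the per-edge delay counts. For the assumed IIR graph, the two output edges of node 1 share registers, so the cost is one less than the sum of the edge weights.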

27 Retiming for register minimization (cont.)
The above formulation can be solved by using Integer Linear Programming (ILP). The details of solving ILP will not be covered in this course.

