ELEC692 VLSI Signal Processing Architecture Lecture 3

ELEC692 VLSI Signal Processing Architecture Lecture 3 Retiming

Retiming - Introduction
Retiming is a transformation technique that changes the locations of the delay elements in a circuit while keeping its input/output characteristics (and its latency) unchanged. It can be used to:
- reduce the critical path of the system,
- reduce the number of registers,
- reduce the power consumption.
[Figure: a loop of two nodes, A (4 time units) and B (2 time units), containing 2 delays. Before retiming both delays sit on one edge: critical path = 6, loop bound = 6/2 = 3. After retiming there is one delay on each edge: critical path = 4, loop bound = 6/2 = 3.]

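An IIR Example
[Figure: an IIR filter with input x(n), output y(n), internal signal w(n) and multiplier coefficients a and b, shown before and after retiming (in the retimed filter w(n) is split into w1(n) and w2(n)). Original: critical path delay = 3 units, 4 registers. Retimed: critical path delay = 2 units, 5 registers.]
Here retiming shortens the critical path at the cost of one extra register.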

Retiming for power consumption
Placing registers at the inputs of nodes with large capacitances can reduce the switching activity at these nodes, which can lead to a low-power solution.

Quantitative Description of Retiming
Retiming maps a circuit (graph) G to a retimed circuit (graph) Gr. A retiming solution is characterized by an integer value r(V) for each node V in G, where r(V) is the number of delays moved from each outgoing edge of node V to each of its incoming edges. Let w(e) denote the weight (number of delays) of edge e in the original graph G, and wr(e) its weight in the retimed graph Gr. For an edge e: U->V, the retimed weight is
wr(e) = w(e) + r(V) - r(U)
A retiming solution is feasible if wr(e) >= 0 holds for all edges.
The IIR example: r(1)=0, r(2)=1, r(3)=0, r(4)=0 is a feasible solution and corresponds to the retimed IIR filter. If instead r(1)=0, r(2)=-1, r(3)=0 and r(4)=0, the solution is not feasible, because wr(3->2) = 0 + (-1) - 0 = -1.
[Figure: the 4-node DFG of the IIR example (node computation times (1) and (2), edge delays D and 2D) before and after retiming with r(2) = 1.]
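
A minimal Python sketch of the retiming map wr(e) = w(e) + r(V) - r(U) and the feasibility check. The edge list IIR_EDGES is an assumption: it is reconstructed from the figure labels (1->3 with D, 1->4 with 2D, 2->1 with D, and 3->2, 4->2 without delays) and should be checked against the original figure. The function names are hypothetical.

```python
# Minimal sketch of retiming a DFG given as an edge list.
# ASSUMPTION: IIR_EDGES is a hypothetical reconstruction of the example's figure.

# (U, V, w(e)): edge U -> V carrying w(e) delays
IIR_EDGES = [
    (1, 3, 1),
    (1, 4, 2),
    (2, 1, 1),
    (3, 2, 0),
    (4, 2, 0),
]


def retime(edges, r):
    """Apply w_r(e) = w(e) + r(V) - r(U); nodes missing from r get r = 0."""
    return [(u, v, w + r.get(v, 0) - r.get(u, 0)) for u, v, w in edges]


def infeasible_edges(retimed_edges):
    """Return the edges with a negative weight (empty for a feasible retiming)."""
    return [(u, v, w) for u, v, w in retimed_edges if w < 0]


if __name__ == "__main__":
    good = retime(IIR_EDGES, {1: 0, 2: 1, 3: 0, 4: 0})
    print(good)                        # [(1, 3, 1), (1, 4, 2), (2, 1, 0), (3, 2, 1), (4, 2, 1)]
    print(sum(w for _, _, w in good))  # 5 delays, as in the retimed IIR filter
    bad = retime(IIR_EDGES, {1: 0, 2: -1, 3: 0, 4: 0})
    print(infeasible_edges(bad))       # [(3, 2, -1), (4, 2, -1)]
```

Under this reconstruction, r(2) = -1 makes both edges entering node 2 negative, not only 3->2.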

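Properties of Retiming
Path: a sequence of nodes and edges, $p = V_0 \xrightarrow{e_0} V_1 \xrightarrow{e_1} \cdots \xrightarrow{e_{k-1}} V_k$.
Weight of the path p: $w(p) = \sum_{i=0}^{k-1} w(e_i)$.
Computation time (delay) of the path p: $t(p) = \sum_{i=0}^{k} t(V_i)$.
Cycle (loop) l: also a sequence of nodes and edges, with $V_k = V_0$.
Weight of the cycle l: $w(l) = \sum_{i=0}^{k-1} w(e_i)$.
Computation time of the cycle l: $t(l) = \sum_{i=0}^{k-1} t(V_i)$.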

Properties of Retiming (cont.)
Property 1: The weight of the retimed path p = V0->V1->...->Vk is given by wr(p) = w(p) + r(Vk) - r(V0).
Property 2: Retiming does not change the number of delays in a cycle.
Property 3: Retiming does not alter the iteration bound of a DFG.
Property 4: Adding the same constant j to the retiming value of every node does not change the mapping from G to Gr.
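
The four properties follow from the definition wr(e) = w(e) + r(V) - r(U) in a few lines. The derivation below is a sketch in the path and cycle notation of the previous slide; T_\infty denotes the iteration bound, the maximum over all loops l of t(l)/w(l).

```latex
% Property 1: sum the retimed weights along p = V_0 -> V_1 -> ... -> V_k
w_r(p) = \sum_{i=0}^{k-1} w_r(e_i)
       = \sum_{i=0}^{k-1} \bigl[ w(e_i) + r(V_{i+1}) - r(V_i) \bigr]
       = w(p) + r(V_k) - r(V_0)   % the r terms telescope

% Property 2: for a cycle l, V_k = V_0, so
w_r(l) = w(l) + r(V_0) - r(V_0) = w(l)

% Property 3: loop computation times t(l) are untouched, hence
T_\infty = \max_{l} \frac{t(l)}{w(l)} = \max_{l} \frac{t(l)}{w_r(l)}

% Property 4: with r'(V) = r(V) + j for every node V,
w(e) + r'(V) - r'(U) = w(e) + r(V) - r(U) = w_r(e)
```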

Retiming Techniques
- Cutset retiming and pipelining
- Retiming to minimize the clock period
- Retiming to minimize the number of registers

Cutset Retiming and Pipelining
A cutset is a set of edges that, when removed from the graph, splits it into 2 disconnected subgraphs. Cutset retiming only affects the weights of the edges in the cutset. If the 2 disconnected subgraphs are G1 and G2, cutset retiming consists of adding k delays to each edge from G1 to G2 and removing k delays from each edge from G2 to G1.
[Figure: cutset retiming of the 4-node IIR DFG; each cutset edge from G1 to G2 gains +kD and each cutset edge from G2 to G1 loses kD.]

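Feasibility of cutset retiming
Feasibility: for the retimed graph, wr(e) >= 0 must hold for all edges e in Gr. Cutset retiming adds k delays to each edge from G1 to G2, so for those edges we need
$w(e) + k \ge 0$ for every edge e from G1 to G2.
It removes k delays from each edge from G2 to G1, so for those edges we need
$w(e) - k \ge 0$ for every edge e from G2 to G1.
Combining the two, we have
$-\min_{e:\,G_1 \to G_2} w(e) \;\le\; k \;\le\; \min_{e:\,G_2 \to G_1} w(e)$.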
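
The cutset rule and the feasible range of k can be sketched as below. The helper names are hypothetical, and the example graph is an assumed reconstruction of the 4-node IIR DFG, cut into G1 = {1, 3, 4} and G2 = {2}.

```python
import math

# ASSUMPTION: hypothetical reconstruction of the 4-node example, (U, V, w(e)).
EDGES = [(1, 3, 1), (1, 4, 2), (2, 1, 1), (3, 2, 0), (4, 2, 0)]


def feasible_k_range(edges, g1):
    """Bounds on k from w(e) + k >= 0 (G1 -> G2) and w(e) - k >= 0 (G2 -> G1).

    A bound is infinite when the cutset has no edge in that direction.
    """
    forward = [w for u, v, w in edges if u in g1 and v not in g1]
    backward = [w for u, v, w in edges if u not in g1 and v in g1]
    k_min = -min(forward) if forward else -math.inf
    k_max = min(backward) if backward else math.inf
    return k_min, k_max


def cutset_retime(edges, g1, k):
    """Add k delays to G1 -> G2 edges and remove k from G2 -> G1 edges.

    Edges with both ends in G1, or both in G2, are not in the cutset and keep their weight.
    """
    out = []
    for u, v, w in edges:
        if u in g1 and v not in g1:
            w += k
        elif u not in g1 and v in g1:
            w -= k
        out.append((u, v, w))
    return out


if __name__ == "__main__":
    g1 = {1, 3, 4}                      # so G2 = {2}
    print(feasible_k_range(EDGES, g1))  # (0, 1)
    print(cutset_retime(EDGES, g1, 1))  # [(1, 3, 1), (1, 4, 2), (2, 1, 0), (3, 2, 1), (4, 2, 1)]
```

With this cutset, k = 1 gives the same graph as the node retiming r(2) = 1. When the cutset has no edge from G2 to G1 (a feedforward cutset), the upper bound is infinite, which is the pipelining case.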

Node retiming
Node retiming is the special case of cutset retiming in which the subgraph G2 is a single node and G1 is the rest of the graph; the cutset consists of all edges going into and out of the chosen node. Node retiming with k = 1:
- subtracts one delay from each edge outgoing from the node,
- adds one delay to each edge incident into the node.
[Figure: node retiming applied to the 4-node IIR DFG, with G2 the retimed node and G1 the remaining nodes.]

Node Retiming (cont.)
To recap, the feasibility constraint requires that, for each edge U->V of the retimed graph, the number of delay elements (the weight of the edge) be non-negative:
$w_r(e) = w(e) + r(V) - r(U) \ge 0$, i.e. $r(U) - r(V) \le w(e)$,
where wr(e) and w(e) are the number of delay elements on the edge U->V after and before retiming, respectively. For a circuit graph, the retiming design space is therefore defined by a system of inequalities (one per edge), and finding a retiming requires solving this system of inequalities.

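Example again
[Figure: the 4-node IIR DFG with its original weights, split into G1 and G2, and the retimed graph obtained with wr(e) = w(e) + r(V) - r(U).]
Choosing the retiming values r(1)=0, r(2)=1, r(3)=0, r(4)=0 corresponds to a cutset retiming with k = 1, where r = 1 for the node in G2 = {2} and r = 0 for the nodes in G1 = {1, 3, 4}.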

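Pipelining - Feedforward Cutset
Pipelining is a special case of cutset retiming in which the cutset contains no edges from subgraph G2 to subgraph G1; such a cutset is called a feedforward cutset. Since no delays are removed, there is no upper bound on k, and any k >= 0 is feasible.
[Figure: a filter with input In, delays D and coefficient nodes a, split by a feedforward cutset into G1 and G2; retiming with k = 2 places 2D on each cutset edge.]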

Example - Lattice Filter
[Figure: an N-stage lattice filter with its critical path marked, before and after cutset retiming.]
Before retiming: Tcritical = 2Tmult + (N+1)Tadd, where N is the number of stages.
After cutset retiming: Tcritical = 4Tmult + 4Tadd, a constant independent of the number of stages N.

N-slow-down with retiming
Cutset retiming is used in combination with slow-down:
1. Replace each delay in the graph with N delays to create an N-slow version of the graph.
2. Perform cutset retiming on the N-slow graph.
3. Interleave N-1 null operations after each useful signal sample to preserve the functionality of the algorithm.
[Figure: a loop of two nodes, 1 and 2, each with computation time (1). Original graph with one delay D: Tclk = 2, Titer = 2. After the 2-slow transformation (D becomes 2D): Tclk = 2, Titer = 2 × 2 = 4. After retiming the 2-slow graph (one D on each edge): Tclk = 1, Titer = 2 × 1 = 2.]
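
A sketch of the N-slow transformation followed by retiming on the two-node loop of the figure. Which edge carries the original delay is not recoverable from the slide, so LOOP below (no delay on 1->2, one delay on 2->1) is an assumption, as are the helper names. critical_path returns the longest computation time over delay-free paths, i.e. Tclk; for an N-slow circuit only every N-th cycle carries a useful sample, so Titer = N × Tclk.

```python
from functools import lru_cache

# ASSUMPTION: illustrative node times and delay placement for the two-node loop.
TIMES = {1: 1, 2: 1}              # each node takes 1 time unit
LOOP = [(1, 2, 0), (2, 1, 1)]     # (U, V, w): one delay D in the loop


def critical_path(times, edges):
    """Longest computation time over delay-free paths (the clock period Tclk).

    Assumes every cycle holds at least one delay, so delay-free edges form a DAG.
    """
    succ = {v: [] for v in times}
    for u, v, w in edges:
        if w == 0:
            succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(v):
        return times[v] + max((longest_from(x) for x in succ[v]), default=0)

    return max(longest_from(v) for v in times)


def n_slow(edges, n):
    """N-slow transformation: replace every delay by n delays."""
    return [(u, v, n * w) for u, v, w in edges]


def retime(edges, r):
    """Node retiming: w_r(e) = w(e) + r(V) - r(U); missing nodes get r = 0."""
    return [(u, v, w + r.get(v, 0) - r.get(u, 0)) for u, v, w in edges]


if __name__ == "__main__":
    for label, n, g in [
        ("original", 1, LOOP),
        ("2-slow", 2, n_slow(LOOP, 2)),
        ("2-slow, retimed", 2, retime(n_slow(LOOP, 2), {2: 1})),
    ]:
        t_clk = critical_path(TIMES, g)
        print(label, "Tclk =", t_clk, "Titer =", n * t_clk)
```

The three printed cases reproduce the Tclk and Titer values quoted in the figure: 2/2, 2/4 and 1/2.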

Another example - A 100-stage lattice filter
[Figure: a 100-stage lattice filter (Stage 1, Stage 2, ..., Stage 100) with its critical path marked, before and after the 2-slow transformation, in which every D becomes 2D.]
Tcritical = 2Tmult + (N+1)Tadd, where N is the number of stages; here N = 100, so Tcritical = 2Tmult + 101Tadd.

Another example - A 100-stage lattice filter (cont.)
[Figure: the 2-slow 100-stage lattice filter with null operations inserted to preserve its behavior, before and after cutset retiming, with the new critical path marked.]
After cutset retiming, Tcritical = 2Tmult + 2Tadd. Since the circuit is 2-slow, the sample period is 2 × Tcritical = 4Tmult + 4Tadd.
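
The sample-period arithmetic of the 100-stage lattice slides, written out. T_MULT = 2 and T_ADD = 1 are hypothetical delays chosen only for illustration; the two formulas are the ones given on the slides.

```python
N_STAGES = 100


def sample_period_original(t_mult, t_add, n=N_STAGES):
    # One useful sample per clock; clock = 2*Tmult + (N+1)*Tadd.
    return 2 * t_mult + (n + 1) * t_add


def sample_period_2slow_retimed(t_mult, t_add):
    # Clock = 2*Tmult + 2*Tadd, but only every second cycle carries a useful sample.
    t_critical = 2 * t_mult + 2 * t_add
    return 2 * t_critical


if __name__ == "__main__":
    T_MULT, T_ADD = 2, 1  # hypothetical delays, arbitrary time units
    print(sample_period_original(T_MULT, T_ADD))       # 105
    print(sample_period_2slow_retimed(T_MULT, T_ADD))  # 12
```

The retimed sample period does not depend on the number of stages, while the original one grows linearly with N.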

Retiming for clock period minimization
Minimum feasible clock period F(G): the computation time of the critical path, which is the path with the longest computation time among all paths with no delays:
F(G) = max{t(p) : w(p) = 0}, where w(p) is the number of delays on the path p.
The problem: given a circuit graph G, find the retimed circuit graph Gr, among all feasible retimings, for which F(Gr) is minimum; i.e. find a retiming solution r0 such that F(Gr0) <= F(Gr) for any other retiming solution r.
Definitions:
W(U,V) - the minimum number of registers on any path from node U to node V.
D(U,V) - the maximum computation time among all paths from U to V with weight W(U,V).

Retiming for clock period minimization (Cont.)
Overall algorithm:
1. Compute W(U,V) and D(U,V).
2. From W(U,V) and D(U,V), determine whether there is a retiming solution that achieves a desired clock period c. Given a clock period c, there is a feasible retiming solution with F(Gr) <= c if and only if the following constraints hold:
- Feasibility constraint: r(U) - r(V) <= w(e) for every edge U->V of G (this makes sure the number of delays on each edge of the retimed graph is non-negative).
- Critical path constraint: r(U) - r(V) <= W(U,V) - 1 for all vertices U, V in G such that D(U,V) > c.
3. If there is a retiming solution, try a smaller c, until no retiming solution exists; the smallest c that still admits a solution is the minimum clock period.
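
Every constraint above has the form r(U) - r(V) <= k, so the system is a set of difference constraints. The slides do not say how to solve it; one common choice, used in this sketch, is Bellman-Ford: each constraint becomes an edge V->U of weight k, the shortest distances from a virtual source give a solution, and a negative cycle means no retiming reaches the period c. The W and D tables are worked out by hand for a hypothetical 2-slow two-node loop (each node 1 time unit), and the function names are illustrative.

```python
# Sketch: retiming constraints for a target clock period c, solved as
# difference constraints with Bellman-Ford (one possible solver).


def retiming_constraints(edges, W, D, c):
    """Return (U, V, k) triples meaning r(U) - r(V) <= k."""
    cons = [(u, v, w) for u, v, w in edges]          # feasibility constraints
    cons += [(u, v, W[u, v] - 1)                     # critical path constraints
             for (u, v), d in D.items() if d > c]
    return cons


def solve_difference_constraints(nodes, cons):
    """Bellman-Ford from a virtual source with a 0-weight edge to every node.

    r(U) - r(V) <= k becomes an edge V -> U of weight k; the shortest distances
    are a solution. Returns None when a negative cycle makes the system infeasible.
    """
    r = {v: 0 for v in nodes}        # distances after relaxing the source's edges
    for _ in range(len(nodes)):
        changed = False
        for u, v, k in cons:
            if r[v] + k < r[u]:
                r[u] = r[v] + k
                changed = True
        if not changed:
            return r
    return None                      # still relaxing after |V| passes: negative cycle


# Hypothetical example: 2-slow two-node loop, each node taking 1 time unit.
NODES = [1, 2]
EDGES = [(1, 2, 0), (2, 1, 2)]                    # (U, V, w(e))
W = {(1, 1): 0, (1, 2): 0, (2, 1): 2, (2, 2): 0}  # worked out by hand
D = {(1, 1): 1, (1, 2): 2, (2, 1): 2, (2, 2): 1}

if __name__ == "__main__":
    for c in (2, 1, 0):
        print(c, solve_difference_constraints(NODES, retiming_constraints(EDGES, W, D, c)))
```

For c = 1 the solver returns r(1) = -1, r(2) = 0, which places one delay on each edge of the loop. For c = 0 the pair (U, U) with D(U,U) = 1 > 0 yields r(U) - r(U) <= -1, a negative self-loop, so the solver reports that no solution exists.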

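How to compute W(U,V) and D(U,V)
1. Let $M = t_{max} \cdot n$, where $t_{max}$ is the maximum computation time of the nodes in G and n is the number of nodes in G.
2. Form a new graph G' that is the same as G except that the edge weights are replaced by $w'(e) = M \cdot w(e) - t(U)$ for all edges U->V.
3. Solve the all-pairs shortest path problem on G'. Let $S'_{UV}$ be the length of the shortest path from U to V.
4. If $U \ne V$, then $W(U,V) = \lceil S'_{UV} / M \rceil$ and $D(U,V) = M \cdot W(U,V) - S'_{UV} + t(V)$. If $U = V$, then $W(U,V) = 0$ and $D(U,V) = t(U)$.
This works because the weight of a path p from U to V in G' is $M \cdot w(p) - (t(p) - t(V))$, and $t(p) - t(V) < M$: the shortest path first minimizes the number of delays and then, among those paths, maximizes the computation time.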
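
A sketch of the four steps, using Floyd-Warshall for the all-pairs shortest paths. Any all-pairs algorithm works here, because every cycle l of G' has weight M·w(l) - t(l) >= 0 when w(l) >= 1. TIMES and EDGES are an assumed reconstruction of the example graph on the next slide; with them the sketch gives M = 8 and the G' edge weights 15, 7 and -2 shown in its figure.

```python
import math

# ASSUMPTION: hypothetical reconstruction of the example graph.
TIMES = {1: 1, 2: 1, 3: 2, 4: 2}                                 # t(V)
EDGES = [(1, 3, 1), (1, 4, 2), (2, 1, 1), (3, 2, 0), (4, 2, 0)]  # (U, V, w(e))


def compute_w_d(times, edges):
    """Return M, S' (all-pairs shortest paths on G'), W and D as dicts keyed by (U, V)."""
    nodes = sorted(times)
    M = max(times.values()) * len(nodes)                 # Step 1: M = tmax * n
    S = {(u, v): math.inf for u in nodes for v in nodes}
    for u, v, w in edges:                                # Step 2: w'(e) = M w(e) - t(U)
        S[u, v] = min(S[u, v], M * w - times[u])
    for k in nodes:                                      # Step 3: Floyd-Warshall
        for i in nodes:
            for j in nodes:
                if S[i, k] + S[k, j] < S[i, j]:
                    S[i, j] = S[i, k] + S[k, j]
    W, D = {}, {}
    for u in nodes:                                      # Step 4
        for v in nodes:
            if u == v:
                W[u, v], D[u, v] = 0, times[u]
            elif S[u, v] < math.inf:                     # V reachable from U
                W[u, v] = -(-S[u, v] // M)               # integer ceil(S'/M)
                D[u, v] = M * W[u, v] - S[u, v] + times[v]
    return M, S, W, D


if __name__ == "__main__":
    M, S, W, D = compute_w_d(TIMES, EDGES)
    print("M =", M)
    for name, table in (("W", W), ("D", D)):
        print(name, [[table[u, v] for v in sorted(TIMES)] for u in sorted(TIMES)])
```

The integer ceiling -(-a // b) avoids floating-point rounding in W(U,V).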

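Example
Step 1: tmax = 2 and n = 4, hence M = 8.
Step 2: The new graph G' has edge weights w'(e) = 8w(e) - t(U), which for this graph are 15, 7 and -2.
[Figure: the 4-node DFG G (node times (1) and (2), edge delays D and 2D) and the derived graph G' with edge weights 15, 7 and -2.]
Step 3: The shortest paths can be found by a standard shortest-path algorithm; the values of S'UV are as follows (the diagonal is not needed):
S'UV   V=1  V=2  V=3  V=4
U=1     -    5    7   15
U=2     7    -   14   22
U=3     5   -2    -   20
U=4     5   -2   12    -
Step 4: W(U,V) and D(U,V) then follow:
W(U,V) V=1  V=2  V=3  V=4
U=1     0    1    1    2
U=2     1    0    2    3
U=3     1    0    0    3
U=4     1    0    2    0
D(U,V) V=1  V=2  V=3  V=4
U=1     1    4    3    3
U=2     2    1    4    4
U=3     4    3    2    6
U=4     4    3    6    2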

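Example (cont.)
Given the values of W(U,V) and D(U,V), determine whether there is a retiming solution that achieves a desired clock period c. Let c = 3 in the previous example.
Critical path constraints (all pairs U, V with D(U,V) > 3):
r(1) - r(2) <= 0, r(2) - r(3) <= 1, r(2) - r(4) <= 2, r(3) - r(1) <= 0, r(3) - r(4) <= 2, r(4) - r(1) <= 0, r(4) - r(3) <= 1.
Feasibility constraints (all edges):
r(1) - r(3) <= 1, r(1) - r(4) <= 2, r(2) - r(1) <= 1, r(3) - r(2) <= 0, r(4) - r(2) <= 0.
Feasible solution: r(1) = r(2) = r(3) = r(4) = 0, i.e. no retiming is needed.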

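Example (cont.)
Now we try c = 2.
Critical path constraints (all pairs U, V with D(U,V) > 2):
r(1) - r(2) <= 0, r(1) - r(3) <= 0, r(1) - r(4) <= 1, r(2) - r(3) <= 1, r(2) - r(4) <= 2, r(3) - r(1) <= 0, r(3) - r(2) <= -1, r(3) - r(4) <= 2, r(4) - r(1) <= 0, r(4) - r(2) <= -1, r(4) - r(3) <= 1.
Feasibility constraints (all edges):
r(1) - r(3) <= 1, r(1) - r(4) <= 2, r(2) - r(1) <= 1, r(3) - r(2) <= 0, r(4) - r(2) <= 0.
Feasible solution: r(1) = -1, r(2) = 0, r(3) = -1, r(4) = -1. By Property 4 this is the same retiming as r(1) = 0, r(2) = 1, r(3) = 0, r(4) = 0.
[Figure: the retimed DFG, with delays D, D, 2D, D and D on its edges.]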

Retiming for register minimization
Find a retiming solution that uses the minimum number of registers while satisfying the clock period constraint, i.e. find the feasible solution with the minimum number of registers in the design space constrained by the feasibility and critical path constraints.
Maximum fanout observation: if a node has several output edges carrying the same signal, the number of registers needed to implement these edges is the maximum number of registers on any one of the edges.
[Figure: node U drives V1, V2 and V3 through edges with D, 3D and 7D; a single shared delay line (D, 2D, 4D) implements all three edges with 7 registers instead of 1 + 3 + 7 = 11.]

Retiming for register minimization (cont.)
The number of registers required to implement the output edges of node V in the retimed graph is
$R_V = \max_{V \xrightarrow{e} ?} \{ w_r(e) \}$
The total register cost in the retimed circuit is
$\mathrm{COST} = \sum_{V} R_V$
The formulation of retiming to minimize the number of registers under the constraint that the clock period is not greater than c is:
Minimize COST subject to
(fanout constraint) R_V >= wr(e) for all V and all edges e coming from V,
(feasibility constraint) r(U) - r(V) <= w(e) for every edge U->V,
(clock period constraint) r(U) - r(V) <= W(U,V) - 1 for all nodes U, V such that D(U,V) > c.

Retiming for register minimization (cont.)
The above formulation can be solved using Integer Linear Programming (ILP). The details of solving ILPs are not covered in this course.
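
For a graph as small as the IIR example, the formulation can be checked by exhaustive search. The sketch below is a brute-force stand-in, not an ILP solver. It fixes r(1) = 0 (allowed by Property 4), tries every other r(V) in a small hypothetical range, and keeps the feasible retimings whose delay-free critical path is at most c. That test is equivalent to the clock period constraint. It then minimizes COST with fanout sharing. The graph data is an assumed reconstruction. Because COST shares registers across fanout edges, its values differ from the per-edge register counts (4 and 5) on the IIR example slide.

```python
from functools import lru_cache
from itertools import product

# ASSUMPTION: hypothetical reconstruction of the IIR example graph.
TIMES = {1: 1, 2: 1, 3: 2, 4: 2}                                 # t(V)
EDGES = [(1, 3, 1), (1, 4, 2), (2, 1, 1), (3, 2, 0), (4, 2, 0)]  # (U, V, w(e))


def retime(edges, r):
    """w_r(e) = w(e) + r(V) - r(U)."""
    return [(u, v, w + r[v] - r[u]) for u, v, w in edges]


def critical_path(times, edges):
    """Longest computation time over delay-free paths (assumes they form a DAG)."""
    succ = {v: [] for v in times}
    for u, v, w in edges:
        if w == 0:
            succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(v):
        return times[v] + max((longest_from(x) for x in succ[v]), default=0)

    return max(longest_from(v) for v in times)


def register_cost(edges):
    """COST = sum of R_V, where R_V is the largest w_r(e) on V's output edges."""
    R = {}
    for u, _, w in edges:
        R[u] = max(R.get(u, 0), w)
    return sum(R.values())


def min_registers(times, edges, c, span=3):
    """Exhaustive search over r(V) in [-span, span], with r = 0 on the first node.

    Returns (COST, r) for a cheapest retiming with clock period <= c, or None.
    """
    nodes = sorted(times)
    best = None
    for values in product(range(-span, span + 1), repeat=len(nodes) - 1):
        r = dict(zip(nodes, (0,) + values))
        er = retime(edges, r)
        if any(w < 0 for _, _, w in er):       # feasibility constraint
            continue
        if critical_path(times, er) > c:       # clock period constraint
            continue
        cost = register_cost(er)
        if best is None or cost < best[0]:
            best = (cost, r)
    return best


if __name__ == "__main__":
    for c in (3, 2):
        print(c, min_registers(TIMES, EDGES, c))
```

For c = 3 the minimum cost is 3, already met by r = 0. For c = 2 it rises to 4. No retiming reaches c = 1, because nodes 3 and 4 alone take 2 time units. The search grows exponentially with the number of nodes, which is why the ILP formulation is used in practice.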