EDA (CS286.5b) Day 18 Retiming. Today Retiming –cycle time (clock period) –C-slow –initial states –register minimization.

Slides:



Advertisements
Similar presentations
Introduction to Algorithms 6.046J/18.401J/SMA5503
Advertisements

ECE 667 Synthesis and Verification of Digital Circuits
Architecture-dependent optimizations Functional units, delay slots and dependency analysis.
Courtesy RK Brayton (UCB) and A Kuehlmann (Cadence) 1 Logic Synthesis Sequential Synthesis.
Chapter 4 Retiming.
Routing in a Parallel Computer. A network of processors is represented by graph G=(V,E), where |V| = N. Each processor has unique ID between 1 and N.
Introduction to Algorithms
MS&E 211 Minimum Cost Flow LP Ashish Goel. Minimum Cost Flow (MCF) Need to ship some good from “supply” nodes to “demand” nodes over a network – Example:
Sequential Timing Optimization. Long path timing constraints Data must not reach destination FF too late s i + d(i,j) + T setup  s j + P s i s j d(i,j)
Parallel Scheduling of Complex DAGs under Uncertainty Grzegorz Malewicz.
Basic Feasible Solutions: Recap MS&E 211. WILL FOLLOW A CELEBRATED INTELLECTUAL TEACHING TRADITION.
1 Maximum Flow w s v u t z 3/33/3 1/91/9 1/11/1 3/33/3 4/74/7 4/64/6 3/53/5 1/11/1 3/53/5 2/22/2 
Tirgul 12 Algorithm for Single-Source-Shortest-Paths (s-s-s-p) Problem Application of s-s-s-p for Solving a System of Difference Constraints.
Circuit Retiming with Interconnect Delay CUHK CSE CAD Group Meeting One Evangeline Young Aug 19, 2003.
ECE Synthesis & Verification - Lecture 2 1 ECE 697B (667) Spring 2006 ECE 697B (667) Spring 2006 Synthesis and Verification of Digital Circuits Scheduling.
Shortest Paths Definitions Single Source Algorithms –Bellman Ford –DAG shortest path algorithm –Dijkstra All Pairs Algorithms –Using Single Source Algorithms.
Pipelining and Retiming 1 Pipelining  Adding registers along a path  split combinational logic into multiple cycles  increase clock rate  increase.
EDA (CS286.5b) Day 6 Partitioning: Spectral + MinCut.
Penn ESE Fall DeHon 1 ESE (ESE534): Computer Organization Day 19: March 26, 2007 Retime 1: Transformations.
Chapter 9 Graph algorithms Lec 21 Dec 1, Sample Graph Problems Path problems. Connectedness problems. Spanning tree problems.
Retiming. Consider the Following Circuit Suppose T XOR = 3 ns, T pcq = 1 ns, T setup = 1 ns, then this circuit can be clocked at 1 ns + (3 x 3 ns) + 1.
Shortest Paths Definitions Single Source Algorithms
EDA (CS286.5b) Day 11 Scheduling (List, Force, Approximation) N.B. no class Thursday (FPGA) …
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 5: February 2, 2009 Architecture Synthesis (Provisioning, Allocation)
CS294-6 Reconfigurable Computing Day 16 October 15, 1998 Retiming.
Tirgul 13. Unweighted Graphs Wishful Thinking – you decide to go to work on your sun-tan in ‘ Hatzuk ’ beach in Tel-Aviv. Therefore, you take your swimming.
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under contract.
EDA (CS286.5b) Day 19 Covering and Retiming. “Final” Like Assignment #1 –longer –more breadth –focus since assignment #2 –…but ideas are cummulative –open.
1 Retiming Outline: ProblemProblem FormulationFormulation Retiming algorithmRetiming algorithm.
All-Pairs Shortest Paths
VLSI DSP 2008Y.T. Hwang3-1 Chapter 3 Algorithm Representation & Iteration Bound.
ECE Synthesis & Verification 1 ECE 667 ECE 667 Synthesis and Verification of Digital Systems Retiming.
03/08/2005 © J.-H. Jiang1 Retiming and Resynthesis EECS 290A – Spring 2005 UC Berkeley.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 5: February 2, 2009 Architecture Synthesis (Provisioning, Allocation)
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 8: February 13, 2008 Retiming.
CS774. Markov Random Field : Theory and Application Lecture 13 Kyomin Jung KAIST Oct
Network and Communications Ju Wang Chapter 5 Routing Algorithm Adopted from Choi’s notes Virginia Commonwealth University.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 24: April 18, 2011 Covering and Retiming.
1 1 © 2003 Thomson  /South-Western Slide Slides Prepared by JOHN S. LOUCKS St. Edward’s University.
CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 7: February 3, 2002 Retiming.
NP-COMPLETE PROBLEMS. Admin  Two more assignments…  No office hours on tomorrow.
ELEC692 VLSI Signal Processing Architecture Lecture 3
Pipelining and Retiming
CALTECH CS137 Spring DeHon 1 CS137: Electronic Design Automation Day 5: April 12, 2004 Covering and Retiming.
1 EE5900 Advanced Embedded System For Smart Infrastructure Static Scheduling.
TU/e Algorithms (2IL15) – Lecture 12 1 Linear Programming.
Instructor Neelima Gupta Edited by Divya Gaur(39, MCS '09) Thanks to: Bhavya(9), Deepika(10), Deepika Bisht(11) (MCS '09)
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 25: April 17, 2013 Covering and Retiming.
St. Edward’s University
Maximum Flow c v 3/3 4/6 1/1 4/7 t s 3/3 w 1/9 3/5 1/1 3/5 u z 2/2
CS137: Electronic Design Automation
The minimum cost flow problem
Algorithms and Networks
Instructor: Shengyu Zhang
Maximum Flow c v 3/3 4/6 1/1 4/7 t s 3/3 w 1/9 3/5 1/1 3/5 u z 2/2
Analysis of Algorithms
ESE535: Electronic Design Automation
Richard Anderson Lecture 30 NP-Completeness
CprE / ComS 583 Reconfigurable Computing
CS184a: Computer Architecture (Structures and Organization)
Flow Networks and Bipartite Matching
ESE535: Electronic Design Automation
Constraint satisfaction problems
Algorithms (2IL15) – Lecture 7
Algorithms and Networks
Maximum Flow c v 3/3 4/6 1/1 4/7 t s 3/3 w 1/9 3/5 1/1 3/5 u z 2/2
Timing Analysis and Optimization of Sequential Circuits
Fast Min-Register Retiming Through Binary Max-Flow
Constraint satisfaction problems
Presentation transcript:

EDA (CS286.5b) Day 18 Retiming

Today Retiming –cycle time (clock period) –C-slow –initial states –register minimization

Problem Given: clocked circuit Goal: minimize clock period without changing (observable) behavior I.e. minimize maximum delay between any pair of registers Freedom: move placement of internal registers

Other Goals Minimize number of registers in circuit Achieve target cycle time Minimize number of registers while achieving target cycle time …start talking about minimizing cycle...

Simple Example Path Length (L) = 4 Can we do better?

Legal Register Moves Retiming Lag/Lead

Canonical Graph Representation Separate arch for each path Weight edges by number of registers (weight nodes by delay through node)

Critical Path Length Critical Path: Length of longest path of zero weight nodes Compute in O(|E|) time by levelizing network: Topological sort, push path lengths forward until find register.

Retiming Lag/Lead Retiming: Assign a lag to every vertex weight(e) = weight(e) + lag(head(e))-lag(tail(e))

Valid Retiming Retiming is valid as long as: –  e in graph weight(e) = weight(e) + lag(head(e))-lag(tail(e))  0 Assuming original circuit was a valid synchronous circuit, this guarantees: –non-negative register weights on all edges no travel backward in time :-) –all cycles have strictly positive register counts –propagation delay on each vertex is non- negative (assumed 1 for today)

Retiming Task Move registers  assign lags to nodes –lags define all locally legal moves Preserving non-negative edge weights –(previous slide) –guarantees collection of lags remains consistent globally

Retiming Transformation N.B. -- unchanged by retiming –number of registers around a cycle –delay along a cycle Cycle of length P must have –at least P/c registers on it –to be retimeable to cycle c

Optimal Retiming There is a retiming of –graph G –w/ clock cycle c –iff G-1/c has no cycles with negative edge weights G-   subtract  from each edge weight

G-1/c

Intuition Must have at most c delay between every pair of registers So, count 1/c’th charge against register for every delay without out –(G provides credit of 1 register every time one passed)

Compute Retiming Lag(v) = shortest path to I/O in G-1/c Compute shortest paths in O(|V||E|) –Bellman-Ford –also use to detect negative weight cycles when c too small

Bellman Ford For I  0 to N –u i  (except u i =0 for IO) For k  0 to N –for e i,j  E u i  min(u i, u j +w(e i,j )) for e i,j  E if u i >u j +w(e i,j ) –cycles detected

Apply to Example

Apply: Find Lags

Apply: Lags

Apply: Move Registers weight(e) = weight(e) + lag(head(e))-lag(tail(e))

Apply: Retimed

Apply: Retimed Design

Revise Example (fanout delay)

Revised: Graph

Revised: C=1?

Revised: C=2?

Revised: Lag

Take ceiling to convert to integer lags: 0 0

Revised: Apply Lag 0 0

Revised: Apply Lag

Revised: Retimed

Pipelining Can use this retiming to pipeline Assume have enough (infinite supply) of registers at edge of circuit Retime them into circuit

C>1 ==> Pipeline

Add Registers

Pipeline Retiming: Lag

Pipelined Retimed

Real Cycle

Cycle C=1?

Cycle C=2?

Cycle: C-slow Cycle=c  C-slow network has Cycle=1

2-slow Cycle  C=1

2-Slow Lags

2-Slow Retime

Retimed 2-Slow Cycle

C-Slow applicable? Available parallelism –solve C identical, independent problems e.g. process packets (blocks) separately e.g. independent regions in images Commutative operators –e.g. max example

Note Algorithm/examples shown –for special case of unit-delay nodes For general delay, still polynomial, but a bit more complicated –(but still polynomial)

Initial State What about initial state? 0 1

Initial State 0

1 0

Cannot always get exactly the same initial state behavior on the retimed circuit –without additional care in the retiming transformation –sometimes have to modify structure of retiming to perserve initial behavior Only a problem for startup transient –if circuit you’re willing to clock to get into initial state, not a limitation

Minimize Registers

Number of registers:  w(e) After retime:  w(e)+  (FI(v)-FO(v))lag(v) delta only in lags So want to minimize:  (FI(v)-FO(v))lag(v) –subject to earlier constraints non-negative register weights, delays positive cycle counts

Minimize Registers Can be formulated as flow problem Can add cycle time constraints to flow problem Time: O(|V||E|log(|V|)log|(|V| 2 /|E|))

Summary Can move registers to minimize cycle time Formulate as a lag assignment to every node Optimally solve cycle time in O(|V||E|) time Also –Compute multithreaded computations –Minimize registers Watch out for initial values

Today’s Big Ideas Exploit freedom Formulate transformations (lag assignment) Express legality constraints Technique: –graph algorithms –network flow