CS184a: Computer Architecture (Structures and Organization)

Slides:



Advertisements
Similar presentations
ECE 667 Synthesis and Verification of Digital Circuits
Advertisements

Courtesy RK Brayton (UCB) and A Kuehlmann (Cadence) 1 Logic Synthesis Sequential Synthesis.
Chapter 4 Retiming.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day17: November 20, 2000 Time Multiplexing.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 21: April 2, 2007 Time Multiplexing.
Circuit Retiming with Interconnect Delay CUHK CSE CAD Group Meeting One Evangeline Young Aug 19, 2003.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day10: October 25, 2000 Computing Elements 2: Cascades, ALUs,
Pipelining and Retiming 1 Pipelining  Adding registers along a path  split combinational logic into multiple cycles  increase clock rate  increase.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day8: October 18, 2000 Computing Elements 1: LUTs.
Penn ESE Fall DeHon 1 ESE (ESE534): Computer Organization Day 19: March 26, 2007 Retime 1: Transformations.
HSRA: High-Speed, Hierarchical Synchronous Reconfigurable Array William Tsu, Kip Macy, Atul Joshi, Randy Huang, Norman Walker, Tony Tung, Omid Rowhani,
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 18: February 21, 2003 Retiming 2: Structures and Balance.
CS294-6 Reconfigurable Computing Day 16 October 15, 1998 Retiming.
Spring 07, Apr 5 ELEC 7770: Advanced VLSI Design (Agrawal) 1 ELEC 7770 Advanced VLSI Design Spring 2007 Retiming Vishwani D. Agrawal James J. Danaher Professor.
Tirgul 13. Unweighted Graphs Wishful Thinking – you decide to go to work on your sun-tan in ‘ Hatzuk ’ beach in Tel-Aviv. Therefore, you take your swimming.
03/08/2005 © J.-H. Jiang1 Retiming and Resynthesis EECS 290A – Spring 2005 UC Berkeley.
EDA (CS286.5b) Day 18 Retiming. Today Retiming –cycle time (clock period) –C-slow –initial states –register minimization.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 8: February 13, 2008 Retiming.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 13: February 4, 2005 Interconnect 1: Requirements.
MAX FLOW CS302, Spring 2013 David Kauchak. Admin.
Network and Communications Ju Wang Chapter 5 Routing Algorithm Adopted from Choi’s notes Virginia Commonwealth University.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 24: April 18, 2011 Covering and Retiming.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 7: January 24, 2003 Instruction Space.
CALTECH CS137 Winter DeHon CS137: Electronic Design Automation Day 7: February 3, 2002 Retiming.
CSE 589 Part VI. Reading Skiena, Sections 5.5 and 6.8 CLR, chapter 37.
ELEC692 VLSI Signal Processing Architecture Lecture 3
Pipelining and Retiming
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day16: November 15, 2000 Retiming Structures.
CALTECH CS137 Spring DeHon 1 CS137: Electronic Design Automation Day 5: April 12, 2004 Covering and Retiming.
Algorithm Design and Analysis (ADA)
Lecture 17: Dynamic Reconfiguration I November 10, 2004 ECE 697F Reconfigurable Computing Lecture 17 Dynamic Reconfiguration I Acknowledgement: Andre DeHon.
Graphs Definition: a graph is an abstract representation of a set of objects where some pairs of the objects are connected by links. The interconnected.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 11: January 31, 2005 Compute 1: LUTs.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 20: February 27, 2005 Retiming 2: Structures and Balance.
Penn ESE535 Spring DeHon 1 ESE535: Electronic Design Automation Day 25: April 17, 2013 Covering and Retiming.
Vishwani D. Agrawal James J. Danaher Professor
Pipelining and Retiming 1
CS137: Electronic Design Automation
Richard Anderson Lecture 29 NP-Completeness and course wrap-up
CS184a: Computer Architecture (Structure and Organization)
CS137: Electronic Design Automation
CS137: Electronic Design Automation
James D. Z. Ma Department of Electrical and Computer Engineering
Planar Graphs & Euler’s Formula
ESE535: Electronic Design Automation
ESE534: Computer Organization
ELEC 7770 Advanced VLSI Design Spring 2012 Retiming
Standard-Cell Mapping Revisited
Vishwani D. Agrawal James J. Danaher Professor
Day 26: November 1, 2013 Synchronous Circuits
Analysis of Algorithms
ESE535: Electronic Design Automation
Unit 4: Dynamic Programming
Richard Anderson Lecture 30 NP-Completeness
CprE / ComS 583 Reconfigurable Computing
Flow Networks and Bipartite Matching
CS184a: Computer Architecture (Structures and Organization)
ESE535: Electronic Design Automation
Algorithms (2IL15) – Lecture 7
Lecture 14 Shortest Path (cont’d) Minimum Spanning Tree
ELEC 7770 Advanced VLSI Design Spring 2016 Retiming
Network Flow.
Timing Analysis and Optimization of Sequential Circuits
CS137: Electronic Design Automation
CS184a: Computer Architecture (Structure and Organization)
Fast Min-Register Retiming Through Binary Max-Flow
Lecture 13 Shortest Path (cont’d) Minimum Spanning Tree
Network Flow.
CS137: Electronic Design Automation
Presentation transcript:

CS184a: Computer Architecture (Structures and Organization) Day15: November 13, 2000 Retiming Caltech CS184a Fall2000 -- DeHon

Previously Reviewed Pipelining Saw spatial designs efficient basic assignments on Saw spatial designs efficient when reuse logic at maximum frequency Interconnect dominant delay and dominant area heavy call to reuse to use efficiently Caltech CS184a Fall2000 -- DeHon

Today Systematic transformation for retiming maximize throughput preserve semantics “justify” mandatory registers in design Caltech CS184a Fall2000 -- DeHon

Motivation FPGAs (spatial computing) run efficiently when all resources reused rapidly cycle time minimized “Everything in the right place at the right time.” Caltech CS184a Fall2000 -- DeHon

Task Move registers to: preserve semantics Minimize path length between registers (make path length 1 for maximum throughput or reuse) Maximize reuse rate …while minimizing number of registers required Caltech CS184a Fall2000 -- DeHon

Simple Example Path Length (L) = 4 Can we do better? Caltech CS184a Fall2000 -- DeHon

Legal Register Moves Retiming Lag/Lead Caltech CS184a Fall2000 -- DeHon

Canonical Graph Representation Separate arch for each path Weight edges by number of registers (weight nodes by delay through node) Caltech CS184a Fall2000 -- DeHon

Critical Path Length Critical Path: Length of longest path of zero weight nodes Compute in O(|E|) time by levelizing network: Topological sort, push path lengths forward until find register. Caltech CS184a Fall2000 -- DeHon

Retiming Lag/Lead Retiming: Assign a lag to every vertex weight(e) = weight(e) + lag(head(e))-lag(tail(e)) Caltech CS184a Fall2000 -- DeHon

Valid Retiming Retiming is valid as long as: e in graph weight(e) = weight(e) + lag(head(e))-lag(tail(e))  0 Assuming original circuit was a valid synchronous circuit, this guarantees: non-negative register weights on all edges no travel backward in time :-) all cycles have strictly positive register counts propagation delay on each vertex is non-negative (assumed 1 for today) Caltech CS184a Fall2000 -- DeHon

Retiming Task Move registers  assign lags to nodes lags define all locally legal moves Preserving non-negative edge weights (previous slide) guarantees collection of lags remains consistent globally Caltech CS184a Fall2000 -- DeHon

Retiming Transformation N.B. -- unchanged by retiming number of registers around a cycle delay along a cycle Cycle of length P must have at least P/c registers on it to be retimeable to cycle c Caltech CS184a Fall2000 -- DeHon

Optimal Retiming There is a retiming of graph G w/ clock cycle c iff G-1/c has no cycles with negative edge weights G-  subtract a from each edge weight Caltech CS184a Fall2000 -- DeHon

G-1/c Caltech CS184a Fall2000 -- DeHon

Compute Retiming Lag(v) = shortest path to I/O in G-1/c Compute shortest paths in O(|V||E|) Bellman-Ford also use to detect negative weight cycles when c too small Caltech CS184a Fall2000 -- DeHon

Bellman Ford For I0 to N For k0 to N ui  (except ui=0 for IO) for ei,jE ui min(ui ,uj+w(ei,j)) if ui >uj+w(ei,j) cycles detected Caltech CS184a Fall2000 -- DeHon

Apply to Example Caltech CS184a Fall2000 -- DeHon

Apply: Find Lags Caltech CS184a Fall2000 -- DeHon

Apply: Lags Caltech CS184a Fall2000 -- DeHon

Apply: Move Registers weight(e) = weight(e) + lag(head(e))-lag(tail(e)) Caltech CS184a Fall2000 -- DeHon

Apply: Retimed Caltech CS184a Fall2000 -- DeHon

Apply: Retimed Design Caltech CS184a Fall2000 -- DeHon

Revise Example (fanout delay) Caltech CS184a Fall2000 -- DeHon

Revised: Graph Caltech CS184a Fall2000 -- DeHon

Revised: Graph Caltech CS184a Fall2000 -- DeHon

Revised: C=1? Caltech CS184a Fall2000 -- DeHon

Revised: C=2? Caltech CS184a Fall2000 -- DeHon

Revised: Lag Caltech CS184a Fall2000 -- DeHon

Revised: Lag Take ceiling to convert to integer lags: -1 -1 Caltech CS184a Fall2000 -- DeHon

Revised: Apply Lag -1 Caltech CS184a Fall2000 -- DeHon

Revised: Apply Lag -1 1 1 1 1 1 1 1 1 Caltech CS184a Fall2000 -- DeHon

Revised: Retimed 1 1 1 1 1 1 1 1 Caltech CS184a Fall2000 -- DeHon

Pipelining Can use this retiming to pipeline Assume have enough (infinite supply) of registers at edge of circuit Retime them into circuit Caltech CS184a Fall2000 -- DeHon

C>1 ==> Pipeline Caltech CS184a Fall2000 -- DeHon

Add Registers Caltech CS184a Fall2000 -- DeHon

Pipeline Retiming: Lag Caltech CS184a Fall2000 -- DeHon

Pipelined Retimed Caltech CS184a Fall2000 -- DeHon

Real Cycle Caltech CS184a Fall2000 -- DeHon

Real Cycle Caltech CS184a Fall2000 -- DeHon

Cycle C=1? Caltech CS184a Fall2000 -- DeHon

Cycle C=2? Caltech CS184a Fall2000 -- DeHon

Cycle: C-slow Cycle=c  C-slow network has Cycle=1 Caltech CS184a Fall2000 -- DeHon

2-slow Cycle  C=1 Caltech CS184a Fall2000 -- DeHon

2-Slow Lags Caltech CS184a Fall2000 -- DeHon

2-Slow Retime Caltech CS184a Fall2000 -- DeHon

Retimed 2-Slow Cycle Caltech CS184a Fall2000 -- DeHon

C-Slow applicable? Available parallelism Commutative operators solve C identical, independent problems e.g. process packets (blocks) separately e.g. independent regions in images Commutative operators e.g. max example Caltech CS184a Fall2000 -- DeHon

Max Example Caltech CS184a Fall2000 -- DeHon

Max Example Caltech CS184a Fall2000 -- DeHon

Monday Lecture Stopped Here Caltech CS184a Fall2000 -- DeHon

HSRA Retiming HSRA One additional twist adds mandatory pipelining to interconnect One additional twist long, pipelined interconnect  need more than one register on paths Caltech CS184a Fall2000 -- DeHon

Accommodating HSRA Interconnect Delays Add buffers to LUTLUT path to match interconnect register requirements Retime to C=1 as before Buffer chains force enough registers to cover interconnect delays Caltech CS184a Fall2000 -- DeHon

Accommodating HSRA Interconnect Delays Caltech CS184a Fall2000 -- DeHon

Big Ideas [MSB Ideas] Retiming important to minimize cycles efficiently utilize spatial architectures Optimally solvable in O(|V||E|) time Tells us pipelining required C-slow where to move registers Caltech CS184a Fall2000 -- DeHon