CS294-6 Reconfigurable Computing Day 16 October 15, 1998 Retiming.

1 CS294-6 Reconfigurable Computing Day 16 October 15, 1998 Retiming

2 So Far W/ HSRA –…lots of emphasis on retiming, already

3 Today Formalizing the transformations Retiming Algorithm C-slow

4 Motivation FPGAs (spatial computing) –run efficiently when all resources are reused rapidly (cycle time minimized) HSRA –designed to embody/require this –correct operation depends on matching the basic cycle of the components in the architecture “Everything in the right place at the right time.”

5 Task Move registers to: –Minimize path length between registers –(make path length 1 for HSRA) –Maximize reuse rate –…while minimizing number of registers required

6 Simple Example Path Length (L) = 4 Can we do better?

7 Legal Register Moves Retiming Lag/Lead

8 Canonical Graph Representation Separate arc for each path Weight edges by number of registers (weight nodes by delay through node)

9 Critical Path Length Critical Path: Length of the longest path made up only of zero-weight (register-free) edges Compute in O(|E|) time by levelizing the network: Topological sort, push path lengths forward until a register is found.
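
The levelizing step above can be sketched in a few lines. This is a minimal illustration, not code from the lecture: it assumes unit delay per node, edges given as `(tail, head, registers)` tuples, and the function name `critical_path_length` is made up for this example.

```python
from collections import defaultdict, deque

def critical_path_length(nodes, edges):
    """Count nodes on the longest chain joined only by register-free edges.

    Assumes unit delay per node; edges are (tail, head, registers) tuples.
    """
    succ = defaultdict(list)
    indeg = {v: 0 for v in nodes}
    # Only zero-weight edges extend a combinational path.
    for tail, head, regs in edges:
        if regs == 0:
            succ[tail].append(head)
            indeg[head] += 1

    depth = {v: 1 for v in nodes}          # every node alone is a path of length 1
    ready = deque(v for v in nodes if indeg[v] == 0)
    visited = 0
    while ready:                            # topological sweep (levelization)
        u = ready.popleft()
        visited += 1
        for v in succ[u]:
            depth[v] = max(depth[v], depth[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)

    if visited != len(nodes):
        raise ValueError("register-free cycle: not a valid synchronous circuit")
    return max(depth.values())

# Hypothetical 4-node ring with a single register on the feedback edge.
ring = [("a", "b", 0), ("b", "c", 0), ("c", "d", 0), ("d", "a", 1)]
print(critical_path_length(["a", "b", "c", "d"], ring))
```

Each edge is examined a constant number of times, which is where the O(|E|) bound comes from.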

10 Retiming Lag/Lead Retiming: Assign a lag to every vertex weight'(e) = weight(e) + lag(head(e)) − lag(tail(e))

11 Valid Retiming Retiming is valid as long as: –∀ e in graph: weight'(e) = weight(e) + lag(head(e)) − lag(tail(e)) ≥ 0 Assuming original circuit was a valid synchronous circuit, this guarantees: –non-negative register weights on all edges (no travel backward in time :-) –all cycles have strictly positive register counts –propagation delay on each vertex is non-negative (assumed 1 for today)
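
A small sketch of the lag formula and the validity check may help. The helper names `retime` and `is_valid` and the example graph are assumptions for illustration; edges are `(tail, head, registers)` tuples.

```python
def retime(edges, lag):
    """Apply weight'(e) = weight(e) + lag(head(e)) - lag(tail(e)) to every edge."""
    return [(tail, head, w + lag[head] - lag[tail]) for tail, head, w in edges]

def is_valid(edges):
    """A retiming is valid when no edge ends up with a negative register count."""
    return all(w >= 0 for _, _, w in edges)

# Hypothetical ring with two registers bunched on the feedback edge.
ring = [("a", "b", 0), ("b", "c", 0), ("c", "d", 0), ("d", "a", 2)]

good = retime(ring, {"a": 0, "b": 0, "c": 1, "d": 1})
bad = retime(ring, {"a": 0, "b": 0, "c": -1, "d": 0})

print(good, is_valid(good))   # one register moves onto edge b -> c
print(bad, is_valid(bad))     # edge b -> c would need -1 registers
```

In both cases the register count summed around the ring stays at 2, since each vertex's lag is added once (on its incoming edge) and subtracted once (on its outgoing edge).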

12 Retiming Task Move registers ⇔ assign lags to nodes –lags define all locally legal moves Preserving non-negative edge weights –(previous slide) –guarantees collection of lags remains consistent globally

13 Retiming Transformation N.B. -- unchanged by retiming –number of registers around a cycle –delay along a cycle Cycle of length P must have –at least P/c registers on it –to be retimeable to cycle c
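
The P/c bound can be written out as a one-line derivation. The symbol R (number of registers on the cycle) is introduced here for illustration and does not appear in the slides; unit delay per node and an integer clock cycle c are assumed. Since retiming leaves R unchanged and at most c nodes may sit between consecutive registers:

```latex
R \cdot c \;\ge\; P
\quad\Longrightarrow\quad
R \;\ge\; \frac{P}{c}
\quad\Longleftrightarrow\quad
c \;\ge\; \left\lceil \frac{P}{R} \right\rceil
```

For instance, a cycle of P = 4 unit-delay nodes carrying R = 2 registers cannot be retimed below c = 2.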

14 Optimal Retiming There is a retiming of –graph G –w/ clock cycle c –iff G−1/c has no cycles with negative total edge weight G−α ≡ subtract α from each edge weight

15 G-1/c

16 Compute Retiming Lag(v) = shortest path to I/O in G-1/c Compute shortest paths in O(|V||E|) –Bellman-Ford –also use to detect negative weight cycles when c too small

17 Bellman Ford
For i ← 0 to N
–u_i ← ∞ (except u_i = 0 for IO)
For k ← 0 to N
–for e_i,j ∈ E: u_i ← min(u_i, u_j + w(e_i,j))
For e_i,j ∈ E
–if u_i > u_j + w(e_i,j): cycles detected
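
A runnable sketch of this procedure, applied to G−1/c, is shown here. It is an illustration under stated assumptions rather than the lecture's code: unit delay per node, every node can reach the I/O node, edges are `(tail, head, registers)` tuples, and `find_lags` is a made-up name. Exact fractions are used so that the 1/c offsets do not accumulate rounding error.

```python
from fractions import Fraction

def find_lags(nodes, edges, io, c):
    """Shortest path from each node to `io` in G - 1/c.

    Returns a dict of (possibly fractional) lags, or None when a negative
    cycle shows that clock cycle c is too small.
    """
    # Build G - 1/c: subtract 1/c from every edge weight.
    shifted = [(t, h, Fraction(w) - Fraction(1, c)) for t, h, w in edges]
    dist = {v: None for v in nodes}        # None stands for "infinity"
    dist[io] = Fraction(0)

    def relax_all():
        changed = False
        for t, h, w in shifted:
            if dist[h] is None:
                continue
            cand = dist[h] + w             # go tail -> head, then on to io
            if dist[t] is None or cand < dist[t]:
                dist[t] = cand
                changed = True
        return changed

    for _ in range(len(nodes) - 1):        # |V| - 1 passes suffice
        if not relax_all():
            break
    if relax_all():                        # still improving: negative cycle
        return None
    return dist

# Hypothetical 3-node loop through the I/O node with two registers in total.
nodes = ["io", "a", "b"]
edges = [("io", "a", 1), ("a", "b", 0), ("b", "io", 1)]
print(find_lags(nodes, edges, "io", 1))    # c = 1 is infeasible here
print(find_lags(nodes, edges, "io", 2))    # c = 2 yields fractional lags
```

The loop has three unit-delay nodes and two registers, so c = 1 fails the P/c bound and the negative-cycle test reports it; c = 2 succeeds.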

18 Apply to Example

19 Apply: Find Lags

20 Apply: Lags

21 Apply: Move Registers weight'(e) = weight(e) + lag(head(e)) − lag(tail(e))

22 Apply: Retimed

23 Apply: Retimed Design

24 Revise Example (fanout delay)

25 Revised: Graph

26

27 Revised: C=1?

28 Revised: C=2?

29 Revised: Lag

30 Take ceiling to convert to integer lags:
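
The ceiling step can be sketched as follows. The fractional lags in the example are hypothetical values for a 3-node loop through the I/O node, and the names `integer_lags` and `retimed_weights` are made up for this illustration.

```python
from fractions import Fraction
from math import ceil

def integer_lags(frac_lags):
    """Round each fractional lag up to the next integer."""
    return {v: ceil(d) for v, d in frac_lags.items()}

def retimed_weights(edges, lag):
    """weight'(e) = weight(e) + lag(head(e)) - lag(tail(e)) for every edge."""
    return [(t, h, w + lag[h] - lag[t]) for t, h, w in edges]

# Hypothetical fractional lags for a loop io -> a -> b -> io, with c = 2.
frac = {"io": Fraction(0), "a": Fraction(0), "b": Fraction(1, 2)}
edges = [("io", "a", 1), ("a", "b", 0), ("b", "io", 1)]

lags = integer_lags(frac)
print(lags)
print(retimed_weights(edges, lags))        # all weights stay non-negative
```

Rounding every lag in the same direction is what keeps the edge weights non-negative: if lag(head) ≥ lag(tail) − weight(e) holds before rounding, it still holds after taking the ceiling of both sides, because weight(e) is an integer.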

31 Revised: Apply Lag

32 Revised: Retimed

33 C>1 ==> Pipeline

34 Add Registers

35 Pipeline Retiming: Lag

36 Pipelined Retimed

37 Real Cycle

38

39 Cycle C=1?

40 Cycle C=2?

41 Cycle: C-slow Cycle=c ⇒ C-slow network has Cycle=1
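
A minimal sketch of the C-slow transformation, assuming it is modelled as replacing every register with C registers (each edge weight multiplied by C) and assuming unit delay per node. The names `c_slow` and `cycle_bound` are hypothetical.

```python
def c_slow(edges, factor):
    """Replace every register with `factor` registers on the same edge."""
    return [(t, h, w * factor) for t, h, w in edges]

def cycle_bound(num_nodes, num_registers):
    """Smallest integer clock cycle a loop can reach: ceil(P / R)."""
    return -(-num_nodes // num_registers)

# Hypothetical 2-node feedback loop with a single register.
loop = [("a", "b", 0), ("b", "a", 1)]
regs = sum(w for _, _, w in loop)
print(cycle_bound(2, regs))                # limited to cycle 2

slow = c_slow(loop, 2)
regs_slow = sum(w for _, _, w in slow)
print(slow, cycle_bound(2, regs_slow))     # 2-slow version can reach cycle 1
```

The extra registers still have to be spread around the loop by retiming; the transformation only makes a retiming to cycle 1 possible.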

42 2-slow Cycle ⇒ C=1

43 2-Slow Lags

44 2-Slow Retime

45 Retimed 2-Slow Cycle

46 C-Slow applicable? Available parallelism –solve C identical, independent problems e.g. process packets (blocks) separately e.g. independent regions in images Commutative operators –e.g. max example

47 HSRA Retiming One additional twist –long, pipelined interconnect ⇒ need more than one register on paths

48 Accommodating HSRA Interconnect Delays Add buffers to LUT → LUT path to match interconnect register requirements Retime to C=1 as before Buffer chains force enough registers to cover interconnect delays
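
One way to picture the buffer insertion is sketched here. This is an assumption-laden illustration, not the HSRA tool flow: it assumes each buffer is a unit-delay node, and that a path needing k interconnect registers gets k − 1 buffers, so that retiming to C=1 (at least one register per edge) leaves at least k registers on it. The function name, the `required_regs` map, and the buffer naming scheme are all made up.

```python
def add_interconnect_buffers(edges, required_regs):
    """Split edges into chains of unit-delay buffers.

    `required_regs` maps (tail, head) to the number of registers the
    interconnect needs on that path (defaults to 1, meaning no buffers).
    Existing registers stay on the first segment; retiming redistributes them.
    """
    out = []
    for tail, head, w in edges:
        k = max(1, required_regs.get((tail, head), 1))
        prev = tail
        for i in range(k - 1):
            buf = f"buf_{tail}_{head}_{i}"     # hypothetical buffer node name
            out.append((prev, buf, w if i == 0 else 0))
            prev = buf
        out.append((prev, head, w if k == 1 else 0))
    return out

# Hypothetical LUT-to-LUT path that crosses three pipelined interconnect stages.
print(add_interconnect_buffers([("x", "y", 1)], {("x", "y"): 3}))
```

After this step the graph is retimed exactly as before; the buffer chain is what makes the C=1 constraint translate into enough registers along the physical path.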

49 Accommodating HSRA Interconnect Delays

50 Summary Retiming important to –minimize cycle time –efficiently utilize spatial architectures Optimally solvable in O(|V||E|) time Tells us –pipelining required –C-slow –where to move registers

