NTHU-CS 1 Performance-Optimal Clustering with Retiming for Sequential Circuits Tzu-Chieh Tien and Youn-Long Lin Department of Computer Science National.

1 NTHU-CS 1 Performance-Optimal Clustering with Retiming for Sequential Circuits Tzu-Chieh Tien and Youn-Long Lin Department of Computer Science National Tsing Hua University Hsin-Chu, Taiwan, R.O.C.

2 NTHU-CS 2 Outline
• Introduction
• Previous Work
• Proposed Approach
• Experimental Results
• Conclusion and Future Research

3 NTHU-CS 3 Retiming
[Figure: example circuit with gate delays 3, 5, 2, 1. Critical path delay = 8 before retiming; critical path delay = 7 after retiming.]
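The effect of retiming on the critical path can be sketched on a simple linear chain. The chain layout, the flip-flop positions, and the helper name `clock_period` below are assumptions made for illustration; they are not the circuit drawn on the slide, although the gate delays (3, 5, 2, 1) are chosen so that the result reproduces the slide's 8 and 7.

```python
def clock_period(delays, ff_after):
    """Longest register-to-register combinational delay in a linear chain.

    delays   -- gate delays in signal order (hypothetical chain)
    ff_after -- set of gate indices whose output is followed by a flip-flop
    """
    longest = 0
    current = 0
    for i, d in enumerate(delays):
        current += d                      # extend the current combinational segment
        longest = max(longest, current)
        if i in ff_after:                 # a flip-flop ends the segment
            current = 0
    return longest


delays = [3, 5, 2, 1]
before = clock_period(delays, {1, 2})     # segments: 3+5 | 2 | 1
after = clock_period(delays, {0, 2})      # segments: 3 | 5+2 | 1
print(before, after)
```

Moving the first flip-flop backward across the gate with delay 5 keeps the number of flip-flops on the path unchanged but shortens the longest segment.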

4 NTHU-CS 4 Performance-Driven Clustering
• Minimize clock period under cluster-size constraint
[Figure: example circuit with gate delays 3, 5, 2, 1.]

5 NTHU-CS 5 Combining Clustering and Retiming
[Figure: example circuit with gate delays 3, 5, 2, 1. Panels: clustering w/o retiming consideration; clustering w/ retiming consideration. Labels: critical path delay = 7; critical path delay = 8; inter-cluster delay = 2.]

6 NTHU-CS 6 Problem Definition
• Given
  - a sequential circuit G,
  - a target clock period c, and
  - an area-bound number M
• Find
  - a clustered/retimed/node-replicated circuit G_r
  - clock period less than or equal to c
  - each cluster is of size M or less

7 NTHU-CS 7 Previous Work
• P. Pan, A. K. Karandikar, and C. L. Liu, “Optimal Clock Period Clustering for Sequential Circuits with Retiming,” IEEE T-CAD, June 1998.
  - Optimal under the unit gate delay model
  - Near-optimal for the general gate delay model
• J. Cong, H. Li, and C. Wu, “Simultaneous Circuit Partitioning/Clustering with Retiming for Performance Optimization,” DAC’99.
  - 100X more efficient but still near-optimal

8 NTHU-CS 8 This Work
• Optimal for the general gate delay model
• More (2X) efficient than Pan’s approach

9 NTHU-CS 9 Pan’s Approach
• Label each node v with an l-value, l(v)
• Find a clustered-retimed circuit such that the l-values of all POs are less than or equal to c
• Retiming solution
• Resulting clock period is less than c + max. gate delay

10 NTHU-CS 10 Pan’s l-value of a Node
• Total w_1 edge weight of the longest path from the PIs to the node
• w_1 weight of edge e from u to v: w_1(e) = -c * w(e) + d(v)
• w(e): number of FFs along e
[Figure: example with target c = 6 and gate delays 2, 5, 3. Edge weights w_1(e) = 2, -1, 3, 0; node labels l(v) = 0, 2, 1, 4; 4 < 6.]
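The l-value definition above can be checked numerically. The sketch below assumes a linear chain PI -> gate (delay 2) -> one FF -> gate (delay 5) -> gate (delay 3) -> PO, reads d(v) as the delay of gate v, and gives the PO a delay of 0. That topology is a reconstruction from the numbers on the slide (target c = 6, delays 2, 5, 3), and the function names are hypothetical. It covers only the path-weight definition, not clustering.

```python
def w1(ffs_on_edge, delay_v, c):
    """Edge weight from the slide: w_1(e) = -c * w(e) + d(v)."""
    return -c * ffs_on_edge + delay_v


def l_values_chain(delays, ffs_on_edge, c):
    """l-values along a chain; the primary input is labeled 0.

    delays[i]      -- delay of the i-th node after the PI (0 for the PO)
    ffs_on_edge[i] -- number of flip-flops on the edge entering that node
    """
    labels = [0]
    for d, ffs in zip(delays, ffs_on_edge):
        labels.append(labels[-1] + w1(ffs, d, c))
    return labels


c = 6
delays = [2, 5, 3, 0]        # three gates, then the PO (assumed delay 0)
ffs_on_edge = [0, 1, 0, 0]   # one FF between the first and second gate
labels = l_values_chain(delays, ffs_on_edge, c)
print(labels)                # [0, 2, 1, 4, 4]
print(labels[-1] <= c)       # True: the PO label meets the target
```

The edge weights come out as 2, -1, 3, 0, matching the values listed for the slide's figure.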

11 NTHU-CS 11 Pan’s l-value Labeling
• Traverse the whole circuit, updating l-values, until no node’s l-value changes
• Time complexity

12 NTHU-CS 12 Our Approach
• Modified l-value definition
  - Optimal for the general delay model
  - Based on W.-J. Chen, “A Study on the Relationship Between Retiming and Loop Folding,” Master’s thesis, National Tsing-Hua Univ., Taiwan, R.O.C., Aug. 1994.
• FIFO to aid circuit traversal during labeling
  - Improves run time
  - Time complexity
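The idea of driving the labeling with a FIFO can be illustrated with a generic worklist relaxation: only nodes whose label has just changed are revisited, instead of sweeping the whole circuit on every pass. This is a minimal sketch under stated assumptions, not the authors' algorithm. It uses the plain w_1 weights from the earlier slide, ignores clustering and the modified l-value definition, and the function name `label_fifo`, the node names, and the enqueue-count cutoff (a standard Bellman-Ford-style divergence check) are all hypothetical.

```python
from collections import deque


def label_fifo(nodes, edges, delay, c, sources):
    """Longest-path labels with weights -c * ffs + delay[v], using a FIFO worklist.

    edges -- list of (u, v, ffs) tuples; ffs is the number of FFs on the edge
    Returns a dict of labels, or None if labels keep growing (positive cycle),
    which is treated here as "target c not achievable" (an assumption).
    """
    fanout = {v: [] for v in nodes}
    for u, v, ffs in edges:
        fanout[u].append((v, ffs))

    label = {v: float("-inf") for v in nodes}
    enqueued = {v: 0 for v in nodes}
    queue = deque()
    in_queue = set()
    for s in sources:
        label[s] = 0
        queue.append(s)
        in_queue.add(s)
        enqueued[s] = 1

    while queue:
        u = queue.popleft()
        in_queue.discard(u)
        for v, ffs in fanout[u]:
            candidate = label[u] - c * ffs + delay[v]
            if candidate > label[v]:
                label[v] = candidate
                if v not in in_queue:
                    enqueued[v] += 1
                    if enqueued[v] > len(nodes):
                        return None      # label diverges around a loop
                    queue.append(v)
                    in_queue.add(v)
    return label


# Hypothetical loop: a (delay 5) -> b (delay 3) -> back to a through one FF.
nodes = ["pi", "a", "b", "po"]
delay = {"pi": 0, "a": 5, "b": 3, "po": 0}
edges = [("pi", "a", 0), ("a", "b", 0), ("b", "a", 1), ("b", "po", 0)]

print(label_fifo(nodes, edges, delay, 8, ["pi"]))  # converges
print(label_fifo(nodes, edges, delay, 6, ["pi"]))  # None: loop delay 8 exceeds c
```

With c = 8 the loop has total weight 0 and the labels settle; with c = 6 each trip around the loop adds 2, so the sketch reports divergence rather than iterating forever.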

13 NTHU-CS 13 Modified l-value Labeling
• If an FF’s position is occupied by a gate v,
• detected by
[Figure: example with target c = 6 and gate delays 2, 5, 3. Values shown for l(v): 0, 2, 1, 5, 8; 8 > 6.]

14 NTHU-CS 14 Example (target c = 7, inter-cluster delay = 2)
[Figure: l(v) labels computed on the example circuit with gate delays 3, 5, 2, 1.]

15 NTHU-CS 15 Example (Cont’d) (target c = 7, inter-cluster delay = 2)
[Figure: three steps on the example circuit: clustering, connecting & retiming, merging.]

16 NTHU-CS 16 Example (target c = 6, inter-cluster delay = 2)
[Figure: l(v) labels computed on the example circuit with gate delays 3, 5, 2, 1.]

17 NTHU-CS 17 Example of Pan’s Approach (target c = 6, inter-cluster delay = 2)
[Figure: l(v) labels computed by Pan’s approach on the example circuit with gate delays 3, 5, 2, 1.]

18 NTHU-CS 18 Example of Pan’s (Cont’d) (target c = 6, inter-cluster delay = 2)
[Figure: three steps on the example circuit: clustering, connecting & retiming, merging.]

19 NTHU-CS 19 Experimental Results
• 26 ISCAS-89 benchmark circuits
• Pan’s approach produces suboptimal results for 11 circuits
• Our approach produces an optimal result for every circuit
• Our CPU time consumption is 50% of Pan’s

20 NTHU-CS 20 Conclusion and Future Research
• First exact algorithm for performance-optimal clustering with retiming under the general gate delay model
• Twice as fast as Pan’s near-optimal heuristic
• Future research is to improve run time efficiency

24 NTHU-CS 24 Experimental Results

25 NTHU-CS 25 Experimental Results (Cont’)

