Introduction Precise timing information for critical paths/sinks with delay violations is only available after the placement and routing (P&R) stage. –Re-design is time-consuming. Engineering change orders (ECO) can be used to fix timing violations after P&R. –Using spare cells with re-routing.
Introduction (cont.) Conventional timing ECO algorithms focus on improving the delay of one timing path at a time. –Some considered only one two-pin net in the timing path and neglected the multi-pin net topology when selecting the buffers to insert. –Others considered the positions of multiple pins of a net but not the net topology of the detailed routing paths. Optimizing only the delay of the critical sink by treating a multi-pin net as a two-pin net may degrade the delays of other sinks of the same net. –This successively worsens other paths with timing violations.
The effect of topology
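A minimal sketch of why topology matters under the Elmore delay model: the delay to one sink depends on the capacitance hanging off every branch that shares part of its path, so a two-pin view of a multi-pin net underestimates the delay. The tree encoding, the function name `elmore_delays`, and all unit values are assumptions made for illustration and are not taken from the slides.

```python
def elmore_delays(tree, root, r_drv, sink_cap, r_unit, c_unit):
    """Elmore delay to every sink of an RC tree.

    tree:     dict mapping a node to a list of (child, wire_length)
    sink_cap: dict mapping each sink node to its load capacitance
    All names and units here are illustrative assumptions.
    """
    def cap(node):
        # Total capacitance at and below `node` (sink load plus wires).
        total = sink_cap.get(node, 0.0)
        for child, length in tree.get(node, []):
            total += c_unit * length + cap(child)
        return total

    delays = {}

    def walk(node, t):
        if node in sink_cap:
            delays[node] = t
        for child, length in tree.get(node, []):
            r = r_unit * length
            c = c_unit * length
            # Wire resistance sees half its own capacitance plus everything downstream.
            walk(child, t + r * (c / 2.0 + cap(child)))

    # The driver resistance charges the whole net.
    walk(root, r_drv * cap(root))
    return delays


# Three-pin net: src -> s, then s branches to sinks a and b.
full = elmore_delays({"src": [("s", 1)], "s": [("a", 1), ("b", 1)]},
                     "src", 1.0, {"a": 1.0, "b": 1.0}, 1.0, 1.0)
# Two-pin view of the same path to a, ignoring the branch to b.
two_pin = elmore_delays({"src": [("s", 1)], "s": [("a", 1)]},
                        "src", 1.0, {"a": 1.0}, 1.0, 1.0)
print(full["a"], two_pin["a"])
```

With these unit values the delay to sink a is 11.0 on the real tree but only 7.0 in the two-pin view, because the branch to b adds load to the shared driver and trunk.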
Introduction (cont.) In addition, detailed routing is time-consuming. –Greedily choosing the inserted buffer and its connections may lead to a suboptimal solution. –Sequentially evaluating each reconnection to the newly inserted buffer requires unacceptable detailed rerouting runtime. Parallel routing can reduce the runtime. –GPUs provide high computing power at low cost.
Problem formulation Given –A routed design (D), a buffer set (B), a routed net set (N_ALL), a routed net (N) belonging to N_ALL with an edge set (E), a pin set (P), and a violation pin set (VP). Objective –Insert one buffer from B into N such that the topology of N is changed and the arrival times of the sinks in VP are minimized without introducing new violated sinks. Topology-Aware Buffer Insertion (BI) & Topology Restructuring
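The objective in the problem formulation can be sketched as a simple acceptance check: a candidate change is rejected if any sink that met timing before now violates it, and otherwise it is rated by how much the arrival times of the sinks in VP improve. The function name `vp_improvement`, the per-sink required times, and the use of a plain sum as the rating are assumptions for illustration; the slides do not specify them.

```python
def vp_improvement(arrival_before, arrival_after, required, vp):
    """Return the total arrival-time reduction over VP, or None if rejected.

    arrival_before / arrival_after: dict sink -> arrival time
    required: dict sink -> required arrival time (assumed to be available)
    vp: iterable of sinks that currently violate timing
    """
    for sink, t_req in required.items():
        was_ok = arrival_before[sink] <= t_req
        now_ok = arrival_after[sink] <= t_req
        if was_ok and not now_ok:
            # The change would add a newly violated sink: not allowed.
            return None
    # Positive value means the violating sinks arrive earlier overall.
    return sum(arrival_before[s] - arrival_after[s] for s in vp)


required = {"a": 10.0, "b": 10.0}
before = {"a": 12.0, "b": 9.0}           # sink a violates, sink b is fine
good = {"a": 10.5, "b": 9.5}             # a improves, b still meets timing
bad = {"a": 9.0, "b": 10.5}              # a is fixed, but b becomes violated
print(vp_improvement(before, good, required, ["a"]))
print(vp_improvement(before, bad, required, ["a"]))
```

The first candidate is accepted with an improvement of 1.5; the second is rejected even though it fixes sink a, because it pushes sink b past its required time.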
Buffering pair scoring Buffering pairs (BPs) that may worsen the delay of some sinks are disregarded. –In other words, invalid BPs are ignored. If a BP is valid, the Elmore delay model is used to compute the delay difference for all sinks in VP. The wire length is estimated by the Manhattan distance.
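A minimal sketch of the delay estimate described above: the Elmore delay of a single driver-to-sink wire whose length is taken as the Manhattan distance between the two pins. The function names, the parameter names, and the per-unit resistance and capacitance values are assumptions for illustration; how the slides combine these per-sink differences into a BP score is not reproduced here.

```python
def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def elmore_two_pin(driver, sink, r_drv, c_sink, r_unit, c_unit):
    """Elmore delay of one wire from `driver` to `sink`.

    r_drv:  output resistance of the driving cell (assumed parameter)
    c_sink: input capacitance of the sink pin (assumed parameter)
    r_unit, c_unit: wire resistance and capacitance per unit length
    """
    length = manhattan(driver, sink)
    r_wire = r_unit * length
    c_wire = c_unit * length
    # Driver charges wire + sink; wire resistance sees half its own capacitance.
    return r_drv * (c_wire + c_sink) + r_wire * (c_wire / 2.0 + c_sink)


# Delay difference for one sink when it is driven from a closer point,
# e.g. a hypothetical buffer location at (1, 0) instead of the source at (0, 0).
d_old = elmore_two_pin((0, 0), (1, 1), 1.0, 1.0, 1.0, 1.0)
d_new = elmore_two_pin((1, 0), (1, 1), 1.0, 1.0, 1.0, 1.0)
print(d_old - d_new)
```

With unit parameters the wire from (0, 0) has length 2 and delay 7.0, the wire from (1, 0) has length 1 and delay 3.5, so the estimated delay difference for this sink is 3.5.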
Speedup and preventing race conditions Partition the routing graph into blocks for performance and scalability. Stagger adjacent blocks for better performance. –2.25x faster.
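One possible way to keep concurrently processed blocks from touching the same grid cells is a checkerboard-style schedule: blocks are grouped into phases so that no two blocks in the same phase are neighbours, and each phase is processed in parallel. This is only an assumed reading of the partitioning idea, not the scheme used in the slides, and the function name `partition_into_phases` and the four-phase colouring are illustrative choices.

```python
from collections import defaultdict


def partition_into_phases(grid_w, grid_h, block):
    """Group square blocks of a grid into 4 phases with no adjacent blocks.

    Blocks that share a phase differ by an even number of block steps in
    both x and y, so they never share an edge or a corner.
    """
    phases = defaultdict(list)
    blocks_x = (grid_w + block - 1) // block   # ceiling division
    blocks_y = (grid_h + block - 1) // block
    for by in range(blocks_y):
        for bx in range(blocks_x):
            phase = (bx % 2) + 2 * (by % 2)
            phases[phase].append((bx, by))
    return dict(phases)


phases = partition_into_phases(8, 8, 2)
for phase, blocks in sorted(phases.items()):
    # All blocks listed for one phase could be handled concurrently.
    print(phase, blocks)
```

On an 8x8 grid with 2x2 blocks this yields 16 blocks in four phases of four blocks each. The cost of this choice is that only about a quarter of the blocks are active at any time.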
Experimental results Environment –AMD Opteron 2.6GHz workstation with 16GB memory. –Intel Xeon E GHz with 8GB memory and a single NVIDIA Tesla C1060 GPU. Implemented in C++. Benchmark: s35932 from the IWLS benchmarks with 300 additional spare cells. Five nets, N1-N5, with various numbers of pins are selected from s35932 for demonstration.
Critical sink delay improvement
Analysis The following results are on platform 2.
Conclusions This work develops a topology-aware ECO timing optimization flow. –BP, EB, TR. –GPU-based re-routing. The flow improves WNS and TNS significantly, with a 7.72x average runtime speedup, compared to conventional 2-pin net-based buffer insertion.