Before Placement: Clustering
°Intra-cluster connections: fast
°Inter-cluster connections: slow
  Need to pack BLEs
°Goals:
  Reduce stress on routing
  Take advantage of local fast interconnect
  Reduce inter-cluster wiring
  Minimize critical path (timing-driven)
°How do we do this?
  Take advantage of cluster architecture
°Tradeoffs
Basic Clustering (Betz)
°How many distinct inputs should be provided to a cluster of N 4-LUTs?
°How many 4-LUTs should be included in a cluster to create the most area-efficient logic block?
Basic Clustering (Betz)
°Flow
  Iterate until all BLEs are consumed
  Start a new cluster by selecting a seed BLE
    -Select the currently unclustered BLE with the most used inputs
  Add to the cluster the BLE that shares the most inputs with it
    -This minimizes the number of inputs that must be routed to each cluster
  Keep adding until either the cluster is full or its input pins are used up
  Hill climbing: if some BLE slots in the cluster are still unused
    -Add another BLE even if the cluster input count temporarily overflows
    -If the input count is not eventually reduced, fall back to the best choice from before hill climbing
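The greedy part of the flow above can be sketched in a few lines of Python. This is a minimal illustration, not VPack itself: the hill-climbing step is left out, and the data layout (a dict of BLEs with `inputs` and `output` net names) and the names `pack`, `external_inputs` and `shared_nets` are assumptions made for this example. It also assumes every single BLE fits within `max_inputs` on its own.

```python
def external_inputs(cluster, bles):
    """Distinct nets that must be routed into the cluster from outside."""
    produced = {bles[name]["output"] for name in cluster}
    consumed = set().union(*(bles[name]["inputs"] for name in cluster))
    return consumed - produced


def shared_nets(name, cluster, bles):
    """Number of nets the candidate BLE has in common with the cluster."""
    cluster_nets = set()
    for member in cluster:
        cluster_nets |= bles[member]["inputs"] | {bles[member]["output"]}
    candidate_nets = bles[name]["inputs"] | {bles[name]["output"]}
    return len(candidate_nets & cluster_nets)


def pack(bles, cluster_size, max_inputs):
    """Greedy packing: seed with the BLE that uses the most inputs, then
    keep adding the legal BLE that shares the most nets with the cluster."""
    unclustered = set(bles)
    clusters = []
    while unclustered:
        # Sorting first makes tie-breaking deterministic (first name wins).
        seed = max(sorted(unclustered), key=lambda n: len(bles[n]["inputs"]))
        cluster = [seed]
        unclustered.remove(seed)
        while len(cluster) < cluster_size:
            legal = [n for n in sorted(unclustered)
                     if len(external_inputs(cluster + [n], bles)) <= max_inputs]
            if not legal:
                break  # input pins used up (no hill climbing in this sketch)
            best = max(legal, key=lambda n: shared_nets(n, cluster, bles))
            cluster.append(best)
            unclustered.remove(best)
        clusters.append(cluster)
    return clusters


# Hypothetical four-BLE netlist used only to exercise the sketch.
example_bles = {
    "a": {"inputs": {"n1", "n2", "n3", "n4"}, "output": "oa"},
    "b": {"inputs": {"n1", "n2", "n3", "oa"}, "output": "ob"},
    "c": {"inputs": {"n5", "n6"}, "output": "oc"},
    "d": {"inputs": {"n5", "oc"}, "output": "od"},
}
example_clusters = pack(example_bles, cluster_size=2, max_inputs=5)
```

In this example `a` and `b` share three inputs and `b` consumes the output of `a`, so the pair needs only four external inputs and ends up in the same cluster.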
Number of Inputs per Cluster
  Lots of opportunities for input sharing in large clusters (Betz – CICC’99)
  Reducing the number of cluster inputs reduces the size of the device and makes it faster
  Most FPGA devices (Xilinx, Lucent) have 4 BLEs per cluster, with more inputs than actually needed
Architecture Modeling
  Tri-state buffer and pass transistor distribution
  Cluster size vs. routing resources (tile size)
  Transistor and buffer scaling based on segment length
  Flexibility of switches (Fc=W for large cluster size is a waste?)
Timing-Driven Clustering – T-VPACK
°Optimization goals of VPack
  Pack each cluster to its capacity
    -Minimize number of clusters
  Minimize number of inputs per cluster
    -Reduce the number of external connections
Timing-Driven Clustering – T-VPACK
°Optimization goal of T-VPack
  Minimize number of external connections on the critical path
  Why?
    -External connections have higher delay than internal connections
    -Reducing the number of external nets on the critical path will reduce delay
Timing-Driven Clustering – T-VPACK
°First stage
  Identify connections that are on the critical path
°Second stage
  Pack BLEs sequentially along the critical path
  Recompute criticality of remaining BLEs
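One way to identify critical connections is a slack analysis over the BLE graph. The sketch below is an assumption-laden illustration rather than the T-VPack code: it gives every BLE the same delay and every connection the same wire delay, and it uses the common definition criticality = 1 - slack / max_slack. The function name `connection_criticality` and the delay values are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def connection_criticality(fanin, ble_delay=1.0, wire_delay=2.0):
    """fanin maps each BLE to the list of BLEs that drive it.

    Returns {(driver, sink): criticality}, where 1.0 means the connection
    lies on the critical path and 0.0 means it has the largest slack.
    """
    order = list(TopologicalSorter(fanin).static_order())  # drivers first

    # Forward pass: arrival time at the output of each BLE.
    arrival = {}
    for v in order:
        arrival[v] = ble_delay + max(
            (arrival[u] + wire_delay for u in fanin.get(v, ())), default=0.0)
    d_max = max(arrival.values())

    # Backward pass: latest time each BLE output may settle.
    fanout = {v: [] for v in order}
    for v, drivers in fanin.items():
        for u in drivers:
            fanout[u].append(v)
    required = {}
    for v in reversed(order):
        required[v] = min(
            (required[w] - ble_delay - wire_delay for w in fanout[v]),
            default=d_max)

    slack = {(u, v): required[v] - ble_delay - wire_delay - arrival[u]
             for v, drivers in fanin.items() for u in drivers}
    max_slack = max(slack.values())
    if max_slack == 0:
        return {conn: 1.0 for conn in slack}  # every connection is critical
    return {conn: 1.0 - s / max_slack for conn, s in slack.items()}


# Hypothetical graph: a and b drive c, c drives d, b also drives e.
example_crit = connection_criticality({"c": ["a", "b"], "d": ["c"], "e": ["b"]})
```

Here the longest path runs through `c` and `d`, so its connections get criticality 1.0, while the short side branch from `b` to `e` gets 0.0.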
Timing-Driven Clustering – T-VPACK
°Cost metric now considers both connectivity and timing criticality
°Perform an analysis of criticality at the beginning, considering all wires to be inter-cluster
°Determine “Base” BLE criticality
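A cost metric that mixes connectivity and criticality can be sketched as a weighted sum. The form below, the weight `alpha`, its illustrative value of 0.75, and the normalizer `max_nets` are assumptions for this example, not figures taken from these slides.

```python
def attraction(criticality, shared_nets, max_nets, alpha=0.75):
    """Weighted blend of timing criticality (0..1) and normalized net sharing."""
    return alpha * criticality + (1.0 - alpha) * shared_nets / max_nets


def pick_next(candidates, max_nets, alpha=0.75):
    """candidates maps a BLE name to (criticality, nets shared with the cluster).

    Returns the name with the highest attraction; ties go to the first
    name in sorted order.
    """
    return max(sorted(candidates),
               key=lambda n: attraction(candidates[n][0], candidates[n][1],
                                        max_nets, alpha))


# Hypothetical candidates: x is timing-critical, y shares more nets.
example_candidates = {"x": (0.9, 1), "y": (0.4, 4)}
timing_choice = pick_next(example_candidates, max_nets=5, alpha=0.75)
sharing_choice = pick_next(example_candidates, max_nets=5, alpha=0.0)
```

With `alpha=0.75` the critical BLE `x` wins (0.725 against 0.5). With `alpha=0.0` the metric reduces to pure input sharing and `y` wins, which is how VPack-style selection would behave.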
How to break ties?
°Initially, many paths may have the same number of BLEs
°Include “tie-breaking” in the performance cost function
Results for T-VPACK versus VPACK
  Why does the gap between VPack and T-VPack increase as N increases?
Results for T-VPACK versus VPACK
°T-VPack prefers to cluster a BLE with BLEs that are in its fan-in or fan-out
°VPack favors input sharing
°T-VPack completely absorbs many low-fanout nets
  Fewer nets to route!
Results for T-VPACK versus VPACK
  Why does the area-delay product show an increasing trend beyond a cluster size of 10?
Results for T-VPACK versus VPACK
°Increased number of nets that are completely absorbed by T-VPack
°Area-delay product
  Cluster size 7-10 is the best choice (36-34% better than N=1)
°N=7 vs N=1: 30% less delay, 8% less area
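The N=7 figures above are consistent with the quoted area-delay improvement, which a short calculation confirms. The helper name `area_delay_improvement` is made up for this check.

```python
def area_delay_improvement(delay_reduction, area_reduction):
    """Fractional improvement in area-delay product relative to the baseline."""
    return 1.0 - (1.0 - delay_reduction) * (1.0 - area_reduction)


# N=7 vs N=1: 30% less delay and 8% less area.
n7_improvement = area_delay_improvement(0.30, 0.08)
print(f"{n7_improvement:.1%}")  # 0.70 * 0.92 = 0.644, so about 35.6% better
```

The result of roughly 36% matches the upper end of the 36-34% range quoted for cluster sizes 7 to 10.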
Results for T-VPACK, DELAY !!!
  Why do we see a circuit speedup?
Results for T-VPACK, DELAY !!!
  [Figure annotations: 18%, 40%]
°Intra-cluster: fast, inter-cluster: slow!
°As N increases
  Number of internal connections on the critical path increases
  Number of external connections on the critical path decreases
Why are inter-cluster connections becoming faster?
  Reduction in the number of external connections (internal connections are faster)
  External connections on the critical path are becoming faster
  Reduction in routing requirements