Packing and Placement Dr. Philip Brisk Department of Computer Science and Engineering University of California, Riverside CS 223.



Packing Example (Homogeneous)

Packing Example (Heterogeneous)
[Figure panels: Netlist, Architecture, Packing Solution]

Architecture Description and Packing for Logic Blocks with Hierarchy, Modes, and Complex Interconnect Jason Luu, Jason Anderson, and Jonathan Rose International Symposium on FPGAs, 2011

AA-Pack 6.0 Algorithm
Pick the unpacked mapped LUT with the largest number of attached nets
p – netlist block; B – partially filled logic cluster
nets(p, B) – number of nets shared between p and B
ext(p, B) – number of pins on p’s nets residing on netlist blocks NOT packed into B
packed(p) – number of pins on p’s nets residing on netlist blocks packed into logic clusters OTHER than B
num_pins(p) – number of used pins on p (normalizes affinities across netlist blocks with varying numbers of used pins)
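
The quantities defined above can be computed directly from a netlist. The following is a minimal sketch, not the AA-Pack implementation: the netlist model (a dict from block name to the set of nets on its used pins), the `cluster_of` map, and both function names are hypothetical, and it assumes one used pin per (block, net) pair and excludes p itself from the counts. How the terms are combined into a single affinity is not reproduced here.

```python
from typing import Dict, Set

# Hypothetical netlist model: block name -> set of net names on its used pins.
Netlist = Dict[str, Set[str]]

def pick_seed(netlist: Netlist, unpacked: Set[str]) -> str:
    """Return the unpacked block with the most attached nets (ties: name order)."""
    return max(sorted(unpacked), key=lambda b: len(netlist[b]))

def affinity_terms(p: str, b_id: int, netlist: Netlist,
                   cluster_of: Dict[str, int]) -> Dict[str, int]:
    """Compute nets(p, B), ext(p, B), packed(p) and num_pins(p).

    cluster_of maps each already-packed block to its cluster id;
    unpacked blocks are simply absent from the map.
    """
    shared = ext = packed = 0
    for net in netlist[p]:
        # All other blocks with a pin on this net (assumption: p is excluded).
        others = [q for q in netlist if q != p and net in netlist[q]]
        in_b = [q for q in others if cluster_of.get(q) == b_id]
        shared += 1 if in_b else 0          # net is shared with cluster B
        ext += len(others) - len(in_b)      # pins on blocks not in B
        packed += sum(1 for q in others     # pins on blocks in other clusters
                      if cluster_of.get(q) not in (None, b_id))
    return {"nets": shared, "ext": ext, "packed": packed,
            "num_pins": len(netlist[p])}
```

For example, with blocks `a` (in cluster 0), `d` (in cluster 1), and unpacked `b` and `c`, calling `affinity_terms("b", 0, netlist, cluster_of)` reports how strongly `b` is attracted to cluster 0 versus pulled elsewhere.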

Legality Challenges
Handle complex logic clusters with hierarchy
– Fracturable LUTs
– Carry chains
– Hard logic circuits
Routability
– Sparse crossbar intra-cluster routing

Hierarchical Cluster Example
Strategy: Pack each netlist block into the smallest primitive that can accommodate it
Algorithm: Search the tree bottom-up, from right to left

Ensuring Routability
Basic check: Does packing the netlist block into the cluster exceed I/O pin availability?
Routability: Build a routing graph and run a routing algorithm to determine legality
– Routing algorithm details will be discussed next week

Limitations
Focus is area optimization, not timing
Architectural limitations
– (Fracturable) LUT-based logic blocks
– Fracturable arithmetic blocks (e.g., multipliers)
– Memories with reconfigurable aspect ratios (not discussed)
Mapping assumptions
– Different block types cannot accommodate the same netlist block
– In reality, one could pack a flip-flop into either a LUT-based or a multiplier-based block

Toward Interconnect-Adaptive Packing for FPGAs Jason Luu, Jason Anderson, and Jonathan Rose International Symposium on FPGAs, 2014

AA-Pack 7.0
Calling the router repeatedly during packing is computationally expensive
– Speculative Packing: avoid unnecessary calls to the router
– Interconnect-Aware Pin Counting: quickly find unroutable instances based on pin demand
Pre-packing: support inflexible routing structures
– E.g., carry chains
Other bells and whistles
– Accurate timing model
– Best-fit placement
– Better support for high-fanout nets

Speculative Packing
FPGA 2011 implementation
– Call the router to check legality each time a new block is packed into the cluster
FPGA 2014 implementation
– Fill the logic block to capacity, then call the router
– If a legal route is found, we’re done
– Otherwise, re-pack the block using the FPGA 2011 approach
– Works because the common case is that a legal route is found
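
The control flow of speculative packing can be sketched as follows. This is an illustrative simplification, not the AA-Pack code: `pack_cluster`, the `is_routable` callback standing in for the intra-cluster router, and the fixed candidate order are all assumptions (a real packer selects candidates by affinity). The function also counts router calls to make the saving visible.

```python
from typing import Callable, List, Sequence, Tuple

def pack_cluster(candidates: Sequence[str], capacity: int,
                 is_routable: Callable[[List[str]], bool]
                 ) -> Tuple[List[str], int]:
    """Fill one cluster; return (packed blocks, number of router calls)."""
    calls = 0

    def route(blocks: List[str]) -> bool:
        nonlocal calls
        calls += 1                      # every legality check costs a router run
        return is_routable(blocks)

    # Speculative step: fill to capacity, then route once.
    speculative = list(candidates[:capacity])
    if route(speculative):
        return speculative, calls

    # Fallback: incremental packing, routing after every added block.
    cluster: List[str] = []
    for block in candidates:
        if len(cluster) == capacity:
            break
        if route(cluster + [block]):
            cluster.append(block)
    return cluster, calls
```

In the common case the function returns after a single router call; only when the speculative cluster fails does it pay for one call per candidate.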

Interconnect-Aware Pin Counting
Partition I/O pins into classes based on interconnect structure
When each netlist block is packed, check the demand for each pin class
Reject the block if demand exceeds supply for any pin class
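
The supply/demand test above amounts to a per-class comparison. The sketch below is a simplified illustration under stated assumptions: the pin-class names and the `pin_demand_ok` function are hypothetical, and demand is treated as additive, which ignores nets that become internal to the cluster once both endpoints are packed.

```python
from collections import Counter
from typing import Mapping

def pin_demand_ok(cluster_demand: Mapping[str, int],
                  block_demand: Mapping[str, int],
                  supply: Mapping[str, int]) -> bool:
    """Return False if adding the block pushes any pin class over its supply."""
    total = Counter(cluster_demand)
    total.update(block_demand)          # additive demand (simplifying assumption)
    return all(total[cls] <= supply.get(cls, 0) for cls in total)
```

A block is rejected as soon as one class is oversubscribed, without invoking the router.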

Example

Properties and Limitations
An optimistic filter
– Cases that fail are not routable
– Cases that pass may or may not be routable
Sparse interconnect is approximated as fully connected
Does not account for situations where a net routes through a sub-cluster without connecting to any primitives in that sub-cluster
Internal feedback/feedforward connections within a logic cluster are discovered before packing and accounted for during pin counting
Gives a pass/fail answer
– Does not help to guide future candidate selection

Pre-packing
Inflexible routing structures
– Incorrect grouping or placement of netlist blocks may fail routing
– The architect enumerates “pack patterns” to describe each structure
– Before packing, identify netlist sub-graphs that match “pack patterns”
– Group them together and match them to logic cluster primitives that match the “pack pattern”
Pack patterns
– Multiply-add
– Registered multiply
– Registered add
– Registered multiply-add

Experiments

Results

Timing-Driven Placement for FPGAs Alexander (Sandy) Marquardt, Vaughn Betz, and Jonathan Rose International Symposium on FPGAs, 2000

Placement

Simulated Annealing

VPlace (Pre-dates this paper)
Strategy: Minimize interconnect overhead

Timing Analysis
For a placed and routed net: how much delay can we add to the net before it becomes critical? (This quantity is the net’s slack.)

T-VPlace (This Paper)
Optimize timing + wiring complexity
Delay approximation
– FPGAs are uniform
– Store delays (Δx, Δy) in a ROM
– Model a two-terminal net with source at (x_source, y_source) and target at (x_source + Δx, y_source + Δy)
Reduce the allowable move distance over time
– α is the fraction of attempted moves that were accepted at the previous temperature
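
Both ideas on this slide fit in a few lines. The sketch below is an illustration rather than the tool's code: the function names and the toy delay function are hypothetical, the table is assumed symmetric in the sign of Δx and Δy, and the range-limit rule (scale by 1 − 0.44 + α, clamped to [1, maximum]) is the VPR-style update, stated here as an assumption since the slide does not show the formula.

```python
from typing import Callable, List, Tuple

def build_delay_table(width: int, height: int,
                      delay_fn: Callable[[int, int], float]) -> List[List[float]]:
    """Precompute delay for every (dx, dy) offset once, before annealing."""
    return [[delay_fn(dx, dy) for dy in range(height)] for dx in range(width)]

def connection_delay(table: List[List[float]],
                     src: Tuple[int, int], dst: Tuple[int, int]) -> float:
    """Look up the delay of a two-terminal connection by its offset only."""
    dx = abs(dst[0] - src[0])           # uniform fabric: position does not matter
    dy = abs(dst[1] - src[1])
    return table[dx][dy]

def update_range_limit(r_old: float, alpha: float, r_max: float) -> float:
    """Shrink or grow the move window to keep the acceptance rate near 0.44."""
    r_new = r_old * (1.0 - 0.44 + alpha)
    return max(1.0, min(r_new, r_max))  # clamp to [1, r_max]
```

Because lookups depend only on the offset, evaluating a candidate move costs a table access per affected connection instead of a routing-delay computation.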

Timing Cost and Objective
Sum the timing costs of all source-sink pairs
Heavily weight critical nets
Maximum delay of all nets in the circuit
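
A hedged sketch of how such a cost can be computed is given below. It assumes the commonly cited formulation: criticality = 1 − slack / D_max, per-connection cost = delay × criticality^exponent, and a move cost that blends normalized timing and wiring changes with a weight λ. The function and parameter names are hypothetical, and the slide itself does not show these formulas.

```python
from typing import Iterable, Tuple

def criticality(slack: float, d_max: float) -> float:
    """1.0 for a connection on the critical path, smaller for slack-rich ones."""
    return 1.0 - slack / d_max

def timing_cost(connections: Iterable[Tuple[float, float]],
                d_max: float, crit_exp: float) -> float:
    """Sum delay * criticality**crit_exp over (delay, slack) source-sink pairs."""
    return sum(delay * criticality(slack, d_max) ** crit_exp
               for delay, slack in connections)

def delta_cost(lam: float, d_timing: float, prev_timing: float,
               d_wiring: float, prev_wiring: float) -> float:
    """Blend normalized timing and wiring changes; lam = 1 is timing only."""
    return lam * d_timing / prev_timing + (1.0 - lam) * d_wiring / prev_wiring
```

Raising the exponent concentrates the cost on near-critical connections, which is one way to "heavily weight critical nets".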

Annealing Schedule
Number of moves to perform at each temperature
– Default value is 10
Vary the temperature as the algorithm progresses
– α is the fraction of attempted moves that were accepted at the old temperature T_old
Termination criteria
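
The three schedule components can be sketched as below. The constants are those of the VPR-style adaptive schedule and are stated as assumptions, since the slide's formulas are not in the transcript: moves per temperature = InnerNum × N^(4/3) with InnerNum = 10, a temperature multiplier chosen from the acceptance rate α, and termination when T falls below 0.005 × cost / number of nets. Function names are hypothetical.

```python
def moves_per_temperature(num_blocks: int, inner_num: float = 10.0) -> int:
    """Moves attempted at each temperature: inner_num * N^(4/3)."""
    return round(inner_num * num_blocks ** (4.0 / 3.0))

def next_temperature(t_old: float, alpha: float) -> float:
    """Cool quickly when almost everything or almost nothing is accepted."""
    if alpha > 0.96:
        gamma = 0.5     # too hot: nearly random walk
    elif alpha > 0.8:
        gamma = 0.9
    elif alpha > 0.15:
        gamma = 0.95    # productive range: cool slowly
    else:
        gamma = 0.8     # nearly frozen
    return gamma * t_old

def should_stop(t: float, cost: float, num_nets: int) -> bool:
    """Stop once T is a small fraction of the average cost per net."""
    return t < 0.005 * cost / num_nets
```

The slowest cooling is applied in the middle acceptance range, where the annealer makes the most progress per move.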

VPlace vs. T-VPlace

Improving Simulated Annealing-Based FPGA Placement with Directed Moves Kristofer Vorwerk, Andrew Kennings, and Jonathan W. Greene IEEE Transactions on CAD 28(2): (2009)

Motivation: an annealer may spend significant time revisiting previously explored states before it finds the lowest cost state – Coax the annealer into exploring neighbor states that are more likely to yield an improvement

Simple “Moves” (T-VPlace)
Randomly select a cell
– Move a cell to an unoccupied target location
– Swap the locations of two cells
Location selection
– Random shrinking window
– α is the fraction of attempted moves that were accepted at the previous temperature

Heuristics to Determine Source Cells
Random
– VPR
Graph coloring
– Color the netlist before placement
– Choose up to 15 non-adjacent (same color) cells at a time
Priority list
– Randomly choose among the 25% worst-placed cells
Position (details to follow)
Timing cost of paths

Heuristics to Determine Target Locations
Random
– VPR
Linear assignment
– Details omitted
Median placement and variants
– Details on the next slide
Priority list

Median Placement
Compute bounding boxes for all nets, omitting source pins
– Take x and y minimums and maximums
Put points into vectors and sort
Define a rectangle by the median and median+1 entries in each vector
Randomly select a new target location within the rectangle
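
The steps above can be sketched as follows. This is one reading of the slide, not the authors' code: nets are modeled as lists of cell names, `positions` maps cells to integer grid coordinates, "omitting source pins" is taken to mean excluding the cell being moved, and "median and median+1 entries" is taken to mean the two middle entries of each sorted, even-length vector. All names are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[int, int]
Region = Tuple[int, int, int, int]      # (x_lo, x_hi, y_lo, y_hi)

def median_region(cell: str, nets: Sequence[Sequence[str]],
                  positions: Dict[str, Point]) -> Optional[Region]:
    """Rectangle spanned by the two middle bounding-box edges in x and in y."""
    xs: List[int] = []
    ys: List[int] = []
    for net in nets:
        pins = [positions[c] for c in net if c != cell]  # omit the moved cell
        if not pins:
            continue
        xs += [min(x for x, _ in pins), max(x for x, _ in pins)]
        ys += [min(y for _, y in pins), max(y for _, y in pins)]
    if not xs:
        return None                     # cell has no connected neighbours
    xs.sort()
    ys.sort()
    m = len(xs) // 2 - 1                # vectors hold 2 entries per net
    return xs[m], xs[m + 1], ys[m], ys[m + 1]

def pick_target(region: Region, rng: random.Random) -> Point:
    """Choose a uniformly random grid location inside the rectangle."""
    x_lo, x_hi, y_lo, y_hi = region
    return rng.randint(x_lo, x_hi), rng.randint(y_lo, y_hi)
```

Any location in this rectangle minimizes the sum of the cell's net bounding-box half-perimeters with the other cells held fixed, which is why it is a good place to aim a move.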

Cell Rippling
Nearest empty location to B
Rippling directions are chosen randomly

Quality Factor of a Move
p_i is the probability that the move is accepted
Use the previous annealing iteration to determine the probabilities empirically
P_prev(i) is P(i) from the previous iteration

Results
[Figure panels: 4 BLEs per cluster, 8 BLEs per cluster]

Improving FPGA Placement with Dynamically Adaptive Stochastic Tunneling Mingjie Lin and John Wawrzynek IEEE Transactions on CAD 29(12): (2010)

Simulated Annealing (Conceptual) Stochastic Tunneling

Simulated Annealing Weaknesses
Sensitivity to parameters
– Quite a few
– Interactions between them not understood
Freezing problem
– Unable to escape local minima
– Prevalent at low temperatures, where bad moves are accepted with a very low probability

Acceptance Criteria for Bad Moves
Simulated Annealing
Stochastic Tunneling
– “Energy” of the best solution found so far; continually adjusted as better solutions are found
– “Energy” of the current solution being evaluated
– Tunneling parameter
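
The contrast between the two criteria can be illustrated with the classic stochastic tunneling transform, f_STUN(E) = 1 − exp(−γ (E − E_best)), where γ is the tunneling parameter. This is a sketch under that assumption: the paper's dynamically adaptive variant may differ in detail, and the function names and the inverse-temperature parameter `beta` are hypothetical.

```python
import math

def sa_accept_prob(delta: float, temperature: float) -> float:
    """Simulated annealing: bad moves pass with probability exp(-delta / T)."""
    return 1.0 if delta <= 0 else math.exp(-delta / temperature)

def stun(energy: float, best: float, gamma: float) -> float:
    """Map energies above the best-so-far value into the interval [0, 1)."""
    return 1.0 - math.exp(-gamma * (energy - best))

def stun_accept_prob(e_cur: float, e_new: float, best: float,
                     gamma: float, beta: float) -> float:
    """Metropolis test applied to transformed energies instead of raw ones."""
    delta = stun(e_new, best, gamma) - stun(e_cur, best, gamma)
    return 1.0 if delta <= 0 else math.exp(-beta * delta)
```

Because the transformed energy saturates below 1, even a very large uphill step has acceptance probability of at least exp(−beta), whereas the annealing criterion drives it toward zero at low temperature. That bounded penalty is what lets the search tunnel out of a local minimum.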

Stochastic Tunneling (Conceptual)

Stochastic Tunneling Pseudocode

Results Averages: