Lecture 9: Multi-FPGA System Software October 3, 2013 ECE 636 Reconfigurable Computing Lecture 9 Multi-FPGA System Software.

Presentation transcript:

Lecture 9: Multi-FPGA System Software October 3, 2013 ECE 636 Reconfigurable Computing Lecture 9 Multi-FPGA System Software

Overview: Steps in multi-FPGA software; bipartitioning; logic replication; partition ordering; theoretical limits of multi-FPGA systems.

Multi-FPGA Software: High-level synthesis is missing. Global placement and routing are similar to intra-device CAD.

System-level Constraints: Even though general solutions are desirable, system-specific issues must be considered. For many systems, designs are created independently of the system. Software efficiency determines performance and usability.

Bipartitioning: Perhaps the biggest problem in multi-FPGA design is partitioning. The partitioner must deal with both logic and pin constraints. It could attempt to partition across all devices simultaneously, but even "simple" algorithms are $O(n^3)$. It is better to recursively bipartition the circuit.
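
The recursive approach can be sketched as follows. This is a minimal illustration, not the lecture's implementation: `recursive_bipartition` and `split_in_half` are hypothetical names, the number of devices is assumed to be a power of two, and the trivial even split stands in for a real cut-minimizing bipartitioner such as KLFM.

```python
def split_in_half(nodes):
    """Stand-in bipartitioner (assumption): split the list evenly, ignoring connectivity."""
    mid = len(nodes) // 2
    return nodes[:mid], nodes[mid:]


def recursive_bipartition(nodes, num_parts, bipartition=split_in_half):
    """Divide `nodes` into `num_parts` groups by repeated two-way cuts.

    `num_parts` is assumed to be a power of two so every cut is balanced.
    """
    if num_parts == 1:
        return [list(nodes)]
    left, right = bipartition(list(nodes))
    half = num_parts // 2
    return (recursive_bipartition(left, half, bipartition)
            + recursive_bipartition(right, num_parts - half, bipartition))


# Eight nodes mapped onto four devices with two levels of cuts.
parts = recursive_bipartition(list(range(8)), 4)
```

Any function that takes a node list and returns two lists can be passed as `bipartition`, so a cut-minimizing partitioner can replace the stand-in without changing the recursion.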

KLFM Partitioning: Identify nodes to swap to reduce the overall cut size. Lock moved nodes. The algorithm continues until no unlocked node can be moved without violating size constraints. (Figure: nodes divided between Bin 1 and Bin 2.)

KLFM Partitioning (continued): The key issue is storing node costs in lists that can be easily accessed and updated. Many extensions can be considered to speed up the overall optimization. The algorithm is reasonably easy to implement in software.
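
A simplified single pass of this move-and-lock loop might look like the sketch below. It is an assumption-laden illustration: the netlist is a dictionary from net name to a list of node names, `side` maps each node to bin 0 or 1, and the gains are recomputed from scratch at every step instead of being kept in the incrementally updated lists the slide refers to. The names `cut_size`, `gain`, and `fm_pass` are hypothetical.

```python
from collections import defaultdict


def cut_size(nets, side):
    """Number of nets that have nodes in both bins."""
    return sum(1 for pins in nets.values() if len({side[n] for n in pins}) == 2)


def gain(node, nets_of, nets, side):
    """Reduction in cut size if `node` moves to the other bin."""
    g = 0
    for net in nets_of[node]:
        same = sum(1 for n in nets[net] if side[n] == side[node])
        other = len(nets[net]) - same
        if same == 1 and other > 0:
            g += 1  # node is alone on its side, so the move uncuts this net
        if other == 0 and len(nets[net]) > 1:
            g -= 1  # net lies entirely on node's side, so the move cuts it
    return g


def fm_pass(nets, side, max_size):
    """One pass: move the best unlocked node, lock it, remember the best state seen."""
    side = dict(side)
    nets_of = defaultdict(list)
    for net, pins in nets.items():
        for n in pins:
            nets_of[n].append(net)
    locked = set()
    best_side, best_cut = dict(side), cut_size(nets, side)
    while True:
        sizes = {0: 0, 1: 0}
        for s in side.values():
            sizes[s] += 1
        # A node may move only if the destination bin stays within max_size.
        candidates = [n for n in side
                      if n not in locked and sizes[1 - side[n]] < max_size]
        if not candidates:
            break
        node = max(candidates, key=lambda n: gain(n, nets_of, nets, side))
        side[node] = 1 - side[node]
        locked.add(node)
        cut = cut_size(nets, side)
        if cut < best_cut:
            best_cut, best_side = cut, dict(side)
    return best_side, best_cut


nets = {"n1": ["a", "b"], "n2": ["c", "d"]}
start = {"a": 0, "b": 1, "c": 0, "d": 1}
best_side, best_cut = fm_pass(nets, start, max_size=3)
```

Recomputing every gain makes each step linear in the netlist size; the bucket-list bookkeeping mentioned on the slide exists precisely to avoid that cost.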

Partition Preprocessing: Clustering. Identify the bin size. Choose a seed block (node). Identify the node with the highest connectivity to join the cluster. Terminate when the cluster size is met. In practical terms, a cluster size of 4 works best.
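
The clustering steps above can be sketched as a greedy loop. This is only an illustration under stated assumptions: connectivity between two nodes is taken to be the number of nets they share, seeds are chosen in list order, and a cluster is closed early when no remaining node connects to it. The function names are hypothetical.

```python
from collections import defaultdict


def connectivity(nets):
    """Count the nets shared by every pair of nodes."""
    conn = defaultdict(lambda: defaultdict(int))
    for pins in nets.values():
        for a in pins:
            for b in pins:
                if a != b:
                    conn[a][b] += 1
    return conn


def cluster(nodes, nets, cluster_size=4):
    """Grow clusters from seeds by repeatedly adding the most connected node."""
    conn = connectivity(nets)
    unassigned = list(nodes)
    clusters = []
    while unassigned:
        members = [unassigned.pop(0)]  # seed block
        while len(members) < cluster_size and unassigned:
            best = max(unassigned, key=lambda n: sum(conn[m][n] for m in members))
            if sum(conn[m][best] for m in members) == 0:
                break  # nothing left that connects to this cluster
            unassigned.remove(best)
            members.append(best)
        clusters.append(members)
    return clusters


nets = {"n1": ["a", "b"], "n2": ["b", "c"], "n3": ["d", "e"], "n4": ["e", "f"]}
groups = cluster(["a", "b", "c", "d", "e", "f"], nets, cluster_size=3)
```

Each resulting cluster would then be treated as a single node by the partitioner, which shrinks the problem before KLFM runs.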

Clustering: Technology mapping before partitioning is typically ineffective, since area is frequently secondary to interconnect. Bipartitioning frequently continues after unclustering as well (cluster -> KLFM -> uncluster -> KLFM), which allows additional fine-grain moves.

Initial Partition Creation: KLFM is primarily designed to operate on fixed-size partitions. Several approaches exist to distribute nodes between the two partitions. Random -> assign half of the nodes to each partition. Breadth-first -> select a node, then take all nodes attached to it before going further. Depth-first -> similar to breadth-first, except follow the next node attached to the most recently selected one.
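
The three initial-partition strategies can be illustrated with the sketch below. It assumes the circuit is given as an adjacency dictionary and uses the standard breadth-first and depth-first traversal orders, stopping once half of the nodes have been collected. The function names and the fixed random seed are assumptions made for the example.

```python
import random
from collections import deque


def random_partition(nodes, seed=0):
    """Shuffle the nodes and assign half to each partition."""
    rng = random.Random(seed)
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return set(shuffled[:half]), set(shuffled[half:])


def traversal_partition(adj, start, depth_first=False):
    """Fill the first partition by traversing from `start` until it holds half the nodes.

    If the graph is disconnected the traversal may stop early, leaving the
    first partition smaller than half.
    """
    target = len(adj) // 2
    frontier = deque([start])
    visited = set()
    while frontier and len(visited) < target:
        # Stack order gives depth-first, queue order gives breadth-first.
        node = frontier.pop() if depth_first else frontier.popleft()
        if node in visited:
            continue
        visited.add(node)
        frontier.extend(n for n in adj[node] if n not in visited)
    return visited, set(adj) - visited


adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "e"],
       "d": ["b", "f"], "e": ["c"], "f": ["d"]}
bfs_first, bfs_second = traversal_partition(adj, "a")
dfs_first, dfs_second = traversal_partition(adj, "a", depth_first=True)
rnd_first, rnd_second = random_partition(adj)
```

Either result would then be handed to KLFM as the starting assignment.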

Partition Creation Results: Surprisingly, random appears to be the best. For the largest designs, results are similar. For smaller designs, there is variance across designs. Seeded -> start from an empty partition and apply KLFM.

Higher-level Gains: Effectively a look-ahead that tries to anticipate the next move. A look-ahead of 3 is considered the best tradeoff.

Partition Size Variation: Most bipartitions must be balanced so that full FPGA utilization can be achieved. Application designers frequently do not create circuits that are evenly balanced.

Logic Replication: Attempt to reduce the cutset by replicating logic. Every input of the original cell must also feed the replicated cell. Replication can either be integrated into the partitioning process or applied as a post-processing technique.
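
A small worked example shows when replication pays off. The sketch assumes a directed netlist in which each net is a `(driver, sinks)` pair, and the cell names are made up. Cell `X` drives sinks in both partitions, and its input net already crosses the cut, so copying `X` removes one cut net without adding a new one.

```python
def cut_size(nets, side):
    """Number of nets whose driver and sinks span both partitions."""
    return sum(1 for driver, sinks in nets.values()
               if len({side[driver], *(side[s] for s in sinks)}) > 1)


def replicate(nets, side, cell):
    """Place a copy of `cell` in the other partition and rewire the netlist.

    Sinks on the far side are driven by the copy, and every net feeding
    `cell` gains the copy as an extra sink.
    """
    copy = cell + "_copy"
    new_side = dict(side)
    new_side[copy] = 1 - side[cell]
    new_nets = {}
    for name, (driver, sinks) in nets.items():
        if driver == cell:
            local = [s for s in sinks if side[s] == side[cell]]
            remote = [s for s in sinks if side[s] != side[cell]]
            if local:
                new_nets[name] = (cell, local)
            if remote:
                new_nets[name + "_copy"] = (copy, remote)
        else:
            extra = [copy] if cell in sinks else []
            new_nets[name] = (driver, list(sinks) + extra)
    return new_nets, new_side


side = {"A": 0, "X": 0, "C": 0, "B": 1, "D": 1}
nets = {"a": ("A", ["X", "B"]),   # already crosses the cut to reach B
        "x": ("X", ["C", "D"])}   # X's output crosses the cut to reach D
before = cut_size(nets, side)
new_nets, new_side = replicate(nets, side, "X")
after = cut_size(new_nets, new_side)
```

If net `a` did not already cross the cut, feeding the copy would add a cut net and the replication would gain nothing, which is why the input requirement on the slide matters.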

Example: Kring-Newton Replication. Introduce a new state to partitioning: a node can exist in separate locations. Possible node moves include gain/reduce, replication, and unreplication. Positive unreplication moves must be taken before any other moves. Gradient technique: only allow replication when the cut size changes by more than 10%.

Kring-Newton Results: Results indicate a 20% improvement in cut size with a 5% increase in logic node count, and a minimal increase in computation time.

Functional Replication: Applied to tech-mapped Xilinx blocks. Outputs in a CLB are split into two CLBs. Only inputs needed by both CLBs are split across partitions.

Replication Summary: Tech mapping before partitioning is shown to be ineffective (again). Kring-Newton is simple but effective. Overall summary of bipartitioning: use random initial placement, bandwidth clustering, and a high-order gain of 3 with Kring-Newton to achieve the best results.

Logic Partition Ordering: Simply bipartitioning is not enough; knowing what to partition is important. One approach -> locate the critical point of expected wires/available wires and partition there first. The example on the slide shows alternating horizontal and vertical cuts.

Terminal Propagation: Even though bipartitioning operates on a fixed set of nodes, previously cut nodes may play a factor. Consider a recursive cut: "anchors" are needed to guide partitioning.
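
One way to picture the anchors is the sketch below. It assumes each node outside the region being cut has already been assigned to whichever side of the new cut it lies nearest (the `outside_side` map, a hypothetical input), and it replaces those outside pins with fixed pseudo-nodes. The name `with_anchors` is illustrative, not taken from the lecture.

```python
def with_anchors(nets, region, outside_side):
    """Restrict `nets` to `region`, replacing outside pins with fixed anchors.

    Returns the local netlist and a map of anchor name to the side it is
    locked on. The partitioner is expected to keep anchors unmovable.
    """
    local_nets = {}
    fixed = {}
    for name, pins in nets.items():
        inside = [p for p in pins if p in region]
        if not inside:
            continue  # net does not touch this region at all
        sides = sorted({outside_side[p] for p in pins if p not in region})
        anchors = []
        for s in sides:
            anchor = f"anchor_{s}"
            fixed[anchor] = s
            anchors.append(anchor)
        if len(inside) + len(anchors) > 1:
            local_nets[name] = inside + anchors
    return local_nets, fixed


nets = {"n1": ["a", "x"], "n2": ["a", "b"], "n3": ["x", "y"]}
local_nets, fixed = with_anchors(nets, region={"a", "b"},
                                 outside_side={"x": 0, "y": 1})
```

With the anchor in place, moving `a` away from side 0 costs a cut net, so the partitioner is pulled toward keeping `a` near the previously placed node `x`.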

Splash 2: 68 connections between most FPGAs, but only 35 between A-7. More balanced with an even schedule. Somewhat unimportant due to the Splash programming style.

Are Meshes Really Realistic? The number of wires leaving a partition grows according to Rent's Rule, $P = K G^{B}$. The perimeter grows as $G^{0.5}$, but unfortunately the pin demand of most circuits grows as $G^{B}$ with $B > 0.5$. Devices are effectively highly pin limited. What does this mean for meshes?
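
The mismatch can be made concrete with a short calculation. The values of `k` and `b` below are illustrative assumptions, not figures from the lecture, and the perimeter model simply treats the partition as a square whose area is proportional to the gate count.

```python
def rent_pins(gates, k, b):
    """Pins demanded by a partition of `gates` gates under Rent's Rule, P = K * G**B."""
    return k * gates ** b


def perimeter_pins(gates, pins_per_unit=1.0):
    """Pins a square region can offer: the perimeter grows as the square root of G."""
    return 4 * pins_per_unit * gates ** 0.5


def demand_ratio(gates, k=2.5, b=0.6):
    """Demanded pins divided by available perimeter pins (k and b are assumed values)."""
    return rent_pins(gates, k, b) / perimeter_pins(gates)


small = demand_ratio(100)
large = demand_ratio(10_000)
```

The ratio scales as $G^{B - 0.5}$, so for any $B > 0.5$ the shortfall widens as partitions grow, while at exactly $B = 0.5$ it stays constant.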

Summary: Multi-FPGA system software requires many steps. Bipartitioning has been the subject of much research. Surprisingly, simple approaches to initializing partitions and replicating logic are the most effective. Pin limitations pose a problem -> this issue is addressed in the next class.