
QUIZ 1

Question 1) According to the study “Simultaneous Timing Driven Clustering and Placement for FPGAs”, what is a fragment-level move, and which drawbacks of the traditional FPGA CAD flow do fragment-level moves target?

BSPlace: A BLE Swapping Technique for Placement
Minsik Hong, George Hwang, Hemayamini Kurra, Minjun Seo

Outline
- SCPlace
  - Introduction
  - Algorithm flowchart
  - Net Counting Algorithm
  - Results
- BSPlace
  - Algorithm
  - Demo
- Backup Slides (if you ask minimal questions, we can cover more)
  - Net Weighting
  - VPR Data Structures

Rajavel, Senthilkumar Thoravi, and Ali Akoglu. "MO-Pack: Many-objective clustering for FPGA CAD." Proceedings of the 48th Design Automation Conference. ACM,

Simultaneous timing driven clustering and placement for FPGAs. Chen, Gang, and Jason Cong. Field Programmable Logic and Application. Springer Berlin Heidelberg,

Key concept
- Fragment-level move: move a BLE to a new CLB
  - Check for a valid CLB configuration
  - Feasibility (number of BLEs and input pins)
  - Update the cost function
- Block-level move: CLB to CLB
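The feasibility check for a fragment-level move can be sketched as follows. This is a minimal illustration, not the SCPlace implementation: the `BLE` and `CLB` classes and the function names are hypothetical, and it assumes that a net driven inside the cluster does not consume a CLB input pin. The limits use N = 2 and I = 3, the architecture parameters from the demo slide.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class BLE:
    name: str
    inputs: Set[str]  # nets read by this BLE
    output: str       # net driven by this BLE


@dataclass
class CLB:
    bles: List[BLE] = field(default_factory=list)


def external_inputs(bles: List[BLE]) -> Set[str]:
    """Nets the cluster must bring in through its input pins."""
    driven = {b.output for b in bles}
    used: Set[str] = set().union(*(b.inputs for b in bles))
    # Assumption: nets driven inside the cluster are fed back locally.
    return used - driven


def move_is_feasible(ble: BLE, target: CLB, max_bles: int, max_inputs: int) -> bool:
    """Check the BLE count and input-pin limits of the CLB after the move."""
    candidate = target.bles + [ble]
    return (len(candidate) <= max_bles
            and len(external_inputs(candidate)) <= max_inputs)


# Hypothetical cluster: one BLE "x" already packed into the target CLB.
target = CLB([BLE("x", {"n1", "n2"}, "n3")])

# "y" reads n3, which x drives, so the merged cluster needs n1, n2, n4 only.
fits = move_is_feasible(BLE("y", {"n3", "n4"}, "n5"), target, max_bles=2, max_inputs=3)

# "z" shares nothing with x, so the merged cluster would need four input pins.
too_many_pins = move_is_feasible(BLE("z", {"n6", "n7"}, "n8"), target, max_bles=2, max_inputs=3)
```

Only after a check like this passes would the annealer update the cost function and accept or reject the move.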

BLE-Level Swapping
Advantages
- Fixes packing issues during simulated annealing
- Better congestion mitigation
- Better routability
Disadvantages
- Speed
- Complexity

SCPlace Algorithm

Additional feature of the journal version of SCPlace

Use novel net weighting

A novel net weighting algorithm for timing-driven placement. Kong, Tim Tianming. Proceedings of the 2002 IEEE/ACM international conference on Computer-aided design. ACM,

Accurate All Path Counting

Calculate F(t) (a = 2; T: the longest path delay)
Node:    a    b    c    d    e      f
ARR/REQ: 0/0  0/2  7/7  8/8  13/13  11/13
Fs(a, c) = 7 – 0 – 7 = 0
Fs(b, c) = 7 – 0 – 5 = 2
F(c) = F(c) + D{Fs(a, c), T} x F(a) + D{Fs(b, c), T} x F(b) = 0 + 1 x 1 + 0.88 x 1 = 1.88

Calculate B(s) (a = 2; T: the longest path delay)
Node:    a    b    c    d    e      f
ARR/REQ: 0/0  0/2  7/7  8/8  13/13  11/13
Bs(d, e) = 13 – 5 – 8 = 0
Bs(d, f) = 13 – 3 – 8 = 2
D{Bs(a, c), T} = D{0,13} = 1
D{Bs(b, c), T} = D{0,13} = 1
D{Bs(c, d), T} = D{0,13} = 1
D{Bs(d, e), T} = D{0,13} = 1
D{Bs(d, f), T} = D{2,13} = 0.88
B(d) = B(d) + D{Bs(d, e), T} x B(e) + D{Bs(d, f), T} x B(f) = 0 + 1 x 1 + 0.88 x 1 = 1.88

Calculate AP(s, t) (a = 2)
D{slack(a, c), T} = D{0,13} = 1
D{slack(b, c), T} = D{2,13} = 0.88
D{slack(c, d), T} = D{0,13} = 1
D{slack(d, e), T} = D{0,13} = 1
D{slack(d, f), T} = D{2,13} = 0.88
[Figure: graph of nodes a to f annotated with F(s)/B(t) and slack]
AP(a,c) = F(a) x B(c) x D{slack(a, c), T} = 1 x 1.88 x 1 = 1.88
AP(b,c) = F(b) x B(c) x D{slack(b, c), T} = 1 x 1.88 x 0.88 = 1.65
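The worked example can be reproduced with a short script. This is a sketch under stated assumptions rather than the paper's code: the discount function itself is not in this transcript, so the script takes the discount values D{...} printed on the slides as a lookup table, and it assumes every primary input starts with F = 1 and every primary output with B = 1. All variable and function names are hypothetical.

```python
# (source, sink): (D of forward slack, D of backward slack, D of slack),
# copied from the slide values; the dict is listed in topological order.
EDGES = {
    ("a", "c"): (1.0, 1.0, 1.0),
    ("b", "c"): (0.88, 1.0, 0.88),
    ("c", "d"): (1.0, 1.0, 1.0),
    ("d", "e"): (1.0, 1.0, 1.0),
    ("d", "f"): (1.0, 0.88, 0.88),
}
PRIMARY_INPUTS = ("a", "b")
PRIMARY_OUTPUTS = ("e", "f")


def forward_counts():
    """F(t) = sum over fan-in edges (s, t) of D{Fs(s, t), T} x F(s)."""
    F = {n: 1.0 for n in PRIMARY_INPUTS}  # assumption: F = 1 at primary inputs
    for (s, t), (d_fwd, _, _) in EDGES.items():
        F[t] = F.get(t, 0.0) + d_fwd * F[s]
    return F


def backward_counts():
    """B(s) = sum over fan-out edges (s, t) of D{Bs(s, t), T} x B(t)."""
    B = {n: 1.0 for n in PRIMARY_OUTPUTS}  # assumption: B = 1 at primary outputs
    for (s, t), (_, d_bwd, _) in reversed(list(EDGES.items())):
        B[s] = B.get(s, 0.0) + d_bwd * B[t]
    return B


F = forward_counts()
B = backward_counts()

# AP(s, t) = F(s) x B(t) x D{slack(s, t), T}
AP = {(s, t): F[s] * B[t] * d_slack for (s, t), (_, _, d_slack) in EDGES.items()}

for edge, weight in AP.items():
    print(edge, round(weight, 2))
```

The edge (c, d) is shared by every path in the example, so it receives the largest count, which is the path-sharing effect that the net weighting is meant to capture.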

Results (BLE swapping only), CLB = 4

Results (BLE + CLB swapping): T-VPack+VPR vs SCPlace (α=0.5)

BSPlace
BLE-level swapping within simulated annealing with Rent’s Rule
Advantages
- Fixes packing issues as they occur.
- Potentially better routability.
- Potentially less congestion, because placement and packing are combined.
Disadvantages
- Execution time: memory must be allocated and deallocated for every BLE swap.
- Code complexity: VPR is complex, so much of our time goes to debugging and testing rather than to algorithms.

Rent’s Rule Threshold Value
1. Calculate the k value to get the threshold.
2. Enter the simulated annealing process.
   - Outer loop process
     - Inner loop process
       - Choose a random CLB to move from its current position to another position.
       - Check the Rent’s Rule threshold.
         - If we get a better result for the swap: queue BLE swapping.
         - Otherwise: do CLB swapping (use T-VPlace).
     - Loop through BLE swapping
       - Do the BLE swap after checking whether the swap overlaps with a previous swap.
     - Re-allocate memory and return to the outer loop.
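The step that drains the swap queue can be sketched as below. This is only an illustration of the overlap check named in the flow, not the BSPlace code: it assumes that two swaps "overlap" when they touch the same BLE, and the function name, the `location` map, and the BLE and CLB identifiers are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Swap = Tuple[str, str]  # (ble_a, ble_b): hypothetical BLE identifiers


def apply_queued_swaps(location: Dict[str, str], queue: List[Swap]) -> List[Swap]:
    """Apply queued BLE swaps in order, skipping swaps that overlap an earlier one."""
    touched: Set[str] = set()
    applied: List[Swap] = []
    for ble_a, ble_b in queue:
        # Assumption: a swap overlaps if either BLE was already moved this pass.
        if ble_a in touched or ble_b in touched:
            continue
        location[ble_a], location[ble_b] = location[ble_b], location[ble_a]
        touched.update((ble_a, ble_b))
        applied.append((ble_a, ble_b))
    return applied


# Four BLEs packed two per CLB; the second queued swap reuses ble2 and is skipped.
location = {"ble0": "clb0", "ble1": "clb0", "ble2": "clb1", "ble3": "clb1"}
queue = [("ble0", "ble2"), ("ble2", "ble3"), ("ble1", "ble3")]
applied = apply_queued_swaps(location, queue)
```

Deferring the swaps to the end of the inner loop keeps the memory reallocation in one place, which matches the last step of the flow.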

Current Status
Code
- Created our own BLE swapping mechanism using the VPR data structures.
- We have a whole suite of test fixtures to test the code.
- Testing is still continuing, but we are finding few issues.
- We have done a swap within placement.
- We have started to integrate our cost function.
Validation
- We intend to run the VPR benchmarks.
- Our BLE swapping solution should be better than or the same as T-VPlace.
- Our VPR benchmark results should also be comparable to IRAC.

Demo
The circuit below abstracts the MUX, switch boxes, and connection boxes. The connections represent the direct connections between BLEs in CLBs. Optimize this circuit by performing one BLE swap. Explain why your optimization will result in better performance.
Architecture parameters: K = 2, I = 3, N = 2
Measurement: critical path delay = 1.182 ns

Thanks.

Backup Slides

Impact of duplication on placement: Delay = 2, Delay = 1

A Novel Net Weighting Algorithm
- Accurate path counting algorithm
  - The first known accurate path counting algorithm that considers all paths
  - Because the number of paths in a circuit is exponential, accurate all-path counting has been considered very difficult.
- Significant performance improvement
- Little loss in total wirelength
- No runtime overhead

A Novel Net Weighting Algorithm
Consider the path-sharing effect
- If two critical paths share a common segment, the edges in the common segment should receive higher weights.
Define two variables
- Forward path F(p): the number of different critical paths starting from PI (primary input) elements and terminating at p.
- Backward path B(p): the number of different critical paths starting from PO (primary output) elements and terminating at p, if we reverse all signal flow directions.
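The two definitions can be illustrated with a small counting routine. This is a sketch, not the paper's algorithm: it assumes that a critical edge is one with zero slack, that the edge list is given in topological order, and that each PI (or PO) counts as one path start. The edge slacks are the ones used in the worked example of this deck (0 for a-c, c-d, d-e and 2 for b-c, d-f); the function name is hypothetical.

```python
# Edge slacks from the deck's example circuit, listed in topological order.
EDGE_SLACK = {
    ("a", "c"): 0,
    ("b", "c"): 2,
    ("c", "d"): 0,
    ("d", "e"): 0,
    ("d", "f"): 2,
}


def count_critical_paths(edge_slack, starts, reverse=False):
    """Count critical paths reaching each node, walking zero-slack edges only."""
    edges = [edge for edge, slack in edge_slack.items() if slack == 0]
    if reverse:
        # Reverse all signal flow directions, as in the definition of B(p).
        edges = [(t, s) for s, t in reversed(edges)]
    counts = {n: 1 for n in starts}  # assumption: each start node counts as 1
    for s, t in edges:
        counts[t] = counts.get(t, 0) + counts[s]
    return counts


F = count_critical_paths(EDGE_SLACK, starts=("a", "b"))
B = count_critical_paths(EDGE_SLACK, starts=("e", "f"), reverse=True)
```

Nodes missing from a result (f in `F`, b in `B`) are not reached by any critical path in that direction. Only edges with exactly zero slack contribute here, so near-critical paths are ignored entirely.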

Background

Example: timing of a circuit with nodes a, b, c, d, e, f
- ARR(t)
- REQ(s)
- The longest path delay (T)

Example: Slack(s, t)
Node:    a    b    c    d    e      f
ARR/REQ: 0/0  0/2  7/7  8/8  13/13  11/13

Example: path delay and slack
- d(π) = 13, slack(π) = 0
- d(π) = 9, slack(π) = 4
- d(π) = 11, slack(π) = 2
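The ARR/REQ values and slacks of the example can be recomputed with a small static timing pass. This is a sketch with one loud assumption: the edge delays are not listed in the transcript, so the values in `DELAY` are inferred from the ARR/REQ figures and the three path delays (13, 9, 11). Function names are hypothetical.

```python
# Assumption: edge delays inferred from the slide's ARR/REQ values.
DELAY = {("a", "c"): 7, ("b", "c"): 5, ("c", "d"): 1, ("d", "e"): 5, ("d", "f"): 3}
ORDER = ["a", "b", "c", "d", "e", "f"]  # topological order of the example graph


def arrival_times(delay, order):
    """ARR(t) = max over fan-in edges (s, t) of ARR(s) + d(s, t)."""
    arr = {n: 0 for n in order}
    for n in order:
        for (s, t), d in delay.items():
            if s == n:
                arr[t] = max(arr[t], arr[s] + d)
    return arr


def required_times(delay, order, longest):
    """REQ(s) = min over fan-out edges (s, t) of REQ(t) - d(s, t)."""
    req = {n: longest for n in order}
    for n in reversed(order):
        for (s, t), d in delay.items():
            if t == n:
                req[s] = min(req[s], req[t] - d)
    return req


ARR = arrival_times(DELAY, ORDER)
T = max(ARR.values())  # the longest path delay
REQ = required_times(DELAY, ORDER, T)

# Edge slack: slack(s, t) = REQ(t) - ARR(s) - d(s, t)
SLACK = {(s, t): REQ[t] - ARR[s] - d for (s, t), d in DELAY.items()}
```

With these delays the script yields T = 13 and edge slacks of 0 on a-c, c-d, d-e and 2 on b-c, d-f, consistent with the path slacks listed above.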

Critical Path Counting

Calculate F(p)

Calculate B(p)

Calculate GP(s,t)

Accurate All Path Counting
- Use a discount function D(x, y) to get an accurate counting result.
- ‘a’ is a positive constant.
- x is the forward or backward slack:
  Fs(s,t) = ARR(t) – ARR(s) – d(s,t)
  Bs(s,t) = REQ(t) – REQ(s) – d(s,t)
- y is the longest path delay (T)
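The two slack definitions translate directly into code. The sketch below evaluates them on the deck's example circuit; the ARR/REQ values come from the slides, while the edge delays d(s,t) are an assumption inferred from those values. The discount function is left out because its formula does not appear in this transcript. Function names are hypothetical.

```python
ARR = {"a": 0, "b": 0, "c": 7, "d": 8, "e": 13, "f": 11}
REQ = {"a": 0, "b": 2, "c": 7, "d": 8, "e": 13, "f": 13}

# Assumption: edge delays d(s, t) inferred from the ARR/REQ values above.
DELAY = {("a", "c"): 7, ("b", "c"): 5, ("c", "d"): 1, ("d", "e"): 5, ("d", "f"): 3}


def forward_slack(s, t):
    """Fs(s,t) = ARR(t) - ARR(s) - d(s,t)"""
    return ARR[t] - ARR[s] - DELAY[(s, t)]


def backward_slack(s, t):
    """Bs(s,t) = REQ(t) - REQ(s) - d(s,t)"""
    return REQ[t] - REQ[s] - DELAY[(s, t)]


Fs = {edge: forward_slack(*edge) for edge in DELAY}
Bs = {edge: backward_slack(*edge) for edge in DELAY}
```

The only non-zero forward slack is on b-c and the only non-zero backward slack is on d-f, which is why those are the two edges that receive a discount of 0.88 instead of 1 in the worked example.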

Ex. Calculate F(t) (a=2)
Node:    a    b    c    d    e      f
ARR/REQ: 0/0  0/2  7/7  8/8  13/13  11/13
D{Fs(a, c), T} = D{0,13} = 1
D{Fs(b, c), T} = D{2,13} = 0.88
D{Fs(c, d), T} = D{0,13} = 1
D{Fs(d, e), T} = D{0,13} = 1
D{Fs(d, f), T} = D{0,13} = 1

Ex. Calculate B(s) (a=2)
Node:    a    b    c    d    e      f
ARR/REQ: 0/0  0/2  7/7  8/8  13/13  11/13
D{Bs(a, c), T} = D{0,13} = 1
D{Bs(b, c), T} = D{0,13} = 1
D{Bs(c, d), T} = D{0,13} = 1
D{Bs(d, e), T} = D{0,13} = 1
D{Bs(d, f), T} = D{2,13} = 0.88

Ex. Calculate AP(s,t) (a=2)
D{slack(a, c), T} = D{0,13} = 1
D{slack(b, c), T} = D{2,13} = 0.88
D{slack(c, d), T} = D{0,13} = 1
D{slack(d, e), T} = D{0,13} = 1
D{slack(d, f), T} = D{2,13} = 0.88
AP(a,c) = 1*1.88*1 = 1.88
AP(b,c) = 1*1.88*0.88 = 1.65
AP(c,d) = 1.88*1.88*1 = 3.53
AP(d,e) = 1.88*1*1 = 1.88
AP(d,f) = 1.88*1*0.88 = 1.65

Compare results
With the critical path counting method (GPATH), it is difficult to get an accurate result. With the proposed algorithm, we get a more accurate result.

VPR Data Structures
- Resource Routing Graph
- Physical Block Graph
- Netlist
  - Global CLB Netlist
  - Global Atom Netlist
- Blocks

Blocks
- Contains the CLB
- Contains the inputs and outputs
- Contains the Resource Routing Graph
- Contains the Physical Blocks
  - Physical Blocks represent the BLE
  - Physical Blocks represent the flip-flop
  - Physical Blocks also contain the LUTs

Resource Routing Graph
- Nodes are pins
- Edges are architectural connections
- Each pin is associated with a net number
- Prev nodes and edges represent the actual connections per BLE.
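A toy model of such a graph is sketched below. It is not VPR's actual data structure: the class, field, and pin names are hypothetical, and `OPEN` is an assumed marker for a pin with no net. The sketch only shows how architectural edges (what could be connected) differ from prev links (what is connected).

```python
from dataclasses import dataclass, field
from typing import List, Optional

OPEN = -1  # hypothetical marker: no net assigned to this pin


@dataclass
class PinNode:
    name: str
    net_num: int = OPEN
    # Architectural connections: indices of pins this pin could drive.
    out_edges: List[int] = field(default_factory=list)
    # Actual connection: index of the pin that drives this one, if any.
    prev_node: Optional[int] = None


def trace_back(nodes: List[PinNode], index: int) -> List[str]:
    """Follow prev_node links from a pin back to the pin that sources its net."""
    path = [index]
    while nodes[path[-1]].prev_node is not None:
        path.append(nodes[path[-1]].prev_node)
    return [nodes[i].name for i in reversed(path)]


# A CLB input pin can reach two BLE inputs, but net 7 actually uses only one.
nodes = [
    PinNode("clb.in[0]", net_num=7, out_edges=[1, 2]),
    PinNode("ble[0].in[0]", net_num=7, out_edges=[3], prev_node=0),
    PinNode("ble[1].in[0]", out_edges=[]),
    PinNode("lut[0].in[0]", net_num=7, prev_node=1),
]
route = trace_back(nodes, 3)
```

A BLE swap changes which pins carry which net, so both the net numbers and the prev links of the affected pins would have to be rebuilt after each swap.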

Global Netlist

Atom Netlist