MASSOUD PEDRAM UNIVERSITY OF SOUTHERN CALIFORNIA Interconnect Length Estimation in VLSI Designs: A Retrospective.


2 Motivation and Problem Definition
Interconnect represents an increasingly significant part of total circuit delay
  - Longer interconnect is more significant
Interconnect is accurately known only after place/route
  - This leads to timing closure problems
  - Logic design is now coupled with physical design
Interconnect must be considered during:
  - Floorplanning, synthesis, timing verification
We need to be able to predict the length of individual wires before layout, say during technology mapping

3 Previous Work
Previous work in this area:
  - Pedram and Preas, ICCD-89
      - Average wire length for given pin-count
  - Heineken and Maly, CICC-96
      - Wire-length distribution
  - Hamada, Cheng, and Chau, TCAD 1996
      - Average wire length for given pin-count
  - Srinivas Bodapati, Farid N. Najm, TVLSI 2001
  - Andrew Kahng and Sherief Reda, SLIP 2006
  - Dirk Stroobandt
  - Others …

4 Key Ideas
The number of pins on a net (denoted P_net) is known to affect net length
The first level neighborhood (denoted N_h1(i)) of a given net i is defined as:
  - The set of all other nets connected to cells to which this net is also connected
The second level neighborhood (denoted N_h2(i)) of a given net i is defined as:
  - The union of all first level neighborhoods of nets that are in the first level neighborhood of this net
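The two neighborhood definitions above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the netlist representation (a dict from net name to the set of cells it touches), the function names, and the toy data are all assumptions, and the pin count is taken to be one pin per connected cell.

```python
from collections import defaultdict


def build_cell_index(netlist):
    """Map each cell to the set of nets attached to it."""
    cell_to_nets = defaultdict(set)
    for net, cells in netlist.items():
        for cell in cells:
            cell_to_nets[cell].add(net)
    return cell_to_nets


def first_level_neighborhood(netlist, net):
    """All other nets that share at least one cell with `net`."""
    cell_to_nets = build_cell_index(netlist)
    neighbors = set()
    for cell in netlist[net]:
        neighbors |= cell_to_nets[cell]
    neighbors.discard(net)  # "other nets": exclude the net itself
    return neighbors


def second_level_neighborhood(netlist, net):
    """Union of the first-level neighborhoods of the net's first-level neighbors.

    Read literally, this union contains `net` itself, because `net` is a
    neighbor of each of its neighbors. A caller may discard it if desired.
    """
    result = set()
    for other in first_level_neighborhood(netlist, net):
        result |= first_level_neighborhood(netlist, other)
    return result


# Hypothetical toy netlist: net name -> cells the net connects to.
netlist = {
    "n1": {"A", "B"},
    "n2": {"B", "C"},
    "n3": {"C", "D"},
    "n4": {"D", "E"},
}

# Assumption: one pin per connected cell, so P_net is the cell count.
pin_count = {net: len(cells) for net, cells in netlist.items()}
nh1 = first_level_neighborhood(netlist, "n1")
nh2 = second_level_neighborhood(netlist, "n1")
print(pin_count["n1"], sorted(nh1), sorted(nh2))
```

In this toy chain, n1 shares cell B with n2 only, so its first-level neighborhood is {n2}; the second-level neighborhood is the first-level neighborhood of n2, namely {n1, n3}.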

Mohammad Javad Dousti and Massoud Pedram (DAC 2013 Paper) LEQA: Latency Estimation for a Quantum Algorithm Mapped to a Quantum Circuit Fabric

6 Related Papers
M. Pedram and B. T. Preas, "Accurate prediction of physical design characteristics of random logic," Proc. of Int'l Conference on Computer Design: VLSI in Computers and Processors, Oct. 1989, pp
M. Pedram and B. T. Preas, "Interconnection length estimation for optimized standard cell layouts," Proc. of Int'l Conference on Computer Aided Design, Nov. 1989, pp

7 Overview
Introduction & Motivation
Problem Statement
Preliminaries
  - Quantum Operation Dependency Graph (QODG)
  - Universal Logic Blocks (ULBs)
Estimating the Latency of a Quantum Algorithm
  - Average Routing Latency for CNOT Gate
LEQA Performance
Experimental Results
Conclusion

8 Introduction & Motivation
Total execution time of a software program depends on
  1. Processor architecture,
  2. Circuit design,
  3. Place and route.
Several methods have been proposed for estimating software execution time without running the software on a specific processor or processor simulator.
The same paradigm exists for quantum computers:
  - Calculating the exact latency of a quantum algorithm is an expensive proposition, since it requires scheduling and placement of quantum operations and routing of qubits
  - The exact answer has no use, since there is no real-size quantum computer out there!
However, latency estimation of the mapped quantum circuit still has many applications:
  - Early algorithm/program analysis
  - Helping quantum error correction code (QECC) designers budget sufficient resources for QECCs

9 Problem Statement
Given:
  - A quantum circuit
  - Size of the fabric (width×height)
  - Logical gate delays
  - The capacity of routing channels
  - Speed of a logical qubit through the routing channels
Estimate the latency of the quantum circuit mapped to the quantum circuit fabric.

10 Preliminaries (1): Quantum Operation Dependency Graph (QODG)
In a QODG, nodes represent quantum operations and edges capture data dependencies.
[Figure: synthesized ham3 circuit on qubits q1, q2, q3, and the QODG of the ham3 circuit]
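As a rough illustration of the QODG idea, the sketch below builds dependency edges from an ordered gate list: a gate depends on the most recent earlier gate on each of its qubits. The gate list is a hypothetical toy circuit (not ham3), the delay values are made up, and the longest-path computation is one common use of such a graph that the slide itself does not spell out.

```python
def build_qodg(gates):
    """Return preds, where preds[i] lists the gates that gate i depends on.

    `gates` is an ordered list of (name, qubits) tuples. Gate i depends on
    the latest earlier gate acting on each of its qubits.
    """
    last_on_qubit = {}
    preds = []
    for idx, (_, qubits) in enumerate(gates):
        deps = {last_on_qubit[q] for q in qubits if q in last_on_qubit}
        preds.append(sorted(deps))
        for q in qubits:
            last_on_qubit[q] = idx
    return preds


def critical_path_delay(gates, preds, delay):
    """Longest delay-weighted path through the dependency graph.

    Gates are already in topological order (program order), so a single
    forward pass is sufficient.
    """
    finish = []
    for idx, (name, _) in enumerate(gates):
        start = max((finish[p] for p in preds[idx]), default=0)
        finish.append(start + delay[name])
    return max(finish, default=0)


# Hypothetical circuit and gate delays (arbitrary time units).
gates = [
    ("H", ("q1",)),
    ("CNOT", ("q1", "q2")),
    ("T", ("q3",)),
    ("CNOT", ("q2", "q3")),
]
delay = {"H": 1, "T": 1, "CNOT": 2}

preds = build_qodg(gates)
total = critical_path_delay(gates, preds, delay)
print(preds, total)
```

Here the second CNOT waits for both the first CNOT (through q2) and the T gate (through q3), so the critical path is H, CNOT, CNOT. Routing delays are ignored in this sketch.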

11 Preliminaries (2): Universal Logic Blocks (ULBs)
To keep complexity manageable, a Tiled Quantum Architecture (TQA) is used, which is composed of a regular two-dimensional array of ULBs.
Each ULB can perform any fault-tolerant (FT) quantum operation.
ULBs are separated by routing channels, which are needed to move logical qubits from source ULBs to a target ULB in the TQA.
[Figure: a 3×3 Tiled Quantum Architecture (TQA); ULBs labeled H, CNOT, T†, T]

12 Estimating the Latency of a Quantum Algorithm
[Figure annotations on this slide:]
  - Tech, QECC, & QC dependent values
  - Easy; empirically set to 2×T_move
  - Main challenge!

13 Average Qubit Routing Latency for CNOT Gate
A computationally efficient model for estimating the average qubit routing latency for CNOT gates is developed.
The model comprises a number of sub-models dealing with
  - Possible placement locations of each qubit, captured as a "presence zone"
  - Congestion in the routing channels, captured by "zone overlaps"
  - Intra-zone routing, modelled as a "shortest Hamiltonian path"
A procedural method combines the sub-models to estimate the qubit routing latency for CNOT gates.
[Figure labels: 5 presence zones; Highly Congested]
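To make the "shortest Hamiltonian path" notion concrete, here is a minimal brute-force sketch over a handful of ULB grid coordinates. The slides do not say how LEQA obtains this path length, so the exhaustive search, the Manhattan distance metric, and the sample coordinates are all assumptions chosen for illustration. Enumeration is only practical for a few points.

```python
from itertools import permutations


def manhattan(a, b):
    """Grid distance between two (x, y) ULB positions."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def shortest_hamiltonian_path(points):
    """Try every visiting order and keep the shortest open path.

    Returns (length, order). Runs in factorial time, so it is meant only
    for small illustrative inputs.
    """
    best_len, best_order = None, None
    for order in permutations(points):
        length = sum(
            manhattan(order[i], order[i + 1]) for i in range(len(order) - 1)
        )
        if best_len is None or length < best_len:
            best_len, best_order = length, order
    return best_len, best_order


# Hypothetical ULB positions inside one presence zone.
ulbs = [(0, 0), (2, 0), (2, 2), (0, 2)]
length, order = shortest_hamiltonian_path(ulbs)
print(length, order)
```

For the four corners of a 2×2 square, every hop costs at least 2 and a path needs three hops, so the shortest path walks three sides of the square for a total length of 6.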

14 Should be estimated


16 Estimating Average Area of Presence Zones (B)

17 Derivation of this comes next


21 LEQA Performance
Runtime complexity: polynomial in the input size (operation count, qubit count, and fabric size)

22 Experimental Results (1)
LEQA is compared with a modified version of our previous work, QSPR (DATE'12)
  - Average error is 2.11%
  - Worst case error; still low enough

23 Experimental Results (2)
Shor's factorization algorithm for a 1024-bit integer has ~1.35×10^10 logical operations.
By extrapolation, QSPR would need ~2 years to compute the latency, whereas LEQA needs only 16.5 hours!
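The size of that gap can be checked with quick arithmetic. The sketch below takes the slide's two figures at face value and assumes a 365-day year; since "~2 years" is itself an extrapolated, approximate number, the result should be read as an order of magnitude only.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year

qspr_hours = 2 * HOURS_PER_YEAR  # "~2 years" from the slide
leqa_hours = 16.5                # from the slide

speedup = qspr_hours / leqa_hours
print(f"QSPR: {qspr_hours} h, LEQA: {leqa_hours} h, ratio: {speedup:.0f}x")
```

The ratio comes out at roughly a thousandfold, i.e. about three orders of magnitude.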

24 Conclusion
Persistence of Ideas
  - The method developed some 25 years ago applies today not only to classical computing but also to quantum computing fabrics
Gratitude of Scholars
  - We are who we are because of what we have learned, from whom, and what we have done since
Voice of Hearts
  - Friendship and collegiality are key