FPGA Place & Route Challenges

Presentation transcript:

FPGA Place & Route Challenges
Rajat Aggarwal, Sr Director, FPGA Implementation Tools
March 31st, 2014

Agenda
- FPGA Evolution
- Placement Challenges
- Routing Challenges
- Open Areas of Research

FPGA Technology Evolution

Take-Away: Our 20nm story must be tied back to our 28nm breakout to fully understand our strategy and vision.

To set the context for our 20nm story, it helps to know what exactly changed at 28nm. Traditionally, most people thought of us as just a programmable 'logic' design company, as on the left. Yet at 28nm we made a significant breakout from being a 'programmable logic' company to being an 'All Programmable' company by providing not just FPGAs but also SoC and 3D IC devices, essentially employing all forms of programmable technologies. This meant going beyond digital to analog mixed signal (AMS), beyond programmable hardware to software, and beyond single-die to multi-die 3D IC implementations. This new portfolio enables much more than programmable 'logic' design: it enables 'programmable systems integration', in effect putting more and more functionality into a single device. By doing this, customers maximize overall system performance, lower total system power, and reduce overall BOM cost.

Slide labels:
- Programmable Logic Devices: Enables Programmable "Logic"
- All Programmable Devices: Enables Programmable "Systems Integration"

Device Sizes Over the Last 5 Xilinx Generations

| Device | Logic Cells | LUTs | FFs | Distributed RAM (Kb) | DSP | Block RAM (Kb) | IOs |
|---|---|---|---|---|---|---|---|
| V4 220 | 200,448 | 178,176* | 178,176 | 1,392 | 96 | 6,048 | 960 |
| V5 330 | 330,000 | 207,360 | | 3,420 | 192 | 10,368 | 1200 |
| V6 760 | 758,784 | 474,240 | 948,480 | 8,280 | 864 | 25,920 | |
| V7 2000T + | 1,954,560 | 1,221,600 | 2,443,200 | 21,550 | 2160 | 46,512 | |
| US 440 + | 4,407,480 | 2,518,560 | 5,037,120 | 28,700 | 2880 | 88,600 | 1456 |

- Biggest devices in each Xilinx architecture family
- Lots of other components such as PCIe, MMCMs, PLLs, and GTs are not shown
* V4 used LUT4. All other families use LUT6.
+ 3D devices

Increased Complexity
- Increase of around 15x-30x over the last 10 years
- A lot more hardened blocks in the devices
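As a rough cross-check of the 15x-30x figure, the growth can be recomputed from the device-size slide. This is only a sketch: it compares the oldest and newest devices listed (V4 220 and US 440), the dictionary keys and the function name `growth_ratios` are made up for illustration, and the LUT ratio understates the real growth because V4 used LUT4 while later families use LUT6.

```python
# Largest-device resource counts copied from the device-size slide.
# V4 220 is the oldest generation listed, US 440 the newest.
V4_220 = {"logic_cells": 200_448, "luts": 178_176, "ffs": 178_176,
          "dist_ram": 1_392, "dsp": 96, "block_ram": 6_048}
US_440 = {"logic_cells": 4_407_480, "luts": 2_518_560, "ffs": 5_037_120,
          "dist_ram": 28_700, "dsp": 2_880, "block_ram": 88_600}


def growth_ratios(old, new):
    """Return new/old for every resource present in both devices."""
    return {name: new[name] / old[name] for name in old if name in new}


if __name__ == "__main__":
    for name, ratio in growth_ratios(V4_220, US_440).items():
        print(f"{name:12s} {ratio:5.1f}x")
```

With these numbers every resource grows by roughly 14x to 30x, which is in line with the range quoted on the slide.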

Increased Complexity - Challenges
- Fast Changing
  - New architecture every 2 years
  - More special modules/IPs with strict performance requirements
- Turnaround Time
  - Customer expectation of 3-4 turns per day on largest devices
  - Translates to 2-3 hours runtime for the entire flow
  - Multi-threading/Multi-Processing/Incremental Flows
- Performance
  - Heterogeneous blocks with fixed discrete locations
  - Large devices with skewed aspect ratios pose routing challenges
  - Simultaneous optimization of Power, Timing and Congestion metrics

3D FPGAs
- Multiple adjacent Super Logic Regions (SLRs)
- Super Long Lines (SLLs) cross from SLR, over interposer, to SLR
- 10K-15K SLLs between adjacent SLRs
  - Compared to 1.2K-1.4K IOs per FPGA
[Figure labels: Package Substrate, SLR, V7 2000T, SLLs]

3D FPGAs - Challenges
- P&R tools need to make the SSI devices seamless to customers
  - No floorplanning requirements
  - Minimal performance impact
  - Congestion management
[Figure legend: CLB, BRAM, DSP; HR (3.3V) I/O; HP (1.8V) I/O; CMT; GTP; GTX; GTH; CFG, AES, XADC; Clock Routing]
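One way to picture the congestion-management problem on SSI devices is to count how many nets of a candidate placement cross each SLR boundary and compare that with the number of SLLs available there. The sketch below is purely illustrative and not how any Xilinx tool is known to work: the function names, the data layout, and the simplification that each crossing net consumes exactly one SLL per boundary are all assumptions.

```python
from collections import Counter


def sll_demand(nets, slr_of):
    """Count nets crossing each boundary between adjacent SLRs.

    nets   : dict mapping net name -> list of cell names on that net
    slr_of : dict mapping cell name -> SLR index (0 = first die)
    Returns a Counter keyed by (lower_slr, lower_slr + 1).
    Assumption: a crossing net uses exactly one SLL per boundary.
    """
    demand = Counter()
    for cells in nets.values():
        slrs = [slr_of[c] for c in cells]
        lo, hi = min(slrs), max(slrs)
        # A net spanning SLRs lo..hi crosses every boundary in between.
        for b in range(lo, hi):
            demand[(b, b + 1)] += 1
    return demand


def overused(demand, budget):
    """Return the boundaries whose demand exceeds the assumed SLL budget."""
    return {b: n for b, n in demand.items() if n > budget}


if __name__ == "__main__":
    slr_of = {"a": 0, "b": 1, "c": 2}
    nets = {"n1": ["a", "b"], "n2": ["a", "c"], "n3": ["b", "c"]}
    demand = sll_demand(nets, slr_of)
    print(dict(demand))
    # Budget taken from the low end of the 10K-15K range on the previous slide.
    print(overused(demand, budget=10_000))
```

A placer could use a count like this as a penalty term so that cross-SLR nets stay within budget without the customer having to floorplan by hand.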

Programmable SoCs - Challenges
- Embedded dual ARM Cortex-A9 MPCore
- Challenges
  - Congestion management at the processor boundary
  - New IPs interfacing with the processor

Agenda
- FPGA Evolution
- Placement Challenges
- Routing Challenges
- Open Areas of Research

IO Banking Rules and Compatibility
- IO bank: a group of IO sites that share common VREF and VCCO voltages
- Only IOs with compatible standards can go to the same IO bank
- Compatibility rules
  - Numerous and complicated
  - Change from architecture to architecture
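The core of the banking rule can be sketched as a check that all IO standards placed in one bank agree on VCCO and, where they use one, on VREF. The table below is a small illustrative subset with nominal voltages, and the function name is hypothetical; real compatibility rules also depend on pin direction, termination, and the device family, which is exactly why the slide calls them numerous and complicated.

```python
# Illustrative (VCCO, VREF) requirements in volts per IO standard.
# VREF of None means the standard does not use a reference voltage.
# Simplified assumption: real rules have many more conditions.
IO_STANDARDS = {
    "LVCMOS18": (1.8, None),
    "LVCMOS33": (3.3, None),
    "SSTL15": (1.5, 0.75),
    "HSTL_I_18": (1.8, 0.9),
}


def bank_is_compatible(standards):
    """True if all standards in one bank share a single VCCO and a single VREF."""
    vccos = {IO_STANDARDS[s][0] for s in standards}
    vrefs = {IO_STANDARDS[s][1] for s in standards} - {None}
    return len(vccos) <= 1 and len(vrefs) <= 1


if __name__ == "__main__":
    print(bank_is_compatible(["LVCMOS18", "HSTL_I_18"]))  # same VCCO, one VREF
    print(bank_is_compatible(["LVCMOS18", "LVCMOS33"]))   # VCCO conflict
    print(bank_is_compatible(["SSTL15", "HSTL_I_18"]))    # VCCO and VREF conflict
```

During placement, a check of this kind would have to run every time an IO is assigned to a bank, and the rule table would have to be swapped out for each new architecture.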

UltraScale Clocking Architecture
- Flexible ASIC-style clocking network
- Clocking network defined by software
[Figure labels, repeated across the diagram: IOx52, Clocking, PCIe, CoreIO, CFG IO, XAMS, Config]

Placement Challenges
- Heterogeneous Placement
  - Handle multiple resources
  - Discrete resources (DSP/Block-RAM)
  - Not always a one-to-one map (example: LUTRAM)
- FPGA Legalization
  - Example: control sets
  - Complex, time consuming and changing
[Figure labels: DSPs, BRAMs]
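The control-set example can be made concrete with a small sketch. It assumes that flip-flops may share a slice only if they have the same clock, reset, and enable signals, and that a slice holds 8 flip-flops. Both the capacity constant and the function name are illustrative assumptions, not figures taken from the slides.

```python
from collections import defaultdict
from math import ceil

FFS_PER_SLICE = 8  # assumed slice capacity, for illustration only


def slices_needed(flops):
    """Minimum slices when FFs can only be packed with their own control set.

    flops: iterable of (name, clock, reset, enable) tuples.
    """
    groups = defaultdict(int)
    for _name, clock, reset, enable in flops:
        groups[(clock, reset, enable)] += 1
    # Each control set is packed separately, so partial slices are wasted.
    return sum(ceil(n / FFS_PER_SLICE) for n in groups.values())


if __name__ == "__main__":
    flops = [(f"a{i}", "clk", "rst", "en_a") for i in range(10)]
    flops += [(f"b{i}", "clk", "rst", "en_b") for i in range(3)]
    print(slices_needed(flops))              # with control-set rule
    print(ceil(len(flops) / FFS_PER_SLICE))  # ignoring control sets
```

In this toy case 13 flip-flops need 3 slices instead of 2, which hints at why a placement that looks legal by raw capacity can still fail legalization.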

Agenda
- FPGA Evolution
- Placement Challenges
- Routing Challenges
- Open Areas of Research

Interconnect Delays Are Not Monotonic
- Delay(ACDF) > Delay(ABEF)
- Manhattan Distance(ACDF) < Manhattan Distance(ABEF)
[Figure: routing graph over nodes A, B, C, D, E, F; edge labels: minDly = 40 / maxDly = 100; minDly = 30 / maxDly = 80; minDly = 50; minDly = 20 / maxDly = 40; minDly = 10 / maxDly = 15]

Routing Tracks Already Exist
- Unit delays of these wires can differ substantially
- Small changes can cause jumps in delay
  - Best path: SlowMaxDly = 155ps
  - Next best path: SlowMaxDly = 175ps
[Figure: routing graph over nodes A, B, C, D, E, F; edge labels: minDly = 40 / maxDly = 100; minDly = 30 / maxDly = 80; minDly = 50; minDly = 20 / maxDly = 40; minDly = 10 / maxDly = 15]

Need to Optimize Multiple Corners at Once
- Constraint: FastMinDly > 80ps, SlowMaxDly < 180ps
- Path (ACDF): FastMin = 90ps, SlowMax = 175ps
- Path (ABEF): FastMin = 70ps, SlowMax = 155ps
[Figure: routing graph over nodes A, B, C, D, E, F; edge labels: minDly = 40 / maxDly = 100; minDly = 30 / maxDly = 80; minDly = 50; minDly = 20 / maxDly = 40; minDly = 10 / maxDly = 15]
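The two-corner trade-off on this slide can be reproduced with a few lines of arithmetic. The sketch below is an illustration under stated assumptions: the transcript does not say which delay label belongs to which edge, and one maxDly value is missing, so the per-hop assignment and the 80ps slow delay on the 50ps hop are chosen only because they reproduce the path totals quoted on the slide. The function names are hypothetical.

```python
# (fast-corner min delay, slow-corner max delay) in ps for each hop.
# ASSUMPTION: the hop-to-edge assignment and the 80 ps max on the 50 ps hop
# are not readable from the slide; they are picked to match its path totals.
PATHS = {
    "ACDF": [(30, 80), (50, 80), (10, 15)],
    "ABEF": [(40, 100), (20, 40), (10, 15)],
}
FAST_MIN_LIMIT = 80   # constraint from the slide: FastMinDly > 80 ps
SLOW_MAX_LIMIT = 180  # constraint from the slide: SlowMaxDly < 180 ps


def corner_delays(hops):
    """Sum per-hop delays into (fast_min, slow_max) for the whole path."""
    return sum(h[0] for h in hops), sum(h[1] for h in hops)


def meets_constraints(hops):
    """True if the path satisfies both corner constraints at once."""
    fast_min, slow_max = corner_delays(hops)
    return fast_min > FAST_MIN_LIMIT and slow_max < SLOW_MAX_LIMIT


def best_legal_path(paths):
    """Among paths meeting both constraints, pick the smallest slow-corner delay."""
    legal = {name: corner_delays(hops)[1]
             for name, hops in paths.items() if meets_constraints(hops)}
    return min(legal, key=legal.get) if legal else None


if __name__ == "__main__":
    for name, hops in PATHS.items():
        print(name, corner_delays(hops), meets_constraints(hops))
    print("chosen:", best_legal_path(PATHS))
```

Under these assumptions ABEF is the faster path at the slow corner but misses the fast-corner minimum, so a router that optimizes only the slow corner would pick a path that violates the constraint, while ACDF satisfies both.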

Agenda
- FPGA Evolution
- Placement Challenges
- Routing Challenges
- Open Areas of Research

Open Areas of Research
- Incremental Flows
  - Ultrafast compilations for small changes
  - Emulation and OpenCL markets
- Evaluation
  - Fast and accurate evaluation of new architectures
  - Create new methods of abstraction
- 3D FPGAs
  - Adoption is set to keep increasing
  - Different configurations with non-identical dice
- Scalability
  - Design size: 750K → 2.0M → 4.4M → ?
  - Need to deliver 2x-3x scalability every 2 years
  - Massive multi-threading? Multi-processing?