Design Space Exploration for Application Specific FPGAs in System-on-a-Chip Designs Mark Hammerquist, Roman Lysecky Department of Electrical and Computer.

Presentation transcript:

Design Space Exploration for Application Specific FPGAs in System-on-a-Chip Designs
Mark Hammerquist, Roman Lysecky
Department of Electrical and Computer Engineering, University of Arizona, Tucson AZ, USA

2 Introduction and Motivation: FPGAs vs. ASICs
- FPGAs vs. ASICs in SoC designs
- Advantages of FPGAs
  - Programmed by downloading bits to the FPGA, much like software executing on a microprocessor
  - Allow hardware modifications throughout the development cycle, and even after manufacturing
    - Costly design errors can be corrected without requiring a respin
  - Dynamically reconfigurable: an FPGA can be used to implement multiple hardware circuits over the course of its execution
- Disadvantages of FPGAs (Kuon et al., FPGA 2006)
  - 10-40x larger than ASICs
  - 5-12x more power than ASICs
  - 3-4x longer delay than ASICs
[Figure: two SoC block diagrams, each with a µP, peripherals, I$, and D$; one includes an FPGA, the other an ASIC]
How can we take advantage of FPGAs without the significant overheads?

3 Introduction and Motivation: Application-Specific FPGAs
- SoCs require fabrication
  - Provides an opportunity to customize the FPGA architecture
  - Reduce area, reduce energy, improve performance
- Application-Specific FPGA (ASFPGA)
  - Create an FPGA architecture tailored to the specific hardware circuit
  - Flexible-optimized
    - Optimized for one application, but flexible enough to implement other hardware circuits or additions
  - Fully-optimized
    - Highly optimized for one application; only flexible enough to support minor changes
    - Trades off flexibility for smaller area/power/delay
[Figure: ASFPGA generation flow (HW Circuit, ASFPGA Generation, FPGA Architecture & Bitstream) and an SoC block diagram (µP, Periphs, I$, D$, FPGA/ASFPGA)]

4 Introduction and Motivation: Previous Work
- Researchers have investigated various methods for optimizing reconfigurable fabrics
  - Levinthal et al. (DesignCon, 2005)
    - Coarse-grained reconfigurable logic cells with fixed routing
  - Aken’Ova et al. (IEEE Custom IC, 2005)
    - FPGA-specific standard cells
  - Rose et al. (FPGA 2003, 2005)
    - Auto-generate a transistor-level implementation of an FPGA from an architectural description
    - Enabling technology
  - Holland et al. (FPL 2004, 2005; FPGA, 2006)
    - Automated tool flow for creating domain-specific reconfigurable logic
    - Domains: floating point, arithmetic, encryption, sorters

5 Application-Specific FPGAs (ASFPGAs): Traditional FPGA CAD Tool Flow
- Traditional CAD tool flow
  - Uses academic FPGA CAD tools to map hardware circuits to the target FPGA
    - Technology mapping (FlowMap)
    - Packing (T-VPack)
    - Placement and routing (VPR)
  - The FPGA architecture is known a priori and represents the target FPGA
- Application-Specific FPGA
  - The FPGA's architectural features can be tuned to the target hardware circuit
  - FPGA CAD tools can be used to explore the available architectural options
  - The current focus is on creating a flexible-optimized ASFPGA
[Figure: tool flow. HW Circuit (BLIF), Tech. Mapping (FlowMap), Mapped Circuit (BLIF), Packing (T-VPack), Packed Circuit (Netlist), Placement/Routing (VPR), HW Bitstream and Design Metrics (Area, Delay, Energy). FPGA Arch. inputs: LUT Size; CLB Size; Connectivity/Channel Width/FPGA Size]

6 Application-Specific FPGAs (ASFPGAs): Design Space Exploration Framework
- Design space exploration framework
  - Explores a set of configurable options for the target FPGA
  - Goal: find the lowest area/delay/power FPGA architecture for the target application
- Configurable FPGA options
  - LUT size: 3-, 4-, or 5-input LUTs
  - CLB size: 2 or 4 LUTs per CLB
  - Connection block connectivity: 100%, 90%, 80%, 70%, 60%
  - FPGA size: NxN fixed size
  - Channel width: 100%-130% of the minimum channel width
- More configurable options exist, but are not considered at this time
[Figure: design space exploration flow for ASFPGAs. HW Circuit (BLIF), Tech. Mapping (FlowMap), Mapped Circuit (BLIF), Packing/Activity Est. (T-VPack), Packed Circuit (Netlist) and Switching Activity, Placement/Routing/Power Est. (VPR with Power Model), HW Bitstream and Design Metrics (Area, Delay, Energy). Explored options: LUT Size; CLB Size; Connectivity/Channel Width/FPGA Size. Output: FPGA Arch. & Bitstream]
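
The configurable options listed on this slide form a small Cartesian design space. The following is a minimal sketch of how such a space could be enumerated; it is not the authors' framework. The `FpgaConfig` type and the 10% channel-width step are assumptions made for illustration (the slide gives only the 100%-130% range), and FPGA size is left out because the slide describes it as a fixed NxN value rather than a swept one.

```python
from itertools import product
from typing import NamedTuple


class FpgaConfig(NamedTuple):
    """One candidate architecture (hypothetical type, for illustration only)."""
    lut_inputs: int         # LUT size: number of LUT inputs
    luts_per_clb: int       # CLB size: number of LUTs per CLB
    connectivity_pct: int   # connection block connectivity, in percent
    channel_width_pct: int  # channel width relative to the minimum, in percent


# Option values taken from the slide.
LUT_SIZES = (3, 4, 5)
CLB_SIZES = (2, 4)
CONNECTIVITY_PCT = (100, 90, 80, 70, 60)
# Assumption: 10% steps. The slide states only the 100%-130% range.
CHANNEL_WIDTH_PCT = (100, 110, 120, 130)


def enumerate_configs() -> list[FpgaConfig]:
    """Return every combination of the configurable options."""
    return [
        FpgaConfig(*combo)
        for combo in product(LUT_SIZES, CLB_SIZES, CONNECTIVITY_PCT, CHANNEL_WIDTH_PCT)
    ]


configs = enumerate_configs()
print(len(configs))  # 3 * 2 * 5 * 4 = 120 candidates under the assumed step
```

Under these assumptions each candidate would be passed through packing, placement, and routing once per benchmark circuit, so the space is small enough to explore exhaustively.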

7 Application-Specific FPGAs (ASFPGAs): Experimental Setup
- Experimental setup
  - Consider several MCNC benchmark circuits of varying complexity
    - alu4, apex6, bigkey, cordic, des, dsip, misex1, mult32a, s1423, s298
- Design metric calculation
  - Delay is reported by VPR after routing
  - Power model used to estimate power consumption
    - Poon et al. (TODAES 2005)
  - Area
    - Routing area is reported by VPR
    - Developed a transistor-level estimation method to determine CLB area requirements
[Figure: design space exploration flow for ASFPGAs, repeated from the previous slide]

8 Experimental Results: ASFPGA vs. Delay/Energy/Area-Optimized FPGA
- ASFPGA
  - Optimized for one particular hardware application
  - Design space exploration determined the three best architectures for each circuit
- Delay/Energy/Area-Optimized
  - Best average delay, energy, or area across all hardware circuits
  - Delay- and energy-optimized architecture: 5-input LUTs, 4 LUTs per CLB, 80% connectivity
  - Area-optimized architecture: 3-input LUTs, 2 LUTs per CLB, 90% connectivity
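
The comparison on this slide contrasts a per-circuit choice (ASFPGA) with a single architecture chosen for its best average across all circuits. The sketch below illustrates that distinction for one metric. The architecture names and delay values are hypothetical, and both the use of an arithmetic mean for "best average" and the averaging of per-circuit reductions are assumptions, since the slides do not state how the averages were computed.

```python
from statistics import mean

# Hypothetical delay values (ns) per architecture and circuit; not from the slides.
delay_ns = {
    "arch_a": {"alu4": 10.0, "des": 20.0},
    "arch_b": {"alu4": 12.0, "des": 15.0},
    "arch_c": {"alu4": 8.0, "des": 25.0},
}


def best_per_circuit(metric: dict) -> dict:
    """ASFPGA-style choice: the lowest-cost architecture for each circuit."""
    circuits = next(iter(metric.values())).keys()
    return {c: min(metric, key=lambda a: metric[a][c]) for c in circuits}


def best_on_average(metric: dict) -> str:
    """Metric-optimized choice: one architecture with the lowest mean cost."""
    return min(metric, key=lambda a: mean(metric[a].values()))


def average_reduction(metric: dict) -> float:
    """Mean per-circuit reduction of the per-circuit choice over the average choice."""
    baseline = best_on_average(metric)
    chosen = best_per_circuit(metric)
    return mean(1.0 - metric[chosen[c]][c] / metric[baseline][c] for c in chosen)


print(best_on_average(delay_ns))   # arch_b (mean 13.5 ns)
print(best_per_circuit(delay_ns))  # {'alu4': 'arch_c', 'des': 'arch_b'}
print(round(average_reduction(delay_ns), 4))
```

The same functions apply unchanged to energy or area tables, which is how one architecture can be delay-optimized while another is area-optimized.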

9 Experimental Results: ASFPGA vs. Delay/Energy/Area-Optimized FPGA
- ASFPGA provides good reductions over delay-optimized, energy-optimized, and area-optimized FPGAs
  - 5% faster, 10% more energy efficient, or 17% smaller, on average
[Figure: results chart; callouts: 67% less energy, 49% smaller, 26% faster]

10 Experimental Results: ASFPGA vs. Balance-Optimized FPGA
- ASFPGA
  - Optimized for one particular hardware application
  - Design space exploration determined the three best architectures for each circuit
- Balance-Optimized
  - FPGA balanced between delay, energy, and area
  - Selected the FPGA architecture with the best average area/delay/energy (ADE) cost
    - ADE is the average of the individual area, delay, and energy costs for each FPGA across all benchmarks
    - Each cost is calculated as the area/delay/energy for an architecture divided by the maximum area/delay/energy for that hardware circuit
  - FPGA architecture with the best average ADE cost across all circuits: 5-input LUTs, 2 LUTs per CLB, 60% connectivity
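
The ADE cost described on this slide can be sketched as follows. This is one reading of the slide rather than the authors' code: it assumes the maximum for each circuit is taken over all explored architectures, and that the three normalized costs are weighted equally. The architecture names, circuit names, and metric values are hypothetical.

```python
from statistics import mean

# Hypothetical (area, delay, energy) results per architecture and circuit.
results = {
    "arch_a": {"c1": (100.0, 10.0, 4.0), "c2": (200.0, 20.0, 8.0)},
    "arch_b": {"c1": (50.0, 20.0, 2.0), "c2": (100.0, 40.0, 4.0)},
}


def ade_costs(results: dict) -> dict:
    """Average normalized area/delay/energy cost of each architecture."""
    archs = list(results)
    circuits = list(results[archs[0]])
    # Assumption: normalize by the per-circuit maximum over all architectures.
    maxima = {
        c: tuple(max(results[a][c][i] for a in archs) for i in range(3))
        for c in circuits
    }
    costs = {}
    for a in archs:
        # Per circuit: mean of the three normalized metrics, each in (0, 1].
        per_circuit = [
            mean(results[a][c][i] / maxima[c][i] for i in range(3))
            for c in circuits
        ]
        # ADE cost: average over all benchmark circuits.
        costs[a] = mean(per_circuit)
    return costs


costs = ade_costs(results)
balance_optimized = min(costs, key=costs.get)
print(balance_optimized)  # arch_b for this hypothetical data
```

Normalizing by the per-circuit maximum keeps a large circuit from dominating the average, so each benchmark contributes equally to the balance-optimized choice.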

11 Experimental Results: ASFPGA vs. Balance-Optimized FPGA
- ASFPGA can provide significant reductions in delay/energy/area over the balance-optimized FPGA
  - 25% faster, 36% more energy efficient, or 28% smaller, on average
[Figure: results chart; callouts: 73% less energy, 49% less area, 39% shorter delay]

12 Experimental Results: ASFPGA vs. Fixed-Size Balance-Optimized FPGA
- ASFPGA
  - Optimized for one particular hardware application
  - Design space exploration determined the three best architectures for each circuit
- Fixed-Size Balance-Optimized
  - Limited to a fixed size and balanced between area, delay, and energy
  - Fixed size is the minimum size needed to support all hardware benchmarks considered: 63x63 CLBs

13 Experimental Results: ASFPGA vs. Fixed-Size Balance-Optimized FPGA
- ASFPGA can provide significant reductions in delay/energy/area over the fixed-size balance-optimized FPGA
  - 50% faster, 75% more energy efficient, or 82% smaller, on average
[Figure: results chart; callouts: > 40% area savings for all circuits, > 60% energy savings for most circuits]

14 Conclusions and Future Work
- Conclusions
  - Presented an initial design space exploration framework for Application-Specific FPGAs
    - Allows an FPGA architecture to be customized to a particular hardware circuit before manufacturing
    - Yet flexible enough to support changes to the hardware after fabrication
  - ASFPGAs are 5% faster, 10% more energy efficient, or 17% smaller, on average, than traditional metric-optimized FPGAs
  - As much as 50% faster, 75% more energy efficient, or 82% smaller, on average, compared to the fixed-size balance-optimized FPGA
- Current/Future Work
  - FPGA architecture customization that constructs/optimizes an FPGA from the logic characteristics of the hardware circuit
    - Can potentially provide significant additional savings by further customizing individual CLBs and routing resources, but yields an irregular FPGA fabric
    - Requires new FPGA CAD tools that handle the irregularity in order to support hardware modifications

15 Thanks. Questions?