The Memory/Logic Interface in FPGA’s with Large Embedded Memory Arrays
Steven J. E. Wilton, Member, IEEE, Jonathan Rose, Member, IEEE, and Zvonko G. Vranesic, Senior Member, IEEE
Laboratory of Reliable Computing, Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan

Reference
- S. J. E. Wilton, “Architectures and algorithms for field-programmable gate arrays with embedded memory,” Ph.D. dissertation, Dept. Elect. Comput. Eng., Univ. Toronto, Toronto, Ont., Canada, 1997.

Outline
- Introduction
- Baseline architecture
- Experimental methodology and results
- Enhanced architecture and its improvements

Introduction
- In the past, FPGA’s have primarily been used to implement small logic subcircuits
- As the capacities of FPGA’s grow, they will be used to implement much larger circuits than ever before
- To address the storage requirements of large systems, FPGA’s with large embedded memory arrays are now being developed by many vendors

Introduction
- One of the challenges in embedding memory arrays into an FPGA is providing enough interconnect between the memory arrays and the logic resources

Baseline Architecture

Memory/Logic Interconnect Block

Benchmark Circuit Generation
- Benchmark circuits must be generated for this architecture because:
  - Typical circuits contain only a few memories each
  - Gathering hundreds of such circuits is not feasible
- The solution is to study the types of memory configurations found in real systems and develop a stochastic memory configuration generator
- Circuit analysis is used to make sure the generated circuits are realistic

Circuit Analysis
- Memory configurations
- Logic/memory clustering
- Interconnect patterns
  - Point-to-point patterns
  - Shared-connection patterns
  - Point-to-point patterns with no shuffling

Memory Configurations
- 171 circuits with a total of 268 user memories, taken from:
  - Recent conference proceedings
  - Recent journal articles
  - Local designers
  - A customer study conducted by Altera

Memory Configurations

Logic Memory Clustering

Interconnect Patterns

Stochastic Circuit Generation
- A stochastic circuit generator is developed using the statistics gathered during circuit analysis
- The steps in generating a benchmark circuit:
  - Choose the logical memory configuration
  - Divide the logical memories into clusters
  - Choose an interconnect pattern for each cluster
  - Choose the number of data-in and data-out subcircuits for each cluster
  - Generate logic subcircuits and connect them to the memory arrays
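
The five generation steps can be sketched roughly as follows. This is a minimal illustration, not the authors' generator: the depth and width choices, the uniform random draws, and the names `generate_circuit` and `Cluster` are hypothetical placeholders for the statistics gathered during circuit analysis. Only two of the three interconnect patterns are modelled, under the assumption that a shared-connection cluster has one data-in subcircuit driving every memory (a common data bus) while a point-to-point cluster pairs subcircuits and memories one to one.

```python
import random
from dataclasses import dataclass, field

# HYPOTHETICAL placeholder distributions: the real generator draws these
# choices from the statistics gathered during circuit analysis.
DEPTHS = [128, 256, 512, 1024, 2048]
WIDTHS = [1, 2, 4, 8, 16]
PATTERNS = ["point-to-point", "shared-connection"]  # simplified subset


@dataclass
class Cluster:
    memories: list                  # (depth, width) of each logical memory
    pattern: str = ""
    n_data_in: int = 0
    n_data_out: int = 0
    edges: list = field(default_factory=list)  # (subcircuit, memory index)


def generate_circuit(seed, max_memories=6):
    rng = random.Random(seed)
    # Step 1: choose the logical memory configuration.
    n_mem = rng.randint(1, max_memories)
    mems = [(rng.choice(DEPTHS), rng.choice(WIDTHS)) for _ in range(n_mem)]
    # Step 2: divide the logical memories into clusters (round-robin, so
    # every cluster receives at least one memory).
    n_clusters = rng.randint(1, n_mem)
    clusters = [Cluster([]) for _ in range(n_clusters)]
    for i, mem in enumerate(mems):
        clusters[i % n_clusters].memories.append(mem)
    for c in clusters:
        k = len(c.memories)
        # Step 3: choose an interconnect pattern for the cluster.
        c.pattern = rng.choice(PATTERNS)
        # Step 4: choose the number of data-in and data-out subcircuits.
        if c.pattern == "shared-connection":
            c.n_data_in, c.n_data_out = 1, rng.randint(1, k)
        else:
            c.n_data_in = c.n_data_out = k
        # Step 5: connect logic subcircuits to the memories (the internal
        # logic of each subcircuit is not generated in this sketch).
        for s in range(c.n_data_in):
            targets = range(k) if c.pattern == "shared-connection" else [s]
            c.edges += [(f"din{s}", j) for j in targets]
        # Each memory's data-out feeds exactly one data-out subcircuit.
        for j in range(k):
            c.edges.append((f"dout{j % c.n_data_out}", j))
    return clusters
```

Calling `generate_circuit(seed=1)` and printing each cluster's `pattern`, `memories` and `edges` shows one randomly drawn benchmark; a fixed seed makes the benchmark set reproducible across architectures.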

Implementation Tool
- Each generated benchmark circuit is “implemented” in each FPGA architecture:
  - Logical-to-physical mapping
  - Placement
    - Memory and logic blocks are placed simultaneously
  - Routing
    - Nets connected to memory initially have higher priority
    - The nets are reordered between iterations
    - Up to 10 iterations are attempted; if the circuit is still unroutable, the channel width W is increased
    - This determines the minimum value of W
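
The routing loop (memory nets first, reorder between iterations, give up after 10 attempts and widen the channel) can be sketched as below. Assumption: a toy single-channel, first-fit track assigner stands in for the real router; `Net`, `route_once`, `route` and `min_channel_width` are hypothetical names, and the reordering rule used here (move the nets that failed to the front) is one plausible choice, not necessarily the one used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Net:
    name: str
    lo: int          # leftmost column the net spans in the toy channel
    hi: int          # rightmost column the net spans
    to_memory: bool  # does the net connect to a memory block?


def route_once(order, W):
    """Greedy first-fit track assignment in one channel of width W.
    Returns the nets that could not be given a track."""
    tracks = [[] for _ in range(W)]   # nets already placed on each track
    failed = []
    for net in order:
        for t in tracks:
            if all(net.hi < o.lo or net.lo > o.hi for o in t):
                t.append(net)         # no overlap on this track: use it
                break
        else:
            failed.append(net)        # every track overlaps: net fails
    return failed


def route(nets, W, iterations=10):
    # Initially, nets that connect to memory are routed first.
    order = sorted(nets, key=lambda n: not n.to_memory)
    for _ in range(iterations):
        failed = route_once(order, W)
        if not failed:
            return True
        # Reorder: nets that failed are routed first on the next attempt.
        order = failed + [n for n in order if n not in failed]
    return False


def min_channel_width(nets, w_start=1, iterations=10):
    W = w_start
    while not route(nets, W, iterations):
        W += 1   # still unroutable after all iterations: widen the channel
    return W
```

For four nets spanning columns 0-3, 2-5, 4-7 and 6-9 (the first and last connecting to memory), the memory-first order fails at W = 2 on the first pass but succeeds after one reordering, so the search returns 2; with `iterations=1` it would return 3, which illustrates why reordering between iterations matters.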

Memory/Logic Flexibility Result

Area Result
- The area of the FPGA is the sum of:
  - Logic blocks
  - Memory blocks
  - Routing resources
    - Programmable switches
    - Programming bits
    - Metal routing segments
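
Written as an equation (the symbol names are ours, introduced only for illustration, not taken from the paper):

```latex
A_{\text{FPGA}} = A_{\text{logic}} + A_{\text{memory}} + A_{\text{routing}},
\qquad
A_{\text{routing}} = A_{\text{switches}} + A_{\text{bits}} + A_{\text{metal}}
```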

Area Result

Delay Result
- A delay model is used to measure the memory read time of every memory in the circuit:
  - CACTI estimates the access time of the memory array itself
  - The Elmore delay model estimates the delay of the address-in and data-out routing paths
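
The read-time composition can be sketched as Elmore delay on the address-in path, plus the array access time, plus Elmore delay on the data-out path. Assumptions: each routing path is modelled as a simple RC ladder of (R, C) segments, the array access time is passed in as a number (in the study it comes from CACTI, which is not called here), and `elmore_delay` and `read_time` are hypothetical names. Units only need to be consistent (for example, ohms and farads give seconds).

```python
def elmore_delay(segments, load_cap=0.0):
    """Elmore delay from the driver to the far end of an RC ladder.

    segments: [(R_i, C_i), ...] ordered from driver to sink, where C_i is
    the capacitance at the node after resistor R_i; load_cap sits at the
    sink. Delay = sum_i R_i * (all capacitance downstream of R_i).
    """
    delay = 0.0
    downstream = load_cap
    # Walk from the sink back to the driver, accumulating capacitance.
    for r, c in reversed(segments):
        downstream += c
        delay += r * downstream
    return delay


def read_time(addr_path, array_access, dout_path, addr_load=0.0, dout_load=0.0):
    """Memory read time = address-in routing + array access + data-out routing.
    array_access is supplied by the caller (e.g. a CACTI estimate)."""
    return (elmore_delay(addr_path, addr_load)
            + array_access
            + elmore_delay(dout_path, dout_load))
```

Because each resistor is charged with all of the capacitance downstream of it, segments near the driver dominate the delay, which is why long routed address and data paths to a memory block can add noticeably to the read time.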

Delay Result

Issues
- Some nets connect more than one memory block to one or more logic blocks. They arise:
  - When several small memory arrays are combined to implement a larger logical memory
  - When the data-in pins of several user memories are driven by a common data bus
- Such nets appear often but, unfortunately, they are hard to route, especially in larger architectures
- Could we simply use a higher value of Fm (the memory/logic flexibility) for larger architectures, or is there a better alternative?
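
To see why combining arrays creates nets that touch several memory blocks, the sketch below picks the cheapest way to build a logical memory from physical arrays and reports how many arrays the shared nets must reach. Assumptions: each physical array holds 2048 bits and can be configured as 2048x1, 1024x2, 512x4 or 256x8 (illustrative values; substitute the architecture's own), and `map_logical_memory` is a hypothetical helper, not part of the authors' tools.

```python
import math

# ASSUMED physical array (illustrative): 2048 bits, configurable as
# 2048x1, 1024x2, 512x4 or 256x8.
ARRAY_BITS = 2048
ARRAY_WIDTHS = (1, 2, 4, 8)


def map_logical_memory(depth, width, bits=ARRAY_BITS, widths=ARRAY_WIDTHS):
    """Pick the array configuration that uses the fewest physical arrays
    (ties keep the first, i.e. narrowest, mode) and report net fanouts."""
    best = None
    for w in widths:
        d = bits // w                   # words per array in this mode
        across = math.ceil(width / w)   # arrays placed side by side
        down = math.ceil(depth / d)     # arrays stacked to reach the depth
        n = across * down
        if best is None or n < best["arrays"]:
            best = {
                "arrays": n,
                "width_mode": w,
                # low-order address bits are shared by every array
                "address_fanout": n,
                # each data-in bit drives every array in its column
                "data_in_fanout": down,
            }
    return best
```

For example, under these assumptions a 4096x8 logical memory needs 16 arrays, so each shared low-order address line must reach all 16 memory blocks, which is exactly the kind of multi-memory net described above.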

Further Investigation

Enhanced Architecture
- This motivated the authors to study memory-to-memory connections more closely
- An enhanced architecture:
  - Extra switches are added between memory arrays to support these nets
- Results:
  - The extra switches take up negligible area
  - Both speed and routability improve

Enhanced Architecture

Baseline Architecture

Enhanced Architecture

Evaluation of Enhanced Architecture
- The maze routing algorithm must be restricted so that it uses memory-to-memory switches only to implement memory-to-memory connections
- If the maze router is not modified…

Routing Result Using Standard Maze

Modified Maze
- Even though some tracks are wasted when a circuit contains few or no memory-to-memory connections, this restriction alleviates the problem above
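
The restriction can be illustrated with a breadth-first (Lee-style) maze expansion that skips memory-to-memory switch edges unless the connection being routed is itself memory-to-memory. This is a simplified sketch, not the router used in the study: the graph encoding, the unit-cost expansion and the name `maze_route` are assumptions.

```python
from collections import deque


def maze_route(graph, src, dst, mem_to_mem):
    """Breadth-first maze expansion from src to dst.

    graph: {node: [(neighbor, is_mem_switch), ...]}
    Edges through memory-to-memory switches are expanded only when the
    connection being routed is itself memory-to-memory (mem_to_mem=True).
    Returns the node path, or None if dst is unreachable.
    """
    prev = {src: None}
    frontier = deque([src])
    while frontier:
        node = frontier.popleft()
        if node == dst:
            path = []
            while node is not None:   # backtrace from sink to source
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt, is_mem_switch in graph.get(node, []):
            if is_mem_switch and not mem_to_mem:
                continue  # reserve these switches for memory-to-memory nets
            if nxt not in prev:
                prev[nxt] = node
                frontier.append(nxt)
    return None
```

In a toy graph where memory block M1 links to M2 through a memory-to-memory switch, a logic-to-M2 connection is forced onto ordinary tracks, while an M1-to-M2 connection may take the switch directly; an unrestricted router would instead let logic-to-memory nets consume those switches.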

Area Result

Delay Result

Conclusion
- Even with this relatively unaggressive use of the memory-to-memory switches, area improves somewhat and speed improves significantly
- The development of algorithms that use these tracks more aggressively is left as future work
- The enhanced architecture reduces the channel width by 0.5 to 1 track and improves speed by 25%