HSRA: High-Speed, Hierarchical Synchronous Reconfigurable Array William Tsu, Kip Macy, Atul Joshi, Randy Huang, Norman Walker, Tony Tung, Omid Rowhani,

Presentation transcript:

HSRA: High-Speed, Hierarchical Synchronous Reconfigurable Array William Tsu, Kip Macy, Atul Joshi, Randy Huang, Norman Walker, Tony Tung, Omid Rowhani, Varghese George, John Wawrzynek, and André DeHon BRASS Project University of California at Berkeley

Myth FPGAs inherently run at an order of magnitude lower clock rates than microprocessors.

Don’t Believe It! Example: XC4000XL-09 (0.35 µm) –Minimum clock low/high 2.3 ns → 4.6 ns cycle –Composing: clock→Q 1.5 ns + interconnect budget 1.5 ns + logic→clock setup 1.6 ns = 4.6 ns Also: Von Herzen FPGA97, XC  4ns
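The cycle-time budget on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the constant and function names are made up for this example, and the numbers are the ones quoted on the slide.

```python
# Timing figures quoted on the slide, in nanoseconds.
# All names here are illustrative, not taken from any vendor tool.
MIN_CLOCK_PHASE_NS = 2.3      # minimum clock low (or high) time
CLOCK_TO_Q_NS = 1.5           # register clock-to-output delay
INTERCONNECT_BUDGET_NS = 1.5  # what is left for routing
LOGIC_TO_SETUP_NS = 1.6       # logic delay plus register setup


def cycle_from_clock_phases() -> float:
    """Shortest cycle the clock itself allows: one low phase plus one high phase."""
    return 2 * MIN_CLOCK_PHASE_NS


def cycle_from_path() -> float:
    """Cycle needed by a register-to-register path built from the three components."""
    return CLOCK_TO_Q_NS + INTERCONNECT_BUDGET_NS + LOGIC_TO_SETUP_NS


def freq_mhz(period_ns: float) -> float:
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


print(f"clock-limited cycle: {cycle_from_clock_phases():.1f} ns")
print(f"path-limited cycle:  {cycle_from_path():.1f} ns")
print(f"frequency:           {freq_mhz(cycle_from_path()):.0f} MHz")
```

Both ways of composing the cycle give 4.6 ns, which is a little over 200 MHz, provided the interconnect can be held to the 1.5 ns budget.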

Cycle Comparison FPGA cycles comparable to contemporary microprocessors.

Outline FPGA cycle times Why low frequency? Architecture and CAD for high frequency HSRA Experiments Assessment

Why do FPGA designs run slowly? Few designs run at 200+ MHz 1. Limited application/user requirements 2. Cyclic data dependencies 3. Poor tool support 4. Long interconnect delays 5. Pipelining expensive?

HSRA High-Speed, Hierarchical Synchronous Reconfigurable Array Attacks architecture and CAD impediments –pipeline the interconnect (4) –balance retiming resources (5) –CAD for auto retiming (3)

HSRA Architecture

Pipelined Interconnect

Input Retiming

Flop Experiment #1 Pipeline and retime to single LUT delay per cycle –MCNC benchmarks to LUTs –no interconnect accounting –average 1.7 registers/LUT (some circuits 2--7)

Add Interconnect Delays

Flop Experiment #2 Pipeline and retime to HSRA cycle –place on HSRA –single LUT or interconnect domain –same MCNC benchmarks –average 4.7 registers/LUT

Input Depth Optimization Real design, fixed input retiming depth –truncate deeper and allocate additional logic blocks
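One possible reading of this slide is sketched below: each logic-block input has a retiming chain of fixed depth, and any input that needs more delay than the chain provides is truncated to that depth, with the remainder made up by extra logic blocks used as delay. The function name, the example depths, and the assumption that each extra block supplies up to one more chain's worth of delay are all hypothetical; the slide does not give these details.

```python
import math


def extra_blocks(required_depths, chain_depth):
    """Count extra logic blocks needed when input retiming chains are capped.

    required_depths: register delay each input needs after retiming (hypothetical data).
    chain_depth: fixed input retiming depth built into every logic block.
    Assumption: each extra block adds up to `chain_depth` more stages of delay.
    """
    extra = 0
    for need in required_depths:
        overflow = max(0, need - chain_depth)      # delay the built-in chain cannot hold
        extra += math.ceil(overflow / chain_depth)  # blocks spent purely on delay
    return extra


# Hypothetical per-input delay requirements for a small design.
needs = [1, 3, 9, 20]
for depth in (4, 8, 16):
    print(depth, extra_blocks(needs, depth))
```

Under these assumptions a deeper built-in chain costs area in every block but needs fewer extra blocks, while a shallower chain does the opposite, which is the trade-off a fixed depth has to balance.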

Assessment Cost: –our designs: 1.5× area of no pipelining –plausible ballpark for other designs –w/ 8 deep retiming, 20% BLB overhead –total: 1.8× area Running LUT→LUT delay on FPGA –70% overhead for retiming –freq still vary with interconnect Benefits –2--17× higher frequency operation than unpipelined ⇒ Net Area-Time win + automation/consistency
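The area-time claim follows from the slide's own figures, as the rough sketch below shows. It assumes the 20% block overhead multiplies the 1.5× pipelining area (which matches the 1.8× total), that time per result scales as the inverse of clock frequency, and that the unpipelined design is the baseline with area 1 and time 1. The names are illustrative only.

```python
# Figures quoted on the slide; names are illustrative.
PIPELINE_AREA_FACTOR = 1.5   # area relative to the unpipelined design
BLB_OVERHEAD = 0.20          # extra logic blocks with 8-deep retiming


def total_area_factor() -> float:
    """Combined area cost relative to the unpipelined baseline."""
    return PIPELINE_AREA_FACTOR * (1.0 + BLB_OVERHEAD)


def area_time_ratio(speedup: float) -> float:
    """Area-time product relative to the baseline (values below 1.0 are a win).

    Assumes time per result is inversely proportional to clock frequency.
    """
    return total_area_factor() / speedup


print(f"total area: {total_area_factor():.2f}x")
for speedup in (2, 17):
    print(f"{speedup}x frequency -> area-time ratio {area_time_ratio(speedup):.2f}")
```

Even at the low end of the quoted range (2× frequency) the ratio is below 1, so the pipelined design comes out ahead on area-time under these assumptions.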

Summary No inherent reasons for FPGAs/RC arrays to run slower than microprocessors Current FPGAs lack architectural and CAD support to reliably achieve high clock rates HSRA demonstrates how to attack problems –retiming balance – interconnect pipelining – automated retiming