JR.S00 1 Lecture 13: (Re)configurable Computing Prof. Jan Rabaey Computer Science 252, Spring 2000 The major contributions of Andre Dehon to this slide.

Slides:



Advertisements
Similar presentations
FPGA (Field Programmable Gate Array)
Advertisements

Penn ESE534 Spring DeHon 1 ESE534: Computer Organization Day 14: March 19, 2014 Compute 2: Cascades, ALUs, PLAs.
EELE 367 – Logic Design Module 2 – Modern Digital Design Flow Agenda 1.History of Digital Design Approach 2.HDLs 3.Design Abstraction 4.Modern Design Steps.
Spartan II Features  Plentiful logic and memory resources –15K to 200K system gates (up to 5,292 logic cells) –Up to 57 Kb block RAM storage  Flexible.
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR SRAM-based FPGA n SRAM-based LE –Registers in logic elements –LUT-based logic element.
EECE579: Digital Design Flows
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day17: November 20, 2000 Time Multiplexing.
Reconfigurable Computing: What, Why, and Implications for Design Automation André DeHon and John Wawrzynek June 23, 1999 BRASS Project University of California.
CS294-6 Reconfigurable Computing Day 5 September 8, 1998 Comparing Computing Devices.
Introduction to Reconfigurable Computing CS61c sp06 Lecture (5/5/06) Hayden So.
Lecture 2: Field Programmable Gate Arrays I September 5, 2013 ECE 636 Reconfigurable Computing Lecture 2 Field Programmable Gate Arrays I.
ENGIN112 L38: Programmable Logic December 5, 2003 ENGIN 112 Intro to Electrical and Computer Engineering Lecture 38 Programmable Logic.
CS294-6 Reconfigurable Computing Day 8 September 17, 1998 Interconnect Requirements.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day8: October 18, 2000 Computing Elements 1: LUTs.
The Spartan 3e FPGA. CS/EE 3710 The Spartan 3e FPGA  What’s inside the chip? How does it implement random logic? What other features can you use?  What.
DeHon March 2001 Rent’s Rule Based Switching Requirements Prof. André DeHon California Institute of Technology.
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 13: February 26, 2007 Interconnect 1: Requirements.
Evolution of implementation technologies
Programmable logic and FPGA
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 11: February 14, 2007 Compute 1: LUTs.
CS294-6 Reconfigurable Computing Day 2 August 27, 1998 FPGA Introduction.
Trends toward Spatial Computing Architectures Dr. André DeHon BRASS Project University of California at Berkeley.
February 4, 2002 John Wawrzynek
Penn ESE Spring DeHon 1 ESE (ESE534): Computer Organization Day 5: January 24, 2007 ALUs, Virtualization…
CS 151 Digital Systems Design Lecture 38 Programmable Logic.
Introduction to FPGA and DSPs Joe College, Chris Doyle, Ann Marie Rynning.
Digital Circuit Implementation. Wafers and Chips  Integrated circuit (IC) chips are manufactured on silicon wafers  Transistors are placed on the wafers.
CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #3 – FPGA.
EKT303/4 PRINCIPLES OF PRINCIPLES OF COMPUTER ARCHITECTURE (PoCA)
CSET 4650 Field Programmable Logic Devices
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 13: February 4, 2005 Interconnect 1: Requirements.
February 12, 1998 Aman Sareen DPGA-Coupled Microprocessors Commodity IC’s for the Early 21st Century by Aman Sareen School of Electrical Engineering and.
Lecture 2: Field Programmable Gate Arrays September 13, 2004 ECE 697F Reconfigurable Computing Lecture 2 Field Programmable Gate Arrays.
Reconfigurable Computing. This Class is About Reconfigurable Computing Computer Architecture Coping with Change.
ESE Spring DeHon 1 ESE534: Computer Organization Day 19: April 7, 2014 Interconnect 5: Meshes.
Power Reduction for FPGA using Multiple Vdd/Vth
Open Discussion of Design Flow Today’s task: Design an ASIC that will drive a TV cell phone Exercise objective: Importance of codesign.
J. Christiansen, CERN - EP/MIC
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR FPGA Fabric n Elements of an FPGA fabric –Logic element –Placement –Wiring –I/O.
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR Topics n FPGA fabric architecture concepts.
Caltech CS184a Fall DeHon1 CS184a: Computer Architecture (Structures and Organization) Day11: October 30, 2000 Interconnect Requirements.
Reconfigurable Computing Using Content Addressable Memory (CAM) for Improved Performance and Resource Usage Group Members: Anderson Raid Marie Beltrao.
Introduction to FPGA Created & Presented By Ali Masoudi For Advanced Digital Communication Lab (ADC-Lab) At Isfahan University Of technology (IUT) Department.
Field Programmable Gate Arrays (FPGAs) An Enabling Technology.
Basic Sequential Components CT101 – Computing Systems Organization.
EE3A1 Computer Hardware and Digital Design
CprE / ComS 583 Reconfigurable Computing Prof. Joseph Zambreno Department of Electrical and Computer Engineering Iowa State University Lecture #4 – FPGA.
EKT303/4 PRINCIPLES OF PRINCIPLES OF COMPUTER ARCHITECTURE (PoCA)
1 Carnegie Mellon University Center for Silicon System Implementation An Architectural Exploration of Via Patterned Gate Arrays Chetan Patel, Anthony Cozzie,
FPGA-Based System Design: Chapter 1 Copyright  2004 Prentice Hall PTR Moore’s Law n Gordon Moore: co-founder of Intel. n Predicted that number of transistors.
Introduction to Field Programmable Gate Arrays Lecture 1/3 CERN Accelerator School on Digital Signal Processing Sigtuna, Sweden, 31 May – 9 June 2007 Javier.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 8: January 27, 2003 Empirical Cost Comparisons.
EE121 John Wakerly Lecture #15
Penn ESE534 Spring DeHon 1 ESE534 Computer Organization Day 9: February 13, 2012 Interconnect Introduction.
Lecture 17: Dynamic Reconfiguration I November 10, 2004 ECE 697F Reconfigurable Computing Lecture 17 Dynamic Reconfiguration I Acknowledgement: Andre DeHon.
Implementing Tile-based Chip Multiprocessors with GALS Clocking Styles Zhiyi Yu, Bevan Baas VLSI Computation Lab, ECE Department University of California,
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 11: January 31, 2005 Compute 1: LUTs.
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR Topics n FPGA fabric architecture concepts.
Caltech CS184 Winter DeHon 1 CS184a: Computer Architecture (Structure and Organization) Day 10: January 28, 2005 Empirical Comparisons.
ESE532: System-on-a-Chip Architecture
Topics SRAM-based FPGA fabrics: Xilinx. Altera..
ESE534 Computer Organization
ESE532: System-on-a-Chip Architecture
Instructor: Dr. Phillip Jones
Electronics for Physicists
Lecture 41: Introduction to Reconfigurable Computing
Electronics for Physicists
CprE / ComS 583 Reconfigurable Computing
Programmable logic and FPGA
Reconfigurable Computing (EN2911X, Fall07)
Presentation transcript:

JR.S00 1 Lecture 13: (Re)configurable Computing Prof. Jan Rabaey Computer Science 252, Spring 2000 The major contributions of Andre Dehon to this slide set are gratefully acknowledged

JR.S00 2 Computers in the News … TI announces 2 new DSPs C64x –Up to 1.1 GHz –9 Billion Operations/sec –10x performance of C62x –32 full-rate DSL modems on a single chip! C55x –0.05 mW/MIPS (20 MIPS/mW!) –Cut power consumption of C54x by 85% –5x performance of C54x

JR.S00 3 C64x

JR.S00 4 Enhanced performance for communications and multimedia

JR.S00 5 From the C54x core …

JR.S00 6 To the C55x

JR.S00 7 Leading to higher energy efficiency (?)

JR.S00 8 Evaluation metrics for Embedded Systems Flexibility Power Cost Performance as a Functionality Constraint (“Just-in-Time Computing”) Components of Cost –Area of die / yield –Code density (memory is the major part of die size) –Packaging –Design effort –Programming cost –Time-to-market –Reusability

JR.S00 9 Special Instructions for Specific Applications

JR.S00 10 What is Configurable Computing? Spatially-programmed connection of processing elements “Hardware” customized to specifics of problem. Direct map of problem specific dataflow, control. Circuits “adapted” as problem requirements change.

JR.S00 11 Spatial vs. Temporal Computing Spatial Temporal

JR.S00 12 Defining Terms Computes one function (e.g. FP-multiply, divider, DCT) Function defined at fabrication time Computes “any” computable function (e.g. Processor, DSPs, FPGAs) Function defined after fabrication Fixed Function: Programmable: Parameterizable Hardware: Performs limited “set” of functions

JR.S00 13 “Any” Computation? (Universality) Any computation which can “fit” on the programmable substrate Limitations: hold entire computation and intermediate data Recall size/fit constraint

JR.S00 14 Benefits of Programmable Non-permanent customization and application development after fabrication –“Late Binding” economies of scale (amortize large, fixed design costs) time-to-market (evolving requirements and standards, new ideas) Disadvantages Efficiency penalty (area, performance, power) Correctness Verification

JR.S00 15 Spatial/Configurable Benefits 10x raw density advantage over processors Potential for fine-grained (bit-level) control --- can offer another order of magnitude benefit Locality! Each compute/interconnect resource dedicated to single function Must dedicate resources for every computational subtask Infrequently needed portions of a computation sit idle --> inefficient use of resources Spatial/Configurable Drawbacks

JR.S00 16 Density Comparison

JR.S00 17 Processor vs. FPGA Area

JR.S00 18 Processors and FPGAs

JR.S00 19 Early RC Successes Fastest RSA implementation is on a reconfigurable machine (DEC PAM) Splash2 (SRC) performs DNA Sequence matching 300x Cray2 speed, and 200x a 16K CM2 Many modern processors and ASICs are verified using FPGA emulation systems For many signal processing/filtering operations, single chip FPGAs outperform DSPs by x.

JR.S00 20 Issues in Configurable Design Choice and Granularity of Computational Elements Choice and Granularity of Interconnect Network (Re)configuration Time and Rate –Fabrication time --> Fixed function devices –Beginning of product use --> Actel/Quicklogic FPGAs –Beginning of usage epoch --> (Re)configurable FPGAs –Every cycle --> traditional Instruction Set Processors

JR.S00 21 The Choice of the Computational Elements ReconfigurableLogicReconfigurableDatapathsReconfigurableArithmeticReconfigurableControl Bit-Level Operations e.g. encoding Dedicated data paths e.g. Filters, AGU Arithmetic kernels e.g. Convolution RTOS Process management

JR.S00 22 FPGA Basics LUT for compute FF for timing/retiming Switchable interconnect …everything we need to build fixed logic circuits –don’t really need programmable gates –latches can be built from gates

JR.S00 23 Field Programmable Gate Array (FPGA) Basics Collection of programmable “gates” embedded in a flexible interconnect network. …a “user programmable” alternative to gate arrays. ? Programmable Gate

JR.S00 24 Look-Up Table (LUT) In Out LUT Mem In1 In2 Out

JR.S00 25 LUTs K-LUT -- K input lookup table Any function of K inputs by programming table

JR.S00 26 Conventional FPGA Tile K-LUT (typical k=4) w/ optional output Flip-Flop

JR.S00 27 Commercial FPGA (XC4K) Cascaded 4 LUTs (2 4-LUTs -> 1 3-LUT) Fast Carry support Segmented interconnect Can use LUT config as memory.

JR.S00 28 XC4000 CLB

JR.S00 29 Not Restricted to Logic Gates Example: Paddi-2 (1995)

JR.S00 30 A Data-driven Computation Paradigm

JR.S00 31 Not restricted to Logic Gate Operations

JR.S00 32 For Spatial Architectures Interconnect dominant –area –power –time …so need to understand in order to optimize architectures

JR.S00 33 Dominant in Area

JR.S00 34 Dominant in Time

JR.S00 35 Dominant in Power XC4003A data from Eric Kusse (UCB MS 1997)

JR.S00 36 Interconnect Problem –Thousands of independent (bit) operators producing results »true of FPGAs today »…true for *LIW, multi-uP, etc. in future –Each taking as inputs the results of other (bit) processing elements –Interconnect is late bound »don’t know until after fabrication

JR.S00 37 Design Issues Flexibility -- route “anything” –(w/in reason?) Area -- wires, switches Delay -- switches in path, stubs, wire length Power -- switch, wire capacitance Routability -- computational difficulty finding routes

JR.S00 38 First Attempt: Crossbar Any operator may consume output from any other operator Try a crossbar?

JR.S00 39 Crossbar Flexibility (++) –routes everything (guaranteed) Delay (Power) (-) –wire length O(kn) –parasitic stubs: kn+n –series switch: 1 –O(kn) Area (-) –Bisection bandwidth n –kn 2 switches –O(n 2 ) Too expensive and not scalable

JR.S00 40 Avoiding Crossbar Costs Good architectural design –Optimize for the common case Designs have spatial locality We have freedom in operator placement Thus: Place connected components “close” together –don’t need full interconnect?

JR.S00 41 Exploit Locality Wires expensive Local interconnect cheap Try a mesh? LUT C Box S Box

JR.S00 42 The Toronto Model Switch Box Connect Box

JR.S00 43 Mesh Analysis Flexibility - ? –Ok w/ large w Delay (Power) –Series switches »1--  n –Wire length »w--  n –Stubs »O(w)--O(w  n) Area –Bisection BW -- w  n –Switches -- O(nw) –O(w 2 n)

JR.S00 44 Mesh Analysis Can we place everything close?

JR.S00 45 Mesh “Closeness” Try placing “everything” close

JR.S00 46 Adding Nearest Neighbor Connections Connection to 8 neighbors Improvement over Mesh by x3 Good for neighbor-neighbor connections

JR.S00 47 Typical Extensions Segmented Interconnect Hardwired/Cascade Inputs

JR.S00 48 XC4K Interconnect

JR.S00 49 XC4K Interconnect Details

JR.S00 50 Creating Hierarchy Example: Paddi-2

JR.S00 51 Level-1 Communication Network

JR.S00 52 Level-2 Communication Network (Pipelined)

JR.S00 53 Paddi-2 Processor 1-  m 2-metal CMOS tech 1.2 x 1.2 mm 2 600k transistors 208-pin PGA fclock = 50 MHz P av = 3.6 5V

JR.S00 54 How to Provide Scalability? Tree of Meshes Main question: How to populate/ parameterize the tree?

JR.S00 55 Hierarchical Interconnect Two regions of connectivity lengths Manhattan Distance Energy x Delay Mesh Binary Tree Hybrid architecture using both Mesh and Binary structures favored

JR.S00 56 Hybrid Architecture Revisited Straightforward combination of Mesh and Binary tree is not smart Short connections will be through the Mesh architecture The cheap connections on the Binary tree will be redundant

JR.S00 57 Inverse Clustering Blocks further away are connected at the lowest levels Inverse clustering complements Mesh Architecture Manhattan Distance Energy x Delay Mesh Binary Tree Mesh + Inverse

JR.S00 58 Hybrid Interconnect Architecture Levels of interconnect targeting different connectivity lengths Level0 Nearest Neighbor Level1 Mesh Interconnect Level2 Hierarchical

JR.S00 59 Prototype Array Size: 8x8 (2 x 4 LUT) Power Supply: 1.5V & 0.8V Configuration: Mapped as RAM Toggle Frequency: 125MHz Area: 3mm x 3mm Process: 0.25U ST

JR.S00 60 Programming the Configurable Platform LUT Mapping PlacementRouting Bitstream Generation Tech. Indep. Optimization Config. Data RTL

JR.S00 61 Starting Point RTL –t=A+B –Reg(t,C,clk); Logic –O i =A i  i  C i –C i+1 = A i B i  B i C i  A i C i

JR.S00 62 LUT Map

JR.S00 63 Placement Maximize locality –minimize number of wires in each channel –minimize length of wires –(but, cannot put everything close) Often start by partitioning/clustering State-of-the-art finish via simulated annealing

JR.S00 64 Place

JR.S00 65 Routing Often done in two passes –Global to determine channel –Detailed to determine actual wires and switches Difficulty is –limited channels –switchbox connectivity restrictions

JR.S00 66 Route

JR.S00 67 Summary Configurable Computing using “programming in space” versus “programming in time” for traditional instruction-set computers Key design choices –Computational units and their granularity –Interconnect Network –(Re)configuration time and frequency Next class: Some practical examples of reconfigurable computers