Reconfigurable Computing Nehir Sönmez 25-11-2004.

Reconfigurable Computing
Standard definition: "A reconfigurable computer is a device which computes by using post-fabrication spatial connections of compute elements." [DeHon]
– An FPGA implementation of a processor core that runs a program is excluded: it is not a spatial mapping of the problem.
– ASIC implementations are excluded: they are not programmable after fabrication.
– The definition restricts RC to mappings onto fine-grained devices (such as FPGAs).
– General-purpose computers, by contrast, compute by making connections in time.

What is Reconfigurable Computing?
Computation using hardware that can adapt at the logic level to solve specific problems.
Why is this interesting?
– Some applications are poorly suited to microprocessors.
– The VLSI "explosion" provides ever-increasing resources.
– Hardware/software.
– Relatively new research area.

Spatial Computation
Example: grade = 0.2 × mt1 + 0.2 × mt2 + 0.2 × mt3 + 0.4 × project;
– A hardware resource (multiplier or adder) is allocated for each operator in the compute graph.
– The abstract computation graph becomes the implementation template.
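
A minimal C sketch of the spatial mapping, assuming the example is the weighted sum grade = 0.2 × mt1 + 0.2 × mt2 + 0.2 × mt3 + 0.4 × project and that the three additions form a balanced tree (both are assumptions, as are the names Scores and grade_spatial). C still executes these statements one after another; what the sketch shows is the one-unit-per-operator structure: four multipliers and three adders, with every intermediate value on its own wire.

```c
/* Spatial sketch: each operator gets its own (hypothetical) hardware unit,
   and each named local variable stands for a wire between units. */
typedef struct {
    double mt1, mt2, mt3, project;
} Scores;

double grade_spatial(Scores s)
{
    /* Level 1: four multipliers, all working in parallel in hardware. */
    double p1 = 0.2 * s.mt1;
    double p2 = 0.2 * s.mt2;
    double p3 = 0.2 * s.mt3;
    double p4 = 0.4 * s.project;

    /* Level 2: two adders, also in parallel. */
    double a1 = p1 + p2;
    double a2 = p3 + p4;

    /* Level 3: one final adder produces the result. */
    return a1 + a2;
}
```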

Temporal Computation
– A hardware resource is time-multiplexed to implement the actions of the operators in the compute graph.
– Close to a sequential processor/software solution.
– Many in-between cases exist.
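
For contrast, a hypothetical temporal version of the same weighted sum (again assuming grade = 0.2 × mt1 + 0.2 × mt2 + 0.2 × mt3 + 0.4 × project): one ALU model, the function run, is reused over seven steps, one operator per step, with intermediate values parked in a small register file. The instruction format, register assignment and the names Instr, run and grade_temporal are illustrative assumptions, not taken from the slides.

```c
/* Temporal sketch: a single ALU is time-multiplexed across the operators. */
typedef enum { OP_MUL, OP_ADD } Op;

typedef struct {
    Op op;
    int dst, src1, src2;   /* register-file indices */
} Instr;

/* Execute a straight-line program; each iteration is one "cycle" in which
   the one ALU performs exactly one operation. */
void run(const Instr *prog, int n, double *reg)
{
    for (int pc = 0; pc < n; pc++) {
        const Instr *in = &prog[pc];
        double x = reg[in->src1];
        double y = reg[in->src2];
        reg[in->dst] = (in->op == OP_MUL) ? x * y : x + y;
    }
}

double grade_temporal(double mt1, double mt2, double mt3, double project)
{
    /* r0..r3 = inputs, r4 = 0.2, r5 = 0.4, r6/r7 = temporaries */
    double reg[8] = { mt1, mt2, mt3, project, 0.2, 0.4, 0.0, 0.0 };
    const Instr prog[] = {
        { OP_MUL, 6, 4, 0 },   /* r6 = 0.2 * mt1     */
        { OP_MUL, 7, 4, 1 },   /* r7 = 0.2 * mt2     */
        { OP_ADD, 6, 6, 7 },   /* r6 = r6 + r7       */
        { OP_MUL, 7, 4, 2 },   /* r7 = 0.2 * mt3     */
        { OP_ADD, 6, 6, 7 },   /* r6 = r6 + r7       */
        { OP_MUL, 7, 5, 3 },   /* r7 = 0.4 * project */
        { OP_ADD, 6, 6, 7 },   /* r6 = r6 + r7       */
    };
    run(prog, (int)(sizeof prog / sizeof prog[0]), reg);
    return reg[6];
}
```

Seven operators cost seven cycles of the single ALU here; a spatial version would instead spend seven units to finish in three levels of logic.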

Why is Custom Logic Faster Than Software?
Spatial vs. temporal computation
– Processors divide computation across time; dedicated logic divides it across space.

Why is Custom Logic Faster Than Software?
Specialization
– The instruction set may not provide the operations your program needs.
– Processors provide hardware that may not be useful in every program or in every cycle of a given program (multipliers, dividers).
Instruction memory
– Processors need lots of memory to hold the instructions that make up a program and to hold intermediate results.
Bit-width mismatches
– In general, processors have a fixed bit width, and all computations are performed on that many bits.
– Multimedia vector instructions (MMX) are a response to this.

Microprocessor-based Systems
– Generalized to perform many functions well.
– Operate on fixed data sizes.
– Inherently sequential.
[Figure: data storage (register file) feeding an ALU; operands A, B, C on a 64-bit datapath]

Reconfigurable Computing
– Create specialized hardware for each application.
– Functional units are optimized to perform a specific task, for example:
    if (A > B) { H = A; L = B; } else { H = B; L = A; }
[Figure: one functional unit with inputs A, B and outputs H, L]
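
The if/else above maps to one comparator whose single output bit drives two 2:1 multiplexers. A minimal C model of that unit follows; the 16-bit operand width and the names HighLow, mux2 and sort_two are assumptions. In hardware the width would be chosen to fit the data, and H and L appear together in the same cycle with no branch.

```c
#include <stdint.h>

/* Outputs of the hypothetical sort-two functional unit. */
typedef struct {
    uint16_t h;   /* larger input  */
    uint16_t l;   /* smaller input */
} HighLow;

/* 2:1 multiplexer: pass if_one when sel is 1, otherwise if_zero. */
uint16_t mux2(int sel, uint16_t if_one, uint16_t if_zero)
{
    return sel ? if_one : if_zero;
}

HighLow sort_two(uint16_t a, uint16_t b)
{
    int a_gt_b = a > b;              /* comparator output: one bit */
    HighLow out;
    out.h = mux2(a_gt_b, a, b);      /* H = A if A > B, else B */
    out.l = mux2(a_gt_b, b, a);      /* L = B if A > B, else A */
    return out;
}
```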

Dataflow
– A superscalar processor must find the dataflow graph at run time.
– RC constructs the dataflow graph at compile time:
  – no control-logic overhead;
  – no instruction-window size limitations.
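
One way to see what compile-time dataflow buys: the compiler can give every operator a fixed level (an as-soon-as-possible, or ASAP, schedule) once, before the circuit is built, instead of rediscovering dependences in a hardware instruction window on every run. ASAP scheduling is a standard technique swapped in here for illustration; the Node layout (at most two producers, -1 for a primary input) and the name asap_levels are hypothetical.

```c
#define MAX_PREDS 2

/* Hypothetical dataflow-graph node: indices of up to two producer nodes.
   A value of -1 means the operand comes from a primary input. */
typedef struct {
    int pred[MAX_PREDS];
} Node;

/* Assign each node an ASAP level, assuming nodes are listed in topological
   order (producers before consumers). Nodes fed only by primary inputs get
   level 0; every other node gets 1 + the highest level among its producers.
   Returns the number of levels, i.e. the critical path length in operators. */
int asap_levels(const Node *g, int n, int *level)
{
    int depth = 0;
    for (int i = 0; i < n; i++) {
        int lv = 0;
        for (int k = 0; k < MAX_PREDS; k++) {
            int p = g[i].pred[k];
            if (p >= 0 && level[p] + 1 > lv)
                lv = level[p] + 1;
        }
        level[i] = lv;
        if (lv + 1 > depth)
            depth = lv + 1;
    }
    return depth;
}
```

For a graph of four multipliers feeding two adders feeding one final adder, this yields three levels, which is also the pipeline depth a spatial implementation of that graph would need.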

Implementation Spectrum
– An ASIC gives high performance at the cost of inflexibility.
– A processor is very flexible but not tuned to the application.
– Reconfigurable hardware is a nice compromise.
[Figure: spectrum running from Microprocessor through Reconfigurable Hardware to ASIC]

Flexibility vs Data-Processing Rate

Field-Programmable Gate Array
– Each logic element (LE) outputs one data bit.
– The interconnect between elements is programmable.
– Interconnect tracks are grouped into channels.
[Figure: array of logic elements (LE) separated by routing tracks]

FPGA Architecture Issues
– Need to explore architectural issues.
– How much functionality should go in a logic element?
– How many routing tracks per channel?
– Switch "population"?

Real-World Physical Issues
– Modelling FPGA delay.
– Improving performance through buffering/segmentation.
– Technology dependence.
– The cost of reconfigurability: wires have a real cost.
[Figure: wire segments joined by switches (S)]

Translating a Design to an FPGA
– CAD to translate a circuit from a text description to a physical implementation is well understood.
– CAD to translate from a C program to a circuit is not well understood.
– It is very difficult for application designers to successfully write high-performance applications.
[Figure: the C statement C = A+B becomes an adder circuit with inputs A, B and output C, mapped onto the array]
Need for design automation!

High-Level Compilers
– It is difficult to estimate hardware resources.
– Some parts of a program are more appropriate for a processor (hardware/software codesign).
– The compiler must parallelize the computation across many resources.
– Engineers like to write in C rather than pushing little blocks around.
[Figure: C = A+B mapped to an adder with inputs A, B and output C; a loop for (i = 0; i < n; i++) { ... }]

Reconfigurable Hardware
– Each logic element operates on four one-bit inputs.
– The output is one data bit.
– It can perform any Boolean function of four inputs: $2^{2^4} = 2^{16} = 65{,}536$, i.e. 64K functions!
[Figure: logic element with inputs A, B, C, D and output Out, implemented as a table indexed by A B C D]
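
A 4-input LUT is just a 16-entry truth table addressed by its inputs. A minimal sketch, assuming bit i of a 16-bit configuration word holds the output for input index i = (D, C, B, A) read as a binary number; the bit ordering and the names lut4, lut4_program and and4 are illustrative assumptions.

```c
#include <stdint.h>

/* Evaluate a 4-input LUT: the configuration word is the truth table,
   and the inputs (d, c, b, a) form the index of the bit to read. */
int lut4(uint16_t config, int a, int b, int c, int d)
{
    unsigned idx = (unsigned)((d & 1) << 3 | (c & 1) << 2 | (b & 1) << 1 | (a & 1));
    return (config >> idx) & 1;
}

/* "Program" the LUT for any Boolean function of four inputs by evaluating
   the function on all 16 input combinations. Since every 16-bit word is a
   valid configuration, there are 2^16 = 65536 (64K) distinct functions. */
uint16_t lut4_program(int (*f)(int, int, int, int))
{
    uint16_t config = 0;
    for (unsigned i = 0; i < 16; i++) {
        if (f(i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1))
            config |= (uint16_t)(1u << i);
    }
    return config;
}

/* Example function: 4-input AND. */
int and4(int a, int b, int c, int d)
{
    return a & b & c & d;
}
```

For instance, the 4-input AND programs to 0x8000 (only the all-ones entry is set), and 4-input parity (XOR) corresponds to 0x6996.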

Basic Logic Block Architecture

Xilinx Spartan II Architecture
– IOBs provide the interface between the package pins and the internal logic.
– CLBs provide the functional elements for constructing most logic.
– Dedicated block RAM memories of 4096 bits each.
– Clock DLLs for clock-distribution delay compensation and clock domain control.
– Versatile multi-level interconnect structure.

Spartan II Configurable Logic Block
– LUT capacity is completely determined by the number of inputs, not by the complexity of the function.
– The basic block is a logic cell (LC), consisting of:
  – a 4-input function generator (LUT);
  – carry logic;
  – a storage element.
– Each CLB contains:
  – four LCs, organized in two similar slices;
  – logic that combines function generators to provide functions of five or six inputs.
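
One way to combine function generators is a 2:1 multiplexer selecting between two LUTs; in Spartan II slices this is the role of the F5 multiplexer. The sketch below is a hypothetical software model, not the device's exact circuitry. It relies on Shannon expansion on the fifth input e, f = e ? f(e=1) : f(e=0), so two 4-input cofactor tables plus one mux cover every 5-input function. The names lut4_eval, lut5_eval and lut5_split, and the bit ordering, are assumptions.

```c
#include <stdint.h>

/* 4-input LUT: bit i of cfg is the output for input index i (0..15). */
unsigned lut4_eval(uint16_t cfg, unsigned in4)
{
    return (cfg >> (in4 & 0xFu)) & 1u;
}

/* 5-input function built from two 4-LUTs and a 2:1 mux. The low four input
   bits feed both LUTs; the fifth input e drives the mux select. */
unsigned lut5_eval(uint16_t cfg_e0, uint16_t cfg_e1, unsigned in5)
{
    unsigned x = in5 & 0xFu;
    unsigned e = (in5 >> 4) & 1u;
    return e ? lut4_eval(cfg_e1, x) : lut4_eval(cfg_e0, x);
}

/* Split a 32-entry truth table into its two cofactors: entries 0..15
   (e = 0) go to the first LUT, entries 16..31 (e = 1) to the second. */
void lut5_split(uint32_t table, uint16_t *cfg_e0, uint16_t *cfg_e1)
{
    *cfg_e0 = (uint16_t)(table & 0xFFFFu);
    *cfg_e1 = (uint16_t)(table >> 16);
}
```

As an example, 5-input parity has the truth table 0x96696996, which splits into the cofactors 0x6996 (4-input parity) and 0x9669 (its complement).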

Spartan II CLB

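Example: Two-Bit Adder
– The two-bit adder D = A + B is made of full adders (FA), each with inputs A, B, C_i and outputs S, C_o.
– The logic synthesis tool reduces each full adder to sum-of-products (SOP) form:
$C_o = \overline{A}BC_i + A\overline{B}C_i + AB\overline{C_i} + ABC_i$
$S = \overline{A}\,\overline{B}C_i + \overline{A}B\overline{C_i} + A\overline{B}\,\overline{C_i} + ABC_i$
– Each output (C_o and S) is then mapped onto a LUT with inputs A, B, C_i.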
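
The LUT mapping can be checked in software. Each full-adder SOP expression becomes an 8-bit truth table for a 3-input LUT (C_o has minterms 3, 5, 6, 7 and S has minterms 1, 2, 4, 7, with inputs ordered A, B, C_i), and two full adders are chained so the carry ripples from bit 0 to bit 1. The bit ordering and the names lut3, add2, LUT_SUM and LUT_CARRY are assumptions for this sketch.

```c
#include <stdint.h>

/* 3-input LUT: bit i of cfg is the output for index i = (a << 2) | (b << 1) | ci. */
unsigned lut3(uint8_t cfg, unsigned a, unsigned b, unsigned ci)
{
    unsigned idx = ((a & 1u) << 2) | ((b & 1u) << 1) | (ci & 1u);
    return (cfg >> idx) & 1u;
}

/* Truth tables read off the SOP forms. */
#define LUT_CARRY 0xE8u   /* minterms 3, 5, 6, 7 -> 1110 1000 */
#define LUT_SUM   0x96u   /* minterms 1, 2, 4, 7 -> 1001 0110 */

/* Two-bit ripple adder D = A + B: one full adder per bit, each mapped onto
   two 3-input LUTs (sum and carry). Returns a 3-bit result whose bit 2 is
   the final carry-out. */
unsigned add2(unsigned a, unsigned b)
{
    unsigned carry = 0, d = 0;
    for (unsigned i = 0; i < 2; i++) {
        unsigned ai = (a >> i) & 1u;
        unsigned bi = (b >> i) & 1u;
        d |= lut3(LUT_SUM, ai, bi, carry) << i;
        carry = lut3(LUT_CARRY, ai, bi, carry);
    }
    return d | (carry << 2);
}
```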

Circuit Compilation
1. Technology mapping: map the logic onto LUTs.
2. Placement: assign each logical LUT to a physical location.
3. Routing: select wire segments and switches for interconnection.
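
Placement tools need a cost to compare candidate placements. The slide does not name one; a common choice, swapped in here for illustration, is half-perimeter wirelength (HPWL): for each net, half the perimeter of the bounding box around the LUT sites it connects. Site and net_hpwl are hypothetical names, and a real placer would sum this over all nets and weigh it against timing.

```c
/* Hypothetical placement record: the physical (x, y) site of a logical LUT. */
typedef struct {
    int x, y;
} Site;

/* HPWL of one net. 'pins' lists the indices (into 'place') of the LUTs
   the net connects; the result is the bounding box's width + height. */
int net_hpwl(const Site *place, const int *pins, int npins)
{
    if (npins <= 0)
        return 0;
    int xmin = place[pins[0]].x, xmax = xmin;
    int ymin = place[pins[0]].y, ymax = ymin;
    for (int k = 1; k < npins; k++) {
        const Site *s = &place[pins[k]];
        if (s->x < xmin) xmin = s->x;
        if (s->x > xmax) xmax = s->x;
        if (s->y < ymin) ymin = s->y;
        if (s->y > ymax) ymax = s->y;
    }
    return (xmax - xmin) + (ymax - ymin);
}
```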

Processor + FPGA
Three possibilities:
1. The FPGA serves as a coprocessor for data-intensive applications.
[Figure: processor chip and FPGA chip on a daughtercard, connected over a backplane bus (e.g. PCI)]
2. The FPGA serves as an embedded computer for low-latency transfer.
[Figure: FPGA placed next to the processor (Proc)]
"Reconfigurable Functional Unit"

Processor + FPGA (cont.)
3. Processor integration:
– FPGA logic is embedded inside the processor.
[Figure: processor containing a register file (RF), an ALU and FPGA logic]
– There are a number of problems with options 2 and 3:
  – Process technology is an issue.
  – The ALU is generally much faster than the FPGA.
  – FPGA much faster than the entire processor.

Multi-FPGA Systems
– Most applications don't fit on one device.
– This creates the need to partition designs across many devices.
– Effectively a "netlist computer": each FPGA is a logic processor, interconnected in a given topology.
[Figure: 3 × 3 array of FPGAs (F)]
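
Partitioning across devices is usually judged by how many nets cross device boundaries, since each crossing consumes scarce I/O pins. A minimal sketch that counts cut nets, assuming two-terminal nets and a precomputed block-to-FPGA assignment (Net, part and cut_size are hypothetical names); real partitioners also handle multi-terminal nets and per-device capacity limits.

```c
/* Hypothetical two-terminal net between logic blocks u and v. */
typedef struct {
    int u, v;
} Net;

/* Count nets whose two blocks are assigned to different FPGAs.
   part[b] is the index of the device that holds block b. */
int cut_size(const Net *nets, int nnets, const int *part)
{
    int cut = 0;
    for (int i = 0; i < nnets; i++) {
        if (part[nets[i].u] != part[nets[i].v])
            cut++;
    }
    return cut;
}
```

For a chain of four blocks, splitting it in the middle cuts one net, while alternating blocks between two devices cuts all three.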

Xilinx XC4000 Cell
– two 4-input lookup tables
– one 3-input lookup table
– two D flip-flops

Altera Flex10K

Xilinx Virtex CLB

Reconfiguration
Reconfiguration methodologies:
– Static
– Partially static (= partial reconfiguration)
– Dynamic

The Design Process
1. Partition a program into sections to be implemented in hardware and in software.
2. Synthesize the computations destined for reconfigurable hardware into a gate-level or circuit-level description.
3. Map the circuit onto reconfigurable blocks and connect them using reconfigurable routing.
4. After compilation, the circuit is ready to be configured onto the hardware at run time.

RC Objectives
RC objectives: specialization, performance, flexibility.
Basic idea: "programmable hardware"
– Specialization
– Performance
– Power consumption
– Flexibility
– Programming

Routing Strategies
[Figure: blocks A, B, C connected by continuous routing vs. structured routing]

Xilinx XC4000 Routing

Reconfigurable Instruction Set Processors
By including reconfigurability, we can increase flexibility while keeping high specialization.
[Figure: processor and PLD combined into a reconfigurable processor]

Coprocessor-Based Approach vs. ASIP-Based Approach
[Figure: in the coprocessor-based approach, tasks 1 ... K run in software and tasks K+1 ... N in hardware; in the ASIP-based approach, tasks 1, 2, ... N each use both software and hardware]

Coprocessor-Based Approach (I)
Typical example: CPU + PCI board
– Altera ARC-PCI
– Compaq Pamette
System on Chip (SoC)
– Altera's Excalibur device
– Chameleon Systems, Inc.

Coprocessor-Based Approach (II): Altera ARC-PCI

Coprocessor-Based Approach (III): Compaq Pamette

Coprocessor-Based Approach (IV): Altera's Excalibur Device
– Embedded processor: ARM, MIPS or NIOS

Coprocessor-Based Approach (V): Chameleon Systems, Inc.

ASIP-Based Approach (I)
Reconfigurable unit within the CPU
[Figure: fetch, decode and issue stages feeding an integer unit, FP unit, branch unit, LD/ST unit and a reconfigurable unit]

ASIP-Based Approach (II)
Challenge: CAD tools
[Figure: C code enters a compiler, which produces assembly code and an instruction description (configuration bits)]

ASIP-Based Approach (III): Compiler Structure
[Figure: C code passes through C parsing, optimizations, instruction identification, instruction selection, configuration scheduling and code generation to produce assembly code; hardware generation, guided by a hardware estimator, produces the configuration bits]

ASIP-Based Approach (IV)
Example: Philips ConCISe Architecture
[Figure: encoded instruction word and register file feeding an ALU and a reconfigurable functional unit (RFU), with results selected by a MUX]

Why Compute With FPGAs?
– There is a huge performance gap between software and hand-designed hardware systems: often a 100-to-1 ratio in performance or performance/area.
– Hardware systems are not so good for general computing:
  – big design and cost barriers to implementation;
  – it is not practical to buy a new machine every time you want to run a different program.
– Reconfigurable systems offer the best of both worlds:
  – run-time programmability;
  – hardware-level performance.

Good Applications for Reconfigurable Computing
Relatively small application graph
– FPGAs have limited capacity.
– Simple control flow helps a lot.
Data parallelism
– Execute the same computations on many independent data elements.
– Pipeline computations through the hardware.
Small and/or varying bit widths
– Take advantage of the ability to customize the size of operators.

Reconfigurable Computing Successes
RSA decryption
– A Programmable Active Memory (PAM) machine set a record for decryption of RSA-encrypted data.
DNA sequence matching
– Reconfigurable hardware has achieved 100x better performance than contemporary supercomputers.
Signal processing
– FPGA-based filters often get 10x better performance than DSP chips.
– They benefit from customization of the hardware to the application.
Emulation
– Use reconfigurable logic to simulate new processors at high speeds.
Cryptographic attacks
– High-performance, low-cost implementations for breaking encryption algorithms.

FPGAs vs CPUs
– Capacity: instructions are a very dense representation; logic blocks aren't.
– Tools: compilers for reconfigurable logic aren't very good.
– Some operations are hard to implement on FPGAs.
One approach to capacity is to exploit the 90/10 rule of software:
– Run the 90% of code that takes 10% of execution time on a conventional processor.
– Run the 10% of code that takes 90% of execution time on reconfigurable logic.
This leads to programmable-reconfigurable processors.
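
How much this split can pay off is bounded by Amdahl's law (not named on the slide; added here as a standard tool): if a fraction f of the execution time is accelerated by a factor s and the rest is unchanged, the overall speedup is 1 / ((1 - f) + f / s). A one-line sketch (the function name is hypothetical):

```c
/* Amdahl's law: overall speedup when a fraction f (0..1) of the original
   execution time is accelerated by a factor s and the rest is unchanged. */
double amdahl_speedup(double f, double s)
{
    return 1.0 / ((1.0 - f) + f / s);
}
```

With f = 0.9 and s = 10 this gives 1 / 0.19, about 5.3; even infinitely fast reconfigurable logic caps the overall speedup at 10, because the remaining 10% of execution time still runs on the processor.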

Fine-Grained System: CHIMERAE
Treat the reconfigurable array as an ALU within a superscalar processor:
– The array implements some number of custom instructions for each program.
– The register file is the interface between the programmable (processor) and reconfigurable parts.

CHIMERAE
Programmed in C
– Instruction combining
– Control localization
– SIMD Within a Register
Simulation studies
– Example applications only require 8 RFUOPs in the reconfigurable array.
– Equivalent to 32 rows in the RFU.
Performance results
– Vary strongly from application to application.
– Also dependent on the model used for RFU delay.
– Average speedup of 20-30%; one application sees a >2x improvement.
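
SIMD Within a Register (SWAR) packs several narrow values into one wide word and operates on all of them at once. The classic packed-byte addition below is a generic illustration of the technique, not CHIMERAE's actual compiler output; the name swar_add_u8 is hypothetical. Masking off each lane's top bit keeps carries from crossing lane boundaries, and an XOR then restores the top bits.

```c
#include <stdint.h>

/* Add eight independent 8-bit lanes packed into one 64-bit word.
   Each lane wraps modulo 256; no carry leaks into the neighbouring lane. */
uint64_t swar_add_u8(uint64_t x, uint64_t y)
{
    const uint64_t H = 0x8080808080808080ULL;   /* top bit of every lane */

    /* Add the low 7 bits of each lane: at most 0x7F + 0x7F = 0xFE,
       so the sum never carries out of its own 8-bit lane. */
    uint64_t low = (x & ~H) + (y & ~H);

    /* Top bit of each lane = x7 XOR y7 XOR (carry into bit 7);
       the carry out of bit 7 is discarded, giving wrap-around. */
    return low ^ ((x ^ y) & H);
}
```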

Coarse-Grained System: Garp
– Small programmable processor with a large reconfigurable array.
– Interface through the memory system.

Garp
Again, programmed in C
– The compiler attempts to map loop nests onto the reconfigurable array.
Data Encryption Standard
– Estimated 24x speedup over an UltraSPARC.
Image dithering
– 9x speedup.
Sorting
– 2x speedup.

Advantages of RC
– Relative to microprocessors, reconfigurable devices on average achieve a higher percentage of their peak (raw) computational density.
– Fine-grain flexibility allows problem-specific parallelism to be exploited at many levels, and many different computation models (or patterns) can be supported.
– In general, it is possible to match the hardware to problem characteristics through problem-specific architectures and low-level circuit specialization.
– Spatial mapping of computation, versus multiplexing of function units (as in processors), relieves pressure on memory capacity and bandwidth and favors low-latency, local communication patterns.
– Modern FPGAs make good system-level components:
  – a relatively large number of I/Os (many parallel memory ports);
  – high-bandwidth communications;
  – built-in microprocessors.
– Machines based on these components can easily scale peak performance by riding Moore's curve (FPGAs are process drivers).
– Low-level redundancy permits fault tolerance and great cost savings.
Is there still room for research in novel devices for RC?

Advantages of RC
Even in an application with fixed algorithms, reconfigurable devices may offer advantages over a full-custom or ASIC approach:
– FPGAs are process drivers and therefore a generation ahead of ASICs.
– Increasing NRE costs for ASIC and full-custom designs have pushed the "cross-over" point far out.
– Time-to-market advantage.
– Programmability leads to project risk management and extended product lifetimes.
Dynamic reconfiguration might permit even higher efficiency through hardware sharing (multiplexing) and on-the-fly circuit specialization.
– Largely unexploited (unproven) to date.
– A few research projects have explored this idea.

RC Disadvantages
– Reconfiguration time might be critical in run-time reconfigurable systems.
– Low utilization of hardware resources in configurable systems.
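
Whether run-time reconfiguration pays off can be estimated with a simple break-even count: if loading a configuration takes t_reconf and each use then runs in t_hw instead of t_sw, the configuration must be used at least t_reconf / (t_sw - t_hw) times. A sketch of that arithmetic, assuming all other overheads such as data transfer can be ignored (an assumption; the name reconfig_break_even is hypothetical):

```c
/* Smallest number of invocations n for which n * (t_sw - t_hw) >= t_reconf,
   i.e. the point where the time saved covers the reconfiguration time.
   All times must use the same unit. Returns -1 if hardware is not faster. */
long reconfig_break_even(double t_reconf, double t_sw, double t_hw)
{
    double saving = t_sw - t_hw;
    if (saving <= 0.0)
        return -1;
    long n = (long)(t_reconf / saving);   /* truncated quotient */
    if ((double)n * saving < t_reconf)    /* round up if it falls short */
        n++;
    return n;
}
```

For example, a 1000 µs reconfiguration that saves 8 µs per call breaks even after 125 calls; if it saves only 7 µs per call, 143 calls are needed.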

FPGAs are Reconfigurable
1. Commercial applications have not taken advantage of reconfigurability.
– Xilinx/Altera haven't done much to help.
– Methodologies/tools are nearly nonexistent.
2. Volume/cost graphs don't accurately capture the potential real costs and other advantages.
Reconfiguration uses:
– Field upgrades: product life extension, changing requirements.
– In-system board-level testing and field diagnostics.
– Tolerance to manufacturing faults.
– Risk management in system development.
– Runtime reconfiguration for higher silicon efficiency:
  – time-multiplexed pre-designed circuits make maximum use of resources;
  – runtime specialized circuit generation.

Silicon Usage

– Performance: ~10x speedup
– Efficiency: ~10x
– Lower chip costs: ~0.5x (increased yield)
– Decreased complexity
– Decreased design cost