Reconfigurable HPC, part 4: miscellaneous. Reiner Hartenstein, TU Kaiserslautern. May 14, 2004, TU Tallinn, Estonia.

Presentation transcript:

Reconfigurable HPC, part 4: miscellaneous. Reiner Hartenstein, TU Kaiserslautern. May 14, 2004, TU Tallinn, Estonia.

© 2004, TU Kaiserslautern 2 Time to Market: A Fundamental Paradigm Shift in Silicon Application. [Figure: revenue per month plotted over time in months, comparing an ASIC product with a reconfigurable product that is updated by download (Update 1, Update 2). Source: Tom Kean]

© 2004, TU Kaiserslautern 3 Makimoto’s 3rd wave. Reconfigurability: the next revolution. EDA industry paradigm switching every 7 years [Keutzer / Newton]: 1978 transistor entry (Applicon, Calma, CV); schematics entry (Daisy, Mentor, Valid, ...); synthesis (Cadence, Synopsys); 1999 (co-)compilation & data-stream-based (r)DPAs [Hartenstein]; 2006 paradigm shift. Curve labels: Mainstream, Tornado, McKinsey curve [Richard Newton]. 82% of designers hate their tools [Keutzer / Newton].

© 2004, TU Kaiserslautern 4 Software to Configware Migration. This talk illustrates the performance benefit that may be obtained from Reconfigurable Computing (RC). Because it stresses the coarse-grain RC point of view, it hardly mentions FPGAs (but coarse-grain arrays can always be mapped onto FPGAs). Software to Configware migration is the most important source of speed-up. Hardware is just frozen Configware.

© 2004, TU Kaiserslautern 5 rGA-based: directly delivered to the customer, completely configured; omit emulation; avoiding specific silicon. [Figure: number of rGA-based design starts. Source: N. Tredennick, Gilder Technology Report, 2003]

© 2004, TU Kaiserslautern 6 Mega-rGAs. [Figure: system gates per rGA chip by year, with data points XC 4085XL, Virtex, XC 40250XV, Virtex II, and planned devices. Source: Xilinx data]

© 2002, University of Kaiserslautern TU Kaiserslautern 7 Embedded hardw. CPU & memory cores on chip. HLL Compiler CPU core FPGA core Memory core HLL Compiler [à la S. Guccione]

© 2004, TU Kaiserslautern 8 Xilinx Virtex-II Pro FPGA architecture: FPGA fabric based on the Virtex-II architecture, with PowerPC 405 RISC CPU (PPC405) cores, on-chip memory controller, embedded RAM and Rocket IO. Entire system on a single chip: all you need on board. Source: Ivo Bolsens, Xilinx

© 2004, TU Kaiserslautern 9 What About PLD Cores on ASICs? (Embedded FPGA fabric.) What’s Wrong with This Picture? 1. Still have to make the chip. 2. Need two sets of software to build it: the ASIC flow and the PLD flow. 3. Have no idea what to connect the PLD pins to: chances are, you are going to get it wrong! [Jonathan Rose]

© 2004, TU Kaiserslautern 10 What’s Right with This Picture! (Embedded CPU, serial link, analog, “etc.”) 1. Pre-fabricated. 2. One CAD tool flow! 3. Can connect anything to anything: PLDs are built for general connectivity. [Jonathan Rose]

© 2004, TU Kaiserslautern 11 >> rGAs << rGAs Placement & Routing Soft Processors History of Frameworks RTR Support by rGA vendors EDA Future directions conclusions

© 2004, TU Kaiserslautern 12 Different Morphware-Platforms: Reconfigurable Logic Blocks Reconfigurable Interconnect Blocks Reconfigurable Datapath Arrays fine grain reconfigurable coarse grain reconfigurable Reconfigurable interconnect fabrics

© 2004, TU Kaiserslautern 13 Interconnect fabrics: rGA with island architecture (detail). Labels: switch, switch box, connect box, reconfigurable logic block.

© 2004, TU Kaiserslautern 14 Switch box. Labels: switch point, switch box.

© 2004, TU Kaiserslautern 15 Connect box. Label: connect point.

© 2004, TU Kaiserslautern 16 Connect point (enlarged): connect point activated.

© 2004, TU Kaiserslautern 17 Switch boxes activated: 3 switch points, then the 4th switch point and the 5th switch point. Labels: switch point, switch box.

© 2004, TU Kaiserslautern 18 Result.

© 2004, TU Kaiserslautern 19 Routing completed for 1 net (from A to B). 1979: Silva Lisco (Silicon Valley Research Corp.) offers CALM-P. 20 transistors + 20 flip-flops.

© 2004, TU Kaiserslautern 20 >> Placement & Routing << rGAs Placement & Routing Soft Processors History of Frameworks RTR Support by rGA vendors EDA Future directions conclusions

© 2004, TU Kaiserslautern 21 Routing: long-distance net from A to B, passing through. A path may be used for only one signal at a time. Bridges of Königsberg.

© 2004, TU Kaiserslautern 22 Routing congestion: C cannot be connected with D. C and D are not reachable; C and D need another placement. rLBs are not 100% usable. Labels: A, B, C, D.

© 2004, TU Kaiserslautern 23 Leonhard Euler: Euler’s problem of the bridges (1736). Königsberg is such a network. Find a way which crosses each bridge exactly once. Also an optimization: none of the bridges is unused.

© 2004, TU Kaiserslautern 24 L. Euler: Solutio problematis ad geometriam situs pertinentis; Commentarii Academiae Scientiarum Imperialis Petropolitanae 8 (1736), pp [Figure: graph with edges and nodes; the nodes are Left Bank, Right Bank, Kneiphof Island, Other Island]
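
Euler's criterion can be checked mechanically. The following is a minimal sketch, not part of the original slides: it models the four land masses named on the slide as nodes and the seven bridges as edges (the bridge layout is the commonly cited one and is an assumption here, since the slide's figure is not reproduced), and tests whether a walk crossing every bridge exactly once can exist. The function names are hypothetical.

```python
from collections import Counter

# The seven bridges as edges of a multigraph; node names follow the slide labels.
# Assumption: the commonly cited layout (two bridges from each bank to Kneiphof).
BRIDGES = [
    ("Left Bank", "Kneiphof Island"),
    ("Left Bank", "Kneiphof Island"),
    ("Right Bank", "Kneiphof Island"),
    ("Right Bank", "Kneiphof Island"),
    ("Left Bank", "Other Island"),
    ("Right Bank", "Other Island"),
    ("Kneiphof Island", "Other Island"),
]


def node_degrees(edges):
    """Count how many bridge ends meet at each node."""
    degrees = Counter()
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return degrees


def has_euler_walk(edges):
    """Euler's criterion for a connected multigraph: a walk using every edge
    exactly once exists only if 0 or 2 nodes have odd degree.
    Connectivity is assumed, not checked."""
    odd = [node for node, d in node_degrees(edges).items() if d % 2 == 1]
    return len(odd) in (0, 2)
```

With the layout above every node has odd degree (Kneiphof Island 5, the others 3), so `has_euler_walk(BRIDGES)` is false. In routing terms, not every set of connections can be realized on a given set of paths.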

© 2004, TU Kaiserslautern 25 Crossbar switch. 1915: patent granted for J. N. Reynolds’s crossbar switch. 1919: Betulander’s crossbar switch. 1926: first public telephone switching application, in Sweden. 1964: NASA telemetrics crossbar array.

© 2004, TU Kaiserslautern 26 Crossbar complete? One bar connects 2 pins. Size of a full (complete) switch: n x n / 2. [Figure: number of crossbar chips needed versus n, for full and partial crossbars; n x n/2n crossbar chips in a row.] Crossbar chips are available from Aptix, Texas Instruments and others.

© 2004, TU Kaiserslautern 27 Routing congestion: example with detour. Direct connection impossible: the rGA routes a detour connection through an rLB with the identity function configured. Routing resources: logic gates and/or pass transistors.
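
The routing situations on slides 21, 22 and 27 (one signal per path, congestion, detour) can be illustrated with a small maze router. This is a sketch under simplifying assumptions, not the algorithm of any vendor's place-and-route tool: the routing fabric is reduced to a hypothetical grid of cells, where "." marks a free routing resource and "#" a used one, and the function names are made up for this example.

```python
from collections import deque


def route(grid, start, goal):
    """Breadth-first (Lee-style) search over free cells.
    Returns a shortest path as a list of (row, col) cells, or None."""
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == "." and nxt not in prev:
                prev[nxt] = cell
                queue.append(nxt)
    return None


def occupy(grid, path):
    """Mark the cells of a routed net as used: one signal per path."""
    for r, c in path:
        grid[r][c] = "#"


grid = [list(".....") for _ in range(4)]

net1 = route(grid, (0, 2), (2, 2))  # straight vertical wire, 3 cells
occupy(grid, net1)

net2 = route(grid, (1, 0), (1, 4))  # direct row is blocked: detour via row 3
occupy(grid, net2)

net3 = route(grid, (0, 0), (0, 4))  # column 2 is now fully used: congestion
```

Here `net2` needs 9 cells instead of the 5 a direct connection would take, and `net3` cannot be routed at all (`None`), which corresponds to the case where blocks need another placement.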

© 2004, TU Kaiserslautern 28 Crossbar-based architectures. 1990: UC Berkeley (Jan Rabaey), 16 bit. 1993: PADDI-II (Jan Rabaey). 1997: Pleiades (mesh & crossbar), 32 bit.

© 2004, TU Kaiserslautern 29 PADDI-II Architecture

© 2004, TU Kaiserslautern 30 >> Soft Processors << rGAs Placement & Routing Soft Processors History of Frameworks RTR Support by rGA vendors EDA Future directions conclusions

© 2004, TU Kaiserslautern 31 FPGA CPUs in teaching and academic research: UCSC (1990!); Mälardalen University, Eskilstuna, Sweden; Chalmers University, Göteborg, Sweden; Cornell University; Gray Research; Georgia Tech; Hiroshima City University, Japan; Michigan State; Universidad de Valladolid, Spain; Virginia Tech; Washington University, St. Louis; New Mexico Tech; UC Riverside; Tokai University, Japan.

© 2004, TU Kaiserslautern 32 Some soft CPU core examples
core | architecture | platform
MicroBlaze (125 MHz, 70 D-MIPS) | 32-bit standard RISC, 32 reg. by 32 LUT RAM-based reg. | Xilinx, up to 100 on one FPGA
Nios | 16-bit instr. set | Altera Mercury
Nios (50 MHz) | 32-bit instr. set | Altera, 22 D-MIPS
Nios | 8 bit | Altera Mercury
gr1040 | bit |
gr | 32 bit |
My80 | i8080A | FLEX10K30 or EPF6016
DSPuva16 | 16-bit DSP | Spartan-II
Leon (25 MHz) | SPARC |
ARM7 clone | ARM |
uP | bit CISC, 32 reg. | 200 XC4000E CLBs
REGIS | 8 bits instr. + ext. ROM | 2 Xilinx 3020 LCA
Reliance-1 | 12-bit DSP | Lattice: 4 isp30256, 4 isp1016
1Popcorn-1 | 8-bit CISC | Altera, Lattice, Xilinx
Acorn-1 | | 1 Flex 10K20
YARD-1A | 16-bit RISC, 2 opd. instr. | old Xilinx FPGA board
xr16 | RISC integer C | SpartanXL

© 2004, TU Kaiserslautern 33 Some soft CPU core examples (the same core / architecture / platform table as on slide 32). Configware! (not hardware). Retro-emulation.

© 2004, TU Kaiserslautern 34 It’s a Paradigm Shift! Using FPGAs (fine-grain reconfigurable) has mainly been just classical logic synthesis on a “strange hardware” platform. Coarse-grain reconfigurable arrays (rDPAs), i.e. Reconfigurable Computing, however, mean a really fundamental paradigm shift. This is still ignored by CS and EE curricula and almost all R&D scenes.

© 2004, TU Kaiserslautern 35 Why the speed-up, although the FPGA clock is slower by a factor of 3 or even more? (Most know-how comes from the “high level synthesis” discipline.) Moving the operator to the data stream (before run time); support operations need neither clock nor memory cycles; decisions without memory cycles or clock cycles; most “data fetch” without a memory cycle.

© 2004, TU Kaiserslautern 36 >> History of Frameworks << rGAs Placement & Routing Soft Processors History of Frameworks RTR Support by rGA vendors EDA Future directions conclusions

© 2002, University of Kaiserslautern TU Kaiserslautern 37 Goal: away from complex design flow Place and Route Netlist Schematics/ HDL Netlister Bitstream Compiler HLL [à la S. Guccione]

© 2002, University of Kaiserslautern TU Kaiserslautern 38 Overcome traditional separate design flow User Code Compiler Executable Netlister Netlist Place and Route. Bitstream Schematics/ HDL HLL Compiler [à la S. Guccione]

© 2002, University of Kaiserslautern TU Kaiserslautern 39 Overcome traditional co-processing design separate flow -> JBits Design Flow User Java Code Java Compiler JBits API Executable User Code Compiler Executable Netlister Netlist Place and Route. Bitstream Schematics/ HDL [à la S. Guccione]

© 2004, TU Kaiserslautern 40 New directions in application development. Automatic partitioning compilers for designer productivity, like CoDe-X (Jürgen Becker, Univ. of Karlsruhe). Support for Run-Time Reconfiguration (RTR), a key enabler of error handling and fault correction by partially re-routing the FPGA at run time, as well as of remote patching for upgrading, remote debugging, and remote repair by reconfiguration, even over the internet.

© 2004, TU Kaiserslautern 41 rGAs Placement & Routing Soft Processors History of Frameworks RTR Support by rGA vendors EDA Future directions conclusions >> RTR <<

© 2002, University of Kaiserslautern TU Kaiserslautern 42 CPU use for configuration management. An on-board microprocessor CPU is available anyhow, even along with a little RTOS: use this CPU for configuration management. [Figure: HLL, Compiler, RTR System Design]

© 2002, University of Kaiserslautern TU Kaiserslautern 43 hard CPU & memory core on same chip CPU core FPGA core Memory core Compiler HLL Compiler HLL RTR System Design

© 2002, University of Kaiserslautern TU Kaiserslautern 44 Converging factors for RTR. Converging factors make RTR-based system design viable: 1) Million-gate FPGA devices and co-processing with standard microprocessors are commonplace, allowing direct implementation of complex algorithms in FPGAs. This alone has already revolutionized FPGA design. 2) New tools like the Xilinx JBits software tool suite directly support co-processing and RTR. [Figure: User Java Code, Java Compiler, JBits API, Executable]

© 2004, TU Kaiserslautern 45 RTR divides an application into a series of sequentially executed stages, each mapped as a separate execution module. Excellent example: the Xtreme platform by PACT AG, Munich. Without RTR, all configurable platforms are just ASIC emulators. Environments that directly support development and debugging of RTR applications will also heavily influence the future system organization.

© 2004, TU Kaiserslautern 46 rGAs Placement & Routing Soft Processors History of Frameworks RTR Support by rGA vendors EDA Future directions conclusions >> Support by rGA vendors <<

© 2004, TU Kaiserslautern 47 >> Support by FPGA vendors << Xilinx: software by Xilinx, configware (soft IP cores), hardware. Altera: software, configware, hardware.

© 2004, TU Kaiserslautern 48 Xilinx: fabless FPGA semiconductor vendor, San Jose, CA, founded 1984. Key patents on FPGAs (expiring in a few years). Fortune 2001: no. 14 best company to work for (Intel: no. 42, HP: no. 64, TI: no. 65). DARPA grant (Nov ’99) to develop JBits API tools for internet-reconfigurable / upgradable logic (with VT). Less brilliant in the early/mid 90s (president Curt Wozniak): 1995 market share down from 84% to 62% [Dataquest]. As designs got larger, Xilinx lost its advantage (bug fixes did not require burning new chips); meanwhile, weeks of expensive debug time are needed.

© 2004, TU Kaiserslautern 49 Software by Xilinx. Full design flow from Cadence, Mentor, and Synopsys. Xilinx Software AllianceEDA Program: Alliance Series Development System; Foundation Series Development Systems; Xilinx Foundation Series ISE (Integrated Synthesis Environment); free WebPOWERED software with WebFitter & WebPACK-ISE; StateCAD XE and HDL Bencher; Foundation Base Express; Foundation ISE Base Express. More: ModelSim Xilinx Edition (ModelSim XE), Forge Compiler, Modular Design, Chipscope ILA, the Xilinx System Generator, XPower, JBits SDK, the Xilinx XtremeDSP Initiative, the MathWorks / Xilinx alliance, the Wind River / Xilinx alliance.

© 2004, TU Kaiserslautern 50 Configware (soft IP products). For libraries, creation and reuse of configware. To search for IPs, see: List of all available IP. The AllianceCORE program is a cooperation between Xilinx and third-party core developers. The Xilinx Reference Design Alliance Program. The Xilinx University Program. LogiCORE soft IP, with the LogiCORE PCI interface. Consultants.

© 2004, TU Kaiserslautern 51 Xilinx hardware. Virtex, Virtex-II: first with 1 million system gates. Virtex-E series: > 3 million system gates. Virtex-EM: on a copper process, with additional on-chip memory for network switch applications. The Virtex XCV3200E: > 3 million gates, 0.15-micron technology. Spartan, Spartan-XL, Spartan-II: for low-cost, high-volume applications as ASIC replacements; multiple I/O standards, on-chip block RAM, digital delay-locked loops; eliminate phase-locked loops, FIFOs, I/O translators, system bus drivers. XC4000XV, XC4000XL/XLA, CPLD: low-cost families; rapid development, longer system life, robust field upgradability; support In-System Programming (ISP), in-board debugging, test during manufacturing, field upgrades, full JTAG-compliant interface. CoolRunner: low power, high speed/density, standby mode. Military & Aerospace: QPRO high-reliability, QML certified. Configuration storage devices.

© 2004, TU Kaiserslautern 52 Altera: founded in June 1983. EDA: synthesis, place & route, and verification. Quartus II: APEX, Excalibur, Mercury, FLEX 6000 families. MAX+PLUS II: FLEX, ACEX & MAX families. Flow with Quartus II: Mentor Graphics, Synopsys and Synplicity deliver design software to support Altera SOPC solutions. Mentor: the only EDA vendor with a complete design environment for APEX II, incl. IP, design capture, simulation, synthesis, and h/s co-verification. Configware: Altera offers over a hundred IP cores. Third-party IP core design services and consultants.

© 2004, TU Kaiserslautern 53 Altera hardware. Newer families: APEX 20KE, APEX 20KC, APEX II, MAX 7000B, ACEX 1K, Excalibur, Mercury. Apex EP20K1500E (0.18 µ): up to 2.4 million system gates. APEX II (all-copper 0.13 µ): for data path applications, supports many I/O standards, 1-Gbps True-LVDS performance. wQ2001, an ARM-based Excalibur device. Altera mainstream: MAX 7000A, 3000A; FLEX 6000, 10KA, 10KE; APEX 20K families. Mature and other: Classic, MAX 7000, 7000S, 9000; FLEX 8000, 10K families.

© 2004, TU Kaiserslautern 54 rGAs Placement & Routing Soft Processors History of Frameworks RTR Support by rGA vendors EDA Future directions conclusions >> EDA <<

© 2004, TU Kaiserslautern 55 >> EDA << EDA as the Key Enabler (major EDA vendors) Altera Cadence Mentor Graphics Synopsys Xilinx Changing EDA Tools Market

© 2004, TU Kaiserslautern 56 EDA as the Key Enabler (major EDA vendors). Select by EDA quality / productivity, not by FPGA architecture. EDA often has massive software quality problems. Customer: the highest priority is an EDA center of excellence: collecting EDA expertise and EDA user experience; to assemble the best possible tool environments; for optimum support of design teams; to cope with interoperability problems; to keep track of the EDA scene as a rapidly moving target. Being fabless, FPGA vendors spend their most qualified manpower on development of EDA, IP cores, applications, and support. Xilinx and Altera are morphing into EDA companies.

© 2004, TU Kaiserslautern 57 Cadence. FPGA Designer: top-down FPGA design system with high-level mapping, architecture-specific optimization, and Verilog, VHDL and schematic-level design entry. Verilog and VHDL to Synergy (logic synthesis) and FPGA Designer. FPGAs are simulated by themselves using Cadence's Verilog-XL or Leapfrog VHDL simulators, and simulated with the rest of the system design using the Logic Workbench board/system verification environment. Libraries for the leading FPGA manufacturers.

© 2004, TU Kaiserslautern 58 Mentor Graphics. System design and verification; PCB design and analysis; IC design and verification. Shifts the ASIC design flow to FPGAs (Altera, Xilinx): by FPGA Advantage with IP support, by ModuleWare, and by Xilinx CORE Generator and Altera MegaWizard integration.

© 2004, TU Kaiserslautern 59 Synopsys. FPGA Compiler II: a version of the ASIC Design Compiler Ultra. Block Level Incremental Synthesis (BLIS). ASIC / FPGA migration. Actel, Altera, Atmel, Cypress, Lattice, Lucent, Quicklogic, Triscend, Xilinx.

© 2004, TU Kaiserslautern 60 New directions in application development. Automatic partitioning compilers for designer productivity, like CoDe-X (Jürgen Becker, Univ. of Karlsruhe). Support for Run-Time Reconfiguration (RTR), a key enabler of error handling and fault correction by partially re-routing the FPGA at run time, as well as of remote patching for upgrading, remote debugging, and remote repair by reconfiguration, even over the internet.

© 2002, University of Kaiserslautern TU Kaiserslautern 61 Converging factors for RTR. Converging factors make RTR-based system design viable: 1) Million-gate FPGA devices and co-processing with standard microprocessors are commonplace, allowing direct implementation of complex algorithms in FPGAs. This alone has already revolutionized FPGA design. 2) New tools like the Xilinx JBits software tool suite directly support co-processing and RTR. [Figure: User Java Code, Java Compiler, JBits API, Executable]

© 2004, TU Kaiserslautern 62 RTR divides an application into a series of sequentially executed stages, each implemented as a separate execution module. Partial RTR partitions these stages into finer-grain sub-modules to be swapped in as needed. Without RTR, all configurable platforms are just ASIC emulators. RTR needs a new kind of application development environment that directly supports development and debugging of RTR applications; this is essential for the advancement of configurable computing and will also heavily influence the future system organization. Xilinx, VT and BYU work on run-time kernels, run-time support, RTR debugging tools and other associated tools. Smaller, faster circuits; simplified hardware interfacing; fewer IOBs; smaller, cheaper packages; simplified software interfaces.

© 2004, TU Kaiserslautern 63 Run-time Mapping. Run-time reconfigurable are: the Xilinx VIRTEX FPGA family, and the RAs that are part of Chameleon CS2000 series systems. Using such devices changes many of the basic assumptions in the HW/SW co-design process: host/RL interaction is dynamic and needs a tiny OS like eBIOS, also to organize RL reconfiguration under host control. A typical goal is minimization of reconfiguration latency (especially important in communication processors), to hide configuration loading latency; and scheduling, to find the ’best’ schedule for eBIOS calls (C~side).
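
The goal of hiding configuration loading latency can be made concrete with a little arithmetic. The sketch below is an illustration only, not a model of eBIOS or of any particular device: it assumes an application split into sequential RTR stages, each with a hypothetical configuration load time and run time, and it assumes the platform can load the next stage's configuration while the current stage is still running.

```python
def total_time_sequential(stages):
    """Each stage is (load_time, run_time); load and run never overlap."""
    return sum(load + run for load, run in stages)


def total_time_prefetch(stages):
    """Load stage i+1 while stage i runs.
    Assumes the device permits this overlap (e.g. partial reconfiguration)."""
    if not stages:
        return 0
    total = stages[0][0]  # the first configuration cannot be hidden
    for i, (_, run) in enumerate(stages):
        next_load = stages[i + 1][0] if i + 1 < len(stages) else 0
        # The next stage starts once the current run and the next load are done.
        total += max(run, next_load)
    return total


# Hypothetical stage timings in milliseconds: (load_time, run_time)
stages = [(4, 10), (6, 3), (2, 8)]
print(total_time_sequential(stages))  # 33
print(total_time_prefetch(stages))    # 25
```

In this example only the first load (4 ms) stays visible; the later loads are hidden behind the preceding stage's run time. A load that is longer than the preceding run is only partly hidden, which is why the order of reconfiguration calls matters.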

© 2004, TU Kaiserslautern 64 >> future directions << rGAs Placement & Routing Soft Processors History of Frameworks RTR Support by rGA vendors EDA Future directions conclusions

© 2002, University of Kaiserslautern TU Kaiserslautern 65 Soft CPU: new job for compilers soft CPU FPGA Memory core FPGA Compiler HLL

© 2002, University of Kaiserslautern TU Kaiserslautern 66 Soft rDPA feasible ? rDPU Array rDPU Array [à la S. Guccione]

© 2002, University of Kaiserslautern TU Kaiserslautern 67 Array I/O examples rDPU Array rDPU Array data streams, or, from / to embedded memory banks data streams, or, from / to embedded memory banks [à la S. Guccione]

© 2002, University of Kaiserslautern TU Kaiserslautern 68 HLL 2 Soft Array. [Figure: HLL, Compiler, Memory, soft CPU, miscellaneous, soft DPU array] [à la S. Guccione]

© 2002, University of Kaiserslautern TU Kaiserslautern 69 HLL 2 „flex“ rDPA. [Figure: HLL, Compiler, Memory, CPU, miscellaneous, rDPU array] [à la S. Guccione]

© 2002, University of Kaiserslautern TU Kaiserslautern 70 >> HLLs <<

© 2002, University of Kaiserslautern TU Kaiserslautern 71 HLLs for Hardware Design vs. System Design vs. RTR System Design HLL Compiler System Design Compiler HLL RTR System Design [à la S. Guccione]

© 2002, University of Kaiserslautern TU Kaiserslautern 72 HLLs for Hardware Design vs. System Design vs. RTR System Design HLL Compiler System Design Compiler HLL RTR System Design Compiler HLL [à la S. Guccione]

© 2002, University of Kaiserslautern TU Kaiserslautern 73 CPU and memory on Chip CPU core FPGA core Memory core Compiler HLL Compiler HLL RTR System Design [à la S. Guccione]

© 2002, University of Kaiserslautern TU Kaiserslautern 74 JBits environment. [Figure: RTP Core Library, JRoute API, Device Simulator, User Code, BoardScope Debugger, XHWIF, JBits API, TCP/IP] [à la S. Guccione]

© 2002, University of Kaiserslautern TU Kaiserslautern 75 HLLs for Hardware Design vs. System Design vs. RTR System Design Compiler HLL Compiler System Design [à la S. Guccione]

© 2002, University of Kaiserslautern TU Kaiserslautern 76 Embedded System Design HLL Compiler CPU core FPGA core Memory core HLL Compiler soft CPU FPGA Memory core FPGA [à la S. Guccione]

© 2002, University of Kaiserslautern TU Kaiserslautern 77 >> conclusions << rGAs Placement & Routing Soft Processors History of Frameworks RTR Support by rGA vendors EDA Future directions conclusions

© 2004, TU Kaiserslautern 78 Missing the next revolution: ignoring reconfigurable computing when teaching computing fundamentals within our CS curricula is one of the biggest mistakes in the history of information technology application, causing the waste of billions of dollars.

© 2004, TU Kaiserslautern 79 “EDA industry shifts into CS mentality” [Wojciech Maly]. Microprogramming to replace FSM design. Hardware languages replace EE-type schematics. EDA software and its interfacing languages. Newer system-level languages like SystemC etc. Small and large module re-use. Hierarchical organization of designs, EDA, et al.

© 2004, TU Kaiserslautern 80 „EDA industry shifts into CS mentality“ [Wojciech Maly] Which language to select ?

© 2004, TU Kaiserslautern 81 Roadmap. Old CS lab course philosophy: given an application, implement it by a program. New CS freshman lab course environment: given an application, a) implement it by writing a program; b) implement it as a morphware prototype; c) partition it into P and Q: c.1) implement P by software, c.2) implement Q by morphware, c.3) implement the P / Q communication interface.

© 2004, TU Kaiserslautern 82 All enabling technologies are available: the anti machine and all its architectural resources; parallel memory IP cores and generators; languages & (co-)compilation techniques; morphware vendors like PACT; literature from the last 30 years; anything else needed.

© 2004, TU Kaiserslautern 83 END

© 2004, TU Kaiserslautern 84 The dichotomy of models. Note for von Neumann: the state register is with the CPU. Note for the anti machine: the state register is with the memory bank (state registers are within the memory banks).

© 2004, TU Kaiserslautern 85 Machine paradigms (“instruction fetch”); also hardwired implementations, e.g. the BEE project (Prof. Brodersen).

© 2004, TU Kaiserslautern 86 Benefit from RAM-based platforms & the 2nd paradigm. 1) A RAM-based platform is needed for flexibility and programmability, avoiding the need for specific silicon (mask cost: currently 2 million $, rapidly growing). 2) A simple 2nd machine paradigm is needed as a common model, to avoid the need for circuit expertise and to educate zillions of programmers.

© 2004, TU Kaiserslautern 87 Design Space Exploration Systems