Enabling Technologies for Reconfigurable Computing. Reiner Hartenstein, University of Kaiserslautern. November 21, 2001, Tampere, Finland.

Presentation transcript:

Enabling Technologies for Reconfigurable Computing, part 1: Reconfigurable Computing (RC). Reiner Hartenstein, University of Kaiserslautern. Wednesday, November 21, 2001, Tampere, Finland, 8.30 – hrs.

© 2001, University of Kaiserslautern 2 Schedule (timeslot: topic)
– 10.00: Reconfigurable Computing (RC)
– 10.30: coffee break
– 12.00: Compilation Techniques for RC
– 14.00: lunch break
– 15.30: Resources for Stream-based RC
– 16.00: coffee break
– 17.30: FPGAs: recent developments

3 Reconfigurable: why? Exploding design cost and shrinking product life cycles of ASICs create a demand for RA (reconfigurable array) usage for product longevity. Performance is only one part of the story. The time has come to fully exploit their flexibility to support turn-around times of minutes instead of months for real-time in-system debugging, profiling, verification, tuning, field maintenance, and field upgrades. A new soft machine paradigm and language framework is available for novel compilation techniques to cope with the new market structures, which transfer synthesis from vendor to customer.

4 SoC Alternatives… not including C/C++ CAD Tools [Gordon Bell]
– The blank sheet of paper: FPGA
– Auto design of a basic system: Tensilica
– Standardized, committee-designed components*, cells, and custom IP
– Standard components including more application-specific processors*, IP add-ons, and custom
– One chip does it all: SMOP**
*) Processors, Memory, Communication & Memory Links
**) SMOP ??

5 SoC Alternatives [Gordon Bell] (product: strategy; vendor)
– FPGA: sea of uncommitted gate arrays; Xilinx, Altera
– compile a system: unique processor for every application; Tensilica
– systolic array: many pipelined or parallel processors + custom
– DSP, VLIW: special purpose processor cores + custom; TI
– processor + RAM + ASICs: general purpose cores, specialized by I/O, etc.; IBM, Intel
– universal micro: multiprocessor array, programmable I/O; Cradle

6 A Decade of Research in Reconfigurable Computing. Due to the achievements of numerous research projects throughout the 1990s, the breakthrough in commercialization has started, and a quite comprehensive methodology is already available. Dear colleague, the RC scene welcomes your contributions to improve it and to push for its inclusion in contemporary CS&E curricula. One of the goals of this talk is to stimulate you with highlights and by introducing some key issues.

7 No longer a strange niche area
Was: hardware design for a strange platform
– CAD, but no compilation
Emerging awareness:
– new mind set
– new curricular embedding coming
Dichotomy of CS:
– SW, CW
– HW, FW
– computing in time, computing in space

8 Flexibility / universality trade-off. [Figure: trade-off between flexibility and efficiency; labels: hardwired, application-specific, domain-specific, general purpose, FPGA, KressArray Xplorer.]

9 RAs are heading for mainstream: become indispensable for SoC products?
– ASPP (application-specific programmable product): an application-specific standard product plus embedded programmable logic.
– "Soap Chip" (system on a programmable chip): logic, analog, DRAM/Flash/SRAM, programmable logic, microprocessor.
– CSoC (configurable SoC): an industry-standard µprocessor (ARM, MIPS, or ...), embedded reconfigurable array, memory, dedicated system bus, ...
[Figure labels: logic; Flash / RAM memory banks; reconfigurable accelerator array.]

10 Reconfigurable Logic going Mainstream
– Fine grain: FPGAs killing the ASIC market
– Coarse grain: several startups
– Substantially improved design flow and libraries
– Fastest growing segment of the semiconductor market
– Comprehensive methodology
Please lobby for new curricula. One of the goals of this talk: to motivate you by key issues and visionary highlights.

11 Designer-oriented innovation stalled?
– EDA industry: about 7 billion $, leverages a > 200 billion $ semiconductor industry
– FPGAs (7 billion $): fastest growing segment
– EDA industry constantly redefining itself
– Except for logic synthesis, no really significant innovation in the past decade
– CAD developers can't deliver their ideas effectively
– CAD developers personally don't appreciate the real problems facing designers

12 EDA: the main bottleneck

13 Biggest mistake of EDA: guess it!

14 >> History
– History
– Paradigm Shift
– Coarse Grain: why?
– Coarse Grain Architectures
– Reconfiguration Architecture

15 Logic Gate Price Trend (source: Altera). [Figure: price per logic element, normalized to Q1/1993, plotted from Q1 '93 to Q1 '00.] Price per logic element: 40% lower per year.
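To get a feel for what "40% lower per year" means over the plotted span, the following minimal sketch compounds that rate from the Q1 '93 normalization point. It assumes the decline is uniform every year, which the slide's headline figure suggests but the lost plot cannot confirm; the function name `normalized_price` is made up for this illustration.

```python
def normalized_price(years_elapsed: int, yearly_drop: float = 0.40) -> float:
    """Price relative to the starting quarter after compounding a fixed yearly drop.

    Assumption: the same percentage drop applies every year.
    """
    return (1.0 - yearly_drop) ** years_elapsed


# Q1 '93 is the normalization point (1.0); Q1 '00 is seven years later.
for year in range(8):
    label = (93 + year) % 100
    print(f"Q1 '{label:02d}: {normalized_price(year):.3f}")
```

Under this assumption the price per logic element after seven years is 0.6 to the power of 7, a little under 3% of its Q1 '93 value.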

16 The History of Paradigm Shifts: what's coming next? Makimoto's Wave (published in 1989): the mainstream silicon application is switching every 10 years. [Figure: wave alternating between standard and custom; labels: TTL; LSI, MSI; µproc., memory; ASICs, accels; reconfigurable; 1st design crisis; 2nd design crisis.] The programmable system-on-a-chip is the next wave.

17 Makimoto's 3rd Wave
Fine grain subsystems (FPGAs):
– 1st half of 3rd wave
– universal (but less efficient)
Coarse grain subsystems:
– 2nd half of 3rd wave
– domain-specific
– much more flexible than 2nd half of 2nd wave

18 How's the next wave? (2007?)
Tredennick's paradigm shifts:
– hardwired: algorithm fixed, resources fixed
– procedural programming: algorithm variable, resources fixed
– structural programming: algorithm variable, resources variable
[Figure labels: custom, standard, FPGAs, coarse grain RAs, 2007.] 4th wave? Hartenstein's Curve: no further wave!

19 The Impact of Makimoto's Paradigm Shifts. [Figure: wave between standard and custom; labels: TTL; LSI, MSI; µproc., memory; ASICs, accels; reconfigurable.]
– Personalization (CAD) before fabrication
– Procedural personalization via RAM-based machine paradigm
– Structural personalization: RAM-based, before run time
The software industry's secret of success: repeat the success story with a new machine paradigm! (Dr. Makimoto: FPL 2000 keynote)

20 >> Paradigm Shift
– History
– Paradigm Shift
– Coarse Grain: why?
– Coarse Grain Architectures
– Reconfiguration Architecture

21 Sequential vs. structural RAM. [Figure: a von Neumann machine (instruction sequencer, data path, I/O) with (procedural) software downloaded into sequential RAM, beside reconfigurable accelerator(s) on an FPGA whose configuration, produced by logic synthesis and place and route, is re-downloaded into structural RAM.]

22 Changing Models of Computing. [Figure: three models. von Neumann: software downloaded into RAM, driving instruction sequencer, data path, and I/O. Contemporary: host plus hardwired accelerator(s) designed with CAD. Reconfigurable computing: host plus reconfigurable accelerator(s) with re-downloadable RAM. Terms shown: (procedural) software; configware (structural); flexware; hardware.] Accelerators occupy most silicon: the tail wagging the dog.

23 The Microprocessor is a Methuselah: the steam engine of the silicon age. 9 technology generations [labels of the early generations garbled in extraction], ending with P5 (Pentium); 8th: P6 (Pentium Pro / Pentium II); 9th: Pentium III.

24 … Decline of Wintel Business Model. [Figures: subscribers worldwide, cellular & PCS, scale 0.5 to 1 billion; US market in billion US-$ [Forrester]; million devices delivered in the U.S., consumer PC versus information appliances [IDC]; consumer PC average resale price, scale 1000 $ to 1500 $ [Forrester].]

25 Basics of Binding Time. [Figure: binding times: compile time, loading time, run time (time of instruction fetch); labels: microprocessor, parallel computer, Reconfigurable Computing.]

26 Binding Time vs. Computing Domain. [Table residue. Binding time (set-up of communication channels): at run time, at loading time, at compile time, later fabrication step, before fabrication. Programming domain: time domain (procedural), time & space (hybrid), space domain (structural). Entries: microprocessor, parallel computer, array processor, systolic arrays, Reconfigurable Computing, ASICs, full custom ICs.] The KressArray is a generalization of the systolic array.

27 Dataquest Predicts Programmability to be Predominant in SoC. With programmability as a standard feature, ASPPs will be the predominant system-on-a-chip products in five years. Application-specific programmable products (ASPPs) will be the next best thing in semiconductor technology. Source: Jordan Selburn, principal analyst, ASICs and system-level integration, Dataquest Inc.'s Semiconductors Group, at the Dataquest Semiconductors 98 conference (EETimes 10/21/98).

28 Applications
– next generation's wireless*
– network processors*
– many other areas*
*) keynotes and papers at FPL 2000, the 10th International Conference on Field-programmable Logic and Applications: The Roadmap to Reconfigurable Systems, Villach, Austria, August 2000

29 Applications (2)
Image processing:
– for smart cars (collision avoidance, others...)
– smart traffic pilots, robotics, fast material inspection
– smart stub finders, motion detection (MPEG-4, ...)
Signal processing, speech processing, software radio, correlation, encryption, communication switching / protocols
Innovative consumer electronics:
– super smart cards, smart mobile phones, wearable
– portable, set-top, laptop, desktop, embedded, ...
Many others, ...

30 Applications
– New cellular standard: up to 2 Mbit/sec
– New CDMA standard: > 500 MIPS needed just for the RF receiver part
– Wide variety of end-user devices: smart mobile phones, palm pilots, laptops, games, camcorder-likes, the internet car, many new types of devices to come...
– Increasingly wide variety of services available from the network provider: download just what a particular customer is subscribed to
– Expert group [Vissers]: > 20% of it will be accelerator code*

31 Why coarse grain? [Figure: normalized trends across wireless generations 1G to 4G: algorithmic complexity (Shannon's Law), transistors/chip, microprocessor / DSP processor speed, memory, battery performance, and computational efficiency in mA/MIP (StrongARM, SH7752). Sources: Proc. ISSCC, ICSPAT, DAC, DSPWorld.]

32 Shannon's Law
– In a number of application areas, throughput requirements are growing faster than Moore's law
– Fundamental flaws in software processor solutions
– 32 soft ARM cores fit onto a contemporary FPGA
– Stream-based distributed processing is the way to go

33 It's a Paradigm Shift!
– Using FPGAs (fine grain reconfigurable) is mainly just classical logic synthesis on a strange hardware platform.
– Coarse grain reconfigurable arrays (Reconfigurable Computing), however, mean a really fundamental paradigm shift.
– This is still ignored by CS and EE curricula and almost all R&D scenes.

34 >> Coarse Grain: why?
– History
– Paradigm Shift
– Coarse Grain: why?
– Coarse Grain Architectures
– Reconfiguration Architecture

35 It's a General Paradigm Shift!
– Using FPGAs (fine grain reconfigurable): just logic synthesis on a strange platform
– Coarse grain reconfigurable arrays (Reconfigurable Computing): a fundamental paradigm shift, ignored by curricula and most R&D scenes
– Replacing concurrent processes by much more efficient parallelism: stream-based computing arrays
Stream-based computing arrays: systolic array* [1980], KressArray** [1995], chip-on-a-day* [2000]
*) hardwired **) reconfigurable

36 Fine-grained vs. coarse-grained reconfiguration
– Fine grain is general purpose
– Fine grain is slow and area-inefficient, but offers high parallelism
– Coarse grain is application domain-specific
– Coarse grain is highly area-efficient
– Coarse grain: extremely high performance

37 Reconfigurability Overhead. [Figure: array of cells labeled L and S, contrasting the area used by the application with the resources needed for reconfigurability, partly for configuration code storage; hidden RAM not shown.]

38 Principle of a Typical FPGA. [Figure; label: FF of hidden RAM.]

39 Routing Overhead in FPGAs
– 15 transistors at each tap
– 40 transistors at each switching point
– > 1000 transistors at each crossbar
– FF: part of the hidden RAM
– Most FPGA vendors' gate count: 1 flip-flop of configuration RAM = 4 gates
– Routing congestion [DeHon]: often 50% or less of CLBs used

40 Why Coarse Grain instead of FPGA?
– Reconfigurability overhead reduced by up to ~1000
– Drastically smaller configuration memory
– Much faster loading
– Many more benefits
[Figure: transistors / chip for memory, microprocessor, supersystolic (physical, logical), FPGA physical, FPGA logical, FPGA routed. Sources: Proc. ISSCC, ICSPAT, DAC, DSPWorld.]

41 >>> extremely high efficiency
1. avoiding address computation overhead
2. avoiding instruction fetch and interpretation overhead
3. high parallelism, massively multiple deep pipelines
4. much less configuration memory
5. no routing areas to configure functions from CLBs
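Points 1 to 3 can be illustrated with a small software model of a stream-driven pipeline. This is only a sketch under stated assumptions: it models a transposed-form FIR filter, chosen here as a familiar example of a fixed chain of multiply-add cells, and is not taken from the talk or from any KressArray design. The name `fir_pipeline` is hypothetical. Each cell has a fixed function and a fixed neighbour link, so no instruction is fetched and no data address is computed per sample.

```python
from typing import Iterable, Iterator, List


def fir_pipeline(samples: Iterable[int], weights: List[int]) -> Iterator[int]:
    """Stream samples through a chain of multiply-add cells (transposed-form FIR).

    regs[i] models the pipeline register between cell i and cell i + 1.
    All cells work on every clock tick; data simply flows cell to cell.
    """
    regs = [0] * (len(weights) - 1)
    for x in samples:
        # Cell 0 emits the output from the current sample and its register.
        y = weights[0] * x + (regs[0] if regs else 0)
        # Ascending order reads regs[i + 1] before it is overwritten,
        # which mimics all registers being clocked at the same time.
        for i in range(len(regs)):
            upstream = regs[i + 1] if i + 1 < len(regs) else 0
            regs[i] = weights[i + 1] * x + upstream
        yield y


# An impulse pushed through the pipe reproduces the weights.
print(list(fir_pipeline([1, 0, 0, 0], weights=[1, 2, 3])))
```

In hardware the inner loop disappears: every cell is a physical operator, so the work per sample is done in parallel rather than by iterating.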

42 Configurable Computing Systems
– Combine a programmable sequential processor with flexware (structurally programmable hardware): capitalize on the strengths of both flexware and software.
– Early 1960s: Estrin (UCLA): enabling technology not available
– 1990s: significant increase in research activities (DARPA...)
– FPGAs: not the enabling technology: hardware skills needed
– Verilog- or VHDL-based systems often result in poor performance

43 Platforms available
Soft data path arrays:
– KressArray
– Xtreme (PACT)
– ACM (Quicksilver Tech)
– CHESS Array (Elixent)
– others
Compilation techniques feasibility studies:
– Partitioning Co-Compiler
– Design Space Explorer
– others

44 Also as an autonomous Machine
The new machine paradigm (Xputer) is the counterpart of the so-called von Neumann paradigm.
– CONS: confuses customers (paradigm switch: the brain hurts)
– PROS: strong guidance of EDA tool development
– more effective hardware/software APIs
– compilation techniques similar to traditional compilation
– better application development tools accepting C or Java
Easy to teach: simple machine principles
– scan patterns (data counter) similar to control flow (program counter)
– general model of hardware / software co-design
– fascination for freak effect: opening up a new R&D discipline
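The analogy between scan patterns (data counter) and control flow (program counter) can be sketched in a few lines. The following is an illustrative model only, assuming a simple raster ("video") scan over a 2-D data memory; the names `video_scan` and `run_datapath` are invented for this example and are not part of any Xputer tool. The data counter produces a sequence of data addresses, while the operator applied at each address stays fixed.

```python
from typing import Iterator, List, Tuple


def video_scan(width: int, height: int,
               step_x: int = 1, step_y: int = 1) -> Iterator[Tuple[int, int]]:
    """Data counter: yields data addresses row by row, like a raster scan."""
    for y in range(0, height, step_y):
        for x in range(0, width, step_x):
            yield (x, y)


def run_datapath(memory: List[List[int]],
                 scan: Iterator[Tuple[int, int]]) -> List[int]:
    """Apply one fixed operator (here: doubling) at every scanned location.

    The operator never changes during the scan; only the data address moves.
    """
    return [2 * memory[y][x] for x, y in scan]


memory = [[1, 2, 3],
          [4, 5, 6]]
print(run_datapath(memory, video_scan(width=3, height=2)))
```

Where a program counter steps through instructions that each name their data, this model steps through data while the "instruction" is configured once up front.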

45 >> Coarse Grain Architectures
– History
– Paradigm Shift
– Coarse Grain: why?
– Coarse Grain Architectures
– Reconfiguration Architecture

46 Some Players in Silicon Valley and …. Network processors: > 20 players. Cisco: Xilinx's largest customer.
[Flattened table; columns as extracted, with only the Triscend row fully aligned.]
– Company: Adaptive Silicon, Chameleon Systems, Malleable, Silicon Spice, Systolix, MorphICs
– Architecture: not disclosed, 32 bit datapath array, not disclosed, bit serial systolic array, not disclosed
– Business model: sell cores, sell chips, sell solutions, sell cores
– Markets: embedded DSP, networking, voice over IP, networking, signal conditioning, wireless communication
– Triscend: system on chip; sell chips; embedded systems

47 Commercial rDPAs
– XPU family (IP cores), XPU128: PACT Corp., Munich
– flexible array: MorphICs
– CALISTO: Silicon Spice
– CS2000 family: Chameleon Systems
– MECA family: Malleable
– FIPSOC: SIDSA
– ACM: Quicksilver Tech
– CHESS array: Elixent
– MorphoSys: Morpho Tech
*) here at SoC **) bought

48 PACT Corp.: Xtreme Processor Platform (XPP)
– A family of IP cores: high-speed, data-stream-capable, scalable, reconfigurable clusters of arrays of 32-bit DPUs with embedded memories and high-speed I/O ports.
– Application development support software featuring a flow-graph-style algorithm mapping language, to minimize training requirements.
– XPP's fabrics feature automatic data-flow synchronization and a flagged event network to dynamically configure the execution flow.
– Supports dynamic RTR (run-time reconfiguration): hierarchical configuration managers free the designer from chip-level details and ensure that configurations are independently loaded in exactly the intended order.
– Automatic event-based task swapping along with data streams: released resources are automatically and immediately reconfigured.

49 Reconfigurable Interconnect Fabric (RIF). [Figure: an rDPA (Reconfigurable Datapath Array) of rDPUs with a separate routing area, versus a RIF laid out over the rDPUs: rDPA wired by abutment.]

50 Generically defined Fabrics: KressArray Family. Some application areas, such as wireless communication, need extraordinarily powerful communication resources.

51 Universal RAs are not always feasible
– ... often functional resources are not the throughput bottleneck
– Some application areas, such as wireless communication, need extremely rich communication resources
– The general purpose (coarse grain) reconfigurable array may appear to be an illusion...
– Use domain-specific platform generators!

52 KressArray Family: example of a tailored KressArray rDPU. [Figure: rDPU external view; only the NNport abutment architecture is shown.]

53 KressArray Family generic Fabrics: a few examples
– Select function repertory
– Select nearest neighbour (NN) interconnect: an example rDPU
– Select mode, number, width of NNports
– More NNports: rich routing resources
– Route-through and function, or route-through only
– Examples of 2nd level interconnect: laid out over the rDPU cell, no separate routing areas!

54 CMOS interconnect resources. Foundries offer up to 8 metal layers and up to 3 poly layers. The reconfigurable interconnect fabric is laid out over the rDPU cell.

55 Super Pipe Networks. The key is mapping, rather than architecture*. *) KressArray [1995]

56 Communication Resource Requirements
– ... often functional resources are not the throughput bottleneck
– In some application areas, such as wireless communication, reconfigurable computing arrays need extraordinarily rich and powerful communication resources
– The solution: generators for domain-specific RA platforms

57 KressArray Mapping Example: SNN filter. Array size: 10 x 16 = 160 rDPUs. [Figure legend: route-through only; not used; backbus connect.]

58 Xplorer Plot: SNN Filter Example. 3 vertical NNports, 32 bit; 2 horizontal NNports, 32 bit. [Figure legend: operator; operand; result; route-through; route-through-only rDPU; backbus connect.]

59 Super Pipe Networks. The key is mapping, rather than architecture*. *) KressArray [ASP-DAC-1995]

60 KressArray: try it out yourself!
– You may experiment yourself
– You may use it over the internet
– Map an application onto a KressArray
– Start with a simple example
– Visit … and click the link to Xplorer
– ... does not run on Internet Explorer since Bill Gates does not like Java: try Netscape 4.7x

61 Michael Herz (now at Agilent, Sindelfingen). Dissertation:
– ... on mapping parallel memory architectures for stream-based arrays onto KressArrays
– ... also transformation of storage schemes to optimize memory bandwidth (MoM scan pattern transformations)

62 Ulrich Nageldinger (now at Infineon Technologies, Munich). Dissertation:
– ... on mapping applications onto KressArrays
– ... simultaneous routing and placement by simulated annealing
– Supporting a huge family of KressArrays
– Fuzzy logic improvement proposal generator
– Profiling
– Design space exploration
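To show the general shape of placement by simulated annealing, here is a deliberately simplified sketch. It is not the mapper described in the dissertation: it only places operators on a mesh and uses the total Manhattan wire length as a stand-in for routing cost, whereas the slide speaks of simultaneous routing and placement. All names (`wirelength`, `anneal_placement`), the cooling schedule, and the example netlist are assumptions made for illustration.

```python
import math
import random
from typing import Dict, List, Tuple

Pos = Tuple[int, int]
Net = Tuple[str, str]


def wirelength(placement: Dict[str, Pos], nets: List[Net]) -> int:
    """Sum of Manhattan distances over all two-point nets (crude routing estimate)."""
    return sum(
        abs(placement[a][0] - placement[b][0]) + abs(placement[a][1] - placement[b][1])
        for a, b in nets
    )


def anneal_placement(ops: List[str], nets: List[Net], rows: int, cols: int,
                     steps: int = 5000, t_start: float = 5.0,
                     t_end: float = 0.01, seed: int = 0):
    """Place operators on a rows x cols mesh; returns (best placement, best cost)."""
    if len(ops) > rows * cols:
        raise ValueError("more operators than array cells")
    rng = random.Random(seed)
    # slots[i] is the cell of ops[i] for i < len(ops); the rest are free cells.
    slots = [(r, c) for r in range(rows) for c in range(cols)]
    rng.shuffle(slots)

    def current() -> Dict[str, Pos]:
        return {op: slots[i] for i, op in enumerate(ops)}

    cost = wirelength(current(), nets)
    best, best_cost = current(), cost
    for step in range(steps):
        # Geometric cooling from t_start towards t_end (an assumed schedule).
        temperature = t_start * (t_end / t_start) ** (step / steps)
        i = rng.randrange(len(ops))      # an operator ...
        j = rng.randrange(len(slots))    # ... swaps with any cell, used or free
        if i == j:
            continue
        slots[i], slots[j] = slots[j], slots[i]
        new_cost = wirelength(current(), nets)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            cost = new_cost              # accept, possibly uphill
            if cost < best_cost:
                best, best_cost = current(), cost
        else:
            slots[i], slots[j] = slots[j], slots[i]  # reject: undo the swap
    return best, best_cost


ops = ["mul0", "mul1", "add0", "out"]
nets = [("mul0", "add0"), ("mul1", "add0"), ("add0", "out")]
placement, cost = anneal_placement(ops, nets, rows=3, cols=3)
print(cost, placement)
```

Accepting some uphill moves early on is what lets the search escape poor local arrangements; as the temperature falls, the procedure behaves more and more like greedy improvement.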

63 Rainer Kress (now at Infineon Technologies, Munich). Dissertation:
– ... on mapping applications onto his* KressArray
– DPSS: datapath synthesis system
– Including a data scheduler (data stream scheduler)
– Generalization of the systolic array (the KressArray is a super systolic array)
– 32 bit design via Eurochip support

64 Jürgen Becker (now Professor at Univ. Karlsruhe). Dissertation:
– ... automatically partitioning co-compiler (configware / software co-compilation)
– Resource-parameter-driven, retargetable
– Profiler-driven optimization
– Accepts the HLL ALE-X (extended C subset; pointers not supported)

65 Karin Schmidt (now at DaimlerChrysler Research). Dissertation:
– Compilation techniques for Xputers
– Modified loop transformations
– Modified parts of the implementation were used for Jürgen Becker's Ph.D. thesis

66 CHESS Array with embedded RAM (Elixent). [Figure: array with embedded RAM blocks; labels: user registers, clock control, memory interface, ALU, 16 by 4 RAM, sequencer, state machine.] Multi-granular: e.g. 16 * 4 bits = 64 bits.

67 Chameleon Systems. The CS2112 is the industry's first Reconfigurable Communications Processor (RCP), a streaming data processor with a RISC processor and an array of 108 arithmetic processing units. Each of those 32-bit processing cores runs at 125 MHz. The vendor claims a performance of 20 billion 16-bit operations per second and 2.4 billion 16-bit multiply-accumulates per second, and 1.6 GBytes/sec for its programmable I/O (PIO) banks. It also has a PCI interface. The tool suite C~SIDE is for developing, verifying, and optimizing.

68 Coarse Grain Architectures

69 Primarily Mesh-based ….

70 UC Berkeley (Jan Rabaey)

71 Crossbar-based Architectures
– 1993: PADDI-II (Jan Rabaey)
– 1990: UC Berkeley (Jan Rabaey), 16 bit
– 1997: Pleiades (mesh & crossbar), 32 bit

72 PADDI-II Architecture

73 MorphoSys

74 PipeRench Architecture (CMU 1998): highly dynamic reconfiguration; alternating data/instruction stream

75 M.I.T.
– RAW (M.I.T. 1997), Reconfigurable Architecture Workbench: MIPS-like processor core, crossbar, global lines
– MATRIX (1996), Multiple ALU archiTecture with Reconfigurable Interconnect eXperiment: 8 bit, multi-granular, 0.5 CMOS
[Figure: MATRIX BFU with 8 bit ALU, 256x8 bit memory, network ports A and B, memory function port, ALU function port, two compare / reduce (C / R) networks, and Level-1 network; the accompanying operator table is not recoverable.]

76 MATRIX Interconnect Fabrics. [Figure: a BFU and its neighbour BFUs.] Communication resources are often the bottleneck.

77 More Research Projects (published between …)
– Garp (UC Berkeley)
– RaPiD (U. Washington)
– REMARC (Stanford)
– DReAM (U. Karlsruhe)
– ... and others
Asia / Pacific: also see the embedded tutorials by Prof. Amano (ASP-DAC 99, FPL-2000)

78 RaPiD Architecture

79 REMARC

80 Future Coarse Grain RA Development. It is indispensable to operate within the convergence area of compilers, co-compilers, architecture, and full-custom-style VLSI design (array cells). It is a must that products come with a development platform which encourages users, especially those with a limited hardware background.

81 >> Reconfiguration Architecture
– History
– Paradigm Shift
– Coarse Grain: why?
– Coarse Grain Architectures
– Reconfiguration Architecture

82 Dimensions of Reconfigurability. Extremes: ASIPs* vs. network processors. [Figure: configuration time, with labels design time, fabrication time, compile time, run time; statically reconfigurable versus dynamically reconfigurable; ASIP and network processor marked.] *) Application-Specific Instruction set Processors

83 Configuration Architectures (dynamic vs. static)
– Straightforward: host (compiler, mapper, RTOS etc.), soft data path, RAM
– Multi-context: host (compiler, mapper, RTOS etc.), soft data path, RAM
– Configuration caching*: host (compiler, mapper, RTOS etc.), config. cache RAM, soft data path, RAM
Configuration loading resources:
– separate configuration fabrics (e.g. FPGA)
– wormhole routing (KressArray, Colt, PipeRench)
– RA part computes code for other RA part (self-reconfiguration)
*) no cache as usual!
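The practical difference between the three schemes is how often a configuration has to come all the way from the host. The toy model below counts host downloads for a sequence of requested configurations. It is a sketch under loud assumptions: `resident_slots` stands for the number of configurations kept on or near the array (extra contexts or a configuration cache RAM), and the first-in-first-out eviction is chosen only for simplicity. The slide itself cautions that the configuration cache is not a conventional cache, so this is not meant as a description of any real device.

```python
from typing import Iterable, List, Optional


def count_host_downloads(requests: Iterable[str], resident_slots: int) -> int:
    """Count configurations fetched from the host for a request sequence.

    resident_slots = 0 models the straightforward scheme: every switch to a
    different configuration is a fresh download from the host.
    resident_slots > 0 models extra contexts or a configuration cache RAM
    (hypothetical FIFO eviction when the slots are full).
    """
    resident: List[str] = []
    active: Optional[str] = None
    downloads = 0
    for cfg in requests:
        if cfg == active:
            continue                      # already running, nothing to load
        if cfg not in resident:
            downloads += 1                # must come from the host
            if resident_slots > 0:
                if len(resident) == resident_slots:
                    resident.pop(0)       # evict the oldest resident entry
                resident.append(cfg)
        active = cfg
    return downloads


requests = ["fir", "fft", "fir", "fft", "fir"]
print(count_host_downloads(requests, resident_slots=0))
print(count_host_downloads(requests, resident_slots=2))
```

For an application that alternates between a few configurations, keeping them resident turns most switches into local operations, which is the motivation for the multi-context and caching variants.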

84 Colt Architecture (P. Athanas 1996): studying highly dynamic reconfiguration; wormhole routing

85 Schedule (timeslot: topic)
– 10.00: Reconfigurable Computing (RC)
– 10.30: coffee break
– 12.00: Compilation Techniques for RC
– 14.00: lunch break
– 15.30: Resources for Stream-based RC
– 16.00: coffee break
– 17.30: FPGAs: recent developments

86 - END -