Enabling Technologies for Reconfigurable Computing. Reiner Hartenstein, University of Kaiserslautern. November 21, 2001, Tampere, Finland.

1 Enabling Technologies for Reconfigurable Computing. Reiner Hartenstein, University of Kaiserslautern. November 21, 2001, Tampere, Finland. Part 1: Reconfigurable Computing (RC), Wednesday, November 21, 8.30 – 10.00 hrs.

2 Schedule (© 2001, reiner@hartenstein.de http://www.fpl.uni-kl.de University of Kaiserslautern)
08.30 – 10.00 Reconfigurable Computing (RC)
10.00 – 10.30 coffee break
10.30 – 12.00 Compilation Techniques for RC
12.00 – 14.00 lunch break
14.00 – 15.30 Resources for Stream-based RC
15.30 – 16.00 coffee break
16.00 – 17.30 FPGAs: recent developments

3 Reconfigurable: why? Exploding design cost and shrinking product life cycles of ASICs create a demand for RA usage to extend product longevity. Performance is only one part of the story. The time has come to fully exploit their flexibility to support turn-around times of minutes instead of months for real-time in-system debugging, profiling, verification, tuning, field maintenance, and field upgrades. A new soft machine paradigm and language framework is available for novel compilation techniques to cope with the new market structures, which transfer synthesis from vendor to customer.

4 SOC Alternatives … not including C/C++ CAD Tools [Gordon Bell]
–The blank sheet of paper: FPGA
–Auto design of a basic system: Tensilica
–Standardized, committee designed components*, cells, and custom IP
–Standard components including more application specific processors*, IP add-ons and custom
–One chip does it all: SMOP**
*) Processors, Memory, Communication & Memory Links
**) SMOP ??

5 SoC Alternatives [Gordon Bell]
product | strategy | vendor
FPGA | sea of uncommitted gate arrays | Xilinx, Altera
compile a system | unique processor for every application | Tensilica
systolic array | many pipelined or parallel processors + custom |
DSP, VLIW | special purpose processor cores + custom | TI
processor + RAM + ASICs | general purpose cores, specialized by I/O, etc. | IBM, Intel
universal micro | multiprocessor array, programmable I/O | Cradle

6 A Decade of Research in Reconfigurable Computing. Due to the achievements of numerous research projects throughout the 1990s, the breakthrough in commercialization has started, and a quite comprehensive methodology is already available. Dear colleague, the RC scene welcomes your contributions to improve it and to push for its inclusion in contemporary CS&E curricula. One of the goals of this talk is to stimulate you with highlights and by introducing some key issues.

7 No longer a strange niche area
It was hardware design for a strange platform:
–CAD, but no compilation
Emerging awareness:
–New mind set
–New curricular embedding coming
Dichotomy of CS:
–SW / CW
–HW / FW
–computing in time / computing in space

8 Flexibility / universality trade-off [Figure: trade-off between flexibility and efficiency; labels: hardwired, application-specific, domain-specific, general purpose, FPGA, KressArray Xplorer]

9 RAs are heading for Mainstream ... become indispensable for SoC products?
–ASPP (application-specific programmable product): an application-specific standard product plus embedded programmable logic
–Soap Chip: System on a programmable Chip [blocks: Logic, Analog, DRAM/Flash/SRAM, Programmable Logic, Microprocessor]
–CSoC (configurable SoC): an industry-standard µProcessor, embedded reconfigurable array, memory, dedicated system bus ... [blocks: Logic, Flash / RAM memory banks, Reconfigurable Accelerator Array, ARM, MIPS, or ...]

10 Reconfigurable Logic going Mainstream
Please lobby for new curricula. One of the goals of this talk: to motivate you by key issues and visionary highlights.
–Comprehensive methodology
–Fine grain: FPGAs killing the ASIC market
–Coarse grain: several startups
–Substantially improved design flow and libraries
–Fastest growing segment of the semiconductor market

11 Designer-oriented Innovation stalled?
–EDA industry: about 7 billion $, leverages the > 200 billion $ semiconductor industry
–FPGAs (7 billion $): fastest growing segment
–EDA industry constantly redefining itself
–Except for logic synthesis, no really significant innovation in the past decade
–CAD developers can't deliver their ideas effectively
–CAD developers personally don't appreciate the real problems facing designers

12 EDA the main bottleneck

13 Biggest Mistake of EDA: guess it!

14 >> History
–History
–Paradigm Shift
–Coarse Grain: why?
–Coarse Grain Architectures
–Reconfiguration Architecture
http://www.uni-kl.de

15 Logic Gate Price Trend (source: Altera). Price per logic element: 40% lower per year. [Chart: price normalized to Q1/1993, plotted for Q1 '93 through Q1 '00; data labels 0.261, 0.086, 0.042, 0.029]
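
The "40% lower per year" claim can be checked with a small compound-decline calculation. This is only a sketch: the function name is hypothetical, and it assumes a constant annual drop starting from a normalized price of 1.0 in Q1 '93, which the chart may not follow exactly.

```python
# Compound decline: price after n years when it falls by a fixed fraction each year.
# Assumption (not from the slide): the drop is exactly constant from year to year.
def normalized_price(years: int, annual_drop: float = 0.40) -> float:
    """Price relative to the starting quarter after `years` annual drops."""
    return (1.0 - annual_drop) ** years


if __name__ == "__main__":
    for year in range(0, 8):  # Q1 '93 (year 0) through Q1 '00 (year 7)
        label = (93 + year) % 100
        print(f"Q1 '{label:02d}: {normalized_price(year):.3f}")
```

After seven annual drops the model gives about 0.028, which is close to the last data label on the chart (0.029), assuming that label belongs to Q1 '00.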

16 The History of Paradigm Shifts: Makimoto's Wave (published in 1989)
Mainstream silicon application is switching every 10 years. The Programmable System-on-a-Chip is the next wave. What's coming next?
[Figure: wave alternating between "standard" and "custom", 1957 – 2007; labels: TTL; LSI, MSI; µproc., memory; ASICs, accels; reconfigurable; 1st Design Crisis; 2nd Design Crisis; ?]

17 Makimoto's 3rd Wave
Fine Grain Subsystems (FPGAs):
–1st half of 3rd wave
–universal (but less efficient)
Coarse Grain Subsystems:
–2nd half of 3rd wave
–domain-specific
–much more flexible than 2nd half of 2nd wave

18 How's next Wave? Tredennick's Paradigm Shifts
–hardwired: algorithm fixed, resources fixed
–procedural programming: algorithm variable, resources fixed
–structural programming: algorithm variable, resources variable
[Figure: wave between "standard" and "custom", 1957 – 2007, with FPGAs and coarse grain RAs; annotations: "2007 ?", "4th wave ?", "no further wave!", "Hartenstein's Curve ?"]

19 The Impact of Makimoto's Paradigm Shifts (Dr. Makimoto: FPL 2000 keynote)
–Personalization (CAD) before fabrication
–Procedural personalization via RAM-based machine paradigm
–Structural personalization: RAM-based, before run time
–Software industry's secret of success
–Repeat the success story with a new machine paradigm!
[Figure: Makimoto's wave between "standard" and "custom", 1957 – 2007; labels: TTL; LSI, MSI; µproc., memory; ASICs, accels; reconfigurable]

20 >> Paradigm Shift
–History
–Paradigm Shift
–Coarse Grain: why?
–Coarse Grain Architectures
–Reconfiguration Architecture
http://www.uni-kl.de

21 Sequential vs. structural RAM [Figure: von Neumann machine (instruction sequencer, data path, I/O) with (procedural) software downloaded into sequential RAM, next to FPGA-based reconfigurable accelerator(s) whose configuration comes from Logic Synthesis and Route and Place and is downloaded into structural RAM]

22 Changing Models of Computing [Figure: three models. von Neumann: instruction sequencer, data path, I/O, with (procedural) software downloaded into RAM. Contemporary: host plus hardwired accelerator(s) designed with CAD. Reconfigurable computing: host plus reconfigurable accelerator(s) (Flexware), with (structural) Configware downloaded into RAM. Annotations: "Hardware occupies most silicon", "the tail wagging the dog"]

23 The Microprocessor is a Methuselah
1st: 4004
2nd: 8008
3rd: 8086
4th: 80286
5th: 80386
6th: 80486
7th: P5 (Pentium)
8th: P6 (Pentium Pro / Pentium II)
9th: Pentium III
9 technology generations ... the steam engine of the silicon age

24 … Decline of Wintel Business Model [Charts, 1997 – 2002: subscribers worldwide in billions (cellular & PCS); US market in billion US-$ [Forrester]; million devices delivered in the U.S., Consumer PC vs. Information Appliances [IDC]; Consumer PC average resale price in $ [Forrester]]

25 Basics of Binding Time [Figure: binding time axis with compile time, loading time, run time; labels: time of instruction fetch, microprocessor, parallel computer, Reconfigurable Computing]

26 Binding Time vs. Computing Domain
Binding time (set-up of communication channels): at run time; at loading time; at compile time; later fabrication step; before fabrication.
Programming domain: time domain (procedural); time & space (hybrid); space domain (structural).
Entries: microprocessor, parallel computer, array processor, Reconfigurable Computing, systolic arrays, ASICs, full custom ICs.
The KressArray is a generalization of the systolic array.

27 Dataquest Predicts Programmability to be Predominant in SOC
"With programmability as a standard feature, ASPPs will be predominant system-on-a-chip products in five years."
"Application-specific programmable products (ASPPs) will be the next best thing in semiconductor technology."
Jordan Selburn, principal analyst, ASICs and system-level integration, Dataquest Inc.'s Semiconductors Group. Dataquest Semiconductors '98 conference, EETimes 10/21/98.

28 Applications
–next generation's wireless*
–network processors*
–many other areas*
*) keynotes and papers at FPL 2000, the 10th International Conference on Field-programmable Logic and Applications: "The Roadmap to Reconfigurable Systems", Villach, Austria, August 27 - 30, 2000, http://www.fpl.uni-kl.de/FPL/

29 Applications (2)
Image Processing:
–for smart car (collision avoidance, others ...),
–smart traffic pilots, robotics, fast material inspection,
–smart stub finders, motion detection (MPEG-4, ...)
Signal Processing, Speech Processing, Software Radio, Correlation, Encryption, Comm. Switching / Protocols
Innovative consumer electronics:
–super smart cards, smart mobile phones, wearable,
–portable, set-top, laptop, desktop, embedded, ...
many others, ...

30 Applications
–new cellular standard: up to 2 Mbit/sec
–new CDMA standard: > 500 MIPS needed just for the RF receiver part
–wide variety of end-user devices: smart mobile phones, palm pilots, laptops, games, camcorder-likes, ... the internet car, many new types of devices to come ...
–increasingly wide variety of services available from the network provider: download just what a particular customer is subscribed to
–expert group [Vissers]: > 20% of it will be accelerator code*

31 Why coarse grain? [Chart, 1960 – 2010, log scale: algorithmic complexity (Shannon's Law) for wireless generations 1G to 4G, plotted with transistors/chip, memory, normalized microprocessor / DSP speed, and battery performance; second axis: computational efficiency in mA/MIP, with StrongARM and SH7752 marked. Sources: Proc. ISSCC, ICSPAT, DAC, DSPWorld]

32 Shannon's Law
–In a number of application areas, throughput requirements are growing faster than Moore's law
–Fundamental flaws in software processor solutions
–32 soft ARM cores fit onto a contemporary FPGA
–Stream-based distributed processing is the way to go

33 It's a Paradigm Shift!
–Using FPGAs (fine grain reconfigurable) is mainly just classical logic synthesis on a strange hardware platform
–Coarse Grain Reconfigurable Arrays (Reconfigurable Computing), however, mean a really fundamental paradigm shift
–This is still ignored by CS and EE curricula and almost all R&D scenes

34 >> Coarse Grain: why?
–History
–Paradigm Shift
–Coarse Grain: why?
–Coarse Grain Architectures
–Reconfiguration Architecture
http://www.uni-kl.de

35 It's a General Paradigm Shift!
–Using FPGAs (fine grain reconfigurable): just logic synthesis on a strange platform
–Coarse Grain Reconfigurable Arrays (Reconfigurable Computing): a fundamental paradigm shift, ignored by curricula & most R&D scenes
–Replacing concurrent processes by much more efficient parallelism: stream-based computing arrays
–systolic array* [1980], KressArray** [1995], chip-on-a-day* [2000]
*) hardwired **) reconfigurable
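
To illustrate what "stream-based computing" means here, the following is a minimal software model of a linear array of cells computing an FIR filter while samples stream through. It is a sketch only: the function name is hypothetical, each cell holds one fixed coefficient and one partial-sum register, and for brevity every cell sees the current input sample in the same cycle, which simplifies a fully systolic design.

```python
def stream_fir(samples, weights):
    """Model a chain of multiply-add cells fed by a data stream.

    partial[i] is the register between cell i+1 and cell i; the last entry
    is the constant zero fed into the far end of the chain.
    """
    taps = len(weights)
    partial = [0] * taps
    outputs = []
    for x in samples:  # one loop iteration models one clock cycle
        # Cell 0 emits the finished sum for this cycle.
        outputs.append(weights[0] * x + partial[0])
        # Every other cell adds its product and passes the sum one cell on.
        partial = [weights[i + 1] * x + partial[i + 1]
                   for i in range(taps - 1)] + [0]
    return outputs


if __name__ == "__main__":
    print(stream_fir([1, 0, 0], [1, 2, 3]))
```

No instruction is fetched and no address is computed per sample in this model: the "program" is the fixed arrangement of cells and coefficients, and only data moves.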

36 Fine-grained vs. coarse-grained reconfiguration
–fine grain is general purpose
–fine grain is slow and area-inefficient, but offers high parallelism
–coarse grain is application domain-specific
–coarse grain is highly area-efficient
–coarse grain: extremely high performance

37 Reconfigurability Overhead [Figure: array of logic blocks (L) and switch blocks (S); regions marked "area used by application" and "resources needed for reconfigurability, partly for configuration code storage"; note: hidden RAM not shown]

38 Principle of a Typical FPGA [Figure; label: FF of hidden RAM]

39 Routing Overhead in FPGAs
–> 1000 transistors at each cross bar
–> 40 transistors at each switching point
–> 15 transistors at each tap
–FF: part of the hidden RAM
–gate count of most FPGA vendors: 1 flipflop of configuration RAM = 4 gates
–Routing congestion [DeHon]: often 50% or less of CLBs used

40 Why Coarse Grain instead of FPGA?
–reduced reconfigurability overhead by up to ~ 1000
–drastically smaller configuration memory
–much faster loading
–a lot more benefits
[Chart, 1980 – 2010, transistors / chip on a log scale: microprocessor, memory, FPGA physical, FPGA routed, FPGA logical, (super)systolic physical and logical; annotated gaps of ~ 10 and ~ 10 000. Sources: Proc. ISSCC, ICSPAT, DAC, DSPWorld]

41 >>> extremely high efficiency
1. avoiding address computation overhead
2. avoiding instruction fetch and interpretation overhead
3. high parallelism, massively multiple deep pipelines
4. much less configuration memory
5. no routing areas to configure functions from CLBs

42 Configurable Computing Systems
–combine a programmable sequential processor with Flexware (structurally programmable hardware): capitalize on the strengths of both flexware and software
–early 1960s: Estrin (UCLA): enabling technology not available
–1990s: significant increase of research activities (DARPA ...)
–FPGAs: not the enabling technology: hardware skills needed
–Verilog or VHDL based systems often result in poor performance

43 Platforms available
Soft Data Path Arrays:
–KressArray
–Xtreme (PACT)
–ACM (Quicksilver Tech)
–CHESS Array (Elixent)
–others
Compilation techniques feasibility studies:
–Partitioning Co-Compiler
–Design Space Explorer
–others

44 Also as an autonomous Machine
The New Machine Paradigm (Xputer) is the counterpart of the so-called von Neumann paradigm.
–CONS: confuses customers (paradigm switch: the brain hurts)
–PROS: strong guidance of EDA tool development
–more effective hardware/software APIs
–compilation techniques similar to traditional compilation
–better Application Development Tools accepting C or Java
Easy to teach: simple machine principles
–scan patterns (data counter) similar to control flow (program counter)
–general model of hardware / software co-design
–fascination for freak effect: opening up a new R&D discipline
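
The analogy between scan patterns (data counter) and control flow (program counter) can be sketched in a few lines. This is an illustration only, not the Xputer implementation: `video_scan` and `run_on_scan` are hypothetical names, and the row-by-row scan is just one assumed pattern.

```python
from typing import Callable, Iterator, List, Tuple


def video_scan(width: int, height: int, step: int = 1) -> Iterator[Tuple[int, int]]:
    """Hypothetical data counter: yields (x, y) data addresses row by row."""
    for y in range(0, height, step):
        for x in range(0, width, step):
            yield (x, y)


def run_on_scan(memory: List[List[int]],
                scan: Iterator[Tuple[int, int]],
                operator: Callable[[int], int]) -> List[int]:
    """Apply one fixed (configured) operator to every word the scan visits."""
    return [operator(memory[y][x]) for (x, y) in scan]


if __name__ == "__main__":
    memory = [[1, 2], [3, 4]]
    print(run_on_scan(memory, video_scan(2, 2), lambda v: v * 10))
```

Where a program counter steps through instructions, the generator steps through data locations; the operator stays fixed for the whole scan, which is the sense in which sequencing moves from instructions to data.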

45 >> Coarse Grain Architectures
–History
–Paradigm Shift
–Coarse Grain: why?
–Coarse Grain Architectures
–Reconfiguration Architecture
http://www.uni-kl.de

46 Some Players in Silicon Valley and ….
[Table, columns: Company; Architecture; Business Model; Markets]
Company: Triscend, Adaptive Silicon, Chameleon Systems, Malleable, Silicon Spice, Systolix, MorphICs
Architecture: System on Chip, Not disclosed, 32 bit datapath array, Not disclosed, Bit Serial Systolic Array, Not disclosed
Business Model: Sell Chips, Sell Cores, Sell Chips, Sell Solutions, Sell Cores
Markets: Embedded Systems, Embedded DSP, Networking, Voice over IP, Networking, Signal Conditioning, Wireless Commun.
Network Processors: > 20 players. Cisco: Xilinx's largest customer.

47 Commercial rDPAs
–XPU family (IP cores), XPU128: PACT Corp., Munich
–flexible array: MorphICs
–CALISTO: Silicon Spice
–CS2000 family: Chameleon Systems
–MECA family: Malleable
–FIPSOC: SIDSA
–ACM: Quicksilver Tech
–CHESS array: Elixent
–MorphoSys: Morpho Tech
*) here at SoC **) bought

48 PACT Corp
–Xtreme Processor Platform (XPP): a family of IP cores; high-speed, data-stream-capable, scalable, reconfigurable clusters of arrays of 32-bit DPUs with embedded memories and high-speed I/O ports
–Application development support software featuring a flow-graph-style algorithm mapping language, to minimize training requirements
–XPP's fabrics feature automatic DataFlow synchronization and a flagged Event Network to dynamically configure the execution flow
–Supports dynamic RTR: hierarchical configuration managers free the designer from chip-level details and ensure that configurations are independently loaded in exactly the intended order
–Automatic event-based task swapping along with data streams: released resources are automatically reconfigured immediately

49 Reconfigurable Interconnect Fabric (RIF) [Figure: rDPA (Reconfigurable Datapath Array) built from rDPUs; one variant with a separate routing area, one with the RIF laid out over the rDPUs so that the rDPA is wired by abutment]

50 Generically defined Fabrics: KressArray Family. Some application areas, e.g. wireless communication, need extraordinarily powerful communication resources.

51 Universal RAs are not always feasible
–... often functional resources are not the throughput bottleneck
–Some application areas, e.g. wireless communication, need extremely rich communication resources
–The general purpose (coarse grain) reconfigurable array may appear to be an illusion ...
–Use domain-specific platform generators!

52 KressArray Family Example [Figure: tailored KressArray rDPU example; rDPU external view, only NNport abutment architecture shown; labels 2, 4, 8, 16, 24, 32] http://kressarray.de

53 KressArray Family generic Fabrics: a few examples
–Select function repertory
–Select Nearest Neighbour (NN) interconnect (an example): select mode, number, width of NNports
–More NNports: rich routing resources
–Examples of 2nd level interconnect: laid out over the rDPU cell, no separate routing areas!
[Figure: rDPU variants "route-through and function" and "route-through only"; labels 2, 4, 8, 16, 24, 32]
http://kressarray.de

54 CMOS interconnect resources. Foundries offer up to 8 metal layers and up to 3 poly layers. The reconfigurable interconnect fabric is laid out over the rDPU cell.

55 Super Pipe Networks. The key is mapping, rather than architecture* *) KressArray [1995]

56 Communication Resource Requirements
–... often functional resources are not the throughput bottleneck
–In some application areas, e.g. wireless communication, reconfigurable computing arrays need extraordinarily rich and powerful communication resources
–The solution: generators for domain-specific RA platforms

57 SNN filter: KressArray Mapping Example. Array size: 10 x 16 = 160 rDPUs. [Figure legend: route-through only; not used; backbus connect] http://kressarray.de

58 Xplorer Plot: SNN Filter Example. 3 vert. NNports, 32 bit; 2 hor. NNports, 32 bit. [Figure legend: operator; result; operand; route through; route-through-only rDPU; backbus connect] http://kressarray.de

59 Super Pipe Networks. The key is mapping, rather than architecture* *) KressArray [ASP-DAC-1995]

60 KressArray: try it out yourself!
–You may experiment yourself
–You may use it over the internet
–Map an application onto a KressArray
–Start with a simple example
–Visit http://kressarray.de
–Click the link to Xplorer
... does not run on Internet Explorer ... since Bill Gates does not like Java; try Netscape 4.7x

61 Michael Herz
Dissertation Michael Herz:
–... on mapping parallel memory architectures for stream-based arrays onto KressArrays
–... also transformation of storage schemes to optimize memory bandwidth (MoM scan pattern transformations)
Agilent, Sindelfingen

62 Ulrich Nageldinger
Dissertation Ulrich Nageldinger:
–... on mapping applications onto KressArrays
–... simultaneous routing and placement by simulated annealing
–supporting a huge family of KressArrays
–fuzzy logic improvement proposal generator
–profiling
–design space exploration
Infineon Technologies, Munich
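
As a rough illustration of mapping by simulated annealing, the sketch below places the nodes of a small dataflow graph onto a grid of cells. It is not the KressArray mapper: all names are hypothetical, it performs placement only, and it uses total Manhattan wire length as a stand-in for routing cost, whereas the dissertation describes simultaneous routing and placement.

```python
import math
import random


def wire_length(placement, edges):
    """Total Manhattan distance over all edges; a simple proxy for routing cost."""
    return sum(abs(placement[a][0] - placement[b][0]) +
               abs(placement[a][1] - placement[b][1]) for a, b in edges)


def anneal_placement(nodes, edges, rows, cols, steps=2000, t_start=2.0, seed=0):
    """Place nodes on a rows x cols grid (needs at least two cells)."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    assert len(nodes) <= len(cells), "grid too small for the graph"
    rng.shuffle(cells)
    # slots[i] is the node sitting on cells[i], or None for an unused cell.
    slots = list(nodes) + [None] * (len(cells) - len(nodes))

    def as_placement(s):
        return {n: cells[i] for i, n in enumerate(s) if n is not None}

    current = wire_length(as_placement(slots), edges)
    best_slots, best = list(slots), current
    for step in range(steps):
        temp = t_start * (1.0 - step / steps) + 1e-9  # linear cooling schedule
        i, j = rng.sample(range(len(slots)), 2)
        slots[i], slots[j] = slots[j], slots[i]       # propose a swap
        cost = wire_length(as_placement(slots), edges)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if cost <= current or rng.random() < math.exp((current - cost) / temp):
            current = cost
            if cost < best:
                best_slots, best = list(slots), cost
        else:
            slots[i], slots[j] = slots[j], slots[i]   # reject: undo the swap
    return as_placement(best_slots), best


if __name__ == "__main__":
    nodes = ["a", "b", "c", "d"]
    edges = [("a", "b"), ("b", "c"), ("c", "d")]
    print(anneal_placement(nodes, edges, rows=2, cols=2))
```

Swapping with empty cells lets nodes migrate across the array; a real mapper would also have to account for the limited number and width of NNports per cell.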

63 Rainer Kress
Dissertation Rainer Kress:
–... on mapping applications onto his* KressArray
–DPSS: datapath synthesis system
–including a data scheduler (data stream scheduler)
–generalization of the systolic array (the KressArray is a super systolic array)
–32 bit design via Eurochip support
Infineon Technologies, Munich

64 Jürgen Becker
Dissertation Jürgen Becker:
–... automatically partitioning Co-compiler (configware / software co-compilation)
–resource-parameter-driven, retargetable
–profiler-driven optimization
–accepts HLL ALE-X (extended C subset; pointers not supported)
Professor at Univ. Karlsruhe

65 Karin Schmidt
Dissertation Karin Schmidt:
–Compilation Techniques for Xputers
–modified loop transformations
–modified parts of the implementation used for Jürgen Becker's Ph.D. thesis
DaimlerChrysler Research

66 CHESS Array w. embedded RAM (Elixent). Multi-granular: e.g. 16 * 4 bits = 64 bits. [Figure: array with embedded RAM blocks; labels: User Registers, Clock, Control, Memory Interface, ALU, 16 by 4 RAM, Sequencer, State Machine]
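
The "16 * 4 bits = 64 bits" note can be illustrated by chaining 4-bit adder slices through their carries. This is a sketch of the multi-granular idea only; whether CHESS composes its ALUs in exactly this way is an assumption, and both function names are hypothetical.

```python
def add4(a: int, b: int, carry_in: int = 0):
    """One 4-bit slice: returns (4-bit sum, carry out)."""
    total = (a & 0xF) + (b & 0xF) + carry_in
    return total & 0xF, total >> 4


def add_wide(a: int, b: int, slices: int = 16) -> int:
    """Chain `slices` 4-bit slices into one adder of 4 * slices bits."""
    result, carry = 0, 0
    for i in range(slices):
        nibble, carry = add4(a >> (4 * i), b >> (4 * i), carry)
        result |= nibble << (4 * i)
    # The carry out of the top slice is dropped, so the sum wraps around.
    return result


if __name__ == "__main__":
    print(hex(add_wide(0xFFFF, 1)))
```

With 16 slices the chain behaves like a 64-bit adder; with 4 slices the same cells would form a 16-bit adder, which is what makes the granularity selectable.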

67 Chameleon Systems
–A RISC processor and an array of 108 arithmetic processing units; each of those 32-bit processing cores runs at 125 MHz
–The CS2112 is the industry's first Reconfigurable Communications Processor (RCP), a streaming data processor
–The vendor claims a performance of 20 billion 16-bit operations per second and 2.4 billion 16-bit multiply-accumulates per second, and 1.6 GBytes / sec for its programmable I/O (PIO) banks
–It also has a PCI interface
–Tool suite C~SIDE for developing, verifying and optimizing

68 Coarse Grain Architectures

69 Primarily Mesh-based ….

70 UC Berkeley (Jan Rabaey)

71 Crossbar-based Architectures
–1990: UC Berkeley (Jan Rabaey), 16 bit
–1993: PADDI-II (Jan Rabaey)
–1997: Pleiades (mesh & crossbar), 32 bit

72 PADDI-II Architecture

73 MorphoSys

74 PipeRench Architecture (CMU 1998)
–highly dynamic reconfiguration
–alternating data/instruction stream

75 M.I.T.
–RAW (M.I.T. 1997): Reconfigurable Architecture Workbench
–MATRIX (1996): Multiple Alu archiTecture with Reconfigurable Interconnect eXperiment; 0.5 CMOS, 8 bit, 10 x 10, 1.8 mm², 100 MHz, multi-granular
[Figure labels: MIPS-like processor core, cross bar, global lines; BFU with 8 bit ALU, 256x8 bit Mem, WE, mode, Network Port A, Network Port B, Mem Func Port, ALU Func Port, compare / reduce 1 and 2 (C / R Network), Level-1 Network]
[Table: BFU opcodes 0 – 15 and their operations; operations shown include ×, +, const, insh, nsh, dsh, csh, +0, +1, :=, nand, nor, xor]

76 MATRIX Interconnect Fabrics [Figure: a BFU and its neighbour BFUs] Communication resources are often the bottleneck.

77 More Research Projects (published between 1996 - 2000)
–Garp (UC Berkeley)
–RaPiD (U. Washington)
–REMARC (Stanford)
–DReAM (U. Karlsruhe)
–... and others
Asia / Pacific: also see the embedded tutorials by Prof. Amano (ASP-DAC '99, FPL-2000)

78 RaPiD Architecture

79 REMARC

80 Future Coarse Grain RA Development. It is indispensable to operate within the convergence area of compilers, co-compilers, architecture, and full-custom-style VLSI design (array cells). It is a must that products come with a development platform which encourages users, especially those with a limited hardware background.

81 >> Reconfiguration Architecture
–History
–Paradigm Shift
–Coarse Grain: why?
–Coarse Grain Architectures
–Reconfiguration Architecture
http://www.uni-kl.de

82 Dimensions of Reconfigurability. Extremes: ASIPs* vs. Network Processors. [Figure: configuration time axis with design time, fabrication time, compile time, run time, ranging from statically reconfigurable to dynamically reconfigurable; ASIP and Network Processor marked] *) Application-Specific Instruction set Processors

83 Configuration Architectures (dynamic vs. static)
–straightforward: host (Compiler, Mapper, RTOS etc.), Soft Data Path, RAM
–multi-context: host (Compiler, Mapper, RTOS etc.), Soft Data Path, RAM
–configuration caching*: host (Compiler, Mapper, RTOS etc.), Config. Cache, RAM, Soft Data Path, RAM
Configuration loading resources:
–separate configuration fabrics (e.g. FPGA)
–wormhole routing (KressArray, Colt, PipeRench)
–RA part computes code for other RA part (self reconfiguration)
*) no cache as usual!
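
The difference between the straightforward and the multi-context scheme can be sketched as follows. This is a toy model under stated assumptions, not any vendor's mechanism: the class and its methods are hypothetical, a configuration is represented by an arbitrary Python object, and the only cost tracked is the number of downloads from the host.

```python
class MultiContextArray:
    """Toy model: several configurations are held on chip, one is active."""

    def __init__(self, contexts: int):
        self.contexts = [None] * contexts  # on-chip configuration planes
        self.active = 0
        self.host_loads = 0                # downloads from the host (slow path)

    def load(self, slot: int, config) -> None:
        """Download a configuration from the host into one context plane."""
        self.contexts[slot] = config
        self.host_loads += 1

    def switch(self, slot: int):
        """Activate an already loaded context without involving the host."""
        if self.contexts[slot] is None:
            raise ValueError(f"context {slot} has not been loaded")
        self.active = slot
        return self.contexts[slot]


if __name__ == "__main__":
    array = MultiContextArray(contexts=2)
    array.load(0, "fir")
    array.load(1, "fft")
    for slot in (1, 0, 1, 0):
        array.switch(slot)
    print(array.host_loads)
```

In the straightforward scheme every change of function would go through `load`; with multiple contexts, alternating between functions that are already on chip costs no further host downloads.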

84 Colt Architecture (P. Athanas 1996)
–studying highly dynamic reconfiguration
–wormhole routing

85 Schedule
08.30 – 10.00 Reconfigurable Computing (RC)
10.00 – 10.30 coffee break
10.30 – 12.00 Compilation Techniques for RC
12.00 – 14.00 lunch break
14.00 – 15.30 Resources for Stream-based RC
15.30 – 16.00 coffee break
16.00 – 17.30 FPGAs: recent developments

86 - END -

