From Organic Computing to Reconfigurable Computing. Reiner Hartenstein, TU Kaiserslautern. PASA, Frankfurt, March 16, 2006.

Presentation transcript:


© 2005, TU Kaiserslautern 2 Reconfigurable Computing (RC) and FPGA* in the media. Design starts until 2010: from 80,000 to 110,000 [Dataquest]. June 2005. Fastest growing segment of the semiconductor market: ~6 billion US-$ [Dataquest]. Google: 10 million hits. *) Field-Programmable Gate Array

3 The Pervasiveness of RC. [Figure: number of hits by Google for the search “FPGA and ….”, one bar per application area; the individual values are garbled in this transcript.]

4 >> Outline << Reconfigurable Computing Paradox; Von Neumann losing its dominance; Software vs. Configware; The dual paradigm approach; Coarse-grained Reconfigurable Devices; Conclusions

5 The RC Paradox. The awful technology of FPGAs: FPGAs run at lower clock frequencies, draw more power and are more expensive. Effective integration density is much worse than the Gordon Moore curve, by a factor of more than 10,000. „Very power-hungry“ [Rick Kornfeld*]. Application development: until recently still logic design on a very strange platform. *) personal communication

6 Fine-grained RC: low effective integration density. Immense area inefficiency: reconfigurability overhead, routing congestion, wiring overhead. [Figure: transistors / microchip, comparing the Gordon Moore curve (microprocessor) with FPGA physical, FPGA routed and FPGA logical density; the overhead factor shown as "overhead: >" is lost in this transcript.] [DeHon, Ph.D 1996]

7 Published speed-up factors. Los Alamos traffic simulation: 47; real-time face detection: 6000; video-rate stereo vision: 900; pattern recognition: 730; SPIHT wavelet-based image compression: 457; Smith-Waterman pattern matching: 288; BLAST: 52; protein identification: 40; molecular dynamics simulation: 88; Reed-Solomon decoding: 2400; Viterbi decoding: 400; GRAPE: 20; D FIR filter (no FPGA: DPLA by TU-KL): 39.4; Lee routing (DPLA by TU-KL): 160. Values not recoverable from this transcript: FFT; MAC; Grid-based DRC (no FPGA: DPLA on MoM by TU-KL; MoM Xputer architecture); Grid-based DRC („fair comparison“). Application areas shown: DSP and wireless; image processing, pattern matching, multimedia; bioinformatics; astrophysics; crypto. [Chart annotations: relative performance of microprocessor (P4) and memory, with growth rates 7% / yr, 50% / yr and x 2 / yr.]

8 DeHon‘s Law. [Figure: MOPS / milliWatt versus feature size (µ), with curves for RISC and FPGA.]

9 However.... Application migration [from supercomputer] results in a performance increase of up to 4 orders of magnitude. It reduces the electricity bill by an order of magnitude. It hits the memory wall from a different direction. People think that high performance must mean expensive.

10 Why the RC paradigm shift is so important. Move the stool or the grand piano? [Figure labels: by Software; by Configware.]

11 >> Outline << Reconfigurable Computing Paradox; Von Neumann losing its dominance; Software vs. Configware; The dual paradigm approach; Coarse-grained Reconfigurable Devices; Conclusions

12 vN paradigm losing its dominance. Cray XD1: Xilinx inside! [Figure label: Xilinx FPGA.]

13 von Neumann is not the common model. [Diagram: the von Neumann instruction-stream-based machine, a CPU (DPU with program counter) connected to RAM memory through the von Neumann bottleneck.] Mainframe age: CPU (instruction-stream-based, software). Microprocessor age: CPU with co-processors / accelerator (data-stream-based, hardware); the tail is wagging the dog. vN paradigm dominance?

14 Here is the common model. [Diagram: the von Neumann instruction-stream-based machine (CPU: DPU with program counter; RAM memory; von Neumann bottleneck) next to co-processors / accelerator; instruction-stream-based software versus data-stream-based hardware.] Mainframe age: CPU. Microprocessor age: CPU with hardwired accelerator. Configware age: CPU with reconfigurable accelerator (morphware).

15 Here is the common model. [Diagram: the von Neumann instruction-stream-based machine (CPU: DPU with program counter; RAM memory; von Neumann bottleneck) next to co-processors / accelerator; instruction-stream-based software versus data-stream-based hardware.] Mainframe age: CPU. Microprocessor age: CPU with accelerator. Configware age: CPU with reconfigurable accelerator (morphware), served by a software/configware co-compiler.

16 Fundamentally different mind set: non-von-Neumann; no program counter; no instruction fetch at run time; completely different OS principles. It’s configware: it is definitely not software.

17 >> Outline << Reconfigurable Computing Paradox; Von Neumann losing its dominance; Software vs. Configware; The dual paradigm approach; Coarse-grained Reconfigurable Devices; Conclusions

18 Compilation: Software vs. Configware. Software Engineering: source program → software compiler → software code. Configware Engineering: source „program“ → configware compiler, consisting of a mapper (placement & routing) producing configware code and a data scheduler producing flowware code. Source languages shown: C, FORTRAN, MATLAB.

19 Nick Tredennick’s Paradigm Shifts explain the differences. Software Engineering (CPU): resources fixed; algorithm variable (software); 1 programming source needed. Configware Engineering: resources variable (configware); algorithm variable (flowware); 2 programming sources needed.

20 Co-Compilation. Software / Configware Co-Compiler: source (C, FORTRAN, MATLAB) → automatic SW / CW partitioner (simulated annealing) → software compiler → software code; and → configware compiler (mapper, data scheduler) → configware code and flowware code.

21 Organic Computing? Bio-inspired use of FPGAs. The evolvable „hardware“ community is in love with genetic algorithms (crossover of chromosomes): a Darwinian way to fitness through generations of populations; inefficient, but unexpected results are possible. Simulated annealing (genetic morphing), i.e. fitness by synthesis: highly efficient.

22 Software / Configware Co-Compilation: Juergen Becker’s CoDe-X, 1996. C language source → Partitioner (simulated annealing), supported by an Analyzer / Profiler and by Resource Parameters (supporting different platforms). Partitioner → SW compiler → SW code (“vN" machine paradigm). Partitioner → CW compiler → CW code and FW code (Kress/Kung machine paradigm).

23 Co-Compiler for Hardwired Kress/Kung Machine [e. g. Brodersen]. Software / Flowware Co-Compiler: source → automatic SW / CW partitioner → software compiler → software code; and → flowware compiler (data scheduler) → flowware code.

24 >> Outline << Reconfigurable Computing Paradox; Von Neumann losing its dominance; Software vs. Configware; The dual paradigm approach; Coarse-grained Reconfigurable Devices; Conclusions

25 The dual paradigm approach. von Neumann paradigm: Software Engineering (CPU). Kress-Kung paradigm: Configware Engineering (ASM).

26 Data streams (flowware). Flowware defines which data item appears at which time at which port. [Diagram: a DPA (pipe network) fed by input data streams and emitting output data streams; each „data stream“ is drawn as data items x over the axes time and port #.] H. T. Kung paradigm (systolic array): algebraic synthesis algorithms. ASM (Auto-Sequencing Memory): RAM with GAG; ASM implemented by distributed memory.
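The statement that flowware fixes "which data item at which time at which port" can be illustrated with a small sketch. This is not taken from the talk: the names `Slot` and `skewed_schedule` are hypothetical, and the skew rule (item j on port p arrives at time j + p) is only assumed here as a pattern typical of systolic arrays.

```python
from typing import NamedTuple


class Slot(NamedTuple):
    """One flowware entry: which item, at which time, at which port."""
    time: int
    port: int
    item: str


def skewed_schedule(streams):
    """Assign item j of the stream on port p to time step j + p.

    The one-step skew per port is an assumption for illustration,
    not a rule stated on the slide.
    """
    return sorted(
        Slot(time=j + p, port=p, item=item)
        for p, stream in enumerate(streams)
        for j, item in enumerate(stream)
    )


schedule = skewed_schedule([["x0", "x1", "x2"], ["y0", "y1", "y2"]])
for slot in schedule:
    print(f"t={slot.time} port={slot.port} item={slot.item}")
```

The point of the sketch is that the schedule is pure data: nothing in it resembles an instruction stream, which is how the slide distinguishes flowware from software.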

27 DSP platform FPGA [courtesy Xilinx Corp.]. Flexible soft logic architecture (clock rate in MHz lost in this transcript), 200K logic cells; 500 MHz programmable DSP execution units; serial transceivers (rate in Gbps lost in this transcript); 500 MHz PowerPC™ processors (680 DMIPS) with Auxiliary Processor Unit; 1 Gbps differential I/O; 500 MHz multi-port distributed 10 Mb SRAM; 500 MHz DCM digital clock management.

28 Generalization of the systolic array.... Remedy? Discard algebraic synthesis methods [Rainer Kress] and use optimization algorithms instead, for example simulated annealing. The achievement: non-linear and non-uniform pipes, and even wilder pipe structures, are also possible. Now reconfigurability makes sense.

29 >> Outline << Reconfigurable Computing Paradox; Von Neumann losing its dominance; Software vs. Configware; The dual paradigm approach; Coarse-grained Reconfigurable Devices; Conclusions

30 Coarse grain is about computing, not logic. Example: mapping onto rDPA by DPSS, based on simulated annealing: SNN filter on KressArray (mainly a pipe network) [Ulrich Nageldinger]. Array size: 10 x 16 = 160 rDPUs. rDPU: reconfigurable function block, e. g. 32 bits wide; no CPU. [Figure legend: route-through only; not used; backbus connect.]
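How simulated annealing can place operators onto an array of rDPUs can be sketched as follows. This is a generic textbook-style annealer, not DPSS itself: the cost function (total Manhattan wire length), the cooling schedule, and the names `wire_length` and `anneal` are assumptions chosen for illustration.

```python
import math
import random


def wire_length(placement, nets):
    """Total Manhattan distance over all connected operator pairs."""
    return sum(
        abs(placement[a][0] - placement[b][0])
        + abs(placement[a][1] - placement[b][1])
        for a, b in nets
    )


def anneal(ops, nets, rows, cols, steps=5000, t_start=5.0, t_end=0.01, seed=0):
    """Place each operator on its own cell of a rows x cols array."""
    rng = random.Random(seed)
    n_cells = rows * cols
    assert len(ops) <= n_cells
    grid = list(ops) + [None] * (n_cells - len(ops))  # cell index -> operator
    rng.shuffle(grid)

    def to_placement(g):
        return {op: divmod(i, cols) for i, op in enumerate(g) if op is not None}

    cost = wire_length(to_placement(grid), nets)
    best_grid, best_cost = grid[:], cost
    for step in range(steps):
        # Geometric cooling from t_start towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        i, j = rng.sample(range(n_cells), 2)
        grid[i], grid[j] = grid[j], grid[i]  # propose swapping two cells
        new_cost = wire_length(to_placement(grid), nets)
        # Always accept improvements; accept worse moves with Metropolis probability.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best_grid, best_cost = grid[:], cost
        else:
            grid[i], grid[j] = grid[j], grid[i]  # undo the rejected swap
    return to_placement(best_grid), best_cost


# A four-stage pipe: in -> mul -> add -> out, placed on a 3 x 3 array.
ops = ["in", "mul", "add", "out"]
nets = [("in", "mul"), ("mul", "add"), ("add", "out")]
placement, cost = anneal(ops, nets, rows=3, cols=3)
print(placement, cost)
```

Because distinct operators occupy distinct cells, every net has length at least 1, so the cost for this three-net pipe can never fall below 3; a real mapper would also have to model routing resources such as the route-through cells and backbus mentioned on the slide.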

31 Coarse-grained RC: high integration density. The Reconfigurable Computing Paradox. [Figure: transistors / microchip, comparing the Gordon Moore curve with rDPA physical, rDPA logical and FPGA routed density.] [Hartenstein, ISIS 1996]

32 Claassen‘s Law + Hartenstein‘s Amendment. [Figure: MOPS / milliWatt versus feature size (µ), with curves for: hardwired; hardwired and coarse-grained reconf. (rDPA); FPGAs (fine grained reconf.); DSP; instruction set processors; standard microprocessor.]

33 Commercial rDPA example: PACT XPP - XPU128. XPP128 rDPA: full 32 or 24 bit design; working silicon; 2 configuration hierarchies. Evaluation board available, and XDS Development Tool with simulator. [Figure: (r)DPA built from rDPUs, buses not shown; © PACT AG.]

34 >> Outline << Reconfigurable Computing Paradox; Von Neumann losing its dominance; Software vs. Configware; The dual paradigm approach; Coarse-grained Reconfigurable Devices; Conclusions

35 Conclusions. RC is reducing cost without loss of performance and flexibility. FPGAs may be configured like a microprocessor executing C/C++ code. An FPGA can perform a specific algorithm at very high speed. Using a high-level language, the FPGA can be programmed for a wide variety of algorithms without any deep knowledge of the underlying architecture. RC is reducing the electricity bill and the required building floor area. Speed-up factors of up to 4 orders of magnitude have been reported. Compared to ASICs, prototyping time is on the order of hours rather than months, with a cost less than a tenth of that for an ASIC. The personal supercomputer is near.

36 Conclusions (2). We urgently need Reconfigurable Computing education. An update of CS curricula is overdue.

37 END

38 thank you

39 The first archetype machine model: main frame, CPU. Software Industry’s Secret of Success: a simple basic machine paradigm. Instruction-stream-based mind set (“von Neumann”); procedural personalization by compile or assemble; personalization is RAM-based. [Figure label: Software Industry.]

40 An archetype common model is needed. Progress is stalled by the software/configware chasm. A useful simple archetype is not widely accepted. An archetype common model should provide guidance for organizing efficient solutions, make the project manageable, and allow lessons to be shared between applications and between application areas. [Fragment: Configware Industry from the]

41 The 2nd archetype machine model: reconfigurable accelerator. Configware Industry’s Secret of Success: a simple basic machine paradigm. Data-stream-based mind set (“Kress-Kung”); structural personalization by compile; personalization is RAM-based. [Figure label: Configware Industry.]

42 Configware solution: computing in space. For demo: a tiny section of the pipe network [Diagram: an rDPU configured as adder (+) with output S]. Inter-rDPU communication: no memory cycles needed.

43 Compare it to the software solution on a very simple CPU, for S = R + (if C then A else B endif) with C = 1. Steps listed against the columns "memory cycles" and "nano seconds" (the values are lost in this transcript of the slide): if C then read A (read instruction; instruction decoding; read operand*; operate & register transfers); if not C then read B (read instruction; instruction decoding); add & store (read instruction; instruction decoding; operate & register transfers; store result); total. [Diagram: pipe-network section with inputs A, B, R, C, a decision block (=1), an adder (+) and output S; clock 200.]

44 Hypothetical branching example to illustrate software-to-configware migration
S = R + (if C then A else B endif); C = 1; simple conservative CPU example

step                              memory cycles   nano seconds
if C then read A
  read instruction                1               100
  instruction decoding
  read operand*                   1               100
  operate & reg. transfers
if not C then read B
  read instruction                1               100
  instruction decoding
add & store
  read instruction                1               100
  instruction decoding
  operate & reg. transfers
  store result                    1               100
total                             5               500

*) if no intermediate storage in register file
Section of a major pipe network on rDPU [Diagram: inputs A, B, R, C, a decision block (=1), an adder (+) and output S]: clock 200 MHz (5 nanosec), no memory cycles: speed-up factor = 100
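The speed-up factor of 100 follows directly from the slide's numbers, as this short worked calculation shows. It assumes, as the slide does, that only memory cycles cost time on the CPU (100 ns each) and that the rDPU pipe section delivers its result within one 200 MHz clock period; the variable names are chosen here for illustration.

```python
MEMORY_CYCLE_NS = 100   # one memory cycle on the slide's conservative CPU
RDPU_CLOCK_MHZ = 200    # clock of the pipe-network section on the rDPU

# (step, memory cycles) for C = 1; instruction decoding and register
# transfers are listed on the slide without any memory cycle.
cpu_steps = [
    ("read instruction (if C then read A)", 1),
    ("read operand A", 1),
    ("read instruction (if not C then read B)", 1),
    ("read instruction (add & store)", 1),
    ("store result", 1),
]

cpu_cycles = sum(cycles for _, cycles in cpu_steps)
cpu_ns = cpu_cycles * MEMORY_CYCLE_NS
rdpu_ns = 1000 / RDPU_CLOCK_MHZ      # one clock period in nanoseconds
speedup = cpu_ns / rdpu_ns

print(cpu_cycles, cpu_ns, rdpu_ns, speedup)   # 5 500 5.0 100.0
```

The factor therefore comes entirely from avoided memory cycles, not from a faster clock.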

45 The wrong mind set.... „But you can‘t implement decisions!“ Not knowing this solution is a symptom of the hardware / software chasm and the configware / software chasm. S = R + (if C then A else B endif); [Diagram: section of a very large pipe network with inputs A, B, R, C, a decision block (=1) and an adder (+).]
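One way to read the diagram is that the decision becomes a selector sitting in the data path, so both candidate operands are present and no branch instruction is fetched. The sketch below models that reading in software; the function name `pipe_section` and the arithmetic multiplexer are illustrative assumptions, not the rDPU's actual implementation.

```python
def pipe_section(a_stream, b_stream, r_stream, c_stream):
    """Yield S = R + (A if C == 1 else B) for each set of stream items."""
    for a, b, r, c in zip(a_stream, b_stream, r_stream, c_stream):
        sel = int(c == 1)                    # the "=1" decision block
        selected = sel * a + (1 - sel) * b   # multiplexer: pick A or B
        yield r + selected                   # the "+" operator


results = list(pipe_section([1, 2, 3], [10, 20, 30], [100, 100, 100], [1, 0, 1]))
print(results)   # [101, 120, 103]
```

Each output item depends only on the items arriving at the same step, which is why such a section can be laid out as a pipe rather than executed as a sequence of instructions.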

46 The hardware / software chasm. If I use the term "software", a variety of images might appear in the engineering audience's mind. We still have "hardware" engineers and "software" engineers who go to different schools, attend different conferences, avoid each other's cocktail parties, and almost never play on the same volleyball teams at the company picnic. System designers begin to plan their creations around the skill sets and development processes of hardware engineers and software engineers. The two become oil and water.

47 Blurred line between hardware and software: the obfuscation caused by the pervasiveness of softness. The line between "hardware" and "software" is rapidly blurring and even becoming irrelevant from a system design perspective. As this happens, the traditional roles and skill sets of hardware and software engineers are being challenged, and a new generation of designers is emerging as a result.

48 We need Reconfigurable Computing education. We need a unification in dealing with problems which are shared across many different application domains. There is an urgent need to cure severe qualification deficiencies of our graduates. We need new curricula in CS and CE that provide an integrating dual paradigm mind set instead of a vN-only one.

49 Terminology clean-up. Programming sources: Software: for scheduling instruction streams (von Neumann). Flowware: for scheduling data streams; Configware: for configuring morphware (primarily non-von Neumann).

50 Why coarse grain: much more MOPS/milliWatt; much more area-efficient; much less reconfigurability overhead; mind set close to classical computing background. Instead of rLB (~1 bit wide) use rDPU (e. g. 32 bits wide), the reconfigurable Data Path Unit (e. g. rALU); instead of FPGA use rDPA. [Figure labels: rDPU; Reconfigurable Computing (RC).]

51 „Data stream“: an ambiguous definition. Reconfigurable Computing is not instruction-stream-based: it‘s data-stream-based. It‘s different from the operation of the (indeterministic) „dataflow machine“. Another definition also comes from the multimedia area; the usable definition comes from the systolic array area.

52 >> Outline << Reconfigurable Devices; Coarse-grained Reconfigurable Devices; Data-stream-based Computing; The contemporary Common Model; Reconfigurable Supercomputing; Conclusions

53 Why the speed-up, although the FPGA clock is slower by x 3 or even more: decisions without memory cycles or clock cycles; most „data fetch“ without memory cycle (most know-how from the „high level synthesis“ discipline).

54 Data moved around by software, i.e. by memory-cycle-hungry instruction streams which fully hit the memory wall: extremely unbalanced. P&R: move locality of operation, not data! [Figure: CPU, stolen from Bob Colwell.]

55 Replace caches by a 16 x 16 reconfigurable data path array (rDPA), which fits on the same chip. [Figure: CPU with caches, stolen from Bob Colwell.]

56 Similarly skilled. With hardware description languages, hardware engineers had to adopt the methodologies and techniques of software engineers. Increased softness has an impact even on our products themselves. The required skills for your respective jobs are converging (against the grain in an age of increased specialization), and you'll soon be working with (and competing against) a new generation of embedded engineers who are similarly skilled in both disciplines.

57 Using FPGAs. Reducing cost without loss of performance and flexibility. An FPGA may be configured like a general flexible microprocessor executing conventional C/C++ code, and as a highly specific [text missing in this transcript]; field-programmability distinguishes FPGAs from ASICs. An FPGA can perform a specific algorithm at very high speed. Compared to ASICs, prototyping time is on the order of hours rather than months, with a cost less than a tenth of that for an ASIC. Using a high-level language, the FPGA can be programmed for a wide variety of algorithms without any deep knowledge of the underlying architecture.

58 Co-Compiler enabling technology is available from academia. Only a small team is needed for commercial re-implementation. On the road map to the Personal Supercomputer.

59 Conclusions (1). We need a unification in dealing with problems which are shared across many different application domains. RC suffers from fragmentation into different cultures of the many application domains. CS is the only domain qualified for such an effort.

60 Conclusions (2). IEEE Computer Society should advocate improved application development methodologies and a common educational approach useful for the wide variety of application domains. Inside IEEE Computer Society, a TC on RC should lobby for more

61 Conclusions (3). Strategic issue for the entire IEEE Computer Society: reverse the downtrend in CS enrolment; make CS more fascinating; educate not only students …; increase membership.

62 Conclusions (4). The personal supercomputer is near, not only for the desktop, but also as a new road map to large-scale supercomputing at performance levels unthinkable up to now. IEEE-CS should accept this fascinating challenge by spearheading the paradigm shift. IEEE-CS is needed as a translator to explain the impact to managers and to a wide public.

63 RC education, last week at Karlsruhe. 35 submissions from Australia, Brazil, India, USA, and throughout Europe. Attendees declared themselves ready to work for a task force. But education is just one of several facets ……

64 However.... “What did you say again that your company does?” My father posed the question. “Gate arrays,” I replied. “They’re chips used to…” “Oh yes, that’s right, Gatorade.” ….. “I used to give that to my marching band members so they wouldn’t get dehydrated on hot days. Don’t remember it coming in chip form …..” Explain to your grandmother what it means if you’re one of the world’s leading experts on optical proximity correction (OPC) for nanometer-scale semiconductor lithography. Could you perhaps relate it to some difficulty she has with needlepoint and her cataracts? Even those with a scientific or technical background often won’t understand precisely what we do. A PhD in molecular biology won’t help to understand VHDL and Verilog synthesis for FPGAs. Trying to relate DNA sequences to LUT truth tables might offer a starting point, but somebody has to be able to bridge the technology and terminology gap, even to initiate that analogy. Try explaining FPGAs with the consumer electronics approach. “People tend to relate when you tell them what your part goes into. Today, finally, ‘chip’ seems universally understood. I never get people asking about potato chips anymore.”

65 However.... Abstract. Google’s jaw-dropping hit rates illustrate the pervasiveness of Reconfigurable Computing (RC), which has been mainstream in embedded systems for years and is now being adopted by supercomputing (Cray, sgi, etc.). From FPGA usage as accelerators, speed-up factors of up to two orders of magnitude are reported, as well as floor space requirements and electricity invoice amounts reduced by one order of magnitude. About 3 orders of magnitude and more are obtained by using coarse-grained reconfigurable datapath arrays (rDPAs) available from a number of start-ups. This is astonishing, since FPGAs and rDPAs have a substantially lower clock speed than microprocessors. Algorithmic cleverness is the secret of success, based on software-to-configware migration mechanisms, striving away from memory-cycle-hungry instruction-stream-based computing paradigms. The main benefit of RC platforms, which have replaced the use of hardwired accelerators, is their flexibility by non-procedural programmability. This also contributes to those concepts of Organic Computing which rely on processes of evolution, self-organization, adaptation and fault tolerance. The main hurdles on the way to heart-stopping new horizons of cheap highest performance are CS-related educational deficits causing the configware / software chasm and a methodology fragmentation between the different cultures of application domains. Current CS curricula do not sufficiently meet their transdisciplinary responsibility. The talk gives a survey on fundamental issues in RC and on new directions in CS-related curricula, focused on a dual paradigm organic computing approach.

66 However.... Application migration [from supercomputer] results in a performance increase of up to 4 orders of magnitude. It reduces the electricity bill by an order of magnitude: „Saves more than $10,000 in electricity bills per year (7 ¢ / kWh) per 64-processor 19" rack“ [Herb Riley, R. Associates]. It hits the memory wall from a different direction.
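To put the quoted saving into physical terms, the figure can be converted into energy and average power. This is a back-of-the-envelope sketch: it takes the $10,000 as a lower bound (the quote says "more than") and assumes the rack runs continuously all year, which the slide does not state.

```python
SAVING_USD_PER_YEAR = 10_000   # lower bound quoted on the slide
PRICE_USD_PER_KWH = 0.07       # 7 cents per kWh, as quoted
HOURS_PER_YEAR = 24 * 365      # assumes continuous operation

energy_kwh = SAVING_USD_PER_YEAR / PRICE_USD_PER_KWH
avg_kw = energy_kwh / HOURS_PER_YEAR

print(f"{energy_kwh:,.0f} kWh per year, about {avg_kw:.1f} kW on average")
```

Under these assumptions the saving corresponds to roughly 143,000 kWh per year, or a bit over 16 kW of continuous power per rack.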

68 Conclusions. IEEE Computer Society should advocate introducing a dual paradigm approach, away from the monopoly of the vN mind set. IEEE Computer Society should advocate a common model useful for the wide variety of application domains.

69 Conclusions. We need a unification in dealing with problems which are shared across many different application domains. RC suffers from fragmentation into different cultures of the many application domains. Each domain uses its own trick box. We should teach the world to think outside the box. CS is the only domain qualified for this unification.

70 An archetype common model is needed. [Fragment: Configware Industry from the] IEEE Computer Society should advocate introducing a dual paradigm transdisciplinary education: by using Configware Engineering as the counterpart of Software Engineering; by new curricula in CS and CE that provide an integrating dual paradigm mind set; and by supporting a unification in dealing with problems which are shared across many different application domains, to cure severe qualification deficiencies of our graduates.