(keynote) (from HPC to) New Horizons of Very High Performance Computing (VHPC): Hurdles and Chances Reiner Hartenstein TU Kaiserslautern Rhodes Island,


1 (keynote) (from HPC to) New Horizons of Very High Performance Computing (VHPC): Hurdles and Chances Reiner Hartenstein TU Kaiserslautern Rhodes Island, Greece, April 25-26, 2006

2 © 2006, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern. Reconfigurable Supercomputing (VHPC) going commercial: Cray XD1, Silicon Graphics RASC, … and other vendors. … It's a paradigm shift!

3 The Pervasiveness of RC. [Figure: Google hit counts for the ECE-savvy scene and the Math/SW-savvy scene (unqualified for RC?). # of hits by Google: 162,000; 127,000; 158,000; 113,000; 171,000; 194,000. # of hits by Google for “FPGA and ….”: 1,620,000; 915,000; 398,000; 272,000; 647,000; 1,490,000.]

4 Methodology? World-wide a mass movement: reminds me of the mass migration of lemmings. Terminology chaos; not really a sense of direction; an urgent need to get organized.

5 >> Outline << Reconfigurable Computing Paradox; The Supercomputing Paradox; We are using the wrong model; Coarse-grained Reconfigurable Devices; Super Pentium for Desktop Supercomputer. http://www.uni-kl.de

6 The Reconfigurable Computing Paradox. Poor FPGA technology: very poor effective integration density; „very power-hungry“ [Rick Kornfeld*]; lower clock frequencies, and more expensive. Poor tools: very poor application development support; languages and tools unacceptable for software people; most hardware experts (86%**) hate their tools. Poor education: RC education extremely poor, or none; ignored by CS curricula (… teach like for a 50 year old mainframe …). However, brilliant results everywhere. What paradox? *) personal communication **) DeHon ‘98

7 Education? Computing Curricula 2004 (Joint Task Force) fully ignores Reconfigurable Computing. FPGA & synonyms: 0 hits, not even here (Google: 10 million hits).

8 Completed? Computing Curricula v.2005: no changes other than „… FPGA, etc.“ (not really mentioning that it's missing). Task force activity completed? Next task force in 2020 or later?

9 Tools? End of this week: brainstorming session at DARPA (urgently needed, overdue!).

10 Fine-grained RC: 1st DeHon's Law. Technology: reconfigurability overhead, routing congestion, wiring overhead; overhead: >> 10 000; immense area inefficiency [1996: Ph. D, MIT]. [Figure: transistors / microchip, 10^0 to 10^9, versus year, 1980 to 2010. Curves: microprocessor (Gordon Moore curve); density: FPGA physical, FPGA routed, FPGA logical.]

11 FPGA published speed-up factors (http://xputers.informatik.uni-kl.de/faq-pages/fqa.html). [Figure: relative performance, 10^0 to 10^9, versus year, 1980 to 2010. Curve labels: microprocessor relative performance (8080 to Pentium 4), memory, FPGA, pre-FPGA era (DPLA on MoM Xputer architecture). Growth-rate labels: X 2 / yr (FPGA), 50% / yr, 7% / yr, x1.25 / yr (Moore). Group labels: DSP and wireless; image processing, pattern matching, multimedia; bioinformatics; astrophysics; crypto. Data points as extracted (application, speed-up factor): Los Alamos traffic simulation 47; real-time face detection 6000; video-rate stereo vision 900; pattern recognition 730; SPIHT wavelet-based image compression 457; Smith-Waterman pattern matching 288; BLAST 52; protein identification 40; molecular dynamics simulation 88; Reed-Solomon decoding 2400; Viterbi decoding 400; FFT 100; MAC 1000; Grid-based DRC (no FPGA: DPLA on MoM by TU-KL) 2000; 2-D FIR filter [TU-KL] 39.4; Lee routing (by TU-KL) 160; Grid-based DRC („fair comparison“) 15000; GRAPE 20; crypto 1000.]

12 Pre-FPGA era: why DPLA* was so good. Large arrays of canonical Boolean expressions; PLA layout similar to RAM / ROM layout; close to Moore because of small overhead (wiring, programmability, routing). Mid-'80s: first very tiny FPGAs available. GAG: Generic Address Generator, to avoid address computation overhead. ASM: Auto-Sequencing Memory [M. Herz et al.: ICECS 2003, Dubrovnik]. *) designed by TU-KL, fabricated by E.I.S., German multi-university project

13 Data Counter instead of Program Counter (anti-von-Neumann machine paradigm). Generalization of the DMA data counter**. ASM: Auto-Sequencing Memory (diagram labels: GAG, RAM). GAG & enabling technology: published 1989 [by TU-KL]; survey paper: [M. Herz et al.*: IEEE ICECS 2003, Dubrovnik]. Storage scheme optimization methodology, etc. *) IMEC & TU-KL **) patented by TI, 1995
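
The data-counter idea on this slide can be illustrated with a small sketch. The function name `generic_address_generator` and its parameters (a base address plus a count and stride per loop level) are assumptions made for illustration, not the actual register set of the GAG; the point is only that the address sequence comes from a dedicated sequencer rather than from address-computation instructions executed by a CPU.

```python
from typing import Iterator, Sequence, Tuple


def generic_address_generator(
    base: int, levels: Sequence[Tuple[int, int]]
) -> Iterator[int]:
    """Yield memory addresses for a nested scan pattern.

    `levels` lists (count, stride) pairs from the outermost to the
    innermost loop level. Hypothetical model: a hardware address
    generator would hold such values in configuration registers and
    step its data counter without any instruction fetch.
    """
    counters = [0] * len(levels)
    while True:
        # Current address = base + sum of counter * stride per level.
        yield base + sum(c * s for c, (_, s) in zip(counters, levels))
        # Advance the innermost counter and carry outward on overflow.
        level = len(levels) - 1
        while level >= 0:
            counters[level] += 1
            if counters[level] < levels[level][0]:
                break
            counters[level] = 0
            level -= 1
        if level < 0:
            return  # every level wrapped around: scan complete


# Scan a 3 x 4 window inside a row-major image that is 16 words wide.
window = list(generic_address_generator(base=100, levels=[(3, 16), (4, 1)]))
print(window)
```

The window scan above visits addresses 100 to 103, then 116 to 119, then 132 to 135, which is the kind of sequence a software loop would otherwise compute with extra instructions per access.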

14 Thousands or millions of $ for free. Application migration [from supercomputer] results not only in massive speed-ups: electricity bills are reduced by an order of magnitude and even more, so you may get for free up to millions of dollars per year (also a matter of national energy policy). Google; Amsterdam; NY

15 Reconfigurable Scientific Computing. How do software types program the FPGAs? By hiring a good student from the EE Dept.? Because of missing RC education: far away from optimum solutions? Much higher speed-up achievable? 1 or 2 more orders of magnitude? 100,000? 1,000,000?

16 By education: better speed-up factors? [Figure: the FPGA published speed-up chart of slide 11 (http://xputers.informatik.uni-kl.de/faq-pages/fqa.html), repeated with the annotation: tools & edu available?]

17 >> Outline << Reconfigurable Computing Paradox; The Supercomputing Paradox; We are using the wrong model; Coarse-grained Reconfigurable Devices; Super Pentium for Desktop Supercomputer. http://www.uni-kl.de

18 ISC2006 BoF Session: Title and Abstract. Is Reconfigurable Computing the Next Generation Supercomputing? Advances in reconfigurable computing, particularly FPGA (field-programmable gate array) technology, have reached a performance level where they rival and exceed the performance of general purpose processors for the right applications. FPGAs have gotten cheaper thanks to smaller geometries, multimillion gate counts and volume market leverage from ASIC preproduction and other conventional uses. The potential benefit from the widespread incorporation of FPGA technology into high-performance applications is high, provided present day barriers to their incorporation can be overcome. This session will focus on defining the anticipated market changes, anticipated roles of FPGA technology in high-performance computing (from accelerators to hybrid architectures), characterizing present day barriers to the incorporation of FPGA technology (such as identifying the right applications), and partnering efforts required (tools, benchmarks, standards, etc.) to speed the adoption of reconfigurable technology in high-performance supercomputing.
Keywords: Reconfigurable computing, FPGA Accelerators, Supercomputing.
Date and Time: This BoF session is part of the conference program and will take place within a 45 minute-slot on Wednesday 28. June 2006 from 18:00 - 19:30.
BoF Organizers: John Abott, Chief Analyst, The 451 Group, USA. Dr. Joshua Harr, CTO, Linux Networx, USA. Dr. Eric Stahlberg, Organizing founder OpenFPGA, Ohio Supercomputer Center (OSC), USA.
As CTO for Linux Networx, Dr. Joshua Harr has the responsibility of laying the technical roadmap for the company and is leading the team developing cluster management tools. Josh's experience with parallel processing, distributed computing, large server farms, and Linux clustering began when he built an eight-node cluster system out of used components while in college. An industry expert, Josh has been called upon to consult with businesses and lecture in college classrooms. He earned a Ph.D. in computational chemistry and a bachelor's degree in molecular biology from BYU.
The Supercomputing Paradox. Growing listed Teraflops; often limited sustained Teraflops; almost stalled application implementation progress; increasing number of processors running in parallel; COTS processor decreasing cost; very high total cost of the Tera(?)flops. Promising technology, poor results. Scientists waiting for affordable compute capacity. The Law of More.

19 >> Outline << Reconfigurable Computing Paradox; The Supercomputing Paradox; We are using the wrong model; Coarse-grained Reconfigurable Devices; Super Pentium for Desktop Supercomputer. http://www.uni-kl.de

20 Why traditional supercomputing / HPC failed: instruction-stream-based: memory-cycle-hungry; the wrong way, how the data are moved around; because of the wrong multi-core interconnect architecture; extremely unbalanced (stolen from Bob Colwell). [Diagram label: CPU]

21 Moving data around inside the Earth Simulator: 5120 processors, 5000 pins each; ES 20: TFLOPS; crossbar weight: 220 t, 3000 km of thick cable.

22 Bringing together data and processor. By software: moving data to the processor (moving the grand piano).

23 >> Outline << Reconfigurable Computing Paradox; The Supercomputing Paradox; We are using the wrong model; Coarse-grained Reconfigurable Devices; Super Pentium for Desktop Supercomputer. http://www.uni-kl.de

24 Coarse-grained RC: Hartenstein's Law. Area efficiency very close to Moore's law [1996: ISIS, Austin, TX], e.g. KressArray family. [Figure: transistors / microchip, 10^0 to 10^9, versus year, 1980 to 2010. Curves: Gordon Moore curve, rDPA physical, rDPA logical, FPGA routed; gap label: >> 10 000.]

25 Higher speed-up factors by coarse-grained? [Figure: the FPGA published speed-up chart of slide 11 (http://xputers.informatik.uni-kl.de/faq-pages/fqa.html), repeated with the annotation: Coarse-grained arrays?]

26 Coarse grain is about computing, not logic. SNN filter on KressArray (mainly a pipe network) [Ulrich Nageldinger]; array size: 10 x 16 = 160 rDPUs. rDPU: reconfigurable Data Path Unit, e.g. 32 bits wide; no CPU. Legend: route-through only; not used; backbus connect.

27 SW to coarse-grained CW migration example. [Diagram labels: rDPU, S, +]

28 Compare it to software solution on CPU: S = R + (if C then A else B endif); on a very simple CPU, C = 1. Table columns: memory cycles, nano seconds. Steps: if C then read A (read instruction; instruction decoding; read operand*; operate & register transfers); if not C then read B (read instruction; instruction decoding); add & store (read instruction; instruction decoding; operate & register transfers; store result); total. [Diagram labels: A, B, R, C, =1, +, S; clock 200]

29 Hypothetical branching example to illustrate software-to-configware migration: S = R + (if C then A else B endif);
Simple conservative CPU example, C = 1:
step                            memory cycles   nano seconds
if C then read A
  read instruction              1               100
  instruction decoding
  read operand*                 1               100
  operate & reg. transfers
if not C then read B
  read instruction              1               100
  instruction decoding
add & store
  read instruction              1               100
  instruction decoding
  operate & reg. transfers
  store result                  1               100
total                           5               500
*) if no intermediate storage in register file
Data path [diagram labels: A, B, R, C, =1, +, S]: clock 200 MHz (5 nanosec), no memory cycles: speed-up factor = 100
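
The arithmetic behind the speed-up factor on this slide can be retraced in a few lines. This is only a sketch that takes the slide's figures at face value (five memory cycles of 100 ns each on the CPU, one 200 MHz clock period for the data path); the dictionary keys are paraphrased step names, not identifiers from the original.

```python
MEMORY_CYCLE_NS = 100.0   # duration of one memory cycle, from the slide
CLOCK_MHZ = 200.0         # data path clock, from the slide

# Steps of the CPU version that cost a memory cycle (one cycle each).
cpu_memory_cycles = {
    "read instruction (read A)": 1,
    "read operand": 1,
    "read instruction (read B)": 1,
    "read instruction (add & store)": 1,
    "store result": 1,
}

cpu_time_ns = sum(cpu_memory_cycles.values()) * MEMORY_CYCLE_NS
datapath_time_ns = 1_000.0 / CLOCK_MHZ   # one clock period in nanoseconds
speed_up = cpu_time_ns / datapath_time_ns

print(cpu_time_ns, datapath_time_ns, speed_up)  # 500.0 5.0 100.0
```

The decoding and register-transfer steps carry no entry in the slide's table, so they are left out here as well; counting them would only increase the CPU time.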

30 Why the speed-up? What's the difference? Moving the locality of operation into the route of the data stream by P&R, instead of moving data by instruction streams.

31 The wrong mind set: „but you can't implement decisions!“ S = R + (if C then A else B endif); is a decision inside a section of a very large pipe network [diagram labels: A, B, R, C, =1, +; legend: route-through only, not used, backbus connect; Ulrich Nageldinger]. Not knowing this solution is a symptom of the hardware / software chasm and the configware / software chasm. We need Reconfigurable Computing Education.
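
How a decision becomes part of a pipe network can be sketched in software. In this minimal model, which is an illustration and not the KressArray tool flow, the branch is replaced by a multiplexer that selects between two data items, and the function names `mux` and `pipe` are hypothetical.

```python
from typing import Iterable, Iterator


def mux(c: int, a: int, b: int) -> int:
    """2-to-1 multiplexer: the 'decision' is plain data selection."""
    return a if c else b


def pipe(r_stream: Iterable[int], c_stream: Iterable[int],
         a_stream: Iterable[int], b_stream: Iterable[int]) -> Iterator[int]:
    """Model of the pipe section S = R + (if C then A else B endif).

    Each loop iteration stands for one clock tick in which one item per
    input stream passes through the placed operators (mux, then adder).
    """
    for r, c, a, b in zip(r_stream, c_stream, a_stream, b_stream):
        yield r + mux(c, a, b)


s_stream = list(pipe(r_stream=[10, 20, 30], c_stream=[1, 0, 1],
                     a_stream=[1, 2, 3], b_stream=[7, 8, 9]))
print(s_stream)  # [11, 28, 33]
```

No instruction is fetched per data item in this model: the operators stay where they are and the data flow through them, which is the contrast the slide draws with the CPU version.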

32 The new paradigm: how the data are traveling. Not transport-triggered (old hat): vN Move Processor, instruction-driven [Jack Lipovski, EUROMICRO, Nice, 1975]. No, not by instruction execution: pipeline, or chaining; super systolic array (DPU). P&R: move locality of operation, not data!

33 „Data streams“ (pipe network): H. T. Kung paradigm (systolic array). Define: ... which data item at which time at which port. [Diagram: a DPA with an input data stream and output data streams, each drawn as a plot of time versus port #.] Implemented by distributed memory: ASM (Auto-Sequencing Memory; diagram labels: data counter, GAG, RAM). 50 & more on-chip ASM are feasible.
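
The phrase "which data item at which time at which port" can be made concrete with a small schedule table. The sketch below assumes, purely for illustration, that row i of a matrix enters through port i and that each port starts one time step later than the previous one, a skew commonly used to feed systolic arrays; the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def skewed_schedule(matrix: List[List[int]]) -> Dict[Tuple[int, int], int]:
    """Map (time step, port number) -> data item for a set of input streams.

    Assumption for illustration: row i is fed to port i, delayed by i
    time steps relative to port 0.
    """
    schedule = {}
    for port, row in enumerate(matrix):
        for k, item in enumerate(row):
            schedule[(port + k, port)] = item
    return schedule


def render_schedule(schedule: Dict[Tuple[int, int], int]) -> List[str]:
    """Render one text row per port; '-' marks a slot without a data item."""
    last_time = max(t for t, _ in schedule)
    last_port = max(p for _, p in schedule)
    rows = []
    for port in range(last_port + 1):
        cells = [str(schedule.get((t, port), "-"))
                 for t in range(last_time + 1)]
        rows.append(" ".join(cells))
    return rows


schedule = skewed_schedule([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
for line in render_schedule(schedule):
    print(line)
```

Printed with time running left to right, the three ports show the staggered pattern of the slide's diagram: `1 2 3 - -`, `- 4 5 6 -`, `- - 7 8 9`.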

34 The Generalization of the Systolic Array [R. Kress]. Kung paradigm (systolic array): algebraic synthesis methods, only for applications with regular data dependencies. Remedy? Discard algebraic synthesis methods and use optimization algorithms, e.g. simulated annealing. Achievement: also non-linear and non-uniform pipes, and even more wild pipe structures possible; reconfigurability makes sense. Kress-Kung paradigm: super systolic array.
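
A minimal sketch of placement by simulated annealing follows. It is not the mapper used for the KressArray: the grid model, the Manhattan wire-length cost, the linear cooling schedule and all names (`wire_length`, `anneal`, the example operators) are assumptions chosen to keep the example short.

```python
import math
import random
from typing import Dict, List, Tuple

Pos = Tuple[int, int]


def wire_length(place: Dict[str, Pos], nets: List[Tuple[str, str]]) -> int:
    """Total Manhattan distance over all operator-to-operator connections."""
    return sum(abs(place[a][0] - place[b][0]) + abs(place[a][1] - place[b][1])
               for a, b in nets)


def anneal(ops: List[str], nets: List[Tuple[str, str]], rows: int, cols: int,
           steps: int = 5000, seed: int = 0) -> Tuple[Dict[str, Pos], int, int]:
    """Place operators on a rows x cols grid by simulated annealing.

    Requires len(ops) <= rows * cols.
    Returns (best placement, initial cost, best cost).
    """
    rng = random.Random(seed)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    rng.shuffle(cells)
    place = dict(zip(ops, cells))          # random initial placement
    free = cells[len(ops):]                # unused grid cells
    cost = initial = wire_length(place, nets)
    best, best_cost = dict(place), cost
    for step in range(steps):
        temp = max(1e-3, 5.0 * (1.0 - step / steps))   # linear cooling
        a = rng.choice(ops)
        old = place[a]
        # Move: jump to a free cell, or swap with another operator.
        if free and rng.random() < 0.5:
            i = rng.randrange(len(free))
            place[a], free[i] = free[i], old
            undo = ("free", i)
        else:
            b = rng.choice(ops)
            place[a], place[b] = place[b], old
            undo = ("swap", b)
        new_cost = wire_length(place, nets)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost                # accept (sometimes uphill)
            if cost < best_cost:
                best, best_cost = dict(place), cost
        elif undo[0] == "free":            # reject: restore previous state
            free[undo[1]], place[a] = place[a], old
        else:
            place[undo[1]], place[a] = place[a], old
    return best, initial, best_cost


# Operators and connections of S = R + (if C then A else B endif).
ops = ["A", "B", "C", "R", "mux", "add", "S"]
nets = [("A", "mux"), ("B", "mux"), ("C", "mux"),
        ("mux", "add"), ("R", "add"), ("add", "S")]
best, initial_cost, best_cost = anneal(ops, nets, rows=3, cols=3)
print(initial_cost, best_cost)
```

Because the search only evaluates a cost function over candidate placements, it does not require the regular data dependencies that algebraic synthesis methods depend on, which is the point the slide makes about irregular pipe structures.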

35 >> Outline << Reconfigurable Computing Paradox; The Supercomputing Paradox; We are using the wrong model; Coarse-grained Reconfigurable Devices; Super Pentium for Desktop Supercomputer. http://www.uni-kl.de

36 Here is the common model. Instruction-stream-based: CPU, software code. Data-stream-based: hardwired accelerator; reconfigurable accelerator, configware code. It's not von Neumann: the vN monopoly in our curricula is severely harmful (the tail is wagging the dog). We need dual paradigm education.

37 A potential Pentium successor: the desk-top supercomputer! Discard most caches; have 64* cores, 0.5 - 1 GHz, with clever interconnect for concurrent processes and for multithreading (CPU mode) and for a Kung-Kress pipe network (DPU mode). *) CPU mode / DPU mode capability

38 “Super Pentium” configuration example. [Diagram labels: rDPU, CPU]

39 World TV & game console & multimedia center: a single device handles all modes. E.g. ~ 8 x 8 rDPA: all feasible under 500 MHz. Block diagram labels: rDPA, SMeXPP, camera, baseband processor, radio interface, audio interface, SD/MMC cards, LCD display; games, music, videos. Requirements: variable resolutions and refresh rates; variable scan mode characteristics; noise reduction and artifact removal; high performance requirements; variable file encoding formats; variable content security formats; variable displays; luminance processing; detail enhancement; color processing; sharpness enhancement; shadow enhancement; differentiation; programmable de-interlacing heuristics; frame rate detection and conversion; motion detection & estimation & compensation; different standards (MPEG2/4, H.264). http://pactcorp.com

40 Dual Paradigm Application Development. High level language; software/configware co-compiler. Instruction-stream-based: CPU, software code. Data-stream-based: hardwired accelerator; reconfigurable accelerator, configware code.

41 Software / Configware Co-Compilation: Juergen Becker’s CoDe-X, 1996. C language source; Partitioner; Resource Parameters (supporting different platforms); SW compiler (CPU); CW compiler (rDPU): Placement & Routing (Move the Locality of Operation).

42 Bringing together data and processor. By configware: move the stool, i.e. place the location of execution into the data pipe.

43 >> Conclusions << Reconfigurable Computing Paradox; The Supercomputing Paradox; We are using the wrong model; Coarse-grained Reconfigurable Devices; Super Pentium for Desktop Supercomputer; Conclusions. http://www.uni-kl.de

44 Conclusions (1): Hurdles. Obstacles are: tools market: unbelievably disastrous; curricula: unbelievably ignorant (… teach like for a 50 year old mainframe …); enabling technologies available, partly decades old, but not used; transdisciplinary models not available nor taught at CS, nor elsewhere; fragmentation into application-domain-specific cultures and trick boxes.

45 Conclusions (2): Future Work. CS must recognize and accept its strategic role and its responsibility toward all its application disciplines: embedded and scientific computing. The monopoly of the von-Neumann-based mind set in CS education heavily stalls progress in R&D, not only in HPC, and causes high cost in R&D, not only in supercomputing. CS graduates are not qualified for our job market. The von-Neumann-only-based mind set in CS urgently needs to go, in order to adopt the dual paradigm common model.

46 Conclusions (3): Chances. New horizons: chances are brilliant.

47 thank you

48 END

49 thank you

50 Backup:

51 Co-Compiler Enabling Technology is available from academia; only a small team needed for commercial re-implementation; on the road map to the Personal Supercomputer.

52 Compilation: Software vs. Configware. Software Engineering: source program (C, FORTRAN, MATLAB); software compiler; software code. Configware Engineering: source „program“; configware compiler with mapper (placement & routing) producing configware code, and data scheduler producing flowware code.

53 Nick Tredennick’s Paradigm Shifts explain the differences. Software Engineering (CPU): resources: fixed; algorithm: variable; 1 programming source needed: software. Configware Engineering: resources: variable; algorithm: variable; 2 programming sources needed: configware and flowware.

54 Co-Compilation. Source: C, FORTRAN, MATLAB. Software / Configware Co-Compiler: automatic SW / CW partitioner; simulated annealing; software compiler producing software code; configware compiler with mapper producing configware code, and data scheduler producing flowware code.

55 Co-Compiler for Hardwired Kress/Kung Machine [e.g. Brodersen]. Software / Flowware Co-Compiler: source; automatic SW / CW partitioner; software compiler producing software code; flowware compiler with data scheduler producing flowware code.

56 The first archetype machine model: main frame CPU. Software Industry’s Secret of Success: simple basic Machine Paradigm; compile or assemble: procedural personalization; personalization: RAM-based; instruction-stream-based mind set: “von Neumann”.

57 The 2nd archetype machine model: reconfigurable accelerator. Configware Industry’s Secret of Success: simple basic Machine Paradigm; compile: structural personalization; personalization: RAM-based; data-stream-based mind set: “Kress-Kung”.

58 „Saves more than $10,000 in electricity bills per year (7 ¢ / kWh) … per 64-processor 19" rack“ [Herb Riley, R. Associates]
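
Taking the quoted figures at face value, the energy behind this saving can be worked out directly. The sketch below only inverts the two numbers on the slide; continuous operation over a 365-day year is an added assumption, and the variable names are illustrative.

```python
SAVINGS_USD_PER_YEAR = 10_000.0   # "more than $10,000" per rack, from the slide
PRICE_USD_PER_KWH = 0.07          # 7 cents per kWh, from the slide
HOURS_PER_YEAR = 365 * 24         # assumption: continuous operation

saved_kwh_per_year = SAVINGS_USD_PER_YEAR / PRICE_USD_PER_KWH
saved_kw_average = saved_kwh_per_year / HOURS_PER_YEAR

print(round(saved_kwh_per_year), round(saved_kw_average, 1))
```

Under these assumptions the quoted saving corresponds to roughly 143,000 kWh per year, or an average of about 16 kW per rack.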

59 The new model is reality. Modern FPGA bestsellers: FPGA fabrics, together with several µprocessors, many memory banks, and other IP cores, on the same COTS microchip.

60 DSP platform FPGA [courtesy Xilinx Corp.]: 500 MHz Flexible Soft Logic Architecture, 200K Logic Cells; 500 MHz Programmable DSP Execution Units; 0.6-11.1 Gbps Serial Transceivers; 500 MHz PowerPC™ Processors (680 DMIPS) with Auxiliary Processor Unit; 1 Gbps Differential I/O; 500 MHz multi-port Distributed 10 Mb SRAM; 500 MHz DCM Digital Clock Management.

