Presentation is loading. Please wait.

Presentation is loading. Please wait.

Reconfigurable HPC Reconfigurable HPC part 1 Introduction Reiner Hartenstein TU Kaiserslautern May 14, 2004, TU Tallinn, Estonia.

Similar presentations


Presentation on theme: "Reconfigurable HPC Reconfigurable HPC part 1 Introduction Reiner Hartenstein TU Kaiserslautern May 14, 2004, TU Tallinn, Estonia."— Presentation transcript:

1 Reconfigurable HPC Reconfigurable HPC part 1 Introduction Reiner Hartenstein TU Kaiserslautern May 14, 2004, TU Tallinn, Estonia

2 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 2 Preface The White House, Sept 2000: Bill Clinton condemns the Digital Divide in America: World Economic Forum 2002: The Global Digital Divide, disparity between the "haves" and "have nots“ The Digital Divide of Parallel Computing: Access to Configware (CW) Solutions access to the internet

3 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 3 The „havenots“ Configware methodology to move data around more efficiently: Configware engineering as a qualification for programming embedded systems * : „ havenots “ are found in the HPC community The „ havenots “ are our typical CS graduates Reconfigurable HPC is torpedoed by deficits in education: curricular revisions are overdue *) also HPC !

4 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 4 Software to Configware Migration this talk will illustrate the performance benfit which may be obtained from Reconfigurable Computing stressing coarse grain Reconfigurable Computing (RC), point of view, this talk hardly mentions FPGAs (But coarse grain may be always mapped onto FPGAs) Software to Configware Migration is the most important source of speed-up Hardware is just frozen Configware

5 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 5 >> HPC << HPC Embedded Computing The wrong Roadmap Configware Engineering Dual Machine Paradigms Speed-up Examples Final Remarks http://www.uni-kl.de

6 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 6 Earth Simulator 5120 Processors, 5000 pins each ES 20: TFLOPS Crossbar weight: 220 t, 3000 km of cable, moving data around inside the

7 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 7 data are moved around by software (slower than CPU clock by 2 orders of magnitude) i.e. by memory-cycle-hungry instruction streams which fully hit the memory wall extremely unbalanced stolen from Bob Colwell CPU

8 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 8 path of least resistance * : avoiding a paradigm shift Many researchers seem never to stop working on sophisticated solutions for marginal improvements...... continously ignoring methodologies promising speed-ups by orders of magnitude.... blinders to ignore the impact of morphware... continue to bang their heads against the memory wall instead of *) [Michel Dubois]

9 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 9 the data-stream-based approach has no von Neumann bottle- neck … understand only this parallelism solution: the instruction-stream-based approach von Neumann bottle- necks... cannot cope with this one

10 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 10 >> Embedded Computing << HPC Embedded Computing The wrong Roadmap Configware Engineering Dual Machine Paradigms Speed-up Examples Final Remarks http://www.uni-kl.de

11 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 11 ? What’s coming next ? The History of Paradigm Shifts “Mainstream Silicon Application is switching every 10 Years” TTL µproc., memory “The Programmable System-on-a-Chip is the next wave“ custom standard 1957 1967 1977 1987 1997 2007 Makimoto’s Wave ASICs, accel’s LSI, MSI 1 st Design Crisis 2 nd Design Crisis ? reconfigurable Published in 1989

12 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 12 Makimoto’s 3rd Wave Fine Grain Subsystems (FPGAs): –1st half of 3rd wave –universal (but less efficient) Coarse Grain Subsystems: –2nd half of 3rd wave –domain-specific –much more flexible than 2nd half of 2rd wave

13 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 13 How’s next Wave ? 2007 FPGAs custom standard 1957 1967 1977 1987 1997 Tredennick’s Paradigm Shifts procedural programming algorithm: variable resources: fixed hardwired algorithm: fixed resources: fixed 2007 ? structural programming algorithm: variable resources: variable Coarse grain RAs no further wave ! Hartenstein’s Curve ? 4 th wave ?

14 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 14 History of Silicon Application TTL µproc., memory 1957 1967 1977 1987 1997 2007 ASICs, accel’s LSI, MSI FPGAs coarse grain soft CPUs hardware people CS people new breed needed Common terminology needed 3 different mind sets

15 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 15 History of Machine Models 1957 1967 1977 1987 1997 2007 mainframe age main frame. compile procedural mind set: instruction-stream-based (coordinates by Makimtos wave) computer age (PC age) accel. µ Proc. compile users: RIKEN institute, ARI, Heidelberg, etc. MD-GRAPE-2 PCI board [1997] 4 chips for N-body simulation converts a PC to 64 GFlops scientific computing example: molecular dynamics, astrophysics, plasma physics, hydrodynamics:

16 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 16 History of Machine Models 1957 1967 1977 1987 1997 2007 mainframe age main frame. compile procedural mind set: instruction-stream-based (coordinates by Makimtos wave) computer age (PC age) accel. µ Proc. compile structural mind set: data-stream-based by hardware guys design

17 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 17 the hardware / Software Chasm: typical programmers don‘t understand function evaluation without machine mechanisms (counters, state registers) It‘s the gap between procedural (instruction-stream- based) and structural (datastream-based) mind set accelerators µ processor

18 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 18 Growth Rate of Embedded Software 1 2 0 101218 months factor *) Department of Trade and Industry, London (1.4/year) [Moore ’ s law] >10 times more programmers will write embedded applications than computer software by 2010 Embedded software [DTI*] (~2.5/yr) already to-day, more than 98% of all microprocessors are used within embedded systems

19 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 19 typical CS graduates: the „havenots“ To-day, „ typical “ CS graduates are unqualified for this labor market … cannot cope with Hardware / Configware / Software partitioning issues … cannot implement Configware

20 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 20 the current CS mind set is based on the Submarine Model Hardware invisible: under the surface Hardware Algorithm Assembly Language procedural high level Programming Language Software This model does not suport Hardware / Configware / Software partitioning

21 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 21 Hardware / Configware / Software Partitioning skills urgently needed Algorithm partitioning HW CW SW to cope with each of it: SW, CW, HW. SW / HW SW / CW / HW SW / CW CW / HW or: to cope with any combination of co-design. Software to Configware Migration is the most important source of speed-up Hardware is just frozen Configware

22 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 22 By the way... http://fpl.org International Conference on Field-Programmable Logic and Applications (FPL) Aug. 20 – Sept 1, 2004, Antwerp, Belgium 288 submissions !... the oldest and largest conference in the field: accel. µ Proc.... going into every type of application they all work on high performance

23 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 23 Dominance of the Submarine Model... Hardware... indicates, that our CS education system produces zillions of mentally disabled CS graduates (procedural) structurally disabled … disabled to cope with solutions other than instruction-stream-based

24 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 24 CS Education procedural have not You cannot * teach Hardware to a Programmer *) efficiently But to a Hardware Guy you always can teach Programming structural have natural

25 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 25 >> the wrong Roadmap << HPC Embedded Computing the wrong Roadmap Configware Engineering Dual Machine Paradigms Speed-up Examples Final Remarks http://www.uni-kl.de

26 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 26 Completely wrong roadmap beef up old architectural principles by new technology? growth factor µm 0.1 performance area efficiency „Pollack‘s Law“ (simplified) [intel]... the CPU is a methusela, the steam engine of the silicon age

27 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 27 Completely wrong mind set The key problem, the memory wall, cannot be solved by new CPU technology We need a 2 nd machine paradigm (a 2 nd mind set...) The vN paradigm is not a communication paradigm Its monopoly creates a completely wrong mind set We need an architectural communication paradigm But we need both paradigms: a dichotomy

28 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 28 3 rd machine model became mainstream 1957 1967 1977 1987 1997 2007 computer age (PC age) accel. design µ Proc. compile (Makimtos wave) mainframe age main frame compile instruction- stream-based DPA r r µ Proc. programmable most CS curricula & HPC are still here morphware age

29 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 29 >> Configware Engineering << Supercomputing (HPC) Embedded Computing The wrong Roadmap Configware Engineering Dual Machine Paradigms Speed-up Examples Final Remarks http://www.uni-kl.de

30 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 30 de facto Duality of RAM-based platforms traditionalnew RAM-based platformCPUmorphware (FPGA, rDPA..) „running“ on it: softwareconfigware machine paradigmvon Neumann etc.: instruction-stream-based anti machine: data-stream-based 2 nd paradigm We now have 2 types of programmable platforms soft hardware morphware [DARPA] hardware viewed as frozen configware: Just earlier binding

31 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 31 [Gordon Bell]... going into every type of application [Gordon Bell].... the brain hurts CW has become mainstream... Others experienced, that the brain hurts, when trying the paradigm shift The HPC scene believed to be smart, when smiling about us CW guys morphware: fastest growing sector of the IC market

32 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 32 DPA morphware age r r From Software to Configware Industry structural personalization: RAM-based Repeat Success Story by a 2 nd Machine Paradigm ! Growing Configware Industry 1957 1967 1977 1987 1997 2007 computer age (PC age) µ Proc. compile Procedural personalization via RAM-based. Machine Paradigm Software Industry 1) 2) Software Industry’s Secret of Success anti machine

33 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 33 benefit from RAM-based & 2 nd paradigm RAM-based platform needed for: flexibility, programmability avoiding the need of specific silicon mask cost: currently 2 mio $ - rapidly growing 1) simple 2nd machine paradigm needed as a common model: to avoid the need of circuit expertize needed to to educate zillions of programmers 2) (vN relocatability is based on the von Neumann bottleneck) high price By the way: relocatability is more difficult, but not hopeless

34 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 34 configware resources: variable Nick Tredennick’s Paradigm Shifts explain the differences 2 programming sources needed flowware algorithm: variable Configware Engineering Software Engineering 1 programming source needed algorithm: variable resources: fixed software CPU

35 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 35 Compilation: Software vs. Configware source program software compiler software code Software Engineering configware code mapper configware compiler scheduler flowware code source „ program “ Configware Engineering placement & routing data

36 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 36 DPA x x x x x x x x x | || xx x x x x xx x -- - input data streams xx x x x x xx x -- - - - - - - - - - - x x x x x x x x x | | | | | | | | | | | | | | output data streams „ data streams “ time port # time port # time port # Flowware defines:... which data item at which time at which port Flowware programs data streams

37 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 37 Flowware: not new 1957 1967 1977 1987 1997 2007 computer age (PC age) accel. design µ Proc. compile (Makimtos wave) mainframe age main frame compile DPA r r µ Proc. morphware age *) no confusion, please: no „ dataflow machine “ !!! data stream*... Flowware: around 1980

38 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 38 data streams * : not new 1980: data streams (Kung, Leiserson: systolic arrays) 1989: data-stream-based Xputer architecture 1990: rDPU (Rabaey) 1994: Flowware Language MoPL (Becker et al.) 1995: super systolic array (rDPA) + DPSS tool (Kress) 1996+: Stream-C language, SCCC (Los Alamos), SCORE, ASPRC, Bee (UC Berkeley),... 1996+: streaming languages (Stanford et al.) 1996: configware / software partitioning compiler (Becker) *) please, don ‘ t confuse with „ data flow “

39 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 39 >> Dual Machine Paradigms << HPC Embedded Computing The wrong Roadmap Configware Engineering Dual Machine Paradigms Speed-up Examples Final Remarks http://www.uni-kl.de

40 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 40 Why a new machine paradigm ??? The anti machine as the 2 nd paradigm is the key to curricular innovation rDPA µ processor... a Troyan horse to introduce the structural domain to the procedural-only mind set of programmers Programming by flowware instead of software is very easy to learn Flowware education: no fully fledged hardware expert needed to program embedded systems (... same language primitives)

41 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 41 von Neumann vs. anti machine progra m counter DPU CPU RAM memory von Neumann bottleneck (r) DPA without sequencer no CPU ! asMA: auto-sequencing Memory Array........ asM (r) DPA........ data stream machine (anti machine) data counter memory bank asM asM: auto-sequencing Memory instruction stream machine (von Neumann etc.)

42 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 42 Behavior of the Counter data counter memory bank asM........ progra m counter DPU CPU programmed by flowware data streams programmed by software (r) DPA programmed by flowware

43 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 43 Counters: the same micro architecture ? data stream machine (anti machine) data counter memory bank asM progra m counter DPU CPU instruction stream machine: (von Neumann etc.) yes, is possible, but for data counters... *) for history of AGUs see Herz et al.: Proc. ICECS 2002, Dubrovnik, Croatia... a much better AGU methodology is available* AGU: address generator unit

44 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 44 The dichotomy of models Note for von Neumann: state register is with the CPU Note for the anti machine: state register is with memory bank / state register s are within memory bank s

45 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 45 commercial rDPA example: PACT XPP - XPU128 XPP128 rDPA Evaluation Board available, and XDS Development Tool with Simulator buses not shown rDPU Full 32 or 24 Bit Design working silicon 2 Configuration Hierarchies © PACT AG, Munich http://pactcorp.com (r) DPA

46 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 46 XPP64A: Platform Development Board - SDR Board In Debug Phase -> XPP64A Chips from STMicro Fab - Assembly & Test / Available March 2003

47 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 47 symbiosis of machine models 1957 1967 1977 1987 1997 2007 computer age (PC age) accel. design µ Proc. compile (Makimtos wave) mainframe age main frame compile morphware age DPA r r µ Proc. replace PC by PS co-compiler symbiosis

48 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 48 >> Speed-up Examples << HPC Embedded Computing The wrong Roadmap Configware Engineering Dual Machine Paradigms Speed-up Examples Final Remarks http://www.uni-kl.de

49 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 49 Better solutions by Configware Memory cycles minimized e.g.: no instruction fetch at run time & other effects Memory access for data: caches do not help anyhow Loop xforms: no intra-stream data memory cycles Complex address computation: no memory cycles No cache misses! instead of software methodologies not new: high level synthesis (1980+) loop transformations (1970+) many other areas

50 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 50 speed-up examples platformapplication examplespeed-up factormethod PACT Xtreme 4-by-4 array [2003] 16 tap FIR filterx16 MOPS/mW straight forward MoM anti machine with DPLA* [1983] grid-based DRC** 1-metal 1-poly nMOS *** 256 reference patterns > x1000 (computation time) multiple aspects *) DPLA: MPC fabr. via E.I.S. multi univ. project key issue: algorithmic cleverness **) Design Rule Check CPU 2 FPGA [FPGA 2004] migrate several simple application exampes x7 – x46 (compute time) hi level synthesis ***) for 10-metal 3-poly cMOS expected: >> x10,000 DSP 2 FPGA [Xilinx 2004 2 ] from fastest DSP: 10 gMACs to 1 teraMAC X 100 (compute time) not spec. 2) Wim Roelandts

51 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 51 Software to Configware Migration: ( RAW’99 at Orlando) question by a highly respected industrial senior researcher: Ulrich Nageldinger‘s talk about KressArray Xplorer: „But you can‘t implement decisions!“ (symptom of...)

52 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 52 branching example: time-to-space migration *) if no intermed. storage in register file on a very simple CPU C = 1 memory cycles if C then read A read instruction1 instruction decoding read operand*1 operate & register transfers if not C then read B read instruction1 instruction decoding add & store read instruction1 instruction decoding operate & register transfers store result1 total5 S = R + (if C then A else B endif); S + A B R C clock =1 on rDPU: no memory cycles

53 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 53 Why the speed-up...... although FPGA is clock slower by x 3 or even more (most know-how from „ high level synthesis “ discipline) moving operator to the data stream (before run time) support operations: no clock nor memory cycle decisions without memory cycles nor clock cycles most „ data fetch “ without memory cycle

54 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 54 >> Final Remarks << http://www.uni-kl.de HPC Embedded Computing The wrong Roadmap Configware Engineering Dual Machine Paradigms Speed-up Examples Final Remarks

55 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 55 First Indications of Change 10th RAW at IPDPS, Nice, France, April 2003: after a decade of non-overlap: first IPDPS people coming HPC Asia 2004 - 7th Int‘l Conference on High Performance Computing, July 20-22, 2004 Omiya Sonic City, Tokyo Area, Japan: Workshop on Reconfigurable Systems f. HPC (RHPC) + keynote address * HPCA-11, 11th International Symposium on High-Performance Computer Architecture, San Francisco, Febr. 12-16, 2005: topic area explicitely: Embedded and reconfigurable architectures SBAC-PAD 2004 - 16th Symposium on Computer Architecture and High Performance Computing, Foz do Iguacu, PR, Brazil, October 27-29, 2004: topic area explicitely: Reconfigurable Systems *) keynote speaker: PARS & Speed-up, Basel, Switzerland, March 2003: keynote address * IPDPS, Santa Fe, NM, USA, April 2004: keynote address * PDP’04, La Coruna, Spain, Febr. 2004: keynote address *

56 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 56 HPC experts coming... ARI, Astrononisches Rechen-Institut, founded 1700 in Berlin, moved 1945 to Heidelberg by August Kopff Gottfried Kirch Simulation of Star Clusters: x10 speed-up by supercomputer-to-morphware migration (also molecular biology et al.) Rainer Spurzem, University of Heidelberg Reinhard M ä nner, University of Mannheim HPC pioneer since 1976 (Physics Dept Heidelberg) Configware by Astrophysics by example: N-body problem went configware paper already at FPL 1999 http://fpl.org

57 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 57 August Kopff 18 th Director, Astrononisches Rechen-Institut (ARI) 1924 - 1954 discovered the Kopff comet, Koenigstuhl Observatory, Heidelberg, Germany, 1906 Copyright © 1996 by Masayuki Suzuki The Galileo spacecraft's 14-year odyssey came to an end on Sunday, Sept. 21, 2003 discovered the asteriod 631 Philippina, 21 March 1907, which became the first asteroid ever visited by a spacecraft - on the Galileo mission to Jupiter

58 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 58 Conclusions We need an academic grass roots movement, for.... RC has become mainstream in all kinds of applications... by a merger with the embedded systems mind set CS education deficits: a curricular revision is overdue...free material & tools for undergraduate lab courses to program and emulate small SW/CW/HW examples all know-how needed readily available: get involved !

59 © 2004, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern 59 END


Download ppt "Reconfigurable HPC Reconfigurable HPC part 1 Introduction Reiner Hartenstein TU Kaiserslautern May 14, 2004, TU Tallinn, Estonia."

Similar presentations


Ads by Google