The von Neumann Syndrome. Reiner Hartenstein, TU Kaiserslautern. TU Delft, Sept 28, 2007 (v.2)

1 The von Neumann Syndrome. Reiner Hartenstein, TU Kaiserslautern. TU Delft, Sept 28, 2007 (v.2)

2 © 2007, [R.H.] TU Kaiserslautern. von Neumann Syndrome: this term was coined by “RAM” (C. V. Ramamoorthy, emeritus, UC Berkeley).

3 The first Reconfigurable Computer: prototyped in 1884 by Herman Hollerith, a century before the introduction of the FPGA; it was data-stream-based. 60 years later the von Neumann (vN) model took over: instruction-stream-based.

4 Outline: von Neumann overhead hits the memory wall; The manycore programming crisis; Reconfigurable Computing is the solution; We need a twin paradigm approach; Conclusions

5 The spirit of the Mainframe Age: For decades, we’ve trained programmers to think sequentially, breaking complex parallelism down into atomic instruction steps … Even in “hardware” courses (the unloved child of CS scenes) we often teach von Neumann machine design, deepening this tunnel view … finally tending toward code sizes of astronomic dimensions. 1951: Hardware Design going von Neumann (Microprogramming).

6 von Neumann: an array of massive overhead phenomena. Overhead of the von Neumann machine, each item caused by the instruction stream: instruction fetch; state address computation; data address computation; data meet PU; i/o to/from off-chip RAM; … other overhead; piling up to code sizes of astronomic dimensions.

7 von Neumann: an array of massive overhead phenomena. Overhead of the von Neumann machine, each item caused by the instruction stream: instruction fetch; state address computation; data address computation; data meet PU; i/o to/from off-chip RAM; … other overhead; piling up to code sizes of astronomic dimensions. Temptations by von Neumann style software engineering: [Dijkstra 1968] the “go to” considered harmful; massive communication congestion: [R.H. 1975] universal bus considered harmful. Backus, 1978: Can Programming Be Liberated from the von Neumann Style? Arvind et al., 1983: A Critique of Multiprocessing von Neumann Style.

8 von Neumann overhead: just one example. Overhead of the von Neumann machine, each item caused by the instruction stream: instruction fetch; state address computation; data address computation; data meet PU; i/o to/from off-chip RAM; … other overhead. [1989]: 94% of the computation load (image processing example) was spent only on moving this window.
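The 1989 figure refers to a specific project and is not reproduced here. The sketch below only illustrates, under an assumed cost model (4 integer operations per pixel address, 1 addition per pixel value, loop control ignored), how moving a window in a naive instruction-stream loop spends most operations on address computation rather than on the data. The function name is hypothetical.

```python
def box_filter_with_op_counts(image, k=3):
    """Naive k x k box sum over a row-major image, counting address ops vs. data ops.

    Assumed cost model (hypothetical): each pixel address (y+dy)*w + (x+dx)
    costs 4 integer ops; each accumulation of a pixel value costs 1 data op.
    Loop-counter updates are not counted, so real overhead would be higher.
    """
    h, w = len(image), len(image[0])
    r = k // 2
    flat = [v for row in image for v in row]  # image stored in linear RAM
    out = [[0] * w for _ in range(h)]
    addr_ops = 0
    data_ops = 0
    for y in range(r, h - r):
        for x in range(r, w - r):
            acc = 0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    addr = (y + dy) * w + (x + dx)  # address computation
                    addr_ops += 4                   # 2 adds, 1 multiply, 1 add
                    acc += flat[addr]               # the useful work
                    data_ops += 1
            out[y][x] = acc
    return out, addr_ops, data_ops


img = [[1] * 4 for _ in range(4)]
_, a, d = box_filter_with_op_counts(img)
print(f"address share of operations: {a / (a + d):.0%}")
```

Under this model the address share is fixed at 4/(4+1); a data-stream machine removes that share from run time altogether.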

9 the Memory Wall: µProc performance grows 60%/yr., DRAM only 7%/yr. (Dave Patterson’s Law), so the “Performance” Gap grows 50%/year and ends at ~1000. Instruction-stream code of astronomic size needs off-chip RAM, which fully hits this gap. CPU clock speed ≠ performance: the processor’s silicon is mostly cache. Better to compare off-chip vs. fast on-chip memory.
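The 50%/year figure follows from compounding the two growth rates on the slide: the gap grows by a factor of 1.60 / 1.07 ≈ 1.495 per year. The sketch below computes that factor and how many years of compounding bring the gap to ~1000; the function names are illustrative only.

```python
import math

CPU_GROWTH = 0.60   # µProc performance growth per year (from the slide)
DRAM_GROWTH = 0.07  # DRAM performance growth per year (from the slide)


def gap_growth_per_year(cpu=CPU_GROWTH, dram=DRAM_GROWTH):
    """Yearly growth factor of the CPU/DRAM performance gap."""
    return (1 + cpu) / (1 + dram)


def years_until_gap(target, cpu=CPU_GROWTH, dram=DRAM_GROWTH):
    """Years of compounding needed for the gap to reach `target`."""
    return math.log(target) / math.log(gap_growth_per_year(cpu, dram))


print(f"gap grows {gap_growth_per_year() - 1:.1%} per year")
print(f"about {years_until_gap(1000):.1f} years to reach a gap of ~1000")
```

Roughly 17 years of this compounding is enough for the gap to reach the ~1000 shown on the slide.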

10 Benchmarked Computational Density [BWRC, UC Berkeley, 2004]: SPECfp2000/MHz/Billion Transistors for DEC alpha, SUN, HP and IBM. alpha: down by 100 in 6 yrs; IBM: down by 20 in 6 yrs (stolen from Bob Colwell). CPU caches ... CPU clock speed ≠ performance: the processor’s silicon is mostly cache.
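To put the two trends on a per-year basis: a decline by a factor F over n years corresponds to a factor of F^(1/n) per year, assuming a steady geometric decline (the slide gives only the endpoints).

```python
def annual_factor(total_factor, years):
    """Per-year factor that compounds to `total_factor` over `years` (geometric decline assumed)."""
    return total_factor ** (1 / years)


for name, drop in [("DEC alpha", 100), ("IBM", 20)]:
    print(f"{name}: down {drop}x in 6 yrs, about {annual_factor(drop, 6):.2f}x per year")
```

So the alpha's computational density fell by roughly a factor of 2.15 each year, IBM's by roughly 1.65.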

11 Outline: von Neumann overhead hits the memory wall; The manycore programming crisis; Reconfigurable Computing is the solution; We need a twin paradigm approach; Conclusions

12 The Manycore future: we are embarking on a new computing age, the age of massive parallelism [Burton Smith]. Multiple von Neumann CPUs on the same µprocessor chip lead to exploding (vN) instruction stream overhead [R.H.]. Even mobile devices will exploit multicore processors, also to extend battery life [B.S.]. Everyone will have multiple parallel computers [B.S.].

13 Several overhead phenomena: the instruction-stream-based parallel von Neumann approach (the watering pot model [Hartenstein]) has several von Neumann overhead phenomena per CPU!

14 Explosion of overhead by von Neumann parallelism. Local overhead of each monoprocessor, each item caused by the instruction stream: instruction fetch; state address computation; data address computation; data meet PU; i/o to/from off-chip RAM; … other overhead. This part is proportionate to the number of processors. Global overhead of the parallel system, also caused by the instruction stream: inter-PU communication; message passing. This part is disproportionate to the number of processors. [R.H. 2006] MPI considered harmful.
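A toy model of "proportionate" vs. "disproportionate" growth, under assumptions that are not from the slide: each CPU carries one unit of local overhead, and every CPU exchanges one message with every other CPU (all-to-all), each message costing one unit. Real communication patterns vary, but any pattern where messages grow faster than p shows the same effect. The function name is hypothetical.

```python
def overhead_ops(p, local_per_cpu=1.0, per_message=1.0):
    """Hypothetical overhead model for p von Neumann CPUs.

    local:  each CPU carries its own instruction-stream overhead, grows like p
    global: all-to-all message passing, p * (p - 1) messages, grows like p**2
    """
    local = p * local_per_cpu
    global_ = p * (p - 1) * per_message
    return local, global_


for p in (2, 4, 16, 64):
    local, global_ = overhead_ops(p)
    print(f"p={p:3d}  local={local:6.0f}  message passing={global_:7.0f}")
```

Going from 4 to 64 CPUs multiplies the local overhead by 16 but the all-to-all messaging by 336.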

15 Rewriting Applications: more processors means rewriting applications. We need to map an application onto manycore configurations of different sizes, but most applications are not readily mappable onto a regular array of CPUs. Mapping is much less problematic with Reconfigurable Computing (an array of rDPUs).

16 Disruptive Development: The computer industry is probably going to be disrupted by some very fundamental changes [Ian Barron]. I don’t agree: we have a model. A parallel [vN] programming model for manycore machines will not emerge for five to 10 years [experts from Microsoft Corp.]. We must reinvent computing [Burton J. Smith]. Reconfigurable Computing: Technology is Ready, Users are Not. It’s mainly an education problem: The Education Wall.

17 Outline: von Neumann overhead hits the memory wall; The manycore programming crisis; Reconfigurable Computing is the solution; We need a twin paradigm approach; Conclusions

18 The Reconfigurable Computing Paradox: The spirit of the Mainframe Age is collapsing under the von Neumann syndrome. Migration to FPGA yields up to 4 orders of magnitude speedup and tremendously slashes the electricity bill, despite bad FPGA technology: reconfigurability overhead, wiring overhead, routing congestion, slow clock speed. What is the reason for this paradox? There is something fundamentally wrong in using the von Neumann paradigm.

19 beyond von Neumann Parallelism: the instruction-stream-based von Neumann approach (the watering pot model [Hartenstein]) has several von Neumann overhead phenomena per CPU! We need an approach like this: it’s data-stream-based RC*. *) “RC” = Reconfigurable Computing

20 von Neumann overhead vs. Reconfigurable Computing. Compared: the von Neumann machine (using a program counter), the hardwired anti machine (using data counters) and the reconfigurable anti machine (using reconfigurable data counters). Overhead types: instruction fetch; state address computation; data address computation; data meet PU + other overhead; i/o to/from off-chip RAM; inter-PU communication; message passing overhead. The von Neumann machine incurs every one of them through the instruction stream; the anti machine incurs none*. *) configured before run time. rDPA: reconfigurable datapath array (coarse-grained reconfigurable), built from rDPUs: no instruction fetch at run time.

21 von Neumann overhead vs. Reconfigurable Computing: the same comparison of the von Neumann machine (program counter), the hardwired anti machine (data counters) and the reconfigurable anti machine (reconfigurable data counters); every overhead type (instruction fetch; state address computation; data address computation; data meet PU + other overhead; i/o to/from off-chip RAM; inter-PU communication; message passing overhead) is caused by the instruction stream in the von Neumann machine and is none* in the anti machine. *) configured before run time. rDPA: reconfigurable datapath array (coarse-grained reconfigurable), built from rDPUs. [1989]: x17 speedup by GAG** alone (image processing example). [1989]: x15,000 total speedup from this migration project. **) just by the reconfigurable address generator

22 Reconfigurable Computing means …: moving overhead from run time to compile time**. For HPC, run time is more precious than compile time. Reconfigurable Computing replaces “looping” at run time* … by configuration before run time. *) e.g. complex address computation **) or loading time
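A loose software analogy of moving address computation from run time to configuration time: a hypothetical `configure_addresses` step computes every window address once, before any data is processed, and the run-time step `run` only streams data along that fixed schedule. In a reconfigurable system this job would be done by configured address generators in hardware, not by a Python list; the sketch only shows where the work moves.

```python
from typing import List


def configure_addresses(width: int, height: int, k: int) -> List[List[int]]:
    """'Configuration time' (hypothetical): precompute the addresses of every k x k window."""
    r = k // 2
    schedule = []
    for y in range(r, height - r):
        for x in range(r, width - r):
            schedule.append([(y + dy) * width + (x + dx)
                             for dy in range(-r, r + 1)
                             for dx in range(-r, r + 1)])
    return schedule


def run(ram: List[int], schedule: List[List[int]]) -> List[int]:
    """'Run time': no address arithmetic left, only data fetched along the schedule."""
    return [sum(ram[a] for a in window) for window in schedule]


schedule = configure_addresses(width=4, height=4, k=3)  # done once
print(run(list(range(16)), schedule))                   # may be repeated for many frames
```

The schedule can be reused for every frame of the same size, which is where the run-time saving comes from.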

23 Data meeting the Processing Unit (PU), explaining the RC advantage. We have 2 choices. By Software: routing the data by memory-cycle-hungry instruction streams through shared memory. By Configware: data-stream-based, by placement* of the execution locality, i.e. a pipe network generated by configware compilation. *) before run time

24 What pipe network? A pipe network organized at compile time: a generalization* of the systolic array [R. Kress, 1995]. rDPA = rDPU array, i.e. coarse-grained; rDPU = reconfigurable datapath unit (no program counter). Each array port receives or sends a data stream. *) supporting non-linear pipes on free-form heterogeneous arrays, depending on connect fabrics
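A software sketch of the pipe-network idea, assuming Python generators as stand-ins for rDPUs: each stage consumes one data stream and produces another, and the wiring is fixed before any data flows. This sketch only builds a linear pipe; the non-linear pipes on free-form arrays mentioned on the slide are not modelled. All names are hypothetical.

```python
from typing import Callable, Iterable, Iterator


def stage(fn: Callable[[int], int]) -> Callable[[Iterable[int]], Iterator[int]]:
    """Wrap a per-element function as a pipeline stage (stand-in for one rDPU)."""
    def run(stream: Iterable[int]) -> Iterator[int]:
        for item in stream:
            yield fn(item)
    return run


def delay_add(stream: Iterable[int]) -> Iterator[int]:
    """Stage with a local register: y[n] = x[n] + x[n-1], with x[-1] = 0."""
    prev = 0
    for x in stream:
        yield x + prev
        prev = x


def pipe_network(source: Iterable[int], *stages) -> Iterator[int]:
    """Connect stages in sequence; the wiring is set up before data flows."""
    stream = source
    for s in stages:
        stream = s(stream)
    return stream


pipe = pipe_network(iter([1, 2, 3, 4]), stage(lambda v: 2 * v), delay_add, stage(lambda v: v + 1))
print(list(pipe))
```

No stage fetches instructions while data flows; each one only reacts to the items arriving on its input stream.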

25 Migration benefit by on-chip RAM. ASM: Auto-Sequencing Memory, i.e. a RAM block with a GAG data counter, feeding the rDPA. Some RC chips have hundreds of on-chip RAM blocks, orders of magnitude faster than off-chip RAM, so that the drastic code size reduction by software-to-configware migration can beat the memory wall. Multiple on-chip RAM blocks are the enabling technology for ultra-fast anti machine solutions. GAGs inside ASMs generate the data streams. GAG = generic address generator; rDPA = rDPU array, i.e. coarse-grained; rDPU = reconfigurable datapath unit (no program counter).
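A minimal sketch of an ASM, assuming a much simpler address generator than a real GAG: a two-level counter with a base address, two strides and two counts. It is a data counter, not a program counter: once its parameters are set, it emits addresses without fetching any instructions. The parameter names and the `ASM` class are hypothetical.

```python
from typing import Iterable, Iterator


def gag(base: int, stride_x: int, count_x: int, stride_y: int, count_y: int) -> Iterator[int]:
    """Hypothetical 2-D address generator: nested data counters with fixed strides."""
    for j in range(count_y):
        for i in range(count_x):
            yield base + j * stride_y + i * stride_x


class ASM:
    """Sketch of an auto-sequencing memory: an on-chip RAM block plus its own GAG."""

    def __init__(self, ram: Iterable[int]):
        self.ram = list(ram)

    def stream(self, **gag_params) -> Iterator[int]:
        # the data stream leaves the memory in the order set by the GAG parameters
        for addr in gag(**gag_params):
            yield self.ram[addr]


asm = ASM(range(16))  # a 4 x 4 image stored row by row
column_major = asm.stream(base=0, stride_x=4, count_x=4, stride_y=1, count_y=4)
print(list(column_major))
```

Changing only the parameters changes the scan pattern (here a column-by-column read of a row-major image); the datapath receiving the stream stays the same.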

26 Coarse-grained Reconfigurable Array example: image processing, SNN filter (mainly a pipe network). Note: a kind of software perspective, but without instruction streams: data streams + pipelining. Compiled by Nageldinger’s KressArray Xplorer (Juergen Becker’s CoDe-X inside). Array size: 10 x 16 = 160 such rDPUs, mesh-connected (exceptions: see 3 x 3), with ASMs as fast on-chip RAM. Legend: route-through only; not used; backbus connect; rDPU … bits wide. Coming close to the programmer’s mind set (much closer than FPGA).
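For readers unfamiliar with the SNN (symmetric nearest neighbour) filter, the sketch below shows one common 3 x 3 formulation in plain software form; variants differ (some also average in the centre pixel or use larger windows), and the slide does not say which variant was mapped onto the array. For each of the 4 pixel pairs that are symmetric about the centre, the pixel closer in value to the centre is kept, and the output is the mean of the kept pixels.

```python
def snn_filter(image):
    """Symmetric Nearest Neighbour filter, 3 x 3 window (one common variant).

    For each of the 4 pixel pairs symmetric about the centre, keep the pixel
    whose value is closer to the centre value; output the mean of the 4 kept
    values. Border pixels are copied unchanged.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    # one member of each symmetric pair; its partner is the negated offset
    pairs = [(-1, -1), (-1, 0), (-1, 1), (0, -1)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            c = image[y][x]
            kept = []
            for dy, dx in pairs:
                a = image[y + dy][x + dx]
                b = image[y - dy][x - dx]
                kept.append(a if abs(a - c) <= abs(b - c) else b)
            out[y][x] = sum(kept) / len(kept)
    return out


edge = [[0, 0, 10],
        [0, 0, 10],
        [0, 0, 10]]
print(snn_filter(edge)[1][1])  # the edge is preserved; a 3 x 3 mean would blur it to 30/9
```

Every output pixel needs the same fixed set of compare-and-select operations followed by an average, which is why the filter maps naturally onto a pipe network.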

27 Outline: von Neumann overhead hits the memory wall; The manycore programming crisis; Reconfigurable Computing is the solution; We need a twin paradigm approach; Conclusions

28 apropos compilation: the CoDe-X co-compiler (Juergen Becker, 1996), Software / Configware Co-Compilation. A C language source passes through an Analyzer / Profiler and a Partitioner, which splits it into SW code (SW compiler, “vN” machine paradigm) and CW code plus FW code (CW compiler, anti machine paradigm). But we need a dual paradigm approach: to run legacy software together with configware. Reconfigurable Computing: Technology is Ready. Users are Not?
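The slide does not describe CoDe-X's partitioning criteria, so the following is not its algorithm. It is a hypothetical rule that illustrates the idea of profiler-driven SW/CW partitioning: segments that are hot (large share of run time) and mappable go to configware, everything else stays software. Amdahl's law then bounds the overall gain, because only the migrated fraction is accelerated. All names, thresholds and numbers are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    name: str
    time_fraction: float  # share of total run time, as reported by a profiler
    mappable: bool        # can this segment be mapped onto the reconfigurable array?


def partition(segments: List[Segment], min_share: float = 0.05) -> Tuple[List[str], List[str]]:
    """Hypothetical rule: hot, mappable segments go to configware (CW), the rest stays software (SW)."""
    cw = [s.name for s in segments if s.mappable and s.time_fraction >= min_share]
    sw = [s.name for s in segments if s.name not in cw]
    return sw, cw


def overall_speedup(segments: List[Segment], cw_names: List[str], cw_speedup: float) -> float:
    """Amdahl's law: only the migrated fraction of run time is accelerated."""
    accelerated = sum(s.time_fraction for s in segments if s.name in cw_names)
    return 1.0 / ((1.0 - accelerated) + accelerated / cw_speedup)


segments = [Segment("filter_loop", 0.90, True),
            Segment("io_setup", 0.08, False),
            Segment("misc", 0.02, True)]
sw, cw = partition(segments)
print(sw, cw, f"{overall_speedup(segments, cw, cw_speedup=100):.2f}x")
```

In this made-up example, a 100x faster configware kernel yields only about a 9x overall gain, since the 10% left in software dominates. This is one reason the software side of a dual paradigm system still matters.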

29 Curricula from the mainframe age: procedural vs. structural (this is not a lecture on brain regions). Non-von-Neumann accelerators are not really taught: structurally disabled, no common model. The education wall is the main problem: the common model is ready, but users are not.

30 We need a twin paradigm education. Brain usage: both hemispheres, procedural and structural; each side needs its own common model (this is not a lecture on brain regions).

31 RCeducation: The 3rd International Workshop on Reconfigurable Computing Education, April 10, 2008, Montpellier, France. Teaching RC?

32 We need new courses: “We urgently need a Mead-&-Conway-like text book” [R. H., Dagstuhl Seminar 03301, Germany, 2003]. We need undergraduate lab courses with HW / CW / SW partitioning. We need new courses with extended scope on parallelism and algorithmic cleverness for HW / CW / SW co-design. 2007: Here it is!

33 Outline: von Neumann overhead hits the memory wall; The manycore programming crisis; Reconfigurable Computing is the solution; We need a twin paradigm approach; Conclusions

34 Conclusions: Data streaming is the key model of parallel computation, not vN. Von-Neumann-type instruction streams considered harmful [R.H.]. But we still need them for some small code sizes, old legacy software, etc. … The twin paradigm approach is inevitable, also in education [R.H.]. We need to increase the population of HPC-competent people [B.S.]. We need to increase the population of RC-competent people [R.H.].

35 An Open Question (please, reply to:). Coarse-grained arrays: technology ready*, users not ready. Much closer to the programmer’s mind set: really much closer than FPGAs**. Which effect is delaying the breakthrough? *) offered by startups (PACT Corp. and others) **) “FPGAs? Do we need to learn hardware design?”

36 thank you

37 END

39 Disruptive Development: The way the industry has grown up writing software (the languages we chose, the model of synchronization and orchestration) does not lead toward uncovering parallelism for allowing large-scale composition of big systems [Ian Barron].

40 Dual paradigm mind set: an old hat. Software mind set: instruction-stream-based, flow chart -> control instructions. Mapped into a Hardware mind set (mapping from the procedural to the structural domain): action box = flip-flop (FF), decision box = (de)multiplexer; a token bit evokes the next FF. 1967: W. A. Clark: Macromodular Computer Systems; 1967 SJCC, AFIPS Conf. Proc. 1972: C. G. Bell et al.: The Description and Use of Register-Transfer Modules (RTM's); IEEE Trans. Computers C-21(5), May 1972.
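A small simulation of that mapping, assuming a made-up flow chart (action A, then a decision on condition c that goes on to action B or loops back to A, then END). Each action box is a one-hot flip-flop, the decision box acts as a demultiplexer that routes the token bit, and the token advances once per clock. It illustrates the principle only; it is not a model of Clark's macromodules or Bell's RTMs.

```python
def run_token_machine(conditions):
    """Simulate a flow chart mapped onto one-hot flip-flops that pass a token bit.

    Hypothetical flow chart:  A -> decision c -> (c true)  B -> END
                                              -> (c false) A (loop)
    Action boxes A, B and END are flip-flops; the decision box is a
    demultiplexer that routes the token from A either to B or back to A.
    """
    ff = {"A": 1, "B": 0, "END": 0}  # the token bit starts in action box A
    trace = []
    for c in conditions:             # one clock cycle per sampled condition
        active = next(k for k, v in ff.items() if v)
        trace.append(active)
        nxt = dict.fromkeys(ff, 0)
        if active == "A":
            nxt["B" if c else "A"] = 1   # demultiplexer driven by condition c
        else:
            nxt["END"] = 1               # B hands over to END; END keeps the token
        assert sum(nxt.values()) == 1    # one-hot: exactly one token at any time
        ff = nxt
    trace.append(next(k for k, v in ff.items() if v))
    return trace


print(run_token_machine([False, False, True, False]))
```

The control flow is carried by wiring and a single moving bit, with no instruction fetch: the structural counterpart of the flow chart's control instructions.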


Download ppt "The von Neumann Syndrome Reiner Hartenstein TU Kaiserslautern TU Delft, Sept 28, 2007 (v.2)"

Similar presentations


Ads by Google