1
How to cope with the Power Wall
Reiner Hartenstein, TU Kaiserslautern (DRAFT)
PATMOS 2015, the 25th International Workshop on Power and Timing Modeling, Optimization and Simulation; Salvador, Bahia, Brazil, Sept 1-5, 2015
2
© 2015, reiner@hartenstein.de http://hartenstein.de TU Kaiserslautern http://www.uni-kl.de
>> Outline <<
The Power Wall
“Dataflow” Computing
Reconfigurable Computing
Time to Space Mapping
The Xputer Paradigm
Conclusions
3
PATMOS: 3 years older than ISLPED, the oldest conference series on power efficiency issues
Spin-off from the EU’s PATMOS project
Power efficiency is going to become an industry-wide issue
Some incremental improvements are on track at all abstraction levels, not only in hardware design and software development
However, there is still a lot to do
http://hartenstein.de/PATMOS/
4
Power Efficiency of Programming Languages (an example)
5
Programming Language Popularity: not yet determined by power efficiency (IEEE)
[Image source: iStockphoto]
6
ICT infrastructures
[Photo © New York Times: data center at The Dalles, Columbia River]
Power consumption by the internet: x30 by 2030 if trends continue. That is more than the entire world's total power consumption today!
The internet's 2007 carbon footprint was already higher than that of worldwide air traffic
Source: G. Fettweis, E. Zimmermann: ICT Energy Consumption - Trends and Challenges; WPMC'08, Lapland, Finland, 8-11 Sep 2008
IDC predicts: the total number of data centers around the world will peak at 8.6 million in 2017
7
Google's Electricity Bill
„The possibility of computer equipment power consumption spiraling out of control could have serious consequences for the overall affordability of computing” [L. A. Barroso, Google]
The cost of a data center is determined solely by the monthly power bill, not by the cost of hardware or maintenance (for example)
Google patent for water-based data centers: the ocean provides cooling and power (motion of the surface)
Google asked the Federal Energy Regulatory Commission for the authority to sell electricity
8
Massive Critique of the von Neumann Model
John Backus, 1978: Can programming be liberated from the von Neumann style?
C. V. Ramamoorthy: “von Neumann Syndrome”
Nathan’s Law (Nathan Myhrvold): Software is a gas. It expands to fill all its containers...
Patterson’s Law (Dave Patterson): the bandwidth gap grows 50% / year and has reached >1000x (long ago)
Instruction streams are very slow and extremely memory-cycle-hungry
Software dimension in MLoC (Million Lines of Code):
SAP NetWeaver (2007) | 238
Mac OS X 10.4 | 86
Debian 5.0 (2009) | 324
von Neumann: an extremely power-inefficient paradigm
9
>> Outline <<
The Power Wall
“Dataflow” Computing
Reconfigurable Computing
Time to Space Mapping
The Xputer Paradigm
Conclusions
10
Terminology Problems
Searching for a more power-efficient machine paradigm:
Stressing the differences to „Control-Flow“ computers, an area called „Dataflow“ computers was started in the mid-1970s at MIT
People in the Xputer area are forced to sidestep by using terms like „data-driven“ or „data streams“ … although the „Dataflow“ scene is I-Structure-centered.
11
MIT Tagged Token Dataflow Architecture*
[Figure: processing elements (PE) and I-Structure Storage units attached to a Communication Network. Inside a PE: Wait-Match Unit & Waiting Token Store, Instruction Fetch Unit, Program Store & Constant Store, ALU & Form Tag, Form Token Unit, Token Queue, links to/from the Communication Network; further labels: RU, SU]
No PC, no updateable global store
„instruction is executed even if some of its operands are not yet available“ ?
*) source: Jurij Silc
12
Problems with “Dataflow” Architectures
„instruction is executed even if some of its operands are not yet available“ ?
Solution: I-Structure concept*
States of an I-Structure element [source: Jurij Silc]:
can be read but not written
read request deferred, but write operation is allowed
at least one read request has been deferred
Very complex data structures!
“each update consumes the structure and the value produces a new data structure”
“awkward or even impossible to implement”
*) „I-Structure Flow“ instead of „Dataflow“ ?
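The three element states listed on this slide can be illustrated with a small sketch. It is not taken from the talk or from Silc's material: the class name `IStructureCell`, the state names `ABSENT`, `WAITING` and `PRESENT`, and the callback-style `read` are assumptions chosen for illustration, following the usual description of write-once I-structure elements.

```python
class IStructureCell:
    """Toy model of one write-once I-structure element (illustrative only)."""

    def __init__(self):
        self.state = "ABSENT"   # assumed name: nothing written yet, writes allowed
        self.value = None
        self.deferred = []      # consumers whose read arrived before the write

    def read(self, consumer):
        """Deliver the value to `consumer`, or defer the read until a write arrives."""
        if self.state == "PRESENT":
            consumer(self.value)
        else:
            self.deferred.append(consumer)
            self.state = "WAITING"   # at least one read request has been deferred

    def write(self, value):
        """Write once, then serve every deferred read request."""
        if self.state == "PRESENT":
            raise RuntimeError("element can be read but not written")
        self.value = value
        self.state = "PRESENT"
        pending, self.deferred = self.deferred, []
        for consumer in pending:
            consumer(value)


cell = IStructureCell()
received = []
cell.read(received.append)   # arrives too early: deferred, state becomes WAITING
cell.write(7)                # satisfies the deferred read
cell.read(received.append)   # value is present now: served immediately
print(cell.state, received)
```

Even this toy version needs per-element state and a queue of deferred readers, which hints at the bookkeeping the slide calls "very complex".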
13
A Second Opinion
D. D. Gajski, D. A. Padua, D. J. Kuck, R. H. Kuhn: A Second Opinion on Data Flow Machines and Languages; IEEE COMPUTER, February 1982
The subtitle: "... data flow techniques attract a great deal of attention. Other alternatives, however, offer more hope for the future."
(Still active: 5th workshop on Data-Flow Execution Models for Extreme Scale Computing (DFM 2015), Oct 18-21, 2015, San Francisco, CA, USA)
However, the power efficiency breakthrough did not happen here
14
Computing Paradigms
term | program counter | execution triggered by | paradigm
CPU | yes (program counter) | instruction fetch | instruction-stream-based (von Neumann)
DPU, rDPU, RPU | no (data counters) | data arrival** | data-driven or data-stream-based (Xputer, Reconfigurable Computing)
I-structure DPUs (DFC) | no | complicated I-structure handling | “Dataflow” Computer (DFC*)
*) based on data dependency graph
**) “transport-triggered”
15
>> Outline <<
The Power Wall
“Dataflow” Computing
Reconfigurable Computing
Time to Space Mapping
The Xputer Paradigm
Conclusions
16
First FPGA available 1984 from Xilinx
[Figure labels: LUT, Table]
17
Reconfigurable Computing (RC): the intensive impact
[Tarek El-Ghazawi et al.: IEEE COMPUTER, Febr. 2008]
SGI Altix 4700 with RC 100 RASC compared to Beowulf cluster
Speed-ups by von Neumann to RC migrations:
Application | Speed-up factor | Power savings | Cost savings | Size savings
DNA and protein sequencing | 8723 | 779 | 22 | 253
DES breaking | 28514 | 3439 | 96 | 1116
Much less equipment needed; much less memory and bandwidth needed; massively saving energy
18
Speed-ups by vN Software to FPGA Migrations
[Chart: speedup factor (log scale, 10^0 to 10^6) over year (1985 to 2010), trend line x2/yr]
DSP and wireless: FFT 100; Reed-Solomon decoding 2400; Viterbi decoding 400; MAC 1000
Bioinformatics: Smith-Waterman pattern matching 288; molecular dynamics simulation 88; BLAST 52; protein identification 40
Astrophysics: GRAPE 20
Image processing, pattern matching, multimedia: SPIHT wavelet-based image compression 457; real-time face detection 6000; video-rate stereo vision 900; pattern recognition 730; CT imaging 3000
Crypto 1000; DES breaking 28514
DNA & protein sequencing 8723
Power saving: ~10% of speed-up (chart values: ~300, 779, 3439)
Pre-FPGA solutions (Xputer): grid-based DRC* 15000 („fair comparison“), no FPGA: DPLA on MoM by TU-KL*
1984: 1 DPLA replaces 256 FPGAs
*) TU Kaiserslautern; fabricated by the E.I.S. Multi University Project Chip
http://www.fpl.uni-kl.de/staff/hartenstein/Hartenstein-Speedup-Factors.pdf
http://www.fpl.uni-kl.de/staff/hartenstein/eishistory_en.html
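The rule of thumb "power saving is about 10% of the speed-up" can be checked against the two migrations for which the slides give both numbers. The snippet below is only a worked calculation on those published figures; the dictionary layout and the function name `power_saving_share` are illustrative.

```python
# (speed-up factor, power saving factor) as quoted on the slides
migrations = {
    "DNA and protein sequencing": (8723, 779),
    "DES breaking": (28514, 3439),
}


def power_saving_share(speedup, power_saving):
    """Power saving factor expressed as a fraction of the speed-up factor."""
    return power_saving / speedup


for name, (speedup, power_saving) in migrations.items():
    share = power_saving_share(speedup, power_saving)
    print(f"{name}: power saving is {share:.1%} of the speed-up")
```

Both ratios come out close to one tenth (roughly 9% and 12%), which is consistent with the "~10%" annotation on the chart.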
19
Design Rule Check Accelerator
PISA DRC accelerator [ICCAD 1984]
Processing 4-by-4 reference patterns
1984: 1 DPLA* replaces 256 FPGAs
DPLA was more area-efficient than FPGA by several orders of magnitude
*) DPLA: fabricated by the E.I.S. Multi University Project: http://www.fpl.uni-kl.de/staff/hartenstein/eishistory_en.html
20
The Reconfigurable Computing Paradox
... although the effective integration density of FPGAs lags 4 orders of magnitude behind the Gordon Moore curve, because of: wiring overhead, reconfigurability overhead, routing congestion, etc.
Enabling software developers to apply their skills over FPGAs has been a long and, as of yet, unreached research objective in reconfigurable computing.
Reinvent Computing
21
Obstacles to widespread FPGA adoption go well beyond the required skill set
Workshop at FPL 2015: http://reconfigurablecomputing4themasses.net/
22
>> Outline <<
The Power Wall
“Dataflow” Computing
Reconfigurable Computing
Time to Space Mapping
The Xputer Paradigm
Conclusions
23
© 2006, reiner@hartenstein.de
Dual paradigm mind set: an old hat, but it was ignored
Time to space mapping (procedural to structural):
1967: W. A. Clark: Macromodular Computer Systems; 1967 SJCC, AFIPS Conf. Proc.
1971: loop to pipe mapping
1972: C. G. Bell et al: The Description and Use of Register-Transfer Modules (RTM's); IEEE Trans-C21/5, May 1972
PDP-16 RTMs: [Figure labels: FF, token bit, evoke, FF]
„This is so simple … why did it take 25 years to find out?“
Tunnel Vision!
24
Transformations since the 1970s
Time domain: procedure domain. Space domain: structure domain.
Program loop (time algorithm): n x k time steps, 1 CPU
Strip Mining Transformation (time to time/space mapping)
Pipeline (space/time algorithm): k time steps, n DPUs
Loop Transformations: rich methodology published [survey: Diss. Karin Schmidt, 1994, Shaker Verlag]
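The step counts on this slide can be reproduced with a toy model. The sketch below is an illustration under simple assumptions, not the transformation methodology surveyed in the cited thesis: every operation costs one time step, the loop body is a chain of n operations, and each operation is mapped to its own DPU. The names `run_loop_on_cpu`, `run_loop_as_pipe` and `EMPTY` are hypothetical.

```python
EMPTY = object()   # marks a pipeline register that holds no data item yet


def run_loop_on_cpu(body_ops, data):
    """Time domain: one CPU executes all n operations of each of the k iterations."""
    steps, results = 0, []
    for x in data:
        for op in body_ops:
            x = op(x)
            steps += 1
        results.append(x)
    return results, steps


def run_loop_as_pipe(body_ops, data):
    """Space domain: one DPU per operation; a data item advances one DPU per step."""
    n, feed = len(body_ops), list(data)
    regs = [EMPTY] * n            # regs[s] holds the output of DPU s
    steps, results = 0, []
    while len(results) < len(feed):
        new = [EMPTY] * n
        if steps < len(feed):     # next item of the input data stream enters DPU 0
            new[0] = body_ops[0](feed[steps])
        for s in range(1, n):     # all other DPUs work on their neighbour's output
            if regs[s - 1] is not EMPTY:
                new[s] = body_ops[s](regs[s - 1])
        regs = new
        steps += 1
        if regs[-1] is not EMPTY:
            results.append(regs[-1])
    return results, steps


body = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]   # n = 3 operations
print(run_loop_on_cpu(body, range(5)))    # n x k = 15 time steps
print(run_loop_as_pipe(body, range(5)))   # k + n - 1 = 7 time steps
```

In this model the pipeline needs k + n - 1 steps rather than exactly k; the extra n - 1 steps are the fill latency, which becomes negligible for long data streams.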
25
The Systolic Arrays (1)
1980: M. J. Foster, H. T. Kung: The Design of Special-Purpose VLSI Chips...; IEEE 7th ISCA, La Baule, France, May 6-8, 1980
DPA (pipe network): DataPath Array (array of DPUs)
No instruction streams needed; execution is transport-triggered
„Data streams“ define which data item arrives at which time at which port
[Figure: input data streams entering and output data streams leaving the DPA, each drawn as port # over time]
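To make "which data item at which time at which port" concrete, here is a minimal sketch of a linear systolic array computing a matrix-vector product. It is a textbook-style illustration, not the design from the Foster/Kung paper; the function names `port_schedule` and `systolic_matvec` and the particular skewing (element a[i][j] enters port i at time i + j) are assumptions.

```python
def port_schedule(matrix):
    """Data stream definition: (time, port) -> data item entering at that port."""
    return {(i + j, i): a for i, row in enumerate(matrix) for j, a in enumerate(row)}


def systolic_matvec(matrix, vector):
    """DPU i accumulates y[i]; x values hop one DPU to the right per time step."""
    n, m = len(matrix), len(vector)
    schedule = port_schedule(matrix)
    x_regs = [None] * n                   # x value currently held by each DPU
    acc = [0] * n
    for t in range(m + n - 1):
        incoming = vector[t] if t < m else None
        x_regs = [incoming] + x_regs[:-1]      # shift the x stream by one DPU
        for i in range(n):
            if x_regs[i] is not None:          # data arrival triggers the operation
                acc[i] += schedule[(t, i)] * x_regs[i]
    return acc


A = [[1, 2], [3, 4]]
print(port_schedule(A))            # element 3 = a[1][0] enters port 1 at time 1
print(systolic_matvec(A, [5, 6]))  # [17, 39]
```

No DPU fetches an instruction here: each one fires only when its operands arrive, which is the transport-triggered behaviour named on the slide.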
26
Systolic Arrays (2)
M. J. Foster and H. T. Kung: “The Design of Special-Purpose VLSI Chips...”
Only special-purpose VLSI chips? Why not general purpose? (Tunnel Vision)
Karl Steinbuch: It is not sufficient to invent something. You need to recognize that you have invented something.
http://karl-steinbuch.org
27
What Synthesis Method?
H. T. Kung: „of course algebraic!“ (linear projection): supports only special applications with strictly regular data dependencies, and only linear pipes
My student Rainer Kress replaced it by simulated annealing: this also supports pipe networks of any irregular and wild shape
The super-systolic array*, a generalization of the systolic array: now a general purpose methodology!
*) KressArray [ASP-DAC-1995], http://kressarray.de/
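How simulated annealing can place an irregular pipe network onto an array can be sketched in a few lines. This is a generic textbook-style annealer, not the KressArray mapper: the cost function (total Manhattan wire length), the linear cooling schedule, and the names `wire_length` and `anneal_placement` are all assumptions made for illustration.

```python
import math
import random


def wire_length(placement, edges):
    """Total Manhattan distance over all producer -> consumer connections."""
    return sum(abs(placement[a][0] - placement[b][0]) +
               abs(placement[a][1] - placement[b][1]) for a, b in edges)


def anneal_placement(nodes, edges, rows, cols, steps=2000, t_start=2.0, seed=0):
    """Place each node on its own grid cell, trying to keep connected nodes close."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    rng.shuffle(cells)
    placement = dict(zip(nodes, cells))       # random initial placement
    free = cells[len(nodes):]                 # unoccupied grid cells
    cost = wire_length(placement, edges)
    best, best_cost = dict(placement), cost
    for step in range(steps):
        temp = t_start * (1.0 - step / steps) + 1e-9   # linear cooling
        a = rng.choice(nodes)
        if free and rng.random() < 0.5:       # move: relocate a node to a free cell
            k = rng.randrange(len(free))
            placement[a], free[k] = free[k], placement[a]
            undo = ("free", a, k)
        else:                                 # move: swap the cells of two nodes
            b = rng.choice(nodes)
            placement[a], placement[b] = placement[b], placement[a]
            undo = ("swap", a, b)
        new_cost = wire_length(placement, edges)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost                   # accept, sometimes even uphill moves
            if cost < best_cost:
                best, best_cost = dict(placement), cost
        elif undo[0] == "free":               # reject: put the node back
            placement[undo[1]], free[undo[2]] = free[undo[2]], placement[undo[1]]
        else:                                 # reject: swap back
            placement[undo[1]], placement[undo[2]] = placement[undo[2]], placement[undo[1]]
    return best, best_cost


ops = ["load", "mul", "add", "store"]
pipes = [("load", "mul"), ("mul", "add"), ("load", "add"), ("add", "store")]
print(anneal_placement(ops, pipes, rows=3, cols=3))
```

Unlike a linear projection, nothing in this search assumes regular data dependencies: any edge list is accepted, which is the point the slide makes about irregular pipe networks.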
28
Who generates* the data streams?
H. T. Kung: “It’s not our job”
Another Tunnel Vision symptom: without a sequencer, the chance to invent a new machine paradigm (the Xputer) was missed
*) or receives
29
The Xputer Machine Paradigm
Obtained by adding auto-sequencing memory with multiple data counters instead of a program counter; it is TU-KL's symbiosis of Time to Space Mapping and Reconfigurable Computing
ASM: Auto-Sequencing Memory (data counter, GAG, RAM)
GAG: Generic Address Generator
Xputer literature
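The role of a data counter can be sketched as follows. This is a simplified model, not the actual GAG hardware or its parameter set: the function `gag_scan`, the class `AutoSequencingMemory`, and the rectangular scan pattern with `base`, `row_stride` and `col_stride` are hypothetical names and choices.

```python
def gag_scan(base, rows, cols, row_stride, col_stride=1):
    """Address generator sketch: the data counter walks a rows x cols window."""
    for r in range(rows):
        for c in range(cols):
            yield base + r * row_stride + c * col_stride


class AutoSequencingMemory:
    """A RAM plus an address generator: the memory emits a data stream by itself."""

    def __init__(self, ram, address_generator):
        self.ram = ram
        self.address_generator = address_generator

    def stream(self):
        for address in self.address_generator:
            yield self.ram[address]


# A 4 x 4 image stored row by row; scan the 2 x 2 window that starts at address 5.
ram = list(range(100, 116))
asm = AutoSequencingMemory(ram, gag_scan(base=5, rows=2, cols=2, row_stride=4))
print(list(asm.stream()))   # [105, 106, 109, 110]
```

The data path that consumes this stream never computes an address and never fetches an instruction; sequencing is entirely a property of the memory side.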
30
>> Outline <<
The Power Wall
“Dataflow” Computing
Reconfigurable Computing
Time to Space Mapping
The Xputer Paradigm
Conclusions
31
What does Configware mean?
Traditional computing: Software Source, Software Compiler, Software Code (instruction-procedural: time domain)
Reconfigurable Computing: Configware Source, mapper (Placement & Routing), Configware Code (structural: space domain)
Software to Configware Migration: from the time domain to the space domain
Xputer pages
32
Compilation: Software vs. Configware and Flowware
Software Engineering: source program (C, FORTRAN, MATLAB), software compiler, software code (procedural: time domain)
Configware Engineering: source „program“, configware compiler consisting of mapper (placement & routing) and scheduler (data)
Mapper output: configware code (space domain)
Scheduler output: flowware code (time domain)
Xputer literature
33
Heterogeneous: Co-Compilation
Software / Configware Co-Compiler for C, FORTRAN, MATLAB
Automatic SW / CW partitioner (simulated annealing)
Software compiler: software code
Configware compiler: mapper yields configware code; scheduler (data) yields flowware code
Xputer pages
34
Paradigm Shift Consequences
von Neumann (Software Engineering): resources fixed, algorithm variable; 1 programming source needed: software; CPU with Program Counter (PC)
Xputer (Configware Engineering): resources variable, algorithm variable; 2 programming sources needed: configware and flowware
Configware: Configuration Code (CC)
Flowware: sequencing code for the Data Counters (DCs) (e.g. see the MoPL language)
Xputer literature
35
Xputer machine paradigm
ASM: Auto-Sequencing Memory (data counter, GAG, RAM)
Use data counters, no program counter
ASMs act as data stream generators around the rDPA (pipe network)
ASM implemented by distributed on-chip memory; 100 and more on-chip ASMs are feasible
Reconfigurable: 32 ports, or n x 32 ports
[Figure: data streams flowing from ASMs through the rDPA to ASMs; other example]
Xputer pages
36
Duality of procedural Languages
Software Languages | Flowware Languages
read next instruction | read next data item
goto (instruction address) | goto (data address)
jump to (instruction address) | jump to (data address)
instruction loop | data loop
instruction loop nesting | data loop nesting
instruction loop escape | data loop escape
instruction stream branching | data stream branching
no: no internally parallel loops | yes: internally parallel loops
state register: program counter | state register(s): data counter(s)
But there is an asymmetry: flowware languages (e.g. MoPL) are simpler, with no ALU tasks
Xputer literature, Xputer pages
37
CoDe-X: Dual Paradigm Application Development
Juergen Becker’s PhD thesis, 1996
Software/configware co-compiler: the high-level language source (C language) goes to the Partitioner, which feeds the SW compiler and the CW compiler
SW compiler: instruction-stream-based software code for the CPU
CW compiler: data-stream-based configware code for the rDPU (reconfigurable accelerator)
Road map to the Personal Supercomputer: CPU, hardwired accelerator, reconfigurable accelerator
38
>> Outline <<
The Power Wall
“Dataflow” Computing
Reconfigurable Computing
Time to Space Mapping
The Xputer Paradigm
Conclusions
39
Illustrating the Paradigm Trap (1)
The watering can model [Hartenstein]
The von Neumann single-core approach: the Memory Wall (crippled watering can)
The von Neumann manycore approach: many von Neumann bottlenecks (many crippled watering cans)
40
Illustrating the Paradigm Trap (2)
The “Dataflow“ Computer: extremely complicated, no watering can model!
(a power efficiency breakthrough did not yet happen here)
41
Xputer: the only massively power-efficient Paradigm
The Xputer Paradigm has no von Neumann bottleneck
The watering can model [Hartenstein]
42
Thank you!
43
END
44
Backup for discussion:
45
Pipelining through DPU Arrays: the TU-KL Xputer principle
DPA = DPU array; DPU = Data Path Unit
No memory wall
DPU operation is transport-triggered
No instruction streams
No message passing, nor communication through common memory
Massively avoiding memory cycles
[Figure: input data streams entering and output data streams leaving the DPA]
46
I-Structures (I = incremental), part 1
Jurij Silc: Dataflow Architectures, http://csd.ijs.si/courses/dataflow/
47
I-Structures, part 2
MIT Tagged-Token Dataflow Architecture (PE): I-structure select, I-structure assign
Jurij Silc: Dataflow Architectures, http://csd.ijs.si/courses/dataflow/
48
Taxonomy
Flynn‘s taxonomy: von Neumann only
Diana‘s taxonomy: Reconfigurable Computing
Reiner‘s taxonomy: heterogeneous systems
Reiner‘s 2nd taxonomy: Xputers only
49
Von Neumann Syndrome
Lambert M. Surhone, Mariam T. Tennoe, Susan F. Henssonow (ed.): Von Neumann Syndrome; ßetascript publishing 2011
Shifting to the dominance of von-Neumann-only caused a cumulative damage of at least trillions of dollars, if not quadrillions ...