Enabling Technologies for Reconfigurable Computing and Software / Configware Co-Design, Part 2: Data-Stream-based Computing


1 Enabling Technologies for Reconfigurable Computing and Software / Configware Co-Design. Part 2: Data-Stream-based Computing. Reiner Hartenstein, University of Kaiserslautern. July 8, 2002, ENST, Paris, France

2 © 2002, University of Kaiserslautern 2 Schedule (timeslot: topic)
xx.30 – xx.00: Reconfigurable Computing (RC)
xx.00 – xx.30: coffee break
xx.30 – xx.00: Design / Compilation Techniques
xx.00 – xx.00: lunch break
xx.00 – xx.30: Resources for Data-Stream-based RC
xx.30 – xx.00: coffee break
xx.00 – xx.30: FPGAs: recent developments

3 Opportunities by new patent laws? To clever guys keen on patents: don't file for a patent on the following details! Everything shown in this presentation was published years ago.

4 >> EDA revolution. Outline: EDA revolution; Dead Supercomputer; Data-Stream-based Computing; Design Space Explorers; KressArray Xplorer; Machine paradigms; Co-Compilation

5 The next EDA Industry Revolution. EDA industry paradigm switching every 7 years [Keutzer / Newton]: transistor entry (Applicon, Calma, CV); schematics entry (Daisy, Mentor, Valid, ...); synthesis (Cadence, Synopsys); next: (Co-)Compilation for data-stream-based DPU arrays, Makimoto's 3rd wave [Hartenstein]. Years marked on the timeline: 1978, 1999, 2006.

6 [Richard Newton]

7 Biggest Mistake in History

8 Innovation Stalled? [Richard Newton] What is next after VHDL?

9 Missing the next revolution: ignoring reconfigurable computing when teaching computing fundamentals within our CS curricula is one of the biggest mistakes in the history of information technology application, causing the waste of billions of dollars. [Hartenstein]

10 >> Dead Supercomputer. Outline: EDA revolution; Dead Supercomputer; Data-Stream-based Computing; Stream-based Memory Architecture; Design Space Explorers; KressArray Xplorer; Machine paradigms; Co-Compilation

11 Dead Supercomputer Society [Gordon Bell, keynote at ISCA 2000]: ACRI, Alliant, American Supercomputer, Ametek, Applied Dynamics, Astronautics, BBN, CDC, Convex, Cray Computer, Cray Research, Culler-Harris, Culler Scientific, Cydrome, Dana/Ardent/Stellar/Stardent, DAPP, Denelcor, Elexsi, ETA Systems, Evans and Sutherland Computer, Floating Point Systems, Galaxy YH-1, Goodyear Aerospace MPP, Gould NPL, Guiltech, ICL, Intel Scientific Computers, International Parallel Machines, Kendall Square Research, Key Computer Laboratories, MasPar, Meiko, Multiflow, Myrias, Numerix, Prisma, Tera, Thinking Machines, Saxpy, Scientific Computer Systems (SCS), Soviet Supercomputers, Supertek, Supercomputer Systems, Suprenum, Vitesse Electronics

12 Dying Parallel Computing Society

13 >> Stream-based Computing. Outline: EDA revolution; Dead Supercomputer; Data-Stream-based Computing; Design Space Explorers; KressArray Xplorer; Machine paradigms; Co-Compilation

14 Anti particles. 1928: Paul Dirac: "there should be an anti electron having positive charge" (Nobel Prize 1933). 1932: Carl David Anderson detected this "positron" in cosmic radiation (Nobel Prize 1936). 1954: new accelerators, like Berkeley's Bevatron. 1955: Owen Chamberlain et al. create the anti proton on the Bevatron. 1956: anti neutron created on the Bevatron. 1965: creation of a deuterium anti nucleus at CERN. 1995: hydrogen anti atom created at CERN by forcing a positron and an anti proton to merge at very low energy. "In the universe there should be regions of anti matter ... but there are asymmetries."

15 Matter & Antimatter: Atom and Anti Atom. The world of matter, machine paradigm: the atom (electron spinning around a positive nucleus). Anti matter, machine paradigm: the anti atom (positron spinning).

16 Matter & Antimatter of Informatics: Machine and Anti Machine. Machine paradigm ("von Neumann"): instruction stream spinning around the CPU (+). Milestones: 1st electronic computer (Konrad Zuse); 1946: v. N. machine paradigm; 1st microprocessor (Ted Hoff). Anti machine paradigm: data stream spinning around the DPU (-). Milestones: 1979: "data streams" (systolic array: Kung / Leiserson); 1995: rDPA / DPSS (supersystolic: Rainer Kress); data-procedural anti machine paradigm published.

17 RAM-based CPU (+): data path, instruction sequencer, RAM. Simple machine paradigm + scalability + relocatability + compatibility = secret of success of the software industry.

18 Nasty Matter. CPU (+): data path, instruction sequencer, RAM. Address computation overhead; instruction fetch overhead; central von Neumann bottleneck; extremely power hungry and area inefficient; performance problems. Reconfigurable? The wrong machine paradigm: a new instruction sequencer is always needed.

19 Parallelism by Concurrency: independent instruction streams

20 Concurrent Computing: several CPUs (each a data path with its own instruction sequencer) connected by bus(es) or a switch box. Extremely inefficient: massive switching activity at runtime may affect performance far beyond Amdahl's law.

21 Coarse Grain Reconfigurable Arrays vs. Parallel Processes. Parallel processes: I-Seq plus ALU, hardwired, parallelism at process level. Coarse grain reconfigurable arrays: data sequencer plus rALU, reconfigurable, parallelism at datapath level, no instruction sequencing!

22 Some differences: CPU versus DPU. CPU (+): data path plus instruction sequencer; central RAM; the instruction stream is routed here. DPU (-, Data Path Unit): transport-triggered by data streams or an external signal; nothing central, no vN bottleneck: multiple ports, each a RAM with its own data sequencer; instruction fetch not at run time: no overhead; data streams scheduled elsewhere.

23 Machine paradigm: some differences. CPU (+, matter): no. of streams = 1. DPU / DPA (-, antimatter): no. of streams ≥ 1.

24 DPA = DPU array: coherent data streams spinning around

25 >>> Extremely high efficiency: avoiding address computation overhead; avoiding instruction fetch and interpretation overhead; high parallelism, massively multiple deep pipelines; much less configuration memory; no routing areas to configure functions from CLBs

26 Computing in space and time. Computing in time versus computing in space: data streams x (x1, x2, ...) and y (y1, y2, ...) pass an array in which the coefficients a11 ... a33 have been placed. Migration between time and space by re-timing, placement and other transformations (systolic arrays etc.). This dichotomy is completely ignored by our CS curricula.
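The time/space dichotomy on this slide can be illustrated with a small sketch. It is not taken from the presentation: the function names and the 3 x 3 example are assumptions, and the "space" version broadcasts each stream element to all cells instead of skewing the streams as a real systolic array would.

```python
def matvec_in_time(a, x):
    """One multiply-accumulate unit reused over n*n sequential time steps."""
    n = len(x)
    y = [0] * n
    steps = 0
    for i in range(n):
        for j in range(n):
            y[i] += a[i][j] * x[j]
            steps += 1
    return y, steps


def matvec_in_space(a, x):
    """n multiply-accumulate cells side by side; x is streamed past them.

    In step t every cell i consumes x[t] at the same time, so n steps suffice.
    """
    n = len(x)
    y = [0] * n
    steps = 0
    for t in range(n):  # time axis: one stream element per step
        # space axis: all n cells update in the same step
        y = [y[i] + a[i][t] * x[t] for i in range(n)]
        steps += 1
    return y, steps


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
x = [1, 0, 2]
y_time, steps_time = matvec_in_time(a, x)
y_space, steps_space = matvec_in_space(a, x)
```

Both versions produce the same vector; the only difference is how the n*n multiply-accumulate operations are distributed over time steps and cells.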

27 General Stream-based Computing System: heterogeneous array of rDPUs (reconfigurable data path units). Expression tree -> Mapper (simultaneous placement & routing by simulated annealing) -> free form pipe network (space); Scheduler -> data streams (time). DPU architectures: e.g. y + a * x; further operators in the array: *, sh, xf, -. The same mapper for both reconfigurable and hardwired: Kress DPSS [1995].
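As a rough illustration of mapping by simulated annealing, the sketch below places the operators of a small expression graph on a grid and minimizes total Manhattan wire length. It is a generic textbook-style annealer, not the DPSS mapper: the cost function, cooling schedule, node names and grid size are all assumptions, and it does placement only, with no routing.

```python
import math
import random


def wire_length(pos, edges):
    """Total Manhattan distance over all producer/consumer edges."""
    return sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
               for a, b in edges)


def anneal_placement(nodes, edges, rows, cols, steps=3000, seed=0):
    """Place operator nodes on a rows x cols grid by simulated annealing."""
    if len(nodes) > rows * cols:
        raise ValueError("grid too small for the given nodes")
    rng = random.Random(seed)
    # One slot per grid cell; None marks an unused cell.
    slots = list(nodes) + [None] * (rows * cols - len(nodes))
    rng.shuffle(slots)

    def positions(s):
        return {n: divmod(i, cols) for i, n in enumerate(s) if n is not None}

    cost = initial = wire_length(positions(slots), edges)
    best, best_cost = positions(slots), cost
    temp = 2.0
    for _ in range(steps):
        i, j = rng.sample(range(len(slots)), 2)
        slots[i], slots[j] = slots[j], slots[i]  # propose: swap two cells
        new_cost = wire_length(positions(slots), edges)
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = positions(slots), cost
        else:
            slots[i], slots[j] = slots[j], slots[i]  # reject: undo the swap
        temp = max(0.01, temp * 0.995)  # geometric cooling (assumed schedule)
    return best, best_cost, initial


# Expression y + a * x as a tiny graph (hypothetical node names).
nodes = ["x", "a", "mul", "y", "add"]
edges = [("x", "mul"), ("a", "mul"), ("mul", "add"), ("y", "add")]
placement, cost, initial_cost = anneal_placement(nodes, edges, rows=3, cols=3)
```

The returned placement maps each node to a (row, column) cell; the best cost found is never worse than the cost of the random starting placement.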

28 Super Pipe Networks: the key is mapping, rather than architecture *. *) KressArray [1995]

29 >> Design Space Explorers. Outline: EDA revolution; Dead Supercomputer; Data-Stream-based Computing; Design Space Explorers; KressArray Xplorer; Machine paradigms; Co-Compilation

30 KressArray Explorer. Domain-specific reconfigurable platforms will be suitable to cope with the 2nd Design Crisis. Fully general purpose reconfigurable is unrealistic, an illusion, just as the general purpose massively parallel computer system.

31 Motivation. Universal RAs: is it feasible? The general purpose (coarse grain) reconfigurable array appears to be an illusion, such as, obviously, also the universal massively parallel computer architecture. Counter-example: application domain of image processing.

32 -> Design Space Exploration. Exploration: Design Space Explorers (DSEs); Platform Space Explorers (PSEs); compiler / PSE symbiosis; parallel computing vs. reconfigurable. Design Space Explorers: for VLSI design in general; for parallel computer systems; Xplorer is the only one for reconfigurable platforms.

33 Design Space Exploration Systems

34 >> KressArray Xplorer. Outline: EDA revolution; Dead Supercomputer; Data-Stream-based Computing; Design Space Explorers; KressArray Xplorer; Machine paradigms; Co-Compilation

35 KressArray Xplorer: (Design Space) Platform Space Explorer. Components: Architecture & Mapping Editor; Statistics; KressArray DPSS; Datastream Generator; HDL Generator; Simulator; Datapath Generator; Delay & Power Estimator; Improvement Proposal Generator. Inputs: User; DPSS Source Input; Application Set. Accessible by internet: runs best with Netscape 4.6.1

36 >> Machine paradigms. Outline: EDA revolution; Dead Supercomputer; Data-Stream-based Computing; Stream-based Memory Architecture; Design Space Explorers; KressArray Xplorer; Machine paradigms; Co-Compilation

37 Computer: the wrong Machine Paradigm ("von Neumann"). Compiler -> RAM (instructions) -> Sequencer (program counter: state register) -> Datapath (hardwired), tightly coupled by compact instruction code. Does not support soft data paths (reconfigurable datapath).

38 Xputer: The Soft Machine Paradigm (anti machine), University of Kaiserslautern. Compiler and Scheduler -> RAM -> (multiple) sequencer (data counters; "instructions": data stream spec) -> Datapath Array (reconfigurable, also for hardwired), loosely coupled by decision data bits only. There are some differences to Computer: the wrong Machine Paradigm ("von Neumann").

39 Machine Paradigms

40 All Fundamental Concepts available: Data Sequencer Methodology; Data-procedural Languages (duality with v. N.), supporting memory bandwidth optimization; Soft Data Path Synthesis Algorithms; Parallelizing Loop Transformation Methods; Compilers supporting Soft Machines; SW / CW Partitioning Co-Compilers. (Part 3)

41 >> Co-Compilation. Outline: EDA revolution; Dead Supercomputer; Data-Stream-based Computing; Design Space Explorers; KressArray Xplorer; Machine paradigms; Co-Compilation

42 Changing Models of Computing. "von Neumann": (procedural) software downloaded to RAM; data path, instruction sequencer, I / O. Contemporary: host plus hardwired accelerator(s) (ASICs); software downloaded to RAM at customer site; accelerators by CAD, done at vendor site, hardware designer needed. Reconfigurable computing: host plus reconfigurable accelerator(s); software and configware downloaded to RAM, both done at customer site.

43 Co-Compilation. High level programming language source -> partitioning compiler -> software running on the μProcessor (Computer machine paradigm) and configware running on the reconfigurable accelerators (reconfigurable architecture, rDPA: "soft" anti machine paradigm), with an interface between both. No CAD! Compilation instead! Hardware / Software Co-Design turns to Configware / Software Co-Design. We introduce: Co-Compilation.

44 Jürgen Becker's Co-DE-X Co-Compiler. X-C source (X-C is the C language extended by MoPL) -> Analyzer / Profiler -> Partitioner with loop transformations, driven by resource parameters -> GNU C compiler for the host (vN, Computer machine paradigm) and X-C compiler plus DPSS for the KressArray (rDPA, anti machine paradigm). Supporting different platforms; supporting platform-based design.

45 Loop Transformation Examples (resource parameter driven Co-Compilation). Sequential process: loop 1-16 body endloop. Strip mining: loop 1-8 body endloop and loop 9-16 body endloop, between fork and join. Loop unrolling: the host runs loop 1-8 trigger endloop, loop 1-4 trigger endloop or loop 1-2 trigger endloop, while the unrolled body runs on the reconfigurable array.
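The two transformations named on this slide can be sketched in ordinary sequential code. This is only an illustration of the loop restructuring itself, under assumed names (`body`, `strip`); the fork / join parallelism and the trigger mechanism of the reconfigurable array are not modeled.

```python
def body(i):
    """Hypothetical loop body; stands in for whatever the loop computes."""
    return i * i


def original(n=16):
    # loop 1-16 body endloop
    return [body(i) for i in range(1, n + 1)]


def strip_mined(n=16, strip=8):
    # Strip mining: split the index range into strips of fixed size.
    # Each strip (1-8, 9-16) could be handed to a separate process.
    out = []
    for start in range(1, n + 1, strip):
        for i in range(start, min(start + strip, n + 1)):
            out.append(body(i))
    return out


def unrolled_by_2(n=16):
    # Loop unrolling by 2: half as many iterations, two body copies each.
    # Assumes n is even.
    out = []
    for i in range(1, n + 1, 2):
        out.append(body(i))
        out.append(body(i + 1))
    return out
```

All three functions compute the same sequence; the transformations change only how the iterations are grouped, which is what lets a co-compiler match the loop to the available resources.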

46 Future Coarse Grain RA Development. It is indispensable to operate within the convergence area of compilers, co-compilers, architecture and full-custom-style VLSI design (array cells). It is a must that products come with a development platform which encourages users, especially those with a limited hardware background.

47 >> Design Space Explorers. Outline: EDA revolution; Dead Supercomputer; Stream-based Computing; Stream-based Memory Architecture; Design Space Explorers; KressArray Xplorer; Machine paradigms; Co-Compilation

48 END

49 Schedule (timeslot: topic)
– 10.00: Reconfigurable Computing (RC)
– 10.30: coffee break
– 12.00: Data-stream-based Computing for RC
– 14.00: lunch break
– 15.30: Resources for RC
– 16.00: coffee break
– 17.30: FPGAs: recent developments

52 Computer: the wrong Machine Paradigm ("von Neumann"): Compiler -> Memory (instructions) -> Sequencer (program counter: state register) -> Datapath (hardwired), tightly coupled by compact instruction code; does not support soft data paths (reconfigurable datapath). Xputer: The Soft Machine Paradigm (University of Kaiserslautern): Compiler and Scheduler -> Memory -> multiple sequencer (data counter; "instructions") -> Datapath Array (reconfigurable, also for hardwired), loosely coupled by decision data bits only.

53 Soft Machine Paradigm. Xputer: Compiler, Scheduler, Memory, Sequencer (data counter; "instructions"), reconfigurable Datapath. Parallel Xputer: Compiler, Scheduler, multiple memory, multiple Sequencers (data counters; "instructions"), reconfigurable Datapath. Decision data only, i.e. loose coupling.

54 Co-Compilation: Jürgen Becker's Co-DE-X Co-Compiler [ASP-DAC'95]. High level programming language source (X-C) -> partitioning compiler (Analyzer / Profiler, Partitioner, resource parameters) -> GNU C compiler: software running on the μProcessor (Computer machine paradigm); X-C compiler and DPSS: configware running on the reconfigurable accelerators (KressArray; Xputer "soft" machine paradigm), with an interface between both. Hardware / Software Co-Design turns to Configware / Software Co-Design. Supporting different platforms.

55 Computer: the wrong Machine Paradigm ("von Neumann"). Compiler -> Memory (instructions) -> Sequencer (program counter) and Decoder -> Datapath (hardwired), tightly coupled by a compact instruction code. Does not support soft data paths: with a reconfigurable datapath there is no instruction fetch at run time.

56 KressArray Xplorer (Platform Design Space Explorer); DPSS published at ASP-DAC 1995. Inputs: User; ALE-X code; KressArray DPSS application set; KressArray family parameters; design rules. Components: User Interface; Architecture Editor; Mapping Editor; ALE-X Compiler (expression tree, intermediate form 2); Mapper (intermediate form 3); Scheduler (data stream schedule); Analyzer; Architecture Estimator; Delay Estimator (statistical data); Power Estimator (power data); Datapath Generator (Kress rDPU layout); HDL Generator (VHDL, Verilog); Simulator; Improvement Proposal Generator with Xplorer Inference Engine (FOX) and suggestion selection.

57 Changing Models of Computation. Contemporary: host plus hardwired accelerator(s) (ASICs); software by Compiler into RAM, done at customer site; accelerators by CAD, done at vendor site, EDA tools needed*. Reconfigurable computing (machine paradigm): host plus reconfigurable accelerator(s); software and configware by Co-Compiler into RAM, both done at customer site; no hardware experts needed. *) even 80% of hardware people hate their tools

58 Machine Paradigms

59 KressArray Design Space Xplorer: DPSS-N Data Path Synthesis System. User -> Editor / User Interface; ALE-X code (.alex) -> ALE-X Compiler -> intermediate format (.map) -> Mapper (mapping: technology mapping, placement & routing) -> intermediate format (.map) including configware code; Scheduler -> sequencing code (.seq); Analyser -> statistical data (.stat); Architecture Estimation; Module Generator with Kress rDPU layout (.krs), Kress IP library (.krs) and other IP; HDL Generator -> HDL description (.v), to synthesis environment.

60 FPGA-Style Mapping for coarse grain reconfigurable arrays: KressArray DPSS (Datapath Synthesis System). Compiler -> Mapper -> Scheduler; the Scheduler specifies and assembles the data streams from / to the array.

61 Design Flow of Domain-specific Architecture Optimization. Nageldinger's KressArray Design Space Xplorer, including a Fuzzy Logic Improvement Proposal Generator. Accessible by internet: runs best with Netscape 4.6.1

62 History of Loop Transformations. 70ies - 80ies, at process level: sequential to parallel processes, incl. vectorization: David Loveman, 1977, Allen and Kennedy, et al.: Loop Unrolling, Loop Fusion, Strip Mining, ... 1995/97 [Karin Schmidt / Jürgen Becker]: (parameter-driven) time to time/space partitioning, down to datapath level, e.g. transformation from sequential process to super-systolic. 2000 [Michael Herz]: optimized RA to memory communication bandwidth: multi-dimensional loop unrolling / storage scheme optimization supporting burst-mode & parallel memory banks.

63 History of Loop Transformations (Instruction Code vs. Reconfiguration Code). For sequential programs on parallel computers: David Loveman, 1977, Allen and Kennedy, etc.: Loop Unrolling, Loop Fusion, Strip Mining, ... For parallel datapaths: Jürgen Becker (1997): sequential to super-systolic transformation, to optimize throughput of Reconfigurable Arrays (RAs). For memory communication: Michael Herz (2000): multi-level loop unrolling to reduce the memory cycles needed to create RA data streams.

64 Paradigm Shift: Mainstream, Tornado. Development of Hypergrowth Markets, Harper Business 1995

65 EDA: where Electronics begins [Richard Newton]. [Figure: NASDAQ index vs. EDA index; labels: 1k, Dataquest, Initiative, New book]

66 What is next after VHDL? Motivations: HDL-savvy designers needed; new business model; Co-Design. Never ending HDLs? Extended HDLs: how far? Automatic partitioning.

67 Dead Supercomputer Society (Research). About 40 university and corporate R&D projects: 2 or 3 successes... All the rest failed to work or to be successful.

68 The End is near: the end of hypergrowth? [Figure: transistors/chip over year to market; growth marks: x1.6/year, x100/decade]

69 Hot Research Topic: Memory Architectures. High Performance Embedded Memory Architectures; High Performance Memory Communication Architectures [Herz]; Custom Memory Management Methodology [Catthoor]; Data Reuse Transformations [Kougia et al.]; Data Reuse Exploration [Soudris, Wuytack]

70 Processor Memory Performance Gap

71 RAs: Cache does not help. Stream-based arrays are a memory bandwidth problem: the memory bandwidth problem is often more dramatic than for microprocessors. The problem: throughput of parallel data streams, not instruction streams. Classical caches do not help, since instruction sequencing is not used. Interleaving is not practicable, since it is based on sequential instruction streams. Super pipe networks, no parallel computers!

72 Efficient Memory Communication should be directly supported by the mapper tools. Synthesizable memory communication: optimized parallel memory controller; an example by Nageldinger's KressArray Xplorer. Legend: sequencers; memory ports; application; not used.

73 The Disk Farm? or a System On a Card? The 500GB disc card (14"): LOTS of bandwidth; a few disks replaced by >10s Gbytes RAM and a processor. MicroDrive: 1.7" x 1.4" x 0.2"; 1999: 340 MB, 5400 RPM, 5 MB/s, 15 ms seek; 2006: 9 GB, 50 MB/s? (1.6X/yr capacity, 1.4X/yr BW). Integrated IRAM processor: 2x height; 16 Mbytes; 1.6 Gflops; 6.4 Gops. Connected via crossbar switch, growing like Moore's law. 10,000+ nodes in one rack! 100/board = 1 TB; 0.16 Tflops. [Gordon Bell, Jim Gray, ISCA2000]
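The 2006 MicroDrive figures on this slide follow from compounding the quoted annual growth factors over the seven years from 1999. A quick check, with the helper name `extrapolate` being an assumption:

```python
def extrapolate(value, annual_factor, years):
    """Compound growth: value * annual_factor ** years."""
    return value * annual_factor ** years


years = 2006 - 1999
capacity_mb = extrapolate(340, 1.6, years)    # 340 MB in 1999, 1.6X/yr
bandwidth_mb_s = extrapolate(5, 1.4, years)   # 5 MB/s in 1999, 1.4X/yr
```

This gives roughly 9.1 GB and 53 MB/s, consistent with the slide's rounded "9 GB, 50 MB/s".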

74 Memory Communication Architecture: hot research topic in embedded systems, for low power and for high performance; storage context transformations [Herz, others]; startups provide memory IP or generators

75 Stream-based Soft Machine: Compiler; Scheduler; Memory (data memory: memory banks ...); Sequencers (data stream generator; "instructions"); rDPA

76 Stream-based Computing: DPU driven by data streams from / to memory or from / to a peripheral interface; transport-triggered execution; no instruction sequencer inside!
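A minimal sketch of this execution model, under stated assumptions: the names `data_sequencer`, `configure_dpu` and `run_stream`, the strided address pattern and the scale-and-offset function are all hypothetical and only illustrate that the DPU is configured once and is then triggered by arriving data rather than by fetched instructions.

```python
from typing import Callable, Iterable, Iterator, List


def data_sequencer(base: int, stride: int, count: int) -> Iterator[int]:
    """Hypothetical data sequencer: emits a fixed address pattern."""
    for i in range(count):
        yield base + i * stride


def configure_dpu(scale: int, offset: int) -> Callable[[int], int]:
    """Configuration happens once, before run time, and fixes the datapath."""
    def dpu(x: int) -> int:
        return scale * x + offset
    return dpu


def run_stream(memory: List[int], addresses: Iterable[int],
               dpu: Callable[[int], int]) -> List[int]:
    # Each arriving data item triggers the DPU (transport-triggered);
    # no instruction is fetched or decoded inside this loop.
    return [dpu(memory[a]) for a in addresses]


memory = [3, 1, 4, 1, 5, 9, 2, 6]
dpu = configure_dpu(scale=2, offset=1)
result = run_stream(memory, data_sequencer(base=0, stride=2, count=4), dpu)
```

Here the sequencer reads every second word, so the DPU sees the stream 3, 4, 5, 2 and produces 7, 9, 11, 5.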

77 Stream-based Computing: (r)DPU array, for both reconfigurable and hardwired; DPU driven by data streams

78 Systolic Stream-based Computing System. Systolic array [H. T. Kung, 1980]: an array of DPUs (Data Path Units). DPU architecture: y + a * x. The Mathematician's Synthesis Method: equations -> placement by linear projection or algebraic mapping -> data streams. Linear pipelines and uniform arrays only; no routing!
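A linear pipeline of identical cells, each computing y_out = y_in + a * x as in the DPU shown on this slide, can be sketched as a small FIR filter. The choice of a FIR filter and the name `fir_pipeline` are assumptions, and for brevity the input x is broadcast to all cells (a transposed-form, semi-systolic arrangement) while only the partial sums move through registers.

```python
def fir_pipeline(coeffs, xs):
    """Linear array of identical cells, each computing y_out = y_in + a * x.

    regs[i] holds the partial sum at the output of cell i; cell i reads the
    previous-cycle output of cell i + 1 as its y_in.
    """
    k = len(coeffs)
    regs = [0] * (k + 1)  # regs[k] is the constant 0 fed into the last cell
    out = []
    for x in xs:
        # All cells fire in the same clock step, reading the old registers.
        regs = [regs[i + 1] + coeffs[i] * x for i in range(k)] + [0]
        out.append(regs[0])
    return out


impulse_response = fir_pipeline([1, 2, 3], [1, 0, 0, 0])
moving_sum = fir_pipeline([1, 1], [1, 2, 3])
```

Feeding a unit impulse returns the coefficients in order, which is the defining behavior of a FIR filter; every cell is the same multiply-add unit, so the array is uniform and needs no routing beyond neighbor connections.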

79 Converging Design Flows. The same synthesis method may be used for mapping an algorithm onto both rDPA [Kress, 1995] and DPA [Brodersen, 2000]. This synthesis method is a generalization of systolic array synthesis: super systolic synthesis. Terms: DPU: datapath unit; DPA: data path array; rDPU: reconfigurable DPU; rDPA: reconfigurable DPA.

