Enabling Technologies for System-on-Chip Development Reconfigurable Computing Architectures and Methodologies for System-on-Chip Monday, November 19, 10:15.

1 Enabling Technologies for System-on-Chip Development. Reconfigurable Computing Architectures and Methodologies for System-on-Chip. Monday, November 19, 10:15 - 11:00 hrs. Reiner Hartenstein, University of Kaiserslautern. November 19-20, 2001, Tampere, Finland.

2 © 2001, University of Kaiserslautern 2 Conferences: growing attendance. Topic adoption by congresses: ASP-DAC, DAC, DATE, ISCAS, SPIE, ... FCCM, FPGA (founded 1992), and FPL (founded 1991 at Oxford, UK). FPL 2002, The International Conference on Field-programmable Logic and Applications: La Grande Motte (Montpellier, France), Sept. 2 – 4; Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier; paper submission deadline: 15th March 2002.

3 >> Introduction. Outline: Introduction; FPGA boom; Coarse Grain Architectures: rDPAs; Programming rDPAs; Conclusions.

4 The Impact of Reconfigurable Logic: Reconfigurable platforms bring a new dimension to digital system development and have a strong impact on SoC design. A large, rapidly growing user base of HDL-savvy designers with FPGA experience. Flexibility promises turn-around times down to minutes instead of months for real-time in-system debugging, profiling, verification, tuning, field maintenance, and field upgrades. A new business model (in-field debugging and upgrading ...). A fundamental paradigm shift in silicon application. [Figure: revenue per month vs. time in months for an ASIC product and a reconfigurable product with downloaded Update 1 and Update 2; source: T. Kean]

5 The History of Paradigm Shifts: “Mainstream silicon application is switching every 10 years.” “The programmable System-on-a-Chip is the next wave.” What's coming next? [Figure: Makimoto's Wave (published in 1989), swinging between standard and custom from 1957 to 2007: TTL; LSI, MSI; µproc., memory; ASICs, accel's; reconfigurable; the 1st and 2nd design crises are marked]

6 How is the next wave? Tredennick's Paradigm Shifts: hardwired (algorithm: fixed, resources: fixed); procedural programming (algorithm: variable, resources: fixed); structural programming (algorithm: variable, resources: variable). [Figure: wave diagram, custom vs. standard, 1957 to 2007, with FPGAs and coarse grain RAs; annotations: 2007? 4th wave? No further wave! Hartenstein's Curve?]

7 The Impact of Makimoto's Paradigm Shifts (Dr. Makimoto: FPL 2000 keynote): Personalization (CAD) before fabrication. Procedural personalization via the RAM-based machine paradigm. Structural personalization: RAM-based, before run time. The software industry's secret of success. Repeat the success story by a new machine paradigm! [Figure: Makimoto's wave, custom vs. standard, 1957 to 2007: TTL; LSI, MSI; µproc., memory; ASICs, accel's; reconfigurable]

8 >> FPGA boom. Outline: Introduction; FPGA boom; Coarse Grain Architectures (rDPAs); Programming rDPAs; Conclusions & Future Developments.

9 What is an FPGA? [Figure: Xilinx XC4000E: an array of configurable logic blocks (CLBs; L = logic block) embedded in a reconfigurable interconnect fabric of switch boxes (S = switch box), single-length lines, double-length lines, and longlines]

10 Top 4 FPGA Manufacturers 2000 / Top 4 PLD Manufacturers 2000: Xilinx 42%, Altera 37%, Lattice 15%, Actel 6%; total: $3.7 billion [Dataquest], > $7 billion by 2003. Fastest growing semiconductor market segment, killing the ASIC market. FPGAs are going into every type of application, also SoC; soon to reach 50 million system gates per chip. Improved design flow & libraries: "pre-fabricated" components and IP reuse for PLDs; PLD vendors provide libraries (soft IPs, configware) to support their products.

11 Away from complex design flow: from HDL to HLL. Use the CPU for configuration management. Embedded CPU: Configware / Software Co-design is commonplace. Supporting ... dynamically reconfigurable (RTR). [Figure, after S. Guccione: the hardware flow (Schematics/HDL, Netlister, Netlist, Place and Route, Bitstream) next to the software flow (User Code, Compiler, Executable), collapsing into HLL and Compiler]

12 Configware as the Key Enabler: Design productivity and quality by configware libraries (soft IP cores) from various application areas. Top FPGA vendors: currently the key innovators (Xilinx AllianceCORE & Reference Design Alliance et al.). Growing number of independent configware houses (soft IP core vendors) and design services. Emerging separate EDA software market (comparable to the compiler / OS market in computers); Cadence, Mentor, Synopsys just jumped in.

13 >> Coarse Grain Architectures. Outline: Introduction; FPGA boom; Coarse Grain Architectures (rDPAs); Programming rDPAs; Conclusions & Future Developments.

14 Why coarse-grained? Reconfigurability overhead. [Figure: FPGA fabric of logic blocks (L) and switch boxes (S), contrasting the area used by the application with the resources needed for reconfigurability, partly for configuration code storage; “hidden RAM” not shown]

15 Commercial rDPAs: XPU family (IP cores; XPU128): PACT Corp., Munich; flexible array: MorphICs; CALISTO: Silicon Spice; CS2000 family: Chameleon Systems; MECA family: Malleable; FIPSOC: SIDSA; ACM: Quicksilver Tech; CHESS array: Elixent; MorphoSys: Morpho Tech. Footnotes: *) here at SoC; **) bought.

16 Reconfigurable Interconnect Fabric (RIF): [Figure: rDPA (Reconfigurable Datapath Array) built from rDPUs, contrasting a separate routing area with a RIF laid out over the rDPUs, i.e. an rDPA wired by abutment]

17 KressArray Family: generic fabrics, a few examples. Select the function repertory. Select the Nearest Neighbour (NN) interconnect: select mode, number, and width of NNports; wired by abutment. rDPU modes: route-through and function, or route-through only. More NNports: rich routing resources. Examples of 2nd-level interconnect: laid out over the rDPU cell, no separate routing areas! [Figure: an example rDPU with NN interconnect, labelled 16, 32, 8, 24, 4, 2]

18 KressArray Mapping Example: SNN filter; array size: 10 x 16 = 160 rDPUs. [Figure legend: route-through only; not used; backbus connect]

19 It's a General Paradigm Shift! Using FPGAs (fine grain reconfigurable): just logic synthesis on a strange platform. Coarse grain rDPAs (Reconfigurable Computing): a fundamental paradigm shift. Replace concurrent processes by more efficient parallelism: stream-based DPAs 1) and rDPAs 2), with converging design flows. 1) systolic array* [1980], chip-on-a-day* [2000] [Brodersen]. 2) KressArray** [1995] and others [later]. Kress: a generalization of systolic array synthesis: super systolic synthesis. Terms: DPU: datapath unit; DPA: datapath array; rDPU: reconfigurable DPU; rDPA: reconfigurable DPA. *) hardwired; **) reconfigurable.

20 Compare Concurrent Computing: [Figure: several CPUs, each a DPU with its own instruction sequencer, connected by bus(es) or a switch box.] Extremely inefficient: control flow overhead; instruction fetch / interpretation overhead; address computation overhead, which may be massive; massive bottleneck phenomena at run time.

21 ... with Stream-based Computing: (r)DPA, for both reconfigurable and hardwired [Brodersen]. DPU: transport-triggered execution, driven by a data stream from / to memory or from / to a peripheral interface. No instruction sequencer inside! Avoids run-time overhead and bottleneck phenomena. rDPA: drastically reduced reconfigurability overhead.

22 ... let us compare machine paradigms: Concurrent vs. Stream-based. Concurrent: a separate instruction sequencer of its own within each CPU; programming: classical instruction set; instruction fetch: at run time. Stream-based: data sequencers outside the DPUs, outside the array; programming: data scheduling; “instruction fetch”: at compile time.

23 >> Programming rDPAs. Outline: Introduction; FPGA boom; Coarse Grain Architectures (rDPAs); Programming rDPAs; Conclusions & Future Developments.

24 Systolic Stream-based Computing System: Systolic Array [H. T. Kung, 1980]: a DPA (Data Path Array). The mathematician's synthesis method: linear projection or algebraic mapping of equations onto the DPU architecture; linear pipelines and uniform arrays only; placement with no routing! Computing in time vs. computing in space (systolic arrays etc.): migration by re-timing and other transformations; this dichotomy is completely ignored by our CS curricula. [Figure: multiply-add DPU (y, +, *, x, a) with data streams x1 x2 x3 ... and y1 y2 y3 ...]
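
To make the multiply-add DPU on this slide concrete, here is a minimal Python sketch. It assumes a transposed-form FIR pipeline in which each x value is broadcast to all DPUs while the partial sums y move from neighbour to neighbour, so it is a simplified (semi-systolic) stand-in rather than Kung's strictly systolic scheme. The names `dpu` and `stream_fir` are hypothetical and do not come from the talk.

```python
def dpu(y_in, a, x):
    """One multiply-add datapath unit: y_out = y_in + a * x."""
    return y_in + a * x


def stream_fir(coeffs, xs):
    """Drive a linear pipeline of DPUs with the data stream xs.

    Cell j holds the fixed coefficient coeffs[j] (computing in space).
    On every step each cell combines the partial sum latched by its
    right-hand neighbour with the current x; cell 0 emits the result.
    """
    k = len(coeffs)
    regs = [0] * k  # partial sums latched between neighbouring cells
    ys = []
    for x in xs:
        upstream = regs[1:] + [0]  # old values from the right neighbours
        regs = [dpu(upstream[j], coeffs[j], x) for j in range(k)]
        ys.append(regs[0])
    return ys


if __name__ == "__main__":
    # Impulse response of the pipeline reproduces the coefficients.
    print(stream_fir([1, 2, 3], [1, 0, 0, 0]))
```

There is no instruction fetch inside the loop body: the coefficients are placed once, and the only thing that changes at run time is the data stream.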

25 General Stream-based Computing System: heterogeneous DPA or rDPA; free-form pipe network. Kress DPSS [1995]: Mapper (simultaneous placement & routing by simulated annealing) and Scheduler; the same mapper for both reconfigurable and hardwired. [Figure: expression tree and DPU architectures (y, +, *, x, a) mapped onto an array of operators (+, *, sh, xf, -) with data streams; steps numbered 1 to 4]
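
The slide names simulated annealing as the mapper's optimization method. The following Python sketch illustrates the general idea only: it places the operators of a small expression graph on a grid and uses total Manhattan wire length as a stand-in cost. It does not perform routing and is not the DPSS algorithm; `wire_length`, `anneal_placement`, the linear cooling schedule, and all parameter values are assumptions made for illustration.

```python
import math
import random


def wire_length(placement, edges):
    """Sum of Manhattan distances over all operator-to-operator edges."""
    return sum(
        abs(placement[a][0] - placement[b][0]) + abs(placement[a][1] - placement[b][1])
        for a, b in edges
    )


def anneal_placement(nodes, edges, rows, cols, steps=2000, t0=2.0, seed=0):
    """Place nodes on a rows x cols grid by simulated annealing (sketch)."""
    if len(nodes) > rows * cols:
        raise ValueError("array too small for this expression graph")
    rng = random.Random(seed)
    # One slot per array cell; None marks an unused cell.
    slots = list(nodes) + [None] * (rows * cols - len(nodes))
    rng.shuffle(slots)

    def positions():
        return {n: divmod(i, cols) for i, n in enumerate(slots) if n is not None}

    cost = initial_cost = wire_length(positions(), edges)
    best, best_cost = positions(), cost
    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-9  # linear cooling (assumed)
        i, j = rng.sample(range(len(slots)), 2)
        slots[i], slots[j] = slots[j], slots[i]  # propose a swap
        new_cost = wire_length(positions(), edges)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost  # accept, occasionally even uphill moves
            if cost < best_cost:
                best, best_cost = positions(), cost
        else:
            slots[i], slots[j] = slots[j], slots[i]  # reject: undo the swap
    return best, best_cost, initial_cost


if __name__ == "__main__":
    nodes = ["x", "a", "mul", "add", "y"]
    edges = [("x", "mul"), ("a", "mul"), ("mul", "add"), ("add", "y")]
    placement, cost, start = anneal_placement(nodes, edges, rows=3, cols=3)
    print(placement, cost, start)
```

Accepting some cost-increasing swaps while the temperature is high is what lets the search leave poor local arrangements; a mapper that also routes would fold routing congestion into the same cost function.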

26 KressArray Xplorer: platform (design) space explorer. [Figure: tool flow with the components User; Application Set; DPSS Source Input; intermediate form; Architecture & Mapping Editor; KressArray DPSS; Datastream Generator; HDL Generator; Simulator; Datapath Generator; Delay & Power Estimator; Statistics; Improvement Proposal Generator]

27 Memory Communication Architecture: a hot research topic in embedded systems. Storage context transformations [Catthoor, Herz, Kougia, Soudris]. Synthesizable memory communication architecture: startups provide memory IP or generators. Optimized parallel memory controller; GAG generic sequencer methodology available [Herz]. [Figure: an example by Nageldinger's KressArray Xplorer showing sequencers and memory ports; legend: application, not used]

28 ... for a Stream-based Soft Machine: [Figure: Compiler and Scheduler; Memory (data memory) with memory banks; Sequencers (data stream generator); rDPA]
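
The role of a data sequencer as a data stream generator can be sketched in a few lines of Python. This is an illustrative model only, assuming a simple two-level scan pattern with fixed strides; `address_stream` and `data_stream` are hypothetical names, and the sketch is not the GAG sequencer methodology mentioned in the talk.

```python
def address_stream(base, rows, cols, row_stride, col_stride=1):
    """Yield memory addresses for a rows x cols scan window.

    The scan pattern is fixed before run time (by the compiler and
    scheduler); at run time the sequencer only steps through it.
    """
    for r in range(rows):
        for c in range(cols):
            yield base + r * row_stride + c * col_stride


def data_stream(memory, addresses):
    """Turn an address stream into the data stream fed to the array."""
    for addr in addresses:
        yield memory[addr]


if __name__ == "__main__":
    memory = list(range(100, 112))  # a 3 x 4 block stored row by row
    window = address_stream(base=5, rows=2, cols=2, row_stride=4)
    print(list(data_stream(memory, window)))
```

Because addresses come from the sequencer, the datapath units themselves carry no address computation and no instruction sequencing.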

29 Xputer: The Soft Machine Paradigm. Computer (“von Neumann”): Compiler, Memory, Sequencer (state register: program counter), hardwired Datapath; tightly coupled by compact instruction code; does not support soft data paths; for a reconfigurable datapath, “von Neumann” is the wrong machine paradigm. Xputer (University of Kaiserslautern): Compiler, Scheduler, Memory, (multiple) sequencer (state register: data counter), reconfigurable Datapath Array; loosely coupled by decision data bits only; also for hardwired [Brodersen]. Fundamentals available (course on Wednesday).

30 Co-Compilation: Jürgen Becker's Co-DE-X Co-Compiler [ASP-DAC'95]. Source in the high-level programming language X-C enters the partitioning compiler (Analyzer / Profiler, Partitioner). Software running on the processor (Computer machine paradigm): GNU C compiler. Configware running on the Xputer (“Soft” machine paradigm): X-C compiler and KressArray DPSS, targeting the reconfigurable accelerators. Resource parameters interface: supporting different platforms. Hardware / Software Co-Design turns to Configware / Software Co-Design.

31 Loop Transformation Examples: resource parameter driven Co-Compilation. Sequential processes: loop 1-16 body endloop. Strip mining: fork; loop 1-8 body endloop; loop 9-16 body endloop; join. Loop unrolling: reconf. array: body; host: loop 1-8 trigger endloop; loop 1-4 trigger endloop; loop 1-2 trigger endloop.
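
The two transformations named on this slide can be written out in ordinary Python. The sketch assumes a 16-iteration loop as on the slide; `body`, `strip_mined`, `unrolled`, and `host_triggers` are hypothetical names, and reading the slide's "trigger" loops as one host trigger per unrolled group of iterations is an interpretation, not something the transcript states.

```python
def body(i):
    """Placeholder loop body (hypothetical)."""
    return i * i


def original(n=16):
    """loop 1-n body endloop"""
    return [body(i) for i in range(1, n + 1)]


def strip_mined(n=16, strip=8):
    """Split the iteration space into strips (1-8, 9-16 for the defaults).

    The strips are independent here, so they could be forked onto
    separate resources and joined afterwards.
    """
    out = []
    for start in range(1, n + 1, strip):
        for i in range(start, min(start + strip, n + 1)):
            out.append(body(i))
    return out


def unrolled(n=16, factor=4):
    """Unroll by `factor`: each pass stands for `factor` body copies."""
    out = []
    i = 1
    while i + factor - 1 <= n:
        for k in range(factor):  # copies that would sit side by side in the array
            out.append(body(i + k))
        i += factor
    while i <= n:  # remainder iterations when factor does not divide n
        out.append(body(i))
        i += 1
    return out


def host_triggers(n=16, factor=4):
    """Number of host-side trigger iterations left after unrolling."""
    return -(-n // factor)  # ceiling division


if __name__ == "__main__":
    print(strip_mined() == original(), unrolled() == original())
    print([host_triggers(16, f) for f in (2, 4, 8)])
```

Under this reading, unrolling a 16-iteration loop by 2, 4, or 8 leaves 8, 4, or 2 host iterations, which lines up with the slide's trigger loops 1-8, 1-4, and 1-2; the unrolling factor would be chosen from the resource parameters of the target array.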

32 History of Loop Transformations: For sequential programs on parallel computers: David Loveman, 1977, Allen and Kennedy, etc.: loop unrolling, loop fusion, strip mining, ... For parallel datapaths: Jürgen Becker (1997): sequential to super-systolic transformation, to optimize the throughput of Reconfigurable Arrays (RAs). For memory communication: Michael Herz (2000): multi-level loop unrolling to reduce the memory cycles needed to create RA data streams. Instruction code vs. reconfiguration code.

33 >> Conclusions. Outline: Introduction; FPGA boom; Coarse Grain Architectures (rDPAs); Programming rDPAs; Conclusions & Future Developments.

34 FPGA CPUs [Figure: FPGA containing a soft CPU core and memory, programmed through an HLL compiler]
core | architecture | platform
MicroBlaze (125 MHz, 70 D-MIPS) | 32 bit st'd RISC, 32 reg. by 32 LUT RAM-based reg. | Xilinx (up to 100 on one FPGA)
Nios | 16-bit instr. set | Altera Mercury
Nios (50 MHz, 22 D-MIPS) | 32-bit instr. set | Altera
Nios | 8 bit | Altera Mercury
gr1040 | 16-bit |
gr1050 | 32-bit |
My80 | i8080A | FLEX10K30 or EPF6016
DSPuva16 | 16 bit DSP | Spartan-II
Leon (25 MHz) | SPARC |
ARM7 clone | ARM |
uP1232 | 8-bit CISC, 32 reg. | 200 XC4000E CLBs
REGIS | 8 bit instr. | 2 Xilinx 3020 LCA
Reliance-1 | 12 bit DSP | Lattice 4 isp30256, 4 isp1016
Popcorn-1 | 8 bit CISC | Altera, Lattice, Xilinx
Acorn-1 | | 1 Flex 10K20
YARD-1A | 16-bit RISC | old Xilinx FPGA Board
xr16 | RISC, integer C | SpartanXL
Academic FPGA CPUs: UCSC (1990!); Mälardalen University, Eskilstuna, Sweden; Chalmers University, Göteborg, Sweden; Cornell University; Hiroshima City University, Japan; Tokai University, Japan; Universidad de Valladolid, Spain; Washington University, St. Louis; Georgia Tech; Michigan State; Virginia Tech; New Mexico Tech; UC Riverside. Also: Gray Research.

35 Soft rDPA? Rapid technology progress: 50 million system gates soon. Slower clock: compensated by more parallelism. Even large rDPAs as a soft IP become feasible. By > 2005: don't care about area efficiency? FPGAs for relocatable configware code? Compatibility at configuration code level? [Figure: FPGA containing memory, a soft CPU, miscellaneous logic, and soft DPU arrays, programmed through an HLL compiler]

36 Main problems to be solved. Most successful µprocessor: de facto standard; object code compatibility; widely accepted OS & tools; most software written for it. Dominant FPGA vendor needs: configware object code compatibility; widely accepted “OS” & tools; most configware written for it. FPGA-based de facto standards: configware libraries; configware code compatibility by a de facto standard RC platform family; scalable FPGA architectures supporting relocatable configuration code; compilers to avoid needing HDL-savvy users. Important: relocatable code, scalable memory. Education: widely spread awareness of FPGAs and of the dichotomy of computing in time vs. computing in space (systolic arrays etc.); curricular innovations are urgently needed.

37 However, current CS education ... is based on the Submarine Model. Hardware invisible: under the surface. Brain usage: procedural-only. Software faculty colleagues shy away from the paradigm shift: their brain hurts? Can't be: this half has been amputated. This model disables ... [Figure: layers from Algorithm through procedural high-level programming language and assembly language (Software) down to Hardware]

38 Hardware, Configware, and Software as Alternatives: the algorithm is partitioned into software only, software & hardw/configw, or hardw/configw only. Brain usage: both hemispheres (procedural and structural).

39 The Dominance of the Submarine Model ... indicates that our CS education system produces zillions of mentally disabled persons: (procedural) structurally disabled, completely unable to cope with solutions other than software only. It's time to attack the software faculty dictatorship. Get involved!

40 >>> Thank you for listening.
