Presentation is loading. Please wait.

Presentation is loading. Please wait.

Pier Stanislao Paolucci - Atmel Roma and INFN Roma - SHAPES HW Overview 1 Coordinator of the European Project Permanent Staff Researcher (part time) Istituto.

Similar presentations


Presentation on theme: "Pier Stanislao Paolucci - Atmel Roma and INFN Roma - SHAPES HW Overview 1 Coordinator of the European Project Permanent Staff Researcher (part time) Istituto."— Presentation transcript:

1 Pier Stanislao Paolucci - Atmel Roma and INFN Roma - SHAPES HW Overview 1 Coordinator of the European Project Permanent Staff Researcher (part time) Istituto Nazionale di Fisica Nucleare Roma – Italy 4 generations of APE Massive Parallel Processors based on custom VLIW DSP (part time) Technology Director and Founder ATMEL Roma, corporate design center of ATMEL for Advanced DSP 2 generations of RISC + custom mAgic VLIW DSP MPSoC SHAPES: a tiled scalable Software Hardware Architecture Platform for Embedded Systems - HW overview P.S. Paolucci (INFN and Atmel Roma) for the SHAPES Consortium

2 Pier Stanislao Paolucci - Atmel Roma and INFN Roma - SHAPES HW Overview 2 Deep Sub-micron Architectures… l ~ 160 MGate available on a 100 mm2 chip (45nm CMOS, 2008) l Increasing GATES/CHIP  Design Complexity Management:  embedded processors use a few million gates only, IP reuse possible and needed; l WIRING threatens Moore’s law:  Wiring delay increases on new CMOS silicon generations  The full chip cannot be reached in a single clock cycle  Classic monolithic processor architectures do not scale  Locally Synchronous, Globally Asynchronous needed  Communication Centric SW and HW Architecture needed l … PROPOSED SOLUTION: … TILED ARCHITECTURE…BY SIMPLE GEOMETRIC DEMONSTRATION… IF CONSTANT LOGIC COMPLEXITY INSIDE EACH TILE… THEN (LENGTH OF INTRA-TILE WIRES SCALE DOWN AS THE TILE ITSELF… AND SHORT ~ FIRST NEIGHBOURS ON-CHIP AND OFF-CHIP INTER-TILE WIRES) l QUEST OF BEST TILE, ON-CHIP AND OFF-CHIP INTERCONNECT. BUT HOW TO PROGRAM? EXPLICIT PARALLEL PROGRAMMING PARADIGM, and CULTURE NEEDED l POWER DISSIPATION density approaching prohibitive values if higher clock speed used; much better Oper/Watt at moderate clock + parallelism (the human brain parallel architecture performs an excellent job at 50 HZ!... room for improvement)

3 Pier Stanislao Paolucci - Atmel Roma and INFN Roma - SHAPES HW Overview 3 Tiled HW Architecture l Communication Centric, not Processor Centric l Homogeneous SW interface for on-chip and off-chip scalable connection and I/O l 3D first-neighbour Toroidal System Eng. (3DT) for Off-Chip communication l Virtual tunnelling on packed switching NoC (Network on Chip) and off- chip 3DT l Parallelism Aware System SW: Manage memory distribution, capture real time constraints l Explicit parallel programming/Network of Actors FPGAFPGA DAC actuator Tile OFF-CHIP MEM Tile OFF-CHIP MEM Tile OFF-CHIP MEM Tile OFF-CHIP MEM DSP DNP Multi-Layer BUS NoC RISC POT 3DT Off-chip communication DXM Tile ADC sensor DAC actuator ADC sensor ADC/DAC

4 Pier Stanislao Paolucci - Atmel Roma and INFN Roma - SHAPES HW Overview 4 HW GLOSSARY HW GLOSSARY AT THE CHIP LEVEL l MTC: Multiple Tile Chip (composed of multiple Tiles) l NOC: Network On Chip (connecting Tiles) l 3DT: 3 Dim Toroidal Connection (outside the chip ) FUNDAMENTAL TYPE OF TILE l RDT includes:  RISC: (includes on chip memories RDM and RPM) +  DSP(includes on chip memories DDM and DPM)  DNP + DXM (off-chip mem) + POT (e.g. DAC/ADC conv) POSSIBLE TILE VARIANTS (subset of RDT) l RET := RDT minus DSP l DET:= RDT minus RISC l DDT:= DET minus DXM INSIDE THE TILE l Multilayer Bus Matrix sustains multiple simultaneous transfers l RISC max one per tile l DSP one or more per tile l DNP: Distributed Network Processor (always one per tile) l DDM: on-chip Distributed Data Mem (inside the DSP) l DPM: on-chip Distributed Progr Mem (inside the DSP) l DXM: Distributed eXternal Mem Interface (max one per tile, outside the RISC and DSP) l POT: Peripherals On Tile l RDM: Risc (tightly coupled) Data Memory l RPM: Risc (tightly coupled) Program Memory l RCM: Risc Cache Memory

5 Pier Stanislao Paolucci - Atmel Roma and INFN Roma - SHAPES HW Overview 5 Different Types of Tiles DSP DNP Multi-Layer BUS NoC RISC POT 3DT DXM DNP Multi-Layer BUS NoC RISC POT 3DT DXM DNP Multi-Layer BUS NoC DSP POT 3DT DXM RDT: RISC + DSP Elementary Tile RET: RISC Elementary Tile DET: DSP Elementary Tile RDT RET DET DXM Mem Bus POT Pads DXM Mem Bus POT Pads

6 Pier Stanislao Paolucci - Atmel Roma and INFN Roma - SHAPES HW Overview 6 Distributed Network Processor DNP: Interface DNP BUS Master (to read from intra-tile memories) BUS Slave (to receive commands from RISC & DSP) 3DT X+ (to forward/receive inter-tile off chip packets) 3DT X- 3DT Y+ 3DT Y- 3DT Z+ 3DT Z- NoC (to forward/receive inter-tile on-chip packets) BUS Master (simultaneous intra-tile memory write) Collective communication

7 Atmel Roma vs SHAPES and CASTNESS 7 Pier Stanislao Paolucci – INFN and Atmel Roma – SHAPES HW Overview mAgicV IP Architecture (Fully C programmable Gigaflops VLIW DSP)

8 Atmel Roma vs SHAPES and CASTNESS 8 Pier Stanislao Paolucci – INFN and Atmel Roma – SHAPES HW Overview Silicon Floorplan of Diopsis 940: RISC + mAgicV VLIW DSP DSP Reg File DSP Data Mem (DDM) DSP Prog Mem (DPM) DSP Logic ARM926 ARM RDM Peripherals AMBA Multilayer

9 Pier Stanislao Paolucci - Atmel Roma and INFN Roma - SHAPES HW Overview 9 The tile: Diopsis + DNP

10 Atmel Roma vs SHAPES and CASTNESS 10 Pier Stanislao Paolucci – INFN and Atmel Roma – SHAPES HW Overview Tile Complexity mAgicV DSP:  915 Kgates + 1 Mbit Prog Mem + 640 Kbit Data Mem ARM926 & peripherals  <2 equivalent Mgate (including 640 Kbit mem) Tile Complexity   4230 equivalent Kgate + DNP gate count -including on chip memories

11 Pier Stanislao Paolucci - Atmel Roma and INFN Roma - SHAPES HW Overview 11 Spidergon topology It’s a family of regular/symmetric topologies We look for a complexity/performance trade-off Low degree (router cost) Low number of links (wire cost) Symmetry (homogeneous building blocks; simple routing) Low diameter (performance) Good scalability (small network size granularity)

12 Pier Stanislao Paolucci - Atmel Roma and INFN Roma - SHAPES HW Overview 12 STNoC key components Network on Chip is a set of on-chip routers (up to layer 3), Network Interfaces (NI) (layer 4) and physical Link NI router IP link

13 Atmel Roma vs SHAPES and CASTNESS 13 Pier Stanislao Paolucci – INFN and Atmel Roma – SHAPES HW Overview Industrial Interest for Multi Tile Systems-on-Chip Large silicon area is needed for high Average Selling Price/unit: Multiple tile -> the way to design chips area >= 50mm 2 on future technologies -> Industrial Interest of Semiconductor Manufacturers to avoid a low profit commodities-like industry $ and # of embedded processors / persons increasing faster than conventional processors / persons  # of (phones, games, pdas, cars, home, medical, wearable) vs PC Collision/convergence on architectures is going to happen:  Because of changes on key driving markets  Because full systems can be integrated on a chip  Because of deep submicron technological facts: -WIRING, COMPLEXITY, POWER This time, …we are not in 1980, when a simpler solution was achievable through higher clock rate and monolithic architectures…we need multi-processor parallelism. European Tiles HW Research background exists to help, for example INFN solved the wiring problem on cubic meters using tiles + first neighbours 3D toroidal point-to-point physical links. Today we have face the same problem also for on-chip communications Embedded Systems versus Classical Computing

14 Pier Stanislao Paolucci - Atmel Roma and INFN Roma - SHAPES HW Overview 14 HW Background: Istituto Nazionale Fisica Nucleare APE family of Massive Parallel custom Very Long Instruction Word Floating-Point Procs. + 3D first neighbour toroidal communication for Numerical Physics Simulations APE (1984-1988) APE100 (1988-1993) APEmille (1994-1999) apeNEXT (2000-2005) Architecture SIMD SIMD++ # nodes 162048 4096 Topology flexible 1Drigid 3Dflexible 3D Aggregated memory 256 MB8 GB64 GB1 TB # registers (w.size) 64 (x32)128 (x32)512 (x32)512 (x64) LOW Clock frequency 8 MHz25 MHz66 MHz200 MHz Comp. Power/node 64 Mflops50 Mflops528 Mflops1600 Mflops Aggregated Comp. Power 1 GFlops100 GFlops1 TFlops7 TFlops

15 Pier Stanislao Paolucci - Atmel Roma and INFN Roma - SHAPES HW Overview 15 APENext (2005) 2048 processor system, VLIW processors manufactured by ATMEL

16 Pier Stanislao Paolucci - Atmel Roma and INFN Roma - SHAPES HW Overview 16 APEmille (1999) – 1 TFlops l l 2048 VLSI processing nodes l l SIMD, synchronous communications l l Fully integrated ”Host computer”, 64 PCs cPCI based Computing node “Processing Board” (PB) 8 nodes, 4GFlops “Torre” 32 PB, 128GFlops

17 Pier Stanislao Paolucci - Atmel Roma and INFN Roma - SHAPES HW Overview 17 APE100 (1993) - 100 GFlops PB (8 nodes) ~ 400 MFlops

18 Pier Stanislao Paolucci - Atmel Roma and INFN Roma - SHAPES HW Overview 18 …from INFN APE to Atmel Roma MPSoC tile… l 1997- 2001  Spin-off from INFN and Creation of IPITEC start-up (Intellectual Property Initiative for Tools and Embedded Cores) – (P.S. Paolucci, B. Altieri) l 2002-2004  mAgic VLIW DSP synthesizable core  IPITEC becomes ATMEL Roma Advanced DSP Products ATMEL l 2003: Diopsis 740: 1 Gigaflops mAgic DSP + Arm7 l 2007: Diopsis 940: 1 Gigaflops mAgicV DSP (fully C programmable) + Arm9 l 2007-2009: Shapes Tile l Diopsis 740 tile: A gigaflops VLIW+RISC SoC Tile - HotChips 15 Conference – Stanford (2003) … toward SHAPES tile

19 Pier Stanislao Paolucci - Atmel Roma and INFN Roma - SHAPES HW Overview 19 SW challenges from Tiled Architectures l Facilitate expression of parallelism: e.g. Network of Actors l Express real time constraints in a formal manner, feature missing in classical languages. This is a key cultural point!!! l Avoid destroying information about available algorithm parallelism l Compilation chain must fully aware of key architectural parameters: bandwidth, computational power, pipeline and latencies l Exploit memory locality – efficient management of Distributed Memories – get rid of classical chaches l Manage Long delays between distant tiles l Reduce Hot Spots in communications l Reduce RTOS overhead (time and memory footprint) - Networked RTOS? Minimal RTOS on each tile? l Introduce Hardware dependent Software and Hardware Abstraction Layers l Capture scalability in a library of characterized SW/HW components l Support for (semi)-automation of iterative design over HW, SW, Appl l Monitor quality and real-time constraints on real HW and Simulators l Simulation speed of multi-tiled architectures

20 Pier Stanislao Paolucci - Atmel Roma and INFN Roma - SHAPES HW Overview 20 Research Lines System SW ETH Zurich - Distributed Operation Layer; manage application parallelism RWTH Aachen Univ. - Simulation of Heterogeneous Multi Proc. Systems TIMA Lab and THALES - Hardware dependent Software Layer and RTOS TARGET Compiler Tech. - Retargetable VLIW Compilers System HW ATMEL Roma - Tile: – Evolution of (Diopsis TM : mAgicV VLIW DSP TM + RISC) + INFN DNP TM INFN Roma – DNP TM Distributed Network Processor + 3D Toroidal Eng.: – Evolution of APE Massive Parallel Processors STMicrolectronics + Univ. of Cagliari and Pisa – – Evolution of Spidergon TM Packet Switching Network on Chip Parallel application benchmarking Fraunhofer IDMT,IGD - Audio Wave Field Synthesis and Graphic Algorithm PIE, MedCom - Ultrasound scanner INFN - Physical Modelling Scalable Software Hardware Architecture Platform for Embedded Systems (2006-2009) FP6-IST – Future Emerging Technologies- Advanced Computer Architecture Project

21 Pier Stanislao Paolucci - Atmel Roma and INFN Roma - SHAPES HW Overview 21 SW Environment – Summary of Working Principles SW Environment – Summary of Working Principles Model Based Application Description –Network of Interacting Components incl. non-functional constraints, analytical predictions and run- time profiling Distributed Operation Layer –Maps components on Processing and Networking Resources –Stepwise approach to semi- automated mapping: By hand, assisted by simulation, run-time profiling and analytical models By algorithms for automated multi-objective randomised search Target Applications –Extensive inherent parallelism Optimised compilation on tiles and comms network Distributed Operation Layer hardware platform specification Simulator trace information Model Compiler component interaction, properties and constraints component source code mapping information HdS Generator HdS source code Compiler component binary HdS binary Link Dispatch OS services binary glue binary Mapping Memory mapping RTOS application specs

22 Pier Stanislao Paolucci - Atmel Roma and INFN Roma - SHAPES HW Overview 22 Tiled HW Architecture Communication Centric, not Processor Centric Homogeneous SW interface for on-chip and off-chip scalable connection and I/O 3D first-heighbour Toroidal System Eng. (3DT) for Off-Chip communication Virtual tunnelling on packed switching NoC (Network on Chip) and off-chip 3DT Parallelism Aware System SW: Manage memory distribution Explicit parallel programming/Network of Actors FPGAFPGA DAC actuator Tile OFF-CHIP MEM Tile OFF-CHIP MEM Tile OFF-CHIP MEM Tile OFF-CHIP MEM DSP DNP Multi-Layer BUS NoC RISC POT 3DT Off-chip communication DXM Tile ADC sensor DAC actuator ADC sensor ADC/DAC


Download ppt "Pier Stanislao Paolucci - Atmel Roma and INFN Roma - SHAPES HW Overview 1 Coordinator of the European Project Permanent Staff Researcher (part time) Istituto."

Similar presentations


Ads by Google