Scalable Software Hardware Architecture Platform for Embedded Systems SHAPES at DATE 2007 Pier Stanislao PAOLUCCI chief technical officer – ATMEL Roma.

Presentation transcript:

Scalable Software Hardware Architecture Platform for Embedded Systems
SHAPES at DATE 2007
Pier Stanislao PAOLUCCI, chief technical officer, ATMEL Roma & (part-time) permanent staff researcher, INFN Roma
for the SHAPES Consortium

January, Introduction to SHAPES
Project Motivation and Final Objective
SHAPES acronym: Scalable Software Hardware Architecture Platform for Embedded Systems
Objective: develop a prototype of a tiled, scalable HW & SW architecture for embedded applications characterized by inherent parallelism
Experiment: "small" tiles (<10 MGate) connected by "short wires", weaving a packet-switching on-chip and off-chip network
- The HW architecture should scale to the next deep-submicron technologies
Challenge: how to program a tiled architecture
Benchmarks:
- Multi-loudspeaker, multi-source wave field synthesis
- Voice extraction from noise on multi-microphone arrays
- Ultrasound scanners
- Physical modelling of quantum chromodynamics

HW
HW objectives:
- Maintain profitable average selling prices
- Control NRE (non-recurring engineering) costs by IP reuse
HW solution:
- Appropriate granularity: "small" tiles (<10 MGate) connected by "short" (first-neighbour) wires
- Inside the typical elementary tile: a fully C-programmable VLIW DSP for computing + a RISC for control + a Distributed Network Processor (a kind of generalized inter-tile DMA controller) for inter-tile communication
- Multi-tile silicon area >40 mm² and <90 mm²
- Management of logic and place & route complexity through IP reuse
- Multi-level network:
  - Intra-tile: multi-layer bus matrix
  - Inter-tile: NoC (intra-chip) + 3DT (inter-chip)
- A distributed routing fabric connects on-chip and off-chip tiles, weaving a packet-switching network

SW
Communication-centric, real-time-aware programming environment:
- Application description: model-based, with explicit annotation of real-time constraints
- Automated, optimized binding of processes to computing resources and of inter-process communication to communication resources, plus scheduling of processes and their communication
- Automated generation of hardware-dependent software support
- Retargetable compilation managing intra-tile and inter-tile parallelism, bandwidth and latencies
- Fast simulation

Consortium Composition and Roles of the Partners
System SW:
- ETH Zurich: Distributed Operation Layer, which manages application parallelism
- TIMA Lab and THALES: Hardware-dependent Software layer and RTOS
- TARGET Compiler Tech.: retargetable compilers
- RWTH Aachen Univ.: fast simulation of heterogeneous multi-processor systems
System HW:
- ATMEL Roma: tile, an evolution of Diopsis® (mAgicV VLIW DSP™ + RISC) + INFN DNP™
- INFN Roma: DNP™ Distributed Network Processor + 3D Toroidal Eng., an evolution of the APE massively parallel processors
- STMicroelectronics + Univ. of Cagliari and Pisa: Network on Chip, an evolution of the Spidergon™ packet-switching Network on Chip
Parallel application benchmarking:
- Fraunhofer IDMT: multi-loudspeaker audio wave field synthesis
- ESAOTE, MedCom, Fraunhofer IGD: ultrasound scanner
- INFN: physical modelling
- ATMEL: multi-microphone arrays for voice extraction

Deep Sub-micron Architectures…
~160 MGate available on a 100 mm² chip (45 nm CMOS, 2008)
Increasing GATES/CHIP calls for design complexity management:
- Embedded processors use only a few million gates; IP reuse is possible and needed
WIRING threatens Moore's law:
- Wiring delay increases with each new CMOS silicon generation
- The full chip cannot be reached in a single clock cycle
- Classic monolithic processor architectures do not scale
- Locally synchronous, globally asynchronous design needed
- Communication-centric SW and HW architecture needed
PROPOSED SOLUTION: TILED ARCHITECTURE
- By a simple geometric argument: if the logic complexity inside each tile is constant, then the length of intra-tile wires scales down with the tile itself, and inter-tile wires stay short (~first neighbours, on-chip and off-chip)
- Quest for the best tile and the best on-chip and off-chip interconnect
- But how to program? An explicit parallel programming paradigm, and culture, are needed
POWER DISSIPATION density approaches prohibitive values if higher clock speeds are used; operations per Watt are much better at moderate clock + parallelism (the parallel architecture of the human brain does an excellent job at 50 Hz... room for improvement)
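The gate budget and the geometric argument on this slide can be checked with a few lines of arithmetic. The sketch below is only illustrative: the function names are hypothetical, and it assumes that the whole chip is spent on identical square tiles, ignoring NoC routers, pads and routing overhead.

```python
import math

def tiles_per_chip(chip_mgate: float, tile_mgate: float) -> int:
    """Whole tiles that fit in the chip gate budget.

    Assumption (not from the slides): every gate is spent on tiles.
    """
    if tile_mgate <= 0:
        raise ValueError("tile_mgate must be positive")
    return int(chip_mgate // tile_mgate)

def tile_side_mm(chip_area_mm2: float, n_tiles: int) -> float:
    """Side of one square tile when n_tiles equal tiles cover the chip."""
    return math.sqrt(chip_area_mm2 / n_tiles)

CHIP_MGATE = 160.0      # ~160 MGate on a 100 mm^2 chip (45 nm CMOS, 2008)
CHIP_AREA_MM2 = 100.0
MAX_TILE_MGATE = 10.0   # "small" tiles are below 10 MGate

# Tiles at the 10 MGate upper bound: a lower bound on the tile count.
min_tiles = tiles_per_chip(CHIP_MGATE, MAX_TILE_MGATE)
print(min_tiles)                               # 16
print(tile_side_mm(CHIP_AREA_MM2, min_tiles))  # 2.5
```

With a constant gate count per tile, a denser technology puts more tiles on the same area, so the tile side, and with it the longest intra-tile wire, shrinks as the square root of the tile area.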

Distributed Network Processor
DNP: a generalized DMA controller for inter-tile or intra-tile packet routing
DNP interfaces:
- BUS Master (to read from intra-tile memories)
- BUS Master (simultaneous intra-tile memory write)
- BUS Slave (to receive commands from RISC & DSP)
- 3DT X+, X-, Y+, Y-, Z+, Z- (to forward/receive inter-tile OFF-CHIP packets)
- NoC (to forward/receive inter-tile ON-CHIP packets)
- Collective communication
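The six 3DT ports correspond to the first neighbours of a tile in a 3D torus. As a minimal sketch, assuming tiles are addressed by integer (x, y, z) coordinates with wrap-around in each dimension (the addressing scheme and the function name are illustrative assumptions, not the DNP interface):

```python
from typing import Dict, Tuple

Coord = Tuple[int, int, int]

def torus_neighbours(node: Coord, dims: Coord) -> Dict[str, Coord]:
    """First neighbours of `node` in a 3D torus of size `dims`.

    Keys follow the port names on the slide (X+, X-, Y+, Y-, Z+, Z-).
    Python's % always returns a non-negative result for a positive
    modulus, so the wrap-around at coordinate 0 needs no special case.
    """
    x, y, z = node
    nx, ny, nz = dims
    return {
        "X+": ((x + 1) % nx, y, z),
        "X-": ((x - 1) % nx, y, z),
        "Y+": (x, (y + 1) % ny, z),
        "Y-": (x, (y - 1) % ny, z),
        "Z+": (x, y, (z + 1) % nz),
        "Z-": (x, y, (z - 1) % nz),
    }

# A corner tile of a 4 x 4 x 4 torus still has six neighbours.
corner = torus_neighbours((0, 0, 0), (4, 4, 4))
print(corner["X-"])  # (3, 0, 0)
```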

Different Types of Tiles
- RDT: RISC + DSP Elementary Tile (DSP, RISC, DNP, multi-layer bus, NoC, 3DT, POT, DXM)
- RET: RISC Elementary Tile (RISC, DNP, multi-layer bus, NoC, 3DT, POT, DXM)
- DET: DSP Elementary Tile (DSP, DNP, multi-layer bus, NoC, 3DT, POT, DXM)
[Figure: block diagrams of the RDT, RET and DET tiles; DXM connects to the memory bus and POT to the pads]

HW Glossary
At the chip level:
- MTC: Multiple Tile Chip (composed of multiple tiles)
- NoC: Network on Chip (connecting tiles)
- 3DT: 3-Dimensional Toroidal connection (outside the chip)
Fundamental type of tile: the RDT includes
- RISC (includes on-chip memories RDM and RPM)
- DSP (includes on-chip memories DDM and DPM)
- DNP + DXM (off-chip memory) + POT (e.g. DAC/ADC converters)
Possible tile variants (subsets of RDT):
- RET := RDT minus DSP
- DET := RDT minus RISC
- DDT := DET minus DXM
Inside the tile:
- Multilayer bus matrix: sustains multiple simultaneous transfers
- RISC: max one per tile
- DSP: one or more per tile
- DNP: Distributed Network Processor (always one per tile)
- DDM: on-chip Distributed Data Memory (inside the DSP)
- DPM: on-chip Distributed Program Memory (inside the DSP)
- DXM: Distributed eXternal Memory interface (max one per tile, outside the RISC and DSP)
- POT: Peripherals On Tile
- RDM: RISC (tightly coupled) Data Memory
- RPM: RISC (tightly coupled) Program Memory
- RCM: RISC Cache Memory
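The tile variants are defined by removing components from the RDT, which maps directly onto set difference. The sketch below models each tile type as a set of component names taken from the glossary; representing a tile as a flat set is a simplification made here for illustration.

```python
# Top-level components of the fundamental tile, as listed in the glossary.
RDT = frozenset({"RISC", "DSP", "DNP", "DXM", "POT"})

# Variants, following the glossary definitions literally.
RET = RDT - {"DSP"}    # RISC Elementary Tile
DET = RDT - {"RISC"}   # DSP Elementary Tile
DDT = DET - {"DXM"}    # DET without the external memory interface

for name, tile in (("RDT", RDT), ("RET", RET), ("DET", DET), ("DDT", DDT)):
    print(name, sorted(tile))

# Every variant keeps the DNP ("always one per tile").
all_have_dnp = all("DNP" in tile for tile in (RDT, RET, DET, DDT))
print(all_have_dnp)  # True
```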

January, WP RISC+ VLIW DSP + DNP Tile10 mAgicV IP Architecture (Fully C programmable Gigaflops VLIW DSP)

Tile Complexity Estimated through Synthesis and Place & Route Trials
- mAgicV DSP: 915 Kgates + 1 Mbit Prog Mem Kbit Data Mem
- ARM926 & peripherals: <2 equivalent Mgate (including 640 Kbit mem)
- Tile complexity: 4230 equivalent Kgate + DNP gate count, including on-chip memories

Silicon Floorplan Trial of RISC + mAgicV VLIW DSP Tile
[Figure: floorplan with blocks DSP Reg File, DSP Data Mem (DDM), DSP Prog Mem (DPM), DSP Logic, ARM926, ARM RDM, Peripherals, AMBA Multilayer]

Spidergon NoC topology
A family of regular/symmetric topologies
We look for a complexity/performance trade-off:
- Low degree (router cost)
- Low number of links (wire cost)
- Symmetry (homogeneous building blocks; simple routing)
- Low diameter (performance)
- Good scalability (small network size granularity)
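The criteria above (degree, link count, diameter) can be measured on a concrete graph. The slide does not spell out the topology, so the sketch below assumes the commonly published Spidergon definition: an even number of nodes on a bidirectional ring, each with one extra link to the diametrically opposite node. Function names are hypothetical.

```python
from collections import deque
from typing import Dict, List

def spidergon(n: int) -> Dict[int, List[int]]:
    """Adjacency lists of an n-node Spidergon (assumed definition:
    clockwise, counter-clockwise and across links)."""
    if n < 4 or n % 2:
        raise ValueError("n must be even and at least 4")
    return {i: [(i + 1) % n, (i - 1) % n, (i + n // 2) % n] for i in range(n)}

def link_count(graph: Dict[int, List[int]]) -> int:
    """Number of bidirectional links (each is listed at both ends)."""
    return sum(len(adj) for adj in graph.values()) // 2

def diameter(graph: Dict[int, List[int]]) -> int:
    """Longest shortest path, by breadth-first search from every node."""
    worst = 0
    for start in graph:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        worst = max(worst, max(dist.values()))
    return worst

g16 = spidergon(16)
print(link_count(g16), diameter(g16))  # 24 4
```

For comparison, a plain 16-node bidirectional ring has 16 links and diameter 8, so under this definition the across links halve the diameter while every router keeps a fixed degree of 3.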

STNoC key components
A Network on Chip is a set of on-chip routers (up to layer 3), Network Interfaces (NI) (layer 4) and physical links
[Figure labels: IP, NI, router, link]

Industrial Interest for Multi-Tile Systems-on-Chip
Embedded Systems versus Classical Computing
- A large silicon area is needed for a high average selling price per unit: multiple tiles are the way to design chips with area >40 mm² efficiently on future technologies. Hence the interest of semiconductor manufacturers, who want to avoid a low-profit, commodity-like industry
- The value ($) and number (#) of embedded processors per person are increasing faster than those of conventional processors: # of (phones, games, PDAs, cars, home, medical, wearable) vs PCs
- Collision/convergence of architectures is going to happen:
  - because of changes in key driving markets
  - because full systems can be integrated on a chip
  - because of deep-submicron technological facts: WIRING, COMPLEXITY, POWER
- This time we are not in 1980, when a simpler solution was achievable through higher clock rates and monolithic architectures: we need multi-processor parallelism

Background: apeNEXT (2005)
2048-processor system; VLIW processors designed by INFN, manufactured by ATMEL

HW Background: Istituto Nazionale di Fisica Nucleare
APE family of massively parallel custom Very Long Instruction Word floating-point processors + 3D first-neighbour toroidal communication for numerical physics simulations
Architecture: SIMD, SIMD++
# nodes:
Topology: flexible 1D, rigid 3D, flexible 3D
                          APE ( )     APE100 ( )   APEmille ( )   apeNEXT ( )
Aggregated memory         256 MB      8 GB         64 GB          1 TB
# registers (w. size)     64 (x32)    128 (x32)    512 (x32)      512 (x64)
LOW clock frequency       8 MHz       25 MHz       66 MHz         200 MHz
Comp. power/node          64 Mflops   50 Mflops    528 Mflops     1600 Mflops
Aggregated comp. power    1 GFlops    100 GFlops   1 TFlops       7 TFlops

SW challenges from Tiled Architectures
- Facilitate the expression of parallelism, e.g. as a Network of Actors
- Express real-time constraints in a formal manner, a feature missing in classical languages. This is a key cultural point!
- Avoid destroying information about the available algorithmic parallelism
- The compilation chain must be fully aware of key architectural parameters: bandwidth, computational power, pipeline and latencies
- Exploit memory locality: efficient management of distributed memories, getting rid of classical caches
- Manage long delays between distant tiles
- Reduce hot spots in communications
- Reduce tiled RTOS overhead (time and memory footprint)
- Introduce Hardware-dependent Software and Hardware Abstraction Layers
- Capture scalability in a library of characterized SW/HW components
- Support (semi-)automation of iterative design over HW, SW and application
- Monitor quality and real-time constraints on real HW and simulators
- Simulation speed of multi-tiled architectures

SW Architecture
[Figure: SHAPES software tool flow. Labels, in transcript order: optimised compilation on tiles and comms network; Distributed Operation Layer; hardware platform specification; Simulator; trace information; Model Compiler; component interaction, properties and constraints; component source code; mapping information; HdS Generator; HdS source code; Compiler; component binary; HdS binary; Link; Dispatch; OS services binary; glue binary; Mapping; Memory mapping; RTOS; application specs]

Distributed Operation Layer: Application Specification
Two parts:
- Application structure level (.xml, schema definition available):
  - processes
  - FIFO SW channels between processes
  - interconnection between processes
- Behavior of each process: the process' internals (.c)
[Figure: example process network with processes A, B and C]
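The two-part split (structure versus behaviour) can be illustrated with a toy process network. This is a sketch in Python rather than the XML and C files the DOL actually uses; the chain A to B to C, the channel names and the round-robin execution loop are all assumptions made for the example.

```python
from collections import deque

# Structure level: processes and the FIFO channels connecting them.
processes = ["A", "B", "C"]
channels = {             # channel name -> (source, destination)
    "a_to_b": ("A", "B"),
    "b_to_c": ("B", "C"),
}
fifos = {name: deque() for name in channels}

# Behaviour level: one function per process, touching only its own
# state and the FIFOs it is connected to.
def fire_a(state):
    """Producer: emits 0, 1, 2 and then stays idle."""
    if state["next"] < 3:
        fifos["a_to_b"].append(state["next"])
        state["next"] += 1

def fire_b(state):
    """Filter: squares each token it receives."""
    if fifos["a_to_b"]:
        fifos["b_to_c"].append(fifos["a_to_b"].popleft() ** 2)

def fire_c(state):
    """Consumer: stores every token it receives."""
    if fifos["b_to_c"]:
        state["received"].append(fifos["b_to_c"].popleft())

behaviour = {"A": fire_a, "B": fire_b, "C": fire_c}
state = {"A": {"next": 0}, "B": {}, "C": {"received": []}}

# Hypothetical round-robin execution on a single processor.
for _ in range(5):
    for name in processes:
        behaviour[name](state[name])

print(state["C"]["received"])  # [0, 1, 4]
```

Because the processes interact only through the FIFOs, the same structure and behaviour could be executed on one processor or spread over several tiles without changing the process code.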

Virtual SHAPES Platform (VSP)
- Enable early software development
- Explore different tile configurations
- Binary compatible with the SHAPES hardware
- Debugging capability
- Export performance information
- Scalability to multiple tiles
[Figure labels: VSP, DOL, Applications, RTOS, HdS, HW, SHAPES SW and app partners]

VSP-DOL interfacing

TARGET Compiler
Core-related and communication-related requirements:
- Support of VLIW instruction compaction
- Phase coupling: register allocation and SW pipelining
- Support of predicated execution
- Functional unit assignment for clustered VLIWs
- Communication-latency-aware scheduling
- Intra-tile multi-core on-chip debugging
- Inter-tile communication using DNP
[Figure: four tiles, each with off-chip memory and containing mAgicV DSP, ARM uP, uP MEM, DSP DATA MEM, DSP PROG MEM, REG FILE and COMM I/F. mAgicV core datapath: register files (RF), FP/I multipliers Mul1 to Mul4, Cadd1, Cadd2, Add1, Add2, Min Max1, Min Max2, Conv1, Conv2, Div1, Div2, Sh/Log1, Sh/Log2, ports P2_0 to P6_0 and P2_1 to P6_1, Core_bus5, Core_bus7. mAgicV PCU: program memory, instruction decoder, instruction sequencer, decompaction, interrupt controller]

TIMA: HdS & RTOS Principles
- Hardware-dependent Software: software directly dependent on the underlying hardware
- Communication differentiation: intra-subsystem & inter-subsystem communications
- Networked operating system
[Figure: ARM subsystem stack: Application, HdS API, RTOS (RT Linux), COMM, HAL, ARM. DSP subsystem stack: Application, HdS API, Monitor, COMM, HAL, DSP. In both stacks the HdS lies between the SW and the HW]

SW Architecture
Tools, responsible partners and work packages:
- Simulation environment (RWTH), WP 1.4
- Model compiler (ETHZ, RWTH), WP 1.11, WP 1.4
- HdS generator (TIMA), WP 1.10
- Compiler (TARGET), WP 1.9
- Link, Dispatch (TARGET), WP 1.9
- Mapping (ETHZ), WP 1.11
- RTOS (TIMA, THALES), WP 1.10
Artifacts exchanged between the tools: application specification; hardware platform specification; trace information; component interaction, properties and constraints; component source code; mapping information; memory mapping; HdS source code; component binary; HdS binary; OS services binary; glue binary

SHAPES SW Architecture: challenges
High-level exploration, mapping, and simulation:
- What is the degree of available parallelism? How can it be exposed to the mapping stage? What is a suitable model-based specification formalism? What adaptations are necessary in order to expose the inherent parallelism?
- Define a common Profiling Trace Interface (PTI) over which information can be exchanged.
Hardware-dependent software and operating system:
- To use the features provided by the HdS (i.e. platform abstraction), a generic interface API has to be defined.
Compiler technology:
- Model low-latency communication interfaces in the C source code that is the input to the C compiler for the computational tiles.
- Investigate how the HdS can be modeled entirely in C source code, to be compiled by the C compiler for the computational tiles.

Tiled HW Architecture
- Communication centric, not processor centric
- Homogeneous SW interface for on-chip and off-chip scalable connection and I/O
- 3D first-neighbour Toroidal System Eng. (3DT) for off-chip communication
- Virtual tunnelling on the packet-switching NoC (Network on Chip) and the off-chip 3DT
Parallelism-aware system SW:
- Manage memory distribution, capture real-time constraints
- Explicit parallel programming / Network of Actors
[Figure: several tiles, each with off-chip memory, connected by off-chip communication; one tile expanded into DSP, RISC, DNP, multi-layer bus, NoC, POT, 3DT and DXM; ADC sensors, DAC actuators, ADC/DAC and FPGA attached]

The tile: DIOPSIS® + DNP

DOL: Input/Output Interfaces
[Figure: the DOL (functional simulation, mapping optimization, performance analysis) with its interfaces. Labels, in transcript order: Application Specification; HW Architecture Specification; Mapping constraints; Application programmer; HW architect; Sys. SW designer; Simulation framework; Workload Specification; Mapping Specification; Performance Analysis Results; Performance Queries; Compiler & Linker; HdS & RTOS; Simulation framework; Sys. SW designer]

DOL: Mapping Specification
Mapping = binding + scheduling
.xml schema definition available
[Figure: processes A, B and C mapped onto RISC 0, DSP 0, RISC 1, DSP 1 and MEM, connected by buses and the NoC]
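"Mapping = binding + scheduling" can be made concrete with a small data model. The sketch below is hypothetical: the resource names follow the figure labels, but the grouping of RISC 0/DSP 0 and RISC 1/DSP 1 into two tiles, the particular binding, and the helper functions are assumptions for illustration, not the DOL XML format.

```python
from typing import Dict, List, Tuple

# Assumed platform: two tiles, each with one RISC and one DSP on a local bus.
TILE_OF = {"RISC0": 0, "DSP0": 0, "RISC1": 1, "DSP1": 1}

def bind_channels(channels: List[Tuple[str, str]],
                  binding: Dict[str, str]) -> Dict[Tuple[str, str], str]:
    """Bind each process-to-process channel to a communication resource:
    the local bus if both ends sit on the same tile, otherwise the NoC."""
    result = {}
    for src, dst in channels:
        same_tile = TILE_OF[binding[src]] == TILE_OF[binding[dst]]
        result[(src, dst)] = "bus" if same_tile else "NoC"
    return result

def is_consistent(binding: Dict[str, str],
                  schedule: Dict[str, List[str]]) -> bool:
    """True if every bound process is scheduled exactly once, and only on
    the resource it is bound to."""
    scheduled = {p: res for res, order in schedule.items() for p in order}
    total = sum(len(order) for order in schedule.values())
    return scheduled == binding and total == len(binding)

# Binding: which resource runs each process.
binding = {"A": "RISC0", "B": "DSP0", "C": "DSP1"}
# Scheduling: execution order of the processes on each resource.
schedule = {"RISC0": ["A"], "DSP0": ["B"], "DSP1": ["C"]}
channels = [("A", "B"), ("B", "C")]

print(is_consistent(binding, schedule))  # True
print(bind_channels(channels, binding))
```

In this example the A to B channel stays on the tile-local bus, while B to C crosses tiles and is bound to the NoC.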

VSP abstraction levels
Trade-off between simulation speed and accuracy (WP1.4, WP1.11, WP1.1):
- Statistical Analysis: tasks are represented as timing budgets
  - very high simulation speed
  - no architecture modeling & verification
- Virtual Processing Unit (VPU): generic abstract processor simulator
  - adaptable to arbitrary processor cores
  - high simulation speed
  - functional validation
  - user-dependent accuracy
- Instruction Accurate (IA) Model: instruction-accurate Instruction Set Simulators (ISS)
  - ARM9
  - mAgic VLIW DSP
  - DNP
  - STM Spidergon Network-on-Chip
- Cycle Accurate (CA) Model: cycle-accurate Instruction Set Simulators (ISS)
  - ARM9 (commercially available)
  - mAgic VLIW DSP (Target ISS)
  - DNP
  - STM Spidergon Network-on-Chip (STM model)
- SHAPES hardware platform