Chip Design for the Next Generation Neuromorphic Many-Core Systems. Sebastian Höppner (collaborators: TUD, UMAN). Madrid, 28th September 2015.

1 Chip Design for the Next Generation Neuromorphic Many-Core Systems
Sebastian Höppner; collaborators: TUD, UMAN
Madrid, 28th September 2015
September 2015 HBP Summit 2015

2 Outline
Targets and Architecture
Chip Innovations
Roadmap

3 NM-MC (SpiNNaker) Rationale
Communication- and memory-centric architecture for efficient real-time simulation of spiking neural networks.
NM-MC-1 (SpiNNaker) has a broad user base: ~40 systems in use around the world.
Flexibility: adaptable network, neuron model and plasticity.
Real-time: suits robotics and is faster than HPC.
Capacity of 10⁹ neurons and 10¹² synapses.
Energy per synaptic event: 10⁻⁸ J (HPC: 10⁻⁴ J).
SpiNNaker uses old 130nm CMOS technology: there is scope for a 10x improvement in modern technology with innovative circuit techniques.

4 NM-MC Chip Scaling Targets
Feature                         SpiNNaker              SpiNNaker2
technology                      130nm                  28nm
cores                           18                     68
core frequency                  200MHz                 >400MHz
external memory                 128MByte (1 GByte/s)   2GByte (>10 GByte/s)
power                           1W                     1W
power management                no                     yes
floating point support
exponential function hardware
vector processing
true random numbers
biological real-time operation
no. of neurons / synapses       16k / 16M              128k / 128M
energy per synaptic event       10⁻⁸ J                 10⁻⁹ J (≈10x improvement)
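As a rough illustration of what the synapse counts and energy-per-event figures imply together, the sketch below multiplies synapse count, a mean presynaptic firing rate, and energy per synaptic event. The 5 Hz rate and the function name `synaptic_power_w` are hypothetical, and the model counts only synaptic events, not any other chip power.

```python
def synaptic_power_w(synapses: float, rate_hz: float, energy_per_event_j: float) -> float:
    """Power spent on synaptic events: every synapse sees one event per presynaptic spike."""
    events_per_s = synapses * rate_hz
    return events_per_s * energy_per_event_j

RATE_HZ = 5.0  # hypothetical mean firing rate, same for both chips

spinnaker = synaptic_power_w(16e6, RATE_HZ, 1e-8)    # 16M synapses, 10^-8 J/event
spinnaker2 = synaptic_power_w(128e6, RATE_HZ, 1e-9)  # 128M synapses, 10^-9 J/event

print(f"SpiNNaker:  {spinnaker:.2f} W")
print(f"SpiNNaker2: {spinnaker2:.2f} W")
print(f"ratio:      {spinnaker2 / spinnaker:.2f}")
```

Because the rate cancels in the ratio, the table's figures imply that a fully loaded SpiNNaker2 chip handles 8x as many synaptic events as SpiNNaker for about 0.8x the synaptic-event power.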

5 SpiNNaker2 Chip Architecture
Block diagram with three sub-systems:
router sub-system (spike communication): SpiNNaker router; serial links 3 GBit/s x3 (two groups) and a 3 GBit/s host link.
processing sub-system (neuromorphic computation): 68 cores (ARM M4 processing elements, PE), >400MHz; network-on-chip; periphery (timer, IRQ); true random generator.
memory sub-system (synaptic memory): HMC interface; shared memory; >10 GByte/s.
IP contract with ARM for Cortex M4 processor cores signed in 01/2015.

6 Neuromorphic Computation Scenario
High update rates: peak processing load and spike communication bandwidth.
Latency requirements: <1ms for processing and communication.
Low update rates: relaxed processing load and spike communication bandwidth.

7 Computation: Power Management Techniques
Dynamic voltage and frequency scaling (DVFS): self-DVFS by the ARM cores based on neural network activity; up to 40% energy reduction.
Adaptive power rail adaption (AVS) to run at the minimum required voltage; up to 25% energy reduction.
Figure: supply levels VDD, VDD2 and VDD3 over time t, with "slow" AVS adaption and "fast" DVFS switching.
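A minimal sketch of activity-driven self-DVFS, assuming each core picks the lowest of three performance levels (mirroring VDD, VDD2 and VDD3 in the figure) that still finishes the current workload within a 1 ms real-time step. The voltage/frequency pairs, the cycles-per-event cost and all names (`PerfLevel`, `select_level`, `relative_energy`) are hypothetical, and dynamic energy per cycle is approximated as proportional to V².

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PerfLevel:
    name: str
    vdd_v: float    # supply voltage in volts (hypothetical)
    freq_hz: float  # core clock in Hz (hypothetical)

# Hypothetical operating points, lowest voltage first (cf. VDD, VDD2, VDD3).
LEVELS = [
    PerfLevel("PL1", 0.70, 100e6),
    PerfLevel("PL2", 0.85, 200e6),
    PerfLevel("PL3", 1.00, 400e6),
]

TIME_STEP_S = 1e-3     # real-time step; all work must finish within 1 ms
CYCLES_PER_EVENT = 50  # hypothetical processor cycles per synaptic event

def select_level(events_in_step: int) -> PerfLevel:
    """Pick the lowest level whose cycle budget covers this step's workload."""
    cycles_needed = events_in_step * CYCLES_PER_EVENT
    for level in LEVELS:
        if cycles_needed <= level.freq_hz * TIME_STEP_S:
            return level
    return LEVELS[-1]  # overloaded: run at the highest level

def relative_energy(level: PerfLevel) -> float:
    """Dynamic energy per cycle scales roughly with V^2; normalized to the top level."""
    return (level.vdd_v / LEVELS[-1].vdd_v) ** 2

for events in (500, 3000, 7000):
    level = select_level(events)
    print(events, level.name, round(relative_energy(level), 2))
```

In this toy model, a lightly loaded step run at 0.7 V instead of 1.0 V needs roughly half the dynamic energy for the same work; leakage, level-switching overhead and the slower AVS loop are ignored.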

8 Computation: Exponential Function Accelerator
Neural models often employ exponential decays.
Calculating exponentials is costly on standard processors and typically dominates the synapse update time in SpiNNaker.
Exponential function accelerator unit:
latency: 4 clock cycles, fully pipelined;
standard s16.15 fixed-point format, full accuracy (1 LSB);
integrated with the ARM processor via AHB;
up to 15x speed-up.
Figure: ARM core connected to the Exp Unit via AHB and a FIFO.
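The s16.15 format and the 1-LSB accuracy target can be made concrete with a software reference model. This is a sketch, not the accelerator's algorithm: it assumes s16.15 means a 32-bit two's-complement value with 15 fractional bits, computes exp() in double precision and rounds back to the nearest LSB, the kind of golden model a hardware unit could be compared against. All names and the decay example values are hypothetical.

```python
import math

FRAC_BITS = 15
ONE = 1 << FRAC_BITS         # raw value of 1.0 in s16.15
RAW_MAX = (1 << 31) - 1      # largest representable value (~65535.99997)
RAW_MIN = -(1 << 31)

def to_s16_15(x: float) -> int:
    """Round a float to the nearest s16.15 raw integer, saturating at 32-bit limits."""
    return max(RAW_MIN, min(RAW_MAX, round(x * ONE)))

def from_s16_15(raw: int) -> float:
    """Convert an s16.15 raw integer back to a float."""
    return raw / ONE

# Inputs above ln(largest value) would overflow the output format.
EXP_INPUT_MAX = math.log(RAW_MAX / ONE)

def exp_s16_15(raw: int) -> int:
    """Golden model: exp() on s16.15 operands, result rounded to the nearest LSB."""
    x = from_s16_15(raw)
    if x > EXP_INPUT_MAX:
        return RAW_MAX  # saturate instead of overflowing
    return to_s16_15(math.exp(x))

# Example: per-step decay factor exp(-dt/tau), dt = 1 ms, tau = 20 ms (hypothetical).
decay = exp_s16_15(to_s16_15(-1.0 / 20.0))
v = to_s16_15(10.0)                  # membrane potential above rest, in mV
v_next = (v * decay) >> FRAC_BITS    # fixed-point multiply; >> floors the product
print(from_s16_15(v_next))
```

Precomputing the decay factor once per time constant is the usual software workaround; a per-synapse or per-neuron exponential with varying arguments is where a pipelined hardware unit pays off.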

9 Computation: Random Number Generation
Random numbers are required for neural modeling and for stochastic computing.
Hardware acceleration for pseudo-random number generation (PRNG) per ARM core.
True random numbers from silicon noise: re-use of PLL jitter as noise source → no power overhead; dedicated true random oscillators.
Local generation and global distribution of entropy.
Santos contains the equivalent of 125k true random and 8M pseudo-random 8-bit numbers.
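The combination of per-core PRNGs and a true random entropy source can be sketched as follows. The slide does not say which PRNG the hardware implements, so Marsaglia's xorshift32 is swapped in purely for illustration, and `os.urandom` stands in for the on-chip true random source; `XorShift32` and `seed_from_entropy` are hypothetical names.

```python
import os

MASK32 = 0xFFFFFFFF

class XorShift32:
    """Marsaglia xorshift32 PRNG; a stand-in, not the SpiNNaker2 hardware PRNG."""

    def __init__(self, seed: int):
        seed &= MASK32
        if seed == 0:
            raise ValueError("xorshift32 seed must be non-zero")
        self.state = seed

    def next_u32(self) -> int:
        x = self.state
        x ^= (x << 13) & MASK32
        x ^= x >> 17
        x ^= (x << 5) & MASK32
        self.state = x
        return x

    def next_u8(self) -> int:
        # Take the top 8 bits of the 32-bit output as one 8-bit random number.
        return self.next_u32() >> 24

def seed_from_entropy() -> int:
    """Hypothetical entropy source: os.urandom stands in for on-chip true random bits."""
    while True:
        seed = int.from_bytes(os.urandom(4), "little")
        if seed != 0:
            return seed

# Each core gets its own PRNG, seeded from the (stand-in) true random source.
cores = [XorShift32(seed_from_entropy()) for _ in range(4)]
samples = [core.next_u8() for core in cores]
print(samples)
```

Seeding cheap local PRNGs from a slower true random source is one common way to combine the two; the sketch does not model how entropy is distributed across the chip.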

10 Communication: Energy Efficient Inter-Chip Links
Lightweight packet-switched spike communication within the SpiNNaker network.
5GBit/s multi-standard SerDes in 28nm CMOS: TX and RX equalization, clock-data recovery.
Towards energy-proportional communication: low-power IDLE mode; fast wake-up for biological real-time operation; lock time reduced by 1000x (1.5ms → <1.5µs).
Figure: link power vs. data rate, showing the IDLE mode and fast wake-up.
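To see why the 1000x shorter lock time matters, the sketch below models a link that wakes once per 1 ms real-time step (the <1ms latency requirement), sends that step's spike traffic at 5 GBit/s, and returns to IDLE. The active and idle power values, the traffic volume and the function name are hypothetical, and the wake-up interval is assumed to draw full active power.

```python
def avg_link_power_w(bits_per_step: float,
                     rate_bps: float = 5e9,        # 5 GBit/s SerDes
                     step_s: float = 1e-3,         # biological real-time step
                     wake_s: float = 1.5e-6,       # wake-up / lock time
                     p_active_w: float = 0.05,     # hypothetical active power
                     p_idle_w: float = 0.001) -> float:  # hypothetical IDLE power
    """Average link power when the link wakes, sends its burst, then idles each step.

    If wake-up plus transfer do not fit within one step, the link can never
    enter IDLE and stays active the whole time.
    """
    busy_s = wake_s + bits_per_step / rate_bps
    if busy_s >= step_s:
        return p_active_w
    return (p_active_w * busy_s + p_idle_w * (step_s - busy_s)) / step_s

slow = avg_link_power_w(1e5, wake_s=1.5e-3)  # 1.5 ms lock: link never sleeps
fast = avg_link_power_w(1e5, wake_s=1.5e-6)  # 1.5 us lock: link idles most of the step
print(slow, fast)
```

With a 1.5ms lock time the wake-up alone exceeds the 1 ms step, so the IDLE mode is useless for real-time operation; with <1.5µs it costs only about 0.15% of the step.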

11 Santos Chip
Die plot with labeled blocks: SerDes transceivers, processing elements, shared memory, router, MCU, memory interface.
GLOBALFOUNDRIES 28nm SLP CMOS, 18mm². Tape-out 07/2015.

12 Santos Evaluation Platform
Demonstrator and evaluation platform for NM-MC2 components.
Supports NM-MC2 software development and platform integration.
Up to 4 boards with 4 Santos modules each → 64 ARM cores.
Available in Q1 2016.

13 NM-MC2 Chip Roadmap
Timeline from 2013 to 2023, chips in chronological order ("?" marks open dates):
NanoLink28: SerDes transceiver; chip size: 4mm².
Santos28: 4 ARM M4 cores; AVFS, DVFS power management; SpiNNaker router with SerDes; LPDDR2 memory interface; chip size: 18mm².
Nanolink28_gen2 (?): 2nd iteration SerDes; HMC interface.
Nanolink28_gen3 (?): final SerDes; final HMC interface.
SpiNNaker2 (?): 68 ARM M4 cores; power management; SpiNNaker router with SerDes; HMC interface; chip size: 70mm², >1 billion transistors.

14 Thank you for your attention
The NM-MC2 chip design team:
The University of Manchester: Luis Plana, Jim Garside, Steve Temple, David Lester, Steve Furber
Technische Universität Dresden: Sebastian Höppner, Stefan Scholze, Andreas Dixius, Johannes Partzsch, Georg Ellguth, Stephan Hartmann, Thomas Hocker, Stephan Henker, Jörg Schreiter, Stefan Hänzsche, Stefan Schiefer, Love Cederstroem, René Schüffny, Christian Mayr

15 HBP Period 1 Review (Oct 2013 – Sep 2014)
Memory integration: DRAM for synaptic memory.
Hybrid Memory Cube (HMC): 3D stacked-die DRAM module with a 15GBit/s serial high-speed interface to the processor chip, for high bandwidth and low power consumption.
Status: conceptual planning.
Research challenge: 15G SerDes PHY in 28nm SLP technology.
Develop innovative and cost-efficient system-in-package integration concepts by Micron.
The Hybrid Memory Cube technology offers low-footprint DRAM integration based on a 3D chip stack. The HMC memory interface employs low-voltage-swing serial links at data rates of up to 15GBit/s, which reduces the energy per bit for memory access and allows a compact physical realization (fewer physical lines). Implementing these high-speed interfaces in a super-low-power technology (e.g. 28nm SLP), whose devices offer less performance than those of high-performance counterparts (e.g. 28nm HPP), is a main research challenge.
Target footprint ≈2.5cm x 2.5cm.
Contributions by:
26-28 Jan 2015, HBP Period 1 Review (Oct 2013 – Sep 2014)
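The "fewer physical lines" argument can be quantified with a lane count. The sketch below computes how many serial lanes are needed for the >10 GByte/s external memory bandwidth target at 15 GBit/s per lane, counting raw line rate in one direction only and ignoring encoding, packet and protocol overhead; `lanes_needed` is a hypothetical helper.

```python
import math

def lanes_needed(bandwidth_bytes_per_s: float, lane_rate_bps: float) -> int:
    """Minimum serial lanes for a target bandwidth (raw line rate, no protocol overhead)."""
    return math.ceil(bandwidth_bytes_per_s * 8 / lane_rate_bps)

# >10 GByte/s memory bandwidth target over 15 GBit/s HMC serial lanes
print(lanes_needed(10e9, 15e9))
# same target over 5 GBit/s lanes, for comparison
print(lanes_needed(10e9, 5e9))
```

Higher per-lane rates trade fewer wires for harder PHY design, which is exactly why a 15G SerDes in a slower 28nm SLP process is flagged as the research challenge.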

