ACACES, 12 July 2009. Computing beyond a Million Processors - bio-inspired massively-parallel architectures. Steve Furber, The University of Manchester


1 ACACES 12 July
Computing beyond a Million Processors - bio-inspired massively-parallel architectures
Steve Furber, The University of Manchester
SBF is supported by a Royal Society-Wolfson Research Merit Award
Andrew Brown, The University of Southampton

2 Outline: Computer Architecture Perspective, Building Brains, Living with Failure, Design Principles, SpiNNaker, The SpiNNaker system, Concurrency, Conclusions

3 Multi-core CPUs
High-end uniprocessors
– diminishing returns from complexity
– wire vs transistor delays
Multi-core processors
– cut-and-paste
– simple way to deliver more MIPS
Moore’s Law
– more transistors
– more cores
… but what about the software?

4 Multi-core CPUs
General-purpose parallelization
– an unsolved problem
– the ‘Holy Grail’ of computer science for half a century?
– but imperative in the many-core world
Once solved
– few complex cores, or many simple cores?
– simple cores win hands-down on power-efficiency!

5 Back to the future
Imagine…
– a limitless supply of (free) processors
– load-balancing is irrelevant
– all that matters is:
    the energy used to perform a computation
    formulating the problem to avoid synchronisation
    abandoning determinism
How might such systems work?

6 Bio-inspiration
How can massively parallel computing resources accelerate our understanding of brain function?
How can our growing understanding of brain function point the way to more efficient parallel, fault-tolerant computation?

7 Outline: Computer Architecture Perspective, Building Brains, Living with Failure, Design Principles, SpiNNaker, The SpiNNaker system, Concurrency, Conclusions

8 Building brains
Brains demonstrate
– massive parallelism (10^11 neurons)
– massive connectivity (10^15 synapses)
– excellent power-efficiency
    much better than today’s microchips
– low-performance components (~ 100 Hz)
– low-speed communication (~ metres/sec)
– adaptivity
– tolerant of component failure
– autonomous learning

9 Neurons
Multiple inputs (dendrites)
Single output (axon)
– digital “spike”
– fires at 10s to 100s of Hz
– output connects to many targets
Synapse at input/output connection
(www.ship.edu/~cgboeree/theneuron.html)

10 Neurons
A flexible biological control component
– very simple animals have a handful
– bees: 850,000
– humans:
(photo courtesy of the Brain Mind Institute, EPFL)

11 Neurons
Regular high-level structure
– e.g. 6-level cortical microarchitecture
    low-level vision, to language, etc.
Random low-level structure
– adapts over time
(faculty.washington.edu/rhevner/Miscellany.html)

12 Neural Computation
To compute we need:
– Processing
– Communication
– Storage
Processing: abstract model
– linear sum of weighted inputs
    ignores non-linear processes in dendrites
– non-linear output function
– learn by adjusting synaptic weights
[Figure: neuron with inputs x1 to x4, weights w1 to w4, output function f and output y]
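The abstract model on this slide (a weighted linear sum passed through a non-linear output function) can be sketched as below. The logistic function chosen for f, and the example weights and inputs, are assumptions for illustration only; the slide does not say which non-linearity is intended.

```python
import math

def logistic(s):
    """Assumed output non-linearity; the slide only says 'non-linear output function'."""
    return 1.0 / (1.0 + math.exp(-s))

def neuron_output(weights, inputs, f=logistic):
    """Abstract neuron: y = f(sum_i w_i * x_i)."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    total = sum(w * x for w, x in zip(weights, inputs))
    return f(total)

# Four inputs, as in the slide's figure; the values are illustrative only.
w = [0.5, -0.25, 1.0, 0.0]
x = [1.0, 2.0, 0.0, 3.0]
y = neuron_output(w, x)
```

Learning, in this picture, amounts to changing the entries of `w` over time.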

13 Processing
Leaky integrate-and-fire model
– inputs are a series of spikes
– total input is a weighted sum of the spikes
– neuron activation is the input with a “leaky” decay
– when activation exceeds threshold, output fires
– habituation, refractory period, …?
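A minimal discrete-time sketch of the leaky integrate-and-fire behaviour listed on this slide. The multiplicative per-step decay, the reset to zero after firing, and all parameter values are assumptions; habituation and refractory periods are left out.

```python
def simulate_lif(spike_trains, weights, threshold=1.0, decay=0.9):
    """Discrete-time leaky integrate-and-fire neuron.

    spike_trains: one list of 0/1 values per input, all of equal length.
    Returns the time steps at which the neuron fired.
    """
    n_steps = len(spike_trains[0])
    activation = 0.0
    fired_at = []
    for t in range(n_steps):
        # Total input is the weighted sum of the spikes arriving at step t.
        total_input = sum(w * train[t] for w, train in zip(weights, spike_trains))
        # Activation leaks away by a constant factor, then integrates the input.
        activation = decay * activation + total_input
        if activation > threshold:
            fired_at.append(t)
            activation = 0.0  # assumed reset after the output fires
    return fired_at

# One input spiking on every step: the neuron needs three steps to reach threshold.
firings = simulate_lif([[1, 1, 1, 1, 1, 1]], [0.6], threshold=1.0, decay=0.5)
```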

14 Processing
Izhikevich model
– two variables, one fast, one slow:
– neuron fires when v > 30; then:
– a, b, c & d select behaviour
(www.izhikevich.com)
[Figure labels: v, u]
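The equations themselves did not survive in this transcript. The sketch below uses the commonly published form of the Izhikevich model, v' = 0.04v² + 5v + 140 − u + I and u' = a(bv − u), with the reset v ← c, u ← u + d after a spike; whether the slide showed exactly this form is an assumption. The forward-Euler integration, the 0.5 ms step, the input current and the "regular spiking" parameter values are also illustrative choices.

```python
def simulate_izhikevich(a=0.02, b=0.2, c=-65.0, d=8.0,
                        current=10.0, t_ms=1000.0, dt=0.5):
    """Forward-Euler simulation of one Izhikevich neuron; returns spike times in ms."""
    v = c          # fast variable (membrane potential)
    u = b * v      # slow variable (recovery)
    spike_times = []
    for step in range(int(t_ms / dt)):
        if v > 30.0:
            # The neuron fires when v > 30; then both variables are reset.
            spike_times.append(step * dt)
            v = c
            u += d
        dv = 0.04 * v * v + 5.0 * v + 140.0 - u + current
        du = a * (b * v - u)
        v += dt * dv
        u += dt * du
    return spike_times

spikes_driven = simulate_izhikevich(current=10.0)
spikes_quiet = simulate_izhikevich(current=0.0)
```

Changing a, b, c and d selects other firing behaviours, as the slide notes.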

15 Communication
Spikes
– biological neurons communicate principally via ‘spike’ events
– asynchronous
– information is only:
    which neuron fires, and
    when it fires

16 Storage
Synaptic weights
– stable over long periods of time
    with diverse decay properties?
– adaptive, with diverse rules
    Hebbian, anti-Hebbian, LTP, LTD, ...
Axon ‘delay lines’
Neuron dynamics
– multiple time constants
Dynamic network states

17 Outline: Building Brains, Computer Architecture Perspective, Living with Failure, Design Principles, SpiNNaker, The SpiNNaker system, Concurrency, Conclusions

18 The Good News...
[Chart: Transistors per Intel chip; axes: Year, Millions of transistors per chip; labelled points: 4004, Pentium, Pentium II, Pentium III, Pentium 4]

19 and the Bad News
Device variability & Component failure

20 Atomic Scale devices
The simulation Paradigm now
A 4.2 nm MOSFET: In production 2023
A 22 nm MOSFET: In production 2008

21 A view from Intel
The Good News:
– we will have 100 billion transistor ICs
The Bad News:
– billions will fail in manufacture
    unusable due to parameter variations
– billions more will fail over the first year of operation
    intermittent and permanent faults
(Shekhar Borkar, Intel Fellow)

22 A view from Intel
Conclusions:
– one-time production test will be out
– burn-in to catch infant mortality will be impractical
– test hardware will be an integral part of the design
– dynamically self-test, detect errors, reconfigure, adapt, ...
(Shekhar Borkar, Intel Fellow)

23 Outline: Building Brains, Computer Architecture Perspective, Living with Failure, Design Principles, SpiNNaker, The SpiNNaker system, Concurrency, Conclusions

24 Design principles
Virtualised topology
– physical and logical connectivity are decoupled
Bounded asynchrony
– time models itself
Energy frugality
– processors are free
– the real cost of computation is energy

25 Outline: Building Brains, Computer Architecture Perspective, Living with Failure, Design Principles, SpiNNaker, The SpiNNaker system, Concurrency, Conclusions

26 SpiNNaker project
Multi-core CPU node
– 20 ARM968 processors
– to model large-scale systems of spiking neurons
Scalable up to systems with 10,000s of nodes
– over a million processors
– >10^8 MIPS total
Power ~ 25  w/neuron
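The scaling figures on this slide can be checked with a few lines of arithmetic. The constants below are taken from the slide; the per-processor MIPS value is only what those totals imply, not a quoted specification.

```python
import math

PROCESSORS_PER_NODE = 20       # "20 ARM968 processors" per node
TARGET_PROCESSORS = 1_000_000  # "over a million processors"
TARGET_TOTAL_MIPS = 10**8      # ">10^8 MIPS total"

def nodes_needed(target_processors, per_node):
    """Smallest number of nodes that provides at least the target processor count."""
    return math.ceil(target_processors / per_node)

nodes = nodes_needed(TARGET_PROCESSORS, PROCESSORS_PER_NODE)
implied_mips_per_processor = TARGET_TOTAL_MIPS / TARGET_PROCESSORS
```

A million processors therefore needs 50,000 nodes, which matches "10,000s of nodes", and the total implies at least 100 MIPS per processor.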

27 SpiNNaker project

28 SpiNNaker project
Fault-tolerant architecture for large-scale neural modelling
A billion neurons in real time
A step-function increase in the scale of neural computation
Cost- and energy-efficient

29 SpiNNaker system

30 CMP node

31 ARM968 subsystem

32 GALS organization
clocked IP blocks
self-timed interconnect
self-timed inter-chip links

33 Outline: Building Brains, Computer Architecture Perspective, Living with Failure, Design Principles, SpiNNaker, The SpiNNaker system, Concurrency, Conclusions

34 Circuit-level concurrency
Delay-insensitive comms
– 3-of-6 RTZ on chip
– 2-of-7 NRZ off chip
Deadlock resistance
– Tx & Rx circuits have high deadlock immunity
– Tx & Rx can be reset independently
    each injects a token at reset
    true transition detector filters surplus token
[Figure: Tx and Rx link circuits; signal labels: din (2 phase), dout (4 phase), ¬reset, ¬ack, data, ack]
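To illustrate the m-of-n codes named on this slide, the sketch below enumerates the codewords of a 3-of-6 and a 2-of-7 code and shows the completion test a receiver can apply without any timing assumption. How SpiNNaker actually maps data and control symbols onto these codewords is not stated in the transcript, so no mapping is attempted; the function names are hypothetical.

```python
from itertools import combinations
from math import comb

def m_of_n_codewords(m, n):
    """All n-wire codewords in which exactly m wires are active, as tuples of 0/1."""
    words = []
    for active in combinations(range(n), m):
        words.append(tuple(1 if i in active else 0 for i in range(n)))
    return words

def is_complete(wires, m):
    """A symbol has fully arrived once exactly m wires have signalled."""
    return sum(wires) == m

three_of_six = m_of_n_codewords(3, 6)   # on-chip code named on the slide
two_of_seven = m_of_n_codewords(2, 7)   # off-chip code named on the slide
```

Both codes offer at least 16 codewords (20 and 21 respectively), so each symbol has room for 4 bits of data with a few codewords to spare.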

35 System-level concurrency
Breaking symmetry
– any processor can be Monitor Processor
    local ‘election’ on each chip, after self-test
– all nodes are identical at start-up
    addresses are computed relative to node with host connection (0,0)
– system initialised using flood-fill
    nearest-neighbour packet type
    boot time (almost) independent of system scale
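The address-assignment idea on this slide can be sketched as a flood-fill from the host-connected node, with each node deriving its address from its neighbour's address and the direction of the link the packet arrived on. The plain 4-neighbour, non-wrapping mesh used here is an assumption made to keep the example small; it is not a description of SpiNNaker's actual link topology or boot protocol.

```python
from collections import deque

# Assumed link directions for a simple 4-neighbour mesh (illustrative only).
LINKS = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}

def flood_fill_addresses(width, height, host=(0, 0)):
    """Propagate addresses outward from the node with the host connection.

    Returns {physical_position: (relative_address, hop_count)}.
    """
    assigned = {host: ((0, 0), 0)}
    frontier = deque([host])
    while frontier:
        x, y = frontier.popleft()
        (ax, ay), hops = assigned[(x, y)]
        for dx, dy in LINKS.values():
            nx, ny = x + dx, y + dy
            in_mesh = 0 <= nx < width and 0 <= ny < height
            if in_mesh and (nx, ny) not in assigned:
                # The neighbour computes its own address from the sender's
                # address plus the direction of the link it was reached by.
                assigned[(nx, ny)] = ((ax + dx, ay + dy), hops + 1)
                frontier.append((nx, ny))
    return assigned

mesh = flood_fill_addresses(4, 3, host=(1, 1))
rounds = max(hops for _, hops in mesh.values())
```

In this sketch the number of nearest-neighbour rounds grows with the distance to the farthest node rather than with the total node count.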

36 Application-level concurrency
Event-driven real-time software
– spike packet arrived
    initiate DMA
– DMA of synaptic data completed
    process inputs
    insert axonal delay
– 1ms Timer interrupt
[Figure: event-handling diagram; labels: sleeping, event, goto_Sleep(), Packet Received Interrupt, fetch_Synaptic_Data(), DMA Completion Interrupt, update_Stimulus(), Timer Millisecond Interrupt, update_Neurons(), Priority 1, Priority 2, Priority 3]
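The event-driven structure on this slide can be approximated in software with a priority queue of pending events. This is a simulation sketch, not interrupt-level code: handlers here run to completion and are never pre-empted. The priority assignment (packet first, DMA second, timer third) is an assumption, since the transcript does not preserve which priority label belongs to which interrupt, and the class and method names are hypothetical.

```python
import heapq

# Assumed mapping of events to priorities (lower number is served first).
PRIORITY = {"packet_received": 1, "dma_complete": 2, "timer_tick": 3}

class EventDrivenCore:
    """Toy model of one processor that sleeps until events arrive."""

    def __init__(self):
        self._pending = []   # heap of (priority, sequence, kind, payload)
        self._seq = 0        # tie-breaker that keeps arrival order
        self.log = []

    def raise_event(self, kind, payload=None):
        heapq.heappush(self._pending, (PRIORITY[kind], self._seq, kind, payload))
        self._seq += 1

    def run_until_idle(self):
        while self._pending:
            _, _, kind, payload = heapq.heappop(self._pending)
            getattr(self, "_on_" + kind)(payload)
        self.log.append("goto_sleep")

    def _on_packet_received(self, neuron_id):
        # Spike packet arrived: initiate DMA of that neuron's synaptic data.
        self.log.append(f"fetch_synaptic_data({neuron_id})")
        self.raise_event("dma_complete", neuron_id)

    def _on_dma_complete(self, neuron_id):
        # Synaptic data available: process inputs (axonal delay omitted here).
        self.log.append(f"process_inputs({neuron_id})")

    def _on_timer_tick(self, _payload):
        # 1 ms timer: advance the neuron state.
        self.log.append("update_neurons")

core = EventDrivenCore()
core.raise_event("timer_tick")
core.raise_event("packet_received", 7)
core.run_until_idle()
```

Even though the timer event was raised first, the packet and its DMA completion are handled before it under the assumed priorities.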

37 Application-level concurrency
Cross-system delay << 1ms
– hardware routing
– ‘emergency’ routing
    failed links
    Congestion
– if all else fails
    drop packet

38 Biological concurrency
Firing rate population codes
– N neurons
– diverse tuning
– collective coding of a physical parameter
– accuracy
– robust to neuron failure
(Neural Engineering, Eliasmith & Anderson 2003)

39 Biological concurrency
Single spike/neuron codes
– choose N to fire from a population of M
– order of firing may or may not matter
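As a worked illustration of why the order of firing matters, the sketch below counts how many distinct patterns an N-of-M code can express, with and without order, and converts that to bits. The function name and the example sizes are hypothetical; `math.comb` and `math.perm` need Python 3.8 or later.

```python
from math import comb, log2, perm

def n_of_m_capacity_bits(n, m, ordered=False):
    """Bits carried by choosing n neurons to fire out of a population of m.

    ordered=False counts unordered subsets; ordered=True also counts the
    order in which the n neurons fire.
    """
    patterns = perm(m, n) if ordered else comb(m, n)
    return log2(patterns)

unordered_bits = n_of_m_capacity_bits(2, 4)               # 6 patterns
ordered_bits = n_of_m_capacity_bits(2, 4, ordered=True)   # 12 patterns
```

Taking firing order into account multiplies the pattern count by N!, which for N = 2 adds exactly one bit.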

40 Outline: Building Brains, Computer Architecture Perspective, Living with Failure, Design Principles, SpiNNaker, The SpiNNaker system, Concurrency, Conclusions

41 Software progress
ARM SoC Designer SystemC model
4 chip x 2 CPU top-level Verilog model
Running:
    boot code
    Izhikevich model
    PDP2 codes
Basic configuration flow
…it all works!

42 Where might this lead?
Robots
– iCub EU project
– open humanoid robot platform
– mechanics, but no brain!

43 Conclusions
Many-core processing is coming
– soon we will have far more processors than we can program
When (if?) we crack parallelism…
– more small processors are better than fewer large processors
– synchronization, coherent global memory, determinism, are all impediments
Biology suggests a way forward!
– but we need new theories of biological concurrency

44 UoM SpiNNaker team

