
Embedded Computer Architecture (5KK73), MPSoC Platforms Part 2: Cell. Bart Mesman and Henk Corporaal.

1 Embedded Computer Architecture (5KK73), MPSoC Platforms Part 2: Cell. Bart Mesman and Henk Corporaal

2 The Complexity Crisis: "I have always wished that my computer would be as easy to use as my telephone. My wish has come true. I no longer know how to use my telephone." --Bjarne Stroustrup (7/16/2015)

3 The Software Crisis

4 The first SW crisis. Time frame: the ’60s and ’70s. Problem: assembly-language programming. Computers could handle larger, more complex programs, so programmers needed abstraction and portability without losing performance. Solution: high-level languages for von Neumann machines, such as FORTRAN and C.

5 The second SW crisis. Time frame: the ’80s and ’90s. Problem: the inability to build and maintain complex, robust applications with millions of lines of code developed by hundreds of programmers. Computers could handle larger, more complex programs, so programmers needed composability and maintainability. High performance was not an issue: it was left to Moore’s Law.

6 Solution: object-oriented programming (C++, C# and Java). Also: better tools (component libraries, Purify) and a better software-engineering methodology (design patterns, specification, testing, code reviews).

7 Today: programmers are oblivious to processors. There is a solid boundary between hardware and software, and programmers don’t have to know anything about the processor. High-level languages abstract the processor away; for example, Java bytecode is machine independent. Moore’s Law delivers good speedups without requiring programmers to know anything about the processor. Programs are oblivious of the processor, so they work on all processors: a C program written in the ’70s still works today, and runs much faster. This abstraction gives programmers a lot of freedom.

8 The third crisis: Powered by PlayStation

9 Contents: hammer your head against 4 walls (or: why multi-processor?); Cell architecture; programming and porting, plus a case study.

10 Moore’s Law

11 Single-Processor SPECint Performance

12 What’s stopping them? General-purpose uni-cores no longer follow historic performance scaling because of power consumption, wire delays, DRAM access latency, and the diminishing returns of extracting more instruction-level parallelism.

13 Power density

14 Power Efficiency (Watts/Spec)

15 One-clock-cycle wire range

16 Global wiring delay becomes dominant over gate delay

17 Memory

18 Now what? The latest research has been drained and we have tried every trick in the book. So we’re fresh out of ideas: multi-processor is all that’s left!

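19 Low power through parallelism. Sequential processor: switching capacitance $C$, frequency $f$, voltage $V$, so $P = \alpha f C V^2$, where $\alpha$ is the switching activity factor. Parallel processor (twice the number of units): switching capacitance $2C$, frequency $f/2$, voltage $V' < V$, so $P = \alpha \cdot \frac{f}{2} \cdot 2C \cdot V'^2 = \alpha f C V'^2$. Because the halved frequency allows $V' < V$, the parallel processor delivers the same throughput at lower power.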

20 Architecture methods: Powerful Instructions (1). MD technique: multiple data operands per operation. SIMD: Single Instruction, Multiple Data. Vector instruction: the loop for (i = 0; i < 64; i++) c[i] = a[i] + 5*b[i]; becomes the single vector statement c = a + 5*b. Assembly:
    set   vl,64        ; vector length = 64
    ldv   v1,0(r2)     ; v1 = b
    mulvi v2,v1,5      ; v2 = 5*b
    ldv   v1,0(r1)     ; v1 = a
    addv  v3,v1,v2     ; v3 = a + 5*b
    stv   v3,0(r3)     ; c = v3

21 Architecture methods: Powerful Instructions (1), continued. Sub-word parallelism: SIMD on a restricted scale, used for multimedia instructions. Motivation: use a powerful 64-bit ALU as four 16-bit ALUs. Examples: MMX, Sun VIS, HP MAX-2, AMD K7/Athlon 3DNow!, TriMedia II. Example: $\sum_{i=1}^{4} |a_i - b_i|$.

22 MPSoC issues: homogeneous vs. heterogeneous; shared memory vs. local memory; topology; communication (bus vs. network); granularity (many small vs. few large processors); mapping: automatic vs. manual parallelization, TLP vs. DLP, parallel vs. pipelined.

23 Multi-core

24 Cell

25 What can it do?

26 Cell/B.E.: the history. Sony/Toshiba/IBM consortium, Austin, TX, March 2001; initial investment: 400 million US dollars. Official name: STI Cell Broadband Engine, also known as Cell BE, STI Cell, or simply Cell. In production for the Sony PlayStation 3 and for Mercury’s blades.

27 Cell blade

28 Cell/B.E.: the architecture. 1 × PPE: a 64-bit PowerPC with 32 KB L1 instruction cache + 32 KB L1 data cache and a 512 KB L2. 8 × SPE cores, each with a 256 KB local store and 128 × 128-bit vector registers. Hybrid memory model: the PPE reads and writes main memory directly, while the SPEs access it through asynchronous DMA. EIB: 205 GB/s sustained aggregate bandwidth. Processor-to-memory bandwidth: 25.6 GB/s. Processor-to-processor: 20 GB/s in each direction.

29 Cell chip

30 SPE

31 SPE

32 SPE pipeline

33 Communication

34 8 parallel transactions

35 C++ on Cell: (1) send the code of the function to be run to the SPE; (2) send the address from which to fetch the data; (3) DMA the data from main memory into the local store (LS); (4) run the code on the SPE; (5) DMA the data out of the LS back to main memory; (6) signal the PPE that the SPE has finished the function.

36 Conclusions: multi-processors are inevitable. They offer a huge performance increase, but… they are hell to program: you have to be an architecture expert. And what about portability?
