Presentation transcript: "A few issues on the design of future multicores", André Seznec, IRISA/INRIA

Slide 1: A few issues on the design of future multicores (André Seznec, IRISA/INRIA)

Slide 2: Single-chip uniprocessor: the end of the road

- (Very) wide-issue superscalar processors are not cost-effective:
  - More than quadratic complexity in many key components: the register file, the bypass network, the issue logic (see the sketch below)
  - Limited performance return
- The failure of the Alpha EV8 marked the end of very wide-issue superscalar processors
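To make the "more than quadratic" point concrete, here is a back-of-envelope scaling model (a sketch based on standard superscalar analysis, not from the slides), with W the issue width and N the instruction-window size:

```latex
% First-order growth of the structures named above:
%   bypass network: every producing unit must forward to every
%   consuming input, so paths grow as W^2
%   issue (wakeup) logic: W result tags are compared against
%   every window entry each cycle, so comparators grow as W*N
%   register file: port count grows with W, and cell area grows
%   roughly quadratically in the number of ports
\[
\text{bypass} \propto W^{2}, \qquad
\text{wakeup} \propto W \cdot N, \qquad
\text{regfile area} \propto \text{ports}^{2} \propto W^{2}
\]
```

Doubling W thus roughly quadruples these structures while, as the slide notes, returning well under twice the performance.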

Slide 3: Hardware thread parallelism

- High-end single-chip components:
  - Chip multiprocessors: IBM POWER5, dual-core Intel Pentium 4, dual-core Athlon 64
  - Many CMP SoCs for embedded markets
  - Cell
- (Simultaneous) multithreading: Pentium 4, POWER5

Slide 4: Thread parallelism

- Expressed by the application developer (see the sketch below):
  - Depends on the application itself
  - Depends on the programming language or paradigm
  - Depends on the programmer
- Discovered by the compiler:
  - Automatic (static) parallelization
- Exploited by the runtime:
  - Task scheduling
- Dynamically discovered/exploited by hardware or software:
  - Speculative hardware/software threading
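As an illustration of the first case, parallelism expressed directly by the developer, here is a minimal POSIX-threads sketch (my example; the array-sum workload and the four-way split are arbitrary choices, not from the talk):

```c
/* Developer-expressed thread parallelism: split a reduction
 * across NTHREADS worker threads by hand. */
#include <pthread.h>
#include <stdio.h>

#define N        1000000
#define NTHREADS 4

static double a[N];
static double partial[NTHREADS];   /* one slot per thread: no sharing */

static void *sum_chunk(void *arg) {
    long t  = (long)arg;                       /* thread index */
    long lo = t * (N / NTHREADS);
    long hi = (t == NTHREADS - 1) ? N : lo + N / NTHREADS;
    double s = 0.0;
    for (long i = lo; i < hi; i++)
        s += a[i];
    partial[t] = s;
    return NULL;
}

int main(void) {
    pthread_t th[NTHREADS];
    for (long i = 0; i < N; i++) a[i] = 1.0;
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&th[t], NULL, sum_chunk, (void *)t);
    double total = 0.0;
    for (long t = 0; t < NTHREADS; t++) {
        pthread_join(th[t], NULL);
        total += partial[t];                   /* combine partial sums */
    }
    printf("sum = %f\n", total);               /* expect 1000000.0 */
    return 0;
}
```

The granularity, the decomposition, and the absence of data races are all the programmer's responsibility here, which is exactly why the slide lists the application, the paradigm, and the programmer as the deciding factors.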

Slide 5: Direction of (single-chip) architecture: betting on parallelism's success

- If (future) applications are intrinsically parallel: as many simple cores as possible
  - SSC: Sea of Simple Cores
- If (future) applications are only moderately parallel: a few complex, state-of-the-art superscalar cores
  - FCC: Few Complex Cores

Slide 6: SSC: Sea of Simple Cores

Slide 7: FCC: Few Complex Cores

[Figure: several 4-way out-of-order superscalar cores around a shared L3 cache]

Slide 8: Common architectural design issues

Slide 9: Instruction Set Architecture

- A single ISA?
  - An extension of "conventional" multiprocessors; shared or distributed memory?
- Heterogeneous ISAs:
  - À la Cell: (master processor + slave processors) × N
  - À la SoC: specialized coprocessors
- A radically new architecture? Which one?

Slide 10: Hardware accelerators?

- SIMD extensions:
  - Seem to be accepted; they shift the burden to application developers and compilers (see the sketch below)
- Reconfigurable datapaths:
  - Popular when you have a well-defined, intrinsically parallel application
- Vector extensions:
  - Might be the right move when targeting mostly scientific computing
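A minimal sketch of what "shifting the burden to the developer" means for SIMD extensions, using x86 SSE intrinsics (illustrative; the function and its tail handling are my own, not from the talk):

```c
#include <xmmintrin.h>   /* SSE: 128-bit vectors of 4 floats */

/* Hand-vectorized elementwise add: the developer (or the
 * compiler's auto-vectorizer) must choose the vector width,
 * handle alignment, and clean up the leftover elements. */
void vec_add(float *dst, const float *x, const float *y, int n) {
    int i = 0;
    for (; i + 4 <= n; i += 4) {             /* 4 floats per op */
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(vx, vy));
    }
    for (; i < n; i++)                       /* scalar tail */
        dst[i] = x[i] + y[i];
}
```

None of this complexity appears in the hardware; it is all pushed into the software layers, which is the tradeoff the slide is pointing at.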

Slide 11: On-chip memory vs. processors vs. memory bandwidth

- The uniprocessor credo was: "use the remaining silicon for caches"
- The new question: an extra processor, or more cache?
  - Extra processing power = increased memory-bandwidth demand, increased power consumption, more temperature hot spots
  - Extra cache = decreased (external) memory demand

Slide 12: Memory hierarchy organization?

Slide 13: Flat: sharing a big L2/L3 cache?

[Figure: many μP cores, each with a private cache, all sharing one large L3 cache]

Slide 14: Flat: communication issues? Through the big cache

[Figure: the same flat organization, with cores communicating through the shared L3 cache]

Slide 15: Flat: communication issues? Grid-like?

[Figure: the same flat organization, with cores communicating over a grid-like interconnect]

Slide 16: Hierarchical organization?

[Figure: groups of μP cores with private caches share L2 caches, and the L2 caches in turn share an L3 cache]

Slide 17: Hierarchical organization?

- Arbitration at all levels
- Coherency at all levels
- Interleaving at all levels
- Bandwidth dimensioning

Slide 18: NoC structure

- Very dependent on the memory hierarchy organization!
- Plus: shared coprocessors/hardware accelerators
- Plus: I/O buses/(processors?)
- Plus: the memory interface
- Plus: the network interface

Slide 19: Example

[Figure: a hierarchical organization with μP cores, L2 caches, an L3 cache, a memory interface, and I/O]

Slide 20: Multithreading?

- An extra level of thread parallelism!
- Might be an interesting alternative to prefetching for massively parallel applications

Slide 21: Power and thermal issues

- Voltage/frequency scaling to adapt to the workload? (see the model below)
- Adapting the workload to the available power?
- Adapting/dimensioning the architecture to the power budget
- Activity migration to manage temperature?
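The first-order CMOS power model behind the voltage/frequency-scaling option (a standard textbook result, not from the slides):

```latex
% Dynamic power of CMOS logic:
%   alpha = activity factor, C = switched capacitance,
%   V = supply voltage, f = clock frequency
\[
P_{\mathrm{dyn}} = \alpha \, C \, V^{2} f
\]
% Achievable f falls roughly linearly with V, so scaling
% voltage and frequency together yields a near-cubic power
% reduction. That is what makes DVFS an attractive knob for
% matching a multicore to its workload and power budget.
```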

Slide 22: General issues for software and compilers

- Parallelism detection and partitioning: finding the correct granularity
- Mastering memory bandwidth
- Non-uniform memory latency
- Optimizing sequential code sections

Slide 23: SSC design specificities

Slide 24: Basic core granularity

- RISC cores
- VLIW cores
- In-order superscalar cores

Slide 25: Homogeneous vs. heterogeneous ISAs

- Core specialization:
  - RISC + VLIW or DSP slaves?
  - A master core + a set of special-purpose cores?

Slide 26: Sharing issues

- Simple cores:
  - Lots of duplication, and lots of resources sitting unused at any given time
- Adjacent cores can share:
  - Caches
  - Functional units: FP, mult/div, multimedia
  - Hardware accelerators

Slide 27: An example of sharing

[Figure: two clusters of four μP cores; each cluster shares an FP unit, an instruction-fetch front end, an IL1 cache, and a DL1 cache; both clusters share a hardware accelerator and the L2 cache]

Slide 28: Multithreading/prefetching

- Multithreading:
  - Is the extra complexity worthwhile for simple cores?
- Prefetching:
  - Is it worthwhile?
  - Should prefetch engines be shared?

Slide 29: Vision of an SSC (my own vision)

Slide 30: SSC: the basic brick

[Figure: four clusters of four μP cores; each cluster has a shared FP unit, D$, and I$; all four clusters share an L2 cache]

Slide 31: [Figure: a full SSC chip built from four basic bricks, each with sixteen μP cores around a shared L2 cache, all sharing an L3 cache plus memory, network, and system interfaces]

Slide 32: FCC design specificities

Slide 33: Only limited thread parallelism available?

- Focus on uniprocessor architecture:
  - Find the right tradeoff between complexity and performance
  - Power and temperature issues
- Vector extensions?
  - Contiguous vectors (à la SSE)?
  - Strided vectors in the L2 cache (Tarantula-like)? (see the sketch below)
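The contiguous-versus-strided distinction, as an access-pattern sketch in C (my illustration; Tarantula was the proposed Alpha vector extension whose vector unit accessed the L2 cache directly):

```c
#include <stddef.h>

/* Unit-stride access: the case SSE-style contiguous vector
 * extensions handle well, since consecutive elements fill
 * whole vector registers and whole cache lines. */
double sum_row(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];                       /* consecutive elements */
    return s;
}

/* Strided access: walking one column of a row-major n x n
 * matrix touches one element per cache line; a vector unit
 * with native stride support fed from the L2 avoids hauling
 * a whole line into L1 for each useful element. */
double sum_col(const double *a, size_t n, size_t col) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i * n + col];             /* stride of n elements */
    return s;
}
```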

Slide 34: Performance enablers

- SMT for parallel workloads?
- Helper threads?
- Run-ahead threads
- Hardware support for speculative multithreading

Slide 35: An intermediate design?

- SSCs:
  - Shine on massively parallel applications
  - Poor/limited performance on sequential sections
- FCCs:
  - Moderate performance on parallel applications
  - Good performance on sequential sections

Slide 36: Amdahl's law: mix FCC and SSC
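The argument in one formula (Amdahl's law is standard; reading it as the case for an FCC/SSC mix follows the two previous slides): with a fraction p of the work parallelizable across n cores,

```latex
% Amdahl's law: overall speedup when a fraction p of the
% execution is parallelizable and (1-p) remains sequential.
\[
S(n) = \frac{1}{(1-p) + p/n}
\]
% Even with p = 0.95, the limit as n grows is only 20x: the
% sequential 5% dominates. Hence the hybrid brick on the next
% slide: run the (1-p) part on one fast out-of-order core
% (the FCC side) and the p part on many simple cores (the
% SSC side).
```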

Slide 37: The basic brick

[Figure: two clusters of four simple μP cores (shared FP, D$, I$) plus one "ultimate" out-of-order superscalar core, all sharing an L2 cache]

Slide 38: [Figure: a full hybrid chip: four of the bricks from slide 37, each with its own L2 cache, sharing an L3 cache plus memory, network, and system interfaces]

Slide 39: Conclusion

- The era of the uniprocessor has come to an end
- There is no clear single path forward
- It might be time for more architectural diversity

