Presentation is loading. Please wait.

Presentation is loading. Please wait.

Shared memory and message passing revisited in the many-core era 1 iCSC2016, Aram Santogidis, CERN Shared memory and Message passing revisited in the many-core.

Similar presentations


Presentation on theme: "Shared memory and message passing revisited in the many-core era 1 iCSC2016, Aram Santogidis, CERN Shared memory and Message passing revisited in the many-core."— Presentation transcript:

1 Shared memory and message passing revisited in the many-core era 1 iCSC2016, Aram Santogidis, CERN Shared memory and Message passing revisited in the many-core era Aram Santogidis CERN Inverted CERN School of Computing, 29 February – 2 March 2016

2 Shared memory and message passing revisited in the many-core era 2 iCSC2016, Aram Santogidis, CERN The pioneers of concurrent programming Edsger Dijkstra Mutual exclusion Cooperating Sequential Processes Semaphores Per Brinch Hansen Concurrent Pascal Shared Classes The Solo OS Distributed Processes C.A.R Hoare Communicating Sequential Processes (CSP) Monitors

3 Shared memory and message passing revisited in the many-core era 3 iCSC2016, Aram Santogidis, CERN Communication is important Time Process 1 Process 2 Process 3 Communication/ Synchronization Communication/ Synchronization Shared memory Message passing VS

4 Shared memory and message passing revisited in the many-core era 4 iCSC2016, Aram Santogidis, CERN Agenda of the talk  Concurrency and communication  Two basic examples of the two models  Conventional wisdom for the two models  Cache coherence and manycore processors  Emerging paradigm shift in OS architectures  The future perspective

5 Shared memory and message passing revisited in the many-core era 5 iCSC2016, Aram Santogidis, CERN The Shared Memory model Shared data structures Thread Shared Memory  Threads communicate implicitly with each other via shared data structures  Synchronization primitives (locks, semaphores, etc.)

6 Shared memory and message passing revisited in the many-core era 6 iCSC2016, Aram Santogidis, CERN The message passing model Thread Message  Threads communicate explicitly with each other by exchanging messages  Is the more fundamental class from the two  Synchronous or asynchronous communication

7 Shared memory and message passing revisited in the many-core era 7 iCSC2016, Aram Santogidis, CERN Lets see an example for each model 1.Image processing (shared memory) 2.Simple GUI (message passing)

8 Shared memory and message passing revisited in the many-core era 8 iCSC2016, Aram Santogidis, CERN A shared memory-based example: Convert from colour to grayscale R: 204 G: 46 B: 10 (R+G+B) / 3 = 130

9 Shared memory and message passing revisited in the many-core era 9 iCSC2016, Aram Santogidis, CERN A shared memory-based example: Convert from colour to grayscale We parallelize the computation by assigning tiles (pieces) of the image to threads which execute the conversion in parallel. SPEED

10 Shared memory and message passing revisited in the many-core era 10 iCSC2016, Aram Santogidis, CERN A message passing-based example Many operating system designs can be placed into one of two very rough categories, depending upon how they…  A GUI with 3 widgets  Text Area  Up scroll button  Down scroll button  Must be interactive (Immediate feedback)

11 Shared memory and message passing revisited in the many-core era 11 iCSC2016, Aram Santogidis, CERN GUI example implementation: Message passing solution Thread TextArea Thread UpButton Thread DownButton Many operating system designs can be placed into one of two very rough categories, depending upon how they… STructure

12 Shared memory and message passing revisited in the many-core era 12 iCSC2016, Aram Santogidis, CERN Conventional wisdom about the characteristics of the two models 1.Performance 2.Programmability

13 Shared memory and message passing revisited in the many-core era 13 iCSC2016, Aram Santogidis, CERN Performance comparison Shared MemoryMessage Passing Hardware supportExtensive (All popular architectures) Limited (Only special purpose architectures) Data transfer Overhead Low (Cache block management in HW) High (Data replication) Access/Sync overhead Sometimes high (Critical section contention, NUMA effects) Low (Local private memory access)

14 Shared memory and message passing revisited in the many-core era 14 iCSC2016, Aram Santogidis, CERN Programmability comparison Shared MemoryMessage Passing Communication ImplicitExplicit Synchronization Explicit (locks etc.) Implicit (side-effect) Interface (API)Read/write shared data structures, mutex primitives Send/Receive messages, Multicast HazardsRace conditions, Deadlocks, Starvation Deadlocks, Starvation

15 Shared memory and message passing revisited in the many-core era 15 iCSC2016, Aram Santogidis, CERN Towards the manycore architectures http://www.wired.com/images_blogs/gadgetlab/2009/10/tilera-wafer-1.jpg

16 Shared memory and message passing revisited in the many-core era 16 iCSC2016, Aram Santogidis, CERN The manycore era http://image.slideserve.com/277797/manycore-systems-design-space-n.jpg The graph is from (presentation): “Joshi, Ajay, et al. "Building manycore processor-to-DRAM networks using monolithic silicon photonics." High Performance Embedded Computing (HPEC) Workshop. 2008.”  Power limits the frequency increase of the processor.  Moore’s law: The transistors keep doubling every two years  Replication: Increasing number of cores

17 Shared memory and message passing revisited in the many-core era 17 iCSC2016, Aram Santogidis, CERN On the duality of operating systems structures  Operating Systems are generally classified as:  Message passing oriented  Procedure-oriented (shared memory)  Each system from one category has the other category.  Neither model is inherently better than the other (depends on the machine architecture). From: Lauer, Hugh C., and Roger M. Needham. "On the duality of operating system structures." ACM SIGOPS Operating Systems Review 13.2 (1979): 3-19.

18 Shared memory and message passing revisited in the many-core era 18 iCSC2016, Aram Santogidis, CERN Non Uniform Memory Access (NUMA) RAM Domain 0 core P0 Last Level Cache CPU 1 P1 CPU 2 P2 CPU 3 P3 Local Cache CPU 4 P4 CPU 5 P5 RAM Domain 1 Last Level Cache Socket 0 Socket 1 QPI Local Cache

19 Shared memory and message passing revisited in the many-core era 19 iCSC2016, Aram Santogidis, CERN Cache coherence Core 1 Core 0 0 i: j=i; i++; LLC L1 i: 0 Shared Cache Controller BUS Cache Controller L1 i: 0

20 Shared memory and message passing revisited in the many-core era 20 iCSC2016, Aram Santogidis, CERN Cache coherence Core 1 Core 0 0 i: j=i; i++; LLC L1 i: 1 Modified Cache Controller BUS Cache Controller L1 i: 0 Invalid Bus/ Invalidate Message

21 Shared memory and message passing revisited in the many-core era 21 iCSC2016, Aram Santogidis, CERN Cache coherence Core 1 Core 0 0 i: j=i; i++; LLC L1 i: 1 Shared Cache Controller BUS Cache Controller L1 i: 1 Bus Read Write back

22 Shared memory and message passing revisited in the many-core era 22 iCSC2016, Aram Santogidis, CERN A key question When updating shared state, which uproach is more expensive (in terms of latency), Shared memory or Message passing?

23 Shared memory and message passing revisited in the many-core era 23 iCSC2016, Aram Santogidis, CERN An experiment of shared memory vs message passing performance Thread Core Cache Shared state BUS Hyper Transport CPU in Socket Updating shared state of size [1,8] cachelines, relying on cache coherent shared memory on 4x4 AMD system

24 Shared memory and message passing revisited in the many-core era 24 iCSC2016, Aram Santogidis, CERN An experiment of shared memory vs message passing performance BUS Hyper Transport CPU in Socket Updating shared state of size [1,8] cachelines, relying on synchronous Lightweight Remote Procedure Calls (message passing) S Server, updating the shared state on behalf of the threads

25 Shared memory and message passing revisited in the many-core era 25 iCSC2016, Aram Santogidis, CERN Messages scale better than shared memory Message passing scales better than shared memory when increasing the core count and the size of the shared state. The plot is adapted from: Baumann, Andrew, et al. "The multikernel: a new OS architecture for scalable multicore systems." Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles. ACM, 2009.

26 Shared memory and message passing revisited in the many-core era 26 iCSC2016, Aram Santogidis, CERN …some other hints that may lead to further fragmentation of coherency domains http://www.racktopsystems.com/wp-content/uploads/2013/01/sql-server-fragmentation.jpg

27 Shared memory and message passing revisited in the many-core era 27 iCSC2016, Aram Santogidis, CERN Increasing Heterogeneity of computing platforms Heterogeneity Time Multi socket Manycores, GPU Coprocessor FPGA Dual cores, pthreads Single cores, Concurrent OS, Coroutines  Message passing: Fundamental for communication in heterogeneous environment  Shared memory: Hard to implement in a heterogeneous environment

28 Shared memory and message passing revisited in the many-core era 28 iCSC2016, Aram Santogidis, CERN Message passing OS vs Shared memory OS Barrelfish OS (Message passing) Single arch(x86, etc.) GPU Linux Kernel Driver App Linux OS (Shared memory) ARM x86 … OSnode State replica GPU OSnode State replica OSnode State replica Async messages App Architecture dependant code Adapted from : Baumann, Andrew, et al. "The multikernel: a new OS architecture for scalable multicore systems." Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles. ACM, 2009

29 Shared memory and message passing revisited in the many-core era 29 iCSC2016, Aram Santogidis, CERN What to expect ? http://tech.co/wp-content/uploads/2014/12/future-marketing.jpg

30 Shared memory and message passing revisited in the many-core era 30 iCSC2016, Aram Santogidis, CERN Emerging concurrency paradigms  Asynchronous tasks (Futures/Promises)  Partitioned Global Address Space (PGAS) languages/libraries  Actor Model  Functional Concurrency New high level paradigms are being developed, based on shared memory and/or message passing constructs.

31 Shared memory and message passing revisited in the many-core era 31 iCSC2016, Aram Santogidis, CERN The future perspective  Communication is the key  For energy efficiency  For runtime performance  To manage software complexity  To manage hardware heterogeneity  Innovation in the hardware sector pressures to systems software engineers to develop appropriate support  At the operating system level  Concurrent programming frameworks level  Communication-oriented tools and techniques to design, implement, analyse concurrent programs

32 Shared memory and message passing revisited in the many-core era 32 iCSC2016, Aram Santogidis, CERN http://globe-views.com/dcim/dreams/surprise/surprise-05.jpg

33 Shared memory and message passing revisited in the many-core era 33 iCSC2016, Aram Santogidis, CERN References  Baumann, Andrew, et al. "The multikernel: a new OS architecture for scalable multicore systems." Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles. ACM, 2009.  Gerber, et al. "Not Your Parents' Physical Address Space", Proceedings ot the 15th Workshop on Hot Topics in Operating Systems (HotOS 15)  A Primer on Memory Consistency and Cache Coherence Daniel J. Sorin, Mark D. Hill, and David A. Wood  Martin, Milo MK, Mark D. Hill, and Daniel J. Sorin. "Why on-chip cache coherence is here to stay." Communications of the ACM 55.7 (2012): 78-89.  Hansen, Per Brinch. "The invention of concurrent programming." The origin of concurrent programming. Springer New York, 2002. 3-61.  Butcher, Paul. Seven Concurrency Models in Seven Weeks: When Threads Unravel. Pragmatic Bookshelf, 2014.  Hoare, Charles Antony Richard. "Communicating sequential processes."Communications of the ACM 21.8 (1978): 666-677.

34 Shared memory and message passing revisited in the many-core era 34 iCSC2016, Aram Santogidis, CERN Thank you for your attention Many thanks to my supporters and mentors for this presentation: Sebastian Lopienski, Sebastian.Lopienski@cern.ch, CERN Andreas Joachim Peters, Andreas.Joachim.Peters@cern.ch, CERN Andreas Hirstius, andreas.hirstius@intel.com, Intel GmbH Spyros Lalis, lalis@inf.uth.gr, University of Thessaly This work is support by the Marie Curie Early European Industrial Doctorates Fellowship of the European Community’s Seventh Framework Programme under contract number (PITN-GA-2012-316596-ICE-DIP).


Download ppt "Shared memory and message passing revisited in the many-core era 1 iCSC2016, Aram Santogidis, CERN Shared memory and Message passing revisited in the many-core."

Similar presentations


Ads by Google