
1 Lecture 27 Multiprocessor Scheduling


3 Last lecture: VMM. Two old problems (CPU virtualization and memory virtualization) and I/O virtualization. Today: issues related to multi-core, namely scheduling and scalability.

4 The cache coherence problem. Since we have multiple private caches: how do we keep the data consistent across caches? Each core should perceive the memory as a monolithic array, shared by all the cores.

5 The cache coherence problem. [Figure: a multi-core chip with four cores (Core 1 to Core 4), each with one or more levels of private cache, attached to main memory. x = 15213 is held in two cores' caches and in main memory.]

6 The cache coherence problem. [Figure: assuming write-back caches, one core has written x = 21660 into its own cache; the other core's cache and main memory still hold the old value x = 15213.]

7 The cache coherence problem. [Figure: the starting state again: x = 15213 in two cores' caches and in main memory.]

8 The cache coherence problem. [Figure: assuming write-through caches, one core has written x = 21660, so main memory now holds 21660, but the other core's cache still holds the stale value 15213.]

9 Solutions for cache coherence. There exist many solution algorithms, coherence protocols, etc. A simple solution: an invalidation protocol with bus snooping.

10 Inter-core bus. [Figure: a multi-core chip in which the cores and their private caches are connected to each other and to main memory by an inter-core bus.]

11 Invalidation protocol with snooping. Invalidation: if a core writes to a data item, all copies of that item in other cores' caches are invalidated. Snooping: all cores continuously “snoop” (monitor) the bus connecting the cores (see the sketch below).
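
The slides present this protocol only through the diagrams, so here is a minimal C sketch of the two snooping actions, assuming a simple valid/invalid scheme over write-through caches. Every name here (cache_line, on_local_write, the bus and memory helpers) is hypothetical; this is a sketch, not a real coherence implementation.

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        uint64_t tag;    /* which address this line currently holds */
        bool     valid;  /* cleared when another core invalidates the line */
        uint64_t data;
    } cache_line;

    /* Stubs standing in for the memory system and the inter-core bus. */
    static void write_to_memory(uint64_t addr, uint64_t value) { (void)addr; (void)value; }
    static void bus_broadcast_invalidate(uint64_t addr) { (void)addr; }

    /* A core writes a data item: write through to main memory and tell all
       other cores to invalidate their copies. */
    void on_local_write(cache_line *line, uint64_t addr, uint64_t value) {
        line->tag   = addr;
        line->data  = value;
        line->valid = true;
        write_to_memory(addr, value);
        bus_broadcast_invalidate(addr);
    }

    /* Every core snoops the bus: on seeing an invalidation for an address it
       caches, it marks its copy invalid so the next read must refetch. */
    void on_snooped_invalidate(cache_line *line, uint64_t addr) {
        if (line->valid && line->tag == addr)
            line->valid = false;
    }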

12 The cache coherence problem. [Figure: the starting state for the invalidation example: x = 15213 in two cores' caches and in main memory.]

13 The cache coherence problem. [Figure: assuming write-through caches, the writing core updates x to 21660 in its cache and in main memory and sends an invalidation request on the bus; the other core's cached copy of x is marked INVALIDATED.]

14 The cache coherence problem. [Figure: assuming write-through caches, both cores' caches and main memory now hold x = 21660.]

15 Alternative to the invalidation protocol: the update protocol. [Figure: assuming write-through caches, the writing core updates x to 21660 in its cache and in main memory and broadcasts the updated value on the bus; the other core's cache still holds 15213 at this point.]

16 Alternative to the invalidation protocol: the update protocol. [Figure: after the broadcast, both cores' caches and main memory hold x = 21660.]

17 Invalidation vs. update. For multiple writes to the same location, invalidation sends a message only on the first write, while update must broadcast every write (including the new value). Invalidation generally performs better because it generates less bus traffic.

18 Programmers still need to worry about concurrency: mutexes, condition variables, and lock-free data structures (see the sketch below).
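
As a quick reminder of the first two primitives, here is a small pthread sketch of a one-slot mailbox guarded by a mutex and a condition variable. The mailbox itself (put/get) is an illustrative example, not something from the lecture.

    #include <pthread.h>
    #include <stdbool.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
    static bool ready = false;
    static int  value;

    void put(int v) {                    /* producer */
        pthread_mutex_lock(&lock);
        value = v;
        ready = true;
        pthread_cond_signal(&cond);      /* wake one waiting consumer */
        pthread_mutex_unlock(&lock);
    }

    int get(void) {                      /* consumer */
        pthread_mutex_lock(&lock);
        while (!ready)                   /* loop guards against spurious wakeups */
            pthread_cond_wait(&cond, &lock);
        ready = false;
        int v = value;
        pthread_mutex_unlock(&lock);
        return v;
    }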

19 Single-Queue Multiprocessor Scheduling (SQMS): reuse the basic framework for single-processor scheduling. Put all jobs that need to be scheduled into a single queue and, if there are two CPUs, pick the best two jobs to run. Advantage: simple. Disadvantage: does not scale (see the sketch below).
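
A minimal sketch of the single-queue idea, under assumed names (struct job, pick_next_job): one global run queue protected by one lock, from which every CPU picks the highest-priority job. The single lock and the scan of a shared list are exactly why this approach is simple but does not scale.

    #include <pthread.h>
    #include <stddef.h>

    struct job {
        struct job *next;
        int         prio;                  /* "best" job = highest priority */
    };

    static struct job     *global_queue;   /* head of the shared run queue */
    static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

    /* Each CPU calls this when it needs work: take the global lock, unlink
       the highest-priority job, and return it to be run. */
    struct job *pick_next_job(void) {
        pthread_mutex_lock(&queue_lock);
        struct job **best = NULL;
        for (struct job **p = &global_queue; *p != NULL; p = &(*p)->next)
            if (best == NULL || (*p)->prio > (*best)->prio)
                best = p;
        struct job *chosen = NULL;
        if (best != NULL) {
            chosen = *best;
            *best  = chosen->next;         /* unlink from the queue */
        }
        pthread_mutex_unlock(&queue_lock);
        return chosen;
    }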

20 SQMS and Cache Affinity

21 Cache affinity: thread migration is costly, because the execution pipeline must be restarted and cached data is invalidated. The OS scheduler tries to avoid migration as much as possible: it tends to keep a thread on the same core (see the pinning example below).
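
On Linux this preference can also be expressed explicitly: sched_setaffinity() pins a thread to a set of cores. The helper below (pin_to_core is our own name) keeps the calling thread on one core so its cached state stays warm.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    /* Pin the calling thread to the given core; returns 0 on success. */
    int pin_to_core(int core) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        if (sched_setaffinity(0, sizeof(set), &set) != 0) {  /* pid 0 = this thread */
            perror("sched_setaffinity");
            return -1;
        }
        return 0;
    }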

22 SQMS and Cache Affinity

23 Multi-Queue Multiprocessor Scheduling: scalable, and preserves cache affinity.

24 Load imbalance and migration: with multiple queues, one queue can end up with more work than another, so jobs are occasionally migrated between queues to rebalance the load.

25 Work stealing: a (source) queue that is low on jobs will occasionally peek at another (target) queue. If the target queue is notably fuller than the source queue, the source will “steal” one or more jobs from the target to help balance load. The source cannot look around at other queues too often, or the overhead of checking defeats the purpose (see the sketch below).
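
A minimal sketch of per-CPU run queues with work stealing, under assumed names and an arbitrary STEAL_THRESHOLD: a CPU whose own queue is low peeks at one randomly chosen target queue and steals a single job only if that queue is notably fuller. The locks must be initialized with pthread_mutex_init() at startup; the source queue's length is read without its lock, which is acceptable for an occasional, approximate check.

    #include <pthread.h>
    #include <stdlib.h>

    #define NCPU            4
    #define STEAL_THRESHOLD 2     /* steal only if the target is this much fuller */

    struct job { struct job *next; };

    struct run_queue {
        pthread_mutex_t lock;     /* initialize with pthread_mutex_init() */
        struct job     *head;
        int             length;
    } queues[NCPU];

    static struct job *pop(struct run_queue *q) {   /* caller holds q->lock */
        struct job *j = q->head;
        if (j != NULL) { q->head = j->next; q->length--; }
        return j;
    }

    /* Called occasionally by a CPU whose own queue is (nearly) empty. */
    struct job *steal_job(int self) {
        int target = rand() % NCPU;                 /* peek at one other queue */
        if (target == self) return NULL;
        struct job *stolen = NULL;
        pthread_mutex_lock(&queues[target].lock);
        if (queues[target].length >= queues[self].length + STEAL_THRESHOLD)
            stolen = pop(&queues[target]);
        pthread_mutex_unlock(&queues[target].lock);
        return stolen;
    }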

26 Linux multiprocessor schedulers: both approaches can be successful. The O(1) scheduler and the Completely Fair Scheduler (CFS) use multiple queues, while the BF Scheduler (BFS) uses a single queue.

27 "An Analysis of Linux Scalability to Many Cores": this paper asks whether traditional kernel designs can be used and implemented in a way that allows applications to scale.

28 Amdahl's Law. N: the number of threads of execution. B: the fraction of the algorithm that is strictly serial. The theoretical speedup is S(N) = 1 / (B + (1 - B) / N).
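
For example, with B = 0.1 (10% of the work strictly serial) and N = 16 threads, the theoretical speedup is 1 / (0.1 + 0.9/16) = 1 / 0.15625 = 6.4, far below the ideal 16x; as N grows without bound, the speedup approaches 1/B = 10.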

29 Scalability issues. A global lock used for a shared data structure: longer lock wait times. Shared memory locations: overhead caused by the cache coherence algorithms. Tasks competing for a limited-size shared hardware cache: increased cache miss rates. Tasks competing for shared hardware resources (interconnects, DRAM interfaces): more time wasted waiting. Too few available tasks: less efficiency.

30 How to avoid or fix these issues: they can often be avoided (or limited) using popular parallel programming techniques such as lock-free algorithms, per-core data structures, fine-grained locking, cache alignment, and sloppy counters (see the sketch below).
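
As one concrete example of these techniques, here is a C sketch of a sloppy (per-core) counter: each core increments a local counter under its own lock and folds the local count into the global counter only when it reaches a threshold, trading a little read accuracy for much less contention. The names and the threshold value are illustrative, and the mutexes must be initialized with pthread_mutex_init() before use.

    #include <pthread.h>

    #define NCPU      4
    #define THRESHOLD 1024      /* how far the global count may lag, per core */

    struct sloppy_counter {
        pthread_mutex_t global_lock;
        long            global;
        pthread_mutex_t local_lock[NCPU];
        long            local[NCPU];
    };

    /* Called on the given cpu: count locally, and only occasionally touch the
       shared (global) counter and its lock. */
    void counter_add(struct sloppy_counter *c, int cpu, long amt) {
        pthread_mutex_lock(&c->local_lock[cpu]);
        c->local[cpu] += amt;
        if (c->local[cpu] >= THRESHOLD) {           /* fold into the global count */
            pthread_mutex_lock(&c->global_lock);
            c->global += c->local[cpu];
            pthread_mutex_unlock(&c->global_lock);
            c->local[cpu] = 0;
        }
        pthread_mutex_unlock(&c->local_lock[cpu]);
    }

    /* The returned value may lag by up to NCPU * THRESHOLD; that is the "sloppy" part. */
    long counter_read(struct sloppy_counter *c) {
        pthread_mutex_lock(&c->global_lock);
        long v = c->global;
        pthread_mutex_unlock(&c->global_lock);
        return v;
    }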


32 Current bottlenecks: https://www.usenix.org/conference/osdi10/analysis-linux-scalability-many-cores

