Symmetric Multiprocessors: Synchronization and Sequential Consistency.

Symmetric Multiprocessors
[Diagram: processors and memory on a shared CPU-Memory bus, with a bridge to an I/O controller and networks]
"Symmetric": all memory is equally far away from all processors, and any processor can do any I/O (e.g., set up a DMA transfer).

Synchronization
The need for synchronization arises whenever there are parallel processes in a system (even in a uniprocessor system):
- Forks and joins: in parallel programming, a process may want to wait until several events have occurred.
- Producer-consumer: a consumer process must wait until the producer process has produced data.
- Exclusive use of a resource: the operating system has to ensure that only one process uses a resource at a given time.

A Producer-Consumer Example
[Diagram: producer and consumer sharing a buffer indexed by head and tail pointers]

Producer posting item x:
        Rtail ← M[tail]
  (1)   M[Rtail] ← x
        Rtail ← Rtail + 1
  (2)   M[tail] ← Rtail

Consumer:
        Rhead ← M[head]
  spin:
  (3)   Rtail ← M[tail]
        if Rhead == Rtail goto spin
  (4)   R ← M[Rhead]
        Rhead ← Rhead + 1
        M[head] ← Rhead
        process(R)

The program is written assuming instructions are executed in order. Possible problems?
- Suppose the tail pointer gets updated before the item x is stored?
- Suppose R is loaded before x has been stored?

A Producer-Consumer Example

Producer posting item x:
        Rtail ← M[tail]
  (1)   M[Rtail] ← x
        Rtail ← Rtail + 1
  (2)   M[tail] ← Rtail

Consumer:
        Rhead ← M[head]
  spin:
  (3)   Rtail ← M[tail]
        if Rhead == Rtail goto spin
  (4)   R ← M[Rhead]
        Rhead ← Rhead + 1
        M[head] ← Rhead
        process(R)

The programmer assumes that if (3) happens after (2), then (4) happens after (1). The problems are executions in the order:
- 2, 3, 4, 1
- 4, 1, 2, 3
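The producer-consumer code above can be sketched as a single-producer/single-consumer queue in C11. This is an illustrative sketch, not the lecture's code: the names (spsc_push, spsc_pop, run_spsc) are made up here, and default (sequentially consistent) atomics stand in for the "instructions execute in order" assumption, so item x is always stored before the tail pointer is published.

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stddef.h>

#define QSIZE 256
#define SPSC_N 1000

static int buf[QSIZE];
static atomic_int q_head, q_tail;   /* monotonically increasing indices */

/* Producer: store the item (1) BEFORE publishing the new tail (2). */
static void spsc_push(int x) {
    int t = atomic_load(&q_tail);
    while (t - atomic_load(&q_head) == QSIZE) ;  /* wait for free space */
    buf[t % QSIZE] = x;                          /* (1) M[Rtail] <- x   */
    atomic_store(&q_tail, t + 1);                /* (2) M[tail] <- Rtail */
}

/* Consumer: spin until the tail moves (3), then load the item (4). */
static int spsc_pop(void) {
    int h = atomic_load(&q_head);
    while (atomic_load(&q_tail) == h) ;          /* (3) spin on tail    */
    int x = buf[h % QSIZE];                      /* (4) R <- M[Rhead]   */
    atomic_store(&q_head, h + 1);
    return x;
}

static void *spsc_producer(void *arg) {
    (void)arg;
    for (int i = 0; i < SPSC_N; i++) spsc_push(i);
    return NULL;
}

/* Driver: returns 1 iff all items arrive intact and in order. */
static int run_spsc(void) {
    pthread_t p;
    pthread_create(&p, NULL, spsc_producer, NULL);
    int ok = 1;
    for (int i = 0; i < SPSC_N; i++)
        if (spsc_pop() != i) ok = 0;
    pthread_join(p, NULL);
    return ok;
}
```

With weaker orderings (or plain loads/stores), the two problematic sequences on the slide become possible again; the sequel on memory fences shows the minimal orderings that restore correctness.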

Sequential Consistency: A Memory Model
"A system is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in the order specified by the program." — Leslie Lamport
Sequential consistency = arbitrary order-preserving interleaving of the memory references of sequential programs.

Sequential Consistency
Concurrent sequential tasks T1 and T2 share variables X and Y (initially X = 0, Y = 10):

  T1:                         T2:
  Store(X, 1)   (X = 1)       Load(R1, Y)
  Store(Y, 11)  (Y = 11)      Store(B, R1)  (B = Y)
                              Load(R2, X)
                              Store(A, R2)  (A = X)

What are the legitimate answers for A and B?
(A, B) ∈ { (1, 11), (0, 10), (1, 10), (0, 11) }?
(0, 11) is not legitimate: if B = 11 then T1's Store(Y, 11) has already executed, so Store(X, 1) has too, and A must be 1.
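This litmus test can be run directly with C11 sequentially consistent atomics, which guarantee a single total order over the four memory operations. A sketch (the function names are invented for this example); under SC the forbidden outcome (A, B) = (0, 11) can never be observed:

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stddef.h>

static atomic_int X, Y;
static int A, B;

static void *litmus_t1(void *arg) {
    (void)arg;
    atomic_store(&X, 1);       /* Store(X, 1)  */
    atomic_store(&Y, 11);      /* Store(Y, 11) */
    return NULL;
}

static void *litmus_t2(void *arg) {
    (void)arg;
    B = atomic_load(&Y);       /* Load(R1, Y); Store(B, R1) */
    A = atomic_load(&X);       /* Load(R2, X); Store(A, R2) */
    return NULL;
}

/* One run; returns 1 unless the SC-forbidden outcome (0, 11) appears. */
static int sc_litmus_once(void) {
    atomic_store(&X, 0);
    atomic_store(&Y, 10);
    pthread_t a, b;
    pthread_create(&a, NULL, litmus_t1, NULL);
    pthread_create(&b, NULL, litmus_t2, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return !(A == 0 && B == 11);
}

static int sc_litmus(int iters) {
    for (int i = 0; i < iters; i++)
        if (!sc_litmus_once()) return 0;
    return 1;
}
```

Demoting the atomics to relaxed ordering would make (0, 11) observable on weakly ordered hardware, which is exactly the gap the rest of the lecture addresses.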

Sequential Consistency
Sequential consistency imposes memory-ordering constraints in addition to those imposed by uniprocessor program dependencies. What are these in our example?
Does (can) a system with caches, write buffers, or out-of-order execution capability provide a sequentially consistent view of the memory? More on this later.

Multiple Consumer Example
[Diagram: one producer and two consumers sharing the buffer, head, and tail pointers]

Producer posting item x:
        Rtail ← M[tail]
        M[Rtail] ← x
        Rtail ← Rtail + 1
        M[tail] ← Rtail

Each consumer:
        Rhead ← M[head]
  spin: Rtail ← M[tail]
        if Rhead == Rtail goto spin
        R ← M[Rhead]
        Rhead ← Rhead + 1
        M[head] ← Rhead
        process(R)

What is wrong with this code?

Multiple Consumer Example

Producer posting item x:
        Rtail ← M[tail]
        M[Rtail] ← x
        Rtail ← Rtail + 1
        M[tail] ← Rtail

Each consumer:
        Rhead ← M[head]        ─┐
  spin: Rtail ← M[tail]         │ critical section: needs to be
        if Rhead == Rtail        │ executed atomically by one
          goto spin              │ consumer
        R ← M[Rhead]             │
        Rhead ← Rhead + 1        │
        M[head] ← Rhead         ─┘
        process(R)

The critical section needs to be executed atomically by one consumer — hence locks.

Locks or Semaphores (E. W. Dijkstra, 1965)
A semaphore is a non-negative integer s, with the following operations:
  P(s): if s > 0, decrement s by 1; otherwise wait
  V(s): increment s by 1 and wake up one of the waiting processes
P's and V's must be executed atomically, i.e., without interruptions or interleaved accesses to s by other processors.

Process i:
  P(s)
  <critical section>
  V(s)

What does the initial value of s determine? The maximum number of processes in the critical section.
(A semaphore is also a visual signaling system based on two flags held in each hand.)
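POSIX exposes Dijkstra's operations directly: sem_wait is P and sem_post is V. A minimal sketch (the counter workload and the name run_sem_demo are invented for illustration), using an initial value of 1 so at most one process is in the critical section:

```c
#include <semaphore.h>
#include <pthread.h>
#include <stddef.h>

#define SEM_NTHREADS 4
#define SEM_NITERS   10000

static sem_t s;       /* semaphore, initial value 1 => mutex behavior */
static int counter;   /* shared state protected by s */

static void *sem_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < SEM_NITERS; i++) {
        sem_wait(&s);   /* P(s): decrement, or block if s == 0      */
        counter++;      /* critical section                         */
        sem_post(&s);   /* V(s): increment and wake one waiter      */
    }
    return NULL;
}

/* Driver: with mutual exclusion, no increments are lost. */
static int run_sem_demo(void) {
    counter = 0;
    sem_init(&s, 0, 1);   /* initial value determines max occupancy */
    pthread_t t[SEM_NTHREADS];
    for (int i = 0; i < SEM_NTHREADS; i++)
        pthread_create(&t[i], NULL, sem_worker, NULL);
    for (int i = 0; i < SEM_NTHREADS; i++)
        pthread_join(t[i], NULL);
    sem_destroy(&s);
    return counter;
}
```

Initializing the semaphore to k > 1 instead would admit up to k threads at once, matching the slide's answer about what the initial value determines.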

Implementation of Semaphores
Semaphores (mutual exclusion) can be implemented using ordinary Load and Store instructions in the sequential consistency memory model. However, protocols for mutual exclusion are difficult to design...
Simpler solution: atomic read-modify-write instructions.
Examples (a is a memory address, R is a register):
  Test&Set(a, R):       R ← M[a]; if R == 0 then M[a] ← 1;
  Fetch&Add(a, RV, R):  R ← M[a]; M[a] ← R + RV;
  Swap(a, R):           Rt ← M[a]; M[a] ← R; R ← Rt;
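C11's atomic_flag_test_and_set is essentially the Test&Set instruction above, so a spinlock falls out in two lines. A sketch (the lock/unlock wrappers and driver names are invented here):

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stddef.h>

static atomic_flag ts_mutex = ATOMIC_FLAG_INIT;
static int ts_shared;

/* Spin until Test&Set returns 0 (i.e., we flipped the flag 0 -> 1). */
static void ts_lock(void)   { while (atomic_flag_test_and_set(&ts_mutex)) ; }
/* Store 0 back, releasing the lock. */
static void ts_unlock(void) { atomic_flag_clear(&ts_mutex); }

#define TS_NTHREADS 4
#define TS_NITERS   10000

static void *ts_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < TS_NITERS; i++) {
        ts_lock();
        ts_shared++;     /* critical section */
        ts_unlock();
    }
    return NULL;
}

static int run_ts_demo(void) {
    ts_shared = 0;
    pthread_t t[TS_NTHREADS];
    for (int i = 0; i < TS_NTHREADS; i++)
        pthread_create(&t[i], NULL, ts_worker, NULL);
    for (int i = 0; i < TS_NTHREADS; i++)
        pthread_join(t[i], NULL);
    return ts_shared;
}
```

Swap or Fetch&Add could be substituted for Test&Set in ts_lock with the same overall structure.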

Multiple Consumers Example: Using the Test&Set Instruction
  P:    Test&Set(mutex, Rtemp)
        if (Rtemp != 0) goto P
        Rhead ← M[head]                 ─┐
  spin: Rtail ← M[tail]                  │
        if Rhead == Rtail goto spin      │ critical section
        R ← M[Rhead]                     │
        Rhead ← Rhead + 1                │
        M[head] ← Rhead                 ─┘
  V:    Store(mutex, 0)
        process(R)

Other atomic read-modify-write instructions (Swap, Fetch&Add, etc.) can also implement P's and V's.
What is the problem with this code? What if the process stops or is swapped out while in the critical section?

Nonblocking Synchronization
Compare&Swap(a, Rt, Rs):   (implicit argument: status)
  if (Rt == M[a]) then
    M[a] ← Rs; Rt ← Rs; status ← success;
  else
    status ← fail;

  try:  Rhead ← M[head]
  spin: Rtail ← M[tail]
        if Rhead == Rtail goto spin
        R ← M[Rhead]
        Rnewhead ← Rhead + 1
        Compare&Swap(head, Rhead, Rnewhead)
        if (status == fail) goto try
        process(R)
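The try/retry structure above maps directly onto C11's atomic_compare_exchange_strong: read the location, compute the new value, and retry the whole sequence if the CAS fails. A minimal sketch (head_ctr and the helper names are invented; a shared counter stands in for the queue's head pointer):

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stddef.h>

static atomic_int head_ctr;   /* stands in for M[head] */

/* Nonblocking update: on CAS failure, redo the whole
 * read-modify sequence, mirroring "goto try" on the slide. */
static void cas_advance(void) {
    int old, nxt;
    do {
        old = atomic_load(&head_ctr);   /* try:  Rhead    <- M[head]   */
        nxt = old + 1;                  /*       Rnewhead <- Rhead + 1 */
    } while (!atomic_compare_exchange_strong(&head_ctr, &old, nxt));
}

#define CAS_NTHREADS 4
#define CAS_NITERS   10000

static void *cas_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < CAS_NITERS; i++) cas_advance();
    return NULL;
}

/* Driver: no lock is ever held, yet no update is lost. */
static int run_cas_demo(void) {
    atomic_store(&head_ctr, 0);
    pthread_t t[CAS_NTHREADS];
    for (int i = 0; i < CAS_NTHREADS; i++)
        pthread_create(&t[i], NULL, cas_worker, NULL);
    for (int i = 0; i < CAS_NTHREADS; i++)
        pthread_join(t[i], NULL);
    return atomic_load(&head_ctr);
}
```

Unlike the Test&Set lock, a thread that stalls here cannot block the others: some thread's CAS always succeeds, so the structure makes progress even if individual threads are descheduled.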

Load-reserve & Store-conditional
Nonblocking synchronization using special register(s) to hold a reservation flag and address, and the outcome of Store-conditional:

  Load-reserve(R, a):
    <flag, adr> ← <1, a>; R ← M[a];
  Store-conditional(a, R):
    if <flag, adr> == <1, a> then
      cancel other processors' reservation on a;
      M[a] ← R; status ← succeed;
    else
      status ← fail;

  try:  Load-reserve(Rhead, head)
  spin: Rtail ← M[tail]
        if Rhead == Rtail goto spin
        R ← M[Rhead]
        Rhead ← Rhead + 1
        Store-conditional(head, Rhead)
        if (status == fail) goto try
        process(R)
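At the C level, the closest analogue to load-reserve/store-conditional is atomic_compare_exchange_weak: the weak form is allowed to fail spuriously, just as a store-conditional can fail even without a conflicting write (e.g., if the reservation is lost on a context switch), so it must always sit in a retry loop. A sketch with invented names:

```c
#include <stdatomic.h>

static atomic_int lr_head;   /* zero-initialized; stands in for M[head] */

/* Retry loop shaped like the LR/SC loop on the slide: the weak CAS
 * may fail spuriously, in which case `old` is refreshed and we retry. */
static int llsc_style_increment(void) {
    int old = atomic_load(&lr_head);                 /* like Load-reserve */
    while (!atomic_compare_exchange_weak(&lr_head, &old, old + 1))
        ;                                            /* like a failed SC  */
    return old + 1;                                  /* new value stored  */
}
```

On LL/SC machines (e.g., RISC-V, Arm, Power) compilers typically lower this weak CAS loop to exactly the load-reserve/store-conditional pattern shown above.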

Mutual Exclusion Using Load/Store
A protocol based on two shared variables, c1 and c2. Initially, both c1 and c2 are 0 (not busy).

  Process 1:                     Process 2:
  ...                            ...
  c1 = 1;                        c2 = 1;
  L: if c2 == 1 then goto L      L: if c1 == 1 then goto L
  <critical section>             <critical section>
  c1 = 0;                        c2 = 0;

What is wrong? If both processes set their flags before either tests the other's, both spin forever: deadlock.

Synchronization
To avoid deadlock, let a process give up its reservation (i.e., Process 1 sets c1 to 0) while waiting:

  Process 1:                     Process 2:
  ...                            ...
  L: c1 = 1;                     L: c2 = 1;
     if c2 == 1 then                if c1 == 1 then
       { c1 = 0; goto L }             { c2 = 0; goto L }
  <critical section>             <critical section>
  c1 = 0;                        c2 = 0;

Deadlock is not possible, but what could go wrong? This is the most promising solution so far, but alas, we still have a problem with bounded waiting. Suppose Process 2 continually re-enters its entry protocol after leaving its exit protocol while Process 1 is waiting. It is possible that Process 2 repeatedly reaches its test exactly when Process 1 has temporarily cleared its flag; we cannot place a bound on how many times this can happen.

A Protocol for Mutual Exclusion (T. Dekker, 1966)
A protocol based on 3 shared variables: c1, c2, and turn. Initially, both c1 and c2 are 0 (not busy).

  Process 1:                               Process 2:
  ...                                      ...
  c1 = 1;                                  c2 = 1;
  turn = 1;                                turn = 2;
  L: if c2 == 1 && turn == 1 goto L        L: if c1 == 1 && turn == 2 goto L
  <critical section>                       <critical section>
  c1 = 0;                                  c2 = 0;

turn = i ensures that only process i can wait; variables c1 and c2 ensure mutual exclusion.
(Next: the take-a-number approach used in bakeries. Never seen one in a bakery, but the RMV uses one.)
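The protocol depends on sequential consistency, so a faithful C rendering must use sequentially consistent atomics for c1, c2, and turn; plain variables would let the compiler and hardware reorder the flag and turn accesses and break mutual exclusion. A sketch (the driver and counter are invented; dk_count is deliberately a plain int, protected only by the protocol):

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stddef.h>

static atomic_int c1, c2, turn;
static int dk_count;            /* plain shared int, guarded by the protocol */

#define DK_ITERS 20000

static void *dk_proc1(void *arg) {
    (void)arg;
    for (int i = 0; i < DK_ITERS; i++) {
        atomic_store(&c1, 1);
        atomic_store(&turn, 1);
        while (atomic_load(&c2) == 1 && atomic_load(&turn) == 1) ;  /* L */
        dk_count++;             /* critical section */
        atomic_store(&c1, 0);
    }
    return NULL;
}

static void *dk_proc2(void *arg) {
    (void)arg;
    for (int i = 0; i < DK_ITERS; i++) {
        atomic_store(&c2, 1);
        atomic_store(&turn, 2);
        while (atomic_load(&c1) == 1 && atomic_load(&turn) == 2) ;  /* L */
        dk_count++;             /* critical section */
        atomic_store(&c2, 0);
    }
    return NULL;
}

/* Driver: if mutual exclusion held, no increments were lost. */
static int run_dekker(void) {
    dk_count = 0;
    atomic_store(&c1, 0);
    atomic_store(&c2, 0);
    atomic_store(&turn, 1);
    pthread_t a, b;
    pthread_create(&a, NULL, dk_proc1, NULL);
    pthread_create(&b, NULL, dk_proc2, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return dk_count;
}
```

Relaxing the atomics here (or compiling the flags as ordinary variables on a weakly ordered machine) reintroduces exactly the failures the memory-fence slides discuss.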

N-process Mutual Exclusion: Lamport's Bakery Algorithm
Process i (initially num[j] = 0 and choosing[j] = 0, for all j):

  Entry code:
    choosing[i] = 1;
    num[i] = max(num[0], …, num[N-1]) + 1;
    choosing[i] = 0;
    for (j = 0; j < N; j++) {
      while (choosing[j]);          /* wait if process j is currently choosing  */
      while (num[j] &&              /* wait if process j has a number and comes */
             ((num[j] < num[i]) ||  /* ahead of us                              */
              (num[j] == num[i] && j < i)));
    }
  Exit code:
    num[i] = 0;
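As with Dekker's protocol, a working C version of the bakery algorithm needs sequentially consistent atomics for choosing[] and num[]. A sketch for N = 3 (the driver, ticket helper, and counter are invented for this example):

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stddef.h>

#define BK_N     3
#define BK_ITERS 5000

static atomic_int choosing[BK_N], num[BK_N];
static int bk_total;            /* plain int, protected by the bakery lock */

static int bk_max_num(void) {
    int m = 0;
    for (int j = 0; j < BK_N; j++) {
        int v = atomic_load(&num[j]);
        if (v > m) m = v;
    }
    return m;
}

static void bakery_lock(int i) {
    atomic_store(&choosing[i], 1);
    atomic_store(&num[i], bk_max_num() + 1);   /* take a ticket */
    atomic_store(&choosing[i], 0);
    for (int j = 0; j < BK_N; j++) {
        while (atomic_load(&choosing[j])) ;    /* j is mid-choice */
        while (atomic_load(&num[j]) &&         /* j is ahead of us */
               (atomic_load(&num[j]) < atomic_load(&num[i]) ||
                (atomic_load(&num[j]) == atomic_load(&num[i]) && j < i))) ;
    }
}

static void bakery_unlock(int i) { atomic_store(&num[i], 0); }

static void *bk_worker(void *arg) {
    int i = *(int *)arg;
    for (int k = 0; k < BK_ITERS; k++) {
        bakery_lock(i);
        bk_total++;             /* critical section */
        bakery_unlock(i);
    }
    return NULL;
}

static int run_bakery(void) {
    bk_total = 0;
    int ids[BK_N];
    pthread_t t[BK_N];
    for (int i = 0; i < BK_N; i++) {
        ids[i] = i;
        pthread_create(&t[i], NULL, bk_worker, &ids[i]);
    }
    for (int i = 0; i < BK_N; i++)
        pthread_join(t[i], NULL);
    return bk_total;
}
```

Ties in ticket numbers are broken by process index (j < i), which is what gives the algorithm its bounded-waiting (first-come-first-served) property.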

Implementation Issues
Implementation of SC is complicated by two issues.

Out-of-order execution capability — which reorderings are safe for a uniprocessor?
  Load(a); Load(b)      yes
  Load(a); Store(b)     yes, if a ≠ b
  Store(a); Load(b)     yes, if a ≠ b
  Store(a); Store(b)    yes, if a ≠ b

Caches — caches can prevent the effect of a store from being seen by other processors.

Memory Fences: Instructions to Serialize Memory Accesses
Processors with relaxed or weak memory models (i.e., those that permit Loads and Stores to different addresses to be reordered) need memory fence instructions to force serialization of memory accesses.
Processors with relaxed memory models:
  Sparc V8 (TSO, PSO): Membar
  PowerPC (WO): Sync, EIEIO
Memory fences are expensive operations; however, one pays for serialization only when it is required.

Using Memory Fences

Producer posting item x:
        Rtail ← M[tail]
        M[Rtail] ← x
        membar_SS
        Rtail ← Rtail + 1
        M[tail] ← Rtail

Consumer:
        Rhead ← M[head]
  spin: Rtail ← M[tail]
        if Rhead == Rtail goto spin
        membar_LL
        R ← M[Rhead]
        Rhead ← Rhead + 1
        M[head] ← Rhead
        process(R)

What does this ensure?
- membar_SS ensures that the tail pointer is not updated before x has been stored.
- membar_LL ensures that R is not loaded before x has been stored.
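C11 exposes these fences portably as atomic_thread_fence: a release fence plays the role of membar_SS on the producer side, and an acquire fence plays the role of membar_LL on the consumer side, with the data itself accessed without ordering. A sketch (the single-item handoff and names like run_fence_demo are invented for illustration):

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stddef.h>

static int payload;             /* plain data, like the buffered item x */
static atomic_int ready;        /* plays the role of the tail pointer   */

static void *fence_producer(void *arg) {
    (void)arg;
    payload = 42;                                            /* store x       */
    atomic_thread_fence(memory_order_release);               /* ~ membar_SS   */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);  /* publish tail  */
    return NULL;
}

static int fence_consume(void) {
    while (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        ;                                                    /* spin on tail  */
    atomic_thread_fence(memory_order_acquire);               /* ~ membar_LL   */
    return payload;                                          /* load x        */
}

static int run_fence_demo(void) {
    atomic_store(&ready, 0);
    pthread_t p;
    pthread_create(&p, NULL, fence_producer, NULL);
    int v = fence_consume();
    pthread_join(p, NULL);
    return v;
}
```

As on the slide, the fences are placed only where ordering is required, so the serialization cost is paid per handoff rather than on every memory access.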