Anshul Kumar, CSE IITD. ECE729: Advance Computer Architecture. Lecture 26: Synchronization, Memory Consistency. 25th March, 2010.


Slide 2: Synchronization Problem
Processes run independently on different processors.
At some point they need to know each other's status, for
– communication
– mutual exclusion, etc.
Hardware primitives are required for these operations.

Slide 3: Consider an example
Bank transaction from account number A:
    b = read_bal(A)
    b = b - debit_amt
    if b >= bmin
        update_bal(A, b)

Slide 4: Two concurrent transactions
Transaction 1:
    b1 = read_bal(A)
    b1 = b1 - debit_amt1
    if b1 >= bmin
        update_bal(A, b1)
Transaction 2:
    b2 = read_bal(A)
    b2 = b2 - debit_amt2
    if b2 >= bmin
        update_bal(A, b2)

Slide 5: Two concurrent transactions
Serialize reads and writes.
Transaction 1:
    b1 = read_bal(A)
    b1 = b1 - debit_amt1
    if b1 >= bmin
        update_bal(A, b1)
Transaction 2:
    b2 = read_bal(A)
    b2 = b2 - debit_amt2
    if b2 >= bmin
        update_bal(A, b2)
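
To see why the reads and writes need to be serialized, a minimal Python sketch can enumerate every interleaving of the two transactions' read and update steps. The helper names (`run_schedule`, `all_schedules`), the starting balance and the debit amounts are illustrative assumptions, not part of the lecture.

```python
from itertools import permutations

def run_schedule(schedule, balance, debits, bmin=0):
    """Run two debit transactions step by step in the given order.

    schedule is a sequence of transaction ids (0 or 1). The first time an
    id appears that transaction reads and debits; the second time it
    conditionally writes the balance back.
    """
    local = {}             # private copies b1 / b2
    step = {0: 0, 1: 0}    # next step of each transaction
    for t in schedule:
        if step[t] == 0:
            local[t] = balance - debits[t]   # b = read_bal(A); b = b - debit_amt
        elif local[t] >= bmin:               # if b >= bmin
            balance = local[t]               #     update_bal(A, b)
        step[t] += 1
    return balance

def all_schedules():
    """All interleavings of two 2-step transactions (program order kept)."""
    return sorted(set(permutations([0, 0, 1, 1])))

def final_balances(balance=100, debits=(30, 50)):
    return {run_schedule(s, balance, debits) for s in all_schedules()}
```

With a balance of 100 and debits of 30 and 50, the two serial schedules leave 20, while the interleaved schedules leave 50 or 70: one of the debits is lost.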

Slides 6-7: Lock for mutual exclusion
Transaction 1:
    acquire: x1 = read(lock)
             if x1 = 1 then goto acquire
             set(lock)
             do transaction1
             …
    release: clear(lock)
Transaction 2:
    acquire: x2 = read(lock)
             if x2 = 1 then goto acquire
             set(lock)
             do transaction2
             …
    release: clear(lock)
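
The lock code above reads the lock and sets it in two separate steps. The following Python sketch, with a hypothetical helper `attempt`, replays one acquire attempt per transaction under a chosen interleaving to show what can happen between those two steps. It is an illustration under that assumption only, not the lecture's own code.

```python
def attempt(schedule):
    """Replay one acquire attempt per transaction, one step at a time.

    Step 0 of a transaction is 'x = read(lock)'.
    Step 1 is 'if x = 1 retry, else set(lock) and enter'.
    Returns the set of transactions that entered the critical section.
    """
    lock = 0
    seen = {}              # value each transaction read from the lock
    step = {1: 0, 2: 0}    # next step of each transaction
    inside = set()
    for t in schedule:
        if step[t] == 0:
            seen[t] = lock
        elif seen[t] == 0:
            lock = 1
            inside.add(t)
        step[t] += 1
    return inside
```

`attempt((1, 1, 2, 2))` admits only transaction 1, but `attempt((1, 2, 1, 2))` admits both, because each read the lock as 0 before the other set it. Closing that window is what an atomic read+write primitive is for.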

Slide 8: Synchronization Primitives
A hardware primitive is required.
It should provide an atomic read+write operation.
Examples:
– test&set
– exchange
– fetch&increment
– load linked, store conditional
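
As a rough software model of these primitives, the sketch below wraps a single word in a Python class. The class name `AtomicWord` and its methods are assumptions made for illustration; the internal `threading.Lock` merely stands in for the indivisibility that real hardware would provide.

```python
import threading

class AtomicWord:
    """A memory word whose read-modify-write operations are indivisible."""

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()   # stand-in for hardware atomicity

    def load(self):
        with self._guard:
            return self._value

    def exchange(self, new):
        """Swap in `new` and return the old value in one atomic step."""
        with self._guard:
            old, self._value = self._value, new
            return old

    def test_and_set(self):
        """Set the word to 1 and return what it held before."""
        return self.exchange(1)

    def fetch_and_increment(self):
        """Add 1 to the word and return the value before the increment."""
        with self._guard:
            old = self._value
            self._value = old + 1
            return old
```

In this model `test_and_set` is simply an exchange with the constant 1, which is why either primitive is enough to build a lock.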

Slide 9: Spin Lock with Exchange Instr.
Lock: 0 indicates free and 1 indicates locked.
Code to lock X:
            r2 ← 1
    lockit: r2 ↔ X                 ;atomic exchange
            if (r2 ≠ 0) → lockit   ;already locked
Locks are cached for efficiency; coherence is used.
Better code to lock X:
    lockit: r2 ← X                 ;read lock
            if (r2 ≠ 0) → lockit   ;not available
            r2 ← 1
            r2 ↔ X                 ;atomic exchange
            if (r2 ≠ 0) → lockit   ;already locked
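
The "better" variant, which spins on an ordinary read and only then tries the exchange, might be sketched in Python as follows. `SpinLock` and `count_with_lock` are hypothetical names; the inner `threading.Lock` only emulates the atomic exchange instruction, and `time.sleep(0)` is added so that a spinning Python thread yields to the others.

```python
import threading
import time

class SpinLock:
    def __init__(self):
        self._x = 0                      # 0 = free, 1 = locked
        self._hw = threading.Lock()      # emulates the atomic exchange only

    def _exchange(self, new):
        with self._hw:
            old, self._x = self._x, new
            return old

    def acquire(self):
        while True:
            while self._x != 0:          # read lock; spin while not available
                time.sleep(0)
            if self._exchange(1) == 0:   # atomic exchange; 0 means we won
                return                   # otherwise already locked: retry

    def release(self):
        self._x = 0

def count_with_lock(n_threads=4, n_iters=200):
    """Increment a shared counter under the spin lock from several threads."""
    lock, total = SpinLock(), [0]

    def worker():
        for _ in range(n_iters):
            lock.acquire()
            total[0] += 1
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

Spinning on the plain read keeps waiting processors hitting their own cached copy of the lock; the exchange, which needs exclusive access, is attempted only once the lock looks free.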

Slide 10: LD Linked & ST conditional
Simpler to implement.
Atomic exchange r2 ↔ X using LL and SC:
    try: r3 ← r2             ;move exchange value
         LL r1, X            ;load linked
         SC r3, X            ;store conditional
         if (r3 = 0) → try   ;branch, store fails
         r2 ← r1             ;put loaded value in r2
fetch&increment using LL and SC:
    try: LL r1, X            ;load linked
         r3 ← r1 + 1         ;increment
         SC r3, X            ;store conditional
         if (r3 = 0) → try   ;branch, store fails
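
A small single-threaded model can make the retry behaviour of LL and SC concrete. In the sketch below, `LinkedWord`, its `links` set and the `between` hook (which lets another processor run once between the LL and the SC) are all illustrative assumptions; real hardware tracks the link in its own way.

```python
class LinkedWord:
    """One shared word X plus the set of processors holding a link on it."""

    def __init__(self, value=0):
        self.value = value
        self.links = set()

    def ll(self, cpu):
        """Load linked: return the value and remember the link."""
        self.links.add(cpu)
        return self.value

    def sc(self, cpu, new):
        """Store conditional: 1 and store if the link is intact, else 0."""
        if cpu not in self.links:
            return 0
        self.value = new
        self.links.clear()        # a successful store breaks every link
        return 1

def atomic_exchange(x, cpu, r2, between=None):
    while True:                               # try:
        r3 = r2                               #   move exchange value
        r1 = x.ll(cpu)                        #   LL r1, X
        if between is not None:               #   (hypothetical interference)
            between, hook = None, between
            hook()
        r3 = x.sc(cpu, r3)                    #   SC r3, X
        if r3 != 0:                           #   retry if the store failed
            return r1                         #   loaded value is the result

def fetch_and_increment(x, cpu, between=None):
    while True:                               # try:
        r1 = x.ll(cpu)                        #   LL r1, X
        if between is not None:
            between, hook = None, between
            hook()
        r3 = x.sc(cpu, r1 + 1)                #   SC r3, X with r1 + 1
        if r3 != 0:
            return r1
```

If another processor completes its own fetch&increment between the LL and the SC, the first SC returns 0 and the sequence is retried, so no increment is lost.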

Slide 11: Spin Lock with LL & SC
    lockit: LL r2, X               ;load linked
            if (r2 ≠ 0) → lockit   ;not available
            r2 ← 1
            SC r2, X               ;store cond
            if (r2 = 0) → lockit   ;branch, store fails
Performance in the presence of contention?
A spin lock with exponential back-off reduces contention.
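
Exponential back-off can be sketched independently of how the lock itself is built. In the Python sketch below, `acquire_with_backoff`, `try_lock`, `base`, `cap` and the random jitter are assumptions chosen for illustration; the lecture only names the technique.

```python
import random
import time

def acquire_with_backoff(try_lock, base=1e-6, cap=1e-3,
                         sleep=time.sleep, rng=random.random):
    """Retry try_lock() until it succeeds, waiting longer after each failure.

    try_lock: callable returning True when the lock was obtained.
    base, cap: first and maximum delay bound, in seconds.
    Returns the number of failed attempts.
    """
    delay = base
    failures = 0
    while not try_lock():
        sleep(delay * rng())          # random wait up to the current bound
        delay = min(cap, 2 * delay)   # double the bound, up to the cap
        failures += 1
    return failures
```

Each failed attempt doubles the waiting bound, so under heavy contention the processors spread their retries out instead of hammering the lock location together.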

Slide 12: Barrier Synchronization
    lock(X)
    if (count = 0) release ← 0
    count++
    unlock(X)
    if (count = total) {count ← 0; release ← 1}
    else spin(release = 1)

Slide 13: Improved Barrier Synch.
    local_sense ← !local_sense
    lock(X)
    count++
    unlock(X)
    if (count = total) {count ← 0; release ← local_sense}
    else {spin(release = local_sense)}
A tree-based barrier reduces contention.
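
A sense-reversing barrier along these lines could look as follows in Python. The names `SenseBarrier` and `run_phases` are hypothetical. As a small deviation from the slide, the sketch decides whether it is the last arrival while still holding the lock, and it yields with `time.sleep(0)` while spinning.

```python
import threading
import time

class SenseBarrier:
    def __init__(self, total):
        self.total = total
        self.count = 0
        self.release = False
        self._x = threading.Lock()
        self._local = threading.local()     # holds each thread's local_sense

    def wait(self):
        local_sense = not getattr(self._local, "sense", False)
        self._local.sense = local_sense     # local_sense <- !local_sense
        with self._x:                       # lock(X) ... unlock(X)
            self.count += 1
            last = self.count == self.total
        if last:
            self.count = 0                  # reset before releasing the others
            self.release = local_sense
        else:
            while self.release != local_sense:   # spin(release = local_sense)
                time.sleep(0)

def run_phases(n_threads=3, n_phases=4):
    """Each thread logs its phase number, then waits at the barrier."""
    barrier, log, log_lock = SenseBarrier(n_threads), [], threading.Lock()

    def worker():
        for phase in range(n_phases):
            with log_lock:
                log.append(phase)
            barrier.wait()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because each thread flips its own sense on every use, a fast thread that re-enters the barrier cannot be confused by the release value left over from the previous phase. In `run_phases` the log comes out grouped by phase, since no thread starts phase k+1 before all have logged phase k.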

Slide 14: Memory Consistency Problem
When must a processor see the value that has been written by another processor?
Atomicity of operations: system wide?
Can memory operations be re-ordered?
Various models: models_tutorial.ps

Slide 15: Example
    P1:  A = 0                P2:  B = 0
         A = 1                     B = 1
    L1:  if (B = 0) S1        L2:  if (A = 0) S2
Which statements among S1 and S2 are done?
Both S1 and S2 may be done if writes are delayed.

Slide 16: Sequential Consistency
The result of any execution is the same as if
– the operations of all processors were executed in some sequential order
– the operations of each processor occur in the order specified by its program
It requires all memory operations to be atomic.
Too restrictive, high overheads.
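
The definition can be checked mechanically on the two-processor example with A, B, S1 and S2 by enumerating every sequential order that respects each program. The Python sketch below does this; the tuple encoding of operations and the function names are illustrative assumptions.

```python
def interleavings(p, q):
    """Yield every merge of p and q that keeps each program's own order."""
    if not p or not q:
        yield list(p) + list(q)
        return
    for rest in interleavings(p[1:], q):
        yield [p[0]] + rest
    for rest in interleavings(p, q[1:]):
        yield [q[0]] + rest

# Each operation is (kind, variable, argument). Both variables start at 0.
P1 = [("write", "A", 1), ("test", "B", "S1")]   # A = 1; L1: if (B = 0) S1
P2 = [("write", "B", 1), ("test", "A", "S2")]   # B = 1; L2: if (A = 0) S2

def executed(order):
    """Return which of S1, S2 run when operations execute in this order."""
    mem = {"A": 0, "B": 0}
    done = set()
    for kind, var, arg in order:
        if kind == "write":
            mem[var] = arg
        elif mem[var] == 0:        # the test succeeds only if still 0
            done.add(arg)
    return frozenset(done)

def sc_outcomes():
    return {executed(order) for order in interleavings(P1, P2)}
```

Of the six possible orders, none executes both S1 and S2: under sequential consistency at most one of the two statements is done.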

Slide 17: Relaxing W → R order
Loads are allowed to overtake stores ⇒ write buffering is permitted.
1. Total Store Ordering: writes are atomic.
2. Processor Consistency: writes need not be atomic; invalidations may propagate gradually.
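
The effect of write buffering can be explored with a small operational model. The Python sketch below assumes one FIFO store buffer per processor, with loads served from the processor's own buffer when possible, which is a common textbook picture of Total Store Ordering rather than anything specified in the lecture. The operation encoding and the name `tso_outcomes` are hypothetical.

```python
def tso_outcomes(programs):
    """Explore every execution with per-processor FIFO store buffers.

    programs: one list per processor of (kind, variable, argument) tuples,
    kind being "write" or "test" (run statement `argument` if variable = 0).
    Returns the set of possible sets of executed statements.
    """
    n = len(programs)
    results = set()

    def explore(pcs, buffers, mem, done):
        if all(pcs[i] == len(programs[i]) for i in range(n)):
            results.add(done)
            return
        for i in range(n):
            if buffers[i]:                         # drain oldest buffered store
                var, val = buffers[i][0]
                drained = buffers[:i] + (buffers[i][1:],) + buffers[i + 1:]
                explore(pcs, drained, {**mem, var: val}, done)
            if pcs[i] < len(programs[i]):          # run next instruction
                kind, var, arg = programs[i][pcs[i]]
                nxt = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
                if kind == "write":                # store goes into the buffer
                    grown = (buffers[:i] + (buffers[i] + ((var, arg),),)
                             + buffers[i + 1:])
                    explore(nxt, grown, mem, done)
                else:                              # load: own buffer, else memory
                    own = [v for (x, v) in buffers[i] if x == var]
                    value = own[-1] if own else mem[var]
                    explore(nxt, buffers, mem,
                            (done | {arg}) if value == 0 else done)

    explore((0,) * n, ((),) * n, {"A": 0, "B": 0}, frozenset())
    return results

P1 = [("write", "A", 1), ("test", "B", "S1")]   # A = 1; if (B = 0) S1
P2 = [("write", "B", 1), ("test", "A", "S2")]   # B = 1; if (A = 0) S2
```

`tso_outcomes([P1, P2])` contains the outcome in which both S1 and S2 run: each load overtakes the other processor's still-buffered store.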

Slide 18: Relaxing W → R & W → W order
Partial Store Ordering
Loads are allowed to overtake stores.
Writes can be re-ordered.
Memory barriers or fences are used to explicitly order operations.
Further improves the performance.

Slide 19: Examples
    P1              P2
    A = 1;          while (flag = 0);
    flag = 1;       print A;
SC ensures that “1” is printed. TSO and PC also do so; PSO does not.

    P1              P2
    A = 1;          print B;
    B = 1;          print A;
SC ensures that if B is printed as “1” then A is also printed as “1”. TSO and PC also do so; PSO does not.
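
The flag example can be replayed with a toy store-buffer model. The Python sketch below assumes that P1's two stores sit in a buffer that drains either strictly in order (a TSO-like rule) or in any order for different addresses (a PSO-like rule). The function name `printed_values` and this simplification are assumptions for illustration.

```python
def printed_values(fifo):
    """Values P2 may print for A once it has seen flag = 1.

    fifo=True : P1's buffered stores reach memory in program order.
    fifo=False: stores to different addresses may reach memory in any order.
    """
    results = set()

    def explore(buffer, mem):
        if mem["flag"] == 1:          # P2 leaves its loop and prints A
            results.add(mem["A"])
        candidates = buffer[:1] if fifo else buffer
        for var, val in candidates:   # pick a buffered store to drain
            rest = tuple(e for e in buffer if e != (var, val))
            explore(rest, {**mem, var: val})

    # P1 has executed A = 1; flag = 1; both stores are still buffered.
    explore((("A", 1), ("flag", 1)), {"A": 0, "flag": 0})
    return results
```

With in-order draining only 1 can be printed. Once the store to flag may overtake the store to A, P2 can also print 0, which is why PSO needs a memory barrier between the two stores.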

Slide 20: Examples - continued
    P1              P2                  P3
    A = 1;          while (A = 0);      while (B = 0);
                    B = 1;              print A;
SC ensures that “1” is printed. TSO and PSO also do that, but PC does not.

    P1              P2
    A = 1;          B = 1;
    print B;        print A;
SC ensures that both can’t be printed as “0”. TSO, PC and PSO do not.

Slide 21: Relaxing all R/W order
Weak Ordering or Weak Consistency
Loads and stores are not restricted to follow an order.
Explicit synchronization primitives are used.
Synchronization primitives follow a strict order.
Easy to achieve.
Low overhead.

Slide 22: Release Consistency
Further relaxation of weak ordering.
Synch primitives are divided into acquire and release operations.
R/W operations after an acquire cannot move before it, but those before it can be moved after it.
R/W operations before a release cannot move after it, but those after it can be moved before it.

Slide 23: WC and RC Comparison
[Figure: under WC, groups of R/W operations are separated by synch operations; under RC, they are separated by acquire and release operations.]
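
The reordering rules of the two models can be written down as a small predicate. The Python sketch below is only one reading of the slides: it assumes that under weak ordering no R/W operation may cross a synchronization operation in either direction, and that synchronization operations stay strictly ordered among themselves in both models. The name `may_reorder` and the string labels are hypothetical.

```python
def may_reorder(first, second, model):
    """May `second` be performed before `first`, which precedes it in program order?

    first, second: "rw", "acquire" or "release".
    model: "WC" (weak ordering) or "RC" (release consistency).
    Dependences between accesses to the same address are ignored here.
    """
    sync = {"acquire", "release"}
    if first in sync and second in sync:
        return False      # synchronization operations keep a strict order
    if first == "rw" and second == "rw":
        return True       # ordinary loads and stores are unordered
    if model == "WC":
        return False      # assumed: every synch acts as a two-way fence
    if first == "acquire":
        return False      # R/W after an acquire cannot move before it
    if second == "release":
        return False      # R/W before a release cannot move after it
    return True           # R/W may sink below an acquire or rise above a release
```

For example, `may_reorder("rw", "acquire", "RC")` is true while `may_reorder("rw", "acquire", "WC")` is false, which is the extra freedom release consistency gains over weak ordering.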