CS492B Analysis of Concurrent Programs Consistency Jaehyuk Huh Computer Science, KAIST Part of slides are based on CS:App from CMU.


What is Consistency?
When must a processor see the new value?

    P1:  A = 0;                 P2:  B = 0;
         .....                       .....
         A = 1;                      B = 1;
    L1:  if (B == 0) ...        L2:  if (A == 0) ...

Is it impossible for both if statements L1 and L2 to be true?
– What if the write invalidate is delayed and the processor continues?
Memory consistency
– What are the rules for such cases?
– Create a uniform view of all memory locations relative to each other
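For a sequentially consistent machine, the question on this slide can be answered by brute force. The sketch below is only an illustration: the function name `sc_outcomes` and the encoding of schedules as bit masks are assumptions, not part of the slides. It enumerates every interleaving that preserves each processor's program order and records which values the two tests can observe.

```cpp
#include <cassert>
#include <set>
#include <utility>

// Hypothetical mini-model of the slide's program (initial stores omitted):
//   P1: A = 1; r1 = B;        P2: B = 1; r2 = A;
// Returns every (r1, r2) pair reachable under sequential consistency.
std::set<std::pair<int, int>> sc_outcomes() {
    std::set<std::pair<int, int>> results;
    // Each of the 4 global steps belongs to P1 (bit set) or P2 (bit clear).
    for (int mask = 0; mask < 16; ++mask) {
        int bits = (mask & 1) + ((mask >> 1) & 1) +
                   ((mask >> 2) & 1) + ((mask >> 3) & 1);
        if (bits != 2) continue;  // each processor executes exactly 2 operations

        int A = 0, B = 0, r1 = -1, r2 = -1;
        int pc1 = 0, pc2 = 0;  // per-processor program counters
        for (int step = 0; step < 4; ++step) {
            if (mask & (1 << step)) {
                if (pc1 == 0) A = 1; else r1 = B;  // P1 in program order
                ++pc1;
            } else {
                if (pc2 == 0) B = 1; else r2 = A;  // P2 in program order
                ++pc2;
            }
        }
        results.insert(std::make_pair(r1, r2));
    }
    return results;
}
```

In this model the pair (0, 0), which corresponds to both L1 and L2 being true, never appears. Whichever load runs last always follows the other processor's store.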

Coherence vs. Consistency
Initially A = flag = 0

    Processor 0             Processor 1
    A = 1;                  while (!flag);
    flag = 1;               print A;

Intuitive result: P1 prints A = 1
Cache coherence
– Provides a consistent view of a single memory location
– Does not say anything about the ordering of accesses to two different locations
Consistency determines whether this code works or not
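At the language level, one way to make the flag idiom above work is release/acquire ordering on the flag. The following C++ sketch assumes a C++11 compiler. The function name `message_passing` is illustrative, and the slides themselves do not prescribe this approach.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Sketch: Processor 0 publishes A through flag; Processor 1 waits and reads A.
// Returns the value Processor 1 observes for A.
int message_passing() {
    int A = 0;                  // ordinary (non-atomic) data
    std::atomic<int> flag{0};   // synchronization variable
    int seen = -1;

    std::thread p0([&] {
        A = 1;
        // Release: the write to A is ordered before the write to flag.
        flag.store(1, std::memory_order_release);
    });
    std::thread p1([&] {
        // Acquire: once flag == 1 is seen, the earlier write to A is visible.
        while (flag.load(std::memory_order_acquire) == 0) { }
        seen = A;
    });

    p0.join();
    p1.join();
    return seen;
}
```

With both accesses to `flag` weakened to `std::memory_order_relaxed` and `A` left as a plain `int`, the program would contain a data race. The intuitive result would then no longer be guaranteed, even though the hardware caches remain coherent.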

Sequential Consistency
Lamport’s definition: “A multiprocessor is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program”
Most intuitive consistency model for programmers
– Processors see their own loads and stores in program order
– Processors see others’ loads and stores in program order
– All processors see the same global load and store ordering

Sequential Consistency
– In-order instruction execution
– Atomic loads and stores
SC is easy to understand, but architects and compiler writers want to violate it for performance

Initially flag = 0

    Processor 1             Processor 2
    Store (a), 10;          L: Load r1, (flag);
    Store (flag), 1;           if r1 == 0 goto L;
                               Load r2, (a);

Slides from Prof. Arvind, MIT

Memory Model Issues
Architectural optimizations that are correct for uniprocessors often violate sequential consistency and result in a new memory model for multiprocessors

Committed Store Buffers
[Figure: two CPUs, each with its own cache, sharing main memory]
CPU can continue execution while earlier committed stores are still propagating through the memory system
– Processor can commit other instructions (including loads and stores) while the first store is committing to memory
– Committed store buffer can be combined with the speculative store buffer in an out-of-order CPU
Local loads can bypass values from buffered stores to the same address

Example 1: Store Buffers
Initially, all memory locations contain zeros

    Process 1               Process 2
    Store (flag1), 1;       Store (flag2), 1;
    Load r1, (flag2);       Load r2, (flag1);

Question: Is it possible that r1 = 0 and r2 = 0?
– Sequential consistency: No
– Suppose loads can go ahead of stores waiting in the store buffer: Yes!
Total Store Order (TSO): IBM 370, Sparc’s TSO memory model

Example 2: Store-Load Bypassing

    Process 1               Process 2
    Store (flag1), 1;       Store (flag2), 1;
    Load r3, (flag1);       Load r4, (flag2);
    Load r1, (flag2);       Load r2, (flag1);

Question: Do the extra loads have any effect?
– Sequential consistency: No
Suppose store-load bypassing is permitted in the store buffer
– No effect in Sparc’s TSO model; still not SC
– In IBM 370, a load cannot return a written value until it is visible to other processors => implicitly adds a memory fence, looks like SC
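The store-buffer behaviour in Examples 1 and 2 can be mimicked with a small single-threaded simulation. Everything in this sketch is an assumption made for illustration: the `Core` type, the one-entry buffer, and the particular schedule. It models no real processor. It shows one schedule in which each load of the other process's flag reads memory while that store is still buffered, while a load of the process's own flag is forwarded from its buffer.

```cpp
#include <array>
#include <cassert>

// Toy core with a one-entry committed store buffer (illustrative only).
struct Core {
    bool pending = false;  // is a store waiting in the buffer?
    int addr = 0;
    int value = 0;

    void store(int a, int v) { pending = true; addr = a; value = v; }

    // Store-load bypassing: forward from the core's own buffer on an address
    // match; otherwise read memory, going ahead of the buffered store.
    int load(const std::array<int, 2>& mem, int a) const {
        if (pending && addr == a) return value;
        return mem[a];
    }

    // The buffered store finally becomes visible to the other core.
    void drain(std::array<int, 2>& mem) {
        if (pending) { mem[addr] = value; pending = false; }
    }
};

struct Observed { int r1, r2, r3, r4; };

Observed run_store_buffer_schedule() {
    std::array<int, 2> mem{{0, 0}};  // mem[0] = flag1, mem[1] = flag2
    Core p1, p2;
    Observed o;

    p1.store(0, 1);          // Process 1: Store (flag1), 1   (buffered)
    p2.store(1, 1);          // Process 2: Store (flag2), 1   (buffered)
    o.r3 = p1.load(mem, 0);  // Load r3, (flag1): forwarded from own buffer
    o.r4 = p2.load(mem, 1);  // Load r4, (flag2): forwarded from own buffer
    o.r1 = p1.load(mem, 1);  // Load r1, (flag2): memory still holds 0
    o.r2 = p2.load(mem, 0);  // Load r2, (flag1): memory still holds 0
    p1.drain(mem);
    p2.drain(mem);
    return o;
}
```

Under this schedule the simulation yields r1 = r2 = 0 with r3 = r4 = 1, an outcome that no sequentially consistent interleaving produces.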

Interleaved Memory System
[Figure: CPU connected to an even cache with memory for even addresses and an odd cache with memory for odd addresses]
Achieve greater throughput by spreading memory addresses across two or more parallel memory subsystems
– In a snooping system, can have two or more snoops in progress at the same time (e.g., Sun UE10K system has four interleaved snooping busses)
– Greater bandwidth from main memory system as two memory modules can be accessed in parallel

Example 3: Non-FIFO Store Buffers

    Process 1               Process 2
    Store (a), 1;           Load r1, (flag);
    Store (flag), 1;        Load r2, (a);

Question: Is it possible that r1 = 1 but r2 = 0?
– Sequential consistency: No
– With non-FIFO store buffers: Yes
Sparc’s PSO memory model

Example 4: Non-Blocking Caches

    Process 1               Process 2
    Store (a), 1;           Load r1, (flag);
    Store (flag), 1;        Load r2, (a);

Question: Is it possible that r1 = 1 but r2 = 0?
– Sequential consistency: No
– Assuming stores are ordered: Yes, because loads can be reordered
Sparc’s RMO, PowerPC’s WO, Alpha

Example 5: Speculative Execution

    Process 1               Process 2
    Store (a), 1;           L: Load r1, (flag);
    Store (flag), 1;           if r1 == 0 goto L;
                               Load r2, (a);

Question: Is it possible that r1 = 1 but r2 = 0?
– Sequential consistency: No
– With speculative loads: Yes, even if the stores are ordered

Address Speculation

    Store (x), 1;
    Load r, (y);
    Instruction foo

All modern processors will execute foo speculatively by assuming x is not equal to y. This also affects the memory model.

Example 6: Store Atomicity
Even if loads on a processor are ordered, different orderings of stores can be observed if the store operation is not atomic.

    Process 1         Process 2         Process 3           Process 4
    Store (a), 1;     Store (a), 2;     Load r1, (a);       Load r3, (a);
                                        Load r2, (a);       Load r4, (a);

Question: Is it possible that r1 = 1 and r2 = 2 but r3 = 2 and r4 = 1?
– Sequential consistency: No

Example 8: Causality

    Process 1               Process 2               Process 3
    Store (flag1), 1;       Load r1, (flag1);       Load r2, (flag2);
                            Store (flag2), 1;       Load r3, (flag1);

Question: Is it possible that r1 = 1 and r2 = 1 but r3 = 0?
– Sequential consistency: No

Relaxed Consistency Models
Processor Consistency (PC)
– Intel / AMD x86, SPARC (they have some differences)
– Relax W → R ordering, but maintain W → W, R → W, and R → R orderings
– Allows in-order write buffers
– Does not confuse programmers too much
Weak Ordering (WO)
– Alpha, Itanium, ARM, PowerPC (they have some differences)
– Relax all orderings
– Use a memory fence instruction when ordering is necessary
– Use a synchronization library, don’t write your own
[Figure: acquire, fence, critical section, fence, release]

Weaker Memory Models
Architectures with weaker memory models provide memory fence instructions to prevent otherwise permitted reorderings of loads and stores

    Store (a1), r2;
    Fence_wr
    Load r1, (a2);

The Load and Store can be reordered if a1 =/= a2. Insertion of Fence_wr will disallow this reordering.
Similarly: Fence_rr; Fence_rw; Fence_ww;
SUN’s Sparc: MEMBAR; MEMBAR_RR; MEMBAR_RW; MEMBAR_WR; MEMBAR_WW
PowerPC: Sync; EIEIO

Enforcing Ordering using Fences

Without fences:

    Processor 1             Processor 2
    Store (a), 10;          L: Load r1, (flag);
    Store (flag), 1;           if r1 == 0 goto L;
                               Load r2, (a);

With fences (weak ordering):

    Processor 1             Processor 2
    Store (a), 10;          L: Load r1, (flag);
    Fence_ww;                  if r1 == 0 goto L;
    Store (flag), 1;           Fence_rr;
                               Load r2, (a);
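The fenced program above has a rough C++ analogue built from relaxed atomic accesses plus explicit fences. This is a sketch, not an exact translation. `std::atomic_thread_fence` with release ordering is stronger than Fence_ww because it also orders earlier loads, and the acquire fence is stronger than Fence_rr. The function name `fenced_message_passing` is illustrative.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Returns the value Processor 2 loads from a after seeing flag == 1.
int fenced_message_passing() {
    std::atomic<int> a{0};
    std::atomic<int> flag{0};
    int r2 = -1;

    std::thread p1([&] {
        a.store(10, std::memory_order_relaxed);                // Store (a), 10
        std::atomic_thread_fence(std::memory_order_release);   // plays the role of Fence_ww
        flag.store(1, std::memory_order_relaxed);              // Store (flag), 1
    });
    std::thread p2([&] {
        while (flag.load(std::memory_order_relaxed) == 0) { }  // L: spin on flag
        std::atomic_thread_fence(std::memory_order_acquire);   // plays the role of Fence_rr
        r2 = a.load(std::memory_order_relaxed);                // Load r2, (a)
    });

    p1.join();
    p2.join();
    return r2;
}
```

Without the two fences, the relaxed accesses alone would allow Processor 2 to observe flag == 1 and still load 0 from a.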

Backlash in the architecture community
Abandon weaker memory models in favor of SC by employing aggressive “speculative execution” tricks.
– All modern microprocessors have some ability to execute instructions speculatively, i.e., the ability to kill instructions if something goes wrong (e.g., branch prediction)
– Treat all loads and stores that are executed out of order as speculative and kill them if we determine that SC is about to be violated

Aggressive SC Implementations
Loads can complete speculatively and out of order

    Processor 1                     Processor 2
    Load r1, (flag);   (miss)       Store (a), 10;
    Load r2, (a);      (hit)

Kill Load(a) and the subsequent instructions if Store(a) commits before Load(flag) commits
Complex implementations, and still not as efficient as a weaker memory model
How to implement the checks?
– Re-execute the speculative load at commit to see if the value changes
– Search the speculative load queue for each incoming cache invalidate
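The second check, searching the speculative load queue on each incoming invalidate, can be pictured with a toy model. The class `SpeculativeLoadQueue` and its methods are hypothetical names, and the squash policy is deliberately conservative: the oldest matching uncommitted load is killed together with everything younger. Real pipelines track far more state.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: loads that have issued but not yet committed, oldest first.
class SpeculativeLoadQueue {
    std::vector<int> addrs_;  // address of each uncommitted load
public:
    void issue(int addr) { addrs_.push_back(addr); }

    // The oldest load commits and leaves the queue.
    void commit_oldest() {
        assert(!addrs_.empty());
        addrs_.erase(addrs_.begin());
    }

    // An invalidate for addr arrives from another processor. Kill the oldest
    // matching load and every younger load; return how many were killed.
    std::size_t on_invalidate(int addr) {
        for (std::size_t i = 0; i < addrs_.size(); ++i) {
            if (addrs_[i] == addr) {
                std::size_t killed = addrs_.size() - i;
                addrs_.resize(i);  // squashed loads must re-execute later
                return killed;
            }
        }
        return 0;  // no speculative load touched this address
    }

    std::size_t size() const { return addrs_.size(); }
};

// The slide's scenario: Load (flag) misses, Load (a) hits speculatively,
// then Processor 2's Store (a), 10 invalidates a before Load (flag) commits.
std::size_t slide_scenario() {
    const int FLAG = 0, A = 1;  // illustrative address encoding
    SpeculativeLoadQueue q;
    q.issue(FLAG);
    q.issue(A);
    return q.on_invalidate(A);  // number of loads killed
}
```

In this scenario only Load(a) is killed. Load(flag) stays in the queue because it is older than the load that matched the invalidate.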

Properly Synchronized Programs
Very few programmers do programming that relies on SC; instead, higher-level synchronization primitives are used
– Locks, semaphores, monitors, atomic transactions
A “properly synchronized program” is one where each shared writable variable is protected (say, by a lock) so that there is no race in updating the variable.
– There is still a race to get the lock
– There is no way to check if a program is properly synchronized
For properly synchronized programs, instruction reordering does not matter as long as updated values are committed before leaving a locked region.

Release Consistency
Only care about inter-processor memory ordering at thread synchronization points, not in between
Can treat all synchronization instructions as the only ordering points

    …
    Acquire(lock)   // All following loads get most recent written values
    … Read and write shared data …
    Release(lock)   // All preceding writes are globally visible before
                    // lock is freed.
    …
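The Acquire/Release pattern above maps naturally onto C++ acquire and release orderings. The following is a minimal sketch under that assumption: `SpinLock` and `locked_counter` are illustrative names, and production code would normally use `std::mutex` instead of a hand-written spinlock.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Minimal spinlock: ordering is enforced only at acquire() and release().
class SpinLock {
    std::atomic_flag locked_ = ATOMIC_FLAG_INIT;
public:
    // Acquire: accesses after this call cannot move before it.
    void acquire() {
        while (locked_.test_and_set(std::memory_order_acquire)) { }
    }
    // Release: writes before this call are visible to the next acquirer.
    void release() {
        locked_.clear(std::memory_order_release);
    }
};

// Several threads increment a plain (non-atomic) counter under the lock.
long locked_counter(int threads, int iters) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                lock.acquire();
                ++counter;  // accesses inside the region may be freely reordered
                lock.release();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Because every access to `counter` happens between an acquire and a release of the same lock, the final count equals `threads * iters`, although no ordering is imposed on accesses inside the critical section.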

Programming Language Memory Models
We have focused on ISA-level memory models and consistency issues
But most programmers never write at assembly level and rely on a language-level memory model
An open research problem is how to specify and implement a sensible memory model at the language level.
Difficulties:
– Programmer wishes to be unaware of detailed memory layout of data structures
– Compiler wishes to reorder primitive memory operations to improve performance
– Programmer would prefer not to have to develop deadlock-free locking protocols

Takeaway
High-level parallel programming should be oblivious of memory model issues.
– Programmer should not be affected by changes in the memory model
SC is too low level for a programming model. High-level programming should be based on critical sections & locks, atomic transactions, monitors, ...
ISA definition for Load, Store, Memory Fence, and synchronization instructions should
– Be precise
– Permit maximum flexibility in hardware implementation
– Permit efficient implementation of high-level parallel constructs