Duke Systems
Time, Clocks, and Consistency, and the JMM
Jeff Chase, Duke University

JMM

We're discussing consistency in the context of the Java Memory Model [Manson, Pugh, Adve, POPL '05]. The question: what is the right memory abstraction for multithreaded programs?
– Admits an efficient machine implementation.
– Admits compiler optimizations.
– (Maximizes allowable concurrency.)
– Runs correct programs correctly.
– Conforms to the Principle of No Unreasonable Surprises for incorrect programs. Or: "no wild shared memory."
– (Is easy for programmers to reason about.)

JMM: Three Lessons

1. Understand what it means for a program to be correct.
– Synchronize! Use locks to avoid races.
2. Understand the memory model and its underpinnings for correct programs.
– Happens-before, clocks, and all that.
– Expose synchronization actions to the memory system.
– Synchronization order induces a happens-before order.
3. Understand the need for a rigorous definition of the memory model for unsafe programs.
Since your programs are correct, and we aren't writing an optimizing compiler, we can "wave our hands" at the details.
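As a concrete illustration of Lesson 1 (a sketch of mine, not code from the deck; the class and method names are hypothetical), here is a minimal Java class with a racy increment and its lock-based fix:

```java
public class Counter {
    private int value = 0;  // shared state

    // UNSAFE: concurrent calls race on 'value'. The read-modify-write
    // is not atomic, so increments can be lost, and the JMM gives no
    // visibility guarantee for the racy accesses.
    public void unsafeIncrement() {
        value = value + 1;
    }

    // SAFE: the intrinsic lock serializes the increments, and under the
    // JMM each unlock happens-before the next lock, so every thread
    // sees the previous update.
    public synchronized void safeIncrement() {
        value = value + 1;
    }
}
```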

Concurrency and time

Multicore systems and distributed systems have no global linear time.
– Nodes (or cores) "see" different subsets of events.
– Events: messages, shared-data updates, inputs, outputs, synchronization actions.
– Some events are concurrent, and the nodes that do see them may see them in different orders.
If we want global linear time, we must make it.
– Define words like "before", "after", and "later" carefully.
– Respect "natural" ordering constraints.

Concurrency and time

[Figure: two parallel timelines labeled A and B.]
A and B are cores, threads, networked nodes, processes, or clients. Each executes in a logical sequence; time goes to the right. Occasionally one of them generates an event that is visible to the other (e.g., a message or a write to memory). Consistency concerns the order in which participants observe such events. Some possible orders "make sense" and some don't. A consistency model defines which orders are allowable. Multicore memory models and the JMM are examples of this concept.

Concurrency and time

[Figure: three timelines labeled A, B, and C.]
What do these words mean: after? last? subsequent? eventually?
Time, Clocks, and the Ordering of Events in a Distributed System, Leslie Lamport, CACM 21(7), July 1978.

Same world, different timelines

Which of these happened first?
[Figure: timelines A and B with events e1a, e1b, e2, e3a, e3b, and e4. A's event e1a is the write W(x)=v; A then sends the message "Event e1a wrote W(x)=v"; B receives it and later reads R(x).]
e1a is concurrent with e1b. e3a is concurrent with e3b and with e4. This is a partial order of events.

Lamport happened-before (→)

[Figure: timelines A, B, and C.]
1. If e1 and e2 are in the same process/node, and e1 comes before e2, then e1 → e2.
– Also called program order.
Time, Clocks, and the Ordering of Events in a Distributed System, Leslie Lamport, CACM 21(7), July 1978. Over 8,500 citations!

Lamport happened-before (→)

[Figure: timelines A, B, and C.]
2. If e1 is a message send, and e2 is the corresponding receive, then e1 → e2.
– The receive is "caused" by the send event, which happens before the receive.
Time, Clocks, and the Ordering of Events in a Distributed System, Leslie Lamport, CACM 21(7), July 1978.

Lamport happened-before (→)

[Figure: timelines A, B, and C.]
3. → is transitive: happened-before is the transitive closure of the relation defined by rules #1 and #2. It captures potential causality.
Time, Clocks, and the Ordering of Events in a Distributed System, Leslie Lamport, CACM 21(7), July 1978.

Lamport happened-before (→)

[Figure: timelines A, B, and C.]
Two events are concurrent if neither happens-before the other.
Time, Clocks, and the Ordering of Events in a Distributed System, Leslie Lamport, CACM 21(7), July 1978.

Significance of happens-before

Happens-before defines a partial order of events.
– It is based on some notion of a causal event, e.g., a message send.
– These events capture causal dependencies in the system.
In general, execution orders/schedules must be "consistent with" the happens-before order.
Key point of the JMM and multicore memory models: synchronization accesses are causal events!
– The JMM preserves happens-before with respect to lock/unlock.
– Multithreaded programs that use locking correctly see a consistent view of memory.

Thinking about data consistency

Let us choose a total (sequential) order of data accesses at the storage service.
– Sequential schedules are easy to reason about, e.g., we know how reads and writes should behave.
– R(x) returns the "last" W(x)=v in the schedule.
A data consistency model defines required properties of the total order we choose.
– E.g., we require the total order to be consistent with the "natural" partial order (→).
– The application might perceive an inconsistency if the ordering violates →; otherwise the choice is not detectable.
Some orders are legal in a given consistency model, and some are not.

Clocks: a quick overview

Logical clocks (Lamport clocks) number events according to happens-before (→).
– If e1 → e2, then L(e1) < L(e2).
– No relation is defined if e1 and e2 are concurrent.
Vector clocks label events with a vector V, where V(e)[i] is the logical clock of the latest event e1 in node i such that e1 → e.
– V(e2) dominates V(e1) iff e1 → e2.
– Two events are concurrent iff neither vector clock dominates the other.
– You'll see this again…
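The deck states these rules abstractly; as a hedged sketch (not from the slides, with hypothetical class and method names), here is a minimal Lamport clock in Java:

```java
import java.util.concurrent.atomic.AtomicLong;

// A minimal Lamport logical clock. Local events and message sends tick
// the clock; a receive jumps past the sender's timestamp, so that
// e1 -> e2 implies L(e1) < L(e2) for the send/receive rule.
public class LamportClock {
    private final AtomicLong time = new AtomicLong(0);

    // Called for a local event, or just before sending a message;
    // the returned value timestamps the event.
    public long tick() {
        return time.incrementAndGet();
    }

    // Called when a message arrives, carrying the sender's timestamp.
    public long onReceive(long senderTime) {
        return time.updateAndGet(local -> Math.max(local, senderTime) + 1);
    }
}
```

A vector clock generalizes this by keeping one such counter per node and merging element-wise maxima on receive, which is what makes concurrency (neither vector dominating the other) detectable.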

Same world, unified timelines?

[Figure: an "external witness" timeline merges A's and B's events (e1a, e1b, e2, e3a, e3b, e4, e5), including W(x)=v and R(x), into one sequence.]
This is a total order of events, also called a sequential schedule. It allows us to say "before", "after", etc. But it is arbitrary.

Same world, unified timelines?

[Figure: another "external witness" total ordering of the same events.]
Here is another total order of the same events. Like the last one, it is consistent with the partial order: it does not change any existing orderings; it only assigns orderings to events that are concurrent in the partial order.

Concurrency and time: restated

"Global linear time" means a total order of events.
– Also known as a sequential schedule.
But concurrency means a partial order of events.
– Timelines correspond only with respect to the events they happen to observe together.
– E.g., the partial order induced by happened-before.
A common trick: define a total order anyway!
– Sequential schedules are easy to reason about!
– Consider the set of total orders that are consistent with happened-before; pick one (or a subset).

Storage with data objects

[Figure: clients A and C issue W(x)=v, W(y)=u, and R(x) against a storage service SS; the result of the read is marked "??".]
How should we think about time and event orderings for shared-data updates to a storage service (SS)? Which of these happened first?

Storage with independent data objects

[Figure: A writes W(x)=v and W(y)=u to SS; C reads R(y), which returns u, and then R(x), whose result is marked "??".]
How should we think about time and event orderings for shared-data updates to a storage service (SS)? Which of these happened first?

Example: sequential consistency

[Figure: processors P1 and P2 issue W(x)=v, W(y)=u, and R(x) (returning v) against a memory M; the operations of each processor are ordered.]
For all of you architects out there…
Sequential consistency model [Lamport79]:
– Memory/SS chooses a global total order for each cell.
– Operations from a given P are in program order.
– (Enables use of lock variables for mutual exclusion.)

1979: An early understanding of multicore memory consistency. Also applies to networked storage systems.

Sequential consistency is too strong!

Sequential consistency requires the machine to do a lot of extra work that might be unnecessary. The machine must make memory updates by one core visible to others, even if the program doesn't care, and it must do some of that work even if no other core ever references the updated location!
Can a multiprocessor with a weaker ordering than sequential consistency still execute programs correctly? Answer: yes. Modern multicore systems allow orderings that are weaker, but that still respect the happens-before order induced by synchronization (lock/unlock).

Memory ordering

Shared memory is complex on multicore systems. Does a load from a memory location (address) return the latest value written to that location by a store? What does "latest" mean in a parallel system?
[Figure: threads T1 and T2 share a memory M; T1 issues W(x)=1 and W(y)=1; T2 reads R(y), returning 1, and R(x), returning 1.]
It is common to presume that load and store ops execute sequentially on a shared memory, and that a store is immediately and simultaneously visible to loads at all other threads. But that is not so on real machines.

Memory ordering

A load might fetch from the local cache and not from memory. A store may buffer a value in a local cache before draining it to memory, where other cores can access it. Therefore, a load by one core does not necessarily return the "latest" value written by a store from another core.
[Figure: the same scenario, but T2's reads R(y) and R(x) may each return 0??.]
A trick called Dekker's algorithm supports mutual exclusion on multicore without using atomic instructions. It assumes that load and store ops on a given location execute sequentially. But they don't.
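The slides name Dekker's algorithm without showing it; below is one common rendering, sketched in Java for two threads. This is illustrative, not from the deck. The volatile qualifiers are the crux: without them the JMM permits exactly the reorderings described above, and mutual exclusion fails.

```java
// Dekker's algorithm for two threads, using only loads and stores.
// Correct only because 'volatile' forces sequentially consistent
// behavior for these variables; with plain fields it is broken.
public class Dekker {
    private volatile boolean wants0 = false;
    private volatile boolean wants1 = false;
    private volatile int turn = 0;

    // Entry protocol for thread 0; thread 1 is symmetric with the
    // roles of wants0/wants1 and the turn values swapped.
    public void lock0() {
        wants0 = true;
        while (wants1) {            // the other thread also wants in
            if (turn != 0) {
                wants0 = false;     // back off until it is our turn
                while (turn != 0) { /* spin */ }
                wants0 = true;
            }
        }
    }

    public void unlock0() {
        turn = 1;                   // hand priority to the other thread
        wants0 = false;
    }
}
```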

"Sequential" memory ordering

A machine is sequentially consistent iff:
– memory operations (loads and stores) appear to execute in some sequential order on the memory, and
– ops from the same core appear to execute in program order.
No sequentially consistent execution can produce the result below, yet it can occur on modern machines.
[Figure: T1 issues W(x)=1 (op 1) then R(y) (op 2), which returns 0??; T2 issues W(y)=1 (op 3) then R(x) (op 4), which also returns 0??.]
To produce this result, each read must precede the other thread's write (2 < 3 and 4 < 1). No such schedule can exist unless it also reorders the accesses from T1 or T2, and then the reordered accesses are out of program order.
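A hedged Java rendering of this litmus test (my reconstruction, not code from the deck): with plain fields the JMM allows the all-zeros outcome; declaring x and y volatile forbids it. A single run rarely exhibits the weak outcome; harnesses such as jcstress run it millions of times.

```java
// Store-buffering litmus test. Under sequential consistency at least
// one thread must observe the other's write, so (r1, r2) == (0, 0)
// should be impossible; with plain int fields the JMM permits it.
public class LitmusSB {
    static int x = 0, y = 0;   // try 'volatile' here to forbid (0, 0)

    public static void main(String[] args) throws InterruptedException {
        int[] r = new int[2];
        Thread t1 = new Thread(() -> { x = 1; r[0] = y; }); // W(x)=1; R(y)
        Thread t2 = new Thread(() -> { y = 1; r[1] = x; }); // W(y)=1; R(x)
        t1.start(); t2.start();
        t1.join();  t2.join();
        System.out.println("r1=" + r[0] + " r2=" + r[1]
                + ((r[0] == 0 && r[1] == 0) ? "  <-- not sequentially consistent!" : ""));
    }
}
```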

The first thing to understand about memory behavior on multicore systems

Cores must see a "consistent" view of shared memory for programs to work properly. A machine can be "consistent" even if it is not "sequential". But what does that mean?
Synchronization accesses tell the machine that ordering matters: a happens-before relationship exists. Machines always respect that.
– Modern machines work for race-free programs.
– Otherwise, all bets are off. Synchronize!
[Figure: T1 writes W(x)=1 and W(y)=1 and passes a lock to T2; T2 reads R(y), returning 1, and R(x), marked 0??.]
The most you should assume is that any memory store before a lock release is visible to a load on a core that has subsequently acquired the same lock.

What's a race?

Suppose we execute program P. The events are synchronization accesses (lock/unlock) and loads/stores on shared memory locations, e.g., x. The machine and scheduler choose a schedule S. S imposes a total order on accesses for each lock, which induces a happens-before order on all events.
Suppose there is some x with a concurrent load and store to x. (The load and store are conflicting.) Then P has a race. A race is a bug: P is unsafe.
Summary: a race occurs when two or more conflicting accesses are concurrent.
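As a hedged illustration (a sketch of mine, not from the deck; the names are hypothetical), a minimal Java program with exactly such a race:

```java
// 'done' and 'answer' are accessed by both threads with no
// synchronization: the accesses are conflicting and concurrent, so
// this program has a data race. The JMM does not guarantee the new
// thread ever sees done == true, nor that it sees answer == 42 if it
// does. Marking 'done' volatile (or locking) removes the race.
public class Racy {
    static boolean done = false;
    static int answer = 0;

    public static void main(String[] args) {
        new Thread(() -> {
            while (!done) { }             // racy read: may spin forever
            System.out.println(answer);   // may legally print 0
        }).start();
        answer = 42;                      // racy writes
        done = true;
    }
}
```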

Synchronization order

[Figure: two threads each execute mx->Acquire(); x = x + 1; mx->Release(). The purple thread's unlock/release action synchronizes-with the other thread's subsequent lock/acquire.]
An execution schedule defines a total order of synchronization events (at least on any given lock/monitor): the synchronization order. Just three rules govern synchronization order:
1. Events within a thread are ordered.
2. Mutex handoff orders events across threads: release #N happens-before acquire #N+1.
3. The order is transitive: if (A < B) and (B < C) then A < C.
Different schedules of a given program may have different synchronization orders.

Happens-before revisited

[Figure: the same two threads executing mx->Acquire(); x = x + 1; mx->Release(), with a happens-before (<) edge through the mutex handoff.]
An execution schedule defines a partial order of program events. The ordering relation (<) is called happens-before. Just three rules govern happens-before order:
1. Events within a thread are ordered.
2. Mutex handoff orders events across threads: release #N happens-before acquire #N+1.
3. Happens-before is transitive: if (A < B) and (B < C) then A < C.
Two events are concurrent if neither happens-before the other in the schedule. Machines may reorder concurrent events, but they always respect happens-before ordering.
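The slides use pseudocode (mx->Acquire()/mx->Release()); a minimal Java equivalent (a sketch, with hypothetical names) uses java.util.concurrent.locks.ReentrantLock, where each unlock() synchronizes-with the next lock():

```java
import java.util.concurrent.locks.ReentrantLock;

// The slide's increment under a mutex. Release #N happens-before
// acquire #N+1, so each thread sees the previous thread's update to x.
public class SyncOrder {
    private final ReentrantLock mx = new ReentrantLock();
    private int x = 0;

    public void increment() {
        mx.lock();          // acquire: synchronizes-with the prior unlock
        try {
            x = x + 1;      // ordered and visible across threads
        } finally {
            mx.unlock();    // release: publishes the update
        }
    }
}
```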

Quotes from the JMM paper

"Happens-before is the transitive closure of program order and synchronization order."
"A program is said to be correctly synchronized or data-race-free iff all sequentially consistent executions of the program are free of data races." [Races defined according to happens-before.]

JMM model

The "simple" JMM happens-before model:
– A read cannot see a write that happens after it.
– If a read sees a write (to an item) that happens before the read, then that write must be the last write (to that item) that happens before the read.
Augment for sane behavior for unsafe programs (loosely):
– Don't allow an early write that "depends on a read returning a value from a data race".
– An uncommitted read must return the value of a write that happens-before the read.
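A hedged Java sketch of the first two rules in action (names hypothetical): volatile accesses are synchronization actions, so once the reader observes the write to ready, the model forces it to see the last preceding write to data.

```java
// Safe publication. The volatile write to 'ready' happens-before the
// volatile read that returns true, and data = 42 is the last write to
// 'data' that happens-before that read, so the reader must see it.
public class Publish {
    static int data = 0;
    static volatile boolean ready = false;

    static void writer() {
        data = 42;              // ordinary write
        ready = true;           // volatile write (release)
    }

    static void reader() {
        if (ready) {            // volatile read (acquire)
            assert data == 42;  // guaranteed; run with java -ea
        }
    }
}
```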

The point of all that

We use special atomic instructions to implement locks. E.g., a TSL or CMPXCHG on a lock variable lockvar is a synchronization access. Synchronization accesses also have special behavior with respect to the memory system.
– Suppose core C1 executes a synchronization access to lockvar at time t1, and then core C2 executes a synchronization access to lockvar at time t2, with t1 < t2.
– Then every memory store that happens-before t1 must be visible to any load on the same location after t2.
If memory always had this expensive sequential behavior, i.e., if every access were a synchronization access, then we would not need atomic instructions: we could use Dekker's algorithm. We do not discuss Dekker's algorithm because it is not applicable to modern machines. (Look it up on Wikipedia if interested.)
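As a hedged sketch of "atomic instructions implement locks" (not from the deck; names hypothetical), a minimal spinlock built on compareAndSet, which JVMs implement with a CMPXCHG-style atomic instruction on x86:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// A test-and-set spinlock. compareAndSet is the synchronization
// access: it is atomic, and under the JMM the unlock() happens-before
// the next successful lock(), making prior stores visible.
public class SpinLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    public void lock() {
        while (!held.compareAndSet(false, true)) {
            Thread.onSpinWait();  // CPU hint while spinning (Java 9+)
        }
    }

    public void unlock() {
        held.set(false);
    }
}
```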

7.1. LOCKED ATOMIC OPERATIONS

"The 32-bit IA-32 processors support locked atomic operations on locations in system memory. These operations are typically used to manage shared data structures (such as semaphores, segment descriptors, system segments, or page tables) in which two or more processors may try simultaneously to modify the same field or flag…. Note that the mechanisms for handling locked atomic operations have evolved as the complexity of IA-32 processors has evolved….
Synchronization mechanisms in multiple-processor systems may depend upon a strong memory-ordering model. Here, a program can use a locking instruction such as the XCHG instruction or the LOCK prefix to insure that a read-modify-write operation on memory is carried out atomically. Locking operations typically operate like I/O operations in that they wait for all previous instructions to complete and for all buffered writes to drain to memory…."
This is just an example of the principle on a particular machine (IA-32); these details aren't important.

Extra slides

These remaining slides were not discussed. They deal with consistency models more generally, building on the material presented, e.g., consistency for distributed data systems. You aren't responsible for this material; these slides are left in for completeness.

Is it OK if…

R(x) does not return the "last" W(x,v)?
– If it has been "forever" since W(x,v) occurred?
– If the reader and writer are the same client?
– If the reader saw v on a previous read?

Is forever enough?

[Figure: A writes W(x)=v to SS; "forever" later, C reads R(x), which returns 0.]
Is this read result "wrong"?

Is forever enough?

[Figure: the same scenario: A writes W(x)=v; much later, C reads R(x), which returns 0.]
"Perfectly legal!" (consistent with the "natural order")

Strange reads I

[Figure: a single client A writes W(x)=v to SS, then reads R(x), which returns 0, then reads R(x) again, which returns v.]
This SS-assigned ordering violates sequential consistency and what is sometimes called read-your-writes consistency, or session consistency.

Strange reads II

[Figure: A writes W(x)=v; C reads R(x), which returns v, then reads again and gets 0.]
This SS-assigned ordering violates monotonic reads.

Is it OK if R(x) != v…

If it has been "forever" since W(x,v) occurred?
– eventual consistency
If the reader and writer are the same client?
– read-your-writes or session consistency
If the reader saw v on a previous read?
– monotonic read consistency

Is it OK if…

R(x) does not return the last W(x,v)?
– If the reader "might have heard about" W(x,v)?
– If the reader saw some later W(y,u) from the same writer to a different object y?

Is forever enough?

[Figure: as before: A writes W(x)=v; much later, C reads R(x), which returns 0.]
"Perfectly legal!" (consistent with the "natural order")

Potential causality

[Figure: A writes W(x)=v to SS and sends C the message "Event e1a wrote W(x)=v. Try it now"; C then reads R(x), which returns 0.]
If A's message "might have caused" C to R(x), shouldn't C's read be ordered after A's write?

Causal consistency

[Figure: the same scenario: A writes W(x)=v, messages C "Try it now", and C reads R(x), which returns 0.]
This SS-assigned ordering violates causal consistency. But how would the SS know? What if the message is unrelated?

Potential incoherence

[Figure: A writes W(x)=v and then W(y)=u; C reads R(y), which returns u, but R(x) returns 0.]
This SS-assigned ordering violates monotonic write consistency (and sequential consistency).

Potential incoherence

[Figure: the same scenario, where C runs IF (y==u) THEN assert(x==v).]
This ordering is called incoherent because the program might define some internal relationship between x and y.
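A hedged Java sketch of the slide's assertion (names hypothetical): the reader encodes an application invariant that the incoherent ordering breaks. In plain Java the same hazard exists because these are racy reads; making y volatile (or using a lock) restores the invariant, just as a coherent storage system would.

```java
// The invariant from the slide: a reader that sees the later write
// W(y)=u expects to see the earlier write W(x)=v as well. An ordering
// that shows y == 1 but x == 0 trips the assert (run with java -ea).
public class Invariant {
    static int x = 0, y = 0;   // shared state; imagine an SS behind them

    static void writer() {
        x = 1;                 // W(x)=v, written first
        y = 1;                 // W(y)=u, written second
    }

    static void reader() {
        if (y == 1) {          // saw the later write...
            assert x == 1 : "incoherent: saw W(y)=u but not W(x)=v";
        }
    }
}
```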

Is it OK if R(x) != v…

If the reader "might have heard about" W(x,v)?
– causal consistency
If the reader saw some later write from the same writer to a different object?
– monotonic write consistency
These might matter, and they might not, depending on the application.

"It depends"

What does it depend on? How can the storage system know whether it matters or not?
Can the clients "tell" the storage system if and when it matters, through the API?
– E.g., Yahoo! PNUTS
Are "weak" storage systems OK if the applications know what behavior to expect?
What if updates are executable method invocations, as in [BJW87]?

Is it OK if…

R(x) does not return the last W(x,v)?
– If the reader then writes to x?
[Figure: A writes W(x)=v; C reads R(x), which returns 0, and then writes W(x)=u.]

"It depends"

What does it depend on? Could a parallel program that does this ever be correct/useful?
– How could the program tell the memory/SS whether ordering matters in this case?
What about an Internet-scale service?
Note: this case "does not matter" if there is at most one writer for a shared variable.
– E.g., web publishers, DNS