A Behavioral Memory Model for the UPC Language Kathy Yelick Joint work with: Dan Bonachea, Jason Duell, Chuck Wallace.


1 A Behavioral Memory Model for the UPC Language Kathy Yelick Joint work with: Dan Bonachea, Jason Duell, Chuck Wallace

UPC Meeting, SC03: Consistency Models, Oct 17, 2003

2 Proposal for UPC Spec
Replace wording in body of spec with prose that:
- Defines a data race as: two concurrent memory operations from two different threads to the same memory location, at least one of which is a write.
- Defines a race-free program as one in which all executions of the program are free of data races (it would be nice if the user only had to worry about naïve implementations).
- States that programs will behave as if all operations from each thread execute in order if one of the following holds:
  - The program is race-free
  - The program contains no relaxed operations
- Refers readers to an appendix for programs with races.
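The data-race definition above can be sketched as a small predicate. This is only an illustration, not part of the proposal: the `Access` record and the `concurrent` flag are hypothetical, and deciding whether two operations are actually concurrent is assumed to be done elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """Hypothetical record of one memory operation."""
    thread: int      # issuing thread
    location: str    # memory location touched
    is_write: bool   # True for a write, False for a read


def is_data_race(a: Access, b: Access, concurrent: bool = True) -> bool:
    """Concurrent, different threads, same location, at least one write.

    `concurrent` is an assumption supplied by the caller; this sketch
    does not compute concurrency itself.
    """
    return (concurrent
            and a.thread != b.thread
            and a.location == b.location
            and (a.is_write or b.is_write))


w0 = Access(thread=0, location="x", is_write=True)
r0 = Access(thread=0, location="x", is_write=False)
r1 = Access(thread=1, location="x", is_write=False)

print(is_data_race(w0, r1))  # write vs. read from another thread
print(is_data_race(r0, r1))  # two reads never race
```

Under this sketch, a program would be race-free only if the predicate is false for every pair of accesses in every execution.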

3 Formalism
The appendix (or a later section) of the language reference would contain something akin to the following formalism.
- This can be done in a page or two.
In addition, it would refer to an extended report on:
- An operational ("state machine") model of the semantics
- A study of various optimization techniques and whether or not they are correct:
  - Caching (when to flush, problems with not flushing)
  - Reordering by the compiler (should be allowed on relaxed operations as long as there are no dependencies)
  - Use of non-blocking operations or weak hardware models (+ fences)

4 Behavioral Approach
Problems with operational specifications:
- Implicit assumptions about implementation strategy (e.g., caches)
- May unnecessarily restrict implementations
- Intuitive in principle, but complicated in practice
A behavioral approach:
- Based on partial and total orders
- Uses the definition of sequential consistency as a model:
  - Processor order defines a total order on each thread
  - Their union defines a partial order
  - $\exists$ a consistent total order that is correct as a serial execution
[Figure: operations on processors P0 and P1]

5 Some Basic Notation
- The set of operations is $O$; $O_t$ = the set of operations issued by thread $t$
- The set of memory operations is $M = \{m_0, m_1, \ldots\}$; $M_t$ = the set of memory operations from thread $t$
- Each memory operation has properties:
  - $\mathrm{Thread}(m_i)$ is the thread that executed the operation
  - $\mathrm{Location}(m_i)$ is the memory location involved
- Memory operations are partitioned into 6 sets, named by two letters:
  - S = Strict, R = Relaxed, P = Private (in the 1st position)
  - W = Write, R = Read (in the 2nd position)
- Some useful groups:
  - $\mathrm{Strict}(M) = SW(M) \cup SR(M)$
  - $W(M) = SW(M) \cup RW(M) \cup PW(M)$
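As a rough illustration of this notation, the six classes and the two derived groups can be modeled with sets. The `MemOp` record, its field names, and the sample operations are hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class MemOp:
    """Hypothetical memory operation m_i."""
    ident: int
    thread: int       # Thread(m_i)
    location: str     # Location(m_i)
    strictness: str   # "S" (strict), "R" (relaxed), or "P" (private)
    kind: str         # "W" (write) or "R" (read)


def ops_in_class(M, strictness, kind):
    """One of the six classes, e.g. SW(M) is ops_in_class(M, "S", "W")."""
    return {m for m in M if m.strictness == strictness and m.kind == kind}


def strict(M):
    """Strict(M) = SW(M) ∪ SR(M)."""
    return ops_in_class(M, "S", "W") | ops_in_class(M, "S", "R")


def writes(M):
    """W(M) = SW(M) ∪ RW(M) ∪ PW(M)."""
    return (ops_in_class(M, "S", "W")
            | ops_in_class(M, "R", "W")
            | ops_in_class(M, "P", "W"))


M = {
    MemOp(0, 0, "x", "S", "W"),
    MemOp(1, 0, "y", "R", "R"),
    MemOp(2, 1, "x", "R", "W"),
    MemOp(3, 1, "z", "P", "R"),
}

# The six classes are pairwise disjoint and together cover M.
classes = [ops_in_class(M, s, k) for s, k in product("SRP", "WR")]
print(len(classes), sum(len(c) for c in classes) == len(M))
```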

6 Compiler Assumption
For specification purposes, assume the code is compiled by a naïve compiler to an ISO C machine.
Real compilers may do optimizations:
- E.g., reorder, remove, or insert memory operations
- Even strict operations may be reordered with sufficient analysis (cycle detection)
- These must produce an execution whose input/output and volatile behavior is identical to that of an unoptimized program (ISO C)

7 Orderings on Strict Operations
Threads must agree on an ordering of:
- For pairs of strict accesses, it will be total:
- For a strict/relaxed pair on the same thread, they will all see the program order

8 Orderings on Local Operations
- Conflicting accesses have the usual definition
- Given a serial execution $S = [o_1, \ldots, o_n]$ defining $<_S$, let $S_t$ be the subsequence of operations issued by $t$
- $S$ conforms to program order for thread $t$ iff $S_t$ is consistent with the program text for $t$ (follows control flow)
- $S$ conforms to program dependence order for $t$ iff $\exists$ a permutation $P(S)$ such that:
  - $P(S)$ conforms to program order for $t$
  - $\forall (m_1, m_2) \in \mathrm{Conflicting}(M)$: $m_1 <_S m_2 \Leftrightarrow m_1 <_{P(S)} m_2$
- This is a bit too strong on anti-dependencies

9 UPC Consistency
An execution on $T$ threads with memory ops $M$ is UPC consistent iff:
- $\exists$ a partial order $<_{strict}$ that orients all pairs in allStrict($M$)
- And for each thread $t$, $\exists$ a total order $<_t$ on $O_t \cup W(M) \cup SR(M)$ such that:
  - $<_t$ is consistent with $<_{strict}$ (all threads agree on the ordering of strict operations)
  - $<_t$ conforms to program dependence order (local dependencies are observed)
  - $<_t$ is a correct execution (reads return most recent write values)
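The last condition, that a total order is a correct execution, can be sketched as a replay check. This is a simplified illustration, not the specification: the `(kind, location, value)` tuple encoding and the initial value of 0 are assumptions, and the sketch ignores the other two conditions.

```python
def reads_see_latest_writes(order, initial=0):
    """Replay a candidate total order and check every read.

    `order` is a list of (kind, location, value) tuples, where kind is
    "W" or "R". A read is correct if its value equals the most recent
    earlier write to that location, or `initial` if there is none.
    """
    memory = {}
    for kind, location, value in order:
        if kind == "W":
            memory[location] = value
        elif memory.get(location, initial) != value:
            return False
    return True


good = [("W", "x", 1), ("R", "x", 1), ("W", "x", 2), ("R", "x", 2)]
bad = [("W", "x", 1), ("W", "x", 2), ("R", "x", 1)]  # read sees a stale write

print(reads_see_latest_writes(good))
print(reads_see_latest_writes(bad))
```

Because each thread may build its own total order, a checker along these lines would be run once per thread, on that thread's order.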

10 Intuition on Strict Orderings
- Each thread may "build" its own total order to explain behavior
- They all agree on the strict ordering shown above in black, but different threads may see relaxed writes in different orders
  - Allows non-blocking writes to be used in implementations
- Each thread sees its own dependencies, but not those of other threads
  - Weak, but otherwise it would place consistency requirements on some relaxed operations (e.g., local cache control would be insufficient)
  - Preserving dependencies requires the usual compiler/hardware analysis
[Figure: operations on processors P0 and P1]

11 Synchronization Operations
UPC has both global and pairwise synchronization.
In addition to the synchronization properties, they also have memory model implications:
- Locks
  - upc_lock is a strict read
  - upc_unlock is a strict write
- Barriers (which may be split-phase)
  - upc_notify (begin barrier) is a strict write
  - upc_wait (end of barrier) is a strict read
  - upc_barrier = upc_notify; upc_wait

12 Alternative Models
- As specified, two relaxed writes to the same location may be viewed differently by different processors
  - Nothing forces eventual consistency (though it is likely in implementations)
  - May add this to barrier points, at least
  - So far it looks ad hoc
- Adding directionality to reads/writes seems reasonable
  - Strict reads "fence" things that follow
  - Strict writes "fence" things that precede
  - Simply replace the StrictOnThreads definition
  - Supports user-defined synchronization primitives built from strict operations

13 Some Bizarre Behavior
The following "out of thin air" behavior:
- Given shared variables x and y, both initially 0:
    t0: r1 = x; y = r1
    t1: r2 = y; x = r2
- x and y end with 42 (or any other arbitrary value)
How does this happen?
- t0 speculates that x is 42 and writes that value to y
- t1 sees 42 in y and writes it into x
- This validates t0's speculative read
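To see why this outcome is called "out of thin air", one can enumerate every interleaving that respects each thread's program order and replay it. The sketch below is an illustration only: the tuple encoding of the two threads and the helper names are hypothetical.

```python
from itertools import combinations

# Each op is (kind, destination, source); registers are thread-private.
T0 = [("load", "r1", "x"), ("store", "y", "r1")]   # r1 = x; y = r1
T1 = [("load", "r2", "y"), ("store", "x", "r2")]   # r2 = y; x = r2


def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's own order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        slot_set = set(slots)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slot_set else next(ib) for i in range(n)]


def run(schedule):
    """Execute one interleaving serially; return the final (x, y)."""
    shared = {"x": 0, "y": 0}
    regs = {}
    for kind, dst, src in schedule:
        if kind == "load":
            regs[dst] = shared[src]
        else:
            shared[dst] = regs[src]
    return shared["x"], shared["y"]


outcomes = {run(s) for s in interleavings(T0, T1)}
print(outcomes)  # {(0, 0)}
```

No serial interleaving ever produces 42, so the value can only be explained by speculation of the kind described on the slide.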

14 Atomicity Issues
- Atomicity: is there a word size (or type) such that a write of anything larger is defined as a set of word-sized operations (so a user might see a partial update)?
  - E.g., is writing a struct the same as writing each field (or some maximum size)?
- Tearing: is there a word size (or type) such that two writes to the same location can result in a merged value?
- Clobbering: is there a word size (or type) such that, if something smaller is written, it might clobber writes to a neighboring value?
  - E.g., two processors write to two consecutive bytes in an array; each processor does a read-modify-write, and one write can be lost
- Conflicts: on what size are these defined?
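The clobbering example can be simulated in a few lines. This is a toy model under stated assumptions: the two bytes are assumed to share one hardware word, each byte write is assumed to be a whole-word read-modify-write, and the function names and byte values are hypothetical.

```python
def patch_byte(word, index, value):
    """Read-modify-write step: copy the whole word, then change one byte."""
    updated = list(word)
    updated[index] = value
    return updated


def racy_byte_writes():
    """Both processors read the word before either writes it back."""
    memory = [0x00, 0x00]                      # two consecutive bytes, one word
    seen_by_p0 = list(memory)                  # P0 reads the word
    seen_by_p1 = list(memory)                  # P1 reads before P0 writes back
    memory = patch_byte(seen_by_p0, 0, 0xAA)   # P0 writes back
    memory = patch_byte(seen_by_p1, 1, 0xBB)   # P1 writes back a stale byte 0
    return memory


def serialized_byte_writes():
    """Each read-modify-write completes before the next one starts."""
    memory = [0x00, 0x00]
    memory = patch_byte(memory, 0, 0xAA)
    memory = patch_byte(memory, 1, 0xBB)
    return memory


print(racy_byte_writes())        # [0, 187]
print(serialized_byte_writes())  # [170, 187]
```

In the racy schedule P0's write to byte 0 is lost even though the two processors never wrote to the same byte, which is why the size on which conflicts are defined matters.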

15 UPC Bulk Operation Semantics
- Are upc_memput, upc_memget, and upc_memcpy relaxed or strict?
- If relaxed, then the user can get strict behavior by putting a strict operation (or operations, in the nonsymmetric case) before and after
- Will this be surprising to users?
- What do current implementations do?

16 UPC Fence Operations
Should UPC have separate functions for:
- read fence: prevents memory operations from moving before it
- write fence: prevents memory operations from moving after it
Or let the programmer build these by doing a strict read/write to some otherwise unused variable?

17 Future Plans
- Show that various implementations satisfy this spec:
  - Use of non-blocking writes for relaxed writes, with a write fence/sync at strict points
  - Compiler-inserted prefetching of relaxed reads
  - Compiler-inserted "message vectorization" to aggregate a set of small operations into one larger one
  - A software caching implementation with cache flushes at strict points
- Develop an operational model and show equivalence (or at least that it implements the spec)
- Define the data unit of atomicity:
  - Fundamental unit of interleaving, data tearing, conflicts

18 Properties of UPC Consistency
- A program containing only strict operations is sequentially consistent
- A program that produces only race-free executions is sequentially consistent
- A UPC consistent execution of a program is race-free if, for all threads $t$ and all enabling orderings $<_t$, the following holds for all potential races:
  - If $m_1 <_t m_2$ then $\exists$ synchronization operations $o_1, o_2$ such that $m_1 <_t o_1 <_t o_2 <_t m_2$, $\mathrm{Thread}(o_1) = \mathrm{Thread}(m_1)$, $\mathrm{Thread}(o_2) = \mathrm{Thread}(m_2)$, and either
    - $o_1$ is upc_notify and $o_2$ is upc_wait, or
    - $o_1$ is upc_unlock and $o_2$ is upc_lock on the same lock variable

