Presentation is loading. Please wait.

Presentation is loading. Please wait.

Concurrent Executions on Relaxed Memory Models Challenges & Opportunities for Software Model Checking Rajeev Alur University of Pennsylvania Joint work.

Similar presentations


Presentation on theme: "Concurrent Executions on Relaxed Memory Models Challenges & Opportunities for Software Model Checking Rajeev Alur University of Pennsylvania Joint work."— Presentation transcript:

1 Concurrent Executions on Relaxed Memory Models Challenges & Opportunities for Software Model Checking Rajeev Alur University of Pennsylvania Joint work with Sebastian Burckhardt and Milo Martin SPIN Workshop, July 2007

2 Model Checking Concurrent Software Multi-threaded Software Shared-memory Multiprocessor Concurrent Executions Bugs

3 Concurrency on Multiprocessors Output not consistent with any interleaved execution!  can be the result of out-of-order stores  can be the result of out-of-order loads  improves performance x = 1 y = 1 print y print x thread 1thread 2 → 1 → 0 Initially x = y = 0

4 Architectures with Weak Memory Models  A modern multiprocessor does not enforce global ordering of all instructions for performance reasons  Each processor has pipelined architecture and executes multiple instructions simultaneously  Each processor has a local cache, and loads/stores to local cache become visible to other processors and shared memory at different times  Lamport (1979): Sequential consistency semantics for correctness of multiprocessor shared memory (like interleaving)  Considered too limiting, and many “relaxations” proposed  In theory: TSO, RMO, Relaxed …  In practice: Alpha, Intel IA-32, IBM 370, Sun SPARC, PowerPC …  Active research area in computer architecture

5 Programming with Weak Memory Models  Concurrent programming is already hard, shouldn’t the effects of weaker models be hidden from the programmer?  Mostly yes …  Safe programming using extensive use of synchronization primitives  Use locks for every access to shared data  Compilers use memory fences to enforce ordering  Not always …  Non-blocking data structures  Highly optimized library code for concurrency  Code for lock/unlock instructions

6 Non-blocking Queue (MS’96) boolean_t dequeue(queue_t *queue, value_t *pvalue) { node_t *head; node_t *tail; node_t *next; while (true) { head = queue->head; tail = queue->tail; next = head->next; if (head == queue->head) { if (head == tail) { if (next == 0) return false; cas(&queue->tail, (uint32) tail, (uint32) next); } else { *pvalue = next->value; if (cas(&queue->head, (uint32) head, (uint32) next)) break; } delete_node(head); return true; } 1 23 headtail

7 Software Model Checking for Concurrent Code on Multiprocessors Why?: Real bugs in real code  Opportunities  10s—100s lines of low-level library C code  Hard to design and verify -> buggy  Effects of weak memory models, fences …  Challenges  Lots of behaviors possible: high level of concurrency  How to formalize and reason about weak memory models?

8 Talk Outline Motivation  Relaxed Memory Models  CheckFence: Analysis tool for Concurrent Data Types

9 Hierarchy of Abstractions Programs (multi-threaded) -- Synchronization primitives: locks, atomic blocks -- Library primitives (e.g. shared queues) Assembly code -- Synchronization primitives: compare&swap, LL/SC -- Loads/stores from shared memory Hardware -- multiple processors -- write buffers, caches, communication bus … Language level memory model Architecture level memory model

10 Shared Memory Consistency Models  Specifies restrictions on what values a read from shared memory can return  Program Order: x < p y if x and y are instructions belonging to the same thread and x appears before y  Sequential Consistency (Lamport 79): There exists a global order < of all accesses such that  < is consistent with the program order < p  Each read returns value of most recent, according to <, write to the same location (or initial value, if no such write exists)  Clean abstraction for programmers, but high implementation cost

11 Effect of Memory Model Ensures mutual exclusion if architecture supports SC memory Most architectures do not enforce ordering of accesses to different memory locations  Does not ensure mutual exclusion under weaker models Ordering can be enforced using “fence” instructions  Insert MEMBAR between lines 1 and 2 to ensure mutual exclusion 1. flag1 = 1; 2. if (flag2 == 0) crit. sect. 1. flag2 = 1; 2. if (flag1 == 0) crit. sect. thread 1 thread 2 Initially flag1 = flag2 = 0

12 Relaxed Memory Models  A large variety of models exist; a good starting point: Shared Memory Consistency Models: A tutorial IEEE Computer 96, Adve & Gharachorloo  How to relax memory order requirement?  Operations of same thread to different locations need not be globally ordered  How to relax write atomicity requirement?  Read may return value of a write not yet globally visible  Uniprocessor semantics preserved  Typically defined in architecture manuals (e.g. SPARC manual)

13 Unusual Effects of Memory Models Possible on TSO/SPARC  Write to A propagated only to local reads to A  Reads to flags can occur before writes to flags Not allowed on IBM 370  Read of A on a processor waits till write to A is complete flag1 = 1; A = 1; reg1 = A; reg2 = flag2; thread 1 thread 2 Initially A = flag1 = flag2 = 0 flag2 = 1; A = 2; reg3 = A; reg4 = flag1; Result reg1 = 1; reg3 = 2; reg2 = reg4 = 0

14 Which Memory Model?  Memory models are platform dependent  We use a conservative approximation “Relaxed” to capture common effects  Once code is correct for “Relaxed”, it is correct for many models TSO PSO IA-32Alpha Relaxed RMO 390 SC

15 Formalization of Relaxed A set of addresses V set of values X set of memory accesses S are stores L are loads a(x) memory address of x v(x) value loaded or stored by x < p is a partial order over X (program order) Correctness wrt Relaxed demands existence of a total order < over X s.t. For a load l in L, define the following set of stores that are “visible to l”: S(l) = { s in S | a(s) = a(l) and (s < l or s < p l ) } 1. If x < p y and a(x) = a(y) and y in S, then x < y 2. For l in L and s in S(l), always either v(l) = v(s) or there exists another store s’ in S(l) such that s < s’

16 Talk Outline Motivation Relaxed memory models  CheckFence: Analysis tool for Concurrent Data Types

17 concurrency libraries with lock-free synchronization... are simple, fast, and safe to use  concurrent versions of queues, sets, maps, etc.  more concurrency, less waiting  fewer deadlocks... are notoriously hard to design and verify  tricky interleavings routinely escape reasoning and testing  exposed to relaxed memory models code needs to contain memory fences for correct operation! CheckFence Focus

18 Architecture: + Multiprocessors + Relaxed memory models Concurrent Algorithms: + Lock-free queues, sets, lists Computer-Aided Verification: + Model checking C code + Counterexamples CheckFence Tool References [CAV 2006], [PLDI 2007] Burckhardt thesis

19 Non-blocking Queue The implementation  optimized: no locks.  not race-free  exposed to memory model The client program  on multiple processors  calls operations....... enqueue(1)... enqueue(2).... Processor 1....... a = dequeue() b = dequeue() Processor 2 void enqueue(int val) {... } int dequeue() {... }

20 Non-blocking Queue (MS’96) boolean_t dequeue(queue_t *queue, value_t *pvalue) { node_t *head; node_t *tail; node_t *next; while (true) { head = queue->head; tail = queue->tail; next = head->next; if (head == queue->head) { if (head == tail) { if (next == 0) return false; cas(&queue->tail, (uint32) tail, (uint32) next); } else { *pvalue = next->value; if (cas(&queue->head, (uint32) head, (uint32) next)) break; } delete_node(head); return true; } 1 23 headtail

21 Correctness Condition Data type implementations must appear sequentially consistent to the client program: the observed argument and return values must be consistent with some interleaved, atomic execution of the operations. enqueue(1) dequeue() -> 2 enqueue(2) dequeue() -> 1 enqueue(1) enqueue(2) dequeue() -> 1 dequeue() -> 2  Observation  Witness Interleaving

22 thread 1 enqueue(X) thread 2 dequeue() → Y How To Bound Executions  Verify individual “symbolic tests”  finite number of concurrent threads  finite number of operations/thread  nondeterministic input values  Example  User creates suite of tests of increasing size

23 Why symbolic test programs? 1) Make everything finite  State is unbounded (dynamic memory allocation)... is bounded for individual test  Checking sequential consistency is undecidable (AMP 96)... is decidable for individual test 2) Gives us finite instruction sequence to work with  State space too large for interleaved system model.... can directly encode value flow between instructions  Memory model specified by axioms.... can directly encode ordering axioms on instructions

24 Bounded Model Checker Pass: all executions of the test are observationally equivalent to a serial execution Fail: CheckFence Memory Model Axioms Inconclusive: runs out of time or memory

25 Tool Architecture C code Symbolic Test Trace Symbolic test gives exponentially many executions (symbolic inputs, dynamic memory allocation, ordering of instructions). CheckFence solves for “bad” executions. Memory model

26 C code Symbolic Test Trace Symbolic Test automatic, lazy loop unrolling automatic specification mining (enumerate correct observations) construct CNF formula whose solutions correspond precisely to the concurrent executions Memory model

27 thread 1 enqueue(X); enqueue(Y) thread 2 dequeue() → Z Specification Mining Possible Operation-level Interleavings enqueue(X) enqueue(Y) dequeue() -> Z enqueue(X) dequeue() -> Z enqueue(Y) dequeue() -> Z enqueue(X) enqueue(Y) For each interleaving, obtain symbolic constraint by encoding corresponding executions in SAT solver Spec is disjunction of all possibilities: Spec: (Z=X) | (Z=null) To find bugs, check satisfiability of Phi & ~ Spec where Phi encodes all possible concurrent executions

28 Encoding Memory Order  Example:  Encoding variables  Use bool vars for relative order (x<y) of memory accesses  Use bitvector variables A x and D x for address and data values associated with memory access x  Encode constraints  encode transitivity of memory order  encode ordering axioms of the memory model Example (for SC): (s1<s2) & (l1<l2)  encode value flow “Loaded value must match last value stored to same address” Example: value must flow from s1 to l1 under following conditions: ( (s1 (D s1 = D l1 ) s1 store s2 store l1 load l2 load thread 1thread 2

29

30

31

32

33 Example: Memory Model Bug Processor 1 links new node into list Processor 2 reads value at head of list --> Processor 2 loads uninitialized value... 3 node->value = 2;... 1 head = node;... 2 value = head->value;... Processor 1 reorders the stores! memory accesses happen in order 1 2 3 adding a fence between lines on left side prevents reordering 1 23 head

34 TypeDescriptionLOCSource QueueTwo-lock queue80M. Michael and L. Scott (PODC 1996) QueueNon-blocking queue98 SetLazy list-based set141Heller et al. (OPODIS 2005) SetNonblocking list174T. Harris (DISC 2001) Deque“snark” algorithm159D. Detlefs et al. (DISC 2000) LL/VL/SCCAS-based74M. Moir (PODC 1997) LL/VL/SCBounded Tags198 Studied Algorithms

35 2 known 1 unknown regular bugs Bounded Tags CAS-based fixed “snark” original “snark” Nonblocking list Lazy list-based set Non-blocking queue Two-lock queue Description Deque LL/VL/SC Deque Set Queue Type Results  snark algorithm has 2 known bugs  lazy list-based set had a unknown bug (missing initialization; missed by formal correctness proof [CAV 2006] because of hand-translation of pseudocode)

36 # Fences inserted 2 known 1 unknown regular bugs 4 1 1 2 1 Store 2 4 Load 4 2 3 1 1 Dependent Loads 4 3 6 3 2 Aliased Loads Bounded Tags CAS-based fixed “snark” original “snark” Nonblocking list Lazy list-based set Non-blocking queue Two-lock queue Description Deque LL/VL/SC Deque Set Queue Type Results  snark algorithm has 2 known bugs  lazy list-based set had a unknown bug (missing initialization; missed by formal correctness proof [CAV 2006] because of hand-translation of pseudocode)  Many failures on relaxed memory model inserted fences by hand to fix them small testcases sufficient for this purpose

37 Typical Tool Performance  Very efficient on small testcases (< 100 memory accesses) Example (nonblocking queue): T0 = i (e | d) T1 = i (e | e | d | d ) - find counterexamples within a few seconds - verify within a few minutes - enough to cover all 9 fences in nonblocking queue  Slows down with increasing number of memory accesses in test Example (snark deque): Dq = pop_l | pop_l | pop_r | pop_r | push_l | push_l | push_r | push_r - has 134 memory accesses (77 loads, 57 stores) - Dq finds second snark bug within ~1 hour  Does not scale past ~300 memory accesses

38 Conclusions  Software model checking of low-level concurrent software requires encoding of memory models  Challenge for model checking due to high level of concurrency and axiomatic specifications  Opportunity to find bugs in library code that’s hard to design and verify  CheckFence project at Penn  SAT-based bounded model checking for concurrent data types  Real bugs in real code with fences

39 Research Challenges  What’s the best way to verify C code (on relaxed memory models)?  SAT-based encoding seems suitable to capture specifications of memory models, but many opportunities for improvement  Can one develop explicit-state methods for these applications?  Language-level memory models: Can we verify Java concurrency libraries using the new new Java memory model?  Hardware support for transactional memory  Current interest in industry and architecture research  Can formal verification influence designs/standards?


Download ppt "Concurrent Executions on Relaxed Memory Models Challenges & Opportunities for Software Model Checking Rajeev Alur University of Pennsylvania Joint work."

Similar presentations


Ads by Google