# Verification for Concurrency Part 2: Incomplete techniques and bug finding.



Contents  Race detection  Context bounding and Sequentialization  Odds and ends

## Race detection

Data Races  “Two threads simultaneously access the same memory location, with at least one access being a write”  Most concurrent software is written to avoid data-races  Extremely tricky to write racy code that is correct  For racy code, correctness on one architecture and complier does not imply correctness for all  Race-detection is a very efficient light-weight bug detection technique  Dynamic techniques: Work on actual executions

## The lockset algorithm

- Lockset assumption: whenever two different threads access a location (with at least one of the accesses being a write), they both hold a common lock
- It is possible to write non-racy code that violates this assumption
- Code that violates this assumption produces false positives

## Vanilla lockset algorithm

Based on a simple locking discipline: each variable is protected by some lock.

```
for each tid: LocksHeld[tid] = ∅
for each v:   Cands[v] = set of all locks

for each instruction in trace:
    // Maintain the lockset
    if instruction = lock(l):   LocksHeld[tid] = LocksHeld[tid] ∪ {l}
    if instruction = unlock(l): LocksHeld[tid] = LocksHeld[tid] \ {l}
    // Update candidate locks
    if access to v by thread tid:
        Cands[v] := Cands[v] ∩ LocksHeld[tid]
        if Cands[v] = ∅:
            output "Potential race on v"
```
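The loop above can be sketched in Python as a pass over an explicit event list (the `(tid, op, target)` trace format is an assumption made for this sketch, not part of the original algorithm's interface):

```python
def lockset_races(trace):
    """Vanilla lockset over a trace of (tid, op, target) events,
    where op is 'lock', 'unlock', 'read' or 'write'."""
    locks_held = {}   # tid -> set of locks currently held
    cands = {}        # variable -> candidate lockset
    races = set()
    for tid, op, target in trace:
        held = locks_held.setdefault(tid, set())
        if op == 'lock':
            held.add(target)
        elif op == 'unlock':
            held.discard(target)
        else:  # read or write access to variable `target`
            # Cands[v] starts as "all locks"; intersecting with the
            # locks held at the first access gives the same result.
            cands[target] = cands.get(target, held) & held
            if not cands[target]:
                races.add(target)
    return races
```

On a trace where `x` is consistently protected by one lock but the two accesses to `y` share no lock, only `y` is flagged.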

## Vanilla lockset algorithm: example

```
Thread 1:            Thread 2:
lock(l1)
x := x + 4
unlock(l1)
                     lock(l1)
                     lock(l2)
                     x := x - y
                     y := y + 4
                     unlock(l2)
                     unlock(l1)
lock(l2)
y := y + 5
unlock(l2)
x := x - 3
```

The final access x := x - 3 holds no locks, so Cands[x] becomes empty and a potential race on x is reported.

## Vanilla lockset: false positive

```
Thread 1:            Thread 2:              Thread 3:
// Initialize
g := 1729
// Share
lock(l)
published := true
unlock(l)
                     lock(l)
                     assume(published)
                     unlock(l)
                     y := g
                     y := f(l, y)
                     output(y)
                                            lock(l)
                                            assume(published)
                                            unlock(l)
                                            z := g
                                            z := g(l, z)
                                            output(z)
```

g is initialized before it is published and read-only afterwards, so there is no race; but no common lock protects g, and vanilla lockset reports one.

## More advanced locking disciplines

- The simple locking discipline leads to false positives in many common cases:
  - Lazy initialization
  - Initialization followed by read-only access

The per-variable state machine from the slide's diagram:

- Virgin → Exclusive on the first write
- Exclusive: r/w by the first thread stays Exclusive; a read by a new thread moves to Shared; a write by a new thread moves to Shared-Modified
- Shared → Shared-Modified on a write

## More advanced locking discipline

- Works well for initialize-and-publish, lazy initialization, etc.
- Doesn't work with ownership transfer, etc.

```
for each access to v by thread tid:
    update State[v]
    if State[v] = Exclusive:
        // Do nothing
    else if State[v] = Shared:
        // Update lockset, don't report any races
        Cands[v] := Cands[v] ∩ LocksHeld[tid]
    else if State[v] = Shared-Modified:
        // Update lockset, report races
        Cands[v] := Cands[v] ∩ LocksHeld[tid]
        if Cands[v] = ∅:
            output "Potential race on v"
```
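A minimal Python sketch of this state-machine refinement, in the Eraser style. The event format and the choice to seed Cands[v] with the second thread's lockset on the Exclusive-to-Shared transition are simplifications made for this sketch:

```python
def eraser_races(trace):
    """Lockset with the Virgin/Exclusive/Shared/Shared-Modified state
    machine, over (tid, op, target) events with op in
    'lock'/'unlock'/'read'/'write'."""
    locks_held, state, owner, cands = {}, {}, {}, {}
    races = set()
    for tid, op, target in trace:
        held = locks_held.setdefault(tid, set())
        if op == 'lock':
            held.add(target); continue
        if op == 'unlock':
            held.discard(target); continue
        st = state.get(target, 'Virgin')
        if st == 'Virgin':
            state[target], owner[target] = 'Exclusive', tid
        elif st == 'Exclusive' and tid == owner[target]:
            pass  # still exclusively owned: do nothing
        else:
            if st == 'Exclusive':  # a new thread touches v
                cands[target] = set(held)
                state[target] = 'Shared' if op == 'read' else 'Shared-Modified'
            else:
                cands[target] &= held  # refine lockset
                if op == 'write':
                    state[target] = 'Shared-Modified'
            # races are reported only in the Shared-Modified state
            if state[target] == 'Shared-Modified' and not cands[target]:
                races.add(target)
    return races
```

Read-only sharing of a variable after initialization stays in the Shared state and is not reported, removing the vanilla false positive, while unprotected concurrent writes are still flagged.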

## Advanced lockset: examples

```
Thread 1:            Thread 2:              Thread 3:
// Initialize
g := 1729
// Share
lock(l)
published := true
unlock(l)
                     lock(l)
                     assume(published)
                     unlock(l)
                     y := g
                     y := f(l, y)
                     output(y)
                                            lock(l)
                                            assume(published)
                                            unlock(l)
                                            z := g
                                            z := g(l, z)
                                            output(z)
```

Here g is read-only after initialization, so it stays in the Shared state and the advanced lockset no longer reports a race.

## Advanced lockset: false positive

```
Thread 1:                     Thread 2:
...
await(Obj.owner == 1)
Obj.foo()
Obj.bar := baz()
...
Obj.owner := 2
...
                              await(Obj.owner == 2)
                              Obj.bar()
                              Obj.foo := baz()
                              ...
```

Ownership transfer: the owner flag orders all accesses to Obj, but no common lock is held, so a race is reported.

## Happens-before relation

- Proposed by Lamport in 1978 for distributed systems
- Many race-detection algorithms try to approximate the happens-before relation
- Basic idea: two events are related if and only if communication allows information flow between them
- We write e_i → e_j for "event i happens before event j"
- Informally, if e_i → e_j, then e_i happens before e_j in all variations of the trace
- There is a race if two events e_i and e_j access the same location and neither e_i → e_j nor e_j → e_i holds

## Happens-before relation: definition

- If two events are from the same thread, the earlier one happens before the later one:
  - thread(e_i) = thread(e_j) ∧ i < j ⟹ e_i → e_j
- Happens-before is transitive:
  - (e_i → e_j) ∧ (e_j → e_k) ⟹ e_i → e_k
- Every synchronization operation adds happens-before edges:
  - LOCK: if e_i is an unlock and e_j is a later lock of the same lock, then e_i → e_j
  - WAIT/NOTIFY: if e_i is a notify and e_j is the corresponding wait, then e_i → e_j
  - ...

## Happens-before: examples

```
Thread 1:                         Thread 2:
obj := new Foo()
                                  data = readFile()
Notify(obj)
                                  Wait(obj)
                                  obj.data = data
                                  Notify(obj)
Wait(obj)
lock(l)
obj.data = obj.data + 4
unlock(l)
                                  lock(l)
                                  obj.data = obj.data - 4
                                  unlock(l)
...
```

## Computing happens-before: vector clocks

- The full happens-before relation is usually very expensive to compute
- Few dynamic techniques actually compute the full relation
- Classical method proposed by Lamport himself
- Vector clocks: with each event, associate a "vector clock" storing, for each other thread, the last event from that thread that affects it
- e_i → e_j if and only if VC[e_j][thread(e_i)] ≥ i

## Vector clocks

VC[e] = [e_1, e_2, e_3, ..., e_n], where the k-th entry is the last relevant event from thread k.

```
// Regular events
VC[e][thread(e)] := e
VC[e][tid] := VC[prev(e)][tid]                 // for all other tid

// Acquire lock (LVC is the lock's vector clock)
VC[e][tid] := max(VC[prev(e)][tid], LVC[tid])

// Release lock
VC[e][tid] := VC[prev(e)][tid]
LVC[tid] := max(VC[e][tid], LVC[tid])
```

- Can be extended to other synchronization primitives!
- e_i → e_j if and only if VC[e_j][thread(e_i)] ≥ i
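The update rules above can be sketched with explicit per-thread and per-lock clocks (the class and method names here are invented for this sketch); checking e_i → e_j then becomes a pointwise comparison of event timestamps:

```python
from collections import defaultdict

class VectorClockTracker:
    """Minimal vector-clock bookkeeping over a lock-based trace."""
    def __init__(self, nthreads):
        self.C = [[0] * nthreads for _ in range(nthreads)]  # per-thread clocks
        self.L = defaultdict(lambda: [0] * nthreads)        # per-lock clocks

    def _tick(self, t):
        self.C[t][t] += 1
        return list(self.C[t])  # timestamp of this event

    def access(self, t):   # read/write: just advance t's own component
        return self._tick(t)

    def acquire(self, t, l):  # join the lock's clock into the thread's
        self.C[t] = [max(a, b) for a, b in zip(self.C[t], self.L[l])]
        return self._tick(t)

    def release(self, t, l):  # publish the thread's clock to the lock
        ts = self._tick(t)
        self.L[l] = list(self.C[t])
        return ts

def happens_before(ts1, ts2):
    """e1 -> e2 iff VC(e1) <= VC(e2) pointwise (and e1 != e2)."""
    return ts1 != ts2 and all(a <= b for a, b in zip(ts1, ts2))
```

For example, a write by thread 0 followed by unlock(l), lock(l), and a write by thread 1 is ordered, while a later unsynchronized access by thread 0 is concurrent with (i.e. races with) thread 1's write.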

## Vector clocks: examples

```
Thread 1:                         Thread 2:
T1_0: obj := new Foo()
                                  T2_0: data = readFile()
T1_1: Notify(obj)
                                  T2_1: Wait(obj)
                                  T2_2: obj.data = data
                                  T2_3: Notify(obj)
T1_2: Wait(obj)
T1_3: lock(l)
T1_4: obj.data = obj.data + 4
T1_5: unlock(l)
                                  T2_4: lock(l)
                                  T2_5: obj.data = obj.data - 4
                                  T2_6: unlock(l)
...
```

## What does HB miss?

- Every race (or false race) reported by happens-before-based methods is also reported by lockset-based methods
  - Fewer false positives, but potential false negatives
- Why? Not every happens-before edge is a true synchronization

```
Thread 1:            Thread 2:
y = y + 1
lock(l)
x = x + 1
unlock(l)
                     lock(l)
                     x = x + 1
                     unlock(l)
                     y = y + 1
```

In this trace the unlock/lock pair happens to order the two accesses to y, so HB reports no race; yet in a schedule where Thread 2 acquires the lock first, the accesses to y race.

## Race detection: summary

- First line of defence for most concurrent programs; many bugs show up as race conditions
- Lockset is fast, but produces lots of false positives
- Happens-before is slow and reports only true data races, but has potential false negatives
- There are hybrid techniques that compute approximations of both lockset and HB

## Context bounding and sequentialization

Context bounding  Folk knowledge: Most concurrency bugs are shallow in terms of required context-switches  Most bugs require very few bug fixes  Most concurrency bugs are atomicity violations or order violations  For an empirical study, see Shan Lu et al. 2006…2008  Why not check concurrent programs only up to a few context switches?  Much more efficient

## CHESS: systematic exploration

- Culmination of techniques proposed by Qadeer et al. starting in 2004
- Correctness is primarily given by assertions in the code
  - Can also use monitors
  - Can detect data races, deadlocks, etc.
- Main idea: use a scheduler that explores traces of the program deterministically, prioritizing traces with few context switches

## CHESS: controlling the scheduler

- Sources of non-determinism:
  - Input
  - Scheduling
  - Timing and libraries
- Input non-determinism is controlled by specifying fixed inputs
- Scheduling non-determinism is controlled by writing a deterministic scheduler
- Library non-determinism: model the library code

## State-space explosion

- Exploring k steps in each of n threads: the number of executions is O(n^(nk))
- Exploring k steps in each thread, but only c context switches: the number of executions is O((n²k)^c · n!)
  - Not exponential in k
  - Remember c spots for context switches, then permute the n + c atomic blocks
- Additionally, the scheduler can use only a polynomial amount of space

```
Thread 1:        ...        Thread n:
x = 1                       x = 1
...                         ...
y = k                       y = k
```

## Scheduling: picking pre-emption points

```
void Deposit100() {
    ChessSchedule(); EnterCriticalSection(&cs);
    balance += 100;
    ChessSchedule(); LeaveCriticalSection(&cs);
}

void Withdraw100() {
    int t;
    ChessSchedule(); EnterCriticalSection(&cs);
    t = balance;
    ChessSchedule(); LeaveCriticalSection(&cs);
    ChessSchedule(); EnterCriticalSection(&cs);
    balance = t - 100;
    ChessSchedule(); LeaveCriticalSection(&cs);
}
```

- Heuristics: more pre-emption points in critical code, etc.
- Coverage guarantee: once all schedules with 2 context switches are explored, every remaining bug requires at least 3 context switches
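The lost-update bug in this bank example can be found by enumerating interleavings of atomic blocks. The toy explorer below brute-forces every interleaving (CHESS would additionally bound and prioritize by pre-emption count); each thread is a list of step functions over a shared state dict, with the local t modelled as a shadow field, all of which are modelling choices made for this sketch:

```python
def explore(threads, state0):
    """Enumerate all interleavings of per-thread lists of atomic
    steps and collect the possible final balances."""
    results = set()
    def go(state, pcs):
        if all(pc == len(t) for pc, t in zip(pcs, threads)):
            results.add(state['balance'])
            return
        for i, t in enumerate(threads):
            if pcs[i] < len(t):
                s = dict(state)
                t[pcs[i]](s)  # run thread i's next atomic step
                go(s, tuple(pc + 1 if j == i else pc
                            for j, pc in enumerate(pcs)))
    go(dict(state0), (0,) * len(threads))
    return results

# Deposit100 / Withdraw100 from the slide, one atomic step per
# critical section:
deposit = [lambda s: s.update(balance=s['balance'] + 100)]
withdraw = [lambda s: s.update(t=s['balance']),
            lambda s: s.update(balance=s['t'] - 100)]
```

The buggy final balance of -100 requires a pre-emption between Withdraw100's two critical sections, i.e. a single extra context switch, which is why context bounding finds it cheaply.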

## CHESS: summary

- Builds a deterministic scheduler
  - Complications: fairness and livelocks, weak memory models
- Advantages:
  - Runs real code on real systems; only the scheduler has been replaced
- Disadvantages:
  - Mostly program-agnostic
  - Exhaustive testing

Sequentialization  CHESS approach: Concurrent program + bound on context switches  explore all interleavings  General sequentialization approach: Concurrent program + bound on context switches  Sequential program  Then, verify sequential program using your favourite verification technique  Many flavours of context-bounded analysis:  PDS based (Qadeer et al.)  Transformation based sequentialization: Eager, Lazy (Lal et al.)  BMC based (Parlato et al.)

## Sequentialization: basic idea

- What is hard about sequentialization?
  - Have to remember local variables across phases (even though they don't change)
  - If exploring T1 → T2 → T1, we have to remember the locals of T1 across the phase of T2
- Lal-Reps 2008: instead, do a source-to-source transformation
  - Copy each statement and global variable c times
  - Now we can explore T1 → T1 → T2 instead of T1 → T2 → T1
  - Only one thread's local variables are relevant at each stage

## Sequentialization: basic idea

- Replace each global variable X by X[tid][0..K]
- X[tid][i] represents the value of the global variable X the i-th time thread tid is scheduled

```
// X := X + 1 becomes:
if (phase = 0)      X[tid][0] := X[tid][0] + 1
else if (phase = 1) X[tid][1] := X[tid][1] + 1
...
else if (phase = K) X[tid][K] := X[tid][K] + 1

// Non-deterministic context switch:
if (phase < K && *) phase++
if (phase == K + 1) { phase = 1; Thread[tid+1]() }
```

Sequentialization: Basic idea  A program (T1||T2) is rewritten into Seq(T1); Seq(T2); check()  Roughly,  Execute each thread sequentially  But, at random points, guess new values for global variables  In the end, check the guessed new values are consistent for phase = 0 to K if (phase > 0) assume (X[0][phase] == X[N][phase – 1] for tid = 1 to N assume (X[tid][phase] == X[tid-1][phase])

## Sequentialization

```
Thread 0:                      Thread 1:
...                            ...
X[0][0] := X[0][0] + 1         X[1][0] := X[1][0] + 1
...                            ...
X[0][1] := X[0][1] + 1         X[1][1] := X[1][1] + 1
...                            ...
X[0][2] := X[0][2] + 1         X[1][2] := X[1][2] + 1
...                            ...
```

Each green arrow in the slide's diagram, linking a phase of Thread 0 to a phase of Thread 1, is one part of the consistency check!

Sequentialization  The original Lal/Reps technique uses summarization for verification of the sequential program  Compute summaries for the relation of initial and final values of global variables  Extremely powerful idea  Advantage: Reduces the need to reason about locals of different threads  No need to reason explicitly about interleavings  Interleavings encoded into data (variables)  Scales linearly with number of threads

## Sequentialization and BMC

- Currently, the best tools in the concurrency verification competitions use "sequentialization + BMC"
- The previous sequentialization technique is better suited to analysis techniques than to model checking
  - There is no additional advantage in introducing extra globals and then checking for consistency
  - Instead, just use non-determinism explicitly

## BMC for concurrency

- First, rewrite the threads by unrolling loops and inlining function calls
  - No loops, no function calls, forward-only control flow
- Write a driver "main" function to schedule the threads one by one

## Naïve sequentialization for BMC

```
// Main driver:
pc0 = 0, ..., pcn = 0
main() {
    for (r = 0; r < K; r++)
        for (i = 0; i < n; i++)
            thread_i();
}

thread_i():
    switch (pc_i) { case 0: goto 0; case 1: goto 1; ... }
    0: CS(0); stmt0;
    1: CS(1); stmt1;
    ...
    M: CS(M); stmtM;

CS(j) := if (*) { pc_i = j; return; }
```

- The resume mechanism jumps to the "right" spot in the thread
- There is a potential context switch before each statement
- What's the problem? Lots of jumps in the control flow, which is bad for SMT encoding

## Better sequentialization for BMC

```
// Main driver:
pc0 = 0, ..., pcn = 0
main() {
    for (r = 0; r < K; r++)
        for (i = 0; i < n; i++) {
            nextCS = *
            assume(nextCS >= pc_i)
            thread_i();
            pc_i = nextCS
        }
}

thread_i():
    0: CS(0); stmt0;
    1: CS(1); stmt1;
    ...
    M: CS(M); stmtM;

CS(j) := if (j = nextCS) { goto j+1; }
```

- Avoids the multiple control-flow-breaking jumps
- Non-determinism is restricted to one spot

## Context bounding and sequentialization: summary

- A host of related techniques
  - Can be adapted for analysis, model checking, testing, etc.
  - Different techniques need different kinds of tuning
- Basic idea: most bugs require only a few context switches to turn up
- Can leverage standard sequential program-analysis techniques

## Odds and ends

Things we didn't cover.

## Specification-free correctness

- In many cases we don't want to write assertions
  - We just want the concurrent program to do the same thing as some sequential program
- Standard correctness conditions:
  - Linearizability [Herlihy/Wing 91]
  - Serializability [Papadimitriou and others, 70s]

The slide's diagram maps a concurrent execution of Methods 0-3 to an equivalent sequential execution of the same methods.

## Testing for concurrency

- Root causes of bugs:
  - Ordering violations
  - Atomicity violations
  - Data races
- Coverage metrics and coverage-guided search:
  - Define-use pairs [Tasiran et al.]: find ordering violations based on define-use orderings
  - HaPSet [Wang et al.]: find interesting interleavings by trying to cover all "immediate histories" of events
  - CUTE/jCUTE [Sen et al.]: concolic testing, accumulating constraints along a test run to guide future test runs

## (Symbolic) Predictive analysis

- Analyze variations of a given concurrent trace:
  - Run a test and record information
  - Build a predictive model by relaxing scheduling constraints
  - Analyze the predictive model for alternate interleavings
  - Can flag false bugs
- Symbolic predictive analysis:
  - From a trace, build a precise predictive model (as an SMT formula)
  - No false bugs

## This is the end

- A brief overview of concurrent verification techniques
  - Lecture 1: full proof techniques
  - Lecture 2: incomplete techniques / bug finding
- What did we learn?
  - Full verification is hard, and there are not many techniques for weak-memory architectures
  - Use lightweight, incomplete techniques to detect shallow bugs
  - Code using a strict concurrency discipline is more likely to be correct, and easier to verify

