Concurrency Checking with CHESS: Learning from Experience
Tom Ball, Sebastian Burckhardt, Chris Dern, Madan Musuvathi, Shaz Qadeer

Outline
What is CHESS?
– a testing tool, plus
– a test methodology (concurrency unit tests)
– a platform for research and teaching
CHESS design decisions
Learnings from CHESS user forum, champions

What is CHESS?
CHESS is a user-mode scheduler
Controls all scheduling nondeterminism
– Hijacks scheduling control from the OS
Guarantees:
– Every run takes a different thread schedule
– The schedule of every run can be reproduced

Concurrency Unit Tests
"Generally, in our test environment, we want to test what we call scenarios. A scenario might be a specific feature or API usage. In my case I am trying to test the scenario of a user canceling a command execution on a different thread."
Steve Hale, Microsoft

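A Concurrency Unit Test Pattern: Fork-Join

    void ForkJoinTest() {
        // Fork: S1 and S2 stand for the two concurrent operations under test.
        var t1 = new Thread(() => { S1 });
        var t2 = new Thread(() => { S2 });
        t1.Start();
        t2.Start();
        // Join: wait for both threads before checking the outcome.
        t1.Join();
        t2.Join();
        Debug.Assert(...);
    }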

Concurrency Unit Tests
Small scope hypothesis
– For most bugs, there exists a short-running scenario with only a few threads that can find it
Unit tests provide
– Better coverage of schedules
– Easier debugging, regression, etc.

CHESS as Research/Teaching Platform
http://research.microsoft.com/chess/
Source code release
– chesstool.codeplex.com
Courseware with CHESS
– Practical Parallel and Concurrent Programming
– coming this fall!
Preemption bounding [PLDI07]
– speed search for bugs
– simple counterexamples
Fair stateless exploration [PLDI08]
– scales to large programs
Architecture [OSDI08]
– Tasks and SyncVars
– API wrappers
Store buffer simulation [CAV08]
Preemption sealing [TACAS10]
– orthogonal to preemption bounding
– where (not) to search for bugs
Best-first search [PPoPP10]
Automatic linearizability checking [PLDI10]
More features
– Data race detection
– Partial order reduction
– More monitors…

CHESS Design Decisions
Stateless state space exploration
No change to underlying scheduler
Ability to enumerate all/only feasible schedules
Schedule points = synchronization points, with race detection to make up the difference
Serialize concurrent behavior
Suite of search/reduction strategies
– preemption bounding, sealing
– best-first search
Monitor API to easily add new checking capability

Stateless model checking [Verisoft]
Given a program with an acyclic state space
Systematically enumerate all paths
Don't capture program states
Not necessary for termination
Precisely capturing states is hard and expensive
At the cost of potentially revisiting states
Partial-order reduction alleviates redundant exploration
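A minimal Python sketch of stateless exploration, applied to the deck's bank-balance running example. This is not CHESS code: the names `make_program`, `run`, and `explore`, and the concrete values 100, 10, and 30, are assumptions made for illustration. Each critical section is modeled as one atomic step. No program state is ever saved; every schedule prefix is re-executed from the initial state.

```python
def make_program():
    """Build a fresh copy of the program: shared state plus per-thread step lists."""
    state = {"bal": 100, "t": 0}   # assumed initial balance
    x, y = 10, 30                  # assumed deposit and withdrawal amounts

    def deposit():                 # Lock(l); bal += x; Unlock(l);
        state["bal"] += x

    def read_balance():            # Lock(l); t = bal; Unlock(l);
        state["t"] = state["bal"]

    def write_balance():           # Lock(l); bal = t - y; Unlock(l);
        state["bal"] = state["t"] - y

    return state, [[deposit], [read_balance, write_balance]]


def run(schedule):
    """Re-execute from scratch, following `schedule` (a sequence of thread ids)."""
    state, threads = make_program()
    pc = [0] * len(threads)
    for tid in schedule:
        threads[tid][pc[tid]]()
        pc[tid] += 1
    enabled = [tid for tid, steps in enumerate(threads) if pc[tid] < len(steps)]
    return state, enabled


def explore(prefix=()):
    """Depth-first enumeration of all complete schedules, without storing states."""
    state, enabled = run(prefix)
    if not enabled:
        yield prefix, state["bal"]
        return
    for tid in enabled:
        yield from explore(prefix + (tid,))


results = list(explore())
for schedule, balance in results:
    print(schedule, balance)
```

Under these assumptions the intended final balance is 100 + 10 - 30 = 80. The schedule (1, 0, 1), in which the deposit runs between Thread 2's read and write, ends with 70: the deposit is lost.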

CHESS architecture
[Diagram components: Unmanaged Program, Win32 Wrappers, Windows; Managed Program, .NET Wrappers, CLR; CHESS Scheduler; CHESS Exploration Engine]
Capture scheduling nondeterminism
Drive the program along an interleaving of choice

Running Example
Thread 1:
    Lock(l); bal += x; Unlock(l);
Thread 2:
    Lock(l); t = bal; Unlock(l);
    Lock(l); bal = t - y; Unlock(l);

Introduce Schedule() points
Thread 1:
    Schedule(); Lock(l); bal += x; Schedule(); Unlock(l);
Thread 2:
    Schedule(); Lock(l); t = bal; Schedule(); Unlock(l);
    Schedule(); Lock(l); bal = t - y; Schedule(); Unlock(l);
Instrument calls to the CHESS scheduler
Each call is a potential preemption point

First-cut solution: Random sleeps
Introduce random sleep at schedule points
Does not introduce new behaviors
Sleep models a possible preemption at each location
Sleeping for a finite amount guarantees starvation-freedom
Thread 1:
    Sleep(rand()); Lock(l); bal += x; Sleep(rand()); Unlock(l);
Thread 2:
    Sleep(rand()); Lock(l); t = bal; Sleep(rand()); Unlock(l);
    Sleep(rand()); Lock(l); bal = t - y; Sleep(rand()); Unlock(l);

Improvement 1: Capture the happens-before graph
Thread 1:
    Schedule(); Lock(l); bal += x; Schedule(); Unlock(l);
Thread 2:
    Schedule(); Lock(l); t = bal; Schedule(); Unlock(l);
    Schedule(); Lock(l); bal = t - y; Schedule(); Unlock(l);
[Figure: Thread 2's code redrawn with a Sleep(5) delay]
Delays that result in the same happens-before graph are equivalent
Avoid exploring equivalent interleavings
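The equivalence can be sketched in Python as follows. This is an illustration under stated assumptions, not the CHESS implementation: each step is reduced to the name of the single synchronization variable it touches, the helper names `interleavings` and `happens_before` are hypothetical, and the second program (with unrelated variables `a` and `b`) is invented to show a case where classes collapse.

```python
from itertools import permutations


def interleavings(program):
    """All complete schedules, as tuples of thread ids."""
    pool = [tid for tid, steps in enumerate(program) for _ in steps]
    return sorted(set(permutations(pool)))


def happens_before(schedule, program):
    """Happens-before edges of one schedule: program order plus per-variable order."""
    pc = [0] * len(program)
    last_access = {}              # variable -> most recent event that touched it
    edges = set()
    for tid in schedule:
        event = (tid, pc[tid])
        var = program[tid][pc[tid]]
        if pc[tid] > 0:           # program-order edge within the thread
            edges.add(((tid, pc[tid] - 1), event))
        if var in last_access:    # synchronization edge through the variable
            edges.add((last_access[var], event))
        last_access[var] = event
        pc[tid] += 1
    return frozenset(edges)


def equivalence_classes(program):
    return {happens_before(s, program) for s in interleavings(program)}


# Running example: every step takes the same lock l, so no two schedules are equivalent.
same_lock = [["l"], ["l", "l"]]
# Hypothetical variant: each thread first touches a private variable, then lock l.
mixed = [["a", "l"], ["b", "l"]]

print(len(interleavings(same_lock)), len(equivalence_classes(same_lock)))
print(len(interleavings(mixed)), len(equivalence_classes(mixed)))
```

In the variant, only the order of the two accesses to `l` matters, so the six interleavings fall into two classes and exploring one representative of each is enough.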

Improvement 2: Understand synchronization semantics
Thread 1:
    Schedule(); Lock(l); bal += x; Schedule(); Unlock(l);
Thread 2:
    Schedule(); Lock(l); t = bal; Schedule(); Unlock(l);
    Schedule(); Lock(l); bal = t - y; Schedule(); Unlock(l);
Avoid exploring delays that are impossible
Identify when threads can make progress
CHESS maintains a run queue and a wait queue
Mimics OS scheduler state
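A toy Python model of the run queue and wait queue idea. The class `TinyScheduler` and its methods are hypothetical names, and the bookkeeping is far simpler than anything a real scheduler does; the point is only that a thread blocked on a held lock is not a candidate for scheduling, so delays that could never happen are not explored.

```python
from collections import deque


class TinyScheduler:
    """Tracks which threads are enabled (run queue) or blocked on a lock (wait queue)."""

    def __init__(self, thread_ids):
        self.run_queue = deque(thread_ids)
        self.wait_queue = {}      # lock name -> deque of threads blocked on it
        self.owner = {}           # lock name -> thread currently holding it

    def lock(self, tid, name):
        """Return True if tid acquired the lock; otherwise move tid to the wait queue."""
        if name in self.owner:
            self.run_queue.remove(tid)
            self.wait_queue.setdefault(name, deque()).append(tid)
            return False
        self.owner[name] = tid
        return True

    def unlock(self, tid, name):
        """Release the lock and make every thread blocked on it runnable again."""
        assert self.owner.get(name) == tid, "only the owner may unlock"
        del self.owner[name]
        self.run_queue.extend(self.wait_queue.pop(name, ()))

    def enabled(self):
        return list(self.run_queue)


sched = TinyScheduler([1, 2])
got_lock_1 = sched.lock(1, "l")        # Thread 1 enters its critical section
got_lock_2 = sched.lock(2, "l")        # Thread 2 cannot proceed and is parked
enabled_while_held = sched.enabled()   # only Thread 1 may be scheduled here
sched.unlock(1, "l")
enabled_after_unlock = sched.enabled() # both threads are runnable again
```

In this sketch a woken thread would have to retry `lock`; a fuller model would also record which operation each waiting thread is blocked on.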

Emulate execution on a uniprocessor
Thread 1:
    Schedule(); Lock(l); bal += x; Schedule(); Unlock(l);
Thread 2:
    Schedule(); Lock(l); t = bal; Schedule(); Unlock(l);
    Schedule(); Lock(l); bal = t - y; Schedule(); Unlock(l);
Enable only one thread at a time
Linearizes a partial order into a total order
Controls the order of data races

CHESS modes: speed vs coverage
Fast mode
– Introduce schedule points before synchronizations, volatile accesses, and interlocked operations
– Finds many bugs in practice
Data-race mode
– Repeat: find data races, then introduce schedule points before the racing memory accesses
– Captures all sequentially consistent (SC) executions

Capture all sources of nondeterminism? No.
Scheduling nondeterminism? Yes
Timing nondeterminism? Yes
– Controls when and in what order the timers fire
Nondeterministic system calls? Mostly
– CHESS uses precise abstractions for many system calls
Input nondeterminism? No
– Rely on users to provide inputs
– Program inputs, files read, packets received, …
Good tradeoff in the short term
– But can't find race conditions in error-handling code

CHESS architecture
[Diagram components: Unmanaged Program, Win32 Wrappers, Windows; Managed Program, .NET Wrappers, CLR; CHESS Scheduler; CHESS Exploration Engine]

CHESS wrappers
Translate Win32/.NET synchronizations into CHESS scheduler abstractions
– Tasks: schedulable entities (threads, threadpool work items, async callbacks, timer functions)
– SyncVars: resources used by tasks
Generate happens-before edges during execution
Executable specification for complex APIs
Most time-consuming and error-prone part of CHESS
Enables CHESS to handle multiple platforms

Learning from Experience: User forum, Champions
http://msdn.microsoft.com/en-us/devlabs/cc950526.aspx
http://social.msdn.microsoft.com/Forums/en-US/chess/threads/

CHESS Doesn't Scale
Hmm… we just ran CHESS on the Singularity operating system (and found bugs in the bootup/shutdown sequence)
What they usually mean: CHESS isn't very effective on a long-running test
There are a lot of possible schedules!
Time for enumerative model checking = (time to execute one test) × (# schedules)
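To make the product concrete, here is a small Python calculation of how fast the schedule count grows. It assumes the simplest possible model: every thread runs a fixed number of steps, every step is a schedule point, and no thread ever blocks, so the number of schedules is the multinomial coefficient (total steps)! / (k1! · k2! · …). The function names and the per-run time are assumptions for illustration.

```python
from math import factorial


def count_interleavings(steps_per_thread):
    """Number of ways to interleave threads with the given step counts."""
    total = factorial(sum(steps_per_thread))
    for k in steps_per_thread:
        total //= factorial(k)    # exact: each intermediate quotient is an integer
    return total


def exploration_seconds(steps_per_thread, seconds_per_run):
    """(time to execute one test) x (# schedules)."""
    return count_interleavings(steps_per_thread) * seconds_per_run


running_example = count_interleavings([1, 2])   # one step vs. two steps
two_by_ten = count_interleavings([10, 10])      # two threads, ten steps each
three_by_ten = count_interleavings([10, 10, 10])

print(running_example, two_by_ten, three_by_ten)
print(exploration_seconds([10, 10], 0.01), "seconds at an assumed 10 ms per run")
```

The running example has only 3 schedules, but two threads of ten steps each already give 184756, which is why short tests with few threads are the ones that can be explored exhaustively.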

Find lots of bugs with 2 preemptions
Program           Lines of code   Bugs
Work Stealing Q   4K              4
CDS               6K              1
CCR               9K              3
ConcRT            16K             4
Dryad             18K             7
APE               19K             4
STM               20K             2
TPL               24K             9
PLINQ             24K             1
Singularity       175K            2
Total                             37
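A Python sketch of what a preemption bound does to the search space. It assumes a simplified model in which a preemption is a switch away from a thread that could still run, threads never block, and every step is a schedule point; `bounded_schedules` is a hypothetical name, not a CHESS API.

```python
def bounded_schedules(steps_per_thread, bound):
    """Yield every complete schedule that uses at most `bound` preemptions."""
    n = len(steps_per_thread)
    pc = [0] * n

    def go(current, used, prefix):
        enabled = [t for t in range(n) if pc[t] < steps_per_thread[t]]
        if not enabled:
            yield tuple(prefix)
            return
        for t in enabled:
            # Switching away from a thread that is still enabled costs one preemption.
            cost = 1 if (current is not None and t != current and current in enabled) else 0
            if used + cost > bound:
                continue
            pc[t] += 1
            yield from go(t, used + cost, prefix + [t])
            pc[t] -= 1

    yield from go(None, 0, [])


no_preemption = list(bounded_schedules([3, 3], 0))
one_preemption = list(bounded_schedules([3, 3], 1))
unbounded = list(bounded_schedules([3, 3], 6))
running_example_bound_1 = list(bounded_schedules([1, 2], 1))

print(len(no_preemption), len(one_preemption), len(unbounded))
```

For two threads of three steps each, the counts are 2, 6, and 20 for bounds 0, 1, and unbounded. In the deck's running example (step counts 1 and 2), the schedule that loses the deposit, (1, 0, 1), needs exactly one preemption, so a small bound already reaches it.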

CHESS Isn't Push Button
"The more I look at CHESS the more I realize that I could use some general guidance on how to author test code that will actually help CHESS reveal concurrency bugs."
Daniel Stolt

Challenge -> Opportunity: New push-button concurrency tools
Cuzz [ASPLOS 2010]: Concurrency Fuzzing
– Attach to any running executable
– Find concurrency bugs faster through smart fuzzing
Lineup [PLDI 2010]: Automatic Linearizability Checking
– Generate thread-safety tests for a class automatically
– Use sequential behavior as oracle for concurrent behavior
– CHESS underneath

CHESS Doesn't Find This Bug

    void ForkJoinTest() {
        int x = 0;
        // Both threads update x without any synchronization: a data race.
        var t1 = new Thread(() => { x = x + 1; });
        var t2 = new Thread(() => { x = x + 1; });
        t1.Start();
        t2.Start();
        t1.Join();
        t2.Join();
        Debug.Assert(x == 2);
    }

RTFM is not helpful
Instead, generate helpful warning messages
– "Warning: running CHESS without race detection can miss bugs"
– Or, turn race detection on for a few executions.
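Why the bug is missed without race detection can be sketched in Python. The model is an assumption for illustration: `x = x + 1` is split into a load and a store, and a schedule is a sequence of thread ids. With schedule points only at synchronization operations, neither thread body contains one, so each body runs as a single block; with schedule points before the racing accesses, the load and store of the two threads can interleave.

```python
from itertools import permutations


def run(schedule):
    """Execute two threads doing `x = x + 1`, one load or store per scheduled step."""
    x = 0
    reg = [0, 0]                  # each thread's private copy of x
    pc = [0, 0]
    for tid in schedule:
        if pc[tid] == 0:
            reg[tid] = x          # load x
        else:
            x = reg[tid] + 1      # store x
        pc[tid] += 1
    return x


# No schedule point inside a thread body: each body is one indivisible block.
coarse_schedules = [(0, 0, 1, 1), (1, 1, 0, 0)]
# Schedule points before every racing access: all interleavings of the four steps.
fine_schedules = sorted(set(permutations([0, 0, 1, 1])))

coarse_outcomes = {run(s) for s in coarse_schedules}
fine_outcomes = {run(s) for s in fine_schedules}
failing = [s for s in fine_schedules if run(s) != 2]

print(coarse_outcomes, fine_outcomes, failing)
```

In this model the coarse exploration only ever sees x == 2, so the assertion passes. The fine-grained exploration finds four of the six schedules ending with x == 1.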

CHESS Can't Avoid Finding Bugs
"Solution is working and found two bug with CHESS. To get the second bug, I had to fix first bug first."
"That liveness bug is such a minor performance problem that I won't fix it."

Playing CHESS with George

Sealed Methods              Asserts  Timeouts  Livelocks  Deadlocks  Leaks  Pass
(none)                      5        3         40         0          0      5
+TryDequeue                 6        5         0          1          1      40
+WaitForTask                5        5         0          2          1      40
+Reg.Recv. +PostInternal    5        5         0          0          0      43

CHESS is Confusing Me

The Nondeterminism Saga: static data, lazily initialized
[Diagram: schedule prefix p with two possible continuations, E and F]
If replay of p.E fails, yielding p.F, then try again and see if p.F replays
Report lost coverage

Nondeterminism Junkie: Too much information
Why does this test pass instead of, say, "Detected nondeterminism outside the control of CHESS"?

Is this good behavior for CHESS to return three different results for the same code?

CHESS Time Isn't Real Time: It's a feature, not a bug.
"The call to WaitOne(60000, false) immediately returns false, which isn't correct. If I use WaitOne() or WaitOne(Timeout.Infinite, false) instead of WaitOne(60000, false), the WaitHandle waits till the Event is set, returns true and everything goes fine. But waiting without a timeout isn't an option in my case."

The expected: I can't play CHESS on
– x64
– Multi-process programs
– Message passing, distributed systems
– The Boost library
– .NET without the CLR Profiler
– Java
– Unix
– …

Learning from Experience: Forums, Champions
Chris Dern, Steve Hale, Ram Natarajan, Roy Tan

"Congratulations CHESS team!!!!! I have proven outside of CHESS that the issue it is finding in our product on the 106th thread schedule looks like a valid product bug!! I wrote a quick application to launch my CHESS test outside of CHESS and by freezing/thawing threads I was able to reproduce the issue independently. This is incredibly exciting!!! Many thanks for your patience, perseverance, and CHESS bug fixes as I've struggled to understand CHESS."
Steve Hale, Microsoft, 2/12/2009

ConcurrentDictionary ConcurrentBag SemaphoreSlim ManualResetEventSlim Barrier BlockingCollection Task TaskScheduler PLINQ Parallel.For

"As the true value of a test is in its ability to find bugs, let's take a look at how our CHESS tests did. Over the development cycle to date, the CHESS test found seven bugs, and was used to reproduce another seven for a total of 14, out of the 276 high priority bugs over the same time. While only 14 bugs against 276 appear sadly anemic, it's important to dig a bit deeper. If we address each of the issues raised, would we find more bugs?"
Chris Dern, PFX_CHESS_Review_Final.docx

"Early on in the adoption of CHESS, we made a fatal mistake. Perhaps it was wishful thinking on our part, or perhaps we believed too much in the marketing hype and didn't read the fine print. We believed early on that CHESS was a turnkey solution capable of using existing tests and test approaches and finding the bugs."
C. Dern

The schedule for any product group is always under attack. Over the life cycle of a product, features are in constant flux, with managers always balancing risk and reward. In the face of this pressure, any untried tool, methodology, or approach faces an uphill battle. C. Dern

"For tool developers, it's important that once you engage with a customer you help find then drive to some level of success. Finding a single bug is a priceless commodity when arguing to continue the time investment in a specific tool. Take small bites, set modest goals and drive to success. Perfect is the enemy of good, or at least good enough right now."
C. Dern

Dern's DOs and DON'Ts
DO NOT expect that CHESS will magically find your bugs. CHESS is a tool, mainly focused on enumerating schedules for a given bound. While it can find specific types of concurrency bugs, e.g. deadlocks, for free, the value and benefit of CHESS comes with deliberate tests.

DO develop an understanding of what properties, invariants, and behaviors your test is testing.
DO run your tests. While this may seem a silly tip, it's important to remember that CHESS enables the familiar write, run, refactor test experience for concurrent tests, which we enjoy with sequential tests today.

DO NOT add artificial spinning/busy work in the test. CHESS will explore all schedules for your specified bound. Adding busy work, like you may find in a stress test to increase coverage, only increases the test runtime when under CHESS.

AVOID blindly converting an existing stress-style unit test into a CHESS test. The size, scale, and assertions that one tends to find in those types of tests make for a weak CHESS test at best, or an unusable CHESS test at worst.

Stepping Back from the Fray: High-level Learnings
Proper expectation setting
Good methodology
Good default behavior
Good warnings and messages
Minimize cognitive dissonance
Cultivate champions
– Listen to them and learn!

Three CHESS Learnings
1. If you want deterministic scheduling with the ability to explore all schedules without changing the underlying scheduler, then it's hard to achieve high API coverage and robustness
   Action: we need observable and controllable schedulers!
2. Concurrency unit testing can be effective, but requires careful planning and scoping
3. Search/reduction strategies are absolutely essential

Uplifting Message and Blatant Advertisement for LineUp Talk
Partnerships and Collaborations
"The success of the LineUp work is a perfect example of [the benefits of] an open dialog between the teams along with continual experimentation by both sides. Combining innovations from both research and product testing group, we create[d] a complete solution to one area of concurrency testing."
C. Dern
