Healing Data Races On-The-Fly

Bohuslav Krena, Zdenek Letko, Rachel Tzoref, Shmuel Ur, and Tomas Vojnar
Ok-Kyoon Ha, OS Lab., GNU

Contents
- Background
- Motivation
- Self-Healing Steps
  + Problem Detection
  + Problem Localization
  + Problem Healing
  + Healing Assurance
- Experiment
- Conclusion

Background- what is a race?
A data race occurs when two concurrent threads access a shared variable and
- at least one access is a write
- the accesses are unordered by any synchronization
Usually a data race is a serious error caused by a failure to synchronize properly.
This paper distinguishes two kinds of races:
- Atomicity races
- Inherent races

Background- Atomicity Races
Races caused by the wrong assumption that some blocks of code will be executed atomically.
Thread 1 and Thread 2 both execute:
void someMethod() { shared = update(shared); }

Background- Inherent Races
Races not related to atomicity.
A data race is inherent if the following holds:
- Executing any segment of code in each thread atomically does not determine the order of accesses to the shared variable.
- The different orders in which the shared variable is accessed can be classified as "good" and "bad" according to the expected behavior of the program.

Motivation
- Race detection tools cannot verify some races, or they report many false alarms.
- Even when problems are found by the best testing or verification techniques, there are situations in which they are not easy to fix.
  + software embedded in hardware is expensive to fix (replacing, updating)
- It would be very desirable if software could fix its concurrency problems itself, on-the-fly.

Self-Healing Steps
- Problem detection: to detect that something is wrong with the system
- Problem localization: to find the root cause of the problem
- Problem healing: applying a fix to the problem
- Healing assurance: using the localization stage to check/prove the self-healing action

Problem Detection
Eraser algorithm
- Detects so-called apparent data races
Principle:
- For each variable, maintains its state and the set of candidate locks
- A race is detected whenever:
  + the variable is in the shared state
  + the set of candidate locks becomes empty
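The lockset part of the principle above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the authors' detector: the class `LocksetTable`, its `access` method, and the use of strings for variable and lock names are all hypothetical. It also ignores the variable's state, so it would start refining C(v) from the very first access.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper that illustrates lockset refinement; not the authors' detector. */
class LocksetTable {
    // Candidate lock set C(v) per variable; no entry means the variable was not accessed yet.
    private final Map<String, Set<String>> candidates = new HashMap<>();

    /**
     * Records one access to variable v by a thread that holds locksHeld.
     * Returns true when C(v) is empty after the access, i.e. no common lock protects v.
     */
    boolean access(String v, Set<String> locksHeld) {
        Set<String> c = candidates.get(v);
        if (c == null) {
            // First recorded access: every lock held now is a candidate.
            c = new HashSet<>(locksHeld);
            candidates.put(v, c);
        } else {
            // Later accesses: keep only the locks held on every access so far.
            c.retainAll(locksHeld);
        }
        return c.isEmpty();
    }

    public static void main(String[] args) {
        LocksetTable table = new LocksetTable();
        System.out.println(table.access("soldSeats", Set.of("lock"))); // prints false
        System.out.println(table.access("soldSeats", Set.of()));       // prints true
    }
}
```

The intersection is what makes the check independent of the observed schedule: a variable is flagged as soon as two accesses share no lock, whether or not they actually overlapped in time.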

Extended Eraser Algorithm
- Virgin: the variable has not been initialized yet.
- Exclusive: the variable is accessed only by the thread which initialized it.
- Shared: the variable is read by multiple threads.
- Shared-modified: the variable is read and written by multiple threads.
- Race: a data race on this variable has been detected (because no lock or a wrong lock was used when accessing the variable).
Figure 1: Possible states of a shared variable
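The states listed above can be read as a small per-variable state machine. The sketch below is a simplification and rests on assumptions: the transitions follow the classic Eraser scheme, a race is reported only in the shared-modified state, and the names `VarState`, `VarStateMachine`, and `onAccess` are hypothetical. The exact transitions of the extended algorithm in Figure 1 may differ.

```java
/** The five states named in Figure 1. */
enum VarState { VIRGIN, EXCLUSIVE, SHARED, SHARED_MODIFIED, RACE }

/** Hypothetical per-variable tracker; transitions follow classic Eraser (an assumption). */
class VarStateMachine {
    private VarState state = VarState.VIRGIN;
    private long owner = -1; // id of the thread that initialized the variable

    /** Applies one access and returns the resulting state. */
    VarState onAccess(long threadId, boolean isWrite, boolean locksetEmpty) {
        switch (state) {
            case VIRGIN:
                // The first write initializes the variable and fixes its owner.
                if (isWrite) { state = VarState.EXCLUSIVE; owner = threadId; }
                break;
            case EXCLUSIVE:
                // Accesses by the owner keep the variable exclusive.
                if (threadId != owner) {
                    state = isWrite ? VarState.SHARED_MODIFIED : VarState.SHARED;
                }
                break;
            case SHARED:
                if (isWrite) state = VarState.SHARED_MODIFIED;
                break;
            default:
                break; // SHARED_MODIFIED and RACE are handled below
        }
        // Assumption: report only once several threads both read and write the variable.
        if (state == VarState.SHARED_MODIFIED && locksetEmpty) state = VarState.RACE;
        return state;
    }
}
```

Delaying the report until the variable leaves the exclusive state is what lets unlocked initialization by a single thread pass without a false alarm.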

An Example of Detection
Code under test:
static class Flight { private int soldSeats; … Flight() { soldSeats = 0; } boolean bookTicket() { soldSeats++; …
Timeline of the variable soldSeats (threads Main, T1, T2):
- before any access: Virgin
- Main: new Flight(); state Exclusive <Main>, C(v) = {}
- T1: <lock> bookTicket(); state Shared <T1>, C(v) = {lock}
- T2: bookTicket(); state Race <T2>, C(v) = {}

Problem Localization
- Often hard work even for a programmer.
- This paper looks for pre-specified data race bug patterns in the code, with the aid of information collected by the race detector.
- Formal methods are used to reduce the number of false alarms, with reasonable overhead.

Atomicity Violation Bug Patterns
- load-store bug pattern: x++;
- test-and-use bug pattern: if (p != null) p = p.next;
- repeated test-and-use bug pattern: while (p != null)
Bytecode shown on the slide:
0: aload_0
1: getfield #2
4: ifnull
7: aload_0
8: aload_0
9: getfield #2
12: getfield #3
15: putfield #2
18: …

An Example of a Bug Pattern
static class Flight { private int soldSeats; … Flight() { soldSeats = 0; } boolean bookTicket() { soldSeats++; …
Bytecode of soldSeats++ (load-store pattern):
2: getfield #2
5: iconst_1
6: iadd
7: putfield #2
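To see why the getfield / putfield pair is dangerous, the interleaving can be replayed by hand. The sketch below uses no real threads, so it is deterministic; the class `LoadStoreReplay` and its method names are hypothetical, and the local variables `t1` and `t2` stand for each thread's operand stack.

```java
/** Deterministic replay of two threads running getfield / iadd / putfield; no real threads. */
class LoadStoreReplay {
    /** T2 loads soldSeats before T1 stores, so one increment is lost. */
    static int interleaved() {
        int soldSeats = 0;
        int t1 = soldSeats;   // T1: getfield
        int t2 = soldSeats;   // T2: getfield (T1 has not stored yet)
        soldSeats = t1 + 1;   // T1: iconst_1, iadd, putfield
        soldSeats = t2 + 1;   // T2: iconst_1, iadd, putfield overwrites T1's store
        return soldSeats;
    }

    /** Each thread finishes its load-store sequence before the other starts. */
    static int atomic() {
        int soldSeats = 0;
        int t1 = soldSeats;   // T1: getfield
        soldSeats = t1 + 1;   // T1: putfield
        int t2 = soldSeats;   // T2: getfield sees T1's store
        soldSeats = t2 + 1;   // T2: putfield
        return soldSeats;
    }

    public static void main(String[] args) {
        System.out.println("interleaved = " + interleaved()); // prints interleaved = 1
        System.out.println("atomic = " + atomic());           // prints atomic = 2
    }
}
```

Two tickets are booked in both runs, but the interleaved run records only one of them.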

Healing Atomicity Races
Influencing the scheduler
- Forcing a context switch: yield() or sleep(0)
  + so that the thread gets a full time slice from the scheduler to execute the atomic section
  + safe and legal solution
  + only decreases the probability of race manifestation
Figure (T1, T2): Thread.yield(); is placed before the load-store sequence; each thread then runs 2: getfield #2, 5: iconst_1, 6: iadd, 7: putfield #2.

Healing Atomicity Races
Influencing the scheduler
- Temporary changes of the priorities
  + so that the thread gets a full time slice from the scheduler to execute the atomic section
  + safe and legal solution
  + only decreases the probability of race manifestation
  + strongly dependent on OS and JVM
Thread.currentThread().setPriority(Thread.MAX_PRIORITY); … Thread.currentThread().setPriority(originalPriority);
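Both scheduler-based healing actions can be sketched on the Flight example. This is a minimal illustration, not the authors' instrumentation: the class `SchedulerHealedFlight` and its method names are hypothetical. Neither variant removes the race; each only makes an interruption inside the load-store sequence less likely.

```java
/** Hypothetical variant of Flight with scheduler-based healing inserted by hand. */
class SchedulerHealedFlight {
    private int soldSeats = 0;

    /** Healing by forcing a context switch before the critical sequence. */
    boolean bookTicketWithYield() {
        Thread.yield();   // hint to the scheduler to give up the rest of the current time slice
        soldSeats++;      // more likely, but not guaranteed, to run within one fresh slice
        return true;
    }

    /** Healing by temporarily raising the priority of the current thread. */
    boolean bookTicketWithPriority() {
        Thread self = Thread.currentThread();
        int originalPriority = self.getPriority();
        self.setPriority(Thread.MAX_PRIORITY);
        try {
            soldSeats++;
        } finally {
            self.setPriority(originalPriority); // always restore the original priority
        }
        return true;
    }

    int soldSeats() {
        return soldSeats;
    }
}
```

The `finally` block matters for the priority variant: without it, an exception in the healed section would leave the thread at maximum priority for the rest of its life.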

Healing Atomicity Violation
Adding Synchronization Actions
- Suitable use of mutexes (locks) to prevent the accesses from being simultaneous
- heals the race
- can introduce new (and even more dangerous) bugs: deadlock
HealingMutex.lock(); … HealingMutex.unlock();
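Healing by added synchronization can be sketched as follows. The slide's HealingMutex is not specified further, so this example assumes a `java.util.concurrent.locks.ReentrantLock` as a stand-in; the class `LockHealedFlight` and the method `runDemo` are hypothetical.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical variant of Flight healed with an extra lock around the load-store sequence. */
class LockHealedFlight {
    // Stand-in for the slide's HealingMutex; using ReentrantLock is an assumption.
    private static final ReentrantLock HEALING_MUTEX = new ReentrantLock();
    private int soldSeats = 0;

    boolean bookTicket() {
        HEALING_MUTEX.lock();
        try {
            soldSeats++;               // getfield / iadd / putfield now run under the lock
        } finally {
            HEALING_MUTEX.unlock();    // release even if the healed section throws
        }
        return true;
    }

    /** Books tickets from several threads and returns the final counter value. */
    static int runDemo(int threads, int bookingsPerThread) {
        LockHealedFlight flight = new LockHealedFlight();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < bookingsPerThread; j++) flight.bookTicket();
            });
            workers[i].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return flight.soldSeats; // safe to read: join() orders this after all workers
    }
}
```

Unlike the scheduler-based actions, this removes the race rather than making it less likely. The price is the deadlock risk noted above, for example when the healed section already runs while holding another lock.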

Healing Inherent Races
Distinguish between "good" and "bad" orders
Thread 1: done = false
Thread 2: for (int i=1; i<100; i++) { print(i); } done = true
Both writes to done are enclosed in raceLock.lock(); … raceLock.unlock();
- Thread 2 → Thread 1: bad order (done = false)
- Thread 1 → Thread 2: good order (done = true)

- enforce the "good" order
  + change the scheduling of the program: wait() and notify()
Thread 1: raceLock.lock(); done = false; raceLock.unlock();
Thread 2: for (int i=1; i<100; i++) { print(i); } raceLock.lock(); done = true; raceLock.unlock();
Added calls: wait(); notify();
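One way to enforce the good order (Thread 1 → Thread 2) with wait and notify is sketched below. The slide does not show where the calls are placed, so the placement here is an assumption: Thread 2 waits until Thread 1 has written. The example uses a Java intrinsic monitor instead of raceLock.lock() / unlock(), because wait and notify require one, and it uses notifyAll() rather than notify(). The class `GoodOrderEnforcer` and its members are hypothetical.

```java
/** Hypothetical sketch: Thread 2's write to done is delayed until Thread 1 has written. */
class GoodOrderEnforcer {
    private final Object raceLock = new Object();
    private boolean thread1Wrote = false; // bookkeeping added by the healing action
    private boolean done = true;

    void thread1Write() {
        synchronized (raceLock) {
            done = false;
            thread1Wrote = true;
            raceLock.notifyAll();         // wake Thread 2 if it is already waiting
        }
    }

    void thread2Write() throws InterruptedException {
        synchronized (raceLock) {
            while (!thread1Wrote) {       // loop guards against spurious wakeups
                raceLock.wait();
            }
            done = true;
        }
    }

    /** Starts Thread 2 first on purpose; returns the final value of done. */
    static boolean runDemo() {
        GoodOrderEnforcer enforcer = new GoodOrderEnforcer();
        Thread t2 = new Thread(() -> {
            try {
                enforcer.thread2Write();
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        });
        Thread t1 = new Thread(enforcer::thread1Write);
        t2.start();
        t1.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        synchronized (enforcer.raceLock) {
            return enforcer.done;
        }
    }
}
```

Because Thread 2 blocks until Thread 1 has written, this action can itself hang the program if Thread 1 never reaches its write, which is one reason the healing assurance step is needed.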

- override the "bad" order
  + concentrates on write accesses
  + does not prevent the bad order from occurring
  + assumes that the good order is known (T1 → T2)
  + keeps only T2's value if the bad order (T2 → T1) is executed
Thread 1: raceLock.lock(); done = false; raceLock.unlock();
Thread 2: for (int i=1; i<100; i++) { print(i); } raceLock.lock(); done = true; raceLock.unlock();
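Overriding the bad order can be sketched by remembering whether Thread 2 has already written and, if so, dropping Thread 1's late write. The slide does not say how the bookkeeping is done, so the flag `thread2Wrote`, the class `BadOrderOverride`, and the use of a synchronized block in place of raceLock.lock() / unlock() are assumptions.

```java
/** Hypothetical sketch: a late write by Thread 1 must not overwrite Thread 2's value. */
class BadOrderOverride {
    private final Object raceLock = new Object();
    private boolean thread2Wrote = false; // bookkeeping added by the healing action
    private boolean done = false;

    /** Thread 1's write; skipped when Thread 2 has already written (bad order T2 → T1). */
    void thread1Write() {
        synchronized (raceLock) {
            if (!thread2Wrote) {
                done = false;
            }
        }
    }

    /** Thread 2's write; always performed, since it comes last in the good order. */
    void thread2Write() {
        synchronized (raceLock) {
            done = true;
            thread2Wrote = true;
        }
    }

    boolean isDone() {
        synchronized (raceLock) {
            return done;
        }
    }
}
```

No thread ever blocks waiting for the other here, so this action cannot hang the program the way enforcing the order can. It only repairs the final value: a read that happens between the two writes may still observe the bad order.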

Healing Assurance
- static analysis or bounded model checking
  + reduce false alarms during detection and localization
  + ensure that a new bug cannot be introduced
  + help to choose a suitable healing method

Preliminary Results
The implemented race detector is able:
- to detect a wrong locking policy using the Eraser algorithm
- to detect the load-store atomicity bug pattern
- to localize the race and give enough information to the developer
- to heal the found race by influencing the scheduler and also by additional synchronization

Experiments
- all tests were run on 1, 2, and 4 processors and with 2, 3, 5, 10, and 15 working threads
- the new explicit lock heals the race in all cases

Related Work
ToleRace tool
- concentrates on asymmetric races
- based on transforming the critical regions of code
- at the end of the region, checks for a race and can "tolerate" it by producing the correct result based on the local copies of shared variables
- possible to heal only a read-write race
- does not heal write-write races

Conclusion
- applies self-healing in the context of fixing data races in Java programs
- explained three bug patterns leading to data races
- proposed possible self-healing actions to be taken when a bug pattern is detected
Future work
- implementation of efficient healing techniques [LeVK08]
