Detecting and Eliminating Potential Violation of Sequential Consistency for concurrent C/C++ program Duan Yuelu, Feng Xiaobing, Pen-chung Yew.


2 Outline: Motivation; Approach & Implementation; Results; Related Work; Conclusion

3 Motivation
Programmers write "low-lock" code for better performance: locks are expensive, so data races are deliberately employed, and the code relies on the sequential consistency (SC) model.
Such code might fail under relaxed consistency (RC) models, e.g., Double-Checked Locking (DCL) for a lazily initialized singleton.

4 Example 1 (a): Lazily initialized singleton
    Object::Object() { this->field = 100; }
Works only for a single thread:
    Object *Object::getInstance() {
        if (!_instance)
            _instance = new Object();
        return _instance;
    }
Works for multiple threads, but is expensive (every call acquires the lock):
    Object *Object::getInstance() {
        lock(l);
        if (!_instance)
            _instance = new Object();
        unlock(l);
        return _instance;
    }
Using the instance:
    void Object::useInstance() {
        Object *ins = Object::getInstance();
        int f = ins->getField();
    }

5 (b) Double-Checked Locking for a lazily initialized singleton
    Object *Object::getInstance() {
        if (!_instance) {
            lock(l);
            if (!_instance)
                _instance = new Object();
            unlock(l);
        }
        return _instance;
    }
If the architecture is SC, this works correctly and performs better than (a), because the lock is taken only when the first check finds _instance still null. But what happens on RC models that allow write-write reordering?

6 A possible interleaving, which executes correctly
Initializer thread (T1):
    Object *Object::getInstance() {
        if (!_instance) {
            lock(l);
            if (!_instance) {
                temp = malloc(..);
        A1:     temp->field = 100;
        A2:     _instance = temp;
            }
            unlock(l);
        }
        return _instance;
    }
Reader thread (T2):
    B1: if (!_instance) {…}
        …
    B2: read _instance->field;
Data races are employed here: these accesses are not properly synchronized.

7 But what if the two writes are reordered?
Initializer thread (T1):
    Object *Object::getInstance() {
        if (!_instance) {
            lock(l);
            if (!_instance) {
                temp = malloc(..);
        A2:     _instance = temp;        // now issued before A1
        A1:     temp->field = 100;
            }
            …
Reader thread (T2):
    B1: if (!_instance) {…}
        …
    B2: read _instance->field;
If T2 executes B1 and B2 after A2 but before A1, B2 gets the uninitialized value of _instance->field, which violates sequential consistency.

8 Bug pattern: Potential Violation of Sequential Consistency (PVSC), so named because these defects might cause SC violations.
How can PVSC bugs be detected and eliminated?
- We combine Shasha and Snir's conflict graph and delay set theory with an existing data race detection scheme.

9 Outline: Motivation; Approach & Implementation; Results; Related Work; Conclusion

10 Our scheme
(1) Construct the race graph.
(2) Find cycles in it; a cycle in the race graph corresponds to a PVSC bug.
(3) Compute the delay set.
(4) Insert memory ordering fences.

11 Constructing the race graph
For all instructions executed in a particular execution of a program P:
- Add program order edges between the instructions of each thread.
- Add a race edge for each data race.
(Figure: Thread 1 executes wr a, wr b; Thread 2 executes rd b, rd a; legend: race edge, program order edge.)
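A minimal C++ sketch of this construction, under assumptions that are not taken from the slides: the trace is a list of hypothetical `Access` records (thread id, location name, read/write flag) in execution order, and the hypothetical `isRace` stands in for a real race detector by treating every conflicting cross-thread pair as a race (a real detector would also consult happens-before, locks, and so on). Program order edges join consecutive accesses of a thread; race edges are stored in both directions, since a race has no fixed orientation until an execution picks one.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One dynamic memory access from a recorded trace (hypothetical format).
struct Access {
    int thread;        // executing thread id
    std::string addr;  // accessed location, e.g. "a" or "b"
    bool isWrite;
};

enum class EdgeKind { ProgramOrder, Race };

struct Edge {
    std::size_t from, to;  // indices into the trace
    EdgeKind kind;
};

// Stand-in for a real race detector: same location, different threads,
// at least one write.
bool isRace(const Access& x, const Access& y) {
    return x.thread != y.thread && x.addr == y.addr && (x.isWrite || y.isWrite);
}

std::vector<Edge> buildRaceGraph(const std::vector<Access>& trace) {
    std::vector<Edge> edges;
    for (std::size_t i = 0; i < trace.size(); ++i) {
        // Program order edge to the next access of the same thread.
        for (std::size_t j = i + 1; j < trace.size(); ++j) {
            if (trace[j].thread == trace[i].thread) {
                edges.push_back({i, j, EdgeKind::ProgramOrder});
                break;
            }
        }
        // Race edges, recorded in both directions.
        for (std::size_t j = i + 1; j < trace.size(); ++j) {
            if (isRace(trace[i], trace[j])) {
                edges.push_back({i, j, EdgeKind::Race});
                edges.push_back({j, i, EdgeKind::Race});
            }
        }
    }
    return edges;
}
```

On the slide's trace (Thread 1: wr a, wr b; Thread 2: rd b, rd a) this yields two program order edges and two races, i.e. four directed race edges.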

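12 Example 1. Race graph for DCL: A: wr a, B: wr b, C: rd b, D: rd a (a = _instance->field, b = _instance)
Thread 1 (initializer):
    …
    lock(l);
    if (!_instance) {
        temp = malloc(..);
    A:  temp->field = 100;        // wr a
    B:  _instance = temp;         // wr b
    }
    unlock(l);
    }
Thread 2 (reader):
    C: if (!_instance) {…}       // rd b
       …
    D: read _instance->field;    // rd a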

13 Find cycles in the race graph
Theorem 1. A cycle in the race graph corresponds to a PVSC bug.
Proof: If a cycle is found in the race graph, a non-sequentially-consistent execution can be obtained by making the race order consistent with the cycle. E.g., from the cycle A->B->C->D->A in the previous example we get the non-SC execution E = {B->C, D->A}.
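Step (2) can be sketched as a backtracking search for simple cycles. The representation and the two filtering rules are assumptions, not taken from the slides: nodes are numbered 0..n-1, race edges appear in both directions, and a reported cycle must have at least three nodes (so one race edge traversed back and forth does not count) and at least one program order edge (so a ring made only of races is ignored). The names `Edge`, `Graph`, `dfs`, and `findCycle` are hypothetical, and the search is exponential in the worst case, so it only suits small graphs such as slide 12's.

```cpp
#include <cassert>
#include <vector>

struct Edge { int to; bool programOrder; };
using Graph = std::vector<std::vector<Edge>>;

// Backtracking DFS from `start`; only nodes with id > start are entered,
// so each cycle is reported from its smallest node.
bool dfs(const Graph& g, int start, int node, std::vector<int>& path,
         std::vector<bool>& onPath, bool sawPo, std::vector<int>& cycle) {
    for (const Edge& e : g[node]) {
        bool po = sawPo || e.programOrder;
        if (e.to == start && path.size() >= 3 && po) {
            cycle = path;  // this edge closes the cycle back to start
            return true;
        }
        if (e.to > start && !onPath[e.to]) {
            onPath[e.to] = true;
            path.push_back(e.to);
            if (dfs(g, start, e.to, path, onPath, po, cycle)) return true;
            path.pop_back();
            onPath[e.to] = false;
        }
    }
    return false;
}

// Returns one cycle as a node sequence, or an empty vector if there is none.
std::vector<int> findCycle(const Graph& g) {
    std::vector<int> cycle;
    for (int s = 0; s < static_cast<int>(g.size()); ++s) {
        std::vector<int> path{s};
        std::vector<bool> onPath(g.size(), false);
        onPath[s] = true;
        if (dfs(g, s, s, path, onPath, false, cycle)) return cycle;
    }
    return {};
}
```

With A..D numbered 0..3, the DCL graph yields the cycle 0, 1, 2, 3 (A->B->C->D->A). For contrast, the same two threads with Thread 2 reading a before b (program order edges wr a->wr b and rd a->rd b) contain no such cycle.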

14 Compute the delay set
Delay lemma [Shasha/Snir]: any execution should be consistent with a delay set D.
Theorem 2. Let D be the delay set containing all program order edges of the cycles in the race graph. Then D enforces sequential consistency for the executions that generate D.
Proof: omitted.
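Theorem 2 translates into a short loop: walk each cycle and keep the edges whose endpoints were executed by the same thread, because in the race graph only program order edges stay within one thread. This sketch assumes each cycle is given as a node sequence and uses a hypothetical `threadOf` table; for the DCL cycle A->B->C->D->A it yields the delays A->B and C->D.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using Delay = std::pair<int, int>;  // (earlier access, later access) in one thread

// Collects the program order edges of every cycle. threadOf[v] is the thread
// that executed node v; consecutive cycle nodes in the same thread are joined
// by a program order edge, all other consecutive pairs by a race edge.
std::set<Delay> computeDelaySet(const std::vector<std::vector<int>>& cycles,
                                const std::vector<int>& threadOf) {
    std::set<Delay> delays;
    for (const auto& cycle : cycles) {
        for (std::size_t i = 0; i < cycle.size(); ++i) {
            int u = cycle[i];
            int v = cycle[(i + 1) % cycle.size()];  // wrap around to close the cycle
            if (threadOf[u] == threadOf[v]) delays.insert({u, v});
        }
    }
    return delays;
}
```

Using a set means that a program order edge shared by several cycles is delayed only once.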

15 Insert memory ordering fences
A fence instruction delays the issue of subsequent instructions until all previous instructions have completed. Inserting a fence for each delay in D enforces D, which eliminates the detected PVSC bugs.
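Step (4) as a sketch: for each thread, place a fence immediately before the later instruction of every delay, which orders that instruction after everything earlier in the thread, including the earlier end of the delay. The `Instr` record, the string form of instructions, and the "FENCE" marker are hypothetical stand-ins for real IR or binary rewriting.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Delay = std::pair<int, int>;  // (earlier, later) instruction ids

struct Instr {
    int id;
    std::string text;
};

// Returns the thread's code with a "FENCE" line before every instruction that
// is the later end of some delay whose earlier end already appeared in this thread.
std::vector<std::string> insertFences(const std::vector<Instr>& thread,
                                      const std::set<Delay>& delays) {
    std::vector<std::string> out;
    std::set<int> seen;  // ids already emitted in this thread
    for (const Instr& ins : thread) {
        bool needFence = false;
        for (const Delay& d : delays)
            if (d.second == ins.id && seen.count(d.first)) needFence = true;
        if (needFence) out.push_back("FENCE");
        out.push_back(ins.text);
        seen.insert(ins.id);
    }
    return out;
}
```

This places one fence per delayed instruction; a smarter pass could let a single fence cover several delays that span it, which this sketch does not attempt.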

16 Examples for the above three steps
Fig. 1 (Thread 1 and Thread 2; accesses wr a, wr b, rd a, rd b): no cycles, so no PVSC, and no fence is needed. (Any execution of this code on an RC model is sequentially consistent, so fences are unnecessary.)

17 Fig. 2 (initially a = b = 0):
    Thread 1:  A: a = 1
    Thread 2:  B: if (a)  C: b = 1
    Thread 3:  D: if (b)  E: R1 = a
Fig. 2 contains the cycle A->B->C->D->E->A, i.e. a PVSC. The execution {A->B, C->D, E->A} is possible; it violates SC and results in {a = 1, b = 1, R1 = 0}. Inserting fences between B and C and between D and E (the program order edges of the cycle) eliminates the PVSC.
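Fig. 2 can be written as a C++11 litmus test. This is a sketch under assumptions that are not in the slides: a and b become std::atomic<int> accessed with memory_order_relaxed (plain ints would make these races undefined behavior in C++11), the hypothetical `Outcome` fields r1 and r2 record what B and D read, and each delay B->C and D->E gets a std::atomic_thread_fence(std::memory_order_seq_cst). Whenever D reads the value written by C, T2's fence synchronizes with T3's fence, so B happens before E and E must also see a = 1 once B has seen it; the SC-violating outcome r1 = 1, r2 = 1, R1 = 0 is therefore forbidden. A test loop can only fail to observe that outcome, not prove it impossible, and on x86, whose hardware memory model already forbids this pattern, the unfenced version would not show it either.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Outcome { int r1, r2, R1; };

// One run of Fig. 2 with fences on the delays B->C and D->E.
Outcome runFig2Fenced() {
    std::atomic<int> a{0}, b{0};
    Outcome o{0, 0, 0};
    std::thread t1([&] { a.store(1, std::memory_order_relaxed); });  // A
    std::thread t2([&] {
        o.r1 = a.load(std::memory_order_relaxed);                     // B
        std::atomic_thread_fence(std::memory_order_seq_cst);          // delay B->C
        if (o.r1) b.store(1, std::memory_order_relaxed);              // C
    });
    std::thread t3([&] {
        o.r2 = b.load(std::memory_order_relaxed);                     // D
        std::atomic_thread_fence(std::memory_order_seq_cst);          // delay D->E
        if (o.r2) o.R1 = a.load(std::memory_order_relaxed);           // E
    });
    t1.join();
    t2.join();
    t3.join();
    return o;
}

// True if the non-SC outcome of Fig. 2 (B saw a = 1, D saw b = 1, E saw a = 0) occurred.
bool violatesSC(const Outcome& o) { return o.r1 == 1 && o.r2 == 1 && o.R1 == 0; }
```

Each Outcome field is written by exactly one thread and read only after join(), so recording the results adds no races of its own.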

18 Fig. 3: Corrected version of DCL for a lazily initialized singleton
    Object *getInstance() {
        Object *tmp = _instance;
        Fence();
        if (!tmp) {
            lock(l);
            tmp = _instance;
            if (!tmp)
                tmp = new Object();
            Fence();
            _instance = tmp;
            unlock(l);
        }
        return _instance;
    }
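One way to express Fig. 3 in portable C++11, offered as a sketch rather than the authors' code: the hypothetical members instance_ and lock_ replace _instance and l, instance_ becomes std::atomic<Object*> read and written with memory_order_relaxed, std::mutex with std::lock_guard replaces lock(l)/unlock(l), and each Fence() becomes std::atomic_thread_fence. The first fence only needs acquire semantics and the second only release semantics: a release fence before the store pairs with an acquire fence after a load that sees the stored pointer, so a reader that finds a non-null pointer also sees field == 100. Unlike Fig. 3, this version returns tmp instead of re-reading the shared pointer, and stores the pointer only when it created the object.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

class Object {
public:
    Object() : field(100) {}
    int getField() const { return field; }
    static Object* getInstance();

private:
    int field;
    static std::atomic<Object*> instance_;
    static std::mutex lock_;
};

std::atomic<Object*> Object::instance_{nullptr};
std::mutex Object::lock_;

Object* Object::getInstance() {
    Object* tmp = instance_.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);         // first Fence()
    if (!tmp) {
        std::lock_guard<std::mutex> guard(lock_);
        tmp = instance_.load(std::memory_order_relaxed);
        if (!tmp) {
            tmp = new Object();                                   // writes field = 100
            std::atomic_thread_fence(std::memory_order_release);  // second Fence()
            instance_.store(tmp, std::memory_order_relaxed);
        }
    }
    return tmp;
}
```

An equivalent and more common C++11 spelling folds the fences into the accesses: a load with memory_order_acquire and a store with memory_order_release.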

19 Optimization
To handle real-world applications with long execution times and many threads, we convert the race graph into a PC race graph:
- Combine nodes with the same PC into one node. The graph then contains N nodes, where N is the number of instructions that perform racing accesses.
- Run an SCC algorithm on the PC race graph; each SCC corresponds to a PVSC bug.
This can introduce false negatives.
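A sketch of this optimization, assuming each dynamic edge is already labeled with the PCs of its two endpoints and that race edges carry the orientation in which the accesses were observed, possibly differing between dynamic instances (the hexadecimal PCs in the test are made up). Dynamic nodes with the same PC collapse into one node, and Tarjan's algorithm, one standard SCC algorithm (the slides do not say which one is used), finds the strongly connected components; each component with more than one node is reported. If both orientations of a single race are observed, those two PCs already form a two-node SCC, and a real tool would need to filter such trivial components, which this sketch does not attempt.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <vector>

// One dynamic edge of the race graph, identified by the PCs of its endpoints.
struct DynEdge { std::uint64_t fromPc, toPc; };

// Collapses dynamic nodes by PC and returns the SCCs (as sorted PC lists)
// with more than one node, found with Tarjan's algorithm.
std::vector<std::vector<std::uint64_t>> pcSccs(const std::vector<DynEdge>& edges) {
    std::map<std::uint64_t, int> id;  // PC -> node index
    std::vector<std::uint64_t> pcOf;  // node index -> PC
    auto node = [&](std::uint64_t pc) {
        auto it = id.find(pc);
        if (it != id.end()) return it->second;
        id[pc] = static_cast<int>(pcOf.size());
        pcOf.push_back(pc);
        return static_cast<int>(pcOf.size()) - 1;
    };
    std::vector<std::vector<int>> adj;
    for (const DynEdge& e : edges) {
        int u = node(e.fromPc), v = node(e.toPc);
        adj.resize(pcOf.size());
        adj[u].push_back(v);  // duplicate edges from repeated instances are harmless
    }

    int n = static_cast<int>(pcOf.size()), counter = 0;
    std::vector<int> index(n, -1), low(n, 0), stack;
    std::vector<bool> onStack(n, false);
    std::vector<std::vector<std::uint64_t>> result;

    std::function<void(int)> strongConnect = [&](int v) {
        index[v] = low[v] = counter++;
        stack.push_back(v);
        onStack[v] = true;
        for (int w : adj[v]) {
            if (index[w] < 0) {
                strongConnect(w);
                low[v] = std::min(low[v], low[w]);
            } else if (onStack[w]) {
                low[v] = std::min(low[v], index[w]);
            }
        }
        if (low[v] == index[v]) {  // v is the root of an SCC: pop its members
            std::vector<std::uint64_t> scc;
            int w;
            do {
                w = stack.back();
                stack.pop_back();
                onStack[w] = false;
                scc.push_back(pcOf[w]);
            } while (w != v);
            if (scc.size() > 1) {
                std::sort(scc.begin(), scc.end());
                result.push_back(scc);
            }
        }
    };
    for (int v = 0; v < n; ++v)
        if (index[v] < 0) strongConnect(v);
    return result;
}
```

The graph size now depends on the number of racing instructions rather than on the length of the execution, which is the point of the optimization; the recursive std::function is kept for brevity and would need an explicit stack for very deep graphs.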

20 Outline: Motivation; Approach & Implementation; Results; Related Work; Conclusion

21 Results
- Detected PVSC bugs
- Performance loss after fence insertion
- Cost of PVSC detection over race detection

22 Some of the detected bugs
- MySQL 5.0.x, sql/slave.c, handle_slave_io(): assertion failure in slave shutdown. mi->slave_running = 0 could become visible to other threads before cleanup has completed, which causes an assertion failure during slave shutdown.
- httpd 2.2.x, modules/cache/mod_cache.c, cache_store_content(): the effects of store_header() might become visible to other threads before those of store_body(), so mod_cache might serve old content even though new content has been fetched.
- httpd 2.2.x, prefork/prefork.c, ap_mpm_run(): restart_pending = shutdown_pending = 0; might become visible to child threads after set_signals(), so if httpd receives SIGTERM while child processes are being spawned, the signal is ignored.

23 Performance loss on SPLASH-2
Figure 10: Performance on Intel Itanium SMP

24 Cost over data race detection
Figure 13: Cost of PVSC detection over different race detection algorithms

25 Related Work
- Compiler analysis: conservative for C/C++ programs; inserts many redundant fences, which severely hurts performance. [K. Yelick@UCB, S. Midkiff@Purdue]
- Verification: enumerates all possible executions allowed by an RC model; does not scale to large applications. [S. Burckhardt@MSR]
- Data race detection: not concerned with the problem of SC violation. [many]
- Other concurrency bugs: atomicity [AVIO, yyzhou] and correlation [MUVI, yyzhou]; these do not consider the PVSC problem.

26 Outline: Motivation; Approach & Implementation; Results; Related Work; Conclusion

27 An effective and efficient scheme for detecting Potential Violations of Sequential Consistency in concurrent C/C++ programs:
- Easy to port to mature data race detection tools.
- Retains performance after PVSC elimination.
- Scalable and low-cost.
Current limitations:
- Limitations inherited from dynamic data race detection: false positives and false negatives. These can be addressed as data race detection progresses.
- Loop

28 Thanks! Suggestions?

