Heap Decomposition for Concurrent Shape Analysis R. Manevich T. Lev-Ami M. Sagiv Tel Aviv University G. Ramalingam MSR India J. Berdine MSR Cambridge Dagstuhl.

Heap Decomposition for Concurrent Shape Analysis R. Manevich T. Lev-Ami M. Sagiv Tel Aviv University G. Ramalingam MSR India J. Berdine MSR Cambridge Dagstuhl 08061, February 7, 2008

2 Thread-modular analysis for coarse-grained concurrency: e.g., [Qadeer & Flanagan, SPIN’03], [Gotsman et al., PLDI’07], … Associate with each lock lk a subheap h(lk) and partition the heap as H = h(lk_1) * … * h(lk_n). A local invariant I(lk) is inferred or specified for each lock. When thread t acquires lk it assumes I(lk); when it releases lk it ensures I(lk). Each thread can then be analyzed “separately”, which avoids explicitly enumerating all thread interleavings.
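The assume-on-acquire, ensure-on-release discipline can be sketched in C as follows. This is a minimal single-threaded illustration, not code from the talk: the `LockedList` type, the boolean `held` flag standing in for a real mutex, and the particular invariant (the list is acyclic and `len` equals its length) are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Node { int d; struct Node *n; } Node;

/* Hypothetical subheap h(lk): a list and its cached length, guarded by one lock. */
typedef struct {
    bool held;   /* stand-in for a real mutex (assumption for this sketch) */
    Node *head;
    int len;
} LockedList;

/* I(lk): the list is acyclic and len equals the number of nodes. */
static bool invariant(const LockedList *l) {
    int count = 0;
    for (const Node *p = l->head; p != NULL; p = p->n) {
        if (++count > l->len) return false;  /* too long or cyclic */
    }
    return count == l->len;
}

/* On acquire the thread may assume I(lk). */
static void acquire(LockedList *l) {
    assert(!l->held);
    l->held = true;
    assert(invariant(l));
}

/* On release the thread must ensure I(lk). */
static void release(LockedList *l) {
    assert(l->held);
    assert(invariant(l));
    l->held = false;
}

/* Between acquire and release the invariant may be broken temporarily. */
static void locked_push(LockedList *l, Node *x) {
    acquire(l);
    x->n = l->head;
    l->head = x;   /* invariant broken here: len is stale */
    l->len++;      /* invariant restored */
    release(l);
}
```

Because every critical section starts from I(lk) and re-establishes it, `locked_push` can be checked on its own without looking at what other threads do while the lock is free.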

3 Thread-modular analysis for fine-grained concurrency? Synchronization uses CAS (Compare And Swap) instead of locks. No locks means more interference between threads and no nice heap partitioning. Still, the idea of reasoning about threads separately is appealing.

4 Overview: The state space is too large for two reasons. (1) Unbounded number of objects ⇒ infinite state space: apply finitary abstractions to data structures (e.g., abstract away the length of a list). (2) Exponential in the number of threads. Observation: threads operate on part of the state, and correlations between different substates are often irrelevant for proving safety properties. Our approach: develop an abstraction for substates that abstracts away correlations between substates of different threads and so reduces the exponential state space.

5 Non-blocking stack [Treiber 1986]
#define EMPTY -1
typedef int data_type;
typedef struct node_t { data_type d; struct node_t *n; } Node;
typedef struct stack_t { struct node_t *Top; } Stack;
[1] void push(Stack *S, data_type v) {
[2]   Node *x = alloc(sizeof(Node));
[3]   x->d = v;
[4]   do {
[5]     Node *t = S->Top;
[6]     x->n = t;
[7]   } while (!CAS(&S->Top,t,x));
[8] }
[9] data_type pop(Stack *S){
[10]   do {
[11]     Node *t = S->Top;
[12]     if (t == NULL)
[13]       return EMPTY;
[14]     Node *s = t->n;
[15]     data_type r = t->d;
[16]   } while (!CAS(&S->Top,t,s));
[17]   return r;
[18] }
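The slide's listing is pseudocode (`alloc` and `CAS` are not defined, and locals declared inside the loop body are used in the loop condition). A compilable sketch using C11 atomics might look as follows. The helper `stack_init`, the use of `malloc`, and the choice of `atomic_compare_exchange_weak` are assumptions of this sketch, not details from the talk. The value is read from `t`, the node being removed, and popped nodes are deliberately not freed, since safe memory reclamation is a separate problem.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define EMPTY -1
typedef int data_type;
typedef struct node_t { data_type d; struct node_t *n; } Node;
typedef struct stack_t { _Atomic(Node *) Top; } Stack;

/* Hypothetical helper: start with an empty stack. */
void stack_init(Stack *S) {
    assert(S != NULL);
    atomic_init(&S->Top, NULL);
}

void push(Stack *S, data_type v) {
    Node *x = malloc(sizeof(Node));
    if (x == NULL) abort();
    x->d = v;
    Node *t = atomic_load(&S->Top);
    do {
        x->n = t;  /* on CAS failure, t has been refreshed with the current Top */
    } while (!atomic_compare_exchange_weak(&S->Top, &t, x));
}

data_type pop(Stack *S) {
    Node *t = atomic_load(&S->Top);
    Node *s;
    do {
        if (t == NULL)
            return EMPTY;
        s = t->n;
    } while (!atomic_compare_exchange_weak(&S->Top, &t, s));
    /* t is now unlinked; it is not freed here (no safe reclamation scheme). */
    return t->d;
}
```

Unlike the slide's `CAS(addr, old, new)`, the C11 call writes the observed value back into `t` when it fails, so the loop does not need to reload `S->Top` explicitly.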

6 Example: successful push (push code, lines [1]-[8] of slide 5). [Figure: heap diagram with list nodes linked by n fields and the pointers Top, t and x.]

7 Example: successful push (push code, lines [1]-[8] of slide 5). S->Top still equals t, so the CAS at line [7] succeeds. [Figure: heap diagram with list nodes linked by n fields and the pointers Top, t and x.]

8 Example: unsuccessful push (push code, lines [1]-[8] of slide 5). S->Top no longer equals t, so the CAS at line [7] fails. [Figure: heap diagram with list nodes linked by n fields and the pointers Top, t and x.]
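The successful and unsuccessful push attempts can be replayed deterministically in one thread. In the sketch below the function `push_attempt_with_interference` and its `intruder` parameter are hypothetical: they simulate another thread completing a push between lines [5]-[6] and the CAS at line [7]. The `CAS` wrapper is one possible reading of the slide's primitive, built on the C11 strong compare-exchange.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Node { int d; struct Node *n; } Node;

/* CAS in the style of the slides: succeeds only if *addr still equals expected. */
static bool CAS(_Atomic(Node *) *addr, Node *expected, Node *desired) {
    return atomic_compare_exchange_strong(addr, &expected, desired);
}

/* One iteration of push's loop body. If intruder is non-NULL, a second
 * thread's successful push is simulated just before the CAS. */
static bool push_attempt_with_interference(_Atomic(Node *) *top, Node *x,
                                           Node *intruder) {
    assert(top != NULL && x != NULL);
    Node *t = atomic_load(top);          /* [5] Node *t = S->Top; */
    x->n = t;                            /* [6] x->n = t;         */
    if (intruder != NULL) {              /* interference by another thread */
        intruder->n = atomic_load(top);
        atomic_store(top, intruder);
    }
    return CAS(top, t, x);               /* [7] CAS(&S->Top,t,x)  */
}
```

Without interference the attempt succeeds and Top points to `x`; with interference Top has moved to the intruder's node, the CAS fails, and the real `push` would go around the loop again.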

9 Concrete states with storable threads. [Figure: heap with the list reachable from Top and four thread objects: prod1 (pc=7), prod2 (pc=6), cons1 (pc=14), cons2 (pc=16).] Legend: thread object (name + program location); local variable (x, t, s); next field of list (n).

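10 Full state S1. [Figure: the full state S1, containing the list reachable from Top and the thread objects prod1 (pc=7), prod2 (pc=6), cons1 (pc=14), cons2 (pc=16) with their local variables.]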

11 Decomposition(S1). [Figure: four substates, each containing Top, part of the list and one thread object: M1 (prod1, pc=7, locals x and t), M2 (prod2, pc=6, locals x and t), M3 (cons1, pc=14, local t), M4 (cons2, pc=16, locals t and s).] Decomposition(S1) = M1 × M2 × M3 × M4. Note that S1 is represented by Decomposition(S1): a substate represents all full states that contain it. Decomposition is state-sensitive (it depends on the values of pointers and on heap connectivity).

12 Full states S1 ∪ S2. [Figure: two full states over the same heap. S1: prod1 (pc=7), prod2 (pc=6), cons1 (pc=14), cons2 (pc=16). S2: prod2 (pc=7), prod1 (pc=6), cons2 (pc=14), cons1 (pc=16).]

13 Decomposition(S1 ∪ S2). [Figure: substates M1 (prod1, pc=7), M2 (prod2, pc=6), M3 (cons1, pc=14), M4 (cons2, pc=16) obtained from S1, and K1 (prod1, pc=6), K2 (prod2, pc=7), K3 (cons1, pc=16), K4 (cons2, pc=14) obtained from S2.] Decomposition(S1 ∪ S2) = {M1, K1} × {M2, K2} × {M3, K3} × {M4, K4}. The set S1 ∪ S2 is represented by Decomposition(S1 ∪ S2). The Cartesian abstraction ignores correlations between substates, so mixed combinations such as (M1, K2, M3, K4) are represented as well. The state space is exponentially more compact.
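The trade-off made by the Cartesian abstraction can be illustrated with a small sketch. It encodes each substate as a small integer id (here 0 for the M substates and 1 for the K substates), which is an assumption of the example; real substates are heap fragments. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NCOMP 4   /* one component (subdomain) per thread, as on the slide */

typedef struct { int sub[NCOMP]; } FullState;           /* one substate id per component */
typedef struct { unsigned seen[NCOMP]; } CartesianAbs;  /* bit i set: id i seen in that component */

/* Project every full state onto each component and forget which
 * substates occurred together. */
CartesianAbs abstract_states(const FullState *states, size_t n) {
    CartesianAbs a = { {0} };
    for (size_t i = 0; i < n; i++)
        for (int c = 0; c < NCOMP; c++) {
            assert(states[i].sub[c] >= 0 && states[i].sub[c] < 32);
            a.seen[c] |= 1u << states[i].sub[c];
        }
    return a;
}

/* A full state is represented if each of its substates was seen. */
bool represents(const CartesianAbs *a, const FullState *s) {
    for (int c = 0; c < NCOMP; c++)
        if (!(a->seen[c] & (1u << s->sub[c]))) return false;
    return true;
}

static unsigned count_bits(unsigned x) {
    unsigned n = 0;
    for (; x != 0; x &= x - 1) n++;
    return n;
}

/* Number of full states represented: product of the component sizes. */
unsigned long represented_count(const CartesianAbs *a) {
    unsigned long p = 1;
    for (int c = 0; c < NCOMP; c++) p *= count_bits(a->seen[c]);
    return p;
}

/* Number of substates actually stored: sum of the component sizes. */
unsigned stored_substates(const CartesianAbs *a) {
    unsigned s = 0;
    for (int c = 0; c < NCOMP; c++) s += count_bits(a->seen[c]);
    return s;
}
```

For the two states of the example, the abstraction stores 8 substates but represents 2^4 = 16 combinations, including mixed ones that were never observed. That is the source of both the compactness and the loss of precision.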

14 Abstraction properties: Substates in each subdomain correspond to a single thread. Correlations between threads are abstracted away, which gives an exponential reduction of the state space. Substates preserve information on the part of the heap that is relevant to one thread. Substates may overlap, which is useful for reasoning about programs with fine-grained concurrency and gives a better approximation of interference between threads.

15 Main results: A new parametric abstraction for heaps: heap decomposition + Cartesian abstraction, parametric in the underlying abstraction and in the decomposition. Parametric sound transformers, which allow balancing efficiency and precision. Implementation in HeDec (Heap Decomposition + Canonical Abstraction), used to prove interesting properties of heap-manipulating programs with fine-grained concurrency, such as linearizability. The analysis scales linearly in the number of threads.

16 Sound transformers. [Diagram: the abstract transformer maps the product of the pre-subdomains j1, j2, j3, j4 (sets of substates X) to the product of the post-subdomains j1’, j2’, j3’, j4’ (sets of substates Y).]

17 Pointwise transformers. [Diagram: the abstract transformer is applied to each pre-subdomain j1, j2, j3, j4 separately, yielding the post-subdomains j1’, j2’, j3’, j4’.] Efficient, but often too imprecise.

18 Imprecision example (push code, lines [1]-[8] of slide 5). [Figure: substate M2, containing Top, part of the list and prod2 at pc=6 with its locals x and t.] The abstract transformer schedules prod1 and executes x->n=t on M2. But where do x and t of prod1 point to?

19 Imprecision example (push code, lines [1]-[8] of slide 5). [Figure: a full state with the threads prod1, prod2, cons1 and cons2, and the resulting substate of prod2 at pc=7.] False alarm: possible cyclic list.

20 Full composition transformers. [Diagram: all pre-subdomains j1, j2, j3, j4 are composed, the abstract transformer is applied to the composed states, and the result is decomposed into the post-subdomains j1’, j2’, j3’, j4’.] Precise, but exponential space blow-up.

21 Partial composition. [Diagram: from the pre-subdomains j1, j2, j3, j4, subdomain j1 is composed pairwise with j2, with j3 and with j4.]

22 Partial composition. [Diagram: the abstract transformer is applied to each pairwise composition (j1 with j2, j1 with j3, j1 with j4), yielding the post-subdomains j1’, j2’, j3’, j4’.] Efficient and precise.

23 Partial composition example. [Figure: subdomain j1 contains M1 (prod1, pc=7) and K1 (prod1, pc=6); subdomain j2 contains M2 (prod2, pc=6) and K2 (prod2, pc=7); the two subdomains are composed.]

24 Partial composition example. [Figure: the compositions of K2 with K1 and of K2 with M1, each containing Top, part of the list and both prod1 and prod2 with their local variables x and t.] False alarm avoided.
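The difference between the pointwise and the partially composed transformer for x->n=t can be mimicked on a toy heap. The sketch below is a strong simplification and not the HeDec transformers: nodes are array indices, the observer substate records only Top and the n fields, and the pointwise case is modeled as "x and t of the scheduled thread may be anything". The names `Substate`, `ThreadView` and both `*_may_report_cycle` functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define N 4        /* tiny heap: nodes 0..N-1 */
#define NIL (-1)   /* stands for NULL */

typedef struct { int top; int next[N]; } Substate;     /* observer's view: Top and n fields */
typedef struct { int top; int x; int t; } ThreadView;  /* scheduled thread's view: Top, x, t */

static bool reaches_nil(const Substate *h, int start) {
    int p = start;
    for (int steps = 0; steps < N && p != NIL; steps++) p = h->next[p];
    return p == NIL;
}

static bool cyclic(const Substate *h) {
    for (int i = 0; i < N; i++)
        if (!reaches_nil(h, i)) return true;
    return false;
}

/* Effect of the statement x->n = t on the observer's substate. */
static Substate assign_next(Substate h, int x, int t) {
    assert(x >= 0 && x < N && t >= NIL && t < N);
    h.next[x] = t;
    return h;
}

/* Pointwise: the observer does not know x and t, so every choice is tried. */
bool pointwise_may_report_cycle(const Substate *h) {
    for (int x = 0; x < N; x++)
        for (int t = NIL; t < N; t++) {
            Substate post = assign_next(*h, x, t);
            if (cyclic(&post)) return true;
        }
    return false;
}

/* Partial composition: only views from the scheduled thread's subdomain
 * that agree with the observer on the overlapping part (here: Top). */
bool composed_may_report_cycle(const Substate *h, const ThreadView *views,
                               size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (views[i].top != h->top) continue;  /* inconsistent overlap */
        Substate post = assign_next(*h, views[i].x, views[i].t);
        if (cyclic(&post)) return true;
    }
    return false;
}
```

With the list Top → 1 → 2 and a fresh node 0 held in x, the pointwise version reports a possible cycle (for instance by guessing x = 2, t = 1), while composing with the scheduled thread's substate leaves only x = 0, t = 1 and the list stays acyclic.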

25 Sound transformers: Parametric transformers work in three steps: compose pre-subdomains, apply point-wise transformations, compose post-subdomains. They are specified by the analysis designer for each type of statement (once and for all). Any transformer specification is valid, but precision and efficiency vary. Some heuristics are available. Details in the tech. report.

26 Experimental results: List-based fine-grained algorithms: non-blocking stack [Treiber 1986], non-blocking queue [Doherty and Groves FORTE’04], two-lock queue [Michael and Scott PODC’96]. Benign data races. Verified absence of null dereferences and memory leaks. Verified linearizability. The analysis is built on top of the existing full-heap analysis of [Amit et al. CAV’07]. Scaled the analysis from 2/3 threads to 20 threads. Extended to unbounded threads (different work).

27 Experimental results: Exponential time/space reduction. Benchmark: non-blocking stack + linearizability.

28 Related work: Disjoint regions decomposition [TACAS’07]: fixed decomposition scheme; the most precise transformer is FNP-complete. Partial join [Manevich et al. SAS’04]: orthogonal to decomposition; in HeDec we combine decomposition + partial join. [Yang et al.] Handling concurrency for an unbounded number of threads: thread-modular analysis [Gotsman et al. PLDI’07], rely-guarantee [Vafeiadis et al. CAV’07], thread quantification (submitted).

29 More related work: Local transformers: works by Reynolds, O’Hearn, Berdine, Yang, Gotsman, Calcagno. Heap analysis by separation [Yahav & Ramalingam PLDI’04] [Hackett & Rugina POPL’05]: decompose the verification problem itself and conservatively approximate contexts. Heap decomposition for interprocedural analysis [Rinetzky et al. POPL’05] [Rinetzky et al. SAS’05] [Gotsman et al. SAS’06] [Gotsman et al. PLDI’07]: decompose/compose at procedure boundaries. Predicate/variable clustering [Clark et al. CAV’00]: statically determined decomposition.

30 Conclusion: Parametric framework for shape analysis. Scales analyses of programs with fine-grained concurrency. Generalizes thread-modular analysis. Key idea: state decomposition, which is also useful for sequential programs. Used to prove intricate properties like linearizability. HeDec tool: http://www.cs.tau.ac.il/~tvla#HEDEC

31 Future/ongoing work: Extended analysis for an unbounded number of threads via thread quantification: an orthogonal technique, and both techniques compose very well. Can we automatically infer good decompositions? Can we automatically tune transformers? Can we reuse these ideas for non-shape analyses?

32 Invited questions How do you choose a decomposition? How do you choose transformers? How does it compare to separation logic? What is a general principle and what is specific to shape analysis? Caveats / limitations?

33 How do you choose a decomposition? In general this is an open problem. Perhaps counterexample refinement can help. It depends on the property you want to prove. Aim at the causes of combinatorial explosion: threads, iterators. For linearizability we used: (1) for each thread t, a component with the thread node, the objects referenced by its local variables and the objects referenced by global variables; (2) a component with the objects referenced by global variables and the objects correlated with the sequential execution; (3) a locks component: for each lock, the thread that acquires it.

34 How do you choose transformers? In general this is a challenging problem. We have to balance efficiency and precision. We have some heuristics. Core subdomains.

35 How does it compare to separation logic? Relevant separating conjunction *r: like * but without the disjointness requirement. Do you have an analog of the frame rule? For disjoint regions decomposition [TACAS’07]; in general no, but instead we can use transformers of different levels of precision: the transformer for a composition of I1 and I2 is computed by applying a precise transformer to I1 and a less precise transformer, which is cheap to compute, to I2, and composing the results. Perhaps we can find conditions under which it suffices to apply the precise transformer to I1 and leave I2 unchanged. Relativized formulae.

36 What is a general principle and what is specific to shape analysis? Decomposing abstract domains is general: substate abstraction + Cartesian product. Parametric transformers for Cartesian abstractions are general. Chopping down heaps by heterogeneous abstractions is shape-analysis specific.

37 Caveats / limitations? Decomposition + transformers are defined by the user and are not specialized for the program/property. Too much overlap between substates can lead to more expensive analyses. Too fine a decomposition requires lots of composition. Partial composition is a bottleneck. We have the theory for finer-grained compositions + incremental transformers but no implementation. The framework is instantiated for just one abstraction (Canonical Abstraction). Can this be useful for separation logic-based analyzers?
