Presentation is loading. Please wait.

Presentation is loading. Please wait.

Assume/Guarantee Reasoning using Abstract Interpretation Nurit Dor Tom Reps Greta Yorsh Mooly Sagiv.

Similar presentations


Presentation on theme: "Assume/Guarantee Reasoning using Abstract Interpretation Nurit Dor Tom Reps Greta Yorsh Mooly Sagiv."— Presentation transcript:

1 Assume/Guarantee Reasoning using Abstract Interpretation Nurit Dor Tom Reps Greta Yorsh Mooly Sagiv

2 Limitations of Whole Program Analysis Complexity of Chaotic Iterations Not all the source code is available –Large libraries –Software components No interaction with the client –Program design

3 A Motivating Example List rev(List x) { if (x ==null) return null ; return append(rev(x  next), x); } List append(List x, List y) { List e; if (x == null) return y; e = malloc(…); e  data = x  data; e  next = append(x  next, y); } List rev(List x) requires acyclic(x) ensures $$=reverse(x) List append(List x, List y) requires acyclic(x)  acyclic(y) ensures $$= x || y Contract Can also used for runtime testing

4 Challenges in A/G Reasoning Specifying procedure contracts Performing abstract interpretation using contracts

5 Specifying Contracts Executable specifications –assert –Can use loops –Expressive –Natural –But what about side-effects Declarative specifications –Types –First order logic –Z Hybrid –Larch –Java Modeling Language

6 Procedure Contracts and Modularity The postcondition does not reveal the whole story void foo(List x, List z) { List y, t ; y = rev(x); t = rev(z); } List rev(List x) requires acyclic(x) ensures $$=reverse(x) List foo(List x) requires acyclic(x)  acyclic(y) ensures true

7 Procedure Contracts and Modularity Specify parts of the state which may be modified But difficult to define potential side-effects Can use abstract interpretation void foo(List x, List z) { List y, t ; y = rev(x); t = rev(z) } List rev(List x) requires acyclic(x) ensures $$=reverse(x) List foo(List x) requires acyclic(x)  acyclic(y) ensures true

8 Issues in Specifying Contracts Expressible Conciseness Natural Reuse Cost of dynamic check (model checking) Decidability Cost of abstract interpretation

9 Plan CSSV: A tool for verifying absence of buffer overruns (N. Dor) An algorithm for performing abstract interpretation in the most precise way using specification

10 CSSV: Towards a Realistic Tool for Statically Detecting All Buffer Overflows in C Nurit Dor, Michael Rodeh, Mooly Sagiv DAEDALUS project

11 /* from web2c [strpascal.c] */ void foo(char *s) { while ( *s != ‘ ‘ ) s++; *s = 0; } Vulnerabilities of C programs Null dereference Dereference to unallocated storage Out of bound pointer arithmeticOut of bound update

12 Is it common? General belief – yes! FUZZ study –Test reliability by random input –Tens of applications on 9 different UNIX systems –18% – 23% hang or crash CERT advisory –Up to 50% of attacks are due to buffer overflow COMMON AND DANGEROUS

13 CSSV’s Goals Efficient conservative static checking algorithm –Verify the absence of buffer overflow not just finding bugs –All C constructs Pointer arithmetic, casting, dynamic memory, … –Real programs –Minimum false alarms

14 Verifying Absence of Buffer Overflow is non-trivial void safe_cat(char *dst, int size, char *src ) { if ( size > strlen(src) + strlen(dst) ) { dst = dst + strlen(dst); strcpy(dst, src); } {string(src)  alloc(dst) > len(src)} {string(src)  string(dst)  alloc(dst+len(dst)) > len(src)} string(src)  string(dst)  (size > len(src)+len(dst))  alloc(dst+len(dst)) > len(src))

15 Can this be done for real programs? Complex linear relationships Pointer arithmetic Loops Procedures Use Polyhedra[CH78] Pointer analysis Widening Procedure contracts Very few false alarms!

16 Linear Relation Analysis  Cousot and Halbwachs, 78  Statically analyze program variable relations: a 1 * var 1 + a 2 * var 2 + … + a n * var n  b  Polyhedron y  1 x + y  3 -x + y  1 0 1 2 3 x 0 1 2 3 y V = { (1,2) (2,1) } R = { (1,0) (1,1) }

17 C String Static Verifier Detects string violations –Buffer overflow (update beyond bounds) –Unsafe pointer arithmetic –References beyond null termination –Unsafe library calls Handles full C –Multi-level pointers, pointer arithmetic, structures, casting, … Applied to real programs –Public domain software –C code from Airbus

18 Plan Semantics for C program Contract language Static analysis algorithm Implementation

19 Standard C Semantics void safe_cat(char *dst, int size, char *src ) { if ( size > strlen(src) + strlen(dst) ) { dst = dst + strlen(dst); strcpy(dst, src); } } src 0x480588 dst 0x480580 size 0x480584 0x5058510 125 ‘x’ 0x5050510 0x5050518 0 ‘y’ 0x6000009 0x6000A00 0 0x6000009

20 Instrumented C Semantics src 0x480588 dst 0x480580 size 0x480584 0x5058510 125 ‘x’ 0x5050510 0x5050518 0 ‘y’ 0x6000009 0x6000A00 0 4 130 baseasize 4 4 245 0x6000009

21 Instrumented C Semantics src 0x480588 dst 0x480580 size 0x480584 0x5058510 125 ‘x’ 0x5050510 0x5050518 0 ‘y’ 0x6000009 0x6000A00 0 4 130 baseasize 4 4 245 0x6000009 0 offset 9 0x6000000

22  The instrumented semantics checks validity of C expressions  ANSI C  Cleanness  dst = dst + i Safety offset(dst) + i  asize(base(dst)) dst offset(dst) base(dst) asize(base(dst)) i

23 Contracts Defined in the instrumented semantics Specify string behavior of procedures (C expressions) –Precondition –Postcondition Use of values at procedure entry –Side-effects Can be approximated from pointed information No need to specify pointer information –Not aiming for modular pointer analysis

24 Contracts’ Advantages Modular analysis –Use contracts on call statements –Not all the code is available –Enable more expensive analyses User control of the verification –Detect errors at point of logical error –Improve the precision of the analysis Check additional properties –Beyond ANSI-C

25 Example char* strcpy(char* dst, char* src) requires mod ensures ( string(src)  alloc(dst) > len(src) ) ( len(dst) = = [len(src)] pre  return = = [dst] pre ) dst

26 safe_cat’s contract void safe_cat(char* dst, int size, char* src) requires mod ensures ( string(src)  string(dst) alloc(dst) == size ) ( len(dst) <= [len(src)] pre + [len(dst)] pre  len(dst) >= [len(dst)] pre ) dst

27 Contracts and Soundness All errors are detected –Violation of statement’s precondition …a[i]… –Violation of procedure’s precondition Call –Violation of procedure's postcondition Return Violation messages depend on the contracts But may lead to more false alarms (e.g., trivial contracts)

28 CSSV Static Analysis 1.Inline contracts Expose behavior of called procedures 2.Pointer analysis (global) Find relationship between base addresses 3.Integer analysis Compute offset information

29 Step 1: Inliner void safe_cat( char *dst, int size, char *src ) { … strcpy(dst, src); … } void safe_cat( char *dst, int size, char *src ) requires ( string(src)  string(dst) alloc(dst) == size) mod dst ensures ( len(dst) = = [pre@len(src)] pre + [len(dst)] pre ) char* strcpy( char *dst, char *src ) requires ( string(src) alloc(dst) > len(src)) mod dst ensures ( len(dst) = = [len(src)] pre  return = = [dst] pre )

30 Step 1: Inliner void safe_cat( char *dst, int size, char *src ) { … strcpy(dst, src); … } void safe_cat( char *dst, int size, char *src ) requires ( string(src)  string(dst) alloc(dst) == size) mod dst ensures ( len(dst) = = [pre@len(src)] pre + [len(dst)] pre ) char* strcpy( char *dst, char *src ) requires ( string(src) alloc(dst) > len(src)) mod dst ensures ( len(dst) = = [len(src)] pre  return = = [dst] pre ) assume assert

31 Step 1: Inliner void safe_cat( char *dst, int size, char *src ) { … strcpy(dst, src); … } void safe_cat( char *dst, int size, char *src ) requires ( string(src)  string(dst) alloc(dst) == size) mod dst ensures ( len(dst) = = [pre@len(src)] pre + [len(dst)] pre ) char* strcpy( char *dst, char *src ) requires ( string(src) alloc(dst) > len(src)) mod dst ensures ( len(dst) = = [len(src)] pre  return = = [dst] pre ) assume assert

32 Step 2: Compute Pointer Information Required for reasoning about pointers Every base address is abstracted by an abstract location Relationships between base addresses is computed (points-to) Global analysis –Scalable –Imprecise Flow insensitive (Almost) Context insensitive

33 Global Points-To main() { char s[10], t[20],r; char *p1, *p2; … p1= r + i; safe_cat(s,10,p1); p2 = r + j; safe_cat(t,10,p2); … } str p2 dstsrc safe_cat( char *dst, int size, char *src ) { … strcpy(dst, src); … } p1

34 Procedural Points-to (PPT) “Project” pointer information on visible variables of the procedure Introduce abstract locations for formal parameters Allow destructive updates through formal parameters (well behaved programs) Can decrease precision in some procedures

35 PPT Param #1Param # 2 dstsrc safe_cat( char *dst, int size, char *src ) { … strcpy(dst, src); … }

36 Step 3: Static Analysis Prove linear inequalities on string indices Abstract string properties using constraint variables Use abstract interpretation to conservatively interpret program statements Verify safety preconditions

37 Back to Semantics src 0x480588 dst 0x480580 size 0x480584 0x5058510 125 ‘x’ 0x5050510 0x5050518 0 ‘y’ 0x6000009 0x6000A00 0 4 130 baseasize 4 4 245 0x6000009 0 offset 9 0x6000000

38 Abstract Representation src dst size n1n1 n2n2 Base address relationship src 0x480588 dst 0x480580 size 0x480584 0x5058510 125 ‘x’ 0x5050510 0x5050518 0 ‘y’ 0x6000009 0x6000A00 0 0x6000009 0x6000000

39 Constraint Variables For every abstract location a.offset src.offset = 9 src

40 Constraint Variables For every integer abstract location a.val size.val = 125 size

41 Constraint Variables For every abstract location a.is_nullt a.len a.asize n1n1 n 1.len n 1.asize 0

42 Abstract Representation src dst size n1n1 n2n2 dst.offset < n 1.len size.val+ dst.offset = n 1.asize n 1.is_nullt = true n 2.is_nullt = true

43 What does it represent? dst size ? ? n 1.is_nullt = true 0 ? dst.offset < n 1.len n 1. len dst.offset size.val + dst.offset = n 1.asize size.val n 1. asize

44 Abstract Interpretation dst.offset < n 1.len size.val = n 1.asize - dst.offset dst = dst + strlen(dst); dst.offset = n 1.len size.val = n 1.asize - dst.offset + n 1.len

45 Verify Safety Condition dst = dst + i dst offset(dst) base(dst) asize(base(dst)) i offset(dst) + i  asize(base(dst)) concrete semantics abstract semantics dst.offset + i.val  n 1.asize n1n1 dst.offset n 1.asize dst i

46 The Assume-Operation Use two copies of constraint variables Set modified values to ⊤ Meet the post

47 CSSV Implementation C files Pre Mod Post C files contracts Procedure name Pointer Analysis Procedure ’ s Pointer info Inliner C files C ’ files C2IP Integer Procedure Potential Error Messages Integer Analysis

48 Used Software ASToolKit [Microsoft] Core C [TAU - Greta Yorsh] GOLF [Microsoft - Manuvir Das] New Polka [Inria - Bertrand Jeannet]

49 Applications Verified string library from Airbus with 6 false alarms –Could be avoided by analyzing correlated conditions Found 8 real errors in another string intensive application with 2 false alarms –In one case safety depends on correctness –Could be avoided by defensive programming 1 - 206 CPU seconds per procedure –No optimizations Very few false alarms

50 Related Work Non-Conservative Wagner et. al. [NDSS’00] LCLint’s extension [USENIX’01] Eau Claire [IEEE Oakland 02] Conservative Polyspace verifier Dor, Rodeh and Sagiv [SAS’01]

51 Further work Derive contracts Improve efficiency Interprocedural

52 CSSV: Summary Semantics –Safety checking –Full C –Enables abstractions Contract language –String behavior –Omit pointer aliasing Procedural points-to –Scalable –Improve precision Static analysis –Tracks important string properties –Utilizes integer analysis

53 Foundation of A/G abstract interpretation Greta Yorsh www.cs.tau.ac.il/~gretay

54 Assume-Guarantee Reasoning using AI T bar(); void foo() { T p;... p = bar();... } {pre bar, post bar } {pre foo, post foo } assume[pre foo ]; assert[pre bar ]; ----------- assume[post bar ]; assert[post foo ]; Is  (a)     ? assert[  ](a) assume[  ](a) <⊤><⊤>  (  (a) ⋂    )  a ⋂  (    )

55 Goals Generic algorithms for assert & assume Effective Efficient Allow natural specifications Rather precise verification

56 Motivation New approach to using symbolic techniques in abstract interpretation –for shape analysis –for other analyses What does it mean to harness a decision procedure for use in static analysis? –what are the requirements ? –what does it buy us ?

57 What are the requirements ? Formulas S ∈  (a) ⇔ S   (a) ^ AbstractConcrete a  ^  Is  (a) empty? Is  (a) satisfiable? ^ ⇔   (a)

58 [x  0, y  0, z  0] [x  0, y  1, z  0] [x  0, y  2, z  0]  [x  0, y  , z  0]   AbstractConcreteFormulas (x=0)  (z=0)  ^ S ⊧  (a) ⇔ S ∈  (a) ^

59 FormulasConcrete Values Abstract Values u1u1 x u     x... x  v1,v2 : node u1 (v1)  node u (v2)  v1 ≠ v2   v : node u1 (v)  node u (v) ...

60 What does it buy us ? Guarantee the most-precise result w.r.t. to the abstraction –best transformer –other abstract operations Modular reasoning –assume-guarantee reasoning –scalability

61 AbstractConcrete The assume[  ](a) Operation a   =  (  (a)    ) Formulas   (a)  ^ X (a)(a)   ^  (  (a)  ) ^ ^ assume[  ](a) X

62 Formulas AbstractConcrete The abstraction operation  (  ) ^   ^ a1a1 a2a2   

63 Assume-Guarantee Reasoning using AI T bar(); void foo() { T p;... p = bar();... } {pre bar, post bar } {pre foo, post foo } assume[pre foo ]; assert[pre bar ]; ----------- assume[post bar ]; assert[post foo ]; ^ Is  (a)   ? assert[  ](a) assume[  ](a) <⊤><⊤>  ( )  (  (a) ⋀  ) ^ ^

64 Formulas AbstractConcrete Computing  (  )  ^ ^ ans   ⊤ a1a1

65 3-Valued Logical Structures Relation meaning over {0, 1, ½} Kleene – 1: True – 0: False – ½ : Unknown A join semi-lattice: 0 ⊔ 1 = ½   ½

66 Canonical Abstraction x u1u1 u2u2 u3u3 u4u4 c,r x x u1u1 u2u2 x ∃ v 1,v 2 :node u1 (v 1 ) ⋀ node u2 (v 2 ) ⋀∀ w: node u1 (w) ⋁ node u2 (w) ⋀ ∀ w 1,w 2 :node u1 (w 1 ) ⋀ node u1 (w 2 ) ⇒ (w 1 =w 2 ) ⋀⌝ n(w 1,w 2 ) ⋀∀ v:r x (v) ⇔∃ v1: x(v1) ⋀ n*(v1,v) ⋀∀ v:c(v) ⇔∃ v1:n(v,v1) ⋀ n*(v1,v) ⋀∀ v1,v2:x(v1) ⋀ x(v2) ⇒ v1=v2 ⋀ ∀ v,v1,v2:n(v,v1) ⋀ n(v,v2) ⇒ v1=v2 FO TC  (a) ≜ ^

67  y == x->n Formulas Concrete  ⊤ ans  ≜ ∀ v 1 :y(v 1 ) ↔ ∃ v 2 : x(v 2 ) ⋀ n(v 2, v 1 ) Abstract x u1u1 u2u2 yy x u1u1 uyuy y x u1u1 u2u2 uyuy y x (()(() ^

68 Example - Materialization x u1u1 u2u2 yy x u1u1 u2u2 y y y(u 2 )=0 materialization u 2  u y, u 2 y(u y ) = 1, y(u 2 ) =0 u2u2 x u1u1 uyuy y y y y(u 2 )=1 x u1u1 u2u2 y y  Is  (a)   satisfiable ? ^ y == x->n

69 Example – Refinement x u1u1 uyuy y u2u2 n(u y,u 2 ) = 0 u1u1 uyuy y u2u2 x n(u y,u 2 ) = 1 u1u1 uyuy y u2u2 x u1u1 uyuy y u2u2 x n(u y,u 2 ) = ½ ∀ concrete stores ∃ two pairs of nodes n(a 1, a 2 ) = 1 and n(b 1, b 2 ) = 0 ∀ concrete stores ∀ pair of nodes n(a 1, a 2 ) = 1 or n(a 1, a 2 ) = 0 y == x->n Is  (a)   satisfiable ? ^

70 Abstract Operations  (  ) – best abstract value that represents  What does it buy us ? assume[  ](a) =  (  (a) ⋀  ) –assume-guarantee reasoning –pre- and post-conditions specified by logical formulas BT(t,a) =  (  ( extend (a)) ⋀ t ) –best abstract transformer –parametric abstractions meet(a 1, a 2 ) =  (  (a 1 ) ⋀  (a 2 ) ) ^ ^ ^ ^ ^ ^ ^ ^

71 SPASS Experience Handles arbitrary FO formulas Can diverge –use timeout Converges in our examples –Captures older shape analysis algorithms How to handle FO TC ? –Overapproximations lead to too many structures

72 Decidable Transitive-closure Logic Neil Immerman (UMASS), Alexander Rabinovich (TAU) ∃∀ (TC,f) is subset of FO TC –exist-forall form –arbitrary unary relations –single function f Decidable for satisfiability –NEXPTIME-complete Any “reasonable” extension is undecidable Rather limited

73 Simulation Technique – CAV’04 Neil Immerman (UMASS), Alexander Rabinovich (TAU) Simulate realistic data structures using decidable logic over tractable structures –Singly linked list - shared/cyclic/nested –Doubly linked list –Trees Preserved under mutations Abstract interpretation, Hoare-style verification

74 Further Work Implementation Decidable logic for shape analysis Assume-guarantee of “real” programs –case study: Java Collection (B. Livshits, Noam) –Estimate side-effects (A. Skidanov) –specification language –write procedure specifications Extend to other domains –Infinite-height Tune the abstraction based on specification

75 Summary A/G Approach can scale program analysis/verification But requires some effort –Language designers –Programmers –Abstract interpretation –Efficient runtime testing


Download ppt "Assume/Guarantee Reasoning using Abstract Interpretation Nurit Dor Tom Reps Greta Yorsh Mooly Sagiv."

Similar presentations


Ads by Google