Program Analysis and Verification 0368-4479 Noam Rinetzky Lecture 8: Abstract Interpretation 1 Slides credit: Roman Manevich, Mooly Sagiv, Eran Yahav.

Program Analysis and Verification 0368-4479 Noam Rinetzky Lecture 8: Abstract Interpretation 1 Slides credit: Roman Manevich, Mooly Sagiv, Eran Yahav

Abstract Interpretation [Cousot’77] Mathematical foundation of static analysis 2

Order-Related Terminology Preorder Partial Order Pointed Posets Join/Meet Complete lattices

A complete lattice (D, , , , ,  ) is A set of elements D A partial order x  y A join operator  A meet operator  A bottom element  =   =  D A top element  =  D =   4

Abstract (conservative) interpretation 5 set of states operational semantics (concrete semantics) statement S set of states  abstract representation abstract semantics statement S abstract representation concretization

Domain Theory Monotone functions Chains Complete partial orders (CPO) – Every chain has a LUB Pointed CPOs Constructing CPOs – Cartesian product, relational product, disjunctive completion …

Continuity A monotonic function maps a chain of inputs into a chain of outputs: x 0  x 1  …  f(x 0 )  f(x 1 )  … It is always true that:  i  f(  i ) But f(  i )   i is not always true 7

Fixed Points Solve equation: where W:∑   ∑  ; W= S  while b do S  W(S  s   ) if B  b  (  )=true W(  ) =  if B  b  (  )=false  if B  b  (  )=  8 {

Fixed Points Solve equation: where W:∑   ∑  ; W= S  while b do S  Alternatively, W = F(W) where: F(W) = . W(S  s   ) if B  b  (  )=true W(  ) =  if B  b  (  )=false  if B  b  (  )=  9 { W(S  s   ) if B  b  (  )=true  if B  b  (  )=false  if B  b  (  )=  {

Fixed Point (cont) Thus we are looking for a solution for W = F( W) – a fixed point of F Typically there are many fixed points We may argue that W ought to be continuous W  [∑   ∑  ] Cut the number of solutions We will see how to find the least fixed point for such an equation provided that F itself is continuous 10

Fixed Point Theorem Define F k = x. F( F(… F( x)…)) (F composed k times) If D is a pointed cpo and F : D  D is continuous, then – for any fixed-point x of F and k  N F k (  )  x – The least of all fixed points is  k F k (  ) Proof: i.By induction on k. Base: F 0 (  ) =   x Induction step: F k+1 (  ) = F( F k (  ))  F( x) = x ii.It suffices to show that  k F k (  ) is a fixed-point F(  k F k (  )) =  k F k+1 (  ) =  k F k (  ) 11

Fixed-Points (notes) If F is continuous on a pointed cpo, we know how to find the least fixed point All other fixed points can be regarded as refinements of the least one – They contain more information, they are more precise – In general, they are also more arbitrary 12

Fixed-Points (notes) If F is continuous on a pointed cpo, we know how to find the least fixed point All other fixed points can be regarded as refinements of the least one – They contain more information, they are more precise – In general, they are also more arbitrary – They also make less sense for our purposes 13

Complete Lattice Let (D,  ) be a partial order D is a complete lattice if every subset has both greatest lower bounds and least upper bounds 14

Knaster-Tarski Theorem Let f: L  L be a monotonic function on a complete lattice L The least fixed point lfp(f) exists – lfp(f) =  {x  L: f(x)  x} 15

Fixed Points A monotone function f: L  L where (L, , , , ,  ) is a complete lattice Fix(f) = { l: l  L, f(l) = l} Red(f) = {l: l  L, f(l)  l} Ext(f) = {l: l  L, l  f(l)} – l 1  l 2  f(l 1 )  f(l 2 ) Tarski’s Theorem 1955: if f is monotone then: – lfp(f) =  Fix(f) =  Red(f)  Fix(f) – gfp(f) =  Fix(f) =  Ext(f)  Fix(f)   f(  ) f(  ) f2()f2() f2()f2() Fix(f) Ext(f) Red(f) gfp(f) lfp(f) 16

Collecting semantics 1 label0: if x <= 0 goto label1 x := x – 1 goto label0 label1: 2 3 4 5 if x > 0 x := x - 1 2 3 entry exit [x1][x1] [x1][x1] [x1][x1] [x0][x0] [x0][x0] [ x  -1] [x2][x2] [x2][x2] [x2][x2] [x2][x2] [x3][x3] [x3][x3] [x3][x3] … … … 17 [ x  -2] …

Defining the collecting semantics How should we represent the set of states at a given control-flow node by a lattice? How should we represent the sets of states at all control-flow nodes by a lattice? 18

Finite maps For a complete lattice L = (D, , , , ,  ) and finite set V Define the poset L V  L = (V  D,  V  L,  V  L,  V  L,  V  L,  V  L ) as follows: – f 1  V  L f 2 iff for all v  V f 1 (v)  f 2 (v) –  V  L = ?  V  L = ?  V  L = ?  V  L = ? Lemma: L is a complete lattice Define the map constructor L V  L = Map(V, L) 19

The collecting lattice Lattice for a given control-flow node v: ? Lattice for entire control-flow graph with nodes V: ? We will use this lattice as a baseline for static analysis and define abstractions of its elements 20

The collecting lattice Lattice for a given control-flow node v: L v =(2 State, , , , , State) Lattice for entire control-flow graph with nodes V: L CFG = Map(V, L v ) We will use this lattice as a baseline for static analysis and define abstractions of its elements 21

Equational definition of the semantics Define variables of type set of states for each control-flow node Define constraints between them 22 if x > 0 x := x - 1 2 3 entry exit R[entry] R[2] R[3] R[exit]

Equational definition of the semantics R[2] = R[entry]   x:=x-1  R[3] R[3] = R[2]  {s | s(x) > 0} R[exit] = R[2]  {s | s(x)  0} A system of recursive equations How can we approximate it using what we have learned so far? 23 if x > 0 x := x - 1 2 3 entry exit R[entry] R[2] R[3] R[exit]

An abstract semantics R[2] = R[entry]   x:=x-1  # R[3] R[3] = R[2]  {s | s(x) > 0} # R[exit] = R[2]  {s | s(x)  0} # A system of recursive equations 24 if x > 0 x := x - 1 2 3 entry exit R[entry] R[2] R[3] R[exit] Abstract transformer for x:=x-1 Abstract representation of {s | s(x) < 0}

Abstract interpretation via abstraction 25 set of states collecting semantics statement S abstract representation of sets of states  abstract representation of sets of states abstract semantics statement S abstract representation of sets of states abstraction {P}{P}S{Q}{Q}sp(S, P)  generalizes axiomatic verification

Abstract interpretation via concretization 26 set of states collecting semantics statement S set of states  abstract representation of sets of states abstract semantics statement S abstract representation of sets of states concretization {P}{P}S{Q}{Q} models(P)models(sp(S, P))models(Q) 

Required knowledge Collecting semantics Abstract semantics Connection between collecting semantics and abstract semantics Algorithm to compute abstract semantics 27

The collecting lattice (sets of states) Lattice for a given control-flow node v: L v =(2 State, , , , , State) Lattice for entire control-flow graph with nodes V: L CFG = Map(V, L v ) We will use this lattice as a baseline for static analysis and define abstractions of its elements 28

Equation systems in general Let L be a complete lattice (D, , , , ,  ) Let R be a vector of variables R[0, …, n]  D  …  D Let F be a vector of functions of the type F[i] : R[0, …, n]  R[0, …, n] A system of equations R[0] = f[0](R[0], …, R[n]) … R[n] = f[n](R[0], …, R[n]) In vector notation R = F(R) Questions: 1.Does a solution always exist? 2.If so, is it unique? 3.If so, is it computable? 29

Equation systems in general Let L be a complete lattice (D, , , , ,  ) Let R be a vector of variables R[0, …, n]  D  …  D Let F be a vector of functions of the type F[i] : R[0, …, n]  R[0, …, n] A system of equations R[0] = f[0](R[0], …, R[n]) … R[n] = f[n](R[0], …, R[n]) In vector notation R = F(R) Questions: 1.Does a solution always exist? 2.If so, is it unique? 3.If so, is it computable? 30

Monotone functions Let L 1 =(D 1,  ) and L 2 =(D 2,  ) be two posets A function f : D 1  D 2 is monotone if for every pair x, y  D 1 x  y implies f(x)  f(y) A special case: L 1 =L 2 =(D,  ) f : D  D 31

Important cases of monotonicity Join: f(X, Y) = X  Y Prove it! For a set X and any function g F(X) = { g(x) | x  X } Prove it! Notice that the collecting semantics function is defined in terms of – Join (set union) – Semantic function for atomic statements lifted to sets of states 32

Extensive/reductive functions Let L=(D,  ) be a poset A function f : D  D is extensive if for every x  D, we have that x  f(x) A function f : D  D is reductive if for every x  D, we have that x  f(x) 33

Fixed-points L = (D, , , , ,  ) f : D  D monotone Fix(f) = { d | f(d) = d } Red(f) = { d | f(d)  d } Ext(f) = { d | d  f(d) } Theorem [Tarski 1955] – lfp(f) =  Fix(f) =  Red(f)  Fix(f) – gfp(f) =  Fix(f) =  Ext(f)  Fix(f) 34 Red(f) Ext(f) Fix(f)   lfp gfp fn()fn() fn()fn() 1.Does a solution always exist? Yes 2.If so, is it unique? No, but it has least/greatest solutions 3.If so, is it computable? Under some conditions…

Fixed point example for program R[0] = { x  Z} R[1] = R[0]  R[4] R[2] = R[1]  {s | s(x) > 0} R[3] = R[1]  {s | s(x)  0} R[4] =  x:=x-1  R[2] 35 if x>0 x := x-1 2 3 entry exit xZxZ xZxZ { x >0}{ x <0} if x>0 x := x-1 2 3 entry exit xZxZ xZxZ { x >0}{ x <0} F(d) : Fixed-point = d

Fixed point example for program R[0] = { x  Z} R[1] = R[0]  R[4] R[2] = R[1]  {s | s(x) > 0} R[3] = R[1]  {s | s(x)  0} R[4] =  x:=x-1  R[2] 36 if x>0 x := x-1 2 3 entry exit xZxZ xZxZ { x >0}{ x <-5} if x>0 x := x-1 2 3 entry exit xZxZ xZxZ { x >0}{ x <0} F(d) : pre Fixed-point  d

Fixed point example for program R[0] = { x  Z} R[1] = R[0]  R[4] R[2] = R[1]  {s | s(x) > 0} R[3] = R[1]  {s | s(x)  0} R[4] =  x:=x-1  R[2] 37 if x>0 x := x-1 2 3 entry exit xZxZ xZxZ { x >0}{ x <9} if x>0 x := x-1 2 3 entry exit xZxZ xZxZ { x >0}{ x <0} F(d) : post Fixed-point  d

Continuity and ACC condition Let L = (D, , ,  ) be a complete partial order – Every ascending chain has an upper bound A function f is continuous if for every increasing chain Y  D*, f(  Y) =  { f(y) | y  Y } L satisfies the ascending chain condition (ACC) if every ascending chain eventually stabilizes: d 0  d 1  …  d n = d n+1 = … 38

Fixed-point theorem [Kleene] Let L = (D, , ,  ) be a complete partial order and a continuous function f: D  D then lfp(f) =  n  N f n (  ) Lemma: Monotone functions on posets satisfying ACC are continuous Proof: 39

Resulting algorithm Kleene’s fixed point theorem gives a constructive method for computing the lfp 40   lfp fn()fn() f()f() f2()f2() … d :=  while f(d)  d do d := d  f(d) return d Algorithm lfp(f) =  n  N f n (  ) Mathematical definition

Chaotic iteration 41 Input: – A cpo L = (D, , ,  ) satisfying ACC – L n = L  L  …  L – A monotone function f : D n  D n – A system of equations { X[i] | f(X) | 1  i  n } Output: lfp(f) A worklist-based algorithm for i:=1 to n do X[i] :=  WL = {1,…,n} while WL   do i := pop WL // choose index non-deterministically N := F[i](X) if N  X[i] then X[i] := N add all the indexes that directly depend on i to WL (X[j] depends on X[i] if F[j] contains X[i]) return X

Chaotic iteration for static analysis Specialize chaotic iteration for programs Create a CFG for program Choose a cpo of properties for the static analysis to infer: L = (D, , ,  ) Define variables R[0,…,n] for input/output of each CFG node such that R[i]  D For each node v let v out be the variable at the output of that node: v out = F[v](  u | (u,v) is a CFG edge) – Make sure each F[v] is monotone Variable dependence determined by outgoing edges in CFG 42

Constant propagation example 43 x := 4 if (*) assume y  5 assume y=5 z := x x := 4 entry exit x := 4; while (y  5) do z := x; x := 4

Constant propagation lattice For each variable x define L as For a set of program variables Var=x 1,…,x n L n = L  L  …  L 44  0-212...  no information not-a-constant

Write down variables 45 x := 4 if (*) assume y  5 assume y=5 z := x x := 4 entry exit x := 4; while (y  5) do z := x; x := 4

Write down equations 46 x := 4 if (*) assume y  5 assume y=5 z := x x := 4 entry exit R2R2 R2R2 R2R2 R3R3 R4R4 R6R6 R1R1 R5R5 R0R0 x := 4; while (y  5) do z := x; x := 4

Collecting semantics equations 47 x := 4 if (*) assume y  5 assume y=5 z := x x := 4 entry exit R2R2 R2R2 R2R2 R3R3 R4R4 R6R6 R 0 = State R 1 =  x:=4  R 0 R 2 = R 1  R 5 R 3 =  assume y  5  R 2 R 4 =  z:=x  R 3 R 5 =  x:=4  R 4 R 6 =  assume y=5  R 2 R1R1 R5R5 R0R0

Constant propagation equations 48 x := 4 if (*) assume y  5 assume y=5 z := x x := 4 entry exit R2R2 R2R2 R2R2 R3R3 R4R4 R6R6 R 0 =  R 1 =  x:=4  # R 0 R 2 = R 1  R 5 R 3 =  assume y  5  # R 2 R 4 =  z:=x  # R 3 R 5 =  x:=4  # R 4 R 6 =  assume y=5  # R 2 R1R1 R5R5 R0R0 abstract transformer

Abstract operations for CP 49 R 0 =  R 1 =  x:=4  # R 0 R 2 = R 1  R 5 R 3 =  assume y  5  # R 2 R 4 =  z:=x  # R 3 R 5 =  x:=4  # R 4 R 6 =  assume y=5  # R 2 Lattice elements have the form: (v x, v y, v z )  x:=4  # (v x,v y,v z ) = (4, v y, v z )  z:=x  # (v x,v y,v z ) = (v x, v y, v x )  assume y  5  # (v x,v y,v z ) = (v x, v y, v x )  assume y=5  # (v x,v y,v z ) = if v y = k  5 then ( , ,  ) else (v x, 5, v z ) R 1  R 5 = (a 1, b 1, c 1 )  (a 5, b 5, c 5 ) = (a 1  a 5, b 1  b 5, c 1  c 5 )  0-212...  CP lattice for a single variable

Chaotic iteration for CP: initialization 50 x := 4 if (*) assume y  5 assume y=5 z := x x := 4 entry exit R 2 =( , ,  ) R2R2 R2R2 R 3 =( , ,  ) R 4 =( , ,  ) R 6 =( , ,  ) R 0 =  R 1 =  x:=4  # R 0 R 2 = R 1  R 5 R 3 =  assume y  5  # R 2 R 4 =  z:=x  # R 3 R 5 =  x:=4  # R 4 R 6 =  assume y=5  # R 2 R 1 =( , ,  ) R 5 =( , ,  ) R 0 =( , ,  ) WL = {R 0, R 1, R 2, R 3, R 4, R 5, R 6 }

Chaotic iteration for CP 51 x := 4 if (*) assume y  5 assume y=5 z := x x := 4 entry exit R 2 =( , ,  ) R2R2 R2R2 R 3 =( , ,  ) R 4 =( , ,  ) R 6 =( , ,  ) R 0 =  R 1 =  x:=4  # R 0 R 2 = R 1  R 5 R 3 =  assume y  5  # R 2 R 4 =  z:=x  # R 3 R 5 =  x:=4  # R 4 R 6 =  assume y=5  # R 2 R 1 =( , ,  ) R 5 =( , ,  ) R 0 =( , ,  ) WL = {R 1, R 2, R 3, R 4, R 5, R 6 }

Chaotic iteration for CP 52 x := 4 if (*) assume y  5 assume y=5 z := x x := 4 entry exit R 2 =( , ,  ) R2R2 R2R2 R 3 =( , ,  ) R 4 =( , ,  ) R 6 =( , ,  ) R 0 =  R 1 =  x:=4  # R 0 R 2 = R 1  R 5 R 3 =  assume y  5  # R 2 R 4 =  z:=x  # R 3 R 5 =  x:=4  # R 4 R 6 =  assume y=5  # R 2 R 1 =(4, ,  ) R 5 =( , ,  ) R 0 =( , ,  ) WL = {R 2, R 3, R 4, R 5, R 6 }

Chaotic iteration for CP 53 x := 4 if (*) assume y  5 assume y=5 z := x x := 4 entry exit R 2 =( , ,  ) R2R2 R2R2 R 3 =( , ,  ) R 4 =( , ,  ) R 6 =( , ,  ) R 0 =  R 1 =  x:=4  # R 0 R 2 = R 1  R 5 R 3 =  assume y  5  # R 2 R 4 =  z:=x  # R 3 R 5 =  x:=4  # R 4 R 6 =  assume y=5  # R 2 R 1 =(4, ,  ) R 5 =( , ,  ) R 0 =( , ,  )  0-212...  3 4 WL = {R 2, R 3, R 4, R 5, R 6 }

Chaotic iteration for CP 54 x := 4 if (*) assume y  5 assume y=5 z := x x := 4 entry exit R 2 =(4, ,  ) R2R2 R2R2 R 3 =( , ,  ) R 4 =( , ,  ) R 6 =( , ,  ) R 0 =  R 1 =  x:=4  # R 0 R 2 = R 1  R 5 R 3 =  assume y  5  # R 2 R 4 =  z:=x  # R 3 R 5 =  x:=4  # R 4 R 6 =  assume y=5  # R 2 R 1 =(4, ,  ) R 5 =( , ,  ) R 0 =( , ,  )  0-212...  3 4 WL = {R 2, R 3, R 4, R 5, R 6 }

Chaotic iteration for CP 55 x := 4 if (*) assume y  5 assume y=5 z := x x := 4 entry exit R 2 =(4, ,  ) R2R2 R2R2 R 3 =( , ,  ) R 4 =( , ,  ) R 6 =( , ,  ) R 0 =  R 1 =  x:=4  # R 0 R 2 = R 1  R 5 R 3 =  assume y  5  # R 2 R 4 =  z:=x  # R 3 R 5 =  x:=4  # R 4 R 6 =  assume y=5  # R 2 R 1 =(4, ,  ) R 5 =( , ,  ) R 0 =( , ,  ) WL = {R 3, R 4, R 5, R 6 }

Chaotic iteration for CP 56 x := 4 if (*) assume y  5 assume y=5 z := x x := 4 entry exit R2R2 R2R2 R 3 =(4, ,  ) R 4 =( , ,  ) R 6 =( , ,  ) R 0 =  R 1 =  x:=4  # R 0 R 2 = R 1  R 5 R 3 =  assume y  5  # R 2 R 4 =  z:=x  # R 3 R 5 =  x:=4  # R 4 R 6 =  assume y=5  # R 2 R 5 =( , ,  ) R 1 =(4, ,  ) R 0 =( , ,  ) R 2 =(4, ,  ) WL = {R 4, R 5, R 6 }

Chaotic iteration for CP 57 x := 4 if (*) assume y  5 assume y=5 z := x x := 4 entry exit R2R2 R2R2 R 3 =(4, ,  ) R 4 =(4, , 4) R 6 =( , ,  ) R 0 =  R 1 =  x:=4  # R 0 R 2 = R 1  R 5 R 3 =  assume y  5  # R 2 R 4 =  z:=x  # R 3 R 5 =  x:=4  # R 4 R 6 =  assume y=5  # R 2 R 5 =( , ,  ) R 1 =(4, ,  ) R 0 =( , ,  ) R 2 =(4, ,  ) WL = {R 5, R 6 }

Chaotic iteration for CP 58 x := 4 if (*) assume y  5 assume y=5 z := x x := 4 entry exit R2R2 R2R2 R 6 =( , ,  ) R 0 =  R 1 =  x:=4  # R 0 R 2 = R 1  R 5 R 3 =  assume y  5  # R 2 R 4 =  z:=x  # R 3 R 5 =  x:=4  # R 4 R 6 =  assume y=5  # R 2 R 5 =(4, , 4) R 4 =(4, , 4) R 3 =(4, ,  ) R 1 =(4, ,  ) R 0 =( , ,  ) R 2 =(4, ,  ) WL = {R 2, R 6 } added R 2 back to worklist since it depends on R 5

Chaotic iteration for CP 59 R 0 =  R 1 =  x:=4  # R 0 R 2 = R 1  R 5 R 3 =  assume y  5  # R 2 R 4 =  z:=x  # R 3 R 5 =  x:=4  # R 4 R 6 =  assume y=5  # R 2 x := 4 if (*) assume y  5 assume y=5 z := x x := 4 entry exit R2R2 R2R2 R 6 =( , ,  ) R 5 =(4, , 4) R 4 =(4, , 4) R 3 =(4, ,  ) R 1 =(4, ,  ) R 0 =( , ,  ) R 2 =(4, ,  ) WL = {R 6 }

Chaotic iteration for CP 60 R 0 =  R 1 =  x:=4  # R 0 R 2 = R 1  R 5 R 3 =  assume y  5  # R 2 R 4 =  z:=x  # R 3 R 5 =  x:=4  # R 4 R 6 =  assume y=5  # R 2 x := 4 if (*) assume y  5 assume y=5 z := x x := 4 entry exit R2R2 R2R2 R 6 =(4, 5,  ) R 5 =(4, , 4) R 4 =(4, , 4) R 3 =(4, ,  ) R 1 =(4, ,  ) R 0 =( , ,  ) R 2 =(4, ,  ) Fixed-point WL = {} In practice maintain a worklist of nodes

Chaotic iteration for static analysis Specialize chaotic iteration for programs Create a CFG for program Choose a cpo of properties for the static analysis to infer: L = (D, , ,  ) Define variables R[0,…,n] for input/output of each CFG node such that R[i]  D For each node v let v out be the variable at the output of that node: v out = F[v](  u | (u,v) is a CFG edge) – Make sure each F[v] is monotone Variable dependence determined by outgoing edges in CFG 61

Complexity of chaotic iteration Parameters: – n the number of CFG nodes – k is the maximum in-degree of edges – Height h of lattice L – c is the maximum cost of Applying F v  Checking fixed-point condition for lattice L Complexity: O(n  h  c  k) Incremental (worklist) algorithm reduces the n factor in factor – Implement worklist by priority queue and order nodes by reversed topological order 62

Required knowledge Collecting semantics Abstract semantics (over lattices) Algorithm to compute abstract semantics (chaotic iteration) Connection between collecting semantics and abstract semantics Abstract transformers 63

Recap We defined a reference semantics – the collecting semantics We defined an abstract semantics for a given lattice and abstract transformers We defined an algorithm to compute abstract least fixed-point when transformers are monotone and lattice obeys ACC Questions: 1.What is the connection between the two least fixed- points? 2.Transformer monotonicity is required for termination – what should we require for correctness? 64

Recap We defined a reference semantics – the collecting semantics We defined an abstract semantics for a given lattice and abstract transformers We defined an algorithm to compute abstract least fixed-point when transformers are monotone and lattice obeys ACC Questions: 1.What is the connection between the two least fixed- points? 2.Transformer monotonicity is required for termination – what should we require for correctness? 65

Galois Connection Given two complete lattices C = (D C,  C,  C,  C,  C,  C )– concrete domain A = (D A,  A,  A,  A,  A,  A )– abstract domain A Galois Connection (GC) is quadruple (C, , , A) that relates C and A via the monotone functions – The abstraction function  : D C  D A – The concretization function  : D A  D C for every concrete element c  D C and abstract element a  D A  (  (a))  a and c   (  (c)) Alternatively  (c)  a iff c   (a) 66

Galois Connection: c   (  (c)) 67 1   c 2 (c)(c) 3  (  (c))  The most precise (least) element in A representing c CA

Galois Connection:  (  (a))  a 68 1   3  (  (a)) 2 (a)(a) a  CA What a represents in C (its meaning)

Example: lattice of equalities Concrete lattice: C = (2 State, , , , , State) Abstract lattice: EQ = { x=y | x, y  Var} A = (2 EQ, , , , EQ,  ) – Treat elements of A as both formulas and sets of constraints Useful for copy propagation – a compiler optimization  (X) = ?  (Y) = ? 69

Example: lattice of equalities Concrete lattice: C = (2 State, , , , , State) Abstract lattice: EQ = { x=y | x, y  Var} A = (2 EQ, , , , EQ,  ) – Treat elements of A as both formulas and sets of constraints Useful for copy propagation – a compiler optimization  (s) =  ({s}) = { x=y | s x = s y} that is s  x=y  (X) =  {  (s) | s  X} =  A {  (s) | s  X}  (Y) = { s | s   Y } = models(  Y) 70

Galois Connection: c   (  (c)) 71 1   [x  5, y  5, z  5] 2 x=x, y=y, z=z, x=y, y=x, x=z, z=x, y=z, z=y 3 … [x  6, y  6, z  6] [x  5, y  5, z  5] [x  4, y  4, z  4] …  4 x=x, y=y, z=z  The most precise (least) element in A representing [x  5, y  5, z  5] CA

Most precise abstract representation 72 1 c 5  CA 4 6 2 73   8 9  (c)(c)  (c) =  {c’ | c   (c’)} 

Most precise abstract representation 73 1 c 5  CA 4 6 2 73   8 9   (c)= x=x, y=y, z=z, x=y, y=x, x=z, z=x, y=z, z=y  (c) =  {c’ | c   (c’)} [x  5, y  5, z  5] x=y, y=z x=y, z=y x=y

Galois Connection:  (  (a))  a 74 1   3 x=x, y=y, z=z, x=y, y=x, x=z, z=x, y=z, z=y 2 … [x  6, y  6, z  6] [x  5, y  5, z  5] [x  4, y  4, z  4] … x=y, y=z  What a represents in C (its meaning)    is called a semantic reduction CA

Galois Insertion  a:  (  (a))=a 75   1 x=x, y=y, z=z, x=y, y=x, x=z, z=x, y=z, z=y 2 … [x  6, y  6, z  6] [x  5, y  5, z  5] [x  4, y  4, z  4] … CA How can we obtain a Galois Insertion from a Galois Connection? All elements are reduced

Properties of a Galois Connection The abstraction and concretization functions uniquely determine each other:  (a) =  {c |  (c)  a}  (c) =  {a | c   (a)} 76

Abstracting (disjunctive) sets It is usually convenient to first define the abstraction of single elements  (s) =  ({s}) Then lift the abstraction to sets of elements  (X) =  A {  (s) | s  X} 77

The case of symbolic domains An important class of abstract domains are symbolic domains – domains of formulas C = (2 State, , , , , State) A = (D A,  A,  A,  A,  A,  A ) If D A is a set of formulas then the abstraction of a state is defined as  (s) =  ({s}) =  A {  | s   } the least formula from D A that s satisfies The abstraction of a set of states is  (X) =  A {  (s) | s  X} The concretization is  (  ) = { s | s   } = models(  ) 78

Inducing along the connections Assume the complete lattices C = (D C,  C,  C,  C,  C,  C ) A = (D A,  A,  A,  A,  A,  A ) M = (D M,  M,  M,  M,  M,  M ) and Galois connections GC C,A =(C,  C,A,  A,C, A) and GC A,M =(A,  A,M,  M,A, M) Lemma: both connections induce the GC C,M = (C,  C,M,  M,C, M) defined by  C,M =  C,A   A,M and  M,C =  M,A   A,C 79

Inducing along the connections 80 1 C,AC,A A,CA,C c 2 C,A(c)C,A(c) 5 CA 3 M A,MA,M 4 M,AM,A c’c’ a’ =  A,M (  C,A (c))

Sound abstract transformer Given two lattices C = (D C,  C,  C,  C,  C,  C ) A = (D A,  A,  A,  A,  A,  A ) and GC C,A =(C, , , A) with A concrete transformer f : D C  D C an abstract transformer f # : D A  D A We say that f # is a sound transformer (w.r.t. f) if  c: f(c)=c’   (f # (c))   (c’) For every a and a’ such that  (f(  (a)))  A f # (a) 81

Transformer soundness condition 1 82 12 CA f 3 4 f#f# 5   c: f(c)=c’   (f # (c))   (c’)

Transformer soundness condition 2 83 CA 12 f#f# 3 5 f 4   a: f # (a)=a’  f(  (a))   (a’)

Best (induced) transformer 84 CA 2 3 f f # (a)=  (f(  (a))) 1 f#f# 4 Problem:  incomputable directly

Best abstract transformer [CC’77] Best in terms of precision – Most precise abstract transformer – May be too expensive to compute Constructively defined as f # =   f   – Induced by the GC Not directly computable because first step is concretization We often compromise for a “good enough” transformer – Useful tool: partial concretization 85

Transformer example C = (2 State, , , , , State) EQ = { x=y | x, y  Var} A = (2 EQ, , , , EQ,  )  (s) =  ({s}) = { x=y | s x = s y} that is s  x=y  (X) =  {  (s) | s  X} =  A {  (s) | s  X}  (  ) = { s | s   } = models(  ) Concrete:  x:=y  X = { s[x  s y] | s  X} Abstract:  x:=y  # X = ? 86

Developing a transformer for EQ - 1 Input has the form X =  {a=b} sp(x:=expr,  ) =  v. x=expr[v/x]   [v/x] sp(x:=y,  X) =  v. x=y[v/x]   {a=b}[v/x] = … Let’s define helper notations: – EQ(X, y) = {y=a, b=y  X} Subset of equalities containing y – EQc(X, y) = X \ EQ(X, y) Subset of equalities not containing y 87

Developing a transformer for EQ - 2 sp(x:=y,  X) =  v. x=y[v/x]   {a=b}[v/x] = … Two cases – x is y: sp(x:=y,  X) = X – x is different from y: sp(x:=y,  X) =  v. x=y  EQ)X, x)[v/x]  EQc(X, x)[v/x] = x=y  EQc(X, x)   v. EQ)X, x)[v/x]  x=y  EQc(X, x) Vanilla transformer:  x:=y  #1 X = x=y  EQc(X, x) Example:  x:=y  #1  {x=p, q=x, m=n} =  {x=y, m=n} Is this the most precise result? 88

Developing a transformer for EQ - 3  x:=y  #1  {x=p, x=q, m=n} =  {x=y, m=n}   {x=y, m=n, p=q} – Where does the information p=q come from? sp(x:=y,  X) = x=y  EQc(X, x)   v. EQ)X, x)[v/x]  v. EQ)X, x)[v/x] holds possible equalities between different a’s and b’s – how can we account for that? 89

Developing a transformer for EQ - 4 Define a reduction operator: Explicate(X) = if exist {a=b, b=c}  X but not {a=c}  X then Explicate(X  {a=c}) else X Define  x:=y  #2 =  x:=y  #1  Explicate  x:=y  #2 )  {x=p, x=q, m=n}) =  {x=y, m=n, p=q} is this the best transformer? 90

Developing a transformer for EQ - 5  x:=y  #2 )  {y=z}) = {x=y, y=z}  {x=y, y=z, x=z} Idea: apply reduction operator again after the vanilla transformer  x:=y  #3 = Explicate   x:=y  #1  Explicate Observation : after the first time we apply Explicate, all subsequent values will be in the image of the abstraction so really we only need to apply it once to the input Finally:  x:=y  # (X) = Explicate   x:=y  #1 – Best transformer for reduced elements (elements in the image of the abstraction) 91

Negative property of best transformers Let f # =   f   Best transformer does not compose  (f(f(  (a))))  f # (f # (a)) 92

 (f(f(  (a))))  f # (f # (a)) 93 CA 2 3 f 1 f#f# 6 5 4 f 7  f#f# 8 9 f

Soundness theorem 1 1.Given two complete lattices C = (D C,  C,  C,  C,  C,  C ) A = (D A,  A,  A,  A,  A,  A ) and GC C,A =(C, , , A) with 2.Monotone concrete transformer f : D C  D C 3.Monotone abstract transformer f # : D A  D A 4.  a  D A : f(  (a))   (f # (a)) Then lfp(f)   (lfp(f # ))  (lfp(f))  lfp(f # ) 94

Soundness theorem 1 95 CA  f  fn fn  … lpf(f)  f2 f2  f3f3 f #n  … lpf(f # )  f#2 f#2  f#3f#3 f# f#   a  D A : f(  (a))   (f # (a))   a  D A : f n (  (a))   (f #n (a))   a  D A : lfp(f n )(  (a))   (lfp(f #n )(a))  lfp(f)   lfp(f # ) 

Soundness theorem 2 1.Given two complete lattices C = (D C,  C,  C,  C,  C,  C ) A = (D A,  A,  A,  A,  A,  A ) and GC C,A =(C, , , A) with 2.Monotone concrete transformer f : D C  D C 3.Monotone abstract transformer f # : D A  D A 4.  c  D C :  (f(c))  f # (  (c)) Then  (lfp(f))  lfp(f # ) lfp(f)   (lfp(f # )) 96

Soundness theorem 2 97 CA   f  fn fn  … lpf(f)  f2 f2  f3f3 f #n  … lpf(f # )  f#2 f#2  f#3f#3 f# f#   c  D C :  (f(c))  f # (  (c))   c  D C :  (f n (c))  f #n (  (c))   c  D C :  (lfp(f)(c))  lfp(f # )(  (c))  lfp(f)   lfp(f # ) 

A recipe for a sound static analysis Define an “appropriate” operational semantics Define “collecting” structural operational semantics Establish a Galois connection between collecting states and abstract states Local correctness: show that the abstract interpretation of every atomic statement is sound w.r.t. the collecting semantics Global correctness: conclude that the analysis is sound 98

Completeness Local property: – forward complete:  c:  (f # (c)) =  (f(c)) – backward complete:  a: f(  (a)) =  (f # (a)) A property of domain and the (best) transformer Global property: –  (lfp(f)) = lfp(f # ) – lfp(f) =  (lfp(f # )) Very ideal but usually not possible unless we change the program model (apply strong abstraction) and/or aim for very simple properties 99

Forward complete transformer 100 12 CA f 3 4 f#f#  c:  (f # (c)) =  (f(c))

Backward complete transformer 101 CA 12 f#f# 3 5 f  a: f(  (a)) =  (f # (a))

Global (backward) completeness 102 CA  f  fn fn  … lpf(f)  f2 f2  f3f3 f #n  … lpf(f # )  f#2 f#2  f#3f#3 f# f#   a: f(  (a)) =  (f # (a))   a: f n (  (a)) =  (f #n (a))   a  D A : lfp(f n )(  (a)) =  (lfp(f #n )(a))  lfp(f)  = lfp(f # ) 

Global (forward) completeness 103 CA   f  fn fn  … lpf(f)  f2 f2  f3f3 f #n  … lpf(f # )  f#2 f#2  f#3f#3 f# f#   c  D C :  (f(c)) = f # (  (c))   c  D C :  (f n (c)) = f #n (  (c))   c  D C :  (lfp(f)(c)) = lfp(f # )(  (c))  lfp(f)  = lfp(f # ) 

Widening/Narrowing 104

How can we prove this automatically? 105 RelProd(CP, VE)

Intervals domain One of the simplest numerical domains Maintain for each variable x an interval [L,H] – L is either an integer of -  – H is either an integer of +  A (non-relational) numeric domain 106

Intervals lattice for variable x 107  [0,0][-1,-1][-2,-2][1,1][2,2]... [- ,+  ] [0,1][1,2][2,3][-1,0][-2,-1] [-10,10] [1, +  ][ - ,0 ]... [2, +  ][0, +  ][ - ,-1 ]... [-20,10]

Intervals lattice for variable x D int [x] = { (L,H) | L  - ,Z and H  Z,+  and L  H}   =[- ,+  ]  = ? – [1,2]  [3,4] ? – [1,4]  [1,3] ? – [1,3]  [1,4] ? – [1,3]  [- ,+  ] ? What is the lattice height? 108

Intervals lattice for variable x D int [x] = { (L,H) | L  - ,Z and H  Z,+  and L  H}   =[- ,+  ]  = ? – [1,2]  [3,4] no – [1,4]  [1,3] no – [1,3]  [1,4] yes – [1,3]  [- ,+  ]yes What is the lattice height? Infinite 109

Joining/meeting intervals [a,b]  [c,d] = ? – [1,1]  [2,2] = ? – [1,1]  [2, +  ] = ? [a,b]  [c,d] = ? – [1,2]  [3,4] = ? – [1,4]  [3,4] = ? – [1,1]  [1,+  ] = ? Check that indeed x  y if and only if x  y=y 110

Joining/meeting intervals [a,b]  [c,d] = [min(a,c), max(b,d)] – [1,1]  [2,2] = [1,2] – [1,1]  [2,+  ] = [1,+  ] [a,b]  [c,d] = [max(a,c), min(b,d)] if a proper interval and otherwise  – [1,2]  [3,4] =  – [1,4]  [3,4] = [3,4] – [1,1]  [1,+  ] = [1,1] Check that indeed x  y if and only if x  y=y 111

Interval domain for programs D int [x] = { (L,H) | L  - ,Z and H  Z,+  and L  H} For a program with variables Var={x 1,…,x k } D int [Var] = ? 112

Interval domain for programs D int [x] = { (L,H) | L  - ,Z and H  Z,+  and L  H} For a program with variables Var={x 1,…,x k } D int [Var] = D int [x 1 ]  …  D int [x k ] How can we represent it in terms of formulas? 113

Interval domain for programs D int [x] = { (L,H) | L  - ,Z and H  Z,+  and L  H} For a program with variables Var={x 1,…,x k } D int [Var] = D int [x 1 ]  …  D int [x k ] How can we represent it in terms of formulas? – Two types of factoids x  c and x  c – Example: S =  {x  9, y  5, y  10} – Helper operations c + +  = +  remove(S, x) = S without any x-constraints lb(S, x) = 114

Assignment transformers  x := c  # S = ?  x := y  # S = ?  x := y+c  # S = ?  x := y+z  # S = ?  x := y*c  # S = ?  x := y*z  # S = ? 115

Assignment transformers  x := c  # S = remove(S,x)  {x  c, x  c}  x := y  # S = remove(S,x)  {x  lb(S,y), x  ub(S,y)}  x := y+c  # S = remove(S,x)  {x  lb(S,y)+c, x  ub(S,y)+c}  x := y+z  # S = remove(S,x)  {x  lb(S,y)+lb(S,z), x  ub(S,y)+ub(S,z)}  x := y*c  # S = remove(S,x)  if c>0 {x  lb(S,y)*c, x  ub(S,y)*c} else {x  ub(S,y)*-c, x  lb(S,y)*-c}  x := y*z  # S = remove(S,x)  ? 116

assume transformers  assume x=c  # S = ?  assume x<c  # S = ?  assume x=y  # S = ?  assume x  c  # S = ? 117

assume transformers  assume x=c  # S = S  {x  c, x  c}  assume x<c  # S = S  {x  c-1}  assume x=y  # S = S  {x  lb(S,y), x  ub(S,y)}  assume x  c  # S = ? 118

assume transformers  assume x=c  # S = S  {x  c, x  c}  assume x<c  # S = S  {x  c-1}  assume x=y  # S = S  {x  lb(S,y), x  ub(S,y)}  assume x  c  # S = (S  {x  c-1})  (S  {x  c+1}) 119

Effect of function f on lattice elements L = (D, , , , ,  ) f : D  D monotone Fix(f) = { d | f(d) = d } Red(f) = { d | f(d)  d } Ext(f) = { d | d  f(d) } Theorem [Tarski 1955] – lfp(f) =  Fix(f) =  Red(f)  Fix(f) – gfp(f) =  Fix(f) =  Ext(f)  Fix(f) 120 Red(f) Ext(f) Fix(f)   lfp gfp fn()fn() fn()fn()

Effect of function f on lattice elements L = (D, , , , ,  ) f : D  D monotone Fix(f) = { d | f(d) = d } Red(f) = { d | f(d)  d } Ext(f) = { d | d  f(d) } Theorem [Tarski 1955] – lfp(f) =  Fix(f) =  Red(f)  Fix(f) – gfp(f) =  Fix(f) =  Ext(f)  Fix(f) 121 Red(f) Ext(f) Fix(f)   lfp gfp fn()fn() fn()fn()

Continuity and ACC condition Let L = (D, , ,  ) be a complete partial order – Every ascending chain has an upper bound A function f is continuous if for every increasing chain Y  D*, f(  Y) =  { f(y) | y  Y } L satisfies the ascending chain condition (ACC) if every ascending chain eventually stabilizes: d 0  d 1  …  d n = d n+1 = … 122

Fixed-point theorem [Kleene] Let L = (D, , ,  ) be a complete partial order and a continuous function f: D  D then lfp(f) =  n  N f n (  ) 123

Resulting algorithm Kleene’s fixed point theorem gives a constructive method for computing the lfp 124   lfp fn()fn() f()f() f2()f2() … d :=  while f(d)  d do d := d  f(d) return d Algorithm lfp(f) =  n  N f n (  ) Mathematical definition

Chaotic iteration 125 Input: – A cpo L = (D, , ,  ) satisfying ACC – L n = L  L  …  L – A monotone function f : D n  D n – A system of equations { X[i] | f(X) | 1  i  n } Output: lfp(f) A worklist-based algorithm for i:=1 to n do X[i] :=  WL = {1,…,n} while WL   do j := pop WL // choose index non-deterministically N := F[i](X) if N  X[i] then X[i] := N add all the indexes that directly depend on i to WL (X[j] depends on X[i] if F[j] contains X[i]) return X

Concrete semantics equations 126 R[0] = { x  Z} R[1] =  x:=7  R[2] = R[1]  R[4] R[3] = R[2]  {s | s(x) < 1000} R[4] =  x:=x+1  R[3] R[5] = R[2]  {s | s(x)  1000} R[6] = R[5]  {s | s(x)  1001} R[0] R[2] R[3] R[4] R[1] R[5] R[6]

Abstract semantics equations 127 R[0] =  ({ x  Z}) R[1] =  x:=7  # R[2] = R[1]  R[4] R[3] = R[2]   ({s | s(x) < 1000}) R[4] =  x:=x+1  # R[3] R[5] = R[2]   ({s | s(x)  1000}) R[6] = R[5]   ({s | s(x)  1001})  R[5]   ({s | s(x)  999}) R[0] R[2] R[3] R[4] R[1] R[5] R[6]

Abstract semantics equations 128 R[0] =  R[1] = [7,7] R[2] = R[1]  R[4] R[3] = R[2]  [- ,999] R[4] = R[3] + [1,1] R[5] = R[2]  [1000,+  ] R[6] = R[5]  [999,+  ]  R[5]  [1001,+  ] R[0] R[2] R[3] R[4] R[1] R[5] R[6]

Too many iterations to converge 129

How many iterations for this one? 130

Widening Introduce a new binary operator to ensure termination – A kind of extrapolation Enables static analysis to use infinite height lattices – Dynamically adapts to given program Tricky to design Precision less predictable then with finite- height domains (widening non-monotone) 131

Formal definition For all elements d 1  d 2  d 1  d 2 For all ascending chains d 0  d 1  d 2  … the following sequence is finite – y 0 = d 0 – y i+1 = y i  d i+1 For a monotone function f : D  D define – x 0 =  – x i+1 = x i  f(x i ) Theorem: – There exits k such that x k+1 = x k – x k  Red(f) = { d | d  D and f(d)  d } 132

Analysis with finite-height lattice 133 A  f #n  = lpf(f # )  … f#2 f#2  f#3f#3 f# f#  Red(f) Fix(f)

Analysis with widening 134 A  f#2  f#3f#2  f#3 f#2 f#2  f#3f#3 f# f#  Red(f) Fix(f) lpf(f # ) 

Widening for Intervals Analysis   [c, d] = [c, d] [a, b]  [c, d] = [ if a  c then a else - , if b  d then b else  135

Semantic equations with widening 136 R[0] =  R[1] = [7,7] R[2] = R[1]  R[4] R[2.1] = R[2.1]  R[2] R[3] = R[2.1]  [- ,999] R[4] = R[3] + [1,1] R[5] = R[2]  [1001,+  ] R[6] = R[5]  [999,+  ]  R[5]  [1001,+  ] R[0] R[2] R[3] R[4] R[1] R[5] R[6]

Choosing analysis with widening 137 Enable widening

Non monotonicity of widening [0,1]  [0,2] = ? [0,2]  [0,2] = ?

Non monotonicity of widening [0,1]  [0,2] = [0,  ] [0,2]  [0,2] = [0,2]

Analysis results with widening 140 Did we prove it?

Analysis with narrowing 141 A  f#2  f#3f#2  f#3 f#2 f#2  f#3f#3 f# f#  Red(f) Fix(f) lpf(f # ) 

Formal definition of narrowing Improves the result of widening y  x  y  (x  y)  x For all decreasing chains x 0  x 1  … the following sequence is finite – y 0 = x 0 – y i+1 = y i  x i+1 For a monotone function f: D  D and x k  Red(f) = { d | d  D and f(d)  d } define – y 0 = x – y i+1 = y i  f(y i ) Theorem: – There exits k such that y k+1 =y k – y k  Red(f) = { d | d  D and f(d)  d }

Narrowing for Interval Analysis [a, b]   = [a, b] [a, b]  [c, d] = [ if a = -  then c else a, if b =  then d else b ]

Semantic equations with narrowing 144 R[0] =  R[1] = [7,7] R[2] = R[1]  R[4] R[2.1] = R[2.1]  R[2] R[3] = R[2.1]  [- ,999] R[4] = R[3]+[1,1] R[5] = R[2] #  [1000,+  ] R[6] = R[5]  [999,+  ]  R[5]  [1001,+  ] R[0] R[2] R[3] R[4] R[1] R[5] R[6]

Analysis with widening/narrowing Two phases – Phase 1: analyze with widening until converging – Phase 2: use values to analyze with narrowing 145 Phase 2: R[0] =  R[1] = [7,7] R[2] = R[1]  R[4] R[2.1] = R[2.1]  R[2] R[3] = R[2.1]  [- ,999] R[4] = R[3]+[1,1] R[5] = R[2] #  [1000,+  ] R[6] = R[5]  [999,+  ]  R[5]  [1001,+  ] Phase 1: R[0] =  R[1] = [7,7] R[2] = R[1]  R[4] R[2.1] = R[2.1]  R[2] R[3] = R[2.1]  [- ,999] R[4] = R[3] + [1,1] R[5] = R[2]  [1001,+  ] R[6] = R[5]  [999,+  ]  R[5]  [1001,+  ]

Analysis with widening/narrowing 146

Analysis results widening/narrowing 147 Precise invariant

Program Analysis and Verification 0368-4479 Noam Rinetzky Lecture 8: Abstract Interpretation 1 Slides credit: Roman Manevich, Mooly Sagiv, Eran Yahav.

Similar presentations

Presentation on theme: "Program Analysis and Verification 0368-4479 Noam Rinetzky Lecture 8: Abstract Interpretation 1 Slides credit: Roman Manevich, Mooly Sagiv, Eran Yahav."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Program Analysis and Verification 0368-4479 Noam Rinetzky Lecture 8: Abstract Interpretation 1 Slides credit: Roman Manevich, Mooly Sagiv, Eran Yahav.

Similar presentations

Presentation on theme: "Program Analysis and Verification 0368-4479 Noam Rinetzky Lecture 8: Abstract Interpretation 1 Slides credit: Roman Manevich, Mooly Sagiv, Eran Yahav."— Presentation transcript:

Similar presentations

About project

Feedback