1 Abstraction and Approximation via Abstract Interpretation: a systematic approach to program analysis and verification
Giorgio Levi
Dipartimento di Informatica, Università di Pisa

2 Abstraction and approximation
• two relevant concepts in several areas of computer science (and engineering)
  – to reason about complex systems
  – to make reasoning computationally feasible

3 Abstract Interpretation (Cousot & Cousot, POPL 77 & 79)
• a 20-year-old technique to systematically handle abstraction and approximation
  – born to describe (and prove correct) static analyses (for imperative programs)
  – popular mainly in declarative paradigms
  – viewed today as a general technique to reason about semantics at different levels of abstraction
  – successfully applied to distributed and mobile systems and to model checking
  – recently applied to program verification

4 Abstract Interpretation, Semantics, Analysis Algorithms
• how abstract interpretation is often used in static program analysis
  – a semantics
  – an analysis algorithm developed by ad-hoc techniques
  – the A.I. theory (definition of an abstract domain) is used to prove that the algorithm is correct, i.e., that its results are an approximation of the property to be analyzed

5 Abstract Interpretation, Semantics, Analysis Algorithms
• the abstract interpretation I like
  – a semantics
  – an abstract domain designed to model the property to be analyzed
  – the A.I. theory is used to systematically derive the abstract semantics
  – the analysis algorithm is exactly the computation of the abstract semantics and is correct by construction

6 Abstract Interpretation Theory in 4 Steps
• concrete and abstract domain
• the Galois insertion
• abstract operations
• from the concrete to the abstract semantics

7 Concrete and Abstract Domains
• two complete partial orders
  – the partial orders reflect precision
    ◦ smaller is better
• $(C, \sqsubseteq_C)$: concrete domain
  – C has the structure of a powerset
• $(A, \sqsubseteq_A)$: abstract domain
  – each abstract value is a description of "a set of" concrete values

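8 The Sign Abstract Domain
• $(\mathcal{P}(\mathbb{Z}), \subseteq)$: concrete domain
  – sets of integers
• $(\mathit{Sign}, \sqsubseteq)$: abstract domain
  – elements bot, -, 0, +, 0-, 0+, top, ordered by inclusion of their concretizations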

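9 Galois insertions
• $(C, \sqsubseteq_C)$ and $(A, \sqsubseteq_A)$
• $\gamma : A \to C$ (concretization)
• $\alpha : C \to A$ (abstraction)
• $\alpha$ and $\gamma$ are monotonic
• $\forall x \in C.\; x \sqsubseteq_C \gamma(\alpha(x))$
• $\forall y \in A.\; \alpha(\gamma(y)) = y$
• $\alpha$ and $\gamma$ mutually determine each other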

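10 The sign example
• $\gamma_{sign}(x)$ =
  – $\emptyset$, if x = bot
  – $\{n \mid n > 0\}$, if x = +
  – $\{n \mid n \ge 0\}$, if x = 0+
  – $\{0\}$, if x = 0
  – $\{n \mid n \le 0\}$, if x = 0-
  – $\{n \mid n < 0\}$, if x = -
  – $\mathbb{Z}$, if x = top
• $\alpha_{sign}(S)$ = glb of the elements whose condition holds:
  – bot, if $S = \emptyset$
  – -, if $S \subseteq \{n \mid n < 0\}$
  – 0-, if $S \subseteq \{n \mid n \le 0\}$
  – 0, if $S \subseteq \{0\}$
  – 0+, if $S \subseteq \{n \mid n \ge 0\}$
  – +, if $S \subseteq \{n \mid n > 0\}$
  – top, if $S \subseteq \mathbb{Z}$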
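To make the definitions of slide 10 concrete, here is a minimal Python sketch of the Sign domain. It is not the author's code: encoding each Sign element by the set of atomic signs (negative, zero, positive) that its concretization covers is an assumption of this sketch. Since $\gamma_{sign}(y)$ is infinite, it is represented by a membership test (`gamma_contains`), and `alpha` is only applied to finite sets.

```python
# Sketch of the Sign domain of slides 8-10 (not the author's code).
# Assumption: each Sign element is encoded by the set of "atomic" signs
# (negative, zero, positive) that its concretization covers.
NEG, ZERO, POS = "neg", "zero", "pos"

SIGN = {
    "bot": frozenset(),
    "-": frozenset({NEG}),
    "0": frozenset({ZERO}),
    "+": frozenset({POS}),
    "0-": frozenset({NEG, ZERO}),
    "0+": frozenset({ZERO, POS}),
    "top": frozenset({NEG, ZERO, POS}),
}


def atom(n):
    """Atomic sign of a single integer."""
    return NEG if n < 0 else ZERO if n == 0 else POS


def gamma_contains(y, n):
    """Membership test for gamma_sign(y), which is in general an infinite set."""
    return atom(n) in SIGN[y]


def alpha(values):
    """alpha_sign(S) for a finite set S: glb of the elements y with S included in gamma_sign(y)."""
    needed = frozenset(atom(n) for n in values)
    candidates = [y for y, atoms in SIGN.items() if needed <= atoms]
    # The candidates' atom sets are closed under intersection, so the smallest one is their glb.
    return min(candidates, key=lambda y: len(SIGN[y]))


if __name__ == "__main__":
    print(alpha({2}), alpha({0, 1}), alpha({-1, 1}), alpha(set()))  # prints: + 0+ top bot
    # Galois insertion, checked on a finite window of each gamma_sign(y): alpha(gamma(y)) == y
    for y in SIGN:
        window = [n for n in range(-3, 4) if gamma_contains(y, n)]
        assert alpha(window) == y
```

Note that $\alpha_{sign}(\{-1, 1\})$ is top, because this seven-element domain has no element for "nonzero".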

11 Abstract Operations
• the concrete semantic evaluation function is defined in terms of primitive semantic operations $f_i$ on C
• for each $f_i$ we need to provide a corresponding $f_i^\sharp$ defined on A
• $f_i^\sharp$ must be locally correct, i.e., $\forall x_1, \dots, x_n \in C.\; \alpha(f_i(x_1, \dots, x_n)) \sqsubseteq_A f_i^\sharp(\alpha(x_1), \dots, \alpha(x_n))$
• the optimal (most precise) abstract operator is $f_i^\sharp(y_1, \dots, y_n) = \alpha(f_i(\gamma(y_1), \dots, \gamma(y_n)))$
• the operator is complete (precise) if $\forall x_1, \dots, x_n \in C.\; \alpha(f_i(x_1, \dots, x_n)) = f_i^\sharp(\alpha(x_1), \dots, \alpha(x_n))$

12 $\mathit{Times}^{sign}$

13 $\mathit{Plus}^{sign}$

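14 The Sign example
• Times and Plus are the usual operations lifted to $\mathcal{P}(\mathbb{Z})$
• both $\mathit{Times}^{sign}$ and $\mathit{Plus}^{sign}$ are optimal (hence correct)
• $\mathit{Times}^{sign}$ is also complete (no approximation)
• $\mathit{Plus}^{sign}$ is necessarily incomplete
  – $\alpha_{sign}(\mathit{Times}(\{2\},\{-3\})) = \mathit{Times}^{sign}(\alpha_{sign}(\{2\}), \alpha_{sign}(\{-3\}))$: both sides are -
  – $\alpha_{sign}(\mathit{Plus}(\{2\},\{-3\})) \sqsubset \mathit{Plus}^{sign}(\alpha_{sign}(\{2\}), \alpha_{sign}(\{-3\}))$: the left side is $\alpha_{sign}(\{-1\}) = -$, the right side is $\mathit{Plus}^{sign}(+, -) = top$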
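The optimal operators of slide 11 can be computed for Sign as $\alpha(f(\gamma(y_1), \gamma(y_2)))$. The Python sketch below does this atom by atom, which is exact here because Times and Plus on $\mathcal{P}(\mathbb{Z})$ distribute over unions. The atom encoding of Sign and the helper names (`plus_atoms`, `times_atoms`, `lift`) are assumptions of the sketch. It reproduces the two instances of slide 14.

```python
# Sketch: optimal abstract Times and Plus on Sign (slides 11-14), not the author's code.
# Assumption: Sign elements are encoded by the atomic signs they cover.
from itertools import product

NEG, ZERO, POS = "neg", "zero", "pos"
SIGN = {
    "bot": frozenset(),
    "-": frozenset({NEG}),
    "0": frozenset({ZERO}),
    "+": frozenset({POS}),
    "0-": frozenset({NEG, ZERO}),
    "0+": frozenset({ZERO, POS}),
    "top": frozenset({NEG, ZERO, POS}),
}


def atom(n):
    return NEG if n < 0 else ZERO if n == 0 else POS


def close(atoms):
    """Most precise Sign element whose concretization covers all the given atomic signs."""
    return min((y for y, a in SIGN.items() if atoms <= a), key=lambda y: len(SIGN[y]))


def alpha(values):
    return close(frozenset(atom(n) for n in values))


def plus_atoms(a, b):
    """Possible signs of m + n when m has sign a and n has sign b."""
    if a == ZERO:
        return {b}
    if b == ZERO:
        return {a}
    return {a} if a == b else {NEG, ZERO, POS}  # e.g. 2 + -1, 1 + -1, 1 + -2


def times_atoms(a, b):
    """Possible signs of m * n when m has sign a and n has sign b."""
    if ZERO in (a, b):
        return {ZERO}
    return {POS} if a == b else {NEG}


def lift(atom_op):
    """alpha(f(gamma(y1), gamma(y2))), computed atom by atom since f distributes over unions."""
    def op(y1, y2):
        out = set()
        for a, b in product(SIGN[y1], SIGN[y2]):
            out |= atom_op(a, b)
        return close(frozenset(out))
    return op


times_sign = lift(times_atoms)
plus_sign = lift(plus_atoms)

if __name__ == "__main__":
    # Complete on this input: both sides are "-".
    print(alpha({2 * -3}), times_sign(alpha({2}), alpha({-3})))  # prints: - -
    # Incomplete: alpha({-1}) is "-", while Plus_sign(+, -) is "top".
    print(alpha({2 + -3}), plus_sign(alpha({2}), alpha({-3})))  # prints: - top
```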

15 The Abstract Semantics
• F = concrete semantic evaluation function
  – if we start from a standard semantic definition, the lifting to the powerset (collecting semantics) is simply a conceptual operation
• $\mathit{lfp}\,F$ = concrete semantics
• $F^\sharp$ = abstract semantic evaluation function
  – obtained by replacing in F every concrete semantic operation by a corresponding (locally correct) abstract operation
• $\mathit{lfp}\,F^\sharp$ = abstract semantics
• global correctness: $\alpha(\mathit{lfp}\,F) \sqsubseteq \mathit{lfp}\,F^\sharp$
  – the abstract semantics may be less precise than the abstraction of the concrete semantics, never more precise

16 Where does the approximation come from?
• incomplete abstract operations
• more execution paths in the abstract control flow
  – the abstract state does not carry enough information to make deterministic choices
  – conditionals, pattern matching, etc.
    ◦ the set of resulting abstract states is turned into a single abstract state by performing an abstract lub operation

17 Approximation in abstract Sign computations
• concrete computation
  – concrete state [x={3}]
  – if x>2 then y:=3 else y:=-5;
  – concrete state [x={3}, y={3}]
• abstract computation
  – abstract state [x=+]
  – if x>2 then y:=3 else y:=-5;
    ◦ the abstract guard "can be both true and false"
    ◦ both paths need to be abstractly evaluated
    ◦ the two resulting abstract states are merged by performing a lub in Sign
  – abstract state [x=+, y=top]
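A Python sketch of the abstract computation of slide 17, assuming an atom-set encoding of Sign. The abstract guard `guard_x_gt_2` is hypothetical and hand-written for this one test: it returns every truth value that x > 2 can take on some integer described by the abstract value of x. All feasible branches are evaluated and their states merged pointwise with the Sign lub.

```python
# Sketch: abstract execution of  if x>2 then y:=3 else y:=-5  on Sign (slide 17).
# Assumptions: atom-set encoding of Sign; the abstract guard below is hand-written.
NEG, ZERO, POS = "neg", "zero", "pos"
SIGN = {
    "bot": frozenset(),
    "-": frozenset({NEG}),
    "0": frozenset({ZERO}),
    "+": frozenset({POS}),
    "0-": frozenset({NEG, ZERO}),
    "0+": frozenset({ZERO, POS}),
    "top": frozenset({NEG, ZERO, POS}),
}


def atom(n):
    return NEG if n < 0 else ZERO if n == 0 else POS


def close(atoms):
    return min((y for y, a in SIGN.items() if atoms <= a), key=lambda y: len(SIGN[y]))


def alpha(values):
    return close(frozenset(atom(n) for n in values))


def lub(y1, y2):
    return close(SIGN[y1] | SIGN[y2])


def guard_x_gt_2(x):
    """Truth values that x > 2 can take for some integer in gamma_sign(x)."""
    outcomes = set()
    if POS in SIGN[x]:
        outcomes.add(True)   # e.g. 3 > 2
    if SIGN[x]:
        outcomes.add(False)  # every non-empty sign class has a member <= 2
    return outcomes


def abstract_if(state):
    """Evaluate every feasible branch, then merge the resulting states with the Sign lub."""
    outcomes = guard_x_gt_2(state["x"])
    branches = []
    if True in outcomes:
        branches.append({**state, "y": alpha({3})})
    if False in outcomes:
        branches.append({**state, "y": alpha({-5})})
    merged = {}
    for branch in branches:
        for var, val in branch.items():
            merged[var] = lub(merged.get(var, "bot"), val)
    return merged


if __name__ == "__main__":
    print(abstract_if({"x": "+"}))  # prints: {'x': '+', 'y': 'top'}
```

Concretely, x = 3 yields y = 3, whose sign is +; the abstract run loses this because both branches survive and their results are joined.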

18 $\alpha(\mathit{lfp}\,F) \sqsubseteq \mathit{lfp}\,F^\sharp$: why compute $\mathit{lfp}\,F^\sharp$?
• $\mathit{lfp}\,F$ cannot be computed in finitely many steps
  – $\omega$ steps are in general required
• $\mathit{lfp}\,F^\sharp$ can be computed in finitely many steps if the abstract domain is finite or at least noetherian
  – no infinite increasing chains
• static analysis 1
  – noetherian abstract domain
  – termination, approximation
• static analysis 2
  – non-noetherian domain
  – termination via widening
  – further approximation
• comparative semantics
  – non-noetherian domain
  – abstraction without approximation (completeness): $\alpha(\mathit{lfp}\,F) = \mathit{lfp}\,F^\sharp$
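Termination of the abstract fixpoint computation on a noetherian domain can be seen on a small Python sketch. The loop `x := 0; while <unknown condition> do x := x + 1` and its abstract loop-head equation `f_sharp` are hypothetical examples, not taken from the slides, and the atom-set encoding of Sign is an assumption. The Kleene chain bot, 0, 0+ stabilizes after three steps because Sign has no infinite increasing chains.

```python
# Sketch: Kleene iteration of lfp F# on Sign (slide 18), for the hypothetical loop
#   x := 0; while <unknown condition> do x := x + 1
# Assumption: atom-set encoding of Sign; f_sharp is the abstract loop-head equation.
NEG, ZERO, POS = "neg", "zero", "pos"
SIGN = {
    "bot": frozenset(),
    "-": frozenset({NEG}),
    "0": frozenset({ZERO}),
    "+": frozenset({POS}),
    "0-": frozenset({NEG, ZERO}),
    "0+": frozenset({ZERO, POS}),
    "top": frozenset({NEG, ZERO, POS}),
}


def close(atoms):
    return min((y for y, a in SIGN.items() if atoms <= a), key=lambda y: len(SIGN[y]))


def lub(y1, y2):
    return close(SIGN[y1] | SIGN[y2])


def plus_sign(y1, y2):
    """Optimal abstract addition on Sign, computed atom by atom."""
    out = set()
    for a in SIGN[y1]:
        for b in SIGN[y2]:
            if a == ZERO:
                out.add(b)
            elif b == ZERO:
                out.add(a)
            elif a == b:
                out.add(a)
            else:
                out |= {NEG, ZERO, POS}
    return close(frozenset(out))


def f_sharp(x):
    """X = alpha({0}) lub (X +# alpha({1})): loop entry joined with the loop body's output."""
    return lub("0", plus_sign(x, "+"))


def kleene_chain(f, bottom):
    """bottom, f(bottom), f(f(bottom)), ... up to the first fixpoint (finite on a noetherian domain)."""
    chain = [bottom]
    while True:
        nxt = f(chain[-1])
        if nxt == chain[-1]:
            return chain
        chain.append(nxt)


if __name__ == "__main__":
    print(kleene_chain(f_sharp, "bot"))  # prints: ['bot', '0', '0+']
```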

19 Static Analysis
• abstract domain and Galois connection to model the property
• (possibly optimal) correct abstract operations, which define $F^\sharp$
• the analysis is the computation of $\mathit{lfp}\,F^\sharp$
• if the abstract domain is non-noetherian, or if computing $\mathit{lfp}\,F^\sharp$ is too expensive
  – use a widening operator
  – which effectively computes an upper approximation of $\mathit{lfp}\,F^\sharp$
    ◦ one example later
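Widening can be illustrated on intervals, a standard non-noetherian domain that the slides do not use and that is chosen here only for illustration. In this Python sketch (hypothetical loop `x := 0; while x < 100 do x := x + 1`), plain iteration climbs [0,0], [0,1], ... for about a hundred steps, while the widened iteration stops after two steps at [0, +inf], an upper approximation of the least fixpoint [0, 100]. The function names are hypothetical.

```python
# Sketch: widening on the interval domain (slide 19). Intervals are an illustrative
# non-noetherian domain chosen here, not one used in the slides.
# Hypothetical loop:  x := 0; while x < 100 do x := x + 1
# An interval is a pair (lo, hi) with lo <= hi; None is the empty interval (bottom).
import math


def join(a, b):
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def leq(a, b):
    if a is None:
        return True
    if b is None:
        return False
    return b[0] <= a[0] and a[1] <= b[1]


def widen(a, b):
    """Standard interval widening: a bound that is still growing jumps to infinity."""
    if a is None:
        return b
    if b is None:
        return a
    lo = a[0] if b[0] >= a[0] else -math.inf
    hi = a[1] if b[1] <= a[1] else math.inf
    return (lo, hi)


def f_sharp(x):
    """Loop head: X = [0, 0] join ((X meet [-inf, 99]) + 1)."""
    body = None
    if x is not None and x[0] <= 99:
        body = (x[0] + 1, min(x[1], 99) + 1)
    return join((0, 0), body)


def lfp_kleene(f):
    """Plain iteration: terminates here only because the guard bounds the chain."""
    x = None
    while f(x) != x:
        x = f(x)
    return x


def lfp_widening(f):
    """Iterate x := x widen f(x) until f(x) <= x; the result over-approximates lfp f."""
    x = None
    while not leq(f(x), x):
        x = widen(x, f(x))
    return x


if __name__ == "__main__":
    print(lfp_kleene(f_sharp))    # prints: (0, 100)
    print(lfp_widening(f_sharp))  # prints: (0, inf)
```

Without the guard `x < 100` the plain chain would be infinite and only the widened iteration would terminate; the price is the further approximation mentioned on slide 18.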

20 Comparative Semantics: $\alpha(\mathit{lfp}\,F) = \mathit{lfp}\,F^\sharp$
• neither of the two fixpoints is finitely computable
• useful to reason about different semantics and to systematically derive more abstract semantics
  – choice of the most adequate reference semantics for analysis and verification
• $F^\sharp$ is less expensive than F in computing the observable property modeled by $\alpha$
  – no junk
• hierarchy of transition system semantics (P. Cousot, MFPS 97)
  – trace, big-step operational, denotational, relational, predicate transformer, axiomatic, etc.
• systematic reconstruction of several fixpoint ($T_P$-like) semantics for (positive) logic programs (Comini, Levi & Meo, Info. & Comp. 00)
  – applied in Pisa also to finite failure & infinite computations, CLP, CCP, Prolog, λProlog, sequent calculi

21 Polymorphic type inference in ML-like functional languages
• the ad-hoc solution
  – Milner's algorithm, specified by a set of inference rules
• an elegant, well-understood, universally accepted semantic formalization
• the systematic derivation via abstract interpretation
  – provides a better insight
  – shows how to improve precision
• inference rules mimic the concrete semantics
  – in the structure of the semantic evaluation function
  – in the semantic domains (environment)
• giving a semantics to well-typed programs only introduces approximation
  – e.g., if true then 2 else false is rejected as ill-typed, although it always evaluates to 2
• the most general polymorphic type for recursive functions is not computable
  – the inferred type may not be the most general
  – some type-correct programs cannot be typed

22 Polymorphic type inference via Abstract Interpretation
• abstract values = pairs of
  – a term (with variables)
    ◦ type expression
  – a constraint (on variables)
    ◦ set of term equalities in solved form
• partial order (on terms only)
  – top is "no type"
  – bottom is "any type"
  – $t_1 \sqsubseteq t_2$ if $t_2$ is an instance of $t_1$
• the domain is non-noetherian
  – there exist infinite increasing chains
• an optimal abstract operation
  – $+^\sharp((t_1,c_1),(t_2,c_2)) = (\mathit{int},\ c_1 \cup c_2 \cup \{t_1 = \mathit{int},\ t_2 = \mathit{int}\})$
• abstracting functional values
  – the concrete semantics: $E[\![\lambda x.e]\!]\rho = \lambda v.\, E[\![e]\!](\mathit{bind}\ \rho\ x\ v)$
  – the abstract value: let v1 = newvar() in let (v2,c2) = $E^\sharp[\![e]\!](\mathit{bind}\ \rho^\sharp\ x\ (v1,\{\}))$ in (v1 c2 -> v2, c2)

23 Recursion and Widening
• the abstraction of recursive functions is similar to that of regular functions, but
  – a fixpoint computation is required
  – the first approximation of the abstract value of the function is bottom
• since the abstract domain is non-noetherian, the fixpoint computation may diverge
• the solution in Milner's algorithm
  – take the results of the first two iterations and compute their lub (most general common instance, computed through unification)
  – if the lub is top (unification fails), the program is not typable (type error)
• this is exactly a widening operator, which returns a (correct) upper approximation of the lfp (Furiesi, Master Thesis, Pisa, 99)

24 How to improve precision
• straightforward!
  – perform at most k iterations of the fixpoint computation
  – if we reach a fixpoint, it is the most general type
  – otherwise, we apply Milner's widening to the last two results
    ◦ we succeed in typing more functions
    ◦ we get more precise types
• one example (due to Cousot)
• CaML
    # let rec f f1 g n x = if n=0 then (g x) else (((((f f1)(fun x -> (fun h -> (g(h x)))))(n - 1))(x))(f1));;
    This expression has type ('a -> 'a) -> 'b but is here used with type 'b
• our answer (the fixpoint is reached in 3 iterations)
    val f : ('a -> 'a) -> ('a -> 'b) -> int -> 'a -> 'b = <fun>

25 Abstract Interpretation vs. Type Systems
• Patrick Cousot has reconstructed a hierarchy of type systems for ML-like languages by using abstract interpretation (Cousot, POPL 97)
• type systems have been proposed to cope with other static analyses (strictness, various properties related to security)
• type systems need to be proved correct wrt a semantics
• abstract semantics are systematically derived from the semantics and are correct by construction
• two related, interesting open problems
  – comparison of the two approaches from the viewpoint of expressive power and analysis precision (and complexity)
  – definition of methods to automatically translate formalizations from one approach to the other

26 Static Analysis of Logic Programs
• abstract interpretation is very popular in logic languages
  – the computational model offers several opportunities for optimization, based on analysis results
  – it is (relatively) easy to define, because the standard semantics is collecting and the concrete domain (sets of substitutions) is quite simple
• several important properties (groundness, freeness, sharing, depth(k))
• for some properties (e.g., groundness and sharing) a lot of different abstract domains
  – techniques to compare the relative precision of abstract domains
  – important results on techniques for the systematic design of abstract domains, which can probably be applied to other paradigms as well
• abstract compilation in CLP (Giacobazzi, Debray & Levi, JLP 95)
  – the program is transformed by syntactically replacing concrete constraints by abstract constraints
  – the abstract computation is a standard CLP computation on a different constraint system

27 Groundness in Logic Programs
• CLP version
• concrete domain
  – $(\mathcal{P}(\mathit{Eqns}), \subseteq)$, sets of sets of term equations in solved form
• concrete semantics
  – the CLP version of the s-semantics (answer constraints)
• 3 abstract domains
  – G: the property of being ground
  – DEF: functional groundness dependencies
  – POS: DEF + some disjunctive information
    ◦ lattices shown in the 2-variable case

28 An example
• the program
    p(X,Y) :- X=a.
    p(X,Y) :- Y=b.
    q(X,Y) :- X=Y.
    r(X,Y) :- p(X,Y), q(X,Y).
• the concrete semantics
    p(X,Y) -> {{X=a},{Y=b}}
    q(X,Y) -> {{X=Y}}
    r(X,Y) -> {{X=a,Y=a},{X=b,Y=b}}
• in the concrete semantics of r
  – both arguments are bound to ground terms (in all the answer constraints)

29 The domain G
• the program
    p(X,Y) :- X=a.
    p(X,Y) :- Y=b.
    q(X,Y) :- X=Y.
    r(X,Y) :- p(X,Y), q(X,Y).
• the concrete semantics
    p(X,Y) -> {{X=a},{Y=b}}
    q(X,Y) -> {{X=Y}}
    r(X,Y) -> {{X=a,Y=a},{X=b,Y=b}}
• $\gamma_G(v)$ =
  – $\emptyset$, if v = bot
  – $\{e \in \mathit{Eqns} \mid X$ is bound to a ground term in $e\}$, if v = X (X is always ground)
  – $\mathit{Eqns}$, if v = true (no groundness information)
• the abstraction of the concrete semantics
    p(X,Y) -> true
    q(X,Y) -> true
    r(X,Y) -> X & Y
• the abstract program
    p(X,Y) :- lub_G(X,Y).
    q(X,Y) :- true.
    r(X,Y) :- glb_G(p(X,Y), q(X,Y)).
• the abstract semantics
    p(X,Y) -> true
    q(X,Y) -> true
    r(X,Y) -> true

30 The domain Def
• the program
    p(X,Y) :- X=a.
    p(X,Y) :- Y=b.
    q(X,Y) :- X=Y.
    r(X,Y) :- p(X,Y), q(X,Y).
• the concrete semantics
    p(X,Y) -> {{X=a},{Y=b}}
    q(X,Y) -> {{X=Y}}
    r(X,Y) -> {{X=a,Y=a},{X=b,Y=b}}
• $\gamma_{Def}(v)$ =
  – $\{e \in \mathit{Eqns} \mid X = Y \in e\}$, if v = X ↔ Y (X is ground if and only if Y is ground)
  – $\{e \in \mathit{Eqns} \mid X = t \in e$ and Y occurs in $t\}$, if v = X → Y (if X is ground then Y is ground)
  – ...
• the abstraction of the concrete semantics
    p(X,Y) -> true
    q(X,Y) -> X ↔ Y
    r(X,Y) -> X & Y
• the abstract program
    p(X,Y) :- lub_Def(X,Y).
    q(X,Y) :- X ↔ Y.
    r(X,Y) :- glb_Def(p(X,Y), q(X,Y)).
• the abstract semantics
    p(X,Y) -> true
    q(X,Y) -> X ↔ Y
    r(X,Y) -> X ↔ Y

31 The domain Pos
• the program
    p(X,Y) :- X=a.
    p(X,Y) :- Y=b.
    q(X,Y) :- X=Y.
    r(X,Y) :- p(X,Y), q(X,Y).
• the concrete semantics
    p(X,Y) -> {{X=a},{Y=b}}
    q(X,Y) -> {{X=Y}}
    r(X,Y) -> {{X=a,Y=a},{X=b,Y=b}}
• $\gamma_{Pos}(v)$ =
  – $\{e \in \mathit{Eqns} \mid$ either X or Y is bound to a ground term in $e\}$, if v = X ∨ Y (either X or Y is ground)
  – ...
• the abstraction of the concrete semantics
    p(X,Y) -> X ∨ Y
    q(X,Y) -> X ↔ Y
    r(X,Y) -> X & Y
• the abstract program
    p(X,Y) :- lub_Pos(X,Y).
    q(X,Y) :- X ↔ Y.
    r(X,Y) :- glb_Pos(p(X,Y), q(X,Y)).
• the abstract semantics
    p(X,Y) -> X ∨ Y
    q(X,Y) -> X ↔ Y
    r(X,Y) -> X & Y

32 Program Verification by Abstract Interpretation
• F = concrete semantic evaluation function
  – concrete enough to observe the property
• the property is modeled by an abstract domain $(A, \sqsubseteq_A)$ and a Galois insertion $(\alpha, \gamma)$
• $F^\sharp$ = abstract semantic evaluation function
• $S^\sharp$ = specification of the property, i.e., abstraction of the intended concrete semantics
• partial correctness: $\alpha(\mathit{lfp}\,F) \sqsubseteq_A S^\sharp$
• sufficient partial correctness condition: $F^\sharp(S^\sharp) \sqsubseteq_A S^\sharp$ (Comini, Levi, Meo & Vitiello, JLP 99)
  – if $F^\sharp(S^\sharp) \sqsubseteq_A S^\sharp$
  – then $S^\sharp$ is a prefixpoint of $F^\sharp$
  – hence $\alpha(\mathit{lfp}\,F) \sqsubseteq_A \mathit{lfp}\,F^\sharp \sqsubseteq_A S^\sharp$

33 Analysis and Verification
• F = concrete semantic evaluation function
• $F^\sharp$ = abstract semantic evaluation function
• analysis: compute $\mathit{lfp}\,F^\sharp$
  – we need to compute a fixpoint
  – noetherian domain or widening
• $S^\sharp$ = specification of the property
• verification: prove $F^\sharp(S^\sharp) \sqsubseteq_A S^\sharp$
  – no fixpoint computation and no need for noetherian domains
  – finite representation of the specification
  – decidability of $\sqsubseteq_A$
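The verification condition of slides 32 and 33 needs no fixpoint computation. Here is a Python sketch on the hypothetical loop `x := 0; while <unknown condition> do x := x + 1`, assuming an atom-set encoding of Sign; the function names are hypothetical. `verifies` evaluates $F^\sharp$ once on the specification and compares the result with it.

```python
# Sketch: checking the sufficient condition F#(S#) <= S# (slides 32-33) on Sign,
# for the hypothetical loop  x := 0; while <unknown condition> do x := x + 1
# Assumption: atom-set encoding of Sign.
NEG, ZERO, POS = "neg", "zero", "pos"
SIGN = {
    "bot": frozenset(),
    "-": frozenset({NEG}),
    "0": frozenset({ZERO}),
    "+": frozenset({POS}),
    "0-": frozenset({NEG, ZERO}),
    "0+": frozenset({ZERO, POS}),
    "top": frozenset({NEG, ZERO, POS}),
}


def close(atoms):
    return min((y for y, a in SIGN.items() if atoms <= a), key=lambda y: len(SIGN[y]))


def lub(y1, y2):
    return close(SIGN[y1] | SIGN[y2])


def leq(y1, y2):
    return SIGN[y1] <= SIGN[y2]


def plus_sign(y1, y2):
    """Optimal abstract addition on Sign, computed atom by atom."""
    out = set()
    for a in SIGN[y1]:
        for b in SIGN[y2]:
            if a == ZERO:
                out.add(b)
            elif b == ZERO:
                out.add(a)
            elif a == b:
                out.add(a)
            else:
                out |= {NEG, ZERO, POS}
    return close(frozenset(out))


def f_sharp(x):
    """Abstract loop-head equation: entry value 0 joined with x + 1."""
    return lub("0", plus_sign(x, "+"))


def verifies(spec):
    """Sufficient condition F#(S#) <= S#, checked without computing any fixpoint."""
    return leq(f_sharp(spec), spec)


if __name__ == "__main__":
    for spec in ["0+", "+", "top"]:
        print(spec, verifies(spec))
```

The specification + is rejected, correctly, since x is 0 on loop entry. In general a failed check does not by itself show that the program is incorrect: the method is complete only when the abstraction is (slide 34).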

34 Completeness of the proof method
• the proof method is complete if, whenever the program is partially correct wrt the specification $S^\sharp$, i.e., $\alpha(\mathit{lfp}\,F) \sqsubseteq_A S^\sharp$, there exists another specification $T^\sharp$, stronger than $S^\sharp$, such that the sufficient condition $F^\sharp(T^\sharp) \sqsubseteq_A T^\sharp$ holds
• we have shown that the proof method is complete if and only if the abstraction is complete (precise) (Levi & Volpe, PLILP 98)

35 Proof methods and the reference semantics
• one can be interested in establishing different kinds of properties
  – of the final state
  – of the relation between initial and final state
  – of the relation between specific pairs of intermediate states, e.g., procedure calls
  – ...
• there exist different corresponding proof methods
• all the proof methods are instances of $F^\sharp(S^\sharp) \sqsubseteq_A S^\sharp$ for different choices of the concrete semantic evaluation function F
• F can be derived by abstract interpretation (comparative semantics) from the most concrete semantics, i.e., a trace semantics
• first step of abstraction = choice of the "right" semantics
• in (positive) logic programming, all the known verification methods have been reconstructed (Levi & Volpe, PLILP 98)

36 Making $F^\sharp(S^\sharp) \sqsubseteq_A S^\sharp$ effective
• extensional specifications
  – typical analysis properties described by noetherian abstract domains
  – properties such as polymorphic types, which lead to finite abstract semantics even with non-noetherian domains
• intensional specifications, specified by means of assertions
• assertions form an abstract domain
  – a formula describes the set of all the concrete states which "satisfy" it (concretization)
  – if the specification language is closed under conjunction, it is easy to define the abstraction function
• we can derive an abstract function $F^\sharp$, which computes on the domain of assertions, and instantiate the verification condition (Comini, Gori & Levi, MFCSIT 00)
• the relation $\sqsubseteq$ on the domain of assertions must be decidable
• an open problem: completeness of the abstract semantics associated with a specific language of assertions

37 Specification Languages
• decidable specification languages have been proposed for functional programming and logic programming
  – one example: a powerful language which allows one to express several properties of logic programs, including types, freeness and groundness (Volpe, SCP 00)
• experiments using Horn Clause Logic as specification language (Comini, Gori & Levi, AGP 00)
  – it is not decidable
  – most of the verification conditions can be proved without using a theorem prover
    ◦ simple logic program transformation techniques, which can be partially supported by an automatic tool

38 Systematic abstract domain design
• once we have the abstract domain, the design of the abstract semantics is systematic
• abstract interpretation theory provides results which can be exploited to make the design of abstract domains (more) systematic
  – to compare and combine domains
  – to refine domains so as to improve their precision
• reduced product (of domains A and B)
  – allows one to analyze (together) the properties modeled by A and B
  – often delivers better results than the separate analyses
    ◦ because of domain interaction
• lifting to the powerset (and disjunctive completion)
  – roughly speaking, transform A into P(A)
  – better precision
    ◦ no loss of information in computing lubs
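Domain interaction in a reduced product can be sketched with Sign and a four-element Parity domain (bot, even, odd, top); Parity is an illustrative choice, not a domain taken from the slides. The hypothetical `reduce` function in this Python sketch replaces a pair by the most precise pair with the same concretization, computed over the (sign, parity) cells that integers actually occupy. Neither component alone detects that (0, odd) describes no integer; the reduction does.

```python
# Sketch: reduced product of Sign and Parity (slide 38). Parity is an illustrative
# domain chosen here; the atom encoding of both domains is an assumption.
NEG, ZERO, POS = "neg", "zero", "pos"
SIGN = {
    "bot": frozenset(),
    "-": frozenset({NEG}),
    "0": frozenset({ZERO}),
    "+": frozenset({POS}),
    "0-": frozenset({NEG, ZERO}),
    "0+": frozenset({ZERO, POS}),
    "top": frozenset({NEG, ZERO, POS}),
}
PARITY = {
    "bot": frozenset(),
    "even": frozenset({"even"}),
    "odd": frozenset({"odd"}),
    "top": frozenset({"even", "odd"}),
}

# (sign, parity) combinations that integers actually occupy: 0 is even.
CELLS = [(NEG, "even"), (NEG, "odd"), (ZERO, "even"), (POS, "even"), (POS, "odd")]


def close(domain, atoms):
    """Most precise element of the domain covering the given atoms."""
    return min((y for y, a in domain.items() if atoms <= a), key=lambda y: len(domain[y]))


def reduce(s, p):
    """Most precise (sign, parity) pair describing gamma(s) intersected with gamma(p)."""
    cells = [(a, b) for a, b in CELLS if a in SIGN[s] and b in PARITY[p]]
    return (close(SIGN, frozenset(a for a, _ in cells)),
            close(PARITY, frozenset(b for _, b in cells)))


if __name__ == "__main__":
    print(reduce("0+", "odd"))  # prints: ('+', 'odd')
    print(reduce("0", "odd"))   # prints: ('bot', 'bot')
```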

39 Operations on Abstract Domains
• several useful operators on abstract domains (refinements)
  – a survey in (Filé, Giacobazzi & Ranzato, ACM Comput. Surv. 96)
• linear completion (Giacobazzi, Ranzato & Scozzari, SAS 98)
  – functional dependencies modeled by linear implication
• reconstruction of all the known domains for groundness analysis (Scozzari, SAS 97)
  – DEF = G -> G
  – POS = DEF -> DEF
  – POS = POS -> POS
    ◦ optimality of POS
• successfully applied to other domains for logic programs
  – types (Levi & Spoto, PLILP 98)
  – sharing and freeness (Levi & Spoto, PEPM 00)
• open problems
  – do the same refinements apply to other programming paradigms?
  – can refinements be extended to domains of assertions and to type systems?

40 Abstract Interpretation
• a mathematically simple and solid foundation for
  – comparative semantics
  – static analysis
  – verification
• a methodology for the systematic derivation of
  – abstract domains from the property
    ◦ complexity issues?
    ◦ quantitative analyses?
  – abstract semantics from the concrete semantics and the abstract domain