Logics, Automata, and Algorithms for Analysis of Structured Programs Rajeev Alur University of Pennsylvania Marktoberdorf Summer School, August 2006.

Logics, Automata, and Algorithms for Analysis of Structured Programs Rajeev Alur University of Pennsylvania Marktoberdorf Summer School, August 2006

Software Model Checking AbstractorVerifier Debugger Model Specification Counter-example Program Yes/proof No/bug Research challenges  Search algorithms  Abstraction  Static analysis  Refinement  Expressive specs Applications  Device drivers, OS code  Network protocols  Concurrent data types Tools: SLAM, Blast, CBMC, F-SOFT

do { KeAcquireSpinLock(); nPacketsOld = nPackets; if(request){ request = request->Next; KeReleaseSpinLock(); nPackets++; } } while (nPackets != nPacketsOld); KeReleaseSpinLock(); Verification Example Does this code obey the locking spec? UnlockedLocked Error Rel Acq Rel Specification

Classical Model Checking  Both model M and specification S define regular languages M as a generator of all possible behaviors S as an acceptor of “good” behaviors (verification is language inclusion of M in S) or as an acceptor of “bad” behaviors (verification is checking emptiness of intersection of M and S)  Typical specifications (using automata or temporal logic) Safety: Lock and unlock operations alternate Liveness: Every request has an eventual response Branching: Initial state is always reachable  Robust foundations Finite automata / regular languages Buchi automata / omega-regular languages Tree automata / parity games / regular tree languages

Checking Structured Programs  Control-flow requires stack, so model M defines a context-free language  Algorithms exist for checking regular specifications against context-free models Emptiness of pushdown automata is solvable Product of a regular language and a context-free language is context-free  But, checking context-free spec against a context-free model is undecidable! Context-free languages are not closed under intersection Inclusion as well as emptiness of intersection undecidable  Existing software model checkers: pushdown models (Boolean programs) and regular specifications

Are Context-free Specs Interesting?  Classical Hoare-style pre/post conditions If p holds when procedure A is invoked, q holds upon return Total correctness: every invocation of A terminates Integral part of emerging standard JML  Stack inspection properties (security/access control) If setuuid bit is being set, root must be in call stack  Interprocedural data-flow analysis  All these need matching of calls with returns, or finding unmatched calls Recall: Language of words over [, ] such that brackets are well matched is not regular, but context-free

Checking Context-free Specs  Many tools exist for checking specific properties Security research on stack inspection properties Annotating programs with asserts and local variables Inter-procedural data-flow analysis algorithms  What’s common to checkable properties? Both model M and spec S have their own stacks, but the two stacks are synchronized  As a generator, program should expose the matching structure of calls and returns Solution: Nested words and theory of regular languages over nested words

Course Outline  Model: Recursive/Hierarchical State Machines  Reachability Algorithms  Theory of Nested Words  Fair Cycle Detection Algorithms  Linear Temporal Logic CaRet  Theory of Nested Trees  Fixpoint Logic NT-mu

Program Abstraction int x, y; if x>0 { ………… y:=x+1 ……….} else { ………… y:=x+1 ……….} bool bx, by; if bx { ………… by:=true ……….} else { ………… by:={true,false} ……….} Predicate Abstraction bx: x>0; by : y>0

Modeling Structured Programs main() { bool y; … x = P(y); … z = P(x); … } bool P(u: bool) { … return Q(u); } bool Q(w: bool) { if … else return P(~w) } A2 A1 A3 A2 A3 A1 Entry-node Exit-node Box (superstate) Program Recursive State Machine (RSM)/ Pushdown automaton

Recursive State Machine A  M: Set of module names, m 0 : main module  Each module m has a state machine A m  Set N m of nodes  Set B m of boxes (calls to other modules)  Entry nodes: E m (input parameters)  Exit nodes: X m (return values)  Labeling Y m : Boxes B m -> Module names M  Calls of m : (box b, entry node e of Y m (b))  Returns of m: (box b, exit node x of Y m (b))  Edges  : (source: node/return; destination: node/call)

Transition System of RSM  Vertices V m of a module: Nodes, calls, returns of m  A state of A: (sequence of boxes, vertex)  Set of states B*V is infinite  Initial state: ( ,e), where e is entry-node of m 0  Transitions: Local transitions: if (u,v) is an edge, then (b 1..b n,u) has a transition to (b 1 …b n,v) Call transitions : for a call vertex (b,e), (b 1 …b n,(b,e)) has a transition to (b 1 …b n b,e) Return transitions: for a return vertex (b,x), (b 1 …b n b,x) has a transition to (b 1 …b n, (b,x))

Variations of Definitions  We can label nodes (and boxes) with propositions, and/or edges with alphabet symbols, as needed  Boolean programs can be easily mapped to RSMs  Expressiveness same as the classical model of pushdown automata  Claim: RSMs are well-suited to study algorithms, logics, automata etc for program analysis Pushdown automata have no modular structure No clutter of variables, parameter passing mechanisms etc of real programs Complexity parameters (number of entries/exits) clear

Hierarchical State Machine A2 A1 A3 A2 A3 Entry node Exit node Box (superstate) - Node

Hierarchical State Machines  An RSM A is an HSM, if there is a total ordering of the modules such that if a box b of a module m is mapped to madule m’ then m’ appears after m  The transition system of an HSM is finite, but can be exponential in size  Analysis algorithms for RSMs are typically of same complexity as those of HSMs  Typical code may not use recursion, so can be abstracted to HSMs, but what follows does not really depend on this Specifications of programs should not depend on the bound on the depth of hierarchy For algorithms/specifications we do not want to flatten the hierarchy for succinctness reasons

Reachability  Given an RSM A, a node v is reachable, if (b1…bn,v) is reachable in underlying transition system, for some context b1..bn  Goal: Given A, compute all reachable nodes Key idea: There are |M| interrelated searches: if exit x is reachable from entry e of module m, then add summary edge from call (b,e) to return (b,x), for every box b mapped to module m

Same-level Reachability Algorithm  Let R be the smallest subset of U m E m x V m satisfying, for each module m, each entry e of m, R(e,e) For each edge u to v of m, if R(e,u) then R(e,v) For each box b, for each call vertex (b,f) and return vertex (b,x), if R(e,(b,f)) and R(f,x) then R(e,(b,x))  Claim: For each m, for each entry node e and vertex u of module m, R(e,u) holds iff ( ,u) is reachable from ( ,e) in underlying transition system e u v b f x f x

Reachability Algorithm  Phase 1: Compute the set R capturing same level reachability information (evaluation can be viewed as AND-OR reachability)  Define a graph over vertices of A as follows: Retain all edges of A Add summary edges from (b,e) to (b,x) if R(e,x) holds Add call edges: from call (b,e) to entry e  A node u is reachable in A iff u is reachable in this graph from entry node of m o

Complexity Analysis  Let  be the number of vertices of A,  be the number of edges, and let  be max number of entry nodes per module  Phase 1 AND-OR graph has  vertices and  2 edges, and hence, predicates e(.) can be evaluated in time linear in these  Graph of phase 2 has  vertices and  edges, and can be searched in time linear in these  Thm: Given an RSM A, the set of reachable nodes of A can be computed in time O(|A|  2 )

Mixing Forward and Backward Search  If a module m has a single exit x, then it’s beneficial to search it backwards: compute the set of vertices from which x is reachable Base case: x(x) Or case: x(v) -> x(u); if there is an edge from u to v And case: x((b,y)) & e(y) -> x((b,e)); if e is entry; y an exit  In general, in phase 1, for a module with less exits and entries, compute relations x(.) instead of e(.)  Thm: Given an RSM A, the set of reachable nodes of A can be computed in time O(|A|  2 ), where  = max m min {|E m |,|X m |}  Thm: Given a set F of nodes of m 0, checking whether a node in F is reachable is PTIME-complete (for HSMs with  =1)

Implementations  On-the-fly reachability Is some node in given target set F reachable? Modified Depth-first-search that has early termination and explores a module only when needed  Tabled logic programming Reachability in RSMs is evaluation of Datalog programs Efficient implementations (e.g. XSB) exist  Symbolic Search In presence of global and local variables, at every control point u, compute the set R(I,V) where I contains global and input vars and V contains global and local vars Represent the sets R(I,V) using OBDDs with appropriately modified search algorithm

Nested Words Nested word: Linear sequence + well-nested edges Positions labeled with symbols in  a1a1 a2a2 a3a3 a4a4 a5a5 a6a6 a7a7 a8a8 a9a9 a 10 a 11 a 12 Positions classified as: Call positions: both linear and hierarchical successors Return positions: both linear and hierarchical predecessors Internal positions: otherwise Each position has at most one nested edge

Program Executions as Nested Words Program bool P() { local int x,y; … x = 3; if Q x = y ; … } bool Q () { local int x; … x = 1; return (x==0); } An execution as a word Symbols: w : write x r : read x s : other s w s w r s w s 1 1 2 2 3 3 4 4 An execution as a nested word s w s w r s w s 1 2 3 4 Summary edges from calls to returns

Model for Linear Hierarchical Data  Nested words: both linear and hierarchical structure is made explicit. This seems natural in many applications Executions of structured program RNA: primary backbone is linear, secondary bonds are well-nested XML documents: matching of open/close tags  Words: only linear structure is explicit Pushdown automata add/discover hierarchical structure Parantheses languages: implicit nesting edges  Ordered Trees: only hierarchical structure is explicit Ordering of siblings imparts explicit partial order Linear order is implicit, and can be recovered by infix traversal

RNA as a Nested Word Primary structure: Linear sequence of nucleotides (A, C, G, U) Secondary structure: Hydrogen bonds between complementary nucleotides (A-U, G-C, G-U) In literature, this is modeled as trees. Algorithmic question: Find similarity between RNAs using edit distances G C U G A A U A C G C G C U C G

Linguistic Annotated Data Linguistic data stored as annotated sentences (eg. Penn Treebank) Nested words, possibly with labels on edges Sample query: Find nouns that follow a verb which is a child of a verb phrase Existing query languages: XPath, XQuery, LPath (BCDLZ) NP V Det Adj N Prep Det N N I saw the old man with a dog today NP PP NP VP

Nested Word Automata (NWA) a1a1 a2a2 a3a3 a4a4 a5a5 a6a6 a7a7 a8a8 a9a9 States Q, initial state q 0, final states F Starts in initial state, reads the word from left to right labeling edges with states, where states on the outgoing edges are determined from states of incoming edges Transition function:   c : Q x  -> Q x Q (for call positions)   i : Q x  -> Q (for internal positions)   r : Q x Q x  -> Q (for return positions) Nested word is accepted if the run ends in a final state q8q8 q7q7 q5q5 q4q4 q3q3 q2q2 q1q1 q0q0 q 9 =  r (q 8,q 29,a 9 )q 6 =  i (q 5,a 6 )(q 2,q 29 )=  c (q 1,a 2 )q 29 q 47

Regular Languages of Nested Words  A set of nested words is regular if there is a finite-state NWA that accepts it  Nondeterministic automata over nested words Transition function:  c : Qx  ->2 QxQ,  i :Q x  -> 2 Q,  r :Q x Q x  -> 2 Q Can be determinized  Graph automata over nested words defined using tiling systems  Appealing theoretical properties Effectively closed under various operations (union, intersection, complement, concatenation, projection, Kleene-* …) Decidable decision problems: membership, language inclusion, language equivalence … Alternate characterization: MSO, syntactic congruences

Software Analysis  A program P with stack-based control is modeled by a set L of nested words it generates Choice of  depends on the intended application Summary edges exposing call/return structure are added (exposure can depend on what needs to be checked) If P has finite data (e.g. pushdown automata, Boolean programs, recursive state machines) then L is regular  Specification S given as a regular language of nested words  Verification: Does every behavior in L satisfy S ? Runtime monitoring: Check if current execution is accepted by S (compiled as a deterministic automaton) Model checking: Check if L is contained in S, decidable when P has finite data

RSMs as Nested Word Automata  Given an RSM A, we can transform it into a nondeterministic finite-state NWA A’ as follows States of A’ = Vertices of A Initial state of A’ = Entry node of main module m o of A All states are final Local transitions of A’ = Edges of A Call transitions of A’ = From call (b,e), push b, and goto e Return transitions of A’ = From exit x, pop b, and goto return (b,x)  Choose alphabet  and add labels to transitions as needed  L(A’) models all behaviors of A as nested words

Writing Program Specifications Intuition: Keeping track of context is easy; just skip using a summary edge Finite-state properties of paths, where a path can be a local path, a global path, or a mixture Sample regular properties: If p holds at a call, q should hold at matching return If x is being written, procedure P must be in call stack Within a procedure, an unlock must follow a lock All properties specifiable in standard temporal logics (LTL)

Local Regularity Let L be a regular language, Local(L): every local path is in L (skip summary edges) E.g. L: every write (w) is followed by a read (r) Given a DFA A for L, construct NWA B for Local(L) States Q, initial state q 0, final states F, same as A  i (q,a) =  (q,a)  c (q,a) = (q 0,  (q,a))  r (q,q’,a) =  (q’,a) if q is in F

Determinization Goal: Given a nondeterministic automaton A with states Q, construct an equivalent deterministic automaton B Intuition: Maintain a set of “summaries” (pairs of states) State-space of B: 2 QxQ Initially, state contains q->q, for each q At call, if state u splits into (u’,u’’), summary q->u splits into (q->u’,u’->u’’) At return, summaries q->u’ and u’->w join to give q->u A state of B is final if it contains q->q’, where q is initial and q’ is final q->q q’->q’… q->u q’->v… u’->u’’ v’->v’’… u’->w u’->w’ v’->w’’… q->w q->w’ q’->w’’… q->u’ q’->v’…

Closure Properties The class of regular languages of nested words is effectively closed under many operations Intersection: Take product of automata (key: nesting given by input) Union: Use nondeterminism Complementation: Complement final states of deterministic NWA Projection: Use nondeterminism Concatenation/Kleene*: Guess the split (as in case of word automata) Reverse (reversal of a nested word reverses nested edges also)

Decision Problems  Membership: Is a given nested word w accepted by NWA A? Solvable in polynomial time If A is fixed, then in time O(|w|) and space O(nesting depth of w)  Emptiness: Given NWA A, is its language empty? Solvable in time O(|A| 3 ): view A as a pushdown automaton  Universality, Language inclusion, Language equivalence: Solvable in polynomial-time for deterministic automata For nondeterministic automata, use determinization and complementation; causes exponential blow-up, Exptime-complete problems

MSO-based Characterization  Monadic Second Order Logic of Nested Words First order variables: x,y,z; Set variables: X,Y,Z… Atomic formulas: a(x), X(x), x=y, x y Logical connectives and quantifiers  Sample formula: For all x,y. ( (a(x) and x -> y) implies b(y)) Every call labeled a is matched by a return labeled b  Thm: A language L of nested words is regular iff it is definable by an MSO sentence Robust characterization of regularity as in case of languages of words and languages of trees

MSO-NWA Equivalence (Proof sketch)  From deterministic NWA to MSO Unary predicates and q l and q h for each state q of A Formula says that these predicates encode a run of A consistent with its transition function (q h is used to encode state-labels on nesting edges)  r requirement can be encoded using nesting-edge predicate ->  Only existential-second-order prefix suffices  From MSO to nondeterministic NWA NWA can check base predicates x=y, x y Use closure properties: union, complement, and projection

Congruence Based Characterization Context C: A nested word and a linear edge Substitution I(C,w): Insert nested word w in a context C Congruence: Given a language L of nested words, w ~ L w’ if for every context C, I(C,w) is in L iff I(C,w’) is in L Thm: A language L of nested words is regular iff the congruence ~ L is of finite index.

Relating to Word Languages Words labeled with a typed alphabet (visibly pushdown words) Symbols partitioned into calls, returns, and internals Two views are basically the same giving similar results Visibly Pushdown Automata Pushdown automaton that must push while reading a call, must pop while reading a return, and not update stack on internals Height of stack determined by input word read so far a1a1 a2a2 a3a3 a4a4 a5a5 a6a6 a7a7 a8a8 a9a9 a 10 a 11 a 12 Visibly Pushdown Languages A robust subclass of deterministic context-free languages a1a1 a2a2 a3a3 a4a4 a5a5 a6a6 a7a7 a8a8 a9a9 a 10 a 11 a 12

VPLs vs DCFLs Fix . For each partitioning of  into  c,  i,  r, we get a corresponding class of visibly pushdown languages Each class is closed under Boolean operations Decidable equivalence, inclusion problems etc Instead of static typing of symbols, one can use dynamic types determined by an automaton to get more VPL classes[Caucal’06] Regular DCFL Dyck Are these VPLs? L 1 = {a n b n | n > 0}, L 2 = {b n a n | n > 0}, L 3 = words with same # of a’s & b’s

Relating to Tree Languages A binary tree is hiding in a nested word At calls, left subtree encodes what happens in the called procedure, and right subtree gives what happens after return Why not use tree encoding and tree automata ? Notion of regularity is same in both views Nesting is encoded, but linear structure is lost Deterministic tree automata are not expressive No notion of reading input from left to right XML literature has lots of attempts to address this deficiency: Tree walking automata…

Summary Table Word Automata Pushdown Automata Tree Automata NWA Unionyes Intersectionyesnoyes Complementyesnoyes Det= Nondetyesno yes EmptinessNlogspacePtime Inclusion (Nondet) PspaceUndecExptime

Related Work  Restricted context-free languages Parantheses languages, Dyck languages Input-driven languages  Logical characterization of context-free languages (LST’94)  Connection between pushdown automata and tree automata Set of parse trees of a CFG is a regular tree language Pushdown automata for query processing in XML  Algorithms for pushdown automata compute summaries Context-free reachability Inter-procedural data-flow analysis  Game semantics for programming languages (Abramsky et al)  Model checking of pushdown automata LTL, CTL,  -calculus, pushdown games LTL with regular valuations of stack contents CaRet (LTL with calls and returns)

Research Directions  Visible Pushdown Languages (AM, STOC’04) Extends to  -regular languages of infinite words  VPL triggered research Games (LMS, FSTTCS’04) Congruences and minimization (AKMV ICALP’05, KMV Concur’06) Third-order Algol with iteration (MW FoSSaCS’05) Dynamic logic with recursive programs (LS FoSSaCS’06) Synchronization of pushdown automata (Caucal DLT’06) (Bi)Simulation and other equivalences (Srba STACS’06) Regularity and counter automata (BLS CSL’06)  Linear-time Temporal Logics CaRet (Logic of calls and returns) (AEM TACAS’04) Caution: Not studied in the nested word framework

Fair Cycle Detection  Given a set F of nodes, is there an infinite path in the underlying transition system that contains infinitely many states with node-component in F?  Relevant information about a box: for entry e and exit x, is x reachable from e along a path containing a node in F?  Complexity same as reachability (Ptime-complete, and in time O(|A|  2 )

Keeping Track of Visits to F  Let R* be the smallest subset of U m E m x V m satisfying, for each module m, each entry e of m, If R(e,u) and u is in F then R*(e,u) For each edge u to v of m, if R*(e,u) then R*(e,v) For each box b, for each call vertex (b,f) and return vertex (b,x), if R*(e,(b,f)) and R(f,x) then R*(e,(b,x)) if R(e,(b,f)) and R*(f,x) then R*(e,(b,x))  Claim: For each m, for each entry node e and vertex u of module m, R*(e,u) holds iff ( ,u) is reachable from ( ,e) in underlying transition system along a path visiting some state with node component in F e u v b f x f x

Fair Cycle Detection Algorithm  Phase 1: Compute the set R  Phase 2: Compute the set R*  Define a graph over vertices of A as follows: Retain all edges of A Add summary edges from (b,e) to (b,x) if R(e,x) holds, and add this edge to F if R*(e,x) holds Add call edges: from call vertex (b,e) to entry e  A has infinite fair path iff the above graph has a cycle that is reachable from entry-node of m 0 and contains either a node or an edge in F e u v f x f x

Linear-time Propositional Temporal Logic Q ::- p | not Q | Q or Q’ | Next Q | Always Q | Eventually Q | Q Until Q’ Interpreted over (infinite) words. Models of an LTL formula is a -regular language. Useful for stating sequencing properties:  If req happens, then req holds until it is granted: Always ( req → (req Until grant) )  An exception is never raised: Always ( not Exception )

CARET CARET: A temporal logic for Calls and Returns Expresses -regular properties of infinite nested words A B C A Global successor used by LTL ………….

CARET CARET: A temporal logic for Calls and Returns Expresses -regular properties of infinite nested words A B C D Global successor used by LTL …………. Local successor: Jump from calls to returns along nesting edge Otherwise global successor at the same level

CARET CARET: A temporal logic for Calls and Returns Expresses -regular properties of infinite nested words A B C A Global successor used by LTL …………. Local successor: Jump from calls to returns along nesting edge Otherwise global successor at the same level

CARET CARET: A temporal logic for Calls and Returns Expresses -regular properties of infinite nested words A B C A Global successor used by LTL …………. Local successor: Jump from calls to returns along nesting edge Otherwise global successor at the same level Local path

CARET CARET: A temporal logic for Calls and Returns Expresses -regular properties of infinite nested words A B C A Global successor used by LTL …………. Local successor: Jump from calls to returns along nesting edge Otherwise global successor at the same level Caller modality: Jump to the caller of the current module Defined for every position except top-level ones

CARET CARET: A temporal logic for Calls and Returns Expresses -regular properties of infinite nested words A B C A Global successor used by LTL …………. Abstract successor: Jump from calls to returns along nesting edge Otherwise global successor at the same level Caller modality: Jump to the caller of the current module Defined for every position except top-level ones Caller path gives the stack content!

Expressing properties in Caret Pre-post conditions: If P holds when A is called, then Q must hold when the call returns Always ( (P and call-to-A) Local-Next Q ) A P Q Integrating Manna/Pnueli-style reasoning for reactive computations with Hoare-style reasoning for structured programs

Expressing properties in Caret If A is called with low priority, then it cannot access the file Always ( call-to-A and low-priority Local-Always ( not access-file ) ) A low-priority A high-priority access-file

Expressing properties in Caret Stack inspection properties If a variable x is accessed, then A must be on the call stack Always ( access-to-x CallerPath-Eventually call-to-A ) access-to-x A

Model checking CARET  Given: A recursive state machine/ nested word automaton M A CARET formula Q  Model-checking: Do all infinite nested words of M satisfy specification Q? CARET can be model-checked in time that is polynomial in M and exponential in Q. |M| 3. 2 O(|Q|) Complexity class same as that for LTL ! Generalization of Vardi-Wolper construction

Model-checking CARET: Intuition  Given M and formula Q,  Build a product nested word automaton that accepts words exhibited by M that satisfy (not Q)  M does not satisfy Q if this product has a fair cycle  Construction builds on the classical tableaux for LTL  (State of M, set of subformulas of Q) s Q1 Local-Next Q1

Nested Trees Tree edges + Nesting edges Unranked (arity not fixed) Unordered Infinite Given pushdown automaton/ RSM) A, model it by a nested tree T A Each path models an execution as a nested word Branching-time model checking: Specification is a language of nested trees, verification is membership

Transition function of a tree automaton  : Q x  -> D D depends on type of automaton and type of trees  Nondeterministic over binary trees: D is a set of pairs; A choice (u,v) means send u to left child and v to right child  Nondeterministic over ordered trees: D is a regular language over Q: the sequence of states sent along children must be in D  Nondeterministic over unordered unranked trees: D is a set of terms in 2 Q x Q; A choice ({q 1,q 2 }, q 3 ) means that send q 1 to one child, q 2 to a different child, and q 3 to all remaining children  Alternating over unordered unranked trees: D contains formulas that positive Boolean combination of terms of the form, [q]; A formula ( or ) and [q 3 ] means send q 3 to all children, and either q 1 or q 2 to one of them Tree Automata Definitions

 Finitely many states Q, initial states  Run of the automaton: Labeling of edges with states consistent with initial set and transition function  Local transitions:  i (q,a) is a set of terms in 2 Q x Q  Call transitions:  c (q,a) is a set of terms in 2 QxQ x Q x Q; ({(q 1,q 2 )},q 3,q 4 ) means send q 1 to one child, q 2 along corresponding nesting edges, q 3 to remaining children, and q 4 along all remaining nesting edges  Return transitions:  r (q,q’,a) is set of terms in 2 Q xQ, here q is the state along tree edge, and q’ is the state along nesting edge  Acceptance condition: Final states, Buchi, Parity (NPNTA) Nondeterministic Nested Tree Automata

Properties of NPNTAs  Thm: Closed under union and projection.  Thm: Closed under intersection. Proof idea: Finite-state; just take product.  Thm: Not closed under complement.  Thm: Emptiness checkable in EXPTIME. Proof idea: Special case of emptiness of pushdown tree automata.  Thm: Model-checking on pushdown systems in EXPTIME. Proof idea: The stack of the input pushdown system is synchronized with the implicit stack of the NPNTA, so a product construction works.  Thm: Universality undecidable. Extension: alternation. Extra expressive power, unlike in the case of tree automata

 Transition Terms (TT): Positive Boolean combination of atomic terms of the form (send q to some child), [q] (send q to all children)  CTT: Positive boolean combination of terms of the form (send q to some child and q’ to all corresponding nesting edges) [q,q’] (q on all tree edges, q’ on all nesting edges)  Transition function has call, return and internal components:  i : Q x  -> TT,  c : Q x  -> CTT,  r : Q x Q x  -> TT  Run of the automaton: game between the automaton and an adversary.  Winning condition: Parity  Tree accepted iff automaton has a winning strategy Alternating Nested Tree Automata

Properties of APNTAs  Thm: Closed under union, intersection.  Thm: Closed under complement. Proof idea: Parity games are determined, and designing the complement game is easy.  Thm: Not closed under projection.  Thm: Can express some non-context-free tree languages.  Theorem: Model-checking EXPTIME-complete. Proof idea: Stack of the input pushdown system is synchronized with the implicit stack of the NTA, so the problem can be reduced to a pushdown game, solvable in EXPTIME.  Thm: Emptiness, universality undecidable. Next… Logics on nested trees

Logics for Trees mu-calculus Canonical temporal logic Fixpoints over sets of states Suitable for symbolic implementation Equivalent to bisimulation-closed alternating tree automata Decidable model-checking on pushdown systems LTL CTL

Mu-Calculus Assembly language of temporal logics Formulas  Sets of nodes Least and greatest fixpoints of f f, f, f : there is an edge to call/ret/local node satisfying f

Fixpoints in mu-calculus Reachability in mu-calculus: Formula describes a terminating symbolic computation for finite- state systems. Application: mu-calculus is the “assembly language” in temporal logic model-checkers like NuSMV. What about pushdown models (interprocedural analysis)? Algorithms use “summarization”, and not captured by mu-calculus Model-checking mu-calculus on pushdown systems is decidable. But…

Summary Subtrees call ret p s u v Nesting edges let us chop a nested tree into subtrees that summarize contexts. We could jump across contexts if we could reason about concatenation. local Summary s u v Matching returns of s = {u,v}

Logic of Subtrees s u Mu-calculus formulas can be interpreted at subtrees rather than nodes Formula  set of subtrees Modalities argue about full subtrees rooted at children Why not a fixpoint calculus where: Formulas  sets of summary trees and modalities for concatenation? Proposal: NT-mu. <>f f

Operations on Summaries local s u s Formulas  sets of summaries call ret

Colored Summary Trees Number of “leaves” is unbounded Solution: assign leaves k colors Colors are defined by formulas (obligations upon return) Within f, we use the propositions R 1, R 2, … R k to refer to the colors of return leaves

Mu-calculus vs NT-mu mu-calculus: fixpoints over full subtrees NT-mu: fixpoints over summary trees

Semantics of NT-mu  k-colored summary tree specified by (s,U 1, … U k ), where s is a tree node, and each U i is a subset of matching returns of s  Meaning of each formula f of NT-mu is a set of summaries (s, U 1, … U k ) |= p if label of s satisfies p Meaning of Boolean connectives is standard (s, U 1, … U k ) |= f if s has an internal-child t s.t. (t,U 1, … U k ) |= f (s, U 1, … U k ) |= R i if s has a return-child t s.t. t is in U i (s, U 1, … U k ) |= f(g 1,…g m ) if s has a call-child t s.t. (t,V 1 … V m ) |=f where V j contains all matching returns w of t s.t. (w,U 1, … U k ) |= g j Formulas define monotonic functions from summary sets to summary sets; fixpoint semantics is standard  A nested tree T with root r satisfies f if ( r ) |= f

Examples  There exists a return colored 1: summaries (s,U) s.t. U is non- empty f :  X. ( R1 or X or X {X} )  p is reachable : EF p  X. ( p or X or X {} or f {X})  Local reachability: p is reachable within the same procedural context  X. (p or X or f {X}

Specifying Requirements  Branching-time properties that mix local and global paths  Inter-procedural data-flow analysis Set of program points where expression e is very busy (along every path e is used before a variable in e gets modified) If e contains local variables, this is not definable in mu-calculus  Stack inspection, access control, stack overflow  Pre-post conditions (universal as well as branching)

Model Checking  Given an RSM A and NT-mu formula f, does the nested tree T A satisfy f ?  Consider a point a in a component with exits u and v A sample state of A is of the form .a, where  is a stack of boxes State at any matching return of .a is either .u or .v  Claim 1: NT-mu is a tree logic, so even though .a may appear at multiple places in T A, it satisfies the same formulas  Claim 2: NT-mu formulas are evaluated over summary trees (cannot access nodes beyond matching returns), satisfaction of formula at .a does not depend on the context 

Bisimulation Closure A summary (s,U 1,…U k ) is bismulation- closed if two matching returns w and w’ are bisimilar, then w in U i iff w’ in U i Claim: During fixpoint evaluation, it suffices to consider only bisimulation- closed summaries Closing each color under bisimulation does not change the truth of formulas Return nodes corresponding to the same exit are bisimilar Corollory: Bisimulation-closed summaries have finite representation (colors for each exit) .a .u .v a u v

Model Checking  Model checking procedure: Consider RSM-summaries of the form (s,U 1,..U k ), where s is a vertex in a component, and U i is a subset of exit points Finitely many RSM summaries Evaluate NT-mu formula using standard fixpoint computation  Model checking RSMs wrt NT-mu is Exptime-complete Same complexity as CTL or mu-calculus model checking  Recall reachability in NT-mu f:  X. ( R1 or X or X {X} ) EF p :  X. ( p or X or X {} or f {X} ) Local-reach:  X. (p or X or f {X})  Evaluation of these over RSM-summaries is the standard way of solving reachability Evaluating f corresponds to pre-computing summaries Global/local reachability are computationally similar

Expressiveness Thm: NT-mu and APNTA are equally expressive Corollary: NT-mu can capture every property that the mu- calculus can. Corollary: CARET (a linear temporal logic of calls and returns, AEM’04) is contained in NT-mu. Corollary: Satisfiability of NT-mu is undecidable. NT-mu can express pushdown games Thm: Expressiveness increases with the number of colors

From NT-mu to APNTA (Proof sketch)  Given an NT-mu formula f, construct equivalent APNTA A  States of A are subformulas of f  Simplify(f,a), where a is an assignment to atomic props Unroll any top-level fixpoint of f Replace each top-level proposition by its T/F value according to a Simplify(f,a) is a positive Boolean comb of terms like <>g and []g   i (f,a) = Simplify(f,a)   c (f{g 1,…g k },a) = (Simplify(f,a), (g 1,…g k )) Evaluate f at call node and send (g 1,..g k ) along nesting edge   r (R i, (g 1,…g k ),a) = Simplify(g i,a) Retrive i-th return obligation from nesting edge, and evaluate it  Fixpoints handled using parity condition

From APNTA to NT-mu (Proof sketch)  Given alternating NTA A with Q = {1..n}, accepting by final state, construct a set of least fixpoint equations  Number of colors (return parameters): n  For each pair of states, a variable X ij  Intended meaning: A summary (s, U 1,…U n ) is in X ij iff A has a strategy starting at s in state i, with state j along all nested edges to return, to end up in a matching return s’ in U k in state k  Write equations among X ij variables so that the lfp captures the intended meaning s state j in color k if game get here in state k Game starts here in state i

MSO Logic for Nested Trees Thm: Model-checking even the bisimulation-closed fragment of MSO is undecidable. Thm: More expressive than NPNTAs. Thm: Can encode a property not expressible by APNTAs. Conjecture: Expressiveness of MSO and APNTAs incomparable. Monadic Second Order Logic of Nested Trees First order variables: x,y,z; Set variables: X,Y,Z… Atomic formulas: a(x), X(x), x=y, x ->y, x -> y Logical connectives and quantifiers

Recap  Basic analysis technique for hierarchical/recursive programs: and-or reachability  Executions of programs should expose hierarchical structure  Nested words: Model for linear+hierarchical data  Nested word automata for defining regular languages of nested words  New Opportunities More expressive specifications for software analysis tools (model checking, testing, theorem proving, monitoring…) Query processing for XML data Using NWA instead of pushdown automata as well as tree automata (over finite ordered trees) can lead to simplification of constructions and/or new insights Nested tree automata: new frontier of decidability

Papers  Analysis of recursive state machines (with Etessami, Yannakakis; CAV’01)  Adding nesting structure to words (with Madhusudan; DLT’06; precursor: Visibly pushdown languages, STOC’04)  Temporal logic of calls and returns (with Etessami and Madhusudan; TACAS’04)  Fixpoint logic for local and global flows (with Chaudhuri and Madhusudan, POPL’06)  Languages of nested trees (with Chaudhuri and Madhusudan; CAV’06) Available on my webpage

Research in Software Model Checking  Convergence of program analysis and model checking  Efficient ways for property-guided abstraction  Symbolic computation of function summaries using SAT solvers and/or OBDDs  Avoid abstraction; encode state-space directly (after slicing and range analysis)  Automated compositional reasoning using computational learning of interface assumptions  How to deal with features of C ?  What are appropriate application domains?

Logics, Automata, and Algorithms for Analysis of Structured Programs Rajeev Alur University of Pennsylvania Marktoberdorf Summer School, August 2006.

Similar presentations

Presentation on theme: "Logics, Automata, and Algorithms for Analysis of Structured Programs Rajeev Alur University of Pennsylvania Marktoberdorf Summer School, August 2006."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Logics, Automata, and Algorithms for Analysis of Structured Programs Rajeev Alur University of Pennsylvania Marktoberdorf Summer School, August 2006.

Similar presentations

Presentation on theme: "Logics, Automata, and Algorithms for Analysis of Structured Programs Rajeev Alur University of Pennsylvania Marktoberdorf Summer School, August 2006."— Presentation transcript:

Similar presentations

About project

Feedback