Generating Complex Input (and its other applications) Course Software Testing & Verification 2013/14 Wishnu Prasetya

Content: BNF to describe inputs; generating inputs; coverage; regular expressions; other applications of regular expressions.

Important note: generating complex inputs is a non-trivial task, but this is only partially addressed by AO. E.g. chapter 5 sets up the right background, but then focuses more on mutation. Here we will complement that with additional material. Usable sections from AO related to input generation are: the (very short!) section about BNF, and Section 2.7 about regular expressions. The latter is actually about using regular expressions to e.g. calculate the test paths needed to deliver a given graph coverage; this is not directly related to input generation, but we will discuss it as well.

Describing "allowed" inputs. Using Backus-Naur Form (BNF) notation / a context-free grammar (see example on p. 171). Terminology: start symbol, terminal, non-terminal, epsilon (not in the book; just check Wikipedia, or the lecture notes of Languages & Compilers).

S → Brace | Curly | ε
Brace → "(" S ")" S
Curly → "{" S "}" S

The first line is implicitly THREE production rules!

Production rule. A production rule has the form N → Z, where N is a non-terminal and Z is a sequence of symbols. A rule like A → a(B|C)d is seen as shorthand for a set of production rules:

A → aBd
A → aCd

People often use extended BNF, e.g.: Brace → ( "(" S ")" )*

Example: NL post codes

NLpostcode → Area Space Street
Area → FirstDigit Digit Digit Digit
Street → Letter Letter
FirstDigit → 1 | 2 ...
Digit → 0 | 1 | 2 ...
Letter → a | b | c ... | A | B | C ...
Space → " "*

Sometimes there are additional constraints, e.g. codes above 9999 XL do not actually exist (do not map to an existing address). A constraint is not always expressible in BNF, or it is expressible but not conveniently.
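For illustration, the post-code grammar can be approximated by an ordinary regular expression; here is a hypothetical Python sketch (the pattern and the helper name `is_postcode` are ours, not from the slides). Note that, as the slide says, the "no codes above 9999 XL" constraint still cannot be expressed this way and would need a separate check.

```python
import re

# Sketch of the NL post-code grammar: a non-zero first digit, three more
# digits, optional spaces, two letters. The semantic constraint that some
# codes do not map to a real address is NOT expressible in the pattern.
NL_POSTCODE = re.compile(r"[1-9][0-9]{3} *[A-Za-z]{2}")

def is_postcode(s):
    return NL_POSTCODE.fullmatch(s) is not None

print(is_postcode("3584 CC"))   # True
print(is_postcode("0584 CC"))   # False: first digit must be 1..9
```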

Generating inputs

S → Brace | Curly | ε
Brace → "(" S ")" S
Curly → "{" S "}" S

A derivation is a series of expansions of the grammar that results in a sequence of terminal symbols. It follows that the sequence is a valid sentence of the grammar. We can use this to generate valid sentences. Example: S ⇒ Brace ⇒ ( S ) S ⇒ ( ε ) S ⇒ ( ε ) ε, i.e. "()".

Derivation tree

S → Brace // RSb
S → Curly // RSc
S → ε // RSe
Brace → "(" S ")" S
Curly → "{" S "}" S

A derivation: S ⇒ Brace ⇒ ( S ) S ⇒ ( ε ) S ⇒ ( ε ) ε

A derivation can also be described by a derivation tree. (Figure: the tree for "()" has RSb at the root with child RBrace; RBrace has children "(", RSe, ")", RSe; each RSe derives ε.) Given such a tree, you can reconstruct what the derived sentence is.

One more example. (Figure: the derivation tree for "({})": RSb at the root with child RBrace; RBrace has children "(", RSc, ")", RSe; RSc has child RCurly with children "{", RSe, "}", RSe; each RSe derives ε.)

Represent such a tree e.g. by:

data DTree = Term String | Node RuleName [DTree]

To get the sentence from a d :: DTree, simply flatten it.
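As an illustration, a Python sketch of the DTree idea, using tuples instead of a Haskell datatype (the encoding and helper names are ours): a leaf is ("term", s), an internal node is (ruleName, children).

```python
# Python sketch of DTree: a leaf is ("term", s), an internal node is
# (rule_name, [children]). Assumes no grammar rule is named "term".
def term(s):
    return ("term", s)

def node(name, children):
    return (name, children)

def flatten(t):
    """Concatenate the terminal strings at the leaves, left to right."""
    tag, payload = t
    if tag == "term":
        return payload
    return "".join(flatten(c) for c in payload)

# The derivation tree for "()" from the slides:
t = node("RSb", [node("RBrace",
        [term("("), node("RSe", [term("")]), term(")"),
         node("RSe", [term("")])])])
print(flatten(t))   # ()
```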

A more generic formulation: basic idea. Every derivation rule implicitly generates all derivation trees that begin with its LHS non-terminal. We will implement this by a function of type:

type Gen = () → [DTree]

Later on, we still have to select which trees to take. To actually "run" a generator gen to produce sentences we do: map flatten . select . gen $ (), with some implementation of "select". This may not terminate if you have infinite trees; more on that later.

Combining generators. We use ⊕ to interpret alternatives, and rule to interpret a production rule:

(⊕) :: Gen → Gen → Gen
(f ⊕ g) () = f () ++ g ()

rule :: Name → [Gen] → Gen
rule n [g1, g2, ...] = \(). [ Node n [t1, t2, ...] | t1 ← g1 (), t2 ← g2 (), ... ]

And this operator represents the interpretation of terminals:

term :: String → Gen
term v = \(). [ Term v ]

Defining "rule"

rule :: Name → [Gen] → Gen

Only showing a simpler case; suppose the rule was rule1 : A → BC:

rule "rule1" [genB, genC] =
  \(). [ Node "rule1" [t, u] | t ← genB (), u ← genC () ]

Example

S → Brace | Curly | ε
Brace → "(" S ")" S
Curly → "{" S "}" S

s = rule "RSb" [brace] ⊕ rule "RSc" [curly] ⊕ rule "RSe" [term ""]
brace = rule "RBrace" [term "(", s, term ")", s]
curly = rule "RCurly" [term "{", s, term "}", s]

Notice the similarity between the structure of the generator and the original grammar. An initial generator is the generator that corresponds to the grammar's start symbol; so the function s above is an initial generator.

Only generate trees of depth < kMax. The previous generators generate an infinite number of trees, some of which may have infinite depth. In a non-lazy language, extend the generators so that they count how far the current node is from the root:

type Gen = Int → [DTree]

Adapting the combinators, e.g.:

term s k = if k ≥ 0 then [Term s] else []
(f ⊕ g) k = f k ++ g k

Defining "rule"

type Gen = Int → [DTree]
rule :: Name → [Gen] → Gen

Only showing a simpler case:

rule name [gen1, gen2] = newrule
  where
  newrule k =
    if k < 0 then []
    else [ Node name [t, u] | t ← gen1 (k-1), u ← gen2 (k-1) ]

Example revisited

s = rule "s0" [brace] ⊕ rule "s1" [curly] ⊕ rule "s2" [term ""]
brace = rule "brace" [term "(", s, term ")", s]
curly = rule "curly" [term "{", s, term "}", s]

s 2 produces { "()", "{}", "" }. s 4 produces:

["(())()","(()){}","(())","({})()","({}){}","({})","()()","(){}","()","{()}()","{()}{}","{()}","{{}}()","{{}}{}","{{}}","{}()","{}{}","{}",""]

Too many... However, you actually have the full derivation trees, which you can exploit for filtering.
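A runnable Python approximation of these bounded generators; a sketch, not the slides' Haskell code. Two simplifications are ours: generators return flattened sentences instead of derivation trees, and terminals ignore the depth budget so that the bound counts rule applications only. With that convention s(2) yields exactly the set the slide reports.

```python
# A generator maps a remaining-depth budget k to a list of sentences.
def term(s):
    return lambda k: [s]          # terminals ignore the budget

def alt(f, g):
    return lambda k: f(k) + g(k)  # the slides' (+) on alternatives

def rule(name, gens):
    # 'name' is kept for symmetry with the slides; unused here since
    # we return sentences, not trees.
    def newrule(k):
        if k < 0:
            return []
        results = [""]
        for g in gens:            # cross-product of the RHS generators
            results = [r + t for r in results for t in g(k - 1)]
        return results
    return newrule

# The brace/curly grammar; wrapping in functions delays the recursion.
def s(k):     return alt(alt(rule("RSb", [brace]),
                             rule("RSc", [curly])),
                         rule("RSe", [term("")]))(k)
def brace(k): return rule("RBrace", [term("("), s, term(")"), s])(k)
def curly(k): return rule("RCurly", [term("{"), s, term("}"), s])(k)

print(sorted(set(s(2))))   # ['', '()', '{}']
```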

We can see...

S → Brace | Curly | ε
Brace → "(" S ")" S
Curly → "{" S "}" S

From a derivation tree (e.g. the tree of "({})" shown earlier) we can see which terminals are produced, we can see which rules were used, and we can infer which non-terminals were produced during the derivation.

BNF coverage. (C5.29) Terminal coverage: TR contains each terminal symbol from the given grammar G. (C5.30) Production coverage: TR contains each production rule in G. Production coverage subsumes terminal coverage, but both are usually too weak. Pair-wise production coverage: TR contains every feasible pair (R1, R2) of production rules; feasible means that they can actually be applied in succession in a derivation from G. This can be generalized to k-wise, but that may blow up the size of TR. Alternatively, if G is not too large, you can still manually add new requirements to your TR.

Pair-wise production coverage. A derivation tree t covers a pair of rules R1;R2 if the pair appears as two consecutive nodes in t. A set T of derivation trees gives full pair-wise production coverage if every feasible pair of rules R1;R2 is covered by some t in T. Analogously for k-wise coverage.

Example

S → Brace | Curly | ε
Brace → "(" S ")" S
Curly → "{" S "}" S

{ "()", "{}" } gives full terminal as well as production coverage. Combinations of brace-curly and curly-brace can only be enforced by pair-wise coverage. But none of these coverage criteria can distinguish between e.g. ({}) and (){}.
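These claims can be checked mechanically. A small Python sketch, with derivation trees written as (ruleName, children) tuples and terminal leaves as plain strings (the encoding and helper names are ours):

```python
# Collect the rules used in a derivation tree, and the parent-child
# rule pairs (the pairs that pair-wise production coverage counts).
def rules_used(t):
    if isinstance(t, str):                 # terminal leaf
        return set()
    name, children = t
    return {name}.union(*(rules_used(c) for c in children))

def rule_pairs(t):
    if isinstance(t, str):
        return set()
    name, children = t
    pairs = set()
    for c in children:
        if not isinstance(c, str):
            pairs.add((name, c[0]))
        pairs |= rule_pairs(c)
    return pairs

# Derivation trees for "()" and "{}":
t1 = ("RSb", [("RBrace", ["(", ("RSe", [""]), ")", ("RSe", [""])])])
t2 = ("RSc", [("RCurly", ["{", ("RSe", [""]), "}", ("RSe", [""])])])

all_rules = {"RSb", "RSc", "RSe", "RBrace", "RCurly"}
covered = rules_used(t1) | rules_used(t2)
print(covered == all_rules)    # True: full production coverage
# The pair (RBrace, RSc), a curly directly inside a brace, is not covered:
print(("RBrace", "RSc") in (rule_pairs(t1) | rule_pairs(t2)))   # False
```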

Rule-rule coverage

S → Brace | Curly | ε // RSb, RSc, RSe
Brace → "(" S ")" S // RBrace
Curly → "{" S "}" S // RCurly

alts(N) = the set of production rules of non-terminal N. alts(R, i) = the set of production rules of the i-th symbol of the rule R; equal to alts(N) if N is the non-terminal at the i-th position. A derivation tree t covers R;i R' if R' appears as the i-th child of some R-node in t. Each Rule-Rule Coverage (ERRC): for every rule R and every applicable i, TR includes every R;i R' for every R' ∈ alts(R, i). For example, TR includes RBrace;1 RSb (RSb as the first S inside Brace).

ERRC example

S → Brace | Curly | ε // RSb, RSc, RSe
Brace → "(" S ")" S // RBrace
Curly → "{" S "}" S // RCurly

Importantly, ERRC also requires Brace;1 R' and Brace;3 R' to be in TR for every R' ∈ {RSb, RSc, RSe}, and similarly for Curly. Just ({}) covers Brace;1 RSc, but not Brace;3 RSc; similarly, (){} covers Brace;3 RSc, but not Brace;1 RSc. Example of a test set giving full ERRC coverage: (), ({}), (){}, (()), ()(), {}, {()}, {}(), {{}}, {}{}.

ERRC example (continued)

S → Brace | Curly | ε // RSb, RSc, RSe
Brace → "(" S ")" S // RBrace
Curly → "{" S "}" S // RCurly

ERRC does not force you to cover all "combinations" of sibling choices; ARRC (next slide) does, but this may produce a very large TR.

All-combinations. Let R be a rule producing k non-terminals. A combination of R is a vector c of choices: R;1 R'1, ..., R;k R'k. A derivation tree t covers such a combination c if it appears as sibling labels in t. All Rule-Rule Coverage (ARRC): for every rule R, TR includes every combination of R.

Subsumption (diagram): ARRC subsumes ERRC; 3-wise production coverage subsumes pair-wise production coverage, which subsumes production coverage, which subsumes terminal coverage.

Regular expressions. Example: (aa | bb)*, ( "(" ")" | "{" "}" )*. Easy to write, but not as expressive as BNF. Syntax:

rexp ::= terminal | rexp "|" rexp | rexp rexp | rexp* | rexp+ | ( rexp )

The sentences of an Rexp. L(e) = the set of sentences described by the rexp e, defined as below (with L(a) = { a } for a terminal a):

L(e*) = { ε } ∪ L(e+)
L(e+) = L(e e*)
L(e | f) = L(e) ∪ L(f)
L(d e) = { s ++ t | s ∈ L(d), t ∈ L(e) }

Regular expressions can be equivalently described by a BNF grammar, but this is beyond our scope; check the course Languages and Compilers. In practice people use e.g. the POSIX extensions, e.g. to describe NL post codes:

[1-9][[:digit:]][[:digit:]][[:digit:]][[:blank:]]*[[:alpha:]][[:alpha:]]

Generating sentences. Discussed in AO; we'll generalize. Let's represent a regular expression with values of this type:

data Rexp = Term String | Seq Rexp Rexp | Alt Rexp Rexp | Star Rexp

ε is just Term "". AO also has e^{M..N}, for iterating e at least M times and at most N times, but this can be expressed with Seq and Alt.

Generating sentences

gen :: Rexp → [String]
gen (Term s) = [s]
gen (Alt d e) = gen d ++ gen e
gen (Seq d e) = [ s ++ t | s ← gen d, t ← gen e ]
gen (Star e) = [""] ++ gen (Seq e (Star e))

This will generate all derivable strings, but of course may not terminate. Make it finite, e.g. by only expanding Star a finite number of times.
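A Python sketch of gen with the Star case bounded to at most n unrollings, so the result is finite (the encoding of Rexp as tagged tuples is ours; the slides' Haskell version relies on laziness instead):

```python
# Rexp as tagged tuples: ("term", s), ("alt", d, e), ("seq", d, e),
# ("star", e). 'n' bounds how often a star may still be unrolled.
def gen(e, n=2):
    kind = e[0]
    if kind == "term":
        return [e[1]]
    if kind == "alt":
        return gen(e[1], n) + gen(e[2], n)
    if kind == "seq":
        return [s + t for s in gen(e[1], n) for t in gen(e[2], n)]
    if kind == "star":
        if n == 0:
            return [""]
        # e* = eps | e e*, with one fewer unrolling allowed
        return [""] + [s + t for s in gen(e[1], n)
                             for t in gen(("star", e[1]), n - 1)]
    raise ValueError(kind)

# (aa | bb)* unrolled at most twice:
aa_bb = ("star", ("alt", ("term", "aa"), ("term", "bb")))
print(sorted(gen(aa_bb, 2)))
# ['', 'aa', 'aaaa', 'aabb', 'bb', 'bbaa', 'bbbb']
```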

Another application: representing your control flow graph. The language described by a regular expression can equivalently be described by a state automaton. Example: a*b(c|d)ef corresponds to an automaton with an initial state, a final state, a self-loop labelled a on the initial state, and transitions labelled b, c, d, e, f. Such an automaton can be "executed" by following the arrows from the initial state to a final state. This produces the corresponding sequence of "labels", which is then the sentence generated by the execution.

Representing a control flow graph (CFG) with Rexp. More on the equivalence between regular expressions and state automata is discussed in the course Languages & Compilers. Notice that a state automaton is a graph! So it can be seen as describing a control flow graph. It follows that we can represent a CFG with a regular expression. To distinguish the arrows in the CFG, we will first assign a unique label to each.

Simple example: a*b(c|d1d2)ef, for a CFG with an entry node, an exit node, and edges labelled a, b, c, d1, d2, e, f. Not so complicated, but things can get a bit confusing when you have nested loops. The next slides describe a conversion algorithm; this is from AO.

You can merge sequential edges. Assuming one single end-node; else add a virtual end. Combine/multiply sequential edges. Example: combine edges h and i into the single edge hi. (© Ammann & Offutt, Introduction to Software Testing, Ch 2)

You can merge parallel edges. Combine parallel edges (edges with the same source and target). Example: combine edges b and c into the single edge b + c.

You can remove self-loops. Combine all self-loops (loops from a node to itself): add a new "dummy" node with an incoming edge carrying the starred loop expression, then merge the resulting sequential edges with multiplication. Example: the self-loop e on node 3 becomes a dummy node 3' reached by an edge e*; merging the now-sequential edges turns the path d, e*, f into de*f.

You can remove a "middle node". A middle node is a node that is neither an initial nor a final node, and it should not self-loop. Replace the middle node by inserting edges from all its predecessors to all its successors, multiplying the path expressions of all incoming edges with those of all outgoing edges. Example: a node with incoming edges A and B and outgoing edges C and D is replaced by the four edges AC, AD, BC, BD.

Example of removing a middle node. Remove node 2: edges (1, 2) and (2, 4) become one edge, and edges (4, 2) and (2, 4) become a self-loop. The graph with edges a, b + c, de*f, g, hi becomes a graph with edges a, bde*f + cde*f, a self-loop gde*f, and hi.

Keep doing it until only one edge is left:

a, bde*f + cde*f, self-loop gde*f, hi
⇒ abde*f + acde*f, self-loop gde*f, hi (merge a into the parallel edges)
⇒ (abde*f + acde*f)(gde*f)*, hi (remove the self-loop via a dummy node)
⇒ abde*f (gde*f)* hi + acde*f (gde*f)* hi (merge with hi)

Applications. The obvious one: we can use gen on the expression a*b(c|d1d2)ef to get the set of "all" possible test paths through the CFG. This is perhaps not very useful, because what we ultimately want are test cases. But you can calculate some other useful information.

Calculating the number of paths in the CFG, where iterating more than once is treated as equivalent to iterating just once. AO: you can do that by "transforming" your expression: a*b(c|d1d2)ef becomes (ε | a) b (c | d1d2) e f, which counts as (1 + 1) * 1 * (1 + 1*1) * 1 * 1 = 4.

But we can of course also do it like this:

cnt :: Rexp → Int
cnt (Term s) = 1
cnt (Alt d e) = cnt d + cnt e
cnt (Seq d e) = cnt d * cnt e
cnt (Star e) = 1 + cnt e

Notice the similar structure as in gen; we just use different operators.
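The same fold in Python, with Rexp encoded as tagged tuples (an illustration, not the slides' code; the `seq` helper is ours). It reproduces the count of 4 for a*b(c|d1d2)ef:

```python
# cnt: number of paths, treating a loop as "skipped or taken once".
def cnt(e):
    kind = e[0]
    if kind == "term":
        return 1
    if kind == "alt":
        return cnt(e[1]) + cnt(e[2])   # either branch
    if kind == "seq":
        return cnt(e[1]) * cnt(e[2])   # independent choices multiply
    if kind == "star":
        return 1 + cnt(e[1])           # skip the loop, or take it once
    raise ValueError(kind)

def seq(*es):
    """Fold a list of Rexps into nested binary Seq nodes."""
    r = es[0]
    for e in es[1:]:
        r = ("seq", r, e)
    return r

# a*b(c|d1d2)ef from the slides:
cfg = seq(("star", ("term", "a")), ("term", "b"),
          ("alt", ("term", "c"), seq(("term", "d1"), ("term", "d2"))),
          ("term", "e"), ("term", "f"))
print(cnt(cfg))   # 4
```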

Other applications:
1. Calculating the longest path through the CFG/regular expression.
2. Calculating the minimum number of paths that would cover all branches (assuming loops always have an exit edge).
3. Calculating a minimalistic set of test paths that would satisfy (2).

We can do them too, by folding:

maxLength :: Rexp → Int
maxLength (Term s) = length s
maxLength (Alt d e) = maxLength d `max` maxLength e
maxLength (Seq d e) = maxLength d + maxLength e
maxLength (Star e) = maxLength e

minCnt :: Rexp → Int
minCnt (Term s) = 1
minCnt (Alt d e) = minCnt d + minCnt e
minCnt (Seq d e) = minCnt d `max` minCnt e
minCnt (Star e) = 1 + minCnt e

minPaths :: Rexp → [String] -- do this yourself.
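Python sketches of these two folds (the tuple encoding and the example expression are ours). On a*b(c|de)f with single-letter edge labels, the longest path, counting the loop once, has length 5, and two paths suffice to cover all branches:

```python
# Rexp as tagged tuples: ("term", s), ("alt", d, e), ("seq", d, e),
# ("star", e).
def max_length(e):
    kind = e[0]
    if kind == "term":
        return len(e[1])
    if kind == "alt":
        return max(max_length(e[1]), max_length(e[2]))
    if kind == "seq":
        return max_length(e[1]) + max_length(e[2])
    if kind == "star":
        return max_length(e[1])    # iterate the loop once
    raise ValueError(kind)

def min_cnt(e):
    kind = e[0]
    if kind == "term":
        return 1
    if kind == "alt":
        return min_cnt(e[1]) + min_cnt(e[2])       # must cover both branches
    if kind == "seq":
        return max(min_cnt(e[1]), min_cnt(e[2]))   # paths can be shared
    if kind == "star":
        return 1 + min_cnt(e[1])   # one path skips, plus covering the body
    raise ValueError(kind)

# a*b(c|de)f:
rex = ("seq",
       ("seq", ("seq", ("star", ("term", "a")), ("term", "b")),
        ("alt", ("term", "c"), ("seq", ("term", "d"), ("term", "e")))),
       ("term", "f"))
print(max_length(rex), min_cnt(rex))   # 5 2
```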

Complementary operation analysis. Examples of "complementary" operations (here called C/create and D/destruct): push and pop; fileOpen and fileClose; getLock and releaseLock. An execution path that contains more destructs than creates is suspicious. It is not necessarily an error, because at the actual run it may still happen that some of the destructs simply have no effect. Actually, #C ≥ #D should hold along any prefix of an execution path.

Complementary operation analysis. Given an execution t, let δ(t) = the number of C's in t minus the number of D's in t. δ(t) has to be ≥ 0; otherwise the execution is unsafe. Actually, for all prefixes s of t we check that δ(s) ≥ 0. But can we check this for all executions of the CFG? (Example graph, as a regular expression: C D* D | C D.)

Complementary operation analysis. Consider the regexp that equivalently describes the graph. We'll write a function δ :: Rexp → [Formula] to generate formulas describing all possible δ's of all sentences of the regular expression. Checking safety is then "simple":

safe :: Rexp → Bool
safe e = all [ f ≥ 0 is valid | f ← δ e ]

The algorithm

δ :: Rexp → [Formula]
δ C = [ "1" ]
δ D = [ "-1" ]
δ ε = [ "0" ]
δ (Alt d e) = δ d ∪ δ e
δ (Seq d e) = [ k + m | k ← δ d, m ← δ e ]
δ (Star e) = [ k * n | k ← δ e ], where n is a fresh name

To also generate constraints over prefixes, extend δ to produce two components: δ1 for the prefixes and δ2 for the full sentences, e.g.:

δ (Seq d e) = ( δ1 d ∪ [ k + m | k ← δ2 d, m ← δ1 e ], [ k + m | k ← δ2 d, m ← δ2 e ] )
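Instead of generating symbolic formulas, one can under-approximate the check by enumerating sentences with stars unrolled a bounded number of times and testing δ on every prefix. A hypothetical Python sketch (encoding and helper names are ours), applied to the unsafe example C D* D | C D:

```python
# Enumerate sentences of an Rexp (tagged tuples), stars unrolled <= n times.
def sentences(e, n=2):
    kind = e[0]
    if kind == "term":
        return [e[1]]
    if kind == "alt":
        return sentences(e[1], n) + sentences(e[2], n)
    if kind == "seq":
        return [s + t for s in sentences(e[1], n)
                      for t in sentences(e[2], n)]
    if kind == "star":
        if n == 0:
            return [""]
        return [""] + [s + t for s in sentences(e[1], n)
                             for t in sentences(("star", e[1]), n - 1)]
    raise ValueError(kind)

def safe_word(w):
    """Check that delta = #C - #D stays >= 0 along every prefix of w."""
    d = 0
    for op in w:
        d += 1 if op == "C" else -1
        if d < 0:
            return False
    return True

# C D* D | C D, the slide's example graph:
e = ("alt",
     ("seq", ("term", "C"), ("seq", ("star", ("term", "D")), ("term", "D"))),
     ("seq", ("term", "C"), ("term", "D")))
print(all(safe_word(w) for w in sentences(e, 3)))
# False: e.g. the word CDD already has a prefix with more D's than C's
```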