# Generating Complex Input (and its other applications), Course Software Testing & Verification 2013/14, Wishnu Prasetya



Content: BNF to describe inputs; generating inputs; coverage; regular expressions; other applications of regular expressions.

Important note: generating complex inputs is a non-trivial task, but AO addresses it only partially. For example, chapter 5 sets up the right background but then focuses mostly on mutation. Here we complement it with additional material. The sections of AO that are usable for input generation are: Section 5.1.1 (very short!) on BNF, and Section 2.7 on regular expressions. The latter actually uses regular expressions for a different purpose, e.g. to calculate the test paths needed to deliver a given graph coverage. That is not directly related to input generation, but we will discuss it as well.

Describing “allowed” inputs. Use Backus-Naur Form (BNF) notation, i.e. a context-free grammar (see the example on p. 171). Terminology: start symbol, terminal, non-terminal, epsilon (ε; not in the book, just check Wikipedia or the lecture notes of Languages & Compilers). Example grammar: S → Brace | Curly | ε (implicitly THREE production rules!); Brace → “(” S “)” S; Curly → “{” S “}” S.

Production rule. A production rule has the form N → Z, where N is a non-terminal and Z is a sequence of symbols. A rule like A → a(B|C)d is seen as shorthand for a set of production rules: A → aBd and A → aCd. People often use extended BNF, e.g.: Brace → ( “(” S “)” )*

Example: NL post codes. NLpostcode → Area Space Street; Area → FirstDigit Digit Digit Digit; Street → Letter Letter; FirstDigit → 1 | 2 | ...; Digit → 0 | 1 | 2 | ...; Letter → a | b | c | ... | A | B | C | ...; Space → “ ”*. Sometimes there are additional constraints, e.g. codes above 9999 XL do not actually exist (they do not map to an existing address). Such a constraint is not always expressible in BNF, or it is expressible but not conveniently.
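The post-code grammar reads almost directly as a recognizer. The following is a minimal Haskell sketch, assuming plain ASCII input; `isNLPostcode` and its helpers are hypothetical names, and the 9999 XL upper bound is deliberately not checked, since it is exactly the kind of constraint the BNF does not express.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, isDigit)

-- FirstDigit -> 1 | 2 | ... (a non-zero digit)
isFirstDigit :: Char -> Bool
isFirstDigit c = c >= '1' && c <= '9'

-- Letter -> a | b | ... | A | B | ...
isLetter :: Char -> Bool
isLetter c = isAsciiLower c || isAsciiUpper c

-- NLpostcode -> Area Space Street, with Space -> " "*
isNLPostcode :: String -> Bool
isNLPostcode s =
  case splitAt 4 s of
    (area@[f, _, _, _], rest) ->
      isFirstDigit f && all isDigit (tail area)
        && isStreet (dropWhile (== ' ') rest)
    _ -> False
  where
    -- Street -> Letter Letter
    isStreet [a, b] = isLetter a && isLetter b
    isStreet _      = False
```

For example, "3584 CC", "3584CC" and "3584  CC" are accepted, while "0584 CC" and "3584 C" are rejected.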

Generating inputs. A derivation is a series of expansions of the grammar that results in a sequence of terminal symbols. It follows that this sequence is a valid sentence of the grammar, so we can use derivations to generate valid sentences. Example: S ⇒ Brace ⇒ ( S ) S ⇒ ( ε ) S ⇒ ( ε ) ε, i.e. the sentence ().

Derivation tree. Label the rules: S → Brace (RSb), S → Curly (RSc), S → ε (RSe), Brace → “(” S “)” S (RBrace), Curly → “{” S “}” S (RCurly). The derivation S ⇒ Brace ⇒ ( S ) S ⇒ ( ε ) S ⇒ ( ε ) ε can also be described by the derivation tree RSb[ RBrace[ "(", RSe[ε], ")", RSe[ε] ] ]. Given such a tree, you can reconstruct the derived sentence.

One more example. With the same rule labels, the tree RSb[ RBrace[ "(", RSc[ RCurly[ "{", RSe[ε], "}", RSe[ε] ] ], ")", RSe[ε] ] ] derives ({}). Represent such a tree e.g. by: data DTree = Term String | Node RuleName [DTree]. To get the sentence from a d :: DTree, simply flatten it.
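A minimal sketch of the tree type and `flatten`. The transcript lost the symbol between the rule name and the child list, so the constructor is called `Node` here (an assumption); leaves for ε are `Term ""`, which flatten to nothing.

```haskell
-- A derivation tree: leaves are terminals, inner nodes carry the name of the
-- production rule applied there.
data DTree = Term String | Node String [DTree]
  deriving (Show, Eq)

-- The derived sentence is the left-to-right concatenation of the leaves.
flatten :: DTree -> String
flatten (Term s)    = s
flatten (Node _ ts) = concatMap flatten ts

-- The tree for "({})" from the example, written out by hand.
exampleTree :: DTree
exampleTree =
  Node "RSb"
    [ Node "RBrace"
        [ Term "("
        , Node "RSc"
            [ Node "RCurly"
                [Term "{", Node "RSe" [Term ""], Term "}", Node "RSe" [Term ""]]
            ]
        , Term ")"
        , Node "RSe" [Term ""]
        ]
    ]
```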

A more generic formulation: basic idea. Every production rule implicitly generates all derivation trees that begin with its LHS non-terminal. We will implement this by a function of type: type Gen = () -> [DTree]. Later on, we still have to select which trees to take. To actually “run” a generator gen to produce sentences we do: map flatten . select . gen $ (), with some implementation of select. This may not terminate if there are infinite trees; we deal with that later.

Combining generators. We use ⊕ to interpret alternatives, rule and rule_ to interpret a production rule, and term to interpret terminals: (⊕) :: Gen -> Gen -> Gen with (f ⊕ g) () = f () ++ g (); rule :: Name -> [Gen] -> Gen with rule n rhs = \() -> [ Node n [t1, t2, ...] | t1 <- rhs1 (), t2 <- rhs2 (), ... ]; term :: String -> Gen with term v = \() -> [ Term v ].

Defining rule. rule :: Name -> [Gen] -> Gen. Showing only a simpler case: suppose the rule is rule1 : A → B C. Then rule "rule1" [genB, genC] = \() -> [ Node "rule1" [t, u] | t <- genB (), u <- genC () ].

Example. For S → Brace | Curly | ε; Brace → “(” S “)” S; Curly → “{” S “}” S: s = rule "RSb" [brace] ⊕ rule "RSc" [curly] ⊕ rule "RSe" [term ""]; brace = rule "RBrace" [term "(", s, term ")", s]; curly = rule "RCurly" [term "{", s, term "}", s]. Notice the similarity between the structure of the generator and that of the original grammar. An initial generator is the generator that corresponds to the grammar's start symbol; so the function s above is an initial generator.

Only generate trees of depth < kMax. The previous generators generate an infinite number of trees, some of which may have infinite depth. In a non-lazy language, extend the generators so that they track how far the current node is from the root, by passing a depth budget k that decreases by one per level. Adapting the combinators, e.g.: type Gen = Int -> [DTree]; term s k = if k >= 0 then [Term s] else []; (f ⊕ g) k = f k ++ g k.

Defining rule. type Gen = Int -> [DTree]; rule :: Name -> [Gen] -> Gen. Showing only a simpler case: rule name [gen1, gen2] = newrule where newrule k = if k < 0 then [] else [ Node name [t, u] | t <- gen1 (k-1), u <- gen2 (k-1) ].
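Putting the depth-bounded combinators together gives a runnable sketch for the bracket grammar. The alternative operator is written `<+>` here because its glyph was lost in the transcript, and `rule` is generalized from two children to any number using `mapM` in the list monad (which enumerates all combinations, first child outermost); both are assumptions of this sketch. With `term` also consuming one level, as defined above, the three-sentence set appears at `sentences 3` and the 19-sentence list at `sentences 5`.

```haskell
data DTree = Term String | Node String [DTree]
  deriving (Show, Eq)

flatten :: DTree -> String
flatten (Term s)    = s
flatten (Node _ ts) = concatMap flatten ts

-- A generator receives the remaining depth budget k.
type Gen = Int -> [DTree]

-- A terminal leaf fits as long as the budget is not negative.
term :: String -> Gen
term v k = if k >= 0 then [Term v] else []

-- Alternatives: the trees of both generators.
infixr 5 <+>
(<+>) :: Gen -> Gen -> Gen
(f <+> g) k = f k ++ g k

-- A production rule: every child is generated with budget k - 1, and mapM in
-- the list monad forms all combinations of children.
rule :: String -> [Gen] -> Gen
rule name rhs k
  | k < 0     = []
  | otherwise = [Node name ts | ts <- mapM ($ (k - 1)) rhs]

-- S -> Brace | Curly | eps ; Brace -> "(" S ")" S ; Curly -> "{" S "}" S
s, brace, curly :: Gen
s     = rule "s0" [brace] <+> rule "s1" [curly] <+> rule "s2" [term ""]
brace = rule "brace" [term "(", s, term ")", s]
curly = rule "curly" [term "{", s, term "}", s]

sentences :: Int -> [String]
sentences = map flatten . s
```

Because `s k` returns the full trees, filtering on rule names (for example, keeping only trees that use `curly`) is an ordinary `filter` over that list.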

Example revisited. s = rule "s0" [brace] ⊕ rule "s1" [curly] ⊕ rule "s2" [term ""]; brace = rule "brace" [term "(", s, term ")", s]; curly = rule "curly" [term "{", s, term "}", s]. Then s 2 produces { "()", "{}", "" }, and s 4 already produces quite a lot: ["(())()", "(()){}", "(())", "({})()", "({}){}", "({})", "()()", "(){}", "()", "{()}()", "{()}{}", "{()}", "{{}}()", "{{}}{}", "{{}}", "{}()", "{}{}", "{}", ""]. (With term as defined above, which also consumes one level of the budget, these two sets appear one step later, at s 3 and s 5.) However, you actually have the full derivation trees, which you can exploit for filtering.

We can see... From a derivation tree we can see which terminals are produced and which rules were used, and we can infer which non-terminals were produced during the derivation. For example, the tree RSb[ RBrace[ "(", RSc[ RCurly[ "{", RSe[ε], "}", RSe[ε] ] ], ")", RSe[ε] ] ] uses the rules RSb, RBrace, RSc, RCurly and RSe, and produces the terminals (, ), { and }.

BNF coverage. (C5.29) Terminal symbol coverage: TR contains each terminal symbol of the given grammar G. (C5.30) Production coverage: TR contains each production rule of G. Production coverage subsumes terminal coverage, but both are usually too weak. Pair-wise production coverage: TR contains every feasible pair (R1, R2) of production rules, where feasible means that they can actually be applied in succession in a derivation from G. This can be generalized to k-wise coverage, but that may blow up the size of TR. Alternatively, if G is not too large, you can still manually add new requirements to your TR.
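Terminal and production coverage can be read straight off derivation trees. A sketch, using a `DTree` type with a `Node` constructor (an assumed name) and building the trees for () and {} by hand; `terminals`, `rulesUsed` and `covered` are hypothetical helpers. Together the two trees touch every rule and every terminal (including ε, written as the empty string).

```haskell
import Data.List (nub, sort)

data DTree = Term String | Node String [DTree]
  deriving (Show, Eq)

-- Terminals at the leaves; "" stands for epsilon.
terminals :: DTree -> [String]
terminals (Term s)    = [s]
terminals (Node _ ts) = concatMap terminals ts

-- Production rules applied anywhere in the tree.
rulesUsed :: DTree -> [String]
rulesUsed (Term _)    = []
rulesUsed (Node r ts) = r : concatMap rulesUsed ts

-- Distinct items a set of trees covers, for either criterion.
covered :: (DTree -> [String]) -> [DTree] -> [String]
covered f = sort . nub . concatMap f

eps :: DTree
eps = Node "RSe" [Term ""]

-- Trees for "()" and "{}".
parens, braces :: DTree
parens = Node "RSb" [Node "RBrace" [Term "(", eps, Term ")", eps]]
braces = Node "RSc" [Node "RCurly" [Term "{", eps, Term "}", eps]]

allRules, allTerminals :: [String]
allRules     = sort ["RSb", "RSc", "RSe", "RBrace", "RCurly"]
allTerminals = sort ["(", ")", "{", "}", ""]
```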

Pair-wise production coverage. A derivation tree t covers a pair of rules R1;R2 if the pair appears as two consecutive nodes in t, i.e. R2 is applied at a child of an R1 node. A set T of derivation trees gives full pair-wise production coverage if every feasible pair R1;R2 is covered by some t in T; analogously for k-wise coverage. For example, the tree RSb[ RBrace[ "(", RSc[ RCurly[ "{", RSe[ε], "}", RSe[ε] ] ], ")", RSe[ε] ] ] covers RSb;RBrace, RBrace;RSc, RBrace;RSe, RSc;RCurly and RCurly;RSe.
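Pair-wise coverage only looks at parent/child rule pairs, which is easy to compute from a tree. The sketch below (helper names are hypothetical) also shows that ({}) and (){} are indistinguishable under this criterion: their trees yield exactly the same set of pairs.

```haskell
import Data.List (nub, sort)

data DTree = Term String | Node String [DTree]
  deriving (Show, Eq)

-- Pairs (R1, R2) where an R2 node is a direct child of an R1 node.
rulePairs :: DTree -> [(String, String)]
rulePairs (Term _)    = []
rulePairs (Node r ts) = [(r, c) | Node c _ <- ts] ++ concatMap rulePairs ts

-- Distinct pairs covered by a set of trees.
pairCoverage :: [DTree] -> [(String, String)]
pairCoverage = sort . nub . concatMap rulePairs

-- Hypothetical helpers: the tree for eps, and for "( x ) y" and "{ x } y".
eps :: DTree
eps = Node "RSe" [Term ""]

brace, curly :: DTree -> DTree -> DTree
brace x y = Node "RSb" [Node "RBrace" [Term "(", x, Term ")", y]]
curly x y = Node "RSc" [Node "RCurly" [Term "{", x, Term "}", y]]

nested, sequenced :: DTree
nested    = brace (curly eps eps) eps   -- ({})
sequenced = brace eps (curly eps eps)   -- (){}
```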

Example. { "()", "{}" } gives full terminal as well as full production coverage. Brace-curly and curly-brace combinations can only be enforced by pair-wise coverage. But none of these coverage criteria can distinguish between e.g. ({}) and (){}: both cover the same rule pairs.

Rule-rule coverage. Let alts(N) be the set of production rules of the non-terminal N, and alts(R, i) the set of production rules of the i-th symbol of rule R; this equals alts(N) if N is the non-terminal at position i. A derivation tree t covers R ;i R' if R' appears as the i-th child of some R node in t. Each Rule-Rule Coverage (ERRC): for every rule R and every applicable i, TR includes R ;i R' for every R' ∈ alts(R, i). For example, TR includes RSb ;0 RBrace. Grammar: S → Brace | Curly | ε (RSb, RSc, RSe); Brace → “(” S “)” S (RBrace); Curly → “{” S “}” S (RCurly).

ERRC example. Importantly, ERRC also requires these to be in TR: RBrace ;1 RSb, RBrace ;1 RSc, RBrace ;1 RSe, and RBrace ;3 RSb, RBrace ;3 RSc, RBrace ;3 RSe; similarly for Curly. Just ({}) covers RBrace ;1 RSc, but not RBrace ;3 RSc; similarly, (){} covers RBrace ;3 RSc, but not RBrace ;1 RSc. Example of a test set giving full ERRC coverage: (), ({}), (){}, (()), ()(), {}, {()}, {}(), {{}}, {}{}.
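To check an ERRC claim mechanically, the sketch below parses sentences of the bracket grammar into derivation trees with a small recursive-descent parser (not part of the original material) and collects the triples R ;i R' with 0-based child positions, so the S children of RBrace and RCurly sit at positions 1 and 3. `errcTR` spells out the requirements for this grammar by hand; all helper names are hypothetical.

```haskell
import Data.List (nub, sort)

data DTree = Term String | Node String [DTree]
  deriving (Show, Eq)

-- Recursive-descent parser for S -> Brace | Curly | eps, producing the
-- derivation tree (rule names as in the text) and the unconsumed input.
parseS :: String -> Maybe (DTree, String)
parseS ('(' : r) = block "RSb" "RBrace" '(' ')' r
parseS ('{' : r) = block "RSc" "RCurly" '{' '}' r
parseS r         = Just (Node "RSe" [Term ""], r)

-- After the opening bracket: S, the closing bracket, then S again.
block :: String -> String -> Char -> Char -> String -> Maybe (DTree, String)
block sRule rRule open close r = do
  (t1, r1) <- parseS r
  r2 <- case r1 of
          (c : rest) | c == close -> Just rest
          _                       -> Nothing
  (t2, r3) <- parseS r2
  Just (Node sRule [Node rRule [Term [open], t1, Term [close], t2]], r3)

-- Parse the whole input or fail.
parse :: String -> Maybe DTree
parse str = case parseS str of
  Just (t, "") -> Just t
  _            -> Nothing

-- Triples (R, i, R'): rule R' is applied at child position i of an R node.
errcTriples :: DTree -> [(String, Int, String)]
errcTriples (Term _)    = []
errcTriples (Node r ts) =
  [(r, i, c) | (i, Node c _) <- zip [0 ..] ts] ++ concatMap errcTriples ts

-- ERRC requirements for this grammar, written out by hand.
errcTR :: [(String, Int, String)]
errcTR = sort $
  [(r, i, c) | r <- ["RBrace", "RCurly"], i <- [1, 3], c <- ["RSb", "RSc", "RSe"]]
    ++ [("RSb", 0, "RBrace"), ("RSc", 0, "RCurly")]

-- Covered triples of a test set (Nothing if some sentence does not parse).
errcCovered :: [String] -> Maybe [(String, Int, String)]
errcCovered tests = sort . nub . concatMap errcTriples <$> mapM parse tests
```

The ten-sentence test set above covers exactly `errcTR`, while ({}) alone misses RBrace ;3 RSc.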

ERRC example (continued). ERRC does not force you to cover all “combinations” of rules. That is what ARRC, described next, requires, but ARRC may produce a very large TR.

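All-combinations. Let R be a rule producing k non-terminals, at positions i1, ..., ik. A combination of R is a vector $c = (R\,;_{i_1}\,R'_1,\ \ldots,\ R\,;_{i_k}\,R'_k)$. A derivation tree t covers such a combination c if R'1, ..., R'k appear as sibling labels under some R node in t, at the corresponding positions. All Rule-Rule Coverage (ARRC): for every rule R, TR includes every combination of R. For example, the tree RSb[ RBrace[ "(", RSe[ε], ")", RSe[ε] ] ] covers the combination (RBrace ;1 RSe, RBrace ;3 RSe).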
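The same kind of parser can measure ARRC. In this sketch a combination is recorded as the rule at a node together with the list of rules at its non-terminal children; nodes with no non-terminal children (RSe) are skipped, which is an interpretive choice. For the bracket grammar ARRC asks for 20 combinations (one each for RSb and RSc, nine each for RBrace and RCurly), while the ERRC test set given above covers only 12 of them. The parser and all helper names are hypothetical.

```haskell
import Data.List (nub, sort)

data DTree = Term String | Node String [DTree]
  deriving (Show, Eq)

-- Recursive-descent parser for S -> Brace | Curly | eps (rule names as in the text).
parseS :: String -> Maybe (DTree, String)
parseS ('(' : r) = block "RSb" "RBrace" '(' ')' r
parseS ('{' : r) = block "RSc" "RCurly" '{' '}' r
parseS r         = Just (Node "RSe" [Term ""], r)

block :: String -> String -> Char -> Char -> String -> Maybe (DTree, String)
block sRule rRule open close r = do
  (t1, r1) <- parseS r
  r2 <- case r1 of
          (c : rest) | c == close -> Just rest
          _                       -> Nothing
  (t2, r3) <- parseS r2
  Just (Node sRule [Node rRule [Term [open], t1, Term [close], t2]], r3)

parse :: String -> Maybe DTree
parse str = case parseS str of
  Just (t, "") -> Just t
  _            -> Nothing

-- A combination: a rule plus the rules of its non-terminal children, in order.
combos :: DTree -> [(String, [String])]
combos (Term _)    = []
combos (Node r ts) =
  [(r, kids) | let kids = [c | Node c _ <- ts], not (null kids)]
    ++ concatMap combos ts

sRules :: [String]
sRules = ["RSb", "RSc", "RSe"]

-- ARRC requirements for this grammar.
arrcTR :: [(String, [String])]
arrcTR = sort $
  [("RSb", ["RBrace"]), ("RSc", ["RCurly"])]
    ++ [(r, [x, y]) | r <- ["RBrace", "RCurly"], x <- sRules, y <- sRules]

arrcCovered :: [String] -> Maybe [(String, [String])]
arrcCovered tests = sort . nub . concatMap combos <$> mapM parse tests
```

Adding a sentence such as (())() supplies the missing RBrace combination (RSb, RSb).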

Subsumption. (Figure: subsumption diagram relating ARRC, ERRC, 3-wise production coverage, pair-wise production coverage, production coverage and terminal coverage.)

Regular expression. Examples: (aa | bb)* and ( “(” “)” | “{” “}” )*. Regular expressions are easy to write, but not as expressive as BNF. Syntax: rexp ::= terminal | rexp “|” rexp | rexp rexp | rexp* | rexp+ | ( rexp )

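The sentences of a rexp. L(e) denotes the set of sentences described by the rexp e, defined as follows:

$$
\begin{aligned}
L(e^*) &= \{\varepsilon\} \cup L(e^+)\\
L(e^+) &= L(e\,e^*)\\
L(e \mid f) &= L(e) \cup L(f)\\
L(d\,e) &= \{\, s \mathbin{+\!\!+} t \mid s \in L(d),\ t \in L(e) \,\}
\end{aligned}
$$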

Regular expression. A regular expression can equivalently be described by a BNF grammar, but this is beyond our scope; check the course Languages and Compilers. In practice people use e.g. the POSIX extensions; for example, to describe NL post codes: [1-9][[:digit:]][[:digit:]][[:digit:]][[:blank:]]*[[:alpha:]][[:alpha:]]

Generating sentences. This is discussed in AO; we will generalize it. Let's represent a regular expression with values of this type: data Rexp = Term String | Seq Rexp Rexp | Alt Rexp Rexp | Star Rexp. Here ε is just Term "". AO also has $e^{m-n}$, for iterating e at least m times and at most n times, but this can be expressed with Seq and Alt.
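A sketch of the claim that $e^{m-n}$ needs no extra constructor: `times` and `between` are hypothetical helper names that build the repetition out of `Seq`, `Alt` and ε. The small `finite` enumerator only handles Star-free expressions, which is all it needs to confirm the encoding.

```haskell
data Rexp = Term String | Seq Rexp Rexp | Alt Rexp Rexp | Star Rexp
  deriving (Show, Eq)

-- epsilon is just the empty terminal
eps :: Rexp
eps = Term ""

-- e repeated exactly k times, as a chain of Seq ending in epsilon
times :: Int -> Rexp -> Rexp
times k e = foldr Seq eps (replicate k e)

-- e^{m-n}: at least m and at most n repetitions (requires m <= n)
between :: Int -> Int -> Rexp -> Rexp
between m n e = foldr1 Alt [times k e | k <- [m .. n]]

-- Sentences of a Star-free expression (enough to check the encoding).
finite :: Rexp -> [String]
finite (Term s)  = [s]
finite (Alt d e) = finite d ++ finite e
finite (Seq d e) = [s ++ t | s <- finite d, t <- finite e]
finite (Star _)  = error "finite: Star has infinitely many sentences"
```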

Generating sentences. The following generates all derivable strings, but of course it may not terminate; make it finite, e.g. by expanding Star only a finite number of times. gen :: Rexp -> [String]; gen (Term s) = [s]; gen (Alt d e) = gen d ++ gen e; gen (Seq d e) = [ s ++ t | s <- gen d, t <- gen e ]; gen (Star e) = [""] ++ gen (Seq e (Star e))
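One way to make `gen` finite, as suggested: give it a bound k and unroll each `Star` at most k times. This is a sketch, not the AO algorithm; the bound parameter is an assumption, and duplicates are not removed (they can occur e.g. for (a | ε)*).

```haskell
data Rexp = Term String | Seq Rexp Rexp | Alt Rexp Rexp | Star Rexp

-- gen with a bound: every Star is unrolled at most k times, so the result is
-- finite. The Star case follows L(e*) = {eps} ∪ L(e e*).
gen :: Int -> Rexp -> [String]
gen _ (Term s)  = [s]
gen k (Alt d e) = gen k d ++ gen k e
gen k (Seq d e) = [s ++ t | s <- gen k d, t <- gen k e]
gen k (Star e)  = unroll k
  where
    unroll j
      | j <= 0    = [""]
      | otherwise = "" : [s ++ t | s <- gen k e, t <- unroll (j - 1)]
```

For example, `gen 1 (Seq (Star (Term "a")) (Term "b"))` yields `["b", "ab"]`.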

Another application: representing your control flow graph. The language described by a regular expression can equivalently be described by a finite state automaton. Example: (Figure: an automaton for a*b(c|d)ef, with arrows labelled a, b, c, d and ef from the initial state to the final state.) Such an automaton can be “executed” by following the arrows from the initial state to a final state. This produces the corresponding sequence of labels, which is then the sentence generated by that execution.

Representing a control flow graph (CFG) with a rexp. More on the equivalence between regular expressions and state automata is discussed in the course Languages & Compilers. Notice that a state automaton is a graph! So it can be seen as describing a control flow graph, and it follows that we can represent a CFG with a regular expression. To distinguish the arrows in the CFG, we first assign a unique label to each of them.

Simple example. (Figure: a CFG from the entry node to the exit node with edges a, b, c, d1, d2 and ef, described by a*b(c | d1 d2)ef.) Not so complicated, but things can get a bit confusing when you have nested loops. The following slides describe a conversion algorithm; this is from AO 2.7.1.

You can merge sequential edges (© Ammann & Offutt, Introduction to Software Testing, Ch 2). Assume a single end node; otherwise add a virtual end node. Combine (multiply) sequential edges. Example: combine edges h and i into the single edge hi. (Figure: example graph with nodes 0 to 6 and edges a to i, before and after merging h and i, which removes node 5.)

You can merge parallel edges. Combine parallel edges (edges with the same source and target). Example: combine edges b and c into the single edge b + c. (Figure: the example graph before and after the merge.)

You can remove self-loops. Combine all self-loops (loops from a node to itself): add a new “dummy” node with an incoming edge carrying the loop label as an exponent (e*), then merge the resulting sequential edges by multiplication. (Figure: the self-loop e at node 3 becomes a dummy node 3′ with incoming edge e*; merging d, e* and f yields the single edge de*f from node 2 to node 4.)

You can remove a “middle” node. A middle node is neither an initial nor a final node. Replace the middle node by inserting edges from all its predecessors to all its successors, multiplying the path expression of every incoming edge with that of every outgoing edge. The middle node should not have a self-loop (remove it first). (Figure: node 3, with incoming edges A from node 1 and B from node 2 and outgoing edges C to node 4 and D to node 5, is replaced by the edges AC, AD, BC and BD.)

Example of removing a middle node. Remove node 2: edges (1, 2) and (2, 4) become one edge, and edges (4, 2) and (2, 4) become a self-loop. (Figure: before, the edges b + c from 1 to 2, de*f from 2 to 4 and g from 4 to 2; after, the edge bde*f + cde*f from 1 to 4 and the self-loop gde*f at node 4.)

Keep doing this until only one edge is left. Removing node 1 gives the edge abde*f + acde*f from 0 to 4; the self-loop gde*f at node 4 is then replaced via a dummy node 4′, and removing the remaining middle node leaves a single edge from 0 to 6: abde*f (gde*f)* hi + acde*f (gde*f)* hi.
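The reduction steps above (merge sequential and parallel edges, fold self-loops, remove middle nodes) can be sketched as node elimination over a list of labelled edges. `Edge`, `eliminate` and `toRexp` are hypothetical names; parallel edges are merged with `Alt` when a self-loop is folded and once more at the end. The graph here is the simple a*b(c | d1 d2)ef example rather than the AO graph, with a virtual entry node added so that the self-loop a sits on a middle node. The elimination order is arbitrary; a different order gives a syntactically different but equivalent expression. `cnt` is the path counter from the text, and `gen` is a bounded variant that unrolls each Star at most k times.

```haskell
import Data.List (nub, sort)

data Rexp = Term String | Seq Rexp Rexp | Alt Rexp Rexp | Star Rexp
  deriving (Show, Eq)

-- An edge (source, target, path expression).
type Edge = (Int, Int, Rexp)

-- Remove middle node v: its self-loops become one starred expression, and
-- every predecessor is connected to every successor through v.
eliminate :: Int -> [Edge] -> [Edge]
eliminate v es = rest ++ [(a, b, through r1 r2) | (a, r1) <- ins, (b, r2) <- outs]
  where
    loops = [r | (a, b, r) <- es, a == v, b == v]
    ins   = [(a, r) | (a, b, r) <- es, b == v, a /= v]
    outs  = [(b, r) | (a, b, r) <- es, a == v, b /= v]
    rest  = [e | e@(a, b, _) <- es, a /= v, b /= v]
    through r1 r2
      | null loops = Seq r1 r2
      | otherwise  = Seq r1 (Seq (Star (foldr1 Alt loops)) r2)

-- Eliminate the middle nodes, then merge the parallel entry-to-exit edges.
-- Assumes: no edges into the entry, none out of the exit, and at least one
-- entry-to-exit path.
toRexp :: Int -> Int -> [Int] -> [Edge] -> Rexp
toRexp entry exit middles es =
  foldr1 Alt [r | (a, b, r) <- foldr eliminate es middles, a == entry, b == exit]

-- Path counting, treating e* as (eps | e).
cnt :: Rexp -> Int
cnt (Term _)  = 1
cnt (Alt d e) = cnt d + cnt e
cnt (Seq d e) = cnt d * cnt e
cnt (Star e)  = 1 + cnt e

-- Sentences, unrolling every Star at most k times.
gen :: Int -> Rexp -> [String]
gen _ (Term s)  = [s]
gen k (Alt d e) = gen k d ++ gen k e
gen k (Seq d e) = [s ++ t | s <- gen k d, t <- gen k e]
gen k (Star e)  = unroll k
  where
    unroll j
      | j <= 0    = [""]
      | otherwise = "" : [s ++ t | s <- gen k e, t <- unroll (j - 1)]

-- a*b(c | d1 d2)ef, with virtual entry node 0 (an epsilon edge).
cfg :: [Edge]
cfg =
  [ (0, 1, Term ""), (1, 1, Term "a"), (1, 2, Term "b")
  , (2, 4, Term "c"), (2, 3, Term "d1"), (3, 4, Term "d2"), (4, 5, Term "ef") ]

cfgRexp :: Rexp
cfgRexp = toRexp 0 5 [1 .. 4] cfg
```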

Applications. The obvious one: we can use gen on the expression to get the set of “all” possible test paths through the CFG, but this is perhaps not very useful, because what we ultimately want are test cases. But you can calculate some other useful information.

Calculating the number of paths in the CFG. Iterating more than once is treated as equivalent to iterating just once. AO: you can do this by “transforming” your expression: a*b(c | d1 d2)ef becomes (ε | a)b(c | d1 d2)ef, and then, replacing every label by 1, every | by + and concatenation by multiplication, $(1+1)\cdot 1\cdot(1+1\cdot 1)\cdot 1 = 4$.

But we can of course also do it like this... Notice the similar structure as in gen; we just use different operators. cnt :: Rexp -> Int; cnt (Term s) = 1; cnt (Alt d e) = cnt d + cnt e; cnt (Seq d e) = cnt d * cnt e; cnt (Star e) = 1 + cnt e
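As a check of the path count, a runnable version of `cnt` applied to a*b(c | d1 d2)ef; the encoding of the expression as a `Rexp` value (named `example` here) is the only addition.

```haskell
data Rexp = Term String | Seq Rexp Rexp | Alt Rexp Rexp | Star Rexp

cnt :: Rexp -> Int
cnt (Term _)  = 1
cnt (Alt d e) = cnt d + cnt e
cnt (Seq d e) = cnt d * cnt e
cnt (Star e)  = 1 + cnt e   -- e* counted as (eps | e)

-- a*b(c | d1 d2)ef
example :: Rexp
example =
  Seq (Star (Term "a"))
      (Seq (Term "b")
           (Seq (Alt (Term "c") (Seq (Term "d1") (Term "d2")))
                (Term "ef")))
```

This evaluates to 4, matching the hand calculation.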

Other applications:

1. Calculating the longest path through the CFG/regular expression.
2. Calculating the minimum number of paths that would cover all branches (assuming loops always have an exit edge).
3. Calculating a minimal set of test paths that would satisfy (2).
4. ...

We can do them too, by folding... maxLength :: Rexp -> Int; maxLength (Term s) = length s; maxLength (Alt d e) = maxLength d `max` maxLength e; maxLength (Seq d e) = maxLength d + maxLength e; maxLength (Star e) = maxLength e. minCnt :: Rexp -> Int; minCnt (Term s) = 1; minCnt (Alt d e) = minCnt d + minCnt e; minCnt (Seq d e) = minCnt d `max` minCnt e; minCnt (Star e) = 1 + minCnt e. minPaths :: Rexp -> [String] ... do this yourself.
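Runnable versions of `maxLength` and `minCnt` as given above. Note that `maxLength` counts characters of labels, so the example uses single-letter labels, a*b(c | de)f, to make the result equal to the number of edges; that relabelling is an assumption of the example.

```haskell
data Rexp = Term String | Seq Rexp Rexp | Alt Rexp Rexp | Star Rexp

-- Longest sentence in label characters; a Star counts as one iteration.
maxLength :: Rexp -> Int
maxLength (Term s)  = length s
maxLength (Alt d e) = maxLength d `max` maxLength e
maxLength (Seq d e) = maxLength d + maxLength e
maxLength (Star e)  = maxLength e

-- Minimum number of paths covering all branches, as defined in the text.
minCnt :: Rexp -> Int
minCnt (Term _)  = 1
minCnt (Alt d e) = minCnt d + minCnt e
minCnt (Seq d e) = minCnt d `max` minCnt e
minCnt (Star e)  = 1 + minCnt e

-- a*b(c | de)f with single-letter labels
example :: Rexp
example =
  Seq (Star (Term "a"))
      (Seq (Term "b")
           (Seq (Alt (Term "c") (Seq (Term "d") (Term "e")))
                (Term "f")))
```

Here `maxLength example` is 5 (the path abdef) and `minCnt example` is 2 (e.g. abcf and bdef).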

Complementary operation analysis. Examples of “complementary” operations (here called C/create and D/destruct): push and pop; fileOpen and fileClose; getLock and releaseLock. An execution path that contains more destructs than creates is suspicious. It is not necessarily an error, because in the actual run some of the destructs may simply have no effect. In fact, #C ≥ #D should hold along every prefix of an execution path.

Complementary operation analysis. Given an execution t, let Δ(t) = (number of C's in t) − (number of D's in t). Δ(t) has to be ≥ 0, otherwise the execution is unsafe; in fact, we should check that Δ(s) ≥ 0 for all prefixes s of t. But can we check this for all executions of the CFG? (Figure: a CFG with edges labelled C and D, described by (CD*D | CD)D.)
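For a single execution the prefix condition is just a running sum. A sketch with executions written as strings over 'C', 'D' and other (ignored) labels; `delta` and `safeRun` are hypothetical names. CDD, one of the sentences of (CD*D | CD)D, already fails the check, and CDDC shows why whole-path Δ is not enough.

```haskell
-- Delta(t): number of creates minus number of destructs.
delta :: String -> Int
delta t = length (filter (== 'C') t) - length (filter (== 'D') t)

-- Safe when no prefix has more D's than C's. scanl produces the running
-- balance after every prefix, including the empty one.
safeRun :: String -> Bool
safeRun = all (>= 0) . scanl step 0
  where
    step :: Int -> Char -> Int
    step n 'C' = n + 1
    step n 'D' = n - 1
    step n _   = n
```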

Complementary operation analysis. Consider the regular expression that equivalently describes the graph. We will write a function Δ :: Rexp -> [Formula] that generates formulas describing all possible Δ's of all sentences of the regular expression. Checking safety is then “simple”: safe :: Rexp -> Bool; safe e = and [ (f ≥ 0) is valid | f <- Δ e ].

The algorithm. Δ :: Rexp -> [Formula]; Δ C = [ "1" ]; Δ D = [ "-1" ]; Δ ε = [ "0" ]; Δ (Alt d e) = Δ d ++ Δ e; Δ (Seq d e) = [ k + m | k <- Δ d, m <- Δ e ]; Δ (Star e) = [ k * n | k <- Δ e ], where n is a fresh name. To also generate constraints over prefixes, extend Δ to produce two components, Δ1 for the prefixes and Δ2 for the whole sentences, e.g.: Δ (Seq d e) = ( Δ1 d ++ [ k + m | k <- Δ2 d, m <- Δ1 e ], [ k + m | k <- Δ2 d, m <- Δ2 e ] ).
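A runnable sketch of Δ for whole sentences (the prefix extension is left out). Formulas are a small symbolic type, and each `Star` gets a fresh variable from a counter. Deciding validity in general needs a solver, so `validUpTo` only tries every assignment of the variables up to a bound: passing it is evidence, not a proof. All names (`deltas`, `Formula`, `validUpTo`, `safeUpTo`) are hypothetical.

```haskell
import Data.List (nub)

data Rexp = Term String | Seq Rexp Rexp | Alt Rexp Rexp | Star Rexp

-- Symbolic formulas; a Var is an iteration count (a natural number).
data Formula = Lit Int | Var String | Add Formula Formula | Mul Formula Formula
  deriving Show

-- Delta following the equations above; the Int makes Star variables fresh.
deltas :: Rexp -> Int -> ([Formula], Int)
deltas (Term "C") n = ([Lit 1], n)
deltas (Term "D") n = ([Lit (-1)], n)
deltas (Term _)   n = ([Lit 0], n)          -- epsilon and other labels
deltas (Alt d e)  n = (fs ++ gs, n2)
  where (fs, n1) = deltas d n
        (gs, n2) = deltas e n1
deltas (Seq d e)  n = ([Add f g | f <- fs, g <- gs], n2)
  where (fs, n1) = deltas d n
        (gs, n2) = deltas e n1
deltas (Star e)   n = ([Mul f (Var v) | f <- fs], n1)
  where v        = "n" ++ show n
        (fs, n1) = deltas e (n + 1)

vars :: Formula -> [String]
vars (Lit _)   = []
vars (Var v)   = [v]
vars (Add a b) = vars a ++ vars b
vars (Mul a b) = vars a ++ vars b

eval :: [(String, Int)] -> Formula -> Int
eval _   (Lit k)   = k
eval env (Var v)   = maybe 0 id (lookup v env)
eval env (Add a b) = eval env a + eval env b
eval env (Mul a b) = eval env a * eval env b

-- Bounded stand-in for "f >= 0 is valid".
validUpTo :: Int -> Formula -> Bool
validUpTo bound f =
  and [eval (zip vs vals) f >= 0 | vals <- mapM (const [0 .. bound]) vs]
  where vs = nub (vars f)

safeUpTo :: Int -> Rexp -> Bool
safeUpTo bound e = all (validUpTo bound) (fst (deltas e 0))
```

For (CD*D | CD)D the CD branch alone yields the constant formula -1, so the expression is reported unsafe.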

