Three New Algorithms for Regular Language Enumeration. Erkki Makinen, University of Tampere, Tampere, Finland. Margareta Ackerman, University of Waterloo, Waterloo, ON

What kind of words does this NFA accept? [Example NFA with states A, B, C, D, E]

Cross-section problem: enumerate all words of length n accepted by the NFA, in lexicographic order. [Example NFA with ε-transitions and states A, B, C, D, E]

Enumeration problem: enumerate the first m words accepted by the NFA, in length-lexicographic order. [Same example NFA]

Min-word problem: find the first word of length n accepted by the NFA. [Same example NFA]

Applications. Correctness testing: provides evidence that an NFA generates the expected language. An enumeration algorithm can be used to verify whether two NFAs accept the same language (Conway, 1971). A cross-section algorithm can be used to determine whether every word accepted by a given NFA is a power, that is, a string of the form w^n for some n > 1, |w| > 0 (Anderson, Rampersad, Santean, and Shallit, 2007). A cross-section algorithm can be used to solve the "k-subset of an n-set" problem: enumerate all k-subsets of an n-set in alphabetical order (Ackerman & Shallit, 2007).

Objectives. Find algorithms for the three problems that are asymptotically efficient in: the size of the NFA (s states and d transitions), the output size (t), and the length of the words in the cross-section (n); and that are efficient in practice.

Previous Work. A cross-section algorithm in which finding each consecutive word is super-exponential in the size of the cross-section (Domosi, 1998). A cross-section algorithm that is exponential in n (the length of words in the cross-section) is found in the Grail computation package: a breadth-first-search approach that traces all paths of length n in the NFA, storing the paths that end at a final state; it runs in O(dσ^(n+1)), where d is the number of transitions in the NFA and σ is the alphabet size.
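
To make the exponential path-tracing approach concrete, here is a minimal sketch of that idea, assuming an ε-free NFA represented as a dict mapping each state to a list of (symbol, next-state) pairs; the representation and all names are assumptions for illustration, not the Grail implementation.

```python
def naive_cross_section(transitions, start, finals, n):
    """Trace every path of length n from the start state and keep the words that
    end in a final state -- the exponential approach described above.
    Assumes an epsilon-free NFA: transitions[state] = [(symbol, next_state), ...]."""
    words = set()
    stack = [(start, "")]
    while stack:
        state, word = stack.pop()
        if len(word) == n:
            if state in finals:
                words.add(word)
            continue
        for symbol, nxt in transitions.get(state, []):
            stack.append((nxt, word + symbol))
    return sorted(words)

# Tiny usage example: an NFA over {0, 1} accepting words that end in "1".
nfa = {"A": [("0", "A"), ("1", "A"), ("1", "B")], "B": []}
print(naive_cross_section(nfa, "A", {"B"}, 3))   # ['001', '011', '101', '111']
```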

Previous Polynomial Algorithms: Makinen, 1997. Dynamic programming solution: min-word O(dn^2), cross-section O(dn^2 + dt), enumeration O(d(e + t)). e: the number of empty cross-sections encountered; d: the number of transitions in the NFA; n: the length of words in the cross-section; t: the number of characters in the output. Quadratic in n.

Previous Polynomial Algorithms: Ackerman and Shallit, 2007. Linear in the length of words in the cross-section: min-word O(s^2 n), cross-section O(s^2 n + dt), enumeration O(s^2 c + dt). c: the number of cross-sections encountered; d: the number of transitions in the NFA; n: the length of words in the cross-section; t: the number of characters in the output. Linear in n.

Previous Polynomial Algorithms: Ackerman and Shallit, 2007. The algorithm uses "smart breadth-first search," following only those paths that lead to a final state. Main idea: compute a look-ahead matrix, used to determine whether there is a path of length i from a given state to a final state. In practice, Makinen's algorithm (slightly modified) is usually more efficient, except in some boundary cases.

Contributions. We present three algorithms for each of the enumeration problems, including: an O(dn) algorithm for min-word, an O(dn + dt) algorithm for cross-section, and algorithms with improved practical performance for each of the enumeration problems.

Contributions: Detailed. We present three sets of algorithms. 1. AMSorted: an efficient min-word algorithm, based on Makinen's original algorithm; cross-section and enumeration algorithms based on this min-word algorithm. 2. AMBoolean: a more efficient min-word algorithm, based on minWordAMSorted; cross-section and enumeration algorithms based on this min-word algorithm. 3. Intersection-based: an elegant min-word algorithm; a cross-section algorithm based on this min-word algorithm.

Key ideas behind our first two algorithms. Makinen's algorithm uses simple dynamic programming, which is efficient in practice on most NFAs. The algorithm by Ackerman & Shallit uses "smart breadth-first search," following only those paths that lead to a final state. We build on these ideas to yield algorithms that are more efficient both asymptotically and in practice.

Makinen's original min-word algorithm. S[i] stores a representation of the minimal word w of length i that appears on a path from state S to a final state. [Example table for a 3-state NFA with states A, B, C and columns i = 1, 2, 3: each S[1] entry is a single character (or undefined), and each entry for i > 1 is a pair (character, next state).]

Makinen's original min-word algorithm. The minimal word of length n can be found by tracing back from the start state's entry in the last column. [Same example table.]

Makinen's original min-word algorithm. Initialize the first column. For columns i = 2...n: for each state S, find S[i] by comparing all words of length i appearing on paths from S to a final state. [Same example table.]

Makinen's original min-word algorithm. Initialize the first column. For columns i = 2...n: for each state S, find S[i] by comparing all words of length i appearing on paths from S to a final state (each comparison takes up to i operations). [Same example table.]

Makinen's original min-word algorithm. Initialize the first column. For columns i = 2...n: for each state S, find S[i] by comparing all words of length i appearing on paths from S to a final state (each comparison takes up to i operations). Theorem: Makinen's original min-word algorithm is O(dn^2).
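
The following is a minimal sketch of the dynamic program behind this algorithm. For readability it stores whole words rather than (character, next-state) pairs as on the slides; the NFA representation and names are assumptions for illustration.

```python
def min_word_makinen(transitions, start, finals, n):
    """Dynamic-programming sketch of Makinen-style min-word.
    Assumes an epsilon-free NFA: transitions[state] = [(symbol, next_state), ...].
    table[i][S] is the minimal word of length i on a path from S to a final state;
    column i manipulates words of length i, which is why this version is O(d n^2)."""
    states = set(transitions) | {start} | set(finals)
    for outs in transitions.values():
        states.update(T for _, T in outs)
    table = [{S: "" for S in states if S in finals}]     # column 0: the empty word
    for i in range(1, n + 1):
        col = {}
        for S in states:
            best = None
            for symbol, nxt in transitions.get(S, []):
                rest = table[i - 1].get(nxt)
                if rest is None:
                    continue                              # no word of length i-1 from nxt
                candidate = symbol + rest
                if best is None or candidate < best:
                    best = candidate
            if best is not None:
                col[S] = best
        table.append(col)
    return table[n].get(start)        # minimal accepted word of length n, or None
```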

New min-word algorithm: minWordAMSorted. Idea: sort every column by the words that the entries represent. [Same example table.]

New min-word algorithm: minWordAMSorted. We define an order on {S[i] : S a state in N}. If A[1] = a and B[1] = b, where a < b, then A[1] < B[1]. For i > 1, let A[i] = (a, A') and B[i] = (b, B'): if a < b, then A[i] < B[i]; if a = b and A'[i-1] < B'[i-1], then A[i] < B[i]. If A[i] is defined and B[i] is undefined, then A[i] > B[i].
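
The point of this order is that once column i-1 is sorted, two defined entries of column i can be compared in constant time, using the rank of the suffix entry instead of the suffix word itself. A minimal sketch of such a comparison key, with hypothetical names:

```python
def column_key(entry, prev_rank):
    """Key for a defined column-i entry (symbol, next_state): once column i-1 is
    sorted, prev_rank[next_state] stands in for the whole suffix word, so two
    entries compare in O(1) by (symbol, rank of suffix).  Undefined entries are
    handled separately, following the convention on the slide."""
    symbol, next_state = entry
    return (symbol, prev_rank[next_state])

# e.g. sorted(defined_entries, key=lambda e: column_key(e, prev_rank)) sorts a
# column in O(s log s) without ever materializing the words the entries represent.
```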

New min-word algorithm: minWordAMSorted. Initialize the first column. For columns i = 2...n: for each state S, find S[i] using only column i-1 and the edges leaving S; then sort column i. [Same example table.]

New min-word algorithm: minWordAMSorted. Initialize the first column. For columns i = 2...n: for each state S, find S[i] using only column i-1 and the edges leaving S (d operations per column in total); then sort column i (s log s operations). Theorem: The algorithm minWordAMSorted is O((s log s + d)n).
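
A minimal sketch of a minWordAMSorted-style algorithm under the same assumed NFA representation as the earlier sketches. It keeps, per column, a (symbol, next-state) entry for each state and re-ranks the column after computing it; ties between equal minimal words are broken arbitrarily, which does not affect the result. Names and details are illustrative, not the paper's exact pseudocode.

```python
def min_word_sorted(transitions, start, finals, n):
    """minWordAMSorted-style sketch: columns of (symbol, next_state) entries,
    kept sorted so entries of the next column compare in O(1) via ranks.
    Assumes an epsilon-free NFA: transitions[state] = [(symbol, next_state), ...]."""
    if n == 0:
        return "" if start in finals else None
    states = set(transitions) | {start} | set(finals)
    for outs in transitions.values():
        states.update(T for _, T in outs)
    INF = float("inf")
    # Column 0: every final state carries the empty word (all tied, so rank 0).
    rank = {S: (0 if S in finals else INF) for S in states}
    columns = [None]                       # column 0 is never used when tracing back
    for i in range(1, n + 1):
        col = {}
        for S in states:
            best = None
            for a, T in transitions.get(S, []):
                if rank[T] == INF:         # no word of length i-1 starts at T
                    continue
                key = (a, rank[T])
                if best is None or key < best[0]:
                    best = (key, (a, T))
            col[S] = best
        columns.append(col)
        # Sort the defined entries of column i; undefined entries keep rank INF.
        order = sorted((S for S in states if col[S] is not None),
                       key=lambda S: col[S][0])
        rank = {S: INF for S in states}
        for r, S in enumerate(order):
            rank[S] = r
    if columns[n][start] is None:
        return None                        # no accepted word of length n
    word, S = "", start
    for i in range(n, 0, -1):              # trace back through the columns
        _, (a, T) = columns[i][S]
        word += a
        S = T
    return word
```

Per column, computing the entries touches each transition once (d operations) and the sort costs s log s, matching the theorem above.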

New cross-section algorithm: crossSectionAMSorted. A state S is i-complete if there exists a path of length i from state S to a final state. To enumerate all words of length n: 1. Call minWordAMSorted to create the table (O((s log s + d)n)). 2. Perform a "smart BFS" (O(dt)): begin at the start state and follow only those paths of length n that end at a final state, using the table to identify i-complete states. Theorem: The algorithm crossSectionAMSorted is O(n(s log s + d) + dt).
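
Below is a simplified sketch of the "smart BFS" step, parameterized by any i-completeness test (for example, a lookup into the table built by the min-word algorithm). It builds whole cross-sections breadth-first rather than producing one word at a time; names and representation are assumptions.

```python
def cross_section_smart_bfs(transitions, start, finals, n, is_complete):
    """'Smart BFS' sketch for the cross-section problem: extend word prefixes one
    symbol at a time, but only through states that can still be completed, so no
    partial word is ever abandoned.  is_complete(state, i) should answer whether
    some path of length i from state reaches a final state.
    Assumes an epsilon-free NFA: transitions[state] = [(symbol, next_state), ...]."""
    if n == 0:
        return [""] if start in finals else []
    if not is_complete(start, n):
        return []
    frontier = {"": {start}}               # each viable prefix -> states it reaches
    for remaining in range(n, 0, -1):
        nxt = {}
        for word, reached in frontier.items():
            for state in reached:
                for symbol, target in transitions.get(state, []):
                    ok = target in finals if remaining == 1 else is_complete(target, remaining - 1)
                    if ok:
                        nxt.setdefault(word + symbol, set()).add(target)
        frontier = nxt
    return sorted(frontier)                 # the cross-section, in lexicographic order
```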

New enumeration algorithm: enumAMSorted. Run the cross-section algorithm until the required number of words has been listed, reusing the table. Theorem: The algorithm enumAMSorted is O(c(s log s + d) + dt). c: the number of cross-sections encountered; d: the number of transitions in the NFA; t: the number of characters in the output.
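
A small sketch of this enumeration loop. The slide only says to run the cross-section algorithm repeatedly; the stopping rule below (stop after s consecutive empty cross-sections, since any longer accepting path could be pumped down into the empty window) is a standard argument added here for completeness, not stated on the slide. Names are illustrative.

```python
def enumerate_words(cross_section, num_states, m):
    """List the first m accepted words in length-lexicographic order by asking for
    cross-sections of increasing length.  cross_section(n) may be any cross-section
    routine, e.g. the smart-BFS sketch above.  Once num_states consecutive
    cross-sections are empty, no longer word can exist, so we stop."""
    out, n, empty_run = [], 0, 0
    while len(out) < m and empty_run < num_states:
        words = cross_section(n)
        empty_run = 0 if words else empty_run + 1
        out.extend(words[: m - len(out)])
        n += 1
    return out
```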

What have we got so far? c: the number of cross-sections encountered; e: the number of empty cross-sections encountered; d: the number of transitions in the NFA; n: the length of words in the cross-section; t: the number of characters in the output.

New min-word algorithm: minWordAMBoolean. Idea: instead of using a table to find the minimal word, construct a table whose only purpose is to determine i-complete states. This can be done with an algorithm similar to minWordAMSorted, but more efficiently, since there is no need to sort.

New min-word algorithm: minWordAMBoolean. [Example boolean table for states A, B, C and columns i = 1, 2, 3, where entry (S, i) is T if S is i-complete: A: F T T; B: T T T; C: T T F.]

New min-word algorithm: minWordAMBoolean. Fill in the first column. For i = 2...n: for every state S, determine whether S is i-complete using only the transitions leaving S and column i-1. Starting at the start state, follow minimal transitions along paths that can complete a word of length n (using the table). [Same boolean table.]

New min-word algorithm: minWordAMBoolean. Fill in the first column. For i = 2...n: for every state S, determine whether S is i-complete using only the transitions leaving S and column i-1 (d operations per column in total). Starting at the start state, follow minimal transitions along paths that can complete a word of length n (using the table). [Same boolean table.]

New min-word algorithm: minWordAMBoolean. Fill in the first column. For i = 2...n: for every state S, determine whether S is i-complete using only the transitions leaving S and column i-1 (d operations per column in total). Starting at the start state, follow minimal transitions along paths that can complete a word of length n (using the table). Theorem: The algorithm minWordAMBoolean is O(dn).
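
A minimal sketch of this boolean-table approach. One detail added here: when two minimal transitions carry the same symbol but lead to different states, the sketch keeps the whole set of states reachable by the minimal prefix, so that ties on the first symbol are handled safely. The NFA representation and names are assumptions.

```python
def min_word_boolean(transitions, start, finals, n):
    """minWordAMBoolean-style sketch.  complete[i] is the set of i-complete states
    (states with some length-i path to a final state); filling each column touches
    every transition once, so the table costs O(dn).
    Assumes an epsilon-free NFA: transitions[state] = [(symbol, next_state), ...]."""
    states = set(transitions) | {start} | set(finals)
    for outs in transitions.values():
        states.update(T for _, T in outs)
    complete = [set(finals)]                       # column 0: length-0 paths
    for i in range(1, n + 1):
        complete.append({S for S in states
                         if any(T in complete[i - 1]
                                for _, T in transitions.get(S, []))})
    if start not in complete[n]:
        return None                                # no accepted word of length n
    # Build the word symbol by symbol, keeping every state the minimal prefix can
    # reach along paths that can still be completed to length n.
    word, current = "", {start}
    for remaining in range(n, 0, -1):
        moves = [(a, T) for S in current for a, T in transitions.get(S, [])
                 if T in complete[remaining - 1]]
        a = min(a for a, _ in moves)
        current = {T for b, T in moves if b == a}
        word += a
    return word
```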

New cross-section algorithm: crossSectionAMBoolean. Extend to a cross-section algorithm using the same approach as the Sorted algorithm. To enumerate all words of length n: 1. Call minWordAMBoolean to create the table (O(dn)). 2. Perform a "smart BFS" (O(dt)): begin at the start state and follow only those paths of length n that end at a final state, using the table to identify i-complete states. Theorem: The algorithm crossSectionAMBoolean is O(dn + dt).
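
A short sketch of how the two pieces combine, reusing the smart-BFS helper sketched after the crossSectionAMSorted slide above (so this block is only runnable together with that one); everything here is illustrative.

```python
def cross_section_boolean(transitions, start, finals, n):
    """crossSectionAMBoolean-style sketch: build the i-completeness table once,
    then hand a lookup into it to the smart-BFS cross-section sketch above."""
    states = set(transitions) | {start} | set(finals)
    for outs in transitions.values():
        states.update(T for _, T in outs)
    complete = [set(finals)]
    for i in range(1, n + 1):
        complete.append({S for S in states
                         if any(T in complete[i - 1] for _, T in transitions.get(S, []))})
    return cross_section_smart_bfs(transitions, start, finals, n,
                                   lambda S, i: S in complete[i])
```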

New enumeration algorithm: enumAMBoolean. Run the cross-section algorithm until the required number of words has been listed, reusing the table. Theorem: The algorithm enumAMBoolean is O(de + dn). e: the number of empty cross-sections encountered; d: the number of transitions in the NFA; n: the length of words in the cross-section; t: the number of characters in the output.

What have we got so far? c: the number of cross-sections encountered; e: the number of empty cross-sections encountered; d: the number of transitions in the NFA; n: the length of words in the cross-section; t: the number of characters in the output.

Intersection-Based Algorithms. We present surprisingly elegant min-word and cross-section algorithms that have the asymptotic efficiency of the Boolean-based algorithms. However, these algorithms are not as efficient in practice as the Boolean-based and Sorted-based algorithms.

New min-word algorithm: minWordIntersection. Let N be the input NFA, and let A be the NFA that accepts the language of all words of length n. 1. Let C = N x A. 2. Remove all states of C that cannot be reached from the final states of C using reversed transitions. 3. Starting at the start state, follow the minimal n consecutive transitions to a final state.

New min-word algorithm: minWordIntersection. Let N be the input NFA, and let A be the NFA that accepts the language of all words of length n. 1. Let C = N x A. 2. Remove all states of C that cannot be reached from the final states of C using reversed transitions. 3. Starting at the start state, follow the minimal n consecutive transitions to a final state. [Example: automaton N with states A, B, C; n = 2; automaton A accepts all words of length 2.]

New min-word algorithm: minWordIntersection. Let N be the input NFA, and let A be the NFA that accepts the language of all words of length n. 1. Let C = N x A. 2. Remove all states of C that cannot be reached from the final states of C using reversed transitions. 3. Starting at the start state, follow the minimal n consecutive transitions to a final state. [Example continued: the product automaton C = N x A.]

New min-word algorithm: minWordIntersection. Let N be the input NFA, and let A be the NFA that accepts the language of all words of length n. 1. Let C = N x A. 2. Remove all states of C that cannot be reached from the final states of C using reversed transitions. 3. Starting at the start state, follow the minimal n consecutive transitions to a final state. [Example continued: following the minimal transitions through C.]

New min-word algorithm: minWordIntersection. Let N be the input NFA, and let A be the NFA that accepts the language of all words of length n. 1. Let C = N x A. 2. Remove all states of C that cannot be reached from the final states of C using reverse transitions. 3. Starting at the start state, follow the minimal n consecutive transitions to a final state. [Example concluded.] Thus the minimal word of length 2 accepted by N is "11".

Asymptotic running time of minWordIntersection. 1. Let C = N x A. 2. Remove all states of C that cannot be reached from the final states of C using reverse transitions. 3. Starting at the start state, follow the minimal n consecutive transitions to a final state. Each step is proportional to the size of C, which is O(nd) (C is essentially n concatenated copies of N). Theorem: The algorithm minWordIntersection is O(dn).
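
A minimal sketch of minWordIntersection under the same assumed NFA representation. A state of the product C is a pair (q, i): NFA state q after reading i symbols, so the pruning step can be computed layer by layer with reverse reachability; as in the boolean sketch earlier, a set of states is kept while reading off the word so that ties on the minimal symbol are handled safely. Names are illustrative.

```python
def min_word_intersection(transitions, start, finals, n):
    """minWordIntersection-style sketch.  useful[i] is the set of q such that the
    product state (q, i) survives pruning, i.e. some final product state (f, n) is
    reachable from it (computed with reverse transitions, one layer at a time).
    Assumes an epsilon-free NFA: transitions[state] = [(symbol, next_state), ...]."""
    states = set(transitions) | {start} | set(finals)
    for outs in transitions.values():
        states.update(T for _, T in outs)
    useful = [set() for _ in range(n + 1)]
    useful[n] = set(finals)                        # final states of C are (f, n)
    for i in range(n - 1, -1, -1):
        useful[i] = {q for q in states
                     if any(T in useful[i + 1] for _, T in transitions.get(q, []))}
    if start not in useful[0]:
        return None                                # C accepts no word of length n
    word, current = "", {start}
    for i in range(n):
        moves = [(a, T) for q in current for a, T in transitions.get(q, [])
                 if T in useful[i + 1]]
        a = min(a for a, _ in moves)
        current = {T for b, T in moves if b == a}
        word += a
    return word
```

The layered table plays the same role as the i-completeness table of the boolean algorithm, which is why both sketches end up with the same O(dn) cost.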

New cross-section algorithm: crossSectionIntersection. To enumerate all words of length n, build C = N x A, remove all states of C from which no final state can be reached (using reverse transitions), and perform a BFS on the result. Since all paths of length n starting at the start state now lead to a final state, there is no need to check for i-completeness. Theorem: The algorithm crossSectionIntersection is O(dn + dt).
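
A minimal sketch of this cross-section variant: after pruning the layered product, every surviving path of length n from the start is accepting, so a plain BFS over prefixes lists the cross-section with no completeness checks. Representation and names are the same assumptions as in the previous sketch.

```python
def cross_section_intersection(transitions, start, finals, n):
    """crossSectionIntersection-style sketch over the pruned product C = N x A.
    Assumes an epsilon-free NFA: transitions[state] = [(symbol, next_state), ...]."""
    states = set(transitions) | {start} | set(finals)
    for outs in transitions.values():
        states.update(T for _, T in outs)
    useful = [set() for _ in range(n + 1)]          # layer i of the pruned product
    useful[n] = set(finals)
    for i in range(n - 1, -1, -1):
        useful[i] = {q for q in states
                     if any(T in useful[i + 1] for _, T in transitions.get(q, []))}
    if start not in useful[0]:
        return []
    frontier = {"": {start}}                        # prefix -> product states in its layer
    for i in range(n):
        nxt = {}
        for word, qs in frontier.items():
            for q in qs:
                for a, T in transitions.get(q, []):
                    if T in useful[i + 1]:
                        nxt.setdefault(word + a, set()).add(T)
        frontier = nxt
    return sorted(frontier)                          # the cross-section, in order
```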

Practical Performance. We compared Makinen's, Ackerman-Shallit's, AMSorted, AMBoolean, and intersection-based algorithms. We tested the algorithms on a variety of NFAs: dense, sparse, few and many final states, different alphabet sizes, the worst case for Makinen's algorithm, etc. The best performing algorithms were: min-word: AMSorted; cross-section: AMBoolean; enumeration: AMBoolean.

Summary. c: the number of cross-sections encountered; e: the number of empty cross-sections encountered; d: the number of transitions in the NFA; n: the length of words in the cross-section; t: the number of characters in the output. (Entries marked in the summary table are the most efficient in practice.)

Open problems. Extending the intersection-based cross-section algorithm to an enumeration algorithm. Lower bounds. Can better results be obtained using a different order? Restricting attention to a smaller family of NFAs.