Functional Design and Programming Lecture 10: Regular expressions and finite state machines.


1 Functional Design and Programming Lecture 10: Regular expressions and finite state machines

2 Literature  These notes  Randal C. Nelson’s notes on finite automata and regular expressions: http://www.cs.rochester.edu/u/nelson/courses/csc_173/fa/

3 Exercises  Consider: alphabetic identifiers in Standard ML; file names that end with “.txt”; words in a text that contain the subsequence “s”, “e”, “c”, “r”, “e”, “t”; comments in XML.  For each of the above: write a regular expression that generates the strings; give a finite state automaton that recognizes the strings.  See home page for more exercises.

4 Overview  Regular expressions  Finite state automata  Applications of finite automata and regular expressions  Regular expressions and context-free grammars  Construction of finite automata  Implementation of deterministic finite automata

5 Regular expressions  Expressions that describe (possibly infinite) sets of strings.  Examples: .*\.sml denotes the strings ending with “.sml”; .*glei.* denotes the strings containing the substring “glei”.
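
The two example patterns can be tried out directly. The following is a minimal sketch in Python rather than the Standard ML used in the course; the helper name matches and the test strings are illustrative assumptions. It uses re.fullmatch so that the whole string, not just a prefix, must belong to the language.

```python
import re

# Patterns copied from the slide.
ends_with_sml = re.compile(r".*\.sml")
contains_glei = re.compile(r".*glei.*")

def matches(pattern, s):
    """Return True if the entire string s belongs to the language of pattern."""
    return pattern.fullmatch(s) is not None

print(matches(ends_with_sml, "lexer.sml"))    # True
print(matches(ends_with_sml, "lexer.smlx"))   # False: does not end with ".sml"
print(matches(contains_glei, "Henglein"))     # True: contains "glei"
```

Note that in Python's re module the dot does not match a newline by default, so this agrees with the slide's reading only for single-line strings.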

6 Regular expressions: Definition
Reg. exp.    Set denoted
ε            {ε}
a            {a}
RQ           {st : s ∈ L(R), t ∈ L(Q)}
R|Q          L(R) ∪ L(Q)
R*           {s_1 s_2 ... s_n : n ≥ 0, s_i ∈ L(R)}

7 Derived regular expressions
Reg. exp.    Definition
R+           RR*
R?           R|ε
[a-z]        a | b | ... | z
[^a-z]       disjunction of all symbols but a, b, ..., z
.            disjunction of all symbols
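
The derived forms can be checked against their definitions on small inputs. This is only a sketch, in Python rather than Standard ML: the helper language is a hypothetical name, and it enumerates strings over a tiny alphabet up to a bounded length, so it gives evidence for the equalities rather than a proof.

```python
import re
from itertools import product

def language(pattern, alphabet="ab", max_len=4):
    """All strings over alphabet, up to max_len long, that pattern matches entirely."""
    return {
        "".join(w)
        for n in range(max_len + 1)
        for w in product(alphabet, repeat=n)
        if re.fullmatch(pattern, "".join(w))
    }

# R+ = RR*, R? = R|ε (an empty alternative stands for ε), [a-b] = a|b
print(language(r"(ab)+") == language(r"(ab)(ab)*"))
print(language(r"a?") == language(r"a|"))
print(language(r"[a-b]") == language(r"a|b"))
```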

8 Finite State Machines  Finite state automaton: Description of abstract machine with a finite number of different states and transitions on input symbols between states.  Finite state transducer: Like finite state automaton, but additionally with output symbols on transitions.

9 Finite State Automata  A finite state automaton is a 5-tuple consisting of: a finite set Σ of characters or symbols (the alphabet), a finite set Q of states, a start state q_0 ∈ Q, a subset F ⊆ Q of accepting (or final) states, and a set of transitions (q, a, q’) with q, q’ ∈ Q and a ∈ Σ, written q --a--> q’.

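10 Finite state automata  An FSM accepts a string s = a_1 a_2 ... a_n if there is a sequence of transitions from the start state that spells out s and ends in a final state: q_0 --a_1--> q_1 --a_2--> ... --a_n--> q_n, where q_n is a final state.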

11 Finite state automata...  An FSM recognizes the language (set of strings) L ⊆ Σ* which it accepts.  An FSM is deterministic if no state has more than one transition on any given symbol.  Theorem: The same class of languages over Σ is definable (recognizable) by finite state machines, deterministic finite state machines and regular expressions.
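
The definitions of acceptance and determinism translate almost literally into code. The sketch below is in Python rather than Standard ML and assumes an automaton is given as a set of triples (q, a, q'), a start state and a set of final states; the function names and the example automaton (strings over {a, b} ending in "ab") are illustrative assumptions.

```python
def accepts(transitions, start, finals, s):
    """True if some sequence of transitions labelled s leads from start to a final state."""
    current = {start}  # all states reachable on the input read so far
    for a in s:
        current = {q2 for (q1, b, q2) in transitions if q1 in current and b == a}
    return bool(current & finals)

def is_deterministic(transitions):
    """True if no state has more than one transition on any given symbol."""
    seen = set()
    for (q, a, _) in transitions:
        if (q, a) in seen:
            return False
        seen.add((q, a))
    return True

# Nondeterministic automaton for strings over {a, b} that end in "ab".
ENDS_IN_AB = {(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2)}

print(accepts(ENDS_IN_AB, 0, {2}, "aab"))   # True
print(accepts(ENDS_IN_AB, 0, {2}, "aba"))   # False
print(is_deterministic(ENDS_IN_AB))         # False: state 0 has two transitions on "a"
```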

12 Applications of regular expressions and FSM’s  Text searching and processing  State-based protocols  Dialog/interaction control  Hardware verification  Protocol verification  Programming language processing  Natural language processing

13 Regular expressions and context-free grammars  Regular expressions can be understood as restricted CFG’s: RE = CFG incl. * (Kleene closure) with no (mutually) recursive definitions of nonterminals.  Regular definitions: A sequence of definitions r_i = R_i for variables r_i such that each R_i is a regular expression with possible occurrences of r_1, ..., r_{i-1}.
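
Because each R_i may only mention earlier variables, a regular definition can be expanded into a single regular expression by substituting in order. A minimal sketch in Python (not the course's Standard ML) follows; the function expand, the {name} placeholder syntax and the identifier example are assumptions made for illustration, and the placeholder syntax would clash with regex repetition braces such as {2,3}.

```python
import re

def expand(definitions):
    """Expand a list of (name, body) pairs; body may refer to earlier names as {name}."""
    env = {}
    for name, body in definitions:
        # Only names defined earlier are in env, so recursion is impossible.
        env[name] = "(?:" + body.format(**env) + ")"
    return env

DEFS = [
    ("letter", "[A-Za-z]"),
    ("digit", "[0-9]"),
    ("ident", "{letter}(?:{letter}|{digit})*"),
]

env = expand(DEFS)
print(re.fullmatch(env["ident"], "x1") is not None)   # True
print(re.fullmatch(env["ident"], "1x") is not None)   # False
```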

14 Regular expressions and finite state automata  Regular expressions are often convenient methods of specifying a desired language.  Deterministic finite state machines are a good model for efficient implementation of recognizing the language.

15 Construction of finite automata  [Diagram] Regular expression → nondeterministic finite state automaton (NFA): Thompson’s construction.  NFA → deterministic finite state automaton (DFA): subset construction.  DFA → NFA: trivial.  Finite automaton → regular expression: path algebra construction.
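
As an illustration of the subset construction step, here is a sketch in Python rather than Standard ML. It assumes the automaton format from the definition slide (triples (q, a, q') without ε-transitions), so no ε-closure is computed; the function names and the example NFA are hypothetical. DFA states are frozensets of NFA states, and only the reachable ones are built.

```python
from collections import deque

def subset_construction(transitions, start, finals, alphabet):
    """Return (start, table, finals) of a DFA equivalent to the given NFA."""
    dfa_start = frozenset({start})
    table = {}            # (set of NFA states, symbol) -> set of NFA states
    dfa_finals = set()
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        cur = queue.popleft()
        if cur & finals:
            dfa_finals.add(cur)
        for a in alphabet:
            nxt = frozenset(q2 for (q1, b, q2) in transitions if q1 in cur and b == a)
            if not nxt:
                continue  # no entry: the DFA is partial and rejects here
            table[(cur, a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return dfa_start, table, dfa_finals

def dfa_accepts(dfa, s):
    state, table, finals = dfa
    for a in s:
        state = table.get((state, a))
        if state is None:
            return False
    return state in finals

# NFA for strings over {a, b} that end in "ab".
NFA = {(0, "a", 0), (0, "b", 0), (0, "a", 1), (1, "b", 2)}
DFA = subset_construction(NFA, 0, {2}, "ab")
print(dfa_accepts(DFA, "aab"))   # True
print(dfa_accepts(DFA, "aba"))   # False
```

For this example the construction reaches three DFA states, {0}, {0,1} and {0,2}; in the worst case the number of subsets is exponential in the number of NFA states.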

16 Implementation of deterministic FSM’s (1)  Table-based implementation: Represent states by indexes 0,...,n-1. Represent characters by indexes 0,...,m-1 (e.g., m=256). Represent transitions by a two-dimensional table T (a vector of vectors of indices, or a 2-dimensional array) such that T(q,a) = SOME q’ if (q,a,q’) is a transition, and NONE otherwise. [SML: Vector.sub(Vector.sub(T, q), a)]
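
A sketch of the table-based representation, written in Python instead of Standard ML: a list of lists stands in for the vector of vectors, an integer entry plays the role of SOME q' and None the role of NONE. The function names and the example DFA (non-empty strings of ASCII digits, with m = 256 byte values) are assumptions for illustration.

```python
def make_table(n_states, n_symbols, transitions):
    """Build T with T[q][a] = q' for every transition (q, a, q'), None elsewhere."""
    table = [[None] * n_symbols for _ in range(n_states)]
    for (q, a, q2) in transitions:
        table[q][a] = q2
    return table

def run(table, start, finals, codes):
    """Run the DFA on a sequence of symbol indexes; reject on a missing entry."""
    q = start
    for a in codes:
        q = table[q][a]   # corresponds to Vector.sub(Vector.sub(T, q), a)
        if q is None:
            return False
    return q in finals

# State 0: nothing read yet; state 1: one or more digits read (accepting).
DIGITS = range(ord("0"), ord("9") + 1)
TABLE = make_table(2, 256, [(q, d, 1) for q in (0, 1) for d in DIGITS])

print(run(TABLE, 0, {1}, b"2024"))   # True (iterating bytes yields indexes 0..255)
print(run(TABLE, 0, {1}, b"20x4"))   # False
```

Each step costs one constant-time lookup, at the price of n times m table entries even when most of them are empty.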

17 Implementation of deterministic FSM’s (2)  Sparse table-based implementation: Represent states by indexes 0,...,n-1. Represent transitions by one-dimensional vector of association lists such that lookup(Vector.sub(T, q), a) = SOME q’ if T(q,a)=q’. Optimization: Use a better data structure than association lists (e.g., hash tables, search trees).
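
The sparse variant can be sketched as follows, again in Python rather than Standard ML. It follows the optimization mentioned on the slide and uses one hash table (a Python dict) per state in place of an association list; the function names and the example DFA, which accepts the two SML keywords "fun" and "fn", are illustrative assumptions.

```python
def make_sparse(n_states, transitions):
    """One dict per state, mapping a symbol to the successor state."""
    rows = [dict() for _ in range(n_states)]
    for (q, a, q2) in transitions:
        rows[q][a] = q2
    return rows

def run_sparse(rows, start, finals, s):
    q = start
    for a in s:
        q = rows[q].get(a)   # plays the role of lookup(Vector.sub(T, q), a)
        if q is None:
            return False
    return q in finals

# 0 --f--> 1 --u--> 2 --n--> 3 and 1 --n--> 3; state 3 is accepting.
KEYWORDS = make_sparse(4, [(0, "f", 1), (1, "u", 2), (2, "n", 3), (1, "n", 3)])

print(run_sparse(KEYWORDS, 0, {3}, "fun"))    # True
print(run_sparse(KEYWORDS, 0, {3}, "fn"))     # True
print(run_sparse(KEYWORDS, 0, {3}, "func"))   # False
```

Only the four existing transitions are stored, whereas a full table over 256 byte values would need 1024 entries for the same automaton.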

18 Implementation of deterministic FSM’s (3)  Functional implementation: datatype S = STATE of (symbol -> S) * bool. Represent states by values of type S. Transitions are represented as part of the state (first component). Whether a state is accepting or not is represented by the second component. Execute a transition by applying the first component of the state: fun trans (STATE (t, f)) a = t a  Note: Intuitively trans corresponds to a curried version of T: Q × Σ → Q. It can be implemented more efficiently than the uncurried version (in principle).
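
The same idea can be mimicked with closures in other languages. The sketch below uses Python instead of Standard ML: the class State mirrors STATE of (symbol -> S) * bool, with field names step and accepting chosen for illustration, and the example DFA (strings over {a, b} with an even number of "a"s) is an assumption, not taken from the slides.

```python
from typing import Callable, NamedTuple

class State(NamedTuple):
    step: Callable[[str], "State"]   # first component: the transition function
    accepting: bool                  # second component: is the state accepting?

def trans(state, a):
    """Counterpart of: fun trans (STATE (t, f)) a = t a"""
    return state.step(a)

# The two states refer to each other through their transition functions.
def even_step(a):
    return ODD if a == "a" else EVEN

def odd_step(a):
    return EVEN if a == "a" else ODD

EVEN = State(even_step, True)
ODD = State(odd_step, False)

def accepts(start, s):
    state = start
    for a in s:
        state = trans(state, a)
    return state.accepting

print(accepts(EVEN, "abab"))   # True: two "a"s
print(accepts(EVEN, "ab"))     # False: one "a"
```

No table and no state numbering is needed; the automaton is just a pair of mutually recursive definitions.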

19 Other problems  Minimize a DFA.  Decide whether two DFA’s are equivalent.

