 Regular Languages Regular Expressions Finite-State Automata

Torbjörn Lager, Stockholm University
Speaker notes: Welcome. Focus on tools? No! Not so much morphology.

Languages
Sets of strings. Example: {“ac” “abc” “abbc” “abbbc” ...}
FST - Torbjörn Lager, UU

Finite-State Automata
Directed graphs consisting of states, and arcs labelled with symbols. For example:
[Figure: an automaton with arcs labelled a, b and c]
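
As a rough illustration of states and labelled arcs, an automaton can be written down as a transition table. This is only a sketch: the state numbers 0, 1, 2 and the function name `accepts` are assumptions made here, not taken from the slide's figure.

```python
# Hypothetical encoding of an automaton for {"ac", "abc", "abbc", ...}.
# The state numbering is an assumption; the slide's figure is not reproduced.
TRANSITIONS = {
    (0, "a"): 1,  # arc labelled a from the start state
    (1, "b"): 1,  # loop labelled b
    (1, "c"): 2,  # arc labelled c into the final state
}
START = 0
FINALS = {2}


def accepts(string):
    """Follow one arc per symbol; accept if we stop in a final state."""
    state = START
    for symbol in string:
        if (state, symbol) not in TRANSITIONS:
            return False  # no arc with this label: reject
        state = TRANSITIONS[(state, symbol)]
    return state in FINALS


print(accepts("abbc"), accepts("ab"))
```

The dictionary keys play the role of the arcs: a pair of source state and symbol maps to a target state.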

Regular expressions
Example: [a b* c]
Note:
Atomic symbols (e.g. ‘a’) denote languages (e.g. {“a”}).
Operations (here: concatenation and Kleene-star) are operations over languages.
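
To make "operations over languages" concrete, languages can be modelled as Python sets of strings. This is a minimal sketch under one loud assumption: the Kleene star is infinite, so the hypothetical helper `star` truncates it at a maximum string length.

```python
def concat(A, B):
    """Concatenation of two languages: every x in A followed by every y in B."""
    return {x + y for x in A for y in B}


def star(A, max_len):
    """Kleene star of A, truncated to strings of at most max_len symbols.

    The truncation is an assumption of this sketch; the real A* is infinite
    whenever A contains a non-empty string.
    """
    result = {""}
    frontier = {""}
    while frontier:
        frontier = {x + y for x in frontier for y in A
                    if len(x + y) <= max_len} - result
        result |= frontier
    return result


# [a b* c] with b* cut off after three b's
language = concat(concat({"a"}, star({"b"}, 3)), {"c"})
print(sorted(language, key=len))
```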

Regular expressions, regular languages and automata
Regular expressions denote regular languages.
Regular expressions compile into finite-state automata.
Finite-state automata generate and accept regular languages.

Regular expressions, regular languages and automata
[a b* c] denotes {“ac” “abc” “abbc” ...}.
[a b* c] compiles into an automaton [Figure: automaton with arcs labelled a, b and c].
The automaton generates and accepts {“ac” “abc” “abbc” ...}.
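
The compile, accept and generate relationship can be imitated with Python's standard `re` module. Two caveats: Python writes the expression as `ab*c` rather than `[a b* c]`, and `re` uses a backtracking matcher rather than a finite-state compiler, so this only mirrors the idea. The names `accepts` and `generate` are made up for this sketch.

```python
import re
from itertools import product

PATTERN = re.compile(r"ab*c")  # Python spelling of the slide's [a b* c]


def accepts(string):
    """Accept: does the whole string belong to the denoted language?"""
    return PATTERN.fullmatch(string) is not None


def generate(alphabet, max_len):
    """Generate: enumerate the accepted strings up to a length bound."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            candidate = "".join(symbols)
            if accepts(candidate):
                yield candidate


print(list(generate("abc", 4)))  # ['ac', 'abc', 'abbc']
```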

Regular expression operators
Concatenation                 A B
Union                         A | B
Iteration (Kleene-star)       A*
Difference                    A - B
Intersection                  A & B
Grouping of expressions       [A]

Examples
a b               {“ab”}
a [b|c]           {“ab” “ac”}
a* [b|c]*         {“” “a” “ab” ... }
[a|b] & [b|c]     {“b”}
[a|b] - b         {“a”}
[a|b] - [b|a]     {}
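
The finite examples above can be checked with Python's built-in set operators, treating each language as a set of strings. The helper `concat` is a hypothetical name introduced for this sketch; union, intersection and difference map directly onto `|`, `&` and `-`.

```python
def concat(A, B):
    """Concatenation of two languages given as sets of strings."""
    return {x + y for x in A for y in B}


a, b, c = {"a"}, {"b"}, {"c"}

ab = concat(a, b)                # a b
a_b_or_c = concat(a, b | c)      # a [b|c]
common = (a | b) & (b | c)       # [a|b] & [b|c]
only_a = (a | b) - b             # [a|b] - b
nothing = (a | b) - (b | a)      # [a|b] - [b|a]

print(ab, a_b_or_c, common, only_a, nothing)
```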

Special symbols
?             The any symbol
?*            The universal language
[]            The empty-string language
0 (or “”)     The empty string (epsilon)

Regular expression operators
Optionality       (A)
Kleene-plus       A+
Complement        ~A
Containment       $A
Restriction       A => B _ C

Examples
a (b) c         {“ac” “abc”}
a b+ c          {“abc” “abbc” “abbbc” ...}
~[a b c]        {“” ... “a” ... “ab” ... “abca” ..}
$[a|b]          {“a” ... “abba” ... “abcd” ...}
b => a _ c      {“” ... “a” ... “ccc” ... “abc” ..}
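
The last three examples denote infinite languages, so one way to get a feel for them is to write membership tests. The predicates below are hand-written sketches for these three specific expressions, with hypothetical names; they are not how a finite-state tool would compile the operators.

```python
def in_complement(string):
    """~[a b c]: every string except "abc"."""
    return string != "abc"


def contains_a_or_b(string):
    """$[a|b]: strings containing at least one a or b."""
    return "a" in string or "b" in string


def b_only_between_a_and_c(string):
    """b => a _ c: every b is immediately preceded by a and followed by c."""
    for i, symbol in enumerate(string):
        if symbol != "b":
            continue
        if i == 0 or string[i - 1] != "a":
            return False  # b without an a to its left
        if i == len(string) - 1 or string[i + 1] != "c":
            return False  # b without a c to its right
    return True


for s in ["", "a", "ccc", "abc", "ab", "bc"]:
    print(repr(s), b_only_between_a_and_c(s))
```

Note that a string without any b satisfies the restriction trivially, which is why “” and “ccc” are in the language.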

Component technologies in FST
Word lists and lexica
Tokenisers
Morphological analysers
Part-of-speech taggers
Parsers
Speaker note: this is the kind of thing we will do during the lab sessions.

Applications of FST
Named-entity recognition
Information extraction
Corpus linguistics
Spelling- and grammar checking
Speech processing applications

The Xerox Finite-State Tool
Compiles extended regular expressions into finite-state machines (automata and transducers).
Allows the user to display, examine and modify the machines.

Non-deterministic FSAs
At least one state has more than one transition leading from it labelled with the same symbol.
[Figure: a non-deterministic automaton with two arcs labelled b and one arc labelled a]

Determinization of FSAs
Any non-deterministic FSA can be transformed into an equivalent deterministic FSA.
Determinize for efficiency!
Example: [Figure: an FSA before and after determinization]
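
One standard way to carry out this transformation is the subset construction, sketched below for automata without epsilon transitions. The example automaton is hypothetical and is not the one in the slide's figure; the dictionary representation is likewise an assumption of this sketch.

```python
def determinize(nfa, start, finals):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start_set = frozenset([start])
    dfa, todo, seen = {}, [start_set], {start_set}
    while todo:
        current = todo.pop()
        # Symbols that label at least one arc leaving the current state set
        symbols = {sym for (state, sym) in nfa if state in current}
        for sym in symbols:
            target = frozenset(t for s in current
                               for t in nfa.get((s, sym), ()))
            dfa[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    dfa_finals = {S for S in seen if S & finals}
    return dfa, start_set, dfa_finals


# Hypothetical NFA: state 1 has two arcs labelled a (to 1 and to 2).
nfa = {(1, "a"): {1, 2}, (2, "b"): {2}}
dfa, dfa_start, dfa_finals = determinize(nfa, start=1, finals={2})
for (source, sym), target in dfa.items():
    print(set(source), sym, set(target))
```

In the result, every pair of state and symbol has exactly one target, because the targets of all same-labelled arcs have been merged into a single set.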

Minimization of FSAs
Any (deterministic) FSA can be transformed into an equivalent FSA that has a minimal number of states.
Minimize for space!
Speaker note: in the worst case, the size of the automaton grows linearly with the size of the language.
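
A simple (not the fastest) way to minimize is partition refinement: start with final versus non-final states and keep splitting groups whose members disagree on where their arcs lead. The sketch below assumes a complete deterministic automaton whose states are all reachable; the function name and the example are hypothetical.

```python
def minimize(states, alphabet, delta, finals):
    """Partition refinement on a complete DFA.

    Returns a dict mapping each state to the id of its equivalence class;
    states with the same id can be merged.
    """
    block = {s: int(s in finals) for s in states}
    while True:
        ids = {}
        new_block = {}
        for s in states:
            # A state's signature: its own class plus the classes it reaches
            signature = (block[s],) + tuple(block[delta[(s, a)]]
                                            for a in alphabet)
            new_block[s] = ids.setdefault(signature, len(ids))
        if len(ids) == len(set(block.values())):
            return new_block  # no class was split: the partition is stable
        block = new_block


# Hypothetical 4-state DFA for "even number of a's"; two states suffice.
states = [0, 1, 2, 3]
delta = {(0, "a"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 0}
classes = minimize(states, "a", delta, finals={0, 2})
print(classes)
```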

Representing word lists
Think of a word list as a regular language.
Use the calculus of regular expressions to query and update the word list.
Determinize for speed!
Minimize for space!
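
As a small sketch of this idea, a word list can be stored as a trie, which is a deterministic automaton with one arc per letter, and updated with set operations standing in for union and difference. The trie shares prefixes but not suffixes, so it is deterministic without being minimal. All names and the sample words are assumptions of this example.

```python
def build_trie(words):
    """Build a trie: nested dicts, one level per letter."""
    root = {}
    for word in words:
        node = root
        for letter in word:
            node = node.setdefault(letter, {})
        node[""] = True  # marker: this state is final (a word ends here)
    return root


def lookup(trie, word):
    """Follow exactly one arc per letter, as in a deterministic automaton."""
    node = trie
    for letter in word:
        if letter not in node:
            return False
        node = node[letter]
    return "" in node


words = {"cat", "cats", "car"}
words = (words | {"dog"}) - {"car"}  # update: union, then difference
trie = build_trie(words)
print(lookup(trie, "cats"), lookup(trie, "car"), lookup(trie, "ca"))
```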

Various equivalences
(A) = A|[]
A+ = A A*
A+ = A* - []   (when A does not contain the empty string)
A - B = A & ~B
Speaker note: the last two are De Morgan's laws!

Various equivalences
A - A = ~[?*]
A | ~[?*] = A
A [] = A

Important theoretical results
Kleene’s theorem (concerning FSAs)
Closure properties of regular languages and regular relations
Decidability

FSAs and regular expressions
Kleene’s theorem: Any language recognised by an FSA is denoted by a regular expression, and any language denoted by a regular expression can be recognised by an FSA.

Regex to FSA to regex
[Figure: conversions among regular expressions, nondeterministic FSAs with epsilon transitions, nondeterministic FSAs and deterministic FSAs. Picture adapted from Hopcroft & Ullman 1979]

From regular expressions to finite-state automata
The only really necessary operators:
Disjunction
Concatenation
Iteration
Sidenote: Compare regular grammars:
A --> x B
A --> x
(where A and B are nonterminals, and where x is a sequence of terminals)
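
One well-known way to build an automaton from just these three operators is a Thompson-style construction, which glues small machines together with epsilon arcs. The sketch below is an illustration under stated assumptions, not the method of any particular tool: machines are plain dicts, `None` stands for epsilon, and all function names are invented here.

```python
import itertools

_fresh = itertools.count()  # supplies unused state numbers


def symbol(ch):
    s, f = next(_fresh), next(_fresh)
    return {"start": s, "final": f, "arcs": [(s, ch, f)]}


def concat(m1, m2):
    arcs = m1["arcs"] + m2["arcs"] + [(m1["final"], None, m2["start"])]
    return {"start": m1["start"], "final": m2["final"], "arcs": arcs}


def union(m1, m2):
    s, f = next(_fresh), next(_fresh)
    arcs = m1["arcs"] + m2["arcs"] + [
        (s, None, m1["start"]), (s, None, m2["start"]),
        (m1["final"], None, f), (m2["final"], None, f)]
    return {"start": s, "final": f, "arcs": arcs}


def star(m):
    s, f = next(_fresh), next(_fresh)
    arcs = m["arcs"] + [
        (s, None, m["start"]), (m["final"], None, f),
        (s, None, f),                      # zero iterations
        (m["final"], None, m["start"])]    # one more iteration
    return {"start": s, "final": f, "arcs": arcs}


def closure(states, arcs):
    """All states reachable from `states` using epsilon arcs only."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in arcs:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(m, string):
    current = closure({m["start"]}, m["arcs"])
    for ch in string:
        moved = {dst for src, label, dst in m["arcs"]
                 if src in current and label == ch}
        current = closure(moved, m["arcs"])
    return m["final"] in current


machine = concat(symbol("a"), concat(star(symbol("b")), symbol("c")))
either = union(symbol("a"), symbol("b"))
print(accepts(machine, "abbc"), accepts(machine, "ab"))
```

The resulting machine is nondeterministic with epsilon transitions, so `accepts` tracks a set of current states rather than a single one.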

Closure properties of regular languages
A set is said to be closed under an operation iff applying the operation to members of the set will never take us outside the set.
Example: if A and B are regular languages, then [A|B] is always regular. Therefore regular languages are closed under union.
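
A constructive way to see such closure results is the product construction: run two deterministic automata side by side and decide from the pair of states whether to accept. The sketch below assumes automata stored as dicts, with a missing arc treated as a move to a dead state `None`; the two example automata are hypothetical.

```python
import operator


def product(d1, d2, alphabet, combine):
    """Product DFA; `combine` decides which state pairs are final."""
    start = (d1["start"], d2["start"])
    delta, finals, todo, seen = {}, set(), [start], {start}
    while todo:
        p, q = pair = todo.pop()
        if combine(p in d1["finals"], q in d2["finals"]):
            finals.add(pair)
        for sym in alphabet:
            # A missing arc yields None, which behaves as a dead state
            nxt = (d1["delta"].get((p, sym)), d2["delta"].get((q, sym)))
            delta[(pair, sym)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return {"start": start, "delta": delta, "finals": finals}


def accepts(dfa, string):
    state = dfa["start"]
    for sym in string:
        state = dfa["delta"].get((state, sym))
    return state in dfa["finals"]


# Hypothetical examples over {a, b}: "starts with a" and "ends with b".
starts_a = {"start": 0, "finals": {1},
            "delta": {(0, "a"): 1, (1, "a"): 1, (1, "b"): 1}}
ends_b = {"start": 0, "finals": {1},
          "delta": {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}}

either = product(starts_a, ends_b, "ab", operator.or_)   # union
both = product(starts_a, ends_b, "ab", operator.and_)    # intersection
print(accepts(either, "bb"), accepts(both, "bb"))
```

Since the product is itself a finite-state automaton, the union and the intersection of the two languages are regular.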

Decidability
Given one automaton A:
Is the string S a string in L(A)?
Does L(A) contain any strings at all?
Is L(A) equivalent to ?* ?
Given two automata A1 and A2:
Is L(A1) a subset of L(A2)?
Are L(A1) and L(A2) equivalent?
Do L(A1) and L(A2) overlap?
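
Each of these questions can be answered by a finite search over states. The sketch below is one possible formulation, assuming deterministic automata stored as dicts, a fixed finite alphabet standing in for ?, and a missing arc treated as a dead state `None`. All function names and the example automata are hypothetical.

```python
def member(d, string):
    state = d["start"]
    for sym in string:
        state = d["delta"].get((state, sym))
    return state in d["finals"]


def reachable_pairs(d1, d2, alphabet):
    """All state pairs reachable when both automata read the same string."""
    start = (d1["start"], d2["start"])
    todo, seen = [start], {start}
    while todo:
        p, q = todo.pop()
        for sym in alphabet:
            nxt = (d1["delta"].get((p, sym)), d2["delta"].get((q, sym)))
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def overlap(d1, d2, alphabet):
    return any(p in d1["finals"] and q in d2["finals"]
               for p, q in reachable_pairs(d1, d2, alphabet))


def is_subset(d1, d2, alphabet):
    return not any(p in d1["finals"] and q not in d2["finals"]
                   for p, q in reachable_pairs(d1, d2, alphabet))


def is_equivalent(d1, d2, alphabet):
    return is_subset(d1, d2, alphabet) and is_subset(d2, d1, alphabet)


def is_empty(d, alphabet):
    return not overlap(d, d, alphabet)  # L is empty iff L & L is empty


def is_universal(d, alphabet):
    return all(p in d["finals"] for p, _ in reachable_pairs(d, d, alphabet))


# Hypothetical examples over {a, b, c}
ab_star_c = {"start": 0, "finals": {2},
             "delta": {(0, "a"): 1, (1, "b"): 1, (1, "c"): 2}}
starts_a = {"start": 0, "finals": {1},
            "delta": {(0, "a"): 1, (1, "a"): 1, (1, "b"): 1, (1, "c"): 1}}
everything = {"start": 0, "finals": {0},
              "delta": {(0, "a"): 0, (0, "b"): 0, (0, "c"): 0}}

print(is_subset(ab_star_c, starts_a, "abc"),
      is_subset(starts_a, ab_star_c, "abc"))
```

The searches terminate because there are only finitely many state pairs, which is the essence of why these questions are decidable for finite-state automata.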
