Finite-State Methods in Natural-Language Processing: Basic Mathematics

Presentation transcript:

Finite-State Methods in Natural-Language Processing: Basic Mathematics Ronald M. Kaplan and Martin Kay

Regular Languages
$\Sigma^\epsilon = \Sigma \cup \{\epsilon\}$
The empty set and $\{a\}$ for all $a \in \Sigma^\epsilon$ are regular languages
If $L_1$, $L_2$, and $L$ are regular languages, then so are
$L_1 L_2 = \{xy \mid x \in L_1 \text{ and } y \in L_2\}$ (concatenation)
$L_1 \cup L_2$ (union)
$L^* = \bigcup_{i=0}^{\infty} L^i$ (Kleene closure)
There are no other regular languages

Correspondence Theorem (Kleene)
Every regular language $L$ is accepted by some FSM $M(L)$
$L^* = \bigcup_i L^i$

Types of FSMs
± Deterministic
± $\epsilon$-free
± Minimal
± Complete
... can all characterize any regular language

Determinizing
Nondeterministic: $\delta : Q \times \Sigma \to 2^Q$
Deterministic: $\delta' : Q' \times \Sigma \to Q'$, with $Q' = 2^Q$
A state of the deterministic machine is final if any member of the set is final
Search time is linear for a deterministic, but exponential for a nondeterministic FSM
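The subset construction described on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the dictionary encoding of $\delta$, and the example machine are all assumptions, and the sketch assumes an NFA without $\epsilon$-arcs. It builds only the subsets reachable from the start state rather than all of $2^Q$.

```python
from collections import deque

def determinize(alphabet, delta, start, finals):
    """Subset construction for an NFA without epsilon arcs.

    delta maps (state, symbol) -> set of next states.
    Returns (dfa_delta, dfa_start, dfa_finals); DFA states are frozensets.
    """
    dfa_start = frozenset([start])
    dfa_delta = {}
    dfa_finals = set()
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        # A set-state is final if any member of the set is final.
        if current & finals:
            dfa_finals.add(current)
        for symbol in alphabet:
            target = frozenset(
                r for q in current for r in delta.get((q, symbol), ())
            )
            dfa_delta[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa_delta, dfa_start, dfa_finals

def accepts(dfa_delta, dfa_start, dfa_finals, string):
    """Run the deterministic machine: one table lookup per symbol."""
    state = dfa_start
    for symbol in string:
        state = dfa_delta[(state, symbol)]
    return state in dfa_finals

# Hypothetical NFA: strings over {a, b} whose second-to-last symbol is 'a'.
NFA_DELTA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "a"): {2},
    (1, "b"): {2},
}
DFA = determinize("ab", NFA_DELTA, 0, frozenset({2}))
```

With this encoding, `accepts(*DFA, "aab")` follows exactly one path through the table, which is the linear search time the slide refers to.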

Complete vs. Pruned
Pruned: Smaller
$\delta$ is a partial function. No dead states
Lookup is faster
Dead (failure) state
Follow a dashed line iff there is no solid one for the current character

Minimization
Minimal machine is unique (up to renaming of states and arc ordering)
Minimized: No two states have congruent suffix graphs
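One common way to reach the minimized form is partition refinement (Moore's algorithm): states are merged as long as nothing distinguishes the strings they accept from that point on. The slide does not say which algorithm the authors have in mind, so the sketch below is only an illustration. It assumes a complete deterministic machine in which every state is reachable, and all names are hypothetical.

```python
def minimize(states, alphabet, delta, start, finals):
    """Moore-style partition refinement on a complete DFA."""
    # Initial partition: final states versus non-final states.
    block_of = {q: int(q in finals) for q in states}
    while True:
        ids = {}
        refined = {}
        for q in states:
            # Two states stay together only if they are in the same block
            # and each symbol leads them into the same block.
            signature = (block_of[q],) + tuple(
                block_of[delta[(q, a)]] for a in alphabet
            )
            refined[q] = ids.setdefault(signature, len(ids))
        if len(ids) == len(set(block_of.values())):
            break  # no block was split, so the partition is stable
        block_of = refined
    min_delta = {
        (block_of[q], a): block_of[delta[(q, a)]]
        for q in states
        for a in alphabet
    }
    return min_delta, block_of[start], {block_of[q] for q in finals}

# Hypothetical 4-state DFA for "ends with b"; states 0/1 and 2/3 are redundant.
STATES = [0, 1, 2, 3]
DELTA = {
    (0, "a"): 1, (0, "b"): 2,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 3,
}
MIN_DELTA, MIN_START, MIN_FINALS = minimize(STATES, "ab", DELTA, 0, {2, 3})

def accepts(delta, start, finals, string):
    state = start
    for symbol in string:
        state = delta[(state, symbol)]
    return state in finals
```

In this example the four states collapse into two blocks, one for "last symbol was b" and one for everything else.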

Proof Strategy for Language Properties
To prove $\Phi(L_1, \ldots, L_n)$:
Get machines $M(L_1), \ldots, M(L_n)$
Transform $M(L_1), \ldots, M(L_n) \to M$
Show $L(M) = \Phi(L_1, \ldots, L_n)$
$L(M)$ is the language accepted by $M$
Compute with machines: they are finite!

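$L_1 \cap L_2$ is Regular
$\delta = \delta_1 \times \delta_2$
[Figure: a machine with an arc labeled x from state 1 to state 2, a machine with an arc labeled x from state a to state b, and the product machine with an arc labeled x from state 1a to state 2b]
State 2b is final if 2 and b are final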
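The product construction for intersection can be sketched as below. This is an illustration under stated assumptions: both machines are deterministic and complete over the same alphabet, each is encoded as a `(delta, start, finals)` triple, and the two example machines are invented for the demonstration.

```python
def intersect(dfa1, dfa2, alphabet):
    """Product construction: run both complete DFAs in lockstep."""
    delta1, start1, finals1 = dfa1
    delta2, start2, finals2 = dfa2
    start = (start1, start2)
    delta, finals = {}, set()
    stack, seen = [start], {start}
    while stack:
        pair = stack.pop()
        p, q = pair
        # A pair state is final if both of its components are final.
        if p in finals1 and q in finals2:
            finals.add(pair)
        for symbol in alphabet:
            target = (delta1[(p, symbol)], delta2[(q, symbol)])
            delta[(pair, symbol)] = target
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return delta, start, finals

def accepts(delta, start, finals, string):
    state = start
    for symbol in string:
        state = delta[(state, symbol)]
    return state in finals

# Hypothetical machines: an even number of a's, and strings ending in b.
EVEN_A = ({(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})
ENDS_B = (
    {("x", "a"): "x", ("x", "b"): "y", ("y", "a"): "x", ("y", "b"): "y"},
    "x",
    {"y"},
)
BOTH = intersect(EVEN_A, ENDS_B, "ab")
```

Changing the finality test from `and` to `or` would give a machine for the union instead, since the pair states and arcs are the same.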

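$\overline{L}$ is Regular
Get deterministic, complete $M(L) = \langle \Sigma, Q, q, F, \delta \rangle$
$M = \langle \Sigma, Q, q, F', \delta \rangle$ where $F' = Q - F$
$L(M) = \overline{L}$?
Suppose $x$ not in $L$: $x$ takes $M(L)$ into $r \in Q - F$ $\Rightarrow$ $x$ takes $M$ into $r \in F'$ $\Rightarrow$ $x \in L(M)$.
Suppose $x$ in $L$: $x$ takes $M(L)$ into $r \in F$ $\Rightarrow$ $x$ takes $M$ into $r \in Q - F'$ $\Rightarrow$ $x \notin L(M)$.
If $M(L)$ were not deterministic, a string $x$ could take $M(L)$ into $r \in F$ and $s \notin F$. It would therefore take $M$ into $r \notin F'$ and $s \in F'$, so $x$ would be accepted by both machines.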
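The complementation argument can be tried out on a small machine. The sketch below is an illustration with hypothetical names: it first completes a pruned deterministic machine by adding a dead state (called `DEAD` here), then swaps final and non-final states as in $F' = Q - F$.

```python
DEAD = "dead"  # hypothetical name for the added failure state

def complete(states, alphabet, delta):
    """Make a partial transition table total by routing missing arcs to DEAD."""
    all_states = set(states) | {DEAD}
    total = dict(delta)
    for q in all_states:
        for a in alphabet:
            total.setdefault((q, a), DEAD)
    return all_states, total

def complement(states, alphabet, delta, start, finals):
    """Complete the DFA, then take F' = Q - F."""
    all_states, total = complete(states, alphabet, delta)
    return total, start, all_states - set(finals)

def accepts(delta, start, finals, string):
    state = start
    for symbol in string:
        state = delta[(state, symbol)]
    return state in finals

# Hypothetical pruned DFA that accepts only the string "ab".
ONLY_AB = {(0, "a"): 1, (1, "b"): 2}
NOT_AB = complement([0, 1, 2], "ab", ONLY_AB, 0, {2})
```

Completion matters here: without the dead state, a string such as "abb" would fall off the pruned machine and be rejected by both the original and the supposed complement.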

Properties of Regular Languages
Closure: Intersection, Union, Complementation, Concatenation, Iteration, Reversal
Decidable Predicates: Emptiness, Equality, Finiteness

String Relations
n-way concatenation:
$X = \langle x_1, x_2, \ldots, x_n \rangle$, $Y = \langle y_1, y_2, \ldots, y_n \rangle$ $\Rightarrow$ $XY = \langle x_1 y_1, x_2 y_2, \ldots, x_n y_n \rangle$
The n-way concatenation of two string-tuples is the tuple of strings formed by string concatenation of the corresponding elements.
One can construct families of string relations that parallel the usual classes of formal languages
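A short sketch of n-way concatenation, with string-tuples modeled as Python tuples of `str`. The function name is an assumption made for this illustration.

```python
def concat_tuples(x, y):
    """n-way concatenation: concatenate corresponding elements of two n-tuples."""
    if len(x) != len(y):
        raise ValueError("both string-tuples must have the same arity")
    return tuple(xi + yi for xi, yi in zip(x, y))

# Two hypothetical 2-tuples; the second has an empty string on its lower side.
XY = concat_tuples(("ab", "c"), ("d", ""))
```

Because elements may be empty strings, the components of a concatenated tuple can grow at different rates, which is what lets relations pair strings of different lengths.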

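A Context-free Relation
$S \to \langle s, \epsilon \rangle\ \langle (, \epsilon \rangle\ NP\ VP\ \langle ), \epsilon \rangle$
$NP \to \langle np, \epsilon \rangle\ \langle (, \epsilon \rangle\ DET\ N\ \langle ), \epsilon \rangle$
$VP \to \langle vp, \epsilon \rangle\ \langle (, \epsilon \rangle\ V\ NP\ \langle ), \epsilon \rangle$
$DET \to \langle det, the \rangle$
$N \to \langle n, dog \rangle$
$N \to \langle n, cat \rangle$
$V \to \langle v, chased \rangle$
s ( np ( det n ) vp ( v np ( det n )))
the dog chased the cat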

Regular Relations
The empty set and $\{a\}$ for all $a \in \Sigma^\epsilon \times \cdots \times \Sigma^\epsilon$ are regular relations
If $R_1$, $R_2$, and $R$ are regular n-relations, then so are
$R_1 R_2 = \{xy \mid x \in R_1 \text{ and } y \in R_2\}$ (concatenation)
$R_1 \cup R_2$ (union)
$R^* = \bigcup_{i=0}^{\infty} R^i$ (n-way Kleene closure)
There are no other regular relations

n-way Regular Expressions
$a{:}b{:}c^*\ e{:}f{:}g$ denotes $\{\langle a^i e, b^i f, c^i g \rangle \mid i \ge 0\}$
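To make the denotation concrete, the members of this relation can be enumerated in order of $i$. The generator below is only an illustration with invented names; it hard-codes this one expression rather than parsing n-way regular expressions in general.

```python
from itertools import islice

def tuples_of_expression():
    """Yield <a^i e, b^i f, c^i g> for i = 0, 1, 2, ..."""
    iterated = ("", "", "")  # (a:b:c)^0, the tuple of empty strings
    while True:
        # Concatenate the iterated part with the final e:f:g, element by element.
        yield tuple(p + s for p, s in zip(iterated, ("e", "f", "g")))
        # One more round of the starred a:b:c.
        iterated = tuple(p + s for p, s in zip(iterated, ("a", "b", "c")))

FIRST_THREE = list(islice(tuples_of_expression(), 3))
```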

All Correspondences
Every n-way regular expression describes a regular n-relation.
Every regular n-relation is described by an n-way regular expression.
Every n-tape finite-state transducer accepts a regular n-relation.
Every regular n-relation is accepted by an n-tape finite-state transducer.

Regular
If $L_1$, $L_2$, and $L$ are regular languages and $R_1$, $R_2$, and $R$ are regular relations, the following relations are regular:
$R_1 \cup R_2$, $R_1 R_2$, $R^*$, $R^{-1}$, $R_1 \circ R_2$, $Id(L)$, $L_1 \times L_2$, $Rev(R)$
... and the following languages:
$Dom(R)$, $Range(R)$, $L/R$, $R/L$, $x/R$, $R/x$

Not Regular
$R_1 \cap R_2$, $\overline{R}$

Intersection is not Regular
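The slide's figure did not survive in this transcript. A standard counterexample, which may or may not be the one the authors drew, takes two regular relations whose intersection has a non-regular range:

```latex
\begin{align*}
R_1 &= \{\langle a^n,\; b^n c^m \rangle \mid n, m \ge 0\} = (a{:}b)^*\,(\epsilon{:}c)^* \\
R_2 &= \{\langle a^n,\; b^m c^n \rangle \mid n, m \ge 0\} = (\epsilon{:}b)^*\,(a{:}c)^* \\
R_1 \cap R_2 &= \{\langle a^n,\; b^n c^n \rangle \mid n \ge 0\} \\
Range(R_1 \cap R_2) &= \{ b^n c^n \mid n \ge 0 \}
\end{align*}
```

The range of a regular relation is a regular language, but $\{b^n c^n\}$ is not regular, so $R_1 \cap R_2$ cannot be a regular relation.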

$\overline{R}$ is not regular because ... if it were, you could use it, together with union, to construct intersection!
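The construction alluded to is De Morgan's law. Written out for two relations:

```latex
R_1 \cap R_2 \;=\; \overline{\,\overline{R_1} \cup \overline{R_2}\,}
```

Regular relations are closed under union, so closure under complement would force closure under intersection, which fails.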

n-way Automata: Transducers
An n-way automaton is defined by a quintuple similar to the ones that define ordinary finite-state machines
$(\Sigma, Q, q, F, \delta)$
where
$\Sigma$ is a finite alphabet,
$Q$ is a finite set of states,
$q \in Q$ is the initial state,
$F \subseteq Q$ is the set of final states,
$\delta$ maps $Q \times \Sigma^\epsilon \times \cdots \times \Sigma^\epsilon$ to $2^Q$
From now on, we limit our discussion to binary relations and transducers

Union and Iteration

Concatenation

Range and Domain
$Dom(R) = R/\Sigma^*$
$Range(R) = \Sigma^*/R$
Accepting FSMs derived from $T(R)$ by replacing all transition labels $a{:}b$ by $a$ (domain) or $b$ (range)
$\Rightarrow$ Regular languages.
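The label-replacement step can be sketched directly. In this illustration, which uses invented names and an invented example transducer, arcs are triples `(source, (upper, lower), target)`, the empty string stands for $\epsilon$, and the projected machine is simulated as a nondeterministic FSM with $\epsilon$-arcs.

```python
def project(arcs, side):
    """Replace each label a:b by a (side 0, domain) or b (side 1, range)."""
    return [(src, label[side], dst) for src, label, dst in arcs]

def accepts(arcs, start, finals, string):
    """Simulate an FSM whose arc labels are single symbols or "" (epsilon)."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for src, label, dst in arcs:
                if src == q and label == "" and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({start})
    for symbol in string:
        step = {dst for src, label, dst in arcs
                if src in current and label == symbol}
        current = closure(step)
    return bool(current & set(finals))

# Hypothetical transducer for { <a^n c, b^n> | n >= 0 }: a:b loops, then c:epsilon.
T_ARCS = [(0, ("a", "b"), 0), (0, ("c", ""), 1)]
DOMAIN = project(T_ARCS, 0)   # accepts a* c
RANGE = project(T_ARCS, 1)    # accepts b*
```

Projection can introduce $\epsilon$-arcs, as the `c:ε` arc does on the range side, which is why the simulation takes $\epsilon$-closures.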

$R^{-1}$

Id(L)

Extending $\delta$
To state sets
To strings
The machine accepts a string $x$ just in case $\delta^*(q, x) \cap F$ is not empty.
Parallel extensions for transducers

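Cartesian Product
Let [definition of T not recoverable from the transcript] where [definition of its transition function not recoverable from the transcript]
Claim: T accepts $L_1 \times L_2$
N.B. $L \times L \neq Id(L)$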

T accepts $L_1 \times L_2$
Proof: by induction.
Thus, T enters a final state on $\langle x, y \rangle$ iff $M(L_1)$ enters a final state on $x$ and $M(L_2)$ enters a final state on $y$.

$R_1 \circ R_2$

Images
$xRy \Leftrightarrow \langle x, y \rangle \in R$
$x/R = \{y \mid \langle x, y \rangle \in R\}$
$R/y = \{x \mid \langle x, y \rangle \in R\}$
R/intractable = {intractable, iNtractable}
iNtractable/R = {intractable}
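The two image operations can be illustrated on a finite fragment of a relation. The set of pairs below is an assumption chosen to reproduce the slide's example; real regular relations are usually infinite and would be handled with transducers rather than explicit sets. The function names are invented.

```python
# Hypothetical finite fragment of R, as (x, y) pairs.
R = {
    ("intractable", "intractable"),
    ("iNtractable", "intractable"),
}

def image_of_upper(x, relation):
    """x/R: every y paired with x."""
    return {y for (u, y) in relation if u == x}

def image_of_lower(relation, y):
    """R/y: every x paired with y."""
    return {x for (x, v) in relation if v == y}
```

Here `image_of_lower(R, "intractable")` gives both spellings and `image_of_upper("iNtractable", R)` gives only "intractable", matching the two examples on the slide.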

Images are Regular

Rev(R)

Pumping Lemma
It is possible to delete a part of any sufficiently long string of a regular language and leave a string that is a member of the language
[Figure: an accepting path that passes through the same state, State i, twice]
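The deletion in the lemma corresponds to cutting out a loop in the machine's path. The sketch below, with hypothetical names and an invented example machine, follows a complete deterministic machine over a string and removes the segment between the first two visits to the same state. Because the loop starts and ends in the same state, the shortened string reaches the same final state as the original.

```python
def pump_down(delta, start, string):
    """Remove the first loop on the DFA's path over `string`.

    Returns the shortened string, or None if no state repeats.
    """
    first_seen = {start: 0}  # state -> number of symbols read when first visited
    state = start
    for i, symbol in enumerate(string, start=1):
        state = delta[(state, symbol)]
        if state in first_seen:
            j = first_seen[state]
            # Symbols j..i-1 take the machine from `state` back to `state`.
            return string[:j] + string[i:]
        first_seen[state] = i
    return None

def accepts(delta, start, finals, string):
    state = start
    for symbol in string:
        state = delta[(state, symbol)]
    return state in finals

# Hypothetical DFA: strings over {a, b} with an even number of a's.
EVEN_A = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}
SHORTER = pump_down(EVEN_A, 0, "aab")
```

Any string longer than the number of states must repeat a state, so `pump_down` always succeeds on a sufficiently long input.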