Comprehension & Compilation in Optimality Theory. Jason Eisner, Johns Hopkins University. July 8, 2002 — ACL.


Introduction
- This paper is batting cleanup: it pursues some other people's ideas to their logical conclusion. The results are important, but follow easily from previous work.
- Comprehension: more finite-state woes for OT.
- Compilation: how to shoehorn OT into the finite-state world.
- Other motivations:
  - Clean up the notation (especially, what counts as "underlying" and "surface" material and how their correspondence is encoded).
  - Discuss the interface to morphology and phonetics.
  - Help confused people.

Computational OT is Mainly Finite-State – Why?
- Good news: individual OT constraints appear to be finite-state.
  - Each constraint evaluates a given candidate (good or bad? how bad?).
- Bad news (gives us something to work on): OT grammars are not always finite-state.
  - Compilation maps each input to the best candidate: it aggregates several constraints (easy part) and uses them to search (hard part).

Computational OT is Mainly Finite-State – Why?
- Good news: individual OT constraints appear to be finite-state.
- Bad news: OT grammars are not always finite-state.
  - Oops! Too powerful for phonology.
  - Oops! Don't support nice computation: fast generation, fast comprehension, and interfaces with the rest of a linguistic system or an NLP/speech system.

Main Ideas in Finite-State OT
- OT constraints are generally finite-state (Eisner 1997).
- Generation algorithm from finite-state constraints (Ellison 1994).
- But finite-state constraints don't yield a finite-state grammar (Frank & Satta 1998).
- Responses, to get a finite-state grammar by hook or by crook: change OT (Eisner 2000) or approximate OT (Karttunen 1998; Gerdemann & van Noord 2000).
- Open questions: What about comprehension? Can we unify these maneuvers? How do we interface with morphology, phonetics, …? How do we encode funky representations as strings?

Phonology in the Abstract
- Morphology supplies an underlying form x = "abdip" ∈ Σ* (e.g., ab + dip, or IN + HOUSE).
- Phonology maps it to a surface form z = "a[di][bu]" ∈ Δ*, which is passed on to phonetics.

OT in the Abstract
- x = "abdip": underlying form in Σ*
- z = "a[di][bu]": surface form in Δ*
- y = "aab0[ddii][pb0u]": candidate in (Σ ∪ Δ)*

OT in the Abstract
- x = "abdip": underlying form in Σ*
- z = "a[di][bu]": surface form in Δ*
- y = "aab0[ddii][pb0u]": candidate, from which we can extract x ∈ Σ*

OT in the Abstract
- x = "abdip": underlying form in Σ*
- z = "a[di][bu]": surface form in Δ*
- y = "aab0[ddii][pb0u]": candidate, from which we can extract z ∈ Δ*

OT in the Abstract
- x = "abdip": underlying form in Σ*
- z = "a[di][bu]": surface form in Δ*
- y = "aab0[ddii][pb0u]": candidate
- To evaluate the x → z mapping, just evaluate y! Is z a close variant of x? (faithfulness) Is z easy to pronounce? (well-formedness)
- y contains all the info: x, z, and their alignment.

OT in the Abstract
- x = "abdip": underlying form in Σ*
- Many candidates: Y = {"aabbddiipp", "aab0[ddii][pb0u]", "[0baa]b0d0i0p0", …}
- z: surface form in Δ*, still to be chosen

OT in the Abstract
- x = "abdip": underlying form in Σ*
- Many candidates: Y = {"aabbddiipp", "aab0[ddii][pb0u]", "[0baa]b0d0i0p0", …}
- Pick the best candidate; its surface form is z = "a[di][bu]".

OT in the Abstract
x = "abdip" →Gen→ Y0(x) = {A, B, C, D, E, F, G, …} →constraint 1→ Y1(x) = {B, D, E, …} →constraint 2→ Y2(x) = {D, …} →Pron→ Z(x) = {"a[di][bu]", …}, where D = "aab0[ddii][pb0u]".
Don't worry yet about how the constraints are defined.

OT Comprehension? No …
The same pipeline, run forward: x = "abdip" →Gen→ Y0(x) = {A, B, C, D, E, F, G, …} →constraint 1→ Y1(x) = {B, D, E, …} →constraint 2→ Y2(x) = {D, …} →Pron→ Z(x) = {"a[di][bu]", …}, where D = "aab0[ddii][pb0u]".

OT Comprehension? No …
Can we just run it backward? z = "a[di][bu]" →Pron⁻¹→ Y2(z) = {A, B, C, D, E, F, G, …} →constraint 2⁻¹→ Y1(z) = {B, D, E, …} →constraint 1⁻¹→ Y0(z) = {D, …} →Gen⁻¹→ X(z) = {"abdip", …}, where D = "aab0[ddii][pb0u]".

OT Comprehension Looks Hard!
Which underlying forms lead to z ∈ Z(x) = {"a[di][bu]", …} via Gen, constraint 1, constraint 2, and Pron?
- x = "abdip"? Y0(x) = {A, B, C, D, E, F, G, …}, Y1(x) = {B, D, E, …}, Y2(x) = {D, …}
- x = "dipu"? Y0(x) = {C, D, G, H, L, …}, Y1(x) = {D, H, …}, Y2(x) = {H, …}
- x = "adipu"? Y0(x) = {B, D, K, L, M, …}, Y1(x) = {B, D, L, M, …}, Y2(x) = {D, M, …}

OT Comprehension Is Hard!
Constraint 1: one violation for each a inside brackets (*[a]) or b outside brackets (*b). Suppose Gen offers just the fully bracketed and fully unbracketed candidates:
- x = aabbb: Y0(x) = {[aabbb], aabbb}; Y1(x) = {[aabbb]} (2 violations beat 3)
- x = aabb: Y0(x) = {[aabb], aabb}; Y1(x) = {[aabb], aabb} (tie: 2 vs. 2)
- x = aaabb: Y0(x) = {[aaabb], aaabb}; Y1(x) = {aaabb} (2 violations beat 3)
So the possible x's for a bracketed surface form are all strings where #a's ≤ #b's — not a regular set!

OT Comprehension Is Hard!
Constraint 1: one violation for each a inside brackets or b outside brackets. The possible x's are all strings where #a's ≤ #b's: not a regular set.
- The constraint is finite-state (we'll see what this means), and can be made more linguistically natural.
- If all constraints are finite-state:
  - Already knew: given x, the set of possible z's is regular (Ellison 1994). That's why Ellison can use finite-state methods for generation.
  - The new fact: given z, the set of possible x's can be non-regular.
  - So finite-state methods probably cannot do comprehension.
  - This is stronger than the previous Hiller-Smolensky-Frank-Satta result that the relation (x, z) can be non-regular.
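The counting argument can be checked mechanically. The sketch below is my own simplification, not the paper's notation: Gen offers only the two candidates {[x], x}, and the helper names are invented. It confirms that the x = a^m b^n that can surface with brackets are exactly those with m ≤ n.

```python
def violations(candidate):
    """Constraint 1: one star for each 'a' inside brackets or 'b' outside."""
    inside, v = False, 0
    for ch in candidate:
        if ch == '[':
            inside = True
        elif ch == ']':
            inside = False
        elif (ch == 'a' and inside) or (ch == 'b' and not inside):
            v += 1
    return v

def optimal(x):
    """Toy Gen offers just the fully bracketed and unbracketed candidates."""
    candidates = ['[' + x + ']', x]
    best = min(violations(c) for c in candidates)
    return {c for c in candidates if violations(c) == best}

# Generation is easy: map each x to its optimal candidates.
assert optimal('aabbb') == {'[aabbb]'}
assert optimal('aabb') == {'[aabb]', 'aabb'}
assert optimal('aaabb') == {'aaabb'}

# Comprehension: which x = a^m b^n can surface with brackets?
# Exactly those with m <= n -- not a regular set of strings.
bracketable = [(m, n) for m in range(6) for n in range(6)
               if any(c.startswith('[') for c in optimal('a' * m + 'b' * n))]
assert all(m <= n for m, n in bracketable)
```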

Possible Solutions
1. Eliminate nasty constraints.
   - Doesn't work: the problem can arise from nasty grammars built out of nice constraints (linguistically natural, or primitive-OT).
2. Allow only a finite lexicon.
   - Then the grammar defines a finite, hence regular, relation.
   - In effect, try all x's and see which ones yield z; in practice, do this faster by precompilation and lookup.
   - But then we can't comprehend novel words or phrases (unless the lexicon is "all forms of length < 20"; inefficient?).
3. Make OT regular "by hook or by crook."

In a Perfect World, Y0, Y1, Y2, … Z Would Be Regular Relations (FSTs)
x = "abdip" →Gen→ Y0(x) = {A, B, C, D, E, F, G, …} →constraint 1→ Y1(x) = {B, D, E, …} →constraint 2→ Y2(x) = {D, …} →Pron→ Z(x) = {"a[di][bu]", …}
- Y0 = Gen is regular.
- Construct Y1 from Y0; construct Y2 from Y1.
- The whole system is Z = Y2 o Pron.

In a Perfect World, Compose FSTs To Get an Invertible, Full-System FST
- Morphology: ab + dip or IN + HOUSE (a language-model FSA).
- Phonology: x = "abdip" →Gen … Pron→ Z(x) = "a[di][bu]" (a pronunciation "dictionary" FST, built by OT!).
- Phonetics: adibu (an acoustic-model FST).

How Can We Make Y0, Y1, Y2, … Z Be Regular Relations (FSTs)?
x = "abdip" →Gen→ Y0(x) = {A, B, C, D, E, F, G, …} →constraint 1→ Y1(x) = {B, D, E, …} →constraint 2→ Y2(x) = {D, …} →Pron→ Z(x) = {"a[di][bu]", …}
We need to talk now about what the constraints say and how they are used.

A General View of Constraints
- One violation for each a inside brackets or b outside brackets:
  x = aabbb: Yi(x) = {[aabbb], aabbb} → Yi+1(x) = {[aabbb]}
- One violation for each surface coda consonant (b], p], etc.):
  x = abdip: Yi(x) = {[aabb][ddiipp], aab0[ddii][pb0u], …} → Yi+1(x) = {aab0[ddii][pb0u], …}
Break this mapping into 2 steps.

A General View of Constraints
The mapping from Yi to Yi+1 breaks into 2 steps: the constraint marks each violation with a star, and harmonic pruning then removes the suboptimal starred candidates.
- One violation for each a inside brackets or b outside brackets:
  Yi(x) = {[aabbb], aabbb} →constraint→ {[a*a*bbb], aab*b*b*} →harmonic pruning→ Yi+1(x) = {[aabbb]}
- One violation for each surface coda consonant (b], p], etc.):
  Yi(x) = {[aabb][ddiipp], aab0[ddii][pb0u], …} →constraint→ {[aabb*][ddiipp*], aab0[ddii][pb0u], …} →harmonic pruning→ Yi+1(x) = {aab0[ddii][pb0u], …}

Why Is This View "General"?
- The constraint doesn't just count *'s; it marks their locations.
- So we might consider other kinds of harmonic pruning, including OT variants that are sensitive to the location of *.

The Harmony Ordering
- An OT grammar really has 4 components: Gen, Pron, a harmony ordering, and a constraint sequence.
- The harmony ordering compares 2 starred candidates that share underlying material. It may be language-specific.
- Traditional OT says "fewer stars is better":
  - aab0[ddii][pb0u] > [aabb*][ddiipp*] ("0 beats 2")
  - [a*a*bbb] > aab*b*b* ("2 beats 3")
  - Unordered: [a*a*bb] vs. aab*b* ("2 vs. 2")
  - Unordered: aab0[ddii][pb0u] vs. aab*b*b* (different underlying material: abdip vs. aabbb)
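As a toy illustration of the traditional ordering, the sketch below represents a starred candidate as an (underlying form, star count) pair. This encoding is my own simplification: the real construction works on interleaved strings, not pairs.

```python
def better(q, r):
    """Traditional harmony: q > r iff the candidates share underlying
    material and q has strictly fewer stars."""
    (xq, sq), (xr, sr) = q, r
    return xq == xr and sq < sr

assert better(('abdip', 0), ('abdip', 2))      # "0 beats 2"
assert better(('aabbb', 2), ('aabbb', 3))      # "2 beats 3"
assert not better(('aabb', 2), ('aabb', 2))    # unordered: "2 vs. 2"
assert not better(('abdip', 0), ('aabbb', 3))  # different underlying material
```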

Regular Harmony Orderings
- A harmony ordering > is a binary relation. If it is a regular relation, it can be computed by a finite-state transducer H:
  - H accepts (q, r) iff q > r (e.g., [a*a*bbb] > aab*b*b*).
  - H(q) = range(q o H) = {r : q > r}: "the set of r's that are worse than q."
  - H(Q) = range(Q o H) = ∪ over q ∈ Q of {r : q > r}: "the set of r's that are worse than something in Q" (or, if Q is an FST, worse than some output of Q).

Using a Regular Harmony Ordering
Recall range(Q o H) = ∪ over q ∈ Q of {r : q > r}, where H accepts (q, r) iff q > r: the set of starred candidates r that are worse than some output of Q.
- Yi is an FST that maps each x to its optimal candidates under the first i constraints. By induction, assume it's regular! (Note: Yi(x) ∩ Yi(x') = ∅ for distinct inputs.)
- Yi o Ci+1 maps each x to the same candidates, but starred by constraint i+1:
  Yi(x) = {[aabbb], aabbb} →Ci+1→ {[a*a*bbb], aab*b*b*}
  Yi(x) = {[aabb][ddiipp], aab0[ddii][pb0u], …} →Ci+1→ {[aabb*][ddiipp*], aab0[ddii][pb0u], …}
- range(Yi o Ci+1 o H) is the set of all suboptimal starred candidates: those comparable to, but worse than, some output of Yi o Ci+1 (e.g., aab*b*b* above, and also comparable losers that were never output, such as []a[a*b]b*b*). These are to be removed!
- So Yi o Ci+1 o ~range(Yi o Ci+1 o H) maps x to just the Ci+1-optimal starred candidates (here {[a*a*bbb]}); deleting the *'s (by composition with another FST) yields Yi+1, e.g., Yi+1(x) = {[aabbb]} and Yi+1(x) = {aab0[ddii][pb0u], …}.
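On finite candidate sets, one step of this construction can be simulated directly. In the sketch below (a simplification of my own), Python sets stand in for the regular languages and relations, and comparability is implicit because each candidate set already belongs to a single underlying x.

```python
def stars(candidate):
    """Count the violation marks on a starred candidate."""
    return candidate.count('*')

def prune(starred):
    """One step of harmonic pruning under 'fewer stars is better':
    remove candidates worse than some candidate in the set, then
    delete the stars (the FST version composes with ~range(.oH) and D)."""
    worse = {r for r in starred for q in starred if stars(q) < stars(r)}
    return {c.replace('*', '') for c in starred - worse}

# The two examples from the slide:
assert prune({'[a*a*bbb]', 'aab*b*b*'}) == {'[aabbb]'}
assert prune({'[aabb*][ddiipp*]', 'aab0[ddii][pb0u]'}) == {'aab0[ddii][pb0u]'}
```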

What Have We Proved?
- An OT grammar has 4 components: Gen, Pron, constraints, and a harmony ordering.
- Theorem (by induction): if all of these are regular relations, then so is the full phonology Z (diagram: x →Gen→ Y0(x) →C1→ Y1(x) →C2→ Y2(x) →Pron→ Z(x)).
- Z = (Gen oo_H C1 oo_H C2) o Pron, where Y oo_H C = Y o C o ~range(Y o C o H) o D and D deletes the stars.
- This generalizes Gerdemann & van Noord 2000; the operator notation follows Karttunen 1998.

Consequences: A Family of Optimality Operators oo_H
- Y o C: inviolable constraint (traditional composition).
- Y oo_H C: violable constraint with harmony ordering H.
- Y o+ C: traditional OT, where harmony compares the # of stars. Not a finite-state operator!
- Y oo C: binary constraint, where "no stars" > "some stars".
  - This H is a regular relation: we can build an FST that accepts (q, r) iff q has no stars, r has some stars, and q, r have the same underlying x.
  - Therefore oo is a finite-state operator: if Y is a regular relation and C is a regular constraint, then Y oo C is a regular relation.
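A finite-dictionary sketch of the binary operator, with dictionaries standing in for regular relations (the helper names and toy candidate sets are mine, not the paper's):

```python
def C1(candidate):
    """Constraint 1: one star per 'a' inside brackets or 'b' outside."""
    inside, v = False, 0
    for ch in candidate:
        if ch == '[':
            inside = True
        elif ch == ']':
            inside = False
        elif (ch == 'a' and inside) or (ch == 'b' and not inside):
            v += 1
    return v

def oo(Y, C):
    """Binary constraint: for each input, keep the violation-free
    candidates if any exist; otherwise all candidates tie under
    "no stars" > "some stars", so all survive."""
    return {x: ({c for c in cands if C(c) == 0} or cands)
            for x, cands in Y.items()}

Y = {'a': {'[a]', 'a'}, 'b': {'[b]', 'b'}, 'ab': {'[ab]', 'ab'}}
# 'a' is perfect unbracketed, 'b' is perfect bracketed; for 'ab'
# every candidate has a star, so the binary constraint prunes nothing.
assert oo(Y, C1) == {'a': {'a'}, 'b': {'[b]'}, 'ab': {'[ab]', 'ab'}}
```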

Consequences: A Family of Optimality Operators oo_H
- Y o C: inviolable constraint (traditional composition).
- Y oo_H C: violable constraint with harmony ordering H.
- Y o+ C: traditional OT: harmony compares the # of stars. Not a finite-state operator!
- Y oo C: binary constraint: "no stars" > "some stars".
- Y oo_3 C: bounded constraint: 0 > 1 > 2 > 3 = 4 = 5 … (Frank & Satta 1998; Karttunen 1998). Yields big approximate FSTs that count.
- Y oo_⊆ C: subset approximation to o+ (traditional OT) (Gerdemann & van Noord 2000). Exact for many grammars, though not all.
- Y o> C: directional constraint (Eisner 2000); likewise Y <o C. Non-traditional OT, with linguistic motivation.
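The bounded ordering oo_3 is finite-state precisely because the comparison only ever counts up to the bound. A minimal sketch of the capped comparison (assuming, as in the pair encoding above, that only candidates with the same underlying material are compared):

```python
def better_bounded(q_stars, r_stars, bound=3):
    """Bounded harmony: 0 > 1 > 2 > 3, and 3 = 4 = 5 = ... ;
    star counts are capped at the bound, so an FST can track them."""
    return min(q_stars, bound) < min(r_stars, bound)

assert better_bounded(0, 1)
assert better_bounded(2, 3)
assert better_bounded(2, 5)        # 5 caps to 3, still worse than 2
assert not better_bounded(3, 4)    # 3 = 4: unordered beyond the bound
assert not better_bounded(4, 3)
```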

Consequences: A Family of Optimality Operators oo_H
- For each operator, the paper shows how to construct H as a finite-state transducer.
- Z = (Gen oo_H C1 oo_H C2) o Pron becomes, e.g.:
  - Z = (Gen oo C1 oo_3 C2) o Pron
  - Z = (Gen oo_⊆ C1 oo C2) o Pron
  - Z = (Gen o> C1 <o C2) o Pron

Subset Approximation
- Y oo_⊆ C: subset approximation to o+ (traditional OT), from Gerdemann & van Noord 2000. Exact for many grammars, not all.
- As with many harmony orderings, it ignores surface symbols and looks only at underlying and starred symbols.
- q > r iff q's stars fall at a proper subset of the positions of r's stars: then the top candidate wins. If neither star set contains the other (e.g., a*bcd*e vs. a*b*cde), the candidates are incomparable, and both survive.

Directional Constraints
- Y o> C: directional constraint (Eisner 2000); likewise Y <o C. Non-traditional OT, with linguistic motivation.
- As with many harmony orderings, it ignores surface symbols and looks only at underlying and starred symbols.
- It always gives the same result as the subset approximation, if the subset approximation has a result at all.
- Where the subset approximation has a problem (incomparable candidates such as a*bcd*e vs. a*b*cde), it resolves the conflict directionally: the top candidate wins under o>, and the bottom candidate wins under <o.
- This seems to be what languages do, too.
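A sketch of the left-to-right case, under my own reading of o> as comparing per-position violation counts lexicographically; the toy encoding 'a*bcd*e' (mine, not the paper's) writes each star after the underlying position it penalizes:

```python
def stars_by_position(candidate):
    """Map each underlying position to its local star count,
    e.g. 'a*bcd*e' -> [1, 0, 0, 1, 0]."""
    counts = []
    for ch in candidate:
        if ch == '*':
            counts[-1] += 1
        else:
            counts.append(0)
    return counts

def leftward_better(q, r):
    """q beats r under o> iff q's violation vector is lexicographically
    smaller: an early violation outweighs any number of later ones."""
    return stars_by_position(q) < stars_by_position(r)

# Tied at the first position, a*bcd*e is violation-free at the second
# position while a*b*cde is not, so a*bcd*e (the "top" candidate) wins,
# even though the subset approximation calls the pair incomparable.
assert leftward_better('a*bcd*e', 'a*b*cde')
assert not leftward_better('a*b*cde', 'a*bcd*e')
```

Running the same comparison right-to-left (reverse the vectors first) would pick the other candidate, matching the slide's claim about <o.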

Directional Constraints
- So one nice outcome of our construction is an algebraic definition of directional constraints, which is much easier to understand than the original machine construction.

Interesting Questions
- Are there other optimality operators worth considering? Hybrids?
- Are these finite-state operators useful for filtering nondeterminism in finite-state systems other than OT phonologies?

Summary
- OT constraints are generally finite-state (Eisner 1997).
- Generation algorithm from finite-state constraints (Ellison 1994).
- But finite-state constraints don't yield a finite-state grammar (Frank & Satta 1998).
- Comprehension? NO: not finite-state in general, as shown here.
- Get a finite-state grammar by hook or by crook: change OT (Eisner 2000) or approximate OT (Karttunen 1998; Gerdemann & van Noord 2000).
- Unify these maneuvers? YES: everything works great (and more) if the harmony ordering is made regular.

FIN