

Regular Relation (of strings): a little pre-talk review

- Relation: like a function, but multiple outputs ok
- Regular: finite-state
- Transducer: automaton w/ outputs
- b → ?   a → ?   aaaaa → ?
- Invertible?
- Closed under composition?

[transducer diagram: arcs include b:b, a:a, a:c, and arcs reading or writing nothing]

Regular Relation (of strings): a little pre-talk review

- Can weight the arcs
- a → {}   b → {b}
- aaaaa → {ac, aca, acab, acabc}
- How to find best outputs?
  - For aaaaa?
  - For all inputs at once?

[weighted transducer diagram: arcs include b:b, a:a, a:c, and arcs reading or writing nothing]

Directional Constraint Evaluation in OT
Jason Eisner, U. of Rochester
August 3, 2000, COLING, Saarbrücken

Synopsis: Fixing OT's Power

- Consensus: Phonology = regular relation
  - E.g., composition of little local adjustments (= FSTs)
- Problem: Even finite-state OT is worse than that
  - Global "counting" (Frank & Satta 1998)
- Problem: Phonologists want to add even more
  - Try to capture iterativity by Generalized Alignment constraints
- Solution: In OT, replace counting by iterativity
  - Each constraint does an iterative optimization

Outline

- Review of Optimality Theory
- The new "directional constraints" idea
  - Linguistically: fits the facts better
  - Computationally: removes excess power
- Formal stuff
  - The proposal
  - Compilation into finite-state transducers
  - Expressive power of directional constraints

What Is Optimality Theory?

- Prince & Smolensky (1993)
- Alternative to stepwise derivation
- Stepwise winnowing of the candidate set:
  input → Gen → Constraint 1 → Constraint 2 → Constraint 3 → ... → output
- Different constraint orders yield different languages

Filtering, OT-style

- The constraint would prefer A, but is only allowed to break the tie among B, D, E
- ** = candidate violates the constraint twice

A Troublesome Example

Input: bantodibo        Harmony » Faithfulness

    ban.to.di.bo
    ben.ti.do.bu
    ban.ta.da.ba
  ☞ bon.to.do.bo

"Majority assimilation": impossible with an FST, and it doesn't happen in practice!

Outline  Review of Optimality Theory  The new “directional constraints” idea  Linguistically: Fits the facts better  Computationally: Removes excess power  Formal stuff  The proposal  Compilation into finite-state transducers  Expressive power of directional constraints

An Artificial Example

                        NoCoda
    ban.to.di.bo        *
    ban.ton.di.bo       **
    ban.to.dim.bon      ***
    ban.ton.dim.bon     ****

- Candidates have 1, 2, 3, 4 violations of NoCoda

An Artificial Example

                        C     NoCoda
    ban.to.di.bo        *!    *
  ☞ ban.ton.di.bo             **
    ban.to.dim.bon            ***
    ban.ton.dim.bon     *!    ****

- Add a higher-ranked constraint C
- This forces a tradeoff: ton vs. dim.bon

An Artificial Example C 11 22 33 44 ban.to.di.bo   ban.ton.di.bo  ban.to.dim.bon  ban.ton.dim.bon  NoCoda Imagine splitting NoCoda into 4 syllable-specific constraints

An Artificial Example C 11 22 33 44 ban.to.di.bo  ban.ton.di.bo   ban.to.dim.bon  ban.ton.dim.bon  NoCoda Imagine splitting NoCoda into 4 syllable-specific constraints Now ban.to.dim.bon wins - more violations but they’re later

An Artificial Example C 44 33 22 11 ban.to.di.bo   ban.ton.di.bo  ban.to.dim.bon  ban.ton.dim.bon  NoCoda For “right-to-left” evaluation, reverse order (  4 first)

Outline

- Review of Optimality Theory
- The new "directional constraints" idea
  - Linguistically: fits the facts better
  - Computationally: removes excess power
- Formal stuff
  - The proposal
  - Compilation into finite-state transducers
  - Expressive power of directional constraints

When Is Directional Different?

- The crucial configuration:

                        *1  *2  *3  *4
    ban.ton.di.bo           *
    ban.to.dim.bon              *   *

- Forced location tradeoff
- Can choose where to violate, but must violate somewhere
- Locations aren't "orthogonal"
- Solve the location conflict by ranking locations (sound familiar?)

When Is Directional Different?

- The crucial configuration:

                        *1  *2  *3  *4
    ban.to.di.bo
    ban.ton.di.bo           *
    ban.to.dim.bon              *   *

- But if candidate 1 were available …

When Is Directional Different?

- But usually locations are orthogonal:

                        *1  *2  *3  *4
  ☞ ban.to.di.bo
    ban.ton.di.bo           *
    ban.to.dim.bon              *   *

- Usually, if you can satisfy *2 and *3 separately, you can satisfy them together
- Same winner under either counting or directional evaluation (satisfies everywhere possible)

Linguistic Hypothesis

- Q: When is directional evaluation different?
- A: When something forces a location tradeoff.
- Hypothesis: Languages always resolve these cases directionally.

Test Cases for Directionality

- Prosodic groupings
  - Syllabification: [CV.CVC.CV] vs. [CVC.CV.CV]
    - In a CV(C) language, /CVCCCV/ needs epenthesis
    - Cairene Arabic: L-to-R syllabification; analysis: NoInsert is evaluated L-to-R
    - Iraqi Arabic: R-to-L syllabification; analysis: NoInsert is evaluated R-to-L

Test Cases for Directionality

- Prosodic groupings
  - Syllabification: [CV.CVC.CV] vs. [CVC.CV.CV]
  - Footing: R-to-L Parse-σ (unattested) vs. L-to-R Parse-σ
    - With binary footing, an odd-syllable word must have a lapse

Test Cases for Directionality

- Prosodic groupings
  - Syllabification: [CV.CVC.CV] vs. [CVC.CV.CV]
  - Footing
- Floating material
  - Lexical:
    - Tone docking: ban.tó.di.bo vs. ban.to.di.bó
    - Infixation: grumadwet vs. gradwumet
  - Stress "end rule": (bán.to)(di.bo) vs. (ban.to)(dí.bo)
    - OnlyStressFootHead, HaveStress » NoStress (L-R)
- Harmony and OCP effects

Generalized Alignment

- Phonology has directional phenomena
  - [CV.CVC.CV] vs. [CVC.CV.CV]: both have 1 coda, 1 inserted V
  - Directional constraints work fine
- But isn't Generalized Alignment fine too?
  - Ugly: non-local; uses addition
  - Not well formalized: measures "distance" to "the" target "edge"
  - Way too powerful: can center a tone on a word, which is not possible in any system of finite-state constraints (Eisner 1997)

Outline

- Review of Optimality Theory
- The new "directional constraints" idea
  - Linguistically: fits the facts better
  - Computationally: removes excess power
- Formal stuff
  - The proposal
  - Compilation into finite-state transducers
  - Expressive power of directional constraints

Computational Motivation

- Directionality is not just a substitute for GA
- It is also a substitute for counting
- Frank & Satta 1998: OTFS > FST
  (finite-state OT is more powerful than finite-state transduction)

Why OTFS > FST?

- It matters that OT can count
- HeightHarmony » HeightFaithfulness
  - Input: to.tu.to.to.tu
  - Output: to.to.to.to.to vs. tu.tu.tu.tu.tu (both can be implemented by weighted FSAs)
  - Prefer the candidate with fewer faithfulness violations
- "Majority assimilation" (Baković 1999, Lombardi 1999)
- Beyond FST power; fortunately, unattested

Why Is OT > FST a Problem?

- Consensus: Phonology = regular relation
- OT is supposed to offer elegance, not power
- FSTs have many benefits!
  - Generation in linear time (with no grammar constant)
  - Comprehension likewise (cf. no known OTFS algorithm): invert the FST
  - Apply in parallel to a weighted speech lattice
  - Intersect with the lexicon
  - Compute the difference between 2 grammars

Making OT = FST: Proposals

- Approximate by bounded constraints (Frank & Satta 1998, Karttunen 1998)
  - Allow only up to 10 violations of NoCoda
  - Yields huge FSTs: the cost of missing the generalization
- Another approximation (Gerdemann & van Noord 2000)
  - Exact if location tradeoffs are between close locations
- This proposal: allow directional and/or bounded constraints only
  - Directional NoCoda correctly disprefers all codas
  - Handles location tradeoffs by ranking locations
  - Treats counting as a bug, not a feature to approximate

Outline

- Review of Optimality Theory
- The new "directional constraints" idea
  - Linguistically: fits the facts better
  - Computationally: removes excess power
- Formal stuff
  - The proposal
  - Compilation into finite-state transducers
  - Expressive power of directional constraints

11 22 33 44 ban.ton.di.bo   ban.to.dim.bon  ban.ton.dim.bon  Tuples  Violation levels aren’t integers like   They’re integer tuples, ordered lexicographically NoCoda

Tuples

- Violation levels aren't integers like **
- They're integer tuples, ordered lexicographically

                        *1  *2  *3  *4
    ban.ton.di.bo       (1, 1, 0, 0)
    ban.to.dim.bon      (1, 0, 1, 1)
    ban.ton.dim.bon     (1, 1, 1, 1)

- But what about candidates with 5 syllables?
- And syllables aren't fine-grained enough in general

Alignment to Input

- Split by input symbols, not syllables
- Tuple length = input string length + 1

  Input:  bantodibo
  Output: bantondibo

- For this input (length 9), NoCoda assigns each output candidate a 10-tuple
- Possible because the output is aligned with the input
- So each output violation is associated with an input position

Alignment to Input

- Split by input symbols, not syllables
- Tuple length = input length + 1, for all outputs

  Input:  bantodibo
  Output: bantondibo
          bantodimbon

Alignment to Input

- Split by input symbols, not syllables
- Tuple length = input length + 1, for all outputs

  Input:  bantodibo
  Output: bantondibo
          bantodimbon
          ibantondimtimbonnn

Alignment to Input

- Split by input symbols, not syllables
- Tuple length = input length + 1, for all outputs

  Input:  bantodibo
  Output: ibantondimtimbonnn

- Unbounded insertion does not count as "postponing" an n
- So this candidate doesn't win (thanks to alignment)
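The bucketing idea can be sketched directly: walk the aligned (input, output) pairs, advancing the bucket index only on real input symbols, so violations on epenthesized symbols accrue to the current input position. `aligned_tuple` and its arguments are hypothetical names for illustration, not the paper's notation.

```python
# Sketch: bucket output violations by input position, per the alignment.
# Each aligned pair is (input_symbol_or_None, output_symbol); None marks
# an epenthesized (inserted) output symbol.
def aligned_tuple(pairs, violates):
    """Return a (len(input)+1)-tuple of violation counts per input position."""
    n = sum(1 for inp, _ in pairs if inp is not None)  # input length
    counts = [0] * (n + 1)
    pos = 0
    for inp, out in pairs:
        if inp is not None:
            pos += 1                    # advance only on real input symbols
        counts[pos] += violates(out)    # insertions stay in the current bucket
    return tuple(counts)

# Toy constraint: every output "n" is a violation.
print(aligned_tuple([("b", "b"), ("a", "a"), (None, "n")], lambda c: c == "n"))
print(aligned_tuple([(None, "n"), ("b", "b"), ("a", "a")], lambda c: c == "n"))
```

The first call yields (0, 0, 1) and the second (1, 0, 0): an inserted symbol is pinned to the input position where it occurs, so insertion cannot create new, later positions in which to hide a violation.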

Finite-State Approach

  input → Gen → Constraint 1 → Constraint 2 → Constraint 3 → ... → output

- T0 = Gen
- T1 maps each input to all outputs that survive constraint 1
- T2, T3 likewise
- T3 = the full grammar

Finite-State Approach

- The FST maps each input to a set of outputs (nondeterministic mapping)
- The transducer gives an alignment

  Input:  bantodibo        (T2 FST)
  Output: ibantondimtimbonnn

Finite-State Machines

- The FST maps each input to a set of outputs

  Input:  bantodibo        (T2 FST)
  Output: ibantondimtimbonnn

Finite-State Machines

- The FST maps each input to a set of aligned outputs
- A constraint is a weighted FSA that reads the candidate

  Input:  bantodibo        (T2 FST)
  Output: ibantondimtimbonnn    (read by the NoCoda WFSA)

Finite-State Machines

- The FST maps the input to aligned candidates (nondeterministically)
- A constraint is a weighted FSA that reads the candidate

  Input:  bantodibo        (T2 FST)
  Output: ibantondimtimbonnn    (read by the NoCoda WFSA)

Finite-State Machines

- The FST maps the input to aligned candidates (nondeterministically)
- A constraint is a weighted FSA that reads the candidate
- Sum the weights of aligned substrings to get our tuple

  Input:  bantodibo        (T2 FST)
  Output: ibantondimtimbonnn    (read by the NoCoda WFSA)

- Remark: OTFS would just count a total of 7 violations

Similar Work

- Bounded Local Optimization
  - Walther 1998, 1999 (for DP); Trommer 1998, 1999 (for OT)
  - An independent proposal
  - Motivated by directional syllabification
  - Greedy pruning of a candidate-set FSA
  - Violations with different prefixes are incomparable
  - No alignment, so insertion can postpone violations
  - No ability to handle multiple inputs at once (FST)

Outline

- Review of Optimality Theory
- The new "directional constraints" idea
  - Linguistically: fits the facts better
  - Computationally: removes excess power
- Formal stuff
  - The proposal
  - Compilation into finite-state transducers
  - Expressive power of directional constraints

The Construction

- Our job is to construct T3, a "filtered" version of T2
- First compose T2 with NoCoda …

  Input:  bantodibo        (T2 FST)
  Output: ibantondimtimbonnn    (read by the NoCoda WFSA)

The Construction

- Our job is to construct T3, a "filtered" version of T2
- First compose T2 with NoCoda to get a weighted FST

  Input:  bantodibo        (WFST)
  Output: ibantondimtimbonnn

The Construction

- Our job is to construct T3, a "filtered" version of T2
- First compose T2 with NoCoda to get a weighted FST
- Now prune this weighted FST to obtain T3
- Keep only the paths with minimal tuples: Directional Best Paths

  Input:  bantodibo        (WFST)
  Output: ibantondimtimbonnn

Directional Best Paths (sketch)

- Handle all inputs simultaneously!
- Must keep the best outputs for each input, at least 1:
  - For input abc: abc beats axc
  - For input abd: only axd exists
- Must allow the b:x arc just if the next input is d
- In this case, just make state 6 non-final

  (machine arcs: a:a, b:b, b:x, c:c, d:d)
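The pruning step can be illustrated extensionally on a finite relation: for each input, keep only the outputs whose violation tuple is lexicographically minimal. The actual construction operates on the machine itself, handling all inputs at once; `directional_best` here is a hypothetical name over an enumerated relation, not the paper's algorithm.

```python
# Sketch: directional best paths over an enumerated relation.
from collections import defaultdict

def directional_best(triples):
    """Keep, for each input, the outputs with lexicographically minimal tuple."""
    best = defaultdict(list)
    for inp, out, tup in triples:
        if not best[inp] or tup < best[inp][0][1]:
            best[inp] = [(out, tup)]          # strictly better: replace
        elif tup == best[inp][0][1]:
            best[inp].append((out, tup))      # tie: keep both
    return {i: [o for o, _ in outs] for i, outs in best.items()}

# The slide's example: for input abc, abc beats axc; for abd, only axd exists.
relation = [
    ("abc", "abc", (0, 0, 0)),
    ("abc", "axc", (0, 1, 0)),
    ("abd", "axd", (0, 1, 0)),
]
pruned = directional_best(relation)
print(pruned)   # {'abc': ['abc'], 'abd': ['axd']}
```

Note that axd survives for input abd even though the same tuple loses for input abc: pruning is per input, which is why the machine construction must keep the b:x arc alive just in case the input continues with d.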

Directional Best Paths (sketch)

- Must pursue counterfactuals
- Recall determinization (2^n states)
  - The DFA simulates a parallel traverser of the NDFA: "What states could I be in, given the input so far?"
- Simulate a neurotic traverser of the WFST: "If I had taken a cheaper (greedier) path on the input so far, what states could I be in right now?"
  - Shouldn't proceed to state q if there was a cheaper path to q on the same input
  - Shouldn't terminate in state q if there was a cheaper terminating path (perhaps to state r) on the same input
- 3^n states: track state-sets for equal and cheaper paths

Outline

- Review of Optimality Theory
- The new "directional constraints" idea
  - Linguistically: fits the facts better
  - Computationally: removes excess power
- Formal stuff
  - The proposal
  - Compilation into finite-state transducers
  - Expressive power of directional constraints

Expressive Power

- Three classes: bounded constraints, traditional (summing) constraints, directional constraints
- A traditional constraint with > FST power can't be replaced by any system of directional constraints
- A directional constraint making exponentially many distinctions can't be replaced by any system of traditional finite-state constraints
  - Example: *b (L-R) sorts {a,b}^n alphabetically
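The final claim can be checked concretely for small n: under left-to-right evaluation, *b assigns each string of {a,b}^n a violation tuple (one violation per position holding b), and the lexicographic order on those tuples coincides with alphabetical order on the strings. A small sketch:

```python
# Sketch: directional *b (L-R) orders {a,b}^n alphabetically.
from itertools import product

def star_b(s):
    """Violation tuple for *b: one violation at each position holding b."""
    return tuple(1 if ch == "b" else 0 for ch in s)

strings = ["".join(p) for p in product("ab", repeat=3)]
assert sorted(strings, key=star_b) == sorted(strings)  # alphabetical order
print("directional *b sorts {a,b}^3 alphabetically")
```

The mapping a → 0, b → 1 is order-preserving, so comparing tuples position by position is exactly comparing strings letter by letter: a single directional constraint makes 2^n distinctions among the 2^n candidates.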

Future Work

- Further empirical support?
  - Examples where 1 early violation trades against 2 late violations of the same constraint?
- How do directional constraints change the style of analysis?
- How to formulate constraint families? (They must specify precisely where violations fall.)

An Old Slide (1997)

  FST  <  OTFS  <  OTFS + GA

- Should we pare OT back to the FST level? Hard to imagine making it any simpler than Primitive OT.
- OTFS has the same power as Primitive OT (formal linguistic proposal of Eisner 1997)
- Should we beef OT up to the OTFS + GA level, by allowing GA? Ugly mechanisms like GA weren't needed before OT.

The New Idea (2000)

  FST  <  OTFS (summation)  <  OTFS + GA

- With directionality instead of summation, OT = FST
- Should we pare OT back to the FST level? Hard to imagine making it any simpler than Primitive OT.
- OTFS has the same power as Primitive OT (formal linguistic proposal of Eisner 1997)
- Should we beef OT up to the OTFS + GA level, by allowing GA? Ugly mechanisms like GA weren't needed before OT.