A Short History of Two-Level Morphology Lauri Karttunen, Xerox PARC Kenneth R. Beesley, XRCE.


1 A Short History of Two-Level Morphology. Lauri Karttunen, Xerox PARC; Kenneth R. Beesley, XRCE

2 Lauri Karttunen / 24 Aug 2001 / page 2 Overview. Introduction: What is morphology?; Two strains of finite-state morphology; State of the art circa 1980. Two-Level Morphology: Origins, basic idea; Implementations, compilers. Recent Developments.

3 What is Morphology? Morphosyntax: words are composed of smaller units of meaning called morphemes that must be combined in a certain order (piti-less-ness vs. *piti-ness-less). Morphological alternations: the shape of a morpheme depends on its environment (piti-less vs. *pity-less).

4 Sequential Model. [Diagram: Lexical form, fst 1, Intermediate form, fst 2, ..., fst n, Surface form] An ordered sequence of rewrite rules (Chomsky & Halle ‘68) can be modeled by a cascade of finite-state transducers (Johnson ‘72, Kaplan & Kay ‘81).

5 Parallel Model. A set of parallel two-level rules, compiled into finite-state automata that are interpreted as transducers (Koskenniemi ‘83). [Diagram: fst 1, fst 2, ..., fst n side by side between Lexical form and Surface form]

6 Sequential vs. Parallel. [Diagram: the cascade fst 1, fst 2, ..., fst n (lexical, intermediate, and surface forms) is combined by composition ("compose"); the parallel rules fst 1, fst 2, ..., fst n (lexical and surface forms) are combined by intersection ("intersect"); both yield a single FST] Perhaps too large to be practical.

7 State of the Art circa 1980. Cut-and-paste analysis (leaves --> leave --> leav --> leaf): ad-hoc programs, not reversible for generation. Paradigm tables (comprendre 45): not reversible for analysis, impractical for morphologically complex languages. Chomsky-Halle rewrite rules (x -> y / z _ w): computationally complex, no implementation, reversible?

8 Discovery and Rediscovery. C. Douglas Johnson (1972) showed that (1) phonological rewrite rules are interpreted in a way that makes them less powerful than they appear; (2) rewrite rules can be modeled by finite transducers; (3) for any two finite transducers applied in a sequence there exists an equivalent single transducer (Schützenberger 1961). Johnson’s result was ignored and forgotten, then rediscovered by Ronald M. Kaplan and Martin Kay at Xerox around 1980.

9 Sequential Application. Rule 1: N -> m / _ p. Rule 2: p -> m / m _. Derivation: k a N p a n (lexical), k a m p a n (intermediate), k a m m a n (surface).
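The derivation on this slide can be traced with a small sketch. It assumes the two rules can be approximated by regular-expression substitutions; the function names are invented for illustration, and this is not a finite-state transducer implementation.

```python
import re

# Toy versions of the two rewrite rules, written as regex substitutions.
# The function names are hypothetical; the rules themselves are from the slide.

def rule_n_to_m(form: str) -> str:
    """N -> m / _ p : rewrite N as m when a p follows."""
    return re.sub(r"N(?=p)", "m", form)

def rule_p_to_m(form: str) -> str:
    """p -> m / m _ : rewrite p as m when an m precedes."""
    return re.sub(r"(?<=m)p", "m", form)

def apply_cascade(lexical: str) -> list[str]:
    """Apply the rules in order and return every level of the derivation."""
    intermediate = rule_n_to_m(lexical)
    surface = rule_p_to_m(intermediate)
    return [lexical, intermediate, surface]

print(apply_cascade("kaNpan"))  # ['kaNpan', 'kampan', 'kamman']
```

The order matters: the second rule only fires because the first one has already produced the m that serves as its left context.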

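10 Sequential Application in Detail. [Diagram: a transducer for the N:m rule (states 0, 1, 2) and a transducer for the p:m rule (states 0, 1). The first maps k a N p a n to k a m p a n with state trace 0 0 0 2 0 0 0; the second maps k a m p a n to k a m m a n with state trace 0 0 0 1 0 0 0.]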

11 Composition. [Diagram: a single transducer (states 0 to 3) composed from the two rule transducers. It maps k a N p a n directly to k a m m a n with state trace 0 0 0 3 0 0 0.]

12 Building a Compiler. Requires a finite-state calculus: concatenation, union, intersection, complementation, ... Constraints are regular languages. Example: “if p occurs then q follows” (... p ... q ...). The expression ?* p ?* q ?* requires a p to occur, so it does not express the conditional; the constraint is ~[ ?* p ~[ ?* q ?* ]]. The idea of double negation was Kaplan and Kay’s first insight. Many details remained to be worked out.
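A minimal sketch of the double-negation idea, assuming plain Python strings stand in for the regular languages on the slide (the function names are made up for illustration). The inner function recognizes the "bad" strings, and the constraint is their complement.

```python
import re

def in_naive_language(s: str) -> bool:
    """Membership in ?* p ?* q ?* : some p with some q after it."""
    return re.search(r"p.*q", s) is not None

def p_without_later_q(s: str) -> bool:
    """Membership in ?* p ~[ ?* q ?* ] : some p has no q anywhere after it."""
    return any(ch == "p" and "q" not in s[i + 1:] for i, ch in enumerate(s))

def satisfies_constraint(s: str) -> bool:
    """Membership in ~[ ?* p ~[ ?* q ?* ]] : no p is left without a later q."""
    return not p_without_later_q(s)

for s in ["abc", "apbq", "aqbp", "pqp"]:
    print(s, in_naive_language(s), satisfies_constraint(s))
```

The two languages differ in both directions: a string with no p at all satisfies the constraint but is not in the naive language, and a string such as pqp is in the naive language but violates the constraint because its last p has no q after it.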

13 The Problem of “Overanalysis”. [Diagram: analyzed by running the rules in reverse, the surface form k a m m a n yields the candidate lexical forms k a m m a n, k a m p a n, and k a N p a n.]
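The overanalysis can be reproduced with a brute-force sketch. It assumes the two rewrite rules N -> m / _ p and p -> m / m _ from the sequential-application slide, approximated as regex substitutions, and it inverts them by trying every lexical source for each surface m. The function names are hypothetical.

```python
import re
from itertools import product

def generate(lexical: str) -> str:
    """Apply N -> m / _ p, then p -> m / m _ (regex approximation)."""
    intermediate = re.sub(r"N(?=p)", "m", lexical)
    return re.sub(r"(?<=m)p", "m", intermediate)

def analyze(surface: str) -> list[str]:
    """Return every lexical candidate that the cascade maps to `surface`."""
    # A surface m may come from a lexical m, N, or p; other symbols are unchanged.
    options = [("m", "N", "p") if ch == "m" else (ch,) for ch in surface]
    candidates = ("".join(choice) for choice in product(*options))
    return sorted(c for c in candidates if generate(c) == surface)

print(analyze("kamman"))  # ['kaNpan', 'kamman', 'kampan']
```

Without a lexicon to filter them, all three candidates are equally valid analyses of the surface form.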

14 The Birth of Two-Level Morphology. In the spring of 1981 Kimmo Koskenniemi came to UT at Austin in search of a dissertation topic. Karttunen demoed his TEXFIN analyzer/generator for Finnish. Kaplan and Kay briefed him about their discoveries. Koskenniemi visited PARC. After a gestation period of about a year, two-level morphology was born.

15 The Three Ideas of Two-Level Morphology. (1) Rules are symbol-to-symbol constraints that are applied in parallel, not sequentially like rewrite rules. (2) The constraints can refer to the lexical context, to the surface context, or to both contexts at the same time. (3) Lexical lookup and morphological analysis are performed in tandem.

16 Two-Level Constraints 1. [Diagram: lexical k a N p a n over surface k a m m a n, shown once for each rule] The N:m correspondence requires a following p on the lexical side. In this context, all other possible realizations of a lexical N are prohibited. The p:m correspondence requires a preceding m on the surface side. In this context, all other possible realizations of a lexical p are prohibited.
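The two constraints can be sketched as independent checks over a sequence of lexical:surface pairs. This is an illustration under stated assumptions, not Koskenniemi's implementation: the set of feasible pairs (identity pairs plus a default N:n) and all function names are invented here, and both rules are read as two-way constraints, as the slide text suggests.

```python
# Feasible lexical:surface pairs. The identity pairs and the default N:n
# are assumptions made for this sketch; N:m and p:m come from the slide.
FEASIBLE = {(c, c) for c in "kampn"} | {("N", "m"), ("N", "n"), ("p", "m")}

def n_m_rule(pairs, i):
    """N:m if and only if a p follows on the lexical side."""
    lex, surf = pairs[i]
    if lex != "N":
        return True
    p_follows = i + 1 < len(pairs) and pairs[i + 1][0] == "p"
    return (surf == "m") == p_follows

def p_m_rule(pairs, i):
    """p:m if and only if an m precedes on the surface side."""
    lex, surf = pairs[i]
    if lex != "p":
        return True
    m_precedes = i > 0 and pairs[i - 1][1] == "m"
    return (surf == "m") == m_precedes

def accepts(lexical: str, surface: str) -> bool:
    """True if every pair is feasible and every rule holds at every position."""
    if len(lexical) != len(surface):
        return False
    pairs = list(zip(lexical, surface))
    if any(pair not in FEASIBLE for pair in pairs):
        return False
    return all(rule(pairs, i)
               for i in range(len(pairs))
               for rule in (n_m_rule, p_m_rule))

print(accepts("kaNpan", "kamman"))  # True
print(accepts("kaNpan", "kampan"))  # False: p after a surface m must be m
```

Neither rule sees an intermediate form. The p:m rule finds its m on the surface side even though that m is itself the realization of a lexical N.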

17 Two-Level Constraints 2. [Diagram: lexical s p y 0 + s over surface s p i e 0 s, shown once for each rule] The y:i correspondence occurs in the context _ 0:e. The 0:e correspondence occurs in the context Cons: y: _ +: s:

18 Parallel Application. [Diagram: the N:m rule and the p:m rule side by side, between lexical k a N p and surface k a m m a n]

19 Lookup and Analysis in Tandem. [Diagram: lexical k a N p at the top, the N:m rule and the p:m rule in the middle, surface k a m m a n at the bottom]
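One way to picture lookup and analysis in tandem is a search that extends the lexical string one symbol at a time, abandoning a branch as soon as either the lexicon or a rule rejects it. The sketch below makes several assumptions: the three-word lexicon is hypothetical, the rules are the N:m and p:m constraints read as two-way conditions, and a prefix set stands in for a real lexicon tree.

```python
LEXICON = {"kaNpan", "kampi", "kala"}  # hypothetical entries for illustration
PREFIXES = {word[:i] for word in LEXICON for i in range(len(word) + 1)}

# Possible lexical sources for a surface symbol (an assumption of this sketch).
LEXICAL_SOURCES = {"m": ("m", "N", "p"), "n": ("n", "N")}

def rules_allow(lexical: str, surface: str) -> bool:
    """Check N:m and p:m on a possibly partial lexical string."""
    complete = len(lexical) == len(surface)
    for j, (lex, surf) in enumerate(zip(lexical, surface)):
        # p:m if and only if an m precedes on the surface side.
        if lex == "p" and (surf == "m") != (j > 0 and surface[j - 1] == "m"):
            return False
        # N:m if and only if a p follows on the lexical side; the check is
        # deferred until the next lexical symbol is known.
        if lex == "N" and (j + 1 < len(lexical) or complete):
            if (surf == "m") != (lexical[j + 1:j + 2] == "p"):
                return False
    return True

def analyze(surface: str) -> list[str]:
    """Return lexicon entries whose pairing with `surface` satisfies the rules."""
    results = []

    def step(lexical: str) -> None:
        if lexical not in PREFIXES or not rules_allow(lexical, surface):
            return  # pruned by the lexicon or by a rule
        if len(lexical) == len(surface):
            if lexical in LEXICON:
                results.append(lexical)
            return
        symbol = surface[len(lexical)]
        for lex in LEXICAL_SOURCES.get(symbol, (symbol,)):
            step(lexical + lex)

    step("")
    return results

print(analyze("kamman"))  # ['kaNpan']
```

Because the lexicon prunes the search, the spurious candidates kamman and kampan never survive: neither is a prefix path to a complete entry in this toy lexicon.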

20 Two-Level Implementations. 1982: Koskenniemi (Pascal). 1983: Karttunen et al. at UTexas (Lisp). 1986-: Antworth et al. at SIL (C). 1987: Black et al., Alvey Project (Lisp). 1989: Beesley, Alpnet (Lisp). 1991: Pulman et al., ALEP (Prolog). 1995: Carter, SRI CLE (Prolog). 1995: Petitpierre et al., MULTEXT (C).

21 Two-Level Rule Compilers. 1985: Kaplan and Koskenniemi, the basic compilation algorithm, developed during Koskenniemi’s visit to CSLI at Stanford on a Dandelion (Xerox Lisp machine). It was based on the techniques Kaplan and Kay had developed for compiling rewrite rules. 1985-87: Koskenniemi and Karttunen, the first compiler. 1992: the current C version (twolc) by Karttunen and Beesley. 1996: Grimley-Evans, Kiraz, Pulman, a compiler for a “partition-based” two-level formalism.

22 Seeds of Dissatisfaction. Two-level morphological analyzers became a standard component in natural language processing systems. But there was no publicly available compiler until recently. Morphotactics was “improved” by adding feature unification. Two-level analyzers acquired a reputation for being slow. Two-level rules are notoriously difficult to write, even with a compiler.

23 Rule Conflicts. General rule: k:0 in the context Vowel _ Vowel (makun, ma un). Exception: k:v in the context u _ u (pukun, puvun). Resolution by underspecification: the general rule becomes k:0 | k:v in the context Vowel _ Vowel, alongside k:v in the context u _ u.
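The conflict and its resolution can be sketched as follows. The slide does not show the rule operators, so this sketch assumes both rules are two-way constraints with contexts on the lexical side, writes the deleted k as the zero symbol 0, and uses a reduced vowel set; all names are hypothetical.

```python
from itertools import product

VOWELS = set("aeiou")  # reduced vowel set, enough for the two examples

def make_rule(realizations, left, right):
    """Two-way rule: a lexical k between `left` and `right` must take one of
    `realizations`, and those realizations of k occur only in that context."""
    def rule(lexical, surface, i):
        if lexical[i] != "k":
            return True
        in_context = (0 < i < len(lexical) - 1
                      and lexical[i - 1] in left and lexical[i + 1] in right)
        return in_context == (surface[i] in realizations)
    return rule

general_strict = make_rule({"0"}, VOWELS, VOWELS)          # k:0, Vowel _ Vowel
general_loose = make_rule({"0", "v"}, VOWELS, VOWELS)      # k:0 | k:v
exception = make_rule({"v"}, {"u"}, {"u"})                 # k:v, u _ u

def realize(lexical, rules):
    """Return every surface form that all rules accept for `lexical`."""
    options = [("k", "0", "v") if ch == "k" else (ch,) for ch in lexical]
    surfaces = ("".join(choice) for choice in product(*options))
    return [s for s in surfaces
            if all(rule(lexical, s, i)
                   for rule in rules for i in range(len(lexical)))]

print(realize("pukun", [general_strict, exception]))  # [] : the rules conflict
print(realize("pukun", [general_loose, exception]))   # ['puvun']
print(realize("makun", [general_loose, exception]))   # ['ma0un']
```

With the strict general rule, no realization of pukun satisfies both rules. Loosening the general rule to allow either k:0 or k:v leaves the choice to the exception, which selects k:v between two u's and excludes it everywhere else.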

24 Recent Developments. The pioneers of finite-state morphology knew that a cascade of transducers or a set of parallel rules could be combined into a single transducer. But the resulting single transducer is typically huge compared to the size of the original rule networks, and impractical in most cases. The obvious solution, not seen for a long time, was to compose the rules with the lexicon.

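25 Lexical Transducer (Karttunen, Kaplan, Zaenen 1992). [Diagram: the rules R1, R2, ..., Rn are combined by intersection (&), and the result is combined with the source lexicon by composition (o) to give the lexical transducer. Example: canonical form and inflection codes s p y 0 +Noun +PL on the lexical side, inflected form s p i e 0 s on the surface side.]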

26 Cascade of Compositions. [Diagram: the source lexicon is combined by successive compositions (o) with replace rules (R1, ..., Rn) and constraints (Ci, Cj) to give the lexical transducer.]

27 Linguistic Issues. The idea of rules as parallel constraints was not picked up by mainstream linguists in the 80’s. Many arguments had been advanced to show that phonological alternations could not be described or explained without sequential rewrite rules. The two-level model was perceived as a computational “hack”, not worthy of academic interest.

28 Rise of Optimality Theory. Optimality Theory, the dominant paradigm in phonology since 1993, is a two-level model with parallel constraints. Most optimality constraints can be encoded trivially as two-level rules. The main difference is that OT constraints are ranked and violable.

29 Back to the Big Picture. [Diagram: the sequential model (Lexical form, fst 1, Intermediate form, fst 2, ..., fst n, Surface form) next to the parallel model (fst 1, fst 2, ..., fst n between Lexical form and Surface form)] While the sequential model was popular among mainstream linguists, computational linguists preferred the parallel model. Now it is almost the other way round, although for computational linguists there is no substantive difference.

