Functional Morphology by Markus Forsberg & Aarne Ranta
Otakar Smrž
Institute of Formal and Applied Linguistics, Charles University in Prague



Functional Morphology (February 3, 2005, Linguistic Data Consortium)
• Implementing morphological models
• Programming environment within Haskell
• Extensible, powerful, language-independent
• Markus Forsberg & Aarne Ranta
• Chalmers University of Technology
• September 2004, International Conference on Functional Programming
• Inspired by Gérard Huet's toolkit Zen
• Computational processing of Sanskrit, 2002

Outline of the Talk
• A little bit of theory and research
• Karttunen, Stump, Buckwalter, Maxwell, Huet
• Finite-state modeling of morphology
• Regular relations, finite-state transducers
• Two-level morphology, lexicons and grammars
• Functional Morphology
• Features, concepts, implementation issues
• Demo of the system: formats, applications
• Meeting requirements of different languages

Linguistic Perspective
• Inflectional morphology is understood in various ways (Stump 2001)
• Description of the inflectional processes
    Inferential: rules, paradigms
    Lexical: decomposition, affixation
• Preferred direction of consideration
    Realizational: forms reflect parameters
    Incremental: morphs identify features

Decisive Evidence
• Extended morphological exponence
• One or more markings of a single property
• Null morphological exponence
• Composition/decomposition not equivalent
• Non-concatenative inflection
• Why restrict morphological operations to concatenation?
    good < better << best             * good|er << good|est
    dobr|ý < lep|ší << nej|lep|ší     * dobř|ejší << nej|dobř|ejší

Computational Concern
• Morphology can be captured by finite-state networks (Beesley and Karttunen 2003)
    Implementation: regular expressions, right linear grammars
    Complexity: linear runtime, advanced compilation techniques
    Efficiency: fast, but large networks
• Non-regular formalisms might be difficult to implement efficiently enough

Efficiency vs. Expressivity
• Xerox Finite-State Tools like xfst, lexc
• Languages of Europe, Arabic, Korean, Malay
• AT&T, Inxight, ..., open-source FS tools
• Hybrid systems: Buckwalter's Analyzer
• DATR/KATR, MORPHE, Hermit Crab, …
• Functional Morphology in Haskell, Zen in Objective Caml: compiled into tries

Languages as Networks
• Languages are sets of sequences of symbols
• Networks with a limited number of states
• Sequences of symbols recorded in arcs
[Figure: two network diagrams whose arcs are labeled with the letters n, i, c, e, r, g, h, t]
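
The idea of recording symbol sequences in arcs can be sketched with a trie, which is one simple kind of such a network. This is only an illustration, not code from FM or Zen: the names `Trie`, `insert`, `fromWords`, `accepts` and `size` are assumptions made for the example, and the word list is borrowed from the regular-expression slide.

```haskell
-- A trie read as a finite-state network: every node is a state,
-- every labeled edge is an arc, and the Bool marks accepting states.
data Trie = Trie Bool [(Char, Trie)]

emptyTrie :: Trie
emptyTrie = Trie False []

-- Insert one word, reusing the arcs of any prefix already present.
insert :: String -> Trie -> Trie
insert []     (Trie _ arcs)   = Trie True arcs
insert (c:cs) (Trie end arcs) =
  case lookup c arcs of
    Nothing  -> Trie end ((c, insert cs emptyTrie) : arcs)
    Just sub -> Trie end ((c, insert cs sub) : filter ((/= c) . fst) arcs)

fromWords :: [String] -> Trie
fromWords = foldr insert emptyTrie

-- Follow the arcs symbol by symbol; accept only in a final state.
accepts :: Trie -> String -> Bool
accepts (Trie end _)  []     = end
accepts (Trie _ arcs) (c:cs) = maybe False (`accepts` cs) (lookup c arcs)

-- Number of states in the network, including the start state.
size :: Trie -> Int
size (Trie _ arcs) = 1 + sum (map (size . snd) arcs)

-- Hypothetical sample language.
sample :: Trie
sample = fromWords ["nicer", "night", "higher", "height"]
```

A trie shares only prefixes; a minimized network would also share suffixes such as "er" and "ight", which is why compiled networks can be smaller than a plain word list.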

REs and RLGs
• Regular expressions describe such networks
    L = (nicer|night|higher|height)          listing
      = (ni(cer|ght)|h(eight|igher))         prefix trie
      = ((nic|high)er|(n|he)ight)            suffix trie
• Right linear grammars / lexicons do as well
    ADJ -> {nice,high,happy}{CMP,{}}   where CMP -> {+er}
    deriving from L' -> {ADJ,{}}
    L' = {nice,nice+er,happy+er,high,…}
    or even {nice/ADJ+er/CMP,high/ADJ,…}
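
That the three expressions above denote the same language can be checked mechanically. The sketch below assumes a small star-free expression type (`RE`, `language` and `sameLanguage` are names invented for this example, not part of any toolkit mentioned in the talk) and simply enumerates the finite languages.

```haskell
import Data.List (nub, sort)

-- A star-free fragment of regular expressions, enough for finite word lists.
data RE = Lit String | Alt [RE] | Cat [RE]

-- Enumerate the finite language denoted by an expression.
language :: RE -> [String]
language (Lit s)  = [s]
language (Alt rs) = concatMap language rs
language (Cat rs) = map concat (mapM language rs)  -- all combinations, in order

-- Two expressions are equivalent if they denote the same set of strings.
sameLanguage :: RE -> RE -> Bool
sameLanguage a b = sort (nub (language a)) == sort (nub (language b))

listing, prefixForm, suffixForm :: RE
listing    = Alt (map Lit ["nicer", "night", "higher", "height"])
prefixForm = Alt [ Cat [Lit "ni", Alt [Lit "cer", Lit "ght"]]
                 , Cat [Lit "h",  Alt [Lit "eight", Lit "igher"]] ]
suffixForm = Alt [ Cat [Alt [Lit "nic", Lit "high"], Lit "er"]
                 , Cat [Alt [Lit "n", Lit "he"], Lit "ight"] ]

-- The right linear grammar ADJ -> {nice,high,happy}{CMP,{}}, CMP -> {+er}.
adj :: RE
adj = Cat [Alt (map Lit ["nice", "high", "happy"]), Alt [Lit "+er", Lit ""]]
```

Enumeration only works because these languages are finite; real finite-state tools compare networks instead of listing strings.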

Regular Relations
• Networks can convert input into output
• Two languages: lexical/upper : surface/lower
    L" = {nice/ADJ+er/CMP:nicer, high/ADJ:high,
          happy/ADJ+er/CMP:happier, …}          regular relation
• Invertible structure, analysis iff synthesis
• Networks can be composed one over another
• Building relations is not trivial!
• Two-level rules for orthographical alternations
• All information merges into an untyped string
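
Inversion and composition can be made concrete on a finite relation stored as a list of pairs. This is a toy stand-in for a transducer, assuming nothing beyond the three pairs listed on the slide; the function names are chosen for the example.

```haskell
-- A finite relation between lexical (upper) and surface (lower) strings.
type Relation = [(String, String)]

relation :: Relation
relation =
  [ ("nice/ADJ+er/CMP",  "nicer")
  , ("high/ADJ",         "high")
  , ("happy/ADJ+er/CMP", "happier")
  ]

-- Synthesis maps an upper string to its surface forms.
synthesize :: Relation -> String -> [String]
synthesize rel upper = [l | (u, l) <- rel, u == upper]

-- Analysis maps a surface form back to its upper strings.
analyze :: Relation -> String -> [String]
analyze rel lower = [u | (u, l) <- rel, l == lower]

-- Swapping the two sides turns an analyzer into a synthesizer and back.
invert :: Relation -> Relation
invert rel = [(l, u) | (u, l) <- rel]

-- Composition feeds the lower side of one relation into the upper side of another.
compose :: Relation -> Relation -> Relation
compose r s = [(a, c) | (a, b) <- r, (b', c) <- s, b == b']
```

Both sides are plain strings here, which illustrates the last point on the slide: tags such as /ADJ are just characters, and nothing stops a malformed upper string from entering the relation.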

Not Only Finite-State (Beesley)
• Flag diacritics vs. network multiplication
    ~$[ Art%+ ?* %+Indef ].o.          the filter in xfst
• lecture-notes/Beesley2004/thupm.html

Burning Issues (Karttunen)
• Non-concatenative phenomena like interdigitation or reduplication
• Non-local dependencies
• Syntax/morphology interface
• Handouts/karttunen.ppt

More Burning Issues
• Does direct coding allow one to implement one's linguistic abstraction adequately?
• Correspondence of formulations, expressivity
• Is the model extensible and reusable?
• How much will it cost to add a lexical item?
• Will refinement of information require global re-design, and/or will it cause inconsistencies?
• How can it be integrated into applications?
• API and GUI interfaces, modularity, openness

Why Functional
• Purely functional programming language Haskell
• Higher-order functions, type classes, polymorphism
• Linguistic process ~ function on entities of the given description
• Distinction between functions and forms in a language
• Inflectional morphology may extend to derivational
• Decomposition: phonology, orthography, grammar, …
• Excellent progressive functionality
• FM provides high-level interfaces for concrete models
• Inferential-realizational generality & freedom of speech

Why Morphology
• Methodology for developing similar models
• Paradigms, inflectional + inherent parameters
• Embedded domain-specific language
• Collection of morphology implementations
• Swedish, Spanish, Russian, Italian, Latin
• The Zen Computational Linguistics Toolkit
• Grammatical Framework
• FST Studio

FM Architecture
• The language model
    Types: meta-information
    Functions: tables/rules
    Lexicons: classified units
• Provisions by FM
    Dictionary compilation
    Runtime applications
    Data export utilities
[Diagram labels: The Model, FM Library, Dictionary, Synthesizer, Analyzer, Exporter; regions marked "Linguist-dependent" and "Linguist-independent / FM-generated"]

Inflection Tables & Parameters
• Inflection described by finite functions
• Analogy shown on a selected instance of the given group
• Realization of inflectional parameters yields the word form

    rosa          Singular    Plural
    Nominative    rosa        rosae
    Vocative      rosa        rosae
    Accusative    rosam       rosas
    Genitive      rosae       rosarum
    Dative        rosae       rosis
    Ablative      rosa        rosis

Inherent Properties & Classes
• How do I describe words' non-inflectional properties, i.e. inherent parameters?
• Design word classes that refine the inflectional groups, and characterize them
• Lexicon associates lemmas with the classes
• Dictionary lists the expanded information

Parameters in FM/Haskell
• Parameters take their distinct type of values
• Values are constructed by symbolic names

    data Case     = Nominative | Genitive | Accusative
                  | Ablative | Dative | Vocative
    data Number   = Singular | Plural
    data Gender   = Feminine | Neuter | Masculine
    -- Number comes first, then Case, matching how the paradigm
    -- function pattern-matches and is called on the later slides.
    data NounInfl = NounInfl Number Case
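
Because each parameter type is a finite enumeration, all combinations of inflectional parameters can be listed automatically. The sketch below assumes that `NounInfl` takes the number first and the case second, as the paradigm code on the following slides uses it; the `deriving` clauses and the name `allNounInfl` are additions made for this example.

```haskell
data Case   = Nominative | Genitive | Accusative | Ablative | Dative | Vocative
  deriving (Show, Eq, Enum, Bounded)

data Number = Singular | Plural
  deriving (Show, Eq, Enum, Bounded)

-- Assumed argument order: number, then case.
data NounInfl = NounInfl Number Case
  deriving (Show, Eq)

-- Every cell of a noun inflection table, generated from the types themselves.
allNounInfl :: [NounInfl]
allNounInfl =
  [ NounInfl n c
  | n <- [minBound .. maxBound]
  , c <- [minBound .. maxBound]
  ]
```

Adding a new case or number to the data type extends `allNounInfl` without further changes, which is one reason to encode parameters as types instead of strings.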

Paradigm Definition
• Using functions with type signatures

    ourParadigm :: String -> NounInfl -> String
    ourParadigm rosa (NounInfl n c) =
      let rosae = rosa ++ "e"
          rosis = init rosa ++ "is"
      in case n of
           Singular -> case c of
             Accusative -> rosa ++ "m"
             Genitive   -> rosae
             Dative     -> rosae
             _          -> rosa
           -- next slide

Paradigm Definition (continued)

           -- continued
           Plural -> case c of
             Nominative -> rosae
             Vocative   -> rosae
             Accusative -> rosa ++ "s"
             Genitive   -> rosa ++ "rum"
             _          -> rosis
           -- where rosis = init rosa ++ "is"

• How, when and what does it compute?

    ourParadigm "barba" (NounInfl Plural Genitive)   -- "barbarum"
    ourParadigm "dea" (NounInfl Plural Dative)       -- "deis"

  The second result is not correct Latin: we misused the paradigm.
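
Put together as one compilable unit, the paradigm from the two slides looks as follows. The `deriving` clauses and the helper `withExceptions` are hypothetical additions for this sketch, showing one way to repair the "dea" case by overriding individual cells; FM's own mechanism for exceptions may look different.

```haskell
data Case   = Nominative | Genitive | Accusative | Ablative | Dative | Vocative
  deriving (Show, Eq)
data Number = Singular | Plural
  deriving (Show, Eq)
data NounInfl = NounInfl Number Case
  deriving (Show, Eq)

-- The paradigm from the slides, keyed on the model word "rosa".
ourParadigm :: String -> NounInfl -> String
ourParadigm rosa (NounInfl n c) =
  let rosae = rosa ++ "e"
      rosis = init rosa ++ "is"
  in case n of
       Singular -> case c of
         Accusative -> rosa ++ "m"
         Genitive   -> rosae
         Dative     -> rosae
         _          -> rosa
       Plural -> case c of
         Nominative -> rosae
         Vocative   -> rosae
         Accusative -> rosa ++ "s"
         Genitive   -> rosa ++ "rum"
         _          -> rosis

-- Hypothetical helper: override selected cells of an inflection function.
withExceptions :: [(NounInfl, String)] -> (NounInfl -> String) -> NounInfl -> String
withExceptions table f infl = maybe (f infl) id (lookup infl table)

-- "dea" follows the rosa paradigm except in the dative and ablative plural.
dea :: NounInfl -> String
dea = withExceptions
  [ (NounInfl Plural Dative,   "deabus")
  , (NounInfl Plural Ablative, "deabus")
  ]
  (ourParadigm "dea")
```

Since an inflection table is an ordinary function, an exception is just another function wrapped around it; the regular paradigm stays untouched.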

FM pre-defined functions
• Programmer is free to be creative, as long as she keeps to the inferred system of types
• FM accounts for exceptions, missing/only forms, multiple variants, stem changes, …
• Each new model can add to this repertoire
• FM implements the whole mechanism
• Tries for efficient analysis/synthesis
• Exports to XML, SQL, xfst, lexc, GF, LaTeX, …
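
Analysis can be obtained from a synthesizing paradigm by generating every form and indexing the result. The sketch below assumes a deliberately reduced paradigm with string tags (`firstDecl`, `fullForms` and `analyze` are hypothetical names) and uses a linear search where FM, according to the slide, uses tries.

```haskell
type Tag = String

-- A reduced, hypothetical first-declension paradigm with four cells.
firstDecl :: String -> [(Tag, String)]
firstDecl rosa =
  [ ("Sg Nom", rosa)
  , ("Sg Gen", rosa ++ "e")
  , ("Pl Nom", rosa ++ "e")
  , ("Pl Gen", rosa ++ "rum")
  ]

-- Expand a list of lemmas into a full-form index: form -> (lemma, tag).
fullForms :: [String] -> [(String, (String, Tag))]
fullForms lemmas =
  [ (form, (lemma, tag))
  | lemma <- lemmas
  , (tag, form) <- firstDecl lemma
  ]

-- Analysis is lookup in the full-form index; ambiguous forms give several results.
analyze :: [String] -> String -> [(String, Tag)]
analyze lemmas w = [a | (form, a) <- fullForms lemmas, form == w]
```

For example, `analyze ["rosa", "barba"] "rosae"` returns both the genitive singular and the nominative plural reading of "rosa".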

Lexicon Format
• Word class identification and the lemma
• Lemma might yet be a function into a database
• No programming needed: pure lexicography

Dictionary Format
• Class functions listing the information

    ourClass :: String -> Entry
    type Dictionary = [Entry]
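
The step from lexicon to dictionary can be sketched as applying each class function to its lemma. The record fields of `Entry`, the body of `ourClass`, and the in-memory `lexicon` list are all assumptions made for this illustration; the slide gives only the two type signatures, and FM's actual `Entry` type is not shown there.

```haskell
-- Hypothetical entry: lemma, inherent class description, and inflected forms.
data Entry = Entry
  { entryLemma :: String
  , entryClass :: String
  , entryForms :: [(String, String)]
  } deriving (Show, Eq)

type Dictionary = [Entry]

-- A class function: expands a lemma into its full entry (reduced to two cells here).
ourClass :: String -> Entry
ourClass rosa =
  Entry rosa "noun, first declension, feminine"
    [ ("Sg Nom", rosa)
    , ("Pl Gen", rosa ++ "rum")
    ]

-- The lexicon pairs a word class with a lemma; no programming beyond listing.
lexicon :: [(String -> Entry, String)]
lexicon = [(ourClass, "rosa"), (ourClass, "barba")]

-- The dictionary is the lexicon with every class function applied.
dictionary :: Dictionary
dictionary = [cls l | (cls, l) <- lexicon]
```

Adding a word then means adding one line to `lexicon`, which matches the slide's claim that lexicon development needs no programming.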

Demo of the System

Inflection in Sanskrit
• Computationally pioneered by Huet (2003)
• Challenging issues in Sanskrit
• Segmentation of compound words/verses
• Alternation rules: external and internal sandhi
• Phonetical orthography!
• The Zen Toolkit inspired FM greatly

Inflection in Arabic
• Quite structuralist computational models!
• Functional Arabic Morphology
• Revised description of grammatical parameters
• Implementation in FM, providing its extensions
• Challenging issues in Arabic
• Run-on tokens, complex change of parameters
• Decomposition of phonology and orthography

Summary
• Functional Morphology reconciles linguistic abstraction with computational implementation
• Haskell is a powerful, modern language
• Development of morphologies requires only a little initial programming knowledge
• Development of lexicons reduces to natural lexicography

References
• Markus Forsberg and Aarne Ranta. 2004. Functional Morphology. In Proceedings of the ICFP 2004, pages 213-223. ACM Press.
• Gérard Huet. 2003. Lexicon-directed Segmentation and Tagging of Sanskrit. In XIIth World Sanskrit Conference, pages 307-325, Helsinki, Finland.
• Gregory T. Stump. 2001. Inflectional Morphology: A Theory of Paradigm Structure. Cambridge Studies in Linguistics 93. Cambridge University Press.
• Kenneth R. Beesley and Lauri Karttunen. 2003. Finite State Morphology. CSLI Studies in Computational Linguistics. CSLI Publications, Stanford, California.
