Finite State Machinery I: Fundamentals, Recognisers and Transducers

4 Reference Outline
Websites
– Xerox
– Groningen: grid.let.rug.nl/~vannoord/FSA/fsa.html
– AT&T
Books/Collections
– Karttunen & Oflazer (2000)
– Jurafsky & Martin (2000)
– Hopcroft and Ullman (1979)
– Roche and Schabes (1997)
Classic Articles
– Kaplan and Kay (1994)
– Koskenniemi (1983)
– Johnson (1972)
Tools
– Van Noord et al.
– Mohri et al.
– Daciuk
– Karttunen & Beesley

5 Acknowledgements to Lauri Karttunen, Ken Beesley and colleagues at Xerox. Most materials in this tutorial are from their website. Forthcoming book: Finite State Morphology – Xerox Tools and Techniques.

FS Motivation The Chomsky hierarchy of language classes is based on classes of descriptive notation, and also on associated classes of machine. Chomsky (1957) dismissed FS grammars, and the associated machinery, as fundamentally inadequate for the description of NL.

Embedding The basic problem is not that sentences can grow to arbitrary length; it is that the description of a syntactic constituent may embed any other constituent, including the sentence itself. The dog bit the cat. The dog that the man saw bit the cat. The dog that the man that the horse kicked saw bit the cat. etc.

On the other hand... plenty of language just ain't like that. Words: orthographic spelling, phonological spelling, morphology. Fixed expression types (e.g. dates). Gross constituent structures (e.g. the big, bad, blue wolf).

Recent application areas for FS technology include POS tagging, spell checking, information extraction, speech recognition, text-to-speech, spoken dialogue, and parsing.

21 Recognition of Italian Words The coke machine recognises words in the coke machine language. The following machine recognises two words in Italian. The recognition mechanism is language independent. [Network diagram: two paths from the start state share an initial C arc and spell out C-A-S-A and C-I-N-Q-U-E.]

22 The Process of Analysis Start in the initial state and at the first symbol of the word. If there is an arc from the current state labelled with that symbol, the machine transitions to the next state and the symbol is consumed. The process continues with successive symbols until...

23 The Process of Analysis One or more of these conditions holds: A. A final state is reached. B. All symbols are consumed. C. There are no transitions out of the current state for the current symbol. If both A and B hold, analysis succeeds and the word is recognised; otherwise recognition fails.
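To make this procedure concrete, here is a minimal Python sketch (not part of the original slides; the dictionary encoding of states and arcs is an illustrative assumption). It recognises the two Italian words from the examples and fails exactly under condition C, or when A and B do not hold together.

    # Minimal sketch of deterministic recognition, assuming arcs are stored as
    # a dictionary mapping (state, symbol) to the next state.
    def recognise(word, arcs, start, finals):
        state = start
        for symbol in word:
            if (state, symbol) not in arcs:   # condition C: no transition out
                return False
            state = arcs[(state, symbol)]     # take the arc, consume the symbol
        # all symbols consumed (B); succeed only if a final state is reached (A)
        return state in finals

    # Illustrative network for CASA and CINQUE, sharing the initial C arc.
    ARCS = {
        (0, "C"): 1,
        (1, "A"): 2, (2, "S"): 3, (3, "A"): 4,                            # C-A-S-A
        (1, "I"): 5, (5, "N"): 6, (6, "Q"): 7, (7, "U"): 8, (8, "E"): 9,  # C-I-N-Q-U-E
    }
    FINALS = {4, 9}

    print(recognise("CASA", ARCS, 0, FINALS))       # True
    print(recognise("CINQUE", ARCS, 0, FINALS))     # True
    print(recognise("CINQUANTA", ARCS, 0, FINALS))  # False: stuck after C-I-N-Q-U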

24 Success and Failure [Network diagram with paths for C-A-S-A, C-I-N-Q-U-E and L-E-N-T-E; example inputs: LE, CASA, CINQUANTA, LENTEMENTE.]

27 Transducers Recognisers either accept or reject a word. Although this is useful, networks can actually return more substantial information. This is achieved by providing networks with the ability to write as well as to read.

28 Basic Transducer Each transition of a transducer is labelled with a pair of symbols rather than with a single symbol. Analysis proceeds as before, except that input symbols are matched against the lower-side symbols on transitions. If analysis succeeds, return the string of upper-side symbols on the path to the final state.

Confusing Terminology Lower side = surface side. Upper side = "deep" side. Analysis proceeds from lower to upper. Synthesis (generation) proceeds from upper to lower.

29 Lexical Transducers In common parlance, a transducer is a device which converts one form of energy into another, e.g. a microphone converts from sound to electrical signals. Next we look at lexical transducers which convert one string of symbols into another.

30 Lexical Transducer Example [Transducer diagram pairing the lexical string CASA with the surface string CASE.] Input: CASE. Output: CASA.
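A sketch of this lookup in Python (illustrative only; the encoding of arcs as (upper, lower) symbol pairs is an assumption, not the Xerox representation):

    # Each arc carries a pair of symbols: match the lower (surface) side,
    # collect the upper (lexical) side.
    ARCS = {
        # (state, lower symbol) -> (next state, upper symbol)
        (0, "C"): (1, "C"),
        (1, "A"): (2, "A"),
        (2, "S"): (3, "S"),
        (3, "E"): (4, "A"),   # surface E is paired with lexical A
    }
    FINALS = {4}

    def analyse(surface):
        state, upper = 0, []
        for symbol in surface:
            if (state, symbol) not in ARCS:
                return None                    # no matching transition: fail
            state, out = ARCS[(state, symbol)]
            upper.append(out)
        return "".join(upper) if state in FINALS else None

    print(analyse("CASE"))   # CASA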

31 Morphological Analysis [Transducer diagram pairing the lexical string c o n t a r e +V +1P +SG with the surface string c o n t o; the remaining lexical symbols are paired with ε.] Input: CONTO. Output: CONTARE +V +1P +SG.

32 Remarks ε stands for "epsilon". During analysis, epsilon transitions are taken freely without consuming any input. Note also single symbols with multi-character print names (e.g. +SG). The order of these symbols, and the choice of infinitive as base form, is determined by linguists.
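The following Python sketch shows one way to handle ε transitions and multicharacter symbols during analysis (illustrative only; the arc list and the particular alignment of lexical symbols to ε are assumptions, not taken from the slides):

    # Arcs are (state, upper, lower, next); None on the lower side stands for
    # an epsilon, which is taken freely without consuming any input.
    ARCS = [
        (0, "c", "c", 1), (1, "o", "o", 2), (2, "n", "n", 3), (3, "t", "t", 4),
        (4, "a", "o", 5),                       # assumed alignment: lexical a with surface o
        (5, "r", None, 6), (6, "e", None, 7),   # remaining lexical symbols paired with epsilon
        (7, "+V", None, 8), (8, "+1P", None, 9), (9, "+SG", None, 10),
    ]
    FINALS = {10}

    def analyse(surface, state=0, pos=0):
        # Success: a final state is reached and all input is consumed.
        if state in FINALS and pos == len(surface):
            return []
        for (src, upper, lower, dst) in ARCS:
            if src != state:
                continue
            if lower is None:                               # epsilon: consume nothing
                rest = analyse(surface, dst, pos)
            elif pos < len(surface) and surface[pos] == lower:
                rest = analyse(surface, dst, pos + 1)       # match and consume
            else:
                continue
            if rest is not None:
                return [upper] + rest
        return None

    print("".join(analyse("conto")))   # contare+V+1P+SG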

33 Exercise The word "conto" in Italian is also a masculine noun meaning (a) story and (b) bank account. Draw the corresponding 2-level networks. How can the different meanings be incorporated into the same network?

Conto +N +SG [Transducer diagram pairing the lexical string c o n t o +N +SG with the surface string c o n t o; +N and +SG are paired with ε.] Input: CONTO. Output: CONTO +N +SG.

34 Synthesis Transducers are reversible: the same network can be used to perform the inverse transduction. The process of synthesis is the inverse of analysis.

35 The Process of Synthesis Start at the start state and at the beginning of the input string. Match the input symbols against the upper-side symbols of the arcs, consuming symbols until a final state is reached. If successful, return the string of lower-side symbols (else nothing).

36 Morphological Synthesis [The same transducer as in the analysis example: lexical c o n t a r e +V +1P +SG paired with surface c o n t o and ε.] Input: CONTARE +V +1P +SG. Output: CONTO. N.B. ε symbols are ignored on output.
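A matching sketch of synthesis, reading the upper side and writing the lower side, with ε symbols dropped from the output as the slide notes (again illustrative Python, repeating the assumed arc encoding from the analysis sketch):

    # Same assumed network as in the analysis sketch: (state, upper, lower, next),
    # with None standing for epsilon on the lower side.
    ARCS = [
        (0, "c", "c", 1), (1, "o", "o", 2), (2, "n", "n", 3), (3, "t", "t", 4),
        (4, "a", "o", 5), (5, "r", None, 6), (6, "e", None, 7),
        (7, "+V", None, 8), (8, "+1P", None, 9), (9, "+SG", None, 10),
    ]
    FINALS = {10}

    def synthesise(lexical, state=0, pos=0):
        # Match input against upper-side symbols; return lower-side symbols.
        if state in FINALS and pos == len(lexical):
            return []
        for (src, upper, lower, dst) in ARCS:
            if src == state and pos < len(lexical) and lexical[pos] == upper:
                rest = synthesise(lexical, dst, pos + 1)
                if rest is not None:
                    # epsilon symbols are ignored on output
                    return ([lower] if lower is not None else []) + rest
        return None

    # The lexical string is a sequence of symbols, some with multicharacter names.
    print("".join(synthesise(["c", "o", "n", "t", "a", "r", "e", "+V", "+1P", "+SG"])))   # conto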

37 Analysis and Synthesis Upper Side Language (Lexical Strings). Lower Side Language (Surface Strings). The transducer maps between the two. However large the lexical transducer may become, analysis and synthesis are performed by the same language-independent matching techniques.