
Finite State Automata

- A very simple and intuitive formalism, suitable for certain tasks
- A bit like a flow chart, but can be used for both recognition and generation
- “Transition network”
  - Unique start point
  - Series of states linked by transitions
  - Transitions represent input to be accounted for, or output to be generated
  - Legal exit-point(s) explicitly identified

Example

Jurafsky & Martin, Figure 2.10

[Figure: states q0 to q4; arcs q0 -b-> q1, q1 -a-> q2, q2 -a-> q3, q3 -a-> q3 (loop), q3 -!-> q4]

- The loop on q3 means that it can account for strings of unbounded length
- “Deterministic” because in any state, its behaviour is fully predictable
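A deterministic FSA of this kind can be run from a simple lookup table. The sketch below is illustrative only: the names `TRANSITIONS`, `START`, `FINAL` and `recognize` are assumptions introduced here, and the table encodes the five arcs b, a, a, a (loop), ! of the automaton.

```python
# Transition table: (current state, input symbol) -> next state.
# State names and the dictionary layout are illustrative assumptions.
TRANSITIONS = {
    ("q0", "b"): "q1",
    ("q1", "a"): "q2",
    ("q2", "a"): "q3",
    ("q3", "a"): "q3",  # loop: any number of further a's
    ("q3", "!"): "q4",
}
START, FINAL = "q0", {"q4"}


def recognize(symbols):
    """Return True if the symbol sequence leads from START to a final state."""
    state = START
    for symbol in symbols:
        # Deterministic: at most one next state for each (state, symbol) pair.
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return False  # no arc for this input, so reject
    return state in FINAL
```

Because every (state, symbol) pair has at most one target, the recognizer never has to choose or backtrack.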

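Non-deterministic FSA

Jurafsky & Martin, Figure 2.18

[Figure 2.18: states q0 to q4 with arcs b, a, a, ! as before, but with the a-loop on q2, so q2 has two arcs labelled a]

- At state q2 with input “a” there is a choice of transitions
- We can also have “jump” arcs (or empty transitions), which also introduce non-determinism

[Figure 2.19: states q0 to q4 with arcs b, a, a, ! and an ε arc from q3 back to q2]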
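One common way to run a non-deterministic FSA with empty transitions is to track the set of all states the machine could be in, rather than backtracking. The following is a minimal sketch under that assumption, using the ε-arc variant (an empty arc from q3 back to q2); the names `ARCS`, `EPSILON`, `epsilon_closure` and `recognize` are hypothetical, and this is not necessarily how the Prolog programs used with these slides work.

```python
EPSILON = None  # label chosen here to mark an empty ("jump") transition

# Arcs: state -> list of (label, next state).
ARCS = {
    "q0": [("b", "q1")],
    "q1": [("a", "q2")],
    "q2": [("a", "q3")],
    "q3": [(EPSILON, "q2"), ("!", "q4")],
}
START, FINAL = "q0", {"q4"}


def epsilon_closure(states):
    """Add every state reachable from `states` through empty arcs alone."""
    closure, stack = set(states), list(states)
    while stack:
        state = stack.pop()
        for label, target in ARCS.get(state, []):
            if label is EPSILON and target not in closure:
                closure.add(target)
                stack.append(target)
    return closure


def recognize(symbols):
    """Simulate all possible paths at once by tracking a set of states."""
    current = epsilon_closure({START})
    for symbol in symbols:
        moved = {
            target
            for state in current
            for label, target in ARCS.get(state, [])
            if label == symbol
        }
        current = epsilon_closure(moved)
        if not current:
            return False  # every path has died
    return bool(current & FINAL)
```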

Augmented Transition Networks

- ATNs were used for parsing in the 60s and 70s
- For parsing, you need to pass constraints (e.g. for agreement) as well as account for input: the Transition Networks were “augmented” by having a “register” into/from which such information could be put/taken
- It’s easy to write recognizers, but computing structure is difficult
- ATNs quickly become very complex; one solution is to have a “cascade” of ATNs, where transitions can call other networks

Augmented Transition Networks

[Figure: example ATN. S network: push NP, put “num”; push VP, get “num”. NP network: det, put “num”; adj; n, put “num”; ε; prep; pop]

Exercises

[Figure: the FSA with states q0 to q4, each arc labelled with its triple: [0,b,1], [1,a,2], [2,a,3], [3,a,3], [3,!,end]]

fsa([[0,b,1],[1,a,2],[2,a,3],[3,a,3],[3,!,end]]).

NDFSA

[Figure: the FSA with states q0 to q4 and an ε arc from q3 back to q2, each arc labelled with its triple: [0,b,1], [1,a,2], [2,a,3], [3,!,end], [3,empty,2]]

fsa([[0,b,1],[1,a,2],[2,a,3],[3,empty,2],[3,!,end]]).

FSA and NDFSA programs

First load (consult) the file, e.g. 219.pl

| ?- help.
Options are as follows
run - a simple recognizer; on prompt type in string with space between each element, ending in . or ! or ?
run(v) - verbose recognizer gives trace of transitions
gen(X) - generate text; will interact at choice points
rec(X,quiet) - to generate text deterministically. Type ; to get other grammatical sequences

| ?- run.
Enter your string: b a a a a !
yes

FSA and NDFSA programs

| ?- run(v).
Enter your string: b a a a a !
0-b-1
1-a-2
2-a-3
3-skip-2
2-a-3
3-skip-2
2-a-3
3-skip-2
3-!-end
yes

FSA and NDFSA programs

| ?- gen(X).
Choice at state 3. Choose state from (1) [!,end] (2) [empty,2]
Select choice number: 2.
Choice at state 3. Choose state from (1) [!,end] (2) [empty,2]
Select choice number: 2.
Choice at state 3. Choose state from (1) [!,end] (2) [empty,2]
Select choice number: 1.
X = [b,a,a,a,a,!] ?
yes

FSA and NDFSA programs

| ?- rec(X,quiet).
X = [b,a,a] ? ;
X = [b,a,a,a] ? ;
X = [b,a,a,a,a] ? ;
X = [b,a,a,a,a,a] ?
yes
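The same arc list can also drive generation. The sketch below is not the Prolog program used in the sessions above; it is a hypothetical Python analogue that walks the `[from, label, to]` triples breadth-first, so shorter strings come out first. The names `FSA` and `generate` are assumptions, and unlike the rec(X,quiet) output shown, each generated string here includes the final `!`.

```python
from collections import deque
from itertools import islice

# Same triples as the fsa([...]) term for the NDFSA: [from, label, to].
FSA = [[0, "b", 1], [1, "a", 2], [2, "a", 3], [3, "empty", 2], [3, "!", "end"]]


def generate(arcs, start=0, final="end"):
    """Yield accepted symbol lists breadth-first, following every arc choice."""
    queue = deque([(start, [])])
    while queue:
        state, output = queue.popleft()
        if state == final:
            yield output
            continue
        for source, label, target in arcs:
            if source == state:
                # An "empty" arc changes state without producing a symbol.
                extended = output if label == "empty" else output + [label]
                queue.append((target, extended))


# The language is infinite, so take only the first few strings.
first_three = list(islice(generate(FSA), 3))
```

The generator never terminates on its own for a network with a loop, which is why the example slices it. A cycle made only of empty arcs would need an extra visited check that this sketch leaves out.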

FSAs and regular expressions

- FSAs have a close relationship with “regular expressions”, a formalism for expressing strings, mainly used for searching texts, or stipulating patterns of strings
- Regular expressions are defined by combinations of literal characters and special operators

Regular expressions

Character   Meaning                 Examples
[ ]         alternatives            /[aeiou]/, /m[ae]n/
-           range                   /[a-z]/
[^ ]        not                     /[^pbm]/, /[^ox]s/
?           optionality             /Kath?mandu/
*           zero or more            /baa*!/
+           one or more             /ba+!/
.           any character           /cat.[aeiou]/
^, $        start, end of line
\           not special character   \. \? \^
|           alternate strings       /cat|dog/
( )         substring               /cit(y|ies)/
etc.
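Most of these operators behave the same way in Python's standard `re` module, so the table can be tried out directly. In the sketch below the helper name `matches` and the test strings are assumptions chosen for illustration; the slashes in the table are delimiters and are not part of the pattern.

```python
import re


def matches(pattern, text):
    """True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


results = {
    "alternatives": matches(r"m[ae]n", "men"),          # [ ] picks one of a, e
    "range": matches(r"[a-z]", "ABC"),                  # no lower-case letter
    "negation": matches(r"[^ox]s", "oxs"),              # s preceded by x only
    "optional": matches(r"Kath?mandu", "Katmandu"),     # h may be absent
    "zero_or_more": matches(r"baa*!", "ba!"),           # second a is optional
    "one_or_more": matches(r"ba+!", "b!"),              # needs at least one a
    "any_char": matches(r"cat.[aeiou]", "catna"),       # . matches the n
    "anchors": matches(r"^cat$", "a cat"),              # line must be just cat
    "escaped": matches(r"\.", "a.b"),                   # literal full stop
    "alternation": matches(r"cat|dog", "hotdog"),
    "grouping": matches(r"cit(y|ies)", "cities"),
}
```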

Regular expressions

- A regular expression can be mapped onto an FSA
- Can be a good way of handling morphology
- Especially in connection with Finite State Transducers

Finite State Transducers

- A “transducer” defines a relationship (a mapping) between two things
- Typically used for “two-level morphology”, but can be used for other things
- Like an FSA, but each state transition stipulates a pair of symbols, and thus a mapping

Finite State Transducers

Three functions:
- Recognizer (verification): takes a pair of strings and verifies if the FST is able to map them onto each other
- Generator (synthesis): can generate a legal pair of strings
- Translator (transduction): given one string, can generate the corresponding string
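The three roles can all be read off one arc list. The following sketch uses a toy transducer invented for illustration (it maps `c a t N S` to `cat` and `c a t N P` to `cats`); the names `ARCS`, `paths`, `generate`, `recognize` and `translate` are assumptions, and the exhaustive enumeration only works because this toy network has no loops.

```python
EPS = ""  # empty string on one side of a symbol pair

# Arcs of a toy transducer: (state, upper symbol, lower symbol, next state).
ARCS = [
    (0, "c", "c", 1), (1, "a", "a", 2), (2, "t", "t", 3),
    (3, "N", EPS, 4),
    (4, "S", EPS, 5),  # singular: no ending on the lower side
    (4, "P", "s", 5),  # plural: s on the lower side
]
START, FINAL = 0, {5}


def paths(state=START, upper=(), lower=()):
    """Enumerate every (upper, lower) pair of symbol tuples the FST relates."""
    if state in FINAL:
        yield upper, lower
    for source, up, low, target in ARCS:
        if source == state:
            yield from paths(
                target,
                upper + ((up,) if up else ()),
                lower + ((low,) if low else ()),
            )


def generate():
    """Generator role: list the legal string pairs."""
    return list(paths())


def recognize(upper, lower):
    """Recognizer role: can the FST map these two strings onto each other?"""
    return any(pair == (tuple(upper), tuple(lower)) for pair in paths())


def translate(upper):
    """Translator role: lower-side strings corresponding to one upper string."""
    return [list(low) for up, low in paths() if up == tuple(upper)]
```

For example, `translate(list("catNP"))` returns the single lower string `['c', 'a', 't', 's']`.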

Some conventions

- Transitions are marked by “:”
- A non-changing transition “x:x” can be shown simply as “x”
- Wild-cards are shown as [symbol not preserved]
- Empty string shown as “ε”

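An example

J&M Fig. 3.9, p.74

[Figure: FST with states q0 to q7, mapping lexical:intermediate. From q0: regular nouns (f o x, c a t, d o g) lead to q1; irregular singular nouns (g o o s e, s h e e p, m o u s e) lead to q2; irregular plural nouns (g o:e o:e s e, s h e e p, m o:i u:ε s:c e) lead to q3. N:ε leads from q1, q2, q3 to q4, q5, q6. From q4, P:^ s # or S:# leads to q7; from q5, S:# leads to q7; from q6, P:# leads to q7.]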

f o x N P s # : f o x ^ s #
[0] f:f o:o x:x [1] N:ε [4] P:^ s:s #:# [7]

f o x N S : f o x #
[0] f:f o:o x:x [1] N:ε [4] S:# [7]

c a t N P s # : c a t ^ s #
[0] c:c a:a t:t [1] N:ε [4] P:^ s:s #:# [7]

s h e e p N S : s h e e p #
[0] s:s h:h e:e e:e p:p [2] N:ε [5] S:# [7]

g o o s e N P : g e e s e #
[0] g:g o:e o:e s:s e:e [3] N:ε [6] P:# [7]

Lexical:surface mapping

J&M Fig. 3.14, p.78

ε → e / {x s z} ^ __ s #

f o x N P s # : f o x ^ s #
c a t N P s # : c a t ^ s #

[Figure: FST with states q0 to q5 implementing the e-insertion rule; arc labels include z, s, x; ^:ε; ε:e; s; #; other]

f o x ^ s # : f o x e s #
[0] f:f [0] o:o [0] x:x [1] ^:ε [2] ε:e [3] s:s [4] #:# [0]

c a t ^ s # : c a t s #
[0] c:c [0] a:a [0] t:t [0] ^:ε [0] s:s [0] #:# [0]
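The effect of the e-insertion rule can be checked without building the transducer. The sketch below swaps in a regular-expression substitution for the six-state FST, so it reproduces the mapping for these examples but not the state-by-state traces; the function name `intermediate_to_surface` is an assumption.

```python
import re


def intermediate_to_surface(form):
    """Apply e-insertion, then delete the morpheme boundary ^."""
    # Rule: insert e after x, s or z when followed by ^ and then s #.
    form = re.sub(r"([xsz])\^(?=s#)", r"\1^e", form)
    # The boundary marker itself maps to the empty string (^:ε).
    return form.replace("^", "")


fox = intermediate_to_surface("fox^s#")
cat = intermediate_to_surface("cat^s#")
```

As in the traces above, the word-final `#` is kept in the output.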

FST

- Can be generated automatically
- Therefore, slightly different formalism

FST compiler

[d o g N P .x. d o g s] |
[c a t N P .x. c a t s] |
[f o x N P .x. f o x e s] |
[g o o s e N P .x. g e e s e]

s0: c -> s1, d -> s2, f -> s3, g -> s4.
s1: a -> s5.
s2: o -> s6.
s3: o -> s7.
s4: <o:e> -> s8.
s5: t -> s9.
s6: g -> s9.
s7: x -> s10.
s8: <o:e> -> s11.
s9: <N:s> -> s12.
s10: <N:e> -> s13.
s11: s -> s14.
s12: <P:0> -> fs15.
s13: <P:s> -> fs15.
s14: e -> s16.
fs15: (no arcs)
s16: <N:0> -> s12.

[Figure: start state s0 with arcs c, d, f, g leading to s1, s2, s3, s4]

fst([
  [s0,[c,s1], [d,s2], [f,s3], [g,s4]],
  [s1,[a,s5]],
  [s2,[o,s6]],
  [s3,[o,s7]],
  [s4,[[o,e],s8]],
  [s5,[t,s9]],
  [s6,[g,s9]],
  [s7,[x,s10]],
  [s8,[[o,e],s11]],
  [s9,[['N',s],s12]],
  [s10,[['N',e],s13]],
  [s11,[s,s14]],
  [s12,[['P',0],fs15]],
  [s13,[['P',s],fs15]],
  [s14,[e,s16]],
  [fs15, noarcs],
  [s16,[['N',0],s12]]
]).
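A table like this can be run as a translator by following arcs whose upper symbol matches the input and collecting the lower symbols. The sketch below copies the Prolog term into a Python dictionary; the names `FST`, `FINAL` and `translate` are assumptions, as is the reading of `0` as the empty symbol and of `fs15` as the only final state, both inferred from the four word pairs the network was compiled from.

```python
# state -> list of (label, next state); a plain label x stands for x:x,
# a tuple (upper, lower) for a changing arc. "0" is read as the empty symbol.
FST = {
    "s0": [("c", "s1"), ("d", "s2"), ("f", "s3"), ("g", "s4")],
    "s1": [("a", "s5")],
    "s2": [("o", "s6")],
    "s3": [("o", "s7")],
    "s4": [(("o", "e"), "s8")],
    "s5": [("t", "s9")],
    "s6": [("g", "s9")],
    "s7": [("x", "s10")],
    "s8": [(("o", "e"), "s11")],
    "s9": [(("N", "s"), "s12")],
    "s10": [(("N", "e"), "s13")],
    "s11": [("s", "s14")],
    "s12": [(("P", "0"), "fs15")],
    "s13": [(("P", "s"), "fs15")],
    "s14": [("e", "s16")],
    "fs15": [],
    "s16": [(("N", "0"), "s12")],
}
FINAL = {"fs15"}


def translate(upper, state="s0"):
    """Return all lower-side strings for a list of upper-side symbols."""
    if not upper:
        return [""] if state in FINAL else []
    results = []
    for label, target in FST[state]:
        up, low = label if isinstance(label, tuple) else (label, label)
        if up == upper[0]:
            low = "" if low == "0" else low
            results += [low + rest for rest in translate(upper[1:], target)]
    return results
```

Every arc in this table consumes one upper-side symbol, which is what lets the recursion advance the input by one symbol per step; a network with empty symbols on the upper side would need extra handling.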

FST 3.9

[Figure: the FST of J&M Fig. 3.9 with the start state named s0 and the number features written PL and SG: N:ε, PL:^ s #, SG:#, PL:#]

FST 3.9 (portion)

[Figure: the regular-noun arc from s0 to q1 (f o x, c a t, d o g) expanded letter by letter through intermediate states s1 to s6]

[s0,[f,s1], [c,s3], [d,s5]],
[s1,[o,s2]],
[s2,[x,q1]],
[s3,[a,s4]],
[s4,[t,q1]],
[s5,[o,s6]],
[s6,[g,q1]],