Topic: algorithms on FSA. M. Mohri, On some applications of finite-state automata theory to natural language processing. Natural Language Engineering 1 (1996)

Outline: determinization of transducers; indexation with automata.

Motivation: time and space efficiency. Time efficiency is usually achieved with deterministic automata; space efficiency is achieved with the classic minimization algorithms for deterministic automata. Applications such as large-scale dictionary compilation have shown deterministic transducers to be very efficient in practice. A second application: indexation of natural language texts.

Determinization of transducers: concepts and notations; main idea; example.

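Concepts and notations: transducer. A transducer has an initial state $i$, a set of final states $F$, a transition function $\delta$ (next state), an output function $\sigma$ (output string of a transition) and a final output function $\rho$. $T_1$ denotes the input transducer and $T_2$ the determinized one.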

Concepts and notations (cont.): $\wedge(x, y)$ is the longest common prefix of two strings $x$ and $y$, e.g. $\wedge(a, b) = \varepsilon$, $\wedge(aa, a) = a$, $\wedge(ab, b) = \varepsilon$. $x^{-1}(xy)$ is the string $y$ obtained by dividing $xy$ at left by $x$, e.g. $a^{-1}(ab) = b$, $(bb)^{-1}(bb) = \varepsilon$. $Q$ is the queue holding the states of the resulting transducer $T_2$ that remain to be processed.
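The two string operations can be sketched in Python as follows. The function names `lcp` and `left_divide` are our own, chosen for illustration; the assertions replay the examples above.

```python
def lcp(*strings: str) -> str:
    """Longest common prefix of the given strings, written ∧(x, y) on the slides."""
    if not strings:
        return ""
    prefix = strings[0]
    for s in strings[1:]:
        i = 0
        while i < min(len(prefix), len(s)) and prefix[i] == s[i]:
            i += 1
        prefix = prefix[:i]
    return prefix


def left_divide(x: str, xy: str) -> str:
    """x^{-1}(xy) = y: remove the prefix x from xy (x must be a prefix of xy)."""
    if not xy.startswith(x):
        raise ValueError(f"{x!r} is not a prefix of {xy!r}")
    return xy[len(x):]


# The examples from the slide (the empty string plays the role of ε).
assert lcp("a", "b") == ""
assert lcp("aa", "a") == "a"
assert lcp("ab", "b") == ""
assert left_divide("a", "ab") == "b"
assert left_divide("bb", "bb") == ""
```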

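Main idea: each new state is a set of (state, output) pairs such as $\{(1, a), \ldots\}$, pairing a state of $T_1$ with the output still owed when that state is reached. Each new transition emits the greatest common output, i.e. the longest common prefix of the outputs of the $T_1$ transitions it merges; the remainder of each output is kept in the pairs of the next state.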
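A minimal Python sketch of this construction, assuming $T_1$ is given as a list of `(source, input, output, destination)` arcs plus a dict mapping each final state to its final output string. The name `determinize` and this data layout are hypothetical. New states are frozensets of `(state, residual)` pairs, the emitted output is the longest common prefix (computed with `os.path.commonprefix`, which compares strings character by character), and the new residuals come from left division. It assumes $T_1$ is functional and determinizable; otherwise the loop may never terminate.

```python
from collections import deque
from os.path import commonprefix


def determinize(initial, finals, transitions):
    """Subset-style determinization of a transducer (sketch).

    initial: initial state of T1
    finals: dict {final state of T1: final output string}
    transitions: iterable of (src, in_label, out_string, dst)
    Returns (initial2, finals2, trans2): states of T2 are frozensets of
    (state, residual) pairs; trans2 maps (state2, label) -> (output, state2).
    """
    out_arcs = {}
    for src, label, out, dst in transitions:
        out_arcs.setdefault(src, []).append((label, out, dst))

    initial2 = frozenset({(initial, "")})      # {(i1, ε)}
    finals2, trans2 = {}, {}
    queue, seen = deque([initial2]), {initial2}
    while queue:
        q2 = queue.popleft()
        # q2 is final if it contains (q, w) with q final in T1.
        # Assumes all such pairs give the same final output (T1 functional).
        for q, w in q2:
            if q in finals:
                finals2[q2] = w + finals[q]
                break
        # Collect the candidate outputs w·σ1(arc), grouped by input label.
        by_label = {}
        for q, w in q2:
            for label, out, dst in out_arcs.get(q, []):
                by_label.setdefault(label, []).append((w + out, dst))
        for label, cands in by_label.items():
            common = commonprefix([s for s, _ in cands])                  # σ2(q2, label)
            nxt = frozenset((dst, s[len(common):]) for s, dst in cands)  # δ2(q2, label)
            trans2[(q2, label)] = (common, nxt)
            if nxt not in seen:                                          # new state -> Q
                seen.add(nxt)
                queue.append(nxt)
    return initial2, finals2, trans2
```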

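Example, step 1: initial state. The initial state of $T_2$ is $\{(0, \varepsilon)\}$, the initial state 0 of $T_1$ paired with the empty output. It is put in the queue: $Q = \{\{(0, \varepsilon)\}\}$.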

Determinization, step 2: final state. For $q_2 = \{(0, \varepsilon)\}$: since $(0, \varepsilon) \in q_2$ and $0 \in F_1$, $q_2$ is a final state of $T_2$, with final output $\varepsilon$.

Determinization, step 3: outputs and transitions. For each input label of the transitions leaving $\{(0, \varepsilon)\}$, namely $a$, $b$ and $c$, compute the output $\sigma_2(\{(0, \varepsilon)\}, x)$ and the next state $\delta_2(\{(0, \varepsilon)\}, x)$ for $x = a, b, c$.

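Determinization, step 4: $\sigma_2(\{(0, \varepsilon)\}, a) = \varepsilon \cdot \wedge(a, b) = \varepsilon$ and $\delta_2(\{(0, \varepsilon)\}, a) = \{(2, \varepsilon^{-1}(\varepsilon a))\} \cup \{(1, \varepsilon^{-1}(\varepsilon b))\} = \{(2, a), (1, b)\}$. This is a new state, so it is added to $Q$. $T_2$ so far: $\{(0, \varepsilon)\} \xrightarrow{a:\varepsilon} \{(2, a), (1, b)\}$; the transitions on $b$ and $c$ remain to be computed.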

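Determinization, step 5: $\sigma_2(\{(0, \varepsilon)\}, b) = \wedge(b) = b$ and $\delta_2(\{(0, \varepsilon)\}, b) = \{(0, b^{-1}(\varepsilon b))\} = \{(0, \varepsilon)\}$, not a new state. Likewise $\sigma_2(\{(0, \varepsilon)\}, c) = \wedge(c) = c$ and $\delta_2(\{(0, \varepsilon)\}, c) = \{(0, c^{-1}(\varepsilon c))\} = \{(0, \varepsilon)\}$, not a new state. Now $Q = \{\{(2, a), (1, b)\}\}$, and $T_2$ has the loops $b:b$ and $c:c$ on $\{(0, \varepsilon)\}$ plus the transition $a:\varepsilon$ to $\{(2, a), (1, b)\}$.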

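Determinization, step 6: for $q_2 = \{(2, a), (1, b)\}$, since $(2, a) \in q_2$ and $2 \in F_1$, $F_2 = F_2 \cup \{q_2\}$ with final output $a$. $\sigma_2(q_2, a) = a \cdot \wedge(a, b) = a$ and $\delta_2(q_2, a) = \{(2, a^{-1}(aa)), (1, a^{-1}(ab))\} = \{(2, a), (1, b)\}$, not a new state. $\sigma_2(q_2, b) = b \cdot b = bb$ and $\delta_2(q_2, b) = \{(0, (bb)^{-1}(bb))\} = \{(0, \varepsilon)\}$, not a new state. $Q$ is empty, so the construction is done. The resulting $T_2$: on $\{(0, \varepsilon)\}$, the loops $b:b$ and $c:c$ and the transition $a:\varepsilon$ to $\{(2, a), (1, b)\}$; on $\{(2, a), (1, b)\}$ (final output $a$), the loop $a:a$ and the transition $b:bb$ back to $\{(0, \varepsilon)\}$.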
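To sanity-check the worked example, the sketch below encodes $T_1$ and the resulting $T_2$ and compares them on every input over $\{a, b, c\}$ up to length 5. The slide figures are lost, so $T_1$ is an assumption reconstructed from the formulas of steps 4 to 6: arcs $0 \xrightarrow{a:a} 2$, $0 \xrightarrow{a:b} 1$, $0 \xrightarrow{b:b} 0$, $0 \xrightarrow{c:c} 0$, $2 \xrightarrow{a:a} 2$, $2 \xrightarrow{a:b} 1$, $1 \xrightarrow{b:b} 0$, with final states 0 and 2. The state names `A` and `B` are hypothetical labels for $\{(0, \varepsilon)\}$ and $\{(2, a), (1, b)\}$.

```python
from itertools import product

# T1 (reconstructed, an assumption): (state, input) -> list of (output, next state)
T1 = {
    (0, "a"): [("a", 2), ("b", 1)],
    (0, "b"): [("b", 0)],
    (0, "c"): [("c", 0)],
    (2, "a"): [("a", 2), ("b", 1)],
    (1, "b"): [("b", 0)],
}
T1_FINAL = {0: "", 2: ""}  # final state -> final output

# T2 from step 6: A = {(0, ε)}, B = {(2, a), (1, b)} (hypothetical names)
T2 = {
    ("A", "a"): ("", "B"),
    ("A", "b"): ("b", "A"),
    ("A", "c"): ("c", "A"),
    ("B", "a"): ("a", "B"),
    ("B", "b"): ("bb", "A"),
}
T2_FINAL = {"A": "", "B": "a"}


def run_t1(word):
    """Follow every path of T1; return the set of outputs of accepting paths."""
    configs = {(0, "")}
    for x in word:
        configs = {(dst, out + o) for q, out in configs for o, dst in T1.get((q, x), [])}
    return {out + T1_FINAL[q] for q, out in configs if q in T1_FINAL}


def run_t2(word):
    """Follow the unique path of T2; return the output, or None if rejected."""
    state, out = "A", ""
    for x in word:
        if (state, x) not in T2:
            return None
        o, state = T2[(state, x)]
        out += o
    return out + T2_FINAL[state] if state in T2_FINAL else None


def check(max_len=5):
    """Assert that T1 and T2 define the same function on all short inputs."""
    for n in range(max_len + 1):
        for word in map("".join, product("abc", repeat=n)):
            outs = run_t1(word)
            assert len(outs) <= 1, word  # T1 is functional
            expected = next(iter(outs)) if outs else None
            assert run_t2(word) == expected, word
    return True
```

For instance, both machines map `aab` to `abb`: $T_2$ emits nothing on the first $a$, then $a$, then $bb$.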

Summary: determinization yields deterministic transducers, hence time efficiency. Not all transducers can be determinized. Extension: p-subsequential transducers.
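To illustrate why not all transducers can be determinized, consider a hypothetical transducer (our own example, not from the slides) mapping $a^n b$ to $a^n$ and $a^n c$ to $b^n$. After reading $a^n$, no output can be committed because $\wedge(a^n, b^n) = \varepsilon$, so the residuals grow without bound and every $a^n$ leads to a new subset state. The sketch applies one step of the construction on input $a$ repeatedly; the names `ARCS`, `step` and `states_after_as` are illustrative.

```python
from os.path import commonprefix

# Hypothetical transducer: a^n b -> a^n, a^n c -> b^n.
# Arcs are (src, input, output, dst); state 3 is the only final state.
ARCS = [(0, "a", "a", 1), (0, "a", "b", 2), (1, "a", "a", 1), (2, "a", "b", 2),
        (1, "b", "", 3), (2, "c", "", 3)]


def step(q2, label):
    """One determinization step: return (emitted output, next subset state)."""
    cands = [(w + out, dst) for q, w in q2
             for src, lab, out, dst in ARCS if src == q and lab == label]
    common = commonprefix([s for s, _ in cands])  # longest common prefix
    return common, frozenset((dst, s[len(common):]) for s, dst in cands)


def states_after_as(n):
    """Subset states reached after reading a, aa, ..., a^n from {(0, ε)}."""
    q2, reached = frozenset({(0, "")}), []
    for _ in range(n):
        _, q2 = step(q2, "a")
        reached.append(q2)
    return reached
```

Each call to `step` on `a` emits the empty string, and the k-th state is $\{(1, a^k), (2, b^k)\}$, so the set of states never closes.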

Indexation with automata: states carry position lists. Each list is the set of ending positions of any word reaching this state when read from the initial state. Example text: aabba.
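A minimal sketch of such an index, assuming the classic online suffix-automaton construction with suffix links. The slides update the position lists during construction; this sketch instead computes the same lists afterwards by propagating each state's creation position up the suffix-link tree, which is a swapped-in technique. Positions are 1-based; `build_index` and `end_positions` are hypothetical names.

```python
def build_index(text):
    """Suffix automaton of `text` with end-position lists (1-based).

    Returns (trans, link, positions): trans[v] maps a letter to a state,
    link[v] is the suffix link, positions[v] lists the ending positions
    of the words that reach v from the initial state 0.
    """
    trans, link, length, own = [{}], [-1], [0], [None]
    last = 0
    for i, ch in enumerate(text, start=1):
        cur = len(trans)
        trans.append({})
        link.append(0)
        length.append(i)
        own.append(i)  # the state created at step i owns position i
        p = last
        while p != -1 and ch not in trans[p]:
            trans[p][ch] = cur
            p = link[p]
        if p != -1:
            q = trans[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:  # split q: the clone owns no position of its own
                clone = len(trans)
                trans.append(dict(trans[q]))
                link.append(link[q])
                length.append(length[p] + 1)
                own.append(None)
                while p != -1 and trans[p].get(ch) == q:
                    trans[p][ch] = clone
                    p = link[p]
                link[q] = link[cur] = clone
        last = cur
    # A state's positions are those owned in its suffix-link subtree;
    # children are longer than their parent, so process by decreasing length.
    positions = [set() for _ in trans]
    for v in sorted(range(len(trans)), key=lambda v: -length[v]):
        if own[v] is not None:
            positions[v].add(own[v])
        if link[v] >= 0:
            positions[link[v]] |= positions[v]
    return trans, link, [sorted(s) for s in positions]


def end_positions(index, word):
    """Ending positions of `word` in the indexed text ([] if it is not a factor)."""
    trans, _, positions = index
    v = 0
    for ch in word:
        if ch not in trans[v]:
            return []
        v = trans[v][ch]
    return positions[v]
```

For aabba this gives [1, 2, 5] for a and [3, 4] for b, the lists 1,2,5 and 3,4 that appear in the slide's figure.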

[Figure: step-by-step construction of the indexing automaton for aabba, for prefix lengths l = 0 to 5, showing suffix links s and position lists such as 1,2,5 (state reached by a) and 3,4 (state reached by b).]

Summary: the automaton constructed this way is the minimal automaton recognizing the set of suffixes of the given text (Blumer et al. 1987). It is deterministic. Time complexity: quadratic in the length of the text.
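The minimality claim can be checked by brute force on short texts. By the Myhill-Nerode theorem, the states of the minimal trim deterministic automaton for the suffixes of a text correspond one-to-one to the distinct right languages $R(u) = \{v : uv \text{ is a suffix}\}$ of its factors $u$. The helper below is a hypothetical illustration, not part of the slides, and is only meant for small inputs.

```python
def minimal_suffix_dfa_size(text: str) -> int:
    """Count the states of the minimal trim DFA for the suffixes of `text`.

    Each factor u of `text` (including the empty word) has the right language
    R(u) = {v : uv is a suffix of text}; distinct right languages correspond
    one-to-one to states of the minimal trim DFA (Myhill-Nerode).
    """
    n = len(text)
    suffixes = {text[i:] for i in range(n + 1)}
    factors = {text[i:j] for i in range(n + 1) for j in range(i, n + 1)}
    right_languages = {
        frozenset(s[len(u):] for s in suffixes if s.startswith(u))
        for u in factors
    }
    return len(right_languages)


print(minimal_suffix_dfa_size("aabba"))  # 7
```

For aabba the count is 7: the initial state plus the six states whose position lists are 1,2,5; 2; 3; 3,4; 4; and 5.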

Questions? Thanks!