A Weight Pushing Algorithm
Michael Güttinger

Overview
Semirings
Weighted finite-state acceptors (WFSAs)
The weight pushing algorithm
Results

Semirings
A semiring (K, ⊕, ⊗, 0, 1) consists of:
a set K
⊕ associative and commutative, with identity 0
⊗ associative, with identity 1
⊗ distributes over ⊕: a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c)
0 annihilates ⊗: 0 ⊗ a = a ⊗ 0 = 0

Semirings: Examples
Probability semiring: (ℝ₊, +, ×, 0, 1)
Log semiring: (ℝ₊ ∪ {∞}, ⊕, +, ∞, 0) with a ⊕ b = −log(exp(−a) + exp(−b)) for all a, b ∈ ℝ₊ ∪ {∞}
Tropical semiring: (ℝ₊ ∪ {∞}, min, +, ∞, 0)
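
To make the three examples concrete, here is a minimal Python sketch of a semiring as a record of its two operations and two identities; the names Semiring, PROBABILITY, LOG, and TROPICAL are illustrative choices, not notation from the talk.

    import math
    from dataclasses import dataclass
    from typing import Callable

    @dataclass(frozen=True)
    class Semiring:
        plus: Callable[[float, float], float]   # the abstract addition ⊕
        times: Callable[[float, float], float]  # the abstract multiplication ⊗
        zero: float                             # identity of ⊕, annihilator of ⊗
        one: float                              # identity of ⊗

    PROBABILITY = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)

    # Log semiring: a ⊕ b = -log(exp(-a) + exp(-b)), ⊗ is ordinary addition.
    LOG = Semiring(lambda a, b: -math.log(math.exp(-a) + math.exp(-b)),
                   lambda a, b: a + b, math.inf, 0.0)

    # Tropical semiring: ⊕ is min, ⊗ is ordinary addition.
    TROPICAL = Semiring(min, lambda a, b: a + b, math.inf, 0.0)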

Weighted Finite-State Acceptors (WFSAs)
A WFSA A = (Σ, Q, E, i, F, λ, ρ) over a semiring K:
Σ an alphabet
Q a finite set of states
E a finite set of transitions, E ⊆ Q × (Σ ∪ {ε}) × K × Q
initial state i ∈ Q, set of final states F ⊆ Q
initial weight λ, final weight function ρ

WFSA: Transitions
A transition t = (p[t], l[t], w[t], n[t]) has source state p[t], destination state n[t], label l[t], and weight w[t].

WFSA: Paths
A path in A is a sequence of consecutive transitions t₁ … tₙ with n[tᵢ] = p[tᵢ₊₁] for i = 1, …, n−1.
A successful path π = t₁ … tₙ is a path from the initial state i to a final state f ∈ F.
The weight of a path π is w[π] = λ ⊗ w[t₁] ⊗ … ⊗ w[tₙ] ⊗ ρ(n[tₙ]).
Two WFSAs are equivalent when they associate the same weight with any given input string; equivalent WFSAs may distribute those weights differently along their paths.
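
A hedged sketch of how these definitions might look in code, reusing the Semiring record above; the class and field names are assumptions for illustration, and path_weight computes w[π] exactly as defined on this slide.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Transition:
        p: int    # source state p[t]
        l: str    # label l[t]; '' stands in for the empty label ε
        w: float  # weight w[t]
        n: int    # destination state n[t]

    @dataclass
    class WFSA:
        states: set          # Q
        transitions: list    # E, a list of Transition
        i: int               # initial state
        final: dict          # ρ as a mapping: final state -> final weight
        lam: float           # initial weight λ

    def path_weight(A, path, sr):
        """w[π] = λ ⊗ w[t1] ⊗ ... ⊗ w[tn] ⊗ ρ(n[tn]) for a successful path."""
        w = A.lam
        for t in path:
            w = sr.times(w, t.w)
        return sr.times(w, A.final[path[-1].n])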

Motivation for Weight Pushing
The weight pushing algorithm builds an equivalent automaton whose weight distribution is better suited to pruning during speech recognition. The idea is to push the weights as far towards the initial state as possible; after pushing, the weights assigned to the arcs leaving any given state sum to unity.

Weight Pushing
Choose a potential function V: Q → K − {0}. The weights are updated by:
λ ← λ ⊗ V(i)
∀e ∈ E: w[e] ← [V(p[e])]⁻¹ ⊗ (w[e] ⊗ V(n[e]))
∀f ∈ F: ρ(f) ← [V(f)]⁻¹ ⊗ ρ(f)
For pushing, V(q) = ⊕_{π ∈ P(q,F)} w[π], the shortest distance from q to the final states.
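
A sketch of these three updates over the illustrative types above; the helper `inverse` supplies the ⊗-inverse required here (plain negation in the tropical and log semirings, where ⊗ is ordinary addition).

    def push(A, V, sr, inverse):
        """Apply the three pushing updates to A in place."""
        A.lam = sr.times(A.lam, V[A.i])        # λ ← λ ⊗ V(i)
        A.transitions = [                      # w[e] ← [V(p[e])]⁻¹ ⊗ (w[e] ⊗ V(n[e]))
            Transition(t.p, t.l,
                       sr.times(inverse(V[t.p]), sr.times(t.w, V[t.n])),
                       t.n)
            for t in A.transitions]
        for f in A.final:                      # ρ(f) ← [V(f)]⁻¹ ⊗ ρ(f)
            A.final[f] = sr.times(inverse(V[f]), A.final[f])

For example, push(A, V, TROPICAL, lambda x: -x) performs the tropical-semiring pushing shown on the next slide.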

Weighted Acceptor A (Tropical Semiring)
V(q) = min_{π ∈ P(q,F)} w[π]; for the example automaton: V[0] = 0, V[1] = 0, V[2] = 10, V[3] = 0.
The update w[e] ← [V(p[e])]⁻¹ + (w[e] + V(n[e])) simplifies to w[e] ← w[e] + V(n[e]) − V(p[e]).
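
As a worked instance of this update (on a hypothetical arc, since the original example figure is not reproduced in this transcript): an arc from state 2 to state 3 with weight 10 would be reweighted to 10 + V[3] − V[2] = 10 + 0 − 10 = 0.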

Result of Pushing A over the Tropical Semiring

Weighted Acceptor A (Log Semiring)
V(q) = ⊕_{π ∈ P(q,F)} w[π], and the updates are:
w[e] ← [V(p[e])]⁻¹ ⊗ (w[e] ⊗ V(n[e]))
λ ← λ ⊗ V(i)
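
Note that because ⊗ is ordinary addition in the log semiring as well, the arc update takes the same numeric form, w[e] ← w[e] + V(n[e]) − V(p[e]); the two pushed automata differ only because V itself is computed with ⊕ = min in one case and ⊕ = −log(exp(−a) + exp(−b)) in the other.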

Result of Pushing A over the Log Semiring

Consequences of Pushing
In the tropical semiring, the shortest path from each state to a final state has weight 0.
In the automaton obtained from A by pushing over the log semiring, the outgoing weights at each state sum to 1 (i.e., the corresponding probabilities exp(−w) sum to one).
With classical minimization, the size of both automata can then be reduced.

Conditions for Computing V(q)
The semiring K must be divisible: every element of K − {0} must have a ⊗-inverse, so that [V(p[e])]⁻¹ is defined.
K must also be k-closed: there exists a k ≥ 0 such that, for all a ∈ K, ⊕_{n=0}^{k+1} aⁿ = ⊕_{n=0}^{k} aⁿ.

Algorithm for Computing V(q) (Tropical Semiring)
Here d[j] is an estimate of the shortest distance from q to j, and r[j] is the total weight added to d[j] since the last time j was extracted from S.

Source-Shortest-Distance(q):
    for j = 1 to |Q| do
        d[j] = r[j] = ∞
    d[q] = r[q] = 0
    S = {q}
    while S ≠ ∅ do
        node = head(S); DEQUEUE(S)
        R = r[node]; r[node] = ∞
        for each e ∈ E[node] do
            if d[n[e]] > R + w[e] then
                d[n[e]] = min(d[n[e]], R + w[e])
                r[n[e]] = min(r[n[e]], R + w[e])
                if n[e] ∉ S then ENQUEUE(S, n[e])
    d[q] = 0

Algorithm to Compute V(q) (Generic Semiring)
Here 0 and 1 denote the semiring identities of ⊕ and ⊗.

Source-Shortest-Distance(q):
    for j = 1 to |Q| do
        d[j] = r[j] = 0
    d[q] = r[q] = 1
    S = {q}
    while S ≠ ∅ do
        node = head(S); DEQUEUE(S)
        R = r[node]; r[node] = 0
        for each e ∈ E[node] do
            if d[n[e]] ≠ d[n[e]] ⊕ (R ⊗ w[e]) then
                d[n[e]] = d[n[e]] ⊕ (R ⊗ w[e])
                r[n[e]] = r[n[e]] ⊕ (R ⊗ w[e])
                if n[e] ∉ S then ENQUEUE(S, n[e])
    d[q] = 1
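
The generic routine translates almost line for line into Python. The sketch below is an assumption-laden rendering (FIFO queue discipline, arcs given as (source, weight, destination) triples) built on the Semiring record from earlier; for the log semiring the equality test should in practice be an approximate comparison.

    from collections import deque

    def shortest_distance(n_states, arcs, q, sr):
        """arcs: (source, weight, destination) triples. Returns d with
        d[j] = ⊕-sum, over all paths from q to j, of the ⊗-product of
        the weights along the path."""
        out = [[] for _ in range(n_states)]
        for (p, w, dest) in arcs:
            out[p].append((w, dest))
        d = [sr.zero] * n_states
        r = [sr.zero] * n_states
        d[q] = r[q] = sr.one
        S = deque([q])
        queued = [False] * n_states
        queued[q] = True
        while S:
            node = S.popleft()
            queued[node] = False
            R, r[node] = r[node], sr.zero
            for (w, dest) in out[node]:
                candidate = sr.plus(d[dest], sr.times(R, w))
                if d[dest] != candidate:   # approximate test advisable for LOG
                    d[dest] = candidate
                    r[dest] = sr.plus(r[dest], sr.times(R, w))
                    if not queued[dest]:
                        S.append(dest)
                        queued[dest] = True
        d[q] = sr.one
        return d

Since V(q) is the shortest distance from q to the final states, pushing would run this routine from the final states (e.g., via a single added super-final state) over the automaton with all arcs reversed.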

Some Facts about Weight Pushing
Pushing over the tropical or the log semiring before the minimization step results in equivalent machines, but the distributions of the weights often differ radically.
The log semiring benefits pruning in speech recognition; using the tropical semiring can be harmful in some cases.

Experiments and Results

40k-Word NAB
Task: North American Business News
Vocabulary size: 40k words
Trigram language model with transitions
Triphonic acoustic model
Compaq Alpha processor

(Figure: word accuracy vs. real-time factor, 40k-word NAB.)

160k-Word NAB
Task: North American Business News
Vocabulary size: 160k words
6-gram language model with transitions
Triphonic acoustic model
Compaq Alpha processor

(Figure: word accuracy vs. real-time factor, 160k-word NAB.)

Summary
The principle of weight pushing, and the shortest-distance algorithm it relies on.
As a consequence of the weight pushing algorithm, speech recognition becomes much faster.