A MEMORY-EFFICIENT ε-REMOVAL ALGORITHM FOR WEIGHTED FINITE-STATE AUTOMATA
Thomas Hanneforth, Universität Potsdam


2 Overview
- ε-transitions in finite-state based NLP
- Removing ε-transitions in weighted finite-state automata: an algorithm by M. Mohri
- Some formal definitions
- An improved algorithm demonstrated in the case of acyclic automata
- Experiments

3 ε-Transitions in finite-state based NLP
- Many NLP applications based on (weighted) finite-state automata (WFSMs) create many ε-transitions during processing
- Examples:
  - Applying bracketing rules (NE recognition, local grammars)
  - Corpus processing
- These ε-transitions have to be removed for reasons of speed and efficiency.
- In many cases, the finite-state automata containing the ε-transitions are acyclic.

4 Example: N-gram counting in corpora
- A corpus is a disjunction of sentences.
- The corpus whose N-grams are to be counted is represented as an acyclic WFSM over the real semiring.
- That means: the weights along a path in the corpus WFSM are multiplied to compute the absolute frequency of a given sentence.
- The N-gram counter is represented as a special cyclic weighted finite-state transducer.

5 Example: N-gram counting in corpora
[Figure: A corpus C as a WFSM]
- For example, the absolute frequency of the sentence bbcd is 4 · 0.25 · 1 · 1 · 1 = 1
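A minimal sketch of how such a sentence frequency is obtained over the real semiring: along each accepting path the weights are multiplied (⊗ = ·), the final weight is included, and the results of all accepting paths for the sentence are summed (⊕ = +). The automaton below is a hypothetical stand-in, since the slide's figure is not reproduced, and it assumes that the last factor 1 in the product above is the final weight of the accepting state.

```python
from collections import defaultdict

# Hypothetical corpus WFSA over the real semiring (not the slide's figure).
# transitions[state] = list of (label, weight, next_state); final[state] = final weight.
transitions = {
    0: [("b", 4.0, 1), ("a", 2.0, 5)],
    1: [("b", 0.25, 2)],
    2: [("c", 1.0, 3)],
    3: [("d", 1.0, 4)],
    5: [("c", 1.0, 4)],
}
final = {4: 1.0}


def sentence_frequency(sentence, start=0):
    """Sum (⊕ = +) over all accepting paths labelled `sentence` of the
    product (⊗ = ·) of the transition weights and the final weight."""
    current = {start: 1.0}  # state -> accumulated weight of the paths reaching it
    for symbol in sentence:
        following = defaultdict(float)
        for state, weight in current.items():
            for label, arc_weight, target in transitions.get(state, []):
                if label == symbol:
                    following[target] += weight * arc_weight
        current = following
    return sum(weight * final.get(state, 0.0) for state, weight in current.items())


print(sentence_frequency("bbcd"))  # 4 · 0.25 · 1 · 1 · 1 → 1.0
print(sentence_frequency("ac"))    # 2 · 1 · 1 → 2.0
```

Keeping a map from states to accumulated weights, rather than following a single path, also handles non-deterministic corpus automata in which a sentence has several accepting paths.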

6 Example: N-gram counting in corpora
[Figure: A corpus as a WFSM]
- Counting is basically the composition of the corpus with the counting transducer, followed by taking the lower tape of the result: $\pi_2(C \circ T)$
[Figure: A bigram counting transducer T]

7 Example: N-gram counting in corpora
[Figure: $\pi_2(C \circ T)$]
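As a rough cross-check of what $\pi_2(C \circ T)$ yields, assume that the bigram counter T emits each bigram once per occurrence. The weight of a bigram x in the result is then the sum, over all sentences s, of the frequency of s times the number of occurrences of x in s. The sketch below does not build T or perform any composition; it enumerates a small, hypothetical list of (sentence, frequency) pairs directly, which is only feasible for tiny corpora.

```python
from collections import Counter


def expected_bigram_counts(weighted_sentences):
    """Weight of each bigram x: sum over sentences s of freq(s) * #occurrences of x in s.
    Brute-force enumeration; no transducer or composition is involved."""
    counts = Counter()
    for sentence, frequency in weighted_sentences:
        for i in range(len(sentence) - 1):
            counts[sentence[i:i + 2]] += frequency
    return counts


# Hypothetical weighted corpus: (sentence, absolute frequency) pairs.
corpus = [("bbcd", 1.0), ("abcd", 2.0)]
counts = expected_bigram_counts(corpus)
print(counts["bc"], counts["cd"], counts["bb"], counts["ab"])  # 3.0 3.0 1.0 2.0
```

The point of the transducer formulation is that it obtains the same weights without enumerating sentences, working directly on the (possibly very large) corpus automaton.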

8 ε-removal in WFSMs: Mohri's algorithm
1. For each state p, compute the ε-distance to every other state q that is ε-reachable from p.
2. For each ε-path with distance w from p to q and each single transition from q to r labeled with a ≠ ε and weight w', add a transition from p to r with label a and weight w ⊗ w' to the FSA. If q is a final state, p also becomes a final state. If p already was a final state, the final weights of q and p are combined additively (⊕).
3. Remove all ε-transitions, non-reachable states and non-contributing transitions.
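A minimal Python sketch of the three steps, restricted to the real semiring and assuming an acyclic ε-subgraph. The ε-distances are found here by naive enumeration of ε-paths, which can take exponential time; this is a simplification, not Mohri's shortest-distance procedure. The data layout (arcs[q] as a list of (label, weight, destination) tuples with every state present as a key, `None` as the ε label, `final` as a dict of final weights) and all function names are hypothetical.

```python
from collections import defaultdict

EPS = None  # label used for ε in this sketch


def eps_distances(arcs, p):
    """ε-distance from p to each state reachable by >= 1 ε-transition (real
    semiring), by enumerating ε-paths; assumes an acyclic ε-subgraph."""
    dist = defaultdict(float)

    def visit(q, weight):
        for label, arc_weight, target in arcs[q]:
            if label is EPS:
                dist[target] += weight * arc_weight
                visit(target, weight * arc_weight)

    visit(p, 1.0)
    return dict(dist)


def closure(seeds, successors):
    """All states reachable from `seeds` in the graph `successors`."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        for r in successors.get(stack.pop(), ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen


def remove_epsilons(arcs, final, start=0):
    """Steps 1-3 of the slide for the real semiring (sketch)."""
    new_arcs = {q: [a for a in out if a[0] is not EPS] for q, out in arcs.items()}
    new_final = dict(final)
    for p in arcs:                                   # states may be visited in any order
        for q, w in eps_distances(arcs, p).items():  # step 1
            for label, arc_weight, r in arcs[q]:     # step 2
                if label is not EPS:
                    new_arcs[p].append((label, w * arc_weight, r))
            if q in final:
                new_final[p] = new_final.get(p, 0.0) + w * final[q]
    # Step 3: keep states that are accessible from start and co-accessible.
    forward = {q: [r for _, _, r in out] for q, out in new_arcs.items()}
    backward = defaultdict(list)
    for q, targets in forward.items():
        for r in targets:
            backward[r].append(q)
    useful = closure({start}, forward) & closure(set(new_final), backward)
    kept_arcs = {q: [a for a in new_arcs[q] if a[2] in useful] for q in useful}
    kept_final = {q: w for q, w in new_final.items() if q in useful}
    return kept_arcs, kept_final


arcs = {0: [(EPS, 0.5, 1), ("b", 1.0, 2)],
        1: [("a", 2.0, 2), (EPS, 0.25, 3)],
        2: [], 3: []}
final = {2: 1.0, 3: 4.0}
kept_arcs, kept_final = remove_epsilons(arcs, final)
print(kept_arcs, kept_final)
```

In the example, state 0 receives the transition a/1.0 to state 2 (0.5 · 2) and the final weight 0.5 (0.5 · 0.25 · 4); states 1 and 3 also receive work in step 2 but are then discarded in step 3.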

9 ε-removal in WFSMs: Mohri's algorithm
[Figure: General ε-removal pattern]
- The states to which the pattern is applied can be visited in any order

10 ε-removal in WFSMs: Mohri's algorithm
- If the ε-subgraph of the WFSM is acyclic, it is possible to process the states in reverse topological order:
[Figure: Example: Reverse topological order]
- Two transitions attached to non-reachable states are superfluous and have to be removed in step 3
- Nevertheless, they preserve the weights associated with ε-transitions earlier in the reversed topological order.
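The reverse topological strategy can be sketched as follows for the real semiring. Assumptions: arcs[q] is a list of (label, weight, destination) tuples with `None` as the ε label and every state present as a key, and `order` is a topological order of the ε-subgraph supplied by the caller; the function name is hypothetical. When p is processed, its ε-successors are already ε-free, so copying their non-ε transitions is enough and no ε-distances are computed.

```python
EPS = None  # label used for ε in this sketch


def reverse_topological_removal(arcs, final, order):
    """Copy ε-successors' transitions backwards, processing states in reverse
    topological order of the ε-subgraph (real semiring). No state is dropped."""
    out = {q: [a for a in arcs[q] if a[0] is not EPS] for q in arcs}
    fin = dict(final)
    for p in reversed(order):
        for label, w, q in arcs[p]:
            if label is EPS:
                # q was processed earlier, so out[q] and fin already reflect q's ε-closure
                out[p].extend((lab, w * tw, r) for lab, tw, r in out[q])
                if q in fin:
                    fin[p] = fin.get(p, 0.0) + w * fin[q]
    return out, fin


# Chain 0 -ε/0.5-> 1 -ε/0.5-> 2 -a/2-> 3, with 3 final.
arcs = {0: [(EPS, 0.5, 1)], 1: [(EPS, 0.5, 2)], 2: [("a", 2.0, 3)], 3: []}
out, fin = reverse_topological_removal(arcs, {3: 1.0}, order=[0, 1, 2, 3])
print(out[1])  # [('a', 1.0, 3)]  attached to state 1, unreachable once ε-arcs are gone
print(out[0])  # [('a', 0.5, 3)]  computed from the transition at state 1
```

The new transition at state 1 is exactly the kind of superfluous transition described above: it is never used after the ε-transitions are deleted, yet it carries the weight 0.5 · 2 that state 0 needs.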

11 An improved algorithm: Idea
- The attachment of newly created transitions to inaccessible states must somehow be avoided
- But, when applying the reverse topological order strategy, these transitions are necessary, even though they are deleted in step 3 of the algorithm
- Thus, the reverse topological order strategy can no longer be used
- Simple idea: keep track of reachable states
- I will focus on the special case of acyclic WFSMs

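12 Some formal definitions
Semiring
A structure $\langle \mathbb{K}, \oplus, \otimes, 0, 1 \rangle$ is a semiring if it fulfils the following conditions:
1) $\langle \mathbb{K}, \oplus, 0 \rangle$ is a commutative monoid with $0$ as the identity element for $\oplus$
2) $\langle \mathbb{K}, \otimes, 1 \rangle$ is a monoid with $1$ as the identity element for $\otimes$
3) $\otimes$ distributes over $\oplus$
4) $0$ is an annihilator for $\otimes$: $\forall w \in \mathbb{K},\ w \otimes 0 = 0 \otimes w = 0$
Common semirings are the real semiring $\langle \mathbb{R}, +, \cdot, 0, 1 \rangle$ and the tropical semiring $\langle \mathbb{R} \cup \{\infty\}, \min, +, \infty, 0 \rangle$.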
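To make the four conditions concrete, the sketch below encodes a semiring as its two operations plus their identity elements and spot-checks the conditions for the real and tropical semirings on a few sample values. The `Semiring` class and `spot_check` function are hypothetical names; passing on a finite sample is a sanity test, not a proof. The samples are chosen so that floating-point arithmetic on them is exact.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]   # ⊕
    times: Callable[[float, float], float]  # ⊗
    zero: float                             # identity for ⊕, annihilator for ⊗
    one: float                              # identity for ⊗


REAL = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
TROPICAL = Semiring(min, lambda a, b: a + b, math.inf, 0.0)


def spot_check(s, samples):
    """Check conditions 1)-4) on a finite sample of elements (a test, not a proof)."""
    for a in samples:
        assert s.plus(a, s.zero) == a == s.plus(s.zero, a)           # 1) identity
        assert s.times(a, s.one) == a == s.times(s.one, a)           # 2) identity
        assert s.times(a, s.zero) == s.zero == s.times(s.zero, a)    # 4) annihilator
        for b in samples:
            assert s.plus(a, b) == s.plus(b, a)                      # 1) commutativity
            for c in samples:
                assert s.plus(s.plus(a, b), c) == s.plus(a, s.plus(b, c))      # 1)
                assert s.times(s.times(a, b), c) == s.times(a, s.times(b, c))  # 2)
                assert s.times(a, s.plus(b, c)) == s.plus(s.times(a, b), s.times(a, c))  # 3)
                assert s.times(s.plus(a, b), c) == s.plus(s.times(a, c), s.times(b, c))  # 3)
    return True


print(spot_check(REAL, [0.0, 0.5, 2.0, 3.0]))          # True
print(spot_check(TROPICAL, [0.0, 0.5, 2.0, math.inf]))  # True
```

Writing the algorithms against `plus`, `times`, `zero` and `one` rather than `+` and `*` is what lets the same ε-removal code run over either semiring.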

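13 Some formal definitions
- ε-distance between two states p and q:
$$\varepsilon\text{-dist}(p,q) = \bigoplus_{\pi \in \Pi(p,\varepsilon,q)} w(\pi)$$
where $\Pi(p,\varepsilon,q)$ is the set of ε-paths from p to q and $w(\pi)$ is the $\otimes$-product of the transition weights along $\pi$.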
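The definition can be read off directly as code: enumerate every ε-path from p to q, multiply (⊗) the weights along each path, and combine (⊕) the results. The sketch below does exactly that for a semiring passed in as its operations and identities; the names are hypothetical, the enumeration assumes an acyclic ε-subgraph, and it is only practical for small examples.

```python
import math
from functools import reduce

EPS = None  # label used for ε in this sketch


def eps_paths(arcs, p, q):
    """Yield the weight sequence of every ε-path from p to q (the empty path if
    p == q). Assumes the ε-subgraph is acyclic, otherwise this does not terminate."""
    if p == q:
        yield []
    for label, weight, target in arcs.get(p, []):
        if label is EPS:
            for rest in eps_paths(arcs, target, q):
                yield [weight] + rest


def eps_dist(arcs, p, q, plus, times, zero, one):
    """ε-dist(p, q): ⊕ over all ε-paths π of w(π), the ⊗-product along π."""
    total = zero
    for path in eps_paths(arcs, p, q):
        total = plus(total, reduce(times, path, one))
    return total


arcs = {0: [(EPS, 0.5, 1), (EPS, 0.25, 2)], 1: [(EPS, 0.5, 3)], 2: [(EPS, 1.0, 3)], 3: []}
print(eps_dist(arcs, 0, 3, lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0))  # 0.5
print(eps_dist(arcs, 0, 3, min, lambda a, b: a + b, math.inf, 0.0))            # 1.0
```

Over the real semiring the two paths 0→1→3 and 0→2→3 contribute 0.25 each; over the tropical semiring the cheaper path (cost 1.0) wins.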

14 An improved algorithm: example
[Figure: ε-reachable states and topological order]
- ε-distance(0) = {⟨1, 0.1⟩, ⟨2, 0.3⟩, ⟨3, 0.6⟩}
- ε-distance(4) = … = {0,4} = {0}

15 An improved algorithm
Input: An acyclic WFSA A = ⟨Σ, Q, q0, F, E, ρ⟩
Output: An equivalent ε-free WFSA A'
R ← ε-reachable({q0})
for all p ∈ Q in ascending order do
    if p ∈ R then
        D ← compute-shortest-ε-distances(A, p)
        R' ← ∅
        for all ⟨q, w⟩ ∈ D do
            for all t ∈ E[q] do
                E ← E ∪ {⟨p, l[t], w ⊗ w[t], n[t]⟩}
                R' ← R' ∪ {n[t]}
            end for
            adjust-final-state(A, p, q)
        end for
        R ← R ∪ ε-reachable(R')
    end if
end for
delete-ε-transitions(A)
delete-states(Q − R)
connect(A)
return A
Here E[q] denotes the transitions leaving q, and l[t], w[t] and n[t] denote the label, weight and destination state of transition t.
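A minimal Python sketch of this algorithm for the real semiring. Assumptions: states are integers numbered in topological order, arcs[q] lists (label, weight, destination) tuples with `None` as the ε label, and final weights are kept in a dict; all function names are hypothetical. ε-distances are computed by relaxation in ascending state order over the ε-reachable states. Two details go beyond the slide as transcribed: R' is also seeded with the targets of p's own non-ε transitions (the transcription does not show this, and without it states reached only through such transitions would be deleted), and connect(A) is reduced to an accessibility check.

```python
from collections import defaultdict

EPS = None  # label used for ε in this sketch


def eps_reachable(arcs, seeds):
    """`seeds` plus every state reachable from them via ε-transitions only."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        for label, _, target in arcs[stack.pop()]:
            if label is EPS and target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def eps_distances(arcs, p):
    """ε-distances from p, p itself excluded (real semiring). Ascending state
    order is a valid relaxation order because states are numbered topologically."""
    d = defaultdict(float)
    for q in sorted(eps_reachable(arcs, {p})):
        dq = 1.0 if q == p else d[q]
        for label, weight, target in arcs[q]:
            if label is EPS:
                d[target] += dq * weight
    return dict(d)


def improved_eps_removal(arcs, final, start=0):
    new_arcs = {q: list(out) for q, out in arcs.items()}
    new_final = dict(final)
    reach = eps_reachable(arcs, {start})                     # R
    for p in sorted(arcs):                                   # ascending order
        if p not in reach:
            continue
        # assumption: also track the targets of p's own non-ε transitions
        reach_new = {t for lab, _, t in arcs[p] if lab is not EPS}
        for q, w in eps_distances(arcs, p).items():
            for label, weight, target in arcs[q]:
                if label is not EPS:
                    new_arcs[p].append((label, w * weight, target))
                    reach_new.add(target)
            if q in final:                                   # adjust-final-state
                new_final[p] = new_final.get(p, 0.0) + w * final[q]
        reach |= eps_reachable(arcs, reach_new)
    # delete-ε-transitions and delete-states(Q − R)
    eps_free = {p: [a for a in new_arcs[p] if a[0] is not EPS] for p in reach}
    # connect(A), reduced here to an accessibility check from the start state
    accessible, stack = {start}, [start]
    while stack:
        for _, _, target in eps_free[stack.pop()]:
            if target not in accessible:
                accessible.add(target)
                stack.append(target)
    return ({p: eps_free[p] for p in accessible},
            {q: w for q, w in new_final.items() if q in accessible})


# States numbered in topological order; state 1 is only reachable via ε.
arcs = {0: [(EPS, 0.5, 1), ("a", 1.0, 3)],
        1: [("b", 2.0, 2), (EPS, 0.5, 2)],
        2: [("c", 1.0, 3)],
        3: []}
result_arcs, result_final = improved_eps_removal(arcs, {2: 2.0, 3: 1.0})
print(sorted(result_arcs))  # [0, 2, 3]
print(result_final)
```

In the example, state 0 gains b/1.0 to state 2, c/0.25 to state 3 and the final weight 0.5, while state 1, which is only reachable through an ε-transition, disappears from the result.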

16 Improved algorithm: ε-distances
- ε-distances are usually computed with a generalized shortest-distance algorithm
- For cyclic WFSMs, this algorithm may be optimized by letting it operate on the strongly connected components of the WFSM
- For acyclic WFSMs, relaxation in topological order is the most efficient algorithm
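A minimal sketch of relaxation in topological order for a generic semiring, passed in as the operations ⊕ and ⊗ and their identities. The graph is meant to be the ε-subgraph only, given as a dict mapping each state to a list of (weight, destination) pairs. The topological order is computed here with Kahn's algorithm, which is an assumption of this sketch; the slides do not say how the WFSM is sorted.

```python
import math
from collections import deque


def topological_order(n, arcs):
    """Kahn's algorithm over states 0..n-1; arcs[q] is a list of (weight, target)."""
    indegree = [0] * n
    for q in range(n):
        for _, target in arcs.get(q, []):
            indegree[target] += 1
    queue = deque(q for q in range(n) if indegree[q] == 0)
    order = []
    while queue:
        q = queue.popleft()
        order.append(q)
        for _, target in arcs.get(q, []):
            indegree[target] -= 1
            if indegree[target] == 0:
                queue.append(target)
    if len(order) != n:
        raise ValueError("the graph has a cycle")
    return order


def shortest_distances(n, arcs, source, plus, times, zero, one):
    """Generalized single-source shortest distance by relaxation in topological order."""
    d = [zero] * n
    d[source] = one
    for q in topological_order(n, arcs):
        for weight, target in arcs.get(q, []):
            d[target] = plus(d[target], times(d[q], weight))
    return d


eps_arcs = {0: [(0.5, 1), (0.25, 2)], 1: [(0.5, 3)], 2: [(1.0, 3)]}
print(shortest_distances(4, eps_arcs, 0, lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0))
# [1.0, 0.5, 0.25, 0.5]
print(shortest_distances(4, eps_arcs, 0, min, lambda a, b: a + b, math.inf, 0.0))
# [0.0, 0.5, 0.25, 1.0]
```

Each state is relaxed exactly once, after all of its predecessors, so every transition is examined once; states not reachable from the source keep the value 0 of the semiring because 0 annihilates ⊗ and is the identity of ⊕.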

17 Improved algorithm: Computing ε-distances
There are at least 4 approaches to computing acyclic ε-distances:
1. Topologically sort the input WFSM and use this order for computing ε-distances
2. Construct an embedded topological order for every ε-subautomaton (two-pass strategy)
3. As 2., but cache already computed distances
4. Topologically sort the input WFSM and use a priority queue ordered by state number

18 Improved algorithm: Computing ε-distances in an acyclic WFSM
[Figure: Example]
- The global topological order is 0 1 2 3 4 5 6
- There are two ε-subgraphs, rooted at states 1 and 2, respectively. Their topological orders are:
  - 1 3 4 5
  - 2 4 5
- In a topologically ordered WFSM, whenever there is a transition p → q, the state number of q is strictly greater than that of p.
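The last point means that, once states are numbered topologically, the embedded order of any ε-subgraph comes for free: sorting its states by number is already a topological order. The sketch below checks the numbering property and derives the two embedded orders for a hypothetical WFSM built to match the description above (the slide's figure is not reproduced, so the exact transitions and labels are assumptions).

```python
EPS = None  # label used for ε in this sketch

# Hypothetical WFSM consistent with the slide's description:
# states 0..6 in global topological order, ε-subgraphs rooted at 1 and 2.
arcs = {
    0: [("a", 1.0, 1), ("b", 1.0, 2)],
    1: [(EPS, 1.0, 3)],
    2: [(EPS, 1.0, 4)],
    3: [(EPS, 1.0, 4)],
    4: [(EPS, 1.0, 5)],
    5: [("c", 1.0, 6)],
    6: [],
}


def is_topologically_numbered(arcs):
    """True iff every transition p -> q satisfies q > p."""
    return all(target > p for p, out in arcs.items() for _, _, target in out)


def embedded_order(arcs, root):
    """Topological order of the ε-subautomaton rooted at `root`: with a topological
    numbering, sorting the ε-reachable states by number yields such an order."""
    seen, stack = {root}, [root]
    while stack:
        for label, _, target in arcs[stack.pop()]:
            if label is EPS and target not in seen:
                seen.add(target)
                stack.append(target)
    return sorted(seen)


print(is_topologically_numbered(arcs))  # True
print(embedded_order(arcs, 1))          # [1, 3, 4, 5]
print(embedded_order(arcs, 2))          # [2, 4, 5]
```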

19 Improved algorithm: ε-distances with a priority queue
Input: A WFSA A and a state p
Output: The ε-distances d from p
S ← ∅
PQ ← ∅
enqueue(PQ, p)
while PQ ≠ ∅ do
    q ← pop(PQ)
    if q ∉ S then
        S ← S ∪ {q}
        if q = p then dq ← 1 else dq ← d[q] end if
        for all t ∈ E[q] do
            d[n[t]] ← d[n[t]] ⊕ (dq ⊗ w[t])
            enqueue(PQ, n[t])
        end for
    end if
end while
return d
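A Python sketch of the priority-queue variant for the real semiring, using the standard `heapq` module as a min-heap keyed by state number. Assumptions: states are numbered topologically, so a state is popped only after all of its ε-predecessors reachable from p have been popped; E[q] is taken to mean the ε-transitions leaving q; and arcs[q] lists (label, weight, destination) tuples with `None` as ε. The function name is hypothetical.

```python
import heapq
from collections import defaultdict

EPS = None  # label used for ε in this sketch


def eps_distances_pq(arcs, p):
    """ε-distances from p (p excluded), real semiring. A min-heap on state
    numbers replaces an explicit topological traversal of the ε-subgraph."""
    d = defaultdict(float)  # d[q] starts at 0, the identity for ⊕
    done = set()            # S
    heap = [p]              # PQ
    while heap:
        q = heapq.heappop(heap)
        if q in done:       # a state may be enqueued several times
            continue
        done.add(q)
        dq = 1.0 if q == p else d[q]
        for label, weight, target in arcs.get(q, []):
            if label is EPS:
                d[target] += dq * weight
                heapq.heappush(heap, target)
    return dict(d)


arcs = {0: [("a", 1.0, 1), ("b", 1.0, 2)],
        1: [(EPS, 0.5, 3), (EPS, 0.25, 4)],
        2: [(EPS, 1.0, 4)],
        3: [(EPS, 0.5, 4)],
        4: [(EPS, 2.0, 5)],
        5: [("c", 1.0, 6)],
        6: []}
print(eps_distances_pq(arcs, 1))  # {3: 0.5, 4: 0.5, 5: 1.0}
```

State 4 is pushed twice (from 1 and from 3) but processed only once, after both contributions have been added; popping it before state 3 would lose the 0.25 that arrives via 1 → 3 → 4.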

20 Improved algorithm: Complexity
- Of course, in the worst case the algorithm presented here has the same complexity as Mohri's algorithm
- So, the complexity is:
  - In the acyclic case: $O(|Q||E| + |Q|^2)$
  - In the cyclic case: $O(|Q||E| + |Q|^2 \log |Q|)$
- The memory complexity is in $O(|Q|)$
- As the experiments will show, there is a clear improvement in practical cases

21 Experiments: Input data
- Input data: 50,000 sentences of the German TiGer corpus
- Compiled into an optimised WFSM over the real semiring with 681,689 states and 730,175 transitions, with |Σ| = 89,418
- A trigram counter was applied to this WFSM
- This resulted in a WFSM with 2,724,212 states and 3,615,890 transitions (1,429,530 ε-transitions)
- The maximum out-degree, that is, the maximum number of outgoing transitions of any state, was 14,044

22 Experiments
- The experiments were run on a 2.5 GHz Intel quad-core CPU (one core used)
- Transition labels and weights both use 4 bytes

23 Experiments: Conclusions
- Mohri's original algorithm is very fast, since in the acyclic case it requires only a single traversal of the state sequence. But 83.5 % of the added transitions were useless
- Its memory usage depends crucially on the out-degree of the input WFSM, which in turn depends on the size of the alphabet
- That is, for bigger corpora with alphabet sizes of several hundred thousand symbols, the non-optimized approach may become infeasible
- The revised algorithm, in both its variants, is slower, since it computes ε-distances
- But its memory requirements are much lower

24 Appendix
adjust-final-state(A, p, q)
if q ∈ F then
    if p ∈ F then
        ρ(p) ← ρ(p) ⊕ (w ⊗ ρ(q))
    else
        F ← F ∪ {p}
        ρ(p) ← w ⊗ ρ(q)
    end if
end if
Here w is the ε-distance from p to q taken from the enclosing loop of the main algorithm.
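A direct Python transcription of adjust-final-state, assuming final weights are stored in a dict from final states to ρ (so q ∈ F means q is a key) and that the ε-distance w is passed explicitly rather than taken from the enclosing loop. The semiring operations default to the real semiring; the function name is hypothetical.

```python
def adjust_final_state(final, p, q, w, plus=lambda a, b: a + b, times=lambda a, b: a * b):
    """If q is final, make p final: ρ(p) ← ρ(p) ⊕ (w ⊗ ρ(q)), or w ⊗ ρ(q) if p
    was not final yet. `final` maps final states to ρ and is updated in place."""
    if q in final:
        if p in final:
            final[p] = plus(final[p], times(w, final[q]))
        else:
            final[p] = times(w, final[q])
    return final


rho = {3: 2.0}
adjust_final_state(rho, 0, 3, 0.25)
print(rho)  # {3: 2.0, 0: 0.5}
adjust_final_state(rho, 0, 3, 0.25)
print(rho)  # {3: 2.0, 0: 1.0}
```

Calling it twice with the same arguments shows the ⊕-combination: in the real semiring the contributions add up, whereas in the tropical semiring (⊕ = min, ⊗ = +) the second call would leave ρ(0) unchanged.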

