Published by Clayton Marley. Modified over 2 years ago.

1
A MEMORY-EFFICIENT ε-REMOVAL ALGORITHM FOR WEIGHTED FINITE-STATE AUTOMATA
Thomas Hanneforth, Universität Potsdam

2
Overview
ε-transitions in finite-state based NLP
Removing ε-transitions in weighted finite-state automata: an algorithm by M. Mohri
Some formal definitions
An improved algorithm demonstrated in the case of acyclic automata
Experiments

3
ε-Transitions in finite-state based NLP
Many NLP applications based on (weighted) finite-state automata (WFSMs) create a lot of ε-transitions during processing.
Examples:
  Applying bracketing rules (NE-recognition, local grammars)
  Corpus processing
These ε-transitions have to be removed for reasons of speed and efficiency.
In many cases, the finite-state automata containing the ε-transitions are acyclic.

4
Example: N-gram counting in corpora
A corpus is a disjunction of sentences.
The corpus whose N-grams are to be counted is represented as an acyclic WFSM over the real semiring.
That means: the weights along a path in the corpus WFSM are multiplied to compute the absolute frequency of a given sentence.
The N-gram counter is represented as a special cyclic weighted finite-state transducer.

5
Example: N-gram counting in corpora
A corpus C as a WFSM
For example, the absolute frequency of the sentence bbcd is 4 · 0.25 · 1 · 1 · 1 = 1.
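The product on this slide can be reproduced in a few lines. This is only a sketch: the corpus figure is not available here, so the reading of the five factors as four transition weights followed by a final weight is an assumption, and `path_weight` is a hypothetical helper name.

```python
from math import prod

# Factors copied from the slide's product for the sentence "bbcd".
# Assumption: four transition weights followed by one final weight.
path_weights = [4, 0.25, 1, 1, 1]

def path_weight(weights):
    """Multiply the weights along a path (in the real semiring, 'times' is *)."""
    return prod(weights)

frequency = path_weight(path_weights)
print(frequency)
```

In another semiring only the combining operation would change, for example a sum of the weights in the tropical semiring.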

6
Example: N-gram counting in corpora
A corpus as a WFSM
Counting is basically composition of the corpus with the counting transducer and taking the lower tape of the result: π₂(C ∘ T)
A bigram counting transducer T

7
Example: N-gram counting in corpora
π₂(C ∘ T)

8
ε-removal in WFSMs: Mohri's algorithm
1. For each state p compute the ε-distance to any other reachable state q.
2. For each ε-path with distance w from p to q and a single transition from q to r labeled with a and weight w', add a transition from p to r with label a and weight w ⊗ w' to the FSA. If q is a final state, p will also become a final state. If p already was a final state, the final weights of q and p are additively combined.
3. Remove all ε-transitions, non-reachable states and non-contributing transitions.
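The three steps can be sketched in Python as follows. This is an illustration under stated assumptions, not Mohri's implementation: it works over the real semiring only, marks ε with `None`, replaces the generalized shortest-distance computation by naive enumeration of ε-paths (which requires an acyclic ε-subgraph), and in step 3 trims only states unreachable from the start state. The names `eps_closure`, `mohri_remove` and the toy automaton are hypothetical.

```python
EPS = None  # assumed marker for an epsilon label

def eps_closure(trans, p):
    """Step 1: sum the weights of all epsilon-paths from p to each state q."""
    dist = {}
    stack = [(p, 1.0)]  # (state, weight of one epsilon-path from p to it)
    while stack:
        q, w = stack.pop()
        for label, wt, r in trans[q]:
            if label is EPS:
                dist[r] = dist.get(r, 0.0) + w * wt
                stack.append((r, w * wt))
    return dist

def mohri_remove(trans, final, start=0):
    new_trans, new_final = [], dict(final)
    for p in range(len(trans)):  # step 2, applied to every state
        arcs = [t for t in trans[p] if t[0] is not EPS]
        for q, w in eps_closure(trans, p).items():
            arcs += [(a, w * wt, r) for a, wt, r in trans[q] if a is not EPS]
            if q in final:  # p inherits q's final weight
                new_final[p] = new_final.get(p, 0.0) + w * final[q]
        new_trans.append(arcs)
    # Step 3: keep only the states reachable from the start state.
    reach, todo = {start}, [start]
    while todo:
        q = todo.pop()
        for _, _, r in new_trans[q]:
            if r not in reach:
                reach.add(r)
                todo.append(r)
    wasted = sum(len(new_trans[q]) for q in range(len(trans)) if q not in reach)
    kept = {q: new_trans[q] for q in sorted(reach)}
    return kept, {q: w for q, w in new_final.items() if q in reach}, wasted

# Toy automaton: trans[q] is a list of (label, weight, target).
trans = [
    [(EPS, 0.5, 1)],
    [("a", 1.0, 2), (EPS, 0.5, 3)],
    [("b", 1.0, 4)],
    [("c", 2.0, 4)],
    [],
]
kept, kept_final, wasted = mohri_remove(trans, {4: 1.0})
print(kept, kept_final, wasted)
```

The `wasted` counter reports how many transitions were stored for states that are trimmed afterwards, which is the memory cost the rest of the talk is concerned with.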

9
ε-removal in WFSMs: Mohri's algorithm
General ε-removal pattern:
The states for which the pattern is applied can be visited in any order.

10
ε-removal in WFSMs: Mohri's algorithm
If the ε-subgraph of the WFSM is acyclic, it is possible to process the states in reverse topological order.
Example: Reverse topological order
Two transitions attached to non-reachable states are superfluous and have to be removed in step 3.
Nevertheless, they preserve the weights associated with ε-transitions earlier in the reversed topological order.

11
An improved algorithm: Idea
The attachment of newly created transitions to inaccessible states must somehow be avoided.
But, when applying the reverse topological order strategy, these transitions are necessary even though they are deleted in step 3 of the algorithm.
Thus, the reverse topological order strategy can no longer be used.
Simple idea: keep track of reachable states.
I will focus on the special case of acyclic WFSMs.

12
Some formal definitions
Semiring
A structure ⟨K, ⊕, ⊗, 0, 1⟩ is a semiring if it fulfils the following conditions:
1) ⟨K, ⊕, 0⟩ is a commutative monoid with 0 as the identity element for ⊕
2) ⟨K, ⊗, 1⟩ is a monoid with 1 as the identity element for ⊗
3) ⊗ distributes over ⊕
4) 0 is an annihilator for ⊗: ∀w ∈ K, w ⊗ 0 = 0 ⊗ w = 0
Common semirings are the real semiring and the tropical semiring.
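To make the four conditions concrete, here is a small sketch of the two semirings named on the slide, with a spot check of the conditions on a few sample values. The class and function names are hypothetical, and the check only tests finitely many samples, so it illustrates the axioms rather than proving them.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]
    times: Callable[[float, float], float]
    zero: float
    one: float

# Real semiring: (+, *, 0, 1). Tropical semiring: (min, +, inf, 0).
REAL = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)
TROPICAL = Semiring(min, lambda a, b: a + b, float("inf"), 0.0)

def check_axioms(sr, samples):
    """Spot-check conditions 1) to 4) on a finite list of sample values."""
    for a in samples:
        assert sr.plus(a, sr.zero) == a                           # 0 is the identity for plus
        assert sr.times(a, sr.one) == a == sr.times(sr.one, a)    # 1 is the identity for times
        assert sr.times(a, sr.zero) == sr.zero == sr.times(sr.zero, a)  # 0 annihilates
        for b in samples:
            assert sr.plus(a, b) == sr.plus(b, a)                 # plus is commutative
            for c in samples:
                assert sr.plus(a, sr.plus(b, c)) == sr.plus(sr.plus(a, b), c)
                assert sr.times(a, sr.times(b, c)) == sr.times(sr.times(a, b), c)
                # times distributes over plus
                assert sr.times(a, sr.plus(b, c)) == sr.plus(sr.times(a, b), sr.times(a, c))
    return True

print(check_axioms(REAL, [0.0, 1.0, 2.0, 0.5]))
print(check_axioms(TROPICAL, [0.0, 1.0, 2.5, float("inf")]))
```

The sample values are chosen so that floating-point arithmetic is exact; with arbitrary floats the equality tests for the real semiring could fail because of rounding.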

13
Some formal definitions
ε-distance between two states p and q:
ε-dist(p, q) = ⊕_{π ∈ Π(p, ε, q)} w(π)
Here Π(p, ε, q) is the set of ε-paths from p to q, and w(π) is the weight of path π.

14
An improved algorithm: example
ε-Reachable
Topological order
ε-distance(0) = { ⟨1, 0.1⟩, ⟨2, 0.3⟩, ⟨3, 0.6⟩ }
ε-distance(4) = = {0,4}= {0}

15
An improved algorithm
Input: An acyclic WFSA A = ⟨Σ, Q, q0, F, E, ρ⟩
Output: An equivalent ε-free WFSA A'
R ← ε-reachable({q0})
for all p ∈ Q in ascending order do
    if p ∈ R then
        D ← compute-shortest-ε-distances(A, p)
        R' ← ∅
        for all ⟨q, w⟩ ∈ D do
            for all t ∈ E[q] do
                E ← E ∪ { ⟨p, l[t], w ⊗ w[t], n[t]⟩ }
                R' ← R' ∪ {n[t]}
            end for
            adjust-final-state(A, p, q)
        end for
        R ← R ∪ ε-reachable(R')
    end if
end for
delete-ε-transitions(A)
delete-states(Q − R)
connect(A)
return A
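A minimal Python sketch of the idea behind this pseudocode, under several assumptions that are not taken from the slides: the real semiring, ε marked by `None`, states numbered 0..n-1 in topological order, and a simplified reachability bookkeeping in which the targets of all non-ε transitions of a processed state are added to the reachable set (this stands in for the slide's ε-reachable helper, whose definition is not shown). The function names are hypothetical.

```python
EPS = None  # assumed marker for an epsilon label

def eps_distances(n, trans, p):
    """Relax epsilon-transitions in ascending (topological) state order."""
    d = {p: 1.0}
    for q in range(p, n):
        if q in d:
            for label, w, r in trans[q]:
                if label is EPS:
                    d[r] = d.get(r, 0.0) + d[q] * w
    del d[p]  # the distance from p to itself is not reported
    return d

def remove_epsilons(n, trans, final, start=0):
    reachable = {start}
    out_trans, out_final = {}, {}
    for p in range(n):  # ascending order
        if p not in reachable:
            continue  # nothing is ever built for unreachable states
        arcs = [t for t in trans[p] if t[0] is not EPS]
        if p in final:
            out_final[p] = final[p]
        for q, w in eps_distances(n, trans, p).items():
            arcs += [(a, w * wt, r) for a, wt, r in trans[q] if a is not EPS]
            if q in final:  # corresponds to adjust-final-state
                out_final[p] = out_final.get(p, 0.0) + w * final[q]
        out_trans[p] = arcs
        reachable.update(r for _, _, r in arcs)
    return out_trans, out_final

# Toy automaton: trans[q] is a list of (label, weight, target).
trans = [
    [(EPS, 0.5, 1)],
    [("a", 1.0, 2), (EPS, 0.5, 3)],
    [("b", 1.0, 4)],
    [("c", 2.0, 4)],
    [],
]
out_trans, out_final = remove_epsilons(5, trans, {4: 1.0})
print(out_trans, out_final)
```

In the toy automaton, states 1 and 3 can only be entered through ε-transitions, so they never enter the reachable set and no transitions are created for them. Parallel transitions with the same label and target are kept as separate list entries here rather than merged.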

16
Improved algorithm: -distances -distances are usually computed with a generalized shortest-distance algorithm For cyclic WFSMs, this algorithm may be optimized by letting it operate on the strongly connected components of the WFSM For acyclic WFSMs, relaxation in topological order is the most efficient algorithm

17
Improved algorithm: Computing ε-distances
There are at least 4 approaches to compute acyclic ε-distances:
1. Topologically sort the input WFSM and use this order for computing ε-distances
2. Construct an embedded topological order for every ε-subautomaton (two-pass strategy)
3. As 2., but cache already computed distances
4. Topologically sort the input WFSM and make use of a priority queue which is ordered by state number

18
Improved algorithm: Computing ε-distances in an acyclic WFSM
Example:
The global topological order is
There are two ε-subgraphs rooted at states 1 and 2, respectively. The topological orders are:
In a topologically ordered WFSM, whenever you have a transition p → q, the state number of q is strictly greater than the state number of p.

19
Improved algorithm: ε-distances with a priority queue
Input:
Output:
S ← ∅
PQ ← ∅
enqueue(PQ, p)
while PQ ≠ ∅ do
    q ← pop(PQ)
    if q ∉ S then
        S ← S ∪ {q}
        if q = p then
            d_q ← 1
        else
            d_q ← d[q]
        end if
        for all t ∈ E[q] do
            d[n[t]] ← d[n[t]] ⊕ (d_q ⊗ w[t])
            enqueue(PQ, n[t])
        end for
    end if
end while
return d
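The priority-queue variant might look as follows in Python. The sketch assumes the real semiring, states numbered in topological order, ε marked by `None`, and that the inner loop follows ε-transitions only (the slide's loop header does not say so explicitly). `heapq` serves as the priority queue ordered by state number, and the function name is hypothetical.

```python
import heapq

EPS = None  # assumed marker for an epsilon label

def eps_distances_pq(trans, p):
    """Epsilon-distances from p; trans maps a state to (label, weight, target) triples."""
    d = {}        # accumulated distances d[q] for states q other than p
    seen = set()  # S in the pseudocode
    pq = [p]
    while pq:
        q = heapq.heappop(pq)  # smallest state number first
        if q in seen:
            continue
        seen.add(q)
        dq = 1.0 if q == p else d[q]
        for label, w, nxt in trans.get(q, []):
            if label is EPS:
                d[nxt] = d.get(nxt, 0.0) + dq * w
                heapq.heappush(pq, nxt)
    return d

# Toy epsilon-subgraph with two epsilon-paths from 0 to 2.
trans = {
    0: [(EPS, 0.5, 1), (EPS, 0.25, 2)],
    1: [(EPS, 0.5, 2), ("a", 1.0, 3)],
    2: [(EPS, 0.5, 3)],
}
dist = eps_distances_pq(trans, 0)
print(dist)
```

Because every transition leads to a higher state number, popping the smallest state first guarantees that d[q] is complete before q's outgoing ε-transitions are relaxed. Only states that are ε-reachable from p are ever touched.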

20
Improved algorithm: Complexity
Of course, in the worst case the algorithm presented here has the same complexity as Mohri's algorithm.
So, the complexity is:
  In the acyclic case: O(|Q||E| + |Q|²)
  In the cyclic case: O(|Q||E| + |Q|² log |Q|)
The memory complexity is in O(|Q|).
As the experiments will show, there is a clear improvement in practical cases.

21
Experiments: Input data
Input data: 50,000 sentences of the German TiGer corpus
Compiled into an optimised WFSM over the real semiring with 681,689 states and 730,175 transitions, with |Σ| = 89,418
To that, a trigram counter was applied.
This resulted in a WFSM with 2,724,212 states and 3,615,890 transitions (1,429,530 ε-transitions).
The maximum out-degree, that is, the maximum number of outgoing transitions for a state, was 14,044.

22
Experiments
The experiments were run on an Intel quad-core CPU at 2.5 GHz (one core used).
Transition labels and weights each use 4 bytes.

23
Experiments: Conclusions
Mohri's original algorithm is very fast, since in the acyclic case it only requires a single traversal of the state sequence.
But 83.5 % of the added transitions were useless.
Its memory usage depends crucially on the out-degree of the input WFSM, which in turn depends on the size of the alphabet.
That is, for bigger corpora with alphabet sizes of several hundred thousand symbols, the non-optimized approach may become infeasible.
The revised algorithm in its two variants is slower, since both variants compute ε-distances.
But their memory requirements are much lower.

24
Appendix
adjust-final-state(A, p, q)
if q ∈ F then
    if p ∈ F then
        ρ(p) ← ρ(p) ⊕ (w ⊗ ρ(q))
    else
        F ← F ∪ {p}
        ρ(p) ← w ⊗ ρ(q)
    end if
end if
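The same procedure as a short Python sketch over the real semiring. Representing F and the final weights together as one dictionary, and passing the ε-distance w as an explicit argument, are assumptions made for the illustration.

```python
def adjust_final_state(final, p, q, w):
    """Propagate q's final weight to p across an epsilon-path of weight w.

    `final` maps each final state to its final weight (real semiring).
    """
    if q in final:
        if p in final:
            final[p] = final[p] + w * final[q]  # combine with the existing final weight
        else:
            final[p] = w * final[q]             # p becomes a final state

final = {3: 1.0}
adjust_final_state(final, 0, 3, 0.5)   # state 0 becomes final with weight 0.5
adjust_final_state(final, 0, 3, 0.25)  # a second contribution is added to it
adjust_final_state(final, 1, 2, 0.5)   # state 2 is not final, so nothing changes
print(final)
```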
