1 An overview of decoding techniques for LVCSR
Author: Xavier L. Aubert

2 Outline
Introduction
General formulation of the decoding problem
Representation of the knowledge sources
Classification of decoding methods
Heuristic techniques to further reduce the search space
Experimental results
Conclusion

3 Introduction
Decoding is basically a search process to uncover the word sequence that has the maximum posterior probability for the given acoustic observations.
Why decoding strategies are needed: the size of the search space and the time needed to explore it.
This study has been structured along two main axes:
Static vs. dynamic expansion of the search space
Time-synchronous vs. asynchronous decoding

4 General formulation of the decoding problem
The decoder searches for the word sequence with the maximum posterior probability given the acoustic observations: $\hat{W} = \arg\max_W \Pr(W) \cdot \Pr(X \mid W)$.
Heuristic LM factor: in practice the LM contribution is raised to a tuned exponent $\alpha$, giving $\hat{W} = \arg\max_W \Pr(W)^{\alpha} \cdot \Pr(X \mid W)$.
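In the log domain this combination reduces to a weighted sum. A minimal sketch in Python, assuming per-word acoustic and LM log-probabilities are already available; the parameter values are illustrative, not from the paper:

def sentence_score(acoustic_logprobs, lm_logprobs,
                   lm_scale=12.0, word_penalty=-0.5):
    """Combine acoustic and LM scores in the log domain.

    `lm_scale` realizes the heuristic LM factor; `word_penalty` is the
    usual companion word-insertion penalty. Both values are illustrative.
    """
    return (sum(acoustic_logprobs)
            + lm_scale * sum(lm_logprobs)
            + word_penalty * len(lm_logprobs))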

5 General formulation of the decoding problem (cont)
Recombination principle: select the "best" among several paths in the network as soon as it appears that these paths have identically scored extensions.
Pruning principle: discard the unpromising paths.
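Both principles operate on the set of active hypotheses at each step. A minimal sketch of beam pruning, assuming hypotheses are (log_score, state) pairs; the beam width and the cap on active hypotheses are illustrative values:

def prune(hypotheses, beam=10.0, max_active=2000):
    """Keep hypotheses whose log score lies within `beam` of the best,
    capped at `max_active` entries (histogram pruning)."""
    if not hypotheses:
        return []
    best = max(score for score, _ in hypotheses)
    survivors = [h for h in hypotheses if h[0] >= best - beam]
    survivors.sort(key=lambda h: h[0], reverse=True)  # best-scoring first
    return survivors[:max_active]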

6 General formulation of the decoding problem (cont)
Main actions to be performed by any decoder:
Generating hypothetical word sequences, usually by successive extensions.
Scoring the "active" hypotheses using the knowledge sources.
Recombining, i.e. merging, paths according to the knowledge sources.
Pruning to discard the most unpromising paths.
Creating "back-pointers" to retrieve the best sentence.

7 Representation of the knowledge sources
Use of stochastic m-gram LM
Prefix-tree organization of the lexicon
Context-dependent phonetic constraints

8 Use of stochastic m-gram LM
Introducing constraints upon the word sequences. Two implications:
The search network is fully branched at the word level, each word possibly being followed by any other;
The word probabilities depend on the m-1 predecessors: $\Pr(w_n \mid w_{n-m+1}, \dots, w_{n-1})$.

9 Prefix-tree organization of the lexicon
Figure 1. Prefix tree structure of the lexicon with LM look-ahead (Aubert, 2002).
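The prefix tree shares common pronunciation prefixes across words, so arcs near the root are evaluated once for many words, and the word identity is only known at a leaf (which is what makes the LM look-ahead of the figure necessary). A minimal sketch with a hypothetical toy lexicon:

def build_prefix_tree(lexicon):
    """Build a prefix tree (trie) over phone sequences.

    `lexicon` maps words to phone lists. Each node is a dict holding the
    outgoing arcs (`children`: phone -> node) and the words whose full
    pronunciation ends at that node (`words`).
    """
    root = {"children": {}, "words": []}
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node["children"].setdefault(
                phone, {"children": {}, "words": []})
        node["words"].append(word)   # word identity known only here
    return root

tree = build_prefix_tree({
    "cat":  ["k", "ae", "t"],
    "can":  ["k", "ae", "n"],
    "cone": ["k", "ow", "n"],
})  # the "k" arc, and the "k ae" path, are shared among words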

10 Context-dependent phonetic constraints
The last triphone arc of the predecessor word must be replicated.
Figure 2. Cross-word (CW) vs. non-CW triphone transitions (Aubert, 2002).

11 Classification of decoding methods
Figure 3. Classification "tree" of decoding techniques (Aubert, 2002).

12 Static network expansion
Sparsity of knowledge sources and network redundancies
Central role of the m-gram LM in the search network
Weighted finite state transducer (WFST) method

13 Sparsity of knowledge sources and network redundancies
There are two main sources of potential reduction of the network size:
exploiting the sparsity of the knowledge sources;
detecting and taking advantage of the redundancies.

14 Central role of the m-gram LM in the search network
Figure 4. Interpolated backing-off bigram using a null node (Aubert, 2002).
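The null node keeps the bigram network compact: only seen word pairs get direct arcs, and all other pairs are routed through the backoff node. A minimal sketch of the equivalent score computation, assuming a hypothetical toy dictionary layout for the LM:

def bigram_logprob(lm, prev, word):
    """Backing-off bigram score as realized by a null node.

    Seen pairs in `lm["bigram"]` correspond to direct arcs; unseen
    pairs take the null-node path: the backoff weight of `prev` plus
    the unigram score of `word`. All scores are log-probabilities.
    """
    if (prev, word) in lm["bigram"]:
        return lm["bigram"][(prev, word)]
    return lm["backoff"][prev] + lm["unigram"][word]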

15 Central role of the m-gram LM in the search network (cont)
Figure 5. Bigram network with null node and successor trees (Antoniol et al., 1995).

16 Weighted finite state transducer method (WFST)
Semiring: a ring that may lack negation. It has two associative operations $\oplus$ and $\otimes$ that are closed over the set $K$; they have identities $\bar{0}$ and $\bar{1}$, respectively, and $\otimes$ distributes over $\oplus$. For example, $(\mathbb{R}_+, +, \times, 0, 1)$ is a probability semiring.
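A semiring can be captured directly as a small value object. A minimal sketch showing the probability semiring from the slide, plus the tropical semiring commonly used for Viterbi-style decoding over negative log scores; the class layout is an assumption, not from the paper:

from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Semiring:
    """(K, plus, times, zero, one), with `times` distributing over `plus`."""
    plus: Callable[[float, float], float]
    times: Callable[[float, float], float]
    zero: float
    one: float

PROB = Semiring(plus=lambda a, b: a + b,       # (R+, +, x, 0, 1)
                times=lambda a, b: a * b,
                zero=0.0, one=1.0)

TROPICAL = Semiring(plus=min,                  # (R u {inf}, min, +, inf, 0)
                    times=lambda a, b: a + b,
                    zero=float("inf"), one=0.0)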

17 WFST (cont)
Formally, a WFST over the semiring $K$ is given by an input alphabet $\Sigma$, an output alphabet $\Delta$, a finite set of states $Q$, a finite set of transitions $E \subseteq Q \times (\Sigma \cup \{\epsilon\}) \times (\Delta \cup \{\epsilon\}) \times K \times Q$, an initial state $i \in Q$, a set of final states $F \subseteq Q$, an initial weight $\lambda$ and a final weight function $\rho$.

18 WFST (cont)
Figure 6. Weighted finite-state transducer examples; arcs carry input:output labels, with the initial and final states marked (Mohri et al., 2000).

19 WFST (cont)
Three operations:
Composition/Intersection
Determinization
Minimization

20 WFST (cont)
Composition/Intersection
Figure 7. Example of transducer composition (Mohri et al., 2000).
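Composition is what chains the knowledge sources (HMM topology, lexicon, LM) into a single search network. A minimal sketch for epsilon-free transducers, under an assumed arc layout (state -> list of (input, output, weight, dest) arcs, start state 0); real toolkits additionally handle epsilon labels and final weights:

from collections import deque

def compose(t1, t2, times=lambda a, b: a + b):
    """Compose two epsilon-free weighted transducers.

    An arc of the result fires when an output label of t1 matches an
    input label of t2; weights combine with `times` (addition here,
    i.e. log/tropical weights). Result states are pairs (q1, q2).
    """
    start = (0, 0)
    arcs, seen, queue = {}, {start}, deque([start])
    while queue:
        q1, q2 = queue.popleft()
        arcs[(q1, q2)] = []
        for inp, mid, w1, d1 in t1.get(q1, []):
            for mid2, out, w2, d2 in t2.get(q2, []):
                if mid == mid2:
                    dest = (d1, d2)
                    arcs[(q1, q2)].append((inp, out, times(w1, w2), dest))
                    if dest not in seen:
                        seen.add(dest)
                        queue.append(dest)
    return arcs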

21 WFST (cont)
Determinization
Figure 8. Example of transducer determinization (Mohri et al., 2000).

22 WFST (cont)
Minimization
Figure 8. Example of transducer minimization (Mohri et al., 2000).

23 Dynamic search network expansion
Re-entrant lexical tree (word-conditioned search)
Start-synchronous tree (time-conditioned search)
A comparison of time-conditioned and word-conditioned search techniques
Asynchronous stack decoding

24 Re-entrant lexical tree
$Q_{uv}(t,s)$: score of the best path up to time $t$ that ends in state $s$ of the lexical tree for the two-word history $(u,v)$.
$B_{uv}(t,s)$: starting time of the best path up to time $t$ that ends in state $s$ of the lexical tree for the two-word history $(u,v)$.
The dynamic programming recursion within the tree copies:
$Q_{uv}(t,s) = \max_{s'} \{ q(x_t, s \mid s') \cdot Q_{uv}(t-1, s') \}$
$B_{uv}(t,s) = B_{uv}(t-1, s^{\max}_{uv}(t,s))$
where $q(x_t, s \mid s')$ is the product of transition and emission probabilities of the underlying HMM and $s^{\max}_{uv}(t,s)$ denotes the optimum predecessor state for the hypothesis $(t,s)$ and two-word history $(u,v)$.

25 Re-entrant lexical tree (cont)
Recombination equation at the word boundaries:
$H(v,w;t) = \max_{u} \{ p(w \mid u,v) \cdot Q_{uv}(t, S_w) \}$
where $p(w \mid u,v)$ is the conditional trigram probability for the word triple $(u,v,w)$ and $S_w$ denotes a terminal state of the lexical tree for the word $w$.
To start up new words, we have to pass on the score and the time index:
$Q_{vw}(t, s=0) = H(v,w;t)$, $B_{vw}(t, s=0) = t$.
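The recombination step maps directly onto a maximization over predecessor histories. A minimal sketch of the word-boundary recombination above, under a hypothetical data layout (log scores; `Q` holds one score table per active two-word history):

import math
from collections import defaultdict

def recombine_word_ends(Q, terminal_states, trigram):
    """Trigram recombination at word boundaries (word-conditioned search).

    Q[(u, v)] maps tree states to log scores at the current time t;
    terminal_states[w] is the terminal state S_w of word w;
    trigram(u, v, w) returns log p(w | u, v). Returns H[(v, w)], the
    best log score for paths ending in w once the oldest predecessor u
    has been maximized out; each entry then seeds a new tree copy.
    """
    H = defaultdict(lambda: -math.inf)
    for (u, v), states in Q.items():
        for w, S_w in terminal_states.items():
            if S_w in states:
                score = trigram(u, v, w) + states[S_w]
                H[(v, w)] = max(H[(v, w)], score)
    return H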

26 Re-entrant lexical tree (cont)
The word history conditioned DP organization
The per-state stack organization
Integration of CW contexts

27 The word history conditioned DP organization
The order of the three dependent coordinates: LM-State -> Arc-Id -> State-Id
Figure 9. Search organization conditioned on word histories (Ney et al., 1992).

28 The per-state stack organization
The order of the three dependent coordinates: Arc-Id -> State-Id -> LM-State
Figure 10. Search organization using the per-state stack (Alleva et al., 1996).

29 Integration of CW contexts
Figure 11. CW transitions with an optional "long" pause (Aubert, 2002).

30 Start-synchronous tree
$H(uv;\tau)$: probability that the acoustic vectors $x_1 \dots x_\tau$ are generated by a word/state sequence with $uv$ as the last two words and $\tau$ as the word boundary.
The dynamic programming equation computes, within a single start-synchronous tree per start time $\tau$, the word-conditioned scores $h(w; \tau, t) = \Pr(x_{\tau+1} \dots x_t \mid w)$ for all words jointly.
Recombination equation:
$H(vw; t) = \max_{\tau} \{ h(w; \tau, t) \cdot \max_{u} [ p(w \mid u,v) \cdot H(uv; \tau) ] \}$
where $p(w \mid u,v)$ is the conditional trigram probability for the word triple $(u,v,w)$.
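A minimal sketch of this recombination, under a hypothetical layout (log scores; `h[(w, tau)]` comes from the start-synchronous trees, and `H[(u, v)]` is keyed by boundary time):

import math

def recombine_over_boundaries(H, h, lm, vocab):
    """Time-conditioned recombination at the current time t.

    H[(u, v)][tau]: best log score of paths ending with pair (u, v) at
    boundary tau; h[(w, tau)]: log Pr(x_{tau+1..t} | w) from the tree
    started at tau; lm(u, v, w): log trigram probability.
    """
    new_H = {}
    for v in vocab:
        for w in vocab:
            best = -math.inf
            for (u, v2), by_tau in H.items():
                if v2 != v:
                    continue
                for tau, score in by_tau.items():
                    if (w, tau) in h:    # maximize over both u and tau
                        best = max(best, h[(w, tau)] + lm(u, v, w) + score)
            new_H[(v, w)] = best
    return new_H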

31 A comparison of time-conditioned and word-conditioned search techniques
For the same number of active states, the average number of active trees per time frame in the time-conditioned method is typically much lower than in the word-conditioned method.
On the other hand, the computational effort for the LM recombination is much greater in the time-conditioned search.

32 Asynchronous stack decoding
Implementing a best-first tree search which proceeds by extending, word by word, one or several selected hypotheses, without the constraint that they all end at the same time.
Three problems to be solved in a stack decoder:
Which theory(ies) should be selected for extension?
How to efficiently compute one-word continuations?
How to get "reference" score values for pruning?

33 Asynchronous stack decoding (cont)
Which theory(ies) should be selected for extension: essentially depends on which information is available regarding the not-yet-decoded part of the sentence.
How to efficiently compute one-word continuations: computed with the start-synchronous tree method or using a fast-match algorithm.
How to get "reference" score values for pruning: by progressively updating the best likelihood scores that can be achieved along the time axis by paths having complete word extensions.
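A minimal best-first sketch built on a priority queue; `extend` and `is_complete` are hypothetical interfaces standing in for the one-word continuation and end-of-sentence tests discussed above:

import heapq
from itertools import count

def stack_decode(extend, initial_hyp, is_complete, max_expansions=10000):
    """Best-first (A*-style) stack decoding.

    `extend(hyp)` yields (score, new_hyp) one-word continuations, higher
    scores better; `is_complete(hyp)` tests for end of sentence. Scores
    must include a normalization or look-ahead term so hypotheses ending
    at different times stay comparable (the pruning-reference problem).
    """
    tie = count()                 # tie-breaker: heapq needs a total order
    stack = [(0.0, next(tie), initial_hyp)]
    for _ in range(max_expansions):
        if not stack:
            break
        neg_score, _, hyp = heapq.heappop(stack)   # min-heap: store -score
        if is_complete(hyp):
            return -neg_score, hyp
        for score, new_hyp in extend(hyp):
            heapq.heappush(stack, (-score, next(tie), new_hyp))
    return None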

34 Heuristic techniques to further reduce the search space
Decoupling the LM from the acoustic-phonetic constraints
Acoustic look-ahead pruning

35 Decoupling the LM from the acoustic-phonetic constraints
Interaction between LM contribution and word boundaries: a word boundary optimization step is needed.
Re-entrant tree: the boundary optimization is carried out implicitly during the time-synchronous recombination.
Start-synchronous tree: carried out explicitly over all start times that are produced by the different "start trees".
Delayed LM incorporation with heuristic boundary optimization: the LM is applied after word expansion has been completed:
$H(v,w;t) = \max_{u} \{ p(w \mid u,v) \cdot H(u,v; \tau(w,t)) \} \cdot h(w; \tau(w,t), t)$
where $\tau(w,t)$ is the start time of $w$ for each m-tuple ending with $w$ at $t$, and $h(w; \tau(w,t), t)$ is the score of the word model $w$ ending at $t$.

36 Acoustic look-ahead pruning
Principle of a fast acoustic match: providing a short list of word candidates so that only the most likely theories are extended.
Phoneme look-ahead in time-synchronous decoders:
Figure 12. Combined acoustic and LM look-ahead in time-synchronous search (Aubert & Blasig, 2000); LM stands for language model, LA for look-ahead and AC for acoustic model.
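LM look-ahead complements the acoustic fast match by propagating LM scores into the prefix tree before word identities are known. A minimal sketch over the prefix-tree layout from slide 9; using unigram scores for the look-ahead is an assumption here (Aubert & Blasig, 2000, combine it with acoustic look-ahead):

import math

def lm_lookahead(node, unigram):
    """Annotate each prefix-tree node with the best LM log-prob over all
    words reachable below it, so the LM can prune inside the tree.

    `node` follows the sketch from slide 9 (children/words dicts);
    `unigram` maps every lexicon word to an approximate LM log-prob.
    """
    best = max((unigram[w] for w in node["words"]), default=-math.inf)
    for child in node["children"].values():
        best = max(best, lm_lookahead(child, unigram))
    node["lm_la"] = best     # upper bound used as look-ahead score
    return best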

37 Experimental results

38 Conclusion
Pros and cons of decoding techniques:
Static network expansion using WFST
Time-synchronous dynamic search
Stack decoders

39 Conclusion (cont)
Avenues that are currently being studied and appear definitely worth pursuing:
Hybrid expansion strategies
Increasing importance of word-graphs
Integration of very long-range syntactic constraints

