A NONPARAMETRIC BAYESIAN APPROACH FOR
AUTOMATIC DISCOVERY OF A LEXICON AND ACOUSTIC UNITS

Temple University, College of Engineering

Amir Harati, Conversational Technologies, Jibo Inc.
Joseph Picone, The Institute for Signal and Information Processing, Temple University

Introduction
State-of-the-art speech recognition systems use data-intensive, context-dependent phonemes as acoustic units. Resources such as a language model or a lexicon might not be available for all languages, so learning the lexicon and the acoustic units directly from data is attractive for low-resource languages. Learning acoustic units is also an example of a problem in which the complexity of the model (e.g., the number of units) is unknown in advance.

Automatically Discovered Units
Speech recognition requires a sub-word acoustic unit (e.g., phonemes). Most approaches learn such units with a two-step process: (1) segmentation and (2) clustering. Instead of an initial segmentation and clustering, we first learn a transducer. The resulting automatically discovered units (ADUs) are relatively stationary.
Learning: an ergodic HDPHMM or DHDPHMM is used to train the transducer.
Decoding: the Viterbi algorithm finds the optimal path, and the forward-backward algorithm computes the probability distribution over units.

Experiments: Relation to Phonemes
This experiment uses the TIMIT dataset. The ADU transducer is learned from the training section of TIMIT, and the test subset of TIMIT is decoded using the learned transducer. The decoded ADUs are then aligned with phonemes. Because ADUs are modeled by states of the HDPHMM (Gaussian mixtures), they are more stationary than phonemes.

Experiments: Lexicon Learning
The ADU transducer is trained on TIMIT, but the lexicon is learned from Resource Management (RM). For low-complexity ASR systems, ADUs perform better, but the performance gains diminish as system complexity increases.

Table 4. A Comparison of Lexicon Learning Algorithms
Figure 1. Model Complexity as a Function of Available Data: (a) 20, (b) 200, (c) 2000 Data Points
Figure 2. Example of ADUs vs. Phonemes
Figure 3. Relationship Between ADUs and Phonemes
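To make the decoding step concrete, here is a minimal sketch assuming a small, already-trained ergodic HMM whose states stand in for ADUs (a finite stand-in for the HDPHMM), with per-frame state log-likelihoods supplied as input. The function names, the two-state model, and the toy numbers are hypothetical illustrations, not the authors' implementation. Viterbi returns the single best state path; forward-backward returns per-frame state posteriors, i.e. a posteriorgram.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]

def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def viterbi(log_init: Sequence[float], log_trans: Matrix, log_lik: Matrix) -> Tuple[List[int], float]:
    """Most likely state path; each state plays the role of one ADU.

    log_init[j]     = log P(s_0 = j)
    log_trans[i][j] = log P(s_t = j | s_{t-1} = i)  (ergodic: any i -> j allowed)
    log_lik[t][j]   = log p(x_t | s_t = j), e.g. from state j's Gaussian mixture
    """
    n = len(log_init)
    delta = [log_init[j] + log_lik[0][j] for j in range(n)]
    backptrs = []
    for t in range(1, len(log_lik)):
        ptr, new_delta = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] + log_trans[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] + log_trans[best_i][j] + log_lik[t][j])
        backptrs.append(ptr)
        delta = new_delta
    last = max(range(n), key=lambda j: delta[j])
    path = [last]
    for ptr in reversed(backptrs):  # follow back-pointers from the final frame
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]

def forward_backward(log_init: Sequence[float], log_trans: Matrix, log_lik: Matrix) -> List[List[float]]:
    """Per-frame state posteriors P(s_t = j | x_1..x_T), i.e. a posteriorgram."""
    T, n = len(log_lik), len(log_init)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[0.0] * n for _ in range(T)]  # beta[T-1][j] = log 1 = 0
    for j in range(n):
        alpha[0][j] = log_init[j] + log_lik[0][j]
    for t in range(1, T):
        for j in range(n):
            alpha[t][j] = log_lik[t][j] + logsumexp([alpha[t - 1][i] + log_trans[i][j] for i in range(n)])
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = logsumexp([log_trans[i][j] + log_lik[t + 1][j] + beta[t + 1][j] for j in range(n)])
    posteriors = []
    for t in range(T):
        scores = [alpha[t][j] + beta[t][j] for j in range(n)]
        norm = logsumexp(scores)
        posteriors.append([math.exp(s - norm) for s in scores])
    return posteriors

log = math.log
log_init = [log(0.5), log(0.5)]
log_trans = [[log(0.9), log(0.1)], [log(0.1), log(0.9)]]
# Toy likelihoods: frames 0-2 favour state 0, frames 3-5 favour state 1.
log_lik = [[log(0.8), log(0.2)]] * 3 + [[log(0.2), log(0.8)]] * 3

path, _ = viterbi(log_init, log_trans, log_lik)
units = [s for t, s in enumerate(path) if t == 0 or s != path[t - 1]]  # merge repeated states
posteriorgram = forward_backward(log_init, log_trans, log_lik)
print(path, units)  # [0, 0, 0, 1, 1, 1] [0, 1]
```

Merging consecutive identical states turns the frame-level path into a unit sequence; the posteriorgram keeps the full distribution, which is the kind of representation the lexicon-learning algorithm starts from.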
Nonparametric Bayesian Models
A mixture model, written as a generative model, has a nonparametric Bayesian equivalent: the Dirichlet process mixture (DPM).
This is extended to multiple groups, where each group is modeled with a mixture: the hierarchical Dirichlet process (HDP).
The same approach is extended to HMMs (HDPHMM): an infinite number of states, where the output of each state is modeled by a DPM.
We previously introduced an extension (DHDPHMM) that allows the state outputs to be modeled by an HDP.
Implementation is available at:

Lexicon Learning
A lexicon is a mapping of words into acoustic units. When ADUs are used, the lexicon must also be learned. We assume the existence of parallel transcriptions. The algorithm uses a special version of the dynamic time warping (DTW) algorithm that can align a short sequence within a longer one.

Learning Algorithm
1. Generate the posteriorgram representation for all utterances in the dataset using an ADU transducer.
2. Generate an approximate alignment between the words and the output stream of the ADU transducer.
3. Use the aligned transcriptions to extract all examples of each word.
4. Use a sub-sequence DTW algorithm to align all examples and find the instances with the least average edit distance to the other instances.
5. Generate a lexicon and use it to train a new ASR system.
6. Force align the transcriptions using the new ASR system.
7. Use the aligned transcriptions to extract all examples of each word.
8. If convergence is not achieved, go to step 4.

Experiments: Spoken Term Detection
Task: given a sample of a word (the query), find all of its occurrences in the dataset.
Approach: convert the database and the query into ADU representations, then search for the query in the database using a sub-sequence DTW algorithm.

Table 1. Spoken Term Detection by Query
Table 2. Segmentation Performance
Table 3. Error Examples

Future Work
Develop nonparametric Bayesian models that can model non-stationary units. Currently, each unit is modeled by a single state of an HDPHMM; instead, we need HDPHMMs in which each state is itself modeled by another HMM.
Evaluations on other datasets and languages are needed to validate the results.
Investigate new approaches for mapping ADUs to words. For example, a grapheme-to-phoneme (G2P) model could be trained using parallel streams of ADUs and letters.

References
A. Harati and J. Picone, “Speech Acoustic Unit Segmentation Using Hierarchical Dirichlet Processes,” in Proceedings of INTERSPEECH, 2013, pp. 637–641.
A. Harati and J. Picone, “A Nonparametric Bayesian Approach for Spoken Term Detection by Example Query,” in Proceedings of INTERSPEECH, 2016, 313.
A. Harati and J. Picone, “A Doubly Hierarchical Dirichlet Process Hidden Markov Model with a Non-Ergodic Structure,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 1, pp. 174–184, 2016.
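Both the lexicon-learning loop (step 4) and spoken term detection rely on sub-sequence DTW, which aligns all of a short sequence to the best-matching region of a longer one. The sketch below uses one standard formulation (free start and free end along the longer sequence). It assumes posteriorgram frames as input and the negative log inner product as the local distance; that distance, the helper names, and the toy data are assumptions rather than details given in the poster. `select_representative` approximates step 4 by using the DTW cost in place of the edit distance the algorithm describes.

```python
import math
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]
Distance = Callable[[Frame, Frame], float]

def neg_log_dot(p: Frame, q: Frame) -> float:
    """Assumed local distance between two posteriorgram frames: -log(p . q)."""
    return -math.log(max(sum(a * b for a, b in zip(p, q)), 1e-12))

def subsequence_dtw(query: Sequence[Frame], database: Sequence[Frame],
                    dist: Distance = neg_log_dot) -> Tuple[float, int, int]:
    """Align all of `query` to the best-matching region of `database`.

    Returns (accumulated cost, first frame, last frame) of the matched region.
    """
    n, m = len(query), len(database)
    cost = [[math.inf] * m for _ in range(n)]
    start = [[0] * m for _ in range(n)]
    for j in range(m):
        # Free start: the query may begin at any database frame.
        cost[0][j] = dist(query[0], database[j])
        start[0][j] = j
    for i in range(1, n):
        for j in range(m):
            # Predecessors as (cost, start) pairs: vertical, diagonal, horizontal.
            steps = [(cost[i - 1][j], start[i - 1][j])]
            if j > 0:
                steps.append((cost[i - 1][j - 1], start[i - 1][j - 1]))
                steps.append((cost[i][j - 1], start[i][j - 1]))
            best_cost, best_start = min(steps)
            cost[i][j] = best_cost + dist(query[i], database[j])
            start[i][j] = best_start
    # Free end: the match may finish at any database frame.
    end = min(range(m), key=lambda j: cost[n - 1][j])
    return cost[n - 1][end], start[n - 1][end], end

def select_representative(examples: List[Sequence[Frame]], dist: Distance = neg_log_dot) -> int:
    """Index of the example with the lowest average alignment cost to all others."""
    best_idx, best_avg = 0, math.inf
    for i, ex in enumerate(examples):
        costs = [subsequence_dtw(ex, other, dist)[0]
                 for k, other in enumerate(examples) if k != i]
        avg = sum(costs) / len(costs) if costs else 0.0
        if avg < best_avg:
            best_idx, best_avg = i, avg
    return best_idx

# Toy posteriorgrams over three hypothetical ADUs.
A, B, C = [0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]
database = [C, C, A, A, B, B, C, C]
cost, first, last = subsequence_dtw([A, B], database)  # spoken term detection
print(first, last)  # 3 4
rep = select_representative([[A, B], [A, A, B], [C, C, C]])  # lexicon learning, step 4
print(rep)  # 0
```

This version reports only the single best region. A practical detector would likely normalize the cost by alignment length and report every region below a threshold; the poster does not specify those choices.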

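The Nonparametric Bayesian Models section and Figure 1 rely on a key property: a Dirichlet process prior lets the number of components grow with the data rather than fixing it in advance. A minimal illustration, assuming the Chinese restaurant process (CRP) view of the DP with an illustrative concentration parameter `alpha`: the expected number of occupied clusters after n points is the sum over i = 0..n-1 of alpha / (alpha + i), which grows roughly as alpha log n. This shows a generic DP property only; it is not the HDPHMM training procedure used in the poster, and the function names are hypothetical.

```python
import random
from typing import List

def expected_clusters(alpha: float, n: int) -> float:
    """E[K_n] under a CRP with concentration alpha: sum_{i=0}^{n-1} alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(n))

def sample_crp(n: int, alpha: float, rng: random.Random) -> List[int]:
    """Seat n points with the Chinese restaurant process; returns cluster sizes."""
    counts: List[int] = []
    for i in range(n):
        # i points are already seated, so the total weight is alpha + i.
        r = rng.random() * (alpha + i)
        if not counts or r < alpha:
            counts.append(1)  # open a new cluster with probability alpha / (alpha + i)
            continue
        r -= alpha
        for k, c in enumerate(counts):
            if r < c:
                counts[k] += 1  # join cluster k with probability c / (alpha + i)
                break
            r -= c
        else:
            counts[-1] += 1  # guard against floating-point round-off
    return counts

rng = random.Random(0)
for n in (20, 200, 2000):  # the data sizes shown in Figure 1
    sizes = sample_crp(n, alpha=1.0, rng=rng)
    print(n, round(expected_clusters(1.0, n), 2), len(sizes))
```

Each tenfold increase in data adds only about alpha * ln(10) clusters in expectation. Model complexity therefore grows with the data but slowly, which is the behavior Figure 1 illustrates for the learned models.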