1 Recent Papers of Md. Akmal Haidar, Meeting before ICASSP 2013. Presenter: 郝柏翰, 2013/05/23

2 Outline
“Novel Weighting Scheme for Unsupervised Language Model Adaptation Using Latent Dirichlet Allocation”, 2010
“Unsupervised Language Model Adaptation Using Latent Dirichlet Allocation and Dynamic Marginals”, 2011
“Topic N-gram Count Language Model Adaptation for Speech Recognition”, 2012
“Comparison of a Bigram PLSA and a Novel Context-Based PLSA Language Model for Speech Recognition”, 2013

3 Novel Weighting Scheme for Unsupervised Language Model Adaptation Using Latent Dirichlet Allocation Md. Akmal Haidar and Douglas O’Shaughnessy INTERSPEECH 2010

4 Introduction Adaptation is required when the styles, domains, or topics of the test data are mismatched with the training data. It is also important because natural language is highly variable and topic information is highly non-stationary. The idea of unsupervised LM adaptation is to extract the latent topics from the training set, adapt the topic-specific LMs with proper mixture weights, and finally interpolate the result with the generic n-gram LM. In this paper, we propose generating the topic model weights from the word counts of the topics produced by a hard-clustering method.

5 Proposed Method Adaptation: –We can create a dynamically adapted topic model by using a mixture of LMs from different topics as:
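The slide's equation did not survive transcription; a standard form for such a topic mixture, with K topic LMs P_k and mixture weights gamma_k (notation mine, not necessarily the slide's), is:

$$P_{\text{topic}}(w \mid h) \;=\; \sum_{k=1}^{K} \gamma_k \, P_k(w \mid h), \qquad \sum_{k=1}^{K} \gamma_k = 1,$$

where, in this paper, the weights gamma_k are derived from the word counts of the hard-clustered topics.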

6 Proposed Method The adapted topic model is then interpolated with the generic LM as:
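This interpolation equation is also missing; the usual linear form, with interpolation weight lambda (notation mine), is:

$$P_{\text{adapted}}(w \mid h) \;=\; \lambda \, P_{\text{generic}}(w \mid h) + (1 - \lambda) \, P_{\text{topic}}(w \mid h), \qquad 0 \le \lambda \le 1.$$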

7 Experiment Setup We evaluated the LM adaptation approach using the Brown corpus and the WSJ1 corpus transcription text data.

8 Experiments

9 Unsupervised Language Model Adaptation Using Latent Dirichlet Allocation and Dynamic Marginals Md. Akmal Haidar and Douglas O’Shaughnessy INTERSPEECH 2011

10 Introduction To overcome the mismatch problem, we introduce an unsupervised language model adaptation approach using latent Dirichlet allocation (LDA) and dynamic marginals: locally estimated (smoothed) unigram probabilities from in-domain text data. We extend our previous work to find an adapted model by using minimum discrimination information (MDI), which uses the KL divergence as the distance measure between probability distributions. The final adapted model is formed by minimizing the KL divergence between the adapted model and the background model, subject to marginal constraints taken from the LDA adapted topic model.

11 Proposed Method
–Topic clustering
–LDA adapted topic mixture model generation

12 Proposed Method Adaptation using dynamic marginals –The adapted model using dynamic marginals is obtained by minimizing the KL divergence between the adapted model and the background model, subject to the marginalization constraint for each word w in the vocabulary: –The constrained optimization problem has a close connection to the maximum entropy approach, which shows that the adapted model is a rescaled version of the background model:
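Both equations are missing from the transcript; in standard MDI adaptation with unigram dynamic marginals (notation mine, not the slide's), they take roughly this form. The constrained minimization is

$$P_A \;=\; \arg\min_{P} \; D_{\mathrm{KL}}\!\left(P \,\|\, P_B\right) \quad \text{s.t.} \quad \sum_{h} P(h)\, P(w \mid h) \;=\; P_{\text{LDA}}(w) \;\; \forall w \in V,$$

and its solution is a rescaled background model:

$$P_A(w \mid h) \;=\; \frac{\alpha(w)}{Z(h)}\, P_B(w \mid h), \qquad \alpha(w) = \left(\frac{P_{\text{LDA}}(w)}{P_B(w)}\right)^{\beta},$$

where Z(h) normalizes over the vocabulary and the exponent beta (often around 0.5) tempers the scaling.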

13 Proposed Method Since the background model and the LDA adapted topic model both have a standard back-off structure and satisfy the above constraint, the adapted LM has the following recursive formula:
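The recursive formula itself is absent from the transcript; for back-off models the MDI-adapted LM is typically written as (notation mine):

$$P_A(w \mid h) = \begin{cases} \dfrac{\alpha(w)}{Z(h)}\, P_B(w \mid h), & \text{if } (h, w) \text{ is explicitly observed,} \\[4pt] \gamma(h)\, P_A(w \mid \bar{h}), & \text{otherwise,} \end{cases}$$

where \bar{h} is the history with its most distant word dropped and gamma(h) is the back-off weight chosen so the distribution sums to one.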

14 Experiments

15 Topic N-gram Count Language Model Adaptation for Speech Recognition Md. Akmal Haidar and Douglas O’Shaughnessy IEEE SLT 2012

16 Introduction

17 Proposed Method Using these features of the LDA model, we propose two confidence measures to compute the topic mixture weights for each n-gram. The topic n-gram language models are then generated from the topic n-gram counts and are defined as the TNCLM.
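The two confidence measures are equations that did not survive transcription; one plausible instance, assuming LDA supplies word-topic probabilities P(w | k) (this is an illustrative sketch, not necessarily the paper's exact measures), is

$$\gamma_k(w_1^n) \;=\; \frac{\prod_{j=1}^{n} P(w_j \mid k)}{\sum_{k'=1}^{K} \prod_{j=1}^{n} P(w_j \mid k')}, \qquad C_k(w_1^n) \;=\; \gamma_k(w_1^n)\, C(w_1^n),$$

so every n-gram count C(w_1^n) is split fractionally across the K topics, and each TNCLM is estimated from its topic's counts C_k.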

18 Proposed Method The ANCLM (adapted n-gram count language models) are then interpolated with the background n-gram model to capture the local constraints, using linear interpolation as:
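The interpolation equation is missing; the standard linear form, with weight mu (notation mine), is

$$P(w \mid h) \;=\; \mu \, P_{\text{ANCLM}}(w \mid h) + (1 - \mu) \, P_{\text{background}}(w \mid h), \qquad 0 \le \mu \le 1.$$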

19 Experiments

