1 Bayesian Learning for Latent Semantic Analysis. Jen-Tzung Chien, Meng-Sun Wu and Chia-Sheng Wu. Presenter: Hsuan-Sheng Chiu

2 Reference
Chia-Sheng Wu, "Bayesian Latent Semantic Analysis for Text Categorization and Information Retrieval", 2005.
Q. Huo and C.-H. Lee, "On-line adaptive learning of the continuous density hidden Markov model based on approximate recursive Bayes estimate", 1997.

3 Outline: Introduction; PLSA: ML (Maximum Likelihood), MAP (Maximum A Posteriori), QB (Quasi-Bayes); Experiments; Conclusions

4 Introduction: LSA vs. PLSA (linear algebra vs. probability, semantic space vs. latent topics); batch learning vs. incremental learning.

5 PLSA: PLSA is a general machine learning technique that adopts the aspect model to represent co-occurrence data: latent topics (hidden variables) explain a corpus of document-word pairs.

6 PLSA: Assume that d_i and w_j are conditionally independent given the associated latent topic z_k. Joint probability:
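The joint-probability equation is an image on the original slide; for reference, the standard aspect-model factorization it refers to (Hofmann's PLSA) can be written as

```latex
P(d_i, w_j) \;=\; P(d_i) \sum_{k=1}^{K} P(w_j \mid z_k)\, P(z_k \mid d_i)
          \;=\; \sum_{k=1}^{K} P(z_k)\, P(d_i \mid z_k)\, P(w_j \mid z_k)
```

where the two parameterizations are equivalent; the paper works with one of them, and its exact notation may differ.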

7 ML PLSA: Log likelihood of Y; ML estimation:
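The log-likelihood equation is likewise an image; assuming the usual count notation n(d_i, w_j) for the co-occurrence data Y, the PLSA log likelihood has the standard form

```latex
\log P(Y \mid \theta) \;=\; \sum_{i=1}^{M} \sum_{j=1}^{N} n(d_i, w_j)\, \log P(d_i, w_j)
```

and ML estimation maximizes this quantity over the multinomial parameters.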

8 ML PLSA: Maximization:

9 ML PLSA: Complete data; incomplete data; EM (Expectation-Maximization) algorithm: E-step, M-step.
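As an illustration of the E-step/M-step alternation described on this slide, here is a minimal NumPy sketch of PLSA EM; the function name, array layout, and initialization are illustrative assumptions, not the paper's code.

```python
import numpy as np

def plsa_em(n_dw, K, iters=50, seed=0):
    """Minimal PLSA EM sketch (illustrative, not the paper's implementation).
    n_dw: (D, W) matrix of document-word co-occurrence counts.
    Returns P(z|d) with shape (D, K) and P(w|z) with shape (K, W)."""
    rng = np.random.default_rng(seed)
    D, W = n_dw.shape
    p_z_d = rng.dirichlet(np.ones(K), size=D)      # P(z_k | d_i)
    p_w_z = rng.dirichlet(np.ones(W), size=K)      # P(w_j | z_k)
    for _ in range(iters):
        # E-step: posterior P(z_k | d_i, w_j), shape (D, W, K)
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        post = joint / joint.sum(axis=2, keepdims=True)
        # M-step: re-estimate from expected counts n(d_i, w_j) * P(z_k | d_i, w_j)
        exp_counts = n_dw[:, :, None] * post        # (D, W, K)
        p_w_z = exp_counts.sum(axis=0).T            # (K, W)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_z_d = exp_counts.sum(axis=1)              # (D, K)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    return p_z_d, p_w_z
```

Each iteration is guaranteed not to decrease the incomplete-data log likelihood, which is the usual EM property the following slides derive.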

10 ML PLSA: E-step
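The E-step computes the topic posterior; in standard PLSA notation it is

```latex
P(z_k \mid d_i, w_j) \;=\; \frac{P(z_k \mid d_i)\, P(w_j \mid z_k)}{\sum_{l=1}^{K} P(z_l \mid d_i)\, P(w_j \mid z_l)}
```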

11 ML PLSA: Auxiliary function:

12 ML PLSA: M-step (maximization with Lagrange multipliers for the sum-to-one constraints)

13 ML PLSA: Differentiation; new parameter estimation:
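In standard PLSA notation (the slide's symbols may differ), the re-estimation formulas obtained from this differentiation are

```latex
P(w_j \mid z_k) \;=\; \frac{\sum_i n(d_i, w_j)\, P(z_k \mid d_i, w_j)}
{\sum_{j'} \sum_i n(d_i, w_{j'})\, P(z_k \mid d_i, w_{j'})},
\qquad
P(z_k \mid d_i) \;=\; \frac{\sum_j n(d_i, w_j)\, P(z_k \mid d_i, w_j)}{n(d_i)}
```

where n(d_i) = Σ_j n(d_i, w_j) is the length of document d_i.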

14 MAP PLSA: Estimation by maximizing the posterior probability. Definition of the prior distribution: a Dirichlet density; prior density (expressed with a Kronecker delta). Assume the two parameter sets, P(w_j|z_k) and P(z_k|d_i), are a priori independent.
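One common way to write such a Dirichlet prior over both PLSA parameter sets is (the hyperparameter names α and β here are illustrative):

```latex
g(\theta) \;\propto\; \prod_{k=1}^{K} \prod_{j=1}^{N} P(w_j \mid z_k)^{\alpha_{jk} - 1}
\;\cdot\; \prod_{i=1}^{M} \prod_{k=1}^{K} P(z_k \mid d_i)^{\beta_{ik} - 1}
```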

15 MAP PLSA: Consider the prior density; maximum a posteriori estimation:
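The MAP criterion referred to here is the usual one, combining the data likelihood with the prior:

```latex
\theta_{\mathrm{MAP}} \;=\; \arg\max_{\theta}\; \big[\, \log P(Y \mid \theta) + \log g(\theta) \,\big]
```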

16 MAP PLSA: E-step (expectation); auxiliary function:

17 MAP PLSA: M-step; Lagrange multiplier

18 MAP PLSA: Auxiliary function:

19 MAP PLSA: Differentiation; new parameter estimation:
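For a multinomial with a Dirichlet prior, the MAP re-estimate adds the (hyperparameter - 1) pseudo-counts to the expected counts; in the illustrative notation used above, one would expect the form

```latex
P(w_j \mid z_k) \;=\; \frac{\sum_i n(d_i, w_j)\, P(z_k \mid d_i, w_j) \;+\; \alpha_{jk} - 1}
{\sum_{j'} \Big[ \sum_i n(d_i, w_{j'})\, P(z_k \mid d_i, w_{j'}) \;+\; \alpha_{j'k} - 1 \Big]}
```

with an analogous update for P(z_k | d_i) using the β hyperparameters.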

20 QB PLSA: An online information system needs to update its parameters continuously. Estimation again maximizes the posterior probability, but the posterior density is approximated by the closest tractable prior density with hyperparameters. Compared to MAP PLSA, the key difference in QB PLSA is the updating of the hyperparameters.

21 QB PLSA: Conjugate prior: in Bayesian probability theory, a conjugate prior is a prior distribution with the property that the posterior distribution belongs to the same family. This gives a closed-form solution and a reproducible prior/posterior pair for incremental learning.
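As a concrete instance of this property, the Dirichlet is conjugate to the multinomial: with a Dirichlet(α) prior and (expected) counts n from newly observed data, the posterior is again a Dirichlet, so the hyperparameters simply accumulate counts:

```latex
\mathrm{Dir}(\theta \mid \alpha_1, \dots, \alpha_V) \times \mathrm{Mult}(n_1, \dots, n_V \mid \theta)
\;\;\longrightarrow\;\; \mathrm{Dir}(\theta \mid \alpha_1 + n_1, \dots, \alpha_V + n_V)
```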

22 QB PLSA: Hyperparameter α:

23 QB PLSA: After careful rearrangement, the exponential of the posterior expectation function can be expressed so that a reproducible prior/posterior pair is generated, which builds the updating mechanism for the hyperparameters.
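A minimal sketch of one such incremental quasi-Bayes update, assuming the Dirichlet-multinomial accumulation above (the helper name, array shapes, and the use of expected counts as sufficient statistics are assumptions, not the paper's exact algorithm):

```python
import numpy as np

def qb_update(alpha_wz, n_dw, p_z_d, p_w_z):
    """One quasi-Bayes increment for the P(w|z) hyperparameters (illustrative).
    alpha_wz: (K, W) Dirichlet hyperparameters carried over from previous data.
    n_dw:     (D, W) word counts of the newly arrived documents."""
    # E-step on the new block: posterior P(z_k | d_i, w_j), shape (D, W, K)
    joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
    post = joint / joint.sum(axis=2, keepdims=True)
    # Accumulate expected counts into the hyperparameters (reproducible prior/posterior pair)
    alpha_wz = alpha_wz + (n_dw[:, :, None] * post).sum(axis=0).T
    # Point estimate taken as the mode of the updated Dirichlet (MAP-style)
    p_w_z = (alpha_wz - 1.0).clip(min=1e-12)
    p_w_z = p_w_z / p_w_z.sum(axis=1, keepdims=True)
    return alpha_wz, p_w_z
```

Because the updated hyperparameters summarize all data seen so far, the next block of documents can be processed with exactly the same call, which is the reproducibility the slide refers to.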

24 Initial Hyperparameters: An open issue in Bayesian learning. If the initial prior knowledge is too strong, or after a large amount of adaptation data has been processed incrementally, new adaptation data usually have only a small impact on parameter updates during incremental training.

25 Experiments
MED corpus: 1033 medical abstracts with 30 queries; 7014 unique terms; 433 abstracts for ML training; 600 abstracts for MAP or QB training; query subset for testing; K = 8.
Reuters-21578: 4270 documents for training; 2925 for QB learning; 2790 documents for testing; 13353 unique words; 10 categories.

26 Experiments

27 Experiments

28 Experiments

29 Conclusions: This paper presented an adaptive text modeling and classification approach for PLSA-based information systems. Future work: extend PLSA to bigram or trigram models, and apply it to spoken document classification and retrieval.

30 Discriminative Maximum Entropy Language Model for Speech Recognition. Chuang-Hua Chueh, To-Chang Chien and Jen-Tzung Chien. Presenter: Hsuan-Sheng Chiu

31 Reference
R. Rosenfeld, S. F. Chen and X. Zhu, "Whole-sentence exponential language models: a vehicle for linguistic-statistical integration", 2001.
W.H. Tsai, "An Initial Study on Language Model Estimation and Adaptation Techniques for Mandarin Large Vocabulary Continuous Speech Recognition", 2005.

32 Outline: Introduction; Whole-sentence exponential model; Discriminative ME language model; Experiment; Conclusions

33 Introduction: Language models include the statistical n-gram model, the latent semantic language model, and the structured language model. Based on the maximum entropy principle, we can integrate different features to establish an optimal probability distribution.

34 Whole-Sentence Exponential Model: Traditional method; exponential form. Usage: when used for speech recognition, the model is not suitable for the first pass of the recognizer and should be used to re-score N-best lists.
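The exponential form referenced here, following Rosenfeld et al. (2001), is

```latex
P(W) \;=\; \frac{1}{Z}\, P_0(W)\, \exp\!\Big( \sum_{i} \lambda_i f_i(W) \Big)
```

where P_0(W) is a baseline model (for example, an n-gram), the f_i are whole-sentence feature functions, the λ_i are their weights, and Z normalizes over all sentences.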

35 Whole-Sentence ME Language Model: Expectation of the feature function: empirical and actual (model) expectations; constraint:
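The constraint is the usual maximum-entropy one: for every feature, the model expectation must match the empirical expectation observed in training data,

```latex
E_{\tilde{p}}[\, f_i \,] \;=\; E_{P}[\, f_i \,], \qquad i = 1, \dots, F.
```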

36 Whole-Sentence ME Language Model: To solve the constrained optimization problem:

37 GIS algorithm
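A minimal sketch of GIS (Generalized Iterative Scaling) for a log-linear model over a finite candidate set, such as an N-best list; the uniform baseline, the slack feature, and all names are illustrative assumptions rather than the paper's setup.

```python
import numpy as np

def gis(features, emp_expect, iters=100):
    """Generalized Iterative Scaling over a finite candidate set (illustrative).
    features:   (S, F) matrix, features[s, i] = f_i(candidate s), all >= 0.
    emp_expect: (F,) empirical feature expectations the model must reproduce."""
    S, F = features.shape
    # Slack feature so that every candidate's features sum to the same constant C
    C = features.sum(axis=1).max()
    feats = np.hstack([features, (C - features.sum(axis=1))[:, None]])   # (S, F+1)
    emp = np.append(emp_expect, C - emp_expect.sum())
    lam = np.zeros(F + 1)
    for _ in range(iters):
        # Current model: p(s) proportional to exp(lambda . f(s)) over the candidates
        scores = feats @ lam
        p = np.exp(scores - scores.max())
        p /= p.sum()
        model_expect = p @ feats                                          # E_P[f_i]
        # GIS update: lambda_i += (1/C) * log(E_emp[f_i] / E_P[f_i])
        lam += np.log((emp + 1e-12) / (model_expect + 1e-12)) / C
    return lam[:F]
```

For a genuinely whole-sentence model the expectation E_P[f_i] cannot be enumerated over all sentences and has to be approximated, e.g. by sampling, which is the issue discussed by Rosenfeld et al.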

38 Discriminative ME Language Model: In general, ME can be considered a maximum likelihood model with a log-linear distribution. We propose a discriminative language model based on the whole-sentence ME model (DME).

39 Discriminative ME Language Model: Acoustic features for ME estimation: the sentence-level log-likelihood ratio of competing and target sentences. Feature weight parameter: the feature parameter is activated (set to one) for those speech signals observed in the training database.

40 Discriminative ME Language Model: New estimation; upgrade to discriminative linguistic parameters.

41 Discriminative ME Language Model

42 Experiment
Corpus: TCC300; 32 mixtures; 12 Mel-frequency cepstral coefficients, 1 log-energy, and their first derivatives; 4200 sentences for training, 450 for testing.
Corpus: Academia Sinica CKIP balanced corpus; five million words; vocabulary of 32909 words.

43 Experiment

44 Conclusions: A new ME language model integrating linguistic and acoustic features for speech recognition. The derived ME language model is inherently discriminative. The DME model involves a constrained optimization procedure and is powerful for knowledge integration.

45 Relation between DME and MMI: MMI criterion; modified MMI criterion; express the ME model as an ML model:
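For reference, the standard MMI criterion over training utterances (X_r, W_r) has the form (the slide's modified criterion and the ME-to-ML rewriting are not reproduced here):

```latex
F_{\mathrm{MMI}}(\Lambda) \;=\; \sum_{r} \log \frac{p_{\Lambda}(X_r \mid W_r)\, P(W_r)}
{\sum_{W} p_{\Lambda}(X_r \mid W)\, P(W)}
```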

46 Relation between DME and MMI: The optimal parameter:

47 Relation between DME and MMI

48 Relation between DME and MMI

