1 Style & Topic Language Model Adaptation Using HMM-LDA Bo-June (Paul) Hsu, James Glass

2 Outline: Introduction, LDA, HMM-LDA, Experiments, Conclusions

3 Introduction An effective LM needs to account not only for the casual speaking style of lectures but also for the topic-specific vocabulary of the subject matter. Available training corpora rarely match the target lecture in both style and topic. In this paper, syntactic state and semantic topic assignments are investigated using the combined HMM and LDA model (HMM-LDA).

4 LDA A generative probabilistic model of a corpus. The per-document topic mixture is drawn from a conjugate Dirichlet prior. –PLSA –LDA –Model parameters
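A minimal sketch of the LDA generative story for one document; the symbols θ, z_n, w_n, φ and the hyperparameter α are assumed notation, not taken from the slides:

$$\theta \sim \mathrm{Dirichlet}(\alpha), \qquad z_n \mid \theta \sim \mathrm{Multinomial}(\theta), \qquad w_n \mid z_n \sim \mathrm{Multinomial}\!\left(\phi^{(z_n)}\right)$$

Unlike PLSA, which treats each training document's topic mixture as a free parameter, LDA places a Dirichlet prior over θ, keeping the number of model parameters fixed and allowing inference on unseen documents.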

5 Markov chain Monte Carlo A class of algorithms for sampling from probability distributions by constructing a Markov chain that has the desired distribution as its stationary distribution. The most common application is the numerical calculation of multi-dimensional integrals: an ensemble of "walkers" moves around randomly, and the Markov chain is constructed so that the integrand is its equilibrium distribution.
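A minimal sketch of one such algorithm, a random-walk Metropolis sampler, applied to an illustrative one-dimensional target; the function names, step size, and burn-in length are assumptions for illustration, not part of the slides:

```python
import numpy as np

def metropolis(log_target, n_samples, step=0.5, x0=0.0):
    """Random-walk Metropolis: samples whose stationary distribution is exp(log_target)."""
    samples, x = [], x0
    for _ in range(n_samples):
        proposal = x + step * np.random.randn()          # random-walk proposal
        # accept with probability min(1, target(proposal) / target(x))
        if np.log(np.random.rand()) < log_target(proposal) - log_target(x):
            x = proposal
        samples.append(x)
    return np.array(samples)

# Example: estimate E[x] and E[x^2] under a standard normal target.
draws = metropolis(lambda x: -0.5 * x**2, n_samples=50_000)
print(draws[5_000:].mean(), (draws[5_000:] ** 2).mean())  # roughly 0.0 and 1.0
```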

6 LDA Estimate the posterior over topic assignments. Integrating out the multinomial parameters allows collapsed Gibbs sampling over the topic assignments.
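A hedged reconstruction of the standard collapsed Gibbs sampling update for LDA (notation follows Griffiths and Steyvers: the counts n exclude the current token i, W is the vocabulary size, and T is the number of topics):

$$P(z_i = k \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\; \frac{n^{(w_i)}_{-i,k} + \beta}{n^{(\cdot)}_{-i,k} + W\beta} \cdot \frac{n^{(d_i)}_{-i,k} + \alpha}{n^{(d_i)}_{-i,\cdot} + T\alpha}$$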

7 Markov chain Monte Carlo (cont.) Gibbs Sampling http://en.wikipedia.org/wiki/Gibbs_sampling
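A minimal sketch of Gibbs sampling on a toy target, a bivariate normal with correlation ρ, where each coordinate is resampled from its exact conditional given the other; the example distribution and all names are assumptions for illustration:

```python
import numpy as np

def gibbs_bivariate_normal(n_samples, rho=0.8):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    x, y = 0.0, 0.0
    samples = np.empty((n_samples, 2))
    for i in range(n_samples):
        # conditional x | y is N(rho * y, 1 - rho^2), and symmetrically for y | x
        x = rho * y + np.sqrt(1 - rho**2) * np.random.randn()
        y = rho * x + np.sqrt(1 - rho**2) * np.random.randn()
        samples[i] = (x, y)
    return samples

draws = gibbs_bivariate_normal(20_000)
print(np.corrcoef(draws[2_000:].T))  # sample correlation close to 0.8
```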

8 HMM+LDA HMMs generate documents purely from syntactic relations among unobserved word classes –Short-range dependencies Topic models generate documents from semantic correlations between words, independent of word order –Long-range dependencies A major advantage of generative models is modularity –Different models are easily combined, e.g. as a mixture of models or a product of models –Only a subset of words, the content words, exhibit long-range dependencies The composite model replaces one probability distribution over words in the syntactic model with the semantic (topic) model

9 HMM+LDA (cont.) Notation: –A sequence of words w = (w_1, ..., w_n) –A sequence of topic assignments z = (z_1, ..., z_n) –A sequence of classes c = (c_1, ..., c_n); c_i = 1 denotes the semantic class –Each topic z is associated with a distribution φ^(z) over words –Each class c ≠ 1 is associated with its own distribution over words –Each document d has a distribution θ^(d) over topics –Transitions between classes c_{i−1} and c_i follow a distribution π^(c_{i−1})

10 HMM+LDA (cont.) A document d is generated as follows (a sketch follows): –Sample θ^(d) from a Dirichlet(α) prior –For each word w_i in document d: draw z_i from θ^(d); draw c_i from π^(c_{i−1}); if c_i = 1, draw w_i from φ^(z_i), else draw w_i from the word distribution of class c_i
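A minimal sketch of this generative process in Python; the variable names, argument layout, and the choice of class index 0 as the semantic class are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def generate_document(n_words, phi, class_dists, trans, alpha, semantic_class=0):
    """phi: (T, V) topic-word distributions; class_dists: (C, V) word distributions
    for the syntactic classes; trans: (C, C) class transition matrix.
    Tokens in `semantic_class` are emitted by the topic model."""
    n_topics, vocab_size = phi.shape
    theta = np.random.dirichlet([alpha] * n_topics)       # document topic mixture
    words, c = [], 0                                       # start in an arbitrary class
    for _ in range(n_words):
        z = np.random.choice(n_topics, p=theta)            # topic assignment
        c = np.random.choice(len(trans), p=trans[c])       # class transition
        if c == semantic_class:
            w = np.random.choice(vocab_size, p=phi[z])     # content word from topic z
        else:
            w = np.random.choice(vocab_size, p=class_dists[c])  # word from syntactic class
        words.append(w)
    return words
```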

11 HMM+LDA (cont.) Inference –The topic word distributions φ^(z) are drawn from a Dirichlet(β) prior –The rows of the class transition matrix are drawn from a Dirichlet(γ) prior –The document topic distributions θ^(d) are drawn from a Dirichlet(α) prior –All Dirichlet distributions are assumed symmetric

12 HMM+LDA (cont.) Gibbs Sampling: the topic and class assignments are resampled from their conditional distributions given all other assignments.
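A hedged sketch of the topic-assignment conditional in collapsed Gibbs sampling for HMM-LDA, following the Griffiths et al. formulation; counts exclude token i, and only tokens in the semantic class (c_i = 1) contribute the topic-word factor:

$$P(z_i = k \mid \mathbf{z}_{-i}, \mathbf{c}, \mathbf{w}) \;\propto\; \left(n^{(d_i)}_{-i,k} + \alpha\right) \cdot \begin{cases} \dfrac{n^{(w_i)}_{-i,k} + \beta}{n^{(\cdot)}_{-i,k} + W\beta} & \text{if } c_i = 1 \\[1ex] 1 & \text{otherwise} \end{cases}$$

The class assignments c_i are resampled analogously from a conditional that combines the word-emission counts with the class transition counts into and out of position i.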

13 HMM-LDA Analysis Lectures Corpus –3 undergraduate subjects in math, physics, and computer science –10 CS lectures for the development set, 10 CS lectures for the test set Textbook Corpus –CS course textbook –Divided into 271 topic-cohesive documents at every section heading Run the Gibbs sampler on the two datasets –Lectures: 2,800 iterations; Textbook: 2,000 iterations –Use the lowest-perplexity model as the final model

14 HMM-LDA Analysis (cont.) Semantic topics (Lectures): example topics include Machine Learning, Linear Algebra, Magnetism, and Childhood Memories. A cursory examination of the data suggests that speakers talking about children tend to laugh more during the lecture. Although it may not be desirable to capture speaker idiosyncrasies in the topic mixtures, HMM-LDA has clearly demonstrated its ability to capture distinctive semantic topics in a corpus.

15 HMM-LDA Analysis (cont.) Semantic topics (Textbook): in a topically coherent paragraph, 6 of the 7 instances of the words "and" and "or" (underlined) are correctly classified. Multi-word topic key phrases can be identified for n-gram topic models, demonstrating the context-dependent labeling ability of the HMM-LDA model.

16 HMM-LDA Analysis (cont.) Syntactic states (Lectures) –State 20 is the topic state; other states correspond to verbs, prepositions, hesitation disfluencies, and conjunctions As demonstrated on spontaneous speech, HMM-LDA yields syntactic states that correspond well to part-of-speech labels, without requiring any labeled training data

17 Discussion Although MCMC techniques converge to the global stationary distribution, convergence cannot be guaranteed from observation of the perplexity alone. Unlike EM algorithms, random sampling may temporarily decrease the model likelihood. The number of iterations was therefore chosen to be at least double the point at which the perplexity first appeared to converge.

18 Language Modeling Experiments Baseline model: Lectures + Textbook interpolated trigram model (using modified Kneser-Ney discounting) Topic-deemphasized style (trigram) model (Lectures): –Deemphasizes the observed occurrences of topic words and ideally redistributes these counts to all potential topic words –The counts of topic-to-style word transitions are not altered
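A hedged sketch of the baseline's linear interpolation of the two corpus trigram models; λ is assumed notation for the interpolation weight and h denotes the bigram history:

$$p_{\text{baseline}}(w \mid h) = \lambda\, p_{\text{Lectures}}(w \mid h) + (1 - \lambda)\, p_{\text{Textbook}}(w \mid h)$$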

19 Language Modeling Experiments (cont.) The Textbook model should ideally receive higher weight in contexts containing topic words. Domain trigram model (Textbook): –Emphasizes sequences containing a topic word in the context by doubling their counts

20 Language Modeling Experiments (cont.) Unsmoothed topical trigram models: –Apply HMM-LDA with 100 topics to identify representative words and their associated contexts for each topic Topic mixtures for all models: –Mixture weights were tuned on the individual target lectures (a cheating condition) –15 of the 100 topics account for over 90% of the total weight

21 Language Modeling Experiments (cont.) Since the topic distribution shifts over a long lecture, modeling a lecture with fixed weights may not be optimal. Instead, the mixture distribution is updated by linearly interpolating it with the posterior topic distribution given the current word, as sketched below.
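A hedged sketch of such an update rule; the decay factor γ and the notation are assumptions for illustration, not taken from the slides:

$$\hat{m}_{i+1}(t) = \gamma\, \hat{m}_i(t) + (1 - \gamma)\, P(t \mid w_i)$$

where \hat{m}_i is the topic mixture used when predicting word i and P(t | w_i) is the posterior topic distribution given the current word.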

22 Language Modeling Experiments (cont.) Variation of the topic mixture over a lecture: review of the previous lecture -> an example of computation using accumulators -> focus on streams as a data structure, with an intervening example that finds pairs i and j that sum to a prime

23 Language Modeling Experiments (cont.) Experimental results

24 Conclusions HMM-LDA shows great promise for finding structure in unlabeled data, from which more sophisticated models can be built. Speaker-specific adaptation will be investigated in future work.

