
1

2 British Museum Library, London. Picture courtesy: Flickr

3 Courtesy: Wikipedia

4 Topic Models and the Role of Sampling Barnan Das

5 British Museum Library, London. Picture courtesy: Flickr

6 Topic Modeling Methods for automatically organizing, understanding, searching, and summarizing large electronic archives: uncover hidden topical patterns in collections; annotate documents according to those topics; use the annotations to organize, summarize, and search.

7 Topic Modeling NIH Grants Topic Map 2011, NIH Map Viewer (https://app.nihmaps.org)

8 Topic Modeling Applications Information retrieval. Content-based image retrieval. Bioinformatics.

9 Overview of this Presentation Latent Dirichlet allocation (LDA). Approximate posterior inference: Gibbs sampling. Paper: Fast Collapsed Gibbs Sampling for LDA.

10 Latent Dirichlet Allocation David Blei's Talk, Machine Learning Summer School, Cambridge, 2009. D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet Allocation," Journal of Machine Learning Research, vol. 3, pp. 993-1022, 2003.

11 Probabilistic Model Generative probabilistic modeling: treats the data as observations arising from a process with hidden variables, where the hidden variables reflect the thematic structure of the collection. Infer the hidden structure using posterior inference (discovering the topics in the collection). Place new data into the estimated model (situating new documents within the estimated topic structure).

12 Intuition

13 Generative Model
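To make the generative story concrete, here is a minimal sketch of LDA's generative process in Python. The function name, parameter names, and the fixed document length N are illustrative assumptions, not taken from the slides:

```python
import numpy as np

def generate_corpus(D, N, K, V, alpha, eta, seed=0):
    """Hypothetical sketch of LDA's generative process.

    D: number of documents, N: words per document (fixed here for simplicity),
    K: number of topics, V: vocabulary size,
    alpha: document-topic Dirichlet parameter, eta: topic-word Dirichlet parameter.
    """
    rng = np.random.default_rng(seed)
    # Draw K topics, each a distribution over the vocabulary (beta_k)
    topics = rng.dirichlet(np.full(V, eta), size=K)
    docs = []
    for _ in range(D):
        theta = rng.dirichlet(np.full(K, alpha))   # per-document topic proportions
        z = rng.choice(K, size=N, p=theta)         # per-word topic assignments
        words = [int(rng.choice(V, p=topics[k])) for k in z]
        docs.append(words)
    return docs, topics
```

In the inference problem discussed next, only `docs` is observed; `topics`, `theta`, and `z` are the hidden variables.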

14 Posterior Distribution Only the documents are observable; infer the underlying topic structure: the topics that generated the documents, for each document its distribution over topics, and for each word the topic that generated it. Algorithmic challenge: finding the conditional distribution of all the latent variables given the observations.

15 LDA as Graphical Model (plate notation: Dirichlet priors and multinomial distributions)
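The plate diagram summarized on this slide corresponds to the joint distribution below, written as a sketch in the notation of Blei et al. (Dirichlet prior α on the topic proportions, η on the topics; N_d is the length of document d):

```latex
p(\beta_{1:K}, \theta_{1:D}, z, w)
  = \prod_{k=1}^{K} p(\beta_k \mid \eta)
    \prod_{d=1}^{D} p(\theta_d \mid \alpha)
    \prod_{n=1}^{N_d} p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid \beta_{z_{d,n}})
```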

16 Posterior Distribution From a collection of documents W, infer: per-word topic assignments z_{d,n}; per-document topic proportions θ_d; per-corpus topic distributions β_k. Use posterior expectations to perform different tasks.

17 Posterior Distribution Evaluate P(z|W), the posterior distribution over the assignment of words to topics; θ and φ can then be estimated.
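For reference, the target can be written as follows; the denominator sums over all K^N possible topic assignments (N being the total number of words), which is why exact evaluation is intractable and sampling is used:

```latex
P(z \mid W) = \frac{P(W, z)}{\sum_{z'} P(W, z')}
```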

18 Computing P(z|W) Involves evaluating a probability distribution over a large discrete space. The contribution of each z_{d,n} depends on: all other assignments z_{-n}; N_k^{w_{d,n}}, the number of times word w_{d,n} has been assigned to topic k; N_k^{d}, the number of times a word from document d has been assigned to topic k. Sample from the target distribution using MCMC.

19 Approximate Posterior Inference: Gibbs Sampling C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006. Iain Murray's Talk, Machine Learning Summer School, Cambridge, 2009.

20 Overview Used when exact inference is intractable. Standard sampling techniques have limitations: they cannot handle all kinds of distributions and do not scale to high-dimensional data. MCMC techniques do not have these limitations. Markov chain: for random variables x^{(1)}, …, x^{(M)}, p(x^{(m+1)} | x^{(1)}, …, x^{(m)}) = p(x^{(m+1)} | x^{(m)}) for m ∈ {1, …, M−1}.

21 Gibbs Sampling Target distribution: p(x) = p(x_1, …, x_M). Choose the initial state of the Markov chain {x_i : i = 1, …, M}. Replace x_i by a value drawn from the distribution p(x_i | x_{-i}), where x_i is the ith component of x and x_{-i} is x_1, …, x_M with x_i omitted. This process is repeated for all the variables, and the whole cycle is repeated for however many samples are needed.
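As an illustration of this cycle, here is a toy Gibbs sampler for a two-dimensional Gaussian with correlation rho, where both conditionals p(x_i | x_{-i}) are one-dimensional Gaussians. The example and all names are assumptions for illustration, not from the slides:

```python
import numpy as np

def gibbs_bivariate_gaussian(rho, n_samples, seed=0):
    """Toy Gibbs sampler for a standard bivariate Gaussian with correlation rho,
    illustrating the 'draw each x_i from p(x_i | x_-i)' cycle."""
    rng = np.random.default_rng(seed)
    x = np.zeros(2)                      # initial state of the Markov chain
    samples = np.empty((n_samples, 2))
    for m in range(n_samples):
        # For this target, x1 | x2 ~ N(rho * x2, 1 - rho^2) and symmetrically for x2
        x[0] = rng.normal(rho * x[1], np.sqrt(1 - rho**2))
        x[1] = rng.normal(rho * x[0], np.sqrt(1 - rho**2))
        samples[m] = x
    return samples

# Example usage: samples = gibbs_bivariate_gaussian(0.9, 10_000)
```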

22 Why Gibbs Sampling? Compared to other MCMC techniques, Gibbs sampling is: Easy to implement Requires little memory Competitive in speed and performance

23 Gibbs Sampling for LDA The full conditional distribution is proportional to the product of two factors: the probability of w_{d,n} under topic k and the probability of topic k in document d. The normalization constant is Z = Σ_k (probability of w_{d,n} under topic k) × (probability of topic k in document d).
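The equation on the original slide is not reproduced in the transcript; as a sketch, the standard collapsed Gibbs conditional (Griffiths and Steyvers' style, where β is the symmetric Dirichlet smoothing on topics, V is the vocabulary size, and the counts N exclude the token currently being resampled) is:

```latex
p(z_{d,n} = k \mid z_{-(d,n)}, W)
  = \frac{1}{Z}\,
    \underbrace{\frac{N_k^{w_{d,n}} + \beta}{N_k + V\beta}}_{\text{prob. of } w_{d,n} \text{ under topic } k}\;
    \underbrace{\bigl(N_k^{d} + \alpha\bigr)}_{\text{prob. of topic } k \text{ in document } d},
\qquad
Z = \sum_{k=1}^{K} \frac{N_k^{w_{d,n}} + \beta}{N_k + V\beta}\,\bigl(N_k^{d} + \alpha\bigr)
```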

24 Gibbs Sampling for LDA Target distribution: P(z|W). Initial state of the Markov chain: each z_n is assigned a value in {1, 2, …, K}. The chain is run for a number of iterations; in each iteration a new state is found by sampling each z_n from the full conditional above.
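A minimal collapsed Gibbs sampler implementing this loop might look like the sketch below, assuming the count-based conditional shown above; the function and variable names are my own, not from the paper or the slides:

```python
import numpy as np

def collapsed_gibbs_lda(docs, K, V, alpha, beta, n_iters, seed=0):
    """Sketch of a collapsed Gibbs sampler for LDA.

    docs: list of documents, each a list of word ids in [0, V).
    """
    rng = np.random.default_rng(seed)
    D = len(docs)
    n_kw = np.zeros((K, V))   # N_k^w: topic-word counts
    n_dk = np.zeros((D, K))   # N_k^d: document-topic counts
    n_k = np.zeros(K)         # total words assigned to each topic
    z = [np.zeros(len(doc), dtype=int) for doc in docs]

    # Random initialisation of topic assignments
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            k = rng.integers(K)
            z[d][n] = k
            n_kw[k, w] += 1; n_dk[d, k] += 1; n_k[k] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k = z[d][n]
                # Remove the current assignment from the counts
                n_kw[k, w] -= 1; n_dk[d, k] -= 1; n_k[k] -= 1
                # Full conditional p(z_dn = k | z_-dn, W), up to normalisation
                p = (n_kw[:, w] + beta) / (n_k + V * beta) * (n_dk[d] + alpha)
                k = rng.choice(K, p=p / p.sum())
                z[d][n] = k
                n_kw[k, w] += 1; n_dk[d, k] += 1; n_k[k] += 1
    return z, n_kw, n_dk, n_k
```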

25 Gibbs Sampling for LDA Subsequent samples are taken after an appropriate lag to ensure that their autocorrelation is low. This is collapsed Gibbs sampling. For a single sample, θ and φ are calculated from z.
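The point estimates usually used for this step are, as a standard sketch (N_d is the number of words in document d):

```latex
\hat{\theta}_{d,k} = \frac{N_k^{d} + \alpha}{N_d + K\alpha},
\qquad
\hat{\phi}_{k,w} = \frac{N_k^{w} + \beta}{N_k + V\beta}
```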

26 Fast Collapsed Gibbs Sampling For Latent Dirichlet Allocation Ian Porteous, David Newman, Alexander Ihler, Arthur Asuncion, Padhraic Smyth, Max Welling University of California, Irvine

27 FastLDA: Graphical Representation

28 FastLDA: Segments A sequence of bounds on Z: Z_1, …, Z_K with Z_1 ≥ Z_2 ≥ … ≥ Z_K = Z. Several segments s_k^l, …, s_k^K for each topic k. The first segment is a conservative estimate of the probability of the topic given the upper bound Z_k on the true normalization factor Z; subsequent segments are corrections for the missing probability mass for the topic given the improved bounds.

29 FastLDA: Segments

30 Upper Bounds for Z Find a sequence of improving bounds on the normalization constant. Z is defined in terms of component vectors. Hölder's inequality is used to construct the initial upper bound, which is then intelligently improved for each topic.
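Hölder's inequality gives a bound of the following general form; the particular split of Z into component vectors a and b shown here is only an illustration of the technique, not necessarily the exact decomposition used in the paper:

```latex
Z = \sum_{k=1}^{K} a_k b_k \;\le\; \|a\|_p \, \|b\|_q,
\qquad \tfrac{1}{p} + \tfrac{1}{q} = 1,
\quad \text{e.g. } a_k = N_k^{d} + \alpha,\;\; b_k = \frac{N_k^{w_{d,n}} + \beta}{N_k + V\beta}
```

As topics are visited, their exact terms a_k b_k replace the bounded contributions, so the bound tightens toward Z.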

31 Fast LDA Algorithm Algorithm: sort topics in decreasing order of N_k^d; draw u ~ Uniform[0,1]; for topics in order, calculate the lengths of the segments, improving the bound Z_k with each topic visited; when the sum of segments exceeds u, return that topic. Complexity: no more than O(K log K) for any operation.
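To illustrate the segment bookkeeping and early termination, here is a self-contained sketch. It is not the paper's implementation: in place of the Hölder-based bounds it uses a simple per-topic cap (`cap`, an upper bound on every unnormalized topic mass) to build decreasing bounds Z_l ≥ Z with the last bound equal to Z, and all names are my own:

```python
import numpy as np

def fast_sample(mass, K, cap, u):
    """Draw k with probability mass(k) / Z, Z = sum_k mass(k), evaluating as few
    topics as possible.  `cap` must satisfy mass(k) <= cap for all k; topics
    should be pre-sorted so that large masses tend to come first (e.g. by
    decreasing N_k^d)."""
    p = []            # exact masses computed so far
    S = 0.0           # their running sum
    prev_inv = 0.0    # 1 / Z_{l-1} (0 before any bound exists)
    covered = 0.0     # total segment length laid out so far (= S_l / Z_l)
    for l in range(K):
        p_l = mass(l)
        p.append(p_l)
        S += p_l
        Z_l = S + (K - 1 - l) * cap   # valid upper bound on Z, tightens each step
        new_covered = S / Z_l
        if u < new_covered:
            # u falls in this stage's segments: first the correction segments
            # for topics 0..l-1, then the initial segment of topic l.
            offset = u - covered
            delta = 1.0 / Z_l - prev_inv   # correction length per unit of mass
            for k in range(l):
                if offset < p[k] * delta:
                    return k
                offset -= p[k] * delta
            return l
        covered = new_covered
        prev_inv = 1.0 / Z_l
    return K - 1   # numerical safety net; u is always covered by the last stage

# Example: draw from unnormalised masses [5, 3, 1, 0.5, 0.5] with cap = 5
rng = np.random.default_rng(0)
p = [5.0, 3.0, 1.0, 0.5, 0.5]
draws = [fast_sample(lambda k: p[k], len(p), max(p), rng.random())
         for _ in range(10_000)]
# np.bincount(draws) / len(draws) approximates p / sum(p)
```

Each topic's segments sum to exactly its normalized probability, so the draw is exact; when the leading masses dominate and the cap is tight, the loop typically stops before all K topics are visited.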

32 Experiments Four large datasets: NIPS full papers, Enron emails, NY Times news articles, and PubMed abstracts. Dirichlet hyperparameters set to 0.01 and 2/K. Computations were run on workstations with dual Xeon 3.0 GHz processors; code compiled with gcc version 3.4.

33 Results Speedup: 5-8 times.

34 Results Speedup relatively insensitive to number of documents in the corpus.

35 Results A large Dirichlet parameter smooths the distribution of topics within a document, so FastLDA needs to visit and compute more topics before drawing a sample.

36 Discussions

37 Other domains. Other sampling techniques. Distributions other than the Dirichlet. Parallel computation: Newman et al., "Scalable parallel topic models". Deciding on the value of K. Choices of bounds. The reason behind choosing these datasets. Are the values mentioned in the paper magic numbers? Why were words with count < 10 discarded? Assigning weights to words.

38

39 Backup Slides

40 Dirichlet Distribution The Dirichlet distribution is an exponential-family distribution over the simplex, i.e., positive vectors that sum to one. The Dirichlet is conjugate to the multinomial: given a multinomial observation, the posterior distribution of θ is a Dirichlet.
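Written out as a standard sketch, the density over the simplex and the conjugate update are:

```latex
p(\theta \mid \alpha) = \frac{\Gamma\!\bigl(\sum_{k} \alpha_k\bigr)}{\prod_{k} \Gamma(\alpha_k)}
  \prod_{k=1}^{K} \theta_k^{\alpha_k - 1},
\qquad \theta_k \ge 0,\ \ \sum_{k=1}^{K} \theta_k = 1
```

```latex
\theta \mid n_1, \dots, n_K \;\sim\; \mathrm{Dirichlet}(\alpha_1 + n_1, \dots, \alpha_K + n_K)
```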

