Presentation transcript:

British Museum Library, London Picture Courtesy: flickr

Courtesy: Wikipedia

Topic Models and the Role of Sampling Barnan Das

British Museum Library, London Picture Courtesy: flickr

Topic Modeling Methods for automatically organizing, understanding, searching and summarizing large electronic archives. Uncover hidden topical patterns in collections. Annotate documents according to topics. Using annotations to organize, summarize and search.

Topic Modeling NIH Grants Topic Map 2011 NIH Map Viewer (

Topic Modeling Applications Information retrieval. Content-based image retrieval. Bioinformatics

Overview of this Presentation Latent Dirichlet allocation (LDA) Approximate posterior inference: Gibbs sampling Paper: Fast Collapsed Gibbs Sampling for LDA

Latent Dirichlet Allocation David Blei's Talk, Machine Learning Summer School, Cambridge 2009. D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," The Journal of Machine Learning Research, vol. 3, pp. 993-1022, 2003.

Probabilistic Model Generative probabilistic modeling Treats data as observations Contains hidden variables Hidden variables reflect thematic structure of the collection. Infer hidden structure using posterior inference Discovering topics in the collection. Placing new data into the estimated model Situating new documents into the estimated topic structure.

Intuition

Generative Model
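To make the generative story concrete, here is a minimal Python sketch of it (not from the slides; toy sizes, numpy assumed, and the hyperparameter names alpha/eta are my own labels): each document draws topic proportions from a Dirichlet, and each word draws a topic from those proportions and then a word from that topic's distribution.

import numpy as np

rng = np.random.default_rng(0)
K, V, D, N = 3, 20, 5, 50            # topics, vocabulary size, documents, words per doc (toy values)
alpha, eta = 0.1, 0.01               # Dirichlet hyperparameters (assumed values)

beta = rng.dirichlet(np.full(V, eta), size=K)            # K topic-word distributions
docs = []
for d in range(D):
    theta = rng.dirichlet(np.full(K, alpha))             # per-document topic proportions
    z = rng.choice(K, size=N, p=theta)                   # per-word topic assignments
    w = np.array([rng.choice(V, p=beta[k]) for k in z])  # words drawn from the assigned topics
    docs.append(w)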

Posterior Distribution Only documents are observable. Infer underlying topic structure. Topics that generated the documents. For each document, distribution of topics. For each word, which topic generated the word. Algorithmic challenge: Finding the conditional distribution of all the latent variables, given the observation.

LDA as Graphical Model Plate diagram: Dirichlet priors (over topic proportions and topics) and multinomial distributions (over topic assignments and words).

Posterior Distribution From a collection of documents W, infer: Per-word topic assignments z_{d,n} Per-document topic proportions θ_d Per-corpus topic distributions β_k Use posterior expectations to perform different tasks.
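For reference, this posterior comes from the standard LDA joint distribution, written here in the notation above (η, the hyperparameter of the Dirichlet prior on the β_k, is a label I am supplying; the slide does not name it):

p(\beta_{1:K}, \theta_{1:D}, z, W \mid \alpha, \eta) = \prod_{k=1}^{K} p(\beta_k \mid \eta) \prod_{d=1}^{D} \Big[ p(\theta_d \mid \alpha) \prod_{n=1}^{N_d} p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid \beta_{1:K}, z_{d,n}) \Big]

The posterior discussed on the next slides is this joint divided by the marginal likelihood p(W | α, η).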

Posterior Distribution Evaluate P(z|W): posterior distribution over the assignment of words to topics. θ and β can then be estimated.

Computing P(z|W) Involves evaluating a probability distribution over a large discrete space. Contribution of each z_{d,n} depends on: All z_{-(d,n)} values. N_k^{w_{d,n}}: # of times word w_{d,n} has been assigned to topic k. N_k^d: # of times a word from document d has been assigned to topic k. Sampling from the target distribution using MCMC.
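Spelled out, the quantity in question is the standard identity (not stated on the slide; N is the total number of word tokens):

p(z \mid W) = \frac{p(W, z)}{\sum_{z'} p(W, z')}

The denominator ranges over all K^N possible topic assignments, which is why it cannot be evaluated exactly and the deck turns to MCMC.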

Approximate posterior inference: Gibbs Sampling C. M. Bishop, Pattern Recognition and Machine Learning. Springer, New York, 2006. Iain Murray's Talk, Machine Learning Summer School, Cambridge 2009

Overview When exact inference is intractable. Standard sampling techniques have limitations: Cannot handle all kinds of distributions. Cannot handle high-dimensional data. MCMC techniques do not have these limitations. Markov chain: for random variables x^{(1)}, …, x^{(M)}, p(x^{(m+1)} | x^{(1)}, …, x^{(m)}) = p(x^{(m+1)} | x^{(m)}) for m ∈ {1, …, M-1}

Gibbs Sampling Target distribution: p(x) = p(x_1, …, x_M). Choose the initial state of the Markov chain: {x_i : i = 1, …, M}. Replace x_i by a value drawn from the distribution p(x_i | x_{-i}), where x_i is the ith component of x and x_{-i} denotes x_1, …, x_M with x_i omitted. This process is repeated for all the variables. Repeat the whole cycle for however many samples are needed.
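A toy illustration of this cycle, not from the slides: Gibbs sampling a bivariate standard Gaussian with correlation rho, where each full conditional p(x_i | x_{-i}) is a one-dimensional Gaussian.

import numpy as np

def gibbs_bivariate_gaussian(rho=0.8, n_samples=5000, seed=0):
    """Gibbs sampler for a 2-D standard Gaussian with correlation rho."""
    rng = np.random.default_rng(seed)
    x = np.zeros(2)                       # initial state of the Markov chain
    samples = np.empty((n_samples, 2))
    sd = np.sqrt(1.0 - rho ** 2)          # conditional standard deviation
    for t in range(n_samples):
        # Each full conditional p(x_i | x_-i) is N(rho * x_-i, 1 - rho^2).
        x[0] = rng.normal(rho * x[1], sd)
        x[1] = rng.normal(rho * x[0], sd)
        samples[t] = x
    return samples

samples = gibbs_bivariate_gaussian()
print(np.corrcoef(samples.T))             # empirical correlation approaches rho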

Why Gibbs Sampling? Compared to other MCMC techniques, Gibbs sampling is easy to implement, requires little memory, and is competitive in speed and performance.

Gibbs Sampling for LDA The full conditional distribution is:

p(z_{d,n} = k | z_{-(d,n)}, W) = (1/Z) · (N_k^{w_{d,n}} + β) / (N_k + Vβ) · (N_k^d + α)

where the first factor is the probability of w_{d,n} under topic k, the second is the probability of topic k in document d, V is the vocabulary size, the counts exclude the current token, and

Z = Σ_{k=1}^{K} (N_k^{w_{d,n}} + β) / (N_k + Vβ) · (N_k^d + α)
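A sketch of how this conditional can be computed and sampled inside one Gibbs sweep, assuming count arrays Nkw (K×V), Nk (length K) and Nkd (K×D) that are kept up to date and have already had the current token removed (the array names are mine, not the paper's):

import numpy as np

def sample_topic(w, d, Nkw, Nk, Nkd, alpha, beta, rng):
    """Draw a new topic for word w in document d from the collapsed conditional."""
    V = Nkw.shape[1]
    p = (Nkw[:, w] + beta) / (Nk + V * beta) * (Nkd[:, d] + alpha)  # unnormalized terms
    p /= p.sum()                                                    # divide by Z
    return rng.choice(len(p), p=p)                                  # draw a topic index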

Gibbs Sampling for LDA Target distribution: p(z|W). Initial state of the Markov chain: each z_n takes a value in {1, 2, …, K}. The chain is run for a number of iterations. In each iteration a new state is found by sampling each z_n from the full conditional above.

Gibbs Sampling for LDA Subsequent samples are taken after an appropriate lag to ensure that their autocorrelation is low. This is collapsed Gibbs sampling. For a single sample, θ and β are calculated from z.
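A sketch of that last step, recovering point estimates of θ and β from the counts implied by one sample of z (same hypothetical count arrays as above; standard smoothed-frequency formulas):

def estimate_theta_beta(Nkd, Nkw, Nk, alpha, beta):
    """Estimate document-topic proportions (theta, K x D) and topic-word
    distributions (beta_est, K x V) from one sample of topic assignments."""
    K, V = Nkw.shape
    theta = (Nkd + alpha) / (Nkd.sum(axis=0, keepdims=True) + K * alpha)
    beta_est = (Nkw + beta) / (Nk[:, None] + V * beta)
    return theta, beta_est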

Fast Collapsed Gibbs Sampling For Latent Dirichlet Allocation Ian Porteous, David Newman, Alexander Ihler, Arthur Asuncion, Padhraic Smyth, Max Welling University of California, Irvine

FastLDA: Graphical Representation

FastLDA: Segments Sequence of bounds on Z: Z_1, …, Z_K with Z_1 ≥ Z_2 ≥ … ≥ Z_K = Z. Several segments s_l^k, …, s_K^k for each topic. 1st segment: a conservative estimate of the probability of the topic, given the upper bound Z_k on the true normalization factor Z. Subsequent segments: corrections for the missing probability mass of a topic, given the improved bounds.

FastLDA: Segments

Upper Bounds for Z Find a sequence of improving bounds on the normalization constant. Z defined in terms of component vectors. Hölder's inequality is used to construct the initial upper bound. The bound is intelligently improved for each topic.
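For context, the generic inequality being invoked (how the paper applies it to the LDA count vectors is its own contribution; this is just the general statement):

\sum_{k} a_k b_k \;\le\; \Big(\sum_{k} a_k^{p}\Big)^{1/p} \Big(\sum_{k} b_k^{q}\Big)^{1/q}, \qquad \frac{1}{p} + \frac{1}{q} = 1,\quad a_k, b_k \ge 0

Since Z is a sum over topics of a product of per-topic factors, this bounds Z by a product of norms that can be cached and updated cheaply as counts change.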

Fast LDA Algorithm Algorithm: Sort topics in decreasing order of N_k^d. Draw u ~ Uniform[0,1]. For topics in order: calculate the lengths of the segments; for each next topic, the bound Z_k is improved. When the running sum of segment lengths exceeds u: return that topic. Complexity: no operation costs more than O(K log K).
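Below is a runnable sketch of this control flow, using the same hypothetical count arrays as the earlier snippets. The remaining-mass bound here is deliberately crude (each unseen term is bounded by N_k^d + α, since the word factor is at most 1); the paper's Hölder-based bounds are much tighter, but the segment bookkeeping and the early stop follow the same idea, and the sample drawn is still exact.

import numpy as np

def fastlda_style_sample(w, d, Nkw, Nk, Nkd, alpha, beta, rng):
    """Draw a topic for word w in document d, usually without visiting all K topics."""
    V = Nkw.shape[1]
    order = np.argsort(-Nkd[:, d])                 # topics in decreasing N_k^d order
    doc_mass = Nkd[order, d] + alpha               # bound on each term (word factor <= 1)
    # remaining[i] = sum of doc_mass over topics after position i
    remaining = np.concatenate([np.cumsum(doc_mass[::-1])[::-1][1:], [0.0]])

    u = rng.random()
    p = np.empty(len(order))                       # exact terms computed so far
    partial, prev_cum, prev_bound = 0.0, 0.0, None
    for i, k in enumerate(order):
        p[i] = (Nkw[k, w] + beta) / (Nk[k] + V * beta) * (Nkd[k, d] + alpha)
        partial += p[i]
        bound = partial + remaining[i]             # Z_i: a valid upper bound on Z
        cum = partial / bound                      # total segment mass placed so far
        if u < cum:
            # u fell in a segment added at this step: first the correction
            # segments of earlier topics, then topic i's own first segment.
            v = u - prev_cum
            if prev_bound is not None:
                corrections = np.cumsum(p[:i] * (1.0 / bound - 1.0 / prev_bound))
                j = int(np.searchsorted(corrections, v, side="right"))
                if j < i:
                    return order[j]
            return order[i]
        prev_cum, prev_bound = cum, bound
    return order[-1]                               # numerical safety net

The exactness argument is that the segments laid down across steps partition [0, 1) so that topic k ends up owning total mass p_k / Z; the early exit simply avoids computing the tail of terms once u is known to fall in mass already placed.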

Experiments Four large datasets: NIPS full papers Enron emails NY Times news articles PubMed abstracts Dirichlet hyperparameters: β = 0.01 and α = 2/K Computations run on workstations with: Dual Xeon 3.0 GHz processors Code compiled with gcc version 3.4.

Results Speedup: 5-8 times

Results Speedup relatively insensitive to number of documents in the corpus.

Results A large Dirichlet parameter smooths the distribution of topics within a document, so FastLDA needs to visit and compute more topics before drawing a sample.

Discussions

Other domains. Other sampling techniques. Distributions other than the Dirichlet. Parallel computation: Newman et al., "Scalable parallel topic models". Deciding on the value of K. Choice of bounds. Reason behind choosing these datasets. Are the values mentioned in the paper magic numbers? Why were words with count < 10 discarded? Assigning weights to words.

Backup Slides

Dirichlet Distribution The Dirichlet distribution is an exponential family distribution over the simplex, i.e., positive vectors that sum to one. The Dirichlet is conjugate to the multinomial. Given a multinomial observation, the posterior distribution of θ is a Dirichlet.
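Concretely, the conjugacy statement (standard result, written with the θ notation of the slide):

\theta \sim \mathrm{Dirichlet}(\alpha_1, \ldots, \alpha_K), \quad n \mid \theta \sim \mathrm{Multinomial}(\theta) \;\Rightarrow\; \theta \mid n \sim \mathrm{Dirichlet}(\alpha_1 + n_1, \ldots, \alpha_K + n_K)

This is why the per-document and per-topic counts appear added to α and β in the collapsed Gibbs conditional.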