Introduction to LDA Jinyang Gao

Outline Bayesian Analysis Dirichlet Distribution Evolution of Topic Model Gibbs Sampling Intuition Analysis of Parameter Setting

Outline Bayesian Analysis Dirichlet Distribution Evolution of Topic Model Gibbs Sampling Intuition Analysis of Parameter Setting

Bayesian Analysis Suppose we have some coins that land FRONT (heads) with an average probability of 0.75. If we toss one of these coins, how should we estimate the outcome? FRONT: 0.75 BACK: 0.25 Prior Estimation

Bayesian Analysis Suppose we toss a coin 100 times and observe that 25 of the tosses are FRONT. How should we estimate the next toss? FRONT: 0.25 BACK: 0.75 Maximum Likelihood Estimation

Bayesian Analysis Can we make a trade-off between the prior and the observations? The prior is NOT certain to be some fixed value. – Change the 0.75 point estimate to a distribution, Beta(u|15, 5). – Add the observations (5 FRONT, 15 BACK): Beta(u|15, 5) becomes Beta(u|20, 20). – Calculate the expectation, etc.
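
A minimal sketch of this Beta update in plain Python (the Beta(15, 5) prior and the 5 FRONT / 15 BACK observation follow the slide's example; no libraries needed):

    # Prior belief encoded as pseudo-counts: Beta(15, 5) has mean 15/20 = 0.75
    alpha_prior, beta_prior = 15, 5

    # Observations from the slide's example: 5 FRONT (heads), 15 BACK (tails)
    heads, tails = 5, 15

    # Conjugate update: simply add the observed counts to the pseudo-counts
    alpha_post = alpha_prior + heads     # 20
    beta_post = beta_prior + tails       # 20

    prior_mean = alpha_prior / (alpha_prior + beta_prior)    # 0.75 (prior estimate)
    mle = heads / (heads + tails)                            # 0.25 (maximum likelihood)
    posterior_mean = alpha_post / (alpha_post + beta_post)   # 0.50 (trade-off between them)
    print(prior_mean, mle, posterior_mean)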

Bayesian Analysis Key idea: – Express the uncertainty of the prior estimate as a distribution. – The distribution converges toward a single value as more and more observations accumulate. – Few observations: the estimate is dominated by the prior. – Many observations: the estimate is dominated by the observed data. – If we have absolute confidence in the prior (a single fixed value), no amount of observation will change the estimate.

Outline Bayesian Analysis Dirichlet Distribution Evolution of Topic Model Gibbs Sampling Intuition Analysis of Parameter Setting

Dirichlet Distribution
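
A small sketch, assuming numpy, of the Dirichlet as the multi-outcome generalization of the Beta coin prior above; its pseudo-counts act as smoothing on the observed counts:

    import numpy as np

    rng = np.random.default_rng(0)

    # Dirichlet prior over a 3-sided "die" (generalizes the Beta prior for the coin)
    prior = np.array([15.0, 5.0, 5.0])     # pseudo-counts for each outcome

    observed = np.array([5, 15, 10])       # observed counts per outcome
    posterior = prior + observed           # conjugate update: just add the counts

    print(prior / prior.sum())             # prior mean: [0.6, 0.2, 0.2]
    print(posterior / posterior.sum())     # posterior mean, smoothed toward the prior
    print(rng.dirichlet(posterior))        # one plausible probability vector drawn from the posterior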

Outline Bayesian Analysis Dirichlet Distribution Evolution of Topic Model Gibbs Sampling Intuition Analysis of Parameter Setting

Evolution of Topic Model Here we walk through a sequence of solutions, from naïve baselines to LDA. – K-means (TF vector version) – K-means with KL-divergence (language model version) – PLSA (fixed topic-frequency prior) – LDA (topic-frequency observations plus smoothing)

Evolution of Topic Model K-means with TF vectors: – We begin with the simplest model. – Just cluster the documents! – Each document is a vector of term frequencies. – How to cluster? K-means! – Each cluster is a topic. – Each topic is a TF vector (the cluster centroid).
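
A minimal sketch of this baseline, assuming scikit-learn; the toy corpus and cluster count are illustrative only:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.cluster import KMeans

    docs = [
        "apple banana fruit salad",
        "fruit apple juice",
        "gpu cpu memory cache",
        "cuda kernel gpu",
    ]

    tf = CountVectorizer().fit_transform(docs)   # each document -> TF vector

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(tf)

    labels = km.labels_            # one "topic" (cluster) per document
    topics = km.cluster_centers_   # each "topic" is a TF centroid vector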

Evolution of Topic Model Problems of K-means with TF vectors: – High-frequency words dominate the distance (IDF, log-TF weighting, and stop-word removal help somewhat). – Correlations among words are ignored. – A cluster often reflects a single word rather than a topic (implement it and you will see).

Evolution of Topic Model K-means with KL-divergence: – A generative model of text. – Each text is a probability distribution over words. – We still just cluster the documents. – K-means, but with KL-divergence as the distance (not cosine or Euclidean). – Each cluster is a topic. – Each topic is a distribution over words.
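
A rough sketch of the assignment step with KL-divergence in place of Euclidean distance (plain numpy; documents and topics are word distributions that sum to 1, with a small constant to avoid log 0):

    import numpy as np

    def kl(p, q, eps=1e-12):
        """KL(p || q) between two word distributions."""
        p, q = p + eps, q + eps
        return np.sum(p * np.log(p / q))

    def assign_topics(doc_dists, topic_dists):
        """Assign each document to the topic with the smallest KL-divergence."""
        return np.array([np.argmin([kl(d, t) for t in topic_dists])
                         for d in doc_dists])

    # The update step would then re-estimate each topic's word distribution as the
    # normalized sum of the word counts of the documents assigned to it.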

Evolution of Topic Model Problems of K-means with KL-divergence: – Much better; some topics start to appear. – Still not clean. – Should each document really have only one topic? – It is still just a good clustering method for documents.

Evolution of Topic Model PLSA/PLSI: – Each text is a probability distribution over words. – Each text is also a distribution over topics. – Topics and words are assigned probabilistically (via EM). – Each cluster is a topic (but a document no longer belongs entirely to one cluster). – Each topic is a distribution over words.
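
A compact sketch of the PLSA EM updates under the usual factorization P(d, w) = P(d) * sum_z P(z|d) P(w|z) (plain numpy; `counts` is a document-by-word count matrix, and all names are illustrative):

    import numpy as np

    def plsa(counts, K, iters=50, seed=0):
        """Tiny PLSA sketch: counts is a (D, W) term-count matrix, K topics."""
        rng = np.random.default_rng(seed)
        D, W = counts.shape
        p_z_d = rng.random((D, K)); p_z_d /= p_z_d.sum(1, keepdims=True)   # P(z|d)
        p_w_z = rng.random((K, W)); p_w_z /= p_w_z.sum(1, keepdims=True)   # P(w|z)

        for _ in range(iters):
            # E-step: P(z|d,w) for every (document, word) pair
            post = p_z_d[:, :, None] * p_w_z[None, :, :]        # shape (D, K, W)
            post /= post.sum(1, keepdims=True) + 1e-12

            # M-step: re-estimate both factors from the expected counts
            weighted = counts[:, None, :] * post                # shape (D, K, W)
            p_w_z = weighted.sum(0)
            p_w_z /= p_w_z.sum(1, keepdims=True) + 1e-12
            p_z_d = weighted.sum(2)
            p_z_d /= p_z_d.sum(1, keepdims=True) + 1e-12

        return p_z_d, p_w_z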

Evolution of Topic Model Problems of PLSA: – The first usable topic model in this evolution! – General (background) words? Context information? See the works of QZ Mei. – What about the K from K-means? – Topics are not all the same size. – Can two topics with the same distribution be merged? – Can a large topic be split?

Evolution of Topic Model LDA: – Places a prior distribution over topics. – Moves from maximum likelihood estimation (MLE) to Bayesian analysis of the word-to-topic assignments. – The Dirichlet is the easiest choice (it is the conjugate prior)! – Gives a complete Bayesian analysis.
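
A minimal sketch of fitting LDA with scikit-learn on the same toy corpus as in the K-means sketch (note that scikit-learn uses variational inference rather than Gibbs sampling; the parameter values are illustrative):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = [
        "apple banana fruit salad",
        "fruit apple juice",
        "gpu cpu memory cache",
        "cuda kernel gpu",
    ]

    tf = CountVectorizer().fit_transform(docs)

    lda = LatentDirichletAllocation(
        n_components=2,         # number of topics K
        doc_topic_prior=0.5,    # Dirichlet prior on per-document topic mixtures (alpha)
        topic_word_prior=0.1,   # Dirichlet prior on per-topic word distributions (beta/eta)
        random_state=0,
    ).fit(tf)

    doc_topics = lda.transform(tf)    # per-document topic proportions
    topic_words = lda.components_     # per-topic (unnormalized) word weights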

Evolution of Topic Model Analysis of LDA: – Small topics tend to disappear (even the text at a small topic's center has a larger probability of being absorbed by a large nearby topic), so K is self-adapting here. – Smoothing of the topic-word distribution.

What About Short Text? Consider the following: – Many documents have only one meaningful word. – How many words are enough to form a topic? – 'blue' and 'red' rarely co-occur in a short text, but 'blue plane' or 'red car' do. – ……

Evolution of Topic Model These are only milestones along this evolution; small changes can give very different results. – Text weighting – General words – Probabilistic clustering – Hyperparameters – Context information – Hierarchy

Evolution of Topic Model You SHOULD implement ALL of them if you want a deep understanding of topic models! – I implemented all of them, on both long and short text, during my undergraduate studies. The code is easy and the data is also easy to obtain. – Inspect some topics (and how they change across iterations) and find out why they work well or badly. – You will learn more about each consideration in model inference, and some of the derivations are not difficult to code.

Evolution of Topic Model You should know why some models are RIGHT, rather than just that they perform well in experiments. Otherwise you cannot know which model is RIGHT for your own problem (where some features have usually changed). Study the features of the models, the data, and the targets carefully. Use Occam's Razor to develop your model.

Outline Bayesian Analysis Dirichlet Distribution Evolution of Topic Model Gibbs Sampling Intuition Analysis of Parameter Setting

Gibbs Sampling Gibbs sampling: – Key idea: if all the other variables are already decided, the decision for the remaining one is easy. – Choose one variable (e.g. one word's topic assignment). – Fix all the others. – Sample (do not optimize) it conditioned on the others. – Loop until convergence.
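
A stripped-down sketch of collapsed Gibbs sampling for LDA (plain numpy; `docs` is a list of documents, each a list of integer word ids, K is the topic count, V the vocabulary size, and alpha/beta are symmetric Dirichlet hyperparameters):

    import numpy as np

    def gibbs_lda(docs, K, V, alpha=0.1, beta=0.01, iters=200, seed=0):
        """docs: list of word-id lists. Returns topic assignments and count tables."""
        rng = np.random.default_rng(seed)
        n_dk = np.zeros((len(docs), K))    # topic counts per document
        n_kw = np.zeros((K, V))            # word counts per topic
        n_k = np.zeros(K)                  # total words per topic
        z = [rng.integers(K, size=len(d)) for d in docs]   # random initial topics

        for d, doc in enumerate(docs):     # count the initial assignments
            for i, w in enumerate(doc):
                k = z[d][i]
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

        for _ in range(iters):
            for d, doc in enumerate(docs):
                for i, w in enumerate(doc):
                    k = z[d][i]            # remove this word's current assignment
                    n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1

                    # full conditional: (document-topic) * (topic-word) factors
                    p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
                    k = rng.choice(K, p=p / p.sum())   # sample, do not optimize

                    z[d][i] = k            # add the new assignment back
                    n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

        return z, n_dk, n_kw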

Gibbs Sampling Please read the paper carefully for the details. It is easy-to-follow material for Gibbs sampling in LDA.

Gibbs Sampling EM: – Fix all parameters or settings. – Compute the best (likelihood-maximizing) values for all parameters or settings. – Change to the new setting. – Loop until convergence.

Gibbs Sampling Neither Gibbs nor EM gives the exact best estimate! The exact best estimate would require computing the expectation of each random variable over all possible configurations (exponentially many), not its optimized value in the current state. But so far these are the best we can do; in my personal view, neither is strictly better than the other.

Outline Bayesian Analysis Dirichlet Distribution Evolution of Topic Model Gibbs Sampling Intuition Analysis of Parameter Settings

Parameter Settings
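
As an illustration of the hyperparameters (assuming numpy; alpha = 50/K and a small beta such as 0.1 are commonly used heuristic values, while K itself is usually chosen by held-out perplexity or by inspecting the topics):

    import numpy as np

    rng = np.random.default_rng(0)
    K = 20              # number of topics: the main knob to choose
    alpha = 50.0 / K    # prior on per-document topic mixtures (common heuristic)
    beta = 0.1          # prior on per-topic word distributions (small smoothing prior)

    # Smaller alpha -> each document concentrates on fewer topics;
    # smaller beta  -> each topic concentrates on fewer words.
    for a in (0.05, alpha, 10.0):
        mix = rng.dirichlet([a] * K)
        print(f"alpha={a:<5}  topics with >1% of a document's mass: {(mix > 0.01).sum()}")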

Summary Bayesian Analysis: Prior-Observation Trade-off Dirichlet Distribution: Smoothing Method Topic Model Evolution: Why It Works Well Gibbs and EM: Latent-Variable Inference Methods Parameter Setting: How Many Topics, How Many Words in a Topic

THANKS Q&A