LDA Training System 8/22/2012.

LDA Training System xueminzhao@tencent.com 8/22/2012

Outline
1. Introduction
2. SparseLDA
3. Rethinking LDA: Why Priors Matter
4. LDA Training System Design: MapReduce-LDA

Problem – Text Relevance
Q1: apple pie
Q2: iphone crack
Doc1: Apple Computer Inc. is a well-known company located in California, USA.
Doc2: The apple is the pomaceous fruit of the apple tree, species Malus domestica in the rose family.

Topic Models

Topic Model – Generative Process

Topic Model - Inference

Latent Dirichlet Allocation

Gibbs Sampling for LDA

Document-Topic Statistics

Topic-Word Statistics

For each token,

Sample a new topic
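The per-token step named on these slides can be sketched as follows. This is a minimal illustration of standard collapsed Gibbs sampling for LDA (as in Griffiths and Steyvers), not the system's actual code. It assumes symmetric scalar priors `alpha` and `beta`; all function and variable names are hypothetical.

```python
import random


def sample_topic(word, doc_topic, topic_word, topic_total,
                 alpha, beta, vocab_size, rng):
    """Draw a topic from the collapsed Gibbs conditional for one token.

    The counts passed in must already exclude the token being resampled.
    """
    weights = [
        (doc_topic[t] + alpha)
        * (topic_word[t].get(word, 0) + beta)
        / (topic_total[t] + beta * vocab_size)
        for t in range(len(doc_topic))
    ]
    # Inverse-CDF draw over the unnormalized weights.
    u = rng.random() * sum(weights)
    for t, w in enumerate(weights):
        u -= w
        if u <= 0:
            return t
    return len(weights) - 1  # guard against floating-point round-off


def resample_token(word, old_topic, doc_topic, topic_word, topic_total,
                   alpha, beta, vocab_size, rng):
    """Remove the token's old assignment, sample a new topic, add it back."""
    # Decrement the document-topic and topic-word statistics.
    doc_topic[old_topic] -= 1
    topic_word[old_topic][word] -= 1
    topic_total[old_topic] -= 1

    new_topic = sample_topic(word, doc_topic, topic_word, topic_total,
                             alpha, beta, vocab_size, rng)

    # Increment the statistics for the newly sampled topic.
    doc_topic[new_topic] += 1
    topic_word[new_topic][word] = topic_word[new_topic].get(word, 0) + 1
    topic_total[new_topic] += 1
    return new_topic
```

Here `doc_topic` is a per-document list of topic counts, `topic_word` is a list of per-topic word-count dictionaries, and `topic_total` holds the number of tokens assigned to each topic.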

For each token,

Summary so far

The normalizing constant

Statistics are sparse
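One way to see why sparsity helps is the three-bucket split of the normalizing constant described by Yao, Mimno, and McCallum. The sketch below assumes symmetric scalar priors `alpha` and `beta` and uses hypothetical names; it computes the smoothing-only term densely for clarity, whereas a real sampler would cache and update it incrementally.

```python
def bucket_masses(doc_topic, word_topic, topic_total, alpha, beta, vocab_size):
    """Split the Gibbs normalizer into smoothing (s), document (r), word (q).

    doc_topic:   sparse dict {topic: count of the topic in this document}
    word_topic:  sparse dict {topic: count of this word in the topic}
    topic_total: dense list of total token counts per topic
    """
    denom = [beta * vocab_size + n for n in topic_total]
    # s depends only on the priors and the topic totals.
    s = sum(alpha * beta / d for d in denom)
    # r only touches topics that occur in the current document.
    r = sum(n_td * beta / denom[t] for t, n_td in doc_topic.items())
    # q only touches topics in which the current word occurs.
    q = sum((alpha + doc_topic.get(t, 0)) * n_wt / denom[t]
            for t, n_wt in word_topic.items())
    return s, r, q


def dense_normalizer(doc_topic, word_topic, topic_total, alpha, beta, vocab_size):
    """Reference computation that visits every topic."""
    return sum(
        (alpha + doc_topic.get(t, 0)) * (beta + word_topic.get(t, 0))
        / (beta * vocab_size + topic_total[t])
        for t in range(len(topic_total))
    )
```

The identity behind the split is (alpha + n_td)(beta + n_wt) = alpha*beta + n_td*beta + (alpha + n_td)*n_wt, so the three masses sum to the dense normalizer while `r` and `q` iterate only over nonzero counts.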

Summary so far

Huge savings: time and memory

Priors for LDA

Comparing Priors for LDA

Optimizing m

Selecting T

Overview

MapReduce Jobs
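The transcript does not preserve what each job does. As a hedged illustration only, the sketch below shows the count-merging step used in approximate distributed LDA (Newman et al., cited in the references): each worker samples its shard against a copy of the global topic-word counts, and a reduce step folds every worker's changes back into the global counts. Whether this system's jobs work this way is an assumption, and all names are hypothetical.

```python
from collections import Counter


def merge_counts(global_counts, worker_counts):
    """Fold per-worker changes into the global topic-word counts.

    global_counts: dict {(word, topic): count} before the iteration
    worker_counts: list of dicts, each a worker's copy after sampling
    Returns new global counts: old + sum over workers of (local - old).
    """
    merged = Counter(global_counts)
    for local in worker_counts:
        for key in set(local) | set(global_counts):
            merged[key] += local.get(key, 0) - global_counts.get(key, 0)
    # Drop zero entries so the result stays sparse.
    return {key: n for key, n in merged.items() if n != 0}
```

For example, if one worker moves an "apple" token from topic 0 to topic 1 and another moves a "pie" token from topic 1 to topic 0, both moves appear in the merged counts and the total number of tokens is unchanged.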

Scalability
Hypothesis
- memory 40GB per machine;
- 5 words per doc.
Scalability
- if # limit;
- if # limit.

Experiment for Correctness Validation

References
D. Blei, Andrew Ng, and M. Jordan. Latent Dirichlet Allocation. JMLR, 2003.
Thomas L. Griffiths and Mark Steyvers. Finding scientific topics. PNAS, 2004.
Gregor Heinrich. Parameter estimation for text analysis. Technical Report, 2009.
Limin Yao, David Mimno, and Andrew McCallum. Efficient Methods for Topic Model Inference on Streaming Document Collections. KDD, 2009.
Hanna M. Wallach, David Mimno, and Andrew McCallum. Rethinking LDA: Why Priors Matter. NIPS, 2009.
David Newman, Arthur Asuncion, Padhraic Smyth, and Max Welling. Distributed Inference for Latent Dirichlet Allocation. NIPS, 2007.
Yi Wang, Hongjie Bai, Matt Stanton, Wen-Yen Chen, and Edward Y. Chang. PLDA: Parallel Latent Dirichlet Allocation for Large-scale Applications. AAIM, 2009.
Xueminzhao. LDA design doc. http://x.x.x.x/~xueminzhao/html_docs/internal/modules/lda.html
