LATENT DIRICHLET ALLOCATION. Outline Introduction Model Description Inference and Parameter Estimation Example Reference.


1 LATENT DIRICHLET ALLOCATION

2 Outline Introduction Model Description Inference and Parameter Estimation Example Reference

3 Introduction As more information becomes available, it becomes more difficult to access what we are looking for. We need new tools to help us organize, search, and understand these vast amounts of information.

4 Introduction Topic modeling provides methods for automatically organizing, understanding, searching, and summarizing large electronic archives. Uncover the hidden topical patterns that pervade the collection. Annotate the documents according to those topics. Use the annotations to organize, summarize, and search the texts.

5 Intuition behind LDA

6 Notation and Assumption We have a set of documents, constituting a corpus. Each document is a collection of words, or a “bag of words” (exchangeability: word order is ignored). After stop words are removed, the corpus vocabulary contains V distinct words and involves K topics, each a distribution over the vocabulary. Each document d is composed of N_d “important” or “effective” words and has its own topic proportions.
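The bag-of-words assumption can be illustrated with a minimal sketch. The toy corpus, the stop-word list, and the helper name `bag_of_words` are hypothetical and chosen only for illustration; the sketch shows how V (vocabulary size) and the per-document word counts arise once stop words are removed and word order is discarded.

```python
from collections import Counter

# Hypothetical stop-word list and toy corpus, for illustration only.
STOP_WORDS = {"the", "a", "of", "and", "is", "in", "to"}

corpus = [
    "the gene is expressed in the cell",
    "a model of the data and the topic",
]

def bag_of_words(text):
    """Return word counts for one document, ignoring order and stop words."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    return Counter(tokens)

bags = [bag_of_words(doc) for doc in corpus]

# The vocabulary is the set of distinct words left after stop-word removal.
vocabulary = sorted(set().union(*bags))
V = len(vocabulary)

# Number of effective words in each document.
N = [sum(bag.values()) for bag in bags]
```

Because each document is reduced to counts, two documents containing the same words in a different order get the same representation, which is what exchangeability means here.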

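7 [Figure: index ranges used in the model. Topics run from 1 to K, word positions within a document from 1 to N_d, and vocabulary word indices from 1 to V.]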

8 Model Definition
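The generative process usually given for LDA (Blei, Ng, and Jordan, 2003) can be sketched as follows: draw topic proportions for a document from a Dirichlet, then for each word position draw a topic from those proportions and a word from that topic. This is a minimal sketch under stated assumptions, not the presenters' code: the function names, the sizes K and V, the hyperparameter values, and the choice to draw the topics themselves from a Dirichlet (the smoothed variant) are all illustrative.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]

def generate_document(topics, alpha, n_words, rng):
    """Generate one document; returns (theta, list of (topic index, word index))."""
    num_topics = len(topics)
    vocab_size = len(topics[0])
    theta = sample_dirichlet(alpha, rng)  # per-document topic proportions
    doc = []
    for _ in range(n_words):
        z = rng.choices(range(num_topics), weights=theta)[0]      # topic assignment
        w = rng.choices(range(vocab_size), weights=topics[z])[0]  # word from topic z
        doc.append((z, w))
    return theta, doc

rng = random.Random(0)
K, V = 3, 8  # assumed toy sizes
# Assumption: each topic's word distribution is itself drawn from a Dirichlet.
topics = [sample_dirichlet([0.5] * V, rng) for _ in range(K)]
theta, doc = generate_document(topics, alpha=[0.5] * K, n_words=20, rng=rng)
```

Inference runs this process in reverse: given only the observed word indices, it estimates the topic proportions and topic-word distributions that plausibly produced them.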

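9 Dirichlet and Multinomial Distribution The Dirichlet is a distribution used to describe another distribution, e.g. the multinomial. Multinomial: $p(x_1, \dots, x_k \mid n, \theta) = \frac{n!}{\prod_{i=1}^{k} x_i!} \prod_{i=1}^{k} \theta_i^{x_i}$, where $\sum_{i} x_i = n$ and $\sum_{i} \theta_i = 1$. Dirichlet: $p(\theta \mid \alpha) = \frac{\Gamma\left(\sum_{i} \alpha_i\right)}{\prod_{i} \Gamma(\alpha_i)} \prod_{i=1}^{k} \theta_i^{\alpha_i - 1}$, where the variable $\theta$ can take values in the (k-1)-simplex.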
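The standard multinomial mass function and Dirichlet density can be evaluated directly. The sketch below uses only the Python standard library; the function names `multinomial_pmf` and `dirichlet_pdf` are hypothetical and the inputs are toy values chosen for illustration.

```python
import math

def multinomial_pmf(counts, theta):
    """Probability of observing `counts` in sum(counts) draws with category probabilities `theta`."""
    n = sum(counts)
    # Multinomial coefficient n! / (x_1! * ... * x_k!), always an integer.
    coefficient = math.factorial(n) // math.prod(math.factorial(x) for x in counts)
    return coefficient * math.prod(p ** x for p, x in zip(theta, counts))

def dirichlet_pdf(theta, alpha):
    """Density of Dirichlet(alpha) at a point `theta` on the simplex."""
    normalizer = math.gamma(sum(alpha)) / math.prod(math.gamma(a) for a in alpha)
    return normalizer * math.prod(p ** (a - 1.0) for p, a in zip(theta, alpha))

# One draw of each category out of two draws from a fair two-category distribution.
p_counts = multinomial_pmf([1, 1], [0.5, 0.5])

# With alpha = (1, 1, 1) the Dirichlet is uniform over the 2-simplex.
density = dirichlet_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0])
```

With all entries of alpha equal to 1 the density is the same at every point of the simplex, so the Dirichlet expresses no preference among multinomial parameters; values of alpha below 1 push mass toward sparse parameter vectors, and values above 1 push it toward the center.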

10 Dirichlet and Multinomial Distribution

11 Properties

12 LSA & LDA

13 Reference
Latent Dirichlet Allocation, DM Blei, AY Ng, MI Jordan – Journal of Machine Learning Research, 2003
Topic Models Vs. Unstructured Data, G Anthes – Communications of the ACM, 2010
Probabilistic Topic Models, M Steyvers, T Griffiths – Handbook of Latent Semantic Analysis, 2007
Gibbs Sampling for the Uninitiated, P Resnik, E Hardisty – 2010

