Information retrieval – LSI, pLSI and LDA

Jian-Yun Nie

Basics: Eigenvector, Eigenvalue. Ref: http://en.wikipedia.org/wiki/Eigenvector. For a square matrix A: Ax = λx, where x is a vector (an eigenvector) and λ a scalar (the corresponding eigenvalue). E.g., see the sketch below.
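A quick numerical illustration (a minimal NumPy sketch; the 2×2 matrix is made up for the example):

```python
import numpy as np

# A small example matrix, chosen arbitrarily for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# np.linalg.eig returns the eigenvalues and a matrix whose columns are eigenvectors.
eigenvalues, eigenvectors = np.linalg.eig(A)

for i, lam in enumerate(eigenvalues):
    x = eigenvectors[:, i]
    # Check the defining property A x = lambda x.
    print(lam, np.allclose(A @ x, lam * x))   # prints True for each pair
```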

Why use eigenvectors? Linear algebra: Ax = b. Eigenvector: Ax = λx.

Why use eigenvectors? For a symmetric matrix, the eigenvectors are orthogonal (so they can be treated as independent directions). The eigenvectors of A form a basis in which A acts by simple scaling. Useful for: solving linear equations, determining the natural frequencies of a bridge, …
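A small check of these properties for a symmetric matrix (a sketch; the matrix is made up):

```python
import numpy as np

# A symmetric example matrix, so its eigenvectors are orthogonal.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

w, V = np.linalg.eigh(A)                     # eigh is for symmetric/Hermitian matrices

print(np.allclose(V.T @ V, np.eye(3)))       # the eigenvectors form an orthonormal basis
print(np.allclose(V @ np.diag(w) @ V.T, A))  # in that basis, A acts by simple scaling
```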

Latent Semantic Indexing (LSI)

Latent Semantic Analysis

LSI

Classic LSI Example (Deerwester)

LSI, SVD, & Eigenvectors. SVD decomposes the term × document matrix X as X = U Σ V^T, where U and V are the left and right singular vector matrices, and Σ is a diagonal matrix of singular values. This corresponds to the eigen-decomposition Y = V L V^T, where V is orthonormal and L is diagonal: U is the matrix of eigenvectors of Y = X X^T, V is the matrix of eigenvectors of Y = X^T X, and the squared singular values in Σ are the eigenvalues in the diagonal matrix L.
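A sketch of the decomposition on a toy term × document count matrix (the counts below are made up):

```python
import numpy as np

# Toy term x document matrix X: rows are terms, columns are documents.
X = np.array([[1, 0, 2, 0],
              [0, 1, 1, 0],
              [2, 1, 0, 1],
              [0, 0, 1, 3]], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)    # X = U diag(s) V^T

print(np.allclose(U @ np.diag(s) @ Vt, X))          # the reconstruction holds

# Relation to the eigen-decomposition: the eigenvalues of X X^T
# are the squared singular values of X.
eigvals = np.linalg.eigvalsh(X @ X.T)               # ascending order
print(np.allclose(np.sort(eigvals)[::-1], s ** 2))
```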

SVD: Dimensionality Reduction

Cutting the dimensions with the smallest singular values
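A sketch of this truncation on the toy matrix from the previous example (k = 2 is an arbitrary choice):

```python
import numpy as np

# The same toy term x document matrix as above.
X = np.array([[1, 0, 2, 0],
              [0, 1, 1, 0],
              [2, 1, 0, 1],
              [0, 0, 1, 3]], dtype=float)
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2                                       # number of latent dimensions to keep
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

X_k = U_k @ np.diag(s_k) @ Vt_k             # rank-k approximation of X

# Reduced document representations: one k-dimensional vector per document.
docs_k = (np.diag(s_k) @ Vt_k).T            # shape (n_documents, k)
print(docs_k)
```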

Computing Similarity in LSI
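The details of this slide are not in the transcript; a common recipe, sketched below under that assumption, is to fold the query into the k-dimensional space with q_k = Σ_k^{-1} U_k^T q and rank documents by cosine similarity:

```python
import numpy as np

X = np.array([[1, 0, 2, 0],                 # toy term x document matrix (as above)
              [0, 1, 1, 0],
              [2, 1, 0, 1],
              [0, 0, 1, 3]], dtype=float)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
U_k, s_k = U[:, :k], s[:k]

def fold_in(v):
    """Project a term-space vector (query or new document) into the latent space."""
    return np.diag(1.0 / s_k) @ U_k.T @ v

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

q = np.array([1.0, 0.0, 1.0, 0.0])          # toy query over the 4 terms
q_k = fold_in(q)
doc_vectors = Vt[:k, :].T                   # training documents in the latent space
for j, d_k in enumerate(doc_vectors):
    print("doc", j, "similarity", cosine(q_k, d_k))
```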

LSI and pLSI. LSI: find the k dimensions that minimize the Frobenius norm of A - A'. Frobenius norm of A: ||A||_F = sqrt(Σ_ij |a_ij|²). pLSI: defines its own objective function to maximize, the log-likelihood of the data.
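A quick numerical check of the Frobenius-norm view (a sketch; by the Eckart–Young theorem the rank-k SVD truncation attains this minimum, and the error equals the root of the sum of the squared discarded singular values):

```python
import numpy as np

X = np.array([[1, 0, 2, 0],                 # toy matrix A (as above)
              [0, 1, 1, 0],
              [2, 1, 0, 1],
              [0, 0, 1, 3]], dtype=float)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]      # the best rank-k approximation A'

frob_error = np.linalg.norm(X - X_k, 'fro')      # ||A - A'||_F
print(frob_error, np.sqrt(np.sum(s[k:] ** 2)))   # the two numbers agree
```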

pLSI – a generative model

pLSI – a probabilistic approach

pLSI. Assume multinomial distributions, in particular a distribution over topics (z) for each document. Question: how do we determine z?

Using EM. Likelihood: L = Σ_d Σ_w n(d,w) log Σ_z P(z|d) P(w|z). E-step: P(z|d,w) ∝ P(z|d) P(w|z). M-step: P(w|z) ∝ Σ_d n(d,w) P(z|d,w) and P(z|d) ∝ Σ_w n(d,w) P(z|d,w).
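A minimal, dense-matrix sketch of these EM updates (the function and variable names are mine, not from the slides; it stores P(z|d,w) for every pair, so it only suits a small toy corpus):

```python
import numpy as np

def plsi_em(X, K, n_iter=50, seed=0):
    """pLSI via EM. X: (n_docs, n_words) count matrix n(d, w); K: number of topics."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    # Random initialization of P(z|d) and P(w|z), normalized to sum to one.
    p_z_d = rng.random((n_docs, K)); p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((K, n_words)); p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(z|d,w) proportional to P(z|d) P(w|z).
        post = p_z_d[:, :, None] * p_w_z[None, :, :]      # shape (D, K, W)
        post /= post.sum(axis=1, keepdims=True) + 1e-12
        # M-step: reweight the posteriors by the observed counts n(d, w).
        weighted = X[:, None, :] * post                   # shape (D, K, W)
        p_w_z = weighted.sum(axis=0)                      # sum over documents
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=2)                      # sum over words
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z
```

For example, plsi_em(counts, K=3) returns the per-document topic mixtures P(z|d) and the topic–word distributions P(w|z).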

Relation with LSI. Difference: LSI minimizes the Frobenius (L2) norm, which amounts to an additive Gaussian noise assumption on the counts; pLSI maximizes the log-likelihood of the training data, which amounts to minimizing a cross-entropy / KL-divergence.

Mixture of Unigrams (traditional). [Graphical model: a single topic node z_i generating the words w_i1 … w_i4 of document i.] The Mixture of Unigrams model (this is just Naïve Bayes): for each of M documents, choose a topic z, then choose N words by drawing each one independently from a multinomial conditioned on z. In the Mixture of Unigrams model, we can only have one topic per document!

Probabilistic Latent Semantic Indexing (pLSI) Model. [Graphical model: the document index d generating a topic z_dn for each word position, each topic generating the word w_dn.] For each word of document d in the training set: choose a topic z according to a multinomial conditioned on the index d, then generate the word by drawing from a multinomial conditioned on z. In pLSI, documents can have multiple topics.

Problems with pLSI. It is not a proper generative model for documents: a document is generated from a mixture of topics, but the topic mixtures are tied to the training documents, so the number of parameters grows linearly with the size of the corpus, and it is difficult to generate (assign a probability to) a new document.

Dirichlet Distributions. In the LDA model, we would like to say that the topic mixture proportions for each document are drawn from some distribution. So, we want to put a distribution on multinomials, that is, on k-tuples of non-negative numbers that sum to one. The space of all of these multinomials has a nice geometric interpretation as a (k-1)-simplex, which is just a generalization of a triangle to (k-1) dimensions. Criteria for selecting our prior: it needs to be defined on the (k-1)-simplex, and, algebraically speaking, we would like it to play nicely with the multinomial distribution.

Dirichlet Distributions. Useful facts: this distribution is defined over the (k-1)-simplex, that is, it takes k non-negative arguments which sum to one; consequently it is a natural distribution to use over multinomial distributions. In fact, the Dirichlet distribution is the conjugate prior to the multinomial distribution. (This means that if our likelihood is multinomial with a Dirichlet prior, then the posterior is also Dirichlet!) The Dirichlet parameter α_i can be thought of as a prior count of the ith class.
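A sketch of what draws from a Dirichlet distribution look like (the α values are arbitrary): each sample is itself a multinomial parameter vector, and increasing α_i pulls mass toward class i.

```python
import numpy as np

rng = np.random.default_rng(0)

# Symmetric prior over k = 3 classes: samples spread over the 2-simplex.
print(rng.dirichlet([1.0, 1.0, 1.0], size=3))

# A large "prior count" for the first class: samples concentrate on it.
print(rng.dirichlet([10.0, 1.0, 1.0], size=3))

# Every sample is non-negative and sums to one.
print(rng.dirichlet([1.0, 1.0, 1.0], size=3).sum(axis=1))
```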

The LDA Model For each document, Choose ~Dirichlet() z1 z2 z3 z4 z1 z2 z3 z4 z1 z2 z3 z4 w1 w2 w3 w4 w1 w2 w3 w4 w1 w2 w3 w4 For each document, Choose ~Dirichlet() For each of the N words wn: Choose a topic zn» Multinomial() Choose a word wn from p(wn|zn,), a multinomial probability conditioned on the topic zn. b

The LDA Model. For each document: choose θ ~ Dirichlet(α); then for each of the N words w_n: choose a topic z_n ~ Multinomial(θ), and choose a word w_n from p(w_n | z_n, β), a multinomial probability conditioned on the topic z_n.
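A minimal sketch of this generative process (the vocabulary, the topic–word distributions β, and the hyperparameter α are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

vocab = ["car", "engine", "road", "gene", "cell", "protein"]
K, V, N = 2, len(vocab), 8                    # topics, vocabulary size, words per document

alpha = np.full(K, 0.5)                       # Dirichlet hyperparameter
beta = np.array([[0.40, 0.30, 0.25, 0.02, 0.02, 0.01],   # topic 0: "cars"
                 [0.01, 0.02, 0.02, 0.35, 0.30, 0.30]])  # topic 1: "biology"

def generate_document():
    theta = rng.dirichlet(alpha)              # 1. theta ~ Dirichlet(alpha)
    words = []
    for _ in range(N):
        z = rng.choice(K, p=theta)            # 2. z_n ~ Multinomial(theta)
        w = rng.choice(V, p=beta[z])          # 3. w_n ~ p(w | z_n, beta)
        words.append(vocab[w])
    return words

for _ in range(3):
    print(generate_document())
```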

LDA (Latent Dirichlet Allocation). Document = mixture of topics (as in pLSI), but drawn according to a Dirichlet prior. When we use a uniform Dirichlet prior, pLSI = LDA. A word is also generated according to another variable, β: the per-topic word distributions.

Variational Inference. In variational inference, we consider a simplified graphical model with variational parameters γ and φ, and minimize the KL divergence between the variational and posterior distributions.

Use of LDA. A widely used topic model; computational complexity is an issue. Use in IR: interpolate a topic model with a traditional language model (LM). This improves over the traditional LM, but gives no improvement over the relevance model (Wei and Croft, SIGIR 06).
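For experimenting in practice, one option is scikit-learn's implementation (a sketch assuming scikit-learn is installed; the tiny corpus is made up):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["the car engine roared on the road",
        "a new gene and protein found in the cell",
        "cell biology of the protein",
        "road trip in a fast car"]

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)                    # document-term count matrix
vocab = vec.get_feature_names_out()

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Top words per topic (components_ holds unnormalized topic-word weights).
for k, topic in enumerate(lda.components_):
    top = topic.argsort()[::-1][:3]
    print("topic", k, [vocab[i] for i in top])

print(lda.transform(X))                        # per-document topic proportions
```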

References
LSI:
- Deerwester, S., et al. Improving Information Retrieval with Latent Semantic Indexing. Proceedings of the 51st Annual Meeting of the American Society for Information Science 25, 1988, pp. 36–40.
- Berry, M. W., Dumais, S. T., and O'Brien, G. W. Using Linear Algebra for Intelligent Information Retrieval. UT-CS-94-270, 1994.
pLSI:
- Hofmann, T. Probabilistic Latent Semantic Indexing. Proceedings of the Twenty-Second Annual International SIGIR Conference on Research and Development in Information Retrieval (SIGIR-99), 1999.
LDA:
- Blei, D., Ng, A., and Jordan, M. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993–1022, January 2003.
- Griffiths, T., and Steyvers, M. Finding Scientific Topics. Proceedings of the National Academy of Sciences, 101 (suppl. 1), 5228–5235, 2004.
- Blei, D., Griffiths, T., Jordan, M., and Tenenbaum, J. Hierarchical Topic Models and the Nested Chinese Restaurant Process. In S. Thrun, L. Saul, and B. Scholkopf, editors, Advances in Neural Information Processing Systems (NIPS) 16, Cambridge, MA, 2004. MIT Press.
Also see the Wikipedia articles on LSI, pLSI and LDA.