
1 Latent Semantic Indexing (mapping onto a smaller space of latent concepts) Paolo Ferragina Dipartimento di Informatica Università di Pisa Reading 18

2 Speeding up cosine computation
What if we could take our vectors and "pack" them into fewer dimensions (say 50,000 → 100) while preserving distances?
Now: O(nm) to compute cos(d,q) for all docs d.
Then: O(km + kn), where k << n, m.
Two methods: "Latent Semantic Indexing" and random projection.
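As a rough worked example of this cost gap (m ≈ 50,000 terms and k ≈ 100 come from the slide; the corpus size n below is an assumed figure, used only for illustration):

```python
# Back-of-the-envelope cost comparison for scoring a query against all docs.
# m and k come from the slide (50,000 -> 100); n = 1,000,000 is an assumed
# corpus size, for illustration only.
m = 50_000       # vocabulary size (original dimensions)
k = 100          # reduced dimensions
n = 1_000_000    # assumed number of documents

full_cost = n * m              # O(nm): cosine of q against every doc in m dims
reduced_cost = k * m + k * n   # O(km + kn): project q, then score in k dims

print(f"full:    {full_cost:.2e} operations")
print(f"reduced: {reduced_cost:.2e} operations")
print(f"speed-up ~ {full_cost / reduced_cost:.0f}x")
```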

3 Briefly
LSI is data-dependent: create a k-dim subspace by eliminating redundant axes, pulling together "related" axes (hopefully car and automobile).
Random projection is data-independent: choose a k-dim subspace that guarantees, with high probability, good stretching properties between any pair of points.
What about polysemy?

4 Notions from linear algebra
Matrix A, vector v
Matrix transpose (A^t)
Matrix product
Rank
Eigenvalue λ and eigenvector v: Av = λv
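As a quick numerical illustration of the last definition (a minimal numpy sketch, not part of the original slides), one can check Av = λv for a small symmetric matrix:

```python
import numpy as np

# A small symmetric matrix: its eigenvalues are real and its eigenvectors
# can be chosen orthonormal, which is the setting used on the next slides.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

eigenvalues, eigenvectors = np.linalg.eigh(A)  # eigh: for symmetric matrices

# Verify A v = lambda v for each (eigenvalue, eigenvector) pair.
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ v, lam * v)
    print(f"lambda = {lam:.4f}: A v == lambda v holds")
```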

5 Overview of LSI
Pre-process docs using a technique from linear algebra called Singular Value Decomposition (SVD).
Create a new (smaller) vector space.
Queries are handled (faster) in this new space.

6 Singular-Value Decomposition
Recall the m × n matrix of terms × docs, A. A has rank r ≤ m, n.
Define the term-term correlation matrix T = AA^t. T is a square, symmetric m × m matrix. Let P be the m × r matrix of eigenvectors of T.
Define the doc-doc correlation matrix D = A^t A. D is a square, symmetric n × n matrix. Let R be the n × r matrix of eigenvectors of D.
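A minimal numpy sketch of these two correlation matrices and their eigenvectors (the small random term-doc matrix is just a placeholder for a real one):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4                      # tiny example: 6 terms, 4 docs
A = rng.random((m, n))           # placeholder term-doc matrix
r = np.linalg.matrix_rank(A)     # rank r <= min(m, n)

T = A @ A.T                      # term-term correlation, m x m, symmetric
D = A.T @ A                      # doc-doc correlation, n x n, symmetric

# Eigenvectors of the symmetric matrices; keep those of the r largest eigenvalues.
tw, P = np.linalg.eigh(T)        # columns of P: eigenvectors of T
dw, R = np.linalg.eigh(D)        # columns of R: eigenvectors of D
P = P[:, np.argsort(tw)[::-1][:r]]   # m x r
R = R[:, np.argsort(dw)[::-1][:r]]   # n x r

# The non-zero eigenvalues of T and D coincide
# (they are the squared singular values of A).
print(np.sort(tw)[::-1][:r])
print(np.sort(dw)[::-1][:r])
```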

7 A's decomposition
Given P (for T, m × r) and R (for D, n × r), formed by orthonormal columns (unit dot-products), it turns out that A = P Σ R^t, where Σ is an r × r diagonal matrix containing the singular values of A (the square roots of the eigenvalues of T = AA^t) in decreasing order.
Dimensions: A is m × n, P is m × r, Σ is r × r, R^t is r × n.
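A small sketch checking this factorization with numpy's SVD routine, which returns exactly the P, Σ, R^t of the slide with the singular values already in decreasing order (A is again a placeholder matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4
A = rng.random((m, n))           # placeholder term-doc matrix

# Thin SVD: P is m x r, sigma holds the r singular values (decreasing),
# Rt is r x n, with r = min(m, n) here.
P, sigma, Rt = np.linalg.svd(A, full_matrices=False)

# Columns of P and rows of Rt are orthonormal, and A = P diag(sigma) Rt.
assert np.allclose(P.T @ P, np.eye(P.shape[1]))
assert np.allclose(Rt @ Rt.T, np.eye(Rt.shape[0]))
assert np.allclose(A, P @ np.diag(sigma) @ Rt)

# The squared singular values are the eigenvalues of T = A A^t.
eigs_T = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1][:len(sigma)]
assert np.allclose(sigma**2, eigs_T)
print("A = P Sigma R^t verified")
```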

8 Dimensionality reduction
For some k << r, zero out all but the k biggest singular values in Σ [the choice of k is crucial].
Denote by Σ_k this new version of Σ, having rank k. Typically k is about 100, while r (A's rank) is > 10,000.
This gives A_k = P Σ_k R^t. The columns of P and rows of R^t beyond the k-th become useless, because they are multiplied by the 0-columns/0-rows of Σ_k; hence A_k is effectively the product of an m × k matrix, a k × k diagonal matrix, and a k × n matrix.
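A minimal sketch of the truncation step, continuing the numpy example above (k = 2 is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 6, 4, 2
A = rng.random((m, n))                      # placeholder term-doc matrix

P, sigma, Rt = np.linalg.svd(A, full_matrices=False)

# Zero out all but the k largest singular values ...
sigma_k = sigma.copy()
sigma_k[k:] = 0.0
A_k = P @ np.diag(sigma_k) @ Rt             # rank-k approximation of A

# ... which is the same as keeping only the first k columns/rows:
A_k_small = P[:, :k] @ np.diag(sigma[:k]) @ Rt[:k, :]   # (m x k)(k x k)(k x n)
assert np.allclose(A_k, A_k_small)
print("rank of A_k:", np.linalg.matrix_rank(A_k))       # -> k
```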

9 Guarantee
A_k is a pretty good approximation to A: relative distances are (approximately) preserved.
Of all m × n matrices of rank k, A_k is the best approximation to A wrt the following measures:
min_{B, rank(B)=k} ||A − B||_2 = ||A − A_k||_2 = σ_{k+1}
min_{B, rank(B)=k} ||A − B||_F^2 = ||A − A_k||_F^2 = σ_{k+1}^2 + σ_{k+2}^2 + ... + σ_r^2
where the Frobenius norm is ||A||_F^2 = σ_1^2 + σ_2^2 + ... + σ_r^2.
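These identities are easy to check numerically; a small sketch continuing the example above:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 6, 4, 2
A = rng.random((m, n))

P, sigma, Rt = np.linalg.svd(A, full_matrices=False)
A_k = P[:, :k] @ np.diag(sigma[:k]) @ Rt[:k, :]

# Spectral norm of the error equals the (k+1)-th singular value.
assert np.isclose(np.linalg.norm(A - A_k, ord=2), sigma[k])

# Squared Frobenius norm of the error equals the sum of the remaining
# squared singular values.
assert np.isclose(np.linalg.norm(A - A_k, ord='fro')**2, np.sum(sigma[k:]**2))

# And ||A||_F^2 is the sum of all squared singular values.
assert np.isclose(np.linalg.norm(A, ord='fro')**2, np.sum(sigma**2))
print("best rank-k approximation identities verified")
```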

10 Reduction
X_k = Σ_k R^t is the doc matrix, k × n, hence reduced to k dimensions.
Since we are interested in doc/query correlations, consider D = A^t A = (P Σ R^t)^t (P Σ R^t) = (Σ R^t)^t (Σ R^t).
Approximating Σ with Σ_k, we get A^t A ≈ X_k^t X_k (both are n × n matrices).
We use X_k to define how to project A and q: from X_k = Σ_k R^t, substitute R^t = Σ^{-1} P^t A, so we get X_k = Σ_k Σ^{-1} P^t A. In fact, Σ_k Σ^{-1} P^t = P_k^t (dropping the all-zero rows), which is a k × m matrix.
This means that to reduce a doc/query vector it is enough to multiply it by P_k^t, thus paying O(km) per doc/query.
The cost of sim(q,d), for all d, is O(kn + km) instead of O(mn).
Recall that R, P are formed by orthonormal eigenvectors of the matrices D, T.
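A minimal sketch of the whole projection pipeline (the term-doc matrix and the query are placeholders; the variable names mirror the slide's notation):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 6, 4, 2
A = rng.random((m, n))               # placeholder term-doc matrix
q = rng.random(m)                    # placeholder query vector (m terms)

P, sigma, Rt = np.linalg.svd(A, full_matrices=False)
Pk_t = P[:, :k].T                    # P_k^t: k x m projection matrix
X_k = np.diag(sigma[:k]) @ Rt[:k, :] # Sigma_k R^t: k x n reduced doc matrix

# Projecting docs and query costs O(km) per vector.
docs_k = Pk_t @ A                    # k x n (equals X_k, up to numerics)
q_k = Pk_t @ q                       # k-dim query
assert np.allclose(docs_k, X_k)

# Cosine similarity of q against all docs in the reduced space: O(kn).
def cosines(query, docs):
    docs_norm = docs / np.linalg.norm(docs, axis=0)
    return (query / np.linalg.norm(query)) @ docs_norm

print("reduced-space scores:", cosines(q_k, docs_k))
print("full-space scores:   ", cosines(q, A))
```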

11 What are the concepts?
The c-th concept is the c-th row of P_k^t (which is k × m). Denote it by P_k^t[c]; its size is m = #terms.
P_k^t[c][i] = strength of association between the c-th concept and the i-th term.
Projected document: d'_j = P_k^t d_j, where d'_j[c] = strength of concept c in d_j.
Projected query: q' = P_k^t q, where q'[c] = strength of concept c in q.
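A short sketch of how one might inspect these concepts; the toy vocabulary and the matrix values are invented purely for illustration:

```python
import numpy as np

terms = ["car", "automobile", "engine", "banana", "fruit", "apple"]
rng = np.random.default_rng(1)
m, n, k = len(terms), 5, 2
A = rng.random((m, n))                 # placeholder term-doc matrix

P, sigma, Rt = np.linalg.svd(A, full_matrices=False)
Pk_t = P[:, :k].T                      # k x m: one row per concept

# For each concept, list the terms with the strongest association.
for c in range(k):
    strongest = np.argsort(-np.abs(Pk_t[c]))[:3]
    print(f"concept {c}:", [(terms[i], round(float(Pk_t[c, i]), 3)) for i in strongest])

# Strength of each concept in a projected document.
d0 = A[:, 0]
print("concept strengths in doc 0:", Pk_t @ d0)
```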

12 Random Projections Paolo Ferragina Dipartimento di Informatica Università di Pisa (Slides only!)

13 An interesting math result
Lemma (Johnson-Lindenstrauss, '82). Let P be a set of n distinct points in m dimensions. Given ε > 0, there exists a function f : P → IR^k such that for every pair of points u, v in P it holds that
(1 − ε) ||u − v||^2 ≤ ||f(u) − f(v)||^2 ≤ (1 + ε) ||u − v||^2,
where k = O(ε^{-2} log n).
Such an f is called a JL-embedding. Setting v = 0 we also get a bound on f(u)'s stretching!
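To get a feel for the bound, a tiny sketch plugging in numbers (the constant in the O(·) is taken to be 1 purely for illustration; real constants depend on the construction and on the failure probability):

```python
import math

# Illustrative only: assume k ~ (1/eps^2) * ln(n), with constant 1.
def jl_dimension(n_points: int, eps: float) -> int:
    return math.ceil(math.log(n_points) / eps**2)

for n_points in (10_000, 1_000_000):
    for eps in (0.5, 0.1):
        print(f"n = {n_points:>9}, eps = {eps}: k ~ {jl_dimension(n_points, eps)}")
```

The key takeaway is that k grows only logarithmically with the number of points but quadratically with 1/ε.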

14 What about the cosine-distance?
Write u · v = (||u||^2 + ||v||^2 − ||u − v||^2) / 2, and the analogous identity for f(u) · f(v). Substituting the JL bound for ||f(u) − f(v)||^2, together with the bounds on f(u)'s and f(v)'s stretching (obtained by setting v = 0 in the lemma), shows that the dot product, and hence the cosine, is preserved up to an additive error of O(ε (||u||^2 + ||v||^2)).

15 How to compute a JL-embedding?
Set R = (r_{i,j}) to be a random m × k matrix whose components are independent random variables with E[r_{i,j}] = 0 and Var[r_{i,j}] = 1 (e.g., standard Gaussians, or ±1 each with probability 1/2).
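A minimal sketch of this construction with an empirical check of the distance bound from the lemma. The scaling by 1/√k and the two example distributions are standard choices, stated here as assumptions rather than as the slide's exact definition:

```python
import numpy as np

rng = np.random.default_rng(42)
n_points, m = 100, 5_000         # n points in m dimensions
eps = 0.2
k = 500                          # reduced dimension

X = rng.random((n_points, m))    # placeholder data points (rows)

# Random m x k matrices with i.i.d. entries, E = 0 and Var = 1:
# Gaussian N(0,1), or +-1 each with probability 1/2.
R_gauss = rng.standard_normal((m, k))
R_sign = rng.choice([-1.0, 1.0], size=(m, k))

def jl_embed(points, R):
    # f(u) = (1/sqrt(k)) u R  -- the scaling keeps expected squared norms unchanged.
    return points @ R / np.sqrt(R.shape[1])

for name, R in [("gaussian", R_gauss), ("+-1", R_sign)]:
    Y = jl_embed(X, R)
    u, v = X[0], X[1]
    fu, fv = Y[0], Y[1]
    ratio = np.sum((fu - fv)**2) / np.sum((u - v)**2)
    print(f"{name}: ||f(u)-f(v)||^2 / ||u-v||^2 = {ratio:.3f} (target 1 +- {eps})")
```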

16 Finally...
Random projections hide large constants: k ≈ (1/ε)^2 · log n, so it may be large... but they are simple and fast to compute.
LSI is intuitive and may scale to any k; it is optimal under various metrics, but costly to compute (though good libraries are now available).

