
1 Information Retrieval: Latent Semantic Indexing

2 Speeding up cosine computation. What if we could take our vectors and “pack” them into fewer dimensions (say 50,000 → 100) while preserving distances? Two methods: “latent semantic indexing” and random projection.

3 Two approaches. LSI is data-dependent: create a k-dim subspace by eliminating redundant axes, pulling together “related” axes (hopefully car and automobile). Random projection is data-independent: choose a k-dim subspace that guarantees probable stretching properties between pairs of points.

4 Notions from linear algebra: matrix A, vector v; matrix transpose (A^t); matrix product; rank; eigenvalue λ and eigenvector v: Av = λv.
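A minimal numpy sketch of the eigenvalue relation above (the small symmetric matrix is an illustrative example, not from the slides):

    import numpy as np

    # A small symmetric example matrix (hypothetical), so its eigenvalues are real.
    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])

    eigvals, eigvecs = np.linalg.eigh(A)   # eigh handles symmetric matrices
    lam, v = eigvals[0], eigvecs[:, 0]     # one eigenpair

    print(np.allclose(A @ v, lam * v))     # True: A v = lambda v
    print("rank(A) =", np.linalg.matrix_rank(A))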

5 Overview of LSI: pre-process docs using a technique from linear algebra called Singular Value Decomposition; create a new (smaller) vector space; queries are handled in this new vector space.

6 Example. [Slide figure: a term-document matrix with 16 terms and 17 docs.]

7 Intuition (cont’d). More than dimension reduction: derive a set of new uncorrelated features (roughly, artificial concepts), one per dimension. Docs with lots of overlapping terms stay together, and terms also get pulled together onto the same dimension. Each term or document is then characterized by a vector of weights indicating its strength of association with each of these underlying concepts. E.g., car and automobile get pulled together, since they co-occur in docs with tires, radiator, cylinder, … Here comes the “semantic”!

8 Singular-Value Decomposition. Recall the m × n matrix of terms × docs, A. A has rank r ≤ m, n. Define the term-term correlation matrix T = A A^t; T is a square, symmetric m × m matrix. Let P be the m × r matrix of eigenvectors of T. Define the doc-doc correlation matrix D = A^t A; D is a square, symmetric n × n matrix. Let R be the n × r matrix of eigenvectors of D.
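A small numpy sketch of these definitions (the random 16 × 17 matrix is a stand-in for the example term-document matrix; numpy's eigh returns all eigenvectors, of which the slide keeps only the r with nonzero eigenvalues):

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 16, 17                      # terms x docs, as in the example slide
    A = rng.random((m, n))             # hypothetical term-document matrix

    T = A @ A.T                        # term-term correlation, square symmetric m x m
    D = A.T @ A                        # doc-doc correlation,  square symmetric n x n

    _, P = np.linalg.eigh(T)           # columns: eigenvectors of T
    _, R = np.linalg.eigh(D)           # columns: eigenvectors of D

    print(np.linalg.matrix_rank(A) <= min(m, n))   # True: rank r is at most m, n
    print(np.allclose(P.T @ P, np.eye(m)))         # orthonormal columns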

9 A’s decomposition. There exist matrices P (for T, m × r) and R (for D, n × r) formed by orthonormal columns (unit dot-product). It turns out that A = P Σ R^t, where Σ is a diagonal matrix whose entries (the singular values of A, i.e. the square roots of the eigenvalues of T = A A^t) are in decreasing order. [Slide figure: A (m × n) = P (m × r) · Σ (r × r) · R^t (r × n).]
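As a sketch (illustrative data only), numpy's SVD returns exactly these factors, with the singular values already sorted in decreasing order:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.random((16, 17))                                # hypothetical term-document matrix

    P, sigma, Rt = np.linalg.svd(A, full_matrices=False)    # P: m x r, sigma: r, Rt: r x n

    print(np.allclose(A, P @ np.diag(sigma) @ Rt))          # A = P Sigma R^t
    # The squared singular values are the eigenvalues of T = A A^t.
    print(np.allclose(sigma**2, np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]))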

10 Dimensionality reduction. For some k << r, zero out all but the k biggest entries in Σ [the choice of k is crucial]. Denote by Σ_k this new version of Σ, having rank k. Typically k is about 100, while r (A’s rank) is > 10,000. [Slide figure: A_k (m × n) = P (m × r) · Σ_k (r × r, only the top-left k × k block nonzero) · R^t (r × n); equivalently A_k = P_k (m × k) · Σ_k (k × k) · R_k^t (k × n), since the 0-columns/0-rows of Σ_k make the rest useless.]
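A short sketch of the truncation step (the value of k and the data are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.random((16, 17))
    P, sigma, Rt = np.linalg.svd(A, full_matrices=False)

    k = 3                                    # hypothetical choice of k much smaller than r
    sigma_k = sigma.copy()
    sigma_k[k:] = 0.0                        # zero out all but the k biggest values

    A_k = P @ np.diag(sigma_k) @ Rt          # rank-k approximation of A
    # Equivalently, drop the zeroed rows/columns of Sigma_k entirely:
    A_k_small = P[:, :k] @ np.diag(sigma[:k]) @ Rt[:k, :]
    print(np.allclose(A_k, A_k_small))       # True
    print(np.linalg.matrix_rank(A_k))        # k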

11 Guarantee. A_k is a pretty good approximation to A: relative distances are (approximately) preserved. Of all m × n matrices of rank k, A_k is the best approximation to A wrt the following measures: min_{B, rank(B)=k} ||A − B||_2 = ||A − A_k||_2 = σ_{k+1}; min_{B, rank(B)=k} ||A − B||_F^2 = ||A − A_k||_F^2 = σ_{k+1}^2 + σ_{k+2}^2 + … + σ_r^2, where the Frobenius norm is ||A||_F^2 = σ_1^2 + σ_2^2 + … + σ_r^2.
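The two optimality identities can be checked numerically; a sketch with illustrative data:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.random((16, 17))
    P, sigma, Rt = np.linalg.svd(A, full_matrices=False)

    k = 3
    A_k = P[:, :k] @ np.diag(sigma[:k]) @ Rt[:k, :]

    # Spectral-norm error = sigma_{k+1} (sigma[k] with 0-based indexing).
    print(np.isclose(np.linalg.norm(A - A_k, 2), sigma[k]))
    # Squared Frobenius-norm error = sum of the remaining squared singular values.
    print(np.isclose(np.linalg.norm(A - A_k, 'fro')**2, np.sum(sigma[k:]**2)))
    # Frobenius norm of A itself = sum of all squared singular values.
    print(np.isclose(np.linalg.norm(A, 'fro')**2, np.sum(sigma**2)))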

12 Reduction. X_k = Σ_k R^t is the doc-matrix reduced to k < n dimensions. Take the doc-correlation matrix: D = A^t A = (P Σ R^t)^t (P Σ R^t) = (Σ R^t)^t (Σ R^t). Approximating Σ with Σ_k thus gives A^t A ≈ X_k^t X_k. We use X_k to approximate A: X_k = Σ_k R^t = P_k^t A. This means that to reduce a doc/query vector it is enough to multiply it by P_k^t (i.e., a k × m matrix). The cost of sim(q, d), for all d, is O(kn + km) instead of O(mn). (R, P are formed by orthonormal eigenvectors of the matrices D, T.)
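A sketch of the reduction and of query folding (all sizes and data are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    m, n, k = 16, 17, 3
    A = rng.random((m, n))                     # hypothetical term-document matrix
    P, sigma, Rt = np.linalg.svd(A, full_matrices=False)

    P_k = P[:, :k]                             # m x k
    X_k = np.diag(sigma[:k]) @ Rt[:k, :]       # k x n reduced doc matrix
    print(np.allclose(X_k, P_k.T @ A))         # X_k = Sigma_k R^t = P_k^t A

    q = rng.random(m)                          # a query in term space
    q_k = P_k.T @ q                            # reduce it with the k x m matrix: O(km)

    # Cosine similarity against all docs in the reduced space: O(kn) instead of O(mn).
    sims = (q_k @ X_k) / (np.linalg.norm(q_k) * np.linalg.norm(X_k, axis=0))
    print(sims.shape)                          # (n,)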

13 Which are the concepts? The c-th concept = the c-th row of P_k^t (which is k × m). Denote it by P_k^t[c]; note its size is m = #terms. P_k^t[c][i] = strength of association between the c-th concept and the i-th term. Projected document: d'_j = P_k^t d_j, and d'_j[c] = strength of concept c in d_j.
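A sketch of inspecting the concepts (the vocabulary and data are made up for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    terms = [f"term_{i}" for i in range(16)]       # hypothetical vocabulary
    A = rng.random((16, 17))
    P, sigma, Rt = np.linalg.svd(A, full_matrices=False)
    k = 3
    P_kt = P[:, :k].T                              # k x m: one concept per row

    # For each concept, show the terms it is most strongly associated with.
    for c in range(k):
        strengths = P_kt[c]                        # size m = #terms
        top = np.argsort(-np.abs(strengths))[:3]
        print(f"concept {c}:", [(terms[i], round(float(strengths[i]), 3)) for i in top])

    # Projected document: d'_j = P_k^t d_j; its c-th entry is the strength of concept c.
    d0 = A[:, 0]
    print(P_kt @ d0)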

14 Information Retrieval: Random Projection

15 An interesting math result! Setting v = 0, we also get a bound on f(u)'s stretching.

16 What about the cosine distance? [Slide figure: the bound follows from f(u)'s and f(v)'s stretching.]
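An empirical sketch (the dimensions, the Gaussian construction of R, and the data are illustrative assumptions): project two vectors and compare cosines before and after.

    import numpy as np

    rng = np.random.default_rng(0)
    m, k = 10_000, 500                                   # original / reduced dimension
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(m, k))   # Gaussian projection, k columns
    f = lambda x: R.T @ x                                # the random map f

    u, v = rng.random(m), rng.random(m)
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    print(round(cos(u, v), 3))          # cosine in the original space
    print(round(cos(f(u), f(v)), 3))    # roughly the same after projection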

17 Defining the projection matrix. [Slide figure: the matrix R and its k columns.]
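A sketch of one common construction of R (i.i.d. Gaussian entries scaled by 1/sqrt(k); this is an assumption, not necessarily the slide's exact definition):

    import numpy as np

    rng = np.random.default_rng(0)
    m, k = 10_000, 500                                   # hypothetical dimensions
    # i.i.d. N(0, 1/k) entries: squared norms are preserved in expectation.
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(m, k))   # m rows, k columns

    def f(u):
        """Project an m-dimensional vector down to k dimensions."""
        return R.T @ u

    u = rng.random(m)
    print(u.shape, f(u).shape)                           # (10000,) (500,)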

18 Concentration bound! Is R a JL-embedding?
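A quick empirical check of the question (again with an assumed Gaussian R and illustrative sizes), measuring how much pairwise distances get distorted:

    import numpy as np

    rng = np.random.default_rng(0)
    m, k = 10_000, 500
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(m, k))

    # Worst relative distance distortion over a few random pairs.
    worst = 0.0
    for _ in range(100):
        u, v = rng.random(m), rng.random(m)
        ratio = np.linalg.norm(R.T @ (u - v)) / np.linalg.norm(u - v)
        worst = max(worst, abs(ratio - 1.0))
    print("worst relative distortion over 100 pairs:", round(worst, 3))   # small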

19 Gaussians are good! NOTE: every column of R is a unit vector uniformly distributed over the unit sphere; moreover, the k columns of R are orthonormal on average.
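A sketch of the NOTE (here the Gaussian entries are scaled so columns have unit expected norm; the exact construction is an illustrative assumption):

    import numpy as np

    rng = np.random.default_rng(0)
    m, k = 10_000, 50
    # N(0, 1/m) entries give columns of (near) unit norm in expectation.
    R = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, k))

    G = R.T @ R                                             # k x k Gram matrix of the columns
    print(round(float(np.mean(np.diag(G))), 3))             # ~1: near-unit columns
    off = G - np.diag(np.diag(G))
    print(round(float(np.max(np.abs(off))), 3))             # ~0: nearly orthogonal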

20 A practical-theoretical idea! E[r_{i,j}] = 0, Var[r_{i,j}] = 1.
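One simple distribution with E[r_{i,j}] = 0 and Var[r_{i,j}] = 1 is uniform random signs; this sketch uses that choice as an illustrative assumption (the slide may intend a different one):

    import numpy as np

    rng = np.random.default_rng(0)
    m, k = 10_000, 500
    R = rng.choice([-1.0, 1.0], size=(m, k))   # entries +1 / -1 with equal probability

    print(round(float(R.mean()), 3))           # ~0, i.e. E[r_ij] = 0
    print(round(float(R.var()), 3))            # ~1, i.e. Var[r_ij] = 1

    # With the usual 1/sqrt(k) scaling, squared norms are preserved in expectation.
    u = rng.random(m)
    print(round(np.linalg.norm(R.T @ u / np.sqrt(k))**2 / np.linalg.norm(u)**2, 3))  # ~1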

21 Question! Various theoretical results are known. What about practical cases?

