Latent Semantic Indexing via a Semi-discrete Matrix Decomposition.


1 Latent Semantic Indexing via a Semi-discrete Matrix Decomposition

2 Papers from the same authors with similar topics
1. Kolda, T.G. & O'Leary, D.P. A semidiscrete matrix decomposition for latent semantic indexing information retrieval. ACM Trans. Inf. Syst., 1998, 16, 322–346.
2. Kolda, T.G. & O'Leary, D.P. Latent semantic indexing via a semi-discrete matrix decomposition. In: Cybenko, G. & O'Leary, D.P. (eds.), Springer-Verlag, 1999, 107, 73–80.
3. Kolda, T.G. & O'Leary, D.P. Algorithm 805: Computation and uses of the semidiscrete matrix decomposition. ACM Trans. Math. Softw., 2000, 26, 415–435.

3 Vector Space Framework Query: documents and queries are represented as vectors of term weights, and a query is matched against documents by vector similarity.

4 Weight of term in a document

5

6 Motivation for using SDD The singular value decomposition (SVD) is used in latent semantic indexing (LSI) to estimate the latent structure of word usage across documents. Replacing the SVD with a semi-discrete decomposition (SDD) saves storage space and retrieval time.

7 Why? Claim: the SVD has nice theoretical properties, but it stores a lot of information, probably more than this application needs.

8 SVD vs SDD
SVD: A = U Σ Vᵀ, where U and V are dense orthogonal matrices and Σ is diagonal with the singular values, all stored in double precision.
SDD: A_k = X_k D_k Y_kᵀ, where the entries of X_k and Y_k are restricted to {−1, 0, 1} and D_k is diagonal with positive scalars.

9 SDD is an approximate representation of the matrix: even without discarding any terms, the repackaged form need not reproduce the original matrix exactly. Convergence theorems guarantee that as the number of terms k tends to infinity, the approximation converges (slowly) to the original matrix. The speed of convergence depends on the starting estimate used to initialize the iterative decomposition algorithm.

10 Result: Storage Space

                                          SVD                    SDD
Approx. comparative storage (same rank k) 1                      0.05
Size per element                          double word (64 bits)  2 bits
Size per scalar value                     double word (64 bits)  single word (32 bits)
TOTAL (bytes)                             8k(m + n + 1)          4k + ¼k(m + n)
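The totals can be checked with a short calculation: the SVD stores k triplets (u_i, σ_i, v_i) as 64-bit doubles, while the SDD stores k scalars as 32-bit words plus k(m + n) two-bit ternary entries. A minimal sketch (the dimensions m, n and rank k below are hypothetical example values, not from the papers):

```python
def svd_storage_bytes(k, m, n):
    """SVD: k*(m + n + 1) double-precision values at 8 bytes each."""
    return 8 * k * (m + n + 1)

def sdd_storage_bytes(k, m, n):
    """SDD: k scalars d_i at 4 bytes each, plus k*(m + n) ternary entries at 2 bits each."""
    return 4 * k + k * (m + n) / 4

# Example: a 5000-term x 1000-document matrix at rank k = 100
m, n, k = 5000, 1000, 100
ratio = sdd_storage_bytes(k, m, n) / svd_storage_bytes(k, m, n)
print(ratio)  # about 0.03, the same order as the table's 0.05
```

The exact ratio depends on m, n, and k; for large m + n it approaches (1/4)/8 ≈ 0.03.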

11 Medline test case

12 Results on Medline test case

13 Method for SDD A greedy algorithm iteratively constructs the k-th triplet (d_k, x_k, y_k).

14 Metrics in those papers
1. Kolda, T.G. & O'Leary, D.P. A semidiscrete matrix decomposition for latent semantic indexing information retrieval. ACM Trans. Inf. Syst., 1998, 16, 322–346.
2. Kolda, T.G. & O'Leary, D.P. Latent semantic indexing via a semi-discrete matrix decomposition. In: Cybenko, G. & O'Leary, D.P. (eds.), Springer-Verlag, 1999, 107, 73–80.
3. Kolda, T.G. & O'Leary, D.P. Algorithm 805: Computation and uses of the semidiscrete matrix decomposition. ACM Trans. Math. Softw., 2000, 26, 415–435.
NOTE: all y's above are fixed; x and y are fixed alternately in each algorithm.

15 Greedy Algorithm

16 Notes on the algorithm
- Starting vector y: every 100th element is 1 and all the others are 0.
- A_k → A as k → ∞.
- Minimizing the Frobenius norm of the residual simplifies to finding an optimal x.
- The improvement threshold may be 0.01, where improvement = |new − old| / old.
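The notes above can be put together into a runnable sketch of the greedy construction. This is a minimal NumPy implementation following the standard Kolda–O'Leary formulation; the function names sdd and best_ternary are mine, not from the papers:

```python
import numpy as np

def best_ternary(s):
    """Best x in {-1, 0, 1}^m maximizing (x^T s)^2 / ||x||^2: the optimum
    keeps the J largest-magnitude entries of s, with their signs."""
    order = np.argsort(-np.abs(s))
    cums = np.cumsum(np.abs(s)[order])
    J = int(np.argmax(cums**2 / np.arange(1, len(s) + 1))) + 1
    x = np.zeros_like(s)
    x[order[:J]] = np.sign(s[order[:J]])
    return x

def sdd(A, k, tol=0.01, max_inner=100):
    """Greedy SDD: A ~ X diag(d) Y^T with ternary X, Y and nonnegative d."""
    m, n = A.shape
    R = A.astype(float).copy()                 # residual matrix
    X, Y, d = np.zeros((m, k)), np.zeros((n, k)), np.zeros(k)
    for i in range(k):
        y = np.zeros(n)
        y[::100] = 1.0                         # every 100th element is 1
        old = 0.0
        for _ in range(max_inner):             # alternate: fix y, solve x; fix x, solve y
            x = best_ternary(R @ y)
            y = best_ternary(R.T @ x)
            new = (x @ R @ y) ** 2 / ((x @ x) * (y @ y))   # drop in ||R||_F^2
            if old > 0 and abs(new - old) / old < tol:     # improvement threshold
                break
            old = new
        d[i] = (x @ R @ y) / ((x @ x) * (y @ y))           # optimal scalar d_k
        R -= d[i] * np.outer(x, y)
        X[:, i], Y[:, i] = x, y
    return X, d, Y
```

Each outer iteration subtracts the best rank-one ternary term from the residual, so the Frobenius norm of A − X diag(d) Yᵀ decreases as k grows.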

17 Finding x and d
1. Fix y.
2. Find the optimal d.
3. Eliminate d by substituting it back in.
4. Solve for x.
5. Use x and y to find d.
This simplifies to (and is used in the algorithm):
1. Fix y.
2. Find the optimal x among m candidate x-vectors.
3. Given x and y, find d*.

18

19 There are m possible values for J; thus, we only need to check m possible x vectors to determine the optimal solution.
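For a small m, this claim can be checked directly by brute force over all 3^m ternary vectors: the maximizer of (xᵀs)² / ‖x‖² is always a "top-J" pattern, i.e. the J largest-magnitude entries of s taken with their signs. An illustrative sketch (the vector s below is hypothetical example data):

```python
import itertools
import numpy as np

def score(x, s):
    """Objective (x^T s)^2 / ||x||^2, with 0 for the all-zero vector."""
    nx = x @ x
    return 0.0 if nx == 0 else (x @ s) ** 2 / nx

s = np.array([3.0, -0.1, 2.9, 1.5])

# Brute force over all 3^m ternary vectors
best_bf = max(score(np.array(x), s)
              for x in itertools.product([-1, 0, 1], repeat=len(s)))

# Only the m "top-J" candidates
order = np.argsort(-np.abs(s))
best_J = 0.0
for J in range(1, len(s) + 1):
    x = np.zeros(len(s))
    x[order[:J]] = np.sign(s[order[:J]])
    best_J = max(best_J, score(x, s))

print(np.isclose(best_bf, best_J))  # True: the optimum is a top-J pattern
```

The brute-force search costs 3^m evaluations, while the top-J search costs only m, which is what makes the inner step of the greedy algorithm cheap.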

