Problems for classical IR models; Introduction & Background (LSI, SVD, etc.); Example; Standard query method; Analysis of the standard query method; Seeking...

2 Outline
– Problems for classical IR models
– Introduction & background (LSI, SVD, etc.)
– Example
– Standard query method
– Analysis of the standard query method
– Seeking the best
– Experimental results
– SVR vs. IRR
– SVR
– Conclusion
– Future work

3 Problems for classical IR models; LSI; SVD

4 Problems for classical IR models
– Synonymy: various words and phrases refer to the same concept (lowers recall).
– Polysemy: individual words have more than one meaning (lowers precision).
– Independence: no significance is given to two terms that frequently appear together.
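
To make the synonymy problem concrete, here is a minimal sketch under assumed conditions (toy documents, raw term counts as weights). It shows how exact term matching scores a document written with synonyms well below one that repeats the query's words, even though both describe the same concept.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

query = Counter("car engine repair".split())
doc_synonyms = Counter("automobile motor repair".split())    # same concept, different words
doc_same_words = Counter("car engine repair manual".split())

print(round(cosine(query, doc_synonyms), 3))    # 0.333: only "repair" matches
print(round(cosine(query, doc_same_words), 3))  # 0.866
```

With a retrieval threshold between these two scores, the synonym document is missed, which is exactly the loss of recall the slide describes.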

5 Latent Semantic Analysis
General idea:
– Map documents (and terms) to a low-dimensional representation.
– Design the mapping so that the low-dimensional space reflects semantic associations (latent semantic space).
– Compute document similarity based on the inner product in the latent semantic space.
Goals:
– Similar terms map to similar locations in the low-dimensional space.
– Noise reduction through dimension reduction.
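
A minimal sketch of the general idea, assuming a tiny hypothetical term-by-document matrix with plain counts: the rank-$k$ SVD maps documents to the rows of $V_k \Sigma_k$, and similarity is computed there as a normalized inner product (cosine). Two documents that share only one term end up pointing the same way in the latent space, because the co-occurrence pattern links "car" and "automobile".

```python
import numpy as np

# Hypothetical term-by-document matrix A: rows = terms, columns = documents
#            d0  d1  d2  d3
A = np.array([
    [1, 0, 1, 0],  # car
    [0, 1, 1, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 0, 1],  # flower
    [0, 0, 0, 1],  # petal
], dtype=float)

k = 2  # dimension of the latent semantic space (chosen for illustration)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Documents become rows of V_k S_k (terms would be rows of U_k S_k)
docs_latent = Vt[:k, :].T * s[:k]

def cos(x: np.ndarray, y: np.ndarray) -> float:
    """Normalized inner product (cosine) of two vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

print(round(cos(A[:, 0], A[:, 1]), 3))                # 0.5: d0 and d1 share only "engine"
print(round(cos(docs_latent[0], docs_latent[1]), 3))  # 1.0: same direction in the 2-D latent space
```

With $k = 2$ the car/automobile/engine documents collapse onto one latent direction and the flower document onto another; a larger $k$ would keep more of the original distinctions.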

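6 Vector Model
– Set of documents: $D = \{d_1, d_2, \dots, d_n\}$.
– A finite set of terms: $T = \{t_1, t_2, \dots, t_m\}$.
– Every document can be represented as a vector of term weights, $\vec{d_j} = (w_{1j}, w_{2j}, \dots, w_{mj})$, and likewise the query, $\vec{q} = (w_{1q}, w_{2q}, \dots, w_{mq})$.
– Similarity of query $q$ and document $d_j$: $\mathrm{sim}(d_j, q) = \cos\theta = \dfrac{\vec{d_j} \cdot \vec{q}}{\lVert \vec{d_j} \rVert \, \lVert \vec{q} \rVert}$.
– Given a threshold, all documents with similarity > threshold are retrieved.
(Figure: document vector $\vec{d_j}$ and query vector $\vec{q}$ in term space, separated by the angle $\theta$.)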
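
The retrieval rule can be sketched as follows, assuming raw term counts as the weights $w_{ij}$ (any weighting scheme could be substituted) and treating a zero-length vector as having similarity 0. The matrix and query are hypothetical.

```python
import numpy as np

def retrieve(A: np.ndarray, q: np.ndarray, threshold: float) -> list[int]:
    """Indices of documents (columns of A) whose cosine similarity with q exceeds threshold."""
    denom = np.linalg.norm(A, axis=0) * np.linalg.norm(q)
    sims = np.divide(q @ A, denom, out=np.zeros(A.shape[1]), where=denom > 0)
    return [j for j, sim in enumerate(sims) if sim > threshold]

# Hypothetical term-by-document matrix (terms: car, automobile, engine, flower, petal)
A = np.array([
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
], dtype=float)
q = np.array([1, 0, 1, 0, 0], dtype=float)  # query "car engine"

print(retrieve(A, q, threshold=0.4))  # [0, 1, 2]
print(retrieve(A, q, threshold=0.6))  # [0]
```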

7 SVD and low-rank approximations
Truncate the SVD by keeping $k \le r$ terms: $A_k = U_k \Sigma_k V_k^T$, where
– $U_k$ ($V_k$): matrix with orthonormal columns containing the top $k$ left (right) singular vectors of $A$;
– $\Sigma_k$: diagonal matrix containing the top $k$ singular values of $A$, ordered non-increasingly;
– $r$: rank of $A$, i.e., the number of non-zero singular values.
$A_k$ is the "best" matrix among all rank-$k$ matrices with respect to the spectral and Frobenius norms. This optimality property is very useful in, e.g., Principal Component Analysis (PCA), LSI, etc.
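
The optimality claim can be checked numerically. A minimal sketch with NumPy on a random matrix (the matrix and $k$ are arbitrary choices): by the Eckart–Young theorem, the spectral-norm error of $A_k$ equals $\sigma_{k+1}$ and the Frobenius-norm error equals $\sqrt{\sum_{i>k} \sigma_i^2}$.

```python
import numpy as np

def truncated_svd(A: np.ndarray, k: int) -> np.ndarray:
    """Return A_k = U_k Sigma_k V_k^T, the rank-k truncation of the SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is ordered non-increasingly
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
A = rng.random((8, 6))
k = 3
A_k = truncated_svd(A, k)
s = np.linalg.svd(A, compute_uv=False)  # all singular values of A

# s[k] is the (k+1)-th singular value because indexing starts at 0
print(np.isclose(np.linalg.norm(A - A_k, 2), s[k]))                             # True
print(np.isclose(np.linalg.norm(A - A_k, "fro"), np.sqrt(np.sum(s[k:] ** 2))))  # True
```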

15 TREC-4 data set (http://trec.nist.gov/)
– 5305 documents were chosen at random and tested with 20 queries.
– Stemming with the Porter Stemmer (http://www.tartarus.org/~martin/PorterStemmer/) and stop-word removal (http://www.lextek.com/manuals/onix/stopwords1.html) were applied.
– The resulting term-by-document matrix had dimension 16,571 × 5305 and was found through the SVD process to have full rank, 5305.
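
A minimal sketch of how such a term-by-document matrix can be built and its rank checked. Assumptions: a toy three-document corpus, a tiny hypothetical stop-word list standing in for the list cited above, raw counts as entries, and no stemming (a real pipeline would apply the Porter stemmer after tokenizing).

```python
import re
import numpy as np

STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}  # tiny hypothetical stop list

def tokenize(text: str) -> list[str]:
    """Lowercase, keep alphabetic tokens, drop stop words (no stemming in this sketch)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def term_document_matrix(docs: list[str]) -> tuple[list[str], np.ndarray]:
    """Build a raw-count term-by-document matrix (rows = terms, columns = documents)."""
    vocab = sorted({t for d in docs for t in tokenize(d)})
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for t in tokenize(d):
            A[index[t], j] += 1
    return vocab, A

docs = ["The car engine", "An automobile engine", "Petals of the flower"]
vocab, A = term_document_matrix(docs)
print(vocab)                     # ['automobile', 'car', 'engine', 'flower', 'petals']
print(A.shape)                   # (5, 3)
print(np.linalg.matrix_rank(A))  # 3: full rank (matrix_rank is computed from the SVD)
```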

16 Performance measure T: the area between the interpolated recall–precision (IRP) curve and the horizontal Recall axis. T represents the average interpolated precision over the full range of recall, [0, 1]:
$T = \int_0^1 P_{\mathrm{interp}}(r)\,dr$
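
A minimal sketch of computing T for one ranked result list. The transcript does not say whether T was obtained by integrating the exact step-shaped curve or by averaging precision at standard recall points; this sketch integrates the exact step curve, which is an assumption. Interpolated precision at recall $r$ is taken as the maximum precision at any recall $\ge r$, and recall levels never reached get precision 0.

```python
def interpolated_area(ranked_relevance: list[bool], num_relevant: int) -> float:
    """Area under the interpolated recall-precision curve over recall in [0, 1].

    ranked_relevance[i] is True if the document at rank i + 1 is relevant.
    """
    points = []  # (recall, precision) at each rank where a relevant document appears
    hits = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            points.append((hits / num_relevant, hits / rank))

    area, prev_recall = 0.0, 0.0
    for i, (recall, _) in enumerate(points):
        p_interp = max(p for _, p in points[i:])  # best precision at this recall or beyond
        area += (recall - prev_recall) * p_interp
        prev_recall = recall
    return area  # recall beyond the last relevant hit contributes 0

print(interpolated_area([True, False, True, False], num_relevant=2))  # about 0.833
print(interpolated_area([True, False, True], num_relevant=3))         # about 0.556
```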

17 LSI vs. IRR (diagram). Labels: term-by-document matrix $A$; weighting; SVD into $U$, $\Sigma$, $V^T$; eigenvalue; eigenvector; rescaling.
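
The transcript keeps only the diagram labels. The following hedged sketch shows IRR as it is commonly described (iterative residual rescaling, assuming that is the IRR meant here). Residual document vectors are rescaled by a power of their lengths, and the eigenvector of the largest eigenvalue of the rescaled residuals' $R_s R_s^T$ becomes the next basis vector. That direction is then removed from the residuals. The exponent `rescale_power` and all function names are assumptions. With `rescale_power = 0`, the procedure reduces to the top-$k$ left singular vectors used by LSI.

```python
import numpy as np

def irr_basis(A: np.ndarray, k: int, rescale_power: float = 2.0) -> np.ndarray:
    """Hypothetical sketch of iterative residual rescaling; returns an m x k basis.

    Each iteration rescales every residual document vector r_j by |r_j|**rescale_power,
    takes the eigenvector of the largest eigenvalue of R_s R_s^T as the next basis
    vector b, and removes the b-component from all residuals.
    """
    R = A.astype(float)                     # residual matrix (a copy), one column per document
    basis = []
    for _ in range(k):
        lengths = np.linalg.norm(R, axis=0)
        R_s = R * lengths ** rescale_power  # rescale each residual column
        eigvals, eigvecs = np.linalg.eigh(R_s @ R_s.T)
        b = eigvecs[:, np.argmax(eigvals)]  # unit eigenvector of the largest eigenvalue
        basis.append(b)
        R = R - np.outer(b, b @ R)          # subtract the projection onto b
    return np.column_stack(basis)

rng = np.random.default_rng(0)
A = rng.random((6, 5))                      # placeholder term-by-document matrix
B = irr_basis(A, k=2)
print(np.allclose(B.T @ B, np.eye(2)))      # True: the basis vectors are orthonormal

# With rescale_power = 0 no rescaling happens and the basis matches LSI's U_k up to sign
U, _, _ = np.linalg.svd(A, full_matrices=False)
print(np.allclose(np.abs(irr_basis(A, k=2, rescale_power=0.0).T @ U[:, :2]), np.eye(2)))  # True
```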

18 IRR setup (diagram): the term-by-document matrix is turned into a term-by-sentence matrix; IRR yields $U$, $\Sigma$, $V^T$; every document is then used as a query to compute the similarities.
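
Using every document as a query amounts to computing all pairwise similarities in the reduced space. A minimal sketch, assuming the latent basis (from SVD or IRR) is given as an $m \times k$ matrix with orthonormal columns and that documents are projected onto it; the random data and the function name are placeholders.

```python
import numpy as np

def all_pairs_similarity(basis: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Use every document (column of docs) as a query against every document.

    basis: m x k matrix of latent term directions (e.g. U_k from SVD or IRR).
    docs:  m x n term-by-document matrix.
    Returns the n x n matrix of cosine similarities in the latent space.
    """
    Z = basis.T @ docs                   # k x n: each document projected into the latent space
    norms = np.linalg.norm(Z, axis=0)
    norms[norms == 0] = 1.0              # leave empty projections as zero vectors
    Zn = Z / norms                       # unit-length latent document vectors
    return Zn.T @ Zn

rng = np.random.default_rng(0)
docs = rng.random((6, 4))                    # placeholder term-by-document data
basis, _ = np.linalg.qr(rng.random((6, 3)))  # placeholder orthonormal basis, k = 3
S = all_pairs_similarity(basis, docs)
print(S.shape)                        # (4, 4)
print(np.allclose(np.diag(S), 1.0))   # True: each document has similarity 1 with itself
```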

21 Fig. 5: SVD of a 2 × 2 matrix.
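
A worked 2 × 2 example, using a matrix chosen for illustration because the one in Fig. 5 is not recoverable from the transcript. For $A = \begin{pmatrix} 3 & 0 \\ 4 & 5 \end{pmatrix}$, $A^T A = \begin{pmatrix} 25 & 20 \\ 20 & 25 \end{pmatrix}$ has eigenvalues 45 and 5, so the singular values are $\sqrt{45} \approx 6.708$ and $\sqrt{5} \approx 2.236$. Geometrically, $V^T$ rotates (or reflects) the input, $\Sigma$ stretches along the axes, and $U$ rotates (or reflects) the result.

```python
import numpy as np

# Hypothetical 2 x 2 matrix chosen for illustration
A = np.array([[3.0, 0.0],
              [4.0, 5.0]])
U, s, Vt = np.linalg.svd(A)

print(s)                                    # sqrt(45) ≈ 6.708 and sqrt(5) ≈ 2.236
print(np.allclose(U @ np.diag(s) @ Vt, A))  # True: A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(2)))      # True: U is orthogonal (so is V)
```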

23 Mathematical analysis showed that:
– The results of version A and version B differ by a factor of $S^2$, where $S$ is the diagonal matrix of singular values in the dimension-reduced model.
– The retrieval results of version B and version B′ are always identical if the Equivalency Principle is satisfied.
– Version B (B′) should be a better option than version A.
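
The transcript does not define versions A and B. Assume instead the two common LSI query conventions. Version A folds the query in as $\hat{q} = q^T U_k S_k^{-1}$ and compares it with the rows of $V_k$. Version B uses $\hat{q} = q^T U_k$ and the rows of $V_k S_k$. Under that assumption the stated $S^2$ factor follows directly: each latent dimension $i$ contributes $s_i^2$ times more to the version-B inner product than to the version-A one. The sketch below checks this on random placeholder data; version B′ and the Equivalency Principle are not modeled.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((7, 5))   # placeholder term-by-document matrix (7 terms, 5 documents)
q = rng.random(7)        # placeholder query vector in term space
k = 3

U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, S_k, V_k = U[:, :k], np.diag(s[:k]), Vt[:k, :].T

# Assumed version A: query folded in as q^T U_k S_k^{-1}, documents are the rows of V_k
q_A, D_A = q @ U_k @ np.linalg.inv(S_k), V_k
# Assumed version B: query as q^T U_k, documents are the rows of V_k S_k
q_B, D_B = q @ U_k, V_k @ S_k

# Contribution of latent dimension i to the inner product with document j (n x k arrays)
contrib_A = q_A * D_A
contrib_B = q_B * D_B

print(np.allclose(contrib_B, contrib_A * s[:k] ** 2))              # True: dimension i reweighted by s_i^2
print(np.allclose(contrib_B.sum(axis=1), q @ U_k @ S_k @ V_k.T))   # True: version B scores equal q^T A_k
```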

24 Experiments on the standardized TREC data set confirmed that:
– Using SVR in addition to conventional LSI gives an improvement ratio of 5.9% over using conventional LSI alone.
– SVR is as computationally efficient as the best standard query method, version B.
– SVR performs better than IRR.
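
The transcript does not spell out how SVR rescales the singular values. The following purely hypothetical sketch makes two assumptions. First, SVR replaces $S_k$ by $S_k^{S_{exp}}$ in a version-B-style score, so that $S_{exp} = 1$ recovers conventional LSI. Second, the improvement ratio is the relative gain in the measure T. The function names and the placeholder numbers are illustrative and are not the reported results.

```python
import numpy as np

def rescaled_scores(A: np.ndarray, q: np.ndarray, k: int, s_exp: float) -> np.ndarray:
    """Hypothetical rescaled LSI scores q^T U_k S_k**s_exp V_k^T for every document.

    s_exp = 1 gives the conventional rank-k LSI scores q^T A_k.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    weights = s[:k] ** s_exp                     # rescaled singular values
    return ((q @ U[:, :k]) * weights) @ Vt[:k, :]

def improvement_ratio(t_new: float, t_baseline: float) -> float:
    """Relative gain in T, e.g. 0.059 for a 5.9% improvement."""
    return (t_new - t_baseline) / t_baseline

rng = np.random.default_rng(2)
A, q = rng.random((7, 5)), rng.random(7)         # placeholder data
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :3] @ np.diag(s[:3]) @ Vt[:3, :]

print(np.allclose(rescaled_scores(A, q, k=3, s_exp=1.0), q @ A_k))  # True
print(round(improvement_ratio(0.53, 0.50), 3))                       # 0.06 (placeholder T values)
```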

25 Future work:
– Applying SVR to other fields of IR, such as image retrieval and video/audio retrieval.
– Seeking a mathematical justification of SVR, including the relationship between the optimal rescaling factor $S_{exp}$ and the characteristics of a particular data set.
