1 Discussion Class 4 Latent Semantic Indexing

2 Discussion Classes Format: Question. Ask a member of the class to answer. Provide opportunity for others to comment. When answering: Stand up. Give your name. Make sure that the TA hears it. Speak clearly so that all the class can hear. Suggestions: Do not be shy about presenting partial answers. Differing viewpoints are welcome.

3 Question 1: Basics (a) Explain the name "latent semantic analysis". (b) What problems is latent semantic analysis attempting to solve? (c) What criteria were used in selecting singular-value decomposition?

4 Question 2 [Figure. Legend: term, document, query; dashed line: cosine > 0.9]

5 Question 3: Rank Reduction (a) Explain the matrices in the singular value decomposition: $X = T_0 S_0 D_0'$. (b) The rank reduction method is to keep the first k elements of $S_0$ and set the others to zero. This gives: $X \approx \hat{X} = T S D'$. What has this to do with latent semantics?
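As a worked reference for Question 3, the decomposition and its rank-k truncation can be written out as below. The dimension symbols t (number of terms), d (number of documents), m (rank of X) and the singular values σ_i are notation assumed here for illustration; the slide gives only the matrix names.

```latex
\begin{aligned}
X &= T_0 S_0 D_0', &\quad
  T_0 &\in \mathbb{R}^{t \times m},\;
  S_0 = \operatorname{diag}(\sigma_1, \dots, \sigma_m),\;
  D_0 \in \mathbb{R}^{d \times m}, \\
 & &\quad
  \sigma_1 &\ge \sigma_2 \ge \dots \ge \sigma_m > 0,\qquad
  T_0' T_0 = D_0' D_0 = I_m, \\
\hat{X} &= T S D', &\quad
  T &\in \mathbb{R}^{t \times k},\;
  S = \operatorname{diag}(\sigma_1, \dots, \sigma_k),\;
  D \in \mathbb{R}^{d \times k},\qquad k \le m, \\
\hat{X} &\in \arg\min_{\operatorname{rank}(Y) \le k} \lVert X - Y \rVert_F .
\end{aligned}
```

Here T and D are the first k columns of T_0 and D_0. Setting the remaining diagonal entries of S_0 to zero, as the slide describes, yields the same product, and the last line states the standard least-squares property of the truncated decomposition.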

6 Question 4: Experimental Results: 100 Factors (a) LSI-100 does better at the right of this graph than on the left. What has this to do with synonymy and polysemy? (b) Why were the authors surprised that TERM and SMART gave similar results?

7 Question 5: Experimental Results (a) Describe the methodology of the MED experiment. (b) What conclusions can you draw from this experiment? (c) The results of the CISI experiment were disappointing. What are some possible explanations? (d) This is a new method. What comes next?

8 Question 6: Number of Factors What data does this graph plot? What conclusions can you draw from this graph?

9 Question 7: Performance The paper states, "the only way documents can be retrieved is by an exhaustive comparison of a query vector against all stored document vectors." (a) Explain this statement. (b) Is this a serious problem?
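To make the quoted statement concrete, here is a minimal sketch of an exhaustive comparison: the query vector is scored against every stored document vector by cosine similarity, so the cost grows linearly with the number of documents. The function names, the toy two-factor vectors and the dictionary layout are hypothetical illustrations, not taken from the paper; the 0.9 threshold echoes the "cosine > 0.9" legend on slide 4.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs, threshold=0.9):
    """Score the query against EVERY stored document vector (no index is used).

    doc_vecs is a hypothetical mapping of document id -> reduced-space vector.
    Returns (doc_id, score) pairs with score > threshold, best first.
    """
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    hits = [(doc_id, score) for doc_id, score in scored if score > threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy data in a 2-factor space (made-up numbers, for illustration only).
docs = {"d1": [1.0, 0.0], "d2": [0.9, 0.1], "d3": [0.0, 1.0]}
results = rank_documents([1.0, 0.05], docs)
```

Because the reduced vectors are dense, there is no inverted-file shortcut that skips documents sharing no terms with the query, which is what part (b) asks the class to weigh.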