Modern Information Retrieval. Chapter 02: Modeling (Latent Semantic Indexing)



Latent Semantic Indexing
Classic IR might lead to poor retrieval because:
• unrelated documents might be included in the answer set
• relevant documents that do not contain at least one index term are not retrieved
Reasoning: retrieval based on index terms is vague and noisy
• The user's information need is more related to concepts and ideas than to index terms
• A document that shares concepts with another document known to be relevant might be of interest

Latent Semantic Indexing
• The key idea is to map documents and queries into a lower-dimensional space (i.e., a space composed of higher-level concepts, which are fewer in number than the index terms)
• Retrieval in this reduced concept space might be superior to retrieval in the space of index terms

Latent Semantic Indexing
Definitions
• Let $t$ be the total number of index terms
• Let $N$ be the number of documents
• Let $M = (M_{ij})$ be a term-document matrix with $t$ rows and $N$ columns
• Each element $M_{ij}$ of this matrix is assigned a weight $w_{ij}$ associated with the term-document pair $[k_i, d_j]$
• The weight $w_{ij}$ can be based on a tf-idf weighting scheme
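The definitions above can be illustrated with a minimal sketch that builds a small term-document matrix. The three-document collection is hypothetical, and the weight used here (raw term frequency times log(N / n_i), with n_i the number of documents containing term k_i) is only one common tf-idf variant, assumed for illustration; the slides do not fix a specific formula.

```python
import math

# Hypothetical toy collection; each string is one document d_j.
docs = [
    "ship ocean voyage",
    "ocean boat",
    "tree wood forest",
]
tokenized = [d.split() for d in docs]

# Index terms k_i, in a fixed (sorted) order so that rows are well defined.
terms = sorted({w for words in tokenized for w in words})
t, N = len(terms), len(docs)

# Document frequency n_i: number of documents that contain term k_i.
df = {k: sum(1 for words in tokenized if k in words) for k in terms}

# M[i][j] = w_ij = tf_ij * log(N / n_i)   (assumed tf-idf variant)
M = [
    [words.count(k) * math.log(N / df[k]) for words in tokenized]
    for k in terms
]

# Print one row per index term: t rows, N columns.
for k, row in zip(terms, M):
    print(f"{k:8s}", [round(w, 3) for w in row])
```

Terms that occur in every document would receive weight 0 under this variant, which is why other tf-idf formulations add smoothing.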

Latent Semantic Indexing
The matrix $M = (M_{ij})$ can be decomposed into 3 matrices (singular value decomposition) as follows:
• $M = K S D^{T}$
• $K$ is the matrix of eigenvectors derived from $M M^{T}$
• $D^{T}$ is the transpose of the matrix $D$ of eigenvectors derived from $M^{T} M$
• $S$ is an $r \times r$ diagonal matrix of singular values, where $r = \min(t, N)$; this equals the rank of $M$ when $M$ has full rank, and in general $\operatorname{rank}(M) \le \min(t, N)$
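As a rough numerical check of the decomposition, the sketch below applies NumPy's `numpy.linalg.svd` to a small matrix. The 5 x 4 matrix is a hypothetical example, not the one from the slides, and the variable names K, S and Dt simply mirror the notation above.

```python
import numpy as np

# Hypothetical 5-term x 4-document weight matrix (t = 5, N = 4).
M = np.array([
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
])

# Thin SVD: K is t x r, sv holds the r singular values (largest first),
# Dt is r x N, with r = min(t, N).
K, sv, Dt = np.linalg.svd(M, full_matrices=False)
S = np.diag(sv)

# The three factors reproduce M.
assert np.allclose(K @ S @ Dt, M)

# Columns of K are eigenvectors of M M^T, and columns of D (rows of Dt)
# are eigenvectors of M^T M; the eigenvalues are the squared singular values.
assert np.allclose(M @ M.T @ K, K @ S**2)
assert np.allclose(M.T @ M @ Dt.T, Dt.T @ S**2)

print("singular values:", np.round(sv, 3))
```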

Computing an Example
[Slides showing the example matrices M, K, S, D and the reconstructed ("new") M; the matrix contents are not included in this transcript.]

Latent Semantic Indexing
• In the matrix $S$, select only the $s$ largest singular values
• Keep the corresponding columns of $K$ and rows of $D^{T}$
• The resultant matrix is called $M_s$ and is given by
  $M_s = K_s S_s D_s^{T}$
  where $s$, $s < r$, is the dimensionality of the concept space
• The parameter $s$ should be
  - large enough to allow fitting the characteristics of the data
  - small enough to filter out the non-relevant representational details
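The truncation step can be sketched as follows, again on a hypothetical 5 x 4 matrix and with s = 2 chosen arbitrarily for illustration. The Frobenius-norm error printed at the end shows how much of M is discarded by keeping only s concepts.

```python
import numpy as np

# Hypothetical 5-term x 4-document weight matrix (t = 5, N = 4).
M = np.array([
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
])

K, sv, Dt = np.linalg.svd(M, full_matrices=False)

s = 2  # dimensionality of the concept space (arbitrary choice, s < r)

# Keep the s largest singular values, the matching columns of K
# and the matching rows of D^T.
K_s = K[:, :s]
S_s = np.diag(sv[:s])
Dt_s = Dt[:s, :]

# Rank-s approximation M_s = K_s S_s D_s^T; same shape as M.
M_s = K_s @ S_s @ Dt_s

# Frobenius-norm reconstruction error; it equals the root of the sum of
# the squared singular values that were dropped.
error = np.linalg.norm(M - M_s)
print("rank of M_s:", np.linalg.matrix_rank(M_s))
print("reconstruction error:", round(float(error), 4))
```

Increasing s lowers the error but keeps more of the term-level detail that the reduced space is meant to filter out.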

Latent Ranking
• The user query can be modelled as a pseudo-document in the original matrix $M$
• Assume the query is modelled as the document numbered 0 in $M$
• The matrix $M_s^{T} M_s$ quantifies the relationship between any two documents in the reduced concept space
• The first row of this matrix provides the ranks of all the documents with regard to the user query (represented as the document numbered 0)
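The ranking procedure can be sketched as below. The document matrix, the query vector and the choice s = 2 are all hypothetical. The scores are the raw inner products taken from the first row of the document-to-document matrix, as described above; a practical system might normalise them (for example with cosine similarity), which is not shown here.

```python
import numpy as np

# Hypothetical weights for 5 index terms (rows) and 4 documents (columns).
docs = np.array([
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
])

# Hypothetical query over the same 5 terms (uses terms 0 and 2).
q = np.array([1.0, 0.0, 1.0, 0.0, 0.0])

# Insert the query as pseudo-document number 0 (first column of M).
M = np.column_stack([q, docs])

# Rank-s approximation of M via the truncated SVD.
K, sv, Dt = np.linalg.svd(M, full_matrices=False)
s = 2
M_s = K[:, :s] @ np.diag(sv[:s]) @ Dt[:s, :]

# Document-to-document relationships in the reduced concept space.
G = M_s.T @ M_s

# First row: relationship of the query (document 0) to documents 1..N.
scores = G[0, 1:]
ranking = np.argsort(-scores) + 1  # document numbers, best match first

print("scores: ", np.round(scores, 3))
print("ranking:", ranking.tolist())
```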

Conclusions
• Latent semantic indexing provides an interesting conceptualization of the IR problem
• It allows reducing the complexity of the underlying representational framework, which might be explored, for instance, with the purpose of interfacing with the user