1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 4 March 30, 2005


1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 4 March 30, 2005

2 Spectral Methods in Information Retrieval

3 Outline
- Motivation: synonymy and polysemy
- Latent Semantic Indexing (LSI)
- Singular Value Decomposition (SVD)
- LSI via SVD
- Why does LSI work?
- HITS and SVD

4 Synonymy and Polysemy
- Synonymy: multiple terms with (almost) the same meaning
  - Ex: cars, autos, vehicles
  - Harms recall
- Polysemy: a term with multiple meanings
  - Ex: java (programming language, coffee, island)
  - Harms precision

5 Traditional Solutions
- Query expansion
  - Synonymy: OR on all synonyms
    - Manual/automatic use of thesauri
    - Too few synonyms: recall still low
    - Too many synonyms: harms precision
  - Polysemy: AND on the term and additional specializing terms
    - Ex: +java +"programming language"
    - Too broad terms: precision still low
    - Too narrow terms: harms recall

6 Syntactic Space
- D: document collection, |D| = n
- T: term space, |T| = m
- A_{t,d}: "weight" of t in d (e.g., TF-IDF)
- A^T A: pairwise document similarities
- A A^T: pairwise term similarities
[Figure: the m×n term-document matrix A; rows are terms, columns are documents]
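A minimal numpy sketch of this setup; the tiny term-document matrix, the term names, and the weights below are invented purely for illustration.

import numpy as np

# Toy term-document matrix A (m terms x n documents); in practice A[t, d]
# would be a TF-IDF weight.
terms = ["car", "auto", "java", "coffee"]
docs = ["d1", "d2", "d3"]
A = np.array([
    [2.0, 0.0, 1.0],   # car
    [0.0, 3.0, 0.0],   # auto
    [1.0, 0.0, 4.0],   # java
    [0.0, 1.0, 2.0],   # coffee
])

doc_sim = A.T @ A    # n x n: pairwise document similarities
term_sim = A @ A.T   # m x m: pairwise term similarities
print(doc_sim.shape, term_sim.shape)   # (3, 3) (4, 4)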

7 Syntactic Indexing
- Index keys: terms
- Limitations
  - Synonymy: (near)-identical rows
  - Polysemy
  - Space inefficiency: the matrix usually is not full rank
- Gap between syntax and semantics: the information need is semantic, but the index and query are syntactic.

8 Semantic Space
- C: concept space, |C| = r
- B_{c,d}: "weight" of c in d
- Change of basis
  - Compare to wavelet and Fourier transforms
[Figure: the r×n concept-document matrix B; rows are concepts, columns are documents]

9 Latent Semantic Indexing (LSI) [Deerwester et al. 1990]
- Index keys: concepts
- Documents & query: mixtures of concepts
- Given a query, finds the most similar documents
- Bridges the syntax-semantics gap
- Space-efficient
  - Concepts are orthogonal
  - Matrix is full rank
- Questions
  - What is the concept space?
  - What is the transformation from the syntax space to the semantic space?
  - How do we filter out "noise concepts"?

10 Singular Values
- A: m×n real matrix
- Definition: σ ≥ 0 is a singular value of A if there exists a pair of vectors u, v s.t. Av = σu and A^T u = σv. u and v are called singular vectors.
- Ex: σ = ||A||_2 = max_{||x||_2 = 1} ||Ax||_2
  - Corresponding singular vectors: the x that maximizes ||Ax||_2 and y = Ax / ||A||_2
- Note: A^T A v = σ^2 v and A A^T u = σ^2 u
  - σ^2 is an eigenvalue of A^T A and of A A^T
  - v is an eigenvector of A^T A
  - u is an eigenvector of A A^T
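A small numpy check of this definition on the leading singular triple; the random 5×3 matrix is just a stand-in for any real m×n matrix.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(A)
sigma, u, v = s[0], U[:, 0], Vt[0, :]

print(np.allclose(A @ v, sigma * u))            # A v = sigma u
print(np.allclose(A.T @ u, sigma * v))          # A^T u = sigma v
print(np.isclose(sigma, np.linalg.norm(A, 2)))  # sigma_1 = ||A||_2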

11 Singular Value Decomposition (SVD)
- Theorem: For every m×n real matrix A, there exists a singular value decomposition A = UΣV^T
  - σ_1 ≥ … ≥ σ_p ≥ 0 (p = min(m,n)): singular values of A
  - Σ = Diag(σ_1,…,σ_p), padded with zeros to m×n
  - U: column-orthogonal m×m matrix (U^T U = I)
  - V: column-orthogonal n×n matrix (V^T V = I)
[Figure: A = U × Σ × V^T]

12 Singular Values vs. Eigenvalues
- A = UΣV^T
- σ_1,…,σ_p: singular values of A
  - σ_1^2,…,σ_p^2: eigenvalues of A^T A and of A A^T
- u_1,…,u_m: columns of U
  - Orthonormal basis of R^m
  - Left singular vectors of A
  - Eigenvectors of A A^T
- v_1,…,v_n: columns of V
  - Orthonormal basis of R^n
  - Right singular vectors of A
  - Eigenvectors of A^T A

13 Economy SVD
- Let r = max i s.t. σ_i > 0
  - σ_{r+1} = … = σ_p = 0
  - rank(A) = r
- u_1,…,u_r: left singular vectors
- v_1,…,v_r: right singular vectors
- U^T A = ΣV^T
[Figure: economy SVD, A (m×n) = U (m×r) × Σ (r×r) × V^T (r×n)]
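A sketch of the economy SVD in numpy; the 6×4 rank-2 matrix below is constructed just to make the rank deficiency visible.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))   # rank 2 by construction

# Economy ("thin") SVD: U is m x p, Vt is p x n, p = min(m, n).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

r = int(np.sum(s > 1e-10))          # numerical rank = number of nonzero singular values
print(r)                            # 2
print(np.allclose(A, U[:, :r] * s[:r] @ Vt[:r, :]))   # A = U_r Sigma_r V_r^T
print(np.allclose(U.T @ A, np.diag(s) @ Vt))          # U^T A = Sigma V^T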

14 LSI as SVD
- U^T A = ΣV^T
- u_1,…,u_r: concept basis
- B = ΣV^T: LSI matrix
- A_d: d-th column of A
- B_d: d-th column of B
- B_d = U^T A_d, i.e., B_d[c] = ⟨u_c, A_d⟩
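A numpy sketch of the LSI projection; the toy term-document matrix is the same invented example as above, and the query vector and its folding via U^T q follow standard LSI practice rather than anything stated on this slide.

import numpy as np

A = np.array([
    [2.0, 0.0, 1.0],
    [0.0, 3.0, 0.0],
    [1.0, 0.0, 4.0],
    [0.0, 1.0, 2.0],
])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
B = np.diag(s) @ Vt                 # LSI matrix: B = Sigma V^T
print(np.allclose(B, U.T @ A))      # equals U^T A

# A query is folded into concept space the same way: q_concepts = U^T q.
q = np.array([1.0, 1.0, 0.0, 0.0])  # hypothetical query over the 4 terms
q_c = U.T @ q
# Rank documents by cosine similarity in concept space.
scores = (B.T @ q_c) / (np.linalg.norm(B, axis=0) * np.linalg.norm(q_c) + 1e-12)
print(scores)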

15 Noisy Concepts
- B = U^T A = ΣV^T
- B_d[c] = σ_c v_d[c]
- If σ_c is small, then B_d[c] is small for all d
- k = largest i s.t. σ_i is "large" (one possible cutoff is sketched below)
- For all c = k+1,…,r and for all d, c is a low-weight concept in d
- Main idea: filter out all concepts c = k+1,…,r
  - Space efficient: # of index terms = k (vs. r or m)
  - Better retrieval: noisy concepts are filtered out across the board
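One possible way to operationalize "σ_i is large"; the slide leaves the cutoff unspecified, and the 90% energy threshold below is an arbitrary illustrative choice.

import numpy as np

def choose_k(singular_values, energy=0.9):
    # Keep enough singular values to capture a fixed fraction of sum(sigma_i^2).
    s2 = np.asarray(singular_values) ** 2
    cumulative = np.cumsum(s2) / s2.sum()
    return int(np.searchsorted(cumulative, energy) + 1)

print(choose_k([10.0, 5.0, 1.0, 0.5, 0.1]))   # -> 2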

16 Low-rank SVD
- B = U^T A = ΣV^T
- U_k = (u_1,…,u_k)
- V_k = (v_1,…,v_k)
- Σ_k = upper-left k×k sub-matrix of Σ
- A_k = U_k Σ_k V_k^T
- B_k = Σ_k V_k^T
- rank(A_k) = rank(B_k) = k
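A minimal numpy sketch of the rank-k truncation; the helper name low_rank and the 8×5 random matrix are illustrative.

import numpy as np

def low_rank(A, k):
    """Rank-k truncated SVD: returns A_k = U_k Sigma_k V_k^T and B_k = Sigma_k V_k^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]
    return U_k @ S_k @ Vt_k, S_k @ Vt_k

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 5))
A_2, B_2 = low_rank(A, 2)
print(np.linalg.matrix_rank(A_2), B_2.shape)   # 2 (2, 5)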

17 Low Dimensional Embedding
- Frobenius norm: ||A||_F = sqrt(sum_{t,d} A_{t,d}^2)
- Fact: ||A - A_k||_F^2 = σ_{k+1}^2 + … + σ_r^2
- Therefore, if σ_{k+1}^2 + … + σ_r^2 is small, then for "most" d,d', ⟨A_d^k, A_{d'}^k⟩ ≈ ⟨A_d, A_{d'}⟩
- A_k preserves pairwise similarities among documents, so it is at least as good as A for retrieval.
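A numerical check of the Frobenius-norm fact as reconstructed above; the random 6×4 matrix and the choice k = 2 are arbitrary.

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
lhs = np.linalg.norm(A - A_k, 'fro') ** 2
rhs = np.sum(s[k:] ** 2)
print(np.isclose(lhs, rhs))   # True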

18 Why is LSI Better? [Papadimitriou et al. 1998] [Azar et al. 2001]
- LSI summary
  - Documents are embedded in a low-dimensional space (m → k)
  - Pairwise similarities are preserved
  - More space-efficient
- But why is retrieval better?
  - Synonymy
  - Polysemy

19 Generative Model
- T: term space, |T| = m
- A concept c: a distribution on T
- C: concept space, |C| = k
- C': space of all convex combinations of concepts
- D: distribution on C'×N
- A corpus model M = (T, C', D)
- A document d is generated as follows (see the sketch below):
  - Sample (w, n) according to D
  - Repeat n times:
    - Sample a concept c from C according to w
    - Sample a term t from T according to c
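A minimal sampler for this corpus model; the concrete term space, the two concept distributions, and the choice of D (Dirichlet mixture weights, Poisson document length) are invented for illustration.

import numpy as np

rng = np.random.default_rng(4)

terms = ["car", "auto", "java", "coffee", "island"]   # T
concepts = np.array([                                 # each row: a distribution on T
    [0.45, 0.45, 0.05, 0.03, 0.02],   # a "vehicles" concept
    [0.02, 0.02, 0.50, 0.30, 0.16],   # a "java-ish" concept
])

def sample_document():
    # D: here, Dirichlet mixture weights w and a Poisson-distributed length n.
    w = rng.dirichlet(np.ones(len(concepts)))
    n = 1 + rng.poisson(8)
    doc = []
    for _ in range(n):
        c = rng.choice(len(concepts), p=w)            # concept ~ w
        t = rng.choice(len(terms), p=concepts[c])     # term ~ c
        doc.append(terms[t])
    return doc

print(sample_document())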

20 Simplifying Assumptions
- A: m×n term-document matrix, representing n instantiations of the model
- D_c: documents whose topic is the concept c
- T_c: terms in supp(c)
- Assumptions:
  - Every document has a single topic (C' = C)
  - For every two concepts c, c', ||c - c'|| ≥ 1 - ε
  - The probability of every term under a concept c is at most some constant

21 LSI Works
- Theorem [Papadimitriou et al. 1998]: Given the above assumptions, then with high probability, for every two documents d, d':
  - If d, d' have the same topic, then A_d^k and A_{d'}^k are (almost) parallel
  - If d, d' have different topics, then A_d^k and A_{d'}^k are (almost) orthogonal

22 Proof
- For simplicity, assume ε = 0
- Want to show:
  - (1) If d, d' are on the same topic, A_d^k, A_{d'}^k are in the same direction
  - (2) If d, d' are on different topics, A_d^k, A_{d'}^k are orthogonal
- A has non-zeroes only in blocks B_1,…,B_k, where B_c is the sub-matrix of A with rows in T_c and columns in D_c
- A^T A is a block-diagonal matrix with blocks B_1^T B_1,…, B_k^T B_k
- (i,j)-th entry of B_c^T B_c: term similarity between the i-th and j-th documents on the concept c
- B_c^T B_c: adjacency matrix of a (multi-)graph G_c on D_c

23 Proof (cont.)
- G_c is a "random" graph
- The first and second eigenvalues of B_c^T B_c are well separated
- For all c, c', the second eigenvalue of B_c^T B_c is smaller than the first eigenvalue of B_{c'}^T B_{c'}
- The top k eigenvalues of A^T A are the principal eigenvalues of B_c^T B_c for c = 1,…,k
- Let u_1,…,u_k be the corresponding left singular vectors of A (eigenvectors of A A^T)
- For every document d on topic c, A_d is orthogonal to all of u_1,…,u_k except u_c, so A_d^k is a scalar multiple of u_c.

24 Extensions [Azar et al. 2001]
- A more general generative model
- Also explains LSI's improved treatment of polysemy

25 Computing SVD
- Compute the singular values of A by computing the eigenvalues of A^T A
- Compute V and U by computing the eigenvectors of A^T A and A A^T, respectively
- Running time not too good: O(m^2 n + m n^2)
  - Not practical for huge corpora
- Sub-linear time algorithms for estimating A_k [Frieze, Kannan, Vempala 1998]
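A sketch of this eigendecomposition route in numpy, for illustration only; for huge corpora one would use a sparse or randomized SVD solver instead, as the slide notes.

import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 4))

evals, V = np.linalg.eigh(A.T @ A)          # eigenvalues of A^T A (ascending)
order = np.argsort(evals)[::-1]
evals, V = evals[order], V[:, order]
s = np.sqrt(np.clip(evals, 0, None))        # singular values
U = (A @ V) / s                             # u_i = A v_i / sigma_i (nonzero sigma_i only)

print(np.allclose(A, U @ np.diag(s) @ V.T))
print(np.allclose(s, np.linalg.svd(A, compute_uv=False)))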

26 HITS and SVD
- A: adjacency matrix of a web (sub-)graph G
- a: authority vector; h: hub vector
- a is the principal eigenvector of A^T A
- h is the principal eigenvector of A A^T
- Therefore: a and h give A_1, the rank-1 SVD approximation of A
- Generalization: using A_k, we can get k authority and hub vectors, corresponding to other topics in G.
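A small numpy sketch computing hub and authority scores from the top singular vectors; the 4-node directed graph below is made up (entry [i, j] = 1 means node i links to node j).

import numpy as np

A = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 1, 0],
], dtype=float)

U, s, Vt = np.linalg.svd(A)
h = np.abs(U[:, 0])     # hub scores: principal eigenvector of A A^T
a = np.abs(Vt[0, :])    # authority scores: principal eigenvector of A^T A
print("hubs:", np.round(h, 3))
print("authorities:", np.round(a, 3))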

27 End of Lecture 4