
Dimensionality Reduction

High-dimensional == many features. Find concepts/topics/genres:
– Documents: features are thousands of words, millions of word pairs
– Surveys
– Netflix: 480k users x 177k movies

Dimensionality Reduction
Compress / reduce dimensionality:
– 10^6 rows; 10^3 columns; no updates
– random access to any cell(s); small error: OK

Dimensionality Reduction
Assumption: data lies on or near a low d-dimensional subspace. The axes of this subspace are an effective representation of the data.

Why Reduce Dimensions?
Discover hidden correlations/topics
– words that commonly occur together
Remove redundant and noisy features
– not all words are useful
Interpretation and visualization
Easier storage and processing of the data

SVD - Definition
A_[m x n] = U_[m x r] Σ_[r x r] (V_[n x r])^T
A: input data matrix – m x n matrix (e.g., m documents, n terms)
U: left singular vectors – m x r matrix (m documents, r concepts)
Σ: singular values – r x r diagonal matrix (strength of each 'concept') (r: rank of the matrix A)
V: right singular vectors – n x r matrix (n terms, r concepts)

SVD
[figure: A (m x n) = U (m x r) x Σ (r x r) x V^T (r x n)]

SVD
[figure: A = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + …, where each σ_i is a scalar and u_i, v_i are vectors]

SVD - Properties
It is always possible to decompose a real matrix A into A = U Σ V^T, where:
U, Σ, V: unique
U, V: column orthonormal
– U^T U = I; V^T V = I (I: identity matrix)
– (columns are orthogonal unit vectors)
Σ: diagonal
– entries (singular values) are positive and sorted in decreasing order (σ_1 ≥ σ_2 ≥ … ≥ 0)
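As a sanity check, here is a minimal NumPy sketch of the definition and these properties; the small ratings matrix is a hypothetical stand-in for the users-to-movies example that follows.

```python
import numpy as np

# Hypothetical users-to-movies ratings: 7 users (rows) x 5 movies
# (columns: Matrix, Alien, Serenity, Casablanca, Amelie).
A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 0, 0, 2, 2]], dtype=float)

# Economy-size SVD: U is m x k, s holds k singular values, Vt is k x n,
# with k = min(m, n); the rank-r structure sits in the leading columns.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

I = np.eye(U.shape[1])
print(np.allclose(U.T @ U, I))                # True: U is column orthonormal
print(np.allclose(Vt @ Vt.T, I))              # True: V is column orthonormal
print(np.all(np.diff(s) <= 0), s.min() >= 0)  # sorted decreasing, non-negative
print(np.allclose(A, U @ np.diag(s) @ Vt))    # True: A = U Sigma V^T
```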

SVD – Example: Users-to-Movies
A = U Σ V^T - example:
[figure: ratings matrix A (rows: SciFi and Romance users; columns: Matrix, Alien, Serenity, Casablanca, Amelie) factored as U x Σ x V^T]

SVD – Example: Users-to-Movies
A = U Σ V^T - example:
[figure: the same factorization, annotated with a SciFi-concept and a Romance-concept]

SVD - Example
A = U Σ V^T - example: U is the 'user-to-concept' similarity matrix
[figure: the factorization with the SciFi-concept and Romance-concept columns of U highlighted]

SVD - Example
A = U Σ V^T - example: the diagonal of Σ gives the 'strength' of each concept
[figure: the factorization with the 'strength' of the SciFi-concept highlighted in Σ]

SVD - Example
A = U Σ V^T - example: V is the 'movie-to-concept' similarity matrix
[figure: the factorization with the SciFi-concept row of V^T highlighted]


SVD - Interpretation #1
'movies', 'users' and 'concepts':
U: user-to-concept similarity matrix
V: movie-to-concept similarity matrix
Σ: its diagonal elements give the 'strength' of each concept

SVD - interpretation #2
SVD gives the best axis to project on: 'best' = minimum sum of squares of projection errors, i.e., minimum reconstruction error.
[figure: scatter of (Movie 1 rating, Movie 2 rating) points with v_1, the first right singular vector, as the projection axis]
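A minimal sketch of this claim on synthetic 2-D data (NumPy assumed): among all directions through the origin, the first right singular vector v_1 gives the smallest sum of squared projection errors.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-D ratings cloud, stretched along one direction.
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 1.0], [1.0, 0.5]])

_, _, Vt = np.linalg.svd(X, full_matrices=False)
v1 = Vt[0]                                   # first right singular vector

def proj_error(direction):
    """Sum of squared distances from the points to the line along `direction`."""
    d = direction / np.linalg.norm(direction)
    return np.sum((X - np.outer(X @ d, d)) ** 2)

print(proj_error(v1))                        # smallest achievable error
print(proj_error(rng.normal(size=2)))        # any other direction does worse
```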

SVD - Interpretation #2
A = U Σ V^T - example:
[figure: the same scatter with v_1, the first right singular vector, drawn through the data]

SVD - Interpretation #2
A = U Σ V^T - example:
[figure: the variance ('spread') of the data along the v_1 axis]

SVD - Interpretation #2 More details
Q: How exactly is dim. reduction done?
A: Set the smallest singular values to zero. The result B = U S V^T, where S is Σ with the smallest entries zeroed, is a low-rank approximation of A.
Frobenius norm: ||M||_F = sqrt(Σ_ij M_ij^2), and the error ||A - B||_F = sqrt(Σ_ij (A_ij - B_ij)^2) is 'small'.
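A sketch of the truncation step with NumPy, reusing the hypothetical ratings matrix from above; it also confirms that the Frobenius error equals the square root of the sum of the dropped σ_i^2.

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0], [3, 3, 3, 0, 0], [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0], [0, 0, 0, 4, 4], [0, 0, 0, 5, 5],
              [0, 0, 0, 2, 2]], dtype=float)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 1                                              # keep only the strongest concept
s_trunc = np.where(np.arange(len(s)) < k, s, 0.0)  # zero the smallest sigmas
B = U @ np.diag(s_trunc) @ Vt                      # rank-k approximation of A

print(np.linalg.norm(A - B, 'fro'))                # ||A - B||_F ...
print(np.sqrt(np.sum(s[k:] ** 2)))                 # ... equals sqrt of dropped sigma^2
```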

A = U Σ V^T exactly; B = U S V^T with the smallest singular values zeroed out. B is approximately A.

SVD – Best Low Rank Approx.
Theorem: among all matrices B of rank at most k, the Frobenius error ||A - B||_F is minimized by B = U Σ_k V^T, where Σ_k keeps only the k largest singular values.

SVD – Best Low Rank Approx.
We apply the fact that if P is column orthonormal, R is row orthonormal, and Q is diagonal, then ||P Q R||_F = ||Q||_F.

SVD – Best Low Rank Approx.
U Σ V^T - U S V^T = U (Σ - S) V^T, so by the fact above ||A - B||_F = ||Σ - S||_F: the error is exactly the Frobenius norm of the dropped singular values.
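A quick numerical check of this step (NumPy, random data): multiplying by the column-orthonormal U and row-orthonormal V^T leaves the Frobenius norm of Σ - S unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

S = s.copy()
S[2:] = 0.0                                  # S: keep the two largest sigmas
lhs = np.linalg.norm(U @ np.diag(s) @ Vt - U @ np.diag(S) @ Vt, 'fro')
rhs = np.linalg.norm(np.diag(s - S), 'fro')
print(np.isclose(lhs, rhs))                  # True: orthonormal factors drop out
```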

SVD - Interpretation #2
Equivalent: 'spectral decomposition' of the matrix:
[figure: A = (u_1, u_2) x diag(σ_1, σ_2) x (v_1, v_2)^T]

SVD - Interpretation #2
Equivalent: 'spectral decomposition' of the matrix:
A = σ_1 u_1 v_1^T + σ_2 u_2 v_2^T + … (k terms; each u_i is m x 1, each v_i^T is 1 x n)
Assume σ_1 ≥ σ_2 ≥ σ_3 ≥ … ≥ 0.
Why is setting the small σs to zero the right thing to do? The vectors u_i and v_i are unit length, so σ_i scales them; zeroing the small σs therefore introduces the least error.
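A short sketch of the spectral form with NumPy (random data): it rebuilds A term by term and shows that dropping the smallest-σ term changes A by exactly σ_min.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rebuild A one rank-1 term at a time: A = sum_i sigma_i * u_i * v_i^T.
B = np.zeros_like(A)
for i in range(len(s)):
    B += s[i] * np.outer(U[:, i], Vt[i])
print(np.allclose(A, B))                     # True

# Dropping the smallest-sigma term perturbs A by exactly sigma_min,
# since u_i and v_i are unit vectors.
B_k = B - s[-1] * np.outer(U[:, -1], Vt[-1])
print(np.linalg.norm(A - B_k, 'fro'), s[-1]) # equal
```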

SVD - Interpretation #2
Q: How many σs to keep?
A: Rule of thumb: keep 80-90% of the 'energy' (= Σ_i σ_i^2), assuming σ_1 ≥ σ_2 ≥ σ_3 ≥ …
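A small helper illustrating the rule of thumb (NumPy assumed; the function name choose_k and the sample values are hypothetical):

```python
import numpy as np

def choose_k(sigmas, energy=0.9):
    """Smallest k whose leading sigmas retain `energy` of the total sum of sigma_i^2."""
    frac = np.cumsum(sigmas ** 2) / np.sum(sigmas ** 2)
    return int(np.searchsorted(frac, energy) + 1)

sigmas = np.array([12.4, 9.5, 1.3])          # illustrative singular values
print(choose_k(sigmas, 0.9))                 # 2: two concepts carry 90%+ of the energy
```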

SVD - Complexity
To compute the SVD:
– O(nm^2) or O(n^2 m) (whichever is less)
But there is less work if:
– we just want the singular values
– or we want only the first k singular vectors
– or the matrix is sparse
Implemented in linear algebra packages like LINPACK, Matlab, S-Plus, Mathematica, ...
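For the sparse case, a sketch using SciPy's svds, which computes only the top-k singular triplets instead of the full decomposition (SciPy assumed; the sizes and density are arbitrary):

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Top-k SVD of a large sparse matrix: much cheaper than a full SVD.
A = sparse_random(10_000, 1_000, density=0.001, random_state=0).tocsr()
U, s, Vt = svds(A, k=10)                     # only the 10 largest triplets
print(np.sort(s)[::-1])                      # svds returns sigmas in ascending order
```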

SVD - Conclusions so far
SVD: A = U Σ V^T: unique
– U: user-to-concept similarities
– V: movie-to-concept similarities
– Σ: strength of each concept
Dimensionality reduction:
– keep the few largest singular values (80-90% of the 'energy')
– SVD picks up linear correlations

Case study: How to query?
Q: Find users that like 'Matrix' and 'Alien'
[figure: the users-to-movies ratings matrix A = U x Σ x V^T (columns: Matrix, Alien, Serenity, Casablanca, Amelie)]

Case study: How to query?
Q: Find users that like 'Matrix'
A: Map the query into the 'concept space' – how?
[figure: the same ratings matrix and its factorization]

Case study: How to query?
Q: Find users that like 'Matrix'
A: Map query vectors into the 'concept space' – how?
Project into concept space: inner product with each 'concept' vector v_i.
[figure: query vector q, with a rating only for 'Matrix', shown in the (Matrix, Alien) plane together with the concept axes v_1 and v_2]

Case study: How to query?
Q: Find users that like 'Matrix'
A: Map the query vector into the 'concept space' – how?
Project into concept space: inner product with each 'concept' vector v_i.
[figure: the projection q·v_1 of q onto the first concept axis]

Case study: How to query?
Compactly, we have: q_concept = q V
E.g.:
[figure: q, the ratings vector over (Matrix, Alien, Serenity, Casablanca, Amelie), times the movie-to-concept similarity matrix V gives q's weight on the SciFi-concept]
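A minimal sketch of this projection with NumPy, reusing the hypothetical ratings matrix; here the concept space keeps only the top two concepts.

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0], [3, 3, 3, 0, 0], [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0], [0, 0, 0, 4, 4], [0, 0, 0, 5, 5],
              [0, 0, 0, 2, 2]], dtype=float)
_, _, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T[:, :2]                              # movie-to-concept matrix, 2 concepts

q = np.array([5.0, 0, 0, 0, 0])              # query user: rated only 'Matrix'
print(np.round(q @ V, 2))                    # q_concept: large |weight| on the
                                             # SciFi concept, ~0 on Romance
```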

Case study: How to query?
How would a user d who rated ('Alien', 'Serenity') be handled? d_concept = d V
E.g.:
[figure: d times the movie-to-concept similarity matrix V gives d's weight on the SciFi-concept]

Case study: How to query?
Observation: user d, who rated ('Alien', 'Serenity'), will be similar to the query 'user' q, who rated ('Matrix'), even though d did not rate 'Matrix'!
In the original movie space their similarity is 0; in the concept space it is non-zero.
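A sketch verifying the observation (NumPy, same hypothetical ratings matrix): q and d have cosine similarity 0 in movie space but close to 1 in the 2-D concept space.

```python
import numpy as np

A = np.array([[1, 1, 1, 0, 0], [3, 3, 3, 0, 0], [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0], [0, 0, 0, 4, 4], [0, 0, 0, 5, 5],
              [0, 0, 0, 2, 2]], dtype=float)
_, _, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T[:, :2]                              # keep the 2 strongest concepts

q = np.array([5.0, 0, 0, 0, 0])              # rated only 'Matrix'
d = np.array([0, 4.0, 4.0, 0, 0])            # rated 'Alien' and 'Serenity'

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(q, d))                          # 0.0 in the original movie space
print(cosine(q @ V, d @ V))                  # ~1.0 in concept space
```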

SVD: Drawbacks
+ Optimal low-rank approximation in Frobenius norm
- Interpretability problem:
– a singular vector specifies a linear combination of all input columns or rows
- Lack of sparsity:
– singular vectors are dense!