# Dimensionality Reduction

## Presentation transcript

Dimensionality Reduction

High-dimensional == many features

Find concepts/topics/genres:

- Documents: features are thousands of words, millions of word pairs
- Surveys
- Netflix: 480k users x 177k movies

Slides by Jure Leskovec

Dimensionality Reduction

Compress / reduce dimensionality:

- $10^6$ rows; $10^3$ columns; no updates
- Random access to any cell(s); small error: OK

Dimensionality Reduction

Assumption: the data lies on or near a low d-dimensional subspace. The axes of this subspace are an effective representation of the data.

Why Reduce Dimensions?

- Discover hidden correlations/topics: words that commonly occur together
- Remove redundant and noisy features: not all words are useful
- Interpretation and visualization
- Easier storage and processing of the data

SVD - Definition

$A_{[m \times n]} = U_{[m \times r]} \, \Sigma_{[r \times r]} \, (V_{[n \times r]})^T$

- A: input data matrix, an m x n matrix (e.g., m documents, n terms)
- U: left singular vectors, an m x r matrix (m documents, r concepts)
- Σ: singular values, an r x r diagonal matrix (strength of each 'concept'), where r is the rank of the matrix A
- V: right singular vectors, an n x r matrix (n terms, r concepts)
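The shapes in this definition can be checked numerically. The following is a minimal sketch using NumPy's `np.linalg.svd`; the 4 x 3 matrix and its values are made up for illustration, and trimming to the numerical rank `r` with a `1e-10` threshold is an assumption of this sketch rather than part of the slides.

```python
import numpy as np

# Hypothetical data matrix: m = 4 documents, n = 3 terms (values are made up).
A = np.array([[1.0, 1.0, 0.0],
              [2.0, 2.0, 0.0],
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 1.0]])

# full_matrices=False gives the thin SVD: U is m x k, Vt is k x n, k = min(m, n).
U_thin, s_thin, Vt_thin = np.linalg.svd(A, full_matrices=False)

# Keep only the r nonzero singular values so the shapes match the definition.
r = int(np.sum(s_thin > 1e-10))
U = U_thin[:, :r]            # m x r, left singular vectors
Sigma = np.diag(s_thin[:r])  # r x r, diagonal matrix of singular values
V = Vt_thin[:r, :].T         # n x r, right singular vectors

A_rebuilt = U @ Sigma @ V.T  # should reproduce A up to rounding
print(U.shape, Sigma.shape, V.shape)
```

Note that NumPy returns $V^T$ (here `Vt_thin`) rather than V, so the transpose is taken explicitly to match the notation on the slide.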

SVD

[Figure: the m x n matrix A drawn as the product of U, Σ, and $V^T$.]

[Figure: A drawn as a sum of rank-one terms, $A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \dots$, where each $\sigma_i$ is a scalar and each $u_i$, $v_i$ is a vector.]

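SVD - Properties

It is always possible to decompose a real matrix A into $A = U \Sigma V^T$, where:

- The singular values in Σ are unique; the singular vectors in U and V are unique up to sign when the singular values are distinct.
- U, V: column orthonormal, so $U^T U = I$ and $V^T V = I$ (I: identity matrix); the columns are orthogonal unit vectors.
- Σ: diagonal. Its entries (the singular values) are non-negative and sorted in decreasing order ($\sigma_1 \ge \sigma_2 \ge \dots \ge 0$); with r equal to the rank of A, all r of them are strictly positive.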
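These properties are easy to verify on a concrete matrix. The sketch below uses a random matrix (made up for the check) and also illustrates one caveat about uniqueness: flipping the sign of a matching pair $(u_i, v_i)$ yields another valid decomposition.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))   # arbitrary real matrix, made up for the check

U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

# Columns of U and V are orthonormal: U^T U = I and V^T V = I.
cols_U_orthonormal = np.allclose(U.T @ U, np.eye(U.shape[1]))
cols_V_orthonormal = np.allclose(V.T @ V, np.eye(V.shape[1]))

# Singular values are non-negative and sorted in decreasing order.
sorted_nonneg = bool(np.all(s[:-1] >= s[1:]) and np.all(s >= 0))

# Flip the sign of the first singular-vector pair: the product is unchanged,
# so U and V are only determined up to such sign changes.
U2, V2 = U.copy(), V.copy()
U2[:, 0] *= -1
V2[:, 0] *= -1
same_product = np.allclose(U2 @ np.diag(s) @ V2.T, A)
```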

SVD – Example: Users-to-Movies

$A = U \Sigma V^T$ - example:

[Figure: a users-by-movies rating matrix A, with columns Matrix, Alien, Serenity, Casablanca, Amelie and rows grouped into SciFi and Romance users, written as the product U x Σ x $V^T$.]

- The columns of U and V correspond to concepts, labeled SciFi-concept and Romance-concept.
- U is the "user-to-concept" similarity matrix.
- The diagonal entries of Σ give the 'strength' of each concept, for example the SciFi-concept.
- V is the "movie-to-concept" similarity matrix.

SVD - Interpretation #1

'Movies', 'users' and 'concepts':

- U: user-to-concept similarity matrix
- V: movie-to-concept similarity matrix
- Σ: its diagonal elements give the 'strength' of each concept
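To make the 'concept' reading concrete, here is a small sketch on a users-by-movies matrix. The movie titles come from the slides, but the ratings are made up for illustration: four users rate only the SciFi titles and three rate only the romance titles. The `1e-8` and `1e-10` thresholds are arbitrary choices for this sketch.

```python
import numpy as np

movies = ["Matrix", "Alien", "Serenity", "Casablanca", "Amelie"]

# Made-up ratings: rows are users, columns follow the order in `movies`.
A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 0, 0, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.sum(s > 1e-10))               # numerical rank of A
U, s, V = U[:, :r], s[:r], Vt[:r, :].T   # user-to-concept, strengths, movie-to-concept

# For each concept, list the movies that carry non-negligible weight in V.
concept_movies = [
    [m for m, w in zip(movies, V[:, c]) if abs(w) > 1e-8]
    for c in range(r)
]
for c in range(r):
    print(f"concept {c}: strength {s[c]:.2f}, movies {concept_movies[c]}")
```

With this block-structured matrix the two concepts separate cleanly into the SciFi titles and the romance titles; real rating data is noisier, and the concepts are then weighted mixtures of all movies.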

SVD - Interpretation #2

SVD gives the best axis to project on, where 'best' means the minimum sum of squares of projection errors, that is, the minimum reconstruction error.

[Figure: users plotted on the axes Movie 1 rating and Movie 2 rating, with the first right singular vector $v_1$ drawn as the best-fit axis.]

In the example $A = U \Sigma V^T$, $v_1$ is the first right singular vector, and the first singular value $\sigma_1$ measures the variance ('spread') on the $v_1$ axis.
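The 'best axis' claim can be checked on a toy two-movie dataset. In this sketch the ratings are made up, the comparison axis (the Movie 1 axis) is an arbitrary choice, and 'spread' is measured as the uncentered sum of squared coordinates along $v_1$, which equals $\sigma_1^2$.

```python
import numpy as np

# Made-up ratings: one row per user, columns are (Movie 1, Movie 2).
A = np.array([[1.0, 1.2],
              [2.0, 1.9],
              [3.0, 3.1],
              [4.0, 3.8],
              [5.0, 5.2]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
v1 = Vt[0]                          # first right singular vector (unit length)

coords = A @ v1                     # coordinate of each user along v1
projected = np.outer(coords, v1)    # each user's projection onto the v1 axis
error_v1 = np.sum((A - projected) ** 2)   # sum of squared projection errors

# Compare with projecting onto a different unit axis, here the Movie 1 axis.
e1 = np.array([1.0, 0.0])
error_e1 = np.sum((A - np.outer(A @ e1, e1)) ** 2)

spread_v1 = np.sum(coords ** 2)     # equals sigma_1 squared
print(error_v1, error_e1, spread_v1, s[0] ** 2)
```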

SVD - Interpretation #2: More details

Q: How exactly is dimensionality reduction done?

A: Set the smallest singular values to zero. Zeroing a singular value also removes the matching column of U and row of $V^T$ from the product, so the result is a lower-rank matrix B with $A \approx B$.

[Figure: the example $A = U \Sigma V^T$ with the smallest singular value zeroed, followed by the resulting approximation B shown beside A.]

Frobenius norm: $\|M\|_F = \sqrt{\sum_{ij} M_{ij}^2}$

$\|A - B\|_F = \sqrt{\sum_{ij} (A_{ij} - B_{ij})^2}$ is "small".
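A minimal sketch of this truncation step follows. The ratings matrix is made up for illustration (two genre blocks plus users who mix genres), and keeping `k = 2` singular values is an arbitrary choice for the example.

```python
import numpy as np

# Made-up ratings over (Matrix, Alien, Serenity, Casablanca, Amelie).
A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
s_trunc = s.copy()
s_trunc[k:] = 0.0                  # set the smallest singular values to zero
B = U @ np.diag(s_trunc) @ Vt      # rank-k approximation of A

# Frobenius norm of the difference, computed straight from the definition.
frob_error = np.sqrt(np.sum((A - B) ** 2))
print(round(frob_error, 3), round(np.linalg.norm(A, "fro"), 3))
```

The error equals the square root of the sum of the squared singular values that were dropped, which is why discarding only small ones keeps $\|A - B\|_F$ small.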

[Figure: $A = U \Sigma V^T$ shown above B, which is the product of the truncated U, Σ, and $V^T$.] B is approximately A.

SVD – Best Low Rank Approx.

We apply:

- P column orthonormal
- R row orthonormal
- Q is diagonal

$U \Sigma V^T - U S V^T = U (\Sigma - S) V^T$
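One way to see how these pieces fit together is the following sketch. It assumes B has the restricted form $B = U S V^T$ with S diagonal with entries $s_i$ (an assumption made here to keep the derivation short; the statement over all rank-k matrices is the Eckart-Young theorem). It uses the fact that for $M = P Q R$ with P column orthonormal, R row orthonormal and Q diagonal, $\|M\|_F^2 = \sum_i q_{ii}^2$.

```latex
\begin{aligned}
\|A - B\|_F^2
  &= \|U \Sigma V^T - U S V^T\|_F^2
   = \|U (\Sigma - S) V^T\|_F^2 \\
  &= \sum_{i=1}^{r} (\sigma_i - s_i)^2
     \qquad \text{($U$ column orthonormal, $V^T$ row orthonormal, $\Sigma - S$ diagonal)} \\
  &= \sum_{i=k+1}^{r} \sigma_i^2
     \qquad \text{when } s_i = \sigma_i \text{ for } i \le k
     \text{ and } s_i = 0 \text{ for } i > k .
\end{aligned}
```

Since S may have at most k nonzero entries, the sum is smallest when those entries cancel the k largest $\sigma_i$, which leaves only the smallest singular values in the error.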

SVD - Interpretation #2

Equivalent: 'spectral decomposition' of the matrix:

$A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \dots$

For an m x n matrix A, each $u_i$ is m x 1 and each $v_i^T$ is 1 x n, and the sum has k terms. Assume $\sigma_1 \ge \sigma_2 \ge \sigma_3 \ge \dots \ge 0$.

Why is setting the small σs to zero the thing to do? The vectors $u_i$ and $v_i$ are unit length, so $\sigma_i$ scales them. Each term therefore contributes in proportion to its $\sigma_i$, and zeroing the small σs introduces the least error.
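The rank-one view can be reproduced directly. This sketch rebuilds a made-up random matrix from its terms $\sigma_i u_i v_i^T$ and measures the Frobenius-norm error of dropping each term in turn.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 4))        # made-up matrix for illustration

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# One rank-one term sigma_i * u_i * v_i^T per singular value.
terms = [s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s))]
A_sum = sum(terms)                 # adding all terms rebuilds A

# u_i v_i^T has Frobenius norm 1, so dropping term i costs exactly sigma_i.
drop_errors = [np.linalg.norm(A - (A_sum - terms[i]), "fro")
               for i in range(len(s))]
print(np.round(drop_errors, 3), np.round(s, 3))
```

Dropping the term with the smallest $\sigma_i$ is therefore the cheapest single removal.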

SVD - Interpretation #2

Q: How many σs to keep?

A: Rule of thumb: keep 80-90% of the 'energy' ($= \sum_i \sigma_i^2$).

[Figure: $A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \dots$, assuming $\sigma_1 \ge \sigma_2 \ge \sigma_3 \ge \dots$]
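The energy rule translates into a few lines of code. In this sketch the helper name `rank_for_energy` and the example singular values are hypothetical, chosen only to illustrate the rule.

```python
import numpy as np

def rank_for_energy(singular_values, fraction=0.9):
    """Smallest k whose top-k singular values hold at least `fraction` of the energy."""
    energy = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(energy) / np.sum(energy)
    k = int(np.searchsorted(cumulative, fraction)) + 1
    return min(k, len(energy))     # guard against rounding when fraction is near 1

# Hypothetical singular values, already sorted in decreasing order.
sigmas = [10.0, 5.0, 2.0, 1.0]
k = rank_for_energy(sigmas, 0.9)
print(k)
```

Here the energies are 100, 25, 4 and 1, so the first singular value alone holds about 77% of the total and the first two hold about 96%.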

SVD - Complexity

To compute the SVD of an m x n matrix: $O(nm^2)$ or $O(n^2 m)$, whichever is less.

But there is less work:

- if we just want singular values
- or if we want only the first k singular vectors
- or if the matrix is sparse

Implemented in linear algebra packages like LINPACK, Matlab, SPlus, Mathematica...
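As an illustration of the first shortcut, NumPy can return the singular values without forming U or V. This is a sketch on a made-up dense matrix; for the first k singular vectors of a large sparse matrix, a routine such as `scipy.sparse.linalg.svds` is one option, not shown here.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(200, 50))     # made-up dense matrix

# Singular values only: U and V are never formed.
s_only = np.linalg.svd(A, compute_uv=False)

# Thin SVD for comparison; its singular values should agree.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(s_only.shape, np.allclose(s_only, s))
```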

SVD - Conclusions so far

SVD: $A = U \Sigma V^T$, unique (up to the signs of matching singular-vector pairs when the singular values are distinct)

- U: user-to-concept similarities
- V: movie-to-concept similarities
- Σ: strength of each concept

Dimensionality reduction:

- Keep the few largest singular values (80-90% of 'energy')
- SVD picks up linear correlations

Case study: How to query?

Q: Find users that like 'Matrix'.

A: Map the query into 'concept space'. How?

[Figure: the example decomposition $A = U \Sigma V^T$, with movie columns Matrix, Alien, Serenity, Casablanca, Amelie and SciFi and Romance user groups.]

Write the query as a vector q of ratings over the five movies (Matrix, Alien, Serenity, Casablanca, Amelie). Project q into concept space by taking its inner product with each 'concept' vector $v_i$; for example, $q \cdot v_1$ is the coordinate of q along $v_1$.

[Figure: q plotted on Matrix and Alien rating axes together with the concept axes $v_1$ and $v_2$, with the projection $q \cdot v_1$ marked on $v_1$.]

Case study: How to query?

Compactly, we have: $q_{concept} = q V$

[Figure: e.g., the row vector q over (Matrix, Alien, Serenity, Casablanca, Amelie) multiplied by the movie-to-concept similarities V, with the SciFi-concept coordinate of the result highlighted.]

How would the user d that rated ('Alien', 'Serenity') be handled? In the same way: $d_{concept} = d V$

[Figure: e.g., the row vector d over the same five movies multiplied by the movie-to-concept similarities V, with the SciFi-concept coordinate highlighted.]

Observation: user d that rated ('Alien', 'Serenity') will be similar to the query "user" q that rated ('Matrix'), although d did not rate 'Matrix'!

[Figure: d and q compared twice: as rating vectors over the five movies, where similarity = 0, and as concept-space vectors with a SciFi-concept coordinate, where similarity ≠ 0.]
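The whole case study fits in a short sketch. The movie titles come from the slides, while the rating matrix and the specific ratings in `q` and `d` are made up for illustration; cosine similarity is used here as one reasonable similarity measure, since the slides do not say which one they use.

```python
import numpy as np

# Made-up ratings over (Matrix, Alien, Serenity, Casablanca, Amelie).
A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 0, 0, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
V = Vt[:k].T                               # movie-to-concept matrix, 5 x k

q = np.array([5.0, 0.0, 0.0, 0.0, 0.0])    # hypothetical user who rated only 'Matrix'
d = np.array([0.0, 4.0, 5.0, 0.0, 0.0])    # hypothetical user who rated 'Alien', 'Serenity'

q_concept = q @ V                          # q_concept = q V
d_concept = d @ V                          # d_concept = d V

def cosine(x, y):
    """Cosine similarity between two nonzero vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

sim_movie_space = cosine(q, d)             # no rated movie in common
sim_concept_space = cosine(q_concept, d_concept)
print(sim_movie_space, round(sim_concept_space, 3))
```

In movie space the two users share no rated title, so their similarity is zero; in concept space both load on the same SciFi concept and come out as highly similar.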

SVD: Drawbacks

- (+) Optimal low-rank approximation in the Frobenius norm
- (-) Interpretability problem: a singular vector specifies a linear combination of all input columns or rows
- (-) Lack of sparsity: singular vectors are dense!

[Figure: the factors U, Σ, and $V^T$ drawn as dense matrices.]

