
1 CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #18: SVD - part I (definitions) C. Faloutsos

2 CMU SCS 15-826 Copyright: C. Faloutsos (2014) Must-read Material Numerical Recipes in C, ch. 2.6 MM Textbook, Appendix D

3 CMU SCS 15-826 Copyright: C. Faloutsos (2014) Outline Goal: ‘Find similar / interesting things’ Intro to DB Indexing - similarity search Data Mining

4 CMU SCS 15-826 Copyright: C. Faloutsos (2014) Indexing - Detailed outline primary key indexing secondary key / multi-key indexing spatial access methods fractals text Singular Value Decomposition (SVD) multimedia...

5 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Detailed outline Motivation Definition - properties Interpretation Complexity Case studies Additional properties

6 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Motivation problem #1: text - LSI: find ‘concepts’ problem #2: compression / dim. reduction

7 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Motivation problem #1: text - LSI: find ‘concepts’

8 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Motivation Customer-product matrix, for a recommendation system: products = {bread, lettuce, beef, tomatoes, chicken}; customer groups = {vegetarians, meat eaters} [matrix figure]

9 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Motivation problem #2: compress / reduce dimensionality

10 CMU SCS 15-826 Copyright: C. Faloutsos (2014) Problem - specs ~10**6 rows; ~10**3 columns; no updates; random access to any cell(s); small error: OK

11-12 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Motivation [figures]

13 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Detailed outline Motivation Definition - properties Interpretation Complexity Case studies Additional properties

14 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Definition (reminder: matrix multiplication) [3 x 2] x [2 x 1] = ?

15-17 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Definition (reminder: matrix multiplication) [3 x 2] x [2 x 1] = [3 x 1]

18 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Definition (reminder: matrix multiplication) [matrix figure]

19 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Definition A = U Σ V^T - example: [matrix figure]

20 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Definition A [n x m] = U [n x r] Σ [r x r] (V [m x r])^T A: n x m matrix (e.g., n documents, m terms) U: n x r matrix (n documents, r concepts) Σ: r x r diagonal matrix (strength of each ‘concept’) (r: rank of the matrix) V: m x r matrix (m terms, r concepts)
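The shapes in this definition can be sanity-checked numerically. A minimal sketch (not from the lecture) using numpy's `linalg.svd` on a made-up documents-by-terms matrix; the values are illustrative only:

```python
import numpy as np

# Hypothetical documents-by-terms matrix (n = 4 docs, m = 3 terms)
A = np.array([[1.0, 1.0, 0.0],
              [2.0, 2.0, 0.0],
              [0.0, 0.0, 3.0],
              [0.0, 0.0, 4.0]])

# full_matrices=False gives the "economy" shapes of the slide:
# U: n x r', s: the r' singular values, Vt: r' x m  (r' = min(n, m))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(U.shape, s.shape, Vt.shape)            # (4, 3) (3,) (3, 3)
print(np.allclose(A, U @ np.diag(s) @ Vt))   # True
```

One caveat: numpy returns min(n, m) singular values rather than exactly r = rank(A); for this rank-2 matrix the third singular value is numerically zero.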

21 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Properties THEOREM [Press+92]: it is always possible to decompose a matrix A into A = U Σ V^T, where U, Σ, V: unique (*) U, V: column-orthonormal (i.e., columns are unit vectors, orthogonal to each other) –U^T U = I; V^T V = I (I: identity matrix) Σ: singular values are positive, and sorted in decreasing order
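These properties are easy to verify in code; a sketch (my addition, on an arbitrary random matrix) checking column-orthonormality and the ordering of the singular values:

```python
import numpy as np

A = np.random.default_rng(0).standard_normal((5, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# column-orthonormal: U^T U = I and V^T V = I
print(np.allclose(U.T @ U, np.eye(3)))    # True
print(np.allclose(Vt @ Vt.T, np.eye(3)))  # True
# singular values: non-negative, sorted in decreasing order
print(np.all(s >= 0) and np.all(s[:-1] >= s[1:]))  # True
```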

22 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Example A = U Σ V^T - example: terms = {data, inf. retrieval, brain, lung}; document groups = {CS, MD} [matrix figure]

23 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Example A = U Σ V^T - example: [matrix figure, with the CS-concept and MD-concept highlighted]

24 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Example A = U Σ V^T - example: U is the doc-to-concept similarity matrix [matrix figure]

25 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Example A = U Σ V^T - example: the first diagonal entry of Σ is the ‘strength’ of the CS-concept [matrix figure]

26-27 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Example A = U Σ V^T - example: V is the term-to-concept similarity matrix [matrix figure]

28 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Detailed outline Motivation Definition - properties Interpretation Complexity Case studies Additional properties

29 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Interpretation #1 ‘documents’, ‘terms’ and ‘concepts’: U: document-to-concept similarity matrix V: term-to-concept similarity matrix Σ: its diagonal elements: ‘strength’ of each concept

30 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Interpretation #1 ‘documents’, ‘terms’ and ‘concepts’: Q: if A is the document-to-term matrix, what is A^T A? A: Q: A A^T? A:

31 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Interpretation #1 ‘documents’, ‘terms’ and ‘concepts’: Q: if A is the document-to-term matrix, what is A^T A? A: term-to-term ([m x m]) similarity matrix Q: A A^T? A: document-to-document ([n x n]) similarity matrix

32 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD properties V: the eigenvectors of the covariance matrix A^T A U: the eigenvectors of the Gram (inner-product) matrix A A^T Further reading: 1. Ian T. Jolliffe, Principal Component Analysis (2nd ed), Springer, 2002. 2. Gilbert Strang, Linear Algebra and Its Applications (4th ed), Brooks Cole, 2005.
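This eigenvector connection can be checked directly. A sketch (my example matrix, not the slides'): the rows of V^T are eigenvectors of A^T A with eigenvalues σ_i², and the columns of U are eigenvectors of A A^T:

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [2.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])
U, s, Vt = np.linalg.svd(A)

for i in range(3):
    v = Vt[i]      # i-th right singular vector (row of V^T)
    u = U[:, i]    # i-th left singular vector (column of U)
    # A^T A v = sigma_i^2 v  and  A A^T u = sigma_i^2 u
    assert np.allclose(A.T @ A @ v, s[i] ** 2 * v)
    assert np.allclose(A @ A.T @ u, s[i] ** 2 * u)
print("eigenvector relations hold")
```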

33 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Interpretation #2 best axis to project on: (‘best’ = min sum of squares of projection errors)

34 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Motivation [figure]

35 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Interpretation #2 SVD gives the best axis to project on (minimum RMS error); v1: the first singular vector

36 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Interpretation #2 [figure]

37 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Interpretation #2 A = U Σ V^T - example: [matrix figure] v1

38 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Interpretation #2 A = U Σ V^T - example: variance (‘spread’) on the v1 axis [matrix figure]

39 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Interpretation #2 A = U Σ V^T - example: –U Σ gives the coordinates of the points on the projection axis [matrix figure]

40 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Interpretation #2 More details Q: how exactly is dim. reduction done? [matrix figure]

41 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Interpretation #2 More details Q: how exactly is dim. reduction done? A: set the smallest singular values to zero: [matrix figure]
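A sketch of that truncation step (on a toy matrix of my own): zero out all but the k largest singular values and multiply the factors back together:

```python
import numpy as np

A = np.random.default_rng(1).standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
s_trunc = s.copy()
s_trunc[k:] = 0.0                  # set the smallest singular values to zero
A_k = U @ np.diag(s_trunc) @ Vt    # rank-k approximation of A

print(np.linalg.matrix_rank(A_k))  # 2
# the Frobenius-norm error is exactly the discarded energy:
print(np.isclose(np.linalg.norm(A - A_k), np.linalg.norm(s[k:])))  # True
```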

42-45 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Interpretation #2 [figures: A ≈ U Σ V^T, with progressively more of the smallest singular values zeroed out]

46 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Interpretation #2 Exactly equivalent: ‘spectral decomposition’ of the matrix: A = U Σ V^T [matrix figure]

47 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Interpretation #2 Exactly equivalent: ‘spectral decomposition’ of the matrix: [matrix figure, with columns u1, u2, singular values σ1, σ2 and rows v1, v2 marked]

48 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Interpretation #2 Exactly equivalent: ‘spectral decomposition’ of the [n x m] matrix: A = σ1 u1 v1^T + σ2 u2 v2^T +...

49 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Interpretation #2 Exactly equivalent: ‘spectral decomposition’ of the [n x m] matrix: A = σ1 u1 v1^T + σ2 u2 v2^T +... (each u_i: n x 1; each v_i^T: 1 x m; r terms)
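The spectral form is a sum of rank-1 matrices, which is easy to reproduce; a sketch with an illustrative matrix of my own:

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [2.0, 2.0, 0.0],
              [0.0, 0.0, 3.0],
              [0.0, 0.0, 4.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A = sigma_1 u1 v1^T + sigma_2 u2 v2^T + ...  (rank-1 terms)
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))
print(np.allclose(A, A_rebuilt))  # True
```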

50 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Interpretation #2 approximation / dim. reduction: by keeping the first few terms (Q: how many?) A ≈ σ1 u1 v1^T + σ2 u2 v2^T +... (assume: σ1 >= σ2 >=...)

51 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Interpretation #2 A (heuristic - [Fukunaga]): keep 80-90% of the ‘energy’ (= sum of squares of the σ_i’s) A ≈ σ1 u1 v1^T + σ2 u2 v2^T +... (assume: σ1 >= σ2 >=...)
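The 80-90% energy heuristic fits in a few lines; a sketch (the function name `rank_for_energy` is mine, not from the lecture):

```python
import numpy as np

def rank_for_energy(s, frac=0.9):
    """Smallest k such that the first k singular values keep
    at least `frac` of the energy (sum of squares of the s_i)."""
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(energy, frac)) + 1

s = np.array([10.0, 5.0, 1.0, 0.1])  # made-up, already sorted decreasing
print(rank_for_energy(s, 0.9))       # 2: sigma_1, sigma_2 carry >90% of energy
```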

52 CMU SCS 15-826 Copyright: C. Faloutsos (2014) Pictorially: matrix form of SVD –Best rank-k approximation in L2: A [m x n] ≈ U Σ V^T [matrix figure]

53 CMU SCS 15-826 Copyright: C. Faloutsos (2014) Pictorially: Spectral form of SVD –Best rank-k approximation in L2: A ≈ σ1 u1 v1^T + σ2 u2 v2^T +... [figure]

54 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Detailed outline Motivation Definition - properties Interpretation –#1: documents/terms/concepts –#2: dim. reduction –#3: picking non-zero, rectangular ‘blobs’ Complexity Case studies Additional properties

55-56 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Interpretation #3 finds non-zero ‘blobs’ in a data matrix: A = U Σ V^T [matrix figure]

57 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Interpretation #3 finds non-zero ‘blobs’ in a data matrix = ‘communities’ (bi-partite cores, here) [figure, with rows 1-4, rows 5-7 and columns 1, 3, 4 labeled]

58 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Interpretation #3 Drill: find the SVD, ‘by inspection’! Q: rank = ?? [matrix figure]

59 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Interpretation #3 A: rank = 2 (2 linearly independent rows/cols) [matrix figure]

60 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Interpretation #3 A: rank = 2 (2 linearly independent rows/cols) [matrix figure; the column vectors are orthogonal]

61 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Interpretation #3 column vectors: are orthogonal - but not unit vectors: normalize by 1/sqrt(3) and 1/sqrt(2) [matrix figure]

62 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Interpretation #3 and the singular values are: [matrix figure]

63 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Interpretation #3 Q: How to check we are correct? [matrix figure]

64 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Interpretation #3 A: SVD properties: –matrix product should give back matrix A –matrix U should be column-orthonormal, i.e., columns should be unit vectors, orthogonal to each other –ditto for matrix V –matrix Σ should be diagonal, with positive values
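Those checks mechanize nicely; a sketch on a small made-up ‘blob’ matrix (the drill's actual matrix lives in the figure, so this one is hypothetical):

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],   # hypothetical rank-2 "blob" matrix
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
U, s, Vt = np.linalg.svd(A)

assert np.allclose(A, U @ np.diag(s) @ Vt)  # product gives back A
assert np.allclose(U.T @ U, np.eye(3))      # U column-orthonormal
assert np.allclose(Vt @ Vt.T, np.eye(3))    # ditto for V
assert np.all(s >= 0)                       # Sigma: non-negative diagonal
print("rank =", int(np.sum(s > 1e-10)))     # rank = 2
```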

65 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Detailed outline Motivation Definition - properties Interpretation Complexity Case studies Additional properties

66 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - Complexity O(n * m * m) or O(n * n * m) (whichever is less) less work, if we just want the singular values, or if we want only the first k singular vectors, or if the matrix is sparse [Berry] Implemented: in any linear algebra package (LINPACK, Matlab, Splus/R, Mathematica...)
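One concrete instance of the "less work" cases above: scipy's `sparse.linalg.svds` computes only the k largest singular triplets of a sparse matrix with iterative (Lanczos-style) methods, instead of the full dense O(n * m * m) decomposition. A sketch with made-up data:

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Made-up sparse 1000 x 100 matrix, 1% non-zeros
A = sparse_random(1000, 100, density=0.01, random_state=0, format="csr")

# only the k = 5 largest singular triplets
u, s, vt = svds(A, k=5)

print(u.shape, s.shape, vt.shape)  # (1000, 5) (5,) (5, 100)
```

Note that `svds` returns the singular values in ascending rather than descending order.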

67 CMU SCS 15-826 Copyright: C. Faloutsos (2014) SVD - conclusions so far SVD: A = U Σ V^T: unique (*) U: document-to-concept similarities V: term-to-concept similarities Σ: strength of each concept dim. reduction: keep the first few strongest singular values (80-90% of ‘energy’) –SVD: picks up linear correlations SVD: picks up non-zero ‘blobs’

68 CMU SCS 15-826 Copyright: C. Faloutsos (2014) References Berry, Michael: http://www.cs.utk.edu/~lsi/ Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition. Academic Press. Press, W. H., S. A. Teukolsky, et al. (1992). Numerical Recipes in C. Cambridge University Press.

