10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos.


1 10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos

2 Multi DB and D.M. Copyright: C. Faloutsos (2001). Outline. Goal: ‘Find similar / interesting things’. Intro to DB; Indexing - similarity search; Data Mining

3 Indexing - Detailed outline: primary key indexing; secondary key / multi-key indexing; spatial access methods; fractals; text; Singular Value Decomposition (SVD); multimedia; ...

4 SVD - Detailed outline: Motivation; Definition - properties; Interpretation; Complexity; Case studies; Additional properties

5 SVD - Motivation: problem #1: text - LSI: find ‘concepts’; problem #2: compression / dim. reduction

6 SVD - Motivation: problem #1: text - LSI: find ‘concepts’

7 SVD - Motivation: problem #2: compress / reduce dimensionality

8 Problem - specs: ~10**6 rows; ~10**3 columns; no updates; random access to any cell(s); small error: OK
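
To make the compression goal concrete, the sketch below counts how many numbers must be stored, anticipating the factorization A = U Λ V^T that the later slides define. The row and column counts come from the slide; the choice of k = 10 kept components and the helper names `svd_storage` and `reconstruct_cell` are assumptions made only for illustration.

```python
def svd_storage(n, m, k):
    """Numbers stored by a rank-k SVD: U (n x k), k singular values, V (m x k)."""
    return n * k + k + m * k


def reconstruct_cell(u_row, singular_values, v_row):
    """Approximate one cell A[i][j] from row i of U, the singular values, and row j of V."""
    return sum(u * s * v for u, s, v in zip(u_row, singular_values, v_row))


n, m = 10**6, 10**3      # sizes from the slide
k = 10                   # hypothetical number of kept components

full_cells = n * m                    # cells in the original matrix
compressed = svd_storage(n, m, k)     # numbers kept after truncation
ratio = full_cells / compressed
```

Under these assumptions the truncated factors hold roughly one hundredth of the original cell count, and `reconstruct_cell` needs only k multiply-adds per lookup, which is what makes random access to any cell feasible.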

9-10 SVD - Motivation

11 SVD - Detailed outline: Motivation; Definition - properties; Interpretation; Complexity; Case studies; Additional properties

12 SVD - Definition (reminder: matrix multiplication): [3 x 2] x [2 x 1] =

13-15 SVD - Definition (reminder: matrix multiplication): [3 x 2] x [2 x 1] = [3 x 1]

16 SVD - Definition (reminder: matrix multiplication)
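
The shape rule in the reminder can be checked with a few lines of plain Python. This is a minimal sketch: only the shapes [3 x 2] x [2 x 1] = [3 x 1] come from the slides, while the numeric entries and the helper name `matmul` are made up for illustration.

```python
def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix, both given as lists of rows."""
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions must match")
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]


# Hypothetical entries; the shapes are the ones shown on the slides.
left = [[1, 2],
        [3, 4],
        [5, 6]]          # 3 x 2
right = [[1],
         [1]]            # 2 x 1

product = matmul(left, right)   # 3 x 1
```

Here `product` is `[[3], [7], [11]]`: each output row is the dot product of one row of the left matrix with the single column of the right matrix.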

17 SVD - Definition: $A_{[n \times m]} = U_{[n \times r]} \, \Lambda_{[r \times r]} \, (V_{[m \times r]})^T$. A: n x m matrix (e.g., n documents, m terms); U: n x r matrix (n documents, r concepts); $\Lambda$: r x r diagonal matrix (strength of each ‘concept’; r: rank of the matrix); V: m x r matrix (m terms, r concepts)
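
The shapes in the definition can be checked numerically. The sketch below assumes NumPy is available and uses an illustrative 7 x 5 document-term matrix that is not taken from the slides. Note that `np.linalg.svd` returns min(n, m) singular triplets, so the code truncates to the rank r to match the n x r, r x r, m x r shapes above.

```python
import numpy as np

# Illustrative document-term matrix (an assumption, not from the slides):
# 7 documents (rows) x 5 terms (columns); two document groups use disjoint terms.
A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

n, m = A.shape
r = int(np.linalg.matrix_rank(A))        # rank of A

# NumPy returns min(n, m) singular triplets; keep only the first r.
U_full, s_full, Vt_full = np.linalg.svd(A, full_matrices=False)
U = U_full[:, :r]                        # n x r
Lam = np.diag(s_full[:r])                # r x r, diagonal
V = Vt_full[:r, :].T                     # m x r

A_rebuilt = U @ Lam @ V.T                # should reproduce A
```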

18 SVD - Definition: $A = U \Lambda V^T$ - example:

19 SVD - Properties. THEOREM [Press+92]: it is always possible to decompose matrix A into $A = U \Lambda V^T$, where: U, $\Lambda$, V are unique (*); U, V are column-orthonormal (i.e., columns are unit vectors, orthogonal to each other): $U^T U = I$; $V^T V = I$ (I: identity matrix); $\Lambda$: its diagonal entries (the singular values of A, i.e., the square roots of the eigenvalues of $A^T A$) are positive and sorted in decreasing order
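
The properties in the theorem are easy to test on a computed decomposition. This is a sketch assuming NumPy; the 4 x 3 matrix is hypothetical and was chosen to have full column rank so that all returned singular values are strictly positive.

```python
import numpy as np

# Hypothetical 4 x 3 matrix of rank 3, used only to exercise the properties.
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 3.0, 1.0],
              [1.0, 1.0, 0.0],
              [4.0, 0.0, 2.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

u_orthonormal = np.allclose(U.T @ U, np.eye(U.shape[1]))   # U^T U = I
v_orthonormal = np.allclose(V.T @ V, np.eye(V.shape[1]))   # V^T V = I
sorted_decreasing = bool(np.all(np.diff(s) <= 0))          # largest value first
all_positive = bool(np.all(s > 0))
```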

20 SVD - Example: $A = U \Lambda V^T$. Terms (columns of A): data, inf., retrieval, brain, lung; document groups (rows of A): CS, MD

21 SVD - Example (same matrices); annotations: CS-concept, MD-concept

22 SVD - Example (same matrices); annotations: CS-concept, MD-concept; doc-to-concept similarity matrix

23 SVD - Example (same matrices); annotation: ‘strength’ of CS-concept

24-25 SVD - Example (same matrices); annotations: term-to-concept similarity matrix; CS-concept

26 SVD - Detailed outline: Motivation; Definition - properties; Interpretation; Complexity; Case studies; Additional properties

27 SVD - Interpretation #1: ‘documents’, ‘terms’ and ‘concepts’. U: document-to-concept similarity matrix; V: term-to-concept similarity matrix; $\Lambda$: its diagonal elements give the ‘strength’ of each concept
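
Interpretation #1 can be illustrated by asking which concept dominates each document and each term. The sketch assumes NumPy; the term names come from the example slides, but the matrix entries are an assumption chosen so that the first four documents use only the first three terms and the last three documents use only the last two.

```python
import numpy as np

terms = ["data", "inf.", "retrieval", "brain", "lung"]   # term names from the slides

# Assumed counts: rows 0-3 play the role of CS documents, rows 4-6 of MD documents.
A = np.array([
    [1, 1, 1, 0, 0],
    [2, 2, 2, 0, 0],
    [1, 1, 1, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 3, 3],
    [0, 0, 0, 1, 1],
], dtype=float)

U_full, s_full, Vt_full = np.linalg.svd(A, full_matrices=False)
r = 2                                    # this matrix has two concepts
U, s, V = U_full[:, :r], s_full[:r], Vt_full[:r, :].T

# The sign of each singular vector is arbitrary, so compare magnitudes.
doc_concept = np.abs(U).argmax(axis=1)   # dominant concept for each document
term_concept = np.abs(V).argmax(axis=1)  # dominant concept for each term
```

With this matrix the stronger concept (index 0) collects the first four documents and the first three terms, and its strength `s[0]` is larger than `s[1]` because that block carries more total weight.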

28 SVD - Interpretation #2: best axis to project on (‘best’ = minimum sum of squares of projection errors)

29 SVD - Motivation

30 SVD - Interpretation #2: minimum RMS error; SVD gives the best axis to project on: v1
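
The claim that v1 minimizes the sum of squared projection errors can be checked by brute force in two dimensions. This is a sketch assuming NumPy; the four points and the helper `projection_error` are hypothetical. Because the points are not mean-centered here, the candidate axes are lines through the origin.

```python
import numpy as np


def projection_error(points, direction):
    """Sum of squared distances from each row to the line through the origin along direction."""
    d = direction / np.linalg.norm(direction)
    residual = points - np.outer(points @ d, d)
    return float(np.sum(residual ** 2))


# Hypothetical 2-d points (one per row) lying roughly along one direction.
points = np.array([[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9]])

_, _, Vt = np.linalg.svd(points, full_matrices=False)
v1 = Vt[0]                                # first right singular vector

# Compare against candidate axes at 1-degree steps over half a turn.
angles = np.linspace(0.0, np.pi, 181)
candidates = np.column_stack([np.cos(angles), np.sin(angles)])
best_candidate_error = min(projection_error(points, c) for c in candidates)
v1_error = projection_error(points, v1)
```

No candidate direction in the sweep achieves a smaller error than v1.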

31 SVD - Interpretation #2

32 SVD - Interpretation #2: $A = U \Lambda V^T$ - example; annotation: v1

33 SVD - Interpretation #2: $A = U \Lambda V^T$ - example; annotation: variance (‘spread’) on the v1 axis

34 SVD - Interpretation #2: $A = U \Lambda V^T$ - example; $U \Lambda$ gives the coordinates of the points on the projection axis
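
Since V is column-orthonormal, multiplying A = U Λ V^T on the right by V gives A V = U Λ, so the coordinates can be computed either way. A minimal sketch, assuming NumPy and a hypothetical set of 2-d points:

```python
import numpy as np

# Hypothetical 2-d points (one per row).
A = np.array([[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

coords_by_projection = A @ V      # project every point onto v1, v2
coords_from_factors = U * s       # same as U @ np.diag(s): column i scaled by s[i]

# Sum of squared coordinates along v1, which equals s[0] squared.
spread_on_v1 = float(np.sum(coords_by_projection[:, 0] ** 2))
```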

35 SVD - Interpretation #2, more details. Q: how exactly is dim. reduction done?

36 SVD - Interpretation #2, more details. Q: how exactly is dim. reduction done? A: set the smallest singular values (the smallest diagonal entries of $\Lambda$) to zero

37-40 SVD - Interpretation #2: $A \approx U \Lambda V^T$, with the smallest diagonal entries of $\Lambda$ set to zero
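
The truncation step can be sketched as follows, assuming NumPy. The helper name `truncate` and the 4 x 3 matrix are hypothetical. The last line of the block relies on a standard fact: the Frobenius norm of the approximation error equals the square root of the sum of squares of the dropped singular values.

```python
import numpy as np


def truncate(A, k):
    """Return the rank-k approximation of A and the full list of singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_kept = s.copy()
    s_kept[k:] = 0.0                   # zero out all but the k largest values
    return (U * s_kept) @ Vt, s


# Hypothetical 4 x 3 matrix.
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 3.0, 1.0],
              [1.0, 1.0, 0.0],
              [4.0, 0.0, 2.0]])

A_1, s = truncate(A, k=1)
error = np.linalg.norm(A - A_1)                     # Frobenius norm of the residual
expected_error = np.sqrt(np.sum(s[1:] ** 2))        # energy of the dropped values
```

In practice the zeroed columns of U and V need not be stored at all, which is where the space saving comes from.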

41 SVD - Interpretation #2: exactly equivalent: ‘spectral decomposition’ of the matrix

42 SVD - Interpretation #2: exactly equivalent: ‘spectral decomposition’ of the matrix, with $U = [u_1 \; u_2 \; \dots]$, $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \dots)$, $V = [v_1 \; v_2 \; \dots]$

43-44 SVD - Interpretation #2: exactly equivalent: ‘spectral decomposition’ of the matrix: $A = u_1 \lambda_1 v_1^T + u_2 \lambda_2 v_2^T + \dots$ (r terms), where A is n x m, each $u_i$ is n x 1, and each $v_i^T$ is 1 x m
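
The sum-of-outer-products form can be verified directly. A minimal sketch, assuming NumPy and a hypothetical 4 x 3 matrix of rank 3:

```python
import numpy as np

# Hypothetical 4 x 3 matrix of rank 3.
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 3.0, 1.0],
              [1.0, 1.0, 0.0],
              [4.0, 0.0, 2.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = len(s)

# One n x m matrix per term: (n x 1 column) * scalar * (1 x m row).
rank_one_terms = [s[i] * np.outer(U[:, i], Vt[i]) for i in range(r)]
A_sum = sum(rank_one_terms)
```

Each entry of `rank_one_terms` has the full n x m shape but rank 1, and adding all r of them gives back A.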

45 SVD - Interpretation #2: approximation / dim. reduction: keep only the first few terms (Q: how many?) of $A = u_1 \lambda_1 v_1^T + u_2 \lambda_2 v_2^T + \dots$; assume $\lambda_1 \ge \lambda_2 \ge \dots$

46 SVD - Interpretation #2: A (heuristic - [Fukunaga]): keep 80-90% of the ‘energy’ (= sum of squares of the $\lambda_i$’s) in $A = u_1 \lambda_1 v_1^T + u_2 \lambda_2 v_2^T + \dots$; assume $\lambda_1 \ge \lambda_2 \ge \dots$
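
The energy heuristic amounts to a running sum over the squared values. The sketch below is plain Python; the function name `components_for_energy` and the sample values are hypothetical, and the input is assumed to be sorted in decreasing order as on the slide.

```python
def components_for_energy(singular_values, fraction=0.9):
    """Smallest k such that the first k values hold at least `fraction` of the energy.

    Energy is the sum of squares of the values, which are assumed to be
    sorted in decreasing order.
    """
    total = sum(s * s for s in singular_values)
    if total == 0:
        return 0
    running = 0.0
    for k, s in enumerate(singular_values, start=1):
        running += s * s
        if running >= fraction * total:
            return k
    return len(singular_values)


k_steep = components_for_energy([10.0, 3.0, 1.0], fraction=0.9)      # energies 100, 9, 1
k_flat = components_for_energy([1.0, 1.0, 1.0, 1.0], fraction=0.8)   # equal energies
```

With a steep spectrum one component already holds over 90% of the energy (`k_steep` is 1), while a flat spectrum needs all four (`k_flat` is 4), so the heuristic only pays off when the values decay quickly.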

47 SVD - Detailed outline: Motivation; Definition - properties; Interpretation (#1: documents/terms/concepts; #2: dim. reduction; #3: picking non-zero, rectangular ‘blobs’); Complexity; Case studies; Additional properties

48-49 SVD - Interpretation #3: finds non-zero ‘blobs’ in a data matrix

50 SVD - Interpretation #3. Drill: find the SVD ‘by inspection’! Q: rank = ??

51 SVD - Interpretation #3. A: rank = 2 (2 linearly independent rows/cols)

52 SVD - Interpretation #3. A: rank = 2 (2 linearly independent rows/cols). Are the column vectors orthogonal?

53 SVD - Interpretation #3: the column vectors are orthogonal, but not unit vectors:

54 SVD - Interpretation #3: and the singular values are:

55 SVD - Interpretation #3. Q: How to check we are correct?

56 SVD - Interpretation #3. A: use the SVD properties: the matrix product should give back matrix A; matrix U should be column-orthonormal, i.e., its columns should be unit vectors, orthogonal to each other; ditto for matrix V; matrix $\Lambda$ should be diagonal, with positive values
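
The checklist can be run mechanically on a decomposition found by inspection. The drill matrix itself is not in the transcript, so the sketch below assumes a hypothetical 5 x 5 matrix with two all-ones blocks, and assumes NumPy. The 0/1 indicator columns are orthogonal but not unit length, so they are normalized, and the norms that were divided out are collected on the diagonal.

```python
import numpy as np

# Hypothetical block matrix: a 3 x 3 blob of ones and a 2 x 2 blob of ones.
A = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
], dtype=float)

# By inspection: one indicator column per blob (orthogonal, not yet unit length).
u_raw = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
v_raw = u_raw.copy()

u_norms = np.linalg.norm(u_raw, axis=0)
v_norms = np.linalg.norm(v_raw, axis=0)
U = u_raw / u_norms                    # normalize each column
V = v_raw / v_norms
Lam = np.diag(u_norms * v_norms)       # the norms removed above: diag(3, 2)

checks = {
    "product gives back A": bool(np.allclose(U @ Lam @ V.T, A)),
    "U column-orthonormal": bool(np.allclose(U.T @ U, np.eye(2))),
    "V column-orthonormal": bool(np.allclose(V.T @ V, np.eye(2))),
    "Lam diagonal, positive": bool(np.allclose(Lam, np.diag(np.diag(Lam)))
                                   and np.all(np.diag(Lam) > 0)),
}
```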

57 SVD - Detailed outline: Motivation; Definition - properties; Interpretation; Complexity; Case studies; Additional properties

58 SVD - Complexity: O(n * m * m) or O(n * n * m), whichever is less. Less work if we just want the singular values, or if we want only the first k singular vectors, or if the matrix is sparse [Berry]. Implemented in any linear algebra package (LINPACK, matlab, Splus, mathematica, ...)
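
As one illustration of why computing only the leading singular vector can be cheaper, the sketch below uses power iteration, which needs only matrix-vector products. This is a stand-in technique chosen for brevity, not the method referenced in [Berry]; it assumes NumPy, and the function name `top_singular_triplet`, the iteration count, and the test matrix are hypothetical.

```python
import numpy as np


def top_singular_triplet(A, iterations=200, seed=0):
    """Approximate the leading (u, sigma, v) of A by power iteration on A^T A."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iterations):
        w = A.T @ (A @ v)              # two matrix-vector products; A^T A is never formed
        v = w / np.linalg.norm(w)
    sigma = np.linalg.norm(A @ v)
    u = (A @ v) / sigma
    return u, sigma, v


# Hypothetical 4 x 3 matrix.
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 3.0, 1.0],
              [1.0, 1.0, 0.0],
              [4.0, 0.0, 2.0]])

u, sigma, v = top_singular_triplet(A)
reference = np.linalg.svd(A, compute_uv=False)[0]   # largest singular value
```

Each iteration costs two passes over the nonzero entries of A, so the work scales with the number of nonzeros rather than with n * m * m. Convergence slows down when the two largest singular values are close together.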

59 SVD - Conclusions so far. SVD: $A = U \Lambda V^T$: unique (*); U: document-to-concept similarities; V: term-to-concept similarities; $\Lambda$: strength of each concept. Dim. reduction: keep the first few strongest singular values (80-90% of ‘energy’). SVD picks up linear correlations. SVD picks up non-zero ‘blobs’

60 References: Berry, Michael: http://www.cs.utk.edu/~lsi/ ; Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition, Academic Press; Press, W. H., S. A. Teukolsky, et al. (1992). Numerical Recipes in C, Cambridge University Press.

