
1 Multimedia Databases LSI and SVD

2 Text - Detailed outline: text problem; full text scanning; inversion; signature files; clustering; information filtering and LSI

3 Information Filtering + LSI [Foltz+ '92] Goal: users specify their interests (= keywords); the system alerts them to suitable news documents. Major contribution: LSI = Latent Semantic Indexing, built on latent ('hidden') concepts.

4 Information Filtering + LSI Main idea: map each document into some 'concepts', and map each term into some 'concepts'. A 'concept' is roughly a set of terms with weights, e.g., "data" (0.8), "system" (0.5), "retrieval" (0.6) -> DBMS_concept (see the sketch below).
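To make the idea concrete, here is a minimal Python sketch (the concept name, weights, and toy document are illustrative assumptions, not from the slides): a 'concept' as a weighted set of terms, and a document scored by how strongly it expresses that concept.

```python
# Hypothetical 'concept': a set of terms with weights (values are made up).
dbms_concept = {"data": 0.8, "system": 0.5, "retrieval": 0.6}

def concept_score(doc_term_counts, concept):
    """Dot product between a document's term counts and the concept's weights."""
    return sum(count * concept.get(term, 0.0)
               for term, count in doc_term_counts.items())

doc = {"data": 2, "retrieval": 1, "lung": 1}   # toy document
print(concept_score(doc, dbms_concept))        # 2*0.8 + 1*0.6 = 2.2
```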

5 Information Filtering + LSI Pictorially: term-document matrix (BEFORE)

6 Information Filtering + LSI Pictorially: concept-document matrix and...

7 Information Filtering + LSI... and concept-term matrix

8 Information Filtering + LSI Q: How to search, e.g., for 'system'?

9 Information Filtering + LSI A: find the corresponding concept(s); and the corresponding documents

11 Information Filtering + LSI Thus LSI works like an automatically constructed thesaurus: we may retrieve documents that DON'T contain the term 'system' but contain almost everything else ('data', 'retrieval'). The sketch below illustrates this.
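A self-contained sketch of that effect (the toy counts are an assumption, not from the slides): after the SVD, a query for 'system' still ranks highly the document that contains only 'data' and 'retrieval', because it lies on the same latent concept.

```python
import numpy as np

# terms:        data  retrieval  system  lung   (toy counts, assumed)
A = np.array([[1.0, 1.0, 1.0, 0.0],   # doc0: all three DB terms
              [2.0, 2.0, 0.0, 0.0],   # doc1: NO occurrence of 'system'
              [0.0, 0.0, 0.0, 1.0]])  # doc2: unrelated

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 1                                  # keep only the strongest concept
q = np.array([0.0, 0.0, 1.0, 0.0])     # query: 'system'
q_c = (q @ Vt[:k].T) / s[:k]           # fold the query into concept space
scores = U[:, :k] @ q_c                # document similarities in concept space
print(scores)                          # doc1 scores high despite lacking 'system'
```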

12 SVD - Detailed outline: Motivation; Definition - properties; Interpretation; Complexity; Case studies; Additional properties

13 SVD - Motivation. Problem #1: text (LSI): find 'concepts'. Problem #2: compression / dimensionality reduction.

14 SVD - Motivation problem #1: text - LSI: find ‘concepts’

15 SVD - Motivation problem #2: compress / reduce dimensionality

16 Problem - specs: ~10^6 rows; ~10^3 columns; no updates; random access to any cell(s); small error is OK.

17 SVD - Motivation

18 [figure-only slide: no text captured]

19 SVD - Detailed outline: Motivation; Definition - properties; Interpretation; Complexity; Case studies; Additional properties

20 SVD - Definition A [n x m] = U [n x r] Σ [r x r] (V [m x r])^T. A: n x m matrix (e.g., n documents, m terms); U: n x r matrix (n documents, r concepts); Σ: r x r diagonal matrix (strength of each 'concept'; r = rank of the matrix); V: m x r matrix (m terms, r concepts). A sketch with NumPy follows.
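A minimal sketch of the definition, assuming NumPy (the random matrix is just a stand-in for a term-document matrix):

```python
import numpy as np

n, m = 7, 5                          # n documents, m terms
A = np.random.rand(n, m)             # stand-in for a term-document matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)   # 'thin' SVD
print(U.shape, s.shape, Vt.shape)    # (7, 5) (5,) (5, 5)
assert np.allclose(A, U @ np.diag(s) @ Vt)         # A = U Sigma V^T
```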

21 SVD - Definition A = U Σ V^T - example:

22 SVD - Properties THEOREM [Press+92]: it is always possible to decompose a matrix A into A = U Σ V^T, where U, Σ, V are unique (*); U, V are column-orthonormal (i.e., their columns are unit vectors, orthogonal to each other): U^T U = I; V^T V = I (I: the identity matrix); Σ: the diagonal entries (the singular values, written λ_i on these slides) are positive and sorted in decreasing order.

23 SVD - Example A = U Σ V^T - example: [figure: a term-document matrix (terms: data, inf., retrieval, brain, lung; documents: a CS group and an MD group), factored as U x Σ x V^T]

24 SVD - Example A = U Σ V^T - example: [same figure, with the CS-concept and the MD-concept marked in the factors]

25 SVD - Example A = U Σ V^T - example: [same figure; U is labeled the doc-to-concept similarity matrix]

26 SVD - Example A = U Σ V^T - example: [same figure; the top diagonal entry of Σ is labeled the 'strength' of the CS-concept]

27 SVD - Example A = U Σ V^T - example: [same figure; V^T is labeled the term-to-concept similarity matrix, with the CS-concept marked]

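The example can also be reproduced numerically. A hedged reconstruction in NumPy (the exact cell values below are an assumption; only the CS/MD block structure is taken from the slides):

```python
import numpy as np

#             data  inf.  retrieval  brain  lung
A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],    # CS documents: database terms only
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]])   # MD documents: medical terms only

U, s, Vt = np.linalg.svd(A, full_matrices=False)
np.set_printoptions(precision=2, suppress=True)
print(U[:, :2])   # doc-to-concept similarity: CS docs load on one concept, MD on the other
print(s[:2])      # 'strength' of the CS-concept and of the MD-concept
print(Vt[:2])     # term-to-concept similarity
```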

29 SVD - Detailed outline: Motivation; Definition - properties; Interpretation; Complexity; Case studies; Additional properties

30 SVD - Interpretation #1 'documents', 'terms' and 'concepts': U: document-to-concept similarity matrix; V: term-to-concept similarity matrix; Σ: its diagonal elements give the 'strength' of each concept.

31 SVD - Interpretation #2 best axis to project on: (‘best’ = min sum of squares of projection errors)

32 SVD - Motivation

33 SVD - Interpretation #2 minimum RMS error: SVD gives the best axis v1 to project on. [figure: scatter of points with the first singular vector v1]

34 SVD - Interpretation #2

35 A = U Σ V^T - example: [figure: the factorization of the example matrix, with the first axis v1 marked]

36 SVD - Interpretation #2 A = U Σ V^T - example: [same figure; the variance ('spread') along the v1 axis is marked]

37 SVD - Interpretation #2 A = U Σ V^T - example: U Σ gives the coordinates of the points on the projection axes. [same figure] A sketch follows.
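A quick check of that identity in NumPy (the matrix is a random stand-in): the rows of U Σ are the coordinates of the points along the axes in V, i.e., U Σ = A V.

```python
import numpy as np

A = np.random.rand(7, 5)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
coords = U @ np.diag(s)                 # point coordinates in concept space
assert np.allclose(coords, A @ Vt.T)    # U Sigma = A V: projection onto the axes
```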

38 SVD - Interpretation #2 More details. Q: how exactly is dim. reduction done? [figure: the example factorization]

39 SVD - Interpretation #2 More details. A: set the smallest singular values to zero (see the sketch below): [figure: the example factorization]
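A minimal sketch of the truncation step (random stand-in matrix; the choice k = 2 is arbitrary here):

```python
import numpy as np

A = np.random.rand(7, 5)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
s_trunc = s.copy()
s_trunc[k:] = 0.0                       # zero out the smallest singular values
A_k = U @ np.diag(s_trunc) @ Vt         # rank-k approximation of A
print(np.linalg.norm(A - A_k))          # small residual error
```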

40-43 SVD - Interpretation #2 [figures: the example matrix approximated step by step; with the smallest singular values zeroed, the product reconstructs A only approximately (~ instead of =), with small error]

44 Equivalent: 'spectral decomposition' of the matrix: [figure: A = U Σ V^T with the factors shown]

45 SVD - Interpretation #2 Equivalent: 'spectral decomposition' of the matrix: [figure: the factors labeled with columns u1, u2, singular values λ1, λ2, and rows v1, v2]

46 SVD - Interpretation #2 Equivalent: 'spectral decomposition' of the n x m matrix: A = λ1 u1 v1^T + λ2 u2 v2^T + ...

47 SVD - Interpretation #2 'spectral decomposition' of the n x m matrix: A = λ1 u1 v1^T + λ2 u2 v2^T + ..., where each u_i is n x 1, each v_i^T is 1 x m, and there are r terms. (Sketch below.)
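A sketch of the spectral decomposition in NumPy (random stand-in matrix): summing the r rank-1 terms lambda_i * u_i * v_i^T recovers A.

```python
import numpy as np

A = np.random.rand(7, 5)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))
assert np.allclose(A, A_rebuilt)        # the r rank-1 terms sum back to A
```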

48 SVD - Interpretation #2 approximation / dim. reduction: keep only the first few terms of A = λ1 u1 v1^T + λ2 u2 v2^T + ... (Q: how many?), assuming λ1 >= λ2 >= ...

49 SVD - Interpretation #2 A (heuristic [Fukunaga]): keep 80-90% of the 'energy' (= the sum of squares of the λ_i's), assuming λ1 >= λ2 >= ... (Sketch below.)
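One way to code the heuristic (the helper name choose_k and the sample values are assumptions):

```python
import numpy as np

def choose_k(singular_values, energy_frac=0.9):
    """Smallest k whose leading singular values carry energy_frac of the energy."""
    energy = singular_values ** 2
    cumulative = np.cumsum(energy) / energy.sum()
    return int(np.searchsorted(cumulative, energy_frac) + 1)

s = np.array([10.0, 5.0, 1.0, 0.5, 0.1])
print(choose_k(s))    # 2: the first two values already carry ~99% of the energy
```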

50 SVD - Interpretation #3 finds non-zero 'blobs' in a data matrix: [figure: the example matrix with its two non-zero blocks, one per concept]

52 SVD - Interpretation #3 Drill: find the SVD 'by inspection'! Q: rank = ?? [figure: the example matrix, with the factors left as question marks]

53 SVD - Interpretation #3 A: rank = 2 (2 linearly independent rows/cols). [same figure, factors still as question marks]

54 SVD - Interpretation #3 A: rank = 2 (2 linearly independent rows/cols); are the concept vectors orthogonal? [same figure]

55 SVD - Interpretation #3 the column vectors are orthogonal, but not unit vectors: [figure]

56 SVD - Interpretation #3 and the singular values are: [figure]

57 SVD - Interpretation #3 Q: How to check we are correct?

58 SVD - Interpretation #3 A: use the SVD properties: the matrix product should give back matrix A; matrix U should be column-orthonormal, i.e., its columns should be unit vectors, orthogonal to each other; ditto for matrix V; matrix Σ should be diagonal, with positive values. (Sketch below.)
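A sketch of those checks in NumPy (the helper name check_svd is an assumption):

```python
import numpy as np

def check_svd(A, U, s, Vt, tol=1e-8):
    ok_product = np.allclose(A, U @ np.diag(s) @ Vt, atol=tol)   # U Sigma V^T == A
    ok_U = np.allclose(U.T @ U, np.eye(U.shape[1]), atol=tol)    # U column-orthonormal
    ok_V = np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0]), atol=tol) # ditto for V
    ok_sigma = np.all(s > 0) and np.all(np.diff(s) <= 0)         # positive, sorted
    return ok_product and ok_U and ok_V and ok_sigma

A = np.random.rand(7, 5)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(check_svd(A, U, s, Vt))    # True
```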

59 SVD - Detailed outline: Motivation; Definition - properties; Interpretation; Complexity; Case studies; Additional properties

60 SVD - Complexity O(n*m*m) or O(n*n*m) (whichever is less); less work if we only want the singular values, or only the first k singular vectors, or if the matrix is sparse [Berry]. Implemented in any linear algebra package (LINPACK, Matlab, S-Plus, Mathematica, ...). (Sparse sketch below.)
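For the sparse case, a sketch assuming SciPy is available (the sizes and density are illustrative): compute only the first k singular triplets rather than the full decomposition.

```python
import scipy.sparse as sp
from scipy.sparse.linalg import svds

A = sp.random(10**4, 10**3, density=0.001, format="csr", random_state=0)
U, s, Vt = svds(A, k=5)          # only the 5 strongest concepts
print(s[::-1])                   # svds returns singular values in ascending order
```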

61 SVD - conclusions so far SVD: A = U Σ V^T: unique (*); U: document-to-concept similarities; V: term-to-concept similarities; Σ: strength of each concept; dim. reduction: keep the few strongest singular values (80-90% of the 'energy'); SVD picks up linear correlations; SVD picks up non-zero 'blobs'.

62 References Berry, Michael: http://www.cs.utk.edu/~lsi/ ; Fukunaga, K. (1990). Introduction to Statistical Pattern Recognition. Academic Press; Press, W. H., S. A. Teukolsky, et al. (1992). Numerical Recipes in C. Cambridge University Press.

