Indexing and Data Mining in Multimedia Databases Christos Faloutsos CMU www.cs.cmu.edu/~christos.

1 Indexing and Data Mining in Multimedia Databases Christos Faloutsos CMU www.cs.cmu.edu/~christos

2 U. of Alberta, 2001; C. Faloutsos. Outline. Goal: ‘Find similar / interesting things’. Problem - Applications; Indexing - similarity search; New tools for Data Mining: Fractals; Conclusions; Resources

3 Problem: Given a large collection of (multimedia) records, find similar/interesting things, i.e.: allow fast, approximate queries, and find rules/patterns

4 Sample queries. Similarity search: –find pairs of branches with similar sales patterns –find medical cases similar to Smith's –find pairs of sensor series that move in sync

5 Sample queries (cont’d). Rule discovery: –Clusters (of patients; of customers; ...) –Forecasting (total sales for next year?) –Outliers (e.g., fraud detection)

6 Outline. Goal: ‘Find similar / interesting things’. Problem - Applications; Indexing - similarity search; New tools for Data Mining: Fractals; Conclusions; Resources

7 Indexing - Multimedia. Problem: given a set of (multimedia) objects, find the ones similar to a desired query object (quickly!)

8 [Figure: three time series, each plotted as $price vs. day for days 1 to 365] Distance function: by expert

9 ‘GEMINI’ - Pictorially. [Figure: sequences S1 ... Sn, each plotted over days 1 to 365, are mapped to points F(S1) ... F(Sn) in a feature space with axes e.g., avg and e.g., std] Off-the-shelf S.A.M.s (Spatial Access Methods)

10 ‘GEMINI’: fast; ‘correct’ (= no false dismissals); used for –images (e.g., QBIC) (2x, 10x faster) –shapes (27x faster) –video (e.g., InforMedia) –time sequences ([Rafiei+Mendelzon], ++)
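
The filter-and-refine idea behind ‘GEMINI’ can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes equal-length sequences compared with Euclidean distance, uses the (avg, std) features from the slide, and replaces the spatial access method with a plain linear scan. The function names and the sqrt(n) scaling are assumptions of this sketch; the scaling makes the feature distance a lower bound on the true distance, which is what guarantees no false dismissals.

```python
import math

def features(seq):
    """Map a sequence to the 2-d feature point (mean, population std)."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in seq) / n)
    return (mean, std)

def euclidean(a, b):
    """True (expensive) distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def feature_distance(fa, fb, n):
    """Distance in feature space, scaled so that it never exceeds
    the Euclidean distance between the two length-n sequences."""
    return math.sqrt(n) * euclidean(fa, fb)

def range_query(database, query, eps):
    """Return all sequences within eps of the query."""
    n = len(query)
    fq = features(query)
    # Filter step: a real system would keep the feature points in a
    # spatial access method (e.g., an R-tree); a linear scan stands in here.
    candidates = [s for s in database
                  if feature_distance(features(s), fq, n) <= eps]
    # Refine step: discard false alarms using the true distance.
    return [s for s in candidates if euclidean(s, query) <= eps]
```

In this sketch a sequence such as [4, 3, 2, 1] has the same features as the query [1, 2, 3, 4], so it passes the filter and is discarded only in the refine step: false alarms are tolerated, false dismissals are not.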

11 Remaining issues: how to extract features automatically? how to merge similarity scores from different media?

12 Outline. Goal: ‘Find similar / interesting things’. Problem - Applications; Indexing - similarity search –Visualization: FastMap –Relevance feedback: FALCON; Data Mining / Fractals; Conclusions

13 FastMap. Distance matrix:
      O1   O2   O3   O4   O5
O1     0    1    1  100  100
O2     1    0    1  100  100
O3     1    1    0  100  100
O4   100  100  100    0    1
O5   100  100  100    1    0
[Figure labels: ~100, ~1, ??]

14 FastMap: Multi-dimensional scaling (MDS) can do that, but in O(N**2) time. We want a linear algorithm: FastMap [SIGMOD95]
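
A compact sketch of a FastMap-style embedding follows. It is an illustration under assumptions rather than the published implementation: the function name, the simple two-step pivot heuristic, and the choice to start from object 0 are all choices of this sketch. Each coordinate comes from the cosine law applied to two far-apart pivot objects a and b, x_i = (d(a,i)^2 + d(a,b)^2 - d(b,i)^2) / (2 d(a,b)), and each axis needs only a linear number of distance computations.

```python
import math

def fastmap(objects, dist, k):
    """Embed objects into k dimensions using only pairwise distances."""
    n = len(objects)
    coords = [[0.0] * k for _ in range(n)]

    def residual_sq(i, j, dim):
        # Squared distance left over after the first `dim` coordinates
        # have been accounted for (clamped at zero against round-off).
        d2 = dist(objects[i], objects[j]) ** 2
        for c in range(dim):
            d2 -= (coords[i][c] - coords[j][c]) ** 2
        return max(d2, 0.0)

    for dim in range(k):
        # Pivot heuristic: start anywhere, jump to the farthest object,
        # then to the object farthest from that one.
        a = 0
        b = max(range(n), key=lambda j: residual_sq(a, j, dim))
        a = max(range(n), key=lambda j: residual_sq(b, j, dim))
        dab2 = residual_sq(a, b, dim)
        if dab2 == 0.0:
            break  # nothing left to explain; remaining coordinates stay 0
        dab = math.sqrt(dab2)
        for i in range(n):
            # Cosine-law projection of object i onto the line through a and b.
            coords[i][dim] = (residual_sq(a, i, dim) + dab2
                              - residual_sq(b, i, dim)) / (2.0 * dab)
    return coords
```

For the distance matrix on the previous slide, one would pass the five objects together with a `dist` function that looks up the matrix entry; the two groups should then land about 100 apart on the first axis.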

15 Applications: time sequences. Given n co-evolving time sequences, visualize them + find rules [ICDE00]. [Figure: rate vs. time for HKD, JPY, DEM]

16 Applications - financial: currency exchange rates [ICDE00]. [Figure labels: USD(t), USD(t-5), FRF, GBP, JPY, HKD]

17 Applications - financial: currency exchange rates [ICDE00]. [Figure labels: USD, HKD, JPY, FRF, DEM, GBP, USD(t), USD(t-5)]

18 Application: VideoTrails [ACM MM97]

19 VideoTrails - usage: scene-cut detection (about 10% errors); scene classification (e.g., dialogue vs. action)

20 Outline. Goal: ‘Find similar / interesting things’. Problem - Applications; Indexing - similarity search –Visualization: FastMap –Relevance feedback: FALCON; Data Mining / Fractals; Conclusions

21 Merging similarity scores, e.g., video: text, color, motion, audio –weights change with the query! Solution 1: user specifies weights. Solution 2: user gives examples –and we ‘learn’ what he/she wants: relevance feedback (Rocchio, MARS, MindReader) –but: how about disjunctive queries?

22 DEMO: server demo

23 ‘FALCON’. [Figure labels: ‘Inverted Vs’, ‘Vs’] Trader wants only ‘unstable’ stocks

24 ‘FALCON’. [Figure labels: ‘Inverted Vs’, ‘Vs’] average: is flat!

25 “Single query point” methods: Rocchio. [Figure: points marked ‘+’ and one point marked ‘x’; axes avg, std]

26 “Single query point” methods: Rocchio, MindReader, MARS. [Figure: three panels of points marked ‘+’, each with one point marked ‘x’] The averaging effect in action...

27 Main idea: FALCON contours. [Figure: points marked ‘+’; axes feature1 (e.g., avg) and feature2 (e.g., std)] [Wu+, vldb2000]

28 A: Aggregate Dissimilarity. α: parameter (~ -5 ~ ‘soft OR’). [Figure: points marked ‘+’, with g1, g2 and x labeled]
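
One way to read the ‘soft OR’ remark is as a power mean of the distances from a candidate x to the ‘good’ examples g1, g2, ..., taken with a negative exponent. The sketch below assumes exactly that form, D(x) = ((1/k) * sum_i d(x, g_i)^alpha)^(1/alpha); the formula, the function name and the default alpha = -5 are assumptions of this illustration, not a quotation of the FALCON paper.

```python
import math

def aggregate_dissimilarity(x, good, dist=math.dist, alpha=-5.0):
    """Power-mean dissimilarity of x to a set of 'good' examples.

    With alpha < 0 the smallest distance dominates, so x scores well
    if it is close to ANY good example (a 'soft OR').
    With alpha = 1 this is the plain average distance.
    """
    ds = [dist(x, g) for g in good]
    if alpha < 0 and min(ds) == 0.0:
        return 0.0  # x coincides with a good example
    return (sum(d ** alpha for d in ds) / len(ds)) ** (1.0 / alpha)
```

With good examples at (0, 0) and (10, 10), a point next to (0, 0) gets a small dissimilarity under alpha = -5, whereas under alpha = 1 it scores about the same as the midpoint (5, 5), which is the averaging effect that single-query-point methods suffer from.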

29 FALCON: converges quickly (~5 iterations); good precision/recall; is fast (can use off-the-shelf ‘spatial/metric access methods’)

30 Conclusions for indexing + visualization. GEMINI: fast indexing, exploiting off-the-shelf SAMs. FastMap: automatic feature extraction in O(N) time. FALCON: relevance feedback for disjunctive queries

31 Outline. Goal: ‘Find similar / interesting things’. Problem - Applications; Indexing - similarity search; New tools for Data Mining: Fractals; Conclusions; Resources

32 Data mining & fractals – Road map: Motivation – problems / case study; Definition of fractals and power laws; Solutions to posed problems; More examples

33 Problem #1 - spatial d.m. Galaxies (Sloan Digital Sky Survey w/ B. Nichol) - ‘spiral’ and ‘elliptical’ galaxies (stores & households; healthy & ill subjects) - patterns? (not Gaussian; not uniform) - attraction/repulsion? - separability??

34 Problem #2: dim. reduction. Given attributes x_1, ..., x_n (possibly non-linearly correlated), drop the useless ones (Q: why? A: to avoid the ‘dimensionality curse’). [Figure axes: engine size, mpg]

35 Answer: Fractals / self-similarities / power laws

36 What is a fractal? = self-similar point set, e.g., Sierpinski triangle: ... zero area; infinite length!

37 Definitions (cont’d). Paradox: infinite perimeter; zero area! ‘Dimensionality’: between 1 and 2; actually: log(3)/log(2) = 1.58… (long story)
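
The ‘long story’ can be shortened with the standard similarity-dimension argument, sketched here for a Sierpinski triangle built by repeatedly removing the middle quarter of each triangle. The symbols N (number of self-similar copies), s (scaling factor), and A_k, P_k (area and total perimeter after k steps) are notation introduced for this sketch.

```latex
% The triangle consists of N = 3 copies of itself, each scaled by s = 1/2:
D = \frac{\log N}{\log (1/s)} = \frac{\log 3}{\log 2} \approx 1.585

% Each step keeps 3/4 of the area and multiplies the total perimeter by 3/2:
A_k = \left(\tfrac{3}{4}\right)^{k} A_0 \;\longrightarrow\; 0,
\qquad
P_k = \left(\tfrac{3}{2}\right)^{k} P_0 \;\longrightarrow\; \infty
\qquad (k \to \infty)
```

This accounts for both halves of the paradox: the area vanishes, the perimeter diverges, and the dimension falls strictly between 1 and 2.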

38 Intrinsic (‘fractal’) dimension. Q: fractal dimension of a line? E.g.: #cylinders; miles / gallon
x  y
5  1
4  2
3  3
2  4

39 Intrinsic (‘fractal’) dimension. Q: fractal dimension of a line? A: nn(<= r) ~ r^1 (nn = number of neighbors within distance r)

40 Intrinsic (‘fractal’) dimension. Q: fractal dimension of a line? A: nn(<= r) ~ r^1. Q: fd of a plane? A: nn(<= r) ~ r^2. fd == slope of (log(nn) vs. log(r))

41 Sierpinski triangle. [Figure: log(#pairs within <= r) vs. log(r); slope 1.58] == ‘correlation integral’
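
The slope estimate behind the ‘correlation integral’ plot can be sketched as follows: count the pairs of points within each radius r, then fit a least-squares line to log(#pairs) vs. log(r). This is a brute-force O(N^2) illustration with hypothetical function names, not the FracDim package listed under Resources; it assumes every radius captures at least one pair, so that the logarithm is defined.

```python
import math

def pair_counts(points, radii):
    """Number of point pairs within distance r, for each r in radii."""
    dists = [math.dist(p, q)
             for i, p in enumerate(points)
             for q in points[i + 1:]]
    return [sum(1 for d in dists if d <= r) for r in radii]

def least_squares_slope(xs, ys):
    """Slope of the least-squares line through the points (xs[i], ys[i])."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def correlation_dimension(points, radii):
    """Estimate the fractal dimension as the slope of
    log(#pairs within r) vs. log(r)."""
    counts = pair_counts(points, radii)
    xs = [math.log(r) for r in radii]
    ys = [math.log(c) for c in counts]  # assumes every count is > 0
    return least_squares_slope(xs, ys)
```

Points spread along a line should give a slope near 1 and points filling a square a slope near 2, although on a finite sample edge effects pull the estimate down at larger radii.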

42 Observations: self-similarity -> fractals, scale-free, power-laws (y=x^a, F=C*r^(-2)). [Figure: log(#pairs within <= r) vs. log(r); slope 1.58]

43 Road map: Motivation – problems / case studies; Definition of fractals and power laws; Solutions to posed problems; More examples; Conclusions

44 Solution #1: spatial d.m. Galaxies (Sloan Digital Sky Survey w/ B. Nichol - ‘BOPS’ plot - [sigmod2000]): clusters? separable? attraction/repulsion? data ‘scrubbing’ – duplicates?

45 Solution #1: spatial d.m. [Figure: log(#pairs within <= r) vs. log(r) for spi-spi, spi-ell, ell-ell] - 1.8 slope - plateau! - repulsion!

46 Solution #1: spatial d.m. [Figure: log(#pairs within <= r) vs. log(r) for spi-spi, spi-ell, ell-ell] - 1.8 slope - plateau! - repulsion! [w/ Seeger, Traina, Traina, SIGMOD00]

47 spatial d.m. [Figure labels: r1, r2] Heuristic on choosing # of clusters

48 Solution #1: spatial d.m. [Figure: log(#pairs within <= r) vs. log(r) for spi-spi, spi-ell, ell-ell] - 1.8 slope - plateau! - repulsion!

49 Solution #1: spatial d.m. [Figure: log(#pairs within <= r) vs. log(r) for spi-spi, spi-ell, ell-ell] - 1.8 slope - plateau! - repulsion!! - duplicates

50 Problem #2: Dim. reduction

51 Solution: drop the attributes that don’t increase the ‘partial f.d.’ (PFD). Dfn: the PFD of attribute set A is the f.d. of the projected cloud of points [w/ Traina, Traina, Wu, SBBD00]
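
The rule on this slide can be illustrated with a small greedy loop. Note what is swapped in: this sketch uses forward selection (add the attribute that raises the partial fractal dimension the most, stop when the gain is small), which is not necessarily the procedure of the cited paper. The function names, the `min_gain` threshold and the brute-force dimension estimate are all assumptions made for the example.

```python
import math

def fractal_dim(points, radii):
    """Least-squares slope of log(#pairs within r) vs. log(r).
    Brute force; assumes every radius captures at least one pair."""
    dists = [math.dist(p, q)
             for i, p in enumerate(points)
             for q in points[i + 1:]]
    xs = [math.log(r) for r in radii]
    ys = [math.log(sum(1 for d in dists if d <= r)) for r in radii]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)

def partial_fd(rows, attrs, radii):
    """PFD of an attribute set: f.d. of the cloud projected onto attrs."""
    return fractal_dim([tuple(row[a] for a in attrs) for row in rows], radii)

def select_attributes(rows, radii, min_gain=0.3):
    """Greedily keep attributes while each one raises the PFD noticeably."""
    remaining = list(range(len(rows[0])))
    kept, current = [], 0.0
    while remaining:
        best = max(remaining,
                   key=lambda a: partial_fd(rows, kept + [a], radii))
        gain = partial_fd(rows, kept + [best], radii) - current
        if gain < min_gain:
            break  # no remaining attribute adds a new 'degree of freedom'
        kept.append(best)
        remaining.remove(best)
        current += gain
    return kept
```

On rows of the form (t, t*t, u), with u independent of t, such a loop should keep u plus only one of the two correlated attributes, even though their correlation is non-linear.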

52 Problem #2: dim. reduction. [Figure labels: PFD~1; global FD=1; PFD=1; PFD=0; PFD=1]

53 Problem #2: dim. reduction. [Figure labels: PFD~1; PFD=1; global FD=1; PFD=1; PFD=0; PFD=1] Notice: ‘max variance’ would fail here

54 Problem #2: dim. reduction. [Figure labels: PFD~1; global FD=1; PFD=1; PFD=0; PFD=1] Notice: SVD would fail here

55 Currency dataset

56 self-similar? [Figure: two plots: currency, fd=1.98; eigenfaces, fd=4.25]

57 FDR on the ‘currency’ dataset. [Figure label: if unif + indep.]

58 FDR on the ‘currency’ dataset. [Figure label: if unif + indep.] HKD: “useless”; > 1.98 axes are needed

59 Road map: Motivation – problems / case studies; Definition of fractals and power laws; Solutions to posed problems; More examples; Conclusions

60 App.: traffic. Disk traces: self-similar (also: web traffic; comm. errors; etc.) [Figure: #bytes vs. time]

61 More apps: Brain scans. Oct-trees; brain-scans. [Figure: log(#octants) vs. octree levels; 2.63 = fd]

62 More fractals: stock prices (LYCOS) - random walks: 1.5. [Figure panels: 1 year; 2 years]

63 More fractals: coast-lines: 1.1-1.2 (up to 1.58)

65 Examples: MG county. Montgomery County of MD (road end-points)

66 Examples: LB county. Long Beach county of CA (road end-points)

67 More power laws: Zipf’s law. Bible - rank vs. frequency (log-log). [Figure: log(freq) vs. log(rank); points labeled “a”, “the”]
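
A rank-frequency plot like this one can be reproduced for any word list with a few lines of code. The sketch below (hypothetical function names; the text of the Bible is not included) ranks words by frequency and fits the slope of log(freq) vs. log(rank) by least squares. A slope near -1 is what Zipf's law predicts.

```python
import math
from collections import Counter

def rank_frequency(words):
    """Return (rank, frequency) pairs, most frequent word first."""
    counts = sorted(Counter(words).values(), reverse=True)
    return list(enumerate(counts, start=1))

def loglog_slope(pairs):
    """Least-squares slope of log(y) vs. log(x) over (x, y) pairs."""
    xs = [math.log(x) for x, _ in pairs]
    ys = [math.log(y) for _, y in pairs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

The same `loglog_slope` helper applies to the other power laws in this part of the talk, since each is a straight line on log-log axes.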

68 More power laws: Freq. distr. of first names; last names (Mandelbrot)

69 Internet. Internet routers: how many neighbors within h hops? [Figure label: U of Alberta]

70 Internet topology. Internet routers: how many neighbors within h hops? [SIGCOMM 99] Reachability function: number of neighbors within r hops, vs. r (log-log). Mbone routers, 1995. [Figure: log(#pairs) vs. log(hops); slope 2.8]

71 More power laws: areas – Korcak’s law. Scandinavian lakes ([icde99], w/ Proietti)

72 More power laws: areas – Korcak’s law. Scandinavian lakes: area vs. complementary cumulative count (log-log axes). [Figure: log(count( >= area)) vs. log(area)]

73 Olympic medals. [Figure: log(# medals) vs. log rank]

74 More power laws: Energy of earthquakes (Gutenberg-Richter law) [simscience.org]. [Figures: log(count) vs. magnitude; amplitude vs. day]

75 Even more power laws: income distribution (Pareto’s law); sales distributions; duration of UNIX jobs; distribution of UNIX file sizes; publication counts (Lotka’s law)

76 Even more power laws: web hit frequencies ([Huberman]); hyper-link distribution [Barabasi], ++

77 Overall Conclusions: ‘Find similar/interesting things’ in multimedia databases. Indexing: feature extraction (‘GEMINI’) –automatic feature extraction: FastMap –Relevance feedback: FALCON

78 Conclusions - cont’d. New tools for Data Mining: Fractals/power laws: –appear everywhere –lead to skewed distributions (so Gaussian, Poisson, uniformity and independence assumptions do not hold) –‘correlation integral’ for separability/cluster detection –PFD for dimensionality reduction

79 Conclusions - cont’d –can model bursty time sequences (buffering/prefetching) –selectivity estimation (‘how many neighbors within x km?’) –dim. curse diagnosis (it’s the fractal dim. that matters! [ICDE2000])

80 Resources: Software and papers: –http://www.cs.cmu.edu/~christos –Fractal dimension (FracDim) –Separability (sigmod 2000) –Relevance feedback for query by content (FALCON – vldb 2000)

