Indexing and Data Mining in Multimedia Databases Christos Faloutsos CMU www.cs.cmu.edu/~christos.

Presentation transcript:

Indexing and Data Mining in Multimedia Databases Christos Faloutsos CMU

U. of Alberta, 2001, C. Faloutsos. Outline. Goal: ‘Find similar / interesting things’. Problem - Applications; Indexing - similarity search; New tools for Data Mining: Fractals; Conclusions; Resources.

Problem: given a large collection of (multimedia) records, find similar/interesting things, i.e., allow fast, approximate queries, and find rules/patterns.

Sample queries: similarity search. Find pairs of branches with similar sales patterns; find medical cases similar to Smith's; find pairs of sensor series that move in sync.

Sample queries, cont’d: rule discovery. Clusters (of patients; of customers; ...); forecasting (total sales for next year?); outliers (e.g., fraud detection).

Outline. Goal: ‘Find similar / interesting things’. Problem - Applications; Indexing - similarity search; New tools for Data Mining: Fractals; Conclusions; Resources.

Indexing - Multimedia. Problem: given a set of (multimedia) objects, find the ones similar to a desired query object (quickly!).

[Figure: three time series, $price vs. day (1 to 365).] Distance function: given by a domain expert.

‘GEMINI’ - pictorially. [Figure: sequences S1, ..., Sn (day 1 to 365) are mapped to feature points F(S1), ..., F(Sn); feature axes are, e.g., avg and std.] The feature points are indexed with off-the-shelf S.A.M.s (Spatial Access Methods).

‘GEMINI’: fast; ‘correct’ (= no false dismissals). Used for: images (e.g., QBIC) (2x, 10x faster); shapes (27x faster); video (e.g., InforMedia); time sequences ([Rafiei+Mendelzon], ++).
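A minimal sketch of the filter-and-refine idea behind GEMINI, under stated assumptions: the distance is plain Euclidean (the slides only say it comes from an expert), the features are the avg and std from the figure, and a linear scan over the feature points stands in for a real spatial access method. All function names are hypothetical. Scaling both features by sqrt(n) makes the feature-space distance a lower bound of the true distance, which is what rules out false dismissals.

```python
import math
import random

def features(series):
    """Map a series to (sqrt(n)*mean, sqrt(n)*std).

    With this scaling, the Euclidean distance between two feature points
    never exceeds the Euclidean distance between the two full series.
    """
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    scale = math.sqrt(n)
    return (scale * mean, scale * std)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def range_query(database, query, eps):
    """Return (indices of series within eps, number of candidates refined)."""
    fq = features(query)
    # Assumption: in practice the features are precomputed and stored in a SAM;
    # here they are recomputed and scanned linearly to keep the sketch short.
    feats = [features(s) for s in database]
    # Filter step: cheap test in feature space (may admit false alarms).
    candidates = [i for i, f in enumerate(feats) if euclidean(f, fq) <= eps]
    # Refine step: discard false alarms with the full distance.
    answers = [i for i in candidates if euclidean(database[i], query) <= eps]
    return answers, len(candidates)

# Synthetic data: 50 noisy series of 365 "daily" values at different levels.
random.seed(0)
db = [[random.gauss(level, 1.0) for _ in range(365)] for level in range(50)]
query = db[7]
hits, refined = range_query(db, query, eps=40.0)
brute = [i for i, s in enumerate(db) if euclidean(s, query) <= 40.0]
```

Because of the lower bound, `hits` matches the brute-force answer `brute`, while the full distance is computed only for the `refined` candidates rather than for all 50 series.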

Remaining issues: how to extract features automatically? How to merge similarity scores from different media?

Outline. Goal: ‘Find similar / interesting things’. Problem - Applications; Indexing - similarity search (Visualization: FastMap; Relevance feedback: FALCON); Data Mining / Fractals; Conclusions.

FastMap. [Figure: pairwise distance matrix over objects O1, ..., O5 and a target plot with distances marked ~100 and ~1; the matrix entries are not recoverable.]

FastMap. Multi-dimensional scaling (MDS) can do that, but in O(N**2) time. We want a linear algorithm: FastMap [SIGMOD95].
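The projection step of FastMap can be sketched as follows. This is an illustration, not the authors' code: the function names and the pivot heuristic details are assumptions. Each coordinate comes from the cosine law applied to two far-apart pivot objects a and b, x_i = (d(a,i)^2 + d(a,b)^2 - d(b,i)^2) / (2 d(a,b)), and each further coordinate uses the residual distances left after removing the earlier ones. Only O(N) distance evaluations are needed per coordinate.

```python
import math

def fastmap(objects, dist, k):
    """Map each object to k coordinates using only the distance function."""
    n = len(objects)
    coords = [[0.0] * k for _ in range(n)]

    def d2(i, j, depth):
        """Squared distance after removing the first `depth` coordinates."""
        val = dist(objects[i], objects[j]) ** 2
        for c in range(depth):
            val -= (coords[i][c] - coords[j][c]) ** 2
        return max(val, 0.0)  # guard against small negative rounding errors

    for depth in range(k):
        # Pivot heuristic (assumed): start at object 0, jump to the farthest
        # object b, then to the object a farthest from b.
        b = max(range(n), key=lambda j: d2(0, j, depth))
        a = max(range(n), key=lambda j: d2(b, j, depth))
        dab2 = d2(a, b, depth)
        if dab2 == 0.0:
            break  # all residual distances are zero; nothing left to project
        dab = math.sqrt(dab2)
        for i in range(n):
            # Cosine law: position of object i along the line through a and b.
            coords[i][depth] = (d2(a, i, depth) + dab2 - d2(b, i, depth)) / (2 * dab)
    return coords

def euclid(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

# Five collinear 3-d points, spaced 3 apart: one coordinate suffices.
line_points = [(i, 2 * i, 2 * i) for i in range(5)]
line_coords = fastmap(line_points, euclid, 1)
```

For the collinear points the single coordinate reproduces the spacing exactly (0, 3, 6, 9, 12). For general data the mapping is approximate, and the distance function need not be Euclidean.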

Applications: time sequences. Given n co-evolving time sequences, visualize them and find rules [ICDE00]. [Figure: exchange rate vs. time for HKD, JPY, DEM.]

Applications - financial: currency exchange rates [ICDE00]. [Figure: plot with points labeled USD, HKD, JPY, FRF, DEM, GBP, USD(t), USD(t-5).]

Application: VideoTrails [ACM MM97].

VideoTrails - usage: scene-cut detection (about 10% errors); scene classification (e.g., dialogue vs action).

Outline. Goal: ‘Find similar / interesting things’. Problem - Applications; Indexing - similarity search (Visualization: FastMap; Relevance feedback: FALCON); Data Mining / Fractals; Conclusions.

Merging similarity scores. E.g., video: text, color, motion, audio; the weights change with the query! Solution 1: the user specifies weights. Solution 2: the user gives examples, and we ‘learn’ what he/she wants: relevance feedback (Rocchio, MARS, MindReader). But: how about disjunctive queries?

DEMO: server demo.

‘FALCON’. [Figure: ‘inverted V’ vs. ‘V’ shaped series.] Trader wants only ‘unstable’ stocks.

‘FALCON’. [Figure: ‘inverted V’ vs. ‘V’ shaped series.] Their average is flat!

“Single query point” methods: Rocchio. [Figure: query point x in the (avg, std) feature plane.]

“Single query point” methods: Rocchio, MindReader, MARS. The averaging effect in action... [Figure: one query point x per method.]

Main idea: FALCON contours. [Figure: contours in the plane of feature1 (e.g., avg) vs. feature2 (e.g., std).] [Wu+, VLDB 2000]

A: aggregate dissimilarity. α: parameter (~ -5 ~ ‘soft OR’). [Figure: points g1, g2 and x.]

FALCON: converges quickly (~5 iterations); good precision/recall; is fast (can use off-the-shelf ‘spatial/metric access methods’).
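A hedged sketch of an aggregate dissimilarity with a ‘soft OR’ parameter, written here as a power mean of the distances from a candidate x to the example points g1, g2, ...: D(x) = ((1/k) * sum_i d(x, g_i)^alpha)^(1/alpha). The exact form used by FALCON should be checked against [Wu+, VLDB 2000]; the function names, the Euclidean base distance and the toy points are assumptions. With a negative alpha the smallest distance dominates, so a candidate close to any one example scores well, unlike a single-query-point (centroid) method.

```python
import math

def euclid(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def aggregate_dissimilarity(x, examples, alpha=-5.0):
    """Power mean of the distances from x to each example point."""
    dists = [euclid(x, g) for g in examples]
    if alpha < 0 and any(d == 0.0 for d in dists):
        return 0.0  # limit of the power mean when x coincides with an example
    mean_power = sum(d ** alpha for d in dists) / len(dists)
    return mean_power ** (1.0 / alpha)

def centroid_distance(x, examples):
    """Single-query-point baseline: distance to the average of the examples."""
    dims = len(examples[0])
    centroid = tuple(sum(g[i] for g in examples) / len(examples) for i in range(dims))
    return euclid(x, centroid)

# Two far-apart examples model a disjunctive query ("like g1 OR like g2").
examples = [(0.0, 0.0), (10.0, 0.0)]
near_g1 = (0.5, 0.0)   # close to one example
midpoint = (5.0, 0.0)  # close to neither, but equal to their average

near_score = aggregate_dissimilarity(near_g1, examples)
mid_score = aggregate_dissimilarity(midpoint, examples)
near_centroid = centroid_distance(near_g1, examples)
mid_centroid = centroid_distance(midpoint, examples)
```

Here the aggregate measure ranks `near_g1` ahead of `midpoint`, while the centroid baseline prefers `midpoint`, which resembles neither example: the averaging effect from the previous slides.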

Conclusions for indexing + visualization. GEMINI: fast indexing, exploiting off-the-shelf SAMs. FastMap: automatic feature extraction in O(N) time. FALCON: relevance feedback for disjunctive queries.

Outline. Goal: ‘Find similar / interesting things’. Problem - Applications; Indexing - similarity search; New tools for Data Mining: Fractals; Conclusions; Resources.

Data mining & fractals: road map. Motivation: problems / case study; Definition of fractals and power laws; Solutions to posed problems; More examples.

Problem #1: spatial d.m. Galaxies (Sloan Digital Sky Survey, w/ B. Nichol): ‘spiral’ and ‘elliptical’ galaxies (likewise: stores & households; healthy & ill subjects). Patterns? (not Gaussian; not uniform). Attraction/repulsion? Separability?

Problem #2: dim. reduction. Given attributes x_1, ..., x_n, possibly non-linearly correlated, drop the useless ones (Q: why? A: to avoid the ‘dimensionality curse’). [Figure: mpg vs. engine size.]

Answer: fractals / self-similarities / power laws.

What is a fractal? A self-similar point set, e.g., the Sierpinski triangle: ... zero area; infinite length!

Definitions (cont’d). Paradox: infinite perimeter; zero area! ‘Dimensionality’: between 1 and 2; actually log(3)/log(2) = 1.58... (long story).
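A short version of the ‘long story’, using the standard self-similarity argument rather than anything specific to these slides: each construction step replaces every triangle by N = 3 copies scaled by s = 1/2. The symbols A_n, P_n, N and s are introduced here for illustration.

```latex
\begin{align*}
A_n &= A_0 \left(\tfrac{3}{4}\right)^{n} \to 0
  && \text{(3 copies, each with } \tfrac{1}{4} \text{ of the area)} \\
P_n &= P_0 \left(\tfrac{3}{2}\right)^{n} \to \infty
  && \text{(3 copies, each with } \tfrac{1}{2} \text{ of the perimeter)} \\
D &= \frac{\log N}{\log (1/s)} = \frac{\log 3}{\log 2} \approx 1.585
  && \text{(self-similarity dimension)}
\end{align*}
```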

Intrinsic (‘fractal’) dimension. E.g., x, y: #cylinders; miles / gallon. Q: fractal dimension of a line? A: nn(<= r) ~ r^1. Q: f.d. of a plane? A: nn(<= r) ~ r^2. f.d. == slope of log(nn) vs log(r), where nn(<= r) is the number of neighbors within distance r.

Sierpinski triangle. [Figure: log(#pairs within <= r) vs log(r); slope 1.58.] This plot is the ‘correlation integral’.
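The slope-of-the-log-log-plot definition can be tried directly. The following is a brute-force O(N^2) sketch with hypothetical function names, a hand-picked set of radii and synthetic points on a straight line; it is not the FracDim package mentioned under Resources.

```python
import math

def pair_counts(points, radii):
    """For each radius r, count unordered pairs of points with distance <= r."""
    counts = [0] * len(radii)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
            for k, r in enumerate(radii):
                if d <= r:
                    counts[k] += 1
    return counts

def fit_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def correlation_dimension(points, radii):
    """Slope of log(#pairs within r) vs log(r); every count must be positive."""
    counts = pair_counts(points, radii)
    return fit_slope([math.log(r) for r in radii], [math.log(c) for c in counts])

# 300 evenly spaced points on a line segment embedded in the plane.
n = 300
line = [(i / (n - 1), 0.5 * i / (n - 1)) for i in range(n)]
radii = [0.02, 0.04, 0.08, 0.16]
line_fd = correlation_dimension(line, radii)
```

The estimate for the line comes out close to 1 although the points have two coordinates, which is the sense in which the dimension is ‘intrinsic’. The radii must sit between the point spacing and the extent of the data, otherwise the plot flattens at either end.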

Observations: self-similarity -> fractals; scale-free; power-laws (y = x^a, F = C*r^(-2)). [Figure: log(#pairs within <= r) vs log(r); slope 1.58.]

Road map. Motivation: problems / case studies; Definition of fractals and power laws; Solutions to posed problems; More examples; Conclusions.

Solution #1: spatial d.m. Galaxies (Sloan Digital Sky Survey, w/ B. Nichol; ‘BOPS’ plot [SIGMOD 2000]). Clusters? Separable? Attraction/repulsion? Data ‘scrubbing’: duplicates?

Solution #1: spatial d.m. [Figure: log(#pairs within <= r) vs log(r), with curves spi-spi, spi-ell, ell-ell.] Annotations: slope; plateau!; repulsion! [w/ Seeger, Traina, Traina, SIGMOD00]

Spatial d.m. [Figure: radii r1 and r2.] Heuristic on choosing # of clusters.

Solution #1: spatial d.m. [Figure: log(#pairs within <= r) vs log(r), with curves spi-spi, spi-ell, ell-ell.] Annotations: slope; plateau!; repulsion!!; duplicates.

Problem #2: dim. reduction.

Solution: drop the attributes that don’t increase the ‘partial f.d.’ (PFD). Definition: the PFD of an attribute set A is the f.d. of the cloud of points projected onto A. [w/ Traina, Traina, Wu, SBBD00]
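One way to read the rule ‘drop the attributes that don’t increase the partial f.d.’ is a greedy forward pass, sketched below. This is an assumption made for illustration and may differ from the procedure in [SBBD00]; the function names, the `min_gain` threshold, the radii and the synthetic data are all hypothetical. The f.d. is estimated by the brute-force pair-count slope.

```python
import math
import random

def correlation_dimension(points, radii):
    """Least-squares slope of log(#pairs within r) vs log(r), brute force."""
    counts = [0] * len(radii)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(points[i], points[j])))
            for k, r in enumerate(radii):
                if d <= r:
                    counts[k] += 1
    xs = [math.log(r) for r in radii]
    ys = [math.log(c) for c in counts]  # assumes every count is positive
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)

def partial_fd(rows, attrs, radii):
    """F.d. of the points projected onto the attribute subset `attrs`."""
    if not attrs:
        return 0.0
    return correlation_dimension([tuple(row[a] for a in attrs) for row in rows], radii)

def select_attributes(rows, radii, min_gain=0.3):
    """Keep an attribute only if adding it raises the partial f.d. by min_gain."""
    selected, current = [], 0.0
    for a in range(len(rows[0])):
        candidate = partial_fd(rows, selected + [a], radii)
        if candidate - current > min_gain:
            selected.append(a)
            current = candidate
    return selected

# Synthetic rows (t, t^2, u): attribute 1 is a non-linear function of
# attribute 0, attribute 2 is independent of both.
random.seed(1)
rows = []
for _ in range(400):
    t, u = random.random(), random.random()
    rows.append((t, t * t, u))
kept = select_attributes(rows, [0.02, 0.04, 0.08, 0.16])
```

On this data the pass keeps attributes 0 and 2 and drops attribute 1, even though t^2 is not linearly dependent on t. The result depends on the attribute order and on the threshold, so it is a heuristic rather than an optimal subset search.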

Problem #2: dim. reduction. [Figure: point clouds annotated PFD~1, PFD=1, PFD=0, PFD=1; global FD=1.] Notice: ‘max variance’ would fail here; SVD would fail here.

Currency dataset.

Self-similar? [Figures: currency dataset, fd = 1.98; eigenfaces dataset, fd = 4.25.]

FDR on the ‘currency’ dataset. [Figure: plot with a reference line labeled ‘if unif + indep.’] HKD: “useless”; >1.98 axes are needed.

Road map. Motivation: problems / case studies; Definition of fractals and power laws; Solutions to posed problems; More examples; Conclusions.

App.: traffic. Disk traces are self-similar (also: web traffic; comm. errors; etc.). [Figure: #bytes vs. time.]

More apps: brain scans. Oct-trees on brain scans. [Figure: log(#octants) vs. octree levels; fd = 2.63.]

More fractals: stock prices (LYCOS), random walks. [Figure: price over time; panel labels as extracted: ‘year’, ‘2 years’.]

More fractals: coastlines (up to 1.58).

Examples: MG county. Montgomery County, MD (road end-points).

Examples: LB county. Long Beach county, CA (road end-points).

More power laws: Zipf’s law. Bible: rank vs frequency (log-log). [Figure: log(freq) vs log(rank); labeled words: “a”, “the”.]
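A rank-frequency plot of this kind can be reproduced in a few lines. The sketch below uses a tiny synthetic corpus constructed so that frequency is exactly 120/rank; it is not the Bible data from the slide, and the function name is hypothetical. A power law shows up as a straight line in log-log scale, so the check is a least-squares slope.

```python
import math
from collections import Counter

def zipf_slope(words):
    """Least-squares slope of log(frequency) vs log(rank)."""
    freqs = sorted(Counter(words).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Synthetic corpus: the word of rank r occurs 120 / r times (r = 1..6).
corpus = []
for rank in range(1, 7):
    corpus.extend(["w%d" % rank] * (120 // rank))
slope = zipf_slope(corpus)
```

For this corpus the slope is -1, the classic Zipf exponent. On real text the fit is only approximate and usually degrades at the very top and the tail of the ranking.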

More power laws: frequency distribution of first names; last names (Mandelbrot).

Internet topology. Internet routers: how many neighbors within h hops? [SIGCOMM 99] Reachability function: number of neighbors within r hops, vs r (log-log). Mbone routers, 1995. [Figure: log(#pairs) vs log(hops); annotated value 2.8; ‘U of Alberta’ marked.]

More power laws: areas, Korcak’s law. Scandinavian lakes ([ICDE99], w/ Proietti): area vs complementary cumulative count (log-log axes). [Figure: log(count(>= area)) vs log(area).]

Olympic medals. [Figure: log(# medals) vs log rank.]

More power laws: energy of earthquakes (Gutenberg-Richter law) [simscience.org]. [Figures: log(count) vs magnitude; amplitude vs day.]

Even more power laws: income distribution (Pareto’s law); sales distributions; duration of UNIX jobs; distribution of UNIX file sizes; publication counts (Lotka’s law).

Even more power laws: web hit frequencies ([Huberman]); hyper-link distribution [Barabasi], ++.

Overall conclusions: ‘Find similar/interesting things’ in multimedia databases. Indexing: feature extraction (‘GEMINI’); automatic feature extraction: FastMap; relevance feedback: FALCON.

Conclusions, cont’d. New tools for Data Mining: fractals/power laws. They appear everywhere; they lead to skewed distributions (not Gaussian, Poisson, uniform, or independent); the ‘correlation integral’ serves for separability/cluster detection; PFD serves for dimensionality reduction.

Conclusions, cont’d. Fractals/power laws can also model bursty time sequences (buffering/prefetching); help with selectivity estimation (‘how many neighbors within x km?’); and diagnose the dim. curse (it’s the fractal dim. that matters! [ICDE2000]).

Resources. Software and papers: Fractal dimension (FracDim); Separability (SIGMOD 2000); Relevance feedback for query by content (FALCON, VLDB 2000).