Indexing and Data Mining in Multimedia Databases
Christos Faloutsos, CMU
www.cs.cmu.edu/~christos

USC 2001, C. Faloutsos

Outline
Goal: ‘Find similar / interesting things’
- Problem: applications
- Indexing: similarity search
- New tools for data mining: fractals
- Conclusions
- Resources

Problem
Given a large collection of (multimedia) records, find similar/interesting things, i.e.:
- allow fast, approximate queries, and
- find rules/patterns.

Sample queries
Similarity search:
- find pairs of branches with similar sales patterns
- find medical cases similar to Smith's
- find pairs of sensor series that move in sync
- find shapes like a spark plug

Sample queries (cont’d)
Rule discovery:
- clusters (of branches; of sensor data; ...)
- forecasting (total sales for next year?)
- outliers (e.g., unexpected part failures; fraud detection)

Outline
Goal: ‘Find similar / interesting things’
- Problem: applications
- Indexing: similarity search
- New tools for data mining: fractals
- Conclusions, related CMU work, and resources

Indexing: multimedia
Problem: given a set of (multimedia) objects, find the ones similar to a desired query object.

[Figure: three price series, $price vs. day (days 1 to 365).]
Distance function: given by a domain expert.

‘GEMINI’, pictorially
[Figure: each series S1 … Sn (days 1 to 365) is mapped to a point F(S1) … F(Sn) in a small feature space whose axes are features such as avg and std.]
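
A minimal sketch of the GEMINI filter-and-refine idea. Assumptions (not from the slides): the expert distance is Euclidean on equal-length series, and the two features are avg and std, each scaled by sqrt(n); with that scaling the feature distance can never exceed the true distance, so the feature-space filter causes no false dismissals. The function names are hypothetical, and a linear scan stands in for the spatial access method.

```python
import math
from typing import List, Sequence, Tuple

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in for the expert's distance function (assumed Euclidean)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def features(s: Sequence[float]) -> Tuple[float, float]:
    """F(S) = (sqrt(n) * avg, sqrt(n) * std), using the population std.

    With this scaling the distance between feature points is a lower bound
    on the Euclidean distance between the full series, so the filter step
    below cannot dismiss a true answer.
    """
    n = len(s)
    avg = sum(s) / n
    std = math.sqrt(sum((x - avg) ** 2 for x in s) / n)
    return (math.sqrt(n) * avg, math.sqrt(n) * std)

def range_query(db: List[Sequence[float]], q: Sequence[float], eps: float) -> List[int]:
    """Indices of all series within distance eps of the query q."""
    fq = features(q)
    # Filter: a cheap 2-d test; an R-tree or other spatial access method
    # would replace this linear scan in a real system.
    candidates = [i for i, s in enumerate(db) if euclidean(features(s), fq) <= eps]
    # Refine: drop the false alarms with the full, expensive distance.
    return [i for i in candidates if euclidean(db[i], q) <= eps]
```

The filter may let through false alarms, but the refine step removes them; the lower-bounding property is what guarantees that nothing is missed.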

Remaining issues
- How do we extract features automatically?
- How do we merge similarity scores from different media?

Outline
Goal: ‘Find similar / interesting things’
- Problem: applications
- Indexing: similarity search
  - Visualization: FastMap
  - Relevance feedback: FALCON
- Data mining / fractals
- Conclusions

FastMap
[Figure: a distance matrix for objects O1 to O5 (entries such as ~100 and ~1); how should the objects be drawn as points?]

FastMap
Multi-dimensional scaling (MDS) can do that, but in O(N^2) time.
We want a linear algorithm: FastMap [SIGMOD95].
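
A compact sketch of the FastMap projection, assuming the objects come with a distance function that behaves like a Euclidean one. For each axis it picks two far-apart pivot objects a and b, places every object i by the cosine law, x_i = (d(a,i)^2 + d(a,b)^2 - d(b,i)^2) / (2 d(a,b)), and then continues on the residual distances. Each axis costs O(N) distance computations, hence O(N k) overall instead of the O(N^2) of MDS. The pivot heuristic and the function name are illustrative, not necessarily those of [SIGMOD95].

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def fastmap(objs: Sequence[T], dist: Callable[[T, T], float], k: int) -> List[List[float]]:
    """Map objects to k-d points that approximately preserve dist."""
    n = len(objs)
    coords = [[0.0] * k for _ in range(n)]

    def d2(i: int, j: int, col: int) -> float:
        """Squared distance after removing the first `col` projected axes."""
        v = dist(objs[i], objs[j]) ** 2
        for c in range(col):
            v -= (coords[i][c] - coords[j][c]) ** 2
        return max(v, 0.0)  # clamp tiny negatives caused by rounding

    for col in range(k):
        # Pivot heuristic: start anywhere, jump to the farthest object twice.
        a = 0
        b = max(range(n), key=lambda j: d2(a, j, col))
        a = max(range(n), key=lambda j: d2(b, j, col))
        dab2 = d2(a, b, col)
        if dab2 == 0.0:
            break  # no residual spread left; remaining coordinates stay 0
        dab = math.sqrt(dab2)
        for i in range(n):
            # Cosine law: position of object i along the line through a and b.
            coords[i][col] = (d2(a, i, col) + dab2 - d2(b, i, col)) / (2 * dab)
    return coords
```

For points that really live in the plane, fastmap(points, math.dist, 2) reproduces all pairwise distances; for general objects (e.g., series under an expert distance) it gives an approximation fast enough to plot.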

Applications: time sequences
Given n co-evolving time sequences, visualize them and find rules [ICDE00].
[Figure: exchange rate vs. time for HKD, JPY and DEM.]

Applications: financial
Currency exchange rates [ICDE00]
[Figures: FastMap plots of the currencies USD, HKD, JPY, FRF, DEM and GBP, with USD also shown at time t and at time t-5: USD(t), USD(t-5).]

Application: VideoTrails [ACM MM97]

VideoTrails: usage
- scene-cut detection (about 10% errors)
- scene classification (e.g., dialogue vs. action)

Merging similarity scores
E.g., video: text, color, motion, audio
- the weights change with the query!
Solution 1: the user specifies the weights.
Solution 2: the user gives examples, and we ‘learn’ what he/she wants: relevance feedback (Rocchio, MARS, MindReader).
- But: how about disjunctive queries?

‘FALCON’
[Figure: stock price patterns shaped like inverted Vs and like Vs.]
A trader wants only ‘unstable’ stocks, i.e., either shape: a disjunctive query.

“Single query point” methods
Rocchio
[Figure: relevant examples and Rocchio's single query point x.]

“Single query point” methods
Rocchio, MindReader, MARS
The averaging effect in action: each method's single query point x lands among the examples, not on them.
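
The averaging effect can be reproduced with the textbook Rocchio update, q' = alpha*q + beta*avg(relevant) - gamma*avg(non-relevant). The weights below are common illustrative defaults, not values from the slides, and the sketch does not model the re-weighted or re-shaped distance functions of MARS and MindReader.

```python
from typing import List, Sequence

Vector = List[float]

def centroid(vectors: Sequence[Sequence[float]], dim: int) -> Vector:
    """Component-wise mean; the zero vector when there are no examples."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def rocchio(query: Sequence[float],
            relevant: Sequence[Sequence[float]],
            nonrelevant: Sequence[Sequence[float]],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Move the single query point toward relevant, away from non-relevant examples."""
    dim = len(query)
    rel = centroid(relevant, dim)
    non = centroid(nonrelevant, dim)
    return [alpha * q + beta * r - gamma * n for q, r, n in zip(query, rel, non)]

# Two relevant examples at opposite corners pull the query to the midpoint,
# a point that resembles neither example.
print(rocchio([0.0, 0.0], [[0.0, 0.0], [10.0, 10.0]], [], alpha=0.0, beta=1.0))  # [5.0, 5.0]
```

With ‘inverted V’ and ‘V’ stocks as the two relevant examples, the midpoint is roughly a flat, stable stock: exactly what the trader does not want.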

Main idea: FALCON
[Figure: FALCON's similarity contours over feature1 (e.g., temperature) and feature2 (e.g., frequency).] [Wu+, VLDB 2000]
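
One way to get contours like these is to score a candidate x by a power mean of its distances to all ‘good’ examples g_1 … g_k, D(x) = ((1/k) * sum_i d(x, g_i)^alpha)^(1/alpha), with alpha < 0. This sketch assumes that form, as we understand it from [Wu+, VLDB 2000]; alpha = -5 is an illustrative choice. Because the nearest good example dominates, the low-D regions surround each example separately instead of their average.

```python
import math
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def aggregate_dissimilarity(x: T, good: Sequence[T],
                            dist: Callable[[T, T], float],
                            alpha: float = -5.0) -> float:
    """Power mean of the distances from x to every 'good' example.

    A negative alpha lets the closest good example dominate (a fuzzy OR),
    so the contours form one region around each example.
    """
    ds = [dist(x, g) for g in good]
    if any(d == 0.0 for d in ds):
        return 0.0  # x coincides with a good example
    return (sum(d ** alpha for d in ds) / len(ds)) ** (1.0 / alpha)

good = [(0.0, 0.0), (10.0, 10.0)]
near_example = aggregate_dissimilarity((0.5, 0.0), good, math.dist)
midpoint = aggregate_dissimilarity((5.0, 5.0), good, math.dist)
print(near_example < midpoint)  # True: the centroid scores worst here
```

The midpoint (5, 5) is exactly where a single-query-point method would search; under the aggregate score it is farther away than a point next to either example.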

Conclusions for indexing and visualization
- GEMINI: fast indexing, exploiting off-the-shelf SAMs (spatial access methods)
- FastMap: automatic feature extraction in O(N) time
- FALCON: relevance feedback for disjunctive queries

Outline
Goal: ‘Find similar / interesting things’
- Problem: applications
- Indexing: similarity search
- New tools for data mining: fractals
- Conclusions
- Resources

Data mining & fractals: road map
- Motivation: problems / case study
- Definition of fractals and power laws
- Solutions to posed problems
- More examples

Problem #1: spatial data mining
Galaxies (Sloan Digital Sky Survey, w/ B. Nichol): ‘spiral’ and ‘elliptical’ galaxies
(similarly: stores & households; mpg & MTBF ...)
- Patterns? (not Gaussian; not uniform)
- Attraction/repulsion?
- Separability?

Problem #2: dimensionality reduction
Given attributes x_1, ..., x_n (possibly non-linearly correlated), drop the useless ones.
(Q: Why? A: To avoid the ‘dimensionality curse’.)

Answer: fractals / self-similarity / power laws

What is a fractal?
A self-similar point set, e.g., the Sierpinski triangle: ... zero area; infinite length!

Definitions (cont’d)
Paradox: infinite perimeter, zero area!
‘Dimensionality’: between 1 and 2; actually log(3)/log(2) = 1.58… (long story)
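
The ‘long story’ can be summarized with the standard similarity dimension: a set made of N copies of itself, each scaled down by a factor s, has dimension log N / log(1/s). A short worked version for a Sierpinski triangle of unit side, where P_k and A_k are the total perimeter and area after k construction steps (3^k triangles of side 2^-k):

```latex
D = \frac{\log N}{\log(1/s)}, \qquad
D_{\text{Sierpinski}} = \frac{\log 3}{\log 2} \approx 1.585 \quad (N = 3,\ s = 1/2)

P_k = 3^k \cdot 3 \cdot 2^{-k} = 3\left(\tfrac{3}{2}\right)^k \to \infty, \qquad
A_k = 3^k \cdot \tfrac{\sqrt{3}}{4}\, 4^{-k} = \tfrac{\sqrt{3}}{4}\left(\tfrac{3}{4}\right)^k \to 0
```

As a sanity check, a line segment (N = 2, s = 1/2) gives D = 1 and a filled square (N = 4, s = 1/2) gives D = 2.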

Intrinsic (‘fractal’) dimension
[Figure: points along a line in the x-y plane, e.g., #cylinders vs. miles/gallon.]
Q: What is the fractal dimension of a line?
A: nn(<= r) ~ r^1, where nn(<= r) is the number of neighbors within distance r (a ‘power law’: y = x^a).
Q: What is the fd of a plane?
A: nn(<= r) ~ r^2.
In general, fd == slope of log(nn) vs. log(r).

Sierpinski triangle
[Figure: log(#pairs within <= r) vs. log(r); slope 1.58.]
This plot is the ‘correlation integral’.
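
The slope-of-log-log recipe can be sketched directly: count the pairs within distance r for a range of radii and fit a line to log(count) vs. log(r). The radius range, sample sizes and helper names are assumptions; the brute-force O(N^2) pair count is fine for a few thousand points, and the Sierpinski sample comes from the standard ‘chaos game’.

```python
import bisect
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def pair_counts(points: Sequence[Point], radii: Sequence[float]) -> List[int]:
    """Number of distinct pairs at distance <= r, for each radius r."""
    dists = sorted(
        math.dist(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    )
    return [bisect.bisect_right(dists, r) for r in radii]

def ls_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)

def correlation_dimension(points: Sequence[Point], r_min: float = 0.01,
                          r_max: float = 0.1, steps: int = 8) -> float:
    """Slope of log(#pairs within r) vs log(r) over log-spaced radii."""
    radii = [r_min * (r_max / r_min) ** (k / (steps - 1)) for k in range(steps)]
    counts = pair_counts(points, radii)  # assumes no count is 0 (r_min not too small)
    return ls_slope([math.log(r) for r in radii], [math.log(c) for c in counts])

def sierpinski(n: int, seed: int = 0) -> List[Point]:
    """n points on the Sierpinski triangle, sampled with the 'chaos game'."""
    rng = random.Random(seed)
    corners = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
    x, y, pts = 0.0, 0.0, []
    for step in range(n + 20):
        cx, cy = rng.choice(corners)
        x, y = (x + cx) / 2, (y + cy) / 2  # jump halfway to a random corner
        if step >= 20:  # drop a short burn-in
            pts.append((x, y))
    return pts

# correlation_dimension(sierpinski(1200)) should come out near 1.58,
# up to sampling noise; uniform points on a line or a plane give about 1 or 2.
```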

Road map
- Motivation: problems / case studies
- Definition of fractals and power laws
- Solutions to posed problems
- More examples
- Conclusions

Solution #1: spatial data mining
Galaxies (Sloan Digital Sky Survey w/ B. Nichol; ‘BOPS’ plot [SIGMOD2000])
- clusters?
- separable?
- attraction/repulsion?
- data ‘scrubbing’: duplicates?

Solution #1: spatial data mining
[Figure: log(#pairs within <= r) vs. log(r) for spiral-spiral (spi-spi), spiral-elliptical (spi-ell) and elliptical-elliptical (ell-ell) pairs.]
- slope
- plateau!
- repulsion!
[w/ Seeger, Traina, Traina, SIGMOD00]

Spatial data mining
[Figure: radii r1 and r2 marked on the plot.]
Heuristic on choosing the number of clusters

Solution #1: spatial data mining
[Figure: log(#pairs within <= r) vs. log(r) for spi-spi, spi-ell and ell-ell pairs.]
- slope
- plateau!
- repulsion!!
- duplicates
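
A sketch of a grid-based cross-pair count between two classes (say, spiral and elliptical galaxies), assuming ‘BOPS’ stands for the box-occupancy-product-sum of the cited SIGMOD 2000 work: lay a grid of side r and add up, over all cells, (#A points in cell) * (#B points in cell). Plotting log(BOPS) vs. log(r) for the spi-spi, spi-ell and ell-ell combinations gives curves comparable to the pair-count plots. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def cell(p: Point, side: float) -> Tuple[int, ...]:
    """Grid cell that contains point p, for cells of the given side."""
    return tuple(math.floor(x / side) for x in p)

def bops(a: Sequence[Point], b: Sequence[Point], side: float) -> int:
    """Box-occupancy-product-sum: sum over cells of count_A * count_B.

    Approximates the number of (A, B) pairs closer than about `side`.
    When a and b are the same set it counts ordered pairs, self-pairs included.
    """
    ca = Counter(cell(p, side) for p in a)
    cb = Counter(cell(q, side) for q in b)
    return sum(n * cb[c] for c, n in ca.items())  # missing cells count as 0

def bops_curve(a: Sequence[Point], b: Sequence[Point],
               sides: Sequence[float]) -> List[Tuple[float, float]]:
    """(log(side), log(BOPS)) points for a log-log plot; empty results are skipped."""
    out = []
    for s in sides:
        v = bops(a, b, s)
        if v > 0:
            out.append((math.log(s), math.log(v)))
    return out
```

Grid counting avoids the O(N^2) pair enumeration: each grid size costs one pass over the points.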

Problem #2: dimensionality reduction

Solution: drop the attributes that don’t increase the ‘partial f.d.’ (PFD).
Definition: the PFD of an attribute set A is the f.d. of the cloud of points projected onto A. [w/ Traina, Traina, Wu, SBBD00]

Problem #2: dimensionality reduction
[Figure: a point cloud with global FD = 1; the projections onto individual attributes have PFD ~1, PFD = 1, PFD = 0 and PFD = 1.]
Notice: ‘max variance’ criteria, such as SVD, would fail here.
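
A sketch of PFD-based attribute dropping, under stated assumptions: the PFD is estimated with the correlation integral over a fixed set of radii, and attributes are removed greedily while the PFD stays within a tolerance of the global f.d. The greedy order, the tolerance and the names are illustrative choices, not necessarily the procedure of [SBBD00].

```python
import bisect
import math
from typing import List, Sequence

def corr_dim(points: Sequence[Sequence[float]], radii: Sequence[float]) -> float:
    """Correlation f.d.: slope of log(#pairs within r) vs log(r)."""
    dists = sorted(math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:])
    xs = [math.log(r) for r in radii]
    ys = [math.log(bisect.bisect_right(dists, r)) for r in radii]  # assumes counts > 0
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)

def pfd(rows: Sequence[Sequence[float]], attrs: Sequence[int],
        radii: Sequence[float]) -> float:
    """Partial f.d.: the f.d. of the rows projected onto the attribute subset."""
    return corr_dim([tuple(row[a] for a in attrs) for row in rows], radii)

def drop_useless(rows: Sequence[Sequence[float]], n_attrs: int,
                 radii: Sequence[float], tol: float = 0.1) -> List[int]:
    """Greedily drop attributes whose removal lowers the PFD by less than tol."""
    keep = list(range(n_attrs))
    full = pfd(rows, keep, radii)  # global f.d. of the whole data set
    changed = True
    while changed and len(keep) > 1:
        changed = False
        for a in keep:
            rest = [b for b in keep if b != a]
            if full - pfd(rows, rest, radii) < tol:
                keep, changed = rest, True
                break
    return keep
```

For example, with attributes (x, x^2, constant), x^2 adds nothing beyond x and the constant has PFD 0, so only one of x and x^2 survives. A variance-based criterion would instead keep both x and x^2, since both vary.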

Road map
- Motivation: problems / case studies
- Definition of fractals and power laws
- Solutions to posed problems
- More examples
  - fractals
  - power laws
- Conclusions

Disk traffic
Not Poisson, not(?) i.i.d., BUT self-similar. How do we model it?
[Figure: #bytes vs. time.]

Traffic
Disk traces: the 80-20 ‘law’ = ‘multifractal’ [ICDE’02]
[Figure: #bytes vs. time, with a split labeled 20% / 80%.]
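
One simple generator of such 80-20 multifractal traffic is a recursive biased split, often called the b-model; we assume it matches the model behind the [ICDE’02] reference. Each time interval hands a fraction b = 0.8 of its bytes to one half and 0.2 to the other, and the rule repeats at every scale, so the trace stays bursty at any zoom level. The function name and parameters are illustrative.

```python
import random
from typing import List

def b_model(total: float, levels: int, b: float = 0.8, seed: int = 0) -> List[float]:
    """Spread `total` bytes over 2**levels time slots, recursively.

    At every level each interval gives a fraction b of its volume to one half
    and 1 - b to the other (the heavier half is chosen at random), so the
    80-20 split repeats at every time scale.
    """
    rng = random.Random(seed)
    volumes = [total]
    for _ in range(levels):
        nxt: List[float] = []
        for v in volumes:
            big, small = b * v, (1 - b) * v
            nxt.extend([big, small] if rng.random() < 0.5 else [small, big])
        volumes = nxt
    return volumes
```

Setting b = 0.5 gives perfectly smooth traffic; the farther b is from 0.5, the burstier the trace.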

Traffic
Many other time sequences are bursty/clustered (such as?)

Tape accesses
[Figure: accesses over time on Tape #1 … Tape #N.]
How many tapes are needed to retrieve n records?
(Similarly: # days down, due to failures / hurricanes / communication noise ...)

Tape accesses
[Figure: # tapes retrieved vs. # qualifying records, for a Poisson model and for real data.]

More apps: brain scans
Oct-trees on brain scans
[Figure: log(#octants) vs. octree level; slope 2.63 = fd.]
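
The octree plot is a box-counting estimate: at level L, count the occupied cells of side 1/2^L and take the slope of log2(#cells) against L (2.63 for the brain scans; 1 for a line, 2 for a filled square, 3 for a solid). A sketch, assuming the points are already scaled into the unit cube; the names are hypothetical.

```python
import math
from typing import List, Sequence

def box_counts(points: Sequence[Sequence[float]], levels: Sequence[int]) -> List[int]:
    """Occupied cells per level, for points in the unit cube [0, 1)^d.

    Level L uses cells of side 1 / 2**L, like the nodes of a quadtree/octree.
    """
    return [len({tuple(math.floor(x * 2 ** L) for x in p) for p in points}) for L in levels]

def box_dimension(points: Sequence[Sequence[float]], levels: Sequence[int]) -> float:
    """Least-squares slope of log2(#occupied cells) against the level number."""
    xs = list(levels)
    ys = [math.log2(c) for c in box_counts(points, xs)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)
```

Unlike the correlation integral, box counting needs only one pass per level, which is why it fits naturally with octree-based storage.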

Cross-roads of Montgomery County: any rules?
[Figure: GIS points.]

GIS
A: self-similarity; intrinsic dimension = 1.51:
avg #neighbors(<= r) ~ r^D
[Figure: log(#pairs within <= r) vs. log(r); slope 1.51.]

Examples: LB county
Long Beach county of CA (road end-points)

More fractals:
- cardiovascular system: 3 (!)
- stock prices (LYCOS), random walks: 1.5
- coastlines: (?)
[Figure: LYCOS stock price over 1 year and over 2 years.]

Fractals and power laws
- self-similarity -> fractals
- scale-free -> power laws (y = x^a; e.g., F = C*r^(-2))
[Figure: log(#pairs within <= r) vs. log(r); slope 1.58.]

Zipf’s (first) law
Bible: RANK-FREQUENCY plot (in log-log scales)
[Figure: log(freq) vs. log(rank); the top words are “the” and “and”.]

Zipf’s law
Similarly for first names (slope ~ -1), last names (~ -0.7), etc.
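
A rank-frequency fit like the one behind these plots takes a few lines: sort the counts in decreasing order and fit log(freq) against log(rank); Zipf’s law predicts a slope near -1. The whitespace tokenizer and the function names are simplifying assumptions.

```python
import math
from collections import Counter
from typing import Iterable

def zipf_slope(counts: Iterable[float]) -> float:
    """Least-squares slope of log(frequency) vs log(rank)."""
    freqs = sorted((c for c in counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return num / sum((x - mx) ** 2 for x in xs)

def word_counts(text: str) -> Counter:
    """Naive tokenizer: lower-case the text and split on whitespace."""
    return Counter(text.lower().split())

# Usage, with a hypothetical file name:
#   zipf_slope(word_counts(open("bible.txt").read()).values())
```

The same function applies unchanged to first or last names: pass their counts and compare the slope with the values on the slide.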

More power laws
Energy of earthquakes (Gutenberg-Richter law) [simscience.org]
[Figures: log(count) vs. magnitude; amplitude vs. day.]

Web site traffic
[Figure: clickstream data, log(count) vs. log(freq); Zipf-like.]

Lotka’s law
Library science (Lotka’s law of publication count); and citation counts (citeseer.nj.nec.com, 6/2001)
[Figure: log(count) vs. log(#citations); labeled: J. Ullman.]

Korcak’s law
Scandinavian lakes: area vs. complementary cumulative count (log-log axes)
[Figure: log(count(>= area)) vs. log(area).]

More power laws: Korcak
Japan islands: area vs. complementary cumulative count (log-log axes)
[Figure: log(count(>= area)) vs. log(area).]

(Korcak’s law: Aegean islands)
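
Korcak-style plots use the complementary cumulative count, count(>= area), rather than a histogram; a power law then shows up as a straight line in log-log axes. A sketch, with hypothetical function names, that computes those points and their least-squares slope:

```python
import math
from typing import List, Sequence, Tuple

def ccdf(values: Sequence[float]) -> List[Tuple[float, int]]:
    """(v, number of items >= v) for every distinct value v, largest first."""
    xs = sorted(values, reverse=True)
    out = []
    for i, v in enumerate(xs):
        # At the last occurrence of v (descending order), i + 1 items are >= v.
        if i + 1 == len(xs) or xs[i + 1] != v:
            out.append((v, i + 1))
    return out

def ccdf_slope(values: Sequence[float]) -> float:
    """Least-squares slope of log(count(>= area)) vs log(area)."""
    pts = [(math.log(v), math.log(c)) for v, c in ccdf(values) if v > 0]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    num = sum((x - mx) * (y - my) for x, y in pts)
    return num / sum((x - mx) ** 2 for x, _ in pts)
```

Cumulative counts avoid choosing histogram bins, which is why the sparse tail (the few very large lakes or islands) still yields a clean line.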

Olympic medals
[Figure: log(# medals) vs. log(rank); labeled USA, China, Russia.]

SALES data: store #96
[Figure: count of products vs. # units sold.]

TELCO data
[Figure: count of customers vs. # of service units.]

More power laws on the Internet
Degree vs. rank for Internet domains (log-log) [SIGCOMM99]
[Figure: log(degree) vs. log(rank); slope -0.82.]

Even more power laws:
- income distribution (Pareto’s law)
- duration of UNIX jobs [Harchol-Balter]
- distribution of UNIX file sizes
- Web graph [CLEVER-IBM; Barabasi]

Overall conclusions
‘Find similar/interesting things’ in multimedia databases
- Indexing: feature extraction (‘GEMINI’)
  - automatic feature extraction: FastMap
  - relevance feedback: FALCON

Conclusions (cont’d)
New tools for data mining: fractals/power laws
- appear everywhere
- lead to skewed distributions, so the usual Gaussian, Poisson, uniformity and independence assumptions often fail
- ‘correlation integral’ for separability/cluster detection
- PFD for dimensionality reduction

Resources
Software and papers:
- Fractal dimension (FracDim)
- Separability (SIGMOD 2000, KDD 2001)
- Relevance feedback for query by content (FALCON, VLDB 2000)

Resources
Manfred Schroeder, “Fractals, Chaos, Power Laws”