CMU SCS : Multimedia Databases and Data Mining Lecture #30: Conclusions C. Faloutsos
CMU SCS (c) 2012, C. Faloutsos2 Outline Goal: ‘Find similar / interesting things’ Intro to DB Indexing - similarity search –Points –Text –Time sequences; images etc –Graphs Data Mining
CMU SCS (c) 2012, C. Faloutsos3 Indexing - similarity search
CMU SCS (c) 2012, C. Faloutsos4 Indexing - similarity search R-trees z-ordering / hilbert curves M-trees (DON’T FORGET … ) A B C D E F G H I J P1 P2 P3 P4
CMU SCS (c) 2012, C. Faloutsos5 Indexing - similarity search R-trees z-ordering / hilbert curves M-trees beware of high intrinsic dimensionality A B C D E F G H I J P1 P2 P3 P4
CMU SCS (c) 2012, C. Faloutsos6 Outline Goal: ‘Find similar / interesting things’ Intro to DB Indexing - similarity search –Points –Text –Time sequences; images etc –Graphs Data Mining
CMU SCS Text searching ‘find all documents with word bla’ (c) 2012, C. Faloutsos7
CMU SCS Text searching Full text scanning (‘grep’) Inversion (B-tree or hash index) (signature files) Vector space model –Ranked output –Relevance feedback String editing distance (-> dynamic prog.) (c) 2012, C. Faloutsos8
CMU SCS (c) 2012, C. Faloutsos9 Multimedia indexing day 1365 day 1365 S1 Sn
CMU SCS (c) 2012, C. Faloutsos#10 day 1365 day 1365 S1 Sn F(S1) F(Sn) ‘GEMINI’ - Pictorially eg, avg eg,. std
CMU SCS (c) 2012, C. Faloutsos11 Multimedia indexing Feature extraction for indexing (GEMINI) –Lower-bounding lemma, to guarantee no false alarms MDS/FastMap
CMU SCS (c) 2012, C. Faloutsos12 Outline Goal: ‘Find similar / interesting things’ Intro to DB Indexing - similarity search –Points –Text –Time sequences; images etc –Graphs Data Mining
CMU SCS (c) 2012, C. Faloutsos13 Time series & forecasting Goal: given a signal (eg., sales over time and/or space) Find: patterns and/or compress year count lynx caught per year
CMU SCS (c) 2012, C. Faloutsos14 Wavelets t f time value Q: baritone/silence/soprano - DWT?
CMU SCS (c) 2012, C. Faloutsos15 Time series + forecasting Fourier; Wavelets Box/Jenkins and AutoRegression non-linear/chaotic forecasting (fractals again) – ‘Delayed Coordinate Embedding’ ~ nearest neighbors
CMU SCS (c) 2012, C. Faloutsos16 Outline Goal: ‘Find similar / interesting things’ Intro to DB Indexing - similarity search –Points –Text –Time sequences; images etc –Graphs Data Mining
CMU SCS Graphs Real graphs: surprising patterns –?? (c) 2012, C. Faloutsos17
CMU SCS Graphs Real graphs: surprising patterns –‘six degrees’ –Skewed degree distribution (‘rich get richer’) –Super-linearities (2x nodes -> 3x edges ) –Diameter: shrinks (!) –Might have no good cuts (c) 2012, C. Faloutsos18
CMU SCS (c) 2012, C. Faloutsos19 Outline Goal: ‘Find similar / interesting things’ Intro to DB Indexing - similarity search Data Mining
CMU SCS (c) 2012, C. Faloutsos20 Data Mining - DB
CMU SCS (c) 2012, C. Faloutsos21 Data Mining - DB Association Rules (‘diapers’ -> ‘beer’ ) [ OLAP (DataCubes, roll-up, drill-down) ] [ Classifiers ]
CMU SCS (c) 2012, C. Faloutsos22 Taking a step back: We saw some fundamental, recurring concepts and tools:
CMU SCS (c) 2012, C. Faloutsos23 Powerful, recurring tools Fractals/ self similarity –Zipf, Korcak, Pareto’s laws –intrinsic dimension (Sierpinski triangle) –correlation integral –Barnsley’s IFS compression –(Kronecker graphs)
CMU SCS (c) 2012, C. Faloutsos24 Powerful, recurring tools Fractals/ self similarity –Zipf, Korcak, Pareto’s laws –intrinsic dimension (Sierpinski triangle) –correlation integral –Barnsley’s IFS compression –(Kronecker graphs) ‘Take logarithms’ AVOID ‘avg’
CMU SCS (c) 2012, C. Faloutsos25 Powerful, recurring tools SVD (optimal L2 approx) –LSI, KL, PCA, ‘eigenSpokes’, (& in ICA ) –HITS (PageRank) v1 first singular vector
CMU SCS (c) 2012, C. Faloutsos26 Powerful, recurring tools Discrete Fourier Transform Wavelets
CMU SCS (c) 2012, C. Faloutsos27 Powerful, recurring tools Matrix inversion lemma –Recursive Least Squares –Sherman-Morrison(-Woodbury)
CMU SCS (c) 2012, C. Faloutsos28 Summary fractals / power laws probably lead to the most startling discoveries (‘the mean may be meaningless’) SVD: behind PageRank/HITS/tensors/… Wavelets: Nature seems to prefer them RLS: seems to achieve the impossible (approximate counting – NOT in the exam)
CMU SCS (c) 2012, C. Faloutsos29 Thank you! Feel free to contact me: GHC 8019 Reminder: faculty course eval’s: – Final: Thu. Dec 13, A302 –Double-check at – Have a great break!