CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #20: SVD - part III (more case studies) C. Faloutsos.


CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #20: SVD - part III (more case studies) C. Faloutsos

CMU SCS Copyright: C. Faloutsos (2012) Must-read Material: Textbook Appendix D; Kleinberg, J. (1998). Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms; Brin, S. and L. Page (1998). Anatomy of a Large-Scale Hypertextual Web Search Engine. 7th Intl World Wide Web Conf.

Outline Goal: ‘Find similar / interesting things’ Intro to DB Indexing - similarity search Data Mining

Indexing - Detailed outline primary key indexing secondary key / multi-key indexing spatial access methods fractals text Singular Value Decomposition (SVD) multimedia...

SVD - Detailed outline Motivation Definition - properties Interpretation Complexity Case studies SVD properties More case studies Conclusions

SVD - detailed outline... Case studies SVD properties more case studies –google/Kleinberg algorithms –query feedbacks Conclusions

SVD - Other properties - summary can produce orthogonal basis (obvious) (who cares?) can solve over- and under-determined linear problems (see C(1) property) can compute ‘fixed points’ (= ‘steady state prob. in Markov chains’) (see C(4) property)

Properties – sneak preview: A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m]. B(5): (A^T A)^k v’ ~ (constant) v1. C(1): A[n x m] x[m x 1] = b[n x 1]; then, x0 = V Σ^(-1) U^T b: shortest, actual or least-squares solution. C(4): A^T A v1 = λ1^2 v1. IMPORTANT!

SVD - outline of properties (A): obvious (B): less obvious (C): least obvious (and most powerful!)

Properties - by defn.: A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m]. A(1): U^T[r x n] U[n x r] = I[r x r] (identity matrix). A(2): V^T[r x m] V[m x r] = I[r x r]. A(3): Σ^k = diag(λ1^k, λ2^k, ..., λr^k) (k: ANY real number). A(4): A^T = V Σ U^T.

Less obvious properties: A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m]. B(1): A[n x m] (A^T)[m x n] = ??

Less obvious properties: A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m]. B(1): A[n x m] (A^T)[m x n] = U Σ^2 U^T: symmetric; Intuition?

Less obvious properties: A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m]. B(1): A[n x m] (A^T)[m x n] = U Σ^2 U^T: symmetric; Intuition? ‘document-to-document’ similarity matrix. B(2): symmetrically, for ‘V’: (A^T)[m x n] A[n x m] = V Σ^2 V^T. Intuition?

Reminder: ‘column orthonormal’: V^T V = I[r x r] (figure: two column vectors v1, v2, with v1^T v1 = 1 and v1^T v2 = 0)

Less obvious properties A: ‘term-to-term’ similarity matrix. B(3): ((A^T)[m x n] A[n x m])^k = V Σ^2k V^T, and B(4): (A^T A)^k ~ v1 λ1^2k v1^T for k >> 1, where v1: [m x 1] first column (singular vector) of V, and λ1: strongest singular value

Proof of (B4)?

Less obvious properties: B(4): (A^T A)^k ~ v1 λ1^2k v1^T for k >> 1. B(5): (A^T A)^k v’ ~ (constant) v1, ie., for (almost) any v’, it converges to a vector parallel to v1. Thus, useful to compute the first singular vector/value (as well as the next ones, too...)
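Property B(5) is exactly what power iteration exploits; here is a minimal numpy sketch (the matrix and starting vector are random, purely for illustration, not from the slides):

```python
# Power iteration via B(5): repeatedly applying (A^T A) to an (almost)
# arbitrary start vector converges to the top right singular vector v1 of A.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))      # any n x m matrix

v = rng.standard_normal(3)           # (almost) any starting v'
for _ in range(500):
    v = A.T @ (A @ v)                # one (A^T A) step
    v /= np.linalg.norm(v)           # re-normalize to avoid overflow

_, _, Vt = np.linalg.svd(A)          # compare against v1 (sign may flip)
v1 = Vt[0]
print(min(np.linalg.norm(v - v1), np.linalg.norm(v + v1)))  # tiny
```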

Proof of (B5)?

Property (B5) Intuition (figure sequence: a users x products matrix A, with v’ = ‘Smith’s’ preference vector): A v’ gives the similarities of the other users to Smith; (A^T A) v’ gives what Smith’s ‘friends’ like; (A^T A)^k v’ gives what k-step-away friends like (ie., after k steps, we get what everybody likes, and Smith’s initial opinions don’t count)

Less obvious properties - repeated: A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m]. B(1): A[n x m] (A^T)[m x n] = U Σ^2 U^T. B(2): (A^T)[m x n] A[n x m] = V Σ^2 V^T. B(3): ((A^T)[m x n] A[n x m])^k = V Σ^2k V^T. B(4): (A^T A)^k ~ v1 λ1^2k v1^T. B(5): (A^T A)^k v’ ~ (constant) v1.

Least obvious properties: A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m]. C(1): A[n x m] x[m x 1] = b[n x 1]; let x0 = V Σ^(-1) U^T b: if under-specified, x0 gives the ‘shortest’ solution; if over-specified, it gives the ‘solution’ with the smallest least-squares error (see Num. Recipes, p. 62)

Least obvious properties: A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m]. C(1): A[n x m] x[m x 1] = b[n x 1]; let x0 = V Σ^(-1) U^T b (figure: the factorization U Σ V^T x ~ b, drawn as rectangles)

Slowly (figure sequence): starting from U Σ V^T x ~ b, multiply both sides by U^T (U is column-orthonormal, so U^T U = I, the identity), then by Σ^(-1), then by V, arriving at x = V Σ^(-1) U^T b

Least obvious properties Illustration: under-specified, eg [1 2] [w z]^T = 4 (ie, 1 w + 2 z = 4) (figure: in the (w, z) plane, the line of all possible solutions, with x0 the shortest-length solution)

Exercise - verify formula: A = [1 2], b = [4]; A = U Σ V^T; U = ??, Σ = ??, V = ??; x0 = V Σ^(-1) U^T b

Exercise - verify formula: A = [1 2], b = [4]; A = U Σ V^T; U = [1], Σ = [sqrt(5)], V = [1/sqrt(5) 2/sqrt(5)]^T; x0 = V Σ^(-1) U^T b

Exercise - verify formula: A = [1 2], b = [4]; A = U Σ V^T; U = [1], Σ = [sqrt(5)], V = [1/sqrt(5) 2/sqrt(5)]^T; x0 = V Σ^(-1) U^T b = [1/5 2/5]^T [4] = [4/5 8/5]^T: w = 4/5, z = 8/5

Exercise - verify formula: show that w = 4/5, z = 8/5 is (a) a solution to 1*w + 2*z = 4, and (b) minimal (wrt Euclidean norm)

Exercise - verify formula: show that w = 4/5, z = 8/5 is (a) a solution to 1*w + 2*z = 4 (A: easy), and (b) minimal wrt Euclidean norm (A: [4/5 8/5] is perpendicular to [2 -1], the direction of the solution line)
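The under-specified exercise can also be checked numerically; a small numpy sketch of property C(1) on the system 1w + 2z = 4:

```python
# Checking C(1) on the under-specified system [1 2] [w z]^T = [4].
import numpy as np

A = np.array([[1.0, 2.0]])
b = np.array([4.0])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
x0 = Vt.T @ np.diag(1.0 / s) @ U.T @ b     # x0 = V Sigma^(-1) U^T b

print(x0)                                  # ~[0.8 1.6], ie., [4/5, 8/5]
print(A @ x0)                              # ~[4.]: an actual solution
print(x0 @ np.array([2.0, -1.0]))          # ~0: perpendicular to the solution line
```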

Least obvious properties – cont’d Illustration: over-specified, eg [3 2]^T [w] = [1 2]^T (ie, 3 w = 1; 2 w = 2) (figure: the line of reachable points (3w, 2w), and the desirable point b off that line)

Exercise - verify formula: A = [3 2]^T, b = [1 2]^T; A = U Σ V^T; U = ??, Σ = ??, V = ??; x0 = V Σ^(-1) U^T b

Exercise - verify formula: A = [3 2]^T, b = [1 2]^T; A = U Σ V^T; U = [3/sqrt(13) 2/sqrt(13)]^T, Σ = [sqrt(13)], V = [1]; x0 = V Σ^(-1) U^T b = [7/13]

Exercise - verify formula: [3 2]^T [7/13] = [21/13 14/13]^T -> the ‘red point’: the reachable point (3w, 2w) closest to the desirable point b = [1 2]^T

Exercise - verify formula: [3 2]^T [7/13] = [21/13 14/13]^T -> ‘red point’; is the residual perpendicular to the line of reachable points (3w, 2w)?

Exercise - verify formula: A: [3 2] . ([1 2] – [21/13 14/13]) = [3 2] . [-8/13 12/13] ~ [3 2] . [-2 3] = -6 + 6 = 0, so yes: the residual is perpendicular to the reachable line
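The over-specified exercise, again as a short numpy check of property C(1):

```python
# Checking C(1) on the over-specified system [3 2]^T [w] = [1 2]^T.
import numpy as np

A = np.array([[3.0], [2.0]])
b = np.array([1.0, 2.0])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
x0 = Vt.T @ np.diag(1.0 / s) @ U.T @ b     # x0 = V Sigma^(-1) U^T b

print(x0)                                  # ~[0.538...], ie., [7/13]
residual = b - A @ x0                      # ~[-8/13, 12/13]
print(residual @ A[:, 0])                  # ~0: perpendicular to the reachable line
```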

Least obvious properties - cont’d: A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m]. C(2): A[n x m] v1[m x 1] = λ1 u1[n x 1], where v1, u1: the first (column) vectors of V, U (v1 == right singular vector). C(3): symmetrically, u1^T A = λ1 v1^T (u1 == left singular vector). Therefore:

Least obvious properties - cont’d: A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m]. C(4): A^T A v1 = λ1^2 v1 (fixed point - the dfn of eigenvector for a symmetric matrix)

Least obvious properties - altogether: A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m]. C(1): A[n x m] x[m x 1] = b[n x 1]; then, x0 = V Σ^(-1) U^T b: shortest, actual or least-squares solution. C(2): A[n x m] v1[m x 1] = λ1 u1[n x 1]. C(3): u1^T A = λ1 v1^T. C(4): A^T A v1 = λ1^2 v1.

Properties - conclusions: A(0): A[n x m] = U[n x r] Σ[r x r] V^T[r x m]. B(5): (A^T A)^k v’ ~ (constant) v1. C(1): A[n x m] x[m x 1] = b[n x 1]; then, x0 = V Σ^(-1) U^T b: shortest, actual or least-squares solution. C(4): A^T A v1 = λ1^2 v1.

SVD - detailed outline... Case studies SVD properties more case studies –Kleinberg/google algorithms –query feedbacks Conclusions

Kleinberg’s algo (HITS) Kleinberg, Jon (1998). Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms.

Kleinberg’s algorithm Problem dfn: given the web and a query, find the most ‘authoritative’ web pages for this query Step 0: find all pages containing the query terms Step 1: expand by one move forward and backward

Kleinberg’s algorithm Step 1: expand by one move forward and backward

Kleinberg’s algorithm: on the resulting graph, give a high ‘authority’ score to nodes that many important nodes point to; give a high importance score (‘hub’) to nodes that point to good ‘authorities’ (figure: hubs pointing to authorities)

Kleinberg’s algorithm - observations: recursive definition! each node (say, the ‘i’-th node) has both an authoritativeness score a_i and a hubness score h_i

Kleinberg’s algorithm: Let E be the set of edges and A be the adjacency matrix: the (i,j) entry is 1 if the edge from i to j exists. Let h and a be [n x 1] vectors with the ‘hubness’ and ‘authoritativeness’ scores. Then:

Kleinberg’s algorithm Then: a_i = h_k + h_l + h_m, that is, a_i = Sum(h_j) over all j such that the edge (j,i) exists; or a = A^T h (figure: nodes k, l, m all pointing to node i)

Kleinberg’s algorithm symmetrically, for the ‘hubness’: h_i = a_n + a_p + a_q, that is, h_i = Sum(a_j) over all j such that the edge (i,j) exists; or h = A a (figure: node i pointing to nodes n, p, q)

Kleinberg’s algorithm In conclusion, we want vectors h and a such that: h = A a; a = A^T h. Recall properties: C(2): A[n x m] v1[m x 1] = λ1 u1[n x 1]; C(3): u1^T A = λ1 v1^T

Kleinberg’s algorithm In short, the solutions to h = A a; a = A^T h are the left- and right-singular vectors of the adjacency matrix A. Starting from random a’ and iterating, we’ll eventually converge (Q: to which of all the singular vectors? why?)

Kleinberg’s algorithm (Q: to which of all the singular vectors? why?) A: to the ones of the strongest singular value, because of property B(5): (A^T A)^k v’ ~ (constant) v1
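The whole HITS iteration fits in a few numpy lines; the 4-page adjacency matrix below is made up for illustration, and B(5) predicts where the iteration converges:

```python
# HITS on a toy directed graph: iterate a = A^T h, h = A a; by B(5),
# authorities/hubs converge to the top right/left singular vectors of A.
import numpy as np

A = np.array([[0, 1, 1, 0],   # entry (i, j) = 1 means page i links to page j
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 0]], dtype=float)

n = A.shape[0]
a = np.ones(n)                # authority scores
h = np.ones(n)                # hub scores
for _ in range(100):
    a = A.T @ h               # a_i = sum of h_j over existing edges (j, i)
    h = A @ a                 # h_i = sum of a_j over existing edges (i, j)
    a /= np.linalg.norm(a)
    h /= np.linalg.norm(h)

U, s, Vt = np.linalg.svd(A)
print(np.allclose(np.abs(a), np.abs(Vt[0]), atol=1e-6))   # authorities ~ v1
print(np.allclose(np.abs(h), np.abs(U[:, 0]), atol=1e-6)) # hubs ~ u1
```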

Kleinberg’s algorithm - results Eg., for the query ‘java’: java.sun.com (“the java developer”)

Kleinberg’s algorithm - discussion: the ‘authority’ score can be used to find ‘similar pages’ (how?); closely related to ‘citation analysis’, social networks / ‘small world’ phenomena

SVD - detailed outline... Case studies SVD properties more case studies –Kleinberg/google algorithms –query feedbacks Conclusions

PageRank (google) Brin, Sergey and Lawrence Page (1998). Anatomy of a Large-Scale Hypertextual Web Search Engine. 7th Intl World Wide Web Conf. (photos: Larry Page, Sergey Brin)

Problem: PageRank Given a directed graph, find its most interesting/central node A node is important, if it is connected with important nodes (recursive, but OK!)

Problem: PageRank - solution Given a directed graph, find its most interesting/central node Proposed solution: Random walk; spot most ‘popular’ node (-> steady state prob. (ssp)) A node has high ssp, if it is connected with high-ssp nodes (recursive, but OK!)

(Simplified) PageRank algorithm Let A be the adjacency matrix; let B be the transition matrix: transpose, column-normalized (figure: an example graph with its ‘To/From’ matrix B)

(Simplified) PageRank algorithm: B p = p (figure: the equation drawn on the example matrix)

(Simplified) PageRank algorithm B p = 1 * p; thus, p is the eigenvector that corresponds to the highest eigenvalue (= 1, since the matrix is column-normalized). Why does such a p exist? p exists if B is n x n, nonnegative, irreducible [Perron–Frobenius theorem]

(Simplified) PageRank algorithm In short: imagine a particle randomly moving along the edges; compute its steady-state probabilities (ssp). Full version of algo: with occasional random jumps. Why? To make the matrix irreducible.

Full Algorithm With probability 1-c, fly out to a random node. Then, we have: p = c B p + (1-c)/n 1 => p = (1-c)/n [I - c B]^(-1) 1
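A hedged numpy sketch of the full algorithm on a made-up 4-node graph, checking the closed form p = (1-c)/n [I - cB]^(-1) 1 against straightforward iteration:

```python
# Full PageRank: closed form vs. iterating p <- c B p + (1-c)/n 1.
import numpy as np

A = np.array([[0, 1, 1, 0],   # adjacency: entry (i, j) = 1 means edge i -> j
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

B = A.T / A.sum(axis=1)       # transition matrix: transpose, column-normalized
n = B.shape[0]
c = 0.85                      # probability of following a link (1-c: fly-out)

p_closed = (1 - c) / n * np.linalg.solve(np.eye(n) - c * B, np.ones(n))

p = np.ones(n) / n
for _ in range(200):
    p = c * B @ p + (1 - c) / n

print(np.allclose(p, p_closed))   # True
print(p_closed)                   # PageRank scores; they sum to 1
```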

Alternative notation – eigenvector viewpoint Modified transition matrix M: M = c B + (1-c)/n 1 1^T. Then p = M p. That is: the steady-state probabilities = PageRank scores form the first eigenvector of the ‘modified transition matrix’ M (figure: the fly-out term 1 1^T links every pair of nodes, like a clique)

Parenthesis: intuition behind eigenvectors Definition, 3 properties, intuition

Formal definition If A is a (n x n) square matrix, (λ, x) is an eigenvalue/eigenvector pair of A if A x = λ x. CLOSELY related to singular values:

Property #1: Eigen- vs singular-values if B[n x m] = U[n x r] Σ[r x r] (V[m x r])^T, then A = (B^T B) is symmetric, and C(4): B^T B v_i = λ_i^2 v_i; ie, v1, v2, ...: eigenvectors of A = (B^T B)

Property #2 If A[n x n] is a real, symmetric matrix, then it has n real eigenvalues (if A is not symmetric, some eigenvalues may be complex)

Property #3 If A[n x n] is a real, symmetric matrix, then it has n real eigenvalues, and they agree with its n singular values, except possibly for the sign

Parenthesis: intuition behind eigenvectors Definition, 3 properties, intuition

Intuition A as vector transformation: x’ = A x (figure: A maps the vector x to a rotated/scaled x’)

Intuition By defn., eigenvectors remain parallel to themselves (‘fixed points’): A v1 = 3.62 * v1 (here λ1 = 3.62)
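A quick numpy check of the ‘fixed point’ intuition, on a made-up symmetric matrix (chosen so its top eigenvalue happens to be about 3.62, like the figure's; the matrix itself is an assumption, not from the slides):

```python
# A generic vector changes direction under A; an eigenvector is only scaled.
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])    # made-up symmetric matrix; lambda1 = (5+sqrt(5))/2 ~ 3.62

w, V = np.linalg.eigh(A)      # eigh: eigenvalues in ascending order
v1 = V[:, -1]                 # eigenvector of the largest eigenvalue
print(np.allclose(A @ v1, w[-1] * v1))   # True: A v1 = lambda1 v1

x = np.array([1.0, 0.0])      # a non-eigenvector: A x is not parallel to x
print(np.linalg.det(np.column_stack([A @ x, x])))   # nonzero -> not parallel
```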

Convergence Usually, fast:

Convergence Usually, fast: it depends on the ratio λ1 : λ2

Closing the parenthesis wrt intuition behind eigenvectors

Kleinberg/PageRank - conclusions SVD helps in graph analysis: hub/authority scores: strongest left- and right-singular vectors of the adjacency matrix; random walk on a graph: steady-state probabilities are given by the strongest eigenvector of the transition matrix

SVD - detailed outline... Case studies SVD properties more case studies –google/Kleinberg algorithms –query feedbacks Conclusions

Query feedbacks [Chen & Roussopoulos, SIGMOD 94] Sample problem: estimate selectivities (e.g., ‘how many movies were made between 1940 and 1945?’) for query optimization, LEARNING from the query results so far!!

Query feedbacks Given: past queries and their results –#movies(1925,1935) = 52 –#movies(1948,1990) = 123 –… and a new query, say #movies(1979,1980)? Give your best estimate (figure: histogram of #movies per year)

Query feedbacks Idea #1: consider a function for the CDF (cumulative distribution function), eg., a 6-th degree polynomial (or splines, or anything else) (figure: count-so-far vs. year, with the corresponding PDF)

Query feedbacks For example F(x) = # movies made until year ‘x’ = a1 + a2 * x + a3 * x^2 + … + a7 * x^6

Query feedbacks GREAT idea #2: adapt your model, as you see the actual counts of the actual queries (figure sequence: count-so-far vs. year, showing the original estimate, the actual counts revealed by a query, and the new, adapted estimate)

Query feedbacks Eventually, the problem becomes: estimate the parameters a1, ..., a7 of the model, to minimize the least-squares error from the real answers so far. Formally:

Query feedbacks Formally, with n queries and 6-th degree polynomials: X[n x 7] a[7 x 1] = b[n x 1]

Query feedbacks where the x_{j,i} are such that Sum_i (x_{j,i} * a_i) = our estimate for the # of movies of the j-th query, and b_j: the actual answer

Query feedbacks For example, for the query ‘find the count of movies during (1920, 1932)’: a1 + a2 * 1932 + a3 * 1932^2 + … - (a1 + a2 * 1920 + a3 * 1920^2 + …) = the actual count

Query feedbacks And thus x_{1,1} = 0 (the a1 terms cancel); x_{1,2} = 1932 - 1920 = 12, etc., from a1 + a2 * 1932 + a3 * 1932^2 + … - (a1 + a2 * 1920 + a3 * 1920^2 + …)

Query feedbacks In matrix form: X a = b, with one row per query (1st query, ..., n-th query)

Query feedbacks In matrix form: X a = b, and the least-squares estimate for a is a = V Σ^(-1) U^T b, according to property C(1) (where X = U Σ V^T)
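A minimal numpy sketch of the whole fit, with made-up queries and counts; rescaling the years to [0, 1] keeps the 6-th degree powers well-conditioned (that rescaling is an added assumption, not from the slides):

```python
# Fit the polynomial-CDF coefficients a from past range queries via SVD (C(1)).
import numpy as np

DEG = 6                                        # 7 coefficients a1..a7

def row(lo, hi):
    # X row for query (lo, hi): estimate = F(hi) - F(lo) = sum_i row[i] * a[i]
    t_lo, t_hi = (lo - 1900) / 100.0, (hi - 1900) / 100.0
    return np.array([t_hi**k - t_lo**k for k in range(DEG + 1)])

queries = [(1925, 1935), (1948, 1990), (1920, 1932)]   # hypothetical past queries
counts = np.array([52.0, 123.0, 30.0])                 # hypothetical answers b

X = np.array([row(lo, hi) for lo, hi in queries])
U, s, Vt = np.linalg.svd(X, full_matrices=False)
a = Vt.T @ np.diag(1.0 / s) @ U.T @ counts             # a = V Sigma^(-1) U^T b

print(np.allclose(X @ a, counts))   # True: the model reproduces the answers seen so far
```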

Query feedbacks - enhancements The solution a = V Σ^(-1) U^T b works, but needs an expensive SVD each time a new query arrives. GREAT Idea #3: use ‘Recursive Least Squares’ (RLS), to adapt a incrementally. Details: in the paper; intuition:

Query feedbacks - enhancements Intuition (figure sequence: data points b vs. x with the least-squares fit a1 x + a2; a new query point arrives; the fit is updated to a’1 x + a’2)

Query feedbacks - enhancements the new coefficients can be quickly computed from the old ones, plus statistics in a (7x7) matrix (no need to know the details, although RLS is a brilliant method)

Query feedbacks - enhancements GREAT idea #4: ‘forgetting’ factor - we can even down-play the weight of older queries, since the data distribution might have changed (comes for ‘free’ with RLS...)
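A hedged sketch of RLS with a forgetting factor, on made-up noiseless data; these are the standard textbook RLS update equations, not taken from the Chen & Roussopoulos paper:

```python
# Recursive Least Squares with forgetting factor lam: update the fit
# a1*x + a2 one observation at a time, with no SVD re-run.
import numpy as np

lam = 0.95                      # forgetting factor; lam = 1 never forgets
a = np.zeros(2)                 # coefficients [a1, a2]
P = np.eye(2) * 1e6             # large initial 'covariance': know nothing yet

def rls_update(a, P, x, b):
    phi = np.array([x, 1.0])               # regressor of a1*x + a2
    k = P @ phi / (lam + phi @ P @ phi)    # gain vector
    a = a + k * (b - phi @ a)              # correct by the prediction error
    P = (P - np.outer(k, phi @ P)) / lam   # discount old information
    return a, P

for x in np.linspace(0.0, 10.0, 50):
    a, P = rls_update(a, P, x, 2.0 * x + 1.0)   # stream points on the line 2x + 1

print(a)    # ~[2, 1]
```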

Query feedbacks - enhancements Intuition (figure: with forgetting, the updated fit a’’1 x + a’’2 follows the newer points more closely than a’1 x + a’2)

Query feedbacks - conclusions SVD helps find the Least Squares solution, to adapt to query feedbacks (RLS = Recursive Least Squares is a great method to incrementally update least-squares fits)

SVD - detailed outline... Case studies SVD properties more case studies –google/Kleinberg algorithms –query feedbacks Conclusions

Conclusions SVD: a valuable tool given a document-term matrix, it finds ‘concepts’ (LSI)... and can reduce dimensionality (KL)... and can find rules (PCA; RatioRules)

Conclusions cont’d... and can find fixed points or steady-state probabilities (google / Kleinberg / Markov Chains)... and can solve optimally over- and under-constrained linear systems (least squares / query feedbacks)

References Brin, S. and L. Page (1998). Anatomy of a Large-Scale Hypertextual Web Search Engine. 7th Intl World Wide Web Conf. Chen, C. M. and N. Roussopoulos (May 1994). Adaptive Selectivity Estimation Using Query Feedback. Proc. of the ACM-SIGMOD, Minneapolis, MN.

References cont’d Kleinberg, J. (1998). Authoritative sources in a hyperlinked environment. Proc. 9th ACM-SIAM Symposium on Discrete Algorithms. Press, W. H., S. A. Teukolsky, et al. (1992). Numerical Recipes in C, Cambridge University Press.