Presentation is loading. Please wait.

Presentation is loading. Please wait.

Mathematical Foundations for Search and Surf Engines.

Similar presentations


Presentation on theme: "Mathematical Foundations for Search and Surf Engines."— Presentation transcript:

1 Mathematical Foundations for Search and Surf Engines

2 Jean-Louis Lassez

3 Ryan Rossi

4 Lecture 1 & 2 Introduction Ergodic Theorem Perron-Frobenius Theorem Power Method Foundations of PageRank

5 Search Engines Search engine: deterministic Ranking of sites Mathematical foundations: Markov and Perron Frobenius Surf Engines Non deterministic: window shopping Ranking of links Mathematical foundations: Singular Value Decomposition

6 Initial approach to ranking sites: the in-degree heuristic 1 2 67 5 3 8 4

7 Google’s PageRank approach Problem: rank the sites in order of most visited

8 Another Approach: Hubs and Authorities Authorities: sites that contain the most important information Hubs: sites that provide directions to the authorities A H

9 Ranking Hyperlinks Local Global Update

10 Local ….. ? ? ?

11 Global ?? …..

12 2005 A C DF E B G H J I K 2006 L Updated

13 Part I: Ranking Sites The Web as a Markov Chain The Ergodic Theorem Perron-Frobenius Theorem Algorithms: PageRank, HITS, SALSA

14 Markov Chains Text Analysis Speech Recognition Statistical Mechanics Decision Science: Medicine, Commerce ….more recently…..

15 Bioinformatics M1M1 M2M2 M3M3 d1d1 I1I1 d2d2 I2I2 d1d1 I1I1

16 Internet

17 Markov’s Ergodic Theorem (1906) Any irreducible, finite, aperiodic Markov Chain has all states Ergodic (reachable at any time in the future) and has a unique stationary distribution, which is a probability vector.

18 s1s1.45 1 s2s2 s3s3 s4s4.25.6.75.55.4 Probabilities after 15 steps X1 =.33 X2 =.26 X3 =.26 X4 =.13 30 steps X1 =.33 X2 =.26 X3 =.23 X4 =.16 100 steps X1 =.37 X2 =.29 X3 =.21 X4 =.13 500 steps X1 =.38 X2 =.28 X3 =.21 X4 =.13 1000 steps X1 =.38 X2 =.28 X3 =.21 X4 =.13

19 s1s1.45 1 s2s2 s3s3 s4s4.25.6.75.55.4 Probabilities after 15 steps X1 =.46 X2 =.20 X3 =.26 X4 =.06 30 steps X1 =.36 X2 =.26 X3 =.23 X4 =.13 100 steps X1 =.38 X2 =.28 X3 =.21 X4 =.13 500 steps X1 =.38 X2 =.28 X3 =.21 X4 =.13 1000 steps X1 =.38 X2 =.28 X3 =.21 X4 =.13

20 p 11 x 1 + p 21 x 2 + p 31 x 3 = x 1 p 12 x 1 + p 22 x 2 + p 32 x 3 = x 2 p 13 x 1 + p 23 x 2 + p 33 x 3 = x 3 ∑x i = 1 x i ≥ 0 p 11 x 1 + p 21 x 2 + p 31 x 3 = x 1 p 12 x 1 + p 22 x 2 + p 32 x 3 = x 2 p 13 x 1 + p 23 x 2 + p 33 x 3 = x 3 ∑x i = 1 x i ≥ 0 X1, X2, X3 are probabilities p 11 x 1 + p 21 x 2 + p 31 x 3 = x 1 p 12 x 1 + p 22 x 2 + p 32 x 3 = x 2 p 13 x 1 + p 23 x 2 + p 33 x 3 = x 3 ∑x i = 1 x i ≥ 0 Probability of going to S1 from S2 Steady State Constraints

21 Solved by Power Iteration Method Based on Perron-Frobenius Theorem Where e is an initial vector and M is the stochastic matrix associated to the system.

22 Power Method Example M = 0 1 0 0 1/2 0 1/2 0 0 1/2 1 0 0 0 1 43 2 e = 1 0.3636 0.1818 0.0909

23 Power Method Example M = 0 1 0 0 1/2 0 1/2 0 0 1/2 1 0 0 0 1 43 2 e = 2 100 7 3 0.3636 0.1818 0.0909

24 Difficulties Proof, Design and Implementation Notion of convergence Deal with complex numbers Unique solution ……… Furthermore, it does not always work

25 The process does not converge, even though the solution is obvious. 16 34 52

26 Perron-Frobenius Theorem: Is it really necessary?

27 Secret Weapon: Fourier Gauss Tarski Robinson The Power of Elimination

28 The elimination of the proof is an ideal seldom reached Elimination gives us the heart of the proof

29 Symbolic Gaussian Elimination Three variables: p 11 x 1 + p 21 x 2 + p 31 x 3 = x 1 p 12 x 1 + p 22 x 2 + p 32 x 3 = x 2 p 13 x 1 + p 23 x 2 + p 33 x 3 = x 3 ∑x i = 1 with Maple we get: x 1 = (p 31 p 21 + p 31 p 23 + p 32 p 21 ) / Σ x 2 = (p 13 p 32 + p 12 p 31 + p 12 p 32 ) / Σ x 3 = (p 13 p 21 + p 12 p 23 + p 13 p 23 ) / Σ Σ = (p 31 p 21 + p 31 p 23 + p 32 p 21 + p 13 p 32 + p 12 p 31 + p 12 p 32 + p 13 p 21 + p 12 p 23 + p 13 p 23 )

30 s1s1 p 12 p 21 p 11 s2s2 p 22 s3s3 p 33 p 13 p 23 p 31 p 32 x 1 = (p 31 p 21 + p 23 p 31 + p 32 p 21 ) / Σ x 2 = (p 13 p 32 + p 31 p 12 + p 12 p 32 ) / Σ x 3 = (p 21 p 13 + p 12 p 23 + p 13 p 23 ) / Σ

31 s1s1 p 12 p 21 p 11 s2s2 p 22 s3s3 p 33 p 13 p 23 p 31 p 32 x 1 = (p 31 p 21 + p 23 p 31 + p 32 p 21 ) / Σ x 2 = (p 13 p 32 + p 31 p 12 + p 12 p 32 ) / Σ x 3 = (p 21 p 13 + p 12 p 23 + p 13 p 23 ) / Σ

32 s1s1 p 12 p 21 p 11 s2s2 p 22 s3s3 p 33 p 13 p 23 p 31 p 32 x 1 = (p 31 p 21 + p 23 p 31 + p 32 p 21 ) / Σ x 2 = (p 13 p 32 + p 31 p 12 + p 12 p 32 ) / Σ x 3 = (p 21 p 13 + p 12 p 23 + p 13 p 23 ) / Σ

33 s1s1 p 12 p 21 p 11 s2s2 p 22 s3s3 p 33 p 13 p 23 p 31 p 32 x 1 = (p 31 p 21 + p 23 p 31 + p 32 p 21 ) / Σ x 2 = (p 13 p 32 + p 31 p 12 + p 12 p 32 ) / Σ x 3 = (p 21 p 13 + p 12 p 23 + p 13 p 23 ) / Σ

34 s1s1 p 12 p 21 p 11 s2s2 p 22 s3s3 p 33 p 13 p 23 p 31 p 32 x 1 = (p 31 p 21 + p 23 p 31 + p 32 p 21 ) / Σ x 2 = (p 13 p 32 + p 31 p 12 + p 12 p 32 ) / Σ x 3 = (p 21 p 13 + p 12 p 23 + p 13 p 23 ) / Σ

35 s1s1 p 12 p 21 p 11 s2s2 p 22 s3s3 p 33 p 13 p 23 p 31 p 32 x 1 = (p 31 p 21 + p 23 p 31 + p 32 p 21 ) / Σ x 2 = (p 13 p 32 + p 31 p 12 + p 12 p 32 ) / Σ x 3 = (p 21 p 13 + p 12 p 23 + p 13 p 23 ) / Σ

36 Symbolic Gaussian Elimination System with four variables: p 21 x 2 + p 31 x 3 + p 41 x 4 = x 1 p 12 x 1 + p 42 x 4 = x 2 p 13 x 1 = x 3 p 34 x 3 = x 4 ∑x i = 1

37 s1s1 p 12 p 21 s2s2 s3s3 p 31 s4s4 p 41 p 34 p 42 p 13 x 1 = p 21 p 34 p 41 + p 34 p 42 p 21 + p 21 p 31 p 41 + p 31 p 42 p 21 / Σ x 2 = p 31 p 41 p 12 + p 31 p 42 p 12 + p 34 p 41 p 12 + p 34 p 42 p 12 + p 13 p 34 p 42 / Σ x 3 = p 41 p 21 p 13 + p 42 p 21 p 13 / Σ x 4 = p 21 p 13 p 34 / Σ Σ = p 21 p 34 p 41 + p 34 p 42 p 21 + p 21 p 31 p 41 + p 31 p 42 p 21 + p 31 p 41 p 12 + p 31 p 42 p 12 + p 34 p 41 p 12 + p 34 p 42 p 12 + p 13 p 34 p 42 + p 41 p 21 p 13 + p 42 p 21 p 13 + p 21 p 13 p 34 with Maple we get:

38 s1s1 p 12 p 21 s2s2 s3s3 p 31 s4s4 p 41 p 34 p 42 p 13 x 1 = p 21 p 34 p 41 + p 34 p 42 p 21 + p 21 p 31 p 41 + p 31 p 42 p 21 / Σ x 2 = p 31 p 41 p 12 + p 31 p 42 p 12 + p 34 p 41 p 12 + p 34 p 42 p 12 + p 13 p 34 p 42 / Σ x 3 = p 41 p 21 p 13 + p 42 p 21 p 13 / Σ x 4 = p 21 p 13 p 34 / Σ Σ = p 21 p 34 p 41 + p 34 p 42 p 21 + p 21 p 31 p 41 + p 31 p 42 p 21 + p 31 p 41 p 12 + p 31 p 42 p 12 + p 34 p 41 p 12 + p 34 p 42 p 12 + p 13 p 34 p 42 + p 41 p 21 p 13 + p 42 p 21 p 13 + p 21 p 13 p 34

39 s1s1 p 12 p 21 s2s2 s3s3 p 31 s4s4 p 41 p 34 p 42 p 13 x 1 = p 21 p 34 p 41 + p 34 p 42 p 21 + p 21 p 31 p 41 + p 31 p 42 p 21 / Σ x 2 = p 31 p 41 p 12 + p 31 p 42 p 12 + p 34 p 41 p 12 + p 34 p 42 p 12 + p 13 p 34 p 42 / Σ x 3 = p 41 p 21 p 13 + p 42 p 21 p 13 / Σ x 4 = p 21 p 13 p 34 / Σ Σ = p 21 p 34 p 41 + p 34 p 42 p 21 + p 21 p 31 p 41 + p 31 p 42 p 21 + p 31 p 41 p 12 + p 31 p 42 p 12 + p 34 p 41 p 12 + p 34 p 42 p 12 + p 13 p 34 p 42 + p 41 p 21 p 13 + p 42 p 21 p 13 + p 21 p 13 p 34

40 s1s1 p 12 p 21 s2s2 s3s3 p 31 s4s4 p 41 p 34 p 42 p 13 x 1 = p 21 p 34 p 41 + p 34 p 42 p 21 + p 21 p 31 p 41 + p 31 p 42 p 21 / Σ x 2 = p 31 p 41 p 12 + p 31 p 42 p 12 + p 34 p 41 p 12 + p 34 p 42 p 12 + p 13 p 34 p 42 / Σ x 3 = p 41 p 21 p 13 + p 42 p 21 p 13 / Σ x 4 = p 21 p 13 p 34 / Σ Σ = p 21 p 34 p 41 + p 34 p 42 p 21 + p 21 p 31 p 41 + p 31 p 42 p 21 + p 31 p 41 p 12 + p 31 p 42 p 12 + p 34 p 41 p 12 + p 34 p 42 p 12 + p 13 p 34 p 42 + p 41 p 21 p 13 + p 42 p 21 p 13 + p 21 p 13 p 34

41 s1s1 p 12 p 21 s2s2 s3s3 p 31 s4s4 p 41 p 34 p 42 p 13 x 1 = p 21 p 34 p 41 + p 34 p 42 p 21 + p 21 p 31 p 41 + p 31 p 42 p 21 / Σ x 2 = p 31 p 41 p 12 + p 31 p 42 p 12 + p 34 p 41 p 12 + p 34 p 42 p 12 + p 13 p 34 p 42 / Σ x 3 = p 41 p 21 p 13 + p 42 p 21 p 13 / Σ x 4 = p 21 p 13 p 34 / Σ Σ = p 21 p 34 p 41 + p 34 p 42 p 21 + p 21 p 31 p 41 + p 31 p 42 p 21 + p 31 p 41 p 12 + p 31 p 42 p 12 + p 34 p 41 p 12 + p 34 p 42 p 12 + p 13 p 34 p 42 + p 41 p 21 p 13 + p 42 p 21 p 13 + p 21 p 13 p 34

42 Do you see the LIGHT? James Brown (The Blues Brothers)

43 s1s1 p 12 p 21 s2s2 s3s3 p 31 s4s4 p 41 p 34 p 42 p 13 x 1 = p 21 p 34 p 41 + p 34 p 42 p 21 + p 21 p 31 p 41 + p 31 p 42 p 21 / Σ x 2 = p 31 p 41 p 12 + p 31 p 42 p 12 + p 34 p 41 p 12 + p 34 p 42 p 12 + p 13 p 34 p 42 / Σ x 3 = p 41 p 21 p 13 + p 42 p 21 p 13 / Σ x 4 = p 21 p 13 p 34 / Σ Σ = p 21 p 34 p 41 + p 34 p 42 p 21 + p 21 p 31 p 41 + p 31 p 42 p 21 + p 31 p 41 p 12 + p 31 p 42 p 12 + p 34 p 41 p 12 + p 34 p 42 p 12 + p 13 p 34 p 42 + p 41 p 21 p 13 + p 42 p 21 p 13 + p 21 p 13 p 34

44 Ergodic Theorem Revisited If there exists a reverse spanning tree in a graph of the Markov chain associated to a stochastic system, then: (a)the stochastic system admits the following probability vector as a solution: (b) the solution is unique. (c) the conditions {x i ≥ 0} i=1,n are redundant and the solution can be computed by Gaussian elimination.

45 Cycle This system is now covered by the proof 1 6 3 4 5 2

46 Markov Chain as a Conservation System s1s1 p 12 p 21 p 11 s2s2 p 22 s3s3 p 33 p 13 p 23 p 31 p 32 p 11 x 1 + p 21 x 2 + p 31 x 3 = x 1 (p 11 + p 12 + p 13 ) p 12 x 1 + p 22 x 2 + p 32 x 3 = x 2 (p 21 + p 22 + p 23 ) p 13 x 1 + p 23 x 2 + p 33 x 3 = x 3 (p 31 + p 32 + p 33 )

47 Kirchoff’s Current Law (1847) The sum of currents flowing towards a node is equal to the sum of currents flowing away from the node. i 31 + i 21 = i 12 + i 13 i 32 + i 12 = i 21 i 13 = i 31 + i 32 12 3 i 13 i 31 i 32 i 12 i 21 i 3 + i 2 = i 1 + i 4

48 Kirchhoff’s Matrix Tree Theorem (1847) The theorem allows us to calculate the number of spanning trees of a connected graph

49 Two theorems for the price of one!! Differences Ergodic Theorem: The symbolic proof is simpler and more appropriate than using Perron-Frobenius Kirchhoff Theorem: Our version calculates the spanning trees and not their total sum.

50 Internet Sites Kleinberg Google SALSA Indegree Heuristic

51 Google PageRank Patent “The rank of a page can be interpreted as the probability that a surfer will be at the page after following a large number of forward links.” The Ergodic Theorem

52 Google PageRank Patent “The iteration circulates the probability through the linked nodes like energy flows through a circuit and accumulates in important places.” Kirchoff (1847)

53 Rank Sinks 5 6 7 1 2 3 4 No Spanning Tree

54 References Kirchhoff, G. "Über die Auflösung der Gleichungen, auf welche man bei der untersuchung der linearen verteilung galvanischer Ströme geführt wird." Ann. Phys. Chem. 72, 497-508, 1847. А. А. Марков. "Распространение закона больших чисел на величины, зависящие друг от друга". "Известия Физико-математического общества при Казанском университете", 2-я серия, том 15, ст. 135-156, 1906. H. Minkowski, Geometrie der Zahlen (Leipzig, 1896). J. Farkas, "Theorie der einfachen Ungleichungen," J. F. Reine u. Ang. Mat., 124, 1-27, 1902. H. Weyl, "Elementare Theorie der konvexen Polyeder," Comm. Helvet., 7, 290-306, 1935. A. Charnes, W. W. Cooper, “The Strong Minkowski Farkas-Weyl Theorem for Vector Spaces Over Ordered Fields,” Proceedings of the National Academy of Sciences, pp. 914-916, 1958.

55 References E. Steinitz. Bedingt Konvergente Reihen und Konvexe Systeme. J. reine angew. Math., 146:1-52, 1916. K. Jeev, J-L. Lassez: Symbolic Stochastic Systems. MSV/AMCS 2004: 321- 328 V. Chandru, J-L. Lassez: Qualitative Theorem Proving in Linear Constraints. Verification: Theory and Practice 2003: 395-406 Jean-Louis Lassez: From LP to LP: Programming with Constraints. DBPL 1991 : 257-283


Download ppt "Mathematical Foundations for Search and Surf Engines."

Similar presentations


Ads by Google