Presentation is loading. Please wait.

Presentation is loading. Please wait.

Fan Chung University of California, San Diego The PageRank of a graph.

Similar presentations


Presentation on theme: "Fan Chung University of California, San Diego The PageRank of a graph."— Presentation transcript:

1 Fan Chung University of California, San Diego The PageRank of a graph

2

3 What is PageRank? Ranking the vertices of a graph Partially ordered sets + graph theory -Applications --web search,ITdata sets, xxxxxxxxxxxxxpartitioning algorithms,… -Theory --- correlations among vertices A new graph invariant for dealing with:

4 fan

5 Outline of the talk Motivation Local cuts and the Cheeger constant using:  eigenvectors  random walks  PageRank  heat kernel Define PageRank Four versions of the Cheeger inequality Four partitioning algorithms Green’s functions and hitting time

6 What is PageRank? What is Rank?

7

8

9 What is PageRank? PageRank is defined on any graph.

10 An induced subgraph of the collaboration graph with authors of Erdös number ≤ 2.

11 A subgraph of the Hollywood graph.

12 A subgraph of a BGP graph

13 Yahoo IM graph Reid Andersen 2005 The Octopus graph

14 Graph Theory has 250 years of history. Leonhard Euler 1707-1783 The Bridges of Königsburg Is it possible to walk over every bridge once and only once?

15 Geometric graphs Algebraic graphs General graphs Topological graphs

16 Massive data Massive graphs WWW-graphs Call graphs Acquaintance graphs Graphs from any data a.base protein interaction network Jawoong Jeong

17 Big and bigger graphsNew directions in graph theory

18 Many basic questions: Correlation among vertices? The `geometry’ of a network ? Quantitative analysis? distance, flow, cut, … eigenvalues, rapid mixing, … Local versus global?

19 Google’s answer: The definition for PageRank?

20 A measure for the “importance” of a website The “importance” of a website is proportional to the sum of the importance of all the sites that link to it.

21 A solution for the “importance” of a website Solvefor x

22 A solution for the “importance” of a website Solvefor x x = ρ A x Adjacency matrix

23 A solution for the “importance” of a website Solvefor x x = ρ A x

24 Graph models directed graphs weighted graphs (undirected) graphs

25 Graph models directed graphs weighted graphs (undirected) graphs

26 Graph models directed graphs weighted graphs (undirected) graphs 1.5 1.2 3.3 \ 1.5 2 2.3 1 2.81.1

27 In a directed graph, authority there are two types of “importance”: hub Jon Kleinberg 1998

28 Two types of the “importance” of a website Solveand x x = r A y Importance as Authorities : y Importance as Hubs : y = s A x T x = rs A A x T y = rs A A y T

29 Eigenvalue problem for n x n matrix:. n ≈ 30 billion websites Hard to compute eigenvalues Even harder to compute eigenvectors

30 In the old days, compute for a given (whole) graph. In reality, can only afford to compute “locally”.

31 A traditional algorithm Input: a given graph on n vertices. New algorithmic paradigm Input: access to a (huge) graph Efficient algorithm means polynomial algorithms n 3, n 2, n log n, n (e.g., for a vertex v, find its neighbors) Bounded number of access.

32 A traditional algorithm Input: a given graph on n vertices. New algorithmic paradigm Input: access to a (huge) graph Efficient algorithm means polynomial algorithms n 3, n 2, n log n, n (e.g., for a vertex v, find its neighbors) Bounded number of access. Exponential polynomial Infinity finite

33 The definition of PageRank given by Brin and Page is based on random walks.

34 Random walks in a graph. G : a graph P : transition probability matrix A lazy walk: := the degree of u.

35 A (bored) surfer either surf a random webpage with probability 1- α Original definition of PageRank α : the jumping constant or surf a linked webpage with probability α

36 Two equivalent ways to define PageRank pr(α,s) (1) s: the seed as a row vector Definition of personalized PageRank α : the jumping constant s

37 Two equivalent ways to define PageRank p=pr(α,s) (1) (2) s = Definition of PageRank the (original) PageRank some “seed”, e.g., s = personalized PageRank

38 Depends on the applications? How good is PageRank as a measure of correlationship? Isoperimetric properties How “good” is the cut?

39 Isoperimetric properties “What is the shortest curve enclosing a unit area?” In a graph G and an integer m, what is the minimum cut disconnecting a subgraph of ≥ m vertices? In a graph G, what is the minimum cut e(S,V-S) so that e(S,V-S) is the smallest? _____ Vol S

40 Two types of cuts: Vertex cut edge cut How “good” is the cut? S E(S,V-S)

41 e(S,V-S) _____ Vol S e(S,V-S) _____ |S| Vol S = Σ deg(v) v ε S |S| = Σ 1 v ε S S V-S

42 The Cheeger constant The Cheeger constant for graphs h G and its variations are sometimes called “conductance”, “isoperimetric number”, … The volume of S is

43 The Cheeger constant The Cheeger inequality : the first nontrivial eigenvalue of the xx(normalized) Laplacian of a connected graph.

44 The spectrum of a graph Many ways to define the spectrum of a graph. How are the eigenvalues related to properties of graphs? properties of graphs? Adjacency matrix

45 Combinatorial Laplacian diagonal degree matrix adjacency matrix Gustav Robert Kirchhoff 1824-1887 The spectrum of a graph Adjacency matrix

46 Combinatorial Laplacian diagonal degree matrix adjacency matrix Gustav Robert Kirchhoff 1824-1887 The spectrum of a graph Adjacency matrix Matrix tree theorem #spanning. trees

47 Combinatorial Laplacian diagonal degree matrix adjacency matrix Gustav Robert Kirchhoff 1824-1887 The spectrum of a graph Adjacency matrix Normalized Laplacian Random walks Rate of convergence

48 The spectrum of a graph not symmetric in general Normalized Laplacian symmetric normalized { Discrete Laplace operator with eigenvalues { loopless, simple

49 The spectrum of a graph Normalized Laplacian symmetric normalized Discrete Laplace operator with eigenvalues = { { not symmetric in general

50 dictates many properties of a graph. expander diameter discrepancy subgraph containment …. Spectral implications for finding good cuts?

51 Finding a cut by a sweep For order the vertices Consider sets and the Cheeger constant of Define

52 Using a sweep by the eigenvector, can reduce the exponential number of choices of subsets to a linear number. Finding a cut by a sweep

53 Using a sweep by the eigenvector, can reduce the exponential number of choices of subsets to a linear number. Finding a cut by a sweep Still, there is a lower bound guarantee by using the Cheeger inequality.

54 Four types of Cheeger inequalities.  eigenvectors  random walks  PageRank  heat kernel Four proofs using: Leading to four different one-sweep partitioning algorithms.

55 graph spectral method random walks PageRank heat kernel spectral partition algorithm local partition algorithms Four proofs of Cheeger inequalities

56 Graph partitioning Local graph partitioning

57 What is a local graph partitioning algorithm? A local graph partitioning algorithm finds a small cut near the given seed(s) with running time depending only on the size of the output.

58 Examples of local partitioning

59

60

61

62

63

64 graph spectral method random walks PageRank heat kernel spectral partition algorithm local partition algorithms Four proofs of Cheeger inequalities

65 graph spectral method random walks PageRank heat kernel Lovasz, Simonovits, 90, 93 Spielman, Teng, 04 Andersen, Chung, Lang, 06 Chung, PNAS, 08. Four proofs of Cheeger inequalities Cheeger 60’s, Fiedler ’73 Alon 86’, Jerrum+Sinclair 89’

66 where is the first non-trivial eigenvalue of the Laplacian and is the minimum Cheeger ratio in a sweep using the eigenvector. Using eigenvector the Cheeger inequality can be stated as The Cheeger inequalityPartition algorithm,

67 Proof of the Cheeger inequality: from definition by Cauchy-Schwarz ineq. summation by parts. from the definition.

68 where is the minimum Cheeger ratio over sweeps by using a lazy walk of k steps from every vertex for an appropriate range of k. A Cheeger inequality using random walks Lovász, Simonovits, 90, 93 Leads to a Cheeger inequality:

69 Using the PageRank vector. A Cheeger inequality using PageRank Recall the definition of PageRank p=pr(α,s): (1) (2) Organize the random walks by a scalar α.

70 Random walks versus PageRank How fast is the convergence to the stationary distribution? Choose α to satisfy the required property. For what k, can one have ?

71 with seed as a subset S Using the PageRank vector and a Cheeger inequality can be obtained : where S is the Dirichlet eigenvalue of the Laplacian, and is the minimum Cheeger ratio over sweeps by using the appropriate personalized PageRank with seeds S. A Cheeger inequality using PageRank

72 over all f satisfying the Dirichlet boundary condition: Dirichlet eigenvalues for a subset for all

73 Local Cheeger constant for a subset

74 with seed as a subset S Using the PageRank vector and a Cheeger inequality can be obtained : where S is the Dirichlet eigenvalue of the Laplacian, and is the minimum Cheeger ratio over sweeps by using personalized PageRank with seed S. A Cheeger inequality using PageRank

75 Algorithmic aspects of PageRank Fast approximation algorithm for x personalized PageRank Can use the jumping constant to approximate PageRank with a support of the desired size. greedy type algorithm, almost linear complexity Errors can be effectively bounded.

76 A graph partition algorithm using PageRank Given a set S with randomly choose a vertex v in S. With probability at least the one-sweep algorithm using has an initial segment with the Cheeger constant at most

77 Graph partitioning using PageRank vector. 198,430 nodes and 1,133,512 edges

78

79

80 Kevin Lang 2007

81 graph spectral method random walks PageRank heat kernel Lovasz, Simonovits, 90, 93 Spielman, Teng, 04 Andersen, Chung, Lang, 06 Chung, PNAS, 08. Four proofs of Cheeger inequalities Fiedler ’73, Cheeger, 60’s Alon 86’

82 PageRank versus heat kernel Geometric sumExponential sum

83 PageRank versus heat kernel Geometric sumExponential sum recurrenceHeat equation

84 A Cheeger inequality using the heat kernel Theorem: where is the minimum Cheeger ratio over sweeps by using heat kernel pagerank over all u in S. Theorem: For

85 Definition of heat kernel

86 Using the upper and lower bounds, a Cheeger inequality can be obtained : where S is the Dirichlet eigenvalue of the Laplacian, and is the minimum Cheeger ratio over sweeps by using heat kernel with seeds S for appropriate t. A Cheeger inequality using the heat kernel

87

88

89

90 Covering and packing using PageRank Many applications of PageRank for problems in Graph Theory Pebbing and routing using PageRank Graph embedding using PageRank Relating graph invariants of subgraphs to the host graph using PageRank Your favorite old problem using PageRank? Graph drawing using PageRank

91 Random graphs with general degrees pageranks Algorithmic game theory, graphical games New Directions in Graph Theory for information networks Topics: Using: Spectral methods Probabilistic methods Quasirandom …

92

93


Download ppt "Fan Chung University of California, San Diego The PageRank of a graph."

Similar presentations


Ads by Google