Complexity and Efficient Algorithms Group / Department of Computer Science Approximating Structural Properties of Graphs by Random Walks Christian Sohler.


Very Large Networks
Examples:
- Social networks
- The human brain
- Crystals
- Chip design
Size:
- 10^9 to 10^23 vertices
- Petabytes of additional information possible

Very Large Networks
Classical graph problems:
- Connectivity
- MinCut, MaxCut
- Graph clustering
- Graph isomorphism
Difficulties:
- The graph does not fit into main memory

Classification of Very Large Networks – A Vision
Example questions:
- Is a country a democracy or a totalitarian country?
- Is a patient schizophrenic?
- Is software malicious?
Formalization:
- Given a set of graphs with class labels (training set)
- Find a classifier for new graphs

Classification of Very Large Networks – A Vision
A typical scenario:
- Hundreds or thousands of graphs
- Each graph is extremely large
- Graphs are sparse
A possible approach:
- Describe graphs by features (graph properties)
- Apply classical learning algorithms
The challenge:
- Computation of tens of thousands of features for graphs with billions of vertices
- Example feature vector: (12, 3, -5, 10, 0, 0, …, 20, 3)

Classification of Very Large Networks – A Sampling Approach
Random Sampling:
- Compute a graph property approximately by random sampling
Informal Question:
- What can we learn from the local structure of a sparse graph about its global properties?
Sampling from Graphs:
- How can we sample a graph?

Classification of Very Large Networks – A Sampling Approach
Examples of different sampling strategies:
1. Sample a set S of s vertices and look at all edges within S (the subgraph G[S] induced by S)
2. Sample a set S of s edges and look at the graph they form
3. Sample a set S of s vertices and perform a BFS from each of them
4. Sample a set S of s vertices and perform a random walk from each of them
- Many more possibilities…
Question:
- Which is the right sampling strategy for my learning problem?
- It depends on the problem…
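The following is a minimal Python sketch of strategies 1, 3 and 4, included only to make the differences concrete. The function names and the adjacency-dictionary representation (`adj` maps each vertex to a list of neighbors) are assumptions made for illustration; they do not come from the slides.

```python
import random
from collections import deque


def induced_subgraph_sample(adj, s, rng=random):
    """Strategy 1: sample s vertices and keep only the edges inside the sample."""
    S = set(rng.sample(list(adj), s))
    return {v: [u for u in adj[v] if u in S] for v in S}


def bfs_sample(adj, start, max_vertices):
    """Strategy 3: BFS from start, stopped after max_vertices discovered vertices."""
    order = [start]
    seen = {start}
    queue = deque([start])
    while queue and len(order) < max_vertices:
        v = queue.popleft()
        for u in adj[v]:
            if u not in seen and len(order) < max_vertices:
                seen.add(u)
                order.append(u)
                queue.append(u)
    return order


def random_walk_sample(adj, start, length, rng=random):
    """Strategy 4: simple random walk of the given length starting at start."""
    walk = [start]
    for _ in range(length):
        neighbors = adj[walk[-1]]
        if not neighbors:  # isolated vertex: the walk cannot move
            break
        walk.append(rng.choice(neighbors))
    return walk
```

In a sparse graph, strategy 1 typically returns almost no edges for small s, whereas BFS and random walks always follow existing edges.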

Classification of Very Large Networks – A Sampling Approach
Question 1:
- Assume you have some classification task that involves city maps. Which of our four sampling methods is your method of choice?
Possible Answers:
1. Sample a set S of s vertices and look at all edges within S
2. Sample a set S of s edges and look at the graph they form
3. Sample a set S of s vertices and perform a BFS from each of them
4. Sample a set S of s vertices and perform a random walk from each of them

Classification of Very Large Networks – A Sampling Approach
Question 2:
- Assume you have some classification task that involves social networks. Which of our four sampling methods is your method of choice?
Possible Answers:
1. Sample a set S of s vertices and look at all edges within S
2. Sample a set S of s edges and look at the graph they form
3. Sample a set S of s vertices and perform a BFS from each of them
4. Sample a set S of s vertices and perform a random walk from each of them

First Wrap-Up
Motivation:
- Some classification problems involve sets of huge graphs
- No efficient algorithms are known for some fundamental graph problems
Sampling approach:
- We would like to pick small samples from the graph(s) and use them for graph classification
Challenge:
- There are many different sampling procedures
- We need to understand which one is right for which problem

Sampling from Very Large Networks
Property Testing [Rubinfeld, Sudan, 1996; Goldreich, Goldwasser, Ron, 1998]:
- Formal framework to study sampling algorithms for very large networks
Relaxation of „Standard Decision Problems“:
- We want to distinguish whether the input graph G has a property or is far away from it
- If G neither has the property nor is far away from it, the algorithm may give an arbitrary answer
- Randomized algorithms with bounded (worst case) error probability
- The algorithm only looks at a small part of the graph
Different graph models:
- Dense graphs, bounded degree graphs, directed graphs

Property Testing in Bounded Degree Graphs
Bounded degree graphs [Goldreich, Ron, 2002]:
- Undirected graph G=(V,E)
- Maximum degree bounded by D
- D is a constant
Oracle access:
- V={1,…,n}
- n is known to the algorithm
- Query(i,j) returns the j-th neighbor of vertex i or a symbol that indicates that this neighbor does not exist
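The oracle model can be pictured with a small Python sketch. The class name `BoundedDegreeOracle`, the use of `None` as the "does not exist" symbol, and the query counter are assumptions made for illustration; the slides only specify the behavior of Query(i,j).

```python
from typing import Dict, List, Optional


class BoundedDegreeOracle:
    """Query access to an undirected graph with maximum degree D (illustrative helper)."""

    def __init__(self, adjacency: Dict[int, List[int]], max_degree: int):
        for v, neighbors in adjacency.items():
            if len(neighbors) > max_degree:
                raise ValueError(f"vertex {v} exceeds the degree bound")
        self.adjacency = adjacency
        self.n = len(adjacency)        # n is known to the algorithm
        self.max_degree = max_degree
        self.queries = 0               # number of oracle queries made so far

    def query(self, i: int, j: int) -> Optional[int]:
        """Return the j-th neighbor (1-based) of vertex i, or None if it does not exist."""
        self.queries += 1
        neighbors = self.adjacency[i]
        if 1 <= j <= len(neighbors):
            return neighbors[j - 1]
        return None
```

Counting calls to `query` gives the query complexity of whatever algorithm runs against the oracle.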

Property Testing in Bounded Degree Graphs
Graph properties:
- A graph property is a set of graphs that is closed under isomorphism
Definition [Goldreich, Ron, 2002]:
- G=(V,E) is ε-far from P if one has to modify more than εDn edges to obtain a bounded degree graph with property P.

Property Testing in Bounded Degree Graphs
Property tester for property P [Goldreich, Ron, 2002]:
- Oracle access to the input graph G
- Accepts with probability at least 2/3 if G has property P
- Rejects with probability at least 2/3 if G is ε-far from P
Quality measures:
- Query complexity: maximum number of oracle queries
- Running time

A First Example: Connectivity
ConnectivityTester(G, ε, D) [Goldreich, Ron, 2002]
(1) Sample a set S with s = 8/(εD) vertices uniformly at random from V
(2) For every vertex from S:
(3)   Perform a BFS until
      (a) 4/(εD) vertices have been discovered or
      (b) all vertices of a small connected component have been discovered
(4)   if (b) then reject
(5) accept
Observation: ConnectivityTester accepts every connected graph.
Claim: If G is ε-far from connected, then G has more than εDn/2 connected components.
Claim: At least εDn/4 of the connected components have size at most 4/(εD).
Theorem: ConnectivityTester is a property tester with query complexity O(1/(ε²D)).
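A minimal Python sketch of the tester above, assuming the graph is given as an adjacency dictionary rather than through the query oracle. The function name, the rounding of 8/(εD) and 4/(εD) with `ceil`, and sampling with replacement are illustrative choices, not taken from the slides.

```python
import math
import random
from collections import deque


def connectivity_tester(adj, eps, D, rng=random):
    """Return True (accept) or False (reject). adj maps vertex -> list of neighbors."""
    vertices = list(adj)
    s = math.ceil(8 / (eps * D))       # number of sampled start vertices
    limit = math.ceil(4 / (eps * D))   # BFS budget per start vertex
    for _ in range(s):
        start = rng.choice(vertices)
        seen = {start}
        queue = deque([start])
        exhausted = True               # True if the BFS ran out of vertices
        while queue:
            if len(seen) >= limit:     # case (a): budget reached
                exhausted = False
                break
            v = queue.popleft()
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    queue.append(u)
        # case (b): a whole component was explored and it is not the entire graph
        if exhausted and len(seen) < len(vertices):
            return False
    return True
```

The sketch only rejects when it has seen a complete connected component that is smaller than the graph, so a connected graph is always accepted, matching the observation above.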

Second Wrap-Up – Introduction to Property Testing
Property Testing:
- Approximately decide, based on random sampling, whether a graph has a property or is far away from it
- Quality measure: query complexity
Connectivity:
- Sampling + BFS
- Check whether the sample violates the property

Second Wrap-Up – Introduction to Property Testing
Question 3:
- Is the following algorithm a property tester for planarity (for the right choice of f)?
PlanarityTester(G, ε, D)
(1) Sample a set S with s = f(ε,D) vertices uniformly at random from V
(2) For every vertex from S:
(3)   Perform a BFS until
      (a) f(ε,D) vertices have been discovered or
      (b) the discovered graph is not planar
(4)   if (b) then reject
(5) accept

Second Wrap-Up – Introduction to Property Testing
Bad news:
- There is a class of graphs in which every cycle has length Ω(log n) and that are ε-far from planar. A BFS that discovers only f(ε,D) vertices sees a tree in such a graph, so it never finds a non-planar subgraph.
Good news:
- The sampling is fine, we just need to modify our acceptance condition

Random Walks, Stationary Distributions & Convergence
Random Walk:
- In each step: move from the current vertex v to a neighbor chosen uniformly at random
Convergence:
- If G is connected and not bipartite, a random walk converges to a unique stationary distribution
- Pr[Random walk is at vertex v] ∝ deg(v)

Random Walks, Stationary Distributions & Convergence
Random Walks on Maps:
- A random walk on a planar graph has the tendency to stay local
- It takes a long time to reach the stationary distribution
- Reason: the network has sparse cuts
Random Walks on Social Networks:
- A random walk will quickly move to a „random place“
- Fast convergence
- The network does not have sparse cuts

Random Walks, Stationary Distributions & Convergence
Lazy Random Walk:
- In each step:
  - The probability to move from the current vertex v to a neighbor u is 1/(2D)
  - The walk stays at v with the remaining probability
Convergence of Lazy Random Walks:
- The stationary distribution is uniform
Rate of Convergence:
- Can be expressed in terms of the conductance of G or the second largest eigenvalue of the transition matrix
- O(log n) steps, if G is an expander graph
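The lazy random walk can be sketched in a few lines of Python. The names below and the adjacency-dictionary input are assumptions made for illustration. `lazy_random_walk` simulates one walk; `endpoint_distribution` computes the exact distribution after a given number of steps, which is only feasible for small graphs and is shown to illustrate convergence to the uniform distribution.

```python
import random


def lazy_walk_step(adj, v, D, rng=random):
    """Move to each neighbor with probability 1/(2D); stay at v otherwise."""
    slot = rng.randrange(2 * D)        # 2D equally likely slots
    if slot < len(adj[v]):             # the first deg(v) slots are the neighbors
        return adj[v][slot]
    return v


def lazy_random_walk(adj, start, steps, D, rng=random):
    """Return the endpoint of a lazy random walk of the given length."""
    v = start
    for _ in range(steps):
        v = lazy_walk_step(adj, v, D, rng)
    return v


def endpoint_distribution(adj, start, steps, D):
    """Exact distribution of the walk's endpoint (illustration for small graphs)."""
    dist = {v: 0.0 for v in adj}
    dist[start] = 1.0
    for _ in range(steps):
        nxt = {v: 0.0 for v in adj}
        for v, p in dist.items():
            move = p / (2 * D)         # mass sent to each neighbor
            for u in adj[v]:
                nxt[u] += move
            nxt[v] += p - move * len(adj[v])
        dist = nxt
    return dist
```

Because the transition matrix is symmetric, the uniform distribution is stationary even when vertices have different degrees.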

Conductance, Expanders & Small Worlds
Definition:
- The expansion φ(U) of a set U is defined as [formula not preserved in the transcript]
- The conductance φ_G of G is the minimum of φ(U) over all sets U with 1 ≤ |U| ≤ |V|/2
Definition:
- A graph G=(V,E) is called an α-expander if φ_G ≥ α for some constant α
Interpretations:
- Expander graphs satisfy the „small-world phenomenon“
- Conductance can be viewed as a measure for the social connectivity of a network

Testing Expanders
Facts:
- A lazy random walk converges to the uniform distribution
- A lazy random walk converges quickly in expander graphs
Hope:
- A lazy random walk converges much slower if the graph is ε-far from an expander graph
- In particular, we hope that the distribution of the endpoints of a Θ(log n)-step lazy random walk differs significantly from the uniform distribution
Question:
- If so, how could we exploit this to design a property testing algorithm?

The Birthday Problem & Testing Uniform Distributions
Birthday Problem:
- n possible birthdays
- k persons with birthdays chosen uniformly at random
- How large must k be so that with constant probability two persons have the same birthday?
Analysis:
- p = (1/n, …, 1/n)^T
- ||p||² is the collision probability of two birthdays
- If we have k persons, then the expected number of collisions is (k choose 2)·||p||² = k(k−1)/(2n)
- So, for k = Ω(√n) we expect to see a collision
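As a quick numerical check of this analysis, the expected number of colliding pairs is the number of pairs, k(k−1)/2, times the collision probability 1/n. The helper name below is an illustrative assumption.

```python
from math import comb


def expected_collisions(k, n):
    """Expected number of colliding pairs among k uniform draws from n values."""
    return comb(k, 2) / n


# Classic instance: 23 persons and 365 possible birthdays.
print(expected_collisions(23, 365))
```

For k = 23 and n = 365 this is 253/365, which is already a constant fraction of one collision even though k is far smaller than n.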

Testing Uniform Distributions
Observation:
- The uniform distribution minimizes the expected number of pairwise collisions
- If a distribution q differs significantly from the uniform distribution p, then ||q||² >> ||p||²
TestUniformDistribution(distribution q)
1. Sample Θ(√n) elements according to q
2. if the number of pairwise collisions is too large then reject
3. else accept

Testing Expanders
TestingExpanders(G)
1. Sample a set S of s vertices uniformly at random
2. for each v ∈ S do
3.   Let q be the distribution of endpoints of a Θ(log n)-step lazy random walk starting at v
4.   if TestUniformDistribution(q) rejects then reject
5. accept
History:
- The algorithm was proposed by [Goldreich and Ron, 2000], who conjectured it to be a property tester
- First complete analysis by [Czumaj and Sohler, 2010] (but weaker than conjectured)
- Later improved by [Nachmias and Shapira, 2010] and [Kale and Seshadhri, 2011]
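The two ingredients, lazy random walks and collision counting, can be combined into a short Python sketch. All constants (`num_starts`, `walk_const`, `sample_const`, `slack`) are assumptions made for illustration; the slides leave s, the walk length and the rejection threshold unspecified, and the analyzed algorithms choose them much more carefully.

```python
import math
import random
from collections import Counter


def lazy_walk_endpoint(adj, start, steps, D, rng):
    """Endpoint of a lazy random walk: each neighbor has probability 1/(2D)."""
    v = start
    for _ in range(steps):
        slot = rng.randrange(2 * D)
        if slot < len(adj[v]):
            v = adj[v][slot]
    return v


def expansion_tester(adj, D, num_starts=5, walk_const=8, sample_const=8,
                     slack=2.0, rng=random):
    """Return True (accept) or False (reject) based on endpoint collisions."""
    n = len(adj)
    vertices = list(adj)
    steps = walk_const * math.ceil(math.log2(n))   # walk length of order log n
    k = sample_const * math.isqrt(n)               # walks per start, of order sqrt(n)
    threshold = slack * k * (k - 1) / (2 * n)      # slack times the uniform expectation
    for _ in range(num_starts):
        v = rng.choice(vertices)
        ends = Counter(lazy_walk_endpoint(adj, v, steps, D, rng) for _ in range(k))
        collisions = sum(c * (c - 1) // 2 for c in ends.values())
        if collisions > threshold:
            return False
    return True
```

On a graph made of many tiny components, every walk is trapped near its start, so the endpoints collide constantly and the sketch rejects. On a well-connected graph such as a hypercube, the endpoints are close to uniform and the collision count stays near k(k−1)/(2n).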

Final Result
Theorem [Nachmias and Shapira, 2010; Kale and Seshadhri, 2011]:
- Algorithm TestingExpanders accepts every α-expander and rejects every graph that is ε-far from an Ω(α²)-expander. The algorithm has a running time of O(n^(1/2+μ)).
Key structural property of „ε-far“ graphs:
- If G is ε-far from an Ω(α²)-expander, then there exists a set U of Ω(εn) vertices with φ(U) = O(α²).
- This implies that for many vertices, the distribution of endpoints of a random walk of length O(log n) is significantly different from the uniform distribution

Third Wrap-Up – Testing Expansion
(Lazy) Random Walks:
- Moves from a vertex to a random neighbor
- Converges to the uniform distribution
- Speed of convergence depends on the graph structure
Testing Expansion:
- A random walk converges quickly in expander graphs
- A random walk converges slower if we are far from expander graphs
- The number of collisions among endpoints of random walks is minimized in expander graphs
- We can test expansion by counting collisions

Graph Clustering & Web Communities
Web Graph Communities:
- A set of vertices that induces an expander graph and has a sparse cut to the rest of the graph
- Question: Is the web graph composed of a set of at most k communities?
Definition:
- A subset C ⊆ V is called a (φ_in, φ_out)-cluster if
  - the induced subgraph G[C] has conductance at least φ_in
  - φ(C) ≤ φ_out
Definition:
- A partition of V into at most k (φ_in, φ_out)-clusters is called a (k, φ_in, φ_out)-clustering

Testing k-Clusterings
A Simple Case?
- Distinguish between a union of at most k expander graphs with no edges in between and a set of more than k (large) expander graphs with no edges in between
- Can we use our previous algorithm to test for a k-clustering?

Testing k-Clusterings
A Simple Case?
- No! We do not know the size of the clusters (expander graphs), and estimating the support size of a distribution is hard [Raskhodnikova et al., 2009]

Testing k-Clusterings
New idea:
- If two vertices come from the same cluster, their random walks quickly converge to the same distribution
- So, we could try to sample a set of vertices and check for sets of vertices whose random walks induce the same distributions

Testing Closeness of Distributions
Main Idea [Batu et al. 2013; Chan et al. 2014]:
- If p ≈ q, then the following experiments should give roughly the same number of collisions between elements from S and T:
  - Draw two sets S and T of m elements from p
  - Draw two sets S and T of m elements from q
  - Draw a set S of m elements from p and a set T of m elements from q
- If p and q differ significantly, at least one of the three values is different

Testing Closeness of Distributions
Theorem [Batu et al. 2013; Chan et al. 2014]:
- There is a tester that, with probability 2/3, accepts if ||p-q|| ≤ ε/2 and rejects if ||p-q|| ≥ ε. The query complexity of the algorithm is O(√b/ε²), where b is an upper bound on ||p||² and ||q||².
- We will need b to be O(1/n)
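The three experiments from the main idea can be turned into a collision-based estimate of ||p-q||² = ||p||² + ||q||² − 2⟨p,q⟩. The Python sketch below is an illustration under assumptions: the function names are hypothetical, the distributions are passed as zero-argument sampling callables, and the step that turns the estimate into an accept/reject decision with the stated guarantees is omitted.

```python
from collections import Counter


def cross_collisions(S, T):
    """Number of pairs (s, t) with s in S, t in T and s == t."""
    counts = Counter(S)
    return sum(counts[t] for t in T)


def l2_distance_squared_estimate(sample_p, sample_q, m):
    """Estimate ||p - q||^2 from the three collision experiments."""
    def draw(sampler):
        return [sampler() for _ in range(m)]

    c_pp = cross_collisions(draw(sample_p), draw(sample_p))  # about m^2 * ||p||^2
    c_qq = cross_collisions(draw(sample_q), draw(sample_q))  # about m^2 * ||q||^2
    c_pq = cross_collisions(draw(sample_p), draw(sample_q))  # about m^2 * <p, q>
    return (c_pp + c_qq - 2 * c_pq) / (m * m)
```

If p = q, the three collision counts have the same expectation and the estimate is close to 0; if p and q have disjoint supports, the mixed experiment yields no collisions and the estimate is close to ||p||² + ||q||².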

The Algorithm
ClusteringTest
1. Sample a set S of s vertices uniformly at random
2. For any v ∈ S let D(v) be the distribution of endpoints of a random walk of length Θ(log n) starting at v
3. for each pair u,v ∈ S do
4.   if D(u) and D(v) are close then add an edge (u,v) to the „cluster graph“ on vertex set S
5. accept if and only if the cluster graph is a collection of at most k cliques

Testing k-Clusterings
Observation:
- Algorithm ClusteringTest distinguishes between at most k expanders and more than k (large) expanders
- Can we generalize it to testing of (k, φ_in, φ_out)-clusterings?

Testing k-Clusterings - Soundness
Challenge:
- Since the clusters may be connected in a (k, φ_in, φ_out)-clustering, the stationary distribution may be uniform over G (and not over the cluster)
- We need to show that for a proper length of the random walk there is an „intermediate“ distribution that is „reasonably stable“ w.r.t. the ℓ2-error

The Algorithm
ClusteringTest
1. Sample a set S of s vertices uniformly at random
2. For any v ∈ S let D(v) be the distribution of endpoints of a random walk of length Θ(log n) starting at v
3. if ||D(v)||² > O(1/n) then reject
4. for each pair u,v ∈ S do
5.   if D(u) and D(v) are close then add an edge (u,v) to the „cluster graph“ on vertex set S
6. accept if and only if the cluster graph is a collection of at most k connected components
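The control flow of ClusteringTest can be sketched in Python. This is an illustration under strong assumptions, not the analyzed algorithm: D(v) is computed exactly by propagating probabilities through the whole graph (so the sketch is not sublinear), whereas the tester estimates norms and distances from collision statistics. The parameters `norm_bound` and `close_threshold`, the explicit `sample` list, and all function names are hypothetical.

```python
def endpoint_distribution(adj, start, steps, D):
    """Exact endpoint distribution of a lazy random walk (small graphs only)."""
    dist = {v: 0.0 for v in adj}
    dist[start] = 1.0
    for _ in range(steps):
        nxt = {v: 0.0 for v in adj}
        for v, p in dist.items():
            move = p / (2 * D)
            for u in adj[v]:
                nxt[u] += move
            nxt[v] += p - move * len(adj[v])
        dist = nxt
    return dist


def clustering_test(adj, D, k, sample, steps, norm_bound, close_threshold):
    """Accept iff all norms are small and the cluster graph has at most k components."""
    dists = {v: endpoint_distribution(adj, v, steps, D) for v in sample}
    # Step 3: reject if some endpoint distribution is too concentrated.
    for d in dists.values():
        if sum(p * p for p in d.values()) > norm_bound:
            return False
    # Steps 4-5: join sampled vertices with close distributions (union-find).
    parent = {v: v for v in sample}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for i, u in enumerate(sample):
        for v in sample[i + 1:]:
            gap = sum((dists[u][w] - dists[v][w]) ** 2 for w in adj)
            if gap <= close_threshold:
                parent[find(u)] = find(v)
    # Step 6: count connected components of the cluster graph.
    return len({find(v) for v in sample}) <= k
```

On two disjoint cliques, sampled vertices from the same clique get nearly identical distributions and end up in the same component, so the sketch accepts for k = 2 and rejects for k = 1.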

Testing k-Clusterings - Completeness
Required properties of a (k, φ_in, φ_out)-clustering:
- For most vertices v: the distribution D(v) of endpoints of a lazy random walk of proper length has ||D(v)||² = O(1/n)
- For most pairs u,v from the same cluster: ||D(v)-D(u)||² is very small
Useful tool – Higher Order Cheeger's Inequality [Lee et al. 2014]:
- Relates (k, φ_in, φ_out)-clusterings to the k+1 largest eigenvalues

Testing k-Clusterings - Soundness
Structural property of „ε-far“ graphs (similar to expanders):
- If G is ε-far from every (k, φ_in*, φ_out*)-clustering, then there exists a partition into k+1 sets C_1,…,C_(k+1), each of Ω(ε²n/k) vertices and with φ(C_i) = O(φ_in*/ε²).

Testing k-Clusterings
Theorem [Czumaj, Peng, Sohler, 2015]:
- Algorithm ClusteringTest accepts every (k, φ_in, φ_out)-clustering with probability at least 2/3 and rejects every graph that is ε-far from every (k, φ_in*, φ_out*)-clustering with probability at least 2/3, where φ_out = O(ε⁴φ_in²) and φ_in* = Θ(ε⁴φ_in²/log n) for constants k, D.
- The running time of the algorithm is O*(√n).

Fourth Wrap-Up
Testing Clusterings:
- Endpoints of a random walk of proper length should be uniform on its cluster with not much probability „outside“
- If random walks start from two different points of the same cluster, their endpoint distributions are similar
- Collision statistics can be used to pairwise test similarity of distributions
- This can be used to approximate the cut structure
Take-away message:
- The distribution of endpoints of random walks (possibly comparing different starting vertices) contains a lot of information about the cut structure of a graph

Summary
Vision:
- Learning from very large sets of massive graphs
Approach:
- Feature computation by random sampling
- Analysis in the framework of property testing
Two Examples:
- Expanders (connectivity measure in social networks)
- Clustering (structure of social networks)

Thank you!
Image sources:
- Slide 2: Allan Ajifo and cobalt123; Creative Commons license
- Slide 3: GustavoG and Jasper Nance; Creative Commons license
- Slide 4: Wikipedia; Jason Brown; Creative Commons license
- Slide 5: GustavoG; Creative Commons license
- Slide 6: GoldenRibbon; Creative Commons license
