Dense subgraphs of random graphs Uriel Feige Weizmann Institute.

Talk Outline Discuss problems related to dense subgraphs of random graphs: Planted k-clique. Dense k-subgraph (if time permits).

Random Clique Random graph G on n vertices and edge probability ½. The maximum clique size is almost surely 2 log n. Upper bound: first moment method (expectation). Lower bound: second moment method (+ variance). The proof is not constructive.

How to actually find the clique? The Greedy(degree) algorithm finds a clique of size log n (plus low-order terms). No better polytime algorithm is known. Exhaustive search takes time n^(O(log n)).
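
Read literally, Greedy(degree) can be sketched in a few lines of Python; the sampler and all names below are ours, not the talk's code:

```python
import random

def random_graph(n, p=0.5, seed=0):
    """Sample G(n, p) as a list of neighbor sets."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def greedy_clique(adj):
    """Greedily grow a clique: repeatedly add the candidate of highest
    degree within the current candidate set, then keep only its neighbors.
    On G(n, 1/2) this typically yields a clique of size about log2(n)."""
    candidates = set(range(len(adj)))
    clique = []
    while candidates:
        v = max(candidates, key=lambda u: len(adj[u] & candidates))
        clique.append(v)
        candidates &= adj[v]
    return clique
```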

Cryptographic applications [Juels and Peinado] Assuming the state of the art is not improved: one-way functions; hierarchical keys. (Idea: the distribution does not change noticeably if a small number of cliques of size 1.5 log n are planted in the graph.)

Planted/hidden clique Random graph G on n vertices and edge probability ½. A random set H of k vertices is turned into a clique. If k > 2 log n, H will almost surely be the unique maximum clique in G. Goal: find H. The problem becomes easier as k grows.
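
The planted-clique distribution is straightforward to sample; a minimal Python sketch (function and variable names are ours):

```python
import random

def planted_clique_graph(n, k, seed=None):
    """Sample G(n, 1/2), then turn a random k-subset H into a clique.

    Returns (adj, H): adj[v] is the neighbor set of v, H the planted set.
    """
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < 0.5:
                adj[u].add(v)
                adj[v].add(u)
    H = set(rng.sample(range(n), k))
    for u in H:
        for v in H:
            if u != v:
                adj[u].add(v)
                adj[v].add(u)
    return adj, H
```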

Degree concentration Degrees of vertices in G are strongly concentrated around n/2. The degree distribution of H-vertices is statistically different from that of the other vertices when k is larger than the standard deviation. Kucera: if k > c(n log n)^(1/2), H is simply the set of vertices of largest degree. (The Greedy(degree) algorithm outputs H.)
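
Kucera's observation needs only a sort by degree; a self-contained sketch (with its own instance sampler, so names here are ours):

```python
import random

def planted_instance(n, k, seed=0):
    """G(n, 1/2) with a clique planted on a random k-subset H."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < 0.5:
                adj[u].add(v)
                adj[v].add(u)
    H = set(rng.sample(range(n), k))
    for u in H:
        for v in H:
            if u != v:
                adj[u].add(v)
                adj[v].add(u)
    return adj, H

def top_degree(adj, k):
    """Return the k vertices of largest degree; for k > c(n log n)^(1/2)
    this set equals the planted clique H with high probability."""
    order = sorted(range(len(adj)), key=lambda v: len(adj[v]), reverse=True)
    return set(order[:k])
```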

Use of eigenvectors [Alon, Krivelevich and Sudakov] Normalize the adjacency matrix of G so that entries sum up to 0. The eigenvalues are then strongly concentrated around 0, with no eigenvalue larger than O(n^(1/2)). If k > cn^(1/2), H contributes a larger eigenvalue, and H can be recovered from the eigenvector that corresponds to the largest eigenvalue (takes some work).
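
One possible rendering of the spectral step in NumPy, under our reading of the slide (±1 entries, top eigenvector, k largest coordinates in absolute value). The cleanup step of [Alon, Krivelevich and Sudakov] is omitted, so this only proposes candidates:

```python
import numpy as np

def planted_adjacency(n, k, seed=0):
    """0/1 adjacency matrix of G(n, 1/2) with a planted k-clique."""
    rng = np.random.default_rng(seed)
    A = np.triu(rng.integers(0, 2, size=(n, n)), 1)
    A = A + A.T
    H = rng.choice(n, size=k, replace=False)
    A[np.ix_(H, H)] = 1
    np.fill_diagonal(A, 0)
    return A, set(H.tolist())

def spectral_candidates(A, k):
    """Recenter to ±1 entries (sum roughly 0), take the eigenvector of the
    largest eigenvalue, and return the k largest |coordinates|."""
    M = 2.0 * A - 1.0
    np.fill_diagonal(M, 0.0)
    vals, vecs = np.linalg.eigh(M)
    top = vecs[:, np.argmax(vals)]
    return set(np.argsort(-np.abs(top))[:k].tolist())
```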

Constant improvements Guess a vertex from H, and restrict the problem to its neighborhood. The clique's relative size increases, and the graph remains random. Can find planted cliques of size n^(1/2)/2^t in time n^(O(t)): polynomial (but very slow) for fixed t.
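
The restriction step can be sketched as an induced-subgraph helper (hypothetical code, not from the talk). If the guessed vertex v lies in H, the subgraph on N(v) is again a random graph with a planted (k-1)-clique on about n/2 vertices, so the clique's relative size roughly doubles:

```python
def restrict_to_neighborhood(adj, v):
    """Induce the subgraph on N(v), relabeling vertices 0..|N(v)|-1.

    Returns (sub, keep) where sub is the induced adjacency list and
    keep[i] is the original label of new vertex i.
    """
    keep = sorted(adj[v])
    index = {u: i for i, u in enumerate(keep)}
    sub = [{index[w] for w in adj[u] if w in index} for u in keep]
    return sub, keep
```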

Use of SDP [Feige and Krauthgamer] The Lovász theta function provides an upper bound on the clique size. On random graphs, its value is known to be O(n^(1/2)). It can be used both to find H and to certify its optimality when k > n^(1/2).

Going below n^(1/2) A certain Markov chain approach fails [Jerrum]. Using t levels of Lovász-Schrijver SDP relaxations is no better than simply guessing t vertices of the clique [Feige and Krauthgamer]. For k > n^(1/3), H is a global maximum of a certain cubic form [Frieze and Kannan].

Why care about planted clique? Seems to require the development of new algorithmic techniques. A concrete challenge for understanding observable properties of random graphs (does planting a large clique make a noticeable difference?). Related to some other problems.

Interesting connection In a 2-person game, an approximate Nash equilibrium with nearly best payoffs (compared to the true Nash) can be found in time n^(O(log n)) [Lipton, Markakis and Mehta]. A poly-time algorithm for approximate best Nash would solve the hidden clique problem in polynomial time [Hazan and Krauthgamer].

The experimental approach to the design and analysis of algorithms For hidden clique, the input distribution is well defined and can be sampled from efficiently. To evaluate a candidate algorithm, run it on a random sample and observe performance. If not good, modify the algorithm. If good, analyze the algorithm. In practice, graphs for experiments are generated using pseudorandom generators.

Experimental results (with Dorit Ron) n = 40,000. m = 400,000,000. n^(1/2) = 200. For success rate roughly ½: k = 158 (Alg1 - LDR), k = 137 (Alg2 - TPMR). Is this good or bad? 2 log n = 30; n^(1/4) = 14.

Understanding large sets of results To estimate the success probability within 1% error requires roughly 10,000 experiments. To see patterns, it helps if results are displayed graphically. Do our algorithms work when k = n^(0.49)? Need experiments with large n.
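
The 10,000-experiment figure follows from the binomial standard error sqrt(p(1-p)/T): at p = 1/2 and T = 10,000 this is 0.5% per standard deviation, i.e. about 1% at two standard deviations. A sketch of such a harness, using the simple degree-based recovery as a stand-in for the real algorithms (all names are ours):

```python
import math
import random

def planted_instance(n, k, rng):
    """G(n, 1/2) with a clique planted on a random k-subset H."""
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < 0.5:
                adj[u].add(v)
                adj[v].add(u)
    H = set(rng.sample(range(n), k))
    for u in H:
        for v in H:
            if u != v:
                adj[u].add(v)
                adj[v].add(u)
    return adj, H

def degree_recovery(adj, k):
    """Candidate algorithm: output the k vertices of largest degree."""
    order = sorted(range(len(adj)), key=lambda v: len(adj[v]), reverse=True)
    return set(order[:k])

def success_rate(n, k, trials, seed=0):
    """Monte Carlo estimate of the success probability, together with
    its standard error sqrt(p(1-p)/trials)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        adj, H = planted_instance(n, k, rng)
        wins += degree_recovery(adj, k) == H
    p = wins / trials
    return p, math.sqrt(p * (1 - p) / trials)
```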

Jumping to conclusions Care is needed. Is the PRG the issue? Is n sufficiently large to draw asymptotic conclusions? Might the choice of scaling of the x-axis be biasing our interpretation?

Jump to the analysis? The TPMR algorithm (Truncated Power Method Removal) looks promising. Difficult to analyze, but worth it, because the algorithm is so special. Or is it? (there was also Alg1 …)

Information on the algorithms General idea: Sort vertices by likelihood of being in H. Remove (one or more) least likely vertices. Repeat. Our algorithms take linear time (in m).

Low Degree Removal (LDR) Iterative removal phase: if the current graph is a clique, move to the expansion phase; otherwise remove a vertex of lowest degree (breaking ties arbitrarily). Iterative expansion phase: add vertices that are connected to all vertices of the clique.
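
A direct transcription of LDR in Python; this version uses a quadratic clique test for clarity rather than the linear-time implementation mentioned in the talk:

```python
def ldr(adj):
    """Low Degree Removal: delete a minimum-degree vertex (ties broken
    arbitrarily) until the survivors form a clique, then greedily add
    back any vertex adjacent to the entire current clique."""
    alive = set(range(len(adj)))

    def is_clique(S):
        return all(S - {u} <= adj[u] for u in S)

    # Removal phase.
    while alive and not is_clique(alive):
        v = min(alive, key=lambda u: len(adj[u] & alive))
        alive.remove(v)
    # Expansion phase.
    changed = True
    while changed:
        changed = False
        for v in range(len(adj)):
            if v not in alive and alive <= adj[v]:
                alive.add(v)
                changed = True
    return alive
```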

Theorem For every δ < 1 there is a constant c such that if k > cn^(1/2), then LDR finds the hidden k-clique H on at least a δ fraction of the input instances.

Sketch of proof of theorem Lemma 1. In every subgraph with t > 11k/10 vertices, some vertex not in H has degree at most t/2 + c1 n^(1/2). Proof: straightforward; large deviation bounds on the average degree plus a union bound.

Corollary As long as t > 11k/10 vertices remain, LDR removes a vertex of degree “not much larger” than t/2 (at most t/2 + c1 n^(1/2)).

Lemma 2 For any vertex v, with high probability (say 99/100), up to the point at which v is removed (if at all), v's average degree to removed vertices not in H is at most 1/2, with a total deviation no larger than c2 n^(1/2).

Sketch of proof of Lemma 2 Reveal the edges of v only when needed. Given a candidate vertex u for removal, if there is no edge (u,v), then remove u; otherwise perhaps delay the removal. The average rate of removal is at most 1/2, and the probability of an excursion larger than c2 n^(1/2) is small.

Most vertices of H survive LDR. Almost all vertices of H start with “very high” degree (assuming that c > 4(c1 + c2)). There are always vertices of not-high degree available for removal (Lemma 1). For the first k/10 high-degree vertices of H to be removed, they must have lost degree at a high rate; this is a low probability event, by Lemma 2 and Markov's inequality.

Finishing the proof At least 9k/10 vertices of H are among the last 11k/10 survivors. Hence no vertex outside H can survive the removal phase, and the expansion phase picks up the remaining vertices of H.

Conjectures The leading constant c is small: when δ = 1/2, c < 1 suffices. The order of quantifiers can be switched: for some c, the fraction δ tends to 1 as n grows. Lower bounds: LDR fails when k = o(n^(1/2)).

Open question Does the size of the planted clique exhibit threshold behavior with respect to the success probability of the LDR algorithm?

Truncated Power Method Removal (TPMR) Initially x is the vector of degrees. Compute x' = Ax. Normalize x' to sum up to 0. Average x and x' to get a new x. Repeat 6 times. Sort vertices by their x value and remove the lowest 10%. Etc.
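
Our reading of the TPMR loop in NumPy. The per-step rescaling and the stopping rule are our additions (the slide ends with “Etc.”), so treat this as a sketch rather than the authors' implementation:

```python
import numpy as np

def planted_adjacency(n, k, seed=0):
    """0/1 adjacency matrix of G(n, 1/2) with a planted k-clique."""
    rng = np.random.default_rng(seed)
    A = np.triu(rng.integers(0, 2, size=(n, n)), 1)
    A = A + A.T
    H = rng.choice(n, size=k, replace=False)
    A[np.ix_(H, H)] = 1
    np.fill_diagonal(A, 0)
    return A, set(H.tolist())

def is_clique(B):
    """B: 0/1 adjacency submatrix with zero diagonal."""
    m = B.shape[0]
    return B.sum() == m * (m - 1)

def tpmr(A, frac=0.10, iters=6):
    """Score vertices with a truncated power method and repeatedly
    drop the lowest-scoring 10% until the survivors form a clique."""
    alive = np.arange(A.shape[0])
    while len(alive) > 1:
        B = A[np.ix_(alive, alive)].astype(float)
        if is_clique(B):
            break
        x = B.sum(axis=1)           # start from the degree vector
        for _ in range(iters):
            y = B @ x
            y -= y.mean()           # normalize to sum up to 0
            x = (x + y) / 2.0       # average old and new vectors
            x /= np.linalg.norm(x)  # rescale for stability (our addition)
        drop = max(1, int(frac * len(alive)))
        alive = alive[np.sort(np.argsort(x)[drop:])]
    return set(alive.tolist())
```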

Some observations on TPMR Linear time in m, though slower than LDR. Finds smaller planted cliques than LDR. Why not let x converge? Truncation is faster, and performs better in our experiments. Any hope of analysing TPMR?

Summary Experimental approach suggests interesting observations. Commit in small steps. (Related to “decimation” in message passing algs.) Truncated power method is better than power method. Challenge: support observations by analysis.

Running times Lenovo 2.53 GHz and 3 GB RAM. 20 samples with around 50% success rates.
N GEN | LDR | TPMR
| 17 (3) | 48 (34) |
| 80 (8) | 199 (127) |
| 365 (31) | 832 (498) |