Ryan O’Donnell - Microsoft Mike Saks - Rutgers Oded Schramm - Microsoft Rocco Servedio - Columbia.


Part I: Decision trees have large influences

Printer troubleshooter

[Figure: a printer-troubleshooting decision tree, asking questions such as "Does anything print?", "Can print from Notepad?", "Right size paper?", "Network printer?", "Driver OK?", with internal diagnoses like "Printer mis-setup?" and "File too complicated?", and leaves "Solved" and "Call tech support".]

Decision tree complexity

f : {Attr_1} × {Attr_2} × ⋯ × {Attr_n} → {−1,1}. What's the "best" DT for f, and how to find it?

Depth = worst-case # of questions. Expected depth = avg. # of questions.
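
To make the two complexity measures concrete, here is a minimal sketch (not from the slides) of a Boolean decision tree as nested tuples, with its worst-case depth and its expected depth under the uniform distribution:

```python
def depth(t):
    """Worst-case number of queries. A leaf is -1 or 1; an internal
    node is (j, left, right): query x_j, go left on -1, right on +1."""
    if t in (-1, 1):
        return 0
    _, lo, hi = t
    return 1 + max(depth(lo), depth(hi))

def expected_depth(t):
    """Average number of queries on a uniformly random input: each
    child of an internal node is reached with probability 1/2."""
    if t in (-1, 1):
        return 0.0
    _, lo, hi = t
    return 1.0 + 0.5 * expected_depth(lo) + 0.5 * expected_depth(hi)

# Maj_3 as a tree: query x_1, then x_2; if they agree we are done,
# otherwise x_3 decides.
MAJ3 = (1, (2, -1, (3, -1, 1)), (2, (3, -1, 1), 1))

print(depth(MAJ3))           # 3
print(expected_depth(MAJ3))  # 2.5
```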

Building decision trees

1. Identify the most 'influential'/'decisive'/'relevant' variable.
2. Put it at the root.
3. Recursively build DTs for its children.

Almost all real-world learning algorithms are based on this – CART, C4.5, … Almost no theoretical (PAC-style) learning algorithms are based on this – [Blum92, KM93, BBVKV97, PTF-folklore, OS04] – no; [EH89, SJ03] – sorta. Conjectured to be good for some problems (e.g., percolation [SS04]), but unprovable… A sketch of the greedy heuristic follows.
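
The sketch below is a hedged toy version of steps 1–3 (not the actual CART/C4.5 algorithms, which use gain criteria on training data; the helper names are hypothetical). It takes f as a full truth table on {−1,1}^n and measures "influential" by brute force on the surviving inputs:

```python
from itertools import product

def influence_on(points, f, j):
    """Fraction of surviving inputs whose f-value flips when bit j flips."""
    pts = set(points)
    flips = 0
    for x in points:
        y = x[:j] + (-x[j],) + x[j + 1:]
        if y in pts and f[y] != f[x]:
            flips += 1
    return flips / len(points)

def build_tree(points, f, free):
    values = {f[x] for x in points}
    if len(values) == 1:                   # f is constant here: make a leaf
        return values.pop()
    j = max(free, key=lambda k: influence_on(points, f, k))   # step 1
    lo = [x for x in points if x[j] == -1]                    # step 2: root at x_j
    hi = [x for x in points if x[j] == +1]
    return (j, build_tree(lo, f, free - {j}),                 # step 3: recurse
               build_tree(hi, f, free - {j}))

n = 3
f = {x: (1 if sum(x) > 0 else -1) for x in product((-1, 1), repeat=n)}  # Maj_3
print(build_tree(list(f), f, set(range(n))))
```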

Boolean DTs

f : {−1,1}^n → {−1,1}. D(f) = min depth of a DT for f. 0 ≤ D(f) ≤ n.

[Figure: a decision tree computing Maj_3: query x_1, then x_2; if they agree, output that value, otherwise query x_3. Leaves are labeled −1 and 1.]

Boolean DTs

View {−1,1}^n as a probability space, with the uniform probability distribution. A uniformly random path down a DT, plus a uniformly random setting of the unqueried variables, defines a uniformly random input.

Expected depth: δ(f).

Influences

The influence of coordinate j on f = the probability that x_j is relevant for f:

I_j(f) = Pr[ f(x) ≠ f(x^(⊕j)) ].    0 ≤ I_j(f) ≤ 1.
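
A quick Monte Carlo estimate of I_j(f) = Pr[f(x) ≠ f(x^(⊕j))] (a sketch, not from the slides), here for f = Maj_3, where each variable's influence is exactly 1/2:

```python
import random

def maj3(x):
    return 1 if sum(x) > 0 else -1

def estimate_influence(f, n, j, samples=100_000):
    flips = 0
    for _ in range(samples):
        x = [random.choice((-1, 1)) for _ in range(n)]
        y = list(x)
        y[j] = -y[j]                     # flip coordinate j
        flips += f(x) != f(y)
    return flips / samples

print(estimate_influence(maj3, 3, 0))    # ~0.5: x_1 matters iff x_2 != x_3
```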

Main question: If a function f has a “shallow” decision tree, does it have a variable with “significant” influence?

Main question: No. But for a silly reason: Suppose f is highly biased; say Pr[f = 1] = p ≪ 1. Then for any j,

I_j(f) = Pr[f(x) = 1, f(x^(⊕j)) = −1] + Pr[f(x) = −1, f(x^(⊕j)) = 1]
       ≤ Pr[f(x) = 1] + Pr[f(x^(⊕j)) = 1]
       ≤ p + p = 2p.

Variance

⇒ Influences are always at most 2 min{p,q} (where q = 1 − p). An analytically nicer expression: Var[f].

Var[f] = E[f^2] − E[f]^2 = 1 − (p − q)^2 = 1 − (2p − 1)^2 = 4p(1 − p) = 4pq.

2 min{p,q} ≤ 4pq ≤ 4 min{p,q}, and Var[f] = 1 for balanced functions. So I_j(f) ≤ Var[f], and it is fair to say I_j(f) is "significant" if it's a significant fraction of Var[f].
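
A brute-force sanity check (a sketch, not from the slides) of the chain I_j(f) ≤ 2 min{p,q} ≤ 4pq = Var[f], on random functions f : {−1,1}^4 → {−1,1}:

```python
import random
from itertools import product

n = 4
inputs = list(product((-1, 1), repeat=n))
for trial in range(100):
    f = {x: random.choice((-1, 1)) for x in inputs}
    p = sum(v == 1 for v in f.values()) / 2 ** n
    q = 1 - p
    var = 4 * p * q                                  # = Var[f]
    for j in range(n):
        I_j = sum(f[x] != f[x[:j] + (-x[j],) + x[j + 1:]]
                  for x in inputs) / 2 ** n
        assert I_j <= 2 * min(p, q) + 1e-12 <= var + 2e-12
print("all checks passed")
```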

Main question: If a function f has a “shallow” decision tree, does it have a variable with influence at least a “significant” fraction of Var[f]?

Notation

τ(d) = min_{f : D(f) ≤ d} max_j { I_j(f) / Var[f] }.

Known lower bounds

Suppose f : {−1,1}^n → {−1,1}. An elementary old inequality states

Var[f] ≤ Σ_{j=1}^{n} I_j(f).

Thus f has a variable with influence at least Var[f]/n. A deep inequality of [KKL88] shows there is always a coordinate j such that I_j(f) ≥ Var[f] · Ω(log n / n). If D(f) = d then f really has at most 2^d variables. Hence we get τ(d) ≥ 1/2^d from the first, and τ(d) ≥ Ω(d/2^d) from KKL.

Our result

τ(d) ≥ 1/d.

This is tight: consider the selector function "SEL", which queries x_1 and then outputs x_2 if x_1 = −1 and x_3 if x_1 = 1.

[Figure: the depth-2 decision tree for SEL.]

Then Var[SEL] = 1, d = 2, and all three variables have influence 1/2. (Forming the recursive version, SEL(SEL, SEL, SEL) etc., gives a variance-1 function with d = 2^h and all influences 2^{−h}, for any h.)
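
A two-line check of the tight example (a sketch, not from the slides): SEL has depth d = 2 and every variable has influence exactly 1/2 = 1/d.

```python
from itertools import product

def sel(x):
    """Query x_1; it selects whether x_2 or x_3 is the answer."""
    return x[1] if x[0] == -1 else x[2]

n = 3
inputs = list(product((-1, 1), repeat=n))
for j in range(n):
    I_j = sum(sel(x) != sel(x[:j] + (-x[j],) + x[j + 1:])
              for x in inputs) / 2 ** n
    print(j, I_j)    # each coordinate prints 0.5

# The h-fold composition SEL(SEL, SEL, SEL), etc., has depth d = 2^h and
# all influences 2^{-h} = 1/d, matching tau(d) >= 1/d.
```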

Our actual main theorem

Given a decision tree for f, let δ_j(f) = Pr[tree queries x_j]. Then

Var[f] ≤ Σ_{j=1}^{n} δ_j(f) I_j(f).

Cor: Fix the tree with smallest expected depth. Then Σ_{j=1}^{n} δ_j(f) = E[depth of a path] =: δ(f) ≤ D(f).

⇒ Var[f] ≤ (max_j I_j) · Σ_{j=1}^{n} δ_j = (max_j I_j) · δ(f)
⇒ max_j I_j ≥ Var[f] / δ(f) ≥ Var[f] / D(f).
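
A brute-force check of the main inequality Var[f] ≤ Σ_j δ_j(f) I_j(f) on the Maj_3 tree (a sketch; tree format as in the earlier snippet). For Maj_3: δ = (1, 1, 1/2), all I_j = 1/2, so the right side is 1.25 ≥ Var = 1.

```python
from itertools import product

MAJ3 = (1, (2, -1, (3, -1, 1)), (2, (3, -1, 1), 1))

def evaluate(t, x, queried):
    if t in (-1, 1):
        return t
    j, lo, hi = t
    queried.add(j)
    return evaluate(lo if x[j - 1] == -1 else hi, x, queried)

n = 3
inputs = list(product((-1, 1), repeat=n))
f, delta = {}, [0.0] * n
for x in inputs:
    q = set()
    f[x] = evaluate(MAJ3, x, q)
    for j in q:
        delta[j - 1] += 1 / 2 ** n       # delta_j = Pr[tree queries x_j]

p = sum(v == 1 for v in f.values()) / 2 ** n
var = 4 * p * (1 - p)
I = [sum(f[x] != f[x[:j] + (-x[j],) + x[j + 1:]] for x in inputs) / 2 ** n
     for j in range(n)]
print(var, sum(d * i for d, i in zip(delta, I)))   # 1.0 <= 1.25
```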

Proof

Pick a random path in the tree. This gives some set of variables, P = (x_{J_1}, …, x_{J_T}), along with an assignment to them, β_P. Call the remaining set of variables P̄ and pick a random assignment β_P̄ for them too. Let X be the (uniformly random) string given by combining these two assignments, (β_P, β_P̄). Also, define J_{T+1} = ⋯ = J_n = ⊥.

Proof

Let β′_P be an independent random assignment to the variables in P. Let Z = (β′_P, β_P̄). Note: Z is also uniformly random.

[Figure: the random path fixes x_{J_1} = −1, x_{J_2} = 1, x_{J_3} = −1, …, x_{J_T} = 1, with J_{T+1} = ⋯ = J_n = ⊥; X and Z agree on the unqueried block P̄ and differ on the path block P.]

Proof

Finally, for t = 0…T, let Y_t be the same string as X, except that Z's assignments (β′_P) for the variables x_{J_1}, …, x_{J_t} are swapped in. Note: Y_0 = X, Y_T = Z.

[Figure: the hybrid strings Y_0 = X, Y_1, Y_2, …, Y_T = Z, each obtained from the previous one by swapping in one more of Z's coordinates.]

Also define Y_{T+1} = ⋯ = Y_n = Z.

Var[f] = E[f^2] − E[f]^2
       = E[ f(X) f(X) ] − E[ f(X) f(Z) ]
       = E[ f(X) f(Y_0) − f(X) f(Y_n) ]
       = Σ_{t=1}^{n} E[ f(X) (f(Y_{t−1}) − f(Y_t)) ]
       ≤ Σ_{t=1}^{n} E[ |f(Y_{t−1}) − f(Y_t)| ]
       = Σ_{t=1}^{n} 2 Pr[f(Y_{t−1}) ≠ f(Y_t)]
       = Σ_{t=1}^{n} Σ_{j=1}^{n} Pr[J_t = j] · 2 Pr[f(Y_{t−1}) ≠ f(Y_t) | J_t = j].

Proof

… = Σ_{t=1}^{n} Σ_{j=1}^{n} Pr[J_t = j] · 2 Pr[f(Y_{t−1}) ≠ f(Y_t) | J_t = j].

Utterly Crucial Observation: Conditioned on J_t = j, the pair (Y_{t−1}, Y_t) is jointly distributed exactly as (W, W′), where W is uniformly random and W′ is W with the jth bit rerandomized.

[Figure: the path and hybrid strings again, illustrating that Y_{t−1} and Y_t differ only in the coordinate J_t.]

Proof

… = Σ_{t=1}^{n} Σ_{j=1}^{n} Pr[J_t = j] · 2 Pr[f(Y_{t−1}) ≠ f(Y_t) | J_t = j]
   = Σ_{t=1}^{n} Σ_{j=1}^{n} Pr[J_t = j] · 2 Pr[f(W) ≠ f(W′)]
   = Σ_{t=1}^{n} Σ_{j=1}^{n} Pr[J_t = j] · I_j(f)
   = Σ_{j=1}^{n} I_j · Σ_{t=1}^{n} Pr[J_t = j]
   = Σ_{j=1}^{n} I_j δ_j.  ∎

Part II: Lower bounds for monotone graph properties

Monotone graph properties

Consider graphs on v vertices; let n = (v choose 2).

"Nontrivial monotone graph property":
- "nontrivial property": a (nonempty, nonfull) subset of all v-vertex graphs
- "graph property": closed under permutations of the vertices (⇒ no edge is 'distinguished')
- "monotone": adding edges can only put you into the property, not take you out

e.g.: Contains-A-Triangle, Connected, Has-Hamiltonian-Path, Non-Planar, Has-at-least-n/2-edges, …
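
A small sanity check (a sketch, not from the slides) that Contains-A-Triangle is monotone: adding any edge never destroys the property. Graphs on v = 4 vertices are encoded as sets of the n = (v choose 2) potential edges.

```python
from itertools import combinations, product

v = 4
vertices = range(v)
all_edges = list(combinations(vertices, 2))       # n = C(v, 2) variables

def has_triangle(edges):
    return any({(a, b), (a, c), (b, c)} <= set(edges)
               for a, b, c in combinations(vertices, 3))

for bits in product((0, 1), repeat=len(all_edges)):
    g = frozenset(e for e, b in zip(all_edges, bits) if b)
    for e in all_edges:
        if e not in g:
            # monotone: if g has the property, so does g + e
            assert has_triangle(g) <= has_triangle(g | {e})
print("Contains-A-Triangle is monotone on", v, "vertices")
```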

Aanderaa-Karp-Rosenberg conj.

Every nontrivial monotone graph property has D(f) = n.

[Rivest-Vuillemin-75]: ≥ v^2/16.
[Kleitman-Kwiatkowski-80]: ≥ v^2/9.
[Kahn-Saks-Sturtevant-84]: ≥ n/2; = n if v is a prime power. [Topology + group theory!]
[Yao-88]: = n in the bipartite case.

Randomized DTs

Have 'coin flip' nodes in the tree that cost nothing. Equivalently, a probability distribution over deterministic DTs. Note: we want both 0-sided error and worst-case input.

R(f) = min, over randomized DTs that compute f with 0 error, of the max over inputs x of the expected # of queries. The expectation is only over the DT's internal coins.

Maj_3: D(Maj_3) = 3. Randomized strategy: query two of the three variables at random and check if they're equal; if not, query the 3rd. ⇒ R(Maj_3) ≤ 8/3.

Let f = recursive-Maj_3 [Maj_3(Maj_3, Maj_3, Maj_3), etc.]. For the depth-h version (n = 3^h), D(f) = 3^h and R(f) ≤ (8/3)^h. (Not best possible…!)
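
A sketch (not from the slides) of this randomized evaluation for recursive Maj_3 of height h: at each node, evaluate two random children first and skip the third when they agree. It counts leaf queries; the average below is over random inputs, while (8/3)^h bounds the expectation on every input.

```python
import random

def eval_rmaj(x, h, lo=0):
    """Evaluate height-h recursive Maj_3 on leaves x[lo : lo + 3**h];
    returns (value, number of leaves queried)."""
    if h == 0:
        return x[lo], 1
    size = 3 ** (h - 1)
    kids = random.sample(range(3), 3)     # random evaluation order
    a, qa = eval_rmaj(x, h - 1, lo + kids[0] * size)
    b, qb = eval_rmaj(x, h - 1, lo + kids[1] * size)
    if a == b:
        return a, qa + qb                 # third child not needed
    c, qc = eval_rmaj(x, h - 1, lo + kids[2] * size)
    return c, qa + qb + qc                # a != b, so c breaks the tie

h, trials = 3, 20_000
total = 0
for _ in range(trials):
    x = [random.choice((-1, 1)) for _ in range(3 ** h)]
    total += eval_rmaj(x, h)[1]
print(total / trials, (8 / 3) ** h)   # avg queries vs worst-case bound ~18.96
```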

Randomized AKR / Yao conj.

Yao conjectured in '77 that every nontrivial monotone graph property f has R(f) ≥ Ω(v^2).

Lower bound Ω(·)          Who
v                         [Yao-77]
v log^{1/12} v            [Yao-87]
v^{5/4}                   [King-88]
v^{4/3}                   [Hajnal-91]
v^{4/3} log^{1/3} v       [Chakrabarti-Khot-01]
min{ v/p, v^2/log v }     [Fried.-Kahn-Wigd.-02]
v^{4/3} / p^{1/3}         [us]

Outline

Extend the main inequality to the p-biased case. (Then the LHS is 1.) Use Yao's minmax principle: show that under the p-biased distribution on {−1,1}^n, δ = Σ δ_j = avg # of queries is large for any tree.

Main inequality: max influence is small ⇒ δ is large.
Graph property ⇒ all variables have the same influence. Hence: sum of influences is small ⇒ δ is large.
[OS04]: f monotone ⇒ sum of influences ≤ √δ. Hence: sum of influences is large ⇒ δ is large.
So either way, δ is large.

Generalizing the inequality

Var[f] ≤ Σ_{j=1}^{n} δ_j(f) I_j(f).

Generalizations (which basically require no change to the proof):
- holds for randomized DTs
- holds for randomized "subcube partitions"
- holds for functions on any product probability space f : Ω_1 × ⋯ × Ω_n → {−1,1} (with the notion of "influence" suitably generalized)
- holds for real-valued functions, with a (necessary) loss of a factor of at most √δ

Closing thought

It's funny that our bound gets stuck at roughly the same level as Hajnal / Chakrabarti-Khot: n^{2/3} = v^{4/3}. Note that n^{2/3} [I believe] cannot be improved by more than a log factor merely for monotone transitive functions, due to [BSW04]. Thus to get better than v^{4/3} for monotone graph properties, you must use the fact that it's a graph property. Chakrabarti-Khot definitely uses the fact that it's a graph property (all sorts of graph-packing lemmas). Or do they? Since they get stuck at essentially v^{4/3}, I wonder if there's any chance their result doesn't truly need the fact that it's a graph property…