# Computational Applications of Noise Sensitivity

Ryan O’Donnell

Includes joint work with Elchanan Mossel, Rocco Servedio, Adam Klivans, Nader Bshouty, Oded Regev, and Benny Sudakov.

## Intro to Noise Sensitivity

Election schemes. Suppose there is an election between two parties, called 0 and 1, and assume (unrealistically) that $n$ voters cast their votes independently and uniformly at random. An election scheme is a boolean function $f : \{0,1\}^n \to \{0,1\}$ mapping the votes to the winner. What if there are errors in recording the votes? Suppose each vote is misrecorded independently with probability $\varepsilon$. What is the probability that this affects the election’s outcome?

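Definition. Let $f : \{0,1\}^n \to \{0,1\}$ be any boolean function, and let $0 \le \varepsilon \le \tfrac12$ be the noise rate. Let $x$ be a uniformly random string in $\{0,1\}^n$, and let $y$ be an $\varepsilon$-noisy version of $x$, obtained by flipping each bit of $x$ independently with probability $\varepsilon$. Then the noise sensitivity of $f$ at $\varepsilon$ is

$$\mathrm{NS}_\varepsilon(f) = \Pr_{x,y}[f(x) \ne f(y)].$$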
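The definition translates directly into a sampling procedure. Below is a minimal Python sketch (the names `noisy_copy` and `estimate_ns` are our own, not from the talk) that estimates $\mathrm{NS}_\varepsilon(f)$ by drawing a uniform $x$, flipping each bit with probability $\varepsilon$, and counting how often $f$ changes.

```python
import random


def noisy_copy(x, eps, rng):
    """Return an eps-noisy version of x: each bit flipped independently w.p. eps."""
    return tuple(b ^ 1 if rng.random() < eps else b for b in x)


def estimate_ns(f, n, eps, trials=50_000, seed=0):
    """Monte Carlo estimate of NS_eps(f) = Pr[f(x) != f(y)] over uniform x in {0,1}^n."""
    rng = random.Random(seed)
    disagreements = 0
    for _ in range(trials):
        x = tuple(rng.randint(0, 1) for _ in range(n))  # uniform random input
        y = noisy_copy(x, eps, rng)                      # its eps-noisy version
        disagreements += f(x) != f(y)
    return disagreements / trials


# Usage: 5-bit majority at noise rate 0.1.
maj5 = lambda x: int(sum(x) >= 3)
print(estimate_ns(maj5, 5, 0.1))
```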

Examples. If $f$ is the constant function $f(x) = 1$, then $\mathrm{NS}_\varepsilon(f) = 0$. If $f$ is the “dictator” function $f(x) = x_1$, then $\mathrm{NS}_\varepsilon(f) = \varepsilon$. In general, for fixed $f$, $\mathrm{NS}_\varepsilon(f)$ is a function of $\varepsilon$.
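For small $n$ one can also compute $\mathrm{NS}_\varepsilon(f)$ exactly by summing over every input $x$ and every flip pattern $z$, which confirms both examples. This is a brute-force sketch (the helper `exact_ns` is hypothetical), exponential in $n$ and only meant for checking small cases.

```python
from itertools import product


def exact_ns(f, n, eps):
    """Exact NS_eps(f) by brute force over all x and all flip patterns z:
    sum of 2^-n * eps^|z| * (1 - eps)^(n - |z|) over pairs with f(x) != f(x XOR z).
    Takes 4^n steps, so it is only meant for small n."""
    total = 0.0
    for x in product((0, 1), repeat=n):
        for z in product((0, 1), repeat=n):
            flips = sum(z)
            y = tuple(a ^ b for a, b in zip(x, z))
            if f(x) != f(y):
                total += eps**flips * (1 - eps) ** (n - flips)
    return total / 2**n


print(exact_ns(lambda x: 1, 4, 0.1))     # constant function
print(exact_ns(lambda x: x[0], 4, 0.1))  # dictator on the first bit
```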

Examples: parity. The parity (XOR) function on $n$ bits is 1 iff there is an odd number of 1s in the input. In calculating $\Pr[f(x) \ne f(y)]$, it doesn’t matter what $x$ is, only how many bits are flipped. Hence

$$\mathrm{NS}_\varepsilon(\mathrm{PARITY}_n) = \Pr[\text{odd number of heads in } n \text{ } \varepsilon\text{-biased coin flips}] = \tfrac12 - \tfrac12(1 - 2\varepsilon)^n.$$

[Figure: $\mathrm{NS}_\varepsilon(\mathrm{PARITY}_{10}) = \tfrac12 - \tfrac12(1 - 2\varepsilon)^{10}$ plotted against $\varepsilon$.]
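The closed form comes from the binomial theorem: $\sum_j \binom{n}{j}\varepsilon^j(1-\varepsilon)^{n-j}(-1)^j = (1-2\varepsilon)^n$ equals $\Pr[\text{even}] - \Pr[\text{odd}]$, and the two probabilities sum to 1. A short Python check comparing the explicit sum over odd $j$ with the formula (helper names are illustrative):

```python
from math import comb


def prob_odd_heads(n, eps):
    """Pr[odd number of heads among n independent coins, each heads w.p. eps]."""
    return sum(comb(n, j) * eps**j * (1 - eps) ** (n - j) for j in range(1, n + 1, 2))


def ns_parity(n, eps):
    """Closed form NS_eps(PARITY_n) = 1/2 - 1/2 (1 - 2 eps)^n."""
    return 0.5 - 0.5 * (1 - 2 * eps) ** n


for eps in (0.01, 0.05, 0.1, 0.25, 0.5):
    print(eps, prob_odd_heads(10, eps), ns_parity(10, eps))
```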

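Basic facts about NS. $\mathrm{NS}_\varepsilon(f)$ is an increasing, (log-)concave function of $\varepsilon$ which is 0 at $\varepsilon = 0$ and $2p(1-p)$ at $\varepsilon = \tfrac12$, where $p = \Pr[f = 1]$. This follows from a formula for $\mathrm{NS}_\varepsilon(f)$ in terms of the Fourier coefficients $\hat f(S)$ of $f$ (viewed as a real-valued 0/1 function, so that $\hat f(\emptyset) = p$):

$$\mathrm{NS}_\varepsilon(f) = 2\hat f(\emptyset) - 2\sum_{S \subseteq [n]} (1-2\varepsilon)^{|S|}\,\hat f(S)^2.$$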
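The Fourier formula can be checked numerically on small examples. The sketch below assumes the convention the formula implies: $f$ is treated as a 0/1-valued real function and $\hat f(S) = \mathbb{E}_x[f(x)(-1)^{\sum_{i \in S} x_i}]$. It computes every coefficient by brute force and compares the formula with direct enumeration; all function names are hypothetical.

```python
from itertools import product


def fourier_coeffs(f, n):
    """All coefficients hat f(S) = E_x[f(x) * (-1)^{sum_{i in S} x_i}],
    with S encoded as a 0/1 indicator tuple and f treated as 0/1-valued."""
    points = list(product((0, 1), repeat=n))
    coeffs = {}
    for S in points:
        total = 0
        for x in points:
            parity = sum(s & b for s, b in zip(S, x)) % 2
            total += f(x) * (-1 if parity else 1)
        coeffs[S] = total / 2**n
    return coeffs


def ns_from_fourier(f, n, eps):
    """2 hat f(empty) - 2 sum_S (1 - 2 eps)^|S| hat f(S)^2."""
    c = fourier_coeffs(f, n)
    return 2 * c[(0,) * n] - 2 * sum((1 - 2 * eps) ** sum(S) * v**2 for S, v in c.items())


def ns_brute_force(f, n, eps):
    """Direct enumeration of Pr[f(x) != f(y)] for comparison."""
    total = 0.0
    for x in product((0, 1), repeat=n):
        for z in product((0, 1), repeat=n):
            if f(x) != f(tuple(a ^ b for a, b in zip(x, z))):
                total += eps ** sum(z) * (1 - eps) ** (n - sum(z))
    return total / 2**n


maj3 = lambda x: int(sum(x) >= 2)
and3 = lambda x: int(all(x))
for g in (maj3, and3):
    print(ns_from_fourier(g, 3, 0.1), ns_brute_force(g, 3, 0.1))
```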

[Figures: $\mathrm{NS}_\varepsilon$ as a function of $\varepsilon$ for PARITY, MAJORITY, dictator, and AND on 5, 15, and 45 bits.]
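The plotted curves can be reproduced, assuming each plot shows $\mathrm{NS}_\varepsilon$ against $\varepsilon$. For a symmetric function $f(x) = g(|x|)$ only the Hamming weight of $x$ matters, which makes an exact computation feasible even at $n = 45$; the dictator curve is simply $\varepsilon$. A Python sketch (the helper `ns_symmetric` is ours, and the $\varepsilon$ grid is arbitrary):

```python
from math import comb


def ns_symmetric(g, n, eps):
    """Exact NS_eps(f) for a symmetric f(x) = g(|x|), |x| = number of 1s in x.

    If x has weight w, the noisy copy has weight w - a + b, where
    a ~ Bin(w, eps) ones flip to 0 and b ~ Bin(n - w, eps) zeros flip to 1."""
    def binom_pmf(m, k):
        return comb(m, k) * eps**k * (1 - eps) ** (m - k)

    total = 0.0
    for w in range(n + 1):
        p_w = comb(n, w) / 2**n  # Pr[|x| = w] for uniform x
        for a in range(w + 1):
            for b in range(n - w + 1):
                if g(w) != g(w - a + b):
                    total += p_w * binom_pmf(w, a) * binom_pmf(n - w, b)
    return total


for n in (5, 15, 45):
    curves = {
        "PARITY": lambda w: w % 2,
        "MAJORITY": lambda w, n=n: int(2 * w > n),
        "AND": lambda w, n=n: int(w == n),
    }
    for eps in (0.01, 0.05, 0.1, 0.2):
        row = {name: round(ns_symmetric(g, n, eps), 4) for name, g in curves.items()}
        row["dictator"] = eps
        print(n, eps, row)
```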

## History of Noise Sensitivity (in computer science)

History of Noise Sensitivity: Kahn-Kalai-Linial ’88, “The Influence of Variables on Boolean Functions”.

Kahn-Kalai-Linial ’88 implicitly studied noise sensitivity. Their motivation was the study of random walks on the hypercube whose initial distribution is uniform over a subset. The question “What is the probability that a random walk of length $\varepsilon n$, starting uniformly in $f^{-1}(1)$, ends up outside $f^{-1}(1)$?” is essentially asking about $\mathrm{NS}_\varepsilon(f)$. The paper is famous in TCS for its use of Fourier analysis and the Bonami-Beckner inequality.

History of Noise Sensitivity: Håstad ’97, “Some Optimal Inapproximability Results”.

Håstad ’97 proved breakthrough hardness of approximation results. A key ingredient is decoding the Long Code: given access to the truth table of a function, one wants to test that it is “significantly” determined by a “junta” (a very small number of variables). Roughly, this is done with a noise sensitivity test: pick $x$ and $y$ as in the definition of noise sensitivity and check that $f(x) = f(y)$.

History of Noise Sensitivity: Benjamini-Kalai-Schramm ’98, “Noise Sensitivity of Boolean Functions and Applications to Percolation”.

Benjamini-Kalai-Schramm ’98 made an intensive study of the noise sensitivity of boolean functions. They introduced asymptotic notions of noise sensitivity and noise stability and related them to Fourier coefficients, studied the noise sensitivity of percolation functions and threshold functions, made conjectures connecting noise sensitivity to circuit complexity, and more.

This thesis: new noise sensitivity results and applications, namely tight noise sensitivity estimates for boolean halfspaces and monotone functions; hardness amplification theorems (for NP); learning algorithms for halfspaces, DNF (from random walks), and juntas; and a new coin-flipping problem, with a use of the “reverse” Bonami-Beckner inequality.

## Hardness Amplification

Hardness on average. Definition: we say $f : \{0,1\}^n \to \{0,1\}$ is $(1-\varepsilon)$-hard for circuits of size $s$ if no circuit of size $s$ computes $f$ correctly on more than $(1-\varepsilon)2^n$ inputs. Definition: a complexity class is $(1-\varepsilon)$-hard for polynomial circuits if there is a function family $(f_n)$ in the class such that, for sufficiently large $n$, $f_n$ is $(1-\varepsilon)$-hard for circuits of size $\mathrm{poly}(n)$.

Hardness of EXP, NP. Of course we can’t show NP is even $(1-2^{-n})$-hard for poly circuits, since this is exactly the statement $\mathrm{NP} \not\subseteq \mathrm{P/poly}$. But let’s assume $\mathrm{EXP}, \mathrm{NP} \not\subseteq \mathrm{P/poly}$. Then just how hard are these classes for poly circuits? For EXP, extremely strong results are known [BFNW93, Imp95, IW97, KvM99, STV99]: if EXP is $(1-2^{-n})$-hard for poly circuits, then it is $(\tfrac12 + 1/\mathrm{poly}(n))$-hard for poly circuits. What about NP?

Yao’s XOR Lemma. Some of the hardness amplification results for EXP use Yao’s XOR Lemma. Theorem: if $f$ is $(1-\varepsilon)$-hard for poly circuits, then $\mathrm{PARITY}_k \otimes f$ is $(\tfrac12 + \tfrac12(1-2\varepsilon)^k)$-hard for poly circuits. Here, if $f$ is a boolean function on $n$ inputs and $g$ is a boolean function on $k$ inputs, $g \otimes f$ is the function on $kn$ inputs given by $g(f(x_1), \dots, f(x_k))$. It is no coincidence that the hardness bound for $\mathrm{PARITY}_k \otimes f$ is $1 - \mathrm{NS}_\varepsilon(\mathrm{PARITY}_k)$.

A general direct product theorem. Yao’s lemma doesn’t help for NP: if $f_n$ is a hard function in NP, then $\mathrm{PARITY}_k \otimes f_n$ probably isn’t in NP. We generalize Yao and determine the hardness of $g \otimes f_n$ for any $g$, in terms of the noise sensitivity of $g$. Theorem: if $f_n$ is balanced and $(1-\varepsilon)$-hard for poly circuits, then $g \otimes f_n$ is roughly $(1-\mathrm{NS}_\varepsilon(g))$-hard for poly circuits.

Why noise sensitivity? Suppose $f$ is balanced and $(1-\varepsilon)$-hard for poly circuits. Inputs $x_1, \dots, x_k$ are chosen uniformly at random, and you, a poly circuit, have to guess $g(f(x_1), \dots, f(x_k))$. The natural strategy is to compute a guess $y_i$ for each $f(x_i)$ and then output $g(y_1, \dots, y_k)$. But $f$ is $(1-\varepsilon)$-hard for you, so $\Pr[f(x_i) \ne y_i] = \varepsilon$. Treating these errors as independent, the success probability is

$$\Pr[g(f(x_1), \dots, f(x_k)) = g(y_1, \dots, y_k)] = 1 - \mathrm{NS}_\varepsilon(g).$$

Hardness of NP. If $(f_n)$ is a (hard) function family in NP and $(g_k)$ is a monotone function family, then $(g_k \otimes f_n)$ is in NP. We give constructions and prove tight bounds for the problem of finding monotone $g$ such that $\mathrm{NS}_\varepsilon(g)$ is very large (close to $\tfrac12$) for very small $\varepsilon$. Theorem: if NP is $(1 - 1/\mathrm{poly}(n))$-hard for poly circuits, then NP is $(\tfrac12 + 1/\sqrt{n})$-hard for poly circuits.

## Learning algorithms

Learning theory. Learning theory [Valiant84] deals with the following scenario: someone holds an $n$-bit boolean function $f$; you know that $f$ belongs to some class of functions (e.g., parities of subsets, or poly-size DNF); you are given a collection of uniformly random labeled examples $(x, f(x))$; and you must efficiently come up with a hypothesis function $h$ that predicts $f$ well.

Learning noise-stable functions. We introduce a new idea for showing that function classes are learnable: noise-stable classes are efficiently learnable. Theorem: suppose $C$ is a class of boolean functions on $n$ bits such that $\mathrm{NS}_\varepsilon(f) \le \beta(\varepsilon)$ for all $f \in C$. Then there is an algorithm for learning $C$ to within accuracy $\varepsilon$ in time $n^{O(1)/\beta^{-1}(\varepsilon)}$.
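The slide does not spell out the proof. One standard route, which we assume here, goes through Fourier concentration: for a $\pm1$-valued $f$, $2\,\mathrm{NS}_\varepsilon(f) = \sum_S (1-(1-2\varepsilon)^{|S|})\hat f(S)^2$, so the Fourier weight on sets of size at least $m$ is at most $2\,\mathrm{NS}_{1/m}(f)/(1-e^{-2})$. Choosing $m$ of order $1/\beta^{-1}(\varepsilon)$ makes that tail small, and the low-degree algorithm of Linial, Mansour and Nisan then estimates the roughly $n^m$ coefficients of degree at most $m$ from random examples. A minimal Python sketch of the low-degree algorithm (function names are hypothetical):

```python
import random
from itertools import combinations


def chi(S, x):
    """Character chi_S(x) = (-1)^{sum_{i in S} x_i} for x in {0,1}^n."""
    return -1 if sum(x[i] for i in S) % 2 else 1


def low_degree_learn(samples, n, degree):
    """Low-degree algorithm: estimate hat f(S) for every |S| <= degree as the
    empirical mean of f(x) * chi_S(x) (labels f(x) in {-1, +1}), then output
    the hypothesis h(x) = sgn(sum_S estimate(S) * chi_S(x))."""
    m = len(samples)
    estimates = {}
    for d in range(degree + 1):
        for S in combinations(range(n), d):
            estimates[S] = sum(label * chi(S, x) for x, label in samples) / m

    def h(x):
        value = sum(c * chi(S, x) for S, c in estimates.items())
        return 1 if value >= 0 else -1

    return h


# Usage: learn 5-bit majority (as a +/-1 function) from 10,000 random examples.
rng = random.Random(1)
n = 5
maj = lambda x: 1 if sum(x) > n / 2 else -1
samples = []
for _ in range(10_000):
    x = tuple(rng.randint(0, 1) for _ in range(n))
    samples.append((x, maj(x)))
h = low_degree_learn(samples, n, degree=1)
```

The `degree` argument plays the role of $m$; the usage example takes $m = 1$ because 5-bit majority is already determined by the sign of its degree-1 part.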

Example: halfspaces. Using [Peres98], every boolean function $f$ that is an “intersection of two halfspaces” has $\mathrm{NS}_\varepsilon(f) \le O(\sqrt{\varepsilon})$; here $\beta(\varepsilon) = O(\sqrt{\varepsilon})$, so $\beta^{-1}(\varepsilon) = \Omega(\varepsilon^2)$. Corollary: the class of intersections of two halfspaces can be learned in time $n^{O(1/\varepsilon^2)}$. No subexponential algorithm was previously known. We also analyze the noise sensitivity of some more complicated classes based on halfspaces and obtain learning algorithms for them.

Why noise stability? Suppose a function $f$ is fairly noise stable. In some sense this means that if you know $f(x)$, you have a good guess for $f(y)$ for points $y$ that are somewhat close to $x$ in Hamming distance. Idea: draw a “net” of examples $(x_1, f(x_1)), \dots, (x_M, f(x_M))$. To hypothesize about $y$, compute a weighted average of the known labels, with weights based on distance to $y$:

$$h(y) = \mathrm{sgn}\big[w(\Delta(y,x_1))\,f(x_1) + \cdots + w(\Delta(y,x_M))\,f(x_M)\big],$$

where $\Delta$ is Hamming distance and $f$ is viewed as $\pm1$-valued.
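A sketch of the weighted-vote hypothesis in Python. The weight function $w(d) = 0.5^d$, the sample sizes, and the dictator target are hypothetical choices for illustration; the slide leaves $w$ unspecified.

```python
import random


def hamming(x, y):
    """Hamming distance between two equal-length 0/1 tuples."""
    return sum(a != b for a, b in zip(x, y))


def weighted_vote_hypothesis(examples, w):
    """Return h(y) = sgn(sum_i w(Delta(y, x_i)) * f(x_i)), labels f(x_i) in {-1, +1}."""
    def h(y):
        total = sum(w(hamming(y, x)) * label for x, label in examples)
        return 1 if total >= 0 else -1

    return h


w = lambda d: 0.5**d  # hypothetical weight: geometric decay in distance

rng = random.Random(0)
n = 15
target = lambda x: 1 if x[0] == 1 else -1  # dictator on bit 0, as a +/-1 function
examples = []
for _ in range(2000):
    x = tuple(rng.randint(0, 1) for _ in range(n))
    examples.append((x, target(x)))
h = weighted_vote_hypothesis(examples, w)

queries = [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(100)]
accuracy = sum(h(y) == target(y) for y in queries) / len(queries)
print(accuracy)
```

Geometric decay is one simple way to make nearby examples dominate the vote; a noise-stable target is exactly one for which nearby labels tend to agree.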

Learning from random walks. The holy grail of learning: learn poly-size DNF formulas in polynomial time. Consider a natural weakening of the learning model in which examples are not i.i.d. but come from a random walk. We show that DNF is poly-time learnable in this model, and indeed also in a harder model, the “NS-model”, where examples have the form $(x, f(x), y, f(y))$ with $y$ a noisy copy of $x$. Proof idea: estimate NS on subsets of the input bits $\Rightarrow$ find the large Fourier coefficients.

Learning juntas. The essential obstacle to learning poly-size DNF formulas is that they can be $O(\log n)$-juntas. Previously, no algorithm was known for learning $k$-juntas in time better than the trivial $n^k$. We give the first improvement: our algorithm runs in time $n^{0.704k}$. Can the strong relationship between juntas and noise sensitivity improve this?
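For reference, the trivial $n^k$ algorithm can be sketched as follows: try every $k$-subset $T$ of the variables (there are $\binom{n}{k} \le n^k$ of them) and return one on which the labeled examples are consistent, i.e. no two examples agree on $T$ but carry different labels. This is not the $n^{0.704k}$ algorithm, whose details are not on the slide; the function names and the example junta are hypothetical.

```python
import random
from itertools import combinations


def consistent(samples, T):
    """True if no two samples agree on the coordinates in T but have different labels."""
    seen = {}
    for x, label in samples:
        key = tuple(x[i] for i in T)
        if seen.setdefault(key, label) != label:
            return False
    return True


def learn_junta_brute_force(samples, n, k):
    """Trivial ~n^k algorithm: return the first k-subset of variables on which
    the samples are consistent, or None if there is none."""
    for T in combinations(range(n), k):
        if consistent(samples, T):
            return T
    return None


# Usage: hidden 3-junta x0 XOR x3 XOR x7 on 10 bits.
rng = random.Random(0)
n, k = 10, 3
f = lambda x: x[0] ^ x[3] ^ x[7]
samples = []
for _ in range(200):
    x = tuple(rng.randint(0, 1) for _ in range(n))
    samples.append((x, f(x)))
print(learn_junta_brute_force(samples, n, k))
```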

## Coin flipping

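The $T_{1-2\varepsilon}$ operator. $T_{1-2\varepsilon}$ operates on the space of functions $\{0,1\}^n \to \mathbb{R}$:

$$T_{1-2\varepsilon}(f)(x) = \mathbb{E}_{y = \mathrm{noise}_\varepsilon(x)}[f(y)],$$

which equals $\Pr[f(y) = 1]$ when $f$ is boolean. A notable fact about $T_{1-2\varepsilon}$ is the Bonami-Beckner [Bon68] “hypercontractive” inequality:

$$\|T_\lambda(f)\|_2 \le \|f\|_{1+\lambda^2}.$$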
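The operator and the inequality can be checked numerically on a tiny cube. This Python sketch (names hypothetical) computes $T_\rho f$ by brute force, with $\rho = 1 - 2\varepsilon$ so that $\varepsilon = (1-\rho)/2$, takes norms with respect to the uniform distribution, and tests $\|T_\lambda f\|_2 \le \|f\|_{1+\lambda^2}$ on random real-valued functions. It is a sanity check, not a proof.

```python
import random
from itertools import product


def noise_operator(f_vals, n, rho):
    """T_rho f(x) = E[f(y)] where y flips each bit of x independently w.p.
    eps = (1 - rho) / 2, i.e. rho = 1 - 2 eps. f_vals maps x (a 0/1 tuple) to a real."""
    eps = (1 - rho) / 2
    points = list(product((0, 1), repeat=n))
    result = {}
    for x in points:
        total = 0.0
        for y in points:
            d = sum(a != b for a, b in zip(x, y))  # Hamming distance
            total += eps**d * (1 - eps) ** (n - d) * f_vals[y]
        result[x] = total
    return result


def norm(f_vals, p):
    """||f||_p = (E_x |f(x)|^p)^(1/p) under the uniform distribution, p >= 1."""
    vals = list(f_vals.values())
    return (sum(abs(v) ** p for v in vals) / len(vals)) ** (1 / p)


# Check ||T_lam f||_2 <= ||f||_{1 + lam^2} on a few random real-valued functions.
rng = random.Random(0)
n = 4
for lam in (0.2, 0.5, 0.9):
    f_vals = {x: rng.gauss(0, 1) for x in product((0, 1), repeat=n)}
    lhs = norm(noise_operator(f_vals, n, lam), 2)
    rhs = norm(f_vals, 1 + lam**2)
    print(lam, round(lhs, 4), round(rhs, 4), lhs <= rhs)
```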

[Figure: Bonami, Beckner]

The $T_{1-2\varepsilon}$ operator. It follows easily that, for $f$ viewed as $\pm1$-valued,

$$\mathrm{NS}_\varepsilon(f) = \tfrac12 - \tfrac12\,\|T_{\sqrt{1-2\varepsilon}}(f)\|_2^2.$$

Thus studying noise sensitivity is equivalent to studying the 2-norm of the $T_{1-2\varepsilon}$ operator. We consider studying higher norms of the $T_{1-2\varepsilon}$ operator. The problem can be phrased combinatorially, in terms of a natural coin-flipping problem.
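A numerical check of the identity, assuming (as it requires) that $f$ takes values $\pm1$. In that case $\|T_{\sqrt{1-2\varepsilon}}f\|_2^2 = \sum_S (1-2\varepsilon)^{|S|}\hat f(S)^2 = \mathbb{E}[f(x)f(y)] = 1 - 2\,\mathrm{NS}_\varepsilon(f)$. Helper names are hypothetical.

```python
import math
import random
from itertools import product


def noise_operator(f_vals, n, rho):
    """T_rho f(x) = E[f(y)], y flipping each bit of x independently w.p. (1 - rho) / 2."""
    eps = (1 - rho) / 2
    points = list(product((0, 1), repeat=n))
    result = {}
    for x in points:
        total = 0.0
        for y in points:
            d = sum(a != b for a, b in zip(x, y))
            total += eps**d * (1 - eps) ** (n - d) * f_vals[y]
        result[x] = total
    return result


def ns_brute_force(f_vals, n, eps):
    """Pr[f(x) != f(y)] for uniform x and y an eps-noisy copy of x."""
    points = list(product((0, 1), repeat=n))
    total = 0.0
    for x in points:
        for y in points:
            d = sum(a != b for a, b in zip(x, y))
            if f_vals[x] != f_vals[y]:
                total += eps**d * (1 - eps) ** (n - d)
    return total / 2**n


def ns_via_operator(f_vals, n, eps):
    """1/2 - 1/2 ||T_{sqrt(1 - 2 eps)} f||_2^2, valid for +/-1-valued f."""
    g = noise_operator(f_vals, n, math.sqrt(1 - 2 * eps))
    return 0.5 - 0.5 * sum(v * v for v in g.values()) / 2**n


rng = random.Random(0)
n = 4
f_vals = {x: rng.choice((-1, 1)) for x in product((0, 1), repeat=n)}
for eps in (0.05, 0.2, 0.4):
    print(eps, ns_via_operator(f_vals, n, eps), ns_brute_force(f_vals, n, eps))
```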

“Cosmic coin flipping”. $n$ random votes are cast in an election, and we use a balanced election scheme $f$. $k$ different auditors get copies of the votes; however, each gets an $\varepsilon$-noisy copy. What is the probability that all $k$ auditors agree on the winner of the election? Equivalently, $k$ distributed parties want to flip a shared random coin given noisy access to a “cosmic” random string.
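A Monte Carlo sketch of the cosmic coin-flipping experiment in Python. It assumes every auditor uses majority as the balanced scheme $f$ (with $n$ odd); $n$, $\varepsilon$, and the trial counts are arbitrary illustrative choices, and the function name is hypothetical.

```python
import random


def all_agree_probability(k, n, eps, trials=3_000, seed=0):
    """Monte Carlo estimate of Pr[all k auditors name the same winner], where the
    votes x are uniform in {0,1}^n, each auditor sees its own eps-noisy copy of x,
    and every auditor applies majority (assumed scheme; n should be odd)."""
    rng = random.Random(seed)
    agreements = 0
    for _ in range(trials):
        x = [rng.randint(0, 1) for _ in range(n)]
        winners = set()
        for _ in range(k):
            # Votes for party 1 in this auditor's independently corrupted copy.
            ones = sum(b ^ (rng.random() < eps) for b in x)
            winners.add(2 * ones > n)
            if len(winners) > 1:  # two auditors already disagree
                break
        agreements += len(winners) == 1
    return agreements / trials


for k in (1, 2, 5, 10, 20):
    print(k, all_agree_probability(k, n=31, eps=0.1))
```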

Relevance of the problem. One application of this scenario is the “everlasting security” of [DingRabin01], a cryptographic protocol that assumes many distributed parties have access to a satellite broadcasting a stream of random bits. It is also a natural error-correction problem: without encoding, can the parties attain some shared entropy?

Success as a function of $k$. The most interesting asymptotic case is $\varepsilon$ a small constant, $n$ unbounded, and $k \to \infty$. What is the maximum success probability? Surprisingly, it goes to 0 only polynomially. Theorem: the best success probability of $k$ players is $\tilde O(1/k^{4\varepsilon})$, with the majority function being essentially optimal.

Reverse Bonami-Beckner. To prove that no protocol can do better than $k^{-\Omega(1)}$, we need a reverse Bonami-Beckner inequality [Bor82]: for $f \ge 0$ and $t \ge 0$,

$$\|T_\lambda(f)\|_{1-t/\lambda} \ge \|f\|_{1-t\lambda}.$$

Concentration of measure interpretation: let $A$ be a reasonably large subset of the cube. Then almost all $x$ have $\Pr[y \in A]$ somewhat large, where $y$ is a noisy copy of $x$.
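As with the forward inequality, the reverse form can be sanity-checked numerically. For $0 < \lambda < 1$ and $t > 0$ the exponents satisfy $q = 1 - t/\lambda < p = 1 - t\lambda < 1$, and the “norms” $\|g\|_r = (\mathbb{E}\, g^r)^{1/r}$ are taken for strictly positive $g$, so $r$ may be negative (the sketch avoids $r = 0$). This Python sketch (names hypothetical) checks the inequality on random positive functions; it proves nothing.

```python
import random
from itertools import product


def noise_operator(f_vals, n, rho):
    """T_rho f(x) = E[f(y)], y flipping each bit of x independently w.p. (1 - rho) / 2."""
    eps = (1 - rho) / 2
    points = list(product((0, 1), repeat=n))
    result = {}
    for x in points:
        total = 0.0
        for y in points:
            d = sum(a != b for a, b in zip(x, y))
            total += eps**d * (1 - eps) ** (n - d) * f_vals[y]
        result[x] = total
    return result


def power_mean(f_vals, r):
    """||f||_r = (E_x f(x)^r)^(1/r) for strictly positive f and r != 0 (r may be < 0)."""
    vals = list(f_vals.values())
    return (sum(v**r for v in vals) / len(vals)) ** (1 / r)


# Check ||T_lam f||_{1 - t/lam} >= ||f||_{1 - t lam} on random positive functions.
rng = random.Random(0)
n = 4
for lam, t in ((0.5, 0.3), (0.5, 1.0), (0.8, 0.1)):
    p, q = 1 - t * lam, 1 - t / lam
    f_vals = {x: rng.uniform(0.01, 1.0) for x in product((0, 1), repeat=n)}
    lhs = power_mean(noise_operator(f_vals, n, lam), q)
    rhs = power_mean(f_vals, p)
    print(lam, t, round(lhs, 4), round(rhs, 4), lhs >= rhs)
```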

## Conclusions

Open directions. Estimate the noise sensitivity of various classes of functions, such as general intersections of threshold functions and percolation functions; find new hardness of approximation results using the NS-junta connection [DS02, Kho02, DF03?]; find a substantially better algorithm for learning juntas; explore applications of the reverse Bonami-Beckner inequality (to coding theory, for example?).