
1 Complexity Theory Lecture 12 Lecturer: Moni Naor

2 Recap. Last week: hardness and randomness; semi-random sources; extractors. This week: finish hardness and randomness; circuit complexity; the class NC; formulas = NC^1; the lower bound for Andreev's function; a communication complexity characterization of depth.

3 Derandomization. A major research question: how to make the construction of small sample spaces 'resembling' a large one, and of hitting sets, efficient. Successful approach: randomness from hardness, via (cryptographic) pseudo-random generators and complexity-oriented pseudo-random generators.

4 Extending the result. Theorem: if E contains 2^{Ω(n)}-unapproximable functions, then BPP = P. The assumption is an average-case one, based on non-uniformity. Improvement: Theorem: if E contains functions that require circuits of size 2^{Ω(n)} (for the worst case), then E contains 2^{Ω(n)}-unapproximable functions. Corollary: if E requires exponential-size circuits, then BPP = P.

5 How to extend the result. Recall the worst-case to average-case reduction for the permanent. The idea: encode the function in a form that lets you translate a few worst-case errors into random errors.

6 Properties of a code. Want a code C: {0,1}^{2^n} → {0,1}^{2^ℓ} where: 2^ℓ is polynomial in 2^n; C is polynomial-time computable (efficient encoding); and C has certain local decoding properties.

7 Codes and Hardness. Use for worst-case to average-case hardness: the truth table m_f of f: {0,1}^n → {0,1} (worst-case hard) is encoded as C(m_f), the truth table of f′: {0,1}^ℓ → {0,1} (average-case hard). [Figure: the 2^n-bit message m_f is mapped to the longer codeword C(m_f).]

8 Codes and Hardness. If 2^ℓ is polynomial in 2^n, then f ∈ E implies f′ ∈ E. Want to be able to prove: if f′ is s′-approximable, then f is computable by a circuit of size s = poly(s′).

9 Codes and Hardness. Key point: a circuit C that approximates f′ implicitly defines a received word R_C that is not far from C(m_f). Want the decoding procedure D to compute f exactly. This requires a special notion of efficient decoding. [Figure: D maps the corrupted received word R_C back to the message bits.]

10 Decoding requirements. Want: for any received word R that is not far from C(m), and for any input bit 1 ≤ i ≤ 2^n, we can reconstruct m(i) with probability 2/3 by accessing only poly(n) locations in R. An example of a code with good local decoding properties is Hadamard, but it has exponential length. This gives a probabilistic circuit for f of size poly(n) · size(C) + the size of the decoding circuit. Since probabilistic circuits have deterministic versions of similar size, this yields the desired contradiction.
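
A minimal sketch (Python) of the local decoding behind the Hadamard example: the codeword of a message m ∈ {0,1}^n has one bit ⟨m, x⟩ mod 2 per position x ∈ {0,1}^n, and each message bit can be recovered from two random queries. The helper names and parameters here are illustrative, not from the lecture.

```python
import random
from itertools import product

def hadamard_bit(m, x):
    # Codeword bit at position x: inner product <m, x> mod 2.
    return sum(mi & xi for mi, xi in zip(m, x)) % 2

def local_decode(R, i, n, trials=31):
    # Recover m[i] from a received word R (indexed by n-bit tuples) that
    # agrees with the codeword on most positions, querying only 2 spots
    # per trial: x and x with bit i flipped.
    votes = 0
    for _ in range(trials):
        x = tuple(random.randint(0, 1) for _ in range(n))
        x_flip = tuple(b ^ (j == i) for j, b in enumerate(x))
        # If both queried positions are uncorrupted, R[x] ^ R[x_flip] = m[i].
        votes += R[x] ^ R[x_flip]
    return int(votes > trials // 2)

n, m = 4, (1, 0, 1, 1)
R = {x: hadamard_bit(m, x) for x in product((0, 1), repeat=n)}
R[(0, 0, 0, 1)] ^= 1              # corrupt one position
print(local_decode(R, 2, n))      # recovers m[2] = 1 w.h.p.
```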

11 Extractor. An extractor is a universal procedure for "purifying" an imperfect source: the function Ext(x, y) should be efficiently computable; a truly random seed serves as a "catalyst"; parameters: (n, k, m, t, ε). [Figure: Ext maps an n-bit source string, drawn from a class of 2^k strings, together with a t-bit truly random seed, to m near-uniform bits.]

12 Extractor: Definition. A (k, ε)-extractor: for all random variables X with min-entropy k, the output fools all tests T: |Pr_z[T(z) = 1] − Pr_{y ∈_R {0,1}^t, x ← X}[T(Ext(x, y)) = 1]| ≤ ε. Equivalently, the distributions Ext(X, U_t) and U_m are ε-close (L1 distance ≤ 2ε), where U_m is the uniform distribution on {0,1}^m. Comparison to pseudo-random generators: the output of a PRG fools all efficient tests; the output of an extractor fools all tests.
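
A brute-force check of the ε-closeness condition for tiny parameters, as a sketch: the one-output-bit "extractor" below (inner product mod 2) is a toy chosen only for illustration, not one of the constructions discussed here.

```python
from itertools import product
from collections import Counter

def ext(x, y):
    # Toy 1-bit "extractor": inner product of x and y, mod 2.
    return sum(a & b for a, b in zip(x, y)) % 2

def distance_from_uniform(source, t):
    # Statistical distance between Ext(X, U_t) and U_1, where X is uniform
    # on `source` (a flat source with min-entropy log2(len(source))).
    counts = Counter(ext(x, y) for x in source
                     for y in product((0, 1), repeat=t))
    total = len(source) * 2 ** t
    # eps-close in the slide's sense: (1/2) * L1 distance <= eps.
    return 0.5 * sum(abs(counts[z] / total - 0.5) for z in (0, 1))

source = list(product((0, 1), repeat=3))      # min-entropy-3 flat source
print(distance_from_uniform(source, 3))       # 0.0625
```

Here the bias comes entirely from the all-zero source string, giving statistical distance exactly 1/16.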

13 Extractors: Applications. Using extractors: use the output in place of randomness in any application; this alters the probability of any outcome by at most ε. Main motivation: use the output in place of randomness in an algorithm. But how do we get a truly random seed? Enumerate all seeds and take the majority.

14 Extractor as a Graph. [Figure: a bipartite graph with left vertex set {0,1}^n, right vertex set {0,1}^m, and left degree 2^t; a subset of size 2^k is highlighted on the left.] Want every subset of size 2^k to see almost all of the right-hand side with equal probability.

15 Extractors: desired parameters. Goals (good / optimal): short seed: t = O(log n) / t = log n + O(1); long output: m = k^{Ω(1)} / m = k + t − O(1); many k's: k = n^{Ω(1)} / any k = k(n). A short seed allows going over all seeds.

16 Extractors. A random construction for Ext achieves the optimal parameters! But we need explicit constructions; otherwise we cannot derandomize BPP. An optimal explicit construction of extractors is still open. Trevisan's extractor; the idea: any string defines a function. A string C over alphabet Σ of length ℓ defines a function f_C: {1…ℓ} → Σ by f_C(i) = C[i]. Use the NW generator with the source string in place of the hard function. From complexity to combinatorics!

17 Trevisan Extractor. Tools: an error-correcting code C: {0,1}^n → {0,1}^ℓ with distance (½ − ¼m^{-4})ℓ between codewords; important: in any ball of radius ½ − δ there are at most 1/δ^2 codewords, where δ = ½m^{-2}; block length ℓ = poly(n); polynomial-time encoding (decoding time does not matter). An (a, h)-design S_1, S_2, …, S_m ⊆ {1…t} where h = log ℓ, a = δ log n / 3, and t = O(log ℓ). Construction: Ext(x, y) = C(x)[y|_{S_1}] ∘ C(x)[y|_{S_2}] ∘ … ∘ C(x)[y|_{S_m}].
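
A minimal sketch of the construction's mechanics in Python: each output bit reads one position of the codeword C(x), the position being the seed bits indexed by the design set S_j. The encode function and design sets below are stand-ins with toy parameters, not the actual code or design from the slides.

```python
def trevisan_ext(x, y, encode, designs):
    # Sketch of Trevisan's extractor, assuming:
    #   encode : error-correcting code mapping x to a bit string C(x)
    #   designs: sets S_1..S_m of seed positions, each of size log2(len(C(x))),
    #            with small pairwise intersections (an (a, h)-design)
    # y is the t-bit seed; y restricted to S_j, read as an integer, indexes C(x).
    cx = encode(x)
    out = []
    for S in designs:
        idx = 0
        for pos in S:                 # y|S_j as an integer index
            idx = (idx << 1) | y[pos]
        out.append(cx[idx])           # one output bit per design set
    return tuple(out)

# Tiny usage with stand-in components (NOT the real code/design parameters):
encode = lambda x: [sum(x) % 2] * 8 + list(x) + [0] * (8 - len(x))  # toy "code"
designs = [(0, 1, 2, 3), (2, 3, 4, 5)]     # |S_j| = 4 = log2(16), overlap 2
print(trevisan_ext((1, 0, 1, 1), (0, 1, 1, 0, 1, 0), encode, designs))
```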

18 Trevisan Extractor. Ext(x, y) = C(x)[y|_{S_1}] ∘ C(x)[y|_{S_2}] ∘ … ∘ C(x)[y|_{S_m}]. Theorem: Ext is an extractor for min-entropy k = n^δ, with output length m = k^{1/3}, seed length t = O(log ℓ) = O(log n), and error ε ≤ 1/m. [Figure: the seed y selects, via the design sets, positions in the codeword C(x).]

19 Proof of Trevisan Extractor. Assume X ⊆ {0,1}^n is a min-entropy-k random variable failing to ε-pass a statistical test T: |Pr_z[T(z) = 1] − Pr_{x ← X, y ← {0,1}^t}[T(Ext(x, y)) = 1]| > ε. By the usual hybrid argument, there is a predictor A and an index 1 ≤ i ≤ m such that Pr_{x ← X, y ← {0,1}^t}[A(Ext(x, y)_{1…i−1}) = Ext(x, y)_i] > ½ + ε/m.

20 The set for which A predicts well. Consider the set B of x's such that Pr_{y ← {0,1}^t}[A(Ext(x, y)_{1…i−1}) = Ext(x, y)_i] > ½ + ε/2m. By averaging, Pr_x[x ∈ B] ≥ ε/2m. Since X has min-entropy k, there are at least (ε/2m)·2^k different x ∈ B. The contradiction will come from exhibiting a succinct encoding for each x ∈ B.

21 …Proof of Trevisan Extractor. Now i, A, and B are fixed. Fix the bits outside of S_i to α and β and let y′ vary over all possible assignments to the bits in S_i. Then Ext(x, y)_i = Ext(x, αy′β)_i = C(x)[(αy′β)|_{S_i}] = C(x)[y′], which ranges over all the bits of C(x). For every x ∈ B this gives a short description of a string z close to C(x): fix the bits outside of S_i to the α and β preserving the advantage, i.e. the assignment to {1…t}\S_i maximizing the advantage of A, so that Pr_{y′}[A(Ext(x, αy′β)_{1…i−1}) = C(x)[y′]] > ½ + ε/(2m). For j ≠ i, as y′ varies, (αy′β)|_{S_j} varies over only 2^a values! So one can provide (i−1) tables of 2^a values each to supply Ext(x, αy′β)_{1…i−1}.

22 Trevisan Extractor. [Figure: given the tables, the predictor A outputs C(x)[y′] with probability ½ + ε/(2m) over y′ ∈_R {0,1}^{log ℓ}; this is a short description of a string z agreeing with C(x).]

23 …Proof of Trevisan Extractor. Up to (m−1) tables of size 2^a describe a string z that agrees with C(x) on a ½ + ε/(2m) fraction of places. Johnson bound: a binary code with distance (½ − δ^2)n has at most O(1/δ^2) codewords in any ball of radius (½ − δ)n. Since C has minimum distance (½ − ¼m^{-4})ℓ, the number of codewords of C agreeing with z on ½ + ε/(2m) of the places is O(1/δ^2) = O(m^4), so given z there are at most O(m^4) corresponding x's. The number of strings z with such a description is 2^{(m−1)·2^a} = 2^{n^{2δ/3}} = 2^{k^{2/3}}. So the total number of x ∈ B is at most O(m^4)·2^{k^{2/3}} ≪ (ε/2m)·2^k: a contradiction.

24 Conclusion. Given a source of n random bits with min-entropy k = n^{Ω(1)}, it is possible to run any BPP algorithm with it and obtain the correct answer with high probability.

25 Application: strong error reduction. L ∈ BPP if there is a p.p.t. TM M such that: x ∈ L ⇒ Pr_y[M(x, y) accepts] ≥ 2/3, and x ∉ L ⇒ Pr_y[M(x, y) rejects] ≥ 2/3. Want: x ∈ L ⇒ Pr_y[M(x, y) accepts] ≥ 1 − 2^{-k}, and x ∉ L ⇒ Pr_y[M(x, y) rejects] ≥ 1 − 2^{-k}. Already known: repeat O(k) times and take the majority. This uses n = O(k)·|y| random bits, of which 2^{n−k} can be bad strings.

26 Strong error reduction. Better: let Ext be an extractor for k = |y|^3 = n^δ and ε < 1/6. Pick a random w ∈_R {0,1}^n, run M(x, Ext(w, z)) for all z ∈ {0,1}^t, and take the majority of the answers. Call w "bad" if maj_z M(x, Ext(w, z)) is incorrect; for such w, |Pr_z[M(x, Ext(w, z)) = b] − Pr_y[M(x, y) = b]| ≥ 1/6. By the extractor property there are at most 2^k bad w's: we use n random bits, of which only 2^{n^δ} are bad strings.
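
As a sketch (with M and ext as stand-in callables, not a particular construction), the whole procedure spends truly random bits only on w:

```python
import random
from itertools import product

def amplified(M, x, ext, n, t):
    # Strong error reduction sketch: draw the n truly random bits once,
    # derive each run's coin tosses via the extractor, and take the
    # majority of the answers over all 2^t seeds.
    w = tuple(random.randint(0, 1) for _ in range(n))
    answers = [M(x, ext(w, z)) for z in product((0, 1), repeat=t)]
    return max(set(answers), key=answers.count)
```

The deterministic majority over seeds is then wrong only for the at most 2^k bad choices of w.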

27 Strong error reduction. [Figure: the extractor graph from {0,1}^n to {0,1}^m with left degree 2^t; upper bound on the size of the bad set: 2^k. Property: every subset of size 2^k sees almost all of the right-hand side with equal probability. Left side: strings w for which the majority of neighbors are bad. Right side: all strings for running the original randomized algorithm, of which at most 1/4 are bad for the input.]

28 Two Surveys on Extractors. Nisan and Ta-Shma, Extracting Randomness: A Survey and New Constructions, 1999 (predates Trevisan). Shaltiel, Recent Developments in Extractors, 2002, www.wisdom.weizmann.ac.il/~ronens/papers/survey.ps. Some of the slides are based on C. Umans's course: www.cs.caltech.edu/~umans/cs151-sp04/index.html

29 Circuit Complexity We will consider several issues regarding circuit complexity

30 Parallelism. A refinement of polynomial time via (uniform) circuits. For a circuit C: depth corresponds to parallel time, size to parallel work. The depth of a circuit is the length of the longest path from input to output; it represents circuit latency.

31 Parallelism: the NC hierarchy (of logspace-uniform circuits). NC^k = O(log^k n)-depth, poly(n)-size circuits with bounded fan-in (2). NC = ∪_k NC^k. Aim: to capture the efficiently parallelizable problems. Not realistic? It is overly generous in size and does not capture all aspects of parallelism, but it does capture latency, which is sufficient for proving (presumed) lower bounds on the best latency. (What is NC^0?)

32 Matrix Multiplication. Given two n × n matrices A and B, compute the n × n matrix AB. Parallel complexity of this problem? Work = poly(n); time = log^k(n)? For which k?

33 Matrix Multiplication. Arithmetic matrix multiplication, A = (a_{i,k}), B = (b_{k,j}), (AB)_{i,j} = Σ_k (a_{i,k} × b_{k,j}), versus Boolean matrix multiplication: (AB)_{i,j} = ∨_k (a_{i,k} ∧ b_{k,j}). Single output bit: to make matrix multiplication a language, on input A, B, (i, j), output (AB)_{i,j}.
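
The single-output-bit version, as a direct Python transcription (a sketch; the function name is ours):

```python
def bmm_bit(A, B, i, j):
    # Single output bit of Boolean matrix multiplication:
    # (AB)_{i,j} = OR over k of (a_{i,k} AND b_{k,j}).
    return int(any(A[i][k] & B[k][j] for k in range(len(B))))

A = [[1, 0], [0, 1]]
B = [[0, 1], [1, 0]]
print(bmm_bit(A, B, 0, 1))  # row 0 of A meets column 1 of B at k = 0 -> 1
```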

34 Matrix Multiplication. Boolean matrix multiplication is in NC^1. Level 1: compute the n ANDs a_{i,k} ∧ b_{k,j}; the next log n levels: a tree of ORs; build n^2 such subtrees, one per pair (i, j); select the correct one and output.

35 Boolean formulas and NC^1. The circuit for Boolean matrix multiplication is actually a formula (a formula has fan-out 1, so the circuit looks like a tree). This is no accident: Theorem: L ∈ NC^1 iff L is decidable by a polynomial-size uniform family of Boolean formulas.

36 Boolean formulas and NC^1: from small-depth circuits to formulas. Proof: convert the NC^1 circuit into a formula recursively, duplicating subcircuits that are used more than once. Note: this is a logspace transformation (stack depth log n, with a 1-bit stack record: "left" or "right").

37 Boolean formulas and NC^1: from formulas to small-depth circuits. Convert a formula of size n into a formula of depth O(log n); note that size ≤ 2^depth, so the new formula has poly(n) size. Key transformation: pick a subformula D of the formula C and rewrite C as (D ∧ C_1) ∨ (¬D ∧ C_0), where C_b is C with D replaced by the constant b.

38 Boolean formulas and NC^1. Take D to be any minimal subtree with size at least n/3; minimality implies size(D) ≤ 2n/3. Define T(n) = the maximum depth required for any size-n formula. C_1, C_0, and D all have size ≤ 2n/3, so T(n) ≤ T(2n/3) + 3, which implies T(n) ≤ O(log n).
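
A quick numeric check of the recurrence, as a sketch (T(2) is treated as a constant and 2n/3 is rounded up; the helper name is ours):

```python
import math

def depth_bound(n):
    # Unroll T(n) <= T(2n/3) + 3: each step shrinks the formula to at
    # most 2/3 of its size at a cost of 3 extra levels of depth.
    depth = 0
    while n > 2:
        n = math.ceil(2 * n / 3)
        depth += 3
    return depth

print(depth_bound(10**6), 3 * math.log(10**6, 1.5))  # both about 3*log_{3/2} n
```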

39 Relation to other classes. Clearly NC ⊆ P, since P corresponds to uniform poly-size circuits. NC^1 ⊆ Logspace: on input x, compose logspace algorithms for generating C_{|x|}, converting it to a formula, and evaluating FVAL(C_{|x|}, x), where FVAL is: given a formula and an assignment, what is the value of the output? Logspace composes!

40 Relation to other classes. NL ⊆ NC^2. Claim: directed S-T-CONN ∈ NC^2. Given a directed graph G = (V, E) and vertices s, t, let A be the adjacency matrix (with self-loops). (A^2)_{i,j} = 1 iff there is a path of length at most 2 from node i to node j; (A^n)_{i,j} = 1 iff there is a path of length at most n from node i to node j. Compute A^n with a depth-log n tree of Boolean matrix multiplications and output entry (s, t): repeated squaring! Each Boolean MM has depth log n, so the total depth is log^2 n.
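
A sketch of the repeated-squaring idea in Python (sequential code, but the loop structure mirrors the log^2 n depth: log n squarings, each one a log n-depth Boolean MM):

```python
def reachable(adj, s, t):
    # Add self-loops, then square the Boolean adjacency matrix
    # ceil(log2 n) times; afterwards A[s][t] = 1 iff a path s -> t exists.
    n = len(adj)
    A = [[adj[i][j] | (i == j) for j in range(n)] for i in range(n)]
    steps = max(1, (n - 1).bit_length())       # log n squarings suffice
    for _ in range(steps):
        A = [[int(any(A[i][k] & A[k][j] for k in range(n)))
              for j in range(n)] for i in range(n)]
    return bool(A[s][t])

adj = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]        # path 0 -> 1 -> 2
print(reachable(adj, 0, 2), reachable(adj, 2, 0))  # True False
```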

41 NC vs. P. Can every efficient algorithm be efficiently parallelized? That is, does NC = P? Common belief: NC ⊊ P.

42 P-Completeness. A language L is P-complete if: L ∈ P, and every other language in P is reducible to L via a logspace reduction. P-complete problems are the least likely to be parallelizable: if a P-complete problem is in NC, then P = NC. We use logspace reductions to show a problem P-complete, and we have seen that Logspace is in NC.

43 Some P-Complete Problems. CVAL, the circuit value problem: given a circuit and an assignment, what is the value of the output of the circuit? (The canonical P-complete problem.) Lexicographically first maximal independent set. Linear programming. Finding a happy coloring of a graph.

44 NC vs. P. Can every uniform, poly-size Boolean circuit family be converted into a uniform, poly-size Boolean formula family? That is, does NC^1 = P? Is the NC hierarchy proper: is it true that NC^i ⊊ NC^{i+1} for all i? Define AC^k = O(log^k n)-depth, poly(n)-size circuits with unbounded fan-in ∧ and ∨ gates. Is it true that AC^i ⊊ NC^{i+1} ⊊ AC^{i+1}?

45 Lower bounds. Recall: NP not having polynomial-size circuits (NP ⊄ P/poly) would imply P ≠ NP. Major goal: prove lower bounds on (non-uniform) circuit size for problems in NP. Belief: an exponential lower bound holds; a super-polynomial lower bound would be enough for P ≠ NP. Best bound known: 4.5n. We don't even have super-polynomial bounds for problems in NEXP!

46 Lower bounds. There has been lots of work on lower bounds for restricted classes of circuits: formulas (the out-degree of each gate is 1); monotone circuits (no NOTs, even at the input level); constant-depth circuits (polynomial size but unbounded fan-in).

47 Counting argument for formulas. A frustrating fact: almost all functions require huge formulas. Theorem [Shannon]: with probability at least 1 − o(1), a random function f: {0,1}^n → {0,1} requires a formula of size Ω(2^n / log n).

48 Shannon's counting argument. Proof (counting): B(n) = 2^{2^n} = the number of functions f: {0,1}^n → {0,1}. The number of formulas with n inputs and size s is at most F(n, s) ≤ 4^s · 2^s · (2n)^s: 4^s binary trees with s internal nodes, 2 gate choices per internal node, and 2n choices per leaf (a variable or its negation).

49 Shannon's counting argument. F(n, c·2^n/log n) < (16n)^{c·2^n/log n} = 16^{c·2^n/log n} · 2^{c·2^n} = 2^{(1+o(1))·c·2^n} < o(1) · 2^{2^n} (if c ≤ ½). So the probability that a random function has a formula of size s = (½)·2^n/log n is at most F(n, s)/B(n) < o(1).
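
A sanity check of the counting in Python, using the slide's bound F(n, s) = (16n)^s and s = (½)·2^n/log n (exact big-integer arithmetic; the helper names are ours):

```python
import math

def B(n):                  # number of Boolean functions on n inputs
    return 2 ** (2 ** n)

def F(n, s):               # slide's upper bound on size-s formulas: (16n)^s
    return (16 * n) ** s

for n in (10, 12, 14):
    s = 2 ** n // (2 * math.ceil(math.log2(n)))   # s = (1/2) 2^n / log n
    # log2 of the fraction of functions having size-s formulas: very negative,
    # and it keeps dropping as n grows, i.e. the fraction is o(1).
    print(n, math.log2(F(n, s)) - math.log2(B(n)))
```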

50 Andreev's function. The best lower bound known for formulas: Theorem (Andreev, Hastad '93): the Andreev function requires (∧, ∨, ¬)-formulas of size Ω(n^{3−o(1)}).

51 Andreev's function. The Andreev function A: {0,1}^{2n} → {0,1}, on 2n input bits in total. [Figure: the n-bit string x is split into log n XOR blocks of n/log n bits each; the log n XOR values form an address, and a selector outputs the addressed bit y_i of the n-bit string y.]
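
A direct Python sketch of the function as pictured (assuming, for divisibility, that n is a power of two with log n dividing n; the function name is ours):

```python
def andreev(x, y):
    # A(x, y): y is the truth table of a function on log n bits (the
    # "selector"); x supplies log n XOR blocks of n/log n bits each,
    # whose XOR values form the address into y.
    n = len(y)
    k = n.bit_length() - 1           # log2 n, assuming n is a power of two
    assert len(x) == n and n % k == 0
    block = n // k
    addr = 0
    for b in range(k):               # XOR of block b gives one address bit
        bit = 0
        for v in x[b * block:(b + 1) * block]:
            bit ^= v
        addr = (addr << 1) | bit
    return y[addr]

# n = 16: 4 blocks of 4 bits; block XORs are (1, 0, 1, 1) -> index 11
x = (1,0,0,0,  1,1,0,0,  1,0,1,1,  0,1,1,1)
y = tuple(i % 2 for i in range(16))
print(andreev(x, y))                 # y[0b1011] = y[11] = 1
```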

52 Andreev's function. Theorem: the Andreev function requires (∧, ∨, ¬)-formulas of size Ω(n^{3−o(1)}). First we show Ω(n^{2−o(1)}). Two important ideas: random restrictions, and using the existential counting lower bound on a smaller domain.

53 General Strategy. Restrict the function and show: the restriction must simplify the formula (a lot), but the remaining function is still quite complex, so it needs a relatively large formula. Conclude: we must have started with a large formula. Definition: L(f) = the smallest (∧, ∨, ¬)-formula computing f, measured as leaf-size, which is directly related to formula size.

54 Random restrictions. Key idea: given a function f: {0,1}^n → {0,1}, restrict it by ρ to get f_ρ; ρ sets some variables to 0/1, while the others remain free. R(n, m) = the set of restrictions that leave m variables free.

55 Random restrictions. Claim: let m = εn. Then E_{ρ ← R(n, εn)}[L(f_ρ)] ≤ ε·L(f): each leaf survives with probability ε (the formula may shrink even more once constants are propagated). What happens to the XOR of a subset of variables under a random restriction? If at least one member of the XOR survives, the XOR is not fixed and can obtain both values.
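
A small Python sketch of drawing ρ from R(n, m) and checking what it does to an XOR block (helper names are ours):

```python
import random

def random_restriction(n, m):
    # Draw rho from R(n, m): m variables stay free ('*'), the rest are
    # fixed to independent random bits.
    free = set(random.sample(range(n), m))
    return {i: '*' if i in free else random.randint(0, 1) for i in range(n)}

def xor_after_restriction(block, rho):
    # The XOR of `block` under rho: if any variable survives, the XOR is
    # not fixed (it still takes both values); otherwise it is a constant.
    survivors = [i for i in block if rho[i] == '*']
    constant = sum(rho[i] for i in block if rho[i] != '*') % 2
    if survivors:
        return f"XOR of {survivors} (plus constant {constant}): non-constant"
    return f"killed: constant {constant}"

rho = random_restriction(16, 4)
print(xor_after_restriction(range(8), rho))
```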

56 Shrinkage result. From the counting argument: there exists a function h: {0,1}^{log n} → {0,1} for which L(h) > n/(2 log log n). Hardwire the truth table of that function into y to get the function A*(x); then apply a random restriction from the set R(n, m = 2(log n)(ln log n)) to A*(x).

57 The lower bound. The probability that a particular XOR block is killed by the restriction is the probability that all m free variables miss it: (1 − (n/log n)/n)^m = (1 − 1/log n)^m ≤ (1/e)^{2 ln log n} ≤ 1/log^2 n. So the probability that even one of the log n XORs is killed by the restriction is at most (log n)·(1/log^2 n) = 1/log n < ½.

58 The lower bound. The probability that even one of the XORs is killed by the restriction is at most (log n)·(1/log^2 n) = 1/log n < ½. By Markov's inequality: Pr[L(A*_ρ) > 2·E_{ρ ← R(n, m)}[L(A*_ρ)]] < ½. Conclusion: for some restriction ρ′ both events happen: all XORs survive, and L(A*_{ρ′}) ≤ 2·E_{ρ ← R(n, m)}[L(A*_ρ)].

59 The lower bound. If all the XORs survive, we can restrict the formula further so that it computes the hard function h (we may need additional ¬'s, which are free). Then L(h) = n/(2 log log n) ≤ L(A*_{ρ′}) ≤ 2·E_{ρ ← R(n, m)}[L(A*_ρ)] ≤ O((m/n)·L(A*)) = O(((log n)(ln log n)/n)^1 · L(A*)), where m/n is the shrinkage factor. Conclude: Ω(n^{2−o(1)}) ≤ L(A*) ≤ L(A).

60 Random restrictions and shrinkage factors. Recall: E_{ρ ← R(n, εn)}[L(f_ρ)] ≤ ε·L(f), since each leaf survives with probability ε. But the formula may shrink even more by propagating constants. Lemma [Hastad '93]: for all f, E_{ρ ← R(n, εn)}[L(f_ρ)] ≤ O(ε^{2−o(1)}·L(f)).

61 The lower bound with the new shrinkage factor. If all the XORs survive, we can restrict the formula further to compute the hard function h (we may need to add ¬'s). Then L(h) = n/(2 log log n) ≤ L(A*_ρ) ≤ 2·E_{ρ ← R(n, m)}[L(A*_ρ)] ≤ O((m/n)^{2−o(1)}·L(A*)) = O(((log n)(ln log n)/n)^{2−o(1)} · L(A*)), where (m/n)^{2−o(1)} is the shrinkage factor. Conclude: Ω(n^{3−o(1)}) ≤ L(A*) ≤ L(A).

62 What can be done in NC^1. Addition of two n-bit numbers (in fact this can be done in AC^0). Adding n bits: can compute majority or threshold, something that cannot be done in AC^0. Multiplication (reduces to adding n numbers). Division.

63 Two different characterizations of NC^1: through communication complexity, and through branching programs.

64 More on depth: a communication complexity characterization of depth. For a Boolean function f: {0,1}^n → {0,1}, let X = f^{-1}(1) and Y = f^{-1}(0). Consider the relation R_f ⊆ X × Y × {1,…,n} of triples (x, y, i) such that x_i ≠ y_i. For monotone Boolean functions, define M_f ⊆ X × Y × {1,…,n} as the triples (x, y, i) such that x_i = 1 and y_i = 0.

65 A communication complexity characterization of depth. What is the communication complexity D(R_f) of R_f, assuming Alice has x ∈ X = f^{-1}(1) and Bob has y ∈ Y = f^{-1}(0)? Lemma: let C be a circuit for f; then D(R_f) ≤ depth(C). Lemma: let C be a monotone circuit for f; then D(M_f) ≤ depth(C).

66 From circuits to protocols (both the monotone and non-monotone cases). For each ∨ gate, Alice says which of the two input wires to the gate is 1 under x (if both are 1 she picks one); this wire must be 0 under y. For each ∧ gate, Bob says which of the two input wires to the gate is 0 under y (if both are 0 he picks one); this wire must be 1 under x. At the leaves, they find an i such that x_i ≠ y_i; if the circuit is monotone, then we know that x_i = 1 and y_i = 0. The property maintained for the subformula considered: Alice's assignment yields 1 and Bob's assignment yields 0 (a sketch of this walk follows).
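
A Python sketch of this walk down a formula (the node encoding and names are ours; the invariant in the comments is the one from the slide):

```python
def kw_game(gate, x, y):
    # Walk down a formula given as nested tuples: ('or', l, r),
    # ('and', l, r), ('var', i), ('not', i).
    # Invariant: the current subformula is 1 under x and 0 under y.
    kind = gate[0]
    if kind == 'or':            # Alice: pick an input wire that is 1 under x
        l, r = gate[1], gate[2]
        return kw_game(l if eval_f(l, x) else r, x, y)
    if kind == 'and':           # Bob: pick an input wire that is 0 under y
        l, r = gate[1], gate[2]
        return kw_game(l if not eval_f(l, y) else r, x, y)
    return gate[1]              # leaf literal: 1 under x, 0 under y => x_i != y_i

def eval_f(gate, z):
    kind = gate[0]
    if kind == 'or':  return eval_f(gate[1], z) | eval_f(gate[2], z)
    if kind == 'and': return eval_f(gate[1], z) & eval_f(gate[2], z)
    if kind == 'var': return z[gate[1]]
    return 1 - z[gate[1]]       # 'not'

# f = (x0 AND x1) OR x2, with x in f^{-1}(1) and y in f^{-1}(0):
f = ('or', ('and', ('var', 0), ('var', 1)), ('var', 2))
x, y = (1, 1, 0), (1, 0, 0)
i = kw_game(f, x, y)
print(i, x[i], y[i])            # an index where x and y differ: 1 1 0
```

The number of rounds equals the depth of the formula, which is the content of the lemma above.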

67 From protocols to circuits (both the monotone and non-monotone cases). Lemma: let P be a protocol for R_f; then there is a formula of depth C(P). Label each of Alice's moves with ∨, each of Bob's moves with ∧, and each leaf, with rectangle A × B and output i, with either z_i or ¬z_i. [Figure: a protocol tree over z_0, z_1, …, z_7.]

68 A communication complexity characterization of depth. Theorem: D(R_f) = depth(f). Theorem: for any monotone function, D(M_f) = depth_monotone(f). Applications: depth_monotone(STCONN) = Θ(log^2 n); depth_monotone(matching) = Θ(n).

69 Example: Majority. Input to Alice: x_1, x_2, …, x_n such that the majority are 1. Input to Bob: y_1, y_2, …, y_n such that the majority are 0. Partition the input into the halves x_1, …, x_{n/2} and x_{n/2+1}, …, x_n, report the count of ones on each half, and recurse on a half where Alice's count still exceeds Bob's.
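
A sketch of that recursion in Python (names are ours); each round costs O(log n) bits to exchange counts, and there are log n halving rounds:

```python
def majority_game(x, y, lo, hi):
    # Monotone KW game for Majority: Alice's x has more ones than Bob's y
    # on [lo, hi); find i with x_i = 1 and y_i = 0. Each round the players
    # exchange counts of ones on the two halves and recurse into a half
    # where Alice's count still exceeds Bob's (one must exist).
    if hi - lo == 1:
        return lo                        # here x_lo = 1 and y_lo = 0
    mid = (lo + hi) // 2
    if sum(x[lo:mid]) > sum(y[lo:mid]):  # counts differ on the left half...
        return majority_game(x, y, lo, mid)
    return majority_game(x, y, mid, hi)  # ...or else on the right half

x = (1, 1, 0, 1)      # majority ones
y = (0, 1, 0, 0)      # majority zeros
i = majority_game(x, y, 0, len(x))
print(i, x[i], y[i])  # an index with x_i = 1, y_i = 0
```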

