Published by Nichole Freebern. Modified over 3 years ago.

1
New degree bounds for polynomials with prescribed signs
Ryan O'Donnell (MIT), Rocco Servedio (Harvard/Columbia)

2
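Polynomials with prescribed signs
Suppose $m$ disjoint regions $R_1, \dots, R_m$ are given in $\mathbb{R}^n$, along with associated signs $\sigma_1, \dots, \sigma_m \in \{+1, -1\}$. What is the lowest-degree polynomial $p : \mathbb{R}^n \to \mathbb{R}$ which has the prescribed signs on the regions, i.e. $\mathrm{sgn}(p(x)) = \sigma_i$ for all $x \in R_i$?
In one dimension the problem is trivial: if the regions are intervals, the lowest degree is exactly the number of sign alternations between consecutive intervals (that many roots are necessary and sufficient).
In two or more dimensions…??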
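The one-dimensional claim can be checked mechanically. Below is a minimal Python sketch, assuming the regions are disjoint closed intervals listed left to right; the helper `sign_poly_1d` is hypothetical, not from the talk. It places one root in each gap where the prescribed sign changes, so its degree equals the number of alternations. Necessity is the intermediate value theorem: every alternation forces a root between the two intervals.

```python
from math import prod

def sign_poly_1d(regions):
    """regions: disjoint intervals (lo, hi, sign), sorted left to right, sign in {+1, -1}.

    Returns (p, degree): p has one root in each gap where the sign alternates,
    so its degree equals the number of sign alternations.
    """
    roots = [(hi1 + lo2) / 2
             for (_, hi1, s1), (lo2, _, s2) in zip(regions, regions[1:])
             if s1 != s2]
    # To the right of every root, prod(x - r) is positive, so match the last region's sign.
    lead = regions[-1][2]

    def p(x):
        return lead * prod(x - r for r in roots)

    return p, len(roots)

regions = [(0, 1, +1), (2, 3, -1), (4, 5, -1), (6, 7, +1)]
p, degree = sign_poly_1d(regions)
print(degree)  # 2: two sign alternations
for lo, hi, s in regions:
    assert all((p(x) > 0) == (s > 0) for x in (lo, (lo + hi) / 2, hi))
```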

3
Polynomial threshold functions
A special case: let $f : \{0,1\}^n \to \{+1,-1\}$ be a boolean function and let $p : \mathbb{R}^n \to \mathbb{R}$ be a polynomial. We say that $p$ is a polynomial threshold function (PTF) for $f$, or $p$ sign-represents $f$, if $f(x) = \mathrm{sgn}(p(x))$ for all $x \in \{0,1\}^n$. We are concerned with finding the lowest-degree PTF for $f$.

4
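Polynomial threshold functions
For example:
- $x_1 + x_2 + \dots + x_n - \tfrac12$: degree-1 PTF for OR
- $x_1 + x_2 + \dots + x_n - (n - \tfrac12)$: degree-1 PTF for AND
- $x_1 + x_2 + \dots + x_n - \tfrac{n}{2}$: degree-1 PTF for MAJ ($n$ odd)
- $(1 - 2x_1)(1 - 2x_2)\cdots(1 - 2x_n)$: degree-$n$ PTF for PARITY
Every $n$-bit boolean function has a PTF (indeed, an exact representation) of degree $n$. (Consider: $\dots + f(1101)\, x_1 x_2 (1 - x_3) x_4 + \dots$, with one such term per input.)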
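These examples can be verified exhaustively for a small $n$. A minimal Python sketch, with our own conventions: outputs are $+1$ for true and $-1$ for false, PARITY outputs $+1$ on inputs with an even number of 1s (matching the sign of the product), and the helpers `check_ptf`, `exact_rep`, `to_pm` are hypothetical. `exact_rep` builds the degree-$n$ interpolation in which each input $a$ contributes $f(a)\prod_i (x_i \text{ if } a_i = 1 \text{ else } 1 - x_i)$.

```python
from itertools import product
from math import prod

def check_ptf(f, p, n):
    """True iff p(x) is nonzero with sign f(x) for every x in {0,1}^n."""
    for x in product((0, 1), repeat=n):
        value = p(x)
        if value == 0 or (1 if value > 0 else -1) != f(x):
            return False
    return True

def exact_rep(f, n):
    """Degree-n interpolation: sum over a in {0,1}^n of f(a) * prod_i (x_i if a_i else 1 - x_i)."""
    table = {a: f(a) for a in product((0, 1), repeat=n)}
    return lambda x: sum(v * prod(xi if ai else 1 - xi for xi, ai in zip(x, a))
                         for a, v in table.items())

def to_pm(b):
    """Boolean to {+1, -1}, with +1 meaning true."""
    return 1 if b else -1

n = 5  # odd, so MAJ has no ties
examples = {
    "OR": (lambda x: to_pm(any(x)), lambda x: sum(x) - 0.5),
    "AND": (lambda x: to_pm(all(x)), lambda x: sum(x) - (n - 0.5)),
    "MAJ": (lambda x: to_pm(sum(x) > n / 2), lambda x: sum(x) - n / 2),
    # +1 on inputs with an even number of 1s
    "PARITY": (lambda x: to_pm(sum(x) % 2 == 0), lambda x: prod(1 - 2 * xi for xi in x)),
}
for name, (f, p) in examples.items():
    print(name, check_ptf(f, p, n))  # True for all four

def g(x):
    """An arbitrary test function."""
    return to_pm(x[0] ^ (x[1] & x[2]))

rep = exact_rep(g, n)
print(all(rep(x) == g(x) for x in product((0, 1), repeat=n)))  # True
```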

5
Polynomial threshold functions
What are PTFs good for?
- a natural algebraic model of complexity
- upper bounds:
  - machine learning: given a class of functions $C$, if every function in $C$ has a PTF of degree $d$, then $C$ can be learned in time $n^{O(d)}$
  - used to prove that PP is closed under intersection
- lower bounds:
  - oracle separations
  - a slightly stricter model is related to quantum decision tree complexity

6
Prior work: lower bounds
Minsky & Papert, Perceptrons, 1968: from an artificial intelligence perspective, they proved three major lower bounds:
- PARITY requires PTF degree $n$
- a certain DNF formula, "one-in-a-box" (the $n^{1/3}$-way OR of $n^{2/3}$-way ANDs), requires PTF degree $n^{1/3}$
- $\mathrm{MAJ}(x_1, \dots, x_n) \wedge \mathrm{MAJ}(y_1, \dots, y_n)$ requires superconstant PTF degree
No new, essentially different lower bounds were known.

7
Prior work: upper bounds
- [BRS95] considered AND-MAJ$_n$ as well; they showed it has PTF degree $O(\log n)$, and used this to show that PP is closed under intersection.
- [KS01] showed that every DNF formula on $n$ variables with $s$ terms has a PTF of degree $O(n^{1/3} \log s)$; they used this to get a subexponential-time learning algorithm for DNF formulas, which is the fastest known.

8
Our results
- Upper bound: every boolean function given by an AND/OR/NOT formula of size $s$ and depth $d$ has a PTF of degree $\sqrt{s}\,\log^{O(d)} s$ (note that degree $s$ is trivial). This gives a subexponential-time learning algorithm for, say, linear-size formulas of superconstant depth, the first such algorithm known.
- Lower bound: using a new technique, AND-MAJ$_n$ requires PTF degree $\Omega(\log n / \log\log n)$.

9
Talk outline
Plan for the talk:
1. Prove the $\sqrt{s}\,\log^{O(d)} s$ PTF upper bound for formulas.
2. Prove the $\Omega(\log n / \log\log n)$ PTF lower bound for AND-MAJ$_n$.

10
Boolean formulas
[Figure: an example formula, a tree of AND and OR gates whose leaves are labeled with variables $x_1, \dots, x_{13}$, some appearing more than once.]
A formula is a tree whose gates are ANDs or ORs with unbounded fan-in; leaves are labeled with literals. The size is the number of leaves; the depth is the length of the longest root-to-leaf path.

11
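PTFs for boolean formulas
(In this section we always use $\{0,1\}$.)
Idea: replace all gates with low-degree polynomials which simulate the gate. How should we represent $\mathrm{AND}(v_1, \dots, v_k)$? Candidates: $v_1 + \dots + v_k - (k-1)$, or $\left[(v_1 + \dots + v_k)/k\right]^{k \log(1/\varepsilon)}$. The second equals 1 when all $v_i = 1$ and otherwise is at most $(1 - 1/k)^{k\ln(1/\varepsilon)} \le \varepsilon$ (natural logarithm), so its outputs stay close to 0/1; but its degree is $k \log(1/\varepsilon)$.
[Figure: an AND gate with example 0/1 input and output values.]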
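A quick numerical check of the power-based AND gadget, as a minimal Python sketch: we assume exact 0/1 inputs and round the exponent $k\ln(1/\varepsilon)$ up to an integer; the helper `and_power` is hypothetical. It confirms the gadget outputs exactly 1 when the AND is true and at most $\varepsilon$ otherwise, and shows the degree cost growing linearly in $k$.

```python
import math
from itertools import product

def and_power(v, eps):
    """Candidate AND gadget: (mean of v) ** r with r = ceil(k * ln(1/eps)).
    Returns (value, r); r is the degree of the gadget in the v_i."""
    k = len(v)
    r = math.ceil(k * math.log(1 / eps))
    return (sum(v) / k) ** r, r

k, eps = 6, 0.01
for v in product((0, 1), repeat=k):
    value, r = and_power(v, eps)
    if all(v):
        assert value == 1.0           # AND true: the mean is 1
    else:
        assert 0 <= value <= eps      # AND false: mean <= 1 - 1/k, so value <= eps
print(r)  # 28 = ceil(6 * ln 100): the degree grows linearly in k
```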

12
A better amplifying polynomial
We want to amplify the disparity between $1 - 1/k$ and $1$. Raising to the power $k$ works, but costs a lot of degree. We desire a low-degree polynomial which keeps values in $[0, 1 - 1/k]$ within $[-1, 1]$ but amplifies the point 1 to, say, 2. Equivalently, we want a polynomial bounded by 1 in absolute value on $[0,1]$, with maximum derivative at 1.

13
Chebyshev polynomials
This is an old problem of analysis, solved by the Chebyshev polynomials of the first kind. These are a family of orthogonal polynomials $(C_r)_{r \in \mathbb{N}}$ with the properties:
- $\deg(C_r) = r$
- $C_r([-1,1]) \subseteq [-1,1]$
- $C_r'(1) = r^2$
- $C_r(1 + 1/r^2) \ge 2$
- $C_r(x) = \cos(r \arccos x)$ for $x \in [-1,1]$
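The listed properties are easy to confirm numerically. A minimal Python sketch, assuming the standard three-term recurrence $C_{m+1} = 2x\,C_m - C_{m-1}$ with $C_0 = 1$, $C_1 = x$, and its differentiated form for $C_r'$; the helper name `chebyshev` is hypothetical. It checks $C_r(\cos t) = \cos(rt)$ (hence boundedness on $[-1,1]$), $C_r'(1) = r^2$, and $C_r(1 + 1/r^2) \ge 2$ for small $r$.

```python
import math

def chebyshev(r, x):
    """Return (C_r(x), C_r'(x)) using C_{m+1} = 2x C_m - C_{m-1}
    and its derivative C_{m+1}' = 2 C_m + 2x C_m' - C_{m-1}'."""
    if r == 0:
        return 1.0, 0.0
    c_prev, c = 1.0, x      # C_0, C_1
    d_prev, d = 0.0, 1.0    # C_0', C_1'
    for _ in range(r - 1):
        c_prev, c = c, 2 * x * c - c_prev               # c_prev is now C_m
        d_prev, d = d, 2 * c_prev + 2 * x * d - d_prev
    return c, d

for r in range(1, 12):
    # C_r(cos t) = cos(r t), so C_r maps [-1, 1] into [-1, 1]
    for step in range(201):
        t = math.pi * step / 200
        assert abs(chebyshev(r, math.cos(t))[0] - math.cos(r * t)) < 1e-9
    value, slope = chebyshev(r, 1.0)
    assert value == 1.0 and slope == r * r               # C_r(1) = 1, C_r'(1) = r^2
    assert chebyshev(r, 1 + 1 / r**2)[0] >= 2 - 1e-12    # C_r(1 + 1/r^2) >= 2
```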

14
Chebyshev polynomials at gates
Chebyshev polynomials give us a square-root degree savings. Imagine replacing $\mathrm{AND}(v_1, \dots, v_k)$ with
$C_{\sqrt{k}}\big((v_1 + \dots + v_k)/(k-1)\big)$. (*)
The argument $(v_1 + \dots + v_k)/(k-1)$ is roughly $k/(k-1) \ge 1 + 1/k$ if all the $v_i$ are roughly 1, and is in $[0,1]$ otherwise. Hence (*) is something like 2 when the AND is true, and is between $-1$ and $1$ otherwise. (This idea is originally from [KS01].)
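The gadget (*) can be checked on all 0/1 inputs for small fan-in. A minimal Python sketch, assuming exact 0/1 inputs (no $\varepsilon$ error) and rounding $\sqrt{k}$ up to an integer degree $r$; the helpers `chebyshev_value` and `and_gate_chebyshev` are hypothetical, and the full construction's extra amplification step is not modeled. Since $r^2 \ge k > k - 1$, the "true" argument $k/(k-1)$ is at least $1 + 1/r^2$, where $C_r \ge 2$.

```python
import math
from itertools import product

def chebyshev_value(r, x):
    """C_r(x) by the three-term recurrence C_{m+1} = 2x C_m - C_{m-1}."""
    c_prev, c = 1.0, x
    if r == 0:
        return c_prev
    for _ in range(r - 1):
        c_prev, c = c, 2 * x * c - c_prev
    return c

def and_gate_chebyshev(v):
    """Gadget (*) with r = ceil(sqrt(k)): C_r((v_1 + ... + v_k) / (k - 1)), for k >= 2.
    Returns (value, r); r is the degree in the v_i."""
    k = len(v)
    r = math.ceil(math.sqrt(k))
    return chebyshev_value(r, sum(v) / (k - 1)), r

for k in range(2, 11):
    for v in product((0, 1), repeat=k):
        value, r = and_gate_chebyshev(v)
        if all(v):
            assert value >= 2                          # AND true: argument k/(k-1) >= 1 + 1/r^2
        else:
            assert -1 - 1e-12 <= value <= 1 + 1e-12    # AND false: argument in [0, 1]
```

For $k = 9$ the gadget has degree 3, versus degree proportional to $9\log(1/\varepsilon)$ for the pure power.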

15
Chebyshev polynomials at gates
In fact, we will replace each AND gate by $C_{\sqrt{k}}\big((v_1 + \dots + v_k)/(k-1)\big)$ composed with an $\varepsilon$-amplifying polynomial of degree $\log(1/\varepsilon)$, and something similar for OR gates. Note that if the inputs have 0/1 values $\pm\varepsilon$, so do the outputs. Further, if the $v_i$ all have degree bounded by $d$, the resulting polynomial has degree bounded by $d\sqrt{k}\,\log(1/\varepsilon)$.

16
Almost done
By applying these polynomials at every gate, we can easily conclude: suppose $F$ is a formula of size $s$ and depth $d$ in which, along every path from root to leaf, the product of the fan-ins is at most $t$. Then we can sign-represent $F$ with a polynomial of degree $\sqrt{t}\,\log^{O(d)} s$. (We need to take $\varepsilon \approx 1/s$.)
We are not quite done, because these fan-in products can be huge!

17
Bounding fan-in products
[Figure: a deep formula of alternating OR and AND gates, with leaves $x_1, \dots, x_{n/100}, \dots, x_{2n/100}, \dots$]
Only $n$ variables (leaves) are used, but one path has fan-in product $(n/100)^{100}$.

18
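Solution: bucket
The trick is now to partition each gate's children into groups of subformulas of similar size, and to put a gate of the same type over each group:
[Figure: an AND gate with subformulas of sizes $s_1, s_2, s_3, s_4, \dots$ is split into about $\log s$ AND gates, one per bucket $1 \le s_i < 2$, …, $2^j \le s_i < 2^{j+1}$, …, $s/2 \le s_i < s$.]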
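The bucketing step can be simulated on small formulas. A minimal Python sketch under our own assumptions: formulas are nested tuples over positive variables only, and the helpers `size`, `evaluate`, `max_fanin_product`, `bucket`, `caterpillar` are hypothetical. It groups each gate's children by subformula size into buckets $[2^j, 2^{j+1})$, places a gate of the same type over each bucket, and compares fan-in products on a scaled-down version of the deep example from the previous slide.

```python
import itertools

# A formula is a leaf (a variable name, str) or a pair (gate, children)
# with gate in {"AND", "OR"} and children a non-empty list of formulas.

def size(f):
    """Number of leaves."""
    return 1 if isinstance(f, str) else sum(size(c) for c in f[1])

def evaluate(f, env):
    """Evaluate f under env, a dict mapping variable names to bools."""
    if isinstance(f, str):
        return env[f]
    gate, children = f
    values = [evaluate(c, env) for c in children]
    return all(values) if gate == "AND" else any(values)

def max_fanin_product(f):
    """Largest product of gate fan-ins along any root-to-leaf path."""
    if isinstance(f, str):
        return 1
    _, children = f
    return len(children) * max(max_fanin_product(c) for c in children)

def bucket(f):
    """Group each gate's children by subformula size into buckets [2^j, 2^(j+1))
    and put a gate of the same type over each bucket. An AND of ANDs (or an
    OR of ORs) computes the same function, so semantics are preserved."""
    if isinstance(f, str):
        return f
    gate, children = f
    groups = {}
    for c in map(bucket, children):
        j = size(c).bit_length() - 1        # size(c) lies in [2^j, 2^(j+1))
        groups.setdefault(j, []).append(c)
    return (gate, [(gate, group) for _, group in sorted(groups.items())])

def caterpillar(depth, m):
    """Alternating AND/OR chain in which every gate has one subformula child
    and m - 1 fresh leaf children."""
    ids = itertools.count(1)
    f = f"x{next(ids)}"
    for level in range(depth):
        gate = "AND" if level % 2 == 0 else "OR"
        f = (gate, [f] + [f"x{next(ids)}" for _ in range(m - 1)])
    return f

f = caterpillar(depth=10, m=100)
print(size(f), max_fanin_product(f), max_fanin_product(bucket(f)))
# 991 leaves; fan-in product 100**10 before bucketing, 100 * 2**9 = 51200 after
```

Bucketing leaves the function unchanged and at most doubles the depth; on this example it shrinks the worst fan-in product from $100^{10}$ to $51200$.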

19
Conclusion of upper bound
Now it is easy to see that, after bucketing, a gate whose subformula has depth $d$ and size $s$ has maximum root-to-leaf fan-in product $O(s \log^d s)$.
Pf: By induction: the AND bucket with subsizes in $[2^j, 2^{j+1})$ has fan-in at most $s/2^j$.
Hence if we first modify our formulas in this way (which at most doubles the depth), and then apply the Chebyshev construction, we get PTFs of degree $\sqrt{s}\,\log^{O(d)} s$, as desired.

20
Talk outline
Plan for the talk:
1. Prove the $\sqrt{s}\,\log^{O(d)} s$ PTF upper bound for formulas.
2. Prove the $\Omega(\log n / \log\log n)$ PTF lower bound for AND-MAJ$_n$.

21
Lower bound for AND-MAJ$_n$
Recall the AND-MAJ$_n$ function: $(x_1, \dots, x_n, y_1, \dots, y_n) \mapsto \mathrm{MAJ}(x_1, \dots, x_n) \wedge \mathrm{MAJ}(y_1, \dots, y_n)$.
Minsky and Papert (1968) showed that any PTF requires superconstant, $\omega(1)$, degree. Beigel, Reingold, and Spielman (1995) exhibited a PTF of degree $O(\log n)$. We give a new lower bound of $\Omega(\log n / \log\log n)$.

22
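The two-dimensional problem
Minsky and Papert observed that the problem of PTFs for AND-MAJ$_n$ is equivalent to a much simpler polynomial sign prescription problem, the $M$-intersector problem:
- space $\mathbb{R}^2$, bivariate polynomial
- regions: all odd lattice points bounded by $M$ (odd integers $x, y$ with $|x|, |y| \le M$)
- upper-right points ($x > 0$, $y > 0$) positive, all others negative
[Figure: the odd lattice points in the square $[-M, M]^2$ of the $(x, y)$ plane.]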

23
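Proof of equivalence
Switch to $\{+1,-1\}$ in input and output. With $n$ odd, $\sum_i x_i$ ranges over the odd integers in $[-n, n]$.
($\Leftarrow$) Suppose $p$ is an $n$-intersector. Then $p(\sum_i x_i, \sum_i y_i)$ is a PTF for AND-MAJ$_n$ of the same degree.
($\Rightarrow$) Suppose $p$ is the PTF. Consider
$q(x_1, \dots, x_n, y_1, \dots, y_n) = \sum_{\pi, \pi' \in S_n} p(x_{\pi(1)}, \dots, x_{\pi(n)}, y_{\pi'(1)}, \dots, y_{\pi'(n)})$.
By symmetry of AND-MAJ$_n$, every term is a PTF for it, so $q$ is also a PTF for AND-MAJ$_n$. But $q$ is symmetric in the $x$'s and in the $y$'s, hence (on $\{+1,-1\}^{2n}$) depends only on their sums: $q = q(\sum_i x_i, \sum_i y_i)$.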

24
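The $M$-intersector problem
Consider the more general sign prescription problem:
[Figure: the whole $(x, y)$ plane, + on the open upper-right quadrant and − everywhere else.]
No polynomial can have these signs! Proof: assume we have such a $p$ of minimal degree. By continuity, $p$ must be 0 on the positive $y$ half-axis (it is positive just to the right of it and negative just to the left). By Bézout, $x \mid p$. Divide through; the result has smaller degree and solves (essentially) the same problem, up to a rotation and a global sign.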

25
Reproving Minsky-Papert
This can be used to reprove Minsky and Papert's superconstant lower bound. Suppose there were a fixed $d$ such that there was an $M$-intersector of degree $d$ for every $M$. Take $M \to \infty$, rescaling to the unit square. By compactness and continuity, there is a limiting degree-$d$ polynomial whose signs are as on the previous slide, a contradiction.

26
The relaxed case
[BRS95] constructed a bivariate polynomial of degree $O(\log M)$ for the sign pattern shown. We now describe how to obtain a lower bound of $\Omega(\log M / \log\log M)$ for the $M$-intersector problem. We show that for any $d$, there is a subset of lattice points with coordinates at most $d^{O(d)}$ on which the pattern can't be achieved in degree $d$.
[Figure: the sign pattern in the $(x, y)$ plane, + in the upper-right quadrant, with the range from 1 to $M$ marked.]

27
A constructive solution
It is possible to show PTF lower bounds constructively. Let $Z$ denote the set of odd lattice points, and let $f$ denote the function which is $+1$ in the upper-right quadrant and $-1$ elsewhere. Suppose we could find a probability distribution $w$ on $Z$ under which every monomial $x^i y^j$, $0 \le i + j \le d$, had zero correlation with $f$.

28
A constructive solution
I.e., suppose we have $w : Z \to \mathbb{R}_{\ge 0}$ with $\sum_{z \in Z} w(z) = 1$, such that
$\sum_{(x,y) \in Z} f(x,y)\, x^i y^j\, w(x,y) = 0$
for all monomials $x^i y^j$ of degree at most $d$. Suppose also that $w = 0$ on points with coordinates exceeding $M$ in absolute value. We claim this implies no $M$-intersector of degree $d$ exists.

29
Proof of constructive method
Proof: Suppose $p$ were an $M$-intersector of degree $d$. On one hand, by linearity of expectation, $\mathbb{E}_w[f(x,y)\,p(x,y)] = 0$, since $f$ is uncorrelated with all monomials of degree at most $d$. On the other hand, on all lattice points bounded by $M$, $f(x,y)\,p(x,y) > 0$. But $w$ gives all of its probability mass to these points, so $\mathbb{E}_w[f\,p] > 0$, a contradiction.
Intriguingly, the much stronger converse (if no such distribution exists, then a PTF exists) is true, by LP duality.
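As a hand-computed illustration of this argument (our own example, not from the talk), take $d = 1$ and the four odd lattice points $(1,1), (-1,5), (5,-1), (-1,-1)$ with weights $1/2, 1/6, 1/6, 1/6$; here $f$ is $+1$ only at $(1,1)$. All three correlations of degree at most 1 vanish, so no linear polynomial can be a $5$-intersector:

```latex
% support: (1,1), (-1,5), (5,-1), (-1,-1); f = +1, -1, -1, -1; w = 1/2, 1/6, 1/6, 1/6
\begin{align*}
\mathbb{E}_w[f \cdot 1] &= \tfrac12 - \tfrac16 - \tfrac16 - \tfrac16 = 0,\\
\mathbb{E}_w[f \cdot x] &= \tfrac12(1) - \tfrac16(-1) - \tfrac16(5) - \tfrac16(-1)
  = \tfrac12 + \tfrac16 - \tfrac56 + \tfrac16 = 0,\\
\mathbb{E}_w[f \cdot y] &= \tfrac12(1) - \tfrac16(5) - \tfrac16(-1) - \tfrac16(-1)
  = \tfrac12 - \tfrac56 + \tfrac16 + \tfrac16 = 0.
\end{align*}
% Hence E_w[f p] = 0 for every p = c_0 + c_1 x + c_2 y, whereas an intersector
% would make f p > 0 at all four support points.
```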

30
Constructing the distribution
There are $D = (d+1)(d+2)/2$ constraints: the monomials we want to be uncorrelated with. Suppose we pick just $D+1$ points for our distribution to be supported on, $(x_1, y_1), \dots, (x_{D+1}, y_{D+1})$. Then the condition that $w$ is a probability distribution over these points under which all constraint monomials have zero correlation with $f$ is a $(D+1) \times (D+1)$ linear system.

31
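Constructing the distribution
The first row (all 1s) says the weights sum to 1; after it there is one row per monomial $x^i y^j$, and there is one column per point $(x_k, y_k)$, with entry $f(x_k, y_k)\, x_k^i y_k^j$:
$$\begin{pmatrix} 1 & 1 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ f(x_1,y_1)\,x_1^i y_1^j & f(x_2,y_2)\,x_2^i y_2^j & \cdots & f(x_{D+1},y_{D+1})\,x_{D+1}^i y_{D+1}^j \\ \vdots & \vdots & & \vdots \end{pmatrix} \begin{pmatrix} w(x_1,y_1) \\ w(x_2,y_2) \\ \vdots \\ w(x_{D+1},y_{D+1}) \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
Our desire is that the solution be nonnegative.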

32
Me thinking

33
Rocco thinking

34
Our solution
We now pull a rabbit out of our hat and name the exact set of points on which the distribution will be supported. Essentially, we want just the grid of points, but on a log scale. Let $h$ be a large number to be named later. Our points will be a subset of $\{(\pm h^i, \pm h^j) : 0 \le i + j \le d\}$.

35
Our solution
The exact $D+1$ points to consider are
$\{((-1)^l h^k, (-1)^k h^l) : 0 \le k + l \le d\} \cup \{(-1, -1)\}$,
where $h = d^{O(1)}$ and $h$ is odd.
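The linear system for this point set can be built and solved exactly for small $d$. A minimal Python sketch, assuming exact rational arithmetic via `fractions.Fraction`; the helpers `f_sign`, `monomials`, `support_points`, `solve`, `dual_weights` are hypothetical. Note that it uses Gauss-Jordan elimination rather than the Cramer's-rule analysis described in the talk, and it only computes weights for a given $h$; it does not prove nonnegativity in general.

```python
from fractions import Fraction

def f_sign(x, y):
    """+1 on the open upper-right quadrant, -1 elsewhere."""
    return 1 if x > 0 and y > 0 else -1

def monomials(d):
    """Exponent pairs (i, j) with i + j <= d; there are D = (d+1)(d+2)/2 of them."""
    return [(i, j) for i in range(d + 1) for j in range(d + 1 - i)]

def support_points(d, h):
    """{((-1)^l h^k, (-1)^k h^l) : k + l <= d} plus (-1, -1): D + 1 points."""
    pts = [((-1) ** l * h ** k, (-1) ** k * h ** l)
           for k in range(d + 1) for l in range(d + 1 - k)]
    return pts + [(-1, -1)]

def solve(A, b):
    """Exact Gauss-Jordan elimination over the rationals (A square and nonsingular)."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)  # fails if A is singular
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def dual_weights(d, h):
    """Solve the (D+1) x (D+1) system: the weights sum to 1 and
    sum_k w_k f(x_k, y_k) x_k^i y_k^j = 0 for every monomial of degree <= d."""
    pts = support_points(d, h)
    A = [[Fraction(1)] * len(pts)]
    A += [[Fraction(f_sign(x, y) * x ** i * y ** j) for x, y in pts]
          for i, j in monomials(d)]
    b = [Fraction(1)] + [Fraction(0)] * len(monomials(d))
    return pts, solve(A, b)

pts, w = dual_weights(1, 5)
print(pts)  # [(1, 1), (-1, 5), (5, -1), (-1, -1)]
print(w)    # [Fraction(1, 2), Fraction(1, 6), Fraction(1, 6), Fraction(1, 6)]
```

For $d = 1$ one can solve by hand: the weight on $(-1,-1)$ is $(h-3)/(2(h+1))$, so the solution is nonnegative exactly when $h \ge 3$.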

36
Finishing the proof
We consider the linear system given by this choice of points. We need to show the solution consists of nonnegative values. By Cramer's rule, the solution weights are ratios of two determinants. Each determinant is a polynomial in $h$. We calculate the highest-order terms, show that they dominate the polynomial (using the fact that $h$ is large), and show they have the same sign. (Details omitted!)

37
Finishing the proof
Hence, we've constructed a true probability distribution over the odd lattice points, under which $f$ has zero correlation with all monomials of degree at most $d$. The largest coordinate used is $h^d = d^{O(d)}$. This shows that $d^{O(d)}$-intersectors require PTF degree greater than $d$; i.e., $M$-intersectors require PTF degree $\Omega(\log M / \log\log M)$.
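The final "i.e." is a routine inversion. A short derivation, assuming for this sketch that the construction needs coordinates only up to $d^{cd}$ for some constant $c \ge 1$: fix $M$ and let $d$ be the largest integer with $d^{cd} \le M$, so no degree-$d$ $M$-intersector exists.

```latex
% Assumption: the support of w has coordinates at most d^{cd}, with c >= 1 a constant.
\begin{align*}
(d+1)^{c(d+1)} > M \;&\Longrightarrow\; c\,(d+1)\log(d+1) > \log M,\\
2^{d} \le d^{cd} \le M \ (d \ge 2) \;&\Longrightarrow\; d \le \log_2 M
  \;\Longrightarrow\; \log(d+1) = O(\log\log M),\\
\text{hence}\quad \text{degree} \;\ge\; d+1 \;&>\; \frac{\log M}{c\,\log(d+1)}
  \;=\; \Omega\!\left(\frac{\log M}{\log\log M}\right).
\end{align*}
```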

38
Talk outline
Plan for the talk:
1. Prove the $\sqrt{s}\,\log^{O(d)} s$ PTF upper bound for formulas.
2. Prove the $\Omega(\log n / \log\log n)$ PTF lower bound for AND-MAJ$_n$.

39
Open questions
- Does every boolean formula of size $s$ have a PTF of degree $O(\sqrt{s})$, independent of depth?
- Minsky and Papert showed an $\Omega(n^{1/3})$ PTF lower bound for a certain depth-2 circuit. Can one show a significantly stronger lower bound for any constant-depth circuit?
- Better lower or upper bounds for the intersection of two weighted thresholds?
- Explore the polynomial sign prescription problem further.
