
1
Learning Juntas
Elchanan Mossel (UC Berkeley), Ryan O’Donnell (MIT), Rocco Servedio (Harvard)

2
What’s a junta?
junta (the dictionary’s):
–A council or committee for political or governmental purposes
–A group of persons controlling a government
–A junto
junta (ours):
–A Boolean function which depends on only k << n of its Boolean variables

3
Example: a 3-junta
f(x1,...,x10) = x3 OR (x6 AND x7)
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 | f(x)
 1  1  0  1  1  1  1  1  0  1  |  1
 1  1  0  1  0  0  1  0  0  0  |  0
 0  1  0  1  0  1  1  0  0  1  |  1
 0  0  1  1  0  1  0  1  1  0  |  1
 1  1  0  1  0  1  0  1  0  1  |  0
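As a sanity check, the example can be written out in Python (a sketch; the function and the row data are copied from this slide, the variable names are mine):

```python
# The 3-junta from this slide: f depends only on x3, x6, x7 (1-indexed).
def f(x):
    x3, x6, x7 = x[2], x[5], x[6]  # 0-indexed positions
    return x3 | (x6 & x7)

# The labeled rows shown above: 10 input bits, then the label f(x).
rows = [
    ([1, 1, 0, 1, 1, 1, 1, 1, 0, 1], 1),
    ([1, 1, 0, 1, 0, 0, 1, 0, 0, 0], 0),
    ([0, 1, 0, 1, 0, 1, 1, 0, 0, 1], 1),
    ([0, 0, 1, 1, 0, 1, 0, 1, 1, 0], 1),
    ([1, 1, 0, 1, 0, 1, 0, 1, 0, 1], 0),
]
# Every row's label agrees with the hidden 3-junta.
assert all(f(x) == y for x, y in rows)
```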

4
Learning juntas
The problem: you get data labeled according to some k-junta. What’s the junta?
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 | f(x)
 1  1  0  1  1  1  1  1  0  1  |  1
 1  1  0  0  1  1  0  0  1  0  |  0
 0  1  0  1  1  1  1  0  0  1  |  1
 0  0  1  0  0  0  1  1  0  0  |  1
 1  0  0  1  1  1  0  1  0  1  |  0

5
Outline of talk
–Motivation
–Warm-ups
–Our results
–How we do it
–Future work

6
Why learn juntas? Natural, general problem (no assumptions on f ) Real-world learning problems often have lots of irrelevant information Important special case of notorious open questions in learning theory: learning DNF, learning decision trees...

7
Learning decision trees
Given data labeled according to some decision tree, what’s the tree?
[Slide shows a decision tree with internal nodes labeled x5, x3, x1, x2, x1, x4, x6 and leaves labeled 0 and 1.]

8
Learning decision trees (cont)
Any k-junta is expressible as a decision tree of size 2^k. So to learn poly(n)-size decision trees, must be able to learn log(n)-juntas.
Big open question: are decision trees of size poly(n) learnable in poly(n) time?
Similar situation for learning DNF.

9
Learning decision trees (cont) If we can learn log(n)-juntas, can learn decision trees of size log(n)…even this would be a big step forward. So progress on juntas is necessary for progress on decision trees. It’s also sufficient! Again, similar situation for DNF.

10
The problem: PAC learn k-juntas under uniform
Setup: we get random examples (x^1, f(x^1)), (x^2, f(x^2)), … where
–each x^i is uniform from {0,1}^n
–f is an unknown k-junta
Goal: output h such that, with very high probability, Pr[h(x) ≠ f(x)] < ε.

11
The problem refined
Setup: we get random examples (x^1, f(x^1)), (x^2, f(x^2)), … where
–each x^i is uniform from {0,1}^n
–f is an unknown k-junta
Goal: output h such that Pr[h(x) ≠ f(x)] < ε.
Equivalent goal: output h = f exactly
Equivalent goal: find the k relevant variables of f

12
What’s known?
Easy lower bound: need at least ≈ 2^k + k log n examples
Easy information-theoretic upper bound: ≈ 2^k + k log n examples are sufficient
Easy computational upper bound: there are (n choose k) possible sets of relevant variables, so can do exhaustive search in 2^{O(k)} · (n choose k) = O(n^k) time
Can we learn in time poly(n, 2^k)?

13
Variant #1: membership queries
If the learner can make queries, can learn in poly(n, 2^k) time.
–Draw random points. If all positive or all negative, done. Otherwise, “walk” from a positive point to a negative point, flipping disagreeing bits, to identify a relevant variable:
  1 1 0 1 0 0 1 0 1 ; 1
  0 1 0 1 1 0 0 1 1 ; 0
  0 1 0 1 0 0 1 0 1 ; 1
  0 1 0 1 1 0 1 0 1 ; 1
  0 1 0 1 1 0 0 0 1 ; 0
–Recurse.
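The “walk” can be sketched as a binary search along the path from a positive point to a negative point; here `query` is a stand-in for membership-query access to f (the names and structure are illustrative, not the talk’s):

```python
def find_relevant_variable(query, pos, neg):
    """Binary-search from a positive point toward a negative point,
    flipping disagreeing bits; returns the index of one relevant
    variable. `query(x)` is assumed to answer a membership query f(x)."""
    # Positions where the two points disagree.
    diff = [i for i in range(len(pos)) if pos[i] != neg[i]]
    lo, hi = 0, len(diff)
    # Invariant: flipping diff[:lo] keeps label 1; flipping diff[:hi]
    # gives label 0.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        x = list(pos)
        for i in diff[:mid]:
            x[i] = neg[i]
        if query(x) == 1:
            lo = mid
        else:
            hi = mid
    # Flipping diff[hi-1] changes the label, so that variable is relevant.
    return diff[hi - 1]
```

Each call costs O(log n) queries, after which one recurses on both settings of the found variable, as the slide says.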

14
Variant #2: monotone functions
If the junta is monotone, can learn in poly(n, 2^k) time.
–If x_i is irrelevant, have Pr[f(x) = 1 | x_i = 1] = Pr[f(x) = 1 | x_i = 0].
–If x_i is relevant, have Pr[f(x) = 1 | x_i = 1] > Pr[f(x) = 1 | x_i = 0].
–Each probability is an integer multiple of 1/2^k.
–So can test each variable in poly(2^k) time.
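A sketch of the monotone test: estimate both conditional probabilities from samples and flag variables whose gap exceeds half the guaranteed minimum gap 1/2^k. The sample size, seed, and names are illustrative choices, not the talk’s:

```python
import random

def relevant_vars_monotone(f, n, k, samples=20000, seed=0):
    """For a monotone k-junta f, estimate
    Pr[f=1 | x_i=1] - Pr[f=1 | x_i=0] for each i; relevant variables
    show a gap of at least 1/2^k, irrelevant ones a gap of 0."""
    rng = random.Random(seed)
    ones = [0] * n; zeros = [0] * n      # positive counts per side
    count1 = [0] * n; count0 = [0] * n   # sample counts per side
    for _ in range(samples):
        x = [rng.randint(0, 1) for _ in range(n)]
        y = f(x)
        for i in range(n):
            if x[i]:
                count1[i] += 1; ones[i] += y
            else:
                count0[i] += 1; zeros[i] += y
    gap = 1 / 2 ** (k + 1)  # threshold: half the guaranteed gap
    return [i for i in range(n)
            if ones[i] / max(count1[i], 1)
             - zeros[i] / max(count0[i], 1) > gap]
```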

15
Variant #3: random functions
If the junta is random, whp can learn in poly(n, 2^k) time.
–If x_i is irrelevant, have Pr[f(x) = x_i] = 1/2 for sure.
–If x_i is relevant, have Pr[ Pr[f(x) = x_i] = 1/2 ] ≈ 1/2^{k/2}.
–Each probability is an integer multiple of 1/2^k.
–So whp can find the relevant variables this way.
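The same idea in code, assuming (as holds whp for a random junta) that every relevant variable’s agreement probability actually deviates from 1/2; the threshold, sample size, and names are my own:

```python
import random

def relevant_vars_by_correlation(f, n, k, samples=20000, seed=1):
    """Estimate Pr[f(x) = x_i] for each variable. Irrelevant variables
    sit at exactly 1/2 in expectation; for a typical junta, relevant
    variables deviate by at least 1/2^k."""
    rng = random.Random(seed)
    agree = [0] * n
    for _ in range(samples):
        x = [rng.randint(0, 1) for _ in range(n)]
        y = f(x)
        for i in range(n):
            agree[i] += (x[i] == y)
    # Flag i when the empirical deviation exceeds half the minimum
    # possible true deviation, 1/2^(k+1).
    return [i for i in range(n)
            if abs(agree[i] / samples - 0.5) > 1 / 2 ** (k + 1)]
```

Note this test can miss a relevant variable whose agreement probability happens to be exactly 1/2 (e.g. an XOR-ed variable), which is why the guarantee is only “whp” over random juntas.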

16
Back to the real problem
Lower bound: need at least ≈ 2^k + k log n examples
Upper bound: there are (n choose k) possible sets of relevant variables, so can do exhaustive search in 2^{O(k)} · (n choose k) = O(n^k) time
Can we learn in time poly(n, 2^k)?

17
Previous work
[Blum & Langley, 1994] suggested the problem.
Little progress until….
[Kalai & Mansour, 2001] gave an algorithm that learns in time ≈ n^(k − √k).

18
Our result
We give an algorithm that learns in time ≈ n^(ωk/(ω+1)), where ω ≈ 2.376 is the matrix multiplication exponent. So currently ≈ n^(0.704k).

19
The main idea Let g be the hidden k-bit function Look at two different representations for g: –Only weird functions are hard to learn under first representation –Only perverse functions are hard to learn under second representation –No function is both weird and perverse

20
First representation: real polynomial
View inputs and outputs as ±1-valued.
Fact: every Boolean function g: {−1,1}^k → {−1,1} has a unique interpolating real polynomial g_R(x_1, x_2, …, x_k).
–g_R’s coefficients are the Fourier coefficients of g
–Examples:
  parity on x_1, x_2, …, x_k: polynomial is x_1 x_2 ⋯ x_k
  x_1 AND x_2: polynomial is (1 + x_1 + x_2 − x_1 x_2)/2

21
Real polynomials
Fourier coefficients measure the correlation of g with the corresponding parities:
  E[g(x) · x^T] = coefficient of x^T in g_R    (where x^T = ∏_{i∈T} x_i)
So given a set T of variables, can estimate the coefficient of x^T via sampling.
–Nonzero only if every variable in T is relevant
–Problem: may have to test all sets of up to k variables to find a nonzero coefficient
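A sketch of the sampling estimator for a single Fourier coefficient, in the ±1 convention (function name and sample size are illustrative):

```python
import random

def fourier_coefficient(f, T, n, samples=50000, seed=2):
    """Estimate the Fourier coefficient of x^T, i.e. E[f(x) * x^T],
    for f: {-1,1}^n -> {-1,1}, by averaging over random samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        x = [rng.choice([-1, 1]) for _ in range(n)]
        chi_T = 1
        for i in T:
            chi_T *= x[i]       # the parity (character) x^T
        total += f(x) * chi_T
    return total / samples

# Example: for f = x0*x1 (a parity), the coefficient of x0*x1 is 1,
# while the coefficient of x0 alone is 0.
```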

22
First technical theorem:
Let g be a Boolean function on k variables such that g_R has nonzero constant term:
  g_R(x) = c_0 + Σ_{|T| ≥ s} c_T x^T,   c_0 ≠ 0
(s = degree of the smallest nontrivial monomial). Then s ≤ 2k/3.

23
Second representation: GF(2) polynomial
View inputs and outputs as 0/1-valued.
Fact: every Boolean function g: {0,1}^k → {0,1} has a unique interpolating GF(2) polynomial g_2(x_1, x_2, …, x_k).
Examples:
  parity on x_1, x_2, …, x_k: polynomial is x_1 + x_2 + … + x_k
  x_1 AND … AND x_k: polynomial is x_1 x_2 ⋯ x_k

24
Learning parities
Suppose g is some parity function, e.g. g(x) = parity(x_1, x_2, x_4).
Can add labeled examples mod 2:
    0 1 0 1 0 0 1 0 1 ; 0
  + 1 1 1 1 1 0 1 0 1 ; 1
  = 1 0 1 0 1 0 0 0 0 ; 1

25
Learning parities (cont)
Given a set of labeled examples, can do Gaussian elimination (over GF(2)) to obtain
  1 0 0 0 0 0 0 0 0 ; b
Will have b = 1 iff x_1 is in the parity.
Repeat for x_2, …, x_n to learn the parity.
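A minimal sketch of this step: reduce the matrix of examples (label appended as an extra column) to reduced row-echelon form over GF(2); any pivot row that becomes a unit vector e_j reveals, via its label bit, whether x_j is in the parity. This plain elimination is cubic; the n^ω bound quoted later in the talk comes from doing it via fast matrix multiplication.

```python
def learn_parity(examples, n):
    """Recover the support of an unknown parity from labeled examples
    (x, y), where x is a list of n bits and y = parity(x restricted to
    the unknown support). Works over GF(2)."""
    rows = [list(x) + [y] for x, y in examples]
    pivot_of = {}  # column -> index of its pivot row
    r = 0
    for c in range(n):
        # Find a row with a 1 in column c, at or below row r.
        for i in range(r, len(rows)):
            if rows[i][c]:
                rows[r], rows[i] = rows[i], rows[r]
                break
        else:
            continue  # no pivot in this column
        # Eliminate column c from every other row (addition mod 2).
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                rows[i] = [(a + b) % 2 for a, b in zip(rows[i], rows[r])]
        pivot_of[c] = r
        r += 1
    # A pivot row reduced to a unit vector e_j determines x_j exactly:
    # its label bit says whether x_j participates in the parity.
    return [c for c, i in pivot_of.items()
            if sum(rows[i][:n]) == 1 and rows[i][n]]
```

Columns whose pivot row is not a unit vector are left undetermined by the given examples; with enough random examples the system becomes fully determined whp.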

26
Learning GF(2) polynomials
Given any g: {0,1}^k → {0,1}, can view g_2 as a parity over monomials (ANDs).
If deg(g_2) = d, have ≈ k^d monomials. In the junta setting, have ≈ n^d monomials.
–Problem: d could be as large as k

27
Second technical theorem:
Let g be a Boolean function on k variables such that g_R has zero constant term:
  g_R(x) = Σ_{|T| ≥ s} c_T x^T.
Then deg(g_2) ≤ k − s.

28
Algorithm to learn k-juntas
Sample to test whether f is constant
If not, sample to estimate the Fourier coefficients of all sets of up to αk variables
–Nonzero coefficient on a set of size m: recurse on all 2^m settings of those variables
–All small coefficients zero: run the parity-learning algorithm with monomials of size up to (1 − α)k

29
Why does it work?
If f is unbalanced, will find a nonzero coefficient of size at most 2k/3 ≤ αk.
If f is balanced, the parity-learning algorithm is guaranteed to succeed.
So either way, we make progress. Take α ≥ 2/3.

30
Running time
Checking sets of up to αk variables takes ≈ n^(αk) time.
Running Gaussian elimination on monomials of size (1 − α)k takes ≈ n^(ω(1−α)k) time (ω = matrix multiplication exponent).
So the best tradeoff is α = ω/(ω + 1), giving time ≈ n^(ωk/(ω+1)) ≈ n^(0.704k).
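A quick numeric check of this tradeoff:

```python
# Balancing the two phases: estimating Fourier coefficients of sets of
# size up to alpha*k costs ~ n^(alpha*k), while Gaussian elimination
# over the monomials of size up to (1-alpha)*k costs
# ~ n^(omega*(1-alpha)*k). Equating the exponents gives
# alpha = omega / (omega + 1).
omega = 2.376  # matrix multiplication exponent quoted in the talk
alpha = omega / (omega + 1)
assert abs(alpha - omega * (1 - alpha)) < 1e-9  # the two exponents match
running_time_exponent = alpha  # total time ~ n^(alpha*k), about n^(0.704k)
```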

31
What else can we do?
Restrictions: can look at f under “small” restrictions, e.g. fix x_1 = 0, x_3 = 1, x_4 = 0, x_9 = 0 and leave the rest free:
  x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
   0 x2  1  0 x5 x6 x7 x8  0 x10

32
A question
Suppose g: {−1,1}^k → {−1,1} has
  g_R(x) = Σ_{|T| > 2k/3} ĝ_T x^T.
Must there be some restriction ρ fixing at most 2k/3 variables such that g(ρ(x)) is a parity function?
If yes, can learn k-juntas in time ≈ n^(2k/3).

33
Future work
–Faster algorithms?
–Non-binary input alphabets? (non-binary outputs are easy)
–Non-uniform distributions? Product distributions? General distributions?
