Published by Miguel Doherty. Modified over 2 years ago.

1/15 Agnostically learning halfspaces FOCS 2005

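2/15 Agnostic learning (L. Sellie)
Set $X$; $F$ a class of functions $f: X \to \{0,1\}$. Arbitrary distribution over $(x, y) \in X \times \{0,1\}$; $f^* = \arg\min_{f \in F} P[f(x) \neq y]$ and $\mathrm{opt} = P[f^*(x) \neq y]$.
Efficient agnostic learner: from $\mathrm{poly}(1/\epsilon)$ samples, w.h.p. outputs $h: X \to \{0,1\}$ with $P[h(x) \neq y] \le \mathrm{opt} + \epsilon$.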

3/15 Agnostic learning (L. Sellie)
Set $X_n \subseteq \mathbb{R}^n$; $F_n$ a class of functions $f: X_n \to \{0,1\}$. Arbitrary distribution over $(x, y) \in X_n \times \{0,1\}$; $f^* = \arg\min_{f \in F_n} P[f(x) \neq y]$.
Efficient agnostic learner: from $\mathrm{poly}(n, 1/\epsilon)$ samples, w.h.p. outputs $h: X_n \to \{0,1\}$ with $P[h(x) \neq y] \le \mathrm{opt} + \epsilon$.

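4/15 Agnostic learning (L. Sellie), same setup as 3/15: $X_n \subseteq \mathbb{R}^n$, $F_n$ a class of functions $f: X_n \to \{0,1\}$, an arbitrary distribution over $(x, y) \in X_n \times \{0,1\}$, $f^* = \arg\min_{f \in F_n} P[f(x) \neq y]$, and a learner using $\mathrm{poly}(n, 1/\epsilon)$ samples that w.h.p. outputs $h$ with $P[h(x) \neq y] \le \mathrm{opt} + \epsilon$.
In the PAC model, $P[f^*(x) \neq y] = 0$, so the guarantee becomes $P[h(x) \neq y] \le \epsilon$.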

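5/15 Agnostic learning of halfspaces
$F_n = \{ f(x) = I(w \cdot x \ge \theta) \mid w \in \mathbb{R}^n, \theta \in \mathbb{R} \}$. Goal: $h: \mathbb{R}^n \to \{0,1\}$ with $P[h(x) \neq y] \le \mathrm{opt} + \epsilon$, where $f^* \in \arg\min_{f \in F_n} P[f(x) \neq y]$ and $\mathrm{opt} = P[f^*(x) \neq y]$.
[Figure: the best halfspace $f^*$ and the hypothesis $h$.]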
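
To make $f^*$ and opt concrete, the sketch below brute-forces the empirical version of $\arg\min_{f \in F_1} P[f(x) \neq y]$ for one-dimensional halfspaces ($n = 1$). It is an illustration only, assuming a small finite sample; the function names and the data are hypothetical, and exhaustive search does not scale to $\mathbb{R}^n$.

```python
import math


def halfspace_1d(w, theta):
    """f(x) = I(w * x >= theta) for scalar x."""
    return lambda x: 1 if w * x >= theta else 0


def empirical_error(h, sample):
    """Empirical estimate of P[h(x) != y] over a list of (x, y) pairs."""
    return sum(h(x) != y for x, y in sample) / len(sample)


def erm_halfspace_1d(sample):
    """Return (opt, w, theta) for an empirical-error minimizer over 1-D halfspaces.

    In one dimension only the sign of w matters, and the labeling
    I(w * x >= theta) changes only when theta crosses some w * x_i, so trying
    those values plus +/- infinity (the two constant functions) is enough.
    """
    best = None
    for w in (1, -1):
        for theta in [w * x for x, _ in sample] + [math.inf, -math.inf]:
            err = empirical_error(halfspace_1d(w, theta), sample)
            if best is None or err < best[0]:
                best = (err, w, theta)
    return best


# Hypothetical data: labels from I(x >= 0.5), with the label at x = 0.2 flipped.
xs = [i / 10 for i in range(10)]
sample = [(x, int(x >= 0.5)) for x in xs]
sample[2] = (sample[2][0], 1 - sample[2][1])
opt, w, theta = erm_halfspace_1d(sample)
print(opt, w, theta)  # prints: 0.1 1 0.5
```

Here one flipped label is exactly the "opt" that no halfspace can avoid, so the best achievable empirical error is 1/10.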

6/15 Agnostic learning of halfspaces (setup as on 5/15)
Special case: junctions (conjunctions and disjunctions), e.g., $f(x) = x_1 \lor x_3 = I(x_1 + x_3 \ge 1)$.
Efficiently agnostically learning junctions $\Rightarrow$ PAC learning DNF.
Proper agnostic learning (where $h$ must itself be a halfspace) is NP-hard.
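
The special case rests on the identity $x_1 \lor x_3 = I(x_1 + x_3 \ge 1)$; dually, a conjunction of $k$ variables is $I(\sum_i x_i \ge k)$. A quick exhaustive check over $\{0,1\}^n$ in Python, as a sketch (the helper name is hypothetical):

```python
from itertools import product


def junctions_are_halfspaces(n, indices):
    """Check OR = I(sum >= 1) and AND = I(sum >= k) on every x in {0,1}^n.

    indices are 0-based, so the slide's x_1 OR x_3 uses indices (0, 2).
    """
    k = len(indices)
    for x in product((0, 1), repeat=n):
        s = sum(x[i] for i in indices)  # w . x with weight 1 on the chosen coordinates
        if int(any(x[i] for i in indices)) != int(s >= 1):
            return False
        if int(all(x[i] for i in indices)) != int(s >= k):
            return False
    return True


print(junctions_are_halfspaces(4, (0, 2)))        # x_1 OR x_3, and x_1 AND x_3
print(junctions_are_halfspaces(17, (0, 10, 16)))  # x_1 AND x_11 AND x_17
```

Both calls print `True`: every junction is a halfspace with 0/1 weights, which is why junctions sit inside $F_n$.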

7/15 Agnostic learning of halfspaces (setup as on 5/15)
PAC learning halfspaces (the noise-free case, opt = 0) is solved by linear programming.
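
One way to write that LP, as a sketch (this particular formulation is an assumption, not taken from the slide): since a noise-free sample $(x_1, y_1), \dots, (x_m, y_m)$ is labeled by some halfspace, a consistent halfspace is any feasible point of

$$
\text{find } w \in \mathbb{R}^n,\ \theta \in \mathbb{R} \quad \text{s.t.} \quad
\begin{cases}
w \cdot x_i - \theta \ge 1 & \text{if } y_i = 1, \\
w \cdot x_i - \theta \le -1 & \text{if } y_i = 0,
\end{cases}
\qquad i = 1, \dots, m.
$$

The margin of 1 loses nothing on a finite sample: if $I(w \cdot x \ge \theta)$ labels the sample correctly, lowering $\theta$ by half of the smallest gap $\theta - w \cdot x_i$ over negative examples makes every inequality strict, and rescaling $(w, \theta)$ makes every gap at least 1. Any feasible $(w, \theta)$ then gives $h(x) = I(w \cdot x \ge \theta)$ with zero error on the sample.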

8/15 Agnostic learning of halfspaces (setup as on 5/15)
PAC learning halfspaces with independent/random noise is solved by:

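9/15 Agnostic learning of halfspaces
Find $h: \mathbb{R}^n \to \{0,1\}$ with $P[h(x) \neq y] \le \mathrm{opt} + \epsilon$, where $\mathrm{opt} = \min_{f \in F_n} P[f(x) \neq y]$.
Equivalently: $f^*$ is the truth, and an adversary corrupts the labels on an opt fraction of the distribution (adversarial noise).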

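10/15 Theorem 1: As long as the distribution draws $x \in \mathbb{R}^n$ from a log-concave distribution (e.g., uniform over a convex set, exponential $e^{-|x|}$, normal), from the uniform distribution over $\{-1,1\}^n$ or over $S^{n-1} = \{ x \in \mathbb{R}^n : |x| = 1 \}$, or …, our algorithm outputs (w.h.p.) $h: \mathbb{R}^n \to \{0,1\}$ with $P[h(x) \neq y] \le \mathrm{opt} + \epsilon$ in time $n^{O(\epsilon^{-4})}$, i.e., $\mathrm{poly}(n)$ for every constant $\epsilon$.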
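
Roughly, the running time comes from the polynomial-regression step on the next slide, which works with every monomial of degree at most $d$ in $n$ variables. There are $\binom{n+d}{d} \le (n+1)^d$ of them, which is where $n^{O(d)}$ comes from; the bound above is consistent with taking $d = O(\epsilon^{-4})$. A Python sketch of that feature expansion (names hypothetical):

```python
from itertools import combinations_with_replacement
from math import comb


def monomial_features(x, d):
    """Every monomial of degree <= d in x_1..x_n, one value per monomial.

    A monomial of degree <= d is a multiset of d factors drawn from
    (1, x_1, ..., x_n); the constant factor 1 pads the lower degrees.
    """
    padded = (1.0,) + tuple(x)
    feats = []
    for idx in combinations_with_replacement(range(len(padded)), d):
        prod = 1.0
        for i in idx:
            prod *= padded[i]
        feats.append(prod)
    return feats


print(monomial_features((2.0, 3.0), 2))  # [1.0, 2.0, 3.0, 4.0, 6.0, 9.0]
print(comb(100 + 3, 3), 101 ** 3)        # 176851 monomials vs. the (n+1)^d bound 1030301
```

The count is polynomial in $n$ for fixed $d$ but grows as $n^d$, so the degree needed for a given $\epsilon$ drives the whole cost.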

11/15 1. L1 polynomial regression algorithm. Given $d > 0$ and $(x_1, y_1), \dots, (x_m, y_m) \in \mathbb{R}^n \times \{0,1\}$, find a multivariate degree-$d$ polynomial $p(x)$ minimizing $\frac{1}{m}\sum_{i=1}^{m} |p(x_i) - y_i| \approx \min_{\deg(p) \le d} E[|p(x) - y|]$. Pick $\theta \in [0,1]$ uniformly at random and output $h(x) = I(p(x) \ge \theta)$. Time $n^{O(d)}$.
2. Low-degree Fourier algorithm of …: choose $p$ with $p \approx \arg\min_{\deg(p) \le d} E[(p(x) - y)^2]$ (requires $x$ uniform from $\{-1,1\}^n$) and output $h(x) = I(p(x) \ge \tfrac12)$. Time $n^{O(d)}$.
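
A minimal Python sketch of algorithm 2, assuming $x$ uniform on $\{-1,1\}^n$ and labels $y \in \{0,1\}$. Under that distribution the parities $\chi_S(x) = \prod_{i \in S} x_i$ are orthonormal, so the degree-$\le d$ least-squares fit is $p = \sum_{|S| \le d} \hat{y}(S)\,\chi_S$ with $\hat{y}(S) = E[y\,\chi_S(x)]$; the code estimates each coefficient by a sample average. Function names, the target, and the sample size are illustrative choices.

```python
import random
from itertools import combinations, product


def chi(S, x):
    """Parity chi_S(x) = prod_{i in S} x_i for x in {-1, 1}^n."""
    prod = 1
    for i in S:
        prod *= x[i]
    return prod


def low_degree_fit(samples, n, d):
    """Estimate hat{y}(S) = E[y * chi_S(x)] for every S with |S| <= d."""
    m = len(samples)
    return {
        S: sum(y * chi(S, x) for x, y in samples) / m
        for k in range(d + 1)
        for S in combinations(range(n), k)
    }


def low_degree_hypothesis(coeffs):
    """h(x) = I(p(x) >= 1/2) with p = sum_S hat{y}(S) chi_S."""
    return lambda x: int(sum(c * chi(S, x) for S, c in coeffs.items()) >= 0.5)


# Illustrative target: y = I(x_1 = 1 and x_2 = 1) = (1 + x_1)(1 + x_2) / 4,
# a degree-2 function, so d = 2 can represent it exactly.
rng = random.Random(0)
n, d = 5, 2
target = lambda x: int(x[0] == 1 and x[1] == 1)
samples = []
for _ in range(20000):
    x = tuple(rng.choice((-1, 1)) for _ in range(n))
    samples.append((x, target(x)))
coeffs = low_degree_fit(samples, n, d)
h = low_degree_hypothesis(coeffs)
print(round(coeffs[()], 3), round(coeffs[(0,)], 3), round(coeffs[(0, 1)], 3))
print(all(h(x) == target(x) for x in product((-1, 1), repeat=n)))
```

On this example the estimated coefficients of $1$, $x_1$, $x_2$, $x_1 x_2$ should all be close to $1/4$ and the rest close to 0, so $h$ should agree with the target on all 32 points.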

12/15 The two algorithms of 11/15 (L1 polynomial regression and the low-degree Fourier algorithm, each in time $n^{O(d)}$), with their guarantees:
Lemma: the L1 algorithm's error is at most $\mathrm{opt} + \min_{\deg(q) \le d} E[|f^*(x) - q(x)|] \le \mathrm{opt} + \sqrt{\min_{\deg(q) \le d} E[(f^*(x) - q(x))^2]}$, the last step by Cauchy-Schwarz.
Lemma: the Fourier algorithm's error is at most $8\,(\mathrm{opt} + \min_{\deg(q) \le d} E[(f^*(x) - q(x))^2])$.
Lemma of … & Sellie: the algorithm's error is at most $\tfrac12 - (\tfrac12 - \mathrm{opt})^2 + \epsilon$.
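
The lemma for the L1 algorithm rests on a one-line fact about the random threshold: for $\theta$ uniform on $[0,1]$ and $y \in \{0,1\}$, $P_\theta[I(p \ge \theta) \neq y] = |\mathrm{clip}(p) - y| \le |p - y|$, where clip truncates $p$ to $[0,1]$. Averaging over examples, the expected error of $h$ is at most the L1 loss the regression minimizes. A Python check of that identity, exact and by simulation (helper names and numbers are hypothetical):

```python
import random


def clip01(v):
    return min(1.0, max(0.0, v))


def expected_threshold_error(preds, labels):
    """Exact E_theta[error of I(p >= theta)] for theta ~ Uniform[0, 1].

    y = 1 is misclassified iff theta > p (probability 1 - clip(p));
    y = 0 is misclassified iff theta <= p (probability clip(p)).
    Both cases equal |clip(p) - y|.
    """
    return sum(abs(clip01(p) - y) for p, y in zip(preds, labels)) / len(preds)


def l1_loss(preds, labels):
    return sum(abs(p - y) for p, y in zip(preds, labels)) / len(preds)


def simulated_threshold_error(preds, labels, trials, rng):
    """Monte Carlo version: draw theta, round each p, count mistakes."""
    mistakes = 0
    for _ in range(trials):
        theta = rng.random()
        mistakes += sum(int(p >= theta) != y for p, y in zip(preds, labels))
    return mistakes / (trials * len(preds))


preds = [-0.25, 0.25, 0.75, 1.5]   # hypothetical values p(x_i) of a fitted polynomial
labels = [0, 1, 0, 1]
print(expected_threshold_error(preds, labels))  # prints: 0.375
print(l1_loss(preds, labels))                   # prints: 0.5625
print(simulated_threshold_error(preds, labels, 100000, random.Random(0)))
```

The simulated value should land near 0.375, below the L1 loss, as the identity predicts.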

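13/15 Approximation degree is dimension-free for halfspaces
Useful properties of log-concave distributions: every projection is log-concave, …
So if a univariate degree-$d$ polynomial satisfies $q(x) \approx I(x \ge 0)$ under the one-dimensional distribution of $w \cdot x$, then $q(w \cdot x) \approx I(w \cdot x \ge 0)$ with the same error, and $q(w \cdot x)$ is still a degree-$d$ polynomial in $x \in \mathbb{R}^n$.
[Figure: degree $d = 10$ approximations $q(x) \approx I(x \ge 0)$ and $q(w \cdot x) \approx I(w \cdot x \ge 0)$.]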
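
A small Python experiment for the Gaussian case only (an assumption: the slide's claim covers all log-concave distributions). For $x \sim N(0, I_n)$ and a unit vector $w$, the projection $w \cdot x$ is $N(0,1)$ in every dimension, so one fixed univariate $q$ has the same error $E[(q(w \cdot x) - I(w \cdot x \ge 0))^2]$ for $n = 2$ and $n = 20$. Here $q$ is the degree-3 Hermite truncation of $I(t \ge 0)$, whose exact one-dimensional error is $\tfrac14 - \tfrac{7}{12\pi} \approx 0.0643$; names and sample sizes are illustrative.

```python
import math
import random

PHI0 = 1 / math.sqrt(2 * math.pi)   # standard normal density at 0


def q3(t):
    """Degree-3 Hermite truncation of I(t >= 0) under N(0, 1):
    1/2 + phi(0) He_1(t) - (phi(0) / 6) He_3(t), with He_1 = t, He_3 = t^3 - 3t."""
    return 0.5 + PHI0 * t - PHI0 / 6 * (t ** 3 - 3 * t)


def projected_error(n, samples, rng):
    """Monte Carlo estimate of E[(q3(w . x) - I(w . x >= 0))^2] for x ~ N(0, I_n)."""
    w = [rng.gauss(0, 1) for _ in range(n)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]                      # a random unit vector
    total = 0.0
    for _ in range(samples):
        t = sum(wi * rng.gauss(0, 1) for wi in w)  # t = w . x, which is N(0, 1)
        total += (q3(t) - (1.0 if t >= 0 else 0.0)) ** 2
    return total / samples


EXACT = 0.25 - 7 / (12 * math.pi)                  # 1-D error of q3, about 0.0643
rng = random.Random(0)
for n in (2, 20):
    print(n, round(projected_error(n, 20000, rng), 4), round(EXACT, 4))
```

Both estimates should sit close to the same one-dimensional value, independent of $n$.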

14/15 Approximating $I(x \ge \theta)$ (1 dimension)
Bound $\min_{\deg(q) \le d} E[(q(x) - I(x \ge \theta))^2]$.
Continuous distributions: orthogonal polynomials. Normal: Hermite polynomials. Log-concave ($e^{-|x|}/2$ suffices): new polynomials. Uniform on the sphere: Gegenbauer polynomials. Uniform on the hypercube: Fourier. Orthogonality is with respect to $\langle f, g \rangle = E[f(x)g(x)]$.
[Speech bubble: "Hey, I've used Hermite (pronounced air-meet) polynomials many times."]
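
For the normal case the bound can be computed exactly, as a sketch using facts about the probabilists' Hermite polynomials $He_k$ (the function name is hypothetical): $E[He_j He_k] = k!\,[j = k]$, and $E[I(x \ge 0)\,He_k(x)] = \varphi(0)\,He_{k-1}(0)$ for $k \ge 1$, because $He_k \varphi = -(He_{k-1}\varphi)'$. The best degree-$d$ error is then $E[I^2] - \sum_{k \le d} E[I\,He_k]^2 / k!$, shown here for $\theta = 0$.

```python
import math


def hermite_step_error(d):
    """Exact min over deg(q) <= d of E[(q(x) - I(x >= 0))^2] for x ~ N(0, 1).

    error = E[I^2] - sum_{k <= d} E[I He_k]^2 / k!, where only odd k = 2m + 1
    contribute beyond k = 0, each (1 / (2 pi)) ((2m - 1)!!)^2 / (2m + 1)!.
    """
    err = 0.5 - 0.25   # E[I^2] = 1/2, minus the k = 0 term E[I]^2 = 1/4
    ratio = 1.0        # ((2m - 1)!!)^2 / (2m + 1)! at m = 0
    m = 0
    while 2 * m + 1 <= d:
        err -= ratio / (2 * math.pi)
        m += 1
        ratio *= (2 * m - 1) ** 2 / ((2 * m) * (2 * m + 1))
    return err


for d in (1, 3, 10, 100, 1000):
    e = hermite_step_error(d)
    print(d, round(e, 5), round(e * math.sqrt(d), 3))
```

The last column stays roughly constant (about 0.13 for larger $d$), so the squared error decays like $1/\sqrt{d}$. Since $\min E[|f^* - q|] \le \sqrt{\min E[(f^* - q)^2]}$ by Cauchy-Schwarz, pushing the L1 term below $\epsilon$ needs $\sqrt{c/\sqrt{d}} \le \epsilon$, i.e. $d = O(\epsilon^{-4})$, in line with the exponent of Theorem 1.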

15/15 Theorem 2: junctions (e.g., $x_1 \land x_{11} \land x_{17}$). For an arbitrary distribution over $\{0,1\}^n \times \{0,1\}$, the polynomial regression algorithm with $d = O(n^{1/2} \log(1/\epsilon))$ (time $2^{\tilde{O}(n^{1/2})}$) outputs $h$ with $P[h(x) \neq y] \le \mathrm{opt} + \epsilon$.
Follows from the previous lemmas + …
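
Theorem 2 needs a polynomial of degree about $\sqrt{n}\log(1/\epsilon)$ that approximates a junction. A standard way to get one is a Chebyshev-polynomial construction, assumed here because the slide's citation is lost. It works on $t = x_{i_1} + \dots + x_{i_k}$, which is degree 1 in $x$: $q(t) = 1 - T_d\big(\tfrac{k - t}{k - 1}\big) / T_d\big(\tfrac{k}{k-1}\big)$ has $q(0) = 0$ and $|q(t) - 1| \le 1 / T_d\big(\tfrac{k}{k-1}\big)$ for $t = 1, \dots, k$, because $|T_d| \le 1$ on $[0, 1]$. Since $T_d(1 + \delta) = \cosh(d \operatorname{arccosh}(1 + \delta))$ and $\operatorname{arccosh}(1 + \delta) \ge \sqrt{\delta}$ for $\delta \le 1$, degree $d = \lceil \sqrt{k} \ln(2/\epsilon) \rceil$ gives uniform error at most $\epsilon$, which also bounds the $E[|f^*(x) - q(x)|]$ term of the L1 lemma. A Python sketch for OR (names hypothetical; AND follows by negating inputs and output):

```python
import math


def chebyshev_t(d, s):
    """T_d(s) via the recurrence T_0 = 1, T_1 = s, T_{j+1} = 2 s T_j - T_{j-1}."""
    t_prev, t_cur = 1.0, s
    if d == 0:
        return t_prev
    for _ in range(d - 1):
        t_prev, t_cur = t_cur, 2 * s * t_cur - t_prev
    return t_cur


def or_approx(k, d, t):
    """Degree-d approximation of the OR of k >= 2 literals, given t = number true.

    s = (k - t) / (k - 1) maps t = 1..k into [0, 1], where |T_d| <= 1,
    and t = 0 to k / (k - 1) > 1, where T_d is large.
    """
    s = (k - t) / (k - 1)
    return 1.0 - chebyshev_t(d, s) / chebyshev_t(d, k / (k - 1))


def max_error(k, d):
    """Worst case over t = 0..k of |q(t) - OR|, i.e. uniform error on {0,1}^k."""
    return max(abs(or_approx(k, d, t) - (1.0 if t > 0 else 0.0)) for t in range(k + 1))


def degree_for(k, eps):
    """d = ceil(sqrt(k) * ln(2 / eps)), which keeps max_error(k, d) <= eps."""
    return math.ceil(math.sqrt(k) * math.log(2 / eps))


k, eps = 100, 0.01
d = degree_for(k, eps)
print(d, max_error(k, d))  # degree 53, well below k = 100, with error below 0.01
```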

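16/15 How far can we get in $\mathrm{poly}(n, 1/\epsilon)$ time? Assume the distribution draws $x$ uniformly from $S^{n-1} = \{ x \in \mathbb{R}^n : |x| = 1 \}$.
Perceptron algorithm: error $\le O(\sqrt{n})\,\mathrm{opt} + \epsilon$.
We show: the simple averaging algorithm of … achieves error $\le O(\log(1/\mathrm{opt}))\,\mathrm{opt} + \epsilon$.
Assume $(x, y) = (1 - \eta)\,(x, f^*(x)) + \eta\,(\text{arbitrary } (x, y))$, with $x$ uniform on $S^{n-1}$. We get error $\le O(n^{1/4} \log(n/\eta))\,\eta + \epsilon$, using Rankin's second bound.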
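
A Python sketch of an averaging learner for homogeneous halfspaces ($\theta = 0$) with $x$ uniform on $S^{n-1}$: average $(2y - 1)\,x$ over the sample and predict $I(w \cdot x \ge 0)$. This is the averaging idea in its simplest textbook form, assumed here rather than copied from the cited work; the data generator, the 5% random label noise, and all names are illustrative.

```python
import math
import random


def random_unit(n, rng):
    """Uniform point on S^{n-1}: a normalized standard Gaussian vector."""
    v = [rng.gauss(0, 1) for _ in range(n)]
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def averaging_learner(samples):
    """w = mean of (2y - 1) x over the sample; return h(x) = I(w . x >= 0)."""
    n = len(samples[0][0])
    w = [0.0] * n
    for x, y in samples:
        sign = 2 * y - 1                 # label {0, 1} -> {-1, +1}
        for i in range(n):
            w[i] += sign * x[i] / len(samples)
    return lambda x: int(dot(w, x) >= 0)


rng = random.Random(0)
n = 10
w_star = random_unit(n, rng)
f_star = lambda x: int(dot(w_star, x) >= 0)
train = []
for _ in range(5000):
    x = random_unit(n, rng)
    y = f_star(x)
    if rng.random() < 0.05:              # illustrative: flip 5% of labels at random
        y = 1 - y
    train.append((x, y))
h = averaging_learner(train)
test = [random_unit(n, rng) for _ in range(5000)]
print(sum(h(x) != f_star(x) for x in test) / len(test))  # disagreement with f*
```

By symmetry of the sphere, the average points in the direction of $w^*$ in expectation, so the printed disagreement should be small; the agnostic guarantees on the slide are about how this degrades when the corrupted labels are chosen adversarially.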

17/15 Halfspace conclusions & future work
L1 polynomial regression: a natural extension of Fourier learning. It works for non-uniform/arbitrary distributions, tolerates agnostic noise, and works on both continuous and discrete problems.
Future work: handle all distributions (not just log-concave and uniform on $\{-1,1\}^n$); achieve $\mathrm{opt} + \epsilon$ with a $\mathrm{poly}(n, 1/\epsilon)$ algorithm (we have $\mathrm{poly}(n)$ for fixed $\epsilon$, and trivially $\mathrm{poly}(1/\epsilon)$ for fixed $n$); other interesting classes of functions.
