Published by Miguel Doherty. Modified over 5 years ago.

1
1/15 Agnostically learning halfspaces FOCS 2005

2
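2/15 Agnostic learning [… L. Sellie]
Set $X$; $F$ a class of functions $f: X \to \{0,1\}$.
Arbitrary distribution over $(x,y) \in X \times \{0,1\}$.
$f^* = \arg\min_{f \in F} P[f(x) \neq y]$, $\mathrm{opt} = P[f^*(x) \neq y]$.
Efficient agnostic learner: from $\mathrm{poly}(1/\epsilon)$ samples, outputs w.h.p. $h: X \to \{0,1\}$ with $P[h(x) \neq y] \le \mathrm{opt} + \epsilon$.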

3
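3/15 Agnostic learning [… L. Sellie]
Set $X_n \subseteq \mathbb{R}^n$; $F_n$ a class of functions $f: X_n \to \{0,1\}$.
Arbitrary distribution over $(x,y) \in X_n \times \{0,1\}$.
$f^* = \arg\min_{f \in F_n} P[f(x) \neq y]$, $\mathrm{opt} = P[f^*(x) \neq y]$.
Efficient agnostic learner: from $\mathrm{poly}(n, 1/\epsilon)$ samples, outputs w.h.p. $h: X_n \to \{0,1\}$ with $P[h(x) \neq y] \le \mathrm{opt} + \epsilon$.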

4
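4/15 Agnostic learning [… L. Sellie]
Set $X_n \subseteq \mathbb{R}^n$; $F_n$ a class of functions $f: X_n \to \{0,1\}$.
Arbitrary distribution over $(x,y) \in X_n \times \{0,1\}$.
$f^* = \arg\min_{f \in F_n} P[f(x) \neq y]$, $\mathrm{opt} = P[f^*(x) \neq y]$.
Efficient agnostic learner: from $\mathrm{poly}(n, 1/\epsilon)$ samples, outputs w.h.p. $h: X_n \to \{0,1\}$ with $P[h(x) \neq y] \le \mathrm{opt} + \epsilon$.
In the PAC model, $P[f^*(x) \neq y] = 0$.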
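As a rough illustration of these definitions, the sketch below computes empirical errors over a small finite class and picks the best function in it. The threshold class, the six-point sample, and the helper names `error` and `best_in_class` are hypothetical; the slides work with distributions and infinite classes, not a fixed sample.

```python
from typing import Callable, List, Tuple

Example = Tuple[float, int]          # (x, y) with y in {0, 1}
Hypothesis = Callable[[float], int]

def error(h: Hypothesis, sample: List[Example]) -> float:
    """Empirical estimate of P[h(x) != y]."""
    return sum(1 for x, y in sample if h(x) != y) / len(sample)

def best_in_class(F: List[Hypothesis], sample: List[Example]) -> Tuple[Hypothesis, float]:
    """Return (f*, opt): the empirical error minimizer in F and its error."""
    f_star = min(F, key=lambda f: error(f, sample))
    return f_star, error(f_star, sample)

# Hypothetical finite class: 1-D thresholds I(x >= t).
thresholds = [0.0, 0.25, 0.5, 0.75, 1.0]
F = [(lambda x, t=t: int(x >= t)) for t in thresholds]

# Labels mostly follow I(x >= 0.5); the label at x = 0.9 is flipped,
# so no function in F is perfect and opt > 0 (the agnostic setting).
sample = [(0.1, 0), (0.3, 0), (0.4, 0), (0.6, 1), (0.7, 1), (0.9, 0)]

f_star, opt = best_in_class(F, sample)
```

An agnostic learner is judged against `opt`, not against zero error: any `h` with `error(h, sample) <= opt + eps` would be acceptable on this sample.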

5
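5/15 Agnostic learning of halfspaces
$F_n = \{ f(x) = I(w \cdot x \ge \theta) \mid w \in \mathbb{R}^n, \theta \in \mathbb{R} \}$.
$f^* = \arg\min_{f \in F_n} P[f(x) \neq y]$, $\mathrm{opt} = P[f^*(x) \neq y]$.
Goal: $h: \mathbb{R}^n \to \{0,1\}$ with $P[h(x) \neq y] \le \mathrm{opt} + \epsilon$.
[Figure: best halfspace $f^*$ and hypothesis $h$]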

6
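6/15 Agnostic learning of halfspaces
$F_n = \{ f(x) = I(w \cdot x \ge \theta) \mid w \in \mathbb{R}^n, \theta \in \mathbb{R} \}$, $\mathrm{opt} = P[f^*(x) \neq y]$.
Goal: $h: \mathbb{R}^n \to \{0,1\}$ with $P[h(x) \neq y] \le \mathrm{opt} + \epsilon$.
Special case: junctions, e.g., $f(x) = x_1 \vee x_3 = I(x_1 + x_3 \ge 1)$.
  Efficient agnostic-learn junctions $\Rightarrow$ PAC-learn DNF
  NP-hard to properly agnostic learn
[Figure: best halfspace $f^*$ and hypothesis $h$]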
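The identity $x_1 \vee x_3 = I(x_1 + x_3 \ge 1)$ can be checked exhaustively on the Boolean cube. The sketch below does so for $n = 4$ and also checks the analogous conjunction identity with threshold 2; the helper names and the choice of $n$ are illustrative assumptions.

```python
from itertools import product

def disjunction(x, idx):
    """OR of the selected coordinates of a 0/1 vector."""
    return int(any(x[i] for i in idx))

def conjunction(x, idx):
    """AND of the selected coordinates of a 0/1 vector."""
    return int(all(x[i] for i in idx))

def halfspace(x, w, theta):
    """I(w . x >= theta)."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) >= theta)

n = 4
idx = (0, 2)                                  # x_1 and x_3, zero-based
w = [1 if i in idx else 0 for i in range(n)]  # indicator weight vector

cube = list(product((0, 1), repeat=n))
or_agrees = all(disjunction(x, idx) == halfspace(x, w, 1) for x in cube)
and_agrees = all(conjunction(x, idx) == halfspace(x, w, len(idx)) for x in cube)
```

Both flags come out true, so junctions over $\{0,1\}^n$ are a special case of halfspaces.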

7
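7/15 Agnostic learning of halfspaces
$F_n = \{ f(x) = I(w \cdot x \ge \theta) \mid w \in \mathbb{R}^n, \theta \in \mathbb{R} \}$, $\mathrm{opt} = P[f^*(x) \neq y]$.
Goal: $h: \mathbb{R}^n \to \{0,1\}$ with $P[h(x) \neq y] \le \mathrm{opt} + \epsilon$.
PAC learning halfspaces: solved by LP.
[Figure: halfspace $f^*$]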
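One standard way to write such a linear program, assuming a finite sample $(x_1,y_1),\dots,(x_m,y_m)$ that some halfspace labels without error (the noise-free PAC setting). The unit margin on the negative side is a normalization assumed here, not something stated on the slide; it is without loss of generality for a finite sample because $(w, \theta)$ can be rescaled.

```latex
\text{find } w \in \mathbb{R}^n,\ \theta \in \mathbb{R}
\quad \text{such that} \quad
\begin{cases}
w \cdot x_i - \theta \ge 0  & \text{if } y_i = 1,\\[2pt]
w \cdot x_i - \theta \le -1 & \text{if } y_i = 0,
\end{cases}
\qquad i = 1,\dots,m.
```

Any feasible $(w, \theta)$ gives $h(x) = I(w \cdot x \ge \theta)$ with zero error on the sample. With even a single mislabeled example the system can become infeasible, which is why this approach does not carry over directly to the agnostic setting.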

8
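8/15 Agnostic learning of halfspaces
$F_n = \{ f(x) = I(w \cdot x \ge \theta) \mid w \in \mathbb{R}^n, \theta \in \mathbb{R} \}$, $\mathrm{opt} = P[f^*(x) \neq y]$.
Goal: $h: \mathbb{R}^n \to \{0,1\}$ with $P[h(x) \neq y] \le \mathrm{opt} + \epsilon$.
PAC learning halfspaces with indep./random noise: solved by […]
[Figure: hypothesis $h$ and halfspace $f^*$]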

9
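9/15 Agnostic learning of halfspaces
$F_n = \{ f(x) = I(w \cdot x \ge \theta) \mid w \in \mathbb{R}^n, \theta \in \mathbb{R} \}$.
Goal: $h: \mathbb{R}^n \to \{0,1\}$ with $P[h(x) \neq y] \le \mathrm{opt} + \epsilon$, where $\mathrm{opt} = \min_{f \in F_n} P[f(x) \neq y]$.
Equivalently, $f^*$ = truth with adversarial noise.
[Figure: halfspace $f^*$ and hypothesis $h$]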

10
10/15 Theorem 1: Our alg. outputs $h: \mathbb{R}^n \to \{0,1\}$ with $P[h(x) \neq y] \le \mathrm{opt} + \epsilon$ (w.h.p.), in time $n^{O(\epsilon^{-4})}$, i.e. $\mathrm{poly}(n)$ ($\forall$ const $\epsilon$), as long as the distribution draws $x \in \mathbb{R}^n$ from:
  Log-concave distribution, e.g.: uniform over convex set, exponential $e^{-|x|}$, normal
  Uniform over $\{-1,1\}^n$ or $S^{n-1} = \{ x \in \mathbb{R}^n \mid |x| = 1 \}$
  …

11
11/15
1. $L_1$ polynomial regression algorithm
  Given: $d > 0$, $(x_1,y_1),\dots,(x_m,y_m) \in \mathbb{R}^n \times \{0,1\}$.
  Find multivariate deg-$d$ $p(x)$ to minimize the empirical $L_1$ error, i.e. $\approx \min_{\deg(p) \le d} E[|p(x) - y|]$.
  Pick $\theta \in [0,1]$ at random, output $h(x) = I(p(x) \ge \theta)$.
  Time $n^{O(d)}$.
2. Low-degree Fourier algorithm of […]
  Chooses $p$ to $\approx \min_{\deg(p) \le d} E[(p(x) - y)^2]$ (requires $x$ uniform from $\{-1,1\}^n$).
  Output $h(x) = I(p(x) \ge \tfrac{1}{2})$.
  Time $n^{O(d)}$.
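The $L_1$ fit in item 1 can be posed as a linear program. The formulation below is the textbook one for least-absolute-deviations regression, written over monomials $x^\alpha$ of total degree at most $d$. The coefficient names $c_\alpha$ and slack variables $t_i$ are notation introduced here, and the slide does not say which solver or formulation the authors use.

```latex
\min_{c,\,t}\ \sum_{i=1}^{m} t_i
\quad \text{s.t.} \quad
-t_i \;\le\; \sum_{|\alpha| \le d} c_\alpha\, x_i^{\alpha} - y_i \;\le\; t_i,
\qquad i = 1,\dots,m,
\qquad \text{where } p(x) = \sum_{|\alpha| \le d} c_\alpha\, x^{\alpha}.
```

At an optimum each $t_i$ equals $|p(x_i) - y_i|$, so the objective is the empirical $L_1$ error. There are $\binom{n+d}{d} = n^{O(d)}$ monomials, which is consistent with the stated $n^{O(d)}$ running time.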

12
12/15
1. $L_1$ polynomial regression algorithm
  Given: $d > 0$, $(x_1,y_1),\dots,(x_m,y_m) \in \mathbb{R}^n \times \{0,1\}$.
  Find multivariate deg-$d$ $p(x)$ to minimize the empirical $L_1$ error, i.e. $\approx \min_{\deg(p) \le d} E[|p(x) - y|]$.
  Pick $\theta \in [0,1]$ at random, output $h(x) = I(p(x) \ge \theta)$.
  Time $n^{O(d)}$.
  lemma: alg's error $\le \mathrm{opt} + \min_{\deg(q) \le d} E[|f^*(x) - q(x)|]$ (and $E[|f^* - q|] \le \sqrt{E[(f^* - q)^2]}$).
2. Low-degree Fourier algorithm of […]
  Chooses $p$ to $\approx \min_{\deg(p) \le d} E[(p(x) - y)^2]$ (requires $x$ uniform from $\{-1,1\}^n$).
  Output $h(x) = I(p(x) \ge \tfrac{1}{2})$.
  Time $n^{O(d)}$.
  lemma: alg's error $\le 8\,(\mathrm{opt} + \min_{\deg(q) \le d} E[(f^*(x) - q(x))^2])$.
  lemma of […] & Sellie: alg's error $\le \tfrac{1}{2} - (\tfrac{1}{2} - \mathrm{opt})^2 + \epsilon$.
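A minimal sketch of the low-degree Fourier algorithm in item 2, assuming $x$ uniform over $\{-1,1\}^n$. Under that assumption the least-squares polynomial has coefficients $E[y \chi_S(x)]$ on the parities $\chi_S(x) = \prod_{i \in S} x_i$, and the code estimates them by sample averages. The function names and the tiny majority example are illustrative, not taken from the talk.

```python
from itertools import combinations, product
from math import prod

def chi(S, x):
    """Parity chi_S(x) = product of x_i for i in S, with x in {-1,1}^n."""
    return prod(x[i] for i in S)

def low_degree_fit(sample, n, d):
    """Estimate c_S = average of y * chi_S(x) for every subset S with |S| <= d."""
    m = len(sample)
    subsets = [S for k in range(d + 1) for S in combinations(range(n), k)]
    return {S: sum(y * chi(S, x) for x, y in sample) / m for S in subsets}

def predict(coeffs, x, threshold=0.5):
    """h(x) = I(p(x) >= 1/2) with p(x) = sum_S c_S * chi_S(x)."""
    p = sum(c * chi(S, x) for S, c in coeffs.items())
    return int(p >= threshold)

# Toy data: every point of {-1,1}^3, labeled by the halfspace I(x1 + x2 + x3 >= 0).
n = 3
sample = [(x, int(sum(x) >= 0)) for x in product((-1, 1), repeat=n)]

coeffs = low_degree_fit(sample, n, d=1)
mistakes = sum(predict(coeffs, x) != y for x, y in sample)
```

Here the degree-1 fit is $p(x) = \tfrac{1}{2} + \tfrac{1}{4}(x_1 + x_2 + x_3)$, which already classifies all eight points correctly after thresholding at $\tfrac{1}{2}$. The loop over all subsets of size at most $d$ is the source of the $n^{O(d)}$ cost.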

13
13/15 Approx degree is dimension-free for halfspaces
Useful properties of logconcave dists: projection is logconcave, …
[Figure: $q(x) \approx I(x \ge 0)$, degree $d = 10$]
[Figure: $q(w \cdot x) \approx I(w \cdot x \ge 0)$, degree $d = 10$]
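A short worked step showing why the degree does not depend on $n$, assuming $|w| = 1$ and writing $z = w \cdot x$ for the one-dimensional projection (the symbol $z$ is introduced here for illustration):

```latex
E_{x}\bigl[\,\lvert q(w \cdot x) - I(w \cdot x \ge \theta) \rvert\,\bigr]
\;=\;
E_{z}\bigl[\,\lvert q(z) - I(z \ge \theta) \rvert\,\bigr],
\qquad z = w \cdot x .
```

If $q$ is a univariate polynomial of degree $d$, then $x \mapsto q(w \cdot x)$ is an $n$-variate polynomial of the same degree $d$. Since the projection $z$ of a logconcave $x$ is again logconcave, the approximation error is governed entirely by a one-dimensional problem, whatever the value of $n$.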

14
14/15 Approximating $I(x \ge \theta)$ (1 dimension)
Bound $\min_{\deg(q) \le d} E[(q(x) - I(x \ge \theta))^2]$.
Continuous distributions: orthogonal polynomials
  Normal: Hermite polynomials
  Logconcave ($e^{-|x|}/2$ suffices): new polynomials
  Uniform on sphere: Gegenbauer polynomials
  Uniform on hypercube: Fourier
Inner product: $\langle f, g \rangle = E[f(x)\,g(x)]$.
"Hey, I've used Hermite (pronounced air-meet) polynomials many times."
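To make the Normal case concrete, the sketch below builds Hermite polynomials from their three-term recurrence and checks orthogonality under $\langle f, g \rangle = E[f(x)g(x)]$ with $x \sim N(0,1)$, using exact Gaussian moments. It assumes the probabilists' normalization $He_n$; the slide does not say which normalization the talk uses.

```python
def hermite(n):
    """Coefficients (lowest degree first) of the probabilists' Hermite polynomial He_n."""
    prev, cur = [1], [0, 1]              # He_0 = 1, He_1 = x
    if n == 0:
        return prev
    for k in range(1, n):
        # Recurrence: He_{k+1}(x) = x * He_k(x) - k * He_{k-1}(x)
        nxt = [0] + cur                  # multiply He_k by x
        for i, c in enumerate(prev):
            nxt[i] -= k * c
        prev, cur = cur, nxt
    return cur

def gaussian_moment(k):
    """E[x^k] for x ~ N(0,1): 0 for odd k, (k-1)!! for even k."""
    if k % 2:
        return 0
    result = 1
    for j in range(1, k, 2):
        result *= j
    return result

def inner(p, q):
    """<p, q> = E[p(x) q(x)] under N(0,1), for coefficient lists p and q."""
    return sum(a * b * gaussian_moment(i + j)
               for i, a in enumerate(p)
               for j, b in enumerate(q))

he2, he3 = hermite(2), hermite(3)        # x^2 - 1 and x^3 - 3x
```

Distinct Hermite polynomials have inner product zero, and `inner(hermite(n), hermite(n))` equals $n!$, so dividing $He_n$ by $\sqrt{n!}$ gives an orthonormal basis in which the best degree-$d$ squared-error approximation is a truncated expansion.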

15
15/15 Theorem 2: junctions (e.g., $x_1 \wedge x_{11} \wedge x_{17}$)
For an arbitrary distribution over $\{0,1\}^n \times \{0,1\}$, the polynomial regression algorithm with $d = O(n^{1/2} \log(1/\epsilon))$ (time $\epsilon^{-O^*(n^{1/2})}$) outputs $h$ with $P[h(x) \neq y] \le \mathrm{opt} + \epsilon$.
Follows from previous lemmas + […]

16
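16/15 How far can we get in $\mathrm{poly}(n, 1/\epsilon)$ time?
Assume the distribution draws $x$ uniform from $S^{n-1} = \{ x \in \mathbb{R}^n \mid |x| = 1 \}$.
  Perceptron algorithm: error $\le O(\sqrt{n})\,\mathrm{opt} + \epsilon$
  We show: simple averaging algorithm of […] achieves error $\le O(\log(1/\mathrm{opt}))\,\mathrm{opt} + \epsilon$
Assume $(x,y) = (1-\eta)\,(x, f^*(x)) + \eta\,(\text{arbitrary } (x,y))$:
  We get: error $\le O(n^{1/4} \log(n/\eta))\,\eta + \epsilon$, using Rankin's second bound
[Figure: $x$ uniform $\in S^{n-1}$]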
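A minimal sketch of an averaging algorithm in the usual sense: the weight vector is the average of the examples, each signed by its label, and the hypothesis is the origin-centered halfspace it defines. Whether this matches the exact variant the slide refers to is an assumption, and the six points on the unit circle are made-up data.

```python
def averaging_halfspace(sample):
    """Return w = (1/m) * sum_i (2*y_i - 1) * x_i for examples (x_i, y_i), y_i in {0,1}."""
    m = len(sample)
    n = len(sample[0][0])
    w = [0.0] * n
    for x, y in sample:
        sign = 2 * y - 1                 # map label {0,1} to {-1,+1}
        for i in range(n):
            w[i] += sign * x[i] / m
    return w

def predict(w, x):
    """h(x) = I(w . x >= 0)."""
    return int(sum(wi * xi for wi, xi in zip(w, x)) >= 0)

# Points on the unit circle S^1, labeled by the halfspace I(x_1 >= 0).
points = [(1.0, 0.0), (0.6, 0.8), (0.6, -0.8),
          (-1.0, 0.0), (-0.6, 0.8), (-0.6, -0.8)]
sample = [(x, int(x[0] >= 0)) for x in points]

w = averaging_halfspace(sample)
correct = all(predict(w, x) == y for x, y in sample)
```

On this symmetric toy sample the average points along the first coordinate axis, so the learned halfspace agrees with the labels. The algorithm makes one pass over the data and solves no optimization problem, which is what makes it a natural candidate for $\mathrm{poly}(n, 1/\epsilon)$ time.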

17
17/15 Half-space conclusions & future work
$L_1$ poly reg: natural extension of Fourier learning
  Works for non-uniform/arbitrary distributions
  Tolerates agnostic noise
  Works on both continuous and discrete problems
Future work
  Work on all distributions (not just logconcave/uniform $\{-1,1\}^n$)
  $\mathrm{opt} + \epsilon$ using a $\mathrm{poly}(n, 1/\epsilon)$ algorithm (we have $\mathrm{poly}(n)$ for fixed $\epsilon$, and trivial: $\mathrm{poly}(1/\epsilon)$ for fixed $n$)
  Other interesting classes of functions
