Presentation transcript:

1/15 Agnostically learning halfspaces FOCS 2005

2/15 Agnostic learning [Kearns, Schapire & Sellie]
Set X; F a class of functions f: X → {0,1}.
Arbitrary distribution over (x,y) ∈ X × {0,1}; f* = argmin_{f ∈ F} P[f(x) ≠ y], opt = P[f*(x) ≠ y].
Efficient agnostic learner: w.h.p., from poly(1/ε) samples, outputs h: X → {0,1} with P[h(x) ≠ y] ≤ opt + ε.

3/15 Agnostic learning [Kearns, Schapire & Sellie]
Set X_n ⊆ R^n; F_n a class of functions f: X_n → {0,1}.
Arbitrary distribution over (x,y) ∈ X_n × {0,1}; f* = argmin_{f ∈ F_n} P[f(x) ≠ y], opt = P[f*(x) ≠ y].
Efficient agnostic learner: w.h.p., from poly(n, 1/ε) samples, outputs h: X_n → {0,1} with P[h(x) ≠ y] ≤ opt + ε.

4/15 Agnostic learning [Kearns, Schapire & Sellie]
Set X_n ⊆ R^n; F_n a class of functions f: X_n → {0,1}.
Arbitrary distribution over (x,y) ∈ X_n × {0,1}; f* = argmin_{f ∈ F_n} P[f(x) ≠ y], opt = P[f*(x) ≠ y].
Efficient agnostic learner: w.h.p., from poly(n, 1/ε) samples, outputs h: X_n → {0,1} with P[h(x) ≠ y] ≤ opt + ε.
(In the PAC model, P[f*(x) ≠ y] = 0.)
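
In standard notation, the guarantee on slides 2-4 reads as follows (this is just a restatement in LaTeX, writing D for the arbitrary distribution and ε for the accuracy parameter):

    \mathrm{opt} = \min_{f \in F_n} \Pr_{(x,y) \sim D}[f(x) \neq y],
    \qquad
    \Pr_{(x,y) \sim D}[h(x) \neq y] \le \mathrm{opt} + \epsilon .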

5/15 Agnostic learning of halfspaces
F_n = { f(x) = I(w·x ≥ θ) | w ∈ R^n, θ ∈ R }.
f* = argmin_{f ∈ F_n} P[f(x) ≠ y], opt = P[f*(x) ≠ y].
Goal: output h: R^n → {0,1} with P[h(x) ≠ y] ≤ opt + ε.

6/15 Agnostic learning of halfspaces
F_n = { f(x) = I(w·x ≥ θ) | w ∈ R^n, θ ∈ R }.
Goal: output h: R^n → {0,1} with P[h(x) ≠ y] ≤ opt + ε.
Special case: junctions, e.g., f(x) = x_1 ∨ x_3 = I(x_1 + x_3 ≥ 1).
Efficiently agnostically learning junctions ⇒ PAC-learning DNF.
NP-hard to agnostically learn properly.

7/15 Agnostic learning of halfspaces
F_n = { f(x) = I(w·x ≥ θ) | w ∈ R^n, θ ∈ R }.
Goal: output h: R^n → {0,1} with P[h(x) ≠ y] ≤ opt + ε.
PAC learning of halfspaces (the opt = 0 case) is solved by linear programming.

8/15 Agnostic learning of halfspaces
F_n = { f(x) = I(w·x ≥ θ) | w ∈ R^n, θ ∈ R }.
Goal: output h: R^n → {0,1} with P[h(x) ≠ y] ≤ opt + ε.
PAC learning of halfspaces with independent/random classification noise is also solved [Blum-Frieze-Kannan-Vempala].

9/15 Agnostic learning of halfspaces
F_n = { f(x) = I(w·x ≥ θ) | w ∈ R^n, θ ∈ R }.
opt = min_{f ∈ F_n} P[f(x) ≠ y]; goal: output h: R^n → {0,1} with P[h(x) ≠ y] ≤ opt + ε.
Equivalently: f* is the truth, and the labels are corrupted by adversarial noise.

10/15 Theorem 1: Our algorithm outputs h: R^n → {0,1} with P[h(x) ≠ y] ≤ opt + ε (w.h.p.), in time poly(n) for any constant ε, as long as the distribution draws x ∈ R^n from:
- a log-concave distribution, e.g. uniform over a convex set, exponential e^{-|x|}, normal;
- the uniform distribution over {-1,1}^n or over S^{n-1} = { x ∈ R^n : |x| = 1 };
- ...
The running time is n^{O(ε^{-4})}.

11/15 Two algorithms
1. L1 polynomial regression algorithm.
Given: d > 0 and samples (x_1,y_1), ..., (x_m,y_m) ∈ R^n × {0,1}.
Find a multivariate polynomial p(x) of degree ≤ d to (approximately) minimize E[|p(x) - y|] over the sample.
Pick θ ∈ [0,1] at random and output h(x) = I(p(x) ≥ θ).
Time n^{O(d)}.
2. Low-degree Fourier algorithm of [Linial-Mansour-Nisan].
Find a polynomial p(x) of degree ≤ d to (approximately) minimize E[(p(x) - y)^2] (requires x uniform over {-1,1}^n).
Output h(x) = I(p(x) ≥ ½).
Time n^{O(d)}.
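
To make algorithm 1 concrete, here is a minimal Python sketch of L1 polynomial regression as described above (my own illustration, not code from the paper): expand the examples into all monomials of degree ≤ d, solve the L1 fit as a linear program, and threshold the fitted polynomial at a random θ. The helper names and the use of numpy/scipy are assumptions of this sketch.

    import itertools
    import numpy as np
    from scipy.optimize import linprog

    def poly_features(X, d):
        """All monomials of degree <= d in the columns of X (shape m x n).
        The number of features, and hence the runtime, is n^{O(d)}."""
        m, n = X.shape
        cols = [np.ones(m)]
        for deg in range(1, d + 1):
            for idx in itertools.combinations_with_replacement(range(n), deg):
                cols.append(np.prod(X[:, idx], axis=1))
        return np.column_stack(cols)

    def l1_poly_regression(X, y, d, seed=None):
        """Fit a degree-d polynomial p minimizing sum_i |p(x_i) - y_i| via a
        linear program, then threshold at a uniformly random theta in [0,1]."""
        y = np.asarray(y, dtype=float)
        Phi = poly_features(np.asarray(X, dtype=float), d)   # m x k design matrix
        m, k = Phi.shape
        # Variables z = [coefficients c (k), slacks t (m)]; minimize sum(t)
        # subject to t_i >= p(x_i) - y_i and t_i >= y_i - p(x_i).
        obj = np.concatenate([np.zeros(k), np.ones(m)])
        A_ub = np.block([[ Phi, -np.eye(m)],
                         [-Phi, -np.eye(m)]])
        b_ub = np.concatenate([y, -y])
        bounds = [(None, None)] * k + [(0, None)] * m
        res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
        coeffs = res.x[:k]
        theta = np.random.default_rng(seed).uniform(0.0, 1.0)
        return lambda Xnew: (poly_features(np.asarray(Xnew, dtype=float), d) @ coeffs >= theta).astype(int)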

12/15 Two algorithms: guarantees
1. L1 polynomial regression (random threshold θ ∈ [0,1], time n^{O(d)}).
Lemma: the algorithm's error ≤ opt + min_{deg(q) ≤ d} E[|f*(x) - q(x)|].
2. Low-degree Fourier algorithm of [Linial-Mansour-Nisan] (threshold ½; requires x uniform over {-1,1}^n; time n^{O(d)}).
Lemma: the algorithm's error ≤ 8·(opt + min_{deg(q) ≤ d} E[(f*(x) - q(x))^2]).
Lemma of [Kearns, Schapire & Sellie]: the algorithm's error ≤ ½ - (½ - opt)^2 + ...
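
For completeness, a sketch of algorithm 2's fitting step in the same style (again my own illustration; it reuses poly_features from the snippet above, and does an empirical least-squares fit rather than direct estimation of Fourier coefficients, which amounts to the same low-degree projection in the large-sample limit):

    import numpy as np

    def low_degree_regression(X, y, d):
        """L2 analogue of the sketch above: least-squares fit of a degree-d
        polynomial to the labels, thresholded at 1/2."""
        Phi = poly_features(np.asarray(X, dtype=float), d)
        coeffs, *_ = np.linalg.lstsq(Phi, np.asarray(y, dtype=float), rcond=None)
        return lambda Xnew: (poly_features(np.asarray(Xnew, dtype=float), d) @ coeffs >= 0.5).astype(int)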

13/15 Approximate degree is dimension-free for halfspaces
Useful properties of log-concave distributions: any projection is log-concave, ...
If a univariate degree-10 polynomial q satisfies q(x) ≈ I(x ≥ 0) in one dimension, then q(w·x) ≈ I(w·x ≥ 0) in n dimensions, still with degree d = 10.
[Figures on the slide: plots of the degree d = 10 approximations q(x) ≈ I(x ≥ 0) and q(w·x) ≈ I(w·x ≥ 0).]
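
A small numerical illustration of this dimension-freeness (my own check, assuming numpy; the degree-10 polynomial is fit by least squares on Gaussian samples rather than computed exactly):

    import numpy as np

    rng = np.random.default_rng(0)

    # 1-D: fit a degree-10 polynomial q(t) ~ I(t >= 0) under the standard normal.
    t = rng.standard_normal(100_000)
    q = np.polynomial.polynomial.Polynomial.fit(t, (t >= 0).astype(float), deg=10)
    print("1-D squared error:", np.mean((q(t) - (t >= 0)) ** 2))

    # n dimensions: the same q, composed with the projection w.x, approximates the
    # halfspace I(w.x >= 0), because the projection of the (log-concave) Gaussian
    # onto the direction w is again a standard normal -- the degree does not grow with n.
    n = 50
    w = rng.standard_normal(n)
    w /= np.linalg.norm(w)
    X = rng.standard_normal((100_000, n))
    proj = X @ w
    print("n-D squared error:", np.mean((q(proj) - (proj >= 0)) ** 2))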

14/15 Approximating I(x ≥ θ) in one dimension
Bound min_{deg(q) ≤ d} E[(q(x) - I(x ≥ θ))^2], using the inner product ⟨f, g⟩ = E[f(x) g(x)].
Continuous distributions: orthogonal polynomials.
Normal: Hermite polynomials.
Log-concave (handling the density e^{-|x|}/2 suffices): new polynomials.
Uniform on the sphere: Gegenbauer polynomials.
Uniform on the hypercube: Fourier basis.
"Hey, I've used Hermite (pronounced 'air-meet') polynomials many times."
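
To make the Hermite case concrete, here is a small sketch (my own, not from the talk) of the degree-d L2-optimal approximation of I(x ≥ θ) under the standard normal: truncate the Hermite expansion, using the identities ∫_θ^∞ He_k(x) φ(x) dx = He_{k-1}(θ) φ(θ) for k ≥ 1 and E[He_k(X)^2] = k! (probabilists' Hermite polynomials).

    import numpy as np
    from numpy.polynomial import hermite_e as H   # probabilists' Hermite polynomials He_k
    from scipy.stats import norm

    def step_hermite_coeffs(theta, d):
        """Coefficients c_0..c_d of the degree-d L2-optimal approximation of
        I(x >= theta) under N(0,1), in the He_k basis."""
        c = np.zeros(d + 1)
        c[0] = 1.0 - norm.cdf(theta)
        fact = 1.0
        for k in range(1, d + 1):
            fact *= k
            he_km1 = H.hermeval(theta, [0.0] * (k - 1) + [1.0])   # He_{k-1}(theta)
            c[k] = he_km1 * norm.pdf(theta) / fact
        return c

    # Monte Carlo check of the squared error E[(q(x) - I(x >= theta))^2]
    theta, d = 0.0, 10
    c = step_hermite_coeffs(theta, d)
    x = np.random.default_rng(0).standard_normal(200_000)
    q = H.hermeval(x, c)
    print(np.mean((q - (x >= theta)) ** 2))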

15/15 Theorem 2: junctions (e.g., x_1 ∧ x_11 ∧ x_17)
For an arbitrary distribution over {0,1}^n × {0,1}, the polynomial regression algorithm with d = O(n^{1/2} log(1/ε)) (time n^{O(d)}) outputs h with P[h(x) ≠ y] ≤ opt + ε.
Follows from the previous lemmas plus the fact that a junction can be approximated to within ε, pointwise on {0,1}^n, by a polynomial of degree O(n^{1/2} log(1/ε)).
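
Spelling out the runtime (just plugging d = O(√n · log(1/ε)) into the n^{O(d)} bound from slide 11):

    n^{O(d)} = n^{O(\sqrt{n}\,\log(1/\epsilon))} = 2^{\tilde{O}(\sqrt{n})} \quad \text{for constant } \epsilon .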

16/15 How far can we get in poly(n, 1/ε) time?
Assume the distribution draws x uniformly from S^{n-1} = { x ∈ R^n : |x| = 1 }.
Perceptron algorithm: error ≤ O(√n)·opt + ε.
We show: a simple averaging algorithm achieves error ≤ O(log(1/opt))·opt + ε.
Now assume (x,y) = (1 - η)·(x, f*(x)) + η·(arbitrary (x,y)), with x uniform on S^{n-1}, i.e., adversarial noise at rate η.
We get: error ≤ O(n^{1/4} log(n/η))·η + ε, using Rankin's second bound.
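
The slides do not spell out the averaging algorithm; a common form of it (an assumption of this sketch, not necessarily the exact variant analyzed in the paper) simply averages the label-signed examples and uses the result as the normal vector of a homogeneous halfspace:

    import numpy as np

    def averaging_halfspace(X, y):
        """Average the examples signed by their labels (y in {0,1} mapped to +/-1)
        and use the resulting direction as the normal of a halfspace through 0."""
        X = np.asarray(X, dtype=float)
        signs = 2 * np.asarray(y, dtype=float) - 1      # {0,1} -> {-1,+1}
        w = (signs[:, None] * X).mean(axis=0)           # w = empirical E[ y * x ]
        return lambda Xnew: (np.asarray(Xnew, dtype=float) @ w >= 0).astype(int)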

17/15 Halfspace conclusions & future work
L1 polynomial regression is a natural extension of Fourier learning:
works for non-uniform/arbitrary distributions,
tolerates agnostic noise,
works on both continuous and discrete problems.
Future work:
handle all distributions (not just log-concave / uniform over {-1,1}^n);
achieve opt + ε with a poly(n, 1/ε)-time algorithm (we have poly(n) for fixed ε, and, trivially, poly(1/ε) for fixed n);
other interesting classes of functions.