Yi Wu (CMU), joint work with Vitaly Feldman (IBM), Venkat Guruswami (CMU), and Prasad Raghavendra (MSR)

The Spam Problem. [Table of example emails with boolean features 10 Million, Lottery, Cheap, Pharmacy, Junk and a label Is Spam / Not Spam.] A conjunction classifier: spam iff “10 Million = yes” AND “Lottery = yes” AND “Pharmacy = yes”.

The Spam Problem. [Same table of example emails.] A decision-list classifier: if “10 Million = No” then Not Spam; else if “Lottery = No” then Not Spam; else if “Pharmacy = No” then Not Spam; else Spam.

The Spam Problem. [Same table of example emails.] A halfspace classifier: spam iff “10 Million = YES” + 2·“Lottery = YES” + “Pharmacy = YES” ≥ 4.
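To make the three hypothesis classes concrete, here is a minimal sketch (my own illustration; the feature names and weights simply mirror the toy spam example above) of how a conjunction, a decision list, and a halfspace each classify one email given as a dict of boolean features.

    # Illustrative only: feature names and weights mirror the toy spam example above.
    email = {"10 Million": True, "Lottery": True, "Cheap": False, "Pharmacy": True, "Junk": False}

    def conjunction(x):
        # Spam iff all three listed features are present.
        return x["10 Million"] and x["Lottery"] and x["Pharmacy"]

    def decision_list(x):
        # Test conditions in order; the first rule that fires decides the label.
        if not x["10 Million"]:
            return False
        if not x["Lottery"]:
            return False
        if not x["Pharmacy"]:
            return False
        return True

    def halfspace(x):
        # Spam iff a weighted sum of the features reaches a threshold.
        return 1 * x["10 Million"] + 2 * x["Lottery"] + 1 * x["Pharmacy"] >= 4

    print(conjunction(email), decision_list(email), halfspace(email))  # True True True

On this toy example all three rules agree; more generally every conjunction can be written as a decision list and every decision list as a halfspace, which is the containment pictured on the next slide.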

[Diagram: Conjunctions ⊆ Decision Lists ⊆ Halfspaces — conjunctions and decision lists are special cases of halfspaces.]

Unknown distribution D over R^n; examples labeled by an unknown function f. After receiving examples, the algorithm does its computation and outputs a hypothesis h. The accuracy of the hypothesis is Pr_{x~D}[h(x) = f(x)].

Learning conjunctions is easy! Unknown distribution D over {0,1}^n; examples labeled by an unknown conjunction. Since a conjunction is a special case of a halfspace, we can use polynomial-time linear programming to find a halfspace hypothesis consistent with all the examples. Well-known theory (VC dimension): for any D, a random sample of polynomially many examples yields an ε-accurate hypothesis w.h.p.
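As an illustration of this linear-programming step, here is a minimal sketch (my own code, assuming scipy is available; variable names are mine) that searches for a weight vector w and threshold t with w·x ≥ t on positive examples and w·x ≤ t − 1 on negative ones.

    # Minimal sketch (not the talk's code): find a halfspace w.x >= t consistent
    # with perfectly labeled boolean examples via linear programming.
    import numpy as np
    from scipy.optimize import linprog

    def consistent_halfspace(X, y):
        """X: (m, n) 0/1 array of examples; y: length-m array of 0/1 labels."""
        m, n = X.shape
        A_ub, b_ub = [], []
        for x, label in zip(X, y):
            if label == 1:          # want  w.x - t >= 0,  i.e.  -w.x + t <= 0
                A_ub.append(np.append(-x, 1.0))
                b_ub.append(0.0)
            else:                   # want  w.x - t <= -1  (separation with margin 1)
                A_ub.append(np.append(x, -1.0))
                b_ub.append(-1.0)
        res = linprog(c=np.zeros(n + 1), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                      bounds=[(None, None)] * (n + 1))
        return (res.x[:n], res.x[n]) if res.success else None

If the examples really are labeled by a conjunction, a feasible (w, t) always exists (e.g. weight 1 on each relevant variable and t equal to the number of relevant variables); with noisy labels the LP may simply have no feasible point, which is exactly the brittleness discussed next.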

Learning conjunctions from perfectly labeled examples is easy!… but not very realistic. Real-world data probably doesn’t come with a guarantee that examples are labeled perfectly according to a conjunction. Linear programming is brittle: noisy examples can easily result in there being no consistent hypothesis. This motivates the study of noisy variants of PAC learning for conjunctions.

Unknown distribution D over {0,1}^n; examples labeled by an unknown conjunction f. Among the random examples given to the learner:
– a 1-ε fraction of the examples are perfectly labeled, i.e. x ~ D, y = f(x);
– an ε fraction of the examples are mislabeled.
Goal: find a hypothesis with good accuracy (close to 1-ε? Or just better than 50%?).
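A minimal sketch of what a training set in this model can look like (the target conjunction, distribution, and corruption rule here are my own illustrative choices, not taken from the talk):

    # Sketch: draw m examples from the uniform distribution on {0,1}^n,
    # label them by a target conjunction, then mislabel an eps fraction
    # (here simply the first eps*m rows).
    import numpy as np

    def noisy_sample(m=1000, n=20, relevant=(0, 1, 2), eps=0.01, seed=0):
        rng = np.random.default_rng(seed)
        X = rng.integers(0, 2, size=(m, n))
        y = X[:, list(relevant)].all(axis=1).astype(int)  # target: x_0 AND x_1 AND x_2
        y[: int(eps * m)] ^= 1                            # mislabel an eps fraction
        return X, y

In this model the mislabeled ε fraction need not be random noise; it can be chosen adversarially, and even a single bad example can make the consistency LP above infeasible.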

– No noise: [Val84, Lit88, Hau88]: PAC learnable.
– Random noise: [Kea98]: PAC learnable under the random classification noise model.

For any ε, δ > 0, it is NP-hard to tell whether
◦ some conjunction is consistent with a 1-ε fraction of the data, or
◦ no conjunction is (½+δ)-consistent with the data. [FGKP06]
In other words, it is NP-hard to find a 51%-accurate conjunction even knowing that some conjunction is consistent with 99% of the data.

– Proper: given that f is in a function class C (e.g. conjunctions), the learner outputs a function in class C.
– Non-proper (improper): given that f is in class C (e.g. conjunctions), the learner may output a function from a different class D (e.g. halfspaces).

– We might still be able to learn conjunctions by outputting a hypothesis from a larger class of functions (say, via linear programming?).
◦ E.g. [Lit88] uses the Winnow algorithm, which outputs a halfspace.
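As a concrete example of such improper learning, here is a minimal sketch of Winnow in its classical form for monotone disjunctions (a conjunction can be handled by complementing the attributes via De Morgan); the parameter choices are illustrative rather than taken from [Lit88] verbatim.

    # Sketch of Winnow: the hypothesis it maintains is the halfspace
    # "predict 1 iff w.x >= n", even though the target is a disjunction --
    # i.e. improper learning of a simple class by halfspaces.
    import numpy as np

    def winnow(X, y):
        m, n = X.shape
        w = np.ones(n)
        for x, label in zip(X, y):
            pred = 1 if w @ x >= n else 0
            if pred == 0 and label == 1:      # missed a positive: double the active weights
                w[x == 1] *= 2
            elif pred == 1 and label == 0:    # false alarm: zero out the active weights
                w[x == 1] = 0
        return w                              # final hypothesis: predict 1 iff w.x >= n

The point relevant to this talk is only that Winnow's output hypothesis is a halfspace, not a member of the original target class.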

For any ε, δ > 0, it is NP-hard to tell whether
◦ some halfspace is consistent with a 1-ε fraction of the data, or
◦ no halfspace is (½+δ)-consistent with the data. [FGKP, GR]
In other words, it is NP-hard to find a 51%-accurate halfspace even knowing that some halfspace is consistent with 99% of the data.

Ideally, one would like to show: for any ε, δ > 0, it is NP-hard to tell whether
◦ some conjunction is consistent with a 1-ε fraction of the data, or
◦ no function in any hypothesis class is (½+δ)-consistent with the data.

– [ABX08]: showing NP-hardness of improper learning with an unrestricted hypothesis class via black-box reductions is itself hard:
◦ it would otherwise break some long-standing cryptographic assumptions (it would give a transformation from any average-case hard problem in NP into a one-way function).

Our result: for any ε, δ > 0, it is NP-hard to tell whether
◦ some conjunction is consistent with a 1-ε fraction of the data, or
◦ no halfspace is (½+δ)-consistent with the data.
In other words, it is NP-hard to find a 51%-accurate halfspace even knowing that some conjunction is consistent with 99% of the data.

In practice, halfspaces are at the heart of many learning algorithms:
– Perceptron
– Winnow
– SVM
– Logistic Regression
– Linear Discriminant Analysis
We cannot agnostically learn conjunctions using any of the above-mentioned algorithms!

[Diagram: Conjunctions ⊆ Decision Lists ⊆ Halfspaces.] Weakly agnostically learning conjunctions / decision lists / halfspaces by halfspaces is hard!

Two extreme kinds of halfspaces:
◦ “Dictator”: halfspaces depending on very few variables, e.g. f(x) = sgn(x_1).
◦ “Majority”: no variable has too much weight, e.g. f(x) = sgn(x_1 + x_2 + x_3 + … + x_n).

The tester chooses x ∈ {0,1}^n and b ∈ {0,1} from some distribution, queries f(x), and accepts if f(x) = b.
Completeness ≥ c ↔ every dictator (monomial) f(x) = x_i is accepted with probability ≥ c.
Soundness ≤ s ↔ every “majority-like” function is accepted with probability ≤ s.
With such a test, we can show it is NP-hard to tell whether i) some monomial satisfies a c fraction of the data, or ii) no halfspace satisfies more than an s fraction of the data.

1) Generate z by setting each z_i independently to a random bit.
2) Generate y by resetting each z_i to 0 with high probability (the analysis below uses Pr[y_i = 0] = 0.99).
3) Generate a random bit b and set x_i = y_i + b/(2n).
4) Output (x, b); accept if f(x) = sgn(b).

[Illustration: z is a random 0/1 vector; y is z with most coordinates reset to 0; x = y shifted by b/(2n) in every coordinate, for a random bit b.]
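A minimal simulation of this distribution (my own code; the reset probability, dimension, and trial count are chosen only for illustration), estimating how often a dictator versus a majority halfspace agrees with b:

    # Sketch: estimate the acceptance probability of the test for a dictator
    # f(x) = sgn(x_1) versus a majority f(x) = sgn(x_1 + ... + x_n).
    import numpy as np

    def sample(n, rng, p_reset=0.99):
        z = rng.integers(0, 2, size=n).astype(float)     # step 1: random bits
        y = np.where(rng.random(n) < p_reset, 0.0, z)    # step 2: reset most coords to 0
        b = rng.choice([-1.0, 1.0])                      # step 3: random sign b ...
        x = y + b / (2 * n)                              #         ... and tiny shift toward b
        return x, b

    def acceptance_prob(f, n=1000, trials=20000, seed=0):
        rng = np.random.default_rng(seed)
        hits = 0
        for _ in range(trials):
            x, b = sample(n, rng)
            if np.sign(f(x)) == b:                       # step 4: accept iff sgn(f(x)) = b
                hits += 1
        return hits / trials

    dictator = lambda x: x[0]                            # f(x) = x_1
    majority = lambda x: x.sum()                         # f(x) = x_1 + ... + x_n
    print(acceptance_prob(dictator), acceptance_prob(majority))  # high vs. close to 1/2

The dictator almost always sees x_1 = b/(2n), whose sign is b, while the majority's sum is dominated by the surviving y_i's, so the tiny b/(2n) shift barely influences its sign — matching the completeness/soundness analysis on the next slide.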

– If f(x) = x_i (a dictator): then Pr[f(x) = b] ≥ Pr[y_i = 0] = 0.99, since whenever y_i = 0 we have x_i = b/(2n), whose sign is b.
– If f(x) = sgn(x_1 + x_2 + … + x_n) (majority): then Pr[f(x) = b] = Pr[sgn(N(0, 0.1) + b/(2n)) = b] < 0.51, since the sum behaves like a Gaussian and the b/(2n) shift barely moves it.

– We prove that even weakly agnostically learning conjunctions by halfspaces is NP-hard.
– To obtain an efficient halfspace-based learning algorithm for conjunctions / decision lists / halfspaces, we therefore need to model either the distribution of the examples or the noise.

Open problem: prove that for any ε, δ > 0, given a set of training examples such that some conjunction is consistent with a 1-ε fraction of the data, it is NP-hard to find a degree-d polynomial threshold function that is (½+δ)-consistent with the data. Why low-degree PTFs? Because such hypotheses can agnostically learn conjunctions/halfspaces under the uniform distribution.
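To illustrate what a low-degree PTF hypothesis looks like, here is a minimal sketch (my own illustration in the spirit of polynomial-regression approaches to agnostic learning; it is not the algorithm whose hardness is conjectured above) that fits a degree-2 PTF by least-squares over all monomials of degree ≤ 2 and thresholds at 0.

    # Sketch only: a degree-2 polynomial threshold function hypothesis,
    # fit by regressing the +/-1 labels on all monomials of degree <= 2.
    import numpy as np
    from itertools import combinations

    def degree2_features(X):
        cols = [np.ones(len(X))] + [X[:, i] for i in range(X.shape[1])]
        cols += [X[:, i] * X[:, j] for i, j in combinations(range(X.shape[1]), 2)]
        return np.column_stack(cols)

    def fit_degree2_ptf(X, y):
        Phi = degree2_features(X)
        coeffs, *_ = np.linalg.lstsq(Phi, 2 * y - 1, rcond=None)
        return lambda Xnew: (degree2_features(Xnew) @ coeffs >= 0).astype(int)

A degree-1 PTF is just a halfspace; allowing higher degree is what gives distribution-specific agnostic learners (e.g. under the uniform distribution), which is why the conjectured hardness statement above targets degree-d PTFs.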