Yi Wu (CMU). Joint work with Vitaly Feldman (IBM), Venkat Guruswami (CMU), and Prasad Raghavendra (MSR).


1 Yi Wu (CMU). Joint work with Vitaly Feldman (IBM), Venkat Guruswami (CMU), and Prasad Raghavendra (MSR).

2

3 The Spam Problem
[Table: example emails with attributes “10 Million”, “Lottery”, “Cheap”, “Pharmacy”, “Junk” and the label “Is Spam” (SPAM / NOT SPAM); the rows illustrate the rule below.]
Rule (a conjunction): SPAM ⇔ “10 Million = YES” and “Lottery = YES” and “Pharmacy = YES”

4 The Spam Problem
[Same table of example emails.]
Rule (a decision list):
If “10 Million = NO” then NOT SPAM
Else if “Lottery = NO” then NOT SPAM
Else if “Pharmacy = NO” then NOT SPAM
Else SPAM

5 The Spam Problem
[Same table of example emails.]
Rule (a halfspace): SPAM ⇔ “10 Million = YES” + 2·“Lottery = YES” + “Pharmacy = YES” ≥ 4 (each quoted condition counts as 1 when true, 0 when false)
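Slides 3–5 express the same rule as a conjunction, a decision list, and a halfspace. A minimal sketch (with a 0/1 attribute encoding of my choosing; the “Cheap” and “Junk” attributes do not appear in the rule and are omitted) checking that the three representations agree on every input:

```python
# The conjunction, decision list, and halfspace from slides 3-5 agree on
# every input (the 0/1 attribute encoding is illustrative, not from the slides).
from itertools import product

def conjunction(million, lottery, pharmacy):
    return bool(million and lottery and pharmacy)

def decision_list(million, lottery, pharmacy):
    if not million:  return False   # If "10 Million = NO" then NOT SPAM
    if not lottery:  return False   # Else if "Lottery = NO" then NOT SPAM
    if not pharmacy: return False   # Else if "Pharmacy = NO" then NOT SPAM
    return True                     # Else SPAM

def halfspace(million, lottery, pharmacy):
    # Weights (1, 2, 1) and threshold 4: met only when all three are 1.
    return million + 2 * lottery + pharmacy >= 4

for m, l, p in product([0, 1], repeat=3):
    assert conjunction(m, l, p) == decision_list(m, l, p) == halfspace(m, l, p)
```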

6 Function classes: Conjunctions ⊆ Decision Lists ⊆ Halfspaces

7 An unknown distribution D over R^n; examples are labeled by an unknown function f. After receiving examples, the algorithm does its computation and outputs a hypothesis h. The accuracy of the hypothesis is Pr_{x∼D}[h(x) = f(x)].
[Figure: labeled + and − points with the target f and the hypothesis h drawn as separators.]

8 Learning conjunctions is easy! An unknown distribution D over {0,1}^n; examples are labeled by an unknown conjunction. Since a conjunction is a special case of a halfspace, we can use polynomial-time linear programming to find a halfspace hypothesis consistent with all the examples (a sketch follows below). Well-known theory (VC dimension) ⇒ for any D, a random sample of polynomially many examples yields an ε-accurate hypothesis w.h.p.
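A minimal sketch of the linear-programming step, assuming labels in {+1, −1} and a margin-1 formulation of consistency (both are my choices, not from the slides):

```python
# Feasibility LP for a halfspace consistent with noiseless examples:
# find (w, theta) with w.x - theta >= 1 on positives and <= -1 on negatives.
import numpy as np
from scipy.optimize import linprog

def consistent_halfspace(X, y):
    """X: (m, n) array of 0/1 examples; y: (m,) labels in {+1, -1}.
    Returns (w, theta) or None if no consistent halfspace exists."""
    m, n = X.shape
    # Variables: [w_1 .. w_n, theta].  For each example i we require
    #   y_i * (w.x_i - theta) >= 1,  i.e.  -y_i*w.x_i + y_i*theta <= -1.
    A_ub = np.hstack([-y[:, None] * X, y[:, None]])
    b_ub = -np.ones(m)
    res = linprog(c=np.zeros(n + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (n + 1))  # pure feasibility: zero objective
    return (res.x[:n], res.x[n]) if res.success else None
```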

9 Learning perfectly labeled conjunctions is easy! …but not very realistic. Real-world data probably doesn’t come with a guarantee that examples are labeled perfectly according to a conjunction. Linear programming is brittle: noisy examples can easily result in no consistent hypothesis. This motivates the study of noisy variants of PAC learning for conjunctions.
[Figure: the labeled points again, now with a few mislabeled + and − examples.]

10 An unknown distribution D over {0,1}^n; examples are labeled by an unknown conjunction f. Among the random examples given to the learner:
– a 1−ε fraction are perfectly labeled, i.e., x ∼ D, y = f(x);
– an ε fraction are mislabeled.
Goal: find a hypothesis with good accuracy (close to 1−ε? Or just better than 50%?). A toy oracle for this model is sketched below.
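A toy example oracle for this model. Caveat: the slide allows the ε fraction to be mislabeled arbitrarily (adversarially); the independent random flips below are just the simplest stand-in, and the target conjunction is a hypothetical one of my choosing.

```python
# Toy oracle: draw x and return (x, y) with y = f(x) except w.p. eps.
import random

def noisy_example(f, n, eps):
    x = [random.randint(0, 1) for _ in range(n)]   # x ~ D (uniform stands in for an arbitrary D)
    y = f(x)
    if random.random() < eps:                      # the mislabeled eps fraction
        y = 1 - y
    return x, y

# Hypothetical target conjunction: x_1 AND x_3 (0-indexed below).
f = lambda x: int(x[0] and x[2])
x, y = noisy_example(f, n=10, eps=0.01)
```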

11
 No noise [Val84, Lit88, Hau88]: PAC learnable.
 Random noise [Kea98]: PAC learnable under the random classification noise model.

12  For any ε, δ > 0, it is NP-hard to tell whether:
◦ some conjunction is consistent with a 1−ε fraction of the data, or
◦ no conjunction is (½ + δ)-consistent with the data. [FGKP06]
In other words, it is NP-hard to find a 51%-accurate conjunction even knowing that some conjunction is consistent with 99% of the data.

13  Proper: given that f is in function class C (e.g., conjunctions), the learner outputs a function in class C.
 Non-proper: given that f is in class C (e.g., conjunctions), the learner may output a function in a different class H (e.g., halfspaces).

14  We might still be able to learn conjunctions by outputting a hypothesis from a larger class of functions (say, halfspaces found by linear programming?).
◦ E.g., [Lit88] uses the Winnow algorithm, which outputs a halfspace; a sketch follows below.
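A minimal sketch of Winnow ([Lit88]; this is the "Winnow2" promotion/demotion variant for a monotone target over {0,1}^n, and reducing conjunctions to that form is a standard detail omitted here):

```python
# Littlestone's Winnow (Winnow2 variant), sketched for a monotone target.
def winnow(examples, n, theta=None):
    """examples: iterable of (x, y) with x a 0/1 list and y in {0, 1}.
    Returns the final weights; the hypothesis is sum(w_i * x_i) >= theta."""
    theta = theta if theta is not None else float(n)
    w = [1.0] * n
    for x, y in examples:
        predicted = int(sum(wi * xi for wi, xi in zip(w, x)) >= theta)
        if predicted != y:
            # Promote active weights on a false negative, demote on a false positive.
            factor = 2.0 if y == 1 else 0.5
            w = [wi * factor if xi else wi for wi, xi in zip(w, x)]
    return w
```

The point this slide needs: the hypothesis Winnow maintains, sum(w_i·x_i) ≥ θ, is itself a halfspace.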

15  For any ε, δ > 0, it is NP-hard to tell whether:
◦ some halfspace is consistent with a 1−ε fraction of the data, or
◦ no halfspace is (½ + δ)-consistent with the data. [FGKP, GR]
In other words, it is NP-hard to find a 51%-accurate halfspace even knowing that some halfspace is consistent with 99% of the data.

16 Ideally, one would show: for any ε, δ > 0, it is NP-hard to tell whether:
◦ some conjunction is consistent with a 1−ε fraction of the data, or
◦ no function in any hypothesis class is (½ + δ)-consistent with the data.

17  [ABX08]: showing NP-hardness of improper learning with an unrestricted hypothesis class via black-box reductions is itself hard:
◦ it would otherwise break some long-standing cryptographic assumptions (it would give a transformation from any average-case-hard problem in NP to a one-way function).

18

19  For any ε, δ > 0, it is NP-hard to tell whether:
◦ some conjunction is consistent with a 1−ε fraction of the data, or
◦ no halfspace is (½ + δ)-consistent with the data.
In other words, it is NP-hard to find a 51%-accurate halfspace even knowing that some conjunction is consistent with 99% of the data.

20 In practice, halfspaces are at the heart of many learning algorithms:
 Perceptron
 Winnow
 SVM
 Logistic Regression
 Linear Discriminant Analysis
Consequence for computational learning theory: we cannot agnostically learn conjunctions using any of the above algorithms!

21 Halfspaces, Conjunctions, Decision Lists: weakly agnostically learning conjunctions/decision lists/halfspaces by halfspaces is hard!

22

23

24 ◦ “Dictator”: a halfspace depending on very few variables, e.g., f(x) = sgn(x_1).
◦ “Majority”: a halfspace in which no variable has too much weight, e.g., f(x) = sgn(x_1 + x_2 + x_3 + … + x_n).

25 The tester chooses x ∈ {0,1}^n and b ∈ {0,1} from some distribution, queries f(x), and accepts if f(x) = b.
Completeness ≥ c ⇔ every dictator (monomial) f(x) = x_i is accepted with probability ≥ c.
Soundness ≤ s ⇔ every “majority-like” function is accepted with probability ≤ s.
With such a test, we can show it is NP-hard to tell whether (i) some monomial satisfies a c fraction of the data, or (ii) no halfspace satisfies more than an s fraction of the data.

26 1) Generate z by setting each z_i independently to a random bit.
2) Generate y by resetting each z_i to 0 with probability 0.99 (independently).
3) Generate a random bit b and set x_i = y_i + b/2^n.
4) Output (x, b); accept if f(x) = sgn(b).
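A sketch of this sampler, reading the "random bits" as ±1 so that sgn is meaningful (that reading is my assumption; also, b/2^n underflows in floating point, so this toy is only meaningful for moderate n):

```python
# One draw (x, b) from the test distribution above, as I read the slide.
import random

def sample(n, reset_prob=0.99):
    z = [random.choice([-1, 1]) for _ in range(n)]                # 1) independent random bits
    y = [0 if random.random() < reset_prob else zi for zi in z]   # 2) reset each z_i to 0 w.p. 0.99
    b = random.choice([-1, 1])                                    # 3) random bit b ...
    x = [yi + b / 2 ** n for yi in y]                             #    ... and x_i = y_i + b/2^n
    return x, b                                                   # 4) tester accepts if f(x) = sgn(b)
```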

27 [Illustration of one draw: a random z, the mostly-zero y obtained by the resets, and x = y + b/2^n for the random bit b.]

28  If f(x) = x_i (a dictator):
◦ then Pr[f(x) = b] ≥ Pr[y_i = 0] = 0.99, since whenever y_i = 0 the coordinate x_i = b/2^n carries the sign of b.
 If f(x) = sgn(x_1 + x_2 + … + x_n) (majority-like):
◦ then Pr[f(x) = b] = Pr[sgn(N(0, 0.1) + b/2^n) = b] < 0.51, since the sum behaves like a Gaussian and the tiny b/2^n shift barely correlates it with b.
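A quick Monte Carlo check of the two probabilities on this slide (illustrative only; N(0, 0.1) is the slide's Gaussian stand-in for the weighted sum of the non-reset coordinates seen by a majority-like halfspace):

```python
# Monte Carlo check of the dictator vs. majority acceptance probabilities.
import random

def sign(t):
    return 1 if t > 0 else -1

n, trials = 30, 100_000
hits_dictator = hits_majority = 0
for _ in range(trials):
    b = random.choice([-1, 1])
    # Dictator f(x) = sgn(x_i): y_i is 0 w.p. 0.99, and then x_i = b/2^n
    # has the sign of b; otherwise y_i is a random +/-1 that masks b.
    y_i = 0 if random.random() < 0.99 else random.choice([-1, 1])
    hits_dictator += sign(y_i + b / 2 ** n) == b
    # Majority-like f: the sum is modeled as N(0, 0.1); the b/2^n nudge
    # almost never changes its sign.
    g = random.gauss(0.0, 0.1 ** 0.5)
    hits_majority += sign(g + b / 2 ** n) == b

print(hits_dictator / trials)   # ~0.995 (completeness >= 0.99)
print(hits_majority / trials)   # ~0.50  (soundness < 0.51)
```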

29  We prove that even weakly agnostically learning conjunctions by halfspaces is NP-hard.
 To obtain an efficient halfspace-based learning algorithm for conjunctions/decision lists/halfspaces, we need to model either the distribution of the examples or the noise.

30  Prove: for any ε, δ > 0, given a set of training examples such that some conjunction is consistent with a 1−ε fraction of the data, it is NP-hard to find a degree-d polynomial threshold function that is (½ + δ)-consistent with the data.
Why low-degree PTFs? Because such hypotheses can agnostically learn conjunctions/halfspaces under the uniform distribution.
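For context, a sketch of the degree-d polynomial regression behind that last claim (KKMS-style; ordinary least squares below is my simplification of the L1 regression used in the actual analysis, and all names are mine):

```python
# Degree-<=d polynomial regression, thresholded into a PTF hypothesis.
import numpy as np
from itertools import combinations

def ptf_regression(X, y, d):
    """X: (m, n) with entries in {-1, +1}; y: (m,) labels in {-1, +1}.
    Fit a degree-<=d polynomial p to the labels; output the PTF sgn(p)."""
    n = X.shape[1]
    monomials = [S for k in range(d + 1) for S in combinations(range(n), k)]
    # Feature column for S: the product of the coordinates in S
    # (the empty product for S = () is the constant-1 feature).
    feats = np.column_stack([np.prod(X[:, list(S)], axis=1) for S in monomials])
    coef, *_ = np.linalg.lstsq(feats, y, rcond=None)
    return lambda x: 1 if sum(c * np.prod([x[i] for i in S])
                              for c, S in zip(coef, monomials)) > 0 else -1
```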

