
1 Agnostic Active Learning. Maria-Florina Balcan*, Alina Beygelzimer**, John Langford***. *: Carnegie Mellon University, **: IBM T.J. Watson Research Center, ***: Yahoo! Research. Journal of Computer and System Sciences, 2009. Presented by Yongjin Kwon, 2010-10-08.

2 Copyright © 2010 by CEBT Introduction  Nowadays a plentiful amount of data is cheaply available and is used to find useful patterns or concepts.  Traditional machine learning has concentrated on problems that require only labeled data. However, labeling is expensive! (e.g., speech recognition, document classification)  How can we reduce the number of labeled examples required? Exploit the abundance of unlabeled data!

3 Introduction (Cont’d)  Semi-supervised Learning: use a set of unlabeled data under additional assumptions.  Active Learning: ask for the labels of “informative” data. (Figure: supervised learning vs. semi-supervised and active learning; more informative vs. less informative points)

4 Active Learning  If the machine actively tries to learn from “informative” data, it will perform better with less training! (Figure: (a) passive learning is one-way teaching, where everything should be prepared in advance; (b) active learning queries “informative” points only and gets their labels as answers.)

5 Active Learning (Cont’d)  What are “informative” points? If the learner is already sure about the label of a point, then that point is less informative. (Figure: less informative vs. more informative points)

6 Typical Active Learning Approach (Binary Classification)  Start by querying the labels of a few randomly chosen points.  Repeat the following process: Determine the decision boundary on the current set of labeled points. Choose the unlabeled point closest to the current decision boundary (i.e., the most “uncertain” or “informative” point). Query that point and obtain its label.
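The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's method: the toy data, the least-squares linear learner, and the margin |w·x| as the uncertainty score are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy task: 2-D points whose label is the sign of the
# first coordinate, so the true decision boundary is the vertical axis.
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = np.sign(X[:, 0])

def fit_linear(X, y):
    """Least-squares linear classifier: predict sign(X @ w)."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

# Start by querying the labels of a few randomly chosen points.
labeled = list(rng.choice(len(X), size=5, replace=False))

# Repeat: refit on current labels, pick the unlabeled point closest to
# the current decision boundary (smallest |w . x|), query its label.
for _ in range(20):
    w = fit_linear(X[labeled], y[labeled])
    uncertainty = np.abs(X @ w)      # small margin = more "informative"
    uncertainty[labeled] = np.inf    # never re-query a labeled point
    labeled.append(int(np.argmin(uncertainty)))

w = fit_linear(X[labeled], y[labeled])
accuracy = float(np.mean(np.sign(X @ w) == y))  # error on all 200 points
```

After 25 label queries (5 random + 20 margin-based), the learned boundary should closely track the true one on this separable toy problem.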

7 Improvement in Label Complexity  1-D binary classification in the noise-free setting: find the optimal threshold (classifier) by binary search. Label complexity is the number of label requests needed to achieve a given accuracy. To achieve misclassification error ≤ ε: Supervised learning: O(1/ε) labeled examples are needed. Active learning: O(log 1/ε) labeled examples are needed! An exponential improvement in label complexity!! How general is this phenomenon?
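The binary-search argument for the 1-D threshold class can be made concrete. In this sketch (the point grid and hidden threshold are hypothetical), each label query halves the interval that can still contain the threshold, so locating it among n unlabeled points takes about log₂ n queries instead of n:

```python
def learn_threshold(points, oracle):
    """Noise-free 1-D threshold learning by binary search.

    Returns (i, queries) such that oracle labels points[:i] as -1
    and points[i:] as +1, counting the label queries made.
    """
    points = sorted(points)
    lo, hi = 0, len(points)          # threshold index lies in [lo, hi]
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1                 # one oracle call per halving step
        if oracle(points[mid]) == +1:
            hi = mid                 # threshold is at or before mid
        else:
            lo = mid + 1             # threshold is after mid
    return lo, queries

points = [i / 1000 for i in range(1000)]          # 1000 unlabeled points
oracle = lambda x: +1 if x >= 0.637 else -1       # hidden target threshold
idx, queries = learn_threshold(points, oracle)
```

Here `queries` is at most ⌈log₂ 1000⌉ = 10, whereas passive learning would need on the order of all 1000 labels to pin the threshold down to the same resolution.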

8 CAL Active Learning  A general-purpose learning strategy (in the noise-free setting): ask for the label of a point only if it falls in the region of uncertainty, i.e., where classifiers consistent with the labels seen so far disagree. (Figure: binary classification with a rectangular classifier; for a point in the region of uncertainty, “Ask its label!”)
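For the 1-D threshold class, CAL has a particularly simple form, sketched below (stream size and hidden threshold are illustrative): the version space of consistent thresholds is an interval, the region of uncertainty is its interior, and a stream point is queried only when it lands inside that interval.

```python
import random

def cal_thresholds(stream, oracle):
    """CAL on the noise-free 1-D threshold class.

    Maintains the version space as an interval [lo, hi] of thresholds
    consistent with all labels seen so far; queries a point only when
    consistent classifiers disagree on it (the region of uncertainty).
    """
    lo, hi = 0.0, 1.0
    queries = 0
    for x in stream:
        if lo < x < hi:              # region of uncertainty: disagreement
            queries += 1
            if oracle(x) == +1:
                hi = x               # threshold must be at most x
            else:
                lo = x               # threshold must be above x
        # points outside (lo, hi) get a consensus label: no query needed
    return (lo + hi) / 2, queries

random.seed(1)
stream = [random.random() for _ in range(2000)]
oracle = lambda x: +1 if x >= 0.42 else -1       # hidden target threshold
estimate, queries = cal_thresholds(stream, oracle)
```

On a random stream the region of uncertainty shrinks quickly, so only a small fraction of the 2000 points are ever queried, yet the final interval pins the threshold down to roughly the spacing between neighboring stream points.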

9 Label Complexity of CAL  In the realizable (noise-free) case, the label complexity for misclassification error ≤ ε is: Supervised learning: O(1/ε) labeled examples. Active learning: O(log 1/ε) labeled examples.  In the unrealizable (agnostic) case, there is no perfect classifier of any form! A small amount of adversarial noise can make CAL fail to find an (ε-)optimal classifier! A noise-robust algorithm is needed… (Figure: binary classification with thresholds; the optimal classifier)

10 A² Algorithm  A general-purpose learning strategy (in the agnostic setting): do NOT trust answers from the oracle completely; compare error bounds between classifiers. (Figure: binary classification with linear classifiers. (a) Realizable case: “Must be RED!”, so the best classifier is identified. (b) Unrealizable case: still uncertain; is the blue classifier the best?)

11 A² Algorithm (Cont’d)  A general-purpose learning strategy (in the agnostic setting): do NOT trust answers from the oracle completely; compare error bounds between classifiers. (Figure: size of the region of uncertainty; upper and lower bounds of the error. Presenter’s note: in my opinion, the paper is wrong at these points.)

12 A² Algorithm (Cont’d)  Binary classification with thresholds: sample and label points, then compute the error rates of the classifiers over the domain, each with an upper bound and a lower bound. Remove classifiers whose error lower bound exceeds the minimum error upper bound.
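A minimal sketch of this elimination loop on the threshold class follows. It is not the paper's exact algorithm: the hypothesis grid, round/batch counts, plain Hoeffding bounds, and uniform sampling inside the disagreement region are all simplifying assumptions made for illustration. The key ideas it does show are (a) labels are requested only where surviving classifiers disagree, and (b) a classifier is removed once its error lower bound exceeds the minimum error upper bound.

```python
import math
import random

def hoeffding_radius(n, delta=0.05):
    # Hoeffding-style deviation term for an empirical error on n samples.
    return math.sqrt(math.log(2.0 / delta) / (2 * n))

def a2_thresholds(oracle, rounds=5, batch=300, delta=0.05):
    """A2-style elimination over a grid of 1-D threshold classifiers."""
    hs = [i / 100 for i in range(101)]        # surviving classifiers
    for _ in range(rounds):
        lo, hi = min(hs), max(hs)
        if hi - lo < 1e-9:
            break
        # Sample and label only inside the disagreement region [lo, hi];
        # outside it, every surviving classifier predicts the same label,
        # so relative comparisons there carry no information.
        pts = [lo + random.random() * (hi - lo) for _ in range(batch)]
        labels = [oracle(x) for x in pts]

        def emp_err(h):
            # empirical error of threshold h on the current batch
            preds = [+1 if x >= h else -1 for x in pts]
            return sum(p != y for p, y in zip(preds, labels)) / batch

        r = hoeffding_radius(batch, delta)
        min_upper = min(emp_err(h) + r for h in hs)
        # Remove classifiers whose lower bound exceeds the min upper bound.
        hs = [h for h in hs if emp_err(h) - r <= min_upper]
    return (min(hs) + max(hs)) / 2

random.seed(0)

def noisy_oracle(x, target=0.5, noise=0.1):
    # Agnostic setting: labels from the best threshold 0.5,
    # flipped with probability 0.1 (so no classifier is perfect).
    y = +1 if x >= target else -1
    return -y if random.random() < noise else y

h_hat = a2_thresholds(noisy_oracle)
```

Despite the 10% label noise, the surviving interval contracts around the optimal threshold, because the bound comparison never trusts any single noisy answer on its own.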

13 A² Algorithm (Cont’d)  Correctness: it returns an ε-optimal classifier with high probability.  Fallback analysis: it is never much worse than a standard batch, bound-based algorithm in terms of label complexity.  Improvement in label complexity: it achieves a great improvement over passive learning in some special cases (thresholds, and homogeneous linear separators under a uniform distribution).

14 Conclusions  The A² algorithm is the first active learning algorithm that finds an (ε-)optimal classifier in the unrealizable (agnostic) case.  It achieves a (near-)exponential improvement in label complexity in several unrealizable settings.  It never requires substantially more label requests than passive learning.

15 Discussions  This paper takes a theoretical approach to active learning, especially in the unrealizable (agnostic) case.  It does NOT guarantee an improvement in label complexity for every hypothesis class.  The A² algorithm is intended to theoretically extend the power of active learning to the unrealizable case. How can we apply it for practical purposes?

