Agnostic Active Learning
Maria-Florina Balcan*, Alina Beygelzimer**, John Langford***
*Carnegie Mellon University, **IBM T.J. Watson Research Center, ***Yahoo! Research
Journal of Computer and System Sciences
Presented by Yongjin Kwon

Introduction
- Nowadays a plentiful amount of data is cheaply available and is used to find useful patterns or concepts.
- Traditional machine learning has concentrated on problems that require labeled data only. However, labeling is expensive: speech recognition, document classification, etc.
- How can we reduce the amount of labeled data required? Exploit the abundance of unlabeled data!

Introduction (Cont'd)
- Semi-supervised Learning: use a set of unlabeled data under additional assumptions.
- Active Learning: ask for the labels of "informative" data.
[Figure: spectrum from supervised learning to semi-supervised and active learning, ordered from less to more informative use of data]

Active Learning
- If the machine actively asks for the labels of "informative" points, it can perform better with less training!
[Figure: (a) passive learning, one-way teaching where everything must be prepared in advance; (b) active learning, where the learner queries the labels of "informative" points only]

Active Learning (Cont'd)
- What are "informative" points? If the learner is already certain about the label of a point, then that point is less informative.
[Figure: points the learner is unsure about are more informative; points it is already certain about are less informative]

Typical Active Learning Approach
- Start by querying the labels of a few randomly chosen points.
- Repeat the following process (a minimal sketch follows):
  - Determine the decision boundary on the current set of labeled points.
  - Choose the next unlabeled point closest to the current decision boundary (i.e., the most "uncertain" or "informative" point).
  - Query that point and obtain its label.
[Figure: binary classification with the current decision boundary]
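Below is a minimal sketch of this loop, assuming a scikit-learn logistic regression as the underlying classifier; the synthetic pool, seed labels, and query budget are illustrative assumptions, not part of the original slides.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))                  # pool of unlabeled points
oracle_y = (X[:, 0] + X[:, 1] > 0).astype(int)  # hidden labels, revealed on query

# Seed with a few labeled points (taken from both classes so fitting succeeds).
labeled = list(np.where(oracle_y == 1)[0][:3]) + list(np.where(oracle_y == 0)[0][:3])
clf = LogisticRegression()

for _ in range(20):                             # query budget (assumed)
    clf.fit(X[labeled], oracle_y[labeled])      # boundary on current labeled set
    w, b = clf.coef_.ravel(), clf.intercept_[0]
    margins = np.abs(X @ w + b) / np.linalg.norm(w)  # distance to the boundary
    margins[labeled] = np.inf                   # never re-query a labeled point
    labeled.append(int(np.argmin(margins)))     # query the most uncertain point
```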

Improvement in Label Complexity
- 1-D binary classification in the noise-free setting: find the optimal threshold (or classifier).
- In order to achieve misclassification error ≤ ε:
  - Supervised learning: O(1/ε) labeled examples are needed.
  - Active learning: O(log 1/ε) labeled examples are needed, via binary search (sketched below)!
- Exponential improvement in label complexity!! How general is this phenomenon?
(Label complexity: the number of label requests needed to achieve a given accuracy.)
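The exponential gap comes from binary search: each query halves the interval that can still contain the true threshold. A small self-contained sketch, with an assumed noise-free oracle on [0, 1]:

```python
def learn_threshold(oracle, eps):
    """Binary-search the threshold in [0, 1] to accuracy eps (noise-free case)."""
    lo, hi, queries = 0.0, 1.0, 0
    while hi - lo > eps:                 # interval length bounds the error
        mid = (lo + hi) / 2
        queries += 1
        if oracle(mid):                  # oracle(x): is x above the threshold?
            hi = mid                     # threshold lies to the left of mid
        else:
            lo = mid                     # threshold lies to the right of mid
    return (lo + hi) / 2, queries

true_t = 0.37                            # assumed for illustration
t_hat, n = learn_threshold(lambda x: x >= true_t, eps=1e-3)
print(t_hat, n)                          # ≈ 0.37 after ceil(log2(1/eps)) = 10 queries
```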

CAL Active Learning
- General-purpose learning strategy (in the noise-free setting).
- Maintain the region of uncertainty: the set of points on which the classifiers consistent with all labels seen so far still disagree. When a new point falls inside this region, ask for its label; otherwise its label can be inferred (a sketch follows).
[Figure: binary classification with a rectangular classifier; a point inside the region of uncertainty triggers a label request]
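A minimal sketch of CAL on a finite class of 1-D thresholds; the hypothesis grid, the stream of unlabeled points, and the true threshold are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
thresholds = np.linspace(0, 1, 201)       # finite hypothesis class (assumed)
version_space = np.ones(len(thresholds), dtype=bool)  # consistent hypotheses
true_t, queries = 0.42, 0

for x in rng.uniform(0, 1, size=200):     # stream of unlabeled points
    preds = x >= thresholds               # each hypothesis's prediction on x
    alive = preds[version_space]
    if alive.all() or not alive.any():
        continue                          # all survivors agree: infer, don't query
    y = x >= true_t                       # disagreement: query the oracle
    queries += 1
    version_space &= (preds == y)         # keep only consistent hypotheses

print(queries)                            # far fewer than 200 label requests
```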

Label Complexity of CAL
- In the realizable (or noise-free) case, the label complexity for misclassification error ≤ ε is:
  - Supervised learning: O(1/ε) labeled examples.
  - Active learning: O(log 1/ε) labeled examples.
- In the unrealizable (or agnostic) case, there is no perfect classifier of any form! A small amount of adversarial noise can make CAL fail to find the (ε-)optimal classifier (a tiny illustration follows). A noise-robust algorithm is needed...
[Figure: binary classification with a threshold; the optimal classifier still misclassifies some noisy points]
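To see why noise breaks CAL, here is a tiny assumed example: one flipped label leaves no threshold consistent with the data, so consistency-based elimination throws away every hypothesis, including the optimal one.

```python
import numpy as np

thresholds = np.linspace(0, 1, 201)       # same hypothesis class as above
alive = np.ones(len(thresholds), dtype=bool)
# (x, label) pairs; the last point's label is adversarially flipped noise.
samples = [(0.2, False), (0.8, True), (0.5, True), (0.5001, False)]
for x, y in samples:
    alive &= ((x >= thresholds) == y)     # CAL-style consistency filtering
print(alive.sum())                        # 0: the version space is empty
```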

The A² Algorithm
- General-purpose learning strategy (in the agnostic setting).
- Do NOT trust answers from the oracle completely.
- Compare error bounds between classifiers.
[Figure: binary classification with a linear classifier. (a) Realizable case: after enough labels, a region's label must be RED, and the best classifier can be identified. (b) Unrealizable case: the region is still uncertain, so the best classifier cannot be identified from consistency alone]

The A² Algorithm (Cont'd)
- Do NOT trust answers from the oracle completely; compare error bounds between classifiers.
[Figure: the size of the region of uncertainty, with an upper bound and a lower bound on each classifier's error. Presenter's note: in my opinion, the paper is wrong at these points.]

The A² Algorithm (Cont'd)
- Repeatedly sample and label points, and maintain an upper bound and a lower bound on the error rate of each remaining classifier.
- Remove the classifiers whose lower error bound exceeds the minimum upper bound (a sketch of this elimination step follows).
[Figure: binary classification with a threshold class; error rates of the classifiers over the domain after sampling and labeling, each with an upper and a lower bound; the minimum upper bound sets the elimination cutoff]
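A hedged sketch of this elimination step on a threshold class with label noise. For simplicity it samples from the whole domain rather than only from the region of uncertainty, and it uses a crude Hoeffding-style bound width; the actual A² algorithm uses bounds tied to the complexity of the hypothesis class and focuses its samples on the disagreement region.

```python
import numpy as np

rng = np.random.default_rng(2)
thresholds = np.linspace(0, 1, 101)           # hypothesis class (assumed)
alive = np.ones(len(thresholds), dtype=bool)
true_t, noise = 0.42, 0.1                     # noisy oracle: 10% flipped labels

for _ in range(10):                           # rounds of sampling and labeling
    X = rng.uniform(0, 1, size=1000)
    y = (X >= true_t) ^ (rng.uniform(size=1000) < noise)
    # Empirical error of every hypothesis on this round's labeled sample.
    errs = ((X[:, None] >= thresholds[None, :]) != y[:, None]).mean(axis=0)
    width = np.sqrt(np.log(40 * len(thresholds)) / (2 * len(X)))  # Hoeffding-style
    upper, lower = errs + width, errs - width
    min_upper = upper[alive].min()
    alive &= ~(lower > min_upper)             # the elimination rule on the slide

print(thresholds[alive])                      # survivors lie near true_t
```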

The A² Algorithm (Cont'd)
- Correctness: it returns an ε-optimal classifier with high probability.
- Fallback analysis: it is never much worse than a standard batch, bound-based algorithm in terms of label complexity.
- Improvement in label complexity: it achieves a great improvement over passive learning in some special cases (thresholds, and homogeneous linear separators under a uniform distribution).

Conclusions
- The A² algorithm is the first active learning algorithm that finds an (ε-)optimal classifier in the unrealizable (or agnostic) case.
- It achieves a (near-)exponential improvement in label complexity in several unrealizable settings.
- It never requires substantially more label requests than passive learning.

Discussions
- This paper takes a theoretical approach to active learning, especially in the unrealizable (or agnostic) case.
- It does NOT guarantee an improvement in label complexity for every kind of hypothesis class.
- The A² algorithm is intended to theoretically extend the power of active learning to the unrealizable case. How can we apply it for practical purposes?