Employing EM and Pool-Based Active Learning for Text Classification
Andrew McCallum and Kamal Nigam
Just Research and Carnegie Mellon University

Text Active Learning
Many applications. Scenario: ask for labels of only a few documents.
While learning (loop sketched below):
– Learner carefully selects an unlabeled document
– Trainer provides its label
– Learner rebuilds the classifier
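A minimal sketch of this loop in Python (train, select_query, and oracle_label are hypothetical helper names, not from the talk):

    # Pool-based active learning skeleton (sketch, not the authors' code).
    def active_learning_loop(labeled, pool, n_iterations, train, select_query, oracle_label):
        model = train(labeled)
        for _ in range(n_iterations):
            doc = select_query(model, pool)            # learner selects an unlabeled document
            labeled.append((doc, oracle_label(doc)))   # trainer provides the label
            pool.remove(doc)
            model = train(labeled)                     # learner rebuilds the classifier
        return model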

Query-By-Committee (QBC)
Label the documents with high classification variance.
Iterate:
– Create a committee of classifiers
– Measure committee disagreement about the class of unlabeled documents
– Select a document for labeling
Theoretical results are promising [Seung et al. 92] [Freund et al. 97]

Text Framework
– "Bag of Words" document representation
– Naïve Bayes classification: for each class, estimate P(word|class)
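A minimal sketch of the estimation and scoring steps (numpy count matrices; add-one smoothing is my assumption, the slide does not specify it):

    import numpy as np

    def estimate_word_probs(count_matrix):
        """count_matrix[c, w] = count of word w across labeled docs of class c.
        Returns P(word|class) with add-one (Laplace) smoothing (an assumption)."""
        counts = count_matrix + 1.0
        return counts / counts.sum(axis=1, keepdims=True)

    def log_posterior(doc_counts, log_priors, word_probs):
        """Unnormalized log P(class|doc) for a bag-of-words count vector."""
        return log_priors + np.log(word_probs) @ doc_counts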

Outline: Our approach
– Create a committee by sampling from a distribution over classifiers
– Measure committee disagreement with the KL-divergence of the committee members to their mean
– Select documents from a large pool using both disagreement and density-weighting
– Add EM to exploit the documents not selected for labeling

Creating Committees
Each class is a distribution over word frequencies.
For each member, construct each class by:
– Drawing from the Dirichlet distribution defined by the labeled data (sketch below)
[Diagram: labeled data defines a classifier distribution; MAP classifier and sampled committee members 1–3]
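A sketch of this sampling step (numpy's Dirichlet sampler; the add-one pseudo-counts are my assumption):

    import numpy as np

    def sample_committee(count_matrix, k, seed=0):
        """Draw k committee members. Each member's P(word|class) row is one
        sample from Dirichlet(counts + 1) for that class, so members differ
        more when labeled data is scarce."""
        rng = np.random.default_rng(seed)
        return [np.vstack([rng.dirichlet(row + 1.0) for row in count_matrix])
                for _ in range(k)]   # each member: (n_classes, n_words)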

Measuring Committee Disagreement
Kullback-Leibler divergence to the mean (sketch below):
– Compares differences in how members "vote" for classes
– Considers the entire class distribution of each member
– Considers the "confidence" of the top-ranked class
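A sketch of KL-to-the-mean for one document, where member_posteriors holds each member's P(class|doc):

    import numpy as np

    def kl_to_the_mean(member_posteriors, eps=1e-12):
        """Mean KL divergence from each member's class distribution to the
        committee average; larger values mean more disagreement."""
        p = np.asarray(member_posteriors, dtype=float) + eps   # (k, n_classes)
        p /= p.sum(axis=1, keepdims=True)
        mean = p.mean(axis=0)
        return (p * np.log(p / mean)).sum(axis=1).mean()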

Selecting Documents
Stream-based sampling (sketched below):
– Disagreement => probability of selection
– Implicit (but crude) instance-distribution information
Pool-based sampling:
– Select the document with the highest disagreement in the whole pool
– Loses the distribution information
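A sketch of the stream-based variant; the exact mapping from disagreement to selection probability is my assumption, as the slide only says "Disagreement => probability of selection":

    import numpy as np

    def stream_select(stream, disagreement, scale=1.0, seed=0):
        """Yield documents accepted with probability min(1, scale * disagreement)."""
        rng = np.random.default_rng(seed)
        for doc in stream:
            if rng.random() < min(1.0, scale * disagreement(doc)):
                yield doc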

[Figure: Disagreement]

Density-weighted pool-based sampling
A balance of disagreement and distributional information.
Select documents by: [equation image; choose the document scoring highest on disagreement weighted by density]
Calculate density by:
– (Geometric) average distance to all documents
(a hedged sketch follows)
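A hedged sketch of this selection rule, assuming score = disagreement * density, with density taken as exp(-average distance to the rest of the pool), i.e. the geometric mean of exp(-distance) similarities. The exact combination in the slide's lost formula image may differ.

    import numpy as np

    def density(doc, pool, distance):
        """exp(-mean distance): the geometric mean of exp(-distance) similarities."""
        dists = [distance(doc, other) for other in pool if other is not doc]
        return np.exp(-np.mean(dists))

    def select_density_weighted(pool, disagreement, distance):
        """Pick the pool document with the highest disagreement * density score."""
        scores = [disagreement(d) * density(d, pool, distance) for d in pool]
        return pool[int(np.argmax(scores))]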

[Figure: Disagreement]

[Figure: Density]

Datasets and Protocol
– Reuters and a subset of Newsgroups
– One initial labeled document per class
– 200 iterations of active learning
[Diagram of category labels: computers (mac, ibm, graphics, windows, X) and Reuters (acq, corn, trade, ...)]

QBC on Reuters
[Figure panels: acq, P(+) = 0.25; trade, P(+) = 0.038; corn, P(+) = 0.018]

[Figure: Selection comparison on News5]

EM after Active Learning
– After active learning, only a few documents have been labeled
– Use EM to predict the labels of the remaining unlabeled documents
– Use all documents to build a new classification model, which is often more accurate (sketch below)
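A sketch of this EM step with the naive Bayes model sketched earlier (shapes and add-one smoothing are my assumptions):

    import numpy as np

    def em_naive_bayes(labeled_counts, labeled_y, pool_counts, n_classes, n_iters=10):
        """labeled_counts: (n_l, W) word counts, labeled_y: class ids,
        pool_counts: (n_u, W). Returns log P(word|class) and log priors."""
        hard = np.eye(n_classes)[labeled_y]              # one-hot labeled assignments
        resp = np.full((pool_counts.shape[0], n_classes), 1.0 / n_classes)
        for _ in range(n_iters):
            # M-step: expected word counts per class, add-one smoothed
            counts = hard.T @ labeled_counts + resp.T @ pool_counts + 1.0
            log_wp = np.log(counts / counts.sum(axis=1, keepdims=True))
            mass = hard.sum(axis=0) + resp.sum(axis=0) + 1.0
            log_prior = np.log(mass / mass.sum())
            # E-step: posterior class memberships for the unlabeled pool
            log_post = pool_counts @ log_wp.T + log_prior
            log_post -= log_post.max(axis=1, keepdims=True)
            resp = np.exp(log_post)
            resp /= resp.sum(axis=1, keepdims=True)
        return log_wp, log_prior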

[Figure: QBC and EM on News5]

Related Work
Active learning with text:
– [Dagan & Engelson 95]: QBC for part-of-speech tagging
– [Lewis & Gale 94]: pool-based, non-QBC
– [Liere & Tadepalli 97 & 98]: QBC with Winnow & perceptrons
EM with text:
– [Nigam et al. 98]: EM with unlabeled data

Conclusions & Future Work
– Small P(+) => better active learning
– Leverage the unlabeled pool by: pool-based sampling, density-weighting, and Expectation-Maximization
Future work:
– Different active learning approaches a la [Cohn et al. 96]
– Interleaved EM & active learning

Document classification: the Potential
3 × 10^8 unlabeled web pages.
Classification is important for the Web:
– Knowledge extraction
– User interest modeling

Document classification: the Status
Good techniques exist, but:
– They have many parameters to estimate
– The data is very sparse
– Lots of training examples are needed

Document classification: the Challenge
Labeling data is expensive:
– It requires human interaction
– Domains may constrain the labeling effort
Use Active Learning!
– Carefully pick which documents to label
– Reach the knee of the learning curve sooner

[Figure: Disagreement Example]

Reuters
Skewed priors => better active learning?
– Reuters: binary classification & skewed priors
– Better active learning results on the more infrequent classes

comp.* Newsgroups dataset
– 5 categories, 1000 documents each
– 20% held out for testing
– One initial labeled document per class
– 200 iterations of active learning
– 10 runs per curve
[Diagram of category labels: computers (mac, ibm, graphics, windows, X)]

Text Classification
– Many applications
– Good techniques exist, but they require lots of labeled data
– Labeling is expensive
– Use active learning
[Example document: "Corn prices rose today while corn futures dropped in surprising trading activity. Corn..."]

Old QBC
For each unlabeled document:
– Pick two consistent hypotheses
– If they disagree about the label, request the label