Using Error-Correcting Codes For Text Classification
Rayid Ghani, Center for Automated Learning & Discovery, Carnegie Mellon University

Outline
- Review of ECOC
- Previous Work
- Types of Codes
- Experimental Results
- Semi-Theoretical Model
- Drawbacks
- Conclusions & Work in Progress

Overview of ECOC
- Decompose a multiclass problem into multiple binary problems.
- The decomposition can be independent of or dependent on the data (it always depends on the number of classes).
- Any learner that can learn binary functions can then be used to learn the original multivalued function.
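As a concrete illustration (the codewords below are made up for exposition, not taken from the slides), a 4-class problem with a 3-bit code assigns each class one codeword, and each bit defines one binary problem:

    class 1 -> 0 0 1
    class 2 -> 0 1 0
    class 3 -> 1 0 0
    class 4 -> 1 1 1

The first bit, for example, trains a binary classifier to answer "does this example belong to class 3 or class 4?"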

ECOC Picture [figure illustrating the coding of classes A, B, and C]

Training ECOC
- Given m distinct classes, create an m x n binary matrix M.
- Each class is assigned ONE row of M.
- Each column of the matrix divides the classes into TWO groups.
- Train the base classifier to learn the n binary problems (a sketch follows).
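A minimal sketch of this training procedure, assuming scikit-learn-style base learners; the function names and the choice of logistic regression are illustrative assumptions, not from the slides:

    # ECOC training sketch; train_ecoc and make_base_learner are
    # hypothetical names, and logistic regression stands in for any
    # binary base learner.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def train_ecoc(X, y, M, make_base_learner=LogisticRegression):
        """Train one binary classifier per column of the m x n code matrix M.

        X: feature matrix (samples x features)
        y: integer class labels in {0, ..., m-1}, indexing rows of M
        M: binary numpy array of shape (m, n)
        """
        classifiers = []
        for j in range(M.shape[1]):
            # Column j relabels every example with its class's j-th code
            # bit, splitting the m classes into two groups.
            clf = make_base_learner()
            clf.fit(X, M[y, j])
            classifiers.append(clf)
        return classifiers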

Testing ECOC
To test a new instance:
- Apply each of the n classifiers to the new instance.
- Combine the predictions to obtain a binary string (codeword) for the new point.
- Classify to the class with the nearest codeword (usually Hamming distance is used as the distance measure); see the sketch below.
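A decoding sketch to pair with train_ecoc above; decode_ecoc is a hypothetical name:

    import numpy as np

    def decode_ecoc(x, classifiers, M):
        """Return the index of the class whose codeword is nearest, in
        Hamming distance, to the predicted codeword for instance x."""
        # Apply each of the n binary classifiers to the new instance.
        codeword = np.array([int(clf.predict(x.reshape(1, -1))[0])
                             for clf in classifiers])
        # Hamming distance from the predicted codeword to each row of M.
        distances = (M != codeword).sum(axis=1)
        return int(np.argmin(distances))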

Previous Work
- Combined with boosting: ADABOOST.OC (Schapire, 1997)

Types of Codes
- Random
- Algebraic
- Constructed/Meaningful
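For random codes, one common construction (sketched below; the rejection tests are a standard convention, not necessarily the exact construction used here) draws random columns and discards degenerate ones:

    import numpy as np

    def random_code(m, n, seed=0):
        """Draw an m x n binary code matrix, rejecting degenerate columns."""
        rng = np.random.default_rng(seed)
        columns = []
        while len(columns) < n:
            col = rng.integers(0, 2, size=m)
            if col.min() == col.max():
                continue  # every class on one side: nothing to learn
            # A column identical or complementary to an earlier one
            # induces the same binary partition, so skip it too.
            if any(np.array_equal(col, c) or np.array_equal(col, 1 - c)
                   for c in columns):
                continue
            columns.append(col)
        return np.column_stack(columns)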

Experimental Setup
- Generate the code.
- Choose a base learner.

Dataset
- Industry Sector dataset: company web pages classified into 105 economic sectors.
- Standard stoplist; no stemming (a rough preprocessing sketch follows).
- All MIME and HTML headers skipped.
- Experimental approach similar to McCallum et al. (1997) for comparison purposes.
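A rough approximation of that preprocessing, assuming scikit-learn's built-in English stoplist stands in for the "standard stoplist"; the exact stoplist and header stripping of the original experiments are not specified here:

    from sklearn.feature_extraction.text import CountVectorizer

    vectorizer = CountVectorizer(stop_words="english")  # no stemming applied
    # `pages` is a hypothetical variable holding page text with MIME and
    # HTML headers already stripped.
    pages = ["steel tubes and pipes for the energy sector",
             "retail and commercial banking services"]
    X = vectorizer.fit_transform(pages)  # bag-of-words counts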

Results
Classification accuracies on five random train-test splits of the Industry Sector dataset: ECOC reached 88% accuracy!

How does the length of the code matter?
Table 2: Average classification accuracy on 5 random train-test splits of the Industry Sector dataset, with the vocabulary selected using Information Gain.
- Longer codes mean larger codeword separation.
- The minimum Hamming distance of a code C is the smallest distance between any pair of distinct codewords in C.
- If the minimum Hamming distance is h, then the code can correct ⌊(h-1)/2⌋ errors.
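A small sketch for checking this property of a code matrix (the function name and example matrix are hypothetical):

    from itertools import combinations
    import numpy as np

    def min_hamming_distance(M):
        """Smallest Hamming distance between any pair of distinct codewords."""
        return min(int((M[i] != M[j]).sum())
                   for i, j in combinations(range(len(M)), 2))

    M = np.array([[0, 0, 0, 0, 0],
                  [1, 1, 1, 0, 0],
                  [0, 0, 1, 1, 1]])
    h = min_hamming_distance(M)  # here h = 3
    print(h, (h - 1) // 2)       # corrects (3-1)//2 = 1 bit error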

Theoretical Evidence
Model ECOC by a binomial distribution B(n, p), where:
- n = length of the code
- p = probability of each bit being classified incorrectly
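Under this model, a test instance is still decoded correctly whenever at most t = ⌊(h-1)/2⌋ of the n bits are wrong, so the predicted accuracy is the binomial tail sum_{i=0}^{t} C(n,i) p^i (1-p)^(n-i). A sketch using scipy, with illustrative parameter values (not numbers from the slides):

    from scipy.stats import binom

    n, p, h = 63, 0.2, 31   # code length, per-bit error rate, min distance
    t = (h - 1) // 2        # bit errors the code can correct
    predicted_accuracy = binom.cdf(t, n, p)  # P(at most t of n bits flip)
    print(predicted_accuracy)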

Size Matters?

Size does NOT matter!

Choosing Codes

Interesting Observations
NBC (the naive Bayes classifier) does not give good probability estimates; using ECOC results in better estimates.

Drawbacks
- Can be computationally expensive.
- Random codes throw away the real-world nature of the data by picking random partitions to create artificial binary problems.

Conclusion
- Improves classification accuracy considerably!
- Extends a binary learner to a multiclass learner.
- Can be used when training data is sparse.

Future Work
- Use meaningful codes (based on a hierarchy, or on distinguishing between particularly difficult classes).
- Use artificial datasets.
- Combine ECOC with co-training or shrinkage methods.
- Sufficient and necessary conditions for optimal behavior.