Reducing Multiclass to Binary LING572 Fei Xia Week 9: 03/04/08.

Highlights

What?
- Converting a k-class problem into binary problems.

Why?
- For some ML algorithms, a direct extension to the multiclass case may be problematic.
- Ex: boosting, support vector machines (SVM).

How?
- Many methods.

Methods

- One-vs-all
- All-pairs
- Error-correcting output codes (ECOC)**: see additional slides
- …

One-vs-all

Idea:
- Each class is compared to all others.
- k classifiers: one classifier for each class.

Training time:
- For each class c_m, train a classifier cl_m(x): replace each training instance (x, y) with
    (x, 1) if y = c_m
    (x, 0) if y != c_m

One-vs-all (cont)

Testing time: given a new example x
- Run each of the k classifiers on x.
- Choose the class c_m whose classifier gives the highest confidence score:
    c* = argmax_m cl_m(x)
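The one-vs-all training and testing steps above can be sketched in code. This is a minimal, self-contained illustration: the toy nearest-centroid scorer stands in for whatever real base learner (e.g., an SVM) would be used, and all names (`train_binary`, `train_one_vs_all`, the toy data) are illustrative, not from the slides.

```python
# One-vs-all reduction: k binary classifiers, one per class.
# The centroid scorer below is a toy stand-in for a real binary learner.

def train_binary(examples):
    """examples: list of (x, label) with label in {0, 1}.
    Returns a scoring function cl(x): higher = more positive."""
    pos = [x for x, y in examples if y == 1]
    neg = [x for x, y in examples if y == 0]
    mean = lambda pts: [sum(col) / len(pts) for col in zip(*pts)]
    cp, cn = mean(pos), mean(neg)
    d2 = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    # Score = distance to negative centroid minus distance to positive one
    return lambda x: d2(x, cn) - d2(x, cp)

def train_one_vs_all(data, classes):
    """Train one classifier per class c_m: relabel y = c_m -> 1, else 0."""
    return {c: train_binary([(x, 1 if y == c else 0) for x, y in data])
            for c in classes}

def predict_one_vs_all(classifiers, x):
    """c* = argmax_m cl_m(x)"""
    return max(classifiers, key=lambda c: classifiers[c](x))

data = [([0, 0], 'a'), ([0, 1], 'a'), ([5, 5], 'b'), ([5, 6], 'b'),
        ([0, 9], 'c'), ([1, 9], 'c')]
clfs = train_one_vs_all(data, ['a', 'b', 'c'])
print(predict_one_vs_all(clfs, [0.2, 0.5]))   # 'a'
```

Note that the argmax over confidence scores only makes sense when the k classifiers produce comparable scores; in practice the scores are often calibrated.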

All-pairs

Idea:
- All pairs of classes are compared to each other.
- C(k,2) = k(k-1)/2 classifiers: one classifier for each pair of classes.

Training time:
- For each pair (c_m, c_n) of classes, train a classifier cl_mn: replace a training instance (x, y) with
    (x, 1) if y = c_m
    (x, 0) if y = c_n
    otherwise ignore the instance.

All-pairs (cont)

Testing time: given a new example x
- Run each of the C(k,2) classifiers on x.
- Max-win strategy: choose the class c_m that wins the most pairwise comparisons.
- Other coupling models have been proposed: e.g., (Hastie and Tibshirani, 1998).
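The all-pairs training and max-win decoding above can be sketched as follows. Again this is a self-contained toy: the nearest-centroid scorer is a stand-in for any real base learner, and the function names and data are illustrative, not from the slides.

```python
# All-pairs reduction: C(k,2) binary classifiers, one per class pair.
from itertools import combinations

def centroid_scorer(examples):
    """Toy binary learner: positive score favors label 1."""
    pos = [x for x, y in examples if y == 1]
    neg = [x for x, y in examples if y == 0]
    mean = lambda pts: [sum(col) / len(pts) for col in zip(*pts)]
    cp, cn = mean(pos), mean(neg)
    d2 = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    return lambda x: d2(x, cn) - d2(x, cp)

def train_all_pairs(data, classes):
    """One classifier per pair (c_m, c_n): keep only instances labeled
    c_m (-> 1) or c_n (-> 0); all other instances are ignored."""
    return {(cm, cn): centroid_scorer([(x, 1 if y == cm else 0)
                                       for x, y in data if y in (cm, cn)])
            for cm, cn in combinations(classes, 2)}

def predict_all_pairs(clfs, x):
    """Max-win: each pairwise classifier casts one vote; most votes wins."""
    votes = {}
    for (cm, cn), cl in clfs.items():
        winner = cm if cl(x) > 0 else cn
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)

data = [([0, 0], 'a'), ([0, 1], 'a'), ([5, 5], 'b'), ([5, 6], 'b'),
        ([0, 9], 'c'), ([1, 9], 'c')]
clfs = train_all_pairs(data, ['a', 'b', 'c'])
print(len(clfs), predict_all_pairs(clfs, [0.2, 0.5]))   # 3 a
```

Each pairwise classifier sees only the training data for its two classes, which is what makes the individual binary problems small even when k is large.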

Summary

Different methods:
- Direct multiclass
- One-vs-all (a.k.a. one-per-class): k classifiers
- All-pairs: C(k,2) classifiers
- ECOC: n classifiers (n is the number of columns)

Some studies report that all-pairs and ECOC work better than one-vs-all.

Additional slides

Error-correcting output codes (ECOC)

Proposed by (Dietterich and Bakiri, 1995).

Idea:
- Each class is assigned a unique binary string (codeword) of length n.
- Train n classifiers, one for each bit.
- Testing time: run the n classifiers on x to get an n-bit string s, and choose the class whose codeword is closest to s.
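The ECOC decoding step can be sketched directly: pick the class whose codeword is nearest to the n-bit output in Hamming distance. The 4-class, 6-bit code below is an illustrative assumption, not taken from the slides (the slides' own example matrices are not reproduced in this transcript).

```python
# ECOC decoding sketch: each class has a unique n-bit codeword; at test
# time the n bit-classifiers produce a string s, and we choose the class
# whose codeword is closest to s in Hamming distance.
code = {
    'c1': '000000',
    'c2': '001111',
    'c3': '110011',
    'c4': '111100',
}

def decode(s, code):
    """Return the class whose codeword has minimal Hamming distance to s."""
    return min(code, key=lambda c: sum(a != b for a, b in zip(code[c], s)))

# The minimum pairwise distance of this code is 4, so a single flipped
# bit (here, bit 1 of c3's codeword '110011') is still decoded correctly:
print(decode('100011', code))   # 'c3'
```

This robustness to individual classifier errors is exactly what the "error-correcting" in the name refers to.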

An example

Meaning of each column

Another example: 15-bit code for a 10-class problem

Hamming distance

Definition: the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different.

Examples:
- 10111 and
- 2143 and 2233 (distance 2)
- "toned" and "roses" (distance 3)
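The definition above translates directly into code; checking the slide's worked examples:

```python
# The slide's definition of Hamming distance, written out in code.
def hamming(a, b):
    """Number of positions at which equal-length strings a and b differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))

print(hamming('2143', '2233'))    # 2
print(hamming('toned', 'roses'))  # 3
```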

How to choose a good error-correcting code?

- Choose one with a large minimum Hamming distance between any pair of codewords.
- If the minimum Hamming distance is d, then the code can correct at least floor((d-1)/2) single-bit errors.
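The correction bound above can be checked numerically. The 3-word code below is an illustrative assumption, not from the slides:

```python
# Minimum Hamming distance of a code determines its error-correction
# capability: with minimum distance d, at least floor((d - 1) / 2)
# single-bit errors can be corrected.
from itertools import combinations

def min_distance(codewords):
    """Smallest pairwise Hamming distance over all codeword pairs."""
    return min(sum(x != y for x, y in zip(a, b))
               for a, b in combinations(codewords, 2))

words = ['00000', '01111', '10101']
d = min_distance(words)
print(d, (d - 1) // 2)   # 3 1  -> this code corrects 1 single-bit error
```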

Two properties of a good ECOC

- Row separation: each codeword should be well separated in Hamming distance from each of the other codewords.
- Column separation: each bit-position function f_i should be uncorrelated with each of the other functions f_j.

All possible columns for a three-class problem

If there are k classes, there will be at most 2^(k-1) - 1 usable columns after removing complements and the all-zeros and all-ones columns.
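The 2^(k-1) - 1 count can be verified by enumeration; for k = 3 this yields the three usable columns:

```python
# Counting usable ECOC columns: for k classes, each column assigns a bit
# to every class; after dropping the constant (all-0/all-1) columns and
# one member of each complementary pair, 2**(k - 1) - 1 columns remain.
from itertools import product

def usable_columns(k):
    kept = []
    for bits in product('01', repeat=k):
        col = ''.join(bits)
        comp = ''.join('1' if b == '0' else '0' for b in col)
        if len(set(col)) == 1:      # all zeros or all ones: useless
            continue
        if comp in kept:            # complement already kept: redundant
            continue
        kept.append(col)
    return kept

print(usable_columns(3))   # ['001', '010', '011']  -> 3 = 2**2 - 1
```

A complementary column is redundant because it induces exactly the same binary partition of the classes, just with the labels swapped.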

Finding a good code for different values of k

- Exhaustive codes
- Column selection from exhaustive codes
- Randomized hill climbing
- BCH codes
- …

Results