Virtual Examples for Text Classification with Support Vector Machines. Manabu Sassano. Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing.

Presentation transcript:

Virtual Examples for Text Classification with Support Vector Machines. Manabu Sassano. Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing.

Introduction Corpus-based supervised learning is now a standard approach to achieving high performance in natural language processing. The weakness of the supervised learning approach is that it needs an annotated corpus of reasonably large size, and corpus annotation is labor-intensive and very expensive. Previous methods: –minimally-supervised learning methods (e.g., (Yarowsky, 1995; Blum and Mitchell, 1998)), –active learning methods (e.g., (Thompson et al., 1999; Sassano, 2002)).

Introduction The spirit behind these methods is to utilize precious labeled examples maximally. Another method following the same spirit is to use virtual examples (artificially created examples) generated from labeled examples. This method has rarely been discussed in natural language processing. The purpose of this study is to explore the effectiveness of virtual examples in NLP. This approach is worth investigating because it can be expected to alleviate the cost of corpus annotation.

Support Vector Machines Assume that we are given the training data {(x₁, y₁), …, (x_l, y_l)}, x_i ∈ Rⁿ, y_i ∈ {+1, −1}. To train an SVM is to find the α_i and b that solve the following optimization problem: maximize Σᵢ α_i − ½ Σᵢ Σⱼ α_i α_j y_i y_j K(x_i, x_j) subject to 0 ≤ α_i ≤ C and Σᵢ α_i y_i = 0, where K is a kernel function and C is a misclassification cost parameter. The solution gives an optimal hyperplane with decision function f(x) = sgn(Σᵢ α_i y_i K(x_i, x) + b), which is a decision boundary between the two classes. Training examples with α_i > 0 are the support vectors.

Virtual Examples and Virtual Support Vectors Virtual examples are generated from labeled examples. Based on prior knowledge of the target task, the label of a generated example is set to the same value as that of the original example. Virtual examples generated from support vectors are called virtual support vectors. Reasonable virtual support vectors are expected to give a better optimal hyperplane: assuming that they represent natural variations of examples of the target task, the resulting decision boundary should be more accurate.

Virtual Examples for Text Classification Assumption: the category of a document is unchanged even if a small number of words are added or deleted. Two methods to create virtual examples: GenerateByDeletion –deletes some portion of a document. GenerateByAddition –adds a small number of words to a document.

Text representation Tokenize a document into words, downcase them, and then remove stopwords, using the stopword list of freeWAIS-sf (available at dortmund.de/ir/projects/freeWAIS-sf/). Stemming is not performed. We adopt binary feature vectors; word frequency is not used.
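The representation above can be sketched as follows. This is a minimal illustration: the regex tokenizer and the stopword set are simplified stand-ins for the freeWAIS-sf setup, not the paper's actual preprocessing code.

```python
import re

def to_binary_features(text, stopwords):
    """Tokenize, downcase, and drop stopwords; return a binary feature set
    (word presence only, since frequency is deliberately ignored)."""
    tokens = re.findall(r"[a-z]+", text.lower())  # crude tokenizer, no stemming
    return {w for w in tokens if w not in stopwords}
```

For example, `to_binary_features("The cat sat on the mat", {"the", "on"})` yields the binary feature set for the three content words.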

GenerateByDeletion Given a feature vector (a document) x, a virtual vector x' is generated from x as follows: 1. Copy x to x'. 2. For each binary feature f of x', if rand() ≤ t then remove the feature f, where rand() is a function that generates a random number in [0, 1] and t is a parameter that decides how many features are deleted.
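The two steps above can be sketched in Python, with documents represented as binary feature sets. The function name and the injectable `rng` argument are mine, not from the paper:

```python
import random

def generate_by_deletion(x, t, rng=random):
    """Virtual example: drop each binary feature of x with probability t.
    The label of the result is inherited from x."""
    x_new = set(x)                 # step 1: copy x to x'
    for f in list(x_new):
        if rng.random() <= t:      # step 2: rand() <= t -> delete feature f
            x_new.discard(f)
    return x_new
```

Passing a seeded `random.Random` instance as `rng` makes the generation reproducible.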

GenerateByAddition Given a feature vector (a document) x, a virtual vector x' is generated from x as follows: 1. Collect from the training set the documents whose label is the same as that of x. 2. Concatenate all their feature vectors to create an array a of features; each element of a is a feature representing a word. 3. Copy x to x'. 4. For each binary feature f of x', if rand() ≤ t then select a feature at random from a and add it to x'.
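The four steps above can be sketched in the same style as the deletion method. Again, the names are placeholders of mine; documents are binary feature sets and `same_label_docs` are the training documents sharing x's label:

```python
import random

def generate_by_addition(x, same_label_docs, t, rng=random):
    """Virtual example: for each feature of x, with probability t add a
    feature drawn at random from the pooled same-label documents."""
    a = [f for doc in same_label_docs for f in doc]  # steps 1-2: build array a
    x_new = set(x)                                   # step 3: copy x to x'
    for _ in range(len(x)):                          # step 4: one draw per feature
        if rng.random() <= t:
            x_new.add(rng.choice(a))
    return x_new
```

Because features repeated across same-label documents occur more often in `a`, frequent words of the category are more likely to be added, which matches the array-concatenation construction in the slide.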

Example Suppose Document 1 = (f1; f2; f3; f4; f5; +1) and Document 2 = (f2; f4; f5; f6; +1). Virtual examples generated from Document 1 by GenerateByDeletion: (f2; f3; f4; f5; +1), (f1; f3; f4; +1), or (f1; f2; f4; f5; +1). Virtual examples generated from Document 2 by GenerateByAddition, where the array a = (f1; f2; f3; f4; f5; f2; f4; f5; f6; f2; f3; f5; f6; f7) is the concatenation of the same-label documents: (f1; f2; f4; f5; f6; +1), (f2; f3; f4; f5; f6; +1), or (f2; f4; f5; f6; f7; +1).

Test Set Collection Reuters-21578 dataset (available from David D. Lewis's page). We used the "ModApte" split, which is the most widely used split in the literature on text classification. It has 9,603 training examples and 3,299 test examples. We use only the 10 most frequent categories in the dataset.

The 10 most frequent categories

Performance Measures We use the F-measure as the primary performance measure to evaluate the results: F_β = (β² + 1)pq / (β²p + q), where p is precision, q is recall, and β is a parameter that weights precision against recall. To evaluate the performance of a classifier on a multi-category dataset, there are two ways to compute the F-measure: macro-averaging and micro-averaging (Yang, 1999).
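The F-measure and the two averaging schemes can be sketched as follows. Per-category results are given as (TP, FP, FN) count tuples; this is an illustrative sketch, not the paper's evaluation code:

```python
def f_measure(p, q, beta=1.0):
    """F_beta = (beta^2 + 1) * p * q / (beta^2 * p + q)."""
    return (beta**2 + 1) * p * q / (beta**2 * p + q) if p + q else 0.0

def micro_macro_f1(stats):
    """stats: list of (tp, fp, fn) tuples, one per category."""
    def f1(tp, fp, fn):
        p = tp / (tp + fp) if tp + fp else 0.0  # precision
        q = tp / (tp + fn) if tp + fn else 0.0  # recall
        return f_measure(p, q)
    macro = sum(f1(*s) for s in stats) / len(stats)        # average per-category F1
    tp, fp, fn = (sum(s[i] for s in stats) for i in range(3))
    micro = f1(tp, fp, fn)                                 # F1 over pooled counts
    return micro, macro
```

Micro-averaging pools the counts over categories first (dominated by frequent categories), while macro-averaging weights every category equally.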

Experimental Results SVM setting: C is set to 1/(the average of x∙x), where x is a feature vector in the 9,603-example training set. We fixed C through all the experiments. We built a binary classifier for each of the 10 categories shown in Table 2, and the parameter t was likewise fixed. An SVM with virtual examples is built by the following steps: 1. Train an SVM. 2. Extract the support vectors. 3. Generate virtual examples from the support vectors. 4. Train another SVM using both the original labeled examples and the virtual examples.
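The four-step procedure above can be sketched generically, with the SVM trainer, the support-vector extractor, and the virtual-example generator passed in as functions. All names here are placeholders of mine, not from the paper:

```python
def train_with_vsv(train_svm, support_vectors, make_virtual,
                   labeled, n_per_sv=2):
    """Steps 1-4: train, extract support vectors, generate virtual support
    vectors (labels inherited), then retrain on the union."""
    model = train_svm(labeled)                    # 1. train an SVM
    svs = support_vectors(model, labeled)         # 2. extract support vectors
    virtual = [(make_virtual(x), y)               # 3. generate virtual examples,
               for x, y in svs                    #    keeping each SV's label
               for _ in range(n_per_sv)]
    return train_svm(labeled + virtual)           # 4. retrain on both sets
```

With a concrete SVM library, `support_vectors` would return the training examples whose α_i > 0, and `make_virtual` would be GenerateByDeletion or GenerateByAddition from the previous slides.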

Experimental Results We evaluated the performance of the two methods depending on the size of the training set. We created subsamples by selecting randomly from the 9,603-example training set, preparing seven sizes: 9,603, 4,802, 2,401, 1,200, 600, 300, and 150.

Experimental Results

Both methods give better performance than the original SVM. The smaller the number of examples in the training set, the larger the gain. For the 9,603-example training set, the gain of GenerateByDeletion is 0.75 (= 90.17 − 89.42), while for the 150-example set the gain is 6.88 (= 60.16 − 53.28).

Experimental Results Combined methods: SVM + 2 VSVs per SV –creates one virtual example each with GenerateByDeletion and GenerateByAddition. SVM + 4 VSVs per SV –creates two virtual examples each with GenerateByDeletion and GenerateByAddition. The best result is achieved by the combined method that creates four virtual examples per support vector.

Experimental Results SVM + 4 VSVs per SV outperforms the original SVM.

Experimental Results We plot the changes of the test error rate (3,299 test examples for each of the 10 categories) in Figure 5. SVM with 4 VSVs still outperforms the original SVM.

Experimental Results The performance changes for each of the 10 categories are shown in Tables 4 and 5. The original SVM fails to find a good hyperplane on the imbalanced training sets, which have very few positive examples. In contrast, SVM with 4 VSVs yields better results in such harder cases.

Conclusion and Future Directions Our proposed methods improve the performance of text classification with SVMs, especially for small training sets. Although the proposed methods are not readily applicable to NLP tasks other than text classification, it is notable that the use of virtual examples, which has been studied very little in NLP, is empirically evaluated here. Future directions: –employ virtual examples together with methods that use both labeled and unlabeled examples. –develop methods to create virtual examples for other NLP tasks (e.g., named entity recognition, POS tagging, and parsing).