Active Learning with Feedback on Both Features and Instances. H. Raghavan, O. Madani and R. Jones. Journal of Machine Learning Research 7 (2006).

Presentation transcript:

Active Learning with Feedback on Both Features and Instances H. Raghavan, O. Madani and R. Jones Journal of Machine Learning Research 7 (2006) Presented by: John Paisley

Outline: Discuss problem; Discuss proposed solution; Discuss results; Conclusion

Problem of Paper Imagine you want to filter junk e-mail via some classifier, and you're willing to help train that classifier by labeling things, but you want to do it quickly because you're impatient. Or imagine you want to sort a database of news articles, etc. This paper is concerned with speeding this process up, i.e., reaching high performance in fewer labeling iterations.

Suggestion of Paper Traditionally, active learning queries a user about instances (articles, e-mails, etc.) and the user provides a label for that instance (one-vs-rest in this paper). This paper suggests that the user also be queried about features (words) and their relevance for distinguishing classes, to speed up the learning process. The reason is that, apparently, in typical applications all words of a document are used as features in classification. Therefore the feature space is very high dimensional and, with only a few labeled examples, it is hard to build a good classifier. By asking about features, the dimensionality is (effectively) reduced early on, with the nuisance dimensions (effectively) removed.

Traditional Active Learning
1. Several instances are selected at random and labeled by a user.
2. A model is built (an SVM using a direct kernel here).
3. Sequentially, the most uncertain instances (closest to the boundary; this is called uncertainty sampling) are selected, labeled, and the model is updated.
4. The algorithm terminates at some point (when a high enough level of performance is reached).
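
To make the loop concrete, here is a minimal sketch of pool-based uncertainty sampling with a linear SVM in scikit-learn. The dataset names, batch size, stopping rule, and SVM settings are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of pool-based active learning with uncertainty sampling.
# X_pool is a document-term matrix, y_pool plays the role of the labeling
# oracle; all names and settings here are illustrative.
import numpy as np
from sklearn.svm import LinearSVC

def uncertainty_sampling_loop(X_pool, y_pool, n_seed=5, n_queries=37, rng=None):
    rng = rng or np.random.default_rng(0)
    # Step 1: random seed set (assumed to contain both classes).
    labeled = list(rng.choice(len(y_pool), size=n_seed, replace=False))
    for _ in range(n_queries):
        # Step 2: fit the model on the currently labeled instances.
        clf = LinearSVC(C=1.0).fit(X_pool[labeled], y_pool[labeled])
        # Step 3: query the instance closest to the decision boundary.
        margins = np.abs(clf.decision_function(X_pool))
        margins[labeled] = np.inf          # never re-query a labeled instance
        labeled.append(int(np.argmin(margins)))
    # Step 4: stop after the query budget is exhausted.
    return clf, labeled
```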

Their Feature Feedback Addition
1. (same) Several instances are selected at random and labeled by a user.
2. (same) A model (an SVM using a direct kernel here) is built.
3. (same) Sequentially, the most uncertain instances (closest to the boundary; uncertainty sampling) are selected, labeled, and the model is updated.
4. Then, the user is shown a list of features (words) and asked whether they are relevant for distinguishing this class from others. Their algorithm incorporates this in further training by simply multiplying that dimension by 10 (an arbitrary constant) to increase the impact that dimension has on classification (because of the direct kernel, I assume).
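
A minimal sketch of how such feature feedback might be folded back into training: columns the user marks as relevant are scaled up by a constant factor before refitting. The factor of 10, the function names, and the scipy/sklearn usage are illustrative assumptions, not the paper's exact incorporation scheme.

```python
# Sketch: up-weight user-approved feature columns by a constant factor
# before refitting the SVM. Names and the factor are illustrative.
import numpy as np
import scipy.sparse as sp

def reweight_features(X, relevant_idx, factor=10.0):
    """Return a copy of X with the given feature columns multiplied by `factor`."""
    scale = np.ones(X.shape[1])
    scale[list(relevant_idx)] = factor
    # Column scaling works for both sparse and dense matrices.
    return X @ sp.diags(scale) if sp.issparse(X) else X * scale

# Usage: refit on the reweighted design matrix.
# X_scaled = reweight_features(X_pool, relevant_idx={12, 305, 4410})
# clf = LinearSVC().fit(X_scaled[labeled], y_pool[labeled])
```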

How They Assess Performance (1) Before humans are involved, they create an oracle that can rank features by importance (it has all labels a priori), as determined via Information Gain (IG), where P(c) is the probability of the class of interest, P(t) is the probability of the word of interest appearing in an article, and P(c,t) is their joint probability. The larger the IG, the more informative the word is for determining the class (e.g., "football" is informative for sports).
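
The IG expression itself did not survive the transcript; the standard information-gain form used to rank words for text classification, consistent with the P(c), P(t), P(c,t) definitions above, is the following (a reconstruction, not copied from the slide):

```latex
IG(t, c) = \sum_{c' \in \{c, \bar{c}\}} \sum_{t' \in \{t, \bar{t}\}}
           P(t', c') \,\log \frac{P(t', c')}{P(t')\, P(c')}
```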

How They Assess Performance (2) They devise a performance metric called efficiency. F1 is the harmonic mean of precision and recall, where precision is the fraction of (e.g.) articles classified as 1 that are correct, and recall is the fraction of all articles with label 1 that are correctly classified as 1. They set M = 1000, assuming that the classifier will be nearly perfect at that point, and they measure how far active learning (ACT) falls short of that near-perfection compared with random sampling. [Figure at right: efficiency is defined as one minus the blue area divided by the grey area. Throughout the paper they only measure after seeing 42 documents.]
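
For reference, the F1 definition and one reading of the efficiency metric as the slide describes it; the efficiency expression is reconstructed from the "one minus blue area over grey area" description, not copied from the paper:

```latex
F_1 = \frac{2PR}{P + R},
\qquad
\text{efficiency} = 1 - \frac{A_{\mathrm{ACT}}}{A_{\mathrm{RND}}}
```

where P is precision, R is recall, and A_ACT and A_RND are (on this reading) the areas between the F1 level reached at M = 1000 labeled documents and the ACT and RND learning curves, respectively.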

Results with Oracle These results show the ideal performance of feature feedback, to see whether it is worthwhile to begin with. Basically, they select the top n features that maximize performance (ranked via Information Gain) and do active learning, reporting the efficiency after 42 documents as well as the F1 score after 7 and 22 documents. The F1 results are upper bounded by the far-right column. The results indicate that selecting the most informative features speeds up learning (the uninformative features are distractions for the classifier in the early stages, when there are only a few labels).
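
A hedged sketch of the oracle-style experiment: rank features by information gain computed from the fully labeled set, keep the top n columns, and run the same active-learning loop on the reduced matrix. Here scikit-learn's mutual_info_classif stands in as the IG estimator; the names and the value of n are illustrative assumptions.

```python
# Sketch of the oracle feature-selection experiment: rank features by
# information gain on the fully labeled set, keep the top n, then run
# active learning on the reduced matrix.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def oracle_top_features(X, y, n=128):
    ig = mutual_info_classif(X, y, discrete_features=True)  # IG proxy per feature
    return np.argsort(ig)[::-1][:n]                          # indices of the top-n features

# top = oracle_top_features(X_pool, y_pool, n=128)
# clf, labeled = uncertainty_sampling_loop(X_pool[:, top], y_pool)
```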

Results with Human How well can a human label features compared with the oracle, and, if not as well, is it still beneficial? Experiment: have a human read an article and show them the top 20 words from the oracle mixed in with some other words; have the user mark each as relevant or not relevant/don't know. Below, the human is compared with the oracle. Also shown is the ability of 50 labeled documents (picked via uncertainty sampling) to select the top 20 words (via Information Gain), i.e., traditional active learning after 50 documents. What it says is that after seeing one document, a human can identify the relevant features better than the classifier can after 50. Kappa is a measure of how well the humans agree with each other (which they say is good).
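
For context, the kappa statistic mentioned here quantifies agreement beyond chance. The standard two-annotator (Cohen's) form is shown below; the paper may use a multi-rater variant, a detail not preserved in the transcript:

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}
```

where p_o is the observed agreement rate and p_e is the agreement expected by chance.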

Putting Humans in the Loop They then took the human responses and simulated active learning with feature feedback. The experimenters were shown an article and the features to respond to (relevant or not) for that article, and they entered what the humans from the previous slide had said. UNC is no feature feedback, ORA is the oracle (correct answers to the feature queries), and HIL is the human responses (as opposed to the oracle). The result is that humans speed up the active learning process.
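
Putting the earlier sketches together, the simulated human-in-the-loop (HIL) condition might look roughly like this: after each uncertainty-sampling query, the recorded human judgments for that article's features are applied as column reweighting before the next refit. All names refer to the illustrative helpers above, and the `human_feedback` lookup is a placeholder for the recorded responses, not the paper's data format.

```python
# Sketch of the simulated HIL condition: uncertainty sampling plus recorded
# human feature judgments applied as column reweighting after each query.
# Assumes numpy, LinearSVC, and reweight_features from the earlier sketches.
def hil_loop(X_pool, y_pool, human_feedback, n_seed=5, n_queries=37, rng=None):
    rng = rng or np.random.default_rng(0)
    labeled = list(rng.choice(len(y_pool), size=n_seed, replace=False))
    relevant = set()
    clf = None
    for _ in range(n_queries):
        X_cur = reweight_features(X_pool, relevant) if relevant else X_pool
        clf = LinearSVC(C=1.0).fit(X_cur[labeled], y_pool[labeled])
        margins = np.abs(clf.decision_function(X_cur))
        margins[labeled] = np.inf
        q = int(np.argmin(margins))
        labeled.append(q)                              # instance label query
        relevant |= set(human_feedback.get(q, []))     # feature feedback for this article
    return clf, labeled, relevant
```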

Conclusions Knowing which features are relevant in the early stages of active learning helps speed up the process of building an accurate classifier. Far fewer instances need to be labeled for the classifier to reach high performance. Humans are able to identify these features (in this case, identifying relevant words for documents).