Effective Multi-Label Active Learning for Text Classification
Bishan Yang, Jian-Tao Sun, Tengjiao Wang, Zheng Chen
KDD'09
Supervisor: Koh Jia-Ling
Presenter: Nonhlanhla Shongwe
Date:

Preview
- Introduction
- Optimization framework
- Experiment
- Results
- Summary

Introduction
- Text data has become a major information source in our daily life
- Text classification helps to better organize text data, e.g.
  - Document filtering
  - Email classification
  - Web search
- Text classification tasks are often multi-labeled
  - Each document can belong to more than one category

Introduction (cont'd)
Example: a single document can belong to several categories at once, e.g. World news, Politics, and Education.

Introduction (cont'd)
- Supervised learning
  - Trained on randomly labeled data
  - Requires a sufficient amount of labeled data
  - Labeling is time consuming and an expensive process done by domain experts
- Active learning
  - Reduces labeling cost

Introduction (cont'd)
- How does an active learner work?
  A loop over the data pool: train a classifier on the labeled set D_l, apply a selection strategy to pick an optimal set of unlabeled examples, query an oracle for their true labels, and augment the labeled set D_l with the newly labeled data.
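A minimal Python sketch of this loop (the function and parameter names such as train, score, and oracle are hypothetical placeholders; the defaults n_rounds=50 and batch_size=20 simply mirror the experimental settings reported later in the deck):

def active_learning_loop(labeled, unlabeled, train, score, oracle,
                         n_rounds=50, batch_size=20):
    """Generic pool-based active learning loop (sketch).

    labeled   -- list of (x, label_set) pairs, the initial labeled set D_l
    unlabeled -- list of unlabeled examples (the data pool)
    train     -- callable(labeled) -> model
    score     -- callable(model, x) -> informativeness of x (higher = more informative)
    oracle    -- callable(x) -> true label set (the human annotator)
    """
    for _ in range(n_rounds):
        model = train(labeled)                                   # train classifier on D_l
        ranked = sorted(unlabeled, key=lambda x: score(model, x), reverse=True)
        batch = ranked[:batch_size]                              # selection strategy
        labeled = labeled + [(x, oracle(x)) for x in batch]      # query for true labels
        unlabeled = [x for x in unlabeled if x not in batch]     # augment D_l, shrink pool
    return train(labeled)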

Introduction (cont'd)
- Challenges for multi-label active learning
  - How to select the most informative multi-labeled data?
  - Can we use a single-label selection strategy? NO
- Example (per-class probabilities of two unlabeled examples):

  Example   c1    c2    c3
  x1        0.8   0.1   0.1
  x2        0.7   0.5   0.1

  Looking only at the most confident class, x1 and x2 appear similar (0.8 vs. 0.7), but x2 is far more informative in the multi-label setting because its membership in c2 is highly uncertain (0.5).

Optimization framework
- Goal: label the data that can help maximize the reduction of the expected loss
- Notation (symbol table on the slide): input distribution, training set, prediction function given a training set, predicted label set of x, estimated loss, unlabeled data pool

Optimization framework (cont'd)
[The slide shows the expected-loss formula: the expected loss E is the classifier's loss L integrated over the input distribution p(x), with an indicator that equals 1 if x belongs to class j.]
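A hedged sketch of the objective in generic active-learning notation (not necessarily the slide's exact symbols): select the unlabeled example whose labeling is expected to reduce the loss the most,

$$x^{*} = \arg\max_{x \in D_u} \left( \int L\big(f_{D_l}\big)\,p(x')\,dx' \;-\; \int L\big(f_{D_l \cup \{(x, y_x)\}}\big)\,p(x')\,dx' \right)$$

where D_l is the labeled set, D_u the unlabeled pool, f_D the classifier trained on D, L the estimated loss, and y_x the (unknown) label set of x.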

Optimization framework (cont'd)
- The optimization problem can be divided into two parts:
  - How to measure the loss reduction
  - How to provide a good probability estimation

Optimization framework (cont'd)
- How to measure the loss reduction?
  - Loss of the classifier: measure the model loss by the size of the version space of a binary SVM
  - Here W denotes the parameter space; the size of the version space is defined as the surface area of the hypersphere ||w|| = 1 in W
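For reference, a sketch of the standard SVM version-space definition that this measure presumably follows (notation assumed, not copied from the slide): for the binary classifier of class i,

$$\mathcal{V}_i = \left\{\, \mathbf{w} \in \mathcal{W} \;:\; \|\mathbf{w}\| = 1,\; y_k\,\big(\mathbf{w} \cdot \Phi(x_k)\big) > 0 \ \ \forall (x_k, y_k) \in D_l \,\right\}$$

and the model loss is taken to be the size Area(V_i), the surface area that V_i occupies on the unit hypersphere.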

Optimization framework (cont'd)
- How to measure the loss reduction?
  - Using the version space, the loss reduction rate can be approximated from the SVM output margin
  - In the slide's formula: the loss of the binary classifier built on D_l for class i, the size of the version space of that classifier, and y = 1 if x belongs to class i, otherwise y = -1

Optimization framework (cont'd)
- How to measure the loss reduction?
  - Maximize the sum of the loss reductions of all binary classifiers
  - Intuition: if f predicts x correctly, a larger |f(x)| means lower uncertainty and a smaller loss reduction; if f predicts x incorrectly, a larger |f(x)| means a larger loss
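A simplified Python sketch of such a margin-based score (an assumption-level reading of the slide, not the paper's exact MMC formula; each binary SVM contributes a term that grows when its margin is small or its sign disagrees with the predicted label):

import numpy as np

def multilabel_margin_score(svm_margins, predicted_labels):
    """Sum of approximate per-class loss reductions for one example (sketch)."""
    f = np.asarray(svm_margins, dtype=float)        # f_i(x): real-valued SVM outputs
    y = np.asarray(predicted_labels, dtype=float)   # predicted labels y_i in {-1, +1}
    # Each term is large when the margin is small or when the sign of f_i(x)
    # disagrees with the predicted label y_i, i.e. when x is informative for class i.
    return float(np.sum((1.0 - y * f) / 2.0))

# Usage (hypothetical): pick the unlabeled example with the largest score
# best = max(unlabeled_pool, key=lambda x: multilabel_margin_score(margins(x), labels(x)))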

Optimization framework (cont'd)
- How to provide a good probability estimation?
  - It is intractable to directly compute the expected loss function:
    - Limited training data
    - Large number of possible label vectors (2^k for k classes)
  - Instead, approximate the expected loss by the loss under the label vector with the largest conditional probability
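In symbols (a sketch with generic notation, where y ranges over the possible label vectors of x):

$$\sum_{\mathbf{y}} P(\mathbf{y} \mid x)\, L\big(f_{D_l \cup \{(x,\mathbf{y})\}}\big) \;\approx\; L\big(f_{D_l \cup \{(x,\mathbf{y}^{*})\}}\big), \qquad \mathbf{y}^{*} = \arg\max_{\mathbf{y}} P(\mathbf{y} \mid x)$$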

Optimization framework (cont'd)
- How to provide a good probability estimation?
  - A label-prediction approach addresses this problem:
    - First decide the probable number of labels for each data point
    - Then determine the final labels based on the per-label probability estimates

Optimization framework (cont'd)
- How to provide a good probability estimation?
  1. Assign a probability output for each class.
  2. For each x, sort the classification probabilities in decreasing order and normalize them so that they sum to 1.
  3. Train a logistic regression classifier whose features are these sorted, normalized probabilities and whose label is the true label number of x.
  4. For each unlabeled data point, predict the probabilities of having different numbers of labels.
  5. If the label number with the largest probability is j, take the j classes with the highest probabilities as the predicted label set.
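A minimal sketch of steps 1-5 using scikit-learn (the helper names and the exact feature construction are assumptions, not the paper's code):

import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_label_number_predictor(class_probs, label_counts):
    """Fit a predictor of how many labels an example has (sketch).

    class_probs  -- (n_samples, n_classes) per-class probability outputs
    label_counts -- (n_samples,) true number of labels of each training example
    """
    feats = np.sort(class_probs, axis=1)[:, ::-1]        # sort each row in decreasing order
    feats = feats / feats.sum(axis=1, keepdims=True)      # normalize so each row sums to 1
    clf = LogisticRegression(max_iter=1000)
    clf.fit(feats, label_counts)
    return clf

def predict_label_set(clf, probs):
    """Predict the label set of one example from its per-class probabilities."""
    probs = np.asarray(probs, dtype=float)
    feats = np.sort(probs)[::-1]
    feats = feats / feats.sum()
    j = int(clf.predict(feats.reshape(1, -1))[0])         # most probable number of labels
    top_j = np.argsort(probs)[::-1][:j]                   # the j highest-probability classes
    return set(top_j.tolist())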

Experiment
- Data sets used:
  - RCV1-V2 text data set [D. D. Lewis 04], containing documents falling into 101 categories
  - Yahoo! webpage collections gathered through hyperlinks: Arts & Humanities, Business & Economy, Computers & Internet, Education, Entertainment, Health (the slide's table also lists the number of instances, features, and labels for each data set)

Experiment (cont'd)
- Compared methods:
  - MMC (Maximum loss reduction with Maximal Confidence): the sample selection strategy proposed in this paper
  - Random: randomly selects data examples from the unlabeled pool
  - Mean Max Loss (MML): baseline that measures the loss using the predicted labels (defined by the formula on the slide)
  - BinMin: baseline selection strategy (defined by the formula on the slide)

Results (cont'd)
- Comparison of labeling methods:
  - The proposed method
  - SCut [D. D. Lewis 04]: tunes a threshold for each class
  - SCut with threshold = 0

Results (cont'd)
- Initial labeled set: 500 examples
- 50 iterations, S = 20 (examples selected per iteration)

Results (cont'd)
- Vary the size of the initial labeled set
- 50 iterations, S = 20

Results (cont'd)
- Vary the sampling size per run; initial labeled set: 500 examples
- Stop after adding a fixed amount of labeled data

Results (cont'd)
- Initial labeled set: 500 examples
- Iterations: 50, S = 50

Summary
- Multi-label active learning for text classification
  - Important for reducing human labeling effort
  - A challenging task
- SVM-based multi-label active learning
  - Optimizes the loss reduction rate based on the SVM version space
  - Uses an effective label prediction method
- From the results
  - Successfully reduces labeling effort on real-world datasets and outperforms the other methods

Thank you for listening.