Unsupervised Transfer Classification Application to Text Categorization Tianbao Yang, Rong Jin, Anil Jain, Yang Zhou, Wei Tong Michigan State University.

Presentation transcript:


Overview  Introduction  Related Work  Unsupervised Transfer Classification  Problem Definition  Approach & Analysis  Experiments  Conclusions

Introduction  Classification:  supervised learning  semi-supervised learning  What if No label information is available?  impossible but not with some additional information supervised semi-supervised unsupervised classification

Introduction  Unsupervised transfer classification (UTC)  a collection of training examples and their assignments to auxiliary classes  to build a classification model for a target class …. auxiliary class 1auxiliary class K target class No Labeled training examples prior conditional probabilities

Introduction: Motivating Examples  Image annotation: training images are annotated with auxiliary words (sky, sun, water, grass)  Social tagging: pages are tagged with auxiliary tags (phone, verizon, apple, google)  How to predict an annotation word / social tag (a target class) that does not appear in the training data?

Related Work  Transfer Learning  transfer knowledge from source domain to target domain  similarity: transfer label information for auxiliary classes to target class  difference: assume NO label information for target class  Multi-Label Learning, Maximum Entropy Model

Unsupervised Transfer Classification  Data: examples with assignments to auxiliary classes  Class information: prior probability of the target class and conditional probabilities relating it to the auxiliary classes  Goal: a target classification model that predicts the target class label
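The setup can be written compactly; the notation below is ours, since the slide's own symbols were rendered as images:

```latex
\text{Given: } \mathcal{D} = \{(\mathbf{x}_i,\, z_i^1, \dots, z_i^K)\}_{i=1}^{n}, \qquad
p(y = 1), \qquad p(y = 1 \mid z^k = 1),\ k = 1, \dots, K
```

where the z^k are the auxiliary-class assignments and y is the target class; the goal is a model p(y | x) learned without any labeled examples of y.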

Maximum Entropy Model (MaxEnt)  Favor a distribution close to uniform  Equate feature statistics computed from the conditional model with feature statistics computed from the training data  f_j: the jth feature function
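The equations on this slide were images in the original deck; the standard conditional MaxEnt formulation they describe (notation assumed) is:

```latex
\max_{p}\ -\sum_{i=1}^{n} \sum_{y} p(y \mid \mathbf{x}_i) \log p(y \mid \mathbf{x}_i)
\quad \text{s.t.} \quad
\frac{1}{n} \sum_{i=1}^{n} \sum_{y} p(y \mid \mathbf{x}_i)\, f_j(\mathbf{x}_i, y)
= \frac{1}{n} \sum_{i=1}^{n} f_j(\mathbf{x}_i, y_i), \quad \forall j
```

The left-hand side of each constraint is the statistic under the conditional model; the right-hand side is the statistic from the training data.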

Generalized MaxEnt  With a large probability, the empirical feature statistics lie close to their expectations  Replace the equality constraints with inequality constraints
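Because the empirical statistics only concentrate around their expectations with large probability, the equalities are relaxed to inequalities; in the same assumed notation:

```latex
\left|\, \frac{1}{n} \sum_{i=1}^{n} \sum_{y} p(y \mid \mathbf{x}_i)\, f_j(\mathbf{x}_i, y)
\;-\; \frac{1}{n} \sum_{i=1}^{n} f_j(\mathbf{x}_i, y_i) \,\right| \le \varepsilon_j, \quad \forall j
```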

Generalized MaxEnt

The empirical feature statistics are unknown for the target class  How to extend generalized MaxEnt to unsupervised transfer classification?

Unsupervised Transfer Classification  Estimate the feature statistics of the target class from those of the auxiliary classes

 Build up the relation between the auxiliary classes and the target class  Independence assumption

Unsupervised Transfer Classification  Estimating feature statistics for the target class by regression Feature Statistics for Auxiliary Classes Feature Statistics for Target Class Class Information

Unsupervised Transfer Classification  Dual problem : function of U; definition can be found in paper

Consistency Result  With a large probability, the dual solution obtained by the proposed approach is close to the optimal dual solution obtained using the label information for the target class

Experiments  Text categorization  Data sets: multi-labeled data  Protocol: leave one-class out as the target class  Metric: AUC (Area under ROC curve)

Experiments: Baselines  cModel  train a classifier for each auxiliary class  linearly combine them for the target class  cLabel  predict the assignment of the target class for training examples by linearly combining the labels of auxiliary classes  train a classifier using the predicted labels for target class  GME-avg  use generalized maxent model  compute the feature statistics for the target class by linearly combining those for the auxiliary classes  Proposed Approach: GME-Reg

Experiment (I)  Estimate class information from training data

Experiment (I)  Compare to the classifier for the target class learned by supervised learning

Experiment (II)  Obtain class information from external sources  Datasets: bibtex and delicious  bibsonomy  bibtexwww.bibsonomy.org/tags  ACM DL  bibtexwww.portal.acm.org  d eli.cio.us  deliciouswww.delicious.com/tag

Experiment (II)  Comparison with Supervised Classification ~1200

Conclusions  A new problem: unsupervised transfer classification  A statistical framework for unsupervised transfer classification  based on generalized maximum entropy  robust estimate feature statistics for target class  provable performance by consistency analysis  Future Work  relax independence assumption  better estimation of feature statistics for target class

Thanks! Questions?