Co-clustering based classification for Out-of-domain Documents


Wenyuan Dai, Gui-Rong Xue, Qiang Yang, Yong Yu
Presented by Venkata Ramana Reddy Banda

ABSTRACT We aim to learn from the in-domain data and apply the learned knowledge to the out-of-domain data. We propose a co-clustering based classification (CoCC) algorithm to tackle this problem. Co-clustering is used as a bridge to propagate the class structure and knowledge from the in-domain to the out-of-domain.

INTRODUCTION [Figure: the in-domain documents $D_i$ carry the class label set $C$; the out-of-domain documents $D_o$ are unlabeled but share a vocabulary with $D_i$. Word clustering (1) over the shared words and co-clustering (2) of the documents in $D_o$ connect the two domains through the words they have in common.]

RELATED WORK Classification Learning Multi-task and Multi-domain Learning Semi-supervised Clustering

PRELIMINARIES Let X and Y be discrete random variables with a joint distribution p(X, Y) and marginal distributions p(X) and p(Y). The mutual information I(X; Y) is defined as
$I(X;Y) = \sum_x \sum_y p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$.
The Kullback-Leibler (KL) divergence, or relative entropy, defined for two probability mass functions p(x) and q(x), is
$D(p\,\|\,q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$.
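
To make the definitions concrete, here is a minimal numpy sketch of both quantities (my own illustration, not from the paper; it uses the natural logarithm, whereas the paper may measure in bits):

```python
import numpy as np

def mutual_information(P):
    """I(X;Y) for a joint distribution given as a 2-D array P[x, y]."""
    px = P.sum(axis=1, keepdims=True)   # marginal p(x)
    py = P.sum(axis=0, keepdims=True)   # marginal p(y)
    m = P > 0                           # terms with p(x,y) = 0 contribute nothing
    return float(np.sum(P[m] * np.log(P[m] / (px @ py)[m])))

def kl_divergence(p, q):
    """D(p || q) for two probability mass functions on the same support."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))
```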

PROBLEM FORMULATION Let $\hat{D}_o$ denote the out-of-domain document clustering and $\hat{W}$ denote the word clustering, where $|\hat{W}| = k$. The document cluster-partition function $C_{D_o}$ and the word cluster-partition function $C_W$ can be defined as
$C_{D_o}(d) = \hat{d}$, where $d \in \hat{d} \wedge \hat{d} \in \hat{D}_o$ (3)
$C_W(w) = \hat{w}$, where $w \in \hat{w} \wedge \hat{w} \in \hat{W}$ (4)
where $\hat{d}$ represents the document cluster that d belongs to and $\hat{w}$ represents the word cluster that w belongs to. Then, the co-clustering can be represented by $(C_{D_o}, C_W)$ or $(\hat{D}_o, \hat{W})$.
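
In code, a cluster-partition function is naturally a lookup array from item index to cluster index. This hypothetical toy setup (sizes and assignments are mine, purely for illustration) is the representation used in the sketches that follow:

```python
import numpy as np

# Toy co-clustering: 5 out-of-domain documents, 8 words, k = 3 word clusters.
doc_cluster = np.array([0, 0, 1, 2, 1])            # C_Do(d) = doc_cluster[d]
word_cluster = np.array([0, 1, 1, 0, 2, 2, 1, 0])  # C_W(w) = word_cluster[w]
```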

PROBLEM FORMULATION We define the loss in mutual information for co-clustering as
$I(D_o; W) - I(\hat{D}_o; \hat{W})$. (5)
We define the loss in mutual information for word clustering as
$I(C; W) - I(C; \hat{W})$. (6)
Integrating Equations (5) and (6), the loss function for co-clustering based classification is obtained:
$I(D_o; W) - I(\hat{D}_o; \hat{W}) + \lambda \cdot \big(I(C; W) - I(C; \hat{W})\big)$, (7)
where $\lambda$ is a trade-off parameter that balances the influence on the word clusters from co-clustering (Equation (5)) and from word clustering (Equation (6)). The objective is to find a co-clustering that minimizes Equation (7). The objective function in Equation (7) can be rewritten in an equivalent form in terms of KL-divergence:
$D\big(f(D_o, W)\,\|\,\hat{f}(D_o, W)\big) + \lambda \cdot D\big(g(C, W)\,\|\,\hat{g}(C, W)\big)$. (8)
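
The loss in Equation (7) can be evaluated directly, since summing a joint distribution within clusters yields the clustered joints, whose mutual information is $I(\hat{D}_o;\hat{W})$ and $I(C;\hat{W})$. A sketch, reusing mutual_information from above; the probability matrices F and G (joints over documents/words and classes/words) and the name cocc_loss are my assumptions matching the slide notation:

```python
import numpy as np

def cocc_loss(F, G, doc_cluster, word_cluster, lam):
    """Equation (7): I(Do;W) - I(Do^;W^) + lam * (I(C;W) - I(C;W^)).

    F : (n_docs, n_words) joint f(Do, W); G : (n_classes, n_words) joint g(C, W).
    """
    D = np.eye(doc_cluster.max() + 1)[doc_cluster]    # one-hot doc memberships
    W = np.eye(word_cluster.max() + 1)[word_cluster]  # one-hot word memberships
    Fc = D.T @ F @ W    # joint over (d_hat, w_hat): mass summed within clusters
    Gc = G @ W          # joint over (c, w_hat)
    return (mutual_information(F) - mutual_information(Fc)
            + lam * (mutual_information(G) - mutual_information(Gc)))
```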

CO-CLUSTERING BASED CLASSIFICATION The objective function described in (8) decomposes into per-document and per-word parts, which allows it to be minimized by alternating updates.
Lemma 2. $D\big(f(D_o, W)\,\|\,\hat{f}(D_o, W)\big) = \sum_{\hat{d} \in \hat{D}_o} \sum_{d \in \hat{d}} f(d)\, D\big(f(W|d)\,\|\,\hat{f}(W|\hat{d})\big) = \sum_{\hat{w} \in \hat{W}} \sum_{w \in \hat{w}} f(w)\, D\big(f(D_o|w)\,\|\,\hat{f}(D_o|\hat{w})\big)$
Lemma 3. $D\big(g(C, W)\,\|\,\hat{g}(C, W)\big) = \sum_{\hat{w} \in \hat{W}} \sum_{w \in \hat{w}} g(w)\, D\big(g(C|w)\,\|\,\hat{g}(C|\hat{w})\big)$.

ALGORITHM
Input: a labeled in-domain data set $D_i$; an unlabeled out-of-domain data set $D_o$; the set $C$ of all class labels; the set $W$ of all word features; an initial co-clustering $(C^{(0)}_{D_o}, C^{(0)}_W)$; the number of iterations $T$.
Initialize the joint probability distributions $f$, $\hat{f}$, $g$ and $\hat{g}$ based on Equations (8), (9), (10) and (11), respectively.
For $t \leftarrow 1, 3, 5, \ldots, 2T+1$:
1: Compute the document clustering: $C^{(t)}_{D_o}(d) = \arg\min_{\hat{d}} D\big(f(W|d)\,\|\,\hat{f}^{(t-1)}(W|\hat{d})\big)$.
2: Update the probability distribution $\hat{f}^{(t)}$ based on $C^{(t)}_{D_o}$, $C^{(t-1)}_W$, and Equation (9); set $C^{(t)}_W = C^{(t-1)}_W$ and $\hat{g}^{(t)} = \hat{g}^{(t-1)}$.
3: Compute the word clustering: $C^{(t+1)}_W(w) = \arg\min_{\hat{w}} \big[\, f(w)\, D\big(f(D_o|w)\,\|\,\hat{f}^{(t)}(D_o|\hat{w})\big) + \lambda \cdot g(w)\, D\big(g(C|w)\,\|\,\hat{g}^{(t)}(C|\hat{w})\big) \,\big]$.
4: Update the probability distribution $\hat{g}^{(t+1)}$ based on $C^{(t+1)}_W$ and Equation (11); set $\hat{f}^{(t+1)} = \hat{f}^{(t)}$ and $C^{(t+1)}_{D_o} = C^{(t)}_{D_o}$.
End For
Output: the partition functions $C^{(T)}_{D_o}$ and $C^{(T)}_W$.
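
For concreteness, here is a loose, unoptimized numpy sketch of the alternating loop under my reading of the update rules. The dense probability matrices, the epsilon guards for empty clusters, and all names are assumptions rather than the authors' code; in particular, this naive version does not exploit sparsity and so does not reach the complexity stated below.

```python
import numpy as np

def _kl(p, q, eps=1e-12):
    # D(p || q); zero-probability terms in p contribute nothing.
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / (q[m] + eps))))

def cocc(F, G, doc_cluster, word_cluster, lam=0.25, T=10):
    """Alternating CoCC updates (sketch).

    F : (n_docs, n_words) joint f(Do, W) from out-of-domain word counts
    G : (n_classes, n_words) joint g(C, W) from labeled in-domain counts
    doc_cluster, word_cluster : initial integer cluster assignments
    """
    n_dc, n_wc = doc_cluster.max() + 1, word_cluster.max() + 1
    fd, fw, gw = F.sum(axis=1), F.sum(axis=0), G.sum(axis=0)   # marginals
    eps = 1e-12
    for _ in range(T):
        Wm = np.eye(n_wc)[word_cluster]                 # one-hot word memberships
        Fc = np.eye(n_dc)[doc_cluster].T @ F @ Wm       # f(d_hat, w_hat)
        # Step 1: reassign each doc d to argmin_dhat D(f(W|d) || f_hat(W|d_hat)),
        # with f_hat(w|d_hat) = f(w_hat|d_hat) * f(w|w_hat).
        f_w_wc = fw / (Fc.sum(axis=0)[word_cluster] + eps)
        fhat_w = (Fc / (Fc.sum(axis=1, keepdims=True) + eps))[:, word_cluster] * f_w_wc
        f_w_d = F / (fd[:, None] + eps)
        doc_cluster = np.array([
            np.argmin([_kl(f_w_d[d], fhat_w[k]) for k in range(n_dc)])
            for d in range(F.shape[0])])
        Fc = np.eye(n_dc)[doc_cluster].T @ F @ Wm       # refresh with new doc clusters
        Gc = G @ Wm                                     # g(c, w_hat)
        # Step 2: reassign each word w, trading off the document-side and
        # class-side KL terms with lambda, as in step 3 of the algorithm.
        f_d_dc = fd / (Fc.sum(axis=1)[doc_cluster] + eps)
        fhat_d = (Fc / (Fc.sum(axis=0, keepdims=True) + eps))[doc_cluster, :] * f_d_dc[:, None]
        f_d_w = F / (fw[None, :] + eps)
        g_c_w = G / (gw[None, :] + eps)
        ghat = Gc / (Gc.sum(axis=0, keepdims=True) + eps)
        word_cluster = np.array([
            np.argmin([fw[w] * _kl(f_d_w[:, w], fhat_d[:, k])
                       + lam * gw[w] * _kl(g_c_w[:, w], ghat[:, k])
                       for k in range(n_wc)])
            for w in range(F.shape[1])])
    return doc_cluster, word_cluster
```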

This algorithm converges in a finite number of iterations. The time complexity of our co-clustering based classification algorithm is $O((|C| + |\hat{W}|) \cdot T \cdot N)$. The space complexity is $O(N)$.

EXPERIMENTS Data Sets Comparison Methods Implementation Details Evaluation Metrics Experimental Results

EVALUATION METRICS The performance of the proposed methods was evaluated by test error rate. Let C be the function that maps a document d to its true class label c = C(d), and F be the function that maps a document d to the label c = F(d) predicted by the classifier. The test error rate is defined as
$\varepsilon = \frac{|\{d \mid d \in D_o \wedge C(d) \neq F(d)\}|}{|D_o|}$.
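
Computed directly (y_true and y_pred are assumed to be label arrays over the documents in $D_o$):

```python
import numpy as np

def test_error_rate(y_true, y_pred):
    """epsilon = |{d in Do : C(d) != F(d)}| / |Do|."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))
```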

PERFORMANCE

Convergence; Parameter Tuning; KL-Divergence and Improvement

Conclusions CoCC monotonically reduces the objective function value and outperforms traditional supervised and semi-supervised classification algorithms when classifying out-of-domain documents. The number of word clusters required for good performance is quite large (128 clusters in the experiments); since the time complexity of CoCC depends on the number of word clusters, it can be inefficient. The parameters in CoCC are tuned manually.

QUERIES Thank You