
1 Transfer Learning From Multiple Source Domains via Consensus Regularization. Ping Luo, Fuzhen Zhuang, Hui Xiong, Yuhong Xiong, Qing He

2 Overview: Introduction; Preliminaries; Consensus Regularization; Experimental Evaluation; Related Work; Conclusions

3 Research Motivation (1). How to exploit the distribution differences among multiple source domains to boost learning performance in a target domain, and how to deal with the situation where the source domains are geographically separated and subject to privacy concerns.

4 Research Motivation (2). Motivating examples. Web page classification: label Web pages from multiple universities to find course main pages via text classification; different universities use different terms to describe the course metadata. Video concept detection: generalize models to detect semantic concepts from multiple sources of video data. Common features: (1) multiple source domains with different data distributions; (2) separated source domains.

5 Challenges and Contributions. New challenges: how to make good use of the distribution mismatch among multiple source domains to improve prediction performance on the target domain; how to extend consensus regularization to a distributed implementation that modestly preserves privacy. Contributions: propose a consensus regularization based algorithm for transfer learning from multiple source domains; perform it in a distributed and modestly privacy-preserving manner.

6 Overview: Introduction; Preliminaries; Consensus Regularization; Experimental Evaluation; Related Work; Conclusions

7 Consensus Measuring (1). Example: a three-class classification problem in which three classifiers predict an instance x. Minimal entropy corresponds to maximal consensus; maximal entropy corresponds to minimal consensus.

8 Consensus Measuring (2). Example: a two-class classification problem in which three classifiers predict an instance x. Due to the computational cost of the entropy, for two-entry probability distribution vectors we can simplify the consensus measure as:
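
A minimal sketch of these two measures, assuming consensus is computed from the classifiers' averaged class-probability vector; the simplified binary form below (squared distance of the averaged positive-class probability from 0.5) is an assumed stand-in for the C_s defined on the slide, and all function names are illustrative.

```python
import numpy as np

def consensus_entropy(pred_probs):
    """Entropy-based consensus C_e for one instance x.

    pred_probs: (m, k) array holding the class-probability vectors of the
    m classifiers. Lower entropy of the averaged vector means higher
    consensus, so the negated entropy is returned.
    """
    avg = pred_probs.mean(axis=0)
    entropy = -np.sum(avg * np.log(avg + 1e-12))
    return -entropy

def consensus_binary(pred_pos_probs):
    """Simplified measure for two-class problems: distance of the averaged
    positive-class probability from the maximally uncertain value 0.5."""
    p = np.mean(pred_pos_probs)
    return (p - 0.5) ** 2

# Three classifiers on a three-class problem: full agreement vs. total disagreement.
agree = np.array([[1.0, 0.0, 0.0]] * 3)
disagree = np.eye(3)
print(consensus_entropy(agree) > consensus_entropy(disagree))  # True: agreement scores higher
```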

9 Logistic Regression [Davie et al, 2000]. Logistic regression is an approach to learning a classification model with discrete outputs. Given: a training data set X, where each instance is a vector of discrete or continuous random variables, and discrete outputs Y. Maximize the following formula to obtain the model w: Classification:
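
A minimal sketch of such a logistic model, assuming L2-regularized binary logistic regression trained by gradient ascent on the log-likelihood; the exact regularization and optimizer used in the paper may differ, and all names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(w, X, y, lam=1.0):
    """L2-regularized log-likelihood for labels y in {0, 1}."""
    p = sigmoid(X @ w)
    return np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)) - 0.5 * lam * w @ w

def fit(X, y, lam=1.0, lr=0.1, iters=500):
    """Gradient ascent on the log-likelihood to obtain the model w."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (y - sigmoid(X @ w)) - lam * w
        w += lr * grad / len(y)
    return w

def predict(w, X):
    """Classification: pick the label with the larger posterior probability."""
    return (sigmoid(X @ w) >= 0.5).astype(int)
```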

10 Overview: Introduction; Preliminaries; Consensus Regularization; Experimental Evaluation; Related Work; Conclusions

11 Problem Formulation (1). Given: m source domains of labeled data, where the l-th source domain is represented by its labeled instances; an unlabeled target domain; and the assumption that all domains have different but closely related distributions. Find: m classifiers such that each classifier covers the knowledge from its source domain, and the classifiers achieve a high degree of consensus on their prediction results on the target domain.

12 Problem Formulation (2). Formulation: adapt the supervised learning framework with consensus regularization, and output m models that maximize an objective combining two terms: the probability of each hypothesis given its observed source data set, and the consensus degree of the prediction results of these classifiers on the target domain.
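
A minimal sketch of one way such an objective can be written down, assuming the per-domain term is the regularized logistic log-likelihood sketched above and the consensus term is the simplified binary measure weighted by a trade-off parameter theta; the exact form in the paper may differ, and all names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def objective(ws, source_sets, X_target, theta, lam=1.0):
    """Consensus-regularized objective over m models ws = [w_1, ..., w_m].

    source_sets: list of (X_l, y_l) labeled source domains, one per model.
    X_target:    unlabeled target-domain instances.
    theta:       trade-off between per-domain fit and cross-model consensus.
    """
    # Term 1: regularized log-likelihood of each model on its own source domain.
    fit_term = 0.0
    for w, (X, y) in zip(ws, source_sets):
        p = sigmoid(X @ w)
        fit_term += np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
        fit_term -= 0.5 * lam * w @ w

    # Term 2: consensus of the m models on each unlabeled target instance.
    probs = np.stack([sigmoid(X_target @ w) for w in ws])  # shape (m, n_target)
    avg = probs.mean(axis=0)
    consensus = np.sum((avg - 0.5) ** 2)

    return fit_term + theta * consensus
```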

13 Why Consensus Regularization (1). In this study we focus on binary classification problems with the labels 1 and -1, and the number of classifiers m = 3. The non-trivial classifier can be restated as:

14 Why Consensus Regularization (2). Thus, minimizing the disagreement among the classifiers decreases the classification error.

15 Consensus Regularization by Logistic Regression (1). The proposed consensus regularization framework outputs m logistic models, which minimize: For the binary classification problem, the entropy-based consensus measure C_e is equivalent to C_s. Thus, the objective function can be rewritten as:

16 Consensus Regularization by Logistic Regression (2). The partial derivative of the objective decomposes into two parts. The first is a function of a local classifier and the data from the corresponding source domain, so it can be computed locally on each source domain. The second is a function of all the local classifiers and the data from the target domain, so it can be computed on the target domain with all the classifiers.
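
A minimal sketch of that two-part gradient, consistent with the objective sketched earlier: gradient_local depends only on one source domain, while gradient_consensus depends on the target data and all current models. The exact expressions in the paper may differ, and all names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_local(w, X_src, y_src, lam=1.0):
    """Part 1: depends only on (X_src, y_src), so it is computed on that source node."""
    return X_src.T @ (y_src - sigmoid(X_src @ w)) - lam * w

def gradient_consensus(ws, l, X_tgt, theta):
    """Part 2: depends on the target data and all models, so it is computed on the target node.

    Gradient of theta * sum((avg_prob - 0.5)^2) with respect to w_l.
    """
    m = len(ws)
    probs = np.stack([sigmoid(X_tgt @ w) for w in ws])         # (m, n_target)
    avg = probs.mean(axis=0)
    p_l = probs[l]
    coeff = 2.0 * theta * (avg - 0.5) * p_l * (1.0 - p_l) / m  # chain rule through the average
    return X_tgt.T @ coeff

def gradient(ws, l, X_src, y_src, X_tgt, theta, lam=1.0):
    """Full ascent direction for model l: local fit part plus consensus part."""
    return gradient_local(ws[l], X_src, y_src, lam) + gradient_consensus(ws, l, X_tgt, theta)
```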

17 Distributed Implementation of Consensus Regularization (1). In the distributed setting, the data nodes containing source-domain data are used as slave nodes, and the node containing the target-domain data is used as the master node.
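
A minimal sketch of one possible master/slave exchange under this split, assuming each slave node holds one source domain and the master node holds the target data; only model parameters and consensus-gradient statistics cross node boundaries, never raw records. The class and function names are illustrative and are not the paper's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class SlaveNode:
    """Holds one labeled source domain; never ships its raw data."""
    def __init__(self, X, y, lam=1.0):
        self.X, self.y, self.lam = X, y, lam

    def local_gradient(self, w):
        return self.X.T @ (self.y - sigmoid(self.X @ w)) - self.lam * w

class MasterNode:
    """Holds the unlabeled target domain and coordinates the updates."""
    def __init__(self, X_target, theta):
        self.X, self.theta = X_target, theta

    def consensus_gradients(self, ws):
        m = len(ws)
        probs = np.stack([sigmoid(self.X @ w) for w in ws])
        avg = probs.mean(axis=0)
        grads = []
        for l in range(m):
            coeff = 2.0 * self.theta * (avg - 0.5) * probs[l] * (1.0 - probs[l]) / m
            grads.append(self.X.T @ coeff)
        return grads

def train(master, slaves, dim, lr=0.1, iters=20):
    """Alternate between the master's consensus statistics and the slaves' local gradients."""
    ws = [np.zeros(dim) for _ in slaves]
    for _ in range(iters):
        cons = master.consensus_gradients(ws)   # computed on the target (master) node
        for l, slave in enumerate(slaves):
            ws[l] += lr * (slave.local_gradient(ws[l]) + cons[l])
    return ws
```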

18 Overview: Introduction; Preliminaries; Consensus Regularization; Experimental Evaluation; Related Work; Conclusions

19 Experimental Preparation (1). Data preparation: three source domains (A1, B1), (A2, B2), (A3, B3) and one target domain (A4, B4); 96 problem instances can be constructed for the experimental evaluation. Baseline algorithms. Distributed approaches: Distributed Ensemble (DE), Distributed Consensus Regularization (DCR). Centralized approaches: Centralized Training (CT), Centralized Consensus Regularization (CCR, e.g. CCR1 means m = 1), CoCC [Dai et al., KDD'07], TSVM [Joachims, ICML'99], SGT [Joachims, ICML'03]. Data sets: A1 sci.crypt, A2 sci.electronics, A3 sci.med, A4 sci.space; B1 talk.guns, B2 talk.mideast, B3 talk.misc, B4 talk.religion.

20 Experimental Parameters and Metrics. Note that when the parameter θ = 0, DE is equivalent to DCR, and CT is equivalent to CCR1. Parameter setting: the range of θ is [0, 0.25]; the parameters of CoCC, TSVM and SGT are the same as in [Dai et al., KDD'07]. Experimental metrics: accuracy and convergence.

21 Experimental Results (1). Comparison of CCR3, CCR1, DE and CT, each reported at its best performance when θ is sampled in [0, 0.25].

22 Experimental Results (2). The average performance comparison of CCR3, CCR1, DE and CT on the 96 problem instances; comparison of TSVM, SGT, CoCC and CCR3.

23 Experimental Results on Algorithm Convergence. The algorithm almost converges after 20 iterations, which indicates that it has good convergence properties.

24 More Experiments (1). Note that the original source domains have a much larger distribution mismatch, but after merging, the mismatch is greatly alleviated.

25 More Experiments (2). The experiments on image classification are also very promising.

26 Overview: Introduction; Preliminaries; Consensus Regularization; Experimental Evaluation; Related Work; Conclusions

27 Related Work (1). Transfer learning addresses the fundamental problem of different distributions between the training and testing data. With some labeled data from the target domain: estimation of the mismatch degree by Liao et al. [ICML'05]; boosting based learning by Dai et al. [ICML'07]; building generative classifiers by Smith et al. [KDD'07]; constructing informative priors from the source domain and encoding them into the model by Raina et al. [ICML'06]. With a totally unlabeled target domain: co-clustering based classification by Dai et al. [KDD'07]; transductive bridged-refinement by Xing et al. [PKDD'07].

28 Related Work (2). Self-taught learning uses a large amount of unlabeled data to improve the performance of a given classification task: applying sparse coding to construct higher-level features from the unlabeled data, by Raina et al. [ICML'07]. Semi-supervised classification: entropy minimization by Grandvalet et al. [NIPS'05], which is a special case of our regularization framework when m = 1. Multi-view learning: co-training by Blum et al. [COLT'98]; boosting mixture models by Grandvalet et al. [ICANN'01]; co-regularization by Sindhwani et al. [ICML'05], which focuses on two views only and does not have the effect of entropy minimization.

29 Overview: Introduction; Preliminaries; Consensus Regularization; Experimental Evaluation; Related Work; Conclusions

30 Conclusions. We propose a consensus regularization framework for transfer learning from multiple source domains, which maximizes the likelihood of each model on its corresponding source domain and maximizes the consensus degree of all the trained models. We extend the algorithm to a distributed implementation: only some statistical values are shared between the source domains and the target domain, so it can modestly alleviate the privacy concerns. Experiments on real-world text data sets show the effectiveness of our consensus regularization approach.

31 Q & A. Acknowledgement.

