Actively Transfer Domain Knowledge Xiaoxiao Shi Wei Fan Jiangtao Ren Sun Yat-sen University IBM T. J. Watson Research Center Transfer when you can, otherwise.


Actively Transfer Domain Knowledge. Xiaoxiao Shi, Wei Fan, Jiangtao Ren. Sun Yat-sen University; IBM T. J. Watson Research Center. Transfer when you can, otherwise ask and don't stretch it.

2 Standard Supervised Learning. Training (labeled): New York Times. Test (unlabeled): New York Times. Classifier accuracy: 85.5%.

3 In Reality... Training (labeled): New York Times. Test (unlabeled): New York Times. Labeled data are insufficient! Accuracy: 47.3%. How can the performance be improved?

4 Solution I: Active Learning. Training (labeled): New York Times, with labels supplied by a domain expert ($ labeling cost). Test (unlabeled): New York Times. Classifier accuracy: 83.4%.

5 Solution II: Transfer Learning. Out-of-domain training (labeled): Reuters. In-domain test (unlabeled): New York Times. Transfer classifier accuracy: 82.6%?? There is no guarantee that transfer learning helps! With significant differences between the domains, accuracy drops: 43.5%.

6 Motivation
–Active Learning: labeling cost
–Transfer Learning: domain difference risk
Both have disadvantages. Which one to choose?

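7 Proposed Solution (AcTraK). [Diagram] The active learner chooses an instance from the unlabeled in-domain training data. The transfer classifier, built from the labeled out-of-domain training data (Reuters), predicts its label, and a decision function checks that prediction. If it is reliable, the instance is labeled by the classifier; if it is unreliable, the domain expert labels it. The labeled instances join the labeled training set, which trains the classifier that produces the classification result on the test data.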

8 Transfer Classifier
X: an in-domain unlabeled instance.
Training:
–M_o is trained on the out-of-domain dataset (labeled).
–The in-domain labeled examples (very few) are split by the label M_o assigns them: L+ = { (x, y=+/-) | M_o(x) = L+ }, and L- likewise for M_o(x) = L-. The true in-domain label may be either - or +, so all four (in-domain label / M_o label) combinations can occur: -/L-, -/L+, +/L-, +/L+.
–The mapping classifiers M_{L+} and M_{L-} are trained on L+ and L-, and map the transferred M_o label to the in-domain label.
Prediction:
1. Classify X by the out-of-domain M_o: P(L+|X, M_o) and P(L-|X, M_o).
2. Classify X by the mapping classifiers M_{L+} and M_{L-}: P(+|X, M_{L+}) and P(+|X, M_{L-}).
3. Then the probability for X to be + is:
T(X) = P(+|X) = P(L+|X, M_o) × P(+|X, M_{L+}) + P(L-|X, M_o) × P(+|X, M_{L-})
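The three prediction steps on this slide can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function name `transfer_probability` and its parameter names are assumptions made for this example, and the sketch further assumes that M_o outputs only two labels, so that P(L-|X, M_o) = 1 - P(L+|X, M_o).

```python
def transfer_probability(p_lplus, p_plus_given_lplus, p_plus_given_lminus):
    """Combine the three classifier outputs into T(X) = P(+|X).

    p_lplus             -- P(L+|X, M_o), from the out-of-domain classifier
    p_plus_given_lplus  -- P(+|X, M_{L+}), from the mapping classifier for L+
    p_plus_given_lminus -- P(+|X, M_{L-}), from the mapping classifier for L-

    Assumption for this sketch: M_o is binary, so P(L-|X, M_o) = 1 - p_lplus.
    """
    for p in (p_lplus, p_plus_given_lplus, p_plus_given_lminus):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    p_lminus = 1.0 - p_lplus
    # Step 3 of the slide: weight each mapping classifier by how likely
    # M_o is to assign the corresponding out-of-domain label.
    return p_lplus * p_plus_given_lplus + p_lminus * p_plus_given_lminus


if __name__ == "__main__":
    # M_o is fairly sure X is L+, and M_{L+} says L+ usually means + in-domain.
    print(round(transfer_probability(0.8, 0.9, 0.2), 4))
```

Because the mapping classifiers sit between M_o and the final label, a case where the out-of-domain labels are systematically flipped relative to the in-domain labels can still yield a sensible T(X), provided M_{L+} and M_{L-} learn that flip from the few in-domain labeled examples.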

9 Our Solution (AcTraK). [Framework diagram, repeated: active learner, transfer classifier, decision function (reliable: label by the classifier; unreliable: label by the domain expert), out-of-domain training data (labeled, Reuters), unlabeled training data, labeled training set, classifier, test, classification result.]

10 Decision Function. When the prediction by the transfer classifier is unreliable, ask the domain expert. In the following cases, the domain expert, not the transfer classifier, labels the instance: a) Conflict b) Low confidence c) Few labeled in-domain examples

11 Decision Function. a) Conflict? b) Confidence? c) Size? The decision function chooses between "label by transfer classifier" and "label by domain expert": AcTraK asks the domain expert to label the instance with a probability determined by these three checks [formula not preserved in the transcript]. Notation: R: random number in [0,1]; T(x): prediction by the transfer classifier; M_L(x): prediction given by the in-domain classifier.
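The randomized decision can be sketched as follows. The slide's actual query-probability formula did not survive in the transcript, so `query_probability` below is a hypothetical stand-in, not AcTraK's formula: it only reproduces the qualitative behavior of the three checks (conflict, low confidence, few labeled in-domain examples each make a query more likely). All function and parameter names are assumptions made for this example.

```python
import random


def query_probability(conflict, confidence, n_labeled):
    """Hypothetical stand-in for the query probability (NOT the slide's formula).

    conflict   -- True if T(x) and M_L(x) disagree
    confidence -- transfer classifier's confidence in its prediction, in [0, 1]
    n_labeled  -- number of labeled in-domain examples collected so far
    """
    if conflict:
        return 1.0  # a) conflicting predictions: always ask the expert
    # c) few labeled examples shrink this factor toward 0 (0 when n_labeled == 0)
    size_factor = n_labeled / (n_labeled + 1.0)
    # b) low confidence, or a small labeled set, pushes the probability toward 1
    return 1.0 - confidence * size_factor


def label_source(conflict, confidence, n_labeled, rng=random):
    """Draw R in [0, 1) and decide who labels the instance."""
    r = rng.random()
    if r < query_probability(conflict, confidence, n_labeled):
        return "expert"
    return "transfer_classifier"


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    print(label_source(conflict=True, confidence=0.99, n_labeled=50, rng=rng))
    print(query_probability(conflict=False, confidence=1.0, n_labeled=3))
```

Making the decision probabilistic rather than thresholded means that even confident, non-conflicting predictions are occasionally checked by the expert, while the expected number of queries still falls as the labeled set grows.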

12 Properties
–It can reduce domain difference risk: according to Theorem 2, the expected error is bounded.
–It can reduce labeling cost: according to Theorem 3, the query probability is bounded.

13 Theorems. [Formulas not preserved in the transcript; surviving annotations: "expected error of the transfer classifier", "maximum size".]

14 Experiments setup
Data Sets
–Synthetic data sets
–Remote Sensing: data collected from regions with a specific ground surface condition; data collected from a new region
–Text classification: same top-level classification problems with different sub-fields in the training and test sets (Newsgroup)
Comparable Models
–Inductive Learning model: AdaBoost, SVM
–Transfer Learning model: TrAdaBoost (ICML07)
–Active Learning model: ERS (ICML01)

15 Experiments on Synthetic Datasets. In-domain: 2 labeled training & testing. Out-of-domain: 4 labeled training.

16 Experiments on Real World Dataset. Evaluation metrics: compared with transfer learning on accuracy; compared with active learning on IEA (Integral Evaluation on Accuracy).

17 1. Comparison with Transfer Learner. 2. Comparison with Active Learner: 20 Newsgroup comparison with active learner ERS.

18 Conclusions
Actively Transfer Domain Knowledge
–Reduce domain difference risk: transfer useful knowledge (Theorem 2)
–Reduce labeling cost: query domain experts only when necessary (Theorem 3)
