
1 A Two-Stage Approach to Domain Adaptation for Statistical Classifiers Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign

2 What is domain adaptation? (CIKM 2007, Nov 7, 2007)

3 Example: named entity recognition (persons, locations, organizations, etc.). Standard supervised learning: train (labeled) and test (unlabeled) data both come from the New York Times; the NER classifier reaches 85.5%.

4 Example: named entity recognition, non-standard (realistic) setting: labeled New York Times data is not available, so the NER classifier is trained on labeled Reuters data and tested on the New York Times, reaching only 64.1%.

5 Domain difference → performance drop. Ideal setting: train on the New York Times, test on the New York Times → 85.5%. Realistic setting: train on Reuters, test on the New York Times → 64.1%.

6 Another NER example. Ideal setting: train and test a gene name recognizer on mouse → 54.1%. Realistic setting: train on fly, test on mouse → 28.1%.

7 Other examples. Spam filtering: public email collection → personal inboxes. Sentiment analysis of product reviews: digital cameras → cell phones; movies → books. Can we do better than standard supervised learning? Domain adaptation: design learning methods that are aware of the difference between the training and test domains.

8 How do we solve the problem in general?

9 Observation 1: domain-specific features. wingless, daughterless, eyeless, apexless, …

10 Observation 1: domain-specific features. wingless, daughterless, eyeless, apexless, … describe phenotypes in fly gene nomenclature, so the feature "-less" is weighted high. For genes from other organisms, such as CD38 and PABPC5, is this feature still useful? No!

11 Observation 2: generalizable features. "…decapentaplegic and wingless are expressed in analogous patterns in each…" "…that CD38 is expressed by both neurons and glial cells…" "…that PABPC5 is expressed in fetal brain and in a range of adult tissues."

12 Observation 2: generalizable features. All three examples ("…decapentaplegic and wingless are expressed…", "…CD38 is expressed by…", "…PABPC5 is expressed in…") share the feature "X be expressed".

13 General idea: two-stage approach. Split the feature set into generalizable features (shared by the source and target domains) and domain-specific features.

14 Goal (diagram: source domain, target domain, features).

15 Regular classification (diagram: source domain, target domain, features).

16 Stage 1. Generalization: emphasize generalizable features in the trained model.

17 Stage 2. Adaptation: pick up domain-specific features for the target domain.

18 Regular semi-supervised learning (diagram: source domain, target domain, features).

19 Comparison with related work. We explicitly model generalizable features; previous work models them implicitly [Blitzer et al. 2006, Ben-David et al. 2007, Daumé III 2007]. We do not need labeled target data, but we do need multiple source (training) domains; some work requires labeled target data [Daumé III 2007]. We have a 2nd stage of adaptation, which uses semi-supervised learning; previous work does not incorporate semi-supervised learning [Blitzer et al. 2006, Ben-David et al. 2007, Daumé III 2007].

20 Implementation of the two-stage approach with logistic regression classifiers

21 Logistic regression classifiers. The input is a vector x of binary features (e.g., "-less" and "X be expressed" for the sentence "… and wingless are expressed in…"); each class y has a weight vector w_y, and the class score w_y^T x is mapped to a probability p.

22 Learning a logistic regression classifier: maximize the log likelihood of the training data minus a regularization term that penalizes large weights and thereby controls model complexity.
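A minimal sketch of this training objective in Python (the toy features, labels, and regularization constant are illustrative, not from the paper): the loss is the negative log likelihood plus an L2 penalty on the weights.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def regularized_nll(w, X, y, lam):
    """Negative log likelihood of binary labels plus an L2 penalty:
    the penalty discourages large weights and controls model complexity."""
    p = sigmoid(X @ w)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)) + lam * (w @ w)

def fit_logistic(X, y, lam=0.1, lr=0.1, iters=500):
    """Minimize the regularized objective by plain gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ w) - y) + 2 * lam * w
        w -= lr * grad
    return w

# Toy data: bias column, one informative binary feature, one noise feature.
X = np.array([[1, 1, 0], [1, 1, 1], [1, 0, 1], [1, 0, 0]], dtype=float)
y = np.array([1, 1, 0, 0], dtype=float)
w = fit_logistic(X, y)   # the informative feature gets the largest weight
```

With the penalty in place the weights stay finite even though the toy data is linearly separable; that is the "control model complexity" role the slide refers to.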

23 Generalizable features in weight vectors. Train a separate weight vector w^1, w^2, …, w^K from each of the K source domains D_1, D_2, …, D_K: generalizable features receive consistently high weights across domains, while domain-specific features do not.

24 We want to decompose w into two parts: one with non-zero entries only for the h generalizable features, and one carrying the remaining, domain-specific weights.

25 Feature selection matrix A. A is a 0/1 matrix with one 1 per row that selects the h generalizable features from the full feature vector x: z = Ax.

26 Decomposition of w: w^T x = v^T z + u^T x, where v contains the weights for the generalizable features (on z = Ax) and u the weights for the domain-specific features.

27 Decomposition of w: w^T x = v^T z + u^T x = v^T (Ax) + u^T x = (A^T v)^T x + u^T x, so w = A^T v + u.

28 Decomposition of w = A^T v + u: the component A^T v is shared by all domains; u is domain-specific.
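The selection matrix and the identity w^T x = v^T z + u^T x can be checked numerically; the dimensions, index choices, and values below are made up for illustration.

```python
import numpy as np

F, h = 6, 2              # F features in total, h of them generalizable
generalizable = [1, 3]   # indices of the generalizable features (illustrative)

# A is an h x F 0/1 matrix with one 1 per row; z = A x keeps only the
# h generalizable features of x.
A = np.zeros((h, F))
for row, col in enumerate(generalizable):
    A[row, col] = 1.0

v = np.array([4.6, 3.2])                          # weights shared by all domains
u = np.array([0.2, -0.1, 0.4, -0.3, 2.1, -0.9])   # domain-specific weights
x = np.array([0.0, 1.0, 0.0, 1.0, 1.0, 0.0])      # a binary feature vector

w = A.T @ v + u          # the decomposition w = A^T v + u
z = A @ x
assert np.isclose(w @ x, v @ z + u @ x)           # w^T x = v^T z + u^T x
```

Because each row of A has a single 1, A^T v simply scatters the shared weights back to their positions in the full feature space, which is why the identity holds for any u.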

29 Framework for generalization. Fix A and optimize the weight vectors w^k = A^T v + u^k: maximize the likelihood of the labeled data from the K source domains, with a regularization term using λ_s >> 1 to penalize domain-specific features.
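In symbols, the generalization stage can be written roughly as follows, where w^k = A^T v + u^k is the weight vector for source domain D_k; this is a paraphrase of the slide under the assumption of a squared L2 penalty, and the paper gives the exact objective:

```latex
\max_{v,\;u^{1},\dots,u^{K}}\;
\sum_{k=1}^{K}\Biggl[\sum_{(x,y)\in D_{k}}
\log p\bigl(y \mid x;\, A^{T}v + u^{k}\bigr)
\;-\;\lambda_{s}\,\lVert u^{k}\rVert^{2}\Biggr],
\qquad \lambda_{s} \gg 1
```

With λ_s large, weight mass is pushed onto the shared component v, i.e., onto the generalizable features.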

30 Framework for adaptation. Fix A and optimize: maximize the likelihood of pseudo-labeled target-domain examples, with λ_t = 1 << λ_s so that domain-specific features in the target domain can be picked up.

31 How to find A? (1) Joint optimization → alternating optimization.

32 How to find A? (2) Domain cross validation. Idea: train on (K − 1) source domains and test on the held-out source domain. Approximation: for each feature f, compare the weight w_f^k learned from domain k with the weight learned from the other domains, and rank features by how well the two agree (see the paper for details).

33 Intuition for domain cross validation. Across domains D_1, …, D_(k-1), D_k (fly): a generalizable feature such as "expressed" receives a high weight whether it is learned from D_k or from the other domains, whereas a domain-specific feature such as "-less" receives a high weight only when the fly domain is used for training.
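A sketch of the domain-cross-validation ranking in Python. The scoring rule used here (product of the held-out weight and the average weight from the other domains) is only a stand-in for illustration; the paper defines the actual criterion, and the toy weight matrix is invented.

```python
import numpy as np

def rank_generalizable(W, h):
    """W: K x F matrix; W[k, f] is the weight of feature f in a model
    trained on source domain k alone.  For each feature, compare the
    weight from one domain against the average weight from the other
    domains, and rank features by how consistently high they score
    across all K held-out domains."""
    K, F = W.shape
    score = np.zeros(F)
    for k in range(K):
        held_out = W[k]                            # weights from domain k
        others = (W.sum(axis=0) - W[k]) / (K - 1)  # average from the rest
        score += held_out * others                 # high and consistent -> large
    return np.argsort(-score)[:h]                  # indices of the top-h features

# Toy weights for 3 domains and 4 features: feature 0 ("expressed") is high
# everywhere, feature 1 ("-less") is high only in one domain, feature 2 is
# moderately high everywhere, feature 3 is low everywhere.
W = np.array([[2.0, 1.5, 0.5, 0.0],
              [1.8, 0.1, 0.6, 0.1],
              [2.1, 0.05, 0.4, 0.2]])
top = rank_generalizable(W, h=2)   # the consistently weighted features win
```

Note how the domain-specific "-less" feature scores poorly even though its single-domain weight is large: its held-out weight never agrees with the weight from the other domains.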

34 Experiments. Data set: BioCreative Challenge Task 1B, gene/protein name recognition, 3 organisms/domains (fly, mouse, yeast). Setup: 2 organisms for training, 1 for testing; F1 as the performance measure.

35 Experiments: Generalization (F: fly, M: mouse, Y: yeast)

Method              F+M → Y   M+Y → F   Y+F → M
BL                  0.633     0.129     0.416
DA-1 (joint-opt)    0.627     0.153     0.425
DA-2 (domain CV)    0.654     0.195     0.470

Using generalizable features is effective; domain cross validation is more effective than joint optimization.

36 Experiments: Adaptation (F: fly, M: mouse, Y: yeast)

Method       F+M → Y   M+Y → F   Y+F → M
BL-SSL       0.633     0.241     0.458
DA-2-SSL     0.759     0.305     0.501

Domain-adaptive bootstrapping is more effective than regular bootstrapping.

37 Experiments: Adaptation. Domain-adaptive SSL is more effective, especially with a small number of pseudo labels.
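The bootstrapping (self-training) loop that both methods build on can be sketched generically; the nearest-centroid classifier, data, and confidence rule here are illustrative stand-ins, and the domain-adaptive variant would additionally re-weight domain-specific features as in the adaptation objective above.

```python
import numpy as np

def train_centroids(X, y):
    """A minimal stand-in classifier: one centroid per class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict_with_confidence(centroids, X):
    """Label = nearest centroid; confidence = margin between the
    two nearest centroids."""
    classes = sorted(centroids)
    d = np.stack([np.linalg.norm(X - centroids[c], axis=1) for c in classes])
    order = np.argsort(d, axis=0)
    labels = np.array([classes[i] for i in order[0]])
    idx = np.arange(X.shape[0])
    conf = d[order[1], idx] - d[order[0], idx]
    return labels, conf

def self_train(X_src, y_src, X_tgt, rounds=3, m=2):
    """Bootstrapping: repeatedly label the target data, move the m most
    confidently labeled target examples into the training set, retrain."""
    X, y = X_src.copy(), y_src.copy()
    remaining = X_tgt.copy()
    for _ in range(rounds):
        if len(remaining) == 0:
            break
        model = train_centroids(X, y)
        labels, conf = predict_with_confidence(model, remaining)
        pick = np.argsort(-conf)[:m]       # top-m pseudo labels per round
        X = np.vstack([X, remaining[pick]])
        y = np.concatenate([y, labels[pick]])
        remaining = np.delete(remaining, pick, axis=0)
    return train_centroids(X, y)

# Toy example: two well-separated classes; the unlabeled target data is a
# slightly shifted version of the labeled source data.
X_src = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y_src = np.array([0, 0, 1, 1])
X_tgt = np.array([[1.0, 1.0], [6.0, 6.0], [0.5, 0.5], [5.5, 5.5]])
model = self_train(X_src, y_src, X_tgt, rounds=2, m=2)
```

The "small number of pseudo labels" finding on this slide corresponds to keeping m small: each round only the most confident target examples are trusted.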

38 Conclusions and future work. Two-stage domain adaptation: generalization outperformed standard supervised learning; adaptation outperformed standard bootstrapping. Two ways to find generalizable features: domain cross validation is more effective. Future work: handling a single source domain; setting the parameters h and m.

39 References.
S. Ben-David, J. Blitzer, K. Crammer & F. Pereira. Analysis of representations for domain adaptation. NIPS 2007.
J. Blitzer, R. McDonald & F. Pereira. Domain adaptation with structural correspondence learning. EMNLP 2006.
H. Daumé III. Frustratingly easy domain adaptation. ACL 2007.

40 Thank you!

