A Two-Stage Approach to Domain Adaptation for Statistical Classifiers Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign
Nov 7, 2007 CIKM What is domain adaptation?
Nov 7, 2007 CIKM Example: named entity recognition New York Times train (labeled) test (unlabeled) NER Classifier standard supervised learning 85.5% persons, locations, organizations, etc.
Nov 7, 2007 CIKM Example: named entity recognition New York Times NER Classifier train (labeled) test (unlabeled) New York Times labeled data not available Reuters 64.1% persons, locations, organizations, etc. non-standard (realistic) setting
Nov 7, 2007 CIKM Domain difference performance drop traintest NYT New York Times NER Classifier 85.5% Reuters NYT ReutersNew York Times NER Classifier 64.1% ideal setting realistic setting
Nov 7, 2007 CIKM Another NER example traintest mouse gene name recognizer flymouse gene name recognizer ideal setting realistic setting 54.1% 28.1%
Nov 7, 2007 CIKM Other examples Spam filtering: Public collection personal inboxes Sentiment analysis of product reviews Digital cameras cell phones Movies books Can we do better than standard supervised learning? Domain adaptation: to design learning methods that are aware of the training and test domain difference.
Nov 7, 2007 CIKM How do we solve the problem in general?
Nov 7, 2007 CIKM Observation 1 domain-specific features wingless daughterless eyeless apexless …
Nov 7, 2007 CIKM Observation 1 domain-specific features wingless daughterless eyeless apexless … describing phenotype in fly gene nomenclature feature “-less” weighted high CD38 PABPC5 … feature still useful for other organisms? No!
Nov 7, 2007 CIKM Observation 2 generalizable features …decapentaplegic and wingless are expressed in analogous patterns in each … …that CD38 is expressed by both neurons and glial cells…that PABPC5 is expressed in fetal brain and in a range of adult tissues.
Nov 7, 2007 CIKM Observation 2 generalizable features …decapentaplegic and wingless are expressed in analogous patterns in each … …that CD38 is expressed by both neurons and glial cells…that PABPC5 is expressed in fetal brain and in a range of adult tissues. feature “X be expressed”
Nov 7, 2007 CIKM General idea: two-stage approach Source Domain Target Domain features generalizable features domain-specific features
Nov 7, 2007 CIKM Goal Target Domain Source Domain features
Nov 7, 2007 CIKM Regular classification Source Domain Target Domain features
Nov 7, 2007 CIKM Generalization: to emphasize generalizable features in the trained model Source Domain Target Domain features Stage 1
Nov 7, 2007 CIKM Adaptation: to pick up domain-specific features for the target domain Source Domain Target Domain features Stage 2
Nov 7, 2007 CIKM Regular semi-supervised learning Source Domain Target Domain features
Nov 7, 2007 CIKM Comparison with related work We explicitly model generalizable features. Previous work models it implicitly [Blitzer et al. 2006, Ben- David et al. 2007, Daumé III 2007]. We do not need labeled target data but we need multiple source (training) domains. Some work requires labeled target data [Daumé III 2007]. We have a 2 nd stage of adaptation, which uses semi-supervised learning. Previous work does not incorporate semi-supervised learning [Blitzer et al. 2006, Ben-David et al. 2007, Daumé III 2007].
Nov 7, 2007 CIKM Implementation of the two- stage approach with logistic regression classifiers
Nov 7, 2007 CIKM x Logistic regression classifiers 01001:: :: : less X be expressed wyT xwyT x p binary features … and wingless are expressed in… wywy
Nov 7, 2007 CIKM Learning a logistic regression classifier 01001:: :: : log likelihood of training data regularization term penalize large weights control model complexity wyT xwyT x
Nov 7, 2007 CIKM Generalizable features in weight vectors : : : D1D1 D2D2 DKDK w1w1 w2w2 wKwK … K source domains generalizable features domain-specific features
Nov 7, 2007 CIKM We want to decompose w in this way : : : =+ h non-zero entries for h generalizable features
Nov 7, 2007 CIKM Feature selection matrix A … … 0 : … :: ::010 01::001::0 x A z = Ax h matrix A selects h generalizable features
Nov 7, 2007 CIKM : : : = + Decomposition of w 01001:: ::010 01::001:: :: ::010 w T x = v T z + u T x weights for generalizable features weights for domain-specific features
Nov 7, 2007 CIKM Decomposition of w w T x = v T z + u T x = (Av) T x + u T x = v T Ax + u T x w = A T v + u
Nov 7, 2007 CIKM : : : =+ 0 0 … … … … 0 : 0 0 … 0 Decomposition of w shared by all domainsdomain-specific w = A T v + u
Nov 7, 2007 CIKM Framework for generalization Fix A, optimize: wkwk regularization term λ s >> 1: to penalize domain-specific features Source Domain Target Domain likelihood of labeled data from K source domains
Nov 7, 2007 CIKM Framework for adaptation Source Domain Target Domain likelihood of pseudo labeled target domain examples λ t = 1 << λ s : to pick up domain-specific features in the target domain Fix A, optimize:
Nov 7, 2007 CIKM Joint optimization Alternating optimization How to find A? (1)
Nov 7, 2007 CIKM How to find A? (2) Domain cross validation Idea: training on (K – 1) source domains and test on the held-out source domain Approximation: w f k : weight for feature f learned from domain k w f k : weight for feature f learned from other domains rank features by See paper for details
Nov 7, 2007 CIKM Intuition for domain cross validation … domains … expressed … -less D1D1 D2D2 D k-1 D k (fly) … -less … expressed … w w … expressed … -less
Nov 7, 2007 CIKM Experiments Data set BioCreative Challenge Task 1B Gene/protein name recognition 3 organisms/domains: fly, mouse and yeast Experiment setup 2 organisms for training, 1 for testing F1 as performance measure
Nov 7, 2007 CIKM Experiments: Generalization MethodF+M YM+Y FY+F M BL DA-1 (joint-opt) DA-2 (domain CV) Source Domain Target Domain Source Domain Target Domain using generalizable features is effective domain cross validation is more effective than joint optimization F: fly M: mouse Y: yeast
Nov 7, 2007 CIKM Experiments: Adaptation MethodF+M YM+Y FY+F M BL-SSL DA-2-SSL Source Domain Target Domain F: fly M: mouse Y: yeast domain-adaptive bootstrapping is more effective than regular bootstrapping Source Domain Target Domain
Nov 7, 2007 CIKM Experiments: Adaptation domain-adaptive SSL is more effective, especially with a small number of pseudo labels
Nov 7, 2007 CIKM Conclusions and future work Two-stage domain adaptation Generalization: outperformed standard supervised learning Adaptation: outperformed standard bootstrapping Two ways to find generalizable features Domain cross validation is more effective Future work Single source domain? Setting parameters h and m
Nov 7, 2007 CIKM References S. Ben-David, J. Blitzer, K. Crammer & F. Pereira. Analysis of representations for domain adaptation. NIPS J. Blitzer, R. McDonald & F. Pereira. Domain adaptation with structural correspondence learning. EMNLP H. Daumé III. Frustratingly easy domain adaptation. ACL 2007.
Nov 7, 2007 CIKM Thank you!