
A Two-Stage Approach to Domain Adaptation for Statistical Classifiers
Jing Jiang & ChengXiang Zhai, Department of Computer Science, University of Illinois at Urbana-Champaign
CIKM, Nov 7, 2007

What is domain adaptation?

Example: named entity recognition (persons, locations, organizations, etc.)
Standard supervised learning: train an NER classifier on labeled New York Times text and test on unlabeled New York Times text; performance is 85.5%.

Example: named entity recognition (persons, locations, organizations, etc.)
Non-standard (realistic) setting: labeled New York Times data are not available, so the NER classifier is trained on labeled Reuters text and tested on New York Times text; performance drops to 64.1%.

Domain difference → performance drop
- ideal setting: train on NYT, test on NYT → 85.5%
- realistic setting: train on Reuters, test on NYT → 64.1%

Another NER example: gene name recognition
- ideal setting: train a mouse gene name recognizer on mouse, test on mouse → 54.1%
- realistic setting: train on fly, test on mouse → 28.1%

Other examples
- Spam filtering: public collection → personal inboxes
- Sentiment analysis of product reviews: digital cameras → cell phones, movies → books
Can we do better than standard supervised learning?
Domain adaptation: designing learning methods that are aware of the difference between the training and test domains.

How do we solve the problem in general?

Observation 1: domain-specific features
Fly gene names such as wingless, daughterless, eyeless, apexless, … describe phenotype, following fly gene nomenclature, so the feature "-less" is weighted high.
Is this feature still useful for other organisms, with gene names like CD38 and PABPC5? No!

Observation 2: generalizable features
- "…decapentaplegic and wingless are expressed in analogous patterns in each…"
- "…that CD38 is expressed by both neurons and glial cells…"
- "…that PABPC5 is expressed in fetal brain and in a range of adult tissues."
The contextual feature "X be expressed" is useful across organisms.
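As a concrete illustration, here is a minimal sketch of extracting this contextual feature; the regex, function name, and example are illustrative assumptions, not the authors' actual feature extractor.

```python
import re

# Minimal sketch (illustrative, not the authors' extractor): fire the
# generalizable feature "X be expressed" when a candidate gene mention X
# is immediately followed by a form of "be" plus "expressed".
BE_EXPRESSED = re.compile(r"(is|are|was|were|be|been|being)\s+expressed\b")

def has_be_expressed_feature(sentence: str, mention_end: int) -> bool:
    """True if the text right after the mention matches '<be> expressed'."""
    return BE_EXPRESSED.match(sentence[mention_end:].lstrip()) is not None

sentence = "CD38 is expressed by both neurons and glial cells"
print(has_be_expressed_feature(sentence, mention_end=len("CD38")))  # True
```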

General idea: two-stage approach
[Figure: the feature space, spanning a source domain and a target domain, divided into generalizable features and domain-specific features.]

Goal
[Figure: source domain, target domain, and the shared feature space.]

Regular classification
[Figure: source domain, target domain, and the shared feature space.]

Stage 1, Generalization: emphasize generalizable features in the trained model.
[Figure: source domain, target domain, and the shared feature space.]

Stage 2, Adaptation: pick up domain-specific features for the target domain.
[Figure: source domain, target domain, and the shared feature space.]

Regular semi-supervised learning
[Figure: source domain, target domain, and the shared feature space.]

Comparison with related work
- We explicitly model generalizable features; previous work models them implicitly [Blitzer et al. 2006, Ben-David et al. 2007, Daumé III 2007].
- We do not need labeled target data, but we do need multiple source (training) domains; some previous work requires labeled target data [Daumé III 2007].
- We have a second stage of adaptation, which uses semi-supervised learning; previous work does not incorporate semi-supervised learning [Blitzer et al. 2006, Ben-David et al. 2007, Daumé III 2007].

Implementation of the two-stage approach with logistic regression classifiers

Logistic regression classifiers
An instance such as "… and wingless are expressed in …" is represented as a vector x of p binary features (e.g. "-less", "X be expressed"). Each class y has a weight vector w_y, and the classifier scores the instance with w_y^T x, so that p(y | x) ∝ exp(w_y^T x).
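A toy sketch of this scoring; the feature vocabulary and weight values below are invented for illustration:

```python
import numpy as np

# Toy sketch: x is a binary feature vector for "wingless" in
# "... and wingless are expressed in ..."; each class y has weights w_y.
features = ["-less", "X be expressed", "capitalized"]  # hypothetical vocabulary
x = np.array([1.0, 1.0, 0.0])  # "wingless" fires "-less" and "X be expressed"

W = np.array([[ 2.1,  1.5, 0.3],   # w_y for y = gene (illustrative values)
              [-1.0, -0.8, 0.9]])  # w_y for y = not-gene

scores = W @ x                             # w_y^T x for each class y
p = np.exp(scores) / np.exp(scores).sum()  # p(y | x), softmax over classes
print(dict(zip(["gene", "not-gene"], p.round(3))))
```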

Learning a logistic regression classifier
The weights are learned by maximizing the penalized log likelihood of the training data:
  max_w  Σ_i log p(y_i | x_i; w) − λ ||w||²
The regularization term penalizes large weights and controls model complexity.
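A minimal sketch of fitting such a classifier, assuming scikit-learn as the implementation (the slides specify the objective, not a library; in sklearn the regularization strength is the inverse of C):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: 200 instances with 50 binary features and random labels.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 50)).astype(float)
y = rng.integers(0, 2, size=200)

# L2-penalized logistic regression: fit maximizes the log likelihood
# minus a penalty on large weights (smaller C = stronger penalty).
clf = LogisticRegression(penalty="l2", C=1.0)
clf.fit(X, y)
print(clf.coef_.shape)  # the learned weight vector w
```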

Generalizable features in weight vectors
[Figure: weight vectors w_1, w_2, …, w_K learned from K source domains D_1, D_2, …, D_K; some entries correspond to generalizable features, others to domain-specific features.]

We want to decompose w in this way: w = (a vector with h non-zero entries, one for each of the h generalizable features) + (a domain-specific vector).

Feature selection matrix A
A is an h × p 0/1 matrix, each row containing a single 1; A selects the h generalizable features, so z = Ax.

Decomposition of w
w^T x = v^T z + u^T x, where v holds the weights for the generalizable features and u the weights for the domain-specific features.

Decomposition of w
w^T x = v^T z + u^T x = v^T (Ax) + u^T x = (A^T v)^T x + u^T x, hence w = A^T v + u.

Decomposition of w
w = A^T v + u: the component A^T v is shared by all domains; u is domain-specific.
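A small numeric check of this decomposition (dimensions and values are illustrative):

```python
import numpy as np

p, h = 6, 2                  # p features, h of them generalizable
A = np.zeros((h, p))         # selection matrix: one 1 per row
A[0, 1] = 1.0                # generalizable feature 1 = original feature 1
A[1, 4] = 1.0                # generalizable feature 2 = original feature 4

x = np.array([1., 0., 1., 0., 1., 1.])
z = A @ x                    # the h generalizable components of x

v = np.array([0.7, -0.2])                   # weights on generalizable features
u = np.array([0., 0.3, -0.1, 0., 0., 0.5])  # weights on domain-specific features
w = A.T @ v + u                             # w = A^T v + u

assert np.isclose(w @ x, v @ z + u @ x)     # w^T x = v^T z + u^T x
```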

Framework for generalization
Fix A and optimize the source-domain weight vectors w_k: maximize the likelihood of the labeled data from the K source domains, minus a regularization term with λ_s >> 1 that penalizes domain-specific features.

Framework for adaptation
Fix A and optimize on the target domain: maximize the likelihood of pseudo-labeled target-domain examples, with λ_t = 1 << λ_s so that domain-specific features of the target domain get picked up.
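The two stages can be sketched as one penalized objective evaluated with different data and a different penalty on the domain-specific part. This is a hedged sketch assuming binary labels and a plain L2 penalty, not the paper's exact formulation:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def penalized_nll(v, u, A, X, y, lam):
    """Negative log likelihood of (X, y) under w = A^T v + u, plus an
    L2 penalty of strength lam on the domain-specific part u."""
    w = A.T @ v + u
    prob = sigmoid(X @ w)
    nll = -np.sum(y * np.log(prob) + (1 - y) * np.log(1 - prob))
    return nll + lam * (u @ u)

# Stage 1 (generalization): sum this over the K source domains, one u_k per
# domain, with lam = lambda_s >> 1 so domain-specific weights are suppressed.
# Stage 2 (adaptation): evaluate it on pseudo-labeled target examples with
# lam = lambda_t = 1 << lambda_s so target-specific features get picked up.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, (20, 6)).astype(float)
y = rng.integers(0, 2, 20)
A = np.eye(2, 6)  # toy selection matrix
print(penalized_nll(np.zeros(2), np.zeros(6), A, X, y, lam=100.0))
```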

How to find A? (1)
Joint optimization → alternating optimization.

How to find A? (2): domain cross validation
- Idea: train on (K − 1) source domains and test on the held-out source domain.
- Approximation: let w_f^k be the weight for feature f learned from domain k, and w̄_f^k the weight for f learned from the other domains; rank features by comparing w_f^k with w̄_f^k.
- See paper for details. (A sketch of one such ranking follows.)
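A hedged sketch of this ranking; the score below (rewarding features whose held-out weight agrees with the weight learned from the remaining domains) is one plausible choice, not necessarily the exact score used in the paper:

```python
import numpy as np

def rank_generalizable_features(W):
    """W[k, f] is the weight of feature f learned on source domain k alone.
    Returns feature indices, most generalizable first."""
    K, _ = W.shape
    score = np.zeros(W.shape[1])
    for k in range(K):                               # hold out domain k
        w_k = W[k]                                   # held-out weights w_f^k
        w_rest = W[np.arange(K) != k].mean(axis=0)   # other-domain weights
        score += w_k * w_rest  # high when consistently useful across domains
    return np.argsort(-score)

# Toy example: 3 domains, 4 features; feature 0 is useful everywhere,
# feature 3 only in domain 2 (values are illustrative).
W = np.array([[2.0, 0.1, 0.5, 0.0],
              [1.8, 0.0, 0.4, 0.1],
              [2.2, 0.2, 0.3, 3.0]])
print(rank_generalizable_features(W))  # feature 0 ranks first
```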

Intuition for domain cross validation
[Figure: weight vectors learned from domains D_1, …, D_{k−1} consistently give a high weight to "expressed", while "-less" is weighted high only in the fly domain D_k; "expressed" therefore generalizes and "-less" does not.]

Experiments
Data set
- BioCreative Challenge Task 1B
- Gene/protein name recognition
- 3 organisms/domains: fly, mouse, and yeast
Experiment setup
- Train on 2 organisms, test on the third
- F1 as the performance measure

Experiments: Generalization (F: fly, M: mouse, Y: yeast)
- Settings: F+M → Y, M+Y → F, Y+F → M
- Methods: BL (baseline), DA-1 (joint optimization), DA-2 (domain cross validation)
- Findings: using generalizable features is effective, and domain cross validation is more effective than joint optimization.

Experiments: Adaptation (F: fly, M: mouse, Y: yeast)
- Settings: F+M → Y, M+Y → F, Y+F → M
- Methods: BL-SSL (regular bootstrapping), DA-2-SSL (domain-adaptive bootstrapping)
- Finding: domain-adaptive bootstrapping is more effective than regular bootstrapping.

Experiments: Adaptation
Domain-adaptive SSL is more effective, especially with a small number of pseudo labels.

Conclusions and future work
Two-stage domain adaptation
- Generalization: outperformed standard supervised learning
- Adaptation: outperformed standard bootstrapping
Two ways to find generalizable features
- Domain cross validation is more effective
Future work
- Single source domain?
- Setting the parameters h and m

References
S. Ben-David, J. Blitzer, K. Crammer & F. Pereira. Analysis of representations for domain adaptation. NIPS 2007.
J. Blitzer, R. McDonald & F. Pereira. Domain adaptation with structural correspondence learning. EMNLP 2006.
H. Daumé III. Frustratingly easy domain adaptation. ACL 2007.

Thank you!