Domain Adaptation in Natural Language Processing Jing Jiang Department of Computer Science University of Illinois at Urbana-Champaign.


1 Domain Adaptation in Natural Language Processing Jing Jiang Department of Computer Science University of Illinois at Urbana-Champaign

2 (Jan 8, 2008) Textual Data in the Information Age
Contains much useful information
– E.g., >85% of corporate data is stored as text
Hard to handle
– Large amount: e.g., by 2002, 2.5 billion documents on the surface Web, growing by 7.3 million per day
– Diversity: emails, news, digital libraries, Web logs, etc.
– Unstructured: unlike relational databases
How do we manage textual data?

3 Information retrieval ranks documents by relevance to keyword queries.
Not always satisfactory – more sophisticated services are desired.

4 Automatic Text Summarization

5 Question Answering

6 Information Extraction
Example of an extracted table: Company: Google; Founder: Larry Page; …

7 Beyond Information Retrieval
Automatic text summarization, question answering, information extraction, sentiment analysis, machine translation, etc.
All rely on Natural Language Processing (NLP) techniques to deeply understand and analyze text.

8 Typical NLP Tasks
"Larry Page was Google's founding CEO"
Part-of-speech tagging: Larry/noun Page/noun was/verb Google/noun 's/possessive-end founding/adjective CEO/noun
Chunking: [NP: Larry Page] [V: was] [NP: Google 's founding CEO]
Named entity recognition: [person: Larry Page] was [organization: Google] 's founding CEO
Relation extraction: Founder(Larry Page, Google)
Word sense disambiguation: "Larry Page" vs. "Page 81"
The state-of-the-art solution: supervised machine learning.

9 Supervised Learning for NLP: part-of-speech tagging on news articles
Pipeline: representative corpus (WSJ articles) → human annotation → POS-tagged WSJ articles → training (standard supervised learning algorithm) → trained POS tagger
Example output: Larry/NNP Page/NNP was/VBD Google/NNP 's/POS founding/ADJ CEO/NN
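The supervised pipeline above can be sketched with a toy most-frequent-tag baseline (purely illustrative: real POS taggers are sequence models, and the function names here are invented):

```python
from collections import Counter, defaultdict

def train_tagger(tagged_sentences):
    """Count tag frequencies per word; keep each word's most frequent tag."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag(model, words, default="NN"):
    """Tag each word with its most frequent training tag (default for unknowns)."""
    return [(w, model.get(w, default)) for w in words]

train = [[("Larry", "NNP"), ("Page", "NNP"), ("was", "VBD"),
          ("Google", "NNP"), ("'s", "POS"), ("founding", "ADJ"), ("CEO", "NN")]]
model = train_tagger(train)
print(tag(model, ["Google", "was", "here"]))
# → [('Google', 'NNP'), ('was', 'VBD'), ('here', 'NN')]
```

Even this baseline hints at why domain shift hurts: every out-of-vocabulary word in a new domain falls back to the default tag.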

10 In Reality…: part-of-speech tagging on biomedical articles
Desired pipeline: representative corpus (MEDLINE articles) → human annotation → POS-tagged MEDLINE articles → training → trained POS tagger
But human annotation is expensive; only POS-tagged WSJ articles are available.
Example: We/PRP analyzed/VBD the/DT mutations/NNS of/IN the/DT H-ras/NN genes/NNS

11 Many Other Examples
Named entity recognition
– News articles → personal blogs
– Organism A → organism B
Spam filtering
– Public email collection → personal inboxes
Sentiment analysis of product reviews (positive vs. negative)
– Movies → books
– Cell phones → digital cameras
What is the problem with this non-standard setting, where the domains differ?

12 Domain Difference → Performance Degradation
Ideal setting: POS tagger trained and tested on MEDLINE: ~96%
Realistic setting: POS tagger trained on WSJ, tested on MEDLINE: ~86%

13 Another Example
Ideal setting: gene name recognizer: 54.1%
Realistic setting: gene name recognizer: 28.1%

14 Domain Adaptation
Labeled source domain, unlabeled target domain → domain-adaptive learning algorithm.
Goal: design learning algorithms that are aware of the domain difference and exploit all available data to adapt to the target domain.

15 With Domain Adaptation Techniques…
Standard learning: gene name recognizer trained on Fly + Mouse, applied to Yeast: 63.3%
Domain-adaptive learning: gene name recognizer trained on Fly + Mouse, applied to Yeast: 75.9%

16 Roadmap
What is domain adaptation in NLP?
Our work
– Overview
– Instance weighting
– Feature selection
Summary and future work

17 Overview (figure: source domain and target domain)

18 Ideal Goal (figure: source domain and target domain)

19 Standard Supervised Learning (figure: source domain and target domain)

20 Standard Semi-Supervised Learning (figure: source domain and target domain)

21 Idea 1: Generalization (figure: source domain and target domain)

22 Idea 2: Adaptation (figure: source domain and target domain)

23 How do we formally formulate these ideas? (figure: source domain and target domain)

24 Instance Weighting
(figure: instance space over the source and target domains; each point represents an observed instance)
Goal: find appropriate weights for the different instances.

25 Feature Selection
(figure: feature space over the source and target domains; each point represents a useful feature)
Goal: separate generalizable features from domain-specific features.

26 Roadmap
What is domain adaptation in NLP?
Our work
– Overview
– Instance weighting
– Feature selection
Summary and future work

27 Observation (figure: source-domain and target-domain instances)

28 Observation (continued: the same figure, next animation step)

29 Analysis of Domain Difference
p(x, y) = p(x) p(y | x), where x is an observed instance and y is the class label to be predicted.
Instance difference: p_s(x) ≠ p_t(x) → instance adaptation.
Labeling difference: p_s(y | x) ≠ p_t(y | x) → labeling adaptation.

30 Labeling Adaptation (figure): where p_t(y | x) ≠ p_s(y | x), remove or demote those source instances.

31 Labeling Adaptation (continued: the same figure, next animation step)

32 Instance Adaptation, case p_t(x) < p_s(x) (figure): remove or demote source instances that are over-represented relative to the target domain.

33 Instance Adaptation, case p_t(x) < p_s(x) (continued)

34 Instance Adaptation, case p_t(x) > p_s(x) (figure): promote source instances that are under-represented relative to the target domain.

35 Instance Adaptation, case p_t(x) > p_s(x) (continued)

36 Instance Adaptation, case p_t(x) > p_s(x) (figure): target-domain instances are useful here.
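The remove/demote/promote decisions on slides 30–36 can be sketched as importance weights p_t(x)/p_s(x); the discrete densities below are invented purely for illustration:

```python
# Hypothetical source- and target-domain densities over a tiny instance space.
p_s = {"the": 0.5, "gene": 0.1, "stock": 0.4}
p_t = {"the": 0.5, "gene": 0.4, "stock": 0.1}

# Importance weight p_t(x) / p_s(x): > 1 promotes, < 1 demotes an instance.
weights = {x: p_t[x] / p_s[x] for x in p_s}
# "gene" (more common in the target) is promoted; "stock" is demoted.
```

In practice these densities are unknown and high-dimensional, which is exactly the difficulty the slides point out.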

37 Empirical Risk Minimization with Three Sets of Instances
Three sets are available: D_s (labeled source instances), D_t,l (labeled target instances), and D_t,u (unlabeled target instances).
Given a loss function, the optimal classification model minimizes the expected loss; since the expectation is unavailable, we replace it with the empirical loss over the observed instances.
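One way to sketch a combined objective over the three sets is a weighted average of their empirical losses; the lam_* mixing weights and the function itself are illustrative, not the exact formulation of the framework:

```python
def combined_empirical_risk(loss_s, loss_tl, loss_tu,
                            lam_s=1.0, lam_tl=1.0, lam_tu=1.0):
    """Weighted average of per-instance losses over D_s, D_t,l and D_t,u.
    Zeroing lam_tl and lam_tu recovers standard supervised learning on
    the source domain alone."""
    def avg(losses):
        return sum(losses) / len(losses) if losses else 0.0
    total = lam_s + lam_tl + lam_tu
    return (lam_s * avg(loss_s)
            + lam_tl * avg(loss_tl)
            + lam_tu * avg(loss_tu)) / total
```

Different settings of the mixing weights recover the standard, semi-supervised, and domain-adaptive setups discussed in the following slides.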

38 Using D_s
Estimating the target-domain loss from x ∈ D_s requires correcting for the instance difference (hard for high-dimensional data) and the labeling difference (needs labeled target data).

39 Using D_t,l
Estimating the loss from x ∈ D_t,l is direct, but the small sample size makes the estimate inaccurate.

40 Using D_t,u
The loss over x ∈ D_t,u can be estimated using predicted labels (bootstrapping).

41 Combined Framework
A flexible setup that covers both the standard methods and new domain-adaptive methods.

42 Experiments
NLP tasks:
– POS tagging: WSJ (Penn TreeBank) → Oncology text (Penn BioIE)
– NE type classification: newswire → conversational telephone speech (CTS) and Web logs (WL) (ACE 2005)
– Spam filtering: public email collection → personal inboxes u01, u02, u03 (ECML/PKDD 2006)
Three heuristics were used to partially explore the parameter settings.

43 Instance Pruning: removing "misleading" instances from D_s
POS (Oncology):       k = 0: 0.8630;  k = 8000: 0.8709;  k = 16000: 0.8714;  all: 0.8720
NE type (CTS):        k = 0: 0.7815;  k = 1600: 0.8640;  k = 3200: 0.8825;  all: 0.8830
NE type (WL):         k = 0: 0.7045;  k = 1200: 0.6975;  k = 2400: 0.6795;  all: 0.6600
Spam (Users 1/2/3):   k = 0: 0.6306 / 0.6950 / 0.7644;  k = 300: 0.6611 / 0.7228 / 0.8222;  k = 600: 0.7911 / 0.8322 / 0.8328;  all: 0.8106 / 0.8517 / 0.8067
Pruning is useful in most cases but failed in some; when it is guaranteed to work remains future work.

44 D_t,l with Larger Weights
POS (Oncology):       D_s: 0.8630;  D_s + D_t,l: 0.9349;  D_s + 10 D_t,l: 0.9429;  D_s + 20 D_t,l: 0.9443
NE type (CTS / WL):   D_s: 0.7815 / 0.7045;  D_s + D_t,l: 0.9340 / 0.7735;  D_s + 5 D_t,l: 0.9360 / 0.7820;  D_s + 10 D_t,l: 0.9355 / 0.7840
Spam (Users 1/2/3):   D_s: 0.6306 / 0.6950 / 0.7644;  D_s + D_t,l: 0.9572 / 0.9461;  D_s + 5 D_t,l: 0.9628 / 0.9611 / 0.9601;  D_s + 10 D_t,l: 0.9639 / 0.9628 / 0.9633
D_t,l is very useful, and promoting D_t,l is even more useful.

45 Bootstrapping with Larger Weights (until D_s and D_t,u are balanced)
POS (Oncology):       supervised: 0.8630;  standard bootstrap: 0.8728;  balanced bootstrap: 0.8750
NE type (CTS / WL):   supervised: 0.7781 / 0.7351;  standard bootstrap: 0.8917 / 0.7498;  balanced bootstrap: 0.8923 / 0.7523
Spam (Users 1/2/3):   supervised: 0.6476 / 0.6976 / 0.8068;  standard bootstrap: 0.8720 / 0.9212 / 0.9760;  balanced bootstrap: 0.8816 / 0.9256 / 0.9772
Promoting target instances is useful, even with predicted labels.
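The "balanced" variant can be sketched as a per-instance weight that equalizes the total influence of the two sets (the function name and interface are invented for illustration):

```python
def balanced_weight(n_source, n_target_pseudo):
    """Weight for each pseudo-labeled target instance so that the target
    set's total weight equals the source set's, instead of letting the
    (usually much larger) source set dominate training."""
    if n_target_pseudo == 0:
        return 0.0
    return n_source / n_target_pseudo

# e.g. 5000 source instances vs. 250 pseudo-labeled target instances:
print(balanced_weight(5000, 250))  # → 20.0
```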

46 Roadmap
What is domain adaptation in NLP?
Our work
– Overview
– Instance weighting
– Feature selection
Summary and future work

47 Observation 1: Domain-Specific Features
wingless, daughterless, eyeless, apexless, …: fly gene names describing phenotype; the suffix feature "-less" is useful for recognizing genes in this organism.

48 For other organisms (e.g., CD38, PABPC5), is the "-less" feature still useful? No!

49 Observation 2: Generalizable Features
"…decapentaplegic and wingless are expressed in analogous patterns in each…"
"…that CD38 is expressed by both neurons and glial cells… that PABPC5 is expressed in fetal brain and in a range of adult tissues."

50 The contextual feature "X be expressed" generalizes across organisms.

51 Assume Multiple Source Domains
Multiple labeled source domains and one unlabeled target domain are given to the domain-adaptive learning algorithm.

52 Detour: Logistic Regression Classifiers
The input x is a vector of binary features (e.g., "-less", "X be expressed") extracted from text such as "…and wingless are expressed in…"; each class y has a weight vector w_y, and the score w_y^T x is mapped to a probability p.

53 Learning a Logistic Regression Classifier
Maximize the log likelihood of the training data minus a regularization term; the regularization term penalizes large weights and controls model complexity.
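The objective can be written out for the binary case as follows (a sketch: the name penalized_nll and the binary restriction are mine, while the slides use the multi-class form):

```python
import math

def penalized_nll(w, data, lam=1.0):
    """Negative log likelihood of binary logistic regression plus an L2
    penalty lam * ||w||^2, which discourages large weights and thereby
    controls model complexity."""
    nll = 0.0
    for x, y in data:                  # x: feature vector, y in {0, 1}
        z = sum(wi * xi for wi, xi in zip(w, x))
        p = 1.0 / (1.0 + math.exp(-z))
        nll -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return nll + lam * sum(wi * wi for wi in w)
```

Minimizing this over w is the training problem; the later slides modify only the regularization term.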

54 Generalizable Features in Weight Vectors
Training one weight vector w_1, …, w_K for each of the K source domains D_1, …, D_K shows that generalizable features receive consistently large weights across all w_k, while domain-specific features receive large weights only in some domains.

55 Decomposition of w_k for Each Source Domain
w_k = A^T v + u_k, where v is shared by all domains, u_k is domain-specific, and A is a matrix that selects the generalizable features.
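A toy numeric instance of the decomposition (sizes and values invented: five features, of which dimensions 1 and 3 are taken as generalizable):

```python
import numpy as np

A = np.zeros((2, 5))                          # selects generalizable dimensions
A[0, 1] = 1.0
A[1, 3] = 1.0
v = np.array([4.5, 3.2])                      # weights shared by all domains
u_k = np.array([0.2, 0.0, -0.3, 0.0, 2.1])    # domain-specific weights
w_k = A.T @ v + u_k                           # w_k = A^T v + u_k
print(w_k)  # the shared weights land on dimensions 1 and 3
```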

56 Framework for Generalization
Fix A and maximize the log likelihood of the labeled data from the K source domains over the w_k, with a regularization term in which λ_s >> 1 penalizes domain-specific features.

57 Framework for Adaptation
Fix A and maximize the log likelihood of target-domain examples with predicted labels, with λ_t = 1 << λ_s so that domain-specific features of the target domain are picked up.

58 How to Find A? (1) Joint optimization.

59 How to Find A? (2) Domain cross validation
Idea: train on (K − 1) source domains and validate on the held-out source domain.
Approximation: let w_f^(k) be the weight for feature f learned from domain k, and w_f^(−k) the weight for feature f learned from the other domains; rank features by how strongly the two agree.

60 Intuition for Domain Cross Validation
Holding out the fly domain D_k: the model trained on D_1, …, D_{k−1} weights "expressed" highly but not "-less", while the fly model weights both; the product of the two weight vectors (w_1 and w_2 in the figure) is therefore high for the generalizable "expressed" and low for the fly-specific "-less".
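A minimal sketch of the ranking, using the two feature weights shown on the slide (the pairing of numbers to features is my reading of the garbled figure):

```python
w_rest = {"expressed": 1.5, "-less": 0.05}   # trained on D_1 .. D_{k-1} (no fly)
w_held = {"expressed": 2.0, "-less": 1.2}    # trained on the held-out fly domain

# Rank by the product: only features weighted highly in BOTH models score well.
scores = {f: w_rest[f] * w_held[f] for f in w_rest}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # → ['expressed', '-less']
```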

61 Experiments
Data set: BioCreative Challenge Task 1B; gene/protein name recognition; three organisms/domains: fly, mouse, and yeast.
Experimental setup: train on two organisms, test on the third; F1 as the performance measure.

62 Experiments: Generalization (F: fly, M: mouse, Y: yeast)
Method             F+M→Y   M+Y→F   Y+F→M
BL                 0.633   0.129   0.416
DA-1 (joint-opt)   0.627   0.153   0.425
DA-2 (domain CV)   0.654   0.195   0.470
Using generalizable features is effective, and domain cross validation is more effective than joint optimization.

63 Experiments: Adaptation (F: fly, M: mouse, Y: yeast)
Method      F+M→Y   M+Y→F   Y+F→M
BL-SSL      0.633   0.241   0.458
DA-2-SSL    0.759   0.305   0.501
Domain-adaptive bootstrapping is more effective than regular bootstrapping.

64 Related Work
The problem is relatively new to the NLP and ML communities; most related work was developed concurrently with ours.
Instances used         Standard                   Instance weighting                     Feature selection   IW + FS
D_s                    supervised learning        Shimodaira 00                          Blitzer et al. 06   our future work
D_s + D_t,l            supervised learning        Daumé III & Marcus 06; Daumé III 07
D_s + D_t,u            semi-supervised learning   ACL'07                                 HLT'06, CIKM'07
D_s + D_t,l + D_t,u    semi-supervised learning

65 Roadmap
What is domain adaptation in NLP?
Our work
– Overview
– Instance weighting
– Feature selection
Summary and future work

66 Summary
Domain adaptation is a critical new problem in natural language processing and machine learning.
Contributions:
– First systematic formal analysis of domain adaptation
– Two novel general frameworks, both shown to be effective
– Potentially applicable to classification problems outside of NLP
Future work:
– A measure of domain difference
– Unifying the two frameworks
– Incorporating domain knowledge into the adaptation process
– Leveraging domain adaptation for large-scale information extraction on scientific literature and on the Web

67 (figure: an information extraction system with entity recognition and relation extraction, supported by knowledge-resource exploitation of existing knowledge bases, domain-adaptive learning from labeled data in related domains, and interactive supervision by a domain expert)

68 Applications
Biomedical literature (MEDLINE abstracts, full-text articles, etc.), e.g., "DWnt-2 is expressed in somatic cells of the gonad throughout development."
The information extraction system (entity recognition + relation extraction) produces extracted facts, e.g., an expression relation between gene DWnt-2 and tissue/position gonad, which feed an inference engine and knowledge base for pathway construction, hypothesis generation, and knowledge base curation.

69 Applications (cont.)
Similar ideas apply to Web text mining, e.g., product reviews:
– Existing annotated reviews are limited (certain products from certain sources)
– Large amounts of semi-structured reviews are available on review websites
– Unstructured reviews appear in personal blogs

70 Selected Publications
This talk:
– J. Jiang & C. Zhai. "A two-stage approach to domain adaptation for statistical classifiers." In CIKM'07.
– J. Jiang & C. Zhai. "Instance weighting for domain adaptation in NLP." In ACL'07.
– J. Jiang & C. Zhai. "Exploiting domain structure for named entity recognition." In HLT-NAACL'06.
Feature exploration for relation extraction:
– J. Jiang & C. Zhai. "A systematic exploration of the feature space for relation extraction." In NAACL-HLT'07.
Information retrieval:
– J. Jiang & C. Zhai. "Extraction of coherent relevant passages using hidden Markov models." ACM Transactions on Information Systems (TOIS), Jul 2006.
– J. Jiang & C. Zhai. "An empirical study of tokenization strategies for biomedical information retrieval." Information Retrieval, Oct 2007.
Gene summarization:
– X. Ling, J. Jiang, X. He, Q. Mei, C. Zhai & B. Schatz. "Generating semi-structured gene summaries from biomedical literature." Information Processing & Management, Nov 2007.
– X. Ling, J. Jiang, X. He, Q. Mei, C. Zhai & B. Schatz. "Automatically generating gene summaries from biomedical literature." In PSB'06.

