Semi-Supervised Learning & Summary Advanced Statistical Methods in NLP Ling 572 March 8, 2012.


1 Semi-Supervised Learning & Summary Advanced Statistical Methods in NLP Ling 572 March 8, 2012

2 Roadmap Semi-supervised learning: Motivation & perspective Yarowsky’s model Co-training Summary 2

3 Semi-supervised Learning 3

4 Motivation Supervised learning: 4

5 Motivation Supervised learning: Works really well But need lots of labeled training data Unsupervised learning: 5

6 Motivation Supervised learning: Works really well But need lots of labeled training data Unsupervised learning: No labeled data required, but May not work well, may not learn desired distinctions 6

7 Motivation Supervised learning: Works really well But need lots of labeled training data Unsupervised learning: No labeled data required, but May not work well, may not learn desired distinctions E.g. Unsupervised parsing techniques Fits data, but doesn’t correspond to linguistic intuition 7

8 Solution Semi-supervised learning: 8

9 Solution Semi-supervised learning: General idea: Use a small amount of labeled training data 9

10 Solution Semi-supervised learning: General idea: Use a small amount of labeled training data Augment with large amount of unlabeled training data Use information in unlabeled data to improve models 10

11 Solution Semi-supervised learning: General idea: Use a small amount of labeled training data Augment with large amount of unlabeled training data Use information in unlabeled data to improve models Many different semi-supervised machine learners Variants of supervised techniques: Semi-supervised SVMs, CRFs, etc 11

12 Solution Semi-supervised learning: General idea: Use a small amount of labeled training data Augment with large amount of unlabeled training data Use information in unlabeled data to improve models Many different semi-supervised machine learners Variants of supervised techniques: Semi-supervised SVMs, CRFs, etc Bootstrapping approaches Yarowsky’s method, self-training, co-training 12

13 Label the First Use of "Plant"

Biological example: There are more kinds of plants and animals in the rainforests than anywhere else on Earth. Over half of the millions of known species of plants and animals live in the rainforest. Many are found nowhere else. There are even plants and animals in the rainforest that we have not yet discovered.

Industrial example: The Paulus company was founded in … Since those days the product range has been the subject of constant expansion and is brought up continuously to correspond with the state of the art. We're engineering, manufacturing and commissioning world-wide ready-to-run plants packed with our comprehensive know-how. Our product range includes pneumatic conveying systems for carbon, carbide, sand, lime and many others. We use reagent injection in molten metal for the… 13

14 Word Sense Disambiguation Application of lexical semantics Goal: Given a word in context, identify the appropriate sense E.g. plants and animals in the rainforest Crucial for real syntactic & semantic analysis 14

15 Word Sense Disambiguation Application of lexical semantics Goal: Given a word in context, identify the appropriate sense E.g. plants and animals in the rainforest Crucial for real syntactic & semantic analysis Correct sense can determine: 15

16 Word Sense Disambiguation Application of lexical semantics Goal: Given a word in context, identify the appropriate sense E.g. plants and animals in the rainforest Crucial for real syntactic & semantic analysis Correct sense can determine: Available syntactic structure Available thematic roles, correct meaning, etc. 16

17 Disambiguation Features Key: What are the features? 17

18 Disambiguation Features Key: What are the features? Part of speech Of word and neighbors Morphologically simplified form Words in neighborhood Question: How big a neighborhood? Is there a single optimal size? Why? (Possibly shallow) Syntactic analysis E.g. predicate-argument relations, modification, phrases Collocation vs co-occurrence features Collocation: words in specific relation: predicate-argument, 1 word +/- Co-occurrence: bag of words 18
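The collocation vs. co-occurrence distinction above can be sketched in a few lines of Python. This is a toy illustration; the function names, window size, and sentence-boundary tokens are my own choices, not from the lecture:

```python
# Two feature types for WSD over a tokenized sentence, given the index
# of the target word. Names and window sizes here are illustrative.

def collocation_features(tokens, i, k=1):
    """Positional features: the specific words exactly k positions
    to the left and right of the target."""
    left = tokens[i - k] if i - k >= 0 else "<s>"
    right = tokens[i + k] if i + k < len(tokens) else "</s>"
    return {f"w-{k}": left, f"w+{k}": right}

def cooccurrence_features(tokens, i, window=3):
    """Bag-of-words features: which words occur anywhere in the
    neighborhood, ignoring position."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return {f"bow={w}": 1 for j, w in enumerate(tokens[lo:hi], lo) if j != i}

sent = "there are more plants and animals in the rainforests".split()
i = sent.index("plants")
print(collocation_features(sent, i))   # {'w-1': 'more', 'w+1': 'and'}
print(cooccurrence_features(sent, i))
```

Note how the collocation features keep position ("the word immediately left is 'more'") while the co-occurrence features only record presence ("'animals' occurs nearby").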

19 WSD Evaluation 19

20 WSD Evaluation Ideally, end-to-end evaluation with WSD component Demonstrate real impact of technique in system Difficult, expensive, still application specific 20

21 WSD Evaluation Ideally, end-to-end evaluation with WSD component Demonstrate real impact of technique in system Difficult, expensive, still application specific Typically, intrinsic, sense-based Accuracy, precision, recall SENSEVAL/SEMEVAL: all words, lexical sample 21

22 WSD Evaluation Ideally, end-to-end evaluation with WSD component Demonstrate real impact of technique in system Difficult, expensive, still application specific Typically, intrinsic, sense-based Accuracy, precision, recall SENSEVAL/SEMEVAL: all words, lexical sample Baseline: Most frequent sense Topline: Human inter-rater agreement: 75-80% fine; 90% coarse 22
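The most-frequent-sense baseline mentioned above is easy to compute; a minimal sketch with made-up toy sense labels:

```python
from collections import Counter

# Most-frequent-sense (MFS) baseline: always predict the sense observed
# most often in training. The "plant" sense labels below are toy data.
train = ["plant/factory", "plant/flora", "plant/flora", "plant/flora"]
test = ["plant/flora", "plant/factory", "plant/flora"]

mfs = Counter(train).most_common(1)[0][0]           # majority sense
accuracy = sum(y == mfs for y in test) / len(test)  # baseline accuracy
print(mfs, round(accuracy, 2))
```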

23 Minimally Supervised WSD Yarowsky’s algorithm (1995) Bootstrapping approach: Use small labeled seedset to iteratively train 23

24 Minimally Supervised WSD Yarowsky’s algorithm (1995) Bootstrapping approach: Use small labeled seedset to iteratively train Builds on 2 key insights: One Sense Per Discourse Word appearing multiple times in text has same sense Corpus of bass instances: always single sense 24

25 Minimally Supervised WSD Yarowsky’s algorithm (1995) Bootstrapping approach: Use small labeled seedset to iteratively train Builds on 2 key insights: One Sense Per Discourse Word appearing multiple times in text has same sense Corpus of bass instances: always single sense One Sense Per Collocation Local phrases select single sense Fish -> Bass 1 Play -> Bass 2 25

26 Yarowsky’s Algorithm Training Decision Lists 1. Pick Seed Instances & Tag 26

27 Yarowsky’s Algorithm Training Decision Lists 1. Pick Seed Instances & Tag 2. Find Collocations: Word Left, Word Right, Word +K 27

28 Yarowsky’s Algorithm Training Decision Lists 1. Pick Seed Instances & Tag 2. Find Collocations: Word Left, Word Right, Word +K (A) Calculate Informativeness on Tagged Set, Order: 28

29 Yarowsky’s Algorithm Training Decision Lists 1. Pick Seed Instances & Tag 2. Find Collocations: Word Left, Word Right, Word +K (A) Calculate Informativeness on Tagged Set, Order: (B) Tag New Instances with Rules 29

30 Yarowsky’s Algorithm Training Decision Lists 1. Pick Seed Instances & Tag 2. Find Collocations: Word Left, Word Right, Word +K (A) Calculate Informativeness on Tagged Set, Order: (B) Tag New Instances with Rules (C) Apply 1 Sense/Discourse (D) 30

31 Yarowsky’s Algorithm Training Decision Lists 1. Pick Seed Instances & Tag 2. Find Collocations: Word Left, Word Right, Word +K (A) Calculate Informativeness on Tagged Set, Order: (B) Tag New Instances with Rules (C) Apply 1 Sense/Discourse (D) If Still Unlabeled, Go To 2 31

32 Yarowsky’s Algorithm Training Decision Lists 1. Pick Seed Instances & Tag 2. Find Collocations: Word Left, Word Right, Word +K (A) Calculate Informativeness on Tagged Set, Order: (B) Tag New Instances with Rules (C) Apply 1 Sense/Discourse (D) If Still Unlabeled, Go To 2 3. Apply 1 Sense/Discourse 32

33 Yarowsky’s Algorithm Training Decision Lists 1. Pick Seed Instances & Tag 2. Find Collocations: Word Left, Word Right, Word +K (A) Calculate Informativeness on Tagged Set, Order: (B) Tag New Instances with Rules (C) Apply 1 Sense/Discourse (D) If Still Unlabeled, Go To 2 3. Apply 1 Sense/Discourse Disambiguation: First Rule Matched 33
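A minimal sketch of one training pass for the decision list follows. The smoothed absolute log-likelihood-ratio scoring is in the spirit of Yarowsky's informativeness measure, but the data layout (an instance is a set of collocation features), the smoothing constant, and the two sense labels "A"/"B" are illustrative assumptions:

```python
import math
from collections import defaultdict

# Steps 2(A)-(B) in miniature: score each collocation feature by a
# smoothed absolute log-likelihood ratio between the two senses, order
# rules by that informativeness, then tag instances with the first
# matching rule.

def train_decision_list(labeled, alpha=0.1):
    counts = defaultdict(lambda: defaultdict(float))
    for feats, sense in labeled:
        for f in feats:
            counts[f][sense] += 1
    rules = []
    for f, by_sense in counts.items():
        a = by_sense.get("A", 0) + alpha      # add-alpha smoothing
        b = by_sense.get("B", 0) + alpha
        rules.append((abs(math.log(a / b)), f, "A" if a > b else "B"))
    rules.sort(reverse=True)                  # most informative first
    return rules

def classify(rules, feats):
    for _, f, sense in rules:
        if f in feats:                        # first rule matched wins
            return sense
    return None

labeled = [({"fish", "river"}, "A"), ({"play", "music"}, "B"),
           ({"fish", "caught"}, "A")]
rules = train_decision_list(labeled)
print(classify(rules, {"fish", "lake"}))      # prints A: 'fish' is the strongest rule
```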

34 Yarowsky Decision List 34

35 Iterative Updating 35

36 Label the First Use of "Plant"

Biological example: There are more kinds of plants and animals in the rainforests than anywhere else on Earth. Over half of the millions of known species of plants and animals live in the rainforest. Many are found nowhere else. There are even plants and animals in the rainforest that we have not yet discovered.

Industrial example: The Paulus company was founded in … Since those days the product range has been the subject of constant expansion and is brought up continuously to correspond with the state of the art. We're engineering, manufacturing and commissioning world-wide ready-to-run plants packed with our comprehensive know-how. Our product range includes pneumatic conveying systems for carbon, carbide, sand, lime and many others. We use reagent injection in molten metal for the… 36

37 Sense Choice With Collocational Decision Lists Create Initial Decision List Rules Ordered by Informativeness 37

38 Sense Choice With Collocational Decision Lists Create Initial Decision List Rules Ordered by Informativeness Check Nearby Word Groups (Collocations) Biology: "Animal" among nearby words Industry: "Manufacturing" among nearby words 38

39 Sense Choice With Collocational Decision Lists Create Initial Decision List Rules Ordered by Informativeness Check Nearby Word Groups (Collocations) Biology: "Animal" among nearby words Industry: "Manufacturing" among nearby words Result: Correct Selection 95% on Pairwise Tasks 39

40 Self-Training Basic approach: Start off with small labeled training set Train a supervised classifier with the training set Apply new classifier to residual unlabeled training data Add ‘best’ newly labeled examples to labeled training Iterate 40

41 Self-Training Simple – right? 41

42 Self-Training Simple – right? Devil in the details: Which instances are ‘best’ to add? 42

43 Self-Training Simple – right? Devil in the details: Which instances are ‘best’ to add? Highest confidence? Probably accurate, but Probably add little new information to classifier 43

44 Self-Training Simple – right? Devil in the details: Which instances are ‘best’ to add? Highest confidence? Probably accurate, but Probably add little new information to classifier Most different? Probably adds information, but May not be accurate Use most different, highly confident instances 44
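The self-training loop above can be sketched end to end. The toy one-dimensional nearest-centroid learner and the margin-based notion of "confident" are stand-ins for any real classifier and confidence score:

```python
# Self-training sketch: train, label the unlabeled pool, absorb only
# the confident labels, repeat. All names and thresholds are toy.

def train_centroid(X, y):
    """Toy learner: one centroid (mean) per class."""
    cents = {}
    for c in set(y):
        pts = [x for x, lab in zip(X, y) if lab == c]
        cents[c] = sum(pts) / len(pts)
    return cents

def predict_conf(cents, x):
    """Predict nearest centroid; confidence = distance margin."""
    ranked = sorted(cents, key=lambda c: abs(x - cents[c]))
    margin = abs(x - cents[ranked[1]]) - abs(x - cents[ranked[0]])
    return ranked[0], margin

def self_train(X_lab, y_lab, X_unlab, rounds=5, min_margin=1.0):
    X_lab, y_lab, X_unlab = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(rounds):
        cents = train_centroid(X_lab, y_lab)
        newly, rest = [], []
        for x in X_unlab:
            lab, margin = predict_conf(cents, x)
            (newly if margin >= min_margin else rest).append((x, lab))
        if not newly:
            break
        X_lab += [x for x, _ in newly]   # add confident self-labels
        y_lab += [lab for _, lab in newly]
        X_unlab = [x for x, _ in rest]
    return train_centroid(X_lab, y_lab)

cents = self_train([0.0, 10.0], ["A", "B"], [1.0, 2.0, 8.0, 9.0, 5.2])
print(cents)   # centroids after absorbing the confident unlabeled points
```

Note that the ambiguous point 5.2 is never absorbed: its margin stays below the threshold, which is exactly the "highest confidence" caution from the slide above.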

45 Co-Training Blum & Mitchell, 1998 Basic intuition: “Two heads are better than one” 45

46 Co-Training Blum & Mitchell, 1998 Basic intuition: “Two heads are better than one” Ensemble classifier: Uses results from multiple classifiers 46

47 Co-Training Blum & Mitchell, 1998 Basic intuition: “Two heads are better than one” Ensemble classifier: Uses results from multiple classifiers Multi-view classifier: Uses different views of data – feature subsets Ideally, views should be: Conditionally independent Individually sufficient – enough information to learn 47

48 Co-training Set-up Create two views of data: Typically partition feature set by type E.g. predicting speech emphasis View 1: Acoustics: loudness, pitch, duration View 2: Lexicon, syntax, context 48

49 Co-training Set-up Create two views of data: Typically partition feature set by type E.g. predicting speech emphasis View 1: Acoustics: loudness, pitch, duration View 2: Lexicon, syntax, context Some approaches use learners of different types In practice, views may not truly be conditionally indep. But often works pretty well anyway 49

50 Co-training Approach Create small labeled training data set Train two (supervised) classifiers on current training Using different views Use two classifiers to label residual unlabeled instances Select 'best' newly labeled data to add to training data Adding instances labeled by C1 to training data for C2, and vice versa Iterate 50
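The co-training loop can be sketched the same way. The toy per-view "class mean" learner, the 0.9 confidence cutoff, and the one-feature views are illustrative assumptions; the essential point is that each classifier's confident labels feed the other's training set:

```python
# Co-training sketch: two classifiers on different feature views, each
# labeling confident unlabeled instances for the OTHER.

def fit_view(data):
    """Toy per-view learner: mean feature value per class."""
    model = {}
    for c in {lab for _, lab in data}:
        vals = [x for x, lab in data if lab == c]
        model[c] = sum(vals) / len(vals)
    return model

def predict(model, x):
    """Nearest class mean; confidence grows with the distance margin."""
    ranked = sorted(model, key=lambda c: abs(x - model[c]))
    gap = abs(x - model[ranked[1]]) - abs(x - model[ranked[0]])
    return ranked[0], min(1.0, gap / 10)

def cotrain(train1, train2, unlabeled, rounds=3, min_conf=0.9):
    for _ in range(rounds):
        m1, m2 = fit_view(train1), fit_view(train2)
        rest = []
        for v1, v2 in unlabeled:
            l1, c1 = predict(m1, v1)
            l2, c2 = predict(m2, v2)
            if c2 >= min_conf:
                train1.append((v1, l2))   # C2's label feeds C1
            elif c1 >= min_conf:
                train2.append((v2, l1))   # and vice versa
            else:
                rest.append((v1, v2))
        unlabeled = rest
    return fit_view(train1), fit_view(train2)

m1, m2 = cotrain([(0.0, "A"), (10.0, "B")],
                 [(100.0, "A"), (200.0, "B")],
                 [(1.0, 110.0), (9.0, 190.0)])
print(m1, m2)
```

In this toy run, view 2 is confident on both unlabeled pairs, so its labels sharpen view 1's model even though view 1 alone was unsure about them.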

51 Graphically Figure from Jeon & Liu '11 51

52 More Devilish Details 52 Questions for co-training:

53 More Devilish Details 53 Questions for co-training: Which instances are ‘best’ to add to training? Most confident? Most different? Random? Many approaches combine

54 More Devilish Details 54 Questions for co-training: Which instances are ‘best’ to add to training? Most confident? Most different? Random? Many approaches combine How many instances to add per iteration? Threshold – by count, by value?

55 More Devilish Details 55 Questions for co-training: Which instances are ‘best’ to add to training? Most confident? Most different? Random? Many approaches combine How many instances to add per iteration? Threshold – by count, by value? How long to iterate? Fixed count? Threshold classifier confidence? etc…

56 Co-training Applications Applied to many language related tasks Blum & Mitchell’s paper Academic home web page classification 95% accuracy: 12 pages labeled; 788 classified Sentiment analysis Statistical parsing Prominence recognition Dialog classification 56

57 Learning Curves: Semi-supervised vs Supervised 57

58 Semi-supervised Learning Umbrella term for machine learning techniques that: Use a small amount of labeled training data Augmented with information from unlabeled data 58

59 Semi-supervised Learning Umbrella term for machine learning techniques that: Use a small amount of labeled training data Augmented with information from unlabeled data Can be very effective: Training on ~10 labeled samples Can yield results comparable to training on 1000s 59

60 Semi-supervised Learning Umbrella term for machine learning techniques that: Use a small amount of labeled training data Augmented with information from unlabeled data Can be very effective: Training on ~10 labeled samples Can yield results comparable to training on 1000s Can be temperamental: Sensitive to data, learning algorithm, design choices Hard to predict effects of: amount of labeled data, unlabeled data, etc 60

61 Summary 61

62 Review Introduction: Entropy, cross-entropy, and mutual information Classic machine learning algorithms: Decision trees, kNN, Naïve Bayes Discriminative machine learning algorithms: MaxEnt, CRFs, SVMs Other models: TBL, EM, Semi-supervised approaches 62

63 General Methods Data organization: Training, development, test data splits Cross-validation: Parameter tuning, evaluation Feature selection: Wrapper methods, filtering, weighting Beam search 63
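The cross-validation method listed above can be sketched in a few lines; `fit` and `score` are stand-ins for any learner and evaluation metric:

```python
# k-fold cross-validation sketch, as used for parameter tuning: each
# fold takes a turn as held-out data and the scores are averaged.

def kfold_indices(n, k):
    """Partition indices 0..n-1 into k (train, test) splits."""
    folds = []
    for i in range(k):
        test = [j for j in range(n) if j % k == i]    # every k-th item
        train = [j for j in range(n) if j % k != i]
        folds.append((train, test))
    return folds

def cross_validate(data, k, fit, score):
    scores = []
    for train_idx, test_idx in kfold_indices(len(data), k):
        model = fit([data[j] for j in train_idx])
        scores.append(score(model, [data[j] for j in test_idx]))
    return sum(scores) / k                            # mean fold score

print(kfold_indices(6, 3)[0])   # ([1, 2, 4, 5], [0, 3])
```

To tune a parameter, one would run `cross_validate` once per candidate value on the training+development data and pick the value with the best mean score, reserving the test split for the final evaluation.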

64 Tools, Data, & Tasks Tools: Mallet libSVM Data: 20 Newsgroups (Text classification) Penn Treebank (POS tagging) 64

65 Beyond 572 Ling 573: ‘Capstone’ project class: Integrates material from 57* classes More ‘real world’: project teams, deliverables, repositories 65

66 Beyond 572 Ling 573: ‘Capstone’ project class: Integrates material from 57* classes More ‘real world’: project teams, deliverables, repositories Ling 575s: Speech technology: Michael Tjalve (TH: 4pm) NLP on mobile devices: Scott Farrar (T: 4pm) 66

67 Beyond 572 Ling 573: ‘Capstone’ project class: Integrates material from 57* classes More ‘real world’: project teams, deliverables, repositories Ling 575s: Speech technology: Michael Tjalve (TH: 4pm) NLP on mobile devices: Scott Farrar (T: 4pm) Ling and other electives 67

68 Course Evaluations https://depts.washington.edu/oeaias/webq/survey.cgi?user=UWDL&survey=1397 Thank you! 68

