
1 Sequence Classification: Chunking & NER
Shallow Processing Techniques for NLP, Ling570, November 23, 2011

2 Roadmap
- Named Entity Recognition
- Chunking
- HW #9

3 Named Entity Recognition

4 Roadmap: Named Entity Recognition
- Definition
- Motivation
- Challenges
- Common approach

5 Named Entity Recognition
Task: identify named entities in (typically) unstructured text.
Typical entities: person names, locations, organizations, dates, times

6-10 Example
Microsoft released Windows Vista in 2007.
(On the slide, Microsoft, Windows Vista, and 2007 are highlighted as entities.)
Entities are often application/domain specific:
- Business intelligence: products, companies, features
- Biomedical: genes, proteins, diseases, drugs, ...
(Example due to F. Xia)

11 Named Entity Types
Common categories (table shown on slide)

12 Named Entity Examples
Examples for the common categories (table shown on slide)

13-16 Why NER? Machine translation
- Person names are typically not translated, though possibly transliterated (e.g., Waldheim)
- Numbers: 9/11 may be a date or a ratio; 911 may be the emergency phone number or a simple number

17-21 Why NER?
- Information extraction: the MUC task on joint ventures/mergers focused on company names, person names (e.g., CEOs), and valuations
- Information retrieval: named entities are a focus of retrieval; in some data sets, 60+% of queries target NEs
- Text-to-speech: phone numbers such as 206-616-5728 are read differently from other digit strings, and conventions differ by language

22-25 Challenges: Ambiguity
- "Washington chose ...": D.C., the state, George, etc.
- Most digit strings are ambiguous
- cat (95 results): CAT(erpillar) stock ticker, Computerized Axial Tomography, Chloramphenicol Acetyl Transferase, small furry mammal

26 Context & Ambiguity

27 Evaluation
Precision, recall, F-measure

28 Resources
Online:
- Name lists: baby names, who's who, newswire services, census.gov
- Gazetteers
- SEC listings of companies
Tools:
- LingPipe, OpenNLP, Stanford NLP toolkit

29-33 Approaches to NER
- Rule/regex-based: match names/entities in lists; regexes, e.g. \d\d/\d\d/\d\d matches 11/23/11, and $\d+\.\d+ matches currency amounts (see the sketch below)
- Machine learning via sequence labeling: better for names and organizations
- Hybrid
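A minimal sketch of the regex patterns above in Python; the sample text and match behavior are illustrative only.

```python
import re

# Sketch of the rule/regex-based patterns from the slide (dates, currency).
# The patterns mirror the slide's \d\d/\d\d/\d\d and $\d+\.\d+ examples.
DATE_RE = re.compile(r"\b\d\d/\d\d/\d\d\b")   # e.g. 11/23/11
MONEY_RE = re.compile(r"\$\d+\.\d+")          # e.g. $239.99

text = "The product shipped 11/23/11 and cost $239.99."
print(DATE_RE.findall(text))   # ['11/23/11']
print(MONEY_RE.findall(text))  # ['$239.99']
```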

34 NER as Sequence Labeling

35-40 NER as Classification Task
- Instance: token
- Labels encode position and type:
  - Position: B(eginning), I(nside), O(utside)
  - NER types: PER, ORG, LOC, NUM
  - Label = Type-Position, e.g. PER-B, PER-I, O, ...
- How many tags? (|NER types| x 2) + 1 (computed below)
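A quick check of the tag-count formula, assuming the four NER types above:

```python
# Type-Position label set: (|NER types| x 2) + 1 for the single O tag.
ner_types = ["PER", "ORG", "LOC", "NUM"]
labels = [f"{t}-{p}" for t in ner_types for p in ("B", "I")] + ["O"]
print(labels)       # ['PER-B', 'PER-I', 'ORG-B', ..., 'NUM-I', 'O']
print(len(labels))  # 9 = (4 x 2) + 1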

41-44 NER as Classification: Features
What information can we use for NER?
- Predictive tokens: e.g. MD, Rev, Inc, ...
- How general are these features? Language? Genre? Domain?

45-52 NER as Classification: Shape Features
Shape types:
- lower (all lower case): e.g. cumming
- capitalized (first letter uppercase): e.g. Washington
- all caps (all letters capitalized): e.g. WHO
- mixed case (mixed upper and lower case): e.g. eBay
- capitalized with period: e.g. H.
- ends with digit: e.g. A9
- contains hyphen: e.g. H-P
(A shape-feature sketch follows.)
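A minimal sketch of a shape-feature function covering the classes above; the class names and the order of the checks are my assumptions, not the course code.

```python
import re

def word_shape(token: str) -> str:
    """Map a token to one of the shape classes from the slide.
    The priority order of the checks is an assumption."""
    if "-" in token:
        return "contains-hyphen"   # H-P
    if token and token[-1].isdigit():
        return "ends-with-digit"   # A9
    if re.fullmatch(r"[A-Z]\.", token):
        return "cap-period"        # H.
    if token.islower():
        return "lower"             # cumming
    if token.isupper():
        return "all-caps"          # WHO
    if token[:1].isupper() and token[1:].islower():
        return "capitalized"       # Washington
    return "mixed-case"            # eBay

for w in ["cumming", "Washington", "WHO", "eBay", "H.", "A9", "H-P"]:
    print(w, "->", word_shape(w))
```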

53 Example: Instance Representation
(worked example on slide)

54 Sequence Labeling Example

55-57 Evaluation
- System: output of automatic tagging
- Gold standard: true tags
- Precision: # correct chunks / # system chunks
- Recall: # correct chunks / # gold chunks
- F-measure: F1 balances precision and recall
(A sketch of the computation follows.)
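A minimal sketch of chunk-level precision/recall/F1 as defined above, comparing chunks as (start, end, type) triples; this is illustrative, not the CoNLL conlleval script.

```python
def prf(system_chunks, gold_chunks):
    """Chunk-level P, R, and F1 = 2PR / (P + R)."""
    sys_set, gold_set = set(system_chunks), set(gold_chunks)
    correct = len(sys_set & gold_set)
    p = correct / len(sys_set) if sys_set else 0.0
    r = correct / len(gold_set) if gold_set else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = [(0, 2, "PER"), (5, 6, "LOC")]
system = [(0, 2, "PER"), (4, 6, "LOC")]   # one correct, one wrong span
print(prf(system, gold))                  # (0.5, 0.5, 0.5)
```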

58-62 Evaluation
- Standard measures: precision, recall, F-measure, computed over entity types (CoNLL evaluation)
- Classifiers vs. evaluation measures: classifiers optimize tag accuracy, but the most common tag is O, since most tokens aren't NEs; the evaluation measures focus on the NEs
- State of the art on standard tasks: PER, LOC: 0.92; ORG: 0.84

63-66 Hybrid Approaches
Practical systems exploit lists, rules, learning, ...
Multi-pass:
- Early passes: high precision, low recall
- Later passes: noisier sequence learning
A hybrid system might:
- Tag unambiguous mentions with high-precision rules
- Use string matching to capture substring matches
- Tag items from domain-specific name lists
- Apply a sequence labeler
(A first-pass sketch follows.)
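A hedged sketch of the high-precision first pass over a stand-in name list; the later, noisier learned passes are not shown.

```python
# KNOWN_ORGS is a stand-in for a domain-specific name list.
KNOWN_ORGS = {"Microsoft", "IBM"}

def first_pass(tokens):
    """Tag only unambiguous list matches; leave everything else O."""
    tags = ["O"] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok in KNOWN_ORGS:
            tags[i] = "ORG-B"   # unambiguous mention tagged by rule
    return tags

print(first_pass("Microsoft released Windows Vista".split()))
# ['ORG-B', 'O', 'O', 'O']; later passes would handle 'Windows Vista'
```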

67 Chunking

68 Roadmap: Chunking
- Definition
- Motivation
- Challenges
- Approach

69-75 What is Chunking?
- A form of partial (shallow) parsing: extracts major syntactic units, but not full parse trees
- Task: identify and classify flat, non-overlapping segments of a sentence
- Basic non-recursive phrases corresponding to the major POS categories
- May ignore some categories, e.g. base NP chunking
- Creates a simple bracketing:
  [NP The morning flight] [PP from] [NP Denver] [VP has arrived]
  Base NP chunking: [NP The morning flight] from [NP Denver] has arrived

76-81 Why Chunking?
- Used when a full parse is unnecessary, infeasible, or impossible (when?)
- Extraction of subcategorization frames: identify verb arguments, e.g. VP NP; VP NP NP; VP NP to NP
- Information extraction: who did what to whom
- Summarization: keep base information, remove modifiers
- Information retrieval: restrict indexing to base NPs

82-85 Processing Example
- Tokenization: The morning flight from Denver has arrived
- POS tagging: DT JJ N PREP NNP AUX V
- Chunking: NP PP NP VP
- Extraction: NP NP VP etc.

86-87 Approaches
- Finite-state: grammatical rules in FSTs, cascaded to produce more complex structure
- Machine learning: similar to POS tagging

88-90 Finite-State Rule-Based Chunking
- Hand-crafted rules model phrases, typically application-specific
- Left-to-right longest match (Abney 1996): start at the beginning of the sentence and find the longest matching rule
- Greedy approach, not guaranteed optimal

91-94 Finite-State Rule-Based Chunking
Chunk rules cannot contain recursion:
- NP -> Det Nominal: okay
- Nominal -> Nominal PP: not okay
Examples:
- NP -> (Det) Noun* Noun
- NP -> Proper-Noun
- VP -> Verb
- VP -> Aux Verb
Consider: "Time flies like an arrow." Is this what we want?
(A greedy matcher sketch follows.)
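A hedged sketch of left-to-right longest-match chunking over POS tags. The NP rule here extends the slide's (Det) Noun* Noun with adjectives so the running example ("The morning flight" = DT JJ N) matches; that extension is mine, not the slide's.

```python
NOUNS = {"N", "NN", "NNS", "NNP"}

def match_np(tags, i):
    """NP -> (Det) (Adj|Noun)* Noun: return end index of match, else None."""
    j = i + 1 if i < len(tags) and tags[i] == "DT" else i
    k = j
    while k < len(tags) and tags[k] in NOUNS | {"JJ"}:
        k += 1
    if k > j and tags[k - 1] in NOUNS:   # match must end in a noun
        return k
    return None

def greedy_chunk(tags):
    chunks, i = [], 0
    while i < len(tags):
        end = match_np(tags, i)
        if end is not None:              # longest NP starting at i
            chunks.append(("NP", i, end))
            i = end
        else:
            i += 1                       # no rule fires; skip this tag
    return chunks

print(greedy_chunk(["DT", "JJ", "N", "PREP", "NNP", "AUX", "V"]))
# [('NP', 0, 3), ('NP', 4, 5)]
```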

95-96 Cascading FSTs
Richer partial parsing: pass the output of one FST to the next.
Approach:
- First stage: base phrase chunking
- Next stage: larger constituents (e.g. PPs, VPs)
- Highest stage: sentences
(A two-stage sketch follows.)
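A minimal sketch of one cascade step, assuming stage 1 has already rewritten base NPs: stage 2 builds PPs from PREP + NP.

```python
# Output of a (hypothetical) stage-1 base-phrase chunker becomes the
# input alphabet of stage 2.
stage1 = ["NP", "PREP", "NP", "AUX", "V"]

def cascade_pp(tags):
    """Stage 2: rewrite PREP NP as PP, copy everything else through."""
    out, i = [], 0
    while i < len(tags):
        if tags[i] == "PREP" and i + 1 < len(tags) and tags[i + 1] == "NP":
            out.append("PP")      # PP -> PREP NP
            i += 2
        else:
            out.append(tags[i])
            i += 1
    return out

print(cascade_pp(stage1))   # ['NP', 'PP', 'AUX', 'V']
```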

97 Example

98-103 Chunking by Classification
Model chunking as a task similar to POS tagging.
- Instance: tokens
- Labels: simultaneously encode segmentation & identification
- IOB (or BIO) tagging (also BIOE or BIOSE):
  - Segment: B(eginning), I(nside), O(utside)
  - Identity: phrase category: NP, VP, PP, etc.
Example:
  The  morning flight from Denver has  arrived
  NP-B NP-I    NP-I   PP-B NP-B   VP-B VP-I
Base NP chunking only (non-NP tokens are O):
  NP-B NP-I    NP-I   O    NP-B   O    O
(A tag-to-chunk conversion sketch follows.)
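A hedged sketch of recovering chunks from the Type-Position tags above; malformed sequences (e.g. a stray I tag with no open chunk) are simply skipped.

```python
def bio_to_chunks(tags):
    """Recover (start, end, type) chunks from tags like NP-B, NP-I, O."""
    chunks, start, kind = [], None, None
    for i, tag in enumerate(tags + ["O"]):   # trailing O flushes last chunk
        typ, _, pos = tag.partition("-")
        if pos != "I" and kind is not None:  # open chunk ends before i
            chunks.append((start, i, kind))
            start = kind = None
        if pos == "B":
            start, kind = i, typ
    return chunks

tags = ["NP-B", "NP-I", "NP-I", "PP-B", "NP-B", "VP-B", "VP-I"]
print(bio_to_chunks(tags))
# [(0, 3, 'NP'), (3, 4, 'PP'), (4, 5, 'NP'), (5, 7, 'VP')]
```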

104-108 Features for Chunking
What are good features?
- Chunk tags (already predicted) for the 2 preceding words
- Words: 2 preceding, current, 2 following
- Parts of speech: 2 preceding, current, 2 following
- The feature vector includes those features plus the true label
(A feature-extraction sketch follows.)
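A sketch of extracting those features, assuming chunk tags are predicted left to right so only preceding tags are available; the feature names (w-2, p+1, c-1, ...) are mine, not a prescribed format.

```python
def chunk_features(words, pos, prev_chunk_tags, i):
    """Features for token i: words and POS in a 5-token window,
    plus the chunk tags already assigned to the 2 preceding tokens."""
    def get(seq, j, pad="<pad>"):
        return seq[j] if 0 <= j < len(seq) else pad
    feats = {}
    for k in range(-2, 3):
        feats[f"w{k:+d}"] = get(words, i + k)
        feats[f"p{k:+d}"] = get(pos, i + k)
    feats["c-2"] = get(prev_chunk_tags, i - 2)
    feats["c-1"] = get(prev_chunk_tags, i - 1)
    return feats

words = "The morning flight from Denver has arrived".split()
pos = ["DT", "JJ", "N", "PREP", "NNP", "AUX", "V"]
print(chunk_features(words, pos, ["NP-B", "NP-I"], 2))
```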

109 Chunking as Classification Example

110 Evaluation
- System: output of automatic tagging
- Gold standard: true tags, typically extracted from a parsed treebank
- Precision: # correct chunks / # system chunks
- Recall: # correct chunks / # gold chunks
- F-measure: F1 balances precision and recall

111-114 State-of-the-Art
- Base NP chunking: 0.96
- Complex phrases: learning 0.92-0.94 (most learners achieve similar results); rule-based 0.85-0.92
- Limiting factors:
  - POS tagging accuracy
  - Inconsistent labeling (parse tree extraction)
  - Conjunctions: "Late departures and arrivals are common in winter" vs. "Late departures and cancellations are common in winter"

115 HW #9

116 Building a MaxEnt POS Tagger
Q1: Build feature-vector representations for POS tagging in SVMlight format.
  maxent_features.* training_file testing_file rare_wd_threshold rare_feat_threshold outdir
- training_file, testing_file: same format as HW #7, i.e. w1/t1 w2/t2 ... wn/tn
- Filter rare words and infrequent features
- Store vectors and intermediate representations in outdir
(An SVMlight-format sketch follows.)
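A hedged sketch of emitting one instance in SVMlight format; the feature ids and label encoding below are invented for illustration, not the assignment's actual mapping.

```python
# SVMlight format: "label featid:value featid:value ..." with integer
# feature ids in ascending order. The ids and label here are made up.
def to_svmlight(label_id, feat_ids):
    feats = " ".join(f"{f}:1" for f in sorted(set(feat_ids)))  # binary feats
    return f"{label_id} {feats}"

# e.g. a token with hypothetical ids for curW=flight, prevT=JJ, ...
print(to_svmlight(12, [4051, 17, 982]))  # "12 17:1 982:1 4051:1"
```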

117 Feature Representations
- Features: Ratnaparkhi (1996), Table 1 (duplicated in the MaxEnt slides)
- Character issues: replace "," with "comma" and ":" with "colon", since the Mallet and SVMlight formats use these characters as delimiters

118 Q2: Experiments
- Run MaxEnt classification using your training and test files
- Compare the effects of different thresholds on feature count, accuracy, and runtime
Note: big files. This assignment will produce even larger sets of results than HW #8, so please gzip your tar files. If the DropBox won't accept the files, you can store them on patas; just let Sanghoun know where to find them.

