Overview of Statistical NLP IR Group Meeting March 7, 2006.


1 Overview of Statistical NLP IR Group Meeting March 7, 2006

2 Outline
- Some basic/important NLP problems
- Topics that have recently attracted much interest
- NLP research groups
- Discussion of the relation between NLP and IR

3 Levels of Analysis in NLP (from Dan Roth’s CS598)
- Morphology: how words are constructed
- Syntax: structural relations between words
- Semantics: the meaning of words and of combinations of words
- Pragmatics: how is a sentence used? What is its purpose?
- Discourse (sometimes distinguished as a subfield of pragmatics): relationships between sentences; global context

4 Some NLP Problems
- N-gram models
- Word sense disambiguation
- Lexical acquisition
- (POS) tagging
- (Syntactic) parsing
- Semantic role labeling (semantic parsing)
- Named entity recognition
- Textual entailment
- …

5 N-gram Models
The task: to estimate $P(w_n \mid w_1, \ldots, w_{n-1})$
Approaches:
- Maximum likelihood estimation
- Various smoothing methods
Applications:
- Automatic speech recognition
- Spelling correction
- Handwriting recognition
- Statistical machine translation
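
The two approaches on this slide can be illustrated with a minimal sketch for the bigram case (n = 2). The function names, the toy corpus, and the choice of add-alpha smoothing are assumptions made for illustration; the slides do not say which smoothing methods were discussed.

```python
from collections import Counter

def train_bigram(sentences):
    """Count histories and bigrams over tokenized sentences."""
    history_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]  # sentence boundary markers
        vocab.update(padded)
        for prev, cur in zip(padded, padded[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return history_counts, bigram_counts, vocab

def bigram_prob(prev, cur, history_counts, bigram_counts, vocab, alpha=0.0):
    """Estimate P(cur | prev).

    alpha = 0 gives the maximum likelihood estimate;
    alpha = 1 gives add-one (Laplace) smoothing.
    """
    numerator = bigram_counts[(prev, cur)] + alpha
    denominator = history_counts[prev] + alpha * len(vocab)
    return numerator / denominator if denominator > 0 else 0.0

# Toy corpus (hypothetical).
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
hist, bi, vocab = train_bigram(corpus)

mle_seen = bigram_prob("the", "cat", hist, bi, vocab)             # seen bigram
mle_unseen = bigram_prob("the", "sat", hist, bi, vocab)           # unseen: MLE is zero
smoothed_unseen = bigram_prob("the", "sat", hist, bi, vocab, 1.0) # unseen: non-zero
```

The unsmoothed estimate assigns zero probability to any bigram missing from the training data, which is the problem smoothing is meant to address.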

6 Word Sense Disambiguation (WSD)
The task: to determine which sense of an ambiguous word is intended in a particular use of the word
Approaches:
- Supervised: log-linear models, information-theoretic methods, memory-based learning (kNN)
- Dictionary-based: sense definitions, thesauri, translations in a second language
- Unsupervised: clustering using the EM algorithm
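
As one illustration of the dictionary-based approach using sense definitions, here is a minimal sketch in the spirit of the simplified Lesk algorithm: choose the sense whose definition shares the most words with the context. The sense inventory and stopword list are hand-written toy data, not taken from a real dictionary, and the slides do not name a specific algorithm.

```python
def simplified_lesk(context_words, sense_definitions, stopwords=frozenset()):
    """Return the sense whose definition overlaps most with the context words."""
    context = {w.lower() for w in context_words} - stopwords
    best_sense, best_overlap = None, -1
    for sense, definition in sense_definitions.items():
        gloss = {w.lower() for w in definition.split()} - stopwords
        overlap = len(context & gloss)
        if overlap > best_overlap:  # ties keep the earlier sense
            best_sense, best_overlap = sense, overlap
    return best_sense

# Hypothetical toy definitions for the ambiguous word "bank".
BANK_SENSES = {
    "bank/finance": "an institution that accepts deposits and lends money",
    "bank/river": "the sloping land beside a river or stream",
}
STOP = frozenset({"the", "a", "an", "and", "or", "that", "of", "to", "he", "at"})

river_sense = simplified_lesk("he sat on the bank of the river".split(), BANK_SENSES, STOP)
money_sense = simplified_lesk("she deposits money at the bank".split(), BANK_SENSES, STOP)
```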

7 Word Sense Disambiguation (WSD)
Accuracy:
- Word-specific
- Easy words: > 90%
- Hard words: 50~70%
Applications:
- Statistical machine translation
- Information retrieval

8 Lexical Acquisition
The task: to develop algorithms and statistical techniques for filling the holes in existing machine-readable dictionaries by looking at the occurrence patterns of words in large text corpora
Examples:
- Verb subcategorization
- Prepositional phrase attachment disambiguation
- Selectional preferences
- Semantic similarity

9 Semantic Similarity
The task: to acquire a relative measure of similarity between two words
Approaches:
- Vector space measures (document space, word space, modifier space, etc.)
- Probabilistic measures (KL-divergence, etc.)
Applications:
- Information retrieval (query expansion)
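
Both families of measures can be sketched in a few lines. The example below computes cosine similarity over word-space co-occurrence vectors and KL divergence between the corresponding normalized distributions. The co-occurrence counts are invented toy data, and the KL function assumes the second distribution is non-zero wherever the first is (in practice this usually requires smoothing).

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def normalize(counts):
    """Turn raw counts into a probability distribution."""
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}

def kl_divergence(p, q):
    """D(p || q) in bits; assumes q[k] > 0 wherever p[k] > 0."""
    return sum(pk * math.log2(pk / q[k]) for k, pk in p.items() if pk > 0)

# Hypothetical co-occurrence counts (word space).
car = {"drive": 4, "road": 3, "engine": 3}
truck = {"drive": 3, "road": 4, "engine": 3}
banana = {"eat": 5, "peel": 5}

sim_close = cosine(car, truck)    # high: shared contexts
sim_far = cosine(car, banana)     # zero: no shared contexts
divergence = kl_divergence(normalize(car), normalize(truck))
```

Note that cosine is symmetric and larger values mean more similar, whereas KL divergence is asymmetric and smaller values mean more similar.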

10 POS Tagging
The task: labeling each word in a sentence with its appropriate part of speech
Major approaches:
- HMM
- Transformation-based
Advantages: speed and storage
Other approaches:
- Neural networks, decision trees, memory-based learning, maximum entropy models
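
For the HMM approach, decoding is typically done with the Viterbi algorithm. The following is a minimal sketch for a first-order HMM, assuming a non-empty input sentence. The tag set and all probabilities are hypothetical toy values, and the small probability floor stands in for proper smoothing of unseen events.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Most likely tag sequence under a first-order HMM (log-space Viterbi)."""
    def lp(p):
        return math.log(max(p, floor))  # floor avoids log(0) for unseen events

    # best[i][t] = (log prob of the best path ending in tag t at position i, backpointer)
    best = [{t: (lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0)), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            score, prev = max(
                (best[i - 1][s][0] + lp(trans_p[s].get(t, 0)), s) for s in tags
            )
            column[t] = (score + lp(emit_p[t].get(words[i], 0)), prev)
        best.append(column)

    # Follow backpointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]

# Hypothetical toy model.
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.8, "NOUN": 0.1, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.7},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT = {
    "DET": {"the": 1.0},
    "NOUN": {"dog": 0.6, "barks": 0.4},
    "VERB": {"dog": 0.2, "barks": 0.8},
}

tagged = viterbi(["the", "dog", "barks"], TAGS, START, TRANS, EMIT)
```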

11 POS Tagging
Accuracy:
- 95~97%
- Achieved only when the application text and the training text are from a similar source
Applications:
- For higher-level NLP tasks: partial parsing, parsing, NER, etc.
“…the best lexicalized probabilistic parsers are now good enough that they perform better starting with untagged text and doing the tagging themselves, rather than using a tagger as preprocessor.” (Charniak 1997)

12 (Syntactic) Parsing
The task: to find the most likely syntactic parse tree of a sentence
Approaches:
- Probabilistic context-free grammar (PCFG): supervised, unsupervised
- Lexicalized models
- Dependency-based models
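
Under a PCFG, the probability of a parse tree is the product of the probabilities of the rules used to build it, and "most likely parse" means the tree that maximizes this product. The sketch below scores a single given tree; it does not search over candidate trees. The grammar and the tuple-based tree encoding are hypothetical choices made for illustration.

```python
import math

# Hypothetical toy grammar: probabilities for each left-hand side sum to 1.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("dogs",)): 0.6,
    ("NP", ("cats",)): 0.4,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("V",)): 0.3,
    ("V", ("chase",)): 1.0,
}

def tree_log_prob(tree, rules=RULES):
    """Log probability of a parse tree: the sum of the log probabilities of its rules.

    A tree is a tuple (label, child, ...) where each child is a subtree or a word.
    """
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    log_p = math.log(rules[(label, rhs)])
    for child in children:
        if not isinstance(child, str):
            log_p += tree_log_prob(child, rules)
    return log_p

tree = ("S", ("NP", "dogs"), ("VP", ("V", "chase"), ("NP", "cats")))
prob = math.exp(tree_log_prob(tree))  # 1.0 * 0.6 * 0.7 * 1.0 * 0.4
```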

13 (Syntactic) Parsing
Accuracy:
- Charniak 1997: recall 0.875, precision 0.874
- Collins 1997: recall 0.881, precision 0.886
Applications:
- For other NLP tasks such as semantic role labeling and relation extraction
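
Parser recall and precision figures of this kind are commonly computed over labeled constituents (brackets). The sketch below assumes that metric and represents each constituent as a (label, start, end) span; the example parses are invented, and real evaluation tools apply additional conventions (for instance around punctuation) that are not modeled here.

```python
from collections import Counter

def labeled_precision_recall(predicted, gold):
    """Labeled bracket precision and recall.

    Each parse is a list of (label, start, end) constituents.
    """
    pred_counts, gold_counts = Counter(predicted), Counter(gold)
    matched = sum((pred_counts & gold_counts).values())  # multiset intersection
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical constituents for a five-word sentence.
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
pred = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5), ("NP", 4, 5)]

precision, recall = labeled_precision_recall(pred, gold)
```

Here three of the five predicted brackets are correct (precision 0.6) and three of the four gold brackets are found (recall 0.75).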

14 Semantic Role Labeling
The task: to identify the predicate-argument structures in sentences
Approaches:
- Supervised learning
Accuracy:
- Best ~70% (CoNLL 04 shared task)
Applications:
- Information extraction
- Question answering

15 Textual Entailment
The task: given two text fragments, to recognize whether the meaning of one text is entailed by (can be inferred from) the other text
Approaches:
- Word overlap
- Statistical lexical relations
- Syntactic matching
- Logic inference
Accuracy:
- ~0.56, best ~0.60 (PASCAL Challenge 05)
Applications:
- Question answering
- Multi-document summarization
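
The simplest of these approaches, word overlap, can be sketched as follows: predict entailment when a large enough fraction of the hypothesis words also appear in the text. The function name, the whitespace tokenization, and the 0.75 threshold are assumptions chosen for illustration, not values from the challenge.

```python
def overlap_entails(text, hypothesis, threshold=0.75):
    """Predict entailment when enough hypothesis words also appear in the text."""
    text_words = set(text.lower().split())
    hyp_words = set(hypothesis.lower().split())
    if not hyp_words:
        return True  # an empty hypothesis is trivially covered
    coverage = len(hyp_words & text_words) / len(hyp_words)
    return coverage >= threshold

text = "John bought a new car in Boston yesterday"
covered = overlap_entails(text, "John bought a car")
not_covered = overlap_entails(text, "Mary sold a car")
```

A baseline like this ignores word order, negation, and paraphrase, which helps explain why overlap alone is a weak signal for entailment.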

16 Tools
- Brill Tagger
- Charniak Parser
- Collins Parser
- MiniPar
- Semantic parsers: ASSERT, CCG’s demo

17 Corpora
- WordNet
- Penn Treebank (sample)
- PropBank
- FrameNet

18 Other Tasks
- Automatic speech recognition
- Natural language generation
- Automatic summarization
- …

19 Outline
- Some basic/important NLP problems
- Topics that have recently attracted much interest
- NLP research groups
- Discussion of the relation between NLP and IR

20 Recent Topics
- Unsupervised and semi-supervised approaches
  - Knowledge acquisition bottleneck
- Semantic role labeling
  - Improve the performance of SRL
  - Use the results for other tasks
- Relation extraction
- WSD
- Parsing
- Statistical machine translation
  - Word alignment

21 Outline
- Some basic/important NLP problems
- Topics that have recently attracted much interest
- NLP research groups
- Discussion of the relation between NLP and IR

22 NLP Research Groups
- USC/ISI
- Stanford
- UPenn
- Johns Hopkins
- UIUC
- …

23 Outline
- Some basic/important NLP problems
- Topics that have recently attracted much interest
- NLP research groups
- Discussion of the relation between NLP and IR

