LING/C SC 581: Advanced Computational Linguistics


LING/C SC 581: Advanced Computational Linguistics Lecture Notes Feb 6th

Administrivia

The Homework Pipeline:
- Homework 2 graded
- Homework 4 not back yet… soon
- Homework 5 due Weds by midnight

No classes next week:
- I'm out of town on business
- No new homework assigned this week

Today's Topics Homework 4 review

Homework 4 Review: Question 1

Construct a WSJ text corpus that excludes both words tagged as -NONE- and punctuation words (defined previously). Show your Python console.

- How many words in the corpus?
- How many distinct words?
- Plot the cumulative frequency distribution graph.
- How many top words do you need to account for 50% of the corpus?

Homework 4 Review: Question 1

excluded = set(['-NONE-', '-LRB-', '-RRB-', 'SYM', ':', '.', ',', '``', "''"])
tokens = [x[0] for x in ptb.tagged_words(categories=['news']) if x[1] not in excluded]
words = set(tokens)
text = nltk.Text(tokens)
print('Tokens: {}; #Words: {}'.format(len(text), len(words)))
Tokens: 1037490; #Words: 49184
print('Lexical diversity: {:.3f}'.format(len(words)/len(text)))
Lexical diversity: 0.047
dist = nltk.FreqDist(text)
print(dist)
<FreqDist with 49184 samples and 1037490 outcomes>

Homework 4 Review: Question 1

ranked = sorted(dist.items(), key=lambda t: t[1], reverse=True)
half = len(text) / 2.0
total = 0
index = 0
while total < half:
    total += ranked[index][1]
    index += 1
print('No of words: {}; total: {}'.format(index, total))
No of words: 217; total: 518763
(half the corpus: 1037490 / 2 = 518745)

Homework 4 Review: Question 1

print('{:12s} {:5s}'.format('Word', 'Freq'))
for word, freq in ranked[:index]:
    print('{:12s} {:5d}'.format(word, freq))
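The same 50%-coverage computation can be sketched in a self-contained form with collections.Counter; the toy token list below is invented for illustration (the homework uses the WSJ tokens built above):

```python
from collections import Counter

# Toy stand-in for the WSJ token list; invented for illustration.
tokens = ("the cat sat on the mat the dog sat on the log "
          "the cat saw the dog").split()

dist = Counter(tokens)
ranked = dist.most_common()   # (word, freq) pairs, most frequent first

# How many top-ranked words are needed to cover 50% of the tokens?
half = len(tokens) / 2.0
total = 0
index = 0
while total < half:
    total += ranked[index][1]
    index += 1

print('No of words: {}; total: {}'.format(index, total))
```

On this toy corpus of 17 tokens, the top 3 words already cover 10 tokens, mirroring the Zipfian skew seen on the WSJ data (217 word types covering half of roughly 1M tokens).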

Homework 4 Review: Question 1

Homework 4 Review: Question 2

With case folding:

tokens = [x[0].lower() for x in ptb.tagged_words(categories=['news']) if x[1] not in excluded]

Tokens: 1037490; #Words: 43746
Lexical diversity: 0.042
No of words: 176; total: 518944 (half the corpus: 1037490 / 2 = 518745)
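The vocabulary shrinkage from case folding (49184 to 43746 word types above) can be illustrated on a tiny invented token list:

```python
# Invented toy token list for illustration.
tokens = ['The', 'cat', 'saw', 'the', 'Cat']

words = set(tokens)                        # case-sensitive vocabulary
folded = set(t.lower() for t in tokens)    # case-folded vocabulary

# Folding merges 'The'/'the' and 'cat'/'Cat' into single types.
print(len(words), len(folded))
```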

Homework 4 Review: Question 2

Colorless green ideas

Chomsky (1957):

(1) colorless green ideas sleep furiously
(2) furiously sleep ideas green colorless

". . . It is fair to assume that neither sentence (1) nor (2) (nor indeed any part of these sentences) has ever occurred in an English discourse. Hence, in any statistical model for grammaticalness, these sentences will be ruled out on identical grounds as equally `remote' from English. Yet (1), though nonsensical, is grammatical, while (2) is not."

Idea: (1) is syntactically valid, (2) is word salad.

One piece of supporting evidence:
- (1) is pronounced with normal intonation
- (2) is pronounced like a list of words

Background: Language Models and N-grams

Given a word sequence w1 w2 w3 ... wn, how do we compute the probability of the sequence?

Chain rule:

p(w1 w2) = p(w1) p(w2|w1)
p(w1 w2 w3) = p(w1) p(w2|w1) p(w3|w1 w2)
...
p(w1 w2 w3 ... wn) = p(w1) p(w2|w1) p(w3|w1 w2) ... p(wn|w1 ... wn-2 wn-1)

Note: it's not easy to collect (meaningful) statistics on p(wn|w1 ... wn-2 wn-1) for all possible word sequences.
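As a worked sketch of the chain rule with maximum-likelihood estimates: the product of the conditionals telescopes to the relative frequency of the whole sequence. The toy corpus and helper function below are invented for illustration:

```python
# Toy corpus of complete sentences; invented for illustration.
corpus = [('the', 'cat', 'sleeps'),
          ('the', 'dog', 'sleeps'),
          ('the', 'cat', 'runs'),
          ('a', 'cat', 'sleeps')]

def prefix_count(prefix):
    """Number of sentences that begin with the given word tuple."""
    return sum(1 for s in corpus if s[:len(prefix)] == prefix)

# Chain rule with MLE: p(w1 w2 w3) = p(w1) p(w2|w1) p(w3|w1 w2)
s = ('the', 'cat', 'sleeps')
p = prefix_count(s[:1]) / len(corpus)            # p(w1)      = 3/4
p *= prefix_count(s[:2]) / prefix_count(s[:1])   # p(w2|w1)   = 2/3
p *= prefix_count(s[:3]) / prefix_count(s[:2])   # p(w3|w1w2) = 1/2

print(p)  # 0.25, the relative frequency of the full sentence (1/4)
```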

Background: Language Models and N-grams

Given a word sequence w1 w2 w3 ... wn.

Bigram approximation: just look at the previous word only (not all the preceding words).
Markov assumption: finite-length history (1st-order Markov model).

p(w1 w2 w3 ... wn) = p(w1) p(w2|w1) p(w3|w1 w2) ... p(wn|w1 ... wn-2 wn-1)
p(w1 w2 w3 ... wn) ≈ p(w1) p(w2|w1) p(w3|w2) ... p(wn|wn-1)

Note: p(wn|wn-1) is a lot easier to collect data for (and thus estimate well) than p(wn|w1 ... wn-2 wn-1).
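The bigram approximation can be sketched as a small maximum-likelihood model over an invented toy corpus (the sentence-boundary markers <s> and </s> are a common convention, added here for illustration):

```python
from collections import Counter

# Invented toy corpus with sentence-boundary markers.
sentences = [['<s>', 'the', 'cat', 'sleeps', '</s>'],
             ['<s>', 'the', 'dog', 'sleeps', '</s>'],
             ['<s>', 'a', 'cat', 'runs', '</s>']]

unigrams = Counter(w for s in sentences for w in s)
bigrams = Counter((s[i], s[i + 1]) for s in sentences
                  for i in range(len(s) - 1))

def p_bigram(prev, w):
    """MLE estimate of p(w | prev) = count(prev w) / count(prev)."""
    return bigrams[(prev, w)] / unigrams[prev]

def p_sentence(words):
    """Bigram approximation: product of p(w_i | w_{i-1})."""
    p = 1.0
    for prev, w in zip(words, words[1:]):
        p *= p_bigram(prev, w)
    return p

# p(the|<s>) * p(cat|the) * p(sleeps|cat) * p(</s>|sleeps)
# = 2/3 * 1/2 * 1/2 * 1 = 1/6
print(p_sentence(['<s>', 'the', 'cat', 'sleeps', '</s>']))
```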

Colorless green ideas

Sentences:

(1) colorless green ideas sleep furiously
(2) furiously sleep ideas green colorless

Statistical Experiment (Pereira 2002): a bigram language model, p(wi | wi-1)

Part-of-Speech (POS) Tag Sequence

Chomsky's example:

colorless green ideas sleep furiously
JJ JJ NNS VBP RB (POS tags)

Similar but grammatical example:

revolutionary new ideas appear infrequently
JJ JJ NNS VBP RB

(LSLT pg. 146)
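That both sentences share a single tag sequence can be checked mechanically; the one-tag-per-word lexicon below is hand-built for illustration (real PTB tagging is ambiguous, as the frequency table later in these notes shows):

```python
# Hand-built one-tag-per-word lexicon; invented for illustration.
lexicon = {'colorless': 'JJ', 'green': 'JJ', 'ideas': 'NNS',
           'sleep': 'VBP', 'furiously': 'RB',
           'revolutionary': 'JJ', 'new': 'JJ',
           'appear': 'VBP', 'infrequently': 'RB'}

def tag(sentence):
    """Look up one tag per word."""
    return [lexicon[w] for w in sentence.split()]

tags1 = tag('colorless green ideas sleep furiously')
tags2 = tag('revolutionary new ideas appear infrequently')

# Both map to JJ JJ NNS VBP RB, so a model over tag sequences
# alone cannot separate the nonsensical sentence from the grammatical one.
print(tags1 == tags2)
```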

Stanford Parser

A probabilistic PS parser trained on the Penn Treebank.


Penn Treebank (PTB) Corpus: word frequencies

Word        POS    Frequency
colorless   ?      ?
green       NNP    33
            JJ     19
            NN     5
ideas       NNS    32
sleep       VB     4
            VBP    2
            ?      1
furiously   RB     ?

Word           POS    Frequency
revolutionary  JJ     6
               NNP    2
               NN     ?
new            ?      1795
               ?      1459
               NNPS   1
ideas          NNS    32
appear         VB     55
               VBP    41
infrequently   RB     ?

Stanford Parser

Structure of NPs: colorless green ideas, revolutionary new ideas

Phrase            Frequency
[NP JJ JJ NNS]    1073
[NP NNP JJ NNS]   61

An experiment

(1) colorless green ideas sleep furiously
(2) furiously sleep ideas green colorless

Question: Is (1) even the most likely permutation of these particular five words?

Parsing Data

All 5! (= 120) permutations of: colorless green ideas sleep furiously .
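Generating all the candidate word orders is a one-liner with itertools (the sentence-final period token is omitted here, since it is fixed in final position):

```python
from itertools import permutations

words = 'colorless green ideas sleep furiously'.split()
candidates = [' '.join(p) for p in permutations(words)]

print(len(candidates))  # 5! = 120 candidate word orders to parse
```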

Parsing Data

The winning sentence was:

furiously ideas sleep colorless green .

(after training on sections 02-21, approx. 40,000 sentences)

- sleep selects for ADJP object with 2 heads
- adverb (RB) furiously modifies noun

Parsing Data

The next two highest scoring permutations were:

Furiously green ideas sleep colorless .
Green ideas sleep furiously colorless .

(in one parse sleep takes an NP object, in the other an ADJP object)

Parsing Data

Pereira (2002) compared Chomsky's original minimal pair:

colorless green ideas sleep furiously
furiously sleep ideas green colorless

These ranked #23 and #36 respectively out of 120.

Parsing Data

But the graph (next slide) shows how arbitrary these rankings are when trained on randomly chosen sections covering 14K-31K sentences.

- Example: #36 furiously sleep ideas green colorless outranks #23 colorless green ideas sleep furiously (and the top 3) over much of the training space.
- Example: Chomsky's original sentence, #23 colorless green ideas sleep furiously, outranks both the top 3 and #36 just briefly at one data point.

Sentence Rank vs. Amount of Training Data Best three sentences

Sentence Rank vs. Amount of Training Data

#23 colorless green ideas sleep furiously
#36 furiously sleep ideas green colorless
