LING/C SC 581: Advanced Computational Linguistics Lecture Notes Feb 3 rd.

Slides:



Advertisements
Similar presentations
School of something FACULTY OF OTHER School of Computing FACULTY OF ENGINEERING Chunking: Shallow Parsing Eric Atwell, Language Research Group.
Advertisements

CPSC 422, Lecture 16Slide 1 Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 16 Feb, 11, 2015.
LING 388: Language and Computers Sandiway Fong Lecture 2.
Recognizing Implicit Discourse Relations in the Penn Discourse Treebank Ziheng Lin, Min-Yen Kan, and Hwee Tou Ng Department of Computer Science National.
LING 388 Language and Computers Lecture 22 11/25/03 Sandiway FONG.
LING 581: Advanced Computational Linguistics Lecture Notes January 19th.
1 LIN 1310B Introduction to Linguistics Prof: Nikolay Slavkov TA: Qinghua Tang CLASS 18, March 13, 2007.
LING 581: Advanced Computational Linguistics Lecture Notes February 2nd.
LING 581: Advanced Computational Linguistics Lecture Notes March 9th.
Probabilistic Parsing: Enhancements Ling 571 Deep Processing Techniques for NLP January 26, 2011.
Introduction to treebanks Session 1: 7/08/
PCFG Parsing, Evaluation, & Improvements Ling 571 Deep Processing Techniques for NLP January 24, 2011.
1 Annotation Guidelines for the Penn Discourse Treebank Part B Eleni Miltsakaki, Rashmi Prasad, Aravind Joshi, Bonnie Webber.
LING 364: Introduction to Formal Semantics Lecture 4 January 24th.
LING 438/538 Computational Linguistics Sandiway Fong Lecture 20: 11/8.
Computational Intelligence 696i Language Lecture 2 Sandiway Fong.
1 SIMS 290-2: Applied Natural Language Processing Marti Hearst Sept 22, 2004.
LING 388 Language and Computers Take-Home Final Examination 12/9/03 Sandiway FONG.
Introduction to Syntax, with Part-of-Speech Tagging Owen Rambow September 17 & 19.
IGERT External Advisory Board Meeting Wednesday, March 14, 2007 INSTITUTE FOR COGNITIVE SCIENCES University of Pennsylvania.
LING 581: Advanced Computational Linguistics Lecture Notes January 12th.
LING 438/538 Computational Linguistics Sandiway Fong Lecture 19: 10/31.
LING/C SC/PSYC 438/538 Lecture 23 Sandiway Fong. Administrivia Homework 4 – out today – due next Wednesday – (recommend you attempt it early) Reading.
UAM CorpusTool: An Overview Debopam Das Discourse Research Group Department of Linguistics Simon Fraser University Feb 5, 2014.
LING/C SC/PSYC 438/538 Lecture 27 Sandiway Fong. Administrivia 2 nd Reminder – 538 Presentations – Send me your choices if you haven’t already.
Tree Kernels for Parsing: (Collins & Duffy, 2001) Advanced Statistical Methods in NLP Ling 572 February 28, 2012.
BİL711 Natural Language Processing1 Statistical Parse Disambiguation Problem: –How do we disambiguate among a set of parses of a given sentence? –We want.
Probabilistic Parsing Reading: Chap 14, Jurafsky & Martin This slide set was adapted from J. Martin, U. Colorado Instructor: Paul Tarau, based on Rada.
Distributional Part-of-Speech Tagging Hinrich Schütze CSLI, Ventura Hall Stanford, CA , USA NLP Applications.
LING 581: Advanced Computational Linguistics Lecture Notes February 12th.
1 Statistical Parsing Chapter 14 October 2012 Lecture #9.
10/12/2015CPSC503 Winter CPSC 503 Computational Linguistics Lecture 10 Giuseppe Carenini.
GRAMMARS David Kauchak CS159 – Fall 2014 some slides adapted from Ray Mooney.
CS : Language Technology for the Web/Natural Language Processing Pushpak Bhattacharyya CSE Dept., IIT Bombay Constituent Parsing and Algorithms (with.
Methods for the Automatic Construction of Topic Maps Eric Freese, Senior Consultant ISOGEN International.
LING 388: Language and Computers Sandiway Fong Lecture 18.
LING 581: Advanced Computational Linguistics Lecture Notes February 16th.
University of Edinburgh27/10/20151 Lexical Dependency Parsing Chris Brew OhioState University.
LING 581: Advanced Computational Linguistics Lecture Notes February 19th.
13-1 Chapter 13 Part-of-Speech Tagging POS Tagging + HMMs Part of Speech Tagging –What and Why? What Information is Available? Visible Markov Models.
Tokenization & POS-Tagging
LING/C SC/PSYC 438/538 Lecture 22 Sandiway Fong. Last Time Gentle introduction to probability Important notions: –sample space –events –rule of counting.
CS : Speech, NLP and the Web/Topics in AI Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture-16: Probabilistic parsing; computing probability of.
CSA2050 Introduction to Computational Linguistics Parsing I.
PARSING 2 David Kauchak CS159 – Spring 2011 some slides adapted from Ray Mooney.
NLP. Introduction to NLP Background –From the early ‘90s –Developed at the University of Pennsylvania –(Marcus, Santorini, and Marcinkiewicz 1993) Size.
CPSC 422, Lecture 27Slide 1 Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 27 Nov, 16, 2015.
LING/C SC 581: Advanced Computational Linguistics Lecture Notes Feb 5 th.
LING/C SC/PSYC 438/538 Lecture 18 Sandiway Fong. Adminstrivia Homework 7 out today – due Saturday by midnight.
LING/C SC/PSYC 438/538 Lecture 19 Sandiway Fong 1.
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 25– Probabilistic Parsing) Pushpak Bhattacharyya CSE Dept., IIT Bombay 14 th March,
LING/C SC 581: Advanced Computational Linguistics Lecture Notes Feb 17 th.
LING 581: Advanced Computational Linguistics Lecture Notes February 24th.
LING 581: Advanced Computational Linguistics Lecture Notes March 2nd.
Natural Language Processing Vasile Rus
LING/C SC/PSYC 438/538 Lecture 21 Sandiway Fong.
LING/C SC/PSYC 438/538 Lecture 20 Sandiway Fong.
LING/C SC 581: Advanced Computational Linguistics
LING/C SC 581: Advanced Computational Linguistics
LING 581: Advanced Computational Linguistics
LING/C SC 581: Advanced Computational Linguistics
LING 581: Advanced Computational Linguistics
LING/C SC 581: Advanced Computational Linguistics
LING/C SC/PSYC 438/538 Lecture 23 Sandiway Fong.
LING/C SC 581: Advanced Computational Linguistics
LING/C SC 581: Advanced Computational Linguistics
David Kauchak CS159 – Spring 2019
LING/C SC 581: Advanced Computational Linguistics
Artificial Intelligence 2004 Speech & Natural Language Processing
LING/C SC 581: Advanced Computational Linguistics
Presentation transcript:

LING/C SC 581: Advanced Computational Linguistics Lecture Notes Feb 3 rd

Administrivia Homework 3 is out today No lecture next week Therefore due date for homework 3 will be the following Tuesday by midnight You are welcome to listen to Prof. Bryan Heidorn of the School of Information this afternoon in my seminar class (Douglass 211; 3:30pm–6pm)

Today's Topics Revisiting Colorless Green Ideas and the Penn Treebank (WSJ) tregex lecture Homework 3

Colorless green ideas examples – (1) colorless green ideas sleep furiously – (2) furiously sleep ideas green colorless Chomsky (1957): –... It is fair to assume that neither sentence (1) nor (2) (nor indeed any part of these sentences) has ever occurred in an English discourse. Hence, in any statistical model for grammaticalness, these sentences will be ruled out on identical grounds as equally `remote' from English. Yet (1), though nonsensical, is grammatical, while (2) is not. idea – (1) is syntactically valid, – (2) is word salad One piece of supporting evidence: (1) pronounced with normal intonation (2) pronounced like a list of words …

Colorless green ideas Sentences: – (1) colorless green ideas sleep furiously – (2) furiously sleep ideas green colorless Statistical Experiment (Pereira 2002) wiwi w i-1 bigram language model

Part-of-Speech (POS) Tag Sequence Chomsky's example: – colorless green ideas sleep furiously – JJ JJ NNS VBP RB(POS Tags) Similar but grammatical example: – revolutionary new ideas appear infrequently – JJ JJ NNS VBP RB LSLT pg. 146

How is it used? One million words of 1989 Wall Street Journal (WSJ) material nearly 50,000 sentences (25 sections) Training: (2–21) 39,832 sentences Evaluation: (23) 2,416 sentences Standard practice training sections 2–21 evaluation % Penn Treebank (PTB) Corpus

Stanford Parser Stanford Parser: – a probabilistic PS parser trained on the Penn Treebank

Stanford Parser Stanford Parser: a probabilistic PS parser trained on the Penn Treebank

Penn Treebank (PTB) Corpus: word frequencies:

Stanford Parser Structure of NPs: – colorless green ideas – revolutionary new ideas

An experiment examples – (1) colorless green ideas sleep furiously – (2) furiously sleep ideas green colorless Question: – Is (1) the most likely permutation of these particular five words?

Parsing Data All 5! (=120) permutations of – colorless green ideas sleep furiously.

Parsing Data The winning sentence was: ①furiously ideas sleep colorless green. after training on sections (approx. 40,000 sentences) sleep selects for ADJP object with 2 heads adverb (RB) furiously modifies noun

Parsing Data The next two highest scoring permutations were: ②Furiously green ideas sleep colorless. ③Green ideas sleep furiously colorless. sleep takes NP objectsleep takes ADJP object

Parsing Data (Pereira 2002) compared Chomsky’s original minimal pair: 23.colorless green ideas sleep furiously 36.furiously sleep ideas green colorless Ranked #23 and #36 respectively out of 120

Parsing Data But … graph (next slide) shows how arbitrary these rankings are when trained on randomly chosen sections covering 14K- 31K sentences Example: #36 furiously sleep ideas green colorless outranks #23 colorless green ideas sleep furiously (and the top 3) over much of the training space Example: Chomsky's original sentence #23 colorless green ideas sleep furiously outranks both the top 3 and #36 just briefly at one data point

Sentence Rank vs. Amount of Training Data Best three sentences

Sentence Rank vs. Amount of Training Data #23 colorless green ideas sleep furiously #36 furiously sleep ideas green colorless

Sentence Rank vs. Amount of Training Data #23 colorless green ideas sleep furiously #36 furiously sleep ideas green colorless

Break

tregex The best introduction to Tregex is the brief powerpoint tutorial for Tregex by Galen Andrew. – The_Wonderful_World_of_Tregex.ppt

Homework Discussion useful command line tool: –diff

Homework 3 Consider small clauses, e.g. – John considers himself smart – John considers [ S himself smart] – himself smart is a "small clause" (no verb) – small clause has a subject: himself – small clause has a predicate: smart

Homework 3 docs/prsguid1.pdf (page 252)

Homework 3 docs/prsguid1.pdf (page 252)

Homework 3 docs/prsguid1.pdf (page 252)

Homework 3 Double Object Examples: – verbs give and persuade may take two objects – give Calvin a comic book – [ VP [ VB give] [ NP [ NNP Calvin]] [ NP [ DT a] comic book]] – persuade Hobbes to eat Calvin – [ VP [ VB persuade] [ NP [ NNP Hobbes]] [ S [ NP *][ VP to eat [ NP [ NNP Calvin]]]]]

Homework 3 Wikipedia on small clauses and constituency testing:

Homework 3 Using tregex find all the small clauses in the WSJ corpus Question 1: How many are there in total? Question 2: Would you expect the NP-PRD type to be the most numerous? Is it? Question 3: List all the different types and their frequencies (in descending order) Show your work: – i.e. show the tregex search terms you used