CPSC 503 Computational Linguistics, Lecture 10, Giuseppe Carenini (CPSC503 Winter 2009, 10/12/2015)



Knowledge-Formalisms Map
Formalisms:
– Logical formalisms (First-Order Logics)
– Rule systems (and prob. versions), e.g., (Prob.) Context-Free Grammars
– State Machines (and prob. versions): Finite State Automata, Finite State Transducers, Markov Models
– AI planners
Linguistic knowledge: Morphology, Syntax, Semantics, Pragmatics, Discourse and Dialogue

Today 9/10
– NLTK demos and more...
– Partial Parsing: Chunking
– Dependency Grammars / Parsing
– Treebank

Chunking
Classify only basic non-recursive phrases (NP, VP, AP, PP)
– Find non-overlapping chunks
– Assign labels to chunks
Chunk: typically includes the headword and pre-head material
Example: [NP The HD box] that [NP you] [VP ordered] [PP from] [NP Shaw] [VP never arrived]
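The bracket notation in the example can be read mechanically into tokens plus labeled, non-overlapping spans. The following is a minimal sketch; the function name `parse_brackets` and the `(start, end, label)` span format are illustrative assumptions, not part of the lecture.

```python
import re

def parse_brackets(text):
    """Read '[LABEL w1 w2] w3 ...' into tokens and (start, end, label) chunks.

    Spans are half-open token ranges. Chunks cannot overlap or nest here,
    because the pattern has no way to match a bracket inside a bracket.
    """
    tokens, chunks = [], []
    # Either a bracketed chunk "[NP The HD box]" or a bare token "that"
    for m in re.finditer(r"\[(\w+) ([^\]]+)\]|(\S+)", text):
        if m.group(1):
            words = m.group(2).split()
            chunks.append((len(tokens), len(tokens) + len(words), m.group(1)))
            tokens.extend(words)
        else:
            tokens.append(m.group(3))
    return tokens, chunks

example = ("[NP The HD box] that [NP you] [VP ordered] "
           "[PP from] [NP Shaw] [VP never arrived]")
tokens, chunks = parse_brackets(example)
print(tokens)
print(chunks)
```

Note that "that" ends up outside every chunk, which is what distinguishes partial parsing from a full parse.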

Approaches to Chunking (1): Finite-State Rule-Based
Set of hand-crafted rules (no recursion!), e.g., NP -> (Det) Noun* Noun
Implemented as FSTs (unionized/determinized/minimized)
F-measure
To build tree-like structures several FSTs can be combined [Abney ’96]

Approaches to Chunking (1): Finite-State Rule-Based
… several FSTs can be combined
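A rule such as NP -> (Det) Noun* Noun has no recursion, so it can be matched with a regular expression over the POS-tag sequence. The sketch below uses Python's `re` module as a stand-in for a compiled FST; the two-tag toy tagset (`DT`, `NN`) and the one-character encoding are simplifying assumptions made for illustration.

```python
import re

# Assumption: a toy tagset where DT = determiner and NN = noun.
# Each tag is mapped to one character so that regex match offsets
# are also token offsets.
TAG_CODE = {"DT": "D", "NN": "N"}

def chunk_np(tagged):
    """Return half-open (start, end) token spans matching NP -> (Det) Noun* Noun."""
    encoded = "".join(TAG_CODE.get(tag, "O") for _, tag in tagged)
    # "D?N+" is the regular-expression form of (Det) Noun* Noun
    return [m.span() for m in re.finditer(r"D?N+", encoded)]

sentence = [("The", "DT"), ("HD", "NN"), ("box", "NN"), ("that", "WDT"),
            ("you", "PRP"), ("ordered", "VBD"), ("from", "IN"),
            ("Shaw", "NN"), ("never", "RB"), ("arrived", "VBD")]
spans = chunk_np(sentence)
print([[w for w, _ in sentence[s:e]] for s, e in spans])
```

The pronoun "you" is not chunked because this single rule only covers determiner-noun sequences; a realistic rule set would add further patterns.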

Approaches to Chunking (2): Machine Learning
A case of sequential classification
IOB tagging: (I) internal, (O) outside, (B) beginning
Internal and Beginning tags for each chunk type => size of tagset is 2n + 1, where n is the number of chunk types
Find an annotated corpus
Select feature set
Select and train a classifier
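IOB tagging turns chunking into a per-token labeling problem. As a minimal sketch, the code below converts labeled spans into IOB tags and computes the 2n + 1 tagset size; the span format `(start, end, label)` and the function names are assumptions for illustration.

```python
def chunks_to_iob(n_tokens, chunks):
    """Convert half-open (start, end, label) spans into one IOB tag per token."""
    tags = ["O"] * n_tokens            # tokens outside any chunk
    for start, end, label in chunks:
        tags[start] = "B-" + label     # first token of the chunk
        for i in range(start + 1, end):
            tags[i] = "I-" + label     # remaining tokens of the chunk
    return tags

def tagset_size(n_chunk_types):
    """One B and one I tag per chunk type, plus the single O tag."""
    return 2 * n_chunk_types + 1

tokens = ["The", "HD", "box", "that", "you", "ordered",
          "from", "Shaw", "never", "arrived"]
chunks = [(0, 3, "NP"), (4, 5, "NP"), (5, 6, "VP"),
          (6, 7, "PP"), (7, 8, "NP"), (8, 10, "VP")]
iob = chunks_to_iob(len(tokens), chunks)
print(list(zip(tokens, iob)))
print(tagset_size(4))  # NP, VP, AP, PP -> 9 tags
```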

Context window approach
Typical features:
– Current / previous / following words
– Current / previous / following POS
– Previous chunks
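The feature list above can be sketched as a function that builds one feature dictionary per token. The window size of one token on each side, the two previous chunk tags, the feature names, and the padding symbols are all assumptions chosen for illustration.

```python
def window_features(words, pos, prev_chunk_tags, i):
    """Features for token i; prev_chunk_tags holds tags already assigned to tokens 0..i-1."""
    def at(seq, j, pad):
        # Return seq[j], or a padding symbol when j falls outside the sequence
        return seq[j] if 0 <= j < len(seq) else pad

    return {
        "w-1": at(words, i - 1, "<s>"),
        "w0": words[i],
        "w+1": at(words, i + 1, "</s>"),
        "p-1": at(pos, i - 1, "<s>"),
        "p0": pos[i],
        "p+1": at(pos, i + 1, "</s>"),
        "c-1": at(prev_chunk_tags, i - 1, "<s>"),
        "c-2": at(prev_chunk_tags, i - 2, "<s>"),
    }

words = ["The", "HD", "box", "that"]
pos = ["DT", "NN", "NN", "WDT"]
feats = window_features(words, pos, ["B-NP", "I-NP"], 2)
print(feats)
```

Only chunk tags to the left are available as features, since the classifier assigns tags left to right.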

Context window approach and others..
Specific choice of machine learning approach does not seem to matter
F-measure range
Common causes of errors:
– POS tagger inaccuracies
– Inconsistencies in training corpus
– Inaccuracies in identifying heads
– Ambiguities involving conjunctions (e.g., “late arrivals and cancellations/departure are common in winter”)
NAACL ‘03
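Chunkers are usually scored with the F-measure over whole chunks: a predicted chunk counts as correct only if its span and label both match a gold chunk. A minimal sketch of the balanced F-measure, assuming chunks are `(start, end, label)` tuples (an illustrative format, with made-up example data):

```python
def chunk_f_measure(gold, predicted):
    """Return (precision, recall, F1) over exact-match chunks."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-measure: harmonic mean of precision and recall
    return precision, recall, 2 * precision * recall / (precision + recall)

gold = [(0, 3, "NP"), (4, 5, "NP"), (5, 6, "VP")]
predicted = [(0, 3, "NP"), (4, 6, "NP")]   # second chunk has a wrong boundary
p, r, f = chunk_f_measure(gold, predicted)
print(p, r, f)
```

A single boundary error costs twice here: the wrong chunk lowers precision and the missed gold chunk lowers recall.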

Today 9/10
– Partial Parsing: Chunking
– Dependency Grammars / Parsing
– Treebank

Dependency Grammars
Syntactic structure: binary relations between words
Links: grammatical function or very general semantic relation
Abstract away from word-order variations (simpler grammars)
Useful features in many NLP applications (for classification, summarization and NLG)

Dependency Grammars (more verbose)
In CFG-style phrase-structure grammars the main focus is on constituents. But it turns out you can get a lot done with just binary relations among the words in an utterance.
In a dependency grammar framework, a parse is a tree where
– the nodes stand for the words in an utterance
– the links between the words represent dependency relations between pairs of words
Relations may be typed (labeled), or not.
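A dependency parse of this kind can be stored as one head index and one relation label per word. The sketch below shows one possible analysis of a short sentence and checks that the links form a tree; the attachment of "on" to the verb and the relation labels are illustrative assumptions, not the analysis shown in the lecture.

```python
def is_tree(heads):
    """heads[i] is the index of word i's head, or -1 for the root word."""
    if heads.count(-1) != 1:           # exactly one root
        return False
    for i in range(len(heads)):
        seen, j = set(), i
        while j != -1:                 # follow head links up to the root
            if j in seen:              # revisiting a word means a cycle
                return False
            seen.add(j)
            j = heads[j]
    return True

words  = ["They", "hid", "the", "letter", "on", "the", "shelf"]
heads  = [1, -1, 3, 1, 1, 6, 4]        # assumed analysis: "on" attaches to "hid"
labels = ["subj", "root", "det", "obj", "mod", "det", "pobj"]  # illustrative labels

for word, head, label in zip(words, heads, labels):
    print(word, "<-" + label + "-", words[head] if head != -1 else "ROOT")
print(is_tree(heads))
```

Dropping the `labels` list gives the untyped variant mentioned above.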

Dependency Relations
Show grammar primer

Dependency Parse (ex 1)
They hid the letter on the shelf

Dependency Parse (ex 2)

Dependency Parsing (see MINIPAR / Stanford demos)
Dependency approach vs. CFG parsing:
– Deals well with free word order languages where the constituent structure is quite fluid
– Parsing is much faster than with CFG-based parsers
– Dependency structure often captures all the syntactic relations actually needed by later applications

Dependency Parsing
There are two modern approaches to dependency parsing (supervised learning from Treebank data):
– Optimization-based approaches that search a space of trees for the tree that best matches some criteria
– Transition-based approaches that define and learn a transition system (state machine) for mapping a sentence to its dependency graph
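To make the transition-based idea concrete, here is a minimal sketch of a stack-and-buffer transition system in the arc-standard style. The choice of arc-standard and the action names are assumptions; the lecture does not name a specific system. The action sequence is supplied by hand here, whereas a real parser would learn a classifier that picks the next action from the current state.

```python
def run_transitions(n_words, actions):
    """Apply SHIFT / LEFT / RIGHT actions and return heads (-1 marks the root)."""
    stack, buffer = [], list(range(n_words))
    heads = [-1] * n_words
    for action in actions:
        if action == "SHIFT":          # move the next input word onto the stack
            stack.append(buffer.pop(0))
        elif action == "LEFT":         # second-from-top becomes a dependent of top
            dependent = stack.pop(-2)
            heads[dependent] = stack[-1]
        elif action == "RIGHT":        # top becomes a dependent of second-from-top
            dependent = stack.pop()
            heads[dependent] = stack[-1]
        else:
            raise ValueError("unknown action: " + action)
    return heads

words = ["They", "hid", "the", "letter"]
actions = ["SHIFT", "SHIFT", "LEFT",      # They <- hid
           "SHIFT", "SHIFT", "LEFT",      # the <- letter
           "RIGHT"]                       # hid -> letter
heads = run_transitions(len(words), actions)
print(heads)  # [1, -1, 3, 1]
```

Each word is shifted once and attached once, so the number of actions grows linearly with sentence length.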

Today 9/10
– Partial Parsing: Chunking
– Dependency Grammars / Parsing
– Treebank

Treebanks
DEF. corpora in which each sentence has been paired with a parse tree
These are generally created in two steps:
– Parse the collection with a parser
– Human annotators revise each parse
Requires detailed annotation guidelines:
– POS tagset
– Grammar
– Instructions for how to deal with particular grammatical constructions

Penn Treebank
The Penn Treebank is a widely used treebank.
Best known is the Wall Street Journal section of the Penn Treebank: 1 M words from the Wall Street Journal.

Treebank Grammars
Treebanks implicitly define a grammar.
Simply take the local rules that make up the sub-trees in all the trees in the collection.
If the corpus is of decent size, you’ll have a grammar with decent coverage.
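Reading a grammar off a treebank amounts to collecting one rule per local sub-tree. A minimal sketch, assuming trees are nested tuples of the form `(label, child, ...)` with words as plain strings; this representation and the two-tree toy "treebank" are assumptions for illustration.

```python
from collections import Counter

def extract_rules(tree, counts):
    """Add one (lhs, rhs) rule per local sub-tree of `tree` to `counts`."""
    label, children = tree[0], tree[1:]
    if all(isinstance(child, str) for child in children):
        counts[(label, children)] += 1                 # lexical rule, e.g. NN -> letter
    else:
        counts[(label, tuple(child[0] for child in children))] += 1
        for child in children:
            extract_rules(child, counts)
    return counts

treebank = [
    ("S", ("NP", ("PRP", "They")),
          ("VP", ("VBD", "hid"), ("NP", ("DT", "the"), ("NN", "letter")))),
    ("S", ("NP", ("DT", "the"), ("NN", "box")),
          ("VP", ("VBD", "arrived"))),
]
grammar = Counter()
for tree in treebank:
    extract_rules(tree, grammar)

for (lhs, rhs), count in sorted(grammar.items()):
    print(lhs, "->", " ".join(rhs), count)
```

The counts are kept because they are what a probabilistic grammar would later be estimated from.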

Treebank Grammars
Such grammars tend to be very flat because they tend to avoid recursion.
– To ease the annotators’ burden
For example, the Penn Treebank has 4500 different rules for VPs! Among them...

Heads in Trees
Finding heads in treebank trees is a task that arises frequently in many applications.
– Particularly important in statistical parsing
We can visualize this task by annotating the nodes of a parse tree with the heads of each corresponding node.

Lexically Decorated Tree

Head Finding
The standard way to do head finding is to use a simple set of tree traversal rules specific to each non-terminal in the grammar.
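Such traversal rules are typically a table that gives, for each non-terminal, a search direction and a priority list of child categories. The sketch below uses a small made-up table; `HEAD_RULES` is a hypothetical toy table, not the rule set used in any published parser, and the nested-tuple tree format is also an assumption.

```python
# Hypothetical toy head rules: non-terminal -> (search direction, priority list)
HEAD_RULES = {
    "S":  ("right", ["VP", "S"]),
    "VP": ("left",  ["VBD", "VBZ", "VB", "VP"]),
    "NP": ("right", ["NN", "NNS", "NNP", "PRP", "NP"]),
    "PP": ("left",  ["IN"]),
}

def head_word(tree):
    """Return the head word of a (label, child, ...) tree using HEAD_RULES."""
    label, children = tree[0], tree[1:]
    if isinstance(children[0], str):       # preterminal: the word is its own head
        return children[0]
    direction, priorities = HEAD_RULES.get(label, ("left", []))
    ordered = children if direction == "left" else children[::-1]
    for category in priorities:            # try categories in priority order
        for child in ordered:
            if child[0] == category:
                return head_word(child)
    return head_word(ordered[0])           # fallback: first child in search order

tree = ("S", ("NP", ("PRP", "They")),
             ("VP", ("VBD", "hid"),
                    ("NP", ("DT", "the"), ("NN", "letter"))))
print(head_word(tree))        # hid
print(head_word(tree[2][2]))  # letter
```

Calling `head_word` on every node yields the annotations of a lexically decorated tree.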

Noun Phrases

Treebank Uses
Searching a Treebank: TGrep2, e.g., NP < PP or NP << PP
Treebanks (and head finding) are particularly critical to the development of statistical parsers
– Chapter 14
Also valuable to Corpus Linguistics
– Investigating the empirical details of various constructions in a given language
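In TGrep2 patterns, `A < B` matches an A node that immediately dominates a B node, while `A << B` matches an A node that dominates B at any depth. The following sketch re-implements just these two relations in plain Python over nested-tuple trees; it is an illustration of the semantics, not TGrep2 itself, and the function names and the example tree are assumptions.

```python
def subtrees(tree):
    """Yield every non-terminal node of a (label, child, ...) tree, root first."""
    yield tree
    for child in tree[1:]:
        if not isinstance(child, str):
            yield from subtrees(child)

def immediately_dominates(node, label):    # the relation written A < B
    return any(not isinstance(c, str) and c[0] == label for c in node[1:])

def dominates(node, label):                # the relation written A << B
    return any(sub[0] == label
               for c in node[1:] if not isinstance(c, str)
               for sub in subtrees(c))

def search(tree, parent, child, immediate):
    """Return all `parent` nodes that (immediately) dominate a `child` node."""
    relation = immediately_dominates if immediate else dominates
    return [n for n in subtrees(tree) if n[0] == parent and relation(n, child)]

tree = ("S", ("NP", ("PRP", "They")),
             ("VP", ("VBD", "hid"),
                    ("NP", ("NP", ("DT", "the"), ("NN", "letter")),
                           ("PP", ("IN", "on"),
                                  ("NP", ("DT", "the"), ("NN", "shelf"))))))
print(len(search(tree, "NP", "PP", immediate=True)))   # NP < PP
print(len(search(tree, "VP", "PP", immediate=True)))   # VP < PP
print(len(search(tree, "VP", "PP", immediate=False)))  # VP << PP
```

In this tree the VP dominates the PP only through the object NP, so the VP matches `<<` but not `<`.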

Today 9/10
– Partial Parsing: Chunking
– Dependency Grammars / Parsing
– Treebank
– Final Project

Final Project: Decision (Group of 2 people is OK)
Two ways: select an NLP task / problem or a technique used in NLP that truly interests you
Tasks: summarization of ……, computing similarity between two terms/sentences (skim through the textbook)
Techniques: extensions / variations / combinations of what we saw in class
– Max Entropy Classifiers or MM, Dirichlet Multinomial Distributions, Conditional Random Fields

Final Project: goals (and hopefully contributions)
Apply a technique which has been used for NLP task A to a different NLP task B
Apply a technique to a different dataset or to a different language
Propose a different evaluation measure
Improve on a proposed solution by using a possibly more effective technique or by combining multiple techniques
Propose a novel (minimally is OK!) different solution

Final Project: what to do + Examples / Ideas
Look on the course WebPage
Proposal due on Nov 4!

Next time: read Chpt 14
Formalisms:
– Logical formalisms (First-Order Logics)
– Rule systems (and prob. versions), e.g., (Prob.) Context-Free Grammars
– State Machines (and prob. versions): Finite State Automata, Finite State Transducers, Markov Models
– AI planners
Linguistic knowledge: Morphology, Syntax, Semantics, Pragmatics, Discourse and Dialogue

For Next Time
Read Chapter 14 (Probabilistic CFG and Parsing)