Chunk Parsing

Also called chunking, light parsing, or partial parsing. Method: assign some additional structure to the input, beyond tagging. Used when full parsing is not feasible or not desirable. Because of the expense of full parsing, chunking is often treated as a stop-gap solution.

Chunk Parsing No rich hierarchy, as in full parsing. Usually one layer above tagging. The process:
1. Tokenize
2. Tag
3. Chunk
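The three-step process can be sketched in Python with a toy dictionary tagger and a hand-rolled NP chunker (the tiny tag lexicon and function names here are illustrative, not from the lecture):

```python
# A minimal tokenize -> tag -> chunk pipeline (toy tagger and grammar).
TAGS = {"the": "DT", "cow": "NN", "in": "IN", "barn": "NN", "ate": "VBD"}

def tokenize(text):
    return text.lower().replace(".", "").split()

def tag(tokens):
    # Look each token up in the toy lexicon; default to NN.
    return [(tok, TAGS.get(tok, "NN")) for tok in tokens]

def chunk_np(tagged):
    """Group maximal DT? NN+ spans into flat NP chunks; leave the rest alone."""
    chunks, i = [], 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "DT":
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1] == "NN":
            k += 1
        if k > j:                      # at least one noun matched
            chunks.append(("NP", tagged[i:k]))
            i = k
        else:
            chunks.append(tagged[i])   # non-NP material stays unchunked
            i += 1
    return chunks

print(chunk_np(tag(tokenize("The cow in the barn ate."))))
# → [('NP', [('the', 'DT'), ('cow', 'NN')]), ('in', 'IN'),
#    ('NP', [('the', 'DT'), ('barn', 'NN')]), ('ate', 'VBD')]
```

Note that the output is flat: exactly one layer of structure above the tags, as the slide describes.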

Chunk Parsing Like tokenizing and tagging in a few respects:
1. Can skip over material in the input
2. Often uses finite-state (or finite-state-like) methods, applied over tags
3. Often application-specific (i.e., the chunks tagged have uses for particular applications)

Chunk Parsing Chief motivations: to find data or to ignore data. Example from Bird and Loper: find the argument structures for the verb give. Chunking can "discover" significant grammatical structures before a grammar is developed:
– gave NP
– gave up NP in NP
– gave NP up
– gave NP help
– gave NP to NP
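This kind of pattern discovery can be sketched by collapsing a chunked sentence into a verb frame. The chunked-sentence representation and the `frame` function below are hypothetical, invented only to illustrate the idea behind the Bird and Loper example:

```python
# A chunked sentence mixes ("NP", [tokens]) chunks and (word, tag) tokens.
# For a chunk, item[0] is the label ("NP"); for a token, it is the word.
def frame(chunked, verb="gave"):
    """Collapse everything from the target verb onward into a frame string."""
    pattern = []
    for item in chunked:
        if pattern or item[0] == verb:
            pattern.append(item[0])   # chunk label or literal word
    return " ".join(pattern)

chunked = [("NP", [("she", "PRP")]), ("gave", "VBD"),
           ("NP", [("the", "DT"), ("book", "NN")]),
           ("to", "TO"), ("NP", [("him", "PRP")])]
print(frame(chunked))   # → gave NP to NP
```

Running this over many chunked sentences would accumulate frames like the list above, with no grammar in hand.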

Chunk Parsing Like parsing, except:
– It is not exhaustive, and doesn't pretend to be: structures and data can be skipped when not convenient or not desired
– Structures of fixed depth are produced. Nested structures are typical in parsing:
  [S [NP The cow [PP in [NP the barn]]] ate]
  but not in chunking:
  [NP The cow] in [NP the barn] ate

Chunk Parsing Finds contiguous, non-overlapping spans of related text and groups them into chunks. Because contiguity is given, finite-state methods can be adapted to chunking.
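One way to see the finite-state connection: map each POS tag to a single character, and a regular expression over the resulting string finds the contiguous NP spans directly (a toy sketch; the tag set and the NP pattern are assumptions, not the lecture's grammar):

```python
import re

# One character per POS tag, so regex character positions = token indices.
TAG2CHAR = {"DT": "D", "JJ": "J", "NN": "N", "IN": "I", "VBD": "V"}

def np_spans(tagged):
    """Return (start, end) token spans of maximal D?J*N+ sequences."""
    s = "".join(TAG2CHAR[t] for _, t in tagged)
    return [m.span() for m in re.finditer(r"D?J*N+", s)]

tagged = [("the", "DT"), ("cow", "NN"), ("in", "IN"),
          ("the", "DT"), ("barn", "NN"), ("ate", "VBD")]
print(np_spans(tagged))   # → [(0, 2), (3, 5)]
```

Because regex matching is greedy and left-to-right, the spans come out contiguous and non-overlapping for free, which is exactly the property the slide points out.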

Longest Match Abney 1995 discusses the longest-match heuristic:
– One automaton for each phrasal category
– Start the automata at position i (where i = 0 initially)
– The winner is the automaton with the longest match

Longest Match He took chunk rules from the PTB:
– NP → D N
– NP → D Adj N
– VP → V
Encoded each rule as an automaton. Stored the longest matching pattern (the winner). If there was no match for a given word, skipped it (in other words, didn't chunk it).
Results: Precision .92, Recall .88
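A minimal re-implementation of the longest-match idea with the three rules above (a sketch, not Abney's cascaded parser; rules are encoded as tag lists rather than automata, which is equivalent for patterns this simple):

```python
# Each rule: (chunk label, sequence of tags it matches).
RULES = [("NP", ["D", "N"]), ("NP", ["D", "Adj", "N"]), ("VP", ["V"])]

def longest_match(tags):
    """Greedy longest-match chunking: at each position, try every rule,
    keep the longest one that matches, and skip unmatched words."""
    chunks, i = [], 0
    while i < len(tags):
        best = None
        for label, pattern in RULES:
            if tags[i:i + len(pattern)] == pattern:
                if best is None or len(pattern) > len(best[1]):
                    best = (label, pattern)
        if best:
            label, pattern = best
            chunks.append((label, i, i + len(pattern)))
            i += len(pattern)
        else:
            i += 1          # no match: leave this word unchunked
    return chunks

print(longest_match(["D", "Adj", "N", "V", "D", "N"]))
# → [('NP', 0, 3), ('VP', 3, 4), ('NP', 4, 6)]
```

At position 0 both NP rules start to match, and the heuristic picks the longer D Adj N, which is the whole point of preferring the longest match.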

An Application Data-Driven Linguistics Ontology Development (NSF BCE )
– One focus: locate linguistically annotated (read: tagged) text and extract linguistically relevant terms from it
– Attempt to discover the "meaning" of those terms
– Intended to build out the content of the ontology (GOLD)
– Focus on Interlinear Glossed Text (IGT)

An Application Interlinear Glossed Text (IGT), an example:
(1) Afisi   a-na-ph-a       nsomba
    hyenas  SP-PST-kill-ASP fish
    'The hyenas killed the fish.' (Baker 1988:254)

An Application More examples:
(4) a. yerexa-n    p'at'uhan-e  bats-ets
       child-NOM   window-ACC   open-AOR.3SG
       'The child opened the window.' (Megerdoomian ??)

An Application Problem: how do we "discover" the meaning of the linguistically salient terms, such as NOM, ACC, AOR, 3SG? Perhaps we can discover the meanings by examining the contexts in which they occur. POS can be one such context.
Problem: POS tags are rarely used in IGT. How do you assign POS tags to a language you know nothing about? IGT gives us aligned text for free!!
(4) a. yerexa-n    p'at'uhan-e  bats-ets
       child-NOM   window-ACC   open-AOR.3SG
       'The child opened the window.' (Megerdoomian ??)

An Application IGT gives us aligned text for free!!
– POS tag the English translation
– Align with the glosses and language data
That helps. We now know that NOM and ACC attach to nouns, not verbs (nominal inflections), and that AOR and 3SG attach to verbs (verbal inflections).
(4) a. yerexa-n    p'at'uhan-e  bats-ets
       child-NOM   window-ACC   open-AOR.3SG
       'The child opened the window.' (Megerdoomian ??)
        DT  NN    VBP    DT  NN
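Tag projection can be sketched like this. The stem-matching heuristic is an assumption made for illustration; the slides don't specify how the alignment was computed:

```python
# Project POS tags from the tagged English translation onto the gloss line
# by matching each gloss stem against the English words (toy heuristic).
english = [("The", "DT"), ("child", "NN"), ("opened", "VBP"),
           ("the", "DT"), ("window", "NN")]
glosses = ["child-NOM", "window-ACC", "open-AOR.3SG"]

def project(glosses, english):
    projected = {}
    for g in glosses:
        stem = g.split("-")[0]          # "open-AOR.3SG" -> "open"
        for word, tag in english:
            if word.lower().startswith(stem):   # "opened" matches "open"
                projected[g] = tag
                break
    return projected

print(project(glosses, english))
# → {'child-NOM': 'NN', 'window-ACC': 'NN', 'open-AOR.3SG': 'VBP'}
```

The projected tags give exactly the inference on the slide: NOM and ACC land on nouns, while AOR.3SG lands on the verb.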

An Application In the LaPolla example, by contrast, we know that NOM does not attach to nouns, but to verbs. It must be some other kind of NOM.
(4) a. yerexa-n    p'at'uhan-e  bats-ets
       child-NOM   window-ACC   open-AOR.3SG
       'The child opened the window.' (Megerdoomian ??)
        DT  NN    VBP    DT  NN

An Application How we tagged:
– Globally applied most-frequent tags (a "stupid tagger")
– Repaired tags where context dictated a change (e.g., TO preceding race ⇒ VB)
– Technique similar to Brill 1995
(4) a. yerexa-n    p'at'uhan-e  bats-ets
       child-NOM   window-ACC   open-AOR.3SG
       'The child opened the window.' (Megerdoomian ??)
        DT  NN    VBP    DT  NN
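The two-step tagging scheme might look like this in miniature (the lexicon and the single repair rule are made up for the example; a real Brill-style tagger learns many such rules from data):

```python
# Step 1: "stupid tagger" -- most frequent tag per word (toy lexicon).
MOST_FREQUENT = {"the": "DT", "to": "TO", "race": "NN", "child": "NN",
                 "opened": "VBP", "window": "NN"}

# Step 2: contextual repairs, (previous_tag, tag_to_fix, new_tag).
REPAIRS = [("TO", "NN", "VB")]   # the slide's example: TO + race => VB

def tag(words):
    tags = [MOST_FREQUENT.get(w, "NN") for w in words]
    for i in range(1, len(tags)):
        for prev, old, new in REPAIRS:
            if tags[i - 1] == prev and tags[i] == old:
                tags[i] = new
    return list(zip(words, tags))

print(tag(["to", "race"]))   # → [('to', 'TO'), ('race', 'VB')]
```

The repair pass is what makes this Brill-like: the baseline tagger is wrong on "race" after "to", and a context-triggered rewrite fixes it.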

An Application But can we get more information about NOM, ACC, etc.? Can chunking tell us something more about these terms? Yes!
(4) a. yerexa-n    p'at'uhan-e  bats-ets
       child-NOM   window-ACC   open-AOR.3SG
       'The child opened the window.' (Megerdoomian ??)
        DT  NN    VBP    DT  NN

An Application Chunk phrases, mainly NPs. Since the relationship (in simple sentences) between NPs and verbs tells us something about the verbs' arguments (Bird and Loper 2005), we can tap this information to discover more about the linguistic tags.
(4) a. yerexa-n    p'at'uhan-e  bats-ets
       child-NOM   window-ACC   open-AOR.3SG
       'The child opened the window.' (Megerdoomian ??)
        DT  NN    VBP    DT  NN

An Application
– Apply Abney 1995's longest-match heuristic to get as many chunks as possible (especially NPs)
– Leverage English's canonical SVO (NVN) order to identify simple argument structures
– Use these to discover more information about the terms
Thus…
(4) a. yerexa-n    p'at'uhan-e  bats-ets
       child-NOM   window-ACC   open-AOR.3SG
       'The child opened the window.' (Megerdoomian ??)
        DT  NN    VBP    DT  NN
        NP        VP
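Reading a subject-verb-object structure off the flat chunk sequence can be sketched as follows (the chunk representation and `svo` function are assumed for illustration, relying on English's canonical SVO order as the slide describes):

```python
# chunked: a flat list of (label, tokens) chunks, e.g. NP / VP.
def svo(chunked):
    """In a simple SVO clause: first NP = subject, second NP = object."""
    nps = [tokens for label, tokens in chunked if label == "NP"]
    vps = [tokens for label, tokens in chunked if label == "VP"]
    if len(nps) >= 2 and vps:
        return {"subject": nps[0], "verb": vps[0], "object": nps[1]}
    return None   # not a simple NVN clause; skip it

chunked = [("NP", ["the", "child"]), ("VP", ["opened"]),
           ("NP", ["the", "window"])]
print(svo(chunked))
# → {'subject': ['the', 'child'], 'verb': ['opened'],
#    'object': ['the', 'window']}
```

Combined with the projected gloss tags, this is the step that lets NOM be associated with subject NPs and ACC with object NPs.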

An Application We know that:
– NOM attaches to subject NPs; it may be a case marker indicating subjects
– ACC attaches to object NPs; it may be a case marker indicating objects
(4) a. yerexa-n    p'at'uhan-e  bats-ets
       child-NOM   window-ACC   open-AOR.3SG
       'The child opened the window.' (Megerdoomian ??)
        DT  NN    VBP    DT  NN
        NP        VP

An Application What we do next: look at co-occurrence relations (clustering) of
– terms with terms
– host categories with terms
to determine more information about the terms. This is done by building feature vectors for the various linguistic grammatical terms ("grams"), representing their contexts, and measuring the relative distances between these vectors (in particular, against terms we already know).
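Building such feature vectors and comparing them could look like this (the observation data and feature names are invented for the sketch; the real project derived its features from IGT corpora):

```python
import math
from collections import Counter

# Toy (gram, context-feature) observations, e.g. host category and role.
observations = [
    ("NOM", "host:NN"), ("NOM", "role:subject"),
    ("ACC", "host:NN"), ("ACC", "role:object"),
    ("AOR", "host:VB"), ("3SG", "host:VB"),
]

def vectors(obs):
    """One sparse count vector (a Counter) per gram."""
    vecs = {}
    for gram, feat in obs:
        vecs.setdefault(gram, Counter())[feat] += 1
    return vecs

def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[f] * v[f] for f in u)   # Counter returns 0 for missing keys
    norm = lambda w: math.sqrt(sum(c * c for c in w.values()))
    return dot / (norm(u) * norm(v))

vecs = vectors(observations)
print(cosine(vecs["NOM"], vecs["ACC"]))   # share host:NN → 0.5
print(cosine(vecs["NOM"], vecs["AOR"]))   # no shared features → 0.0
```

Grams with similar contexts (here the two nominal case markers) end up close together, which is what lets an unknown gram be interpreted by its neighbors in the space.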

Linguistic “Gram” Space