Chunk Parsing CS1573: AI Application Development, Spring 2003


Chunk Parsing CS1573: AI Application Development, Spring 2003 (modified from Steven Bird’s notes)

Light Parsing
- Difficulties with full parsing
- Motivations for parsing
- Light (or "partial") parsing
- Chunk parsing (a type of light parsing): introduction, advantages, implementations
- REChunk Parser

Full Parsing
Goal: build a complete parse tree for a sentence.
Problems with full parsing:
- Low accuracy
- Slow
- Domain specific
These problems affect both symbolic and statistical parsers.

Full Parsing: Accuracy
Full parsing gives relatively low accuracy:
- Exponential solution space
- Dependence on semantic context
- Dependence on pragmatic context
- Long-range dependencies
- Ambiguity
- Errors propagate

Full Parsing: Domain Specificity
Full parsing tends to be domain specific:
- Importance of semantic/lexical context
- Stylistic differences

Full Parsing: Efficiency
Full parsing is very processor- and memory-intensive:
- Exponential solution space
- Large relevant context
- Long-range dependencies
- Need to process the lexical content of each word
It is too slow to use with very large sources of text (e.g., the web).

Motivations for Parsing
Why parse sentences in the first place?
- Parsing is usually an intermediate stage: it builds structures that are used by later stages of processing.
- Full parsing is a sufficient but not necessary intermediate stage for many NLP tasks.
- Parsing often provides more information than we need.

Light Parsing
Goal: assign a partial structure to a sentence.
- Simpler solution space
- Local context
- Non-recursive
- Restricted (local) domain

Output from Light Parsing
What kind of partial structures should light parsing construct? Different structures are useful for different tasks:
- Partial constituent structure: [NP I] [VP saw [NP a tall man in the park]].
- Prosodic segments (phi phrases): [I saw] [a tall man] [in the park].
- Content word groups: [I] [saw] [a tall man] [in the park].

Chunk Parsing
Goal: divide a sentence into a sequence of chunks.
- Chunks are non-overlapping regions of a text: [I] saw [a tall man] in [the park].
- Chunks are non-recursive: a chunk cannot contain other chunks.
- Chunks are non-exhaustive: not all words are included in chunks.

Chunk Parsing Examples
- Noun-phrase chunking: [I] saw [a tall man] in [the park].
- Verb-phrase chunking: The man who [was in the park] [saw me].
- Prosodic chunking: [I saw] [a tall man] [in the park].

Chunks and Constituency
Constituents: [a tall man in [the park]].
Chunks: [a tall man] in [the park].
Chunks are not constituents:
- Constituents are recursive.
- Chunks are typically subsequences of constituents.
- Chunks do not cross constituent boundaries.

Chunk Parsing: Accuracy
Chunk parsing achieves higher accuracy:
- Smaller solution space
- Less word-order flexibility within chunks than between chunks
- Fewer long-range dependencies
- Less context dependence; better locality
- No need to resolve ambiguity
- Less error propagation

Chunk Parsing: Domain Specificity
Chunk parsing is less domain specific:
- Dependencies on lexical/semantic information tend to occur at levels "higher" than chunks: attachment, argument selection, movement.
- Chunks show fewer stylistic differences.

Psycholinguistic Motivations
Chunk parsing is psycholinguistically motivated:
- Chunks are processing units: eye-movement tracking studies suggest that humans tend to read texts one chunk at a time.
- Chunks are phonologically marked, by pauses and stress patterns.
- Chunking might be a first step in full parsing.

Chunk Parsing: Efficiency
Chunk parsing is more efficient:
- Smaller solution space
- Relevant context is small and local
- Chunks are non-recursive
- Chunk parsing can be implemented with a finite state machine: fast, with low memory requirements
Chunk parsing can be applied to very large text sources (e.g., the web).

Chunk Parsing Techniques
Chunk parsers usually ignore lexical content; they only need to look at part-of-speech tags. Techniques for implementing chunk parsing:
- Regular expression matching
- Chinking
- Transformational regular expressions
- Finite state transducers

Regular Expression Matching
Define a regular expression that matches the sequences of tags in a chunk. A simple noun-phrase chunk regexp:
  <DT>? <JJ>* <NN.?>
Chunk all matching subsequences:
  The/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN
  [The/DT little/JJ cat/NN] sat/VBD on/IN [the/DT mat/NN]
If matching subsequences overlap, the first one gets priority. Regular expressions can be cascaded.
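This matching step can be sketched in plain Python (no NLTK needed). The `<TAG>` string encoding and the `chunk` helper are illustrative choices, not from the course materials; `re.finditer` yields non-overlapping, leftmost-first matches, which gives exactly the "first one gets priority" behaviour:

```python
import re

def chunk(tagged, pattern):
    """Return (start, end) token spans of chunks matching a tag pattern.

    `tagged` is a list of (word, tag) pairs. The tag sequence is encoded
    as a string like "<DT><JJ><NN>", so the chunk rule can be an ordinary
    regular expression over tags.
    """
    tag_string = "".join("<%s>" % tag for _, tag in tagged)
    # Map character offsets in tag_string back to token indices.
    offsets, pos = {}, 0
    for i, (_, tag) in enumerate(tagged):
        offsets[pos] = i
        pos += len(tag) + 2          # +2 for the angle brackets
    offsets[pos] = len(tagged)
    # finditer returns non-overlapping matches, leftmost first.
    return [(offsets[m.start()], offsets[m.end()])
            for m in re.finditer(pattern, tag_string)]

tagged = [("The", "DT"), ("little", "JJ"), ("cat", "NN"),
          ("sat", "VBD"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
np_pattern = r"(<DT>)?(<JJ>)*<NN.?>"
print(chunk(tagged, np_pattern))     # [(0, 3), (5, 7)]
```

NLTK's `RegexpParser` implements the same idea with a dedicated tag-pattern grammar syntax.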

Chinking
A chink is a subsequence of the text that is not a chunk. Define a regular expression that matches the sequences of tags in a chink. A simple chink regexp for finding NP chunks:
  (<VB.?> | <IN>)+
Chunk anything that is not a matching subsequence:
  the/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN
  [the/DT little/JJ cat/NN] sat/VBD on/IN [the/DT mat/NN]
  (chunk)                   (chink)       (chunk)
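The same tag-string trick works for chinking: match the chink pattern and chunk the gaps between matches. Again a sketch; `chink_chunk` and the `<TAG>` encoding are hypothetical names, not from the course:

```python
import re

def chink_chunk(tagged, chink_pattern):
    """Chunk every maximal token span NOT matched by the chink pattern."""
    tag_string = "".join("<%s>" % tag for _, tag in tagged)
    offsets, pos = {}, 0
    for i, (_, tag) in enumerate(tagged):
        offsets[pos] = i
        pos += len(tag) + 2
    offsets[pos] = len(tagged)
    chunks, prev = [], 0
    for m in re.finditer(chink_pattern, tag_string):
        if m.start() > prev:                   # material before this chink
            chunks.append((offsets[prev], offsets[m.start()]))
        prev = m.end()
    if prev < len(tag_string):                 # material after the last chink
        chunks.append((offsets[prev], offsets[len(tag_string)]))
    return chunks

tagged = [("the", "DT"), ("little", "JJ"), ("cat", "NN"),
          ("sat", "VBD"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
print(chink_chunk(tagged, r"(<VB.?>|<IN>)+"))  # [(0, 3), (5, 7)]
```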

Transformational Regular Expressions
Define regular-expression transformations that add brackets to a string of tags. A transformational regexp for NP chunks:
  (<DT>? <JJ>* <NN.?>) → {\1}
Note: {} is used for bracketing because [] has special meaning in regular expressions; improper bracketing is an error.
Use the regexp to add brackets to the text:
  The/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN
  {The/DT little/JJ cat/NN} sat/VBD on/IN {the/DT mat/NN}

Transformational Regular Expressions (2)
Chinking with transformational regular expressions. First, put the entire text in one chunk:
  (<.*>*) → {\1}
Then add brackets that exclude chinks:
  ((<VB.?> | <IN>)+) → }\1{
Cascading these transformations:
  The/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN
  {The/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN}
  {The/DT little/JJ cat/NN} sat/VBD on/IN {the/DT mat/NN}
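The two-rule cascade can be reproduced with `re.sub` over a tag-string form of the sentence (the `<TAG>` encoding is an illustrative assumption; each → rule becomes one substitution):

```python
import re

# Tag-string form of: The/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN
tags = "<DT><JJ><NN><VBD><IN><DT><NN>"

# Stage 1: (<.*>*) -> {\1} : wrap the whole text in one chunk.
stage1 = re.sub(r"^(<.*>)$", r"{\1}", tags)

# Stage 2: ((<VB.?>|<IN>)+) -> }\1{ : close the chunk before each chink
# and reopen it afterwards.
stage2 = re.sub(r"((<VB.?>|<IN>)+)", r"}\1{", stage1)
print(stage2)   # {<DT><JJ><NN>}<VBD><IN>{<DT><NN>}
```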

Transformational Regular Expressions (3)
Transformational regular expressions can also remove brackets added by previous stages:
  {(<VB.?> | <IN>)} → \1
  {the/DT} {little/JJ} {cat/NN} {sat/VBD} {on/IN} {the/DT} {mat/NN}
  {the/DT} {little/JJ} {cat/NN} sat/VBD on/IN {the/DT} {mat/NN}
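In Python this unchunking rule is another `re.sub`, here applied directly to the word/tag text (a sketch; the pattern below is my adaptation of the slide's rule to the `word/TAG` notation):

```python
import re

# Start with every token in its own chunk.
text = "{the/DT} {little/JJ} {cat/NN} {sat/VBD} {on/IN} {the/DT} {mat/NN}"

# {(<VB.?>|<IN>)} -> \1 : strip the brackets around verbs and prepositions.
unchunked = re.sub(r"\{(\S+/(?:VB\S?|IN))\}", r"\1", text)
print(unchunked)
# {the/DT} {little/JJ} {cat/NN} sat/VBD on/IN {the/DT} {mat/NN}
```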

Finite State Transducers
A finite state transducer is a finite state machine that adds bracketing to a text. It is efficient, and the other techniques can be implemented using finite state transducers:
- Matching regular expressions
- Chinking regular expressions
- Transformational regular expressions

Evaluating Performance
Basic measures:

                Target           Not target
  Selected      true positive    false positive
  Not selected  false negative   true negative

Precision: what proportion of selected items are correct?
Recall: what proportion of target items are selected?
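As a worked check of these two definitions (the gold and predicted chunk spans below are toy data, not from the slides):

```python
def precision_recall(selected, target):
    """Precision = |selected ∩ target| / |selected|;
    recall = |selected ∩ target| / |target|."""
    tp = len(selected & target)            # true positives
    return tp / len(selected), tp / len(target)

# Chunks represented as (start, end) token spans.
gold      = {(0, 2), (3, 5), (6, 7), (9, 11)}   # 4 target chunks
predicted = {(0, 2), (3, 5), (6, 8)}            # 3 selected, 2 of them correct
p, r = precision_recall(predicted, gold)
print(p, r)   # 2/3 of selected are correct; 2/4 of targets were found
```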

REChunk Parser
A regular-expression-driven chunk parser:
- Chunk rules are defined using transformational regular expressions.
- Chunk rules can be cascaded.