Layering of Annotations in the Penn Discourse TreeBank (PDTB), Rashmi Prasad. Workshop on Treebanking, HLT/NAACL, Rochester, April 26th, 2007.

Presentation transcript:

Slide 1: Layering of Annotations in the Penn Discourse TreeBank (PDTB)
Rashmi Prasad
Institute for Research in Cognitive Science, University of Pennsylvania
Workshop on Treebanking, HLT/NAACL, Rochester, April 26th, 2007

Slide 2: Discourse Relations in the PDTB
• Argument structure of Explicit/Implicit connectives (spans):
  - She hasn’t played any music since the earthquake hit.
  - “We asked police to investigate why they are allowed to distribute the flag in this way. Implicit=because It should be considered against the law,” said Danny Leish, a spokesman for the association.
• Semantics (labels) of connectives: Temporal, Causal
• Attribution (spans and 4 features (labels)):
  - Source = Writer (implicit), span = unmarked
  - Source = Other agent, span = marked
  - 3 other attribution features:
      Type: Assertion, Belief, Factive, Intention
      Scopal Polarity: I don’t think X > I think NOT X
      Determinacy: I might think X !> I think X
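The layers on this slide can be pictured as one record per relation. The following is a minimal Python sketch, not the PDTB file format: the class and field names are hypothetical, the pairing of the sense labels with the two examples and the Arg1/Arg2 assignment are assumptions, and the value sets for scopal polarity and determinacy are left open because the slide does not list them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attribution:
    source: str                            # "Writer" or "Other agent" on the slide
    span: Optional[str] = None             # None models an unmarked (implicit) span
    type: Optional[str] = None             # Assertion, Belief, Factive or Intention
    scopal_polarity: Optional[str] = None  # value set not given on the slide
    determinacy: Optional[str] = None      # value set not given on the slide


@dataclass
class Relation:
    kind: str         # "Explicit" or "Implicit"
    connective: str   # the connective itself, or the one inserted for an implicit relation
    sense: str        # semantic label, e.g. "Temporal" or "Causal"
    arg1: str         # assumed assignment of the two arguments
    arg2: str
    attribution: Attribution


# Explicit connective, attributed to the writer, so no attribution span is marked.
since = Relation(
    kind="Explicit",
    connective="since",
    sense="Temporal",
    arg1="She hasn't played any music",
    arg2="the earthquake hit",
    attribution=Attribution(source="Writer"),
)

# Implicit connective, attributed to another agent, so the attribution span is marked.
because = Relation(
    kind="Implicit",
    connective="because",
    sense="Causal",
    arg1="We asked police to investigate why they are allowed to distribute the flag in this way.",
    arg2="It should be considered against the law,",
    attribution=Attribution(
        source="Other agent",
        span="said Danny Leish, a spokesman for the association",
    ),
)
```

Keeping attribution as its own object mirrors the slide's separation between the relation (connective, arguments, sense) and who the relation is attributed to.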

Slide 3: Layering with the PTB
• Stand-off annotations of connective, argument and attribution spans:
  - Character offsets in the WSJ raw texts: generated during the annotation
  - Tree node addresses of constituents in PTB trees (constituent sets for spans not dominated by a single node and for discontinuous text spans): generated in the post-annotation phase
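Stand-off annotation by character offsets can be sketched in a few lines of Python. This is an illustration under assumptions: the `Span` class, the `find` and `text_of` helpers and the dictionary layout are hypothetical, and the raw text is the example sentence from the previous slide rather than an actual WSJ file.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset into the raw text
    end: int    # exclusive character offset


def find(raw: str, piece: str) -> Span:
    """Locate the first occurrence of `piece` and return its offsets."""
    start = raw.index(piece)
    return Span(start, start + len(piece))


def text_of(raw: str, spans: List[Span]) -> str:
    """Recover the text a (possibly discontinuous) list of spans points to."""
    return " ".join(raw[s.start:s.end] for s in spans)


raw = "She hasn't played any music since the earthquake hit."

# The annotation stores only offsets; the raw text itself is never modified.
annotation: Dict[str, List[Span]] = {
    "connective": [find(raw, "since")],
    "arg1": [find(raw, "She hasn't played any music")],
    "arg2": [find(raw, "the earthquake hit")],
}

# A discontinuous span is simply a list with more than one piece.
discontinuous = [find(raw, "She"), find(raw, "played any music")]
```

Because each span is a list, the same structure covers both contiguous and discontinuous text, which is the case the slide handles with constituent sets on the tree side.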

Slide 4: PTB Affecting PDTB Choices
• Distinct POS marking of connectives in the PTB could have allowed for automatic identification of connectives:
  - "For example," (discourse connective): (PP (IN For) (NP (NN example)))
  - "For John," (not a discourse connective): (PP (IN For) (NP (NN John)))
  - Subordinating conjunctions marked as adverbs: When: (WHADVP (WRB When))
• Effect of phrase-structure (PS) vs. dependency annotation: none
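The point about POS marking can be checked mechanically: once the words are masked, the two "For ..." phrases have the same labels and bracketing, so nothing in the tree separates the connective from the non-connective. A small Python sketch, with a hypothetical `skeleton` helper and the slide's examples written with balanced brackets:

```python
import re


def skeleton(bracketed: str) -> str:
    """Mask every terminal word with '_' while keeping labels and brackets."""
    return re.sub(r"\s+([^\s()]+)\)", " _)", bracketed)


connective = "(PP (IN For) (NP (NN example)))"
non_connective = "(PP (IN For) (NP (NN John)))"

# Identical skeletons: the tags alone cannot tell the two uses apart.
same_marking = skeleton(connective) == skeleton(non_connective)
```

A distinct tag for connectives would make the two skeletons differ, which is what would have allowed automatic identification.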

Slide 5: PTB Affecting PDTB Choices
• Discourse relations occurring intra-sententially could have been marked in the underlying annotation if not constrained by certain syntactic choices.
[Figure: PTB tree over the words "When", "John was hired", "he", "says", "Sue had already left"; node labels shown: S, SBAR-TMP, WHADVP-1, WRB (When), NP-SBJ, PRP (he), VP, VBZ (says), S]
• Syntax incorrectly forces attribution to be the temporally modified element.
• Syntax assumption: All words/phrases must be connected in a tree!

Slide 6: What else could be annotated
• Attribution phrases: since they often lead to a mismatch with discourse arguments of connectives
  - When Max was hired, he says Sue had already left.
  - Representative list obtainable from PDTB.
  - Directly observable during syntactic annotation.
• Alternative Lexicalizations (AltLex): lexical realizations of discourse relations with non-connective expressions
  - Mary has been depressed lately. The reason: she failed
  - Representative list obtainable from PDTB.
  - May involve some multi-sentence processing.

Slide 7: Methodology and Quality Control
• Choices made at more basic levels should make the task easier for discourse-level annotations.
  - Do some annotations at more basic levels if it prevents a reassessment of annotator choices/judgements.
  - Quality control can be done by checking existing annotations (or representative samples thereof).
• Stand-off annotation: prevents incompatibilities in representation where unavoidable
• Alignments with other layers to check for incompatibilities, e.g., attribution in PDTB and PTB
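One way to picture an alignment check between layers is to test whether a discourse-level span crosses any syntactic constituent. The Python sketch below is a toy under stated assumptions: the function name, the token-index spans and the simplified constituent list for "When Max was hired, he says Sue had already left." are all hypothetical and are not taken from the PTB or PDTB files.

```python
from typing import List, Tuple

TokenSpan = Tuple[int, int]  # (start, end) token indices, end exclusive


def crossing_constituents(
    span: TokenSpan, constituents: List[Tuple[str, TokenSpan]]
) -> List[str]:
    """Return labels of constituents that partially overlap `span`."""
    start, end = span
    crossing = []
    for label, (c_start, c_end) in constituents:
        overlaps = start < c_end and c_start < end
        nested = (start <= c_start and c_end <= end) or (
            c_start <= start and end <= c_end
        )
        if overlaps and not nested:
            crossing.append(label)
    return crossing


# Tokens: When(0) Max(1) was(2) hired(3) ,(4) he(5) says(6)
#         Sue(7) had(8) already(9) left(10) .(11)
# Hypothetical, simplified constituents for illustration only.
constituents = [
    ("S", (0, 12)),
    ("SBAR-TMP", (0, 4)),
    ("NP-SBJ", (5, 6)),
    ("VP", (6, 11)),
    ("S-embedded", (7, 11)),
]

attribution_span = (5, 7)  # "he says"
argument_span = (7, 11)    # "Sue had already left"

attribution_conflicts = crossing_constituents(attribution_span, constituents)
argument_conflicts = crossing_constituents(argument_span, constituents)
```

In this toy setup the argument span lines up with a constituent, while the attribution span cuts across the VP, which is the kind of incompatibility such an alignment pass would flag for review.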